- add activity diagram for metadata deletion process
parent f276d0f1ae
commit e4fd1197af
4 changed files with 86 additions and 11 deletions

75  C14-Storage-Integrity.md  Normal file
@ -0,0 +1,75 @@

# C14.1. Processes and documents to ensure that the repository staff have a clear understanding of all storage locations and how they are managed.

Tethys RDR (Research Data Repository) is a data archiving service that provides secure, sustainable, long-term storage of research data. Tethys RDR is designed to preserve the integrity, security, and accessibility of research data over the long term, even as technology and data formats change. To achieve this, various conditions must be met. The following processes and documents have been developed for the repository staff to ensure that data is stored securely, backed up regularly, and preserved for the long term.

## **Disaster recovery plan**

The [disaster recovery plan for TETHYS](https://gitea.geologie.ac.at/geolba/tethys.backend/wiki/DisasterManagement) specifies how the repository is recovered after data loss or a system failure. The plan includes procedures for restoring data files from backups, recovering the database, and restoring access to the repository. It covers scenarios for recovering individual data files (if a checksum test fails), recovering whole folders containing all files of a specific dataset, and restoring a file from all available backup versions. In addition, the restoration of the entire IT system using Docker containers is described in detail.

## **Database Model**

The [TETHYS database model](https://gitea.geologie.ac.at/geolba/tethys.backend/wiki/Database) is based on a relational database (PostgreSQL). The model includes data constraints and validation rules that help to ensure data integrity and to prevent errors, duplication, and inconsistencies in the data. It comprises several tables that store information about different types of research data, with related tables for metadata such as licenses, authors, contributors, abstracts, titles, subjects, collections, data files, projects, and users with permissions.

## **Data Architecture Diagram**

The [Data Architecture Diagram](https://gitea.geologie.ac.at/geolba/tethys.backend/wiki/DataArchitectureDiagram) gives the whole repository staff a clear understanding of how all storage locations are managed. TETHYS collects and manages scanned geological maps, spatial data sets, and other types of research outputs. The Data Architecture Diagram provides information on how to prepare data sources for deposit, such as licenses, correct use of keywords, file formats, and file upload limits.

**Data storage:** The Data Architecture Diagram also describes the virtual infrastructure ([data storage](https://gitea.geologie.ac.at/geolba/tethys.backend/wiki/DataArchitectureDiagram#3-storage-infrastructure)) used to store the data and metadata in the repository. By using PostgreSQL, TETHYS is able to manage large volumes of metadata and provide fast and secure access to this information. The data files are stored on an Ubuntu 22.04 file server with an ext4 partition. Corresponding md5 and sha512 file checksums are also stored in the database.

**Data Discovery:** TETHYS supports data discovery in various ways. Datasets can always be found through the data frontend, https://tethys.at/search, by browsing by subject, author, language, or year, or by searching the metadata attributes title, author, and keywords. All visible metadata are indexed and searchable via [Solr](https://tethys.at/solr/rdr_data/select?q=\*%3A\*). Tethys metadata and file downloads can also be queried through a REST API (Representational State Transfer Application Programming Interface), which allows repository staff to interact with the Tethys system and retrieve metadata and data files programmatically.
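As an illustration, the sketch below queries the public Solr endpoint listed above using Node's built-in fetch (Node 18+). The query parameters are standard Solr parameters; the field name `title` is an assumption for illustration and may differ in the actual index schema.

```ts
// Minimal sketch: querying the public Solr core used by TETHYS for data discovery.
// The endpoint comes from the documentation above; the "title" field name is assumed.

const SOLR_SELECT = "https://tethys.at/solr/rdr_data/select";

async function searchDatasets(term: string): Promise<void> {
  const params = new URLSearchParams({
    q: `title:${term}`, // search inside the title attribute (assumed field name)
    rows: "10",         // limit the result set
    wt: "json",         // request a JSON response
  });

  const response = await fetch(`${SOLR_SELECT}?${params.toString()}`);
  if (!response.ok) {
    throw new Error(`Solr request failed: ${response.status}`);
  }

  const result = (await response.json()) as {
    response: { numFound: number; docs: unknown[] };
  };
  console.log(`Found ${result.response.numFound} datasets`);
  for (const doc of result.response.docs) {
    console.log(doc);
  }
}

searchDatasets("geology").catch(console.error);
```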

For **metadata management**, the Data Architecture Diagram provides information on the specific types of metadata that should be included with the data, such as the format and structure of the metadata and the types of information it should contain. Tethys RDR supports three metadata standards for metadata export (**Dublin Core, DataCite and ISO19139**).

**Security and Access Control:** To protect sensitive data from unauthorized access, TETHYS provides an [Access Control List](https://gitea.geologie.ac.at/geolba/tethys.backend/wiki/Database#the-acl-tables-used-by-tethys-are) (ACL) system that is used to manage users, user roles, and permissions.
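The sketch below illustrates the general idea of such a role- and permission-based check; the types, names, and logic are purely illustrative and do not reflect the actual ACL tables or backend code.

```ts
// Illustrative sketch of an ACL-style permission check (not the actual TETHYS code).
// A user holds roles, and each role grants a set of permissions on the backend.

interface Role {
  name: string;
  permissions: string[]; // e.g. "dataset.publish", "dataset.delete" (hypothetical)
}

interface User {
  login: string;
  roles: Role[];
}

function hasPermission(user: User, permission: string): boolean {
  return user.roles.some((role) => role.permissions.includes(permission));
}

// Example: an editor may create and edit datasets but not delete published ones.
const editor: User = {
  login: "jdoe",
  roles: [{ name: "editor", permissions: ["dataset.create", "dataset.edit"] }],
};

console.log(hasPermission(editor, "dataset.edit"));   // true
console.log(hasPermission(editor, "dataset.delete")); // false
```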

**System Integration:** The research repository is integrated with other systems, such as project management tools, data platforms, and reporting tools. By providing the Open Archives Initiative Protocol for Metadata Harvesting [(OAI-PMH)](https://www.tethys.at/oai?verb=Identify), any other data provider can harvest TETHYS metadata. An example is the Bielefeld Academic Search Engine (BASE): https://www.base-search.net/Search/Results?q=coll:fttethysrdr&refid=dctablede. Matomo, an open-source web analytics platform that can be used to track user behavior on a website, is used to collect usage statistics for TETHYS.
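As a hedged example of how a harvester interacts with this endpoint, the sketch below issues standard OAI-PMH requests with Node's built-in fetch (Node 18+); the `Identify` and `ListRecords` verbs and the `oai_dc` metadata prefix come from the OAI-PMH 2.0 specification, not from Tethys-specific documentation.

```ts
// Minimal sketch of harvesting TETHYS metadata over OAI-PMH.
// Verbs and parameters follow the OAI-PMH 2.0 specification; "oai_dc" is the
// mandatory Dublin Core prefix that every OAI-PMH endpoint must offer.

const OAI_ENDPOINT = "https://www.tethys.at/oai";

async function oaiRequest(params: Record<string, string>): Promise<string> {
  const url = `${OAI_ENDPOINT}?${new URLSearchParams(params).toString()}`;
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`OAI-PMH request failed: ${response.status}`);
  }
  return response.text(); // OAI-PMH responses are XML
}

async function main(): Promise<void> {
  // 1. Identify the repository (name, protocol version, admin contact, ...).
  console.log(await oaiRequest({ verb: "Identify" }));

  // 2. Harvest Dublin Core records.
  console.log(await oaiRequest({ verb: "ListRecords", metadataPrefix: "oai_dc" }));
}

main().catch(console.error);
```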

# C14.2. The repository’s strategy for multiple copies.

IBM Spectrum Protect (formerly known as Tivoli Storage Manager or TSM) is used to protect the data stored in the Tethys Repository. IBM Spectrum Protect provides a comprehensive backup and recovery solution for research data, which helps ensure data availability, integrity, and recoverability in case of a disaster or data loss. For TETHYS, up to 90 incremental versions of each data file are backed up and available for recovery, meaning that up to 90 instances of the data can exist. These versions are created according to the backup schedule and retention policies defined by the computer center of GeoSphere Austria.

Database backup and recovery are also described in the [disaster management plan](https://gitea.geologie.ac.at/geolba/tethys.backend/wiki/DisasterManagement#create-db-backup).

# C14.3. The risk management techniques used to inform the strategy.

Risk management for a research data repository involves identifying potential risks to the data, assessing the likelihood and impact of those risks, and implementing strategies to mitigate or manage them. The following risk management techniques are used for TETHYS:

- **Access Controls** to protect against unauthorized access: Access to the administrative backend is limited to authorized users only, and only https connections are allowed. Fail2ban protects the repository server from brute-force attacks, denial-of-service attacks, and other malicious activities; it monitors log files and dynamically updates firewall rules to block IP addresses that exhibit suspicious behavior.
- **Back up data regularly**: Up to 10 incremental backups of the data are maintained to ensure that it can be recovered in case of data loss or corruption.
- **Encrypt sensitive data**: Sensitive data, such as personally identifiable information and passwords, are encrypted inside the PostgreSQL database (see the password-hashing sketch after this list).
- **Monitor activity logs**: Activity logs of the web server are monitored via Fail2ban to detect suspicious activities such as unauthorized access attempts or data exfiltration.
- **Implement data retention policies**: Policies for data retention and the data deletion process are implemented.
- **Conduct regular security assessments**: The security of the repository is regularly assessed by the repository staff to identify potential vulnerabilities and implement strategies to address them.
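As referenced in the list above, the following generic sketch shows how password material can be protected with salted hashing using Node's built-in crypto module; it illustrates the technique only, and the mechanism actually used inside the PostgreSQL database may differ.

```ts
// Generic sketch of salted password hashing with Node's built-in crypto (scrypt).
// Illustrative only; not the mechanism actually used by TETHYS.
import { randomBytes, scryptSync, timingSafeEqual } from "node:crypto";

function hashPassword(password: string): string {
  const salt = randomBytes(16).toString("hex");
  const derived = scryptSync(password, salt, 64).toString("hex");
  return `${salt}:${derived}`; // store salt and hash together
}

function verifyPassword(password: string, stored: string): boolean {
  const [salt, derived] = stored.split(":");
  const candidate = scryptSync(password, salt, 64);
  return timingSafeEqual(candidate, Buffer.from(derived, "hex"));
}

const record = hashPassword("s3cret");
console.log(verifyPassword("s3cret", record)); // true
console.log(verifyPassword("wrong", record));  // false
```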

The potential disruptive physical threats, which can occur at any time and affect normal business processes, are listed below:

- **Fire**: Fire suppression systems are installed, and there are fire and smoke detectors on all floors.
- **Electric power failure**: Redundant UPS systems with standby generators are available (monitoring: 24/7).
- **Communication network loss**: Unfortunately, there is no redundant repository server in case of network loss. By monitoring the network in real time and receiving alerts when network loss is detected, the IT department can quickly investigate and resolve issues before they impact end users.
- **Flood**: All critical equipment is located on the 2nd floor.
- **Sabotage**: Only authorized IT personnel have access to the server room.

# C14.4. Procedures for handling and monitoring deterioration of storage media.

TETHYS RDR calculates internal checksums during the ingestion workflow. These checksums ensure that ingested data has not been altered or damaged. A checksum is a short sequence of bits or bytes that is calculated from the data using an algorithm; if even a single bit of the data changes, the checksum also changes, indicating that the data has been unintentionally modified on the file store. During file upload, TETHYS calculates and stores **md5** and **sha512** checksums for each file.
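The sketch below shows how such checksums can be computed with Node's built-in crypto module; it illustrates the general technique and is not the actual TETHYS upload code (the file path is hypothetical).

```ts
// Illustrative sketch: computing md5 and sha512 checksums for an uploaded file
// with Node's built-in crypto and fs modules (not the actual TETHYS ingest code).
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

function checksum(path: string, algorithm: "md5" | "sha512"): Promise<string> {
  return new Promise((resolve, reject) => {
    const hash = createHash(algorithm);
    createReadStream(path)
      .on("data", (chunk) => hash.update(chunk))    // stream the file in chunks
      .on("end", () => resolve(hash.digest("hex"))) // hex digest stored in the database
      .on("error", reject);
  });
}

async function main(): Promise<void> {
  const file = "dataset/file.zip"; // hypothetical path
  console.log("md5:   ", await checksum(file, "md5"));
  console.log("sha512:", await checksum(file, "sha512"));
}

main().catch(console.error);
```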

# C14.5. Procedures to ensure that data and metadata are only deleted as part of an approved and documented process.

The Tethys Repository assigns a DOI to each published dataset, which means that once the data is published, it cannot be changed or deleted. In exceptional cases (violation of legal rights or the subject of a justified complaint), access to the datasets in question may be blocked. In this case, only the DOI can still be cited; it then redirects to a landing page that explains why the dataset had to be removed. This ensures that the citability and scientific traceability of the dataset are maintained even if access to the dataset itself is restricted. See also the general guideline for publishing research data in the [manual p.14](https://data.tethys.at/docs/HandbuchTethys.pdf#page=15).

The deletion of a record initiates the following process:

1. Conduct an investigation: The repository may conduct an investigation to determine whether the dataset or metadata record is in fact in violation of legal rights or is the subject of a legitimate complaint.
1. Notify the submitter: If the dataset or metadata record is found to be in violation, the repository may attempt to notify the depositor of the issue and the reason for removal.
1. Remove the dataset or metadata record: If the violation is confirmed, the repository administrator removes the dataset or metadata record from its collection.
1. Document the removal: The repository administrator documents the removal of the dataset or metadata record, including the reason for removal and any communications with the submitter.
1. Review and revise policies: The repository may review and revise its policies and procedures to prevent similar violations from occurring in the future.

![Metadata deletion process](/images/Metadata_deletion_process.png)

# C14.6. Any checks (i.e. fixity checks) used to verify that a digital object has not been altered or corrupted from deposit to use.

For internal fixity checks, the TETHYS Repository operates an automated cron job that routinely verifies all md5 and sha512 checksums for data stored by the TETHYS Repository and produces a regular report with appropriate warnings if silent data corruption is detected in the storage layer. The corresponding code of the cron job can be downloaded from the [TETHYS code repository](https://gitea.geologie.ac.at/geolba/tethys.backend/src/branch/master/commands/ValidateChecksum.ts).
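In essence, the job recomputes each file's checksum and compares it with the value stored in the database at ingest time. The following simplified sketch illustrates this idea; it is not the actual ValidateChecksum.ts command, and getStoredFiles() is a hypothetical stand-in for the real database query.

```ts
// Simplified sketch of a fixity check: recompute sha512 checksums and compare them
// with the values stored at ingest time. Not the actual ValidateChecksum.ts command.
import { createHash } from "node:crypto";
import { readFile } from "node:fs/promises";

interface StoredFile {
  path: string;
  sha512: string; // checksum recorded in the database during upload
}

// Placeholder for the real database query (hypothetical).
async function getStoredFiles(): Promise<StoredFile[]> {
  return [{ path: "files/123/data.zip", sha512: "expected-digest-goes-here" }];
}

async function validateChecksums(): Promise<string[]> {
  const corrupted: string[] = [];
  for (const file of await getStoredFiles()) {
    const digest = createHash("sha512").update(await readFile(file.path)).digest("hex");
    if (digest !== file.sha512) {
      corrupted.push(file.path); // silent corruption detected in the storage layer
    }
  }
  return corrupted;
}

validateChecksums().then((corrupted) => {
  if (corrupted.length > 0) {
    // In production, such warnings are mailed to the administrator.
    console.error("Checksum mismatch for:", corrupted);
  }
});
```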

If the web backend is in production mode, the logger sends the error messages by e-mail to the administrator. Based on these warnings, the administrators investigate the cause of the changes, and the corrupted files are restored from the backup (IBM Spectrum Protect).

@ -1,4 +1,4 @@

# C15.1. The repository software used for deposit, curation, preservation and access management. Whether it is community supported, open source, or locally developed.

The basic technical structure of TETHYS corresponds to a three-tiered client/server architecture with a number of clients and middleware components controlling the information flow and quality. On the server side, an RDBMS (PostgreSQL) is used for information storage (metadata, access control lists, file checksums). All metadata is replicated into middleware/frontend systems (Solr) for fast access and search capabilities. All public interfaces on the client side (tethys.frontend) to the information system are standards compliant (W3C, ISO, OGC) and are based on web services (REST API and Solr search services). By using modern technologies such as Bulma, TypeScript, Vue 3, and Webpack, the TETHYS frontend provides an engaging and efficient user experience while also improving overall performance. These technologies are widely used in the industry, so the frontend is likely to be compatible with a wide range of browsers and devices. The TETHYS REST service, known as tethys.api, is built on top of the Express framework. Express is a popular web framework for Node.js that provides a minimalist set of features for deploying RESTful APIs that are easy to maintain and scale.
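As a minimal illustration of this approach (not the actual tethys.api code; the route path and response shape are hypothetical), an Express-based REST endpoint looks roughly like this:

```ts
// Illustrative sketch of an Express-based REST endpoint (not the actual tethys.api code).
import express, { Request, Response } from "express";

const app = express();

// Return basic metadata for a dataset identified by its id (hypothetical route).
app.get("/api/datasets/:id", (req: Request, res: Response) => {
  res.json({
    id: req.params.id,
    title: "Example dataset", // placeholder values for illustration
    checksums: { md5: "...", sha512: "..." },
  });
});

app.listen(3000, () => console.log("REST API listening on port 3000"));
```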

@ -7,25 +7,25 @@ In order to comply with international metadata standards such as Dublin Core, Da

The new TETHYS editorial system on the server side, also known as tethys.backend, is web-based open-source software that operates directly on the PostgreSQL database. It is built using AdonisJS, a Node.js-based web framework that provides a robust set of tools and features for building scalable and secure web applications. AdonisJS also includes an object-relational mapper (ORM) that allows developers to work with database tables and records using object-oriented programming techniques. The styling of the TETHYS backend is built with Tailwind, a utility-first CSS framework.
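As a hedged illustration of this ORM approach, the following minimal Lucid model sketch assumes AdonisJS v6 import paths; the table and column names are hypothetical and not taken from the actual TETHYS schema.

```ts
// Minimal sketch of an AdonisJS Lucid model (assumed v6 import path).
// Table and column names are hypothetical, not the actual TETHYS schema.
import { BaseModel, column } from '@adonisjs/lucid/orm'
import { DateTime } from 'luxon'

export default class Dataset extends BaseModel {
  @column({ isPrimary: true })
  declare id: number

  @column()
  declare title: string

  @column()
  declare doi: string | null

  @column.dateTime({ autoCreate: true })
  declare createdAt: DateTime
}

// Usage inside the backend: records are queried with object-oriented methods, e.g.
// const published = await Dataset.query().whereNotNull('doi').orderBy('createdAt', 'desc')
```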

# C15.2. Any IT service management approach followed and the functions this approach specifies (e.g. systems documentation, software inventories, code repositories, infrastructure development planning).

General descriptions of the systems and software used can be found in our [Wiki](https://gitea.geologie.ac.at/geolba/tethys.backend/wiki/?action=_pages). There you will find public information about the recovery of the Tethys research repository, details about the database model, instructions for starting the Docker containers, and a data architecture diagram for a clear understanding of all storage locations. All code repositories are accessible online via a [Gitea instance](https://gitea.geologie.ac.at) hosted at the GeoSphere Austria data center. The associated Tethys Docker images are also securely stored there using the [built-in Docker registry functionality](https://gitea.geologie.ac.at/geolba/-/packages). All configurations necessary to launch the Docker containers are described in the wiki. \
Internal information about virtual servers, maintenance, and security settings is stored in a separate, **private** wiki on internal LAN servers. \
The hardware infrastructure is generally renewed every 3-4 years, which is transparent to the system because of virtualization. Operating systems are regularly updated to the latest releases and patches.

# C15.3 Any international, community or other technical infrastructure standards in place and how compliance is monitored.

~~GeoSphere Austria is certified according to the international standard ISO 27001 for information security management. Compliance with these standards is monitored by the organization responsible for issuing certification and conducting audits. This organization inspects and monitors the organization's infrastructure and processes to ensure that the relevant standards are met and the certification for GeoSphere Austria can be renewed and compliance is monitored.~~

Compliance with international IT standards is monitored and checked by the organization to ensure that its infrastructure and processes can continue to operate.

# C15.4. The version control systems used for repository generated software.

All the software components used for the three-tiered client/server architecture of the TETHYS research data repository are versioned with Git and can be accessed online via a [Gitea instance](https://gitea.geologie.ac.at) hosted by the computer center of GeoSphere Austria. The frontend can be accessed at https://gitea.geologie.ac.at/geolba/tethys.frontend, the backend at https://gitea.geologie.ac.at/geolba/tethys.backend, and the API at https://gitea.geologie.ac.at/geolba/tethys.api.\
The TETHYS research repository is developed using the **Continuous Integration/Continuous Deployment (CI/CD)** practice, which involves frequent testing and deployment of code changes to production. Whenever new code is published, automated tests run in the background (CI), and when new versions are released, new Docker images are automatically deployed (CD). This entire process is implemented using Gitea Actions, a platform that enables the automation of tasks such as building, testing, and deploying code, as well as sending notifications such as alerts for failed tests. All workflows are defined in YAML files, such as the [ci.yaml file](https://gitea.geologie.ac.at/geolba/tethys.backend/actions?workflow=ci.yaml&state=closed), which is triggered whenever new code is committed to the TETHYS backend.

# C15.5 Measures taken to ensure that availability, bandwidth, and connectivity are sufficient to meet the needs of the designated community.

To ensure that the availability, bandwidth, and connectivity are sufficient to meet the needs of the designated community of TETHYS RDR, the following measures are taken:

@ -39,7 +39,7 @@ The use of professional monitoring software like ICINGA helps to ensure that Tet

5. **Multiple access points**: TETHYS provides multiple access points to the repository, including web interfaces, REST APIs, and web applications. The TETHYS frontend web application employs a fully responsive design to ensure that users can access the data from a variety of devices and platforms. This means that the repository's web interface is optimized to provide an optimal viewing and interaction experience across a wide range of screen sizes and device types, including desktops, laptops, tablets, and smartphones.

# C15.6 Processes in place to monitor and manage the need for technical change, including in response to the changing needs of Preservation C09, and Reuse C13 by the Designated Community.

There are several processes that are used to monitor and manage the need for technical change in the software development of Tethys.

@ -1,7 +1,7 @@

# C16.1 The levels of security required for different data and metadata and environments, and how these are supported

To support the required levels of security for data, metadata, and environments, we have implemented a multi-layered approach to security, which includes physical, technical, and administrative controls. Physical controls involve securing access points, restricting visitor access, and monitoring who enters the premises. Strong encryption, a firewall, and antivirus software are used as technical controls to secure the networks. Administrative controls involve developing security policies and procedures, training employees, and conducting regular security audits.

# C16.2. The IT security system, employees with roles related to security and any risk analysis approach in use.

Several different roles are involved in managing the IT security system and performing risk analyses within the IT department of GeoSphere Austria:

* The Information Security Analyst is responsible for identifying and managing security risks, as well as developing and implementing security policies and procedures.

@ -9,14 +9,14 @@ The IT security system has several different types of employees and roles which

* The Security Architect is responsible for designing and implementing security systems, as well as ensuring that all security policies and procedures are being followed.

* The Administration Team analyzes and monitors security data to identify trends and vulnerabilities and responds to security incidents as needed. They help the organization to develop security strategies and plans. They are also responsible for overseeing all aspects of the organization's security program, including risk management, compliance, and incident response.

# C16.3 Measures in place to protect the facility. How the premises where digital objects are held are secured.

For the premises where digital objects are held, a multi-layered security system is implemented that includes physical, electronic, and procedural controls.

* Physical security measures include surveillance cameras, access control systems, and perimeter security to prevent unauthorized entry.
* Electronic security measures include a firewall, an intrusion detection system, and encryption to protect digital data from cyber threats.
* Procedural controls include security policies and procedures, employee training, and background checks to ensure that everyone who has access to the digital objects follows the appropriate security protocols.

# C16.4 Any security-specific standards the repository references or complies with.

The repository references ISO/IEC 27001, currently one of the most well-known security standards. It provides a framework for establishing, implementing, maintaining, and continually improving an information security management system.

# C16.5 Any authentication and authorization procedures employed to securely manage access to system use.

For authentication and authorization to securely manage access to Tethys, we use LDAP, SAML, and Keycloak. LDAP is used for user authentication and authorization, while SAML provides a secure way to exchange authentication and authorization data between different systems. Keycloak, as an identity and access management solution, is integrated with both LDAP and SAML, allowing for easy management of user identities and credentials. Overall, this combination of technologies provides a secure and reliable way to authenticate users and ensure that only authorized individuals have access to Tethys.

BIN
images/Metadata_deletion_process.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 43 KiB |