# DataCite Update Command

AdonisJS Ace command for updating DataCite DOI records for published datasets.

## Overview

The `update:datacite` command synchronizes your local dataset metadata with DataCite DOI records. It intelligently compares modification dates to only update records when necessary, reducing unnecessary API calls and maintaining data consistency.

## Command Syntax

```bash
node ace update:datacite [options]
```

## Options

| Flag | Alias | Description |
|------|-------|-------------|
| `--publish_id` | `-p` | Update a specific dataset by publish_id |
| `--force` | `-f` | Force update all records regardless of modification date |
| `--dry-run` | `-d` | Preview what would be updated without making changes |
| `--stats` | `-s` | Show detailed statistics for datasets that need updating |

## Usage Examples

### Basic Operations

```bash
# Update all datasets that have been modified since their DOI was last updated
node ace update:datacite

# Update a specific dataset
node ace update:datacite --publish_id 231
node ace update:datacite -p 231

# Force update all datasets with DOIs (ignores modification dates)
node ace update:datacite --force
```

### Preview and Analysis

```bash
# Preview what would be updated (dry run)
node ace update:datacite --dry-run

# Show detailed statistics for datasets that need updating
node ace update:datacite --stats

# Show stats for a specific dataset
node ace update:datacite --stats --publish_id 231
```

### Combined Options

```bash
# Dry run for a specific dataset
node ace update:datacite --dry-run --publish_id 231

# Show stats for all datasets (including up-to-date ones)
node ace update:datacite --stats --force
```

## Command Modes

### 1. **Normal Mode** (Default)

Updates DataCite records for datasets that have been modified since their DOI was last updated.


**Example Output:**

```
Using DataCite API: https://api.test.datacite.org
Found 50 datasets to process
Dataset 231: Successfully updated DataCite record
Dataset 245: Up to date, skipping
Dataset 267: Successfully updated DataCite record
DataCite update completed. Updated: 15, Skipped: 35, Errors: 0
```

### 2. **Dry Run Mode** (`--dry-run`)

Shows what would be updated without making any changes to DataCite.

**Use Case:** Preview updates before running the actual command.

**Example Output:**

```
Dataset 231: Would update DataCite record (dry run)
Dataset 267: Would update DataCite record (dry run)
Dataset 245: Up to date, skipping
DataCite update completed. Updated: 2, Skipped: 1, Errors: 0
```

### 3. **Stats Mode** (`--stats`)

Shows detailed information for each dataset that needs updating, including why it needs updating.

**Use Case:** Debug synchronization issues, monitor dataset/DOI status, generate reports.

**Example Output:**

```
┌─ Dataset 231 ─────────────────────────────────────────────────────────
│ DOI Value: 10.21388/tethys.231
│ DOI Status (DB): findable
│ DOI State (DataCite): findable
│ Dataset Modified: 2024-09-15T10:30:00.000Z
│ DOI Modified: 2024-09-10T08:15:00.000Z
│ Needs Update: YES - Dataset newer than DOI
└───────────────────────────────────────────────────────────────────────

┌─ Dataset 267 ─────────────────────────────────────────────────────────
│ DOI Value: 10.21388/tethys.267
│ DOI Status (DB): findable
│ DOI State (DataCite): findable
│ Dataset Modified: 2024-09-18T14:20:00.000Z
│ DOI Modified: 2024-09-16T12:45:00.000Z
│ Needs Update: YES - Dataset newer than DOI
└───────────────────────────────────────────────────────────────────────

DataCite Stats Summary: 2 datasets need updating, 48 are up to date
```

## Update Logic

The command decides whether each record needs an update as follows:

1. **Compares modification dates**: Dataset `server_date_modified` vs DOI last modification date from DataCite
2. **Validates data integrity**: Checks for missing or future dates
3. **Handles API failures gracefully**: Updates anyway if DataCite info can't be retrieved
4. **Uses dual API approach**: DataCite REST API (primary) with MDS API fallback

### When Updates Happen

| Condition | Action | Reason |
|-----------|--------|--------|
| Dataset modified > DOI modified | ✅ Update | Dataset has newer changes |
| Dataset modified ≤ DOI modified | ❌ Skip | DOI is up to date |
| Dataset date in future | ❌ Skip | Invalid data, needs investigation |
| Dataset date missing | ✅ Update | Can't determine staleness |
| DataCite API error | ✅ Update | Better safe than sorry |
| `--force` flag used | ✅ Update | Override all logic |

## Environment Configuration

Required environment variables:

```bash
# DataCite Credentials
DATACITE_USERNAME=your_username
DATACITE_PASSWORD=your_password

# API Endpoints (environment-specific: set either the test pair or the production pair)
DATACITE_API_URL=https://api.test.datacite.org      # Test environment
DATACITE_SERVICE_URL=https://mds.test.datacite.org  # Test MDS

DATACITE_API_URL=https://api.datacite.org           # Production
DATACITE_SERVICE_URL=https://mds.datacite.org       # Production MDS

# Project Configuration
DATACITE_PREFIX=10.21388  # Your DOI prefix
BASE_DOMAIN=tethys.at     # Your domain
```

## Error Handling

The command handles various error scenarios:

- **Invalid modification dates**: Logs errors but continues processing other datasets
- **DataCite API failures**: Falls back to MDS API, then to safe update
- **Missing DOI identifiers**: Skips datasets without DOI identifiers
- **Network issues**: Continues with next dataset after logging error

## Integration

The command integrates with:

- **Dataset Model**: Uses `server_date_modified` for change detection
- **DatasetIdentifier Model**: Reads DOI values and status
- **OpenSearch Index**: Updates search index after DataCite update
- **DoiClient**: Handles all DataCite API interactions

## Common Workflows

### Daily Maintenance

```bash
# Update any datasets modified since their DOI was last updated
node ace update:datacite
```

### Pre-Deployment Check

```bash
# Check what would be updated before deployment
node ace update:datacite --dry-run
```

### Debugging Sync Issues

```bash
# Investigate why a specific dataset isn't syncing
node ace update:datacite --stats --publish_id 231
```

### Full Resync

```bash
# Force update all DOI records (use with caution)
node ace update:datacite --force
```

### Monitoring Report

```bash
# Generate sync status report
node ace update:datacite --stats > datacite-sync-report.txt
```

## Best Practices

1. **Regular Updates**: Run daily or after bulk dataset modifications
2. **Test First**: Use `--dry-run` or `--stats` before bulk operations
3. **Monitor Logs**: Check for data integrity warnings
4. **Environment Separation**: Use correct API URLs for test vs production
5. **Rate Limiting**: The command handles DataCite rate limits automatically
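
For readers who want to reason about the rules in the "When Updates Happen" table, the following is a minimal sketch of that decision as a pure function. It is not the command's actual source: `decideUpdate`, `UpdateInput`, and `Decision` are hypothetical names, a `null` DOI date stands in for a failed DataCite lookup, and the order in which the rules are checked (force first, then missing or future dates, then the API error case) is an assumption, since the table does not state precedence.

```typescript
// Hypothetical sketch of the "When Updates Happen" table.
// Names and rule precedence are assumptions, not taken from the command's source.
type Decision = { update: boolean; reason: string };

interface UpdateInput {
  datasetModified: Date | null; // dataset `server_date_modified`; null if missing
  doiModified: Date | null;     // DOI modification date from DataCite; null if the lookup failed
  force: boolean;               // true when --force was passed
  now?: Date;                   // injectable clock, defaults to the current time
}

function decideUpdate(input: UpdateInput): Decision {
  const now = input.now ?? new Date();

  // --force overrides every other rule.
  if (input.force) {
    return { update: true, reason: "--force flag used" };
  }
  // Without a dataset date, staleness cannot be determined, so update.
  if (input.datasetModified === null) {
    return { update: true, reason: "Dataset date missing" };
  }
  // A dataset date in the future is treated as invalid data and skipped.
  if (input.datasetModified.getTime() > now.getTime()) {
    return { update: false, reason: "Dataset date in future" };
  }
  // If DataCite could not be queried, update anyway.
  if (input.doiModified === null) {
    return { update: true, reason: "DataCite API error" };
  }
  // Normal case: update only when the dataset is strictly newer than the DOI.
  if (input.datasetModified.getTime() > input.doiModified.getTime()) {
    return { update: true, reason: "Dataset newer than DOI" };
  }
  return { update: false, reason: "DOI is up to date" };
}

// Usage with the dates shown for Dataset 231 in the stats example.
const example = decideUpdate({
  datasetModified: new Date("2024-09-15T10:30:00.000Z"),
  doiModified: new Date("2024-09-10T08:15:00.000Z"),
  force: false,
  now: new Date("2024-09-20T00:00:00.000Z"),
});
console.log(example.update, example.reason);
```

Keeping the decision free of I/O, with the clock passed in, makes each row of the table easy to check in isolation; the real command may structure this differently.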