# DataCite Update Command

AdonisJS Ace command for updating DataCite DOI records for published datasets.

## Overview

The `update:datacite` command synchronizes your local dataset metadata with DataCite DOI records. By default it compares modification dates and skips records that are already up to date, which avoids unnecessary API calls while keeping DataCite consistent with the local data.

## Command Syntax

```bash
node ace update:datacite [options]
```

## Options

| Flag | Alias | Description |
|------|-------|-------------|
| `--publish_id <number>` | `-p` | Update a specific dataset by publish_id |
| `--force` | `-f` | Force update all records regardless of modification date |
| `--dry-run` | `-d` | Preview what would be updated without making changes |
| `--stats` | `-s` | Show detailed statistics for datasets that need updating |

## Usage Examples

### Basic Operations

```bash
# Update all datasets that have been modified since their DOI was last updated
node ace update:datacite

# Update a specific dataset
node ace update:datacite --publish_id 231
node ace update:datacite -p 231

# Force update all datasets with DOIs (ignores modification dates)
node ace update:datacite --force
```

### Preview and Analysis

```bash
# Preview what would be updated (dry run)
node ace update:datacite --dry-run

# Show detailed statistics for datasets that need updating
node ace update:datacite --stats

# Show stats for a specific dataset
node ace update:datacite --stats --publish_id 231
```

### Combined Options

```bash
# Dry run for a specific dataset
node ace update:datacite --dry-run --publish_id 231

# Show stats for all datasets (including up-to-date ones)
node ace update:datacite --stats --force
```

## Command Modes

### 1. **Normal Mode** (Default)
Updates DataCite records for datasets that have been modified since their DOI was last updated.

**Example Output:**
```
Using DataCite API: https://api.test.datacite.org
Found 50 datasets to process
Dataset 231: Successfully updated DataCite record
Dataset 245: Up to date, skipping
Dataset 267: Successfully updated DataCite record
DataCite update completed. Updated: 15, Skipped: 35, Errors: 0
```

### 2. **Dry Run Mode** (`--dry-run`)
Shows what would be updated without making any changes to DataCite.

**Use Case:** Preview updates before running the actual command.

**Example Output:**
```
Dataset 231: Would update DataCite record (dry run)
Dataset 267: Would update DataCite record (dry run)
Dataset 245: Up to date, skipping
DataCite update completed. Updated: 2, Skipped: 1, Errors: 0
```

### 3. **Stats Mode** (`--stats`)
Shows detailed information for each dataset that needs updating, including why it needs updating.

**Use Case:** Debug synchronization issues, monitor dataset/DOI status, generate reports.

**Example Output:**
```
┌─ Dataset 231 ─────────────────────────────────────────────────────────
│ DOI Value: 10.21388/tethys.231
│ DOI Status (DB): findable
│ DOI State (DataCite): findable
│ Dataset Modified: 2024-09-15T10:30:00.000Z
│ DOI Modified: 2024-09-10T08:15:00.000Z
│ Needs Update: YES - Dataset newer than DOI
└───────────────────────────────────────────────────────────────────────

┌─ Dataset 267 ─────────────────────────────────────────────────────────
│ DOI Value: 10.21388/tethys.267
│ DOI Status (DB): findable
│ DOI State (DataCite): findable
│ Dataset Modified: 2024-09-18T14:20:00.000Z
│ DOI Modified: 2024-09-16T12:45:00.000Z
│ Needs Update: YES - Dataset newer than DOI
└───────────────────────────────────────────────────────────────────────

DataCite Stats Summary: 2 datasets need updating, 48 are up to date
```

## Update Logic

The command decides per dataset whether an update is needed:

1. **Compares modification dates**: Dataset `server_date_modified` vs DOI last modification date from DataCite
2. **Validates data integrity**: Checks for missing or future dates
3. **Handles API failures gracefully**: Updates anyway if DataCite info can't be retrieved
4. **Uses dual API approach**: DataCite REST API (primary) with MDS API fallback

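The dual API approach in step 4 can be pictured as a fallback chain. The sketch below is illustrative only and is not taken from the command's source: `DoiDateSource` and `fetchDoiModified` are hypothetical names, and the sources passed in stand for whatever REST and MDS lookups `DoiClient` actually provides. A `null` result represents "DataCite info can't be retrieved", which step 3 treats as a reason to update anyway.

```typescript
// Hypothetical shape: a source resolves to the DOI's last modification date
// or rejects when the lookup fails.
type DoiDateSource = (doi: string) => Promise<Date>;

// Try each source in order and return the first date that resolves.
// Returns null when every source fails, so the caller can fall back
// to updating the record anyway.
async function fetchDoiModified(
  doi: string,
  sources: DoiDateSource[],
): Promise<Date | null> {
  for (const source of sources) {
    try {
      return await source(doi);
    } catch {
      // This source failed; continue with the next one.
    }
  }
  return null;
}
```

Called with a REST lookup first and an MDS lookup second, this returns the REST date when available, the MDS date when only REST fails, and `null` when both fail.
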
### When Updates Happen

| Condition | Action | Reason |
|-----------|--------|--------|
| Dataset modified > DOI modified | ✅ Update | Dataset has newer changes |
| Dataset modified ≤ DOI modified | ❌ Skip | DOI is up to date |
| Dataset date in future | ❌ Skip | Invalid data, needs investigation |
| Dataset date missing | ✅ Update | Can't determine staleness |
| DataCite API error | ✅ Update | Better safe than sorry |
| `--force` flag used | ✅ Update | Override all logic |

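The table above can be read as a single decision function. The following is a minimal sketch under stated assumptions, not the command's implementation: `decideUpdate` and `UpdateDecision` are hypothetical names, a `null` `doiModified` stands for a failed DataCite lookup, and the order in which the conditions are checked (for example, a missing date before a future date) is an assumption, since the table does not specify precedence beyond `--force` overriding everything.

```typescript
interface UpdateDecision {
  update: boolean;
  reason: string;
}

// Mirrors the rows of the "When Updates Happen" table.
// The `now` parameter is injectable so the future-date check is testable.
function decideUpdate(
  datasetModified: Date | null, // null: dataset date missing
  doiModified: Date | null, // null: DataCite lookup failed (assumption)
  force: boolean,
  now: Date = new Date(),
): UpdateDecision {
  if (force) {
    return { update: true, reason: "Override all logic" };
  }
  if (datasetModified === null) {
    return { update: true, reason: "Can't determine staleness" };
  }
  if (datasetModified.getTime() > now.getTime()) {
    return { update: false, reason: "Invalid data, needs investigation" };
  }
  if (doiModified === null) {
    return { update: true, reason: "Better safe than sorry" };
  }
  if (datasetModified.getTime() > doiModified.getTime()) {
    return { update: true, reason: "Dataset has newer changes" };
  }
  return { update: false, reason: "DOI is up to date" };
}
```

With the dates shown for dataset 231 in the stats output (dataset modified 2024-09-15, DOI modified 2024-09-10), this function returns an update decision with the reason "Dataset has newer changes".
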
## Environment Configuration

Required environment variables:

```bash
# DataCite Credentials
DATACITE_USERNAME=your_username
DATACITE_PASSWORD=your_password

# API Endpoints (environment-specific): set only one pair
DATACITE_API_URL=https://api.test.datacite.org # Test environment
DATACITE_SERVICE_URL=https://mds.test.datacite.org # Test MDS

# For production, use this pair instead of the test pair above
# DATACITE_API_URL=https://api.datacite.org # Production
# DATACITE_SERVICE_URL=https://mds.datacite.org # Production MDS

# Project Configuration
DATACITE_PREFIX=10.21388 # Your DOI prefix
BASE_DOMAIN=tethys.at # Your domain
```

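A quick pre-flight check of these variables can catch a misconfigured environment before any API call is made. The sketch below is an assumption about how such a check could look, not part of the command: `missingVars` and `isTestEndpoint` are hypothetical helpers, and the test-environment detection simply looks for the `.test.datacite.org` host used in the example URLs above.

```typescript
// Variable names taken from the configuration listing above.
const REQUIRED_VARS = [
  "DATACITE_USERNAME",
  "DATACITE_PASSWORD",
  "DATACITE_API_URL",
  "DATACITE_SERVICE_URL",
  "DATACITE_PREFIX",
  "BASE_DOMAIN",
] as const;

type Env = Record<string, string | undefined>;

// Return the names of required variables that are unset or blank.
function missingVars(env: Env): string[] {
  return REQUIRED_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// True when the REST endpoint points at the DataCite test host.
function isTestEndpoint(env: Env): boolean {
  const url = env["DATACITE_API_URL"] || "";
  return url.indexOf(".test.datacite.org") !== -1;
}
```

In a Node.js process, both helpers could be called with `process.env`, for example to abort early when `missingVars` returns a non-empty list or to print a warning before a `--force` run against production.
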
## Error Handling

The command handles various error scenarios:

- **Invalid modification dates**: Logs errors but continues processing other datasets
- **DataCite API failures**: Falls back to MDS API, then to safe update
- **Missing DOI identifiers**: Skips datasets without DOI identifiers
- **Network issues**: Continues with next dataset after logging error

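The "log the error and continue with the next dataset" behaviour can be sketched as a loop that isolates each dataset in its own `try`/`catch` and tallies the outcome. This is an illustration under assumptions rather than the command's code: `processAll`, `Outcome` and the `processOne` callback are hypothetical, and the per-dataset log line is made up for the example. Only the summary line format is copied from the example output shown earlier.

```typescript
interface Summary {
  updated: number;
  skipped: number;
  errors: number;
}

type Outcome = "updated" | "skipped";

// Process datasets one by one; a failure in one dataset is logged and
// counted, and never stops the remaining datasets from being processed.
async function processAll(
  publishIds: number[],
  processOne: (publishId: number) => Promise<Outcome>,
  log: (message: string) => void = console.log,
): Promise<Summary> {
  const summary: Summary = { updated: 0, skipped: 0, errors: 0 };
  for (const publishId of publishIds) {
    try {
      const outcome = await processOne(publishId);
      summary[outcome] += 1;
    } catch (error) {
      summary.errors += 1;
      const message = error instanceof Error ? error.message : String(error);
      log(`Dataset ${publishId}: ${message}`); // hypothetical log format
    }
  }
  return summary;
}

// Same wording as the summary line in the example output.
function formatSummary(s: Summary): string {
  return `DataCite update completed. Updated: ${s.updated}, Skipped: ${s.skipped}, Errors: ${s.errors}`;
}
```

Keeping the error boundary inside the loop, rather than around it, is what lets a single network failure show up as one entry in `Errors` instead of aborting the run.
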
## Integration

The command integrates with:

- **Dataset Model**: Uses `server_date_modified` for change detection
- **DatasetIdentifier Model**: Reads DOI values and status
- **OpenSearch Index**: Updates search index after DataCite update
- **DoiClient**: Handles all DataCite API interactions

## Common Workflows

### Daily Maintenance
```bash
# Update any datasets modified since their DOI was last updated
node ace update:datacite
```

### Pre-Deployment Check
```bash
# Check what would be updated before deployment
node ace update:datacite --dry-run
```

### Debugging Sync Issues
```bash
# Investigate why a specific dataset isn't syncing
node ace update:datacite --stats --publish_id 231
```

### Full Resync
```bash
# Force update all DOI records (use with caution)
node ace update:datacite --force
```

### Monitoring Report
```bash
# Generate sync status report
node ace update:datacite --stats > datacite-sync-report.txt
```

## Best Practices

1. **Regular Updates**: Run daily or after bulk dataset modifications
2. **Test First**: Use `--dry-run` or `--stats` before bulk operations
3. **Monitor Logs**: Check for data integrity warnings
4. **Environment Separation**: Use correct API URLs for test vs production
5. **Rate Limiting**: The command handles DataCite rate limits automatically