# Internet Archive CLI Reference

Quick reference for the `ia` command-line tool on pi-nas.

---

## Location & Setup

| Detail | Value |
|--------|-------|
| Host | pi-nas (192.168.1.245 / 100.64.0.21) |
| Binary | `ia` (v5.7.2, pip-installed) |
| Config | `~/.config/internetarchive/ia.ini` |

---

## 1. Configure / Authenticate

Required for uploads, metadata edits, and accessing restricted items. Not required for public downloads or searches.

```bash
ia configure
# Prompts for archive.org email + password
# Stores credentials in ~/.config/internetarchive/ia.ini
```

Verify that the CLI is installed and runs (this does not validate the stored credentials):

```bash
ia configure --help  # Should show options without errors
```

---

## 2. Search

Search the archive.org catalog. Returns one JSON object per line by default.

### Basic syntax

```bash
ia search '<query>'
```

### Query syntax

Queries use Lucene syntax. Combine fields with `AND`/`OR` and quote multi-word phrases.

| Field | Example | Notes |
|-------|---------|-------|
| `collection` | `collection:prelinger` | Items in a specific collection |
| `subject` | `subject:"ham radio"` | Subject/tag match |
| `mediatype` | `mediatype:texts` | texts, movies, audio, software, image, data, web, collection |
| `creator` | `creator:"ARRL"` | Author/creator |
| `title` | `title:"emergency"` | Item title |
| `date` | `date:[2020-01-01 TO 2024-12-31]` | Date range (YYYY-MM-DD) |
| `year` | `year:2023` | Shorthand for year |
| `language` | `language:eng` | ISO language code |
| `licenseurl` | `licenseurl:*creativecommons*` | License filter |

### Combined queries

```bash
# Texts about ham radio dated 2020 or later
ia search 'subject:"ham radio" mediatype:texts date:[2020-01-01 TO 2099-12-31]'

# All items in a specific collection
ia search 'collection:prelinger'

# Creator + mediatype
ia search 'creator:"ARRL" AND mediatype:texts'
```

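When queries are assembled inside scripts, quoting multi-word values by hand is easy to get wrong. The following is a minimal sketch, assuming a hypothetical helper named `ia_query` (it is not part of the `ia` tool), that joins `field value` pairs with `AND` and quotes any value containing whitespace. Range expressions in square brackets are passed through unquoted.

```bash
#!/usr/bin/env bash
# ia_query is a hypothetical helper, not part of the ia CLI.
# Usage: ia_query field value [field value ...]
ia_query() {
  local out="" field value term
  while [ "$#" -ge 2 ]; do
    field=$1
    value=$2
    shift 2
    case $value in
      \[*\])         term="$field:$value" ;;      # Lucene range: leave unquoted
      *[[:space:]]*) term="$field:\"$value\"" ;;  # phrase: wrap in double quotes
      *)             term="$field:$value" ;;      # single word: use as-is
    esac
    if [ -n "$out" ]; then
      out="$out AND $term"
    else
      out="$term"
    fi
  done
  printf '%s\n' "$out"
}

# Example: build the query and print it for review.
query=$(ia_query subject "ham radio" mediatype texts)
echo "$query"
# Once it looks right, pass it on:
# ia search "$query" --itemlist
```

Printing the query before running the search makes it easier to spot a missing quote, which is a common cause of empty result sets.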
### Output options

```bash
# Default: JSON objects, one per line
ia search 'collection:prelinger'

# Itemlist mode: outputs only identifiers, one per line
# Save the output to a file and pass it to ia download --itemlist
ia search 'collection:prelinger' --itemlist

# Save itemlist to file
ia search 'collection:prelinger' --itemlist > prelinger-items.txt

# Limit results with parameters
ia search 'subject:radio' --parameters='rows=50'

# Count results without retrieving them all
ia search 'collection:prelinger' --num-found
```

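An itemlist file is plain text with one identifier per line, so it can be processed with an ordinary shell loop. The sketch below assumes a hypothetical function named `each_item`; skipping blank lines and `#` comment lines is a convention of this sketch, not something `ia` defines.

```bash
#!/usr/bin/env bash
# each_item is a hypothetical helper: it runs a command once per identifier
# found in an itemlist file, appending the identifier as the last argument.
# Usage: each_item <itemlist-file> <command> [args...]
each_item() {
  local file=$1 id
  shift
  # The "|| [ -n "$id" ]" part handles a final line with no trailing newline.
  while IFS= read -r id || [ -n "$id" ]; do
    case $id in
      ''|'#'*) continue ;;   # skip blank lines and comment lines
    esac
    "$@" "$id" </dev/null    # keep the command from consuming the itemlist
  done < "$file"
}

# Example with a throwaway file. Swap "echo would-list" for "ia list"
# once the printed identifiers look right.
tmp=$(mktemp)
printf '%s\n' 'item-one' '' '# comment' 'item-two' > "$tmp"
each_item "$tmp" echo would-list
rm -f "$tmp"
```

Running the loop with `echo` first gives a dry run of exactly which identifiers would be touched.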
### Practical examples

```bash
# Count the items in a collection
ia search 'collection:arrl_qst' --num-found

# Get identifiers for bulk download
ia search 'collection:arrl_qst' --itemlist > arrl-items.txt

# Search within a collection for specific subjects
ia search 'collection:prelinger subject:"san francisco"' --itemlist

# Find audio recordings by a specific creator
ia search 'creator:"Grateful Dead" mediatype:audio' --itemlist

# Search for items with specific file formats available
ia search 'collection:librivoxaudio format:"64Kbps MP3"' --itemlist
```

---

## 3. List Item Contents

View files within an item without downloading.

```bash
# List all files in an item
ia list <identifier>

# Example
ia list prelinger_films
```

By default the output lists filenames. To also show sizes and formats, select columns explicitly (for example `--columns=name,size,format`; check `ia list --help` for the options in your installed version).

---

## 4. Metadata

View and modify item metadata.

### Read metadata

```bash
# Full metadata as JSON
ia metadata <identifier>

# Pretty-print with jq
ia metadata <identifier> | jq .

# Get specific fields
ia metadata <identifier> | jq '.metadata.title'
ia metadata <identifier> | jq '.metadata.subject'
ia metadata <identifier> | jq '.metadata.collection'

# List available formats for an item
ia metadata <identifier> --formats
```

### Modify metadata (requires authentication)

```bash
# Set a field
ia metadata <identifier> --modify="description:Updated description"

# Remove a field
ia metadata <identifier> --modify="subject:REMOVE_TAG"

# Append to existing value
ia metadata <identifier> --append="subject:new-tag"

# Add to array field
ia metadata <identifier> --append-list="collection:another-collection"

# Bulk modify from CSV (must have 'identifier' column)
ia metadata --spreadsheet=metadata.csv
```

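For bulk edits, the CSV can be generated from an itemlist rather than typed by hand. This is a minimal sketch, assuming a hypothetical helper named `make_metadata_csv` and assuming that each extra column header is read as a metadata field name. It applies one value to every identifier and prints the CSV to stdout so it can be reviewed before use.

```bash
#!/usr/bin/env bash
# make_metadata_csv is a hypothetical helper, not part of the ia CLI.
# It prints a CSV with an 'identifier' column plus one metadata column.
# Usage: make_metadata_csv <itemlist-file> <field> <value>
make_metadata_csv() {
  local file=$1 field=$2 value=$3 id
  local q='"'
  # CSV quoting: wrap the value in double quotes, doubling embedded quotes.
  local quoted="$q${value//$q/$q$q}$q"
  printf 'identifier,%s\n' "$field"
  while IFS= read -r id || [ -n "$id" ]; do
    [ -n "$id" ] || continue          # skip blank lines
    printf '%s,%s\n' "$id" "$quoted"
  done < "$file"
}

# Example with a throwaway itemlist. Redirect the output to metadata.csv
# and inspect it before running: ia metadata --spreadsheet=metadata.csv
tmp=$(mktemp)
printf '%s\n' 'item-one' 'item-two' > "$tmp"
make_metadata_csv "$tmp" subject 'ham radio'
rm -f "$tmp"
```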

---

## 5. Upload (requires authentication)

```bash
# Upload files to a new or existing item
ia upload <identifier> file1.pdf file2.pdf \
  --metadata="mediatype:texts" \
  --metadata="title:My Upload" \
  --metadata="subject:test"

# Upload from stdin
curl -sL https://example.com/file.pdf | \
  ia upload <identifier> - --remote-name=file.pdf

# Retry on failure
ia upload <identifier> largefile.zip --retries 10

# Bulk upload from CSV (requires 'identifier' and 'file' columns)
ia upload --spreadsheet=uploads.csv
```

**Important:** `mediatype` cannot be changed after initial upload.

---

## 6. Delete (requires authentication)

```bash
# Delete a specific file
ia delete <identifier> filename.pdf

# Delete file and all its derivatives
ia delete <identifier> filename.pdf --cascade

# Delete all files in an item
ia delete <identifier> --all
```

Deleted files are backed up to `history/files/` automatically.

---

## 7. Copy / Move

```bash
# Copy a file between items
ia copy source-item/file.pdf dest-item/file.pdf

# Copy with metadata for new items
ia copy source/file.pdf new-item/file.pdf --metadata="title:Copied Item"

# Move (copy + delete source)
ia move source-item/file.pdf dest-item/file.pdf
```

---

## 8. Tasks

View catalog processing tasks (derive jobs, uploads in progress, etc.).

```bash
# Tasks for a specific item
ia tasks <identifier>

# All your queued/running tasks
ia tasks
```

---

## Command Quick Reference

| Command | Alias | Purpose |
|---------|-------|---------|
| `ia configure` | `ia co` | Set up credentials |
| `ia search` | `ia se` | Search catalog |
| `ia download` | `ia do` | Download files |
| `ia list` | `ia ls` | List item files |
| `ia metadata` | `ia md` | View/edit metadata |
| `ia upload` | `ia up` | Upload files |
| `ia delete` | `ia rm` | Delete files |
| `ia copy` | `ia cp` | Copy between items |
| `ia move` | `ia mv` | Move between items |
| `ia tasks` | `ia ta` | View task queue |

---

## Global Flags

| Flag | Short | Purpose |
|------|-------|---------|
| `--help` | `-h` | Show help |
| `--version` | `-v` | Show version |
| `--config-file FILE` | `-c` | Use alternate config |
| `--log` | `-l` | Enable logging |
| `--debug` | `-d` | Verbose debug output |
| `--insecure` | `-i` | Use HTTP instead of HTTPS |

---

## Troubleshooting

### "You need to be logged in"

Run `ia configure` and enter your archive.org credentials. Then confirm the config file was written (note that this prints the stored keys to the terminal):

```bash
cat ~/.config/internetarchive/ia.ini
```

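To check the config without echoing secrets, something like the following sketch may help. It assumes a hypothetical helper named `check_ia_config`, and it assumes the file is INI-formatted with an `[s3]` section containing `access` and `secret` entries; adjust the patterns if your file is laid out differently.

```bash
#!/usr/bin/env bash
# check_ia_config is a hypothetical helper, not part of the ia CLI.
# It reports whether the config file exists and appears to hold keys,
# without printing the key values themselves.
# Assumption: an [s3] section with non-empty 'access' and 'secret' entries.
check_ia_config() {
  local cfg=${1:-"$HOME/.config/internetarchive/ia.ini"}
  if [ ! -r "$cfg" ]; then
    echo "missing or unreadable: $cfg"
    return 1
  fi
  if grep -q '^\[s3\]' "$cfg" &&
     grep -Eq '^[[:space:]]*access[[:space:]]*=[[:space:]]*[^[:space:]]' "$cfg" &&
     grep -Eq '^[[:space:]]*secret[[:space:]]*=[[:space:]]*[^[:space:]]' "$cfg"; then
    echo "ok: $cfg has an [s3] section with access and secret set"
  else
    echo "incomplete: $cfg has no usable [s3] keys; re-run ia configure"
    return 1
  fi
}

# Check the default location; "|| true" keeps a failed check from
# aborting a script that runs with "set -e".
check_ia_config || true
```

This only confirms that keys are present in the file. It does not prove that archive.org accepts them.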
### Search returns no results

- Check query syntax: field names are case-sensitive
- Use quotes around multi-word values: `subject:"ham radio"` not `subject:ham radio`
- Verify the collection/identifier exists: `ia metadata <identifier>`

### Slow searches

Large collections can take minutes to enumerate. Use `--parameters='rows=100'` to limit during testing, or `--num-found` to just get the count first.

### Rate limiting

Archive.org may throttle aggressive requests. Space out bulk operations and use `--retries` on downloads.

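One way to space out retries from a script is a small wrapper with exponential backoff. This is a sketch under stated assumptions: `retry_backoff` is a hypothetical helper name, and the attempt count and delays shown are illustrative rather than values recommended by archive.org.

```bash
#!/usr/bin/env bash
# retry_backoff is a hypothetical helper, not part of the ia CLI.
# It re-runs a command until it succeeds or the attempt limit is reached,
# doubling the wait between attempts.
# Usage: retry_backoff <max-attempts> <initial-delay-seconds> <command> [args...]
retry_backoff() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example (commented out so the sketch is safe to paste):
# up to 5 attempts, waiting 30s, 60s, 120s, 240s between them.
# retry_backoff 5 30 ia download <identifier>
```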

---

*Last updated: 2026-02-14*