Internet Archive CLI Reference
Quick reference for the ia command-line tool on pi-nas.
Location & Setup
| Detail | Value |
|---|---|
| Host | pi-nas (192.168.1.245 / 100.64.0.21) |
| Binary | ia (v5.7.2, pip-installed) |
| Config | ~/.config/internetarchive/ia.ini |
1. Configure / Authenticate
Required for uploads, metadata edits, and accessing restricted items. Not required for public downloads or searches.
ia configure
# Prompts for archive.org email + password
# Stores credentials in ~/.config/internetarchive/ia.ini
Verify:
ia configure --help # Confirms the ia binary runs; it does not validate the stored credentials
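Because the help check above does not look at the stored credentials, a small local check can be useful. The following is a minimal sketch, not part of the ia tool: the function name ia_config_has_keys is hypothetical, and it assumes the config file contains an [s3] section with access and secret entries. It reports whether both entries are present without printing them.

```shell
# Hypothetical helper: report whether an ia.ini file holds both keys,
# without echoing the secrets to the terminal.
# Assumption: the file has "access = ..." and "secret = ..." entries.
ia_config_has_keys() {
    cfg="${1:-$HOME/.config/internetarchive/ia.ini}"
    if [ ! -r "$cfg" ]; then
        echo "config not readable: $cfg" >&2
        return 1
    fi
    if grep -Eq '^[[:space:]]*access[[:space:]]*=[[:space:]]*[^[:space:]]' "$cfg" &&
       grep -Eq '^[[:space:]]*secret[[:space:]]*=[[:space:]]*[^[:space:]]' "$cfg"; then
        echo "credentials present"
    else
        echo "credentials not found in $cfg" >&2
        return 1
    fi
}
```

Calling ia_config_has_keys with no argument checks the default config path listed under Location & Setup. It only confirms the entries exist, not that archive.org accepts them.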
2. Search
Search the archive.org catalog. Returns JSON by default.
Basic syntax
ia search '<query>'
Query syntax
Queries use Lucene syntax. Combine fields with AND/OR, quote phrases.
| Field | Example | Notes |
|---|---|---|
| collection | collection:prelinger | Items in a specific collection |
| subject | subject:"ham radio" | Subject/tag match |
| mediatype | mediatype:texts | texts, movies, audio, software, image, data, web, collection |
| creator | creator:"ARRL" | Author/creator |
| title | title:"emergency" | Item title |
| date | date:[2020-01-01 TO 2024-12-31] | Date range (YYYY-MM-DD) |
| year | year:2023 | Shorthand for year |
| language | language:eng | ISO language code |
| licenseurl | licenseurl:*creativecommons* | License filter |
Combined queries
# Texts about ham radio dated 2020-01-01 or later
ia search 'subject:"ham radio" mediatype:texts date:[2020-01-01 TO 2099-12-31]'
# All items in a specific collection
ia search 'collection:prelinger'
# Creator + mediatype
ia search 'creator:"ARRL" AND mediatype:texts'
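Quoting multi-word values by hand is easy to get wrong inside single-quoted shell strings. As a sketch only, the hypothetical helper below (ia_query is not an ia subcommand) joins field/value pairs with AND, quotes values that contain spaces, and leaves bracketed range expressions untouched.

```shell
# Hypothetical helper: join field/value pairs into one Lucene query string.
# Usage: ia_query FIELD VALUE [FIELD VALUE ...]
ia_query() {
    q=""
    while [ "$#" -ge 2 ]; do
        field=$1
        value=$2
        shift 2
        case $value in
            \[*) ;;                          # range such as [a TO b]: leave as-is
            *" "*) value="\"$value\"" ;;     # quote multi-word values
        esac
        q="${q:+$q AND }$field:$value"       # prepend " AND " after the first pair
    done
    printf '%s\n' "$q"
}
```

For example, ia search "$(ia_query subject 'ham radio' mediatype texts)" --itemlist passes subject:"ham radio" AND mediatype:texts to the search command.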
Output options
# Default: JSON objects, one per line
ia search 'collection:prelinger'
# Itemlist mode: prints only identifiers, one per line
# Save the output to a file to use with ia download --itemlist
ia search 'collection:prelinger' --itemlist
# Save itemlist to file
ia search 'collection:prelinger' --itemlist > prelinger-items.txt
# Limit results with parameters
ia search 'subject:radio' --parameters='rows=50'
# Count results without downloading them all
ia search 'collection:prelinger' --num-found
Practical examples
# Find all items in a collection and count them
ia search 'collection:arrl_qst' --num-found
# Get identifiers for bulk download
ia search 'collection:arrl_qst' --itemlist > arrl-items.txt
# Search within a collection for specific subjects
ia search 'collection:prelinger subject:"san francisco"' --itemlist
# Find audio recordings by a specific creator
ia search 'creator:"Grateful Dead" mediatype:audio' --itemlist
# Search for items with specific file formats available
ia search 'collection:librivoxaudio format:"64Kbps MP3"' --itemlist
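The count-first pattern from the examples above can be wrapped so that a large collection is never enumerated by accident. This is a rough sketch under two assumptions: that ia search --num-found prints a bare integer, and that a default limit of 1000 is sensible for this host. The function name ia_itemlist_if_small is hypothetical.

```shell
# Hypothetical wrapper: fetch identifiers only when the result count is small.
# Assumption: `ia search --num-found` prints a bare integer.
# Usage: ia_itemlist_if_small QUERY [LIMIT]
ia_itemlist_if_small() {
    query=$1
    limit=${2:-1000}
    count=$(ia search "$query" --num-found) || return 1
    case $count in
        ''|*[!0-9]*)
            echo "unexpected count: $count" >&2
            return 1 ;;
    esac
    if [ "$count" -gt "$limit" ]; then
        echo "refusing: $count results exceeds limit of $limit" >&2
        return 2
    fi
    ia search "$query" --itemlist
}
```

A call such as ia_itemlist_if_small 'collection:arrl_qst' 500 > arrl-items.txt writes the identifiers only if the collection has at most 500 items. Otherwise it exits with status 2 and writes nothing to the file.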
3. List Item Contents
View files within an item without downloading.
# List all files in an item
ia list <identifier>
# Example
ia list prelinger_films
By default the output shows filenames only. Add --columns name,size,format to include sizes and formats.
4. Metadata
View and modify item metadata.
Read metadata
# Full metadata as JSON
ia metadata <identifier>
# Pretty-print with jq
ia metadata <identifier> | jq .
# Get specific fields
ia metadata <identifier> | jq '.metadata.title'
ia metadata <identifier> | jq '.metadata.subject'
ia metadata <identifier> | jq '.metadata.collection'
# List available formats for an item
ia metadata <identifier> --formats
Modify metadata (requires authentication)
# Set a field
ia metadata <identifier> --modify="description:Updated description"
# Remove a field entirely (REMOVE_TAG is a literal keyword, not a placeholder)
ia metadata <identifier> --modify="subject:REMOVE_TAG"
# Append to existing value
ia metadata <identifier> --append="subject:new-tag"
# Add to array field
ia metadata <identifier> --append-list="collection:another-collection"
# Bulk modify from CSV (must have 'identifier' column)
ia metadata --spreadsheet=metadata.csv
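A spreadsheet for bulk edits can be generated from an itemlist rather than typed by hand. The sketch below assumes that every column other than identifier is treated as a metadata field name, and that the same value should be applied to every item. The helper name itemlist_to_csv is hypothetical.

```shell
# Hypothetical helper: turn an itemlist (one identifier per line on stdin)
# into a CSV that sets one metadata field to the same value on every item.
# Usage: itemlist_to_csv FIELD VALUE < items.txt > metadata.csv
itemlist_to_csv() {
    field=$1
    # Double any embedded quotes so the value stays inside one CSV cell
    value=$(printf '%s' "$2" | sed 's/"/""/g')
    printf 'identifier,%s\n' "$field"
    while IFS= read -r id; do
        [ -n "$id" ] || continue      # skip blank lines
        printf '%s,"%s"\n' "$id" "$value"
    done
}
```

Inspect the generated file before passing it to ia metadata --spreadsheet, since the edits apply to every listed item.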
5. Upload (requires authentication)
# Upload files to a new or existing item
ia upload <identifier> file1.pdf file2.pdf \
    --metadata="mediatype:texts" \
    --metadata="title:My Upload" \
    --metadata="subject:test"
# Upload from stdin
curl -sL https://example.com/file.pdf | \
    ia upload <identifier> - --remote-name=file.pdf
# Retry on failure
ia upload <identifier> largefile.zip --retries 10
# Bulk upload from CSV (requires 'identifier' and 'file' columns)
ia upload --spreadsheet=uploads.csv
Important: mediatype cannot be changed after initial upload.
6. Delete (requires authentication)
# Delete a specific file
ia delete <identifier> filename.pdf
# Delete file and all its derivatives
ia delete <identifier> filename.pdf --cascade
# Delete all files in an item
ia delete <identifier> --all
Deleted files are backed up to history/files/ automatically.
7. Copy / Move
# Copy a file between items
ia copy source-item/file.pdf dest-item/file.pdf
# Copy with metadata for new items
ia copy source/file.pdf new-item/file.pdf --metadata="title:Copied Item"
# Move (copy + delete source)
ia move source-item/file.pdf dest-item/file.pdf
8. Tasks
View catalog processing tasks (derive jobs, uploads in progress, etc.).
# Tasks for a specific item
ia tasks <identifier>
# All your queued/running tasks
ia tasks
Command Quick Reference
| Command | Alias | Purpose |
|---|---|---|
| ia configure | ia co | Set up credentials |
| ia search | ia se | Search catalog |
| ia download | ia do | Download files |
| ia list | ia ls | List item files |
| ia metadata | ia md | View/edit metadata |
| ia upload | ia up | Upload files |
| ia delete | ia rm | Delete files |
| ia copy | ia cp | Copy between items |
| ia move | ia mv | Move between items |
| ia tasks | ia ta | View task queue |
Global Flags
| Flag | Short | Purpose |
|---|---|---|
| --help | -h | Show help |
| --version | -v | Show version |
| --config-file FILE | -c | Use alternate config |
| --log | -l | Enable logging |
| --debug | -d | Verbose debug output |
| --insecure | -i | Use HTTP instead of HTTPS |
Troubleshooting
"You need to be logged in"
Run ia configure and enter your archive.org credentials. Verify with:
cat ~/.config/internetarchive/ia.ini
Search returns no results
- Check query syntax: field names are case-sensitive
- Use quotes around multi-word values: subject:"ham radio", not subject:ham radio
- Verify the collection/identifier exists: ia metadata <identifier>
Slow searches
Large collections can take minutes to enumerate. Use --parameters='rows=100' to limit during testing, or --num-found to just get the count first.
Rate limiting
Archive.org may throttle aggressive requests. Space out bulk operations and use --retries on downloads.
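One way to space out bulk operations is a loop that pauses between items and doubles its wait after each failure. The sketch below is illustrative only: the function name for_each_item is hypothetical, the pause and retry counts are assumptions to tune, and the pause must be a whole number of seconds.

```shell
# Hypothetical helper: run a command once per identifier, pausing between
# items and backing off after failures.
# Usage: for_each_item ITEMLIST PAUSE_SECONDS MAX_TRIES command [args...]
for_each_item() {
    list=$1
    pause=$2
    tries=$3
    shift 3
    failed=0
    # Read the list on fd 3 so the command cannot consume it via stdin
    while IFS= read -r id <&3; do
        [ -n "$id" ] || continue
        attempt=1
        delay=$pause
        until "$@" "$id"; do
            if [ "$attempt" -ge "$tries" ]; then
                echo "giving up on $id" >&2
                failed=$((failed + 1))
                break
            fi
            attempt=$((attempt + 1))
            sleep "$delay"
            delay=$((delay * 2))      # double the wait after each failure
        done
        sleep "$pause"
    done 3<"$list"
    [ "$failed" -eq 0 ]               # non-zero status if any item failed
}
```

For example, for_each_item arrl-items.txt 5 3 ia download runs ia download once per identifier, waits 5 seconds between items, and tries each item up to 3 times.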
Last updated: 2026-02-14