# CIPHER-OSINT

Open Source Intelligence Plugin for CIPHER.

Collects and analyzes news and RSS feeds, using an LLM for content analysis, categorization, and priority assessment.
## Features

- **RSS/Atom Feed Collection** - Fetches from configurable news sources
- **Web Search Collection** - Active intelligence gathering using Gemini with Google Search grounding
- **LLM Analysis** - Uses Gemini/OpenAI/Anthropic for content analysis
- **Automatic Categorization** - Assigns global/cyber/regional/local categories
- **Priority Assessment** - Determines critical/high/normal/low priority
- **Entity Extraction** - Identifies people, organizations, places
- **Tag Generation** - Auto-generates relevant tags
- **Location Extraction** - Extracts geographic coordinates for map visualization
## Quick Start

### 1. Configure Environment

```bash
cp .env.example .env
# Edit .env with your settings (especially GEMINI_API_KEY)
```

### 2. Start with Docker

```bash
# Ensure cipher-net network exists
docker network create cipher-net 2>/dev/null || true

# Start the plugin
docker compose up -d
```

### 3. Verify

```bash
# Check health
curl http://localhost:5000/health

# Check status
curl http://localhost:5000/status

# Trigger manual collection
curl -X POST http://localhost:5000/collect
```
## Configuration

### Environment Variables

| Variable | Default | Description |
|---|---|---|
| `CIPHER_CORE_URL` | `http://cipher-core:5000` | Core-CIPHER API URL |
| `LLM_PROVIDER` | `gemini` | LLM provider (`gemini`/`openai`/`anthropic`/`ollama`) |
| `LLM_MODEL` | `gemini-2.0-flash-exp` | Model name |
| `GEMINI_API_KEY` | - | Google Gemini API key |
| `LOG_LEVEL` | `INFO` | Logging level |
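A loader for these variables might look like the sketch below. This is illustrative only; the names and defaults mirror the table above, but the actual `app/config.py` may be structured differently.

```python
import os
from dataclasses import dataclass, field
from typing import Optional

def _env(name: str, default: Optional[str] = None) -> Optional[str]:
    """Read an environment variable, falling back to a default."""
    return os.getenv(name, default)

@dataclass
class Settings:
    """Plugin settings; names and defaults mirror the table above."""
    core_url: str = field(default_factory=lambda: _env("CIPHER_CORE_URL", "http://cipher-core:5000"))
    llm_provider: str = field(default_factory=lambda: _env("LLM_PROVIDER", "gemini"))
    llm_model: str = field(default_factory=lambda: _env("LLM_MODEL", "gemini-2.0-flash-exp"))
    gemini_api_key: Optional[str] = field(default_factory=lambda: _env("GEMINI_API_KEY"))
    log_level: str = field(default_factory=lambda: _env("LOG_LEVEL", "INFO"))
```

Defaults are resolved at construction time, so a variable exported before the plugin starts takes precedence over the table defaults.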
### RSS Feeds

Edit `config/plugin.yaml` to customize RSS feeds:

```yaml
sources:
  rss:
    enabled: true
    feeds:
      - name: My Custom Feed
        url: https://example.com/feed.xml
        category: cyber  # global, cyber, regional, local
```
### Collection Settings

```yaml
collection:
  interval: 300        # Seconds between collection cycles
  batch_size: 50       # Max items per cycle
  lookback_hours: 24   # How far back to look
```
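The `batch_size` and `lookback_hours` settings could be applied roughly as follows. This is a hedged sketch, not the plugin's actual code; the function name and item shape are illustrative.

```python
from datetime import datetime, timedelta, timezone

def select_batch(items, batch_size=50, lookback_hours=24, now=None):
    """Keep items published within the lookback window, newest first,
    capped at batch_size. Each item is a dict with a 'published' datetime."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=lookback_hours)
    recent = [i for i in items if i["published"] >= cutoff]
    recent.sort(key=lambda i: i["published"], reverse=True)
    return recent[:batch_size]
```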
## Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                       CIPHER-OSINT Plugin                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐   │
│  │     RSS      │───▶│   Analyzer   │───▶│   Core Client    │   │
│  │  Collector   │    │    (LLM)     │    │   (Submitter)    │   │
│  └──────────────┘    └──────────────┘    └──────────────────┘   │
│         │                   │                     │             │
│    feedparser          Gemini API             HTTP POST         │
│                                                   │             │
└───────────────────────────────────────────────────┼─────────────┘
                                                    │
                                                    ▼
                                            ┌──────────────┐
                                            │ Core-CIPHER  │
                                            │    /api/     │
                                            └──────────────┘
```
## Plugin Lifecycle

1. **Startup**
   - Load configuration
   - Connect to Core-CIPHER
   - Register plugin
   - Start heartbeat thread
   - Start collection loop

2. **Collection Cycle** (every 5 minutes)
   - Fetch RSS feeds
   - Filter new items (dedup by source_id)
   - Analyze with LLM (summarize, categorize, prioritize)
   - Submit to Core-CIPHER

3. **Shutdown**
   - Send offline heartbeat
   - Stop threads gracefully
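The dedup step in the collection cycle could be sketched as follows. The ID scheme shown here is an assumption (the `rss-` prefix matches the example output below, but the real plugin may derive `source_id` differently and persist seen IDs rather than keeping them in memory).

```python
import hashlib

def make_source_id(feed_url: str, entry_guid: str) -> str:
    """Derive a stable source_id from feed URL and entry GUID (illustrative scheme)."""
    digest = hashlib.sha256(f"{feed_url}|{entry_guid}".encode()).hexdigest()[:12]
    return f"rss-{digest}"

def filter_new(entries, seen_ids):
    """Drop entries whose source_id has already been submitted; record the rest."""
    fresh = []
    for entry in entries:
        sid = make_source_id(entry["feed_url"], entry["guid"])
        if sid not in seen_ids:
            seen_ids.add(sid)
            fresh.append({**entry, "source_id": sid})
    return fresh
```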
## Development

### Local Testing

```bash
# Install dependencies
pip install -r requirements.txt

# Set environment
export CIPHER_CORE_URL=http://localhost:5055
export GEMINI_API_KEY=your-key

# Run single collection
python -c "from app.plugin import OSINTPlugin; p = OSINTPlugin(); p.run_once()"

# Run continuously
python -m app.plugin
```
## Project Structure

```
cipher-osint/
├── app/
│   ├── __init__.py
│   ├── config.py          # Configuration loader
│   ├── core_client.py     # Core-CIPHER API client
│   ├── llm_client.py      # Multi-provider LLM client
│   ├── analyzer.py        # Content analysis
│   ├── plugin.py          # Main orchestrator
│   ├── api.py             # Flask health/status API
│   └── collectors/
│       ├── __init__.py
│       └── rss.py         # RSS feed collector
├── config/
│   └── plugin.yaml        # Plugin configuration
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md
```
## Intel Output

Items submitted to Core follow the standard schema:

```json
{
  "source_id": "rss-abc123",
  "source_plugin": "osint",
  "source_url": "https://example.com/article",
  "intel_type": "news_article",
  "category": "cyber",
  "title": "Major Vulnerability Discovered",
  "content": "Full article text...",
  "summary": "LLM-generated summary",
  "timestamp": "2024-01-15T10:30:00Z",
  "priority": "high",
  "tags": ["vulnerability", "security", "critical"],
  "latitude": 37.7749,
  "longitude": -122.4194,
  "location_name": "San Francisco",
  "metadata": {
    "source_name": "Security News",
    "extracted_entities": ["CVE-2024-1234", "Microsoft"],
    "address": "San Francisco, California, USA"
  }
}
```
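Before submission, an item of this shape could be sanity-checked against the schema. The sketch below is illustrative; the required-field and allowed-value sets are inferred from the example above and this README's category/priority lists, not taken from the plugin code.

```python
REQUIRED_FIELDS = {"source_id", "source_plugin", "intel_type", "category",
                   "title", "content", "timestamp", "priority"}
VALID_PRIORITIES = {"critical", "high", "normal", "low"}
VALID_CATEGORIES = {"global", "cyber", "regional", "local"}

def validate_intel(item):
    """Return a list of problems; an empty list means the item looks submittable."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - item.keys())]
    if item.get("priority") not in VALID_PRIORITIES:
        problems.append(f"bad priority: {item.get('priority')!r}")
    if item.get("category") not in VALID_CATEGORIES:
        problems.append(f"bad category: {item.get('category')!r}")
    return problems
```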
## Location Extraction

The analyzer extracts geographic location data from articles using LLM analysis:

- `location_name`: Most specific place name (city, region, or country)
- `latitude`/`longitude`: Approximate decimal coordinates
- `address`: Full location string stored in metadata

Location data enables map visualization on the Core-CIPHER public dashboard. Markers are colored by priority and clustered when multiple items share the same location.
Configuration in `config/plugin.yaml`:

```yaml
prompts:
  summarize: |
    Analyze this news article and provide:
    ...
    6. Geographic location - the PRIMARY physical location this article is about:
       - location_name: Most specific place name (city, region, or country)
       - latitude: Approximate latitude coordinate (decimal degrees)
       - longitude: Approximate longitude coordinate (decimal degrees)
       - address: Full location string if available
       Use null for all location fields if the article is not about a specific physical location.
```
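Since the prompt allows null location fields, the analyzer needs to parse the reply defensively. A minimal sketch, assuming the LLM returns JSON with the field names from the prompt above (the actual parsing in `app/analyzer.py` may differ):

```python
import json

def parse_location(raw: str) -> dict:
    """Extract location fields from an LLM JSON reply, tolerating nulls,
    malformed JSON, and out-of-range coordinates (illustrative handling)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    lat, lon = data.get("latitude"), data.get("longitude")
    # Treat missing or invalid coordinates as "no location".
    if not (isinstance(lat, (int, float)) and isinstance(lon, (int, float))
            and -90 <= lat <= 90 and -180 <= lon <= 180):
        lat = lon = None
    return {"location_name": data.get("location_name"),
            "latitude": lat, "longitude": lon,
            "address": data.get("address")}
```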
## Web Search Collection

Active intelligence gathering using Gemini with Google Search grounding:

```yaml
sources:
  web_search:
    enabled: true
    categories:
      - global    # Military conflicts, geopolitics, international security
      - cyber     # Vulnerabilities, breaches, APT activity
      - regional  # US domestic security, critical infrastructure
```

The web search collector uses Gemini 2.0 Flash with built-in Google Search to find breaking news and recent developments that haven't yet appeared in RSS feeds.
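Mapping the enabled categories to grounded search queries might look like this. The query templates here are hypothetical (based on the category comments above); the real templates live in the plugin configuration or code.

```python
# Hypothetical query templates per category; the real plugin's wording may differ.
CATEGORY_QUERIES = {
    "global": "breaking news: military conflicts, geopolitics, international security",
    "cyber": "breaking news: vulnerabilities, breaches, APT activity",
    "regional": "breaking news: US domestic security, critical infrastructure",
}

def build_queries(enabled_categories):
    """Return (category, query) pairs for enabled categories, skipping unknowns."""
    return [(c, CATEGORY_QUERIES[c]) for c in enabled_categories if c in CATEGORY_QUERIES]
```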
## License

MIT License