CIPHER-OSINT

Open Source Intelligence Plugin for CIPHER

Collects and analyzes news and RSS feeds, using an LLM for content analysis, categorization, and priority assessment.

Features

  • RSS/Atom Feed Collection - Fetches from configurable news sources
  • Web Search Collection - Active intelligence gathering using Gemini with Google Search grounding
  • LLM Analysis - Uses Gemini/OpenAI/Anthropic for content analysis
  • Automatic Categorization - Assigns global/cyber/regional/local categories
  • Priority Assessment - Determines critical/high/normal/low priority
  • Entity Extraction - Identifies people, organizations, places
  • Tag Generation - Auto-generates relevant tags
  • Location Extraction - Extracts geographic coordinates for map visualization

Quick Start

1. Configure Environment

cp .env.example .env
# Edit .env with your settings (especially GEMINI_API_KEY)

2. Start with Docker

# Ensure cipher-net network exists
docker network create cipher-net 2>/dev/null || true

# Start the plugin
docker compose up -d

3. Verify

# Check health
curl http://localhost:5000/health

# Check status
curl http://localhost:5000/status

# Trigger manual collection
curl -X POST http://localhost:5000/collect

Configuration

Environment Variables

Variable          Default                  Description
CIPHER_CORE_URL   http://cipher-core:5000  Core-CIPHER API URL
LLM_PROVIDER      gemini                   LLM provider (gemini/openai/anthropic/ollama)
LLM_MODEL         gemini-2.0-flash-exp     Model name
GEMINI_API_KEY    -                        Google Gemini API key
LOG_LEVEL         INFO                     Logging level

RSS Feeds

Edit config/plugin.yaml to customize RSS feeds:

sources:
  rss:
    enabled: true
    feeds:
      - name: My Custom Feed
        url: https://example.com/feed.xml
        category: cyber  # global, cyber, regional, local

Collection Settings

collection:
  interval: 300       # Seconds between collection cycles
  batch_size: 50      # Max items per cycle
  lookback_hours: 24  # How far back to look
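How these settings could interact with deduplication is sketched below in Python; the helper names (`make_source_id`, `filter_new`) are illustrative, not the plugin's actual API. The sketch keeps only entries inside the lookback window, dedupes on a stable source_id, and stops at the batch size:

```python
import hashlib
from datetime import datetime, timedelta, timezone

def make_source_id(feed_name: str, entry_link: str) -> str:
    """Derive a stable dedup key from the feed name and entry URL."""
    digest = hashlib.sha256(f"{feed_name}:{entry_link}".encode()).hexdigest()[:12]
    return f"rss-{digest}"

def filter_new(entries, seen_ids, lookback_hours=24, batch_size=50):
    """Keep entries inside the lookback window whose source_id is unseen."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=lookback_hours)
    fresh = []
    for entry in entries:
        if entry["published"] < cutoff:
            continue  # older than the lookback window
        sid = make_source_id(entry["feed"], entry["link"])
        if sid in seen_ids:
            continue  # already collected in a previous cycle
        seen_ids.add(sid)
        fresh.append({**entry, "source_id": sid})
        if len(fresh) >= batch_size:
            break  # respect the per-cycle cap
    return fresh
```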

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                      CIPHER-OSINT Plugin                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐  │
│  │     RSS      │───▶│   Analyzer   │───▶│   Core Client    │  │
│  │  Collector   │    │    (LLM)     │    │   (Submitter)    │  │
│  └──────────────┘    └──────────────┘    └──────────────────┘  │
│         │                   │                     │             │
│         │                   │                     │             │
│    feedparser          Gemini API           HTTP POST          │
│                                                  │              │
└─────────────────────────────────────────────────┼──────────────┘
                                                   │
                                                   ▼
                                          ┌──────────────┐
                                          │ Core-CIPHER  │
                                          │    /api/     │
                                          └──────────────┘

Plugin Lifecycle

  1. Startup

    • Load configuration
    • Connect to Core-CIPHER
    • Register plugin
    • Start heartbeat thread
    • Start collection loop
  2. Collection Cycle (every 5 minutes)

    • Fetch RSS feeds
    • Filter new items (dedup by source_id)
    • Analyze with LLM (summarize, categorize, prioritize)
    • Submit to Core-CIPHER
  3. Shutdown

    • Send offline heartbeat
    • Stop threads gracefully
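The lifecycle above can be sketched as a minimal run loop. This is a hedged illustration, not the plugin's real implementation: `send_heartbeat` and `collect_once` are placeholder stubs standing in for the Core-CIPHER client and the collection cycle.

```python
import threading

class PluginLoop:
    """Minimal sketch of the lifecycle: heartbeat thread, collect loop, shutdown."""

    def __init__(self, interval=300, heartbeat_interval=30):
        self.interval = interval
        self.heartbeat_interval = heartbeat_interval
        self._stop = threading.Event()

    def send_heartbeat(self, status="online"):
        # Placeholder: would POST status to Core-CIPHER.
        return {"status": status}

    def collect_once(self):
        # Placeholder: fetch feeds -> dedup -> analyze -> submit.
        return 0

    def _heartbeat_loop(self):
        while not self._stop.is_set():
            self.send_heartbeat()
            self._stop.wait(self.heartbeat_interval)

    def run(self):
        hb = threading.Thread(target=self._heartbeat_loop, daemon=True)
        hb.start()
        try:
            while not self._stop.is_set():
                self.collect_once()
                self._stop.wait(self.interval)  # interruptible sleep
        finally:
            self._stop.set()
            self.send_heartbeat(status="offline")  # graceful shutdown
            hb.join(timeout=5)
```

Using `Event.wait` instead of `time.sleep` lets shutdown interrupt both loops immediately rather than waiting out a full interval.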

Development

Local Testing

# Install dependencies
pip install -r requirements.txt

# Set environment
export CIPHER_CORE_URL=http://localhost:5055
export GEMINI_API_KEY=your-key

# Run single collection
python -c "from app.plugin import OSINTPlugin; p = OSINTPlugin(); p.run_once()"

# Run continuously
python -m app.plugin

Project Structure

cipher-osint/
├── app/
│   ├── __init__.py
│   ├── config.py         # Configuration loader
│   ├── core_client.py    # Core-CIPHER API client
│   ├── llm_client.py     # Multi-provider LLM client
│   ├── analyzer.py       # Content analysis
│   ├── plugin.py         # Main orchestrator
│   ├── api.py            # Flask health/status API
│   └── collectors/
│       ├── __init__.py
│       └── rss.py        # RSS feed collector
├── config/
│   └── plugin.yaml       # Plugin configuration
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
└── README.md

Intel Output

Items submitted to Core-CIPHER follow the standard intel schema:

{
  "source_id": "rss-abc123",
  "source_plugin": "osint",
  "source_url": "https://example.com/article",
  "intel_type": "news_article",
  "category": "cyber",
  "title": "Major Vulnerability Discovered",
  "content": "Full article text...",
  "summary": "LLM-generated summary",
  "timestamp": "2024-01-15T10:30:00Z",
  "priority": "high",
  "tags": ["vulnerability", "security", "critical"],
  "latitude": 37.7749,
  "longitude": -122.4194,
  "location_name": "San Francisco",
  "metadata": {
    "source_name": "Security News",
    "extracted_entities": ["CVE-2024-1234", "Microsoft"],
    "address": "San Francisco, California, USA"
  }
}
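A quick sanity check against this schema could look like the following sketch. The field and enum lists are taken from the example above and the Features section; treat them as illustrative rather than a formal spec:

```python
REQUIRED_FIELDS = {"source_id", "source_plugin", "intel_type", "category",
                   "title", "timestamp", "priority"}
CATEGORIES = {"global", "cyber", "regional", "local"}
PRIORITIES = {"critical", "high", "normal", "low"}

def validate_intel(item: dict) -> list:
    """Return a list of problems; an empty list means the item looks submittable."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - item.keys())]
    if item.get("category") not in CATEGORIES:
        problems.append(f"unknown category: {item.get('category')!r}")
    if item.get("priority") not in PRIORITIES:
        problems.append(f"unknown priority: {item.get('priority')!r}")
    # Location fields are optional, but coordinates only make sense as a pair.
    if ("latitude" in item) != ("longitude" in item):
        problems.append("latitude and longitude must be provided together")
    return problems
```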

Location Extraction

The analyzer extracts geographic location data from articles using LLM analysis:

  • location_name: Most specific place name (city, region, or country)
  • latitude/longitude: Approximate decimal coordinates
  • address: Full location string stored in metadata

Location data enables map visualization on the Core-CIPHER public dashboard. Markers are colored by priority and clustered when multiple items share the same location.

Configuration in config/plugin.yaml:

prompts:
  summarize: |
    Analyze this news article and provide:
    ...
    6. Geographic location - the PRIMARY physical location this article is about:
       - location_name: Most specific place name (city, region, or country)
       - latitude: Approximate latitude coordinate (decimal degrees)
       - longitude: Approximate longitude coordinate (decimal degrees)
       - address: Full location string if available
       Use null for all location fields if the article is not about a specific physical location.
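Because the prompt allows null location fields, response handling has to tolerate missing or partial data. A defensive parse might look like this sketch (`extract_location` is a hypothetical helper, not the analyzer's actual code):

```python
def extract_location(llm_fields: dict) -> dict:
    """Pull location data out of an LLM response, dropping incomplete values."""
    out = {}
    name = llm_fields.get("location_name")
    lat = llm_fields.get("latitude")
    lon = llm_fields.get("longitude")
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        lat = lon = None  # nulls or junk -> no coordinates
    # Only keep coordinates that form a plausible lat/lon pair.
    if lat is not None and -90 <= lat <= 90 and -180 <= lon <= 180:
        out["latitude"], out["longitude"] = lat, lon
    if isinstance(name, str) and name.strip():
        out["location_name"] = name.strip()
    if isinstance(llm_fields.get("address"), str):
        out.setdefault("metadata", {})["address"] = llm_fields["address"]
    return out
```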

Web Search Collection

Active intelligence gathering using Gemini with Google Search grounding:

sources:
  web_search:
    enabled: true
    categories:
      - global    # Military conflicts, geopolitics, international security
      - cyber     # Vulnerabilities, breaches, APT activity
      - regional  # US domestic security, critical infrastructure

The web search collector uses Gemini 2.0 Flash with built-in Google Search to find breaking news and recent developments that haven't yet appeared in RSS feeds.

License

MIT License