Find similar markets, detect derivatives, and discover arbitrage opportunities through semantic vector search
pip install -r requirements.txt
python cli.py scrape --max-markets 1000
This fetches political markets from Polymarket, extracts their text, generates embeddings with Sentence Transformers, and stores them in ChromaDB. Expect roughly 5-10 minutes for 1000 markets.
python api.py
API available at http://localhost:8000
python frontend.py
UI available at http://localhost:5000
# Build and Run
docker-compose up -d
# Initial Scrape
docker-compose exec api python cli.py scrape --max-markets 1000
# Scrape active political markets
python cli.py scrape --max-markets 1000
# Include closed markets
python cli.py scrape --max-markets 1000 --closed
# Custom database path
python cli.py scrape --db-path /path/to/db --max-markets 500
# Natural language search
python cli.py search "Trump wins 2024" --limit 10
# Larger result limit
python cli.py search "senate control" --limit 20
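# Find markets similar to a given market ID
python cli.py similar 12345 --limit 10
# Find derivative markets above a similarity threshold
python cli.py derivatives 12345 --min-similarity 0.75 --limit 20
# Find arbitrage opportunities among highly similar markets
python cli.py arbitrage 12345 --min-similarity 0.90 --min-price-diff 0.05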
Get database statistics
{
"total_markets": 847,
"name": "polymarket_political_markets",
"metadata": {
"embedding_model": "all-MiniLM-L6-v2",
"embedding_dimensions": 384
}
}
Natural language search
// Request
{
"query": "Trump wins 2024",
"limit": 10
}
// Response
{
"query": "Trump wins 2024",
"results": [
{
"id": "12345",
"similarity": 0.92,
"metadata": {
"question": "Will Trump win the 2024 election?",
"volume": 150000,
"last_price": 0.65,
"active": true
}
}
]
}
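A score like the `similarity` field above is commonly obtained as the cosine similarity between the query embedding and each market embedding. The following is a minimal sketch of that idea in plain Python, not the project's actual code: the names `cosine_similarity` and `rank_by_similarity` and the toy 3-dimensional vectors are assumptions for illustration (real `all-MiniLM-L6-v2` embeddings have 384 dimensions, and how the project turns ChromaDB results into scores is not shown in this README).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    market_vecs: Dict[str, Sequence[float]],
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Return (market_id, score) pairs, best match first, truncated to `limit`."""
    scored = [
        (market_id, cosine_similarity(query_vec, vec))
        for market_id, vec in market_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy data: hypothetical market IDs with 3-dimensional stand-in embeddings.
markets = {"12345": [0.9, 0.1, 0.0], "67890": [0.0, 1.0, 0.0]}
top = rank_by_similarity([1.0, 0.0, 0.0], markets, limit=1)
```

In practice the vector store performs this ranking itself; the sketch only shows what the returned score measures.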
Find derivative markets
// Request
{
"market_id": "12345",
"min_similarity": 0.75,
"max_results": 20
}
// Response
{
"market_id": "12345",
"derivatives": [
{
"id": "67890",
"similarity": 0.88,
"relationship": "strong_derivative",
"metadata": {...}
}
]
}
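The `min_similarity` and `max_results` request fields suggest a simple filter-and-truncate step over scored candidates. The sketch below shows one way that step could look; `select_derivatives` is a hypothetical helper, not a function taken from `derivatives.py`, and the candidate dictionaries only mirror the `id` and `similarity` fields of the response above.

```python
from typing import Dict, List


def select_derivatives(
    candidates: List[Dict],
    market_id: str,
    min_similarity: float = 0.75,
    max_results: int = 20,
) -> List[Dict]:
    """Keep candidates at or above the threshold, best first, excluding the source market."""
    kept = [
        c for c in candidates
        if c["id"] != market_id and c["similarity"] >= min_similarity
    ]
    kept.sort(key=lambda c: c["similarity"], reverse=True)
    return kept[:max_results]


# Hypothetical scored candidates for market 12345.
candidates = [
    {"id": "12345", "similarity": 1.00},  # the source market itself
    {"id": "11111", "similarity": 0.70},  # below the threshold
    {"id": "67890", "similarity": 0.88},
    {"id": "22222", "similarity": 0.91},
]
derivatives = select_derivatives(candidates, "12345", min_similarity=0.75, max_results=20)
```

Excluding the source market matters because a market always matches itself with the highest possible score.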
Find arbitrage opportunities
// Request
{
"market_id": "12345",
"min_similarity": 0.90,
"min_price_diff": 0.05
}
// Response
{
"market_id": "12345",
"opportunities": [
{
"market_a": {
"id": "12345",
"question": "Trump wins 2024",
"price": 0.65
},
"market_b": {
"id": "67890",
"question": "Republican wins 2024",
"price": 0.60
},
"similarity": 0.93,
"price_diff": 0.05,
"arb_type": "subset",
"expected_profit": 0.05,
"confidence": "high"
}
]
}
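The `subset` case in this example can be reasoned about directly: if "Trump wins 2024" implies "Republican wins 2024", the first market should not trade above the second. The sketch below checks that condition and works out a lower bound on profit. It is an illustration under stated assumptions, not the project's algorithm: `subset_arbitrage` is a hypothetical name, fees and slippage are ignored, and a NO share is assumed to cost 1 minus the YES price.

```python
from typing import Dict, Optional


def subset_arbitrage(
    subset_price: float,
    superset_price: float,
    min_price_diff: float = 0.05,
) -> Optional[Dict[str, float]]:
    """Flag a mispricing when the narrower (subset) event costs more than the broader one.

    Strategy assumed here: buy YES on the superset market and NO on the subset market.
    Whatever happens, at least one of the two positions pays out 1.
    """
    # Rounding keeps float noise such as 0.65 - 0.60 from hiding an exact-threshold match.
    price_diff = round(subset_price - superset_price, 6)
    if price_diff < min_price_diff:
        return None
    cost = round(superset_price + (1.0 - subset_price), 6)
    return {
        "price_diff": price_diff,
        "cost": cost,
        "min_profit": round(1.0 - cost, 6),  # guaranteed payout of 1 minus total cost
    }


# Prices from the example response: subset market at 0.65, superset market at 0.60.
opportunity = subset_arbitrage(0.65, 0.60)
```

With these prices the combined position costs 0.95 and pays at least 1, which is consistent with the `expected_profit` of 0.05 in the response above.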
| Score Range | Interpretation |
|---|---|
| 0.95-1.00 | Near duplicates (same question, different outcome) |
| 0.85-0.95 | Strong derivatives (closely related) |
| 0.75-0.85 | Moderate derivatives (related but distinct) |
| 0.65-0.75 | Weak correlation |
| < 0.65 | Unrelated |
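The table can be applied in code as a chain of threshold checks. This is a minimal sketch: only the label `strong_derivative` appears in the API examples, so the other label strings are assumptions, as is the choice to treat each range's lower bound as inclusive (the table's ranges share their endpoints).

```python
def classify_similarity(score: float) -> str:
    """Map a similarity score to a relationship label, following the table above."""
    if score >= 0.95:
        return "near_duplicate"       # label name is an assumption
    if score >= 0.85:
        return "strong_derivative"    # label seen in the derivatives response
    if score >= 0.75:
        return "moderate_derivative"  # label name is an assumption
    if score >= 0.65:
        return "weak_correlation"     # label name is an assumption
    return "unrelated"                # label name is an assumption


labels = [classify_similarity(s) for s in (0.97, 0.88, 0.80, 0.70, 0.40)]
```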
vectorization_derivatives/
├── gamma_api.py # Polymarket API client
├── text_processor.py # Text extraction
├── embeddings.py # Sentence Transformers wrapper
├── chroma_client.py # ChromaDB interface
├── derivatives.py # Analysis functions
├── market_vectorizer.py # Orchestrator
├── cli.py # CLI tool
├── api.py # FastAPI backend
├── frontend.py # Flask frontend
├── templates/
│ └── index.html # Web UI
├── requirements.txt # Dependencies
├── docker-compose.yml # Docker orchestration
├── Dockerfile.api # API container
├── Dockerfile.frontend # Frontend container
└── README.md # Documentation