Opportunity Finder (Phase 1 Spec)
Purpose
Identify, store, and maintain equivalent or near-equivalent market pairs across Polymarket and Kalshi, starting with high-confidence categories (BTC hourly, select politics). The goal is a reliable, reviewable match set that can later power automated opportunity detection and execution.
Scope (Phase 1)
- Markets: Polymarket + Kalshi only.
- Categories: BTC hourly first, then a narrow set of political events; sports later.
- Output: A list of cross-venue market pairs with confidence scores and status.
Core concepts
- Global market: canonical record that represents “the same real-world event” across venues.
- Market link: a per-venue market tied to a global market with confidence + status.
- Pair: a global market with two linked venue markets (Polymarket + Kalshi).
Matching strategy (semi-automated)
Candidate generation (fast gates)
- Time proximity: require end/close times within a configurable window.
  - Hourly BTC: tight window (e.g., 1–2 hours).
  - Politics/sports: broader window (e.g., same day).
- Category match: only compare within the same high-level category.
- Outcome shape: binary vs strike grid must be compatible.
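The three gates above can be sketched as a cheap pre-filter that runs before any fuzzy scoring. This is a minimal Python sketch, not the finder's actual code. The MarketRecord fields, the category labels and the TIME_WINDOWS values are hypothetical. Outcome-shape compatibility is read strictly here as "same shape"; a looser rule could let a binary market match one strike of a grid.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical per-category windows; real values belong in config.
TIME_WINDOWS = {
    "btc_hourly": timedelta(hours=1),
    "politics": timedelta(days=1),
}


@dataclass
class MarketRecord:
    venue: str          # e.g. "polymarket" or "kalshi"
    category: str       # high-level category label
    close_time: datetime
    outcome_shape: str  # assumed labels: "binary" or "strike_grid"


def passes_gates(a: MarketRecord, b: MarketRecord) -> bool:
    """Cheap pre-filter: True only if the pair is worth fuzzy scoring."""
    if a.venue == b.venue:
        return False  # only cross-venue pairs are candidates
    if a.category != b.category:
        return False  # category gate
    if a.outcome_shape != b.outcome_shape:
        return False  # outcome-shape gate, read strictly as "same shape"
    window = TIME_WINDOWS.get(a.category)
    if window is None:
        return False  # no window configured for this category yet
    return abs(a.close_time - b.close_time) <= window  # time-proximity gate
```

Every check is a constant-time comparison, so running it over all cross-venue pairs in a category stays cheap. Only the survivors reach the scoring step.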
Scoring signals (ranked)
- Title similarity (tokenized + normalized).
- End-time proximity (smaller diff = higher score).
- Keyword anchors (e.g., BTC, CPI, candidate names, team names).
- Resolution/oracle alignment (if available).
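The first three signals can be computed with plain standard-library code. The spec does not name a similarity metric, so this minimal sketch swaps in token-set Jaccard similarity. The linear time decay and all function names are assumptions. The tokenizer also drops non-ASCII characters, which may need revisiting for candidate names with accents.

```python
import re
from datetime import datetime, timedelta


def tokens(title: str) -> set:
    """Tokenize and normalize: lowercase, drop punctuation, split on spaces."""
    return set(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def title_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two normalized token sets, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def time_proximity(a: datetime, b: datetime, window: timedelta) -> float:
    """1.0 for identical end times, falling linearly to 0.0 at the window edge."""
    diff = abs((a - b).total_seconds())
    return max(0.0, 1.0 - diff / window.total_seconds())


def anchors_agree(a: str, b: str, anchors: set) -> bool:
    """True if both titles mention the same, non-empty set of anchor keywords."""
    hits_a, hits_b = tokens(a) & anchors, tokens(b) & anchors
    return bool(hits_a) and hits_a == hits_b
```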
Confidence and status
- score >= 0.85 → auto_confirmed
- 0.70 <= score < 0.85 → pending_review
- score < 0.70 → discard
Status tracking
- Active pairs: both markets open for trading.
- Upcoming pairs: scheduled or listed but not yet live.
- Expired pairs: closed or resolved; keep for history and backtests.
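Status can be derived from each side's trading window at a given moment. This sketch uses a hypothetical TradingWindow type and pair_status function. A pair counts as expired as soon as either side closes, because it can no longer be traded as a pair. Resolution data from the venues is not modeled here and would refine that rule. Passing now explicitly keeps the function easy to test and to rerun on a schedule.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TradingWindow:
    open_time: datetime   # when the venue market starts trading
    close_time: datetime  # when trading stops


def pair_status(a: TradingWindow, b: TradingWindow, now: datetime) -> str:
    """Classify a linked pair as active, upcoming, or expired at time `now`."""
    if now >= min(a.close_time, b.close_time):
        return "expired"   # at least one side has closed; keep for history
    if now >= max(a.open_time, b.open_time):
        return "active"    # both sides are open for trading
    return "upcoming"      # listed, but at least one side is not live yet
```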
Category-specific notes
BTC hourly
- Deterministic slugs/tickers make this mostly rule-based.
- Use ET hour boundaries and strict time matching.
- Treat strike differences as approximate: strikes rarely line up exactly across venues, so match on overlapping price ranges (the overlap is where any edge comes from).
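For BTC hourly, strict time matching and approximate strike matching might look like the minimal sketch below. It relies on one time-zone fact: US Eastern time is always UTC-5 or UTC-4, so every ET hour boundary is also a UTC hour boundary. Comparing aware datetimes as instants is therefore enough. The function names and the overlap measure (the share of the narrower strike range that the other range covers) are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def on_et_hour_boundary(dt: datetime) -> bool:
    """True if an aware datetime sits exactly on an hour boundary.

    ET is always UTC-5 or UTC-4 (whole hours), so ET hour boundaries are the
    same instants as UTC hour boundaries and the check can be done in UTC.
    """
    u = dt.astimezone(timezone.utc)
    return u.minute == 0 and u.second == 0 and u.microsecond == 0


def strict_hour_match(close_a: datetime, close_b: datetime) -> bool:
    """Both hourly markets close on the same ET hour boundary (same instant)."""
    return on_et_hour_boundary(close_a) and close_a == close_b


def strike_overlap(a: tuple, b: tuple) -> float:
    """Share of the narrower (low, high) strike range covered by the other one."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    narrower = min(a[1] - a[0], b[1] - b[0])
    if hi <= lo or narrower <= 0:
        return 0.0
    return (hi - lo) / narrower


if __name__ == "__main__":
    est = timezone(timedelta(hours=-5))  # fixed UTC-5 offset, i.e. ET in winter
    print(strict_hour_match(datetime(2026, 1, 26, 15, tzinfo=timezone.utc),
                            datetime(2026, 1, 26, 10, tzinfo=est)))  # True
```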
Politics
- Titles can be noisy; require strong keyword anchors plus end-time gates.
- Manual review is required until enough samples exist to tune thresholds.
Sports (later phase)
- Team name normalization is critical (abbreviations, nicknames).
- Start with manual review only; automation requires careful tuning.
Review workflow (human-in-the-loop)
- System proposes pairs with scores + key evidence (title diff, time diff).
- Reviewer confirms, rejects, or edits.
- Confirmed links become durable unless explicitly removed.
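One way to make confirmed links durable is to let automated re-runs propose changes but never overwrite a human decision. This minimal sketch adds hypothetical confirmed and rejected status values alongside the spec's auto_confirmed and pending_review. It also treats "edit" as re-pointing a link to a different global market and confirming it, which is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Link:
    venue_market_id: str
    global_market_id: str
    score: float
    status: str  # hypothetical values: pending_review, auto_confirmed, confirmed, rejected


def apply_review(link: Link, action: str, new_global_id: Optional[str] = None) -> Link:
    """Apply a reviewer decision: confirm, reject, or edit (re-point and confirm)."""
    if action == "confirm":
        return replace(link, status="confirmed")
    if action == "reject":
        return replace(link, status="rejected")
    if action == "edit":
        if new_global_id is None:
            raise ValueError("edit requires new_global_id")
        return replace(link, global_market_id=new_global_id, status="confirmed")
    raise ValueError(f"unknown review action: {action}")


def merge_proposal(existing: Optional[Link], proposed: Link) -> Link:
    """A new automated proposal never overwrites a human-confirmed link."""
    if existing is not None and existing.status == "confirmed":
        return existing
    return proposed
```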
Data outputs (minimum fields)
- Global market: id, title, category, start_time, end_time, resolution_source
- Link: venue, venue_market_id, global_market_id, score, status, matched_at
- Pair status: active | upcoming | expired
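The minimum fields map naturally onto small record types. This sketch uses Python dataclasses with the spec's field names. The class names and the one-link-per-venue-market key are assumptions, not part of the spec.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class PairStatus(str, Enum):
    ACTIVE = "active"
    UPCOMING = "upcoming"
    EXPIRED = "expired"


@dataclass
class GlobalMarket:
    id: str
    title: str
    category: str
    start_time: datetime
    end_time: datetime
    resolution_source: str


@dataclass
class MarketLink:
    venue: str              # "polymarket" or "kalshi"
    venue_market_id: str
    global_market_id: str   # refers to GlobalMarket.id
    score: float
    status: str             # e.g. auto_confirmed or pending_review
    matched_at: datetime

    def key(self) -> tuple:
        """Assumed uniqueness: one link per (venue, venue_market_id)."""
        return (self.venue, self.venue_market_id)
```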
Risks and mitigations
- False positives: enforce time + category gates before fuzzy matching.
- Stale pairs: refresh statuses on a schedule; archive resolved pairs.
- Oracle mismatch: flag when resolution sources differ; require manual review.
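The oracle-mismatch rule can be applied as a final pass after scoring. In this minimal sketch (hypothetical oracle_check function), resolution sources are compared without regard to case or whitespace. A pair is flagged only when both sources are known and differ, and a flagged auto_confirmed pair drops to pending_review. Real source strings will likely need a curated equivalence table rather than plain string comparison.

```python
from typing import Optional, Tuple


def _norm(source: Optional[str]) -> str:
    """Case- and whitespace-insensitive form of a resolution source name."""
    return " ".join(source.lower().split()) if source else ""


def oracle_check(status: str, source_a: Optional[str],
                 source_b: Optional[str]) -> Tuple[str, bool]:
    """Return (status, flagged). Known-but-different sources are flagged, and
    an auto_confirmed pair is pushed back to pending_review for a human."""
    a, b = _norm(source_a), _norm(source_b)
    flagged = bool(a) and bool(b) and a != b
    if flagged and status == "auto_confirmed":
        return "pending_review", True
    return status, flagged
```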
Phase 1 success criteria
- Reliable BTC hourly pairing with low false positives.
- Reviewable pipeline for politics with clear audit trail.
- Ability to enumerate active vs upcoming pairs.
Test use case (case study)
Sports as the first apples-to-apples category
Stakeholder view: sports is the logical next category after crypto because events are short, numerous, and likely to be listed on both platforms with similar naming.
Why it matters
- Simpler pattern: true 1:1 market matches (same teams, same game time) are easier to find than matches across range-based crypto markets.
- Low-agency bot: detect and suggest without executing, enabling a safe first deployment.
- Opportunity discovery: many events increase the chance of finding clear pricing gaps.
Prototype workflow
- Pull upcoming sports events from both venues for a narrow time window.
- Match on team names + event time; require tight time proximity.
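- If both sides are binary yes/no, compute the combined cost of buying YES on one venue and NO on the other, and flag an edge when that total is below 100 cents (exactly one side pays out $1).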
- Emit “found” vs “suggested” signals for review (no auto execution).
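The combined-cost check reduces to simple arithmetic on the two asks. In this sketch prices are integer cents and the winning contract pays 100 cents. The spec does not pin down the edge percentage, so it is taken relative to total cost here. The min_edge_cents threshold is hypothetical, and fees are ignored.

```python
def binary_edge(yes_ask: int, no_ask: int, min_edge_cents: int = 1) -> dict:
    """Edge for buying YES on one venue and NO on the other, prices in cents.

    Exactly one of the two contracts pays 100 cents, so any total cost below
    100 is a gross edge. Fees and slippage are ignored in this sketch.
    """
    total = yes_ask + no_ask
    edge = 100 - total
    return {
        "total_cost": total,
        "edge_cents": edge,
        "edge_pct": round(100 * edge / total, 2),  # edge relative to cost
        "flag": edge >= min_edge_cents,
    }
```

With the asks from the Appendix B "found" example (48 and 49 cents), the total is 97 cents and the edge is 3 cents, about 3.09% of cost.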
Mini-spec (sports test case)
Data fields
- Event core: sport, league, home_team, away_team, start_time, end_time
- Venue market: venue, venue_market_id, title, outcomes, close_time
- Pair link: global_market_id, score, status, matched_at
- Pricing snapshot: yes_ask, no_ask, yes_bid, no_bid, timestamp
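YES can be bought on either venue, so one pricing snapshot per venue yields two candidate combinations, and the cheaper one is worth alerting on. This minimal sketch uses a hypothetical PricingSnapshot type with the spec's fields. It assumes YES refers to the same outcome on both venues (for example, the same team winning), which the matcher must confirm. It also assumes both snapshots were taken close together in time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass
class PricingSnapshot:
    venue: str
    yes_ask: int  # cents
    no_ask: int
    yes_bid: int
    no_bid: int
    timestamp: datetime


def best_cross_venue(a: PricingSnapshot, b: PricingSnapshot) -> Tuple[int, str]:
    """Cheapest total for YES on one venue plus NO on the other, in cents.

    Assumes YES means the same outcome (e.g. the same team wins) on both venues.
    """
    options = [
        (a.yes_ask + b.no_ask, f"YES@{a.venue} + NO@{b.venue}"),
        (b.yes_ask + a.no_ask, f"YES@{b.venue} + NO@{a.venue}"),
    ]
    return min(options)
```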
Matching rules
- Team normalization: strip punctuation, standardize abbreviations, map nicknames.
- Time gate: start times within a tight window (e.g., <= 60 minutes).
- Title gate: both team names must appear in each title.
- Outcome gate: both sides are binary yes/no (no spreads/totals in v1).
- Score formula (draft): score = 0.6 * title_similarity + 0.3 * time_proximity + 0.1 * league_match
- Require score >= 0.85 for auto-confirmed.
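The matching rules above can be sketched as three boolean gates followed by the draft score. This minimal sketch makes several assumptions. Each team is a list of accepted name forms (canonical name plus aliases). title_gate is called once per venue title. title_similarity and time_proximity are precomputed values in [0, 1]. The spec does not define sports behavior below 0.85, so the 0.70 pending_review floor is borrowed from the general thresholds. All function names are hypothetical.

```python
import string
from datetime import datetime, timedelta

# Punctuation becomes spaces so "vs." or "Lakers:" split into clean words.
_PUNCT = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT).split())


def title_gate(title: str, home_names: list, away_names: list) -> bool:
    """Both teams must appear (as whole words) in the title; each team is a
    list of accepted name forms, e.g. canonical name plus aliases."""
    padded = f" {normalize(title)} "

    def mentioned(names: list) -> bool:
        return any(f" {normalize(n)} " in padded for n in names)

    return mentioned(home_names) and mentioned(away_names)


def time_gate(start_a: datetime, start_b: datetime,
              max_gap: timedelta = timedelta(minutes=60)) -> bool:
    return abs(start_a - start_b) <= max_gap


def outcome_gate(outcomes_a: list, outcomes_b: list) -> bool:
    """v1 accepts only plain yes/no markets (no spreads or totals)."""
    binary = {"yes", "no"}
    return ({o.lower() for o in outcomes_a} == binary
            and {o.lower() for o in outcomes_b} == binary)


def sports_score(title_similarity: float, time_proximity: float,
                 league_match: bool) -> float:
    """Draft weights from the spec; the two float inputs lie in [0, 1]."""
    return (0.6 * title_similarity + 0.3 * time_proximity
            + 0.1 * (1.0 if league_match else 0.0))


def sports_status(score: float) -> str:
    if score >= 0.85:
        return "auto_confirmed"   # threshold from the spec
    if score >= 0.70:
        return "pending_review"   # borrowed from the general thresholds
    return "discard"
```

Under these weights, a pair with title_similarity 1.0 and a league match still needs time_proximity of at least 0.5 to reach 0.85, so the time signal can hold back an otherwise perfect title match.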
Alert format (suggest-only)
- Title: SPORTS ARB CANDIDATE
- Body:
  - Teams + start time
  - Venue pair identifiers
  - Best YES/NO asks and total cost
  - Edge in cents and percentage
- Status: found or suggested
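A plain-text rendering of this alert might look like the sketch below. The line layout, the "away at home" ordering and the format_alert name are hypothetical. The edge percentage is computed relative to total cost, which is an assumption.

```python
def format_alert(home: str, away: str, start_time: str, poly_id: str,
                 kalshi_id: str, yes_ask: int, no_ask: int, status: str) -> str:
    """Render a suggest-only alert as plain text; prices are in cents."""
    if status not in ("found", "suggested"):
        raise ValueError(f"unexpected status: {status}")
    total = yes_ask + no_ask
    edge = 100 - total
    return "\n".join([
        "SPORTS ARB CANDIDATE",
        f"Teams: {away} at {home}, start {start_time}",
        f"Venues: Polymarket {poly_id} / Kalshi {kalshi_id}",
        f"Best asks: YES {yes_ask}c + NO {no_ask}c = {total}c",
        f"Edge: {edge}c ({100 * edge / total:.2f}%)",
        f"Status: {status}",
    ])
```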
Appendix A: Team normalization (starter map)
Use this as a seed; expand as new leagues and aliases appear.
NBA
- la lakers -> los angeles lakers
- lakers -> los angeles lakers
- la clippers -> los angeles clippers
- clippers -> los angeles clippers
- ny knicks -> new york knicks
- knicks -> new york knicks
- ny nets -> brooklyn nets
- nets -> brooklyn nets
- gs warriors -> golden state warriors
- warriors -> golden state warriors
- sa spurs -> san antonio spurs
- spurs -> san antonio spurs
NFL
- ny giants -> new york giants
- giants -> new york giants
- ny jets -> new york jets
- jets -> new york jets
- la rams -> los angeles rams
- rams -> los angeles rams
- la chargers -> los angeles chargers
- chargers -> los angeles chargers
Premier League (near future)
- man utd -> manchester united
- man united -> manchester united
- man city -> manchester city
- spurs -> tottenham hotspur
- wolves -> wolverhampton wanderers
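Some aliases collide across leagues: spurs appears in both the NBA and Premier League maps. The seed map is therefore safer keyed by league. This minimal sketch loads the Appendix A entries and applies the "strip punctuation" rule by deleting punctuation, so "L.A." becomes "la". TEAM_ALIASES and canonical_team are hypothetical names. Unknown names pass through unchanged so a reviewer can map them.

```python
import string

# Seed aliases from Appendix A, keyed by league so "spurs" can mean
# San Antonio in the NBA and Tottenham in the Premier League.
TEAM_ALIASES = {
    "NBA": {
        "la lakers": "los angeles lakers", "lakers": "los angeles lakers",
        "la clippers": "los angeles clippers", "clippers": "los angeles clippers",
        "ny knicks": "new york knicks", "knicks": "new york knicks",
        "ny nets": "brooklyn nets", "nets": "brooklyn nets",
        "gs warriors": "golden state warriors", "warriors": "golden state warriors",
        "sa spurs": "san antonio spurs", "spurs": "san antonio spurs",
    },
    "NFL": {
        "ny giants": "new york giants", "giants": "new york giants",
        "ny jets": "new york jets", "jets": "new york jets",
        "la rams": "los angeles rams", "rams": "los angeles rams",
        "la chargers": "los angeles chargers", "chargers": "los angeles chargers",
    },
    "Premier League": {
        "man utd": "manchester united", "man united": "manchester united",
        "man city": "manchester city", "spurs": "tottenham hotspur",
        "wolves": "wolverhampton wanderers",
    },
}

_DROP_PUNCT = str.maketrans("", "", string.punctuation)


def canonical_team(name: str, league: str) -> str:
    """Lowercase, delete punctuation ("L.A." -> "la"), collapse spaces, then
    map through the league's alias table; unknown names pass through."""
    key = " ".join(name.lower().translate(_DROP_PUNCT).split())
    return TEAM_ALIASES.get(league, {}).get(key, key)
```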
Appendix B: Alert payload examples
Found
{
"type": "sports.arb.found",
"sport": "nba",
"league": "NBA",
"home_team": "los angeles lakers",
"away_team": "golden state warriors",
"start_time": "2026-01-26T03:00:00Z",
"polymarket_market_id": "poly-123",
"kalshi_market_id": "kalshi-456",
"yes_ask": 48,
"no_ask": 49,
"total_cost": 97,
"edge_cents": 3,
"status": "found"
}
Suggested
{
"type": "sports.arb.suggested",
"sport": "nfl",
"league": "NFL",
"home_team": "new york giants",
"away_team": "new york jets",
"start_time": "2026-01-26T20:25:00Z",
"polymarket_market_id": "poly-789",
"kalshi_market_id": "kalshi-321",
"yes_ask": 52,
"no_ask": 50,
"total_cost": 102,
"edge_cents": -2,
"status": "suggested"
}
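In both examples, total_cost equals yes_ask + no_ask and edge_cents equals 100 - total_cost (48 + 49 = 97 with a 3-cent edge; 52 + 50 = 102 with a -2-cent edge). Deriving those two fields in code, rather than filling them in by hand, keeps payloads consistent. This minimal sketch uses a hypothetical build_payload function. Status is passed in rather than derived from the edge, because the suggested example carries a negative edge.

```python
import json


def build_payload(*, sport: str, league: str, home_team: str, away_team: str,
                  start_time: str, polymarket_market_id: str, kalshi_market_id: str,
                  yes_ask: int, no_ask: int, status: str) -> dict:
    """Assemble an alert payload; total_cost and edge_cents are derived."""
    if status not in ("found", "suggested"):
        raise ValueError(f"unexpected status: {status}")
    total_cost = yes_ask + no_ask
    return {
        "type": f"sports.arb.{status}",
        "sport": sport,
        "league": league,
        "home_team": home_team,
        "away_team": away_team,
        "start_time": start_time,
        "polymarket_market_id": polymarket_market_id,
        "kalshi_market_id": kalshi_market_id,
        "yes_ask": yes_ask,
        "no_ask": no_ask,
        "total_cost": total_cost,
        "edge_cents": 100 - total_cost,
        "status": status,
    }


if __name__ == "__main__":
    example = build_payload(
        sport="nba", league="NBA",
        home_team="los angeles lakers", away_team="golden state warriors",
        start_time="2026-01-26T03:00:00Z",
        polymarket_market_id="poly-123", kalshi_market_id="kalshi-456",
        yes_ask=48, no_ask=49, status="found",
    )
    print(json.dumps(example, indent=2))
```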