Opportunity Finder (Phase 1 Spec)

Purpose

Identify, store, and maintain equivalent or near-equivalent market pairs across Polymarket and Kalshi, starting with high-confidence categories (BTC hourly, select politics). The goal is a reliable, reviewable match set that can later power automated opportunity detection and execution.

Scope (Phase 1)

  • Markets: Polymarket + Kalshi only.
  • Categories: BTC hourly first, then a narrow set of political events; sports later.
  • Output: A list of cross-venue market pairs with confidence scores and status.

Core concepts

  • Global market: canonical record that represents “the same real-world event” across venues.
  • Market link: a per-venue market tied to a global market with confidence + status.
  • Pair: a global market with two linked venue markets (Polymarket + Kalshi).

Matching strategy (semi-automated)

Candidate generation (fast gates)

  • Time proximity: require end/close times within a configurable window.
    • Hourly BTC: tight window (e.g., 1–2 hours).
    • Politics/sports: broader window (e.g., same day).
  • Category match: only compare within the same high-level category.
  • Outcome shape: binary vs strike grid must be compatible.
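The gates above can be sketched as a single cheap predicate that runs before any fuzzy scoring. This is a minimal illustration, not the spec's implementation: the `Candidate` type, the `CLOSE_TIME_WINDOWS` values, and the reading of "compatible" as "same outcome shape" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical windows; the spec only gives "1-2 hours" and "same day" as examples.
CLOSE_TIME_WINDOWS = {
    "btc_hourly": timedelta(hours=1),
    "politics": timedelta(days=1),
    "sports": timedelta(days=1),
}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical minimal view of a venue market for gating."""
    venue: str            # "polymarket" or "kalshi"
    category: str         # high-level category key
    outcome_shape: str    # "binary" or "strike_grid"
    close_time: datetime  # timezone-aware


def passes_gates(a: Candidate, b: Candidate) -> bool:
    """Return True only if the pair survives every fast gate."""
    if a.venue == b.venue:
        return False  # pairs are cross-venue by definition
    if a.category != b.category:
        return False  # category gate
    if a.outcome_shape != b.outcome_shape:
        return False  # assumption: "compatible" means identical shape
    window = CLOSE_TIME_WINDOWS.get(a.category)
    if window is None:
        return False  # unknown category: do not guess a window
    return abs(a.close_time - b.close_time) <= window


t = datetime(2026, 1, 26, 15, 0, tzinfo=timezone.utc)
poly = Candidate("polymarket", "btc_hourly", "binary", t)
kalshi_near = Candidate("kalshi", "btc_hourly", "binary", t + timedelta(minutes=30))
kalshi_far = Candidate("kalshi", "btc_hourly", "binary", t + timedelta(hours=3))
print(passes_gates(poly, kalshi_near), passes_gates(poly, kalshi_far))
```

Running the gates first keeps the expensive title comparison limited to pairs that could plausibly match.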

Scoring signals (ranked)

  1. Title similarity (tokenized + normalized).
  2. End-time proximity (smaller diff = higher score).
  3. Keyword anchors (e.g., BTC, CPI, candidate names, team names).
  4. Resolution/oracle alignment (if available).
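As one possible reading of signals 1 and 3, the sketch below uses token-set (Jaccard) overlap for title similarity and a simple shared-anchor check. The spec does not name a similarity metric, so Jaccard is a stand-in, and the function names and the `ANCHORS` set are hypothetical.

```python
import re

# Hypothetical anchor vocabulary; a real list would be maintained per category.
ANCHORS = {"btc", "cpi"}


def tokenize(title: str) -> set[str]:
    """Lowercase the title and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def title_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets, in [0, 1]."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def shared_anchors(a: str, b: str, anchors: set[str] = ANCHORS) -> set[str]:
    """Anchors that appear in both titles."""
    return tokenize(a) & tokenize(b) & anchors


print(title_similarity("BTC above $100k?", "btc above 100k"))
print(shared_anchors("Will BTC close higher?", "BTC price at 3pm ET"))
```

Token-set overlap ignores word order and punctuation, which suits short market titles; a character-level metric may work better for titles with spelling variants.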

Confidence and status

  • score >= 0.85 → auto_confirmed
  • 0.70 <= score < 0.85 → pending_review
  • score < 0.70 → discard
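The thresholds map directly to a small classifier. The function name and the module-level constants are illustrative; a score of exactly 0.85 is treated as auto_confirmed, following the first rule.

```python
AUTO_CONFIRM_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.70


def classify(score: float) -> str:
    """Map a match score to the status it should receive."""
    if score >= AUTO_CONFIRM_THRESHOLD:
        return "auto_confirmed"
    if score >= REVIEW_THRESHOLD:
        return "pending_review"
    return "discard"


for s in (0.92, 0.85, 0.78, 0.70, 0.41):
    print(s, classify(s))
```

Keeping the thresholds as named constants makes them easy to tune per category once review data accumulates.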

Status tracking

  • Active pairs: both markets open for trading.
  • Upcoming pairs: scheduled or listed but not yet live.
  • Expired pairs: closed or resolved; keep for history and backtests.
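A pair's status can be derived from the state of its two venue markets. The sketch assumes each venue market reduces to one of three states and that a pair is expired as soon as either side has closed or resolved; the spec does not say which side wins when the venues disagree, so that rule is an assumption.

```python
from enum import Enum


class VenueState(str, Enum):
    """Hypothetical per-venue market state."""
    UPCOMING = "upcoming"
    OPEN = "open"
    CLOSED = "closed"  # covers both closed and resolved


def pair_status(polymarket: VenueState, kalshi: VenueState) -> str:
    """Derive active | upcoming | expired from the two venue states."""
    states = {polymarket, kalshi}
    if VenueState.CLOSED in states:
        return "expired"   # assumption: one closed side expires the pair
    if states == {VenueState.OPEN}:
        return "active"    # both markets open for trading
    return "upcoming"      # at least one side not yet live


print(pair_status(VenueState.OPEN, VenueState.OPEN))
print(pair_status(VenueState.OPEN, VenueState.UPCOMING))
print(pair_status(VenueState.CLOSED, VenueState.OPEN))
```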

Category-specific notes

BTC hourly

  • Deterministic slugs/tickers make this mostly rule-based.
  • Use ET hour boundaries and strict time matching.
  • Treat strikes that differ across venues as approximate matches; the tradable edge lies in the price range where the two strikes overlap.

Politics

  • Titles can be noisy; require strong keyword anchors plus end-time gates.
  • Manual review is required until enough samples exist to tune thresholds.

Sports (later phase)

  • Team name normalization is critical (abbreviations, nicknames).
  • Start with manual review only; automation requires careful tuning.

Review workflow (human-in-the-loop)

  1. System proposes pairs with scores + key evidence (title diff, time diff).
  2. Reviewer confirms, rejects, or edits.
  3. Confirmed links become durable unless explicitly removed.

Data outputs (minimum fields)

  • Global market: id, title, category, start_time, end_time, resolution_source
  • Link: venue, venue_market_id, global_market_id, score, status, matched_at
  • Pair status: active|upcoming|expired
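The minimum fields could be carried by two record types like the ones below. Field names follow the list above; the Python types, the optional fields, and the `is_pair` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class GlobalMarket:
    id: str
    title: str
    category: str
    start_time: Optional[datetime]    # assumption: may be unknown
    end_time: datetime
    resolution_source: Optional[str]  # assumption: may be unknown


@dataclass
class MarketLink:
    venue: str                # "polymarket" or "kalshi"
    venue_market_id: str
    global_market_id: str
    score: float
    status: str               # e.g. auto_confirmed, pending_review
    matched_at: datetime


def is_pair(links: list[MarketLink], global_market_id: str) -> bool:
    """A global market is a pair once both venues are linked to it."""
    venues = {l.venue for l in links if l.global_market_id == global_market_id}
    return {"polymarket", "kalshi"} <= venues


now = datetime(2026, 1, 26, tzinfo=timezone.utc)
links = [
    MarketLink("polymarket", "poly-123", "gm-1", 0.91, "auto_confirmed", now),
    MarketLink("kalshi", "kalshi-456", "gm-1", 0.91, "auto_confirmed", now),
    MarketLink("polymarket", "poly-789", "gm-2", 0.74, "pending_review", now),
]
print(is_pair(links, "gm-1"), is_pair(links, "gm-2"))
```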

Risks and mitigations

  • False positives: enforce time + category gates before fuzzy matching.
  • Stale pairs: refresh statuses on a schedule; archive resolved pairs.
  • Oracle mismatch: flag when resolution sources differ; require manual review.

Phase 1 success criteria

  • Reliable BTC hourly pairing with low false positives.
  • Reviewable pipeline for politics with clear audit trail.
  • Ability to enumerate active vs upcoming pairs.

Test use case (case study)

Sports as the first apples-to-apples category

Stakeholder view: sports is the logical next category after crypto because events are short, numerous, and likely to be listed on both platforms with similar naming.

Why it matters

  • Simpler pattern: true 1:1 market matches (same teams, same game time) are easier than range-based crypto.
  • Low-agency bot: it detects and suggests without executing, which makes for a safe first deployment.
  • Opportunity discovery: many events increase the chance of finding clear pricing gaps.

Prototype workflow

  1. Pull upcoming sports events from both venues for a narrow time window.
  2. Match on team names + event time; require tight time proximity.
  3. If both sides are binary yes/no, compute combined cost and flag edges.
  4. Emit “found” vs “suggested” signals for review (no auto execution).
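Step 3 can be sketched as follows: buy YES on one venue and NO on the other, and compare the combined cost with the 100-cent payout. The sketch ignores fees and slippage, quotes prices in integer cents, and labels a positive edge "found" and anything else "suggested"; that labeling rule and the function name are assumptions, since the spec does not define the two signals.

```python
def best_edge(poly_yes_ask: int, poly_no_ask: int,
              kalshi_yes_ask: int, kalshi_no_ask: int) -> dict:
    """Pick the cheaper of the two cross-venue YES+NO combinations."""
    options = [
        # (venue for YES leg, venue for NO leg, combined cost in cents)
        ("polymarket", "kalshi", poly_yes_ask + kalshi_no_ask),
        ("kalshi", "polymarket", kalshi_yes_ask + poly_no_ask),
    ]
    yes_venue, no_venue, total_cost = min(options, key=lambda o: o[2])
    edge_cents = 100 - total_cost  # payout is 100 cents if exactly one leg wins
    return {
        "yes_venue": yes_venue,
        "no_venue": no_venue,
        "total_cost": total_cost,
        "edge_cents": edge_cents,
        # Assumed rule: positive edge is "found", otherwise "suggested".
        "status": "found" if edge_cents > 0 else "suggested",
    }


print(best_edge(poly_yes_ask=48, poly_no_ask=53, kalshi_yes_ask=52, kalshi_no_ask=49))
```

Both directions are checked because the cheaper YES leg and the cheaper NO leg may sit on either venue.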

Mini-spec (sports test case)

Data fields

  • Event core: sport, league, home_team, away_team, start_time, end_time
  • Venue market: venue, venue_market_id, title, outcomes, close_time
  • Pair link: global_market_id, score, status, matched_at
  • Pricing snapshot: yes_ask, no_ask, yes_bid, no_bid, timestamp

Matching rules

  • Team normalization: strip punctuation, standardize abbreviations, map nicknames.
  • Time gate: start times within a tight window (e.g., <= 60 minutes).
  • Title gate: both team names must appear in each title.
  • Outcome gate: both sides are binary yes/no (no spreads/totals in v1).
  • Score formula (draft):
    • 0.6 * title_similarity + 0.3 * time_proximity + 0.1 * league_match
    • Require score >= 0.85 for auto_confirmed status.
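A sketch of the draft formula with its inputs. The weights come from the formula above; the linear decay used for `time_proximity`, the substring-based `title_gate`, and the function names are assumptions, and titles are expected to be normalized before the gate runs.

```python
from datetime import datetime, timedelta, timezone


def title_gate(title: str, home_team: str, away_team: str) -> bool:
    """Both (already normalized) team names must appear in the title."""
    t = title.lower()
    return home_team.lower() in t and away_team.lower() in t


def time_proximity(start_a: datetime, start_b: datetime,
                   window: timedelta = timedelta(minutes=60)) -> float:
    """1.0 for identical start times, falling linearly to 0.0 at the window edge."""
    diff = abs(start_a - start_b)
    if diff > window:
        return 0.0
    return 1.0 - diff / window


def pair_score(title_similarity: float, time_prox: float, league_match: bool) -> float:
    """Draft weighted score: 0.6 title + 0.3 time + 0.1 league."""
    return 0.6 * title_similarity + 0.3 * time_prox + 0.1 * (1.0 if league_match else 0.0)


t0 = datetime(2026, 1, 26, 3, 0, tzinfo=timezone.utc)
prox = time_proximity(t0, t0 + timedelta(minutes=30))
print(prox, round(pair_score(0.9, prox, True), 2))
```

Because the weights are floats, compare scores to the 0.85 threshold with care near the boundary; rounding to a fixed precision before classifying is one option.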

Alert format (suggest-only)

  • Title: SPORTS ARB CANDIDATE
  • Body:
    • Teams + start time
    • Venue pair identifiers
    • Best YES/NO asks and total cost
    • Edge in cents and percentage
    • Status: found or suggested
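The alert body could be rendered as plain text along these lines. The layout and the function name are illustrative, prices are integer cents, and the percentage is computed relative to total cost, which is an assumption because the spec does not state the base.

```python
def format_alert(home_team: str, away_team: str, start_time: str,
                 polymarket_market_id: str, kalshi_market_id: str,
                 yes_ask: int, no_ask: int, status: str) -> str:
    """Render a suggest-only alert as a short multi-line message."""
    total_cost = yes_ask + no_ask
    edge_cents = 100 - total_cost
    # Assumption: percentage edge is measured against the combined cost.
    edge_pct = edge_cents / total_cost * 100 if total_cost else 0.0
    lines = [
        "SPORTS ARB CANDIDATE",
        f"{away_team} at {home_team}, start {start_time}",
        f"polymarket={polymarket_market_id} kalshi={kalshi_market_id}",
        f"best YES ask {yes_ask}c + best NO ask {no_ask}c = total {total_cost}c",
        f"edge {edge_cents:+d}c ({edge_pct:+.2f}%)",
        f"status: {status}",
    ]
    return "\n".join(lines)


print(format_alert("los angeles lakers", "golden state warriors",
                   "2026-01-26T03:00:00Z", "poly-123", "kalshi-456",
                   yes_ask=48, no_ask=49, status="found"))
```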

Appendix A: Team normalization (starter map)

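Use this as a seed; expand as new leagues and aliases appear. Keep aliases scoped per league, since the same alias can map to different teams (spurs appears under both NBA and Premier League).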

NBA

  • la lakers -> los angeles lakers
  • lakers -> los angeles lakers
  • la clippers -> los angeles clippers
  • clippers -> los angeles clippers
  • ny knicks -> new york knicks
  • knicks -> new york knicks
  • ny nets -> brooklyn nets
  • nets -> brooklyn nets
  • gs warriors -> golden state warriors
  • warriors -> golden state warriors
  • sa spurs -> san antonio spurs
  • spurs -> san antonio spurs

NFL

  • ny giants -> new york giants
  • giants -> new york giants
  • ny jets -> new york jets
  • jets -> new york jets
  • la rams -> los angeles rams
  • rams -> los angeles rams
  • la chargers -> los angeles chargers
  • chargers -> los angeles chargers

Premier League (near future)

  • man utd -> manchester united
  • man united -> manchester united
  • man city -> manchester city
  • spurs -> tottenham hotspur
  • wolves -> wolverhampton wanderers
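The starter map can be applied with a per-league lookup, as sketched below. Only a few entries from the lists above are reproduced; the league keys and the `normalize_team` name are hypothetical. Scoping by league keeps the two spurs aliases from colliding.

```python
import re

# Subset of the starter map, keyed by a hypothetical league identifier.
ALIASES = {
    "nba": {
        "la lakers": "los angeles lakers",
        "lakers": "los angeles lakers",
        "gs warriors": "golden state warriors",
        "warriors": "golden state warriors",
        "spurs": "san antonio spurs",
    },
    "premier_league": {
        "man utd": "manchester united",
        "man city": "manchester city",
        "spurs": "tottenham hotspur",
    },
}


def normalize_team(name: str, league: str) -> str:
    """Lowercase, strip punctuation, collapse spaces, then apply the league's aliases."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    cleaned = " ".join(cleaned.split())
    # Unknown names pass through unchanged so they can be reviewed and added.
    return ALIASES.get(league, {}).get(cleaned, cleaned)


print(normalize_team("L.A. Lakers", "nba"))
print(normalize_team("Spurs", "nba"))
print(normalize_team("Spurs", "premier_league"))
```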

Appendix B: Alert payload examples

Found

{
  "type": "sports.arb.found",
  "sport": "nba",
  "league": "NBA",
  "home_team": "los angeles lakers",
  "away_team": "golden state warriors",
  "start_time": "2026-01-26T03:00:00Z",
  "polymarket_market_id": "poly-123",
  "kalshi_market_id": "kalshi-456",
  "yes_ask": 48,
  "no_ask": 49,
  "total_cost": 97,
  "edge_cents": 3,
  "status": "found"
}

Suggested

{
  "type": "sports.arb.suggested",
  "sport": "nfl",
  "league": "NFL",
  "home_team": "new york giants",
  "away_team": "new york jets",
  "start_time": "2026-01-26T20:25:00Z",
  "polymarket_market_id": "poly-789",
  "kalshi_market_id": "kalshi-321",
  "yes_ask": 52,
  "no_ask": 50,
  "total_cost": 102,
  "edge_cents": -2,
  "status": "suggested"
}