Mapping Heuristics v1
Goal
Link venue markets to a canonical global_markets row with a confidence score.
Terminology (Canon)
- Exchange: the platform/company (Polymarket, Kalshi).
- Venue: Cornice-internal alias for Exchange (used in logs and fields).
- Market: the event definition + resolution rules (cross-venue concept).
- Contract: the tradable instrument for a specific Outcome (venue-specific).
Normalization
- Lowercase, strip punctuation, collapse whitespace.
- Remove stopwords: "will", "the", "a", "an", "by", "at", "in".
- Normalize currency/units: $100k -> 100000, USD -> usd.
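The normalization steps above can be sketched as follows. This is a minimal illustration, not the production implementation: the function name normalize_title, the helper _expand_amount, and the "m"/"b" suffixes are assumptions added for the example (the source only mentions "k"). The sketch expands currency amounts before stripping punctuation so the "$" marker is still available to detect them.

```python
import re

# Stopword list taken from the Normalization section.
STOPWORDS = {"will", "the", "a", "an", "by", "at", "in"}

# Only "k" appears in the source; "m" and "b" are assumed for illustration.
_MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}


def _expand_amount(match: re.Match) -> str:
    """Turn a matched amount such as '$100k' into a plain number string."""
    number = float(match.group(1).replace(",", ""))
    value = number * _MULTIPLIERS.get(match.group(2), 1)
    # Emit integers without a trailing '.0' so '$100k' becomes '100000'.
    return str(int(value)) if value == int(value) else str(value)


def normalize_title(title: str) -> list[str]:
    """Hypothetical helper: return the normalized tokens of a market title."""
    text = title.lower()  # also maps 'USD' to 'usd'
    # Expand dollar amounts while the '$' is still present.
    text = re.sub(r"\$(\d[\d,]*(?:\.\d+)?)([kmb])?\b", _expand_amount, text)
    # Strip punctuation; split() below collapses the resulting whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


print(normalize_title("Will BTC hit $100k by Dec 31?"))
# ['btc', 'hit', '100000', 'dec', '31']
```

Amounts that stay fractional after expansion (for example "$0.5") are split at the decimal point by the punctuation step in this sketch; a fuller version would need an explicit rule for them.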
Matching Signals
- Title similarity: Jaccard or cosine similarity on tokenized titles.
- End time proximity: within a configurable window (e.g., 24h).
- Category overlap: shared tags/labels if present.
- Outcome alignment: both are binary YES/NO (hard gate).
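A rough sketch of how each signal might be computed is shown below. All function names are hypothetical. The source only says end times should fall "within a configurable window", so the linear decay inside the window is an assumption, as is using Jaccard overlap for categories and returning 0.0 when either side has no tags.

```python
from datetime import datetime, timedelta, timezone


def jaccard(tokens_a, tokens_b) -> float:
    """Jaccard similarity of two token collections, in [0, 1]."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0  # nothing to compare
    return len(set_a & set_b) / len(set_a | set_b)


def end_time_score(end_a: datetime, end_b: datetime,
                   window: timedelta = timedelta(hours=24)) -> float:
    """1.0 for identical end times, decaying linearly to 0.0 at the window edge.

    The linear decay is an assumption; a plain inside/outside check would
    also satisfy the rule as written.
    """
    gap = abs(end_a - end_b)
    if gap >= window:
        return 0.0
    return 1.0 - gap / window


def category_score(tags_a, tags_b) -> float:
    """Overlap of tags/labels; 0.0 when either side has none (assumed)."""
    if not tags_a or not tags_b:
        return 0.0
    return jaccard(tags_a, tags_b)


def both_binary(outcomes_a, outcomes_b) -> bool:
    """Hard gate: both markets must have exactly the outcomes YES and NO."""
    def is_binary(outcomes):
        return {o.upper() for o in outcomes} == {"YES", "NO"}
    return is_binary(outcomes_a) and is_binary(outcomes_b)


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(jaccard(["btc", "hit", "100000"], ["btc", "reach", "100000"]))  # 0.5
print(end_time_score(t0, t0 + timedelta(hours=6)))                    # 0.75
print(both_binary(["Yes", "No"], ["NO", "YES"]))                      # True
```

Because outcome alignment is a hard gate, a pair that fails both_binary would be discarded before any score is computed rather than contributing a weight.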
Confidence Score (Draft)
score = 0.6 * title_similarity
+ 0.3 * end_time_score
+ 0.1 * category_score
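As a worked example of the draft formula, the sketch below combines the three component scores using the weights given above. The function name confidence_score and the input validation are assumptions for illustration; each component is assumed to already lie in [0, 1].

```python
# Weights taken from the draft formula above.
TITLE_WEIGHT = 0.6
END_TIME_WEIGHT = 0.3
CATEGORY_WEIGHT = 0.1


def confidence_score(title_similarity: float,
                     end_time_score: float,
                     category_score: float) -> float:
    """Weighted sum of the three signals; inputs are expected in [0, 1]."""
    for name, value in (("title_similarity", title_similarity),
                        ("end_time_score", end_time_score),
                        ("category_score", category_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (TITLE_WEIGHT * title_similarity
            + END_TIME_WEIGHT * end_time_score
            + CATEGORY_WEIGHT * category_score)


# 0.6 * 0.9 + 0.3 * 0.75 + 0.1 * 1.0 = 0.54 + 0.225 + 0.1
print(round(confidence_score(0.9, 0.75, 1.0), 3))  # 0.865
```

Since the weights sum to 1.0, the score stays in [0, 1] whenever the inputs do. Note that with a 0.1 category weight, a pair with no shared tags can score at most 0.9.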
Status Rules
- score >= 0.85 -> auto_confirmed
- 0.70 <= score < 0.85 -> pending
- score < 0.70 -> no link created
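The thresholds translate directly into a small lookup. In this sketch the function name link_status and the use of None to mean "no link created" are assumptions; the status strings and cutoffs come from the rules above.

```python
from typing import Optional

AUTO_CONFIRM_THRESHOLD = 0.85
PENDING_THRESHOLD = 0.70


def link_status(score: float) -> Optional[str]:
    """Map a confidence score to a link status, or None when no link is created."""
    if score >= AUTO_CONFIRM_THRESHOLD:
        return "auto_confirmed"
    if score >= PENDING_THRESHOLD:
        return "pending"
    return None  # below 0.70: no link row is written


print(link_status(0.90))  # auto_confirmed
print(link_status(0.75))  # pending
print(link_status(0.50))  # None
```

Both lower bounds are inclusive, so a score of exactly 0.85 is auto-confirmed and exactly 0.70 is pending.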
UI Workflow
- Show pending pairs with title + end time.
- A reviewer confirms or rejects each pair; either decision locks the mapping.