The signal layer that should exist
Building the GTM intelligence platform every AI-native company is currently rebuilding in-house, badly. 88K jobs from 1,667 source boards, 186-node skill graph, sub-$1/day LLM spend. Solo.
For engineers — full technical walkthrough in the system design notes.
01 The context
I left Mesh-AI in early 2023 to build OM. The pattern that had played out at Cisco and Mesh-AI was now playing out across the entire AI-native B2B SaaS category.
Every modern GTM team has the same broken signal layer. Data fragmented across CRM, enrichment APIs, intent providers, ATS systems, and 50+ point tools. No queryable centralized intelligence. Every company in the category currently solves this by hiring a GTM Engineer to wire it all together — 54% of the fastest-growing B2B SaaS companies now have one. Not because they want to. Because the missing infrastructure forces them to.
I'd watched this break twice from inside companies. The third time, I built the answer.
02 The diagnosis
The category doesn't have a tooling problem. It has a missing-platform problem.
Clay is a workflow tool. Apollo is a data tool. The CRM is a system of record. None of them is the platform underneath. Every GTM team currently glues these together by hand, then re-glues them every six months when something changes. It works, sort of, at permanent operational cost. The category is collectively rebuilding the same broken wheel and pretending that's not the problem.
The fix isn't a better Clay. It's the platform that should exist underneath every GTM team — the way Snowflake exists underneath every data team. Centralized signal, queryable interface, model-aware scoring, no per-company rebuild.
GTM teams aren't lacking data. They're drowning in it.
03 The build
Solo engineer-PM-founder. Three years in. Architecture below.
- Multi-source ingestion pipeline: 1,667 source boards across 4 production ATS platforms (Greenhouse, Ashby, Lever, SmartRecruiters); 4 more wired and pending population. 300–1,000 jobs/day baseline. Each platform's payload is normalized to a uniform job dict at the dispatch boundary.
- 186-node skill graph: Hand-curated capability ontology with a two-level taxonomy (L0 cluster, L1 sub-competency) and 337 edges across composition, dependency, and transfer relations. Designed for explainability and zero query-time LLM cost; ceiling at ~10K nodes before hierarchical mapping becomes mandatory.
- Map-once, score-forever scoring: Five-dimension LLM evaluation per job (role, capability, org, domain, location) at ingest; deterministic graph cosine at query time. The architecture converts an O(users × jobs) LLM problem into O(users + jobs) LLM calls plus pure math at query time. That is the cost story underneath.
- Cost-engineered inference: ~600 LLM calls/day at sub-$1 daily spend; ~5,000 daily (user, job) evaluations across the active pool. Cost was an architectural choice from day one: the map-once, score-forever decomposition is what makes unit economics hold past the toy-project ceiling. Anthropic Batch API (50% off) is the next-step lever as load scales.
- End-to-end product surface: pipelineom.com lead capture and enrichment, real-time scoring dashboard, calendar sync, automated tracking. User workflow reduced from 8 hours to 45 minutes: the same operation as at Cisco, at platform scope.
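The map-once, score-forever split described in the list above can be illustrated with a small sketch. This is a minimal illustration, not OM's implementation: it assumes skill vectors are sparse dicts of node name to weight, and the node names, edge weights, and one-hop propagation rule are all hypothetical.

```python
import math

# Hypothetical transfer edges: (source_node, target_node) -> weight.
# The real graph has 186 nodes and 337 edges; these two are made up.
EDGES = {
    ("sql", "data_modeling"): 0.5,
    ("python", "data_pipelines"): 0.6,
}

def propagate(vec, edges):
    """Spread each node's weight one hop along outgoing edges (illustrative rule)."""
    out = dict(vec)
    for (src, dst), weight in edges.items():
        if src in vec:
            out[dst] = max(out.get(dst, 0.0), vec[src] * weight)
    return out

def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(node, 0.0) for node, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(user_vec, job_vecs):
    """Score every job against one user with pure math, no LLM call."""
    user = propagate(user_vec, EDGES)
    scored = ((cosine(user, vec), job_id) for job_id, vec in job_vecs.items())
    return sorted(scored, reverse=True)
```

Under these assumptions, the LLM is needed only to map each user and each job into the node space once (users + jobs calls). Every (user, job) pair after that is a dictionary walk and a square root, which is why the pairwise scoring volume can grow without the LLM bill growing with it.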
04 How it actually works
Seven hops, each with a silent-drop failure mode. The recurring theme: fixed schemas and cached negatives.
- Sources
  company_boards is the canonical registry: eight ATS systems (Greenhouse, Ashby, Lever, SmartRecruiters, Workday, Teamtailor, BambooHR, iCIMS), plus manual paste via /api/add-opportunity and the browser extension. Auto-discovery: paste a job URL and try_register_board() extracts platform, slug, and params.
- Scraper / dispatch
  scripts/run_ats.py dispatches per scraper_module: NULL routes to the shared ats_client for the four API ATSes; everything else routes to dedicated Workday, Teamtailor, BambooHR, or iCIMS clients. ThreadPool concurrency, ~1.5s per-domain throttle, three-retry exponential backoff on 5xx and timeouts. ETag/304 conditional GETs are implemented on Greenhouse only; extending to the others is on the list.
- Normalizer
  Each client maps the platform-specific payload to a uniform dict: url, title, company, jd_text, ats_type, ats_job_id, location, apply_url.
- SQL persistence
  Insert path in data/database.py: INSERT OR IGNORE on url UNIQUE, so the same URL is a no-op on re-scrape. Field updates to existing rows run through enrichment writers keyed by opportunity_id. Activity table dedupes via deterministic event_key. Two backup files in data/ (pre-dedup, pre-event-key-backfill) mark manual migrations; Alembic comes before another engineer touches the schema.
- Enrichment
  Async after insert: company domain backfill, location parse, ATS-board discovery for the company, L1/L2 role classification, salary extraction. Slug result cached in slug_validation_log.
- Ranking (graph-based)
  Two stages. (a) Claude Haiku scores five dimensions (role_fit, capability_fit, org_quality, domain, location) into evaluation_score, prompt-cached. (b) Skill graph: user_experiences map to skill nodes via remap_user_if_needed() and propagate through edges; jobs map to the same node space; cosine similarity is the structural fit. Feed orders by adjusted_score.
- API
  Flask in web/app.py, Bearer DB_SYNC_API_KEY. /api/tasks/claude/board (ranked feed grouped by stage), /api/tasks/claude/triage (AI focus, 30-min cache), /api/tasks/claude/batch (write ops, ≤50), /api/add-opportunity (ingest).
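The normalizer and persistence hops above can be sketched together. This is a minimal sketch using Python's built-in sqlite3 module (the INSERT OR IGNORE syntax suggests SQLite, but that is an assumption). The table names opportunities and activity, the raw payload keys (link, name, id, body, place), and the hash recipe behind event_key are hypothetical; only the eight uniform field names come from the description above.

```python
import hashlib
import sqlite3

FIELDS = ("url", "title", "company", "jd_text", "ats_type",
          "ats_job_id", "location", "apply_url")

def normalize(raw, ats_type, company):
    """Map a platform payload to the uniform job dict (raw keys are assumed)."""
    return {
        "url": raw["link"],
        "title": raw["name"],
        "company": company,
        "jd_text": raw.get("body", ""),
        "ats_type": ats_type,
        "ats_job_id": str(raw["id"]),
        "location": raw.get("place", ""),
        "apply_url": raw.get("apply_link", raw["link"]),
    }

def init_db(conn):
    """Create the two hypothetical tables used by this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS opportunities ("
        "opportunity_id INTEGER PRIMARY KEY, url TEXT UNIQUE, title TEXT, "
        "company TEXT, jd_text TEXT, ats_type TEXT, ats_job_id TEXT, "
        "location TEXT, apply_url TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity ("
        "event_key TEXT PRIMARY KEY, opportunity_id INTEGER, kind TEXT)"
    )

def insert_job(conn, job):
    """Return True if a new row was written; a repeated URL is a no-op."""
    cols = ", ".join(FIELDS)
    marks = ", ".join("?" for _ in FIELDS)
    cur = conn.execute(
        f"INSERT OR IGNORE INTO opportunities ({cols}) VALUES ({marks})",
        [job[f] for f in FIELDS],
    )
    return cur.rowcount == 1

def event_key(opportunity_id, kind, day):
    """Deterministic key: the same inputs always produce the same hash."""
    return hashlib.sha256(f"{opportunity_id}|{kind}|{day}".encode()).hexdigest()

def log_event(conn, opportunity_id, kind, day):
    """Record an activity event once; replays of the same event are ignored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO activity (event_key, opportunity_id, kind) "
        "VALUES (?, ?, ?)",
        (event_key(opportunity_id, kind, day), opportunity_id, kind),
    )
    return cur.rowcount == 1
```

Returning whether a row was actually written is one way to make an otherwise silent drop visible: a scraper run can count ignored inserts and alert when the ratio looks wrong, instead of discovering missing rows later.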
For the full technical walkthrough — data model, cost economics ($25K–$50K vs $40–$250 at scale), failure modes, and the scale ceiling — read the OM system design notes.
05 Where it stands
OM is live, has paying pilots, and is validated across multiple stakeholder types. The architectural numbers are in §03 and §04. This section answers a different question: who's paying attention.
Won the ABLE Activator national startup accelerator (Bulgaria) before relocating to San Francisco. The pattern across these proof points: every conversation opened cold, every pilot closed direct, every credential earned on the work itself.
Before OM: DaylaCare — AI voice agent for elderly care, deployed in HIPAA-regulated environments, originated at Stanford Health++, advanced via Meta AI Fellowship.
06 The pattern
Same operation, three scales. Find the broken signal layer, rebuild it as infrastructure, quantify what changes downstream.
Cisco was the analyst seat — proved I find the leverage. Mesh-AI was the operator seat — proved I turn it into ARR. OM is the builder seat — the same diagnostic move applied to an entire category instead of a single company. The pattern doesn't change with scope. The leverage does.
OM is also the foundation of something larger. The thesis underneath OM — that signal infrastructure is the layer modern operations are missing — extends past GTM. That's the long arc. The short arc is what's already shipped: 88K jobs from 1,667 source boards, 186-node skill graph, 184K cumulative scorings, sub-$1 daily LLM spend, ten paying pilots in sixty days, customers going from 2 to 15 outbound meetings per week — all built solo. The pattern works. The platform is alive. The thesis is no longer hypothetical.