Scoring, search, and pipelines that structure your data
We turn raw, messy sources into structured data your product can rank, search, and match on. That means scoring engines that personalize results per user, full-text and geographic search that feels instant, and ingestion pipelines that clean and deduplicate external data into something you can trust — built on PostgreSQL, so there's no extra search service to operate or pay for.
The engineering, in plain terms.
Logic lives in SQL
Scoring, classification, and category inference run as deterministic SQL on derived columns: fast, auditable, and explainable, with no LLM cost or guesswork. The same query on the same data returns the same result every time.
Search inside Postgres
Full-text and fuzzy search use the pg_trgm and unaccent extensions, so typos and missing accents still match. Search stays in your existing database instead of a separate Elasticsearch cluster that you would have to run and keep in sync.
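To make the typo and accent tolerance concrete, here is a rough Python sketch of the idea behind trigram matching. It only approximates what pg_trgm and unaccent do inside Postgres (the real extensions handle word boundaries and character mappings in more detail), and the function names are illustrative, not part of any library.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks, a rough stand-in for the unaccent extension."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigrams(text):
    """Set of 3-character windows per word, padded in the style of pg_trgm."""
    grams = set()
    for word in strip_accents(text).lower().split():
        padded = "  " + word + " "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams, in the range 0..1."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

With this sketch, "café" and "cafe" compare as identical, and a one-letter typo such as "restarant" still shares most of its trigrams with "restaurant", which is why a similarity threshold can catch it.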
Per-user scoring modes
Ranking is parameterized by user profile and mode, so the same catalog reorders itself for each person — strict vs. relaxed, near vs. relevant — by recomputing weights at query time rather than hardcoding one order.
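As a minimal sketch of recomputing weights at query time, the example below passes per-mode weights into one SQL query as parameters. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL, and the table, column names, modes, and weight values are all hypothetical.

```python
import sqlite3

# Hypothetical weight presets; a real system would derive these from the user profile.
MODE_WEIGHTS = {
    "near": {"w_distance": 0.8, "w_match": 0.2},
    "relevant": {"w_distance": 0.2, "w_match": 0.8},
}

# One query for every mode: only the bound weights change.
SCORE_SQL = """
SELECT name,
       :w_match * match_score + :w_distance * (1.0 - distance_norm) AS score
FROM items
ORDER BY score DESC, name
"""


def rank(conn, mode):
    """Return item names ordered by the score for the given mode."""
    return [row[0] for row in conn.execute(SCORE_SQL, MODE_WEIGHTS[mode])]


def demo_connection():
    """In-memory catalog with two illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (name TEXT, match_score REAL, distance_norm REAL)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [
            ("cafe_close", 0.4, 0.1),  # nearby, weaker match
            ("cafe_great", 0.9, 0.7),  # farther away, stronger match
        ],
    )
    return conn
```

Calling `rank(conn, "near")` puts `cafe_close` first, while `rank(conn, "relevant")` puts `cafe_great` first, with no change to the stored data or the query text.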
Map search that scales
Geographic queries use haversine distance for nearby-search and Supercluster for clustering, so thousands of pins stay smooth on the map and the right results surface as the user pans and zooms.
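For reference, the haversine formula behind a nearby-search can be sketched in a few lines of Python. The `nearby` helper and its tuple layout are assumptions for illustration; in production the same arithmetic would typically run inside the database query.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearby(origin, points, radius_km):
    """Names of (name, lat, lon) points within radius_km of origin, nearest first."""
    lat0, lon0 = origin
    hits = []
    for name, lat, lon in points:
        d = haversine_km(lat0, lon0, lat, lon)
        if d <= radius_km:
            hits.append((d, name))
    return [name for _, name in sorted(hits)]
```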
From kickoff to launch.
1. Map the sources
We inventory your raw inputs — feeds, scrapes, exports — and define the clean, structured shape your product actually needs to rank and search on.
2. Build the pipeline
We write the ingestion that parses, normalizes, deduplicates, and enriches each source, with validation that drops or flags bad records instead of polluting the catalog.
3. Add scoring and search
We layer in the scoring engine, full-text and geo search, and tune relevance against real queries and real data, not synthetic samples.
4. Verify and ship
We test ranking and ingestion against known inputs, measure query performance under realistic volume, and deploy with indexes and re-ingestion jobs in place.
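The pipeline step above (normalize, validate, deduplicate) can be sketched roughly as follows. The record fields, the validation rules, and the dedupe key are assumptions chosen for illustration; a real source would need its own rules.

```python
def is_valid_coord(lat, lon):
    """True when both values are real numbers inside valid lat/lon ranges."""
    for value in (lat, lon):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def ingest(raw_records):
    """Split raw dicts into (clean, flagged); duplicates are dropped silently."""
    clean, flagged, seen = [], [], set()
    for rec in raw_records:
        # Normalize: collapse repeated whitespace and trim the name.
        name = " ".join(str(rec.get("name") or "").split())
        lat, lon = rec.get("lat"), rec.get("lon")
        # Validate: flag bad records instead of letting them into the catalog.
        if not name or not is_valid_coord(lat, lon):
            flagged.append(rec)
            continue
        # Deduplicate: same case-folded name at roughly the same location.
        key = (name.casefold(), round(lat, 4), round(lon, 4))
        if key in seen:
            continue
        seen.add(key)
        clean.append({"name": name, "lat": lat, "lon": lon})
    return clean, flagged
```

Keeping flagged records in a separate list, rather than discarding them, makes it possible to review why a source produced bad rows and to test the pipeline against known inputs.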
Deliverables, and when it fits.
Deliverables
- An ingestion pipeline that turns raw external sources into clean, deduplicated, structured records
- A scoring/ranking engine with per-user or per-mode personalization, implemented in SQL
- Full-text and fuzzy search with typo and accent tolerance, plus filtering that stays fast at scale
- Geographic search: nearby-distance queries and map clustering ready for thousands of points
- Indexes, derived columns, and re-ingestion jobs documented so the system stays fast as data grows
- Tests covering ranking logic and pipeline correctness against known inputs
A good fit when
- You aggregate data from multiple external sources and need it clean, deduplicated, and queryable
- You want personalized ranking or scoring without the cost and unpredictability of an LLM
- You need fast full-text or map-based search but don't want to operate a separate search cluster
- Your catalog has outgrown naive filtering and search now feels slow or irrelevant
Shipped a pipeline that parsed ~1,500 raw listings down to ~1,200 clean, deduplicated records with map clustering and nearby-search, and a separate scoring engine that personalizes recommendations entirely in SQL.
Want this for your product?
Tell us the goal — we scope it honestly.