Work / TECH · case study 02

DomainScope.
Nine million domains, AI-categorized, in seventy-seven markets.

A multi-tenant data platform that turns the public DNS surface into structured intelligence: twenty-plus attributes per entity, refreshed on a clock, queryable by anyone with API credentials.

Role     Founder · CTO
Stack    Go · PostgreSQL · LLM pipelines
Status   Live · paid customers
Site     domainscope.scrapetheworld.org

The problem.

The public DNS surface is huge and almost completely unstructured. WHOIS gives you registration metadata; commercial enrichment services give you a stamp-collector's view — “technology used,” “industry,” “estimated traffic” — at prices that work for enterprise SDRs and almost nobody else. If you wanted to ask “which domains in Romania mention emergency-response services, ship products to Hungary, and run on Cloudflare?”, you couldn’t.

What I built.

DomainScope is the answer to that family of questions. The crawler ingests new TLD additions and re-validates the existing index on a rolling cadence. An LLM categorization layer extracts twenty-plus structured attributes from each domain's landing page, robots.txt, tech stack, copy, and metadata. A multi-tenant Go API exposes the result with proper auth, rate limiting, and tenant isolation.

  ┌──────────────────────────────────────────────────────────────┐
  │  acquisition    ── new TLD feeds, zone files, partner drops  │
  │      │                                                       │
  │      ▼                                                       │
  │  enrichment     ── headless fetch · text extraction · whois  │
  │      │                                                       │
  │      ▼                                                       │
  │  llm-categorize ── 20+ structured attributes per entity      │
  │      │                                                       │
  │      ▼                                                       │
  │  storage        ── postgres · hot index · cold archive       │
  │      │                                                       │
  │      ▼                                                       │
  │  api            ── go · jwt/oauth2 · tenant isolation ·      │
  │                    rate limiting · audit log                 │
  │      │                                                       │
  │      ▼                                                       │
  │  customer queries                                            │
  └──────────────────────────────────────────────────────────────┘
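
In code terms, the stages reduce to a few typed hand-offs. A minimal sketch in Go, with every type and function name hypothetical; the production pipeline is queue-backed, but the data flow matches the diagram:

  package pipeline

  import "fmt"

  // Illustrative shapes only, not the production schema.
  type RawDomain struct {
      Name   string // e.g. "example.ro"
      Source string // "zone-file", "tld-feed", "partner-drop"
  }

  type Enriched struct {
      Domain RawDomain
      Text   string   // extracted landing-page copy
      WHOIS  string
      Tech   []string // detected stack, e.g. ["cloudflare"]
  }

  type Attribute struct {
      Name     string // e.g. "industry"
      Value    string // e.g. "emergency-response"
      Evidence string // sentence captured at crawl time
  }

  type Record struct {
      Domain     string
      Attributes []Attribute // the 20+ structured attributes
  }

  // process mirrors acquisition → enrichment → llm-categorize → storage.
  // Stage implementations are passed in so the sketch stays self-contained.
  func process(
      raw RawDomain,
      enrich func(RawDomain) (Enriched, error),
      categorize func(Enriched) (Record, error),
      store func(Record) error,
  ) error {
      e, err := enrich(raw) // headless fetch, text extraction, whois
      if err != nil {
          return fmt.Errorf("enrich %s: %w", raw.Name, err)
      }
      rec, err := categorize(e) // LLM fills the fixed attribute schema
      if err != nil {
          return fmt.Errorf("categorize %s: %w", raw.Name, err)
      }
      return store(rec) // postgres: hot index + cold archive
  }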

Why it’s defensible.

The moat is not the data — anyone can crawl. The moat is the schema. We’ve spent two years tuning a twenty-plus-attribute taxonomy that survives the long tail of small-business websites in markets the western SaaS world has never indexed — Romanian regional registrars, Indonesian e-commerce, francophone West African media, the Eastern European cooperative sector. The LLM doesn’t hallucinate categories because the categories are pinned to evidence sentences captured at crawl time.
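
One way to enforce that pinning mechanically, continuing the types from the sketch above (add "strings" to its imports): an attribute whose evidence sentence does not occur verbatim in the crawl-time text never reaches storage.

  // validate keeps only attributes whose evidence sentence appears
  // verbatim in the text captured at crawl time, so no category can
  // be stored without a traceable source sentence.
  func validate(attrs []Attribute, crawlText string) []Attribute {
      kept := make([]Attribute, 0, len(attrs))
      for _, a := range attrs {
          if a.Evidence != "" && strings.Contains(crawlText, a.Evidence) {
              kept = append(kept, a)
          }
      }
      return kept
  }

Verbatim substring matching is the crudest version; normalizing whitespace and casing first is the obvious refinement.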

What customers do with it.

  • Investor-research teams use it as a top-of-funnel for SMB acquisition pipelines (the query shape is sketched after this list).
  • Cyber-threat teams use it for typo-squatting detection at country scale.
  • Civic-tech and journalism teams qualify for a reduced-rate program (the same data discipline that powers ACTFOIA).
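
All three reduce to the same kind of filtered query; here is the exact question from the problem statement, sketched from the client side in Go. The endpoint path, parameter names, and auth shape are illustrative, not the documented API:

  package main

  import (
      "fmt"
      "io"
      "net/http"
      "net/url"
  )

  func main() {
      // Hypothetical filter syntax; the live API's parameter names may differ.
      q := url.Values{}
      q.Set("market", "ro")                   // Romanian TLD space
      q.Set("mentions", "emergency-response") // copy-derived attribute
      q.Set("ships_to", "hu")                 // copy-derived attribute
      q.Set("tech", "cloudflare")             // detected stack

      req, err := http.NewRequest(http.MethodGet,
          "https://domainscope.scrapetheworld.org/v1/domains?"+q.Encode(), nil)
      if err != nil {
          panic(err)
      }
      req.Header.Set("Authorization", "Bearer <tenant-api-token>")

      resp, err := http.DefaultClient.Do(req)
      if err != nil {
          panic(err)
      }
      defer resp.Body.Close()

      body, _ := io.ReadAll(resp.Body)
      fmt.Println(string(body)) // paged JSON of matching records
  }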

By the numbers.

  9,000,000+     Domains indexed · as of May 2026
  77             Markets / TLDs · live index
  23             Attributes / entity · AI categorization layer
  ~3M            Records processed / day · rolling refresh
  Go + PG        Backend stack · multi-tenant from day one
  JWT + OAuth2   Auth surface · per-tenant isolation
  99.5% SLA      API availability · self-hosted on Hetzner
  2 years        Schema iteration · taxonomy pinned to evidence
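
Per-tenant isolation and rate limiting sit in middleware in front of every handler. A minimal sketch using golang.org/x/time/rate; the header read below is a stand-in for real JWT/OAuth2 claim verification:

  package main

  import (
      "log"
      "net/http"
      "sync"

      "golang.org/x/time/rate"
  )

  var (
      mu       sync.Mutex
      limiters = map[string]*rate.Limiter{} // one limiter per tenant
  )

  func limiterFor(tenant string) *rate.Limiter {
      mu.Lock()
      defer mu.Unlock()
      if l, ok := limiters[tenant]; ok {
          return l
      }
      l := rate.NewLimiter(rate.Limit(10), 20) // 10 req/s, burst 20: illustrative numbers
      limiters[tenant] = l
      return l
  }

  func tenantMiddleware(next http.Handler) http.Handler {
      return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
          // Stand-in: production derives the tenant from verified
          // JWT/OAuth2 claims, never from a client-supplied header.
          tenant := r.Header.Get("X-Tenant-ID")
          if tenant == "" {
              http.Error(w, "unauthorized", http.StatusUnauthorized)
              return
          }
          if !limiterFor(tenant).Allow() {
              http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
              return
          }
          next.ServeHTTP(w, r)
      })
  }

  func main() {
      api := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
          w.Write([]byte("ok\n")) // real handlers query postgres per tenant
      })
      log.Fatal(http.ListenAndServe(":8080", tenantMiddleware(api)))
  }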

What I learned. What I’d do differently.

For the first eighteen months I optimized crawler throughput. The compounding gain wasn't there; it was in the categorization taxonomy. Next time: move the team's best engineer to the schema work earlier, and treat crawler throughput as a solved problem once you're past 100K records/day.

The second lesson is unglamorous: build the audit log before you have customers, not after the first compliance review. We retrofitted it. It cost two engineer-weeks that could have been one engineer-day.
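
The fix itself is small; a sketch in the same middleware style as above (imports: "log", "net/http", "time"), with the sink stubbed as a log line where ours lands in an append-only Postgres table:

  // auditMiddleware records one entry per API call before the handler
  // runs, so the trail exists even when the request fails downstream.
  func auditMiddleware(next http.Handler) http.Handler {
      return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
          log.Printf("audit tenant=%s method=%s path=%s at=%s",
              r.Header.Get("X-Tenant-ID"), r.Method, r.URL.Path,
              time.Now().UTC().Format(time.RFC3339))
          next.ServeHTTP(w, r)
      })
  }

It chains with the tenant middleware above: tenantMiddleware(auditMiddleware(api)).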

Links.