2″ — method & sources

2″ is built by a cycle of sieves. A crawler pulls recording credits from an open database; a chain of filters throws out everything that isn't a real, in-window record; and a language model nominates the next batch of people to crawl, round after round. This page documents that process first, then the literal prompts, then the sources. No claims of taste are made on your behalf. Here is the method; judge it.

01 The process — a cycle of sieves

Legend: the language model (Claude) acts here · deterministic code, no LLM · the stored index (data.json)

seed → crawl → three sieves → merge → repeat

The graph grows in rounds. Each round starts from engineers we already trust, crawls their credits, runs everything through three sieves, and merges what survives. Then a language model nominates more in-scope engineers, the hubs that connect the scene, and the cycle repeats. Nothing is invented: every node added by the crawl is a real credit that passed every sieve.

① Year sieve (1995–2003). A hard filter in code. Anything outside the window is dropped before anything else looks at it. (Why this window: see §05.)
② Notability sieve (Claude). The only place a language model touches crawled data. It removes obvious noise (compilations posing as artists, DJ mixes, mis-parsed text, total unknowns) and nothing else. It does not judge genre, era, or worth. Exact prompt in §03.
③ Dedup sieve. Anything already in the index, curated or previously crawled, is dropped, so the same record can't be added twice.
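
The three sieves above can be sketched as a small pipeline. This is a minimal sketch, not the project's actual crawler: the Credit record and its field names, the choice to drop undated credits, the dedup key, and the is_noise callback standing in for the Claude call are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

WINDOW = (1995, 2003)  # inclusive year window enforced by the year sieve


@dataclass(frozen=True)
class Credit:
    # Hypothetical record shape; the real crawler's fields may differ.
    engineer: str
    artist: str
    album: str
    year: Optional[int]


def year_sieve(credits: Iterable[Credit]) -> list:
    """Hard filter in code. Assumption: undated credits are dropped too."""
    lo, hi = WINDOW
    return [c for c in credits if c.year is not None and lo <= c.year <= hi]


def notability_sieve(credits: Iterable[Credit], is_noise: Callable[[str], bool]) -> list:
    """Drop only artists flagged as noise; is_noise stands in for the model."""
    return [c for c in credits if not is_noise(c.artist)]


def dedup_sieve(credits: Iterable[Credit], index_keys: Iterable[tuple]) -> list:
    """Drop anything already indexed, plus repeats within the same batch."""
    seen = set(index_keys)
    fresh = []
    for c in credits:
        key = (c.engineer.lower(), c.artist.lower(), c.album.lower())
        if key not in seen:
            seen.add(key)
            fresh.append(c)
    return fresh


def run_round(crawled, index_keys, is_noise):
    """One pass: year sieve, then notability sieve, then dedup sieve."""
    survivors = year_sieve(crawled)
    survivors = notability_sieve(survivors, is_noise)
    return dedup_sieve(survivors, index_keys)
```

Running the year filter first means the vetting step never sees out-of-window records, which matches the ordering described above.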

02 What the model does — and doesn't

The language model is a sieve, not a curator. It does not decide who belongs in the scene, does not rank, and does not write biographies. The crawl is network-first: keep almost everything a real engineer recorded. The model's only jobs are (a) throwing out obvious noise at SIEVE ②, and (b) nominating the next round's engineers. Where it is unsure, the rule is keep: we would rather carry a few odd inclusions, visibly tagged, than silently delete something real. Era is enforced in code (SIEVE ①), not by the model, so a record cannot be penalised for its era a second time.
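
The keep-when-unsure rule can be carried through to how the model's reply is read. The following is a minimal sketch, assuming the reply is the JSON array requested in §03. The function name and the fallback for unreadable or missing verdicts are assumptions for illustration, not the project's documented behaviour.

```python
import json


def apply_verdicts(candidates, reply_text):
    """Return the candidates to keep; drop only on an explicit keep=false."""
    try:
        verdicts = json.loads(reply_text)
    except json.JSONDecodeError:
        return list(candidates)  # unreadable reply: keep everything
    if not isinstance(verdicts, list):
        return list(candidates)  # wrong shape: keep everything

    # A candidate is dropped only when the model names it and says keep=false.
    dropped = {
        v["name"]
        for v in verdicts
        if isinstance(v, dict)
        and isinstance(v.get("name"), str)
        and v.get("keep") is False
    }
    return [name for name in candidates if name not in dropped]
```

A candidate the model forgot to mention, or a reply that fails to parse, therefore stays in rather than being silently removed.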

03 The prompts, verbatim

These are the literal instructions given to the model. Nothing is paraphrased.

// SIEVE ② — the notability sieve (run on every newly-discovered artist)

You vet candidate musical artists discovered by crawling the production
credits of indie / alternative recording engineers (Albini, McEntire, Ek,
Vernhes, etc.). The project is NETWORK-FIRST: keep almost everything a real
engineer recorded, across ANY genre and ANY era. Your only job is removing noise.

For each candidate, set keep=true UNLESS it is clearly one of:
  - not a real musical artist: "Various Artists", a label sampler/compilation,
    a DJ mix, a soundtrack-various, spoken-word/audiobook, or mis-parsed text;
  - a total unknown with NO discernible notability: no real label, no press,
    no Wikipedia/Discogs footprint — indistinguishable from noise.

NOT reasons to drop: wrong genre, wrong era, being obscure-but-real, being
famous (e.g. a rock legend an indie engineer happened to record stays IN).
When unsure, keep=true — we trim later.

Return ONLY a JSON array, one object per candidate:
{"name": <exact name>, "keep": true|false, "reason": <short>}.

// the loop — nominating the next round's engineers

You curate a database of 1995-2003 US/UK indie, alternative, post-rock, lo-fi,
slowcore, math-rock, post-hardcore, shoegaze recording credits. We ALREADY index
these recording engineers/producers: {…current list…}. Name 25 MORE real,
well-documented recording engineers or producers who worked with bands in that
scene and era and are NOT already in the list. Favor people who recorded several
notable indie/alternative acts (they connect the graph). Reply with ONLY a JSON
array of name strings, nothing else.
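
The nomination prompt asks for names not already in the list, but a reply can still repeat one. A minimal sketch of how the reply might be checked before the next crawl, assuming it arrives as the JSON array of strings the prompt requests; new_nominations and its case-insensitive matching are hypothetical, not taken from the project's code.

```python
import json


def new_nominations(reply_text, indexed):
    """Parse the model's reply and keep only names not already indexed."""
    names = json.loads(reply_text)
    if not isinstance(names, list):
        raise ValueError("expected a JSON array of name strings")

    # Assumption: names are compared case-insensitively, ignoring outer spaces.
    known = {n.casefold().strip() for n in indexed}
    fresh = []
    for name in names:
        if not isinstance(name, str):
            continue  # skip anything that is not a name string
        key = name.casefold().strip()
        if key and key not in known:
            known.add(key)  # also guards against repeats within the reply
            fresh.append(name.strip())
    return fresh
```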

04 Data sources

Curated core	Hand-built and hand-verified from album liner notes, label pages, AllMusic, Tape Op and Sound on Sound. Every credit double-checked. This is the spine.
Expansion	A crawl of MusicBrainz (open, CC0). One request per person returns role + album + year + performing band; parsing is deterministic — no language model reads the data.
Vetting	A single language model (Claude Sonnet) runs SIEVE ② and the engineer nomination. It judges nothing about taste, genre, or era — see §02–§03.
Album art	Fetched on demand from the iTunes Search API, cached per session. Not stored.
Outbound links	Every entry links to Discogs and MusicBrainz so you can check the source yourself.
Source code	The crawler, this site, and the methodology write-up are open source: github.com/shawnzam/2inch.

05 The window — an admitted bias

To the person who built this, the golden age of this music is 1995–2003. So that is the window the index is seeded from and bounded to (SIEVE ①). This boundary is arbitrary — Slint's Spiderland (1991) and plenty after 2003 matter just as much — and it is stated here rather than hidden behind a claim of objectivity. The hand-curated core reaches a little outside it where a record is foundational; the automated expansion does not.

06 What's in here right now

—curated nodes

—added by crawl

—total nodes

—album credits

Counts are read live from the same data file the graph uses. Anything the pipeline added carries a src:"mb" tag; the curated core carries none — so the two are always separable, and nothing automated can quietly overwrite a hand-checked entry.
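
The src:"mb" convention makes both the counts and the no-overwrite guarantee easy to express. A minimal sketch, assuming each node is a dict with an "id" field and an optional "src" field; that layout and both function names are assumptions, since the actual shape of data.json is not described here.

```python
def count_nodes(nodes):
    """Split the node count by provenance using the src tag."""
    crawled = sum(1 for n in nodes if n.get("src") == "mb")
    return {
        "curated": len(nodes) - crawled,  # curated entries carry no src tag
        "crawled": crawled,
        "total": len(nodes),
    }


def merge_crawled(nodes, incoming):
    """Append crawled nodes under new ids; never touch an existing entry."""
    seen_ids = {n["id"] for n in nodes}
    merged = list(nodes)
    for n in incoming:
        if n["id"] in seen_ids:
            continue  # existing entry (curated or crawled) wins
        merged.append({**n, "src": "mb"})  # tag everything the pipeline adds
        seen_ids.add(n["id"])
    return merged
```

Because the merge only ever appends, a hand-checked entry with the same id is left exactly as it was.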

07 Honest limitations

MusicBrainz is community-sourced; a credit can be wrong or a release mis-dated. The curated core is checked; the crawl is not, beyond the sieves.
Network-first means off-genre detours stay by design — if an indexed engineer recorded a rock legend or a Celtic-punk band, they're in. That's a feature, not an error.
The 1995–2003 window is one person's bias. Records you love are missing because of an arbitrary line. See §05.
A force-directed graph gets dense fast; legibility is traded for showing the whole web. Use the filters and search a name to see its neighbourhood.
The pipeline is resumable and rate-limited to be a good citizen of the free APIs it depends on.
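
One way a crawl can be both resumable and rate-limited is a checkpoint file plus a minimum gap between requests. The sketch below is an illustration under stated assumptions, not the project's crawler: crawl_resumable, the checkpoint format, and the injected fetch function are hypothetical, and the one-second default is chosen to sit within MusicBrainz's guideline of roughly one request per second.

```python
import json
import time
from pathlib import Path


def crawl_resumable(names, fetch, checkpoint, min_interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Fetch each name once, pausing between requests and recording progress."""
    checkpoint = Path(checkpoint)
    # Names finished in an earlier run are read back and skipped.
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()

    results = {}
    last_request = None
    for name in names:
        if name in done:
            continue
        if last_request is not None:
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)  # keep at least min_interval between requests
        last_request = clock()
        results[name] = fetch(name)  # fetch is supplied by the caller
        done.add(name)
        # Save after every request so an interrupted run loses at most one.
        checkpoint.write_text(json.dumps(sorted(done)))
    return results
```

Restarting with the same checkpoint path skips every name already recorded, so an interrupted round does not repeat requests.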