Writing

Building an NPI roll up tool for my job

4 min read

The problem

In POC marketing, the workflow for executing a campaign goes roughly: a client gives you a target list of healthcare providers, you figure out which ones your media vendors can reach, and you buy media against those. The work that gets glossed over is everything between those bullets.

The biggest piece of that work is turning a list of people (NPIs) into a list of places (offices). Vendors operate at the office level — they serve a building, not a doctor. So before you can buy anything you have to consolidate hundreds of thousands of NPI rows into a much smaller set of unique offices, decide what to do when NPIs at the same office disagree on attributes, strip out internal data your vendors shouldn't see, and send the result out so vendors can mark which offices they can reach. Then you take their responses back, filter to the segments you care about, and decide what you're actually buying.

This tool covers that whole loop.

Why it's annoying in practice

A typical target list might be 600,000 rows. Each row is one NPI with 30+ columns of attributes (specialty, decile, segment, internal scoring fields). When you roll that up into offices, a few things bite:

Address normalization. "123 Main Street" and "123 MAIN ST" should be one office. Your client doesn't normalize, your vendors don't normalize, and Excel definitely doesn't.
Conflicts. Three NPIs at the same office, one is a Cardiologist, the others are Endocrinologists. What's the office's specialty? Depends — sometimes the rule is "whichever has the highest decile," sometimes it's "Cardiology if there's one." No universal answer.
Internal data. Your decile scoring is your business. You don't want to leak it to vendors, but it lives in the same spreadsheet as the addresses, and stripping it cleanly is fiddly.

Most teams handle this in Excel. Excel is fine for a thousand rows. At six hundred thousand it gets slow, formulas break, address normalization is impossible, and the work lives as one person's spreadsheet that nobody else can run.

The roll-up

Upload a CSV. The tool runs entirely in your browser — nothing about the data leaves your machine, which matters when the data is patient-adjacent.

Pick the columns that hold the address and the NPI number, and the tool groups your NPIs into offices. The address normalization handles things like "Ste / Suite / STE." spelling, ZIP codes that lost their leading zeros to xlsx number conversion, and address fields with trailing whitespace.

For every other column, you choose what the rollup does when NPIs at the same office disagree:

Priority columns get a ranking ("Cardiology over GP, Decile 10 over 9 over 8…"). The office takes the value of the highest-ranked NPI.
Watch columns are displayed; the cell turns yellow at any office where NPIs disagree.
Pass-through columns are displayed but never flagged. Right for identifiers like NPI numbers and names, where disagreement is expected.
Excluded columns drop from the rollup output entirely.

The tool has reasonable defaults — high-cardinality columns auto-mark themselves as pass-through, and clicking Configure priority pre-fills an obvious ranking when one exists (Decile 1–10 ranks high to low, High/Medium/Low does the same).

After you run the rollup, a Conflict resolution panel surfaces every column with disagreements: "12,400 offices where NPIs disagree on Specialty — rank it, ignore it, or pass through?" You make the calls in a few clicks instead of scrolling through a flagged spreadsheet.

Office summary

Once you have your offices, the Office Summary lets you slice them. Pick the columns you care about (Decile, Specialty, Segment) and the tool shows you a flat breakdown of every value combination:

Cardiology × Decile 10 × Loyalist — 2,341 offices (8.2%)
Cardiology × Decile 9 × Loyalist — 1,892 offices (6.6%)
Endocrinology × Decile 10 × Switcher — 443 offices (1.5%)

You can filter individual columns if you only want certain values, and the breakdown updates. This part is informational — it's how you check whether your segmentation is too narrow before you commit to a buy.

Export

Two files come out: an NPI drilldown CSV (every NPI in your target list, with its Office ID) that goes to vendors, and an Office List xlsx (one row per office, with the rolled-up values) for internal review. You pick which columns from your source data go into the drilldown — internal scoring stays internal. The configuration itself can also be saved as JSON and reloaded next time. Nothing about the data persists — only the rules.

The other half: vendor match analysis

After you send the drilldown out, vendors send their responses back with match data attached — "I can reach this office on Display, that one on Connected TV, this other one not at all."

The second half of the tool ingests those responses and lets you ask, against the union of every vendor's data: "if I want offices that are reachable on Display AND in Decile 9–10 AND with my preferred specialties, how many is that?" Same UI as the Office Summary — pick slicing columns, see the breakdown, filter — but the dimensions now span across vendors and tactics.

You see things like "this vendor adds 8,700 reachable offices on top of vendor A" or "Display reaches 73% of my Decile 10 list but only 41% of my Decile 7." When you've made your decisions, the tool exports a Buy File — a list of (office, vendor, tactic) tuples for execution.

What I took away

Most of the design decisions in the tool came from watching how teams actually do this work, not from what the workflow should be. The conflict triage panel exists because real target lists generate tens of thousands of conflicts and people don't have time to scroll through all of them. The columns-first slicing UI exists because nobody knows what their qualifying counts will look like until they see the data. The "drop NPIs with no address" toggle exists because the phantom blank-address office was throwing off every count.

A lot of pharma media work runs on Excel. This tool isn't doing anything Excel can't theoretically do — it's just doing it correctly, repeatably, and in a form anyone on a team can run.