PII Shield: The Plugin That Forgets Your Secrets Before Claude Remembers Them

Stephen Doherty
7 min read

The Brilliant Assistant With No Discretion

Imagine hiring the most brilliant analyst in the world. She can find patterns in payroll data in seconds, reconcile benefits across five files before lunch, and explain what she found in plain English. There is one problem: she remembers everything she reads, and you have no idea who else might eventually see her notes.

This is the situation with AI coding assistants and sensitive data. Claude is very good at analyzing your CSV files. It is very bad at not seeing the Social Security numbers inside them. The data enters the context window, and from that point forward, it is seen. You cannot un-ring that bell.

The standard advice is to remember to obfuscate your data before you ask for help. This is like advising people to remember to lock their doors. It works right up until the one morning it doesn't, and that one morning is the one that matters.

PII Shield takes a different approach. Instead of hoping you'll remember, it makes forgetting harmless: an unshielded file cannot be read whether you remembered to obfuscate it or not.

claude plugin add pii-shield@stemado/pii_shield

The Four Walls

PII Shield doesn't ask Claude to be careful with your data. It prevents Claude from seeing your data at all.

The plugin installs four hooks, one for each tool Claude might use to read a file: Read, Bash, Grep, and Edit. Every one of these tools, when aimed at a CSV or Excel file, hits a wall. The hook fires before the tool executes. It checks the file, finds that it contains unshielded data, and blocks the request with a hard exit. Claude never receives the file contents. The model sees a refusal, not a redaction.
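For readers who want to picture the wiring, here is a minimal sketch of a pre-tool hook. It is not PII Shield's source. It assumes the hook receives the tool request as JSON on stdin, that the request carries fields named tool_input, file_path, path, and command, and that exit code 2 signals a blocking refusal. The extension check is deliberately crude.

```python
import json
import shlex
import sys

# Assumption: these extensions stand in for "CSV or Excel file".
BLOCKED_EXTENSIONS = (".csv", ".xls", ".xlsx")


def candidate_paths(payload: dict) -> list[str]:
    """Collect anything in the tool request that could be a file path."""
    tool_input = payload.get("tool_input", {})
    paths = [
        tool_input[key]
        for key in ("file_path", "path")  # hypothetical field names
        if isinstance(tool_input.get(key), str)
    ]
    command = tool_input.get("command")
    if isinstance(command, str):
        try:
            paths.extend(shlex.split(command))
        except ValueError:
            # Unbalanced quotes: fall back to a plain whitespace split.
            paths.extend(command.split())
    return paths


def decide(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message). A non-zero code refuses the tool call."""
    for path in candidate_paths(payload):
        if path.lower().endswith(BLOCKED_EXTENSIONS):
            return 2, f"Blocked: {path} may contain unshielded data."
    return 0, ""


def main() -> None:
    """Read the request from stdin, explain any refusal on stderr, and exit."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    sys.exit(code)

# A real hook script would finish by calling main().
```

The point of the shape is that the decision happens before the tool runs, so a refusal carries a message and no file contents.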

This is the critical distinction. Other approaches redact or mask data inside the AI's context. PII Shield operates outside the context entirely. The data is obfuscated on disk before any tool can read it. If Read is a door, PII Shield is not a lock on the door. It is a wall where the door used to be, with a different door around the corner that opens onto safe data.

Four walls. Four tools. No gaps.

A sentinel file called .pii-protected marks an entire directory tree as off-limits. The only path through is the shielded/ subdirectory, where obfuscated copies live with _shielded in the filename. The hooks enforce this with a five-rule evaluation: block reference files first, allow shielded files second, block anything under a sentinel third, block any CSV or Excel file by extension fourth, allow everything else fifth. The rules are evaluated in order and the first match decides, which is why the reference-file block sits ahead of the shielded-file allowance.
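The ordering is easier to see as code. This is a sketch of the five rules as described, not the plugin's implementation. The reference-file naming test and the exact list of spreadsheet extensions are assumptions, since the article does not specify either.

```python
from pathlib import Path

SENTINEL = ".pii-protected"
# Assumption: which extensions count as "CSV or Excel".
SPREADSHEET_EXTENSIONS = {".csv", ".xls", ".xlsx"}


def is_reference_file(path: Path) -> bool:
    # Hypothetical convention: treat any file with "reference" in its name
    # as the token-to-value mapping.
    return "reference" in path.name.lower()


def is_shielded_file(path: Path) -> bool:
    # Obfuscated copies live in shielded/ and carry _shielded in the filename.
    return "_shielded" in path.stem and "shielded" in path.parts[:-1]


def under_sentinel(path: Path) -> bool:
    # True if any ancestor directory contains the sentinel file.
    return any((parent / SENTINEL).exists() for parent in path.resolve().parents)


def evaluate(path: Path) -> str:
    """Apply the five rules in order; the first rule that matches decides."""
    if is_reference_file(path):
        return "block"  # rule 1: the mapping back to real values
    if is_shielded_file(path):
        return "allow"  # rule 2: obfuscated copies are safe to read
    if under_sentinel(path):
        return "block"  # rule 3: everything else in a protected tree
    if path.suffix.lower() in SPREADSHEET_EXTENSIONS:
        return "block"  # rule 4: any spreadsheet, sentinel or not
    return "allow"      # rule 5: everything else
```

Swap rules one and two and a reference file saved inside shielded/ with _shielded in its name would slip through. The order is the policy.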

The Human Stays in the Loop

When Claude tries to read a payroll file and gets blocked, it calls one of two MCP tools: pii_shield_batch for multiple related files, or pii_select_columns for a single file. Either way, the same thing happens: a GUI window opens on your screen.

This is a dark-mode dialog built with customtkinter. It shows every column in your file, classified as SHIELD (will be obfuscated) or PASS (safe to show Claude). The classification is automatic but not final. You review it. You adjust it. You confirm it. Only then does the obfuscation happen.

The auto-classification is smarter than a simple name match. Date columns where every value is more than fifteen years old get reclassified as probable dates of birth. Numeric columns named "zip", "phone", or "fax" get flagged as PII per HIPAA Safe Harbor rules. Columns with names like employee_id, ssn, or policy_number are detected as identifiers. The system follows the eighteen-identifier list from 45 C.F.R. 164.514(b)(2), the Safe Harbor method for de-identification under the HIPAA Privacy Rule.
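Those three heuristics fit in a short function. The sketch below is illustrative only: the hint lists contain just the names mentioned above, substring matching on column names is an assumption, and the real classifier presumably covers far more of the eighteen identifiers.

```python
from datetime import date

ID_NAME_HINTS = ("employee_id", "ssn", "policy_number")
NUMERIC_PII_HINTS = ("zip", "phone", "fax")
DOB_MIN_AGE_YEARS = 15


def older_than(d: date, years: int, today: date) -> bool:
    """True if d falls strictly before the same calendar day `years` years ago."""
    return (d.year, d.month, d.day) < (today.year - years, today.month, today.day)


def looks_numeric(value) -> bool:
    # Simplification: digits only, so 60601 and "60601" both count.
    return str(value).strip().isdigit()


def classify_column(name: str, values: list, today: date) -> str:
    """Return "SHIELD" or "PASS" as a first guess for the human to review."""
    lowered = name.lower()
    if any(hint in lowered for hint in ID_NAME_HINTS):
        return "SHIELD"
    if any(hint in lowered for hint in NUMERIC_PII_HINTS) and values \
            and all(looks_numeric(v) for v in values):
        return "SHIELD"
    # A date column where every value is old enough is a probable date of birth.
    if values and all(isinstance(v, date) for v in values) \
            and all(older_than(v, DOB_MIN_AGE_YEARS, today) for v in values):
        return "SHIELD"
    return "PASS"
```

Passing today in as an argument keeps the date rule testable. A single recent date in the column is enough to leave it as PASS, which is what keeps effective_date out of the SHIELD pile.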

But the machine does not get the final word. You do.

The Trick With the Tokens

The obfuscation itself is deterministic. The same input value always produces the same token within a session. "John" becomes first_name_003 every time it appears, in every file, in every column named first_name.

This sounds like a small detail. It is the detail that makes everything work.

If employee 47291 appears in your payroll file, your benefits file, and your census file, they need to be the same token in all three. Otherwise you cannot join the files. You cannot ask Claude to find discrepancies between payroll and benefits for the same person if "the same person" maps to three different tokens.

PII Shield's batch mode handles this with a shared reference table. When you process multiple files together, every ID column shares a global namespace. The same SSN produces the same token whether it appears in column A of file one or column C of file three. Foreign key relationships survive obfuscation intact. Claude can join, filter, group, and compare across files without ever seeing a real name or number.
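A shared reference table is a small idea. The sketch below assumes tokens are handed out by a per-column counter in order of first appearance; the article does not say how PII Shield actually generates them, and the class and method names here are made up for illustration.

```python
from collections import defaultdict


class ReferenceTable:
    """Maps (column name, real value) to one stable token for a whole batch."""

    def __init__(self) -> None:
        # column name -> {real value -> token}
        self._tokens: dict[str, dict[str, str]] = defaultdict(dict)

    def token_for(self, column: str, value: str) -> str:
        seen = self._tokens[column]
        if value not in seen:
            # Assumption: counter-based tokens such as ssn_001, ssn_002, ...
            seen[value] = f"{column}_{len(seen) + 1:03d}"
        return seen[value]

    def shield(self, rows: list[dict], shield_columns: set[str]) -> list[dict]:
        """Return copies of rows with the chosen columns replaced by tokens."""
        return [
            {
                column: self.token_for(column, value) if column in shield_columns else value
                for column, value in row.items()
            }
            for row in rows
        ]


# One table, two files: the same SSN gets the same token in both.
table = ReferenceTable()
payroll = table.shield(
    [{"ssn": "111-22-3333", "gross": "5000"}, {"ssn": "444-55-6666", "gross": "4200"}],
    {"ssn"},
)
benefits = table.shield([{"plan": "PPO", "ssn": "444-55-6666"}], {"ssn"})
```

Because the lookup is keyed by column name and not by position, it does not matter that ssn is the first column in one file and the second in the other.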

The reference file that maps tokens back to real values? Also blocked. The hooks will not let Claude read it. If you need to restore the originals, you run deobfuscate.py in a separate terminal, outside Claude's reach.
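Restoration is the same lookup run backwards. The sketch below is not deobfuscate.py; the reference file's format is not described here, so it assumes a plain token-to-value mapping and uses a hypothetical restore function on CSV text.

```python
import csv
import io


def restore(shielded_csv: str, token_to_value: dict[str, str]) -> str:
    """Swap every known token in a shielded CSV back to its real value."""
    reader = csv.reader(io.StringIO(shielded_csv))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Cells that are not tokens (headers, PASS columns) are left alone.
        writer.writerow([token_to_value.get(cell, cell) for cell in row])
    return out.getvalue()
```

Whatever the real script looks like, the important part is where it runs: in your terminal, with the mapping loaded from a file the hooks keep away from Claude.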

Templates, Because Payroll Files Have a Hundred Columns

Anyone who works with payroll data knows the files are wide. A hundred columns is normal. Two hundred is not unusual. Selecting which columns to shield every time you process a new file would be tedious enough to make the tool useless.

Column templates solve this. After you configure your column selections once, you save them as a template. Next time you process a file with similar columns, the template loads automatically. The GUI remembers your preferences so you don't have to. For teams that process the same file layouts weekly or monthly, this is the difference between a tool you use and a tool you used once and abandoned.

The templates live in ~/.pii-shield/templates/ as JSON files, portable across projects and machines.
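To make the idea concrete, here is one way a template store could work. The JSON layout, the function names, and the 80 percent overlap threshold for "similar columns" are all assumptions for the sake of the sketch; the actual template schema may differ.

```python
import json
from pathlib import Path


def save_template(directory: Path, name: str, selections: dict[str, str]) -> Path:
    """Write column selections (column name -> "SHIELD" or "PASS") as JSON."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{name}.json"
    path.write_text(json.dumps({"columns": selections}, indent=2))
    return path


def find_template(directory: Path, columns: list[str], min_overlap: float = 0.8):
    """Return the saved selections that best cover this file's columns, or None."""
    best, best_score = None, 0.0
    for path in sorted(directory.glob("*.json")):
        saved = json.loads(path.read_text())["columns"]
        # Share of the file's columns that the template already knows about.
        score = len(set(saved) & set(columns)) / max(len(columns), 1)
        if score >= min_overlap and score > best_score:
            best, best_score = saved, score
    return best
```

In use, the directory argument would point at ~/.pii-shield/templates/, for example Path.home() / ".pii-shield" / "templates". A match only pre-fills the dialog; the review and confirm step stays with you.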

What Happens in Practice

Here is what it looks like to use PII Shield:

You open Claude Code in a directory with payroll CSVs. You say: "Compare the January and February census files and flag any employees whose coverage tier changed."

Claude tries to read the first CSV. The hook blocks it. Claude recognizes the block, calls pii_shield_batch, and passes both file paths. A GUI window appears showing every column: employee_id, ssn, first_name, last_name, dob, address, coverage_tier, effective_date. The sensitive columns are pre-classified as SHIELD. Coverage tier and effective date are classified as PASS. You glance at it, confirm, and click the button.

Two shielded files appear in the shielded/ subdirectory. Claude reads them. Names look like first_name_012, SSNs like ssn_012, addresses like address_012. But coverage tier still says "Employee + Spouse" and effective dates are real dates, because those aren't PII. Claude compares the files, finds the tier changes, and reports back using the opaque tokens.

You, the human, can cross-reference those tokens against the reference file whenever you need the real names. Claude cannot.
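The comparison itself needs nothing but the tokens. A minimal sketch, assuming each shielded file has been loaded as a list of row dictionaries and that the hypothetical tier_changes helper joins on the employee_id token:

```python
def tier_changes(january: list[dict], february: list[dict],
                 key: str = "employee_id", field: str = "coverage_tier") -> list[tuple]:
    """Return (token, old_tier, new_tier) for employees whose tier changed."""
    before = {row[key]: row[field] for row in january}
    changes = []
    for row in february:
        old = before.get(row[key])
        # Employees who are new in February have no old tier to compare.
        if old is not None and old != row[field]:
            changes.append((row[key], old, row[field]))
    return changes


january = [
    {"employee_id": "employee_id_012", "coverage_tier": "Employee Only"},
    {"employee_id": "employee_id_013", "coverage_tier": "Family"},
]
february = [
    {"employee_id": "employee_id_012", "coverage_tier": "Employee + Spouse"},
    {"employee_id": "employee_id_013", "coverage_tier": "Family"},
    {"employee_id": "employee_id_014", "coverage_tier": "Family"},
]
changes = tier_changes(january, february)
```

The join works only because employee_id_012 means the same person in both files. That is the deterministic token mapping doing its job.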

Why This Matters Now

HIPAA, SOC 2, state privacy laws, internal compliance policies. The list of reasons you should not feed sensitive data into an AI model is long and growing. But the list of reasons you want to use AI with that data is also long and growing. The spreadsheet is too big. The comparison is too tedious. The pattern is too subtle. You want the brilliant analyst. You just need her to work blindfolded on the sensitive parts.

PII Shield is that blindfold. It sits between Claude and your data, and it enforces a simple rule: the AI sees the structure, the relationships, the patterns, and the answers. It does not see the people.

Install it:

claude plugin add pii-shield@stemado/pii_shield

The repository is on GitHub. MIT licensed. It works with any CSV or Excel file, in any Claude Code project. The only requirement is Python 3.11 and a willingness to let your data stay private.

The best security is the kind that works when you forget. PII Shield is built to be that kind.


Written with AI assistance. Ideas and direction: 100% human.
