
Anyone using synthetic patient data for coding audits and QA? Here's what I found

Fellow coders,

I've been doing pre-bill coding audits for a mid-size physician group
and we recently started using synthetic patient data to QA our coding
logic before it goes to the encoder. Thought I'd share what worked,
since it took me a while to find.

The problem we were trying to solve: we needed realistic patient
records to run through our coding workflow to catch logic errors,
such as the encoder sequencing the principal diagnosis incorrectly
or failing to flag unbundling issues. Using real patient records
for this kind of systematic testing creates compliance risk, and
hand-written records don't have clinical scenarios realistic enough
to catch real-world edge cases.
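
For anyone who wants to try this kind of check, here is a minimal Python sketch of the idea. It is not our actual workflow or any encoder's API: the `SyntheticEncounter` record shape, the `audit` function, and the `BUNDLED_PAIRS` table are all hypothetical, the procedure identifiers are made-up placeholders rather than real NCCI edits, and the diagnosis codes are arbitrary sample values, not coding guidance.

```python
from dataclasses import dataclass


@dataclass
class SyntheticEncounter:
    """Hypothetical record shape; a real dataset will look different."""
    encounter_id: str
    diagnoses: list[str]      # ICD-10-CM codes in the order the encoder sequenced them
    procedures: list[str]     # procedure codes billed together on the claim
    expected_first_dx: str    # diagnosis the auditor expects to be listed first


# Placeholder bundling pairs: (comprehensive code, component code).
# These identifiers are invented for illustration and are NOT real NCCI edits.
BUNDLED_PAIRS = {("PROC-A", "PROC-B")}


def audit(enc, bundled_pairs=BUNDLED_PAIRS):
    """Return a list of human-readable findings for one encounter."""
    findings = []
    if not enc.diagnoses:
        findings.append("sequencing: no diagnoses on the encounter")
    elif enc.diagnoses[0] != enc.expected_first_dx:
        findings.append(
            f"sequencing: first-listed is {enc.diagnoses[0]}, "
            f"expected {enc.expected_first_dx}"
        )
    billed = set(enc.procedures)
    for comprehensive, component in sorted(bundled_pairs):
        # Flag a component code billed alongside its comprehensive code.
        if comprehensive in billed and component in billed:
            findings.append(f"unbundling: {component} billed with {comprehensive}")
    return findings


if __name__ == "__main__":
    sample = SyntheticEncounter(
        encounter_id="E1",
        diagnoses=["E11.9", "F32.A"],
        procedures=["PROC-A", "PROC-B"],
        expected_first_dx="F32.A",
    )
    for finding in audit(sample):
        print(sample.encounter_id, finding)
```

Running every synthetic encounter through a function like this and reviewing the findings gives a repeatable regression check without touching real patient records.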

What I tried first:
- CMS DE-SynPUF: ICD-9 coded, so not useful for testing ICD-10-CM logic
- Random Kaggle datasets: no documentation of coding methodology
- Synthea: generates FHIR records, but the coding is generic, not
specialty-specific
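
A quick way to screen a candidate dataset before investing time in it is to profile the shape of its diagnosis codes. The sketch below is only a rough heuristic under stated assumptions: the function names are hypothetical, the patterns check shape only (not whether a code exists in any code set), and ICD-9 V and E codes start with a letter, so some of them will pass the ICD-10 shape test.

```python
import re
from collections import Counter

# Shape-only patterns. They do not validate codes against an official code set.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.?[0-9A-Z]{1,4})?$")
ICD9_NUMERIC_SHAPE = re.compile(r"^[0-9]{3}(\.?[0-9]{1,2})?$")


def code_shape(code):
    """Classify one diagnosis code by shape, with or without the decimal point."""
    c = code.strip().upper()
    if ICD9_NUMERIC_SHAPE.match(c):
        return "icd9-like"
    if ICD10_SHAPE.match(c):
        # Caveat: ICD-9 V/E codes can also land here.
        return "icd10-like"
    return "unknown"


def profile(codes):
    """Count how many codes fall into each shape bucket."""
    return Counter(code_shape(c) for c in codes)


if __name__ == "__main__":
    print(profile(["4019", "25000", "F32.A", "E11.9"]))
```

If most of a dataset's codes come back as `icd9-like`, it is probably not going to exercise ICD-10-CM logic.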

What worked: patientdatasets.com, which has specialty-specific datasets
(cardiology, orthopedic, mental health, oncology) coded to ICD-10-CM.
The mental health set includes F32.A (depression, unspecified; added
to ICD-10-CM effective October 1, 2021, for FY2022), which most other
synthetic datasets don't have.

Commercial license means no IRB or DUA complications for billing
workflow use.

Not affiliated, just a coder sharing what worked. Has anyone else
found good sources for this kind of testing data?
 