Selective PHI Redaction

Redact only some PHI from a clinical note rather than all PHI found.

Sometimes you only want to redact some of the protected health information (PHI) that the API identifies. For example, when you use patient data internally, such as for model training or storage, you may need to see or use certain PHI while scrubbing the rest.

To do this, we first call the Identify PHI endpoint to find the PHI in our query text, and then run code that redacts everything except the PHI Types we choose to keep.

Step 1: Call the Identify PHI Endpoint

You will need to import and set up the ScienceIO library to run the code. Feel free to replace query_text with your own text to try it on your own data or clinical notes.

# Import and set up the ScienceIO client
from scienceio import ScienceIO
scio = ScienceIO()

# Provide query text containing PHI
query_text = "Jane Doe went to see Dr. House for an outpatient visit. The doctor upon examining her determined she needs to come in for a surgery. The surgery was scheduled to be performed at the Mercy Hospital in New Hope, PA. "

# Call the Identify PHI endpoint
identify_result = scio.identify_phi(query_text)
identify_result

Step 2: Choose Which PHI to Keep Visible

If you already know which PHI Types you want to keep visible, you can skip to Step 3 and specify them there. The following PHI Types are available for redaction and can appear in the response:

  • AGE
  • BIOID
  • CITY
  • COUNTRY
  • DATE
  • DEVICE
  • DOCTOR
  • EMAIL
  • FAX
  • HEALTHPLAN
  • HOSPITAL
  • IDNUM
  • LOCATION-OTHER
  • MEDICALRECORD
  • ORGANIZATION
  • PATIENT
  • PHONE
  • PROFESSION
  • STATE
  • STREET
  • URL
  • USERNAME
  • ZIP
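Because do_not_redact is compared against the label strings exactly, a mistyped entry such as "[HOSPTIAL]" never matches, and that PHI gets redacted without any warning. The sketch below checks a list against the PHI Types above, assuming every label uses the bracketed form seen in the response shown below (e.g. "[PATIENT]"). KNOWN_PHI_TYPES and check_phi_types are hypothetical names, not part of the ScienceIO library.

```python
# PHI Types from the list above, written in the bracketed form used by the
# "label" values in the response (an assumption based on the sample output).
KNOWN_PHI_TYPES = {
    "[AGE]", "[BIOID]", "[CITY]", "[COUNTRY]", "[DATE]", "[DEVICE]",
    "[DOCTOR]", "[EMAIL]", "[FAX]", "[HEALTHPLAN]", "[HOSPITAL]", "[IDNUM]",
    "[LOCATION-OTHER]", "[MEDICALRECORD]", "[ORGANIZATION]", "[PATIENT]",
    "[PHONE]", "[PROFESSION]", "[STATE]", "[STREET]", "[URL]", "[USERNAME]",
    "[ZIP]",
}


def check_phi_types(do_not_redact):
    """Return the entries that are not recognized PHI Types (e.g. typos)."""
    return sorted(set(do_not_redact) - KNOWN_PHI_TYPES)


# "[HOSPTIAL]" is misspelled, so it is reported as unknown.
print(check_phi_types(["[DOCTOR]", "[HOSPTIAL]", "[STATE]"]))  # ['[HOSPTIAL]']
```

An empty result means every entry matches a known PHI Type.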

Otherwise, look at the response and decide which PHI Types you want to keep visible. Each annotation's PHI Type is stored in the label value under labels > phi_type. For the sample text, the response looks like this:

{'input_text': 'Jane Doe went to see Dr. House for an outpatient visit. The doctor upon examining her determined she needs to come in for a surgery. The surgery was scheduled to be performed at the Mercy Hospital in New Hope, PA. ',
 'annotations': [{'labels': {'phi_type': {'label': '[PATIENT]',
     'score': 0.999},
    'category': {'label': '[PERSON]'}},
   'text': 'Jane Doe',
   'span': {'start': 0, 'end': 8}},
  {'labels': {'phi_type': {'label': '[DOCTOR]', 'score': 0.999},
    'category': {'label': '[PERSON]'}},
   'text': 'House',
   'span': {'start': 25, 'end': 30}},
  {'labels': {'phi_type': {'label': '[HOSPITAL]', 'score': 0.999},
    'category': {'label': '[INSTITUTION]'}},
   'text': 'Mercy Hospital',
   'span': {'start': 182, 'end': 196}},
  {'labels': {'phi_type': {'label': '[CITY]', 'score': 0.999},
    'category': {'label': '[LOCATION]'}},
   'text': 'New Hope',
   'span': {'start': 200, 'end': 208}},
  {'labels': {'phi_type': {'label': '[STATE]', 'score': 0.999},
    'category': {'label': '[LOCATION]'}},
   'text': 'PA',
   'span': {'start': 210, 'end': 212}}]}
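A quick way to review a response is to group the detected text by PHI Type. The sketch below hard-codes the annotations from the sample response above, trimmed to the fields it uses, so it runs without an API call; with a live result you would use identify_result["annotations"] instead.

```python
from collections import defaultdict

# Annotations copied from the sample response above, trimmed to the
# fields used here so the sketch runs offline.
annotations = [
    {"labels": {"phi_type": {"label": "[PATIENT]"}}, "text": "Jane Doe"},
    {"labels": {"phi_type": {"label": "[DOCTOR]"}}, "text": "House"},
    {"labels": {"phi_type": {"label": "[HOSPITAL]"}}, "text": "Mercy Hospital"},
    {"labels": {"phi_type": {"label": "[CITY]"}}, "text": "New Hope"},
    {"labels": {"phi_type": {"label": "[STATE]"}}, "text": "PA"},
]

# Group the detected text under its PHI Type label.
texts_by_type = defaultdict(list)
for annotation in annotations:
    label = annotation["labels"]["phi_type"]["label"]
    texts_by_type[label].append(annotation["text"])

# Print one line per PHI Type, e.g. "[PATIENT] ['Jane Doe']".
for label, texts in texts_by_type.items():
    print(label, texts)
```

Seeing the actual text behind each label makes it easier to decide which types are safe to leave visible.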

Step 3: Selectively Redact the Remaining PHI

First, we’ll list the label values to exclude from redaction in a do_not_redact variable. Next, we’ll go through the annotations of the original text in reverse order, from the end of the text to the beginning, and replace each PHI whose type is not in do_not_redact with its label. Working backwards matters because a label such as [PATIENT] is usually a different length from the text it replaces, which shifts every character after it; by editing the end of the text first, the start and end values of the annotations still to be processed stay valid. Then we’ll print the result.
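To see why the order matters, here is a small standalone sketch that uses a made-up sentence with hand-computed offsets rather than an API response. The hypothetical helper replace_spans applies the same slice-and-replace step as the Step 3 code, once from the first span to the last and once from the last span to the first.

```python
def replace_spans(text, spans, reverse):
    """Replace (start, end, label) spans, whose offsets refer to the original
    text, with their labels. Spans must be sorted by start; reverse=True
    applies them from the end of the text backwards."""
    ordered = reversed(spans) if reverse else spans
    for start, end, label in ordered:
        text = text[:start] + label + text[end:]
    return text


text = "Jane Doe lives in New Hope."
# Offsets in the original sentence: "Jane Doe" is 0-8 and "New Hope" is 18-26.
spans = [(0, 8, "[PATIENT]"), (18, 26, "[CITY]")]

print(replace_spans(text, spans, reverse=True))   # [PATIENT] lives in [CITY].
print(replace_spans(text, spans, reverse=False))  # [PATIENT] lives in[CITY]e.
```

In the forward pass, "[PATIENT]" is one character longer than "Jane Doe", so the second replacement lands one character too early and leaves a stray "e" behind.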

In this example, we are choosing NOT to redact the following PHI Types:

  • DOCTOR
  • HOSPITAL
  • STATE

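If you wish, change these to PHI Types found in your own data or clinical notes. The full list of possible PHI Types is shown in Step 2. Note that the label values in the response include square brackets, so entries in do_not_redact must be written as "[DOCTOR]", not "DOCTOR".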

# DO NOT REDACT list
do_not_redact = ["[DOCTOR]","[HOSPITAL]","[STATE]"]

# Redact the text
redacted_text = identify_result["input_text"]
for annotation in reversed(identify_result["annotations"]):
    phi_type = annotation["labels"]["phi_type"]["label"]

    if phi_type not in do_not_redact:

        start = annotation["span"]["start"]
        end = annotation["span"]["end"]
        redacted_text = redacted_text[:start] + phi_type + redacted_text[end:]

print(redacted_text)

The selectively redacted text looks like this:

[PATIENT] went to see Dr. House for an outpatient visit. The doctor upon examining her determined she needs to come in for a surgery. The surgery was scheduled to be performed at the Mercy Hospital in [CITY], PA.
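If you redact many notes, the Step 3 loop can be wrapped in a reusable function. The sketch below is an illustration, not part of the ScienceIO library; selectively_redact and make_annotation are hypothetical helpers. Unlike the loop above, it sorts annotations by span start before working backwards, so it does not rely on the API returning them in text order (the sample response is in text order, but whether that always holds is an assumption). The sample data reproduces the Step 2 response, trimmed to the fields used, with the annotations listed out of order on purpose.

```python
def selectively_redact(identify_result, do_not_redact):
    """Replace each annotated PHI span with its label (e.g. "[PATIENT]"),
    except for PHI Types listed in do_not_redact."""
    keep = set(do_not_redact)
    text = identify_result["input_text"]
    # Process spans from the end of the text to the start so that each
    # replacement leaves the offsets of the remaining spans unchanged.
    for annotation in sorted(
        identify_result["annotations"],
        key=lambda a: a["span"]["start"],
        reverse=True,
    ):
        label = annotation["labels"]["phi_type"]["label"]
        if label not in keep:
            start = annotation["span"]["start"]
            end = annotation["span"]["end"]
            text = text[:start] + label + text[end:]
    return text


def make_annotation(label, text, start, end):
    """Build one annotation in the shape of the sample response."""
    return {
        "labels": {"phi_type": {"label": label}},
        "text": text,
        "span": {"start": start, "end": end},
    }


sample_result = {
    "input_text": (
        "Jane Doe went to see Dr. House for an outpatient visit. "
        "The doctor upon examining her determined she needs to come in for a surgery. "
        "The surgery was scheduled to be performed at the Mercy Hospital in New Hope, PA. "
    ),
    # Same annotations as the sample response, deliberately out of order.
    "annotations": [
        make_annotation("[CITY]", "New Hope", 200, 208),
        make_annotation("[PATIENT]", "Jane Doe", 0, 8),
        make_annotation("[STATE]", "PA", 210, 212),
        make_annotation("[DOCTOR]", "House", 25, 30),
        make_annotation("[HOSPITAL]", "Mercy Hospital", 182, 196),
    ],
}

print(selectively_redact(sample_result, ["[DOCTOR]", "[HOSPITAL]", "[STATE]"]))
```

With the same do_not_redact list as Step 3, this prints the selectively redacted text shown above. Passing the live identify_result instead of sample_result should work the same way, provided the response has the shape shown in Step 2.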