Full PHI Redaction for Multiple Clinical Notes

Redact all PHI from a set of clinical notes.

You can redact protected health information (PHI) from any number of clinical notes, as long as each API call contains no more than 10,000 characters. In this example, we will redact the PHI from a set of five clinical notes and display the results in two different ways.
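
If a note might exceed the 10,000-character limit, one option is to split it into smaller pieces before calling the endpoint. The sketch below is a hypothetical helper, not part of the ScienceIO library. It assumes the limit applies to the text passed in a single call, and it breaks on whitespace so that words are not cut in half.

```python
MAX_CHARS = 10_000  # per-call character limit stated above


def split_note(text, max_chars=MAX_CHARS):
    """Split text into chunks of at most max_chars, breaking on whitespace.

    Hypothetical helper for illustration; not part of the ScienceIO library.
    Runs of whitespace (including newlines) are collapsed to single spaces.
    """
    chunks = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this word would exceed the limit, so start a new chunk
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent in its own call. Splitting may separate a piece of PHI from the context around it, so it is worth reviewing the redacted output of split notes.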

Step 1: Install ScienceIO and Pandas

Use this code if you do not have pandas and ScienceIO installed on your machine, or if you are starting a new Jupyter notebook. Otherwise, skip to Step 2.

# Install scienceio and pandas
pip install scienceio
pip install pandas

Step 2: Call the Redact PHI Endpoint

You will need to import and set up the ScienceIO library and pandas (under the pd alias) to successfully run the code. You can replace the strings in clinical_notes with your own strings, if desired.

# Import and set up required packages
import pandas as pd
from scienceio import ScienceIO

# Create the ScienceIO client
scio = ScienceIO()

# Provide the clinical notes with unredacted PHI
clinical_notes = [
    "Patient: Jack Ryan, Address: 789 Blue Ridge Parkway, Arlington, VA 22202, Phone: 917-235-2351, Date of birth: 05/15/1972",
    "Dr. Robert House consulted on the case.",
    "Amy Smith had gallbladder surgery on 8/12/2022 and will follow up with Dr. Riley in 6 weeks.",
    "Hospital: St. Mary's Hospital, San Francisco, CA, Admission Date: 04/15/2020",
    "Medical Record Number: 311612351634, Diagnosis: Hypertension",
]

# Loop through each note and call the Redact PHI endpoint
redacted_responses = []

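for text in clinical_notes:
    response = scio.redact_phi(text)
    # Store each response so the results stay in the original note order
    redacted_responses.append(response)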
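
If one call fails, a plain loop stops and the remaining notes are never processed. The sketch below shows one way to keep going while preserving the original order. The name redact_all is hypothetical, and because the exception types raised by the ScienceIO client are not described here, the sketch catches Exception broadly as an assumption.

```python
def redact_all(notes, redact_fn):
    """Call redact_fn on each note, keeping results aligned with the input order.

    redact_fn is any callable that takes a string and returns a response,
    for example scio.redact_phi. A failed call is recorded as None.
    """
    responses = []
    for idx, text in enumerate(notes):
        try:
            responses.append(redact_fn(text))
        except Exception as exc:  # assumption: the client's exception types are unknown
            print(f"Note {idx + 1} could not be redacted: {exc}")
            responses.append(None)
    return responses
```

It could be called as redacted_responses = redact_all(clinical_notes, scio.redact_phi). Any code that later reads the responses would then need to skip entries that are None.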

Step 3: View the Results

Option 1: Create an Ordered List

In this option, we will look at the index of each clinical note, display the redacted notes in their original order, and include the original note as a reference.

# Display redacted notes alongside originals, in original order
for idx, response in enumerate(redacted_responses):
    print(f"Original Note {idx + 1}:\n{clinical_notes[idx]}\n")
    print(f"Redacted Note {idx + 1}:\n{response['output_text']}\n")
    print('-' * 50)

The result looks like this:

Full Redaction Results

Option 2: Use a Dataframe to Create a Table

In this option, we will use the pandas json_normalize function to display the results in tabular format. Each redacted clinical note appears in its original order, followed by a table with the full details of its PHI redactions.

# Display redacted notes in order, with a full table for PHI
for idx, response in enumerate(redacted_responses):
    print(f"Redactions for Note {idx + 1}:\n", response["output_text"])
    # Flatten the annotations into a table and print it
    df = pd.json_normalize(response['annotations'])
    print(df)
    print('-' * 50)

The result looks like this (image truncated due to length):

Full Redaction Results
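
To view the redactions for every note in a single table rather than one table per note, the annotations can first be gathered into one list. The sketch below is a hypothetical helper; it assumes each response's 'annotations' value is a list of dictionaries, as the json_normalize call above implies, and it adds a "note" key (a name chosen here for illustration) so each row can be traced back to its note.

```python
def collect_annotations(responses):
    """Return one flat list of annotation dicts, each tagged with its note number.

    Hypothetical helper; assumes response["annotations"] is a list of dicts.
    """
    rows = []
    for idx, response in enumerate(responses):
        for annotation in response["annotations"]:
            # Copy so the original response is left unchanged
            row = dict(annotation)
            row["note"] = idx + 1  # 1-based, matching the printed note numbers
            rows.append(row)
    return rows
```

The returned list can be passed to pd.json_normalize in the same way as a single note's annotations, for example all_df = pd.json_normalize(collect_annotations(redacted_responses)).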