Identify_PHI

The identify_phi endpoint identifies all protected health information (PHI) in your text and categorizes it.

When you submit text to the ScienceIO API via this endpoint, our AI analyzes the query text, identifies and categorizes all protected health information (PHI), and returns a JSON response with that information.

Remember that only information with identifiers tying it to an individual is considered PHI.

How to Access the Endpoint

For additional help with API calls, see Make an API Call (Python SDK) or Make an API Call (HTTP).

Python SDK

First, make sure you are using the latest version of the SDK; the endpoint will not work on versions prior to 2.0.0. You can use this command to upgrade:

pip install scienceio --upgrade
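To confirm which version is installed before calling the endpoint, a short check along these lines may help. It is a minimal sketch: the helper names (parse_version, supports_identify_phi, installed_sdk_version) are hypothetical and not part of the SDK, the 2.0.0 threshold comes from the paragraph above, and the version parsing is deliberately naive (it ignores pre-release suffixes).

```python
from importlib import metadata
from itertools import takewhile

MIN_VERSION = (2, 0, 0)  # minimum SDK version stated in this guide


def parse_version(version):
    """Turn a version string such as '2.0.1' into a tuple of ints: (2, 0, 1)."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits of each piece ("0rc1" -> "0").
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad short versions such as "2" to (2, 0, 0)
        parts.append(0)
    return tuple(parts)


def supports_identify_phi(version):
    """Return True if the given version string is 2.0.0 or later."""
    return parse_version(version) >= MIN_VERSION


def installed_sdk_version():
    """Return the installed scienceio version string, or None if not installed."""
    try:
        return metadata.version("scienceio")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_sdk_version()
    if version is None:
        print("scienceio is not installed")
    elif supports_identify_phi(version):
        print(f"scienceio {version} supports identify_phi")
    else:
        print(f"scienceio {version} is too old; upgrade to 2.0.0 or later")
```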

After initializing the ScienceIO client, use scio.identify_phi() to submit the request:

from scienceio import ScienceIO
scio = ScienceIO()

query_text = """Patient: John Doe
Address: 112 First Ave, New York, NY
Phone: 555-555-1212
Admission date: December 13, 2022
Diagnosis: UTI
Physician: Dr. Jane Smith
NPI: 1234567890
Physician number: 555-555-9876
Clinical note:
Mr. Doe is a 75-year-old male with a history of urinary tract infections.
He presented to the Pearson clinic today with symptoms of dysuria and frequency.
A urine culture was performed by Dr. Jones and showed significant growth of Escherichia coli.
The patient was started on a course of oral antibiotics and will follow up with
the clinic in one week for a repeat urine culture.
If no improvement, patient will be referred to St. Joseph Hospital."""

# Call the identify_phi endpoint with the sample text
response = scio.identify_phi(query_text)

print(response)

Optional:

Format the response to be more readable, as is seen in the sample JSON response on this page. Use the following code instead of print(response):

# Format the JSON response and print
# Use instead of print(response)
import json
print(json.dumps(response, indent=2))

HTTP

After configuring your environment variables, submit a POST request to the identify-phi endpoint, passing your text in the input_text field of the JSON body.

The change to input_text is part of a larger standardization of underlying schemas. It will be rolled out to other endpoints in future releases.

curl https://api.aws.science.io/v2/identify-phi \
  --request POST \
  --header "Content-type: application/json" \
  --header "x-api-id: $SCIENCEIO_KEY_ID" \
  --header "x-api-secret: $SCIENCEIO_KEY_SECRET" \
  --data '{ "input_text": "Patient: John Doe\nAddress: 112 First Ave, New York, NY\nPhone: 555-555-1212\nAdmission date: December 13, 2022\nDiagnosis: UTI\nPhysician: Dr. Jane Smith\nNPI: 1234567890\nPhysician number: 555-555-9876\nClinical note:\nMr. Doe is a 75-year-old male with a history of urinary tract infections.\nHe presented to the Pearson clinic today with symptoms of dysuria and frequency.\nA urine culture was performed by Dr. Jones and showed significant growth of Escherichia coli.\nThe patient was started on a course of oral antibiotics and will follow up with\nthe clinic in one week for a repeat urine culture.\nIf no improvement, patient will be referred to St. Joseph Hospital."}'

Make sure your GET request also uses the identify-phi endpoint in its URL (line 1 of the command):

curl https://api.aws.science.io/v2/identify-phi/<REQUEST_ID> \
  --request GET \
  --header "x-api-id: $SCIENCEIO_KEY_ID" \
  --header "x-api-secret: $SCIENCEIO_KEY_SECRET"

For additional help with HTTP configuration, POST requests, or GET requests, see Make an API Call (HTTP).
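The same POST and GET requests can also be assembled in Python with the standard library. The sketch below only builds the request objects and does not send them. The URL, header names, and the SCIENCEIO_KEY_ID and SCIENCEIO_KEY_SECRET environment variables are taken from the cURL commands above; the function names are hypothetical. Building the body with json.dumps escapes newlines and quotation marks for you.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.aws.science.io/v2/identify-phi"


def auth_headers(key_id, key_secret):
    """Headers carrying the API credentials, as in the cURL examples."""
    return {"x-api-id": key_id, "x-api-secret": key_secret}


def build_post_request(text, key_id, key_secret):
    """Build (but do not send) the POST request that submits text."""
    body = json.dumps({"input_text": text}).encode("utf-8")
    headers = {"Content-type": "application/json"}
    headers.update(auth_headers(key_id, key_secret))
    return urllib.request.Request(BASE_URL, data=body, headers=headers, method="POST")


def build_get_request(request_id, key_id, key_secret):
    """Build (but do not send) the GET request for a given request ID."""
    url = f"{BASE_URL}/{request_id}"
    return urllib.request.Request(url, headers=auth_headers(key_id, key_secret), method="GET")


if __name__ == "__main__":
    key_id = os.environ.get("SCIENCEIO_KEY_ID", "")
    key_secret = os.environ.get("SCIENCEIO_KEY_SECRET", "")
    request = build_post_request("Patient: John Doe\nPhone: 555-555-1212", key_id, key_secret)
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)
```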

JSON Response

The JSON response lists each piece of PHI found (text), its location in the input text as start and end character offsets (span), the type of PHI it is (phi_type), and the broader PHI category (category) that type belongs to. Each phi_type also carries a score, which is the confidence our API's model has in the label it selected.

{
  "input_text": "Patient: John Doe\nAddress: 112 First Ave, New York, NY\nPhone: 555-555-1212\nAdmission date: December 13, 2022\nDiagnosis: UTI\nPhysician: Dr. Jane Smith\nNPI: 1234567890\nPhysician number: 555-555-9876\nClinical note:\nMr. Doe is a 75-year-old male with a history of urinary tract infections.\nHe presented to the Pearson clinic today with symptoms of dysuria and frequency.\nA urine culture was performed by Dr. Jones and showed significant growth of Escherichia coli.\nThe patient was started on a course of oral antibiotics and will follow up with\nthe clinic in one week for a repeat urine culture.\nIf no improvement, patient will be referred to St. Joseph Hospital.",
  "annotations": [
    {
      "labels": {
        "phi_type": {
          "label": "[PATIENT]",
          "score": 1.0
        },
        "category": {
          "label": "[PERSON]"
        }
      },
      "text": "John Doe",
      "span": {
        "start": 9,
        "end": 17
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[STREET]",
          "score": 0.998
        },
        "category": {
          "label": "[LOCATION]"
        }
      },
      "text": "112 First Ave",
      "span": {
        "start": 27,
        "end": 40
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[CITY]",
          "score": 0.94
        },
        "category": {
          "label": "[LOCATION]"
        }
      },
      "text": "New York",
      "span": {
        "start": 42,
        "end": 50
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[STATE]",
          "score": 0.999
        },
        "category": {
          "label": "[LOCATION]"
        }
      },
      "text": "NY",
      "span": {
        "start": 52,
        "end": 54
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[PHONE]",
          "score": 0.983
        },
        "category": {
          "label": "[CONTACT]"
        }
      },
      "text": "555-555-1212",
      "span": {
        "start": 62,
        "end": 74
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[DATE]",
          "score": 1.0
        },
        "category": {
          "label": "[DATE]"
        }
      },
      "text": "December 13, 2022",
      "span": {
        "start": 91,
        "end": 108
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[DOCTOR]",
          "score": 0.999
        },
        "category": {
          "label": "[PERSON]"
        }
      },
      "text": "Jane Smith",
      "span": {
        "start": 139,
        "end": 149
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[MEDICALRECORD]",
          "score": 0.829
        },
        "category": {
          "label": "[IDENTIFIER]"
        }
      },
      "text": "1234567890",
      "span": {
        "start": 155,
        "end": 165
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[PHONE]",
          "score": 0.843
        },
        "category": {
          "label": "[CONTACT]"
        }
      },
      "text": "555-555-9876",
      "span": {
        "start": 184,
        "end": 196
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[PATIENT]",
          "score": 1.0
        },
        "category": {
          "label": "[PERSON]"
        }
      },
      "text": "Doe",
      "span": {
        "start": 216,
        "end": 219
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[AGE]",
          "score": 0.999
        },
        "category": {
          "label": "[DEMOGRAPHICS]"
        }
      },
      "text": "75",
      "span": {
        "start": 225,
        "end": 227
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[HOSPITAL]",
          "score": 0.996
        },
        "category": {
          "label": "[INSTITUTION]"
        }
      },
      "text": "Pearson clinic",
      "span": {
        "start": 306,
        "end": 320
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[DOCTOR]",
          "score": 0.999
        },
        "category": {
          "label": "[PERSON]"
        }
      },
      "text": "Jones",
      "span": {
        "start": 404,
        "end": 409
      }
    },
    {
      "labels": {
        "phi_type": {
          "label": "[HOSPITAL]",
          "score": 0.999
        },
        "category": {
          "label": "[INSTITUTION]"
        }
      },
      "text": "St. Joseph Hospital",
      "span": {
        "start": 639,
        "end": 658
      }
    }
  ]
}
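In the sample above, each span is a zero-based character range into input_text with an exclusive end, so input_text[start:end] reproduces the text field. The sketch below uses that to replace each PHI span with its phi_type label. The sample dictionary is a trimmed copy of the response above, and the function names (span_text, redact) are illustrative only, not part of the SDK.

```python
def span_text(response, annotation):
    """Return the slice of input_text that an annotation's span points at."""
    span = annotation["span"]
    return response["input_text"][span["start"]:span["end"]]


def redact(response):
    """Replace every annotated span with its phi_type label."""
    text = response["input_text"]
    # Work from the last span to the first so earlier offsets stay valid.
    ordered = sorted(response["annotations"], key=lambda a: a["span"]["start"], reverse=True)
    for annotation in ordered:
        start = annotation["span"]["start"]
        end = annotation["span"]["end"]
        label = annotation["labels"]["phi_type"]["label"]
        text = text[:start] + label + text[end:]
    return text


# Trimmed copy of the sample response (first two annotations only).
sample = {
    "input_text": "Patient: John Doe\nAddress: 112 First Ave, New York, NY",
    "annotations": [
        {
            "labels": {"phi_type": {"label": "[PATIENT]", "score": 1.0},
                       "category": {"label": "[PERSON]"}},
            "text": "John Doe",
            "span": {"start": 9, "end": 17},
        },
        {
            "labels": {"phi_type": {"label": "[STREET]", "score": 0.998},
                       "category": {"label": "[LOCATION]"}},
            "text": "112 First Ave",
            "span": {"start": 27, "end": 40},
        },
    ],
}

if __name__ == "__main__":
    print(redact(sample))
```

Only annotated spans are replaced, so any PHI the model did not label (here, "New York" and "NY" are left out of the trimmed sample) remains in the output.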

View the Results in a Table

A table view can make it easier to interpret the results. Simply use pandas to create a DataFrame.

# Use pandas to view the results in a table.
import pandas as pd
df = pd.json_normalize(response['annotations'])
df

PHI Labels

PHI labels are broken down by phi_type (the PHI identifier assigned to the text) and category (the broader PHI category assigned to the text).

phi_type

The following PHI types are possible:

* AGE
* BIOID
* CITY
* COUNTRY
* DATE
* DEVICE
* DOCTOR
* EMAIL
* FAX
* HEALTHPLAN
* HOSPITAL
* IDNUM
* LOCATION-OTHER
* MEDICALRECORD
* ORGANIZATION
* PATIENT
* PHONE
* PROFESSION
* STATE
* STREET
* URL
* USERNAME
* ZIP

category

The following categories are possible:

* CONTACT
* DATE
* DEMOGRAPHICS
* IDENTIFIER
* INSTITUTION
* LOCATION
* ORGANIZATION
* PERSON
* WEBADDRESS

Mappings

Each category currently includes the following phi_type values:

Category        PHI Type
CONTACT         EMAIL, FAX, PHONE, USERNAME
DATE            DATE
DEMOGRAPHICS    AGE, PROFESSION
IDENTIFIER      BIOID, DEVICE, HEALTHPLAN, IDNUM, MEDICALRECORD
INSTITUTION     HOSPITAL
LOCATION        CITY, COUNTRY, STATE, STREET, ZIP, LOCATION-OTHER
ORGANIZATION    ORGANIZATION
PERSON          DOCTOR, PATIENT
WEBADDRESS      URL
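If you need these mappings in code, they can be written as a dictionary with a reverse lookup. This is a sketch based on the lists on this page, not an object exported by the SDK. HOSPITAL is placed under INSTITUTION, following the sample response, and the names CATEGORY_TO_PHI_TYPES and category_for are hypothetical.

```python
# Category -> phi_type values, transcribed from the lists on this page.
CATEGORY_TO_PHI_TYPES = {
    "CONTACT": ["EMAIL", "FAX", "PHONE", "USERNAME"],
    "DATE": ["DATE"],
    "DEMOGRAPHICS": ["AGE", "PROFESSION"],
    "IDENTIFIER": ["BIOID", "DEVICE", "HEALTHPLAN", "IDNUM", "MEDICALRECORD"],
    "INSTITUTION": ["HOSPITAL"],
    "LOCATION": ["CITY", "COUNTRY", "STATE", "STREET", "ZIP", "LOCATION-OTHER"],
    "ORGANIZATION": ["ORGANIZATION"],
    "PERSON": ["DOCTOR", "PATIENT"],
    "WEBADDRESS": ["URL"],
}

# Reverse lookup: phi_type -> category.
PHI_TYPE_TO_CATEGORY = {
    phi_type: category
    for category, phi_types in CATEGORY_TO_PHI_TYPES.items()
    for phi_type in phi_types
}


def category_for(label):
    """Look up the category for a phi_type label, with or without brackets."""
    return PHI_TYPE_TO_CATEGORY.get(label.strip("[]"))


if __name__ == "__main__":
    print(category_for("[PATIENT]"))
```

The lookup accepts labels in the bracketed form used in the JSON response (for example, "[PATIENT]") as well as the bare form used in the lists above.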

Troubleshooting

NameError or AttributeError

Check your code and the endpoint to be sure everything is correct.

  • Check that you have included your API keys.
  • In Python, make sure the SDK is version 2.0.0 or later; the endpoint will not work on earlier versions.
  • In cURL, make sure the URL contains "v2" and that the endpoint name uses a dash, not an underscore (https://api.aws.science.io/v2/identify-phi).

If these steps fail, you may wish to generate new API keys from the ScienceIO dashboard and try again. Note, however, that your old API keys will no longer work. Be sure they are not being used in production (or that you are prepared to update your keys) before you generate new ones.

SyntaxError

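This error is usually caused by apostrophes in the query text, most often in contractions such as "doesn't". An apostrophe inside text that is enclosed in single quotation marks ends the string early.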

  • Manually remove the apostrophes in your query text
  • Write code to automatically clean up your query text and remove apostrophes

If this does not resolve the issue, make sure you have enclosed your query text with quotation marks (single or double).
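The automatic cleanup suggested above can be as small as the following sketch. The function name strip_apostrophes is hypothetical, and it removes only the straight apostrophe character.

```python
def strip_apostrophes(text):
    """Remove straight apostrophes so the text cannot end a quoted string early."""
    return text.replace("'", "")


if __name__ == "__main__":
    raw = "Patient doesn't report pain. Mr. Doe's culture was positive."
    print(strip_apostrophes(raw))
```

Because characters are removed, the span offsets in the response refer to the cleaned text you submitted, not to the original text.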

Incomplete or No PHI Identification

PHI should be identified in all text that contains it. If this does not happen:

  • Make sure your text includes unredacted PHI.
  • Make sure you have included a variety of PHI types.
  • Make sure the information is actually PHI; medications like Benadryl, for example, are not PHI.

Remember that only information with identifiers tying it to an individual is considered PHI.

  • "Type II diabetes" is health information, not PHI.
  • "Sally Jones has type II diabetes" includes a patient identifier and therefore is PHI; removing the patient name turns it back into health information.

If the issue persists, contact support so that we can investigate.