[BETA] Structure with Ontologies

Our API can now return related ontologies along with the UMLS codes in each JSON response.

The API's enhanced Structure model (structure-ontologies) automatically maps UMLS codes to a select set of ontologies, so you no longer have to look up each UMLS code manually to find its related ontologies.

❗️

The endpoint is currently available only via HTTP or the Python requests library. SDK support is coming soon!

Note that we cannot guarantee backward compatibility for any beta endpoint. API signatures and response schemas are subject to change in future releases.

Why Try It?

Use this new endpoint if you want to:

  • See if ScienceIO maps your data to the ontologies you care about
  • See how many ontologies ScienceIO can find for an identified piece of healthcare data

Supported Ontologies

In this first phase, the following ontologies are identified and supported:

  • CPT
  • HCPCS
  • ICD-10
  • SNOMED-CT

The API uses a Knowledge Graph to connect healthcare concepts to ontologies. The ontologies above are mapped using UMLS as the primary ontology.

📘

These are not the only ontologies the API uses.

Additional ontologies such as ChEMBL, NCIT, RxNorm, GeneID, Cell Line, and LOINC are still used by the API to identify and structure healthcare information, but they are not yet supported by this feature. For more information about ontologies, see our Ontologies page.

How to Access the New Endpoint

In this release, Python users can access the new endpoint with the requests library. HTTP users need to change the endpoint in their code to structure-ontologies to receive the ontology mappings.

HTTP

Updating the endpoint to structure-ontologies (line one in the example below) is optional and reversible; you may continue to call the current structure endpoint.

curl https://api.aws.science.io/v2/structure-ontologies \
  --request POST \
  --header "Content-type: application/json" \
  --header "x-api-id: $SCIENCEIO_KEY_ID" \
  --header "x-api-secret: $SCIENCEIO_KEY_SECRET" \
  --data '{ "text": "ALS is often called Lou Gehrigs disease, after the baseball player who was diagnosed with it. Doctors usually do not know why ALS occurs."}'

📘

For additional help, see Make an API Call (HTTP).

Requests Library (Python)

First, create a mini SDK for the API that includes your API keys. It polls for results with exponential backoff, and ScienceIO recommends a maximum of 8 attempts. Make sure you add your API keys to the appropriate variables (the last two lines of this code sample).

👍

When you have finished and tested this piece, go to the Examples section to learn how to make your API calls.

import time
import requests

MAX_ATTEMPTS = 8
INITIAL_TIMEOUT_SECS = 1

def get_result_with_exponential_backoff(base_url: str, request_id: str, api_key_id: str, api_key_secret: str):
  """Poll for the inference result, doubling the wait after each unfinished attempt."""
  url = f"{base_url}/{request_id}"
  headers = {
    "Content-Type": "application/json",
    "x-api-id": api_key_id,
    "x-api-secret": api_key_secret,
  }

  current_timeout = INITIAL_TIMEOUT_SECS

  for attempt in range(MAX_ATTEMPTS):
    response = requests.get(url, headers=headers)
    response.raise_for_status()

    # inference_result stays empty until the inference has finished.
    inference_result = response.json().get("inference_result")
    if inference_result is not None:
      return inference_result

    # Wait before the next attempt; there is no need to wait after the last one.
    if attempt < MAX_ATTEMPTS - 1:
      time.sleep(current_timeout)
      current_timeout *= 2

  raise TimeoutError("Number of attempts exhausted, try again later")

def call_short_async(model: str, text: str, api_key_id: str, api_key_secret: str):
  """Submit text to a model endpoint and return its inference_result."""
  url = f"https://api.aws.science.io/v2/{model}"
  headers = {
    "x-api-id": api_key_id,
    "x-api-secret": api_key_secret,
    "Content-Type": "application/json",
  }
  response = requests.post(url, headers=headers, json={"text": text})

  # A successful submission returns 201 with a request_id to poll.
  if response.status_code != 201:
    raise Exception(f"Request failed with status code {response.status_code} and reason: {response.reason}")

  request_id = response.json()["request_id"]

  return get_result_with_exponential_backoff(url, request_id, api_key_id, api_key_secret)

# Add your ScienceIO API keys here.
api_key_id = "<YOUR_API_KEY_ID>"
api_key_secret = "<YOUR_API_SECRET_KEY>"
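
To see what the recommended limit of 8 attempts means in practice, the short sketch below computes the delays between consecutive polling attempts, assuming the MAX_ATTEMPTS and INITIAL_TIMEOUT_SECS values from the mini SDK. The helper name backoff_delays is hypothetical and is not part of the ScienceIO API.

```python
# Values copied from the mini SDK above.
MAX_ATTEMPTS = 8
INITIAL_TIMEOUT_SECS = 1

def backoff_delays(max_attempts: int, initial_timeout: int) -> list:
    """Return the sleep durations between consecutive polling attempts."""
    # There is one delay fewer than attempts: each delay sits between two attempts.
    return [initial_timeout * 2 ** i for i in range(max_attempts - 1)]

delays = backoff_delays(MAX_ATTEMPTS, INITIAL_TIMEOUT_SECS)
print(delays)       # [1, 2, 4, 8, 16, 32, 64]
print(sum(delays))  # 127
```

In other words, the eighth attempt happens roughly two minutes (127 seconds plus request time) after the first, which is a reasonable upper bound for a short text.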

Examples

Make a Basic Call

Use the following code to make a call to the structure-ontologies endpoint with your healthcare text.

# Basic call to the endpoint. Replace the text with yours.

response = call_short_async("structure-ontologies", "The patient presents with a complaint of allergies to ragweed.", api_key_id, api_key_secret)
response

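The full JSON response from the API looks like this; note the new ontologies dictionary in each span. (call_short_async returns only the inference_result object from this response.)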

{
  "request_id": "0cefa1a8-2320-4684-9a3b-2c081dfb9e3b",
  "inference_result": {
    "text": "The patient presents with a complaint of allergies to ragweed.",
    "spans": [
      {
        "concept_id": "UMLS:C0086418",
        "concept_name": "Homo sapiens",
        "concept_type": "Species & Viruses",
        "pos_end": 11,
        "pos_start": 4,
        "score_id": 0.999993085861206,
        "score_type": 0.9999990463256836,
        "text": "patient",
        "ontologies": {
          "SNOMEDCT_US": [
            {
              "aui": "A2882492",
              "code": "337915000",
              "name": "Homo sapiens"
            },
            {
              "aui": "A3497260",
              "code": "278412004",
              "name": "Human - origin"
            }
          ]
        }
      },
      {
        "concept_id": "UMLS:C0020517",
        "concept_name": "Hypersensitivity",
        "concept_type": "Medical Conditions",
        "pos_end": 50,
        "pos_start": 41,
        "score_id": 0.6348877549171448,
        "score_type": 0.9998083710670471,
        "text": "allergies",
        "ontologies": {
          "ICD10": [
            {
              "aui": "A0244105",
              "code": "T78.4",
              "name": "Allergy, unspecified"
            }
          ],
          "SNOMEDCT_US": [
            {
              "aui": "A10863724",
              "code": "421961002",
              "name": "Hypersensitivity reaction"
            },
            {
              "aui": "A9379243",
              "code": "418634005",
              "name": "Allergic reaction to substance"
            }
          ]
        }
      },
      {
        "concept_id": "UMLS:C0946568",
        "concept_name": "Ambrosia artemisiifolia",
        "concept_type": "Species & Viruses",
        "pos_end": 61,
        "pos_start": 54,
        "score_id": 0.8726935386657715,
        "score_type": 0.6667567491531372,
        "text": "ragweed",
        "ontologies": {
          "SNOMEDCT_US": [
            {
              "aui": "A24089354",
              "code": "41020006",
              "name": "Ambrosia artemisiifolia"
            }
          ]
        }
      }
    ]
  },
  "model_type": "structure-ontologies",
  "inference_status": "COMPLETED",
  "message": "Your inference results are ready."
}
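
One reason to try this endpoint is to see how many ontologies ScienceIO can find for an identified piece of healthcare data. The sketch below counts, for each span, how many ontologies and how many codes it maps to. It assumes you pass it the inference_result object (what call_short_async returns) shaped like the response above, and that a span with no mappings has an empty or missing ontologies dictionary; summarize_ontologies is a hypothetical helper, not part of the API.

```python
def summarize_ontologies(inference_result: dict) -> list:
    """For each span, report how many ontologies and codes it maps to."""
    summary = []
    for span in inference_result["spans"]:
        # Assumption: spans without mappings have an empty or missing "ontologies" dict.
        ontologies = span.get("ontologies") or {}
        summary.append({
            "text": span["text"],
            "concept_id": span["concept_id"],
            "ontology_count": len(ontologies),
            "code_count": sum(len(entries) for entries in ontologies.values()),
        })
    return summary

# Trimmed copy of two spans from the response above.
sample = {
    "spans": [
        {
            "text": "allergies",
            "concept_id": "UMLS:C0020517",
            "ontologies": {
                "ICD10": [{"aui": "A0244105", "code": "T78.4", "name": "Allergy, unspecified"}],
                "SNOMEDCT_US": [
                    {"aui": "A10863724", "code": "421961002", "name": "Hypersensitivity reaction"},
                    {"aui": "A9379243", "code": "418634005", "name": "Allergic reaction to substance"},
                ],
            },
        },
        {
            "text": "ragweed",
            "concept_id": "UMLS:C0946568",
            "ontologies": {
                "SNOMEDCT_US": [{"aui": "A24089354", "code": "41020006", "name": "Ambrosia artemisiifolia"}],
            },
        },
    ]
}

for row in summarize_ontologies(sample):
    print(row["text"], row["ontology_count"], row["code_count"])
# allergies 2 3
# ragweed 1 1
```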

To format these results into a table, use pandas (see the next example for a more detailed look at pandas):

# Pandas can help you better analyze the response.
import pandas as pd
df = pd.DataFrame(response['spans'])
df

Extract Concepts that Map to a Specific Ontology Using Pandas

In this example, we'll take some sample text and extract the ICD-10 codes.

First, let's input some sample healthcare text (make sure you have pandas installed before beginning):

# First, input the text. Replace it with your own.
import pandas as pd

text = """Patient is a 35-year-old male with a history of hypertension and
obesity. He presents today with chest pain and shortness of breath.
Physical examination reveals elevated blood pressure and increased heart rate.
EKG shows evidence of acute myocardial infarction. Patient is started on aspirin
and referred for urgent cardiac catheterization. Further management to be
determined following catheterization."""

# Now call the endpoint and display the identified spans as a pandas DataFrame.

response = call_short_async("structure-ontologies", text, api_key_id, api_key_secret)

pd.DataFrame(response["spans"])

The far-right column contains the ontologies. You can use a wrapper function to extract only the healthcare concepts that map to a specific ontology (in this example, ICD-10, written as "ICD10") and return a pandas DataFrame containing only the ontology codes you are looking for.

Note: To find SNOMED-CT codes instead, change the last line of the code below to: extract_specific_codes(response, "SNOMEDCT_US")

# In this example, we are looking for ICD10.

# Response keys of the supported ontologies (see Troubleshooting > Ontology Not Found).
SUPPORTED_ONTOLOGIES = ["CPT", "HCPCS", "ICD10", "SNOMEDCT_US"]

def extract_specific_codes(response, ontology="ICD10"):
  """
  Parse the response from the API and extract concepts that map to a specific ontology.
  Returns a DataFrame with <ontology>_codes and <ontology>_count columns, or None.
  """
  if ontology not in SUPPORTED_ONTOLOGIES:
    print("Ontology requested must be one of", ", ".join(SUPPORTED_ONTOLOGIES))
    return None

  df = pd.DataFrame(response["spans"])
  # Expand each span's ontologies dictionary into one column per ontology.
  df = df.join(pd.json_normalize(df["ontologies"]))

  if ontology not in df.columns:
    print("Response had no annotations in", ontology)
    return None

  # Keep only the spans that map to the requested ontology.
  temp_df = df[df[ontology].notna()].reset_index()
  temp_df[ontology + "_codes"] = temp_df[ontology].apply(lambda entries: [r["code"] for r in entries])
  temp_df[ontology + "_count"] = temp_df[ontology].apply(len)
  return temp_df

# Specify the ontology here. You can also try this exercise with "SNOMEDCT_US".

extract_specific_codes(response, "ICD10")

Interpreting the New Ontologies Dictionary

The JSON response maps each concept_id (UMLS code) to one or more secondary ontologies through UMLS atom unique identifiers (AUIs), showing which ontologies are associated with each piece of healthcare information. For existing users, each span is the same as before except that it now includes the ontologies dictionary, which lists the following information for each mapped ontology entry:

  • aui = the atom unique identifier
  • code = the source identifier in the ontology
  • name = the concept name in the ontology
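
Because the ontologies dictionary is keyed by ontology name and each value is a list of entries, it is often easier to work with a "long" table that has one row per (span, ontology, entry). The sketch below builds that table as a list of plain dictionaries, assuming the response shape shown on this page; flatten_ontologies is a hypothetical helper, and the sample data is the COVID-19 example shown below.

```python
def flatten_ontologies(inference_result: dict) -> list:
    """Return one row per mapped ontology entry, keeping the span it came from."""
    rows = []
    for span in inference_result["spans"]:
        # Assumption: a span with no mappings has an empty or missing "ontologies" dict.
        for ontology, entries in (span.get("ontologies") or {}).items():
            for entry in entries:
                rows.append({
                    "text": span["text"],
                    "concept_id": span["concept_id"],
                    "ontology": ontology,
                    "aui": entry["aui"],
                    "code": entry["code"],
                    "name": entry["name"],
                })
    return rows

result = {
    "text": "COVID-19 is a disease",
    "spans": [
        {"text": "COVID-19", "concept_id": "UMLS:C5203670",
         "ontologies": {"SNOMEDCT_US": [{"aui": "A31531574", "code": "840539006",
                                         "name": "Disease caused by 2019-nCoV"}]}},
        {"text": "disease", "concept_id": "UMLS:C0012634",
         "ontologies": {"SNOMEDCT_US": [{"aui": "A2880798", "code": "64572001",
                                         "name": "Disease"}]}},
    ],
}

for row in flatten_ontologies(result):
    print(row["concept_id"], row["ontology"], row["code"], row["name"])
```

If you have pandas installed, pd.DataFrame(flatten_ontologies(response)) turns the rows into a table that is easy to filter or group by ontology.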

In the example below, both UMLS codes are mapped to SNOMED-CT.

  • UMLS code C5203670 for "COVID-19" was mapped to SNOMED-CT based on the AUI A31531574. The corresponding concept name in SNOMED-CT for "COVID-19" is "Disease caused by 2019-nCoV."

  • UMLS code C0012634 for "Disease" was mapped to SNOMED-CT based on the AUI A2880798. The corresponding concept name in SNOMED-CT for "Disease" is the same as the UMLS concept_name.

Remember that only the currently supported ontologies will display; look for additional ontologies in future releases.

{
    "text": "COVID-19 is a disease",
    "spans": [
        {
            "concept_id": "UMLS:C5203670",
            "concept_name": "COVID-19",
            "concept_type": "Medical Conditions",
            "pos_end": 8,
            "pos_start": 0,
            "score_id": 0.9998598098754883,
            "score_type": 0.9999113082885742,
            "text": "COVID-19",
            "ontologies": {
                "SNOMEDCT_US": [
                    {
                        "aui": "A31531574",
                        "code": "840539006",
                        "name": "Disease caused by 2019-nCoV"
                    }
                ]
            }
        },
        {
            "concept_id": "UMLS:C0012634",
            "concept_name": "Disease",
            "concept_type": "Medical Conditions",
            "pos_end": 21,
            "pos_start": 14,
            "score_id": 0.9999176263809204,
            "score_type": 0.9999895095825195,
            "text": "disease",
            "ontologies": {
                "SNOMEDCT_US": [
                    {
                        "aui": "A2880798",
                        "code": "64572001",
                        "name": "Disease"
                    }
                ]
            }
        }
    ],
    "model_type": "structure-ontologies",
    "inference_status": "COMPLETED",
    "message": "Your inference results are ready."
}
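
Each span also carries pos_start and pos_end, which let you tie an ontology code back to the exact words in your input. Judging from the examples on this page (an inference, not a documented guarantee), pos_start is zero-based and pos_end is exclusive, so Python slicing of the input text reproduces each span's text. The hypothetical helper offsets_match below checks that assumption against the example above.

```python
def offsets_match(result: dict) -> bool:
    """Check that text[pos_start:pos_end] reproduces each span's text."""
    return all(
        result["text"][span["pos_start"]:span["pos_end"]] == span["text"]
        for span in result["spans"]
    )

# Trimmed copy of the example above (only the fields this check needs).
result = {
    "text": "COVID-19 is a disease",
    "spans": [
        {"text": "COVID-19", "pos_start": 0, "pos_end": 8},
        {"text": "disease", "pos_start": 14, "pos_end": 21},
    ],
}

print(offsets_match(result))  # True
```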

Troubleshooting

NameError or AttributeError

Check your code and the endpoint URL to be sure everything is correct.

  • In Python, check that the mini SDK code was copied and edited correctly, and that you included your API keys
  • In cURL, make sure you have "v2" in your URL and that you have used a dash, not an underscore, in the endpoint (https://api.aws.science.io/v2/structure-ontologies)

If these steps fail, you may wish to generate new API keys from the ScienceIO dashboard and try again. Note, however, that your old API keys will no longer work. Be sure they are not being used in production (or that you are prepared to update your keys) before you generate new ones.

SyntaxError

This error is usually caused by apostrophes, for example in contractions such as "don't", which end a single-quoted string early. To fix it:

  • Manually remove the apostrophes in your query text
  • Write code to automatically clean up your query text and remove apostrophes

If this does not resolve the issue, make sure you have enclosed your query text with quotation marks (single or double).
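
As a sketch of cleaning the query text automatically, the hypothetical remove_apostrophes helper below strips straight and curly apostrophes. For the cURL example, a less lossy alternative is to let Python build the --data argument: json.dumps produces valid JSON and shlex.quote (from the standard library) escapes any apostrophes for a POSIX shell. The helper names are illustrative, not part of any ScienceIO SDK.

```python
import json
import shlex

def remove_apostrophes(text: str) -> str:
    """Strip straight (') and curly (\u2019) apostrophes from query text."""
    return text.replace("'", "").replace("\u2019", "")

def curl_data_argument(text: str) -> str:
    """Build a shell-safe value for curl's --data option without editing the text."""
    return shlex.quote(json.dumps({"text": text}))

text = "ALS is often called Lou Gehrig's disease."
print(remove_apostrophes(text))  # ALS is often called Lou Gehrigs disease.
print(curl_data_argument(text))
```

When you call the API from Python with the mini SDK, the text is sent through requests' json= argument, so apostrophes inside a correctly quoted Python string do not need to be removed.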

Ontology Not Found

Make sure you have used the correct nomenclature, and that you are requesting one of the supported ontologies. They should be written exactly as follows in your code:

  • CPT
  • HCPCS
  • ICD10
  • SNOMEDCT_US
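
The Supported Ontologies list uses display names (ICD-10, SNOMED-CT), while the code and the response use the keys above, which is an easy source of this error. A minimal sketch that normalizes a few common spellings before calling a helper such as extract_specific_codes; normalize_ontology_key and the ALIASES table are hypothetical and cover only the names used on this page.

```python
# Response keys for the supported ontologies, as listed above.
SUPPORTED_ONTOLOGY_KEYS = {"CPT", "HCPCS", "ICD10", "SNOMEDCT_US"}

# Hypothetical aliases: display names mapped to response keys.
ALIASES = {"ICD-10": "ICD10", "SNOMED-CT": "SNOMEDCT_US", "SNOMED CT": "SNOMEDCT_US"}

def normalize_ontology_key(name: str) -> str:
    """Map a user-supplied ontology name to a supported response key."""
    key = name.strip().upper()
    key = ALIASES.get(key, key)
    if key not in SUPPORTED_ONTOLOGY_KEYS:
        raise ValueError(f"Unsupported ontology {name!r}; use one of {sorted(SUPPORTED_ONTOLOGY_KEYS)}")
    return key

print(normalize_ontology_key("ICD-10"))     # ICD10
print(normalize_ontology_key("snomed-ct"))  # SNOMEDCT_US
```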

Feedback

We'd love your feedback! Tell us what you think about this product update. Email us at [email protected].