Our API leverages ontologies to make sense of healthcare concepts.

The ScienceIO API uses a Knowledge Graph to connect healthcare concepts to 20+ leading ontologies. These ontologies contain millions of concepts and we refresh them regularly.


To make your results easier to understand, we've defined our own concept types (concept_type) that provide a higher-level view of the data. You can use them to group similar kinds of data, and then examine individual concepts more closely using the concept_id. See Concept Types for more information.
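As a rough illustration of that grouping, the sketch below collects the distinct concept_id values found under each concept_type. It assumes you already have the list of span dictionaries from a response, with the concept_id and concept_type fields used on this page. The function name group_by_concept_type is hypothetical and is not part of the ScienceIO API.

```python
from collections import defaultdict


def group_by_concept_type(spans):
    """Map each concept_type to the distinct concept_ids found under it.

    `spans` is assumed to be a list of dicts that carry at least the
    "concept_id" and "concept_type" keys (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for span in spans:
        ids = groups[span["concept_type"]]
        # Keep first-seen order and skip repeated concepts.
        if span["concept_id"] not in ids:
            ids.append(span["concept_id"])
    return dict(groups)


# Sample spans, trimmed to the two fields this sketch needs.
spans = [
    {"concept_id": "UMLS:C5203670", "concept_type": "Medical Conditions"},
    {"concept_id": "UMLS:C0012634", "concept_type": "Medical Conditions"},
    {"concept_id": "UMLS:C5203670", "concept_type": "Medical Conditions"},
]

for concept_type, concept_ids in group_by_concept_type(spans).items():
    print(concept_type, concept_ids)
```

Grouping on concept_type first keeps the high-level view small. Each concept_id in a group can then be looked up individually.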

What is an Ontology?

In a healthcare data setting, an ontology is a standardized way of identifying different types of healthcare information across the entire industry and around the globe. This is usually done through codes, groupings, and specific naming conventions.

Ontologies are used to help medical data move between different electronic systems, and to semantically represent healthcare concepts. Many ontologies focus on one type of concept (procedure codes, medical conditions, clinical drugs, etc.) but others, like the UMLS, include a variety of concept types and have some overlap with other ontologies.

Supported Ontologies

ScienceIO's Knowledge Graph categorizes each of our supported ontologies as primary or secondary when it uses them to map your data. These relationships are internally created and will evolve as we continue to train our models.

Primary Ontologies

A primary ontology is a parent-level ontology within our API. The Knowledge Graph uses it to map the healthcare concepts found within that ontology, and then looks for additional relationships in any associated secondary ontologies. For example, UMLS is a primary ontology that maps to a number of secondary ontologies, such as LOINC, CPT, and RxNorm. Our primary ontologies include:

  • UMLS
  • ChEMBL and ChEBI
  • dbSNP
  • Cell Line Ontology (CLO and CVCL)
  • GeneID
  • ClinVar
  • NCBI Taxonomy ID

The primary ontology for each concept is shown in its concept_id.

Secondary Ontologies

A secondary ontology is a child-level ontology within our API. Our Knowledge Graph relies on its association with a primary ontology to map its concepts. For example, RxNorm is a secondary ontology, so its primary ontology (UMLS) is used to map RxNorm concepts. Our secondary ontologies include:

  • CPT
  • ICD9CM
  • ICD10
  • ICD-10-CM
  • MedDRA
  • MSH (MeSH)
  • NCIt
  • OMIM
  • RxNorm
  • NDDF (FDB MedKnowledge)

Note that only some primary ontologies have associated secondary ontologies. Any secondary ontologies are shown in ontologies.

Understanding Ontology Relationships


Secondary ontologies only display when calling the structure-ontologies endpoint, and only for some ontologies. See [BETA] Structure v1.1 with Ontologies for more information. For all other responses, the concept_id is included to show the primary ontology.

When you send a query to the API, our Knowledge Graph looks for each healthcare concept in a primary ontology and returns those findings in the concept_id. When calling the structure-ontologies endpoint, it also looks in any applicable secondary ontologies to locate aliases for that concept and returns them in ontologies.
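To read the primary ontology out of a concept_id, one option is to split the value at its first colon. The sketch below assumes that every concept_id follows the "ONTOLOGY:code" pattern seen in the sample response on this page (for example, UMLS:C5203670). That pattern is an assumption drawn from the example, not a documented guarantee. The helper name split_concept_id is hypothetical.

```python
def split_concept_id(concept_id):
    """Split a value such as "UMLS:C5203670" into ("UMLS", "C5203670").

    Assumes an "ONTOLOGY:code" layout. Only the first colon is treated
    as the separator, so any later colons stay in the code part.
    """
    ontology, sep, code = concept_id.partition(":")
    if not sep or not ontology or not code:
        raise ValueError(f"unexpected concept_id format: {concept_id!r}")
    return ontology, code


ontology, code = split_concept_id("UMLS:C5203670")
print(ontology, code)
```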

Secondary ontologies are identified by mapping each concept_id (UMLS code) to one or more ontologies using its atom unique identifier (AUI), which shows the specific ontologies that are associated with each piece of healthcare information. The ontologies dictionary, when present, includes the following information for each secondary ontology entry:

  • aui = the atom unique identifier
  • code = the source identifier in the ontology
  • name = the concept name in the ontology

In the following example response from the structure-ontologies endpoint, both UMLS codes have been mapped to SNOMED-CT.

  • UMLS code C5203670 for "COVID-19" was mapped to SNOMED-CT based on the AUI A31531574. The corresponding concept name in SNOMED-CT for "COVID-19" is "Disease caused by 2019-nCoV."
  • UMLS code C0012634 for "Disease" was mapped to SNOMED-CT based on the AUI A2880798. The corresponding concept name in SNOMED-CT for "Disease" is the same as the UMLS concept_name.

        "text": "COVID-19 is a disease",
        "spans": [
            {
                "concept_id": "UMLS:C5203670",
                "concept_name": "COVID-19",
                "concept_type": "Medical Conditions",
                "pos_end": 8,
                "pos_start": 0,
                "score_id": 0.9998598098754883,
                "score_type": 0.9999113082885742,
                "text": "COVID-19",
                "ontologies": {
                    "SNOMEDCT_US": [
                        {
                            "aui": "A31531574",
                            "code": "840539006",
                            "name": "Disease caused by 2019-nCoV"
                        }
                    ]
                }
            },
            {
                "concept_id": "UMLS:C0012634",
                "concept_name": "Disease",
                "concept_type": "Medical Conditions",
                "pos_end": 21,
                "pos_start": 14,
                "score_id": 0.9999176263809204,
                "score_type": 0.9999895095825195,
                "text": "disease",
                "ontologies": {
                    "SNOMEDCT_US": [
                        {
                            "aui": "A2880798",
                            "code": "64572001",
                            "name": "Disease"
                        }
                    ]
                }
            }
        ]
    "model_type": "structure-ontologies",
    "inference_status": "COMPLETED",
    "message": "Your inference results are ready."
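
One way to work with these aliases is to flatten the nested ontologies dictionary into one record per secondary-ontology entry. The sketch below is illustrative only. It assumes you have the spans list from a structure-ontologies response as Python dictionaries, and that a span may lack the ontologies key. The function name secondary_aliases is hypothetical.

```python
def secondary_aliases(spans):
    """Yield one flat record per secondary-ontology alias in `spans`."""
    for span in spans:
        # "ontologies" may be missing or empty, so fall back to {}.
        ontologies = span.get("ontologies") or {}
        for source, entries in ontologies.items():
            for entry in entries:
                yield {
                    "concept_id": span["concept_id"],
                    "source": source,
                    "aui": entry["aui"],
                    "code": entry["code"],
                    "name": entry["name"],
                }


# Spans copied from the example above, trimmed to the fields used here.
spans = [
    {
        "concept_id": "UMLS:C5203670",
        "ontologies": {
            "SNOMEDCT_US": [
                {"aui": "A31531574", "code": "840539006",
                 "name": "Disease caused by 2019-nCoV"}
            ]
        },
    },
    {
        "concept_id": "UMLS:C0012634",
        "ontologies": {
            "SNOMEDCT_US": [
                {"aui": "A2880798", "code": "64572001", "name": "Disease"}
            ]
        },
    },
]

for row in secondary_aliases(spans):
    print(row["concept_id"], row["source"], row["code"], row["name"])
```

A flat list of records like this is usually easier to filter by source or to load into a table than the nested dictionary.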

Learn More

This section offers more details about the primary ontologies our Knowledge Graph uses to structure healthcare data. Use it to gain a deeper understanding of ontologies and what type of information is included in each.


Remember that we use more ontologies than are listed in this section. It has been provided for learning purposes only. See Supported Ontologies for a full list.


The Unified Medical Language System (UMLS) is composed of the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon and Lexical Tools. It is widely used to develop digital tools and applications, and to link terms and codes across different systems or interested parties (doctors, pharmacies, insurance companies, hospital departments, etc.). It is also used in search engines, data mining, research, and statistics.

  • The UMLS is exceptionally comprehensive, and includes records that span all ScienceIO concept types; it is the basis for many of ScienceIO's internal concept mappings.
  • The UMLS also maps to LOINC, SNOMED-CT, ICD-9/10, ICD-10-CM, CPT, HCPCS, OMIM, MedDRA, MeSH, NCIt, and RxNorm.
  • The UMLS code displays in the concept_id for each piece of UMLS healthcare data identified.


ChEMBL is a database of bioactive molecules with drug-like properties. Its goal is to help translate genomic information into new drugs. ChEMBL also maps to RxNorm, and includes the following:

  • 2.3 million compounds
  • 1.5 million assays
  • 85,000 documents
  • 43,000 indications
  • 15,000 targets
  • 14,000 drugs
  • 6,300 mechanisms
  • 2,000 cells
  • 1,200 drug warnings
  • 757 tissues

ChEBI is a non-proprietary dictionary focused on "small" molecular entities (chemical compounds). These entities may be products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI includes classes of molecular entities and part-molecular entities, but does not include nucleic acids, proteins, or peptides derived from proteins by cleavage.

ScienceIO Concept Type:

Chemical and Drugs


Cell Line Ontology Database (CLO) is a community-based ontology of cell lines that is designed to create a standardized, logically defined format for publicly available cell line entry data. CLO includes more than 36,000 cell lines that are drawn from the following repositories:

  • Cell Line Knowledgebase (CLKB)
  • European Bioinformatics Institute (EMBL-EBI)
  • Coriell Catalog
  • Bioassay Ontology (BAO)

ScienceIO Concept Type:

Cell Biology

Gene (GeneID)

Gene provides detailed information about genes, identifies gene-specific connections, and assigns genes a unique identifier. It includes over 33 million entries for a wide range of species that are pulled from all major taxonomic groups. Gene's records include:

  • Nomenclature
  • Reference Sequences (RefSeqs)
  • Maps
  • Pathways
  • Variations
  • Phenotypes
  • Links to worldwide resources related to genome, phenotype, and locus

ScienceIO Concept Type:



The Single Nucleotide Polymorphism database (dbSNP) is an authoritative central repository for simple genetic polymorphisms that spans all classes of simple molecular variation, including neutral polymorphisms and those that cause rare clinical phenotypes. dbSNP includes:

  • Single-base nucleotide substitutions, also known as single nucleotide polymorphisms (SNPs)
  • Small-scale multi-base deletions or insertions, also known as deletion insertion polymorphisms (DIPs)
  • Retroposable element insertions and microsatellite repeat variations, also known as short tandem repeats (STRs)
  • Genomic and RefSeq mapping for common variations and clinical mutations
  • Population frequency
  • Molecular consequence
  • Publication information

ScienceIO Concept Type:



ClinVar is a public archive of the relationships between human variations and phenotypes. Its goal is to aggregate information about genomic variation so that its relationship to human health can be understood. ClinVar provides:

  • Records for a gene
  • Records by chromosome location
  • Records for a disease or phenotype

ScienceIO Concept Type:


NCBI Taxonomy

The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names as well as classifications, and spans every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration (INSDC). It distinguishes between formal and informal names. The NCBI Taxonomy is also the standard nomenclature and classification repository for:

  • GenBank
  • The European Molecular Biology Laboratory (EMBL)
  • DNA Data Bank of Japan (DDBJ)

ScienceIO Concept Type:

Species & Viruses