Ontologies

Our AI platform leverages ontologies to make sense of healthcare concepts.

The ScienceIO AI platform uses a Knowledge Graph to connect healthcare concepts to more than 20 leading ontologies. Together, these ontologies contain millions of concepts, and we refresh them regularly.

What is an Ontology?

In a healthcare data setting, an ontology is a standardized way of identifying different types of healthcare information across the entire industry and around the globe. This is usually done through codes, groupings, and specific naming conventions.

Ontologies help medical data move between electronic systems and give healthcare concepts a shared semantic representation. Many ontologies focus on a single type of concept (procedure codes, medical conditions, clinical drugs, etc.), but others, like the UMLS, cover many concept types and overlap with other ontologies.

Supported Ontologies

ScienceIO’s Knowledge Graph categorizes a supported ontology as primary when it uses that ontology directly to map your data. These relationships are created internally and will evolve as we continue to train our models.

Primary Ontologies

A primary ontology is a parent-level ontology within our API. The Knowledge Graph uses it to map the healthcare concepts it contains and then looks for additional relationships in associated secondary ontologies. For example, UMLS maps to a number of secondary ontologies, such as LOINC, CPT, and RxNorm. Our primary ontologies include:

  • UMLS
  • ChEMBL and ChEBI
  • dbSNP
  • Cell Line Ontology (CLO and CVCL)
  • GeneID
  • ClinVar
  • NCBI Taxonomy ID

Primary ontologies are shown in concept_type.
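To illustrate the primary-to-secondary relationship described above, here is a minimal sketch that models the mappings as a table of cross-references and walks from a primary ontology to the secondary ontologies it reaches. The `CROSS_REFERENCES` table and the `secondary_ontologies` function are hypothetical: the table holds only a subset of the mappings named on this page and does not reflect ScienceIO's internal Knowledge Graph or API.

```python
from collections import deque

# Hypothetical cross-reference table: ontology -> ontologies it maps to.
# Entries are a subset of the mappings named on this page, not ScienceIO's internal graph.
CROSS_REFERENCES: dict[str, list[str]] = {
    "UMLS": ["LOINC", "SNOMED CT", "ICD-10-CM", "CPT", "HCPCS", "MeSH", "NCIt", "RxNorm"],
    "ChEMBL": ["RxNorm"],
}

# The primary ontologies listed above (CLO stands in for the Cell Line Ontology).
PRIMARY_ONTOLOGIES = {
    "UMLS", "ChEMBL", "ChEBI", "dbSNP", "CLO", "GeneID", "ClinVar", "NCBI Taxonomy",
}


def secondary_ontologies(primary: str) -> list[str]:
    """Return every ontology reachable from a primary ontology, in breadth-first order."""
    if primary not in PRIMARY_ONTOLOGIES:
        raise ValueError(f"{primary!r} is not a primary ontology")
    seen = {primary}
    order: list[str] = []
    queue = deque([primary])
    while queue:
        current = queue.popleft()
        for target in CROSS_REFERENCES.get(current, []):
            if target not in seen:  # avoid revisiting shared secondary ontologies
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order
```

For example, `secondary_ontologies("ChEMBL")` returns `["RxNorm"]`, while a primary ontology with no recorded cross-references, such as `"dbSNP"`, returns an empty list. A breadth-first walk keeps direct mappings ahead of indirect ones if the table ever gains multi-hop links.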


UMLS

The Unified Medical Language System (UMLS) is composed of the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon and Lexical Tools. It is widely used to develop digital tools and applications and to link terms and codes across different systems and stakeholders (doctors, pharmacies, insurance companies, hospital departments, etc.). It is also used in search engines, data mining, research, and statistics.

  • The UMLS is exceptionally comprehensive, and includes records that span all ScienceIO concept types; it is the basis for many of ScienceIO’s internal concept mappings.
  • The UMLS also maps to LOINC, SNOMED CT, ICD-9/10, ICD-10-CM, CPT, HCPCS, OMIM, MedDRA, MeSH, NCIt, and RxNorm.
  • For each UMLS concept identified in your data, its UMLS code appears in the concept_id field.
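UMLS concept codes (Concept Unique Identifiers, or CUIs) follow a fixed pattern: the letter C followed by seven digits, for example C0011849. If you post-process results, a check like the one below can catch malformed codes. This is a sketch only; the optional `UMLS:` prefix it tolerates is an assumption about how a concept_id value might be formatted, not documented ScienceIO behavior, and `extract_cui` is a hypothetical helper.

```python
import re
from typing import Optional

# A UMLS CUI is "C" followed by exactly seven ASCII digits.
CUI_PATTERN = re.compile(r"C[0-9]{7}")


def extract_cui(concept_id: str) -> Optional[str]:
    """Return the CUI contained in a concept_id value, or None if it is not well formed.

    Assumption (illustrative only): the value may carry a "UMLS:" prefix.
    """
    value = concept_id.strip()
    if value.startswith("UMLS:"):
        value = value[len("UMLS:"):]
    return value if CUI_PATTERN.fullmatch(value) else None
```

The match is deliberately case-sensitive and anchored with `fullmatch`, so values such as `c0011849` or `C0011849X` are rejected rather than silently accepted.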

ChEMBL and ChEBI

ChEMBL is a database of bioactive molecules with drug-like properties that aims to help translate genomic information into new drugs. ChEMBL also maps to RxNorm and includes the following:

  • 2.3 million compounds
  • 1.5 million assays
  • 85,000 documents
  • 43,000 indications
  • 15,000 targets
  • 14,000 drugs
  • 6,300 mechanisms
  • 2,000 cells
  • 1,200 drug warnings
  • 757 tissues

ChEBI (Chemical Entities of Biological Interest) is a non-proprietary dictionary of molecular entities focused on “small” chemical compounds. These entities are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI includes classes of molecular entities and part-molecular entities, but does not include nucleic acids, proteins, or peptides derived from proteins by cleavage.

ScienceIO Concept Type:

Chemicals and Drugs
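ChEMBL and ChEBI use visibly different identifier formats: ChEMBL IDs look like `CHEMBL25`, while ChEBI IDs look like `CHEBI:15365`. The hypothetical `chemical_source` helper below sketches how a downstream pipeline might route an identifier to the right chemical ontology by its format alone; it does not check that the identifier actually exists in either database.

```python
import re
from typing import Optional

# Identifier formats for the two chemical ontologies.
_PATTERNS = {
    "ChEMBL": re.compile(r"CHEMBL[0-9]+"),   # e.g. CHEMBL25
    "ChEBI": re.compile(r"CHEBI:[0-9]+"),    # e.g. CHEBI:15365
}


def chemical_source(identifier: str) -> Optional[str]:
    """Return "ChEMBL" or "ChEBI" based on the identifier's format, or None if neither matches."""
    normalized = identifier.strip().upper()  # tolerate lowercase input
    for source, pattern in _PATTERNS.items():
        if pattern.fullmatch(normalized):
            return source
    return None
```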


CLO

The Cell Line Ontology (CLO) is a community-based ontology of cell lines designed to create a standardized, logically defined format for publicly available cell line entry data. CLO includes more than 36,000 cell lines drawn from the following repositories:

  • Cell Line Knowledgebase (CLKB)
  • European Bioinformatics Institute (EMBL-EBI)
  • Coriell Catalog
  • Bioassay Ontology (BAO)

ScienceIO Concept Type:

Cell Biology


Gene (GeneID)

NCBI Gene provides detailed information about genes, identifies gene-specific connections, and assigns each gene a unique identifier (its GeneID). It includes over 33 million entries for a wide range of species drawn from all major taxonomic groups. Gene records include:

  • Nomenclature
  • Reference Sequences (RefSeqs)
  • Maps
  • Pathways
  • Variations
  • Phenotypes
  • Links to worldwide resources related to genome, phenotype, and locus

ScienceIO Concept Type:

Genetics
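Outside the ScienceIO platform, a GeneID can be looked up directly through NCBI's public E-utilities service. The sketch below only builds an `esummary` request URL with the standard library and makes no network call; `gene_summary_url` is a hypothetical helper, it is unrelated to any ScienceIO API, and GeneID 7157 (human TP53) is used purely as an example.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_summary_url(gene_ids: list[int]) -> str:
    """Build an NCBI E-utilities esummary URL that returns JSON summaries for GeneIDs."""
    if not gene_ids or any(g <= 0 for g in gene_ids):
        raise ValueError("GeneIDs must be a non-empty list of positive integers")
    params = {
        "db": "gene",                               # query the NCBI Gene database
        "id": ",".join(str(g) for g in gene_ids),  # comma-separated GeneIDs
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"
```

`gene_summary_url([7157])` yields `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&id=7157&retmode=json`, which you could fetch with `urllib.request.urlopen`. NCBI asks clients without an API key to stay under three requests per second.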


dbSNP

The Single Nucleotide Polymorphism database (dbSNP) is an authoritative central repository for simple genetic polymorphisms, covering all classes of simple molecular variation, from neutral polymorphisms to those that cause rare clinical phenotypes. dbSNP includes:

  • Single-base nucleotide substitutions, also known as single nucleotide polymorphisms (SNPs)
  • Small-scale multi-base deletions or insertions, also known as deletion insertion polymorphisms (DIPs)
  • Retroposable element insertions
  • Microsatellite repeat variations, also known as short tandem repeats (STRs)
  • Genomic and RefSeq mapping for common variations and clinical mutations
  • Population frequency
  • Molecular consequence
  • Publication information

ScienceIO Concept Type:

Genetics
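The first two variant classes listed above (SNPs and DIPs) can be told apart from a variant's reference and alternate alleles. The `classify_variant` function below is a simplified, hypothetical classifier written for illustration; it is not dbSNP's own classification logic, which also distinguishes classes such as STRs and multi-nucleotide variants that this sketch lumps into "other".

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a simple variant from its reference and alternate alleles.

    Simplified illustration:
      "SNP"   -> single-base substitution (both alleles one base, different)
      "DIP"   -> insertion or deletion (allele lengths differ)
      "other" -> anything else, such as an equal-length multi-base change
    """
    ref, alt = ref.upper(), alt.upper()
    valid = set("ACGT")
    if not ref or not alt or not set(ref) <= valid or not set(alt) <= valid:
        raise ValueError("alleles must be non-empty strings of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternate alleles are identical")
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "DIP"
    return "other"
```

Deletions written with an anchor base, such as `ref="AT"`, `alt="A"`, are classified as `"DIP"` because the allele lengths differ.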


ClinVar

ClinVar is a public archive of reported relationships between human variations and phenotypes. It aggregates information about genomic variation to clarify how that variation relates to human health. ClinVar provides:

  • Records by gene
  • Records by chromosome location
  • Records by disease or phenotype

ScienceIO Concept Type:

Genetics
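The three ways ClinVar organizes records (by gene, by chromosome location, and by disease or phenotype) amount to three indexes over the same set of records. The sketch below builds such indexes in memory. The `VariantRecord` fields, the `VariantIndex` class, and the placeholder values in the usage note are all hypothetical; they are not real ClinVar entries, ClinVar's schema, or ScienceIO output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantRecord:
    # Hypothetical, simplified record; real ClinVar records carry many more fields.
    record_id: str
    gene: str
    chromosome: str
    position: int
    condition: str


class VariantIndex:
    """Index the same records by gene, by chromosome location, and by condition."""

    def __init__(self, records: list[VariantRecord]) -> None:
        self.by_gene: dict[str, list[VariantRecord]] = defaultdict(list)
        self.by_chromosome: dict[str, list[VariantRecord]] = defaultdict(list)
        self.by_condition: dict[str, list[VariantRecord]] = defaultdict(list)
        for record in records:
            self.by_gene[record.gene].append(record)
            self.by_chromosome[record.chromosome].append(record)
            self.by_condition[record.condition.lower()].append(record)  # case-insensitive

    def for_gene(self, gene: str) -> list[VariantRecord]:
        return list(self.by_gene.get(gene, []))

    def in_region(self, chromosome: str, start: int, end: int) -> list[VariantRecord]:
        """Return records on a chromosome whose position lies within [start, end]."""
        return [r for r in self.by_chromosome.get(chromosome, []) if start <= r.position <= end]

    def for_condition(self, condition: str) -> list[VariantRecord]:
        return list(self.by_condition.get(condition.lower(), []))
```

With placeholder records such as `VariantRecord("example-1", "GENE_A", "1", 1000, "Condition X")`, `index.in_region("1", 0, 2000)` returns the records on chromosome 1 between positions 0 and 2,000. Lookups use `.get` so that querying an unknown key does not insert empty entries into the `defaultdict` indexes.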


NCBI Taxonomy

The National Center for Biotechnology Information (NCBI) Taxonomy contains organism names and classifications for every organism represented in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration (INSDC). It distinguishes between formal and informal names. The NCBI Taxonomy is also the standard nomenclature and classification repository for the INSDC members:

  • GenBank
  • European Molecular Biology Laboratory (EMBL)
  • DNA Data Bank of Japan (DDBJ)

ScienceIO Concept Type:

Species & Viruses
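Because every taxon in the NCBI Taxonomy records its parent, an organism's full classification can be recovered by following parent links up the tree. The sketch below shows the idea on a tiny hand-built excerpt, from Homo sapiens (taxid 9606) up to the family Hominidae (taxid 9604); the `TAXA` table and `lineage` function are illustrative, and a real application would instead load the `nodes.dmp` and `names.dmp` files from the NCBI Taxonomy dump.

```python
from typing import Optional

# Tiny hand-built excerpt of the NCBI Taxonomy: taxid -> (scientific name, rank, parent taxid).
# A parent of None marks the edge of this excerpt; in the full taxonomy every node has a
# parent, and the root (taxid 1) is recorded as its own parent.
TAXA: dict[int, tuple[str, str, Optional[int]]] = {
    9606: ("Homo sapiens", "species", 9605),
    9605: ("Homo", "genus", 207598),
    207598: ("Homininae", "subfamily", 9604),
    9604: ("Hominidae", "family", None),
}


def lineage(taxid: int) -> list[tuple[int, str, str]]:
    """Follow parent links upward, returning (taxid, name, rank) from the taxon to the top."""
    path: list[tuple[int, str, str]] = []
    seen: set[int] = set()  # stops at a self-parented root and guards against cycles
    current: Optional[int] = taxid
    while current is not None and current in TAXA and current not in seen:
        seen.add(current)
        name, rank, parent = TAXA[current]
        path.append((current, name, rank))
        current = parent
    return path
```

`lineage(9606)` walks species, genus, subfamily, and family in that order, and a taxid missing from the table yields an empty list.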