Structure a Series of Messages
Parse a collection of messages, then use pandas to do post-hoc analysis on the structured results.
In this example, we will structure a number of patient messages that describe side effects experienced after taking a medication. In a real-world setting, these messages may originate from chat conversations or emails between patients and the clinician, or from the clinician's clinical notes for each patient.
Note: The patient messages seen on this page were created by us for the purposes of this example.
This code sample uses version 2.0.0 of the ScienceIO API. Before you begin, please check your version and upgrade your scienceio Python package if necessary.
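One way to check the installed version from Python is sketched below. It relies only on the standard library; the helper names `installed_version` and `major_version` are our own for this illustration and are not part of the ScienceIO package, and the sketch assumes the package is distributed under the name scienceio.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string: str) -> int:
    """Extract the leading major number from a version string such as '2.0.0'."""
    return int(version_string.split(".")[0])


# Assumption: the package is installed under the distribution name "scienceio"
found = installed_version("scienceio")
if found is None:
    print("scienceio is not installed; try: pip install scienceio")
elif major_version(found) < 2:
    print(f"scienceio {found} is older than 2.0.0; try: pip install --upgrade scienceio")
else:
    print(f"scienceio {found} is installed")
```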
Step 1: Import Packages
First, import all of the packages you will need to successfully run the code. These include:
- the ScienceIO library
- pandas (under the pd alias)
- tqdm (to create a progress bar for loops)
# Import all required packages
from scienceio import ScienceIO
import pandas as pd
from tqdm.notebook import tqdm
Step 2: Load Messages
Next, take a list of dictionaries with 10 messages sent by patients describing the side effects of the medication they took, load them into a pandas DataFrame, and examine the first few rows.
# Input patient messages
messages = [
    {"message_id": 1,
     "user_id": 25,
     "message_text": "I had severe headache, nausea, bodyache, decreased appetite from taking Cetirizine"},
    {"message_id": 2,
     "user_id": 26,
     "message_text": "I had severe diarrhea, insomnia, dry mouth, sexual dysfunction from taking Clozapine"},
    {"message_id": 3,
     "user_id": 29,
     "message_text": "I had severe constipation, drowsiness, nausea, decreased appetite from taking Doxorubicin"},
    {"message_id": 4,
     "user_id": 31,
     "message_text": "I had increased sweating, insomnia, bodyache, sexual dysfunction from taking Cetirizine"},
    {"message_id": 5,
     "user_id": 25,
     "message_text": "I had severe headache, dizziness, dry mouth, decreased appetite from taking Clozapine"},
    {"message_id": 6,
     "user_id": 12,
     "message_text": "I had severe insomnia, sexual dysfunction, dry mouth, dizziness from taking Cetirizine"},
    {"message_id": 7,
     "user_id": 123,
     "message_text": "I had severe nausea, constipation, insomnia, decreased appetite from taking Doxorubicin"},
    {"message_id": 8,
     "user_id": 23,
     "message_text": "I had headache, nausea, bodyache, dry mouth from taking Doxorubicin"},
    {"message_id": 9,
     "user_id": 122,
     "message_text": "I had diarrhea, insomnia, dry mouth, dizziness from taking Clozapine"},
    {"message_id": 10,
     "user_id": 5,
     "message_text": "I had symptoms of insomnia, nausea, bodyache, increased sweating from taking Clozapine"},
]
# Load to df
messages_df = pd.DataFrame(messages)
messages_df.head()
The resulting table includes the message_id, user_id, and message_text columns and looks like this:

Step 3: Structure the Messages
Now we can structure the messages by calling the structure endpoint. To do this, use the following code to loop through the messages and perform the following tasks on each one:
- Structure all healthcare concepts found by calling the endpoint
- Add the structured results into a pandas DataFrame
- Add columns containing the message_id and user_id to the DataFrame
- Create a concatenated DataFrame containing all healthcare concepts that were identified
You may be asked for your API keys during this process.
There will be a short delay while the API processes your messages. As long as you see a counter while it is processing and/or no error is thrown, your call is working as expected and results will display soon.
# Create the ScienceIO client
scio = ScienceIO()
# Collect one df of structured results per message
frames = []
for _, message in tqdm(messages_df.iterrows(), total=len(messages_df)):
    # Structure each message separately using the ScienceIO API
    results = scio.structure(message["message_text"])
    if results:
        # One row per healthcare concept (span) found in the message
        temp_df = pd.DataFrame(results["spans"])
        temp_df["message_id"] = message["message_id"]
        temp_df["user_id"] = message["user_id"]
        frames.append(temp_df)
# Concatenate the per-message results into a single df
structured_results = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
# Inspect the df
structured_results.head()
The resulting table includes both the message_id and user_id columns as well as the columns returned by the API. Note that instead of message_text, we have text to denote each healthcare concept found within the original message_text.
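As an illustration of the post-hoc analysis this layout allows, the sketch below counts how often each concept appears and how many concepts each message contains. It uses a small hand-made stand-in for structured_results with only the text, message_id, and user_id columns; the rows are invented for this example and are not actual API output, which contains additional columns.

```python
import pandas as pd

# Hypothetical stand-in for structured_results: only the three columns named
# in this guide are included, and the rows are invented for illustration.
example_results = pd.DataFrame(
    [
        {"text": "headache", "message_id": 1, "user_id": 25},
        {"text": "nausea", "message_id": 1, "user_id": 25},
        {"text": "Cetirizine", "message_id": 1, "user_id": 25},
        {"text": "insomnia", "message_id": 4, "user_id": 31},
        {"text": "Cetirizine", "message_id": 4, "user_id": 31},
        {"text": "headache", "message_id": 5, "user_id": 25},
        {"text": "Clozapine", "message_id": 5, "user_id": 25},
    ]
)

# Number of distinct messages in which each concept text appears
concept_counts = (
    example_results.groupby("text")["message_id"]
    .nunique()
    .sort_values(ascending=False)
)

# Number of concepts found in each message
concepts_per_message = example_results.groupby("message_id").size()

print(concept_counts)
print(concepts_per_message)
```

Counting distinct message_id values rather than rows avoids double-counting a concept that is mentioned twice in the same message.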

If additional metadata columns are available (such as the time of the message, message sender, user metadata, or medication information), they can be tied to each individual row of the structured_results DataFrame by using the message_id and user_id columns.
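A minimal sketch of such a join with pandas is shown below. Both DataFrames are hand-made stand-ins: the sent_at column and every value in it are hypothetical, and real structured results contain more columns than the three shown here.

```python
import pandas as pd

# Hypothetical stand-in for structured_results (real output has more columns)
example_results = pd.DataFrame(
    [
        {"text": "headache", "message_id": 1, "user_id": 25},
        {"text": "Cetirizine", "message_id": 1, "user_id": 25},
        {"text": "diarrhea", "message_id": 2, "user_id": 26},
    ]
)

# Hypothetical message metadata; the sent_at column and its values are invented
message_metadata = pd.DataFrame(
    [
        {"message_id": 1, "user_id": 25, "sent_at": "2023-01-05"},
        {"message_id": 2, "user_id": 26, "sent_at": "2023-01-06"},
    ]
)

# A left join keeps every concept row and attaches the metadata of its message;
# validate="many_to_one" raises if a message appears twice in the metadata
enriched = example_results.merge(
    message_metadata,
    on=["message_id", "user_id"],
    how="left",
    validate="many_to_one",
)

print(enriched)
```

A left join is used here so that concept rows are never dropped when a message has no metadata; those rows would simply carry missing values in the metadata columns.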
Additional Code Samples
Want to do more? Try filtering these messages by concept type.
Questions?
If you need additional help, we're standing by ready to assist! Contact support.