NSF: A Framework for Epidemic Contact Tracing Using Multi-Contextual Information

Krishna Kavi
Ebola, Contextual Intelligence, Susceptibility, Risk factor, Bayes theorem

NSF AWARD: Framework for Epidemic Contact Tracing

West African countries witnessed an "extraordinary" outbreak of the Ebola virus, which the World Health Organization (WHO) declared a Public Health Emergency of International Concern (PHEIC) on August 8, 2014. Because of the complex nature of the outbreak, the CDC created interim guidance for monitoring people potentially exposed to Ebola, evaluating their intended travel, and restricting the movements of carriers when needed. Tools that evaluate the risk of individuals, and groups of individuals, contracting the disease could mitigate fear and anxiety. Our goal is to understand and analyze the risk an individual faces when he/she comes into contact with a carrier. This project presents an innovative tool that uses contextual data intelligence to predict the risk factor of individuals who come into contact with a carrier.

Major Goals of the Project

In light of concerns about the spread of the lethal Ebola virus in the US, there is an urgent need to develop methods to better track the spread of the disease and to contact everyone who may have been exposed to it. Our proposal addresses this need in a timely manner and will produce software applications that can be installed on smartphones.

We propose a framework that uses existing, readily available technologies to perform effective contact tracing without compromising the privacy of the users. The framework has two parts: an aggregator, which collects information from the users, and an analyzer, which processes that information to perform contact tracing.

Note that applications such as IDid-Inc already have the capability to collect contextual information on the user; hence, our aggregator can be designed and developed rapidly. Such an application tracks the daily activities of the user, including places visited, people visited, mode of transportation, and other activities that the user can select. Using the user's calendar and contact lists, the application can track the locations and persons visited. It can also differentiate between modes of transportation (walking, biking, driving or flying); such information can be useful for identifying potential interactions with infected persons. All the collected information is stored only on the smartphone, thus preserving the privacy of the user. While GPS localization is limited and inaccurate for indoor navigation, several currently available applications can improve the accuracy of location identification using techniques such as WiFi signatures, cameras, magnetic signatures of indoor structures, and proximity sensors. The storage retains user information for a window determined by the characteristics of the infectious disease, such as its incubation period. For example, for Ebola we need to store the information for a window of at least 21 days.
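The retention rule above can be sketched as a small on-device log that discards records older than a disease-specific window. This is a minimal illustration, assuming a hypothetical `Activity` record and `LocalActivityLog` class; the field names are ours, not the IDid-Inc data model, and the 21-day value is the Ebola window stated above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Activity:
    """One logged activity (hypothetical schema, not the IDid-Inc data model)."""
    kind: str          # e.g. "restaurant", "driving", "flight"
    location: str      # place identifier derived from GPS/WiFi
    start: datetime
    end: datetime


@dataclass
class LocalActivityLog:
    """Local log that keeps only activities inside a retention window."""
    retention_days: int                          # e.g. 21 for Ebola
    records: list = field(default_factory=list)

    def add(self, activity: Activity, now: datetime) -> None:
        self.records.append(activity)
        self.prune(now)

    def prune(self, now: datetime) -> None:
        # Drop activities that ended before the window start; pruning happens locally.
        cutoff = now - timedelta(days=self.retention_days)
        self.records = [a for a in self.records if a.end >= cutoff]


now = datetime(2014, 10, 1, 12, 0)
log = LocalActivityLog(retention_days=21)
log.add(Activity("restaurant", "cafe-17", now - timedelta(days=30, hours=1), now - timedelta(days=30)), now)
log.add(Activity("driving", "route-5", now - timedelta(days=2, hours=1), now - timedelta(days=2)), now)
print([a.kind for a in log.records])  # ['driving']
```

Making `retention_days` a constructor parameter reflects the parametrization by disease: a different incubation period only changes this one value.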

The analyzer is given access to a specific user's information only after that user is diagnosed with an infectious disease. It uses this information to estimate the 'potential contacts' of that user. It draws on publicly available information, such as CCTV footage from specific locations, to identify close contacts, and on multi-contextual information, such as shops visited and items purchased, to minimize false positives. The framework is parametrized to allow contact tracing for various infectious diseases with differing patterns of infection. Once the 'potential contacts' are estimated, the framework can be used to alert and trace the 'contacts' through one of various interfaces, such as a smartphone application or a Facebook application.
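One simple way to estimate 'potential contacts' is to intersect the diagnosed carrier's visits with other users' visits: same place, overlapping time. The sketch below assumes hypothetical `Visit` records and a disease-specific `min_overlap` threshold standing in for the framework's parametrization; it is an illustration, not the project's analyzer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Visit:
    user: str
    location: str
    start: datetime
    end: datetime


def overlap(a: Visit, b: Visit) -> timedelta:
    """Time both visits spent at the same place simultaneously (zero if none)."""
    if a.location != b.location:
        return timedelta(0)
    return max(min(a.end, b.end) - max(a.start, b.start), timedelta(0))


def potential_contacts(carrier_visits, other_visits, min_overlap: timedelta):
    """Map each user to total co-located time with the carrier, keeping only
    users whose total reaches the (hypothetical) disease-specific threshold."""
    totals = {}
    for c in carrier_visits:
        for v in other_visits:
            shared = overlap(c, v)
            if shared > timedelta(0):
                totals[v.user] = totals.get(v.user, timedelta(0)) + shared
    return {u: t for u, t in totals.items() if t >= min_overlap}


t0 = datetime(2014, 10, 1, 10, 0)
carrier = [Visit("carrier", "starbucks-1", t0, t0 + timedelta(minutes=60))]
others = [
    Visit("alice", "starbucks-1", t0 + timedelta(minutes=10), t0 + timedelta(minutes=30)),
    Visit("bob", "starbucks-1", t0 + timedelta(minutes=58), t0 + timedelta(minutes=70)),
    Visit("carol", "library-2", t0, t0 + timedelta(minutes=60)),
]
result = potential_contacts(carrier, others, min_overlap=timedelta(minutes=5))
```

Here alice shares 20 minutes with the carrier and is kept, bob shares only 2 minutes and falls below the threshold, and carol was elsewhere. The pairwise loop is quadratic; a real deployment would index visits by location first.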

The specific research challenges that will be addressed in this proposal are:

  • Algorithms to minimize false positives in identifying potential contacts
  • Privacy preserving information storage
  • Scalable storage architecture that grows efficiently with the number of users
  • Integration of available technologies to create a usable aggregator and analyzer
  • Generalization of the technique for tracking other types of epidemic diseases (e.g., TB, new flu strains)

The intellectual merit of this proposal is the application of existing mobile, cloud, and Internet technologies to gather contextual knowledge and intelligence that can be analyzed and applied rapidly to new challenges such as contact tracing for epidemic diseases. Furthermore, the proposed framework is designed to be scalable and flexible: it is applicable not only to contact tracing for Ebola but also to other infectious diseases, as well as to emergency preparedness in the case of bioterrorism attacks.

The technologies developed as part of this project, such as the privacy-preserving storage architecture, can also be applied to other problems. Apart from the direct outcomes of the project, this proposal will benefit the greater community at UNT, which has well-established programs, centres and working relationships with institutions that benefit under-represented groups of students and the local community. Two national magazines have named UNT among the top colleges and universities in the United States based on the number of undergraduate degrees conferred on minority students. UNT and Texas Woman’s University (TWU) have signed an agreement that will allow students to attend both universities and simultaneously receive bachelor’s degrees from both schools. In addition to these programs, the PIs' international research collaborations enable them to have a high impact worldwide and in industry.

Accomplishments under these goals

Part 1 : Major Activities

In summary, contextual data from IDid-Inc is used to identify the nature of contacts. In addition, utilizing available literature about infectious diseases, a model is developed to estimate the potential risk of contracting the disease. A web-based application is developed and is currently being tested.

When efforts to prevent a disease fail and an outbreak occurs, the resulting distribution of cases may take various forms, called epidemic curves. These curves describe the course of an outbreak within a population that is potentially at risk of contracting the disease. Although they indicate the nature of an outbreak, they do not provide sufficient data to estimate the chance that a particular individual will be affected. To minimize the spread of an epidemic such as the Ebola virus, we need effective contact tracing. The problem is complex because contact tracing must be done retroactively, after a patient is diagnosed with the disease, and any lapse in tracing can miss citizens at risk. Ad hoc tracing, which relies on the infected carrier's recollection of places visited and people met, may lead to inaccurate findings. We need a contextually intelligent application that keeps track of both the individual's movements and the carrier's movements in order to identify whether the individual is at low or high risk of contracting the Ebola virus. In this project we developed a framework for tracing and analyzing the risk of contracting the Ebola virus using contextual intelligence and other contributing factors, which are discussed in detail in the following sections.

Part 2: Specific Objectives

Factors of disease susceptibility

From studying disease outbreaks and their nature, we identified four important factors that make an individual susceptible to any infectious disease:

  • Time of Exposure
  • Proximity of Carrier
  • Carrier Status
  • Individual Medical History

Time of Exposure

The time of exposure tells us how much time an individual spent with the carrier. Certain communicable diseases carry a high risk of infection even after a short exposure. Based on research, we can say that the risk of contracting the Ebola virus (or any other infectious disease) depends on the duration of the individual's contacts with the carrier.

Proximity of Carrier

The proximity to, or nature of contact with, the infected individual also plays a vital role in the probability of contracting any disease. The distance between the carrier and the individual can indicate the risk of contracting the disease.

When an individual is a few miles away from the carrier, their probability of contracting Ebola is negligible, but the probability is much higher if the individual is close to the carrier. In some cases actual physical contact is needed to contract a disease, while in other cases (such as flu) no physical contact is needed.
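To make the effect of exposure time and proximity concrete, the sketch below uses an exponential dose-response form, P = 1 - exp(-rate * minutes), with a per-minute rate that decays exponentially with distance. This model and its parameters (`base_rate_per_min`, `decay_m`) are our illustrative assumptions, not calibrated Ebola values and not necessarily the model used in this project.

```python
import math


def exposure_probability(minutes: float, distance_m: float,
                         base_rate_per_min: float, decay_m: float) -> float:
    """Probability of infection from one exposure episode (hypothetical model).

    Exponential dose-response: P = 1 - exp(-rate * minutes), where the
    per-minute rate falls off exponentially with distance from the carrier.
    Parameters are illustrative, not fitted to epidemiological data.
    """
    if minutes <= 0:
        return 0.0
    rate = base_rate_per_min * math.exp(-distance_m / decay_m)
    return 1.0 - math.exp(-rate * minutes)


params = dict(base_rate_per_min=0.01, decay_m=2.0)
near_long = exposure_probability(20, 1, **params)     # 20 min at 1 m
near_short = exposure_probability(2, 1, **params)     # 2 min at 1 m
far_long = exposure_probability(20, 1000, **params)   # 20 min at 1 km
```

The form captures both qualitative claims above: risk grows with duration and becomes negligible at large distances. Diseases that require physical contact would use a much smaller `decay_m` than airborne ones.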

Carrier Status

Ebola has an incubation period of up to 21 days, and an infected person can spread the disease once symptoms appear. Thus an individual's probability of contracting the disease from a carrier varies with the stage of the carrier's infection on the day of contact. The carrier status refers to this stage on the day on which an individual comes into contact with the carrier.

Individual Medical History

An individual's medical history is important because the person's immune status can either raise or lower the likelihood of contracting a disease. Medical records can be used to estimate how prone an individual is to contracting the Ebola virus.

Peripheral Factors

Besides these primary factors, there are a few incidental factors that contribute to an individual's susceptibility.

Environment of Individual and Facilities

The environment or locality plays a vital role in analyzing the risk: an individual located in a place with very good health facilities and generally good hygiene has a lower risk of contracting or spreading a disease than an individual living in an environment with fewer medical facilities and generally poor hygienic conditions.

Population Demography

The implications of demographic changes for the spread and control of infectious diseases are not fully understood. Still, an individual's susceptibility can be studied in terms of the population structure and its marked effect on disease transmission. A population with a larger number of carriers indicates a higher risk for any individual located in that area. For example, an individual located in Sierra Leone has a higher risk of contracting Ebola than an individual living in New York.

Another related factor is cultural practices, such as touching and caring for infected persons, which vary from one environment to another.

Heat Zones

The Global Surveillance Network, developed by the CDC in 1995, was based on the concept of a data collection network for the surveillance of travel-related morbidity. Its clinics were to be situated so as to effectively detect geographic and temporal trends in morbidity among travelers, immigrants and refugees. A similar network would be useful for tracking and monitoring carriers of the Ebola virus, and its data could be used to define heat zones that geo-fence particular areas according to the affected population.


From the data collected by the CDC through this surveillance network, heat zones can be constructed. The figure shows a heat-zoned area in California, with traces of the carrier's movements plotted on the map. The opacity of each zone indicates how much higher its risk is compared to other areas. This could be useful for an individual travelling to an address close to the heat zones.
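A heat-zone layer like the one in the figure can be approximated by binning carrier location traces into a grid and scaling each cell's count to an opacity. The sketch below is a minimal version, assuming a hypothetical `heat_zones` function, a plain degree grid (which ignores map-projection distortion), and made-up sample coordinates.

```python
import math
from collections import Counter


def heat_zones(points, cell_deg: float = 0.01):
    """Bin (lat, lon) carrier traces into square grid cells.

    Returns {cell: opacity}, where opacity is the cell's count divided by the
    busiest cell's count, so the most-visited cell gets 1.0.
    """
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in points
    )
    peak = max(counts.values())
    return {cell: n / peak for cell, n in counts.items()}


# Hypothetical traces: three points in one downtown cell, one elsewhere.
traces = [(34.0522, -118.2437), (34.0523, -118.2438), (34.0525, -118.2440), (34.105, -118.305)]
zones = heat_zones(traces)
```

Normalizing by the peak, rather than by the total, keeps the darkest zone fully opaque regardless of how many traces are loaded.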



Significant Results


Contextual Data Intelligence and IDid-Inc application

As stated previously, to track diseases accurately it is necessary to track the daily activities of both the carrier and the individual in question over the previous days and weeks. For example, since Ebola is tracked over a 21-day window (its maximum incubation period), it would be necessary to discover all the locations the carrier visited (and the people the carrier met) during that window. Individuals can then correlate this information with their own movements over the past 21 days to assess their risks. In many cases it is possible to infer past activities and behaviors of individuals from their current or future activities. However, collecting an individual's activity data and maintaining a repository is a tedious task: the data must be collected in real time to support active monitoring. It is also necessary to collect geographical information (GPS data) as well as the nature of each activity.

For example, using GPS data, it will be possible to understand the nature of the location where a person was and this in turn may provide contextual information such as the duration of the activity.

Case 1: If the location was a restaurant, it is reasonable to assume that the individual was present there for a considerable time, since he/she would be ordering and eating food. This means there is a good chance that the individual came into contact with a carrier at the restaurant.

Case 2: If the activity indicates that the individual was driving, it is reasonable to assume that there was very little possibility of direct or very close contact with a carrier (unless the carrier was inside the vehicle). Similar contextual information can be used to assess the risk of contracting or spreading infectious diseases.

Integration of IDid-Inc and Risk Factor Generation Application

The data collected on individuals by the IDid-Inc app can be used to track their movements on a routine basis. The database contains information such as the duration spent at a place, the time of visit, the time of travel to a new location, and the number of contacts made with individuals at the new location. Data is stored in a back-end repository, and a user's future schedule is predicted by analyzing his/her currently available data.

The data thus collected can be correlated with the data collected for the carrier (assuming such data exists) in order to generate discrete susceptibility ratio graphs (indicating probability of contracting the disease).

Creation of Web Application

The purpose of the application is to inform individuals of the nature of their risk, that is, how susceptible they are to contracting the disease. The application collects information from the back end, which contains the individual's daily activities as described above. In addition, the individual's medical records can be used to improve the accuracy of the predictions. Note that, to preserve the privacy of the individual, the application does not save the individual's activities or medical history; the information is used only to assess the individual's risk of contracting the disease, by correlating it with the activity history of the disease carrier(s).

Probabilities are generated from the collected data for each of the factors described previously (duration of contact, proximity to the carriers, infectious state of the carrier, and the individual's personal medical history), and these are combined into a comprehensive risk factor. The application also explains the risks and how the risk is computed.

Data Privacy

The information required to calculate the probability of contracting the disease is personal and often covered by HIPAA. Therefore, data privacy is of paramount importance. To protect a user's privacy, our system keeps each user's data on their own smartphone or in their personal storage space, which is often protected with passwords and data encryption. The iDid app uses a calendar selected by the user when he/she installs the app, and maintains a private database that logs completed activities, such as drives, flights, places visited, and sleep. Our application does not save user contextual information or medical history, but uses the data only to compute risk factors. The iDid app also uses the Google Maps API(s).

With the user’s permission, the risk probabilities are used to track disease spread in a population, without any information that can be used to identify the individual.

Bayesian Analysis of Data

Probabilistic reasoning plays an important role in this network of relations, helping to overcome limited knowledge and to support decision making. Bayesian networks are directed acyclic graphs (DAGs) in which nodes represent random variables, which may be discrete or continuous, and edges represent conditional dependencies; each variable is assumed to be conditionally independent of its non-descendants given its parents.

To provide a meaningful risk analysis for the individual, we determine conditional probabilities using Bayes' theorem.

Quick Bayesian Risk Calculator

Bayesian Formula

The Bayesian network for determining the risk factor describes the probability of an event based on conditions that might be related to that event. If an individual visits the same places as the carrier and engages in activities that make them spend a certain amount of time around the carrier, then their risk factor depends on the time they spend at that place doing that activity.
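For reference, Bayes' theorem can be written for this setting as follows. Here $I$ denotes the event that the individual is infected and $A$ the observed evidence (activity, place, and duration relative to the carrier); these symbols are our labels for illustration, and the denominator expands $P(A)$ by the law of total probability.

```latex
P(I \mid A) = \frac{P(A \mid I)\,P(I)}{P(A)}
            = \frac{P(A \mid I)\,P(I)}{P(A \mid I)\,P(I) + P(A \mid \neg I)\,P(\neg I)}
```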

Example: Suppose a carrier is sitting at Starbucks at 10 am when an individual enters and has a cup of coffee. Based on the activity, we can estimate that the individual spends 15-20 minutes at the place, so their risk factor is higher than that of an individual who drives past the same Starbucks without coming into close quarters with the carrier. This Bayesian network requires knowing the type of activity, the place, and the duration an individual spent there, from which the risk factor can be calculated instantly.
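The Starbucks example can be expressed as a single Bayesian update. In the sketch below, `posterior` applies Bayes' theorem directly; the prior and the two likelihoods per activity are made-up numbers chosen only to show the mechanics, not estimates from this project's data.

```python
def posterior(prior: float, likelihood_if_infected: float,
              likelihood_if_not: float) -> float:
    """Bayes' theorem: P(I | A) from P(I), P(A | I) and P(A | not I)."""
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie in [0, 1]")
    evidence = likelihood_if_infected * prior + likelihood_if_not * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("evidence has zero probability")
    return likelihood_if_infected * prior / evidence


PRIOR = 0.01  # hypothetical baseline probability of infection

# Hypothetical likelihoods: sitting near the carrier for 15-20 minutes is far
# more common among people who became infected; driving past is uninformative.
p_coffee = posterior(PRIOR, likelihood_if_infected=0.6, likelihood_if_not=0.05)
p_drive = posterior(PRIOR, likelihood_if_infected=0.05, likelihood_if_not=0.05)
```

With these numbers, the coffee visit raises the estimate from 1% to about 11%, while driving past leaves it at the prior, matching the qualitative comparison in the text.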

Bayesian Analysis of Medical Record

An individual's medical history contributes a fair share of the person's risk of a disease. The information it provides can lead to a more accurate estimate of the individual's probability of contracting the Ebola virus.

The figure shows a Venn diagram for Individual A, who has a dust allergy and has indicated having had chickenpox at some point in life. The Bayesian value can be obtained from this simple instance: the individual's risk of a disease such as Ebola can be calculated from the conditional probability of having had chickenpox given that the individual also has a dust allergy. The resulting Bayesian probability indicates the individual's risk value based on their current medical status. This process can be extended to a chain of other symptoms that the individual may report using the Medical Record web form provided by the application.
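The conditional probability in the Venn-diagram example, P(chickenpox | dust allergy) = P(chickenpox and dust allergy) / P(dust allergy), can be estimated by counting records. The sketch assumes each record is the set of conditions one person reported on the medical record form; the `records` list is hypothetical.

```python
def conditional_probability(records, event: str, given: str) -> float:
    """Estimate P(event | given) = P(event and given) / P(given) by counting
    records, where each record is the set of conditions one person reported."""
    with_given = [r for r in records if given in r]
    if not with_given:
        raise ValueError(f"no records report {given!r}")
    both = sum(1 for r in with_given if event in r)
    return both / len(with_given)


# Hypothetical medical-record form submissions.
records = [
    {"dust allergy", "chickenpox"},
    {"dust allergy"},
    {"dust allergy", "chickenpox", "asthma"},
    {"chickenpox"},
    set(),
]
p = conditional_probability(records, "chickenpox", "dust allergy")  # 2 of 3 allergy records
```

Counting within the `given` subset is equivalent to dividing the joint frequency by the marginal frequency, since both share the same total.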

Contextual Intelligence Probability Calculator (CIPC)

The goal is to provide the individual with a risk value from which they can understand the nature of their risk. The CIPC yields a single risk value, which may be high or low depending on the individual's susceptibility. It is calculated by integrating the probability values from the Quick Bayesian calculator and the medical history Bayesian value with the probability values obtained from the factors that contribute to the risk, such as time of exposure, number of contacts made, and the environment.
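The text does not specify how the CIPC integrates its inputs. One common choice for merging several independent infection chances into one value is the noisy-OR rule, risk = 1 - Π(1 - pᵢ), sketched below as a swapped-in technique. The `susceptibility` multiplier standing in for the medical-history value is a hypothetical simplification.

```python
def combined_risk(exposure_probs, susceptibility: float = 1.0) -> float:
    """Noisy-OR combination of per-exposure infection probabilities.

    Treats exposures as independent: risk = 1 - prod(1 - p_i).
    `susceptibility` is a hypothetical multiplier derived from medical
    history (1.0 = baseline); each scaled p_i is capped at 1.
    """
    escape = 1.0  # probability of avoiding infection across all exposures
    for p in exposure_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        escape *= 1.0 - min(1.0, p * susceptibility)
    return 1.0 - escape


baseline = combined_risk([0.1, 0.2])                       # 1 - 0.9 * 0.8
vulnerable = combined_risk([0.1, 0.2], susceptibility=1.5)  # 1 - 0.85 * 0.7
```

Noisy-OR never exceeds 1 and increases with every added exposure, which plain summation of probabilities does not guarantee.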

Working of the Model

The framework of the application consists of the following steps:

Individual/Carrier Movement: Trace the movements of the individual/carrier through real-time monitoring using IDid-Inc. The data is stored, and previously tracked movements can be used to calculate the risk factor of contracting the Ebola virus.

Data Correlation: The data collected from individuals is stored in the repository and correlated using the factors that help analyze the risk of contracting the Ebola virus.

Medical History Form Repository: An individual's medical history yields a more accurate analysis of the risk of contracting Ebola from a carrier. Using Bayesian data analysis, we can estimate the value of their medical risk.

CIPC: The Contextual Intelligence Probability Calculator takes the value generated from the individual's medical history and computes the conditional probability based on the individual's movements with respect to the carrier at each instant of time. The type of activity, the place visited, and the duration spent at a place while the carrier is also moving are correlated with the main contributing factors, such as day of exposure and number of contacts made. The resulting conditional probability indicates whether the individual is at high or low risk.

Contextual Data Collection

Ebola_idid data

Using contextual intelligence, we can retrieve information about an individual's previous history and populate the database with it. This data can be analyzed, and probability graphs can be obtained from the values of the individual and the carrier. These in turn yield the susceptibility ratio, or risk factor, of an individual for contracting the disease.


Average Activity Time

Based on an analysis of the time spent on various activities, we compute an average time for each activity.

This chart shows the average time an individual spends on a particular activity. This may help estimate the duration of untracked activities for which empirical data is not otherwise available, which could reduce the number of false positives when predicting a risk value.
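Computing per-activity averages and using them as a fallback for untracked durations can be sketched in a few lines. The `(activity, minutes)` input format, the function names, and the sample log are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_durations(logged):
    """Mean duration in minutes per activity type from (activity, minutes) pairs."""
    by_kind = defaultdict(list)
    for kind, minutes in logged:
        by_kind[kind].append(minutes)
    return {kind: mean(values) for kind, values in by_kind.items()}


def estimated_duration(kind, measured_minutes, averages):
    """Prefer the measured duration; fall back to the activity's average."""
    return measured_minutes if measured_minutes is not None else averages[kind]


logged = [("coffee", 15), ("coffee", 20), ("restaurant", 50), ("restaurant", 70)]
averages = average_durations(logged)
untracked = estimated_duration("coffee", None, averages)  # falls back to the average
```

Using the mean is the simplest option; a median would be more robust to a few unusually long visits.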

Accuracy of CIPC (Contextual Intelligence Probability Calculator)

The application monitored a series of individuals located at different proximities to the carrier, and probability values were assigned based on contributing factors such as time of exposure, number of contacts made, and medical history. The Bayesian conditional probability was computed from the individuals' movements and activities.

This chart shows the accuracy of the contextual intelligence Bayesian analysis based on the different factors monitored for an individual with respect to the carrier's movements.

Dynamic Bayesian Networks and Hidden Markov Models

Hidden Markov models (HMMs) have become increasingly popular in computational biology, and many state-of-the-art sequence analysis algorithms are built on them.

Consider a scenario in this application where there is no information about an individual's location on a specific day. Using an HMM and the observations collected by the iDid application, the most likely hidden states of the person's history (for example, the places visited that day) can be inferred. These inferred values are themselves probabilistic and can serve as most-likely predictions when empirical values are not available.

Given the conditional probabilities, the hidden Markov model used in this application falls into a subclass of Bayesian networks known as dynamic Bayesian networks, which are Bayesian networks for modeling time series data. In a time series model, the assumption is that an event can cause another event in the future but not vice versa, which simplifies the design of the Bayesian network model. To keep the model simple, we do not use nth-order HMMs; the application could be extended to include them in the future.
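Inferring the most likely hidden locations from partial observations is what the Viterbi algorithm does for a first-order HMM. The sketch below is a standard Viterbi implementation; the two states, the observation labels (including `"no_signal"` for a gap in the data), and all probabilities are hypothetical.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence of a first-order HMM (Viterbi algorithm).

    start_p[s], trans_p[r][s] and emit_p[s][o] are probabilities.
    Returns (best_path, probability_of_best_path).
    """
    # Each column maps state -> (best path probability ending here, predecessor).
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        best.append({
            s: max((prev[r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states)
            for s in states
        })
    # Backtrack from the most probable final state.
    last_prob, last_state = max((best[-1][s][0], s) for s in states)
    path = [last_state]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, last_prob


states = ["home", "office"]
start_p = {"home": 0.8, "office": 0.2}
trans_p = {"home": {"home": 0.7, "office": 0.3},
           "office": {"home": 0.4, "office": 0.6}}
emit_p = {"home": {"home_wifi": 0.7, "office_wifi": 0.05, "no_signal": 0.25},
          "office": {"home_wifi": 0.05, "office_wifi": 0.7, "no_signal": 0.25}}
path, prob = viterbi(["home_wifi", "no_signal", "office_wifi"], states, start_p, trans_p, emit_p)
```

Here the untracked middle step is inferred as "home", giving the path home, home, office. For long sequences, the products should be replaced by sums of log-probabilities to avoid numerical underflow.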

Future Improvements

Although the thrust of this paper has been epidemiological research on the factors that contribute to contracting the Ebola virus, several other communicable diseases could affect individuals. One of the main reasons the Ebola virus spread across West Africa was the population's lack of awareness of the severity of the epidemic. There are similar deadly diseases whose risk factors people fail to understand, and they unwittingly become carriers. An epidemic of cholera was documented in Haiti in October 2010, for the first time in more than 100 years. Cases have continued to occur, raising the question of whether the microorganism has established environmental reservoirs in Haiti. The patterns and seasonality of cholera transmission in an environment are largely driven by water contamination, poor sanitation facilities and inadequate hygiene.

The application described in this paper can be extended to predict contaminated water locations and an individual's susceptibility to cholera based on location and climatic conditions. In May 2015, the Pan American Health Organization (PAHO) issued an alert regarding the first confirmed Zika virus infection in Brazil, and on February 1, 2016, the World Health Organization (WHO) declared the Zika virus a public health emergency of international concern (PHEIC). Local transmission has been reported in many other countries and territories, and the Zika virus will likely continue to spread to new areas. Our approach to Ebola risk analysis can be used to analyze the risk posed by other diseases such as Zika by changing the factors that govern the spread of the disease, the probability curves, and the other external factors described in this paper.


In this paper we described a framework that assesses the risk posed to an individual by relating contextual information on the individual's activities to that of a carrier. We relied on the iDid app and demonstrated how our system works. Since actual contextual information on specific carriers is unavailable, we used synthetic data while preserving the privacy of user data. We used Bayesian models to combine the risks from several factors into a single risk value. We plan to extend our study to model other infectious diseases.

In an article titled 'Computer Modelers vs Ebola', it is stated that in 2014 a team of researchers from Virginia Tech Institute tried to model and characterize the disease outbreak in West Africa, but the results did not prove accurate. Nevertheless, such work can be a stepping stone toward understanding individuals' behavior and their movements around carriers in order to statistically predict the course of a disease outbreak. If each individual can understand their risk factor and susceptibility to the disease, the likelihood of an outbreak could be reduced.

This project demonstrated ways in which mobile and wireless technologies can be used to implement the vision of pervasive healthcare.


PI: Dr. Krishna Kavi
Names of students in project: Arjun Gopalakrishnan, Shashank Adavally

Link to Application (under construction)

Ebola web application



University of North Texas