Norovirus: Epidemic detection using tweets

Internship, HumanTech Institute, 2020

  • 1 student to supervise (remotely).
  • The student worked full-time for 2 months.

Context of the project

A norovirus outbreak linked to raw shellfish, particularly oysters, occurred in France. The number of people in France who fell ill after eating contaminated raw shellfish rose to more than 1,000. Other countries, such as Sweden, Italy and the Netherlands, also reported outbreaks linked to live oysters from France, and products were recalled in Luxembourg, Switzerland, Hong Kong and Singapore because of a risk of norovirus contamination. The reported symptoms, such as diarrhea and vomiting, and the incubation times are consistent with norovirus or other enteric virus infections. In addition, several supermarket chains in France, Belgium and Luxembourg informed consumers of recalls because of possible norovirus contamination. The outbreak probably also reached Switzerland, though on a smaller scale and likely before the end of January 2020.

A dataset of more than 500,000 tweets was collected between 1 December 2019 and 24 January 2020. The goal of the project is to analyze the content of these tweets and determine whether they contain any evidence of the outbreak in France or Switzerland. Mentions of symptoms such as vomiting or diarrhea, or of the names of companies that recalled products, can serve as signals of the outbreak. The study should be based on the textual content of the tweets. Tweets relevant to the outbreak should be extracted and analyzed in depth. A first analysis can be restricted to tweets in French, which are more likely to be relevant.
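The keyword-based detection described above can be sketched in Python. This is a minimal illustration, assuming each tweet is a dict with `text` and `lang` fields; the keyword stems, the accent-stripping step and the `filter_tweets` helper are hypothetical choices made for illustration, not the keyword lists or code used in the project.

```python
import re
import unicodedata

# Hypothetical French keyword stems (written without accents); the lists
# actually defined during the project are not reproduced here.
SYMPTOM_STEMS = ["vomi", "diarrh", "gastro", "nause"]
PRODUCT_STEMS = ["huitre", "coquillage", "norovirus"]


def normalize(text: str) -> str:
    """Lowercase and strip accents so that 'Diarrhée' becomes 'diarrhee'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_pattern(stems: list[str]) -> re.Pattern:
    """Match a stem at the start of a word, allowing any suffix (plurals, verb forms)."""
    alternatives = "|".join(re.escape(normalize(s)) for s in stems)
    # The capturing group makes findall() return the matched stem, not the full word.
    return re.compile(r"\b(" + alternatives + r")\w*")


def filter_tweets(tweets: list[dict], stems: list[str], lang: str = "fr") -> list[dict]:
    """Keep tweets written in `lang` that contain at least one stem, recording which ones."""
    pattern = build_pattern(stems)
    relevant = []
    for tweet in tweets:
        if tweet.get("lang") != lang:
            continue  # first pass restricted to one language, e.g. French
        hits = sorted(set(pattern.findall(normalize(tweet["text"]))))
        if hits:
            relevant.append({**tweet, "keywords": hits})
    return relevant
```

Matching on stems rather than full words lets a single entry such as `vomi` cover `vomir`, `vomissement` and `vomissements`; the trade-off is occasional false positives from unrelated words that share the same prefix.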

Tasks done by the student

  • Review of similar studies on epidemic detection
  • Conceptual design of the architecture
  • Implementation of the back-end (REST API using the Flask framework and MongoDB) and the front-end (Vue.js)
  • Definition of the keywords used to collect tweets related to symptoms, media and stores involved in the norovirus outbreak
  • Analysis of results:
    • Daily number of tweets containing keywords from each of the 3 categories
    • Percentage of tweets containing symptoms, by country
    • Comparison with the official number of cases
  • Writing of the final report
  • Presentation of results to the institute
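The first two aggregations in the analysis of results can be sketched as follows. The example assumes tweets have already been tagged with a country and with the keyword categories they match (labeled here `symptoms`, `media` and `stores`); the field names `day`, `country` and `categories` and both helper functions are hypothetical, and inferring the country from tweet metadata is not shown.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical schema: each tweet is a dict with a "day" (datetime.date),
# a "country" code (or None when it could not be inferred) and the set of
# keyword "categories" it matched ("symptoms", "media", "stores").


def daily_counts(tweets: list[dict]) -> dict[str, Counter]:
    """Number of tweets per day for each keyword category."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for tweet in tweets:
        for category in tweet["categories"]:
            counts[category][tweet["day"]] += 1
    return dict(counts)


def symptom_share_by_country(tweets: list[dict]) -> dict[str, float]:
    """Percentage of each country's tweets that matched the "symptoms" category."""
    totals: Counter = Counter()
    with_symptoms: Counter = Counter()
    for tweet in tweets:
        country = tweet.get("country")
        if country is None:
            continue  # tweets without a country cannot be attributed, so they are skipped
        totals[country] += 1
        if "symptoms" in tweet["categories"]:
            with_symptoms[country] += 1
    return {country: 100.0 * with_symptoms[country] / n for country, n in totals.items()}


# Illustrative sample, not project data.
sample = [
    {"day": date(2020, 1, 10), "country": "FR", "categories": {"symptoms", "stores"}},
    {"day": date(2020, 1, 10), "country": "FR", "categories": set()},
    {"day": date(2020, 1, 11), "country": "CH", "categories": {"symptoms"}},
    {"day": date(2020, 1, 11), "country": None, "categories": {"media"}},
]
print(symptom_share_by_country(sample))  # {'FR': 50.0, 'CH': 100.0}
```

Dividing by each country's total tweet count, rather than reporting raw counts, keeps countries with very different tweet volumes comparable; the daily series per category is what would then be set against the official case counts.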