Title: Big Data Research Brings Surprising Results About the Spread of Flu
Georgetown biology professor Shweta Bansal and her research team are using big data to dig deeper into what causes the spread of flu.
“When we visit a doctor’s office, make a phone call or search for a term on Google, we are leaving behind anonymous digital traces that can provide significant information for tracking the diseases we spread,” says Bansal, who recently co-edited a special issue of The Journal of Infectious Diseases on using big data to track infectious diseases.
“Translating that information into knowledge presents a real opportunity and challenge for infectious disease epidemiology,” she explains.
Holidays: Less Flu
In a study in press at The Journal of Infectious Diseases, Bansal and her research team determined through big data that flu transmission is reduced during the holidays because children, the primary spreaders of flu, aren’t in school.
“Our hypothesis was that because schools are closed, fewer people will be infected,” Bansal says. “And in fact, both our data and our statistical model showed that to be the case. The surprising part was finding that the burden of flu shifts from school-age children to adults during the holidays, highlighting the significance of schools in driving infection among youth.”
Anne Ewing (C’16), now in medical school at the University of Louisville, and Elizabeth Lee (G’17), a Ph.D. candidate in global infectious diseases, are co-first authors on the study, along with an NIH scientist and Bansal.
Humidity vs. Population Density
Bansal says other research being conducted in her lab shows that humidity actually plays a larger role in the spread of flu than population density.
“We have two explanations for this finding,” says Bansal, who holds a Ph.D. in network modeling and infectious disease ecology from the University of Texas at Austin. “When climates are cold and dry, people tend to stay indoors in more crowded conditions where it’s easier to spread influenza, so it doesn’t matter if you’re in a high-population center.”
“But also in drier conditions, the virus itself is able to survive more easily,” she adds.
Unprecedented Opportunity
Bansal’s lab is testing numerous hypotheses about the spread of flu using more comprehensive big data coupled with sophisticated statistical epidemiology models.
The Centers for Disease Control and Prevention predicts flu with data from only about 1,900 doctors across the country reporting state-by-state on a voluntary basis.
But Bansal and her team are using a database of health insurance claims that covers most physicians’ offices and hospitals in the United States at the county level.
So instead of having data from fewer than 2,000 doctors, the team is accessing information from 400,000 physicians and almost 1 billion visits a year.
“That is an order of magnitude more information than what we’ve had access to in the past,” the professor notes. “This provides an unprecedented opportunity to understand very precisely what the distribution of flu is around the country, which helps us better predict future seasons of flu and design improved control strategies.”
Social Justice
The team is also looking at the role of poverty, health care access and other socio-economic factors in relation to the spread of flu. In the future, the researchers also hope to examine health disparities.
“This work really opens up some research directions for us,” Bansal says. “This vast amount of data allows us to consider a range of determinants in the spread of disease, and suggests it may be attributable beyond biological or environmental factors to include socioeconomic factors.
“Infectious diseases are amplified by health care, material and social disparities, and therefore have significant social justice implications,” she adds. While this line of research addresses flu, the methods the Bansal Lab uses have implications for predicting a range of infectious diseases such as Ebola, Zika and Chikungunya.
Traditional vs. Big Data Surveillance
Cecile Viboud, a senior scientist at NIH’s Fogarty International Center who co-edited the special issue of The Journal of Infectious Diseases, says the “ultimate goal” of such research “is to be able to forecast the size, peak or trajectory of an outbreak weeks or months in advance in order to better respond to infectious disease threats.”
The article Bansal and her colleagues published in the same special issue looked at the pros and cons of traditional surveillance versus big data.
“Traditional disease and behavior surveillance is notorious for severe time lags, lack of spatial resolution and high costs,” explains Bansal, who came to Georgetown in 2012 after post-doctoral work in disease ecology at Penn State and public health policy at the Fogarty Center. “Instead, big data provides the potential to have access to information on disease cases with precise locations and in almost real-time.”
Because big data comes from social media, Internet searches and administrative data such as medical claims, researchers need to “proceed with caution and account for what data might be missing and what biases may exist, so that estimates are accurate,” she says.
Dangerous Clusters
Sandra Goldlust (G’17), a master’s degree candidate in the Biohazard Threat Agents and Emerging Infectious Diseases program at Georgetown University Medical Center, is working on yet another research project in Bansal’s lab as a research associate in the biology department.
Goldlust is looking at the geographic distribution of vaccination refusal.
“When vaccination refusal cases occur in clusters they are a lot more dangerous,” Bansal explains, “because the infection then has access to a large number of individuals who can spread disease to one another and cause a significant outbreak.”
Amazing Feeling
Lee, who is pursuing a Ph.D. in global infectious disease at the Graduate School and was the lead author on another article in the special journal issue, says she feels “incredibly lucky” to be working with Bansal.
“She continually pushes me to think more critically as a scientist and to aspire to be a better citizen of the scientific community,” says Lee, of Pleasanton, California.
Ewing says it’s an “amazing feeling” to know that her undergraduate work will culminate in getting published in a prestigious journal.
Forward-Looking Science
The professor says she appreciates the investment Georgetown and the biology department are making in the sciences, and especially in interdisciplinary science.
“To bring someone like me who comes from a nontraditional biology training background onto the faculty takes a department and an administration that’s very forward-looking in terms of where science is going and where biology research needs to go,” Bansal says.
She also says students have many incentives to get involved in research, which strengthens the link between teaching and research at Georgetown.
The Next Generation
Bansal teaches courses that include Influenza: Science & Policy, Modeling Biological Populations and Linked: Exploring Social and Life Sciences, in addition to instructing students in her lab.
“Dr. Bansal encourages every student to be active participants and will take the time to fully answer each question until everyone understands,” Ewing says.
The professor says she finds teaching both graduate and undergraduate students very rewarding.
“To watch them engage in the scientific process starting with a research question and finishing with a peer-reviewed scientific publication is a very fulfilling process,” she says. “These students represent the next generation of leaders, and I think it’s important for them to understand the process that goes into every scientific study they read about.”
“Doing scientific research requires a serious commitment,” Bansal adds. “To dedicate six to eight semesters to research on top of coursework, extracurricular activities and applying to jobs or graduate school – that takes a lot of dedication.”