In 1854, a cholera outbreak ravaged the Soho district of London, killing more than 600 people. The city had weathered cholera epidemics in 1832 and 1849, but 1854 was different because of Dr. John Snow. The British doctor sought the source of the cholera through a radical experiment: the collection of data.
Snow documented every instance of the disease, literally mapping the outbreak and discovering that at the center of the illness was a single water pump. The finding paved the way for a theory that sickness was caused by germs rather than bad air.
Seth Stephens-Davidowitz cites Snow as a pioneer and his study as one of the earliest uses of Big Data. In his new book, Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are, Stephens-Davidowitz takes data analysis a step further and uses search results from Google and other Internet platforms to uncover the disparity between what people say and what people do.
Without Internet data, social scientists have traditionally relied on self-reported information. But, Stephens-Davidowitz notes, people lie about their innermost feelings. “Certain online sources get people to admit things they would not admit anywhere else. They serve as a digital truth serum,” he writes.
And that truth is not always pretty. Stephens-Davidowitz learned that a virulent, widespread racism exists in the midwestern United States; that the number of searches for “voting” in October can predict turnout for elections in November; and that child abuse reporting rates went down while actual abuse rose.
Our interaction with the Internet and with social media is also leading to new types of information that provide insight into society’s fears and anxieties. For example, Internet searches reveal that parents are excited about the intellectual prowess of their sons but concerned about the appearance of their daughters.
While these insights are revealing, Stephens-Davidowitz warns about putting too much stock in numbers alone. “A special sauce is often necessary to help Big Data work best: the judgment of humans and small surveys, what we might call small data.”
This small data manifests in human decisions based on expertise in a certain subject or simply on experience.
Both types of data are critical in solving a problem explored in this month’s cover story. In “The Dirty Secret of Drug Diversion,” Assistant Editor Lilly Chapa talks to experts who crunch numbers to determine which medical facilities might be victims of diversion—instances when controlled substances are intercepted before they reach the patient. However, investigators are on hand to explore the human element as well, noting that drug diverters—who are often addicts—need help and are relieved to be caught.