Sue Halpern, a writer and scholar-in-residence at Middlebury College in Vermont, wrote recently in the New York Review of Books about the extraordinary reach of data mining. Her article is ostensibly a review of two books, but it is really a survey of how deeply various Internet sites reach into our so-called private lives. You would be astonished at what the Internet knows about you. Every click on Facebook, Google, or Amazon is added to your profile.

She writes:

“A few months ago The Washington Post reported that Facebook collects ninety-eight data points on each of its nearly two billion users. Among this ninety-eight are ethnicity, income, net worth, home value, if you are a mom, if you are a soccer mom, if you are married, the number of lines of credit you have, if you are interested in Ramadan, when you bought your car, and on and on and on.

“How and where does Facebook acquire these bits and pieces of one’s personal life and identity? First, from information users volunteer, like relationship status, age, and university affiliation. They also come from Facebook posts of vacation pictures and baby pictures and graduation pictures. These do not have to be photos one posts oneself: Facebook’s facial recognition software can pick you out of a crowd. Facebook also follows users across the Internet, disregarding their “do not track” settings as it stalks them. It knows every time a user visits a website that has a Facebook “like” button, for example, which most websites do.

“The company also buys personal information from some of the five thousand data brokers worldwide, who collect information from store loyalty cards, warranties, pharmacy records, pay stubs, and some of the ten million public data sets available for harvest. Municipalities also sell data—voter registrations and motor vehicle information, for example, and death notices, foreclosure declarations, and business registrations, to name a few. In theory, all these data points are being collected by Facebook in order to tailor ads to sell us stuff we want, but in fact they are being sold by Facebook to advertisers for the simple reason that the company can make a lot of money doing so….

“In fact, the datafication of everything is reductive. For a start, it leaves behind whatever can’t be quantified. And as Cathy O’Neil points out in her insightful and disturbing book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, datafication often relies on proxies—stand-ins that can be enumerated—that bear little or no relation to the things they are supposed to represent: credit scores as a proxy for the likelihood of being a good employee, for example, or “big five” personality tests like the ones used by the Cambridge Psychometrics Centre, even though, as O’Neil reports, “research suggests that personality tests are poor predictors of job performance.”

“There is a tendency to assume that data is neutral, that it does not reflect inherent biases. Most people, for instance, believe that Facebook does not mediate what appears in one’s “news feed,” even though Facebook’s proprietary algorithm does just that. Someone—a person or a group of people—decides what information should be included in an algorithm, and how it should be weighted, just as a person or group of people decides what to include in a data set, or what data sets to include in an analysis. That person or group of people come to their task with all the biases and cultural encumbrances that make us who we are. Someone at the Cambridge Psychometrics Centre decided that people who read The New York Review of Books are feminine and people who read tech blogs are masculine. This is not science, it is presumption. And it is baked right into the algorithm.

“We need to recognize that the fallibility of human beings is written into the algorithms that humans write. While this may be obvious when we’re looking at something like the Cambridge Psychometrics analysis, it is less obvious when we’re dealing with algorithms that “predict” who will commit a crime in the future, for example—which in some jurisdictions is now factored into sentencing and parole decisions—or the algorithms that deem a prospective employee too inquisitive and thus less likely to be a loyal employee, or the algorithms that determine credit ratings, which, as we’ve seen, are used for much more than determining creditworthiness. (Facebook is developing its own credit-rating algorithm based on whom one associates with on Facebook. This might benefit poor people whose friends work in finance yet penalize those whose friends are struggling artists—or just struggling.)…

“If it is true, as Mark Zuckerberg has said, that privacy is no longer a social norm, at what point does it also cease to be a political norm? At what point does the primacy of the individual over the state, or civil liberties, or limited government also slip away? Because it would be naive to think that governments are not interested in our buying habits, or where we were at 4 PM yesterday, or who our friends are. Intelligence agencies and the police buy data from brokers, too. They do it to bypass laws that restrict their own ability to collect personal data; they do it because it is cheap; and they do it because commercial databases are multifaceted, powerful, and robust.

“Moreover, the enormous data trail that we leave when we use Gmail, post pictures to the Internet, store our work on Google Drive, and employ Uber is available to be subpoenaed by law enforcement. Sometimes, though, private information is simply handed over by tech companies, no questions asked, as we learned not long ago when we found out that Yahoo was monitoring all incoming e-mail on behalf of the United States government. And then there is an app called Geofeedia, which has enabled the police, among others, to triangulate the openly shared personal information from about a dozen social media sites in order to spy on activists and shut down protests in real time.

“Or there is the secretive Silicon Valley data analysis firm Palantir, funded by the Central Intelligence Agency and used by the NSA, the CIA, the FBI, numerous police forces, American Express, and hundreds of other corporations, intelligence agencies, and financial institutions. Its algorithms allow for rapid analysis of enormous amounts of data from a vast array of sources like traffic cameras, online purchases, social media posts, friendships, and e-mail exchanges—the everyday activities of innocent people—to enable police officers, for example, to assess whether someone they have pulled over for a broken headlight is possibly a criminal. Or someday may be a criminal.

“It would be naive to think that there is a firewall between commercial surveillance and government surveillance. There is not.

“Many of us have been concerned about digital overreach by our governments, especially after the Snowden revelations. But the consumerist impulse that feeds the promiscuous divulgence of personal information similarly threatens our rights as individuals and our collective welfare. Indeed, it may be more threatening, as we mindlessly trade ninety-eight degrees of freedom for a bunch of stuff we have been mesmerized into thinking costs us nothing.”