Sue Halpern, a writer and scholar-in-residence at Middlebury College in Vermont, wrote recently in the New York Review of Books about the extraordinary reach of data mining. Her article is ostensibly a review of two books, but is actually her summary of the reach of various internet sites into our so-called private lives. You would be astonished at what the Internet knows about you. Every click on Facebook, Google, or Amazon is added to your profile.
She writes:
“A few months ago The Washington Post reported that Facebook collects ninety-eight data points on each of its nearly two billion users. Among this ninety-eight are ethnicity, income, net worth, home value, if you are a mom, if you are a soccer mom, if you are married, the number of lines of credit you have, if you are interested in Ramadan, when you bought your car, and on and on and on.
“How and where does Facebook acquire these bits and pieces of one’s personal life and identity? First, from information users volunteer, like relationship status, age, and university affiliation. They also come from Facebook posts of vacation pictures and baby pictures and graduation pictures. These do not have to be photos one posts oneself: Facebook’s facial recognition software can pick you out of a crowd. Facebook also follows users across the Internet, disregarding their “do not track” settings as it stalks them. It knows every time a user visits a website that has a Facebook “like” button, for example, which most websites do.
“The company also buys personal information from some of the five thousand data brokers worldwide, who collect information from store loyalty cards, warranties, pharmacy records, pay stubs, and some of the ten million public data sets available for harvest. Municipalities also sell data—voter registrations and motor vehicle information, for example, and death notices, foreclosure declarations, and business registrations, to name a few. In theory, all these data points are being collected by Facebook in order to tailor ads to sell us stuff we want, but in fact they are being sold by Facebook to advertisers for the simple reason that the company can make a lot of money doing so….
“In fact, the datafication of everything is reductive. For a start, it leaves behind whatever can’t be quantified. And as Cathy O’Neil points out in her insightful and disturbing book Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, datafication often relies on proxies—stand-ins that can be enumerated—that bear little or no relation to the things they are supposed to represent: credit scores as a proxy for the likelihood of being a good employee, for example, or “big five” personality tests like the ones used by the Cambridge Psychometrics Centre, even though, as O’Neil reports, “research suggests that personality tests are poor predictors of job performance.”
“There is a tendency to assume that data is neutral, that it does not reflect inherent biases. Most people, for instance, believe that Facebook does not mediate what appears in one’s “news feed,” even though Facebook’s proprietary algorithm does just that. Someone—a person or a group of people—decides what information should be included in an algorithm, and how it should be weighted, just as a person or group of people decides what to include in a data set, or what data sets to include in an analysis. That person or group of people come to their task with all the biases and cultural encumbrances that make us who we are. Someone at the Cambridge Psychometrics Centre decided that people who read The New York Review of Books are feminine and people who read tech blogs are masculine. This is not science, it is presumption. And it is baked right into the algorithm.
“We need to recognize that the fallibility of human beings is written into the algorithms that humans write. While this may be obvious when we’re looking at something like the Cambridge Psychometrics analysis, it is less obvious when we’re dealing with algorithms that “predict” who will commit a crime in the future, for example—which in some jurisdictions is now factored into sentencing and parole decisions—or the algorithms that deem a prospective employee too inquisitive and thus less likely to be a loyal employee, or the algorithms that determine credit ratings, which, as we’ve seen, are used for much more than determining creditworthiness. (Facebook is developing its own credit-rating algorithm based on whom one associates with on Facebook. This might benefit poor people whose friends work in finance yet penalize those whose friends are struggling artists—or just struggling.)…”
“If it is true, as Mark Zuckerberg has said, that privacy is no longer a social norm, at what point does it also cease to be a political norm? At what point does the primacy of the individual over the state, or civil liberties, or limited government also slip away? Because it would be naive to think that governments are not interested in our buying habits, or where we were at 4 PM yesterday, or who our friends are. Intelligence agencies and the police buy data from brokers, too. They do it to bypass laws that restrict their own ability to collect personal data; they do it because it is cheap; and they do it because commercial databases are multifaceted, powerful, and robust.
“Moreover, the enormous data trail that we leave when we use Gmail, post pictures to the Internet, store our work on Google Drive, and employ Uber is available to be subpoenaed by law enforcement. Sometimes, though, private information is simply handed over by tech companies, no questions asked, as we learned not long ago when we found out that Yahoo was monitoring all incoming e-mail on behalf of the United States government. And then there is an app called Geofeedia, which has enabled the police, among others, to triangulate the openly shared personal information from about a dozen social media sites in order to spy on activists and shut down protests in real time.
“Or there is the secretive Silicon Valley data analysis firm Palantir, funded by the Central Intelligence Agency and used by the NSA, the CIA, the FBI, numerous police forces, American Express, and hundreds of other corporations, intelligence agencies, and financial institutions. Its algorithms allow for rapid analysis of enormous amounts of data from a vast array of sources like traffic cameras, online purchases, social media posts, friendships, and e-mail exchanges—the everyday activities of innocent people—to enable police officers, for example, to assess whether someone they have pulled over for a broken headlight is possibly a criminal. Or someday may be a criminal.
“It would be naive to think that there is a firewall between commercial surveillance and government surveillance. There is not.
“Many of us have been concerned about digital overreach by our governments, especially after the Snowden revelations. But the consumerist impulse that feeds the promiscuous divulgence of personal information similarly threatens our rights as individuals and our collective welfare. Indeed, it may be more threatening, as we mindlessly trade ninety-eight degrees of freedom for a bunch of stuff we have been mesmerized into thinking costs us nothing.”
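To make concrete Halpern’s point that human choices get “baked right into the algorithm,” here is a minimal, entirely hypothetical sketch in Python. The proxies and weights below are invented for illustration; nothing about them comes from Facebook, Cambridge Psychometrics, or any real scoring system. The pattern, though, is the one she and Cathy O’Neil describe: someone decides which proxies count and how much, and every score the code produces inherits those decisions.

```python
# Entirely hypothetical scoring model, invented to illustrate how human choices
# about proxies and weights end up "baked into" an algorithm.
WEIGHTS = {
    "credit_score": 0.5,       # someone decided credit history stands in for reliability
    "personality_test": 0.3,   # someone decided a "big five"-style score predicts performance
    "reads_tech_blogs": 0.2,   # someone decided this signals whatever they assume it signals
}

def suitability_score(candidate: dict) -> float:
    """Weighted sum of proxies. Nothing here is neutral: the proxies and the
    weights are choices made by whoever wrote this function."""
    return sum(weight * candidate.get(proxy, 0.0) for proxy, weight in WEIGHTS.items())

# Two candidates who differ only in the things these particular proxies capture.
print(suitability_score({"credit_score": 0.9, "personality_test": 0.4, "reads_tech_blogs": 0.0}))
print(suitability_score({"credit_score": 0.4, "personality_test": 0.9, "reads_tech_blogs": 1.0}))
```

Change a weight or swap in a different proxy and the ranking of those two candidates can flip. That is the sense in which the bias lives in the code and in the choices behind it, not in some neutral body of data.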

If we do not institutionalize a right of privacy through constitutional amendment, we will ultimately sacrifice our civil liberties. Whether in the medical and financial details of our private lives or in the political and philosophical beliefs we act on publicly, privacy matters. We need to define precisely what should be private and enshrine those definitions in a constitution that explicitly protects privacy and anticipates the future inroads technological innovation will make on our private lives.
Although a right of privacy is not explicitly stated in the US Constitution, the Supreme Court has articulated a generalized right of privacy. See
Griswold v. Connecticut (1965)
https://www.oyez.org/cases/1964/496
Hard to believe, but in 1965 it was still illegal in Connecticut for a married man to put on a condom before having intercourse with his wife.
Since there have been only twenty-seven amendments to the US Constitution since 1789, I would put the odds of getting a constitutional amendment to protect privacy at incredibly slim.
Big data is a double-edged sword. It has yielded, and will continue to yield, incredibly valuable results in areas like transportation (air, traffic, marine) as well as in disease research. Bioinformatics was unknown in cancer research a decade ago, and now no serious research institution can do its work without it. Rosabeth Moss Kanter’s book “Move: Putting America’s Infrastructure Back in the Lead” explains this as well as anything I’ve ever read.
But when big data is used for human manipulation, as is outlined very well in this and past postings on this blog, it has the potential to be much worse than anything Big Brother embodied. Sadly, the toothpaste can’t be put back in the tube; there is no clean way to separate the good applications of big data from the sinister ones.
If anything, as Diane and so many commenters have expressed far more succinctly than I can, this underscores why citizen engagement is so vital to combat the nefarious uses of big data. I’m just pessimistic: given the lack of civic engagement on so many fronts, it will get far worse in the near future.
First, there is an intimate, even essential, relationship between privacy and the freedoms that are ensconced in the First Amendment and that are supposed to be operative for all. A collaborative reflection on that relationship would be productive in developing resistance to the threat to our privacy, I am sure.
Second, embedded in the post is this: “We need to recognize that the fallibility of human beings is written into the algorithms that humans write. While this may be obvious when we’re looking at something like the Cambridge Psychometrics analysis, it is less obvious when we’re dealing with algorithms that ‘predict’ who will commit a crime in the future, for example—”
Two problems rear their ugly heads here. If you are interested, I have added a brief explanation in my own REPLY below. But the upshot is this:
. . . the statistician hates that they cannot predict with bulls-eye accuracy; and so, instead of accounting for the huge differences involved in studying human data, including the hugely influential effect of their own potentially fallible development on the questions they ask, scientists and statisticians continually try to reduce human data to fit a Procrustean bed made up of expectations proper to unhistorical, non-conscious, and non-self-directive (non-human) natural and statistical data.
A basic tenet of science is to pay attention to the data. Where the sciences deal with human data, we are on our way, but we are not there yet. The implications of that quoted paragraph for the future of the sciences (all of them) are vast.
ADDENDUM to the above post, with its quoted paragraph about the fallibility of scientists when dealing with human data. Two problems rear their ugly heads:
(1) the problem of predictability, when your data are about human beings and not merely about the laws of physical or even statistical science.
Bulls-eye predictability escapes even the physical sciences, because the laws hold only where scientists can start with complete generality and “all other things being equal . . .” In statistics, such predictability escapes us because the whole idea of the norm assumes “errant” data that do not fall within that norm. This plays out in the classroom when, against the logic of the norm, a teacher spontaneously gives an “errant” child (one who falls outside the norm ensconced in the curricula) more, not less, help and attention. (A toy sketch of this norm-and-errant-data point appears after point (2) below.)
(2) the problem of the “fallibility of human beings” written into the algorithms, the most insightful point I have seen in a very long time.
That is, the present keepers and users of algorithms are inheritors of a scientific tradition devoid of self-reflection (psychological, social, philosophical, ethical, political, or otherwise). And yet, in their own long history of development (or lack of it), long before they became scientists, they “developed” views of all sorts that, once they become scientists (or whatever else), they bring to their work.
It’s a huge but systematically neglected variable that scientists and statisticians cannot NOT bring to their work. So too they bring their experience and intelligence. There is no blank slate; if there were, such scientists would be rendered brain-dead and so would be scientists no more.
With natural and even some statistical data, the effect of that kind of background development is minimal in most (not all) instances. But when fully human data come under consideration by other human beings, the scientist’s across-the-board background development is made up of a set of horizons and experiences that inform the very kinds of questions the scientist asks. Though ethical “norms” are often part of our fields, overlooking the scientist’s personal development still makes it too easy to set aside the question of whether there are some questions we should not ask of our data. FWIW
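To return to point (1), here is a toy illustration of the “norm” idea; the scores and the 1.5-standard-deviation cutoff are made up purely for the example. A statistical norm is nothing more than a mean and a tolerance that someone chose, so some of the data get labeled “errant” by construction.

```python
# Made-up classroom scores; the 1.5-standard-deviation cutoff is equally arbitrary.
from statistics import mean, stdev

scores = [72, 75, 78, 74, 76, 41, 97]

avg = mean(scores)
spread = stdev(scores)

for s in scores:
    z = (s - avg) / spread          # distance from the "norm" in standard deviations
    label = "errant" if abs(z) > 1.5 else "within the norm"
    print(f"score {s}: z = {z:+.2f} -> {label}")
```

With these invented numbers, the only “errant” score is the 41; whether that flag earns the child more help or less is exactly the kind of question the statistics cannot answer.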
The fallibilities of big data as a commodity can be seen in a much simpler form in the collection agency racket. Lists of bad debts are bought, used, and resold unedited, that is, with entirely inaccurate data remaining in the list. Assuming that a few collections succeed with each pass, mostly due to new additions, the list becomes a bigger and bigger waste of time, with less and less potential for ROI as the dead wood builds up. Since the debt is sold for about 4 cents on the dollar, there is a negative incentive to purge the bad data from the lists: reselling them is a more reliable source of income than collecting on them. There’s even a term for it: “ZOMBIE DEBT”! http://www.nolo.com/legal-encyclopedia/debt-scavengers-zombie-debt-32240.html
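A rough, hypothetical sketch of that arithmetic (every rate below is invented for the sake of illustration, not taken from the industry): each buyer pays roughly 4 cents on the dollar for the whole list, dead wood included, while only the accurate entries ever produce collections, so the dead-wood share creeps up with every resale.

```python
# Hypothetical "zombie debt" dynamic: invented numbers, illustrative only.
face_good = 600_000      # face value of accurate, potentially collectible debts ($)
face_dead = 400_000      # face value of inaccurate entries ("dead wood") never purged ($)
resale_rate = 0.04       # lists change hands for roughly 4 cents on the dollar
collect_rate = 0.03      # share of the good debt actually collected per pass (invented)

for cycle in range(1, 6):
    total_face = face_good + face_dead
    price_paid = total_face * resale_rate      # the buyer pays for the dead wood too
    collected = face_good * collect_rate
    print(f"pass {cycle}: paid ${price_paid:,.0f}, collected ${collected:,.0f}, "
          f"dead wood = {face_dead / total_face:.0%} of the list")
    face_good -= collected                     # successful collections drop off; the junk never does
```

The seller still pockets four cents on every stale dollar at each resale, which is exactly the disincentive to purge described above.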
How long until the idea of ZOMBIE DATA takes hold and throws water on the dumpster fire of the unwarranted assumptions about big data? That’s up to all of us to determine.
Here’s a WSJ article on the topic; just substitute “big data” for “debt” and feel the chill run down your spine. http://www.wsj.com/articles/SB10001424052702304885404579550191517738938
Kids are data, and data is for sale.
Personal religious beliefs, skin color, education, political leanings, etc., have all been, and currently are, used in some countries as grounds for execution. Privacy is a big deal! We have failed to protect ourselves from the dangers of the loss of privacy.
“Code-Dependent: Pros and Cons of the Algorithm Age
Algorithms are aimed at optimizing everything. They can save lives, make things easier and conquer chaos. Still, experts worry they can also put too much control in the hands of corporations and governments, perpetuate bias, create filter bubbles, cut choices, creativity and serendipity, and could result in greater unemployment”
“On January 17, 2017, the Future of Life Institute published a list of 23 Principles for Beneficial Artificial Intelligence, created by a gathering of concerned researchers at a conference at Asilomar, in Pacific Grove, California. The more than 1,600 signatories included Stephen Hawking, Elon Musk, Ray Kurzweil and hundreds of the world’s foremost AI researchers.
The use of algorithms is spreading as massive amounts of data are being created, captured and analyzed by businesses and governments. Some are calling this the Age of Algorithms and predicting that the future of algorithms is tied to machine learning and deep learning that will get better and better at an ever-faster pace.
While many of the 2016 U.S. presidential election post-mortems noted the revolutionary impact of web-based tools in influencing its outcome, XPrize Foundation CEO Peter Diamandis predicted that “five big tech trends will make this election look tame.” He said advances in quantum computing and the rapid evolution of AI and AI agents embedded in systems and devices in the Internet of Things will lead to hyper-stalking, influencing and shaping of voters, and hyper-personalized ads, and will create new ways to misrepresent reality and perpetuate falsehoods.”
State tests use algorithms. State tests embed bias and undermine self-determination.
http://www.pewinternet.org/2017/02/08/code-dependent-pros-and-cons-of-the-algorithm-age/