Continued Privacy Worries That Could Eradicate Predictive Analytics

Predictive analytics is an extraordinarily sophisticated business practice that is so successful it might just evolve itself to the brink of extinction.

Tech writer Vangie Beal defines the space nicely:

"Predictive analytics is the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends. Predictive analytics doesn't tell you what will happen in the future. It forecasts what might happen in the future with an acceptable level of reliability, and includes what-if scenarios and risk assessment."

For decades, marketers have been trying to find and exploit patterns in what makes their customers buy, buy again, stay, switch, browse, click, join, sign up, opt in and check out. And while the astonishing science behind the dozens of different strains of predictive analytics is exploding, it's not even close to keeping pace with the stunningly vast amounts of data being collected and made available every minute of every day on a healthy percentage of the Earth's inhabitants.

In many countries, data privacy laws have recently and rapidly grown much more stringent, and the penalties for breaking those laws become significantly more onerous with each revision. Take, for example, the $1-million-a-day fines set forth in Canada's Anti-Spam Legislation (CASL), and the prison sentences being meted out in places like Hong Kong.

The laws are generally clearly written (with China being a huge and notable exception), consistently enforced and readily available for anyone concerned to inspect. The discourse on these issues is serious, the boundaries are defined, and the apparatus for adjudicating infractions in most countries is known and robust.

What's less obvious, and to some more worrisome, is the murky state of the data that's collected and woven together into what are called "alternative scores." Through the discipline of predictive analytics, these scores are being churned out in endless complexity and for increasingly specialized purposes.

Everybody knows what a credit score is and what it's used for. We're also aware that a person's credit history (the combination of data elements that feed the sophisticated models and algorithms behind a credit score) can and regularly does contain errors. To remedy this in the U.S., we can readily examine our credit reports and use a carefully regulated process to correct erroneous information held by the credit bureaus.

As far as data privacy goes, most of us understand what opting in and opting out mean. Every day, most of us face at least one decision to allow or prevent an entity from collecting a little, or a lot, of data about us. But other kinds of scores, these alternative scores, are also in wide use today.

Some are used for purposes that make sense to most of us. They're increasingly adept at predicting whether someone will commit fraud, cheat on their taxes, engage in criminal activity or become ensnared by terrorist propaganda, sometimes with horrifying results. Protecting ourselves by spotting these behaviors before they happen seems at worst a necessary evil, and at best a welcome safeguard.

Other alternative models, however, predict things that many people would consider a little creepy.

Maybe you're OK with data about the food you buy, how much you weigh and key details of your family health history being combined into a score that says you're on the bullet train to diabetes. But maybe you're not, especially if that score somehow interferes with your ability to get or keep a particular job.

Maybe, as a senior manager at Hewlett-Packard, you're thrilled to have access to the company's "Flight Risk Score," which identifies employees who are at risk of leaving and thereby opening a big, productivity-crushing hole in your senior team. But what if you're the employee, and you suddenly realize your company is tracking your raises, changes in position and performance evaluations, and synthesizing them into a nice, neat predictive score about you?

And there's the now-famous case of Target, which, as widely reported in 2012, noticed a correlation between purchases of certain products and pregnancy and therefore started marketing baby- and pregnancy-related products to those consumers as soon as that purchase behavior began. It seemed like a good idea at the time, but it freaked out a lot of people, especially those who hadn't yet told their families, significant others or employers about the impending parenthood.
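
To make the Target example a little more concrete, here is a minimal sketch of one common way to measure that kind of correlation: the "lift" of each product. Lift is how much more often a product shows up in the baskets of a known group than in baskets overall. This is an illustration only and not Target's actual method. The lift() function, the product names and the baskets are all hypothetical toy data.

```python
from collections import Counter


def lift(baskets, in_group):
    """Return {product: lift}, where lift = P(product | group) / P(product).

    baskets:  one set of purchased products per shopper.
    in_group: one bool per shopper, True if that shopper belongs to the
              group of interest (a hypothetical label for illustration).
    A lift well above 1.0 means the product is over-represented in the group.
    """
    n_all = len(baskets)
    n_group = sum(in_group)
    if n_all == 0 or n_group == 0:
        return {}
    # Each basket is a set, so a product counts at most once per shopper.
    overall = Counter(p for basket in baskets for p in basket)
    group = Counter(
        p for basket, flag in zip(baskets, in_group) if flag for p in basket
    )
    # Counter returns 0 for products never bought by the group.
    return {p: (group[p] / n_group) / (overall[p] / n_all) for p in overall}


# Made-up toy data: four shoppers, the first two in the group of interest.
baskets = [
    {"lotion", "vitamins"},
    {"lotion", "cotton balls"},
    {"bread", "vitamins"},
    {"bread", "milk"},
]
in_group = [True, True, False, False]

for product, value in sorted(lift(baskets, in_group).items()):
    print(f"{product}: {value:.2f}")
```

On this toy data, lotion and cotton balls have a lift of 2.0, vitamins 1.0, and bread and milk 0.0. No single purchase reveals anything, but a cluster of high-lift products can flag a shopper long before they've told anyone.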

Eric Siegel, executive editor of "Predictive Analytics Times," calls these "unvolunteered truths" and says they're the cause of the head-on collision that this critical area of statistical science is about to experience. Why? Because jillions of these data points are collected and synthesized every day, yielding one customized, predictive score after another. The problem is that this is done without the data subject's knowledge or consent, and without any way for the individual to correct errors or opt out of the process. As a result, data privacy regulators and watchdogs are turning their attention toward predictive analytics.

Advocates of predictive analytics (and I generally consider myself one) remind us that alternative scores aren't specific to a single person. Instead, they're an aggregated pattern of documented facts from many individuals that, when combined in certain ways, yield a likelihood of some future behavior. But it might not feel that way when you're the one receiving the maternity wear promotions and you've yet to disclose your pregnancy to anyone.
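
The advocates' argument can be illustrated with a minimal sketch of a frequency-based score. It never looks up what a specific person did. Instead, it estimates how often an outcome occurred among everyone in the historical records who shares a given pattern of attributes, and then hands that likelihood to any new person with the same pattern. The helpers build_pattern_scores() and score(), the attribute names and the records are hypothetical. Real systems use far richer models. The point is that the number describes a pattern, yet it lands on an individual.

```python
from collections import defaultdict


def build_pattern_scores(records, features, outcome):
    """Estimate P(outcome | pattern) from many individuals' historical records.

    records:  list of dicts, one per individual.
    features: keys whose values together define a pattern.
    outcome:  key holding a bool (did the behavior occur?).
    Returns {pattern_tuple: probability}.
    """
    counts = defaultdict(lambda: [0, 0])  # pattern -> [positives, total]
    for r in records:
        pattern = tuple(r[f] for f in features)
        counts[pattern][0] += int(r[outcome])
        counts[pattern][1] += 1
    return {p: pos / total for p, (pos, total) in counts.items()}


def score(person, pattern_scores, features, default=None):
    """Look up the aggregated likelihood for one person's pattern."""
    return pattern_scores.get(tuple(person[f] for f in features), default)


# Made-up history: three people share a pattern, and two of them had the outcome.
history = [
    {"bought_a": True, "bought_b": True, "outcome": True},
    {"bought_a": True, "bought_b": True, "outcome": True},
    {"bought_a": True, "bought_b": True, "outcome": False},
    {"bought_a": False, "bought_b": True, "outcome": False},
]
features = ["bought_a", "bought_b"]
table = build_pattern_scores(history, features, "outcome")

new_person = {"bought_a": True, "bought_b": True}
print(score(new_person, table, features))  # prints 0.6666666666666666
```

The new person contributed nothing to the table, yet they now carry a two-in-three score built entirely from other people's facts.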

I'm not casting aspersions. My company developed proprietary alternative scores to compare and contrast entire audiences for direct marketers, uncovering previously unconsidered matches between specific promotions and the thousands of audiences that marketers can choose from for email and other direct marketing campaigns.

As we built those tools, we looked closely at the tenets espoused by IBM's Jeff Jonas, who has spent years baking data privacy guards into the company's SPSS Modeler software. An inspirational figure in the field, Jeff is elevating the discipline by building infraction-proof systems, and that approach is the template for our tools: our systems never bring in any individual audience member's personally identifiable information. So for now, we like where we sit on the privacy spectrum, and we remain hopeful that the science of predictive analytics can continue to grow and thrive while the industry creates reasonable oversight and privacy protections.
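
A guard like "never bring in personally identifiable information" can be sketched minimally as follows, assuming audience records arrive as dictionaries. PII fields are dropped at the boundary, before anything is kept, and only audience-level shares leave the function. The PII_FIELDS set, the field names and the strip_pii() and audience_profile() helpers are hypothetical illustrations. They are not our production system and not based on any IBM product.

```python
from collections import Counter

# Hypothetical set of fields treated as personally identifiable information.
PII_FIELDS = {"name", "email", "phone", "street_address"}


def strip_pii(record):
    """Return a copy of one record with every PII field removed."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}


def audience_profile(records, attribute):
    """Summarize an audience as the share of members with each attribute value.

    PII is stripped first, so no identifier is ever kept or returned;
    asking for a PII field simply yields an empty profile.
    """
    cleaned = [strip_pii(r) for r in records]
    counts = Counter(r[attribute] for r in cleaned if attribute in r)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


# Made-up audience records.
audience = [
    {"email": "a@example.com", "region": "West", "category": "outdoor"},
    {"email": "b@example.com", "region": "West", "category": "books"},
    {"email": "c@example.com", "region": "East", "category": "outdoor"},
    {"email": "d@example.com", "region": "West", "category": "outdoor"},
]
print(audience_profile(audience, "category"))  # {'outdoor': 0.75, 'books': 0.25}
print(audience_profile(audience, "email"))     # {}
```

Comparing audiences then happens entirely on profiles like these, so matching a promotion to an audience never requires knowing who is in it.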