Originally published May 8, 2015 on Huffington Post
Today we are witnessing an unparalleled explosion of technology. But the sad fact of the matter is that Big Data and more technology alone have not improved the quality of our thinking or the ways in which most people and organizations make important decisions. By themselves, Big Data and technology do not help people avoid what are known as Type Three Errors: Solving the Wrong Problems Precisely!
There is no doubt that technology enhances, magnifies, and improves the senses, but it does not necessarily improve our sensibilities. Thus, technology allows us to see and to travel farther and faster, but it does not necessarily make us wiser.
Lest I be accused of being anti-science and anti-technology, let me point out that I have a BS in Engineering Physics, an MS in Structural Engineering, and a Ph.D. in Industrial Engineering, all from UC Berkeley. The thing, however, that changed my thinking forever is that when I was studying for my Ph.D., I deliberately chose to take a 3 ½ year minor in the Philosophy of Science. Furthermore, the particular kind of Philosophy of Science that I studied was deeply interdisciplinary. It taught me that no single discipline has a monopoly on The Truth or The Way to study Reality. In short, interdisciplinary inquiry is the only guarantor we know of for minimizing and avoiding Type Three Errors!
One of the primary ways of acquiring knowledge in Western societies is by means of Expert Consensus. In this system of inquiry, “truth” is that with which a group of experts strongly agrees. Alternatively, it is the average of a set of tightly bunched data, observations, scores, etc.
Consider global warming. The body of “reputable scientists worldwide” is now in strong, if not overwhelming, agreement that human activities are mainly responsible for global warming. This agreement, which is based on innumerable scientific studies, is taken as “strong evidence” that the debate over whether humans are or are not responsible for global warming is essentially over, even if all the mechanisms of the phenomenon are not completely understood.
The point is that agreement is as important in science as it is in any field of human activity. One could in fact argue that agreement is even more important in science because so much is riding on the outcome of scientific knowledge.
The latest incarnation of Expert Consensus is Big Data. The hope is that by compiling enough data from different sources, one can reveal, say, the “underlying, true buying habits and preferences” of a selected group of consumers. However, writing in The New York Times, Alex Peysakhovich and Seth Stephens-Davidowitz note that for Big Data to actually work, one is dependent on Small Data. That is, one needs old-fashioned interviews and surveys to understand in depth why people give the numerical responses they do. In other words, by themselves, numbers are never enough.
The biggest downfall of Expert Consensus is that it assumes one can gather data, facts, and observations on an issue or phenomenon without having to presuppose any prior theory about the nature of what one is studying. In other words, it assumes that data, facts, and observations are theory- and value-free. It’s not just that one can’t interpret anything without a theory of some kind; even more fundamentally, one can’t collect any data in the first place without having presupposed some theory about what underlies the data, certainly about why the data are important to collect and how they should be collected such that they accurately reflect the “true nature of the phenomenon.”
In contrast, the philosophical school known as Rationalism assumes that theories are free or independent of data, facts, and observations. In principle, the formulation of theories depends upon pure thought or logic alone. In reality, theories depend upon the background, experience, and life history of the person or small set of persons formulating them.
In sum, I’m extremely critical of Big Data. Not because the concept makes no sense at all, or because I’m fundamentally opposed to collecting data, but because, as it’s currently conceived, the concept is deeply flawed. If all data depend upon some underlying theory or theoretical concepts before they can even be captured, let alone analyzed, then what does it mean to pour all kinds of data indiscriminately into larger and larger pools?
Unless one knows a lot about the different theories that underlie different sets of data, and how they can be reconciled and integrated, then one is literally squishing together apples and oranges.
Mush is the inevitable result!