Data Pseudoscience
It’s 2021, and data science has become a commodity in many companies, though only in those that really understand its value and impact. Other companies, meanwhile, follow a new religion inspired by data science, which I call “data pseudoscience”.
Trapped by the hype around Artificial Intelligence, Machine Learning, Deep Learning and other cool words, many companies feel the urge to adopt all of it in the name of increasing profits. The fact is that they don’t even bother to understand what it is all about, presumably because doing so would transform the company in a way nobody on the board of directors would accept: when decisions are based on data, how will the directors justify their high compensation?
Still, because data science is a hot topic in the newspapers, a dose of bad data science looks better to them than none at all, and that is when data pseudoscience arises. Like traditional pseudoscience, data pseudoscience comes in a wide range of flavours, some of them so engaging and convincing that data scientists themselves might embrace them blindly.
Let me show you some typical practices, beliefs and attitudes that are incompatible with an effective data science culture, each compared to a famous pseudoscience:
-
Astrology: in pseudodata-driven enterprises, astrologers in disguise appear in the form of decision-makers who trust their limitless intuition and ignore the confirmation bias all humans suffer from. Instead of studying the movements of celestial bodies to predict your future, they use their feelings and hunches to lead the company to success. Moreover, just as astrologers take no responsibility for their wrong predictions, decision-makers face a massive lack of accountability for their continual failures. Astrologers may listen to data scientists, but only when the data confirm what they already believe.
-
Homoeopathy: homeopaths’ main fault is the lack of rigorous experiments demonstrating the beneficial effects of their products. Data pseudoscientists follow the same rule: changes are deployed to production without experimentation, in the hope that they will positively affect the company, and when that doesn’t happen, any cause will do as a fictitious explanation. Basic time-series behaviours like seasonal changes are poorly understood, and any change made during a period when an important metric such as revenue was already rising is deemed effective. What’s more, when decisions and their effects are not analysed, the company’s learning process suffers badly. Past decisions that were assumed to be positive may turn out to be harmful in the long term.
-
Parapsychology: one of the most typical forms of data pseudoscience is represented by data ghostbusters. When data quality is low, data scientists try their best to extract something meaningful from the mess. As a result, they end up chasing noise until they realize their efforts are useless and eventually leave the company. One way to detect a data parapsychology company is to measure the ratio of dedicated data engineering hours to data science hours. The lower the ratio, the more likely it is that you are suffering from data parapsychology.
-
Feng Shui: our environment can affect us in many ways, as Feng Shui holds. Nowadays, Tableau, Power BI and similar tools are used for the same purpose. With these tools, data pseudoscientists create colourful, attractive and fancy visualizations that any museum would be glad to hang. Unfortunately, they leave aside probability, statistics and the scientific method. If we truly want to know, from a scientific perspective, why our sales performed badly, we are unlikely to find out from the beautiful visualizations of data painters. Look around you. How many great visualizations in your company fail to answer its most pressing questions? Don’t ask data pseudoscientists for thousands of visualizations without any statistical support; hire committed data scientists with the right skills and point them at what really matters.
-
Numerology: everyone has a lucky number! As in Numerology, where numbers rule our fate, data numerologists are obsessed with numbers. In this data pseudoscientific behaviour, the short-term view is the norm: decisions are made using yesterday’s or last week’s KPI values. Moreover, numbers are so important that tables are the standard way of accessing data-based insights. Only long and tiresome tables. Numerologists in these companies stand out in another, more disturbing way: they ignore basic statistical rules, such as using hypothesis tests to compare averages, even when data scientists warn them. “Give me numbers, data scientists, and shut up!” they say.
-
Geocentrism: perhaps geocentrism is the most influential belief in pseudodata-driven organisations, mainly because it is the one most widely accepted by data scientists and engineers. It bears repeating: data science makes decisions more democratic, so that anyone in the company can potentially contribute a hypothesis to the decision-making process. What actually happens in geocentric companies is very far from that: only a tiny percentage of the company takes part in the vast majority of decisions. It is hard to swallow, but science demonstrated a long time ago that the Earth is not the centre of the universe.
-
Climate change denial: climate change denial is very common these days, and it takes a similar form in pseudodata-driven companies, where warnings from data scientists go unheard. Why? Basically, because in some cases the data disagree with our beliefs. In this low-quality culture based on falsehood, data scientists are assigned low-impact tasks, like reporting, simple visualizations or table design. Thus, data teams are unable to spend time on what really matters.
-
Ancient astronauts: some people believe extraterrestrial beings visited Earth and influenced humans at several moments in the past, leading to important advances in human history. In many pseudodata-driven organisations, something similar is happening. Let’s be honest with ourselves: data science is not cheap. Good data scientists are hard to find, effective data teams are even harder to build, infrastructure costs are high and cleaning data is ridiculously demanding. In other words, there are no shortcuts, and there is no use in waiting for ancient astronauts to help us increase the impact of data analysis on our business. A shortage of budget for data teams is a clear sign of ancient-astronaut devotees: set a proper budget for data teams if you really want to see a tangible impact. There is no free lunch, no matter which approach you follow.
-
Alchemy: data science is based on data, but sometimes the data we require are incomplete or don’t exist. Data scientists might then become data alchemists. Just as alchemists tried hard to make gold from any material, data alchemists do their best with data unrelated to the matter at hand, believing they are right to do so. They make up proxy variables that are statistically wrong but can convince anyone outside the data science world. Or they even produce persuasive analyses without relevant data. No data, no party: accept that data science doesn’t have all the answers and sometimes doesn’t have even one.
Finally, there is nothing wrong with ignoring the data science trend. Many prosperous companies are not data-driven. However, if you are definitely into data science, instead of focusing on what you have to accomplish next to be more data-driven, put your efforts into avoiding as many of the pseudoscientific practices described in this article as possible.
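To make the Numerology point about comparing averages a bit more concrete, here is a minimal Python sketch of a permutation test, one simple kind of hypothesis test for a difference in means. It is only an illustration: the function name `permutation_test`, its parameters and the daily sales figures are all made up for this example, and a real analysis would also have to consider things like seasonality and sample size.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def permutation_test(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, approximate_p_value), where the
    observed difference is mean(a) - mean(b). This is a hypothetical
    helper written for this article, not a library function.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so we shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            as_extreme += 1
    # The add-one correction keeps the estimated p-value above zero.
    p_value = (as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up daily sales for two consecutive weeks (illustrative only).
last_week = [102, 98, 110, 95, 105, 99, 101]
this_week = [104, 97, 112, 99, 103, 106, 100]

diff, p_value = permutation_test(this_week, last_week)
print(f"difference in average daily sales: {diff:.2f}")
print(f"approximate p-value: {p_value:.3f}")
```

With figures like these, this week’s average is slightly higher than last week’s, but the p-value is large, so the difference is well within what random reshuffling of the same days produces. That is exactly the kind of warning a data numerologist tends to ignore.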