Recommendations vs. raw data: which is better?
Is it better to show users recommendations or raw data? On the one hand, raw data allows them to draw their own conclusions. It gives them control. It may even give people the satisfaction of discovering some insights, even if those insights are very obvious. On the other hand, what if you want them to take a specific action and they misinterpret the data?
In the “Data Nerdism at Large” episode of the DataFramed podcast, Mara Averick says that data visualization can be helpful and misleading at the same time.
She talks about a study of the toxicity of chemicals which polluted one community.
After the study, the researchers showed the people a scatter plot with a line that marked the threshold at which the substances become carcinogenic. On the same chart, there were points that represented other people in the study and one point that represented the household of the person who was looking at the chart.
Surprisingly often, somebody who was way, way above the threshold did not worry about it as long as they saw that they were below other houses in their neighborhood.
What problem does it show? There are two ways of looking at this situation. Which one you choose is, in my opinion, strongly correlated with your place on the political spectrum, but I cannot prove that.
What if the people understood the significance of that information, but they ignored it because that was a more comfortable approach? They saw that other people were in a worse situation. That justified their lack of action, so they anchored to that information and neglected everything else.
Maybe we have a problem with understanding that if we are dying, it does not matter that other people are dying faster.
That is one explanation. Let’s look at the second one.
What if the people who participated in the study understood the data, knew that they had a huge problem, but for some reason were unable to deal with it, so they were looking for something that gave them hope?
What if the only recommendation they could get was “move somewhere else”? If they lived in such a nasty area, we can assume that moving out was beyond their ability. What if they ignored the advice because they could not afford to take action?
Recommendation vs. raw data
That brings us back to the original question. Is it better to show raw data or to give recommendations?
In my opinion, we should offer a summary of the data (perhaps in the form of a recommendation), the raw data, and some explanation of the process we followed to get the recommendation. It seems that we should emphasize the importance of explainability not only in machine learning models, but also in data visualizations.
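As a rough illustration of that idea, here is a minimal Python sketch of a report that carries all three parts together. Everything in it is hypothetical: the `Report` class, the `build_report` function, the readings, and the threshold are made up for this example and do not come from the study mentioned in the podcast. Note that the recommendation is derived only from the threshold, while the neighbors' readings are included as context.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Report:
    recommendation: str    # the summary the reader sees first
    explanation: str       # how the recommendation was derived
    raw_data: list[float]  # the underlying readings, for those who want them


def build_report(own_reading: float, neighbor_readings: list[float], threshold: float) -> Report:
    # The recommendation depends only on the threshold,
    # never on how the household compares to its neighbors.
    if own_reading > threshold:
        recommendation = "Your reading is above the safety threshold."
    else:
        recommendation = "Your reading is not above the safety threshold."

    ratio = own_reading / threshold
    explanation = (
        f"Your reading is {own_reading:.1f}, which is {ratio:.1f} times "
        f"the threshold of {threshold:.1f}. "
        f"The neighborhood median is {median(neighbor_readings):.1f}; "
        "it is shown for context only and was not used in the recommendation."
    )
    return Report(recommendation, explanation, sorted(neighbor_readings))


# Made-up numbers: a household far above the threshold but below all neighbors.
report = build_report(30.0, [45.0, 60.0, 52.0], threshold=10.0)
print(report.recommendation)
print(report.explanation)
print(report.raw_data)
```

The explanation states outright that the comparison with the neighbors played no part in the recommendation, which is exactly the anchor the people in the story seemed to hold on to.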
Most importantly, we should never judge the consumers of our recommendations. They are doing what is best for them, even if their definition of “best” is entirely different from ours.
- Data/MLOps engineer by day
- DevRel/copywriter by night
- Python and data engineering trainer
- Conference speaker
- Contributed a chapter to the book "97 Things Every Data Engineer Should Know"
- Twitter: @mikulskibartosz