I’ve been thinking a lot about story lately – it’s key to engaging people with your work. Storytelling with facts, helping people to really understand a situation, is one way to describe a key data science skill.
As part of this I’ve been looking into aspects of the storytelling craft, such as the hero’s journey, or the five elements of storytelling. It strikes me that there’s an imperfect fit between a novelist’s conceptualisation of story and that held by, say, a journalist, an academic, or a data scientist. One key difference that we, as data scientists, face is that we are communicating information first and foremost. Sure, we may make it engaging, but we are always constrained by the fact that the truth is what it is, and we may not ethically deviate from it. A novelist, conversely, creates the world, the conflict, and the narrative that best serve their purpose.
Academia, where the same issues have existed for centuries, has never really prioritised engaging content in its lectures or journal articles, preferring as a rule to let the content stand for itself. This erects a barrier to entry, where the learner is expected to bring a level of expertise and commitment to the table in order to benefit from the material. A sub-profession then develops to interpret these weighty tomes into something more palatable for non-academic audiences, compiling research into textbooks or putting new information into context for the layperson. In this way, the academic themselves can concentrate on their content and on communicating it only to an audience of their direct peers.
For data scientists or analysts, neither of the two models, academic or creative, is a perfect fit. The communication of our work to non-specialists who have an immediate and personal stake in it is part of the work. Without that communication, the work becomes not only meaningless but also valueless. Nor can we go the other way and seek out a level of pure engagement as a creative writer could. Perhaps the closest fit is the science fiction writer, trying to communicate to their reader the details of the world they have created so that the plot and context can work together. How much “worldbuilding” do we do in our data storytelling so that our audience can understand the context in which our efforts have meaning?
Ultimately, data science exists at a junction between old statistics and new tech, so picking up and making use of different techniques is a standard part of our toolkit. The challenge with communication is that we are a community of programmers, mathematicians, statisticians and scientists, and we have not been trained to communicate in an engaging manner. Building a professional toolkit for communication is going to take time.
In the meantime, I think we can develop more engaging data storytelling with these techniques:
- Aim to demonstrate some form of progression, change, or new idea when communicating. This provides momentum and keeps your audience engaged.
- Focus on the purpose of your work. That is what brings meaning, and in an organisation shared purpose is a powerful connective.
- Bring in the context sensitively. Providing some judicious world building for your audience can help them share your vision for why this matters. But like a science fiction writer, avoid too much unnecessary exposition.
- Don’t be afraid to personalise it: “I was surprised to see this…” Emotion can be contagious, and your audience is unlikely to care if you don’t appear to.
- Use black boxes in your communications where necessary. You can refer to, for instance, a “random forest” algorithm as a black box: something you don’t need to describe in detail or go under the hood of. Your audience doesn’t need to know how the laser pistol’s “stun” setting works; they just need to know that it does.
I think a description of conflict or struggle, a key element in many stories, doesn’t help here, or at least only in a limited sense. I’ve tried to get managers to sympathise with technical challenges before, but library conflicts or the frustration of syntax variation are so limited to the programming context that they simply don’t work to generate any shared space with a non-technical reader. We also need to recognise that much of the struggle of producing the data product is not a central part of its journey to realisation – it’s an aside, more analogous to a builder crushing their thumb during a house build than Hercules completing his tasks. Struggle on its own doesn’t contribute to the story; it has to be a struggle that directly, by the fact of the struggle, moves us forward towards our goal. So while our headaches with obscure error messages, real as the pain is, may not support our greater story, the argument we had with Becky in Accounts, which resulted in the whole tool pivoting to something new, can.
Where we do have an advantage, as data scientists, is in the number of tools and opportunities we have to engage in data storytelling. Where academia publishes a weighty and informative tome at the end of each study, we can constantly communicate our purpose with our colleagues through informal conversation, chat and IM, requesting feedback at various stages, and of course through formal presentations and reports. We communicate with the general public through case studies, videos, blog posts and talks. We even communicate with each other and ourselves through code comments, naming conventions and GitHub commit messages. In all of these different spaces, though, we’re still telling our story. Telling it well is a skill worth honing.