How to Write Data Driven Stories

Data is a crucial resource for the modern journalist: not only is it a source of vital information, but it is also a source of power to be held to account.

Knowing how to use data to find and check stories helps empower you to do so much more with your reporting, from widening your potential news leads and inspiring new feature ideas, to helping you identify the right people to interview — and giving you the right questions to ask.

In this post I will walk through the steps involved in a data driven story: finding the data and making sure it’s reliable; getting answers from it, perhaps by combining it with other datasets to put it into context; and, finally, communicating the results in an engaging and accessible way.

But the first step with any story is a really good idea — and that’s where we’ll begin…

The Inverted Pyramid of Data Journalism outlines the different stages that might be involved in a data driven story.

Finding and generating ideas for data driven stories

Like any piece of reporting, a data driven story can start when you respond to some new information — data — being released, or it can start with an idea that leads you to seek out information that helps shed new light on an issue or event.

A good way to get started with data driven stories is to find datasets that are released regularly, and plan for their next release. Many public and statistical bodies publish release calendars showing when future datasets are scheduled to come out. Where there is no calendar, you can estimate roughly when the next edition is due by checking how often the dataset is published (yearly, for example) and setting a reminder to check back shortly before that interval has passed (see my post on planning a data news diary for more on this).
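If you have a few past release dates but no calendar, that estimate is simple date arithmetic. The Python sketch below is one way to do it, assuming releases follow a roughly regular cycle; the function name, the 14-day reminder and the example dates are hypothetical, not taken from any real dataset.

```python
from datetime import date, timedelta
from statistics import median_low

def estimate_next_release(past_releases, reminder_days=14):
    """Estimate the next release date from past release dates.

    Assumes a roughly regular release cycle, so treat the result as a
    rough guide rather than a schedule. Returns (estimated, reminder).
    """
    releases = sorted(past_releases)
    if len(releases) < 2:
        raise ValueError("need at least two past releases to estimate a cycle")
    # Gap in days between each pair of consecutive releases
    gaps = [(later - earlier).days for earlier, later in zip(releases, releases[1:])]
    # median_low returns one of the observed gaps, so it is a whole number of days
    typical_gap = median_low(gaps)
    estimated = releases[-1] + timedelta(days=typical_gap)
    reminder = estimated - timedelta(days=reminder_days)
    return estimated, reminder

# Hypothetical release dates for a yearly dataset
past = [date(2020, 6, 1), date(2021, 6, 1), date(2022, 6, 2), date(2023, 6, 1)]
estimated, reminder = estimate_next_release(past)
print(estimated, reminder)  # 2024-05-31 2024-05-17
```

Using the median gap rather than the average means one unusually late or early release does not drag the estimate too far.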

The advantage of data releases is that they are inherently topical: you can write a story reporting that ‘new data reveals’ something (see the section on getting your data to answer questions below). Covering them regularly also helps you build your knowledge of, and confidence with, datasets.

These ‘story cards’ show different workflows for data driven stories. Source: https://datajournalism.com/read/longreads/data-journalism-ideas

Alternatively, you might want to dig into a particular issue and use data as part of your reporting. As with any story idea, it’s useful to do some initial research into what reporting has already been done on the issue: not least to check whether someone has already done your story, but also to identify models for the type of story you might do.

For example, you may find someone used data to look at the same issue, but in a different place or time. This can provide a template that you can repeat for your story, bringing things up to date, or looking at the same data in a different region.

If you have been given a tip-off, or have a hunch, that something is happening, and you want to use data to check that, it can be useful to write down a hypothesis to help identify what sort of data you’ll need to check it (Mark Lee Hunter’s Story Based Inquiry provides a good overview of hypothesis-driven investigations). You will, however, need to be prepared to adapt your story idea if you can’t find the data that you need…


How to find data for your story

If you have an idea that requires you to seek out data, there are a number of strategies you can use to find it, or to adapt your story idea if you can’t get exactly what you want.

One simple tip to improve your search results is to try adding filetype:xlsx or filetype:pdf to your keywords — this will restrict results to Excel spreadsheets or PDFs.

If the data is likely to be published by a public body, also try adding site: followed by the domain (for example, site:gov.bg would limit results to Bulgarian government sites, and site:mig.gov.bg would limit results to that particular ministry).
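If you find yourself running the same search across several file types or domains, the operators can be combined programmatically. This is a minimal sketch, assuming Google’s standard `https://www.google.com/search?q=` URL format; the function name and the example keywords are hypothetical.

```python
from urllib.parse import urlencode

def build_search(keywords, filetype=None, site=None):
    """Combine keywords with optional filetype: and site: operators."""
    terms = [keywords]
    if filetype:
        terms.append(f"filetype:{filetype}")
    if site:
        terms.append(f"site:{site}")
    query = " ".join(terms)
    # urlencode escapes spaces and colons so the query can go into a URL
    url = "https://www.google.com/search?" + urlencode({"q": query})
    return query, url

query, url = build_search("hospital waiting times", filetype="xlsx", site="gov.bg")
print(query)  # hospital waiting times filetype:xlsx site:gov.bg
print(url)
```

Looping over a list of ministry domains, or over `xlsx`, `csv` and `pdf`, produces a batch of links to work through by hand.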

Look for ‘open data’ initiatives that might be publishing datasets of interest: Data Portals lists hundreds around the world. And look outside your own country for data, too: organisations like the EU and World Bank publish data on a number of countries, and you may find that other countries publish data on their economic or other interactions with your own country.

If the data isn’t published anywhere, you may be able to obtain it using Freedom of Information laws. Again, don’t restrict yourself to your own country: FOI laws in other countries can be used to obtain data about your own, too.

Finally, consider proxy data that might give you an indirect indication of the phenomenon you are interested in. During the pandemic, for example, a drop in pollution in China acted as a proxy for the slowdown in economic activity as the Coronavirus outbreak took hold. Global services like Google Trends can show whether search volumes are rising or falling, which can be a proxy for other activity.

Assessing data — and ‘cleaning’ it

Like any source, data should be treated with at least some scepticism. How authoritative and reliable a dataset is will depend on a range of factors, including the independence and expertise of the organisation collecting the data, its reasons for doing so (e.g. monitoring, persuasion, research etc.), and the methods it uses. Heather Krause’s data biography model provides a useful framework for mapping out this information.


It’s important to identify what the data does not measure, and how key terms are defined. Crime data, for example, may record reports of crime, but not experiences of crime. Data on ‘homelessness’ may define it as ‘living on the streets’ rather than ‘not having a home’.

Look out for terms like ‘margin of error’ or ‘confidence interval’: when the data is based on a sample, these give the lower and upper bounds of the range within which the real figure is likely to lie.
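To see what a margin of error means in practice, here is a sketch of the standard normal-approximation formula for a proportion, margin = z × √(p(1 − p)/n), with z = 1.96 for a 95% confidence level. It assumes a simple random sample; real surveys often use weighting that changes the margin, so use the figure the publisher states wherever one is given. The survey result below is hypothetical.

```python
import math

def margin_of_error(proportion, sample_size, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    Uses the normal approximation; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

p, n = 0.32, 1000  # hypothetical: 32% of 1,000 respondents said yes
moe = margin_of_error(p, n)
low, high = p - moe, p + moe
print(f"{p:.0%}, margin of error ±{moe * 100:.1f} percentage points ({low:.1%} to {high:.1%})")
```

With these numbers the real figure could plausibly be anywhere from about 29% to 35%, which matters if your story hinges on a small change between two surveys.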

Some data will need ‘cleaning’: look for duplicate entries that need removing, for example, or unusually large or small numbers which may have been entered incorrectly. Check, too, that names are spelt consistently.
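Those three checks can be sketched in plain Python. The example below normalises names, drops rows that become identical once names are normalised, and flags (rather than deletes) values outside the common ‘1.5 × interquartile range’ fences. The outlier rule is my choice for illustration, not the only option, and the ward names and values are hypothetical.

```python
from statistics import quantiles

def normalise_name(name):
    # Trim, collapse repeated spaces and apply consistent capitalisation.
    # Note: .title() turns "McDonald" into "Mcdonald", so check the results.
    return " ".join(name.split()).title()

def remove_duplicates(records):
    """Normalise names, then drop rows identical to an earlier row."""
    seen = set()
    cleaned = []
    for record in records:
        row = {**record, "name": normalise_name(record["name"])}
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

def flag_outliers(values, k=1.5):
    """Return values more than k * IQR beyond the quartiles.

    Flagged values are leads to check against the source, not automatic deletions.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

records = [  # hypothetical rows
    {"name": "North Ward", "value": 120},
    {"name": "north  ward ", "value": 120},
    {"name": "South Ward", "value": 135},
    {"name": "East Ward", "value": 128},
    {"name": "West Ward", "value": 1310},
    {"name": "Central Ward", "value": 140},
]
cleaned = remove_duplicates(records)
print(len(records), "->", len(cleaned), "rows")
print("Check these values:", flag_outliers([r["value"] for r in cleaned]))
```

Here 1310 gets flagged because it is roughly ten times the others: it may be a typo for 131, or it may be a genuine story, which is why the code only reports it.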

Getting your data to answer questions

Once you’re confident that the data can contribute something to your story, you need to be able to ask it questions. The questions you can ask will depend on the data itself, the tools you have available, and your own ability to use them.

For example, if a dataset only provides quite general overall figures (e.g. annual totals, in broad categories) then it won’t be able to answer as many questions as a more ‘granular’ dataset which has more detail. This is because someone else has essentially already asked some questions, and you’re left with the answers to those questions, rather than the ones you really wanted to ask.

Three good questions to ask of any dataset are:

What is the scale of the issue?
How have the numbers changed?
Who or what ranks top or bottom? Or: where does an area of interest rank?

Calculating a grand total can establish the scale of an issue — or you can filter the data to a particular category or area and calculate a grand total for that. To calculate change, work out the totals for two different periods and then subtract the earlier total from the more recent one.
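The three calculations in that paragraph can be sketched in a few lines of Python (in a spreadsheet they would be a SUM, a SUMIFS and a subtraction). The rows, column names and figures are hypothetical; the percentage-change line is an extra that is often worth reporting alongside the raw difference.

```python
# Hypothetical, made-up figures: one row per region per year
rows = [
    {"year": 2022, "region": "North", "count": 120},
    {"year": 2022, "region": "South", "count": 80},
    {"year": 2023, "region": "North", "count": 150},
    {"year": 2023, "region": "South", "count": 70},
]

def total(rows, **filters):
    """Sum the 'count' column, keeping only rows that match every filter."""
    return sum(row["count"] for row in rows
               if all(row[column] == value for column, value in filters.items()))

scale = total(rows)                                  # grand total: the scale of the issue
north_2023 = total(rows, region="North", year=2023)  # total filtered to one area and year
earlier, recent = total(rows, year=2022), total(rows, year=2023)
change = recent - earlier                            # more recent total minus earlier total
pct_change = change * 100 / earlier                  # the same change as a percentage
print(scale, north_2023, change, pct_change)         # 420 150 20 10.0
```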

To ask a question about ranking, sort a dataset by a column of interest to bring the highest or lowest numbers to the top, along with their associated regions or categories. If the data is so detailed that numbers are not already aggregated by those categories, you can aggregate using a pivot table. Pivot tables are perhaps the most powerful tool for a journalist using granular data, allowing you to look at your data in a variety of ways, including aggregating by time period in order to see change, and there are lots of tutorials and videos to help you use them.
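A pivot table is essentially ‘group by one field, spread another across columns, and sum’. The sketch below reproduces that in standard-library Python and then ranks the result; pandas offers `pivot_table` and `sort_values` for the same job, but this version avoids extra installs. The function name and figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical, made-up figures: one row per region per year
rows = [
    {"region": "North", "year": 2022, "count": 120},
    {"region": "North", "year": 2023, "count": 150},
    {"region": "South", "year": 2022, "count": 80},
    {"region": "South", "year": 2023, "count": 70},
    {"region": "East", "year": 2022, "count": 95},
    {"region": "East", "year": 2023, "count": 110},
]

def pivot(rows, row_field, column_field, value_field):
    """Like a spreadsheet pivot table: sum value_field for each row/column pair."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_field]][row[column_field]] += row[value_field]
    return {key: dict(columns) for key, columns in table.items()}

table = pivot(rows, "region", "year", "count")

# Rank regions by their 2023 total, highest first (reverse=False puts the lowest first)
ranking = sorted(table, key=lambda region: table[region].get(2023, 0), reverse=True)
for position, region in enumerate(ranking, start=1):
    change = table[region].get(2023, 0) - table[region].get(2022, 0)
    print(position, region, table[region].get(2023, 0), f"{change:+d}")
```

Because the pivot puts each year in its own column, the same table answers both the ranking question and the change question.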

As well as stories about scale, ranking and change, there are other potential angles. For example, you might use data to establish how much numbers vary where they are not expected to (access to health services, for example); you can use data to look at relationships; you can use it to identify specific leads (such as an organisation that is an outlier in its field); or you might combine a number of approaches in a feature that explores an issue from a range of angles. Finally, if the data doesn’t exist or is flawed, that might be a story in its own right.
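Two of those angles, variation and outliers, can be sketched briefly. The first function measures how far apart the lowest and highest figures are. The second uses the ‘modified z-score’ (based on the median and the median absolute deviation, with the commonly used 3.5 cut-off), a technique I have chosen because a single extreme value does not distort it much. The hospital labels and waiting times are hypothetical.

```python
from statistics import median

def variation_summary(values):
    """How far apart are the lowest and highest figures?"""
    low, high = min(values), max(values)
    return {"lowest": low, "highest": high, "range": high - low, "ratio": high / low}

def robust_outliers(named_values, threshold=3.5):
    """Names whose modified z-score (median-based) exceeds the threshold."""
    values = list(named_values.values())
    mid = median(values)
    mad = median([abs(v - mid) for v in values])
    if mad == 0:
        return []  # at least half the values equal the median; this method can't separate them
    return [name for name, v in named_values.items()
            if abs(0.6745 * (v - mid) / mad) > threshold]

# Hypothetical average waiting times, in weeks, at six hospitals
waits = {"A": 12, "B": 14, "C": 13, "D": 15, "E": 13, "F": 41}
print(variation_summary(list(waits.values())))
print("Outliers to investigate:", robust_outliers(waits))
```

A flagged outlier is a lead, not a finding: the next step is to ask the organisation, and experts, why its figure is so different.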

(These angles are explored in more depth here).

You can ask different questions of your data to get different angles on your story. Source: https://onlinejournalismblog.com/2020/08/11/here-are-the-7-types-of-stories-most-often-found-in-data/

Putting data into context

Data will often tell you what is happening — but you will need extra information to put that into context, such as:

Time: are these numbers going up or down compared to the past?
Space: is this region better or worse than others?
Why is it happening? And why does it matter?

You can add regional context by adding data from other regions or countries if you don’t already have it, while temporal context can be added by combining your data with data from previous periods to see if things are going up or down.
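Combining datasets usually means matching rows on a shared key such as a region name, which is where consistent spelling pays off. This sketch joins two releases of the same hypothetical dataset, then places one area against its own past and against the other areas. The function name, area names and figures are all illustrative assumptions.

```python
def put_in_context(current, previous, focus):
    """Compare one area with its own past (time) and with other areas (space)."""
    shared = sorted(current.keys() & previous.keys())
    # Names that appear in only one dataset usually signal inconsistent spelling
    unmatched = sorted(current.keys() ^ previous.keys())
    if focus not in shared:
        raise ValueError(f"{focus!r} is not in both datasets")
    ranking = sorted(shared, key=lambda area: current[area], reverse=True)
    change = current[focus] - previous[focus]
    return {
        "value": current[focus],
        "change": change,
        "pct_change": change * 100 / previous[focus],
        "average_across_areas": sum(current[a] for a in shared) / len(shared),
        "rank": ranking.index(focus) + 1,
        "out_of": len(shared),
        "unmatched": unmatched,
    }

# Hypothetical, made-up figures from two releases of the same dataset
current = {"North": 150, "South": 70, "East": 110, "West": 90}
previous = {"North": 120, "South": 80, "East": 95, "West Region": 85}

result = put_in_context(current, previous, "South")
print(result)
```

The `unmatched` list is worth checking before anything else: here ‘West’ and ‘West Region’ are probably the same place, and leaving them unmatched quietly drops it from the comparison.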

But interviews will be key to answering the ‘why’ questions that your data raises. Speak to experts to ask why this is happening and why it matters, and try to find case studies whose experiences illustrate that.

Writing up the story — and making it visual

Once you have the core information for your story, you need to be able to communicate that to a specific audience. It’s important to remember that all the figures you’ve been working with are just a means to an end: the final story might focus on just one, key figure (especially if it’s an audio or video piece).

You don’t have to lead your story with the numbers, either: if you’ve got a strong reaction to your figures, or a compelling case study, you might start with those instead. For example, when the BBC Data Unit looked at the deaths of people on benefits, the headline focused on the call for an inquiry made in the resulting interviews, rather than on the data that provided the basis for it. In these situations, the data becomes the context for the reaction, or the human story.

Try to use as few numbers as possible, and keep them simple. Approximations like ‘over a million’ or ‘more than 150’ help readers digest figures as they read. Ratios and fractions can be useful, too: a phrase like ‘almost a third’ or ‘almost one in three’ will be easier for most readers to understand than ‘32%’.
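If you produce lots of figures (for example, one story per local area), the conversion into reader-friendly phrases can be automated. This is a hypothetical sketch: the function names, the 2-percentage-point tolerance and the two-significant-figure rounding are my assumptions, and the output still needs a human read before publication.

```python
NUMBER_WORDS = {2: "two", 3: "three", 4: "four", 5: "five", 6: "six",
                7: "seven", 8: "eight", 9: "nine", 10: "ten"}

def one_in_n(proportion, tolerance=0.02):
    """Phrase a proportion as 'almost one in three', 'more than one in four', etc."""
    n = min(NUMBER_WORDS, key=lambda k: abs(proportion - 1 / k))
    target = 1 / n
    if abs(proportion - target) > tolerance:
        return None  # no simple 'one in N' fits well; keep the percentage instead
    if proportion < target:
        qualifier = "almost "
    elif proportion > target:
        qualifier = "more than "
    else:
        qualifier = ""
    return f"{qualifier}one in {NUMBER_WORDS[n]}"

def more_than(count, significant_figures=2):
    """Round a positive whole number down to a few significant figures."""
    magnitude = 10 ** max(len(str(count)) - significant_figures, 0)
    rounded = count // magnitude * magnitude
    return f"{rounded:,}" if rounded == count else f"more than {rounded:,}"

print(one_in_n(0.32))        # almost one in three
print(one_in_n(0.26))        # more than one in four
print(more_than(153))        # more than 150
print(more_than(1_034_567))  # more than 1,000,000
```

Rounding down rather than to the nearest value keeps ‘more than’ literally true.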

If you have a list of numbers in a sentence, moving them into a chart or map can be a good way to take them out of the text. Data visualisation can be extremely effective in engaging readers, communicating numbers succinctly and preventing misunderstanding. Simple charts work best, and make sure you’re using the right chart for the job: for a story about composition, use a pie chart; for a story about comparisons, use a bar chart; to show change over time, use a line chart. Use a map to show distribution or to let readers explore, but don’t use one just because the data is geographical (to show the ranking of regions, for example, a bar chart may be better).

The FT’s Visual Vocabulary sheet is a useful guide to which charts suit different types of story.

Ultimately, remember that a data driven story is just another story: having the skills to work with data simply lets you do more with your reporting. There is a wide range of skills to learn, so don’t worry about mastering them all at once. The key thing is to start with a story that you are motivated by, and that requires one or two new skills, but not too many. Then try to learn more new skills with each new story.