From Swiss Leaks to the Panama Papers, data leaks to journalists are getting bigger and more frequent.
Frederik Obermaier, investigative reporter and editor at Süddeutsche Zeitung, knows all about it. He and his colleague, Bastian Obermayer, started the Panama Papers, a project based on the largest dataset leaked to journalists so far. It was also the largest collaboration of journalists, involving more than 350 reporters from 80 countries, coordinated by the International Consortium of Investigative Journalists.
Handling such big leaks is complicated, but how do you get them in the first place?
Be visible
“To get leaks means that you are visible, in a way like you are on the internet, you’re speaking at public discussions, you’re out there. You have to present yourself because otherwise informants or whistleblowers won’t find you,” Obermaier said.
His first piece of advice is to start your own web page.
“My web page is something like a business card,” he said.
On their page, journalists should list all the points of contact, their phone numbers, personal email addresses, mailing address (if someone wants to mail in a hard drive), tips on safe communication and all the encrypted messenger services they use. “As soon as a new one pops up, I create an account because I don’t want to force a whistleblower to a certain way of communicating,” Obermaier said.
The downside is that your contact details are out there, so people can call you at night and send annoying messages, as they do to Obermaier. However, he said it’s worth it. “For every 10 people that insult you, you get one good story and that story is worth it,” Obermaier said.
Questions to ask
After they get a leak, journalists should first ask whether the dataset is authentic. If it is, they should check whether it is in the public interest. If it is, they should check whether any condition other than source protection is attached to the dataset, such as publishing only certain documents or publishing at a certain time.
“I always get a little bit nervous and cautious when I hear, ‘Oh you have to publish on a certain day,’ or, ‘You have to publish this and that story,’” Obermaier said. “I would normally shy away.”
The hardest step, however, is the first one – making sure that the leak is genuine. “Checking the authenticity of a leak is a pain in the ass, sorry for my French,” he said.
The tools you need
The problem is that a normal leak today is several gigabytes or terabytes of data, and journalists have to quickly scan through it and cross-check it against public records, company records, court records or human sources.
To do that, journalists need tools. For Obermaier, one of the most important is optical character recognition, OCR, which makes scanned documents machine-readable. For that, he likes Abbyy’s FineReader, which is a paid solution for OCR. Next, when you have all your leaked documents OCR-ed, you need something to search them with. For that, Obermaier recommends Aleph, a tool developed by Friedrich Lindenberg from the Organized Crime and Corruption Reporting Project, OCCRP.
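To give a rough idea of what searching OCR-ed documents involves, here is a minimal Python sketch of a keyword index. It is not how Aleph or FineReader work internally; the file names, document texts and the watchlist of names are all hypothetical, and a real leak would need a proper search engine.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical OCR output: file name -> extracted text.
documents = {
    "scan_001.txt": "Certificate of incorporation for Example Holdings Ltd.",
    "scan_002.txt": "Invoice issued to Jane Doe by Example Holdings Ltd.",
    "scan_003.txt": "Passport copy of John Smith.",
}

index = build_index(documents)

# Hypothetical list of names taken from a public record.
watchlist = ["Jane Doe", "John Smith", "Alice Roe"]
hits = {name: sorted(search(index, name)) for name in watchlist}

for name, files in hits.items():
    print(name, "->", files if files else "no match")
```

A match here only means the words appear somewhere in the same document, so every hit still has to be read and verified by a person.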
He said that Nuix is also very helpful, but quite costly. Visualizing the data helps too, and for that he recommends the graph database Neo4j.
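The idea behind graph tools of this kind is to treat people, companies and intermediaries as nodes and their relationships as links, so that indirect connections become visible. The plain Python sketch below illustrates that idea only; it does not use Neo4j or its query language, and every name in it is a made-up placeholder.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build an undirected graph from (entity, entity) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest chain of links, or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical relationships extracted from documents.
edges = [
    ("Person A", "Shell Co 1"),
    ("Shell Co 1", "Law Firm X"),
    ("Law Firm X", "Shell Co 2"),
    ("Person B", "Shell Co 2"),
]

graph = build_graph(edges)
path = shortest_path(graph, "Person A", "Person B")
print(" -> ".join(path) if path else "no connection found")
```

A chain like this is a lead to investigate, not proof of a relationship; each link still has to be backed by a document.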
“But of course, when dealing with such a big leak like the Panama Papers, where we had more than two terabytes of data, it brings a certain responsibility. You have to check every document before you publish it. You have to … give the other side the chance to comment on the documents on which you want to report,” Obermaier said.
Emailing Vladimir Putin
That means reaching out to people you’re reporting on, even if it means emailing Vladimir Putin, the President of the Russian Federation.
Several days before publishing the Panama Papers, journalists sent a request for comment to the Kremlin, asking Putin to comment on his friend Sergei Roldugin, his companies and the role these companies had in financing Putin’s daughter’s wedding.
They never got an answer.
“But we saw that the spokesperson of Putin, Mr Peskov, invited journalists to a ‘spontaneous’ press conference, speaking of an information attack from the West. That was the moment when we realised, ‘OK, they read our email,’” Obermaier said.