The internet has revolutionised investigative journalism. From the comfort of our offices, we are able to consult the official records of companies, land and cadastral authorities and governments from Panama to Mozambique. We can map the movements of an Egyptian arms dealer using social media.
While the old skills of the trade – such as finding sources and scouting locations – remain as important as ever, you can carry out a huge amount of background research without leaving your office. This means it is possible to test your story and make important breakthroughs without having to expend large amounts of time and money.
Given the importance of online research, it is surprising that many journalists fail to understand how the internet works, and therefore fail to exploit its amazing potential.
If you do not know how to use Google properly, what the dark web is or how to trawl social media, you will be severely handicapping yourself.
How the internet is structured
Surface Web: Google.com is the most visited website in the world and rightly the gateway for many journalists when they are researching an investigation.
Google and other search engines work by indexing (taking copies of) billions of pages across the web. They use programs called search engine spiders (sometimes known as robots) to crawl through webpage after webpage, downloading copies of the pages to their servers.
You can see what Google saves by checking the “Cache” option which drops down if you click on the green arrow on the right-hand side of each result.
Tip on finding lost pages: Sometimes you can retrieve lost webpages (or pages that have been deliberately deleted) by finding the copy kept in Google’s cache. Remember that Google regularly updates the copies in its cache, so this is only a short-term option. You can also use the Wayback Machine (web.archive.org), which we will discuss in more detail later.
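The Wayback Machine can also be queried from a script, which is handy when you have a long list of pages to check. The sketch below assumes the Internet Archive’s public availability API at https://archive.org/wayback/available, which returns JSON describing the archived snapshot closest to a requested date; the helper names are our own, and the response structure should be checked against the Internet Archive’s current documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's public availability API
AVAILABILITY_API = "https://archive.org/wayback/available"

def availability_url(page_url, timestamp=None):
    """Build an availability query; timestamp is 1-14 digits of YYYYMMDDhhmmss."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)

def closest_snapshot(response):
    """Return the archived copy's address from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

def find_archived_copy(page_url, timestamp=None):
    """Ask the Wayback Machine for the nearest archived copy (needs network access)."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

For example, find_archived_copy("example.com", "2012") asks for the capture closest to 2012 and returns its web.archive.org address, or None if the page was never archived.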
The pages indexed by search engines are known as the surface web, and they represent a small fraction of the information held online. Exactly how much is disputed, but perhaps only around 20 per cent of all online information sits on the surface web. Surface webpages each have their own URL (Uniform Resource Locator), often referred to as a “web address”. Remember that even within this section of the internet, which can be indexed by search engines, there is no guarantee that Google has indexed every page.
The Deep Web: Beyond the surface web lies the “deep web” which, if we are to believe the estimates, represents the remaining 80 per cent of the data. The “deep web” is much more than its image of a shady underworld of arms bought with bitcoins and hitmen hired on secret forums.
Here is a summary of some of the information contained in the deep web:
- Databases accessed via a search interface: Public databases of companies and land ownership are often free to search but are kept in a “walled garden”. As a result, their records will not show up in a Google search, and you need to search each relevant database directly.
- Password-protected data: This includes court registers, such as PACER, the US federal court archive, and paid-for company databases such as Orbis.
- Unlinked pages: A website may contain pages that no other page links to, perhaps because the owner does not wish to make them public. These are invisible to search engines but can sometimes be found by experimenting with the site’s URL structure.
- The dark web: The dark web is a section of the deep web that is intentionally hidden from search engines and can only be reached with special software, such as a particular type of browser, which masks your IP address (the equivalent of your computer’s home address). The best-known browser for entering the dark web is Tor (The Onion Router). The dark web is where some of the seedier activities of the internet take place: it was home to the online drug marketplace Silk Road until the FBI shut it down. While there are no doubt corners of the dark web for journalists to explore, the Tor Browser is most useful for protecting journalists’ privacy, which we will discuss later.
- Social media: Social networks such as Facebook and Twitter are often described as “walled gardens”. Much of the treasure trove of information they hold is invisible to Google and needs to be searched directly, using each platform’s own search function.
- Historical pages: Earlier versions of pages that have been removed or modified are no longer searchable, but copies are kept by archiving services, the best known being the Wayback Machine. There you can find not just deleted webpages but also deleted documents.
- PDFs and images: Google has become increasingly sophisticated at turning data such as PDFs and photos, arguably once part of the “deep web”, into searchable material. It converts scanned PDFs into searchable text using optical character recognition (OCR), which is improving by the day but still fails to convert all the text, particularly when it encounters different languages, unusual characters or poor-quality PDFs. As a result, huge quantities of data online remain in formats that search engines cannot read. If you inspect the mountains of data and information collected by the UN Security Council committee, you will find it is all stored as PDFs. Some of it is searchable, but swathes of it, mostly original documents that have been scanned in, are lost to the search engines. Unless you read through the documents carefully, you may miss key pieces of information.
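Finding unlinked pages by playing with a site’s URL structure often comes down to changing a number in an address, such as a report or document ID. The sketch below generates the neighbouring addresses so you can check each one by hand; it only builds strings and fetches nothing. The function name neighbouring_urls and the example.com address are hypothetical.

```python
import re

def neighbouring_urls(url, spread=3):
    """Return copies of `url` with its last number stepped down and up by up to `spread`."""
    matches = list(re.finditer(r"\d+", url))
    if not matches:
        return []                      # no number to play with
    last = matches[-1]
    number = int(last.group())
    width = len(last.group())          # preserve zero-padding such as "007"
    candidates = []
    for n in range(max(0, number - spread), number + spread + 1):
        if n != number:
            padded = str(n).zfill(width)
            candidates.append(url[:last.start()] + padded + url[last.end():])
    return candidates

for candidate in neighbouring_urls("https://www.example.com/reports/report-14.pdf", spread=2):
    print(candidate)
```

Only the last number is varied, because that is usually the document ID; a date segment earlier in the path would need the same treatment applied separately.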
Key lesson: Do not rely on Google alone to find information, no matter how good it is. Check webpages manually, find walled-off databases, read PDFs thoroughly and learn to search social media.
Google remains the greatest tool for journalistic research and all reporters must learn how to harness its potential.
Here are some key tips, from the extremely basic to more advanced, that you need to make the most of Google.
Rule 1: Quotation marks
Use quotation marks around your search term to find an exact name or phrase. If, for example, you are searching for references to Lawrence Marzouk and do not include the quotation marks (“Lawrence Marzouk”), Google will return pages that mention Lawrence and Marzouk, but not necessarily the exact phrase “Lawrence Marzouk”.
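If you run many searches, it can help to build the query and its Google link in code rather than by hand. The sketch below assumes only Google’s standard search address, https://www.google.com/search, with the query in the q parameter; the helper names exact_phrase and google_search_url are our own.

```python
from urllib.parse import urlencode

def exact_phrase(term):
    """Wrap a name in double quotes so Google looks for the exact phrase."""
    return '"' + term.strip().strip('"') + '"'

def google_search_url(query):
    """Turn a query into a Google search link (quotes and spaces are URL-encoded)."""
    return "https://www.google.com/search?" + urlencode({"q": query})

query = exact_phrase("Lawrence Marzouk")
print(query)                     # "Lawrence Marzouk"
print(google_search_url(query))  # https://www.google.com/search?q=%22Lawrence+Marzouk%22
```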
Tip: Tracking an individual
When you are searching Google for information about a particular person, remember that names can be misspelt or transliterated in various ways. It is particularly important to look for variants if the name was originally written in a different alphabet. For example, in BIRN’s investigation into Damir Fazlić’s business in Albania, it was important to look for stories in a variety of languages. His name is sometimes spelt “Damir Fazlic” and, in some Albanian media, “Damir Fazliq”. We therefore searched for all the variants of his name, putting OR (in capital letters) between each search term: “Damir Fazlić” OR “Damir Fazlic” OR “Damir Fazliq”.
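Some variants can be generated automatically. The sketch below, with helper names of our own, strips accents using Python’s standard unicodedata module and then joins the quoted spellings with OR. Spellings from another language’s conventions, such as the Albanian “Fazliq”, still have to be added by hand.

```python
import unicodedata

def strip_diacritics(text):
    """Remove accents: NFKD splits 'ć' into 'c' plus a combining accent, which we drop."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def or_query(variants):
    """Quote each spelling once and join them with Google's OR operator (in capitals)."""
    unique = list(dict.fromkeys(variants))  # drop duplicates, keep order
    return " OR ".join('"' + v + '"' for v in unique)

spellings = [
    "Damir Fazlić",
    strip_diacritics("Damir Fazlić"),  # "Damir Fazlic"
    "Damir Fazliq",                    # Albanian-style spelling, added by hand
]
print(or_query(spellings))
# "Damir Fazlić" OR "Damir Fazlic" OR "Damir Fazliq"
```

Letters such as đ have no Unicode decomposition, so strip_diacritics leaves them untouched; variants such as “dj” or “d” must also be added by hand.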
While researching a story about Euroaxis, a Serbian bank set up in Russia during the Milosevic era, we found that we had to try multiple spellings to uncover all the relevant information. We had a similar problem while researching the Palestinian businessman and politician Mohammed Dahlan, whose name you can find transliterated into the Latin alphabet in more than a dozen ways.
It is very helpful to understand how letters are transliterated, so that you can find alternatives. For example, Euroaxis is officially written in Cyrillic as Евроаксис, with “Euro” rendered as “Evro”. Transliterated phonetically, it appears as Еуроаксес. Both spellings occur, and searches for each revealed useful information. Finally, some outlets had transliterated the Cyrillic spellings back into the Latin alphabet, writing about “Evroaksis” or “Euroaksis”.
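Knowing the letter correspondences lets you generate variants systematically. The sketch below applies the official one-to-one correspondence between Serbian Cyrillic and Serbian Latin (Gaj’s Latin alphabet) letter by letter. It is an illustration rather than a full transliteration library: Russian-style romanisations (for example “zh” for ж) would need a separate table, and the function name transliterate is our own.

```python
# Serbian Cyrillic -> Serbian Latin, one entry per letter of the 30-letter alphabet
SR_CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

def transliterate(text, table=SR_CYR_TO_LAT):
    """Replace Cyrillic letters found in `table`; leave everything else unchanged."""
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)                  # spaces, digits, Latin or unknown letters
        elif ch.isupper():
            out.append(latin.capitalize())  # e.g. "Љ" -> "Lj"
        else:
            out.append(latin)
    return "".join(out)

for spelling in ["Евроаксис", "Еуроаксес"]:
    print(spelling, "->", transliterate(spelling))
# Евроаксис -> Evroaksis
# Еуроаксес -> Euroakses
```

Running the output through an accent-stripping step then gives plain ASCII variants (for example “c” for “ć”) to add to an OR search.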
When researching Mohammed Dahlan, we tried a variety of spellings, but the most useful approach was to search Google in Arabic. Although my knowledge of the language is limited, I was able to use Google Translate to write out his name in Arabic script: select Arabic as the source language and, as you type, Google Translate will automatically transliterate into Arabic script. You may need to ask an Arabic speaker for help in getting this right, but Google Translate often works well.
It is worth remembering that on many official registers a person’s surname comes first, and middle names are often included. In these cases, a simple search for “Damir Fazlic” or “Mohammed Dahlan” would not return results; for some official forms you will need to search for “Fazlic, Damir”.
To tackle the middle-name issue, use the asterisk (*), known as a wildcard, which stands in for one or more unspecified words. For example, “Mohammed * Dahlan” also returns results for “Mohammed Yusuf Dahlan”.
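The name-order and wildcard tricks can be combined into a single search. This is a minimal sketch; the function name name_queries is our own, and it simply produces the three quoted forms described above.

```python
def name_queries(first, last):
    """Quoted search terms for the common ways a name appears online and on registers."""
    return [
        f'"{first} {last}"',    # normal order
        f'"{last}, {first}"',   # surname first, as on many official registers
        f'"{first} * {last}"',  # wildcard covers one or more middle names
    ]

print(" OR ".join(name_queries("Mohammed", "Dahlan")))
# "Mohammed Dahlan" OR "Dahlan, Mohammed" OR "Mohammed * Dahlan"
```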
Last, you can use Google Trends (www.google.com/trends), which tracks trends in Google searches, to find out how other people are searching for your target.
Rule 2: Narrowing your search
Using the world’s biggest library has its benefits, but how do you filter the information down to what is important to you, and once you have found it, how do you know whether to trust it? Two Google search commands help here, “site:” and “inurl:”, which we explain below.
Trawling government archives
The best (although not unimpeachable) sources of information are governments and official bodies. Luckily, Google allows you to filter your results to show only their websites.
Most countries in the world use “gov” in the domain names of their official websites. The US has “.gov”, the UK “.gov.uk”, Albania “.gov.al” and Kosovo “rks-gov.net”. There are, of course, exceptions both within countries and internationally (the Prime Minister’s office in Albania is kryeministria.al, and German official websites end simply with the country suffix “.de”), which you should take into account.
If you type your search term followed by site:.gov, Google will search only webpages on websites whose domain names end in .gov. Because .gov is reserved for US government bodies, this returns official US government webpages.
Type: “Lawrence Marzouk” site:.gov
Result: All references to “Lawrence Marzouk” found on websites ending in .gov
If you type your search term followed by inurl:gov, Google will search for webpages containing your search term whose URL includes “gov” anywhere, including in the domain name.
Type: “Lawrence Marzouk” inurl:gov
Result: Government webpages from around the world, including gov.al (Albania), gov.rs (Serbia) and gov.uk (United Kingdom), which contain “Lawrence Marzouk”.
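When you track several countries, it helps to keep their government domains in one place, including the exceptions. The sketch below uses the domains named above; the helper names are our own. Google generally accepts several site: operators joined with OR, but it is worth checking the results, since the way Google combines operators can change.

```python
# Government domains named in the text, plus the Albanian exception
GOV_DOMAINS = {
    "US": [".gov"],
    "UK": [".gov.uk"],
    "Albania": [".gov.al", "kryeministria.al"],  # PM's office does not use gov.al
    "Kosovo": ["rks-gov.net"],
}

def site_query(term, domains):
    """Limit a search to one or more domains: site: operators joined by OR."""
    return term + " " + " OR ".join("site:" + d for d in domains)

def inurl_query(term, fragment="gov"):
    """Match pages whose URL contains `fragment` anywhere, e.g. gov.al or gov.rs."""
    return f"{term} inurl:{fragment}"

name = '"Lawrence Marzouk"'
print(site_query(name, GOV_DOMAINS["US"]))       # "Lawrence Marzouk" site:.gov
print(site_query(name, GOV_DOMAINS["Albania"]))  # "Lawrence Marzouk" site:.gov.al OR site:kryeministria.al
print(inurl_query(name))                         # "Lawrence Marzouk" inurl:gov
```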
Narrowing your searches by country
Beyond looking at government webpages, you may want to limit your searches to a particular country. For example, perhaps you are looking for references to someone in the Spanish media.
Type: “Lawrence Marzouk” site:.es
Or for Albania:
Type: “Lawrence Marzouk” site:.al
Of course, not all webpages are country-specific (there are many webpages which end in .com without being American), but this does give you an option to narrow down your search and is particularly useful if you are looking for media stories from a particular country.
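To cover media in several countries at once, you can generate one country-restricted link per domain and open them in turn. A minimal sketch, assuming Google’s standard search address and the country domains used above; the function name country_search_links is our own.

```python
from urllib.parse import urlencode

def country_search_links(term, tlds):
    """Map each country-code domain to a Google search link restricted to it."""
    links = {}
    for tld in tlds:
        query = f"{term} site:{tld}"
        links[tld] = "https://www.google.com/search?" + urlencode({"q": query})
    return links

for tld, link in country_search_links('"Lawrence Marzouk"', [".es", ".al"]).items():
    print(tld, link)
```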
Rule 3: Filetype
It is possible to narrow down your search by the type of document you are looking for, using the “filetype:” search command.
This can be useful when you are looking for official material. Often companies, organisations and governments upload important documents (yearly accounts for example) in the form of PDFs, Word documents or presentations.
This function allows you to home in on these documents.
So, if you type your search term followed by filetype:pdf and inurl:gov, Google will search for PDFs containing your search term whose URL includes “gov”.
Type: “Lawrence Marzouk” filetype:pdf inurl:gov
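Operators can be stacked in any combination. The sketch below, with a helper name of our own, appends whichever of filetype:, inurl: and site: you supply, for example to look for the same name in Word documents and presentations as well as PDFs.

```python
def build_query(term, filetype=None, inurl=None, site=None):
    """Append whichever Google operators are given to a search term."""
    parts = [term]
    if filetype:
        parts.append(f"filetype:{filetype}")
    if inurl:
        parts.append(f"inurl:{inurl}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)

print(build_query('"Lawrence Marzouk"', filetype="pdf", inurl="gov"))
# "Lawrence Marzouk" filetype:pdf inurl:gov

# Word documents and presentations on UK government sites
for ext in ["doc", "docx", "ppt", "pptx"]:
    print(build_query('"Lawrence Marzouk"', filetype=ext, site=".gov.uk"))
```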
You can select other file types (and indeed carry out many more advanced searches) by clicking on the cog at the top right-hand side of the screen and selecting “Advanced search”, or by going directly to www.google.com/advanced_search.
There are many other useful search commands, too many to list here, but you can find most of them at http://www.googleguide.com/ and practise using them in your own time.
The final thing to consider when carrying out a Google search is that even if you master the commands, you will not get full use out of Google unless you think creatively and carefully about which search terms to use. Try to imagine what sort of page you are looking for, what format the information might come in and which other words might appear alongside it. Also remember that a person’s or company’s name is not the only thing that identifies them: you might want to try a phone number, email address, website or office address.
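Once you have gathered a target’s identifiers, you can search for all of them at once. In the sketch below, the company, phone number, email address and website are hypothetical placeholders, and the function name identifier_query is our own. Phone numbers in particular are written in many formats, so it can be worth adding each format as a separate entry.

```python
def identifier_query(identifiers):
    """OR together every known identifier for a target, each as an exact phrase."""
    values = [v for v in identifiers.values() if v]  # skip details we do not have yet
    return " OR ".join(f'"{v}"' for v in values)

# Hypothetical details for an imaginary company
target = {
    "name": "Example Holdings Ltd",
    "phone": "+44 20 7946 0000",
    "email": "info@example-holdings.test",
    "website": "example-holdings.test",
    "address": None,  # not known yet
}
print(identifier_query(target))
```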
Google Image Search:
Images can also be extremely helpful sources of information for your story and can provide leads to other information.
Google provides a means to see if and when a photo has been used elsewhere on the internet.
If you click on the “Images” option in Google, you will notice a camera icon on the far right-hand side of the search bar.
Click on it, and you can either upload a photo from your hard disk or enter the URL (the web address) of an image.
Once you have done that, Google will search online for any other examples of that same image and will even come up with similar pictures, although the latter is not of much use to journalists.
This allows you to find other webpages that have used the same image, for example news stories from a variety of countries or different social media profiles using the same photo.
It can also help to check if a company’s claims are authentic.
When BIRN was looking into the New York-based firm Siva Partners, which in 2012 announced a series of developments in Albania and Kosovo and met the then Albanian prime minister, Sali Berisha, we checked the photos of its offices and boardroom on its website, which looked unusually grand.
Is this office really what it seems? By running the photos through Google images, we were able to discover that its boardroom was rented office space and not the grand boardroom of a major company.
Google also has an advanced image search option which allows you to tinker with colours and add search terms to narrow down your options: https://www.google.co.uk/advanced_image_search
Google Maps and Earth:
Do you need to check out the offices of a company in London, New York or Paris?
Perhaps you would like to check the development of a new building in a capital city? Zoom in on a restricted military base? See what a prominent politician’s home looks like behind its tall walls?
Google Maps, Google’s online mapping service, and Google Earth, the downloadable application, can help with all of these.
Search for particular places using the satellite view or, where it is available, use Street View to get a roadside picture of a location.
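For a list of addresses or coordinates, links to satellite and Street View imagery can be generated in bulk. This sketch assumes Google’s documented Maps URLs format (https://www.google.com/maps/@?api=1 for map and Street View actions, https://www.google.com/maps/search/?api=1 for searches), which opens Google Maps in a browser and needs no API key. The helper names and the example address are our own, and Street View only opens where imagery exists.

```python
from urllib.parse import urlencode

MAPS = "https://www.google.com/maps/"

def satellite_view_url(lat, lng, zoom=18):
    """Open Google Maps centred on a point, using the satellite basemap."""
    params = {"api": 1, "map_action": "map", "center": f"{lat},{lng}",
              "zoom": zoom, "basemap": "satellite"}
    return MAPS + "@?" + urlencode(params)

def street_view_url(lat, lng):
    """Open Street View at a point, if imagery is available there."""
    params = {"api": 1, "map_action": "pano", "viewpoint": f"{lat},{lng}"}
    return MAPS + "@?" + urlencode(params)

def place_search_url(query):
    """Search Google Maps for an address or place name."""
    return MAPS + "search/?" + urlencode({"api": 1, "query": query})

print(place_search_url("10 Downing Street, London"))
print(satellite_view_url(51.5034, -0.1276))
print(street_view_url(51.5034, -0.1276))
```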
In Google Maps, when using Street View, you will notice an option at the top left-hand side to scroll back through time and see how a location has evolved. You may notice interesting developments, changes to posters or intriguing vans parked outside.
Google Earth (which you will need to download) offers historical satellite images: click on the timer icon at the top right of the screen. You can also find street-level 3D images, which can be helpful for visualising a location.
Unfortunately, Google Maps offers historical Street View images and Google Earth offers historical satellite pictures, but not vice versa, so you have to toggle between the two.
Tip: Searching for people in Google maps
It is possible to search for people based on email addresses, Twitter handles or just names through Google Maps. The results are a little erratic, but it is worth a try.
This article was originally published in BIRN Albania’s manual ‘Getting Started in Data Journalism’.