Dragana Pećo: How to Track Offshore Companies?

Offshore companies have received a bad press in recent years, and not surprisingly: they play a massive role in all kinds of criminal activity.

Legality question

“There are some country destinations, some business registers, where we can’t find information on who the shareholders of the company are,” said Dragana Pećo, investigative reporter for KRIK and OCCRP.

Although establishing an offshore company is legal, people can use it for illegal activities. Pećo illustrates this with what some corrupt Balkan politicians do. “They steal the money from the budget, buy property and want to hide it,” she explained.

Tracking offshore companies has become a daily activity for her. “It’s not that I find some new offshore company every day that I should investigate, but it’s every day … [either] as a part of some cross-border project I’m working on with some colleagues, or something related to local stories,” she said. 

Secrecy, remember?

The most important tools for tracking offshore companies are business registries and databases. Some are free, while some can be pricey. The first task, however, is to try to find information and documents about the company in an official business registry. Usually, this kind of information cannot be found in official company registries, since it is an offshore jurisdiction – secrecy, remember? The agent who helped found the offshore company will usually ignore a journalist’s emails.

But some information can be found elsewhere. These companies usually have subsidiaries, she said. “What I do is, I search for the documents that they should submit to local business registries,” Pećo explained. 

Information can be found in court records and in databases worldwide, such as Orbis, LexisNexis, Offshore Alert, Sayari and Arachnys. Journalists can also use OpenCorporates, ICIJ’s Offshore Leaks database and OCCRP’s records-search tool, as well as ask for assistance from the investigators at Investigative Dashboard, an online platform run by OCCRP.

How To Make a Database of Proceedings Against Officials

Is the Serbian state interested in fighting corruption? A good question, to which many citizens would probably say: No. But how to show this is true? How to find good examples of unsuccessful fights against corruption without pointing to specific cases? How to detect loopholes in the system?

The answer suggests itself: a systemic problem can only be detected by systematic analysis.

That’s how we at CINS came up with the idea of making a database of proceedings against officials – a creative and innovative way of showing how an important and powerful institution like the Anti-Corruption Agency, ACAS, is not doing its job.

What makes this CINS project special is that we haven’t just written an investigative story in which we list all the problems. We wanted to show our readers, and the public, what the true problem was, and give the problem its first and last name.

We wanted to give the public an opportunity to see the data and all the information by themselves and get a chance to have important documents just a click away, so they can believe what their eyes tell them.

To achieve that, this project became a collaboration of journalistic skills and innovative open-source technology in the service of reporting on corruption.

What the database is

The database of proceedings against officials is a unique database in Serbia. It contains information that you will not find on any government or institutional website.

First published in 2016, it was upgraded in December 2018 and now contains information on all proceedings initiated by the ACAS against public officials since 2010.

By public official, we mean practically anyone appointed to work for the state, from the President to ministers, mayors, municipal heads and directors of public companies. But not only them. An official is also a public prosecutor, judge, faculty dean or director of a school.

They have all been obliged since 2010 to send data on their assets and revenues to the Anti-Corruption Agency every year, and to inform it about any potential conflicts of interest while working for the state.

If they don’t report, for example, that they have an apartment in another city, or own a company, or that their wife or husband owns cars, the agency should initiate proceedings against them.

If they don’t report that they have employed a family member, the agency should also initiate a procedure.

All this information is available in one place, the CINS database.

For several months, CINS collected data, such as:

  • The list of officials against whom the agency initiated proceedings, and the reasons for initiating them.
  • The list of procedures initiated.
  • Which officials violated the law.
  • Against which officials the agency filed misdemeanour or criminal charges.
  • What the prosecutors and courts did with those misdemeanour or criminal charges.

The results

1. Who are the officials potentially breaching the law?

Are you interested in whether Prime Minister Ana Brnabic, President Aleksandar Vucic, former Mayor of Belgrade Dragan Djilas or former and current finance ministers Mladjan Dinkic and Sinisa Mali broke the law? All you have to do is type their name into the IME I PREZIME (name and surname) search field. The list contains more than 1,700 officials.

What you will find is the following: In the case of Sinisa Mali, today Minister of Finance, for example, we see that the agency launched three proceedings against him in 2014 and 2015, and that in all three cases he received the smallest possible “punishment” – a written warning.

2. In which towns do these officials work?

Do you live in Nis, and are you interested in officials from Nis who have come onto the agency’s radar? In the MESTO (place) search option, type the word “Nis” and the database will give you a list of all the officials from this town against whom a procedure has been or is being conducted. In this way, citizens and journalists can quickly get information about officials from their hometown and keep track of what those officials are doing.

According to the database, the City of Belgrade comes first, which makes sense, as so many state institutions are located in the capital. In second place is Novi Sad, home to a host of provincial institutions. Third and fourth places are occupied by Nis and Kragujevac, followed by Vranje. This tells us that the most irresponsible officials come from these towns and cities.

3. Institutions and positions

Do you want to know which current or former ministers the agency has investigated? This is possible through the FUNKCIJA (position) search option. Simply type the Serbian word “ministar” and the database will list all the ministers from several governments who have been on the radar of the agency. 

Analysis of the position that the officials held when they violated the law shows us that the most irresponsible ones were directors of public enterprises and, in second place, the employees of the highest bodies of local municipalities.

This search option is complemented by a search by the type of institution where officials work – ORGAN. Our readers can thus search, for example, for which officials employed by the Agency for Privatization violated the law, or against which judges of the Basic Courts the agency started proceedings.

4. What were the outcomes of the proceedings?

During our work on this database, we constantly wondered what our readers would want to find, what would be useful search options for journalists, and what information could be important to NGOs. So, the process of organizing search options was long. Without the help and suggestions of our developer who worked with us on developing the database, we would not have been able to do something that is simple, on the one hand, and useful on the other.

We also strengthened and improved the four basic search options to provide data on how the agency “punished” these officials through four basic measures:

  • a warning 
  • a decision on law violation
  • recommended dismissal
  • misdemeanour and criminal charges

It was important for us to have this data because it shows how seriously the agency takes the fight against corruption, and whether irresponsible officials were eventually punished for violating the law. 

The results were disappointing. In most cases, the agency went for the smallest punishment – a warning, merely a written statement that entails no special consequences. The most severe “penalties” – criminal charges – were mostly dismissed by the prosecutors’ offices. No investigated official went to jail, the data show.

The right recipe for creating a database

To create a database, several things need to come together to make the magic work – a dedicated team, good organization, clear knowledge of the topic and data you are working with, a programmer and a lot of patience.

When we started to work on this database, we wanted to include young journalists in the project, to give them a chance to work on a big project and learn from the beginning.

So two of our then participants at the CINS School of Investigative Journalism joined the team of experienced investigative reporters already working on this complex project. 

These were key elements:

– For a team to function and work on schedule, good organization is crucial. That means a clear division of work and coordination, a balanced distribution of tasks, regular meetings to monitor what has been done and to plan the next steps, and continuous communication to solve problems as they arise – and they surely will, even if you think it is clear from the beginning who is doing what.

– Another key element is Excel, an incredibly useful but also complex program that was the beginning and end of every day for us while working on the database. Journalists needed to know how to work in it and to respect the strict rules we agreed on. This means that if we say years are written without a dot, one person cannot write “2014”, a second “2014.” and a third “2014. god”. Similarly, if we agreed that figures should be written only as numbers, no one should write “50 thousand dinars”.

Working in Excel also obliges you to do what is called data cleaning and processing, because errors will appear somewhere. But if you create a system for how data is entered from the start, it will be much easier.

– Our programmer was very helpful here, because he marked all the cells and columns in the Excel file that contained minor errors and omissions, such as extra spaces or incorrectly typed words.
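The kind of automated flagging our programmer did can be sketched in a few lines of Python. This is an illustration only – the rules, layout and error labels below are invented for the example, not taken from the CINS codebase (which is in R):

```python
import re

# Illustrative entry rules: flag cells that break the agreed conventions,
# the way our programmer marked suspect cells in the spreadsheet.
RULES = [
    (re.compile(r"^\s+|\s+$"), "leading/trailing whitespace"),
    (re.compile(r"^\d{4}\.(\s*god\.?)?$", re.IGNORECASE), "year written with a dot"),
    (re.compile(r"\d+\s*(thousand|hiljada)", re.IGNORECASE), "figure written as words"),
]

def flag_cell(value):
    """Return the list of rule violations found in one cell."""
    return [label for pattern, label in RULES if pattern.search(value)]

def flag_sheet(rows):
    """Yield (row_index, column_index, problem) for every suspect cell."""
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            for problem in flag_cell(cell):
                yield r, c, problem
```

Running something like `flag_sheet` over the exported rows produces a list of suspect cells that a human can then correct, instead of anyone having to eyeball every column.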

– In the end, before publishing anything, CINS standards demanded fact-checking. So one person who had nothing to do with the work on the database spent several days checking all the data entered in the Excel file and pointed out potential flaws.

– Time. Databases cannot be built quickly. One of the key reasons in our case is that Serbia is still far from what is called digitization, and even further from the idea of open data. In some countries, all these data would be publicly available and organized by the state. That’s not the case with us. The Anti-Corruption Agency publishes part of its information on its website in the form of various non-searchable scans and unorganized documents, located in different places and not easy to find. The rest of the information is no longer available.

A few months after CINS released the database, the agency pulled key documents from its site. Today you cannot see any information about proceedings against officials anywhere except on the CINS website.

In the end, time is an important factor in collecting data for any database, because we had to send dozens of requests for access to information of public importance to institutions throughout Serbia to collect the documentation we needed.

If a person wanted to get information about all the proceedings against public officials in Serbia and to find the outcomes of these procedures, he or she would have to send nearly 1,000 requests for access to information of public importance to various institutions. Even if you have enough journalists and time to do this, it’s still not certain at the end of the process that you will collect all the information. This is because in Serbia institutions simply do not respond in time, do not send out everything you asked for, sometimes send out poorly scanned documents that are almost useless and sometimes send the wrong documents.

Although the deadline for responding to requests is 15 days, some institutions – first and foremost the Anti-Corruption Agency – have still not responded to the submitted requests.

Despite such obstacles, laws on free access to information of public importance exist in many countries around the world and are important for all journalists who want to collect data from various institutions and build databases from them.

In the end, there were many challenges over the several months we worked on the database, and my role, in addition to the journalistic research, was to act as a sort of coordinator of the entire project. Having a person who works with the whole team and knows at all times how much has been done and how much remains is very important. That is the only way you can keep up, especially when a project lasts six months and time can easily slip away.

Why GitHub

Reporting on the fight against corruption through open-source tools and open data was the main idea when we decided to make the entire database code publicly available on GitHub.

The idea is that anyone who wants to do something similar can make use of our code and build something new. All you need is knowledge of the R programming language, or a programmer who can re-use the code for you. You can go to the CINS profile on GitHub at any time and download the database code.

The Database of Proceedings Against Public Officials is the first example in Serbia of data made available by journalists in the form of open-source code. The project was also presented at Open Data Week in early April 2019, where CINS was the first media outlet in Serbia to participate.

In May 2019, the database was nominated for the international Data Journalism Awards. It seems to me that we did a good job.

Benjamin Strick: How to Get Into Open Source Investigations?

First, what are open source investigations? Benjamin Strick, an open source investigator for BBC Africa Eye and Bellingcat, describes them as taking “any publicly available data that we can get our hands on and running an investigation without having to get feet on the ground.”

For troublesome areas of the world, the ability to research without putting yourself in danger is important.

Getting into it is easy, Strick said. “There’s a really friendly online community on Twitter. That’s how Bellingcat was formed, with all of these friendly people.”

When anyone at Bellingcat publishes an investigation, they often show how they got their result, so it’s easy to follow their steps.

“Not only is it finding out really interesting things about how the world works and what happens in conflict and government corruption, but also how you can find that information and the different tools and tips that they pass on that you can use as well,” Strick said.

“Anatomy of a Killing”

Getting familiar with different OSINT techniques opens up a huge field of potential stories.

One of the most striking was “Anatomy of a Killing”, a documentary published by BBC Africa Eye. In July 2018, a video published on social media showed two women and two children being led away at gunpoint by a group of Cameroonian soldiers. They were blindfolded, forced to the ground, and shot 22 times. The government of Cameroon dismissed the video as “fake news”, but BBC Africa Eye proved where the video was filmed, when it happened and who was responsible for the killings, naming the people who were involved.


It took them three months to piece the story together.

“Sometimes, it can be quite quick, such as a simple fact check of an open source or a simple geo-location or chrono-location. But sometimes a complete investigation could take quite long. There have been investigations that we worked on that took six months up to a year,” he said.

Tools of the trade

Every investigation starts with some kind of footage that must be verified, located and explained in detail. The first task is to precisely determine the way it came to the investigator; was it from social media, or from a phone app like WhatsApp or Signal?

“They might already have information: ‘Yeah, this is a video that was shot in Sudan about the protests recently,’” Strick said. If that’s the case, the investigator’s job will be somewhat easier, although they still need to go through all the steps to verify it.

“I would go through using image-reverse searches, having a look if I could geo-locate the videos, then find the location on satellite imagery, which would give me a name of a place, and going further and pivoting on that information to find out more,” Strick said.

The most important tools Strick uses are Google Earth and various satellite imagery tools. Many of them can be found in Bellingcat’s list of tools as well as on BIRD.

How To Scrape, Pentagon Arms Investigation Case Study

Huge amounts of data are publicly available but poorly organised. One way to unleash the power of the data is to scrape it from its original source and repackage it in a spreadsheet.

Here’s an example of how. “The Pentagon’s $2.2 Billion Soviet Arms Pipeline Flooding Syria” was almost entirely based on public information that the United States government makes available on several websites like USA Spending or Federal Procurement Data System. But the only way to get to the interesting data we needed was to scrape.

Basically, scraping is downloading data from a web page or an online database and storing it locally, so you can play with it and organize it in a way that helps you work on your story.

Data are all around the web, but often organised in a way that is readable to humans, but not to machines. That’s “unstructured” data. 

Your goal is to make it structured, to end up with a clean set of data stored in a table with only the information you need for the story.

However, it’s a long road to get there. 

Scraping is not simple. There are a lot of tools, browser add-ons and apps that offer one-click solutions, but those often don’t work on larger sets of data and can’t handle multiple pages – or, if they do, they are expensive. 

Good scraping requires some coding skills. Even if you use free and open-source tools like Scrapy or BeautifulSoup you will need to understand programming to follow online tutorials for scraping. The good news is that there are a lot of free online tutorials to help you get started. 

Those tutorials will teach you how to extract information from a website and export it into a table that you can then manipulate. 
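As a flavour of what those tutorials teach, here is a minimal sketch that turns an HTML table into rows of structured data using only Python’s standard library. A real scraping job would use BeautifulSoup or Scrapy as mentioned above; the HTML here is invented for the example:

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Minimal scraper: collects the text of every <td> cell, row by row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], [], False
    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True
    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_cell = False
    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

# Invented sample page; in practice you would feed it the downloaded HTML.
html = """<table>
<tr><td>Contractor A</td><td>Bulgaria</td></tr>
<tr><td>Contractor B</td><td>Serbia</td></tr>
</table>"""

scraper = TableScraper()
scraper.feed(html)
# scraper.rows is now structured data, ready to be written out as CSV
```

The principle is the same whatever the tool: walk the page’s markup, keep only the cells you care about, and end up with a table instead of a webpage.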

Chrome extensions such as Scraper and Data Miner, for example, are excellent tools for simple scraping jobs. 

A more complex and more expensive solution is a website such as import.io, which enables you to make scrapers for multiple pages. 

If you do have some coding skills, several websites, like USAspending.gov for example, offer an application programming interface, API, that you can use to dig your own data. 
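Working with an API usually means composing a query URL from your filters and then parsing the JSON response. The sketch below shows only the query-building half, with Python’s standard library; the endpoint and parameter names are placeholders, not the real USAspending.gov API – check its documentation for the actual endpoints:

```python
from urllib.parse import urlencode, urljoin

# Hypothetical base URL and parameter names, for illustration only.
BASE = "https://api.example.gov/"

def build_query(endpoint, **params):
    """Compose a GET URL for an API endpoint from keyword filters."""
    return urljoin(BASE, endpoint) + "?" + urlencode(sorted(params.items()))

url = build_query("awards", agency="DoD", fiscal_year=2016)
# You would then fetch the URL (e.g. with urllib.request.urlopen)
# and parse the JSON it returns.
```

The point is that an API lets you ask for exactly the slice of data you need, instead of downloading and filtering everything yourself.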

If not, there are readily available solutions like enigma.io or data.world that have already done half the job for you: they have scraped public databases, so you can use them to dig for your own stories.

That’s what we did for the Pentagon story. We had spotted that for each public payment to a contractor, the origin of the goods was declared. This meant that it was possible to see where the US military was buying weapons from in Central and Eastern Europe.

Unfortunately, there was no easy way to search through the US federal procurement data system by “country of origin”, so we needed a Plan B.

We decided to turn to enigma.io, a website that offers a comprehensive repository of public data. They had already scraped the whole of USAspending.gov, so we didn’t have to, and we could concentrate on filtering what was relevant to our story. 

That sounds easy, but it was far from it. The dataset of US government contracts lists every dollar contracted by every US agency, so, with all the filters applied, the first set of data we downloaded was well over 100MB in a single .csv file. CSV stands for Comma-Separated Values; it’s just one big text file with columns separated by commas and each row on a separate line. You import it into a program that handles tables, like MS Excel, LibreOffice Calc or Google Sheets. 
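Once a CSV file is that big, it can pay to pre-filter it in code before opening it in a spreadsheet program. A minimal Python sketch, with an invented column name (“country_of_origin”) standing in for whatever the real export uses:

```python
import csv, io

def filter_csv(infile, outfile, column, wanted):
    """Copy only the rows whose `column` value is in the `wanted` set."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if row[column] in wanted:
            writer.writerow(row)
            kept += 1
    return kept

# Tiny in-memory demonstration; with a real 100MB export you would pass
# open("contracts.csv") and open("filtered.csv", "w", newline="") instead.
src = io.StringIO("contractor,country_of_origin\nA,Bulgaria\nB,France\n")
dst = io.StringIO()
kept = filter_csv(src, dst, "country_of_origin", {"Bulgaria", "Serbia"})
```

Because the file is processed one row at a time, this works even when the CSV is far too large to open comfortably in Excel.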

As we were facing this type of data for the first time, we had to become acquainted with it pretty fast. 

The data contained all the weapons and ammunition procured by the Pentagon from 2011 to 2016. But to figure out what we needed, we had to understand the procurement processes at the US Department of Defense and decode the technical lexicon. 

Without doing background research, the data would have been meaningless. 

Since we were all new to this dataset, we had to go back and forth several times before we were satisfied with the gathered information. 

We focused our attention only on so-called “non-standard” weapons procured from Central and Eastern European countries and intended, or likely to be intended, for Syria. At the same time, because the clerks filing this information were sometimes careless, we needed to go through all other data and dig out valuable contracts hidden behind incomprehensible codes.

After removing all the data that was not relevant to our work, we reduced it to less than 1MB, which was much easier to work with. That shows why it’s important to know exactly what you’re looking for, what your story is about and what data you need to tell it. Click here for a copy of our final dataset.

It shows why it’s important to ask the right question. Only then can you expect data to give you answers. 

However, to trust the gathered information, we had to verify it. 

Luckily, US procurement data are stored in multiple databases. 

To verify every contract we were interested in, we used the online procurement records in the US Federal Procurement Data System. Likewise, we checked the most problematic contracts with officials at the Pentagon.

Thanks to all of that, we were able to catch the Pentagon’s attempt to hide some embarrassing data. We found that, after we started working on the story and asking the Pentagon inconvenient questions, US Department of Defense staff went into the FPDS database and changed several mentions of Syria. As we had stored the originals, we were able to expose this attempt to rewrite history.

This article was originally published in BIRN Albania’s manual ‘Getting Started in Data Journalism’.

How To Search Online, Google Basics

The internet has revolutionised investigative journalism. From the comfort of our offices, we are able to consult official records of companies, land/cadastral authorities and governments from Panama to Mozambique. We can map the movements of an Egyptian arms dealer using social media.

While the old skills of the trade – such as finding sources and scouting locations – remain as important as ever, you can carry out a huge amount of background research without leaving your office. This means it is possible to test your story and make important breakthroughs without having to expend large amounts of time and money.

Given the importance of online research, it is surprising that many journalists fail to understand how the internet functions, and therefore fail to exploit its amazing potential.

If you do not understand how to use Google properly, do not know what the dark web is, and cannot trawl social media, you will be hugely handicapping your potential.

The internet of things

Surface Web: Google.com is the most visited website in the world and rightly the gateway for many journalists when they are researching an investigation.

Google and other search engines work by indexing (taking copies of) billions of pages across the web. They use a programme called a search engine spider (sometimes known as a robot) to crawl through webpage after webpage, downloading copies of the pages onto their servers.

You can see what Google saves by checking the “Cache” option which drops down if you click on the green arrow on the right-hand side of each result.

Tip – How to find lost pages: Sometimes you can retrieve lost webpages (or pages that have been deliberately deleted) by finding the copy kept in Google’s cache. Do remember that Google regularly updates the copies it keeps in its cache, so this is only a short-term option. You can also use the Wayback Machine (web.archive.org), which we will discuss in more detail later.

The pages indexed by search engines are known as the surface web and represent a small fraction of the overall information held online. Exactly how much is disputed, but perhaps only around 20 per cent of all online information is held on the surface web. Surface webpages have their own URL, or Uniform Resource Locator, often referred to as a “web address”. Remember, even within this section of the internet, which can be indexed by a search engine, there is no guarantee that Google has indexed every page. 

The Deep Web: Beyond the surface web lies the “deep web” which, if we are to believe the estimates, represents the remaining 80 per cent of the data. The “deep web” is much more than its image of a shady underworld of arms bought with bitcoins and hitmen hired on secret forums.

Here is a summary of some of the information contained in the deep web:

  • Databases that are accessed via a search interface. Public databases of companies and land ownership are often free to search but are kept in a “walled garden”. As a result, you will not find results from the archive with a Google search and need to search through each relevant database. 
  • Password-protected data: This includes court registers, such as the US court archive Pacer, and paid-for databases for company databases such as Orbis.
  • A page not linked to by any other page: There may be pages on a particular website which the company does not wish to make public; these are invisible to search engines but can sometimes be found by playing with the URL structure.
  • The dark web: The dark web is a section of the deep web which is intentionally hidden from search engines and accessible only with a particular type of browser which masks your IP address (the equivalent of the home address of your computer). The best-known browser for entering the dark web is Tor (The Onion Router). The dark web is where some of the seedier activities of the internet occur (it was where the online drug marketplace Silk Road operated until it was shut down by the FBI). While there are no doubt corners of the dark web for journalists to explore, the Tor browser is most useful for protecting journalists’ privacy, of which we will say more later.
  • Social media: Social media sites such as Facebook and Twitter are parts of the web often referred to as “walled gardens”. This means that their treasure trove of information is invisible to Google and needs to be searched directly through the search function of each social media application.
  • Historical pages: Pages which have been removed or modified are no longer searchable, but copies are maintained by various websites, the best being the Wayback Machine. Here you can find not just deleted webpages but also deleted documents.
  • PDFs and images: Google has become increasingly sophisticated in its ability to turn data such as PDFs and photos, previously arguably part of the “deep web”, into searchable material. Google transforms PDFs into searchable text using optical character recognition, which is improving by the day but still fails to convert all the text particularly when it encounters different languages, unusual characters and poor quality PDFs. As a result, there remain huge quantities of data kept in unreadable formats online. If you inspect the mountains of data and information collected by the UN Security Council committee, it is all stored in the form of PDFs. Some of this is searchable, but swathes of the information, mostly original documents which have been scanned in, are lost to the search engines. Unless you read through the documents carefully you may miss key pieces of information.

Key lesson: Do not rely on Google to find information, no matter how good it is. Check webpages manually, find walled-off databases, read PDFs thoroughly and learn to search social media.

Mastering Google

Google remains the greatest tool for journalistic research and all reporters must learn how to harness its potential.

Here are some key tips, from the extremely basic to more advanced, that you need to make the most of Google.

Rule 1: Quotation marks

Use quotation marks around your search term to find a specific name. If, for example, you are searching for references to Lawrence Marzouk and do not include the quote marks (“Lawrence Marzouk”), Google will return pages with references to Lawrence and to Marzouk, but not necessarily the exact phrase “Lawrence Marzouk”.

Tip: Tracking an individual

When you are searching in Google for information about a certain person, remember that names can be misspelt or transliterated in various ways. It is particularly important to look for variants if the name is originally written in a different alphabet. For example, in BIRN’s investigation into Damir Fazlic’s business in Albania, it was important to look for stories in a variety of languages. Damir Fazlić is sometimes spelt “Damir Fazlic” and “Damir Fazliq” when it is published in some Albanian media. We would therefore search for the variants of his name with OR (note the capital letters) between each search term.
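Building such an OR query by hand gets tedious once you have many spelling variants; the mechanics can be scripted trivially. A purely illustrative Python helper:

```python
def or_query(*variants):
    """Join spelling variants into one Google query with OR between them."""
    return " OR ".join(f'"{v}"' for v in variants)

query = or_query("Damir Fazlic", "Damir Fazlić", "Damir Fazliq")
# One search now covers all three spellings at once.
```

Pasting the resulting string into Google returns pages matching any of the quoted variants.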

While researching a story about Euroaxis, a Serbian bank set up in Russia in the Milosevic era, we found that we had to try multiple spellings to uncover all the relevant information. We had a similar problem while researching the Palestinian businessman and politician Mohammed Dahlan (whose name you can find transliterated into the Latin alphabet in more than a dozen ways).

It is very helpful to understand how letters are transliterated in order to find alternatives. For example, Euroaxis in Cyrillic is officially written Евроаксис, with “Euro” changed to “Evro”. Transliterated phonetically, it appears as Еуроаксес. Both spellings occur, and searches for each revealed useful information. Finally, some outlets had transliterated the spellings back into Latin, with some stories writing about “Evroaksis” or “Euroaksis”. 

When researching Mohammed Dahlan, we tried a variety of spellings, but most useful was using Google in Arabic. While my knowledge of the language is limited, I was able to use Google Translate to write out his name in Arabic script: select Arabic as the target language and, as you type, Google Translate will automatically transliterate into Arabic script. You may need to ask an Arabic speaker for help in getting this right, but Google Translate often works well.

It’s worth remembering that on some official registers a person’s surname comes first, and middle names are often included. In such cases, a simple search for “Damir Fazlic” or “Mohammed Dahlan” would not return results; for some official forms you will need to search for “Fazlic, Damir”. 

To tackle the middle name issue, use the * (known as a wildcard), which represents one or more unspecified words. For example, “Mohammed * Dahlan” also returns results for “Mohammed Yusuf Dahlan”.
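As a sketch, the wildcard pattern can be built the same way as the OR query (the `wildcard_name` helper is hypothetical, for illustration only):

```python
# Place Google's * wildcard between first and last name so that
# entries with any middle name(s) are matched as well.
def wildcard_name(first, last):
    return f'"{first} * {last}"'

print(wildcard_name("Mohammed", "Dahlan"))
# "Mohammed * Dahlan"
```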

Lastly, you can use Google Trends (www.google.com/trends), which tracks trends in Google searches, to find out how other people search for your target.

Rule 2: Narrowing your search

Using the world’s biggest library has its benefits, but how do you filter the information down to what is important to you? And once you have found it, can you trust it? Two Google search commands, “site:” and “inurl:”, which we explain here, can help.

Trawling government archives

The best (although not unimpeachable) sources of information are governments and official bodies. Luckily, Google allows you to filter results accordingly.

Most countries in the world have “gov” in the domain name of their official websites. The US has “.gov”, the UK “.gov.uk”, Albania “gov.al” and Kosovo “rks-gov.net”. There are exceptions within countries and internationally, of course – the Prime Minister’s office in Albania is kryeministria.al and German official websites end with just the country prefix of “.de” – which you should take into account.

By typing your search term followed by site:.gov in Google, you will be searching for any webpages containing your search term on websites whose domain names end in .gov. As a result, it will return official US government webpages.

Type: “Lawrence Marzouk” site:.gov

Result: All references to “Lawrence Marzouk” found on websites which end in .gov

If you type your search term followed by inurl:gov, you will be searching for any webpages with your search term where the URL includes “gov”.

Type: “Lawrence Marzouk” inurl:gov

Result: Government webpages from around the world, including gov.al (Albania), gov.rs (Serbia) and gov.uk (United Kingdom), which contain “Lawrence Marzouk”.
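Both operators follow the same pattern: a quoted search term followed by operator:value, separated by a space. A small sketch that assembles such queries (the `with_operator` helper is illustrative, not an official Google API):

```python
# Append a Google search operator (e.g. site: or inurl:) to a quoted term.
def with_operator(term, operator, value):
    return f'"{term}" {operator}:{value}'

print(with_operator("Lawrence Marzouk", "site", ".gov"))
# "Lawrence Marzouk" site:.gov
print(with_operator("Lawrence Marzouk", "inurl", "gov"))
# "Lawrence Marzouk" inurl:gov
```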

Narrowing your searches by country

Beyond looking at government webpages, you may want to limit your searches to a particular country. For example, perhaps you are looking for references to someone in the Spanish media. 

Type:  “Lawrence Marzouk” site:.es

Or for Albania:

Type:  “Lawrence Marzouk” site:.al

Of course, not all webpages are country-specific (there are many webpages which end in .com without being American), but this does give you an option to narrow down your search and is particularly useful if you are looking for media stories from a particular country.

Rule 3: Filetype

It is possible to narrow down your search based on the type of document you are searching for. This is done with the “filetype:” search operator.

This can be useful when you are looking for official material. Often companies, organisations and governments upload important documents (yearly accounts for example) in the form of PDFs, Word documents or presentations.

This function allows you to home in on these documents.

So, if you type your search term followed by filetype:pdf and inurl:gov, Google will search for all pages where your search term appears in a PDF and the URL includes gov.

Type: “Lawrence Marzouk” filetype:pdf inurl:gov
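Operators can be chained simply by separating them with spaces; Google applies them all at once. A hypothetical helper that builds the query above:

```python
# Combine the filetype: and inurl: operators into one query string,
# restricting results to PDFs on government-related URLs.
def gov_pdf_query(term):
    return f'"{term}" filetype:pdf inurl:gov'

print(gov_pdf_query("Lawrence Marzouk"))
# "Lawrence Marzouk" filetype:pdf inurl:gov
```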

You can select different filetypes (and indeed carry out much more advanced searches) by clicking on the cog at the top right-hand side of the screen and then selecting “advanced search”.

Be creative. 

There are many other useful search commands you can use – too many to list here. But you can check out most of them here: http://www.googleguide.com/ and practise using them in your own time.
The final thing to consider when carrying out a Google search is that even if you master the commands, you will not get the full use out of Google unless you think creatively and carefully about which search terms to use. Try to imagine what sort of page you are looking for, what format the information might come in, and what other words it might appear alongside. Also remember that a person or company name is not the only thing that identifies them – you might want to try a phone number, email address, website or office address.
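One way to act on that tip is to search for every identifier you hold on a target at once. A sketch, with placeholder contact details rather than real data:

```python
# Join all known identifiers for a target (name, email address, phone
# number, website) into one OR query, since any one of them may appear
# on a relevant page even when the name does not.
def identifier_query(identifiers):
    return " OR ".join(f'"{i}"' for i in identifiers)

print(identifier_query(["Damir Fazlic", "info@example.com", "+355 4 000 0000"]))
# "Damir Fazlic" OR "info@example.com" OR "+355 4 000 0000"
```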

Image search

Google Image Search:

Images can also be extremely helpful sources of information for your story, and they can provide leads to other information.

Google provides a means to see if and when a photo has been used elsewhere on the internet.

If you click on the “image” option on Google, you will notice on the far right-hand side of the search bar a camera icon. 

Click on that, and it gives you the ability either to upload a photo from your hard disk or include the URL (the web address) of an image. 

Once you have done that, Google will search online for any other examples of that same image and will even come up with similar pictures, although the latter is not of much use to journalists.

This allows you to find other webpages which have used the same image, for example, news stories from a variety of countries or different social media profiles using the same photo.

It can also help to check if a company’s claims are authentic.

When BIRN was looking into the New York-based firm Siva Partners, which in 2012 announced a series of developments in Albania and Kosovo and met the then Albanian PM Sali Berisha, we checked the photos of its offices and boardroom, which featured on its website and looked unusually grand.

Was the office really what it seemed? By running the photos through Google Images, we were able to discover that its boardroom was rented office space and not the grand boardroom of a major company.

Google also has an advanced image search option which allows you to tinker with colours and add search terms to narrow down your options: https://www.google.co.uk/advanced_image_search

Google Maps and Earth:

Do you need to check out the offices of a company in London, New York or Paris? 

Perhaps you would like to check the development of a new building in a capital city? Zoom in on a restricted military base? Find out what a prominent politician’s home looks like behind its tall walls?

Google Maps, the online search function from Google, and Google Earth, the downloadable application, can help with all of these.

Search for particular places using the satellite view or, where it is available, use Street View to get a roadside picture of a location.

On Google Maps, using Street View, you will notice at the top left-hand side the option to scroll back through time and look at how a location has evolved. You may notice interesting developments, changes to posters or intriguing vans parked outside.

Google Earth (which you will need to download) offers historical satellite images by clicking on the timer at the top right of the screen. You can also find street-level 3D images, which can be helpful to visualise a location.

Unfortunately, Google Maps only offers historical Street View images, and Google Earth only historical satellite pictures, not vice versa, so you have to toggle between the two.


Street View hasn’t arrived in Tirana yet, but you can scroll through historical satellite images and use the 3D street-level view.

Tip: Searching for people in Google maps

It is possible to search for people based on email addresses, Twitter handles or just names through Google Maps. The results are a little erratic, but it is worth a try.

This article is originally published in BIRN Albania’s manual ‘Getting Started in Data Journalism’
