IG: How attackers find out your kid's DNA using Google?

Sep 19, 2014 • pentest

How to gather exposed information using simple Google searches

Information gathering is the first step in penetration testing, and Google dorking is one of the easiest ways to find useful information (even vulnerabilities, password exposures, etc.) about a target site. Some people believe that “Google dorking is stupid and won’t reveal anything useful. Since Google search is so common, companies will make sure they are not vulnerable to Google dorking”. This is simply wrong. History has shown that Google can expose a lot more information than you might imagine. The Telstra customer database exposure via Google made headlines back in 2011, when private customer information including usernames, passwords and customer account numbers was revealed.

So let us analyse how this happens: Google dorking means using Google’s advanced search operators to find as much information as possible about your target. If your target does not maintain a valid and correct robots.txt, things become much easier. We will deal with robots.txt and Google dorking protection mechanisms at the end of the article. Let us see some queries that find txt files and other potentially sensitive files exposed on a particular site:

1) intext:admin intext:password filetype:txt site:target.com

2) filetype:bak OR filetype:mdb OR filetype:sql OR filetype:csv OR filetype:plist OR filetype:txt OR filetype:url OR filetype:user OR filetype:vcs OR filetype:pdb site:target.com

The first one searches Google’s index for txt files on the target site which contain the strings “admin” and “password”. If Google has indexed any such txt file from your target site, chances are high that it contains sensitive information which shouldn’t be shared with third parties. The second one searches for several file types, including sql, txt and bak, which are commonly used for saving sensitive information. Let us see a more powerful dorking technique:

Red Hat Linux (widely used on Linux servers) has a feature called Kickstart for unattended installations: all the details for the OS installation are saved in a file with a cfg extension, which is read during installation. This file might not be deleted from the web server once the installation is complete, and it might be indexed by Google. A peculiarity of this file is that it contains a rootpw (root password) entry, and the root password might not be changed after the installation is complete.

# Kickstart filetype:cfg

This searches Google for sites which expose files with the .cfg extension containing the keyword Kickstart. Similarly, there are many other powerful queries which you can use to recon a target website. A list of the different Google search operators can be found on Google’s support pages (go through it to understand the different search operators). The Exploit DB Google Hacking Database is very well known, and you can use it to learn many more queries of this kind.
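From the defender's side, the same risk can be checked before a Kickstart file ever reaches a web server. The following is a minimal sketch, assuming the file uses the standard `rootpw` directive with its `--iscrypted` and `--lock` options; the function name `find_rootpw_entries` and the sample file contents are made up for illustration.

```python
def find_rootpw_entries(kickstart_text):
    """Return (line_number, kind) for every rootpw directive in a Kickstart file.

    kind is "hashed" when --iscrypted is present, "locked" when --lock is
    present, and "plaintext" otherwise.
    """
    findings = []
    for number, line in enumerate(kickstart_text.splitlines(), start=1):
        tokens = line.strip().split()
        # Skip blank lines, comments and every directive other than rootpw.
        if not tokens or tokens[0] != "rootpw":
            continue
        if "--iscrypted" in tokens:
            kind = "hashed"
        elif "--lock" in tokens:
            kind = "locked"
        else:
            kind = "plaintext"
        findings.append((number, kind))
    return findings


# Hypothetical sample file, not taken from a real system.
sample = "# Kickstart file\nrootpw --iscrypted $6$salt$hash\nrootpw secret123\n"
for line_number, kind in find_rootpw_entries(sample):
    print(f"line {line_number}: rootpw is {kind}")
```

Even a hashed entry is worth treating as sensitive, since an exposed hash can still be attacked offline.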

But there is a drawback to this approach while pen-testing. Even though Google dorking is a good method for finding sensitive information leakage, manually trying out different searches for a given list of targets is very time consuming. To speed this up, we can use automation tools which are very good at finding sensitive information via Google. Let us see some tools which can automate the task for us:

1) OWTF - Passive Online Scanner:

If you want to try out basic Google dorking against your target, the OWTF online scanner is a good place to start. Simply go to the Passive online scanner and enter your target URL. Click on the “target” option on the next page and you will see several results. Click on OWASP-IG-002 Search engine discovery/reconnaissance Google Hacking, Metadata > Passive  … You will see online resources containing several Google advanced searches, which you can simply click to see the results.

2) Google Hacking Diggity Project:

The Google Hacking Diggity Project is a very active development project dedicated to finding vulnerable systems and sensitive data in corporate networks by leveraging search engines like Google and Bing. It is widely regarded as one of the leading tools for finding vulnerabilities using Google dorks.

3) SiteDigger:

SiteDigger is a powerful tool from McAfee which makes use of the Google cache to find vulnerabilities, security configuration issues and protected file exposure. It even supports scanning via a proxy or Tor.
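To get a feel for what such tools automate, the two dorks shown earlier can be generated for a whole list of targets. This is only a rough sketch, not how any of the tools above work internally: the helper names and the example domains are assumptions, and the code only builds query strings and search URLs without sending any request.

```python
from urllib.parse import urlencode

# File types taken from the second dork in this article.
FILETYPES = ["bak", "mdb", "sql", "csv", "plist", "txt", "url", "user", "vcs", "pdb"]


def password_dork(site):
    """Dork 1: txt files mentioning both "admin" and "password"."""
    return f"intext:admin intext:password filetype:txt site:{site}"


def filetype_dork(site, filetypes=FILETYPES):
    """Dork 2: any of the listed file types on the target site."""
    joined = " OR ".join(f"filetype:{ext}" for ext in filetypes)
    return f"{joined} site:{site}"


def search_url(query):
    """Turn a dork into a Google search URL that can be opened in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def dorks_for_targets(targets):
    """Map every target to the list of search URLs for its dorks."""
    return {
        site: [search_url(password_dork(site)), search_url(filetype_dork(site))]
        for site in targets
    }


# Hypothetical targets used purely for illustration.
for site, urls in dorks_for_targets(["target.com", "example.org"]).items():
    print(site)
    for url in urls:
        print("  ", url)
```

Keeping the query builders separate from `search_url` makes it easy to add more dorks, for instance from the Google Hacking Database, without touching the URL handling.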

Preventing this is not very difficult, but it is a bit tricky. The best way to ensure good security is to put an authorization mechanism in front of the sensitive information. That way, we need not list the sensitive information explicitly in robots.txt, and even if Google tries to index it, the crawler cannot see what’s inside. So an authorization mechanism is the best defence against Google dorking.
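As a rough illustration of why this works, here is a minimal sketch of a check a server could run before serving a sensitive file, assuming HTTP Basic authentication. The function name `is_authorized` and the credentials are hypothetical, and a real deployment would normally rely on the web server's or framework's own access control instead of hand-written code.

```python
import base64
import hmac


def is_authorized(authorization_header, expected_user, expected_password):
    """Return True only if the header carries the expected Basic credentials."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        encoded = authorization_header[len("Basic "):]
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 as well as bytes that are not valid UTF-8.
        return False
    user, separator, password = decoded.partition(":")
    if not separator:
        return False
    # compare_digest avoids leaking how much of the value matched via timing.
    user_ok = hmac.compare_digest(user.encode(), expected_user.encode())
    password_ok = hmac.compare_digest(password.encode(), expected_password.encode())
    return user_ok and password_ok


# Hypothetical credentials for illustration only.
header = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(is_authorized(header, "alice", "s3cret"))
print(is_authorized(None, "alice", "s3cret"))
```

A search engine crawler sends no credentials, so it falls into the second case and never receives the file contents.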

If creating an authorization mechanism is absolutely not possible, then give long random names to the sensitive files and do not reference them from any other location or file. That way they remain hidden and are very difficult to find. But this does not provide bulletproof security; the files could still be exposed.
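Generating such a name is straightforward. The sketch below is one possible way to do it with Python's standard `secrets` module; the helper name `randomized_name` and the example file name are assumptions made for illustration.

```python
import secrets
from pathlib import PurePosixPath


def randomized_name(original_name, nbytes=16):
    """Replace the file's base name with random hex, keeping its extension."""
    suffix = PurePosixPath(original_name).suffix
    # token_hex(16) yields 32 hex characters from a cryptographic RNG.
    return secrets.token_hex(nbytes) + suffix


# Hypothetical file name for illustration.
print(randomized_name("backup.sql"))
```

The name is only as secret as the places it appears in, so it must also stay out of links, sitemaps and directory listings.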


Sometimes people save all their confidential files under a common folder like “hidden” and then explicitly disallow indexing of it in robots.txt. This is a very bad approach and does not ensure security. Even though it stops search engines from indexing the folder, robots.txt is a publicly readable file, and the first thing an attacker will do during recon is analyse the robots.txt of the target site. At first glance the attacker can tell that there is something valuable in the directory “/hidden”, since that would explain why the webmaster listed it explicitly in robots.txt.
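To see how little effort this takes, the following sketch pulls every disallowed path out of a robots.txt file. The function name `disallowed_paths` and the sample contents are hypothetical, and the parsing is deliberately simple (it ignores which User-agent group a rule belongs to).

```python
def disallowed_paths(robots_txt):
    """Return every non-empty Disallow path listed in a robots.txt file."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon.
        line = line.split("#", 1)[0].strip()
        field, separator, value = line.partition(":")
        if separator and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt contents for illustration.
sample_robots = "User-agent: *\nDisallow: /hidden/ # private\nDisallow:\nAllow: /public/\n"
print(disallowed_paths(sample_robots))
```

Each path it returns is a directory the site owner has openly pointed out as worth a closer look.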

Anirudh Anand

Security Engineer @CRED | Web Application Security ♥ | Google, Microsoft, Zendesk, Gitlab Hall of Fames | Blogger | CTF lover - @teambi0s | Certs - eWDP, OSCP