6. Sourcing and Dorking

Using advanced search operators to filter the overwhelming amount of information on the internet is known as Google "dorking" or Google "hacking." For journalists conducting investigations, the technique is a good way to surface otherwise "hidden" material, such as internal directories, documents on government websites and research papers.

Dorks let you locate targeted data that a regular keyword search would not readily surface. Remember that, as with all open source journalism, you will not find everything you seek. A search engine can only return what has already been published and indexed, but these techniques will guide you through the maze.

Source: Zhenyu Qi, Global Investigative Journalism Network

Google treats a space between words as an implicit AND, so there is no need to type the word. Quotation marks, the minus sign (a hyphen, -) and OR (in capitals) are some of the most helpful ways to narrow or expand your search. The "site:" operator restricts results to a specific website, domain or domain extension. Likewise, "filetype:" (or "ext:") limits results to a specific file format (such as PDF, DOC, PPT or XLS).

Specific Search Techniques

Operator          Function                                Example

Space (AND)       Requires all terms to appear.           corruption defense ministry
OR                Finds either term; expands your net.    "data breach" OR "security incident"
Hyphen (-)        Excludes words or sites.                DOGE lawsuit -site:reddit.com
Quotes ("")       Forces an exact phrase match.           "unexplained wealth order"
Parentheses ()    Groups terms to control logic.          (iPad OR iPhone) charger
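
The basic operators above combine mechanically, so a query can be assembled from small pieces. The following is a minimal Python sketch, not an official tool: the helper names phrase, any_of and exclude are hypothetical, and the result is only a string to paste into the search box.

```python
def phrase(text: str) -> str:
    """Wrap text in double quotes to request an exact-phrase match."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """Join terms with OR inside parentheses; multi-word terms are quoted."""
    parts = [phrase(t) if " " in t else t for t in terms]
    return "(" + " OR ".join(parts) + ")"


def exclude(term: str) -> str:
    """Prefix a word or a site: filter with a hyphen to exclude it."""
    return "-" + term


# Spaces between the pieces act as the implicit AND.
query = " ".join([any_of("iPad", "iPhone"), "charger", exclude("site:reddit.com")])
print(query)  # (iPad OR iPhone) charger -site:reddit.com
```

Keeping each operator in its own helper makes it easy to swap one term without retyping the whole query.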


Precision Investigative Dorks

  • site: Restricts search to a specific domain or extension (site:gov or site:ru).
  • filetype: Filters by format (pdf, xlsx, docx). Use for data dumps or reports.
  • intitle: Limits search to the page title. Useful for sensitive labels such as intitle:"draft" or intitle:"confidential".
  • inurl: Finds keywords in the web address. Useful for finding directories (inurl:admin).
  • intext: Searches only the body text, ignoring titles, URLs and metadata.
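
As a rough illustration of how the operators listed above attach to a keyword search, here is a small Python sketch. The function build_dork and the ALLOWED tuple are hypothetical names introduced for this example; the code only formats text and does not contact Google.

```python
# Operators covered in the list above; anything else is rejected.
ALLOWED = ("site", "filetype", "intitle", "inurl", "intext")


def build_dork(keywords: str = "", **operators: str) -> str:
    """Compose a query such as 'audit report site:gov filetype:pdf'.

    Each keyword argument becomes operator:value. Values containing
    spaces are quoted so the operator applies to the whole phrase.
    """
    unknown = sorted(set(operators) - set(ALLOWED))
    if unknown:
        raise ValueError(f"unsupported operators: {unknown}")
    parts = [keywords] if keywords else []
    for name in ALLOWED:
        value = operators.get(name)
        if value:
            if " " in value:
                value = f'"{value}"'
            parts.append(f"{name}:{value}")
    return " ".join(parts)


print(build_dork("audit report", site="gov", filetype="pdf", intitle="draft"))
# audit report site:gov filetype:pdf intitle:draft
```

Rejecting unknown operator names is a design choice of this sketch: a mistyped operator would otherwise be searched as an ordinary word.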

Proximity & Pattern Matching

  • Wildcard (*): Acts as a placeholder for unknown words.
    • Example: "transferred $* million to * account"
  • AROUND(X): Finds two terms within X words of each other, which helps ensure they appear in the same context.
    • Example: "Stony Brook" AROUND(5) ("received" OR "awarded") "$* million" surfaces grants to Stony Brook.
    • Example: Putin AROUND(10) "private jet" finds pages where the two terms appear within 10 words of each other.
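
Once a proximity search returns a page, it can help to confirm locally that the two terms really do sit close together in the text you saved. The Python sketch below is one way to do that. It does not reproduce how Google evaluates AROUND(X); the function name within_n_words is hypothetical, and "within n words" is assumed here to mean at most n words between the two terms.

```python
import re


def within_n_words(text: str, first: str, second: str, n: int) -> bool:
    """Return True if first and second occur with at most n words between them."""
    words = re.findall(r"\w+", text.lower())
    a = re.findall(r"\w+", first.lower())
    b = re.findall(r"\w+", second.lower())
    if not a or not b:
        return False

    def starts(seq):
        # Every index where the word sequence seq begins in the text.
        return [i for i in range(len(words) - len(seq) + 1)
                if words[i:i + len(seq)] == seq]

    for i in starts(a):
        for j in starts(b):
            # Count the words strictly between the two matches, in either order.
            gap = j - (i + len(a)) if j > i else i - (j + len(b))
            if 0 <= gap <= n:
                return True
    return False


sample = "Records show Putin travelled to Sochi last week aboard a private jet."
print(within_n_words(sample, "Putin", "private jet", 10))  # True
print(within_n_words(sample, "Putin", "private jet", 3))   # False
```

In the sample sentence seven words separate the two terms, so the check passes at 10 and fails at 3.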

Used together, these operators turn Google from a general-purpose search engine into a precise investigative tool. For example, to trace where money has been sent, use AROUND(X) to tie an amount to a name. To find documents marked "not for circulation" or still in unpublished "draft" form, use intitle: to search document titles. Simple words such as "private", "confidential" or "secret" can turn up interesting results.

Dorking helps filter out the noise of the public web, and knowing the correct syntax can be what surfaces a story that others miss.

Combining Operators

Single operators are useful. Stacked operators are where investigative dorking actually lives.

Finding internal documents on a government agency website:

site:epa.gov filetype:pdf (intitle:"internal" OR intitle:"confidential" OR intitle:"draft")

Tracing procurement irregularities:

site:.gov.ng (filetype:pdf OR filetype:xlsx) "tender" intext:"sole source"

Finding open directory listings (Apache/Nginx auto-indexes):

intitle:"index of" "parent directory" site:.org ".xls"
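
Stacked queries like those above are easier to share with colleagues, or to log in a research notebook, as ready-made links. The Python sketch below percent-encodes a query into a standard Google search URL using only the standard library. The function name search_url and the labels in the dictionary are illustrative assumptions; the code builds links to open in a browser and sends no requests itself.

```python
from urllib.parse import urlencode


def search_url(query: str) -> str:
    """Return a Google search link with the query safely percent-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Two of the stacked queries from this section, keyed by a short label.
queries = {
    "internal documents": 'site:epa.gov filetype:pdf (intitle:"internal" OR '
                          'intitle:"confidential" OR intitle:"draft")',
    "procurement": 'site:.gov.ng (filetype:pdf OR filetype:xlsx) "tender" '
                   'intext:"sole source"',
}

for label, query in queries.items():
    print(f"{label}: {search_url(query)}")
```

Encoding matters because quotation marks, colons and parentheses in a dork are otherwise easy to mangle when a link is pasted into email or chat.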

Cross-checking a shell company across leak databases (site:icij.org also covers subdomains such as offshoreleaks.icij.org):

"Lotus Holdings Ltd" (site:icij.org OR site:occrp.org)