Relying on traditional web search engine queries to mine information may omit critical data needed to break your case.

Unsophisticated users of Google, Bing, or Yahoo believe those search engines locate everything on the web. Web research exploded because of the ease of typing a search query into Google or Bing, but specialized search results stayed hidden.

However, Google and Bing searches are like ice floating on top of much deeper water. Estimates are that Google indexes 40 billion web pages, but another 450 billion web pages (often database-related) are hidden from search engines. Where is the hidden web data? Below, we explore the “hidden web,” why everything is not visible, strategies to conduct more effective searches, and deep web resources to consider when performing a database query in your claims investigation.

The Hidden Web: Who is the Real Walter White?

The hidden web is data not indexed by a search engine because it is closed off. Hidden web data is typically specialized, frequently changing content on a specific topic. Data missed by general searches was originally called “invisible” but is now referred to as part of the hidden or deep web.

Why is data not indexed? One reason is that some search engines seek static HTML pages that are also linked to other pages. However, estimates are that dynamic web pages outnumber static pages 100 to 1.

Hidden web pages are dynamic search results generated by searching a database with a customized query. Once those results are viewed and closed, they cease to exist on the web. Analysts, experts, and researchers at a vast array of worldwide institutions compile databases that are not normally recoverable by a general Google search. Much like the character Walter White in the TV series “Breaking Bad,” Google search results can be deceiving. (On the surface, Walter White is a mild-mannered high school science teacher diagnosed with cancer. In fact, he starts making methamphetamine, initially to pay his medical bills, and eventually becomes a murderous drug kingpin.)

Why do some search engines miss this hidden web data when they use sophisticated computer crawling technology? Search engines use web crawlers, or spiders, that follow hyperlinks from page to page. Spiders are automated programs that search the public Internet reading static web pages. The spider reports its results back to its mother database. Those results are cataloged for general searching by users.
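
To make the crawling idea concrete, here is a minimal sketch of a link-following spider. It runs against a small, made-up set of in-memory pages rather than the real web, so the page paths, the `PAGES` dictionary, and the `crawl` function are all assumptions for illustration, not how any particular search engine is built. The point it shows is that a page reachable only by submitting a search form is never discovered by following hyperlinks.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical in-memory "web": path -> HTML. No network access is involved.
PAGES = {
    "/home": '<a href="/about">About</a> <a href="/search">Search</a>',
    "/about": '<a href="/home">Home</a>',
    "/search": '<form action="/results"><input name="q"></form>',
    # Generated only when a person submits the form; no page links to it.
    "/results?q=arson": "<p>3 matching records</p>",
}


def crawl(start):
    """Breadth-first crawl that follows hyperlinks only."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(PAGES.get(url, ""))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


indexed = crawl("/home")
print(sorted(indexed))                  # the three linked pages
print("/results?q=arson" in indexed)    # False: the query result is never reached
```

The spider reaches the search form itself, but because it never types a query into the form, the results page stays outside its index.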

That technique is effective for identifying resources on the surface web. However, spidering technology often returns links based on popularity, not content. Those results do not necessarily show recent data or relevant information. Some web data contains robot/spider exclusions that block certain pages from being indexed. Password-protected material is also not indexed. Moreover, some creators submit their own web pages directly for listing with search engines. Nevertheless, search engines do not use database queries because of the unlimited number of potential queries in the database format. Web crawler programs do not type. They do not think. They do not input key words in separate search boxes in databases. Nor do they enter passwords.
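
As an illustration of the exclusion mechanism mentioned above, the sketch below uses Python's standard `urllib.robotparser` module to check whether a well-behaved crawler would be allowed to fetch a page. The robots.txt content, the `example.com` URLs, the `ExampleBot` agent name, and the `may_index` helper are hypothetical and chosen only for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site: every crawler is asked
# to stay out of the /claims-db/ section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /claims-db/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_index(url, agent="ExampleBot"):
    """Return True if the exclusion rules allow this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


print(may_index("https://example.com/news/fire-report.html"))   # True
print(may_index("https://example.com/claims-db/record?id=7"))   # False
```

A crawler that honors these rules simply skips the excluded section, so nothing under it appears in general search results even though the pages exist.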

Thus, databases that require individualized searches generate pages on demand and are not accurately reflected in Google web search results. If you want to fly to San Francisco, you can search Google, and you will be directed to airlines or to services offering discount airfares. You will not initially get the times and days of the flights you need, because you have not yet entered those search terms in the airline's search form. The actual flights you need to get to San Francisco are not shown in a Google search result, requiring you to do a deeper query at the airline web site. Hence, if you assume your Google or Bing searches will pull all data responsive to your investigation, your investigation will likely never “get off the ground.”
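
The airline example can be sketched in a few lines. The code below is an assumption-laden toy, not any airline's actual system: the `flights` table, its sample rows, and the `results_page` function are invented for illustration. It shows that the results "page" is assembled only when someone supplies a destination and a day, which is why no static page exists for a crawler to find.

```python
import sqlite3

# Hypothetical flight schedule held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (dest TEXT, day TEXT, departs TEXT)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("SFO", "Mon", "08:15"), ("SFO", "Tue", "14:40"), ("LAX", "Mon", "09:30")],
)


def results_page(dest, day):
    """Build a results page on demand from the user's own query terms."""
    rows = conn.execute(
        "SELECT departs FROM flights WHERE dest = ? AND day = ? ORDER BY departs",
        (dest, day),
    ).fetchall()
    times = ", ".join(row[0] for row in rows) or "no flights found"
    return f"Flights to {dest} on {day}: {times}"


print(results_page("SFO", "Mon"))   # Flights to SFO on Mon: 08:15
```

Every distinct combination of destination and day produces a different page, and none of them exists until the query is run.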

Preparing An Effective Search: Seeking Walter's Lab Location

Hidden web data does not mean the information cannot be accessed. In “Breaking Bad,” Walter White and his partner Jesse Pinkman, a former high school student of his, first start cooking methamphetamine in an RV and later in an underground room, all the while staying one step ahead of the DEA's search for “Heisenberg” (Walter White's pseudonym in the drug business). To pursue the location of Walter White's lab, DEA Agent Hank Schrader had to establish search parameters in Albuquerque, New Mexico. Similarly, if you want to conduct an effective search of the hidden web, you have to plan your search and do more than just type search words into Google or Bing.

Analyze your search topic. Where do you begin? Are there unique terms, jargon or phrases that describe your issue? For example, suppose your search involves “organizational fraud intelligence.” Are there equivalent terms or different ways to spell your search query? Should you consider using bureaucratic, departmental, or managerial in place of the word “organizational”? For “fraud,” should you instead use extortion, deceit, or scam? Compile a list of all terms and potential search queries using the alternate terms.
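
Compiling every combination of alternate terms by hand gets tedious quickly. As a minimal sketch, assuming you keep your alternates in a simple dictionary, the standard `itertools.product` function can generate the full list of candidate queries. The `SYNONYMS` table and the `build_queries` helper are illustrative names, and the alternates are the ones suggested in the paragraph above.

```python
from itertools import product

# Alternate terms for each word of the original query (illustrative).
SYNONYMS = {
    "organizational": ["organizational", "bureaucratic", "departmental", "managerial"],
    "fraud": ["fraud", "extortion", "deceit", "scam"],
    "intelligence": ["intelligence"],
}


def build_queries(phrase):
    """Return every query formed by swapping in the alternate terms."""
    options = [SYNONYMS.get(word, [word]) for word in phrase.split()]
    return [" ".join(combo) for combo in product(*options)]


queries = build_queries("organizational fraud intelligence")
print(len(queries))   # 16 candidate queries (4 x 4 x 1)
for query in queries[:3]:
    print(query)
```

Each generated query can then be tried in turn against the specific databases you have identified.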

Start your search in the right place. Identify specific databases you want to search about your claim. Is there a directory for the data you are seeking? Are there organizations, people, groups or societies that may have the information you want? Do those organizations have databases you can access? Some databases may be available by paid access only. Sometimes you get what you pay for. The University of Michigan developed OAIster, a searchable database that provides access to public materials from research and academic institutions.

Are there experts in your field of interest? What organizations do they belong to? Is there a discussion group/blog for those organizations? Are there searchable databases at those organizations?

Continue to refine your search terms and the databases you are individually searching. If your search strategy does not work, try another approach.

Deep Web Resources: Searching for the “Blue Sky” Formula

In “Breaking Bad,” Heisenberg develops a formula for “Blue Sky” methamphetamine that is 99.1% pure, described by his partner Jesse as “the bomb.” Assuming the hypothetical formula was on the hidden web, what resources would need to be assessed to identify a database that might help explain the likely properties of “Blue Sky”?

If you need a mega portal to jump start your hidden web search, consider InfoMine. It has thousands of links to hundreds of databases collected by the University of California, Riverside under subject categories such as Bio, Ag & Med Services; Business & Economics; and PhysSci, Engr, CS & Math, to name just a few.

Another option is The Complete Planet, which provides what it calls a “comprehensive listing” of dynamic searchable databases that are not crawled or indexed by search engines, broken out into topic categories.

The Virtual Library provides a quick search option and category jumping off points, while the Library Spot can be used to obtain an overview of the subject. Claims professionals should also consider the following resources: scientific material on Intute; global scientific updates at WorldWide Science; Science.gov; Google Scholar; and other similar specialized search engines. If you don't know of a specific database to search, then consider a metasearch engine that combines results of several top search engines, such as Clusty.
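
To show what “combining results” can mean in practice, here is one simple way a metasearch tool might merge ranked lists from several engines: take results in round-robin order and drop duplicates. This is only a sketch under that assumption. It does not describe how Clusty or any other real metasearch engine ranks results, and the `merge_results` function and the `example` URLs are hypothetical.

```python
from itertools import zip_longest


def merge_results(*ranked_lists):
    """Interleave ranked result lists, keeping each URL's first appearance."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):   # 1st results, then 2nd, ...
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical result lists from two engines.
engine_a = ["site1.example/a", "site2.example/b", "site3.example/c"]
engine_b = ["site2.example/b", "site4.example/d"]

print(merge_results(engine_a, engine_b))
# ['site1.example/a', 'site2.example/b', 'site4.example/d', 'site3.example/c']
```

A merged list like this still draws only on what each engine has indexed, so it widens a surface web search rather than replacing a direct database query.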

The “Blue Sky” methamphetamine sold by Heisenberg in “Breaking Bad” took time to penetrate the drug market, and ultimately its success led to the downfall of the operation. Competitors wanted to take over the product or put Heisenberg and his partner Jesse out of business. Similarly, piercing the hidden web through searches takes trial and error. The old saying “Nothing ventured, nothing gained” comes to mind. Failing to pursue information on the hidden web means viable claim information about your issue of interest, claimant or insured will remain undiscovered. A false picture, much like Heisenberg as a meek high school teacher, will color perceptions of your claims investigation results. Go the extra mile by going beyond a generic Google or Bing search to specialized databases.

Peter A. Lynch is a partner in the subrogation and recovery department at Cozen O'Connor and a legal columnist for interfire.org and the California Conference of Arson Investigators. He can be reached at [email protected] or follow him on Twitter @firesandrain.

© 2024 ALM Global, LLC, All Rights Reserved. Request academic re-use from www.copyright.com. All other uses, submit a request to [email protected]. For more information visit Asset & Logo Licensing.