Getting in Deep:
Finding the Deep Web
When You Need It

by Jamie McKenzie

Despite the ease of locating information on the Internet, the value of that information is often suspect. Much of what we find with search engines may be unreliable, untruthful or irrelevant.

Schools must teach young ones how and where to cast their nets in order to realize a rich information catch.

If we aim for research that probes beyond the obvious, avoids superficialities and reaches for deeper insights, we must coach students on the fine art of locating good sources.

Fishing for the best information on a topic? It may be hidden from Google and other search engines. It may be stored digitally in what is called the Deep Internet.

Matters of Definition

Few users seem aware of the Deep Internet. When I poll educational audiences, only a few hands go up when I ask how many have heard of this domain. While the Deep Internet provides the most reliable information for many topics, it can do little good for those who never consult it because they do not know it exists.

The concept first arose in 1994 when Dr. Jill Ellsworth coined the term "invisible Web" to identify information often missed by regular search engines. Some commentators quickly pointed out that the information was not actually invisible but, more to the point, difficult to find. The terms Deep Internet or Deep Web seemed to capture the phenomenon more accurately.

Like many new phrases, the term Deep Internet has a span of meanings extending across the information map. For some it refers to partially hidden sites that offer rich digital collections. They are virtually hidden from the general crowd of Internet users because their contents are not subject to search and indexing by the major search engines. They block the "spider" programs sent out by these search engines to find and catalogue the contents of sites around the globe. They are databases that must be searched by query at their own site.
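One common way a site turns away the "spider" programs described above is a robots.txt file, which well-behaved crawlers read before indexing anything. The following is a minimal sketch in Python, using the standard library's robots.txt parser, of how a spider might decide which pages it may visit. The site archive.example, its paths, the spider name and the helper crawlable are all made up for illustration and do not describe any real collection.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary archive site. The paths are
# invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /records/
"""


def crawlable(url: str, robots_txt: str = ROBOTS_TXT,
              agent: str = "ExampleSpider") -> bool:
    """Return True if a well-behaved spider may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://archive.example/about.html",
        "https://archive.example/search?q=whaling+logs",
        "https://archive.example/records/1851/0042",
    ):
        status = "indexable" if crawlable(url) else "hidden from spiders"
        print(url, "->", status)
```

In this sketch the static "about" page can be indexed, while the query form and the records it returns stay out of the search engine's catalogue. A researcher would have to visit the site and run the query there to see them.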

According to Laura Cohen's tutorial on the Deep Web at the University Libraries of the University at Albany:

This is distinct from static, fixed Web pages, which are documents that can be accessed directly. A significant amount of valuable information on the Web is generated from databases. In fact, it has been estimated that content on the deep Web may be 500 times larger than the fixed Web.

The Deep Internet term is also used to refer to collections that are available only by paid subscription. Because specialized articles and information in areas such as law, science and business may be quite costly to produce and organize, the publishers of these resources may require payment for access.

Locating the Deep and the Worthy

The best way for schools and students to find the deep and the worthy is to take advantage of Debbie Abilock's wonderful page "Information Literacy: Search Strategies - Choose the Best Search for Your Information Need" at NoodleTools.

Debbie, one of the most inventive teacher-librarians and Internet pioneers in the USA, has listed 38 types of research or questions that a student might need to approach and points to several appropriate sources for each. Many, but not all, of these are Deep Internet sources.

Debbie's page simplifies the task of identifying the best sources so that there is little danger of missing resources that would otherwise seem hidden from view.

NoodleTools also offers a search strategy wizard designed with young students in mind.

Looking Beyond Google

One strategy that becomes quickly apparent is the importance of using sources other than search engines. At the top of the list would be directories listing the best sources for various topics. These might include the following recommended by UC Berkeley - Teaching Library Internet Workshops:

Source: Finding Information on the Internet: A Tutorial

Checking for Understanding

How many of the teachers and the students in a school know about the Deep Web?

A good place to begin an awareness campaign would be a staff meeting. It takes just 5-10 minutes to share an article like this one and give the teachers a chance to try out something like Debbie's search strategies.

After this meeting, a brief 4-5 item survey could be administered by the teacher-librarian or one department in the school to all students.

  1. Have you heard of the Deep Internet? (Y/N)
  2. Can you write a definition of the Deep Internet? ________________________________
  3. If you cannot find the information you need with a search engine like Google, list other ways you could locate it. ________________________________
  4. Which of the following have you used when looking for information:
  • __ The advanced version of a search engine
  • __ A listing or directory of good information sites
  • __ NoodleTools
  • __ A page of helpful search hints

The point of checking for understanding, of course, is to set in motion a plan to make sure that all staff members and all students know to look beyond the search engines. A good teacher-librarian should be able to close the knowledge gap with just a few lessons.

Moving on past the mainstream meanings of the Deep Web, there are extended issues related to finding good information and insight on the Web. They may be a bit far-fetched, and that is exactly the point. The clever researcher learns that far-fetched can turn out to be quite fetching.

Tapping Sources of Novelty

When the Internet first came to schools in the mid-1990s, one of the benefits promised was the richness of perspective that would be available to students because of the unfiltered information sources. Ten years later, that claim seems distant and elusive, as much of the information that first pops into view is superficial and processed, like slices of cheese individually wrapped in plastic.

While the two categories mentioned earlier in this article are the prime meanings intended by those who employ the term, the Deep Internet has also attracted more mystical definitions. Some use the term to capture elusive but desirable elements buried so far under piles of useless and unreliable information that they may never surface or come to the casual researcher's attention. Special search strategies are required to strip away the many layers of detritus (waste) that sit heavily on the surface of the information landfill. Important information is often obscured by accumulations of silt, debris and rubbish. In addition, modern activists of various persuasions often indulge in distortion and spin. Virtual truth abounds. It is in struggling against what David Shenk called "data smog" in his 1997 book of that title that we seek a path to those regions of the Net that are not readily apparent or easily located but are highly valuable, like a vein of precious ore running through a mountain range.

In yet another somewhat mystical approach to the Deep Internet, there are some who would find meaning in the layers of detritus itself, arguing that the waste products of digital endeavors can produce insights when properly exploited, employing a metaphor such as the extraction of methane gas from a landfill to explain the strategy. Others might argue a kind of information chaos theory - that the accumulation of information debris might actually possess within it patterns and structures that we do not have a mental framework to capture (yet) and may offer new truths for new times. The study of cosmology (outer space) and the dust that lies between stars and planets is another fitting metaphor for this search for truth and meaning on the Net.

Novelists, especially science fiction writers such as William Gibson, have coined terms like cyberspace which, combined with data mining, capture this notion of converting apparent chaos into meaning.

A main character in William Gibson's novel Idoru works as "an intuitive fisher of patterns of information," trying to help a TV program expose the sins of celebrities by looking for trends in vast databases of seemingly innocent information like credit card charges, phone calls and household bills.

Laney was the equivalent of a dowser, a cybernetic water-witch. (pg. 25)

He'd spent his time skimming vast floes of undifferentiated data, looking for "nodal points" he'd been trained to recognize . . . (pg. 25)

. . . info-faults that might be followed down to some other kind of truth, another mode of knowing, deep within gray shoals of information. (pg. 39)

We are after the same nodes as Gibson's cyber-witch . . . the junctions, meeting-points, intersections, and crossroads which enable us to "make up our minds," "put 2 and 2 together," and make sense from non-sense.

The Line Twixt Searching and Snooping

As controversy swirls in the USA about electronic surveillance without court warrants, the Gibson passage above takes on special meaning. While the Administration speaks of careful focus on terrorists, the New York Times reported that the Administration had employed a kind of electronic dragnet that was more like the "skimming vast floes" done in Idoru. The original definitions of wiretapping no longer suffice for a world of widespread wireless communications, whether they be phone or email. It is tempting for law enforcement and anti-terrorist professionals to skim hundreds of thousands of communications without observing the legal niceties required by the U.S. Constitution and by law. But there is little evidence that this skimming has been profitable, and it represents a chilling shift away from American traditions and values.

