Web Science

Nearly two decades ago, on Christmas Day 1990, Tim Berners-Lee "invented" the World Wide Web when he connected two computers using HTTP via the Internet. I am not quite sure where Al Gore was that day, but at that point, the Internet was in place. Now, 18 years later, the World Wide Web consists of some 15 billion pages. Considering there are only something like 6.8 billion people on the planet–most of whom have no access to the World Wide Web–that is a truly astonishing number.

A couple of years ago, Sir Tim Berners-Lee joined forces with Nigel Shadbolt–professor of artificial intelligence at the University of Southampton–to try to understand the phenomenal growth of the Web and find meaningful ways to control and use it. Together, they and their respective universities created a new discipline they call Web science. "The Web Science Research Initiative will allow researchers to take the Web seriously as an object of scientific inquiry, with the goal of helping to foster the Web's growth and fulfill its great potential as a powerful tool for humanity," according to an MIT press release dated Nov. 2, 2006.

The notion the Web can be studied as an object of scientific inquiry is not new. Google and other search providers have been doing that for years. In fact, the amount of time and effort spent developing search and relevancy algorithms is phenomenal. What is new is the idea the Web can or should be controlled and directed to "fulfill its potential." The seemingly random growth of the Web is what makes it so interesting and attractive. The very randomness of content displayed on the Web is what necessitated the development of search and ranking engines in the first place. In fact, one could argue, search engines have made the Web what it is today.

If a Site Has No Links, Does It Exist?

Imagine a Web site created by a hermitlike misanthrope. The site itself has a complicated, obfuscated domain name. The site links to no other sites, and the site creator has no friends or colleagues with whom he will share the URL. The content of the site is irrelevant–it could be anything from drug-addled, demented ramblings to the secret transcripts of the Warren Commission. If no other sites link to it, does the site even exist? Of course, it exists in the same sense a tree falling in the deserted forest creates a sound, but in the context of the World Wide Web, it does not exist if it is not accessed.

However, even without links in or links out, one of the search robots will find that site and index it. Then other search engines will crawl it and index it, and soon our stand-alone site becomes part of the Web. If it really does contain the lost transcripts from the Warren Commission, it may even rise to the top of searches for things such as "conspiracy theory" or "JFK assassination." If it is nonsense, then it will remain obscure and unused. The point here is that without search engines and Web bots, many pages on the Web would exist solely for the use of their owner–kind of like a secret diary hidden under my bed.

One of the real issues we run into when we start trying to "study" the Web is the fact there is no single type of Web page. In fact, the only thread that runs through the whole of it is that HTML (and other content) is transported over HTTP. The difference between a business site and a personal blog is so great it makes little sense to speak of them as part of a unified "Web." The Web is defined by where it exists and how it is accessed, not by what it does. A bank in a strip mall provides the same services as an online banking site. A conversation among friends over a cup of coffee provides the same information as a discussion group. Amazon is certainly one of the great "success" stories in Web history, but it does not offer any real difference in functionality from a brick-and-mortar bookstore. So here, we have one fact about the Web:

Large parts of the World Wide Web serve only as an alternative delivery mechanism for processes or functions that already exist.

Online shopping and business sites are just a small part of the Web. They generate the most revenue, and they certainly have made it easier to do things, but beyond that, they have not really done anything unique. People who used to catalog shop now shop online. People who used to be visited by an insurance agent to service their policy now do it online–or by phone. The interesting thing is while the use of online goods and services sites grows, brick-and-mortar sites also continue to grow. In most cases, business sites are an adjunct to a real business. There are, of course, questionable, quasi-legal, and unpleasant businesses that exist only because of the Web. While they may merit study in Web science, I am not going to discuss them here.

It's Me!

Social networking is all the rage these days. E-mail and instant messaging have morphed into phenomena such as Facebook and YouTube. Blogging has reached epic proportions. We attend a conference and, instead of receiving a handout with reference material from the speakers, we are encouraged to visit their blogs for more information. Hundreds of millions of blogs exist, and they are all crawled and linked and rated. To what end? Humankind seems to have some innate need to speak out and say, "Here I am." Services such as Twitter allow individuals to share the most boring and mundane moments of their lives with us.

Technology has provided us with incredibly easy ways to communicate, and for some reason we find a need to communicate more and more. Consider the cell phone. Why are people always talking or texting on their phones? Are they exchanging useful information, or are they just talking to hear themselves talk? I do not believe the constant chatting that surrounds us is all that useful. Moreover, I do not believe enabling technologies such as e-mail ultimately make us more productive. The value of every useful e-mail I receive is diminished by the need to filter through all the junk. Even in the workplace, where e-mail is supposed to be used for work-related purposes, we need to put rules on our inbox to look at items only where we are the "to:" addressee and maybe find time to wade through the rest on the weekend.

Social networking, while extremely popular, provides little real value.

What about all those "other" Web sites out there, the sites that just provide information? There are tens of thousands of sites on how to tie a fly or how to make an omelet or how to get the best seat at the best price on an airliner. Those information sites are the heart of the semantic Web–and those informational sites also are the core of the real value of the Web.

Take away the business use of the Web, take away the ability to communicate using the Web, and what we are left with is billions of pages of information, which is unsorted, unclassified, and unverified. Some of that information is valuable. Software development would slow to a snail's pace overnight if all the online code samples and information were to disappear. Some of the information is worthless. There exist Web sites that expound every baseless and senseless idea imaginable. That is the first problem with information gathered on the Web. How do you know the information you are viewing is correct? What constitutes an authoritative source on the Web? Certainly not Wikipedia. While it is very useful and the very epitome of what a wiki can be, it certainly is not authoritative.

There currently is no common standard for good information on the Web. If there were such a standard, who would enforce it? The proliferation of information available has made it increasingly difficult to separate the wheat from the chaff. The Internet provides so much raw information that users are unable to distinguish information from knowledge.

I fear the Web is producing a population of undiscerning consumers of information. I recently had a conversation with an individual who was espousing a certain pop philosophy that in essence says we control our destiny by thinking positive thoughts. She asked me if I understood quantum physics. I stated I had a reasonable understanding of quantum mechanics. Her reply was, "Good. So now you understand what I am talking about." The fact her belief had absolutely nothing to do with quantum mechanics totally escaped her. She was firmly convinced quantum mechanics had something to do with mind control. She read it on the Internet. Pseudo-science and useless information on the Web are both rampant and dangerous.

We need a way to "rate" the value of information provided on the Web.

I spend a lot of time helping organizations build company intranet sites. One of the things I encourage is the use of metadata–information attached to other information that can help to classify and identify that data. Properly applied metadata makes it very easy to "find" what you want your users to find. Notice how that was phrased–the metadata is applied with the goal of making information readily available to the user. The owner of the data provides the ancillary information to classify that data.
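
As a rough illustration of owner-applied metadata, the sketch below tags a few documents and retrieves them by field. Everything in it is hypothetical: the document titles, the field names (`department`, `doc_type`), and the `find` helper are invented for the example and do not describe any particular intranet product.

```python
# Hypothetical intranet documents, each carrying metadata supplied by its owner.
DOCUMENTS = [
    {"title": "Claims handling guide", "department": "claims", "doc_type": "procedure"},
    {"title": "Underwriting checklist", "department": "underwriting", "doc_type": "checklist"},
    {"title": "Claims escalation form", "department": "claims", "doc_type": "form"},
]


def find(documents, **criteria):
    """Return titles of documents whose metadata equals every given criterion."""
    return [
        doc["title"]
        for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Narrow the result by adding more metadata fields to the query.
print(find(DOCUMENTS, department="claims"))
print(find(DOCUMENTS, department="claims", doc_type="form"))
```

The lookup works only because someone classified each document up front, which is the point being made about the owner supplying the ancillary information.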

Right now, the World Wide Web is like the wild, wild West. Millions of terabytes of data exist on the Web with no classification, no organization, and no way to get to that data with the exception of search engines. We are desperately in need of some easy-to-use, extensible system to classify data on the Web so that it can be located using some sort of structured query. One such system exists. Part of a possible semantic Web framework involves Resource Description Framework (RDF), a data model for describing Web resources that consists of triples containing a subject, a predicate, and an object. The concept is very simple: "Babe Ruth" (subject) "belongs to" (predicate) "baseball players" (object). There currently exist Web query tools that allow searches using the principles of RDF.
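
The subject-predicate-object idea can be sketched in a few lines of Python. This is a toy illustration only, not a real RDF library or query language: the `Triple` type, the `match` function, and every sample statement other than the Babe Ruth one are hypothetical, and `None` stands in for the kind of wildcard a query tool would express as a variable.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical sample statements; only the Babe Ruth one comes from the text.
TRIPLES = [
    Triple("Babe Ruth", "belongs to", "baseball players"),
    Triple("Hank Aaron", "belongs to", "baseball players"),
    Triple("Jorge Luis Borges", "belongs to", "writers"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every field that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# "Who belongs to baseball players?" expressed as a structured query.
players = [t.subject for t in match(TRIPLES, predicate="belongs to", obj="baseball players")]
print(players)
```

Leaving a field unset turns it into the unknown being asked about, which is what lets the same store of statements answer many different questions.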

This is a good start, but we need to devise other methods of properly classifying all that information. Imagine a library without a system for classifying and labeling books. Now, imagine the books just dumped randomly into a football stadium. Now, create an algorithm to find any particular bit of information in that stadium.

We need a way to apply meaningful, actionable metadata to the information available on the Web.

I am reminded of the "Library of Babel"–a short story by Jorge Luis Borges. The library consisted of an infinite number of volumes with alphabet symbols printed in every possible combination and permutation. All of humankind's knowledge and literature is contained somewhere in this library. The problem is there is no way to discover where it is. Nor is there a way to separate "good stuff" from nonsense. This actually is what we are facing when we address the problems of the World Wide Web. It contains an immense amount of information. Our challenge is how to get useful knowledge from that wealth of information. Let's hope Web science can provide us with the solution.
