Saturday, April 5, 2008

Limitation of Search on the Web

Let us say you want to see all the Dilbert cartoons in which "Asok the Intern" appeared. You can go to the website http://www.dilbert.com and use its search box. It takes you to a page showing this information:

Asok the Intern

Asok, pronounced ah-shook, was introduced to satisfy the hordes of interns who wrote to request their own character. Asok is brilliant, but as an intern he is immensely naive about the cruelties and politics of the business world. His name is a common one in India (but usually spelled Ashok).


Now if you want to see the strips in which Asok appeared, you are at a dead end of the search. Maybe not. Search further for "Dilbert Archives" and you get to this website: http://www.unitedmedia.com/comics/dilbert/archive/

but it only lets you access the strips from the current calendar year. There is no Asok in the year 2008. I think he died a comic death, because I have not seen him lately.

Next we go to http://www.archive.org and search for old Dilbert comics. You can get the original site but no Dilbert strips, because you need the exact web address of the strip you are looking for, and obviously I do not have it.

We go to the third website, http://www.geek.nl/pics/dilbert-arch/. This has all the strips as far as I can see, but one has to wade through all of them to find the one needed, and in the process become a Dilbert expert, if one has the time to go through all the strips stored here.

Joking aside, how to search for information on the Internet is a serious problem. The information is out there, but how do you get to it? For example, I am looking for one particular Dilbert strip where Asok claims to heat the coffee with the heat generated in his head while he thinks. I know it is somewhere in that directory, but it will take a couple of hours of sorting through all the files there to get to it.

What Google indexes is really a very small part of the Internet. Images, videos and other non-textual information that is not properly tagged cannot be found through index-based search engines. One needs to wade through this huge tome of work by Dr. Giles to make sense of that statement.

Judging the merit of research work by the number of citations it receives is an old idea. It became inoperable when people misused it and started creating tightly knit research groups whose members cite each other's work to increase the number of citations. The idea is being revived in the form of social bookmarking. If used with intellectual integrity, this idea can make finding useful things on the Internet easy. It is being used by del.icio.us, StumbleUpon, Digg and many other similar websites that promote collaborative filtering (1, 2) of web addresses.
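To make the idea concrete, here is a minimal sketch of how a bookmarking site might rank web addresses for a tag. It is only an illustration under my own assumptions: the users, the example.org addresses and the rank_by_tag function are all made up, and none of it is how del.icio.us or Digg actually work.

```python
from collections import defaultdict

# Hypothetical bookmark records: (user, url, tag). All values are made up.
BOOKMARKS = [
    ("alice", "http://example.org/strips/1001.gif", "asok"),
    ("bob", "http://example.org/strips/1001.gif", "asok"),
    ("bob", "http://example.org/strips/1001.gif", "coffee"),
    ("carol", "http://example.org/strips/2002.gif", "asok"),
    ("alice", "http://example.org/strips/3003.gif", "wally"),
]


def rank_by_tag(bookmarks, tag):
    """Return the URLs carrying `tag`, most-bookmarked first."""
    users_per_url = defaultdict(set)
    for user, url, t in bookmarks:
        if t == tag:
            # A set counts each user once, however often they repeat the tag.
            users_per_url[url].add(user)
    # Sort by number of distinct users (descending), then by URL for a stable order.
    return sorted(users_per_url, key=lambda u: (-len(users_per_url[u]), u))


print(rank_by_tag(BOOKMARKS, "asok"))
```

Counting distinct users rather than raw bookmarks keeps one person from inflating a page by tagging it again and again. It does nothing against a group that agrees to bookmark each other's pages, which is the same weakness the citation count had.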

This approach can help in finding web resources that are not indexed correctly by the search engine robots, or are not indexed at all because they sit on data-driven websites that create dynamic pages in response to the user's query.

I think a combination of the two approaches, that is, the collaborative filtering used by dedicated web-based communities and the website suggestions of the index-based search engines, will solve the problem of finding useful resources on the web.
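A rough sketch of what such a combination could look like is below. Everything in it is an assumption of mine for illustration: the two example.org pages, the tags, the function names and the simple scoring rule (one point per matching indexed word, plus social_weight points per user who applied a matching tag). Real search engines score pages in far more elaborate ways.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def combined_search(query, index, tags, social_weight=1.0):
    """Rank URLs by text-index hits plus weighted counts of users who tagged them."""
    scores = defaultdict(float)
    for word in query.lower().split():
        # Index-based part: one point for each query word found in the page text.
        for url in index.get(word, ()):
            scores[url] += 1.0
        # Collaborative part: points for each user who tagged the URL with the word.
        for url, users in tags.get(word, {}).items():
            scores[url] += social_weight * len(users)
    return sorted(scores, key=lambda u: (-scores[u], u))


# Hypothetical data: a text page the robot can read and an image it cannot.
PAGES = {
    "http://example.org/about-asok.html": "asok the intern character page",
    "http://example.org/strips/1001.gif": "",  # image: no text to index
}
# Hypothetical tags: tag -> {url: set of users who applied that tag}.
TAGS = {
    "asok": {"http://example.org/strips/1001.gif": {"alice", "bob"}},
    "coffee": {"http://example.org/strips/1001.gif": {"bob"}},
}

index = build_index(PAGES)
print(combined_search("asok coffee", index, TAGS))  # image strip comes first
print(combined_search("asok coffee", index, {}))    # index alone misses the image
```

With the index alone the image never shows up, because there is no text in it for the robot to read. Once the tags from the community are added to the score, the strip becomes findable.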

The idea of social bookmarking is gaining some currency. There is a popular Facebook group, and a website from the creator/admin of that group, where he teaches social bookmarking.

It will work, but as with anything else that is useful, the issues of power and its cousin politics need to be settled first. Who should get how much of the profits generated by a service like this has to be resolved, especially with successful CEOs who, like little children, would like to have it all. They may even believe in sharing, as long as they can own all the profits. No sharing or caring here.

Does anybody remember the Dilbert comic strip about Asok that I mentioned here, or know how to find it?
