Talk : Search engines change access to knowledge : There will be search after Google :

Nutch


Doug Cutting (Xerox, Apple, Excite, Yahoo!)

Search engines are as critical to Internet use as any other part of the network infrastructure, but they differ from other components in two important ways. First, their internal workings are secret.... Second, they hold political and cultural power, as users increasingly rely on them to navigate online content. When so many rely on services whose internals are closely guarded, the possibilities for honest mistakes, let alone abuse, are worrisome. Further, keeping search-engine algorithms secret means that further advances in the area become less likely. Much relevant research is kept behind corporate walls, and useful methods remain largely unknown.

http://www.acmqueue.com/modules.php?name=Content&pa=showpage&pid=144

[Nutch's primary feature is] Transparency. Nutch is open source, so anyone can see how the ranking algorithms work. With commercial search engines, the precise details of the algorithms are secret so you can never know why a particular search result is ranked as it is. Furthermore, some search engines allow rankings to be based on payments, rather than on the relevance of the site's contents. Nutch is a good fit for academic and government organizations, where the perception of fairness of rankings may be more important.

http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html



Spam-able?


Created by rik@cogsci.ucsd.edu 4/12/08