April 27, 2013

Letters from the Future: Challenging Google’s Search Engine

A previous version of this article was posted on Duck Duck Go reddit, where the user _zekiel pointed out that DDG currently uses the two-level search we had proposed.

Google is the undisputed search leader (88% market share in the US1). Google is not only ahead of competitors in terms of quality of search results, infrastructure worthy of science fiction, and computer science research. Another of their strengths is how quickly they apply their own research.

How can Google be dethroned? Sure, there are other search engines, and newcomers like Blekko and Duck Duck Go make headlines from time to time. However, when you look more closely at those other search engines, you find that they cannot seriously compete with Google.

Benchmarking Search Engines

A search for “reverse engineering” on Blekko returns a hundred thousand results, while the same search on Google returns approximately two million. If it is so difficult for Blekko to compete at the crawling level, imagine what happens in the rest of the search engine pipeline. Just looking at Google’s search quality reports  tells you that Page Rank was only the catalyst for much more sophisticated algorithms.

Duck Duck Go manages to attract a geeky audience with highlighted features like putting privacy first. If we search for “reverse engineering” on Duck Duck Go the results seem wacky: the second result is http://reverseengineeringinc.com/ a content poor site which just has the right domain name.

Google appears to be in a league by itself. It currently seems unlikely that they could lose significant market share due to an engineering weakness. In order to outdo Google, we must think holistically and try to guess how the web as a whole will evolve over the next ten years.

A Holistic Approach

Duck Duck Go created a two level search engine for sites like Wikipedia or YouTube. DDG offers DuckDuckGo Instant Answer API to incorporate the search engines of third parties. In order to take advantage of DDG and other two-tiered search engines, sites will have to improve their local search. Currently if you search using the local site search inside Stack Overflow, for example, the results are much lower quality than the same query in Google restricted to stackoverflow.com. When each site understands its own data better than Google, its internal search results will surpass Google’s. Google will no doubt continue to provide better global results, but the two-tiered search would decentralize efforts to improve algorithms. It is important to note that this solution does not need to be distributed: sites can share their local indexes and ranking algorithms with the routing search engine.

The fact that a small number of sites receive the majority of Internet traffic means that optimizing the top sites for a two layer search would make a big difference.

Notes

  1. Search Engine Market Share

Additional Resources

  1. Can a small search engine take on Google?
  2. Google’s Weakness, AltaVista’s Strength [2002]
  3. Is the Internet driving competition or market monopolization?
  4. Media and Internet concentration in Canada

See Also

  1. Reverse Engineering and The Cloud
  2. Egont, A Web Orchestration Language
  3. Helping Search Engines to Find Content in the Invisible Web
  4. The Data Portability Fact Sheet