Tuesday, November 13, 2007

PageRank and HITS

Two theses: PageRank and HITS

PageRank is the heart of the Google search engine and was originally developed by Brin and Page. PageRank is a link analysis algorithm that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set. The algorithm may be applied to any collection of entities with reciprocal quotations and references. The numerical weight that it assigns to any given element E is also called the PageRank of E and is denoted by PR(E).
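To make the idea of PR(E) a bit more concrete, here is a minimal sketch of the usual iterative way to compute it on a tiny link graph. The function name `pagerank`, the damping value 0.85, the fixed iteration count, and the way pages without outlinks are handled are all assumptions made for illustration; this is not Google's actual implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rough PageRank sketch.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # start with equal weights

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_pr = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current weight evenly among its outlinks.
                share = damping * pr[page] / len(targets)
                for t in targets:
                    new_pr[t] += share
            else:
                # Assumption: a page with no outlinks spreads its weight
                # evenly over all pages, so the total stays at 1.
                share = damping * pr[page] / n
                for t in pages:
                    new_pr[t] += share
        pr = new_pr
    return pr


# Hypothetical three-page web: A and B both link to C, and C links back to A.
toy_web = {"A": ["C"], "B": ["C"], "C": ["A"]}
scores = pagerank(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy web, C ends up with the largest weight because two pages link to it, A comes second because the important page C links to it, and B comes last because nothing links to it.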

Everybody knows Google, and indeed Google has become a part of modern life. But a lot of IT people do not know HITS, a similar link analysis model and algorithm developed by Jon Kleinberg and presented in January 1998, at least seven months earlier than Brin and Page’s presentation. From Wikipedia: “In fact, some credit Kleinberg's work as the inspiration for PageRank, though he's far too modest to accept that mantle.” HITS was not incorporated into a commercial search engine until 2001, when the search newcomer Teoma adopted it. Check http://www.ask.com .

Google does graciously provide public access to a very rough approximation of the PageRank score, on a scale from 0 to 10, through the Google Toolbar at http://toolbar.google.com . You can also look up the scores without installing the toolbar: check http://www.seochat.com/seo-tools/future-pagerank/ .

The HITS method for ranking pages uses both inlinks (inbound links) and outlinks (outbound links) to create two popularity scores for each page. HITS defines hubs and authorities: a good hub is a page that links to many good authorities, and a good authority is a page that many good hubs link to. It’s easy to count the outlinks of any page, but it’s not obvious how to count its inlinks. Try typing link:http://www.1000knots.net into Google search and note the number of results. To find out how many links point to your page in the indexes of other search engines, go to http://www.marketleap.com/publinkpop/ .
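The two HITS scores can be sketched in a few lines as well. This is only an illustration under stated assumptions: the function name `hits`, the fixed iteration count, and the choice to normalize by the Euclidean length of the score vector are mine, and the toy graph is hypothetical. It shows the mutual reinforcement between hubs and authorities, not how Teoma or any other search engine actually implemented it.

```python
import math


def hits(links, iterations=50):
    """Rough HITS sketch.

    links maps each page to the list of pages it links to (its outlinks).
    Returns two dicts: hub scores and authority scores.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    def normalize(scores):
        # Assumption: scale so the squared scores sum to 1.
        length = math.sqrt(sum(v * v for v in scores.values()))
        if length == 0.0:
            return scores
        return {p: v / length for p, v in scores.items()}

    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        # Walking the outlink lists like this is what recovers the inlinks.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for t in targets:
                auth[t] += hub[source]
        auth = normalize(auth)

        # Hub score: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        hub = normalize(hub)

    return hub, auth


# Hypothetical toy web: H1 links to A and B, H2 links only to A.
toy_web = {"H1": ["A", "B"], "H2": ["A"]}
hub, auth = hits(toy_web)
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

Here A comes out as the best authority because both hubs link to it, and H1 comes out as the best hub because it links to both authorities.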


Picture of Jon Kleinberg