Google uses a proprietary mathematical algorithm called PageRank to help build its search results.

Google’s PageRank technology plays an important role in how online stores show up in search results. Understanding how this ranking system works will help ecommerce merchants improve their search engine optimization and potentially increase website traffic.

PageRank is a proprietary algorithm — a mathematical formula — that Google uses to calculate the importance of a particular web page based on incoming links. The algorithm assigns each web page a numeric value. That value is the URL’s PageRank.

The underlying assumption is that links are analogous to “votes” for a page’s importance. The more votes a page has, the more important it is. And votes from important URLs have more weight than votes from unimportant ones.

PageRank passes ranking power via links.

In this post, I will (i) discuss why PageRank is important and (ii) explain how to use a simplified PageRank calculation to make sound SEO decisions about internal linking. In all, this article should give you a foundational understanding of this ranking system. And in future posts, I will build on this PageRank information and apply it to SEO techniques.

Importance of PageRank

“Using PageRank, we can order search results so that more important and central Web pages are given preference. In experiments, this turns out to provide higher quality search results to users,” wrote Google’s founders Larry Page and Sergey Brin (along with Rajeev Motwani and Terry Winograd) in their January 29, 1998 paper, “The PageRank Citation Ranking: Bringing Order to the Web.”

Despite this paper and the complex calculations it included, Google’s exact recipe for ranking web pages is not public. But there is enough data available to make some educated guesses and assumptions about the PageRank algorithm and a search engine’s basic procedures.

Our assumption goes like this. Jack starts a search for the phrase “golf clubs.” Google first seeks relevant pages that include content matching Jack’s query. Once Google has located the relevant pages, it ranks those pages based on importance — that is, PageRank. The first page listed on the Google results page had the most PageRank out of all the pages relevant to Jack’s search query. The last page listed had the least.

Good content that matches a search query determines whether a given page will be included in Google’s results. But PageRank determines the order in which relevant pages are shown.

PageRank is important, then, because it will determine whether your site shows up first or last when a potential customer searches for your products.

Google’s search process is as follows:

  1. A user submits a search query.
  2. Google searches all of the pages it has indexed for relevant content.
  3. Google sorts the relevant pages based on PageRank scores.
  4. Google displays a results page, placing those pages with the most PageRank first.
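
The four steps above can be sketched in a few lines of Python. This is only an illustration of the assumed process, not Google's actual code: the page data, the `score` field, and the plain keyword match are all made up for the example.

```python
def search(query, pages):
    """Return the titles of relevant pages, highest score first."""
    terms = query.lower().split()
    # Step 2: keep only pages whose text contains every query term
    # (a crude stand-in for real relevance matching).
    relevant = [p for p in pages if all(t in p["text"].lower() for t in terms)]
    # Step 3: sort the relevant pages by their (hypothetical) PageRank score.
    ranked = sorted(relevant, key=lambda p: p["score"], reverse=True)
    # Step 4: the results page lists the highest-scoring page first.
    return [p["title"] for p in ranked]


# Hypothetical index of three pages with invented scores.
pages = [
    {"title": "Discount Golf Clubs", "text": "Buy golf clubs online", "score": 3.0},
    {"title": "Tennis Rackets", "text": "Rackets and balls", "score": 9.0},
    {"title": "Pro Golf Shop", "text": "Golf clubs, bags and shoes", "score": 7.5},
]

# Step 1: Jack submits his query.
results = search("golf clubs", pages)
print(results)
```

Note that the tennis page has the highest score but never appears, because it is not relevant to the query. Relevance decides inclusion, and the score decides order.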

Google does not disclose its exact PageRank formula. But it is a pretty safe bet that calculating PageRank is not easy math.

Google’s PageRank formula is complex.

The folks at SEOmoz have come up with an excellent guess about the PageRank algorithm in the paper, “The Professional’s Guide To PageRank Optimization.” The paper is for site owners who want to know how to estimate a page’s actual Google PageRank and don’t mind spending $39.99.

But when it comes to making good SEO choices (particularly internal linking choices), you don’t really need to know a URL’s actual PageRank. Rather, a simple model that estimates the effect of one SEO strategy or another is just as good. For example, you’ll be able to compare two different internal linking strategies, estimating how each one will affect a page’s ranking, without having to employ higher mathematics.

PageRank Example

Google assigns every new web page an initial PageRank score. For the sake of our example, that initial PageRank will be 1. If I create two new product pages, Blue and Red, those pages would each have an initial PageRank of 1.

Three hypothetical products — Blue, Red, Green — would each have an initial PageRank of 1.

A link from Red to Blue would effectively be a vote for Blue’s importance, and that vote would increase Blue’s PageRank to 2 — Blue’s initial PageRank plus the value of Red’s vote. Red’s vote is worth its PageRank and is called “ranking power.”

If we add a new page, Green, and Red links to it as well, Red’s vote is split evenly between its two links. Blue’s PageRank would fall from 2 to 1.5 while Green’s PageRank would rise from 1 to 1.5. Adding more links from Red to Blue or Green will not change things, since only the first link from Red to a given page distributes ranking power. A second link to the same page would not add any.
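
For readers who prefer code to arithmetic, here is a minimal sketch of the simplified model described above. It is not Google's formula. It assumes every page starts at 1, each linking page splits a vote worth its initial score evenly across the distinct pages it links to, and votes are counted in a single pass. The function name `simple_pagerank` is my own.

```python
def simple_pagerank(links, initial=1.0):
    """Estimate scores under the simplified model.

    links maps each page name to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Every page starts with the same initial score.
    scores = {page: initial for page in pages}

    for source, targets in links.items():
        distinct = set(targets)  # repeated links to one page count once
        for target in distinct:
            # The source's vote is split evenly among its distinct targets.
            scores[target] += initial / len(distinct)
    return scores


one_link = simple_pagerank({"Red": ["Blue"]})
two_links = simple_pagerank({"Red": ["Blue", "Green"]})
repeated = simple_pagerank({"Red": ["Blue", "Blue", "Green"]})

print(one_link["Blue"])    # 2.0
print(two_links["Blue"])   # 1.5
print(two_links["Green"])  # 1.5
print(repeated["Blue"])    # 1.5, the second link to Blue adds nothing
```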

With just this simple model, we can now start to test SEO tactics for internal linking. Plot out two or more scenarios, adding up each page’s PageRank to determine which tactic will work best for a given goal.

For example, imagine that your ecommerce site has five pages: a home page, a category page, and three product pages as illustrated in figure B, below, where the blue box represents the home page, the red box the category page, and the green boxes the three product pages.

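What is the best navigation strategy if your goal is to boost your category page’s rank? Interconnecting every page would give the category page a total PageRank of 2, as in figure A above: each of the other four pages splits its vote four ways, so the category page receives its initial 1 plus four votes of 0.25.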

Linking (green) product pages to the (red) category page only, as shown below, would result in a PageRank of 5 for the category page, making it the better choice.

Linking (green) product pages to the (red) category page only would result in a PageRank of 5 for the category page.
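
The two navigation strategies can be compared with the same simplified model. The sketch below is self-contained and rests on the same assumptions as before: initial score of 1, votes split evenly, a single pass. The page names are placeholders. One caveat: under this model, links from the three product pages alone give the category page 4. The figure of 5 is reproduced only if the home page also links solely to the category page, which I am assuming the original diagram showed.

```python
def simple_pagerank(links, initial=1.0):
    """Single-pass estimate: each page splits a vote of `initial` evenly."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    scores = {page: initial for page in pages}
    for source, targets in links.items():
        distinct = set(targets)  # repeated links count once
        for target in distinct:
            scores[target] += initial / len(distinct)
    return scores


products = ["product1", "product2", "product3"]
site = ["home", "category"] + products

# Strategy 1: every page links to every other page.
interconnected = {p: [q for q in site if q != p] for p in site}

# Strategy 2: only the product pages link, and only to the category page.
products_only = {"home": [], "category": []}
products_only.update({p: ["category"] for p in products})

# Strategy 2 plus the assumption that the home page also links
# only to the category page.
with_home = dict(products_only, home=["category"])

score_interconnected = simple_pagerank(interconnected)["category"]
score_products_only = simple_pagerank(products_only)["category"]
score_with_home = simple_pagerank(with_home)["category"]

print(score_interconnected)  # 2.0
print(score_products_only)   # 4.0
print(score_with_home)       # 5.0
```

Either way, funneling links to the category page beats interconnecting everything when the category page is the one you want to boost.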

PageRank Resources

When users seek information from Google, the search engine relies on a proprietary algorithm called PageRank™ to determine the order of the sites that show up in search results. Now, two researchers say a similar algorithm can be used to determine which species are critical to the preservation of ecosystems, allowing scientists to focus conservation efforts on species that will most benefit the entire system.

The research, by Stefano Allesina with the National Center for Ecological Analysis and Synthesis at the University of California, Santa Barbara, and Mercedes Pascual of the University of Michigan at Ann Arbor, was published today in the journal PLoS Computational Biology.

Google's PageRank algorithm ranks Web pages in importance based on the number of other Web sites that link to them. Allesina and Pascual have taken this approach into the wild and determined that PageRank could be adapted to apply to the study of food webs—the complex networks describing who eats whom in an ecosystem. Basically, according to Allesina and Pascual, the species that the greatest number of other species rely on for food are the ones that are most essential to the health of an ecosystem. Or as the authors put it, "a species is important if important species rely on it for their survival."

This approach contrasts with other ways of looking at ecosystems, which use a "hub" approach to rank species based on the number of other species that are directly linked to them through the food web. According to the authors, this technique, which emphasizes the number of connections, does not take into account the position of a species in the food web and the cascading effects its removal would create. They say the extinction of one species could cause the elimination of another, which in turn would cause the loss of a third species. The "PageRank" way of looking at ecosystems makes the species that goes extinct first the most important because it would result in further extinctions down the line.

Coming up with a mathematical equation to determine the top-ranking, or most important, species in an ecosystem wasn't easy. Allesina and Pascual actually reverse engineered their algorithm and used it to determine which species' extinction would create the most ecological harm. As the authors wrote: "We study how we can make biodiversity collapse in the most efficient way in order to investigate which species cause the most damage if removed."

So why is this advanced mathematics even necessary when looking at nature? Writing in the paper's abstract, the authors warned that "because of their mutual dependence, the loss of a single species can cascade in multiple co-extinctions." But food webs are so complex, it would take forever to go through all possible extinction scenarios without an algorithm like this.
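
To make the idea of cascading co-extinctions concrete, here is a toy simulation. It is not Allesina and Pascual's algorithm. It is the brute-force approach that works only on tiny food webs. The species, their diets, and the rule that a consumer dies once all of its food sources are gone are simplifying assumptions for illustration.

```python
def extinction_cascade(diet, removed):
    """Return the set of species lost when `removed` goes extinct.

    diet maps each species to the set of species it eats;
    an empty set marks a producer that needs no prey.
    """
    lost = {removed}
    changed = True
    while changed:
        changed = False
        for species, prey in diet.items():
            if species in lost or not prey:
                continue  # already gone, or a producer
            if prey <= lost:  # every food source has disappeared
                lost.add(species)
                changed = True
    return lost


# Hypothetical six-species food web.
diet = {
    "grass": set(),
    "shrub": set(),
    "rabbit": {"grass"},
    "deer": {"grass", "shrub"},
    "fox": {"rabbit"},
    "hawk": {"rabbit", "fox"},
}

# Total species lost (including the one removed) for each possible removal.
damage = {species: len(extinction_cascade(diet, species)) for species in diet}
print(damage)
# {'grass': 4, 'shrub': 1, 'rabbit': 3, 'deer': 1, 'fox': 1, 'hawk': 1}
```

In this toy web, grass and rabbit are each eaten by two species, so a simple count of direct links ranks them equally. The cascade shows that losing grass does more damage, because it takes rabbit, fox, and hawk with it.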

What comes next? The authors say they hope their method could be applied beyond ecology to solve problems in other network-related biological fields, such as protein interaction and gene regulation.
