Ranking and PageRank

  • Ranking is the process a search engine uses to decide the order in which search results are shown after a user enters a query
  • When a user enters a search term, the search engine first looks for matching pages in its index
  • Ranking is used to determine which of these matching pages are most relevant and should be placed at the top of the results page
  • Ranking algorithms are used to score each page to determine its position
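
The match-then-rank process above can be sketched in a few lines of Python. This is only a toy illustration: the `index` and `scores` dictionaries, the page names, and the function `rank_results` are made up for the example, and real search engines score pages in far more elaborate ways.

```python
# Hypothetical toy data: which pages contain each term, and a score per page.
index = {
    "python": ["page_a", "page_b", "page_c"],
    "snake": ["page_b", "page_d"],
}
scores = {"page_a": 0.4, "page_b": 0.9, "page_c": 0.7, "page_d": 0.2}


def rank_results(term):
    """Return the pages matching `term`, highest score first."""
    matches = index.get(term, [])  # step 1: find matching pages in the index
    # step 2: order the matches by score so the best page comes first
    return sorted(matches, key=lambda page: scores[page], reverse=True)


print(rank_results("python"))  # ['page_b', 'page_c', 'page_a']
```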

PageRank Algorithm

  • PageRank is an algorithm developed by Google’s founders, Larry Page and Sergey Brin; the name is a Google trademark
  • It is used to rank the web pages returned by a search engine
  • Other ranking algorithms exist that do a similar job
  • It works by checking the number and quality of links to a page in order to determine roughly how important that page is
  • The assumption is that websites of greater importance are more likely to be linked to from other websites
  • The PageRank algorithm was created to address the difficulty of determining the importance of a web page with the growing amount of information available on the internet
  • The algorithm gives more precise and relevant search results by taking the link structure of the web into account, rather than just matching keywords

Key elements of the PageRank algorithm

  • There are 4 key elements to the PageRank algorithm:
    • Link analysis
    • Link weight distribution
    • Iterative calculation
    • Damping factor
  • The PageRank algorithm analyses the structure of links between pages on the web
  • Web pages are given importance by the algorithm, which considers the quantity and quality of inbound links from other pages
  • Each link acts as a “vote” for the target page, with the voting weight determined by the importance of the linking page
  • Pages with more high-quality links pointing towards them are deemed more valuable and are given a higher weight
  • Pages with a higher weight score more highly and rank higher
  • PageRank calculates the importance of a page from the “votes” it has received, counting both how many there are and how much each one is worth
  • The algorithm distributes the importance of a page to the pages it links to, giving each outgoing link an equal share
  • As a result, high-quality pages gain more importance and, in turn, have a larger effect on the ranking of the pages they link to
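
A minimal sketch of one round of vote sharing, assuming each page splits its current importance equally between its outgoing links. The function name `share_votes` and the three pages X, Y and Z are made up for illustration, and the damping factor is deliberately left out here.

```python
def share_votes(ranks, links):
    """One round of vote sharing: each page splits its rank equally
    between the pages it links to (no damping factor applied here)."""
    received = {page: 0.0 for page in ranks}
    for page, targets in links.items():
        if not targets:
            continue  # a page with no outgoing links casts no votes
        share = ranks[page] / len(targets)
        for target in targets:
            received[target] += share
    return received


# Hypothetical graph: X links to Y and Z, Y links to Z, Z links to X.
links = {"X": ["Y", "Z"], "Y": ["Z"], "Z": ["X"]}
ranks = {"X": 1.0, "Y": 1.0, "Z": 1.0}
print(share_votes(ranks, links))  # {'X': 1.0, 'Y': 0.5, 'Z': 1.5}
```

Z ends up with the most votes because it receives half of X's vote and all of Y's.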

Iterative calculation

  • The PageRank algorithm is calculated iteratively. At the beginning, every webpage is given the same starting value
  • In each subsequent iteration, the importance of each page is recalculated from the weighted votes of its inbound links
  • The process continues until the rankings become stable, i.e. the values barely change from one iteration to the next
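
The repeat-until-stable idea can be sketched as below. This is an illustrative sketch rather than a definitive implementation: the function name `pagerank`, the tolerance `tol`, the iteration cap `max_iter` and the four-page site are all assumptions, and the update uses the commonly quoted form with a damping factor `d` (covered in the next section).

```python
def pagerank(links, d=0.85, tol=1e-6, max_iter=200):
    """Repeat the rank update until no page changes by more than `tol`.

    `links` maps each page to the list of pages it links to.
    `d` is the damping factor; `tol` and `max_iter` are assumed values.
    Returns (ranks, number_of_iterations_used).
    """
    ranks = {page: 1.0 for page in links}  # every page starts equal
    for iteration in range(1, max_iter + 1):
        new_ranks = {}
        for page in links:
            # sum the vote shares from every page that links to `page`
            incoming = sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * incoming
        change = max(abs(new_ranks[p] - ranks[p]) for p in links)
        ranks = new_ranks
        if change < tol:  # rankings are stable, so stop
            break
    return ranks, iteration


# Hypothetical four-page site.
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "contact": ["home"],
}
ranks, iterations = pagerank(links)
print(iterations, max(ranks, key=ranks.get))
```

The loop stops as soon as the largest change between two rounds falls below `tol`, which is one simple way of deciding that the rankings have become stable.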

Damping factor

  • The damping factor is a value between 0 and 1 (usually 0.85)
  • It represents the probability that a user will follow a link on the current page; the remaining probability (1 minus the damping factor, usually 0.15) is the chance that they instead jump to a random new page
  • It prevents the calculation from getting stuck in loops of pages that only link to each other, and makes the model of user behaviour more realistic
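
One way to see why damping matters is to run the same update with and without it on a graph that contains a loop. In the sketch below, the function `iterate` and the three-page graph are assumptions made for illustration. With `d = 1.0` (no damping) the ranks of A and B keep swapping and never settle, while with `d = 0.85` they converge.

```python
def iterate(links, d, rounds):
    """Apply `rounds` PageRank updates and return the ranks after each round."""
    ranks = {page: 1.0 for page in links}
    history = [ranks]
    for _ in range(rounds):
        ranks = {
            page: (1 - d) + d * sum(
                ranks[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            for page in links
        }
        history.append(ranks)
    return history


# Hypothetical graph with a loop: A and B link only to each other, C links to A.
links = {"A": ["B"], "B": ["A"], "C": ["A"]}

no_damping = iterate(links, d=1.0, rounds=200)
with_damping = iterate(links, d=0.85, rounds=200)

# Without damping, A's rank flips between two values forever.
print(no_damping[-2]["A"], no_damping[-1]["A"])  # 2.0 1.0
# With damping, the last two rounds agree.
print(abs(with_damping[-1]["A"] - with_damping[-2]["A"]) < 1e-9)  # True
```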

Factors influencing ranking

  • Although the initial PageRank algorithm mainly concentrated on link analysis, present-day search engines consider many factors to improve search results rankings. These factors may include:
    • Relevance
    • User engagement
    • Authority and trust
    • Content freshness
    • Mobile-friendliness

Relevance

  • The content of a web page is a crucial factor in determining its ranking. This is influenced by the keywords used, the quality of the content, and how relevant it is to the search query

User engagement

  • The way users interact with a website can be measured through metrics like click-through rates, time spent on a page (dwell time), and bounce rates. These metrics can reveal the level of user engagement
  • Pages that receive greater engagement from users may be deemed more valuable

Authority & trust

  • The reputation and authority of a webpage or website play a crucial role
  • Several factors can enhance a website’s ranking, including the age of the domain, quality backlinks from reputable sources (e.g. a government website or the BBC), and trustworthy content

Content freshness

  • Search engines value fresh and up-to-date content
  • Search engines may give priority to web pages that are frequently updated or contain up-to-date information

Mobile-friendliness

  • As mobile devices became more prominent, search engines started to factor in the mobile compatibility of web pages
  • Google primarily uses the mobile version of a site’s content to rank pages from that site
  • Having a responsive design and optimising the user experience on mobile devices can have a positive impact on a website’s rankings

Limitations & evolving nature

  • Although the PageRank algorithm is important in search engine rankings, it is not the only factor that determines them
  • Search engines combine many different algorithms and factors to provide varied, relevant, and high-quality search results
  • Over time, the details of the PageRank algorithm have undergone changes. Search engines regularly enhance their ranking methods to cater to new challenges and meet user expectations

Usage

  • You do not need to know the formula itself, just the abstract workings of it
  • The mathematical formula for PageRank is defined as:
    $$PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}$$
  • $PR(A)$ is the PageRank of page $A$
  • $T_1, \dots, T_n$ are the pages that link to page $A$
  • $C(T_i)$ is the total count of outbound links from web page $T_i$
  • Each web page has a notional vote (its own PageRank), shared equally between all the web pages it links to
  • $\frac{PR(T_i)}{C(T_i)}$ is the share of the vote page $A$ gets from a specific back-linking page $T_i$
  • All incoming vote fractions are summed and then multiplied by the damping factor, which stops links alone from having too much influence
  • $d$ is the damping factor that balances the weight of links against the probability of a random jump
  • It is typically set to 0.85, representing an 85% chance a user follows a link (roughly six clicks before a “teleport”)
  • $(1 - d)$ is the random-jump term: it represents a user teleporting to page $A$ rather than following a link to get there (some versions of the formula divide this term by the total number of pages $N$, so that all ranks sum to 1)
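
The calculation for a single page can be sketched in Python, assuming the commonly quoted form PR(A) = (1 - d) + d × (sum of PR(T) / C(T) over every page T that links to A). The function name `page_rank_of` and the dictionaries below are made up for illustration.

```python
def page_rank_of(page, ranks, links, d=0.85):
    """PR(page) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to `page`."""
    vote_total = 0.0
    for source, targets in links.items():
        if page in targets:
            # PR(T) / C(T): the source's rank shared between its outbound links
            vote_total += ranks[source] / len(targets)
    return (1 - d) + d * vote_total


# A links to B and C, B links to C, C links to A; every page currently has rank 1.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = {"A": 1.0, "B": 1.0, "C": 1.0}

# C receives half of A's vote and all of B's: 0.15 + 0.85 * 1.5
print(round(page_rank_of("C", ranks, links), 3))  # 1.425
```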

Example

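  • Setup: 3 pages where A → B, C; B → C; C → A (d = 0.85).
  • Process:
    • Initialise: All pages start with PR = 1
    • Iteration 1: PR(A) = 0.15 + 0.85 × 1 = 1.00, PR(B) = 0.15 + 0.85 × (1 / 2) = 0.575, PR(C) = 0.15 + 0.85 × (1 / 2 + 1) = 1.425
    • Iteration 2: PR(A) ≈ 1.36, PR(B) ≈ 0.58, PR(C) ≈ 1.06
  • Convergence: After further iterations the ranks stabilise at C ≈ 1.19, A ≈ 1.16, B ≈ 0.64
  • Final Result: C is the most important (it is linked to by both A and B), closely followed by A, then B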
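
The example can be checked with a short Python sketch. It assumes d = 0.85, a starting rank of 1 for every page, and that all pages are updated at the same time from the previous round's values; the names `D`, `LINKS`, `step` and `trace` are made up for illustration.

```python
D = 0.85  # damping factor (assumed, the typical value)
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}


def step(ranks):
    """One simultaneous update of every page's rank."""
    return {
        page: (1 - D) + D * sum(
            ranks[src] / len(targets)
            for src, targets in LINKS.items()
            if page in targets
        )
        for page in LINKS
    }


ranks = {page: 1.0 for page in LINKS}  # initialise every page to 1
trace = {}
for i in range(1, 101):
    ranks = step(ranks)
    trace[i] = {page: round(value, 3) for page, value in ranks.items()}

print(trace[1])    # {'A': 1.0, 'B': 0.575, 'C': 1.425}
print(trace[2])    # {'A': 1.361, 'B': 0.575, 'C': 1.064}
print(trace[100])  # {'A': 1.163, 'B': 0.644, 'C': 1.192}
```

The values after two iterations are still moving. Once they settle, C finishes slightly ahead of A because it receives votes from both A and B.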