Hyperlink Behavior and the Absence of Reason

Chaos theory champions the discovery of underlying order and themes within complex and unpredictable data sets.

Phenomena such as weather, the emergence and proliferation of biological populations, and even the behavior of water inside a moving container all illustrate seemingly 'random' environments that nonetheless exhibit thematic properties.

The expansion and convolution of the World Wide Web also show deep-rooted patterns of 'ordered disorder,' especially in the way pages cross-reference one another.

The link graph of the Internet has characteristics very similar to those of the Lorenz attractor. That is, it continually displays similar start/stop conditions and shows elements of periodic behavior without ever actually repeating itself.
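For readers unfamiliar with the attractor, its two signature properties (bounded motion that never exactly repeats, and extreme sensitivity to starting conditions) can be seen in a few lines of Python. This is a minimal sketch assuming the classic parameters (sigma = 10, rho = 28, beta = 8/3) and a fixed-step RK4 integrator chosen for simplicity; it illustrates the Lorenz system itself, not any model of the link graph.

```python
import math

# Classic Lorenz parameters (sigma, rho, beta)
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0


def lorenz(state):
    """Time derivative (dx/dt, dy/dt, dz/dt) of the Lorenz system."""
    x, y, z = state
    return (SIGMA * (y - x), x * (RHO - z) - y, x * y - BETA * z)


def rk4_step(state, dt):
    """Advance one step with the classical fourth-order Runge-Kutta method."""
    def shift(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz(state)
    k2 = lorenz(shift(state, k1, dt / 2))
    k3 = lorenz(shift(state, k2, dt / 2))
    k4 = lorenz(shift(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def trajectory(start, dt=0.01, steps=5000):
    """List of states visited from `start` over steps * dt time units."""
    states = [start]
    for _ in range(steps):
        states.append(rk4_step(states[-1], dt))
    return states


a = trajectory((1.0, 1.0, 1.0))
b = trajectory((1.0 + 1e-8, 1.0, 1.0))  # nudged by one part in 10^8
gap = [math.dist(p, q) for p, q in zip(a, b)]

print(f"starting gap {gap[0]:.0e}, largest gap {max(gap):.1f}")
print("stays bounded:", all(abs(c) < 100 for s in a for c in s))
```

Both runs stay confined to the same butterfly-shaped region, yet the tiny initial difference grows until the two paths are unrelated: similar conditions, never exactly repeated.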

Social media has only compounded the issue, increasing interconnectivity and involvement with the web through vBulletin forum posts, community profiles, blog comments, user signatures and profiles, bookmarks, likes, shares... ad infinitum.

Search engines attempt to impose a hierarchical map on the internet, employing complex algorithms designed to find reason within complex and dynamic web environments. Collectively, these algorithms attempt to assign finite meaning to otherwise infinite patterns, failing miserably in the process.

Earlier this year, Google's Panda/Farmer Update (along with Panda 2.2) marked the departure of two elements that had historically succeeded at gaming search engine algorithms:

  • Spinners and republishers of unique content
  • Linking structures: link pyramids, wheels, farms, etc.

[Figure: a simple link star]

If Panda really did neutralize the effects of link structures, it would have had to assign finite properties to these systems, which, as mentioned above, is quite an impossible task.

So then, why did so many properties that drew their ranking power from raw link structures suddenly drop in the rankings?

They didn't.

Quite the opposite: new elements were introduced into the algorithm, tasked not with defining what a deceitful link structure was, but rather with defining what one wasn't.

These new signals gave a substantial boost to documents characterized by high brand equity/trust and domain-related legitimacy (historical data, physical server locations, rackspace profiles, age, etc.), and caused a corresponding drop for documents that lacked them.

It's not so much about what Panda did, but what it didn't do.

Using an aggregate keyword index from popular AdWords categories for Q1-Q4 2010 (easily obtained from any semi-shady web service selling 'PPC Top Click Bid Lists'), together with a cron job running a simple PHP script that requested SERP data for each query in our index (thanks, Pradeep), we were able to draw some meaningful insights into the collective behavior of this data.
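The original collector was a PHP script triggered by cron; its code isn't shown, so the Python sketch below only mirrors its general shape: walk the keyword index, fetch one results page per query, and space the requests out. `collect_serps`, the pluggable `fetch_serp` callable, and the 5-second default interval are all hypothetical. The clock and sleep functions are injectable so the pacing can be checked without waiting.

```python
import time
from typing import Callable, Dict, Iterable, List, Optional


def collect_serps(
    queries: Iterable[str],
    fetch_serp: Callable[[str], List[str]],  # hypothetical: query -> ranked result URLs
    min_interval: float = 5.0,               # assumed politeness delay, in seconds
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Fetch one results page per query, never faster than min_interval apart."""
    results: Dict[str, List[str]] = {}
    last_request: Optional[float] = None
    for query in queries:
        if last_request is not None:
            # Wait out whatever remains of the interval since the previous request
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        results[query] = fetch_serp(query)
    return results
```

A cron entry would then simply invoke a script that loads the keyword index and calls `collect_serps` with a real fetcher; keeping the fetcher separate makes it easy to swap data sources without touching the pacing logic.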

A sample of just over 25,000 search engine results pages (the top ~10% of query spaces for 9 industry categories, including Electronics, Education, and Information Technology) exhibits a strong correlation with signals that support brand recognition and equity, scaled 1/1000th. (For this design, the reach and frequency of exact-match appearances across the web were collected over 2 months using a scripted Methabot running at a 'polite' speed.)
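The post doesn't say which correlation statistic was used. One reasonable choice for ranking data is Spearman's rank correlation, which depends only on order and so suits SERP positions. The sketch below assumes that choice; it is not the study's method, and the demo numbers are invented purely for illustration.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values (a tie group)
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for the tie group
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Invented data for a single SERP: positions 1-5 and a made-up mention count
positions = [1, 2, 3, 4, 5]
mentions = [900, 450, 500, 120, 80]
print(round(spearman(positions, mentions), 3))  # → -0.9
```

A negative rho here means more mentions go with better (numerically lower) positions; averaging rho per signal across many SERPs is one way to arrive at a single correlation figure per signal.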

The same sample shows almost no correlation (again scaled 1/1000th) with linking behaviors such as raw link counts, the number and nature of linking root domains, the linearity of backlink profiles (ranging from naturally logarithmic to exponentially viral/pushed), or the frequency of repeated anchor text typical of amateur link schemes.

Any one of these four characteristics could indicate (sometimes with reasonable certainty) whether a backlink profile is legitimate or illegitimate, and yet they appear to be almost entirely ignored by the post-Panda Google algorithm.
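The four characteristics listed above can be turned into simple numeric features per backlink profile. This is a minimal sketch under several assumptions: a hypothetical `Backlink` record (source URL, anchor text, day first seen), a crude last-two-labels guess at the root domain, and a late-versus-early link ratio as a rough stand-in for the growth curve. It counts root domains but does not judge their "nature", and none of it reflects how the original study measured these signals.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import urlparse


@dataclass
class Backlink:
    source_url: str
    anchor_text: str
    first_seen_day: int  # hypothetical: day the link was first observed


def root_domain(url: str) -> str:
    """Crude root-domain guess: the last two hostname labels.
    Simplification: wrong for suffixes like .co.uk (a public-suffix list fixes that)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def profile_features(links: List[Backlink]) -> Dict[str, float]:
    """Compute four rough backlink-profile features; requires at least one link."""
    if not links:
        raise ValueError("empty backlink profile")
    days = sorted(link.first_seen_day for link in links)
    midpoint = (days[0] + days[-1]) / 2
    early = sum(1 for d in days if d <= midpoint)  # always >= 1
    late = len(days) - early
    anchors = Counter(link.anchor_text.strip().lower() for link in links)
    return {
        "raw_links": len(links),
        "root_domains": len({root_domain(link.source_url) for link in links}),
        # > 1: acquisition accelerating (viral/pushed); < 1: tapering (log-like)
        "late_to_early_ratio": late / early,
        # Share of links using the single most common anchor text
        "top_anchor_share": anchors.most_common(1)[0][1] / len(links),
    }
```

A profile with a high `top_anchor_share` and a `late_to_early_ratio` well above 1 is the classic signature of a pushed link scheme, which is what makes its apparent irrelevance post-Panda notable.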

Mike Martinez recently stated that both PageRank and its predecessor BackRub are (and have always been) nothing more than fables lacking any real-world utility.

Given the web's inherent unpredictability and the conclusions drawn from the post-Panda environment, I'm somewhat inclined to agree with him.