This is an experiment that aims to test SEO (Search Engine Optimisation) techniques in a controlled fashion. In a commercial setting it's often hard to determine which factors are having an effect because there are so many variables, ranging from internal content and marketing strategy through to external factors such as competitors optimising for the same terms.
The aim of this experiment is not to produce objectively The Best™© SEO; it is to examine specific techniques and attempt to determine their effectiveness.
For this experiment a brand will be created whose name is a nonsense word that, prior to the experiment, returns no results on a Google search. The brand will maintain a single-page site on a top-level domain matching its name, which will serve as a benchmark for the performance of a domain name on its own.
To compare techniques, three dummy e-commerce sites will be created that all "sell" items from this brand and are therefore competing for position in search results for the brand's name. Each site will be similar but will take a different approach to SEO:
- Control: Basic e-commerce site, not adopting the SEO techniques being tested.
- Optimised: Outwardly similar to the control site but adopting the SEO techniques being tested.
- Client-side rendered: Identical to the control but the content is rendered client-side.
As many factors influence SEO, the sites will be identical in most respects in order to control for them. For example, they will be hosted on identical infrastructure so that hosting is not a factor, but they will be on different IP addresses, because sharing an IP address is a signal that sites are in some way connected. Similarly, the URL structure for the sites will be the same unless a test is specifically being run on the effect of URL structure.
Content is obviously a very important factor in SEO, and it is hard to control for. With so many other factors being held constant, content cannot be left uncontrolled, but it needs a different approach.
Unlike the other factors we're controlling for, the same content can't be used for all sites: the duplication would be transparently obvious and would run into the duplicate-content issues that can seriously harm SEO. However, if a different set of content were used for each site, one set might happen to favour a particular site and skew the results.
What is needed is effectively random content but with the same linguistic value. To achieve this the content for each site will be generated by a Markov chain based on a single corpus taken from real e-commerce sites*. The content will be re-generated on each build of the sites to keep up the entropy and, as they will all be updated at the same time, the content refresh rate is held constant.
* Whilst looking at e-commerce sites to gather the content for this corpus I found that they tend to contain very little meaningful content so Markov generated text is actually a fair representation.
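To give a feel for how this kind of generation works, here is a minimal sketch of a word-level Markov chain. It is only an illustration under a few assumptions, not the generator used for the sites: the function names `build_chain` and `generate`, the chain order of 2, the tiny sample corpus and the per-site seeds are all made up for the example.

```python
import random
from collections import defaultdict


def build_chain(corpus: str, order: int = 2) -> dict:
    """Map each run of `order` consecutive words to the words seen after it."""
    words = corpus.split()
    if len(words) <= order:
        raise ValueError("corpus must contain more than `order` words")
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain: dict, length: int, rng: random.Random) -> str:
    """Walk the chain from a random starting state and return `length` words."""
    states = sorted(chain)  # sorted so a seeded rng gives repeatable output
    state = rng.choice(states)
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (a state seen only at the very end of the corpus):
            # jump to a fresh random state and carry on.
            state = rng.choice(states)
            output.extend(state)
            continue
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output[:length])


# Hypothetical sample corpus; the real one is drawn from e-commerce sites.
corpus = (
    "free delivery on all orders over fifty pounds "
    "free returns on all orders within thirty days "
    "shop the new range with free delivery on selected items"
)
chain = build_chain(corpus)

# One chain, three independent walks: each site gets different text
# drawn from the same underlying word statistics.
for site in ("control", "optimised", "client-side"):
    print(site, "->", generate(chain, 12, random.Random(site)))
```

Because every site's text comes from the same chain, no site gets a vocabulary the others lack, while the separate random walks keep the pages from being duplicates of each other.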
As the goal of SEO is to rank higher in search results, the primary method of measuring results for this experiment will, unsurprisingly, be based on Google search results for the brand's name.
The main outcomes of this experiment will be a judgement on the effectiveness of the techniques employed and an assessment of how best to track the effects of the changes made.
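As a rough idea of what that measurement could look like in practice, the sketch below takes an ordered list of result URLs for the brand search and reports where each site first appears. It assumes the result URLs have already been collected by some other means, and the function name `rank_positions` and the `.example` domains are hypothetical placeholders rather than the real sites.

```python
from urllib.parse import urlparse


def rank_positions(result_urls, domains):
    """Return the 1-based position of the first result for each domain.

    A domain that does not appear in the results maps to None.
    """
    positions = {domain: None for domain in domains}
    for position, url in enumerate(result_urls, start=1):
        host = (urlparse(url).hostname or "").lower()
        for domain in domains:
            # Match the bare domain or any subdomain of it (e.g. www.).
            matches = host == domain or host.endswith("." + domain)
            if matches and positions[domain] is None:
                positions[domain] = position
    return positions


# Hypothetical domains and a hypothetical results page, in ranked order.
sites = [
    "brand.example",
    "shop-control.example",
    "shop-optimised.example",
    "shop-csr.example",
]
results = [
    "https://www.shop-optimised.example/products/1",
    "https://brand.example/",
    "https://shop-control.example/",
]
for domain, position in rank_positions(results, sites).items():
    print(domain, "->", position)
```

Recording these positions at regular intervals would give a simple time series per site, which is one way of seeing whether a change made to the optimised site is followed by a change in rank.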
This is part of a series of posts that cover experiments in SEO.
Next post: Soft Launch