Spidering a JavaScript-rendered website

By Orde Saunders

I wanted to diagnose some SEO problems with a site that was entirely client-side rendered in JavaScript. It wasn't behaving the way I'd expect from previous experience, so I wanted to run an experiment in a controlled environment to understand what was happening.

For this I wanted a fully client-side rendered site. Rather than "simply" setting up a development environment containing a multi-faceted toolchain, I needed something to get up and running fast. I asked on Twitter and got several good suggestions. I went with Vuepress as it was possible to get a site up and running by only writing some markdown.

Vuepress is actually a bit too good because it generates an isomorphic static site that will work with JavaScript disabled. This is very commendable but not what I needed for testing client-side rendered pages, so I stripped out the server-side markup and set it up with a webserver that responds to all requests with the index.html page. Credit again to Vuepress here because it handled this fine and still rendered the correct markup client side for each page.
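
To make that concrete, here's a minimal sketch of the kind of catch-all webserver I mean. Express is used purely for illustration; the actual server could be anything that falls back to index.html for unknown paths.

```typescript
// Minimal sketch, not the exact setup: serve built assets as-is, and answer
// every other request with index.html so all rendering happens client side.
import express from "express";
import path from "path";

const app = express();
const dist = path.join(__dirname, "dist"); // assumed location of the built site

// Static assets (JS bundles, CSS, images) are served directly.
app.use(express.static(dist, { index: false }));

// Catch-all: every page URL gets the same index.html shell.
app.get("*", (_req, res) => {
  res.sendFile(path.join(dist, "index.html"));
});

app.listen(8080, () => console.log("Catch-all server listening on :8080"));
```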

With the tech stack sorted, I now needed a site I could spider. This experiment only required a homepage with a number of links and a few deeper-level pages to check nested link discovery.
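
The exact pages don't matter; as an illustration, a structure like that could be generated with a small script such as the one below. The names and layout here are made up, and the real test site was just a handful of hand-written markdown files in the directory Vuepress builds from.

```typescript
// Illustrative sketch only: generate a homepage linking to three sections,
// each with one deeper page, in the docs/ layout Vuepress reads by default.
import { mkdirSync, writeFileSync } from "fs";
import { join } from "path";

const docs = "docs";
const sections = ["alpha", "bravo", "charlie"];

mkdirSync(docs, { recursive: true });

// Homepage: one link per section so the spider has something to discover.
const homeLinks = sections.map((s) => `- [${s}](/${s}/)`).join("\n");
writeFileSync(join(docs, "README.md"), `# Test home\n\n${homeLinks}\n`);

// Each section page links one level deeper to check nested link discovery.
for (const s of sections) {
  mkdirSync(join(docs, s), { recursive: true });
  writeFileSync(join(docs, s, "README.md"), `# ${s}\n\n[Deeper page](/${s}/deeper.html)\n`);
  writeFileSync(join(docs, s, "deeper.md"), `# ${s}: deeper\n\nEnd of this branch.\n`);
}
```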

[Image: Spidering crawl depth]

Screaming Frog

For the spidering test I used Screaming Frog. I've got plenty of experience using this to diagnose issues with HTML sites, so I knew the issues I was seeing were specific to the site being fully client-side rendered.

As a spider, Screaming Frog isn't trying to accurately replicate Googlebot, but for this experiment I wanted to run it against a site where I was fully in control of the content and could see in the server logs exactly what was being requested. This was important because I wanted a reference point for link discovery and an insight into how the spider renders content.
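
Pulling that reference point out of the logs is just a matter of counting the request lines. Something like the following sketch does the job, where the log path and the common log format are assumptions for illustration.

```typescript
// Rough sketch: tally the paths a spider requested, read from a
// common-log-format access log.
import { readFileSync } from "fs";

const log = readFileSync("/var/log/nginx/access.log", "utf8");
const counts = new Map<string, number>();

for (const line of log.split("\n")) {
  // Common log format request section: "GET /some/path HTTP/1.1"
  const match = line.match(/"(?:GET|HEAD) (\S+) HTTP/);
  if (match) {
    counts.set(match[1], (counts.get(match[1]) ?? 0) + 1);
  }
}

// Most-requested paths first: a reference point for what was actually fetched.
for (const [requestPath, n] of [...counts.entries()].sort((a, b) => b[1] - a[1])) {
  console.log(`${n}\t${requestPath}`);
}
```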

[Image: Screaming Frog's spider report]

The browser Screaming Frog uses is Chrome 60, which is fine for most purposes. However, it's not exactly what Googlebot uses.

Chrome 41

As of May 2019, Googlebot uses a current, evergreen version of Chrome, so this step is no longer required.

For a final rendering check I needed a copy of Chrome 41, as that's what the documentation said Googlebot used for rendering. This is what's used to create the rendered previews in Webmaster Tools, but if you really need to get down into what's being rendered then you need access to dev tools, and that means having a copy of the browser you can interact with.

Unfortunately - and unhelpfully - Google doesn't make old versions of Chrome available. However, I did manage to find an archived copy of the 32-bit Linux version so, with a 32-bit VM and a minimal Debian install, I was able to run it.

[Image: This is the Chrome version you are looking for]

Or rather, I was able to run it with the --no-sandbox command-line flag.

[Image: Chrome's --no-sandbox warning]

(No idea why, and it had taken far too long exploring blind alleys to even get to this point, so - as long as it ran - I was well past caring.)

Tying it together

Armed with what I've learned from this - combined with information from Google Search Console, Google Webmaster Tools, search results and analytics - I've got enough to come up with a strategy to address the specific issue on the site in question.

If you'd like help with the technical SEO of your website then please get in touch.


Comments, suggestions, corrections? Contact me on this website or Mastodon.