
Venture Stream Blog


SEO Spider Crawling Tips for Ecommerce Sites

January 30, 2019


Last Thursday, I was lucky enough to attend the world’s first ever SEO Spider training, hosted by none other than Charlie Williams – SEO strategist for Screaming Frog. I’ll get on to what I thought about the session at the end of this article, but first I’d like to share some of my personal light-bulb moments from the training. Hopefully, this article will help you troubleshoot some of the things I was struggling with.

This is also the first time I’m writing for the Venture Stream blog since the big merge with Flow at the back end of last year. So to all my old readers (hi mam), I’m looking forward to sharing my insights on the SEO front with you all.

1. Most of the Things you’re Struggling with can be Solved with a Ticky Box

I’ve used the SEO Spider tool for a good five or so years now, but there are still times when the tool doesn’t behave how I want it to. That said, a lot of these problems can be solved by tweaking just a few settings; it’s just a matter of knowing where they are.

2. Eliminating all URL Parameters in your Crawl

This is an issue that those of you who have used the tool may be familiar with. Performing a crawl without removing your product parameters – shoe size, colour, brand etc – can result in a crawl with potentially millions of URLs. There’s a great article by Screaming Frog on how to crawl large sites, which advises you to use the exclude function to strip out any URL parameter you don’t want to crawl (Configuration > Exclude). What I didn’t realise was that if you just want to exclude all parameters, there’s a handy little ‘remove all’ checkbox that lets you do just that (Configuration > URL Rewriting).
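To see why stripping parameters shrinks a crawl so dramatically, here is a minimal Python sketch of the idea. It is not how the SEO Spider works internally; the function name `strip_parameters` and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_parameters(url: str) -> str:
    """Return the URL without its query string (the fragment is dropped too)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical faceted URLs that all point at the same category page.
crawled = [
    "https://example.com/shoes?size=9&colour=black",
    "https://example.com/shoes?brand=acme",
    "https://example.com/shoes",
]

# A set collapses the duplicates left once the parameters are gone.
unique_urls = sorted({strip_parameters(u) for u in crawled})
print(unique_urls)
```

Three parameterised variations collapse into a single URL. Multiply that by every size, colour and brand combination on a real store and you can see where the millions of URLs come from.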


3. Screaming Frog’s SEO Spider is an Amazing Scraping Tool!

Custom extraction is something I’ve been playing around with for a while when crawling sites (Configuration > Custom > Extraction). For those who are unfamiliar, custom extraction basically allows you to scrape additional data from a URL when performing your crawls. Data such as:

  • The author of a blog post
  • How many comments a blog has received
  • Social media share counts
  • Email addresses mentioned on the page
  • Structured data – such as the average star rating of all of your products

There are hundreds of possibilities when it comes to custom extraction – I may even be inspired to write a blog on this alone (we’ll see how well this one goes down first). This data is so valuable for internal SEO audits and for competitor analysis. It gives you extra insight into your pages and lets you draw more accurate conclusions on why a page or a set of pages is outperforming others.
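Regex is one of the ways to define a custom extraction, so here is a rough Python sketch of what a regex-based extraction does with a page’s source. The HTML snippet, the class name and both patterns are assumptions invented for this example, and the email pattern is deliberately simple rather than a full validator.

```python
import re

# Hypothetical page source; the markup and class name are invented for illustration.
html = """
<article>
  <span class="author">Jane Doe</span>
  <p>Questions? Email hello@example.com or press@example.com.</p>
</article>
"""

# A deliberately simple email pattern: good enough for an audit, not an RFC validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Captures the text inside the (assumed) author element.
AUTHOR_RE = re.compile(r'<span class="author">(.*?)</span>')

emails = EMAIL_RE.findall(html)
author_match = AUTHOR_RE.search(html)
author = author_match.group(1) if author_match else None

print(author, emails)
```

The same patterns would need adapting to each site’s real markup, which is exactly where the regex practice comes in.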

One of my favourite things to do with custom extraction is to pull out the number of products in a set of category pages. This data can help you determine which pages to remove or consolidate on your ecommerce site. For example, if you’ve got several category pages with fewer than five products in each, and you’re competing with sites which contain hundreds of products, it’s highly unlikely you’re going to outrank them. In these cases, it could be best to canonicalise the page into the parent category, noindex it or, in some cases, just delete and redirect it if it receives little to no traffic.
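Once the product counts are extracted, flagging the thin categories is a quick job. The sketch below assumes the extraction returns text like “Showing 3 products”; that wording, the URLs and the threshold of five are all assumptions for illustration, so adjust them to whatever your own category pages output.

```python
import re

# Hypothetical extraction output: URL -> text scraped from each category page.
extracted = {
    "/shoes": "Showing 120 products",
    "/shoes/clogs": "Showing 3 products",
    "/shoes/sandals": "Showing 48 products",
    "/shoes/slippers": "",  # nothing extracted, e.g. the count is rendered by JavaScript
}

THIN_THRESHOLD = 5  # assumed cut-off for a "thin" category


def product_count(text):
    """Pull the first 'N product(s)' number out of the text, or None if absent."""
    match = re.search(r"(\d+)\s+products?", text)
    return int(match.group(1)) if match else None


thin_categories = []
missing_data = []
for url, text in extracted.items():
    count = product_count(text)
    if count is None:
        missing_data.append(url)      # needs a re-crawl, not a decision
    elif count < THIN_THRESHOLD:
        thin_categories.append(url)   # candidate to consolidate, noindex or redirect

print(thin_categories, missing_data)
```

Keeping the pages with missing data in their own list matters: an empty extraction means “we don’t know yet”, not “zero products”.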

Sometimes, however, I’d crawl a site and the custom extraction data was missing. I now realise this was down to JavaScript. By default, the SEO Spider crawls in text-only mode, i.e. it reads the raw HTML from a site. However, some site functionality, such as the script which runs to tell us how many products there are on a page, is handled using JavaScript, so that data never appears in the raw HTML.


Putting the spider in JavaScript mode (Configuration > Spider > Rendering > JavaScript) and running the crawl on this set of URLs again unlocks this additional layer of data. Another headache solved by a simple drop-down menu.

4. Pull Keyword and External Link Data into your Crawls

Another really handy feature which I’ve experimented with, but not to its full potential, is the ability to tap into third-party APIs. As an SEO, there are a number of tools which I use on a daily basis: Google Analytics, Search Console, Ahrefs, Moz, the list goes on. The SEO Spider lets you pull in data from all of these platforms into one crawl: data such as how many organic sessions your landing pages have generated, how many impressions a particular page has and how many external links point to your pages.

Knowing what data is available and knowing what to do with it are two separate matters, however. Through a number of exports and the good old vlookup function, you can use this data to put together a handy content audit and prioritise areas of the site to work on. Let’s take the below as three very basic scenarios, all using data which is available via the SEO Spider.

| URL (SF) | Organic Sessions (GA) | External Links (AH) | Internal Links (SF) | Impressions (SC) | Potential Solution |
|---|---|---|---|---|---|
| /product1 | 10 | 0 | 10 | 1000 | Low traffic but a decent number of page impressions. Few internal/external links – focus on building these |
| /product2 | 100 | 1 | 30 | 500 | Good organic traffic. Could be a result of the number of internal/external backlinks that point to this page. |
| /product3 | 50 | 1 | 0 | 300 | No inlinks may suggest an orphaned page/not in the sitemap. Potentially deleted product – redirect to an alternative |

Key: SF = Screaming Frog, GA = Google Analytics, AH = Ahrefs, SC = Search Console

Again, by having the data all in one place, you’re able to get a better insight into the performance of your site, and put together an action plan of what to prioritise. Combine this with custom extraction data such as product reviews, product descriptions and so on, and it really does become a powerful spreadsheet.
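If you’d rather not wrestle with vlookups, the same join can be sketched in a few lines of Python. The figures below are the three scenarios from the table above; the dictionary layout, the function names and the thresholds in `suggest` are my own assumptions for illustration, not rules from the tool or the training.

```python
# Hypothetical exports keyed by URL, using the figures from the three scenarios above.
crawl_inlinks = {"/product1": 10, "/product2": 30, "/product3": 0}        # Screaming Frog
ga_sessions = {"/product1": 10, "/product2": 100, "/product3": 50}        # Google Analytics
ahrefs_links = {"/product1": 0, "/product2": 1, "/product3": 1}           # Ahrefs
gsc_impressions = {"/product1": 1000, "/product2": 500, "/product3": 300}  # Search Console


def build_audit(inlinks, sessions, external, impressions):
    """Join the exports on URL, like a VLOOKUP against the crawl sheet."""
    rows = []
    for url, inlink_count in inlinks.items():
        rows.append({
            "url": url,
            "inlinks": inlink_count,
            "sessions": sessions.get(url, 0),       # 0 when a URL is missing from an export
            "external_links": external.get(url, 0),
            "impressions": impressions.get(url, 0),
        })
    return rows


def suggest(row):
    """Very rough triage; the thresholds are assumptions, tune them per site."""
    if row["inlinks"] == 0:
        return "Possible orphan page: check the sitemap or redirect"
    if row["impressions"] >= 1000 and row["sessions"] < 50:
        return "Seen but not clicked: build internal/external links"
    return "Performing reasonably: monitor"


audit = build_audit(crawl_inlinks, ga_sessions, ahrefs_links, gsc_impressions)
for row in audit:
    print(row["url"], "->", suggest(row))
```

The crawl is used as the master list on purpose, so any URL missing from an export still appears in the audit with a zero rather than silently dropping out.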

SEO Spider Training Conclusion

So there you go, just a handful of useful things that I got out of the training session. The aim of the course was to boost everyone’s confidence in being able to crawl a website, and I personally believe that was delivered. For the first ever group of trainees – aka the guinea pigs – the training was well structured, with enough caffeine to stop everyone’s brains from shutting down.

My one negative is that there was perhaps too much information to take in. I mean, it’s not like me to complain about value for money, but there were times when we delved into data visualisation that I glazed over. On a personal front, I’ll be looking to brush up on my regex knowledge to help with the more advanced custom extractions. Or, alternatively, I’ll stick to bribing our devs with tea and cake. Hopefully, Screaming Frog will put on more of these training days in the future, outside of London. I would highly recommend that anybody who wants to expand their ability to crawl a site attends.

Finally, practice makes perfect! If you want me to perform a technical crawl of your site, get in touch today. Happy crawling!


Written by Ian

Ian joined the team as Venture Stream’s SEO Manager after the purchase of fellow agency Flow. He has worked in digital marketing for the past seven years, specializing in SEO, SEM and CRO.
