Headless SEO and Best Practices

What is Headless Web Architecture and why does it Matter?

The formal definition of a Headless architecture is the decoupling of the front-end and the back-end. But what does this actually mean?

Well in a traditional CMS setup like WordPress and Drupal, the CMS is responsible for storing all the content, business logic and rendering the HTML markup that is shown on the client’s browser when a specific page is requested. In Headless approach, the client side (front-end) is decoupled from the CMS (back-end). The client-side application is usually built with a JavaScript based Static Site Generator like Gatsby, Next, Nuxt etc.

For optimising a site for SEO, traditional CMSes have a plethora of add-ons (or plug-ins) that can enhance features. So, for WordPress websites Yoast and Rank Math Pro are both exceptional SEO plug-ins (and there are plenty more) that make it much simpler to optimise your site. With headless CMS’s these aren’t available, so there are a few things you need to do in development to compensate.

Static Site Generator?

A static site generator (SSG) is a JavaScript tool that automatically generates the HTML pages for the website given a template and content. With a traditional CMS, the pages are built on the server when the user visits the relevant URL – known as server-side rendering. By contrast, with a static site generator, the pages can be built ahead of time and can therefore be served up much more quickly, usually via a CDN. Which means there is almost no load-time delay as the pages are not fetching data from the server every time they load.

The SEO Problem with Headless

With blazing fast load times one and great CWV performance gains surely rankings for headless websites are going to be through the roof, right? Well, yes and no. There is a bit more than just quick load times when it comes to technical SEO, as well as some extra work that needs doing to optimise headless websites.

As already stated, headless sites are mostly static with the content is populated at build time and the rest populated when the user loads the page.

As search engines use bots too crawl through web pages and these do it very quickly. It is highly likely that the dynamic content loads after the bots have already crawled the page in which case the page won’t be indexed properly.

This means the developers need to ensure that as much content as possible is loaded in the pre-rendered static page including the metadata.

To drive the SEO factors for headless the developers must use every trick up their sleeve and all the best practices available to speed up load times and optimise client-side rendering. We have listed a few of the areas we look at whilst developing. Just remember, we cannot include all SSG tools in one article, so we are mostly focusing on Gatsby in this one, and as technical SEO is such an important factor during development make sure you research as much as you can to ensure nothing is overlooked.

Headless CMS and Search Engine Optimization (SEO)

Whether you're using a traditional or Headless platform, the first thing you need to understand is that there is no difference in SEO best practices, and there are plenty of guides and tools to help you with this. Your strategy for headless will therefore ultimately remain the same since you’ll still be managing key on-page SEO elements like meta titles, descriptions, alt text, linking, URLs, etc. The only difference will be the implementation.

Unlike traditional CMSes, a headless CMS doesn't include the SEO-enhancing plug-ins or offer control over content rendering. On paper, this probably sounds like a red flag, but it's not. Using a headless architecture can benefit your SEO due to its enhanced speed, security, front-end flexibility and user experience.

However, since this CMS type won’t necessarily give you the plug-and-play simplicity of a traditional CMS, you’ll need to follow the best practices (or work with a great development team like us) to optimize technical SEO aspects and performance.

SEO basics

Headless or not, when it comes to SEO, getting the basics right is always a key factor. The points may seem minor from a traditional CMS perspective as the CMS usually takes care of these under the hood but considering headless builds are very bespoke, developers must explicitly cater to these key SEO factors.

Metadata

Metadata is the information contained in the <head> tag of each page, such as the page title, description, and social media tags, and is used both to rank pages and determine what is shown in the search results page. In terms of SEO, metadata can be the low hanging fruit.

We recommend the inclusion of at least the following metadata items in each page:

Title – The page title is shown on search results pages. Meta title should be unique for each page on the site.
Description – The description is also displayed on the search results page. It is very important for encouraging users to click the link and visit the site.
Twitter cards – Twitter cards are used to provide a rich preview of the site when it’s linked to on Twitter, which in turn helps to drive traffic to the site.
Open Graph cards – Like Twitter cards, Open Graph cards are used to provide a rich preview of the site on Facebook.
Canonical URL – By providing a canonical URL we can tell search engines which URL is the master copy of a so they know which one to list in search results. Specifying a canonical URL also helps to avoid issues related to duplicate content.
Robots - This is an area where you can give additional instructions to search engines and where you can give instructions about not including a particular page in their index.

With Gatsby, it is a good practice to store sites metadata in Gatsby-config.js asd. To include this data into the site pages at build time an SEO component can be created using the Gatsby’s React Helmet plugin. By adding the SEO components to our templates or page components, we can ensure that the metadata is available as soon as a search engine crawler loads the page.

Robot Meta Tag - robots.txt

The robots.txt file is used to tell the crawlers which pages of the site should be indexed and which one should be ignored. It’s protocol that search engines should adhere to but be careful as this isn’t always the case. It also doesn’t remove block pages from search results so be prepared to use no index and removal requests if there’s something you don’t want in the SERPs.

In Gatsby Gatsby robots txt plugin can be used to generate the robot.txt file automatically. Once installed, add the plugin to gatsby-config.js and specify the host of your site and the location of your sitemap (which you can generate automatically using Gatsby Sitemap plugin)

Structured Data

Structured data is a way for website owners to tell search engines like Google what the content on their website means. It adds context to content. Google and other search engines use structured data to improve the appearance of search results by presenting additional relevant information, using rich snippets and knowledge graphs.

We can implement structured data in your Gatsby site using the Gatsby React Helmet plugin and SEO component we also used to include metadata on our pages.

To test your structured data head over to Google Structured Data Testing Tool

Media

Media like images and videos or specifically load time for the media files are the key contributing factor to slow page speeds. Used right, images and videos can elevate the user experience of any webpage but if they are preventing users from actually accessing the important content due to slow connection, it will have direct impact on the sites SEO

For images there is an array of best practices to choose from to maintain performance: resizing, creating alternate versions for different devices, file compression, blur-up with an SVG placeholder, and lazy loading to name a few. The only issue is it takes time and resources to apply all these techniques. Fortunately, all these are automatically taken care of in Gatsby using Gatsby Image Plugin

For videos it is always a good idea to load the video after the whole page has loaded. For prebuild-page the video container can have a static thumbnail image of the video to emulate the video is loaded. This will provide a better UX for the users.

Prismic has built a great solution to optimise media files with Imgix (https://imgix.com/). This dynamically compress and optimize images making it quicker and simpler to optimise your site.

API calls (Pre-Rendering)

Pre-rendering is one of the most important Headless SEO practices. It entails adding JavaScript code to HTML to prepare pages for crawling and generate static HTML pages for caching by CDN. In other words, it’s the process of building a dynamic page that is turned into static at build time.

Headless builds by nature rely heavily on API calls. The calls can impact the URL if they are dynamic. A very common case for this issue is paginated blogs or articles as it is easier to generate these pages via just API calls.

While dynamically generated pages are nice and all, search engines cannot read these pages and hence these pages will not be indexed. So, any page that needs to be indexed needs to have static URL.

Maintenance of URLs

If we are migrating from a traditional CMS to a headless setup, one thing to keep in mind is to maintain as much of the URL structure as possible from the previous build. This way it is less likely to see the dip in SEO rankings as the crawlers won’t have to re index all the pages again.

Links vs onclick

With a headless CMS the front-end stack will usually be written using JavaScript, JavaScript is completely different to HTML and even though Google has made great strides to understand it there are still principles it struggles with for example Onclick – we had a client that used Onclick for their international selector – Google can’t replicate a click like a user can and therefore could not find the internal links to crawl the international websites.

404 Handling

A common issue with modern web apps is if a user enters an incorrect URL or follows a broken link. While they may see content that tells them the page was not found, the accompanying HTTP status code is still 200 (success). This is because the URL has been translated into an API route. As long as the API responds successfully and serves a page it will return a 200-status code.

From an SEO perspective, this is a problem. Search engine bots crawling your pages check the status code to determine if there’s a real page at each URL. Returning a success status causes the page to be treated as a normal page, ranked based on the (sparse) “page not found” contents and potentially listed in search results. To avoid this and ensure the page is treated as the error you intended, you need to return a 4xx status.

Luckily, Gatsby handles this for you. You can create a custom 404 page at src/pages/404.js with helpful content to redirect your users, and when a user hits a route that doesn’t exist, they are redirected to that page with a 404-status code. Simple!

We’re always happy to help, feel free to get in touch with our team if you have any questions, or book a discovery meeting if you have a project, you think we can help you with.