

JavaScript is the most widely used language on the web today. In this post, we'll discuss some of the common issues JavaScript can cause for your site from the perspective of an SEO specialist, and look at how to help Google crawl and index your site through better search engine optimization (SEO). As you read, keep in mind that this is meant to be an informative guide, not a complete solution: every webmaster should talk with their website's developer(s) and make sure they know how to benefit from implementing these best practices.

Modern JavaScript Frameworks and SEO: Tips and Tricks

By Daniel Marx

Historically, search engines have always had major problems with indexing content that is loaded via JavaScript, and in some cases still do. So it’s no wonder that many SEO managers feared JavaScript for a long time like the devil feared holy water. But there are solutions to the problem.



With the increasing spread of modern JavaScript frameworks such as Angular, React, or Vue.js, it is hardly possible in SEO today to avoid the topic of JavaScript. At the same time, developers should be aware of how search engines process pages built on JavaScript frameworks. While many search engines still struggle to process JavaScript-based pages in 2019, industry leader Google now copes with them quite well. Nevertheless, there are still numerous stumbling blocks that can prevent a page from being found well. In this post, we want to take a look at the obstacles JavaScript brings with it from an SEO perspective and at the best practices that can be used to support Google and other search engines with crawling and indexing.

How does a search engine work?

In order for a document, usually a web page, to appear in the search results at all, every search engine must first find the document and then understand its content. The first step is to get an overview of as many available documents as possible. A crawler such as Googlebot basically does nothing more than follow all the URLs it encounters and thus constantly discover new pages.

If the crawler lands on a URL, the HTML document is downloaded first and the source code is scanned for basic information on the one hand and links to other URLs on the other. Meta information such as the robots meta tag or the canonical tag indicates how the page is to be processed in principle.

The crawler can then continue to follow the links it has found.

For the content to be understood, the crawler sends the HTML file and all resources to the indexer, which in Google's case is called Caffeine. The indexer renders the document and can then index the contents of the web page. Finally, the ranking algorithm ensures that the most relevant documents are returned for relevant search queries.

However, if JavaScript is used to finish rendering the DOM (Document Object Model) on the client side, i.e. in the browser, the crawler visiting a URL will find an HTML document that, in simplified form, looks like Listing 1.
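
The exact markup depends on the framework and build setup, but a minimal sketch of such a pre-DOM HTML shell might look as follows (the file name bundle.js and the id of the root element are chosen purely for illustration):

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Example page</title>
  </head>
  <body>
    <div id="app"></div>
    <script src="bundle.js"></script>
  </body>
</html>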

Listing 1: Simplified pre-DOM HTML of a client-side rendered page

While the structure and content are already contained in the HTML source code when a document rendered on the server side is called up, the pre-DOM HTML is pretty empty in React, Angular and Co. If the entire content of a page is loaded in this way, a search engine crawler that only reads the HTML code of a page receives virtually no information. The robot can neither find links that it can follow, nor basic information on the content of the page, nor any meta tags.

Therefore, the crawler now has to send the HTML as well as the CSS and JavaScript resources to the indexer, which first has to execute the JavaScript to render the page before the content and meta information can be processed. Only then can further links be extracted from the rendered DOM and passed back to the crawler, which can then discover further URLs.

Can search engines render JavaScript?

So far, Google is unfortunately the only search engine that we know of that actually uses a rendering engine to execute JavaScript and that also provides information about the process in its documentation. The other major search engines, including Bing, are not as advanced, as an experiment by Polish SEO specialist Bartosz Góralewicz has shown. Even though Bing claims to render JavaScript as well, in practice this only seems to happen on very large and prominent pages.

In fact, we know that Google uses a Web Rendering Service (WRS) in Caffeine that is based on a headless Chrome. Unfortunately, we also know that it is currently still based on Chrome version 41, so it behaves like a roughly three-year-old browser. Fortunately, the responsible team around John Müller and Martin Splitt has already emphasized that they are working flat out to move to a newer version as soon as possible and to keep up with Chrome updates in the future.

As long as this is not the case, you can read on www.caniuse.com or the Chrome Platform Status which features are supported by Chrome 41 and which are not.

In any case, the prerequisite for successful rendering is that you do not block any JavaScript or CSS resources with the robots.txt file. Correct rendering will be difficult if Google is not allowed to access these resources. In addition, all relevant content must be loaded before the load event is fired. Content that is only loaded after the load event, for example by a user event, is not taken into account during indexing.
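
A quick way to run into this problem is a robots.txt that locks away the asset directories. The following is only an illustrative sketch with made-up paths; the point is that script and style resources must remain accessible to Googlebot:

# Problematic: the renderer cannot load the page's scripts and styles
User-agent: *
Disallow: /assets/js/
Disallow: /assets/css/

Removing such Disallow rules, or explicitly allowing the asset paths, lets the Web Rendering Service fetch everything it needs.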

It is also interesting that Google crawls and indexes in two waves. Because JavaScript rendering is extremely resource-intensive, the URL is indexed first; of course, only the information that can be found directly in the pre-DOM HTML source can be used here. In a second wave, the page is then rendered and the entire content of the post-DOM HTML is indexed (Fig. 1). To test how Google renders a JavaScript-based website, you can use the "Fetch as Google" function in the (old) Search Console. There you can see how Google actually renders the page as well as the source code it received.

Fig. 1: Indexing on Google in two waves

Two other tools to test rendering by Google are the mobile-friendly test, which can be used to check the view on mobile devices, and the rich results test. Both show the rendered HTML, and the mobile-friendly test additionally provides a screenshot of the rendered page.

In addition to problems with the rendering of the individual pages, completely different sources of error come into play when using JavaScript frameworks. The SEO basics are also often forgotten.

URLs are the API for crawlers

First of all, every page needs a URL, because URLs are the entities that search engines list in the search results. JavaScript can be used to dynamically change content without changing the URL, but every single page must have a unique, distinguishable, and persistent URL in order to be indexed at all. If new content is loaded, a new, server-side supported URL must also be called. Make sure that normal URLs are used and no hashes (#) or hashbangs (#!), even if these are provided as the default in the framework:

example.com/#about

example.com/#!about

example.com/about

It is important to avoid pushState errors with internal links so that the URL supported on the server is actually called. Otherwise, content can suddenly be available on several URLs and thus become duplicate content.
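
As a rough sketch, internal navigation based on the History API could look like this; renderRoute is a hypothetical client-side function, and the crucial point is that each path also exists as a real, server-supported URL:

// Intercept clicks on internal links and update the address bar with a real path
document.querySelectorAll('a[href^="/"]').forEach((link) => {
  link.addEventListener('click', (event) => {
    event.preventDefault();
    const url = link.getAttribute('href'); // e.g. "/about" instead of "#about"
    history.pushState({}, '', url);        // changes the URL without a page reload
    renderRoute(url);                      // hypothetical function that renders the view
  });
});

// Keep the back and forward buttons working with the same URLs
window.addEventListener('popstate', () => renderRoute(location.pathname));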

Another source of error: if JavaScript takes over the navigation, it can happen that a URL only works from within the application. This happens exactly when content is reloaded and the URL is updated correctly, but the route is not supported on the server side. If a user lands on the URL directly or reloads the page, there is no content at this URL.

You also have to know that Googlebot always works statelessly. Cookies, local storage, session storage, IndexedDB, service workers, etc. are not supported; the crawler visits every URL as a completely new user. It is therefore essential to ensure that all routes, i.e. all URLs, can always be called directly.

Server status codes

With regard to URLs, the general SEO best practice of using server status codes correctly also applies. If the URL of a piece of content changes but the content is still available on the site, users and search engines should be directed to the new URL with a server-side redirect (HTTP status code 301) or a corresponding client-side JavaScript redirect. The redirect not only leads to the desired content, it also passes the authority of the old URL (keyword: backlinks) on to the new URL.


If a piece of content is no longer available, a 404 status should be returned correctly; search engines then remove these URLs from the search results. This also includes avoiding soft 404 errors: a 404 page is called a 404 page because it returns that status code, not because it merely displays an "Oops! Sorry, this page doesn't exist" message while the server delivers a 200 (OK) code.
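
How this is implemented depends on the stack; the following is a minimal sketch with Node.js and Express, using placeholder routes, that returns both status codes correctly:

const express = require('express');
const app = express();

// Content has moved permanently: a 301 sends users and search engines to the
// new URL and passes on the authority of the old one
app.get('/old-url', (req, res) => {
  res.redirect(301, '/new-url');
});

// Content no longer exists: return a real 404 instead of a "soft 404"
// (an error page delivered with status 200)
app.use((req, res) => {
  res.status(404).send('Page not found');
});

app.listen(3000);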

Leverage markup

HTML is a markup language, and the available markup should be used accordingly. Even if there are various ways to call URLs with JavaScript, you should use the anchor tag provided for this purpose, including the href attribute, for a link to another URL, for example:

<a href="/my-subpage">My subpage</a> (crawlable: an anchor tag with an href attribute)

<span onclick="goToPage('my-subpage')">My subpage</span> (not crawlable: no href for the crawler to follow)

Caution is also required with the robots meta tag: some JavaScript applications ship a noindex in the initial HTML and rely on client-side code to remove it once the page has rendered. But in the first step – we remember – the Googlebot comes to the page, downloads the unfinished HTML and finds the noindex there. This means that the page is not passed on to Caffeine at all, where it would be rendered, so Google never sees that the finished DOM would no longer contain a noindex in the head. The result is that the page is not indexed at all.
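
The problematic pattern can be sketched as follows; the selector and timing are purely illustrative:

// The static HTML ships <meta name="robots" content="noindex"> in its head,
// and the application removes it on the client once the real content is ready.
document.addEventListener('DOMContentLoaded', () => {
  const robotsTag = document.querySelector('meta[name="robots"]');
  if (robotsTag && robotsTag.content.includes('noindex')) {
    robotsTag.remove(); // too late: the crawler has already honored the noindex
  }
});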

Prerender JavaScript on the server side

Rendering websites is extremely resource-intensive – even for industry giants like Google. Therefore, as mentioned above, the rendering process does not take place immediately after the crawler has discovered a URL, but only when the corresponding resources are available. It can take up to a week before a page is rendered. This makes the process of crawling and indexing, which is quite simple in itself, extremely time-consuming and inefficient.

Other search engine crawlers, as well as the crawlers of Facebook, Twitter, LinkedIn, etc. that visit pages to generate preview boxes, do not render JavaScript at all yet. To ensure that a page is also understood by crawlers other than Googlebot, but also to take load off Google, we recommend prerendering pages on the server. This ensures that Google actually finds all important content and indexes it more quickly.

Quite apart from the problems of the bots, client-side rendering can also have disadvantages for users. At the very least, the initial page load, i.e. the loading of the first page, usually takes significantly longer because the rendering has to be done entirely by the client. In addition, the loading time depends on the quality and computing power of the respective end device. That is why the magic word for making JavaScript frameworks suitable for SEO is server-side rendering. Instead of having the HTML code calculated on the client side, the page is pre-rendered on the server (pre-rendering, Fig. 2) and delivered ready to use. In addition to paid pre-rendering services, there are now also various open source solutions that use PhantomJS, a headless Chrome or another headless browser to render the pages on the server. Both the browser and the crawler receive the server-side pre-rendered HTML directly. All JavaScript that is necessary to render the basic page has already run on the server; on the client side, only JavaScript triggered by user interactions is executed.

Fig. 2: Rendering and pre-rendering

Netflix, for example, achieved great success with this technology, completely converting its React application to server-side rendering and only running Vanilla JavaScript on the client side. By switching to server-side rendering, Netflix was able to improve loading times by 50 percent.

Dynamic rendering

In the model that Google calls Dynamic Rendering, a distinction is made between browser and crawler (Fig. 3). While a normal browser gets the JavaScript version of the page delivered and has to render on the client side, crawlers get a server-side pre-rendered version.

Fig. 3: Dynamic rendering

This requires middleware that distinguishes whether access comes from a normal browser or a bot. The user agent is simply read out and, if necessary, the IP address from which the respective bot usually accesses the site is verified as well. In addition, John Müller, Senior Webmaster Trends Analyst at Google, mentioned in a Google I/O '18 talk that this variant does not count as cloaking. However, it should be clear that the pre-rendered and the client-side version must not differ in terms of content.
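
Such middleware can be sketched roughly as follows, assuming an Express server; the bot pattern and the renderSnapshot function are placeholders, and in practice the snapshot would come from a headless browser or a prerendering service:

const express = require('express');
const app = express();

// Very simplified list of crawler user agents
const BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot|linkedinbot/i;

app.use(async (req, res, next) => {
  const userAgent = req.headers['user-agent'] || '';
  if (BOT_PATTERN.test(userAgent)) {
    const html = await renderSnapshot(req.url); // placeholder: returns pre-rendered HTML
    res.send(html);                             // crawlers get the finished markup
  } else {
    next();                                     // normal browsers get the client-side app
  }
});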

Hybrid “isomorphic” rendering

Google itself always recommends a hybrid rendering solution in which both normal users and search engines first receive a pre-rendered version of the page. Only when the user begins to interact with the page does JavaScript start changing the source code via the DOM (Fig. 4). As far as possible, the JavaScript is already executed on the server, and JavaScript is then run on the client side for all further actions. And because crawlers are stateless, they always receive a pre-rendered page for each individual URL.

Fig. 4: Hybrid rendering

Another advantage of this solution is that users who have deactivated JavaScript will also receive a functioning page. In practice, however, the set-up of such a solution is often quite complicated, even if there are now very good modules for the most common frameworks.
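
As a rough sketch of the idea with React and Express (the App component and the bundle path are placeholders): the server renders the initial HTML for each URL, and the client bundle then hydrates the same markup and takes over all further interactions.

const express = require('express');
const React = require('react');
const { renderToString } = require('react-dom/server');
const App = require('./App'); // placeholder: the application's root component

const app = express();

app.get('*', (req, res) => {
  // Render the requested route to HTML on the server
  const markup = renderToString(React.createElement(App, { url: req.url }));
  res.send(`<!DOCTYPE html>
<html>
  <head><title>Example</title></head>
  <body>
    <div id="root">${markup}</div>
    <script src="/client.js"></script>
  </body>
</html>`);
});

app.listen(3000);

On the client side, the bundle would call ReactDOM.hydrate() on the same root element instead of rendering the page from scratch.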

Testing whether you are found

To test how Google processes your own JavaScript-based website, there are various tools that Google itself provides. The mobile-friendly test and rich results test mentioned above can be repurposed to see whether and how the indexer can render the page. It remains to be seen whether the Fetch as Google function, which is practical for JavaScript audits, will also be integrated into the new Search Console.

Of course, you can always download version 41 of the Chrome browser and see how the page is rendered there. The console of the local DevTools provides information about which features this old version does not yet support.

Some SEO crawlers can now render JavaScript, for example the popular SEO crawler Screaming Frog. In the paid version of the tool, JavaScript rendering can be activated in the spider configuration. The screenshot of the page can then be viewed under Rendered Page, and the pre-DOM and post-DOM HTML can be compared directly. Because JavaScript rendering consumes a lot of resources, very large sites can hardly be crawled completely with the desktop application.

A simple Google search can also be used to test whether the content of your own page has been correctly indexed. With the search operator site:example.com combined with a text excerpt from the page to be checked, you can quickly determine whether Google can find the content.

Conclusion

The topic of SEO has reached the development teams of the major frameworks. They are addressing the problems with crawling and indexing and developing appropriate solutions. Google, too, is actively tackling the topic of JavaScript and publishing documentation and guidance. If developers engage with the topic and turn the right screws in their applications, a JavaScript-based website and good findability in search engines are no longer mutually exclusive.

JavaScript brings completely new challenges for search engine optimizers. In the future, SEOs will have to delve much deeper into technical details and deal more intensively with JavaScript in order to identify potential obstacles and work with developers to remove them.