SEO Best Practices for Multilingual International Sitecore Sites
Nov 12, 2017 • 6 Minute Read • Elizabeth Spranzani, Chief Technology Officer
In this blog post, I am going to talk a bit about some best practices to keep in mind from an SEO perspective around Multilingual and Multi-Country Sitecore solutions. My colleague at Verndale, Kevin Schofield, contributed much of the knowledge here for identifying steps to follow for the best SEO experience when you are executing on a multilingual, multi-country strategy.
I touched upon this briefly in my previous post, but you seriously want to consider country-code Top Level Domains (ccTLD) for each country, if you either already own those domains, or they are available.
An example of this is yoursite.com, yoursite.ca (for Canada), yoursite.co.uk (for United Kingdom).
A ccTLD provides a strong signal to both users and search engines that your site and its content are intended for and targeted to a certain country (and not exclusively targeting a specific language).
A ccTLD strategy is ideal if you need a specific presence and messaging within a region or country. Some countries restrict ccTLDs unless you have a physical presence and go through an application process. This can be expensive, as you must apply for, purchase, and maintain each ccTLD as if it were a different website. Additionally, ccTLDs have no effect on the domain authority of the primary website (i.e. yoursite.com) because search engines see them as unique websites. In essence, a ccTLD strategy necessitates developing content and optimization practices for each country version of the website, and could require different infrastructure (if servers or data need to reside in the country itself).
Pros/Strengths of ccTLD
- Obvious and intuitive to the user.
- Clearest signal to search engines as it provides geolocation.
- Provides the ability for each domain to be hosted on a country specific IP address. This can be essential for ranking in country specific search engines and helps provide a boost for local SEO.
- Easy to market.
Cons/Weaknesses of ccTLD
- Each site has separate domain authority. This means they do not share any of the benefits of inbound links and essentially must be optimized as different websites.
- Limits where you can operate and optimize as many countries and regions require a physical presence and registration.
- Provides the potential for censorship of content as well as loss of the ccTLD, depending on the targeted country.
- Can be costly to register/purchase all relevant domains.
- May need to host certain elements within country.
Using Sitecore Language Fallback with ccTLD overcomes these challenges:
- Starting from zero from an optimization and site structure perspective: additional country sites don't need to be built up (perhaps just translated/localized).
- Additional infrastructure for hosting: as long as the country doesn't require in-country hosting, you can leverage the same CM/CD servers.
- Tracking difficulties: Site structure, user flow, and tagging will be more consistent.
If the ccTLD strategy is selected, then remember EACH domain will need its own domain-specific search resources like XML sitemaps and the robots.txt file. Google will not be able to tell that these sites are managed centrally; it will consider them to be standalone domains and treat them as such.
The good news is that each domain can be set up as its own configuration entry in the Site config file, and if you leverage the sitemap XML manager from Sitecore (and Verndale has an enhanced version of it), this will automatically generate each country specific XML file with no extra development effort. Since we also manage the robots.txt file virtually and maintain its content for each language separately, we can adhere to these standards with no extra effort.
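To make the "one robots.txt and one sitemap per domain" point concrete, here is a minimal Python sketch of serving a virtual robots.txt per host. It is not the Sitecore sitemap XML manager or Verndale's implementation; the `SITES` registry, the `/sitecore/` disallow rule, the `/sitemap.xml` path, and the https scheme are all assumptions for illustration.

```python
# Hypothetical per-domain registry: hostnames and disallow rules are placeholders.
SITES = {
    "www.yoursite.com": ["/sitecore/"],
    "www.yoursite.ca": ["/sitecore/"],
    "www.yoursite.co.uk": ["/sitecore/"],
}


def sitemap_url(host: str) -> str:
    """Each domain advertises its own sitemap, never another domain's."""
    # Assumption: every site exposes its sitemap at /sitemap.xml over https.
    return f"https://{host}/sitemap.xml"


def robots_txt(host: str) -> str:
    """Build the robots.txt body for the domain the request came in on."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in SITES[host]]
    lines.append(f"Sitemap: {sitemap_url(host)}")
    return "\n".join(lines) + "\n"
```

Calling `robots_txt("www.yoursite.ca")` yields a file whose Sitemap line points at the .ca domain only, which is what a crawler treating each ccTLD as a standalone site expects.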
Instead of using ccTLD, you could use the same domain with different languages embedded directly after the domain:
e.g. yoursite.com/en-us/, yoursite.com/en-ca/, etc.
If the multilingual content is NOT targeted at specific countries (e.g. Spanish content is just for people that speak Spanish, not tailored for a specific Spanish speaking country), the content SHOULD be kept on a single, non-ccTLD domain with the language content divided up using a language code embedded after the domain.
Note: Sitecore makes it easy to do this with the Link Manager attribute 'languageEmbedding'. Unfortunately, Sitecore has this as a global setting, and a single instance may host many sites, some with ccTLDs and some without. Verndale has customized our Link Manager settings to look for this at a site-specific config file level, so you can set it differently per site.
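The idea of a per-site language embedding switch can be sketched in a few lines of Python. This is not Sitecore's Link Manager API or Verndale's customization; `SiteConfig`, `embed_language`, and `build_url` are hypothetical names, and the boolean flag is a simplification of the real setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConfig:
    """Hypothetical per-site settings, standing in for a site-specific config entry."""
    host: str
    embed_language: bool  # True: language goes in the URL path; False: the ccTLD carries the signal


def build_url(site: SiteConfig, language: str, path: str) -> str:
    """Build a page URL, embedding the language code only for sites that ask for it."""
    clean_path = path.strip("/")
    segments = [language.lower()] if site.embed_language else []
    if clean_path:
        segments.append(clean_path)
    # Assumption: all sites are served over https.
    return f"https://{site.host}/" + "/".join(segments)
```

With `SiteConfig("www.yoursite.com", True)` the English (Canada) products page becomes `https://www.yoursite.com/en-ca/products`, while `SiteConfig("www.yoursite.ca", False)` produces `https://www.yoursite.ca/products` from the same inputs.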
Pros/Strengths of embedded language/country
- All links to any language version of the site help boost the domain.
- Lower cost for not having to purchase domains.
- Here are some other points that are typically considered Pros, although using Language Fallback with a single node/instance with ccTLD makes these less impactful differentiators:
  - Less prone to linking mistakes as this format follows the standard website convention. (fallback enforces standard convention)
  - Simple to set up and maintain. (fallback means setup once and done)
  - Low cost of infrastructure. (could reuse servers with the same instance)
  - Easiest from a content management perspective. (again, one instance and site node means still easy)
Cons/Weaknesses of embedded language/country
- This option does not perform as well in country specific search engines.
- Weakest geolocation signal.
- Separation of site is less clear to both users and search engines.
- Potentially confusing for users looking for a ccTLD (example yoursite.ca) version of the site.
- The site may be hosted from one location so server location signal is lost to localized users.
- Search engines may serve the wrong language or country version in the SERP for a given search query, which confuses users.
- Necessitates additional set up within Google Search Console to provide additional information to search engines.
Language is a highly relevant and important signal to search engines about how and where to rank content. The site should utilize the proper tags (rel="alternate" with an hreflang attribute) to serve up the appropriate language / translated content based on the location and language selected by the user. Quite often search engines have difficulty differentiating similar languages (US English vs UK English), which can result in duplicate content problems. To prevent this, you will want to make it simple for search engines and users to determine which language-speaking group of users is targeted. In the same vein, you will want to ensure that the same or similar content served on different URLs in the same language utilizes the proper canonical tagging to show search engines which content is preferred.
Canonicals are NOT used to identify language variants of pages, since they serve different audiences (the only pseudo-exception here is the language directory example above where /page and /en/page would be canonicalized because they serve the same audience). Instead, hreflang alternate tags are used to identify all the variants of each page on a site. These tags are placed in an array in the <head> section and list out each variant of a given page, including the page itself. An example looks like this:
<link rel="alternate" hreflang="de-DE" href="http://www.yoursite.de/" />
<link rel="alternate" hreflang="es-AR" href="http://www.yoursite.com.ar/" />
<link rel="alternate" hreflang="es-CO" href="http://www.yoursite.com.co/" />
<link rel="alternate" hreflang="es-MX" href="http://www.yoursite.mx/" />
<link rel="alternate" hreflang="zh-CN" href="http://www.yoursite.cn/" />
<link rel="alternate" hreflang="en-CA" href="http://www.yoursite.ca/" />
<link rel="alternate" hreflang="en" href="http://www.yoursite.com/" />
Note: hreflang tags can be used across domains, and can identify BOTH country-specific and language-only variants (in the example above the English variant is for English speakers anywhere, while there are three different Spanish variants directed at three specific countries). These alternates can also be defined in the XML sitemap if that is easier than putting them in the <head> section. Keep in mind, though, that they would need to be reflected in the sitemaps for ALL domains, so this is likely only a good option for directory-based language variants on a single domain.
We have automated the output of these with Verndale sites by specifying which languages/countries map to which site/domain configurations and their target domains.
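As a rough illustration of that kind of automation (not Verndale's actual implementation), the Python sketch below renders the hreflang tags from a single mapping of language/country codes to target domains. The `VARIANTS` mapping reuses the example domains above; `hreflang_links` and its `path` parameter are hypothetical, and the sketch assumes every variant uses the same path on its own domain.

```python
from html import escape
from typing import List, Mapping

# Language/country code -> target domain, taken from the example tags above.
VARIANTS = {
    "de-DE": "http://www.yoursite.de/",
    "es-AR": "http://www.yoursite.com.ar/",
    "es-CO": "http://www.yoursite.com.co/",
    "es-MX": "http://www.yoursite.mx/",
    "zh-CN": "http://www.yoursite.cn/",
    "en-CA": "http://www.yoursite.ca/",
    "en": "http://www.yoursite.com/",
}


def hreflang_links(variants: Mapping[str, str], path: str = "") -> List[str]:
    """Return one <link> tag per variant, including the current page's own variant."""
    tags = []
    for code, base in variants.items():
        # Assumption: the page lives at the same path on every domain.
        href = base.rstrip("/") + "/" + path.lstrip("/")
        tags.append(
            f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(href)}" />'
        )
    return tags
```

Because the full list is emitted regardless of which variant is being rendered, every page in the set carries the same block of tags, so each variant references all the others and itself.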
Finally, there are two other places in the markup that language should be identified in a multilingual environment:
- Bing: Bing's crawlers look for a content-language tag to indicate language/country first, so best practice is to include that as well. Each page should identify its language (and target country if applicable) using an ISO code. An example tag looks like:
<meta http-equiv="content-language" content="en-us">
- HTML: Sometimes a “lang” attribute is attached to the opening <html> tag of each page’s markup on a site. While this is not necessary, it is used as a backup language indicator if a crawler has trouble finding other indicators for some reason, and so it is not a bad idea to include in multilingual environments, using the attribute to define the language of each page:
<html lang="en-us">
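Since both markers should always agree, one option is to derive them from a single language code per page. The Python sketch below shows the idea; `language_markers` is a hypothetical helper, and lowercasing the code simply mirrors the examples above (language tags are case-insensitive).

```python
from typing import Dict


def language_markers(code: str) -> Dict[str, str]:
    """Render the <html> opening tag and content-language meta tag from one code."""
    normalized = code.strip().lower()  # matches the lowercase style used in the examples
    return {
        "html_open": f'<html lang="{normalized}">',
        "meta": f'<meta http-equiv="content-language" content="{normalized}">',
    }
```

For example, `language_markers("en-US")` produces the two tags shown above, so the page's language is declared once and cannot drift between the two places.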
By following the advice here, your multilingual, multi-country Sitecore instance will stay in line with current SEO best practices!