Introduction to Image Taxonomy for Content Hub DAM
Oct 03, 2024 • 2 Minute Read • Richard Cabral, Technical Director
My name is Rick Cabral, known affectionately here in the Northeast US as "Sergeant Sitecore." I've been working with Sitecore for over 15 years starting with the release of Sitecore 5.0 back in 2005. In that time I've created and led dozens of top-notch Sitecore development teams. While my teams have worked on scores of "new" Sitecore projects, we've also "rescued" dozens of failed projects or faltering installations.
Here at Verndale, we're seeing a recurring theme among these rescue missions: "Sitecore is not performing well." It might be that the website is failing under significant traffic loads, or it might be that the website is simply slow. Because the website is being run through Sitecore, the brand name gets the ding, but it's seldom a problem with the Sitecore product itself. (We'll get to the exceptions later). Let's unpack how my team identifies and fixes performance problems.
Often we get two kinds of non-specific complaints about website performance:
While subjective complaints are fine for a starting point, your Sitecore team is going to want to have an objective, scientific way to identify the site's current performance and measure real progress via testable KPIs.
Know what your visitors do on site. Just because a given page is slow doesn't mean you should move heaven and earth to fix it.
Your Chrome browser ships with a wonderful tool for both diagnosing site performance problems and giving you a wrapped-up KPI for measuring improvements. With Page Speed Insights you get:
Simply addressing all the recommendations Page Speed Insights produces in its report will produce a visibly snappier site without touching Sitecore at all.
If you've never run a load test on your production installation, you have no idea how many concurrent users it can support. Here at Verndale we use K6, https://k6.io, to simulate real-world visitor click paths, including loading all assets per page. Load testing tools can be as simple to use as recording a visitor session in your browser and uploading it to the bot network to test. K6 gives you outstanding real-time monitoring of your load test, and allows you to schedule capacity bumps in stages to help identify the effectiveness of caching strategies.
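The idea of scheduling capacity bumps in stages can be sketched independently of any load testing tool. The Python below is a hypothetical model, not K6 code: it assumes each stage ramps linearly from the previous target to a new number of virtual users, and the durations and targets in `PLAN` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    duration_s: int   # how long the stage lasts, in seconds (assumed > 0)
    target_vus: int   # virtual users to reach by the end of the stage


def virtual_users_at(stages: List[Stage], t: float, start_vus: int = 0) -> float:
    """Return the modeled number of virtual users at time t (seconds)."""
    previous = start_vus
    elapsed = 0.0
    for stage in stages:
        if t <= elapsed + stage.duration_s:
            # Linear ramp between the previous target and this stage's target.
            progress = (t - elapsed) / stage.duration_s
            return previous + (stage.target_vus - previous) * progress
        elapsed += stage.duration_s
        previous = stage.target_vus
    return float(previous)  # after the last stage, hold the final target


# Hypothetical plan: warm up, hold, bump capacity, hold, ramp down.
PLAN = [
    Stage(60, 50),
    Stage(120, 50),
    Stage(60, 200),
    Stage(120, 200),
    Stage(60, 0),
]
```

Holding each plateau for a while before the next bump is what makes it possible to tell whether caches have warmed up: if response times settle during the hold and degrade only after the bump, the bump is the likely cause.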
A load testing network will give you information on:
Here at Verndale we put every new website project through a load test before launch to ensure that the system can actually handle normal daily traffic, as well as the increased load caused by marketing the site on launch day.
Whenever you deploy a fix, use the tool that indicated the performance problem to verify the problem is resolved. Track your statistics in Page Speed Insights over time to see if you're making progress. Re-running tests will make sure your improvements are effective and economical. It's possible (particularly with Page Speed Insights) to focus on problems that are identified as important but:
Whenever you make any change to your site, part of your DevOps strategy should involve performance analysis using the tools mentioned here. Marketing needs will change. The site will evolve. It's entirely possible to introduce something "new" that will have an adverse effect on site performance. Diagnostic tools can help prevent launch-day catastrophes.
Now that we can evaluate our current condition and benchmark improvements, let's get into the actual problems that we've encountered.
I'm going to list these problem areas in order of expense to fix. We're going to start with the low-hanging fruit and work our way up to major infrastructure changes.
My number-one cause of poor page performance on Sitecore-run sites is large image file sizes. I rank this one at the absolute top because it's 100% preventable if the developer guards against human nature from the start. Content authors are often not aware of what causes poor page performance, and they're also not necessarily masters of Photoshop. They pick a good picture for the task at hand, and upload it to Sitecore. If that picture came off a 30MP camera, it's going to be huge. There are a few ways to defend against this behavior:
Sitecore is a Content Management System (CMS), not a digital asset warehouse. The only digital assets that should be added to Sitecore are ones that are already optimized for web delivery. If you don't have an in-house plan for processing images for the web before they go to the content team, you need to address this in your content lifecycle immediately.
Sitecore itself has always offered on-the-fly image transformation via query string parameters. (The Sitecore UI makes heavy use of this feature). Used in conjunction with the <picture/> element, you can get right-sized images every time even before you factor in CDN-based image processing. Developers reaching for "Dianoga" would be better served using Sitecore's media URL transformation parameters.
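As a rough illustration of the query string approach, the sketch below appends Sitecore's `mw` (maximum width) and `mh` (maximum height) parameters to a media URL. The helper name and the example path are hypothetical. Recent Sitecore versions also sign resized media URLs with a hash parameter (media request protection), so in a real solution these URLs would be produced by Sitecore's own media APIs rather than by string manipulation; this only shows the shape of the result.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sized_media_url(media_url: str, max_width: int, max_height: Optional[int] = None) -> str:
    """Append Sitecore resize parameters (mw/mh) to a media URL."""
    parts = urlsplit(media_url)
    query = dict(parse_qsl(parts.query))   # keep any existing parameters
    query["mw"] = str(max_width)           # mw = maximum width in pixels
    if max_height is not None:
        query["mh"] = str(max_height)      # mh = maximum height in pixels
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical media path, for illustration only.
print(sized_media_url("/-/media/images/hero.jpg", 480))
```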
Some tips for Sitecore's Image Transformation:
Many Sitecore developers attempt to circumvent this problem by installing an open source "plugin" for Sitecore called "Dianoga." This oddly-named tool runs 3rd party optimization strategies on images as they're requested by your site's visitors. Here at Verndale we recommend against this plugin for a number of reasons:
If you are employing a CDN in any capacity (and you should be) your provider may offer image optimization services that require little-to-no developer intervention. Akamai Image Services, for example, takes any source type (GIF/JPEG/PNG) of image and creates a WebP variant, which it adds to its cache. If the user's browser supports WebP, Akamai will serve the browser the WebP file, regardless of the file type mentioned in the URL. This seamless optimization means Developers do not need to alter the media URLs provided by Sitecore, and don't need to specify all possible formats in their <picture/> elements. CDN-based image management can be an extremely low effort page speed improvement. It can also be extremely cost effective. For example, Cloudflare's new "Polish" image processing service is included in Business level contracts.
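The format negotiation described above can be sketched in a few lines. This is not how Akamai or Cloudflare implement it; it is a simplified model that assumes the decision is made from the browser's `Accept` request header and ignores quality values.

```python
def pick_image_format(accept_header: str, source_format: str) -> str:
    """Return the format to serve, preferring WebP when the browser accepts it."""
    accepted = {
        part.split(";")[0].strip().lower()   # drop any ";q=..." suffix
        for part in accept_header.split(",")
    }
    if "image/webp" in accepted:
        return "webp"
    return source_format.lower()             # otherwise serve the original format
```

Because the same URL can now return different bytes to different browsers, any cache in front of the origin has to key on the `Accept` header as well as the URL, which is why this works best when the CDN does it for you.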
Sitecore is not capable of streaming media assets like video. When a user stores a video file in the Media Library, the browser must download the entire file before starting playback. Since even a small video file is several megabytes in size, this can destroy a visitor's page performance immediately. If there are a significant number of users on your site, a single Media Library video on your home page can take down your website entirely by overloading the Content Delivery server's ability to respond to requests.
Use a 3rd party video streaming provider like Brightcove, Vimeo, or even YouTube to host your videos and integrate them into Sitecore-hosted pages using their "embed" style players or JavaScript APIs. These 3rd party players can optimize the video's size based on the size of the player on page and the user's available bandwidth, ensuring they get the highest-performance experience.
Here at Verndale we use a "belt and suspenders" approach:
Media-based performance mitigation can be done in stages by implementing the above steps in any order.
Discussing the full nature of modern, quality, responsive HTML is beyond the scope of this article. Running Google Page Speed Insights on a page will also provide an incredible amount of advice on how to ensure an HTML document is organized for high performance. We'll take a moment to talk about a few key developer behaviors that can have a negative impact on your Sitecore installation:
All responsive websites developed in the last 3-5 years should be using the <picture/> element instead of the traditional <img/> element. For each breakpoint on your website, HTML developers should be specifying the exact URL of the image to display in a given component. If you have 3 breakpoints, there should be 3 image URLs. Each image should be exactly the size needed for that breakpoint and optimized for that size. This strategy can shave megabytes off of your page load at the mobile breakpoint. Given that in 2022 most website views come through the smartphone rather than the desktop, optimizing that experience with right-sized images should be priority one.
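A minimal sketch of the markup this produces, assuming two hypothetical breakpoints (768px and 1200px) plus a mobile fallback, for three image URLs in total. The helper and the media paths are invented; the URLs use Sitecore's `mw` (maximum width) query string parameter to request a right-sized image.

```python
from html import escape
from typing import List, Tuple


def picture_markup(alt: str, fallback_url: str, sources: List[Tuple[int, str]]) -> str:
    """Build a <picture> element from (min_width_px, image_url) pairs."""
    lines = ["<picture>"]
    # Widest breakpoint first: the browser uses the first <source> whose media query matches.
    for min_width, url in sorted(sources, reverse=True):
        lines.append(
            f'  <source media="(min-width: {min_width}px)" srcset="{escape(url)}">'
        )
    # The <img> is the fallback when no <source> matches (the smallest breakpoint).
    lines.append(f'  <img src="{escape(fallback_url)}" alt="{escape(alt)}">')
    lines.append("</picture>")
    return "\n".join(lines)


print(picture_markup(
    "Hero",
    "/-/media/hero.jpg?mw=480",
    [(768, "/-/media/hero.jpg?mw=1024"), (1200, "/-/media/hero.jpg?mw=1600")],
))
```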
We discussed using the <picture/> element previously. Ensure that the image that gets downloaded has just enough bytes to fill the space for a given breakpoint. We can optimize one step further by adding the "loading=lazy" attribute to image tags. This ensures that the browser doesn't start to download the image immediately, but rather waits until the image would be visible in the viewport. Just adding loading=lazy to all image tags on page can have a remarkable effect on performance.
I once saw a home page that clocked in at 20MB! The largest contributor to size was a collection of 30 (thirty) font references in the HTML document header. Every possible variant of the given fonts had been loaded, although only 3 fonts were used and each only needed one variant.
We recently encountered a client that had 85 script files referenced on every page of their site. Performance was predictably poor.
Each URL in your HTML document requires a new connection to the server to download. It's much faster to download one large document than to download 20 documents due to the overhead of managing the connection. Additionally, all browsers have a limited number of connections they can establish at any given time. As soon as the HTML author exceeds that limit, no more assets will be loaded until an existing connection is closed. For JavaScript files, this can be mitigated with the use of the "async" or "defer" attributes. Their use is well documented and beyond the scope of this article.
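A quick way to see how a page stacks up against these points is to count what its HTML actually references. The sketch below uses only Python's standard library; the class name and the choice of what to count are my own, and it is no substitute for a full Page Speed Insights report.

```python
from html.parser import HTMLParser


class AssetAudit(HTMLParser):
    """Count the external assets a page references and flag common problems."""

    def __init__(self) -> None:
        super().__init__()
        self.external_scripts = 0
        self.blocking_scripts = 0   # external scripts without async or defer
        self.stylesheets = 0
        self.eager_images = 0       # <img> tags without loading="lazy"

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and "src" in attr:
            self.external_scripts += 1
            if "async" not in attr and "defer" not in attr:
                self.blocking_scripts += 1
        elif tag == "link" and "stylesheet" in (attr.get("rel") or "").lower():
            self.stylesheets += 1
        elif tag == "img" and (attr.get("loading") or "").lower() != "lazy":
            self.eager_images += 1


def audit(html: str) -> AssetAudit:
    """Parse an HTML document and return the populated audit."""
    parser = AssetAudit()
    parser.feed(html)
    return parser
```

Running `audit()` over the saved HTML of a home page and printing the four counters gives a simple number to track between releases.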
Assuming deferred JavaScript files, another common sin is to load all JavaScript required for the entire site on every page in a given Sitecore installation. Considering this puts the largest burden on the first page a visitor encounters, it's far from ideal. Instead, JavaScript should only be loaded if an HTML component on page requires it.
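One way to picture component-driven script loading is a lookup from the components on the page to the scripts they need. The sketch below is a language-neutral illustration in Python, not Sitecore code; the component names and script paths are hypothetical.

```python
from typing import List, Set

# Hypothetical mapping of rendering names to the script each one needs.
COMPONENT_SCRIPTS = {
    "hero-carousel": "/scripts/carousel.js",
    "promo-carousel": "/scripts/carousel.js",
    "store-locator": "/scripts/store-locator.js",
    "video-player": "/scripts/video-player.js",
}


def script_tags_for(components_on_page: List[str]) -> List[str]:
    """Return deferred <script> tags for only the components present, without duplicates."""
    seen: Set[str] = set()
    tags = []
    for name in components_on_page:
        src = COMPONENT_SCRIPTS.get(name)   # components with no script are skipped
        if src and src not in seen:
            seen.add(src)
            tags.append(f'<script src="{src}" defer></script>')
    return tags
```

A page with two carousels and a rich text block would emit a single carousel script, and a page with none of these components would emit nothing.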
While this is also a Sitecore development strategy issue, the solution involves changes in the way the HTML document is constructed and thus it's relevant here.
Consider a contact form that exists in the header of every page of a site. The form is "hidden" behind a button and only rolls out if the user engages with it. The form includes a "country" and "state/province" selector, both of which are dropdowns. Between the two dropdowns you have more than 500 discrete <option/> elements. For a multi-language site, these options must be represented as content items within Sitecore. Generating this list involves a significant amount of CMS data processing and produces a large amount of HTML that dilutes the SEO relevance of the page by filling the top 500KB of the document with generic facts. Having all 500 of these options (indeed, having the form at all) included in the original page request is inefficient and hurts SEO.
The solution is to remove the form from the hosting page, and only deliver it when the button to display it is clicked. This AJAX approach has the following benefit:
We've been going through common performance problems in order of expense to fix, and we've only just started to talk about Sitecore development issues. Because Sitecore problems are a broad topic, we're going to start with the lowest hanging fruit: mistakes made by junior Sitecore developers that directly impact Rendering performance. These mistakes are the cheapest to fix, and are where I start every diagnostic of a poorly performing Sitecore installation.
If I encounter a poorly performing Sitecore installation, I can almost guarantee the developers didn't activate any of the built-in Rendering caching flags. The side effect of this is that every page component must ask the Sitecore Data layer for relevant Items on every request. The "average" page in Sitecore can reference 20 to 100 discrete content items. Depending on other settings, this can cause a large amount of memory churn and database access. Output cache management is generally thought of as an "end game" optimization, but in reality output caching is an architectural element that must be carefully planned into your installation.
The visible "Header" of any modern website consists of many parts:
Junior developers will often start programming a header as a single component. This causes all kinds of problems because you cannot cache the entire header:
Historically, Primary Navigation is a demanding component from a data retrieval perspective. Not only does one have to interrogate a few hundred items for links, but the relationship of those links to the currently viewed page must also be divined. Caching the Primary Navigation on a page-by-page basis is critical to getting Sitecore performance where it needs to be. (This also applies to Fat Footers and the like.)
Instead of having a single Header, the Header should be broken out into a series of placeholders, each of which holds a Rendering that can be cached based on Sitecore's available cache parameters: Data, Context Item, Language, Querystring, User, etc...
The key development philosophy is to keep Sitecore page components as small as physically possible. This is usually determined by the uniqueness of the data they're displaying.
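To see why small components cache better, it helps to think of each Rendering's output cache entry as being stored under a key built from whatever that Rendering varies by. The following is a conceptual model only, not Sitecore's actual cache key format; the names are hypothetical, and it assumes language is always part of the key.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class RequestContext:
    language: str
    context_item: str   # path of the page being rendered
    query_string: str
    user: str


def cache_key(rendering: str, vary_by: Set[str], ctx: RequestContext, datasource: str = "") -> str:
    """Compose an output-cache key from only the dimensions a rendering varies by."""
    parts = [rendering, f"lang={ctx.language}"]
    if "data" in vary_by:
        parts.append(f"data={datasource}")
    if "context_item" in vary_by:
        parts.append(f"item={ctx.context_item}")
    if "query_string" in vary_by:
        parts.append(f"qs={ctx.query_string}")
    if "user" in vary_by:
        parts.append(f"user={ctx.user}")
    return "|".join(parts)
```

In this model a logo Rendering that varies by nothing shares one cache entry across the whole site, while a Primary Navigation that varies by context item gets one entry per page. A monolithic header would have to vary by everything its most volatile part varies by, which is why it effectively cannot be cached.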
An MVC "Partial View" allows the developer to reference one HTML fragment from another. They can also pass information from the "parent" fragment to the "child" fragment. The problem with this technology is that it's completely invisible to Sitecore.
Developers should never use Partials in Sitecore. Instead, reference Partials as full-fledged Renderings bound to Placeholders for maximum caching, personalization, and multivariate testing possibilities.
Similar to "Partial Views," Child Actions are a way for developers to reference controller-hosted functionality from within a "parent" view. They have exactly the same problems as "Partial Views" in Sitecore: they cannot be cached, designed, or debugged, and they are susceptible to losing mission-critical request context.
Developers should never use Child Actions in Sitecore. Instead, convert Child Actions to full-fledged Renderings bound to Placeholders for maximum caching, personalization, and multivariate testing possibilities.
Sitecore remedial developer training does not cover extensive use of the ContentSearch API, which is the data retrieval API backed by Solr indexes. As such, junior Sitecore developers will instead lean on the older XPATH API to retrieve bulk Sitecore Items. The XPATH API is good for two purposes:
When developers need to retrieve a series of Items that may be scattered throughout the content tree, or when they need to retrieve thousands of Items, the XPATH API lacks the performance to handle the task quickly and will rapidly consume available server power if the site is under load.
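The difference in cost can be illustrated with a toy content tree. This is a conceptual sketch in Python, not the Sitecore API: `find_by_walking` stands in for a descendant-style query that touches every item on every call, and `build_index` stands in for a search index that is built once and then answers lookups directly. All names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Item:
    name: str
    template: str
    children: List["Item"] = field(default_factory=list)


def find_by_walking(root: Item, template: str) -> List[str]:
    """Descendant-style query: visits every item under root on every call."""
    found, stack = [], [root]
    while stack:
        item = stack.pop()
        if item.template == template:
            found.append(item.name)
        stack.extend(item.children)
    return sorted(found)


def build_index(root: Item) -> Dict[str, List[str]]:
    """Index-style query: walk the tree once, then answer lookups from a dictionary."""
    index: Dict[str, List[str]] = defaultdict(list)
    stack = [root]
    while stack:
        item = stack.pop()
        index[item.template].append(item.name)
        stack.extend(item.children)
    return {template: sorted(names) for template, names in index.items()}
```

With a few thousand items and many concurrent requests, repeating the full walk per request is where the server power goes; the index pays that cost once, at indexing time.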
The solution is to move all bulk item retrieval to the ContentSearch API. However:
Developers who became familiar with Sitecore during the version 7 series were introduced to "native" Sitecore APIs created for the "brand new" WYSIWYG Experience Editor. However, these APIs were never meant to be used for public content delivery. They present such a security risk that they're disabled by default on "Content Delivery" servers. They also lack the cache layers of the standard HttpRequest pipeline and do not scale. Sites with poor performance and a lot of AJAX calls tend to be suffering from inappropriate use of this particular Sitecore feature.
The ItemService API:
Rather than use the API "folder," the best approach is to use Item Controllers, and treat any AJAX API calls as if they were page URLs within a given site. This has the following benefits:
In my experience, a few bad choices in Sitecore architectural design can render an installation hopeless. In this section, we'll again be looking at problems in order of expense to fix. If you need to start changing your Sitecore architecture, be aware that you're about to make a significant investment.
Keep in mind that I'm limiting the conversation to development problems that can negatively impact performance. This isn't an exhaustive list of Sitecore faux pas.
Between late versions of Sitecore 6 and the introduction of Sitecore 9 an ORM craze struck the Sitecore developer community. There was a desire to further abstract Sitecore's Item objects into a more class-like structure that closely mirrored the Template structure defined by developers. "Glass Mapper" became the de-facto open source solution to this problem and was implemented broadly. However, Glass Mapper and similar technologies have a number of performance downsides:
While out of the scope of performance problems, Glass Mapper introduces a number of compile-time and DevOps concerns as well. The Sitecore Developer community has almost universally moved away from Glass Mapper as best practice in favor of lighter-weight, high performing solutions.
When a Sitecore solution has performance issues getting data out of the database, the culprit is often bad content tree design. A junior school of thought tends to organize content by silos of content type. While this may work out for a "product catalog," it breaks down quickly when one is organizing page fragments. The Information Architect designing the content tree needs to look at a given Page in the tree as well as the related Items it's likely to access. These should be stored as close to the page Item as practically possible, depending on whether the Page has exclusive access to that data or whether the data is shared by multiple pages. If a page consists of renderings that reference Items scattered broadly throughout the content tree, it becomes very difficult to build queries to grab appropriate data in an efficient manner. Bad content tree design leads to inefficient or slow XPATH statements or an over-use of the ContentSearch API, which can overload your backing Solr installation.
Some developers over-embrace the concept of reusable components, or use an off-the-shelf framework like Sitecore SXA to build up a site from a single concept of "page." While this provides tremendous flexibility to the Content Author, who can put anything anywhere, it creates scenarios where it's extremely difficult to locate specific content when establishing lists, faceted search, taxonomies, or even simply "related content." Effectively a variant on "Bad Content Tree Design" above, getting data out of Sitecore becomes extremely resource intensive, which introduces performance problems.
All websites today need to support the "https" protocol. Developers that are unfamiliar with the way DNS resolves, or the way Windows IIS handles protocols, will frequently program "http to https" shunting within Sitecore renderings. Here are the causes:
Using Sitecore to redirect unencrypted (http) traffic to encrypted (https) requests is a very slow, CPU-intensive process. A Sitecore implementation of this redirect structure also tends to be haphazard, which can produce difficult to replicate runtime errors. While this seems like a simple fix, developers tend to "hunt" for solutions to this problem and it may take a significant amount of time to untangle their efforts, particularly if they've ignored, bent, or replaced Sitecore's built in link management system.
When it debuted in Sitecore 6.2, the "OMS" product (now "XDB" or simply "XP") added the ability for content authors to "personalize" any page component without programmer intervention. This highly desirable function unfortunately has a number of performance side effects, which is why in 2021 Sitecore purchased Boxever and now offers a Jamstack/SaaS-based approach with their "Sitecore Personalize" product.
Sitecore XDB Performance Liabilities:
We see a lot of installations where use of Sitecore's in-system analytics data and Marketing Automation is a "phase II" item that never gets the attention it deserves. Developers are asked to "turn it on" but the system is not given any specific objectives, except possibly storing all form data from the Sitecore Forms module. On busy sites, this lights the fuse on a runtime problem that may rear its head a week, month, or year down the line, almost certainly during a peak traffic time, and without warning. Here's what breaks:
The promise of Sitecore SXA is to remove all custom server-side development from the platform in favor of a WIX/Squarespace style HTML/Design framework. This provides Content Authors with the ability to "wireframe" pages up and send them to HTML developers for styling. From a performance perspective, SXA introduces some challenges:
Although Sitecore JSS is "Headless" as typically implemented, it is not 100% "Jamstack." Requests from visitors are processed on a server, and the page is assembled before being sent to the browser. A typical JSS installation replaces ASP.NET MVC with a Node.js server within your Sitecore installation. This Node server is what responds to visitor requests. Behind Node, there is either a Content Delivery server or Experience Edge responsible for providing data to Node in real time. Like any Sitecore installation, a JSS installation requires careful programming and sufficient infrastructure to handle your visitor load effectively.
Getting high performance out of a JSS installation requires very specific approaches:
Often the complaint "Sitecore is slow" doesn't come from the page analytics team, but from the content authoring group. Ensuring that Sitecore is reliable and easy to use for content maintenance is absolutely key to the success of the installation. Let's look at the most common problems encountered:
Sitecore Item Cloning was a unique solution to an intractable problem, but it's never the best solution to the problem. Cloning overrides Sitecore's default field value resolver to allow you to essentially copy one Item and maintain the reference back to the original, to keep the two in lock step. Aside from the Content Authoring challenges this system exposes, Cloning creates some very real performance problems:
If a Sitecore system was implemented with Cloning as a core strategy, the best solution is usually to start from scratch and re-implement the system without cloning.
Sitecore Language Fallback is a feature that allows a page component to display an alternate language should there be no data for the context language. This technology was released in Sitecore version 7 series, and pre-dates the idea of "Final Renderings" and language-specific page layout. Aside from a lack of compatibility with the more modern Presentation Details structure, Language Fallback causes performance problems during page response generation, as for each Item referenced by the page, it must walk through all installed System languages looking for the "best fit" content.
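The per-Item cost can be pictured with a small model. This is a simplified sketch, not Sitecore's resolver: it assumes each language is configured with a single language to fall back to, and it counts how many lookups one field needs before a value turns up. The fallback configuration shown is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical fallback configuration: each language names the language it falls back to.
FALLBACK: Dict[str, Optional[str]] = {"fr-CA": "fr-FR", "fr-FR": "en", "en": None}


def resolve_field(values: Dict[str, str], language: str) -> Tuple[Optional[str], int]:
    """Return (value, lookups) after walking the fallback chain for one field."""
    lookups = 0
    current: Optional[str] = language
    visited = set()                      # guards against a circular configuration
    while current is not None and current not in visited:
        visited.add(current)
        lookups += 1
        value = values.get(current)
        if value:
            return value, lookups
        current = FALLBACK.get(current)
    return None, lookups
```

A fully translated field resolves in one lookup; an untranslated one costs a lookup per language in the chain. Multiply that by every field on every Item a page references and the overhead during page generation becomes visible.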
As for mitigation, Language Fallback can be "disabled" by bringing all in-system languages into full coverage, rendering fallback unnecessary. If 1:1 translation is not an option, significant content tree organization to separate language options and regionalized sites may be required. Extensive regression testing will also be required to ensure disabling Language Fallback will not introduce runtime errors.
Sitecore is an incredibly flexible framework for designing enterprise websites. But the solution will only be as good as the implementer. All programming, from SQL to PHP, suffers from the same liabilities. A programmer building a website that will support a significant number of visitors needs to think about problems in a very different way than a programmer designing a single-user desktop application. Every aspect of getting data out of a database, formatting it for display, and delivering it to the browser needs to be tested against realistic traffic expectations. That said, optimization at every level can get expensive. Here's some guidance on how to attack the problem: