What are XML sitemaps?
An XML sitemap is an XML file for search engines that lists the URLs on a particular domain. It supplements regular indexing, in which a bot or crawler visits each page of a site on its own.
You want Google to crawl every important page of your website, but sometimes pages end up without any internal links pointing to them, making them hard to find. An XML sitemap lists a website’s important pages, making sure Google can find and crawl them all, and helping it understand your website structure.
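To make the idea of a list of important pages concrete, here is a minimal sketch of how such a file could be generated. It assumes the standard sitemaps.org format (a `urlset` of `url` entries with `loc` and `lastmod`); the function name `build_sitemap` and the example.com URLs are hypothetical, and this is not how the Yoast SEO plugin itself builds its sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for a list of (url, last_modified) pairs.

    `pages` is a hypothetical input shape chosen for this sketch.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # lastmod tells the crawler when the page was last updated.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = last_modified
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        ("https://example.com/", "2024-01-15"),
        ("https://example.com/about/", "2024-02-01"),
    ]))
```

Each `url` entry carries the page address and the date it last changed, which are the two things a crawler needs to find the page and decide whether to revisit it.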
What websites need an XML sitemap?
Google’s documentation says XML sitemaps are beneficial for “really large websites”, for “websites with large archives”, for “new websites with just a few external links to it” and for “websites which use rich media content”.
Here at Yoast, while we agree that these kinds of websites will definitely benefit the most from having one, we think XML sitemaps are beneficial for every website. Every single website needs Google to be able to easily find the most important pages and to know when they were last updated, which is why this feature is included in the Yoast SEO plugin.
Which pages should be in your XML sitemap?
How do you decide which pages to include in your XML sitemap? Always start by thinking of the relevance of a URL: when a visitor lands on a particular URL, is it a good result? Do you want visitors to land on that URL? If not, it probably shouldn’t be in your XML sitemap. However, leaving a URL out of your XML sitemap doesn’t mean Google won’t index it: if Google can find the URL by following links, Google can index it. If you really don’t want that URL to show up in the search results, you’ll need to add a ‘noindex, follow’ robots meta tag to the page.
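The two decisions described here are separate: whether a URL goes in the sitemap, and whether the page itself tells search engines not to index it. A minimal sketch of that logic, assuming a hypothetical `Page` record with a `good_search_result` flag that you set yourself (the class, field, and function names are illustrative only):

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    # Hypothetical flag: would a visitor be well served by landing here?
    good_search_result: bool


def robots_meta(page):
    """Return the robots meta tag to place in the page's <head>."""
    if page.good_search_result:
        return '<meta name="robots" content="index, follow">'
    # Keep the page out of search results, but let crawlers follow its links.
    return '<meta name="robots" content="noindex, follow">'


def sitemap_urls(pages):
    """Return only the URLs that belong in the XML sitemap."""
    return [page.url for page in pages if page.good_search_result]


if __name__ == "__main__":
    pages = [
        Page("https://example.com/guide/", True),
        Page("https://example.com/thank-you/", False),
    ]
    print(sitemap_urls(pages))
    for page in pages:
        print(page.url, robots_meta(page))
```

Leaving the second URL out of `sitemap_urls` only stops you from advertising it; it is the `noindex, follow` tag that actually keeps it out of the search results.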
Example 1: A new blog
Say, for example, you are starting a new blog. You will want Google to find new posts quickly to make sure your target audience can find your blog on Google, so it’s a good idea to create an XML sitemap right from the start. You might create a handful of first posts and categories for them as well as some tags to start with. But there won’t be enough content yet to fill the tag overview pages, making them “thin content” that’s not valuable to visitors – yet. In this case, you should leave the tag URLs out of the sitemap for now. Also set the tag pages to ‘noindex, follow’, because you don’t want people to find them in search results.
Example 2: Media and images
The ‘media’ or ‘image’ XML sitemap is unnecessary for most websites. This is because your images are probably used within your pages and posts, so they will already be included in your ‘post’ or ‘page’ sitemap. A separate ‘media’ or ‘image’ sitemap would therefore be pointless, and we recommend leaving it out. The only exception is if images are your main business. Photographers, for example, will probably want to show a separate ‘media’ or ‘image’ XML sitemap to Google.
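To illustrate how images can already be covered by a ‘post’ or ‘page’ sitemap, the sketch below builds a single `url` entry that lists the images used on that page. It assumes Google’s image sitemap extension namespace; the function name `url_entry_with_images` and the example.com URLs are hypothetical, and the exact output of any particular plugin may differ.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Namespace of Google's image sitemap extension.
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def url_entry_with_images(page_url, image_urls):
    """Return one <url> entry that also lists the images used on the page."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    entry = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
    for image_url in image_urls:
        # Each image is nested inside the entry of the page that uses it.
        image = ET.SubElement(entry, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(url_entry_with_images(
        "https://example.com/my-post/",
        ["https://example.com/uploads/photo-1.jpg"],
    ))
```

Because the images travel with the post entry, a separate list of media URLs would add nothing for most sites.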