Thursday, September 22nd, 2022

How to customize the default WordPress sitemap


After years of relying on plugins for that most basic of SEO tools, WordPress finally has a sitemap. I’ve been testing it for a month or so, and the results are OK.

The default WordPress sitemap is a crude instrument. It lists every post, page, category and tag on your site without any filtering. For the most basic SEO this might be OK, but you may want to tweak it a little: perhaps removing things that could weaken your security, or content you don’t want indexed so easily.

In this post I’ll take a deeper look at the default WordPress sitemap and offer suggestions (with code) for how to improve and tailor it.


What is the default WordPress sitemap?

Problems with the default WordPress sitemap

Changing the default behaviour

How to hide authors from search engines

How to hide specific posts and pages

How to hide specific tags and categories

How to disable the sitemap completely

Bottom line: A useful addition you can tailor

What is the default WordPress sitemap?

A sitemap is a list of all the content on your site that search engines use to help build their indexes. It can be as simple as a long list of links, or a structured XML file organised into an index that points to separate lists of posts, pages and archives. WordPress takes this latter approach.
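To show the shape of that file, here’s a minimal PHP sketch that builds a sitemap index in the standard sitemaps.org XML format, the same kind of document WordPress serves at wp-sitemap.xml. The function name and URLs are made up for illustration; WordPress builds its own index, so you don’t need this on a live site.

```php
<?php
/**
 * Hypothetical helper: build a sitemap index in the sitemaps.org format.
 * WordPress generates its own index; this only illustrates the structure.
 *
 * @param string[] $sitemap_urls Absolute URLs of the child sitemaps.
 * @return string The XML document.
 */
function example_build_sitemap_index( array $sitemap_urls ): string {
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ( $sitemap_urls as $url ) {
        // Escape characters such as & so each URL is valid inside XML.
        $loc  = htmlspecialchars( $url, ENT_XML1 | ENT_QUOTES, 'UTF-8' );
        $xml .= "  <sitemap><loc>{$loc}</loc></sitemap>\n";
    }

    return $xml . "</sitemapindex>\n";
}

echo example_build_sitemap_index( array(
    'https://example.com/wp-sitemap-posts-post-1.xml',
    'https://example.com/wp-sitemap-posts-page-1.xml',
) );
```

Each `<sitemap>` entry points to another XML file that lists the actual URLs, which is how WordPress splits posts, pages and archives into separate lists.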

Add a new post, page, category or tag to your WordPress site, and it’s added to the sitemap. The next time a search engine’s crawler visits your site, it can read the sitemap, see the new content and add it to its index. This is simpler and more efficient than checking every page and archive on your site for anything new.

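If you’re running WordPress 5.5 or later, you should already have a default sitemap (unless Settings > Reading is set to discourage search engines from indexing your site, which switches it off). You’ll find it by typing yourdomain/wp-sitemap.xml into your browser, where yourdomain is your site’s web address.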

Screenshot: the default sitemap for this site before being customized.

By default, the WordPress sitemap lists:

  • all your posts
  • all your pages
  • every category with at least one post assigned (which can include “Uncategorized”)
  • every tag with at least one post assigned
  • all your users who have published content
  • every post format you’ve used (such as aside, image or gallery)

These lists are split across separate sitemap pages, all linked from the index at wp-sitemap.xml.
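Each of those pages has a predictable address. With pretty permalinks, WordPress 5.5 and later names them wp-sitemap-{provider}-{subtype}-{page}.xml, where the provider is posts, taxonomies or users and the subtype is a post type or taxonomy name (the users sitemap has no subtype). The helper below is hypothetical and simply reproduces that pattern so you can check a specific list in your browser; sites using plain permalinks get query-string addresses instead.

```php
<?php
/**
 * Hypothetical helper: the URL of one page of a core WordPress sitemap,
 * assuming pretty permalinks. $provider is 'posts', 'taxonomies' or
 * 'users'; $subtype is a post type or taxonomy name (none for users).
 * Leave $provider empty to get the sitemap index itself.
 */
function example_sitemap_url( string $home, string $provider = '', string $subtype = '', int $page = 1 ): string {
    $home = rtrim( $home, '/' );

    if ( '' === $provider ) {
        return $home . '/wp-sitemap.xml';
    }

    $parts = array( 'wp-sitemap', $provider );
    if ( '' !== $subtype ) {
        $parts[] = $subtype; // e.g. 'post', 'page', 'category', 'post_tag'
    }
    $parts[] = (string) $page;

    return $home . '/' . implode( '-', $parts ) . '.xml';
}

echo example_sitemap_url( 'https://example.com', 'taxonomies', 'post_tag' ), "\n";
echo example_sitemap_url( 'https://example.com', 'users' ), "\n";
```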

Problems with the default WordPress sitemap

While it’s useful, the default sitemap has some security and SEO issues:

  • it exposes the usernames of everyone who has published content, which can help attackers find a way into your site
  • tags you use to control functionality on the blog, such as one that hides content from certain searches, are exposed
  • content you want hidden from search engines, such as “thanks for subscribing” pages, is also exposed.

Changing the default behaviour

As with most things in WordPress, these problems can be worked around with a little extra code. WordPress provides filters for the sitemap, and the pattern of writing a function and hooking it to a filter will be familiar if you’ve written code for WordPress before.

Posts and Pages

As you might expect, the post and page sitemaps are built with WP_Query. Just as you can change WP_Query arguments to produce lists of posts in your theme, you can change them to control what appears in the sitemap.
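Here’s a minimal sketch of that idea, assuming you keep the arguments for hidden content in one hypothetical helper, custom_hidden_content_args(), so your theme queries and the sitemap stay in step. The post IDs are placeholders, and the add_filter() call is wrapped in a function_exists() check only so the file can also run outside WordPress.

```php
<?php
// Hypothetical helper: the WP_Query arguments for content you don't want
// listed, defined once. The post IDs are placeholders.
function custom_hidden_content_args(): array {
    return array( 'post__not_in' => array( 12, 34 ) );
}

// Sitemap: add the shared arguments to the ones WordPress passes in.
// Note array_merge() replaces any post__not_in already in $args.
function custom_sitemap_shared_args( $args, $post_type ) {
    return array_merge( $args, custom_hidden_content_args() );
}

// Register the filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_posts_query_args', 'custom_sitemap_shared_args', 10, 2 );
}

// A theme template could reuse the same arguments, for example:
// $recent = new WP_Query( array_merge( array( 'post_type' => 'post' ), custom_hidden_content_args() ) );
```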

Indexes and archives

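Category and tag archives can be changed in a similar way, although their arguments go to a term query (WP_Term_Query) rather than WP_Query. The important exception is the author index, which is built from a user query; the simplest fix there is to switch it off entirely.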

Hooks and filters

Customising the default sitemap requires the usual mix of writing functions and hooking them to filters. There are over two dozen sitemap filters and hooks, although I’ve found only three are needed for general content customising:

  • wp_sitemaps_posts_query_args, for manipulating lists of posts and pages
  • wp_sitemaps_add_provider, for turning whole sitemaps (posts, taxonomies or users) on or off
  • wp_sitemaps_taxonomies_query_args, for manipulating categories and tags
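As a starting point, this skeleton registers all three filters with the number of arguments each one passes to your function. The function names are placeholders, and each callback returns its input unchanged, so adding it on its own won’t alter your sitemap.

```php
<?php
// Skeleton: the three sitemap filters, each registered with the number
// of arguments it passes. Function names are placeholders.

// $args: WP_Query arguments. $post_type: e.g. 'post' or 'page'.
function custom_sitemap_posts_args( $args, $post_type ) {
    return $args;
}

// $provider: the provider object. $name: 'posts', 'taxonomies' or 'users'.
// Return false to switch that whole sitemap off.
function custom_sitemap_provider( $provider, $name ) {
    return $provider;
}

// $args: term query arguments. $taxonomy: e.g. 'category' or 'post_tag'.
function custom_sitemap_taxonomy_args( $args, $taxonomy ) {
    return $args;
}

if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_posts_query_args', 'custom_sitemap_posts_args', 10, 2 );
    add_filter( 'wp_sitemaps_add_provider', 'custom_sitemap_provider', 10, 2 );
    add_filter( 'wp_sitemaps_taxonomies_query_args', 'custom_sitemap_taxonomy_args', 10, 2 );
}
```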

Hardcoding vs configuration

Throughout the examples I’ve used hardcoded values. This isn’t me saying “this is what you should do”; it’s just to keep the example code clean and focused. You can take this approach, or use configuration variables and meta fields.

Nor have I included any checks on whether values have already been set. Again, this is to keep the example code easier to read.

Where to put the code

Opinions vary on where the extra code should go. Because it changes how WordPress behaves regardless of which theme is active, I suggest putting it in a plugin specific to your site rather than in your theme’s functions.php. That way, if you update or change the look and feel of your site later, you won’t accidentally damage your SEO.
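A site-specific plugin can be a single PHP file in its own folder under wp-content/plugins, with a header comment that WordPress reads to list it on the Plugins screen. The plugin name, description and file name (for example my-site-sitemap-tweaks/my-site-sitemap-tweaks.php) are placeholders. Files placed directly in wp-content/mu-plugins load automatically as “must-use” plugins, which is another option if you don’t want the code switched off by accident.

```php
<?php
/**
 * Plugin Name: My Site Sitemap Tweaks
 * Description: Site-specific changes to the default WordPress sitemap.
 * Version:     1.0.0
 */

// Stop if this file is loaded directly rather than by WordPress.
if ( ! defined( 'ABSPATH' ) ) {
    exit;
}

// Add the sitemap filters from this post below this line.
```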

How to hide your authors from search engines

As a minimum you should hide your authors. The author sitemap lists author archive URLs, which are usually based on usernames, and knowing a username helps attackers find a way into your site.

The following code removes the authors index:

function custom_remove_author_sitemap( $provider, $name ) {
    // I disable the author sitemap
    if ( $name === 'users' ) {
        return false;
    }
    return $provider;
}

add_filter( 'wp_sitemaps_add_provider', 'custom_remove_author_sitemap', 10, 2 );
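The same filter can switch off any of the three core sitemap providers, which WordPress names posts, taxonomies and users. This sketch keeps the names in a list so you can disable more than the author index later. The constant and function names are hypothetical, and only users is disabled here.

```php
<?php
// Core sitemap providers to switch off: 'posts', 'taxonomies' or 'users'.
// The constant name is hypothetical; only the author index is disabled here.
const CUSTOM_DISABLED_SITEMAP_PROVIDERS = array( 'users' );

function custom_disable_sitemap_providers( $provider, $name ) {
    // Returning false stops WordPress adding this provider's sitemaps.
    if ( in_array( $name, CUSTOM_DISABLED_SITEMAP_PROVIDERS, true ) ) {
        return false;
    }
    return $provider;
}

if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_add_provider', 'custom_disable_sitemap_providers', 10, 2 );
}
```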

How to hide specific posts and pages from the sitemap

You may have content you don’t want to reveal to search engines. For example, if you have a page visitors only see after subscribing to your newsletter, you won’t want it to appear in your sitemap.

There are four ways you can do this, but let’s start with the basic code:

function custom_exclude_posts_sitemap( $args, $post_type ) {
	// I remove specific posts from the sitemap

	// Filtering code here

	return $args;
}

add_filter( 'wp_sitemaps_posts_query_args', 'custom_exclude_posts_sitemap', 10, 2 );

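Everything we do within this function changes $args, the WP_Query arguments WordPress uses to build each post and page sitemap. The filter also passes the post type being built as $post_type, so you can treat posts and pages differently.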

by post ID

If you know the IDs of the posts you want to exclude, list them in the post__not_in argument as an array. You’ll need to keep the list up to date as you add and remove posts and pages.

$exclude_posts = array( 1, 2, 3, 4 ); // replace with your own post IDs
$args['post__not_in'] = $exclude_posts;
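Setting post__not_in directly replaces anything already there. If another plugin or filter might also be excluding posts, a safer sketch merges the two lists. The function name and IDs are placeholders; you would register it in place of the basic function above.

```php
<?php
// Add post IDs to post__not_in without discarding IDs that another
// plugin or filter may already have set. The IDs are placeholders.
function custom_exclude_posts_by_id( $args, $post_type ) {
    $exclude_posts = array( 12, 34, 56 );

    $existing = isset( $args['post__not_in'] ) ? (array) $args['post__not_in'] : array();

    // Merge, drop duplicates and re-index so WP_Query gets a clean list.
    $args['post__not_in'] = array_values( array_unique( array_merge( $existing, $exclude_posts ) ) );

    return $args;
}

if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_posts_query_args', 'custom_exclude_posts_by_id', 10, 2 );
}
```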

by tag or category

If you’re using a specific tag or category to flag posts as “hidden”, you can use the tag__not_in and category__not_in arguments. As with post__not_in, they accept arrays of IDs.

For example, if you have a tag “Hide from Sitemap” with the ID 4, you would use:

$args['tag__not_in'] = array( 4 );

Any post that has “Hide from Sitemap” as a tag won’t appear in the default sitemap.

To hide posts in the category with the ID 5, use:

$args['category__not_in'] = array( 5 );
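Put together in the basic function, a sketch that handles both flags might look like this. The IDs 4 and 5 are the examples from above; you can find your own by editing the tag or category and looking for tag_ID in the address bar. Because pages don’t have tags or categories, the sketch only changes the arguments for regular posts, and the function name is hypothetical.

```php
<?php
// Hide posts flagged with the tag ID 4 or the category ID 5 (example IDs).
function custom_exclude_flagged_posts( $args, $post_type ) {
    // Pages have no tags or categories, so leave their arguments alone.
    if ( 'post' !== $post_type ) {
        return $args;
    }

    $args['tag__not_in']      = array( 4 );
    $args['category__not_in'] = array( 5 );

    return $args;
}

if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_posts_query_args', 'custom_exclude_flagged_posts', 10, 2 );
}
```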

by a single custom field

If you’re using a single custom field to decide whether a post should be included, you can use the ‘meta_query’ argument. This takes an array of arrays, each with a key (the field name), a value and a comparison.

For example, say you have a custom field called “index_this” that is set to “yes” if you want the post to appear, and “no” if you don’t. To exclude any post where it’s “no”, you would use:

$exclude_index_this_no = array(
	'key'     => 'index_this',
	'value'   => 'no',
	'compare' => '!=',
);

$args['meta_query'] = array( $exclude_index_this_no );

Alternatively, if you only wanted to include posts where index_this is set to “yes”, you would use:

$include_index_this_yes = array(
	'key'     => 'index_this',
	'value'   => 'yes',
	'compare' => '=',
);

$args['meta_query'] = array( $include_index_this_yes );
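One catch with the “!=” version: a meta_query comparison only matches posts that actually have the index_this field, so posts without it drop out of the sitemap too. If you’d rather keep those, this sketch combines two clauses with an OR relation, so a post is included when the field isn’t “no” or doesn’t exist at all. The function name is hypothetical.

```php
<?php
// Keep posts where index_this is anything but "no", and also posts
// that have no index_this field at all.
function custom_index_this_meta_query( $args, $post_type ) {
    $args['meta_query'] = array(
        'relation' => 'OR',
        array(
            'key'     => 'index_this',
            'value'   => 'no',
            'compare' => '!=',
        ),
        array(
            'key'     => 'index_this',
            'compare' => 'NOT EXISTS',
        ),
    );

    return $args;
}

if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_posts_query_args', 'custom_index_this_meta_query', 10, 2 );
}
```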

by a custom field array

To improve database performance, some plugins store several settings together as an array in a single custom field. Instead of each setting getting its own database row, WordPress serializes the whole array into one value. For example, our “index_this” field saved in an array might become seo_settings[index_this].

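This approach can be more efficient from a database perspective, and arguably makes for easier code as well. However, it makes filtering posts more cumbersome: because the whole array is stored as one serialized string, ‘meta_query’ can’t reliably compare a single value inside it. I’ve not found a reliable way to use ‘meta_query’ here.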

The solution I am using is to cycle through the posts or pages being added to the sitemap, collect the IDs of those whose “index_this” is set to “no” and set ‘post__not_in’ to this array. The sitemap then excludes all of the identified posts.

// This is our list of IDs to exclude from the sitemap
$exclude_id = array();

// Set up a query for all published content of the post type being shown
$tmp_args = array(
	'nopaging'    => true,
	'post_type'   => $post_type,
	'post_status' => 'publish',
);

// Get everything that matches the above
$working_posts = get_posts( $tmp_args );

// Cycle through the list
foreach ( $working_posts as $this_post ) :
	// Get the SEO settings for this post
	$seo_data = get_post_meta( $this_post->ID, 'seo_settings', true );

	// Older posts may have no $seo_data: treat them as "yes" so they're included.
	// Change this fallback to "no" to exclude them instead.
	if ( empty( $seo_data['index_this'] ) ) :
		$index_this = 'yes';
	else :
		$index_this = $seo_data['index_this'];
	endif;

	if ( 'no' === $index_this ) :
		// Add the post's ID to the list of posts to exclude
		$exclude_id[] = $this_post->ID;
	endif;
endforeach;

// Finally, tell the sitemap to exclude all of the hidden posts we found
$args['post__not_in'] = $exclude_id;
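Here’s the same logic wrapped in a complete filter function, as a sketch. I’ve pulled the yes/no decision into a small helper so it’s easy to test, fetched only post IDs, and loaded their meta in one go with update_postmeta_cache() rather than one query per post. The function names are hypothetical; seo_settings and index_this are the example field names from above.

```php
<?php
// Hypothetical helper with no WordPress calls: decide whether a post's
// serialized SEO settings mark it as hidden. Missing settings mean "include".
function custom_is_hidden_by_seo_settings( $seo_data ) {
    if ( ! is_array( $seo_data ) || ! isset( $seo_data['index_this'] ) ) {
        return false;
    }
    return 'no' === $seo_data['index_this'];
}

function custom_exclude_by_seo_settings( $args, $post_type ) {
    // Fetch only the IDs of published content of this post type.
    $post_ids = get_posts( array(
        'nopaging'    => true,
        'post_type'   => $post_type,
        'post_status' => 'publish',
        'fields'      => 'ids',
    ) );

    // Load the meta for all of these posts in a single query.
    update_postmeta_cache( $post_ids );

    $exclude_id = array();
    foreach ( $post_ids as $post_id ) {
        $seo_data = get_post_meta( $post_id, 'seo_settings', true );
        if ( custom_is_hidden_by_seo_settings( $seo_data ) ) {
            $exclude_id[] = $post_id;
        }
    }

    $args['post__not_in'] = $exclude_id;
    return $args;
}

if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_posts_query_args', 'custom_exclude_by_seo_settings', 10, 2 );
}
```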

Although this looks like a lot of code, it only runs when a post or page sitemap is requested, which is usually by a search engine crawler. Ordinary page views aren’t affected.

A word on indexing

This code only hides posts from the sitemap. It doesn’t stop search engines from finding or indexing them by other means. For example, if they’re linked from your home page, they can still be indexed.

To make indexing less likely, you need to add code to your theme, such as a “noindex” robots meta tag on those pages. This is a topic I will cover in a future post.

How to hide specific tags and categories

Sometimes we use tags or categories to control behaviour in a theme, but don’t want them exposed to the outside world. To hide them from the default WordPress sitemap, we can use the following code:

function custom_exclude_tag_or_category( $args ) {
	// I exclude specific tags or categories from the default WordPress sitemap
	$args['exclude'] = array( 1, 2, 3 );

	return $args;
}

add_filter( 'wp_sitemaps_taxonomies_query_args', 'custom_exclude_tag_or_category' );

This tells the sitemap to exclude any tag or category that has the ID 1, 2 or 3. 
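The filter also passes the taxonomy name as a second argument, so you can exclude different terms from categories and tags. In this sketch the function name and IDs are placeholders; “Uncategorized” is often ID 1 on a fresh install, but check yours first.

```php
<?php
// Exclude terms per taxonomy, using the second argument the filter passes.
// Term IDs are placeholders.
function custom_exclude_terms_per_taxonomy( $args, $taxonomy ) {
    $exclude = array(
        'category' => array( 1 ),    // e.g. "Uncategorized"
        'post_tag' => array( 2, 3 ),
    );

    if ( isset( $exclude[ $taxonomy ] ) ) {
        $args['exclude'] = $exclude[ $taxonomy ];
    }

    return $args;
}

if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_sitemaps_taxonomies_query_args', 'custom_exclude_terms_per_taxonomy', 10, 2 );
}
```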

How to disable the sitemap completely

If you want to remove the WordPress sitemap completely, use the following code:

add_filter( 'wp_sitemaps_enabled', '__return_false' );

It’s worth checking whether any plugins you’re using have already done this, or need it to stay active to work. This is particularly important if you’re using a plugin for SEO.

Bottom line: A useful addition you can tailor

Adding a default sitemap is a useful addition to the WordPress toolkit. At first glance it looks a little crude as it reveals everything on your site. With a little patience and some basic coding skills, it can be tamed and tailored to something quite useful and powerful in its own right.

For those of us running our sites without using burdensome plugins and heavy custom code, it makes life a lot easier.

My name is Ross Hori

I'm a freelance writer, designer and photographer. By day I create articles, features and reports. At night I take photos and write fiction.