What is a Non-Canonical URL in a Sitemap?
A “Non-Canonical URL in my Sitemap” refers to a situation where URLs listed in your website’s sitemap are not the canonical versions of those pages. Canonical URLs are the preferred versions of a set of duplicate or very similar pages on your website. The canonical is the one you want search engines to consider as the main or authoritative version of the page.
When a non-canonical page is listed in your sitemap, the HTML source of that page typically includes a canonical link element that points to a different URL than the one in the sitemap. Here’s how it might look:
Example Scenario
Sitemap : https://example.com/page-a
Canonical : https://example.com/page-b
HTML Source Code of https://example.com/page-a
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Page A</title>
<!-- Canonical Tag Pointing to Page B -->
<link rel="canonical" href="https://example.com/page-b">
</head>
<body>
<h1>This is Page A</h1>
<p>Content of Page A...</p>
</body>
</html>
Issues Demonstrated
1. Sitemap URL: The sitemap contains https://example.com/page-a.
2. Canonical Tag: The HTML of https://example.com/page-a contains a canonical link pointing to https://example.com/page-b.
Implications
1. Search Engine Confusion. Search engines see https://example.com/page-a in the sitemap but are instructed via the canonical tag to consider https://example.com/page-b as the authoritative version.
2. Duplicate Content. Search engines may treat https://example.com/page-a as duplicate content and might not index or rank it as desired.
Correct Approach
To fix this, you should ensure that the page in your sitemap is the canonical URL.
Updated Sitemap
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.com/page-b</loc>
<lastmod>2023-06-01</lastmod>
<priority>0.8</priority>
</url>
</urlset>
HTML Source Code of https://example.com/page-b
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Page B</title>
<!-- Canonical Tag Pointing to Itself -->
<link rel="canonical" href="https://example.com/page-b">
</head>
<body>
<h1>This is Page B</h1>
<p>Content of Page B...</p>
</body>
</html>
By ensuring the sitemap URL matches the canonical link, you provide clear guidance to search engines, helping them index and rank your content correctly.
What Triggers This Issue?
Several factors can trigger the issue of non-canonical URLs appearing in your sitemap. Here are some common triggers:
1. Multiple Versions of a Page
- HTTP vs. HTTPS: Both http://example.com and https://example.com might exist.
- WWW vs. non-WWW: Both http://www.example.com and http://example.com might be accessible.
- Trailing Slashes: Both http://example.com/page and http://example.com/page/ might exist.
2. URL Parameters
Pages might be accessible with various URL parameters (e.g., http://example.com/page?ref=google vs. http://example.com/page). Even though the content is the same, different links might exist for tracking or session purposes.
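The variants described in triggers 1 and 2 can be collapsed programmatically to spot URLs that are really the same page. The following is a minimal sketch using only Python's standard library. The function name normalize_url, the choice of HTTPS without "www" and without a trailing slash as the preferred form, and the list of tracking parameters are all illustrative assumptions, not rules set by any search engine.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that only track visits and do not change content.
TRACKING_PARAMS = {"ref", "utm_source", "utm_medium", "utm_campaign", "sessionid"}

def normalize_url(url):
    """Collapse common URL variants into one assumed preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: the non-www host is preferred
    path = parts.path.rstrip("/") or "/"  # assumption: no trailing slash
    # Keep only the query parameters that are not in the tracking list.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key not in TRACKING_PARAMS]
    return urlunsplit(("https", host, path, urlencode(kept), ""))

variants = [
    "http://example.com/page",
    "http://www.example.com/page/",
    "https://example.com/page?ref=google",
]
normalized = {normalize_url(u) for u in variants}
print(normalized)  # {'https://example.com/page'}
```

If several URLs in your sitemap collapse to the same normalized form, at most one of them is likely to be the canonical version.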
3. Content Management System (CMS) Configuration
Some CMSs can generate multiple URLs for the same content due to different ways of linking or categorizing pages. For example, WordPress can create different URLs for the same post based on the category or tag archives.
4. Pagination
Paginated content might create different URLs for the same canonical page (e.g., http://example.com/page/2 might have a canonical URL of http://example.com/page).
5. Duplicate Content
Intentional or unintentional duplication of content across different URLs can lead to multiple versions of the same content being accessible on your website.
6. Incorrect Canonical Tags
Misconfigured canonical tags can point to the wrong URL, causing the sitemap URL to differ from the intended canonical link.
7. Automated Sitemap Generators
Some automated tools for generating sitemaps might include URLs without properly checking the canonical tags, leading to discrepancies.
8. Manual Sitemap Errors
When sitemaps are manually created or edited, human error can lead to the inclusion of non-canonical URLs.
How to Check It
1. Sitechecker SEO Audit Tool
In our Site Audit section under “Indexability,” you can find different “Canonical” issues. This feature is designed to help you uncover and resolve canonical-related problems that could be affecting your website’s SEO performance. The tool pinpoints issues such as canonical tags pointing to non-HTTPS links, missing canonicals, and canonical URLs set to ‘noindex, nofollow’, among others.
By clicking on “View issue” for each category, you gain access to a detailed list of pages affected by these specific canonical issues.
2. Google Search Console
- Submit your sitemap in Google Search Console.
- Under the “Index” section (labeled “Indexing” in newer versions of the interface), open “Sitemaps” to see whether any issues are reported for the submitted URLs.
- Use the “Coverage” report (since renamed “Pages”) to identify discrepancies between submitted and indexed URLs.
3. Manual Inspection
Download and Inspect Sitemap
- Download your sitemap file (usually located at https://example.com/sitemap.xml).
- Manually check a sample of links in the sitemap against the canonical pages specified in the HTML source of those pages.
View Page Source
- Open the HTML source of the links listed in your sitemap.
- Look for the <link rel="canonical" href="URL"> tag.
- Verify that the link in the canonical tag matches the link listed in the sitemap.
4. Automated Scripts
You can use automated scripts to check for non-canonical links. Here’s a basic Python script using the requests and BeautifulSoup libraries to get you started:
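import requests
from bs4 import BeautifulSoup  # the 'xml' parser used below also requires the lxml package

def get_canonical_url(page_url):
    """Return the canonical URL declared in a page's HTML, or None if absent."""
    response = requests.get(page_url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')
    canonical_link = soup.find('link', rel='canonical')
    if canonical_link and canonical_link.get('href'):
        return canonical_link['href'].strip()
    return None

def check_sitemap(sitemap_url):
    """Print every sitemap URL whose canonical tag points elsewhere."""
    response = requests.get(sitemap_url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'xml')
    urls = [loc.text.strip() for loc in soup.find_all('loc')]
    for url in urls:
        try:
            canonical_url = get_canonical_url(url)
        except requests.RequestException as error:
            # Skip pages that cannot be fetched instead of stopping the whole check
            print(f"Could not fetch {url}: {error}")
            continue
        # Exact string comparison: even a trailing-slash difference counts as a mismatch
        if canonical_url and canonical_url != url:
            print(f"Non-canonical URL found:\n Sitemap URL: {url}\n Canonical URL: {canonical_url}")

sitemap_url = 'https://example.com/sitemap.xml'
check_sitemap(sitemap_url)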
By using these methods, you can effectively identify and address non-canonical links in your sitemap, ensuring better SEO performance and clearer guidance for search engines.
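The script in method 4 assumes the sitemap is a flat list of page URLs. Larger sites often publish a sitemap index file instead, in which each <loc> points to another sitemap rather than to a page. The sketch below, using only the standard library, shows one way to tell the two apart before checking canonicals. The function name parse_sitemap and the sample file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def parse_sitemap(xml_text):
    """Return (kind, locs), where kind is 'index' or 'urlset'."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    locs = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
    return kind, locs

# Hypothetical sitemap index with two child sitemaps.
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

kind, locs = parse_sitemap(index_xml)
print(kind, locs)
```

When kind is "index", each entry in locs is itself a sitemap that needs to be downloaded and parsed the same way before its page URLs can be compared with their canonical tags.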
Preventing and Fixing the Issue
To prevent and fix the issue of non-canonical pages in your sitemap, there are several best practices you should follow.
First, make sure every page on your site includes a canonical tag. This involves adding the <link rel="canonical" href="URL"> tag to the HTML of each page, pointing to the canonical version of the page.
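To verify that a page carries exactly one canonical tag, a small check along these lines could help. It is a sketch built on Python's standard html.parser module. The class name CanonicalCollector, the function canonical_status, and its three status labels are made up for this illustration.

```python
from html.parser import HTMLParser

class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])

def canonical_status(html):
    """Return 'missing', 'ok', or 'conflicting' for one HTML document."""
    collector = CanonicalCollector()
    collector.feed(html)
    distinct = set(collector.canonicals)
    if not distinct:
        return "missing"
    return "ok" if len(distinct) == 1 else "conflicting"

page = '<html><head><link rel="canonical" href="https://example.com/page-b"></head></html>'
print(canonical_status(page))                          # ok
print(canonical_status("<html><head></head></html>"))  # missing
```

A "conflicting" result means the page declares more than one distinct canonical URL, which is worth fixing before the sitemap is regenerated.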
Next, maintain a consistent URL structure throughout your site. This means standardizing the use of HTTP versus HTTPS, deciding whether to include “www” in your URLs, and being consistent with trailing slashes.
Proper configuration of your Content Management System (CMS) is also crucial. Ensure your CMS generates consistent pages and handles duplicate content correctly.
Using 301 redirects is another effective strategy. Implement these redirects to guide users and search engines from non-canonical pages to their canonical counterparts. This helps consolidate link equity and prevents duplicate content issues.
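One way to confirm that a redirect is in place is to request the non-canonical URL without following redirects and inspect the status code and Location header. The sketch below uses only the standard library. The function names fetch_redirect and redirect_verdict are hypothetical, and the example feeds in a hard-coded response so that it runs without network access.

```python
import http.client
from urllib.parse import urlsplit, urljoin

def fetch_redirect(url, timeout=10):
    """Request url without following redirects; return (status, Location or None)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def redirect_verdict(url, status, location, canonical_url):
    """Classify the response returned by a non-canonical URL."""
    if status != 301:
        return f"expected 301, got {status}"
    # The Location header may be relative, so resolve it against the request URL.
    target = urljoin(url, location or "")
    if target != canonical_url:
        return f"301 points to {target}, not the canonical URL"
    return "ok"

# Hard-coded response for illustration; in practice pass in fetch_redirect(url).
print(redirect_verdict("http://example.com/page-a", 301,
                       "https://example.com/page-b",
                       "https://example.com/page-b"))  # ok
```

The sketch accepts only 301 because that is the redirect type recommended here; a 308 response is also a permanent redirect and could be allowed if your server uses it.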
Reviewing how your sitemap is generated is important as well. Utilize reliable sitemap generation tools that respect canonical tags and settings. Regularly check your sitemap for accuracy and update it as needed.
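If you generate the sitemap yourself, the canonical check can be built into the generation step. The following is a minimal sketch, assuming you already have each page's URL, its declared canonical URL, and a last-modified date. The function name build_sitemap and the tuple layout are illustrative and not taken from any particular sitemap tool.

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """Build sitemap XML from (url, canonical_url, lastmod) tuples,
    keeping only pages whose canonical URL is the page URL itself."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url, canonical_url, lastmod in pages:
        if url != canonical_url:
            continue  # non-canonical pages are left out of the sitemap
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

# Sample data reusing the page-a / page-b scenario from earlier in the article.
pages = [
    ("https://example.com/page-a", "https://example.com/page-b", "2023-06-01"),
    ("https://example.com/page-b", "https://example.com/page-b", "2023-06-01"),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

With the sample data, only https://example.com/page-b ends up in the output, because page-a declares page-b as its canonical URL.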
Finally, regular monitoring and auditing of your site is essential. Use Sitechecker to routinely scan your site for non-canonical URLs and address any issues promptly.
By following these practices, you can ensure that your sitemap accurately reflects the canonical pages of your website, thereby improving your SEO performance and reducing confusion for search engines.
Final Idea
A “Non-Canonical link in my Sitemap” refers to URLs in your sitemap that are not the canonical versions. Canonical URLs are the preferred versions of pages you want search engines to index. If a non-canonical URL is in your sitemap, the page’s HTML points to a different canonical page, which can confuse search engines and cause duplicate content issues. To resolve this, ensure that your sitemap includes only canonical pages, standardize URL structures, configure your CMS correctly, use 301 redirects, and regularly audit your site using SEO tools to maintain accuracy and improve SEO performance.