What Does the “Non-ASCII Characters in a URL” Issue Mean?
The “non-ASCII characters in a URL” issue in a site audit indicates that the address of a webpage contains characters that are not part of the ASCII character set. ASCII defines 128 characters, including the English alphabet (both uppercase and lowercase letters), digits (0-9), and common punctuation and symbols (like !, @, #, etc.). Non-ASCII characters include letters with accents, characters from non-Latin scripts, and any other symbol not defined in the ASCII standard.
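As a rough illustration of the definition above, the following Python sketch lists the characters in a URL whose code point falls outside the ASCII range (0 to 127). The helper name `non_ascii_chars` is hypothetical and not part of any audit tool.

```python
def non_ascii_chars(url: str) -> list[str]:
    """Return every character in `url` whose code point is above 127."""
    return [ch for ch in url if ord(ch) > 127]


print(non_ascii_chars("https://example.com/über-cool"))  # ['ü']
print(non_ascii_chars("https://example.com/uber-cool"))  # []
```

An empty list means the URL is ASCII-only. Note that an ASCII-only URL can still contain characters that need encoding, such as a space.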
Here are some potential problems and implications of non-ASCII characters in URLs:
Browser Compatibility
Some older browsers and other legacy software may not correctly handle URLs that contain non-ASCII characters.
SEO Impact
Search engines can crawl and index URLs with non-ASCII characters, but they may not be as efficiently indexed as URLs with only ASCII characters. Additionally, having clean, easily readable web addresses is generally better for SEO.
Link Sharing and Emailing
Non-ASCII characters can sometimes cause issues when URLs are shared via email or on social media platforms. Some systems may not encode them properly, leading to broken links.
User Experience
URLs with non-ASCII characters can be difficult to read and remember for users, especially if they are copying the link manually or trying to share it.
What Triggers This Issue?
A URL may only be written with a limited set of ASCII characters. A non-ASCII character such as Ñ, or even a space, cannot appear in a URL as-is and has to be percent-encoded. The issue is triggered when a page address or a link contains such characters in raw form. It commonly arises when developers use these symbols directly or make coding mistakes, whether through lack of knowledge or oversight.
How to Check the Issue
Third-party tools can check for non-ASCII characters in URLs automatically. For example, you can run a site audit in Sitechecker to find all URLs with non-ASCII characters.
The audit results list every affected URL, so you can fix them.
If you’re a developer working on code that triggers a non-ASCII warning, there’s a simple procedure you can follow.
To look for non-ASCII characters, try following these steps in any source code editor:
- Open the code editor.
- Press Ctrl + F to run a Find or Search command.
- Enter [^\x00-\x7F]+ in the search box.
- Choose “Regular expression” as the search mode. Click Next.
- Wait for the results.
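The same search can be scripted when an editor is not convenient. The sketch below applies the regular expression from the steps above to a string of source text and reports where each match occurs. The function name `find_non_ascii_runs` and the sample markup are assumptions made for illustration.

```python
import re

# Same pattern as in the editor search: one or more characters outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]+")


def find_non_ascii_runs(text: str) -> list[tuple[int, int, str]]:
    """Return (line number, column, matched text) for each non-ASCII run."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in NON_ASCII.finditer(line):
            # Columns are reported 1-based, like most editors do.
            hits.append((line_no, match.start() + 1, match.group()))
    return hits


sample = '<a href="/über-cool">Cool</a>\n<a href="/cafe">Cafe</a>'
print(find_non_ascii_runs(sample))  # [(1, 11, 'ü')]
```

This flags every non-ASCII character in the text, not only those inside URLs, so the hits still need a manual review.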
How to Fix the Issue
1. URL Encoding
Non-ASCII characters should be percent-encoded. This means converting the character to its UTF-8 bytes and writing each byte as a % symbol followed by two hexadecimal digits. For example, the character ü is two bytes in UTF-8 (C3 and BC), so it is encoded as %C3%BC.
Example:
- Non-encoded URL: https://example.com/über-cool
- Encoded URL: https://example.com/%C3%BCber-cool
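A minimal Python sketch of this encoding step, using the standard library's `urllib.parse.quote`, might look like the following. The helper name `percent_encode_url` and the choice of characters to leave untouched are assumptions. The sketch also assumes an ASCII hostname: non-ASCII domain names use a different scheme (Punycode) and are left alone here.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: URL delimiters plus '%', so sequences that are
# already percent-encoded are not encoded a second time.
SAFE = "/%:@!$&'()*+,;=?"


def percent_encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters and spaces in path, query and fragment."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # host left as-is (assumption: ASCII hostname)
        quote(parts.path, safe=SAFE),
        quote(parts.query, safe=SAFE),
        quote(parts.fragment, safe=SAFE),
    ))


print(percent_encode_url("https://example.com/über-cool"))
# https://example.com/%C3%BCber-cool
```

Because `%` is treated as safe, passing an already encoded URL through the function returns it unchanged.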
2. Use ASCII Equivalents
Where possible, replace non-ASCII characters with their closest ASCII equivalents.
- Original: https://example.com/café
- ASCII equivalent: https://example.com/cafe
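One common way to approximate ASCII equivalents in Python is to decompose accented letters with `unicodedata.normalize` and then drop whatever is not ASCII. The helper name `to_ascii` is hypothetical, and this is only a sketch: characters with no decomposition (for example ß, ø, or non-Latin scripts) are silently removed, so they need a dedicated transliteration table instead.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Strip accents by decomposing characters and discarding non-ASCII parts."""
    # NFKD splits 'é' into 'e' plus a combining accent; the accent is then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("café"))       # cafe
print(to_ascii("über-cool"))  # uber-cool
```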
3. Update Internal Links
Ensure that all internal links on your site point to the new, encoded or ASCII-only URLs.
<a href="https://example.com/%C3%BCber-cool">Cool Page</a>
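On a larger site, updating links by hand is error-prone. One possible approach, sketched here in Python, is to keep a mapping from old to new URLs and rewrite matching `href` values in the page markup. The mapping `URL_MAP` and the function `rewrite_hrefs` are assumptions for illustration, and the regular expression only handles double-quoted `href` attributes; a real template or CMS may need a proper HTML parser.

```python
import re

# Hypothetical mapping from old URLs to their replacements.
URL_MAP = {
    "https://example.com/über-cool": "https://example.com/%C3%BCber-cool",
}

HREF = re.compile(r'href="([^"]*)"')


def rewrite_hrefs(html: str, url_map: dict[str, str]) -> str:
    """Replace href values found in `url_map`; leave all other links untouched."""
    def swap(match: re.Match) -> str:
        old = match.group(1)
        return f'href="{url_map.get(old, old)}"'

    return HREF.sub(swap, html)


page = '<a href="https://example.com/über-cool">Cool Page</a>'
print(rewrite_hrefs(page, URL_MAP))
# <a href="https://example.com/%C3%BCber-cool">Cool Page</a>
```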
4. Update External Links
If possible, update external links pointing to your site to use the new URLs. You might need to contact webmasters of other sites for this.
5. Implement URL Redirects
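Set up 301 redirects from the old addresses to the new URLs so that visitors and search engines reach the correct pages. This helps preserve SEO value. A redirect is only needed when the path itself changes, for example when a non-ASCII URL is replaced with its ASCII equivalent. A raw URL and its percent-encoded form identify the same page, and Apache matches the decoded path, so redirecting one to the other would create a loop.
Example: In your .htaccess file (Apache server, file saved as UTF-8):
Redirect 301 "/café" "https://example.com/cafe"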
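When many URLs change at once, the redirect lines can be generated from an old-to-new mapping rather than typed by hand. The following Python sketch does that; `redirect_rules` is a hypothetical helper, and the output format assumes Apache's `Redirect` directive as used in the example above. Pairs whose old and new paths decode to the same value are skipped, on the assumption that they point to the same page and a redirect between them would loop.

```python
from urllib.parse import unquote, urlsplit


def redirect_rules(url_map: dict[str, str]) -> list[str]:
    """Build one 'Redirect 301' line per URL whose path really changed."""
    rules = []
    for old, new in url_map.items():
        old_path = urlsplit(old).path
        new_path = urlsplit(new).path
        # Same path after decoding: only the encoding differs, so skip it.
        if unquote(old_path) == unquote(new_path):
            continue
        rules.append(f'Redirect 301 "{old_path}" "{new}"')
    return rules


mapping = {
    "https://example.com/café": "https://example.com/cafe",
    "https://example.com/über-cool": "https://example.com/%C3%BCber-cool",
}
print("\n".join(redirect_rules(mapping)))
# Redirect 301 "/café" "https://example.com/cafe"
```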
6. Update Sitemap and Robots.txt
Update your sitemap to include the new URLs and ensure your robots.txt file does not block them.
<url>
<loc>https://example.com/%C3%BCber-cool</loc>
<lastmod>2024-07-25</lastmod>
</url>
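If the sitemap is generated by a script, an entry like the one above can be built with Python's standard `xml.etree.ElementTree` module, which also escapes characters such as & in the URL. The function name `sitemap_entry` is an assumption, and the sketch produces a single `<url>` element rather than a complete sitemap file.

```python
import xml.etree.ElementTree as ET


def sitemap_entry(loc: str, lastmod: str) -> str:
    """Return a <url> element with <loc> and <lastmod> as an XML string."""
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(url, encoding="unicode")


print(sitemap_entry("https://example.com/%C3%BCber-cool", "2024-07-25"))
# <url><loc>https://example.com/%C3%BCber-cool</loc><lastmod>2024-07-25</lastmod></url>
```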
7. Test the Changes
After making these changes, test the links to ensure they are working correctly and that the redirects are functioning as expected. You can use tools like Google Search Console to check for crawl errors.
8. Monitor for Issues
Regularly monitor your site to ensure there are no new issues with non-ASCII characters in URLs.
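Part of this monitoring can be automated. As one possible sketch, the Python function below reads a sitemap and returns every `<loc>` value that still contains non-ASCII characters. The name `non_ascii_locs` is hypothetical, the sitemap is passed in as a string (fetching it is left out), and the standard sitemap namespace is assumed.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_ascii_locs(sitemap_xml: str) -> list[str]:
    """Return the <loc> URLs in a sitemap that are not pure ASCII."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall(".//sm:loc", NS) if el.text]
    return [loc for loc in locs if not loc.isascii()]


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/über-cool</loc></url>"
    "<url><loc>https://example.com/cafe</loc></url>"
    "</urlset>"
)
print(non_ascii_locs(sample))  # ['https://example.com/über-cool']
```

Running a check like this on a schedule would surface newly published pages whose URLs were not encoded or transliterated.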