5 ways to prevent duplicate content using .htaccess

If you run a website, you want it to rank highly in search engines like Google. A common problem that you might face is duplicate content.

If you aren’t aware of content that’s been duplicated, you risk your website being penalised. This could make your website almost invisible in the Google results.

In this article, we’ll take a look at what duplicate content is, how it affects your website and how you can prevent this with a few simple .htaccess rules.

What is duplicate content and how does it affect your website?

Duplicate content means that two or more pages on a website have identical content. There isn’t a set penalty imposed by Google for duplicate content – they know that not all duplicate content is deceptive. But their filtering can make mistakes, so it’s best to avoid it where possible.

The main impact of duplicate content is that Google may not know which version of a page to rank in the top results, which could result in lower rankings. Or your site could be penalised as a whole.

There are good reasons why Google (and other search engines) have these penalties. One is that it’s bad for user experience – it could confuse users, making them less likely to convert or visit again.

Another is that in the past, duplicate content was often the hallmark of untrustworthy sites. Finally, it wastes the search engine crawlers’ resources: bad for them and the environment.

How does duplicate content on a website happen?

Here are some of the most common causes:

Issue | Example
HTTP and HTTPS versions | http://yourwebsite.com and https://yourwebsite.com
www and non-www versions | www.yourwebsite.com and yourwebsite.com
Trailing slashes | yourwebsite.com/page and yourwebsite.com/page/

While the above URLs are just different ways to access the same page, Google will see them as multiple pages with the same content at different URLs. It may then treat them as duplicate content and not know which URL should be ranked higher.

How to use .htaccess to resolve this

There is an easy way to resolve this: create or modify your .htaccess file. The .htaccess file lets you change how the web server handles requests for your site.

If you use 20i hosting, you can create an .htaccess file using the file manager within My20i. Head over to Manage hosting > Options > Manage > File Manager, right-click in the file management section and choose ‘Create New File’.

Other web hosts should also allow you to create an .htaccess file – consult their documentation for the details.

After you’ve created the file, you can edit its contents and add the rules below.

Here are five simple rules to prevent the above issues causing duplicate content:

Force your website to use HTTPS

If your website is hosted on our platform, you can simply use the one-click Force HTTPS option in My20i > Security > TLS/SSL, which will force all pages of your website to use the secure HTTPS protocol.

Alternatively, you can add this rule to your .htaccess file, which will force all pages on your website to use HTTPS.

RewriteEngine On
RewriteCond %{env:HTTPS} =off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301,NE]

In the above, the line “RewriteCond %{env:HTTPS} =off” checks whether the connection is using non-secure HTTP. If it is, the next line takes effect and redirects the request to the same URL over HTTPS.
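
On hosts where Apache handles TLS itself, the HTTPS state is usually read from mod_rewrite's built-in %{HTTPS} variable rather than from an environment variable. The variant below is a sketch for that kind of setup and has not been tested on 20i's platform. If your host sits behind a proxy or load balancer, %{HTTPS} may always report "off" and cause a redirect loop, so check your host's documentation first.

```apache
RewriteEngine On
# %{HTTPS} is set by mod_rewrite: "on" for TLS requests, "off" otherwise
RewriteCond %{HTTPS} off
# Send the same host and path to the https:// scheme
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301,NE]
```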

Force www

www.yourwebsite.com and yourwebsite.com resolve to the same webpage. But as the URLs are different, Google will see this as two different pages.

If your website runs on WordPress, you can update the home and site URL to achieve this. This can be done easily using WordPress Tools, included with the WordPress platform.

You can also force your website to use www by using the following .htaccess rule:

RewriteEngine On
RewriteCond %{HTTP_HOST} ^yourwebsite\.com [NC]
RewriteRule ^(.*)$ http://www.yourwebsite.com/$1 [L,R=301,NC]

The “RewriteCond” line detects whether your domain is being loaded in the browser without www. If it is, the next line redirects the request to the www version of the same URL.
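
If you force both HTTPS and www with separate rules, a visitor arriving at http://yourwebsite.com may be redirected twice. One way to avoid that chain is to combine both checks into a single rule, as in the sketch below. It assumes the placeholder domain yourwebsite.com, that https://www.yourwebsite.com is your preferred address, and that your host exposes the HTTPS state through the same environment variable used in the Force HTTPS rule above.

```apache
RewriteEngine On
# Redirect when the request is not secure...
RewriteCond %{ENV:HTTPS} =off [OR]
# ...or when the host name does not start with www.
RewriteCond %{HTTP_HOST} !^www\. [NC]
# One hop straight to the preferred scheme and host, keeping the path
RewriteRule ^ https://www.yourwebsite.com%{REQUEST_URI} [L,R=301,NE]
```

Because the target host is written out in full, this version should only be used on a site that serves a single domain.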

Force non-www

As above, you can force your website pages to use non-www with the following .htaccess rule. Again, if your website runs on WordPress, you can update the home and site URL to achieve this.

RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.yourdomain\.com [NC]
RewriteRule ^(.*)$ http://yourdomain.com/$1 [L,R=301]

Force trailing slash

https://www.yourdomain.com/page and https://www.yourdomain.com/page/ both resolve to the same page. As with the www and non-www variations, Google will see these as different pages because of the trailing slash. This is easily fixed with the following .htaccess rule.

This rule will force your webpage URLs to use a forward slash (/) in the address bar.

RewriteEngine On
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+[^/])$ %{REQUEST_URI}/ [R=301,L]

The line “RewriteCond %{REQUEST_URI} /+[^\.]+$” limits the rule to URLs whose final part contains no dot, so requests for files such as /image.png are left alone. The RewriteRule pattern then matches only paths that don’t already end in a slash, and redirects them to the same URL with a trailing slash added.

For example, https://www.yourdomain.com/page will become https://www.yourdomain.com/page/.
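
The rule above decides what to skip by looking for a dot in the URL. Another common approach is to ask Apache whether the request maps to a real file on disk. The sketch below shows that variant. It assumes the .htaccess file sits in your website's document root, and it is an alternative to the rule above rather than something to add alongside it.

```apache
RewriteEngine On
# Skip requests that map to a real file (images, CSS, scripts and so on)
RewriteCond %{REQUEST_FILENAME} !-f
# Match any path that does not already end in a slash and add one
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]
```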

Force non-trailing slash

Alternatively, you can force the URL to remove a trailing slash. Search engines don’t prefer one form over the other: what matters is that you pick one and use it consistently.

RewriteEngine On
RewriteCond %{REQUEST_URI} /+[^\.]+$
RewriteRule ^(.+?)/$ /$1 [R=301,NE,L]
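
One thing to watch for when removing trailing slashes: by default, Apache's mod_dir adds a slash back to URLs that point at a real directory, which can produce a redirect loop. A common safeguard, sketched below as an alternative to the rule above, is to leave real directories alone. It assumes the .htaccess file sits in your document root.

```apache
RewriteEngine On
# Leave real directories alone; mod_dir's DirectorySlash adds their slash back
RewriteCond %{REQUEST_FILENAME} !-d
# Strip the trailing slash from everything else
RewriteRule ^(.+)/$ /$1 [R=301,NE,L]
```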

If your website is hosted on 20i’s platform, we recommend enabling Developer Mode in the CDN settings and testing in an incognito window, so that you see the effect of your new .htaccess rules rather than a cached response.

More tips on duplicate content

Ensure that all internal links are up to date on your website

Once you’ve chosen the preferred URL you’d like to use, for example www rather than non-www, you will want to ensure that all internal links also use that form (www.yourwebsite.com in this case). If your website runs on WordPress, you can use the Better Search and Replace plugin to make the replacements.

Test all .htaccess variations

The above are just some examples of the .htaccess rules available. There are many other types of rules that you can use. We advise that you test any you use to ensure that you’re getting the result you want. The rules above have been tested and work on 20i’s platform.
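
Browsers cache 301 redirects, so a mistake in a rule can linger in your own browser even after you've fixed it. One common precaution is to test with a temporary 302 redirect first and switch to 301 once the behaviour is right. The sketch below applies that idea to the force-www rule, using the placeholder domain yourwebsite.com.

```apache
RewriteEngine On
RewriteCond %{HTTP_HOST} ^yourwebsite\.com [NC]
# R=302 is a temporary redirect, so browsers won't store it permanently;
# change it to R=301 once you have confirmed the rule works
RewriteRule ^(.*)$ http://www.yourwebsite.com/$1 [L,R=302]
```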

Configure your .htaccess rules from the start

When you create your website, it’s better to set your strategy from the start and add the rules to the .htaccess file then. This way, you can ensure that duplicate content is prevented from the start of your project and there is less risk of breaking your website.

Summary

Whether you host your own WordPress site or run an agency providing hosting for your customers, hopefully this article has given you some insight into the impact of duplicate content. The above are just some of the problems you may encounter with duplicated content, and some of the ways you can prevent them.

Do you have other methods of preventing duplicate content? Let us know in the comments below.

Ben Perry

Ben is a Technical Salesperson at 20i. He has a keen interest in cyber security, which he explores in his rare free time when not entertaining his new-born.