A canonical tag is one of several important meta tags that should always be included on a web page. The clinical reason for including this tag is to “normalize” multiple URLs into one URL. When I first read this over on Moz, I too was confused. I had typically included this tag for a completely different reason, which was to indicate that one of my sites is the original publisher of a piece of content.
I’ll explain all of this in a minute. This tag is included in the <HEAD></HEAD> section of your web page’s HTML file. The tag looks like this:
<link rel="canonical" href="http://site.com/blog" />
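If you want to check whether a page actually carries this tag, here is a minimal sketch using Python's standard `html.parser` module. The names `CanonicalFinder` and `find_canonical` are made up for this illustration, and the sketch assumes the tag only counts when it sits inside the head section.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <HEAD> arrives as "head".
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            rel = (attr.get("rel") or "").lower().split()
            if "canonical" in rel and attr.get("href"):
                self.canonicals.append(attr["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonical(html):
    """Return the first canonical URL in the page's head, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals[0] if finder.canonicals else None
```

Calling `find_canonical` on a page's HTML source gives you the URL the page declares as canonical, or `None` when the tag is missing.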
The canonical tag fixes a couple of issues:
1 – If a piece of content appears at multiple locations around the web, the canonical tag is used to indicate the location of the original content. A typical example is when one site allows another site to republish a piece of content so that their user base has direct access to the content without having to click over to another site.
Let’s say that Site A publishes a piece of content and then syndicates that same content to Site B. The correct way to publish this would be to have both sites include a canonical tag pointing to the page on Site A containing the content:
<link rel="canonical" href="http://siteA.com/InterestingArticle" />
BOTH sites should include this tag. Site A, as the original publisher, should include it for the reason below. Site B should include it to indicate that Site A is the original publisher of the content and that Site A’s version of the page is the one that should be included in the Google index.
2 – The other reason for you to include a canonical tag is that different web servers use different names to refer to the same page. The canonical tag makes sure that all of these disparate page names point to the same place. Including it reduces the risk of those variants being perceived as duplicate content.
For example, Apache web servers can and will include all four of these URLs to indicate the same page:
Microsoft’s web server (IIS) will include the following four page formats, and since Windows and IIS are not case sensitive, any URL that differs from these only in capitalization is treated as the same page.
- http://www.sitea.com/default.asp (or .aspx depending on the version)
- http://sitea.com/default.asp (or .aspx)
- any combination of these things with capital letters in any location within the URL.
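To see how these variants collapse into one page, here is a rough sketch that reduces a URL to a comparison key. The function name `variant_key` and the `DEFAULT_DOCUMENTS` set are assumptions made for this example, and dropping the "www." prefix is just one possible choice, not a rule.

```python
from urllib.parse import urlsplit

# Assumption: illustrative default-document names only, not an exhaustive list.
DEFAULT_DOCUMENTS = {"default.asp", "default.aspx"}


def variant_key(url):
    """Reduce a URL to a key that is identical for all variants of one page."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if host.startswith("www."):
        host = host[4:]
    # Lowercasing the path mirrors the case-insensitive behaviour described above.
    segments = parts.path.lower().split("/")
    if segments[-1] in DEFAULT_DOCUMENTS:
        segments[-1] = ""  # treat /default.asp as the bare directory
    path = "/".join(segments) or "/"
    return host + path
```

Any two URLs that produce the same key are candidates for sharing a single canonical tag.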
The only real risk of using the canonical tag is creating a loop that will not end up with a clearly canonicalized page. For example, including a canonical on “sitea.com/index.php” pointing to “sitea.com/index.htm” and including a canonical on “sitea.com/index.htm” pointing to “sitea.com/index.php” would create a loop where there’s no conclusion. Just imagine the Scarecrow from The Wizard of Oz pointing in both directions when he was telling Dorothy where to go. It just doesn’t work.
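A loop like this is easy to catch once you have recorded which page each canonical tag points to. The following is a sketch only: `resolve_canonical` and the `canonical_of` dictionary are hypothetical names, and the dictionary is assumed to have been built by some earlier crawl.

```python
def resolve_canonical(start, canonical_of):
    """Follow canonical pointers from `start`.

    `canonical_of` maps a page URL to the URL named in its canonical tag.
    Returns the final URL, or None if the chain loops back on itself.
    """
    seen = set()
    current = start
    while True:
        if current in seen:
            return None  # loop: no page is ever settled on
        seen.add(current)
        target = canonical_of.get(current)
        if target is None or target == current:
            return current  # no tag, or a self-referencing tag: the chain ends here
        current = target
```

With the two pages from the example pointing at each other, the function returns `None`. If “sitea.com/index.htm” pointed at itself instead, both pages would resolve to “sitea.com/index.htm”.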
Since the above is the only potential drawback to using the canonical tag, we always want our clients to include canonicals. For this reason, the crawl that we execute will give a “-1” score to a page that does not include one.
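As a rough illustration of that scoring rule (not our actual crawler code), the sketch below takes a dictionary of page URLs and the canonical URL found on each. The name `score_pages` is hypothetical, and the 0 given to pages that do have a tag is an assumption, since only the “-1” is defined above.

```python
def score_pages(canonical_by_page):
    """Map each page URL to -1 if it has no canonical tag, else 0.

    The 0 for pages that do have a tag is an assumption; the text only
    specifies the -1 penalty.
    """
    return {
        page: (-1 if not canonical else 0)
        for page, canonical in canonical_by_page.items()
    }
```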