
Beware duplicate content - redirect IP to URL with .htaccess

Today I discovered something disturbing: a bunch of pages of my Drupal 5 cruise guide site were missing from Google's index under the site name, and showed up under the IP address instead. Worse yet, Google has been known to penalize sites for duplicating content on a large scale - a real risk if it's seeing both the URL and the IP address. I did some digging, and I think I found a solution worth sharing.

First, here's an example: my blog entry on “How to get Seasick” was in the Google index when I searched by IP address, site:72.52.247.79/site_blog/how_to_get_seasick, but NOT when I searched by domain name, site:cruisesavvy.com/site_blog/how_to_get_seasick.

In theory this shouldn't be a problem: Googlebot should never find the IP address, because who would ever link to something so unwieldy? But, as it turns out, Google somehow picked up both the domain name and the IP address from a post I wrote in Drupal Groups and from some of my own pages. That is odd, because the links were all domain-name URLs, even in the Google cache; nonetheless, the page shows up in searches for link:72.52.247.79. Needless to say, I don't use the IP address when linking to pages.

Does Drupal sometimes change URLs to IPs for some reason? I'm perplexed.

But whatever the cause, it seems that the way around this is to add the following line to .htaccess as part of the rewrite rules (the [OR] flag means it must sit directly above another RewriteCond):

RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3} [OR]

That’s in addition, of course, to un-commenting either this pair, which redirects to the www address:

# RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]

or this pair, which redirects to the non-www address:

# RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
# RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
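
Put together, the rules might look something like the sketch below. It assumes you chose the www variant, that example.com stands in for your own domain, and that the lines sit inside the existing mod_rewrite section of .htaccess, above Drupal's own rewrite rules. The RewriteEngine line is only there to make the fragment complete; it should already be present.

```apache
RewriteEngine on

# Match requests addressed to a bare IP address (e.g. 72.52.247.79) ...
RewriteCond %{HTTP_HOST} ^[0-9]+(\.[0-9]+){3} [OR]
# ... or to the non-www host name (example.com is a placeholder).
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
# Either way, send the visitor to the www address with a permanent redirect.
RewriteRule ^(.*)$ http://www.example.com/$1 [L,R=301]
```

The [OR] flag joins the two conditions, so a request that arrives by IP address or by the non-www name gets the same 301 redirect. If you prefer the non-www address, swap the second condition and the rule for the other pair above.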

