A new client needs my help. Their web developer messed up: they built the website on a draft/test server but forgot to block Google etc. I would appreciate help from the community here, as I am not an expert with HTACCESS redirection.
As I said, another web developer set up the client's draft site on their draft server. It's been there for months, but they forgot to hide it from search engines, so the content has been indexed by Google etc. This will trigger a duplicate content penalty if we put the new website live, and the new website will effectively be useless.
I have access to the draft site / server and can modify the HTACCESS file, so when the new site goes live I would like to have the correct redirects in place. There are a few subdomains on the site (it's a multi language site), so it's a little tricky.
The website is built on WordPress.
The website structure looks like this on the test server. All page names and file names are identical; the site is just moving to a new server.
http://clientdomain.testserver.com
http://it.clientdomain.testserver.com
http://fr.clientdomain.testserver.com
http://es.clientdomain.testserver.com
http://de.clientdomain.testserver.com
http://ko.clientdomain.testserver.com
http://pt.clientdomain.testserver.com
http://ru.clientdomain.testserver.com
http://tr.clientdomain.testserver.com
http://cn.clientdomain.testserver.com
The redirects will need to go here:
http://clientdomain.com
http://it.clientdomain.com
http://fr.clientdomain.com
http://es.clientdomain.com
http://de.clientdomain.com
http://ko.clientdomain.com
http://pt.clientdomain.com
http://ru.clientdomain.com
http://tr.clientdomain.com
http://cn.clientdomain.com
The existing HTACCESS file on the test server looks like this
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
# add a trailing slash to /wp-admin
RewriteRule ^wp-admin$ wp-admin/ [R=301,L]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
RewriteRule ^(wp-(content|admin|includes).*) $1 [L]
RewriteRule ^(.*\.php)$ $1 [L]
RewriteRule . index.php [L]
I would really appreciate any help on this.
There are some existing threads which contain all the pieces of the HTACCESS puzzle, but I am a little confused:
How can I redirect from one subdomain to another in .htaccess?
Kind Regards,
GG
If it was me I wouldn't bother messing around with redirects; I'd get the URLs removed from the index. Google will remove them within 24 hours, sometimes much quicker nowadays.
Add the development domain to your Webmaster Tools account and verify it. Then go to Google Index -> Remove URLs;
Just enter the value / in the removal request, which tells Google to remove every URL in the index for that domain.
Then add a blocking robots.txt file to the site root;
User-agent: *
Disallow: /
And what I normally do (this has happened a couple of times to me despite robots.txt and basic auth protection - git disaster/shenanigans) is prompt Google to reindex the site straight away. Go to Crawl -> Fetch as Google
Leave the input box blank so it fetches the whole site and just hit the Fetch button. When Google has fetched it click the 'Submit to Index' button.
You will be amazed how quickly this can happen these days, used to take weeks if you were lucky.
EDIT
And just to make sure this doesn't happen to anyone else finding this: the best way to stop a dev site getting indexed isn't a robots.txt file or Basic Auth via the .htaccess file (as previously mentioned, it's easy to accidentally delete these). You should enable Basic Auth on the development site via the vhosts file.
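A minimal sketch of what that vhost-level Basic Auth could look like, assuming Apache 2.4. The document root and the password file path are hypothetical placeholders, and the password file would have to be created separately with htpasswd:

```apache
<VirtualHost *:80>
    ServerName clientdomain.testserver.com
    # cover the language subdomains as well
    ServerAlias *.clientdomain.testserver.com
    # hypothetical path, adjust to the real document root
    DocumentRoot "/var/www/clientdomain"

    <Directory "/var/www/clientdomain">
        AuthType Basic
        AuthName "Development site"
        # hypothetical path to a file created with htpasswd
        AuthUserFile "/etc/apache2/htpasswd-dev"
        Require valid-user
    </Directory>
</VirtualHost>
```

Because this lives in the server configuration rather than in the site's files, a deploy that overwrites .htaccess or robots.txt does not remove the protection.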
Like it's not only for Google...
You can use this .htaccess:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^((?:..\.)?clientdomain)\.testserver\.com [NC]
RewriteRule ^ http://%1.com%{REQUEST_URI} [NE,L,R=301]
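A slightly stricter variant of the same idea, as a sketch rather than a tested drop-in. It assumes every language subdomain is exactly two letters (as in the list in the question) and that these lines go above the existing WordPress rules on the test server, so the redirect fires before WordPress handles the request:

```apache
RewriteEngine On
# Match the bare test host or a two-letter language subdomain of it,
# with an optional port at the end of the Host header
RewriteCond %{HTTP_HOST} ^((?:[a-z]{2}\.)?clientdomain)\.testserver\.com(?::\d+)?$ [NC]
# %1 is "clientdomain" or e.g. "fr.clientdomain"; the requested path is kept as-is
RewriteRule ^ http://%1.com%{REQUEST_URI} [NE,L,R=301]
```

With this in place, http://fr.clientdomain.testserver.com/some-page/ would be sent to http://fr.clientdomain.com/some-page/.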
I've set up a reverse proxy from my Windows server to a blog hosted elsewhere. All is fine except for the sitemaps.
The blog is on a subdomain: http://blog.example.com
The proxied domain is https://example.com/blog
As I'm using Wordpress, I've opted for Yoast SEO, but despite ARR doing the rerouting Google tools still complains about images it cannot access - on the origin domain. This is correct in one sense because I've added a second robots.txt on the subdomain, to stop duplicate content, but it doesn't make sense, in the sense that Application Request Routing should be hiding the subdomain. However, we all know that Google does what it wants to do.
I've found some code which I've added to my htaccess file:
# WordPress SEO - XML Sitemap Rewrite Fix - for reverse proxy
RewriteEngine On
RewriteBase /
RewriteRule ^sitemap_index\.xml$ https://example.com/blog/index.php?sitemap=1 [L]
RewriteRule ^([^/]+?)-sitemap([0-9]+)?\.xml$ https://example.com/blog/index.php?sitemap=$1&sitemap_n=$2 [L]
# END WordPress SEO - XML Sitemap Rewrite Fix
I'm not sure whether it's doing anything at the moment because the image issue still exists, so my next step would be to try and redirect images to the new domain structure... and herein lies the problem - I know absolutely nothing about Apache stuff and definitely not apache rewriting.
What I need to do is redirect anything in the uploads folder, to a new absolute path
From, /wp-content/uploads/myimage.jpg to https://example.com/wp-content/uploads/myimage.jpg
Can anyone help with this final piece of the jigsaw?
Thanks in advance.
You can probably use something like the following in your .htaccess:
# match only image files below /wp-content/uploads/ and keep the rest of the path in $1
RewriteRule ^wp-content/uploads/(.+\.(?:jpe?g|gif|png|bmp))$ https://example.com/wp-content/uploads/$1 [NC,L,R=302]
I am trying to figure out how best to block requests from a certain domain.
I have found that there is a site that is scraping data using PHP.
I believe (based on my tests and looking at logs) that they are doing this with every request instead of using a cron job.
I don't know enough about PHP to know if I am going down the right path or not. But I have the URL of the PHP page (I will just block the entire domain).
My website is built on Rails.
The best way is to block the requests when they hit your server. If you are running Apache, you can add this to your .htaccess file (it only matches requests that send a Referer header containing the domain):
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} badsite\.com [NC]
RewriteRule .* - [F]
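If the scraper's requests do not carry that referer, blocking by client address is an alternative. A sketch assuming Apache 2.4 and that the scraper's address has been found in the access logs; 203.0.113.10 is a placeholder from the documentation address range, not a real offender:

```apache
# Apache 2.4 (mod_authz_core); needs AllowOverride AuthConfig when used in .htaccess
<RequireAll>
    # allow everyone...
    Require all granted
    # ...except the scraper's address (placeholder value)
    Require not ip 203.0.113.10
</RequireAll>
```

Blocked requests get a 403 response, the same result as the [F] flag above.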
We are building a website where the user has to log in to view the site's content (similar to what Facebook and Twitter do).
The problem is that our site's navigation is completely messed up:
When the user opens the site, he is at: sitename.com
When the user logs in, the location changes to: sitename.com/login_success.php
When the user uses the navigation bar, the location changes to: sitename.com/login_success.php#page2 (AJAX is used to change the div content)
In comparison to Facebook (URL):
user is logged in: sitename.com
user is NOT logged in: sitename.com
user navigates to friend search: sitename.com/search
user navigates to settings: sitename.com/settings
Why do sites like Facebook have such clean URLs? How do they do it? I'd like to create a clean website, with clean/user-friendly URLs (without # or ? and & and =). Where do I start? Do we need to use a framework (Yii, Zend, etc.)?
Yeah, you need to use mod_rewrite.
For example, this is how to serve sitename.com/page2 from login_success.php. Note that the part after # is never sent to the server, so Apache cannot match or rewrite it; this assumes login_success.php can take the page number as a query parameter (login_success.php?page=2) instead:
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine On
# first, check whether the original request was for the ugly URL
RewriteCond %{THE_REQUEST} \s/login_success\.php\?page=([0-9]+)\s
# now redirect to the clean URL structure (the trailing ? drops the query string)
RewriteRule ^login_success\.php$ /page%1? [R=301,L,NE]
# now make the clean URL serve the content from the ugly one
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^page([0-9]+)$ /login_success.php?page=$1 [L]
</IfModule>
I'm not quite sure about that last regex match, but I hope this gets you on the right track!
I believe a considerable amount of coding has already been done in your case, so it would not be advisable to switch to a framework like Yii or Zend now; that decision should have been taken earlier.
Check how to simplify the url.
You can use mod_rewrite of apache web server.
They use mod_rewrite and similar tools to clean up their URLs.
mod_rewrite is available on Apache. The IIS equivalent is named URL Rewrite.
http://httpd.apache.org/docs/current/mod/mod_rewrite.html
http://www.iis.net/downloads/microsoft/url-rewrite
You don't need to use a special framework to get this to work, but it helps ease the process, as many frameworks have this feature built-in.
One that comes to mind is Wordpress. Wordpress gives you great control over how this works without having to touch the configuration files too much.
I currently run a site with 750 pages of .html webpages (yeah I know it was a stupid idea, but I'm a novice). I'm looking to move these to php. I don't really want to set up 750 individual 301 redirects and rewrite each page to .php
I've heard that I can use htaccess to do this. Does anyone know how?
A few additional questions -
Can I permanently redirect these links from html to php without losing my search engine rankings, and
if I want to add PHP to each of the files (e.g. a PHP menu file pulled in with the include command, to make the links quicker to update), will this work? Won't they still be html files?
Sorry for the stupid questions, but I'm still learning.
Congratulations on a 750 page site - you must have put some work into that.
To collect your current list of pages, use a tool called Xenu to create an export into Excel. You can then easily change the file names to .php in column B and create a .htaccess file.
However, why would you want 750 PHP files? If you have lots of data pages, make it one page that pulls in the main HTML content. If you have a page called warehouse-depot-22-row-44.html, change that to show-warehouse-row.php?depot=22&row=44 and return that content only. This will significantly reduce your number of pages and lets you start using databases to render the content.
For redirecting you could use the Apache Module mod_rewrite: https://httpd.apache.org/docs/current/mod/mod_rewrite.html
You can use url rewriting to match a specific file name request with a regular expression and then decide where to redirect if matched
RewriteRule ^myname/?$ myname.php [NC,L]
http://www.addedbytes.com/articles/for-beginners/url-rewriting-for-beginners/
Depends on the structure you have.
You want the user to access them in their natural location?
/public_html/folder1/file.php
user would access like
mydomain.com/folder1/file
or you want to map them differently?
Personally I think I would use a rewrite rule to map all requests to my /public_html/index.php and would map the requests from there using php (using include for instance). This gives great flexibility, plus you have a single point of entry for your application which is very beneficial since you can easily maintain control of the application flow.
The .htaccess would look like this
#
# Redirect all to index.php
#
RewriteEngine On
# if a directory or a file exists, use it directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# RewriteCond %{REQUEST_URI} !^/index\.php
# RewriteCond %{REQUEST_URI} (/[^.]*|\.(php|html?))$ [NC]
RewriteCond %{REQUEST_URI} (/[^.]*|\.)$ [NC]
RewriteRule .* index.php [L]
Of course, I place all files that should not be directly accessible (everything except index and css, js, images, etc.) in a folder outside public_html to ensure no user can ever access them directly ;)
I've had a similar (yet much much smaller) site that went through the same thing.
I have this in my .htaccess:
RewriteEngine On
RewriteRule ^(.*)\.html$ $1.php [L]
This serves the matching .php file to any visitor who requests one of your old .html addresses. It is an internal rewrite, so the address shown in the browser does not change.
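If the goal is the permanent (301) redirect asked about in the question, so that search engines learn the new .php addresses, a variant along these lines may work. It is a sketch that assumes the .htaccess file sits in the document root and that each .php file keeps the same name and folder as the old .html file:

```apache
RewriteEngine On
# Skip the redirect unless the matching .php file actually exists
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
# External permanent redirect, e.g. /about.html -> /about.php
RewriteRule ^(.+)\.html$ /$1.php [R=301,L]
```

The leading slash in the target matters here: in a .htaccess file, a relative target combined with R can end up with the filesystem path in the redirect URL unless RewriteBase is set.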
You hopefully have an IDE (I recommend Aptana), and you can use some of the find/change functions project-wide, and hopefully with some time and patience get your internal links from .html to .php.
But, I caution you a little bit - Perhaps it is time to look into a database based CMS, such as Wordpress or Drupal?
My website has a log in by OpenID feature. When a user logs in for the first time using his/her OpenID, they are redirected to a create account page. I noticed recently that one user created an account the first time she logged in with her Google account. However, when she tried to log in again using the same Google account, she was asked to create a new account again. I checked the DB and saw that although she used the same Google account, the OpenID URLs that were retrieved are different?
EDIT===================
Thanks Kobi for the information. The issue is that I need to set up my website so it always opens with www prepended to it, i.e. http://www.mysite.com and NOT http://mysite.com
Owing to this subtle difference, Google OpenID treats the two URLs as different URLs!!! Help please
I realised it's an htaccess thing, so I googled a bit and found these htaccess commands:
RewriteCond %{HTTP_HOST} ^site\.com$ [NC]
RewriteRule (.*) http://www.site.com/$1 [L,R=301]
However, the problem is that when I use this in my htaccess, it does forward and ensure the link reads as www.site.com, but it messes up all the JavaScript links. I'm actually using URL rewriting here as well... my whole htaccess file is somewhat like this:
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule .* index.php
RewriteRule !\.(js|ico|gif|jpg|png|css)$ index.php
AddType text/css .css
Including the two lines messes up the URL rewriting :( What do I do here?
======================
Uh never mind I figured it out :) I was putting the two rewrite url lines at the end thus somehow overriding the other rewrite rules - putting them in the beginning fixed it :) thanks anyway
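For anyone finding this later, the working order probably looks something like the sketch below. It is a guess pieced together from the snippets above, not the poster's actual file: the two front-controller rules are folded into one, and site.com stands in for the real domain.

```apache
RewriteEngine On

# 1. Canonical host first: send site.com to www.site.com
RewriteCond %{HTTP_HOST} ^site\.com$ [NC]
RewriteRule ^(.*)$ http://www.site.com/$1 [R=301,L]

# 2. Front controller afterwards: anything that is not a real file
#    or a static asset is handled by index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule !\.(js|ico|gif|jpg|png|css)$ index.php [L]

AddType text/css .css
```

With the redirect at the end, the request path has already been rewritten to index.php by the time the host check runs, so the redirect points at index.php instead of the address that was originally requested.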
Google gives different URLs for different domains.
Is it possible your user used a different URL each time to log in? Even www at the start of the URL can change the code Google returns.