I am trying to figure out how best to block requests from a certain domain.
I have found that there is a site that is scrapping data using PhP.
I believe (based on my tests and looking at logs) that they are doing this with every request instead of using a cron job.
I don't know enough about PhP to know if I am going down the right path or not. But I have the URL of the PhP page (I will just block the entire domain).
My website is built on Rails.
The best way is to block the user when he hits your server. If you are running Apache, you can add this to your .htaccess file:
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} badsite\.com [NC]
RewriteRule .* - [F]
Related
A new client needs my help, their web developer messed up - built website on a draft/test server but forgot to block Google etc. I would appreciate help for the community here, I am not an expert with HTACCESS redirection.
As I said, another website developer setup the clients draft site on their draft server, its been there for months, however they forgot to hide it from search engines, so the content has been indexed by Google etc – this will trigger a duplicate content penalty if web put the new website live and the new website will be useless effectively.
I have access to the draft site / server and can modify the HTACCESS file, so when the new site goes live I would like to have the correct redirects in place. There are a few subdomains on the site (it's a multi language site), so it's a little tricky.
The website is built on Wordpress
The website structure looks like this on the test server. All files page names and file names are identical, just moving to a new server.
http://clientdomain.testserver.com
http://it.clientdomain.testserver.com
http://fr.clientdomain.testserver.com
http://es.clientdomain.testserver.com
http://de.clientdomain.testserver.com
http://ko.clientdomain.testserver.com
http://pt.clientdomain.testserver.com
http://ru.clientdomain.testserver.com
http://tr.clientdomain.testserver.com
http://cn.clientdomain.testserver.com
The redirects will need to go here:
http://clientdomain.com
http://it.clientdomain.com
http://fr.clientdomain.com
http://es.clientdomain.com
http://de.clientdomain.com
http://ko.clientdomain.com
http://pt.clientdomain.com
http://ru.clientdomain.com
http://tr.clientdomain.com
http://cn.clientdomain.com
The existing HTACCESS file on the test server looks like this
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
# add a trailing slash to /wp-admin
RewriteRule ^wp-admin$ wp-admin/ [R=301,L]
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]
RewriteRule ^(wp-(content|admin|includes).*) $1 [L]
RewriteRule ^(.*\.php)$ $1 [L]
RewriteRule . index.php [L]
I would really appreciate any help on this.
There are some existing threads which contain all the pieces of the HTACCESS puzzle, but I am a little confused:
How can I redirect from one subdomain to another in .htaccess?
How can I redirect from one subdomain to another in .htaccess?
Kind Regards,
GG
If it was me I wouldn't bother messing around with redirects, get the urls removed from the index. Google will remove them with 24 hours, sometimes much quicker nowadays.
Add the development domain to your Webmaster Tools account and verify it. Then go to Google Index -> Remove Urls;
Just enter the the value / in the removal request which tells Google to remove every url in the index for that domain.
Then add a blocking robots.txt file to site root;
User-agent: *
Disallow: /
And what I normally do (this has happened a couple of times to me despite robots.txt and basic auth protection - git disaster/shenanigans) is prompt Google to reindex the site straight away. Go to Crawl -> Fetch as Google
Leave the input box blank so it fetches the whole site and just hit the Fetch button. When Google has fetched it click the 'Submit to Index' button.
You will be amazed how quickly this can happen these days, used to take weeks if you were lucky.
EDIT
And just to make sure this doesn't happen to anyone else finding this, the best way to stop it getting a dev site indexed isn't a robots.txt file or using Basic Auth via the .htaccess file (as previously mentioned it's easy to accidentally delete these). You should enable Basic Auth on the development site via the vhosts file.
Like it's not only for Google...
You can use this .htaccess:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^((?:..\.)?clientdomain)\.testserver\.com [NC]
RewriteRule ^ http://%1.com%{REQUEST_URI} [NE,L,R=301]
We are building a website where user has to login in order to view site's content (similar to what facebook and twitter use)
The problem is that our site's navigation is completely messed up:
When user opens the site, he is at: sitename.com
When user logs-in, location changes to: sitename.com/login_success.php
when user uses navigation bar, location changes to: sitename.com/login_success.php#page2 (AJAX is used to change the div content)
In comparison to facebook (url):
user is loged-in: sitename.com
user is NOT loged-in: sitename.com
user navigates to friend search: sitename.com/search
user navigates to settings: sitename.com/settings
Why do sites like facebook have so clean URLs? How do they do it? I'd like to create a clean website, with clean/user-riendly URLs (without # or ? and & and =) - where do I start? Do we need to use any framework (yii, zend, etc..)?
yeah, you gotta use mod-re-write.
for example, this is how to change sitename.com/login_success.php#page2 into sitename.come/page2:
<IfModule mod_rewrite.c>
Options +FollowSymlinks
RewriteEngine On
</IfModule>
#first, what is the original request
RewriteCond %{THE_REQUEST} /login_success.php#page([0-9]*)
# now use regex to redirect to the clean url structure
RewriteRule ^$ /page%1? [R=301,L,NE]
# now make the clean url serve the content from the ugly one
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^page([0-9]+) /login_success.php#page$1 [L]
I'm not quite sure about that last regex match, but I hope this gets you on the right track!
I believe considerable amount of coding is done in your case.So it would not be advisable to switch to some framework like yii or zend now, this decision should be taken earlier.
Check how to simplify the url.
You can use mod_rewrite of apache web server.
They use mod_rewrite and similar tools to clean up their URLs.
mod_rewrite is available on Apache. The IIS equivalent is named URL Rewrite.
http://httpd.apache.org/docs/current/mod/mod_rewrite.html
http://www.iis.net/downloads/microsoft/url-rewrite
You don't need to use a special framework to get this to work, but it helps ease the process, as many frameworks have this feature built-in.
One that comes to mind is Wordpress. Wordpress gives you great control over how this works without having to touch the configuration files too much.
I currently run a site with 750 pages of .html webpages (yeah I know it was a stupid idea, but I'm a novice). I'm looking to move these to php. I don't really want to set up 750 individual 301 redirects and rewrite each page to .php
I've heard that I can use htaccess to this. Anyone know how?
A few additional questions -
Can I permanently redirect these links from html to php without losing my search engine rankings and
if I want to add php to each of the files (i.e. a php file menu (using the include command) to make the links quicker to update will this work? Because won't they still be html files?
Sorry for the stupid questions, but I'm still learning.
Congratulations on a 750 page site - you must have put some work into that.
To collect your current list of pages use a tool called xenu to create an export into excel. You can then easily change the name the files to PHP in column b and create a .htaccees file.
However why would you want 750 php files? If you have lots of data pages, make it one page and suck in the HTML main content and reference one page. If you have a page called warehouse-depot-22-row-44.html then change that to show-warehouse-row.php?depot=22&row=44 and return that content only. This will significantly reduce your number of pages and to start using databases to render the content.
For redirecting you could use the Apache Module mod_rewrite: https://httpd.apache.org/docs/current/mod/mod_rewrite.html
You can use url rewriting to match a specific file name request with a regular expression and then decide where to redirect if matched
RewriteRule ^myname/?$ myname.php [NC,L]
http://www.addedbytes.com/articles/for-beginners/url-rewriting-for-beginners/
Depends on the structure you have.
You want the user to access them in their natural location?
/public_html/folder1/file.php
user would access like
mydomain.com/folder1/file
or you want to map them differently?
Personally I think I would use a rewrite rule to map all requests to my /public_html/index.php and would map the requests from there using php (using include for instance). This gives great flexibility, plus you have a single point of entry for your application which is very beneficial since you can easily maintain control of the application flow.
The .htaccess would look like this
#
# Redirect all to index.php
#
RewriteEngine On
# if a directory or a file exists, use it directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# RewriteCond %{REQUEST_URI} !^/index\.php
# RewriteCond %{REQUEST_URI} (/[^.]*|\.(php|html?))$ [NC]
RewriteCond %{REQUEST_URI} (/[^.]*|\.)$ [NC]
RewriteRule .* index.php [L]
of course I place all my not directly accessible files (everything except index and css, js, images, etc) to a folder outside the public_html to ensure no user can ever access them directly ;)
I've had a similar (yet much much smaller) site that went through the same thing.
I have this in my .htaccess:
RewriteEngine On
RewriteRule ^(.*)\.html$ $1.php [L]
This will help redirect any visitors to your .html addresses to your .php addresses.
You hopefully have an IDE (I recommend Aptana), and you can use some of the find/change functions project-wide, and hopefully with some time and patience get your internal links from .html to .php.
But, I caution you a little bit - Perhaps it is time to look into a database based CMS, such as Wordpress or Drupal?
I have player.php file which calls the video player to play a certain video. How can i block certain sites from accessing this file and using it to embed videos on there site. In other words What code can i use inside player.php to block certain sites from accessing this file only.
You can do this on three levels.
1) Web server
For instance, using .htaccess file if you're on an Apache server.
This could be done with a rewrite that pushes them to some dummy file or 404 or whatever you like. For example:
RewriteEngine on
# Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} badsite\.com [NC]
RewriteRule .* - [F]
This is really the ideal way because it precludes the need to interpret PHP.
2) PHP
In your page, use the $_SERVER['HTTP_REFERER'] (which may not be set if there is no referrer) and search for the domain in question in the string.
This is second best, and may be your only option if you can't alter the Apache configuration.
3) Javascript
Doesn't really prevent access to anything, because the check happens client-side (they've downloaded player.php and the Javascript itself prior to running it). If they went directly to the video or whatever, it wouldn't stop them from getting the file. You would use the document.referrer and search for the domain as with the PHP example.
If you are using Apache and have access to your .htaccess file, I suggest you use that instead. This page is an excellent resource.
You could try something like this, assuming player.php is in your web root:
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^player\.php.*
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?your-domain\.com/ [NC]
RewriteRule .* http://your-domain.com/please-dont-steal\.php[NC]
You're better off dealing with this issue server side, so PHP is a good bet. You'll need to examine the HTTP referrer header to see whether you're being hotlinked.
there are lots of tricks you can do with Apache mod-rewrite and/or .htaccess
I have two different domains that both point to my homepage in the same server.
I want to log every single access made to my homepage and log which domain the user used to access my homepage, how can I do this?
I tried mod_rewrite in Apache and logging to a MySQL database with PHP but all I could do was infinite loops.
Any ideas?
EDIT:
By your answers, I see you didn't get what I want...
As far as I know Google Analytics does not allow me to differentiate the domain being used if they both point to the same site and it also does not allow me to see that some files like images were accessed directly instead of through my webpages.
I can't also just use $_SERVER['HTTP_HOST'] cause like I just said, I want to log EVERYTHING, like images and all other files, every single request, even if it doesn't exist.
As for Webalizer, I never saw it differentiate between domains, it always assumes the default domain configure in the account and use that as root, it doesn't even display it. I'll have to check it again, but I'm not sure it will do what I want...
INFINITE LOOP:
The approach I tried involved rewriting the urls in Apche with a simple Rewrite rule pointing to a PHP script, the PHP script would log the entry into a MySQL database and the send the user back to the file with the header() function. Something like this:
.htaccess:
RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.net [NC]
RewriteRule ^(.*)$ http://www.domain1.net/logscript?a=$1 [NC,L]
RewriteCond %{HTTP_HOST} ^(www\.)?domain2\.net [NC]
RewriteRule ^(.*)$ http://www.domain2.net/logscript?a=$1 [NC,L]
PHP Script:
$url = $_GET['a'];
$domain = $_SERVER['HTTP_HOST'];
// Code to log the entry into the MySQL database
header("Location: http://$domain/$url");
exit();
So, I access some file, point that file to the PHP script and the script will log and redirect to that file... However, when PHP redirects to that file, the htaccess rules will pick it up and redirect again too the PHP script, creating an infinite loop.
The best thing do would be to parse the server logs. Those will show the domain and request. Even most shared hosting accounts provide access to the logs.
If you're going to go the rewrite route, you could use RewriteCond to check the HTTP_REFERER value to see if the referer was a local link or not.
RewriteCond %{HTTP_HOST} ^(www\.)?domain1\.net [NC]
RewriteCond %{HTTP_REFERER} !^(.*)domain1(.*)$ [NC]
RewriteRule ^(.*)$ http://www.domain1.net/logscript?a=$1 [NC,L]
RewriteCond %{HTTP_HOST} ^(.*)domain2\.net [NC]
RewriteCond %{HTTP_REFERER} !^(.*)domain2(.*)$ [NC]
RewriteRule ^(.*)$ http://www.domain2.net/logscript?a=$1 [NC,L]
You may also want to post in the mod_rewrite forum. They have a whole section about handling domains.
If Google Analytics is not your thing,
$_SERVER['HTTP_HOST']
holds the domain that is used, you can log that (along with time, browser, filepath etc). No need for mod_rewrite I think. Check print_r($_SERVER) to see other things that might be interesting to log.
Make sure to still escape (mysql_real_escape_string()) all the log values, it's trivially easy to inject SQL via the browser's user-agent string for example.
So, I access some file, point that file to the PHP script and the script will log and redirect to that file... However, when PHP redirects to that file, the htaccess rules will pick it up and redirect again too the PHP script, creating an infinite loop.
Can you check for HTTP headers in the RewriteCond? If so, try setting an extra header alongside the redirect in PHP (by convention custom HTTP headers start with 'X-' so it could be header('X-stayhere: 1');), and if the X-stayhere header is present, the RewriteCond fails and it doesn't forward the browser to the PHP script.
If, however, you can cron a script to download the server logs and run them through some freeware logfile analyzer, I'd go with that instead. Having two redirects for every request is a fair bit of overhead.. (and if I was more awake I might be able to come up with different solutions)
Does Google Analytics not provide this option? Or could you not parse your server log files?
Why not use the access log facility build in apache?
Apache have a "piped log" function that allow you redirect the access log to any program.
CustomLog "|/path/to/your/logger" common