stop google indexing php page that redirects - php

I have a (randomly named) php file that does a bit of processing and then uses a header("location:url") to redirect to another page.
As mentioned, the script has a random name (eg: euu238843a.php) for security reasons as I don't want people stumbling upon it.
Thing is - how do I stop Google from indexing it? I want other pages in the same directory to be indexed, just not this PHP file. I don't want people to be able to do a site:myurl.com search and find the "hidden" script.
I would normally put a meta name="robots" content="noindex" in the head, but I can't do that here as the page needs to output headers at the end to redirect.
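For reference, a minimal sketch of the kind of script described (the file name and target URL are illustrative). Because the Location header has to go out before any output, there is no HTML head in which to place a robots meta tag:

<?php
// euu238843a.php (illustrative name) - do the processing, then redirect.
// ... processing ...

// No HTML is ever sent, so a <meta name="robots"> tag has nowhere to go.
// An X-Robots-Tag response header (discussed further down this page)
// could also be sent at this point.
header("Location: https://example.com/landing-page");
exit;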

You can dynamically update the robots.txt file within the directory using a PHP script that outputs a new or appended robots.txt as needed. If you specify each dynamic filename on its own line, such as Disallow: /File_12345.html, you avoid having to disallow the entire directory.
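A minimal sketch of appending such a rule, assuming robots.txt lives in the document root and is writable by PHP (the helper name and filename are illustrative):

<?php
// Hypothetical helper: add a Disallow rule for a randomly named script.
function disallowInRobots(string $filename): void
{
    $robots = $_SERVER['DOCUMENT_ROOT'] . '/robots.txt';
    $rule   = "Disallow: /" . $filename . "\n";

    // Start a fresh file if none exists, and avoid duplicate rules.
    $current = is_file($robots) ? file_get_contents($robots) : "User-agent: *\n";
    if (strpos($current, $rule) === false) {
        file_put_contents($robots, $current . $rule, LOCK_EX);
    }
}

disallowInRobots('euu238843a.php');

Note that, as the Google FAQ quoted further down this page points out, a URL blocked in robots.txt may still show up in search results; the X-Robots-Tag header discussed there is what actually keeps a file out of the index.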

Related

Is there a way I could redirect to a common php based pdf display template when a user clicks a link to a PDF file?

Is there a way I could redirect users to a PDF display template inside my website rather than having the PDF file open directly in their browser?
For example, if a user clicks on a link to http://example.com/docs/date/1.pdf
I want him to be redirected to, let's say, http://example.com/docview.php, and this PHP script needs to get the details of the PDF file from the URL of the previous link and then display the right PDF file.
All help appreciated.
Thanks in advance!
Two options:
1 - Use htaccess rewrite rules to turn PDF requests into PHP requests. This has the advantage that the user will see a link that actually says "PDF" in the URL. However, it can get a bit tricky to implement, and you need to be careful to limit its scope or you could easily end up with ALL PDFs anywhere on the site, including some that should just be served as static PDF files, redirecting to the script. This will do exactly what you ask; search for "htaccess rewrite" and you will find plenty of examples.
2 - Change the links to reference your PHP script directly. The PHP script can then provide whatever frame or viewer is desired, or simply check for permission (if needed), read the PDF file and output it to the browser. A .php extension on the URL doesn't matter; the browser will display the PDF correctly based on the MIME type of the output. This is my personal preference for providing PDF output and I have done it many times.
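A minimal sketch of option 2, assuming the PDFs live under a fixed directory and docview.php receives the document as a query parameter, e.g. /docview.php?doc=date/1.pdf (the parameter name and storage path are assumptions):

<?php
// docview.php - stream a PDF through PHP (illustrative sketch).
$base = '/var/www/docs';              // assumed storage location
$doc  = $_GET['doc'] ?? '';

// Resolve the path and make sure "../" tricks cannot escape the docs directory.
$path = realpath($base . '/' . $doc);
if ($path === false || strpos($path, $base . '/') !== 0 || !is_file($path)) {
    http_response_code(404);
    exit('Document not found');
}

// (Optional) check the user's permissions here before serving the file.

header('Content-Type: application/pdf');
header('Content-Length: ' . filesize($path));
header('Content-Disposition: inline; filename="' . basename($path) . '"');
readfile($path);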

Apache 2.4 - How to redirect images to a php file?

I need to redirect all images to a php file, including the path and file name.
Imagine my domain is example.com
I might have https://example.com/art/logo.png and want to redirect this to https://example.com/scripts/image_loader.php?a:art&b=logo.png
So, I would force any request to be redirected to a php-file which will display the image. I am doing this to control who can access the images and also to prevent hotlinking.
I also want HTML files to go through this redirect so the PHP file serves the image, e.g. for an included <img src="/art/logo.png">.
I already tried the mod_rewrite approach based on the referrer header, but it's not working and I am not sure why. I am assuming that the HTTPS protocol doesn't send a referrer header.
For the PHP side, I need to know how I can stop someone from hotlinking my images or from just downloading them as files.
Can I determine if the user is a person or a bot or some kind of site ripper?
I know I can't stop a person from downloading images. I just want to make it harder for them. So, they cannot just easily download them or use a site ripper software.
Any suggestions would be great. My website runs on HTTPS, and a lot of the hotlinking solutions online only show examples using HTTP.
1. Can you determine whether the request is from a user or a bot?
Yeah, more or less. You can use the User-Agent header field to determine whether the request was sent by a browser or not. But then the question is whether you want to blacklist all site rippers (like HTTrack) or whitelist all common browsers. Either way it will be an incomplete and pretty long list.
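A rough sketch of the blacklist approach (the list of strings is illustrative and nowhere near complete):

<?php
// Very rough User-Agent blacklist check (illustrative, incomplete).
function looksLikeRipper(string $userAgent): bool
{
    $blacklist = ['httrack', 'wget', 'curl', 'webcopier'];   // assumed examples
    $ua = strtolower($userAgent);
    foreach ($blacklist as $needle) {
        if (strpos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}

if (looksLikeRipper($_SERVER['HTTP_USER_AGENT'] ?? '')) {
    http_response_code(403);
    exit;
}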
2. How to rewrite all images to your php-file
RewriteEngine On
RewriteRule ^/?([^/]+)/([^/]+\.(jpe?g|gif|png|tif|bmp))$ /scripts/image_loader.php?a:$1&b=$2 [L]
This rewrites all requests for standard image types whose URI follows your '/path/image.png' pattern to /scripts/image_loader.php?a:path&b=image.png
update
RewriteRule ^/?(.+)/([^/]+\.(jpe?g|gif|png|tif|bmp))$ /scripts/image_loader.php?a:$1&b=$2 [L]
The first group captures everything after the server name up to the last path segment, so in the example /path/example/image.png it would contain path/example. The second group captures only the file name, so in the example /path/example/image.png it would contain image.png. Both values are then assigned to the target URL -->
/scripts/image_loader.php?a:path/example&b=image.png
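On the PHP side, a minimal sketch of what /scripts/image_loader.php could look like. Note that the a:$1 form from the question does not parse into $_GET as a normal parameter, so this sketch assumes the rewrite target is changed to a=$1&b=$2; the referrer check and MIME table are also assumptions:

<?php
// scripts/image_loader.php - serve an image through PHP (illustrative).
$base = $_SERVER['DOCUMENT_ROOT'];
$dir  = $_GET['a'] ?? '';
$file = $_GET['b'] ?? '';

// Resolve the path and make sure the request stays inside the document root.
$path = realpath($base . '/' . $dir . '/' . $file);
if ($path === false || strpos($path, $base . '/') !== 0 || !is_file($path)) {
    http_response_code(404);
    exit;
}

// Crude hotlink check: allow an empty referrer (many clients omit it),
// otherwise require our own host. Easily spoofed, so treat it as a speed bump.
$referer = $_SERVER['HTTP_REFERER'] ?? '';
if ($referer !== '' && parse_url($referer, PHP_URL_HOST) !== $_SERVER['HTTP_HOST']) {
    http_response_code(403);
    exit;
}

$types = ['jpg' => 'image/jpeg', 'jpeg' => 'image/jpeg', 'gif' => 'image/gif',
          'png' => 'image/png', 'tif' => 'image/tiff', 'bmp' => 'image/bmp'];
$ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));

header('Content-Type: ' . ($types[$ext] ?? 'application/octet-stream'));
header('Content-Length: ' . filesize($path));
readfile($path);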
Hopefully I got your question right...

Google finds Duplicate Title Tags

I have 2 issues regarding duplicate content.
I am using Google Webmaster Tools and was notified that I have a few files with duplicate content.
Both are related to duplicate title tags:
ISSUE #1: 2 files:
(a) this is the correct file
(b) this is the same file but with "?xxxx" after the filename
I do not know how to remove (b) from the list, as it is not a real file, merely the same filename with a tracking code attached to the end. How do I remove it?
ISSUE #2: 2 files:
(a) this is the correct file
(b) this same file, but with ".php" added to the end
I added a redirect script to .htaccess to remove the filename extension (.php) from the files so that they would load without the extension. Now Google is telling me some are duplicate content. How do I remove (b)?
ISSUE #1: Google uses URLs to index pages, not file paths, so each URL must represent unique content. For instance, "a.php?r=1" is different from "a.php?r=2" because the URLs are different. To solve this problem you can either 301 redirect (b) to (a), if they behave identically, or use rel="canonical".
ISSUE #2: you can use rel="canonical" here too, or, if you are certain that nobody can access (b), you can simply disregard the messages; they will disappear after a while.
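For issue #1, a minimal sketch of the 301-redirect approach in PHP, assuming the tracking parameter can simply be dropped and the canonical URL is the same path without a query string:

<?php
// Top of the affected page: 301 any "?xxxx" tracking variant back to
// the canonical URL (record the tracking code first if it is still needed).
if (!empty($_SERVER['QUERY_STRING'])) {
    $canonical = strtok($_SERVER['REQUEST_URI'], '?');
    header('Location: ' . $canonical, true, 301);
    exit;
}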

Hide uploaded files from search results?

A client running WordPress has requested the development of the following feature on their website.
They would like to include/exclude specific files (typically PDF) uploaded via the WordPress media uploader from search results.
I'm guessing this could be done somehow using a robots.txt file, but I have no idea where to start.
Any advice/ideas?
This is from the Google Webmasters developer site: https://developers.google.com/webmasters/control-crawl-index/docs/faq
How long will it take for changes in my robots.txt file to affect my search results?
First, the cache of the robots.txt file must be refreshed (we generally cache the contents for up to one day). Even after finding the change, crawling and indexing is a complicated process that can sometimes take quite some time for individual URLs, so it's impossible to give an exact timeline. Also, keep in mind that even if your robots.txt file is disallowing access to a URL, that URL may remain visible in search results despite the fact that we can't crawl it. If you wish to expedite removal of the pages you've blocked from Google, please submit a removal request via Google Webmaster Tools.
And here are specifications for robots.txt from Google https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
If your file's syntax is correct, the best answer is just to wait until Google picks up your new robots.txt file.
I'm not certain how to do this within the confines of WordPress, but if you're looking to exclude particular file types, I would suggest using the X-Robots-Tag HTTP header. It's particularly useful for PDFs and other non-HTML file types where you would normally want to use a robots meta tag.
You can add the header to responses for the specific file types and set its value to noindex. This will prevent the PDFs from being included in the search results.
You can use the robots.txt file if the URLs end with the filetype or something that is unique to the file type. Example: Disallow: /*.pdf$ ... but I know that's not always the case with URLs.
https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
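If the PDFs happen to be served through a PHP script rather than directly by the web server (an assumption; for directly served files the header has to be added in the server configuration instead), the header could be sent like this:

<?php
// Send X-Robots-Tag before streaming a PDF so search engines skip it.
// The path is illustrative.
$path = '/var/www/uploads/private-report.pdf';

header('X-Robots-Tag: noindex, nofollow');
header('Content-Type: application/pdf');
header('Content-Length: ' . filesize($path));
readfile($path);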

Including headers and CSS in different directories

This must be a simple fix, but I just can't figure it out!
I am including the header into each new page via PHP; the header contains the CSS and scripts etc.
The only problem is that when the page is in a different directory, the include of the header itself (../header etc.) works fine, but the scripts inside the header are referenced with relative URLs, e.g. js/script.js.
Which means that on a page in another directory the scripts do not work!
I'm finding it hard to explain but take a look at this:
http://www.healthygit.com/
If you view the source all the scripts link fine.
Now look at this:
http://www.healthygit.com/fitness/running.php
If you try to click on a script to view it, it takes you to a 404 or in this case 302's you to the homepage.
Easy peasy: the src for your scripts, stylesheets etc. should have a / in front of it, so that the header file always refers to the files from the root of the site.
I.e. this:
src="js/whatever.js"
Should become
src="/js/whatever.js"
This way it will always look for the files from the root of the site.
The same applies with CSS files, i.e. /css/whatever.css.
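As a small sketch (file names are illustrative), a shared header.php using root-relative asset paths can be included from any directory, e.g. with include '../header.php'; from fitness/running.php, and the assets still resolve from the site root:

<?php /* header.php - shared header; the leading slashes make the asset
         paths resolve from the site root regardless of which page
         includes this file */ ?>
<link rel="stylesheet" href="/css/whatever.css">
<script src="/js/whatever.js"></script>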
