Tell search engines that page does not exist - php

I have checked the logs and found that search engines visit a lot of bogus URLs on my website. They most likely date from before many of the links were changed, and even though I have set up 301 redirects, some links have been altered in very strange ways and aren't recognized by my .htaccess file.
All requests are handled by index.php. If a response can't be created due to a bad URL, a custom error page is presented instead. Simplified, index.php looks like this:
try {
$Request = new Request();
$Request->respond();
} catch(NoresponseException $e) {
$Request->presentErrorPage();
}
I just realized that this page returns a status 200, telling the bot that the page is valid even though it isn't.
Is it enough to add a header with 404 in the catch statement to tell the bots to stop visiting that page?
Like this:
header("HTTP/1.0 404 Not Found");
It looks OK when I test it, but I'm worried that SE bots (and maybe user agents) will get confused.

You're getting there. The idea is correct - you want to give them a 404. However, just one tiny correction: if the client queries using HTTP/1.1 and you answer using 1.0, some clients will get confused.
The way around this is as follows:
header($_SERVER['SERVER_PROTOCOL']." 404 Not Found");
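As a minimal sketch of the same idea, the status line can be built by a small helper that falls back to a fixed protocol when SERVER_PROTOCOL is missing (for example when the script runs from the command line). The helper name statusLine and the HTTP/1.1 fallback are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: builds a status line that matches the client's protocol.
function statusLine(int $code, string $reason, array $server): string
{
    // Assumed fallback: use HTTP/1.1 when SERVER_PROTOCOL is absent (e.g. on the CLI).
    $protocol = $server['SERVER_PROTOCOL'] ?? 'HTTP/1.1';

    return $protocol . ' ' . $code . ' ' . $reason;
}

// Answer with the same protocol version the client used.
header(statusLine(404, 'Not Found', $_SERVER));
```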

A well-behaved crawler respects robots.txt at the top level of your site. If you want to exclude crawlers, then @SalmanA's response will work. A sample robots.txt file follows:
User-agent: *
Disallow: /foo/*
Disallow: /bar/*
Disallow: /hd1/*
It needs to be readable by all. Note this is not going to get users off the pages, just a bot that respects robots.txt, which most of them do.

The SE bots DO get confused when they see this:
HTTP/1.1 200 OK
<h1>The page you requested does not exist</h1>
Or this:
HTTP/1.1 302 Object moved
Location: /fancy-404-error-page.html
It is explained here:
Returning a code other than 404 or 410 for a non-existent page (or
redirecting users to another page, such as the homepage, instead of
returning a 404) can be problematic. Firstly, it tells search engines
that there’s a real page at that URL. As a result, that URL may be
crawled and its content indexed. Because of the time Googlebot spends
on non-existent pages, your unique URLs may not be discovered as
quickly or visited as frequently and your site’s crawl coverage may be
impacted (also, you probably don’t want your site to rank well for the
search query File not found).
Your idea about programmatically sending the 404 header is correct and it instructs the search engine that the URL they requested does not exist and they should not attempt to crawl and index it. Ways to set response status:
header($_SERVER["SERVER_PROTOCOL"] . " 404 Not Found");
header(":", true, 404); // this is used to set a header AND modify the http response code
// ":" is used as a hack to avoid specifying a real header
http_response_code(404); // PHP >= 5.4
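Applied to the front controller from the question, this could look roughly like the sketch below. The Request and NoresponseException classes here are simplified stand-ins (the real ones are not shown in the question), and the handle() function and its return shape are assumptions made so the example is self-contained.

```php
<?php
// Stand-in for the exception thrown when no response can be built for a URL.
class NoresponseException extends Exception {}

// Simplified stand-in for the real Request class.
class Request
{
    private $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    public function respond(): string
    {
        // Assumption for the sketch: only "/" is a valid URL.
        if ($this->path !== '/') {
            throw new NoresponseException('No response for ' . $this->path);
        }
        return '<h1>Home</h1>';
    }

    public function presentErrorPage(): string
    {
        return '<h1>The page you requested does not exist</h1>';
    }
}

// Returns [status code, body] so the status always matches what is shown.
function handle(string $path): array
{
    $request = new Request($path);
    try {
        return [200, $request->respond()];
    } catch (NoresponseException $e) {
        // Same custom error page as before, but paired with a 404 status.
        return [404, $request->presentErrorPage()];
    }
}

[$status, $body] = handle($_SERVER['REQUEST_URI'] ?? '/');
http_response_code($status); // PHP >= 5.4
echo $body;
```

Creating the Request object before the try block also means it is guaranteed to exist inside the catch.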

Related

HTTP 302 error because of a header()

If I validate the HTML or register the site with any search engine, I get a 302 error.
The reason is a header() call. If I take it away, everything is fine with a 200 OK status.
The main problem is that I need this redirect for the site to be multilingual.
The logic is as follows. When a user opens the site for the first time, index.php loads a file via require_once that contains this function:
function cookies() {
if (!isset($_COOKIE["lang"])){
setcookie('lang','ukr', time()+(60*60*24*31));
header('Location: index.php');
}}
cookies();
so the user sees a page already filled with the default language.
Without the redirect from the require_once file, the data from MySQL isn't loaded and the user won't see any text.
The question: should I leave this with HTTP 302, or rebuild the whole site logic so there are no redirects on the index page?
302 is not an error. It is the status code for "Found" (aka "The document you asked for is over here"). PHP will insert this for you automatically if you add a Location header (unless you set a status manually, but you don't want a 301 here).
This is the expected response if you are telling people to go and get a different document based on their language preferences.
It is odd to redirect from index.php to index.php though. Presumably you should just return the appropriate document directly instead of redirecting.
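A rough sketch of returning the document directly: pick the language from the cookie if present, otherwise use the default for this request, and set the cookie for later visits without redirecting. The resolveLanguage helper and the 'eng' entry in the supported list are assumptions for illustration; only 'ukr' comes from the question.

```php
<?php
// Hypothetical helper: returns the language from the cookie, or the default
// when the cookie is missing or holds an unsupported value.
function resolveLanguage(array $cookies, array $supported, string $default): string
{
    $lang = $cookies['lang'] ?? $default;

    return in_array($lang, $supported, true) ? $lang : $default;
}

$lang = resolveLanguage($_COOKIE, ['ukr', 'eng'], 'ukr');

// Remember the choice for later visits, but do not redirect.
if (!isset($_COOKIE['lang']) && !headers_sent()) {
    setcookie('lang', $lang, time() + 60 * 60 * 24 * 31);
}

// ... load the texts for $lang from the database and render the page ...
```

Because the first request no longer depends on the cookie already being present, clients that ignore cookies (validators, crawlers) get a 200 response with the default language.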
I got it. It's actually pretty simple.
The validators don't accept cookies, so they get stuck in an infinite redirect loop.
You can test this:
Delete all your cookies from your computer.
Disable cookies in your browser and try loading your website.
Whenever you use header("Location: ...") you will get a 302. It's a status, not an error; it tells the browser that the page has been redirected:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
Check the documentation for those validators and engines to see whether the 302 is a problem for what you are trying to do; normally it shouldn't be.
A dirty way would be to force the status. Personally I don't encourage this and don't know what side effects it could have, but it could be a quick workaround to trick those engines. Note that browsers only follow a Location header on a 3xx response, so forcing 200 also stops the redirect for regular visitors:
function cookies() {
if (!isset($_COOKIE["lang"])){
setcookie('lang','ukr', time()+(60*60*24*31));
header('Location: index.php');
header('HTTP/1.1 200 OK'); // <--- Forcing the header status
}}
cookies();

Is a Search Engine Friendly Redirect possible with an Interface page?

I want to redirect a page like http://www.mysite.com/index.php?id=1 to http://www.mysite.com/the-real-name.htm, but the first URL doesn't contain the-real-name, so I have to get it from the DB.
I made an interface page and redirected http://www.mysite.com/index.php?id=1 to it. There I fetched the-real-name from the DB (using the id parameter in the URL) and then redirected to http://www.mysite.com/the-real-name.htm with PHP's header() function.
Is this process search engine friendly?
Which page will be indexed by the search engine crawler? The interface page or http://www.mysite.com/the-real-name.htm ?
Which is the best solution for getting http://www.mysite.com/the-real-name.htm indexed?
Thanks a lot
If you want to tell the search engine that the final URL is "the URL", you need to do a permanent redirect. The HTTP status code is 301.
header('Location: http://www.mysite.com/the-real-name.htm', true, 301);
For the first redirect, you need to do a temporary redirect. The HTTP status code is 302.
header('Location: http://www.mysite.com/index.php?id=1', true, 302);
Keep in mind that it's good practice to not only send headers back for redirects, but a HTTP/HTML BODY as well that is shipping human readable information where the new location is. Redirects are not to be expected to be automatically performed by the client.
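A minimal sketch of a redirect that ships a human-readable body along with the headers. The buildRedirect helper and the wording of the body are assumptions for illustration; the target URL is the one from the question.

```php
<?php
// Hypothetical helper: prepares the status, Location header and a short HTML
// body that tells humans (and clients that don't follow redirects) where to go.
function buildRedirect(string $url, int $status = 301): array
{
    // Escape the URL before placing it inside HTML.
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    return [
        'status'   => $status,
        'location' => 'Location: ' . $url,
        'body'     => '<p>This page has moved to <a href="' . $safe . '">' . $safe . '</a>.</p>',
    ];
}

$redirect = buildRedirect('http://www.mysite.com/the-real-name.htm');

// Send the Location header together with the status code, then the body.
header($redirect['location'], true, $redirect['status']);
echo $redirect['body'];
```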
Different ways to implement
Depending on the system you work on, setting an HTTP status header with PHP might differ. The code above is for a working PHP version; stick to the latest. However, if you cannot and the server integration is broken, you might have to push things a bit and force the status manually:
# Manually sending the HTTP/1.1 status line - PHP does this nowadays, so normally not needed. But if you need it, ensure it's the first header you send.
header ('HTTP/1.1 301 Moved Permanently');
# Same here, but some CGI/FCGI+PHP implementations require you to set the Status header as well manually. Normally not needed.
header ('Status: 301');
# Set the Location header and status: (you will always need this)
header ('Location: http://www.mysite.com/the-real-name.htm', true, 301);
Always check that your script sends the correct headers by requesting it with a tool that displays the response headers without following the redirect automatically, like curl:
$ curl -i "http://www.mysite.com/index.php?id=1"
Otherwise you wait a long time for Google to reflect the changes, only to realize that you made an error.
When redirecting, also set a 301 header and the search engines will know what to do from there.
header ('HTTP/1.1 301 Moved Permanently');
header ('Location: '.$location);
If the relationship cannot change, use a permanent redirect as hakre suggested (a 301 status code). Otherwise, if that same id value might point somewhere else in the future, use a temporary redirect.
In either case, if the canonical (official, main, primary) URL is "http://www.mysite.com/the-real-name.htm", you can tell search engines that with a canonical link element in the page's head section:
<link rel="canonical" href="http://www.mysite.com/the-real-name.htm" />

PHP or htaccess make dynamic url page to go 404 when item is missing in DB

Typical scenario:
DB items are displayed on a page such as http://...?item_id=467
One day a user deletes the item
Google or a user attempts to access http://...?item_id=467
PHP looks in the DB and sees that the item no longer exists, so PHP must now tell Google or the user, via a 404 header and page, that the item does not exist.
According to this answer, I understood there is no way to redirect to the Apache 404 page from PHP other than sending the 404 header in code and then reading and sending the full contents of your default 404 page to the client.
The problem: I already have a custom 404.shtml page that is parsed by Apache, so obviously I would like to simply use that page.
But if I read an .shtml page via PHP, it won't be parsed by Apache anymore.
So what do you suggest I do?
Is there maybe some trick I could use with .htaccess too?
Thanks,
Hmm. Two ideas come to mind:
Redirect to the 404 page using header("Location:...") - this is not standards-compliant behaviour though. I would use that only as a last resort.
Fetch and output the Apache-parsed SHTML file using file_get_contents("http://mydomain.com/404.shtml"); - also not really optimal because a request is made to the web server but, I think, acceptable in most cases.
I doubt there is anything you can do in .htaccess because the PHP script runs after any rewrite rules have already been parsed.
If you are using Apache with mod_php, use virtual('/404.shtml'); to display the parsed .shtml page to your user.
I was trying to do this exact same thing yesterday.
Does Pekka's file_get_contents/include result in a 404 status header being sent? Perhaps you need to do this before including the custom error page?
header($_SERVER["SERVER_PROTOCOL"]." 404 Not Found");
You can test using this Firefox extension.
I was looking for exactly what you needed. So you have a page:
http://example.com/page?item_id=456
and later, if the item is missing, you want to be redirected to:
http://example.com/page_not_found?item_id=456
In reality I found it a much more maintainable solution to just use the original page as the 404 page.
<?php
$item = findItem( $_GET['item_id']);
if($item === false){
//show 404 page sending correct header and then include 404 message
header( $_SERVER['SERVER_PROTOCOL'].' 404 Not Found', true );
// you can still use $_GET['item_id'] to customize error message
// "maybe you were looking for XXX item"
include('somepath/missingpage.php');
return;
}
//continue as usual with normal page
?>
So if the item is no longer in the DB, the 404 page is shown, but you can still offer replacement items or custom error messages.

Return 404 if non-existent page # PHP

I have a dynamic review system in place that displays 30 reviews per page, and upon reaching 30 reviews it is paginated. So I have pages such as
/reviews/city/Boston/
/reviews/city/Boston/Page/2/
/reviews/city/Boston/Page/3/
and so on and so forth
Unfortunately, Google seems to be indexing pages through what seems like inference - such as
/reviews/city/Boston/Page/65/
This page absolutely does not exist, and I would like to inform Google of that. Currently it displays a review page but with no reviews. I can't imagine this being very good for SEO. So, what I am trying to do is first check the number of results from my MySQL query, and if there are no results, return a 404 and forward them to the home page or another page.
Currently, this is what I have.
if (!$validRevQuery) {
header("HTTP/1.0 404 Not Found");
header("Location: /index.php");
exit;
}
Am I on the right track?
You need to output the 404 status, and show a response body (= an error page) at the same time.
if (!$validRevQuery) {
http_response_code(404);
// output full HTML right here, like include '404.html'; or whatever
exit;
}
Note that you cannot use a redirect here. A redirect is a status code just as the 404 is. You can't have two status codes.
You cannot do both send a 404 status code and do a redirection (usually 3xx status code). You can only do one of them: Either send a 404 status code and an error document or respond with a redirection.
As Pekka suggests, the best option is to do a 404 status, and then put your 404 page code after that.
It is bad practice for SEO if you just 301 (redirect) the page because then the search engines will continue to visit the page in order to see if the redirect is still there.
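For the pagination case in the question, the check could be sketched as below, assuming the total number of reviews for the city is known (for example from a COUNT(*) query). The pageExists helper, the sample numbers, and the choice to treat page 1 as always valid are assumptions for illustration.

```php
<?php
// Hypothetical helper: decides whether a page number is within range.
function pageExists(int $page, int $totalReviews, int $perPage = 30): bool
{
    if ($page < 1) {
        return false;
    }

    // Assumption: page 1 always exists, even when there are no reviews yet.
    $lastPage = max(1, (int) ceil($totalReviews / $perPage));

    return $page <= $lastPage;
}

// Placeholder values: page 65 requested, 61 reviews in total (3 pages).
$status = pageExists(65, 61) ? 200 : 404;
http_response_code($status);
// On 404, output the error page body here instead of redirecting.
```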

Redirecting to frontpage after 404 error in PHP

I have a php web page that now uses custom error pages when a page is not found. The custom error pages are included in PHP.
So when somebody types in a URL that does not exist I just include an error page, and the error page starts with:
<?php header("HTTP/1.1 404 Not Found"); ?>
This also tells crawlers that the page does not exist.
Now I have set up a new system. When a user types a wrong url, the user is sent back to the frontpage and a message is displayed on the frontpage. I redirect to the frontpage like this:
header('Location:' . __TINY_URL . '/');
Now the problem is PHP just sends back a 200 code, page found.
How can I combine these two to send a 404 code on the frontpage?
And is this, overall, a nice way of presenting an error page?
It's giving you a 200 code because you are redirecting to a page that returns a 200 code. The way I've done this before is to send the 404 header and then load the 404 view.
header("HTTP/1.0 404 Not Found");
include("four_o_four.php");
Redirecting after an error is not a very good idea. It's especially annoying for people who like to type in/edit URLs, because if you make a typo, you'll get redirected to some arbitrary page and have to start over.
I suggest you don't do this at all. If you want to, you can have your error page look like your front page though, albeit I think that'd be somewhat confusing.
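You can add this to your .htaccess:
ErrorDocument 404 /404.php
or, to show your homepage instead:
ErrorDocument 404 /index.php
Use a local path rather than a full http:// URL: with a local path Apache sends the initial 404 status itself, whereas a full URL makes Apache redirect the client, which then never sees the 404.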
