Return 404 if non-existent page # PHP - php

I have a dynamic review system in place that displays 30 reviews per page, and upon reaching 30 reviews it is paginated. So I have pages such as
/reviews/city/Boston/
/reviews/city/Boston/Page/2/
/reviews/city/Boston/Page/3/
and so on and so forth
Unfortunately, Google seems to be indexing pages through what seems like inference - such as
/reviews/city/Boston/Page/65/
This page absolutely does not exist, and I would like to inform Google of that. Currently it displays a review page but with no reviews. I can't imagine this being very good for SEO. So, what I am trying to do is first check the number of results from my MySQL query, and if there are no results, return a 404 and forward them to the home page or another page.
Currently, this is what I have.
if (!$validRevQuery) {
header("HTTP/1.0 404 Not Found");
header("Location: /index.php");
exit;
}
Am I on the right track?

You need to output the 404 status, and show a response body (= an error page) at the same time.
if (!$validRevQuery) {
http_response_code(404);
// output full HTML right here, like include '404.html'; or whatever
exit;
}
Note that you cannot use a redirect here. A redirect is a status code just as the 404 is. You can't have two status codes.
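For the pagination case in the question, one way to decide whether a page exists is to compare the requested page number against the last page implied by the review count. This is only a sketch: `reviewPageExists`, the default of 30 per page, and the figure of 95 reviews are assumptions for illustration, and the count would come from your own MySQL query.

```php
<?php
// Hypothetical helper: the name and the default of 30 per page are assumptions.
function reviewPageExists(int $page, int $totalReviews, int $perPage = 30): bool
{
    if ($page < 1) {
        return false;
    }
    // Assumption: page 1 is always valid, even when there are no reviews yet.
    $lastPage = max(1, (int) ceil($totalReviews / $perPage));
    return $page <= $lastPage;
}

// Made-up numbers: 95 reviews means pages 1 to 4 exist, so page 65 does not.
$requestedPage = 65;
if (!reviewPageExists($requestedPage, 95)) {
    http_response_code(404);
    echo 'No such review page.'; // or include your 404 template here
    // call exit; here in a real script so the normal page is not rendered
}
```

Checking against the count rather than against an empty result set also keeps page 1 from turning into a 404 for a city that simply has no reviews yet.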

You cannot both send a 404 status code and redirect (a redirect is usually a 3xx status code). You can only do one of them: either send a 404 status code with an error document, or respond with a redirection.

As Pekka suggests, the best option is to do a 404 status, and then put your 404 page code after that.
It is bad practice for SEO if you just 301 (redirect) the page because then the search engines will continue to visit the page in order to see if the redirect is still there.

Related

Redirecting with 404 doesn't work

I have code that should redirect when a request parameter isn't set correctly.
if(!is_numeric($_GET['id'])){
header("HTTP/1.0 404 Not Found");
header('Location: '.$url);
exit();
}
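The problem is that whenever I check with the Firefox plugin Live HTTP Headers, I see a 302 temporary redirect. Why is that? Why is no 404 response given?
It doesn't make sense to send a Location header with a 404 status code. (PHP switches the status to 302 when it sends a Location header, unless a 201 or 3xx status has already been set, which is why you see the redirect.)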
Location means "What you asked for is over here"
404 means "I don't have what you asked for"
The two statements are incompatible.
If you want to send a particular human readable explanation of the error, then just output it as you would for any other kind of document. You could include() it if you like. Don't try to redirect to it.
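As a side note on the check itself, `is_numeric()` also accepts values such as `1e3` or `12.5`, which are probably not valid ids. A stricter check might look like the sketch below; `isPositiveIntegerId` is a hypothetical helper, and the rule that ids are positive whole numbers is an assumption about the asker's data.

```php
<?php
// Hypothetical helper, not part of the original question.
function isPositiveIntegerId($value): bool
{
    // $_GET values can be arrays (?id[]=1), so check the type first.
    return is_string($value) && ctype_digit($value) && (int) $value > 0;
}

$id = $_GET['id'] ?? null;
if (!isPositiveIntegerId($id)) {
    // One status code, plus a body: no Location header.
    http_response_code(404);
    echo 'Not found.'; // or include() your error document
    // call exit; here in a real script
}
```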
The problem is that whenever I check with the Firefox plugin Live HTTP
Headers, I see a 302 temporary redirect. Why is that? Why is no 404
response given?
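I'm not sure what you want to do here, but the following sends the Location header with a 404 status instead of the default 302. Note that browsers do not follow a Location header on a 404 response.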
if (! is_numeric($_GET['id'])) {
header('Location: ' . $url, true, 404);
exit();
}
You can only send ONE of the two status codes you are attempting, as stated above.
In the case of a 404, the server displays the relevant error page as defined by your web server config.
To do what you want, you will need to change the server's 404 document from plain HTML to a PHP page and work out what to display from there (you can base it on the referrer etc).
From within the 'dynamic' 404 page, you can then do your redirect if required.

Tell search engines that page does not exist

I have checked the logs and found that the search engines visit a lot of bogus URLs on my website. They are most likely from before a lot of the links were changed, and even though I have made 301 redirects, some links have been altered in very strange ways and aren't recognized by my .htaccess file.
All requests are handled by index.php. If a response can't be created due to a bad URL a custom error page is presented instead. With simplified code index.php looks like this
try {
$Request = new Request();
$Request->respond();
} catch(NoresponseException $e) {
$Request->presentErrorPage();
}
I just realized that this page returns a status 200 telling the bot that the page is valid even though it ain't.
Is it enough to add a header with 404 in the catch statement to tell the bots to stop visiting that page?
Like this:
header("HTTP/1.0 404 Not Found");
It looks OK when I test it, but I'm worried that SE bots (and maybe user agents) will get confused.
You're getting there. The idea is correct - you want to give them a 404. However, just one tiny correction: if the client queries using HTTP/1.1 and you answer using 1.0, some clients will get confused.
The way around this is as follows:
header($_SERVER['SERVER_PROTOCOL']." 404 Not Found");
A well-behaved crawler respects robots.txt at the top level of your site. If you want to exclude crawlers, then @SalmanA's response will work. A sample robots.txt file follows:
User-agent: *
Disallow: /foo/*
Disallow: /bar/*
Disallow: /hd1/*
It needs to be readable by all. Note this is not going to get users off the pages, just a bot that respects robots.txt, which most of them do.
The SE bots DO get confused when they see this:
HTTP/1.1 200 OK
<h1>The page you requested does not exist</h1>
Or this:
HTTP/1.1 302 Object moved
Location: /fancy-404-error-page.html
It is explained here:
Returning a code other than 404 or 410 for a non-existent page (or
redirecting users to another page, such as the homepage, instead of
returning a 404) can be problematic. Firstly, it tells search engines
that there’s a real page at that URL. As a result, that URL may be
crawled and its content indexed. Because of the time Googlebot spends
on non-existent pages, your unique URLs may not be discovered as
quickly or visited as frequently and your site’s crawl coverage may be
impacted (also, you probably don’t want your site to rank well for the
search query File not found).
Your idea about programmatically sending the 404 header is correct and it instructs the search engine that the URL they requested does not exist and they should not attempt to crawl and index it. Ways to set response status:
header($_SERVER["SERVER_PROTOCOL"] . " 404 Not Found");
header(":", true, 404); // this is used to set a header AND modify the http response code
// ":" is used as a hack to avoid specifying a real header
http_response_code(404); // PHP >= 5.4
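Applied to the try/catch from the question, this might look like the sketch below. The two classes are minimal stand-ins for the asker's own `Request` and `NoresponseException`, defined only so the example runs by itself; what the real `respond()` and `presentErrorPage()` do is an assumption.

```php
<?php
// Stand-ins for the asker's classes, defined only so the sketch runs on its own.
class NoresponseException extends Exception {}

class Request
{
    public function respond(): void
    {
        // Assumption: the real method throws when no response fits the URL.
        throw new NoresponseException('No response for this URL');
    }

    public function presentErrorPage(): void
    {
        echo '<h1>Page not found</h1>';
    }
}

// Created outside the try block so it is always defined in the catch block.
$request = new Request();

try {
    $request->respond();
} catch (NoresponseException $e) {
    // Set the status before the error page produces any output.
    http_response_code(404);
    $request->presentErrorPage();
}
```

The status has to be set before the error page is printed, because headers cannot be changed once output has been sent (unless output buffering is on).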

Redirect to 404 page or display 404 message?

I am using a cms, and file-not-found errors can be handled in different ways:
1. The page will not be redirected, but an error message will be displayed as content (using the default layout with menu/footer).
2. The page will be redirected to error.php (the page looks the same as in 1, but the address has changed).
3. The page will be redirected to an existing page, e.g. sitemap.php.
Is there a method to be preferred in regards to search engines, or does this make no difference?
If it's not found, then you should issue a 404 page. Doing a redirect causes a 302 code, followed by a '200 OK', implying that there IS some content. A 404 flat out says "there is no file. stop bugging me".
Something like this would present a 404 page with proper header code:
<?php
if ($page_not_found) {
http_response_code(404); // sets the 404 status without emitting a malformed header line (PHP >= 5.4)
include('your_404_page.php');
exit();
}
Don't redirect.
Forget about search engines. If I type a URL in and make a small typo and you redirect me away, then I have to type the whole thing in again.
The page will not be redirected, but an error-msg will be displayed as content (using the default layout with menu/footer).
Try to make it clear it is an error page. It shouldn't look too much like a normal page.
The page will be redirected to error.php (the page looks the same like 1. but the address changed)
No. Really, really no.
The page will be redirected to an existing page, e.g. sitemap.php
There are a few redirect status codes in HTTP; none of them means "Not Found, but you might like this instead".

PHP Redirect Headers Best Practices

I'm creating a PHP CMS and have some system pages like a 404 page, a maintenance page, and an unauthorized access page. When Page A isn't found, the CMS will redirect to the 404 page; if the user doesn't have access to Page B, it will redirect to the unauthorized access page, etc.
I'd like to use the proper status code in the header of each page, but I need clarification on how to handle the header/redirect. Do I put the 404 header on Page A and then redirect to the 404 page or do I put the 404 status on the 404 page itself? Also, if the latter is the correct answer, what kind of redirect should I use to get there, a 301 or a 302?
If a user arrives on page A and that page doesn't exist, then do not redirect: just send a 404 error code from page A and, to be nice to your user, some HTML content indicating that the page doesn't exist.
This way, the browser (and this is even more true for crawlers!) will know that the page that was not found is page A, and not anything else you'd have tried to redirect to.
The same goes for other kinds of errors, by the way: if a specific URL corresponds to an error, then the error code should be sent from that URL.
Basically, something as simple as this should be enough :
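if ($pageNotFound) { // $pageNotFound: placeholder for your own lookup result
    // header("404 Not Found") is not a valid status line, so set the code directly
    http_response_code(404);
    echo "some nice message that says the page doesn't exist";
    exit;
}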
(Well, you could output something nicer, of course ; but you get the idea ;-) )
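The same idea extends to the other system pages from the question. The sketch below is only one possible mapping: the error names, `statusCodeFor`, and `sendErrorPage` are hypothetical, and choosing 403 for "no access" (rather than 401, which asks the client to authenticate) and 503 with a one-hour `Retry-After` for maintenance are assumptions about what the CMS wants.

```php
<?php
// Hypothetical mapping; the error names are assumptions for this sketch.
function statusCodeFor(string $error): int
{
    switch ($error) {
        case 'not_found':
            return 404; // Not Found
        case 'unauthorized':
            return 403; // Forbidden (401 would ask the client to authenticate)
        case 'maintenance':
            return 503; // Service Unavailable
        default:
            return 500; // Internal Server Error
    }
}

// Sends the status from the requested URL itself: no redirect involved.
function sendErrorPage(string $error): void
{
    $status = statusCodeFor($error);
    http_response_code($status);
    if ($status === 503) {
        header('Retry-After: 3600'); // suggests retrying in an hour
    }
    echo "Error $status"; // replace with the CMS's own error template
}
```

Calling `sendErrorPage('not_found')` from page A, followed by `exit;`, keeps the URL unchanged and gives each kind of error its own status code.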
I'm not sure redirecting is the best way to do this. I'd rather use functionality that is built into the project.
If the data is not found, do not redirect the user to another page; just show an error message, such as "Hey, this page does not exist! Try another one."
Last but not least, you should build the code from Pascal Martin's answer into your own code.
I would put it in a function and call it from a bootstrap file or something similar.
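function show_error($type = "404", $header = true, $die = false)
{
    if ($header) {
        // header("404 Not Found") is not a valid status line, so set the code directly
        http_response_code((int) $type);
    }
    // include runs the error page as PHP; file_get_contents would echo its raw source
    include $type . '.php';
    if ($die) {
        die;
    }
    // and so on...
}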

Redirecting to frontpage after 404 error in PHP

I have a php web page that now uses custom error pages when a page is not found. The custom error pages are included in PHP.
So when somebody types in a URL that does not exist, I just include an error page, and the error page starts with:
<?php header("HTTP/1.1 404 Not Found"); ?>
This also tells crawlers that the page does not exist.
Now I have set up a new system. When a user types a wrong url, the user is sent back to the frontpage and a message is displayed on the frontpage. I redirect to the frontpage like this:
header('Location:' . __TINY_URL . '/');
Now the problem is PHP just sends back a 200 code, page found.
How can I mix these two to create a 404 code on the frontpage?
And is this overall a nice way of presenting an error page?
It's giving you a 200 code because you are redirecting to a page that returns a 200 code. The way I've done this before is to send the 404 header and then load the 404 view.
header("HTTP/1.0 404 Not Found");
include("four_o_four.php");
Redirecting after an error is not a very good idea. It's especially annoying for people who like to type in/edit URLs, because if you make a typo, you'll get redirected to some arbitrary page and have to start over.
I suggest you don't do this at all. If you want to, you can have your error page look like your front page though, albeit I think that'd be somewhat confusing.
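You can add this to your .htaccess:
ErrorDocument 404 /404.php
or, to show your homepage instead:
ErrorDocument 404 /index.php
Apache then sends the initial 404 header itself. Use a local path rather than a full http:// URL: with a full URL, Apache redirects the client, which then never sees the 404 status.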
