Html - PHP - Robots - php

I am just looking for a bit of advice / feedback. I was thinking about setting up an OpenCart store behind an existing HTML site (a shop) that ranks well in Google.
The index.html site appears instead of the index.php page by default on the web server (I have tested it).
I was hoping I could construct the site in maintenance mode on the domain, then just delete the html site leaving the (live) opencart site once finished (about 2 weeks).
I am just worried that this may affect ranking.
In the robots.txt file I would put:
User-agent: *
User-agent: Googlebot
Disallow: /index.php
I would also put in the index.php page (opencart) header:
<meta name="robots" content="nofollow">
<meta name="googlebot" content="noindex, noarchive">
I don't want Google to cache the "website under maintenance" OpenCart index.php page, since it could take a month or so for the cached copy to refresh.
Obviously I would remove or change the robots.txt Disallow rule and the meta tags once the shop is live and the HTML site files are deleted.
I would like to know whether anyone has tried this and whether it will work. Will it affect Google ranking?
Is it just a bad idea? Any feedback would be appreciated.
Many Thanks

I assume you're using LAMP for your website (Linux, Apache, MySQL, PHP).
1) Apache has an option to set the default page (the DirectoryIndex directive); set it to index.php instead of index.html.
2) You may use a rewrite rule in the .htaccess file (read more here). If your hosting provider doesn't give you permission to use .htaccess, there's a workaround: in index.php you may include this snippet at the top:
<?php
// Redirect a direct request for /index.php to the bare domain root.
if ($_SERVER['REQUEST_URI'] == '/index.php') {
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: /");
    die();
}
?>
So even if a user opens http://www.domain.com/index.php,
they'll get redirected to http://www.domain.com/
(e.g. my site http://theypi.net/index.php goes to http://theypi.net/).
I've also set up a similar redirect from http://www.theypi.net to http://theypi.net.
Choosing one of the two options (with or without www) helps improve ranking as well.
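The two redirects described above (bare /index.php to /, and one hostname to the other) can be combined in one place. The following is only a sketch under assumptions: the function name canonicalRedirectTarget and the host example.com are made up for illustration, and plain http is assumed as in the URLs above.

```php
<?php
declare(strict_types=1);

/**
 * Returns the URL to 301-redirect to, or null if the request is already canonical.
 * $canonicalHost is a hypothetical setting such as "example.com" (no "www.").
 */
function canonicalRedirectTarget(string $host, string $requestUri, string $canonicalHost): ?string
{
    // Separate the path from the query string.
    $path  = (string) parse_url($requestUri, PHP_URL_PATH);
    $query = parse_url($requestUri, PHP_URL_QUERY);
    $hasQuery = is_string($query) && $query !== '';

    // Only a bare /index.php (no query string) is collapsed to "/",
    // which mirrors the original snippet's exact comparison.
    $newPath = ($path === '/index.php' && !$hasQuery) ? '/' : $path;

    if (strcasecmp($host, $canonicalHost) === 0 && $newPath === $path) {
        return null; // already canonical, nothing to do
    }

    $target = 'http://' . $canonicalHost . $newPath;
    if ($hasQuery) {
        $target .= '?' . $query;
    }
    return $target;
}

// Usage at the top of index.php (commented out so the sketch has no side effects):
// $target = canonicalRedirectTarget($_SERVER['HTTP_HOST'], $_SERVER['REQUEST_URI'], 'example.com');
// if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
```

Keeping the decision in a function with no side effects makes it easy to check the cases by hand before wiring it to header().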
To your question:
"I would like to know if any one has tried it or if it will work? Effect google ranking etc?"
Shorter URL: this is part of URL hygiene, which is meant to improve SEO.
If the homepage opens through the domain name alone (without index.php), the click-through rate (CTR) in search results tends to be higher.
I would suggest not using the robots blocking mechanism unless the steps above aren't feasible for you.
Hope it helps, Thanks!
Edit:
And if you don't even have permission to set the homepage to index.php, you may do one of the following:
1. Create index.html and put the PHP code in it. If the web server runs PHP in .html files, put the redirect logic above in it.
2. Otherwise, use a JavaScript redirect (not a recommended way):
<script> self.location = "index.php"; </script>

Related

SEO HTML status 301 on files?

Good morning.
I am currently trudging through the SEO part of a website using the checklist at http://www.clickminded.com/seo-checklist/, parts of which I have found very helpful.
I have reached a part that suggests using Screaming Frog SEO Spider, which has flagged a significant number of 301 redirects on images/CSS/JS and even on links to other parts of my website.
An example of this would be:
Address : http://c-elec.co.uk/welcome/gourmet
Content : text/html
Status code : 301
Status : Permanently moved
Every 301 has the same status and content type, even for an image, JS file or CSS file, and the redirect URI is always the same as the original address.
I am not sure whether this is a problem with how my project is structured or a server issue, but other sites I've checked on the same server show no 301s when scanned. I'm using PHP/CodeIgniter with Foundation 5 on an NGINX server.
Thanks for any feedback
It seems that some of your resources are included using http://www.example.tld.
These requests are redirected to http://example.tld automatically, perhaps by a rewrite rule.
So this request: http://www.c-elec.co.uk/js/app.js is redirected to http://c-elec.co.uk/js/app.js
You need to find out where this redirect is done, or include your resources with http://example.tld.
BTW: you are including JS code after the closing HEAD tag and after the closing BODY tag.

Keep old website (HTML files) on webserver but disallow search agents to index them

I've just finished a website for a client who is going to replace their old website (very old, hard-coded HTML). The problem is that, for now, they want to keep the old website and all its files on the web server in their original location. This does not cause any issues for the new website, which is built with PHP and WordPress, but it becomes a problem when Google and other search robots crawl and index the site.
A Google search still finds the old HTML files. Is there any way that I could keep the old HTML files on the web server but make sure that, first, no robots index them and, second, anyone who navigates to an HTML page, e.g. http://www.clientdomain.com/old_index_file.html, gets redirected? I think the last part might be doable in .htaccess, but I haven't found anything useful searching for it.
For the first question, about not allowing robots to index HTML files, I've tried putting these two lines in my robots.txt file:
Disallow: /*.html$
Disallow: /*.htm$
But I'm unsure whether that will work.
I might be going about this completely the wrong way, but I've never before had a client ask to keep the old website on the same server and in its original location.
Thanks,
- Mestika
<?php
$redirectlink = 'http://www.puttheredirectedwebpageurlhere.com';
// do not edit below here
header('HTTP/1.1 301 Moved Permanently');
header('Location: ' . $redirectlink);
exit;
?>
This code 301-redirects the page to the URL you specify. The filename of this .php file should match the URL slug of the page you want to redirect.
301 Redirect
A 301 redirect, also known as a permanent redirect, should be put in place to permanently redirect a page. The word 'permanent' is there to imply that ALL qualities of the redirected page will be passed on to the detour page.
That includes:
PageRank
MozRank
Page Authority
Traffic Value
A 301 redirect is appropriate if the change you want to make is, well, permanent. The detour page takes the place of the redirected page. A complete takeover.
The old page will be removed from Google's index and the new one will replace it.
Or you can do it in your .htaccess as shown by the above poster.
There are probably many ways to handle this. Assuming you have a clear mapping of pages from the old template to the new one, you could detect Googlebot in your old template (see [1]) and do a 301 redirect (see [2] for an example) to the new template.
[1] how to detect search engine bots with php?
[2] How to implement 303 redirect?
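To make the "clear mapping of pages" idea concrete, here is a minimal sketch. It deliberately redirects every visitor rather than only detected bots, which is a simplification of what is described above. The map entries (apart from /old_index_file.html, taken from the question) and the function name newLocationFor are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical old-to-new mapping; the real paths depend on the two sites.
const REDIRECT_MAP = [
    '/old_index_file.html' => '/',
    '/products.html'       => '/products/',
];

/**
 * Returns the new location for an old URL, or null if there is no mapping.
 */
function newLocationFor(string $requestUri, array $map): ?string
{
    // Ignore any query string when looking up the old page.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    return $map[$path] ?? null;
}

// Usage wherever the old pages are served through PHP (commented out, no side effects here):
// $new = newLocationFor($_SERVER['REQUEST_URI'], REDIRECT_MAP);
// if ($new !== null) { header('Location: ' . $new, true, 301); exit; }
```

Pages without an entry fall through and are served as before, so the map can be filled in gradually.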
It will take some work, but it sounds like you'll need to crack open your .htaccess file and start adding 301 redirects from the old content to the new:
RewriteCond %{REQUEST_URI} ^/oldpage\.html$
RewriteRule . http://www.domainname.com/pathto/newcontentinwp/ [R=301,L]
Rinse and repeat for each old page.
This is definitely something mod_rewrite can help with. Converting your posted robots.txt rules to a simple rewrite (in .htaccess the pattern is matched against the path without its leading slash):
RewriteEngine on
RewriteRule \.html?$ /index.php [R,L]
The [R] flag signifies an explicit (external) redirect, a 302 by default; use [R=301] to make it permanent. I would recommend seeing http://httpd.apache.org/docs/2.4/rewrite/remapping.html for more information. You can also forbid direct access with the [F] flag.
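On the robots.txt part of the question: crawlers that follow the wildcard rules (Googlebot does, and RFC 9309 describes them) treat "*" as any run of characters and a trailing "$" as end of URL, so the two Disallow lines should match, assuming they sit under a User-agent line. The helper below is a rough way to check patterns by hand; robotsPatternMatches is a made-up name and this is a simplified model, not a full robots.txt parser.

```php
<?php
declare(strict_types=1);

/**
 * Simplified robots.txt path matching: "*" matches any characters,
 * a trailing "$" anchors the end, everything else is a literal prefix match.
 */
function robotsPatternMatches(string $pattern, string $path): bool
{
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    // Quote every regex metacharacter, then turn the quoted "*" back into ".*".
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
    return preg_match('#^' . $regex . ($anchored ? '$' : '') . '#', $path) === 1;
}

// Example checks:
// robotsPatternMatches('/*.html$', '/old_index_file.html')  // true
// robotsPatternMatches('/*.html$', '/index.php')            // false
```

Note that with the "$" anchor a URL such as /page.html?x=1 is not matched, because it does not end in .html.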

How to redirect from former CakePHP page a non-CakePHP page using .htaccess

I have some redirects in place from our previous site that used CakePHP. The new site has plain PHP pages. When trying to redirect the following in an .htaccess file I am having problems:
Redirect 301 /old-page-here http://samesitename.com/somedirectory/newfilename.php
The /old-page-here view had the extension .ctp. When I open this in my browser I get a redirect loop, with newfilename.php appended to the address over and over again (the rest of the web address appears too, but is not repeated).
I was having a similar problem when CakePHP put page numbers in, e.g.
Redirect 301 /olddirectory/old-page-here/2 http://samesitename.com/somedirectory/newfilename-2.php
In that case it added both directories to the web address. Redirects for pages within the webroot directory (pulled from the database) that did not end in a slash and number have worked fine.
Any ideas what is going wrong?
Maybe this could come in handy for building your new URLs. As for your trouble with redirecting while CakePHP was still installed: I can imagine that this is somewhat complicated to achieve from "outside", as somewhere in the process Cake's dispatcher resolves the address into controller, action and parameters. From the "inside", you can do redirects with a status code.

PHP, MySQL: Security concern; Page loads in a weird way

I am testing the security of my website. I am using the following URL to load a PHP page in my website, on localhost:
http://localhost/domain/user/index.php/apple.php
When I do this, the page does not load normally: the images and icons used in the page disappear and only text appears. Also, any link I click on this page brings me back to the same page instead of navigating to the target page. For example, the "SEARCH" hyperlink points to search.php, but instead of navigating to search.php it reloads index.php and just appends the destination page name to the end of the URL.
Say I used the link above. It loads the index.php page minus the images. When I click the "Search" link to navigate to the search page, I see the following URL:
http://localhost/domain/user/index.php/search.php
I have a 404 error page configured in my .htaccess file, but the request is not sent to it. Notice the search.php at the end of the URL above. Any other link that I click reloads the index.php page and appends the destination page name to the end of the URL in the same way.
I was expecting to see a 404 error, but that does not happen. The URL should not even be able to load the page, because I do NOT have an "index.php" folder in my website.
What can I do to solve this? All help is appreciated.
Update:
The security concern is that users can see a page at a non-existent URL such as http://localhost/domain/user/index.php/apple.php, which is quite misleading. This makes me feel that it could open doors for hackers to exploit the website and compromise its integrity. Can that happen in such a case? I want users to see a 404 error page, and I am willing to make any change needed in the .htaccess file to accomplish this.
Can you suggest some code that I can add to my .htaccess file to accomplish this?
Thank you.
EDIT1:
Here are the contents of my .htaccess files. I have 2 of them. One in domain root and the other in 'user' folder/directory.
# .htaccess in domain root
ErrorDocument 404 /domain/404.php
# .htaccess in user folder
ErrorDocument 404 /domain/user/404.php
EDIT2:
@Pekka Thanks for the link. I added the following code to the .htaccess file (within the user directory):
<Files "mypaths.php">
Options +Includes
SetOutputFilter INCLUDES
AcceptPathInfo Off
</Files>
But this still does not show me the 404 page. Sorry, I am very much a novice with .htaccess. I hope you will be able to tell me what I am doing wrong. Thanks.
Why this URL loads a page:
http://localhost/domain/user/index.php/apple.php
is easily explained. The request is passed to index.php, with apple.php being in the $_SERVER["PATH_INFO"] variable.
So you are in the /user directory as far as the server and the PHP script are concerned.
This is also why no 404 turns up: index.php is always found, no matter which file you specify as the last file.
The browser, however, interprets index.php not as a file, but as the parent directory of apple.php.
Therefore, every relative link you put on the page, say to contact.php is fetched like this:
http://localhost/domain/user/index.php/contact.php
which obviously won't work.
What you may want to do is use absolute paths in images and links, but either way, this is of no concern to security whatsoever.
As a side note, this whole phenomenon is sometimes used to create search engine friendly URLs without having to use mod_rewrite module.
You can turn this behaviour off using the AcceptPathInfo directive.
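If changing the Apache configuration is not an option, a rough PHP-side alternative is to check for trailing path info at the top of index.php and answer with a 404 yourself. This is only a sketch: hasExtraPathInfo is a made-up helper name, and it assumes 404.php sits in the same directory as index.php, as the ErrorDocument line in the question suggests.

```php
<?php
declare(strict_types=1);

/**
 * True when the request carried extra path segments after the script name,
 * e.g. /user/index.php/apple.php sets PATH_INFO to "/apple.php".
 */
function hasExtraPathInfo(array $server): bool
{
    return isset($server['PATH_INFO']) && $server['PATH_INFO'] !== '';
}

// Usage at the very top of index.php (commented out so the sketch has no side effects):
// if (hasExtraPathInfo($_SERVER)) {
//     http_response_code(404);
//     include __DIR__ . '/404.php';
//     exit;
// }
```

Passing $_SERVER in as an argument keeps the check easy to try with a plain array.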
As for the images, you just have to use absolute paths to your images, which is necessary here.
Instead of images/head.jpg write /images/head.jpg and you will have all your images and styles in place.
As for /user/index.php/apple.php: why do you want such an odd address?
Why not just use user/apple.php?
And where is the security issue in your question?

How to make extension-less url for a PHP based site?

Do I have to put every file in a different folder?
like:
about-us/about-us.php
profile/profile.php
etc.
Or is there an automatic solution?
I want to convert
http://sitename.com/about-us/about-us.php
to
http://sitename.com/about-us
You want pretty URL rewriting.
An Apache .htaccess example from that article:
Pretty URL: /browse/animals-24/cats-76.html
Ugly URL: /browse.php?category=24&subcategory=76
.htaccess:
Options +FollowSymLinks
RewriteEngine On
RewriteRule ^browse/[A-Z0-9_-]+-([0-9]+)/[A-Z0-9_-]+-([0-9]+)\.html$ browse.php?category
If you have a directory called about-us, you could simply create an index.php file within that directory; by default, Apache's DirectoryIndex setting will usually serve that file for the directory URL.
Thus, going to example.com/about-us/ would show the user the same page as about-us.php. It would also benefit you to do some research on 301 redirection.
http://www.tamingthebeast.net/articles3/spiders-301-redirect.htm
Basically, when Googlebot crawls your website, the last thing you want is for Google to find two copies of the page, one listed as about-us/ and one listed as about-us/about-us.php. Duplicate content is bad, and optimizing your website for search engines is really not that difficult.
Let's say you have a leaderboard page on your website with 1000 members. Instead of letting Google find every leaderboard.php?user=Whatever URL, it would be a good idea to block that page from being crawled by Google, or else you will end up with hundreds of unwanted pages in their search index.
You might also want to make sure your website can be reached at www.yourwebsite.com or at yourwebsite.com, BUT NOT AT BOTH (unless one 301-redirects to the other).
Hope that helped. Happy programming!
EDIT: Renaming your about-us.php file to index.php would be the quick fix. Depending on your .htaccess configuration, I'm willing to bet that is all you need.
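Another way to get extension-less URLs without one folder per page is a single front controller that maps the requested path to a PHP file. The sketch below assumes a rewrite rule (not shown) already sends every request to one script; pageFileFor and the page list are hypothetical names for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Maps an extension-less URL such as /about-us to a script name such as about-us.php.
 * Only slugs listed in $pages are accepted, so arbitrary paths cannot be included.
 */
function pageFileFor(string $requestUri, array $pages): ?string
{
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    $slug = trim($path, '/');
    if ($slug === '') {
        $slug = 'index'; // the site root maps to index.php
    }
    return in_array($slug, $pages, true) ? $slug . '.php' : null;
}

// Usage in the front controller (commented out so the sketch has no side effects):
// $file = pageFileFor($_SERVER['REQUEST_URI'], ['index', 'about-us', 'profile']);
// if ($file === null) { http_response_code(404); exit; }
// include __DIR__ . '/pages/' . $file;
```

The whitelist is the important design choice here: the URL never reaches include directly, only a name that was checked against the known pages.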
