Replacing .html files with .php files while maintaining search engine rankings - php

I maintain a website that contains a dozen or so .html documents which I have just rewritten to include php code. As search engines currently index the .html documents, I would rather not break those links and I certainly don't want to do anything that will affect my search rankings. I understand I have a couple of choices.
Option 1 is to replace all the .html extensions on my documents with .php, and then update .htaccess so that requests for .html documents are rewritten/redirected to the corresponding .php documents (as suggested here).
If I want to make this a permanent (301) redirection, so that the search engine links are replaced the next time my site is crawled, is this the correct way to do this?
RewriteEngine On
RewriteRule ^(.*)\.html$ $1.php [L,R=301]
Option 2 would be to instruct the webserver to send all html documents through the php parser (as suggested here), which means the .html extension on the files doesn't need to be changed at all:
AddType application/x-httpd-php .htm .html
So I see two viable choices. Is one better than the other (or can you think of a better one)?

The first method means you need to rename all of your .html files to .php and, for a little while, accept a marginal amount of extra traffic for the redirects.
The second method means you don't need to change anything at all and html files get processed by the PHP handler like php files do.
The first method is more work, but it also makes your site more portable: if you copy your site to a new host that, say, doesn't give you the ability to change the handler types, you will still be fine because your files end with the .php extension.
The second method is less work and won't require search engines to re-index your site but will make your site a little less portable.
Note that you can also use mod_alias to redirect:
RedirectMatch 301 ^/(.*)\.html$ /$1.php
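To sketch Option 1 a little more fully: the rule below is a minimal example, assuming the .htaccess file sits in the document root and mod_rewrite is enabled. The RewriteCond is an addition, not part of the question; it limits the redirect to requests whose .php counterpart exists on disk, so any .html file that has not been renamed keeps working.

```apache
RewriteEngine On
# Only redirect when the renamed .php file exists (assumes .htaccess is in the document root).
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
# Permanent redirect: /page.html -> /page.php (the query string is carried over by default).
RewriteRule ^(.*)\.html$ /$1.php [R=301,L]
```

Requesting one of the old URLs with curl -I should then show a 301 status and a Location header ending in .php.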

Related

Having issues with Parsing PHP in .HTML Files

I have an issue parsing PHP in HTML Files.
I am using an install of Vesta and the domain is running fine. The site in question has
AddType application/x-httpd-php4 .htm .html
# and
AddType application/x-httpd-php5 .htm .html
in the .htaccess, which allowed PHP to run in .html files before I moved servers. I have also tried every single variant of this that I have found on Stack Overflow, and none of them work.
I can't figure out for the life of me why it's not working now.
Has anybody got any ideas?
Thank you
Dan Williams
Since your server will not allow you to use PHP in HTML, just rewrite all .html requests to php in .htaccess:
RewriteEngine On
RewriteRule ^(.*)\.html$ $1.php [L]
Should solve the problem. (https://stackoverflow.com/a/5990276/2119863)
Why won't it parse PHP in HTML?
The more types of files the server needs to push through the php interpreter, the more memory, processor and electricity it will consume. It's like with cars and trucks. Cars do not haul big trailers for a reason - trucks have much bigger engines and take the load but leave a bigger carbon footprint.
The second reason is the separation of functionalities. Seeing an HTML file, you should be 100% confident - across all and any servers - that this file will not print_r($_SERVER);. And when seeing a PHP file, you should be confident it performs some dynamic actions. And just like you shouldn't expect a nurse to build houses, neither should you expect HTML to parse PHP. :)

Handling site-wide URL rewrites with PHP

I'm looking into the feasibility of using PHP - instead of mod_rewrite - to handle URL canonicalization. I'm looking to be able to map a large number of different URLs to a given physical PHP page, and handle 301's and 404's in a more centralized and maintainable way. This will include common misspellings, aliases, search engine friendly URL parameters, and the like. These needs seem well outside the power of mod_rewrite, so I'm looking into other options.
I'm thinking I would create a canonical.php script which I map every page to with the following in .htaccess (borrowed from this post):
RewriteEngine On
RewriteBase /
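# Skip requests already routed to canonical.php; otherwise the rule re-matches its own output and loops.
RewriteCond %{REQUEST_URI} !^/canonical\.php
RewriteRule ^(.*)$ canonical.php/$1?%{QUERY_STRING} [L]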
And then canonical.php would do whatever URL parsing / db lookups / redirects / etc. are necessary, then simply include /the/appropriate/file.php for the given request.
Is this a reasonable course of action? Is such functionality actually feasible with mod_rewrite directly? DB lookups and the like aside, will this be distinctly slower than mod_rewrite? Is there any other methodology that's more robust than a PHP wrapper?
You're talking about routing, which plenty of frameworks do. Take a look at this answer: https://stackoverflow.com/questions/115629/simplest-php-routing-framework
What I would suggest, and what I use, is a PHP file that gets called every time a 404 is encountered and stores the requested URL.
Then once a week I go to the management console and map the misspelled, mistyped and old URLs, plus any stale URLs from search engine history, to the currently existing URLs. Then I hit the parse button and it spews out a new, updated .htaccess file for me.
That way you lighten the load on your database and cut loading, compiling and redirecting time.
Just my 2 cents.
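For illustration, the generated file might look something like the fragment below. This is only a sketch: the paths and the log404.php script name are made up, and the answer above does not say which directives the tool actually emits.

```apache
# Hypothetical output of the weekly "parse" step: one permanent redirect per logged bad URL.
Redirect 301 /abuot.html /about.php
Redirect 301 /old/contact.html /contact.php
# Requests that still match nothing go to the PHP script that records them (hypothetical name).
ErrorDocument 404 /log404.php
```

Because these are plain mod_alias lines, Apache resolves them without starting PHP or touching the database.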
Haven't had a chance to test this, but this answer pointed me at Apache's FallbackResource directive, which sounds very promising.
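As a rough sketch of that idea, assuming Apache 2.2.16 or later (where mod_dir provides FallbackResource) and the canonical.php name from the question:

```apache
# Any request that does not map to an existing file or directory is handed to canonical.php.
# The originally requested URL stays available to the script (in PHP, via $_SERVER['REQUEST_URI']).
FallbackResource /canonical.php
```

Unlike the catch-all RewriteRule, existing files such as images and stylesheets are served directly, so only unknown URLs reach the script.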

Parse HTML as PHP

Are there any security / performance concerns if we configure the Apache web server to handle all HTML as PHP? I was specifically referring to:
AddType application/x-httpd-php .php .php3 .php4 .html
I was in a situation where I needed to add some PHP logic to some HTML files; ideally, I wouldn't have to change the filename, e.g. page.html to page.php (to keep the page rank, etc. for page.html).
This is related to the following question: httpd AddType directive
Edits:
From the existing answers / comments below, it looks like the community suggests either using redirects or targeting only specific HTML files. The constraint is that I am redesigning an existing site (400+ HTML pages; each of them uses some sort of Dreamweaver template that pulls in the header and footer from different files). I was hoping to move away from Dreamweaver completely and into something non-proprietary. So, I am down to two options:
Use Server Side Includes (SSI) to pull in the header and footer. This will result in all my HTML files being decorated with SSI directives.
Sprinkle in some PHP snippets to include the header and footer. For this choice, I have to make sure the file names stay unchanged.
The more files the server determines it needs to pass through the PHP interpreter, the more overhead involved, but I think this goes without saying. If your site does not have ANY pages with plain HTML, then you're already paying all the performance penalties that you could possibly pay - adding HTML to the list is no different in this case than simply renaming all the files to have a .php extension.
The real performance penalty would come if you do have plain HTML pages - the server will needlessly pass these pages to PHP for interpretation when none is necessary. But even then, it isn't dramatic - the PHP interpreter won't be needed for those HTML pages, so it won't do anything aside from determining that it doesn't need to do anything. This has a cost, but it isn't significant.
Now, if we're talking high-volume here, every little bit of performance matters and this would not be a practicable solution. For low- to mid-volume sites, however, the performance penalty would be nil.
If this is a one-time change and there are a limited number of files that are affected, then it may be more conservative to use a FilesMatch directive.
<FilesMatch "^(file_one|file_two|file_three)\.html$">
AddType application/x-httpd-php .html
</FilesMatch>
I disagree with Tuga. I don't think you should make this change for all your files. Anytime you deal with security, you should try to control the environment. Doing it only for one file is probably the safest. You could do something like
<FilesMatch "^file_name\.html$">
AddType application/x-httpd-php .html
</FilesMatch>
This will match only file_name.html and process it as PHP, which is much safer than treating ALL .html files as PHP.
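A variant of the same idea uses SetHandler instead of AddType inside the FilesMatch block. This is a sketch that assumes PHP runs as an Apache module (mod_php); under CGI or PHP-FPM the handler name differs by host. The file names are the placeholders from the answer above.

```apache
# Run only the listed .html files through PHP; every other .html file stays static.
<FilesMatch "^(file_one|file_two|file_three)\.html$">
    SetHandler application/x-httpd-php
</FilesMatch>
```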

Site Converter - Website Copier

Does anybody know of a software program that will convert a website built with PHP, JSON and jQuery into a mainly HTML format? We need to do a conversion for SEO purposes and don't want to have to rewrite the whole site.
HTML is a markup language; PHP is a server-side scripting language that generates it. You cannot convert one to the other, I'm sorry.
If you're trying to make sure that you have nothing but .HTML extensions on your public URLs for SEO purposes:
Someone's selling you a line of BS.
You need access to your server configuration.
You don't have to convert anything but your links.
The .PHP extension is the default file extension configured to be sent from Apache to the PHP engine for parsing. You can change what file extension gets parsed in your configuration file.
http://encodable.com/parse_html_files_as_php/
This will allow you to keep .HTM files static and have .HTML files parsed as if they were .PHP files.
Try this: http://www.httrack.com/
It will only return a static HTML site. But it might be a good base for you.
Since the only thing which really knows what type of file you're using is the server itself, it does not really matter what you're using on the back end. Most search engines are smart enough to know that so they don't really care so much. Now, people might care. People might say, "Hm, well, this is .html, that means that this person must have a flat file which is constantly being updated," but I doubt it.
If you're really concerned about having a .html extension, then you can fake it by using htaccess:
RewriteRule ^(.*)\.html$ $1.php [L]
If that is placed in a .htaccess file at the root of your site, it will internally rewrite all requests which end with .html to the corresponding .php page. It does that transparently both to the user and to the crawlers: no redirect is sent, and the URL they see stays .html.
Of course, every link on your site will need to be changed from .php to .html, but that replaces the impossible task of using only .html files with the merely annoying task of updating all of your .php links.
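If old .php links are still floating around, one common pattern is to pair the internal rewrite with an external redirect in the other direction. The following is a sketch, assuming the .htaccess file is in the site root. The THE_REQUEST condition is there so that the redirect fires only for URLs the client actually asked for, not for the internally rewritten ones, which would otherwise loop.

```apache
RewriteEngine On
# Client asked for /page.php directly: send a permanent redirect to /page.html.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]+)\.php[?\s]
RewriteRule ^ /%1.html [R=301,L]
# Serve /page.html from page.php internally; the visible URL stays .html.
RewriteRule ^(.*)\.html$ $1.php [L]
```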
As to removing JavaScript, well, you could do that, or you could design your site in such a way that it still uses AJAX but it works with the search engines instead of against them. The biggest trick is to make sure that your site can work with as little AJAX as possible and then use AJAX to supplement. We've come a long way from requiring that all websites work in lynx, but it is still good practice to make sure that they are still sane without the benefit of JS/CSS.
Besides, search engines are getting smarter. Google has been working to read AJAX intelligently since 2009. But even if they weren't, there are plenty of articles out there on using AJAX without hurting SEO.
There is no need to nerf your site because of SEO -- You can have your AJAX and SEO too.
This is hard to accomplish if there is a lot of dynamic data. For a simple website you can just cache every page and make that your new website. I am not sure how useful that would be. For example if you have forms or other user input fields then things will just not work. In any case this is how you do it using wget.
$ wget -m http://www.example.com/
More reading here.

Running other file types as PHP

Is there any problem with running HTML as PHP via .htaccess, such as security or best-practice concerns? I was doing this to make URLs cleaner.
## run the following file types as php
AddHandler application/x-httpd-php .html .htm .rss .xml
Well, ideally I'd like to have my URLs like
localhost/blog/posts/view.php?id=64
to be
localhost/projects/bittyPHP/bittyphp/posts/view/id-64
But I'm having trouble accomplishing that without routing everything to one file and having PHP determine the paths. I guess this is my real question.
I would use mod_rewrite.
You probably do not need to run all HTML files as PHP, and if you have short_open_tag enabled, the "<?" that opens an XML declaration will give you trouble.
Keep in mind that you will then run each and every one of those files through the PHP handler. If there is no PHP inside the files, the parser will still inspect them to see if there is any PHP in them. This adds some overhead, but it is likely negligible in most setups.
The main issue, I would say, is performance. If you have a significant number of plain HTML files, then you're creating unnecessary overhead by always running them through the PHP interpreter.
Best practice is not to do this, but to use "friendly" URLs like mysite.com/item/123 and use mod_rewrite to convert them to mysite.com/displayitem.php?id=123 internally.
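A minimal sketch of that rule, assuming the .htaccess file lives in the site root and using the displayitem.php name from the example above:

```apache
RewriteEngine On
# /item/123 is handled internally by displayitem.php?id=123; the browser keeps the friendly URL.
# QSA appends any extra query string from the original request.
RewriteRule ^item/([0-9]+)/?$ displayitem.php?id=$1 [L,QSA]
```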
Like many people have already stated, mod_rewrite is the best solution for accomplishing friendly URLs.
Sitepoint has a decent guide to getting started with mod_rewrite.
