Here's the problem I'm trying to solve: I have a dynamic, PHP-driven website that is constantly being updated with new content, and I want my XML sitemap to stay up to date automatically. I see two options:
1. Write a PHP script that queries my database for all my content and writes the output to http://mysite.com/sitemap.xml, then execute the script regularly via a cron job.
2. Simply make my sitemap a PHP file (sitemap.php) that queries the db and outputs the sitemap directly, with the .htaccess rewrite rule RewriteRule ^sitemap.xml$ sitemap.php so that whenever someone requests sitemap.xml they're directed to the PHP file and get a freshly generated sitemap.
I'd much rather go with option #2 since it's simpler and doesn't require setting up a cron job, but I'm wondering whether Googlebot will fail to recognize sitemap.xml as valid if it's actually a PHP file.
Does anyone know if option #2 would work, and if not, whether there's some better way to automatically create an up-to-date sitemap.xml file? I'm really surprised how much trouble I've had with this... Thanks!
Just make sure your script generates the appropriate Content-Type header. You can do so with header().
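For example, sitemap.php could look something like the sketch below. The get_urls() helper and the fields it returns are assumptions, stand-ins for whatever database query fits your schema:

<?php
// sitemap.php -- minimal sketch of a dynamically generated sitemap.
header('Content-Type: application/xml; charset=utf-8');

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

// get_urls() is hypothetical: e.g. SELECT url, updated_at FROM pages.
foreach (get_urls() as $page) {
    echo "  <url>\n";
    echo '    <loc>' . htmlspecialchars($page['url']) . "</loc>\n";
    echo '    <lastmod>' . $page['updated_at'] . "</lastmod>\n";
    echo "  </url>\n";
}

echo '</urlset>';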
Google only sees the headers and the body of the response. If your PHP script returns the same headers and the same body as your web server would return for a static file, then there is technically no difference between the PHP script's response and an XML file served directly. Use curl -i http://example.com/ to inspect the response headers of a request if you'd like to test that yourself.
So you can safely do this; that is exactly what mod_rewrite was designed for (among many other things).
I was reading the jQuery Typeahead documentation at www.runningcoder.org, and while looking at the v1_user example I was studying the source code in the tabs below the form.
In the PHP tab we can see all the 'data' handling, which then sends header('Content-Type: application/json'); and outputs echo json_encode(array(...)); (everything appears to be PHP code).
Now, please go to the Javascript tab and search for this string in your browser (CTRL+F in most browsers): url: "/jquerytypeahead/user_v1.json". That url attribute is a jQuery Ajax option from the API (I think we all know what it is). The thing is that it refers to a .json file; why? Isn't it the PHP file we saw before that returns the JSON content?
I can prove this by going to http://www.runningcoder.org/jquerytypeahead/user_v1.json?q=ar (note the .json file extension).
Testing locally: I have some code returning application/json (the same as the v1_user example), saved as test.php. When accessing this script I cannot request localhost/typeahead/test.json?q=something, but I can request localhost/typeahead/test.php?q=something. How is that site doing it?
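For reference, the local test script is roughly like the sketch below (the data set and response shape are made up for illustration):

<?php
// test.php -- sketch of a script that answers ?q=... with JSON.
header('Content-Type: application/json');

$users = array('arnold', 'ariel', 'bart');  // made-up sample data
$q = isset($_GET['q']) ? strtolower($_GET['q']) : '';

$matches = array_values(array_filter($users, function ($u) use ($q) {
    return $q === '' || strpos($u, $q) === 0;  // simple prefix match
}));

echo json_encode(array('data' => $matches));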
It's important to understand that Apache cannot route URLs based on what a script returns in the Content-Type header. In other words, Apache has no way of knowing ahead of time that a script will be returning JSON data.
That said, there are a few ways to make Apache do what you want. Here are two that I can think of.
Method #1
One is to tell your Apache server that .json files should be run through the PHP parser. This means you would actually save your file as test.json and, when Apache goes to serve it, it will execute it just as it would any file with a .php extension.
I am generally not a fan of this solution (every .json file the server delivers will now be handed to the PHP interpreter, including ones meant to be plain data), but in some cases it can make sense. To make it happen, add the line below to your httpd.conf file.
AddType application/x-httpd-php .json
Method #2
Another way is to use a rewrite to serve up test.php as if it were test.json. (In other words, when the user requests test.json, you actually execute test.php.) I personally like this method a lot better. However, it makes tracking down bugs a little harder, because the file you expect to find doesn't actually exist. For most developers familiar with Apache, it's not an issue.
To make the redirect work, you can use something like the below in an .htaccess file.
RewriteEngine On
RewriteRule ^(.*)\.json$ $1.php [L]
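Note that this rule rewrites every request ending in .json, including any real .json files you might have. A slightly safer variant (an assumption; adjust it to your setup) only rewrites when the requested file does not actually exist on disk:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)\.json$ $1.php [L]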
My hosting provider does not allow me to use .htaccess to enable gzip. Their support told me to change my HTML file to PHP and add the following at the top:
<?php ob_start("ob_gzhandler"); ?>
However, after using gzipWTF to find out whether my site is making full use of gzip, a number of assets (all except three) are listed as not being gzipped.
If you would, go to gzipWTF and enter my URL, http://justinjwilson.com, with the 'details' option checked. Why is it that most of my JS and CSS files are not gzipped? An easy PHP solution would be best. Remember, I can't use .htaccess to enable gzip.
You're out of luck with the one-liner. Adding that line to the top of the now-PHP HTML page causes PHP to gzip that page's content, but only that page. It has no effect on the external files, because PHP doesn't process them in any way (separate files are separate GET requests from the browser, and PHP knows nothing about them).
So you need a solution that lets PHP handle the gzip-ing of all of your content. A relatively simple approach is a separate PHP script that handles delivery of all of your external files, something like /resource.php?r=myscript.js. That script would turn on ob_gzhandler at the top, then simply open the requested file and echo it out.
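A minimal sketch of such a script, assuming a small whitelist of assets (the file names and types below are placeholders):

<?php
// resource.php -- sketch of a gzip-ing delivery script for static assets.
ob_start('ob_gzhandler');

// Whitelist of deliverable files; placeholder names, adjust to your assets.
// Whitelisting also blocks path-traversal tricks like ?r=../config.php.
$allowed = array(
    'myscript.js' => 'application/javascript',
    'styles.css'  => 'text/css',
);

$r = isset($_GET['r']) ? $_GET['r'] : '';

if (!isset($allowed[$r])) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

header('Content-Type: ' . $allowed[$r]);
readfile(__DIR__ . '/' . $r);

You would then point your <script> and <link> tags at /resource.php?r=... instead of at the files themselves.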
I want to deny visitors access to pages but still use the pages. How can I:
~ Make a page unviewable but allow it to process ajax requests.
~ Make a PHP file unviewable but include it in scripts.
It seems I need htaccess. I tried using it but it stopped me from using the file as an include.
For the ajax-only thing, it seems I can use this in the ajax-only page:
<?php
$AJAX = (isset($_SERVER['HTTP_X_REQUESTED_WITH']) &&
$_SERVER['HTTP_X_REQUESTED_WITH'] == 'XMLHttpRequest');
if ($AJAX){
// echo ajax code.
}
?>
Is this reliable?
One way to accomplish your second goal (making a script available for server-side inclusion but not accessible to clients) is to add this to an .htaccess file in the folder containing the scripts you wish to protect:
deny from all
Try browsing to the script now and you should not be able to get to it. This works for the entire directory the .htaccess file is placed in.
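Note that on Apache 2.4 and later the equivalent directive is Require all denied; the old deny from all syntax only keeps working there via mod_access_compat. PHP includes are unaffected either way, since they go through the file system rather than through Apache.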
Another way of 'shielding' the PHP file from access by clients through the web server is to place the PHP files in a directory outside your wwwroot/public_html.
In your PHP config you can add this directory to the include path, or simply include the files via the correct relative path or by using absolute paths.
For example, if you have root_containing_folder/wwwroot/index.php and root_containing_folder/app/core.php, in index.php you could have
require_once('../app/core.php');
and core would be included, but a browser could never get to core.php on its own. (If they could, it would have to be through a URL like www.facing-site.com/../app/core.php -- which your web server should never allow!)
You can't do those things: when a script makes an AJAX request, it's the user's browser that sends the request. If you want client-side scripts to see your content, browsers must be able to see it.
You can apply some security-through-obscurity, for example by putting some kind of auth token in the script. This won't give you much protection, as all a user has to do is read the JS to get the token, but it will stop casual visitors from poking around. Your 'if XHR' check is effectively doing this: a browser won't normally send that header when the address is typed into the address bar, but a user can easily reproduce the header outside of your AJAX code.
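A sketch of the token idea (the token value and how it reaches your JS are assumptions; again, this is obscurity, not security):

<?php
// Obscurity only: anyone who reads the JS can copy the token.
define('AJAX_TOKEN', 'some-shared-string');  // placeholder value

if (!isset($_POST['token']) || $_POST['token'] !== AJAX_TOKEN) {
    header('HTTP/1.0 403 Forbidden');
    exit;
}
// echo ajax code.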
I'm serving up Zip and PDF files on the fly via PHP using an output such as:
header('Content-Disposition: attachment; filename="'.$project->name .'.zip"');
echo($zipfile->zl_pack());
I can't find any reference to these downloads in my Apache logs, though. Is it not logged because it's dynamic?
Do I need to actually write the file to the webserver and then serve the result up to get it logged or am I missing something?
Cheers,
Niggles
Correct. httpd does not look at outgoing headers for logging. error_log() will send a message to httpd's error log, but there's no way to put something in the access log.
The request to the PHP program that generates that header should be logged. The filename mentioned in the content disposition header won't be.
I believe mod_perl would allow you to add custom logging; I don't know if mod_php provides a similar feature.
As a workaround you could use mod_rewrite to have the *.zip file logged and still serve it through PHP without actually writing it to the filesystem; the client will have to send two requests, though.
1) Change your current script so that it doesn't produce the file, but instead puts the parameters needed for the file creation in the session; instead of the current header and file content you would put header('Location: '.$project->name .'.zip');
2) This would cause the second request. Since the requested file doesn't exist yet, you would use mod_rewrite to change the request to a *.zip file to the same or some other PHP script that reads the parameters from the session and produces the file just like you're doing it now. If you use the same PHP script, you would additionally let mod_rewrite add some parameter like "?action=produceFile" to the request and then test for that parameter in the script.
This way you would have two nice entries in your Apache log without having to physically save the file (and probably delete it later).
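Roughly, the two steps might look like the sketch below. The rewrite rule, the session key, and the load_project() helper are assumptions for illustration; $project and zl_pack() come from the question.

<?php
// Step 1 (your current script): stash parameters, redirect to a *.zip URL.
session_start();
$_SESSION['zip_project_id'] = $project->id;
header('Location: ' . $project->name . '.zip');
exit;

Then in .htaccess (download.php is a placeholder name):
RewriteRule \.zip$ download.php?action=produceFile [L]

<?php
// Step 2 (download.php): build and send the archive as before.
session_start();
$project = load_project($_SESSION['zip_project_id']);  // hypothetical loader
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename="' . $project->name . '.zip"');
echo $zipfile->zl_pack();  // $zipfile assembled as in the original code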
FYI, I found a really good workaround.
Part of the problem was that we wanted to force a "save as" dialogue, as many of the users get confused about where the file gets saved. That's why I went with the
Content-Disposition: attachment
method.
A better method, which still invokes the dialogue, is to add this to .htaccess:
<Files *.zip>
ForceType application/octet-stream
Header set Content-Disposition attachment
</Files>
Then write the zip to the file system and redirect the page to the zip.
Sure, I have to clean up every night, but it gets logged and it still forces them to choose where they download the file (on pretty much everything but Safari; there's always one).
Cheers,
Niggles
Assume you have only the URL to a file (hosted on the same server as the app) that has been rewritten via mod_rewrite rules.
How would you read the contents of that file with PHP without having direct access to the .htaccess file or the rewrite rules used to build the original URL?
I'm trying to extract all the script tags that have the src attribute set, retrieve the contents of each target file, merge them all into one big JavaScript file, minify it, and then serve that instead.
The problem is that reading all of the files via file_get_contents over HTTP seems to slow the page down, so I was looking for alternatives: ideally I'd read the files directly from the file system without generating extra requests in the background, but to do that I'd have to find the path to each file, and some are accessed via URLs that have been rewritten.
You can't include it as if it were the original PHP source; you can only get the result of the PHP's execution.
If you have URL fopen wrappers enabled (allow_url_fopen), this is as easy as using file_get_contents on the rewritten URL (require and include over HTTP additionally need allow_url_include). Otherwise you have fsockopen and cURL as options for making the HTTP request for the result.
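For example, with cURL (the URL is a placeholder):

<?php
// Fetch the executed result of the rewritten URL over HTTP.
$ch = curl_init('http://example.com/some/rewritten/file.js');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return body, don't print it
$body = curl_exec($ch);
curl_close($ch);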
Since you cannot tell how the request would be handled, the only possible solution is to send an HTTP request to that server. But that will only get you the output of that file/script.
PHP sits behind Apache and accesses files at the file-system level using fopen-like and include-like functions. mod_rewrite has no effect on this kind of access, because these functions use the OS's file-access routines, not Apache.
There's no way to do this other than implementing the same URL-rewriting rules in your PHP script as you have in .htaccess, because Apache's rewriting and PHP's file access know nothing about each other and live on completely different layers of the web application.
EDIT: The only way is to implement your rewrite rules in the PHP script and use PHP's file-system access after parsing the URLs in PHP (not via the rewrite module).
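As a sketch of that idea, suppose the rewrite rule were (hypothetically) RewriteRule ^assets/js/(.*)$ static/scripts/$1; the PHP side could mirror it like this:

<?php
// Mirror the (hypothetical) rewrite rule so files can be read from disk.
function urlToPath($url) {
    $path = parse_url($url, PHP_URL_PATH);
    if (preg_match('#^/assets/js/(.+)$#', $path, $m)) {
        return __DIR__ . '/static/scripts/' . $m[1];
    }
    return null;  // unknown URL: no local mapping
}

$path = urlToPath('/assets/js/app.js');
if ($path !== null && is_file($path)) {
    $contents = file_get_contents($path);  // fast: straight from disk
} else {
    // slow fallback: fetch the executed result over HTTP
    $contents = file_get_contents('http://example.com/assets/js/app.js');
}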