mod_rewrite stripping long values from HTTP header - php

I'm using OpenAthensSP to return IdP metadata that can potentially access our service. OpenAthensSP returns this data in the form of environment variables in the HTTP(S) header, which we then read in PHP (from $_SERVER).
So far so good.
However, when mod_rewrite is used to rewrite the URL that is called by OpenAthens, the metadata (i.e., the environment variables from OpenAthens contained in the HTTP header) is stripped out. I have shown this in side-by-side testing: directly calling a PHP script (metadata present) vs. rewriting the URL to the exact same PHP script (metadata stripped, but other values, e.g. cookies, present and unchanged). The values that are stripped out are very long (too long to sensibly paste here - more than 100k) - that's the only potential problem I can see. The values are correctly URL-encoded.
I have tried setting things like LimitRequestFieldSize and LimitRequestLine in Apache but they don't have any effect, so I think the problem must lie with mod_rewrite.
So the question, essentially, is:
How can I keep very long values intact in the HTTP header while still using mod_rewrite?
The current solution I have is not great; I have had to do this (httpd.conf fragment from the VirtualHost section):
# /discovery is the URL called by OpenAthens to supply us IdP metadata
RewriteCond %{REQUEST_URI} ^/discovery [NC]
RewriteRule .* - [L]
# ... other rewrites here to send (nearly) everything else to index.php ...
ErrorDocument 404 /index.php
This way, index.php receives the "/discovery" request and, lo and behold, the lengthy values in $_SERVER are present and correct - although a 404 is also triggered, which, needless to say, is ugly and hacky.
What I can't do is simply send the output from OpenAthens directly to a valid page (e.g., discovery.php) because the metadata is needed to populate a login form that has to exist within the PHP framework being used - which has to start off with index.php.
(In case it matters: this is on CentOS 5.6 / Apache 2.2.3)

As someone who's used OpenAthensSP quite a bit, I know that the data is passed in the Apache sub-process environment, not the HTTP header - it never goes to the user's client. This also explains why LimitRequestFieldSize and LimitRequestLine have no effect: they only apply to the HTTP request header. I suspect what's happening is that your rewrite rules are interfering with the request in some way. If they're creating an internal request, you might have better luck using the apache_getenv() function in PHP rather than relying on $_SERVER variables.
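For example, a minimal sketch of reading the value both ways; OA_IDP_METADATA is a hypothetical placeholder, not a documented OpenAthensSP variable name:
<?php
// Try the Apache sub-process environment first; walk_to_top=true looks
// at the top-level request, which helps when mod_rewrite has created
// an internal sub-request.
$metadata = false;
if (function_exists('apache_getenv')) {
    $metadata = apache_getenv('OA_IDP_METADATA', true); // hypothetical name
}
// Fall back to $_SERVER; note that Apache prefixes environment
// variables with REDIRECT_ on each internal redirect.
if ($metadata === false && isset($_SERVER['OA_IDP_METADATA'])) {
    $metadata = $_SERVER['OA_IDP_METADATA'];
}
if ($metadata === false && isset($_SERVER['REDIRECT_OA_IDP_METADATA'])) {
    $metadata = $_SERVER['REDIRECT_OA_IDP_METADATA'];
}
?>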

Related

Redirect any GET request to a single php script

After many hours messing with .htaccess I've arrived at the conclusion of sending every request to a single PHP script that would handle:
- generation of HTML (whatever the way, includes or dynamic)
- 301 redirections, with a lot more flexibility in the logic (for a dumb .htaccess-eer)
- 404 errors, finally, if the request makes no sense
leaving only minimal functionality in .htaccess.
After some tests it seems quite feasible and, from my point of view, preferable. So much so that I wonder: what's wrong with, or what can go wrong with, this approach?
Server performance?
In terms of SEO I don't see any issue as the procedure would be "transparent" to the bots.
The redirector.php would expect a query string consisting of the actual request.
What would be the .htaccess code to send everything there?
I prefer to move all your PHP files into another directory and put only one PHP file in your htdocs path, which handles all requests. Other files, which you want to serve without PHP, you can place in that folder too, with this .htaccess:
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ /index.php/$0 [L]
Existing files (JPGs, JS, or whatever) are still reachable without PHP. That's the most flexible way to realize it (a sketch of the index.php side follows the example below).
Example:
- /scripts/ # Your PHP Files
- /htdocs/index.php # HTTP reachable Path
- /htdocs/images/test.jpg # reachable without PHP
- /private_files/images/test.jpg # only reachable over a PHP script
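To illustrate the PHP side of this setup, here is a minimal index.php sketch; the routing scheme and the "home" default are assumptions, not part of the answer:
<?php
// /htdocs/index.php - front-controller sketch for the rule above.
// With "RewriteRule ^(.*)$ /index.php/$0 [L]" the original path
// arrives in $_SERVER['PATH_INFO'].
$path = isset($_SERVER['PATH_INFO']) ? trim($_SERVER['PATH_INFO'], '/') : '';
if ($path === '') {
    $path = 'home'; // hypothetical default route
}

// Whitelist characters to block directory traversal, then map the
// request onto the PHP files kept outside htdocs (/scripts/ above).
if (preg_match('#^[a-z0-9/_-]+$#i', $path)) {
    $script = dirname(dirname(__FILE__)) . '/scripts/' . $path . '.php';
    if (is_file($script)) {
        require $script;
        exit;
    }
}

header('HTTP/1.0 404 Not Found');
echo 'Page not found';
?>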
You can use this code to redirect all requests to one file:
RewriteEngine on
RewriteRule ^.*$ myfile.php [QSA]
Note that all requests (including stylesheets, images, ...) will be rewritten as well. There are of course other possibilities (rules), but this is the one I am using, and it keeps the query string intact ([QSA] appends the original query string; the pattern itself never sees it, since mod_rewrite matches only the URL path). If you don't need the query string you can use
RewriteEngine on
RewriteRule ^.*$ myfile.php?
(The trailing "?" discards the original query string.)
This is a common technique, as bots and even users only see the URL they requested, not how it is handled internally. Server performance is not a problem at all.
Because you rewrite all URLs to one PHP file, there is no 404 page anymore - every request gets caught by your PHP file. So make sure you handle invalid URLs correctly; a minimal sketch follows.
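For example, with a purely illustrative route table:
<?php
// myfile.php - every request lands here, so emit a real 404 status
// for unknown URLs instead of silently serving content.
$uri = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

$routes = array(                 // illustrative route table
    '/'      => 'pages/home.php',
    '/about' => 'pages/about.php',
);

if (isset($routes[$uri])) {
    include $routes[$uri];
} else {
    header('HTTP/1.0 404 Not Found'); // a real 404, not a "soft" one
    include 'pages/404.php';
}
?>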

Rewrite URLs using Apache and PHP

I have a number of URLs that are stored in a database, so instead of adding a rewrite rule in .htaccess for every URL, I hand control to the PHP code through the following rewrite rule in .htaccess:
RewriteRule ^.*$ ./index.php
A URL stored in the database has a corresponding original URL. The tricky situation comes when I have to serve the content of the URL fetched from the DB via its corresponding original URL, for which the rewrite rules are written in .htaccess. One solution is to re-implement in PHP, for the URLs fetched from the DB, the same rewrite rules that Apache applies to the original URLs; however, the number of such original URLs is huge.
Thus, I would be glad to know of a solution, if one exists, that lets execution flow back through the rewrite rules in Apache after the processing inside PHP is complete.
If you have access to the main httpd.conf you could use a RewriteMap written in PHP.
Other than that, there is no way you can give control from PHP back to Apache so that Apache can process the request further - not in the same request, anyway. You could issue a 30x redirect from PHP to let Apache work on the next request. (A sketch of the RewriteMap approach follows.)
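To flesh out the RewriteMap suggestion, here is a minimal sketch of a prg: map written in PHP; the map name, script path, and lookup_in_database() are assumptions:
#!/usr/bin/php
<?php
// urlmap.php - a "prg:" RewriteMap. Apache starts this once at boot,
// writes one lookup key per line to STDIN, and expects exactly one
// answer (or "NULL" for no match) per line on STDOUT, unbuffered.
//
// Hypothetical httpd.conf wiring (server config only, not .htaccess):
//   RewriteMap urlmap prg:/usr/local/bin/urlmap.php
//   RewriteRule ^/(.+)$ ${urlmap:$1|/index.php} [L]

set_time_limit(0);
$stdin = fopen('php://stdin', 'r');

while (($key = fgets($stdin)) !== false) {
    $key = rtrim($key, "\r\n");
    $target = lookup_in_database($key); // hypothetical: your own DB lookup
    echo ($target !== null ? $target : 'NULL') . "\n";
    flush(); // Apache blocks until it reads the answer line
}
?>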
Basic rewriting aside, you could create an Apache rule to redirect all 404 errors to a PHP file, which becomes your URL handler. Using the requested URL, do a lookup in your list of URLs in the database and fetch the original URL; from there, either do a redirect, or fetch the page contents server-side (or AJAX/iframe the page, whichever you prefer). If the requested URL is not in your list, display your custom 404 page. This kills two birds with one stone. (A sketch of such a handler follows the link below.)
Setting up a 404 page:
http://www.404-error-page.com/404-create-a-custom-404-error-page.shtml
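A minimal sketch of the 404-handler approach described above, assuming ErrorDocument 404 /handler.php is configured and find_original_url() stands in for your own database lookup:
<?php
// handler.php - wired up with "ErrorDocument 404 /handler.php".
// Apache exposes the originally requested path in REDIRECT_URL.
$requested = isset($_SERVER['REDIRECT_URL']) ? $_SERVER['REDIRECT_URL'] : '/';

$original = find_original_url($requested); // hypothetical DB lookup
if ($original !== null) {
    header('Location: ' . $original, true, 301); // permanent redirect
} else {
    header('HTTP/1.0 404 Not Found'); // keep the status honest
    include 'custom_404.php';         // your custom 404 page
}
exit;
?>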

Hide url and redirection implications for AJAX POST?

I'm creating a website with php backend. I have a directory called /inc/ which contains php include files that get included when generating html webpages.
If a user tries to request any file in the /inc/ directory (by typing the URL into their browser, for example), I've made it so they get redirected to the home page. I did this in an attempt to ensure that none of these files can be called externally.
I need to call one of these files via a jQuery POST request.
Here are my questions:
1) Can I somehow hide the url of the file requested in the POST?
2) Will a POST to a file in the /inc/ folder via jQuery fail, since external requests for files in the /inc/ folder get redirected to the home page? Or does the server make a distinction between POST requests, and other types of requests?
3) (OPTIONAL) How can I ensure that the POST request is made "legitimately", as opposed to a bot trying to crash my server by issuing thousands of simultaneous POST requests?
Not without using a link somewhere, somehow. Remind yourself that jQuery / Ajax / XMLHttpRequest - anything pointing outwards has to point outwards. There will always be a URL, and it will always be traceable.
Options to make your URL less traceable:
Create a page for your javascript calls (hides away, but doesn't really do anything)
Edit .htaccess options and use it to process javascript requests
Edit .htaccess options and a php page for server-side processing of javascript
I'll be going over option 3.
Example (includes option 2!)
#Checks if the request is made within the domain
#Edit these to your domain
RewriteCond %{HTTP_REFERER} !^.*domain\.com [NC]
RewriteCond %{HTTP_REFERER} !^.*domain\.com.*$ [NC]
#Pretends the requested page isn't there
RewriteRule \.(html|php|api.key)$ /error/404 [L]
#Set a key to your 'hidden' url
#Since this is server-based, the client won't be able to get it
#This will set the environment variable when a request is made to
#www.yourwebsite.com/the folder this .htaccess is in/request_php_script
SetEnvIf Request_URI "request_php_script" SOMEKINDOFenvironmentNAME=http://yourlink.com
#Alternatively set Env in case your apache doesn't support it
#I use both
SetEnv SOMEKINDOFNAME request_php_script
#This will send the requester to the script you want when they call
#www.yourwebsite.com/the folder this .htaccess is in/request_php_script
RewriteCond %{REQUEST_URI} request_php_script$ [NC]
#If you don't want a PHP script to handle the javascript and want to benefit from full URL obfuscation, write the following instead
#RewriteRule ^.*$ /adirectscript.php [L]
RewriteRule ^.*$ /aredirectscript.php [L]
#And yes, this can be made shorter, but it works best if the folders and keys in your ENV are similar to some extent
In this case you could call a PHP script that redirects you to the right page, but if everything is internal, then I don't see the reason why you would hide away the URL to your scripts. If you have set up the .htaccess as shown, only your page can access it. Users and external sources aren't able to reach it, as they'll be redirected.
If your scripts refer to an external API key however, then this might be useful, and you could call the redirect script
<?php
// Fetch and return the contents of the URL stored in the environment
// variable set by the .htaccess above; the client never sees that URL.
echo file_get_contents(getenv("SOMEKINDOFNAME"));
?>
Now when this script is called, it'll return your contents. If you want to load in pages instead, you can use something like what is described here:
getting a webpage content using Php
To make full use of this, you have to set your jQuery POST method to POST to www.yourwebsite.com/the folder this .htaccess is in/request_php_script.php
If 1 and 2 are done properly, as above, you shouldn't have to worry about bots from the outside trying to reach your .php scripts.
Sidenote:
You can skip the extra PHP script, but then you'll be traceable in the .har files, which means that, in the end, your URL is still reachable somewhere. Using the extra PHP script (give it query parameters for convenience) obfuscates the URL enough to make it dead hard to find. I've used this approach to hide away requests to an external API key.
TL;DR:
- Set a .htaccess server environment variable for the page
- Set a .htaccess rewrite condition so the request is redirected to the page only if
  - it originally comes from my page
  - it is called by the script
- Set the javascript to call the specified page
This is medium 'secure':
  - cannot be found in the script
  - cannot be traced back without saving .har files
- Create a PHP file to return the page contents
- Set .htaccess to point to the PHP file instead of the page
This is highly 'secure':
  - cannot be found in the script
  - cannot be traced back, even when saving .har files
1) No.
2) Depends on how you handle redirects, best way is to try and see.
3) Not an easy task in general. Simple approach is to detect same client and limit request rate. No way to detect a bot in general only by request data.
As for your last comment, you can restrict access to those files with .htaccess without the need for redirects (see the snippet below). However, you still won't be able to get them with AJAX. The only real reason to hide something is if there is some sensitive information inside, like passwords or logins. Otherwise it doesn't really matter; nobody is interested in some hidden utility files.
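For reference, a minimal .htaccess for the /inc/ directory that refuses direct HTTP access outright (Apache 2.2 syntax), with no redirect involved:
# /inc/.htaccess - deny all direct HTTP access; server-side include()
# still works, but browser requests (including AJAX) will get a 403
Order allow,deny
Deny from all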

Reasons not to use Apache's error handler as a default mechanism?

Typical web applications call a single default server object (e.g. a PHP script) each time a request comes in. In case Apache fails to find an applicable script or resource, Apache tries to deliver an error page.
Alternatively, one may design a web app in such a way that no scripts or resources exist in the vHost's htdocs/root directory. Thus, each request would force Apache to deliver an error page.
If we define a server-side script as the standard error handler, any URL will trigger the script. Thus, the single script would be the single point of action.
Is anybody aware of reasons, why this approach is wrong?
It seems that the page called by the ErrorDocument directive doesn't have access to form data such as $_GET, $_POST, $_REQUEST, $_COOKIE (at least in my install of Apache/PHP).
After daring a journey into the pit of hell that is Apache mod_rewrite I eventually escaped with the following incantation, which seems to work for me:
<Directory /same/as/document/root/>
    RewriteEngine On
    RewriteBase /
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteRule .* index.php [L]
</Directory>
The first line enables mod_rewrite.
The second line sets the rewrite base to the document root, i.e. so that URLs of all subdirectories are processed.
The third line adds a condition such that the rule is only activated if the requested filename ("%{REQUEST_FILENAME}") doesn't exist ("!-f").
The fourth line matches any remaining request and rewrites it to index.php ([L] stops further rule processing for the request).
Disclaimer: I know very, very, little about both Perl regular expressions and Apache mod_rewrite.
Just beware that the user may type in a URL path of the form:
/some_directory/ or /some_directory
so you may have to handle both cases.
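For example, one way to normalize both forms before routing:
<?php
// Normalize "/some_directory/" and "/some_directory" to a single form
// before doing any routing.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if ($path !== '/') {
    $path = rtrim($path, '/');
}
?>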
My guess is that this is a penalty for the webserver.
Each time a resource is requested, Apache searches the filesystem for a file, and if it doesn't find one, the error handler script is looked up and run.
And if for some reason PHP fails, you do not get any error pages anymore; Apache will just log that an error occurred, and that another error occurred while handling the error with the error handler.
You could do it, but you'd probably have to jump through some hoops to get the original URL, and to avoid sending HTTP error codes. If your goal is to use PHP for all requests, you'd probably be better off using mod_rewrite.

PHP not obeying my defined ETags

What I'm doing
I'm pulling an image from the database and sending it to the browser with all the proper headers - the image displays fine. I also send an ETag header, using the SHA1 of the image's content as the tag.
The images are getting called semi regularly, so caching is a bit of an issue (won't kill the site, but nice to have).
The Problem
$_SERVER['HTTP_IF_NONE_MATCH'] is not available to me. As far as I can tell, this is because of PHP's "disobey the cache controls" lifestyle. I can't mess with the session cache limiter, because I don't have access. But even if I did have access, I wouldn't want to touch it: 99% of the site is under WordPress.
The Environment
PHP 4 (don't ask)
Apache 2.2
WordPress
The images live in the database (a LONGBLOB column), which I can't change.
Any guidance, tips/tricks, etc. would be helpful. I don't have much room to change the environment or structure.
Cheers.
Have you tried reading HTTP_IF_NONE_MATCH from apache_request_headers()?
If you are running pre-4.3 php, it was called getallheaders() before.
Edit
I now see, in the page I linked, that you may also want to try to put
RewriteEngine on
RewriteRule .* - [E=HTTP_IF_MODIFIED_SINCE:%{HTTP:If-Modified-Since}]
RewriteRule .* - [E=HTTP_IF_NONE_MATCH:%{HTTP:If-None-Match}]
in the appropriate .htaccess file to force Apache to set the PHP $_SERVER[...] variables you're unsuccessfully trying to read.
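Once the header reaches PHP one way or another, the comparison itself is short. A sketch, with get_image_from_db() as a hypothetical stand-in for the database fetch:
<?php
// Compare the client's ETag and short-circuit with a 304 on a match.
$image = get_image_from_db($_GET['id']); // hypothetical DB fetch
$etag  = '"' . sha1($image) . '"';       // sha1() needs PHP >= 4.3.0

// With the rewrite trick above the header lands in $_SERVER (possibly
// REDIRECT_-prefixed after an internal redirect).
$client = null;
if (isset($_SERVER['HTTP_IF_NONE_MATCH'])) {
    $client = $_SERVER['HTTP_IF_NONE_MATCH'];
} elseif (isset($_SERVER['REDIRECT_HTTP_IF_NONE_MATCH'])) {
    $client = $_SERVER['REDIRECT_HTTP_IF_NONE_MATCH'];
}

header('ETag: ' . $etag);
if ($client !== null && trim($client) === $etag) {
    header('HTTP/1.1 304 Not Modified');
    exit; // a 304 must not carry a body
}

header('Content-Type: image/jpeg'); // adjust to the stored type
echo $image;
?>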
If PHP is not receiving the If-None-Match header, there's not much you can do. I don't know what you mean by "PHP's 'disobey the cache controls'"; PHP generates arbitrary dynamic content on the fly - it cannot, a priori, know whether what it is about to return is already cached by the client.
Anyway, you should investigate whether the client is in fact SENDING the header. If it is, but it's not reaching PHP, check whether it's reaching Apache. If it's reaching Apache but not PHP, you could always hack up a solution with mod_rewrite, like passing the header as a query string (not tested!):
# %{HTTP:If-None-Match} is the valid mod_rewrite lookup for this header;
# the QUERY_STRING condition stops the rule from looping on its own rewrite
RewriteCond %{QUERY_STRING} !(^|&)if-none-match=
RewriteCond %{HTTP:If-None-Match} (.+)
RewriteRule ^/get_image\.php$ /get_image.php?if-none-match=%1 [B,QSA,L]
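On the PHP side, the script would then read the fallback parameter, e.g.:
<?php
// Fallback order: query parameter (set by the rewrite above),
// then the normal server variable.
if (isset($_GET['if-none-match'])) {
    $client = $_GET['if-none-match'];
} elseif (isset($_SERVER['HTTP_IF_NONE_MATCH'])) {
    $client = $_SERVER['HTTP_IF_NONE_MATCH'];
} else {
    $client = null;
}
?>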
