Problems with special characters in url - php

I am working on a client project and I am trying to improve the process of passing variables from a url to php. The url structure of the project looks like the following:
http://xyz.com -> Domain
http://xyz.com/folder -> Folder/File
http://xyz.com/doesnotexist -> Folder/File does not exist
-> Pass it as a parameter to index.php Script
htaccess Rules take this parameter "doesnotexist" and make it available in a $_GET variable in index.php.
The variable gets encoded in javascript with encodeURIComponent, the url can be called in a browser and decoded in php with urldecode. This works perfectly.
Now to my problem: when the passed variable contains special characters like a slash "/" or an ampersand "&", it no longer works, because the browser treats the value as a subdirectory, e.g. variable: "does/notexist" -> the browser tries to open http://xyz.com/does/notexist. At the moment I replace such characters with others that cause no problems in a URL before encoding. So I replace "/" with "," and "&" with ";", encode it, and everything is fine. In my PHP script I decode it and replace "," with "/" and ";" with "&", and so on. This works, but it is really ugly, so I am looking for a better way to do it.
The initial URL structure cannot be changed. Does anyone know a better way to do this? I'm stuck here. One idea would be to Base64-encode the whole URL parameter, but that is not what I want, because the URL should stay readable.

This is a typical situation where you would use a .htaccess file.
Use mod_rewrite.
From here: howto mod_rewrite every request to index.php except real files but exclude one real directory?
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L]
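On the PHP side, one way to keep a slash inside a value is to split the raw path on "/" first and decode each piece afterwards, so an encoded %2F stays part of its segment. The following is only a minimal sketch: it assumes the still-encoded path is available (for example from $_SERVER['REQUEST_URI']), the function name split_raw_path is made up for illustration, and Apache's AllowEncodedSlashes setting may need to permit %2F in the path before the request reaches PHP at all.

```php
<?php
// Hypothetical helper: split a still-encoded URL path into decoded segments.
function split_raw_path(string $rawPath): array
{
    // Drop a query string if one is attached.
    $path = explode('?', $rawPath, 2)[0];
    // Split on literal slashes only; "%2F" is still encoded at this point.
    $segments = explode('/', trim($path, '/'));
    // Decode each segment separately so "%2F" becomes "/" inside the value.
    return array_map('rawurldecode', $segments);
}

// In index.php this would be called with $_SERVER['REQUEST_URI'].
print_r(split_raw_path('/does%2Fnotexist'));
```

Because decoding happens after splitting, the manual replacement of "/" with "," is no longer needed on either side.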

Related

.htaccess Php rewrite url

I am trying to set up URL rewriting in .htaccess.
Here is my goal:
If I have an url of this form: http://localhost/view.php?Id=456
Then I want to transform it to: http://localhost/456
I use this rule in htaccess:
RewriteRule ^([a-zA-Z0-9]+)$ view.php?Id=$1
Now this works very well!
My problem is that I also want to allow dots in the id, i.e. instead of 456 I could use: my.book
That is to say: http://localhost/my.book
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([a-zA-Z0-9\.]+)$ view.php?Id=$1 [QSA,L]
You need RewriteCond %{REQUEST_FILENAME} !-f before the RewriteRule line so that the rule only runs when the requested path is not an actual file; otherwise a request for a real file such as view.php would also match the pattern. Your original rule fails for my.book because its character class does not include ".", so the pattern does not match, the server looks for a file named my.book, and you get a Not Found error when no such file exists. So you also need to allow . in the pattern.
To allow dots in the input, add . (escaped as \.) to the character class, as in ^([a-zA-Z0-9\.]+)$. Note that this pattern also matches names with an extension, so without the condition above, passing view.php in the URL would not open the actual file; it would be treated as a value in the query string.
Try this:
RewriteRule ^([a-zA-Z0-9\.]+)$ view.php?Id=$1
Basically, I added \. to your pattern. This makes sure your regex matches any letter (lowercase or uppercase), decimal digits and periods (.). Hope this helps :)

Stop Apache Decoding URL Components

I don't know if the title says it or not, but basically, I am using .htaccess to pass an entire URL to my PHP file as part of the URL path.
Example: http://example.com/var1/var2/http://example.net/logo.png/image.png
In this case, logo.png would be put inside image.png by my code.
I have tried Javascript to encode the variable URL.
http://example.org/utilities/banner-generator/Testyz/Testy/http%3A%2F%2Fstatic.example.com%2Ffiles%2Favatar%2F1498768_1.png/banner.png
This is what the URL looks like but Apache still treats the encoded slashes as normal slashes.
Is it possible to stop it from doing this?
The trick is to use the B flag with THE_REQUEST in a condition.
RewriteEngine On
RewriteCond %{THE_REQUEST} \s/+utilities/banner-generator/([^/]+)/([^/]+)/(.+?)/banner\.png [NC]
RewriteRule ^ /include/image/banner.php?name=%1&description=%2&icon=%3 [NE,B,L,QSA]

URL encode in htaccess, maybe?

Consider the following scenario:
I want to be able to access http://www.example.com/word/hello/, where the word hello is variable. So I set up .htaccess to configure that.
RewriteEngine On
RewriteRule ^word/(.+)/?$ displayword.php?word=$1 [L]
I used .+ because I also want to match symbols such as ?+-.!;: etc.
And I set up my PHP file accordingly:
<?php
echo $_GET['word'];
?>
Remember that this is just a scenario. Now, I went to this URL: http://www.example.com/word/Are you ok?/, and the page outputted this:
Are you ok
And I couldn't figure out why. But then I realised that the question mark symbol is the starting point of the URL variables.
So is there a way to 'url encode' the question mark in the above example, in order for it to be displayed correctly?
There is no need to encode it, try this:
RewriteEngine On
RewriteRule ^word/([a-zA-Z0-9-=_.?]+)/?$ displayword.php?word=$1 [L]
It will display ? in the parameter and any other character you add to the [group]. I did not test if the rule works, though, but I suppose it does. Looks ok and that is not the question.
I don't know heaps about .htaccess files, but you could change your PHP script to use $_SERVER['PATH_INFO'] instead of $_GET or $_REQUEST.
Particularly, this comment might help you out.
In the HTTP protocol the "?" separates the querystring from the rest of the URL, so I don't think it will be possible to use it directly inside the URL. One solution would be to encode the question mark into %3F.
Then you can use string urldecode (string $str) to decode the string.
See this URL Encoding Reference for the encoding of other characters.
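As a rough illustration of that round trip in PHP: rawurlencode() turns the question mark into %3F when the link is built, and urldecode() restores it. The variable names are only for the example. Note that PHP already decodes values it places in $_GET, so the explicit decode mainly applies to strings read from elsewhere, such as $_SERVER['REQUEST_URI'].

```php
<?php
// The word exactly as the user typed it.
$word = 'Are you ok?';

// Encode it before putting it into the path: space -> %20, ? -> %3F.
$encoded = rawurlencode($word);
$link = 'http://www.example.com/word/' . $encoded . '/';
echo $link, "\n";

// Decoding the raw segment gives the original text back.
echo urldecode($encoded), "\n";
```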
Change your code to this:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,}\s/+word/([^/]+) [NC]
RewriteRule ^ displayword.php?word=%1 [L,QSA]
This works because RewriteRule matches against the URL path only, i.e. the part before the question mark, whereas %{THE_REQUEST} holds the full request line, which includes the question mark and everything after it.

PHP Obtain the URL as a variable

I have noticed that many websites use urls that end in
website.com/index.php?var="value"&var2="value2"
and I was wondering how I could make it so that instead of having that be the end of the URL have this instead:
website.com/value/value2
and then have it so that instead of searching for "/value/value2" inside of the servers root folder it would instead just open index.php and then inside the PHP coding have a function that would get what the URL is. Either as a string "/value/value2" or an array "value" "value2" it doesn't matter but just some way of getting those variables. This would be so that the URL could be cleaned up and easy to tell where you were in the website.
Also, if there is a way of doing this, would it be possible for style.php, which is in the same folder as index.php (but has a PHP header setting it to output CSS) and would be called in the head of index.php using <link rel="stylesheet" type="text/css" href="style.php" /> or whatever the syntax for that is, to obtain that same variable so that the CSS styling could be changed according to the URL?
You can use rewriting of urls in .htaccess file
Check this.
RewriteEngine on
RewriteRule ^([^/]+)/([^/]+)/?$ /index.php?var=$1&var2=$2 [L]
There are three parts to this:
RewriteRule specifies that this is a rule for rewriting (as opposed to a condition or some other directive). The command is to rewrite part 2 into part 3.
^([^/]+)/([^/]+)/?$ is a regex, and the rule runs only if the URL path matches it. In this case it says: start at the beginning of the string, match a run of non-slash characters, then a slash, then another run of non-slash characters, optionally followed by a trailing slash. The parentheses mean the parts inside them are stored for later reference.
/index.php?var=$1&var2=$2 says how to rewrite the matched URL. $1 and $2 refer to the parts that were captured and stored.
Refer Beginner's Guide to mod_rewrite.
Also tutorial for same.
You need to rewrite the URL. If you are using Apache, you would have to make changes in the .htaccess file. Check this and this manual.
If using apache, enable mod_rewrite and use .htaccess
RewriteEngine on
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteRule ^(.*)$ /index.php [L]
If using nginx, use nginx_rewrite_module http://nginx.org/ru/docs/http/ngx_http_rewrite_module.html
And inside your index.php parse $_SERVER['REQUEST_URI'] variable, it will contain requested url.
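A minimal sketch of that parsing step, assuming the two-segment layout /value/value2 from the question. The function name route_vars and the keys var and var2 are illustrative and not part of any framework.

```php
<?php
// Hypothetical helper: map "/value/value2" to named variables.
function route_vars(string $requestUri): array
{
    // Keep only the path; parse_url() leaves the query string out.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    // Split into segments, discarding empty ones from leading or trailing slashes.
    $segments = array_values(array_filter(explode('/', $path), 'strlen'));
    return [
        'var'  => isset($segments[0]) ? rawurldecode($segments[0]) : null,
        'var2' => isset($segments[1]) ? rawurldecode($segments[1]) : null,
    ];
}

// In index.php this would be called with $_SERVER['REQUEST_URI'].
print_r(route_vars('/value/value2'));
```

Returning null for a missing segment lets index.php decide on its own defaults instead of failing on an undefined index.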
This can be achieved easily. Everything after the question mark is the query string, and PHP exposes its parameters as $_GET variables. So you can use $_GET['var'] or $_GET['var2'] to get their values.
For example. I have the URL: http://www.example.com?username=username&password=password
Now I can take that URL and rewrite it like so:
<?php
$user = $_GET['username'];
$pass = $_GET['password'];
$newUrl = 'http://www.example.com/' . $user . '/' . $pass;
echo '<a href="' . htmlspecialchars($newUrl) . '">Link text here</a>';
?>
This results in a formatted url based on $_GET variables: http://www.example.com/username/password
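If the values can contain characters such as "/" or "&", each one would need encoding before being joined into the path. A small sketch of that idea follows; the helper name clean_url is hypothetical.

```php
<?php
// Hypothetical helper: join values into a path, encoding each one separately.
function clean_url(string $base, array $values): string
{
    // rawurlencode() escapes "/" and "&", so a value cannot add extra segments.
    return rtrim($base, '/') . '/' . implode('/', array_map('rawurlencode', $values));
}

echo clean_url('http://www.example.com', ['user name', 'a/b']), "\n";
```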

mod_rewrite: no ? and # in REQUEST_URI

What I'm trying to do:
have pretty URLs in the format 'http://domain.tld/one/two/three', that get handled by a PHP script (index.php) by looking at the REQUEST_URI server variable.
In my example, the REQUEST_URI would be '/one/two/three'. (Btw., is this a good idea in general?)
I'm using Apache's mod_rewrite to achieve that.
Here's the RewriteRule I use in my .htaccess:
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
This works really well thus far; it forwards every REQUEST_URI that consists of a-z, A-Z or a '/' to /index.php, where it is processed.
Only drawback: '?' (question marks) and '#' (hash keys) seem to still be allowed in the REQUEST_URI, maybe even more characters that I've yet to find.
Is it possible to restrict those via my .htaccess and an adequate addition to the RewriteRule?
Thanks!
The fragment identifier, e.g. #some-anchor, is controlled by the browser, not the server. JavaScript would be needed to redirect and remove this, although why you would want to do so I am not sure.
[SNIPPED after clarification]
To rewrite only when the query string is empty:
RewriteCond %{QUERY_STRING} ^$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
In mod_rewrite and PHP, the variable REQUEST_URI refers to two different parts of the URI. In mod_rewrite, %{REQUEST_URI} contains the current URI path; in PHP, $_SERVER['REQUEST_URI'] contains the URI path and query. In both cases the URI fragment is excluded, as this part of the URI is not transmitted to the server but only used by the client.
So, when /one/two/three?foo#bar is requested, mod_rewrite’s %{REQUEST_URI} contains /one/two/three and PHP’s $_SERVER['REQUEST_URI'] contains /one/two/three?foo.
The $_SERVER['REQUEST_URI'] variable will contain the original REQUEST_URI as received by the server, before you perform the rewrite. Therefore it's impossible (as far as I know this early in the morning) to remove the query string portion from the REQUEST_URI's attribute, but you naturally have the option of removing it when you process the $_SERVER['REQUEST_URI'] variable in your script.
If you want to only perform your RewriteRule when the query string is not specified, the following should work:
RewriteCond %{QUERY_STRING} !^.+$
RewriteRule ^/?([a-zA-Z/]+)/?$ /index.php [NC,L]
Note that this might be problematic though, since if there's accidentally a query string in a URL that someone uses to link to your site, your script wouldn't be handling it (since the rewrite never happens), so they'll get a 404 response (or whatever the case may be) that might not be as user-friendly as if you had just chosen to silently ignore the trailing information.
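The "silently ignore" option can also be handled in index.php rather than in .htaccess. A sketch, assuming the same a-z, A-Z and "/" restriction as the rule above; the function name clean_path is made up for the example.

```php
<?php
// Hypothetical helper: return the path if it only uses allowed characters, else null.
function clean_path(string $requestUri): ?string
{
    // Discard everything from the first "?" onwards.
    $path = explode('?', $requestUri, 2)[0];
    // Same character set as the RewriteRule pattern: letters and slashes.
    return preg_match('#^/?[a-zA-Z/]+$#', $path) === 1 ? $path : null;
}

// In index.php this would be called with $_SERVER['REQUEST_URI'].
var_dump(clean_path('/one/two/three?foo'));
```

A null result can then be turned into a friendly error page by the script itself.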
If I understand correctly, you want to forbid the use of ? and # on your site?
You shouldn't do that, because:
the hash (#) is used in Google's AJAX URL specification,
the question mark (?) is used, for example, by Google AdWords and Analytics and by any affiliate program.
So if you force Apache to reject URL requests containing a question mark, people who click on your ad in AdWords will only see a 404 error page.
There is nothing wrong with letting people use both of them. What matters is protecting your site against XSS attacks.
By the way, there is another very important character: the percent sign (%), which is used to encode special characters (like Polish or German national letters).
