I have a database in UTF8 unicode ci collation that stores values with special chars like:
oukaïmeden
I want to be able to form urls like:
example.com/oukaïmeden
or
example.com/index.php?id=oukaïmeden
In index.php I set the HTML charset as UTF8 (not that it matters pre output), and save the PHP file as UTF8 unicode ci.
However, no matter what I do, I cannot seem to get the string in the form oukaïmeden so I can use it to search the database.
$aparams = explode("/", $_SERVER["REQUEST_URI"]);
extract($_GET);
$id = utf8_decode($aparams[1]);
echo $id;
echo urldecode($id);
echo utf8_decode($id);
echo utf8_encode($id);
I get values like:
ouka%C3%AFmeden
oukaïmeden
I thought my question was "how can I get the string to show the umlaut so I can use it to search/compare etc?" But actually I wonder if I should be searching differently as well?
URLs cannot contain non-ASCII characters. The URL must look like this first and foremost to be correct:
example.com/index.php?id=ouka%C3%AFmeden
That's the correct percent-encoded representation of the UTF-8 encoded word "oukaïmeden". The browser may or may not show this as "oukaïmeden" in your address bar, but the actual URL must be as above.
In PHP, reading this from $_GET will give you the already decoded value. So, to get the UTF-8 encoded string in your PHP script:
$id = $_GET['id'];
Yup, that's it. Nothing more needed.
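As a rough illustration of what PHP does before it fills $_GET, the sketch below decodes a query string by hand with parse_str(). The $query value is hard-coded here as an assumption, standing in for the real request.

```php
<?php
// Hypothetical query string, as it arrives on the wire (percent-encoded UTF-8).
$query = 'id=ouka%C3%AFmeden';

// parse_str() applies the same URL-decoding PHP uses to populate $_GET.
parse_str($query, $params);
$id = $params['id'];

echo $id, "\n";           // the UTF-8 string, ready for a database lookup
echo bin2hex($id), "\n";  // hex dump of the bytes; c3af is the UTF-8 form of ï
```

No utf8_decode() or utf8_encode() call is involved: the decoded value is already UTF-8, which matches a UTF-8 database column.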
Here is the output of each call; in this test the one that displays correctly is utf8_encode:
$id = $_GET['id'];
$id = utf8_decode($id);
echo $id."<br />";
echo urldecode($id)."<br />";
echo utf8_decode($id)."<br />";
echo utf8_encode($id)."<br />";
ouka�meden
ouka�meden
ouka?meden
oukaïmeden
Related
I have DB with Url's.
For example, my url is
https://besplatka.ua/?prop[161][from]=1&prop[161][to]=3&prop[136][to]=20000&currency=USD
When i use this PHP code
$result = mysqli_query($mysqli, "SELECT url FROM urls WHERE id=5");
while($res = mysqli_fetch_array($result))
{
$my_url=$res['url'];
}
echo $my_url;
I see that the php page does not display the correct value. Encoding everywhere is UTF-8.
https://besplatka.ua/?prop[161][from]=1&prop[161][to]=3&prop[136][to]=20000¤cy=USD
What does this symbol ¤ mean? How do I fix the error?
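After some searching and testing I found that the error is not about UTF-8 encoding. The browser reads the & symbol followed by "curren" (the start of "currency") as the HTML character reference for ¤.
(To get information about character references: https://dev.w3.org/html5/html-author/charref)
So you can fix this by escaping the URL when you print it into HTML (for example with htmlspecialchars(), so that & becomes &amp;), or just put the currency variable first in the URL.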
Result: https://besplatka.ua/?currency=USD&prop[161][from]=1&prop[161][to]=3&prop[136][to]=20000
I hope that can help you.
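A minimal sketch of the escaping approach, assuming the URL has already been read from the database into $url (the value is hard-coded here for illustration):

```php
<?php
// Assumed to be the value fetched from the urls table.
$url = 'https://besplatka.ua/?prop[161][from]=1&prop[161][to]=3&prop[136][to]=20000&currency=USD';

// Escape for HTML output: every & becomes &amp;, so "&curren" can no longer
// be mistaken for a character reference by the browser.
$safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

echo '<a href="' . $safe . '">' . $safe . '</a>', "\n";
```

The stored URL is left unchanged; only the copy written into the page is escaped, and the browser turns &amp; back into & when it follows the link.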
For some reason my special characters got encoded as the following string in a mysql database:
&Atilde;?
Which shows up as:
Ã?
But actually should show up as:
Ö
What went wrong here? I use UTF-8 everywhere.
How can I fix this without recreating all content?
I executed the following in PHP:
<?php
echo str_replace("&", "&amp;", htmlentities("Ö", 0, "ISO-8859-1")), '<br />';
echo str_replace("&", "&amp;", htmlentities("Ö", 0, "UTF-8")), '<br />';
?>
The str_replace is just there to reveal any HTML mnemonics, which would otherwise
be translated by the browser to the original character, which I don't want to happen.
You will get this as output:
&Atilde;�
&Ouml;
You'll recognise the first value as what you found in the database, and the second one
is a bit like you wanted it to be.
Add to this the fact that the default value for the third argument to htmlentities
depends on your PHP version and is ISO-8859-1 in the case of version 5.3, the one you use.
Also realise that HTML documents which do not specify a character encoding will
by default post form data in ISO-8859-1 format.
Combining all this might give a clue about the cause of your problem:
My guess is that the data is correctly posted as UTF-8 to the server, but then htmlentities interprets this as a non-UTF-8, single byte encoding, and so turns one, multi-byte character into two single byte characters.
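The guess above can be checked with a small sketch. It passes the same two UTF-8 bytes to htmlentities() twice, once with each charset argument; the variable names are only for illustration.

```php
<?php
$utf8 = "\xC3\x96"; // "Ö" encoded as UTF-8 (two bytes)

// Wrong charset: the two bytes are read as two separate single-byte characters.
$wrong = htmlentities($utf8, ENT_COMPAT, 'ISO-8859-1');

// Right charset: the two bytes are read as one UTF-8 character.
$right = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

echo bin2hex($utf8), "\n";   // c396
echo bin2hex($wrong), "\n";  // hex dump, so any stray byte after &Atilde; is visible
echo $right, "\n";           // &Ouml;
```

The first result begins with &Atilde;, which matches what ended up in the database.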
Now to the measures to take that this does not continue to happen:
First make sure that your HTML form has the UTF-8 encoding, because this determines the
default encoding that a form will use for sending its data to the server:
<head>
<meta charset="UTF-8">
</head>
Make sure this is not overruled by another encoding in the form tag's accept-charset
attribute.
Then, skip the htmlentities call. You should not turn characters into their
HTML mnemonic when storing them in the database. MySql
supports UTF-8 characters, so just store them like that.
For the second question, you'll have to find all cases and bulk replace them as you find
new instances. You could get a little help by producing some SQL statements
with a PHP script like the following:
<?php
// list all your non-ASCII characters here. Do not use str_split.
$chars = ["Ö","õ","Ũ","ũ"];
foreach ($chars as $ch) {
$bad = str_replace("&", "&amp;", htmlentities($ch, 0, "ISO-8859-1"));
echo "update mytable set myfield = replace(myfield, '$bad', '$ch')
where instr(myfield, '$bad') > 0;<br />";
}
?>
The output of this script will look like this:
update mytable set myfield = replace(myfield, '&Atilde;�', 'Ö') where instr(myfield, '&Atilde;�') > 0;
update mytable set myfield = replace(myfield, '&Atilde;&micro;', 'õ') where instr(myfield, '&Atilde;&micro;') > 0;
update mytable set myfield = replace(myfield, '&Aring;&uml;', 'Ũ') where instr(myfield, '&Aring;&uml;') > 0;
update mytable set myfield = replace(myfield, '&Aring;&copy;', 'ũ') where instr(myfield, '&Aring;&copy;') > 0;
Of course, you could decide to make a PHP script that will even do the updates itself.
Hopefully you can use this information to fix the issues.
For PDO, use something like
$db = new PDO('dblib:host=host;dbname=db;charset=UTF-8', $user, $pwd);
Ã? is two or three things going wrong, not just one!
C396 is the utf8 hex for Ö or the latin1 hex for the two characters Ã–. It requires something else to go wrong to get ? or the black diamond.
Let's see what is in the table; do
SELECT col, HEX(col) FROM tbl WHERE ...
(If you have already done the previously suggested replace(), then the table may be in an even worse mess. Or it might be fixed.)
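The two readings of hex C396 can also be reproduced in PHP. This is only a sketch; it assumes the mbstring extension is available and uses Windows-1252 as the stand-in for what MySQL calls latin1.

```php
<?php
$bytes = "\xC3\x96"; // hex C396

// Read as UTF-8, the two bytes are the single character Ö.
$asUtf8 = $bytes;

// Read as Windows-1252, the same bytes are two characters (Ã and an en dash),
// re-expressed here as UTF-8 so they can be printed.
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');

echo mb_strlen($asUtf8, 'UTF-8'), "\n";    // 1
echo mb_strlen($asLatin1, 'UTF-8'), "\n";  // 2
echo bin2hex($asLatin1), "\n";             // c383e28093
```

Seeing C383E28093 in the HEX(col) output instead of C396 would suggest the text was converted from latin1 to utf8 one time too many.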
I have an android app that uses a URL Connection. The latter part of the URL string is;
./upload_data.php?id=SC1495&image=%3FPNG%0D%0A%1A%0A%00%00%00%0DIHDR%00%00%02X%00%00%01%15%08%02%00%00%00%3F*%0C%3F%00%00%00%03sBIT%05%06%053%0B%3F%3F%00%00%01%3FIDATx%3F%3F%3F1%01%00%00%00%3F%3FOm%0D%0F%3F%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%00%3F7%3F%26%00%01%40%3F%27%13%00%00%00%00IEND%3FB%60%3F&imagename=bob&imagetime=20140806+121507
When I put this into the browser and use $_GET['image'], it returns the following;
?PNG IHDRX?*?sBIT3???IDATx???1??Om ??7?&#?'IEND?B?`
I am not decoding anything, I just want to get the string with all the %00 etc.
Can someone enlighten me as to why this would be happening?
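$_GET['image'] is returning the correct string: PHP URL-decodes query parameters before it fills $_GET. Most of the escapes in the URL are %00, the percent-encoded NUL byte, which is why the output looks mostly empty. If you want the percent-encoded form back, you can re-encode the value with rawurlencode()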
$img = $_GET['image'];
echo rawurlencode($img);
Or just
echo urlencode($_GET['image']);
This will convert the characters you've got back to their url encoded form
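A short sketch of that round trip, using only the first few bytes of the image parameter from the question (hard-coded here instead of read from $_GET):

```php
<?php
// Start of the image parameter as it appears in the URL.
$encoded = '%3FPNG%0D%0A%1A%0A%00%00%00%0DIHDR';

// This is what PHP has already done by the time $_GET['image'] is read.
$raw = rawurldecode($encoded);

// Re-encoding the raw bytes restores the percent-encoded form.
$again = rawurlencode($raw);

echo strlen($raw), "\n";  // 16 raw bytes, several of them unprintable
echo $again, "\n";        // %3FPNG%0D%0A%1A%0A%00%00%00%0DIHDR
```

rawurlencode() and urlencode() differ mainly in how they treat spaces (%20 versus +), so the choice matters only if the data contains them. The still-encoded query string is also available in $_SERVER['QUERY_STRING'].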
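Remove special characters before echoing the value:
if (isset($_GET['image']) && $_GET['image'] !== "") {
// Escape the value once and reuse the escaped copy in the HTML
$img = htmlspecialchars($_GET['image'], ENT_QUOTES, 'UTF-8');
echo "<h3>".$img."</h3><br><img src='http://yoursite.com/images/".$img."'>";
}
else {
exit("No image here =(");
}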
Edit: I think %27 is actually the wrong kind of quote. I am still stuck though, I cannot find a PHP function that does the conversion I want.
Edit (again): I found a solution where I stick %26rsquo%3bs into the URL and it turns into ’. It works so I posted it as an answer below but I'd still be interested in knowing how it'd be done with PHP functions.
I'm working on a website that uses a PHP tree as if it were a directory. For example, if someone types index.php?foo=visual programming (or index.php?foo=visual%20programming) then the website opens the item "Visual Programming" (I'm using strtolower()).
Another working example would be index.php?foo=visual programming&bar=animated path finder which opens "Animated Path Finder", a child of "Visual Programming".
The problem is that some of the items are named things like "Conway’s Game of Life" which uses a HTML entity. My guess of what someone should type to open this would be index.php?foo=visual%20programming&bar=conway%27s%20game%20of%20life. The problem is that ' is not === to ’.
What do I need to do to make this work? Here is my code that selects an item based on $_GET (the PHP is inside of <script type="text/javascript">):
<?php
function echoActiveDirectory($inTree) {
// Compare $_GET with PhpTree
$itemId = 0;
foreach ($_GET as $name) {
if ($inTree->children !== null) {
foreach ($inTree->children as $child) {
if (strtolower($child->title) === strtolower($name)) {
$itemId = $child->id;
$inTree = $child;
break;
}
}
}
}
// Set jsItems[$itemId].selected(), it will be 0 if nothing was found
echo "\t\tjsItems[".$itemId."].selected();\n";
}
echo "// Results of PHP echoActiveDirectory(\$root)\n";
echoActiveDirectory($root);
?>
The website is a work in progress, but it can be tested here to see $_GET working: http://alexsimes.com/index.php
The hex code %27 (39 decimal) will never translate to ’, since it is a completely different entity (Wikipedia). It could be translated to ', but PHP doesn't do that (although I don't know the reason for that).
Edit
While there is no standard for URL-encoding multibyte character sets, PHP will treat a string as just a set of bytes, and if those match an UTF-8 sequence, it will work:
php -r 'echo htmlentities(urldecode("%E2%80%99"), ENT_QUOTES|ENT_HTML401);'
should output
&rsquo;
You can use the htmlentities() and html_entity_decode() PHP functions to convert those characters to HTML entities or decode them back to the desired characters before comparison.
You can try the htmlentities function to convert special characters to corresponding html entity. But in your case if data is already stored in db as html entity form, the data from $_GET parameter must be first passed through htmlentities before using it in your query.
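One way to make the comparison work is to normalize both sides before comparing. The sketch below is only an assumption about how that could look: normalizeTitle() is a hypothetical helper, and it assumes the tree titles are stored with the entity &rsquo; as in the question.

```php
<?php
// Hypothetical helper: bring a title or a URL value into one comparable form.
function normalizeTitle(string $s): string
{
    // Turn entities such as &rsquo; into the real UTF-8 character.
    $s = html_entity_decode($s, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Treat curly single quotes (U+2018, U+2019) like a plain apostrophe.
    $s = str_replace(["\u{2018}", "\u{2019}"], "'", $s);
    // strtolower() is enough for ASCII titles, as used in the question.
    return strtolower($s);
}

$stored  = 'Conway&rsquo;s Game of Life';                   // title in the tree
$typed   = rawurldecode('conway%27s%20game%20of%20life');   // plain apostrophe
$typed2  = rawurldecode('conway%E2%80%99s%20game%20of%20life'); // curly apostrophe

var_dump(normalizeTitle($stored) === normalizeTitle($typed));   // bool(true)
var_dump(normalizeTitle($stored) === normalizeTitle($typed2));  // bool(true)
```

Inside echoActiveDirectory() this would replace the two strtolower() calls in the comparison, so either kind of apostrophe in the URL selects the item.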
My Browser shows URL with file name as
http://www.example.com/pdf/204177_20090604_Chloorhexidine_DCB_oogdruppels_0%2C1%25.pdf
Actual File name is 204160_20090604_Atropine_DCB_oogdruppels_0,5%.pdf
After urldecode, it gives wrong file name as
http://www.example.com/pdf/204177_20090604_Chloorhexidine_DCB_oogdruppels_0,1%.pdf
Update:
Initially I thought it was a problem with URL decoding, but files with names like 204153_20090605_Aluminiumacetotartraat_DCB_oordruppels_1,2%.pdf throw a Bad Request when rendered in the browser. I am using the Kohana 3 framework. Is it related to the server?
$url = 'http://204160_20090604_Atropine_DCB_oogdruppels_0,5%.pdf';
$encode = urlencode($url);
$decode = urldecode($encode);
echo $url."<br />";
echo $encode."<br />";
echo $decode."<br />";
// outputs
http://204160_20090604_Atropine_DCB_oogdruppels_0,5%.pdf
http%3A%2F%2F204160_20090604_Atropine_DCB_oogdruppels_0%2C5%25.pdf
http://204160_20090604_Atropine_DCB_oogdruppels_0,5%.pdf
All OK. Your error is somewhere else.
You are looking at two different files.
It's not possible to urlencode 204160_20090604_Atropine_DCB_oogdruppels_ into 204177_20090604_Chloorhexidine_DCB_oogdruppels_, encoding does not change alphabetical characters.
The error is most likely in the code that creates the file list and outputs the links; the mapping between link titles and filenames appears to be messed up.
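When the links are generated, encoding only the file name part keeps the comma and the percent sign intact. A minimal sketch, assuming the base URL from the question and a hard-coded file name:

```php
<?php
$fileName = '204160_20090604_Atropine_DCB_oogdruppels_0,5%.pdf';

// Encode the file name only, not the whole URL, so "http://" and "/" survive.
$link = 'http://www.example.com/pdf/' . rawurlencode($fileName);
echo $link, "\n";

// Decoding the last path segment gives the original file name back.
$decoded = rawurldecode(basename($link));
echo $decoded, "\n";
```

A literal % that is not followed by two hex digits is not a valid percent-escape in a URL, which may be why unencoded names such as ..._1,2%.pdf are answered with Bad Request.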
This will give you the exact file name; I'm using C#:
Server.UrlDecode("http://www.example.com/pdf/204160_20090604_Atropine_DCB_oogdruppels_0,5%25.pdf")
, (comma) is encoded as %2C
% (percent) is encoded as %25 by browsers
If you use Request.Url, it will decode , (comma) but not % (percent).
So Server.UrlDecode("xyz") decodes all characters except % (percent); that is why there is "%25" in the file name above.