We have a PHP site on Zend Framework with a PostgreSQL database backend. Our primary character encoding is UTF-8.
I just checked our error log and found a strange entry. My URL is as follows:
www.mydomain.com/schuhe-für-breite-füsse
However, someone (or maybe a bot) has tried to access this URL as follows:
www.mydomain.com/schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse/
It's the first time I've seen something like the above. Two things are happening on my page:
1) The above URL is queried against our CMS. For some reason this works fine; I think PostgreSQL realises it is byte-encoded and converts it back when trying to find this SEF URL in our database.
2) An Ajax request is made on the page, passing the same SEF URL. This fails. I believe the backslashes are causing a problem in JavaScript.
To avoid this I want to decode any URL that is encoded like this. However a quick test of the following code did not decode anything for me :(
$landing_sef_url = $this->_getParam('landing_sef_url');
$utf8=html_entity_decode($landing_sef_url);
$iso8859=utf8_decode($utf8);
$test3 = html_entity_decode($landing_sef_url, 1, "ISO-8859-1");
$test4 = urldecode($landing_sef_url);
echo utf8_decode("$landing_sef_url");
echo "<br/><br/>";
die($landing_sef_url . " -- $utf8 -- $iso8859 <br/>$test3<br/>$test4");
I found these approaches in various posts online, but they all print the same result: schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse
Any help would be MUCH appreciated. Many thanks!
This method seems to do what you're looking for:
http://li.php.net/manual/en/function.stripcslashes.php
But if you're just looking to unescape \x## sequences, you could also do this with a fairly simple regular expression.
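A minimal sketch of both options, assuming the parameter really contains literal backslash-x sequences, exactly as it appears in the error log (the variable names are illustrative). stripcslashes() interprets every C-style escape (\n, \t, octal, \xHH), while the regular expression only touches \xHH sequences, which is safer if the URL could legitimately contain other backslashes.

```php
<?php
// The raw parameter as it arrived: literal "\x" escapes (single quotes keep them literal).
$raw = 'schuhe-f\xc3\xbcr-breite-f\xc3\xbcsse/';

// Option 1: stripcslashes() turns C-style escapes such as \xc3 into the byte 0xC3.
$viaStrip = stripcslashes($raw);

// Option 2: unescape only \xHH sequences; each pair of hex digits becomes one byte.
$viaRegex = preg_replace_callback(
    '/\\\\x([0-9a-fA-F]{2})/',
    function (array $m): string {
        return chr(hexdec($m[1]));
    },
    $raw
);

// The bytes 0xC3 0xBC are UTF-8 for "ü", so this shows schuhe-für-breite-füsse/
// when the output is displayed as UTF-8.
echo $viaStrip, "\n";
echo $viaRegex, "\n";
```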
Related
Apologies if there is an obvious answer (I know there are about 1000 similar questions), but I have spent two days attacking this without success. I cannot work out why I get a null response.
Short background: the following works just fine:
$xurl= new SimpleXMLElement('https://gptxsw.appspot.com/view/submissionList?formId=GP_v7&numEntries=1', NULL, TRUE);
$keyname = $xurl->idList->id[0];
echo $keyname;
This returns a unique key such as uuid:d0721391-6953-4d0b-b981-26e38f05d2e5
However, when I try a similar request (which ultimately would be based on the first one), it fails. I've simplified the code as follows:
$xdurl= new SimpleXMLElement('https://gptxsw.appspot.com/view/downloadSubmission?formId=GP_v7[#version=null%20and%20#uiVersion=null]/GP_v7[#key=uuid:d0721391-6953-4d0b-b981-26e38f05d2e5]', NULL, TRUE);
$keyname2 = $xdurl->data->GP_v7->SDD_ID_N[0];
echo $keyname2;
This returns null. And if I try something like
echo $xdurl->asXML();
I get an error response from the site (not from PHP).
Do I need to abandon SimpleXMLElement for the second request? I've read about using XPath and about defining the namespace, but I'm not sure either is required: the second file does have two namespaces, but one of them isn't used and the other has no element prefix. I have also tried variations of both, enough to think that my problem is either more general in nature or an oversight due to inexperience.
For purposes of this request I have no control over the formatting of either XML file.
Here we go: SimpleXMLElement seems to re-escape (or otherwise mishandle) characters that are already URL-escaped, such as spaces written as %20. Try:
$xdurl= new SimpleXMLElement('https://gptxsw.appspot.com/view/downloadSubmission?formId=GP_v7[#version=null and #uiVersion=null]/GP_v7[#key=uuid:d0721391-6953-4d0b-b981-26e38f05d2e5]', NULL, TRUE);
$keyname2 = $xdurl->data->GP_v7->SDD_ID_N[0];
echo $keyname2;
and you should be fine.
(FYI: I debugged this by manually creating a local copy of the XML request result named "foo.xml" which worked perfectly.)
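The same local-copy approach can be used to test the navigation code without touching the network. In the sketch below the XML is a hypothetical stand-in: only the data/GP_v7/SDD_ID_N path and the value 958 come from this thread, while the root element name and namespace URI are invented. It also touches on the namespace question: elements in a default (unprefixed) namespace are reachable with plain property access, whereas XPath queries would need registerXPathNamespace().

```php
<?php
// Hypothetical stand-in for a saved copy of the response ("foo.xml").
$xmlString = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<submission xmlns="http://example.com/hypothetical-ns">
  <data>
    <GP_v7>
      <SDD_ID_N>958</SDD_ID_N>
    </GP_v7>
  </data>
</submission>
XML;

$xml = new SimpleXMLElement($xmlString);

// Default-namespace elements resolve through ordinary property access.
$keyname2 = (string) $xml->data->GP_v7->SDD_ID_N[0];
echo $keyname2, "\n";
```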
Thanks to @Matze for getting me on the right track.
The issue is that the URL contains special characters that SimpleXMLElement cannot handle without help.
Solution: add a urlencode() call, like the following:
$fixurl = urlencode('https://gptxsw.appspot.com/view/downloadSubmission?formId=GP_v7[#version=null and #uiVersion=null]/GP_v7[#key=uuid:d0721391-6953-4d0b-b981-26e38f05d2e5]');
$xdurl= new SimpleXMLElement($fixurl, NULL, TRUE);
$keyname2 = $xdurl->data->GP_v7->SDD_ID_N[0];
echo $keyname2;
This returned the expected value (in this case 958).
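A more conventional variant is to percent-encode only the value of the formId parameter, leaving the scheme, host and path intact. This is a sketch, assuming the endpoint accepts a standard percent-encoded query string. It also matters for the '#' characters: left unencoded, a '#' starts the URL fragment and everything after it is never sent to the server, while rawurlencode() turns it into %23. The network call is left commented out.

```php
<?php
$base   = 'https://gptxsw.appspot.com/view/downloadSubmission';
$formId = 'GP_v7[#version=null and #uiVersion=null]/GP_v7[#key=uuid:d0721391-6953-4d0b-b981-26e38f05d2e5]';

// Encode only the parameter value: spaces, brackets, '#', '=', '/' and ':' are percent-encoded.
$url = $base . '?formId=' . rawurlencode($formId);
echo $url, "\n";

// $xdurl = new SimpleXMLElement($url, 0, true);   // would fetch and parse the remote XML
// echo $xdurl->data->GP_v7->SDD_ID_N[0];
```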
I'm writing PHP code that uses a database, which I implement as an array used as a hash map.
Every time content is added to or removed from my DB, I save it to a file.
My DB structure forces me to use this method, and I can't use MySQL or any other standard DB (it's a school project, so the structure stays as is).
I built two functions:
function saveDB($db){
$json_db = json_encode($db);
file_put_contents("wordsDB.json", $json_db);
} // saveDB
function loadDB(){
$json_db = file_get_contents("wordsDB.json");
return json_decode($json_db, true);
} // loadDB
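Since json_decode() just returns null on failure, a variant of these two functions that checks json_last_error() makes the actual cause visible (for example a syntax error or an exceeded nesting depth). This sketch assumes PHP 7.1 or later (for json_last_error_msg(), scalar type hints and the void return type); the optional $file parameter is an addition for illustration.

```php
<?php
// Sketch: same file format as the original saveDB()/loadDB(), plus error reporting.
function saveDB(array $db, string $file = 'wordsDB.json'): void
{
    $json = json_encode($db);
    if ($json === false) {
        throw new RuntimeException('json_encode failed: ' . json_last_error_msg());
    }
    file_put_contents($file, $json);
}

function loadDB(string $file = 'wordsDB.json'): array
{
    $json = file_get_contents($file);
    if ($json === false) {
        throw new RuntimeException("Could not read $file");
    }
    $db = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // Names the cause, e.g. "Syntax error" or "Maximum stack depth exceeded".
        throw new RuntimeException('json_decode failed: ' . json_last_error_msg());
    }
    return is_array($db) ? $db : [];
}
```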
When I echo the string, either right after encoding or after loading it from the file, I get valid JSON (tested in a JSON viewer). Whenever I try to decode the string using json_decode(), I get null (tested with var_dump()).
The json string itself is very long (~200,000 characters, and that's just for testing).
I tried the following:
Replacing single quotes with double quotes and vice versa (with no backslashes, with one backslash and with three backslashes, plus every combination I could think of with different numbers of backslashes in the original and replacement strings), both manually and using str_replace().
Adding quotes before and after the json string.
Changing the page's encoding.
Decoding without saving to file (Right after encoding).
Checked for slashes and backslashes. None to be found.
Tried addslashes().
Tried using various "Escape String" variants.
json_last_error() doesn't work: I get no error number (I get null, not 0).
It's not my server, so I'm not sure what PHP version is used, and I can't upgrade/downgrade/install anything.
I believe the size has something to do with it, because small strings seem to work fine.
Thanks Everybody :)
In your JSON file, change null to "null" and it will solve the problem.
Check whether your file is UTF-8 encoded. json_decode works with UTF-8 encoded data only.
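A quick way to confirm an encoding problem is to look for JSON_ERROR_UTF8 after the failed call. The sample below is hypothetical ("café" with the "é" stored as a single ISO-8859-1 byte), and the conversion step assumes the iconv extension is available (it is bundled in most PHP builds) and that the data really is ISO-8859-1.

```php
<?php
// Hypothetical data: "é" stored as the ISO-8859-1 byte 0xE9, which is not valid UTF-8.
$latin1Json = "{\"word\":\"caf\xe9\"}";

var_dump(json_decode($latin1Json, true));          // NULL
var_dump(json_last_error() === JSON_ERROR_UTF8);   // bool(true)

// Re-encode to UTF-8 before decoding (assumes the source is ISO-8859-1).
$utf8Json = iconv('ISO-8859-1', 'UTF-8', $latin1Json);
$decoded  = json_decode($utf8Json, true);
echo $decoded['word'], "\n";
```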
EDIT:
After seeing the uploaded JSON data, I did some digging and found that there is a null key. Search for:
"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":
Change that null to a valid property name and json_decode will do the job.
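The effect of that key can be reproduced with a small hypothetical sample modelled on the fragment above. JSON object keys must be double-quoted strings, so a bare null in key position is a syntax error and json_decode() returns null; json_last_error_msg() needs PHP 5.5 or later.

```php
<?php
// Hypothetical sample: the only difference is the quoting of the null key.
$bad  = '{"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":{"1":1}}}';
$good = '{"exceeding":{"S01E01.html":{"2217":1}},"null":{"S01E01.html":{"1":1}}}';

var_dump(json_decode($bad, true));   // NULL: object keys must be quoted strings
echo json_last_error_msg(), "\n";    // Syntax error

$decoded = json_decode($good, true);
print_r($decoded['null']);
```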
I had a similar problem last week. My JSON was valid according to jsonlint.com.
My JSON string contained a # and a &, and those two made json_decode fail and return null.
By using var_dump(json_decode($myvar)), which stops right where it fails, I managed to figure out where the problem was coming from.
I suggest var_dump-ing and using a find function to look for these kinds of characters.
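The search can be automated with a small hypothetical helper: it reports whether the string is valid UTF-8 (preg_match() with the /u modifier returns false on malformed UTF-8) and lists the byte offsets of whichever characters you suspect. The '#' and '&' below simply mirror this answer.

```php
<?php
// Hypothetical debugging helper for a JSON string that will not decode.
function findSuspects(string $json, array $chars): array
{
    $report = [];
    // preg_match() with /u returns false (not 0) when the subject is not valid UTF-8.
    $report['valid_utf8'] = preg_match('//u', $json) === 1;
    foreach ($chars as $c) {
        if ($c === '') {
            continue; // an empty needle would never advance the loop below
        }
        $offsets = [];
        $pos = 0;
        while (($pos = strpos($json, $c, $pos)) !== false) {
            $offsets[] = $pos;      // byte offset of this occurrence
            $pos += strlen($c);
        }
        $report[$c] = $offsets;
    }
    return $report;
}

print_r(findSuspects('{"a":"x#y&z"}', ['#', '&']));
```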
Just on the off chance, and more for anyone hitting this thread than for the OP's issue: I missed that someone had called htmlentities($json) way above me in the call stack. Make sure you haven't been bitten by the same thing, and check the HTML source.
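Why that call breaks decoding: htmlentities() turns every double quote into &quot;, which still renders as normal-looking JSON in a browser but is no longer valid JSON, hence the advice to check the HTML source. The sample string is hypothetical; the real fix is to remove the stray htmlentities() call, and html_entity_decode() is shown here only to confirm the diagnosis.

```php
<?php
$json    = '{"title":"S01E01"}';
$mangled = htmlentities($json);          // {&quot;title&quot;:&quot;S01E01&quot;}
var_dump(json_decode($mangled, true));   // NULL

// Undoing the entities restores decodable JSON (for diagnosis only).
$restored = html_entity_decode($mangled);
print_r(json_decode($restored, true));
```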
Kickself #124
I have yet another encoding-related problem with PHP!
So the story is:
I read a URL from a file (ISO-8859). I can't change the encoding of this file, for various reasons I won't discuss here.
I use that URL to call a REST web service.
The URL happens to contain the character "è", which is converted to � when it is loaded by the PHP engine.
As a result, the web service returns an unexpected result, because what it receives is actually the word "perch�" instead of "perchè".
I tried to force PHP to work with ISO-8859 by doing:
ini_set('default_charset', "ISO-8859");
The problem is that it still doesn't work, and the web service doesn't answer properly. I am sure the web service itself works, because when I copy and paste the URL into a browser by hand, I receive the expected data.
You can convert data from one character set into another using iconv().
Your REST web service is most likely expecting UTF-8 data, so you would have to do something like this:
$data = iconv("iso-8859-1", "utf-8", $data);
before sending the request.
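Because the converted text ends up inside a URL, it usually also needs percent-encoding. A sketch, assuming the file really is ISO-8859-1 and the service expects UTF-8; the host and the parameter name q are hypothetical placeholders, and only the word "perchè" comes from the question.

```php
<?php
// The word as read from the ISO-8859-1 file: "è" is the single byte 0xE8.
$fromFile = "perch\xe8";

// Convert to UTF-8 (assumes the web service expects UTF-8).
$utf8 = iconv('ISO-8859-1', 'UTF-8', $fromFile);

// Percent-encode the value when placing it in the URL (hypothetical endpoint).
$url = 'http://api.example.com/search?q=' . rawurlencode($utf8);
echo $url, "\n";   // http://api.example.com/search?q=perch%C3%A8
```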
This question is somewhat related to my previous one, so please have a look at that question too. I did not find any other way to unserialize the data, so I am turning to string operations.
I am able to get the whole content from the file, but I cannot extract a specific string from that content.
I want to search for a specific string in this content, but the functions stop working when they reach the first special character in the string. If I search for something that appears before the special character, they work properly.
PHP's string functions do not work properly once they encounter the first special character: they stop processing immediately, so they do not give me the correct output.
In the raw file these characters look like ^#, for example:
:"Mage_Core_Model_Message_Collection":2:{s:12:"^#*^#_messages";a:0:{}s:20:"^#*^#_lastAddedMessage";N;}
But when I echo them, they are displayed as ?.
Here is the code I tried:
$file='/var/www/html/products/var/session/sess_ciktos8icvk11grtpkj3u610o3';
$contents=file_get_contents($file);
$contents=htmlspecialchars($contents);
//$contents=htmlentities($contents);
echo $contents;
$restData=strstr($contents,'"id";s:4:"');
echo $restData;
$id=substr($restData,0,strpos($restData,'"'));
echo $id;
I changed default_charset to iso-8859-1 and also to utf-8, but it doesn't work with either.
Please let me know how I can resolve this.
Thanks.
These characters that you see as ^# are actually null bytes. They have no proper display form, nor are they meant to be displayed: they are the engine's internal representation of protected properties. You're not supposed to mess with them.
As for resolving it, it would help to know what kind of resolution you are after: what result are you trying to achieve?
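The null bytes can be seen by serializing any object with a protected property. The class below is a hypothetical stand-in for Mage_Core_Model_Message_Collection: PHP stores the protected property name as "\0*\0_messages", which is why the length prefix reads s:12 for what looks like a 10-character name. PHP's string functions such as strpos() are binary-safe, so the bytes themselves do not stop a search; they are only invisible (or shown as ?) when echoed.

```php
<?php
// Hypothetical stand-in for a class with a protected property.
class Collection
{
    protected $_messages = [];
}

$s = serialize(new Collection());

// Replace the invisible "\0" bytes with a visible "\0" for display purposes only.
echo str_replace("\0", '\0', $s), "\n";   // O:10:"Collection":1:{s:12:"\0*\0_messages";a:0:{}}

// strpos() is binary-safe: it finds text that comes after the null bytes.
var_dump(strpos($s, '_messages') !== false);   // bool(true)
```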
I know that I should encodeURI any URL passed to anything else, because I read this:
http://www.digitalbart.com/jquery-and-urlencode/
I want to share the current time of the current track I am listening to.
So I installed the excellent YOURLS shortener.
And I have a bit of code that puts all the bits together, and makes the following:
track=2&time=967
As I don't want everyone seeing my private key, I have a little PHP file that takes the input and appends the following, so it looks like this:
http://myshorten.example/yourls-api.php?signature=x&action=shorturl&format=simple&url=http://urltoshorten?track=2&time=967
So in the main page, I call jQuery's $("div.shorturl").load(loadall);
The PHP file then makes a cURL request, and the shortener returns a nice short URL.
Like this:
$myurl='http://myshorten.example/yourls-api.php?signature=x&action=shorturl&format=simple&url=' . $theurl;
$ch = curl_init($myurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
if ($data === false) {
echo 'cURL failed';
exit;
}
echo $data;
All perfect.
Except... the URL that gets shortened is always of the form http://urltoshorten?track=2, and anything after the ampersand is dropped.
I have tried wrapping the whole URL in php's URLencode, I've wrapped the track=2&time=967 in both encodeURI and encodeURIComponent, and I've even tried wrapping the whole thing in one or both.
And still, the & breaks it, even though I can see the submitted URL looks like track=1%26time%3D5 at the end.
If I paste this, or even the "plain" version with the unencoded URL, into the YOURLS interface, or submit it to YOURLS via the API by pasting a normal URL into the browser's location bar, it works perfectly.
So YOURLS isn't at fault, and the URL seems to be encoded properly; the only thing I can think of is possibly cURL?
Now at this point you might be thinking "why not replace the & with a * and then convert it back again?".
OK, so when the URL is expanded, I get the values from
var track = $.getUrlVar('track');
var time = $.getUrlVar('time');
So I COULD drop the time var, find where the * is in track, and assume that anything after the * is the time, but that's a bit ugly and, more to the point, not really the correct way to do things.
If anyone could help me, it would be appreciated.
I have tried wrapping the whole URL in php's URLencode
That is indeed what you have to do (assuming by ‘URL’ you mean the inner URL being passed as a component of the outer URL). Any time you put a value in a URL component, you need to URL-encode it, whether or not the value is itself a URL.
$myurl='http://...?...&url='.rawurlencode($theurl);
(urlencode() is OK for query parameters like this, but rawurlencode() is also OK for path parts, so unless you really need spaces to look slightly prettier [+ vs %20], I'd go for rawurlencode() by default.)
This will give you a final URL like:
http://myshorten.example/yourls-api.php?signature=x&action=shorturl&format=simple&url=http%3A%2F%2Furltoshorten%3Ftrack%3D2%26time%3D967
Which you should be able to verify works. If it doesn't, there is something wrong with yourls-api.php.
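Putting that together on the PHP side, with $theurl hard-coded to the value from the question for illustration. The http_build_query() call with PHP_QUERY_RFC3986 is shown as an equivalent that encodes every parameter the same way rawurlencode() does, so a future parameter cannot be forgotten.

```php
<?php
$theurl = 'http://urltoshorten?track=2&time=967';

// Encode the inner URL as one query-parameter value: ':', '/', '?', '=' and '&' get escaped.
$myurl = 'http://myshorten.example/yourls-api.php?signature=x&action=shorturl&format=simple&url='
       . rawurlencode($theurl);
echo $myurl, "\n";

// Equivalent: let http_build_query() encode every name and value with RFC 3986 rules.
$params = ['signature' => 'x', 'action' => 'shorturl', 'format' => 'simple', 'url' => $theurl];
$myurl2 = 'http://myshorten.example/yourls-api.php?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
```

Either string can then be passed to curl_init() as in the question.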
I have tried wrapping the whole URL in php's URLencode, I've wrapped the track=2&time=967 in both encodeURI and encodeURIComponent, I've even tried wrapping the whole thing in one or both. And still, the & breaks it, even though I can see the submitted url looks like track=1%26time%3D5 at the end.
Maybe an explanation of how HTTP variables work will help you out.
If I'm getting a page with the following variables and values:
var1 = Bruce Oxford
var2 = Brandy&Wine
var3 = ➋➌➔ (unicode chars)
We URI-encode each variable's name and value, i.e.:
var1 = Bruce+Oxford
var2 = Brandy%26Wine
var3 = %E2%9E%8B%E2%9E%8C%E2%9E%94
What we do not encode are the delimiting characters, so the request data for the above looks like:
?var1=Bruce+Oxford&var2=Brandy%26Wine&var3=%E2%9E%8B%E2%9E%8C%E2%9E%94
Rather than:
%3Fvar1%3DBruce+Oxford%26var2%3DBrandy%26Wine%26var3%3D%E2%9E%8B%E2%9E%8C%E2%9E%94
Which is of course just gibberish.
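PHP's http_build_query() applies exactly this rule, so it is a convenient way to check an encoding by hand. A minimal sketch using the three example variables; its default encoding turns spaces into +, matching the var1 example, and the unicode characters are written as escapes so the source file's own encoding does not matter.

```php
<?php
$vars = [
    'var1' => 'Bruce Oxford',
    'var2' => 'Brandy&Wine',
    'var3' => "\u{278B}\u{278C}\u{2794}",   // the three unicode characters from the example
];

// Each name and value is encoded; the '=' and '&' delimiters are left alone.
$query = '?' . http_build_query($vars, '', '&');
echo $query, "\n";   // ?var1=Bruce+Oxford&var2=Brandy%26Wine&var3=%E2%9E%8B%E2%9E%8C%E2%9E%94
```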