Character Format in php - php

Sorry, I can't log in; ClaimID is having server issues (I'm normally Arthur Gibbs).
Data from my database currently comes out like this when it contains unusual characters...
This is just an example.
What I get: De&#8730;ilscrat™
What I want: De√ilscrat™
It seems that some characters are being translated into character codes by the other party's system.
So what I want to know is:
Is there a function that will expand character codes within a string?
Something like FUNCTION(De&#8730;ilscrat™) >>> De√ilscrat™.

This &#8730; stuff looks like an HTML entity, so let's try decoding it.
This can be done using the html_entity_decode function that's provided by PHP.
For instance, with the string you provided, here's a sample of code:
// So the browser interprets the output with the correct charset
header('Content-type: text/html; charset=UTF-8');
$input = 'De&#8730;ilscrat™';
$output = html_entity_decode($input, ENT_NOQUOTES, 'UTF-8');
var_dump($input, $output);
And the output I'm getting is this one:
string 'De&#8730;ilscrat™' (length=19)
string 'De√ilscrat™' (length=15)
(First one is the original version, and second one is the "decoded" version)
So, it seems to do the trick ;-)

Related

DOMDocument and UTF8. MySQL says: Incorrect string value

I am trying to load the meta description of this website (which has a German character) via the following script in PHP:
$page_content = file_get_contents($uri);
$dom_obj = new \DOMDocument();
$dom_obj->loadHTML(mb_convert_encoding($page_content, 'HTML-ENTITIES', 'UTF-8'));
However, when I try to write it to the MySQL database, Laravel reports an error: incorrect string value "\xC3" (which is the first byte of the German character).
When I simply do the following, writing to the database works, but the character is not displayed correctly (Ã¼ instead of ü):
$dom_obj->loadHTML($page_content)
This problem only occurs with this website so far, others I tried with the same character do work. Can you think of a possible reason and fix? Thank you!
Edit:
It works fine when I use PHP's "utf8_decode" to decode the meta description that I get via $dom_obj without mb_convert_encoding. But when I do this, all the other sites that worked before lead to errors (like this: Incorrect string value: '\xE4t').
I found the error. I was using substr to shorten the description. substr counts bytes, not characters, so it cut one of those multi-byte special characters in half, and that is why it wasn't working.
foreach ($dom_obj->getElementsByTagName('meta') as $meta) {
    if ($meta->getAttribute('name') == 'description') {
        // Byte-based cut: may split a multi-byte UTF-8 character
        substr($meta->getAttribute('content'), 0, 156);
    }
}
This is the fix, which cuts by characters instead:
mb_substr($foo, 0, 156, "UTF-8");
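The difference between the two calls can be seen on a short string. This is a minimal sketch, assuming the mbstring extension is available; the sample word is made up for illustration and is written with byte escapes so the result does not depend on the encoding of the source file.

```php
<?php
// Hypothetical sample: "Glück" as UTF-8 bytes; "ü" is the two bytes 0xC3 0xBC.
$description = "Gl\xC3\xBCck";

// substr() counts bytes, so a length of 3 ends between 0xC3 and 0xBC.
$byteCut = substr($description, 0, 3);

// mb_substr() counts characters, so a length of 3 ends after the whole "ü".
$charCut = mb_substr($description, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): dangling 0xC3 byte
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
```

The dangling 0xC3 byte is the kind of value MySQL rejects with "Incorrect string value" on a UTF-8 column.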

Æøå in returned JSON result - the data doesn't look like it's supposed to

I have fetched some data from a url request using JSON with the following code:
$url = 'https://recruit.zoho.com/ats/private/xml/JobOpenings/getRecords?authtoken=$at&scope=recruitapi';
$request = new WP_Http;
$result = $request->request($url, $data = array());
$input = json_encode($result, true);
var_dump($input);
This code worked absolutely fine, except the data coming out looked really weird, such as:
"content-encoding":"gzip","vary":"Accept-Encoding","strict-transport-security":"max-age=15768000"},"body":"\u003C?xml version=\"1.0\" encoding=\"UTF-8\" ?\u003E\n\u003Cresponse uri=\"\/ats\/private\/xml\/JobOpenings\/getRecords\"\u003E\u003Cresult\u003E\u003CJobOpenings\u003E\u003Crow no=\"1\"\u003E\u003CFL val=\"JOBOPENINGID\"\u003E\u003C![CDATA[213748000001263043]]\u003E\u003C\/FL\u003E\u003CFL val=\"Published in website\"\u003E\u003C![CDATA[false]]\u003E\u003C\/FL\u003E\u003CFL val=\"Modified by\"\u003E\u003C![CDATA
After some research, I realize that part of the problem most likely is the fact that there are æ, ø, and å in the data I'm requesting. Others have solved the problem this way:
$input = json_encode(utf8_decode($result), true);
However this gives me this error:
Warning: utf8_decode() expects parameter 1 to be string, array given in
I know the array is not a string, but how else do I deal with this? It seems to have worked for others, and I can't figure out why.
Thanks.
Edit:
I noticed this in the beginning of the printed data.
string(31486) "{"headers":{"server":"ZGS","date":"Wed, 12 Aug 2015 13:59:32 GMT","content-type":"text\/xml;charset=utf-8"
Does that mean it is already UTF-8 and I'm totally off?
What you receive in $result is the HTTP response (headers plus a UTF-8 body), not a JSON document. json_encode then escapes it: by default every non-ASCII character is written as a \uXXXX sequence.
If you don't want UTF-8 characters to be escaped, this question is relevant to you: Why does the PHP json_encode function convert UTF-8 strings to hexadecimal entities?
Everything seems to work fine from what I can see, although the string you have provided seems to be truncated; I guess that happened when you pasted it.
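The escaping can be reproduced in isolation. The sketch below uses a made-up sample string, not the Zoho response. It also suggests one plausible source of the \u003C sequences in the question: the second argument of json_encode is a bitmask of flags, and `true` is coerced to 1, which is the value of JSON_HEX_TAG.

```php
<?php
// Hypothetical sample: "æøå <b>" as UTF-8 bytes.
$text = "\xC3\xA6\xC3\xB8\xC3\xA5 <b>";

$default   = json_encode($text);                         // non-ASCII becomes \uXXXX
$hexTag    = json_encode($text, JSON_HEX_TAG);           // additionally escapes < and >
$unescaped = json_encode($text, JSON_UNESCAPED_UNICODE); // keeps æøå as raw UTF-8

var_dump($default, $hexTag, $unescaped);

// The escapes are only a notation; decoding restores the original bytes.
var_dump(json_decode($default) === $text); // bool(true)
```

In other words, the \u sequences are a valid JSON notation for the same data, so no utf8_decode call is needed to read it back.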

html decimal coded string

I'm parsing HTML from a website using simplehtmldom_1_5. When I echo the parsed text to the screen it is printed correctly, but when I try to save it to a file using file_put_contents, my string is written as HTML decimal character codes:
&#40&#98&#46&#32&#97&#110&#100&#101&#114&#115&#115&#111&#110&#44&#32
I've already tried every combination of utf8_encode, utf8_decode, htmlentities... but nothing worked. I get the same problem when I try to insert the text into a MySQL table.
mb_detect_encoding returns ASCII for the parsed text.
Any suggestions?
header('Content-Type: text/html; charset=utf-8');
ini_set('max_execution_time', 0);
include 'simplehtmldom_1_5/simple_html_dom.php';
$html = file_get_html($curr_url);
$texts = $html->find('div[id=content_h]');
foreach($texts as $text) {
file_put_contents('queries.txt', $text->innertext . "\n", FILE_APPEND);
}
Did you also try html_entity_decode ( http://de1.php.net/html_entity_decode )?
That's the function that converts entities back to plain text.
*edit
I just tested this to verify it's working.
Yes, it works, BUT:
your data is incorrect!
Every single entity is missing the semicolon at its end!
That's why decoding only works in lenient browser rendering engines...
Your data should look like this:
&#40;&#98;&#46;
and not like this:
&#40&#98&#46
See the difference?
Finally this worked for me
preg_replace('/&#(\d+)/me',"chr(\\1)", $text)
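Note that the /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so that line only runs on older versions. A rough equivalent for current PHP is sketched below, assuming PHP 7.2+ with the mbstring extension (for mb_chr); the function name is hypothetical. Unlike chr(), mb_chr() also handles code points above 255.

```php
<?php
// Hypothetical helper: decodes decimal entities with or without the trailing semicolon.
function decodeLooseNumericEntities(string $text): string
{
    $decoded = preg_replace_callback(
        '/&#(\d+);?/',
        static function (array $m): string {
            // mb_chr() returns false for invalid code points; keep the original text then.
            $char = mb_chr((int) $m[1], 'UTF-8');
            return $char === false ? $m[0] : $char;
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $decoded ?? $text;
}

// Sample taken from the question.
$sample = '&#40&#98&#46&#32&#97&#110&#100&#101&#114&#115&#115&#111&#110&#44&#32';
var_dump(decodeLooseNumericEntities($sample)); // string(15) "(b. andersson, "
```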

Character Loss converting _GET Array to URL string

Having a very bizarre issue with the conversion of a $_GET request into a string.
(PHP 5.2.17)
Here is a small snippet of the problem area of the array from print_r():
_GET (array)
...
[address_country_code] => GB
[address_name] => Super Mario
[notify_version] => 3.7
...
There are two cases the _GET data is used:
Case 1): Saved then used later:
// Script1.php
$data = json_encode($_GET);
# > Save to MySQL Database ($data)
// Script2.php (For Viewing & Testing URL later)
# > Load from Database ($result)
echo http_build_query(json_decode($result, true));
Result of above array snippet: (CORRECT OUTPUT)
address_country_code=GB&address_name=Super+Mario&notify_version=3.7
Case 2): Used in same script as Case 1) just before its saved in Case 1):
// Script1.php
echo http_build_query($_GET);
Results in: (INCORRECT OUTPUT)
address_country_code=GB&address_name=Super+Mario¬ify_version=3.7
How is it possible that a few characters are output as a ¬ in Case 2, yet Case 1 is fine?
It is driving me insane :(
Instead of http_build_query, I have also tried a custom function that builds the URL by applying urlencode() to the key and value inside a foreach loop. This just resulted in the ¬ being changed to %C2%AC in one of my test cases!
Everything is ok with your data. You can verify it if you do:
$query = http_build_query($_GET);
parse_str($query, $data);
print_r($data);
you will get the correct uncorrupted data.
The reason you see the ¬ symbol is the way the browser interprets HTML entities. ¬ is written as &not; but the browser will render it even without the semicolon at the end, so the &not at the start of &notify_version is turned into ¬.
You're most likely displaying this data in a web browser, and that is interpreting
&not
as a special HTML entity.
Please see this: https://code.google.com/p/doctype-mirror/wiki/NotCharacterEntity
Try doing
var_dump(http_build_query($_GET))
instead of:
echo http_build_query($_GET)
and see HTML source to get/verify actual string.
So, even though both cases output to a web browser and both convert from an array using http_build_query(), only Case 2 was affected.
I fixed the problem in Case 2 by wrapping http_build_query in htmlspecialchars (Case 1 still uses it unchanged):
htmlspecialchars(http_build_query($_GET));
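The effect of the wrapper can be checked outside the browser. This is a minimal sketch using the three values from the question's print_r() excerpt; the separator is passed explicitly so the result does not depend on the arg_separator.output ini setting.

```php
<?php
// Sample values copied from the question's print_r() excerpt.
$params = array(
    'address_country_code' => 'GB',
    'address_name'         => 'Super Mario',
    'notify_version'       => '3.7',
);

// The raw query string: correct as data, but "&not" is read as an entity in HTML.
$query = http_build_query($params, '', '&');

// Escaped for HTML output: every "&" becomes "&amp;", so no entity can form.
$safe = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');

echo $query, "\n"; // address_country_code=GB&address_name=Super+Mario&notify_version=3.7
echo $safe, "\n";  // address_country_code=GB&amp;address_name=Super+Mario&amp;notify_version=3.7
```

The escaped form is only for embedding in HTML; the unescaped $query is still the string to use in redirects or HTTP requests.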

Problem reading data from file special characters

This question is related to my previous one; please have a look at it. I did not find any other way to unserialize the data, so I am falling back to string operations.
I am able to get the whole content from the file, but not able to get a specific string from that content.
I want to search for a specific string in the content, but the functions stop working when they reach the first special character in the string. If I search for something that appears before the special character, it works properly.
PHP's string functions seem to stop processing immediately when they encounter the first special character, so they do not give me the correct output.
Originally the characters look like this (^#):
:"Mage_Core_Model_Message_Collection":2:{s:12:"^#*^#_messages";a:0:{}s:20:"^#*^#_lastAddedMessage";N;}
but when I echo them they are displayed as ?
Here is the code what I tried
$file='/var/www/html/products/var/session/sess_ciktos8icvk11grtpkj3u610o3';
$contents=file_get_contents($file);
$contents=htmlspecialchars($contents);
//$contents=htmlentities($contents);
echo $contents;
$restData=strstr($contents,'"id";s:4:"');
echo $restData;
$id=substr($restData,0,strpos($restData,'"'));
echo $id;
I changed default_charset to iso-8859-1 and also to utf-8, but it does not work with either.
Please let me know how I can resolve this.
Thanks.
The characters that you see as ^# are actually null bytes. They have no proper display form, nor are they meant to be displayed: it's the engine's internal representation of protected properties. You're not supposed to mess with them.
As for resolving it, it would help to know what kind of resolution you are looking for. What result are you trying to achieve?
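The null bytes can be reproduced with any class that has a protected property. The sketch below uses a hypothetical stand-in class, not the Magento one. It also shows one possible reason the search in the question finds nothing, offered as an assumption rather than a diagnosis: htmlspecialchars rewrites every double quote, so a needle containing " no longer matches the escaped text.

```php
<?php
// Hypothetical class standing in for Mage_Core_Model_Message_Collection.
class DemoCollection
{
    protected $_messages = array();
}

$raw = serialize(new DemoCollection());

// Protected property names are stored as "\0*\0name", hence s:12 for "_messages".
$needle = "\0*\0_messages";
var_dump(strlen($needle));                 // int(12)
var_dump(strpos($raw, $needle) !== false); // bool(true): string functions are binary-safe

// Make the NUL bytes visible, for display only.
echo str_replace("\0", '\0', $raw), "\n";

// Escaping rewrites double quotes, so search the raw data and escape only for output.
var_dump(htmlspecialchars('"id";s:4:"', ENT_COMPAT, 'UTF-8'));
```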
