I am trying to replace all occurrences of \/ in a string output in PHP with /, but it is not working.
Here is my code:
$output = str_replace("\\/", "/", $output);
echo json_encode($output, JSON_UNESCAPED_UNICODE );
echo json_encode($output, JSON_UNESCAPED_SLASHES);
but I am still getting such strings in the output on the webpage, like:
https:\/\/img.xxxx.com\/images\/channel-resources\/1\/def\/43\/0\/1\/defintion.png
or something like that:
https:\/\/img.yyyy.de\/images\/channel-resources\/1\/obchi\/43\/0\/1\/obchi_1.png
If I switch the order of the two functions like that:
$output = str_replace("\\/", "/", $output);
echo json_encode($output, JSON_UNESCAPED_SLASHES);
echo json_encode($output, JSON_UNESCAPED_UNICODE );
I get the slashes written right, but the German letters appear in a weird form, like "\u00df" or "u00f6\u00df". For example, the word "große" is written as "gro\u00dfe".
Does anyone have an idea how to fix that, so that both the German letters and the URIs are written correctly, like "https://img.xxxx.com/images/channel-resources/1/def/43/0/1/defintion.png"?
You're using the wrong constant.
Use JSON_UNESCAPED_SLASHES instead of JSON_UNESCAPED_UNICODE to prevent escaping the slashes in json_encode().
You can specify both using JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE.
See http://php.net/manual/en/json.constants.php
$output = str_replace("\\/", "/", $output);
echo json_encode($output, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
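A minimal sketch of the combined flags. The sample array and the example.com URL are made up for illustration and are not from the question. With both flags set in a single call, no str_replace() should be needed at all:

```php
<?php
// Hypothetical sample data: a URL with slashes and a word with a German letter.
$data = [
    'url'  => 'https://img.example.com/images/logo.png',
    'word' => 'große',
];

// Combine both flags with a bitwise OR so that neither "/" nor "ß" is escaped.
$json = json_encode($data, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

echo $json, PHP_EOL;
// {"url":"https://img.example.com/images/logo.png","word":"große"}
```

Calling json_encode() twice, once per flag, just prints the value twice, each time with only one of the two escapes switched off.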
Try to echo $output and check it, I am almost sure it is the json_encode() you are using that adds the \ for you
\u00XX sequences are JSON escapes for Unicode characters (for example, \u00df is ß).
Try this to parse into html_entities
$output = 'http:\/\/ßßüüää.com\/';
$output = str_replace("\\/", "/", $output);
$output = htmlentities($output, ENT_COMPAT, "UTF-8");
echo json_encode($output, JSON_UNESCAPED_SLASHES);
Related
TLDR: Trying to convert the string \u00e6\u0097\u00a5\u00e6\u009c\u00ac to 日本 in php.
(Trying to get \u00e6\u0097\u00a5\u00e6\u009c\u00ac to echo out 日本)
Hi folks,
I have a json file from Instagram (downloaded my data) and many of my posts contain Japanese text which is stored encoded in UTF-8 (and please correct me if I'm wrong, especially as mb_detect_encoding("\u00e6\u0097\u00a5\u00e6\u009c\u00ac") returns "ASCII").
For example \u00e6\u0097\u00a5\u00e6\u009c\u00ac becomes 日本.
The conversions can be seen working fine on this encoder/decoder website:
https://mothereff.in/utf-8
(Note that if you put 日本 into the above site it returns \xE6\x97\xA5\xE6\x9C\xAC, so adding \xE6\x97\xA5\xE6\x9C\xAC \u00e6\u0097\u00a5\u00e6\u009c\u00ac to the encoded field will produce 日本 日本 in the decoded field)
I'm trying to convert it back to regular Japanese text but am having issues.
I've been googling and looking over Stackoverflow for just over a day and have been trying many different methods, but I just can't get it to convert. I'm clearly missing something. In most cases it does not change at all.
For the scope of this question, I'm simply trying to convert \u00e6\u0097\u00a5\u00e6\u009c\u00ac into 日本.
I am not trying to convert the json file (though am open to any suggestions that would need me to).
(For the record I am using the variable $str for \u00e6\u0097\u00a5\u00e6\u009c\u00ac)
The following attempts resulted in no visible change, \u00e6\u0097\u00a5\u00e6\u009c\u00ac
echo call_user_func_array('mb_convert_encoding', array(&$str,'HTML-ENTITIES','UTF-8'));
echo iconv('ASCII', 'UTF-8', $str);
echo iconv("UTF-8", "CP1252", $str);
echo iconv('UTF-8', 'ISO-8859-1', $str);
echo iconv('UTF-8', 'UTF-8//IGNORE', utf8_encode($str));
echo iconv('ISO-8859-1', 'UTF-8', $str);
echo iconv('ISO-8859-9', 'UTF-8', $str);
echo iconv(mb_detect_encoding($str, mb_detect_order(), true), "UTF-8", $str);
echo htmlentities($str);
echo mb_convert_encoding($str, 'utf-8', 'iso-8859-1');
echo mb_convert_encoding($str, "EUC-JP", "auto");
echo mb_convert_encoding($str, "utf-8", "windows-1251");
echo mb_convert_encoding($str, "windows-1251", "utf-8");
echo mb_convert_encoding($str,'HTML-ENTITIES', 'UTF-8');
echo mb_convert_encoding($str,"UTF-8","auto");
echo mb_convert_encoding($str,"UTF-8");
echo mb_convert_encoding($str, 'UTF-8', array('EUC-JP', 'SHIFT-JIS', 'AUTO'));
echo mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
echo mb_convert_encoding($str, "UTF-8", mb_detect_encoding($str, "UTF-8, ISO-8859-1, ISO-8859-15", true));
echo mb_convert_encoding($str, "ISO-8859-1", mb_detect_encoding($str, "UTF-8, ISO-8859-1, ISO-8859-15", true));
echo mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
echo utf8_decode($str);
echo utf8_encode($str);
The following attempt resulted in the slash being duplicated with double quotation marks added, "\\u00e6\\u0097\\u00a5\\u00e6\\u009c\\u00ac"
echo json_encode($str,JSON_HEX_TAG);
echo json_encode($str,JSON_UNESCAPED_UNICODE |JSON_PRETTY_PRINT);
echo json_encode($str,JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
The following attempt resulted in nothing being returned,
echo json_decode($str, JSON_HEX_TAG);
echo json_decode($str, false);
echo json_decode($str, false, 512, JSON_UNESCAPED_UNICODE);
The following attempt resulted in the slashes changing to an unknown character, �_u00e6�_u0097�_u00a5�_u00e6�_u009c�_u00ac
echo mb_convert_encoding($str, "SJIS");
From the PHP documentation I tried this to see if any of the combinations would work, but none did.
https://www.php.net/manual/en/function.mb-convert-encoding.php#97902
foreach(mb_list_encodings() as $chr){
echo mb_convert_encoding($str, 'UTF-8', $chr)." : ".$chr."<br>";
}
echo "<br>--- REVERSE TRY ---<br><br>";
foreach(mb_list_encodings() as $chr){
echo mb_convert_encoding($str, $chr, 'UTF-8')." : ".$chr."<br>";
}
I tried using the Unicode Codepoint Escape Syntax, which gave the mojibake æ¥æ¬ instead of 日本
https://www.php.net/manual/en/migration70.new-features.php#migration70.new-features.unicode-codepoint-escape-syntax
echo "\u{00e6}\u{0097}\u{00a5}\u{00e6}\u{009c}\u{00ac}";
As mentioned in the brackets earlier, \xE6\x97\xA5\xE6\x9C\xAC does convert to 日本 when echoed.
echo "\xE6\x97\xA5\xE6\x9C\xAC";
Noticing above that the two different codes had the same endings, I tried using str_replace so that they would match, but this time \xE6\x97\xA5\xE6\x9C\xAC was echoed.
echo str_replace("\U00","\x",strtoupper($str));
I have also tried all of the above with and without the following:
header('Content-Type: text/plain; charset="UTF-8"');
Here is a segment of the original JSON file (original file is 13k lines, so here is a single element).
{
"media": [
{
"uri": "media/posts/202104/175127092_241529264421003_4026764649651789139_n_18106766305234668.jpg",
"creation_timestamp": 1619277565,
"title": "Time to head back to Tokyo.\nFukuoka Airport, Japan.\n18 October 2020\n.\n.\n.\n.\n.\n#japan #\u00e6\u0097\u00a5\u00e6\u009c\u00ac #toyphotography #toy #\u00e3\u0081\u008a\u00e3\u0082\u0082\u00e3\u0081\u00a1\u00e3\u0082\u0083 #\u00e3\u0083\u00ad\u00e3\u0083\u009b\u00e3\u0082\u0099\u00e3\u0083\u0083\u00e3\u0083\u0088 #GodJesusRobot #robot #toyholiday #holiday #vacation #\u00e6\u0097\u0085\u00e8\u00a1\u008c #photography #\u00e5\u0086\u0099\u00e7\u009c\u009f #japan_of_insta #japantravel #\u00e6\u0097\u00a5\u00e6\u009c\u00ac\u00e6\u0097\u0085\u00e8\u00a1\u008c #travel #kitakyushu #\u00e5\u008c\u0097\u00e4\u00b9\u009d\u00e5\u00b7\u009e #airport #\u00e7\u00a9\u00ba\u00e6\u00b8\u00af #fukuokaairport #\u00e7\u00a6\u008f\u00e5\u00b2\u00a1\u00e7\u00a9\u00ba\u00e6\u00b8\u00af #plane #airplane #aeroplane #\u00e9\u00a3\u009b\u00e8\u00a1\u008c\u00e6\u00a9\u009f #windowseat #window"
}
]
}
UPDATE
Based on the comments by @jerry and @yourcommonsense, hex2bin can work, as long as the string is first converted by dropping the \u00 prefixes. hex2bin(str_replace('\u00', '', $str)); will definitely work for the string mentioned in the TLDR and the upper part of the question, but to tackle the full title string in the JSON I've come up with a very ugly and messy method.
$str = "Time to head back to Tokyo.\nFukuoka Airport, Japan.\n18 October 2020\n.\n.\n.\n.\n.\n#japan #\u00e6\u0097\u00a5\u00e6\u009c\u00ac #toyphotography #toy #\u00e3\u0081\u008a\u00e3\u0082\u0082\u00e3\u0081\u00a1\u00e3\u0082\u0083 #\u00e3\u0083\u00ad\u00e3\u0083\u009b\u00e3\u0082\u0099\u00e3\u0083\u0083\u00e3\u0083\u0088 #GodJesusRobot #robot #toyholiday #holiday #vacation #\u00e6\u0097\u0085\u00e8\u00a1\u008c #photography #\u00e5\u0086\u0099\u00e7\u009c\u009f #japan_of_insta #japantravel #\u00e6\u0097\u00a5\u00e6\u009c\u00ac\u00e6\u0097\u0085\u00e8\u00a1\u008c #travel #kitakyushu #\u00e5\u008c\u0097\u00e4\u00b9\u009d\u00e5\u00b7\u009e #airport #\u00e7\u00a9\u00ba\u00e6\u00b8\u00af #fukuokaairport #\u00e7\u00a6\u008f\u00e5\u00b2\u00a1\u00e7\u00a9\u00ba\u00e6\u00b8\u00af #plane #airplane #aeroplane #\u00e9\u00a3\u009b\u00e8\u00a1\u008c\u00e6\u00a9\u009f #windowseat #window";
$pattern = '/(\\\\u00..)+/i';
function getHex2Bin($matches) {
return hex2bin(str_replace("\U00","",strtoupper($matches[0])));
}
$result = preg_replace_callback($pattern, 'getHex2Bin', $str);
echo $result;
This does work, giving me my desired result:
Time to head back to Tokyo. Fukuoka Airport, Japan. 18 October 2020 . . . . . #japan #日本 #toyphotography #toy #おもちゃ #ロボット #GodJesusRobot #robot #toyholiday #holiday #vacation #旅行 #photography #写真 #japan_of_insta #japantravel #日本旅行 #travel #kitakyushu #北九州 #airport #空港 #fukuokaairport #福岡空港 #plane #airplane #aeroplane #飛行機 #windowseat #window
but I can't help feeling that there is a much neater solution.
Update 2
Here is a PHP Sandbox showing the results of all attempts mentioned above, including the messy working one.
You face a mojibake case (example in Python for its universal intelligibility):
print('\u00e6\u0097\u00a5\u00e6\u009c\u00ac'.encode('latin1').decode())
日本
Let's rewrite above code in PHP terms (utilizing the json_decode function):
<?php
$str_chin = "日本";
echo $str_chin . " => test: chinese to JSON => "
. json_encode($str_chin) . PHP_EOL;
$str = '\u00e6\u0097\u00a5\u00e6\u009c\u00ac';
$str_moj = json_decode('"' . $str . '"', false, 512, JSON_INVALID_UTF8_IGNORE);
echo $str . " => mojibake => "
. $str_moj . PHP_EOL;
echo $str . " => solution => "
. mb_convert_encoding($str_moj, 'iso-8859-1', 'utf-8');
?>
Output:
73099438.php
日本 => test: chinese to JSON => "\u65e5\u672c"
\u00e6\u0097\u00a5\u00e6\u009c\u00ac => mojibake => æ¥æ¬
\u00e6\u0097\u00a5\u00e6\u009c\u00ac => solution => 日本
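The same idea can probably be applied to the whole decoded JSON file rather than to one string. The following is only a sketch: fixMojibake is a hypothetical helper name, and the one-element JSON is a shortened stand-in for the Instagram export. It assumes every string in the file is mojibake of this kind. A string that already contains characters outside ISO-8859-1 would be damaged by the conversion.

```php
<?php
// Hypothetical helper: walk a decoded JSON value and turn each string's
// code points (U+0000..U+00FF) back into raw bytes, which are the real UTF-8.
function fixMojibake($value)
{
    if (is_string($value)) {
        return mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8');
    }
    if (is_array($value)) {
        return array_map('fixMojibake', $value); // keys are left untouched
    }
    return $value; // numbers, booleans, null
}

// Shortened stand-in for the export; single quotes keep the \u escapes literal.
$json = '{"media":[{"title":"#japan #\u00e6\u0097\u00a5\u00e6\u009c\u00ac"}]}';

$data = fixMojibake(json_decode($json, true));

echo $data['media'][0]['title'], PHP_EOL;
// #japan #日本
```

This avoids the regular expression entirely, because json_decode() already understands the \u00XX escapes.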
I have a query string with URL-encoded symbols:
$wm_string = "LMI_MODE=1&LMI_PAYMENT_DESC=%CF%EE%E6%E5%F0%F2%E2%EE%E2%E0%ED%E8%E5+Plan+Z";
I need to convert it into JSON with PHP, but json_encode returns an empty string.
Here is my code in PHP:
parse_str($wm_string, $_REQUEST);
var_dump($_REQUEST);
echo "JSON:".json_encode($_REQUEST);
Here is the result:
array(1) { ["LMI_MODE"]=> string(46) "1?LMI_PAYMENT_DESC=Пожертвование Plan Z Online" } JSON:
What should I do?
UPDATE:
The expected result is:
{
"LMI_MODE":1,
"LMI_PAYMENT_DESC":"Пожертвование Plan Z Online"
}
UPDATE2:
The encoding is windows-1251, while json_encode seems to be expecting UTF-8. Is there a way to tell json_encode which encoding it should use while parsing?
Since json_encode works only with UTF-8, and the text is in windows-1251, it has to be converted from that encoding to UTF-8 first.
$wm_string = "LMI_MODE=1&LMI_PAYMENT_DESC=%CF%EE%E6%E5%F0%F2%E2%EE%E2%E0%ED%E8%E5+Plan+Z+Online";
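// Decode the percent-escapes first, so that iconv sees the raw windows-1251 bytes
// (running iconv on the still-encoded ASCII string would change nothing).
$wm_string = iconv("windows-1251", "UTF-8", urldecode($wm_string));
parse_str($wm_string, $result);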
echo "JSON:".json_encode($result, JSON_UNESCAPED_UNICODE);
Output:
JSON:{"LMI_MODE":"1","LMI_PAYMENT_DESC":"Пожертвование Plan Z Online"}
Try this:
$wm_string = "LMI_MODE=1LMI_PAYMENT_DESC=%CF%EE%E6%E5%F0%F2%E2%EE%E2%E0%ED%E8%E5+Plan+Z";
$wm_string = (parse_url(urldecode($wm_string)));
$wm_string = json_encode(urlencode($wm_string['path']));
echo "JSON: " . $wm_string;
Result:
JSON: "LMI_MODE%3D1LMI_PAYMENT_DESC%3D%CF%EE%E6%E5%F0%F2%E2%EE%E2%E0%ED%E8%E5+Plan+Z"
I know this is similar to the other two answers, but just wanted to point out that you don't need to and in fact shouldn't be using urldecode with parse_url. In a simpler approach you could parse the url string and convert the windows-1251 variable to utf8 like this.
<?php
$wm_string = "LMI_MODE=1&LMI_PAYMENT_DESC=%CF%EE%E6%E5%F0%F2%E2%EE%E2%E0%ED%E8%E5+Plan+Z+Online";
//Parse the url string
parse_str($wm_string, $args);
//Convert payment description to utf8
$args['LMI_PAYMENT_DESC'] = iconv("windows-1251", "UTF-8", $args['LMI_PAYMENT_DESC']);
echo "JSON: " . json_encode($args, JSON_UNESCAPED_UNICODE);
//JSON: {"LMI_MODE":"1","LMI_PAYMENT_DESC":"Пожертвование Plan Z Online"}
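If more than one parameter can contain Cyrillic text, the conversion could be applied to every parsed value instead of to one named key. This is a sketch under the assumption that all values are flat strings in windows-1251; nested parameters such as a[]=... are not handled. It uses the shorter query string from the question.

```php
<?php
$wm_string = "LMI_MODE=1&LMI_PAYMENT_DESC=%CF%EE%E6%E5%F0%F2%E2%EE%E2%E0%ED%E8%E5+Plan+Z";

// parse_str() already URL-decodes the values, so no urldecode() is needed.
parse_str($wm_string, $args);

// Convert every string value from windows-1251 to UTF-8.
$args = array_map(function ($value) {
    return is_string($value)
        ? mb_convert_encoding($value, 'UTF-8', 'Windows-1251')
        : $value;
}, $args);

$json = json_encode($args, JSON_UNESCAPED_UNICODE);

echo "JSON: " . $json . PHP_EOL;
// JSON: {"LMI_MODE":"1","LMI_PAYMENT_DESC":"Пожертвование Plan Z"}
```

Pure ASCII values such as "1" pass through unchanged, because windows-1251 and UTF-8 agree on the ASCII range.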
I have a string that looks like this "v\u00e4lkommen till mig" that I get after doing utf8_encode() on the string.
I would like that string to become
välkommen till mig
where the character
\u00e4 = ä = &auml;
How can I achieve this in PHP?
Do not use utf8_(de|en)code. These functions only convert between UTF-8 and ISO-8859-1. ISO-8859-1 does not provide the same characters as ISO-8859-15 or Windows-1252, which are the most used encodings (besides UTF-8). Better to use mb_convert_encoding.
"v\u00e4lkommen till mig" > This string looks like a JSON-encoded string, which IS already UTF-8 encoded. The Unicode code point of "ä" is U+00E4 >> \u00e4.
Example
<?php
header('Content-Type: text/html; charset=utf-8');
$json = '"v\u00e4lkommen till mig"';
var_dump(json_decode($json)); //It will return a utf8 encoded string "välkommen till mig"
What is the source of this string?
There is no need to replace the ä with its HTML representation &auml; if you print it in a UTF-8 encoded document and tell the browser which encoding is used. If it is necessary, use htmlentities:
<?php
$json = '"v\u00e4lkommen till mig"';
$string = json_decode($json);
echo htmlentities($string, ENT_COMPAT, 'UTF-8');
Edit: Since you want to keep HTML characters, and I now think your source string isn't quite what you posted (I think it is actual unicode, rather than containing \unnnn as a string), I think your best option is this:
$html = str_replace( array( '&lt;', '&gt;', '&amp;' ), array( '<', '>', '&' ), htmlentities( $whatever ) );
(note: no call to utf8-decode)
Original answer:
There is no direct conversion. First, decode it again:
$decoded = utf8_decode( $whatever );
then encode as HTML:
$html = htmlentities( $decoded );
and of course you can do it without a variable:
$html = htmlentities( utf8_decode( $whatever ) );
http://php.net/manual/en/function.utf8-decode.php
http://php.net/manual/en/function.htmlentities.php
To do this by regular expression (not recommended, likely slower, less reliable), you can use the fact that HTML supports &#xnnnn; constructs, where the nnnn is the same as your existing \unnnn values. So you can say:
$html = preg_replace( '/\\\\u([0-9a-f]{4})/i', '&#x$1;', $whatever );
The html_entity_decode worked for me.
$json = '"v\u00e4lkommen till mig"';
echo $decoded = html_entity_decode( json_decode($json) );
echo $title gives me something like \u00ca\u00e0\u00f7\u00e5\u00eb\u00e8.
It should be a readable text instead. How do I decode it correctly?
I've tried html_entity_decode($title, 0, 'UTF-8'), but it doesn't work for non-English languages. I get something like Êà÷åëè instead of the real text.
Try echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
try this
$title = mb_convert_encoding($title,'HTML-ENTITIES','utf-8');
hope this will work for you.
Edit:
Try this if it works
$title = iconv(mb_detect_encoding($title, mb_detect_order(), true), "UTF-8", $title);
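One more thing that might be worth trying, as a sketch only. It assumes that the \u00XX escapes are really windows-1251 bytes that were mislabelled as Latin-1 before being JSON-encoded, which the question does not confirm. Under that assumption the text can be recovered in three steps:

```php
<?php
// Single quotes keep the \u escapes as literal text, as in the question.
$title = '\u00ca\u00e0\u00f7\u00e5\u00eb\u00e8';

// 1. Let json_decode() resolve the \u00XX escapes (gives "Êà÷åëè" in UTF-8).
$latin = json_decode('"' . $title . '"');

// 2. Map each character U+00XX back to the single byte 0xXX.
$bytes = mb_convert_encoding($latin, 'ISO-8859-1', 'UTF-8');

// 3. Reinterpret those bytes as windows-1251 (assumed) and convert to UTF-8.
$fixed = mb_convert_encoding($bytes, 'UTF-8', 'Windows-1251');

echo $fixed, PHP_EOL;
// Качели
```

If the result is still unreadable, the source encoding is probably something other than windows-1251, and step 3 would need a different encoding name.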
So when I run json_encode, it grabs the \r\n from MySQL as well. I have tried rewriting strings in the database to no avail. I have tried changing the encoding in MySQL from the default latin1_swedish_ci to ascii_bin and utf8_bin. I have done tons of str_replace and chr(10), chr(13) stuff. I don't know what else to say or do so I'm gonna just leave this here....
$json = json_encode($new);
if(isset($_GET['pretty'])) {
echo str_replace("\/", "/", jsonReadable(parse($json)));
} else {
$json = str_replace("\/", "/", $json);
echo parse($json);
}
The jsonReadable function is from here and the parse function is from here. The str_replace calls that are already in there are because I am getting weirdly formatted HTML tags like <\/h1>. Finally, $new is an array which is crafted above. Full code upon request.
Help me StackOverflow. You're my only hope
Does the string contain "\r\n" (as in 0x0D 0x0A) or the literal string '\r\n'? If it's the former, this should remove any newlines.
$json = preg_replace("!\r?\n!", "", $json);
Optionally, replace the second parameter "" with "<br />" if you'd like to replace the newlines with a br tag. For the latter case, try the following:
$json = preg_replace('!(?:\\\\r)?\\\\n!', "", $json);
Don't replace it in the JSON, replace it in the source before you encode it.
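A minimal sketch of cleaning the source first. The contents of $new here are made up, since the real array from the question is not shown. The line breaks are removed from every string value before json_encode() ever sees them:

```php
<?php
// Hypothetical stand-in for the $new array built from the MySQL rows.
$new = [
    'title' => "Hello\r\nWorld",
    'count' => 2,
];

// Replace real CR/LF characters in every string value, at any nesting depth.
array_walk_recursive($new, function (&$value) {
    if (is_string($value)) {
        $value = str_replace(["\r\n", "\r", "\n"], ' ', $value);
    }
});

$json = json_encode($new);

echo $json, PHP_EOL;
// {"title":"Hello World","count":2}
```

Working on the array avoids regular expressions over the encoded JSON, where a replacement can accidentally hit an escaped backslash and corrupt the output.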
I had a similar issue; I used:
$p_num = trim($this->recp);
$p_num = str_replace("\n", "", $p_num);
$p_num = str_replace("\r", ",", $p_num);
$p_num = str_replace("\n",',', $p_num);
$p_num = rtrim($p_num, "\x00..\x1F");
Not sure if this will help with your requirements.