Problem and original data
I have JSON data that contains HTML entities encoding some special characters (mostly from French, like “é”, “ç”, “à”, etc.) as well as HTML tags.
This is a sample of my json data:
{
"data1": "<p>Le cartulaire de 1380-1381 copié au XVIIe siècle et aujourd’hui perdu<strong>*</strong>.",
"data2": "<p><strong>*</strong> Joseph CUVELIER, <em>Cartulaire de l’abbaye du Val-Benoît</em>, Bruxelles, 1906, p. XI-XXVII.</p>"
}
Desired result
{
"data1": "<p>Le cartulaire de 1380-1381 copié au XVIIe siècle et aujourd’hui perdu<strong>*</strong>.",
"data2": "<p><strong>*</strong> Joseph CUVELIER, <em>Cartulaire de l’abbaye du Val-Benoît</em>, Bruxelles, 1906, p. XI-XXVII.</p>"
}
So I simply want to decode all HTML entities back to their respective characters and tags. I am trying to do this with PHP.
Here is my current code:
/* decode data */
$jsonData = '{
"data1": "<p>Le cartulaire de 1380-1381 copié au XVIIe siècle et aujourd’hui perdu<strong>*</strong>.",
"data2": "<p><strong>*</strong> Joseph CUVELIER, <em>Cartulaire de l’abbaye du Val-Benoît</em>, Bruxelles, 1906, p. XI-XXVII.</p>"
}';
$data = json_decode($jsonData, true);
/* change html entities and re-encode data */
$data = mb_convert_encoding($data, "UTF-8", "HTML-ENTITIES");
header('Content-Type: application/json; Charset="UTF-8"');
echo json_encode($data, JSON_UNESCAPED_UNICODE|JSON_UNESCAPED_SLASHES);
My current result:
{
"data1": "<p>Le cartulaire de 1380-1381 copié au XVIIe siècle et aujourd’hui perdu<strong>*</strong>.",
"data2": "<p><strong>*</strong> Joseph CUVELIER, <em>Cartulaire de l’abbaye du Val-Benoît</em>, Bruxelles, 1906, p. XI-XXVII.</p>"
}
So the HTML tags were decoded correctly, but the HTML entities for the French special characters are still there (for example, instead of &amp;eacute; I now have &eacute;).
Question
How can I convert the HTML entities back to characters?
You can test it online here: https://www.tehplayground.com/Z4uB5KIPPo4UQ4h1
Many thanks in advance!
UPDATE:
It turns out my data is more complex than I imagined. Within the same data, some characters were preserved as “é”, “à”, “ç”, etc., while others were converted to HTML entities. So I can have something like this:
{
"someData1":
{
"data1":
[
"ecclésiastique"
],
"data2": "séculiers"
},
"someData2":
[
{
"anotherData1": "ecclésiastique",
"anotherData2": "<p>Le cartulaire de 1380-1381 copié au XVIIe siècle et aujourd’hui perdu<strong>*</strong>.",
"anotherData3":
{
"text1": "texte here",
"text2": "texte here"
}
},
{
"anotherData1": "ecclésiastique",
"anotherData2": "<p>Le cartulaire de 1380-1381 copié au XVIIe siècle et aujourd’hui perdu<strong>*</strong>.",
"anotherData3":
{
"text1": "texte here",
"text2": "texte here"
}
}
]
}
So I suppose I have to 1) convert all data to HTML entities; 2) convert all HTML entities back to characters…
Here is my current code:
# Get data
$jsonData = '{
"someData1":
{
"data1":
[
"ecclésiastique"
],
"data2": "séculiers"
},
"someData2":
[
{
"anotherData1": "ecclésiastique",
"anotherData2": "<p>Le cartulaire de 1380-1381 copié au XVIIe siècle et aujourd’hui perdu<strong>*</strong>.",
"anotherData3":
{
"text1": "texte here",
"text2": "texte here"
}
},
{
"anotherData1": "ecclésiastique",
"anotherData2": "<p>Le cartulaire de 1380-1381 copié au XVIIe siècle et aujourd’hui perdu<strong>*</strong>.",
"anotherData3":
{
"text1": "texte here",
"text2": "texte here"
}
}
]
}';
$data = json_decode($jsonData, true);
# Convert character encoding
$data = mb_convert_encoding($data, "UTF-8", "HTML-ENTITIES");
# Convert HTML entities to their corresponding characters
function html_decode(&$item){
$item = html_entity_decode($item);
}
array_walk_recursive($data, 'html_decode');
var_dump ($data);
So I have only managed to invert the encoding: what was an HTML entity became a special character, and what was a special character became an HTML entity.
But I have no idea how to end up with special characters only.
Online test: https://www.tehplayground.com/bVo3Jr5O7L9p4MXX
Here is the solution. I needed to
convert &amp; to & to normalize the encoding;
convert all HTML entities to their corresponding characters.
Here is the final code. Many thanks to all for your comments and suggestions.
Full code and online test here: https://www.tehplayground.com/zythX4MUdF3ric4l
array_walk_recursive($data, function(&$item, $key) {
    if (is_string($item)) {
        $item = str_replace("&amp;", "&", $item); // 1. Replace &amp; with &
        $item = html_entity_decode($item); // 2. Convert HTML entities to their corresponding characters
    }
});
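For reference, the two steps above can be put together into a small self-contained sketch. The helper name decode_entities_recursive and the sample string are made up for illustration, and the explicit ENT_QUOTES | ENT_HTML5 flags and 'UTF-8' charset are assumptions rather than something stated in the thread:

```php
<?php
// Hypothetical helper (not from the thread): decodes entities in every
// string of an already json_decode()'d structure and returns the result.
function decode_entities_recursive(array $data): array
{
    array_walk_recursive($data, function (&$item) {
        if (is_string($item)) {
            // 1. Collapse double-encoded entities (&amp;eacute; -> &eacute;).
            $item = str_replace('&amp;', '&', $item);
            // 2. Turn entities into characters; flags and charset are assumptions.
            $item = html_entity_decode($item, ENT_QUOTES | ENT_HTML5, 'UTF-8');
        }
    });
    return $data;
}

// Made-up sample mixing raw UTF-8 text with single- and double-encoded entities.
$json = '{"a": "ecclésiastique", "b": ["&lt;p&gt;copi&amp;eacute; au XVIIe si&amp;egrave;cle&lt;/p&gt;"]}';

$decoded = decode_entities_recursive(json_decode($json, true));
$result  = json_encode($decoded, JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);

echo $result, PHP_EOL;
```

One caveat with this approach: text that is meant to contain a literal entity (for example &amp;lt; shown as documentation) will also be decoded all the way down to the character.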
Related
I have the following PHP code that generates JSON data:
<?php
$stmt = $con->prepare("SELECT
id_news,
url,
cover_page,
alt_img,
mini_title,
mini_description,
date_post,
confg_img,
main_cover
FROM news ORDER BY id_news DESC LIMIT 5");
$stmt->execute();
$member = array();
$stmt->bind_result(
$member['id_news_sport'],
$member['url'],
$member['cover_page'],
$member['alt_img'],
$member['mini_title'],
$member['mini_description'],
$member['date_post'],
$member['confg_img'],
$member['main_cover']
);
header('Content-type: application/json; charset=utf-8');
echo '[';
$count = 0;
while ($stmt->fetch()) {
if( $count ) {
echo ',';
}
echo json_encode($member, JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT | JSON_FORCE_OBJECT);
++$count;
}
echo ']';
?>
This produces the following result:
[{
"id_news": 712,
"url": "es/deportes/futbol/ecuador/ligapro/serie-a/712/marcos-caicedo-recuerda-su-paso-en-los-equipos-del-astillero",
"cover_page": "https://i.imgur.com/kg7RBqK.jpg",
"alt_img": "Marcos Caicedo",
"mini_title": "Marcos Caicedo recuerda su paso en los equipos del astillero",
"mini_description": "El jugador de Liga de Quito no olvida su paso por Emelec y Barcelona",
"date_post": "2020-12-18 03:21:57",
"confg_img": null,
"main_cover": "relevant_news"
},{
"id_news": 708,
"url": "es/deportes/futbol/internacional/fichajes/708/el-fichaje-que-pretendia-ldu-para-la-defensa-podria-caerse",
"cover_page": "https://i.imgur.com/MmETkch.png",
"alt_img": "Jugadores de LDU celebrando un gol",
"mini_title": "EL fichaje que pretend\u00eda LDU para la defensa podr\u00eda caerse",
"mini_description": "LDU tiene un competidor por el fichaje del central",
"date_post": "2020-12-16 20:26:51",
"confg_img": null,
"main_cover": "relevant_news"
}]
But I need a line break after the opening bracket and another line break before each brace, like this:
[
{
"userId": 1,
"id": 1,
"title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
"body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
},
{
"userId": 10,
"id": 99,
"title": "temporibus sit alias delectus eligendi possimus magni",
"body": "quo deleniti praesentium dicta non quod\naut est molestias\nmolestias et officia quis nihil\nitaque dolorem quia"
},
{
"userId": 10,
"id": 100,
"title": "at nam consequatur ea labore ea harum",
"body": "cupiditate quo est a modi nesciunt soluta\nipsa voluptas error itaque dicta in\nautem qui minus magnam et distinctio eum\naccusamus ratione error aut"
}
]
Given this JSON output, what changes should I make to my PHP code to obtain the same result?
HTML
If the resulting JSON is only going to be displayed to users in a web browser, you can simply use echo "[<br>"; and echo "<br>]"; to insert HTML line breaks. Note that the output is then no longer valid JSON.
API/REST Script
If the result is consumed by any application other than a web browser, use:
echo "[ \n"; and echo "\n ]"; for programs on Linux/Unix-based systems (including Mac OS X 10+)
echo "[ \r"; and echo "\r ]"; for programs on classic Mac OS (version 9 and earlier)
echo "[ \r\n"; and echo "\r\n ]"; for programs on Windows-based systems
The backslash is the escape character inside double-quoted PHP strings. If you are unsure which platform will read the output, any of these works for a JSON parser, since JSON treats \n and \r alike as insignificant whitespace; the Windows format "\r\n" is the safest choice if the output may also be opened in plain-text viewers.
I do not see why you would need this, but it could be done like this: add \n (a new line) and two spaces after the opening bracket and after each comma.
header('Content-type: application/json; charset=utf-8');
echo '['."\n "; // <<<< here
$count = 0;
while ($stmt->fetch()) {
if( $count ) {
echo ','."\n "; // <<<< and here
}
echo json_encode($member, JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT | JSON_FORCE_OBJECT);
++$count;
}
echo "\n".']'; // <<<< and here
Both the \n and the spaces must be inside double quotes (" ").
Note that \s is not an escape sequence in PHP strings: it would be printed literally as \s and break the JSON. If the literal spaces get lost, build them with str_repeat() instead:
header('Content-type: application/json; charset=utf-8');
$indent = str_repeat(' ', 2); // two literal spaces
echo "[\n" . $indent; // <<<< here
$count = 0;
while ($stmt->fetch()) {
    if ($count) {
        echo ",\n" . $indent; // <<<< and here
    }
    echo json_encode($member, JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT | JSON_FORCE_OBJECT);
    ++$count;
}
echo "\n]"; // <<<< and here
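An alternative to printing the brackets by hand is to collect the rows in one array and encode it once, so that JSON_PRETTY_PRINT formats the outer array as well. This is only a sketch: the $rows array below stands in for the database result (values shortened from the question), and how the rows are fetched is left out on purpose.

```php
<?php
// Sample rows standing in for the query result; in the real script they
// would be collected inside the fetch loop before encoding.
$rows = [
    ['id_news' => 712, 'alt_img' => 'Marcos Caicedo', 'confg_img' => null],
    ['id_news' => 708, 'alt_img' => 'Jugadores de LDU celebrando un gol', 'confg_img' => null],
];

// One call encodes the whole list, so the outer brackets, the commas and
// the nested objects are all laid out by PHP itself.
$output = json_encode(
    $rows,
    JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
);

// In a web context the Content-type header would be sent before this echo.
echo $output, PHP_EOL;
```

Depending on how the rows are fetched, each row may need to be copied into a fresh array before it is appended, because variables bound with bind_result are overwritten on every fetch.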
When I make a POST request to my PHP API, my Swift application gives the error:
Garbage at end
This is my JSON code (just for trying things out):
function getTips($params)
{
$blessure = $params[0];
switch($blessure)
{
case "hoofd":
echo '
"hoofd": [
{
"naam": "neus in de kom",
"beschrijving": "neus zit in de kom",
"categorie": "hoofd"
},
{
"naam": "oor in de kom",
"beschrijving": "oor zit in de kom",
"categorie": "hoofd"
}
]
';
break;
default:
header("HTTP/1.0 404 Not Found");
break;
}
}
What am I doing wrong here?
You can try sending the header
header('Content-type: application/json');
or put your data in an associative array and then wrap it with json_encode, like this:
echo json_encode( array(
    array( 'naam' => 'neus in de kom',
           'beschrijving'=> 'neus zit in de kom',
    ),
    array()
));
die();
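A likely cause of the error is that the echoed text starts with "hoofd": [...] without enclosing braces, so it is not a complete JSON document. The sketch below builds the same data as an array and lets json_encode produce the document. The function name buildTipsJson and the choice to return null for unknown categories are assumptions made for illustration, not part of the original script.

```php
<?php
// Hypothetical rewrite of getTips(): returns a JSON string instead of
// echoing a hand-written fragment, or null when the category is unknown.
function buildTipsJson(string $blessure): ?string
{
    $tips = [
        'hoofd' => [
            ['naam' => 'neus in de kom', 'beschrijving' => 'neus zit in de kom', 'categorie' => 'hoofd'],
            ['naam' => 'oor in de kom', 'beschrijving' => 'oor zit in de kom', 'categorie' => 'hoofd'],
        ],
    ];

    if (!isset($tips[$blessure])) {
        return null; // the caller would send the 404 header here
    }

    // Wrapping the list in an array keyed by category yields one complete
    // JSON object of the form {"hoofd":[...]}.
    return json_encode([$blessure => $tips[$blessure]]);
}

$body = buildTipsJson('hoofd');
echo $body, PHP_EOL;
```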
I have a simple PHP script:
function test() {
$sql = "select author, synopsis from book";
$result = mysql_query($sql); // result set
while ($rec = mysql_fetch_array($result, MYSQL_ASSOC)) {
$arr[] = $rec;
};
$data = json_encode($arr); //encode the data in json format
echo $data;
}
The problem is that when I try to read the result with jQuery, I get "synopsis": null. I wonder whether this is because the synopsis value in the database contains multiple lines?
jQuery code:
<script>
$(document).ready(function() {
$.ajax({
url: "data/book.php",
type: 'POST',
dataType: 'json',
success : function (data) {
$('div.book').text(data[0].synopsis);
}
});
});
</script>
The output of the PHP script:
[{"author":"author","synopsis":null}]
Response to iMx's suggestion:
array(2) {
[0]=>
array(2) {
["author"]=>
string(7) "author"
["synopsis"]=>
string(697) "Le patient du psychiatre
Dekker, un certain Boone, avoue durant les transes dans lesquelles
le plonge le docteur, qu'il aurait commis une dizaine de meurtres,
tous plus sordides les uns que les autres. Seulement, une fois sortie de
cet �tat d'exaltation, l'homme ne se rappelle de rien. D�sesp�r�, Boone fait une
tentative de suicide rat�e qui le conduit � l'h�pital. Son compagnon de chambre,
visiblement bien cram� du cerveau, �voque le nom de Midian, un endroit dans le
d�sert de l'Athabasca o� se regroupent les damn�s de la terre,
les �tres qui souffrent horriblement. Convaincu de pouvoir y trouver un r
efuge, et ainsi de mettre un terme � ses crimes, il part sur ce lieu �trange..."
}
Maybe there is a problem because of the french characters?
I guess it is an encoding problem. What encoding do your MySQL tables use? json_encode() accepts only UTF-8 strings and can return null for non-UTF-8 strings; that may be your problem. Try converting the strings with iconv() or mb_convert_encoding(), or set the MySQL connection encoding to UTF-8 with mysql_query('SET CHARACTER SET utf8') before your SELECT queries.
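The effect is easy to reproduce without a database. The sketch below assumes the stored bytes are ISO-8859-1, which the thread does not confirm, and uses a made-up one-word sample. On current PHP versions json_encode() returns false for such input rather than the null seen in the question.

```php
<?php
// "état" encoded as ISO-8859-1: the lone byte 0xE9 is not valid UTF-8.
$latin1 = "\xE9tat";

// Encoding the raw bytes fails; capture the error code right away because
// the next successful json_encode() call resets it.
$failed = json_encode($latin1);
$error  = json_last_error();

// Assumption: the source really is ISO-8859-1. Adjust the third argument
// if the table uses a different character set.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$ok   = json_encode($utf8);

var_dump($failed, $error === JSON_ERROR_UTF8, $ok);
```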
I'll try to present it as simply as I can:
I use json_encode() to encode a number of UTF-8 strings in different languages, and I notice that characters remain unchanged when they belong to the ASCII table, but everything else is returned as '\unnnn', where 'nnnn' is a hexadecimal number.
See the code:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="application/xhtml+xml; charset=UTF-8" />
<title>Multibyte string functions</title>
</head>
<body>
<h3>Multibyte string functions</h3>
<p>
<?php
//present json encode errors nicely:
//assign integer values to keys and error names to values
echo '<br /><b>Define JSON errors</b><br />';
$constants = get_defined_constants(true);
$json_errors = array();
foreach ($constants["json"] as $name => $value) {
if (!strncmp($name, "JSON_ERROR_", 11)) {
$json_errors[$value] = $name;
}
}
echo nl2br(print_r($json_errors, true), true);
//Display current detection order
echo "<br /><b>Current detection order 'mb_detect_order()':</b> ", implode(", ", mb_detect_order());
//Display internal encoding
echo "<br /><b>Internal encoding 'mb_internal_encoding()':</b> ", mb_internal_encoding();
//Get current language
echo "<br /><b>Current detection language 'mb_language()' ('neutral' for utf8):</b> ", mb_language();
//our test data
//a nowdoc that can break a <input> field;
$str = <<<'STR'
O'Reilly(\n) "& 'Big\Two # <span>bo\tld</span>"
STR;
$strings = array(
$str,
"Latin: tell me the answer and I might find the question!",
"Greek: πες μου την ερώτηση και ίσως βρω την απάντηση!",
"Chinese simplified: 告诉我答复,并且我也许发现问题!",
"Arabic: أخبرني الاجابة, انا قد تجد مسالة!",
"Portuguese: mais coisas a pensar sobre diário ou dois!",
"French: plus de choses à penser à journalier ou à deux!",
"Spanish: ¡más cosas a pensar en diario o dos!",
"Italian: più cose da pensare circa giornaliere o due!",
"Danish: flere ting å tenke på hver dag eller to!",
"Chech: Další věcí, přemýšlet o každý den nebo dva!",
"German: mehr über Spaß spät schönen",
"Albanian: më vonë gjatë fun bukur",
"Hungarian: több mint szórakozás késő csodálatos kenyér"
);
//show encoding and then encode
foreach( $strings as $string ){
echo "<br /><br />$string :", mb_detect_encoding($string);
$json = json_encode($string);
echo "<br />Error? ", $json_errors[json_last_error()];
echo '<br />json=', $json;
}
The above code will output:
Define JSON errors
Array
(
[0] => JSON_ERROR_NONE
[1] => JSON_ERROR_DEPTH
[2] => JSON_ERROR_STATE_MISMATCH
[3] => JSON_ERROR_CTRL_CHAR
[4] => JSON_ERROR_SYNTAX
[5] => JSON_ERROR_UTF8
)
Current detection order 'mb_detect_order()': ASCII, UTF-8
Internal encoding 'mb_internal_encoding()': ISO-8859-1
Current detection language 'mb_language()' ('neutral' for utf8): neutral
O'Reilly(\n) "& 'Big\Two # bo\tld" :ASCII
Error? JSON_ERROR_NONE
json="O'Reilly(\\n) \"& 'Big\\Two # bo\\tld<\/span>\""
Latin: tell me the answer and I might find the question! :ASCII
Error? JSON_ERROR_NONE
json="Latin: tell me the answer and I might find the question!"
Greek: πες μου την ερώτηση και ίσως βρω την απάντηση! :UTF-8
Error? JSON_ERROR_NONE
json="Greek: \u03c0\u03b5\u03c2 \u03bc\u03bf\u03c5 \u03c4\u03b7\u03bd \u03b5\u03c1\u03ce\u03c4\u03b7\u03c3\u03b7 \u03ba\u03b1\u03b9 \u03af\u03c3\u03c9\u03c2 \u03b2\u03c1\u03c9 \u03c4\u03b7\u03bd \u03b1\u03c0\u03ac\u03bd\u03c4\u03b7\u03c3\u03b7!"
Chinese simplified: 告诉我答复,并且我也许发现问题! :UTF-8
Error? JSON_ERROR_NONE
json="Chinese simplified: \u544a\u8bc9\u6211\u7b54\u590d\uff0c\u5e76\u4e14\u6211\u4e5f\u8bb8\u53d1\u73b0\u95ee\u9898!"
Arabic: أخبرني الاجابة, انا قد تجد مسالة! :UTF-8
Error? JSON_ERROR_NONE
json="Arabic: \u0623\u062e\u0628\u0631\u0646\u064a \u0627\u0644\u0627\u062c\u0627\u0628\u0629, \u0627\u0646\u0627 \u0642\u062f \u062a\u062c\u062f \u0645\u0633\u0627\u0644\u0629!"
Portuguese: mais coisas a pensar sobre diário ou dois! :UTF-8
Error? JSON_ERROR_NONE
json="Portuguese: mais coisas a pensar sobre di\u00e1rio ou dois!"
French: plus de choses à penser à journalier ou à deux! :UTF-8
Error? JSON_ERROR_NONE
json="French: plus de choses \u00e0 penser \u00e0 journalier ou \u00e0 deux!"
Spanish: ¡más cosas a pensar en diario o dos! :UTF-8
Error? JSON_ERROR_NONE
json="Spanish: \u00a1m\u00e1s cosas a pensar en diario o dos!"
Italian: più cose da pensare circa giornaliere o due! :UTF-8
Error? JSON_ERROR_NONE
json="Italian: pi\u00f9 cose da pensare circa giornaliere o due!"
Danish: flere ting å tenke på hver dag eller to! :UTF-8
Error? JSON_ERROR_NONE
json="Danish: flere ting \u00e5 tenke p\u00e5 hver dag eller to!"
Chech: Další věcí, přemýšlet o každý den nebo dva! :UTF-8
Error? JSON_ERROR_NONE
json="Chech: Dal\u0161\u00ed v\u011bc\u00ed, p\u0159em\u00fd\u0161let o ka\u017ed\u00fd den nebo dva!"
German: mehr über Spaß spät schönen :UTF-8
Error? JSON_ERROR_NONE
json="German: mehr \u00fcber Spa\u00df sp\u00e4t sch\u00f6nen"
Albanian: më vonë gjatë fun bukur :UTF-8
Error? JSON_ERROR_NONE
json="Albanian: m\u00eb von\u00eb gjat\u00eb fun bukur"
Hungarian: több mint szórakozás késő csodálatos kenyér :UTF-8
Error? JSON_ERROR_NONE
json="Hungarian: t\u00f6bb mint sz\u00f3rakoz\u00e1s k\u00e9s\u0151 csod\u00e1latos keny\u00e9r"
As you can see, in most languages except English the UTF-8 characters are converted to hexadecimal escapes.
Is it possible to encode without replacing my Unicode characters? Is it safe? What do other people do?
Bear in mind that such strings come from user input in pages and are stored in MySQL.
Thanks.
Maybe you should try json_encode($string, JSON_UNESCAPED_UNICODE), or any of the other options described at http://php.net/manual/fr/function.json-encode.php that may be useful for your various cases.
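A short sketch of the difference, using a shortened version of the French sample string. Note that JSON_UNESCAPED_UNICODE requires PHP 5.4 or later, and that both outputs are valid JSON that decode to the same string:

```php
<?php
$text = 'French: plus de choses à penser';

// Default behaviour: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($text);

// With the flag, the UTF-8 characters are written as they are.
$unescaped = json_encode($text, JSON_UNESCAPED_UNICODE);

echo $escaped, PHP_EOL, $unescaped, PHP_EOL;
```

Since a JSON parser treats the two forms identically, the escaped form is safe to send; the flag mainly helps readability and output size.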
OK, thanks a lot for the answer!
The problem is that I'm on PHP 5.3.10, where json_encode($string, JSON_UNESCAPED_UNICODE) isn't an option.
Fortunately, a user called "Mr Swordsteel" posted a comment in the PHP manual at http://www.php.net/manual/en/function.json-encode.php that actually does the trick (thank you, Mr Swordsteel!).
A nice side effect is that it fully emulates the json_encode function, which gives a hint on how to port it to another language such as JavaScript and keep our libraries interoperable.
function my_json_encode($in){
$_escape = function ($str) {
return addcslashes($str, "\v\t\n\r\f\"\\/");
};
$out = "";
if (is_object($in)){
$class_vars = get_object_vars(($in));
$arr = array();
foreach ($class_vars as $key => $val){
$arr[$key] = "\"{$_escape($key)}\":\"{$val}\"";
}
$val = implode(',', $arr);
$out .= "{{$val}}";
}elseif (is_array($in)){
$obj = false;
$arr = array();
foreach($in as $key => $val){
if(!is_numeric($key)){
$obj = true;
}
$arr[$key] = my_json_encode($val);
}
if($obj){
foreach($arr AS $key => $val){
$arr[$key] = "\"{$_escape($key)}\":{$val}";
}
$val = implode(',', $arr);
$out .= "{{$val}}";
}else {
$val = implode(',', $arr);
$out .= "[{$val}]";
}
}elseif (is_bool($in)){
$out .= $in ? 'true' : 'false';
}elseif (is_null($in)){
$out .= 'null';
}elseif (is_string($in)){
$out .= "\"{$_escape($in)}\"";
}else{
$out .= $in;
}
return "{$out}";
}
I gave it a lot of tests and couldn't break it!
It would be very interesting now to re-implement json_decode!
Thanks.
Here is my JSON data:
{
"Liste_des_produits1": [{
"Added_Time": "28-Sep-2009 16:35:03",
"prod_ingredient": "sgsdgds",
"prod_danger": ["sans danger pour xyz"],
"prod_odeur": ["Orange"],
"prod_nom": "ceciestunproduit",
"prod_certification": ["HE • Haute Efficité", "Certifier Ecologo", "Contenant recyclable"],
"prod_photo": "",
"prod_categorie": ["Corporel"],
"prod_desc": "gdsg",
"prod_format": ["10 kg", "20 kg"]
}, {
"Added_Time": "28-Sep-2009 16:34:52",
"prod_ingredient": "dsgdsgdsf",
"prod_danger": ["Sans danger pour le fausse sceptiques"],
"prod_odeur": ["Agrumes", "Canneberge"],
"prod_nom": "jsute un test",
"prod_certification": ["100% Éco-technologie", "Certifier Ecologo", "Contenant recyclable"],
"prod_photo": "",
"prod_categorie": ["Corporel"],
"prod_desc": "gsdgsdgsdg",
"prod_format": ["1 Litre", "10 kg"]
}]
}
In PHP, how can I access the different values in this data?
For example: prod_ingredient or prod_danger.
I have tried:
$prod = $result->{'Liste_des_produits1'};
print_r($prod[0]); // error
and
$result = json_decode($json);
print_r($result['Liste_des_produits1'].prod_ingredient[1]); // error
Use json_decode with true as the second argument to convert the data to an associative array.
$data = json_decode($jsonString, true);
// extend this concept to access other values
$prod_ingredient = $data['Liste_des_produits1'][0]['prod_ingredient'];
Use json_decode
Then you can access the data as a regular array.
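As a rough illustration of both access styles, here is a sketch using a trimmed copy of the JSON from the question (only two fields per product are kept). The variable names are chosen for the example.

```php
<?php
// Trimmed copy of the question's data, kept on one line for brevity.
$json = '{"Liste_des_produits1":[{"prod_ingredient":"sgsdgds","prod_danger":["sans danger pour xyz"]},{"prod_ingredient":"dsgdsgdsf","prod_danger":["Sans danger pour le fausse sceptiques"]}]}';

// Associative-array style: pass true as the second argument.
$data = json_decode($json, true);
$ingredients = [];
foreach ($data['Liste_des_produits1'] as $product) {
    $ingredients[] = $product['prod_ingredient'];
    // prod_danger is itself a list, so it needs one more index.
    echo $product['prod_ingredient'], ': ', $product['prod_danger'][0], PHP_EOL;
}

// Object style: without the second argument, objects become stdClass
// instances while JSON lists stay plain PHP arrays.
$obj = json_decode($json);
$firstIngredient = $obj->Liste_des_produits1[0]->prod_ingredient;
echo $firstIngredient, PHP_EOL;
```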