I've a string which looks like this:
[{
text: "key 1",
value: "value 1"
}, {
text: "key 2",
value: "value 2"
}, {
text: "key 3",
value: "value 3"
}]
I'm not sure what kind of notation this is; AFAIK it is generated by an ASP.NET backend. It looks a lot like JSON, but calling json_decode() on it fails.
Can someone shed some light on this kind of notation and suggest an efficient way to parse it into a key/value array with PHP?
Any way you can change the output? Quoting the key names seems to allow it to parse normally:
$test = '[{"text":"key 1","value":"value 1"},{"text":"key 2","value":"value 2"},{"text":"key 3","value":"value 3"}]';
var_dump(json_decode($test));
It is JSON-like, but apparently not exactly to spec. PHP's json_decode() function only accepts double-quoted key names:
// the following strings are valid JavaScript but not valid JSON
// the name and value must be enclosed in double quotes
// single quotes are not valid
$bad_json = "{ 'bar': 'baz' }";
json_decode($bad_json); // null
// the name must be enclosed in double quotes
$bad_json = '{ bar: "baz" }';
json_decode($bad_json); // null
// trailing commas are not allowed
$bad_json = '{ bar: "baz", }';
json_decode($bad_json); // null
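Since json_decode() only returns null on failure, it can help to ask PHP why the decode failed. A minimal sketch, assuming PHP 5.5 or later for json_last_error_msg():

```php
<?php
// An unquoted key, as in the examples above.
$bad_json = '{ bar: "baz" }';
$result = json_decode($bad_json);

// null plus a non-zero error code means the input was rejected,
// as opposed to the input being the JSON literal null.
if ($result === null && json_last_error() !== JSON_ERROR_NONE) {
    echo 'Decode failed: ', json_last_error_msg(), "\n";
}
```

The message does not say where the problem is, so a validator is still useful for locating it.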
That sample is valid YAML, which is a superset of JSON. There seem to be at least 3 PHP libraries for YAML.
If it is in fact YAML, you're better off using a real YAML library than running it through a regex and throwing it at your JSON library. YAML supports other features (besides unquoted strings) which, if your ASP.NET backend uses them, will not survive the trip.
It looks like javascript syntax (similar to JSON). Regular expressions are the way to go for parsing it. Strip the '[' and ']', then separate on the ','. Then parse each object individually.
It looks like a custom format. Remove the [{ and }] delimiters at the beginning and the end. Then explode on "},{" and you get this:
text:"key 1",value:"value 1"
text:"key 2",value:"value 2"
text:"key 3",value:"value 3"
At that point you can iterate over each element in the array and use preg_match to extract your values.
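The steps above could be sketched roughly as follows. This assumes every element has exactly one text and one value field in that order, and that neither value contains a double quote or the sequence },{. The function name parseKeyValueString is made up for illustration.

```php
<?php
// Hypothetical helper: turns the bracketed string into a key => value array.
function parseKeyValueString($input) {
    // Strip the outer [{ and }] together with surrounding whitespace.
    $body = trim($input, "[]{} \t\r\n");
    // Split on the "},{" boundary between elements, tolerating whitespace.
    $parts = preg_split('/\}\s*,\s*\{/', $body);
    $result = array();
    foreach ($parts as $part) {
        // Pull the two quoted strings out of: text: "...", value: "..."
        if (preg_match('/text\s*:\s*"([^"]*)"\s*,\s*value\s*:\s*"([^"]*)"/', $part, $m)) {
            $result[$m[1]] = $m[2];
        }
    }
    return $result;
}

$input = '[{
text: "key 1",
value: "value 1"
}, {
text: "key 2",
value: "value 2"
}]';
$pairs = parseKeyValueString($input);
print_r($pairs);
```

Elements that do not match the expected shape are silently skipped, which may or may not be what you want.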
It looks almost like a sort of array-style data container - text being the index and value being the value.
$string = ....; // the input string from the question
$endArray = array();
$string = trim($string, "[] \t\r\n");
// Collect the body of every {...} container; empty containers are skipped
preg_match_all('/\{([^{}]+)\}/', $string, $startArray);
foreach( $startArray[1] as $tmpString ) { // $tmpString = text:"key 1",value:"value 1"
    // Extract the quoted text and value (assumes neither contains a double quote)
    if (preg_match('/text\s*:\s*"([^"]*)"\s*,\s*value\s*:\s*"([^"]*)"/', $tmpString, $tmpArray)) {
        $endArray[$tmpArray[1]] = $tmpArray[2];
    }
}
To get your JSON data accepted by json_decode(), you could use the following regular expression:
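function json_replacer($match) {
    $token = $match[0]; // the callback receives an array; [0] is the full match
    if ($token[0] == '"' || $token[0] == "'") {
        return $token; // already a quoted string: leave it alone
    }
    else {
        return '"'.$token.'"'; // bare word before a colon: quote it
    }
}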
$json_re = <<<'END'
/ " (?: \\. | [^\\"] )* " # double-quoted string, with escapes
| ' (?: \\. | [^\\'] )* ' # single-quoted string, with escapes
| \b [A-Za-z_] \w* (?=\s*:) # A single word followed by a colon
/x
END;
$json = preg_replace_callback($json_re, 'json_replacer', $json);
Because the matches will never overlap, a word followed by colon inside a string will never match.
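As a rough end-to-end illustration of this idea, the sketch below quotes the bare keys, decodes the result and builds the key/value array asked for in the question. The helper name quoteBareKeys and the sample input are made up for the example, and single-quoted strings are deliberately ignored here because json_decode() would reject them anyway.

```php
<?php
// Hypothetical helper: quotes bare words that precede a colon and leaves
// existing double-quoted strings untouched.
function quoteBareKeys($text) {
    $pattern = '/"(?:\\\\.|[^\\\\"])*"|\b[A-Za-z_]\w*(?=\s*:)/';
    return preg_replace_callback($pattern, function ($m) {
        // $m[0] is the matched text; its first character tells us which branch matched.
        return $m[0][0] === '"' ? $m[0] : '"' . $m[0] . '"';
    }, $text);
}

// The second element contains "note: x" to show that a word followed by a
// colon inside a string is not touched.
$raw = '[{ text: "key 1", value: "value 1" }, { text: "note: x", value: "value 2" }]';
$rows = json_decode(quoteBareKeys($raw), true);

$map = array();
foreach ($rows as $row) {
    $map[$row['text']] = $row['value'];
}
print_r($map);
```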
I also found a comparison between different JSON implementations for PHP:
http://gggeek.altervista.org/sw/article_20061113.html
I have never used it, but maybe give a look at json_decode.
I have a json object scraped from a website
{
area: {
"lang": "en",
"area": "25",
"region": "mea"
},
config: {
"rtl": false,
"breakpoint": 768
}
}
Because area and config are not enclosed in double quotes, PHP's json_decode() returns NULL.
How can I add double quotes in PHP to area and config if they are not already enclosed in double quotes?
Use a regex replace, (assuming the format).
$json = preg_replace('/([^"\s]+): ?{/', '"$1": {', $js_object);
Edit
For the supplied string, you need to check two more things:
Make sure that the pattern isn't in a string (eg. "Your selection: {packageName}")
Make sure that the backslash character is escaped (Source)
Here's the updated code:
$js_object = '...';
$json_proper_backslashes = preg_replace('#\\\\([^"\\\\\/bfnrtu])#', '\\\\\\\\$1', $js_object);
$json = preg_replace('/({|},)\s*([^"\s]+): ?{/', '$1"$2": {', $json_proper_backslashes);
$json_object = json_decode($json);
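For reference, here is a small worked example on the object from the question. It is only a sketch: the backslash step is left out because this sample contains no backslashes, and the pattern is a slight variant with the braces escaped and whitespace around the colon tolerated.

```php
<?php
$js_object = '{
    area: {
        "lang": "en",
        "area": "25",
        "region": "mea"
    },
    config: {
        "rtl": false,
        "breakpoint": 768
    }
}';

// Quote a bare key only when it directly follows "{" or "}," and is itself
// followed by an opening brace, so text inside string values is not touched.
$json = preg_replace('/(\{|\},)\s*([^"\s:]+)\s*:\s*\{/', '$1"$2": {', $js_object);

$data = json_decode($json, true);
var_dump($data['config']['breakpoint']); // int(768)
```

Bare keys whose value is not an object (for example rtl: false) are not covered by this pattern.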
I have a json and I need to match all "text" keys as well as the "html" keys.
For example, the json could be like below:
[{
"layout":12,
"text":"Lorem",
"html":"<div>Ipsum</div>"
}]
Or it could be like below:
[{
"layout":12,
"settings":{
"text":"Lorem",
"atts":{
"html":"<div>Ipsum</div>"
}
}
}]
The json is not always using the same structure so I have to match the keys and get their values using preg_match_all. I have tried the following to get the value of the "text" key:
preg_match_all('|"text":"([^"]*)"|',$json,$match_txt,PREG_SET_ORDER);
The above works fine for matching a single key. When it comes to matching a second key ("html" in this case) it just doesn't work. I have tried the following:
preg_match_all('|"text|html":"([^"]*)"|',$json,$match_txt,PREG_SET_ORDER);
Can you please give me some hints why the OR operator (text|html) doesn't work? Strangely, the above (multi-pattern) regex works fine when I test it in an online tester but it doesn't work in my php files.
Fixing text|html
You should put text|html in a group. Otherwise the alternation splits the whole pattern in two, and it looks for either "text or html":"([^"]*)".
|"(text|html)":"([^"]*)"|
Delimiters
This won't currently work with your delimiters though as you use the pipe (|) inside of the expression. You should change your delimiters to something else, here I've used /.
/"(text|html)":"([^"]*)"/
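Keeping the pipe as the delimiter does not work well here. Inside the pattern the pipe has to be escaped, and an escaped pipe (\|) is matched as a literal | character, so the alternation is lost. The following expression looks for the literal key text|html:
|"(text\|html)":"([^"]*)"|
preg_quote() is not the right tool either: it escapes every regex metacharacter in its argument, including the parentheses, brackets and asterisk, so the quoted pattern would only match itself literally. Use it for literal fragments such as the key names, with the delimiter as its second argument:
$keys = implode('|', array_map(function ($k) { return preg_quote($k, '/'); }, array('text', 'html')));
preg_match_all("/\"({$keys})\":\"([^\"]*)\"/",$json,$match_txt,PREG_SET_ORDER);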
Parsing JSON
Although that regex will work, it will need additional parsing and it makes more sense to use a recursive function for this.
json_decode() will decode a JSON string into the corresponding PHP data types. In the example below I've passed true as an additional argument, which means I get an associative array where I would normally get an object.
Once findKeyData() is called, it recursively calls itself and works through all of the data until it finds the specified key. If the key is never found, it returns null.
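function findKeyData($data, $key) {
    foreach ($data as $k => $v) {
        if (is_array($v)) {
            // Search nested arrays first; a separate variable keeps $data intact
            $found = findKeyData($v, $key);
            if (! is_null($found)) {
                return $found;
            }
        }
        // Strict comparison: before PHP 8, 0 == 'text' is true for numeric keys
        if ($k === $key) {
            return $v;
        }
    }
    return null;
}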
$json1 = json_decode('[{
"layout":12,
"text":"Lorem",
"html":"<div>Ipsum</div>"
}]', true);
$json2 = json_decode('[{
"layout":12,
"settings":{
"text":"Lorem",
"atts":{
"html":"<div>Ipsum</div>"
}
}
}]', true);
var_dump(findKeyData($json1, 'text')); // Lorem
var_dump(findKeyData($json1, 'html')); // <div>Ipsum</div>
var_dump(findKeyData($json2, 'text')); // Lorem
var_dump(findKeyData($json2, 'html')); // <div>Ipsum</div>
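findKeyData() stops at the first match. If every occurrence of several keys is needed, a collecting variant could look like the sketch below. The name collectKeyData is hypothetical, and keys whose value is itself an array are only descended into, never reported.

```php
<?php
// Hypothetical helper: collects every scalar value stored under one of $keys.
function collectKeyData(array $data, array $keys) {
    $found = array();
    foreach ($data as $k => $v) {
        if (is_array($v)) {
            // Descend into nested arrays and append whatever they contain.
            $found = array_merge($found, collectKeyData($v, $keys));
        } elseif (in_array($k, $keys, true)) {
            // Strict in_array() so numeric keys never match a key name.
            $found[] = array($k => $v);
        }
    }
    return $found;
}

$json2 = json_decode('[{
    "layout":12,
    "settings":{
        "text":"Lorem",
        "atts":{
            "html":"<div>Ipsum</div>"
        }
    }
}]', true);

$hits = collectKeyData($json2, array('text', 'html'));
print_r($hits);
```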
preg_match_all('/"(?:text|html)":"([^"]*)"/',$json,$match_txt,PREG_SET_ORDER);
print $match_txt[0][0]." with group 1: ".$match_txt[0][1]."\n";
print $match_txt[1][0]." with group 1: ".$match_txt[1][1]."\n";
returns:
$ php -f test.php
"text":"Lorem" with group 1: Lorem
"html":"<div>Ipsum</div>" with group 1: <div>Ipsum</div>
The enclosing parentheses are needed: (?:text|html). Without them I couldn't get it to work on https://regex101.com. The ?: means the content of the parentheses is not captured (i.e., not available in the results).
I also replaced the pipe (|) delimiter with forward slashes since you also have a pipe inside the regex. Escaping the pipe inside the regex is not an alternative: with | as the delimiter, \| is matched as a literal pipe character rather than acting as alternation.
I don't see any reason to use a regex to parse a valid json string:
$data = json_decode($json, true); // array_walk_recursive() takes its array by reference, so pass a variable
array_walk_recursive($data, function ($v, $k) {
    if ( in_array($k, ['text', 'html'], true) )
        echo "$k -> $v\n";
});
You use the Pipe | character as delimiter, I think this will break your regexp. Does it work using another delimiter like
preg_match_all('#"text|html":"([^"]*)"#',$json,$match_txt,PREG_SET_ORDER);
?
I'm getting a data feed in JSON, which is the only format available. In PHP, I'm using json_decode to decode it, but it was breaking, and I found that in some places the JSON was generated with unescaped double quotes in a person's nickname. I verified this using:
http://jsonformatter.curiousconcept.com
I don't have control over the creation of the data, but I have to deal with this broken format when it occurs. This data after it's parsed will be put into a MySQL TABLE.
For example:
"contact1": "David "Dave" Letterman",
json_decode would return a NULL. If I manually saved the file, and changed it to single quotes around the nickname of Dave, then everything worked.
$json_string = file_get_contents($json_download);
$json_array = json_decode($json_string, true);
How do I fix the broken JSON format in json_string before it gets processed by json_decode?
What should be done to pre-process the file, backslash the double quotes of the nickname? Or change them to single quotes? Is it even a good idea to store double quotes like this in MySQL?
I don't know when this might occur with each data feed, so I don't want to just check for contact1 if it has inner double quotes to fix them. Is there a way in PHP to take a line such as the above example, and backslash everything after the colon except the outer double quotes? Thanks!
This is the correct code, as provided by tftd:
<?php
// This:
// "contact1": "David "Dave" Letterman",
// Needs to look like this to be decoded by JSON:
// "contact1": "David \"Dave\" Letterman",
$data ='"contact1": "David "Dave" Letterman",';
function replace($match){
$key = trim($match[1]);
$val = trim($match[2]);
if($val[0] == '"')
$val = '"'.addslashes(substr($val, 1, -1)).'"';
else if($val[0] == "'")
$val = "'".addslashes(substr($val, 1, -1))."'";
return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
$json_array = json_decode($preg);
var_dump($json_array);
echo $json_array . "\n";
echo $preg . "\n";
?>
Here is the output:
string(39) ""contact1": "David \"Dave\" Letterman","
NULL
"contact1": "David \"Dave\" Letterman",
I have my own jsonFixer() function. It works in two steps: removing garbage (to normalize inconsistent formatting) and reformatting.
<?php
function jsonFixer($json){
$patterns = [];
/** garbage removal */
$patterns[0] = "/([\s:,\{}\[\]])\s*'([^:,\{}\[\]]*)'\s*([\s:,\{}\[\]])/"; //Find any character except colons, commas, curly and square brackets surrounded or not by spaces preceded and followed by spaces, colons, commas, curly or square brackets...
$patterns[1] = '/([^\s:,\{}\[\]]*)\{([^\s:,\{}\[\]]*)/'; //Find any left curly brackets surrounded or not by one or more of any character except spaces, colons, commas, curly and square brackets...
$patterns[2] = "/([^\s:,\{}\[\]]+)}/"; //Find any right curly brackets preceded by one or more of any character except spaces, colons, commas, curly and square brackets...
$patterns[3] = "/(}),\s*/"; //JSON.parse() doesn't allow trailing commas
/** reformatting */
$patterns[4] = '/([^\s:,\{}\[\]]+\s*)*[^\s:,\{}\[\]]+/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets followed by one or more of any character except spaces, colons, commas, curly and square brackets...
$patterns[5] = '/["\']+([^"\':,\{}\[\]]*)["\']+/'; //Find one or more of quotation marks or/and apostrophes surrounding any character except colons, commas, curly and square brackets...
$patterns[6] = '/(")([^\s:,\{}\[\]]+)(")(\s+([^\s:,\{}\[\]]+))/'; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by quotation marks followed by one or more spaces and one or more of any character except spaces, colons, commas, curly and square brackets...
$patterns[7] = "/(')([^\s:,\{}\[\]]+)(')(\s+([^\s:,\{}\[\]]+))/"; //Find or not one or more of any character except spaces, colons, commas, curly and square brackets surrounded by apostrophes followed by one or more spaces and one or more of any character except spaces, colons, commas, curly and square brackets...
$patterns[8] = '/(})(")/'; //Find any right curly brackets followed by quotation marks...
$patterns[9] = '/,\s+(})/'; //Find any comma followed by one or more spaces and a right curly bracket...
$patterns[10] = '/\s+/'; //Find one or more spaces...
$patterns[11] = '/^\s+/'; //Find one or more spaces at start of string...
$replacements = [];
/** garbage removal */
$replacements[0] = '$1 "$2" $3'; //...and put quotation marks surrounded by spaces between them;
$replacements[1] = '$1 { $2'; //...and put spaces between them;
$replacements[2] = '$1 }'; //...and put a space between them;
$replacements[3] = '$1'; //...so, remove trailing commas of any right curly brackets;
/** reformatting */
$replacements[4] = '"$0"'; //...and put quotation marks surrounding them;
$replacements[5] = '"$1"'; //...and replace by single quotation marks;
$replacements[6] = '\\$1$2\\$3$4'; //...and add back slashes to its quotation marks;
$replacements[7] = '\\$1$2\\$3$4'; //...and add back slashes to its apostrophes;
$replacements[8] = '$1, $2'; //...and put a comma followed by a space character between them;
$replacements[9] = ' $1'; //...and replace by a space followed by a right curly bracket;
$replacements[10] = ' '; //...and replace by one space;
$replacements[11] = ''; //...and remove it.
$result = preg_replace($patterns, $replacements, $json);
return $result;
}
?>
Example of usage:
<?php
// Received badly formatted json:
// {"contact1": "David "Dave" Letterman", price : 30.00, 'details' : "Greatest 'Hits' Album"}
$json_string = '{"contact1": "David "Dave" Letterman", price : 30.00, \'details\' : "Greatest \'Hits\' Album"}';
echo jsonFixer($json_string);
?>
This results in:
{"contact1": "David \"Dave\" Letterman", "price" : "30.00", "details" : "Greatest \'Hits\' Album"}
Note: this wasn't tested with all possible badly formatted JSON strings, but I use it on a complex multi-level JSON string and it has worked well so far.
As others have already pointed out, it's best to tell your client about the problem with the JSON formatting. Ask them to send a bug report to the original developer or company so they can fix it. If they can't fix it, then offer your solution. You simply need to addslashes the string before you json_encode it.
If for some reason you end up having to fix the formatting, here is a way that might work for you:
$data = '"contact1": "David "Dave" Letterman", "contact2": "Peter "Robert" Smith",{\'test\': \'working "something"\'}';
function replace($match){
$key = trim($match[1]);
$val = trim($match[2]);
if($val[0] == '"')
$val = '"'.addslashes(substr($val, 1, -1)).'"';
else if($val[0] == "'")
$val = "'".addslashes(substr($val, 1, -1))."'";
return $key.": ".$val;
}
$preg = preg_replace_callback("#([^{:]*):([^,}]*)#i",'replace',$data);
var_dump($preg);
// string '"contact1": "David \"Dave\" Letterman", "contact2": "Peter \"Robert\" Smith",{'test': 'working \"something\"'}' (length=110)
Keep in mind this may break if somebody messes with the json format again.
As other people have said, you can do a search and replace, but the hard part is going to be creating your fuzzy matching rules, because in order to parse it, you will need to assume some things. Probably, you will need to assume either:
1a) Keys don't contain colons
1b) or key quotes are properly escaped
and
2a) Values don't contain commas
2b) or values have properly escaped quotes.
Even then, you might get into situations where your parsing gets confused, and it gets worse if they have comments in their JSON. (Not conforming, but very common.)
Now, depending on the data, you can use newlines to decide when you're looking at a new key, but again, that's not reliable and you start making big assumptions.
So, long story short you either have to make some assumptions that might be made wrong at any time, or you need to get them to fix the data.
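To make the newline assumption concrete, here is a minimal sketch that only works when each line holds exactly one "key": "value" pair. The function name escapeInnerQuotes and the sample document are illustrative, and the approach breaks as soon as two pairs share a line.

```php
<?php
// Hypothetical helper: escapes stray double quotes inside string values,
// assuming one `"key": "value"` pair per line.
function escapeInnerQuotes($json) {
    $lines = explode("\n", $json);
    foreach ($lines as $i => $line) {
        // 1: everything up to the value's opening quote
        // 2: the value body (greedy, so it runs to the last quote on the line)
        // 3: the closing quote and an optional trailing comma
        if (preg_match('/^(\s*"[^"]*"\s*:\s*")(.*)("\s*,?\s*)$/', $line, $m)) {
            // Escape only quotes that are not already preceded by a backslash.
            $body = preg_replace('/(?<!\\\\)"/', '\\\\"', $m[2]);
            $lines[$i] = $m[1] . $body . $m[3];
        }
    }
    return implode("\n", $lines);
}

$broken = '{
"contact1": "David "Dave" Letterman",
"contact2": "Peter "Robert" Smith"
}';
$fixed = escapeInnerQuotes($broken);
$json_array = json_decode($fixed, true);
print_r($json_array);
```

Lines that do not look like a string pair (braces, numbers, nested objects) are passed through unchanged.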
Tell them to escape their strings before output. You can even offer to fix it or provide the code solution.
Otherwise you can use preg_replace with a regular expression.
See Replacing specified double quotes in text with preg_replace
Regular expressions are not reliable when values contain commas, square brackets, or nested JSON strings; that is where the worries and nightmares start. PHP's json_decode() fails when keys are unquoted, so one suggestion is to use PEAR's Services_JSON, which gives the safest results once its code is patched for class names. With that, the game of invalid JSON is over:
<?php include("Services_JSON-1.0.3b/JSON.php");
//Patched version https://github.com/pear/Services_JSON/edit/trunk/JSON.php
$json = <<< JSONCODEFROMJS
{
sos:presents,
james:'bond',
"agent":[0,0,7],
secret:"{mission:'impossible',permit: \"tokill\"}",
go:true
}
JSONCODEFROMJS;
function json_fix($json) {
$sjson = new Services_JSON(SERVICES_JSON_IN_ARR|SERVICES_JSON_USE_TO_JSON| SERVICES_JSON_LOOSE_TYPE);
$json_array=$sjson->decode($json);
return json_encode($json_array);
}
$json_array = json_decode(json_fix($json),true);
if(json_last_error() == JSON_ERROR_NONE) {
$json=json_encode($json_array,JSON_PRETTY_PRINT);
echo "<pre>";
echo(htmlentities($json));
echo "</pre>";
} else {
die(json_last_error_msg());
}
?>
My solution was:
function format_json($str)
{
...
$str = preg_replace('/([a-z]+):/i', '"$1":', $str);
$wtr = array('""',',",');
$rpw = array('"',',');
return str_replace($wtr, $rpw, $str);
}
As stated in previous answers, you need to put double quotes around strings. A quick way to check a JSON string is to use JSONLint.
Here's the output you will get from JSONLint:
Parse error on line 1:
{ m: [ {
-----^
Expecting 'STRING', '}'
So you need to change all parts that don't have double quotes. For example:
{m: [ ...
Will become:
{"m": [ ...
Edit after comment: It seems that the double quotes inside strings are not escaped properly. For example:
{"m" : [ { "g": [ "32", "Brezilya-"Serie" A", "83", ...
Here -----------------------------^ and ^
Should be:
{"m" : [ { "g": [ "32", "Brezilya-\"Serie\" A", "83", ...
JSON only supports double-quoted strings, and all of the property names must be quoted.
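If the producing side is PHP, json_encode() applies these rules for you. The sketch below only illustrates what correctly quoted and escaped output looks like; the array shape is a guess based on the fragment quoted above.

```php
<?php
// Sample data containing double quotes inside a string value.
$row = array('g' => array('32', 'Brezilya-"Serie" A', '83'));

$encoded = json_encode($row);
echo $encoded, "\n"; // {"g":["32","Brezilya-\"Serie\" A","83"]}

// The escaped form survives a round trip through json_decode().
$decoded = json_decode($encoded, true);
var_dump($decoded === $row); // bool(true)
```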
Try to run your JSON through JSONLint. To begin with, property names must be enclosed in double quotes. Then, strings must also be enclosed in double quotes instead of single quotes:
{
"m": [
{
"g": [
32,
"Brezilya-SerieA",
.
.
.
You can parse this string with the class from here: http://pear.php.net/pepr/pepr-proposal-show.php?id=198
require_once 'JSON.php';
$json = new Services_JSON();
$data = $json->decode($yourstring);
In your JSON, use " " instead of ' ' and the problem will be solved.
It is a rule in JSON that double quotes are used to delimit object names and string values. Try writing your JSON strings the way you would write a string in C++.
$var ="
{
key : {
key_deep : val\{ue /* should be "val{ue" as { is escaped */
} ,
key2 : value
}
";
print_r(preg_split('//',$var));
// array(
// array(
// 'key'=> array(
// 'key_deep'=> 'val{ue'
// )
// ),
// array('key2'=>'value')
// );
Is there a regular expression to split this using preg_split in PHP?
Basically, I need the same as json_decode(), but without needing quotes around either keys or values, and where the only escaped characters are these four: \{ \, \} \:
Well, for one thing, that JSON is invalid, and json_decode() will fail on it.
Read the JSON spec.
One correct way to write that JSON is:
$var ='
{
"key" : {
"key_deep" : "val{ue"
} ,
"key2" : "value"
}
';
Also, json_decode() does not turn a JSON object into an array by default. It yields an object(stdClass) unless you pass true as the second parameter.
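The difference is easy to check with a valid version of the string:

```php
<?php
$json = '{"key": {"key_deep": "val{ue"}, "key2": "value"}';

// Default: JSON objects become stdClass instances.
$asObject = json_decode($json);
var_dump($asObject instanceof stdClass); // bool(true)
var_dump($asObject->key->key_deep);      // string(6) "val{ue"

// Second argument true: JSON objects become associative arrays.
$asArray = json_decode($json, true);
var_dump($asArray['key']['key_deep']);   // string(6) "val{ue"
```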
You're probably going to want to look at a parser rather than a regular expression, given the arbitrary nesting that could occur here.
Try:
http://pear.php.net/package/PHP_ParserGenerator/redirected
or
http://www.hwaci.com/sw/lemon/
or
http://www.google.com/search?sourceid=chrome&ie=UTF-8&q=php+parser+generator
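For a format this small, a hand-written recursive-descent parser may be enough as an alternative to a generated one. The sketch below is one possible reading of the format in the question: bare keys and values, nesting with braces, and a backslash escaping the next character. All function names are made up, comments such as /* ... */ are not supported, and the result is a plain nested associative array rather than the exact structure shown in the question.

```php
<?php
// Advances $i past any whitespace.
function skipSpace($s, &$i) {
    while (isset($s[$i]) && ctype_space($s[$i])) {
        $i++;
    }
}

// Reads one bare token up to the next unescaped { } , or : and trims it.
function readToken($s, &$i) {
    $out = '';
    while (isset($s[$i])) {
        $c = $s[$i];
        if ($c === '\\' && isset($s[$i + 1])) {
            $out .= $s[$i + 1]; // escaped character: keep it literally
            $i += 2;
            continue;
        }
        if (strpos('{},:', $c) !== false) {
            break; // unescaped delimiter ends the token
        }
        $out .= $c;
        $i++;
    }
    return trim($out);
}

// Parses "{ key : value , ... }" starting at offset $i; values may be nested objects.
function parseObject($s, &$i) {
    skipSpace($s, $i);
    if (!isset($s[$i]) || $s[$i] !== '{') {
        throw new InvalidArgumentException("Expected { at offset $i");
    }
    $i++;
    $result = array();
    while (true) {
        skipSpace($s, $i);
        if (!isset($s[$i])) {
            throw new InvalidArgumentException('Unterminated object');
        }
        if ($s[$i] === '}') {
            $i++;
            return $result;
        }
        $key = readToken($s, $i);
        if (!isset($s[$i]) || $s[$i] !== ':') {
            throw new InvalidArgumentException("Expected : at offset $i");
        }
        $i++;
        skipSpace($s, $i);
        if (isset($s[$i]) && $s[$i] === '{') {
            $result[$key] = parseObject($s, $i);
        } else {
            $result[$key] = readToken($s, $i);
        }
        skipSpace($s, $i);
        if (isset($s[$i]) && $s[$i] === ',') {
            $i++; // separator between pairs
        }
    }
}

$var = '
{
key : {
key_deep : val\{ue
} ,
key2 : value
}
';
$offset = 0;
$parsed = parseObject($var, $offset);
print_r($parsed);
```

Each function advances a shared offset, which is what lets the nesting depth be arbitrary; a single regular expression cannot track that.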