First of all, I appreciate there are lots of answers regarding dealing with large JSON files. However, I have yet to find one that encounters my scenario.
The problem I face is that I have large JSON files (12 MB) that look like this:
{
"range": "Sheet1!A1:P40571",
"majorDimension": "ROWS",
"values": [
[
"new_id",
"qty",
"total_job_cost",
"total_job_revenue",
"total_job_profit",
"total_job_margin"
],
[
"34244",
"5",
"211.25",
"297.00",
"85.75",
"28.87%"
],
[
"34244",
"10",
"211.25",
"297.00",
"85.75",
"28.87%"
],
...
]
}
I wish to extract the values array and convert it into a CSV that would look like this:
new_id,total_job_cost,total_job_revenue,total_job_profit,total_job_margin
34244,211.25,297.00,85.75,28.87%
34245,211.25,297.00,85.75,28.87%
...
However, since the values array is so large, when I try to extract it using a PHP library for JSON parsing, my server crashes when it tries to read it.
Any suggestions or tips appreciated. Thanks.
You can read JSON line by line, but not with any built-in libraries. I wrote a simple JSON parser for another answer here:
Convert structure to PHP array
I had to make a slight modification to handle "real" JSON. In the switch, change this token:
case 'T_ENCAP_STRING':
    if( $mode == 'key'){
        $key .= trim($content,'"');
    }else{
        $value .= unicode_decode($content); //encapsulated strings are always content
    }
    next($lexer_stream);//consume a token
break;
You can test it here
http://sandbox.onlinephpfunctions.com/code/b2917e4bb8ef847df97edbf0bb8f415a10d13c9f
and find the full (updated) code here
https://github.com/ArtisticPhoenix/MISC/blob/master/JasonDecoder.php
Can't guarantee it will work but it's worth a shot. It should be fairly easy to modify it to read your file.
If the problem is simply to convert the large JSON file to a CSV file, then perhaps a jq solution is admissible. Depending on the computing environment, jq can generally handle large files (GB) breezily, and with a little more effort, it can usually handle even larger files as it has a "streaming parser".
In any case, here is a jq solution to the problem as stated:
jq -r '(.values[] | [.[0,2,3,4,5]]) | @csv' data.json > extract.csv
For the sample input, this produces:
"new_id","total_job_cost","total_job_revenue","total_job_profit","total_job_margin"
"34244","211.25","297.00","85.75","28.87%"
"34244","211.25","297.00","85.75","28.87%"
This is valid CSV, and the use of @csv guarantees the result, if any, will be valid CSV. If you want the quotation marks removed, there are several options, though whether they are "safe" or not will depend on the data. Here is an alternative jq solution that produces comma-separated values. It uses join(",") instead of @csv:
(.values[] | [.[0,2,3,4,5]]) | join(",")
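For comparison, the same column selection can be sketched in PHP. This is only a minimal sketch, assuming the decoded file fits within the configured memory_limit; the helper name writeValuesAsCsv and the file names in the usage comment are hypothetical and not part of the question.

```php
<?php
// Hypothetical helper: write the selected columns of each row to an open stream as CSV.
function writeValuesAsCsv(array $rows, array $columns, $out)
{
    foreach ($rows as $row) {
        $picked = [];
        foreach ($columns as $index) {
            // A missing cell becomes an empty string instead of raising a notice.
            $picked[] = isset($row[$index]) ? $row[$index] : '';
        }
        // Delimiter, enclosure and escape character are passed explicitly.
        fputcsv($out, $picked, ',', '"', '\\');
    }
}

// Usage sketch (file names are assumptions):
// $doc = json_decode(file_get_contents('data.json'), true);
// $out = fopen('extract.csv', 'w');
// writeValuesAsCsv($doc['values'], [0, 2, 3, 4, 5], $out);
// fclose($out);
```

Unlike @csv, fputcsv only quotes a field when it needs to, so the sample rows come out without quotation marks.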
I have been using PHP to call an API and retrieve a set of data in JSON. I need to make 20 batch calls at once, so speed is very important.
However, I only need the first element. In fact, I only need one piece of data, which is right at the beginning of the JSON file, at index 0. The JSON response contains about 300 sets of data, and I have to wait until all of it is ready before I can proceed.
I want to speed up my API calls by eliminating redundant datasets. Is it possible for me to receive only the first set of data, without having to wait until everything is ready and then indexing the first element?
Excuse my English... Thank you in advance.
You could use fopen to read only as many bytes as it takes to reach the part you need, and then use a regex to extract it. Something like:
$max_bytes = 512;
$fp = fopen($url, "r");
$data = "";
$match = array();
if ($fp) {
    // Read in chunks until the field shows up or the stream ends.
    while (!preg_match('/"totalAmount":"(.*)"/U', $data, $match) && !feof($fp)) {
        $data .= stream_get_contents($fp, $max_bytes);
    }
    fclose($fp);
    if (count($match)) {
        $totalAmount = $match[1];
    }
}
Keep in mind that you can't use the string stored in $data as valid JSON. It will only be partial.
no. json is not a "streamable" format. until you receive the whole string, it cannot be decoded into a native structure. if you KNOW what you need, then you could use string operations to retrieve the portion you need, but that's not reliable nor advisable. Similarly, php will not stream out the text as it's encoded.
e.g. consider a case where your data structure is a LOOOONG shallow array
$x = array(
0 => blah
1 => blah
...
999,999,998 => blah
999,999,999 => array( .... even more nested data here ...)
);
a streaming format would dribble this out as
['blah', 'blah' ............
you could assume that there's nothing but those 'blah' at the top level and output a ], to produce a complete json string:
['blah'.... , 'blah']
and send that, but then you continue encoding and reach that sub array... now you've suddenly got
['blah' ....., 'blah'][ ...sub array here ....]
and now it's no longer valid JSON.
So basically, JSON encoding is done in one (long) shot, and not in dribs and drabs, because you simply cannot know what's coming "later" without parsing the whole structure first.
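The decoding side of the same point can be checked directly: json_decode rejects text that stops before the closing bracket. A minimal sketch with made-up sample strings:

```php
<?php
// A complete JSON array and the same text cut off before the closing bracket.
$complete  = '["blah","blah"]';
$truncated = '["blah","blah"';

$whole   = json_decode($complete, true);   // array with two strings
$partial = json_decode($truncated, true);  // null: the text is not valid JSON yet
$error   = json_last_error();              // non-zero after the failed decode
```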
No. You need to fetch the whole set before parsing and sending the data you need back to the client machine.
I am trying to parse the json output from
http://www.nyc.gov/portal/apps/311_contentapi/services/all.json
And my php json_decode returns a NULL
I am not sure where the issue is, I tried running a small subset of the data through JSONLint and it validated the json.
Any Ideas?
The error is in this section:
{
"id":"2002-12-05-22-24-56_000010083df0188b4001eb56",
"service_name":"Outdoor Electric System Complaint",
"expiration":"2099-12-31T00:00:00Z",
"brief_description":"Report faulty Con Edison equipment, including dangling or corroded power lines or "hot spots.""
}
See where it says "hot spots." inside an already quoted string. Those double quotes should have been escaped. Since you don't have access to edit the JSON, you could search for "hot spots."" and replace it with \"hot spots.\"", like str_replace('"hot spots.""', '\\"hot spots.\\""', $str); for as long as that's in there. Of course, that only helps if this is a one-time thing. If the site continues to make errors in its JSON output, you'll have to come up with something more complex.
What I did to identify the errors in the JSON ...
Since faulty quoting is the first thing to look for, I downloaded the JSON to a text file, opened in a text editor (I used vim but any full featured editor would do), ran a search and replace that removed all characters except double-quote and looked at the result. It was clear that correct lines should have 4 double-quotes so I simply searched for 5 double-quotes together and found the first bad line. I noted the line number and then undid the search and replace to get the original file back and looked at that line. This gives you what you need to get the developers of the API to fix the JSON.
Writing code to automatically fix the bad JSON before giving it to json_decode() would be quite a bit harder but doable using techniques like those in another answer.
According to the PHP manual:
In the event of a failure to decode, json_last_error() can be used to determine the exact nature of the error.
Try calling it to see what kind of error occurred. Note that it reports the type of error, not its position in the input.
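A minimal sketch of that check, using a shortened stand-in for the broken record (the sample string is an assumption, not the real feed). It also applies the one-off str_replace patch suggested in the earlier answer:

```php
<?php
// Shortened stand-in for the broken record: the inner quotes are not escaped.
$bad = '{"brief_description":"Report faulty equipment or "hot spots.""}';

$decoded = json_decode($bad, true);   // null, because the text is invalid
$code    = json_last_error();         // non-zero error code
$message = json_last_error_msg();     // human-readable reason (PHP 5.5+)

// One-off patch: escape the inner quotes, then decode again.
$fixed  = str_replace('"hot spots.""', '\\"hot spots.\\""', $bad);
$record = json_decode($fixed, true);
```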
I am new to PHP. I want to build a page cache in PHP: query data from MySQL and store it in JSON format.
I have several questions:
Which type of file should I store it in: .json, .txt or .cache? I also need to use json_decode to load the data back into the page.
I want to use a cron job to run many MySQL queries and write the results into one JSON file. Which functions should I use to write: fopen, fwrite, file_get_contents or something else? (The writes should not overwrite existing data but append to it. I will delete the file and regenerate it at the next cron run.)
If several writes go into one JSON file (10 or more MySQL queries writing to the same file at the same time, each child formatted like {name: ".$row['name']."}), how do I add the opening { and closing } so that the result is standard JSON?
{ //how to add this one
{name: ".$row['name']."}
{name: ".$row['name']."}
// many name from 10 more mysql queries
} //and this one
Thanks.
It's json_encode()
json_encode() — Returns the JSON representation of a value
<?php
$arr = array ('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5);
echo json_encode($arr);
?>
which type of file should I store
It doesn't matter. There is no fixed extension, but I would pick .json just to make it clear what the file is supposed to contain.
what write code should I choose?
Just use file_put_contents to put the JSON string (see next section) into a file.
each json child format like
You really do not want to use that method. It might work for a while, but becomes very complex when you need to handle things like quoting and special-character escapes. Instead of re-inventing the wheel, use PHP's built-in JSON functions for this.
Create the data-structure you want using PHP's strings, numbers, and arrays, and then rely on json_encode to turn it into a string.
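A minimal sketch of that approach, assuming each query's rows have already been fetched into arrays with a name key. The helper name writeJsonCache is hypothetical:

```php
<?php
// Hypothetical helper: merge the rows from several queries and write one JSON file.
function writeJsonCache($path, array $resultSets)
{
    $names = [];
    foreach ($resultSets as $rows) {
        foreach ($rows as $row) {
            $names[] = ['name' => $row['name']];
        }
    }
    // json_encode supplies the outer brackets, the quoting and the escaping.
    // LOCK_EX keeps two jobs from interleaving their writes.
    return file_put_contents($path, json_encode($names), LOCK_EX) !== false;
}
```

Collecting everything first and writing once avoids having to patch the opening and closing brackets by hand.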
The main thing to be careful of is that depending on how your php array() looks, you might get JSON [] versus {}.
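A short illustration of when json_encode produces each form (the sample values are made up):

```php
<?php
$list   = json_encode(['x', 'y']);            // keys 0,1 in order      -> ["x","y"]
$map    = json_encode(['a' => 1, 'b' => 2]);  // string keys            -> {"a":1,"b":2}
$gap    = json_encode([1 => 'x', 2 => 'y']);  // keys do not start at 0 -> {"1":"x","2":"y"}
$empty  = json_encode([]);                    // empty array            -> []
$forced = json_encode([], JSON_FORCE_OBJECT); // force an object        -> {}
```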
As for saving the file as .txt or .json, it won't make a difference.
I think the focal point of this all lies in the json_encode page. Here's the example from that page:
This code:
<?php
$arr = array ('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5);
echo json_encode($arr);
?>
Outputs like this:
{"a":1,"b":2,"c":3,"d":4,"e":5}
3. You can use fopen and fwrite to write to your file. The second argument to fopen is the mode; use 'a' for append.
Don't write your own cache because anything you write in PHP will be slower than can be supported by native extensions (like APC or memcached or even MySQL itself!!).
Don't cache as JSON. JSON is not a particularly 'fast' format to serialize. If you're doing caching you don't want to do any serialization at all. Just store it as it is.
MySQL does query caching for you. If performance is a problem, first tune your MySQL queries and database schema. Caching is one of the absolute last optimizations you want to do.
If you want an easy way to cache, make a MySQL table called 'cache' and use that. If you want quick (small) file access, use MySQL (seriously). If you want an even faster cache access use an in-memory cache like APC or memcached.
I'm currently working on a project that has me working with XML a lot. I have to take an XML response, decrypt each text node and then do various tasks with the data. The problem I'm having is taking the response and processing each text node. Originally I was using the XMLToArray library, and that worked fine: I would change the XML into an array and then loop through the array and decrypt the values. However, some of the XML responses I'm dealing with have repeated tags, and the XMLToArray library will only return the last values.
Is there a good way to take an XML response, process all the text nodes and easily put the values into an array that has a similar structure to the response?
Thanks in advance.
I would use SimpleXML.
Here's a small example of using it. It loads and parses XML from http://www.w3schools.com/xml/plant_catalog.xml and then outputs values of "COMMON" and "PRICE" tags of each "PLANT" tag.
$xml = simplexml_load_file('http://www.w3schools.com/xml/plant_catalog.xml');
foreach ( $xml->PLANT as $plantNode ) {
echo $plantNode->COMMON, ' - ', $plantNode->PRICE, "\n";
}
If you have any problems with adapting it to your needs, just give an example of your XML so that we can help with it.
All those XML-to-array libraries are a remnant of the times when PHP 4 would force you to write your own XML parser almost from scratch. In recent PHP versions you have a good set of XML libraries that do the hard work. I particularly recommend SimpleXML (for small files) and XMLReader (for large files). If you still find them complicated, you can try phpQuery.
You might want to give SimpleXML a try. Plus, it comes with PHP by default, so you don't need to install anything.
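As a rough sketch of how SimpleXML can keep repeated tags while each text node is processed: the helper name xmlToArray is hypothetical, the callback stands in for whatever decryption is needed, and attributes and mixed content are ignored.

```php
<?php
// Hypothetical helper: turn a SimpleXMLElement into a nested array.
// Every tag name maps to a list, so repeated tags are all kept.
function xmlToArray(SimpleXMLElement $node, callable $transform)
{
    if ($node->count() === 0) {
        // Leaf element: hand its text to the callback (e.g. a decrypt function).
        return $transform((string) $node);
    }
    $result = [];
    foreach ($node->children() as $name => $child) {
        $result[$name][] = xmlToArray($child, $transform);
    }
    return $result;
}

// Usage sketch, with strtoupper standing in for decryption:
// $xml  = simplexml_load_string($response);
// $data = xmlToArray($xml, 'strtoupper');
```

Wrapping every tag in a list makes the structure uniform, at the cost of an extra [0] when a tag appears only once.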
Check out SimpleXML, it may offer a bit more for what you are looking for.
I need to push some JSON data to my website which I would like to read in PHP. What type of file should I make this? A PHP file with the JSON inside of a variable? I understand how to make a text file with JSON encoded data in it, but how do I get this into PHP? Should I use a PHP include with the JSON-encoded data in it assigned to a variable? Or should I read the file from PHP and put the contents into a variable?
Save your json string as plain text, then you can use:
$file = 'yourfile'; // path to the file that holds the JSON text
$data = file_get_contents($file);
$parsed = json_decode($data);
// compacted:
$parsed = json_decode(file_get_contents($file));
See file_get_contents() and json_decode().
The advantage of doing this (versus storing it in a PHP file then including it) is that now any program or language that understands JSON can read the file.
The question is too vague for a definite "do this" answer, but here are some options and what they might be most suitable for:
Turn the json data into a PHP data structure. If this is a one-time thing (meaning you won't be getting a new json file every day or week or hour), then reading a file (file_get_contents) and parsing JSON (json_decode) for every request is a pretty big waste of resources since that data isn't changing on a regular basis. Just turn JSON key/value objects into PHP associative arrays, JSON strings into PHP strings, etc.
Just serve the json file. If this is data that will just wind up going to the client to be used in javascript anyway, there's no need to do anything special with it on the server, just parse the json on the client.
Put it in a database. This may be a little heavy-handed, but if you really need it in PHP and not just the client, and it is going to be changing or growing on a regular basis, it may be worth it to have something that handles this use case appropriately.
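The first option above can be sketched with var_export, which writes the decoded data out as PHP source that later requests simply include. The helper name jsonToPhpFile and the file name in the usage comment are hypothetical:

```php
<?php
// Hypothetical helper: convert JSON text into a PHP file that returns the data.
function jsonToPhpFile($json, $phpPath)
{
    $data = json_decode($json, true); // true: JSON objects become associative arrays
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        return false; // invalid JSON, nothing is written
    }
    $code = '<?php return ' . var_export($data, true) . ';' . "\n";
    return file_put_contents($phpPath, $code) !== false;
}

// Later requests load the array without parsing any JSON:
// $data = include 'data.php';
```

The conversion runs once, when the JSON arrives, rather than on every request.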