Parsing a Zipped (GZ) JSON file in PHP

With help from the people on Stack Overflow I can now parse JSON from a file and save a value into a database.
However, the file I intend to read from is actually a massive 2GB file. My web server will not hold this file, but it will hold a zipped (.GZ) version of it, which comes to about 80MB.
I believe there is a way to parse JSON from a zipped (.GZ) file. Can anybody help?
I have found the function below, which I believe will do this, but I don't know how to link it to my code.
private function uncompressFile($srcName, $dstName) {
    // Open the gzipped source for reading and the destination for writing
    $sfp = gzopen($srcName, "rb");
    $fp = fopen($dstName, "w");
    // Copy the decompressed data across in 4KB chunks
    while ($string = gzread($sfp, 4096)) {
        fwrite($fp, $string, strlen($string));
    }
    gzclose($sfp);
    fclose($fp);
}
My current PHP code is below and works. It reads a basic small file, JSON-decodes it (the JSON is a series of separate lines, hence the need for FILE_IGNORE_NEW_LINES), and then takes a value and saves it to a MySQL database.
However, I believe I need to somehow combine these two bits of code so I can read a zipped file without exceeding my 100MB of storage on my web server.
$file="CIF_ALL_UPDATE_DAILY_toc-update-sun";
$trains = file($json_filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($trains as $train) {
$json=json_decode($train,true);
foreach ($json as $key => $value) {
$input=$value['main_train_uid'];
$q="INSERT INTO railstptest (main_train_uid) VALUES ('$input')";
$r=mysqli_query($mysql_link,$q);
}
}
}
if (is_null($json)) {
die("Json decoding failed with error: ". json_last_error());
}
mysqli_close($mysql_link);
Many Thanks
EDIT
Here is a short snippet of the JSON. There is a series of these.
I would only want to get a few key values, for example the values G90491 and P20328. A lot of the info I would not need.
{"JsonAssociationV1":{"transaction_type":"Delete","main_train_uid":"G90491","assoc_train_uid":"G90525","assoc_start_date":"2013-09-07T00:00:00Z","location":"EDINBUR","base_location_suffix":null,"diagram_type":"T","CIF_stp_indicator":"O"}}
{"JsonAssociationV1":{"transaction_type":"Delete","main_train_uid":"P20328","assoc_train_uid":"P21318","assoc_start_date":"2013-08-23T00:00:00Z","location":"MARYLBN","base_location_suffix":null,"diagram_type":"T","CIF_stp_indicator":"C"}}

It may be possible to do stream extraction of the file and then use a streaming JSON parser. ZipArchive has getStream, and someone has created a streaming JSON parser for PHP.
You will have to write a listener that inserts the database values as they are found and discards the unnecessary JSON so it does not consume memory.
$zip = new ZipArchive;
$zip->open("file.zip");
$parser = new JsonStreamingParser_Parser(
    $zip->getStream("file.json"),
    new DB_Value_Inserter
);
$parser->parse();
Based on your question, you're working with gzip instead of zip. To get the stream you can use
fopen("compress.zlib://path/to/file.json", "r");
It's difficult to write the DB_Value_Inserter since you haven't provided the format of the JSON you need, but it seems you can probably just override the Listener::value method and write out the string values you receive.
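A minimal sketch of such a listener, assuming the streaming parser exposes key()/value() callbacks as described above (method names and the exact set of required stubs vary between versions of the library, so treat this as an illustration rather than a drop-in class):
// Hypothetical listener: remembers the last key seen and inserts the
// values we care about, discarding everything else.
class DB_Value_Inserter implements JsonStreamingParser_Listener {
    private $mysql_link;
    private $current_key;

    public function __construct($mysql_link) {
        $this->mysql_link = $mysql_link;
    }

    public function key($key) {
        $this->current_key = $key;
    }

    public function value($value) {
        if ($this->current_key === 'main_train_uid') {
            $input = mysqli_real_escape_string($this->mysql_link, $value);
            mysqli_query($this->mysql_link,
                "INSERT INTO railstptest (main_train_uid) VALUES ('$input')");
        }
    }

    // The remaining interface methods can be empty stubs, e.g.:
    public function start_document() {}
    public function end_document() {}
    public function start_object() {}
    public function end_object() {}
    public function start_array() {}
    public function end_array() {}
}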

PHP has compression wrappers that can help with opening and reading lines from compressed files. One is for reading gzip files:
$gzipFile = 'CIF_ALL_UPDATE_DAILY_toc-update-sun.gz';
$trains = new SplFileObject("compress.zlib://{$gzipFile}", 'r');
$trains->setFlags(SplFileObject::DROP_NEW_LINE | SplFileObject::READ_AHEAD
| SplFileObject::SKIP_EMPTY);
Because SplFileObject is iterable, you can keep your outer foreach loop the way it is. Of course, fgets() remains an alternative to using SplFileObject.
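For example, the outer loop from the question then works unchanged on the decompressed stream:
foreach ($trains as $train) {
    $json = json_decode($train, true);
    // ... same per-line processing and INSERT as in the question ...
}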

Related

MAMP strange behaviour: PHP reading an external file over http:// is very slow, but over https:// it is quick

I have a simple PHP script that reads a remote file line by line and then JSON-decodes it. On the production server all works OK, but on my local machine (MAMP stack, OS X) the PHP hangs. It is very slow, and takes more than 2 minutes to produce the JSON file. I think it's json_decode() that is freezing. Why only on MAMP?
I think it's stuck in the while loop, because I can't show the final $str variable that is the result of all the lines.
In case you are wondering why I need to read the file line by line, it's because in the real scenario the remote JSON file is 40MB of text. Reading it this way is the only approach that gave me a good result, but do you have any suggestions?
Is there a configuration in php.ini that could help solve this?
// The path to the JSON file
$fileName = 'http://www.xxxx.xxx/response-single.json';
// Open the file in "read only" mode.
$fileHandle = fopen($fileName, "r");
// If we failed to get a file handle, throw an Exception.
if ($fileHandle === false) {
    error_log("error handle");
    throw new Exception('Could not get file handle for: ' . $fileName);
}
// While we haven't reached the end of the file.
$str = "";
while (!feof($fileHandle)) {
    // Read the current line in.
    $line = fgets($fileHandle);
    $str .= $line;
}
// Finally, close the file handle.
fclose($fileHandle);
$json = json_decode($str, true); // decode the JSON into an associative array
Thanks for your time.
I found the cause: it is the path protocol.
With
$fileName = 'http://www.yyy/response.json';
it freezes the server for 1 to 2 minutes.
I moved the file to another server with the https protocol, used
$fileName = 'https://www.yyy/response.json';
and it works.
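If you need to keep the http:// URL, one thing worth testing (a sketch using standard stream-context options, not a confirmed fix for the MAMP issue) is to cap how long the stream may block, so a slow protocol or DNS problem fails fast instead of hanging:
$context = stream_context_create(array('http' => array('timeout' => 10)));
$fileHandle = fopen($fileName, 'r', false, $context);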

Which is the fastest way to remove a list of rows from a huge log file using PHP?

I need to remove various useless log rows from a huge log file (200 MB):
/usr/local/cpanel/logs/error_log
The useless log rows are in the array $useless.
The way I am doing it is:
$working_log = "/usr/local/cpanel/logs/error_log";
foreach ($useless as $row) {
    if ($row != "") {
        file_put_contents($working_log,
            str_replace($row, "", file_get_contents($working_log)));
    }
}
I need to remove about 65,000 rows from the log file; the code above does the job, but it works slowly, about 0.041 sec to remove each row.
Do you know a faster way to do this job using PHP?
If the file can be loaded in memory twice (it seems it can, if your code works), then you can remove all the strings in $useless with a single str_replace() call.
The documentation of str_replace() function explains how:
If search is an array and replace is a string, then this replacement string is used for every value of search.
$working_log = "/usr/local/cpanel/logs/error_log";
file_put_contents(
    $working_log,
    str_replace($useless, '', file_get_contents($working_log))
);
When the file becomes too large to be processed by the code above, you have to take a different approach: create a temporary file, read each line from the input file, and either write it to the temporary file or ignore it. At the end, move the temporary file over the source file:
$working_log = "/usr/local/cpanel/logs/error_log";
$tempfile = "/usr/local/cpanel/logs/error_log.new";
$fin = fopen($working_log, "r");
$fout = fopen($tempfile, "w");
while (!feof($fin)) {
    $line = fgets($fin);
    if (!in_array($line, $useless)) {
        fputs($fout, $line);
    }
}
fclose($fin);
fclose($fout);
// Move the current log out of the way (keep it as backup)
rename($working_log, $working_log . ".bak");
// Put the new file in its place.
rename($tempfile, $working_log);
You have to add error handling (fopen() and fputs() may fail for various reasons) and code, or human intervention, to remove the backup file.
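One more note on speed, since that is the question: in_array() scans the whole 65,000-entry array for every line of the log. If the entries in $useless are unique strings, flipping them into array keys once and testing with isset() turns each lookup into a hash probe; a sketch of the changed loop:
$useless_keys = array_flip($useless);
while (!feof($fin)) {
    $line = fgets($fin);
    if (!isset($useless_keys[$line])) {
        fputs($fout, $line);
    }
}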

Uploading CSV or XML for edit and return

I haven't done much coding in the way of HTML5 and PHP before, as I've always used Python and only created in-house system applications instead of web-based apps.
I've tried to find information that might assist me with my latest task, but could not.
I would like users to be able to upload a CSV or XML file (I haven't decided on the format yet) that contains SKUs in one field and prices in another (columns).
I then want the user to be able to specify a set of variables and have the document edited to that effect.
I'm not sure if I would have to use MySQL to achieve this, and I have no experience with it, so if I can at all avoid it then that would be preferable.
Any advice or suggestions on material for doing this, or even actual examples of how this might be achieved, would go a long way to increasing my understanding of how to approach this task.
Kind regards,
Lewis
You can use the fgetcsv() and fputcsv() functions to manipulate a CSV file in PHP.
For XML files you can use the SimpleXML parser, as in the brief sketch below.
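A brief SimpleXML sketch (the file path and the item, sku and price element names are hypothetical; adjust them to your actual schema):
$xml = simplexml_load_file('/tmp/prices.xml');
if ($xml === false) {
    die('could not parse XML');
}
foreach ($xml->item as $item) {
    // child elements are accessible as properties
    echo $item->sku . ' => ' . $item->price . "\n";
}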
I will give an example for CSV files in PHP.
Reading a CSV file:
if(file_exists("/tmp/my_file.csv")){
$filex = fopen("/tmp/my_file.csv","r");
}
else{
echo "file not found";
}
$data = array();
while(!feof($file))
{
$data[] = fgetcsv($file);
}
fclose($filex);
//now you can manipulate $data as you wish
Writing to a CSV file:
$list = array(
    "abcd,efgh,ijkl,mnop",
    "qrst,uvwx,yzab,cdef"
);
$file = fopen("my_file.csv", "w");
foreach ($list as $line) {
    fputcsv($file, explode(',', $line));
}
fclose($file);

Extract a file from a ZIP string

I have a Base64 string of a zip file that contains one single XML file.
Any ideas on how I could get the contents of the XML file without having to deal with files on disk?
I would very much like to keep the whole process in memory, as the XML is only 1-5k.
It would be annoying to have to write the zip, extract the XML, then load it up and delete everything.
I had a similar problem; I ended up doing it manually.
https://www.pkware.com/documents/casestudies/APPNOTE.TXT
This extracts a single file (just the first one), with no error/CRC checks, and assumes deflate was used.
// zip in a string
$data = file_get_contents('test.zip');
// magic
$head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data,0,30));
$filename = substr($data,30,$head['namelen']);
$raw = gzinflate(substr($data,30+$head['namelen']+$head['exlen'],$head['csize']));
// first file uncompressed and ready to use
file_put_contents($filename,$raw);
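Since $raw already holds the uncompressed XML in memory at this point, you can also skip writing it to disk and parse it directly:
$xml = new SimpleXMLElement($raw);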
After some hours of research I think it's, surprisingly, not possible to handle a zip without a temporary file:
The first try, with php://memory, will not work, because it's a stream that cannot be read by functions like file_get_contents() or ZipArchive::open(). In the comments there is a link to the PHP bug tracker about the lack of documentation of this problem.
ZipArchive has stream support via ::getStream(), but as stated in the manual it only supports reading operations on an opened file. So you cannot build an archive on the fly with it.
The zip:// wrapper is also read-only: Create ZIP file with fopen() wrapper
I also made some attempts with the other PHP wrappers/protocols, like
file_get_contents("zip://data://text/plain;base64,{$base64_string}#test.txt")
$zip->open("php://filter/read=convert.base64-decode/resource={$base64_string}")
$zip->open("php://filter/read=/resource=php://memory")
but for me they don't work at all, even though there are examples like that in the manual. So you have to swallow the pill and create a temporary file.
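A minimal sketch of the temporary-file route (decode, write, open, clean up):
$tmp = tempnam(sys_get_temp_dir(), 'zip');
file_put_contents($tmp, base64_decode($base64_string));
$zip = new ZipArchive;
if ($zip->open($tmp) === true) {
    // contents of the first (and only) file in the archive
    $xml = $zip->getFromIndex(0);
    $zip->close();
}
unlink($tmp);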
Original answer:
This covers just the means of temporary storage. I hope you can manage the zip handling and the parsing of the XML on your own.
Use the php://memory (doc) wrapper. Be aware that this is only useful for small files, because they are stored in memory, obviously. Otherwise use php://temp instead.
<?php
// the decoded content of your zip file
$text = 'base64 _decoded_ zip content';
// this will empty the memory and append your zip content
$written = file_put_contents('php://memory', $text);
// bytes written to memory
var_dump($written);
// new instance of the ZipArchive
$zip = new ZipArchive;
// success of the archive reading
var_dump(true === $zip->open('php://memory'));
toster-cx had it right, and you should award him the points. This is an example where the zip comes from a SOAP response as a byte array (binary); the content is an XML file:
$objResponse = $objClient->__soapCall("sendBill", array(parameters));
$fileData = unzipByteArray($objResponse->applicationResponse);
header("Content-type: text/xml");
echo $fileData;

function unzipByteArray($data) {
    /* this first record is a directory entry */
    $head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data, 0, 30));
    $filename = substr($data, 30, $head['namelen']);
    $if = 30 + $head['namelen'] + $head['exlen'] + $head['csize'];
    /* this second one is the actual file */
    $head = unpack("Vsig/vver/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr($data, $if, 30));
    $raw = gzinflate(substr($data, $if + $head['namelen'] + $head['exlen'] + 30, $head['csize']));
    /* you can create a loop and continue decompressing if there were more files */
    return $raw;
}
If you know the file name inside the .zip, just do this:
<?php
$xml = file_get_contents('zip://./your-zip.zip#your-file.xml');
If you have a plain string, just do this:
<?php
$xml = file_get_contents('compress.zlib://data://text/plain;base64,'.$base64_encoded_string);
[edit] Documentation is here: http://www.php.net/manual/en/wrappers.php
From the comments: if you don't have a base64-encoded string, you need to urlencode() it before using the data:// wrapper.
<?php
$xml = file_get_contents('compress.zlib://data://text/plain,'.urlencode($text));
[edit 2] Even though you already found a solution with a file, here is a solution (to test) I didn't see in your answer:
<?php
$zip = new ZipArchive;
$zip->open('data://text/plain,' . urlencode($base64_decoded_string));
$zip2 = new ZipArchive;
$zip2->open('data://text/plain;base64,' . urlencode($base64_string));
If you are running on Linux and have administration of the system, you could mount a small ramdisk using tmpfs; the standard file_get/put and ZipArchive functions will then work, except they do not write to disk, they write to memory.
To have it permanently ready, the fstab entry is something like:
tmpfs /media/ramdisk tmpfs nodev,nosuid,noexec,nodiratime,size=2M 0 0
Set the size and location so they suit you.
Using PHP to mount a ramdisk and remove it after use (if it even has the privileges) is probably less efficient than just writing to disk, unless you have a massive number of files to process in one go.
This is not a pure PHP solution, though, nor is it portable.
You will still need to remove the "files" after use, or have the OS clean up old files.
They will of course not persist over reboots or remounts of the ramdisk.
If you want to read the content of a file inside a zip, such as an XML, you should look at this. I use it to count the words in a .docx (which is a zip):
if (!function_exists('docx_word_count')) {
    function docx_word_count($filename)
    {
        $zip = new ZipArchive();
        if ($zip->open($filename) === true) {
            if (($index = $zip->locateName('docProps/app.xml')) !== false) {
                $data = $zip->getFromIndex($index);
                $zip->close();
                $xml = new SimpleXMLElement($data);
                return $xml->Words;
            }
            $zip->close();
        }
        return 0;
    }
}
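Usage is then simply (the path is hypothetical):
echo docx_word_count('/path/to/document.docx');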
The idea from toster-cx is pretty useful for approaching malformed zip files, too!
I had one with missing data in the header, so I had to extract the central directory file header using his method:
$CDFHoffset = strpos( $zipFile, "\x50\x4b\x01\x02" );
$CDFH = unpack( "Vsig/vverby/vverex/vflag/vmeth/vmodt/vmodd/Vcrc/Vcsize/Vsize/vnamelen/vexlen", substr( $zipFile, $CDFHoffset, 46 ) );

JSON output to a file, readable again in PHP

I want to be able to transfer a PHP array from one server to another via FTP in the form of a file.
The receiving server needs to be able to open said file, read its contents, and use the array provided.
I've thought about going about this in two ways. One is writing a PHP file on server 1 containing PHP code for the array, then simply loading this file on server 2; however, writing that file gets tricky when the depth of the array is unknown.
So I thought about writing the array to the file JSON-encoded, but I don't know how the second server could open and read that data.
Could I simply do:
$jsonArray = json_encode($masterArray);
$fh = fopen('thefile.txt', 'w');
fwrite($fh, $jsonArray);
fclose($fh);
Then, on the other server, open the data into a variable:
$data = json_decode( include('thefile.txt') );
Has anyone had any experience of this before?
From the first server, connect to the second server by FTP and put the contents into a file:
$jsonArray = json_encode($masterArray);
$stream = stream_context_create(array('ftp' => array('overwrite' => true)));
file_put_contents('ftp://user:pass@host/folder/thefile.txt', $jsonArray, 0, $stream);
Use file_get_contents() on the second server:
$data = json_decode( file_get_contents('/path/to/folder/thefile.txt') );
If you're only going to be interested in reading the file using PHP, have you thought about using serialize() and unserialize()?
See http://php.net/manual/en/function.serialize.php
It's also probably faster than json_encode() / json_decode() (see http://php.net/manual/en/function.serialize.php#103761).
The PHP function that you're looking for is file_get_contents():
$masterArray = array('Test','Test2','Test3');
$jsonArray= json_encode($masterArray);
$fh = fopen('thefile.txt' , 'w');
fwrite($fh, $jsonArray);
fclose($fh);
Then on the other server:
$masterArray = json_decode( file_get_contents('thefile.txt') );
var_dump($masterArray);
To "transfer" the array between servers, using a file as medium, you found a nice solution by using json_encode and json_decode. The serialize and unserialize functions would perform the same goal nicely.
$my_array = array('contents', 'et cetera');
$serialized = serialize($my_array);
$json_encoded = json_encode($my_array);
// here you send the files to the other server (you said you know how to do this)
// for example:
file_put_contents($serialized_destination, $serialized);
file_put_contents($json_encoded_destination, $json_encoded);
In the receiving server, you just need to read the file contents and apply the corresponding "parse" function:
$serialized = file_get_contents($serialized_destination);
$json_encoded = file_get_contents($json_encoded_destination);
$my_array1 = unserialize($serialized);
$my_array2 = json_decode($json_encoded, true); // true => decode to an associative array
