I have an XML data source URL from which I am reading the data using fread. It contains student information, from which I am extracting the grades and compiling them into an array.
The problem is that when I run this script locally, it works fine and all the grades are correctly collected in the array. However, when I run it on a shared server, I get some incorrectly read grades in addition to the normal grade names, for example "ergarten". The complete grade name "Kindergarten" is also recorded in the array, which means there is a problem reading only specific elements.
The first suspect I have in mind is the fread byte length. I have changed it to 8192, but without luck.
Here is the relevant code chunk from the php file:
if (!($xml_parser = xml_parser_create())) die("Couldn't create parser.");
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
xml_set_character_data_handler($xml_parser, "characterDataHandler");
while ($data = fread($fp, 8192)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        break;
    }
}
xml_parser_free($xml_parser);
Any thoughts?
I found the problem and fixed it myself.
The problem was that in the loop where the data was being read in chunks using fread, I was simultaneously feeding that data to the XML parser, and that was causing the problem, since the chunks do not always contain complete tags. I moved the parsing out of that loop so it runs only after all the data has been read by the script.
That solved the problem.
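For reference, a minimal sketch of that approach, assuming the same $fp handle and handler function names as in the code above:

// accumulate the whole response first, then parse once
$data = '';
while (!feof($fp)) {
    $data .= fread($fp, 8192);
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElementHandler", "endElementHandler");
xml_set_character_data_handler($xml_parser, "characterDataHandler");

// single parse call, so no tag is ever split across chunk boundaries
if (!xml_parse($xml_parser, $data, true)) {
    die(xml_error_string(xml_get_error_code($xml_parser)));
}
xml_parser_free($xml_parser);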
So I have a LOT (millions) of records which I am trying to process. I've tried MongoDB and Neo4j, and both simply grind my dual-core Ubuntu box to a halt.
I am wondering (and I don't believe there is) if there is any way to store PHP arrays in a file but only load one array into memory. So for example:
<?php
$loaded = array('hello','world');
$ignore_me = array('please','ignore');
$ignore_me2 = array('please','ignore','again');
?>
So effectively I could access the $loaded array while the others aren't loaded into memory (even though they're in the same file)? I know about fopen/fread, but that tends to treat the file as one general block of text.
If (as I suspect) the answer is no: how would something like a NoSQL database not need to a) create a file per record and b) load everything into memory? I know Neo4j uses Java, but PHP should be able to match that!
Did you consider relational databases such as MySQL, PostgreSQL, or MS SQL Server?
I see that you tried MongoDB, a document-oriented database, and Neo4j, a graph database.
I know that NoSQL is a big trend, but I tried NoSQL with my collections of millions of records and it performed so badly that I switched back to relational SQL.
If you still insist on going with NoSQL, try Redis and Memcached; they are in-memory stores.
You could use PHP Streams to read/write to a file.
Read a file and convert it to an array:
$content = file_get_contents('/tmp/file.csv'); // file.csv contains a,b,c
$csv = explode(",", $content); // $csv is now array('a', 'b', 'c')
Write to file
$line = array("a", "b", "c");
// create stream
$file = fopen("/tmp/file.csv","w");
// add line as csv
fputcsv($file, $line);
// close stream
fclose($file);
You can also loop to add lines to a CSV file, and loop to retrieve the lines.
https://secure.php.net/manual/en/function.fputcsv.php
You can retrieve multiple lines with fgetcsv, which keeps a pointer to the next line to access as well:
https://secure.php.net/manual/en/function.fgetcsv.php
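For example, a minimal sketch (the file path and values are just placeholders) that writes a few lines with fputcsv and reads them back with fgetcsv:

$rows = array(
    array("a", "b", "c"),
    array("d", "e", "f"),
);

// write one line at a time
$file = fopen("/tmp/file.csv", "w");
foreach ($rows as $row) {
    fputcsv($file, $row);
}
fclose($file);

// read one line at a time; fgetcsv advances the file pointer for you
$file = fopen("/tmp/file.csv", "r");
while (($row = fgetcsv($file)) !== false) {
    print_r($row); // array('a', 'b', 'c'), then array('d', 'e', 'f')
}
fclose($file);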
I am trying to extract user email addresses from files on my server. The problem is that most of the files are .txt, but some are CSV files with a .txt extension. When I try to read and extract, I am not able to read the CSV files that have the .txt extension. Here is my code:
<?php
$handle = fopen('2.txt', "r");
$pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i';
while (!feof($handle)) {
    $string = fgets($handle);
    preg_match_all($pattern, $string, $matches);
    foreach ($matches[0] as $match) {
        echo $match;
        echo '<br><br>';
    }
}
fclose($handle);
?>
I have tried to use this code for that. The program reads the whole file at once when it is a CSV, but reads line by line when it is a text file. There are thousands of files, so it is difficult to identify which is which.
Kindly suggest what I should do to resolve this problem. If there is a solution that can read any format, that would be awesome.
Well, your files are different, so you will have to take a different approach for each of them. In more general terms this is usually called adapting, and it is mostly provided using the Adapter design pattern.
Should you use the Adapter design pattern, you would have code inspecting the extension of the file to be opened and a switch over either txt or csv. Based on the value you would retrieve a TxtParser or a CsvParser respectively.
However, before diving deep into this territory you might want to have a look at the files first. I cannot say this for sure without seeing them, but you can: if the contents of the text and CSV files have the same structure, then a very simple approach is to change the extension of all files to either txt or csv and process them using the same logic, knowing that files with the same extension will now be processed in the same manner.
But from what I understood, the file structures actually differ. So to keep your code concise, use the adapter pattern: have two separate classes/functions for parsing and another one on top of them that chooses the right parsing function (this top function would actually be a form of a strategy) and runs it, as sketched below.
Either way, I very much doubt there is a ready-made solution for the problem you are facing, as a file structure is mostly your own.
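A minimal sketch of that idea; the class and function names (EmailParser, TxtParser, CsvParser, make_parser) are hypothetical, not from any library:

interface EmailParser {
    public function parse($path); // returns an array of email addresses
}

class TxtParser implements EmailParser {
    public function parse($path) {
        $emails = array();
        $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i';
        foreach (file($path) as $line) {
            preg_match_all($pattern, $line, $matches);
            $emails = array_merge($emails, $matches[0]);
        }
        return $emails;
    }
}

class CsvParser implements EmailParser {
    public function parse($path) {
        $emails = array();
        $pattern = '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}/i';
        $handle = fopen($path, 'r');
        while (($row = fgetcsv($handle)) !== false) {
            foreach ($row as $field) {
                preg_match_all($pattern, $field, $matches);
                $emails = array_merge($emails, $matches[0]);
            }
        }
        fclose($handle);
        return $emails;
    }
}

// the "strategy" part: pick a parser based on the file extension
function make_parser($path) {
    switch (strtolower(pathinfo($path, PATHINFO_EXTENSION))) {
        case 'csv':
            return new CsvParser();
        default:
            return new TxtParser();
    }
}

$parser = make_parser('2.txt');
print_r($parser->parse('2.txt'));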
OK, so the problem is when the CSV file has a line that is too long. Given that restriction, I suggest you use the example from php.net. Here it is:
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
    while (($buffer = fgets($handle, 4096)) !== false) {
        echo $buffer;
        // do your searching here
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail\n";
    }
    fclose($handle);
}
I generate JSON files which I load into datatables, and these JSON files can contain thousands of rows from my database. To generate them, I need to loop through every row in the database and add each database row as a new row in the JSON file. The problem I'm running into is this:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 262643 bytes)
What I'm doing is I get the JSON file with file_get_contents($json_file) and decode it into an array, then I add a new row to the array, then encode the array back into JSON and write it back to the file with file_put_contents($json_file).
Is there a better way to do this? Is there a way I can prevent the memory increasing with each loop iteration? Or is there a way I can clear the memory before it reaches the limit? I need the script to run to completion, but with this memory problem it barely gets up to 5% completion before crashing.
I can keep rerunning the script, and each time I rerun it, it adds more rows to the JSON file. So if this memory problem is unavoidable, is there a way to automatically rerun the script numerous times until it's finished? For example, could I detect the memory usage, detect when it's about to reach the limit, then exit the script and restart it? I'm on wpengine so they won't allow security-risky functions like exec().
So I switched to using CSV files and it solved the memory problem. The script runs vastly faster too. jQuery DataTables doesn't have built-in support for CSV files, so I wrote a function to convert the CSV file to JSON:
public function csv_to_json($post_type) {
    $data = array(
        "recordsTotal"    => $this->num_rows,
        "recordsFiltered" => $this->num_rows,
        "data"            => array()
    );

    if (($handle = fopen($this->csv_file, 'r')) === false) {
        die('Error opening file');
    }

    $headers = fgetcsv($handle, 1024, "\t");

    $complete = array();
    while ($row = fgetcsv($handle, 1024, "\t")) {
        $complete[] = array_combine($headers, $row);
    }
    fclose($handle);

    $data['data'] = $complete;
    file_put_contents($this->json_file, json_encode($data, JSON_PRETTY_PRINT));
}
So the result is I create a CSV file and a JSON file much faster than creating a JSON file alone, and there are no issues with memory limits.
Personally as I said in the comments, I would use CSV files. They have several advantages.
you can read / write one line at a time so you only manage the memory for one line
you can just append new data into the file.
PHP has plenty of built-in support for them, using either fputcsv() or the SPL file objects (see the sketch below)
you can load them directly into the database using "Load Data Infile"
http://dev.mysql.com/doc/refman/5.7/en/load-data.html
The only cons are
keep the same schema through the whole file
no nested data structures
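A minimal sketch of the one-line-at-a-time reading and appending mentioned above (the file name and row contents are just placeholders):

// append a single row without loading the rest of the file
$fh = fopen('/tmp/export.csv', 'a');
fputcsv($fh, array(42, 'some name', 'some value'));
fclose($fh);

// read it back one row at a time, so only one line is in memory at once
$fh = fopen('/tmp/export.csv', 'r');
while (($row = fgetcsv($fh)) !== false) {
    // process $row here
}
fclose($fh);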
The issue with JSON is (as far as I know) that you have to keep the whole thing in memory as a single data set. Therefore you cannot stream it (line by line) like a normal text file. There is really no solution besides limiting the size of the JSON data, which may or may not even be easy to do. You can increase the memory some, but that is just a temporary fix if you expect the data to continue to grow.
We use CSV files in a production environment and I regularly deal with datasets that are 800k or 1M rows. I've even seen one that was 10M rows. We have a single table of 60M rows ( MySql ) that is populated from CSV uploads. So it will work and be robust.
If you're set on JSON, then I would just come up with a fixed number of rows that works and design your code to only run that many rows at a time. It's impossible for me to guess how to do that without more details.
This must be relatively easy, but I'm struggling to find a solution. I receive data using a proprietary network protocol with encryption, and at the end the entire received content ends up in a variable. The content is actually that of a CSV file, and I need to parse this data.
If this were a regular file on disk, I could use fgetcsv; if I could somehow break the content into individual records, I could use str_getcsv - but how can I break this file into records? Simple reading until a newline will not work, because CSV can contain values with line breaks in them. Below is an example set of data:
ID,SLN,Name,Address,Contract no
123,102,Market 1a,"Main street, Watertown, MA, 02471",16
125,97,Sinthetics,"Another address,
Line 2
City, NY 10001",16
167,105,"Progress, ahead",,18
All of this data is held inside one variable - and I need to parse it.
Of course, I can always write this data into a temporary file on disk and then read/parse it using fgetcsv, but that seems extremely inefficient to me.
If fgetcsv works for you, consider this:
file_put_contents("php://temp",$your_data_here);
$stream = fopen("php://temp","r");
// $result = fgetcsv($stream); ...
For more on php://temp, see the php:// wrapper
I am a PHP programmer and currently I am working with files. I have to parse the data and insert it into a MySQL database. Since it is a large amount of data, PHP is unable to load or parse the file. I am getting a memory error even though I have increased memory_limit up to 1500MB.
FATAL: emalloc(): Unable to allocate 456185835 bytes
My text file contains text and XML data. I have to parse the XML data out of the text file.
e.g.: <ajax>some text goes here</ajax> non relativ text <ajax>other content</ajax>
In the above example I have to parse the content inside the <ajax> tags. If anyone can give some advice on separating each tag into an individual file (e.g. 1.txt, 2.txt), that would be great (Perl, C, or shell scripting, etc.).
Cough... a 1500 MB memory limit is a sure sign you have gone off the rails.
Where are you getting your file? I assume (given the size) that this is a local file. If you are trying to load the file into a string using file_get_contents(), it is worth noting that the docs are wrong and that said function does not in fact use memory-mapped I/O (cf. bug 52802). So this is not going to work for you.
What you might try is instead falling back to more C-like (but still PHP) constructs, in particular fopen(), fseek(), and fread(). If the file is of a known structure with newlines, you might also consider fgets().
These should allow you to read in bytes in chunks into a reasonable size buffer from which you can do your processing. Since it looks like you are processing tagged strings, you will have to play the usual games of keeping multiple buffers around in which you can accumulate data until processable. This is fairly standard stuff covered in most introductions to, e.g., stream processing in C.
Note that in PHP (or any other language for that matter), you are also going to have to potentially consider issues of string encoding because, in general, it is no longer the case that 1 byte == 1 character (cf. Unicode).
As you insinuate, PHP may well not be the best language for this task (though it certainly can do it). But your problem isn't really a language-specific one; you are running into a fundamental limitation of handling large files without memory-mapping.
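As a rough illustration of that buffering idea (the <ajax> tag comes from the question; the file name and chunk size are placeholder choices), something along these lines could extract each tagged block while reading the file in fixed-size chunks:

$fh = fopen('bigfile.txt', 'r');
$buffer = '';

while (!feof($fh)) {
    $buffer .= fread($fh, 131072); // read 128 KB per iteration

    // extract every complete <ajax>...</ajax> block currently in the buffer
    while (($start = strpos($buffer, '<ajax>')) !== false
            && ($end = strpos($buffer, '</ajax>', $start)) !== false) {
        $content = substr($buffer, $start + 6, $end - $start - 6); // 6 = strlen('<ajax>')
        // process $content here, e.g. write it to its own numbered file
        $buffer = substr($buffer, $end + 7); // 7 = strlen('</ajax>')
    }

    // drop text we no longer need, but keep anything that might be the start
    // of a tag split across the chunk boundary
    $start = strpos($buffer, '<ajax>');
    $buffer = ($start !== false) ? substr($buffer, $start) : substr($buffer, -7);
}
fclose($fh);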
You can actually parse XML with PHP a small block at a time, so you don't actually require much RAM at all:
set_time_limit(0);

define('__BUFFER_SIZE__', 131072);
define('__XML_FILE__', 'pf_1360591.xml');

function elementStart($p, $n, $a) {
    //handle opening of elements
}

function elementEnd($p, $n) {
    //handle closing of elements
}

function elementData($p, $d) {
    //handle cdata in elements
}

$xml = xml_parser_create();
xml_parser_set_option($xml, XML_OPTION_TARGET_ENCODING, 'UTF-8');
xml_parser_set_option($xml, XML_OPTION_CASE_FOLDING, 0);
xml_parser_set_option($xml, XML_OPTION_SKIP_WHITE, 1);
xml_set_element_handler($xml, 'elementStart', 'elementEnd');
xml_set_character_data_handler($xml, 'elementData');

$f = fopen(__XML_FILE__, 'r');
if ($f) {
    while (!feof($f)) {
        $content = fread($f, __BUFFER_SIZE__);
        xml_parse($xml, $content, feof($f));
        unset($content);
    }
    fclose($f);
}