I have a text file containing 500k lines of data.
I am running a loop to read it and store some info, something like this:
$file = fopen("top-1-500000.txt", "r") or exit("Unable to open file!");
while (!feof($file)) {
    // ...some function that reads a line and produces $value...
    mysql_query("INSERT INTO table (name) VALUES ('$value')");
}
fclose($file);
The issue is that when the loop stops in the middle, I have to check the MySQL database manually and delete the data that was already read from the text file, so that the loop does not insert it again when it restarts from the first line. With multiple files this is a huge effort.
An alternative way to load a large file is to use MySQL's LOAD DATA INFILE functionality.
Example:
LOAD DATA INFILE 'top-1-500000.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
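If you want to run that statement from PHP rather than from the mysql client, a minimal sketch could look like the following (assuming a mysqli connection in $mysqli, which is not part of the original question, and that the file sits where the MySQL server is allowed to read it):

$sql = "LOAD DATA INFILE 'top-1-500000.txt' INTO TABLE tbl_name
        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
        LINES TERMINATED BY '\\r\\n'
        IGNORE 1 LINES";

// one statement imports the whole file, so there is no PHP loop to restart or resume
if (!$mysqli->query($sql)) {
    exit('Import failed: ' . $mysqli->error);
}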
I'm assuming by "stopped in the middle" you mean that the script is timing out. You should use set_time_limit to prevent your script from timing out (I'm assuming your server config allows you to do this).
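A minimal sketch of that, reusing the loop from the question (and assuming the server config really does allow set_time_limit() to take effect):

set_time_limit(0); // 0 removes the execution time limit for this script

$file = fopen("top-1-500000.txt", "r") or exit("Unable to open file!");
while (!feof($file)) {
    // ...read a line and derive $value, as in the question...
    mysql_query("INSERT INTO table (name) VALUES ('$value')");
}
fclose($file);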
If you use file() to read the file, you can keep a counter in the for loop that counts how many lines have been processed; when you need to resume from where the loop stopped, start from that counter.
$file_array = file("top-1-500000.txt");
for ($i = 0; $i < count($file_array); $i++) {
    // ...code here...
    mysql_query("INSERT INTO table (name) VALUES ('$value')");
}
Also, this is assuming this isn't an absolutely massive file
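A minimal sketch of that resume idea, assuming the counter is persisted to a small progress file (progress.txt is a name made up for this example):

$file_array = file("top-1-500000.txt", FILE_IGNORE_NEW_LINES);
$start = file_exists("progress.txt") ? (int) file_get_contents("progress.txt") : 0;

for ($i = $start; $i < count($file_array); $i++) {
    $value = $file_array[$i]; // ...or whatever per-line processing you do...
    mysql_query("INSERT INTO table (name) VALUES ('$value')"); // escaping omitted, as in the question
    file_put_contents("progress.txt", $i + 1); // next line to process if the script dies here
}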
You are reading the file line by line, so
$line = fgets($handle);
will read up to the end of the current line (or up to length - 1 bytes if you pass the optional length argument).
There's a function called fseek with which you can navigate the internal file pointer and read from that place on.
A possible solution is to keep track of your progress: after each line, save the byte offset returned by ftell() (in the database or a small file), and when (if) the loop dies in the middle, pass that offset to fseek() and continue reading from there.
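A rough sketch of that idea, with the offset kept in a plain file named offset.txt (the file name is just for illustration; storing it in the database works the same way):

$handle = fopen("top-1-500000.txt", "r") or exit("Unable to open file!");

// resume from the last saved position, if any
$offset = file_exists("offset.txt") ? (int) file_get_contents("offset.txt") : 0;
fseek($handle, $offset);

while (($line = fgets($handle)) !== false) {
    // ...process $line and insert it...
    file_put_contents("offset.txt", ftell($handle)); // byte position just after this line
}
fclose($handle);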
Good afternoon.
I have a problem with a PHP foreach loop.
I parse an XML file (~20 MB) using SimpleXML and then insert the data into MySQL.
The XML contains over 37,000 items, so I must loop 37,000 times to read the data.
Every 100 iterations I create a string like this:
insert into my_table values (...)
But I get a 502 error at around the 10,500th iteration.
I also tried sending the string after the loop, but got the error again. My settings are:
memory_limit = 240
max_execution_time = 500
How can I solve this problem?
Thanks and best regards.
I think the problem is that your script is timing out. You can overcome this by calling set_time_limit(0) in your script or by changing max_execution_time in your php.ini:
while (1) {
    set_time_limit(0);
    // do something
}
You also need to increase your memory_limit by editing your php.ini and restarting your webserver.
Read the documentation for set_time_limit().
Can you split the one big XML file into several smaller files?
I'd queue the 37,000 items into several batches and process them one after another, or asynchronously. I've done this a few times in PHP. A better language for jobs like this would be Python or Ruby on Rails.
However, try creating batches of the items.
Regarding storing the data, I use CSV batch inserts with MySQL.
I use this function to convert the strings into CSV format:
<?php
function convertStrToCsv($data, $delimiter = ';', $enclosure = '"')
{
    ob_start();
    $fp = fopen('php://output', 'w');
    fputcsv($fp, $data, $delimiter, $enclosure);
    fclose($fp);
    return ob_get_clean();
}
… then I save the function's output to a file and finally use this query to load the CSV data into the database:
LOAD DATA LOW_PRIORITY LOCAL INFILE '$file' IGNORE INTO TABLE `$table` CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"' LINES TERMINATED BY '\\n';
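A rough usage sketch under a few assumptions (a mysqli connection in $mysqli, a writable batch.csv, and parsed rows already collected in $items are all made up for this example; LOCAL INFILE must also be enabled on both client and server):

$file  = 'batch.csv';
$table = 'my_table';

// build one CSV batch from the parsed rows
$csv = '';
foreach ($items as $row) {
    $csv .= convertStrToCsv($row); // fputcsv() already appends the line break
}
file_put_contents($file, $csv);

// bulk-load the whole batch in a single statement
$mysqli->query("LOAD DATA LOW_PRIORITY LOCAL INFILE '$file' IGNORE INTO TABLE `$table` CHARACTER SET utf8 FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\"' LINES TERMINATED BY '\\n'");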
Read more about this:
LOAD DATA INFILE
PHP's str_getcsv()
Happy coding!
I have a 1.3 GB text file that I need to extract some information from in PHP. I have researched it and come up with a few ways to do what I need to do, but as always I'm after a little clarification on which method would be best, or whether a better one exists that I do not know about.
The information I need from the text file is only the first 40 characters of each line, and there are around 17 million lines in the file. The 40 characters from each line will be inserted into a database.
The methods I have are below:
// REMOVE TIME LIMIT
set_time_limit(0);
// REMOVE MEMORY LIMIT
ini_set('memory_limit', '-1');
// OPEN FILE
$handle = @fopen('C:\Users\Carl\Downloads\test.txt', 'r');
if ($handle) {
    while (($buffer = fgets($handle)) !== false) {
        $insert[] = substr($buffer, 0, 40);
    }
    if (!feof($handle)) {
        // unexpected fgets() failure before reaching the end of the file
    }
    fclose($handle);
}
The above reads one line at a time and grabs the data. I have all the database inserts sorted: 50 inserts at a time, ten times over, in a transaction.
The next method is much the same as above, but calls file() to store all the lines in an array before doing a foreach to get the data. I am not sure about this method though, as the array would essentially hold over 17 million values.
Another method would be to extract only part of the file, rewrite the file with the unprocessed data, and after that part has been executed recall the script using a header() call.
What would be the best way to get this done in the quickest and most efficient manner? Or is there a better approach that I have not thought of?
Also, I plan to use this script with WAMP, but running it in a browser while testing has caused timeout problems even with the script timeout set to 0. Is there a way I can execute the script without accessing the page through a browser?
You have it about right so far; don't use the file() function, as it would most probably hit the RAM usage limit and terminate your script.
I wouldn't even accumulate rows into the $insert[] array, as that would waste RAM as well. If you can, insert into the database right away.
BTW, there is a nice tool called "cut" that you could use to process the file.
cut -c1-40 file.txt
You could even redirect cut's stdout to some PHP script that inserts into database.
cut -c1-40 file.txt | php -f inserter.php
inserter.php could then read lines from php://stdin and insert into DB.
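A minimal sketch of what that inserter.php could look like (the PDO DSN, credentials, and table/column names are placeholders, not from the original question):

<?php
// inserter.php - reads lines from STDIN (e.g. piped from cut) and inserts them one by one
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO some_table (some_column) VALUES (?)');

while (($line = fgets(STDIN)) !== false) {
    $stmt->execute(array(rtrim($line, "\r\n")));
}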
"cut" is a standard tool available on all Linuxes, if you use Windows you can get it with MinGW shell, or as part of msystools (if you use git) or install native win32 app using gnuWin32.
Why are you doing this in PHP when your RDBMS almost certainly has bulk import functionality built in? MySQL, for example, has LOAD DATA INFILE:
LOAD DATA INFILE 'data.txt'
INTO TABLE `some_table`
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\n'
( @line )
SET `some_column` = LEFT( @line, 40 );
One query.
MySQL also has the mysqlimport utility that wraps this functionality from the command line.
None of the above. The problem with using fgets() with a length argument is that it does not work the way you might expect: when the maximum number of characters is reached, the next call to fgets() continues on the same line. You have correctly identified the problem with using file(). The third method is an interesting idea, and you could pull it off with other solutions as well.
That said, your first idea of using fgets() is pretty close; however, we need to slightly modify its behaviour. Here's a customized version that will work as you'd expect:
function fgetl($fp, $len) {
    $l = 0;
    $buffer = '';
    while (false !== ($c = fgetc($fp)) && PHP_EOL !== $c) {
        if ($l < $len) {
            $buffer .= $c;
        }
        ++$l;
    }
    if (0 === $l && false === $c) {
        return false;
    }
    return $buffer;
}
Execute the insert operation immediately or you will waste memory. Make sure you are using prepared statements to insert this many rows; this will drastically reduce execution time. You don't want to submit the full query on each insert when you can submit just the data.
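A rough sketch of how fgetl() could be combined with a prepared statement (the PDO connection details and the table/column names are placeholders):

$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO some_table (some_column) VALUES (?)');

$fp = fopen('C:\Users\Carl\Downloads\test.txt', 'r');
while (false !== ($line = fgetl($fp, 40))) {
    $stmt->execute(array($line)); // the statement is parsed once; only the data is resent per row
}
fclose($fp);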
At the moment I'm writing an import script for a very big CSV file. The problem is that most of the time it stops after a while because of a timeout, or it throws a memory error.
My idea now is to parse the CSV file in steps of 100 lines and, after each 100 lines, recall the script automatically. I tried to achieve this with header('Location: ...') and passing the current line via GET, but it didn't work out the way I wanted.
Is there a better way to do this, or does someone have an idea how to get rid of the memory error and the timeout?
I've used fgetcsv to read a 120 MB CSV in a stream-wise manner (is that correct English?). It reads line by line, and I then inserted every line into a database. That way only one line is held in memory on each iteration. The script still needed 20 minutes to run; maybe I'll try Python next time. Don't try to load a huge CSV file into an array, as that really would consume a lot of memory.
// WDI_GDF_Data.csv (120.4MB) are the World Bank collection of development indicators:
// http://data.worldbank.org/data-catalog/world-development-indicators
if (($handle = fopen('WDI_GDF_Data.csv', 'r')) !== false) {
    // get the first row, which contains the column-titles (if necessary)
    $header = fgetcsv($handle);

    // loop through the file line-by-line
    while (($data = fgetcsv($handle)) !== false) {
        // resort/rewrite data and insert into DB here
        // try to use conditions sparingly here, as those will cause slow-performance

        // I don't know if this is really necessary, but it couldn't harm;
        // see also: http://php.net/manual/en/features.gc.php
        unset($data);
    }
    fclose($handle);
}
I find uploading the file and inserting it using MySQL's LOAD DATA LOCAL query a fast solution, e.g.:
$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv'
REPLACE INTO TABLE table_name FIELDS TERMINATED BY ','
ENCLOSED BY '\"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES";
$result = $mysqli->query($sql);
If you don't care about how long it takes and how much memory it needs, you can simply increase the values for this script. Just add the following lines to the top of your script:
ini_set('memory_limit', '512M');
ini_set('max_execution_time', '180');
With memory_get_usage() you can find out how much memory your script actually needs, which helps you pick a good value for memory_limit.
You might also want to have a look at fgets(), which allows you to read a file line by line. I am not sure whether that takes less memory, but I really think it will work. Even in that case, though, you have to raise max_execution_time to a higher value.
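A minimal sketch of that line-by-line approach with memory_get_usage() logged along the way, so you can see what memory_limit you actually need (import.csv and the per-line handling are placeholders):

ini_set('memory_limit', '512M');
ini_set('max_execution_time', '180');

$handle = fopen('import.csv', 'r');
$count  = 0;

while (($line = fgets($handle)) !== false) {
    // ...parse $line (e.g. with str_getcsv()) and insert it...
    if (++$count % 10000 === 0) {
        echo $count . ' lines processed, ' . memory_get_usage(true) . " bytes in use\n";
    }
}
fclose($handle);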
There seems to be an enormous difference between fgetcsv() and fgets() when it comes to memory consumption.
A simple CSV with only one column blew past my 512M memory limit after just 50,000 records with fgetcsv(), and it took 8 minutes to report that.
With fgets() it took only 3 minutes to successfully process 649,175 records, and my local server wasn't even gasping for additional air.
So my advice is to use fgets() if the number of columns in your CSV is limited. In my case fgets() directly returned the string in column 1.
For more than one column, you might use explode() into a disposable array which you unset() after each record operation.
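A rough sketch of that fgets() + explode() variant for a multi-column CSV (data.csv and the column handling are placeholders, and this assumes no quoted fields containing the delimiter):

$handle = fopen('data.csv', 'r');

while (($line = fgets($handle)) !== false) {
    $fields = explode(',', rtrim($line, "\r\n")); // disposable array for this record
    // ...insert $fields[0], $fields[1], ... into the database...
    unset($fields); // free the record before the next iteration
}
fclose($handle);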
Thumbs up for answer 3 by @ndkauboy.
Oh, just run this script from the CLI, not via the silly web interface. That way, no execution time limit will affect it.
And do not keep the parsed results around; write them out immediately, so you won't be affected by the memory limit either.
I have a custom CakePHP shopping cart application where I'm trying to create a CSV file that contains a row of data for each transaction. I'm running into memory problems when having PHP create the CSV file all at once by compiling the relevant data from the MySQL database. Currently the CSV file contains about 200 rows of data.
Alternatively, I've considered building the CSV piecemeal by appending a row of data to the file every time a transaction is made, using: fopen("$mFile.csv", 'a');
My developers are saying that I will still run into memory issues with this approach when the CSV file gets too large, as PHP will read the whole file into memory. Is this the case? When using append mode, will PHP attempt to read the whole file into memory? If so, can you recommend a better approach?
Thanks in advance,
Ben
I ran the following script for a few minutes and generated a 1.4 GB file, well over my PHP memory limit. I also read from the file without issue. If you are running into memory issues, it is probably something else that is causing the problem.
$fp = fopen("big_file.csv","a");
for($i = 0; $i < 100000000; $i++)
{
fputcsv($fp , array("val1","val2","val3","val4","val5","val6","val7","val8","val9"));
}
Can't you just export from the DB, like so:
SELECT list_fields INTO OUTFILE '/tmp/result.text'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM test_table;
I want to merge two large CSV files with PHP. These files are too big to even put into memory all at once. In pseudocode, I can think of something like this:
for i in file1
file3.write(file1.line(i) + ',' + file2.line(i))
end
But when I'm looping through a file using fgetcsv, it's not really clear how I would grab line n from a certain file without loading the whole thing into memory first.
Any ideas?
Edit: I forgot to mention that each of the two files has the same number of lines and they have a one-to-one relationship. That is, line 62,324 in file1 goes with line 62,324 in file2.
Not sure what operating system you're on, but if you're using Linux, using the paste command is probably a lot easier than trying to do this in PHP.
If this is a viable solution and you don't absolutely need to do it in PHP, you could try the following:
paste -d ',' file1 file2 > combined_file
Take a look at the fgets function. You could read a single line of each file, process them, and write them to your new file, then move on to the next line until you've reached the end of your file.
PHP: fgets
Specifically, look at the example titled "Example #1 Reading a file line by line" in the PHP manual. It's also important to note the return value of the fgets() function:
Returns a string of up to length - 1 bytes read from the file pointed to by handle. If there is no more data to read in the file pointer, then FALSE is returned.
So, if it doesn't return FALSE you know you still have more lines to process.
You can use fgets().
$file1 = fopen('file1.txt', 'r');
$file2 = fopen('file2.txt', 'r');
$merged = fopen('merged.txt', 'w');

while (
    ($line1 = fgets($file1)) !== false
    && ($line2 = fgets($file2)) !== false) {
    // trim the trailing newline from the first line so the comma joins both halves on one row
    fwrite($merged, rtrim($line1, "\r\n") . ',' . $line2);
}

fclose($file1);
fclose($file2);
fclose($merged);
fgets() reads one line from a file. As you can see, this code uses it on both files at the same time, writing the merged lines to a third file. The manual here:
http://php.net/fgets
http://php.net/fopen
http://php.net/fwrite
Try using fgets() to read one line from each file at a time.
I think the solution here is to first map the byte offset where each line begins (plus some kind of key if you need one), and then write the new CSV using fread() or fgets() and fwrite(); since we know where each line begins and ends, we only need to seek and read.
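A rough sketch of that offset-index idea for one of the files (file2.csv is a placeholder name); the same index then lets fseek() jump straight to any line without scanning from the start:

// pass 1: record the byte offset at which every line starts
$handle  = fopen('file2.csv', 'r');
$offsets = array();
while (!feof($handle)) {
    $pos = ftell($handle);
    if (fgets($handle) !== false) {
        $offsets[] = $pos; // $offsets[$n] = start of line $n (0-based)
    }
}

// later: jump straight to line $n and read just that line
$n = 62324;
fseek($handle, $offsets[$n]);
$line = fgets($handle);
fclose($handle);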
Another way is to put the data into MySQL (if that is possible) and then export it back out to a new CSV.