Large .txt file (15.5 MB) parsed using PHP - php

My task is to parse a large .txt file (circa 15,000 lines) into a MySQL database. The problem is I'm working with a 30-second maximum execution time. I've tried using this:
$handle = @fopen('http://www.someothersiteyouknow.com/bigfile.txt', "r");
if ($handle) {
    while (!feof($handle)) {
        $lines[] = fgets($handle, 4096);
    }
    fclose($handle);
}
I can then access the $lines array and parse the data whichever way I need to, but it takes too long for the script to finish running. My feeling is that I should read the file in chunks, maybe 1,000 lines at a time, but I only understand how to read from the beginning of the .txt file. Could you suggest some methods of doing this correctly? Just to clarify, I don't require specific code examples, just ideas for how to parse large .txt files using PHP.
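One way to approach the chunking idea is to remember how far you got between runs: store the byte offset after each batch and seek back to it on the next request. A minimal sketch under that assumption; the offset file name, chunk size and local file path are just illustrative.

// Sketch: process the file in resumable chunks.
$file       = 'bigfile.txt';
$offsetFile = 'bigfile.offset';      // remembers where the last run stopped (hypothetical name)
$batchSize  = 1000;                  // lines per run

$offset = file_exists($offsetFile) ? (int) file_get_contents($offsetFile) : 0;

$handle = fopen($file, 'r');
if ($handle) {
    fseek($handle, $offset);
    $count = 0;
    while ($count < $batchSize && ($line = fgets($handle, 4096)) !== false) {
        // ... parse $line and insert it into MySQL here ...
        $count++;
    }
    file_put_contents($offsetFile, ftell($handle));   // save progress for the next run
    fclose($handle);
}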

This doesn't seem like the best idea, to be honest. What if multiple users access the same page at, or around, the same time? You'll have (number of users * large text file) being processed concurrently.
I suggest you bring the file local (save it locally if it doesn't already exist) and work with the local copy. This should help reduce your transaction time.
This should help bring you within the 30s limit ... if the file doesn't take longer than 30s to download!
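A rough sketch of the "bring the file local" suggestion, caching the remote file with copy() before parsing it; the URL and cache path are placeholders and it assumes allow_url_fopen is enabled, as in the question.

// Sketch: cache the remote file locally, then parse the local copy.
$remote = 'http://www.someothersiteyouknow.com/bigfile.txt';
$local  = __DIR__ . '/cache/bigfile.txt';   // assumes the cache/ directory exists

if (!file_exists($local)) {
    copy($remote, $local);                  // one download instead of one per visitor
}

$handle = fopen($local, 'r');
if ($handle) {
    while (($line = fgets($handle, 4096)) !== false) {
        // ... parse $line ...
    }
    fclose($handle);
}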

Consider putting a set_time_limit() call inside your loop.
Also, if this is a once-off thing, you could look at doing it with MySQL's LOAD DATA INFILE?
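A minimal sketch of the set_time_limit() idea: resetting the limit on each iteration keeps a long-running import from being killed mid-loop. Whether this works depends on the host; some environments disable set_time_limit().

// Sketch: restart the execution timer while streaming the file line by line.
$handle = fopen('bigfile.txt', 'r');
if ($handle) {
    while (($line = fgets($handle, 4096)) !== false) {
        set_time_limit(30);          // restart the 30-second clock for each line
        // ... parse $line and insert it into MySQL ...
    }
    fclose($handle);
}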

If you can put your file on the server, then you may try to use a LOAD DATA INFILE query. It has plenty of options to parse the input, and works reasonably fast. Start experimenting with a small portion of your file. If the server ends up inserting everything into a single row, then tune the LINES TERMINATED BY part by specifying '\n' or '\r\n'. Then double-check the number of rows against the number of lines in the file, and SELECT some of them to see what ended up in the table.
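A hedged sketch of that experiment from PHP: the table name, column list and separators are invented for illustration, and the file must be readable by the MySQL server itself.

// Sketch: run LOAD DATA INFILE via PDO, then sanity-check the row count.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$sql = <<<'SQL'
LOAD DATA INFILE '/path/to/bigfile.txt'
INTO TABLE my_table
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(col_a, col_b, col_c)
SQL;
$pdo->exec($sql);

$rows = $pdo->query('SELECT COUNT(*) FROM my_table')->fetchColumn();
echo "Imported $rows rows\n";        // compare against the line count of the file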

Related

read more than 1000 txt files in core php

I have 1000-plus txt files whose file names are usernames. I'm currently reading them in a loop. Here is my code:
for ($i = 0; $i < 1240; $i++) {
    $node = $users_array[$i];
    $read_file = "Uploads/" . $node . "/" . $node . ".txt";
    if (file_exists($read_file)) {
        if (filesize($read_file) > 0) {
            $myfile = fopen($read_file, "r");
            $file_str = fread($myfile, filesize($read_file));
            fclose($myfile);
        }
    }
}
When the loop runs, it takes too much time and the server times out.
I don't know why it takes that long, because the files don't contain much data. Reading all the text from a txt file should be fast, right?
Well, you are doing read operations on an HDD/SSD, which is not as fast as memory, so you should expect a high running time depending on how big the text files are. You can try the following:
If you are running the script from a browser, I recommend running it from the command line; this way you will not get a web server timeout and the script will manage to finish, provided there is no execution time limit set in PHP (in which case maybe you should increase it).
In your script above you can store filesize($read_file) in a variable so that you do not execute it twice; it might improve the running time (see the sketch after this list).
If you still can't finish the job, consider running it in batches of 100 or 500 files.
Keep an eye on memory usage; maybe that is why the script dies.
If you need the content of the file as a string, you can try file_get_contents and maybe skip the filesize check altogether.
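A sketch of those suggestions combined: batches, no repeated filesize() call, and file_get_contents doing the size handling internally. The batch size and the command-line offset argument are just illustrative, and $users_array comes from the question.

// Sketch: read the user files in batches from the command line.
$batchSize = 500;
$start     = isset($argv[1]) ? (int) $argv[1] : 0;   // pass the batch offset as a CLI argument

for ($i = $start; $i < $start + $batchSize && $i < count($users_array); $i++) {
    $node      = $users_array[$i];
    $read_file = "Uploads/" . $node . "/" . $node . ".txt";

    if (is_file($read_file)) {
        $file_str = file_get_contents($read_file);   // no separate filesize() call needed
        // ... process $file_str ...
    }
}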
It sounds like your problem is having 1000+ files in a single directory. On a traditional Unix file system, finding a single file by name requires scanning through the directory entries one by one. If you have a list of files and try to read all of them, it'll require traversing about 500000 directory entries, and it will be slow. It's an O(n^2) algorithm and it'll only get worse as you add files.
Newer file systems have options to enable more efficient directory access (for example https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Hash_Tree_Directories) but if you can't/don't want to change file system options you'll have to split your files into directories.
For example, you could take the first two letters of the user name and use that as the directory. That's not great because you'll get an uneven distribution; it would be better to use a hash, but then it'll be difficult to find entries by hand.
Alternatively you could iterate the directory entries (with opendir and readdir) and check if the file names match your users, and leave dealing with the problems the huge directory creates for later.
Alternatively, look into using a database for your storage layer.
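A rough sketch of the directory-splitting idea using a two-letter prefix. The paths and the example username are illustrative, and existing files would need to be migrated into the new layout once.

// Sketch: shard user files into sub-directories by the first two letters
// of the username, so no single directory grows huge.
function userFilePath($username)
{
    $prefix = substr(strtolower($username), 0, 2);   // e.g. "jo" for "john"
    return "Uploads/" . $prefix . "/" . $username . ".txt";
}

$node     = 'john';                                  // example username
$contents = 'example file contents';

// Writing:
$path = userFilePath($node);
if (!is_dir(dirname($path))) {
    mkdir(dirname($path), 0775, true);               // create the shard directory once
}
file_put_contents($path, $contents);

// Reading later uses the same helper, so lookups stay cheap:
$file_str = file_get_contents(userFilePath($node));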

PHP array uses a lot more memory than it should

I tried to load a 16 MB file into a PHP array.
It ends up with about 63 MB of memory usage.
Loading it into a string consumes just the 16 MB, but the issue is I need it inside an array to access it faster afterwards.
The file consists of about 750k lines (a routing table dump).
I probably should load it into a MySQL database; the issue there is not enough memory to run that thing, so I chose rqlite: https://github.com/rqlite/rqlite, since I also need the replication features.
I am not sure if an SQLite database is fast enough for that.
Does anyone have an idea for this issue?
You can get the actual file here: http://data.caida.org/datasets/routing/routeviews-prefix2as/2018/07/routeviews-rv2-20180715-1400.pfx2as.gz
The code I used:
$data = file('routeviews-rv2-20180715-1400.pfx2as');
var_dump(memory_get_usage());
Thanks.
You may use the PHP fread function. It allows reading data of a fixed size and can be used inside a loop to read sized blocks of data. It does not consume much memory and is suitable for reading large files.
If you want to sort the data, then you may want to use a database. You can read the data from the large file one line at a time (with fgets, or fread with a fixed block size) and then insert it into the database.
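A minimal sketch of the line-at-a-time import into SQLite via PDO, assuming the usual tab-separated prefix/length/ASN layout of the pfx2as dump; the table name is invented. rqlite itself is accessed over HTTP, so this only shows the general idea.

// Sketch: stream the dump line by line into SQLite instead of holding it in a PHP array.
$pdo = new PDO('sqlite:routes.db');
$pdo->exec('CREATE TABLE IF NOT EXISTS routes (prefix TEXT, length INTEGER, asn TEXT)');

$stmt   = $pdo->prepare('INSERT INTO routes (prefix, length, asn) VALUES (?, ?, ?)');
$handle = fopen('routeviews-rv2-20180715-1400.pfx2as', 'r');

$pdo->beginTransaction();                    // one transaction keeps 750k inserts fast
while (($line = fgets($handle)) !== false) {
    $parts = explode("\t", trim($line));     // assumed layout: prefix, length, ASN
    if (count($parts) === 3) {
        $stmt->execute($parts);
    }
}
$pdo->commit();
fclose($handle);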

Read, and remove, X number of lines from big text file in PHP

There are a lot of different scenarios that are similar (replace text in file, read specific lines etc) but I have not found a good solution to what I want to do:
Messages (strings) are normally sent to a queue. If the server that handles the queue is down the messages are saved to a file. One message per line.
When the server is up again I want to start sending the messages to the server. The file with messages could be "big", so I do not want to read the entire file into memory. I also only want to send each message once, so the file needs to reflect whether a message has been sent (in other words: don't fetch 100 lines and then have PHP time out after 95, so that the same thing happens again next time).
What I basically need is to read one line from a big text file and then delete that line when it has been processed by my script, without constantly reading/writing the whole file.
I have seen different solutions (fread, SplFileObject etc) that can read a line from a file without reading the entire file (into memory) but I have not seen a good way to delete the line that was just read without going through the entire file and saving it again.
I'm guessing that it can be done since the thing that needs to be done is to remove x bytes from the beginning or the end of the file, depending where you read the lines from.
To be clear: I do not think it's a good solution to read the first line from the file, use it, and then read all the other lines just to write them to a tmp file and from there back to the original file. Reading/writing 100,000 lines just to get one line.
The problem can be fixed in other ways, like creating a number of smaller files so they can be read/written without too many performance problems, but I would like to know if anyone has a solution to the exact problem.
Update:
Since it can't be done, I ended up using SQLite.
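For what it's worth, the "remove bytes from the end" idea mentioned in the question can be sketched with ftruncate(), provided processing the messages in reverse order is acceptable; removing lines from the beginning still requires rewriting the file. A sketch, not a drop-in queue implementation:

// Sketch: pop the last line off the file and truncate it away once read.
function popLastLine($path)
{
    $fp   = fopen($path, 'r+');
    $size = filesize($path);
    if ($size === 0) { fclose($fp); return null; }

    // Walk backwards until we find the previous newline (or the start of the file).
    $pos = $size - 1;
    while ($pos > 0) {
        fseek($fp, $pos - 1);
        if (fgetc($fp) === "\n") { break; }
        $pos--;
    }

    fseek($fp, $pos);
    $line = rtrim(fgets($fp), "\r\n");        // the last message in the file

    ftruncate($fp, $pos);                     // remove it without rewriting the rest
    fclose($fp);
    return $line;
}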

PHP Reading large tab delimited file looking for one line

We get a product list from our suppliers delivered to our site by ftp. I need to create a script that searches through that file (tab delimited) for the products relevant to us and use the information to update stock levels, prices etc.
The file itself is something like 38,000 lines long and I'm wondering on the best way of handling this.
The only way I can initially think of is using fopen and fgetcsv, then cycling through each line, putting the line into an array and looking for the relevant product code.
I'm hoping there is a much more efficient way (though I haven't tested the efficiency of this yet).
The file I'll be reading is 8.8 Mb.
All of this will need to be done automatically, e.g. by CRON on a daily basis.
Edit - more information.
I have run my first trial and, based on the two answers, I have the following code:
The items I need to pick out of the text file come from the database and sit in an array built with $items[$row['item_id']] = $row['prod_code'];
$catalogue = file('catalogue.txt');
foreach ($catalogue as $line)
{
    $prod = explode("\t", $line);    // the catalogue is tab-delimited
    if (in_array($prod[0], $items))
    {
        echo $prod[0] . "<br>"; // will be updating the stock level in the db eventually
    }
}
Though this was not giving the correct output at first: my original version looped with while ($line = $catalogue) and split on a space rather than a tab.
I used to do a similar thing with Dominos Pizza clocking in daily data (all UK).
Either load it all into a database in one go.
OR
Use fopen and load a line at a time into a database, keeping memory overheads low. (I had to use this method as the data wasn't formatted very well)
You can then query the database at your leisure.
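A sketch of the line-at-a-time approach: stream the tab-delimited feed with fgetcsv and update stock through a prepared statement. The database credentials, table, column names and column order are made up for illustration.

// Sketch: stream the supplier file and update matching products one row at a time.
$pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$stmt = $pdo->prepare('UPDATE products SET stock = ?, price = ? WHERE prod_code = ?');

$fp = fopen('catalogue.txt', 'r');
while (($row = fgetcsv($fp, 0, "\t")) !== false) {   // tab-delimited supplier feed
    // assumed column order: 0 = product code, 1 = stock level, 2 = price
    $stmt->execute(array($row[1], $row[2], $row[0]));
}
fclose($fp);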
What do you mean by »I hope there is a more efficient way«? Efficient in respect to what? Writing the code? CPU consumption while executing the code? Disk I/O? Memory consumption?
Holding ~9MB of text in memory is not a problem (unless you've got a very low memory limit). A file() call would read the entire file and return an array (split by lines). This or file_get_contents() will be the most efficient approach in respect to Disk I/O, but consume a lot more memory than necessary.
Putting the line into an array and looking for the relevant product code.
I'm not sure why you would need to cache the contents of that file in an array. But if you do, remember that the array will use slightly more memory than the ~9MB of text. So you'd probably want to read the file sequentially, to avoid having the same data in memory twice.
Depending on what you want to do with the data, loading it into a database might be a viable solution as well, as @user1487944 already pointed out.
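A sketch of reading the file sequentially while keeping only the lookup array in memory; flipping $items so the product codes are the keys turns each check into an O(1) isset() instead of in_array(). The file name comes from the question, the rest is illustrative.

// Sketch: stream the catalogue and test each product code against a keyed lookup array.
$lookup = array_flip($items);        // prod_code => item_id, so isset() is O(1)

$fp = fopen('catalogue.txt', 'r');
while (($line = fgets($fp)) !== false) {
    $prod = explode("\t", rtrim($line, "\r\n"));
    if (isset($lookup[$prod[0]])) {
        // ... update the stock level for item $lookup[$prod[0]] here ...
    }
}
fclose($fp);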

Automatically import CSV file and upload to database

One of my clients has all of his product information handled by an outside source. They have provided this to me in a CSV file which they will regularly update and upload to an FTP folder of my specification, say every week.
Within this CSV file is all of the product information; product name, spec, image location etc.
The site which I have built for my client is running a MySQL database, which I thought would be holding all of the product information, and thus has been built to handle all of the product data.
My question is this: How would I go about creating and running a script that would find a newly added CSV file from the specified FTP folder, extract the data, and replace all of the data within the relevant MySQL table, all done automatically?
Is this even possible?
Any help would be greatly appreciated, as I don't want to use the IFrame option. S.
Should be pretty straightforward, depending on the CSV file.
Some CSV files have quotes around text fields (""), some don't;
some have commas inside the quoted fields, etc.
Depending on your level of PHP skills this should be reasonably easy.
You can get a modified timestamp from the file to see if it is new:
http://nz.php.net/manual/en/function.lstat.php
Open the file and import the data:
http://php.net/manual/en/function.fgetcsv.php
Insert into the database:
http://nz.php.net/manual/en/function.mysql-query.php
If the CSV is difficult to parse with fgetcsv,
then you could try something like the PHPExcel project, which has CSV reading capabilities:
http://phpexcel.codeplex.com
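A sketch of the steps above: detect a new upload via its modification time, then let fgetcsv handle the quoting and embedded commas. The paths and the state file used to remember the last import are illustrative.

// Sketch: only import the CSV when its mtime is newer than the last recorded import.
$csv       = '/ftp/products.csv';
$stateFile = '/ftp/.last_import';    // stores the mtime of the last processed file

$lastImport = file_exists($stateFile) ? (int) file_get_contents($stateFile) : 0;

if (filemtime($csv) > $lastImport) {
    $fp = fopen($csv, 'r');
    while (($row = fgetcsv($fp)) !== false) {   // handles "quoted, fields" automatically
        // ... insert or update $row in the database ...
    }
    fclose($fp);
    file_put_contents($stateFile, filemtime($csv));
}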
You can just make a script which reads the CSV file using PHP's fgetcsv function, extracts each row and formats it into an array to insert into the database.
$fileTemp = "path-of-the-file.csv";
$fp = fopen($fileTemp, 'r');
$datas = array();
while (($data = fgetcsv($fp)) !== FALSE)
{
    $data['productName'] = trim($data[0]);
    $data['spec'] = trim($data[1]);
    $data['imageLocation'] = trim($data[2]);
    $datas[] = $data;
}
fclose($fp);
Now you have a prepared array $datas which you can insert into the database by iterating over it.
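One possible shape for that insert step, using a PDO prepared statement in a loop; the database credentials and table layout are placeholders.

// Sketch: insert the prepared $datas array with a reusable prepared statement.
$pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$stmt = $pdo->prepare(
    'INSERT INTO products (product_name, spec, image_location) VALUES (?, ?, ?)'
);

$pdo->beginTransaction();            // one transaction makes many small inserts much faster
foreach ($datas as $data) {
    $stmt->execute(array($data['productName'], $data['spec'], $data['imageLocation']));
}
$pdo->commit();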
All you need is:
Store last file's mtime somewhere (let's say, for simplicity, in another file)
script that runs every X minutes by cron
In this script you simply compare the mtime of the CSV file with the stored value. If the mtime differs, you run an SQL query that looks like this:
LOAD DATA LOCAL INFILE '/var/www/tmp/file.csv' REPLACE INTO TABLE mytable COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n'
Optionally, you can just touch your file to know when you performed the last data load. If the CSV file's mtime is greater than your "helper" file's, you should touch it and perform the query.
Documentation on LOAD DATA INFILE SQL statement is here
Of course there is room for query errors, but I hope you will handle that (you just need to be sure the data loaded properly, and only in that case touch the file or write the new mtime).
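A sketch of that flow, comparing mtimes against a touched helper file and running the query above through PDO. The paths and credentials are placeholders, and the LOCAL INFILE option must be enabled on both client and server for this to work.

// Sketch: reload the table only when the CSV is newer than the helper file.
$csv    = '/var/www/tmp/file.csv';
$marker = '/var/www/tmp/file.csv.loaded';   // touched after each successful load

if (!file_exists($marker) || filemtime($csv) > filemtime($marker)) {
    $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass', array(
        PDO::MYSQL_ATTR_LOCAL_INFILE => true,
    ));

    $sql = <<<'SQL'
LOAD DATA LOCAL INFILE '/var/www/tmp/file.csv' REPLACE INTO TABLE mytable
COLUMNS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n'
SQL;

    $pdo->exec($sql);
    touch($marker);                         // record that this version has been loaded
}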
Have you had a look at fgetcsv? You will probably have to set up a cron job to check for a new file at regular intervals.
