It is very odd that this program breaks with certain files. Input files with 500 rows work just fine, but if I try to input 1000 lines or more, the program only gets the first row of the CSV file (where the titles are) and then drops out of the while loop.
I noticed there were other questions that looked similar; however, when I read through them I realized they weren't calling fgetcsv() in a loop. In contrast, I am calling the function in a while loop.
My code looks like this:
if(move_uploaded_file($_FILES["fileToUpload"]["tmp_name"], $target_file)){
$goe = fopen($target_file, "r");
while($data = fgetcsv($goe, filesize($target_file))){
if(!empty($data[0]) && !empty($data[2])){
if($data[0] !='brand' && $data[2] !='MPN'){
$string = $data[0] .' '. $data[2];
$arrayOfSearches[$data[1]] = $string;
}
}
}
fclose($goe);
}
After debugging, I found that execution enters the while loop once and passes the first if condition, but never enters the second one (which makes sense, since the first row holds the titles, which I don't want).
Any ideas?
I recently had the same problem: when reading a CSV file, the process was returning all rows in one line. Luckily I found the solution; just add this line to your PHP file:
ini_set('auto_detect_line_endings', true);
Hope it works for you.
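Note that the auto_detect_line_endings setting is deprecated as of PHP 8.1. As an alternative, here is a minimal sketch that normalizes the line endings itself before parsing. The function name readCsvAnyLineEndings is hypothetical, and the sketch assumes the file is small enough to load into memory and that no quoted field contains a bare carriage return:

```php
<?php
// Hypothetical helper: parse a CSV file whose rows may end in \r, \n, or \r\n.
function readCsvAnyLineEndings(string $path): array
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Normalize Windows (\r\n) and classic Mac (\r) endings to \n.
    $normalized = str_replace(["\r\n", "\r"], "\n", $raw);

    // Parse from an in-memory stream so fgetcsv() still handles quoting.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $normalized);
    rewind($stream);

    $rows = [];
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($stream);
    return $rows;
}
```

With a file saved using bare \r line endings, this returns one array per row instead of a single row holding the whole file.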
Quick update: The reason I need this solution is that this one PHP file is used to expand the flat file for about a hundred users (who all use the same PHP file but have their own flat files).
SOLUTION:
I worked on this for one more day, rephrased the question, and got a really great answer. I am adding it here to help others in the future:
$content = file_get_contents("newstest.db");
$content = preg_replace('/(^ID:.*\S)/im', '$1value4:::', $content);
$content = preg_replace('/(^\d+.*\S)/im', '$1:::', $content);
file_put_contents("newstest.db", $content);
The original content of the flat file used when testing the code was:
ID:::value1:::value2:::value3:::
1:::My:::first:::line:::
2:::My:::second:::line:::
3:::Your:::third:::line:::
ORIGINAL QUESTION:
I have a PHP script I am trying to modify. Being a PHP newbie, and having searched both here and on Google without finding a solution, I am asking here.
I need to add more values (columns) in the flat file, automatically if the "column" does not exist from before.
Because this one PHP file is shared with many users (each with their own flat file), I need a way to automatically add new "columns" in their flat files if the column does not exist. Doing it manually is very time consuming, and I bet there is an easy way.
INFO:
The flat file is named "newstest.db"
The flat file has this layout:
id:::title:::short:::long:::author:::email:::value1:::value2:::value3:::
So the divider is :::
I understand the basics, that I need to add for instance "value4:::" after "value3:::" in the first line of the news.db, then add ::: to the other existing lines to update all lines and prepare for the new "value4"
Today the php uses this to connect to the flat file:
($filesource is the path to the flat file, including its name. It is unique for each user.)
$connect_to_file = connect_pb ($filesource);
And to write to the file I use:
insert_pb($filesource,"$new_id:::$title:::$short:::$long:::$author:::$email:::$value1:::::::::");
(As you can see, value2 and value3 are not used in this case, but they are in others.)
QUESTION:
Is there a quick/ existing php code to use to add a new column if it doesn't already exist? Or do I need to make the php code for this specific task?
I understand that the code must do something along these lines:
If "value4" does not exist in line 0 in $filesource
then add "value4:::" at the end of line 0,
and for each of the other lines add ":::" at the end.
I don't know where to start, but I have tried for some hours.
I understand this:
update_pb(pathtofiletosaveto,"id","x == (ID of news)","value in first line","value to add");
But I don't know how to write an if statement for the check described above, nor how to update line 0 in the flat file to add "value4:::" at the end, etc.
MY CODE (does not work as intended):
OR, maybe I need to read only line 1 in the file (newstest.db), and then replace it with a new line if "value4" is not in line 1?
A suggestion, but I don't know how to do all of it:
(It's probably full of errors, as I have tried to read up and find examples and combining code.)
<?php
// specify the file
$file_source="newstest.db";
// get the content of the file
$newscontent = file($file_source, true);
$lines ='';
// handle the content, add "value4:::" and ":::" to the lines
foreach ($newscontent as $line_num => $linehere) {
// add "value4:::" at the end of first line only, and put it in memory
if($line_num[0]) {$lines .= $linehere.'value4:::';}
else {
// then add ":::" to the other lines and add them to memory
$lines .= $linehere.':::';
}
// echo results just to see what is going on
//echo 'Line nr'.$line_num.':<br />'.$lines.'<br /><br />';
}
// add
// to show the result
echo "Here is the result:<br /><br />".$lines."<br /><br />";
//Write new content to $file_source
$f = fopen($file_source, 'w');
fwrite($f,$lines);
fclose($f);
echo "done updating database flat file";
?>
This ALMOST works...
But it does NOT add "value4:::" to the end of the first line,
and it does not add ":::" to the end of the next lines, but to the beginning...
So a couple of questions remain:
1) How can I search line 0 for "value4", and then write "value4:::" at the end of the line?
2) How can I add ":::" at the end of each line, and not at the beginning?
I kindly ask for help with this.
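For reference, here is a minimal sketch of the steps described in the question. The function name addColumnIfMissing is hypothetical. It reads the file with FILE_IGNORE_NEW_LINES because file() otherwise keeps each line's trailing newline, which is why text appended to a line shows up at the start of the next one. It also compares the line index with 0 directly instead of using $line_num[0]:

```php
<?php
// Hypothetical helper: append a column to a ":::"-delimited flat file once.
function addColumnIfMissing(string $path, string $column, string $sep = ':::'): bool
{
    // FILE_IGNORE_NEW_LINES strips the newline from every line, so appended
    // text lands at the end of the line instead of on the next one.
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false || $lines === []) {
        return false;
    }
    // Line 0 is the header; check whether the column is already listed.
    $header = explode($sep, rtrim($lines[0], "\r"));
    if (in_array($column, $header, true)) {
        return false; // column already exists, nothing to do
    }
    foreach ($lines as $i => $line) {
        $line = rtrim($line, "\r");
        if ($line !== '') {
            // Header gets "value4:::", every data line gets an empty ":::" slot.
            $line .= ($i === 0) ? $column . $sep : $sep;
        }
        $lines[$i] = $line;
    }
    file_put_contents($path, implode("\n", $lines) . "\n", LOCK_EX);
    return true;
}
```

Calling addColumnIfMissing($filesource, 'value4') before each write would leave already-updated files untouched, since the function returns false when the header contains the column.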
Do you absolutely have to use PHP for this task? It seems like something you only need to do once, and is much easier to do in a different way.
For example, if you have a *nix shell with sed, sed -i 's/$/:::/' <file> will do that task for you.
I want to read a CSV data file, load it into an array, edit it, and write it back to a file. I have been able to accomplish this for a single iteration with examples here on Stack Overflow. Thanks!
The trouble comes when I write the new data back to the file: both methods I have tried for writing the edited array add a newline at the end of the file. This creates an issue when loading the CSV file data a second time. The second read produces an empty index in the array, which causes an error when writing the file.
Example #1:
foreach($editArray as $row) {
$writeStuff = implode(",", $row);
fwrite($file_handle, $writeStuff);
fwrite($file_handle, "\n");
}
Example #2:
foreach ($editArray as $row) {
fputcsv($file_handle, $row);
}
This is the original csv data:
1/1/16,Yes,No
1/2/16,No,Yes
When written using the above it produces this data with the added newline:
1/1/16,Yes,No
1/2/16,No,Yes
This extra newline creates an issue when reading the file a second time. I get an error from both fputcsv() and implode(). I believe it is because of the empty index caused by the newline when I read the file the second time, after the first write.
I could use a for loop with a conditional on the last fwrite() in Example #1 (the implode() version), but that seems clunky and not the right way to do it.
Maybe there is a completely different way to handle this?
This is the expected behaviour of fputcsv():
fputcsv() formats a line (passed as a fields array) as CSV and writes it (terminated by a newline) to the specified file handle.
Since every line is terminated by a newline, the file ends with a newline, which can show up as an extra blank line when you read it back.
You should apply a fix for the second read, where the last line creates issues, by checking if the line is empty before processing.
If you want to prevent adding a new line at the end of the file, you could build your data set with new lines where you need them (and where you don't) then write it once:
$writeStuff = [];
foreach($editArray as $row) {
$writeStuff[] = implode(',', $row);
}
fwrite($file_handle, implode(PHP_EOL, $writeStuff));
Also, I'm not sure how you load the file, but you could always skip empty lines - here's an example:
$editArray = file('your_filename.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
Based upon the recommendation, I looked for a solution when reading and loading the file rather than when I wrote the file.
These are the solutions I came up with.
First Option:
while(! feof($file_handle)) {
$tmp = fgetcsv($file_handle);
if($tmp != NULL) {
$myArray[] = $tmp;
}
}
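fgetcsv() returns false at the end of the file, and false != NULL is false under loose comparison, so the end-of-file result is not stored. (A blank line inside the file comes back as an array containing a single null field.)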
Second option: ditch fgetcsv() for file(). With the flags below it drops line terminators and skips empty lines, so no test is needed.
$data_Array = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach($data_Array as $key) {
$myArray[] = explode(",", $key);
}
This seems to work. Additionally the example given earlier with implode() and PHP_EOL seems to work also. I may be missing something, but these work for now.
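As a further illustration, here is a sketch of a write/read round trip that loops on the return value of fgetcsv() rather than on feof(). The helper names writeCsv and readCsv are hypothetical. Because fgetcsv() returns false at the end of the file, the trailing newline written by fputcsv() never becomes an extra array element:

```php
<?php
// Hypothetical helpers for a write/read round trip.
function writeCsv(string $path, array $rows): void
{
    $fh = fopen($path, 'w');
    foreach ($rows as $row) {
        fputcsv($fh, $row, ',', '"', '\\'); // each row is terminated by "\n"
    }
    fclose($fh);
}

function readCsv(string $path): array
{
    $rows = [];
    $fh = fopen($path, 'r');
    // Loop on the return value: false means end of file (or a read error).
    while (($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // a genuinely blank line inside the file
        }
        $rows[] = $row;
    }
    fclose($fh);
    return $rows;
}
```

Reading, editing, and writing can then be repeated any number of times without the array growing an empty entry.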
I have a 1.3GB text file that I need to extract some information from in PHP. I have researched it and come up with a few ways to do what I need, but as always I am after a little clarification on which method would be best, or whether a better one exists that I do not know about.
The information I need in the text file is only the first 40 characters of each line, and there are around 17 million lines in the file. The 40 characters from each line will be inserted into a database.
The methods I have are below:
// REMOVE TIME LIMIT
set_time_limit(0);
// REMOVE MEMORY LIMIT
ini_set('memory_limit', '-1');
// OPEN FILE
$handle = @fopen('C:\Users\Carl\Downloads\test.txt', 'r');
if($handle) {
while(($buffer = fgets($handle)) !== false) {
$insert[] = substr($buffer, 0, 40);
}
if(!feof($handle)) {
// fgets() returned false before the end of the file: a read error
}
fclose($handle);
}
The code above reads one line at a time and extracts the data. I have all the database inserts sorted, doing 50 inserts at a time, ten times over, in a transaction.
The next method is really the same as the above, but calls file() to store all the lines in an array before a foreach extracts the data. I am not sure about this method, though, as the array would have over 17 million values.
Another method would be to extract only part of the file, rewrite the file with the unused data, and, after that part has been processed, re-invoke the script using a header call.
What would be the best way to get this done as quickly and efficiently as possible? Or is there a better approach that I haven't thought of?
Also, I plan to use this script with WAMP, but running it in a browser while testing has caused timeout problems even with the script time limit set to 0. Is there a way to run the script without accessing the page through a browser?
You have it right so far. Don't use the "file()" function, as it would most probably hit the RAM usage limit and terminate your script.
I wouldn't even accumulate stuff into "insert[]" array, as that would waste RAM as well. If you can, insert into the database right away.
BTW, there is a nice tool called "cut" that you could use to process the file.
cut -c1-40 file.txt
You could even redirect cut's stdout to some PHP script that inserts into database.
cut -c1-40 file.txt | php -f inserter.php
inserter.php could then read lines from php://stdin and insert into DB.
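A minimal sketch of the reading side of such an inserter.php follows. The generator name linePrefixes is hypothetical. It takes any readable stream so that it can be fed php://stdin (or the STDIN constant in the CLI), and it also truncates to 40 characters itself, so it works whether or not the input was passed through cut first:

```php
<?php
// Hypothetical helper: yield the first $len characters of every line.
function linePrefixes($stream, int $len = 40): Generator
{
    while (($line = fgets($stream)) !== false) {
        // Drop the line terminator first, so short lines stay clean.
        yield substr(rtrim($line, "\r\n"), 0, $len);
    }
}

// Usage inside inserter.php (shown as a comment, not run here):
// foreach (linePrefixes(STDIN) as $prefix) { /* insert $prefix into the DB */ }
```

Because a generator hands over one value at a time, memory use stays flat regardless of how many lines the file has.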
"cut" is a standard tool available on all Linuxes, if you use Windows you can get it with MinGW shell, or as part of msystools (if you use git) or install native win32 app using gnuWin32.
Why are you doing this in PHP when your RDBMS almost certainly has bulk import functionality built in? MySQL, for example, has LOAD DATA INFILE:
LOAD DATA INFILE 'data.txt'
INTO TABLE `some_table`
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\n'
( @line )
SET `some_column` = LEFT( @line, 40 );
One query.
MySQL also has the mysqlimport utility that wraps this functionality from the command line.
None of the above. The problem with fgets() is that, when given a length limit, it does not work as you might expect: once the maximum number of characters is reached, the next call to fgets() continues on the same line. You have correctly identified the problem with using file(). The third method is an interesting idea, and you could pull it off with other solutions as well.
That said, your first idea of using fgets() is pretty close, however we need to slightly modify its behaviour. Here's a customized version that will work as you'd expect:
function fgetl($fp, $len) {
$l = 0;
$buffer = '';
// Compare against "\n": PHP_EOL is "\r\n" on Windows and never equals a single character.
while (false !== ($c = fgetc($fp)) && "\n" !== $c) {
if ($l < $len)
$buffer .= $c;
++$l;
}
if (0 === $l && false === $c) {
return false;
}
return $buffer;
}
Execute the insert operation immediately or you will waste memory. Make sure you are using prepared statements to insert this many rows; this will drastically reduce execution time. You don't want to submit the full query on each insert when you can submit only the data.
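A minimal sketch of that advice using PDO follows. The function name insertPrefixes and the table and column names (line_prefixes, prefix) are made up for illustration, and the batch size is an arbitrary choice. The statement is prepared once and only the data is sent on each execute(), with a commit every $batchSize rows:

```php
<?php
// Hypothetical helper: insert values one by one through a single prepared statement.
function insertPrefixes(PDO $pdo, iterable $prefixes, int $batchSize = 1000): int
{
    // Prepared once; the table and column names are assumptions for this sketch.
    $stmt = $pdo->prepare('INSERT INTO line_prefixes (prefix) VALUES (?)');
    $count = 0;
    $pdo->beginTransaction();
    foreach ($prefixes as $prefix) {
        $stmt->execute([$prefix]); // only the data travels to the server
        if (++$count % $batchSize === 0) {
            // Commit in batches so one transaction does not grow without bound.
            $pdo->commit();
            $pdo->beginTransaction();
        }
    }
    $pdo->commit();
    return $count;
}
```

The $prefixes argument can be any iterable, so rows can be streamed from the file as they are read rather than collected in an array first.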
This question was asked on a message board, and I want to get a definitive answer and intelligent debate about which method is more semantically correct and less resource intensive.
Say I have a file with each line in that file containing a string. I want to generate an MD5 hash for each line and write it to the same file, overwriting the previous data. My first thought was to do this:
$file = 'strings.txt';
$lines = file($file);
$handle = fopen($file, 'w+');
foreach ($lines as $line)
{
fwrite($handle, md5(trim($line))."\n");
}
fclose($handle);
Another user pointed out that file_get_contents() and file_put_contents() were better than using fwrite() in a loop. Their solution:
$thefile = 'strings.txt';
$newfile = 'newstrings.txt';
$current = file_get_contents($thefile);
$explodedcurrent = explode("\n", $current);
$temp = '';
foreach ($explodedcurrent as $string)
$temp .= md5(trim($string)) . "\n";
$newfile = file_put_contents($newfile, $temp);
My argument is that since the main goal of this is to get the file into an array, and file_get_contents() is the preferred way to read the contents of a file into a string, file() is more appropriate and allows us to cut out another unnecessary function, explode().
Furthermore, by directly manipulating the file using fopen(), fwrite(), and fclose() (which is the exact same as one call to file_put_contents()) there is no need to have extraneous variables in which to store the converted strings; you're writing them directly to the file.
My method is the exact same as the alternative - the same number of opens/closes on the file - except mine is shorter and more semantically correct.
What do you have to say, and which one would you choose?
This should be more efficient and less resource-intensive than the previous two methods:
$file = 'passwords.txt';
$passwords = file($file);
$converted = fopen($file, 'w+');
$i = 0;
while (count($passwords) > 0)
{
fwrite($converted, md5(trim($passwords[$i])) . "\n");
unset($passwords[$i]);
$i++;
}
fclose($converted);
echo 'Done.';
As one of the comments suggests, do what makes the most sense to you, since you might come back to this code in a few months and will want to spend as little time as possible trying to understand it.
However, if speed is your concern, I would create two test cases (you pretty much have them already) and time them: record a timestamp at the start of the script and subtract it from a timestamp taken at the end to see how long the run took. Prepare a few input files; I would go for about three, two extremes and one normal file, to see which version runs faster.
http://php.net/manual/en/function.time.php
I would think that differences would be marginal, but it also depends on your file sizes.
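A minimal sketch of such a timing harness follows. The function name timeIt is hypothetical, and the sketch uses microtime(true) rather than time() because time() only has one-second resolution:

```php
<?php
// Hypothetical helper: average wall-clock seconds per call of $fn.
function timeIt(callable $fn, int $runs = 5): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $runs;
}

// Example use (commented out; each closure would wrap one version of the script):
// printf("file()+fwrite: %.4fs\n", timeIt(function () { /* version 1 */ }));
// printf("get/put_contents: %.4fs\n", timeIt(function () { /* version 2 */ }));
```

Averaging over several runs smooths out disk caching effects, which can otherwise make whichever version runs second look faster.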
I'd propose to write a new temporary file, while you process the input one. Once done, overwrite the input file with the temporary one.
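A minimal sketch of this temporary-file approach follows. The function name hashLinesInPlace is hypothetical. The temporary file is created in the same directory as the input so that rename() does not have to cross filesystems:

```php
<?php
// Hypothetical helper: replace each line of $path with its MD5 hash.
function hashLinesInPlace(string $path): void
{
    $in = fopen($path, 'r');
    // Create the temporary file next to the input file.
    $tmpPath = tempnam(dirname($path), 'md5');
    $out = fopen($tmpPath, 'w');
    // Only one line is held in memory at a time.
    while (($line = fgets($in)) !== false) {
        fwrite($out, md5(trim($line)) . "\n");
    }
    fclose($in);
    fclose($out);
    // Swap the finished file into place.
    rename($tmpPath, $path);
}
```

Unlike opening the input with 'w+' straight away, the original stays intact until every line has been processed, so an interrupted run does not leave a half-written file behind.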
I was wondering if anybody could shed any light on this problem.. PHP 5.3.0 :)
I have a loop, which is grabbing the contents of a CSV file (large, 200mb), handling the data, building a stack of variables for mysql inserts and once the loop is complete and the variables created, I'm inserting the information.
Firstly, the MySQL insert performs perfectly, with no delays. It's the LOOP itself that has the delay. I was originally using fgetcsv() to read the CSV file, but compared to file_get_contents() it had a serious delay, so I switched to file_get_contents(). The loop finishes in a matter of seconds until I add a function that creates an array from the CSV data on each line (I've also put the expression directly inside the loop, without the function, to see if it helps). This is what causes serious delays in parsing time! (The difference is about 30 seconds for this 200 MB file, but I guess it depends on the size of the CSV file.)
Here's some code so you can see what I'm doing:
$filename = "file.csv";
$content = file_get_contents($filename);
$rows = explode("\n", $content);
foreach ($rows as $data) {
$data = preg_replace("/^\"(.*)\"$/","$1",preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data))); //THIS IS THE CULPRIT CAUSING SLOW LOADING?!?
}
Running the above loop, will perform almost instantly without the line:
$data = preg_replace("/^\"(.*)\"$/","$1",preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data)));
I've also tried creating a function as below (outside of loop):
function csv_string_to_array($str) {
$expr="/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/";
$results=preg_split($expr,trim($str));
return preg_replace("/^\"(.*)\"$/","$1",$results);
}
and calling the function instead of the one liner:
$data = csv_string_to_array($data);
With again no luck :(
Any help would be appreciated on this, I'm guessing the fgetcsv function is performing in a very similar way based on the delay it causes, looping through and creating an array from the line of data.
Danny
The regex subexpressions (bounded by "(...)") are the issue. It's trivial to show that adding these to an expression can greatly reduce its performance. The first thing I would try is to stop using preg_replace() to simply remove leading and trailing double quotes (trim() would be a better bet for that) and see how much that helps. After that you might need to try a non-regex way to parse the line.
I found a partial solution: I now process the file in batches of 1000 lines (PHP loops 1000 lines at a time until it reaches the end of the file).
I'm then only setting:
$data = preg_replace("/^\"(.*)\"$/","$1",preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data)));
on the 1000 lines, so that it's not being set for the WHOLE file which was causing issues.
It is now looping and inserting 1000 rows into the MySQL database in 1-2 seconds, which I'm happy with. I've set up the script to loop over 1000 rows, remember its last location, then loop over the next 1000 until it reaches the end. It seems to be working OK!
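One way to "remember the last location" is to keep the byte offset reported by ftell() and seek back to it for the next batch. The sketch below is only an illustration of that idea, not the poster's actual script. The function name readCsvBatch is hypothetical, and it uses fgetcsv() instead of the regular expression:

```php
<?php
// Hypothetical helper: read up to $limit CSV rows starting at byte $offset.
// Returns [rows, offset to pass in for the next batch].
function readCsvBatch(string $path, int $offset, int $limit = 1000): array
{
    $fh = fopen($path, 'r');
    fseek($fh, $offset);
    $rows = [];
    // Check the row count first so no extra line is consumed.
    while (count($rows) < $limit
        && ($row = fgetcsv($fh, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    $next = ftell($fh);
    fclose($fh);
    return [$rows, $next];
}
```

The caller keeps requesting batches with the returned offset until an empty batch comes back.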
I'd say the major culprit is the complexity of the preg_split() regexp.
And the explode() is probably eating some seconds.
$content = file_get_contents($filename);
$rows = explode("\n", $content);
could be replaced by:
$rows = file ($filename); // returns an array
But, I second the above suggestion from ITroubs, fgetcsv() would probably be a much better solution.
I would suggest using fgetcsv() for parsing the data. It seems like memory may be your biggest problem. To avoid consuming 200MB of RAM, you should parse line by line as follows:
$fp = fopen($input, 'r');
while (($row = fgetcsv($fp, 0, ',', '"')) !== false) {
$out = '"' . implode('", "', $row) . '"'; // quoted, comma-delimited output
// perform work
}
Alternatively: using conditionals in preg is typically very expensive. It can sometimes be faster to process these lines using explode() and trim() with its $charlist parameter.
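A minimal sketch of the explode() and trim() approach follows, with hypothetical function names. Note the limitation: a plain explode() splits on every comma, including commas inside quoted fields, so the second function shows str_getcsv() as a quote-aware option for such input:

```php
<?php
// Hypothetical helper: fast split, only safe when fields never contain commas.
function splitSimpleCsvLine(string $line): array
{
    $fields = explode(',', trim($line));
    // trim() with a character list strips the surrounding double quotes.
    return array_map(fn($field) => trim($field, '"'), $fields);
}

// Hypothetical helper: quote-aware split using the built-in CSV parser.
function splitCsvLine(string $line): array
{
    return str_getcsv(trim($line), ',', '"', '\\');
}
```

For example, splitSimpleCsvLine('"a","b",c') gives the three fields a, b, and c, while a line such as "a,1","b" needs splitCsvLine() to keep a,1 together as one field.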
The other alternative, if you still want to use preg, add the S modifier to try to speed up the expression.
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
S
When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.
By the way, I don't think your function is doing what you think it should: it won't actually modify the $rows array when you've exited from the loop. To do that, you need something more like:
foreach ($rows as $key => $data) {
$rows[$key]=preg_replace("/^\"(.*)\"$/","$1",preg_split("/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/", trim($data)));