I want to read a CSV file from top to bottom in a foreach loop every time and check whether the same content already exists; if it does, overwrite it with the new content. For example, if the product ID is the same, overwrite the old data with the new data in the same row.
Right now, I'm using fgetcsv to read the CSV file line by line. Is there a method to write to the same line as I read the CSV?
Is there a method to write to the same line as I read the CSV?
Short answer: No.
Long answer: Yes, but it's massively difficult and generally you would simply want to completely rewrite the file from start to finish.
Let's say you have a file containing:
The quick brown fox jumped over the lazy dog.
and you want to alter jumped to jumps. "Easy!" you might think, "I'll just change out the words." You look up file IO and how to move pointers around, seek to the position of the beginning of the word "jumped", and write "jumps". Now you have:
The quick brown fox jumpsd over the lazy dog.
Uh oh, that's not right. We'll need to delete that errant d. How? By shifting every single byte in the file up by one byte: a tedious operation and a great way to corrupt your data through small oversights and odd corner cases. [1- to 4-byte UTF-8 characters? BOMs?] You're in even more trouble if you need to insert a word longer than the original jumped.
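A minimal PHP sketch of exactly that failure (the file name is made up for illustration):

<?php
// Sketch: the naive in-place overwrite described above.
file_put_contents('sentence.txt', 'The quick brown fox jumped over the lazy dog.');

$fp = fopen('sentence.txt', 'r+');
fseek($fp, strpos('The quick brown fox jumped over the lazy dog.', 'jumped'));
fwrite($fp, 'jumps'); // overwrites "jumpe" but leaves the old "d" in place
fclose($fp);

echo file_get_contents('sentence.txt');
// The quick brown fox jumpsd over the lazy dog.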
Think of files like brick walls. It's easy to append a new row of bricks to the top of the wall, but if you want to stick even one extra brick into the middle you might as well just knock the whole thing down and rebuild.
If you have a set of data that needs to be modified in place, then just about the worst place to store it is in flat files like CSV or XML. Go with Dagon's suggestion and put that data in a database where it belongs.
Don't have a DB server? Look into SQLite.
Forgetting about your implementation here, you are asking for the fastest way to check, in a loop, whether a file has changed. I thought at first a checksum might be reasonable, but computing one may take longer than you'd expect.
How to generate a md5 checksum for a CSV file in JSP
The next option is to use File.GetLastWriteTime(). Skip the loop and just check whether it is the same as the last time the file was used.
http://msdn.microsoft.com/en-us/library/system.io.file.getlastwritetime(v=vs.100).aspx
Whoops... I didn't notice you are using PHP. I am sure there are similar methods for both of those options.
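For PHP, filemtime() and md5_file() are rough equivalents of those two options; a sketch (file names are made up):

<?php
$file = 'data.csv';

// Option 1: checksum the whole file (reads every byte, so it can be slow).
$checksum = md5_file($file);

// Option 2: the last-modified timestamp (a cheap metadata lookup).
clearstatcache(); // filemtime() results are cached within a request
$mtime = filemtime($file);

$stamp = 'data.csv.mtime'; // where the previous timestamp was remembered
$previous = is_file($stamp) ? (int) file_get_contents($stamp) : 0;
if ($mtime !== $previous) {
    // the file changed since the last check; do the work, then store the new time
    file_put_contents($stamp, $mtime);
}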
Related
There are quite a few threads about similar topics, yet I have not been able to fully work out a solution to my problem.
What I'd like to do is quite simple: I have a flat-file db, with data stored like this:
$username:$worldLocation:$resources
The issue is that I would like to have a submit-data HTML page that updates this line, based on a search for the term, using PHP:
search db for - $worldLocation
if $worldLocation found
replace entire line with $username:$worldLocation:$updatedResources
I know there should be a fairly easy way to get this done, but I am unable to figure it out at the moment. I will keep trying while this post is up, but if you know a way I could use, I would greatly appreciate the help.
Thank you
I always loved C, and the functions that came into PHP from C.
Check out fscanf and fprintf.
These will make your life easier when reading and writing in a fixed format. Say:
$filehandle = fopen("file.txt", "r"); // "r" to read; the original "c" mode is write-only
while ($values = fscanf($filehandle, "%s\t%s\t%s\n")) {
    list($a, $b, $c) = $values;
    // do something with $a, $b, $c
}
fclose($filehandle);
Also, there is no performance workaround that avoids reading the entire file, changing one line, and writing the entire file back. You have to do it.
This is about as efficient as you can get, because you are most probably running native C code: I read somewhere that PHP just wraps C's functions in these cases.
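For the writing side, here is a sketch with fprintf() using the same tab-separated format (the output file name and values are made up):

<?php
list($a, $b, $c) = array('foo', 'bar', 'baz'); // sample values
$out = fopen('file_new.txt', 'w');
fprintf($out, "%s\t%s\t%s\n", $a, $b, $c); // mirrors the fscanf format above
fclose($out);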
You like the hard way, so be it...
Make each line the same length, adding spaces, tabs, capital X's, etc. to fill in the blanks.
When you want to replace a line, find it; as each line is of a fixed length, you can overwrite it in place.
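A sketch of that fixed-length trick; the record length, file name, and search term here are assumptions:

<?php
// Every line is padded to exactly $recordLength bytes, newline included.
$recordLength = 64;
$fp = fopen('records.txt', 'r+');

$lineNumber = 0;
while (($line = fgets($fp)) !== false) {
    if (strpos($line, 'needle') !== false) {
        // Jump back to the start of this record and overwrite it in place.
        fseek($fp, $lineNumber * $recordLength);
        fwrite($fp, str_pad('replacement data', $recordLength - 1) . "\n");
        break;
    }
    $lineNumber++;
}
fclose($fp);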
For speed and less hassle, use a database (even SQLite).
If you're committed to the flat file, the simplest thing is iterating through each line, writing a new file & changing the one that matches.
Yeah, it sucks.
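A sketch of that rewrite approach, using the $username:$worldLocation:$resources layout from the question (file names and values are made up):

<?php
$search  = 'someWorld';                            // the $worldLocation to find
$newLine = "newUser:someWorld:updatedResources\n"; // the replacement line

$in  = fopen('db.txt', 'r');
$out = fopen('db.txt.tmp', 'w');
while (($line = fgets($in)) !== false) {
    $fields = explode(':', rtrim($line, "\n"));
    // Swap the line whose second field matches; copy everything else verbatim.
    fwrite($out, (isset($fields[1]) && $fields[1] === $search) ? $newLine : $line);
}
fclose($in);
fclose($out);
rename('db.txt.tmp', 'db.txt'); // swap the new file over the original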
I'd strongly recommend switching over to a 'proper' database. If you're concerned about resources or the complexity of running a server, you can look into SQLite or Berkeley DB. Both of these use a database that is 'just a file', removing the issue of installing and maintaining a DB server, but still give you the ability to quickly and easily search, replace and delete individual records. If you still need the flat file for some other reason, you can easily write some import/export routines.
Another interesting possibility, if you want to be creative, would be to look at your filesystem as a database. Give each user a directory. In each directory, have a file for locations. In each file, update the resources. This means that, to insert a row, you just write to a new file. To update a file, you just rewrite a single file. Deleting a user is just nuking a directory. Sure, there's a bit more overhead in slurping the whole thing into memory.
Other ways of solving the problem might be to make your flat file append-only, since appending to the end of a file is a trivial operation. You then create a second file that lists "dead" line numbers that should be ignored when reading the flat file. Similarly, you could easily "X" out the existing lines (which, again, is far easier than trying to update lines that might not stay the same length) and append your new data to the end.
Those second two ideas aren't really meant to be practical solutions as much as they are to show you that there's always more than one way to solve a problem.
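For the curious, a sketch of the "X it out and append" idea (the file name and record values are made up):

<?php
$fp = fopen('db.txt', 'r+');
$offset = 0; // byte position where the current line starts
while (($line = fgets($fp)) !== false) {
    if (strpos($line, 'someWorld') !== false && $line[0] !== 'X') {
        fseek($fp, $offset);
        fwrite($fp, 'X'); // tombstone marker: same length, so nothing shifts
        fseek($fp, 0, SEEK_END);
        fwrite($fp, "newUser:someWorld:updatedResources\n");
        break;
    }
    $offset = ftell($fp);
}
fclose($fp);
// Readers then simply skip any line starting with "X".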
OK... after a few hours' work, this example worked fine for me.
I intended to code an editing tool and use it for password updates, and it did the trick!
Not only does this page send an email to the user with the new password (sorry, the address is hardcoded to avoid posting additional code), but it also edits the entry for the user and rewrites all the file info into a new file.
When done, it obviously swaps filenames, storing the old file as usuarios_old.txt.
Grab the code here (sorry, Stack Overflow got VERY picky about code posting):
https://www.iot-argentina.xyz/edit_flat_databse.txt
Is this what you are looking for:
UPDATE `table` SET `field to replace` = '$username:$worldLocation:$updatedResources' WHERE `field` = '$worldLocation';
I have a function that writes a text file based on the settings of a rather large form.
In short, I want to compare the output of the function to a single file, and only do the execution (rewriting the file) if the destination file is different from the output. As you can guess, it is a performance concern.
Is it doable, BTW?
The process is, I fill out some forms, and then:
1. A single file is written to contain some "specific" selected options.
2. Some "non-specific" options do not necessarily write anything to the file.
The form can be updated at any time, so the content of the file may grow or shrink depending on the options.
The file only needs a rewrite when I am at point #1.
When at point #2, nothing should be written.
This is what I tried:
if ($output != file_get_contents($filepath)) {
// save the data
}
But I noticed a significant execution delay with this.
I found an almost similar issue here: Can I use file_get_contents() to compare two files?, but my issue is different: mine compares the result of the process to an already existing file, which is simply the result of the previous run, and only rewrites the file if they differ.
No sensitive data on the form, btw.
Any hint is very much appreciated.
Thanks
To compare a whole file with a string (I suppose it's a string, isn't it?), the only way is to read the whole file and do the comparison. To improve performance you can read the file line by line and stop at the first differing line, as Explosion Pills said before me.
If your file is really big, and you want to improve performance further, you can do some hashing stuff:
1. Generate the output, let's say $output.
2. Calculate md5($output) and store it in $output_md5.
3. Compare $output_md5 with a stored one, let's say in the file output.md5. Are they equal?
   - If yes, do nothing.
   - If not, save $output into output.txt and $output_md5 into output.md5.
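A sketch of those steps in PHP (the output-generating function and the file names are placeholders):

<?php
$output     = generate_output(); // placeholder for whatever builds the form output
$output_md5 = md5($output);

$stored_md5 = is_file('output.md5') ? trim(file_get_contents('output.md5')) : '';

if ($output_md5 !== $stored_md5) {
    // Only here do we touch the big file.
    file_put_contents('output.txt', $output);
    file_put_contents('output.md5', $output_md5);
}
// Hashes equal: do nothing.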
Rather than loading the entire file into memory, it may be faster to read it line by line (fgets) and compare it to the input string, also line by line. You could even go as small as character by character, but I think that's overkill.
You could always try a combination of what was in the other post, the sha1_file($file) function, with the sha1($string) function, and check the equality of that.
I have files I need to convert into a database. These files (I have over 100k of them) are from an old system (generated by a COBOL script). I am now part of the team migrating data from that system to the new one.
Now, because we have a lot of files to parse (each file is 50 MB to 100 MB), I want to make sure I use the right methods to convert them to SQL statements.
Most of the files have these following format:
#id<tab>name<tab>address1<tab>address2<tab>city<tab>state<tab>zip<tab>country<tab>#\n
(the address2 field is optional and can be empty)
or
#id<tab>client<tab>taxid<tab>tagid<tab>address1<tab>address2<tab>city<tab>state<tab>zip<tab>country<tab>#\n
These are the two most common line formats (I'd say around 50%); apart from these, all lines look similar but carry different information.
Now, my question is: how should I open and parse these files so that the work is as efficient and correct as possible?
Honestly, I wouldn't use PHP for this. I'd use awk. With input that's as predictably formatted as this, it'll run faster, and you can output into SQL commands which you can also insert via a command line.
If you have other reasons why you need to use PHP, you probably want to investigate the fgetcsv() function. Output is an array which you can parse into your insert. One of the first user-provided examples takes CSV and inserts it into MySQL. And this function does let you specify your own delimiter, so tab will be fine.
If the id# in the first column is unique in your input data, then you should definitely make it the primary key in MySQL, to save you from duplicating data if you have to restart your batch.
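A sketch of that pipeline; the connection details, table, and column names are assumptions, and PDO prepared statements stand in for whatever insert mechanism you actually use:

<?php
$pdo  = new PDO('mysql:host=localhost;dbname=migration', 'user', 'pass');
$stmt = $pdo->prepare(
    'INSERT IGNORE INTO clients
         (id, name, address1, address2, city, state, zip, country)
     VALUES (?, ?, ?, ?, ?, ?, ?, ?)'
);

$fp = fopen('export.dat', 'r');
while (($row = fgetcsv($fp, 0, "\t")) !== false) {
    $row[0] = trim($row[0], '#'); // strip the leading # from the id
    array_pop($row);              // drop the trailing "#" field
    if (count($row) === 8) {      // the first format shown above
        $stmt->execute($row);
    }
    // other formats: dispatch on field count or content
}
fclose($fp);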
When I worked on a project where it was necessary to parse huge and complex log files (Apache, firewall, SQL), we got a big performance gain from the function preg_match_all() (less than 10% of the time required using explode / trim / formatting).
Huge files (>100 MB) are parsed in two or three minutes on a Core 2 Duo (the drawback is that memory consumption is very high, since a giant array is created with all the information ready to be synthesized).
Regular expressions allow you to identify the content of a line if you have variations within the same file.
But if your files are simple, try ghoti's suggestion (fgetcsv); it will work fine.
If you're already familiar with PHP then using it is a perfectly fine tool.
If records do not span multiple lines, the best way to guarantee that you won't run out of memory is to process one line at a time.
I'd also suggest looking at the Standard PHP Library. It has nice directory iterators and file objects that make working with files and directories a bit nicer (in my opinion) than it used to be.
If you can use the CSV features and you use the SPL, make sure to set your options correctly for the tab character, as in the sketch below.
You can use trim to remove the # from the first and last fields easily enough after the call to fgetcsv.
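For example, a sketch with SplFileObject (the file name is made up):

<?php
$file = new SplFileObject('export.dat');
$file->setFlags(SplFileObject::READ_CSV | SplFileObject::SKIP_EMPTY);
$file->setCsvControl("\t"); // tab as the delimiter

foreach ($file as $row) {
    if ($row === array(null)) continue;  // guard against blank lines on some PHP versions
    $row[0] = trim($row[0], '#');        // strip the leading #
    // ... build the insert from $row
}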
Just sit down and parse.
It's a one-time operation, and looking for the most efficient way makes no sense.
A more or less sane approach would be enough.
As a matter of fact, you will most likely waste more overall time looking for the super-extra-best solution. Say your code runs for an hour; you then spend another hour finding a solution that runs 30% faster. You'll have spent 1.7 hours vs. 1.
I have a couple hundred single words that are identified in a foreach routine and pushed into an array.
I would like to check each word to see whether it exists in an existing text file that is a single column of 200k+ lines.
(Similar to a huge "bad word" routine, I guess, but in the end this will add to the "filter" file.)
I don't know whether I should do this with preg_match in the loop, or whether I should combine the arrays somehow and use array_unique.
I would like to add the ones not found to the main file as well. I am also using flock() in an attempt to avoid any multi-access issues.
Is this a pipe dream? Well, it is for this beginner: my attempts have timed out at 30 seconds.
Stack Overflow has been such a great resource. I don't know what I would do without it. Thanks in advance either way.
Sorry, but that sounds like a REALLY AWFUL APPROACH!
Doing a whole scan (of a table, a list, or whatever) to check whether something already exists is just... wrong.
This is what hashtables are for!
Your case sounds like a classical database job.
If you don't have a database available, you can use a local SQLite file, which will provide the essential functionality.
Let me explain the background...
A lookup of "foo" in a hashtable basically consumes O(1) time, which means a constant amount of time, because the algorithm knows WHERE to look and can see directly whether it's THERE. Hashtables can run into ambiguity because of the one-way nature of hashing, but that doesn't really matter much: the hashtable delivers a few possible matches which can then be compared directly (for any reasonable number of elements; probably not the Google index, laugh).
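PHP's arrays are hashtables under the hood, so even without a database you can get those O(1) lookups; a sketch, with the filter file name assumed and $words standing for the array from the question's foreach routine:

<?php
// Load the 200k-line filter file once, using the words as array keys.
$known = array_flip(file('filter.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));

$newWords = array();
foreach ($words as $word) {       // $words: the array built by the foreach routine
    if (!isset($known[$word])) {  // O(1) hashtable lookup, no file scan
        $newWords[] = $word;
        $known[$word] = true;     // avoid re-adding duplicates within this run
    }
}
if ($newWords) {
    file_put_contents('filter.txt', implode("\n", $newWords) . "\n", FILE_APPEND | LOCK_EX);
}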
So if you want (for some reason) to stay with your text-file approach, consider the following:
Sort your file and insert your data at the right place (alphabetically would be the most intuitive approach). Then you can jump from position to position and isolate the area where the word should be; there are several algorithms available, just have a Google. But keep in mind that it takes longer the more data you have: usually your running time will be O(log n), where n is the size of the table.
Well, this is all basically just to guide you onto the right track.
You could also shard your data: for example, save every word beginning with "a" in the file a.txt, and so on. Or split the word into characters, create a folder for every character with the last character as the file, then check whether the file exists. These are deliberately silly suggestions, as you would probably run out of inodes on your disk, but they illustrate that you can CHECK for EXISTENCE without having to do a FULL SCAN.
The main thing is that you have to project some search tree onto a reasonable structure (as a database system does automatically for you); the folder example illustrates the basic principle.
this wikipedia entry might be a good place to start: http://en.wikipedia.org/wiki/Binary_search_tree
If the file is too large, then it is not a good idea to read it all into memory. You can process it line by line:
<?php
$words = array('a', 'b', 'c'); # words to insert, assumed to be unique
$fp = fopen('words.txt', 'r+');

# Pass 1: cross off every word that already appears in the file.
while (!feof($fp))
{
    $line = trim(fgets($fp));
    $key = array_search($line, $words);
    if ($key !== false)
    {
        unset($words[$key]);
        if (!$words) break;
    }
}

# Pass 2: append whatever is left. A seek is required when switching
# from reading to writing on the same handle.
fseek($fp, 0, SEEK_END);
foreach ($words as $word)
{
    fputs($fp, "$word\n");
}
fclose($fp);
?>
It loops through the entire file, checking to see if the current line (assumed to be a single word) exists in the array. If it does, that element is removed from the array. If there is nothing left in the array, then the search stops. After cycling through the file, if the array is not empty, it adds each of them to the file.
(File locking and error handling are not implemented in this example.)
Note that this is a very bad way to store this data (file based, unsorted, etc). Even sqlite would be a big improvement. You could always simply write an exporter to .txt if you needed it in plain text.
I want to delete a range of data from a text file using PHP. Let's assume the file contains the following:
Hello, World!
I want to delete everything from character 2 to character 7. The actual file I need to do this with is very large, so I don't want to have to read the whole file just to delete a small, given range of data. The data contained within the given range is not known, so str_replace or preg_replace solutions wouldn't work anyway.
Thanks!
There is no way to remove a chunk in the middle of a file. You will need to read everything following the chunk to move it down to backfill the hole. Copying the relevant data to another file is an easy way to do this.
If you for some reason have to use a file, and it's a big file, you can read it in smaller chunks (like one line at a time) and continuously write the data you want to keep out to a temporary file. This will cut down on memory requirements.
I took your guys advice, plus I did some brainstorming, and I found a solution to my problem. Basically I took Ignacio's suggestion:
You will need to read everything following the chunk to move it down to backfill the hole. Copying the relevant data to another file is an easy way to do this.
But, instead of moving the data to a temporary file, I simply read each chunk and then immediately moved the file pointer backwards with fseek() and used fwrite() to fill in the hole. Then I truncated the file to the correct length with ftruncate().
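Roughly, the technique looks like this (a sketch; the offsets match the "characters 2 to 7" example above, and the file name is made up):

<?php
$start = 1; // byte offset where the hole begins (character 2)
$end   = 7; // first byte to keep (character 8)

$fp = fopen('large.txt', 'r+');
$read  = $end;   // where the next chunk is read from
$write = $start; // where it is written back to
while (true) {
    fseek($fp, $read);
    $chunk = fread($fp, 8192);
    if ($chunk === '' || $chunk === false) break;
    fseek($fp, $write);
    fwrite($fp, $chunk); // shift the tail backwards over the hole
    $read  += strlen($chunk);
    $write += strlen($chunk);
}
ftruncate($fp, $write); // cut off the leftover bytes at the end
fclose($fp);
// "Hello, World!" becomes "HWorld!"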
Again, thanks everybody for the suggestions!
You don't want to, but you have to. Read and rewrite the whole file. That's why everyone would use a database for this, not a plain file.
Here's a better solution than my original answer, although I'm still not sure it's ideal.
You could open the file for reading, read it in chunks, and drop whatever you read into a second file which you've opened for writing, skipping the portion you want to "delete".
Naturally at the end you'd copy your temp file over the original.
Edit:
Upon further reflection, I feel this is actually probably the most useful answer (unless/until someone has a better idea), but the credit must go to "gnud" for arriving at the same conclusion first.
The simplest way I can think of is to read in the entire file as a string and use string_splice to remove a segment.
Edit:
Excuse me, I didn't mean array_splice; I meant string_splice, which is a custom function I made for my own use. It's something like this (I don't have it handy at the moment):
function string_splice($string, $start, $length, $replace) {
    // Replace $length characters starting at offset $start with $replace.
    return substr($string, 0, $start) . $replace . substr($string, $start + $length);
}
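Hypothetical usage, on the string from the question:

<?php
echo string_splice('Hello, World!', 1, 6, ''); // HWorld!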
Edit:
This is NOT the ideal solution; please see the comments below. It's a bad idea to read a very large file into a string; in addition to memory consumption, operating on such a large string is very inefficient. A better solution is that proposed by gnud. Thanks.