PHP: remove duplicate if same mobile phone number is in .csv file

I have a separate PHP script that saves a number of CSV values to a file via an HTML contact form.
I would like at most 2 rows sharing the same mobile phone number in the CSV file;
any more and I want the incoming record discarded rather than saved.
I am using $_GET (not $_POST) to read all entries, and then save them to file.
I'm just having issues with skipping the record if the mobile number is already in the file TWICE.
Any help would be greatly appreciated.
**ADDED MORE CODE AND COMMENT BELOW**
I have edited the code, but I am still running into trouble with removing duplicates, let alone checking for 2 dupes first. I will sanitize and tidy the code 'after' I have something functional (help!).
Thanks again for your help :)
<?php
$filename = "input.csv";
$title  = $_GET['title'];
$fname  = $_GET['fname'];
$sname  = $_GET['sname'];
$notes  = $_GET['notes'];
$mobile = $_GET['mobile'];
$string = "$title,$fname,$sname,$mobile,$notes\n";
// 'c+' creates the file if it does not exist and also allows reading -
// see details here http://us3.php.net/manual/en/function.fopen.php
$file = fopen($filename, "c+");
// Now acquire an exclusive lock via flock() - http://us3.php.net/manual/en/function.flock.php
flock($file, LOCK_EX); // this will block till some other reader/writer has released the lock.
$stat = fstat($file);
if ($stat['size'] == 0) {
    // file created for the first time - write the header row plus the record
    fwrite($file, "Title,First Name,Last Name,MobileNumber,Notes\n$string");
    flock($file, LOCK_UN);
    fclose($file);
    return;
}
// File not empty - scan through line by line via fgets() and count matches.
// If the mobile number already appears twice, release the lock, close the
// file and return - no need to fwrite the line.
$matches = 0;
while (($buffer = fgets($file, 2188)) !== false) {
    if (stripos($buffer, ",$mobile,") !== false && ++$matches >= 2) {
        flock($file, LOCK_UN);
        fclose($file);
        return;
    }
}
// Fewer than two existing rows with this mobile number - append the new one.
fwrite($file, $string);
flock($file, LOCK_UN);
fclose($file);
?>

Are you running this on a Linux/Unix system? If so, the way you have accessed the file will lead to race-conditions and possible corruption of the file.
You need to ensure that the write to the file is done in a serialized manner if multiple processes are attempting to write to the same file.
As you don't want to explore other alternatives like a db (even key-value file-based dbs), a pseudo-code approach is:
$file = fopen($filename, "c"); // see details on the 'c' mode here http://us3.php.net/manual/en/function.fopen.php - it will create a file if it does not exist.
// Now acquire an exclusive lock via flock() - http://us3.php.net/manual/en/function.flock.php
flock($file, LOCK_EX); // this will block till some other reader/writer has released the lock.
$stat = fstat($file);
if($stat['size'] == 0)
{
// file created for the first time
fwrite($file, "Title,First Name,Last Name,MobileNumber,Notes\n$string");
flock($file, LOCK_UN);
fclose($file);
return;
}
// File not empty - scan thru line by line via fgets(), and detect duplicates
// If a duplicate is detected, just flock($file, LOCK_UN), close the file and return - no need to fwrite the line.
// Otherwise fwrite() the line
.
.
flock($file, LOCK_UN);
fclose($file);
You can fill in the details in the middle part - hope you got the gist of it.
You could potentially make it more 'scalable' by initially grabbing a read lock (this will allow multiple readers to run concurrently, and only the writer will block). Once the read portion is done, you need to release the lock, and if a write needs to be done (i.e. no duplicates detected), then grab a writer lock etc...
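The shared-then-exclusive idea above might be sketched roughly as follows (a sketch only; `$filename`, `$mobile` and `$string` stand in for the values from the form, and a real implementation must re-scan after re-locking, since the file may change in the gap between the two locks):

```php
<?php
// Sketch: read under a shared lock, then re-lock exclusively to write.
$filename = 'input.csv';
$mobile   = '0777000111';
$string   = "Mr,John,Smith,$mobile,example note\n";

$file = fopen($filename, 'c+');   // create if missing, allow read and write
flock($file, LOCK_SH);            // shared: many readers may hold this at once
$duplicates = 0;
while (($line = fgets($file)) !== false) {
    if (stripos($line, ",$mobile,") !== false) {
        $duplicates++;
    }
}
flock($file, LOCK_UN);            // release before asking for the write lock

if ($duplicates < 2) {
    flock($file, LOCK_EX);        // exclusive: blocks until all readers are gone
    // The file may have changed between the two locks,
    // so re-scan here before writing in a real implementation.
    fseek($file, 0, SEEK_END);
    fwrite($file, $string);
    flock($file, LOCK_UN);
}
fclose($file);
```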
Clearly this is not an ideal solution, but if your file contents are going to be small, it may suffice.
Stating the obvious, you would need to do better error handling with all file-based operations.
A tangential point: you should also sanitize the data from $_GET before going to the core logic to catch for invalid inputs.
Hope this helps.

Related

How to rename() a file in PHP that needs to remain locked while doing so?

I have a text file which multiple users will be simultaneously editing (limited to an individual line per edit, per user). I have already found a solution for the "line editing" part of the required functionality right here on StackOverflow.com, specifically the 4th solution (for large files) offered by @Gnarf in the following question:
how to replace a particular line in a text file using php?
It basically rewrites the entire file contents to a new temporary file (with the user's edit included) and then renames the temporary file to the original file once finished. It's great!
To avoid one user's edit causing a conflict with another user's edit if they are both attempting an edit at the same time, I have introduced flock() functionality, as can be seen in my variation on the code here:
$reading = fopen($file, 'r');
$writing = fopen($temp, 'w');
$replaced = false;
if ((flock($reading, LOCK_EX)) and (flock($writing, LOCK_EX))) {
echo 'Lock acquired.<br>';
while (!feof($reading)) {
$line = fgets($reading);
$values = explode("|",$line);
if ($values[0] == $id) {
$line = $id."|comment edited!".PHP_EOL;
$replaced = true;
}
fputs($writing, $line);
}
flock($reading, LOCK_UN);
flock($writing, LOCK_UN);
fclose($reading);
fclose($writing);
} else {
echo 'Lock not acquired.<br>';
}
I've made sure the $temp file always has a unique filename. Full code here: https://pastebin.com/E31hR9Mz
I understand that flock() will force any other execution of the script to wait in a queue until the first execution has finished and the flock() has been released. So far so good.
However, the problem starts at the end of the script, when the time has come to rename() the temporary file to replace the original file.
if ($replaced) {
rename($temp, $file);
} else {
unlink($temp);
}
From what I have seen, rename() will fail if the original file still has a flock(), so I need to release the flock() before this point. However, I also need it to remain locked, or rename() will fail when another user running the same script immediately opens a new flock() as soon as the previous flock() is released. When this happens, it will return:
Warning: rename(temporary.txt,original.txt): Access is denied. (code: 5)
tl;dr: I seem to be in a bit of a Catch-22. It looks like rename() won't work on a locked file, but unlocking the file will allow another user to immediately lock it again before the rename() can take place.
Any ideas?
update: After some extensive research into how flock() works (in layman's terms: there is no guarantee that another script will respect the "lock", so it is not really a "lock" at all in the literal sense of the word), I have opted for this solution instead, which works like a charm:
https://docstore.mik.ua/orelly/webprog/pcook/ch18_25.htm
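The linked recipe boils down to serializing on a separate sentinel file, so that rename() never has to operate on a handle that is itself locked. A minimal sketch of that idea (file names here are illustrative):

```php
<?php
// Lock a sentinel file instead of the data file; the data file itself is
// never locked, so rename() over it is safe.
$file = 'original.txt';
$temp = 'temporary.txt';
$lock = $file . '.lock';          // sentinel: only ever locked, never renamed

$lh = fopen($lock, 'c');
if (flock($lh, LOCK_EX)) {        // every editor of $file queues up here
    file_put_contents($temp, "edited contents\n");  // build the new version
    rename($temp, $file);         // safe: nobody holds a lock on $file itself
    flock($lh, LOCK_UN);
}
fclose($lh);
```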
"Good lock" on your locking adventures.

Which is the faster way to remove a list of rows from huge log file using PHP

I need to remove various useless log rows from a huge log file (200 MB)
/usr/local/cpanel/logs/error_log
The useless log rows are in array $useless
The way I am doing is
$working_log="/usr/local/cpanel/logs/error_log";
foreach($useless as $row)
{
if ($row!="") {
file_put_contents($working_log,
str_replace("$row","", file_get_contents($working_log)));
}
}
I need to remove about 65000 rows from the log file;
the code above does the job but it works slow, about 0.041 sec to remove each row.
Do you know a faster way to do this job using php ?
If the file can be loaded in memory twice (it seems it can if your code works) then you can remove all the strings from $useless in a single str_replace() call.
The documentation of str_replace() function explains how:
If search is an array and replace is a string, then this replacement string is used for every value of search.
$working_log="/usr/local/cpanel/logs/error_log";
file_put_contents(
$working_log,
str_replace($useless, '', file_get_contents($working_log))
);
When the file becomes too large to be processed by the code above you have to take a different approach: create a temporary file, read each line from the input file and write it to the temporary file or ignore it. At the end, move the temporary file over the source file:
$working_log="/usr/local/cpanel/logs/error_log";
$tempfile = "/usr/local/cpanel/logs/error_log.new";
$fin = fopen($working_log, "r");
$fout = fopen($tempfile, "w");
while (! feof($fin)) {
$line = fgets($fin);
if (! in_array($line, $useless)) {
fputs($fout, $line);
}
}
fclose($fin);
fclose($fout);
// Move the current log out of the way (keep it as backup)
rename($working_log, $working_log.".bak");
// Put the new file instead.
rename($tempfile, $working_log);
You have to add error handling (fopen(), fputs() may fail for various reasons) and code or human intervention to remove the backup file.

Create file in a thread-safe manner

I have an array of filenames and each process need to create and write only to a single file.
This is what I came to:
foreach ($filenames as $VMidFile) {
if (file_exists($VMidFile)) { // A
continue;
}
$fp = fopen($VMidFile, 'c'); // B
if (!flock($fp, LOCK_EX | LOCK_NB)) { // C
continue;
}
if (!filesize($VMidFile)) { // D
// write to the file;
flock($fp, LOCK_UN);
fclose($fp);
break;
}
flock($fp, LOCK_UN);
fclose($fp); // E
}
But I don't like that I'm relying on the filesize.
Any proposals to do it in another (better) way?
UPD: added the labels to discuss easily
UPD 2: I'm using filesize because I don't see any other reliable way to check if the current thread created the file (thus it's empty yet)
UPD 3: the solution should be condition race free.
A possible, slightly ugly solution would be to lock on a lock file and then testing if the file exists:
$lock = fopen("/tmp/".$filename."LOCK", "w"); // A
if (!flock($lock, LOCK_EX)) { // B
continue;
}
if(!file_exists($filename)){ // C
//File doesn't exist so we know that this thread will create it
//Do stuff to $filename
flock($lock, LOCK_UN); // D
fclose($lock);
}else{
//File exists. This thread didn't create it (at least in this iteration).
flock($lock, LOCK_UN);
fclose($lock);
}
This should allow exclusive access to the file and also allows deciding whether the call to fopen($VMidFile, 'c'); will create the file.
Rather than creating a file and hoping that it's not interfered with:
create a temporary file
do all necessary file operations on it
rename it to the new location if location doesn't exist.
Technically, since rename will overwrite the destination there is a chance that concurrent threads will still clash. That's very unlikely if you have:
if(!file_exists($lcoation) { rename(...
You could use md5_file() to verify the file contents are correct after this block.
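The temp-file-then-rename idea might be sketched like this (names are illustrative; note the small check-then-rename clash window mentioned above):

```php
<?php
// Build the file privately, then publish it under the final name.
$location = 'job1.txt';                     // illustrative final, published name
$tmp = tempnam(sys_get_temp_dir(), 'vm');   // unique temp file, created atomically
file_put_contents($tmp, "payload\n");       // do all the work on the private copy
if (!file_exists($location)) {
    rename($tmp, $location);                // publish (a small clash window remains)
} else {
    unlink($tmp);                           // another thread won - discard ours
}
```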
You can secure exclusive access using semaphores (UNIX only, and provided the sysvsem extension is installed):
$s = sem_get(ftok($filename, 'f')); // ftok() needs the path plus a one-character project id
sem_acquire($s);
// Do some critical work...
sem_release($s);
Otherwise you can also use flock. It does not require any special extensions, but according to comments on PHP.net is a bit slower than using semaphores:
$a = fopen($file, 'w');
flock($a, LOCK_EX);
// Critical stuff, again
flock($a, LOCK_UN);
Use mode 'x' instead of 'c' in your fopen call. And check the resulting $fp, if it's false, the file wasn't created by the current thread, and you should continue to the next filename.
Also, depending on your PHP installation's settings, you may want to put an @ in front of the fopen call to suppress any warnings if fopen($VMidFile, 'x') is unable to create the file because it already existed.
This should work even without flock.
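A minimal sketch of the 'x'-mode approach (file names are illustrative):

```php
<?php
// 'x' mode fails if the file already exists, so only the thread that
// actually created the file gets a valid handle back.
$filenames = ['job1.txt', 'job2.txt', 'job3.txt'];
foreach ($filenames as $VMidFile) {
    $fp = @fopen($VMidFile, 'x');   // @ silences the warning for existing files
    if ($fp === false) {
        continue;                   // someone else created this one - try the next
    }
    fwrite($fp, "claimed\n");       // we created it, so it is ours to write
    fclose($fp);
    break;                          // each process writes to a single file only
}
```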

Read and write to a file while keeping lock

I am making a simple page load counter by storing the current count in a file. This is how I want to do this:
Lock the file (flock)
Read the current count (fread)
Increment it (++)
Write new count (fwrite)
Unlock file/close it (flock/fclose)
Can this be done without losing the lock?
As I understand it, the file can't be written to without losing the lock. The only way I have come up with to tackle this, is to write a character using "r+" mode, and then counting characters.
As said, you could use FLock. A simple example would be:
//Open the File Stream
$handle = fopen("file.txt","r+");
//Lock File, error if unable to lock
if(flock($handle, LOCK_EX)) {
$count = fread($handle, filesize("file.txt")); //Get Current Hit Count
$count = $count + 1; //Increment Hit Count by 1
ftruncate($handle, 0); //Truncate the file to 0
rewind($handle); //Set write pointer to beginning of file
fwrite($handle, $count); //Write the new Hit Count
flock($handle, LOCK_UN); //Unlock File
} else {
echo "Could not Lock File!";
}
//Close Stream
fclose($handle);
I believe you can achieve this using flock. Open a pointer to your file, flock it, read the data, write the data, then close (close automatically unlocks).
http://php.net/flock
Yes, you have to rewind() before the ftruncate()/fwrite(). Otherwise the file pointer stays at the old end of file, and after truncation the gap up to that offset is filled with null bytes.
The working sequence is:
fopen
flock LOCK_EX
fread filesize
rewind
ftruncate 0
fwrite
flock LOCK_UN
fclose
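Spelled out, the sequence above might look like this (the counter file name is illustrative):

```php
<?php
// Read-modify-write a counter while holding the lock the whole time.
$path = 'counter.txt';
$fp = fopen($path, 'c+');                    // create if missing, read + write
if (flock($fp, LOCK_EX)) {
    $stat  = fstat($fp);
    $count = $stat['size'] > 0 ? (int)fread($fp, $stat['size']) : 0;
    rewind($fp);                             // back to offset 0 first...
    ftruncate($fp, 0);                       // ...then drop the old digits
    fwrite($fp, (string)($count + 1));       // write the incremented value
    flock($fp, LOCK_UN);
}
fclose($fp);
```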

PHP - Open TXT file, add +1 to contents when link clicked

How can I make it so when a user clicks on a link on my web page, it writes to a .txt file named "Count.txt", which contains only a number and adds 1 to that number? Thank you.
If you forego any validity checking you could do it with something as simple as:
file_put_contents($theCounterFile, file_get_contents($theCounterFile)+1);
Addition:
There's talk about concurrency in this thread and it should be noted that it is a good idea to use a database and transactions to deal with concurrency, I'd highly recommend against writing a bunch of plumbing code to do this in a file.
If you've ever had, or think you might ever have two requests for the resource in the same second you should look into PDO with mysql, or PDO with SQLite instead of a file, use transactions (and InnoDB or better if you're going for mysql).
But really, even if you get two requests in the same microsecond (highly unlikely), chances of locking the file are slim as it will not be kept open and the two requests will probably not be handled parallel enough to lock anyway. Reality check: how many hits on the same resource do you get on average in the same minute?...
If you decide to do anything more advanced, like say two numbers, you may want to consider using SQLite. It's about as fast and as simple as opening and closing a file, but is much more flexible.
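For illustration, the SQLite route might look like this (database file and table names are my own, and the pdo_sqlite extension is assumed to be available):

```php
<?php
// A page counter backed by SQLite; the transaction handles the locking for us.
$db = new PDO('sqlite:counter.db');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS hits (n INTEGER NOT NULL)');
$db->beginTransaction();                     // SQLite serializes writers here
if ((int)$db->query('SELECT COUNT(*) FROM hits')->fetchColumn() === 0) {
    $db->exec('INSERT INTO hits (n) VALUES (0)');   // seed the single row
}
$db->exec('UPDATE hits SET n = n + 1');
$db->commit();
echo $db->query('SELECT n FROM hits')->fetchColumn(), "\n";
```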
Open the file, lock the file (VERY important), read the number currently in there, add 1 to the number, write number back to file, release the lock and close the file.
ie. something like :
$fp = fopen("count.txt", "r+");
if (flock($fp, LOCK_EX)) { // do an exclusive lock
$num = fread($fp, 10);
$num++;
fseek($fp, 0);
fwrite($fp, $num);
flock($fp, LOCK_UN); // release the lock
} else {
// handle error
}
fclose($fp);
should work (not tested).
Generally this is quite easy:
$count = (int)file_get_contents('/path/to/Count.txt');
file_put_contents('/path/to/Count.txt', $count + 1, LOCK_EX); // note: $count++ would write the old value
But you'll run into concurrency problems using this code. One way to generate a lock safe from any race condition is:
$countFile = '/path/to/Count.txt';
$countTemp = tempnam(dirname($countFile), basename($countFile));
$countLock = $countFile . '.lock';
$f_lock = fopen($countLock, 'w');
if(flock($f_lock, LOCK_EX)) {
$currentCount = (int)file_get_contents($countFile);
$f_temp = fopen($countTemp, 'w');
if(flock($f_temp, LOCK_EX)) {
fwrite($f_temp, $currentCount + 1); // note: $currentCount++ would write the old value
flock($f_temp, LOCK_UN);
fclose($f_temp);
if(!rename($countTemp, $countFile)) {
unlink($countTemp);
}
}
flock($f_lock, LOCK_UN);
fclose($f_lock);
}
