I'm building a small web app in PHP that stores some information in a plain text file. However, this text file is used and modified by all users of my app at some point, possibly at the same time.
So the question is: what would be the best way to make sure that only one user can make changes to the file at any given point in time?
You should put a lock on the file:
$fp = fopen("/tmp/lock.txt", "c+"); // "c+" creates the file if it doesn't already exist
if (flock($fp, LOCK_EX)) { // acquire an exclusive lock
ftruncate($fp, 0); // truncate file
fwrite($fp, "Write something here\n");
fflush($fp); // flush output before releasing the lock
flock($fp, LOCK_UN); // release the lock
} else {
echo "Couldn't get the lock!";
}
fclose($fp);
Take a look at the flock() manual: http://www.php.net/flock
My suggestion is to use SQLite. It's fast, lightweight, stored in a file, and has mechanisms for preventing concurrent modification. Unless you're dealing with a preexisting file format, SQLite is the way to go.
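If you go that route, a minimal sketch using PDO's SQLite driver might look like this (the database path, table and column names are assumptions):
$db = new PDO('sqlite:/path/to/app.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('PRAGMA busy_timeout = 5000'); // a concurrent writer waits up to 5s instead of failing
$db->exec('CREATE TABLE IF NOT EXISTS settings (name TEXT PRIMARY KEY, value TEXT)');
$stmt = $db->prepare('INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)');
$stmt->execute(array('greeting', 'Write something here'));
SQLite serializes writes internally, so you don't have to manage flock() calls yourself.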
You could use a commit-log sort of format, somewhat like Wikipedia does.
Use a database, and have every saved change create a new row (with an incremented revision number) that supersedes the previous record; then you only have to worry about taking a table lock during the save phase.
That way, if 2 people happen to edit concurrently, both changes will appear in the history, and whichever one lost the commit war can be copied into a new revision.
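A minimal sketch of that idea, with assumed table and column names (SQLite used only for brevity):
// $newContent is whatever the user is saving
$db = new PDO('sqlite:/path/to/app.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS revisions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    doc_id INTEGER NOT NULL,
    revision INTEGER NOT NULL,
    author TEXT,
    content TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
)');
$db->exec('BEGIN IMMEDIATE'); // takes the write lock up front, so concurrent saves queue up
$next = 1 + (int) $db->query('SELECT COALESCE(MAX(revision), 0) FROM revisions WHERE doc_id = 1')->fetchColumn();
$stmt = $db->prepare('INSERT INTO revisions (doc_id, revision, author, content) VALUES (1, ?, ?, ?)');
$stmt->execute(array($next, 'alice', $newContent));
$db->exec('COMMIT');
Nothing is ever overwritten: the latest revision wins, and older ones remain in the history.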
Now, if you don't want to use a database, then you have to worry about having a revision-control file backing every visible file.
You could put revision control (Git/Mercurial/SVN) on the file system and then automate commits during the save phase.
In PHP, roughly (using a lock file alongside the data file and Mercurial's -l/--logfile flag):
$lock = fopen($file . '.lock', 'c');                  // getWriteLock()
flock($lock, LOCK_EX);
file_put_contents($file, $newContents);               // write( $file )
file_put_contents($commitmessagefile, $message);      // author, comment, etc.
exec('hg commit -l ' . escapeshellarg($commitmessagefile) . ' ' . escapeshellarg($file));
flock($lock, LOCK_UN);                                // releaseWriteLock()
fclose($lock);
At least this way when 2 people make critical commits at the same time, neither will get lost.
A single file for many users really shouldn't be the strategy you use, I think - otherwise you'll probably need to implement a single (global) access point that monitors whether the file is currently being edited: acquire a lock, do your modification, release the lock, and so on. I'd go with 'Nobody's suggestion to use a database (SQLite if you don't want the overhead of a fully decked-out RDBMS).
Related
A user can download query results in CSV format. The file is small (a few KB), but the contents are important.
The first approach is to use PHP's output stream, php://output:
$callback = function() use ($result, $columns) {
$file = fopen('php://output', 'w');
fputcsv($file, $columns);
foreach($result as $res) {
fputcsv($file, array($res->from_user, $res->to_user, $res->message, $res->date_added));
}
fclose($file);
};
return response()->stream($callback, 200, $headers);
The second approach is to create a new folder in Laravel's storage system, set it to private, and download the file from there. You could even delete the file after the download:
'csv' => [
'driver' => 'local',
'root' => storage_path('csv'),
'visibility' => 'private',
],
Here is the create/download code:
$file = fopen('../storage/csv/file.csv', 'w');
fputcsv($file, $columns);
foreach($result as $res) {
fputcsv($file, array($res->from_user, $res->to_user, $res->message, $res->date_added));
}
fclose($file);
return response()->make(Storage::disk('csv')->get('file.csv'), 200, $headers);
This return will instantly delete the file after the download:
return response()->download(Storage::disk('csv')->path('file.csv'))
->deleteFileAfterSend(true);
What would be more secure? What is the better approach? I am currently leaning towards the second approach with the storage.
Option 1
Reasons:
you are not keeping the file, so persisting to disk has limited use
the data size is small, so download failures are unlikely, and if they happen, the processing time to recreate the output is minimal (I assume it's a quick SQL query behind the scenes?)
keeping the file in storage creates opportunities for the file to replicate; an incremental backup or rsync job that you may set up in the future could copy the sensitive files before they get deleted...
deleting the file from the filesystem does not necessarily make the data unrecoverable
If you were dealing with files that are tens/hundreds of MB, I'd be thinking differently...
Let's think about all the options.
Option 1 is a good solution because you are not storing the file, so it is more secure than the others. But timeouts can be a problem under high traffic.
Option 2 is also a good solution, with the delete. But you need to create files with unique names so that parallel downloads work.
Option 3 is like option 2, but if you are using Laravel, don't use it. (And think about 2 people downloading at the same time.)
After this explanation: if you are using one server, work on option 1 to make it more secure; but if you are using microservices, you need to work on option 2.
I can suggest one more thing to make it more secure: create a unique hashed URL. For example, hash a timestamp with Laravel and check it before serving the download, so people can't download again from their download history.
https://example.com/download?hash={crypt(timestamp+1min)}
If it is not downloaded within 1 minute, the URL will have expired.
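A rough sketch of that idea using an HMAC instead of crypt() (the route, parameter names and the secret are assumptions):
// Generating the link (expires 60 seconds from now)
$expires = time() + 60;
$secret  = config('app.key');               // any server-side secret will do
$sig     = hash_hmac('sha256', (string) $expires, $secret);
$url     = url('/download') . '?expires=' . $expires . '&sig=' . $sig;

// Verifying it in the download controller
$valid = hash_equals(hash_hmac('sha256', $request->query('expires'), $secret), (string) $request->query('sig'))
    && time() < (int) $request->query('expires');
abort_unless($valid, 403);
Recent Laravel versions also ship with temporary signed routes (URL::temporarySignedRoute()), which implement the same idea out of the box.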
I think the answer depends on the current architecture and the size of the file to download.
The (1)st approach is applicable when:
the files are small (less than 10 MB) - thanks #tanerkay
you have a simple architecture (e.g. 1 server)
Reasons:
no download failures -- no need to retry
keep it simple
no files = no backups and no rsync and no additional places to steal it
The (2)nd approach is applicable when:
your files are big (10+ MB)
you already have a microservices architecture with multiple load balancers - keep the similarity
you have millions of users trying to download - you just can't serve them without a load balancer and parallel downloading
Reasons:
The second approach is definitely more SCALABLE, and so more stable under high load, and so more secure. Microservices take more effort but scale better under heavy load.
Using separate file storage allows you in the future to have a separate file server, a load balancer, a queue manager, and separate dedicated access control.
If the content is important, it usually means that getting it is very important to the user.
But direct output with headers can hang, hit a timeout error, and so on.
Keeping the file until it has been downloaded is a much more reliable way of delivering it, I think.
Still, I would consider an expiration time instead of (or in addition to) the downloaded flag - the download process can fail, or the file can be lost (so ensure 1+ hour of availability), or vice versa the user may only try to download it after a year, or never - why should you keep the file for more than N days?
I think the first option,
"The first approach is to use PHP's output stream, php://output"
is more secure than the others, since you're not storing the file anywhere.
I have a cron that runs a script every 3 minutes; the script contains a function that does:
try
begin transaction
loop
//parse large xml data
//send data to database
endloop
commit
endtry
catch
rollback
endcatch
Now, data insertion is a long process that takes about 3-6 minutes, and the cron runs every 3 minutes, so sometimes the processes conflict.
I see that when I add a commit inside the loop, the new process takes priority. Can I somehow say: hey, new transaction, wait until the previous transaction commits?
I would try to Keep It Simple S....., and use a simple file-locking check like this at the top of your existing cron script.
$fp = fopen("/tmp/my_cron_lock.txt", "r+");
if ( ! flock($fp, LOCK_EX)) {
// other cron is overrunning so
// I'll get restarted in 3 mins
// so I will let other job finish
fclose($fp);
exit;
}
// existing script
// free the lock,
// although this will happen automatically when script terminates
fclose($fp);
?>
You can store a lock somewhere persistent; typically that is done with a lock file in the file system:
The process first checks whether the lock file already exists. If so, it exits right away.
If no lock file exists, it creates the lock file itself and writes its own process id into it. When terminating, it checks again that it is still its own lock file (by the process id) and removes it if all is fine.
That way you can fire your trigger script (cron job) every minute without any risk.
The same can be done at the database or even table level. However, that can be less robust depending on the situation, since it obviously fails if there is an issue with the database connection. The fewer layers are involved, the more robust it is. And as always: you have to decide yourself which approach is best. But in general: locking is the answer.
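A minimal sketch of the lock-file flow described above, assuming a fixed lock path:
$lockFile = '/tmp/import_cron.pid';

// Exit right away if another run is still holding the lock file
if (file_exists($lockFile)) {
    exit;
}

// Claim the lock by writing our own process id into it
file_put_contents($lockFile, (string) getmypid());

// ... existing import work goes here ...

// Remove the lock only if it is still ours
if ((int) file_get_contents($lockFile) === getmypid()) {
    unlink($lockFile);
}
Note that the check-then-create step still has a tiny race window between the two calls; the flock() approach shown above avoids that entirely.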
I was having a hard time figuring out a good title for this question, so I hope this is clear. I am currently using the TwitterOauth module on one of my sites to post a tweet. While this works, I need to set a limit on the number of tweets submitted: just one per hour.
Note: I do not have the option to use a database. This is paramount for the question.
I have incorporated this as follows, in the PHP file that handles the actual posting to the Twitter API:
# Save the timestamp, make sure lastSentTweet exists and is writeable
function saveTimestamp(){
$myFile = "./lastSentTweet.inc";
$fh = fopen($myFile, 'w');
$stringData = '<?php function getLastTweetTimestamp() { return '.time().';}';
fwrite($fh, $stringData);
fclose($fh);
}
# Include the lastSentTweet time
include('./lastSentTweet.inc');
# Define the delay
define('TWEET_DELAY', 3600);
# Check for the last tweet
if (time() > getLastTweetTimestamp() + TWEET_DELAY) {
// Posting to Twitter API here
} else {
die("No.");
}
(initial) contents of the lastSentTweet.inc file (chmod 777):
<?php function getLastTweetTimestamp() { return 1344362207;}
The problem is that, while this works, it allows accidental double submits: if multiple users trigger this script (and the site it runs on is currently extremely busy), it can happen that 2 submits (or more, though this has not occurred yet) slip through to Twitter instead of just 1. My first thought is the (albeit minute) delay in opening, writing and closing the file, but I could be wrong.
Does anyone have an idea what allows for the accidental double submits (and how to fix this)?
You're getting race conditions. You will need to implement locking on your file while you're making changes, and you need to enclose both the read (the include statement) and the update inside the lock; what is critical is to ensure nobody else (e.g. another HTTP request) is using the file while you read its current value and then update it with the new timestamp.
This can be fairly inefficient, though. You have other options which might be available in your PHP installation; here are some:
You can use a database even if you don't have a database server: SQLite
You can store your timestamp in APC and use apc_cas() to detect if your last stored timestamp is still current when you update it.
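A rough sketch of the APC idea (the cache key name is an assumption; TWEET_DELAY is the constant from your script):
apc_add('last_tweet_ts', 0);                     // seed the value once; does nothing if it already exists

$last = apc_fetch('last_tweet_ts');
if (time() > $last + TWEET_DELAY) {
    // Only one concurrent request wins the compare-and-swap; the others do nothing
    if (apc_cas('last_tweet_ts', $last, time())) {
        // post to the Twitter API here
    }
}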
Update
Your locking workflow needs to be something like this:
Acquire the lock on your stored timestamp. If you're working with files, you need to have the file open for reading and writing, and have called flock() on it. flock() will block if another process has the file locked, and will return only after it has acquired the lock, at which point other processes attempting to lock the file will block in turn.
Read the stored timestamp from the already locked file.
Check if the required time has passed since the stored timestamp.
Only if it has passed, send the tweet and save the current timestamp to the file; otherwise you don't touch the stored timestamp.
Release the lock (just closing the file is enough).
This would ensure that no other process would update the timestamp after you have read and tested it but before you have stored the new timestamp.
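A minimal sketch of that workflow, using a plain timestamp file instead of the generated .inc (the file name is an assumption):
define('TWEET_DELAY', 3600);

$fp = fopen('./lastSentTweet.txt', 'c+');   // create if missing, keep contents
flock($fp, LOCK_EX);                        // 1. acquire the lock (blocks while another request holds it)

$last = (int) stream_get_contents($fp);     // 2. read the stored timestamp
if (time() > $last + TWEET_DELAY) {         // 3. has enough time passed?
    // 4. post to the Twitter API here, then store the new timestamp
    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, (string) time());
    fflush($fp);
}

flock($fp, LOCK_UN);                        // 5. release the lock
fclose($fp);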
What happens when many requests are received to read & write to a file in PHP? Do the requests get queued? Or is only one accepted and the rest are discarded?
I'm planning to use a text based hit counter.
You can run into race conditions.
To avoid this, if you only need to append data, you can use
file_put_contents($file, $data, FILE_APPEND | LOCK_EX);
and not worry about your data integrity.
If you need more complex operations, you can use flock() (suited to the classic readers/writers problem).
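For example, a shared lock for readers and an exclusive lock for writers (a hypothetical data file):
// Writer: exclusive lock while rewriting the file
$fp = fopen($file, 'c+');
flock($fp, LOCK_EX);
ftruncate($fp, 0);
fwrite($fp, $newData);
flock($fp, LOCK_UN);
fclose($fp);

// Reader: shared lock, so readers don't block each other but do wait for writers
$fp = fopen($file, 'r');
flock($fp, LOCK_SH);
$data = stream_get_contents($fp);
flock($fp, LOCK_UN);
fclose($fp);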
For your PHP script counter, I suggest something like this:
//> Register this impression
file_put_contents( $file, "\n", FILE_APPEND|LOCK_EX );
//> Read the total number of impression
echo count(file($file));
This way you don't have to implement a blocking mechanism yourself, and you keep the system and your script lighter.
Addendum
To avoid having to count the array returned by file(), you can keep the system even lighter with this:
//> Register this impression
file_put_contents( $file, '1', FILE_APPEND|LOCK_EX );
//> Read the total number of impression
echo filesize($file);
Basically, to read your counter you just need to read the file size, considering that each impression adds 1 byte to it.
No, requests will not be queued: readers will get damaged data, writers will overwrite each other, and the data will end up corrupted.
You can try to use flock() and the 'x' mode of fopen().
It's not so easy to write a good locking mutex, so try to find an existing implementation, or try to move the data from the file to a DB.
You can use flock() to get a lock on the file prior to read/write to it. If other threads are holding a lock on the file, flock() will wait until the other locks are released.
If I write data to a file via file_put_contents with the FILE_APPEND flag set and two users submit data at the same time, will it append regardless, or is there a chance one entry will be overwritten?
If I set the LOCK_EX flag, will the second submission wait for the first submission to complete, or is the data lost when an exclusive lock can't be obtained?
How does PHP generally handle that? I'm running version 5.2.9, if that matters.
Thanks,
Ryan
You could also check the flock() function to implement proper locking (not based on the while/sleep trick).
If you set an exclusive file lock via LOCK_EX, the second script (time-wise) that attempts to write will simply return false from file_put_contents.
i.e.: It won't sit and wait until the file becomes available for writing.
As such, if required, you'll need to program this behaviour in yourself, perhaps by attempting file_put_contents a limited number of times (e.g. 3) with a suitably sized sleep between each attempt.
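A rough sketch of that retry idea (the attempt count and sleep length are arbitrary):
$attempts = 0;
do {
    $written = file_put_contents($file, $data, FILE_APPEND | LOCK_EX);
    if ($written !== false) {
        break;              // the write (and the lock) succeeded
    }
    $attempts++;
    sleep(1);               // give the other writer a moment to finish
} while ($attempts < 3);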