Potentially write to same file in PHP multiple times at once?

I am using PHP's fputcsv to log votes in an application we are making. The saving part of my code roughly ressembles this:
$handle = fopen('votes.csv', 'a');
fputcsv($handle, $data);
fclose($handle);
This works flawlessly in tests. However, I have a small concern: when deployed in production, it's possible that many users will be making requests to this script at the same time. I am curious how PHP will handle this.
Will I potentially have problems, and lose votes because of that? If so, what can I do to prevent it? Is the solution more complex than simply using a database? And finally, how can I test for this situation, where a lot of requests would be made at the same time? Are there things that already exist to test this sort of stuff?

Writing to a file can cause issues with concurrent users. If you instead insert into a database, you can let the database itself handle the queuing. If you run out of connections, that is easily tracked, and you can watch the load on the db as you go.
An insert into a database will be less resource-heavy than an append to a file. Having said that, you would need pretty heavy load for either to become a problem - but with a database, you have the built-in query queue to alleviate a good portion of the concurrent stress.
When you send a request to a database, it actually goes into a queue for processing. It only fails to be executed if there is a timeout in your PHP code (basically, PHP is told to abandon the wait for the db to respond - and you can control this via PHP and Apache settings), so you have a fantastic built-in buffer.
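For reference, a minimal sketch of what the database version of the logging might look like, assuming a hypothetical votes table, an $optionId variable, and PDO connection details that are not from the original post:
// Hypothetical connection details and table/column names, for illustration only.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
// Each request performs one INSERT; the database serializes concurrent
// inserts, so simultaneous votes don't clobber each other.
$stmt = $pdo->prepare('INSERT INTO votes (option_id, voted_at) VALUES (?, NOW())');
$stmt->execute([$optionId]);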

PHP multiple requests at once are creating incorrect database entries

What I'm running here is a graphical file manager, akin to OneDrive or OpenCloud or something like that. Files, folders, accounts, and the main server settings are all stored in the database as JSON-encoded objects (yes, I did get rid of columns in favor of JSON). The problem is that if multiple requests use the same object at once, they'll often save back incorrect data, because the requests obviously can't communicate changes to each other.
For example, when someone starts a download, it loads the account object of the owner of that file, increments its bandwidth counter, and then encodes/saves it back to the DB at the end of the download. But if I have 3 downloads of the same file at once, they'll all load the same account object, change the data as they see fit, and save their data back without regard to the others that overlap. In this case, the 3 downloads would show as 1.
Besides downloads and bandwidth being undercounted, I'm also having a problem where I'm trying to create a maintenance function that loads the server object and doesn't save it back for potentially several minutes - this obviously won't work while downloads are happening and manipulating the server object all the while, because it'll all just be overwritten with old data when the maintenance function finishes.
Basically it's a threading issue. I've looked into PHP APC in the hope I could make objects persist globally between threads, but that doesn't work since it just serializes/deserializes data for each request rather than actually having each request point to an object in memory.
I have absolutely no idea how to fix this without designing a completely new system that's totally different... which sucks.
Any ideas on how I should go about this would be awesome.
Thanks!
It's not a threading issue. Your database doesn't conform to any of the standards for building databases, not even the first normal form: every cell must contain only one value. When you're storing JSON data in the DB, you cannot write an SQL statement to make that transaction atomic. So, yes, you need to put that code in the trash bin.
In case you really need to get that code working, you can use mutexes to synchronize the running PHP scripts. The most common implementation in PHP is a file-based mutex.
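For example, once the bandwidth counter lives in its own column rather than inside a JSON blob, the read-modify-write race disappears, because the increment can happen in a single statement. A minimal sketch, assuming a hypothetical accounts table and an existing PDO connection in $pdo:
// Hypothetical schema: accounts(id, bandwidth_used). The increment happens
// inside the database, so three concurrent downloads add up correctly.
$stmt = $pdo->prepare('UPDATE accounts SET bandwidth_used = bandwidth_used + ? WHERE id = ?');
$stmt->execute([$bytesTransferred, $accountId]);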
You can try to use flock(); I guess you already have a user id before getting the JSON from the DB.
$lockfile = "/tmp/userlocks/$userid.txt";
$fp = fopen($lockfile, "w+");
if (flock($fp, LOCK_EX)) {
    // do your JSON update here
    flock($fp, LOCK_UN); // release the lock
} else {
    // the lock could not be obtained
}
fclose($fp);
What you need to figure out is what to do when the lock is already held: maybe wait 0.5 seconds and try to obtain the lock again, or send a message like "Only one simultaneous download allowed", or something else that fits your application.
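One way to implement that wait-and-retry, as a rough sketch: pass LOCK_NB so flock() returns immediately instead of blocking, and sleep briefly between attempts (the limit of 10 attempts is arbitrary):
$fp = fopen($lockfile, 'w+');
$attempts = 0;
// Try for up to ~5 seconds (10 attempts x 0.5 s), then give up.
while (!flock($fp, LOCK_EX | LOCK_NB)) {
    if (++$attempts >= 10) {
        fclose($fp);
        die('Only one simultaneous download allowed');
    }
    usleep(500000); // wait 0.5 s before retrying
}
// ... do your JSON update ...
flock($fp, LOCK_UN);
fclose($fp);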

A little php script (logical help needed)

I am a .NET developer and am developing an application for a company. For that I need to write a little PHP script to meet my needs.
My app needs to check some information from the internet which changes randomly almost every second. I am thinking of making a PHP script so that I can give the app the needed information. My idea is to use a simple text file instead of a MySQL database (I am free to use a MySQL db as well), and then make two PHP pages, for example writer.php and reader.php.
The job of writer.php is very simple: it will save the submitted data to the text file I want to use as the db.
reader.php will read the text file, output it as plain text, and empty the text file on every read. This file will be read by my app.
Work done.
Now the logical questions.
reader.php will be read by 40 clients at the same time. Will there be any conflicts?
Will this method be faster than a MySQL db?
Is this method more resource-consuming than a MySQL db?
You will have to lock the file for I/O for the duration of the write (PHP's flock() function). This may slow things down a bit when there are more clients at the same time, because while the file is locked by one user, everyone else has to wait. The other problem that may appear when writing a lot of data is that the write queue may grow without bound when there are many write requests.
MySQL seems like the better idea, as it caches both write and read requests, and it is implemented to avoid simultaneous access conflicts.
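If you do stay with the text file, both scripts need to take the same lock; a rough sketch, where the file name and the $submittedData variable are only examples:
// writer.php - append one line, holding an exclusive lock while writing
$fp = fopen('data.txt', 'a');
if (flock($fp, LOCK_EX)) {
    fwrite($fp, $submittedData . PHP_EOL);
    flock($fp, LOCK_UN);
}
fclose($fp);

// reader.php - read everything and empty the file under the same lock
$contents = '';
$fp = fopen('data.txt', 'c+');
if (flock($fp, LOCK_EX)) {
    $contents = stream_get_contents($fp);
    ftruncate($fp, 0); // empty the file once it has been read
    flock($fp, LOCK_UN);
}
fclose($fp);
echo $contents;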

max_execution_time Alternative

So here's the lowdown:
The client I'm developing for is on HostGator, which has limited their max_execution_time to 30 seconds, and it cannot be overridden (I've tried, and confirmed via their support and wiki that it cannot).
What I have the code doing is taking an uploaded file and...
looping through the XML
getting all feed download links within the file
downloading each XML file
looping through each file's XML array and inserting the information for each item into the database based on where it is from (i.e. the filename)
Now, is there any way I can queue this somehow, or split the workload across multiple requests? I know the code works flawlessly and checks to see if each item exists before inserting it, but I'm stuck getting around the execution limit.
Any suggestions are appreciated, let me know if you have any questions!
The time limit is in effect only when executing PHP scripts through a web server; if you execute the script from the CLI or as a background process, it should work fine.
Note that executing an external script is somewhat dangerous if you are not careful enough, but it's a valid option.
Check the following resources:
Process Control Extensions
And specifically:
pcntl-exec
pcntl-fork
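As a rough illustration of the fork approach (it requires the pcntl extension, which is normally only available from the CLI on Unix-like systems, and processFeeds() here is just a hypothetical stand-in for the import code):
// Requires the pcntl extension (CLI on Unix-like systems, not the usual web SAPIs).
$pid = pcntl_fork();
if ($pid === -1) {
    die('Could not fork');
} elseif ($pid === 0) {
    // Child process: do the long-running import here.
    processFeeds(); // hypothetical stand-in for the download/parse/insert code
    exit(0);
}
// Parent process: carries on (or exits) immediately.
echo 'Import started';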
Did you know you can trick max_execution_time by registering a shutdown handler? Within that handler you can run for another 30 seconds ;-)
Okay, now for something more useful.
You can add a small queue table in your database to keep track of where you are in case the script dies mid-way.
After getting all the download links, you add them to the table
Then you download one file and process it; when you're done, you delete it from the queue
Upon each run you check if there's still work left in the queue
For this to work you need to request that URL a few times; perhaps use JavaScript to keep reloading until the work is done?
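A rough sketch of that queue idea, assuming a hypothetical feed_queue table, an existing PDO connection in $pdo, and an importFeed() function standing in for the parsing/inserting code:
// Hypothetical table: feed_queue(id, url). Each request processes one entry
// and finishes well inside the 30-second limit.
$row = $pdo->query('SELECT id, url FROM feed_queue ORDER BY id LIMIT 1')->fetch();
if ($row === false) {
    echo 'done'; // nothing left in the queue
    exit;
}
$xml = file_get_contents($row['url']);  // download one feed
importFeed($xml);                       // hypothetical: parse the XML and insert its items
$del = $pdo->prepare('DELETE FROM feed_queue WHERE id = ?');
$del->execute([$row['id']]);
echo 'more'; // tell the caller (e.g. the JS reloader) to request again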
I am in such a situation. My approach is similar to Jack's:
accept that the execution time limit will simply be there
design the application to cope with a sudden exit (look into register_shutdown_function; see the sketch after this list)
identify all time-demanding parts of the process
continuously save the progress of the process
modify your components so that they are able to start from an arbitrary point, e.g. a position in an XML file, or continue downloading your to-be-fetched list of XML links
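A minimal sketch of the shutdown-handler part mentioned above, assuming progress is tracked in a simple array and written to a file name of your choosing:
// Save whatever progress we have if the script is cut off mid-way.
$progress = ['file' => null, 'position' => 0]; // hypothetical progress structure
register_shutdown_function(function () use (&$progress) {
    file_put_contents('import-progress.json', json_encode($progress));
});
// ... the long-running work updates $progress as it goes, and the next run
// reads import-progress.json to decide where to resume ...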
For the task I made two modules: Import for the actual processing and TaskManagement for dealing with these tasks.
For invoking the TaskManager I use cron; whether that's enough depends on what your web hosting offers. There's also WebCron.
The advantage of Jack's JavaScript method is that it only adds requests when needed. If there are no tasks to be executed, the script runtime will be very short, and the overhead is perhaps overstated*, but still. The downsides are that it requires the user to wait the whole time and not close the tab/browser, it needs JS support, etc.
*) Likely much less demanding than one click from one user at that moment
Then of course look into performance improvements, caching, skipping what's not needed/hasn't changed, etc.

PHP - enforce user wait before using server resources

I have a PHP function that I want to make available publically on the web - but it uses a lot of server resources each time it is called.
What I'd like to happen is that a user who calls this function is forced to wait for some time, before the function is called (or, at the least, before they can call it a second time).
I'd greatly prefer this 'wait' to be enforced on the server-side, so that it can't be overridden by dubious clients.
I plan to insist that users log into an online account.
Is there an efficient way I can make the user wait, without using server resources?
Would 'sleep()' be an appropriate way to do this?
Are there any suggested problems with using sleep()?
Is there a better solution to this?
Excuse my ignorance, and thanks!
sleep would be fine if you were using PHP as a command line tool for example. For a website though, your sleep will hold the connection open. Your webserver will only have a finite number of concurrent connections, so this could be used to DOS your site.
A better - but more involved - way would be to use a job queue. Add the task to a queue which is processed by a scheduled script and update the web page using AJAX or a meta-refresh.
sleep() is a bad idea in almost all possible situations. In your case, it's bad because it keeps the connection to the client open, and most webservers have a limit of open connections.
sleep() will not help you at all. The user could just load the page twice at the same time, and the command would be executed twice right after each other.
Instead, you could save a timestamp in your database for when your function was last invoked. Then, before invoking it, you should check the database to see if a suitable amount of time has passed. If it has, invoke the function and update the timestamp in the database.
If you're planning on enforcing a user login, then the problem just got a whole lot simpler.
Have a record in the database listing users and the last time they used your resource-consuming service, and measure the time difference between then and now. If the time difference is too low, deny access and display an error message.
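A rough sketch of that check, assuming a hypothetical last_run column on the users table and an existing PDO connection in $pdo:
// Hypothetical column: users.last_run (unix timestamp of the last invocation).
$cooldown = 300; // e.g. force a 5-minute wait between calls
$stmt = $pdo->prepare('SELECT last_run FROM users WHERE id = ?');
$stmt->execute([$userId]);
$lastRun = (int) $stmt->fetchColumn();
if (time() - $lastRun < $cooldown) {
    die('Please wait before using this function again.');
}
$upd = $pdo->prepare('UPDATE users SET last_run = ? WHERE id = ?');
$upd->execute([time(), $userId]);
expensiveFunction(); // stand-in for the resource-heavy function from the question
Note that two requests arriving at the same instant could both pass the SELECT before either UPDATE runs; if that matters, fold the check into a conditional UPDATE (WHERE last_run < the cutoff) and test rowCount() instead.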
This is best handled at the server level. No reason to even invoke PHP for repeat requests.
Like many sites, I use Nginx, and you can use its rate limiting to block repeat requests over a certain number - say, three requests per IP, per hour.

Background PHP worker

I have a script that takes a while to process; it has to take stuff from the DB and transfer data to other servers.
At the moment I have it run immediately after the form is submitted, and it takes the time it takes to transfer that data before saying it's been sent.
I was wondering, is there any way to make it so it does not do the processing in front of the client?
I don't want a cron job, as it needs to be sent at the same time, just not while the page is loading for the client.
A couple of options:
Exec the PHP script that does the DB work from your webpage, but do not wait for the output of the exec. Be VERY careful with this; don't blindly accept any input parameters from the user without sanitising them. I only mention this as an option, I would never do it myself.
Have your DB-updating script running all the time in the background, polling for something that triggers its update. For example, it could check whether /tmp/run.txt exists and start the DB update if it does. You can then create run.txt from your webpage and return without waiting for a response.
Create your DB update script as a daemon.
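A rough sketch of the second option, where /tmp/run.txt is the trigger file from the answer and doTransfer() is a hypothetical stand-in for the actual transfer code:
// background.php - start once from the CLI (e.g. nohup php background.php &),
// so it keeps running independently of web requests.
set_time_limit(0);
while (true) {
    if (file_exists('/tmp/run.txt')) {
        unlink('/tmp/run.txt'); // consume the trigger
        doTransfer();           // hypothetical: the DB-to-other-servers transfer
    }
    sleep(1);                   // poll once per second
}
The webpage then only needs touch('/tmp/run.txt'); and can return to the client immediately.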
Here are some things you can take a look at:
How much data are you transferring, and by "transfer" do you mean copying the data only, or are you inserting the data from your db into the destination server and then deleting it from your source?
You can try analyzing your SQL to see if there's any room for optimization.
Then you can check your php code as well to see if there's anything, even the slightest, that might aid in performing the necessary tasks faster.
Where are the source and destination database servers located (in terms of network, and geographically if you happen to know), and how fast are the source and destination servers able to communicate over the net/network?
