Preventing a site-wide double submit - php

I was having a hard time figuring out a good title for this question, so I hope this is clear. I am currently using the TwitterOauth module on one of my sites to post a tweet. While this works, I need to limit the number of tweets submitted: just one each hour.
Note: I do not have the option to use a database. This is paramount for the question.
I have incorporated this as follows, in the PHP file that handles the actual posting to the Twitter API:
# Save the timestamp, make sure lastSentTweet exists and is writeable
function saveTimestamp(){
    $myFile = "./lastSentTweet.inc";
    $fh = fopen($myFile, 'w');
    $stringData = '<?php function getLastTweetTimestamp() { return '.time().';}';
    fwrite($fh, $stringData);
    fclose($fh);
}
# Include the lastSentTweet time
include('./lastSentTweet.inc');
# Define the delay
define('TWEET_DELAY', 3600);
# Check for the last tweet
if (time() > getLastTweetTimestamp() + TWEET_DELAY) {
    // Posting to Twitter API here
} else {
    die("No.");
}
(initial) contents of the lastSentTweet.inc file (chmod 777):
<?php function getLastTweetTimestamp() { return 1344362207;}
The problem is that, while this works, it allows for accidental double submits: if multiple users trigger this script (and the site it runs on is currently extremely busy), two submits (or more, though that has not happened yet) slip through to Twitter instead of just one. My first thought is the (admittedly tiny) delay in opening, writing and closing the file, but I could be wrong.
Does anyone have an idea what allows for the accidental double submits (and how to fix this)?

You're getting race conditions. You will need to implement locking on the file while you're making changes, and the lock needs to enclose both the read (the include statement) and the update; the critical part is ensuring that nobody else (e.g. another HTTP request) is using the file while you read its current value and then update it with the new timestamp.
That said, this would be fairly clunky. Depending on what is available in your PHP installation, you have other options; here are some:
You can use a database even if you don't have a database server: SQLite
You can store your timestamp in APC and use apc_cas() to detect if your last stored timestamp is still current when you update it.
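As a rough illustration of that compare-and-swap idea (a sketch only, using the apcu_* flavour of the calls and assuming the APCu extension is loaded; the key name is made up):
define('TWEET_DELAY', 3600);
apcu_add('last_tweet_ts', 0);               // creates the key only if it does not exist yet
$old = (int) apcu_fetch('last_tweet_ts');
if (time() > $old + TWEET_DELAY && apcu_cas('last_tweet_ts', $old, time())) {
    // apcu_cas() succeeds for exactly one request among those that read the
    // same old value, so only that request gets to post to Twitter.
    // ... post to the Twitter API here ...
} else {
    die("No.");
}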
Update
Your locking workflow needs to be something like this:
Acquire the lock on your stored timestamp. If you're working with files, you need to have the file open for reading and writing, and have called flock() on it. flock() will hang if another process has the file locked, and will return only after it has acquired the lock, at which point other processes attempting to lock the file will hang.
Read the stored timestamp from the already locked file.
Check if the required time has passed since the stored timestamp.
Only if it has passed, send the tweet and save the current timestamp to the file; otherwise you don't touch the stored timestamp.
Release the lock (just closing the file is enough).
This would ensure that no other process would update the timestamp after you have read and tested it but before you have stored the new timestamp.
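A minimal sketch of that workflow, assuming you switch to a plain timestamp file (lastTweet.txt is an illustrative name) rather than the generated .inc include:
define('TWEET_DELAY', 3600);
$fh = fopen('./lastTweet.txt', 'c+');   // 'c+' = read/write, create if missing, don't truncate
if ($fh === false) {
    die('Could not open the timestamp file.');
}
if (flock($fh, LOCK_EX)) {              // blocks until we hold the exclusive lock
    $last = (int) stream_get_contents($fh);
    if (time() > $last + TWEET_DELAY) {
        // ... post to the Twitter API here ...
        ftruncate($fh, 0);              // replace the stored timestamp
        rewind($fh);
        fwrite($fh, (string) time());
        fflush($fh);
    }
    flock($fh, LOCK_UN);                // release the lock
}
fclose($fh);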

Related

How to design the backend to put requests into a queue?

I want to design a system where users can submit files. After they submit a file I will run some scripts against it. I want to process these files in order, so I want to maintain a queue of requests. How can I do this with PHP? Is there any open source library for this?
Thanks!
I would use a database.
When a user submits a file, add a row to the database pointing to its location in the file system.
Have a cron job running that checks for new submissions and processes them in order; when done, it marks each one as processed in the database.
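A rough sketch of what that cron-driven worker could look like (the table and column names file_queue, path and processed are made up for illustration, and process() stands in for whatever scripts you run):
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// grab the unprocessed submissions in the order they arrived
$rows = $pdo->query('SELECT id, path FROM file_queue WHERE processed = 0 ORDER BY id');
foreach ($rows as $row) {
    process($row['path']);    // your own processing function/scripts
    $mark = $pdo->prepare('UPDATE file_queue SET processed = 1 WHERE id = ?');
    $mark->execute([$row['id']]);
}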
I would use Redis. Redis is a super-fast key-value store; usually its response time is in the double-digit microseconds (10-99 microseconds).
Redis transactions are atomic (transactions either happen or they don't), and you can have a job constantly running without using cron.
To use Redis with PHP, you can use Predis.
Once Redis is installed and Predis is set up to work with your script, upon uploading the file I would do something like this:
// 'hostname' and 'port' is the hostname and the port
// where Redis is installed and listening to.
$client = new Predis\Client('tcp://hostname:port');
// assuming the path to the file is stored in $pathToFile
$client->lpush('queue:files', $pathToFile);
then the script that needs to work with the files, just needs to do something like:
$client = new Predis\Client('tcp://hostname:port');
while (true) {
    $pathToFile = $client->rpop('queue:files');
    if (!$pathToFile) {
        // the list is empty; pause briefly so we don't hammer Redis, then poll again
        sleep(1);
        continue;
    }
    // there was something in the list, do whatever you need to do with it.
    // if there's an exception or an error, you can always use break; or exit; to terminate the loop.
}
Take into consideration that PHP tends to use a lot of memory in long-running scripts, so I would explicitly manage it: enable the collector with gc_enable(), call gc_collect_cycles() periodically, and unset() variables as you go.
Alternatively, you can use software like supervisord to make this script run just once, and as soon as it ends, start it up again.
In general, I would stay away from using a database and cron to implement queues. It can lead to serious problems.
Assume, for example, you use a table as a queue.
In the first run, your script pulls a job from the database and starts doing its thing.
Then, for some reason, your script takes longer to run, the cron job kicks in again, and now you have 2 scripts working on the same file. This could have no consequences, or it could have serious ones; that depends on what your application is actually doing.
So unless you're working with a very small dataset, and you know for sure that each cron run will finish before the next one starts so there will be no collisions, you should be fine. Otherwise, stay away from that approach.
Third-party library? This is too simple to need an entire library. You could use Redis (see the answer by AlanChavez) if you want to waste time and resources and then have to be concerned about garbage collection, when the real solution is not to bring garbage into the mix in the first place.
Your queue is a text file. When a file is uploaded, the name of the file is appended to the queue.
$q= fopen('queue.txt','a');
The 'a' mode is important. It automatically moves the write pointer to the end of the file for append writes. But the other reason it is important is that if the file does not exist, a new one is created.
fwrite($q,"$filename\n");
fclose($q);
If there are simultaneous append writes to this file, the OS will arbitrate the conflict without error. No need for file locking, cooperative multitasking, or transactional processing.
When your script that processes the queue begins to run it renames the live queue to a working queue.
if (!file_exists('q.txt')) {
    if (!file_exists('queue.txt')) { exit; }
    rename('queue.txt', 'q.txt');
    $q = fopen('q.txt', 'r');
    while (($filename = fgets($q, 4096)) !== false) {
        process(trim($filename));   // fgets() keeps the trailing newline, so trim it off
    }
    fclose($q);
    unlink('q.txt');
} else {
    echo 'Houston, we have a problem';
}
Now you see why the 'a' mode was important: we rename the queue, and when the next upload occurs, queue.txt is automatically created again.
If the file is being written to as it is being renamed, the OS will sort it out without error. The rename is so fast that the chance of a simultaneous write is astronomically small, and sorting out file system contention is a basic OS feature. No need for file locking, cooperative multitasking, or transactional processing.
This is a bullet proof method I have been using for many years.
Replace the Apollo 13 quote with an error recovery routine. If q.txt exists the previous processing did not complete successfully.
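For example, the recovery routine could simply work through the leftover q.txt before taking a new queue (a sketch only; process() is your own function, and note that it may reprocess entries the crashed run had already handled):
if (file_exists('q.txt')) {
    // a previous run died part-way: finish the leftover working queue first
    $q = fopen('q.txt', 'r');
    while (($filename = fgets($q, 4096)) !== false) {
        process(trim($filename));
    }
    fclose($q);
    unlink('q.txt');
}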
That was too easy.
Because it is so simple and we have plenty of memory because we are so efficient: Let's have some fun.
Let's see if writing to the queue is faster than AlanChavez's "super fast" Redis with its double-digit microsecond response.
The time in seconds to add a file name to the queue = 0.000014537, or about 14.5 µs, which is already at the fast end of Redis's quoted 10-99 µs response time, with no extra server, library or network round trip involved.

Periodically populate view data with query

I'm writing a view that will have daily/weekly/monthly report data. I'm thinking it makes sense to only run a query periodically to update the data rather than hit the database whenever someone loads the page. Can this be done completely in PHP and MySQL? What are some robust ways to handle this?
Using a templating engine like Smarty that supports caching, you can set a long cache time for those pages. You then code your PHP to test whether your date constraints have changed and whether the data is already cached; if the constraints changed or there is no cached copy, perform the query. Otherwise, Smarty will just load the cached page and your code won't query the database.
$smarty = new Smarty();
$smarty->caching = Smarty::CACHING_LIFETIME_CURRENT;  // enable caching so isCached() can ever be true
$smarty->cache_lifetime = 86400;                      // e.g. one day for a daily report
if (!$smarty->isCached('yourtemplate.tpl')) {
    // Run your query and populate template variables
}
$smarty->display('yourtemplate.tpl');
There is further documentation on caching in the Smarty manual.
Yes, but not very well on its own. You want to look into cron jobs; most web hosts provide a service to set them up. A cron job is simply a way to run a script on a schedule: PHP, JavaScript, a whole page, etc.
Search Google for "cron jobs"; you should find what you're looking for.
If your web host doesn't provide cron jobs and you don't know how Unix commands work, there are sites that will host a cron job for you.
Check out
http://www.cronjobs.org/
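For example, a typical crontab entry that refreshes the report data every 15 minutes might look like this (the script path is illustrative):
*/15 * * * * php /path/to/update_report_cache.php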
I'm thinking it makes sense to only run a query periodically to update the data rather than hit the database whenever someone loads the page
Personally I'd go with both. e.g.
SELECT customer, COUNT(orders.id), SUM(order_lines.value)
FROM orders, order_lines
WHERE orders.id=order_lines.order_id
AND orders.placed>#last_time_data_snapshotted
AND orders.customer=#some_user
GROUP BY customer
UNION
SELECT customer, SUM(rollup.orders), SUM(rollup.order_value)
FROM rollup
WHERE rollup.last_order_date<#last_time_data_snapshotted
AND rollup.customer=#some_user
GROUP BY customer
rather than hit the database whenever someone loads the page
Actually, depending on the pattern of usage this may make a lot of sense. But that doesn't necessarily preclude the method above: just set a threshold on when you'll push the aggregated data into the pre-consolidated table and test that threshold on each request.
I'd personally go for storing the cached data in a file, then just read that file if it has been updated within a certain timeframe; if not, do your update (e.g. get the info from the database and write it to the file).
Some example code:
$cacheTime = 900; // 15 minutes
$useCache = false;
$cacheFile = './cache/twitter.cachefile';
// check whether the cache file exists and is still fresh
if (file_exists($cacheFile)) {
    if ((time() - filemtime($cacheFile)) < $cacheTime && filesize($cacheFile) > 0) {
        $useCache = true;
        $cacheContents = file_get_contents($cacheFile);
    }
}
if (!$useCache) {
    // get all your update data, setting $cacheContents to the new output.
    // I'd imagine using an output buffer here would be a good idea.
    // update the cache file contents
    $fh = fopen($cacheFile, 'w+');
    fwrite($fh, $cacheContents);
    fclose($fh);
}
echo $cacheContents;

Is it possible at the same time to give all users of website the same session?

Is it possible at the same time to give all users of website the same $_SESSION?
I have interpreted your question in the following way:
Is it possible at the same time to give all users of a website the same [state]?
Yes, shared state is usually stored in a database, although state may also be stored in the local file system.
Using $_SESSION is meant to save state for a single user. If you abuse that tool, you will create an insecure system.
I want to do the following:
if ($_SESSION['new_posts'] + 60 > time()) echo "there are new posts in the forum!";
I don't want to use MySQL for that.
An easy and fast way
on every new post you do:
//will write an empty file or just update modification time
touch('new_posts.txt');
And then:
if(filemtime('new_posts.txt') + 60 > time()) { ... }
Global sessions are not possible without opening your site up for a gigantic security hole.
Instead, let's look at what you want to do, which is grabbing data while avoiding a query every page.
1) Writing the value out to a file and then reading it on every request is an option. Sessions are stored in files on the server by default, so it would have about the same speed as a session.
2) Store it in a cache such as APC, Memcache, Redis.
Keep in mind these are cached values, so you'll still have to update them regularly, either with a cron job or by having client requests update them. But what do you do when the cache expires and tons of clients try to update it at once? This is called the dogpile effect and it's something to think about.
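One common way to soften the dogpile effect is to let a single request rebuild the cache while every other request keeps serving the stale copy; here is a rough sketch using a non-blocking file lock (the file names and rebuild_value() are made up for illustration):
$cacheFile = './cache/new_posts.cache';
$lockFile  = './cache/new_posts.lock';
$maxAge    = 60;

$stale = !file_exists($cacheFile) || (time() - filemtime($cacheFile)) > $maxAge;
if ($stale) {
    $lock = fopen($lockFile, 'c');
    if ($lock && flock($lock, LOCK_EX | LOCK_NB)) {   // only one request wins the lock
        $value = rebuild_value();                     // your expensive query or computation
        file_put_contents($cacheFile, $value, LOCK_EX);
        flock($lock, LOCK_UN);
    }
    // requests that didn't get the lock just fall through and use the old copy
    if ($lock) {
        fclose($lock);
    }
}
echo file_exists($cacheFile) ? file_get_contents($cacheFile) : '';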
Or you could just write a SQL query, execute it on every page, and keep it simple. Why don't you want to do this? Have you profiled the code and determined that it's an actual issue? Worrying about this before it's a known problem is a waste of time.

Integer number truncated when writing to text file in PHP?

I wrote a download counter:
$hit_count = @file_get_contents('download.txt');
$hit_count++;
@file_put_contents('download.txt', $hit_count);
header('Location: file/xxx.zip');
As simple as that. The problem is the stats number is truncated to 4 digits thus not showing the real count:
http://www.converthub.com/batch-image-converter/download.txt
The batch image converter gets downloaded a couple hundred times per day, and the PHP counter has been in place for months. The first time I found out about this was about 2 months ago, when I was very happy that it hit the 8000 mark after a few weeks, yet a week after that it was 500 again. And it happened again and again.
No idea why. Why?
You're probably suffering a race condition in the filesystem: you're opening and reading a file, then immediately opening the same file again and writing to it. The operating system may not have fully released its original lock on the file when you close it for reading and reopen it for writing straight away. If the site is as busy as you say, you could also have multiple instances of your script trying to access the file at the same time.
Failing that, do all your file operations in one go. If you use fopen(), flock(), fread(), rewind(), fwrite() and fclose() to handle the hit counter update, you can avoid having to close the file and open it again. If you use r+ mode, you'll be able to read the value, increment it, and write the result back in one go.
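A rough sketch of that single-handle read-increment-write cycle (error handling trimmed; 'r+' assumes download.txt already exists):
$fp = fopen('download.txt', 'r+');
if ($fp && flock($fp, LOCK_EX)) {      // hold the lock across the whole cycle
    $hit_count = (int) fread($fp, 32); // the counter is short, 32 bytes is plenty
    $hit_count++;
    ftruncate($fp, 0);                 // wipe the old value
    rewind($fp);
    fwrite($fp, (string) $hit_count);
    fflush($fp);
    flock($fp, LOCK_UN);
}
if ($fp) {
    fclose($fp);
}
header('Location: file/xxx.zip');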
None of this can completely guarantee that you won't hit issues with concurrent accesses though.
I'd strongly recommend looking into a different approach to implementing your hit counter, such as a database driven counter.
Always do proper error handling; don't just suppress errors with @. In this case, it is probable that file_get_contents failed because the file was being written to at the time. $hit_count was therefore set to FALSE, and $hit_count++ made it 1. So your counter gets randomly reset to 1 whenever the read fails.
If you insist on writing the number to a file, do proper error checking and only write to the file if you are SURE you got the file open.
$hit_count = file_get_contents('download.txt');
if ($hit_count !== false) {
    $hit_count++;
    file_put_contents('download.txt', $hit_count);
}
header('Location: file/xxx.zip');
It will still fail occasionally, but at least it will not truncate your counter.
This is a kind of situation where having a database record the visits (which would allow for greater data-mining as it could be trended by date, time, referrer, location, etc) would be a better solution than using a counter in a flat file.
A cause may be that you are having a collision between a read and a write action on the file (happening once every 8,000 instances or so). Adding the LOCK_EX flag to the file_put_contents() call may prevent this, but I could not be 100% certain.
Better to look at recording the data into a database, as that is almost certain to prevent your current problem of losing count.

PHP and concurrent file access

I'm building a small web app in PHP that stores some information in a plain text file. However, this text file is used/modified by all users of my app at some given point in time, and possibly at the same time.
So the question is: what would be the best way to make sure that only one user can make changes to the file at any given point in time?
You should put a lock on the file:
$fp = fopen("/tmp/lock.txt", "r+");
if (flock($fp, LOCK_EX)) {    // acquire an exclusive lock
    ftruncate($fp, 0);        // truncate file
    fwrite($fp, "Write something here\n");
    fflush($fp);              // flush output before releasing the lock
    flock($fp, LOCK_UN);      // release the lock
} else {
    echo "Couldn't get the lock!";
}
fclose($fp);
Take a look at http://www.php.net/flock
My suggestion is to use SQLite. It's fast, lightweight, stored in a file, and has mechanisms for preventing concurrent modification. Unless you're dealing with a preexisting file format, SQLite is the way to go.
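A minimal sketch of the SQLite route via PDO (the database file name and schema are made up for illustration):
$db = new PDO('sqlite:' . __DIR__ . '/app.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)');

// SQLite serialises writers internally, so concurrent requests simply queue up
$stmt = $db->prepare('INSERT INTO notes (body) VALUES (?)');
$stmt->execute(array('some shared piece of information'));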
You could use a commit-log sort of format, similar to how Wikipedia does it.
Use a database, and have every saved change create a new row with an incremented revision number, making the previous record obsolete; then you only have to worry about table locks during the save phase.
That way, if 2 people happen to edit something concurrently, both changes will at least appear in the history, and whichever one lost the commit war can be copied into a new revision.
Now, if you don't want to use a database, then you have to worry about keeping a revision-controlled backing file for every visible file.
You could put revision control (Git/Mercurial/SVN) on the file system and then automate commits during the save phase.
Pseudocode:
user->save:
    getWriteLock();
    write( $file );
    write_commitmessage( $commitmessagefile );   # <-- author, comment, etc.
    call "hg commit -l $commitmessagefile $file";
    releaseWriteLock();
done.
At least this way when 2 people make critical commits at the same time, neither will get lost.
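As a rough PHP rendering of that pseudocode (assuming Mercurial is installed and the directory is already an hg repository with the file tracked; the lock and file names are illustrative, and $newContents/$author come from the caller):
$lock = fopen('write.lock', 'c');                        // getWriteLock()
if ($lock && flock($lock, LOCK_EX)) {
    file_put_contents('data.txt', $newContents);         // write( $file )
    file_put_contents('commitmsg.txt', "Edited by $author\n");
    shell_exec('hg commit -l commitmsg.txt data.txt');   // record the revision
    flock($lock, LOCK_UN);                               // releaseWriteLock()
}
if ($lock) {
    fclose($lock);
}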
A single file shared by many users really shouldn't be the strategy you use, I think; otherwise you'll probably need to implement a single (global) access point that monitors whether the file is currently being edited or not. Acquire a lock, do your modification, release the lock, etc. I'd go with 'Nobody's suggestion to use a database (SQLite if you don't want the overhead of a fully decked-out RDBMS).
