I wrote a download counter:
$hit_count = @file_get_contents('download.txt');
$hit_count++;
@file_put_contents('download.txt', $hit_count);
header('Location: file/xxx.zip');
As simple as that. The problem is that the number in the stats file gets truncated to 4 digits, so it doesn't show the real count:
http://www.converthub.com/batch-image-converter/download.txt
The batch image converter gets downloaded a couple of hundred times per day, and the PHP counter has been in place for months. I first noticed the problem about 2 months ago: I was very happy when it hit the 8000 mark after a few weeks, yet a week later it was back at 500. And it has happened again and again.
I have no idea why. Why does this happen?
You're probably suffering from a race condition in the filesystem: you're attempting to open and read a file, then open the same file again and write to it. The operating system may not have fully released its original lock on the file when you close it for reading and immediately reopen it for writing. If the site is as busy as you say, you could even have multiple instances of your script trying to access the file at the same time.
Failing that, do all your file operations in one go. If you use fopen(), flock(), fread(), rewind(), fwrite() and fclose() to handle the hit counter update, you can avoid closing the file and opening it again. If you use r+ mode, you'll be able to read the value, increment it, and write the result back in one go.
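A minimal sketch of that single read-lock-increment-write cycle might look like this (file name and redirect reused from the question; error handling kept deliberately brief):
$fh = fopen('download.txt', 'r+');           // open for reading and writing
if ($fh !== false && flock($fh, LOCK_EX)) {  // hold an exclusive lock for the whole update
    $hit_count = (int) fread($fh, 32);       // read the current count
    $hit_count++;
    rewind($fh);                             // back to the start of the file
    fwrite($fh, (string) $hit_count);        // overwrite with the new count
    fflush($fh);
    flock($fh, LOCK_UN);                     // release the lock
}
if ($fh !== false) {
    fclose($fh);
}
header('Location: file/xxx.zip');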
None of this can completely guarantee that you won't hit issues with concurrent accesses though.
I'd strongly recommend looking into a different approach to implementing your hit counter, such as a database driven counter.
Always do proper error handling; don't just suppress errors with @. In this case, it is probable that the file_get_contents() call failed because the file was being written at the time. $hit_count is then set to FALSE, the subsequent write no longer contains the real count, and the counter effectively restarts from 1. So your counter gets randomly reset whenever a read fails.
If you insist on writing the number to a file, do proper error checking and only write to the file if you are SURE the read succeeded.
$hit_count = file_get_contents('download.txt');
if ($hit_count !== false) {
    $hit_count++;
    file_put_contents('download.txt', $hit_count);
}
header('Location: file/xxx.zip');
It will still fail occasionally, but at least it will not truncate your counter.
This is a kind of situation where having a database record the visits (which would allow for greater data-mining as it could be trended by date, time, referrer, location, etc) would be a better solution than using a counter in a flat file.
A likely cause is a collision between a read and a write action on the file (happening once every 8,000 requests or so). Adding the LOCK_EX flag to the file_put_contents() call may help (file_get_contents() accepts no such flag), but I cannot be 100% certain.
Better to look at recording the data into a database, as that is almost certain to prevent your current problem of losing count.
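For reference, a database-backed counter can do the increment atomically in a single statement. A rough sketch with PDO, assuming a hypothetical downloads table (name, hits) and placeholder credentials:
$pdo  = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
$stmt = $pdo->prepare('UPDATE downloads SET hits = hits + 1 WHERE name = ?');
$stmt->execute(array('batch-image-converter'));  // the increment happens atomically on the server
header('Location: file/xxx.zip');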
Using plain fopen(..., 'w') and fwrite() in PHP, I'm writing simple \n-delimited lines of text to a file.
strace tells me that PHP is using write() to write the data to disk.
From write(2) (Linux):
The number of bytes written may be less than count if, for example, there is insufficient space on the underlying physical medium, or the RLIMIT_FSIZE resource limit is encountered (see setrlimit(2)), or the call was interrupted by a signal handler after having written less than count bytes.
I'm not sure the best way to go about stress-testing this. I'm currently writing large batches of lines to a file and SIGKILLing PHP midway through:
<?php
$A = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
$line = str_repeat("$A\n$A", 5000);
$fd = fopen('linebuf.txt', 'w');
for (;;) {           // loop until the process is killed externally
    fwrite($fd, $line);
    flush();
    print '.';
}
I used the following suboptimal hack to do the SIGKILLing:
while :; do
rm linebuf.txt
timeout -s KILL 0.05 php linebuf.php
[[ "$(tail -c 1 linebuf.txt)" != "z" ]] && break
done
Using the above, I found that the lines always seem to be fully complete - there are no partial writes.
However, I get the impression I'm seeing "best case scenario" behavior, and that my script may well be susceptible to partial writes if it were to encounter SIGKILL and/or SIGINT.
(I'm not sure if PHP's own buffering is relevant here, but I did find a comment on the stream_set_write_buffer() doc page noting that the function does not work on files, referencing an 18-year-old bug. I also found that the function returns -1 in my own testing.)
Further context/rationale
I'm currently working on a backend script that loads a bunch of data from a slow-to-read data source. Iterating on this script is painfully slow: it currently takes around 20 minutes to run (and that's before I tackle writing the part of the script that will actually process the data!).
The exceedingly obvious solution is to cache loaded data into a JSON or text file, then reread that (sequentially, at hundreds of MB/s) upon reload. Yes please.
However, the output of this particular script will affect everything else in the pipeline I'm building, so the integrity of the cache must be the same as the data source I'm reading from.
Elsewhere in this pipeline I frequently create temporary files then (close and) rename them into place (and delete all temporary files upon reload). Within this model, I might regenerate a few things, but I can reasonably prove (assuming sane hardware and environment) that I'll never see partially-written data.
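For context, the write-then-rename pattern described above looks roughly like this (paths and the $data variable are placeholders; fsync() exists only in PHP 8.1+, hence it is left commented out):
$tmp = 'cache.json.tmp';
$dst = 'cache.json';
$fd  = fopen($tmp, 'w');
fwrite($fd, json_encode($data));   // write the complete payload to the temporary file
fflush($fd);                       // push PHP's buffer down to the OS
// fsync($fd);                     // PHP 8.1+: force the OS cache to disk as well
fclose($fd);
rename($tmp, $dst);                // atomic replace within the same filesystem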
I'm looking for a pattern that will allow me to maintain a similar level of correctness/confidence in a journaling/file-append context.
(As an aside, I've seen a few scenarios where multi-threaded programs and/or multiple separate processes fight over stdout in a terminal, or kernel threads and userspace processes trip over each other in dmesg.
In such circumstances it seems very easy for output from one process to be mixed in with output from another, on the same line. I think this is occurring at the block size boundaries?)
I am using PHP 7.1 (if it makes any difference), and I am wondering about possible issues with opening a file handle with fopen() but not closing that handle until much later (potentially 30+ minutes).
The scenario comes down to a long-running script that logs its actions periodically. Right now I use a simple process of calling file_put_contents() with FILE_APPEND. It works, but it slows down over time as the file gets bigger and bigger.
Using fopen() and fwrite() means I can avoid that slow-down. The only issue is that I wouldn't be calling fclose() until the end of the script's execution. I assume constantly fopen()/fclose()'ing the file would give me the same poor performance as my current implementation.
Does anyone have experience with the internals in this regard? I have no need to read the file, only write to it.
EDIT:
More info: I am currently running this in a Linux VM on my laptop (VirtualBox). I do not expect top notch performance from this setup. That being said, I still notice the slow-down as the log file gets bigger and bigger.
My code logic is as simple as:
while (true)
{
    $result = someFunction();
    if (!$result)
    {
        logSomething();
    }
    else
    {
        break;
    }
}
The frequency of writes is several times a second.
And files can become several GBs in size.
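A minimal sketch of the single-handle approach being asked about (the log path is a placeholder, and someFunction() is the stand-in from the question):
$log = fopen('/tmp/myscript.log', 'a');   // open once, in append mode

while (true) {
    $result = someFunction();
    if (!$result) {
        fwrite($log, date('c') . " someFunction() failed, retrying\n");   // append via the open handle
    } else {
        break;
    }
}

fclose($log);   // closed once, at the end of the run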
I've created a Joomla extension in which I'm copying records from table A to table B. My script works fine if table A contains only a small amount of data.
If table A contains a large amount of data, the execution time is exceeded while inserting it and I get this error: 'Fatal error: Maximum execution time of 30 seconds exceeded in /mysite/libraries/joomla/database/database/mysqli.php on line 382'.
I can overcome this problem by changing the ini file, but this is a Joomla extension that other people are going to use on their sites, so I can't tell them to change their ini file; in fact, I don't want to tell them.
Take a look at this:
http://davidwalsh.name/increase-php-script-execution-time-limit-ini_set
ini_set('max_execution_time', 300);
Use it this way, or:
set_time_limit(0);
Use the code below at the start of the page where you wrote the query code:
set_time_limit(0);
Technically, you can increase the maximum execution time using set_time_limit. Personally, I wouldn't mess with limits other people set on their servers, assuming they put them in for a reason (performance, security - especially in a shared hosting context, where software like Joomla! is often found). Also, set_time_limit won't work if PHP is run in safe mode.
So what you're left with is splitting the task into multiple steps. For example, if your table has 100000 records and you measure that you can process about 5000 records in a reasonable amount of time, then do the operation in 20 individual steps.
Execution time for each step should be a good deal less than 30 seconds on an average system. Note that the number of steps is dynamic: you programmatically divide the number of records by a constant (figure out a useful value during testing) to get the number of steps at runtime.
You need to split your script into two parts: one that works out the number of steps required, displays them to the user, and sequentially runs one step after another by sending AJAX requests to the second script (like: "process records 5001 to 10000"), marking steps as done (for the user to see) when the appropriate server response arrives (i.e. the request completed).
The second part is entirely server-sided and accepts AJAX requests. This script does the actual work on the server. It must receive some kind of parameters (the "process records 5001 to 10000" request) to understand which step it's supposed to process. When it's done with its step, it returns a "success" (or possibly "failure") code to the client script, so that it can notify the user.
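A rough sketch of what that second, server-side script could look like, assuming the Joomla database API and purely illustrative table, column and parameter names:
// step.php - processes one chunk per AJAX request, e.g. step.php?offset=5000&limit=5000
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$limit  = isset($_GET['limit'])  ? (int) $_GET['limit']  : 5000;

$db    = JFactory::getDbo();
$query = $db->getQuery(true)
    ->select('*')
    ->from($db->quoteName('#__table_a'))
    ->order($db->quoteName('id') . ' ASC');
$db->setQuery($query, $offset, $limit);   // apply LIMIT/OFFSET for this step
$rows = $db->loadObjectList();

foreach ($rows as $row) {
    // ...insert/copy $row into table B here...
}

echo json_encode(array('status' => 'success', 'processed' => count($rows)));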
There are variations on this theme, for instance you can build a script which redirects the user to itself, but with different parameters, so it's aware where it left off and can pick up from there with the next step. In general, you'd want the solution that gives the user the most information and control possible.
I was having a hard time figuring out a good title for this question, so I hope this is clear. I am currently using the TwitterOauth module on one of my sites to post a tweet. While this works, I need to limit the number of tweets submitted: just one each hour.
Note: I do not have the option to use a database. This is paramount for the question.
I have incorporated this as follows, in the PHP file that handles the actual posting to the Twitter API:
# Save the timestamp, make sure lastSentTweet exists and is writeable
function saveTimestamp() {
    $myFile = "./lastSentTweet.inc";
    $fh = fopen($myFile, 'w');
    $stringData = '<?php function getLastTweetTimestamp() { return ' . time() . ';}';
    fwrite($fh, $stringData);
    fclose($fh);
}
# Include the lastSentTweet time
include('./lastSentTweet.inc');
# Define the delay
define('TWEET_DELAY', 3600);
# Check for the last tweet
if (time() > getLastTweetTimestamp() + TWEET_DELAY) {
    // Posting to Twitter API here
} else {
    die("No.");
}
(initial) contents of the lastSentTweet.inc file (chmod 777):
<?php function getLastTweetTimestamp() { return 1344362207;}
The problem is that, while this works, it allows for accidental double submits: if multiple users trigger this script (and the site it runs on is currently extremely busy), 2 submits to Twitter (or more, though that has not occurred yet) sometimes slip through instead of just 1. My first thought is the (albeit minute) delay in opening, writing and closing the file, but I could be wrong.
Does anyone have an idea what allows for the accidental double submits (and how to fix this)?
You're getting race conditions. You will need to implement locking on your file while you're making changes, and you need to enclose both the read (the include statement) and the update inside the lock; the critical thing is to ensure nobody else (e.g. another HTTP request) is using the file while you read its current value and then update it with the new timestamp.
This would be fairly ineffective, though. You have other options, which might be available in your PHP installation; here are some:
You can use a database even if you don't have a database server: SQLite
You can store your timestamp in APC and use apc_cas() to detect if your last stored timestamp is still current when you update it.
Update
Your locking workflow needs to be something like this:
Acquire the lock on your stored timestamp. If you're working with files, you need to have the file open for reading and writing, and have called flock() on it. flock() will hang if another process has the file locked, and will return only after it has acquired the lock, at which point other processes attempting to lock the file will hang.
Read the stored timestamp from the already locked file.
Check if the required time has passed since the stored timestamp.
Only if it has passed, send the tweet and save the current timestamp to the file; otherwise you don't touch the stored timestamp.
Release the lock (just closing the file is enough).
This would ensure that no other process would update the timestamp after you have read and tested it but before you have stored the new timestamp.
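A sketch of that workflow, using a separate plain-text timestamp file rather than the .inc include (the file name is illustrative, and TWEET_DELAY is the constant defined in the question):
$fh = fopen('./lastTweet.txt', 'c+');        // create if missing, don't truncate
if ($fh !== false && flock($fh, LOCK_EX)) {  // 1. acquire the lock (blocks until it is free)
    $last = (int) stream_get_contents($fh);  // 2. read the stored timestamp
    if (time() > $last + TWEET_DELAY) {      // 3. has the required time passed?
        // 4. send the tweet here, then store the new timestamp
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, (string) time());
        fflush($fh);
    }
    flock($fh, LOCK_UN);                     // 5. release the lock
}
if ($fh !== false) {
    fclose($fh);
}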
I have an array of around 50,000 mobile numbers. I'm trying to process them and send bulk SMS to these numbers using a third-party API, but the browser freezes for several minutes. I'm looking for a better option.
Processing the data involves checking the mobile number type (e.g. CDMA), assigning unique IDs to all the numbers for further referencing, checking for network/country-specific charges, etc.
I thought of queuing the data in the database and using cron to send about 5k per batch every minute, but that would still take a long time if there are many messages. What are my other options?
I'm using CodeIgniter 2 on a XAMPP server.
I would write two scripts:
File index.php:
<iframe src="job.php" frameborder="0" scrolling="no" width="1" height="1"></iframe>
<script type="text/javascript">
function progress(percent) {
    document.getElementById('done').innerHTML = percent + '%';
}
</script><div id="done">0%</div>
File job.php:
<?php
set_time_limit(0);                     // ignore PHP's execution time limit
ignore_user_abort(true);               // keep on going even if user pulls the plug*
while (ob_get_level()) ob_end_clean(); // remove output buffers
ob_implicit_flush(true);               // output stuff directly
// * This absolutely depends on whether you want the user to be able to stop the
//   process or not. For example, you might add "Stop!" / "Start" links in index.php
//   that control the iframe; but of course, you would need that line of code
//   commented out for this feature to work.
function progress($percent) {
    echo '<script type="text/javascript">parent.progress(' . $percent . ');</script>';
}
$total = count($mobiles); // $mobiles: the array of phone numbers, assumed to be loaded earlier
echo '<!DOCTYPE html><html><head></head><body>'; // webkit hotfix
foreach ($mobiles as $i => $mobile) {
    // send sms
    progress($i / $total * 100);
}
progress(100);
echo '</body></html>'; // webkit hotfix
I'm assuming these numbers are in a database, if so you should add a new column titled isSent (or whatever you fancy).
The processing you describe in the next paragraph (quoted below) should be queued and possibly done nightly/weekly/whenever appropriate. Unless you have a specific reason to, it shouldn't be done in bulk on demand. You can even add a column to the db recording when each number was last checked, so that if a number hasn't been checked in at least X days you can check that number on demand.
Processing the data involves checking the mobile number type (e.g. CDMA), assigning unique IDs to all the numbers for further referencing, checking for network/country-specific charges, etc.
But that still leads you back to the same question of how to do this for 50,000 numbers at once. Since you mentioned cron jobs, I'm assuming you have SSH access to your server which means you don't need a browser. These cron jobs can be executed via the command line as such:
/usr/bin/php /home/username/example.com/myscript.php
My recommendation is to process 1,000 numbers at a time every 10 minutes via cron and to time how long this takes, then save it to a DB. Since you're using a cron job, it doesn't seem like these are time-sensitive SMS messages so they can be spread out. Once you know how long it took for this script to run 50 times (50*1000 = 50k) then you can update your cron job to run more/less frequently.
$time_start = microtime(true);
set_time_limit(0);
doSendSMS($phoneNum, $msg, $blah); // call your SMS-sending routine for this batch here
$time_end = microtime(true);
$time = $time_end - $time_start;
saveTimeRequiredToSendMessagesInDB($time);
Also, you might have noticed the set_time_limit(0); this tells PHP not to time out after the default 30 seconds. If you are able to modify the php.ini file then you don't need to enter this line of code. Even if you are able to edit php.ini, I would still recommend not changing this setting, since you might want other pages to time out.
http://php.net/manual/en/function.set-time-limit.php
If this isn't a one-off type of situation, consider engineering a better solution.
What you basically want is a queue that your browser-bound process can write to, and then 1-N worker processes can read from and update.
Putting work in the queue should be rather inexpensive - perhaps a bunch of simple INSERT statements to a SQL RDBMS.
Then you can have a daemon or two (or 100, distributed across multiple servers) that read from the queue and process stuff. You'll want to be careful here and avoid two workers taking on the same task, but that's not hard to code around.
So your browser-bound workflow is: click some button that causes a bunch of stuff to get added to the queue, then redirect to some "queue status" interface, where the user can watch the system chew through all their work.
A system like this is nice, because it's easy to scale horizontally quite a ways.
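For example, one way to keep two workers from claiming the same task is to tag rows atomically before processing them. A sketch against a hypothetical sms_queue table, assuming $pdo is an existing PDO connection to MySQL:
$workerId = uniqid('worker_', true);

// Claim up to 100 unclaimed rows in one atomic UPDATE (LIMIT on UPDATE is MySQL-specific)...
$claim = $pdo->prepare('UPDATE sms_queue SET claimed_by = ? WHERE claimed_by IS NULL LIMIT 100');
$claim->execute(array($workerId));

// ...then process only the rows this worker just claimed.
$jobs = $pdo->prepare('SELECT * FROM sms_queue WHERE claimed_by = ? AND sent_at IS NULL');
$jobs->execute(array($workerId));

foreach ($jobs->fetchAll(PDO::FETCH_ASSOC) as $job) {
    // send the SMS via the third-party API here, then mark the row as sent
}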
EDIT: Christian Sciberras' answer is going in this direction, except the browser ends up driving both sides (it adds to the queue, then drives the worker process).
A cron job would be your best bet; I don't see why it would take any longer than doing it in the browser if your only problem at the moment is the browser timing out.
If you insist on doing it via the browser, then the other solution would be doing it in batches of, say, 1000, and redirecting to the same script each time with a reference in a $_GET variable to where it got up to last time.
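A rough sketch of that batch-and-redirect approach (the script name, batch size and $allNumbers variable are illustrative):
// send_batch.php?offset=0
$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$batch  = 1000;

foreach (array_slice($allNumbers, $offset, $batch) as $number) {
    // send one SMS here
}

if ($offset + $batch < count($allNumbers)) {
    // more left to do: redirect to this same script, picking up where we left off
    header('Location: send_batch.php?offset=' . ($offset + $batch));
    exit;
}
echo 'All messages sent.';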