I need to store the time of the last run of a script to make sure it doesn't read old items (tweets in this case). Whats the best way to keep track of this?
thanks.
Generate a timestamp and save it to a logfile that can be read on the next iteration.
$time_ran = time();
function saveTimeRan(){
$fh = fopen('/path/to/a/new/log' 'w+');
fwrite($fh, $time_ran);
fclose($fh);
}
function getTimeRan(){
$fh = fopen('/path/to/a/new/log' 'r+');
$time = fgets($fh);
fclose($fh);
return $time;
}
Might I suggest you make this an object and put the contents of saveTimeRan in the magic __destruct function, so when your object is GC'ed it will save the time. Just a suggestion. You could put your tweet functions in other object methods and make one comprehensive interface. You could, alternatively, save the value to a database field and call it each iteration.
You gave zero information on specifics, so I will be general.
Store it in a database like MySQL
Write it to a local file with file_put_contents() and read it with file_get_contents()
Touch a local file with one of the os functions, and then stat the file to see it's mtime
Use sqlite to write it to a local database
Whats the best way to keep track of
this?
The best way is to keep track of that in memory(redis/apc/memcached/etc)(back upped by any persistent store(mysql/mongodb/redis/etc). You should try to avoid to touch the disc(I/O) because that's very very slow compared to speed of memory.
Redis
You can configure redis two ways:
write data back to persistent asynchronously(snapshotting) after certain time has passed and or keys have been modified.
write data back to persistent store immediately(append-only).
It's a trade-off(performance vs safety)
APC/Memcached
APC and Memcached don't have persistent storage so you have to do that using for example mysql.
You could store time stamps in a DB or a text file.
You can have the script write a timestamp/date to a text file every time it runs. Then you can optionally read the file before running the script to make sure you don't run it again before x amount of time has passed or only run if a certain condition is true.
Check out file_put_contents and file_get_contents.
-Ryan
Related
Have a file in a website. A PHP script modifies it like this:
$contents = file_get_contents("MyFile");
// ** Modify $contents **
// Now rewrite:
$file = fopen("MyFile","w+");
fwrite($file, $contents);
fclose($file);
The modification is pretty simple. It grabs the file's contents and adds a few lines. Then it overwrites the file.
I am aware that PHP has a function for appending contents to a file rather than overwriting it all over again. However, I want to keep using this method since I'll probably change the modification algorithm in the future (so appending may not be enough).
Anyway, I was testing this out, making like 100 requests. Each time I call the script, I add a new line to the file:
First call:
First!
Second call:
First!
Second!
Third call:
First!
Second!
Third!
Pretty cool. But then:
Fourth call:
Fourth!
Fifth call:
Fourth!
Fifth!
As you can see, the first, second and third lines simply disappeared.
I've determined that the problem isn't the contents string modification algorithm (I've tested it separately). Something is messed up either when reading or writing the file.
I think it is very likely that the issue is when the file's contents are read: if $contents, for some odd reason, is empty, then the behavior shown above makes sense.
I'm no expert with PHP, but perhaps the fact that I performed 100 calls almost simultaneously caused this issue. What if there are two processes, and one is writing the file while the other is reading it?
What is the recommended approach for this issue? How should I manage file modifications when several processes could be writing/reading the same file?
What you need to do is use flock() (file lock)
What I think is happening is your script is grabbing the file while the previous script is still writing to it. Since the file is still being written to, it doesn't exist at the moment when PHP grabs it, so php gets an empty string, and once the later processes is done it overwrites the previous file.
The solution is to have the script usleep() for a few milliseconds when the file is locked and then try again. Just be sure to put a limit on how many times your script can try.
NOTICE:
If another PHP script or application accesses the file, it may not necessarily use/check for file locks. This is because file locks are often seen as an optional extra, since in most cases they aren't needed.
So the issue is parallel accesses to the same file, while one is writing to the file another instance is reading before the file has been updated.
PHP luckily has a mechanisms for locking the file so no one can read from it until the lock is released and the file has been updated.
flock()
can be used and the documentation is here
You need to create a lock, so that any concurrent requests will have to wait their turn. This can be done using the flock() function. You will have to use fopen(), as opposed to file_get_contents(), but it should not be a problem:
$file = 'file.txt';
$fh = fopen($file, 'r+');
if (flock($fh, LOCK_EX)) { // Get an exclusive lock
$data = fread($fh, filesize($file)); // Get the contents of file
// Do something with data here...
ftruncate($fh, 0); // Empty the file
fwrite($fh, $newData); // Write new data to file
fclose($fh); // Close handle and release lock
} else {
die('Unable to get a lock on file: '.$file);
}
Some background information
The files I would like to download is kept at the external server for a week, and a new XML file(10-50mb large) is created there every hour with a different name. I would like the large file to be downloaded to my server chunk by chunk in the background each time my website is loaded, perhaps 0.5mb each time, and then resume the download the next time someone else loads the website. This would require my site to have atleast 100 pageloads each hour to stay updated, so perhaps abit more of the file each time if possible. I have researched simpleXML, XMLreader, SAX parsing, but whatever I do, it seems it takes too long to parse the file directly, therefore I would like a different approach, namely downloading it like described above.
If I download a 30mb large XML file, I can parse it locally with XMLreader in 3 seconds(250k iterations) only, but when I try to do the same from the external server limiting it to 50k iterations, it uses 15secs to read that small part, so it would not be possible to parse it directly from that server it seems.
Possible solutions
I think it's best to use cURL. But then again, perhaps fopen(), fsockopen(), copy() or file_get_contents() are the way to go. I'm looking for advice on what functions to use to make this happen, or different solutions on how I can parse a 50mb external XML file into a mySQL database.
I suspect a Cron job every hour would be the best solution, but I am not sure how well that would be supported by webhosting companies, and I have no clue how to do something like that. But if thats the best solution, and the majority thinks so, I will have to do my research in that area too.
If a java applet/javascript running in the background would be a better solution, please point me in the right direction when it comes to functions/methods/libraries there aswell.
Summary
What's the best solution to downloading parts of a file in the
background, and resume the download each time my website is loaded
until its completed?
If the above solution would be moronic to even try, what
language/software would you use to achieve the same thing(download a large file every hour)?
Thanks in advance for all answers, and sorry for the long story/question.
Edit: I ended up using this solution to get the files with cron job scheduling a php script. It checks my folder for what files I already have, generates a list of the possible downloads for the last four days, then downloads the next XMLfile in line.
<?php
$date = new DateTime();
$current_time = $date->getTimestamp();
$four_days_ago = $current_time-345600;
echo 'Downloading: '."\n";
for ($i=$four_days_ago; $i<=$current_time; ) {
$date->setTimestamp($i);
if($date->format('H') !== '00') {
$temp_filename = $date->format('Y_m_d_H') ."_full.xml";
if(!glob($temp_filename)) {
$temp_url = 'http://www.external-site-example.com/'.$date->format('Y/m/d/H') .".xml";
echo $temp_filename.' --- '.$temp_url.'<br>'."\n";
break; // with a break here, this loop will only return the next file you should download
}
}
$i += 3600;
}
set_time_limit(300);
$Start = getTime();
$objInputStream = fopen($temp_url, "rb");
$objTempStream = fopen($temp_filename, "w+b");
stream_copy_to_stream($objInputStream, $objTempStream, (1024*200000));
$End = getTime();
echo '<br>It took '.number_format(($End - $Start),2).' secs to download "'.$temp_filename.'".';
function getTime() {
$a = explode (' ',microtime());
return(double) $a[0] + $a[1];
}
?>
edit2: I just wanted to inform you that there is a way to do what I asked, only it would'nt work in my case. With the amount of data I need the website would have to have 400+ visitors an hour for it to work properly. But with smaller amounts of data there are some options; http://www.google.no/search?q=poormanscron
You need to have a scheduled, offline task (e.g., cronjob). The solution you are pursuing is just plain wrong.
The simplest thing that could possibly work is a php script you run every hour (scheduled via cron, most likely) that downloads the file and processes it.
You could try fopen:
<?php
$handle = fopen("http://www.example.com/test.xml", "rb");
$contents = stream_get_contents($handle);
fclose($handle);
?>
I need to include one PHP file and execute function from it.
After execution, on end of PHP script I want to append something to it.
But I'm unable to open file. It's possible to close included file/anything similar so I'll be able to append info to PHP file.
include 'something.php';
echo $somethingFromIncludedFile;
//Few hundred lines later
$fh = fopen('something.php', 'a') or die('Unable to open file');
$log = "\n".'$usr[\''.$key.'\'] = \''.$val.'\';';
fwrite($fh, $log);
fclose($fh);
How to achieve that?
In general you never should modify your PHP code using PHP itself. It's a bad practice, first of all from security standpoint. I am sure you can achieve what you need in other way.
As Alex says, self-modifying code is very, VERY dangerous. And NOT seperating data from code is just dumb. On top of both these warnings, is the fact that PHP arrays are relatively slow and do not scale well (so you could file_put_contents('data.ser',serialize($usr)) / $usr=unserialize(file_get_contents('data.ser')) but it's only going to work for small numbers of users).
Then you've got the problem of using conventional files to store data in a multi-user context - this is possible but you need to build sophisticated locking queue management. This usually entails using a daemon to manage the queue / mutex and is invariably more effort than its worth.
Use a database to store data.
As you already know this attempt is not one of the good ones. If you REALLY want to include your file and then append something to it, then you can do it the following way.
Be aware that using eval(); is risky if you cannot be 100% sure if the content of the file does not contain harmful code.
// This part is a replacement for you include
$fileContent = file_get_contents("something.php");
eval($fileContent);
// your echo goes here
// billion lines of code ;)
// file append mechanics
$fp = fopen("something.php", "a") or die ("Unexpected file open error!");
fputs($fp, "\n".'$usr[\''.$key.'\'] = \''.$val.'\';');
fclose($fp);
What's the cleanest way in php to open a file, read the contents, and subsequently overwrite the file's contents with some output based on the original contents? Specifically, I'm trying to open a file populated with a list of items (separated by newlines), process/add items to the list, remove the oldest N entries from the list, and finally write the list back into the file.
fopen(<path>, 'a+')
flock(<handle>, LOCK_EX)
fread(<handle>, filesize(<path>))
// process contents and remove old entries
fwrite(<handle>, <contents>)
flock(<handle>, LOCK_UN)
fclose(<handle>)
Note that I need to lock the file with flock() in order to protect it across multiple page requests. Will the 'w+' flag when fopen()ing do the trick? The php manual states that it will truncate the file to zero length, so it seems that may prevent me from reading the file's current contents.
If the file isn't overly large (that is, you can be confident loading it won't blow PHP's memory limit), then the easiest way to go is to just read the entire file into a string (file_get_contents()), process the string, and write the result back to the file (file_put_contents()). This approach has two problems:
If the file is too large (say, tens or hundreds of megabytes), or the processing is memory-hungry, you're going to run out of memory (even more so when you have multiple instances of the thing running).
The operation is destructive; when the saving fails halfway through, you lose all your original data.
If any of these is a concern, plan B is to process the file and at the same time write to a temporary file; after successful completion, close both files, rename (or delete) the original file and then rename the temporary file to the original filename.
Read
$data = file_get_contents($filename);
Write
file_put_contents($filename, $data);
One solution is to use a separate lock file to control access.
This solution assumes that only your script, or scripts you have access to, will want to write to the file. This is because the scripts will need to know to check a separate file for access.
$file_lock = obtain_file_lock();
if ($file_lock) {
$old_information = file_get_contents('/path/to/main/file');
$new_information = update_information_somehow($old_information);
file_put_contents('/path/to/main/file', $new_information);
release_file_lock($file_lock);
}
function obtain_file_lock() {
$attempts = 10;
// There are probably better ways of dealing with waiting for a file
// lock but this shows the principle of dealing with the original
// question.
for ($ii = 0; $ii < $attempts; $ii++) {
$lock_file = fopen('/path/to/lock/file', 'r'); //only need read access
if (flock($lock_file, LOCK_EX)) {
return $lock_file;
} else {
//give time for other process to release lock
usleep(100000); //0.1 seconds
}
}
//This is only reached if all attempts fail.
//Error code here for dealing with that eventuality.
}
function release_file_lock($lock_file) {
flock($lock_file, LOCK_UN);
fclose($lock_file);
}
This should prevent a concurrently-running script reading old information and updating that, causing you to lose information that another script has updated after you read the file. It will allow only one instance of the script to read the file and then overwrite it with updated information.
While this hopefully answers the original question, it doesn't give a good solution to making sure all concurrent scripts have the ability to record their information eventually.
I experimenting with twitter streaming API,
I use Phirehose to connect to twitter and fetch the data but having problems storing it in files for further processing.
Basically what I want to do is to create a file named
date("YmdH")."."txt"
for every hour of connection.
Here is how my code looks like right now (not handling the hourly change of files)
public function enqueueStatus($status)
$data = json_decode($status,true);
if(isset($data['text'])/*more conditions here*/) {
$fp = fopen("/tmp/$time.txt");
fwirte ($status,$fp);
fclose($fp);
}
Help is as always much appreciated :)
You want the 'append' mode in fopen - this will either append to a file or create it.
if(isset($data['text'])/*more conditions here*/) {
$fp = fopen("/tmp/" . date("YmdH") . ".txt", "a");
fwrite ($status,$fp);
fclose($fp);
}
From the Phirehose googlecode wiki:
As of Phirehose version 0.2.2 there is
an example of a simple "ghetto queue"
included in the tarball (see file:
ghetto-queue-collect.php and
ghetto-queue-consume.php) that shows
how statuses could be easily collected
on to the filesystem for processing
and then picked up by a separate
process (consume).
This is a complete working sample of doing what you want to do. The rotation time interval is configurable too. Additionally there's another script to consume and process the written files too.
Now if only I could find a way to stop the whole sript, my log keeps filling up (the script continues execution) even if I close the browser tab :P