Creating files on a time (hourly) basis - php

I am experimenting with the Twitter streaming API.
I use Phirehose to connect to Twitter and fetch the data, but I am having problems storing it in files for further processing.
Basically, what I want to do is create a file named
date("YmdH") . ".txt"
for every hour of connection.
Here is what my code looks like right now (it does not handle the hourly change of files):
public function enqueueStatus($status)
{
    $data = json_decode($status,true);
    if(isset($data['text'])/*more conditions here*/) {
        $fp = fopen("/tmp/$time.txt", "w");
        fwrite($fp, $status);
        fclose($fp);
    }
}
Help is as always much appreciated :)

You want the 'append' mode in fopen - this will either append to a file or create it.
if(isset($data['text'])/*more conditions here*/) {
$fp = fopen("/tmp/" . date("YmdH") . ".txt", "a");
fwrite($fp, $status);
fclose($fp);
}
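A minimal sketch of the same idea using file_put_contents(), assuming the files should land in a directory of your choice. The helper names hourlyFilePath() and appendStatus() are hypothetical and not part of Phirehose. Because the file name is recomputed on every call, the switch to a new file at the top of the hour happens on its own.

```php
<?php
// Hypothetical helper: build the path of the file for the hour containing $timestamp.
function hourlyFilePath(string $dir, ?int $timestamp = null): string
{
    return rtrim($dir, '/') . '/' . date('YmdH', $timestamp ?? time()) . '.txt';
}

// Hypothetical helper: append one status per line to the current hourly file.
// FILE_APPEND creates the file if it is missing; LOCK_EX takes an advisory lock while writing.
function appendStatus(string $dir, string $status, ?int $timestamp = null): bool
{
    $written = file_put_contents(
        hourlyFilePath($dir, $timestamp),
        $status . "\n",
        FILE_APPEND | LOCK_EX
    );
    return $written !== false;
}
```

Inside enqueueStatus() this would be called as appendStatus('/tmp', $status), with one status per line so the files are easy to read back later.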

From the Phirehose googlecode wiki:
As of Phirehose version 0.2.2 there is an example of a simple "ghetto queue" included in the tarball (see file: ghetto-queue-collect.php and ghetto-queue-consume.php) that shows how statuses could be easily collected on to the filesystem for processing and then picked up by a separate process (consume).
This is a complete working sample of doing what you want to do. The rotation time interval is configurable too. Additionally there's another script to consume and process the written files too.
Now if only I could find a way to stop the whole script: my log keeps filling up (the script continues executing) even after I close the browser tab :P

Related

Concurrent writing to zip file

I have a process written in PHP. It gets a file from the internet and puts it inside a zip file. There are 4096 zip files, and the target zip file is chosen by an algorithm based on a hash of the URL being processed.
I have another program that launches HTTP requests, so I can run the script concurrently (around 110 processes).
My question is simple. Since the threads are pseudorandom, two of them can easily try to add files to the same zip file at the same moment.
Is that possible? Will the file get corrupted if two processes try to add files at the same time?
Locking the file or something like that would be a possible solution.
I was thinking of using semaphores, but from what I have read, PHP semaphores don't work under Windows.
I have seen this possible solution:
if ( !function_exists('sem_get') ) {
function sem_get($key) { return fopen(__FILE__.'.sem.'.$key, 'w+'); }
function sem_acquire($sem_id) { return flock($sem_id, LOCK_EX); }
function sem_release($sem_id) { return flock($sem_id, LOCK_UN); }
}
Anyway, the question is whether it is allowed to add files to a zip file from two or more different PHP processes at the same time.
Short answer: No! The zip algorithm analyses and compresses one stream at a time.
This is tough under Windows. It's far from easy in Linux! I would be tempted to create a db table with a unique index, and use that index number to determine a filename, or at least flag that a file is being written to.
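One way to serialize access, sketched with flock() on a separate lock file per archive. withArchiveLock() is a hypothetical helper name, and the callback is where the zip calls (or any other single-writer work) would go. flock() is advisory, so this only helps if every process that touches the archive goes through the same helper.

```php
<?php
// Hypothetical helper: run $work while holding an exclusive lock tied to $archivePath.
// A sidecar "<archive>.lock" file is locked, so the archive itself can be opened and
// closed freely inside the callback.
function withArchiveLock(string $archivePath, callable $work)
{
    $lock = fopen($archivePath . '.lock', 'c'); // 'c': create if missing, never truncate
    if ($lock === false) {
        throw new RuntimeException("Cannot open lock file for $archivePath");
    }
    try {
        if (!flock($lock, LOCK_EX)) { // blocks until other holders release the lock
            throw new RuntimeException("Cannot lock $archivePath");
        }
        return $work($archivePath);
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

Each of the 4096 archives gets its own lock file, so processes working on different archives do not wait on each other; only writers to the same archive queue up.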

About PHP parallel file read/write

I have a file on a website. A PHP script modifies it like this:
$contents = file_get_contents("MyFile");
// ** Modify $contents **
// Now rewrite:
$file = fopen("MyFile","w+");
fwrite($file, $contents);
fclose($file);
The modification is pretty simple. It grabs the file's contents and adds a few lines. Then it overwrites the file.
I am aware that PHP has a function for appending contents to a file rather than overwriting it all over again. However, I want to keep using this method since I'll probably change the modification algorithm in the future (so appending may not be enough).
Anyway, I was testing this out, making like 100 requests. Each time I call the script, I add a new line to the file:
First call:
First!
Second call:
First!
Second!
Third call:
First!
Second!
Third!
Pretty cool. But then:
Fourth call:
Fourth!
Fifth call:
Fourth!
Fifth!
As you can see, the first, second and third lines simply disappeared.
I've determined that the problem isn't the contents string modification algorithm (I've tested it separately). Something is messed up either when reading or writing the file.
I think it is very likely that the issue is when the file's contents are read: if $contents, for some odd reason, is empty, then the behavior shown above makes sense.
I'm no expert with PHP, but perhaps the fact that I performed 100 calls almost simultaneously caused this issue. What if there are two processes, and one is writing the file while the other is reading it?
What is the recommended approach for this issue? How should I manage file modifications when several processes could be writing/reading the same file?
What you need to do is use flock() (file lock).
What I think is happening is that your script is reading the file while a previous instance is still writing to it. Opening the file with "w+" truncates it, so a script that reads at that moment gets an empty string, and once that later process is done it overwrites what the earlier one wrote.
The solution is to have the script usleep() for a few milliseconds when the file is locked and then try again. Just be sure to put a limit on how many times your script can try.
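A minimal sketch of that retry loop, assuming a handle that is already open. lockWithRetry() and its default values are hypothetical; LOCK_NB makes flock() return immediately instead of waiting, which is what makes counting attempts possible.

```php
<?php
// Hypothetical helper: try to take an exclusive lock without blocking,
// sleeping between attempts and giving up after $maxTries.
function lockWithRetry($handle, int $maxTries = 50, int $sleepMicros = 20000): bool
{
    for ($try = 0; $try < $maxTries; $try++) {
        if (flock($handle, LOCK_EX | LOCK_NB)) {
            return true; // lock acquired
        }
        usleep($sleepMicros); // someone else holds it; wait and retry
    }
    return false; // caller decides how to report the failure
}
```

With these defaults the script waits about one second in total before giving up. The read, modify and write steps then all happen between a successful lockWithRetry($fh) and the matching flock($fh, LOCK_UN).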
NOTICE:
If another PHP script or application accesses the file, it may not necessarily use/check for file locks. This is because file locks are often seen as an optional extra, since in most cases they aren't needed.
So the issue is parallel accesses to the same file, while one is writing to the file another instance is reading before the file has been updated.
Luckily, PHP has a mechanism for locking the file so that no one can read from it until the lock is released and the file has been updated: flock(), which is covered in the PHP documentation.
You need to create a lock, so that any concurrent requests will have to wait their turn. This can be done using the flock() function. You will have to use fopen(), as opposed to file_get_contents(), but it should not be a problem:
$file = 'file.txt';
$fh = fopen($file, 'r+');
if (flock($fh, LOCK_EX)) { // Get an exclusive lock
    $data = stream_get_contents($fh); // Get the contents of file (also works when it is empty)
    // Do something with data here and build $newData...
    ftruncate($fh, 0); // Empty the file
    rewind($fh); // Move the pointer back to the start before writing
    fwrite($fh, $newData); // Write new data to file
    fflush($fh); // Flush output before the lock is released
    fclose($fh); // Close handle and release lock
} else {
    die('Unable to get a lock on file: '.$file);
}

Download a large XML file from an external source in the background, with the ability to resume download if incomplete

Some background information
The files I would like to download are kept on the external server for a week, and a new XML file (10-50 MB) is created there every hour with a different name. I would like the large file to be downloaded to my server chunk by chunk in the background each time my website is loaded, perhaps 0.5 MB each time, and then resume the download the next time someone else loads the website. This would require my site to have at least 100 page loads each hour to stay updated, so perhaps a bit more of the file each time if possible. I have researched SimpleXML, XMLReader and SAX parsing, but whatever I do, it seems to take too long to parse the file directly, so I would like a different approach, namely downloading it as described above.
If I download a 30 MB XML file, I can parse it locally with XMLReader in only 3 seconds (250k iterations), but when I try to do the same from the external server, limiting it to 50k iterations, it takes 15 seconds to read that small part, so it seems it would not be possible to parse it directly from that server.
Possible solutions
I think it's best to use cURL. But then again, perhaps fopen(), fsockopen(), copy() or file_get_contents() are the way to go. I'm looking for advice on which functions to use to make this happen, or different solutions for how I can parse a 50 MB external XML file into a MySQL database.
I suspect a cron job every hour would be the best solution, but I am not sure how well that would be supported by web hosting companies, and I have no clue how to do something like that. But if that's the best solution, and the majority thinks so, I will have to do my research in that area too.
If a Java applet or JavaScript running in the background would be a better solution, please point me in the right direction when it comes to functions/methods/libraries there as well.
Summary
What's the best solution for downloading parts of a file in the background and resuming the download each time my website is loaded until it's completed?
If the above solution would be moronic to even try, what language/software would you use to achieve the same thing (download a large file every hour)?
Thanks in advance for all answers, and sorry for the long story/question.
Edit: I ended up using this solution to get the files with cron job scheduling a php script. It checks my folder for what files I already have, generates a list of the possible downloads for the last four days, then downloads the next XMLfile in line.
<?php
$date = new DateTime();
$current_time = $date->getTimestamp();
$four_days_ago = $current_time - 345600; // four days in seconds
$temp_filename = null;
$temp_url = null;
echo 'Downloading: '."\n";
// Walk hour by hour through the last four days and stop at the first file not yet downloaded.
for ($i = $four_days_ago; $i <= $current_time; $i += 3600) {
    $date->setTimestamp($i);
    if ($date->format('H') !== '00') {
        $name = $date->format('Y_m_d_H') ."_full.xml";
        if (!file_exists($name)) {
            $temp_filename = $name;
            $temp_url = 'http://www.external-site-example.com/'.$date->format('Y/m/d/H') .".xml";
            echo $temp_filename.' --- '.$temp_url.'<br>'."\n";
            break; // with a break here, this loop will only return the next file you should download
        }
    }
}
if ($temp_url === null) {
    exit('Nothing to download.'); // every file in the window is already present
}
set_time_limit(300);
$Start = microtime(true);
$objInputStream = fopen($temp_url, "rb");
$objTempStream = fopen($temp_filename, "w+b");
stream_copy_to_stream($objInputStream, $objTempStream, (1024*200000));
fclose($objInputStream);
fclose($objTempStream);
$End = microtime(true);
echo '<br>It took '.number_format(($End - $Start),2).' secs to download "'.$temp_filename.'".';
?>
Edit 2: I just wanted to let you know that there is a way to do what I asked, only it wouldn't work in my case. With the amount of data I need, the website would have to have 400+ visitors an hour for it to work properly. But with smaller amounts of data there are some options: http://www.google.no/search?q=poormanscron
You need to have a scheduled, offline task (e.g., cronjob). The solution you are pursuing is just plain wrong.
The simplest thing that could possibly work is a php script you run every hour (scheduled via cron, most likely) that downloads the file and processes it.
You could try fopen:
<?php
$handle = fopen("http://www.example.com/test.xml", "rb");
$contents = stream_get_contents($handle);
fclose($handle);
?>

PHP write to included file

I need to include one PHP file and execute a function from it.
After execution, at the end of the PHP script, I want to append something to it.
But I'm unable to open the file. Is it possible to close the included file, or do anything similar, so that I'll be able to append info to the PHP file?
include 'something.php';
echo $somethingFromIncludedFile;
//Few hundred lines later
$fh = fopen('something.php', 'a') or die('Unable to open file');
$log = "\n".'$usr[\''.$key.'\'] = \''.$val.'\';';
fwrite($fh, $log);
fclose($fh);
How to achieve that?
In general you should never modify your PHP code using PHP itself. It's bad practice, first of all from a security standpoint. I am sure you can achieve what you need in another way.
As Alex says, self-modifying code is very, VERY dangerous. And NOT separating data from code is just dumb. On top of both these warnings is the fact that PHP arrays are relatively slow and do not scale well (so you could file_put_contents('data.ser',serialize($usr)) / $usr=unserialize(file_get_contents('data.ser')) but it's only going to work for small numbers of users).
Then you've got the problem of using conventional files to store data in a multi-user context. This is possible, but you need to build sophisticated locking queue management. This usually entails using a daemon to manage the queue / mutex and is invariably more effort than it's worth.
Use a database to store data.
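If a database is more than the job needs, the separation of data from code can still be sketched with a plain data file. This assumes the $usr array from the question holds only string values; the helpers loadUsers() and saveUser() and the JSON file format are hypothetical choices. As noted above, this is only reasonable for small data sets.

```php
<?php
// Hypothetical helper: read the user map from a JSON data file (empty array if absent).
function loadUsers(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    $decoded = json_decode(file_get_contents($path), true);
    return is_array($decoded) ? $decoded : [];
}

// Hypothetical helper: set one key and write the whole map back.
// LOCK_EX guards the write itself; it does not make the read-modify-write atomic.
function saveUser(string $path, string $key, string $val): bool
{
    $usr = loadUsers($path);
    $usr[$key] = $val;
    return file_put_contents($path, json_encode($usr), LOCK_EX) !== false;
}
```

The script then reads with $usr = loadUsers('users.json') instead of including a PHP file, and nothing that gets written is ever executed as code.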
As you already know, this approach is not a good one. If you REALLY want to include your file and then append something to it, you can do it the following way.
Be aware that using eval() is risky unless you can be 100% sure that the file does not contain harmful code.
// This part is a replacement for your include
$fileContent = file_get_contents("something.php");
// eval() expects code without an opening tag, so leave PHP mode first
// and let the file's own <?php tag re-enter it
eval('?>' . $fileContent);
// your echo goes here
// billion lines of code ;)
// file append mechanics
$fp = fopen("something.php", "a") or die ("Unexpected file open error!");
fputs($fp, "\n".'$usr[\''.$key.'\'] = \''.$val.'\';');
fclose($fp);

check file for changes using php

Is there any way to check whether a file is being accessed or modified by another process from a PHP script? I have tried the filemtime(), fileatime() and filectime() functions, with the script checking continuously in a loop, but it seems that once the script has started it only ever returns the time from the first check. An example would be files being uploaded to an FTP or SMB share. This is what I tried:
while(1==1)
{
$LastMod = filemtime("file");
if(($LastMod +60) > time())
{
echo "file in use please wait... last modified : $LastMod";
sleep(10);
}else{
// process file
}
}
I know the file is constantly changing, but the $LastMod variable is not updating. If I end the process and run it again, it picks up a new $LastMod from the file, but the value doesn't seem to update each time the file is checked in the loop.
I have also tried looking at filesize(), but I get the same symptoms. I also looked into flock(), but as the file is created or modified outside PHP I don't see how this would work.
If anyone has any solutions, please let me know.
Thanks, Vip32
PS: I am using PHP to process the files because it requires interaction with MySQL and querying external websites.
The file metadata functions all work off stat() output, which caches its data, as a stat() call is a relatively expensive function. You can empty that cache to force stat() to fetch fresh data with clearstatcache()
There are other mechanisms that allow you to monitor for file changes. Instead of doing a loop in PHP and repeatedly stat()ing, consider using an external monitoring app/script which can hook into the OS-provided mechanism and call your PHP script on-demand when the file truly does change.
Add clearstatcache(); to your loop:
while(true)
{
    clearstatcache(); // drop cached stat() results so filemtime() returns fresh data
    $LastMod = filemtime("file");
    if(($LastMod +60) > time())
    {
        echo "file in use please wait... last modified : $LastMod";
        sleep(10);
    }else{
        // process file
    }
}
