I have a PHP page with multiple links. Each link writes different content to the same file. This generally works. However, if I use the same link again within a minute of its previous use, the script neither creates nor modifies the file anymore.
This can be verified using Terminal.
After waiting out that minute, the PHP script works properly again.
// Write the requested command to a fixed location for iTunes to pick up.
$file = fopen("/private/tmp/iTunesRemoteCommand", "w");
fwrite($file, $_GET['action']);
fclose($file);
// Make the file readable and writable by everyone.
chmod("/private/tmp/iTunesRemoteCommand", 0777);
print_r("Done");
For testing purposes, I swapped $_GET['action'] with a fixed, manually entered string.
In essence, every link works once every minute.
The installed version of PHP is v5.3.4.
Having tried it with multiple browsers, I wonder whether writing the same content to a file in relatively short succession is a limitation of PHP, or whether there is a setting (php.ini?) through which this delay can be reduced.
PHP isn't imposing any delay. Writing to a file has no timing penalty whatsoever.
But something outside PHP is interfering here. You are obviously trying to remote-control iTunes; that cannot happen without iTunes reading the file, and this probably affects what can be written to the file, and when.
If iTunes internally polls the file contents on a schedule, this might only happen once a minute. What really goes on is probably beyond our ability to find out.
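As a quick sanity check (a minimal sketch, assuming the same path as above; run it from the command line), you can rule PHP out by writing the file several times in quick succession and confirming that each write lands:

<?php
// write-test.php - hypothetical test script; run with: php write-test.php
$path = "/private/tmp/iTunesRemoteCommand";
foreach (array("play", "pause", "nexttrack") as $command) {
    file_put_contents($path, $command);                 // overwrite the file
    echo date("H:i:s") . " wrote: " . file_get_contents($path) . "\n";
    sleep(1);
}

If every line echoes the value just written, PHP is writing without delay, and the once-a-minute behaviour comes from whatever consumes the file.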
I am working on web scraping with PHP and cURL to scrape a whole website, but it takes more than one day to complete the scraping process.
I have even used
ignore_user_abort(true);                          // keep running if the client disconnects
set_error_handler(array(&$this, 'customError'));  // custom error handler
set_time_limit(0);                                // no execution time limit
ini_set('memory_limit', '-1');                    // no memory limit
I also free memory after scraping each page. I am using Simple HTML DOM to get the scraping details from a page.
Still, the process runs and works fine for a certain number of links; after that it stops, although the browser keeps spinning and no error log is generated.
I cannot work out what the problem is. I also need to know: can PHP run a process for two or three days?
Thanks in advance
PHP can run for as long as you need it to, but the fact that it stops at what seems like the same point every time indicates there is an issue with your script.
You said you have tried ignore_user_abort(true);, but then indicated you were running this via a browser. This setting only works on the command line, as closing a browser window for a script of this type will not terminate the process anyway.
Do you have xDebug? simplehtmlDOM will throw some rather interesting errors with malformed html (a link within a broken link for example). xDebug will throw a MAX_NESTING_LEVEL error in a browser, but will not throw this in a console unless you have explicitly told it to with the -d flag.
There are lots of other errors, notices, warnings etc which will break/stop your script without writing anything to error_log.
Are you getting any errors?
When using cURL in this way it is important to use multi cURL to parallel process URLs - depending on your environment, 150-200 URLs at a time is easy to achieve.
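As a rough illustration (a minimal sketch, not your crawler; the URL list and options are placeholders), parallel fetching with curl_multi looks something like this:

<?php
// Hypothetical example of fetching several URLs in parallel with curl_multi.
$urls = array("http://example.com/a", "http://example.com/b", "http://example.com/c");
$multi = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_multi_add_handle($multi, $ch);
    $handles[$url] = $ch;
}
do {
    curl_multi_exec($multi, $running);
    curl_multi_select($multi);                   // wait for activity instead of busy-looping
} while ($running > 0);
foreach ($handles as $url => $ch) {
    $html = curl_multi_getcontent($ch);          // parse $html here (e.g. with Simple HTML DOM)
    curl_multi_remove_handle($multi, $ch);
    curl_close($ch);
}
curl_multi_close($multi);

In a real crawler you would add handles in batches (say 150 at a time) rather than all at once.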
If you have truly sorted out the memory issue and freed all available space like you have indicated, then the issue must be with a particular page it is crawling.
I would suggest running your script via a console and finding out exactly when it stops to run that URL separately - at least this will indicate if it is a memory issue or not.
Also remember that set_error_handler(array(&$this, 'customError')); will NOT catch every type of error PHP can throw.
When you next run it, debug via a console to show progress, and keep track of actual memory use, either via PHP (printed to the console) or via your system's process manager. This way you will be closer to finding out what the actual issue with your script is.
Even if you set an unlimited memory limit, there is still a physical limit.
If you crawl the URLs recursively, memory can fill up.
Try a loop instead, and work with a database (a sketch follows the list):
scan a page, and store the links you find if they aren't in the database yet.
when finished, do a select and get the first unscanned URL.
{loop}
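A minimal sketch of that loop, assuming a hypothetical links table (with a unique url column and a scanned flag) and a made-up fetch_and_extract_links() helper that returns the links found on a page:

<?php
// Hypothetical database-driven crawl loop; table and helper names are assumptions.
$db = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');
$db->exec("INSERT IGNORE INTO links (url, scanned) VALUES ('http://example.com/', 0)");
while (true) {
    $row = $db->query("SELECT url FROM links WHERE scanned = 0 LIMIT 1")->fetch();
    if (!$row) {
        break;                                    // no unscanned URLs left
    }
    foreach (fetch_and_extract_links($row['url']) as $link) {
        $stmt = $db->prepare("INSERT IGNORE INTO links (url, scanned) VALUES (?, 0)");
        $stmt->execute(array($link));             // store each new link only once
    }
    $stmt = $db->prepare("UPDATE links SET scanned = 1 WHERE url = ?");
    $stmt->execute(array($row['url']));           // mark the page as done
}

Because only one page is held in memory at a time, memory use stays flat no matter how many URLs are queued.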
I'm learning PHP and I'd like to write a simple forum monitor, but I've run into a problem. How do I write a script that downloads a file regularly? When the page is loaded, the PHP is executed just once, and if I put it into a loop, it would all have to run before the page finished loading. But I want to, say, download a file every minute and show a notification on the page when the file changes. How do I do this?
Typically, you'll work in two steps:
First, you'll have a PHP script that runs every minute, using the crontab.
This script will do the heavy job: downloading and parsing the page,
and storing some information in a shared location, typically a database.
Then, your webpages will only have to check that shared location (database) to see if the information is there (a sketch of both halves follows this list).
This way, your webpages will always work:
Even if there are many users, only the cronjob will download the page.
And even if the cronjob doesn't work for a while, the webpage will still work; the worst that can happen is that some information is out of date.
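A minimal sketch of both halves, assuming a hypothetical fetch.php run from cron (e.g. * * * * * php /path/to/fetch.php) and a plain file standing in for the shared location:

<?php
// fetch.php - run every minute by cron; downloads the page and stores a copy.
$html = file_get_contents('http://example.com/forum/latest');   // placeholder URL
if ($html !== false) {
    file_put_contents('/tmp/forum-snapshot.html', $html);        // the shared location
}

<?php
// page.php - the normal web page just reads the stored copy; it never downloads anything itself.
$snapshot = @file_get_contents('/tmp/forum-snapshot.html');
echo ($snapshot !== false) ? htmlspecialchars($snapshot) : 'No data yet';

In practice you would store a parsed summary (or a timestamp of the last change) in a database rather than the raw HTML.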
Others have already suggested using a periodic cron script, which I'd say is probably the better option, though as Paul mentions, it depends upon your use case.
However, I just wanted to address your question directly, which is to say, how does a daemon in PHP work? The answer is that it works in the same way as a daemon in any other language - you start a process which doesn't end immediately, and put it into the background. That process then polls files or accepts socket connections or somesuch, and in so doing, accepts some work to do.
(This is obviously a somewhat simplified overview, and of course you'd typically need to have mechanisms in place for process management, signalling the service to shut down gracefully, and perhaps integration into the operating system's daemon management, etc. but the basics are pretty much the same.)
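For illustration only (a bare-bones sketch, not production daemon code; the URL and file path are placeholders), the core of such a process is just a long-running loop started from the command line, e.g. with nohup php monitor.php &:

<?php
// monitor.php - hypothetical long-running worker; run it in the background from a console.
$lastHash = null;
while (true) {
    $contents = @file_get_contents('http://example.com/watched-file.txt');
    $hash = ($contents === false) ? null : md5($contents);
    if ($hash !== null && $hash !== $lastHash) {
        file_put_contents('/tmp/last-change.txt', date('c'));   // record that the file changed
        $lastHash = $hash;
    }
    sleep(60);                                                   // poll once a minute
}

A real daemon would also handle signals (pcntl_signal) and write a pid file so it can be managed and shut down cleanly.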
How do I write a script that downloads a file regularly?
There are schedulers to do that, like 'cron' on Linux (or Unix).
When the page is loaded, the php is executed just once,
Just once, just like the index.php of your site...
If you want to update a page that is shown in a browser, then you should use some form of AJAX;
if you want something else, then your question is not clear to me...
I'm currently running a Linux-based VPS with 768MB of RAM.
I have an application which collects details of domains and then connects to a service via cURL to retrieve details of the pagerank of these domains.
When I run a check on about 50 domains, it takes the remote page about 3 minutes to load with all the results before my script can parse the details and return them. This causes a problem, as nothing else seems to function until the script has finished executing, so users on the site just get a spinner / 'ball of death' while waiting for pages to load.
(The remote page retrieves the domain details and updates itself by AJAX, but the cURL request doesn't (rightfully) return the page until loading is complete.)
Can anyone tell me if I'm doing anything obviously wrong, or if there is a better way of doing it? (There can be anything between 10 and 10,000 domains queued, so I need a process that can run in the background without affecting the rest of the site.)
Thanks
A more sensible approach would be to "batch process" the domain data via the use of a cron triggered PHP cli script.
As such, once you'd inserted the relevant domains into a database table with a "processed" flag set as false, the background script would then:
Scan the database for domains that aren't marked as processed.
Carry out the CURL lookup, etc.
Update the database record accordingly and mark it as processed.
...
To ensure no overlap with an existing executing batch processing script, you should only invoke the PHP script every five minutes from cron and (within the PHP script itself) check how long the script has been running at the start of the "scan" stage, exiting if it's been running for four minutes or longer. (You might want to adjust these figures, but hopefully you can see where I'm going with this.)
By using this approach, you'll be able to leave the background script running indefinitely (as it's invoked via cron, it'll automatically start after reboots, etc.) and simply add domains to the database/review the results of processing, etc. via a separate web front end.
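A minimal sketch of such a batch script (the table, columns and lookup_pagerank() helper are assumptions, not your actual code), invoked from cron every five minutes:

<?php
// batch.php - hypothetical cron-triggered CLI worker, e.g. */5 * * * * php /path/to/batch.php
$started = time();
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
while (time() - $started < 240) {                       // exit after ~4 minutes to avoid overlap
    $row = $db->query("SELECT id, domain FROM domains WHERE processed = 0 LIMIT 1")->fetch();
    if (!$row) {
        break;                                          // nothing left to process
    }
    $rank = lookup_pagerank($row['domain']);            // your existing cURL lookup goes here
    $stmt = $db->prepare("UPDATE domains SET pagerank = ?, processed = 1 WHERE id = ?");
    $stmt->execute(array($rank, $row['id']));
}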
This isn't the ideal solution, but if you need to trigger this process based on a user request, you can add the following at the end of your script.
set_time_limit(0);   // let the script keep running past the normal execution time limit
flush();             // push any buffered output to the user
This will allow the PHP script to continue running while still returning output to the user. But seriously, you should use batch processing. It will give you much more control over what's going on.
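If you do go down that road, a commonly used pattern looks roughly like the sketch below. This is only an illustration; behaviour varies by server setup, and under PHP-FPM fastcgi_finish_request() is the cleaner equivalent.

<?php
// Hypothetical "respond early, keep working" pattern for a user-triggered request.
ignore_user_abort(true);                        // keep going even if the user navigates away
set_time_limit(0);                              // no execution time limit
ob_start();
echo "Processing started";                      // the response the user actually sees
header('Connection: close');
header('Content-Length: ' . ob_get_length());   // lets the browser stop waiting
ob_end_flush();
flush();
// ...the long-running domain lookups continue here after the response has been sent...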
Firstly, I'm sorry, but I'm an idiot! :)
I've loaded the site in another browser (FF) and it loads fine.
It seems Chrome puts some sort of lock on a domain when it's waiting for a server response, and I was testing the script manually through a browser.
Thanks for all your help and sorry for wasting your time.
CJ
While I agree with others that you should consider processing these tasks outside of your webserver, in a more controlled manner, I'll offer an explanation for the "server standstill".
If you're using native PHP sessions, PHP uses an exclusive locking scheme so that only a single PHP process can deal with a given session ID at a time. Having a long-running PHP script which uses sessions can certainly cause this.
You can search for combinations of terms like:
php session concurrency lock session_write_close()
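The usual workaround, sketched below (assuming the long-running script only needs to read the session, not keep writing to it), is to release the session lock as soon as you're done with the session data:

<?php
// Hypothetical long-running request that releases the session lock early.
session_start();
$userId = $_SESSION['user_id'];    // read whatever you need from the session
session_write_close();             // release the exclusive lock so other requests can proceed
// ...the slow cURL work goes here; other pages for the same user are no longer blocked...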
I'm sure it's been discussed many times here. I'm too lazy to search for you. Maybe someone else will come along and make an answer with bulleted lists and pretty hyperlinks in exchange for stackoverflow reputation :) But not me :)
Good luck.
I'm not sure how your code is structured, but you could try using sleep(). That's what I use when batch processing.
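For example (just a sketch of the idea, with a made-up process_domain() helper standing in for your cURL work), pausing between items looks like this:

<?php
// Hypothetical throttled batch loop.
foreach ($domains as $domain) {
    process_domain($domain);   // whatever your per-domain lookup does
    sleep(2);                  // pause between requests so nothing else is starved
}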
I have written myself a web crawler using simplehtmldom, and have got the crawl process working quite nicely. It crawls the start page, adds all links into a database table, sets a session pointer, and meta-refreshes the page to carry on to the next page. That keeps going until it runs out of links.
That works fine; however, the crawl time for larger websites is obviously pretty tedious. I wanted to be able to speed things up a bit, and possibly make it a cron job.
Any ideas on making it as quick and efficient as possible other than setting the memory limit / execution time higher?
Looks like you're running your script in a web browser. You may consider running it from the command line. You can execute multiple scripts to crawl on different pages at the same time. That should speed things up.
Memory should not be a problem for a crawler.
Once you are done with one page and have written all relevant data to the database, you should get rid of all variables you created for this job.
The memory usage after 100 pages should be the same as after 1 page. If this is not the case, find out why.
You can split up the work between different processes: usually parsing a page does not take as long as loading it, so you can write all links that you find to a database and have multiple other processes that just download the documents to a temp directory (a sketch of such a worker follows the list below).
If you do this, you must ensure that:
no link is downloaded by two workers.
your processes wait for new links if there are none.
temp files are removed after each scan.
the download processes stop when you run out of links. You can achieve this by setting a "kill flag"; this can be a file with a special name or an entry in the database.
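A rough sketch of one such download worker (the table, the claimed/downloaded columns and the kill-flag path are all assumptions for illustration):

<?php
// worker.php - hypothetical download process; run several copies in parallel from a console.
$db = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');
while (!file_exists('/tmp/crawler-kill-flag')) {            // stop when the kill flag appears
    // Claim one link so no URL is downloaded by two workers.
    $db->exec("UPDATE links SET claimed_by = " . getmypid() . " WHERE claimed_by IS NULL LIMIT 1");
    $row = $db->query("SELECT url FROM links WHERE claimed_by = " . getmypid() .
                      " AND downloaded = 0 LIMIT 1")->fetch();
    if (!$row) {
        sleep(5);                                           // wait for new links
        continue;
    }
    $html = file_get_contents($row['url']);
    file_put_contents('/tmp/pages/' . md5($row['url']) . '.html', $html);
    $stmt = $db->prepare("UPDATE links SET downloaded = 1 WHERE url = ?");
    $stmt->execute(array($row['url']));
}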
I have a PHP script that downloads videos from various locations.
The video files can be anywhere from 20MB to 100MB+.
I've got PHP currently saving the video file in a directory using CURLOPT_FILE. This is working fine with no problems.
Because of the large files being downloaded, I've set the cURL timeout period to 45 minutes to allow the file to download. I have also used set_time_limit(0) so that the PHP page should continue processing after the download has completed. I've also set ini_set("memory_limit", "500M");
When the download completes it should echo "Downloaded" and then update a mysql record stating the file has been downloaded.
What is happening, though, is that the video file is downloaded correctly by cURL, but "Downloaded" is not displayed in the browser, even though MySQL is updated.
Why is this? I've tried to come up with a solution myself, but I cannot work out what the issue here is...
If you're in a browser environment, the browser will time out after a certain time and stop listening for output from the script, even though the script will continue to run. It varies across browsers, but the number I've seen is 30 seconds.
To overcome this problem, you should send output (even if it's just a meaningless echo "<!--empty comment-->";) every so often.
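For example (a sketch only; $videoUrls and download_video() stand in for your actual list and cURL/CURLOPT_FILE code), emitting a little output between downloads keeps the browser listening:

<?php
// Hypothetical keep-alive output around long downloads.
set_time_limit(0);
foreach ($videoUrls as $url) {
    echo "<!-- starting download -->\n";
    flush();                     // push the comment to the browser (use ob_flush() too if output buffering is on)
    download_video($url);        // your existing cURL download
    echo "<!-- finished -->\n";
    flush();
}
echo "Downloaded";

Note that this only helps between downloads; during a single very long transfer the browser still sees nothing, which is why the AJAX-polling and command-line approaches mentioned next are usually better.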
I recently had a similar problem, and I dealt with it by not outputting any content from the script, and instead polling from the browser every so often using AJAX to see if it was done.
Or, don't use the browser environment (as it's not ideally suited for this problem), and instead use a command line prompt, as it does not have (to my knowledge) these timeouts.