PHP multithreaded cURL alternative proposed, but is it good?


Because PHP doesn't have multithreading capabilities I am trying to find a workaround to speed up a simple process.
The process posts data to a webpage, with a different permutation of the post data on each request. In a foreach loop I check each response to see whether a string exists, using strpos. When the string is found, the loop breaks and returns the page. There are around 1000 requests, and it takes about 1 minute or longer to complete.
As I don't want to use additional libraries, my idea was to exec standalone scripts, passing each permutation of post data to its own process (say 1000 processes). Each process will only write to a file if the string is found. The main script will run a loop checking whether the file exists; once it does, the script can read the file to get the post data that was correct.
It seems sound in theory, but I wanted to check if this is a ridiculous solution for a problem that has much simpler solutions!
Thanks.

One solution is to use the Process Control (pcntl) extension:
http://php.net/manual/en/book.pcntl.php
I don't know whether you have it installed.

Related

Getting data from forked children

I am playing with pcntl_fork() in PHP.
I took the class that is written in the second comment, and tried to send data to it - which seems to work fine.
Now I did some processing on that data, and would like to receive some results in my parent process.
Does anyone know how this can be done? The only way I can think of is to store the information in the database and/or other storage.

Having worked with pcntl_fork in a number of projects, I do not believe there is any way to send data back to the parent process directly. You could do this via the database, as you have already mentioned; however, you may be better off using PHP's shared memory functions (http://php.net/manual/en/book.shmop.php) or memcache for this purpose.
Can you elaborate on what you are doing? It may be that you can avoid this requirement.

max_execution_time Alternative

So here's the lowdown:
The client I'm developing for is on HostGator, which limits max_execution_time to 30 seconds, and it cannot be overridden (I've tried, and their support and wiki confirm it).
What I have the code doing is taking an uploaded file and...
looping through the XML
getting all feed download links within the file
downloading each XML file
individually looping through each XML array of each file and inserting the information of each item into the database based on where it is from (i.e. the filename)
Now, is there any way I can queue this somehow, or possibly split the workload into multiple files? I know the code works flawlessly and checks whether each item exists before inserting it, but I'm stuck getting around the execution time limit.
Any suggestions are appreciated, let me know if you have any questions!

The time limit is in effect only when PHP scripts are executed through a web server. If you execute the script from the CLI or as a background process, it should work fine.
Note that executing an external script is somewhat dangerous if you are not careful, but it's a valid option.
Check the following resources:
Process Control Extensions
And specifically:
pcntl-exec
pcntl-fork

Did you know you can trick the max_execution_time by registering a shutdown handler? Within that code you can run for another 30 seconds ;-)
Okay, now for something more useful.
You can add a small queue table in your database to keep track of where you are in case the script dies mid-way.
After getting all the download links, you add those to the table
Then you download one file and process it; when you're done, you check it off (delete it from the queue)
Upon each run you check if there's still work left in the queue
For this to work you need to request that URL a few times; perhaps use JavaScript to keep reloading until the work is done?
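The queue steps above could be sketched roughly as follows. This is a minimal sketch, not the answerer's code: it assumes a JSON file stands in for the database queue table, and the names loadQueue, saveQueue, and processNext are hypothetical. With a real database, the load/save functions would become SELECT/INSERT/DELETE queries.

```php
<?php
// Hypothetical helpers: a JSON file is used here as a stand-in for the queue table.

// Read the pending items; a missing or empty file means an empty queue.
function loadQueue(string $file): array
{
    if (!is_file($file)) {
        return [];
    }
    $data = json_decode((string) file_get_contents($file), true);
    return is_array($data) ? $data : [];
}

// Persist the remaining items.
function saveQueue(string $file, array $queue): void
{
    file_put_contents($file, json_encode(array_values($queue)), LOCK_EX);
}

// Process exactly one item per call. The item is only checked off
// after $handle has returned, so a script that dies mid-way retries it.
function processNext(string $file, callable $handle): bool
{
    $queue = loadQueue($file);
    if ($queue === []) {
        return false; // no work left
    }
    $item = array_shift($queue);
    $handle($item);
    saveQueue($file, $queue);
    return true;
}
```

Each web request (or each JavaScript-triggered reload) would call processNext once or a few times and report whether work remains.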

I am in such a situation. My approach is similar to Jack's:
accept that execution time limit will simply be there
design the application to cope with sudden exit (look into register_shutdown_function)
identify all time-demanding parts of the process
continuously save progress of the process
modify your components so that they can start from an arbitrary point, e.g. a position in an XML file, or continue downloading from your to-be-fetched list of XML links
For the task I made two modules: Import for the actual processing, and TaskManagement for dealing with these tasks.
To invoke TaskManager I use cron; whether that is enough depends on what your web host offers. There's also WebCron.
Jack's JavaScript method's advantage is that it only adds requests if needed. If there are no tasks to be executed, the script runtime will be very short and perhaps overstated*, but still. The downsides are it requires user to wait the whole time, not to close the tab/browser, JS support etc.
*) Likely much less demanding than 1 click of 1 user in such moment
Then of course look into performance improvements, caching, skipping what's not needed/hasn't changed etc.
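The "start from an arbitrary point" idea might be sketched like this. It is only an illustration under assumptions: processWithBudget is a hypothetical name, the time budget is something you would pick below the host's limit, and where the returned offset gets saved (file, database) is left to the caller.

```php
<?php
// Hypothetical helper: handle items starting at $start until the time
// budget is used up, then return the index to resume from next time.
function processWithBudget(array $items, int $start, float $budgetSeconds, callable $handle): int
{
    $items = array_values($items); // make sure indexes are 0..n-1
    $startedAt = microtime(true);
    $count = count($items);

    for ($i = $start; $i < $count; $i++) {
        // Stop before starting another item once the budget is spent.
        if (microtime(true) - $startedAt >= $budgetSeconds) {
            break;
        }
        $handle($items[$i]);
    }

    // Equals count($items) when everything has been processed.
    return $i;
}
```

A cron-driven task manager could store the returned index after each run and pass it back in as $start on the next run.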

Achieving multithreading in PHP

I am writing a kind of test system in PHP that tests my database records. I have a separate PHP file for every test case. One (master) file is given the test number and the input parameters for that test in the form of a URL string. That file determines the test number and calls the appropriate test case based on it. Now I have a bunch of URL strings to pass; I want them to be passed to that (master) file so that every test case starts working independently after receiving its parameters.

PHP is a single threaded entity, no multithreading currently exists for it. However, there are a few things you can do to achieve similar (but not identical) results for use cases I have come across when people normally ask me about multithreading. Again, there is no multithreading in PHP, but some of the below may help you further in creating something with characteristics that may match your requirement.
libevent: you could use this to create an event loop for PHP which would make blocking less of an issue. See http://www.php.net/manual/en/ref.libevent.php
curl_multi: Part of the cURL extension; it can fire off several GET/POST requests to other services concurrently.
Process Control: Not used this myself, but may be of value if process control is one aspect of your issue. http://uk.php.net/pcntl
Gearman: Now this I've used and it's pretty good. It allows you to create workers and spin off processes into a queue. You may also want to look at rabbit-php or ZeroMQ.
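Of the options above, curl_multi needs nothing beyond the cURL extension. A minimal sketch of running several requests concurrently follows; fetchAll is a hypothetical helper name and the 10-second timeout is an arbitrary assumption. For POST requests you would additionally set CURLOPT_POSTFIELDS on each handle.

```php
<?php
// Hypothetical helper: fetch all URLs concurrently and return the
// response bodies keyed like the input array.
function fetchAll(array $urls): array
{
    $multi = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // assumed timeout
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running && curl_multi_select($multi, 1.0) === -1) {
            usleep(1000); // select failed; avoid a busy loop
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($multi, $ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

The requests still run inside one PHP process; the concurrency comes from cURL handling the network I/O for all handles at once.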

PHP is not multithreaded, it's singlethreaded. You cannot start new threads within PHP. Your best bet would be a file_get_contents (or cURL) to another PHP script to "mimic" threads. True multithreading isn't available in PHP.
You could also have a look at John's post at http://phplens.com/phpeverywhere/?q=node/view/254.

What you can do is use cURL to send the requests back to the server. The request will be handled and the results will be returned.
An example would be:
$c = curl_init("http://servername/".$script_name.$params);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($c);
curl_close($c);
Although this is not considered multithreading, it can be used to achieve your goal.

break up recursive function in php

What is the best way to break up a recursive function that is using a ton of resources?
For example:
function do_a_lot(){
//a lot of code and processing is done here
//it takes a lot of execution time
if($true){
//if true we have to do all of that processing again
do_a_lot();
}
}
Is there any way to make the server only take the brunt of the first execution and then break up the recursion into separate processes? Or am I dreaming?

Honestly, if your function is using up that much of your system's resources, I'd most likely refactor the code. That said, although it's not truly multithreading, you could perhaps look at using popen to spawn separate processes.
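As a rough sketch of the popen idea: each chunk of work is started as its own PHP CLI process, and the output is read once all of them are running. The helper name runInParallel is made up, and the sketch assumes it runs under the CLI so that PHP_BINARY points at a usable php executable (under a web server that may not hold).

```php
<?php
// Hypothetical helper: run each PHP snippet in its own CLI process
// and collect whatever each process prints.
function runInParallel(array $snippets): array
{
    $pipes = [];

    // Start every child first so they run side by side.
    foreach ($snippets as $key => $code) {
        $command = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($code);
        $pipe = popen($command, 'r');
        if ($pipe === false) {
            throw new RuntimeException("Could not start process for {$key}");
        }
        $pipes[$key] = $pipe;
    }

    // Then read each child's output; this blocks until that child is done.
    $outputs = [];
    foreach ($pipes as $key => $pipe) {
        $outputs[$key] = stream_get_contents($pipe);
        pclose($pipe);
    }

    return $outputs;
}
```

In practice the children would more likely be separate script files taking arguments rather than inline snippets; the inline form just keeps the sketch self-contained.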

One of the rules of PHP is "share nothing". That means every PHP process is independent and shares nothing with the others. So if you want to split your execution across several PHP processes, you'll have to store the data somewhere. It can be memcached, a database, or the session, as you prefer.
Then you'll need to 'fork' your PHP process. There are solutions available to get this done on the server side. IMHO these are all hacks: dangerous, and not in keeping with the PHP/web way. The exception is 'work queue' tools.
I think the nicest way is to break up your task with AJAX. This gives you a clean user interface and avoids any long response timeout in the web process. That is, show a 'working zone' to your user, then ask via AJAX for the first step of the job and get the response (storing it on the server side), then ask for the next step, store the new response and respond, then the next step, and so on. You can even add a 'stop that stuff' function on the client side.
You can also search for 'php work queue' on Google.

If it's a long-running task, divide and conquer with Gearman.

PHP display progress messages on the fly

I am working on a tool in PHP that processes a lot of data and takes a while to finish. I would like to keep the user updated on what is going on and which task is currently being processed.
What is, in your opinion, the best way to do it? I've got some ideas but can't decide on the most effective one:
The old way: execute a small part of the script and display a page to the user with a Meta Redirect or a JavaScript timer to send a request to continue the script (like /script.php?step=2).
Sending AJAX requests constantly to read a server file that PHP keeps updating through fwrite().
Same as above but PHP updates a field in the database instead of saving a file.
Do any of those sound good? Any ideas?
Thanks!

Rather than writing to a static file you fetch with AJAX or to an extra database field, why not have another PHP script that simply returns a completion percentage for the specified task? Your page can then update the progress via a very lightweight AJAX request to said PHP script.
As for implementing this "progress" script, I could offer more advice if I had more insight as to what you mean by "processes a lot of data". If you are writing to a file, your "progress" script could simply check the file size and return the percentage complete. For more complex tasks, you might assign benchmarks to particular processes and return an estimated percentage complete based on which process has completed last or is currently running.
UPDATE
This is one suggested method to "check the progress" of an active script which is simply waiting for a response from a request. I have a data mining application that I use a similar method for.
In the script that makes the request you're waiting for (the script whose progress you want to check), you can store a progress variable for the process, either in a file or in a database. I use a database, as I have hundreds of processes running at any time which all need to track their progress, and I have another script that lets me monitor them.
When the process begins, set the variable to 1. You can easily select an arbitrary number of 'checkpoints' the script will pass and calculate the percentage from the current checkpoint. For a large request, however, you might be more interested in the approximate percentage of the request that has completed. One possible solution is to know the size of the returned content and set your status variable according to the percentage received at any moment. For example, if you receive the request data in a loop, you could update the status on each iteration; if you are downloading to a flat file, you could poll the size of the file.
This could also be done, less accurately, with time rather than file size: if you know the approximate time the request should take to complete, simply compare it against the script's current execution time. Obviously neither of these is a perfect solution, but I hope they give you some insight into your options.

I suggest using the AJAX method, but not with a file or a database. You could probably use session values or something similar; that way you don't have to create a connection or open a file to do anything.

In the past, I've just written messages out to the page and used flush() to flush the output buffer. Very simple, but it may not work correctly on every web server or with every web browser (as they may do their own internal buffering).
Personally, I like your second option the best. Should be reliable and fairly simple to implement.

I like option 2 - using AJAX to read a status file that PHP writes to periodically. This opens up a lot of different presentation options. If you write a JSON object to the file, you can easily parse it and display things like a progress bar, status messages, etc...
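A minimal sketch of such a JSON status file could look like the following. The function names writeProgress and readProgress and the shape of the JSON object are assumptions for illustration. The write goes through a temporary file and rename so that a polling request is less likely to read a half-written file.

```php
<?php
// Hypothetical helper for the long-running script: record how far it is.
function writeProgress(string $file, int $done, int $total, string $message): void
{
    $status = [
        'percent' => $total > 0 ? (int) floor($done * 100 / $total) : 0,
        'message' => $message,
    ];
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode($status));
    rename($tmp, $file); // swap the finished file into place
}

// Hypothetical helper for the status endpoint polled via AJAX.
function readProgress(string $file): array
{
    $default = ['percent' => 0, 'message' => ''];
    if (!is_file($file)) {
        return $default; // task has not reported anything yet
    }
    $status = json_decode((string) file_get_contents($file), true);
    return is_array($status) ? $status : $default;
}
```

The polled script would then just echo json_encode(readProgress($file)), and the page can render a progress bar or status message from it.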

A 'dirty' but quick-and-easy approach is to just echo out the status as the script runs along. So long as you don't have output buffering on, the browser will render the HTML as it receives it from the server (I know WordPress uses this technique for its auto-upgrade).
But yes, a 'better' approach would be AJAX, though I wouldn't say there's anything wrong with 'breaking it up' using redirects.
Why not incorporate 1 & 2, where AJAX sends a request to script.php?step=1, checks response, writes to the browser, then goes back for more at script.php?step=2 and so on?

If you can do away with IE, then use server-sent events. It's the ideal solution.
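For reference, a server-sent event is just a small block of text on a response with the text/event-stream content type. The sketch below shows how a progress event might be formatted; formatSseEvent is a hypothetical helper, and the event name and payload are made up for illustration.

```php
<?php
// Hypothetical helper: build one server-sent event with a JSON payload.
// json_encode emits no newlines by default, so the payload fits on a single "data:" line.
function formatSseEvent(string $event, array $data): string
{
    return "event: {$event}\n" . 'data: ' . json_encode($data) . "\n\n";
}

// Assumed usage inside the long-running script:
//   header('Content-Type: text/event-stream');
//   header('Cache-Control: no-cache');
//   echo formatSseEvent('progress', ['percent' => 50]);
//   flush();
```

On the page, an EventSource object pointed at that script would receive each event as it is flushed, subject to the same server-side buffering caveats as plain echo and flush().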
