I have a PHP script which queries a list of clients from a MySQL database, then goes to each client's IP address and picks up some information which is then displayed on the webpage.
But it takes a long time if the number of clients is high. Is there any way I can send those URL requests (file_get_contents) in parallel?
Lineke Kerckhoffs-Willems wrote a good article about multithreading in PHP with cURL. You can use that instead of file_get_contents() to get the needed information.
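The core of that approach is PHP's curl_multi_* functions. A minimal sketch (assuming $urls holds the client addresses pulled from MySQL) might look like this:

    // Fetch several client URLs in parallel with curl_multi.
    function fetchAll(array $urls, $timeout = 10) {
        $mh = curl_multi_init();
        $handles = [];

        foreach ($urls as $key => $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, [
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_TIMEOUT        => $timeout,
            ]);
            curl_multi_add_handle($mh, $ch);
            $handles[$key] = $ch;
        }

        // Run all handles until every transfer has finished.
        do {
            $status = curl_multi_exec($mh, $active);
            if ($active) {
                curl_multi_select($mh); // wait for activity instead of busy-looping
            }
        } while ($active && $status == CURLM_OK);

        $results = [];
        foreach ($handles as $key => $ch) {
            $results[$key] = curl_multi_getcontent($ch);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);

        return $results;
    }

Because the requests go out concurrently, the total time is roughly that of the slowest client rather than the sum of all of them.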
If this needs to scale, I would use something like Gearman to assign the requests as jobs in a queue for workers to come along and complete.
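Roughly, the page (or a cron script) submits one background job per client and separate worker processes pick them up. A rough sketch, assuming the pecl/gearman extension and a hypothetical fetch_client task:

    // Producer: queue one background job per client.
    $client = new GearmanClient();
    $client->addServer('127.0.0.1');
    foreach ($clientIps as $ip) {
        $client->doBackground('fetch_client', json_encode(['ip' => $ip]));
    }

    // Worker (separate CLI process; run several in parallel if needed):
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1');
    $worker->addFunction('fetch_client', function (GearmanJob $job) {
        $payload = json_decode($job->workload(), true);
        $data = file_get_contents('http://' . $payload['ip'] . '/status');
        // ... store $data in MySQL so the page can display it later
    });
    while ($worker->work());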
As another option I have also written a PHP wrapper for the Unix at queue, which might be a fit for this problem. It would allow you to schedule the requests so that they can run in parallel. I have used this method successfully in the past to handle the sending of bulk email, which has similar blocking problems to your script.
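With or without a wrapper, the underlying idea is just handing a command to at so it runs detached from the web request. For example (fetch_client.php is a hypothetical CLI script that polls a single client):

    // Schedule one detached job per client via the Unix at queue.
    foreach ($clientIps as $ip) {
        $cmd = 'php /path/to/fetch_client.php ' . escapeshellarg($ip);
        exec('echo ' . escapeshellarg($cmd) . ' | at now 2>&1');
    }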
I have created a script that uses PDO database functions to pull in data from an external feed and insert it into a database, which some days could amount to hundreds of entries. The page hangs until it's done and there is no real control over it; if there is an error, I don't know about it until the page has loaded.
Is there a way to have a controlled insert, so that it will insert X amount, then pause a few seconds and then continue on until it is complete?
During the insert it also executes other queries, so it can get quite heavy.
I'm not quite sure what I'm looking for, so I have struggled to find help on Google.
I would recommend using background tasks for that. Pausing your PHP script will not speed up page loading: Apache (or nginx, or any other web server) only sends the complete HTTP response back to the browser once the PHP script has finished.
You can use the output-control functions, and if the web server supports chunked transfer encoding you can show progress while the page is loading. But for this purpose many developers use AJAX queries instead: one request per chunk of data, storing the position of the current chunk in the session.
But as I wrote at first, the better way is background tasks and workers. There are many ways of implementing this approach. You can use a specialized service like RabbitMQ or Gearman, or you can just write your own console application that you start and monitor with a cron task.
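That console-application route can be as simple as a loop that picks up pending rows and processes them in batches. A rough sketch, assuming a hypothetical import_queue table and a cron entry that launches the script every minute:

    // worker.php - started by cron; processes pending feed entries in batches.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);

    while (true) {
        $rows = $pdo->query("SELECT id, payload FROM import_queue WHERE status = 'pending' LIMIT 50")
                    ->fetchAll(PDO::FETCH_ASSOC);
        if (!$rows) {
            break; // nothing left; cron will start us again later
        }
        foreach ($rows as $row) {
            // ... insert/transform the feed entry here, logging any errors ...
            $pdo->prepare("UPDATE import_queue SET status = 'done' WHERE id = ?")
                ->execute([$row['id']]);
        }
    }

Because this runs outside the web request, the page only has to enqueue the work, and errors can be reported from a log or status column instead of only showing up once the page finally loads.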
I am a programmer at an internet marketing company that primarily makes tools. These tools have certain requirements:
They run in a browser and must work in all of them.
The user either uploads something (.csv) to process or they provide a URL and API calls are made to retrieve information about it.
They are moving around THOUSANDS of lines of data (think large databases). These tools literally run for hours, usually overnight.
The user must be able to watch live as their information is processed and is presented to them.
Currently we are writing in PHP, MySQL and Ajax.
My question is how do I process LARGE quantities of data and provide a user experience as the tool is running. Currently I use a custom queue system that sends ajax calls and inserts rows into tables or data into divs.
This method is a huge pain in the ass and can't possibly be the correct approach. Should I be using a templating system, or is there a better way to refresh chunks of the page with A LOT of data? And I really do mean a lot of data, because we come close to maxing out PHP's memory limit, which is something we always have to watch for.
Also, I would love to make it so these tools could run on the server by themselves. I mean upload a .csv, close the browser window, and then have an email sent to the user when the tool is done.
Does anyone have any methods (programming standards) for me that are better than using .ajax calls? Thank you.
I wanted to update with some notes in case anyone has the same question. I am looking into the following to see which is the best solution:
SlickGrid / DataTables
Gearman
Web Socket
Ratchet
Node.js
These are in no particular order and the one I choose will be based on what works for my issue and what can be used by the rest of my department. I will update when I pick the golden framework.
First of all, you cannot handle big data via Ajax alone. To let users watch the process live, you can use WebSockets. As you are experienced in PHP, I can suggest Ratchet, which is quite new.
On the other hand, for the calculations and for storing big data I would use NoSQL instead of MySQL.
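A minimal Ratchet sketch, loosely following its documented hello-world server (class name and port are placeholders), that relays whatever it receives to every connected browser:

    // push_server.php - run from the CLI; requires ratchet installed via Composer.
    require __DIR__ . '/vendor/autoload.php';

    use Ratchet\MessageComponentInterface;
    use Ratchet\ConnectionInterface;
    use Ratchet\Server\IoServer;
    use Ratchet\Http\HttpServer;
    use Ratchet\WebSocket\WsServer;

    class ProgressPusher implements MessageComponentInterface {
        protected $clients;

        public function __construct() {
            $this->clients = new \SplObjectStorage;
        }
        public function onOpen(ConnectionInterface $conn) {
            $this->clients->attach($conn);
        }
        public function onMessage(ConnectionInterface $from, $msg) {
            // Relay progress updates to every connected browser.
            foreach ($this->clients as $client) {
                $client->send($msg);
            }
        }
        public function onClose(ConnectionInterface $conn) {
            $this->clients->detach($conn);
        }
        public function onError(ConnectionInterface $conn, \Exception $e) {
            $conn->close();
        }
    }

    IoServer::factory(new HttpServer(new WsServer(new ProgressPusher())), 8080)->run();

The long-running background job can then connect to this server as a client and send progress messages, and the browsers listening on the socket see them immediately.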
Since you're kind of pinched for time already, migrating to Node.js may not be practical right now. It would, however, help with notifying users when the results are ready, as it can push notifications to the browser without polling. And since it uses JavaScript, you might find some of your client-side code is reusable.
I think you can run what you need in the background with some kind of queue manager. I use something similar with CakePHP, and it lets me run time-intensive processes in the background asynchronously, so the browser does not need to stay open.
Another plus side for this is that it's scalable, as it's easy to increase the number of queue workers running.
Basically, with PHP you just need a cron job that runs every once in a while and starts a worker, which checks a queue table in the database for pending tasks. If none are found, it keeps running in a loop until one shows up.
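A bare-bones version of that worker might look like this (table name and lock path are placeholders; the flock() call keeps cron from starting a second copy while one is still running):

    // queue_worker.php - kicked off by cron; exits immediately if a copy is already running.
    $lock = fopen('/tmp/queue_worker.lock', 'c');
    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        exit;
    }

    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    while (true) {
        $task = $pdo->query("SELECT * FROM tasks WHERE done = 0 ORDER BY id LIMIT 1")
                    ->fetch(PDO::FETCH_ASSOC);
        if (!$task) {
            sleep(5); // nothing pending; wait and check again
            continue;
        }
        // ... do the time-intensive work for $task here ...
        $pdo->prepare("UPDATE tasks SET done = 1 WHERE id = ?")->execute([$task['id']]);
    }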
My script sends notification emails for a new comment. This could be to 50 members, and 50 emails need to be sent, which could take 20 seconds; way too long for the user to wait! What's the best way in PHP to do this? Is there a way to do it asynchronously?
A simple way might be to store the necessary information (email addresses, content) in a database and have a batch process run every minute or so with a cron job. The batch process can query the database for pending emails and, if any are to be sent, go through them and then delete the database entries.
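For instance (assuming a hypothetical email_queue table; a cron entry would run this script every minute):

    // send_pending.php - run by cron every minute.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $pending = $pdo->query("SELECT id, address, subject, body FROM email_queue")
                   ->fetchAll(PDO::FETCH_ASSOC);

    foreach ($pending as $email) {
        if (mail($email['address'], $email['subject'], $email['body'])) {
            $pdo->prepare("DELETE FROM email_queue WHERE id = ?")->execute([$email['id']]);
        }
    }

The page that saves the comment only has to insert the rows, so the user never waits on the mail server.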
Is there a way to do it asynchronously?
Yes, there is!
exec('wget PATH_TO_YOUR_SCRIPT_THAT_SENDS_THE_NEWSLETTER > /dev/null &');
Note that the database alternative is a pretty good one as well, but this approach also works if you're on Linux (and doesn't require a database).
I'd use something like RabbitMQ. Your website acts as a producer, sending the email requests to Rabbit; then have a consumer running that processes the requests from Rabbit.
Advantage: if your consumer falls over, then when you restart it, it picks up from where it left off (the last acknowledged request).
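A rough sketch with the php-amqplib library (queue name and mail details are placeholders, and the exact consumer loop varies a little between library versions):

    require __DIR__ . '/vendor/autoload.php';

    use PhpAmqpLib\Connection\AMQPStreamConnection;
    use PhpAmqpLib\Message\AMQPMessage;

    $connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
    $channel = $connection->channel();
    $channel->queue_declare('emails', false, true, false, false); // durable queue

    // Producer (website side): publish one message per email to send.
    $payload = json_encode(['to' => 'member@example.com', 'comment_id' => 123]);
    $channel->basic_publish(
        new AMQPMessage($payload, ['delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT]),
        '',
        'emails'
    );

    // Consumer (long-running CLI worker): acknowledge only after the mail is sent,
    // so unacknowledged requests are redelivered if the worker dies.
    $channel->basic_consume('emails', '', false, false, false, false, function ($msg) {
        $job = json_decode($msg->body, true);
        mail($job['to'], 'New comment', 'There is a new comment on your post.');
        $msg->ack();
    });
    while ($channel->is_consuming()) {
        $channel->wait();
    }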
Indeed it can be done asynchronously.
The simplest way is to insert the email data into a database rather than actually sending the emails right away, and then have a cron job that periodically sends the pending ones.
There are of course other ways too, but that will probably be the most straight forward.
You can use a cURL POST to start an asynchronous script. Set the timeout to a short period so your script can resume after the POST request has been made. You can put the email information in the POST request or store it in a database table.
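A minimal sketch of that fire-and-forget POST (URL and fields are placeholders; the receiving script should call ignore_user_abort(true) so it keeps running after the connection drops):

    function firePost($url, array $data) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_POST           => true,
            CURLOPT_POSTFIELDS     => http_build_query($data),
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_NOSIGNAL       => true, // needed for sub-second timeouts on some builds
            CURLOPT_TIMEOUT_MS     => 100,  // give up quickly; the remote script keeps running
        ]);
        curl_exec($ch); // will "time out" almost immediately, which is fine here
        curl_close($ch);
    }

    firePost('https://example.com/send_comment_emails.php', ['comment_id' => 123]);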
I'm planning to write a system which should accept input from users (from browser), make some calculations and show updated data to all users, currently visiting certain website.
Input can come once an hour, but can also come 100 times each second. It is VERY important not to lose any of the user inputs, but really to register and process ALL of them.
So, the idea was to create two programs. One will receive data (input) from the browser and store it somehow in a queue (maybe an array, to be really fast?). The second program should wait until there are new items in the queue (saving resources) and then become active and begin to process the queue items. Both programs should run asynchronously.
I know PHP, so I would write the first program in PHP. But I'm not sure about the second part, and I'm not sure how to send an event from the first program to the second. I need some advice at this point. Threads are not possible with PHP, are they? I need some ideas on how to create the system like I described.
I would use a Comet server to communicate feedback to the website the input came from (this part is already tested).
As per the comments above, on the surface you appear to be describing a message queueing/processing system; however, looking at your question in more depth, this is probably not the case:
Both programs should run asynchronously.
Having a program which processes a request from a browser but does it asynchronously is an oxymoron. While you could handle the enqueueing of a message after dealing with the HTTP request, it's still a synchronous process.
It is VERY important not to lose any of the user inputs
PHP is not a good language for writing control systems for nuclear reactors (nor, according to Microsoft, is Java). HTTP and TCP/IP are not ideal for real time systems either.
100 times each second
Sorry - I thought you meant there could be a lot of concurrent requests. This is not a huge amount.
You seem to be confusing the objective of using Comet/Ajax with asynchronous processing of the application. Even with very large amounts of data, it should be possible to handle the interaction using a single PHP script working synchronously.
Soon I'm going to have 3 identical scripts on 3 different VPSes, and I want to have a fourth VPS which will control them all.
So for the sake of this question what I need to do is insert SQL rows and create files on the sub-servers, and the sub-servers also need to send statistical data back to the mother server. What is the most efficient way of doing this?
I was thinking of making scripts on the servers to do the jobs I need and using cURL to send requests to these scripts making use of URL parameters for the data which needs to be transferred, but perhaps there is a better way? Ideally I want it to be as fast as possible because they will most likely be sending requests to each other every second.
You could use XML-RPC, which exists in many implementations:
http://us3.php.net/manual/en/book.xmlrpc.php
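For instance, with PHP's xmlrpc extension on the calling side (the method name and URL are placeholders; the mother server would expose a matching endpoint):

    // Report statistics from a sub-server to the mother server via XML-RPC.
    $request = xmlrpc_encode_request('stats.report', ['server' => 'vps1', 'load' => 0.42]);
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => 'Content-Type: text/xml',
        'content' => $request,
    ]]);
    $response = file_get_contents('https://mother.example.com/rpc.php', false, $context);
    $result = xmlrpc_decode($response);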
If you want dead-simple, just use plain HTTP(S) requests, provided you're careful about implementing it.
To perform a simple request, use cURL, file_get_contents, or fopen. This website is packed full of usage examples.
For simple communication (i.e. a script on server A triggers a script on server B), plain and simple HTTP requests work great. You can add basic authentication (htaccess) to prevent unauthorized people from triggering your script, and stronger security by using HTTPS.
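For example, the triggering side can stay as small as this (URL and credentials are placeholders):

    // Trigger a script on another server over HTTPS with basic authentication.
    $context = stream_context_create(['http' => [
        'header' => 'Authorization: Basic ' . base64_encode('user:secret'),
    ]]);
    $result = file_get_contents('https://server-b.example.com/trigger.php?task=sync', false, $context);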