I have an APNS notification server set up which, in theory, would send a processed notification to about 50,000 to 100,000 users every day (based on the number of users of the web app that ties in with our iOS app).
The notifications would go out around 2, but they must be sent to each user individually (via Urban Airship), triggered by curl on a cron job.
It iterates through each user and has to run an HTML scraper (simple_html_dom, to be exact) that takes about 5-10s per user and is obviously very memory intensive. A simple GET request can't be the right way to go about this; in fact, I'm positive it will fail. What is the best way to handle this long, memory-intensive task on a cron job?
If you reuse the same variables, or set variables you no longer need to null, you won't run out of memory.
Just don't load all the data at once; free it (set it to null) or replace it with new data after you process it.
And make sure you can't improve the speed of your task; 5-10s per user sounds really long.
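A minimal sketch of that pattern, assuming simple_html_dom as in the question; the user list, URL pattern, and the buildPayload()/sendNotification() helpers are placeholders:

```php
<?php
// simple_html_dom.php is the single-file library from the question.
require __DIR__ . '/simple_html_dom.php';

foreach ($userIds as $userId) {             // $userIds: all 50-100k users
    $html = file_get_html("https://example.com/users/{$userId}"); // placeholder URL

    $payload = buildPayload($html);         // hypothetical per-user processing
    sendNotification($userId, $payload);    // hypothetical Urban Airship call

    // simple_html_dom holds circular references; clear it explicitly,
    // then drop the variables so each iteration starts from a clean slate.
    $html->clear();
    unset($html, $payload);
}
```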
I am trying to build a tracking system in which an Android app sends GPS data to a web server running Laravel. I have read tutorials on how to build realtime apps, but as far as I understand, most of the guides only cover receiving data in realtime. I haven't yet seen examples of sending data every second or so.
I guess it's not good practice to POST data to a web server every second, especially when you already have a thousand users. I hope someone can suggest how I should approach this.
Also, as much as possible, I would like to use only Laravel, without any NodeJS server.
Send quickly
First you should estimate server capacity. With PHP-FPM, if you have 32 PHP processes and the server handles every POST request within 0.01s, capacity can be roughly estimated as N = 32 / 0.01 = 3200 requests per second.
So just make handling fast. If a request takes 0.1s to handle, that is too slow to serve a lot of clients from a single server. Enable OPcache; it can cut the time by 5x. Inserting data into MySQL is a slow operation, so you probably need to work around it to make it faster. For example, add incoming points to a fast cache (Redis/Memcached), and once the cache holds 1000 elements or was created more than 0.5 seconds ago, move them to the database as a single INSERT query.
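A minimal sketch of that buffer-then-batch idea, assuming the phpredis extension and PDO; the key names, DSN, table layout, and the request variables ($userId, $lat, $lng) are placeholders:

```php
<?php
$redis = new Redis();
$redis->connect('127.0.0.1');
$pdo = new PDO('mysql:host=127.0.0.1;dbname=tracker', 'user', 'pass'); // placeholder DSN

// On each POST: buffer the point in Redis instead of hitting MySQL directly.
$redis->rPush('gps:buffer', json_encode([
    'user_id' => $userId, 'lat' => $lat, 'lng' => $lng, 'ts' => time(),
]));

// Flush once the buffer holds 1000 points (or on a short timer elsewhere).
if ($redis->lLen('gps:buffer') >= 1000) {
    $rows = [];
    while (($item = $redis->lPop('gps:buffer')) !== false) {
        $rows[] = json_decode($item, true);
    }
    if ($rows) {
        // One multi-row INSERT instead of 1000 single-row ones.
        $values = rtrim(str_repeat('(?,?,?,?),', count($rows)), ',');
        $stmt = $pdo->prepare(
            "INSERT INTO positions (user_id, lat, lng, ts) VALUES $values"
        );
        $params = [];
        foreach ($rows as $r) {
            array_push($params, $r['user_id'], $r['lat'], $r['lng'], $r['ts']);
        }
        $stmt->execute($params);
    }
}
```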
Randomize sending
Most smartphones have accurately synced clocks, so you can end up with a thousand simultaneous requests the moment the next second starts: for the first 0.01s the server handles 1000 requests, then for the remaining 0.99s it sleeps. Add a random delay of 0-0.9s to the mobile code, fixed for each device and chosen at first install or first request. It will load the server uniformly.
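To make that concrete, here is the jitter logic sketched in PHP for consistency with the rest of the thread; the real code would live in the mobile client, and the storage path is a placeholder:

```php
<?php
$jitterFile = __DIR__ . '/jitter.txt';      // placeholder for per-device storage

if (!file_exists($jitterFile)) {
    // Chosen once at "first install" and reused forever after.
    file_put_contents($jitterFile, (string) random_int(0, 900));
}
$jitterMs = (int) file_get_contents($jitterFile);

// Before each per-second send, wait out this device's fixed offset so the
// fleet's requests are spread uniformly across the second.
usleep($jitterMs * 1000);
// ... perform the POST here ...
```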
There are at least two really important things you should consider:
Client's internet consumption
Server capacity
If you have a thousand users, posting every second means a lot of requests for your server to handle.
You should consider using push techniques, as described in this answer by #Dipin.
And when it comes to the server, you should consider using a queue system to handle those jobs, as described in this article. There's probably some package providing integration with Firebase or GCM to handle that for you.
Good luck, hope it helps o/
I want to design a notification component, and I want to understand what kinds of polling methods are used out there to fetch notifications with minimal stress on the server.
Let's say, for example, I want to notify a user of a chat message. I imagine I would need to poll quite regularly, like every 500ms, for a quick response. However, doing this may overload the system. Hypothetically speaking, if I have a million users browsing the site, that's 2 million requests every second!
I'm thinking of writing an algorithm that will incrementally increase the poll interval by 1 second on each poll, up to a maximum of 60 seconds. The interval resets to 500ms whenever there is new data. This way, if the user has frequent notifications, delivery is near instant; but if there hasn't been a notification for a longer period, there may be a delay of up to a minute.
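Here is a minimal sketch of what I have in mind, assuming a hypothetical fetchNotifications() helper that returns an array (empty when nothing is new):

```php
<?php
$minMs  = 500;    // floor: instant-feeling delivery
$maxMs  = 60000;  // ceiling: at most a minute of delay
$stepMs = 1000;   // back off by 1s per empty poll

$intervalMs = $minMs;
while (true) {
    $new = fetchNotifications();             // hypothetical server check
    $intervalMs = count($new) > 0
        ? $minMs                              // activity: snap back to 500ms
        : min($intervalMs + $stepMs, $maxMs); // idle: grow toward 60s
    usleep($intervalMs * 1000);
}
```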
In essence I'm compromising between user experience and server load to find a middle ground for both.
Please advise on possible drawbacks of this approach, if any. Is there a proper name for it?
Alternatively, is there a better method out there?
What you are doing is polling (or long polling). Either way, it is not good for performance.
The alternative is pushing (http://en.wikipedia.org/wiki/Push_technology): you push the data only when there is something new.
You could use WebSockets to achieve this.
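For example, a minimal push sketch using Ratchet, a PHP WebSocket library (an assumption about your stack, installed with composer require cboden/ratchet):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Ratchet\ConnectionInterface;
use Ratchet\Http\HttpServer;
use Ratchet\MessageComponentInterface;
use Ratchet\Server\IoServer;
use Ratchet\WebSocket\WsServer;

class Notifier implements MessageComponentInterface
{
    private \SplObjectStorage $clients;

    public function __construct()
    {
        $this->clients = new \SplObjectStorage();
    }

    public function onOpen(ConnectionInterface $conn)
    {
        $this->clients->attach($conn);      // remember every open connection
    }

    public function onMessage(ConnectionInterface $from, $msg)
    {
        // Push every incoming message out to all connected clients.
        foreach ($this->clients as $client) {
            $client->send($msg);
        }
    }

    public function onClose(ConnectionInterface $conn)
    {
        $this->clients->detach($conn);
    }

    public function onError(ConnectionInterface $conn, \Exception $e)
    {
        $conn->close();
    }
}

IoServer::factory(new HttpServer(new WsServer(new Notifier())), 8080)->run();
```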
You could look at Apollo, a messaging middleware with native support for WebSockets and good performance.
http://activemq.apache.org/apollo/
The method you are using could lead to a network traffic overload on your server if there are many clients connected. Suppose you have 1000 clients connected: the server will have to handle 1000 different connections. A better approach is to use a push notification system. Check this out: https://nodejs.org/it/docs/
I have a service where, for each user request, I need to query 40 external services (APIs) to get information from them. For example, a user searches for some information; my service asks 40 external partners for the information, aggregates it in one DB (MySQL), and displays the result to the user.
At the moment I have a multicurl solution: 10 partner requests run at a time, and whenever one partner request finishes, the software adds another partner from the remaining 30 to the multicurl queue, until all 40 requests are done and the results are in the DB (roughly the rolling window sketched below).
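Roughly, the current solution looks like this; the URL pattern and window size are placeholders:

```php
<?php
$urls   = array_map(fn ($i) => "https://partner{$i}.example.com/api", range(1, 40));
$window = 10;

$mh = curl_multi_init();
$addHandle = function (string $url) use ($mh): void {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
};

// Fill the initial window of 10 concurrent requests.
foreach (array_splice($urls, 0, $window) as $url) {
    $addHandle($url);
}

do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);                   // wait for network activity
    while ($info = curl_multi_info_read($mh)) {
        $done = $info['handle'];
        $body = curl_multi_getcontent($done);
        // ... store $body in the DB here ...
        curl_multi_remove_handle($mh, $done);
        curl_close($done);
        if ($urls) {                          // refill from the remaining 30
            $addHandle(array_shift($urls));
            curl_multi_exec($mh, $running);   // kick off the new transfer
        }
    }
} while ($running > 0 || $urls);

curl_multi_close($mh);
```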
The problem with this solution is that it cannot scale across many servers. I want a solution where I can fire all 40 requests at one time, divided across 2-3 servers for example, and wait only as long as the slowest partner takes to deliver its results ;-) That means if the slowest partner takes 10 seconds, I will have the results of all 40 partners in 10 seconds. With multicurl I run into trouble when there are more than 10-12 requests at one time.
What kind of solution can you offer that uses as few resources as possible, can run many, many processes on one server, and is scalable? My software is written in PHP, which means I need a good way to connect to the solution via a framework or API.
I hope you understand my problem and needs. Please ask if something is not clear.
One possible solution would be to use a message queue system like beanstalkd, Apache ActiveMQ, MemcacheQ, etc.
A high level example would be:
User makes request to your service for information
Your service adds the requests to the queue (presumably one for each of the 40 services you want to query)
One or more job servers continuously poll the queue for work
A job server gets a message from the queue, does the work, adds the data to the DB, and deletes the item from the queue (sketched below)
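A minimal sketch of that producer/worker split, assuming beanstalkd with the Pheanstalk client (v4-style API); the tube name, payload shape, and the $partnerUrls/$requestId variables are my own choices:

```php
<?php
// composer require pda/pheanstalk
require __DIR__ . '/vendor/autoload.php';

use Pheanstalk\Pheanstalk;

$queue = Pheanstalk::create('127.0.0.1');

// Producer: one queue job per partner API for this user request.
$queue->useTube('partner-requests');
foreach ($partnerUrls as $url) {            // $partnerUrls: your 40 endpoints
    $queue->put(json_encode(['request_id' => $requestId, 'url' => $url]));
}

// Worker (separate process, possibly on another server):
$queue->watch('partner-requests');
while (true) {
    $job  = $queue->reserve();              // blocks until a job is available
    $data = json_decode($job->getData(), true);
    // ... fetch $data['url'] and write the result to the DB ...
    $queue->delete($job);
}
```

Running several such workers, on one box or spread over 2-3 servers, is what lets all 40 requests proceed in parallel and finish in roughly the slowest partner's time.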
In this model, since the one task of performing 40 requests is now distributed and no longer part of a single process, the next part of the puzzle is figuring out how to mark a set of work as completed. This part may not be difficult, or it may introduce a new challenge (it depends on the data and your application). Perhaps you could use another cache/DB row as a counter, set to the number of jobs a particular request needs in order to complete; as each queue worker finishes a job, it decrements the counter by 1. Once the counter is 0, you know the request has been completed (a rough sketch follows). If you do that, you need to make sure the counter actually reaches 0 and doesn't get stuck for some reason.
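A rough sketch of that counter using Redis via phpredis; the key naming, the TTL safety net, and the markRequestComplete() hook are assumptions:

```php
<?php
$redis = new Redis();
$redis->connect('127.0.0.1');

// When enqueueing the 40 jobs: record how many must finish for this request.
$redis->set("request:{$requestId}:pending", 40);
$redis->expire("request:{$requestId}:pending", 300); // so a lost job can't wedge it forever

// In each worker, after storing a partner's result in the DB:
$remaining = $redis->decr("request:{$requestId}:pending");
if ($remaining === 0) {
    markRequestComplete($requestId);        // hypothetical "all results in" hook
}
```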
That's one way at least, hope that helps you a little or opens the door for more ideas.
We have a web app that uses IMAP to conditionally insert messages into users' mailboxes at user-defined times.
Each of these 'jobs' is stored in a MySQL DB with a timestamp for when the job should be run (possibly months into the future). Jobs can be cancelled at any time by the user.
The problem is that making IMAP connections is a slow process, and before we insert the message we often have to conditionally check whether there is a reply from someone in the inbox (or similar), which adds considerable processing overhead to each job.
We currently have a system where a cron script runs every minute or so and fetches all the jobs from the DB that need delivering in the next X minutes. It then splits them into batches of Z jobs, and for each batch performs an asynchronous POST request back to the same server with all the data for those Z jobs (in order to achieve 'fake' multithreading). The server then processes each batch of Z jobs that comes in via HTTP.
The reason we use async HTTP POSTs for multithreading, rather than something like pcntl_fork, is so that we can add other servers, POST the data to those instead, and have them run the jobs rather than the current server.
So my question is - is there a better way to do this?
I appreciate work queues like beanstalkd are available to use, but do they fit with the model of having to run jobs at specific times?
Also, because we need to keep the jobs in the DB anyway (because we need to provide the users with a UI for managing the jobs), would adding a work queue in there somewhere actually be adding more overhead rather than reducing it?
I'm sure there are better ways to achieve what we need - any suggestions would be much appreciated!
We're using PHP for all this so a PHP-based/compatible solution is really what we are looking for.
Beanstalkd would be a reasonable way to do this. It has the concept of put-with-delay, so you can regularly fill the queue from your primary store with messages that only become reservable, and so runnable, in X seconds (the time you want the job to run minus the time now), as sketched below.
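A minimal sketch of that, assuming the Pheanstalk client (v4-style signature put($data, $priority, $delay, $ttr)); the tube name and payload are my own choices, and $job stands for a row from your "due soon" DB query:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Pheanstalk\Pheanstalk;

$queue = Pheanstalk::create('127.0.0.1');
$queue->useTube('imap-jobs');

$delay = max(0, $job['run_at'] - time());   // time to run minus time now
$queue->put(
    json_encode(['job_id' => $job['id']]),  // keep the payload small; reload details from the DB
    1024,                                   // default priority
    $delay,
    120                                     // ttr: seconds a worker may hold the job before re-queueing
);
```

The DB stays the source of truth for the users' management UI; the queue only ever holds the next few minutes of work.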
The workers would then run as normal, connecting to the beanstalkd daemon and waiting for a new job to be reserved. It would also be a lot more efficient without the overhead of an HTTP connection. As an example, I used to post messages to Amazon SQS (over HTTP). That could barely do 20 QPS at the very most, but Beanstalkd accepted over a thousand per second with barely any effort.
Edited to add: you can't delete a job without knowing its ID, though you could store that elsewhere. OTOH, do users have to be able to delete jobs at any time up to the last minute? You don't have to put a job into the queue weeks or months in advance; you could still have a single DB reader that runs every, say, 1 to 5 minutes to put the next few jobs into the queue, and still have as many workers as you need, with the efficiencies they bring.
Ultimately, it depends on the number of DB read/writes that you are doing, and how the database server is able to handle them.
If what you are doing is not a problem now, and won't become so with additional load, then carry on.
Background
In one of our projects, we occasionally need to run massive tasks, e.g., generating reports or sending large numbers of notification emails. Running such a massive task sometimes causes noticeable lag, so we are thinking about a possible solution.
Some thoughts
Set crontab to run a backend script every 10 minutes.
Collect CPU usage info; I found http://phpsysinfo.sourceforge.net/phpsysinfo/index.php?disp=dynamic , but I'm not sure if there is a better way.
If usage stays below a specific value for a while, or the first task in the queue reaches its deadline, the script will take a certain number of tasks from the queue and run them (a sketch follows below).
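Something like this is what I have in mind, using PHP's built-in sys_getloadavg() instead of phpsysinfo; the threshold, batch size, and the firstTaskDeadlineReached()/nextTasks()/runTask() helpers are hypothetical:

```php
<?php
// Run from cron every 10 minutes.
[$load1, $load5] = sys_getloadavg();        // 1- and 5-minute load averages

$threshold = 2.0;                           // "low enough" for this box

if (($load1 < $threshold && $load5 < $threshold) || firstTaskDeadlineReached()) {
    foreach (nextTasks(5) as $task) {       // take a small batch from the queue
        runTask($task);
    }
}
```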
There are different types of massive tasks, e.g.:
Users can request certain types of reports
Notification emails
Cleaning up data in the database
...
I am wondering if this idea is worth trying.
Are there any problems with it, or is there some other, better solution?
This works up to a point but struggles if you are running anything where access is required 24 hours a day (like an internationally used site).
You may wish to replicate your database and then run your heavy queries off of that - or investigate a form of data warehousing.
What is a data warehouse?