Does nodejs single thread model imply longer processing queues? - php

I know very little about nodejs. All I know is, that it works upon a single thread model, which switches to multiple threads for I/O tasks. So for example,
Request A ----> nodejs (Single Thread)
// Finds out that it the requires requires I/O operation
nodejs ----> underlying OS (Starts An Independent Thread)
// nodejs is free to serve more requests
Does this mean for 1000 concurrent requests, there will be a request that will be handled after all 999 requests are handled? If yes, it seems to be an inefficient system! I would term apache running php to be better suited (in the above case). Apache with PHP keeps the ability to launch 1000 concurrent threads and thus no processing queue and zero wait time.
I might be missing an important concept here, but is it really the way nodejs works?

Node.js is based on the event loop programming model. The event loop runs in single thread and repeatedly waits for events and then runs any event handlers subscribed to those events. Events can be for example
timer wait is complete
next chunk of data is ready to be written to this file
theres a fresh new HTTP request coming our way
All of this runs in single thread and no JavaScript code is ever executed in parallel. As long as these event handlers are small and wait for yet more events themselves everything works out nicely. This allows multiple request to be handled concurrently by a single Node.js process.
(There's a little bit magic under the hood as where the events originate. Some of it involve low level worker threads running in parallel.)
In this SQL case, there's a lot of things (events) happening between making the database query and getting its results in the callback. During that time the event loop keeps pumping life into the application and advancing other requests one tiny event at a time. Therefore multiple requests are being served concurrently.
Please look into this article Article

Related

Temporary storage for collecting data prior to sending

I'm working on a composer package for PHP apps. The goal is to send some data after requests, queue jobs, other actions that are taken. My initial (and working) idea is to use register_shutdown_function to do it. There are a couple of issues with this approach, firstly, this increases the page response time, meaning that there's the overhead of computing the request, plus sending the data via my API. Another issue is that long-running processes, such as queue workers, do not execute this method for a long time, therefore there might be massive gaps between when the data was created and when it's sent and processed.
My thought is that I could use some sort of temporary storage to store the data and have a cronjob to send it every minute. The only issue I can see with this approach is managing concurrency on hight IO. Because many processes will be writing to the file every (n) ms, there's an issue with reading the file and removing lines that had been already sent.
Another option which I'm trying to desperately avoid is using the client database. This could potentially cause performance issues.
What would be the preferred way to do this?
Edit: the package is essentially a monitoring agent.
There are a couple of issues with this approach, firstly, this increases the page response time, meaning that there's the overhead of computing the request, plus sending the data via my API
I'm not sure you can get around this, there will be additional overhead to doing more work within the context of a web request. I feel like using a job-queue based/asynchronous system is minimizing this for the client. Whether you choose a local file system write, or a socket write you'll have that extra overhead, but you'll be able to return to the client immediately and not block on the processing of that request.
Another issue is that long-running processes, such as queue workers, do not execute this method for a long time, therefore there might be massive gaps between when the data was created and when it's sent and processed.
Isn't this the whole point?? :p To return to your client immediately, and then asynchronously complete the job at some point in the future? Using a job queue allows you to decouple and scale your worker pool and webserver separately. Your webservers can be pretty lean because heavy lifting is deferred to the workers.
My thought is that I could use some sort of temporary storage to store the data and have a cronjob to send it every minute.
I would def recommend looking at a job queue opposed to rolling your own. This is pretty much solved and there are many extremely popular open source projects to handle this (any of the MQs) Will the minute cron job be doing the computation for the client? How do you scale that? If a file has 1000 entries, or you scale 10x and has 10000 will you be able to do all those computations in less than a minute? What happens if a server dies? How do you recover? Inter-process concurrency? Will you need to manage locks for each process? Will you use a separate file for each process and each minute? To bucket events? What happens if you want less than 1 minute runs?
Durability Guarantees
What sort of guarantees are you offering your clients? If a request returns can the client be sure that the job is persisted and it will be completed at sometime in the future?
I would def recommend choosing a worker queue, and having your webserver processes write to it. It's an extremely popular problem with so many resources on how to scale it, and with clear durability and performance guarantees.

Managing workers with RabbitMQ

I have implemented rabbitMQ in my current php application to handle asynchroneous jobs that are handled by workers. But my current problem is that how should i monitor and scale up or down the workers. Also, i want to add error handling in case all the workers die. I have thought of following two ways but don't know which one is the better:
At producer end, i would analyze the rabbitMQ queue size. If queue size (list of pending tasks) is more than a threshold, i would create one new worker everytime producer script executes but before that i would check the server load (using linux command uptime). If server load is less than a threshold then only new worker would be created. At consumer end (in worker.php), i would apply same method to scale up the workers and i would also check that if script is idle for a given time (i.e. there is no pending task in rabbit mq queue) then it would automatically die (to automate scaling down of workers).
Second method is to use background process or cron to monitor and scale/up down the workers. But i don't want to rely on cron (as i have very bad experiences with it) or background process because if background process crashes for some reason then there is no way to recover from it.
Please help.
I wouldn't recommend bothering to scale them down to nothing when there's no work to be done. The worker that's left (if you want to scale back to 1) will simply wait for something else to consume and it's not an expensive operation.
In terms of determining whether to scale up, I'd recommend leveraging the RabbitMQ Management HTTP API (http://hg.rabbitmq.com/rabbitmq-management/raw-file/3646dee55e02/priv/www-api/help.html). You can use the queue related aspects via a GET operation to get information about queues, including how many entries are currently waiting to be processed.
With that info, you can decide to scale if it either hits a certain threshold, or keeps increasing with every check for a certain amount of time, or something similar. This can be done from the consumer side.
In terms of error handling, I would recommend encapsulating the RabbitMQ connection aspect of your workers such that if a RabbitMQ exception occurs the connection is re-established from scratch and continues.
If it's a more serious type of exception that isn't RabbitMQ-related, you may need to catch it at such a level where the worker basically spawns a new worker before it dies. Then of course there are other types of exceptions (out of memory conditions, for example), where it really isn't feasible to try to continue and your program should just completely die.
It is very difficult to answer your question with any degree of accuracy since there are many aspects of the context which are not included.
How long do the tasks take to execute?
Why do you want to scale up/down? Why don't you have threads waiting for load in the first place?
That being said, coming from the world of Erland and functional programming (which is the language used to power RabbitMQ) I would like to suggest the concept of a SUPERVISOR thread. This thread would have the following responsibilities:
Spawn threads depending on the load/qty of requests
Discard threads depending on the load/qty of requests
Monitor the children threads and re-launch them as required reprocessing the same messages if necessary or discarding them
The Supervisor thread should be as easy as possible and should be built in such a way that it simply loops, sleeps and checks if all the threads that need to be alive actually are - it can then check the load and spawn up or kill off the workers as needed. Or in other words, spawn more and/or not-spawn depending on your needs.
You could easily use an exchange to send messages to both the supervisor and the worker queues where the supervisor would then be able to keep a record/count of the messages in the queue without having to write polling code to the server, it would simply listen to it's own queue. You can increase/dec the counter from the supervisor thread and manage everything from there.
Hope this helps.
See: http://docs.dotcloud.com/guides/daemons/
Regretfully I don't program in PHP and therefore cannot give you PHP-specific assistance, this is however the programming pattern that I recommend that you use. If PHP doesn't allow multi-threaded programming and/or threads then I would highly recommend that you use a language that does since you will not be able to scale and use the full power of the local machine unless you use multiple threads. As for the supervisor crashing, if you keep minimal work in the supervisor and delegate all responsibilities to children threads then the risk of a supervisor crash is minimal.
Perhaps this will help:
Philosophy:
http://soapatterns.org/design_patterns/service_agent
PHP-specific:
http://www.quora.com/PHP-programming-language-1/Is-there-an-actor-framework-for-php

Cross server MySQL connection and requests

I'm going to be using Nodejs to process some CPU intense loop operations with sending emails to registered users as PHP was using too much during the time it runs and freezes the site.
One thing is that Nodejs will be on different server and do a request using external connection in MySQL.
I've heard that external db connection is bad for performance.
Is this true? And are there any pros and cons of doing this?
Keep in mind, when running a CPU intensive operation in Node the whole application blocks as it runs in a single thread. If you're going to run a CPU intensive operation in Node, make sure you spawn it off into a child process who's only job is to run the calculation and then return to the primary application. This will ensure your Node app is able to continue responding to income requests as the data is being processed.
Now, onto your question. Having the database on a different server is extremely common and typically is a good practice to have. Where you can run into performance problems is if your database is in a different data center entirely. The further (physically) your database server is from your application server, the more latency there will be per request.
If these requests are seriously CPU intensive, you should consider looking into a queueing mechanism for a couple reasons. One, it ensures that even in the event of an application crash, you don't lose a request that is being processed. Two, you can monitor the queue, and scale the number of workers processing the queue in the event that the operations are piling to the point that a single application can't finish processing one before another comes in.

What benefits does long-polling have in php?

Say I am creating a webchat, where the chat messages are stored in a SQL database (I don't know how else to do it), what benefits does using AJAX to long-poll instead of simply polling every x seconds?
Since PHP runs only when you open the page, the long-polled PHP script will have to check for new messages every second as well. What benefits does long-polling have then? Either way, I'm going to have a latency of x seconds, only with long-polling the periodic checking happens on the server.
Long polling, in your case, has two advantages:
First, long polling allows clients to receive message updates immediately after they become available on the server, increasing the responsiveness of your webchat.
The second advantage is that almost no change is required in the client application in order to work in this mode. From the client’s point of view, a blocked poll request looks like a network delay, the only difference is that the client doesn't need to wait between sending poll requests, as it would if you were simply polling every x seconds.
However, making a server hold requests increases server load. Usual web servers with synchronous request handling use one thread per each request, this means that waiting request blocks the thread by which it is handled. Thus, 100 chat clients which use long polling to get message updates from the server, will block 100 threads.
Most of these threads will be in the waiting state, but every thread still uses a considerable amount of resources. This problem is solved in Comet by asynchronous request processing, a technique allowing request blocking without blocking a thread, which is now
supported by several web servers including Tomcat.
Reference for my answer: oBIX Watch communication engine reference document

what does threading mean when it comes to programming?

i read somewhere that php has threading ability called pcntl_fork(). but i still dont get what it means? whats its purpose?
software languages have multithreading abilities from what i understand, correct me if im wrong, that its the ability for a parent object to have children of somekind.
thanks
From wikipedia: "In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. It generally results from a fork of a computer program into two or more concurrently running tasks."
Basically, having threads is having the ability to do multiple things within the same running application (or, process space as said by RC).
For example, if you're writing a chat server app in PHP, it would be really nice to "fork" certain tasks so if the server gets hung up processing something like a file transfer or very slow client it can spawn a thread to take care of the file transfer while the main application continues to transfer messages between clients without delay. Last time I'd used PHP, the threading was clunky/not very well supported.
Or, on the client end, while sending said file, it would be a good idea to thread the file transfer, otherwise you wouldn't be able to send messages to the server while sending the file.
It is not a very good metaphor. Think of a thread like a worker or helper that will do work or execute code for you in the background while your program could possibly be doing other tasks such as taking user input.
Threading means you can have more than one line of execution within the same process space. Note that this is different than a multi-process paradigm as processes will not share the same memory space.
This wiki link will do a good job getting you up to speed with threads. Having said that, the function pcntl_fork() in PHP, appears to create a child process which falls inline with the multi-process paradigm.
In layman's terms, since a thread gives you more than one line of execution within a program, it allows you to do more than one thing at the same time. Technically, you're not always doing these things simultaneously as in a single core processor, you're really just time-slicing so it appears you're doing more than one thing at a time.
A pretty straight-forward use of threads is how connections to a web server are handled. If you didn't have multiple threads, your application would listen for a connection on a socket, accept the connection when a client requested a connection, and then would process whatever page the client asked for. This seems well and good until you have a page that takes 5 seconds to load and you have 2 clients connecting at the same time. One of the clients will sit and wait for the server to accept their connection for ~5 seconds, because the 1st client is using the only line of execution to serve the page and it can't do that and accept the 2nd connection.
Now if you have multiple threads, you'll have one thread (i.e. the listener thread) that only accepts connections. As soon as the connection is accepted by the listener thread, he will pass the connection on to another thread. We'll call it the processor thread. Once the connection is passed on to the processor thread, the listener thread will immediately go back to waiting for a new connection. Meanwhile, the processor thread will use it's own execution line to serve the page that takes 5 seconds. In the scenario above, the 2nd client would have it's connection accepted immediately after the 1st client was handed to the 1st processor thread and an additional 2nd processor thread would be created to handle the request from the 2nd client. This would typically allow you to serve both clients the data in a little over 5 seconds while the single-threaded app would take ~10 seconds.
Hope this helps with your understanding of application threading.
Threading mean not to allow sequential behavior inside your code, whatever is your programming language..

Categories