Say I am creating a webchat where the chat messages are stored in a SQL database (I don't know how else to do it). What benefits does using AJAX to long-poll have over simply polling every x seconds?
Since PHP runs only when you open the page, the long-polled PHP script will have to check for new messages every second as well. What benefit does long polling have then? Either way I'm going to have some polling latency; the only difference is that with long polling the periodic checking happens on the server.
Long polling, in your case, has two advantages:
First, long polling allows clients to receive message updates immediately after they become available on the server, increasing the responsiveness of your webchat.
The second advantage is that almost no change is required in the client application for it to work in this mode. From the client's point of view, a blocked poll request looks like a network delay; the only difference is that the client doesn't need to wait between sending poll requests, as it would if you were simply polling every x seconds.
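For illustration, here is a minimal client-side sketch of that loop. The /poll endpoint and its JSON response shape are inventions for this example, not part of the question:

// Minimal long-poll loop: the browser keeps exactly one request pending.
async function longPoll() {
  while (true) {
    try {
      // The server holds this request open until a new message arrives
      // (or it times out), then responds.
      const response = await fetch('/poll');
      const messages = await response.json();
      messages.forEach(showMessage);
    } catch (err) {
      // Network error: back off briefly before reconnecting.
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
    // No wait between iterations: the next poll request goes out
    // immediately, which is the difference from polling every x seconds.
  }
}

function showMessage(message) {
  console.log(message); // stand-in for real rendering logic
}

longPoll();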
However, making a server hold requests increases server load. Typical web servers with synchronous request handling use one thread per request, which means a waiting request blocks the thread that handles it. Thus, 100 chat clients using long polling to get message updates from the server will block 100 threads.
Most of these threads will be in the waiting state, but every thread still uses a considerable amount of resources. Comet solves this problem with asynchronous request processing, a technique that allows a request to be held without blocking a thread, and which is now supported by several web servers, including Tomcat.
Reference for my answer: oBIX Watch communication engine reference document
I know very little about Node.js. All I know is that it works on a single-thread model, which hands I/O tasks off to other threads. So, for example:
Request A ----> nodejs (Single Thread)
// Finds out that the request requires an I/O operation
nodejs ----> underlying OS (Starts An Independent Thread)
// nodejs is free to serve more requests
Does this mean that with 1000 concurrent requests, one request will be handled only after the other 999 have been handled? If so, it seems like an inefficient system! I would say Apache running PHP is better suited in that case: Apache with PHP can launch 1000 concurrent threads, so there is no processing queue and zero wait time.
I might be missing an important concept here, but is this really the way Node.js works?
Node.js is based on the event loop programming model. The event loop runs in a single thread and repeatedly waits for events, then runs any event handlers subscribed to those events. Events can be, for example:
timer wait is complete
next chunk of data is ready to be written to this file
there's a fresh new HTTP request coming our way
All of this runs in a single thread, and no JavaScript code is ever executed in parallel. As long as these event handlers are small and mostly wait for yet more events themselves, everything works out nicely. This allows multiple requests to be handled concurrently by a single Node.js process.
(There's a little bit of magic under the hood as to where the events originate. Some of it involves low-level worker threads running in parallel.)
In this SQL case, a lot of things (events) happen between making the database query and getting its results in the callback. During that time the event loop keeps pumping life into the application, advancing other requests one tiny event at a time. Therefore multiple requests are being served concurrently.
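As a rough sketch of what that looks like in code, using the callback-style mysql package from npm (the connection details, table name, and query are all made up for illustration):

const http = require('http');
const mysql = require('mysql');

// Both the connection details and the messages table are invented here.
const pool = mysql.createPool({ host: 'localhost', user: 'chat', database: 'chat' });

http.createServer((req, res) => {
  // This call returns immediately; the actual query runs outside the
  // JavaScript thread.
  pool.query('SELECT * FROM messages ORDER BY id DESC LIMIT 50', (err, rows) => {
    // This callback is queued as an event and run by the event loop
    // once the database responds.
    if (err) {
      res.writeHead(500);
      res.end('database error');
      return;
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(rows));
  });
  // Execution reaches this point before the query finishes, so the
  // event loop is free to pick up the next incoming request.
}).listen(8080);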
Please look into this article.
I'm going to be using Node.js to process some CPU-intensive loop operations that send emails to registered users, because PHP was using too much CPU while they ran and was freezing the site.
One complication is that Node.js will be on a different server and will query MySQL over an external connection.
I've heard that an external DB connection is bad for performance.
Is this true? And are there any pros and cons of doing this?
Keep in mind that when you run a CPU-intensive operation in Node, the whole application blocks, since it runs in a single thread. If you're going to run a CPU-intensive operation in Node, make sure you spawn it off into a child process whose only job is to run the calculation and then return the result to the primary application. This ensures your Node app can continue responding to incoming requests while the data is being processed.
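A minimal sketch of that pattern with Node's built-in child_process module (the worker.js file name and the message shapes are invented for this example):

// parent.js: offload the CPU-heavy loop to a separate Node process.
const { fork } = require('child_process');

function runHeavyJob(payload, done) {
  const worker = fork('./worker.js');  // spawns a second Node process
  worker.send(payload);                // hand it the work
  worker.once('message', result => {   // the result arrives asynchronously
    done(null, result);
    worker.kill();
  });
  worker.once('error', done);
}
// The event loop in this process stays free to serve incoming
// requests while worker.js burns CPU.

// worker.js: runs the CPU-intensive loop in its own process.
process.on('message', payload => {
  let total = 0;
  for (let i = 0; i < payload.iterations; i++) {
    total += i;                        // stand-in for the real work
  }
  process.send(total);                 // report back to the parent
});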
Now, on to your question. Having the database on a different server is extremely common and is typically good practice. Where you can run into performance problems is if your database is in a different data center entirely: the further (physically) your database server is from your application server, the more latency there will be per request.
If these requests are seriously CPU-intensive, you should consider a queueing mechanism, for a couple of reasons. One, it ensures that even in the event of an application crash, you don't lose a request that is being processed. Two, you can monitor the queue and scale the number of workers processing it if operations pile up to the point that a single worker can't finish one before another comes in.
So a friend and I are building a web-based AJAX chat application with a jQuery and PHP core. Up to now, we've been using the standard procedure of calling the server every two seconds or so looking for updates. However, I've come to dislike this method: it isn't fast, nor is it "cost effective", since there are tons of requests going back and forth to the server even when no data is returned.
One of our project supporters recommended we look into a technique known as Comet, or more specifically, long polling. However, after reading about it in different articles and blog posts, I've found that it isn't all that practical when used with Apache servers. Most people just say "it isn't a good idea" without giving specifics about how many requests Apache can actually handle at one time.
The whole purpose of PureChat is to provide people with a chat that looks great, runs fast, and works on most servers. As such, I'm assuming that about 96% of our users will be using Apache rather than Lighttpd or Nginx, which are supposedly better suited for long polling.
Getting to the Point:
In your opinion, is it better to continue using setInterval and repeatedly request new data? Or is it better to go with long polling, despite the fact that most users will be using Apache? Also, is it possible to get a more specific rundown of approximately how many people can be using the chat before an Apache server rolls over and dies?
As Andrew stated, a socket connection is the ultimate solution for asynchronous communication with a server, although only the most cutting-edge browsers support WebSockets at this point. socket.io is an open-source API you can use that initiates a WebSocket connection if the browser supports it, but falls back to a Flash alternative if it does not. This is transparent to the coder using the API, however.
Socket connections basically keep open communication between the browser and the server so that each can send messages to each other at any time. The socket server daemon would keep a list of connected subscribers, and when it receives a message from one of the subscribers, it can immediately send this message back out to all of the subscribers.
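As a sketch of that daemon using the current socket.io API (the 'chat message' event name is invented for this example):

const { Server } = require('socket.io');
const io = new Server(3000); // the daemon listens full time on its own port

io.on('connection', socket => {
  // Every connected browser is a subscriber.
  socket.on('chat message', msg => {
    // A message from one subscriber goes straight back out to all of them.
    io.emit('chat message', msg);
  });
});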
For socket connections, however, you need a socket server daemon running full time on your server. While this can be done with command-line PHP (no Apache needed), it is better suited to something like node.js, a non-blocking server-side JavaScript API.
node.js would also be better for what you are talking about: long polling. Basically, node.js is event driven and single threaded. This means you can keep many connections open without having to open as many threads, which would eat up tons of memory (Apache's problem). This allows for high availability. What you have to keep in mind, however, is that even if you were using a non-blocking web server like Nginx, PHP has many blocking network calls. Since it runs on a single thread, each MySQL call, for instance, would basically halt the server until the response for that call is returned. Nothing else would get done while this is happening, making your non-blocking server useless. If, however, you used a non-blocking language like JavaScript (node.js) for your network calls, this would not be an issue. Instead of waiting for a response from MySQL, it would set a handler function to handle the response whenever it becomes available, allowing the server to handle other requests while it waits.
For long polling, you would basically send a request, and the server would wait up to 50 seconds before responding. It responds sooner than 50 seconds if it has anything to report; otherwise it waits. If there is nothing to report after 50 seconds, it sends a response anyway so that the browser does not time out. The response triggers the browser to send another request, and the process starts over. This allows for fewer requests and snappier responses, but again, it's not as good as a socket connection.
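A rough server-side sketch of that cycle in Node.js (the URL routes and the in-memory list of held responses are simplifications for illustration):

const http = require('http');

let waiting = []; // held long-poll responses, each { res, timer }

function respond(entry, messages) {
  clearTimeout(entry.timer);
  entry.res.writeHead(200, { 'Content-Type': 'application/json' });
  entry.res.end(JSON.stringify(messages));
}

http.createServer((req, res) => {
  if (req.url === '/poll') {
    const entry = { res };
    // Nothing to report after 50 seconds: answer with an empty list so
    // the browser doesn't time out; the client reconnects immediately.
    entry.timer = setTimeout(() => {
      waiting = waiting.filter(e => e !== entry);
      respond(entry, []);
    }, 50000);
    waiting.push(entry);
  } else {
    // Treat any other request as a new chat message (a simplification):
    // answer every held poll request right away.
    waiting.forEach(e => respond(e, ['new message']));
    waiting = [];
    res.end('ok');
  }
}).listen(8080);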
First of all, I am completely new on many stuff, so I will welcome any inputs, including suggestions, existing projects, existing models, etc.
My current problems are:
The background service maintains a queue of tasks. The background service is written in C++ or Python.
When a client clicks "Create Task" button in browser, the information will be sent to web server and the web server script (written in PHP) will initiate an RPC call to the background service to append the task to the internal queue.
The client browser will initiate an AJAX request to wait for the completion of the task. The AJAX request will hold until the task is completed (or failed) or the client cancels the request.
Thus, I need a low-cost way to get the progress of a task that runs in a background service process.
I can think of two ways:
The background service can inform the server-side AJAX script about the progress proactively. This is low cost, but I don't actually know how to do it. Does any RPC framework provide such asynchronous callbacks? Currently the RPC framework I've decided to use is Thrift, because of its multi-language support.
The AJAX script on the server side will make an RPC call to get the current progress every few seconds, sleeping in between. Upon completion, the AJAX script will return; otherwise it will just let the client browser wait by not returning. This is actually simpler, but I am not sure about its cost. Note that delay isn't an issue for me here, because I assume the clients are okay with waiting a few more seconds.
Is there any common way/model to deal with this problem?
Thanks for the help.
It depends on how you code it. The common way is to make a JavaScript AJAX request every 1-3 seconds or so and poll the progress from the server.
This closes the connection between polls and is gentler on the server. If you use a persistent connection (WebSockets also fall into this category), you will keep the server busy. Besides, a "sleep" on the server side keeps a worker process occupied, which is something I would try to avoid if I were you. On the other hand, if you've got the resources for that...
I can only repeat myself: it depends on how you code it and what you expect of it in the end.
If you want the client to do a bit more work and treat the server gently, choose your 1st option; if you think your server can handle it, choose the 2nd option, go "persistent", and even use WebSockets (which are persistent connections to your server; remember that they aren't widely supported by web browsers yet either).
Although I think that in the end, the trade-off between a simple progress value and hogging your server CPU with constant sleeps plus persistent connections on top of that will make you choose your 1st option: poll the server script for the progress value every x seconds from the client side. By the way, it's what Twitter does, and their servers have survived until today! ;)
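Such a client-side poll could look like this sketch (the /progress endpoint, the task id, the response shape, and the #bar element are all assumptions for this example):

const poller = setInterval(async () => {
  const response = await fetch('/progress?task=42'); // hypothetical task id
  const { percent, done } = await response.json();
  // '#bar' is a hypothetical progress bar element.
  document.querySelector('#bar').style.width = percent + '%';
  if (done) clearInterval(poller); // stop polling once the task finishes
}, 2000); // every 2 seconds, within the 1-3 second range suggested above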
I think you can use WebSockets for that.
You can use WebSockets.
Establish a WebSocket connection between the client and a web service that has access to the information you need to pass to the client.
With WebSockets, you don't need to poll the server asking for progress; instead, the server notifies the client whenever the information is ready.
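On the browser side, that push model could look like this sketch (the URL and the message format are assumptions for illustration):

const socket = new WebSocket('ws://example.com/progress');

socket.addEventListener('message', event => {
  const update = JSON.parse(event.data); // e.g. { percent: 60, done: false }
  showProgress(update.percent);          // hypothetical rendering helper
  if (update.done) socket.close();       // no more updates expected
});

function showProgress(percent) {
  console.log('task progress: ' + percent + '%');
}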
A backwards compatible implementation would be long polling.
Cheers
I read somewhere that PHP has a threading ability called pcntl_fork(), but I still don't get what it means. What's its purpose?
From what I understand, software languages have multithreading abilities. Correct me if I'm wrong, but it's the ability for a parent object to have children of some kind.
Thanks.
From wikipedia: "In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. It generally results from a fork of a computer program into two or more concurrently running tasks."
Basically, having threads means having the ability to do multiple things within the same running application (or process space, as RC said).
For example, if you're writing a chat server app in PHP, it would be really nice to "fork" certain tasks, so that if the server gets hung up processing something like a file transfer or a very slow client, it can spawn a thread to take care of the file transfer while the main application continues to transfer messages between clients without delay. Last time I used PHP, the threading was clunky and not very well supported.
Or, on the client end, while sending said file, it would be a good idea to thread the file transfer; otherwise you wouldn't be able to send messages to the server while sending the file.
That is not a very good metaphor. Think of a thread as a worker or helper that will do work or execute code for you in the background while your program can be doing other tasks, such as taking user input.
Threading means you can have more than one line of execution within the same process space. Note that this is different than a multi-process paradigm as processes will not share the same memory space.
This wiki link will do a good job of getting you up to speed on threads. Having said that, the function pcntl_fork() in PHP appears to create a child process, which falls in line with the multi-process paradigm.
In layman's terms, since a thread gives you more than one line of execution within a program, it allows you to do more than one thing at the same time. Technically, you're not always doing these things simultaneously: on a single-core processor you're really just time-slicing, so it only appears that you're doing more than one thing at a time.
A pretty straightforward use of threads is how connections to a web server are handled. If you didn't have multiple threads, your application would listen for a connection on a socket, accept the connection when a client requested one, and then process whatever page the client asked for. This seems fine until you have a page that takes 5 seconds to load and two clients connecting at the same time. One of the clients will sit and wait ~5 seconds for the server to accept its connection, because the 1st client is using the only line of execution to serve the page, and the server can't do that and accept the 2nd connection at the same time.
Now, if you have multiple threads, you'll have one thread (the listener thread) that only accepts connections. As soon as a connection is accepted by the listener thread, it passes the connection on to another thread; we'll call it the processor thread. Once the connection is handed off, the listener thread immediately goes back to waiting for a new connection. Meanwhile, the processor thread uses its own line of execution to serve the page that takes 5 seconds. In the scenario above, the 2nd client would have its connection accepted immediately after the 1st client was handed to the 1st processor thread, and an additional processor thread would be created to handle its request. This would typically let you serve both clients in a little over 5 seconds, while the single-threaded app would take ~10 seconds.
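PHP isn't the easiest language to demonstrate this in, so here is the listener/processor idea sketched with Node's worker_threads module instead, consistent with the rest of this page. Spawning one worker per request is deliberate here to mirror the description above, not a recommendation, and the 5-second page render is simulated:

const http = require('http');
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // Listener: accepts connections and immediately hands each one off.
  http.createServer((req, res) => {
    const processor = new Worker(__filename); // spawn a processor thread
    processor.on('message', page => {
      res.end(page);          // send the finished page to the client
      processor.terminate();
    });
    // The listener is already free to accept the next connection here,
    // so a 2nd client is not stuck behind the 1st client's 5 seconds.
  }).listen(8080);
} else {
  // Processor: does the slow work on its own thread.
  const start = Date.now();
  while (Date.now() - start < 5000) {} // simulate a page that takes 5 s
  parentPort.postMessage('slow page rendered in a worker thread\n');
}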
Hope this helps with your understanding of application threading.
Threading means your code is not limited to sequential execution, whatever your programming language.