PHP Threading and high-latency file access (eg; FTP)

PHP Threading and high-latency file access (eg; FTP) - php

This is a bit complicated, so please don't jump to conclusions, feel free to ask about anything that is not clear enough.
Basically, I have a websocket server written in PHP. Please note that websocket messages are asynchronous, that is, a response to a request might take a lot of time, all the while the client keeps on working (if applicable).
Clients are supposed to ask the server for access to files on other servers. This can be an FTP service, or Dropbox, for the matter.
Here, please take note of two issues: connections should be shared and reused and the server actually 'freezes' while it does its work, hence any requests are processed after the server has 'unfrozen'.
Therefore, I thought, why not offload file access (which is what freezes the server) to PHP threads?
The problem here is twofold;
how do I make a connection resource in the main thread (the server) available to the sub threads (not possible with the above threading model)?
what would happen if two threads end up needing the same resource? It's perfectly fine if one is locked until the other one finishes, but we still need to figure out issue #1.
Perhaps my train of thought is all screwed up, if you can find a better solution, I'm eager to hear it out. I've also had the idea of having a PHP thread hosting a connection resource, but it's pretty memory intensive.

PHP supports no threads. The purpose of PHP is to respond to web requests quickly. That's what the architecture was built for. Different libraries try to do something like threads but they usually cause more issues than they solve.
In general there are two ways to achieve what you want:
off-load the long processes to an external process. A common approach is using a system like gearman http://php.net/gearman
Use asynchronous operations. Some stream operations and such provide an "async" flag or "non-blocking" mode. http://php.net/stream-set-blocking

Related

PHP socket with jQuery/AJAX user-end

I've been working on sockets, generally in PHP for a while. Currently I have a PHP client for connecting to a chat server, and output every each data sent from server it's connected to.
To explain that in a wider matter, I accomplished this using flush() function in PHP to write out every each buffer waiting in the loop. Buffer reader is withing a while where the condition is the status of the connection socket. But this matters less.
Now to what I want to accomplish. I want to keep socket handling to server side and data from server outputted to client, via AJAX/jQuery. So far, my researches always returned me HTML5 WebSocket and node.js, however, I "have to" be real picky about this, as for users of this, my minimal dependency might be:
WinXP IE6 users(Already disables jQuery, even)
Users without JAVA/Flash installed
So I have to think of possibilities in this, which is why I can't use a Flash/Java backend or a new technology like WebSockets, and neither I want to handle server stuff in the client. I really hate to be stuck in old technology but for this it's a must.
As I was searching around, I found this one being as similiar to my needs.
Is PHP socket a viable option for making PHP jQuery based chat?
And to quick review the answers, they all point to one direction, PHP multi-process and memory eating. I know this is a minus, but it's the best I can take for now. But yet still, there'll be timeout disconnects for inactive connections within a certain delay, and extension of the delay if wanted. So I'm not much onto this one.
Secondly, the last answer pointing to "Ajax Chat Application Tutorial", I made an overall review but whoa, writing each line into an html file and re-including it each time, that is which I could do without using an extra file but, is it really necessary? Plus re-reading the file from server side, and re-importing the whole read file into document every each time, isn't that just worse for "both sides"?
Either ways that's about it, I wasn't able to come to a conclusion for a while, and it happened, here I am again. (:P) Waiting for your answers/suggestions/ideas, thanks by now.
Regards.

There is server software available that specializes in such matters. Is called a push server/service. There's for example APE (http://www.ape-project.org/); according to their website, it's compatible with all web browsers and they even got a demo chat there. I'd suggest you to go for that solution.

PHP sockets - what do I need to work with them?

Only yesterday, I was asking a friend of mine how he would go about emulating direct communication channels between two clients through a web server, for the purpose of creating a chat application, but by using solely PHP/MySQL/JavaScript.
He told me that the best way to do this was by the use of SOCKETS, a term I had only heard of until then. This morning I started looking into it for the purpose of creating my chat application, but I'm quickly starting to believe that it's not as easy as I'd hoped.
So my question is this: if I don't have access to my own server (I have a domain hosted on a shared server that I also use for testing purposes), can I still use sockets to achieve my goal? If so, how exactly? (Please understand that I am completely new to the idea)
If not, what other way is there to accomplish the communication channels?
My only idea so far is to simply send periodic requests (AJAX) to the web server the application would be stored on and request any new messages, if any. But this does not seem very feasible.
Thanks in advance for your help!

I think what your friend is trying to get to is implementing Comet for your chat site.
Assuming he's getting you to use PHP sockets to act as a daemon, I highly doubt a shared hosting provider will let you do it.
You could try hanging the PHP script until there's data available. However, this will quickly consume resources on a CGI-based server since the PHP server can't tell if the client is still connected. I know this from experience.
For these kind of things, I highly recommend you get a dedicated server or VPS and write your backend in something like socket.io which automagically handles all your communication problems on both the client and server side. PHP, MYSQL and servers that fork to serve requests are usually the worst case scenarios for implementing Comet since they incur quite a bit of overhead and aren't scalable.
If you can't afford to run your own Comet server, then polling may be your only option. This will be the most resource intensive and least responsive.

How to instantly notify logged in user of a short message

Short task description: I want one signed in user to be able to send an instant short plain text message to another signed in user. The solution needs to be easily scalable and not too resource demandinng in terms of bandwidth and server load (and $$).
The first idea was do client polling but this idea was quickly abandoned since it didn't meet scalability requirement. So, after that I went into research and came accross a number of concepts including sockets, node.js, xmpp. The amount of information is a bit overwhelming, so I was hoping for some advice to point me in the right directions. Hopefully something with readily available hosting solutions.
#epascarello:
thanks for quick response. I did, but not in detail. Before going in-depth into any technology, I want to be know that this is actually what I need.
Most of the examples concetnrate on instant chat but my requirements are somewhat different. I don't need every signed in user to see a message, but only one particular user, for whom it was meant, while there can be, say, 100 000 users logged in...
#Saeed Neamati:
thanks! Yes, I pretty much understand the two client-server communication options and have come to the conclutions that the pulling is a no-go. What I am trying to find now is the most scalable (that's the main prerequisite) and (hopefully) easy to implement push option. For instance, the socket option is relatively easy but it seems like it's not going to scale well due to server overload (or am I wrong). The node.js (at least by concept description) should be better at that, but I wanted to get some confirmation to this assumption. With xmpp - I'm not even sure how relevant it is to my task and how to approach it.
#andyuk:
Andy thanks, yes socket.io is also something that I came accross while doing research. As far as I understand it requires a server module that needs to run on a host. Do you know if possible to run on any server or do I need to look for a specialized hosting company? THe socket.io site for some reason doesn't work on my PC (neither IE or FF).

Did you look at the source code of nodejs chat.?

Look, you only have two options for client-server communication. Either client starts a request (an HTTP request on the web), which is called pull model (like client pulls the request out of the server), and server responds to that, or server starts a response directly without receiving any request (an HTTP response on the web), which is called push model (like the server pushes the data out to the client).
What you described as polling is actually the pull model, and indeed it takes lots of resources from the server.
But on the other hand, when you want to use push model, your server should know the client. In other words, we know that HTTP (based on TCP/IP) is a stateless protocol, which means that after each request, your connection is closed, and server loses you and forgets about you.
If you want the server to know the client, you should keep the connection open. This is usually done via some HTTP headers like Keep-Alive and Connection.
But to do that you should read Comet Programming. However, this reduces your scalability, because more connections are kept open for a one-to-one map between connection and client (To understand this better, you can think of connections as doors of the server. The more you occupy the door as a client, the less other clients can use it).

Checkout socket.io. If web sockets isn't supported by the browser it will fallback to the next best transport technology.
There is even a chat example included in the source code.
As for your concerns about scalability, node.js is perfect for this due to it's event driven, non-blocking nature. Handling many open connections is one of Node's real strengths.
Plurk uses Node.js for their real time chat features and they support 100k+ users.

The whole site is blocked while one page is waiting for blocking operation (PHP, ASP.NET). Why?

Good day!
I've found interesting behaviour for both LAMP stack and ASP.NET.
The scenario:
There is page performing task in 2-3 minutes (making HttpWebRequest for ASP.NET and curl for PHP). While this page is processed all other requests to this virtual host from the same browser are not processed (even if I use different browsers from one machine). I use two pages written in PHP and C#.
I've tested with Apache+PHP in both mod_php and fast_cgi modes on Windows and Debian.
For ASP.NET I use IIS6 (with dedicated app pool for this site and with default app pool) and IIS7 in integrated mode.
I know that it is better to use async calls for such things, but I'm just curious why single page blocks the entire site and not only the thread processing the request?
Thanks in advance!

It seems you open standard php session that is open until end of request. That means session file is locked. Use session_write_close() soon as possible if you don't need session data already.

I don't think it's blocking the site; I would suspect that the open connection is blocking the client from making more requests. Have you proven that other machines can't use the site while your long-running request is in progress?

If you only see a single request coming to the app the only thing I can think of is a global lock somewhere in the pipeline.
The lock can be explicit (you wrote the lock statement) or implicit. If you can see several requests - it can be due to the thread pool exhaustion.
Keep in mind that in addition to the cap on the number of threads used to process incoming web requests there is a separate cap on the number of simultaneous outgoing web requests through HttpWebRequest and by default this limit is very low - if I remember correctly 2 per CPU. I do not remember the name of the setting in the web.config, but will try to look it up.
In any case posting code would give us a better chance to assist you

I've definitely noticed this behavior while debugging ASP.NET applications, but I have always just assumed it was a debug config issue. Are you building everything in release mode and have debugging turned off in your web.config?

ASP.NET applications have a global session lock.
Use EnableSessionState="ReadOnly" for WebForms or [SessionState(SessionStateBehavior.ReadOnly)] for MVC. It will prevent the lock (of course you can't write anything to a read-only session).
I just discovered why all ASP.Net websites are slow, and I am trying to work out what to do about it

Shared/pooled connections to backend services in PHP

I'm trying to figure out the best way to minimize resource utilization when I have PHP talking to various backend services (e.g. Amazon S3 or any other random web services -- I'd like a general solution). Ideally, I'd like to have a single persistent connection to the backend (or maybe a small pool of persistent connections) with some caching, and then have all of the PHP tasks share it. We can consider it all read-only for the purposes of this question. It's not obvious to me how to do this in PHP. There's the database-specific stuff like mysql_pconnect(), but that doesn't really do it for me.
One idea I've had, which seems seems somewhat suboptimal (but is still better than having every single request create and destroy a new connection) is to use a local caching proxy (in a separate process) that would effectively do the pooling and caching. PHP would still be opening and closing a connection for every request, but at least it would be to a local process, so it should be a little faster (and it would reduce load on the backends). But it doesn't seem like this kind of craziness should be necessary. There's gotta be a better way. This is easy in other languages. Please tell me what I'm missing!

There's a large ideological disconnect between the various web technologies. Some are essentially daemons that run full-time in the background, and handle requests passed in on their own. Because there's a process always running, you can have a pool of already open existing working connections.
PHP (and normal CGI scripts) does not have a daemon behind the scenes. Every time a request comes in, the PHP interpreter is started up with a clean slate, compiles the scripts, and runs the bytecode. There's no persistence. The PHP database functions that support persistent connections establish the connection at the web server child level (i.e. mod_php attached to an Apache process). This isn't exactly a connection pool, as you can only ever see the persistent connection attached to your own process.
Without having a daemon or similar process sitting behind the scenes to hand out resources, you won't get real connection pooling.
Keep in mind that most new connections to most services are not heavy-weight, and non-database connections that are heavy-weight might not be friendly to the concept of a connection pool.
Before you think about writing your own PHP-based daemon to handle stuff like this, keep in mind that it may already be a solved problem. Python came up with something called WSGI, with a similar implementation in Ruby called Rack. Perl also has something remarkably similar but I can't remember the name of it off the top of my head. A quick look at Google didn't show any PHP implementations of WSGI, but that doesn't mean they don't exist...

Because S3 and other webservices use HTTP as their transport, you won't get a significant benefit from caching the connection.
Although you may be using an API that appears to authenticate as a first step, looking at the S3 Documentation, the authentication happens with every request - so no benefit in authenticating once and reusing a connection
Web service requests over HTTP are lightweight and typically stateless. Once your request has been answered, no resources (connection or sesson state) are consumed on the server. This allows the web service implementer to use many machines to answer your request without tying up resources on a particular server

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.