We have a bunch of CLI cron-style scripts written in PHP.
A few of these services use FTP to send data to remote locations.
The way things are set up, what happens quite frequently is:
a) Script start
b) Connect to the FTP server at the remote location
c) Send data
d) Close ftp connection
e) Terminate script
f) Return to (a) and repeat within a short amount of time, sending different data to the same target.
The issue is that there is quite a bit of overhead (read: slowdown) in step (b), where the script first has to connect to the FTP server, log in, make sure the target folder exists, create it if not, and so on. I know, I know, the right way to do things would be to consolidate these transfers into single pushes... but it's far more complicated than that. I've simplified away 30-40 steps here.
So what I was hoping on doing is setting up a system like this:
[ CRON CLI SCRIPT ] --->
[ LOCALLY HOSTED SOCKET BASED SERVER THAT KEEPS THE FTP CONNECTIONS OPEN ] --->
[ REMOTE FTP ]
With the above, we can keep the locally hosted socket-based server running with the FTP connections open, and skip the longest part of the process: the FTP connection and authentication steps.
While setting this up as a 'one at a time' system in PHP is fairly trivial, what I have never done before is making it as close to multi-threaded as possible.
Whereby the socket is opened (for example, on 127.0.0.1:10000) and multiple requests can come in; if needed, 'children' are spawned, new FTP connections are made, and so on.
Can anyone shed some insight into making this multi-threaded in PHP, OR is there another, better solution out there? Perl is an option. It's been years (YEARS...) since I have touched it, but I am sure a couple of days in front of some good docs would bring me up to speed enough to make it happen.
We have built a system that does more or less what you want, so it is definitely possible to build a multi-process application in PHP.
It is, however, not trivial. If you fork off a child process, you need to manage your remote connections very carefully in order to avoid problems (use the socket_* family of functions instead of fsockopen for better control).
Also, signals tend to interrupt your normal program flow. This is of course normal, but PHP was not built with this in mind, so be prepared for some unexpected results.
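To make that concrete, here is a minimal pre-fork sketch of the 127.0.0.1:10000 idea from the question, using the socket_* and pcntl functions mentioned above. Treat it as a sketch, not a finished implementation: the pool size, the tab-separated wire format, and the FTP host and credentials are placeholder assumptions.

```php
<?php
// Minimal pre-fork sketch of the 127.0.0.1:10000 proxy idea. The pool size,
// the tab-separated "remote<TAB>local" wire format and the FTP host and
// credentials are placeholders; requires the pcntl and sockets extensions.
$server = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
socket_set_option($server, SOL_SOCKET, SO_REUSEADDR, 1);
socket_bind($server, '127.0.0.1', 10000);
socket_listen($server, 16);

pcntl_signal(SIGCHLD, SIG_IGN);               // let the kernel reap dead children

for ($i = 0; $i < 4; $i++) {                  // small fixed worker pool
    if (pcntl_fork() === 0) {                 // child: owns one long-lived FTP login
        $ftp = ftp_connect('ftp.example.com');
        ftp_login($ftp, 'user', 'secret');    // the expensive part, paid only once
        while (($conn = socket_accept($server)) !== false) {
            $line = trim(socket_read($conn, 8192, PHP_NORMAL_READ));
            if (strpos($line, "\t") === false) {
                socket_close($conn);
                continue;
            }
            list($remote, $local) = explode("\t", $line, 2);
            $ok = ftp_put($ftp, $remote, $local, FTP_BINARY);
            socket_write($conn, $ok ? "OK\n" : "ERR\n");
            socket_close($conn);
        }
        exit(0);
    }
}
while (true) {                                // parent just keeps the pool alive
    sleep(60);
}
```

Each worker owns its own FTP handle, so two requests never share a connection, which sidesteps most of the resource-management pitfalls above; the cron scripts then just write one line to 127.0.0.1:10000 instead of logging in to the remote FTP server themselves.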
Try using Gearman. You can hand the most expensive work off to Gearman, and it dispatches each job to a separate worker process.
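For illustration, a minimal pairing with the PECL gearman extension might look like the sketch below; the send_ftp function name, the JSON payload and the default gearmand port 4730 are assumptions. The worker is a long-lived process, so it can keep its FTP connection open between jobs.

```php
<?php
// worker.php - long-lived process started once (e.g. under a supervisor);
// it can open its FTP connection a single time and reuse it for every job.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);          // default gearmand port
$worker->addFunction('send_ftp', function (GearmanJob $job) {
    $payload = json_decode($job->workload(), true);
    // ... push $payload['data'] over the already-open FTP connection ...
    return 'ok';
});
while ($worker->work());
```

```php
<?php
// client side - called from the short-lived cron script; returns immediately.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('send_ftp', json_encode(array('target' => 'x', 'data' => $data)));
```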
I'm coming from a PHP background and I'm a bit confused about how to safely use resources with Golang. My main concern is that, in a web context, PHP scripts are usually short-lived (the HTTP request/response lifetime), but a Golang program is supposed to run forever (because it acts as a web server and a web application at the same time).
So, when it comes to dealing with database connections and log files, I often see that they should be opened once and not for each request, which makes sense. However, how stable is it to do this?
For example, if I open a database connection, how can I be sure it won't break at some point? (If the database decides to kill it for some reason, or if my machine loses internet access, would the connection become valid again when I regain access later on?) Same for log files: with PHP it's not a problem for a sysadmin to set up log rotation, but in Golang I think it would break the file handle (if the program doesn't know about it).
I'd really like to work the Golang way and not open/close those resources for each request, but I'm not sure what the "safe" way to do this is. Are there any recommendations, or built-in features, for these concerns?
In Go, there are built-in packages that handle both log files and databases in a Go-idiomatic way. In particular, database/sql gives you a *sql.DB that is a managed connection pool rather than a single connection; it opens, reuses and replaces connections as needed, so a dropped connection does not break your program.
They are:
database/sql: https://golang.org/pkg/database/sql
log: https://golang.org/pkg/log
In Go, as with any language where long-running daemons can be created, error checking and error handling will be important.
This is a bit complicated, so please don't jump to conclusions, feel free to ask about anything that is not clear enough.
Basically, I have a websocket server written in PHP. Please note that websocket messages are asynchronous, that is, a response to a request might take a lot of time, all the while the client keeps on working (if applicable).
Clients are supposed to ask the server for access to files on other servers. This can be an FTP service, or Dropbox, for the matter.
Here, please take note of two issues: connections should be shared and reused, and the server actually 'freezes' while it does its work, so any further requests are only processed after the server has 'unfrozen'.
Therefore, I thought, why not offload file access (which is what freezes the server) to PHP threads?
The problem here is twofold:
how do I make a connection resource in the main thread (the server) available to the sub threads (not possible with the above threading model)?
what would happen if two threads end up needing the same resource? It's perfectly fine if one is locked until the other one finishes, but we still need to figure out issue #1.
Perhaps my train of thought is all screwed up; if you can find a better solution, I'm eager to hear it. I've also had the idea of having a PHP thread host a connection resource, but it's pretty memory intensive.
PHP does not support threads. The purpose of PHP is to respond to web requests quickly; that's what the architecture was built for. Various libraries try to provide something like threads, but they usually cause more issues than they solve.
In general there are two ways to achieve what you want:
Off-load the long-running work to an external process. A common approach is using a system like Gearman: http://php.net/gearman
Use asynchronous operations. Some stream operations and the like provide an "async" flag or non-blocking mode: http://php.net/stream-set-blocking
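Here is a rough sketch of the second option under stated assumptions (placeholder hosts, and "consume $chunk" standing in for whatever the websocket server does with the data): the streams are switched to non-blocking mode and multiplexed with stream_select(), so no single slow remote freezes the server.

```php
<?php
// Non-blocking stream sketch: the hosts are placeholders. stream_select()
// waits at most one second, so the surrounding server loop keeps turning.
$streams = array();
foreach (array('host-a.example.com', 'host-b.example.com') as $host) {
    $s = stream_socket_client("tcp://$host:21", $errno, $errstr, 5);
    stream_set_blocking($s, false);              // reads no longer block
    $streams[(int) $s] = $s;
}

while ($streams) {
    $read   = $streams;
    $write  = null;
    $except = null;
    if (stream_select($read, $write, $except, 1) > 0) {
        foreach ($read as $s) {
            $chunk = fread($s, 8192);
            if ($chunk !== '' && $chunk !== false) {
                // consume $chunk here
            }
            if (feof($s)) {                      // remote side closed
                fclose($s);
                unset($streams[(int) $s]);
            }
        }
    }
    // other per-tick work (timers, pending websocket frames, ...) goes here
}
```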
I am designing a file download network.
The ultimate goal is to have an API that lets you directly upload a file to a storage server (no gateway or something). The file is then stored and referenced in a database.
When the file is requested, a server that currently holds the file is selected from the database and an HTTP redirect is done (or an API returns the currently valid direct URL).
Background jobs take care of desired replication of the file for durability/scaling purposes.
Background jobs also move files around to ensure even workload on the servers regarding disk and bandwidth usage.
There is no RAID or anything similar at any point. Every drive is just attached to the server as JBOD. All the replication happens at the application level. If one server breaks down, it is just marked as broken in the database and the background jobs take care of replicating from healthy sources until the desired redundancy is reached again.
The system also needs accurate stats for monitoring/balancing and maybe, later, billing.
So I thought about the following setup.
The environment is a classic Ubuntu, Apache2, PHP, MySQL LAMP stack.
A URL that hits the current storage server is generated by the API (that's no problem so far; just a classic PHP website and MySQL database).
Now it gets interesting...
The storage server runs Apache2, and a PHP script catches the request. The URL parameters (a secure token hash) are validated: IP, timestamp and filename are checked so the request is authorized. (No database connection required, just a PHP script that knows a secret token.)
The PHP script sets the response headers so that apache2 mod_xsendfile serves the file.
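As a hedged illustration of that dispatch script (the HMAC-style token, the shared secret and the /var/storage document root are my assumptions; the only mod_xsendfile-specific piece is the X-Sendfile header):

```php
<?php
// Sketch of the dispatch script described above. The HMAC token scheme, the
// shared $secret and the /var/storage root are assumptions; the only piece
// that mod_xsendfile itself requires is the X-Sendfile header.
$secret  = 'shared-secret-known-to-the-api';
$file    = isset($_GET['file'])    ? $_GET['file'] : '';       // "bucket/path/to/file"
$expires = isset($_GET['expires']) ? (int) $_GET['expires'] : 0; // unix timestamp
$token   = isset($_GET['token'])   ? $_GET['token'] : '';

$expected = hash_hmac('sha1', $file . '|' . $_SERVER['REMOTE_ADDR'] . '|' . $expires, $secret);

// reject expired links, bad tokens and path tricks ("..", absolute paths)
if ($expires < time() || $token !== $expected || strpos($file, '..') !== false) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}

header('X-Sendfile: /var/storage/' . $file);  // Apache streams the file itself
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($file) . '"');
// PHP is done here almost immediately; mod_xsendfile performs the transfer
```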
Apache delivers the file passed via mod_xsendfile and is configured to pipe the access log to another PHP script.
Apache runs mod_logio, and the access log is in the Combined I/O log format, additionally extended with the %D variable (the time taken to serve the request, in microseconds) to calculate transfer speeds and spot bottlenecks in the network.
The piped access log then goes to a PHP script that parses the URL (the first folder is a "bucket", just like Google Storage or Amazon S3, and is assigned to one client, so the client is known), counts input/output traffic and increments database fields. For performance reasons I thought about having daily rows, updating them like traffic = traffic + X, and creating the row if none was updated.
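A sketch of what that piped-log consumer could look like; the regex, the table layout and the unique key on (bucket, day) are assumptions, but it follows the "UPDATE traffic = traffic + X, INSERT if nothing matched" approach described above, using the same affected-rows mechanism the question below refers to.

```php
#!/usr/bin/php
<?php
// Sketch of the piped-log consumer described above. Field positions depend on
// the exact LogFormat line; the regex and the table layout are assumptions.
// Assumes a UNIQUE KEY on (bucket, day) so INSERT IGNORE cannot duplicate.
$db = mysqli_connect('localhost', 'stats', 'secret', 'storage');

while (($line = fgets(STDIN)) !== false) {
    // combined I/O format with %D appended: ... "GET /bucket/file ..." ... in out usec
    if (!preg_match('~"[A-Z]+ /([^/]+)/[^ ]* HTTP[^"]*".* (\d+) (\d+) (\d+)$~', $line, $m)) {
        continue;
    }
    list(, $bucket, $bytesIn, $bytesOut, $usec) = $m;
    $day = date('Y-m-d');

    mysqli_query($db, sprintf(
        "UPDATE traffic SET bytes_in = bytes_in + %d, bytes_out = bytes_out + %d
          WHERE bucket = '%s' AND day = '%s'",
        $bytesIn, $bytesOut, mysqli_real_escape_string($db, $bucket), $day
    ));

    if (mysqli_affected_rows($db) === 0) {       // no row yet for today
        mysqli_query($db, sprintf(
            "INSERT IGNORE INTO traffic (bucket, day, bytes_in, bytes_out)
             VALUES ('%s', '%s', %d, %d)",
            mysqli_real_escape_string($db, $bucket), $day, $bytesIn, $bytesOut
        ));
    }
}
```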
I have to mention that the servers will be low-budget machines with massive storage.
You can take a closer look at the intended setup in this thread on Server Fault.
The key data is that the systems will have gigabit throughput (maxed out 24/7) and the file requests will be rather large (so no images or loads of small files that produce high load through lots of log lines and requests) -- maybe 500 MB on average or so.
The currently planned setup runs on a cheap consumer mainboard (ASUS), 2 GB of DDR3 RAM and an AMD Athlon II X2 220 (2x 2.80 GHz, tray) CPU.
Of course download managers and range requests will be an issue, but I think the average size of an access will be at least 50 MB or so.
So my questions are:
Do I have any severe bottleneck in this flow? Can you spot any problems?
Am I right in assuming that mysql_affected_rows() can be read directly from the last request and does not issue another query to the MySQL server?
Do you think the system with the specs given above can handle this? If not, how could I improve? I think the first bottleneck would be the CPU, wouldn't it?
What do you think about it? Do you have any suggestions for improvement? Maybe something completely different? I thought about using lighttpd and the mod_secdownload module. Unfortunately it can't check the IP address, and I am not as flexible with it. It would have the advantage that the download validation would not need a PHP process to fire, but as the PHP process only runs briefly and doesn't read and output the data itself, I think this is OK. Do you? I once did downloads using lighttpd on old throwaway PCs and the performance was awesome. I also thought about using nginx, but I have no experience with that.
What do you think about the piped logging to a script that directly updates the database? Should I rather write requests to a job queue and update the database in a second process that can handle delays? Or not do it at all and parse the log files at night? My thinking is that I would like to have it as close to real time as possible and not have accumulated data anywhere other than in the central database. I also don't want to keep track of jobs running on all the servers; that could be a mess to maintain. There should be a simple unit test that generates a secured link, downloads it and checks whether everything worked and the logging has taken place.
Any further suggestions? I am happy for any input you may have!
I am also planning to open source all of this. I just think there needs to be an open source alternative to expensive storage services like Amazon S3 that is oriented toward file downloads.
I really searched a lot but didn't find anything like this out there. Of course I would reuse an existing solution, preferably open source. Do you know of anything like that?
MogileFS (http://code.google.com/p/mogilefs/) -- this is almost exactly the thing you want.
I'm trying to figure out the best way to minimize resource utilization when I have PHP talking to various backend services (e.g. Amazon S3 or any other random web services -- I'd like a general solution). Ideally, I'd like to have a single persistent connection to the backend (or maybe a small pool of persistent connections) with some caching, and then have all of the PHP tasks share it. We can consider it all read-only for the purposes of this question. It's not obvious to me how to do this in PHP. There's the database-specific stuff like mysql_pconnect(), but that doesn't really do it for me.
One idea I've had, which seems somewhat suboptimal (but is still better than having every single request create and destroy a new connection), is to use a local caching proxy (in a separate process) that would effectively do the pooling and caching. PHP would still be opening and closing a connection for every request, but at least it would be to a local process, so it should be a little faster (and it would reduce load on the backends). But it doesn't seem like this kind of craziness should be necessary. There's gotta be a better way. This is easy in other languages. Please tell me what I'm missing!
There's a large ideological disconnect between the various web technologies. Some are essentially daemons that run full-time in the background, and handle requests passed in on their own. Because there's a process always running, you can have a pool of already open existing working connections.
PHP (and normal CGI scripts) does not have a daemon behind the scenes. Every time a request comes in, the PHP interpreter is started up with a clean slate, compiles the scripts, and runs the bytecode. There's no persistence. The PHP database functions that support persistent connections establish the connection at the web server child level (i.e. mod_php attached to an Apache process). This isn't exactly a connection pool, as you can only ever see the persistent connection attached to your own process.
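To make that per-process persistence concrete, here is a tiny illustration with mysqli (the credentials are placeholders); prefixing the host with "p:" is what turns on persistent connections there.

```php
<?php
// With the "p:" prefix, mysqli reuses a connection already opened by this
// same Apache/PHP worker process; another worker gets its own connection,
// so this is per-process persistence, not a shared cross-process pool.
$db = new mysqli('p:localhost', 'user', 'secret', 'mydb');
```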
Without having a daemon or similar process sitting behind the scenes to hand out resources, you won't get real connection pooling.
Keep in mind that most new connections to most services are not heavy-weight, and non-database connections that are heavy-weight might not be friendly to the concept of a connection pool.
Before you think about writing your own PHP-based daemon to handle stuff like this, keep in mind that it may already be a solved problem. Python came up with something called WSGI, with a similar implementation in Ruby called Rack. Perl also has something remarkably similar but I can't remember the name of it off the top of my head. A quick look at Google didn't show any PHP implementations of WSGI, but that doesn't mean they don't exist...
Because S3 and other webservices use HTTP as their transport, you won't get a significant benefit from caching the connection.
Although you may be using an API that appears to authenticate as a first step, looking at the S3 documentation, the authentication happens with every request, so there is no benefit in authenticating once and reusing a connection.
Web service requests over HTTP are lightweight and typically stateless. Once your request has been answered, no resources (connection or session state) are consumed on the server. This allows the web service implementer to use many machines to answer your request without tying up resources on a particular server.
I have made a chat script using PHP, MySQL and jQuery. It uses JSON to get data from the server: it makes fixed-interval requests to the server with the last fetched message id to get new messages. But when multiple users are chatting, an enormous number of requests will be made to the server within an hour, and the hosting company will block it for sure.
Gmail chat uses sockets, I think, because it certainly does not send fixed-interval requests. Could any of you please give me some sample code or some direction to solve this issue?
I need help desperately.
Many thanks in advance. My respect and regards to all.
If the host you are using would "block it for sure" if it's making that many requests, then you may want to consider getting a different host or upgrading your hosting package before worrying about your code. Check out how Facebook implements their chat:
"The method we chose to get text from one user to another involves loading an iframe on each Facebook page, and having that iframe's Javascript make an HTTP GET request over a persistent connection that doesn't return until the server has data for the client. The request gets reestablished if it's interrupted or times out. This isn't by any means a new technique: it's a variation of Comet, specifically XHR long polling, and/or BOSH."
You may find it useful to see an example of 'comet' technology in action using Prototype's comet daemon and a Jetty webserver. The Jetty download includes an example chat application.
I recently installed jetty myself so you might find a log of my installation commands useful:
Getting started trying to run a comet service
Download Maven from http://maven.apache.org/
Install Maven using http://maven.apache.org/download.html#Installation
I did the following commands
Extracted to /home/sdwyer/apache-maven-2.0.9
> sdwyer@pluto:~/apache-maven-2.0.9$ export M2_HOME=/home/sdwyer/apache-maven-2.0.9
> sdwyer@pluto:~/apache-maven-2.0.9$ export M2=$M2_HOME/bin
> sdwyer@pluto:~/apache-maven-2.0.9$ export PATH=$M2:$PATH
> sdwyer@pluto:~/apache-maven-2.0.9$ mvn --version
-bash: /home/sdwyer/apache-maven-2.0.9/bin/mvn: Permission denied
> sdwyer@pluto:~/apache-maven-2.0.9$ cd bin
> sdwyer@pluto:~/apache-maven-2.0.9/bin$ ls
m2 m2.bat m2.conf mvn mvn.bat mvnDebug mvnDebug.bat
> sdwyer@pluto:~/apache-maven-2.0.9/bin$ chmod +x mvn
> sdwyer@pluto:~/apache-maven-2.0.9/bin$ mvn --version
Maven version: 2.0.9
Java version: 1.5.0_08
OS name: "linux" version: "2.6.18-4-686" arch: "i386" Family: "unix"
sdwyer@pluto:~/apache-maven-2.0.9/bin$
Download the jetty server from http://www.mortbay.org/jetty/
Extract to /home/sdwyer/jetty-6.1.3
> sdwyer@pluto:~$ cd jetty-6.1.3/examples/cometd-demo
> mvn jetty:run
A whole stack of downloads run
Once it's completed, open a browser and point it to:
http://localhost:8080 and test the demos.
The code for the example demos can be found in the directory:
jetty-6.1.3/examples/cometd-demo/src/main/webapp/examples
Right or wrong, a hosting company might get cranky for a couple reasons:
1) Odds are good they are using Apache prefork. Each chat request is probably going to be a new connection and thus tie up a single Apache process, and each Apache process eats anywhere from 1 MB to 100 MB of memory.
2) If they maintain the database server and you, the client, suck at database programming, you can hammer their database. "Suck" means anything from "no proper indexing" to "makes a bazillion tiny queries instead of nice fat ones".
As has been suggested above, make sure your code uses persistent connections. Also:
1) Implement a back-off algorithm on the client. Poll the server once a second during activity, then back off to five seconds, then ten, twenty, etc. That way you don't hammer the server when there is no activity.
2) Multiple tabs will kill you. User opens 10 tabs and they all have your chat widget polling the server once a second? Bad news. Even if your host doesn't get pissed, your performance will degrade.
If this thing gets huge, design your system in a way that lets you run the chat-server bits independently from the rest of your web application. In other words, the clients would be making requests to "chat.yourwebapp.com", which in turn is running on something like lighttpd.
Try sockets in JavaScript:
http://code.google.com/p/jsocket/
Why would the host block that? You're making a standard HTTP request for a page; if your host doesn't allow that, then it's time to switch.
As for using sockets, there is no native ability to connect to a socket via JavaScript, although I believe JSocket is a library that lets you bridge a socket through an embedded Flash object which is actually connected to your server. I haven't looked for a jQuery plugin that does this; there might be one.
Your server-side code would also change drastically (persistent vs. polling is very different), so you'd have your work cut out for you.
I recommend just doing what you are doing and upgrading your host if it can't handle it, unless you're going to have a huge number of users on at a time. A caching system, so you're not hitting the DB on every single request, can probably speed things up if it gets that busy.
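One hedged way that caching could look, assuming the APC extension (APCu has apcu_* equivalents) and a made-up messages table: keep the newest message id in the cache and only query MySQL when it has actually advanced.

```php
<?php
// Sketch of the caching idea: the cache key, table and credentials are
// assumptions. Cache the newest message id; skip MySQL when nothing changed.
$lastSeen = isset($_GET['last_id']) ? (int) $_GET['last_id'] : 0;
$newest   = apc_fetch('chat_newest_id');

if ($newest !== false && (int) $newest <= $lastSeen) {
    header('Content-Type: application/json');
    echo json_encode(array());                // nothing new: no DB hit at all
    exit;
}

$db   = new mysqli('localhost', 'chat', 'secret', 'chat');
$res  = $db->query("SELECT id, user, body FROM messages WHERE id > $lastSeen ORDER BY id");
$rows = array();
while ($row = $res->fetch_assoc()) {
    $rows[] = $row;
}
if ($rows) {
    // whoever INSERTs new messages should also refresh this key
    apc_store('chat_newest_id', (int) $rows[count($rows) - 1]['id']);
}
header('Content-Type: application/json');
echo json_encode($rows);
```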
You could think about embedding a small Flash movie in the page and then using sockets to handle the communication with the server. This would take a lot of load off the server and make it much easier to keep everything in sync. The UI could still be built with JavaScript.
If you stay with your JavaScript solution, then silently ignore my answer :-)