I've recently set up a test server running in a virtual machine on my computer so I can do such things as interactive debugging with XDebug. For the most part it's pretty sweet, but I've run into a snag when running multiple requests to the server at once from the same client.
The problem is that the guest-host network connection doesn't really exist as a physical connection, so it runs as fast as the computer hardware will allow. This isn't usually a big issue, but I'm trying to implement APC file upload monitoring, which requires an AJAX request to run in parallel with the file upload to monitor its progress. In the real world, the network would introduce lag and latency and suchlike, leaving enough unused bandwidth for the AJAX request to run in parallel with the file upload. However, in the test machine, the AJAX request can't fetch any data from the server until the upload is finished, as there's absolutely no bandwidth left available to it.
Is it possible to set up some kind of bandwidth management in the virtual machine (in Apache, PHP or some Linux utility) that could limit the bandwidth available per HTTP request? For example, so that each request is limited to 1 Mbps, but several requests can exist between the client and the server at the same time? I'm hoping that if this can be done it will allow the AJAX request to fetch its data while the upload is progressing instead of being stalled until the upload actually completes.
I tried a utility called IPRelay, but I don't seem able to get it to work, or at least not in a way that limits per request.
What you're asking for is called Traffic Shaping.
Lighttpd (an alternative to Apache) supports this natively.
For Apache, there are a few ways of doing it:
mod_bandwidth - A third-party module (that hasn't been updated recently) which appears to do the same thing.
mod_bwshare - A third-party module designed to combat DoS attacks, but it may be helpful.
Here's a ServerFault Question that may be relevant...
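To make the idea of per-connection traffic shaping a bit more concrete, here is a minimal token-bucket sketch in Python. It is only an illustration of the general technique, not how Lighttpd, mod_bandwidth or mod_bwshare actually work. The names TokenBucket and throttled_chunks, the chunk size and the injectable clock/sleep parameters are all assumptions made up for this example.

```python
import time


class TokenBucket:
    """Hypothetical per-connection limiter: allows `rate` bytes per second on average."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens (bytes) added per second
        self.capacity = float(capacity)  # maximum burst size in bytes
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add the tokens earned since the last call, capped at the burst size.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def delay_for(self, nbytes):
        """Consume nbytes and return how many seconds the caller should wait first."""
        self._refill()
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # The bucket is in debt; wait until the debt would be paid off.
        return -self.tokens / self.rate


def throttled_chunks(data, bucket, chunk_size=16384, sleep=time.sleep):
    """Yield data in chunks, sleeping so the average rate stays near bucket.rate."""
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        wait = bucket.delay_for(len(chunk))
        if wait > 0:
            sleep(wait)
        yield chunk
```

Giving each connection its own bucket (for example rate=125000, roughly 1 Mbps) is what would leave room for a second request, such as the AJAX poll, while a large upload is in flight.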
Thanks for the reply. However, I found a handy little utility for Linux called iprelay that lets me throttle connections. It seems to let me have multiple connections open, with each connection throttled to the specified limit. That's what I've been using today for testing my APC code, and it all seems to be working fine.
I recently finished building a web server whose main responsibility is simply to take the contents of the body data in each HTTP POST request and write it to a log file. The POST data is obfuscated when received, so I'm deobfuscating it and writing it to a log file on the server. The deobfuscated content is a series of random key-value pairs that differ between every request. It is not fixed data.
The server is running Linux with 2.6+ kernel. Server is configured to handle heavy traffic (open files limit 32k, etc). The application is written in Python using web.py framework. The http server is Gunicorn behind Nginx.
After using Apache Benchmark to do some load testing, I noticed that it can handle up to about 600-700 requests per second without any log writing issues. Linux natively does a good job at buffering. Problems start to occur when more than this many requests per second attempt to write to the same file at the same moment. Data will not get written and information will be lost. I know that the "writing directly to a file" design might not have been the right solution from the get-go.
So I'm wondering if anyone can propose a solution that I can implement quickly, without altering too much infrastructure and code, that can overcome this problem?
I have read about in-memory storage like Redis, but I have realized that if data is sitting in memory during a server failure then that data is lost. I have read in the docs that Redis can be configured as a persistent store; there just needs to be enough memory on the server for Redis to do it. This solution would mean that I would have to write a script that would dump the data from Redis (memory) to the log file at a certain interval.
I am wondering if there is even a quicker solution? Any help would be greatly appreciated!
One option I can think of is a separate logging process, so that your web.py application is shielded from logging performance issues. This is the classical way of handling a logging module. You can use IPC or any other bus communication infrastructure. With this you will be able to address the following issues -
Logging will not be a huge bottleneck for high-capacity call flows.
A separate module can provide a facility to switch logging on and off.
There would not be any significant increase in process memory usage.
However, you should bear in mind the points below -
You need to be sure that logging is restricted to just logging. It must not be a data store for business processing, or else you may have many synchronization problems in your business logic.
The logging process (here I mean an actual Unix process) will become critical and slightly complex (i.e. you may have to handle a form of IPC).
HTH!
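A rough sketch of the separate-logging-process idea in Python, assuming a multiprocessing queue as the IPC channel. The function names log_writer, start_logger and stop_logger are made up for illustration; nothing here comes from web.py or Gunicorn.

```python
import multiprocessing as mp


def log_writer(queue, path):
    """Runs in its own process: the only code that ever touches the log file."""
    with open(path, "a", encoding="utf-8") as log_file:
        while True:
            line = queue.get()
            if line is None:  # sentinel value: stop after draining earlier lines
                break
            log_file.write(line + "\n")
            log_file.flush()


def start_logger(path):
    """Start the writer process and return (queue, process)."""
    queue = mp.Queue()
    process = mp.Process(target=log_writer, args=(queue, path))
    process.start()
    return queue, process


def stop_logger(queue, process):
    """Ask the writer to finish the queued lines and exit."""
    queue.put(None)
    process.join()
```

A request handler would then only call queue.put(line) and return, so concurrent requests never write to the file themselves. With several Gunicorn workers the queue would have to be shared by all of them (or replaced by a socket or syslog), which this sketch leaves out. Lines still sitting in the queue are lost if the machine crashes, the same trade-off as with any in-memory buffer.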
This is a bit complicated, so please don't jump to conclusions, feel free to ask about anything that is not clear enough.
Basically, I have a websocket server written in PHP. Please note that websocket messages are asynchronous, that is, a response to a request might take a lot of time, all the while the client keeps on working (if applicable).
Clients are supposed to ask the server for access to files on other servers. This can be an FTP service, or Dropbox, for that matter.
Here, please take note of two issues: connections should be shared and reused and the server actually 'freezes' while it does its work, hence any requests are processed after the server has 'unfrozen'.
Therefore, I thought, why not offload file access (which is what freezes the server) to PHP threads?
The problem here is twofold:
How do I make a connection resource in the main thread (the server) available to the sub threads (not possible with the above threading model)?
What would happen if two threads end up needing the same resource? It's perfectly fine if one is locked until the other one finishes, but we still need to figure out issue #1.
Perhaps my train of thought is all screwed up, if you can find a better solution, I'm eager to hear it out. I've also had the idea of having a PHP thread hosting a connection resource, but it's pretty memory intensive.
PHP supports no threads. The purpose of PHP is to respond to web requests quickly. That's what the architecture was built for. Different libraries try to do something like threads but they usually cause more issues than they solve.
In general there are two ways to achieve what you want:
Off-load the long-running processes to an external process. A common approach is using a system like Gearman: http://php.net/gearman
Use asynchronous operations. Some stream operations provide an "async" flag or a "non-blocking" mode: http://php.net/stream-set-blocking
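To show what the non-blocking mode buys you, here is a small sketch in Python rather than PHP (Python's setblocking(False) plays the role that stream_set_blocking plays in PHP). The helper names register_connection and poll_once are hypothetical. The point is only that the server loop asks "is there data right now?" instead of freezing until a slow remote side answers.

```python
import selectors
import socket


def register_connection(selector, conn, name):
    """Switch a socket to non-blocking mode and watch it for readable data."""
    conn.setblocking(False)
    selector.register(conn, selectors.EVENT_READ, data=name)


def poll_once(selector, timeout=0.0):
    """Return (name, data) for every connection that has data available now."""
    received = []
    for key, _events in selector.select(timeout):
        try:
            data = key.fileobj.recv(4096)
        except BlockingIOError:
            continue  # nothing to read after all; try again on the next pass
        received.append((key.data, data))
    return received
```

A websocket server loop could call poll_once on every iteration and keep serving other clients in between. Because all connections live in the one main loop, the question of sharing a connection resource between threads does not come up in this model.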
I am designing a file download network.
The ultimate goal is to have an API that lets you directly upload a file to a storage server (no gateway or something). The file is then stored and referenced in a database.
When the file is requested, a server that currently holds the file is selected from the database and an HTTP redirect is done (or an API gives the currently valid direct URL).
Background jobs take care of desired replication of the file for durability/scaling purposes.
Background jobs also move files around to ensure even workload on the servers regarding disk and bandwidth usage.
There is no RAID or anything similar at any point. Every drive is just hung into the server as JBOD. All the replication is at application level. If one server breaks down it is just marked as broken in the database, and the background jobs take care of replication from healthy sources until the desired redundancy is reached again.
The system also needs accurate stats for monitoring / balancing and maybe later billing.
So I thought about the following setup.
The environment is a classic Ubuntu, Apache2, PHP, MySql LAMP stack.
A URL that hits the current storage server is generated by the API (that's no problem so far. Just a classic PHP website and MySQL database).
Now it gets interesting...
The Storage server runs Apache2 and a PHP script catches the request. URL parameters (secure token hash) are validated. IP, Timestamp and filename are validated so the request is authorized. (No database connection required, just a PHP script that knows a secret token).
The PHP script sets the file header to use Apache2's mod_xsendfile.
Apache delivers the file passed by mod_xsendfile and is configured to have the access log piped to another PHP script.
Apache runs mod_logio, and the access log is in Combined I/O log format, additionally extended with the %D variable (the time taken to serve the request, in microseconds) to calculate the transfer speed, spot bottlenecks in the network and so on.
The piped access log then goes to a PHP script that parses the URL (the first folder is a "bucket", just as in Google Storage or Amazon S3, that is assigned to one client, so the client is known), counts input/output traffic and increases database fields. For performance reasons I thought about having daily fields and updating them like traffic = traffic+X, and if no row has been updated, creating it.
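The "update, and create the row if nothing was updated" step could look roughly like the following. This is a sketch in Python with sqlite3 standing in for MySQL, and the table name daily_traffic, its columns and the helper names are assumptions made up for the example.

```python
import sqlite3


def open_stats_db(path=":memory:"):
    """Create the (hypothetical) per-bucket, per-day counter table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_traffic ("
        " bucket TEXT NOT NULL, day TEXT NOT NULL,"
        " bytes_out INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (bucket, day))"
    )
    return conn


def bucket_from_path(url_path):
    """The first path segment is the bucket: '/client-a/file.zip' -> 'client-a'."""
    return url_path.lstrip("/").split("/", 1)[0]


def add_traffic(conn, bucket, day, nbytes):
    """UPDATE first; INSERT only when the UPDATE touched no row."""
    cur = conn.execute(
        "UPDATE daily_traffic SET bytes_out = bytes_out + ?"
        " WHERE bucket = ? AND day = ?",
        (nbytes, bucket, day),
    )
    if cur.rowcount == 0:  # plays the role of checking the affected-row count
        conn.execute(
            "INSERT INTO daily_traffic (bucket, day, bytes_out) VALUES (?, ?, ?)",
            (bucket, day, nbytes),
        )
    conn.commit()
```

With several log processors writing at once, two of them could both see zero affected rows and both try to insert. In MySQL a single INSERT ... ON DUPLICATE KEY UPDATE statement avoids that race.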
I have to mention that the servers will be low-budget servers with massive storage.
You can have a close look at the intended setup in this thread on serverfault.
The key data is that the systems will have Gigabit throughput (maxed out 24/7) and the file requests will be rather large (so no images or loads of small files that produce high load through lots of log lines and requests). Maybe on average 500MB or something!
The currently planned setup runs on a cheap consumer mainboard (Asus), 2 GB DDR3 RAM and an AMD Athlon II X2 220, 2x 2.80GHz tray CPU.
Of course download managers and range requests will be an issue, but I think the average size of an access will be around at least 50 megs or so.
So my questions are:
Do I have any severe bottleneck in this flow? Can you spot any problems?
Am I right in assuming that mysql_affected_rows() can be directly read from the last request and does not do another request to the mysql server?
Do you think the system with the specs given above can handle this? If not, how could I improve it? I think the first bottleneck would be the CPU, wouldn't it?
What do you think about it? Do you have any suggestions for improvement? Maybe something completely different? I thought about using Lighttpd and the mod_secdownload module. Unfortunately it can't check the IP address, and I am not as flexible with it. It would have the advantage that the download validation would not need a PHP process to fire. But as the PHP process only runs briefly and doesn't read and output the data itself, I think this is OK. Do you? I once did downloads using lighttpd on old throwaway PCs and the performance was awesome. I also thought about using nginx, but I have no experience with that.
What do you think about the piped logging to a script that directly updates the database? Should I rather write requests to a job queue and update them in the database in a second process that can handle delays? Or not do it at all but parse the log files at night? My thought is that I would like to have it as real-time as possible and not have accumulated data somewhere other than in the central database. I also don't want to keep track of jobs running on all the servers. This could be a mess to maintain. There should be a simple unit test that generates a secured link, downloads it and checks whether everything worked and the logging has taken place.
Any further suggestions? I am happy for any input you may have!
I am also planning to open source all of this. I just think there needs to be an open source alternative to the expensive storage services such as Amazon S3 that is oriented towards file downloads.
I really searched a lot but didn't find anything like this out there. Of course I would reuse an existing solution, preferably open source. Do you know of anything like that?
MogileFS, http://code.google.com/p/mogilefs/ -- this is almost exactly what you want.
EDIT: Update - scroll down
EDIT 2: Update - problem solved
Some background information:
I'm writing my own webserver in Java and a couple of days ago I asked on SO how exactly Apache interfaces with PHP, so I can implement PHP support. I learnt that FastCGI is the best approach (since mod_php is not an option). So I have looked at the FastCGI protocol specification and have managed to write a working FastCGI wrapper for my server. I have tested phpinfo() and it works, in fact all PHP functions seem to work just fine (posting data, sessions, date/time, etc etc).
My webserver is able to serve requests concurrently (ie user1 can retrieve file1.html at the same time as user2 requesting some_large_binary_file.zip), it does this by spawning a new Java thread for each user request (terminating when completed or user connection with client is cancelled).
However, it cannot deal with 2 (or more) FastCGI requests at the same time. What it does is, it queues them up, so when request 1 is completed immediately thereafter it starts processing request 2. I tested this with 2 PHP pages, one contains sleep(10) and the other phpinfo().
How would I go about dealing with multiple requests? I know it can be done (PHP under IIS runs as FastCGI and it can deal with multiple requests just fine).
Some more info:
I am coding under Windows, and the batch file I use to execute php-cgi.exe contains:
set PHP_FCGI_CHILDREN=8
set PHP_FCGI_MAX_REQUESTS=500
php-cgi.exe -b 9000
But it does not spawn 8 children, the service simply terminates after 500 requests.
I have done research and from Wikipedia:
"Processing of multiple requests simultaneously is achieved either by using a single connection with internal multiplexing (i.e. multiple requests over a single connection) and/or by using multiple connections"
Now clearly the multiple connections approach isn't working for me, as every time a client requests something that involves FastCGI it creates a new socket to the FastCGI application, but it does not work concurrently (it queues them up instead).
I know that internal multiplexing of FastCGI requests under the same connection is accomplished by issuing each unique FastCGI request with a different request ID.
(also see the last 3 paragraphs of 'The Communication Protocol' heading in this article).
I have not tested this, but how would I go about implementing that? I take it I need some kind of FastCGI Java thread which contains a Map of some sort and a static function which I can use to add requests to. Then in the Thread's run() function it would have a while loop and for every cycle it would check whether the Map contains new requests, if so it would assign them a request ID and write them to the FastCGI stream. And then wait for input etc etc, As you can see this becomes too complicated.
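For what it's worth, the framing side of multiplexing is fairly small. Below is a sketch in Python (not the Java wrapper mentioned above) of how records carry a request ID in the 8-byte FastCGI header and how responses can be sorted back out by that ID. The constants follow the FastCGI specification; the function names are made up for illustration.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_STDOUT = 6
FCGI_RESPONDER = 1
FCGI_KEEP_CONN = 1
# version, type, requestId, contentLength, paddingLength, reserved
HEADER = struct.Struct("!BBHHBB")


def encode_record(rec_type, request_id, content=b""):
    """Build one FastCGI record; content must fit in 65535 bytes."""
    padding = -len(content) % 8  # pad the body to a multiple of 8 bytes
    header = HEADER.pack(FCGI_VERSION_1, rec_type, request_id, len(content), padding, 0)
    return header + content + b"\x00" * padding


def begin_request(request_id, keep_conn=True):
    """FCGI_BEGIN_REQUEST body: role (2 bytes), flags (1 byte), 5 reserved bytes."""
    body = struct.pack("!HB5x", FCGI_RESPONDER, FCGI_KEEP_CONN if keep_conn else 0)
    return encode_record(FCGI_BEGIN_REQUEST, request_id, body)


def decode_records(buffer):
    """Yield (type, request_id, content) for each complete record in buffer."""
    offset = 0
    while offset + HEADER.size <= len(buffer):
        _ver, rec_type, request_id, length, padding, _res = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end + padding > len(buffer):
            break  # incomplete record; a real reader would wait for more bytes
        yield rec_type, request_id, buffer[offset + HEADER.size:end]
        offset = end + padding


def demultiplex(buffer):
    """Group FCGI_STDOUT payloads by request ID."""
    outputs = {}
    for rec_type, request_id, content in decode_records(buffer):
        if rec_type == FCGI_STDOUT:
            outputs[request_id] = outputs.get(request_id, b"") + content
    return outputs
```

Whether the application on the other end accepts interleaved requests at all is a separate matter. The specification lets a server ask via FCGI_GET_VALUES (the FCGI_MPXS_CONNS variable), and an application that answers "0" expects one request per connection at a time, in which case multiple connections are the only route to concurrency.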
Does anyone know the correct way of doing this? Or any thoughts at all? Thanks very much.
Note, if required I can supply the code for my FastCGI wrapper.
Update:
Basically, I downloaded nginx and set it up to use PHP as a FastCGI application, and it too suffered from the same problem as my server. It could not handle concurrent PHP requests. This leads me to believe my code is in fact correct, so something is wrong with PHP or I am not setting it up correctly. Maybe it is because I am using Windows, as some lighttpd users claim Windows can't handle FastCGI properly (this doesn't make much sense). I'll install Linux sometime soon and report any progress with that.
Okay, I managed to find the cause of the problem. It wasn't my code at all. It's PHP: it cannot spawn additional php-cgi processes under Windows when running in FastCGI mode. Under Linux it works perfectly; I simply pointed my server to my Linux box's IP and it had no problems with concurrent FCGI requests. Sucks, but I guess that's the way it is...
I did look deeper into the PHP source code after that and found that the section of code which responds to PHP_FCGI_CHILDREN is wrapped in #ifndef WIN32, so the developers must be aware of the issue.
Hi, this comes a little late, but I've written a spawner for php-cgi.exe on Windows. It's not perfect, but it might be what you need. Check it out here.
re: spawn-php python script...
Thanks @nosam, that really helped.
For those wanting to get it working quickly, you'll need the following (on a 64-bit system):
ActivePython-2.7.2.5-win64-x64.msi
pywin32-217.win-amd64-py2.7.exe
ActivePython does not have older versions of these on their website, so you will need to do a bit of googling around to find a working mirror (there are plenty out there).
Once you have downloaded the source from Bitbucket you may need to edit spawn-php.py (to fix up the tab spacing), as Bitbucket seemed to mess up the tabs in the file, preventing it from running.
All in all, that saved my day for a busy little Windows website using nginx + FastCGI.
Thanks mate!
Good day!
I've found interesting behaviour for both LAMP stack and ASP.NET.
The scenario:
There is a page performing a task that takes 2-3 minutes (making an HttpWebRequest for ASP.NET and using curl for PHP). While this page is being processed, all other requests to this virtual host from the same browser are not processed (even if I use different browsers on one machine). I use two pages, written in PHP and C#.
I've tested with Apache+PHP in both mod_php and FastCGI modes on Windows and Debian.
For ASP.NET I use IIS6 (with a dedicated app pool for this site and with the default app pool) and IIS7 in integrated mode.
I know that it is better to use async calls for such things, but I'm just curious why a single page blocks the entire site and not only the thread processing the request?
Thanks in advance!
It seems you open a standard PHP session, which stays open until the end of the request. That means the session file is locked. Call session_write_close() as soon as possible once you no longer need to write session data.
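To illustrate why the second request stalls, here is a toy model of per-session locking in Python. It is not PHP's actual session handler; the SessionStore class and its open/close methods are invented stand-ins for session_start() and session_write_close().

```python
import threading


class SessionStore:
    """Toy stand-in for file-based sessions: one exclusive lock per session ID."""

    def __init__(self):
        self._locks = {}
        self._data = {}
        self._guard = threading.Lock()

    def open(self, session_id, blocking=True):
        """Like session_start(): take the session's lock, then return its data.

        With blocking=False, return None instead of waiting for the lock.
        """
        with self._guard:
            lock = self._locks.setdefault(session_id, threading.Lock())
        if not lock.acquire(blocking):
            return None
        return self._data.setdefault(session_id, {})

    def close(self, session_id):
        """Like session_write_close(): release the lock so other requests can run."""
        self._locks[session_id].release()
```

A long-running request that calls open() and only calls close() at the very end keeps every other request with the same session ID waiting, while requests with a different session ID are unaffected. That would match the symptom of one browser being blocked while the site stays up for other clients.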
I don't think it's blocking the site; I would suspect that the open connection is blocking the client from making more requests. Have you proven that other machines can't use the site while your long-running request is in progress?
If you only see a single request coming to the app the only thing I can think of is a global lock somewhere in the pipeline.
The lock can be explicit (you wrote the lock statement) or implicit. If you can see several requests, it can be due to thread pool exhaustion.
Keep in mind that in addition to the cap on the number of threads used to process incoming web requests there is a separate cap on the number of simultaneous outgoing web requests through HttpWebRequest and by default this limit is very low - if I remember correctly 2 per CPU. I do not remember the name of the setting in the web.config, but will try to look it up.
In any case, posting the code would give us a better chance to assist you.
I've definitely noticed this behavior while debugging ASP.NET applications, but I have always just assumed it was a debug config issue. Are you building everything in release mode and have debugging turned off in your web.config?
ASP.NET applications have a global session lock.
Use EnableSessionState="ReadOnly" for WebForms or [SessionState(SessionStateBehavior.ReadOnly)] for MVC. It will prevent the lock (of course you can't write anything to a read-only session).
I just discovered why all ASP.Net websites are slow, and I am trying to work out what to do about it