I have created a simple web application with my own framework, and I am unsure how much splitting the PHP code into many files for reusability will affect performance. I have used CodeIgniter, and compared with it my framework has more files to process.
In order to properly answer this question you have to know various things about your hard drive: its IOPS, cluster size, seek time, SATA connection, and/or RAID configuration.
Once you know this and can calculate the time it takes to read a specific file size from your disk, you can begin calculating how many requests per second would bog down your system.
Once you know this then you need to anticipate how many users are going to hit the system at once.
Another factor is CPU power and RAM speed because if your script is complex or uses a lot of memory then your CPU will be doing a lot of work and hopefully the RAM can keep up.
If you don't want to follow all these steps then run a while() loop that creates, reads, and deletes 5000 dynamic files between 4-50 KB each and use microtime(true) to bench it.
If you are on a shared hosting plan then your only option might be to implement the benchmarking idea at various peak and down times. I will bet that a 2am benchmark will fare much better than a 2pm one.
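The benchmark described above can be sketched roughly as follows. This is a Python sketch rather than PHP, with `time.perf_counter()` standing in for `microtime(true)`; the function name, the file naming scheme, and the fixed random seed are assumptions made for illustration. Note that the operating system's page cache will serve most of the reads, so the number mostly reflects file creation and deletion overhead.

```python
import os
import random
import tempfile
import time


def bench_file_churn(count=5000, min_kb=4, max_kb=50, seed=0):
    """Create, read back, and delete `count` files; return elapsed seconds."""
    rng = random.Random(seed)  # fixed seed so repeated runs use the same sizes
    with tempfile.TemporaryDirectory() as workdir:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(workdir, f"bench_{i}.tmp")
            payload = b"x" * (rng.randint(min_kb, max_kb) * 1024)
            with open(path, "wb") as fh:
                fh.write(payload)
            with open(path, "rb") as fh:
                data = fh.read()
            if len(data) != len(payload):
                raise RuntimeError(f"short read on {path}")
            os.remove(path)
        return time.perf_counter() - start
```

Calling `bench_file_churn()` at a quiet hour and again at a busy hour gives two numbers that can be compared directly, which is the comparison suggested for shared hosting.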
Good luck!
Theoretically, the number of files matters, but in practice it has little effect. For example, splitting one file into 2 files will not make a noticeable difference, but splitting it into 100 files might.
I have a server with 2 physical CPUs, which together have 24 cores, and 10 GB of RAM.
The PHP program is calculating a statistic, and I could run each section totally independently of the others. Once all calculations have finished, I only have to "merge" them.
Therefore I had the idea to perform each calculation phase in a separate thread created/controlled by "pthreads".
Each calculation takes around 0.10 seconds, but because there are so many of them, the whole run takes a long time when they are executed serially.
My questions:
Is there a limitation when creating a new "thread" with "pthreads"?
What is the overhead when creating a new thread? I must consider this to avoid a new delay.
I can imagine that for several seconds the load would be very high, but then it ends suddenly once each calculation has finished. This is not a problem. It is "my" server, so I do not have to worry about other users as I would on a shared server.
While "waiting" for an answer :-) I started to rewrite the class.
I can summarize it like this:
There is no way to start 600 threads at once. I expected that, but I wanted to know where the limit is. My configuration "allowed" around 160 threads to be started.
When starting more threads than that, the PHP script stopped working without any further notice.
As Franz Gleichmann pointed out, the whole process took longer when starting a lot of threads. I found that starting 20 threads gave the best performance.
The achieved performance gain is between 20% and 50% - I am satisfied.
I don't know if it is a bug in the pthreads library, but I could not access any class members. I had to move the class members inside the function. Since the calculation is in one function, it did not bother me and I did not investigate it further.
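The "fixed number of workers, then merge" structure described above can be sketched like this. It is Python, not the pthreads extension's API, and `calculate_section` is a hypothetical stand-in for one independent calculation. The pool size of 20 is taken from the post. In CPython, threads do not speed up CPU-bound work, so this only illustrates how to cap the number of concurrent workers instead of starting one thread per job.

```python
from concurrent.futures import ThreadPoolExecutor


def calculate_section(section_id):
    """Hypothetical stand-in for one independent calculation."""
    return section_id, sum(i * i for i in range(1000 + section_id))


def run_sections(section_ids, max_workers=20):
    """Run every section on a fixed-size pool and merge the results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() queues all jobs but never runs more than max_workers at once
        for section_id, value in pool.map(calculate_section, section_ids):
            merged[section_id] = value
    return merged
```

With this shape, 600 jobs can be submitted without ever having 600 threads alive, which avoids running into the thread limit in the first place.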
I wrote a PHP-CLI script that mixes two audio (.WAV PCM) files (with some math involved) so PHP needs to crunch through thousands (if not even millions) of samples with unpack(), do math on them and save them with pack().
Now, I don't need actual info on how to do the mixing or anything; as the title says, I'm looking for possibilities to speed this process up, since the script needs 30 seconds of processing time to produce 10 seconds of audio output.
Things that I tried:
Cache the audiofiles to memory and crunch through with substr() instead of fseek()/fread(). Performance gain: 3 seconds.
Write the output file in 5000-samples chunks. Performance gain: 10 seconds.
After those optimizations I ended up at approximately 17 seconds processing time for 10 seconds audio output. What bugs me, is that other tools can do simple audio operations like mixing two files in realtime or even much faster.
Another idea I had was parallelization, but I refrained from that due to the extra problems that would occur (like calculating correct seek positions for the forks/threads and other related things).
So am I missing stuff out or is this actually good performance for a PHP-CLI script?
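One common way to reduce per-sample overhead is to decode and encode a whole chunk with a single unpack/pack call instead of one call per sample. The sketch below shows the idea in Python, assuming 16-bit signed little-endian samples and plain additive mixing with clipping. The post does not say which sample format or math the script uses, so both are assumptions, and `mix_pcm16` is a made-up name. PHP's `unpack()` and `pack()` also accept a repeater in the format string, so the same approach should carry over.

```python
import struct


def mix_pcm16(chunk_a, chunk_b):
    """Mix two equally long chunks of 16-bit little-endian PCM samples."""
    if len(chunk_a) != len(chunk_b) or len(chunk_a) % 2:
        raise ValueError("chunks must have equal, even byte lengths")
    count = len(chunk_a) // 2
    fmt = f"<{count}h"  # one format string decodes the whole chunk
    samples_a = struct.unpack(fmt, chunk_a)
    samples_b = struct.unpack(fmt, chunk_b)
    # Sum and clip to the signed 16-bit range to avoid wrap-around
    mixed = [max(-32768, min(32767, a + b)) for a, b in zip(samples_a, samples_b)]
    return struct.pack(fmt, *mixed)
```

Feeding it chunks of a few thousand samples at a time matches the chunked output writing that already gave the largest gain.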
Thanks for everyone's input on this one.
I rewrote the thing in C++ and can now perform the above actions in less than a second.
I'd never have thought that the speed difference is that huge (compiled application is ~40X faster).
Hey,
I currently have 300+ qps on my MySQL server. There are roughly 12000 UIP a day (no cron) on fairly heavy PHP websites. I know it's pretty hard to judge whether that is OK without seeing the website, but do you think it is total overkill?
What is your experience? If I optimize the scripts, do you think I would be able to get a substantially lower qps? I mean, getting to 200 qps won't help me much. Thanks
currently have over 300+ qps on my mysql
Your website can run on a Via C3, good for you !
do you think that it is a total overkill?
That depends if it's
1 page/s doing 300 queries, yeah you got a problem.
30-60 pages/s doing 5-10 queries each, then you got no problem.
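The two cases above are the same qps figure divided by different page rates. A minimal sketch of that arithmetic, with a made-up function name:

```python
def queries_per_page(queries_per_second, pages_per_second):
    """Average number of queries issued for each dynamic page served."""
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return queries_per_second / pages_per_second


print(queries_per_page(300, 1))   # 300.0
print(queries_per_page(300, 30))  # 10.0
print(queries_per_page(300, 60))  # 5.0
```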
12000 UIP a day
We had a site with 50,000-60,000, and it ran on a Via C3 (your toaster is a datacenter compared to that crap server), but the torrent tracker used about 50% of the CPU, so only half of that tiny CPU was available to the website, which never seemed to use any significant fraction of it anyway.
What is your experience?
If you want to know if you are going to kill your server, or if your website is optimized, the following have close to zero information content:
UIP (unless you get facebook-like numbers)
queries/s (unless you're above 10,000) (I've seen a cheap dual core blast 20,000 qps using postgres)
But the following is extremely important :
dynamic pages/second served
number of queries per page
time duration of each query (ALL OF THEM)
server architecture
vmstat, iostat outputs
database logs
webserver logs
database's own slow_query, lock, and IO logs and statistics
You're not focusing on the right metric...
I think you are missing the point here. Whether 300+ qps is too much depends heavily on the website itself, on the number of users per second visiting it, on the background scripts running concurrently, and so on. You should be able to test and/or compute an average query throughput for your server to understand whether 300+ qps is reasonable. And, by the way, it also depends on what these queries are asking for (a couple of fields, or a large amount of binary data?).
Surely, if you optimize the scripts and/or reduce the number of queries, you can lower the load on the database, but without having specific data we cannot properly answer your question. To lower a 300+ qps load to under 200 qps, you should on average lower your total queries by at least 1/3rd.
Optimizing a script can do wonders. I've taken scripts from 3 minutes down to 0.5 seconds simply by optimizing how the calls were made to the server. That is an extreme situation, of course. I would focus mainly on minimizing the number of queries by combining them where possible. Maybe get creative with your queries to include more information in each hit.
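As an illustration of combining queries, the sketch below compares fetching a comment count per article in a loop with fetching all counts in one grouped query. It uses Python's built-in `sqlite3` with an in-memory database instead of MySQL, and the `articles` and `comments` tables are invented for the example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory schema; table and column names are made up."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER);
        """
    )
    db.executemany("INSERT INTO articles (id, title) VALUES (?, ?)",
                   [(i, f"Article {i}") for i in range(1, 11)])
    db.executemany("INSERT INTO comments (article_id) VALUES (?)",
                   [(i,) for i in range(1, 11) for _ in range(i % 3)])
    return db


def counts_query_per_article(db):
    """One query for the article list, then one more per article."""
    ids = [row[0] for row in db.execute("SELECT id FROM articles ORDER BY id")]
    queries = 1
    counts = {}
    for article_id in ids:
        row = db.execute("SELECT COUNT(*) FROM comments WHERE article_id = ?",
                         (article_id,)).fetchone()
        counts[article_id] = row[0]
        queries += 1
    return counts, queries


def counts_single_query(db):
    """The same information from a single grouped LEFT JOIN."""
    rows = db.execute(
        """
        SELECT a.id, COUNT(c.id)
        FROM articles AS a
        LEFT JOIN comments AS c ON c.article_id = a.id
        GROUP BY a.id
        ORDER BY a.id
        """
    ).fetchall()
    return dict(rows), 1
```

Both functions return the same counts, but the first issues 11 queries for 10 articles and the second issues 1. Whether the combined query is actually faster on a given server still has to be measured.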
And going from 300 to 200 qps is actually a huge improvement. That's a 33% drop in traffic to your server... that's significant.
You should not focus on the script, focus on the server.
You have not said whether these 300+ queries are causing issues. If your server is not dead, there is no reason to lower the number. And if you have already optimized, you should focus on the server: upgrade it or buy more servers.
I'm currently rewriting my site using my own framework (it's very simple and does exactly what I need; I've no need for something like Zend or CakePHP). I've done a lot of work to make sure everything is cached properly, caching pages in files to avoid SQL queries and generally limiting the number of SQL queries.
Overall it looks like it's very speedy. The average time taken for the front page (taken over 100 times) is 0.046152 microseconds.
But one thing I'm not sure about is whether I've done enough to reduce PHP memory usage. The only time I've ever encountered problems with it is when uploading large files.
Using memory_get_peak_usage(TRUE), which I THINK returns the highest amount of memory used whilst the script has been running, the average (taken over 100 times) is 1572864 bytes.
Is that good?
I realise you don't know what it is I'm doing (it's rather simple: get the 10 latest articles, the comment count for each, the user controls, popular tags in the sidebar, etc.). But would you be at all worried about a script using that sort of memory getting hit 50,000 times a day? Or once every second at peak times?
I realise that this is a very open-ended question. Hopefully you can understand that it's a bit of a stab in the dark and I'm really just looking for some reassurance that it's not going to die horribly come relaunch day.
EDIT: Just a mini experiment I did for myself. I downloaded and installed WordPress, and a default installation with no extra add-ons, just one user and just one post, used 10.5 megabytes of memory, or "11010048 bytes". Quite pleased with my 1.5 MB now.
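The two byte counts quoted in this question convert to the stated megabyte figures when 1 MB is taken as 1024 * 1024 bytes. A small sketch of the conversion, with a made-up helper name:

```python
def bytes_to_mib(num_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


print(bytes_to_mib(1572864))   # 1.5
print(bytes_to_mib(11010048))  # 10.5
```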
Memory usage values can vary heavily and are subject to fluctuation, but as you already say in your update, a regular WordPress instance is much, much fatter than that. I have had great trouble getting the WordPress backend to run with a memory_limit of sixteen megabytes, let alone when plug-ins come into play. So from that, I'd say a peak of 1.5 megabytes performing normal tasks is quite okay.
Generation time is extremely subject to the hardware your site runs on, obviously. However, a generation time of 0.046152 seconds (I assume you mean seconds here) sounds very okay to me under normal circumstances.
It is a subjective question. PHP has a lot of overhead and when calling the function with TRUE, that overhead will be included. You'll see what I mean when you call the function in a simple Hello World script. Also keep in mind that results can differ greatly depending on whether PHP is run as an apache module or FastCGI.
Unfortunately, no one can provide assurances. There will always be unforeseen variables that can bring down a site. Perform load testing. Use a code profiler to narrow down the location of any bottlenecks and see if there are ways to make those code blocks more efficient.
Encyclopaedia Britannica thought they were prepared when they launched their ad-supported encyclopedia ten years ago. The developers didn't know they would be announcing it on Good Morning America the day of the launch. The whole thing came crashing down for days.
As long as your systems aren't swapping, your memory usage is reasonable. Any additional concern is just premature optimization.
I'm working on a social network like Friendfeed. When a user adds feed links, I use a cron job to parse each user's feeds. Is this feasible with a large number of users, like parsing 10,000 links each hour, or will that cause problems? If it isn't feasible, what do Friendfeed or RSS readers use to do that?
You might consider adding some information about your hardware to your question, this makes a big difference for someone looking to advise you on how easily your implementation will scale.
If you end up parsing millions of links, one big cron job is going to become problematic. I am assuming you are doing the following (if not, you probably should):
Realizing when users subscribe to the same feed, to avoid fetching it twice.
When fetching a new feed, checking for the existence of a site map that tells you how often the feed is likely to change, and revisiting that value at a sensible interval.
Checking system load and memory usage to know when to 'back off' and go to sleep for a while.
This reduces the amount of sweat that an hourly cron would produce.
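Two of the points above, fetching each shared feed only once and backing off under load, can be sketched as follows. The function names, the load threshold, and the pause length are assumptions for illustration. `os.getloadavg()` is only available on Unix-like systems, so the load source is injectable.

```python
import os
import time


def unique_feeds(subscriptions):
    """Collapse (user, feed_url) pairs into the set of feeds to fetch once."""
    return {feed_url for _user, feed_url in subscriptions}


def wait_until_idle(max_load=4.0, pause_seconds=30, get_load=None, sleep=time.sleep):
    """Sleep while the 1-minute load average is above max_load.

    Returns how many times it had to pause.
    """
    get_load = get_load or (lambda: os.getloadavg()[0])
    waits = 0
    while get_load() > max_load:
        sleep(pause_seconds)
        waits += 1
    return waits
```

A cron job would call `wait_until_idle()` before each batch and iterate over `unique_feeds(...)` rather than over raw subscriptions.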
If you are harvesting millions of feeds, you'll probably want to distribute that work, something that you might want to keep in mind while you're still designing your database.
Again, please update your question with details on the hardware you are using and how big your solution needs to scale. Nothing scales 'infinitely', so please be realistic :)
I don't have quite enough information to judge whether this design is good or not, but to answer the basic question: unless you are doing some very intensive processing on the 10k links, that should be trivial for an hourly cron job to handle.
More information on how you process the feeds, and in particular how the process scales with respect to number of users who have feeds and number of feeds per user, would be useful in giving you further advice.
Your limiting factor will be the network access to these 10,000 feeds. You could process the feeds serially and likely do 10,000 in an hour (you'd need to average about 350ms latency).
Of course you'd want to have more than one process doing the work simultaneously to speed things up.
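The latency figure comes from dividing the hour by the number of feeds. The sketch below shows that budget and how it grows with parallel workers; the function name and the choice of 10 workers are only illustrative.

```python
def latency_budget_ms(feeds, window_seconds=3600, workers=1):
    """Average time available per feed, in milliseconds."""
    return window_seconds * workers * 1000 / feeds


print(latency_budget_ms(10_000))              # 360.0
print(latency_budget_ms(10_000, workers=10))  # 3600.0
```

A single serial process has 360 ms per feed, so averaging about 350 ms leaves a small margin, while ten workers can tolerate slow feeds much more comfortably.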
Whatever solution you select, if you meet with success (which I hope you do), you will have performance issues.
As the founder of FF said many times: the only way to select the best solution is to profile/measure. With numbers the choice will be obvious.
So: build a test architecture close to your expected (=realistic) situation in a few months and profile/measure.
You might want to consider checking out IronWorker for big data jobs like this. It's made for it and since it's a service you don't need to deal with servers or scale. It has scheduling built in so you would schedule a worker task to run each hour and that task can then queue up 10,000 other jobs and run them all in parallel.