My Memory Usage Assumption: Is it Correct? - php

I have a script that, when run, consumes 17 MB of memory (logged using memory_get_peak_usage()).
The script is run 1 million times per day.
Total daily memory consumption: 17 million MB.
There are 86,400 seconds in a day.
17,000,000 / 86,400 = 196.76
Assumption: Running this script 1 million times each day will require at least 196.76 MB of dedicated memory.
Is my assumption correct?

If the script ran 1,000,000 copies at the same time, then you would need 17 million MB, but since each run releases its memory after it completes, you don't add its usage to a running total.
You need to know how many copies run at the same time and multiply that number by 17 MB. That would be the maximum memory usage.
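To make that concrete, a back-of-envelope sketch: by Little's law, average concurrency equals arrival rate times average run duration. The 0.2-second runtime below is a made-up assumption, not a figure from the question:

```php
<?php
// All inputs except the 17 MB and 1M/day figures are assumptions.
$runsPerDay    = 1000000;                      // from the question
$secondsPerDay = 86400;
$avgDuration   = 0.2;                          // hypothetical runtime (s)

$arrivalRate   = $runsPerDay / $secondsPerDay; // ~11.6 starts/second
$avgConcurrent = $arrivalRate * $avgDuration;  // Little's law: L = λW

echo ceil($avgConcurrent) * 17, " MB peak (rough estimate)\n";
```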

Not entirely correct; the first hundred times your script is executed, it'll probably all fit into memory fine, so the first two minutes or so might go as expected. But once you push your computer into swap, it will spend so much time handling swap that the remaining 999,900 executions might go significantly slower than you'd expect. And as they all start competing for disk bandwidth, it will get much worse the longer it runs.
I'm also not sure about the use of the php memory_get_peak_usage() function; it is an 'internal' view of the memory the program requires, not the view from the operating system's perspective. It might be significantly worse. (Perhaps the interpreter requires 20 megs RSS just to run a hello-world. Perhaps not.)
I'm not sure what the best way forward for your application would be: maybe it could be a single long-lived process that handles events as they are posted and returns results. That might be able to run in significantly less memory space. Maybe the results don't actually change every 0.086 seconds (86,400 seconds divided by a million runs), and you could cache the results for one second and only run it 86,400 times per day. Or maybe you need to buy a few more machines. :)
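A minimal sketch of that cache-for-one-second idea, assuming the APCu extension is available (any shared cache would do; computeExpensiveResult() is a hypothetical placeholder):

```php
<?php
// Serve a cached result for up to 1 second, so the expensive work
// runs at most ~86,400 times per day instead of 1 million.
function getResult()
{
    $cached = apcu_fetch('expensive_result', $hit);
    if ($hit) {
        return $cached;
    }
    $result = computeExpensiveResult();           // hypothetical 17 MB job
    apcu_store('expensive_result', $result, 1);   // TTL: 1 second
    return $result;
}
```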

Yes, but what if it gets called multiple times at the same time? You would have multiple threads running simultaneously.
EDIT: Also, why is that script running a million times a day? (Unless you have a huge website.)

I don't think this calculation will give correct results, because you also need to consider other factors, such as:
How long the script runs.
The number of scripts running at a time.
The distribution of script invocations over time.
And in your calculation, what's the point in dividing by 86,400 seconds? Why not hours, or milliseconds? To me, the calculation seems pretty meaningless.

Related

What is the most efficient way to record JSON data per second

Reason
I've been building a system that pulls data from multiple JSON sources. The data being pulled is constantly changing, and I'm recording the changes to a SQL database via a PHP script. Nine times out of ten the data is different and therefore needs recording.
The JSON needs to be checked every single second. I've been successfully using a cron task every minute with a PHP function that loops 60 times over.
The problem I'm now having is that the more JSON sources I check, the slower the PHP file runs, meaning the next cron job gets triggered before the previous one has finished. It's all starting to feel way too unstable and hacky.
Question
Assuming the PHP script is already the most efficient it can be, what else can be done?
Should I be using multiple cron tasks?
Should something other than PHP be used?
Are cron tasks even suitable for this sort of problem?
Any experience, best practices, or just plain old help will be very much appreciated.
Overview
I'm monitoring for active race sessions and recording each driver and then each lap a driver completes. Laps are recorded only once a driver crosses the start/finish line and I do not know when race sessions may or may not be active or when a driver crosses the line. Therefore I have been checking every second for new data to record.
Each venue where a race session may be active has a separate URL to receive JSON data from. The more venues I add to my system to monitor, the slower the script runs.
I currently have 19 venues, and the script takes circa 12 seconds to complete. Since I'm running a cron job every minute and looping the script every second, I'm assuming I have at the very least 12 scripts running every second. It just doesn't seem like the most efficient way to do it. Of course, it worked a charm back when I was only checking a single venue.
There's a cycle to your operations. It is:
1. Start your process by reading the time with $starttime = time();.
2. Compute the next scheduled time by taking the time plus 60 seconds: $nexttime = $starttime + 60;.
3. Do the operations you must do (read a mess of JSON feeds).
4. Compute how long is left in the minute: $timeleft = $nexttime - time();.
5. Sleep until the next scheduled time: if ($timeleft > 0) sleep($timeleft);.
6. Set $starttime = $nexttime.
7. Jump back to step 2.
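Put together as a runnable sketch of those steps (readFeeds() is a hypothetical placeholder for the actual fetch-and-record work):

```php
<?php
// Fixed-rate polling loop implementing the steps above.
// readFeeds() is a hypothetical stand-in for the real work.
set_time_limit(0);                    // long-lived CLI process

$starttime = time();                  // step 1
while (true) {
    $nexttime = $starttime + 60;      // step 2: next scheduled tick

    readFeeds();                      // step 3: poll all JSON feeds

    $timeleft = $nexttime - time();   // step 4: slack left in the minute
    if ($timeleft > 0) {
        sleep($timeleft);             // step 5: wait out the remainder
    } else {
        error_log('Falling behind by ' . -$timeleft . 's');
    }

    $starttime = $nexttime;           // step 6; loop back to step 2
}
```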
Obviously, if $timeleft is ever negative, you're not keeping up with your measurements. If $timeleft is always negative, you will get further and further behind.
The use of cron every minute is probably wasteful, because it takes resources to fire up a new process and get it going. You probably want to make your process run forever, and use a shell script that monitors it and restarts it if it crashes.
This is all pretty obvious. What's not so obvious is that you should keep track of your individual $timeleft values for each minute over your cycle of measurements. If they vary daily, you should track for a whole day. If they vary weekly you should track for a week.
Then you should look at the worst (smallest) values of $timeleft. If your 95th percentile is less than about 15 seconds, you're running out of resources and need to take action. You need a margin like 15 seconds so your system doesn't move into overload.
If your system has zero tolerance for late sampling of data, you should look at the single worst value of $timeleft, not the 95th percentile. You should give yourself a more generous margin than 15 seconds.
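A sketch of how that tracking could look, assuming the simple nearest-rank percentile method; since small $timeleft values are the bad ones, the 95th-percentile-worst value is the 5th percentile of the raw samples:

```php
<?php
// Nearest-rank percentile over logged $timeleft samples.
function percentile(array $samples, float $p): float
{
    sort($samples);
    $rank = (int) ceil($p / 100 * count($samples)) - 1;
    return $samples[max(0, $rank)];
}

// $timeleftLog gets one entry per cycle in the main loop.
// 1440 samples = one full day of one-minute cycles.
if (count($timeleftLog) >= 1440 && percentile($timeleftLog, 5) < 15) {
    error_log('Less than 15s of margin in the worst 5% of cycles.');
}
```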
So-called hard real time systems allocate a time slot to each operation, and crash if the operation exceeds the time slot. In your case the time slot is 60 seconds and the operation is reading a certain number of feeds. Crashing is pretty drastic, but measuring is mandatory.
The simplest action to take is to start running multiple worker processes and give some of your feeds to each process. PHP runs single-threaded, so multiple processes probably will help, at least until you get to three or four of them.
Then you will need to add another computer, and divide your feeds among worker processes on those multiple computers.
A language environment that parses JSON faster than php does might help, but only if the time it takes to parse the JSON is more important than the time it takes to wait for it to arrive.

Running 600+ threads with PHP pthreads - what about the overhead

I have a server with 2 physical CPUs, which together have 24 cores, and 10 GB of RAM.
The PHP program is calculating a statistic, and I could run each section totally independently of the others. Once all calculations are finished, I only have to "merge" them.
Therefore I had the idea to perform each calculation phase in a separate thread created/controlled by "pthreads".
Each calculation takes around 0.10 seconds, but the sheer number of calculations makes the run take that long when they are serialized.
My questions:
Is there a limitation when creating a new "thread" with "pthreads"?
What is the overhead of creating a new thread? I must consider this to avoid introducing new delays.
I can imagine that for several seconds the load would be very high, but then it ends suddenly once every calculation has finished. This is not a problem. It is "my" server and I do not have to take other users into account, as I would on a shared server.
While "waiting" for an answer :-) I started to rewrite the class.
I can summarize it like this:
There is no way to start 600 threads at once. I expected that, but I wanted to know where the limit is. My configuration "allowed" around 160 threads to be started.
When starting more than that, the PHP script stopped working without any further notice.
As Franz Gleichmann pointed out, the whole process took longer when starting a lot of threads. I found out that starting 20 threads gives the best performance.
The achieved performance gain is between 20% and 50% - I am satisfied.
I don't know if it is a bug in the pthreads extension, but I could not access any class members. I had to move the class members inside the function. Since the calculation is in one function, it did not bother me, and I did not investigate further.
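For reference, a minimal sketch of the batched approach described above (20 threads at a time). It assumes a thread-safe (ZTS) PHP build with the pthreads extension loaded; calculate() is a hypothetical stand-in for one calculation phase:

```php
<?php
// Requires a ZTS PHP build with ext-pthreads.
class CalcThread extends Thread
{
    public $result;
    private $input;

    public function __construct($input) { $this->input = $input; }

    public function run()
    {
        // Hypothetical ~0.1s calculation; the result is stored on the
        // thread object, since pthreads copies rather than shares state.
        $this->result = calculate($this->input);
    }
}

$inputs  = range(1, 600);          // 600 independent calculation phases
$results = [];
foreach (array_chunk($inputs, 20) as $batch) {   // 20 threads per batch
    $threads = [];
    foreach ($batch as $input) {
        $t = new CalcThread($input);
        $t->start();
        $threads[] = $t;
    }
    foreach ($threads as $t) {     // wait for the batch before starting more
        $t->join();
        $results[] = $t->result;
    }
}
// The "merge" step would combine $results here.
```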

automating a script in several stages or several times

I need to run a script every night that connects to a Web Service via SOAP with a maximum return of 45000 records.
I can set how many records to return and it seems that the limit is 1000 before I hit the max_execution_time limit.
What would be the best way to automate this script to get all 45000 records? Surely there is a better way than to do 45 cron jobs?
If you want this to run automatically every night, a single cron job is definitely the way to go. There are two basic approaches you could take: you could either run a single job that does query after query, pulling x records each time until it has pulled all the records, or you could have one that runs over and over again every few minutes and pulls x records each time. Both have strengths and drawbacks, but the first option is probably the easier to implement.
To do this, I would recommend that you raise the time limit using set_time_limit(). This should be something very high so that your process will have time to complete, or simply 0 if you have no limit. If you have memory issues as well, then I would pull much less each time. If you say the max is 1000, then consider 500. Have your application loop over and over pulling 500 records at a time until it completes. You may also want to throw in a small delay between each record pull using sleep().
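A sketch of that first approach, hedged: the WSDL URL, method name, paging parameters, and saveRecords() below are invented placeholders, since the question doesn't show the actual SOAP API:

```php
<?php
// Nightly batch pull: one cron job, looping until all records arrive.
set_time_limit(0);                    // no execution time limit

$client = new SoapClient('https://example.com/service?wsdl'); // placeholder WSDL
$offset = 0;
$batch  = 500;                        // half the observed 1000-record ceiling
$total  = 45000;

while ($offset < $total) {
    // getRecords() and its parameters are hypothetical.
    $records = $client->getRecords(['offset' => $offset, 'limit' => $batch]);
    if (empty($records)) {
        break;                        // server returned nothing more
    }
    saveRecords($records);            // hypothetical persistence step
    $offset += $batch;
    sleep(1);                         // small delay between pulls
}
```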
This should help
http://php.net/manual/en/function.set-time-limit.php

Possibilities to speed up PHP-CLI script?

I wrote a PHP-CLI script that mixes two audio (.WAV PCM) files (with some math involved), so PHP needs to crunch through thousands (if not millions) of samples with unpack(), do math on them, and save them with pack().
Now, I don't need actual info on how to do the mixing or anything; as the title says, I'm looking for possibilities to speed this process up, since the script needs 30 seconds of processing time to produce 10 seconds of audio output.
Things that I tried:
Cache the audio files in memory and crunch through them with substr() instead of fseek()/fread(). Performance gain: 3 seconds.
Write the output file in 5000-sample chunks. Performance gain: 10 seconds.
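For concreteness, a sketch of those two optimizations combined, under heavy assumptions: 16-bit signed little-endian PCM, identical formats and lengths, a standard 44-byte header, and a little-endian host (pack/unpack 's' uses machine byte order):

```php
<?php
// Chunked average-mix of two 16-bit PCM WAV files (assumptions above).
$a = substr(file_get_contents('a.wav'), 44);   // cache whole files in memory
$b = substr(file_get_contents('b.wav'), 44);

$out   = fopen('mix.raw', 'wb');               // headerless output for brevity
$chunk = 5000 * 2;                             // 5000 samples * 2 bytes each

for ($pos = 0; $pos < strlen($a); $pos += $chunk) {
    $sa = unpack('s*', substr($a, $pos, $chunk));
    $sb = unpack('s*', substr($b, $pos, $chunk));

    $mixed = [];
    foreach ($sa as $i => $sample) {
        // Average of two in-range samples stays in range, so no clamping.
        $mixed[] = (int) (($sample + ($sb[$i] ?? 0)) / 2);
    }
    fwrite($out, pack('s*', ...$mixed));       // write one chunk at a time
}
fclose($out);
```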
After those optimizations I ended up at approximately 17 seconds processing time for 10 seconds audio output. What bugs me, is that other tools can do simple audio operations like mixing two files in realtime or even much faster.
Another idea I had was parallelization, but I refrained from it due to the extra problems that would occur (like calculating correct seek positions for the forks/threads and other related things).
So am I missing stuff out or is this actually good performance for a PHP-CLI script?
Thanks for everyone's input on this one.
I rewrote the thing in C++ and can now perform the above actions in less than a second.
I'd never have thought the speed difference would be that huge (the compiled application is ~40x faster).

PHP and CPU - Process of chat + notifications

My site has a PHP process running for each open window/tab; it runs for a maximum of 1 minute and returns notifications/chat messages/people going online or offline. When JavaScript gets the output, it calls the same PHP process again, and so on.
This is like Facebook chat.
But it seems to take too much CPU while running. Do you have any idea how Facebook handles this problem? What do they do so their processes don't take too much CPU and bring their servers down?
My process has a while(true), with a sleep(1) at the end. Inside the loop, it checks for notifications, checks whether any of the currently online people went offline or changed status, reads unread messages, etc.
Let me know if you need more info about how my process works.
Would calling other PHP scripts via system() (and waiting for their output) alleviate this?
I ask this because it would make other processes check the notifications and flush when finished, while the main PHP script just collects the results.
Thank you.
I think your main problem here is the parallelism. Apache and PHP do not excel at tasks like this, where 100+ users have an open HTTP request.
If in your while(true) you spend 0.1 seconds on CPU-bound workload (checking changed statuses or other useful things) and 1 second on the sleep, this would result in a CPU load of 100% as soon as you have 10 users online in the chat. So in order to serve more users with THIS model of a chat, you would have to optimize the workload in your while(true) cycle and/or raise the sleep interval from 1 second to 3 or higher.
I had the same problem in an HTTP-based chat system I wrote many years ago, where at some point too many parallel MySQL SELECTs were slowing down the chat, creating heavy load on the system.
What I did was implement a fast "ring buffer" for messages and status information in shared memory (SysV back in the day; today I would probably use APC or memcached). All operations write and read in the buffer, and the buffer itself gets periodically "flushed" into the database to persist it (but a lot less often than once per second per user). If no persistence is needed, you can omit the backend, of course.
I was able to increase the number of users I could serve by roughly 500% that way.
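A compressed sketch of that idea using APCu as the shared-memory store (an assumption; the original used SysV shared memory, and any shared cache would do):

```php
<?php
// Fixed-size ring buffer of chat messages in shared memory (APCu).
const RING_SIZE = 1024;

function pushMessage(string $conversationId, array $msg): void
{
    apcu_add("seq:$conversationId", 0);        // ensure the counter exists
    $seq  = apcu_inc("seq:$conversationId");   // atomic increment
    apcu_store("msg:$conversationId:" . ($seq % RING_SIZE), $msg);
}

function readSince(string $conversationId, int $lastSeen): array
{
    $seq = (int) apcu_fetch("seq:$conversationId");
    $out = [];
    // The oldest retrievable message is at most RING_SIZE behind the head.
    for ($i = max($lastSeen + 1, $seq - RING_SIZE + 1); $i <= $seq; $i++) {
        $msg = apcu_fetch("msg:$conversationId:" . ($i % RING_SIZE));
        if ($msg !== false) {
            $out[] = $msg;
        }
    }
    return $out;
}
// A background job would periodically flush the buffers to the database.
```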
BUT as soon as you have solved this issue, you will be faced with another: available system memory (100+ Apache processes at ~5 MB each; fun) and process context-switching overhead. The more active processes you have, the more your operating system will spend on the overhead involved with assigning "fair enough" CPU slots, AFAIK.
You'll see it is very hard to scale efficiently with Apache and PHP alone for your use case. There are open source tools, client- and server-based, to help, though. One I remember sits in front of Apache and queues messages internally, while having very efficient multi-socket communication with the JavaScript clients, making real "push" events possible. Unfortunately, I do not remember any names, so you'll have to research, or hope the Stack Overflow community brings in what my brain has already discarded ;)
Edit:
Hi Nuno,
the comment field has too few characters so I reply here.
Let's get to the 10 users in parallel again:
10 × 0.1 seconds of CPU time per cycle (assumed) is roughly 1 s of combined CPU time over a period of 1.1 seconds (1 second sleep + 0.1 seconds execute). That is 1 / 1.1, which I would boldly round to 100% CPU utilization, even though it is "only" 90.9%.
If that 10 × 0.1 s of CPU time is "stretched" over a period of not 1.1 seconds but 3.1 (3 seconds sleep + 0.1 seconds execute), the calculation is 1 / 3.1 ≈ 32%.
And it is logical: if your checking cycle queries your backend three times more slowly, you have only a third of the load on your system.
Regarding the shared memory: the name might imply otherwise, but if you use good IDs for your cache areas, like one ID per conversation or user, you will have private areas within the shared memory. Database tables also rely on you providing good IDs to separate private data from public information, so those should be around already :)
I would also not "split" any more. The fewer PHP processes you have to "juggle" in parallel, the easier it is for your systems and for you. Unless you see that it makes absolute sense, because one type of notification takes a lot more querying resources than another and you want different refresh times, or something like that. But even that can be decided inside the while cycle: a user's "away" status could be checked every 30 seconds, while the messages he might have written could be checked every 3. There is no reason to create more cycles; just use different counter variables or the right divisor in a modulo operation.
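For illustration, the modulo idea in a few lines (checkNewMessages() and checkAwayStatus() are hypothetical placeholders):

```php
<?php
// One loop, different refresh rates via a modulo on the tick counter.
$tick = 0;
while (true) {
    if ($tick % 3 === 0) {
        checkNewMessages();   // hypothetical: every 3 seconds
    }
    if ($tick % 30 === 0) {
        checkAwayStatus();    // hypothetical: every 30 seconds
    }
    $tick++;
    sleep(1);                 // one tick per second
}
```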
The inventor of PHP said that he believes man is too limited to control parallel processes :)
Edit 2
OK, let's build a formula. We have these variables:
duration of execution (e)
duration of sleep (s)
duration of one cycle (c)
number of concurrent users (u)
CPU load (l)
c = e + s
l = u * e / c (this expresses how many of the available time slots c are filled by the CPU work generated by u concurrent users)
l = u * e / (e + s)
For 30 users, ASSUMING that you have 0.1 s execution time and 1 second of sleep:
l = 30 * 0.1 / (0.1 + 1)
l = 2.73
l = 273% CPU utilization (aka you need 3 cores :P)
Exceeding the capabilities of your CPU means that cycles will run longer than you intend; the overall response time will increase (and the CPU runs hot).
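The same formula as a tiny helper, for experimenting with other values (a sketch, nothing more):

```php
<?php
// CPU load l = u * e / (e + s), from the variables defined above.
function chatCpuLoad(int $users, float $execSecs, float $sleepSecs): float
{
    return $users * $execSecs / ($execSecs + $sleepSecs);
}

printf("%.0f%%\n", 100 * chatCpuLoad(30, 0.1, 1.0)); // 273%
printf("%.0f%%\n", 100 * chatCpuLoad(30, 0.1, 3.0)); // ~97%
```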
PHP blocks on sleep() and system() calls. What you really need is to research pcntl_fork(). Fortunately, I had these problems over a decade ago, and you can look at most of my code.
I needed a PHP application that could connect to multiple IRC servers, sit in unlimited IRC chatrooms, moderate, interact with, and receive commands from people. All this and more was done in a process-efficient way.
You can check out the entire project at http://sourceforge.net/projects/phpegg/ The code you want is in source/connect.inc.
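A minimal hedged sketch of the pcntl_fork() pattern (requires the pcntl extension, CLI only; doTask() and the task names are hypothetical placeholders, not taken from the phpegg project):

```php
<?php
// Fork a child per task so the parent never blocks on slow work.
// Requires ext-pcntl (CLI SAPI).
$children = [];
foreach (['notifications', 'presence', 'messages'] as $task) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        // Child: do one unit of work, then exit.
        doTask($task);            // hypothetical worker function
        exit(0);
    }
    $children[] = $pid;           // parent: remember the child PID
}

// Parent reaps the children once they finish.
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}
```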
