I need to move a lot of MySQL rows to a Couchbase server. The catch is that I need to use a PHP class to do the job (the class contains business logic).
I've created a PHP CLI script and ran six copies of it at once. That's faster than running a single script, but not fast enough: it took two hours to transfer everything.
Is there a better way?
Updated:
What the PHP code does with MySQL:

    SELECT * FROM table LIMIT $limit

That's about it. Nothing fancy.
Is there a better way?
Yes, there most likely is.
You need to identify the bottleneck. From what you describe, it seems the bottleneck is the number of jobs run in parallel, so increase that until you find the point of maximum throughput. GNU Parallel can often help you do that.
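If you would rather drive the sweep from PHP itself, here is a minimal sketch. It assumes a hypothetical migrate.php that takes a start offset and a row count as arguments, plus a made-up total row count - both are placeholders for your own script and data:

    <?php
    // Time the full batch at several levels of parallelism to find the
    // sweet spot. migrate.php and $totalRows are assumptions: substitute
    // your own CLI script and table size.
    $totalRows = 1000000;

    foreach ([2, 4, 8, 16, 32] as $jobs) {
        $chunk = (int)ceil($totalRows / $jobs);
        $start = microtime(true);
        $procs = [];
        for ($i = 0; $i < $jobs; $i++) {
            $cmd = sprintf('php migrate.php %d %d', $i * $chunk, $chunk);
            $procs[] = proc_open($cmd, [], $pipes); // children inherit stdio
        }
        foreach ($procs as $p) {
            proc_close($p); // blocks until that child exits
        }
        printf("%2d jobs: %.1fs\n", $jobs, microtime(true) - $start);
    }

The job count where total time stops improving is roughly where the next bottleneck takes over.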
When you have done that, the bottleneck is somewhere else. But since your description has very little detail, it is impossible to tell where.
You will therefore have to find the new bottleneck. The bottleneck is typically disk I/O, network I/O, or CPU, but it can also be a shared lock or some other resource.
To look for a CPU bottleneck, run top. If you see a process running at 100% and you cannot parallelize that process, you have found your bottleneck.
To look for a disk I/O bottleneck, run iostat -dkx 1. If the last column (%util) hits 100% more than 50% of the time, you have found your bottleneck.
To look for a network I/O bottleneck, run iftop. If the bandwidth used is > 70% of the available network bandwidth, you have found your bottleneck.
A special network problem is DNS: it can often be seen as a process that is stuck for 5-10 seconds for no good reason but otherwise runs just fine. Use tcpdump -s1000 -n port 53 and check whether the queries are being answered quickly. Do this on every machine involved in running the job.
Looking for a shared lock is harder: you will first have to find the suspect processes, and then you will have to strace them.
Hi guys, I have a question about server RAM and a PHP/MySQL/jQuery script.
Can a script eat RAM even when it shouldn't need any extra? (I know that can happen when RAM usage grows to the maximum, or because of the memory limit, but that isn't the case here.)
I'm testing the script, but every time I do, free RAM drops quickly.
The script doesn't show a memory-limit error and it loads all the data correctly. Even when I'm not testing the script, the RAM stays down.
The database holds only a few records - maybe 350 records across 9 tables (the biggest table has 147 records).
(I don't have any logs, just a simple (really simple) graph of the running server.)
Thanks for your time.
If you're not getting errors in your PHP error log about failing to allocate memory, and you're not seeing other problems caused by your server running out of RAM (such as extreme performance degradation due to memory pages being written to disk for demand paging), you probably don't need to worry about it. Any use case where a web server uses up that much memory in a single request is going to be pretty rare.
As for profiling the actual memory usage, trying to do it by watching something like the task manager is going to be pretty unreliable. Most PHP scripts complete in milliseconds, which isn't enough time for the memory allocations to even register in the task manager.
Even if you use a more reliable method of profiling the memory usage (PHP has built-in functions for this, such as memory_get_usage()), bear in mind that memory usage will fluctuate tremendously for reasons that may be hard to understand. PHP in particular is very high level: you can open a database connection, which involves everything down to the OS opening network sockets, creating internal data structures, caching things, and much more, all in a single line of code. The script may allocate many megabytes of memory for such a thing for a single database row, but may then deallocate it a millisecond later.
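For instance, a minimal sketch of measuring one operation from inside the script (the range() call is just a stand-in for real work):

    <?php
    // Measure what a single operation actually allocates, from inside PHP,
    // rather than watching the task manager.
    $before = memory_get_usage();

    $rows = range(1, 10000); // stand-in for e.g. fetching a result set

    printf("delta: %d bytes, peak so far: %d bytes\n",
        memory_get_usage() - $before,
        memory_get_peak_usage());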
Those database sizes are pretty negligible. Depending on the row sizes it's possibly under a megabyte of data, which is a tiny drop in the bucket for memory on anything remotely modern. Don't worry about memory usage for something like that. Only if you see your scripts failing and your error log reports running out of memory should you really worry about it.
First I want to say that I'm using Drupal as a CMS, and I know there is a separate Drupal Stack Exchange site. But my problem is not Drupal-specific and not at the user or advanced-user level; it's PHP- and server-related. OK, now the problem.
I have developed a website which is not launched yet. I am getting out-of-memory errors at random times, and sometimes the server crashes; rebooting helps. Nobody else is using the app, so there is no heavy load. In particular, I am exceeding the privvmpages limit. I have tried some general things - increasing/decreasing the PHP memory limit, looking in error logs, logging slow MySQL queries. Nothing... same.
I ran the 'top' Linux command. There are 4-5 Apache processes, depending on browser requests, whose MEM usage (%) is 10, 5, 4, 3, and 0.5. Two of the processes have been running for more than 10 hours.
After restarting Apache I got 40% more free memory.
Here are some questions and mysteries for me.
Why are those two processes running so long when there is no active request from a browser? And how can I prevent that?
Why did I get 40% more free memory after restarting, when Apache was only using 10+5+4+3+0.5 = 22.5%? Shouldn't those be equal?
Could this be a memory leak? How can I detect one?
What techniques should I use to work down from the high level to the low level? Suppose I have a memory leak in one of my functions - how would I find it in the whole application?
How can I benchmark particular functions for memory and CPU usage?
Why is the server crashing? Even a basic httpd restart returns "fork: Cannot allocate memory". Could that be a symptom of a memory leak?
Please answer point by point.
Sounds like you may have an infinite loop somewhere, or you're not releasing resources when dealing with things such as GD.
Linux keeps things cached in RAM while there is free RAM; if another process suddenly needs RAM and the cached memory is not actually in use, Linux will free or swap it for the application in need. Check the output of "free" and you will notice a "cached" column that indicates how much is just cache and can be released at any time.
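As for benchmarking particular functions for memory and CPU usage (one of the questions above), a rough sketch using only PHP built-ins - getrusage() requires a POSIX system, the arrow function requires PHP 7.4+, and the str_repeat() target is just a stand-in:

    <?php
    // Rough per-function benchmark: wall time, user CPU time, memory delta.
    function benchmark(callable $fn, int $iterations = 1000): void
    {
        $mem0  = memory_get_usage();
        $wall0 = microtime(true);
        $ru0   = getrusage();

        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }

        $ru1   = getrusage();
        $cpuMs = ($ru1['ru_utime.tv_sec'] - $ru0['ru_utime.tv_sec']) * 1000
               + ($ru1['ru_utime.tv_usec'] - $ru0['ru_utime.tv_usec']) / 1000;

        printf("wall: %.1f ms, cpu: %.1f ms, mem delta: %d bytes\n",
            (microtime(true) - $wall0) * 1000, $cpuMs,
            memory_get_usage() - $mem0);
    }

    benchmark(fn() => str_repeat('x', 1024)); // example target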
I have a PHP class that selects data about a file from a MySQL database, processes that data in PHP, and then outputs the final data to the command line. It then moves on to the next file within a foreach loop. (Later I'll be inserting this data into another table... but that's not important now.)
I want to make the processing as fast as possible.
When I run the script and monitor my system using top or iostat:
my CPUs are never less than 65% idle (4-core EC2 instance)
the PHP script sits at about 45%
mysqld sits at about 8%
my memory usage never passes ~1.5GB (8GB of RAM total)
there is very little disk I/O
What other bottlenecks could be preventing this process from running faster and using the available CPU and Memory?
EDIT 1:
This does not need to be a serial process, and I've designed it so the processing can be parallelized if necessary. But if I can speed it up some other way, it would be simpler to leave it as serial processing.
I've monitored the disk I/O using iostat -x 1 and there is very little.
I need to speed this up in general because it will ultimately be used to process hundreds of millions of files and I'd like it to be as fast as possible as it's part of a larger processing step.
Well, it may be because a single PHP process can only run on one core at a time, and you're not loading up your system to the point where four concurrent jobs are running continuously.
Example: if PHP were the only thing running on that box, were inherently tied to a single core per "job", and only one request were being made at a time, I'd fully expect a CPU load of around 25% despite the fact that it's already going as fast as it possibly can.
Of course, once that system started ramping up to the point where there are continuously four PHP scripts running, you may find higher CPU utilisation.
In my opinion, you should only really worry about a performance problem if it's an actual problem (such as not being able to keep up with incoming requests). Optimisation just because you want it using more CPU and/or memory resources seems to be looking at it the wrong way around. I would just get it running as fast as possible without worrying about the actual resources used.
If you want to process hundreds of millions of files as fast as possible (as per your update) and PHP is core-bound, you should think about horizontal scaling.
In other words, if the processing of a single file is independent, you can simply start two or three PHP processes and have each of them process a different file. That makes it more likely they will run on distinct cores.
You can even scale across physical machines if necessary though that's likely to introduce network latency on the DB access (unless the DB is replicated across all the machines as well).
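A minimal sketch of that fan-out using pcntl_fork (CLI only; the files table, id column, process_file() and the credentials are assumptions - substitute your own). Note that each child must open its own database connection, because handles cannot be shared across fork():

    <?php
    // Fan out across cores: worker $w handles the rows where id % N == $w.
    $workers = 4;

    for ($w = 0; $w < $workers; $w++) {
        $pid = pcntl_fork();
        if ($pid === 0) { // child
            // Fresh connection per child - never reuse the parent's handle.
            $db = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
            $stmt = $db->prepare('SELECT * FROM files WHERE id % :n = :w');
            $stmt->execute([':n' => $workers, ':w' => $w]);
            while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
                // process_file($row); // your existing per-file logic
            }
            exit(0);
        }
    }
    while (pcntl_waitpid(-1, $status) > 0) {
        // wait for every child to finish
    }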
Without a fair bit more detail, the options I can provide will be mostly generic ones.
The first problem you need to fix is the word "bottleneck", because it means everything and nothing.
It conjures up an image of some sort of constriction in the flow of whatever the machine does - which is so fast that it must be like water running through pipes.
Computation isn't like that.
I find it helps to see how a very simple, slow computer works, namely Harry Porter's Relay Computer.
You can watch it chug along, at a very slow clock rate, executing every little step within each instruction and finishing them before it starts the next.
(Now, obviously, machines these days are multi-core, pipelined, multi-level cache, blah blah. That's all fine, but that makes you think computation is like water flowing, and that prevents you from understanding software performance.)
Think of any computer and software as just like in that relay machine, except on a scale of nanoseconds, not seconds.
When a computer is calculating in a program, it is executing instructions one after the other. Call that "X".
When a program wants to read or write some bits to external hardware, it has to ask that hardware to start, and then it has to find a way to kill time until the result is ready.
Call that "y".
It could be an idle loop, or letting another "thread" run, etc.
So the execution of a program looks like
XXXXXyyyyyyyXXXXXXXXyyyyyyy
If there are more "y"s in there than "X"s we tend to call it "I/O bound".
If not, we might call it "compute bound".
Either way, it's just a matter of proportion of time spent.
If you say it's "memory bound", that's just like I/O except it could be different external hardware.
It still occupies some fraction of the overall sequential timeline.
Now for any given task, there are infinitely many programs that could be written to do it. Some of them will get done in fewer steps than all the others.
When you want performance, you want to get as close as possible to writing one of those programs.
One way to do it is to find "X"s and "y"s that you can get rid of, and get rid of as many as possible.
Now, within a single thread, if you pick an "X" or "y" at random, how can you tell if you can get rid of it?
Find out what its purpose is!
That "X" or "y" represents a moment in the execution sequence of the program, and if you look at the state of the program at that time, and look at the source code, you will be able to figure out why that moment is being spent.
Do that a few times.
As soon as you see two moments in time having a similar less-than-absolutely-necessary purpose,
there are probably a lot more like them, and you've found something you can get rid of.
If you do so, the program will no longer be spending that time.
That's the basic idea behind this method of performance tuning.
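The classic way to take those samples is to pause the program in a debugger at random moments and read the stack. If you want a self-contained approximation in PHP, here is a crude sketch (CLI only; assumes the pcntl extension and PHP 7.4+). It prints where the program is once a second; the lines that keep showing up are the moments worth examining:

    <?php
    // Crude self-sampling: every second, dump the current call stack.
    pcntl_async_signals(true);
    pcntl_signal(SIGALRM, function () {
        $frames = array_map(
            fn($f) => ($f['class'] ?? '') . ($f['type'] ?? '') . $f['function'],
            debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS));
        fwrite(STDERR, implode(' <- ', $frames) . "\n");
        pcntl_alarm(1); // re-arm the timer for the next sample
    });
    pcntl_alarm(1);

    // ... run the slow workload here ...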
Here's an example where that method was used, over several iterations, to remove over 97% of the time spent in a program.
Not all programs are that far away from optimal.
(Some are much farther.)
Many programs just have to do a certain amount of "X"s or "y"s, and there's no way around it.
Nevertheless, it is often very surprising how much room for speedup you can find in otherwise perfectly good code - provided you forget about "bottlenecks" and look for steps that it's doing, over time, that could be removed or done better.
It's easy.
I suspect you're spending most of your time communicating with MySQL and reading the files. How are you determining that there's very little I/O? Communicating with MySQL goes over the network, which is very slow compared to direct memory access. The same goes for reading files.
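One common way to cut those round trips is to fetch rows in large keyset-paginated batches rather than one at a time. A minimal sketch, assuming an integer primary key id and a table called files (both placeholders for your own schema):

    <?php
    // Fetch in big batches ordered by primary key: one round trip per
    // 10,000 rows instead of one per row.
    $db     = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $lastId = 0;
    $batch  = 10000;

    do {
        $stmt = $db->prepare(
            'SELECT * FROM files WHERE id > :last ORDER BY id LIMIT :n');
        $stmt->bindValue(':last', $lastId, PDO::PARAM_INT);
        $stmt->bindValue(':n', $batch, PDO::PARAM_INT);
        $stmt->execute();
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

        foreach ($rows as $row) {
            // process_file($row); // per-file work happens in memory
            $lastId = $row['id'];
        }
    } while (count($rows) === $batch);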
Looks like CPU is your bottleneck - or, to be more precise, a single core is your bottleneck.
100% utilisation of a single core will show up as "25% CPU utilisation" if the other three cores are idle.
Your numbers are consistent with a PHP script running at 100% on a single core, with 5 to 10% utilisation on the other three cores.
Sorry to resurrect an old thread, but thought this might help someone out.
I had a similar problem, and it had to do with a command-line script that was throwing numerous 'Notice' warnings. That somehow led to it performing slowly and using less than 10% of the CPU. This behaviour only showed up on migrating from Mac OS X to Ubuntu, as the default in OS X seems to be to suppress the warnings. Once I fixed the offending code it performed much better, with processes using around 100% CPU consistently.
As the other guy said, sorry to resurrect an old thread, but this may help somebody.
I had the same issue: running a bunch of processes in parallel, all using MySQL. The machine was slow, with no identifiable bottleneck: not CPU, not memory, not disk.
It turned out that the most probable cause of my problems was that MySQL's internal threads were hung on the same semaphore most of the time. Switching from vanilla MySQL 5.5 to MariaDB 10.0 fixed the problem.
Also, to ensure that my machine is always running at full capacity while not being flooded, I have created a Perl script raspawn.pl (on GitHub).
You can read the full sad story here.
I set memory_limit to -1. Still I am getting out-of-memory issues.
I am working with a legacy system which is poorly coded ( :) ). I ran Apache Bench to check concurrent user access to the system:

    ab -n2000 -c100 http://......com/

In the log file I see a lot of memory-related issues.
The code uses object buffering. Could this be the issue? Is object buffering related to memory_limit?
Changing the memory limit in PHP stops a script from being killed when it goes past a certain value. However, it does NOT physically give your hardware more memory (or swap). Ultimately, if the code needs memory which you don't physically have, things will break.
Object buffering in PHP: I don't know what that means. If you mean output buffering with ob_start and ob_end_flush, it is not related to object buffering and has no real impact on PHP's memory usage.
PHP's memory usage depends on the size of the objects created while you build the response to the request. If you perform the same request several times, the memory usage of each PHP execution should be the same.
With no limit on memory usage, the only thing you achieve is avoiding a request crash because of too much memory usage. That means if your problem is memory usage on your index page, you can easily test it by setting some values for that setting and decreasing until it crashes (64MB, 32MB, 16MB, 8MB, etc.). You do not need ab for that.
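A minimal sketch of that test - run the page once under a tightened limit and log the peak (index.php is a placeholder for the legacy entry point):

    <?php
    // Find the page's real memory footprint without ab: tighten the limit
    // step by step (64M, 32M, 16M, ...) until the request crashes, or just
    // log the peak after one run.
    ini_set('memory_limit', '32M');

    // require 'index.php'; // the page under test

    error_log(sprintf('peak memory: %.1f MB',
        memory_get_peak_usage(true) / 1048576));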
Now, when you use ab you make your Apache server respond to several parallel requests. For each PHP request a new Apache process is created, and this new Apache process executes an independent PHP run, taking the same amount of memory as the other processes doing the same thing (since you request the same page, nothing is shared between PHP executions, and each PHP execution is done in one Apache process).
I assume you're using Apache with mpm_prefork and mod_php, not php-fpm or FastCGI PHP.
So if you have a memory problem in that situation, it may be that you allow too many processes for Apache. By default it's 150; if each process takes 30MB of RAM (check that with top), that makes 30MB * 150 = 4.5GB. See the problem?
Three easy solutions:
decrease the number of Apache processes (MaxClients), and set MinSpareServers, MaxSpareServers and StartServers to that same amount, so you won't lose time creating and destroying Apache processes
limit the PHP application's memory usage, so that you can handle more processes (well, not so easy - it can mean a long rewrite)
use APC; it decreases memory usage (and speeds up execution)
And after that, the other solutions are more complex:
use Apache in worker mode, or nginx, and move PHP out of the web server with php-fpm
use a caching proxy like Varnish to catch requests that can be cached (pseudo-static content) and avoid hitting Apache & PHP too often