I need to move a lot of MySQL rows to a Couchbase server. The catch is that I need to use a PHP class to do the job (the class has business logic).
I've created a PHP CLI script and run six copies of it at once. That's faster than running a single CLI script, but not fast enough: it took two hours to transfer everything.
Is there a better way?
Updated:
What the PHP code does with MySQL:
select * from table limit $limit
That's about it. Nothing fancy.
Is there a better way?
Yes, there most likely is.
You need to identify the bottleneck. From what you describe, it seems the bottleneck is the number of jobs run in parallel, so you should increase that until you find the maximum throughput. GNU Parallel can often help you do that.
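For example, if your loader can be told which slice of the table to handle, GNU Parallel can run one copy per slice and you simply raise -j until throughput stops improving. This is only a sketch: the table, column and class names are placeholders, and the Couchbase call stands in for whatever your business-logic class actually does.

    <?php
    // migrate_chunk.php -- hypothetical worker that copies one slice of rows.
    // Run e.g. 8 slices at a time with GNU Parallel:
    //   seq 0 99 | parallel -j8 php migrate_chunk.php {} 10000
    // Slice N handles rows [N*size, N*size + size).
    list(, $chunk, $size) = $argv;
    $offset = (int)$chunk * (int)$size;

    $pdo  = new PDO('mysql:host=127.0.0.1;dbname=mydb', 'user', 'pass');
    $stmt = $pdo->prepare('SELECT * FROM my_table ORDER BY id LIMIT :lim OFFSET :off');
    $stmt->bindValue(':lim', (int)$size, PDO::PARAM_INT);
    $stmt->bindValue(':off', $offset, PDO::PARAM_INT);
    $stmt->execute();

    $loader = new MyBusinessLogicClass();               // placeholder for your class
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        $loader->saveToCouchbase($row);                 // placeholder method
    }

Raising -j is exactly the knob to turn while watching the diagnostics below; once adding jobs stops helping, the bottleneck has moved elsewhere.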
When you have done that, the bottleneck is somewhere else. But since your description has very little detail, it is impossible to tell where.
You will therefore have to find the new bottleneck. The bottleneck is typically disk I/O, network I/O, or CPU, but it can also be a shared lock or some other resource.
To look for a CPU bottleneck, run top. If you see a process running at 100% and you cannot parallelize that process, you have found your bottleneck.
To look for a disk I/O bottleneck, run iostat -dkx 1. If the last column (%util) hits 100% more than 50% of the time, you have found your bottleneck.
To look for a network I/O bottleneck, run iftop. If the bandwidth used is > 70% of the available network bandwidth, you have found your bottleneck.
A special network problem is DNS: it can often be seen as a process that is stuck for 5-10 seconds for no good reason but otherwise runs just fine. Run tcpdump -s1000 -n port 53 and check whether the queries are being answered quickly. Do this on all machines involved in running the job (a quick PHP-level check is sketched after this list).
Looking for a shared lock is harder. You will first have to find the suspect processes, and then you will have to strace them.
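If you want a quick PHP-level sanity check for the DNS point above (not a replacement for tcpdump), timing a few lookups is enough to spot a resolver that stalls for seconds; the hostnames are placeholders:

    <?php
    // Rough DNS probe: time gethostbyname() for each host the job talks to.
    // Lookups that repeatedly take ~5 s point at a broken resolver entry.
    $hosts = array('mysql.example.com', 'couchbase.example.com');   // placeholders
    foreach ($hosts as $host) {
        $t  = microtime(true);
        $ip = gethostbyname($host);          // returns the name unchanged on failure
        printf("%-25s %-15s %.3f s\n", $host, $ip, microtime(true) - $t);
    }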
Related
I am running an HTTP API which should handle more than 30,000 calls per minute.
Currently I can call it 1,200 times per minute. At 1,200 calls per minute, all the requests complete and get a response immediately.
But if I make 12,000 calls per minute simultaneously, it takes 10 minutes to complete all the requests, and during those 10 minutes I cannot browse any webpage on the server; it is very slow.
I am running CentOS 7
Server Specification
CPU: Intel® Xeon® E5-1650 v3 Hexa-Core Haswell
RAM: 256 GB DDR4 ECC
Hard drive: 2 x 480 GB SSD (Software RAID 1)
Connection: 1 Gbit/s
API: a simple PHP script that echoes the timestamp
echo time();
I checked with the top command; there is no load on the server.
Please help me with this.
Thanks
Sounds like a congestion problem.
It doesn't matter how quick your script/page handling is if the next request arrives within the execution time of the previous one:
it is going to use resources (CPU, RAM, disk, network traffic and connections)
and make everything running in parallel with it slower.
There are multiple things you could do, but you need to figure out what exactly the problem is for your setup and decide if the measure produces the desired result.
If the core problem is that resources get hogged by parallel processes, you could lower the connection limits so that more connections go into wait mode. That keeps more resources available for actually handing out a page instead of congesting everything even further.
Take a look at this:
http://oxpedia.org/wiki/index.php?title=Tune_apache2_for_more_concurrent_connections
If the server accepts connections quicker than it can handle them, you are going to have a problem whatever you change; it should start dropping connections at some point. If you cram French baguettes down its throat faster than it can open its mouth, it is going to suffocate either way.
If the system gets overwhelmed on the network side of things (transfer speed limit, the maximum number of concurrent connections the OS allows, etc.), then you should consider using a load balancer. Only after the load balancer confirms that the server has the capacity to actually take care of the page request will it send the user on.
This usually works well when you do any kind of processing which slows down page loading (server side code execution, large volumes of data etc).
Optimise performance
There are many ways to execute PHP code on a webserver, and I assume you use Apache. I am no expert, but there are modes like CGI and FastCGI, for example, which can greatly improve execution speed, and tweaking the settings connected to these can also show you what is happening. It could, for example, be that you have too few PHP threads to handle that number of concurrent connections.
Have a look at something like this for example
http://blog.layershift.com/which-php-mode-apache-vs-cgi-vs-fastcgi/
There is no 'best fit for all' solution here. To fix it, you need to figure out what the bottleneck for the server is, and act accordingly.
12,000 calls per minute == 200 calls per second.
You could limit your test case to a multiple of those 200 and increase/decrease it while changing settings. Your goal is to dish that number of requests out in the shortest amount of time possible, thus ensuring the congestion never occurs.
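As a sketch of that kind of test (the URL is a placeholder for your time() endpoint), you can fire a batch of 200 concurrent requests with curl_multi and see whether the batch stays inside its one-second budget:

    <?php
    // Fire $batch concurrent requests and report how long the whole batch took.
    // If 200 requests regularly need much more than 1 s, you are below 12,000/min.
    $url   = 'http://your-server/api.php';   // placeholder
    $batch = 200;

    $mh = curl_multi_init();
    $handles = array();
    for ($i = 0; $i < $batch; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    $start = microtime(true);
    do {
        curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0);     // wait for socket activity, don't spin
        }
    } while ($running);
    printf("%d requests in %.2f s\n", $batch, microtime(true) - $start);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

Run it from a different machine if you can; generating the load on the server itself skews the numbers.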
That said: consequences.
When you implement changes to optimise for the maximum number of page loads you want to achieve, you inadvertently introduce other conditions. For example, if maximum RAM usage by Apache is the problem, then upping that limit will give better performance, but it increases the chance that the OS runs out of memory when other processes also want to claim more.
Adding a load balancer adds another possible layer of failure and possible slowdowns. Yes, you prevent congestion, but is it worth the slowdown caused by the rerouting?
Upping performance will increase the load on the system, making it possible to accept more concurrent connections, so somewhere along the line a different bottleneck will pop up. High traffic on different processes can always end with that process crashing. Apache is a very well-built web server, so in theory it should protect you against that, but tweaking settings wrongly can still cause crashes.
So experiment with care and test before you use it live.
I'm using Redis (2.6.8) with php-fpm and the phpredis driver and have some Redis latency issues. Under certain load, the first request to Redis from our application takes about 1-1.5 s, and redis-cli --latency shows the same latency.
I've already checked the latency guide.
We use Redis on the same host, over Unix sockets
slowlog has no entries longer than 5 ms
we don't use AOF
redis takes about 3.5 GB of the 16 GB available (I suppose that's not too much)
our system is not swapping
there is no other process doing disk I/O
I'm using persistent connections, and the number of connected clients varies from 5 to 25 (sometimes spiking to 60-80).
Here is the graph.
It looks like the problems start when there are 20 or more simultaneously connected clients.
Can you help me figure out where the problem is?
Update
I investigated the problem, and it seemed that Redis did not, for some reason, get enough processor time to operate properly.
I thoroughly checked the communication between php-fpm and Redis with the help of a network sniffer. Redis received the request over TCP but sent the answer back only after one and a half seconds. That clearly signified that the problem was inside Redis: it could not process that many requests under the given conditions (possibly processor starvation, as the processor was only 50% loaded for the whole system).
The problem was resolved by moving Redis to another server that was nearly idle. I suppose we could have played with the Linux scheduler to make it work on the same server, but we have not done that yet.
Bear in mind that Redis is single-threaded. If the operations that you're doing err on the processor-intensive side, your requests could be blocking on each other. For instance, if you're doing HVALS against hashes with very large values, you're going to make all of your clients wait while you pull out all that data and copy it to the output buffer.
Part of what you need to do here (regardless of whether this is the issue) is to look at all of the commands you're using and determine the complexity of each one. If you're running a bunch of O(N) commands against very large amounts of data, it's not impossible that you're simply doing too much at a time.
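As a concrete example of trading one O(N) call for many cheap ones: if a large hash is involved, paging through it with HSCAN keeps each Redis call short, so other clients get served in between. A sketch using phpredis; the key name and page size are made up:

    <?php
    // Instead of one call that copies the whole hash in one go:
    //   $all = $redis->hVals('big:hash');             // O(N), blocks everyone else
    // walk it in small pages so no single command stalls the server for long.
    $redis = new Redis();
    $redis->connect('/var/run/redis/redis.sock');      // same Unix-socket setup

    $it = null;                                        // HSCAN cursor (by reference)
    do {
        $page = $redis->hScan('big:hash', $it, '*', 500);   // ~500 fields per call
        if ($page !== false) {
            foreach ($page as $field => $value) {
                // process one field/value pair at a time
            }
        }
    } while ($it > 0);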
TL;DR: nobody here can debug this issue with real certainty without knowing which commands you're using and what your data looks like. But you can look up the time complexity of each command you're using and make sure it's reasonable.
I ran across this in researching an issue I'm working on but thought it might help here:
https://groups.google.com/forum/#!topic/redis-db/uZaXHZUl0NA
If you read through the thread there is some interesting info.
I have a script which runs 1000 cURL requests using curl_multi_* functions in PHP.
What is the bottleneck behind them timing out?
Would it be the CPU usage? Is there some more efficient way, in terms of how that number of outbound connections is handled by the server, to do this?
I cannot change the functionality and the requests themselves are simple calls to a remote API. I am just wondering what the limit is - would I need to increase memory on the server, or Apache connections, or CPU? (Or something else I have missed)
Your requests are made in a single thread of execution. The bottleneck is almost certainly CPU. Have you ever actually watched curl multi code run? It is incredibly CPU hungry, because you don't really have enough control over dealing with the requests. curl_multi makes it possible for you to orchestrate 1000 requests at once, but that doesn't make it a good idea. You have almost no chance of using curl_multi efficiently, because you cannot control the flow of execution finely enough; just servicing the sockets and select()'ing on them accounts for a lot of the high CPU usage you see when watching your code run on the command line.
The reason the CPU usage is high during such tasks is this: PHP is designed to run for a fraction of a second and do everything as fast as it can. It usually does not matter how the CPU is utilized, because it's for such a short space of time. When you prolong a task like this, the problem becomes more apparent: the overhead incurred with every opcode becomes visible to the programmer.
I'm aware you have said you cannot change the implementation, but still, for a complete answer: such a task is far more suitable for threading than for curl_multi. You should start reading http://php.net/pthreads, beginning with http://php.net/Thread.
Left to their own devices on an idle CPU, even 1000 threads would consume as much CPU as curl_multi. The point is that you can control precisely the code responsible for downloading every byte of the response and uploading every byte of the request; if CPU usage is a concern, you can implement a "nice" process by explicitly calling usleep, or by limiting connection usage in a meaningful way. Additionally, your requests can be serviced in separate threads.
I do not suggest that 1000 threads is the thing to do; it more than likely is not. The thing to do would be to design a Stackable (see the documentation) whose job is to make and service a request in a "nice", efficient way, and to design pools (see the examples on GitHub and in the PECL extension sources) of workers to execute your newly designed requests.
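A rough sketch of that shape, assuming the pthreads extension on a ZTS CLI build; work items extend Threaded (called Stackable in older pthreads releases), and the URL list is invented:

    <?php
    // Sketch only: a small pool of worker threads, each servicing one request
    // at a time with its own curl handle, instead of 1000 handles in one loop.
    class ApiRequest extends Threaded {       // "Stackable" in older pthreads docs
        public $url;
        public $response;

        public function __construct($url) { $this->url = $url; }

        public function run() {
            // Per-request control lives here: you can usleep(), retry, log, etc.
            $ch = curl_init($this->url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            $this->response = curl_exec($ch);
            curl_close($ch);
        }
    }

    $urls = array_fill(0, 1000, 'https://api.example.com/endpoint');  // placeholder

    $pool = new Pool(8);                      // 8 worker threads, not 1000
    $jobs = array();
    foreach ($urls as $url) {
        $jobs[] = $job = new ApiRequest($url);
        $pool->submit($job);
    }
    $pool->shutdown();                        // waits for all submitted work

    // $jobs[0]->response, $jobs[1]->response, ... now hold the bodies.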
I have a PHP class that selects data about a file from a MySQL database, processes that data in PHP and then outputs the final data to the command line. Then it moves on to the next file within a foreach loop. (Later I'll be inserting this data into another table, but that's not important now.)
I want to make the processing as fast as possible.
When I run the script and monitor my system using top or iostat:
my CPUs are never less than 65% idle (4-core EC2 instance)
the PHP script sits at about 45%
mysqld sits at about 8%
my memory usage never passes ~1.5 GB (8 GB of RAM total)
there is very little disk I/O
What other bottlenecks could be preventing this process from running faster and using the available CPU and Memory?
EDIT 1:
This does not need to be a sequential process, and I've designed it so the processing can be parallelized if necessary. But if I can speed it up somewhat, it would be simpler to leave it as a single sequential process.
I've monitored the disk I/O using iostat -x 1 and there is very little.
I need to speed this up in general because it will ultimately be used to process hundreds of millions of files and I'd like it to be as fast as possible as it's part of a larger processing step.
Well, it may be because a single PHP process can only run on one core at a time, and you're not loading up your system to the point where it has four concurrent jobs running continuously.
Example: if PHP were the only thing running on that box, inherently tied to a single core per "job", and only one request at a time were being made, I'd fully expect a CPU load of around 25%, even though it's already going as fast as it possibly can.
Of course, once that system started ramping up to the point where there are continuously four PHP scripts running, you may find higher CPU utilisation.
In my opinion, you should only really worry about a performance problem if it's an actual problem (such as not being able to keep up with incoming requests). Optimisation just because you want it using more CPU and/or memory resources seems to be looking at it the wrong way around. I would just get it running as fast as possible without worrying about the actual resources used.
If you want to process hundreds of millions of files as fast as possible (as per your update) and PHP is core-bound, you should think about horizontal scaling.
In other words, if the processing of a single file is independent, you can simply start two or three PHP processes and have them process one file each. That will be more likely to get them running on distinct cores.
You can even scale across physical machines if necessary though that's likely to introduce network latency on the DB access (unless the DB is replicated across all the machines as well).
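On a single box, a minimal sketch of that idea with pcntl_fork; the work list and the processing call are placeholders for your class's loop, and each child opens its own MySQL connection (sharing one handle across a fork causes trouble):

    <?php
    // Fork N workers and give each an interleaved slice of the file IDs.
    // Requires the pcntl extension (CLI only).
    $workers = 4;                                   // roughly one per core
    $fileIds = range(1, 100000);                    // placeholder work list

    for ($w = 0; $w < $workers; $w++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            die("fork failed\n");
        }
        if ($pid === 0) {                           // child
            $pdo = new PDO('mysql:host=127.0.0.1;dbname=mydb', 'user', 'pass');
            foreach ($fileIds as $i => $id) {
                if ($i % $workers !== $w) continue; // every Nth file is mine
                // $processor->processFile($pdo, $id);   // your existing logic
            }
            exit(0);
        }
    }
    while (pcntl_waitpid(0, $status) !== -1) {      // parent: wait for all children
    }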
Without a fair bit more detail, the options I can provide will be mostly generic ones.
The first problem you need to fix is the word "bottleneck", because it means everything and nothing.
It conjures up an image of some sort of constriction in the flow of whatever the machine does, as if the machine were so fast that it must be like water running through pipes.
Computation isn't like that.
I find it helps to see how a very simple, slow, computer works, namely Harry Porter's Relay Computer.
You can watch it chug along, at a very slow clock rate, executing every little step within each instruction and finishing them before it starts the next.
(Now, obviously, machines these days are multi-core, pipelined, multi-level cache, blah blah. That's all fine, but that makes you think computation is like water flowing, and that prevents you from understanding software performance.)
Think of any computer and software as just like in that relay machine, except on a scale of nanoseconds, not seconds.
When a computer is calculating in a program, it is executing instructions one after the other. Call that "X".
When a program wants to read or write some bits to external hardware, it has to ask that hardware to start, and then it has to find a way to kill time until the result is ready.
Call that "y".
It could be an idle loop, or letting another "thread" run, etc.
So the execution of a program looks like
XXXXXyyyyyyyXXXXXXXXyyyyyyy
If there are more "y"s in there than "X"s we tend to call it "I/O bound".
If not, we might call it "compute bound".
Either way, it's just a matter of proportion of time spent.
If you say it's "memory bound", that's just like I/O except it could be different external hardware.
It still occupies some fraction of the overall sequential timeline.
Now for any given task, there are infinitely many programs that could be written to do it. Some of them will get done in fewer steps than all the others.
When you want performance, you want to get as close as possible to writing one of those programs.
One way to do it is to find "X"s and "y"s that you can get rid of, and get rid of as many as possible.
Now, within a single thread, if you pick an "X" or "y" at random, how can you tell if you can get rid of it?
Find out what its purpose is!
That "X" or "y" represents a moment in the execution sequence of the program, and if you look at the state of the program at that time, and look at the source code, you will be able to figure out why that moment is being spent.
Do that a few times.
As soon as you see two moments in time having a similar less-than-absolutely-necessary purpose,
there are probably a lot more like them, and you've found something you can get rid of.
If you do so, the program will no longer be spending that time.
That's the basic idea behind this method of performance tuning.
Here's an example where that method was used, over several iterations, to remove over 97% of the time spent in a program.
Not all programs are that far away from optimal.
(Some are much farther.)
Many programs just have to do a certain amount of "X"s or "y"s, and there's no way around it.
Nevertheless, it is often very surprising how much room you can find for speedup in otherwise perfectly good code - provided - you forget about "bottlenecks" and look for steps that it's doing, over time, that could be removed or done better.
It's easy.
I suspect you're spending most of your time communicating with MySQL and reading the files. How are you determining that there's very little IO? Communicating with MySQL is going to be over the network, which is very slow compared to direct memory access. Same with reading files.
Looks like CPU is your bottleneck, or to be more precise, a single core is your bottleneck.
100% utilisation of a single core will result in "25% CPU utilisation" if the other three cores are idle.
Your numbers are consistent with a PHP script running at 100% on a single core, with 5 to 10% utilization on the other three cores.
Sorry to resurrect an old thread, but thought this might help someone out.
I had a similar problem, and it had to do with a command-line script that was throwing numerous 'Notice' warnings. That somehow led to it performing slowly and using less than 10% of the CPU. This behavior only showed up after migrating from Mac OS X to Ubuntu, as the default on OS X seems to be to suppress the warnings. Once I fixed the offending code, it performed much better, with processes consistently using around 100% CPU.
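If you suspect the same thing, make the notices visible on the CLI before hunting them down; these are just the usual development settings:

    <?php
    // Put this at the top of the CLI script, or pass it on the command line:
    //   php -d error_reporting=E_ALL -d display_errors=1 script.php
    error_reporting(E_ALL);            // show notices instead of silently handling them
    ini_set('display_errors', '1');
    ini_set('log_errors', '0');        // don't hammer a log file while testing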
As the other guy said, sorry to resurrect an old thread, but this may help somebody.
I had the same issue: running a bunch of processes in parallel, all using MySQL. The machine was slow, with no identifiable bottleneck: not CPU, not memory, not disk.
It turned out that the most probable cause of my problems was that MySQL's internal threads were hung on the same semaphore most of the time. Switching from vanilla MySQL 5.5 to MariaDB 10.0 fixed the problem.
Also, to ensure that my machine is always running at full capacity while not being flooded, I have created a Perl script raspawn.pl (on GitHub).
You can read the full sad story here.
I have two test computers networked together.
One has a gigabit Ethernet, the other a 10 megabit.
Theoretically, data transferred between the two should reach about 1 megabyte per second.
Now I'm using a PHP script to read data from one host on the other using fread. Both the script doing the reading and the file being read are chmod 777.
Both computers are running WampServer, and both have ZoneAlarm and Avast installed and running. ZoneAlarm is configured to recognise both computers as trusted parts of the network.
I'm using the time() function to work out how long it takes the script to read a file on the other computer. The file I'm reading is 10 megabytes, so it should take just over 10 seconds, yet it takes around 30 seconds: an average of about 300 kB/s.
So where is the bottleneck in my setup?
One comp is Vista, other is XP if that matters.
Just because your network speed is 10 Mbit/s doesn't mean the application layer gets all of it. There is TCP/IP header overhead (~64 bytes per 1500-byte packet), time spent processing the buffers in the kernel, time spent doing buffer transfers to/from the LAN controller chip, etc.
I assume that when you say you're getting 300 kB/s you really mean about 3 Mbit/s (bits, not bytes), right?
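One thing worth double-checking first: time() only has one-second resolution, so "just over 10 seconds vs. 30" is a coarse measurement. A sketch that measures with microtime() and reads in larger chunks (the network path is a placeholder for however you reach the other machine):

    <?php
    // Measure the real throughput of reading a remote file with fread().
    $path = '//otherhost/share/testfile.bin';       // placeholder network path
    $fh   = fopen($path, 'rb');
    if (!$fh) {
        die("cannot open $path\n");
    }

    $start = microtime(true);
    $bytes = 0;
    while (!feof($fh)) {
        $chunk  = fread($fh, 64 * 1024);            // 64 KB reads
        $bytes += strlen($chunk);
    }
    fclose($fh);

    $secs = microtime(true) - $start;
    printf("%d bytes in %.2f s = %.0f kB/s (%.2f Mbit/s)\n",
           $bytes, $secs, $bytes / 1024 / $secs, $bytes * 8 / 1e6 / $secs);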
While there are a lot of guesses we could take, this probably belongs on Server Fault, as you are not asking what the issue is programming-wise. And honestly, even there, this will take a lot of trial and error; it's not really suitable for a question/answer format.
Open up the task manager (ctrl+alt+delete, task manager), then switch to the second tab (or the third?) and watch the CPU and network usage as you run the test. If the CPU usage is at 100%, that may be the bottleneck. Check the network usage too to see if there is any overhead you don't expect.
That's where I'd start.