I am using Gearman workers in my Symfony app. The workers use Symfony services like Doctrine and others, and I can run multiple workers simultaneously. I want to know how Doctrine handles DB queries when it gets more than one query at a time (each worker issuing one query at the same moment). Also, is it possible to have different connections to the DB, so that my Gearman workers can use each connection for a specific purpose, e.g. one connection to read from the DB and one connection to write to it?
Thanks
Each of your workers is its own process with its own EntityManager, and each EntityManager has its own connection. So, by default, you'll have one connection per worker.
As far as "more than one query at a time" goes, this is just like what happens with web-bound processes. From the database's perspective it's exactly the same: multiple simultaneous connections executing queries.
The tricky bit with workers is that they tend to live a lot longer than a web-bound process (which is reinitialized for each HTTP request that comes in). You need to be particularly careful because workers that sit idle may have their connections time out, so when they eventually pick up a job, they explode.
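A common mitigation is to check the connection before each job and reconnect if it has gone away. A minimal sketch, assuming Doctrine DBAL 2.x (where Connection::ping() exists) and a hypothetical $em EntityManager inside the worker loop:

```php
// Before picking up a job, make sure the long-lived worker's connection is alive.
$conn = $em->getConnection();
if ($conn->ping() === false) {
    $conn->close();
    $conn->connect();   // reconnect after an idle timeout ("MySQL server has gone away")
}

// Also clear the EntityManager between jobs so stale managed entities don't pile up.
$em->clear();
```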
Related
The use case:
8 servers with 300 concurrent php-fpm child processes each, producing records to Apache Kafka.
Each request produces 1 Kafka record, for a total of about 1,000 records per second.
Why do we need so many connections?
We have a web API that gets 60K calls per minute. Those requests do many things and are processed by thousands of php-fpm web workers (unfortunately). As part of the request handling, we produce events to Kafka.
The problem:
I cannot find a way to persist connections between php-fpm web requests, which leaves me with a setup that seems so inefficient that it might hit Kafka's limits (will it?).
The result is 1,000 producer connections per second being established, each sending one single record and being closed just after.
I read here https://www.reddit.com/r/PHP/comments/648zrk/kafka_php_71_library/ that php-rdkafka is efficient, but I don't know if it can solve this issue.
I thought that OPcache might be handy for reusing connections, but I cannot find a way to do it.
The question
Is establishing and closing 1,000 connections per second fast and cheap, or is a proxy with cheap connectivity that reuses the Kafka connection a must for such a use case?
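For reference, a minimal php-rdkafka producer sketch as I understand it (broker address and topic name are made up). It only shows what reuse looks like inside a single long-lived process, e.g. a CLI worker or a local relay, which is exactly what php-fpm makes hard:

```php
// One producer instance, and therefore one broker connection, reused for every record.
$conf = new RdKafka\Conf();
$conf->set('metadata.broker.list', 'kafka1:9092');

$producer = new RdKafka\Producer($conf);
$topic = $producer->newTopic('api_events');

// Produce as many records as needed over the lifetime of the process.
$topic->produce(RD_KAFKA_PARTITION_UA, 0, json_encode(['event' => 'api_call']));
$producer->poll(0);                      // serve delivery callbacks

// Before shutdown, wait for the internal queue to drain.
while ($producer->getOutQLen() > 0) {
    $producer->poll(50);
}
```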
I'm using Gearman in a custom Joomla application and using Gearman UI to track active workers and the number of jobs.
I'm facing an issue with MySQL load and the number of connections. I'm unable to track down the issue, but I have a few questions that might help me.
1- Do Gearman workers launch a new database connection for each job, or do they share the same connection?
2- If Gearman launches a new connection every time a job runs, how can I change that to make all jobs share the same connection?
3- How can I balance the load between more than one server?
4- Is there something like a "Pay-as-you-go" package for MySQL hosting? If yes, please mention it.
Thanks a lot!
This is an often overlooked issue when using any kind of job queue with workers. 100 workers will each open a separate database connection (they are separate PHP processes). If MySQL is configured to allow 50 connections, workers will start failing. To answer your questions:
1) Each worker runs in its own PHP process, and that process will open 1 database connection. Workers do not share database connections.
2) If only one worker is processing jobs, then only one database connection will be opened. If you have 50 workers running, expect 50 database connections. Since these are not web requests, persistent connections will not work, and sharing will not work.
3) You can balance the load by adding READ slaves, and using a MySQL proxy to distribute the load.
4) I've never seen a pay-as-you-go MySQL hosting solution. Ask your provider to increase your number of connections. If they won't, it might be time to run your own server.
Also, the gearman server process itself will only use one database connection to maintain the queue (if you have enabled mysql storage).
Strategies you can use to try and make your worker code play nicely with the database:
After each job, terminate the worker and start it up again. Don't open a new database connection until a new job is received. Use supervisor to keep your workers running all the time.
Close database connections after every query. If you see a lot of connections open in a 'sleep' state, this will help clean them up and keep the connection count low. Try $pdo = null; after each query (if you use PDO).
Cache frequently used queries where the result doesn't change, to keep database connections low.
Ensure your tables are properly indexed so queries run as fast as possible.
Ensure database exceptions are caught in a try/catch block. Add retry logic (a while loop) where the worker fails gracefully after, say, 10 attempts, and make sure the job is put back on the queue after a failure (see the sketch below).
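A rough sketch of that retry pattern, assuming PDO and a fresh connection per attempt (the function name and the connection factory are made up, not from the original post):

```php
// Run a query with up to 10 attempts, opening a fresh PDO connection per attempt
// and closing it immediately afterwards to keep the connection count low.
function runWithRetry(callable $makePdo, string $sql, array $params): array
{
    $attempts = 0;
    while (true) {
        $pdo = $makePdo();                       // e.g. fn() => new PDO($dsn, $user, $pass)
        try {
            $stmt = $pdo->prepare($sql);
            $stmt->execute($params);
            $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
            $pdo = null;                         // close the connection right away
            return $rows;
        } catch (PDOException $e) {
            $pdo = null;
            if (++$attempts >= 10) {
                throw $e;                        // caller should re-queue the job here
            }
            sleep(1);                            // back off before retrying
        }
    }
}
```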
I think the most important thing to look at, before anything else, is the MySQL load. It might be that you have some really heavy queries that are causing this mess. Have you checked the MySQL slow query log? If yes, what did you find? Note that any query that takes more than a second to execute is a slow query.
I need to write a socket server that will handle at least about 1,000 (many more in the future) low-traffic permanent connections. I made a draft version in PHP for testing purposes (we are developing monitoring hardware, so we needed to develop and test a conversation protocol and hardware capabilities), and it suited me very well when I had just a couple of clients connected. But when the number of connections grew to ten, some critical issues appeared. Here is some info about the server architecture:
I have a master process that waits for socket connections and, on connecting, creates a child process (which serves this connection from then on) using pcntl_fork(). I also set up a PDO connection to MySQL in the master process, and all the child processes share the same single PDO object. At first I was afraid of collisions during simultaneous queries, but I haven't encountered any, even under a stress test (10 children making queries in a loop without stopping). However, there is a usleep(500000) in each child, so it could be luck, though I had this test running for a couple of hours. In any case, such load should not be present even with 1k clients connected, because conversations between them and the server are rare.
So here is my first question: is it safe to use a single PDO object for a large number of child processes (ideally there would be around 1,000)? I could use a separate connection for each child, but MySQL doesn't support nearly that many connections.
The second issue is parasite MySQL connections. As I mentioned before, I have only one PDO object. But when I have more than one client connected, and after they have run some queries, I see in mytop that there is more than one DB connection, and I could not find any correlation between the number of connections and the number of child processes. For example, I have 3 children and 5 DB connections. I tried establishing persistent connections, and it didn't change anything.
Second question: is it PDO that makes those additional connections to MySQL, or is it the MySQL driver? And is there a way to force them to use one connection? I don't think it could be my fault: my code prints an alert to the console every time I call the method that creates the PDO object, and that happens only once, at script start, before forking. After that I only run queries from the children, using the parent's PDO object. Once again, I cannot afford to have so many connections due to MySQL limitations.
Third question: will a thousand socket connections be a problem by themselves, aside from the CPU and database load? Or should I run a number of smaller servers (128 connections each, for example) that tell clients to connect to another one if the maximum number of connections is exceeded?
Thanks in advance for your time and possible answers.
Currently your primary concern should be your socket server architecture. Forking a process for each client is very heavy. AFAIK an average PC can tolerate around 2,000 threads, and it's not going to work fast. Switching between processes means the CPU has to save its state to memory, and if you have an enormous number of processes, the CPU will be busy with memory I/O and will have little time for actually doing work.
You may want to look at Apache for inspiration. Apache uses a fixed number of worker processes/threads, each process/thread working with multiple clients via the select function and sockets in non-blocking mode. This is a far more robust approach.
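For illustration, here is a minimal sketch of that select-style approach in PHP using stream_select(), serving many clients in a single process instead of forking per client (the port and the echo behaviour are placeholders):

```php
// One process multiplexes all clients with stream_select() and non-blocking sockets.
$server = stream_socket_server('tcp://0.0.0.0:8080', $errno, $errstr);
stream_set_blocking($server, false);
$clients = [];

while (true) {
    $read = array_merge([$server], $clients);
    $write = $except = null;
    if (stream_select($read, $write, $except, 1) === false) {
        break;                                       // select failed, bail out
    }
    foreach ($read as $sock) {
        if ($sock === $server) {                     // new incoming connection
            $conn = stream_socket_accept($server, 0);
            if ($conn !== false) {
                stream_set_blocking($conn, false);
                $clients[(int) $conn] = $conn;
            }
            continue;
        }
        $data = fread($sock, 8192);
        if ($data === '' || $data === false) {       // client disconnected
            unset($clients[(int) $sock]);
            fclose($sock);
        } else {
            fwrite($sock, $data);                    // placeholder: echo the data back
        }
    }
}
```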
Regarding database I/O, I would spawn a process/thread that would be the sole owner of the database connections. Worker processes would communicate with the DB I/O process using IPC (in the case of processes) or lock-free queues (in the case of threads). This approach makes you independent of PDO implementation details (whether it is thread-safe, whether it spawns extra connections, etc.).
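A rough sketch of that idea, not the answer's code: a dedicated child process owns the single PDO connection, and a worker talks to it over a socket pair. The DSN, the query and the JSON line protocol are made up for the example:

```php
// Create a connected pair of UNIX sockets for parent/child IPC.
[$workerSock, $dbSock] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

$pid = pcntl_fork();
if ($pid === 0) {
    // Child: the only process that ever talks to MySQL.
    fclose($workerSock);
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    while (($line = fgets($dbSock)) !== false) {
        $req  = json_decode($line, true);            // {"sql": "...", "params": [...]}
        $stmt = $pdo->prepare($req['sql']);
        $stmt->execute($req['params']);
        fwrite($dbSock, json_encode($stmt->fetchAll(PDO::FETCH_ASSOC)) . "\n");
    }
    exit(0);
}

// Parent (worker side): send a request over the socket pair and read the reply.
fclose($dbSock);
fwrite($workerSock, json_encode(['sql' => 'SELECT NOW() AS now', 'params' => []]) . "\n");
$result = json_decode(fgets($workerSock), true);
```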
P.S. I suspect that you actually end up with separate PDO objects after forking (forking merely means making a copy of a process with its memory and everything inside it), and that PDO creates and shuts down connections on demand. That may explain why you're not seeing a correlation between your low-traffic clients and the DB connections.
I'm going to be using Node.js to process some CPU-intensive loop operations, sending emails to registered users, as PHP was using too much CPU while it ran and was freezing the site.
One thing to note is that Node.js will be on a different server and will query MySQL over an external (remote) connection.
I've heard that an external DB connection is bad for performance.
Is this true? And are there any pros and cons of doing this?
Keep in mind that when you run a CPU-intensive operation in Node, the whole application blocks, since it runs in a single thread. If you're going to run a CPU-intensive operation in Node, make sure you spawn it off into a child process whose only job is to run the calculation and then return the result to the primary application. This ensures your Node app is able to continue responding to incoming requests while the data is being processed.
Now, on to your question. Having the database on a different server is extremely common and is typically good practice. Where you can run into performance problems is if your database is in a different data center entirely: the further (physically) your database server is from your application server, the more latency there will be per request.
If these requests are seriously CPU-intensive, you should consider looking into a queueing mechanism, for a couple of reasons. One, it ensures that even in the event of an application crash, you don't lose a request that is being processed. Two, you can monitor the queue and scale the number of workers processing it if operations pile up to the point that a single application can't finish processing one before another comes in.
I am trying to write a client-server app.
Basically, there is a Master program that needs to maintain a MySQL database keeping track of the processing done on the server side,
and a Slave program that queries the database to see what to do to keep in sync with the Master. There can be many slaves at the same time.
All the programs must be able to run from anywhere in the world.
For now, I have tried setting up a MySQL database on a shared hosting server,
and I made C++ programs for the master and slave that use the cURL library to make requests to a PHP file (e.g. www.myserver.com/check.php) located on my hosting server.
The master program calls the URL every second, and some PHP code is executed to keep the database up to date. I did a test with a single slave program that also calls the URL every second and executes PHP code that queries the database.
With that setup, however, my web host suspended my account and told me that I was 'using too much CPU resources' and that I would need a dedicated server ($200 per month rather than $10), based on their analysis of the CPU resources needed. And that was with one Master and only one Slave, so no more than 5-6 MySQL queries per second. What would it be with 10 slaves, then..?
Am I missing something?
Would there be a better setup than what I was planning to use in order to achieve the syncing mechanism that I need between two or more far-apart programs?
I would use Google App Engine for storing the data. You can read about free quotas and pricing here.
I think the syncing approach you are taking is probably fine.
The more significant question you need to ask yourself is: what is the maximum acceptable time between syncs? If you truly need virtually real-time syncing between two databases on opposite sides of the world, then you will be using significant bandwidth, and you will unfortunately have to pay for it, as your host pointed out.
Figure out what is acceptable to you in terms of time. Is it okay for the databases to sync only once a minute? Once every 5 minutes?
Also, when running syncs like this in rapid succession, it is important to make sure you are not overlapping them: before a sync starts, check whether a sync is already in progress and has not finished yet. If a sync is still happening, don't start another; if not, go ahead. This will prevent a lot of unnecessary overhead and syncs happening on top of each other.
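A rough sketch of one way to enforce that, assuming the sync runs as a PHP script; the lock file path and the run_sync() function are made up for the example:

```php
// Take an exclusive, non-blocking lock so only one sync can run at a time.
$lock = fopen('/tmp/db_sync.lock', 'c');
if ($lock === false || !flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0);                 // a previous sync is still running, skip this round
}
try {
    run_sync();              // hypothetical function that performs the actual sync
} finally {
    flock($lock, LOCK_UN);   // release the lock so the next sync can start
    fclose($lock);
}
```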
Are you using a shared web host? What you are doing sounds like excessive use for a shared (cPanel-type) host; use a VPS instead. You can get an unmanaged VPS with 512 MB of RAM for 10-20 USD per month, depending on spec.
Edit: if your bottleneck is CPU rather than bandwidth, have you tried bundling updates into a single transaction? Let us say you are getting 10 updates per second, and you decide you are happy with a propagation delay of 2 seconds. Rather than opening a connection and a transaction for each of those 20 statements, bundle them together in a single transaction that executes every two seconds. That would substantially reduce your CPU usage.
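A rough sketch of that batching idea, assuming PDO on the PHP side; the DSN, the jobs table and the collect_pending_updates() helper are made up for the example:

```php
// Flush buffered updates in one transaction every two seconds instead of
// one connection and one transaction per statement.
$pdo = new PDO('mysql:host=dbhost;dbname=sync', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

while (true) {
    $batch = collect_pending_updates();    // hypothetical: drain whatever arrived so far
    if ($batch !== []) {
        $pdo->beginTransaction();
        $stmt = $pdo->prepare('UPDATE jobs SET status = :status WHERE id = :id');
        foreach ($batch as $update) {
            $stmt->execute([':status' => $update['status'], ':id' => $update['id']]);
        }
        $pdo->commit();                    // one commit for the whole batch
    }
    sleep(2);                              // the accepted propagation delay
}
```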