I wanted to know: if I fork a process after binding the server's IP and port, will the fork be able to accept connections? To extend on that, if I have 10 forks all trying to accept, is there a chance that more than one could accept the same connection, or is there some locking on that?
A few days ago I felt like writing an HTTP server in PHP, so that it can handle more than one connection at a time. The master process accepts the connection, reads the data in, and passes it to a thread via a Unix socket. So far on my laptop I can get 1000 connections a second on a little page that gives the current date and time. One of the bottlenecks is the master process. Originally I would have loved to get the file descriptors of the connections, pass those to the threads, and have them read the data in and then process it.
Yes, the forked children will be able to accept new connections on the same (inherited) listening socket.
Assuming you're using a blocking socket_accept() in all your child processes, you shouldn't experience any performance issues even if it ramps up to 100 processes; the operating system will wake up one child process to handle the connection.
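A minimal sketch of that pre-fork pattern (the port, child count, and response are hypothetical, and it assumes the sockets and pcntl extensions are available):

    <?php
    // Parent creates and binds the listening socket once...
    $server = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_option($server, SOL_SOCKET, SO_REUSEADDR, 1);
    socket_bind($server, '0.0.0.0', 8080);   // hypothetical address/port
    socket_listen($server, 128);

    // ...then forks N children that all block in socket_accept()
    // on the same inherited listening descriptor.
    for ($i = 0; $i < 10; $i++) {
        if (pcntl_fork() === 0) {             // child
            while (true) {
                // The kernel hands each incoming connection to exactly one child.
                $client = socket_accept($server);
                socket_write($client, "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok");
                socket_close($client);
            }
        }
    }

    // Parent just waits on its children.
    while (pcntl_wait($status) > 0);

Each child sits in its own blocking socket_accept(); the kernel returns a given connection to only one of them, which is the locking the question asks about.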
It should be mentioned that it's good practice to benchmark it, using ab or similar load generator tools.
The use case:
8 servers with 300 concurrent php-fpm child processes each, producing records to Apache Kafka.
Each one produces 1 Kafka record, adding up to 1000 records per second.
Why do we need so many connections?
We have a web API that gets 60K calls per minute. Those requests do many things and are processed by thousands of php-fpm web workers (unfortunately). As part of the request handling, we produce events to Kafka.
The problem:
I cannot find a way to persist connections between php-fpm web requests, which leaves me with a pattern that seems very inefficient and might hit Kafka's limits (will it?).
The result is 1000 producer connections per second being established, each sending one single record and getting closed right after.
I read here https://www.reddit.com/r/PHP/comments/648zrk/kafka_php_71_library/ that php-rdkafka is efficient, but I don't know if it can solve this issue.
I thought that OPcache might be handy for reusing connections, but I cannot find a way to do it.
The question:
Is establishing and closing 1000 connections per second fast and cheap, or is a proxy with cheap connectivity that reuses the Kafka connection a must for such a use case?
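For reference, a rough sketch of the kind of proxy I have in mind: a long-running CLI daemon that owns a single producer and that the php-fpm workers talk to over a local socket. This assumes php-rdkafka's RdKafka\Producer API; the broker address, topic name, and socket path are hypothetical:

    <?php
    // Long-running daemon: one Kafka producer, reused for every record.
    $conf = new RdKafka\Conf();
    $conf->set('metadata.broker.list', 'kafka:9092');    // hypothetical broker
    $producer = new RdKafka\Producer($conf);
    $topic    = $producer->newTopic('events');            // hypothetical topic

    $server = stream_socket_server('unix:///tmp/kafka-proxy.sock', $errno, $errstr);

    while ($client = stream_socket_accept($server, -1)) {
        while (($line = fgets($client)) !== false) {
            $topic->produce(RD_KAFKA_PARTITION_UA, 0, rtrim($line, "\n"));
            $producer->poll(0);                            // serve delivery callbacks
        }
        fclose($client);
    }

The request side then makes no Kafka connection at all, just a cheap write to the local Unix socket:

    // Inside a php-fpm request:
    $sock = stream_socket_client('unix:///tmp/kafka-proxy.sock');
    fwrite($sock, json_encode(['event' => 'example']) . "\n");
    fclose($sock);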
I need to write a socket server that will handle at least around 1000 (many more in the future) low-traffic permanent connections. I made a draft version in PHP for testing purposes (we are developing monitoring hardware, so we needed to develop and test a conversation protocol and the hardware's capabilities), which suited me very well when I had just a couple of clients connected. But when the number of connections grew to ten, some critical issues appeared. Here is some info about the server architecture:
I have a master process which waits for socket connections and, on each connection, creates a child process (that serves this connection from then on) using pcntl_fork(). I also set up a PDO connection to MySQL in the master process. All the child processes share the same single PDO object. At first I was afraid of collisions during simultaneous queries, but I haven't encountered any, even under a stress test (10 children making queries in a loop without stopping). There is a usleep(500000) in each child, though, so it could be luck, even though I had this test running for a couple of hours. In any case, such load should not occur even with 1k clients connected, because conversations between them and the server are rare.
So here is my first question: is it safe to use a single PDO object for a large number of child processes (ideally there would be around 1000)? I could use a separate connection for each child, but MySQL doesn't support nearly that many connections.
The second issue is getting parasite MySQL connections. As I mentioned before, I have only one PDO object. But when I have more than one client connected, and after they have run some queries, I see in mytop that there is more than one DB connection, and I could not find any correlation between the number of connections and the number of child processes I have. For example, I have 3 children and 5 DB connections. I tried establishing persistent connections, and it didn't change anything.
Second question: Is it PDO that makes those additional connections to MySQL, or is it the MySQL driver? And is there a way to force them to use one connection? I don't think it can be my fault: my code prints an alert to the console every time I call the method which creates the PDO object, and that happens only once, at script start, before forking. After that I only run queries from the children, using the parent's PDO object. Once again, I cannot afford so many connections due to MySQL limitations.
Third question: Will one thousand socket connections be a problem by themselves, aside from the CPU and database load? Or should I run several smaller servers (128 connections each, for example) that tell clients to connect to another one if the maximum number of connections is exceeded?
Thanks in advance for your time and possible answers.
Currently your primary concern should be your socket server architecture. Forking a process for each client is super heavy. AFAIK an average PC can tolerate around 2000 threads, and it's not going to work fast. Switching between processes means the CPU has to save its state to memory, and if you have an enormous number of processes, the CPU will be busy with memory I/O and will have little time for actually doing work.
You may want to look at Apache for inspiration. In Apache they use a fixed amount of worker processes/threads, each process/thread working with multiple clients via select function and sockets in a non-blocking mode. This is a far more robust approach.
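A minimal sketch of that select-based loop, for reference (single process, hypothetical port, echoing data back as a placeholder for real work):

    <?php
    $server = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    socket_set_option($server, SOL_SOCKET, SO_REUSEADDR, 1);
    socket_bind($server, '0.0.0.0', 9000);        // hypothetical port
    socket_listen($server, 128);
    $clients = [];

    while (true) {
        // socket_select() tells us which sockets are readable,
        // so one process never blocks on an idle connection.
        $read = array_merge([$server], $clients);
        $write = $except = null;
        socket_select($read, $write, $except, null);

        foreach ($read as $sock) {
            if ($sock === $server) {                                        // new connection
                $clients[] = socket_accept($server);
            } elseif (($data = socket_read($sock, 4096)) === '' || $data === false) {
                socket_close($sock);                                        // client went away
                $key = array_search($sock, $clients, true);
                unset($clients[$key]);
            } else {
                socket_write($sock, $data);                                 // placeholder for real work
            }
        }
    }

One such loop can serve many low-traffic connections; Apache-style servers simply run a fixed number of these workers side by side.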
Regarding database IO, I would spawn a process/thread that would be the sole owner of database connections. Worker processes would communicate with the DB IO process using IPC (in case of processes) or lock-free queues (in case of threads). This approach makes you independent of PDO implementation details (if it is thread safe or does it spawn connections etc).
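A rough sketch of that separation, assuming processes and a Unix socket for the IPC (the DSN and socket path are hypothetical, and for brevity the DB process serves workers one connection at a time; a real version would multiplex with stream_select() and handle errors):

    <?php
    // Dedicated DB process: the only place a PDO connection exists.
    // Workers send one SQL statement per line and get one line of JSON back.
    if (pcntl_fork() === 0) {
        $pdo    = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'secret'); // hypothetical DSN
        $server = stream_socket_server('unix:///tmp/db-io.sock', $errno, $errstr);

        while ($worker = stream_socket_accept($server, -1)) {
            while (($sql = fgets($worker)) !== false) {
                $rows = $pdo->query(rtrim($sql, "\n"))->fetchAll(PDO::FETCH_ASSOC);
                fwrite($worker, json_encode($rows) . "\n");
            }
            fclose($worker);
        }
        exit;
    }

    usleep(100000); // crude wait for the socket to appear; a real version would retry

    // A worker process only ever talks to the DB process, never to MySQL directly.
    $db = stream_socket_client('unix:///tmp/db-io.sock');
    fwrite($db, "SELECT NOW() AS now\n");
    $rows = json_decode(fgets($db), true);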
P.S. I suspect that you actually spawn new PDO objects with forking (forking merely means making a copy of a process, with its memory and everything inside it) and that PDO objects create and shut down connections on demand. That may explain why you're not seeing a correlation between your low-traffic clients and the number of DB connections.
I'm using Redis in a PHP project. I use phpredis as the client. Sometimes, during long-running CLI scripts, I experience PHP segmentation faults.
I've experienced before that phpredis has problems when the connection times out. As my Redis config is configured to automatically close idle connections after 300 seconds, I guess that causes the segmentation fault.
In order to be able to choose whether to increase the connection timeout or default it to 0 (which means "never timeout"), I would like to know what the possible advantages and disadvantages are?
Why should I never close a connection?
Why should I make sure connections don't stay open?
Thanks
Generally, opening a connection is an expensive operation, so the modern best practice is to keep connections open. On the other hand, open connections require resources (from the database) to manage, so keeping a lot of idle connections open can also be problematic. This trade-off is usually resolved via the use of connection pools.
That said, what's more interesting is why PHP segfaults. The timeout is, evidently, caused by a long-running command (the CLI script in your case) that blocks Redis (which is mostly single threaded) from attending to the PHP app's connections. While this is well-known Redis behavior, I would expect PHP (even without a reconnect feature in the client library) not to s**t its pants so miserably.
The answer to your question depends a lot on how Redis is used in your application. So, should you never close a connection via an idle connection timeout?
In general, keep it at the default of 0. Why, or when:
Any type of long-living application, such as a CLI script or background worker. Why: phpredis has no built-in reconnection feature, so you either have to take care of that yourself or avoid the idle timeout altogether.
Each time a request finishes or a CLI script dies, all connections are closed by the PHP engine, and the Redis server closes the connections for those closed client sockets. You will have no problems like zombie connections or anything of that sort. Being an extension, phpredis also closes the connection in its destructor, so you can be sure connections don't stay open.
P.S. Of course, you can implement reconnection yourself in some proxy class in PHP, as sketched below. We run Redis in a high-load environment (~4000 connections per second per instance). Since version 2.4 we have not used the idle connection timeout, and we have had no trouble of any kind with that.
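A minimal sketch of such a wrapper, assuming phpredis and hypothetical connection parameters (it simply reconnects once and retries when a call throws):

    <?php
    // Thin wrapper around phpredis that reconnects when a call fails
    // because the server dropped an idle connection.
    class ReconnectingRedis
    {
        private $redis;
        private $host;
        private $port;

        public function __construct($host = '127.0.0.1', $port = 6379)
        {
            $this->host = $host;
            $this->port = $port;
            $this->connect();
        }

        private function connect()
        {
            $this->redis = new Redis();
            $this->redis->connect($this->host, $this->port, 2.5); // 2.5s connect timeout
        }

        public function __call($method, array $args)
        {
            try {
                return $this->redis->$method(...$args);
            } catch (RedisException $e) {
                $this->connect();                       // one retry after reconnecting
                return $this->redis->$method(...$args);
            }
        }
    }

    // Usage: behaves like a Redis instance, but survives idle-timeout disconnects.
    $redis = new ReconnectingRedis();
    $redis->set('heartbeat', time());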
As I understand, Apache isn't suited to serving long-poll requests, as each request into Apache will use one worker thread until the request completes, which may be a long time for long-poll/COMET requests.
But what about socket connections? On the PHP website I saw an example of a "simple multi-client server written in PHP that really works".
My question: Do such socket servers use only one worker thread for all established connections? And what about the opposite: is it possible to write a PHP client which connects to several socket servers simultaneously using only one worker thread?
Look at phpDaemon. It is designed for long-poll applications and the like. But I advise you to use node.js for these tasks, if possible.
That is an example of a polling-loop style server - see the MSG_DONTWAIT constant being passed to socket_recv()? Essentially, it has a single thread that loops through all of its open sockets to see if any of them has data waiting. If a socket doesn't have data waiting, it moves on to the next and checks it.
However, note that with such a server, you don't get nice protocol handling beyond the TCP base - you have to worry about parsing a stream of raw data yourself.
All of your connections are done with sockets. The main difference is whether I/O is blocking or not. Choosing to receive from a socket that blocks will cause the thread to block, but using MSG_DONTWAIT will finish immediately.
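As a small illustration of the difference (assuming the sockets extension and an already connected $sock):

    <?php
    // Blocking read: this call sits here until the peer sends something.
    $n = socket_recv($sock, $buf, 4096, 0);

    // Non-blocking read: returns immediately. false with EWOULDBLOCK just
    // means "no data right now", not a real error.
    $n = socket_recv($sock, $buf, 4096, MSG_DONTWAIT);
    if ($n === false && socket_last_error($sock) === SOCKET_EWOULDBLOCK) {
        // nothing waiting on this socket; move on and check the next one
    }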
Apache gives you a few options in this regard. You can fork for concurrent connections (mpm-prefork), use a different thread for each connection (mpm-worker), or threads with non-blocking I/O (mpm-event).
I read somewhere that PHP has a threading ability called pcntl_fork(), but I still don't get what it means. What's its purpose?
Programming languages have multithreading abilities; from what I understand (correct me if I'm wrong), it's the ability for a parent object to have children of some kind.
thanks
From wikipedia: "In computer science, a thread of execution is the smallest unit of processing that can be scheduled by an operating system. It generally results from a fork of a computer program into two or more concurrently running tasks."
Basically, having threads is having the ability to do multiple things within the same running application (or, process space as said by RC).
For example, if you're writing a chat server app in PHP, it would be really nice to "fork" certain tasks so if the server gets hung up processing something like a file transfer or very slow client it can spawn a thread to take care of the file transfer while the main application continues to transfer messages between clients without delay. Last time I'd used PHP, the threading was clunky/not very well supported.
Or, on the client end, while sending said file, it would be a good idea to thread the file transfer, otherwise you wouldn't be able to send messages to the server while sending the file.
It is not a very good metaphor. Think of a thread like a worker or helper that will do work or execute code for you in the background while your program could possibly be doing other tasks such as taking user input.
Threading means you can have more than one line of execution within the same process space. Note that this is different than a multi-process paradigm as processes will not share the same memory space.
This wiki link will do a good job getting you up to speed with threads. Having said that, the function pcntl_fork() in PHP appears to create a child process, which falls in line with the multi-process paradigm.
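A tiny sketch of what pcntl_fork() actually does (after the call there are two processes running the same script; the return value tells each one which it is):

    <?php
    $pid = pcntl_fork();

    if ($pid === -1) {
        exit("fork failed\n");
    } elseif ($pid === 0) {
        // Child process: its own copy of the script's memory from here on.
        echo "child: doing some work\n";
        exit(0);
    } else {
        // Parent process: $pid holds the child's process id.
        pcntl_waitpid($pid, $status);   // wait for the child to finish
        echo "parent: child $pid exited\n";
    }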
In layman's terms, since a thread gives you more than one line of execution within a program, it allows you to do more than one thing at the same time. Technically, you're not always doing these things simultaneously: on a single-core processor you're really just time-slicing, so it only appears that you're doing more than one thing at a time.
A pretty straight-forward use of threads is how connections to a web server are handled. If you didn't have multiple threads, your application would listen for a connection on a socket, accept the connection when a client requested a connection, and then would process whatever page the client asked for. This seems well and good until you have a page that takes 5 seconds to load and you have 2 clients connecting at the same time. One of the clients will sit and wait for the server to accept their connection for ~5 seconds, because the 1st client is using the only line of execution to serve the page and it can't do that and accept the 2nd connection.
Now if you have multiple threads, you'll have one thread (i.e. the listener thread) that only accepts connections. As soon as the connection is accepted by the listener thread, it will pass the connection on to another thread. We'll call it the processor thread. Once the connection is passed on to the processor thread, the listener thread will immediately go back to waiting for a new connection. Meanwhile, the processor thread will use its own execution line to serve the page that takes 5 seconds. In the scenario above, the 2nd client would have its connection accepted immediately after the 1st client was handed to the 1st processor thread, and an additional 2nd processor thread would be created to handle the request from the 2nd client. This would typically allow you to serve both clients the data in a little over 5 seconds, while the single-threaded app would take ~10 seconds.
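PHP itself doesn't expose threads in this way, but the same listener/processor idea can be sketched with processes instead, as a rough illustration (hypothetical port and response; one pcntl_fork() per accepted connection):

    <?php
    pcntl_signal(SIGCHLD, SIG_IGN);   // let the kernel reap finished children

    // "Listener": the parent does nothing but accept connections.
    $server = stream_socket_server('tcp://0.0.0.0:8081', $errno, $errstr);  // hypothetical port

    while ($conn = stream_socket_accept($server, -1)) {
        if (pcntl_fork() === 0) {
            // "Processor": this child serves one slow client while the
            // parent is already back in stream_socket_accept().
            sleep(5);                                    // pretend the page takes 5 seconds
            fwrite($conn, "HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\nslow");
            fclose($conn);
            exit(0);
        }
        fclose($conn);   // parent closes its copy of the client socket
    }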
Hope this helps with your understanding of application threading.
Threading means your code does not have to behave purely sequentially, whatever your programming language.