I am working on an IoT project in which 10K devices send data every 5 seconds. To communicate with the server, I use the Mosquitto MQTT broker. From the broker, a Laravel (Lumen, to be exact) application reads the data, processes it, and adds it to the database. To read data from MQTT into PHP, I am using the following package:
https://packagist.org/packages/salmanzafar/laravel-mqtt
This works perfectly at low load, but when high load hits the server, some of the packets go missing: they reach MQTT but never make it to the database through PHP.
Has anyone faced this issue? I have only one topic. Can anyone suggest a better alternative library in PHP?
Our testing with Mosquitto suggests severe degradation around 1,000 connections, mainly because it is a single-threaded broker. Of course, many variables affect performance, but if you care about performance, you'll end up with a commercial broker.
For example, see this blog post, https://gambitcomm.blogspot.com/2018/08/video-monitor-end-to-end-latency-of.html, which details end-to-end latency testing for 10k publishers.
I have 3 CodeIgniter-based application instances on two separate servers.
Server 1.
The first instance is the application, the second is a REST API; both use the same database. (I know there is no benefit to having two instances on the same machine other than cleanliness, and that is why I have it this way.)
Server 2.
This server holds only a REST API with a whole bunch of PHP data-processing functions. I call this server the worker, because that is all it does.
This server acts as an endpoint for the many API services I connect to.
Its first job is to receive requests from the application; sometimes it processes those requests before anything else.
It then sends requests to the API service, and once that is done the session is over.
A short time later the API service responds with results, which this server takes, processes, and then sends on to the application.
The application is at times heavy on very simple SQL queries, for the most part inserts/updates on a single table. The number of requests sent is kept to a minimum as well, mainly because I bundle data into as few requests as possible. I call these bulk requests.
What is very heavy is the number of responses I get: I can receive up to 1,000 responses to one request within a few seconds (I can't reduce that, because I need every single one). Each response is then followed by another two identical responses just to make sure I got it; I treat those as duplicates as soon as I can and stop processing them.
Then I process every response with PHP (nothing too heavy, just matching result arrays) and post it to my REST API on the application server to update the application tables.
Now, when I run, say, one request that returns 1,000 responses, the application processes the data fine with correct results, but the server is pretty much inaccessible to other users during that time.
Everything runs on a LAMP stack: Ubuntu 16.04 with MySQL and Apache.
The framework is the latest CodeIgniter.
Currently my setup is...
...for the application server
2 vCPUs
4 GB RAM
...for the worker API server
1 vCPU
1 GB RAM
I know the server setup is very weak and definitely bottlenecks, but this was just for the development stage.
Now I am moving into production and would like to hear your opinions on how best to approach this.
I am a programmer first and a server administrator second.
So I was debating switching to NGINX. I think I will definitely go with PHP-FPM, and maybe MariaDB, though I have read that thread management matters there. This app will not run heavy all the time, probably 50/50, so I may not be able to tune it optimally for all workloads anyway, and may end up with no better performance in the end.
Then I will probably have to multiply servers and set up load balancing, plus high availability.
I'm not sure about all this.
I don't think that just upgrading the servers to the maximum will help, though. I can go all the way up to 64 GB RAM and 32 vCPUs per server.
Can I hear your opinions please?
Maybe share some experience?
Links to resources if you have some good ones?
Thank you very much; I hope you can help me.
None of your questions matter. Well, that is an exaggeration. Machines today are not different enough to worry about picking the "best" on day one. Instead, implement something, run with it for a while, then see where your bottlenecks are in order to decide what to do next.
Probably you won't have any bottlenecks for a long time.
This question has as much to do with theory as with real-life programming. I first asked it on cs.stackexchange.com (https://cs.stackexchange.com/questions/81472/question-about-implementing-websockets-theory-and-the-reality-in-php), because it is mostly theory, and was directed to ask it here.
I have been experimenting with WebSockets and PHP for many years now (some of this code is already in production). First I created from scratch a WebSocket (WS) server with non-blocking I/O, and everything worked fine, except that in real life the other methods needed by the app couldn't be non-blocking (e.g. connecting to a DB and running a query). Then I introduced async programming, meaning that the WS server initiated various PHP requests to the server and checked on every loop iteration whether those requests had finished, so the results could be sent to the client. That worked well for a few client-side users connected to the WS server; the exact number depended on the operation, but it was never more than 30 or 50. That is because if you use only one thread and have many simultaneous requests, you must check each one of them sequentially for a finished result.
The next step was to analyze the code of popular approaches claiming they can hold and process many (some say 10,000) clients at the same time. Maybe they knew something that I didn't. (My issue isn't whether they are lying; the issue is whether there is something I am missing, or maybe I am wrong.) The results were frustrating. Most of them don't use async by default, advising you not to use blocking methods (something that is really impossible in real-life programming), but even when you add modules to make them async, the same problem I had arises.
The question isn't what the solution is, because I implemented PHP pthreads and could make it work, but with no real benefit (e.g. to share objects it had to serialize and unserialize everything). I have been writing C++ PHP extensions for some years now, so I am working on a PHP extension that will do this efficiently.
The question here is: am I missing something? How can they claim to handle a large number of requests simultaneously when, even with async programming, they have to check each request in the loop for completion?
Thank you in advance for any new knowledge or directions to search that your answer might lead me to.
Yes, there are projects that make this possible with PHP. One such project is Amp with its Aerys HTTP and WebSocket server. Yes, you can't just call blocking functions in the same thread. Yes, pthreads won't help; it's mostly like just running another PHP process, because everything in PHP is shared-nothing. But how does it work then?
Use non-blocking implementations where possible. There are libraries that work with non-blocking I/O for database access, such as amphp/mysql.
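For illustration, here is a minimal sketch of a non-blocking query using amphp/mysql (based on its v2 API from memory; the credentials, database, and table are placeholders, so check the library's README for the exact signatures):

    Amp\Loop::run(function () {
        // Connection pool; queries on it never block the event loop.
        $config = Amp\Mysql\ConnectionConfig::fromString(
            "host=127.0.0.1 user=app password=secret db=app"
        );
        $pool = Amp\Mysql\pool($config);

        // yield suspends only this coroutine; other clients keep being served.
        $result = yield $pool->execute("SELECT name FROM users WHERE id = ?", [1]);
        while (yield $result->advance()) {
            $row = $result->getCurrent();
            // ... use $row ...
        }
    });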
If there's no such library, ask whether something like that can be implemented if you don't want to / can't implement it yourself.
Another possibility is to use libraries such as amphp/parallel that use persistent workers for blocking tasks. Spawning another worker for each blocking task would be horribly inefficient, so that library makes it easy to use worker pools and keep these workers alive for several tasks each.
One such library that makes use of amphp/parallel is amphp/file, which uses these workers for non-blocking filesystem access when no extensions like uv or eio are available; maybe you want to have a look at its ParallelDriver.
How many connections you will be able to handle concurrently depends a lot on your hardware and on what you're doing with these connections. If you constantly stream data to each client, you will be able to keep far fewer connections open than in a situation where most connections are idle and only send / receive something during a small portion of the connected time.
If you want to handle more than ~1000 clients, you probably need an extension or a recompiled PHP, because of the FD_SETSIZE limit for stream_select, which is compiled in and limits stream_select to file descriptors lower than 1024.
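To see why a single thread doesn't have to check each request sequentially: with stream_select the OS wakes the loop only when some socket is actually ready, so idle connections cost almost nothing. A bare-bones sketch of such a loop (this is also the function the 1024-descriptor limit above applies to):

    // Minimal single-threaded echo server: stream_select() blocks until
    // at least one stream is readable, so we never busy-poll.
    $server = stream_socket_server("tcp://0.0.0.0:8080");
    $clients = [];

    while (true) {
        $read = $clients;
        $read[] = $server;
        $write = $except = null;

        stream_select($read, $write, $except, null); // wait, don't spin

        foreach ($read as $stream) {
            if ($stream === $server) {
                $clients[] = stream_socket_accept($server); // new connection
            } elseif (($data = fread($stream, 8192)) !== false && $data !== '') {
                fwrite($stream, $data); // echo back to the client
            } else {
                unset($clients[array_search($stream, $clients, true)]);
                fclose($stream); // client disconnected
            }
        }
    }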
Could someone explain to me how to build a real-time application with Symfony?
I have looked at a lot of documentation with my best friend Google, but I have not found sufficiently detailed articles.
I would like something more PHP-oriented, and I saw that there are technologies like ReactPHP / Ratchet (but I cannot find a tutorial clear enough on integrating them into an existing Symfony project).
Do you have any advice on which technologies to use and why? (If you have tutorial links I take!)
Thank you in advance for your answers !
Every useful Symfony application does some form of I/O. In traditional applications this is most often blocking I/O. Even if it's non-blocking I/O, it doesn't integrate a global event loop that could schedule other things while waiting for I/O.
If you integrate Symfony into an existing event-loop-based WebSocket server, it will work with blocking I/O as a proof of concept, but you will quickly notice that it doesn't run well in production, because any blocking I/O blocks your whole event loop and thus all other connected clients.
One solution is rewriting everything to non-blocking I/O, but then you'd no longer be using Symfony. You might be able to reuse some components, but only those not doing any I/O.
Another solution is to use RPC and push WebSocket requests into a queue. The intermediary can be written using only non-blocking I/O; it doesn't have much to do. It basically just forwards WebSocket messages as RPC requests to a queue. Then you have a set of workers pulling from that queue, doing a normal Symfony kernel dispatch, and sending the response into a response queue. Each worker can then continue with the next job.
With the second solution you can totally use blocking I/O and all existing Symfony components. You can spawn as many workers as you need and you can even keep them alive between requests. The difference with a queue in between is that one blocking worker doesn't block the responsiveness of the WebSocket endpoint.
If you want multiple WebSocket processes, you'll need separate response queues for them, so the responses are sent back to the right process where the client is connected.
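A rough sketch of such a worker, assuming Beanstalkd with the pda/pheanstalk client and a standard HttpKernel dispatch (tube names, payload fields, and the kernel class are placeholders):

    use Pheanstalk\Pheanstalk;
    use Symfony\Component\HttpFoundation\Request;

    $queue = Pheanstalk::create('127.0.0.1');
    $queue->watch('websocket-requests');

    $kernel = new App\Kernel('prod', false); // booted once, reused for every job

    while (true) {
        $job = $queue->reserve();                 // blocks until a job arrives
        $payload = json_decode($job->getData(), true);

        // A normal Symfony dispatch; blocking I/O is fine in this process.
        $request = Request::create($payload['path'], 'POST', [], [], [], [], $payload['body']);
        $response = $kernel->handle($request);
        $kernel->terminate($request, $response);

        // Push the response into the reply queue of the right WebSocket process.
        $queue->useTube($payload['replyTube']);
        $queue->put(json_encode([
            'clientId' => $payload['clientId'],
            'body'     => $response->getContent(),
        ]));

        $queue->delete($job); // done, remove it from the request queue
    }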
You can find a working implementation with Beanstalkd as the queue in kelunik/rpc-demo. src/Server.php is just for demo purposes and can be replaced with an HTTP server at any time. To keep the demo simple, it uses a single WebSocket process, but that can be changed as outlined above. You can start php bin/server and php bin/worker, then use telnet localhost 2000 to connect and send messages. It will respond with the same message, base64-encoded by the workers.
The mentioned demo is built on Amp, but the same concepts apply to ReactPHP as well.
In this issue of the official Symfony repository you may find comments and ideas about this: https://github.com/symfony/symfony/issues/17051
We are currently developing a mobile app for iOS and Android. For this, we need stable web services.
Requirements: based on PHP and MySQL, must be blazing fast, must be scalable.
I've created simple custom-coded web services with multiple endpoints to pass data from the app to our database, and vice versa.
My Question:
Our average response time with my custom-coded solution is below 100 ms (measured using New Relic) for normal requests (say, updating a DB field or performing an INSERT INTO). This is without any load, however (below 100 users daily). When we create outbound requests (specifically, sending e-mail using the SendGrid PHP framework), we see response times of over 1000 ms. It appears that the request is "waiting" for a response from SendGrid. Is it possible to tell the script not to "wait for a response"? This is not really ideal. My idea was to store all "pending" requests in a separate table and then use a cron job to run through all "pending" requests and mark them as "completed". Is this a viable solution? And would one cron run each minute be enough to process the requests (with a possible delay of up to 1 minute for each e-mail)?
As always, any replies or suggestions are very appreciated. Thanks in advance!
To answer the first part of your question: yes, you can make asynchronous requests with PHP, and even ignore the service's response. However, as you correctly say, it's not a great solution.
Asynchronous Requests
This excellent blog post on PHP Asynchronous Requests by Segment.io comes to several conclusions:
You can open a socket and write to it, as described in this Stack Overflow topic (see the sketch after this list). However, it seems that this is actually blocking and fairly slow (300 ms in their tests).
You can write to a log file and then process it in another way (essentially a queue, like you describe). However, this requires another process to read the log and process it. Using the file system can be slow, and shared files can cause all sorts of problems.
You can fork a cURL request. However, this means you aren't waiting for a response, so if SendGrid (or some other service) responds with an error, you can't catch it and react.
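For reference, the socket approach from the first bullet looks roughly like this (host and path are placeholders); note the caveat above that the connect itself can still block:

    // Fire-and-forget HTTP request: write the request and close the
    // socket without ever reading the response.
    $fp = fsockopen('api.example.com', 80, $errno, $errstr, 1); // 1s connect timeout
    if ($fp) {
        fwrite($fp, "POST /send-email HTTP/1.1\r\n"
                  . "Host: api.example.com\r\n"
                  . "Content-Length: 0\r\n"
                  . "Connection: Close\r\n\r\n");
        fclose($fp); // don't wait for (or check) the reply
    }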
Opinion Land
We're now entering semi-opinion land, but queues as you describe (such as a MySQL one with a cron job, a text file, or something else) tend to be very scalable, as you can throw workers at the queue if you need it to process faster. These can also live outside your user-facing system (and therefore not share resources).
Queues
With a queue, you'd have a separate service responsible for sending an e-mail with SendGrid (for example). It would pull tasks off a queue (e.g. "send an email to Nick") and then execute them.
There are several ways to implement queues that you can process.
You can write your own. As you seem to want to stay on PHP/MySQL, if you do this you'll need to take into account a bunch of queueing problems and weird edge cases. However, you'll have absolute control, and for a simple application this may work (see the sketch after this list).
You can run a self-hosted task queue. Celery is meant to be a distributed task queue, and ØMQ (ZeroMQ) and RabbitMQ can also be used as task queues. These are designed to be fast and distributed and have had a lot of thought put into them. You'd need to benchmark them in your system to see whether they speed it up. It also means hosting additional pieces yourself. This, however, is likely to be the fastest solution from a communication standpoint.
You can hand things off to a hosted task queue. IronMQ and Amazon SQS are both cool hosted solutions, which means you wouldn't need to dedicate resources to them; additionally, with IronWorkers (for example) the worker side could be taken care of too. However, since you're trying to optimize a request to an external service, this probably isn't the solution in this scenario.
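A minimal sketch of the roll-your-own variant from the first bullet, assuming a hypothetical email_queue table (id, recipient, body, status, run_id) and a cron job running this script every minute; send_with_sendgrid() stands in for your actual SendGrid call:

    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    // Atomically claim a batch: either this cron run wins the rows or an
    // overlapping one does, never both.
    $runId = uniqid('run', true);
    $claim = $pdo->prepare("UPDATE email_queue
                            SET status = 'processing', run_id = ?
                            WHERE status = 'pending' LIMIT 50");
    $claim->execute([$runId]);

    $jobs = $pdo->prepare("SELECT id, recipient, body FROM email_queue WHERE run_id = ?");
    $jobs->execute([$runId]);

    $done = $pdo->prepare("UPDATE email_queue SET status = 'completed' WHERE id = ?");
    foreach ($jobs->fetchAll(PDO::FETCH_ASSOC) as $job) {
        send_with_sendgrid($job['recipient'], $job['body']); // hypothetical helper
        $done->execute([$job['id']]);
    }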
Queueing Emails
On the topic of queueing e-mails specifically, this is common among email senders. As with everything else, it means you get better reliability (because if a service down the line fails, you can keep the message in the queue and retry).
With email, however, there are some specific services out there for queueing messages: SMTP servers. Theoretically you can set up a server like Sendmail, set SendGrid as your "smarthost" or relay, and have the server deliver to SendGrid. It then queues mail, deals with service interruptions, and sends with little additional code. However, SMTP servers are a pain to deal with, even if they're just forwarding messages. Additionally, SMTP is even slower than HTTP at establishing a connection, and therefore probably not what you want, but it's good to know.
Another possible solution, if you control your own server environment, that will speed up both your email sending and your application, is to install a mail server such as Postfix locally. You then configure Postfix to use your SendGrid credentials, so any email sent goes from your server to SendGrid.
This is not a PHP solution, but it removes the need to write your own custom solution. If you set Postfix as the default mail server, you can then just use the PHP mail() function to send email.
https://sendgrid.com/docs/Integrate/Mail_Servers/postfix.html
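Once Postfix accepts mail locally, the PHP side is just the built-in mail() call, which returns as soon as the message is handed to the local queue (the addresses here are placeholders):

    // Postfix queues the message and relays it to SendGrid in the
    // background; mail() itself returns almost immediately.
    mail('user@example.com', 'Welcome', 'Thanks for signing up!',
         'From: app@example.com');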
I have a problem that is giving me a hard time as I try to figure out the ideal solution, so, to explain it better, I'm going to lay out my scenario here.
I have a server that will receive orders from several clients. Each client will submit a set of recurring tasks that should be executed at specified intervals, e.g.: client A submits task AA, which should be executed every minute between 2009-12-31 and 2010-12-31; if my math is right, that's about 525,600 operations in a year. Given more clients and tasks, it would be infeasible to let the server process all of them itself, so I came up with the idea of worker machines. The server will be developed in PHP.
Worker machines are just regular, cheap Windows-based computers that I'll host at home or at my workplace; each worker will have a dedicated Internet connection (with a dynamic IP) and a UPS to avoid power outages. Each worker will query the server every 30 seconds or so via web service calls, fetch the next pending job, and process it. Once the job is completed, the worker will submit the output to the server and request a new job, and so on ad infinitum. If there is a need to scale the system, I should just set up a new worker and the whole thing should run seamlessly. The worker client will be developed in PHP or Python.
At any given time my clients should be able to log on to the server and check the status of the tasks they ordered.
Now here is where the tricky part kicks in:
I must be able to reconstruct the already-processed tasks if for some reason the server goes down.
The workers are not client-specific; one worker should process jobs for any given number of clients.
I have some doubts regarding the general database design and which technologies to use.
Originally I thought of using several SQLite databases and joining them all on the server, but I can't figure out how I would group results by client to generate the job reports.
I've never actually worked with any of the following technologies: memcached, CouchDB, Hadoop, and the like, but I would like to know whether any of them is suitable for my problem, and if so, which you would recommend to a newbie at "distributed computing" (or is this parallel computing?) like me. Please keep in mind that the workers have dynamic IPs.
As I said before, I'm also having trouble with the general database design, partly because I still haven't chosen a particular R(D)BMS. But one issue I have, which I think is agnostic to the DBMS I choose, relates to the queueing system: should I precalculate all the absolute timestamps for a given job, store that large set of timestamps, and execute and flag them as complete in ascending order? Or should I have a cleverer system, like "when timestamp modulo 60 == 0 -> execute"? The problem with this "clever" system is that some jobs will not be executed in the order they should be, because some workers could be sitting idle while others are overloaded. What do you suggest?
PS: I'm not sure if the title and tags of this question properly reflect my problem and what I'm trying to do; if not please edit accordingly.
Thanks for your input!
@timdev:
The input will be a very small JSON-encoded string; the output will also be a JSON-encoded string, but a bit larger (on the order of 1-5 KB).
The output will be computed using several resources available on the Web, so the main bottleneck will probably be bandwidth. Database writes may be another, depending on the R(D)BMS.
It looks like you're on the verge of recreating Gearman. Here's the introduction for Gearman:
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
You can write both your client and the back-end worker code in PHP.
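With the pecl/gearman extension, a worker and a client look roughly like this (the function name process_task and the JSON payload are placeholders for your own):

    // Worker: registers a function with the job server and serves jobs forever.
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('process_task', function (GearmanJob $job) {
        $input = json_decode($job->workload(), true);
        // ... fetch the web resources, compute the result ...
        return json_encode(['task' => $input['task'], 'status' => 'done']);
    });
    while ($worker->work());

    // Client: submits a job and blocks until the result comes back.
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    $result = $client->doNormal('process_task', json_encode(['task' => 'AA']));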
Regarding your question about a Gearman server compiled for Windows: I don't think it's available in a neat pre-built package for Windows. Gearman is still a fairly young project, and it may not have matured to the point of producing ready-to-run distributions for Windows.
Sun/MySQL employees Eric Day and Brian Aker gave a tutorial on Gearman at OSCON in July 2009, but their slides mention only Linux packages.
Here's a link to the Perl CPAN Testers project indicating that Gearman-Server can be built on Win32 using the Microsoft C compiler (cl.exe) and passes its tests: http://www.nntp.perl.org/group/perl.cpan.testers/2009/10/msg5521569.html But I'd guess you would have to download the source code and build it yourself.
Gearman seems like the perfect candidate for this scenario; you might even want to virtualize your Windows machines into multiple worker nodes per machine, depending on how much computing power you need.
Also, the persistent queue system in Gearman prevents jobs from getting lost when a worker or the Gearman server crashes. After a service restart, the queue just continues where it left off before the crash/reboot; you don't have to take care of any of this in your application, which is a big advantage and saves a lot of time/code.
Working out a custom solution might work, but the advantages of Gearman, especially the persistent queue, suggest to me that this may well be the best solution for you at the moment. I don't know of a Windows binary for Gearman, though, but I think it should be possible.
A simpler solution would be to have a single database with multiple PHP nodes connected to it. If you use a proper RDBMS (MySQL + InnoDB will do), you can have one table act as a queue. Each worker then pulls tasks from it to work on and writes the results back into the database upon completion, using transactions and locking to synchronise. This depends a bit on the size of the input/output data; if it's large, this may not be the best scheme.
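The pull-and-lock step could look something like this, given an open PDO connection $pdo and a hypothetical tasks table; SELECT ... FOR UPDATE makes InnoDB hold a row lock inside the transaction, so two workers can't claim the same task:

    $workerId = 'worker-01'; // hypothetical identifier for this worker

    $pdo->beginTransaction();
    $task = $pdo->query("SELECT id, input FROM tasks
                         WHERE status = 'pending' AND run_at <= NOW()
                         ORDER BY run_at LIMIT 1 FOR UPDATE")->fetch(PDO::FETCH_ASSOC);
    if ($task) {
        // Mark it as ours before releasing the lock.
        $claim = $pdo->prepare("UPDATE tasks SET status = 'running', worker = ? WHERE id = ?");
        $claim->execute([$workerId, $task['id']]);
    }
    $pdo->commit();
    // ... process $task, then write the output back and set status = 'done' ...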
I would avoid SQLite for this sort of task. Although it is a wonderful database for small apps, it does not handle concurrency very well: its only locking strategy is to lock the entire database and keep it locked until a single transaction is complete.
Consider Postgres, which has industrial-strength concurrency and lock management and can handle multiple simultaneous transactions very nicely.
Also, this sounds like a job for queueing! If you were in the Java world I would recommend a JMS-based architecture for your solution. There is a 'dropr' project doing something similar in PHP, but it's all fairly new, so it might not be suitable for your project.
Whichever technology you use, you should go for a "free market" solution, where the worker threads consume available "jobs" as fast as they can, rather than a "command economy", where a central process allocates tasks to chosen workers.
The setup of a master server and several workers looks right in your case.
On the master server I would install MySQL (the Percona InnoDB build is stable and fast) in master-master replication, so you won't have a single point of failure.
The master server will host an API which the workers poll every N seconds. The master checks whether a job is available; if so, it flags the job as assigned to worker X and returns the appropriate input to the worker (all of this over HTTP).
You can also store all the workers' script files here.
On the workers, I would strongly suggest installing a Linux distro. On Linux it's easier to set up scheduled tasks, and in general I think it's more appropriate for the job.
With Linux you can even create a live CD or ISO image with a perfectly configured worker and install it quickly and easily on all the machines you want.
Then set up a cron job that rsyncs with the master server to update/modify the scripts. That way you change the files in just one place (the master server) and all the workers get the updates.
In this configuration you don't care about the IPs or the number of workers, because the workers connect to the master, not vice versa.
The worker's job is pretty easy: ask the API for a job, do it, send back the result via the API. Rinse and repeat :-)
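That main loop can stay very small; a sketch (the URLs and the run_job() task logic are placeholders for your own):

    // Poll the master's API, run whatever it hands out, report back.
    while (true) {
        $job = json_decode(@file_get_contents(
            'https://master.example.com/api/next-job?worker=worker-01'
        ), true);

        if ($job) {
            $output = run_job($job); // hypothetical: your actual task logic

            $ch = curl_init('https://master.example.com/api/result');
            curl_setopt_array($ch, [
                CURLOPT_POST           => true,
                CURLOPT_POSTFIELDS     => json_encode(['id' => $job['id'], 'output' => $output]),
                CURLOPT_RETURNTRANSFER => true,
            ]);
            curl_exec($ch);
            curl_close($ch);
        } else {
            sleep(30); // nothing to do; wait before polling again
        }
    }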
Rather than reinventing the queuing wheel via SQL, you could use a messaging system like RabbitMQ or ActiveMQ as the core of your system. Each of these systems provides the AMQP protocol and has disk-backed queues. On the server you have one application that pushes new jobs into a "worker" queue according to your schedule, and another that writes results from a "result" queue into the database (or acts on them some other way).
All the workers connect to RabbitMQ or ActiveMQ. They pop work off the work queue, do the job, and put the response into another queue. After they have done that, they ACK the original job request to say "it's done". If a worker drops its connection, the job is restored to the queue so another worker can do it.
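With RabbitMQ, for example, a worker built on the php-amqplib library follows this shape (queue names and the do_job() task logic are placeholders; consuming with acknowledgements enabled is what lets an unACKed job return to the queue when a worker disconnects):

    use PhpAmqpLib\Connection\AMQPStreamConnection;
    use PhpAmqpLib\Message\AMQPMessage;

    $connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
    $channel = $connection->channel();
    $channel->queue_declare('worker', false, true, false, false); // durable queues
    $channel->queue_declare('result', false, true, false, false);
    $channel->basic_qos(null, 1, null); // at most one unACKed job per worker

    $channel->basic_consume('worker', '', false, false, false, false,
        function (AMQPMessage $msg) use ($channel) {
            $output = do_job(json_decode($msg->body, true)); // hypothetical task logic
            $channel->basic_publish(new AMQPMessage(json_encode($output)), '', 'result');
            $channel->basic_ack($msg->delivery_info['delivery_tag']); // "it's done"
        });

    while (count($channel->callbacks)) {
        $channel->wait();
    }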
Everything other than the queues (job descriptions, client details, completed work) can be stored in the database. But anything real-time should be put somewhere else. In my own work I'm streaming live power-usage data, and having many people hit the database to poll it is a bad idea. I've written about live data in my system.
I think you're going in the right direction with a master job distributor and workers. I would have them communicate via HTTP.
I would choose C, C++, or Java for the clients, as they can run external scripts (execvp in C, Runtime.exec in Java). Jobs could just be the name of a script plus its arguments. You could have the clients return a status for each job; if a job failed, you could retry it. You could have the clients poll for jobs every minute (or every x seconds), and have the server sort out the jobs.
PHP would work for the server.
MySQL would work fine for the database. I would just store two timestamps per job: start and end. On the server, I would look for jobs that are due whenever seconds == 0, i.e. on each minute boundary within that window.
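A rough sketch of that server-side check, given an open PDO connection $pdo and a hypothetical jobs table with start_at/end_at bounds, a per-job interval_secs, and a last_run column, polled once a minute:

    // Pick jobs whose window covers the current time and whose interval
    // has elapsed since they last ran.
    $due = $pdo->query("SELECT id FROM jobs
                        WHERE NOW() BETWEEN start_at AND end_at
                          AND (last_run IS NULL
                               OR TIMESTAMPDIFF(SECOND, last_run, NOW()) >= interval_secs)")
               ->fetchAll(PDO::FETCH_ASSOC);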