I have built a small PHP server/client program. When I say client/server, I mean it acts as both a client and a server, alternating for 5 seconds in each mode.
Now the code runs on two servers and is triggered by a cron.
On rare occasions they manage to get into perfect sync with each other, and then they either establish a connection at the very last microsecond, by which point the PHP code has already switched to client mode, or they never manage to establish a connection at all.
Before this whole dance starts they run some database queries to select some information, which can be big or small and is never identical on the two servers, so adding some randomness to the timings has only made these incidents rarer, not made them disappear completely.
Anyone ever manage to do something like this successfully? How?
You have designed a race condition here. No matter how you try to synchronize these, you'll get in trouble eventually.
The way to solve this is to have each process act as a server all the time, and perform client functionality on demand.
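As a minimal sketch of that design in PHP (the port, the peer address, and the next_outgoing_message() helper standing in for your database work are all made up for illustration): each process listens the whole time and only dials out when it actually has something to send, so nothing depends on the two 5-second windows lining up.

<?php
// Always a server; client only on demand.
$server = stream_socket_server('tcp://0.0.0.0:9000', $errno, $errstr); // assumed port
if ($server === false) {
    die("listen failed: $errstr ($errno)\n");
}

while (true) {
    // Server role: accept a pending connection if there is one (timeout 0 = don't wait).
    $conn = @stream_socket_accept($server, 0);
    if ($conn !== false) {
        $request = fgets($conn);
        fwrite($conn, "ack: $request");
        fclose($conn);
    }

    // Client role, on demand: connect out only when there is work to push.
    if ($work = next_outgoing_message()) { // hypothetical helper
        $peer = @stream_socket_client('tcp://peer.example.com:9000', $errno, $errstr, 5);
        if ($peer !== false) {
            fwrite($peer, $work . "\n");
            fclose($peer);
        }
    }

    usleep(100000); // don't busy-loop
}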
I've been reading a lot on the subject of SSE and PHP, most of which seems to advocate it as a viable solution for all sorts of things, including chat apps. I have seen similar questions on this site but have not found a concise, definitive answer.
Is there something inherent in SSE which makes it way more server-friendly than AJAX short polling? The headers appear to be of very similar size, so I am wondering if there is some kind of behind-the-scenes stuff beyond the headers that a noob like myself can't see, e.g. some sort of connection recognition with each request/response. I know there are other factors involved where SSE prevails, such as handling disconnections.
In terms of using it in a chat-app scenario, AJAX and SSE appear to be doing the same thing. Neither of them seems able to perform long polling effectively with PHP. If I have User A and User B waiting on a PHP script that checks the DB for new messages from the other user, then sleeps for 3 seconds, for say 10 loops, User A's new message cannot be inserted until User B has looped through the entire checking script, thereby rendering it absolutely useless (at least based on everything I've tried in the last 2 weeks!). I can get it working smoothly if I chat with myself and no one else is waiting on the checking script, but I've run out of things to talk about with myself and would really enjoy someone else being able to use it too.
So, in a nutshell: given an Apache and PHP environment where WebSockets are not an option (due to shared hosting), is short polling with one's choice of AJAX or SSE the only effective way to write a chat app, judged on server burden alone, or is SSE definitely the superior option?
I would pursue WebSockets if the eventual traffic called for it and justified the web hosting upgrade.
(Also, as an aside: is my premise off base regarding the long-polling scenario I described above, where User A must wait for User B's loop to finish before he/she/it can perform the insert? It has me confused as to why that should be the case.)
It's kind of a long-winded, meandering question, but I'm hoping someone in the same situation can find it and save themselves a lot of time.
Many Thanks!
Yes, SSE is a better option than AJAX, because AJAX polling is done against the main server, where most of the normal user traffic hits, whereas SSE polling can be done on a separate instance built for it, so there will be no extra traffic on the main server. Please check Mercure (https://mercure.rocks/)
EDIT:
What I mean is that using SSE with a platform like Mercure would be a better option than AJAX. AJAX will make its requests to the main server, which increases the request count on the main server, whereas with tools like Mercure we can distribute the network load and still achieve the required functionality.
SSE can be thought of as a thin API wrapper around the AJAX long-poll approach. It brings a standard API to something that was a hacky solution before.
something inherent in SSE which makes it way more server-friendly than AJAX short polling?
It holds the socket open. The pro of this is less latency (as soon as the server has the new information it sends it to the client, rather than waiting for the next client poll); the con is the extra resource usage (the socket, and the PHP process).
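For concreteness, here is a minimal SSE endpoint sketch in PHP, where fetch_new_messages() is a hypothetical stand-in for the DB check; the point is that the loop lives inside one held-open request rather than one request per poll.

<?php
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');

// On reconnect the browser re-sends the id of the last event it saw.
$lastId = (int)($_SERVER['HTTP_LAST_EVENT_ID'] ?? 0);

while (!connection_aborted()) {
    foreach (fetch_new_messages($lastId) as $id => $msg) { // hypothetical helper
        echo "id: $id\n";
        echo 'data: ' . json_encode($msg) . "\n\n";
        $lastId = $id;
    }
    @ob_flush();
    flush();
    sleep(1); // check once a second while the socket stays open
}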
but I've run out of things to talk about with myself
Surely not. Have you tried starting a chat about whether time is an illusion, and what came before?
with WebSockets as not an option (due to shared hosting)
SSE and WebSockets both hold a socket open. Shared hosting ISPs often go around closing sockets that have been open a long time (e.g. over 60s), unless they explicitly say they support SSE. They may also kill long-running PHP processes.
is my premise off base regarding the long-polling scenario I described above where User A must wait for User B's loop to finish before he/she/it can perform the insert?
I think it is off. The "A" in AJAX stands for asynchronous, meaning you can have multiple AJAX/SSE requests running at the same time, and on the server side you will have a distinct PHP process running for each request.
I'm programming a C++ service which, every second, makes a SELECT query with LIMIT 1 on a MySQL server, computes something, and then makes an INSERT, in a loop that runs forever and ever.
I'd like to detect server overload so I can make SELECTs with a bigger LIMIT, for example LIMIT 10, at longer intervals, like every 5 seconds or so. I'm not sure whether my solution would actually lighten the overload.
My problem is how to detect these overloads, and I'm not sure what I even mean by overload :) It could be anything, but my application is a web application in PHP (a chat), so the overload could be detected on the Apache2 side, or the MySQL side, or by measuring how many users make how many inputs (chat messages) in a time interval. I don't know :/
Thank you!
EDIT: Okay, I made a socket server out of my C++ application and it's really fast that way. Now I'm struggling with memory leaks, but that's another story.
So thank you #brianbeuning for the helpful thoughts about my problem.
Better to solve that forever-and-ever loop; it's not a good idea.
If that loop really is a must, then use some caching technique.
For detecting "overload" (I would call it high MySQL CPU usage), try calling external commands supported by the operating system.
For example, if you are on Linux, play with the ps command.
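A rough sketch of that idea, in PHP for consistency with the rest of this page (the process name mysqld and the 80% threshold are assumptions): read the MySQL daemon's CPU share from ps and widen the polling parameters when it climbs.

<?php
// Read mysqld's current CPU percentage (Linux, procps ps).
$cpu = (float) trim((string) shell_exec('ps -C mysqld -o %cpu='));

if ($cpu > 80.0) {        // assumed threshold for "overloaded"
    $limit = 10;          // batch more rows per SELECT
    $intervalSeconds = 5; // and poll less often
} else {
    $limit = 1;
    $intervalSeconds = 1;
}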
EDIT:
I realize now that you are programming a chat server.
Using MySQL as a middleman is NOT a good idea.
Try solving this without MySQL, and then, if you need to save the chat log, occasionally write it to MySQL (e.g. every 10 seconds or so).
I bet it is a CPU hog right now with just 20 intensive users.
Try to make the communication direct client-to-client, without requiring the server (use the server only to establish the connection between the two clients).
Another approach would be to buffer the data in your app and use a connection pool of sorts to manage the load: keep a rolling buffer of data that needs to be inserted, and manage the 'limit' based on the size of the buffer.
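A sketch of that buffering pattern, in PHP for consistency with this page even though the asker's service is C++ (the DSN, table name, and both thresholds are made up): rows queue in memory and are flushed in a single multi-row INSERT when the buffer is big enough or old enough.

<?php
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$buffer = [];
$lastFlush = time();

while (true) {
    $buffer[] = compute_next_value(); // hypothetical work function

    // Flush on size or age, whichever comes first.
    if (count($buffer) >= 100 || time() - $lastFlush >= 5) {
        $marks = implode(',', array_fill(0, count($buffer), '(?)'));
        $db->prepare("INSERT INTO results (value) VALUES $marks")
           ->execute($buffer);
        $buffer = [];
        $lastFlush = time();
    }

    sleep(1);
}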
OK, I am working on something like a chat environment, and I'd like to have near-real-time, if not real-time, conversation. But I know browsers will only give you two connections at a time per domain. So I am trying to figure out a way to make a synchronous chat without really affecting the browser. I also know browsers tend to lock up with synchronous requests.
So what's the best approach to creating a chat-like environment on a site from scratch? Assume the DB and scripting concept is fine; it's the managing of the connection I'm asking about: how to keep a persistent connection that won't congest the browser and possibly cause it to freeze up.
Anyone have any ideas? I'm not looking for Flash- or Java-based solutions. I'd prefer not to poll every second either. But what is Stack's impression; what would you do?
First off, the spec only suggests that two connections are allowed. Most modern browsers actually support up to 6.
There are three main accepted methods for creating a chat system out of pure JavaScript:
Polling
The first solution is simple, and just involves polling the server every few seconds (5 is a nice number) to see what it has missed. It works simply and efficiently, but if you're not careful it can lead to a large number of unnecessary requests, causing unnecessary server load.
A better implementation involves polling simply to check whether anything has happened since the last chat update, and only then going through the process of finding out what happened. This saves on both the server-load and bandwidth fronts.
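A sketch of that pre-check in PHP, assuming a messages table with an auto-increment id: the client sends the last id it has, and the server returns a tiny "nothing changed" response unless the id has moved.

<?php
$db = new PDO('mysql:host=localhost;dbname=chat', 'user', 'pass');

$since  = (int)($_GET['since'] ?? 0);
$latest = (int)$db->query('SELECT MAX(id) FROM messages')->fetchColumn();

header('Content-Type: application/json');
if ($latest <= $since) {
    echo json_encode(['changed' => false]); // tiny response, cheap to serve
} else {
    $stmt = $db->prepare('SELECT id, body FROM messages WHERE id > ?');
    $stmt->execute([$since]);
    echo json_encode(['changed' => true, 'messages' => $stmt->fetchAll(PDO::FETCH_ASSOC)]);
}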
Waiting
This method is more commonly used, and involves the browser sending a request to the server which is not fulfilled immediately and instead keeps 'waiting for a response'. When something happens, the server outputs it and fulfills the request; the client then makes another request and the process repeats. This saves on the request front, but can end up with a backlog of ongoing processes on your server.
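A sketch of such a waiting handler in PHP (the table name, the 25-second deadline, and the 1-second step are assumptions). The session_write_close() matters: a request that kept the session file locked would block that same user's other requests for the whole wait.

<?php
session_write_close(); // release the session lock before the long wait

$db = new PDO('mysql:host=localhost;dbname=chat', 'user', 'pass');
$since    = (int)($_GET['since'] ?? 0);
$deadline = time() + 25; // finish before typical 30s execution limits

do {
    $stmt = $db->prepare('SELECT id, body FROM messages WHERE id > ?');
    $stmt->execute([$since]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    if ($rows) break; // something happened: answer immediately
    sleep(1);         // nothing yet: keep the request waiting
} while (time() < $deadline);

header('Content-Type: application/json');
echo json_encode($rows); // an empty array just tells the client to re-request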
Websockets
https://developer.mozilla.org/en/WebSockets
This involves creating a direct socket connection to the server, allowing data to be pushed to the client when needed. It's relatively new though, and can have some compatibility issues, especially with older browsers.
Out of these, none is specifically the 'best method'; it depends on what you're aiming for and what matters to you. If you've got a site designed for up-to-date browsers, then WebSockets could be your answer, but if you've got a small-ish server, then polling could be better, for example.
My own chat engine checks for new messages every five seconds. That's close enough to instant that nobody knows the difference.
It's as simple as setInterval(updateChat,5000);.
I am trying to write a client-server app.
Basically, there is a Master program that needs to maintain a MySQL database keeping track of the processing done on the server side,
and a Slave program that queries the database to see what to do to keep in sync with the Master. There can be many slaves at the same time.
All the programs must be able to run from anywhere in the world.
For now, I have tried setting up a MySQL database on a shared hosting server to host the DB, and made C++ programs for the master and the slave that use the cURL library to make requests to a PHP file (e.g. www.myserver.com/check.php) located on my hosting server.
The master program calls the URL every second and some PHP code is executed to keep the database up to date. I did a test with a single slave program that also calls the URL every second and executes PHP code that queries the database.
With that setup, however, my web host suspended my account and told me that I was 'using too much CPU resources' and that, from their analysis of the CPU resources needed, I would have to use a dedicated server ($200 per month rather than $10). And that was with one Master and only one Slave, so no more than 5-6 MySQL queries per second. What would it be with 10 slaves, then?
Am I missing something?
Would there be a better setup than the one I was planning to use in order to achieve the syncing mechanism I need between two or more far-apart programs?
I would use Google App Engine for storing the data. You can read about free quotas and pricing here.
I think the syncing approach you are taking is probably fine.
The more significant question you need to ask yourself is: what is the maximum acceptable time between syncs? If you truly need virtually real-time syncing between two databases on opposite sides of the world, then you will be using significant bandwidth, and you will unfortunately have to pay for it, as your host pointed out.
Figure out what is acceptable to you in terms of time. Is it okay for the databases to sync only once a minute? Once every 5 minutes?
Also, when running syncs like this in rapid succession, it is important to make sure they do not overlap: before a sync starts, test whether one is already in progress and has not finished yet. If a sync is still happening, don't start another; if not, go ahead. This prevents a lot of unnecessary overhead and syncs happening on top of each other.
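One simple way to sketch that guard in PHP is an exclusive lock file (the path is arbitrary, and run_sync() is a hypothetical stand-in for the actual sync routine):

<?php
$fp = fopen('/tmp/sync.lock', 'c') or exit("cannot open lock file\n");
if (!flock($fp, LOCK_EX | LOCK_NB)) {
    exit("sync already running\n"); // a previous sync still holds the lock
}

run_sync(); // hypothetical: the actual sync work

flock($fp, LOCK_UN);
fclose($fp);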
Are you using a shared web host? What you are doing sounds like excessive use for a shared (cPanel-type) host; use a VPS instead. You can get an unmanaged VPS with 512MB for 10-20 USD per calendar month, depending on spec.
Edit: if your bottleneck is CPU rather than bandwidth, have you tried bundling updates up inside a transaction? Let's say you are getting 10 updates per second, and you decide you are happy with a propagation delay of 2 seconds. Rather than opening a connection and a transaction for each of the 20 statements, bundle them together in a single transaction that executes every two seconds. That would substantially reduce your CPU usage.
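A sketch of that bundling with PDO (the DSN, the UPDATE statement, and the hypothetical dequeue_update() queue are assumptions):

<?php
$db   = new PDO('mysql:host=localhost;dbname=sync', 'user', 'pass');
$stmt = $db->prepare('UPDATE state SET value = ? WHERE k = ?');

while (true) {
    $db->beginTransaction();
    $until = microtime(true) + 2.0; // the 2-second propagation delay from above
    while (microtime(true) < $until) {
        if ($u = dequeue_update()) { // hypothetical queue of pending updates
            $stmt->execute([$u['value'], $u['key']]);
        } else {
            usleep(50000); // nothing pending right now
        }
    }
    $db->commit(); // one commit for the whole 2-second batch
}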
That question may appear strange.
But every time I made PHP projects in the past, I encountered this sort of bad experience:
Scripts stop running after 10 seconds. This results in very bad database inconsistencies (a bad example for a deletion loop: a user is about to delete a photo album; the album object gets deleted from the database, and then halfway through deleting the photos the script gets killed right where it is, and 10,000 photos are left with no reference).
It's not transaction-safe. I've never found a way to do something securely, to ensure it actually gets done. If the script gets killed, it gets killed. Right in the middle of a loop. It just gets killed. That never happened on Tomcat with Java. Java runs and runs and runs, if it takes long.
Lots of newsletter scripts try to get around that problem by splitting the job up into many packages, i.e. sending 100 at a time, then reloading the page (oh man, really stupid), doing the next batch, and so on. Most often something hangs, or the script takes longer than 10 seconds, and your platform is crippled.
But then I hear that very big projects use PHP, like StudiVZ (the German Facebook clone, actually the biggest German website). So there is a tiny light of hope that this bad behaviour just comes from unprofessional hosting companies that kill PHP scripts because their servers are so bad. What's the truth about it? Can PHP be configured in such a way that scripts never get killed because they take a little longer?
Is PHP suitable for very large projects?
Whenever I see a question like that, I get a bit uneasy. What does very large mean? What may be large to you may be small to me, or vice versa. And that is even assuming we use the same metric. Are you measuring the time to build the project, the complete life cycle of the project, the money involved, the number of people using it, the number of developers needed to build and maintain it, etc.?
That said, the problems you're describing sound like you don't know your technology well enough. That would be a problem for you regardless of which technology you picked. For example, use database transactions to ensure atomicity, and use asynchronous offline jobs to process long-running tasks (such as dispatching a mailing list).
A lot of the bad behaviour is covered in good frameworks like the Zend Framework.
Anything that takes longer than 10 seconds is really messed up, but you can always raise the execution time with http://de3.php.net/set_time_limit
A lot of big sites are written in PHP: Facebook, Wikipedia, StudiVZ, Digg.com, etc. A lot of the things you are talking about are just configuration matters; maybe you should look into that?
Are you looking for set_time_limit() and ignore_user_abort()?
Performance is not a feature you can just throw in after most of the site is done.
You have to design the site for heavy load.
If a database task normally involves 10K rows, you should be prepared not just for the execution-time issues, but for other maintenance questions as well.
Worst case: build a consistency tool to check for and fix those errors.
Better: instead of physically deleting the images, just flag them and let background services take care of the expensive maneuvers (see the sketch after this list).
Best: use a job queue service and add this job to the queue.
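A sketch of the flag-and-clean-up option, with assumed table and column names: the web request only flips a flag, and a worker run from cron deletes files and rows in bounded batches.

<?php
$db = new PDO('mysql:host=localhost;dbname=gallery', 'user', 'pass');

// The web request side is just: UPDATE albums SET deleted = 1 WHERE id = ?
// Worker side, run from cron: clean up at most 500 photos per pass.
$photos = $db->query(
    'SELECT p.id, p.path FROM photos p
     JOIN albums a ON a.id = p.album_id
     WHERE a.deleted = 1
     LIMIT 500'
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($photos as $photo) {
    @unlink($photo['path']); // remove the file; ignore if already gone
    $db->prepare('DELETE FROM photos WHERE id = ?')->execute([$photo['id']]);
}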
If you do need to do transactions in php, you can just do:
mysql_query("BEGIN");
/// do your queries here
mysql_query("COMMIT");
The COMMIT command simply completes the transaction.
If any errors occur, you can just roll back with:
mysql_query("ROLLBACK");
Edit: note that this will only work if you are using a storage engine that supports transactions, such as InnoDB.
You can configure how much time a script is allowed to run, either via the max_execution_time setting in php.ini or via ini_set()/set_time_limit().
Instead of studivz (the German Facebook clone), you could look at the actual Facebook which is entirely PHP. Or Digg. Or many Yahoo sites. Or many, many others.
ignore_user_abort is probably what you're looking for, but you could also add another layer in terms of scheduled maintenance jobs. They basically run on a specified interval and do various things to make sure your data/filesystem are in a state that you want... deleting old/unlinked files is just one of many things you can do.
For these large loops, like deleting photo albums or sending thousands of emails, you're looking for ignore_user_abort and set_time_limit.
Something like this:
ignore_user_abort(true); // the user leaving the webpage will not kill the script
set_time_limit(0); // the script can take as long as it wants
for ($i = 0; $i < 10000; $i++)
    costly_very_important_operation();
Be careful, however, that this could potentially run the script forever:
ignore_user_abort(true); // the user leaving the webpage will not kill the script
set_time_limit(0); // the script can take as long as it wants
while (true)
    do_something();
That script will never die, unless you restart your server.
Therefore it is best never to set the time limit to 0.
Technically, no programming language is transaction-safe; it's the database that needs to be transaction-safe. So if the script or code dies or disconnects, for whatever reason, the transaction will be rolled back.
Putting queries in a loop is a very bad idea unless the loop is specifically designed to run in batches, breaking a much larger set into smaller pieces. Adjusting PHP timers and limits is generally a stop-gap solution; you are still dependent on the client browser if you use the web to kick off a script.
If I have a long process that needs to be kicked off by a browser, I "disconnect" the process from the browser and web server so control is returned to the user while the script runs. PHP scripts run from the command line can run for hours if you want. You can then use AJAX, or reload the page, to check on the progress of the long running script.
There are security concerns with this code, but to "disconnect" a process from PHP running under something like Apache:
exec("nohup /usr/bin/php -f /path/to/script.php > /dev/null 2>&1 &");
But that really has nothing to do with PHP being suitable for large projects or being transaction-safe. PHP can be used for large projects, but since by default no code remains "resident" between hits, it can get slow if not designed right. Also, since there is no namespace support, you want to plan ahead if you have a large development team.
It's fine for a Java-based system to take a few minutes to start up, initialize, and load all the default objects. But this is unacceptable with PHP. PHP will take more planning for larger systems. The question is: when does the time saved by using PHP get eaten up by the additional planning time required for a large system?
The reason you most likely experienced bad database inconsistencies in the past is that you were using the MyISAM engine for MySQL (which does NOT support transactions). Use InnoDB instead; it supports transactions and performs row-level locking.
Or use PostgreSQL.
Many, many sites are made in PHP. However, you will not hear about the millions of web pages made in PHP that do not exist anymore because they were abandoned. Those pages may have burned all the company's money dealing with the PHP mess, or maybe the company went bankrupt because its software was so poor that customers did not want it… PHP seems good at the start, but it does not scale very well. Yes, there are many huge web sites made in PHP, but they are the exception rather than the norm.