Website architecture design requiring real-time polling from external servers

Website architecture design requiring real-time polling from external servers - php

I have a game running in N ec2 servers, each with its own players inside (lets assume it a self-contained game inside each server).
What is the best way to develop a frontend for this game allowing me to have near real-time information on all the players on all servers.
My initial approach was:
Have a common-purpose shared hosting php website polling data from each server (1 socket for each server). Because most shared solutions don't really offer permanent sockets, this would require me to create and process a connection each 5 seconds or so. Because there isn't cronjob with that granularity, I would end up using the requests of one unfortunate client to make this update. There's so many wrong's here, lets consider this the worst case scenario.
The best scenario (i guess) would be to create small ec2 instance with some python/ruby/php web based frontend, with a server application designed just for polling and saving the data from the servers on the website database. Although this should work fine, I was looking for some solution where I don't need to spend that much money (even a micro instance is expensive for such pet project).
What's the best and cheap solution for this?

Is there a reason you can't have one server poll the others, stash the results in a json file, then push that file to the web server in question? The clients could then use ajax to update the listings in near real time pretty easily.
If you don't control the game servers I'd pass the work on updating the json off to one of the random client requests. it's not as bad as you think though.
Consider the following:
Deliver (now expired) data to client, including timestamp
call flush(); (test to make sure the page is fully rendered, you may need to send whitespace or something to fill the buffer depending on how the webserver is configured. appending flush(); sleep(4); echo "hi"; to a php script should be an easy way to test.
call ignore user abort (http://php.net/manual/en/function.ignore-user-abort.php) so your client will continue execution regardless of what the user does
poll all the servers, update your file
Client waits a suitable amount of time before attempting to update the updated stats via AJAX.
Yes that client does end up with the request taking a long time, but it doesn't affect their page load, so they might not even notice.

You don't provide the information needed to make a decision on this. It depends on the number of players, number of servers, number of games, communication between players, amount of memory and cpu needed per game/player, delay and transfer rate of the communications channels, geographical distribution of your players, update rate needed, allowed movement of the players, mutual visibility. A database should not initially be part of the solution, as it only adds extra delay and complexity. Make it work real-time first.
Really cheap would be to use netnews for this.

Related

Is SSE and PHP Better than AJAX for chat app?

I've been reading a lot on the subject of SSE and PHP, most of which seems to be advocating it as viable solutions for all sorts of things including chat apps. I have seen similar questions on this site but have not found a concise, definitive answer.
Is there something inherent in SSE which makes it way more server-friendly than AJAX short polling? Because the headers appear to be of very similar size. I am wondering if there is some kind of behind-the-scenes stuff beyond the headers that a noob like myself can't see e.g. some sort of connection recognition with each request/response? I know there are other factors involved where SSE prevails such as handling disconnections.
In terms of using it in a chat app scenario, ajax and sse appear to be doing the same thing. Neither of them seems to be able to perform long polling effectively with PHP. If I have User A and User B waiting on a PHP script that checks for new messages from the other user in the DB then sleeps for 3 seconds for say 10 loops, User A's new message cannot be inserted until User B has looped through the entire checking script, thereby rendering it absolutely useless (at least based on everything I've tried in the last 2 weeks!). I can get it working smoothly if I chat to myself and no one else is waiting on the checking script, but I've run out of things to talk about with myself and would really enjoy someone else being able to use it too.
So in a nutshell, given an Apache and PHP environment with WebSockets as not an option (due to shared hosting), is the only effective way to write a chat app, based on server burden alone, by short polling with one's choice of either AJAX or SSE, or is SSE definitely the superior option?
I would pursue WebSockets if the eventual traffic called for it and justified the web hosting upgrade.
(ALSO, as a side, is my premise off base regarding the long-polling scenario I described above where User A must wait for User B's loop to finish before he/she/it can perform the insert? Got me confused as to why that should be the case).
Kind of a long-winded, meandering question but hoping someone in the same situation can find this question and save themselves a lot of time.
Many Thanks!

Yes, SSE is a better option than AJAX, as AJAX polling is done on the main servers, like where most of the normal user traffic is to be hit. Whereas SSE polling is done on another instance which is made for it, so there will be no extra traffic on the main server. Please check Mercure (https://mercure.rocks/)
EDIT:
I mean to that, using SSE with platforms like Mercure would be a better option than AJAX. As AJAX will make a request to the main server. Which would increase the count of requests for the main server. Whereas we can distribute the network load using tools like the Mercure, in order to achieve the required functionality.

SSE can be thought of a thin API wrapper around the AJAX long-poll approach. It brings a standard API to something that was a hacky solution before.
something inherent in SSE which makes it way more server-friendly than AJAX short polling?
It holds the socket open. The pro of this is less latency (as soon as the server has the new information it sends it to the client, rather than waiting for the next client poll); the con is the extra resource usage (the socket, and the PHP process).
but I've run out of things to talk about with myself
Surely not. Have you tried starting a chat about if time is an illusion, and what came before?
with WebSockets as not an option (due to shared hosting)
SSE and WebSockets both hold a socket open. Shared hosting ISPs often go round closing sockets that have been open a long time (e.g. over 60s), unless they explicitly say they support SSE. The may also kill long-running PHP processes.
is my premise off base regarding the long-polling scenario I described above where User A must wait for User B's loop to finish before he/she/it can perform the insert?
I think it is off. The "A" in Ajax is asynchronous, meaning you can have multiple ajax/sse requests running at the same time. And on the server side you will have a distinct PHP process running for each request.

Online Testing Platform

Hi there we are developing a website for students for taking Online tests.
We are working on PHP My SQL.
The questions of the all the tests are stored in a table with the test_id associated with the test.
Problem:
Now as the questions of the tests are being loaded from the server it sometime takes time in loading.
As these tests are being TIMED (Online Tests) hence the test taker feels his time is getting wasted.
The loading time may be a result of
slow internet connection
Databse search
Question/s
What is the best way of giving a jerkless experience to the test-taker irrespective of his internet speed and PC configuration.

From your wording, I'm assuming each individual question has its own time limit.
Eliminating a user's slow connection is impossible; if you measure the time on the client to try and avoid that, you open it up to cheating (client can hack the javascript to present a false time).
However you can eliminate database query time: set up a websocket server, have the user connect to it when they start the test, load all of the relevant questions in advance on the server into a queue, and when the user requests a question, immediately record the current time and send the next question from the queue out via the websocket connection.
Also make sure that upon receiving the question, the client side JS displays it immediately and doesn't have to e.g. make further AJAX calls or requests before it can display it. If additional information is needed, that should be looked up by your websocket server and bundled in with the question.
By doing this you should be able to get the response time below 50ms if the user has a decent internet connection and is in the same country as your websocket server.

You can not speed-up loading regardless of the users internet-connection. Of course, you can (and should) optimize all SQL queries and long-running tasks to have them perform as good as possible.
To void issues with test time running out, I would recommend to load all questions before the time starts running. Then, all data can be stored in the clients local storage (refer this link for some more info) - but please take into account, that this will only work if the browser supports local storage.
Another possibility is, to load / generate all data and have some server-side cache (like memcached, or a simple file-cache). On every new action, that cache can be queried without having to query all data from the database. Of course, this will only speedup the process, if the performance issues are in long-running queries, database speed etc - not if the user`s internet connection is too slow.

Running concurrent API requests in PHP using Beanstalkd/Gearman

I've got a rather large PHP web app which gets its products from numerous others suppliers through their API's, usually responding with a large XML to parse. Currently there are 20 suppliers but this is due to rise even further.
Our current set up uses multi curl to make the requests and this takes about 30-40 seconds to complete and is too long. The script runs in the background whilst the front end polls the database looking for results and then displays them as they come in.
To improve this process we were thinking of using a job server to run in the background, each supplier request being a separate job. We've seen beanstalkd and Gearman being mentioned.
So are we looking in the right direction, as in, is a job server the right way to go? We're looking at doing some promotion soon so we may get 200+ users searching 30 suppliers at the same time so the right choice needs to scale well if we have to load balance.
Any advice is great fully received.

You can use Beanstalkd, as you can customize the priority of jobs and the TTR time-to-resolve, default is 60 seconds, but for your scenario you must increase it. There is a nice admin console panel for Beanstalkd.
You should also leverage the multi Curl calls, so you should use parallel requests. In order to make use of Keep-alive you also need to maintain a pool of CURL handles and keep them warm. See high performance curl tips. You also need to tune Linux network stack.
If you run this in cloud, make sure you use multiple micro machines rather than one heavy machine as the throughput is better when you have multiple resources available.

Keeping two distant programs in-sync using MySql

I am trying to write a client-server app.
Basically, there is a Master program that needs to maintain a MySQL database that keeps track of the processing done on the server-side,
and a Slave program that queries the database to see what to do for keeping in sync with the Master. There can be many slaves at the same time.
All the programs must be able to run from anywhere in the world.
For now, I have tried setting up a MySQL database on a shared hosting server as where the DB is hosted
and made C++ programs for the master and slave that use CURL library to make request to a php file (ex.: www.myserver.com/check.php) located on my hosting server.
The master program calls the URL every second and some PHP code is executed to keep the database up to date. I did a test with a single slave program that calls the URL every second also and execute PHP code that queries the database.
With that setup however, my web hoster suspended my account and told me that I was 'using too much CPU resources' and I that would need to use a dedicated server (200$ per month rather than 10$) from their analysis of the CPU resources that were needed. And that was with one Master and only one Slave, so no more than 5-6 MySql queries per second. What would it be with 10 slaves then..?
Am I missing something?
Would there be a better setup than what I was planning to use in order to achieve the syncing mechanism that I need between two and more far apart programs?

I would use Google App Engine for storing the data. You can read about free quotas and pricing here.

I think the syncing approach you are taking is probably fine.
The more significant question you need to ask yourself is, what is the maximum acceptable time between sync's that is acceptable? If you truly need to have virtually realtime syncing happening between two databases on opposite sites of the world, then you will be using significant bandwidth and you will unfortunately have to pay for it, as your host pointed out.
Figure out what is acceptable to you in terms of time. Is it okay for the databases to only sync once a minute? Once every 5 minutes?
Also, when running sync's like this in rapid succession, it is important to make sure you are not overlapping your syncs: Before a sync happens, test to see if a sync is already in process and has not finished yet. If a sync is still happening, then don't start another. If there is not a sync happening, then do one. This will prevent a lot of unnecessary overhead and sync's happening on top of eachother.

Are you using a shared web host? What you are doing sounds like excessive use for a shared (cPanel-type) host - use a VPS instead. You can get an unmanaged VPS with 512M for 10-20USD pcm depending on spec.
Edit: if your bottleneck is CPU rather than bandwidth, have you tried bundling up updates inside a transaction? Let us say you are getting 10 updates per second, and you decide you are happy with a propagation delay of 2 seconds. Rather than opening a connection and a transaction for 20 statements, bundle them together in a single transaction that executes every two seconds. That would substantially reduce your CPU usage.

How to prevent server blocking php

I'm currently developing a php daemon for connecting and retreiving data from social networks like facebook and twitter. This script allready works but I have some concerns about it.
It's possible to create an infinite amount of accounts that the script has to process and (right now) it runs every 5 minutes to create a 'near' realtime experience. So my concern is that, when, let's say 5000 accounts, have been created and have to be monitored. The script slows down and maybe wil run longer than the 5 minute interval. Is there any way to work around this problem? And better, is there any good way (with php, possible with javascript) to create a better 'near' realtime experience?
Any advice will be great!
Thanks in advance

One option would be to spawn multiple daemons and share duties between them. Perhaps have single central job queue and have the daemons consume that. It's really a server-side issue and Javascript has very little to do with such tasks, as long it's not server-side JS.
If the number of monitored subjects is going into thousands, PHP is not really a viable choice since it's neither inherently multi-threaded nor does it have synchronization features. In mass monitoring scenarios, a dedicated server running a J2EE, .NET or a custom multithreaded application is pretty much a must.

for most sites you can retrieve a stream containing all that data(in real-time). For example:
1. twitter
site streams allows services,
such as web sites or mobile push
services, to receive real-time updates
for a large number of users without
any of the hassles of managing REST
API rate limits
2. Facebook
The Graph API supports real-time
updates to enable your application
using Facebook to subscribe to changes
in data from Facebook.
When using these streams you can process the streams in real-time and don't have to do no(nearly none) polling.
P.S: I would most definitely code this in node.js.

set the max execution time to zero and include it
enclose your script in a inite loop:
set_time_limit(0);
while(true){
/your code
}
You should however include some way to end the process gracefully.
Some popular ways to do this is by checking if a env var was set or if a specific file exists.
set_time_limit(0);
while(true){
/your code
if(file_exist(KILL_SWITCH_FILE))break;
}
Another approach would be setting a flag when(in a filem,in a sql database,...) that your script is running and removing it when your done.
That way you can check if another instance of your script is still running.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.