I have a doubt regarding speed and latency when showing real-time data.
Let's assume I want to show real-time data to users by firing an ajax request every second that gets data from a MySQL table with a simple query.
For that, these two options are currently on my mind:
MySql / Amazon Aurora
File system
Among these options which would be better? Or any other solution?
As I checked practically, if we open one page in the browser, the ajax requests respond in less than 500ms using a PHP, MySQL, Nginx stack.
But if we open more pages, the same ajax requests take more than 1 second to respond, and it should be less than 500ms for every visitor.
So if the number of visitors increases, the ajax requests respond very poorly.
I also checked with Node.js + MySQL, but got the same result.
Is it good to create JSON files for the records and fetch the data from files? Or is there any other solution?
Indeed, you have to use a database to store the actual data, but you can easily add a memory cache (it could be an internal dictionary or a separate component) to track the latest updates.
Then your typical ajax request will look something like:
Memcache, do we have anything new for user 123?
Last update was 10 minutes ago
aha, so nothing new, let's return null;
When you write data:
Put data into database
Update lastupdated time for clients in memcache
The actual key might be different - e.g. a chat room id. The idea is to read the database only when updates have actually happened.
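A minimal PHP sketch of that read/write path, assuming the Memcached extension and hypothetical loadMessagesFromDb() / saveMessageToDb() helpers that wrap the MySQL queries:

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// Read path: the ajax handler asks the cache first and touches MySQL only on real changes.
function getUpdates(Memcached $mc, $userId, $clientLastSeen) {
    $lastUpdated = $mc->get("lastupdated:user:$userId");
    if ($lastUpdated === false || $lastUpdated <= $clientLastSeen) {
        return null;                                     // nothing new, skip the database entirely
    }
    return loadMessagesFromDb($userId, $clientLastSeen); // hypothetical MySQL query
}

// Write path: store the data, then bump the marker that readers check.
function writeMessage(Memcached $mc, $userId, $message) {
    saveMessageToDb($userId, $message);                  // hypothetical MySQL insert
    $mc->set("lastupdated:user:$userId", time());
}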
Level 2:
You will burn your webserver and also the client's bandwidth with a high number of calls. You can do something like this:
DateTime start = DateTime.Now;
// Long poll: hold the request open for up to 30 seconds
while (DateTime.Now.Subtract(TimeSpan.FromSeconds(30)) < start)
{
    if (hasUpdates) return updates;   // respond as soon as new data shows up
    Thread.Sleep(100);                // check again in 100 ms
}
Then the client will call the server once per 30 seconds.
The client will get a response immediately when the server notices new data.
The user keys in search parameters, then we make a request to a data provider and redirect the user to a loading page. The response from the data provider hits a callback URL, at which point we parse the results and store about 200 rows in the db. Meanwhile the loading page uses ajax to query the db every second, and when the results are all there we display them to the user.
The issue is that inserting into the MySQL db is too slow. We know the response from the data provider comes back within seconds, but the processing of the script and the inserting of rows into the db is very slow. We do use a multi-row insert.
Any suggestions to improve this? FYI, the code is hugely long... that's why I'm not posting it right now.
There are a multitude of factors affecting your insertions (see the batching sketch after this list):
1) Slow hardware and bad server speeds.
Solution: contact your server administrator.
2) Use something other than InnoDB.
3) Use a surrogate key that is numeric and sequential, alongside your natural primary key.
OR
4) Try this https://stackoverflow.com/a/2223062/3391466.
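On the multi-row insert itself, a common culprit is committing each row separately. A hedged PDO sketch (hypothetical results table and $rows array from the parsed callback) that batches everything into one transaction:

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$pdo->beginTransaction();
$stmt = $pdo->prepare('INSERT INTO results (search_id, provider, payload) VALUES (?, ?, ?)');
foreach ($rows as $row) {                    // the ~200 rows from the provider response
    $stmt->execute([$row['search_id'], $row['provider'], $row['payload']]);
}
$pdo->commit();                              // a single commit instead of one per row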
Suggestion: instead of running the code on one page and making the user wait for the whole process, why not have the PHP page store the instructions in a queue? The instructions would then be executed by a separate PHP script (for instance via a cron job), and the user wouldn't have to wait for the whole process to take place.
However, in this situation it would be ideal to let the user know that the changes made can take a bit of time to show up.
Cron jobs are very easy to implement. In cPanel there is an option for cron jobs where you specify which script you want to run and at what interval. You can set your script to run once every minute (or more or less often, depending on how much demand there is). From there your script would check the queue and could keep running until the queue is empty again.
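A minimal sketch of such a worker, assuming a hypothetical jobs table with (id, payload, status) columns and a processJob() helper; a real version would also need to lock or claim rows if several workers can run at once:

// worker.php - run by cron, e.g. * * * * * php /path/to/worker.php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

while (true) {
    $job = $pdo->query("SELECT id, payload FROM jobs WHERE status = 'pending' LIMIT 1")
               ->fetch(PDO::FETCH_ASSOC);
    if (!$job) {
        break;                              // queue is empty; exit until the next cron run
    }
    processJob($job['payload']);            // hypothetical: the slow work the user used to wait for
    $pdo->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")
        ->execute([$job['id']]);
}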
Let me know if that helped!
I made a private chat system. So far the chat has 3 jQuery ajax POST scripts calling the server in a loop for new data:
Message window between the current user and the target user (the ajax gets the timestamp of the last message in the db and compares it to the timestamp of the last message that was displayed, fetches all messages newer than that timestamp, and displays them in the message window; the ajax loops every 5 seconds after the last return).
Who's-online checker (checks the db for who is online; the ajax loops every 30 seconds after the last return).
Who messaged the current user (checks for and gets users who are not the current target user in the message window and have messaged the current user; the ajax loops every 15 seconds after the last return).
So far the above 3 are the only ajax loops I have and I am still double checking my code for areas where I can trim it down.
My question is: would it conserve more server resources if I grouped the above 3 ajax POSTs into 1 ajax POST and looped it every 5-8 seconds? Or should I leave it as is?
I ask this because I once got a warning from my hosting provider that I was consuming too much of their server's resources (due to a very stupid experiment). If I mess up again they're going to cut my hosting, so I hope you guys understand why I'm asking this kind of question.
Extra details: I use jquery ajax to talk to a php script that gets the data from a mysql db. The loop for the requests are done client side.
Websockets are tricky. So if you decide to go with ajax there are a couple of factors to consider:
The frequency. Efficient systems usually use a sort of tick system. In your case a tick would be 5 seconds, since all your timelines can be aligned to a 5-second tick. And yes, of course you group all transmission needs of a tick into 1 transmission (a combined-endpoint sketch follows below).
The data quantity. Try not to send more than 1 KB per tick. E.g. use sparse formats like CSV over, say, XML. Set hard entry limits. Compress. Things like that. Network traffic is packetized, so sending 1025 bytes causes 2 KB of resources to be allocated.
Act on the user's inactivity somehow. E.g. do not use up each tick for the "message window between current user and target user" if the user has been inactive for more than a minute. A sort-of session timeout of 20 minutes or so...
The computation. Make the server-side tick response QUICK and small. Consider using memory tables or memory caches for the tick handling, and then have an agent that runs every ten minutes or so and persists whatever needs to go to durable storage. Try to avoid complex, fat operations (e.g. more than 3 db round trips) in the tick response.
The host. This was also mentioned in another answer. A quick additional hint: you could ask whether you are allowed to implement this kind of thing before you sign the contract, or whether you can change the contract. Sometimes things like video and instant messaging are mentioned in the general terms of service.
There are probably more things.. But these come to my mind immediately...
In general maybe you should also check out https://developers.google.com/speed/docs/best-practices/rtt
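Putting points 1 and 4 together, a single combined tick endpoint might look like the sketch below (the helper functions are hypothetical wrappers around your three existing queries):

// tick.php - one request per 5-second tick instead of three separate loops
header('Content-Type: application/json');

$userId   = (int) $_POST['user_id'];
$targetId = (int) $_POST['target_id'];
$since    = (int) $_POST['since'];      // timestamp of the last message the client already has

echo json_encode([
    'messages'      => getNewMessages($userId, $targetId, $since),  // hypothetical helper
    'online'        => getOnlineUsers(),                            // hypothetical helper
    'notifications' => getOtherSenders($userId, $targetId, $since), // hypothetical helper
]);

On the client side, one timer loop then consumes the three parts of the response instead of three separate loops.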
I think I understand the concept of long polling fairly well. You basically just request data from the server, but the server only returns it once the data has changed. Straightforward stuff. My problem comes with this example.
Let's say I have auctions with data that changes constantly. Among this data are things like
Closing time of the auction
Number of current bidders on the auction
When I start the long poll, I basically have something like this:
$counter = 0;
while ($counter < $MESSAGE_TIMEOUT_SECONDS) {
    $newData = getNewData();
    if (hasDataChanged($newData, $oldData)) {
        return $newData;
    }
    usleep($MESSAGE_POLL_MICROSECONDS);
    $counter += $MESSAGE_POLL_MICROSECONDS / 1000000;   // track elapsed seconds so the loop can time out
}
Where do I get the old data from? I mean, when doing the request, I can either post the current state as it was last given to me, or I can store the data in Session. Am I allowed to store stuff in session when doing a long poll, or should I do a POST from the Javascript with the current state of that page?
Also, how would I stop someone opening 50 pages from killing the database? I mean, getNewData() effectively goes to the database. With a polling interval of about half a second, this could mean 50 requests every half a second, which could mean 50 x 2 x 30 = 3000 requests to the database in 30 seconds by just one user, if he decided to open 50 tabs?
Any ideas?
I would cache all ajax response data in memory, along with the date each auction last changed, so you don't have to compare old and new data, just datetimes. Invalidate the cache for an auction on any change (closed, new bid, etc.).
Then, from the client side, send the time of the last known data (the last ajax call, or when the user opened the page) and compare dates to see if something changed. If it didn't, just return status:nochange (the client then knows there is nothing to update); if it did, return all the necessary data from the cache and update the user's page.
This model should protect database from overloading.
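A minimal sketch of that model, assuming the APCu extension and a hypothetical loadAuctionFromDb() helper (use Memcached or Redis instead of APCu if you run more than one web server):

// Read path: the long-poll handler compares timestamps before ever touching the database.
function pollAuction($auctionId, $clientLastSeen) {
    $changedAt = apcu_fetch("auction:changed:$auctionId");
    if ($changedAt === false || $changedAt <= $clientLastSeen) {
        return ['status' => 'nochange'];
    }
    $data = apcu_fetch("auction:data:$auctionId");
    if ($data === false) {
        $data = loadAuctionFromDb($auctionId);           // hypothetical MySQL query
        apcu_store("auction:data:$auctionId", $data, 60);
    }
    return ['status' => 'changed', 'changedAt' => $changedAt, 'data' => $data];
}

// Write path: call this whenever a bid comes in or the auction closes.
function touchAuction($auctionId) {
    apcu_store("auction:changed:$auctionId", time());
    apcu_delete("auction:data:$auctionId");
}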
Currently I have a data file in Dropbox that is uploaded every 15 seconds. I want to take this data, which has several different data types, and graph the real-time data that the user selects on a website. I have a data server, but my data is not on there. Is there any way for me to take this data from the file and graph it, while also having a control panel that selects which data I want to graph?
You can refresh your web page using Ajax. Note that if your refresh is set to every 15 seconds and your data comes in every 15 seconds, worst-case is that you will show data that's almost 30 seconds old if the timing of the data update and the Ajax refresh is unfortunate.
You probably want to check for new data using Ajax more frequently, depending on your specific needs. On the server side, cache the result of the Ajax update to avoid too much duplicate processing.
To create the data that you return from the Ajax query, open and process the data file. No need for MySQL. You can use the timestamp of the file to invalidate the result cache I suggest in the previous paragraph.
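As a rough sketch of that idea, assuming the APCu extension, a hypothetical parseDataFile() helper, and a local path the Dropbox client syncs the file to:

// data.php - returns the parsed file as JSON for the charting library
header('Content-Type: application/json');

$dataFile = '/path/to/dropbox/data.csv';     // assumed local sync location
$mtime    = filemtime($dataFile);

$cached = apcu_fetch('chart:data');
if ($cached !== false && $cached['mtime'] === $mtime) {
    echo json_encode($cached['data']);       // file unchanged since the last parse
    exit;
}

$data = parseDataFile($dataFile);            // hypothetical: parse once per new upload, not per request
apcu_store('chart:data', ['mtime' => $mtime, 'data' => $data]);
echo json_encode($data);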
There are many JavaScript based charting libraries that can update via Ajax. Here's a good starting point:
Graphing JavaScript Library
I'm trying to build a service that will collect some data from the web at certain intervals, then parse that data and, depending on the result of the parse, execute dedicated procedures. A typical schematic of a service run:
Request the list of items to be updated
Download data of listed items
Check what's not updated yet
Update database
Filter data that contains updates (get only highest priority updates)
Perform some procedures to parse updates
Filter data that contains updates (get only medium priority updates)
Perform some procedures to parse ...
...
...
Everything would be simple if there were not so much data to be updated.
There is so much data that at every step from 1 to 8 (maybe besides 1) the scripts will fail due to the restriction of a 60-second max execution time. Even if there were an option to increase it, this would not be optimal, as the primary goal of the project is to deliver the highest-priority data first. Unfortunately, determining the priority level of a piece of information requires getting the majority of all the data and doing a lot of comparisons between the already stored data and the incoming (update) data.
I could sacrifice service speed to at least get the high-priority updates quickly and wait longer for all the rest.
I thought about writing some parent script (a manager) to control every step (1-8) of the service, maybe by executing the other scripts?
The manager should be able to resume an unfinished step (script) to get it completed. It is possible to write every step in such a way that it does some small portion of work and, after finishing it, marks this small portion as done in e.g. an SQL DB. After the manager resumes it, the step (script) will continue from the point at which it was terminated by the server for exceeding the max execution time.
Known platform restrictions:
remote server, unchangeable max execution time, usually a limit of one script running at the same time, lack of access to many Apache features, and all the other restrictions typical of remote servers
Requirements:
Some kind of manager is mandatory, as besides calling particular scripts this parent process must write some notes about the scripts that were activated.
The manager can be called by curl; a one-minute interval is enough. Unfortunately, making curl a list of calls to every step of the service is not an option here.
I also considered getting a new remote host for every step of the service and controlling them from another remote host that could call them and ask them to do their job using e.g. SOAP, but this scenario is at the end of my list of desired solutions, because it does not solve the problem of max execution time and brings a lot of data exchange over the global net, which is the slowest way to work on data.
Any thoughts about how to implement solution?
I don't see how steps 2 and 3 by themselves can take over 60 seconds. If you use curl_multi_exec for step 2, it will run in seconds. If you were getting your script over 60 seconds at step 3, you would get "memory limit exceeded" instead, and a lot earlier.
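For step 2, a parallel-download sketch with curl_multi (the $urls array of item URLs is an assumption, not taken from the question):

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

do {                                         // run all transfers concurrently
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

$responses = [];
foreach ($handles as $i => $ch) {
    $responses[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);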
All that leads me to the conclusion that the script is very unoptimized. The solution would be to either:
break the task into (a) deciding what to update and saving that in the database (say, flag 1 for what to update, 0 for what not to); and (b) cycling through the rows that need updating and updating them, setting the flag back to 0. At ~50 seconds just shut down (assuming the script is run every few minutes, that will work) - a sketch of this pattern follows below; or
get a second server and set it up with a proper execution time to run your script for hours. Since it will have access to your first database (and not via http calls), it won't be a major traffic increase.
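A minimal sketch of the time-budgeted worker from the first option, assuming a hypothetical items table with a needs_update flag and an updateItem() helper that does the download/parse/compare work for one item:

$pdo   = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$start = time();

while (time() - $start < 50) {               // stay well under the 60-second limit
    $row = $pdo->query("SELECT id FROM items WHERE needs_update = 1 LIMIT 1")
               ->fetch(PDO::FETCH_ASSOC);
    if (!$row) {
        break;                               // nothing left; the next run starts fresh
    }
    updateItem($row['id']);                  // hypothetical per-item work
    $pdo->prepare("UPDATE items SET needs_update = 0 WHERE id = ?")
        ->execute([$row['id']]);
}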