I am going to be building an AIR application that shows a list (about 1-25 rows of data) from a database. The database is on the web. I want the list to be as accurate as possible, meaning that as soon as the data in the database changes, the list displayed in the app should update as soon as possible. I don't know of any way for the AIR application to be notified when there is a change, so I am thinking I will have to poll the database at certain intervals to keep the list up to date. So my question is: first, is there any way to NOT have to keep checking the database? Or, if I do have to keep checking it, what is a reasonable interval to do that at?
Thanks.
What you're talking about is "push", and there are ways to do it, but they're very complicated and probably not worth it for what you're describing. If you're so inclined, you can check out Comet and its associated technologies.
I would recommend just polling every 30 seconds. The right poll interval really depends on the data, though. If it's lifetime home runs, then 30 seconds is a bit much; if it's a chat client or something, that's probably not enough.
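Concretely, the AIR client would just request a small web endpoint on a 30-second timer. A minimal sketch of the server side in PHP (the endpoint, table, and column names are placeholders):

<?php
// list.php - returns the current rows as JSON; the AIR app polls this URL every 30s.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'password');
$rows = $pdo->query('SELECT id, label, value FROM items ORDER BY id LIMIT 25')
            ->fetchAll(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode($rows);

The client compares each response to what it is already showing and redraws only when something has changed.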
Related
We have a PHP/MySQL/Apache web app which holds a rating system. From time to time we do full recalculations of the ratings, which means about 500 iterations of calculation, each taking 4-6 minutes and depending on the results of the previous iteration (i.e., parallel solutions are not possible). Most of the time is taken by MySQL queries and by loops over each rated player (about 100,000 players on each iteration; the complex logic linking players rules out parallelization here as well).
The problem is that when we start the recalculation in the plain old way (one PHP POST request), it dies about 30-40 minutes after starting (which gives only 10-15 completed iterations). The question of "why does it die?" and other optimization issues are out of scope for now - the logic is too complex and needs to be refactored, maybe even rewritten in another language/infrastructure, yes, but we have no resources (time/people) for that now. We just need to make things work in the least annoying way.
So, the question: what is the best way to organize such a recalculation, if possible, so that the site admin can start it with one click, forget about it for a day, and it still gets done?
I found a few pieces of advice on the web for similar problems, but no silver bullet:
move the iterations (and therefore the timeouts) from the server to the client by using AJAX requests instead of one plain old PHP request - this could freeze the browser (and AJAX's async nature is a poor fit for sequential iterations);
have PHP start a backend service which does the work (as advised here) - this sounds like a lot of work and I have no idea how to implement it.
So, I humbly ask for any advice possible in such a situation.
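For what it's worth, the second option is often less work than it sounds. A hedged sketch, assuming a CLI worker launched from the admin's web request (script names and the per-iteration function are placeholders):

<?php
// start_recalc.php - hypothetical admin endpoint: detach a worker and return at once.
// Because the worker runs outside Apache, the web request's time limit no longer applies.
exec('nohup php /path/to/recalc_worker.php > /tmp/recalc.log 2>&1 &');
echo 'Recalculation started; check /tmp/recalc.log for progress.';

<?php
// recalc_worker.php - hypothetical CLI worker: runs all 500 iterations in one process.
set_time_limit(0);                  // belt and braces; the CLI has no time limit by default
for ($i = 1; $i <= 500; $i++) {
    run_iteration($i);              // placeholder for the existing per-iteration logic
    error_log("iteration $i done"); // progress lands in the log the admin can check later
}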
I want to design a notification component, and I want to understand what kinds of polling methods are used out there to fetch notifications with minimal stress on the server.
Let's say, for example, I want to notify a user of a chat message. I imagine I would need to poll the data quite regularly, say every 500ms, for a quick response. However, doing this may overload the system: hypothetically speaking, if I have a million users browsing the site, that's 2 million requests every second!
I'm thinking of writing an algorithm that incrementally increases the poll interval by 1 second on each empty poll, up to a maximum of 60 seconds. The interval resets to 500ms whenever there is new data. This way, a user with frequent notifications gets them almost instantly, but if there hasn't been a notification for a longer period, there may be a delay of up to a minute.
In essence, I'm compromising between user experience and server load to find a middle ground for both.
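A minimal sketch of that backoff logic. In the browser it would be the same idea driven by setTimeout; it's written here as a PHP CLI poller for concreteness, and the endpoint URL and handler function are placeholders:

<?php
// poll_backoff.php - keep polling an endpoint, stretching the interval while nothing arrives.
$interval = 0.5;                                      // start at 500ms
while (true) {
    $body = @file_get_contents('https://example.com/notifications');
    if ($body !== false && $body !== '' && $body !== '[]') {
        handle_notifications($body);                  // new data: deliver it...
        $interval = 0.5;                              // ...and reset to the fast rate
    } else {
        $interval = min($interval + 1.0, 60.0);       // quiet: back off by 1s, cap at 60s
    }
    usleep((int)($interval * 1000000));
}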
Please advise on possible drawbacks of this approach, if any. Is there a proper name for it?
Alternatively, is there a better method out there?
What you are doing is polling (or long polling). Either way, it is not good for performance.
The alternative is pushing (http://en.wikipedia.org/wiki/Push_technology): you push the data out only when there is something new.
You could use WebSockets to achieve this.
You could also look at the Apollo messaging middleware, which has native support for WebSockets and good performance.
http://activemq.apache.org/apollo/
The method you are using could lead to a network traffic overload on your server if there are many clients connected. Suppose you have 1000 clients connected: the server has to handle 1000 separate connections. A better approach is a push notification system. Check this out: https://nodejs.org/it/docs/
I'm building an auction script, and syncing time between visitors and the server is necessary (to show when the auction will end). Every time a user bids, the auction end time is extended by a few seconds. My problem is that several users are complaining about their timers skipping (a few seconds), and I figured out that it is caused by high-latency connections.
My current approach is a JavaScript function that runs every second, getting the time left in the auction through AJAX requests. Is there a better way to approach this, especially for high-latency users, to prevent the timer-skipping problem?
Adaptive intervals
First of all, I would suggest that you decrease the amount of polling. I don't know about your server implementation, but the current setup will generate a lot of requests once you have more than a couple of users.
I would suggest adjusting the polling interval depending on how much time is left. If there are two hours left until the end of an auction, we don't really care if the additional seconds are only fetched from the server once a minute, right? You could do it like this:
pollingInterval = secondsLeft / 100
The interval gets shorter, and the result more accurate, towards the end of the auction.
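In practice you would also clamp the result so the interval never grows absurdly long or hammers the server near zero. A sketch of that calculation (the bounds are assumptions, and in the browser this would of course be JavaScript):

<?php
// Poll interval in seconds, derived from the time remaining, clamped to [1, 60].
function pollingInterval($secondsLeft) {
    return max(1, min(60, $secondsLeft / 100));
}
// e.g. two hours left -> 60s between polls; 90 seconds left -> roughly one poll per second.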
Server-Sent Events
For the last minute or so, when you want high accuracy, regular polling at short intervals is not the best solution, as discussed in the comments. Long polling is an option, but you should also look into HTML5 Server-Sent Events, which are like a native browser implementation of long polling. There's a good introduction and comparison to WebSockets. Browser support is already pretty good, and there's a polyfill for unsupported browsers which falls back to...polling.
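On the server, an SSE endpoint is just a long-lived response with a special content type. A minimal PHP sketch, where get_auction_end_time() is a placeholder for the real lookup:

<?php
// sse.php - streams the current auction end time to the browser as Server-Sent Events.
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
while (true) {
    echo 'data: ' . get_auction_end_time() . "\n\n";
    @ob_flush();                       // flush PHP's output buffer if one is active
    flush();                           // push the bytes to the client immediately
    sleep(1);                          // one update per second while the client is connected
}

On the client, new EventSource('sse.php') subscribes once and receives every message without issuing further requests.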
Have you looked into long polling? You could use a jQuery/JavaScript countdown clock and just change the countdown time whenever a new bid is placed. That should cut your AJAX calls drastically.
javascript function that runs every second
This is the old way to do what you want.
I think you need to use WebSockets to ensure real-time delivery for all users.
If you want to save time, you can use one of the available WebSocket servers instead of building one yourself.
I prefer Pusher for real-time push.
It's easy, and you can use it for free with a limited number of users; you can also upgrade for more users.
www.pusher.com
It also has good API documentation to help you implement what you want quickly and easily.
For any help with Pusher or WebSockets, feel free to ask.
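For a taste, here is a hedged sketch of the server side using the pusher/pusher-php-server Composer package (the keys, cluster, channel, and event names are all placeholders):

<?php
// Publish the new auction end time whenever a bid comes in.
require 'vendor/autoload.php';

$pusher = new Pusher\Pusher('app-key', 'app-secret', 'app-id', ['cluster' => 'mt1']);

$newEndTime = time() + 15;   // example: a bid extends the auction by 15 seconds
$pusher->trigger('auction-42', 'end-time-extended', ['endsAt' => $newEndTime]);

Subscribed browsers receive the event over a WebSocket held open by Pusher's JavaScript client, so no polling is involved.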
I have a table of more than 15,000 feeds, and it's expected to grow. What I am trying to do is fetch new articles using SimplePie, synchronously, and store them in a DB.
Now I have run into a problem: since the number of feeds is high, my server stops responding and I am no longer able to fetch feeds. I have also implemented some caching, and I fetch odd and even feeds at different time intervals.
What I want to know is whether there is any way of improving this process. Maybe by fetching feeds in parallel? Or maybe someone can give me a pseudo-algorithm for it.
15,000 Feeds? You must be mad!
Anyway, a few ideas:
Increase the Script Execution time-limit - set_time_limit()
Don't go overboard, but ensuring you have a decent amount of time to work in is a start.
Track Last Check against Feed URLs
Maybe add a field to each feed, last_check, and set it to the date/time of the last successful pull for that feed.
Process Smaller Batches
Better to run smaller batches more often. Think of it as the PHP equivalent of "all of your eggs in more than one basket". With the last_check field above, it is easy to identify the feeds that have gone longest without an update, and also to set a threshold for how often to process them.
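A sketch of that batch selection, assuming a feeds table with a last_check column (names are placeholders):

<?php
// fetch_batch.php - run from cron; processes the 100 most-stale feeds.
$pdo = new PDO('mysql:host=localhost;dbname=feeds', 'user', 'password');
$stale = $pdo->query(
    'SELECT id, url FROM feeds ORDER BY last_check ASC LIMIT 100'
)->fetchAll(PDO::FETCH_ASSOC);

$update = $pdo->prepare('UPDATE feeds SET last_check = NOW() WHERE id = ?');
foreach ($stale as $feed) {
    process_feed($feed['url']);    // placeholder for the SimplePie fetch/store logic
    $update->execute([$feed['id']]);
}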
Run More Often
Set up a cronjob and process, say, 100 records every 2 minutes or something like that.
Log and Review your Performance
Keep logfiles and record stats: how many records were processed, how long it had been since they were last processed, how long the script took. These metrics will allow you to tweak the batch sizes, cronjob settings, and time limits to ensure that the maximum number of checks is performed in a stable fashion.
Setting all this up may sound like a lot of work compared to a single process, but it will allow you to handle increased volumes, and it forms a strong foundation for any further maintenance tasks you might be looking at down the track.
fetch new articles using simplepie, synchronously
What do you mean by "synchronously"? Do you mean consecutively in the same process? If so, this is a very dumb approach.
You need a way of sharding the data to run across multiple processes. Doing this with a fixed rule based on, say, the modulus of the feed ID or the hash of the URL is not a good solution - one slow URL would hold up multiple feeds.
A better solution would be to start up multiple threads/processes, each of which would:
lock list of URL feeds
identify the feed with the oldest expiry date in the past which is not flagged as reserved
flag this record as reserved
unlock the list of URL feeds
fetch the feed and store it
remove the reserved flag on the list for this feed and update the expiry time
Note that if there are no expired records at step 2, the table should still be unlocked. What happens next depends on whether you run the threads as daemons, in which case they should implement an exponential backoff (e.g. sleeping for 10 seconds, doubling up to 320 seconds on consecutive empty iterations), or as batches, in which case they should exit.
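A sketch of one worker iteration, using a MySQL transaction with SELECT ... FOR UPDATE in place of an explicit table lock (table and column names are assumptions):

<?php
// One worker iteration: reserve the stalest expired feed, fetch it, release it.
$pdo = new PDO('mysql:host=localhost;dbname=feeds', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();                              // steps 1-4: lock, pick, reserve, unlock
$feed = $pdo->query(
    'SELECT id, url FROM feeds
      WHERE expires_at < NOW() AND reserved = 0
      ORDER BY expires_at ASC LIMIT 1 FOR UPDATE'
)->fetch(PDO::FETCH_ASSOC);
if ($feed) {
    $pdo->prepare('UPDATE feeds SET reserved = 1 WHERE id = ?')->execute([$feed['id']]);
}
$pdo->commit();

if ($feed) {
    fetch_and_store($feed['url']);                     // step 5: placeholder for the real work
    $pdo->prepare('UPDATE feeds SET reserved = 0,
                          expires_at = NOW() + INTERVAL 15 MINUTE
                    WHERE id = ?')->execute([$feed['id']]);   // step 6: release and reschedule
}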
Thank you for your responses. I apologize for replying a little late; I got busy with this problem and later forgot about this post.
I have been researching this a lot and faced a lot of problems. You see, 15,000 feeds every day is not easy.
Maybe I am MAD! :) But I did solve it.
How?
I wrote my own algorithm. And YES! It's written in PHP/MySQL. I basically implemented a simple weighted machine-learning algorithm: it learns the posting times of each feed and then estimates the next polling time for that feed, which I save in my DB.
And since it's a learning algorithm, it improves over time. Of course, there are 'misses', but these misses are at least better than crashing servers. :)
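The author doesn't share the details, but one simple weighted scheme in this spirit is an exponential moving average of the gaps between a feed's posts, used to schedule the next poll. Everything below is an illustrative guess, not the original algorithm:

<?php
// Illustrative only: estimate a feed's posting cadence and schedule its next poll.
// $avgGap is the stored moving average of seconds between posts for this feed.
function next_poll_time($avgGap, $newGap, $alpha = 0.3) {
    // Weighted update: recent behaviour counts more than old history.
    $avgGap = $alpha * $newGap + (1 - $alpha) * $avgGap;
    // Poll somewhat faster than the feed's average cadence, within sane bounds.
    $wait = max(300, min(86400, $avgGap * 0.5));    // between 5 minutes and 1 day
    return [time() + (int)$wait, $avgGap];          // next poll timestamp + updated average
}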
I have also written a paper on this, which was published in a local computer science journal.
Regarding the performance gain, I am seeing a 500% to 700% improvement in speed compared to sequential polling.
How is it going so far?
My DB has grown to TBs in size. I am using MySQL and, yes, I am facing performance issues with it, but not many. Most probably I will move to some other DB or add sharding to my existing one.
Why did I choose PHP?
Simple: because I wanted to show people that PHP and MySQL are capable of such things! :)
I'm working on a social network like FriendFeed. When a user adds his feed links, I use a cron job to parse each user's feeds. Is this feasible with a large number of users, say parsing 10,000 links each hour, or will that cause problems? If it isn't feasible, what do FriendFeed or RSS readers use to do it?
You might consider adding some information about your hardware to your question, this makes a big difference for someone looking to advise you on how easily your implementation will scale.
If you end up parsing millions of links, one big cron job will become problematic. I am assuming you are doing the following (if not, you probably should):
Realizing when users subscribe to the same feed, to avoid fetching it twice.
When fetching a new feed, checking for the existence of a site map that tells you how often the feed is likely to change, and re-visiting that value at a sensible interval.
Checking system load and memory usage to know when to 'back off' and go to sleep for a while.
This reduces the amount of sweat that an hourly cron would produce.
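A sketch of the 'back off' check, using PHP's sys_getloadavg() and memory_get_usage(); the thresholds are arbitrary assumptions to tune for the actual hardware:

<?php
// Crude guard to run between batches in the harvester loop.
if (memory_get_usage(true) > 512 * 1024 * 1024) {
    exit(0);                         // this process is bloated: quit, let cron start a fresh one
}
$load = sys_getloadavg();
while ($load[0] > 4.0) {             // 1-minute load average too high
    sleep(30);                       // go to sleep for a while, then re-check
    $load = sys_getloadavg();
}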
If you are harvesting millions of feeds, you'll probably want to distribute that work - something you might want to keep in mind while you're still designing your database.
Again, please update your question with details on the hardware you are using and how big your solution needs to scale. Nothing scales 'infinitely', so please be realistic :)
There isn't quite enough information here to judge whether this design is good or not, but to answer the basic question: unless you are doing some very intensive processing on those 10,000 feeds, an hourly cron job should handle it trivially.
More information on how you process the feeds, in particular how the process scales with respect to the number of users who have feeds and the number of feeds per user, would be useful in giving you further advice.
Your limiting factor will be the network access to these 10,000 feeds. You could process the feeds serially and likely do 10,000 in an hour (you'd need to average about 350ms per fetch).
Of course, you'd want more than one process doing the work simultaneously to speed things up.
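One way to get that parallelism from a single PHP process is curl_multi, which fetches several URLs concurrently. A minimal sketch (the URL list is a placeholder):

<?php
// Fetch a batch of feed URLs in parallel with curl_multi.
$urls = ['http://example.com/a.rss', 'http://example.com/b.rss'];   // placeholder list

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);           // don't let one slow feed stall the batch
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

$running = null;
do {
    curl_multi_exec($mh, $running);                  // drive all transfers
    if (curl_multi_select($mh) === -1) {
        usleep(10000);                               // avoid busy-spinning on select failure
    }
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $body = curl_multi_getcontent($ch);              // the fetched feed XML (or false on error)
    // ...hand $body to the parser/storage step here...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);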
Whatever solution you select, if you meet with success (which I hope you do), you will have performance issues.
As the founder of FF has said many times: the only way to pick the best solution is to profile/measure. With numbers, the choice will be obvious.
So: build a test architecture close to the (realistic) situation you expect in a few months, and profile/measure.
You might want to consider checking out IronWorker for big data jobs like this. It's made for them, and since it's a service you don't need to deal with servers or scaling. It has scheduling built in, so you could schedule a worker task to run each hour; that task can then queue up 10,000 other jobs and run them all in parallel.