I'm constructing a social networking website and I'm currently testing Ajax on certain features.
Right now, I have 3 separate calls coming from my include (which is on every page) that will check the database for new messages, achievements, and notifications and update the specific divs.
My biggest worry is that 3 separate calls would eventually lead to performance issues.
I have no way of really seeing how a large user-base would affect this, because my site is currently in beta and I haven't advertised yet. So I have a limited number of people to test with. When I do advertise and gain more members, I don't want to run into any hiccups.
I currently have the calls being made every 20 seconds. I was hoping someone could give me some advice on how long I should set the intervals. I have no way of currently knowing if 20 seconds would be too much, or if I could even set it to 10 and be fine.
Any advice would be appreciated. Thanks.
Perhaps you should look into server push, where the server sends the data to the client when there are new messages and so on. That way, instead of every poll using resources, resources are only used when there is something new to push to the client.
Some server push servers are:
Ajax Push Engine
Nginx Push Module
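To put rough numbers on the polling cost the question worries about, here is a back-of-the-envelope sketch. The figure of 1000 concurrent users is purely an assumption for illustration; the 3 calls and the 20 or 10 second intervals come from the question.

```python
def polling_requests_per_second(users: int, calls_per_poll: int, interval_seconds: float) -> float:
    """Average request rate produced by clients that poll on a fixed interval."""
    return users * calls_per_poll / interval_seconds


if __name__ == "__main__":
    # 1000 users is a hypothetical audience size, not a figure from the question.
    for interval in (20, 10):
        rate = polling_requests_per_second(users=1000, calls_per_poll=3, interval_seconds=interval)
        print(f"{interval}s interval: {rate:.0f} requests/second")
```

Under that assumption, 20 seconds works out to 150 requests per second and 10 seconds to 300, whether or not anything new has happened. With push, the cost follows the number of actual events instead.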
I'm running an enterprise-level PHP application. It's a browser game with thousands of users online, on infrastructure that my boss refuses to upgrade, and the machines are running at a system load of 2-3 (yep, Linux) at all times. Anyhow, that's not the real issue. The real issue is that some users wait until the server gets loaded (prime time), then they bring out their mouse clickers and click the same submit button 10-20 times, sending 10-20 requests at the same time while the server is still processing the initial request and so hasn't yet updated the cache and the database.
Currently I have an output variable on each request, which is valid for 2 minutes, and I have a "mutex" lock, which is basically a flag inside memcache that blocks further execution of the script if it is found. But the mouse clicker makes so many requests at the same time that they run almost simultaneously, which is a big issue for me.
How are you, the majority of Stack Overflow folks, dealing with this issue? I was thinking of flagging the cookie/session, but I think I will run into the same issue if the server gets overloaded. Optimization is impossible: the source is 7 years old and is quite optimized, with no queries on most pages (running off of cache) and only querying the database on certain user input, like the one I'm trying to prevent.
Yep it's procedural code with no real objects. Machines run PHP 5 but the code itself is more of a PHP 4. I know, I know it's old and stuff but we can't spare the resource of rewriting this whole mess since most of the original developers left that know how stuff is intertwined and yeah, I'm basically patching old holes. But as far as I know this is a general issue on loaded PHP websites.
P.S: Disabling the button with javascript on submit is not an option. The real cheaters are advanced users. One of them had written a bot clicker and packed it as a Google Chrome extension. Don't ask how I dealt with that.
I would look for a solution outside your code.
I don't know which server you use, but Apache has some modules for this, like mod_evasive for example.
You can also limit connections per second from an IP in your firewall.
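To make the "connections per second from an IP" idea concrete, here is a minimal sketch of the kind of policy such a rule enforces. It is not how mod_evasive or any particular firewall is implemented; the class name, the limits, and the in-memory storage are all assumptions for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. an IP)."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Forget accepted requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: reject without doing any work
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(limit=2, window=1.0)
    # A burst of five clicks from the same (example) address.
    print([limiter.allow("203.0.113.7") for _ in range(5)])
```

This version keeps its state in one process, so it only illustrates the policy. Doing it in the web server or firewall, as suggested above, rejects the extra requests before PHP is started at all.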
I'm getting the feeling this is touching more on how to update a legacy code base than anything else. While implementing some type of concurrency would be nice, the old code base is your real problem.
I highly recommend this video which discusses Technical Debt.
Watch it, then if you haven't already, explain to your boss in business terms what technical debt is. He will likely understand this. Explain that because the code hasn't been managed well (debt paid down) there is a very high level of technical debt. Suggest to him/her how to address this by using small incremental iterations to improve things.
Limiting the IP connections will only make your players angry.
I fixed and rewrote a lot of stuff in some famous open source game clones with old-style code.
Well, I must say that cheating can always be avoided by executing the right queries and logic.
For example, look here: http://www.xgproyect.net/2-9-x-fixes/9407-2-9-9-cheat-buildings-page.html
Anyway, about performance, keep in mind that code running inside a session will block all other requests on that session until the current one is closed. So be careful about wrapping all your code inside sessions. Also, sessions should never contain heavy data.
About scripts: in my games I have a PHP module that automatically rewrites links, adding a random id saved in the database, a sort of CSRF protection. Human users will click on the changed link, so they will not notice the changes, but scripts will try to request the old link, and after a few tries they are banned!
Other scripts use the DOM, so it's easy to thwart them by inserting some useless DIVs around the page.
Edit: you can boost your app with https://github.com/facebook/hiphop-php/wiki
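The rotating link id described above could look roughly like the sketch below. This is not the actual module from those games. The class name, the in-memory dictionaries (standing in for the database), and the threshold of 3 failures are assumptions.

```python
import secrets


class LinkTokens:
    """One-time random ids attached to action links; stale ids count towards a ban."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self._current = {}   # user -> id expected on that user's next request
        self._failures = {}  # user -> number of requests with a stale or missing id

    def issue(self, user):
        """Called while rendering a page: returns the id to append to the links."""
        token = secrets.token_hex(16)
        self._current[user] = token
        return token

    def check(self, user, token):
        """Return 'ok', 'rejected' or 'banned' for an incoming request."""
        if self._failures.get(user, 0) >= self.max_failures:
            return "banned"
        expected = self._current.get(user)
        if expected is not None and secrets.compare_digest(expected.encode(), token.encode()):
            del self._current[user]  # an id is valid once, so replays fail
            return "ok"
        self._failures[user] = self._failures.get(user, 0) + 1
        if self._failures[user] >= self.max_failures:
            return "banned"
        return "rejected"


if __name__ == "__main__":
    tokens = LinkTokens()
    link_id = tokens.issue("player1")
    print(tokens.check("player1", link_id))  # the human click
    print(tokens.check("player1", link_id))  # a script replaying the old link
```

A side effect worth noting: because each id is valid only once, the 10-20 duplicate submits from a mouse clicker would all fail after the first one, provided the check itself is done atomically in whatever store holds the ids.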
I don't know if there's an implementation already out there, but I'm looking into writing a cache server which has responsibility for populating itself on cache misses. That approach could work well in this scenario.
Basically you need a mechanism to mark a cache slot as pending on a miss; a read of a pending value should cause the client to sleep a small but random amount of time and retry; population of pending data in a traditional model would be done by the client encountering a miss instead of pending.
In this context, the script is the client, not the browser.
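The pending-slot mechanism could be sketched like this. It is an in-process model, not a real cache server: the `threading.Lock` stands in for whatever atomic operation the cache would provide, and the class name and sleep range are assumptions.

```python
import random
import threading
import time

PENDING = object()  # marker meaning "another client is already computing this value"


class SelfPopulatingCache:
    def __init__(self):
        self._lock = threading.Lock()  # stands in for the cache server's atomicity
        self._slots = {}

    def get(self, key, compute):
        while True:
            with self._lock:
                missing = key not in self._slots
                value = self._slots.get(key)
                if missing:
                    self._slots[key] = PENDING  # claim the slot on a miss
            if missing:
                # This client saw the miss, so it is the one that populates.
                try:
                    result = compute()
                except Exception:
                    with self._lock:
                        del self._slots[key]  # release the claim so others can retry
                    raise
                with self._lock:
                    self._slots[key] = result
                return result
            if value is not PENDING:
                return value  # ordinary cache hit
            # Pending: sleep a small, random amount of time and retry.
            time.sleep(random.uniform(0.005, 0.02))


if __name__ == "__main__":
    cache = SelfPopulatingCache()

    def expensive():
        time.sleep(0.05)  # pretend this is the database query
        return "fresh value"

    workers = [threading.Thread(target=cache.get, args=("page", expensive)) for _ in range(10)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(cache.get("page", expensive))
```

With ten simultaneous requests for the same key, only the first runs the expensive work. The other nine wait and then read the stored value, which is the behaviour wanted for the burst of identical submits.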
I am a programmer at an internet marketing company that primarily makes tools. These tools have certain requirements:
They run in a browser and must work in all of them.
The user either uploads something (.csv) to process or they provide a URL and API calls are made to retrieve information about it.
They are moving around THOUSANDS of lines of data (think large databases). These tools literally run for hours, usually over night.
The user must be able to watch live as their information is processed and is presented to them.
Currently we are writing in PHP, MySQL and Ajax.
My question is: how do I process LARGE quantities of data and provide a user experience while the tool is running? Currently I use a custom queue system that sends Ajax calls and inserts rows into tables or data into divs.
This method is a huge pain in the ass and couldn't possibly be the correct method. Should I be using a templating system, or is there a better way to refresh chunks of the page with A LOT of data? And I really mean a lot of data, because we come close to maxing out PHP memory, which is something we are always on the lookout for.
Also I would love to make it so these tools could run on the server by themselves. I mean upload a .csv and close the browser window and then have an email sent to the user when the tool is done.
Does anyone have any methods (programming standards) for me that are better than using .ajax calls? Thank you.
I wanted to update with some notes in case anyone has the same question. I am looking into the following to see which is the best solution:
SlickGrid / DataTables
GearMan
Web Socket
Ratchet
Node.js
These are in no particular order and the one I choose will be based on what works for my issue and what can be used by the rest of my department. I will update when I pick the golden framework.
First of all, you cannot handle big data via Ajax. To let users watch the processes live, you can use web sockets. As you are experienced in PHP, I can suggest Ratchet, which is quite new.
On the other hand, to do the calculations and store big data, I would use NoSQL instead of MySQL.
Since you're kind of pinched for time already, migrating to Node.js may not be time sensitive. It'll also help with the question of notifying users of when the results are ready as it can do browser notification push without polling. As it makes use of Javascript you might find some of your client-side code is reusable.
I think you can run what you need in the background with some kind of Queue manager. I use something similar with CakePHP and it lets me run time intensive processes in the background asynchronously, so the browser does not need to be open.
Another plus side for this is that it's scalable, as it's easy to increase the number of queue workers running.
Basically with PHP, you just need a cron job that runs every once in a while that starts a worker that checks a Queue database for pending tasks. If none are found it keeps running in a loop until one shows up.
This issue has been quite the brain teaser for me for a little while. Apologies if I write quite a lot, I just want to be clear on what I've already tried etc.
I will explain the idea of my problem as simply as possible, as the complexities are pretty irrelevant.
We may have up to 80-90 users on the site at any one time. They will likely all be accessing the same page, that I will call result.php. They will be accessing different results however via a get variable for the ID (result.php?ID=456). It is likely that less than 3 or 4 users will be on an individual record at any one time, and there are upwards of 10000 records.
I need to know, with less than a 20-25 second margin of error (this is very important), who is on that particular ID on that page, and update the page accordingly. I also need to remove their name once they are no longer on the page, again as soon as possible.
At the moment, I am using a jQuery script which calls a php file, reading from a database of "Currently Accessing" usernames who are accessing this particular ID, and only if the date at which they accessed it is within the last 25 seconds. The file will also remove all entries older than 5 minutes, to keep the table tidy.
This was alright with 20 or 30 users, but now that load has more than doubled, I am noticing this is a particularly slow method.
What other methods are available to me? Has anyone had any experience in a similar situation?
Everything we use at the moment is coded in PHP with a little jQuery. We are running on a server managed offsite by a hosting company, if that matters.
I have come across something called Comet or a Comet Server which sounds like it could potentially be of assistance, but it also sounds extremely complicated for my purposes and far beyond my understanding at the moment.
Look into websockets for a realtime socket connection. You could use websockets to push out updates in real time (instead of polling) to ensure changes in the 'currently online users' are sent within milliseconds.
What you want is an in-memory cache with a service layer that maintains the state of activity on the site. Using memcached might be a good starting point. Your pseudo-code would be something like:
On page access, make a call to CurrentUserService
CurrentUserService takes as a parameter the page you're accessing and who you are.
Each time you call it, it removes whatever you were accessing before from the cache.
Then it adds what you're currently accessing.
Then it compiles a list of who else is accessing the same thing based on the current state in the cache.
It returns this list, which your page processes and displays.
If you record when someone accesses a page, you can set a timeout for when the service stops 'counting' them as accessing the page.
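The steps above can be sketched as a small class. A plain dictionary stands in for memcached here, and the `touch` method name and the 25 second timeout (taken from the question's margin of error) are assumptions.

```python
import time


class CurrentUserService:
    """Tracks which record each user is viewing and reports who else is on it."""

    def __init__(self, timeout=25.0):
        self.timeout = timeout
        self._viewing = {}  # user -> (page_id, last_seen); stand-in for the cache

    def touch(self, user, page_id, now=None):
        now = time.time() if now is None else now
        # One entry per user: overwriting it removes whatever the user was
        # accessing before and records what they are accessing now.
        self._viewing[user] = (page_id, now)
        # Compile the list of other users seen on the same page recently enough.
        return sorted(
            other
            for other, (page, seen) in self._viewing.items()
            if other != user and page == page_id and now - seen <= self.timeout
        )


if __name__ == "__main__":
    service = CurrentUserService(timeout=25)
    service.touch("alice", 456, now=0)
    print(service.touch("bob", 456, now=10))   # alice is still within the timeout
    print(service.touch("bob", 456, now=40))   # alice has timed out by now
```

Each page would call `touch` from its periodic request and render the returned names. Entries for users who have left are simply ignored once they are older than the timeout. This sketch never deletes them, so a real cache would want an expiry on each entry.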
My client has a host of Facebook pages that have become very successful. In order to move away from big brother Facebook my client wishes to create a large dynamic site that incorporates the more successful parts of the Facebook empire.
One of my client's spin-off sites has been created and is getting a lot of traffic. I'm not sure exactly how much, but it hit 90 gigs in a month, as the allocated space needed to be increased.
In any case my client has dreamed up a massive website with its own community looking to put the community under the one banner. However I am concerned that it will get thrashed, bottlenecks, long load time, etc.
My questions:
Will a managed dedicated server be able to handle a potentially large amount of traffic?
Is it going to be better to create various parts of the empire in their own separate hosting and domain (normal hosting or VPS), or is it better to have them all under the one hood (i.e. using sub-domains).
If they were all together would it be better for SEO and easier to manage? Or if they are separate, they may be quicker but would it need some sort of Passport user system so people can log into any of the website with the same user details?
What's the best way to implement a Passport-style user system? Do you remotely connect to databases? Or run a regular cron job that updates each individual user's details on each domain? Maybe run a cURL request to the other sites, giving them any new data?
Any other Pros/Cons to keeping all the section together or separating them?
Large sites like Facebook manage to have everything under the one root. Then sites like eBay have separate domain names, but you can use the same user login across all of them.
I'm not sure what the best option is and would appreciate any guidance.
It is a very general question but to give some hints:
1. Measure, measure and measure again. Know which parts are used heavily and which are not.
2. Fix things and go back to 1.
Really: without knowing what takes lots of time, what is used most heavily, etc., you cannot say anything useful.
VPS or dedicated servers are not the right question. You start with: What do I have to do for the users. Then: How am I going to do that? (for example: in database, in scripts, in message queue) and then finally you see how much hardware you need.
One or multiple domains doesn't really matter. Though one exception: For static content it might be interesting if you have lots of it to use a CDN like Amazon. Read for example: http://highscalability.com/blog/2011/12/27/plentyoffish-update-6-billion-pageviews-and-32-billion-image.html where you can read some things about the possibilities with a CDN.
In general, serving static content from a static domain is useful; many other things don't really need that. So for those you could just keep everything in one domain.
I wrote a PHP chat script with jQuery on the user end, pull style. It reloaded a pull.php page once a second and retrieved only new chat records from my sql database since the last check.
However, I got an email from my host saying I was using too much bandwidth. Bleh. My chat was working just as I wanted it and this happened. I'm scared to do a COMET PHP chat because I was told it uses a separate process or thread for each user.
I suppose I just need a strategy that's good, and more efficient than what I had.
Well you've got the premise down. 1 second recalls are far too frequent - can anyone bang out a response in a second and expect it to go through? It takes me at least 2 seconds to get my SO messages together / relatively typo free.
So part 1 - increase the recall time. You kick it up to 10 seconds, the app is still going to feel pretty speedy, but you're decreasing processing by a factor of 10 - that's huge. Remember every message includes extra data (in the HTML and in all the network layers that support it), and every received message by the server requires processing to decide what to do with it.
Part 2 - You might want to introduce a sliding scale of interactivity speed. When the user isn't typing and hasn't received a message in a while, the recheck interval could be lengthened to perhaps 30 seconds. If the user is receiving more than one message in a "pull" or has received a message in each of the last 3 consecutive pulls, then the speed should increase (perhaps even more frequent than every 10 seconds). This will feel speedy to users during conversations, and not slow/laggy when there's nothing going on.
Part 3 - Sounds like you're already doing it, but your pull should be providing the minimum possible reply. That is just the HTML fragment to render the new messages, or perhaps even just a JSON structure that can be rendered by client-side JavaScript. If you're getting all the messages again (bad) or the whole page again (worse) you can seriously decrease your bandwidth this way.
Part 4 - You may want to look at alternate server software. Apache has some serious overhead... and that's for good reason: you've got all your plugins running, Apache config, .htaccess config, rewrites, the kitchen sink. You might be able to switch to a lighter-weight HTTP server (nginx, lighttpd) and realise some serious performance increase. Strictly speaking it won't help much with bandwidth, but one wonders if that's what the host is really complaining about, or that your service consumes serious server resources.
Part 5 - You could look at switching host, or package. A horrible thought, I'm sure, but there may be packages with more bandwidth/processor (the latter being more key, I suspect) available... you could even move to your own server, which allows you to do what you want without the service provider getting involved.
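The sliding scale from Part 2 could be sketched as a small decision function. The 10 and 30 second values come from the text above. The 5 second fast interval and the number of quiet pulls before slowing down are assumptions, as is the class name.

```python
from collections import deque


class AdaptivePoller:
    """Picks the delay before the next pull based on recent chat activity."""

    def __init__(self, fast=5.0, normal=10.0, slow=30.0, idle_pulls_before_slow=6):
        self.fast = fast
        self.normal = normal
        self.slow = slow
        self.idle_pulls_before_slow = idle_pulls_before_slow
        self._recent = deque(maxlen=3)  # message counts of the last 3 pulls
        self._idle_pulls = 0            # consecutive pulls with nothing going on

    def record_pull(self, new_messages, user_is_typing=False):
        """Report the outcome of a pull; returns seconds to wait before the next."""
        self._recent.append(new_messages)
        if new_messages or user_is_typing:
            self._idle_pulls = 0
        else:
            self._idle_pulls += 1
        # Busy: more than one message in this pull, or a message in each of
        # the last 3 consecutive pulls.
        busy = new_messages > 1 or (len(self._recent) == 3 and all(self._recent))
        if busy:
            return self.fast
        if self._idle_pulls >= self.idle_pulls_before_slow:
            return self.slow
        return self.normal


if __name__ == "__main__":
    poller = AdaptivePoller(idle_pulls_before_slow=2)
    for count in (0, 0, 1, 1, 1, 3, 0):
        print(count, "new ->", poller.record_pull(count), "seconds")
```

On the client, the returned value would feed a one-shot timer that is re-armed after each pull, rather than a fixed repeating interval.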
You don't need to check for a new message every second; kick it down to maybe once every 3 or 4 seconds. Make sure you are only using 1 query and that the file loaded is as small as possible. And I would always add a timeout... it would stop reloading after a minute or 2 of no activity.
Make these changes and see where that puts you.
Post some code if you want to see if we can help.
I believe that comet is the only solution.
Threads are much lighter than processes, so you would want to use a COMET-type technology with a light threaded web server like lighttpd.
Take a look at Simple Chat. It's very tiny and extensible, I think.
All you need to do is "move ahead with one of the standard solutions available"....
Comet, Web Socket and XMPP are a few words you will hear every now and then. Just pick one of them and it should be more efficient than a classic PHP chat based on periodic Ajax calls.
If you want to go ahead with PHP and jQuery based solution, try these two working examples of browser based chat system using XMPP: http://bit.ly/jaxlBoshChat (for one-to-one chat example) and http://bit.ly/jaxlBoshMUC (for multi-user chat example). Let me know if you face any problem setting them up.