PHP dealing with concurrency

I'm running an enterprise-level PHP application. It's a browser game with thousands of users online, on infrastructure that my boss refuses to upgrade, and the machines sit at a 2-3 system load (yep, Linux) at all times. Anyhow, that's not the real issue. The real issue is that some users wait until the server gets loaded (prime time), bring out their mouse clickers, and click the same submit button 10-20 times, sending 10-20 requests at once while the server is still processing the initial request and therefore hasn't updated the cache and the database yet.
Currently I have an output variable on each request, which is valid for 2 minutes, and a "mutex" lock, which is basically a flag inside memcache that blocks further execution of the script when found. But the mouse clicker fires so many requests at once that they run almost simultaneously, which is a big issue for me.
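For reference, the race in a flag like that comes from reading the value and then setting it as two separate steps. Memcache's add() does the check and the set atomically on the server (it fails if the key already exists), so only the first of N simultaneous requests can take the lock. A minimal sketch, with an invented key name, TTL, and placeholder for the real work:

    <?php
    // Sketch: an atomic memcache "mutex" via Memcache::add().
    // add() fails if the key already exists, so the check and the set
    // cannot interleave the way a get()-then-set() pair can.
    $mc = new Memcache();
    $mc->connect('127.0.0.1', 11211);

    $lockKey = 'action_lock:' . $userId;   // $userId assumed from the session

    // Memcache::add($key, $value, $flags, $expire)
    if (!$mc->add($lockKey, 1, 0, 10)) {   // 10 s TTL in case the script dies
        exit('Your previous request is still being processed.');
    }

    process_request();                     // placeholder for the real work
    $mc->delete($lockKey);                 // release once cache/DB are updated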
How are you, the majority of StackOverflow folks, dealing with this issue? I was thinking of flagging the cookie/session, but I think I'd hit the same problem if the server gets overloaded. Optimization is impossible: the source is 7 years old and already quite optimized, with no queries on most pages (running off the cache) and the database only queried on certain user input, like the one I'm trying to prevent.
Yep, it's procedural code with no real objects. The machines run PHP 5, but the code itself is more PHP 4. I know, I know, it's old and all, but we can't spare the resources to rewrite this whole mess since most of the original developers who knew how everything is intertwined have left, so I'm basically patching old holes. As far as I know, though, this is a general issue on loaded PHP websites.
P.S.: Disabling the button with JavaScript on submit is not an option. The real cheaters are advanced users. One of them wrote a bot clicker and packaged it as a Google Chrome extension. Don't ask how I dealt with that.

I would look for a solution outside your code.
Don't know which server you use, but Apache, for example, has modules like mod_evasive.
You can also limit connections per second from a single IP in your firewall.

I'm getting the feeling this is more about how to update a legacy code base than anything else. While implementing some type of concurrency control would be nice, the old code base is your real problem.
I highly recommend this video which discusses Technical Debt.
Watch it, then, if you haven't already, explain to your boss in business terms what technical debt is. He will likely understand this. Explain that because the code hasn't been managed well (debt paid down), there is a very high level of technical debt. Suggest addressing this with small, incremental iterations to improve things.

Limiting IP connections will only make your players angry.
I fixed and rewrote a lot of stuff in some famous open-source game clones with old-style code:
Well, I must say that cheating can always be prevented by executing the right queries and logic.
For example, look here: http://www.xgproyect.net/2-9-x-fixes/9407-2-9-9-cheat-buildings-page.html
Anyway, about performance: keep in mind that code running inside a session blocks all the user's other requests until the current one is closed, so be careful about wrapping your whole script inside the session. Also, sessions should never contain heavy data.
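That blocking is PHP's session locking: session_start() takes an exclusive lock on the session file, so a user's parallel requests queue up until the script finishes or releases it. A minimal sketch of releasing it early, assuming the rest of the script only reads the session:

    <?php
    session_start();                 // acquires an exclusive lock on the session file

    $userId = $_SESSION['user_id'];  // read what you need from the session first

    // Release the session lock as soon as writes are done, so the user's
    // other (parallel) requests are no longer blocked behind this one.
    session_write_close();

    // Long-running work below no longer holds the session lock.
    render_page($userId);            // placeholder for the rest of the script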
About scripts: in my games I have a PHP module that automatically rewrites links, adding a random ID saved in the database, a sort of CSRF protection. Human users click the changed links, so they never notice the difference, but scripts keep requesting the old links, and after a few tries they are banned!
Other scripts use the DOM, so it's easy to defeat them by inserting some useless DIVs around the page.
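A rough sketch of that link-rewriting idea; the link_tokens table and the URL parameter are invented, and the real module surely differs:

    <?php
    // Sketch: per-link random tokens (a CSRF-style defence against bots).

    // When rendering a page, rewrite each link to carry a one-time token:
    function tokenized_link(PDO $pdo, $userId, $target) {
        $token = md5(uniqid(mt_rand(), true));
        $stmt = $pdo->prepare(
            'INSERT INTO link_tokens (user_id, target, token) VALUES (?, ?, ?)'
        );
        $stmt->execute(array($userId, $target, $token));
        return $target . '?t=' . $token;
    }

    // When handling a click, the token must exist and is consumed. A bot
    // replaying an old URL fails repeatedly - count failures, then ban.
    function consume_token(PDO $pdo, $userId, $target, $token) {
        $stmt = $pdo->prepare(
            'DELETE FROM link_tokens WHERE user_id = ? AND target = ? AND token = ?'
        );
        $stmt->execute(array($userId, $target, $token));
        return $stmt->rowCount() === 1;
    }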
Edit: you can boost your app with HipHop for PHP: https://github.com/facebook/hiphop-php/wiki

I don't know if there's an implementation already out there, but I'm looking into writing a cache server that takes responsibility for populating itself on cache misses. That approach could work well in this scenario.
Basically, you need a mechanism to mark a cache slot as pending on a miss; a read of a pending value should cause the client to sleep a small but random amount of time and retry; in a traditional model, the pending data would then be populated by the client that encountered the miss.
In this context, the script is the client, not the browser.
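A minimal sketch of that pending-slot scheme on top of the memcache extension; the PENDING marker, retry count, and sleep window are all invented:

    <?php
    // Sketch: "pending" cache slots so only one request recomputes a miss.
    define('PENDING', '__PENDING__');

    function cache_fetch($mc, $key, $compute, $ttl = 120) {
        for ($i = 0; $i < 20; $i++) {
            $value = $mc->get($key);
            if ($value !== false && $value !== PENDING) {
                return $value;                       // real hit
            }
            // Miss: try to atomically claim the slot with a PENDING marker.
            if ($value === false && $mc->add($key, PENDING, 0, 30)) {
                $fresh = call_user_func($compute);   // this client populates
                $mc->set($key, $fresh, 0, $ttl);
                return $fresh;
            }
            // Another client is populating: sleep 50-150 ms, then retry.
            usleep(mt_rand(50000, 150000));
        }
        return call_user_func($compute);             // give up waiting after ~2 s
    }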

Related

How to tell if someone has left

I was wondering how I could check whether someone has left the site/page and perform an action after they leave. I was reading on here and found this:
No, there isn't. The best you can do is send an AJAX request every X seconds (perhaps only if the user moves the mouse). If the server doesn't receive any requests for 2X seconds, assume that the user left.
That's what I had planned, but how can you make the server do something (in my case, remove them from the DB) when they stop sending requests? An example I can think of is how Facebook marks you as online in chat when you visit the site, but marks you offline when you leave. How is that possible?
Edit: After a while of using cron jobs, I found that web hosts don't like running cron jobs often enough to create a "live" feeling on your site. So instead I found Node.js, and it works 1000x better and is much simpler. I'd recommend it to anyone with the same issue: it's very cheap to host and simple to learn, and if you know JavaScript you can build in it no problem. Just be sure to understand how async works.
A not-uncommon approach is to run a cron job periodically that checks the list of users and does XYZ if they have been inactive for X minutes.
Facebook uses the XMPP protocol (Jabber) to keep a constant, real-time connection with the user. However, implementing one is not an easy task at all. The simplest solution would be, as mentioned in the comments, to have the client make AJAX requests to the server every few seconds, so that the server can check whether the user is still viewing the site.
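A minimal sketch of that heartbeat approach; the online_users table and the 20-second cutoff are invented, and stale rows are pruned on each ping rather than by cron:

    <?php
    // heartbeat.php - called by the client every few seconds via AJAX.
    session_start();
    $userId = (int) $_SESSION['user_id'];

    $pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');

    // Upsert this user's last-seen timestamp.
    $stmt = $pdo->prepare(
        'INSERT INTO online_users (user_id, last_seen) VALUES (?, NOW())
         ON DUPLICATE KEY UPDATE last_seen = NOW()'
    );
    $stmt->execute(array($userId));

    // Anyone silent for more than 2x the ping interval is treated as gone.
    $pdo->exec('DELETE FROM online_users WHERE last_seen < NOW() - INTERVAL 20 SECOND');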
You might want to check out my question, which might be related to yours.
The only method I remember from my time developing is one that's not 100% reliable, as a number of factors can cause it to misfire or not run at all, up to and including someone disabling JavaScript. Granted, that isn't very likely with the way today's websites are put together, but people do have the option to turn it off, and people acting maliciously tend to have it off as well.
Anyway, the method I have read about but never put much stock in is the onunload event, which you tie to something like the <body> tag.
Example:
<body onunload="myFunction()">
Where myFunction() is a bit of JavaScript that does whatever it is you're seeking to have done.
Again, it's not superbly reliable, for many reasons, but I think it's the best you have with almost all client-side approaches.

How can I display currently online users, without a database? Or at least very efficiently. Preferably in PHP

This issue has been quite the brain teaser for me for a little while. Apologies if I write quite a lot; I just want to be clear about what I've already tried.
I will explain the idea of my problem as simply as possible, as the complexities are pretty irrelevant.
We may have up to 80-90 users on the site at any one time. They will likely all be accessing the same page, that I will call result.php. They will be accessing different results however via a get variable for the ID (result.php?ID=456). It is likely that less than 3 or 4 users will be on an individual record at any one time, and there are upwards of 10000 records.
I need to know, with less than a 20-25 second margin of error (this is very important), who is on that particular ID on that page, and update the page accordingly. Removing their name once they are no longer on the page, once again as soon as possible.
At the moment, I am using a jQuery script which calls a php file, reading from a database of "Currently Accessing" usernames who are accessing this particular ID, and only if the date at which they accessed it is within the last 25 seconds. The file will also remove all entries older than 5 minutes, to keep the table tidy.
This was alright with 20 or 30 users, but now that load has more than doubled, I am noticing this is a particularly slow method.
What other methods are available to me? Has anyone had any experience in a similar situation?
Everything we use at the moment is coded in PHP with a little jQuery. We are running on a server managed offsite by a hosting company, if that matters.
I have come across something called Comet or a Comet Server which sounds like it could potentially be of assistance, but it also sounds extremely complicated for my purposes and far beyond my understanding at the moment.
Look into WebSockets for a real-time socket connection. You could use WebSockets to push out updates in real time (instead of polling) to ensure that changes in the 'currently online users' list are sent within milliseconds.
What you want is an in-memory cache with a service layer that maintains the state of activity on the site. Using memcached might be a good starting point. Your pseudo-code would be something like the list below (a rough PHP sketch follows it):
On page access, make a call to CurrentUserService
CurrentUserService takes as a parameter the page you're accessing and who you are.
Each time you call it, it removes whatever you were accessing before from the cache.
Then it adds what you're currently accessing.
Then it compiles a list of who else is accessing the same thing based on the current state in the cache.
It returns this list, which your page processes and displays.
If you record when someone accesses a page, you can set a timeout for when the service stops 'counting' them as accessing the page.
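A rough sketch of that service; the key layout is invented, and note that the read-modify-write on the viewer list is racy under load (a CAS loop would be the robust version):

    <?php
    // Sketch of a CurrentUserService backed by memcached.
    class CurrentUserService
    {
        private $mc;

        public function __construct(Memcached $mc) { $this->mc = $mc; }

        // Record that $user is now viewing $page; return who else is there.
        public function track($user, $page) {
            // Remove the user from whatever they were viewing before.
            $previous = $this->mc->get("viewing:$user");
            if ($previous !== false) {
                $this->removeFrom($previous, $user);
            }

            // Add them to the new page's viewer list (60 s timeout).
            $this->mc->set("viewing:$user", $page, 60);
            $viewers = $this->mc->get("viewers:$page") ?: array();
            $viewers[$user] = time();
            $this->mc->set("viewers:$page", $viewers, 60);

            unset($viewers[$user]);
            return array_keys($viewers);   // everyone else on the same page
        }

        private function removeFrom($page, $user) {
            $viewers = $this->mc->get("viewers:$page") ?: array();
            unset($viewers[$user]);
            $this->mc->set("viewers:$page", $viewers, 60);
        }
    }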

What might be the best way to benchmark a users PC, PHP or JS?

PHP - Apache with Codeigniter
JS - typical with jQuery and in house lib
The Problem: determining (without forcing a download) a user's PC capability and/or virus issues.
The Why: we put out software that is mostly used in clinics but can be used from home. However, before users go to our main site, we need to know whether their PC can handle the demands of our web-based, browser-served software.
Progress: so far, we've come up with a decent way to test download speed, but that's about it.
What we've done: in PHP we create about a 2.5Gb array of data to send to the user in a view; from there, the view calculates the time it took to receive the data and subtracts the PHP benchmark from that time to get a point of reference for upload/download speed. This is not enough.
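For comparison, a timed-payload endpoint for this kind of test can be very small; a sketch with an invented file name and a 2 MB payload, where the client times the fetch and divides bytes received by elapsed seconds:

    <?php
    // speedtest.php - sketch of a timed-payload endpoint for estimating
    // download speed. The size and endpoint name are illustrative only.
    $bytes = 2 * 1024 * 1024;            // 2 MB is plenty for an estimate
    header('Content-Type: application/octet-stream');
    header('Content-Length: ' . $bytes);
    header('Cache-Control: no-store');   // make sure the test isn't cached
    echo str_repeat('x', $bytes);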
Some of our (local) users have been found to have "crappy" PCs or virus infections, and this can lead to two problems: (1) they crash in the middle of performing a task in our program, or (2) their viruses try to inject into our JS, creating a bad experience that may make us look bad to the average (uneducated about how this stuff works) user, hurting "our" integrity.
I've done some googling around, but most plug-ins or advice forums/blogs I've found simply give ways to benchmark the speed of your JS, and that is simply not enough. I need a simple bit of code (no visual interface included; another problem I found with one nice JS library that did this is that it would take days to remove all of the author's visual code) that will let me test the following three things:
The user's data transfer rate (I think we have this covered, but if a better method is presented, I won't rule it out)
The user's processing speed; how fast the computer is in general
A possible test for infection by malware, adware, or whatever else may be harmful to the user's experience
What we are not looking to do: repair their PC! We don't care if they have problems; we just don't want to lead them into our site if they have too many. If they can't do it from home, they will be advised to go to their nearest local office to use this software "in house", so to speak.
Further Explanation
We know you can't test the user-side stuff with PHP; we're not that stupid. PHP is mentioned because it can still be useful, either in determining connection speed or in delivering a script that does what we want. Also, this is not software for just anyone on the net to sign up and use: unless you are affiliated with a specific clinic and have a login and whatnot, you're not meant to use the site, and if you get in otherwise, it's illegal. I can't really reveal a whole lot of information yet, as the site is not live. What I can say is that it's mostly used by clinics/offices for customers to perform a certain task. If they don't have the time or transport, or otherwise need to do it from home, the option is available. However, if their home PC is not "up to snuff", it will be nothing but a problem for them and will turn the 2-hour task they are meant to perform into a 4-6 hour nightmare. That's why I'm at one of my favourite Q&A sites, asking whether anyone has experience with this and may know a good way to test the user's PC, so users can get the best possible resolution: either do it from home (if their PC is suitable) or be told they need to go to their local office. Hopefully this clears things up enough that we can refrain from the "sillier" answers. I need a REAL viable solution and/or suggestions, please.
PHP has (virtually) no access to information about the client's computer. Data transfer can just as easily be limited by network speed as by computer speed, though if you don't care which is the limiter, it might work.
JavaScript can reliably check how quickly a set of operations runs and send the results back to the server... but that's about it. It has no access to the file system, for security reasons.
EDIT: Okay, with that revision, I think I can offer a real suggestion: basically, a compromise. You are not going to be able to gather enough information to absolutely guarantee the user's computer and connection are adequate, but you can get a general idea.
As someone suggested, use a 10MB-20MB file and several smaller ones to test the actual transfer rate; this will give you a reasonable estimate. Then use JavaScript to test their system speed. But don't just stick with one test, because results can depend heavily on the browser. Research which tests best represent capability across browsers: things like looping over arrays, manipulating (invisible) elements, and complex math. If there is a significant discrepancy between browsers, use different thresholds; PHP does know which browser they're using, so you can give the system different "good enough" ratings depending on that. Limiting by version (like completely rejecting IE6) may help too.
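A sketch of what those per-browser thresholds could look like server-side; the browser families and numbers are invented, and it assumes the client posts back a single benchmark score:

    <?php
    // Sketch: browser-dependent "good enough" thresholds for a JS
    // benchmark score reported by the client. Values are illustrative.
    function benchmark_threshold($userAgent) {
        if (strpos($userAgent, 'MSIE 6') !== false) {
            return null;                       // reject IE6 outright
        }
        if (strpos($userAgent, 'MSIE') !== false)    return 40000;
        if (strpos($userAgent, 'Firefox') !== false) return 80000;
        if (strpos($userAgent, 'Chrome') !== false)  return 120000;
        return 60000;                          // unknown browser: middle ground
    }

    $threshold  = benchmark_threshold($_SERVER['HTTP_USER_AGENT']);
    $score      = (int) $_POST['benchmark_score'];  // from the client-side test
    $fastEnough = ($threshold !== null) && ($score >= $threshold);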
Finally... inform the user. Gently. First let them know, "Hey, this is going to run a test to see if your network connection and computer are fast enough to use our system." And if it fails, tell them which part, and give them a warning. "Hey, this really isn't as fast as we recommend. You really ought to go down to the local clinic to perform this task; if you choose to proceed, it may take a lot longer than intended." Hopefully, at that point, the user will realize that any issues are on them, not on you.
What you've heard is correct: there's no way to effectively benchmark a machine from JavaScript, especially because the JavaScript engine depends mostly on the browser being used, among numerous other variables, and there are no file-system permissions, etc. A computer is hardly going to let a browser's sub-process stress it anyway; the browser would simply crash first. PHP is obviously out, as it's server-side.
Sites like System Requirements Lab have the user download a Java applet that runs in its own scope.

How can I write an efficient php chat?

I wrote a PHP chat script with jQuery on the user end, pull-style. It reloaded a pull.php page once a second and retrieved only the chat records from my SQL database that were new since the last check.
However, I got an email from my host saying I was using too much bandwidth. Bleh. My chat was working just as I wanted when this happened. I'm scared to do a Comet PHP chat because I was told it uses a separate process or thread for each user.
I suppose I just need a strategy that's good, and more efficient than what I had.
Well, you've got the premise down. One-second recalls are far too frequent: can anyone bang out a response in a second and expect it to go through? It takes me at least 2 seconds to get my SO messages together and relatively typo-free.
So part 1 - increase the recall time. If you kick it up to 10 seconds, the app will still feel pretty speedy, but you're decreasing processing by a factor of 10, and that's huge. Remember, every message includes extra data (in the HTML and in all the network layers that carry it), and every message received by the server requires processing to decide what to do with it.
Part 2 - You might want to introduce a sliding scale of interactivity speed. When the user isn't typing and hasn't received a message in a while, the recheck rate could drop to perhaps 30 seconds. If the user receives more than one message in a "pull", or has received a message in each of the last 3 consecutive pulls, the speed should increase (perhaps even more frequently than every 10 seconds). This will feel speedy to users during conversations, and not slow or laggy when nothing is going on.
Part 3 - It sounds like you're already doing this, but your pull should return the minimum possible reply: just the HTML fragment to render the new messages, or perhaps even a JSON structure rendered by client-side JavaScript. If you're currently sending all the messages again (bad) or the whole page again (worse), you can seriously reduce your bandwidth this way.
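A sketch of such a minimal pull endpoint, with an invented table and parameter name: the client remembers the last message ID it saw and asks only for newer rows:

    <?php
    // pull.php?since=<last message id> - sketch of a minimal chat pull.
    $since = (int) $_GET['since'];

    $pdo = new PDO('mysql:host=localhost;dbname=chat', 'user', 'pass');
    $stmt = $pdo->prepare(
        'SELECT id, author, body FROM messages WHERE id > ? ORDER BY id LIMIT 50'
    );
    $stmt->execute(array($since));

    // Only the new rows go over the wire, as compact JSON.
    header('Content-Type: application/json');
    echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));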
Part 4 - You may want to look at alternate server software. Apache has some serious overhead, and for good reason: you've got all your modules running, the Apache config, .htaccess, rewrites, the kitchen sink. You might be able to switch to a lighter-weight HTTP server, nginx or lighttpd, and realise a serious performance increase (strictly speaking it won't help much with bandwidth, but one wonders whether that is what the host is really complaining about, or whether your service consumes serious server resources).
Part 5 - You could look at switching hosts or packages. A horrible thought, I'm sure, but there may be packages with more bandwidth/processor (the latter being more key, I suspect) available... you could even move to your own server, which lets you do what you want without the service provider getting involved.
You don't need to check for a new message every second; kick it down to maybe every 3 or 4. Make sure you are only using one query and that the file loaded is as small as possible. And I would always add a timeout: stop reloading after a minute or two of no activity.
Make these changes and see where that puts you.
Post some code if you want to see if we can help.
I believe that comet is the only solution.
Threads are much lighter than processes, so you would want to use a Comet-type technology with a light threaded web server like lighttpd.
Take a look at Simple Chat. It's very tiny and extensible, I think.
All you need to do is move ahead with one of the standard solutions available...
Comet, WebSockets, and XMPP are a few terms you will hear every now and then; just pick one of them and it should be more efficient than a classic periodic-AJAX-based PHP chat.
If you want to go ahead with a PHP and jQuery based solution, try these two working examples of a browser-based chat system using XMPP: http://bit.ly/jaxlBoshChat (a one-to-one chat example) and http://bit.ly/jaxlBoshMUC (a multi-user chat example). Let me know if you face any problems setting them up.

First site going live real soon. Last minute questions

I am really close to finishing up on a project that I've been working on. I have done websites before, but never on my own and never a site that involved user generated data.
I have been reading up on things that should be considered before you go live and I have some questions.
1) Staging... (deploying updates without affecting users). I'm not really sure what this entails, since I'm sure any update would affect users in some way. Does this mean some type of temporary downtime for every update? Can somebody please explain this, and a solution as well?
2) Limits... I'm using the Kohana framework with the Auth module for logging users in. I was wondering whether it already has some type of limit on login attempts built in, and if not, what the best way to implement one would be (save attempts in the database, a cookie, etc.). If this is not what's meant by limits, can somebody elaborate?
Edit: I think a good way to do this would be to freeze logins for a period of time (say 15 minutes), or to display a captcha, after a handful (10 or so) of unsuccessful login attempts.
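If the Auth module doesn't provide this out of the box, a rough sketch of the database-backed variant the answers below recommend; the login_attempts table and the 10-failures/15-minutes numbers are invented:

    <?php
    // Sketch: database-backed login throttling with a 15-minute freeze
    // after 10 failed attempts.
    function login_is_frozen(PDO $pdo, $username) {
        $stmt = $pdo->prepare(
            'SELECT COUNT(*) FROM login_attempts
             WHERE username = ? AND attempted_at > NOW() - INTERVAL 15 MINUTE'
        );
        $stmt->execute(array($username));
        return $stmt->fetchColumn() >= 10;
    }

    function record_failed_login(PDO $pdo, $username) {
        $stmt = $pdo->prepare(
            'INSERT INTO login_attempts (username, attempted_at) VALUES (?, NOW())'
        );
        $stmt->execute(array($username));
    }

    // Usage on the login action:
    //   if (login_is_frozen($pdo, $username)) { show captcha or reject; }
    //   elseif (!login_ok($username, $password)) { record_failed_login($pdo, $username); }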
3) Caching... Like I said, this is my first site built around user content. Considering that, should I cache it?
4) Backups... How often should I back up my (MySQL) database, and how should I do it (a MySQL export?)?
The site is currently up, though not finished, if anybody wants to take a look and see if anything pops out that should be looked at or fixed: Clashing Thoughts.
If there is anything else I've overlooked that's not already in the list linked above, please let me know.
Edit: If anybody has advice on getting the word out (marketing), I'd appreciate that too.
Thanks.
EDIT: I've made the changes, and the site is now live.
1) Most sites that push frequent updates, or that have a massive update that will take some time, use a beta domain such as beta.example.com that is restricted to staff until the update is released to the main site for the public.
2) If you use cookies, users can just disable cookies and get infinite login attempts, so your efforts will go to waste. So yeah, use the database instead. How you want it to keep track is up to you.
3) Depends on what type of content it is and how much there is. If you have a lot of different variables, you should keep only the key variables that identify the data in the database and keep all the additional data in a cache, so that database queries run faster. You will be able to quickly find the results you want and then just open the cache file associated with them.
4) It's up to you; it really depends on traffic. If you're only getting 2 or 3 new pieces of data per day, you probably don't want to waste the time and space backing it up every day. P.S. MySQL exports work just fine; I find them much easier to import and work with.
1) You will want to keep taking your site down for updates to a minimum. I tend to let jobs build up, and then do a big update at the end of the month.
2) In terms of limiting login attempts: cookies would be simple to implement but are not fool-proof; they will stop the majority of users but can be easily circumvented, so it would be best to choose another way. Using a database would be better, but it's a bit more complicated to implement and could add more strain to the database.
3) Caching depends greatly on how often content is updated or changes. If content changes a lot, it may not be worth caching, but if it's largely static, then something like memcache or APC may be of use.
4) You should always make regular backups. I do one daily via a cron job to my home server, although a weekly one would suffice.
Side notes: YSlow indicates that:
you are not serving expires headers on your CSS or images (this causes pages to load more slowly and costs you more bandwidth)
you have CSS files that are not served with gzip compression (same issues)
also consider moving your static content (CSS, images, etc.) to a separate domain (CDN) for faster load times
