Server optimization for Ajax-heavy Apache+MySQL+PHP site

I've looked around and haven't found a pre-existing answer to this question.
Info
My site relies on Ajax, Apache, Mysql, and PHP.
I have already built my site and it works well; however, as soon as too many users connect (roughly 200+ requests per second), the server performs very poorly.
This site is very reliant on Ajax. The main page performs an Ajax request every second, so if 100 people are online, I'm receiving at least 100 requests per second.
These Ajax requests invoke MySQL queries on the server side. The queries return small datasets, but the returned data changes very often, so I'd imagine caching would be ineffective.
Questions
1) What configuration practices would be best to help me increase the maximum number of requests per second? This applies to Ajax, Mysql, PHP, and Apache.
2) For Apache, do I want persistent connections (the KeepAlive directive) to be "On" or "Off"? As I understand, Off is useful if you are expecting many users, but On is useful for ajax and I need both of these things.
3) When I test the server's performance on serving a plain, short html page with no ajax (and involving only 1 minor mysql query) it still performs very poorly when this page gets 200+ requests per second. I'd imagine this must be due to apache configuration / server resources. What options do I have to improve this?
Thanks for any help!

Depending on actual user needs, caching can be implemented in different patterns. In many cases users don't really need updates every second, and/or responses can be cached for longer periods while the page still appears to update frequently. It depends...
Just to give some ideas:
Does every user need a truly unique, user-specific response from the Ajax requests, or is the response the same (or similar) for all users or for subgroups of users?
Does it make sense to send every user an update every second?
Would users notice the difference if the data were cached for, say, 10 seconds?
If the data really is unique per user but doesn't change for every user every second, couldn't you use cache invalidation (invalidate the cached data only when the underlying data actually changes)?
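The short-TTL idea above can be sketched in PHP with a simple file cache (a minimal sketch: `fetchLatestData()`, the cache filename, and the 10-second TTL are all illustrative, and APCu or Memcached would be drop-in alternatives):

```php
<?php
// Hypothetical sketch: serve the AJAX payload from a short-lived file cache
// so N requests per second cause at most one MySQL query per TTL window.
// fetchLatestData() stands in for the real (expensive) query.
function fetchLatestData(): array {
    return ['score' => 42, 'generated_at' => time()];
}

function getCachedResponse(string $cacheFile, int $ttlSeconds): array {
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttlSeconds) {
        return json_decode(file_get_contents($cacheFile), true);
    }
    $data = fetchLatestData();
    file_put_contents($cacheFile, json_encode($data), LOCK_EX); // shared by all users
    return $data;
}

echo json_encode(getCachedResponse(sys_get_temp_dir() . '/main_page_payload.json', 10));
```

Because the cached copy is shared by all users, 200 requests per second against a 10-second TTL still means at most one MySQL query every 10 seconds for this payload.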

I used RequireJS to lazy-load the JS, HTML, and CSS files. For the server to serve lots of assets, you need to keep KeepAliveTimeout at around 15 seconds.
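For reference, the KeepAlive trade-off from question 2 is controlled by a handful of directives in httpd.conf. The values below are illustrative starting points, not tuned numbers; the right timeout depends on how often clients poll:

```apache
# Reuse TCP connections for the once-per-second AJAX polls, but cap how long
# an idle connection may hold an Apache worker.
KeepAlive On
KeepAliveTimeout 2
MaxKeepAliveRequests 100
```

With the prefork MPM, each kept-alive connection pins a whole Apache process, so a short timeout (just above the polling interval) is usually safer under heavy concurrency than a long one.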

Is it just fine to POST data to Laravel every second?

I am trying to build a tracking system in which an Android app sends GPS data to a web server using Laravel. I have read tutorials on how to build realtime apps, but as far as I understand, most of the guides only cover receiving data in realtime. I haven't yet seen examples of sending data every second or so.
I guess it's not good practice to POST data to a web server every second, especially when you already have a thousand users. I hope someone can suggest how, or what, I should do to handle this?
Also, as much as possible, I would like to use only Laravel, without any NodeJS server.
Handle requests quickly
First you should estimate server capacity. With PHP-FPM, if you have 32 PHP processes and each POST request is handled by the server within 0.01 s, capacity can be roughly estimated as N = 32 / 0.01 = 3200 requests per second.
So just make the handling fast. If a request takes 0.1 s to handle, that is too slow to support a lot of clients on a single server. Enable OPcache; it can decrease the time 5x. Inserting data into MySQL is a slow operation, so you will probably need to work on making it faster. For example, add incoming points to a fast cache (Redis/Memcached), and when the cache already contains 1000 elements, or was created more than 0.5 seconds ago, move them to the database as a single INSERT query.
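The buffer-then-flush idea can be sketched as follows (a sketch only: the class and the table/column names are hypothetical, the buffer is in-process rather than in Redis, and real code would use bound parameters instead of sprintf):

```php
<?php
// Hypothetical sketch of batching: buffer incoming GPS points and flush them
// as one multi-row INSERT once the buffer is full or too old.
class GpsBatcher {
    private array $rows = [];
    private float $createdAt;

    public function __construct(
        private int $maxRows = 1000,
        private float $maxAgeSeconds = 0.5
    ) {
        $this->createdAt = microtime(true);
    }

    // Returns the flushed SQL string, or null if the point was only buffered.
    public function add(int $userId, float $lat, float $lng): ?string {
        $this->rows[] = [$userId, $lat, $lng];
        if (count($this->rows) >= $this->maxRows ||
            microtime(true) - $this->createdAt >= $this->maxAgeSeconds) {
            return $this->flush();
        }
        return null;
    }

    // Build one INSERT covering all buffered rows, then reset the buffer.
    public function flush(): string {
        $values = array_map(
            fn($r) => sprintf('(%d, %.6f, %.6f)', ...$r),
            $this->rows
        );
        $sql = 'INSERT INTO gps_points (user_id, lat, lng) VALUES '
             . implode(', ', $values);
        $this->rows = [];
        $this->createdAt = microtime(true);
        return $sql;
    }
}
```

The two thresholds trade durability for throughput: at most 0.5 s of points can be lost on a crash, in exchange for turning up to 1000 single-row INSERTs into one statement.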
Randomize the send times
Most smartphones have reasonably accurate clocks, so the start of each second can produce a thousand simultaneous requests: the server handles 1000 requests in the first 0.01 s, then sleeps for the remaining 0.99 s. In the mobile code, insert a random delay of 0-0.9 s that is fixed for each device, chosen at first install or first request. It will load the server uniformly.
There are at least 2 really important things you should consider:
Client's internet consumption
Server capacity
If you have a thousand users, every second means a lot of requests for your server to handle.
You should consider using push techniques, like those described in @Dipin's answer.
And when it comes to the server, you should consider using a queue system to handle those jobs, as described in this article. There's probably a package providing integration with Firebase or GCM to handle that for you.
Good luck, hope it helps o/

Downloading many web pages with PHP curl

I'm building a PHP application which has a database containing approximately 140 URLs.
The goal is to download a copy of the contents of these web pages.
I've already written code which reads the URLs from my database and then uses curl to grab a copy of each page. It then extracts everything between <body> and </body> and writes it to a file. It also takes redirects into account; e.g., if I request a URL and the response code is 302, it follows the appropriate link. So far so good.
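The per-URL fetch described above might look roughly like this (a sketch: the function names, curl options, and the body-extraction regex are illustrative, and a regex is admittedly a crude way to parse HTML):

```php
<?php
// Sketch: follow redirects with curl, then keep only what lies between
// <body> and </body>. fetchBody()/extractBody() are illustrative names.
function fetchBody(string $url, int $timeoutSeconds = 10): ?string {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,   // handles 301/302 redirects
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? null : extractBody($html);
}

function extractBody(string $html): ?string {
    if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $m)) {
        return $m[1];
    }
    return null;
}
```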
This all works fine for a number of URLs (maybe 20 or so), but then my script times out because max_execution_time is set to 30 seconds. I don't want to override or increase this, as I feel that's a poor solution.
I've thought of 2 workarounds but would like to know if these are a good or bad approach, or whether there are better ways.
The first approach is to use a LIMIT on the database query to split the task into 20 rows at a time (i.e., run the script 7 separate times if there were 140 rows). I understand that with this approach I would still need to call the script, download.php, 7 separate times, passing in the LIMIT figures.
The second is to have a script where I pass in the ID of each individual database record I want the URL for (e.g. download.php?id=2) and then make multiple Ajax requests to it (download.php?id=2, download.php?id=3, download.php?id=4, etc.). Based on $_GET['id'] it could do a query to find the URL in the database. In theory I'd be making 140 separate requests, as it's a one-request-per-URL setup.
I've read some other posts which have pointed to queueing systems, but these are beyond my knowledge. If this is the best way then is there a particular system which is worth taking a look at?
Any help would be appreciated.
Edit: there are 140 URLs at the moment, and this is likely to increase over time, so I'm looking for a solution that will scale without hitting any timeout limits.
I don't agree with your logic: if the script is running OK and just needs more time to finish, giving it more time is not a poor solution. What you are suggesting makes things more complicated and will not scale well as your URLs increase.
I would suggest moving your script to the command line, where there is no time limit, rather than using the browser to execute it.
When you have a list of unknown length which will take an unknown amount of time, asynchronous calls are the way to go.
Split your script into a single page download (like you proposed, download.php?id=X).
From the "main" script, get the list from the database, iterate over it, and send an Ajax call to the script for each entry. As all the calls will be fired at once, check your bandwidth and CPU time. You could break it into "X active tasks" using the success callback.
You can either have the download.php file return success data or save it to a database with the ID of the website and the result of the call. I recommend the latter, because you can then just leave the main script and grab the results at a later time.
You can't increase the time limit indefinitely, and you can't wait an indefinite time for the request to complete, so you need "fire and forget", and that's what asynchronous calls do best.
As @apokryfos pointed out, depending on the timing of this sort of "backup" you could fit it into a task scheduler (like cron). If you call it on demand, put it behind a GUI; if you call it "every X time", point a cron task at the main script; it will do the same.
What you are describing sounds like a job for the console. The browser is for the users to see, but your task is something that the programmer will run, so use the console. Or schedule the file to run with a cron job or anything similar that is handled by the developer.
Execute all the requests simultaneously using stream_socket_client(). Save all the socket handles in an array.
Then loop through the array of handles with stream_select() to read the responses.
It's almost like multi-tasking within PHP.
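A minimal sketch of that approach, assuming raw TCP endpoints (the helper names are illustrative, and real HTTP fetching would also need to write request lines before reading):

```php
<?php
// Open every connection without blocking, then multiplex the reads with
// stream_select(). Addresses look like 'tcp://example.com:80'.
function openSockets(array $addresses, float $timeout = 5.0): array
{
    $sockets = [];
    foreach ($addresses as $id => $addr) {
        $s = @stream_socket_client($addr, $errno, $errstr, $timeout,
            STREAM_CLIENT_CONNECT | STREAM_CLIENT_ASYNC_CONNECT);
        if ($s !== false) {
            stream_set_blocking($s, false);
            $sockets[$id] = $s;
        }
    }
    return $sockets;
}

function readAll(array $sockets, float $timeout = 10.0): array
{
    $responses = array_fill_keys(array_keys($sockets), '');
    while ($sockets) {
        $read = array_values($sockets);
        $write = $except = null;
        $ready = stream_select($read, $write, $except, (int)$timeout);
        if ($ready === false || $ready === 0) {
            break; // error or timeout: return what we have so far
        }
        foreach ($read as $r) {
            $id = array_search($r, $sockets, true);
            $chunk = fread($r, 8192);
            if ($chunk === '' || $chunk === false) { // readable + empty = EOF
                fclose($r);
                unset($sockets[$id]);
            } else {
                $responses[$id] .= $chunk;
            }
        }
    }
    return $responses;
}
```

All downloads proceed concurrently in the kernel while PHP pays for a single process, which is the "multi-tasking" the answer mentions; curl_multi_exec() is the higher-level equivalent for HTTP.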

Is it much more resource-demanding to load 100 mysql rows than to load only 5 of them?

I am doing a pagination system for about 100 items.
My question is:
Should I just load all 100 of them and then use jQuery to switch pages without reloading? Or should I use a MySQL query with "LIMIT 5" and then, each time user presses on Next Page or Previous Page, another Mysql query with LIMIT 5 is initiated?
For every item, I would have to load a thumbnail picture but I could keep it in the cache to avoid using my server bandwidth.
Which one is the best option from a server resource perspective?
Thanks in advance. Regards
Try connecting directly to your MySQL instance via the command-line interface. Execute the query with 100 at a time, and then with LIMIT 5. Look at the msec results. This will tell you which is more efficient or less resource-demanding.
100 records at a time from MySql (depending on dataset) really is nothing. The performance hit wouldn't be noticeable for a properly written query/database schema.
That said, I vote for calling only the results you need at a time. Use the LIMIT clause and your jquery pagination method to make it efficient.
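A minimal sketch of the LIMIT approach with PDO (the table and column names are made up for illustration):

```php
<?php
// Map a 1-based page number to a row offset.
function pageToOffset(int $page, int $perPage = 5): int {
    return max(0, ($page - 1) * $perPage);
}

// Fetch one page of items; 'items', 'id', 'title', 'thumbnail' are
// placeholder names for whatever your schema actually uses.
function fetchPage(PDO $db, int $page, int $perPage = 5): array {
    $stmt = $db->prepare(
        'SELECT id, title, thumbnail FROM items
         ORDER BY id LIMIT :limit OFFSET :offset'
    );
    $stmt->bindValue(':limit',  $perPage,                     PDO::PARAM_INT);
    $stmt->bindValue(':offset', pageToOffset($page, $perPage), PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

Large OFFSET values get slower because MySQL still scans the skipped rows, but at 100 items total that is irrelevant.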
For the server, the most efficient way would be to grab all 100 items once, send them to the client once, and have the client page through them locally. That's one possibly expensive query, but that's cheaper overall than having the client go back and forth for each additional five items.
Having said that, whether that's feasible is a different topic. You do not want to be pushing a huge amount of data to your client at once, since it'll slow down page loads and client-side processing. In fact, it's usually desirable to keep the bandwidth consumed by the client to a minimum. From that POV, making small AJAX requests with five results at a time when and only when necessary is much preferable. Unless even 100 results are so small overall that it doesn't make much of a difference.
Which one works best for you, you need to figure out.
Depends significantly on your query. If it is a simple SELECT from a well-designed table (indexes set, etc.), then unless you're running on a very underpowered server, there will be no noticeable difference between requesting 100 rows and 5 rows. If it is a complicated query, then you should probably limit the number of queries.
Other considerations include how long it takes to load a page, i.e., the actual round-trip time for the client to receive the data from the server. I'm going to make the wild guess that you are in America or Europe, where internet speeds are nice and fast; not the entire world is that lucky. Limiting the number of times your site has to request data from the server is a much better metric than how much load your server has.
This is moving rapidly into UX territory, but your users don't care about your server load; they don't care if this approach means your load average is 0.01 instead of 0.02. They will care if you have almost instantaneous transitions between sections of your site.
Personally, I'd go with the "load all data, then page locally" method. Also remember that Ajax is your friend: if you have to, load the results page first, then request the data. You can split the request in two: the first page, then the rest of the pages. There are a lot of behind-the-scenes tweaks you can do to make your site seem incredibly fast, and that is something people notice.
I'd say, load 5 at a time and paginate. My considerations:
It is indeed much lighter to load 5 at a time
Not all of your users will navigate through all 100, so what's loaded might never be used.
A slight load time between pages of 5 records is expected (i.e., most users won't complain just because they have to wait 500 ms - 1 s).
You can also give users the option to display X items per page, including an option to show all items on one page. Over time, you can monitor which number of items per page most of your users prefer, then make that the default LIMIT.

Strategies for rarely updated data

Background:
Two minutes before every hour, the server stops access to the site, returning a busy screen while it processes data received in the previous hour. This can take less than two minutes, in which case it sleeps until the two minutes are up; if it takes longer than two minutes, it runs as long as it needs to and then returns. The block flag is contained in its own table with one field and one value in that field.
Currently the user is only informed of the block when they try to perform an action (click a link, send a form, etc.). I was planning to update the code to automatically bring up a lightbox with the blocking message via the BlockUI jQuery plugin.
There are basically 2 methods I can see to achieve my aim:
Polling every N seconds (via PeriodicalUpdater or similar)
Long polling (Comet)
You can reduce server load for option 1 by checking the local time and starting the polling loop only when it gets close to the blocking time. This can be made more accurate by sending the local time to the server and returning the difference mod 60. It still leaves 100+ people querying the server, which causes an additional hit on the DB.
Option 2 is the more attractive choice. It removes the repeated hits on the web server but doesn't alleviate the repeated checks on the DB. However, option 2 is not an option for Apache 2.0 users like us, and even though we own our server, none of us are web admins and we don't want to break it; people pay real money to play, so if it isn't broke, don't fix it (hence why we are still running PHP4/MySQL3).
Because of the problems with option 2, we are back to option 1, which is sub-optimal.
So my question is really two-fold:
Are there any other possibilities I've missed?
Is long polling really such a problem at this size? I understand it doesn't scale, but I'm more concerned about the level at which it starves Apache of threads. Also, are there any options you can adjust in Apache so it scales slightly further?
Can't you just send to the page how much time is left before the server starts processing the previous hour's data? Let's say that when sending the HTML you record that the server will start processing in 1 minute; then create a JS timer that triggers after that 1 minute and shows the lightbox.
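Since the block always starts two minutes before the hour, the server can compute the countdown itself when rendering the page (a sketch; how the value is embedded and how the JS timer reads it are up to you):

```php
<?php
// Seconds until the next maintenance window, which starts two minutes
// before each hour (i.e. at second 3480 of every hour).
function secondsUntilBlock(int $now): int {
    $secondsIntoHour = $now % 3600;
    $blockStart = 3600 - 120;
    if ($secondsIntoHour < $blockStart) {
        return $blockStart - $secondsIntoHour;
    }
    return 0; // currently inside the window
}

// Embed this value in the page; a client-side timer counts it down
// and shows the lightbox when it reaches zero.
echo secondsUntilBlock(time());
```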
The alternative I see is to get it done faster, so there is less downtime from the users perspective. To do that I would use a distributed system to do the actual data processing behind the hourly update, such as Hadoop. Then use whichever method is most appropriate for that short downtime to update the page.

Generating scoreboards on large traffic sites

Bit of an odd question but I'm hoping someone can point me in the right direction. Basically I have two scenarios and I'd like to know which one is the best for my situation (a user checking a scoreboard on a high traffic site).
Top 10 is regenerated every time a user hits the page - increased load on the server, especially under high traffic, but the user sees his/her correct standing ASAP.
Top 10 is regenerated at a set interval, e.g. every 10 minutes - only one set of results is generated, causing one spike every 10 minutes rather than potentially one every x seconds, but if a user hits the page between refreshes they won't see their updated score.
Each one has its pros and cons; in your experience, which one would be best to use, or are there any magical alternatives?
EDIT - An update, after taking on board what everyone has said I've decided to rebuild this part of the application. Rather than dealing with the individual scores I'm dealing with the totals, this is then saved out to a separate table which sort of acts like a cached data source.
Thank you all for the great input.
Adding to Marcel's answer, I would suggest updating the scoreboards only upon write events (like a new score or a deleted score). This way you can keep static answers for popular queries like the Top 10. Use something like Memcached to keep data cached for requests; or, if you don't/can't install something like Memcached on your server, serialize common requests and write them to flat files, then delete/update them upon write events. Have your code look for the cached result (or file) first, and only if it's missing, run the query and create the data.
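The flat-file variant can be sketched like this (a sketch: the cache path and the shape of $computeTop10 are illustrative, and Memcached get/set/delete calls would replace the file functions one-for-one):

```php
<?php
// Write-through flat-file cache: the top-10 is recomputed only when a
// score is written, never on a read.
define('TOP10_CACHE', sys_get_temp_dir() . '/top10.json');

function readTop10(callable $computeTop10): array {
    if (is_file(TOP10_CACHE)) {
        return json_decode(file_get_contents(TOP10_CACHE), true);
    }
    $top = $computeTop10();                        // the expensive query
    file_put_contents(TOP10_CACHE, json_encode($top), LOCK_EX);
    return $top;
}

// Call this from every code path that adds or deletes a score.
function invalidateTop10(): void {
    @unlink(TOP10_CACHE);
}
```

Reads stay cheap no matter how hot the page is, and the expensive query runs only once per score change rather than once per page view.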
Nothing on the web ever truly needs to be real time. I would go with option 2; users will not notice that their score isn't changing instantly. You can use some JS to refresh the top 10 every time the cache has been cleared.
To add to Jordan's suggestion: I'd put the scorecards in a separate (HTML formatted) file, that is produced every time when new data arrives and only then. You can include this file in the PHP page containing the scorecard or even let a visitor's browser fetch it periodically using XMLHttpRequests (to save bandwidth). Users with JavaScript disabled or using a browser that doesn't support XMLHttpRequests (rare these days, but possible) will just see a static page.
The Drupal voting module will handle this for you, giving you an option of when to recalculate. If you're implementing it yourself, then caching the top 10 somewhere is a good idea - you can either regenerate it at regular intervals or you can invalidate the cache at certain points. You'd need to look at how often people are voting, how often that will cause the top 10 to change, how often the top 10 page is being viewed and the performance hit that regenerating it involves.
If you're not set on Drupal/MySQL then CouchDB would be useful here. You can create a view which calculates the top 10 data and it'll be cached until something happens which causes a recalculation to be necessary. You can also put in an http caching proxy inline to cache results for a set number of minutes.
