MySQL concurrent connections - loading text from database

If I have a single page whose contents are generated from a MySQL database (a simple query that displays the contents of one cell, which holds HTML), how many users can hit that page at once without crashing the database?
I have to display the same data across multiple domains, so instead of duplicating the page on three domains, I'm going to load it into MySQL and use this route to display it. However, I want to find out how many concurrent connections I could handle without crashing the database.
Can anyone point me in the right direction for finding this out? I'm assuming this small query shouldn't be a huge load, but if 10,000 visitors hit it at once, then what?

You need to check your max_connections setting; you can get more information about it from the following link: http://www.electrictoolbox.com/update-max-connections-mysql/
Are you using any framework? If so, most have caching built in, which should solve the issue, assuming the information isn't being updated on a moment-to-moment basis. Even if it is, try to cache whatever parts of the page you expect not to change that often.

"how many users can hit that page at once without crashing the database?"
All of them.
It's a bad question.
The database should not crash because users are trying to connect. There are lots of limits on how many PHP clients can open concurrent connections to the database. Really you should have the DBMS correctly configured so it handles the situation gracefully - i.e. max_connections, not the amount of memory available to the DBMS, should be what limits the number of clients.
If you mean how many concurrent connections can you support without connections being rejected / queued, that's something VERY different. And it's nearly a sensible question.
In that case, it's going to depend on the amount of memory you've got, how you've configured the DBMS, how fast the CPU is, what size the content is, how many content items there are, how PHP connects to the DBMS, how far away the PHP is from the DBMS....
You're more likely to run into connection limits on the webserver before you hit them on the DBMS - but the method for establishing what those limits are is the same for both: generate a controlled number of connections and measure how much memory they use, then repeat for a different number until you have lots of data. Draw a graph and see where the line crosses your resource limits.
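
The "draw a graph" step can be sketched as a straight-line fit that is extrapolated to the memory limit. This is only an illustration in Python: the sample measurements and the 4096 MB budget below are made-up placeholders, not real data, and real memory growth may not be linear.

```python
# Hypothetical measurements: (concurrent connections, resident memory in MB).
samples = [(10, 520), (50, 680), (100, 880), (200, 1280)]
MEMORY_LIMIT_MB = 4096  # assumed RAM budget for the DBMS


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(samples)
# Solve slope * connections + intercept = limit for connections.
max_connections = int((MEMORY_LIMIT_MB - intercept) / slope)
print(f"~{slope:.1f} MB per connection, baseline ~{intercept:.0f} MB")
print(f"memory limit reached near {max_connections} connections")
```

The same fit works for the webserver: swap in its own measurements and its own memory budget.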
"so instead of duplicating the page on three domains, I'm going to load it into mysql"
The number of domains is pretty much irrelevant - unless you're explicitly looking for a method of avoiding ESI.
"but if 10,000 visitors"
Really? On a typical site that would mean perhaps a million concurrent HTTP sessions.

Related

Handling big arrays in PHP

The application I am working on needs to fetch a dataset of at most around 10 MB twice an hour. We use that dataset to display paginated results on the site; a simple search by one of the object properties should also be possible.
Currently we are thinking about 2 different ways to implement this:
1.) Store the JSON dataset in the database or in a file on the file system, read it, and loop over it to display results whenever we need to.
2.) Store the JSON dataset in a relational MySQL table, query the results, and loop over them whenever we need to display them.
Replacing/refreshing the results has to be done multiple times per hour, as I said.
Both ways have cons. I am trying to choose the one that is less evil overall. Reading 10 MB into memory is not a lot; on the other hand, rewriting a table a few times an hour could produce conflicts, in my opinion.
My concern regarding 1.) is how safe the app will be if we read 10 MB into memory all the time. What will happen if multiple users do this at the same time? Is this something to worry about, or is PHP able to handle this in the background?
What do you think would be best for this use case?
Thanks!

When php runs on a web server (as it usually does) the server starts new php processes on demand when they're needed to handle concurrent requests. A powerful web server may allow fifty or so php processes. If each of them is handling this large data set, you'll need to have enough RAM for fifty copies. And, you'll need to load that data somehow for each new request. Reading 10mb from a file is not an overwhelming burden unless you have some sort of parsing to do. But it is a burden.
As it starts to handle each request, php offers a clean context to the programming environment. php is not good at maintaining in-RAM context from one request to the next. You may be able to figure out how to do it, but it's a dodgy solution. If you're running on a server that's shared with other web applications -- especially applications you don't trust -- you should not attempt to do this; the other applications will have access to your in-RAM data.
You can control the concurrent processes with Apache or nginx configuration settings, and restrict it to five or ten copies of php. But if you have a lot of incoming requests, those requests get serialized and they will slow down.
Will this application need to scale up? Will you eventually need a pool of web servers to handle all your requests? If so, the in-RAM solution looks worse.
Does your JSON data look like a big array of objects? Do most of the objects in that array have the same elements as each other? If so, it maps naturally onto a SQL table. You can make a table in which the columns correspond to the elements of your object. Then you can use SQL to avoid touching every row -- every element of each array -- every time you display or update data.
(The same sort of logic applies to Mongo, Redis, and other ways of storing your data.)
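
A minimal sketch of that table-based approach, in Python with SQLite standing in for MySQL so it runs without a server. The column names, the generated dataset and the helper names refresh and page are all hypothetical.

```python
import json
import sqlite3

# Hypothetical dataset: a JSON array of objects that share the same keys.
dataset = json.dumps(
    [{"id": i, "name": f"item-{i}", "category": "even" if i % 2 == 0 else "odd"}
     for i in range(1, 101)]
)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
db.execute("CREATE INDEX idx_items_category ON items (category)")


def refresh(raw_json):
    """Replace the table contents inside one transaction so the swap is all-or-nothing."""
    rows = [(o["id"], o["name"], o["category"]) for o in json.loads(raw_json)]
    with db:  # commits on success, rolls back on error
        db.execute("DELETE FROM items")
        db.executemany("INSERT INTO items (id, name, category) VALUES (?, ?, ?)", rows)


def page(category, page_number, page_size=10):
    """Return one page of matches; only the requested rows are fetched."""
    offset = (page_number - 1) * page_size
    return db.execute(
        "SELECT id, name FROM items WHERE category = ? ORDER BY id LIMIT ? OFFSET ?",
        (category, page_size, offset),
    ).fetchall()


refresh(dataset)
print(page("even", 2))  # ids 22, 24, ..., 40
```

Doing the refresh in a single transaction is one way to address the worry about conflicts while the table is rewritten; each request then reads ten rows instead of parsing the whole dataset.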

Running Query in Queue MYSQL

I want to know something really important for my website. Imagine I have thousands of users accessing the website: at that moment there would probably be thousands of queries running at the same time. Won't that fill up the DB pool? To overcome this, how can I run the MySQL queries in a queue?
Also, I am using InnoDB as the MySQL storage engine. Any other suggestions are also appreciated.

The problem does not exist.
Properly written queries can come and go in the blink of an eye.
You won't have "thousands" of users actually executing queries at the same time. The real metric is "pages per minute".
If you were to "queue" the queries, latency would be bad, and the users would move to some competing site.
If you do have performance problems, it is almost always because of a missing index, poorly phrased queries, and/or improper schema design.
Another, more appropriate approach to scaling is to use Replication to allow for multiple read-only Slaves, and to have multiple clients (on other servers) hitting those Slaves through a load-balancer. This is what Yahoo, for example, does to handle its traffic. And it needs only a handful of servers for each property (news, weather, sports, finance, etc). Are you at that scale yet?

The RDBMS handles query queueing internally. You can use caching systems to reduce response time, and NoSQL databases to scale your solution. On the other hand, if your intent is to throttle the visitors, you have to handle that on the application side.

I suggest you monitor a couple of status variables in MySQL:
SHOW GLOBAL STATUS LIKE 'Threads_%';
Threads_connected is the number of clients that are currently connected and occupying a thread in the MySQL server. Typically this is one per concurrent PHP request (if you're using PHP).
Threads_running is the number of clients that are connected and actually executing a query at that moment.
You might have hundreds of threads connected, but only a handful of threads running at any given time. This is because a PHP script runs code in between the SQL queries. During this in-between time, the MySQL connection is still open, but not doing anything in the MySQL server. This is normal.
You might have thousands of users, but since human beings typically view a web page for at least a few seconds before they click on something to request the next web page, there is some time in between page views where neither your web server nor your database server needs to do anything for a given user.
So it's typically true that the MySQL server can handle the traffic even when you have thousands of users, because only a tiny fraction of them are running a query simultaneously.
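
The argument above can be put into numbers with Little's law (average concurrency = arrival rate times time spent in the system). The sketch below is Python, and every workload figure in it is an assumed example, not a measurement.

```python
def expected_running_queries(users, think_time_s, queries_per_page, query_time_s):
    """Little's law: average concurrency = arrival rate * time in system."""
    page_views_per_second = users / think_time_s
    queries_per_second = page_views_per_second * queries_per_page
    return queries_per_second * query_time_s


# Assumed workload: 5,000 active users, one page view every 10 s,
# 4 queries per page, 2 ms per query.
running = expected_running_queries(5000, 10, 4, 0.002)
print(f"about {running:.1f} queries running at any instant")
```

Under those assumptions, thousands of users translate into only a handful of simultaneously running queries, which is what Threads_running would be expected to show.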

Optimizing ajax to reduce mysql server load

I have an auction site that sometimes becomes heavily loaded, and mostly it is MySQL that is seen to consume a lot of memory and CPU. The situation I have is as below.
An ajax query goes to MySQL every second for every user who is online and watching the auction, to check the bid count against a previous value. If anyone places a bid, the count is different, so this ajax call invokes one more ajax call that retrieves records and displays, in a table, the bids that are specific to the user who is watching / logged in. I'm limiting this to the first 10 to reduce load.
However the problem is if there are 50 users online, & one of them places a bid, 50 queries go into mysql & all of them detect the bid count has changed & issue further queries to get records to display bids corresponding to each user.
The bigger problem is that if there are 500 users online, then 500 queries go into MySQL to detect a change, and if a bid is placed another 500 queries (a query specific to each online user) go into MySQL and potentially crash the server.
Note: Currently there is a single MySQL connection object, used as a singleton in a PHP class, that is responsible for executing queries, retrieving records, etc.
I'm essentially looking for a solution where 500 queries don't go to MySQL if 500 users are online, but all of them still get an update even if only one of them places a bid for a particular auction. Any ideas / suggestions are highly welcome.
How can I best implement a solution for this scenario that reduces the load on MySQL?
Resource-wise we are fairly OK, on a VPS4 at Hostgator. The only problem is CPU / memory usage, which is at 95% when many users are placing bids.
Appreciate some suggestions

It sounds like you will want to take a look at memcached or some other caching service. You can have a process querying MySQL and updating it into memcached, and ajax making a query directly into memcached to retrieve the rows.
Memcached does not keep the relational consistency, and querying it is much less resource consuming than querying MySQL every single time.
PHP has a very nice interface to work with memcached: Memcache
The website of the memcached project.
There are a few other caching services. You might also want to look at query caching in MySQL, but this would still need several connections into MySQL, which will be very resource consuming either way.
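
A minimal sketch of that idea, in Python with SQLite standing in for MySQL and a plain dictionary standing in for memcached. The table layout, the key format and the function names are assumptions; here the bid itself refreshes the cached count, which is a simplification of the separate updating process described above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE bids (id INTEGER PRIMARY KEY, auction_id INTEGER, "
    "user_id INTEGER, amount REAL)"
)

cache = {}  # stand-in for memcached: key -> value


def place_bid(auction_id, user_id, amount):
    """Write the bid to the database, then refresh the cached count once."""
    with db:
        db.execute(
            "INSERT INTO bids (auction_id, user_id, amount) VALUES (?, ?, ?)",
            (auction_id, user_id, amount),
        )
    count = db.execute(
        "SELECT COUNT(*) FROM bids WHERE auction_id = ?", (auction_id,)
    ).fetchone()[0]
    cache[f"bid_count:{auction_id}"] = count


def poll_bid_count(auction_id):
    """What each watcher's once-per-second poll would call: a cache read only."""
    return cache.get(f"bid_count:{auction_id}", 0)


place_bid(7, user_id=1, amount=10.0)
place_bid(7, user_id=2, amount=12.5)
# 500 watchers polling: no SQL is executed in this loop.
counts = [poll_bid_count(7) for _ in range(500)]
print(counts[0])  # → 2
```

The database work now scales with the number of bids placed rather than with the number of people watching.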

In the short-term, you could also just run the detailed query. It will return nothing when there's nothing to update (which replaces the first query!).
That might buy you some time for caching or deeper analysis of your query speed.
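
One way to read this suggestion, sketched in Python with SQLite in place of MySQL: each client remembers the id of the newest bid it has seen and asks only for newer rows. The last_seen_id bookkeeping and the table layout are assumptions, not something the question describes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE bids (id INTEGER PRIMARY KEY, auction_id INTEGER, "
    "user_id INTEGER, amount REAL)"
)
db.executemany(
    "INSERT INTO bids (auction_id, user_id, amount) VALUES (?, ?, ?)",
    [(7, 1, 10.0), (7, 2, 12.5), (7, 1, 15.0)],
)
db.commit()


def new_bids(auction_id, last_seen_id, limit=10):
    """Return bids newer than last_seen_id; an empty list means nothing changed."""
    return db.execute(
        "SELECT id, user_id, amount FROM bids "
        "WHERE auction_id = ? AND id > ? ORDER BY id DESC LIMIT ?",
        (auction_id, last_seen_id, limit),
    ).fetchall()


print(new_bids(7, last_seen_id=2))  # → [(3, 1, 15.0)]
print(new_bids(7, last_seen_id=3))  # → []
```

An empty result plays the role of the old "count unchanged" check, so one request per poll is enough.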

What happens when my PHP website will start having a LOT of members?

This is something I am really curious about, and I do not really understand how it is possible.
So let's say I am the owner of Facebook (ahah) and I have millions of people visiting my website every day, thousands and thousands of images, videos, logs etc..
How do I store all this data?
Do I have more databases in different servers around the world and then I connect to them from a single location?
Do I use an internal API system that requests info from other servers where the data is stored?
For example I know that Facebook has a lot of data centers around the world and hundreds of servers..
How do they connect to these servers? Are the profiles stored in different locations and when I connect to my profile, I will then be using that specific server? Or is there one main server that has the support of other hundreds of servers around the world?
Is there a way to use PHP so that I can connect to different servers and to different MySQL (???) databases to store and retrieve data whenever I want?
Sorry if this looks like a silly question, but since I could one day end up working on a successful website, I really want to know what I will have to do, and what the logic behind it is.
Thank you very much.

I'll try to answer your (big) question but not from Facebook point of view since their architecture is pretty much known.
First thing you have to know is that you would have to distribute the workload of your web application. Question is how, so in order to determine what's going to be slow, you have to divide your app in segments.
First up is the HTTP server, or the one that accepts all the requests. By going to "www.your-facebook.com", you're contacting a service on an IP. Naturally, you would probably have more than one IP but let's say you have a single entry point.
Now what happens? You have HTTP server software, let's say Apache, and it handles incoming connections. Since Apache dedicates a process or thread (depending on how it is configured) to each open connection, it requires a certain amount of memory per connection. Eventually, it will run out of memory and then shit hits the fan: stuff stops working and your site is unavailable.
Therefore, you have to somehow scale this part of your application that connects your PHP code / MySQL db to people who want to interact with it.
Let's assume you successfully scaled your Apache and you have a cluster of computers which can accept new computers in order to scale-out. You solved your first problem.
Next part is the actual layer that does the work. Accepts input from the user and saves it somewhere (MySQL) and that's the biggest problem you'll have - why?
Due to the database.
Databases store their data on media such as hard drives. A drive, be it an SSD or a mechanical one, is limited by the rate at which it can write or retrieve data. For comparison, if I'm not mistaken, RAM operates at a transfer rate of around 6GB/sec, and its seek time is also much, much lower than an HDD's.
Therefore, if you have an X amount of users asking for a piece of information and you can only deliver it at a certain rate - your app crashes, or it becomes unresponsive and the layer handling database queries becomes slow since the hardware cannot match the speed at which you need the data.
What are the options here? There are many, and I won't mention all of them:
Split Reads and Writes. Set your database layer in such a way that you have dedicated machines that write the data and completely different ones that read it. You have to use replication and replication has its own quirks - it never works without breaking.
Optimize handling of your data set by sharding your data. Great for read / write performance, screwed up when you need to query multiple shards and merge the data.
Get better hardware, especially storage (such as FusionIO)
Pay for better storage engine (such as TokuDB)
Alleviate load on the database by using caching. The data that your users request probably doesn't change so often that you have to query the db every single time (say you're viewing someone's profile, what's the chance they'll change it every second?). That's why Facebook uses Memcached extensively - a system that stores small pieces of data in RAM, it's easily scalable and what not. Most important, it's damn quick!
Use different solutions next to MySQL. MySQL (and some other databases) aren't good for every type of data storage or retrieval. Someone mentioned NoSQL before. NoSQL solutions are quick, but still immature. They don't do as much as relational databases do. They use methods of delaying disk write (they keep cached copy of data they need to write in RAM) so that they can achieve fast insert rates. That's why it's not unusual to lose data when using NoSQL.
Topic about MySQL vs "insert database or whatever here" is broad, I don't want to go into that but remember - every single one of data stores out there saves data on the hard drive eventually. The difference (physical of course) is how they optimize their flushing to the disk itself.
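
To make the sharding option a little more concrete, here is a minimal Python sketch of routing users to shards with a stable hash. The shard names and the choice of CRC32 are assumptions for illustration only; real deployments often use lookup tables or consistent hashing instead.

```python
import zlib

# Hypothetical shard names; in practice each would be a separate database server.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]


def shard_for(user_id):
    """Map a user to a shard with a hash that is stable across processes and restarts."""
    key = str(user_id).encode("utf-8")
    return SHARDS[zlib.crc32(key) % len(SHARDS)]


def shards_for_query(user_ids):
    """A query spanning several users may have to visit several shards and merge the results."""
    return sorted({shard_for(u) for u in user_ids})


print(shard_for(42))                   # always the same shard for the same user
print(shards_for_query(range(1, 21)))  # a multi-user query can touch several shards
```

The second function shows where the pain mentioned above comes from: a single-user lookup touches one server, while anything that spans users has to fan out and merge.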
I also didn't mention various reports you can run by gathering the data (how many men between 19 and 21 have clicked an advert X between 01:15 and 13:37 CET and such) which is what Facebook is actually gathering (scary stuff!).
Third up - the language gluing the data store (MySQL) and output (HTTP server). PHP.
As you can see, most of the work here is already done by Apache and MySQL. The gain from optimizing at the PHP level is small; even Facebook got small results (they claim 50%, but that's UP TO 50%). I tried HipHop extensively, and it is not as fast as it claims to be. Naturally, the Facebook guys mentioned that already, so it's no wonder. The advantage they get is because they replaced Apache with their own server built into HipHop. Some people claim "language X is better than language Y" and they're right, but that's not always the case. Each language has its own advantages and disadvantages.
For example, PHP is widespread but it's slow for certain operations (implementing a Trie with over 1 billion entries, for example). It's great for things like echoing some HTML after parsing the output from the db. It's quick to insert and retrieve data from the database, and that's about 90% of PHP usage - talk to the db, display the data, end.
Therefore, no matter what language you use (say we used C++ instead of PHP), your bottleneck will be the data storage / retrieval layer.
On the other hand, why is using C++ NOT handy? Because there are more people who know how to use PHP than ones who use C++. It's also MUCH slower to develop web apps in C++. Sure, they will execute faster, but who will notice the difference between 1 millisecond and 1 microsecond?
This post is more like an informative blog post, I know it's not filled with resources to back up my claims but anyone who did any work with larger data sets or websites will know that the P.I.T.A. is always the data storage component. Some things that I said probably won't fit with everyone, but in a NUTSHELL this is how you'd go about optimizing your site.

Unfortunately, your question doesn't have a simple answer. For the MySQL portion of it, you would need to investigate database scale-out. You can start looking at it here: http://www.mysql.com/why-mysql/scaleout/mixi.html. There are a number of different ways to set up Apache/PHP web sites across a server farm. One of them involves setting up round robin DNS. This is adding a DNS record with a number of different IP addresses. Your DNS then hands out a different IP address each time the record is requested so that the load is balanced across a number of servers. You can also set up clustering with MySQL, Apache and Heartbeat, but that is more of a high-availability solution than a scaling solution.
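
The round robin idea can be sketched in a few lines of Python. This is a simplification under stated assumptions: the addresses are placeholders from the documentation range, and a real DNS server typically returns the whole record set in rotated order rather than a single address.

```python
from collections import Counter
from itertools import cycle

# Hypothetical web server addresses (192.0.2.0/24 is reserved for documentation).
SERVERS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
_rotation = cycle(SERVERS)


def resolve(_hostname):
    """Hand out the next address in the rotation, like a round-robin DNS answer."""
    return next(_rotation)


answers = [resolve("www.example.com") for _ in range(9)]
print(Counter(answers))  # each server receives 3 of the 9 lookups
```

The rotation spreads lookups evenly, but it knows nothing about server health or load, which is why it is usually combined with other measures.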

When you have a website with so many users you'll already have enough experience to know the answer of the question, you'll also have a lot of money to pay people to find the optimal architecture of your system.
I'm not saying that what I describe below is the Holy Grail, but it is certainly an option:
You will have a big, fragmented database with lots of backups and you'll have a few name servers which will know the location of servers and some rules about the data stored on each server. When data is searched the query will be sent to a name server which will find the server(s) where the answer can be found for the particular query. I've also upvoted N.B.'s answer, I think he is mostly right.

For lots of users, you should have a server with lots of memory and speed. Configure php.ini to allow more memory usage. A server with lots of users should have 4-12GB available. Also, save resources by closing the desktop environment. If you have this many users, you might want to consider a CDN and also make a database request queue.

How to make a javascript/php chatroom more efficient in terms of load time and sql communication

Right now the setup for my javascript chat works so it's like
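function getNewMessages()
{
    // "getMessages.php" is a placeholder name for the PHP script described below
    $.get("getMessages.php", function (messages) {
        // code would go here to display the new messages
        getNewMessages(); // poll again once the reply has arrived
    });
}
getNewMessages();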
And within the function I would use jQuery to make a GET request to retrieve the messages from a PHP script, which would
1. Start the SQL connection
2. Validate that it's a legit user through SQL
3. Retrieve only new messages since the user's last visit
4. Close the SQL connection
This works great and the chat works perfectly. My concern is that this is opening and closing a LOT of SQL connections. It's quite fast, but I'd like to make a small javascript multiplayer game now, and transferring user coordinates as well as the tens of other variables 3 times a second in which I'm opening and closing the sql connection each time and pulling information from numerous tables each time might not be efficient enough to run smoothly, and might be too much strain on the server too.
Is there any better more efficient way of communicating all these variables that I should know about which isn't so hard on my server/database?

Don't use persistent connections unless it's the only solution available to you!
When MySQL detects the connection has been dropped, any temporary tables are dropped, any active transaction is rolled back, and any locked tables are unlocked. Persistent connections only drop when the Apache child exits, not when your script ends, even if the script crashes! You could inherit a connection in the middle of a transaction. Worse, other requests could block, waiting for those tables to unlock, which may take quite a long time.
Unless you have measured how long it takes to connect and identified it as a very large percentage of your script's run time, you should not consider using persistent connections. In fact, that should be what you do here, if you're worried about performance. Check out xhprof or xdebug, profile your code, then start optimizing.

Maybe try to use a different approach to get the new messages from the server: Comet.
Using this technique you do not have to open that many new connections.

http://www.php.net/manual/en/features.persistent-connections.php
and
http://www.php.net/manual/en/function.mysql-pconnect.php

A couple of dozen players at the same time won't hurt the database or cause noticeable lag if you have efficient SQL statements. Likely your database will be hosted on the same server or at least the same network as your game or site, so no worries. If your DB happens to be hosted on a separate server running an 8-bit 16 MHz board loaded with MS-DOS, located in the remote Amazon, connected by radio waves hooked up to a crank-powered generator operated by a drunk monkey, you're on your own with this one.
Otherwise, really you should be more worried about exactly how much data you're passing back and forth to your players. If you're passing back and forth coordinates for all objects in an entire world, page load could take a painfully long time, even though the DB query takes a fraction of a second. This is sometimes overcome in games by a "fog of war" feature which doesn't bother notifying the user of every single object in the entire map, only those which are in immediate range of the player. This can easily be done with a single SQL query where object coordinates are in proximity to a player. Though, if you have a stingy host, they will care about the number of connects and queries.
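
The "fog of war" query could look roughly like the following sketch, written in Python with SQLite so it runs standalone. The objects table, its columns and the square view range are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER)")
db.execute("CREATE INDEX idx_objects_xy ON objects (x, y)")
db.executemany(
    "INSERT INTO objects (id, x, y) VALUES (?, ?, ?)",
    [(1, 10, 10), (2, 12, 9), (3, 50, 50), (4, 15, 15), (5, 16, 10)],
)
db.commit()


def visible_objects(player_x, player_y, view_range=5):
    """Return only the objects inside a square of side 2 * view_range around the player."""
    return db.execute(
        "SELECT id, x, y FROM objects "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY id",
        (player_x - view_range, player_x + view_range,
         player_y - view_range, player_y + view_range),
    ).fetchall()


print(visible_objects(10, 10))  # → [(1, 10, 10), (2, 12, 9), (4, 15, 15)]
```

A bounding box keeps the query simple and index-friendly; a circular range could be applied afterwards in application code if it matters.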
If you're concerned about attracting even more players than that, consider exploring cache methods like pre-building short files storing commonly fetched records or values using fopen(), fgets(), fclose(), etc. Or, use php extensions like apc to store values in memory which persist from page load to page load. memcache or memcached also act similarly, but in a way which acts like a separate server you can connect to, store values which can be shared with other page hits, and query.
To update cached pages or values when you think they might become stale, you can run a cron job every so often to update these files or values. If your host doesn't allow cron jobs, consider making your guests do that legwork: a line of script on a certain page will refresh the cache with new values from a database query after a certain number of page hits. Or cache a date value to check against on every page hit, and if so much time has passed, refresh the cache.
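
The "let your guests do the legwork" refresh can be sketched like this in Python. The thresholds, the state dictionary and load_from_database are placeholders; in PHP the state would have to live somewhere that survives between requests, such as a file, APC or memcached.

```python
import time

REFRESH_EVERY_HITS = 100     # assumed threshold
REFRESH_EVERY_SECONDS = 300  # assumed maximum age

state = {"value": None, "hits": 0, "loaded_at": None, "reloads": 0}


def load_from_database():
    """Placeholder for the real (expensive) database query."""
    state["reloads"] += 1
    return f"snapshot #{state['reloads']}"


def get_value(now=None):
    """Serve the cached value; reload after enough hits or once it is too old."""
    now = time.monotonic() if now is None else now
    stale = (
        state["loaded_at"] is None
        or state["hits"] >= REFRESH_EVERY_HITS
        or now - state["loaded_at"] >= REFRESH_EVERY_SECONDS
    )
    if stale:
        state["value"] = load_from_database()
        state["hits"] = 0
        state["loaded_at"] = now
    state["hits"] += 1
    return state["value"]


# 250 page hits at the same instant: reloads happen on hits 1, 101 and 201.
for _ in range(250):
    get_value(now=0.0)
print(state["reloads"])  # → 3
```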
Again, unless you're under the oppressive thumb of a stingy host, or unless you're getting a hundred or more page hits at a time, no need to even be concerned about your database. Databases are not that fragile. If they crashed in a hysterical fit of tears anytime more than one query came their way, the engineers who made it wouldn't have a job for very long.

I know this is quite an annoying "answer" but perhaps you should be thinking about this a different way; after all, this is really not the strongest use of a relational database. Have you considered an XMPP solution? IMO this would be the best tool for the job, and both ejabberd and openfire are trivial to set up these days. The excellent Strophe library can make the front end story easy, and as an added bonus you get HTTP binding (like Comet) so you won't need to poll the server, your latency will go down and you'll be generating less HTTP traffic.
I know it's highly unlikely you're going to change your whole approach just cos I said so, but wanted to provide an alternative perspective.
http://www.ejabberd.im/
http://code.stanziq.com/strophe/
