I am running an application (built on PHP and MySQL) on a VPS. I have an article table with millions of records in it. Whenever a user logs in, I display the last 50 records for each section.
So every time a user logs in or refreshes the page, an SQL query is executed to fetch those records. There are now lots of users on the website, and because of that my page speed has dropped significantly.
I did some research on caching and found that I can read the MySQL data by section and number of articles (e.g. section 1, 50 articles) and store it in a disk file at cache/md5(section no.).
Then, when I later get a request for that section, I just read the data from cache/md5(section no.).
The above solution looks great, but before I go ahead I would like to clarify a few doubts with the experts.
Will it really speed up my application? (I believe disk I/O is faster than a MySQL query, but I don't know by how much.)
I am currently using pagination on my page: I display the first 5 articles, and when the user clicks "display more" I display the next 5 articles, and so on. This is easily done in a MySQL query, but I have no idea how to do it if I store all 50 records in a cache file. If someone could share some info, that would be great.
Is there an alternative solution, if you believe the above will not work?
Is there an open-source application (PHP) for this, if you know of one?
Thank you in advance
Regards,
Raj
I ran into the same issue where every page load results in 2+ queries being run. Thankfully they're very similar queries being run over and over so caching (like your situation) is very helpful.
You have a couple options:
offload the database to a separate VPS on the same network to scale it up and down as needed
cache the data from each query and try to retrieve from the cache before hitting the database
In the end we chose both, installing Memcached and its PHP extension for query caching. Memcached is a key-value store (much like a PHP associative array) with an expiration time, in seconds, set for each stored value. Since it keeps everything in RAM, the cached data is volatile, but in exchange reads and writes are extremely fast, much faster than the filesystem.
Our implementation was basically to run every query through a filter: if it's a SELECT statement, cache it by setting the Memcached key to "namespace_[md5 of query]" and the value to a serialized version of an array with all resulting rows. Caching for 120 seconds (2 minutes) should be more than enough to help with the server load.
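As a rough illustration of that filter, here is a minimal sketch. It is written in Python rather than PHP, and it uses an in-memory dictionary as a stand-in for Memcached, so TTLCache, cache_key, cached_select and page_of are hypothetical names, not part of any Memcached client. The page_of helper also shows one way to keep the "display more" pagination from the question: once the 50 rows are cached, a page is just a slice of the cached list.

```python
import hashlib
import time


class TTLCache:
    """In-memory stand-in for Memcached: key -> (expiry timestamp, value)."""

    def __init__(self):
        self._store = {}

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(key)
        if entry is None or entry[0] <= now:
            self._store.pop(key, None)  # expired or missing
            return None
        return entry[1]

    def set(self, key, value, ttl, now=None):
        now = time.time() if now is None else now
        self._store[key] = (now + ttl, value)


def cache_key(query, namespace="articles"):
    """Build the "namespace_[md5 of query]" key described above."""
    return namespace + "_" + hashlib.md5(query.encode("utf-8")).hexdigest()


def cached_select(cache, run_query, query, ttl=120, now=None):
    """Serve SELECT results from the cache, hitting the database only on a miss."""
    if not query.lstrip().lower().startswith("select"):
        return run_query(query)  # non-SELECT statements are never cached
    key = cache_key(query)
    rows = cache.get(key, now)
    if rows is None:
        rows = run_query(query)  # run_query is an assumed callable that talks to MySQL
        cache.set(key, rows, ttl, now)
    return rows


def page_of(rows, page, per_page=5):
    """Return one page (0-based) of an already cached row list."""
    start = page * per_page
    return rows[start:start + per_page]
```

With this shape, the database is queried at most once per distinct SELECT per 120 seconds, however many users refresh the page in between.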
If Memecached isn't a viable solution, store all 50 articles for each section as an RSS feed. You can pull all articles at once, grabbing the content of each article with SimpleXML and wrapping it in your site's article template HTML, as per the site design. Once the data is there, use CSS styling to only display X articles, using JavaScript for pagination.
Since two processes modifying the same file at the same time would be a bad idea, have adding a new story to a section trigger an event, which would add the story to a message queue. That message queue would be processed by a worker which does two consecutive things, also using SimpleXML:
Remove the oldest story at the end of the XML file
Add a newer story given from the message queue to the top of the XML file
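The two worker steps above could look roughly like this. It is a sketch in Python using the standard xml.etree.ElementTree module instead of PHP's SimpleXML, and push_story and max_items are hypothetical names. It inserts the new story first and then trims, which gives the same end result as removing the oldest story and then adding the new one.

```python
import xml.etree.ElementTree as ET


def push_story(channel, title, max_items=50):
    """Put a new <item> ahead of the existing items and drop the oldest ones."""
    items = channel.findall("item")
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    # Insert directly before the current first item, or after the channel
    # metadata when the feed has no items yet.
    index = list(channel).index(items[0]) if items else len(channel)
    channel.insert(index, item)
    # Anything beyond max_items sits at the end of the feed, i.e. the oldest.
    for old in channel.findall("item")[max_items:]:
        channel.remove(old)
    return channel
```

The worker would call this once per message taken off the queue and then write the file back, so only one process ever modifies the feed.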
If you'd like, RSS feeds according to section can be a publicly facing feature.
I have an auction site that sometimes becomes heavily loaded & mostly mysql is seen to consume lot of memory & cpu. The situation i have is as below.
An ajax query is going to mysql every second for every user who is online & watching the auction to check the bid count against a previous value. If anyone places a bid, the count is different, so this ajax invokes one more ajax that retrieves records & displays in a table bids that are specific to the user who is watching / logged in. I'm limiting this to first 10 to reduce load.
However the problem is if there are 50 users online, & one of them places a bid, 50 queries go into mysql & all of them detect the bid count has changed & issue further queries to get records to display bids corresponding to each user.
The bigger problem is that if there are 500 users online, then 500 queries go into MySQL to detect a change, and if a bid is placed another 500 queries (one specific to each online user) go into MySQL and potentially crash the server.
Note: Currently there is a single mysql connection object used as a singleton in a php that is responsible for executing queries, retrieving records, etc.
I'm essentially looking for a solution where 500 queries don't go to MySQL when 500 users are online, but all of them still get an update even if only one of them places a bid on a particular auction. Any ideas or suggestions are highly welcome.
How can I best implement a solution for this scenario that reduces the load on MySQL?
Resource wise we are fairly ok, doing a VPS4 on Hostgator. The only problem is cpu / memory usage which is 95% when many users are placing bids.
Appreciate some suggestions
It sounds like you will want to take a look at memcached or some other caching service. You can have a process querying MySQL and updating it into memcached, and ajax making a query directly into memcached to retrieve the rows.
Memcached does not maintain relational consistency, but querying it consumes far fewer resources than querying MySQL every single time.
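To make the division of labour concrete, here is a small sketch of that arrangement in Python. A plain dictionary stands in for Memcached, and BidCache, refresh and load_count are hypothetical names. The point is only that one process touches the database while every poll reads the cached value.

```python
class BidCache:
    """Stand-in for Memcached holding the current bid count per auction."""

    def __init__(self, load_count):
        self._load_count = load_count  # assumed function that queries MySQL
        self._counts = {}
        self.db_reads = 0

    def refresh(self, auction_id):
        """Called by the single updater process, e.g. after a bid is placed."""
        self._counts[auction_id] = self._load_count(auction_id)
        self.db_reads += 1

    def bid_count(self, auction_id):
        """Called by every AJAX poll; never touches the database."""
        return self._counts.get(auction_id, 0)
```

With 500 users polling once a second, this turns 500 database queries per second into 500 cache reads plus one database read per bid.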
PHP has a very nice interface to work with memcached: Memcache
The website of the memcached project.
There are a few other caching services. You might also want to look at query caching in MySQL, but this would still need several connections into MySQL, which will be very resource consuming either way.
In the short-term, you could also just run the detailed query. It will return nothing when there's nothing to update (which replaces the first query!).
That might buy you some time for caching or deeper analysis of your query speed.
This is for a 1-on-1 live chat. I see two solutions:
1) I store every message in the database and, with jQuery's help, check every second whether there is a new message in the database. Of course I use a cache as well. If there is a new message, it is shown.
2) I store every message in one HTML file, and every second that file is loaded and displayed again through jQuery.
Which is better? Or is there a third option? And in general, which is better for this kind of project, MySQL or a file?
Thank you very much.
P.S. The most important question is: what is more efficient and what way will eat less resources!
Edit: And is it, nowadays, very bad for many chats (let's say 2,500 chats, which means 5,000 users) to use long polling and check every second through JavaScript whether the file was edited? I use methods very similar to this chat: http://css-tricks.com/jquery-php-chat/ Will it kill my hosting?
Everyone has given a wide range of opinions but I don't think anyone has really hit the nail on the head.
When it comes down to storing data, the amount of data, the rate it is to be accessed, and several other factors all determine what's the best storage platform.
Some people have suggested using memcached. Now although this is a valid answer (you can use it), I don't think that this is a good idea, solely based on the fact that memcached stores data within your server's memory.
Your memory is not for data storage, it's for use of the actual applications, operating system, shared libraries, etc.
Storing data within the memory can cause a lot of issues with other applications currently running. If you store too much data in your RAM your applications would not be able to complete operations assigned to them.
Although this is faster than a disk-based storage platform such as MySQL, it's not as reliable.
I would personally use MySQL as your storage engine server-side. This would reduce the amount of problems you would come across and also makes the data very manageable.
To speed up the responses to your clients I would look at running node on your server.
This is because it's event driven and non-blocking.
What does that mean?
Well, when Client A requests some data that is stored on the hard drive, traditionally PHP might say to the underlying C code, 'fetch me this chunk of data stored on this sector of the hard drive'. The C code would say 'OK, no problem', and while it goes off to get the information PHP would sit and wait for the data to be read and returned before it continues its operations, tying up that worker process (and any client queued behind it) in the meantime.
With node, it's slightly different. Node will say to the kernel, 'fetch me this chunk of information and when you're done, give me a call', and then it continues to take requests from other clients that may not need disk access.
So suddenly because we have assigned a callback to the kernel, we do not have to wait :), happy days.
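The same idea can be seen in a few lines. This sketch uses Python's asyncio rather than node (a deliberate swap, purely for illustration), and asyncio.sleep stands in for a disk or database read. fetch and main are hypothetical names.

```python
import asyncio


async def fetch(client, delay, log):
    """Simulate one client's request that has to wait on slow I/O."""
    log.append(f"{client} start")
    await asyncio.sleep(delay)  # stand-in for the disk read; the loop is free meanwhile
    log.append(f"{client} done")


async def main():
    log = []
    # Both requests are in flight at once on a single thread.
    await asyncio.gather(fetch("A", 0.2, log), fetch("B", 0.05, log))
    return log


log = asyncio.run(main())
print(log)  # ['A start', 'B start', 'B done', 'A done']
```

Client B is accepted and finished while client A is still waiting on its slower read, which is the behaviour described above.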
Take a look at this image:
This really could be the answer you're looking for. Please see the following for more detailed information on how node could be the right choice for you:
http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/
A fourth option, probably not what you want if you already have PHP code you want to use, but maybe the most efficient is to use a Javascript based server instead of php.
Node.js is easily capable of being a chat server and can store all the recent messages as a Javascript variable.
You can use long polling or other comet techniques so that you do not have to wait a second for messages to update.
Also, the event based architecture of a Javascript server means that there is no overhead for idling around waiting for messages.
It depends on the number of chats running at the same time. If it's for support and you expect an average load of 1 to 5 chat sessions at a time, then you don't need to worry too much. Just make sure that when there has been no activity for some time, you stop refreshing and show a message asking the user to click to resume the chat session.
If the visitors will chat with each other and you expect big number of sessions - 10-50 at the same time you can still use PHP + database. Just make sure you don't make redundant queries and your queries are cached correctly. To reduce load you can also deny chat script from being logged in web server:
SetEnvIf Request_URI "^/chat\.php$" dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog
Edit:
You can use a back-off schedule for the delay. For example, if you poll 2 times with a 1-second delay and get no data, increase the delay to 2 seconds. If you reach 10 polls with no response, increase the delay to 5 seconds. After 10 minutes you can pause the conversation, requiring users to click a button to resume the chat. That, combined with the advice above, should keep the load low enough to support many concurrent chats.
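That delay schedule is easy to get subtly wrong, so here is a minimal sketch of it in Python. next_delay and its two parameters are hypothetical names, and the thresholds are the ones given above; in practice this logic would live in the client-side polling script.

```python
def next_delay(empty_polls, idle_seconds):
    """Return seconds to wait before the next poll, or None to pause the chat.

    empty_polls  -- consecutive polls that returned no new messages
    idle_seconds -- time since the last message was seen
    """
    if idle_seconds >= 600:  # 10 minutes without activity: stop polling
        return None
    if empty_polls >= 10:
        return 5
    if empty_polls >= 2:
        return 2
    return 1
```

The counter would be reset to zero whenever a poll returns a message, so an active conversation drops straight back to 1-second polling.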
Edit2:
I suggest you find a Flash or Java solution and buy it. With 5,000-10,000 users you'd have to be a genius to make it work on a VPS, especially if it doesn't have much RAM. Not that it's impossible, but you could rent a cheaper VPS and use the rest of the money to buy a solution in Java or Flash (I don't know if Flash supports two-way connections; I'm not a Flash expert).
Note about number of users: if you have 10 000 users my guess is that you'll have not more than 100 chats at the same time. Go and look dating sites - they have not more than 10% of the users online and maybe most of them are doing something else and not chatting
3rd option: use Memcached. Reads and writes are served from memory, which is far faster than going to disk or the database, and that suits this kind of application well.
Store the chat messages in the database but use Memcached as a caching layer for the database reads. So the most popular reads (e.g. the last 20 messages in the chat room) will always be served straight out of memory.
This gives you the benefit of speed for the most frequent operations and persistent storage for all of the messages.
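A minimal sketch of that layering, in Python with in-process stand-ins: a list plays the role of the MySQL table and a bounded deque plays the role of the Memcached entry holding the last 20 messages. ChatRoom, post and latest are hypothetical names.

```python
from collections import deque


class ChatRoom:
    """Every message is persisted; only the most recent ones are kept hot."""

    def __init__(self, recent_size=20):
        self.database = []                       # stand-in for the MySQL table
        self.recent = deque(maxlen=recent_size)  # stand-in for the cached "last 20"

    def post(self, message):
        self.database.append(message)  # persistent write
        self.recent.append(message)    # oldest cached entry falls off automatically

    def latest(self):
        """The frequent read: served from memory, no database query."""
        return list(self.recent)
```

With a real Memcached the cached entry can be evicted at any time, so the read path would also need a fallback that reloads the last 20 rows from the database on a miss.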
Just to throw in another option... flat files could provide a less resource-hungry alternative.
Every chat is assigned a unique ID and a flat file stored for it. Every chat adds a line to this file. Each client machine then uses jquery to check ONLY the modified date of the file, to see if the chat has been updated.
While I would never normally recommend flat files over a database, I have a sneaky feeling that checking the modified date on a flat file would scale up better than the MySQL alternative.
I was intrigued so I did some tests and here are the results:
With an existing db connection, the number of "SELECT field FROM table LIMIT 0,1" that could be run in 1 second: ~ 4,000
Opening and closing a db connection, but running the same query: ~ 1,800
Checking the modified date on various different files: ~225,000
So to check if a conversation has been updated, storing the conversations in flat files and checking for the last modified date would easily be faster than doing anything with a database.
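For reference, the modified-date check itself is tiny. This is a sketch in Python (in PHP the equivalent call would be filemtime), and check_for_update and append_message are hypothetical names, with one flat file per chat ID assumed as described above.

```python
import os


def check_for_update(path, last_seen_ns):
    """Return (changed, mtime_ns) using only the file's modification time."""
    mtime_ns = os.stat(path).st_mtime_ns
    return mtime_ns > last_seen_ns, mtime_ns


def append_message(path, line):
    """Add one chat line to the conversation's flat file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
```

The client keeps the last mtime it saw and only downloads the file when the check reports a change. One caveat: on filesystems with coarse timestamp resolution, two writes within the same tick may look like a single update.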
In general, http connections are not very useful when it comes to pushing data to the client. Doing polls at every x seconds tend to be a resource hog on any server, given you have significant traffic.
You should try XMPP combined with BOSH. Luckily, most of the heavy work is already done for you. You can implement a pure jquery (or other js framework) based solution very quickly. Read this tutorial, it will help you a lot - not only solving your specific problem but, giving you a broader view on how to implement push technologies over the good ole' http.
Unless it's a small-audience script, between database and filesystem it's better to use a database.
P.S. Flash also makes a great platform for chat servers; you might want to look into that as well.
If you define a conversation as only two people, then polling every second means one read request per second per user, plus one write request every time somebody writes something (say every 10 seconds per user). That works out to about 2.2 requests per second, per conversation.
For 50 conversations, that's 100 users and 110 requests per second. That's a lot of load on a server for such a small number of conversations. Writing the conversation to JSON or XML would probably provide a more scalable solution.
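Worked through under the assumptions stated above (two users per conversation, one poll per second each, one write per user every 10 seconds), the per-conversation rate is 2.2 requests per second, which scales to 110 requests per second for 50 conversations. A small Python sketch for checking other inputs, where requests_per_second is a hypothetical helper:

```python
def requests_per_second(conversations, users_per_conversation=2,
                        poll_interval=1.0, write_interval=10.0):
    """Estimate total request rate from polling reads plus message writes."""
    users = conversations * users_per_conversation
    reads = users / poll_interval    # each user polls once per poll_interval
    writes = users / write_interval  # each user writes once per write_interval
    return reads + writes


print(requests_per_second(50))  # 110.0
```

Doubling the poll interval to 2 seconds nearly halves the load, which is why the polling interval is usually the first thing to tune.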
This article discusses the architecture of Meebo - long-polling, comet.
As an afterthought, have you considered installing an IM server like Jabber rather than starting from scratch?
You could always get the right tool for the job: an XMPP-compliant piece of software. For as poor as the documentation is, ejabberd is pretty alright. Because it closely follows the XMPP standard, you can use any XMPP client, e.g. http://code.google.com/p/ijab/. You can store all of it in an RDBMS if you like and provide functionality similar to what is offered in Gmail / Google Talk.
$0.02
A really fast alternative could be a NoSQL database like MongoDB:
MongoDB homepage
Some benchmarks
MongoDB's extension homepage on php.net
I don't use it, but you could try Photon, a very high-speed framework based on Mongrel2.
On the author's blog (in French) there is an example: 30 lines of code for a real-time chat server, with a video demonstration.
I think storing the data in the database is better. Please refer to the following link:
Script Tutorials Chat
Hi, I created a simple PHP/MySQL/Ajax chat application and I have a few questions. Before that, let me explain how it works.
If a user is on the chat page, the Ajax script sends a request to a PHP file that fetches the chat history (latest messages) and returns it as HTML. This request is repeated every second to show the latest messages to the user viewing the page.
So far it's been working great.
Now my questions and concerns are: 1.) What are the cons of using a method like this, if any? 2.) What should I worry about most if it gets a large user base and many people are using it simultaneously? (Mostly because it's making a request every second, for each user on it.)
The MySQL table is an InnoDB table, and I'm using only one SELECT statement without a WHERE clause, something like SELECT * FROM table ORDER BY id DESC LIMIT 10 (basically, I'm making MySQL do something very easy).
3.) Any suggestions are welcome ;)
thanks very much
vikash
Definitely, you will need to look at scalability issues for both the web server and database server. There are technologies such as MySQL clustering for improving performance on the database and web clustering for the HTTP side of things.
With large scale use you may also look at trimming down the table by removing early posts and dumping them to a separate table for low-frequency access. You could also have some method of caching the database requests via some worker threads so the database reads are minimal, but the front-end will have the ability to cope with the high volume of requests.
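Trimming the table could be done along these lines. This is a sketch in Python with the standard sqlite3 module standing in for MySQL; the table names chat and chat_archive, the keep_latest parameter, and the assumption of an auto-incrementing id column are all hypothetical.

```python
import sqlite3


def archive_old_posts(conn: sqlite3.Connection, keep_latest=1000):
    """Move all but the newest posts into the archive table; return rows moved."""
    newest = conn.execute("SELECT MAX(id) FROM chat").fetchone()[0]
    if newest is None:
        return 0  # nothing to archive in an empty table
    cutoff = newest - keep_latest
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO chat_archive SELECT * FROM chat WHERE id <= ?", (cutoff,)
        )
        deleted = conn.execute("DELETE FROM chat WHERE id <= ?", (cutoff,))
    return deleted.rowcount
```

Run periodically (from cron, for example), this keeps the hot table small so the per-second "latest messages" query stays cheap, while older posts remain available for low-frequency access.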
I got 60 people in phpFreeChat (php/ajax/mysql chat) and it was a complete processor hog. It brought an 8 core server to its knees.
I am looking to display 60,000 records on a webpage with php pulling the records from a mysql database on localhost. These 60,000 records may change depending on the data input.
The records have 5 text fields and due to the sheer number of records, a significant time is taken to send the data from the mysql server to the web browser. Even on a localhost, the time taking is around 15 seconds. During this time, the page is empty.
I would like to seek professional opinion on how to either:
1. display the data in an alternative method, (which I'm not sure what method) or
2. hasten the sending of data from mysql server to the web browser using caching technology like memcache.
In the end I will be deploying the application on the internet, where the lag would be unacceptable (i.e. > 15 seconds).
Thank you and Best Regards!
I would suggest trying AJAX pagination. No user will be able to see and analyze 60k records at one time. You can have the php display the first x (however many fit on the average screen or two) records to fill 2-3 pages, and have JavaScript listen for a scroll change. If a user starts scrolling down, have it automatically query the next y records, and add them to the display list. Possibly also removing the records from the top of the list.
Also, adding some quick-jump links or a search feature could help, as you wouldn't want to scroll down 60k records to make changes.
This will significantly lighten the server and client load, as it would only have to serve up a couple hundred records at a time.
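On the server side, each scroll-triggered request only needs to return one slice of the table. Here is a minimal sketch in Python, with the standard sqlite3 module standing in for MySQL (the LIMIT ... OFFSET syntax is the same in both); the records table, its columns and fetch_page are hypothetical.

```python
import sqlite3


def fetch_page(conn: sqlite3.Connection, page, per_page=100):
    """Return one page (0-based) of records, ordered by id."""
    offset = page * per_page
    cursor = conn.execute(
        "SELECT id, name FROM records ORDER BY id LIMIT ? OFFSET ?",
        (per_page, offset),
    )
    return cursor.fetchall()
```

The JavaScript scroll listener would request page 0 on load and then page 1, 2, and so on as the user scrolls; an empty result means the end of the list has been reached.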
DataTable
You should have a look at YUI's DataTable. You should hook the datatable up to autocomplete. There is also an example how they did it in YUI2(help) but YUI3 is a lot faster.
Caching
Caching is also important. You say you could use memcached, which is very good. I am a big fan of Redis (both will work, but I think Redis is better suited for autocomplete). There is even a free plan on Redis To Go.
Another important tip is to make sure you are getting your data as you want it displayed from the database. In other words if there is any calculation or processing that you have to do, avoid doing it in PHP code during loops. Use SQL functions to process data, name fields, etc. Databases are good at that sort of thing. Of course this may or may not apply to exactly what you're doing.