I am new to memcached and have just started using it. I have a few questions:
I have implemented Memcached in my PHP database class, where I am storing result sets (arrays) in memcache. Since this is for a website, say 4 users access the same page and trigger the same query: what would memcache do? As I understand it, for the first user it will fetch from the DB, and for the other 3 the system will use memcache. Is that right?
Do 4 users mean 4 memcache objects will be created, all using the same memory? Does the same apply to 2 different pages on the website, since both pages will be using
$obj = memcached->connect(parameter);
I have run a small test, but the results are strange: when I execute the query with normal mysql statements, the execution time is lower than when my code uses memcached. Why is that? If that's the case, why is it written everywhere that memcache is fast?
Please give an example of how to effectively test memcached execution time compared to a normal mysql_fetch_object.
Memcache does not work "automatically". It is only a key => value map. You need to determine how it is used and implement it.
The preferred method is:
A. Attempt to get from memcache
B. If A. failed, get from db, add to memcache
C. Return result
D. If you ever update that data, expire all associated keys
This will not prevent the same query executing on the db multiple times. If 2 users both get the same data at the same time, and everything is executed nearly at the same time as well, both attempts to fetch from memcache will fail and add to memcache. And that is usually ok.
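Steps A through D could be sketched in PHP roughly like this. It is a minimal sketch, not a definitive implementation: ArrayCache is a hypothetical in-memory stand-in so the snippet runs without a memcached server (with the real Memcached extension you would call get(), set() and delete() on a Memcached object instead), and $loader stands in for whatever runs your query.

```php
<?php
// Hypothetical stand-in for a memcached client so the sketch runs on its own.
class ArrayCache
{
    private $data = [];

    // Returns the cached value, or false on a miss (like Memcached::get).
    public function get($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set($key, $value)
    {
        $this->data[$key] = $value;
        return true;
    }

    public function delete($key)
    {
        unset($this->data[$key]);
        return true;
    }
}

// A-C: try the cache first, fall back to the database, then store the result.
function fetchCached($cache, $key, callable $loadFromDb)
{
    $rows = $cache->get($key);      // A. attempt to get from cache
    if ($rows === false) {
        $rows = $loadFromDb();      // B. miss: query the database...
        $cache->set($key, $rows);   //    ...and add the result to the cache
    }
    return $rows;                   // C. return result
}

// D: call this whenever the underlying data changes.
function invalidate($cache, $key)
{
    $cache->delete($key);
}

$cache = new ArrayCache();
$dbCalls = 0;
$loader = function () use (&$dbCalls) {
    $dbCalls++;                     // pretend this runs the SQL query
    return [['id' => 1, 'name' => 'example']];
};

fetchCached($cache, 'users:all', $loader);  // miss, goes to the "database"
fetchCached($cache, 'users:all', $loader);  // hit, served from cache
invalidate($cache, 'users:all');
fetchCached($cache, 'users:all', $loader);  // miss again after invalidation
echo $dbCalls, "\n";
```

Note that nothing here stops two simultaneous misses from both running the query, which matches the behaviour described above.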
In code, it will create as many connections as current users since it is run from php which gets executed for each user. You might also connect multiple times (if you're not careful with your code) so it could be way more times.
Many times, the biggest lag for both memcache AND sql is actually network latency. If sql is on the same machine and memcache on a different machine, you will likely see slower times for memcache.
Also, many frameworks/people do not correctly implement multi-get. So, if you have 100 ids and you get by id from memcache, it will do 100 single gets rather than 1 multi-get. That is a huge slow down.
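To make the difference concrete, here is a rough sketch that only counts round trips. CountingCache is a hypothetical in-memory stand-in, not a real client; its getMulti() is modelled on the shape of Memcached::getMulti (an array of keys in, an array of the found key => value pairs out).

```php
<?php
// Hypothetical in-memory stand-in that counts simulated network round trips.
class CountingCache
{
    public $roundTrips = 0;
    private $data = [];

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // One round trip per key.
    public function get($key)
    {
        $this->roundTrips++;
        return isset($this->data[$key]) ? $this->data[$key] : false;
    }

    // One round trip for the whole batch; only found keys are returned.
    public function getMulti(array $keys)
    {
        $this->roundTrips++;
        return array_intersect_key($this->data, array_flip($keys));
    }
}

$data = [];
$keys = [];
for ($id = 1; $id <= 100; $id++) {
    $keys[] = "user:$id";
    $data["user:$id"] = "row $id";
}

$single = new CountingCache($data);
foreach ($keys as $key) {
    $single->get($key);             // 100 separate requests
}

$multi = new CountingCache($data);
$found = $multi->getMulti($keys);   // 1 request for all 100 keys

echo $single->roundTrips, " vs ", $multi->roundTrips, "\n";
```

If each round trip costs a fraction of a millisecond of network latency, the first loop pays that cost 100 times and the second pays it once.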
Memcache is fast. SQL with query caching for simple selects is also fast. Typically, you use memcache when:
the queries you are running are complicated/slow
OR
it is more cost effective to use memcache than to have everyone hit the SQL server
OR
you have so many users that the database is not sufficient to keep up with the load
OR
you want to try out a technology because you think it's cool or nice to have on your resume.
You can use any variety of profiling software such as xdebug or phprof.
Alternatively, you can do this although less reliable due to other things happening on your server:
$start = microtime(true);
// do foo
echo microtime(true) - $start;
$start = microtime(true);
// do bar
echo microtime(true) - $start;
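A single measurement like that is noisy, so one option is to repeat each operation many times and compare averages. This is only a rough sketch: timeIt is a hypothetical helper, and the two closures are placeholders for your real memcached lookup and your real mysql query.

```php
<?php
// Hypothetical helper: average wall-clock seconds per call over $iterations runs.
function timeIt(callable $fn, $iterations = 1000)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

// Placeholders: swap in your real cache lookup and database query.
$fromCache = function () { return md5('foo'); };
$fromDb    = function () { return md5(str_repeat('bar', 100)); };

$cacheAvg = timeIt($fromCache);
$dbAvg    = timeIt($fromDb);
printf("cache: %.6f ms, db: %.6f ms\n", $cacheAvg * 1000, $dbAvg * 1000);
```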
You have two reasons to use memcache:
1 . Offload your database server
That is, if you have a high load on your database server because you keep querying the same thing over and over again and the internal mysql cache is not working as fast as expected. Or you might have write-performance issues that are clogging your server. In either case, memcache will help you offload mysql in a consistent and better way.
In the event that your MySQL server is NOT stressed, there may be no advantage to using memcached if it is mostly for performance gain. Memcached is still a server: you still have to connect to it and talk to it, so the network cost remains.
2 . Share data between users without relying on the database
In another scenario, you might want to share some data or state between users of your web application without relying on files or on a sql server. Using memcached, you can set a value from a user's perspective and load it from another user.
A good example of that would be chat logs between users: you don't want to store everything in a database because it causes a lot of writes and reads, you want to share the data, and you don't mind losing everything if an error occurs and the server restarts...
I hope my answer is satisfactory.
Good luck
Yes, that is right. Basically this is called caching and is unrelated to Memcached itself.
I do not fully understand the question. If all 4 users connect to the same memcached daemon, they will use shared memory, yes.
You have not given any code, so it is hard to tell. There can be many reasons, so I would not jump to conclusions with so little information given.
You need to measure your network traffic with deep packet inspection to effectively test and compare both execution times. I cannot give an example for that in this answer. You might be okay with just using microtime and logging whether the cache was hit (result was already in cache) or missed (not yet in cache, need to take from the database).
Related
I have searched for a few hours already but have found nothing on the subject.
I am developing a website that depends on a query to define the elements that must be loaded on the page. But to organize the data, I must repass the result of this query 4 times.
At first try, I started using mysql_data_seek so I could repass the query, but I started losing performance. Due to this, I tried exchanging the mysql_data_seek for putting the data in an array and running a foreach loop.
The performance didn't improve in any way I could measure, so I started wondering which is, in fact, the best option: building a rather big data array or executing mysql_fetch_array multiple times.
My application is currently running with PHP 5.2.17 and MySQL, and everything is on localhost. Unfortunately, I have a busy database, but I have never had any problems with the number of connections to it.
Is there a preferable way to do this? Is there any other option besides mysql_data_seek or the big data array? Does anyone have benchmark results for these options?
Thank you very much for your time.
The answer to your problem may lie in indexing appropriate fields in your database. Most databases also cache frequently served queries, but they do tend to discard them once the table they go over is altered (which makes sense).
So you could trust in your database to do what it does well: query for and retrieve data and help it by making sure there's little contention on the table and/or placing appropriate indexes. This in turn can however alter the performance of writes which may not be unimportant in your case, only you really can judge that. (indexes have to be calculated and kept).
The PHP extension you use will play a part as well. If speed is of the essence, 'upgrade' to mysqli or pdo and do a ->fetch_all() (fetchAll() in PDO), since it will cut down on communication between the php process and the database server. The only reason against this would be if the amount of data you query is so enormous that it halts or bogs down your php/webserver processes or even your whole server by forcing it into swap.
The table type you use can be of importance: certain types of queries seem to run faster on MyISAM as opposed to InnoDB. If you want to retool a bit then you could store this data (or a copy of it) in mysql's HEAP (MEMORY) engine, so just in memory. You'd need to be careful to synchronize it with a disk table on writes though if you want to keep altered data for sure (just in case of a server failure or shutdown).
Alternatively you could cache your data in something like memcache or by using apc_store, which should be very fast since it's in php process memory. The big caveat here is that APC generally has less memory available for storage though (the default being 32MB). Memcache's big advantage is that while still fast, it's distributed, so if you have multiple servers running they could share this data.
You could try a nosql database, preferably one that's just a key-store, not even a document store, such as redis.
And finally you could hardcode your values in your php script. Make sure to still use something like eaccelerator or APC, and verify whether you really need to use them 4 times or whether you can't just cache the output of whatever it is you actually create with them.
So I'm sorry I can't give you a ready-made answer but performance questions, when applicable, usually require a multi-pronged approach. :-|
I'm hoping to develop a LAMP application that will centre around a small table, probably less than 100 rows, maybe 5 fields per row. This table will need to have the data stored within accessed rapidly, maybe up to once a second per user (though this is the 'ideal', in practice, this could probably drop slightly). There will be a number of updates made to this table, but SELECTs will far outstrip UPDATES.
Available hardware isn't massively powerful (it'll be launched on a VPS with perhaps 512MB RAM) and it needs to be scalable - there may only be 10 concurrent users at launch, but this could rise to the thousands (and, as we all hope with these things, maybe 10,000s, but at this level there will be more powerful hardware available).
As such I was wondering if anyone could point me in the right direction for a starting point - all the data retrieved will be the same for all users, so I'm trying to investigate if there is any way of sharing this data across all users, rather than performing 10,000 identical selects a second. Soooo:
1) Would the mysql_query_cache cache these results and allow access to the data, WITHOUT requiring a re-select for each user?
2) (Apologies for how broad this question is, I'd appreciate even the briefest of responses greatly!) I've been looking into the APC cache as we already use this for an opcode cache - is there a method of caching the data in the APC cache, and just doing one MySQL select per second to update this cache - and then just accessing the APC for each user? Or perhaps an alternative cache?
Failing all of this, I may look into having a separate script which handles the queries and outputs the data, and somehow just piping this one script's data to all users. This isn't a fully formed thought and I'm not sure of the implementation, but perhaps a combo of AJAX to pull the outputted data from... "Somewhere"... :)
Once again, apologies for the breadth of these questions - a couple of brief pointers from anyone would be very, very greatly appreciated.
Thanks again in advance
If you're doing something like an AJAX chat which polls the server constantly, you may want to look at node.js instead, which keeps an open connection between server and browser. This way, you can have changes pushed to the user when they happen and you won't need to do all that redundant checking once per second. This can scale very well to thousands of users and is written in javascript on the server-side, so not too difficult.
The problem with using the MySQL cache is that the entire table cache gets invalidated on any write to that table. You're better off using a caching solution like memcached or APC if you're trying to control that behavior more precisely. And yes, APC would be able to cache that information.
One other thing to keep in mind is that you need to know when to invalidate the cache as well, so you don't have stale data.
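The "one select per second, shared by everyone" idea from the question could look roughly like this. It is a sketch under assumptions: TtlCache is a hypothetical stand-in so the snippet runs anywhere (with APC the equivalent calls would be apc_store() with a ttl and apc_fetch()), the key and row contents are made up, and the clock is passed in by hand so the behaviour is easy to follow.

```php
<?php
// Hypothetical stand-in for apc_store()/apc_fetch() with a time-to-live.
class TtlCache
{
    private $items = [];

    public function store($key, $value, $ttl, $now)
    {
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttl];
    }

    // Returns the value, or null if it is missing or has expired.
    public function fetch($key, $now)
    {
        if (!isset($this->items[$key]) || $this->items[$key]['expires'] <= $now) {
            return null;
        }
        return $this->items[$key]['value'];
    }
}

// Serve from cache; run the query at most once per $ttl seconds.
function getTable(TtlCache $cache, callable $query, $now, $ttl = 1)
{
    $rows = $cache->fetch('small_table', $now);
    if ($rows === null) {
        $rows = $query();
        $cache->store('small_table', $rows, $ttl, $now);
    }
    return $rows;
}

$cache = new TtlCache();
$selects = 0;
$query = function () use (&$selects) {
    $selects++;                      // pretend this is the MySQL SELECT
    return [['id' => 1, 'price' => 10]];
};

// Simulated clock: three requests within the same second, one a bit later.
getTable($cache, $query, 100.0);
getTable($cache, $query, 100.3);
getTable($cache, $query, 100.9);
getTable($cache, $query, 101.2);
echo $selects, "\n";
```

With a short ttl the data is at most that many seconds stale, which is one simple way of deciding when to invalidate.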
You can use APC, XCache or memcache for database query caching, or you can use Varnish or Squid for gateway caching...
This is something I am really curious about and I do not really understand how is that possible.
So let's say I am the owner of Facebook (haha) and I have millions of people visiting my website every day, thousands and thousands of images, videos, logs etc..
How do I store all this data?
Do I have more databases in different servers around the world and then I connect to them from a single location?
Do I use an internal API system that requests info from other servers where the data is stored?
For example I know that Facebook has a lot of data centers around the world and hundreds of servers..
How do they connect to these servers? Are the profiles stored in different locations and when I connect to my profile, I will then be using that specific server? Or is there one main server that has the support of other hundreds of servers around the world?
Is there a way to use PHP in a way that I will connect to different servers and to different mySQL (???) databases to store and retrieve data whenever I want?
Sorry if this looks like a silly question, but since I might one day work on a successful website, I really want to know what I will have to do, and what the logic behind it is.
Thank you very much.
I'll try to answer your (big) question, but not from Facebook's point of view, since their architecture is pretty much known.
First thing you have to know is that you would have to distribute the workload of your web application. Question is how, so in order to determine what's going to be slow, you have to divide your app in segments.
First up is the HTTP server, or the one that accepts all the requests. By going to "www.your-facebook.com", you're contacting a service on an IP. Naturally, you would probably have more than one IP but let's say you have a single entry point.
Now what happens? You have an HTTP server software, let's say Apache and it handles incoming connections. Since Apache creates a thread per connected user, it requires certain amount of memory for that operation. Eventually, it will run out of memory and then shit hits the fan, stuff stops working, your site is unavailable.
Therefore, you have to somehow scale this part of your application that connects your PHP code / MySQL db to people who want to interact with it.
Let's assume you successfully scaled your Apache and you have a cluster of computers which can accept new computers in order to scale-out. You solved your first problem.
Next part is the actual layer that does the work. Accepts input from the user and saves it somewhere (MySQL) and that's the biggest problem you'll have - why?
Due to the database.
Databases store their data on media such as hard drives. Hard drives, be it an SSD or a mechanical one, are limited by how fast they can write or retrieve data. By comparison, if I'm not mistaken, RAM operates at a transfer rate of around 6GB/sec, and its seek time is also much, much lower than an HDD's.
Therefore, if you have an X amount of users asking for a piece of information and you can only deliver it at a certain rate - your app crashes, or it becomes unresponsive and the layer handling database queries becomes slow since the hardware cannot match the speed at which you need the data.
What are the options here? There are many, I won't mention all of them
Split Reads and Writes. Set your database layer in such a way that you have dedicated machines that write the data and completely different ones that read it. You have to use replication and replication has its own quirks - it never works without breaking.
Optimize handling of your data set by sharding your data. Great for read / write performance, screwed up when you need to query multiple shards and merge the data.
Get better hardware, especially storage (such as FusionIO)
Pay for better storage engine (such as TokuDB)
Alleviate load on the database by using caching. The data that your users request probably doesn't change so often that you have to query the db every single time (say you're viewing someone's profile, what's the chance they'll change it every second?). That's why Facebook uses Memcached extensively - a system that stores small pieces of data in RAM, it's easily scalable and what not. Most important, it's damn quick!
Use different solutions next to MySQL. MySQL (and some other databases) aren't good for every type of data storage or retrieval. Someone mentioned NoSQL before. NoSQL solutions are quick, but still immature. They don't do as much as relational databases do. They use methods of delaying disk write (they keep cached copy of data they need to write in RAM) so that they can achieve fast insert rates. That's why it's not unusual to lose data when using NoSQL.
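As a rough illustration of the sharding option above, here is a minimal sketch that picks a shard from the user id. The host names are placeholders and the modulo scheme is just the simplest possible assumption; real setups often use something that makes adding shards less painful.

```php
<?php
// Hypothetical shard list; the host names are placeholders.
$shards = ['db0.example.com', 'db1.example.com', 'db2.example.com'];

// Map a user id to one shard so all of that user's rows live together.
function shardFor($userId, array $shards)
{
    return $shards[$userId % count($shards)];
}

// Group ids by shard: a query touching many users must visit each group's
// server and merge the results in application code.
function groupByShard(array $userIds, array $shards)
{
    $groups = [];
    foreach ($userIds as $id) {
        $groups[shardFor($id, $shards)][] = $id;
    }
    return $groups;
}

echo shardFor(42, $shards), "\n";
print_r(groupByShard([1, 2, 3, 4, 5, 6], $shards));
```

Reads and writes for one user only ever touch one server, which is the fast part; groupByShard() shows the awkward part, where one logical query turns into several queries plus a merge.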
Topic about MySQL vs "insert database or whatever here" is broad, I don't want to go into that but remember - every single one of data stores out there saves data on the hard drive eventually. The difference (physical of course) is how they optimize their flushing to the disk itself.
I also didn't mention various reports you can run by gathering the data (how many men between 19 and 21 have clicked an advert X between 01:15 and 13:37 CET and such) which is what Facebook is actually gathering (scary stuff!).
Third up - the language gluing the data store (MySQL) and output (HTTP server). PHP.
As you can see, most of the work here is already done by Apache and MySQL. Optimization on the PHP level is small, even Facebook got small results (they claim 50%, but that's UP TO 50%). I tried HipHop extensively, it is not as fast as it claims to be. Naturally, Facebook guys mentioned that already, so it's no wonder. The advantage they get is because they replaced Apache with their own server built into HipHop. Some people claim "language X is better than language Y" and they're right, but that's not always the case. Each language has its own advantages and disadvantages.
For example, PHP is widely-spread but it's slow for certain operations (implementing a Trie with over 1 billion entries for example). It's great for things like echo some HTML after parsing the output from the db. It's quick to insert and retrieve data from the database, and that's about 90% of the PHP usage - talk to the db, display the data, end.
Therefore, no matter what language you use (say we used C++ instead of PHP), your bottleneck will be the data storage / retrieval layer.
On the other hand, why is using C++ NOT handy? Because there are more people who know how to use PHP than ones who use C++. It's also MUCH slower to develop web apps in C++. Sure, they will execute faster, but who will notice the difference between 1 millisecond and 1 microsecond?
This post is more like an informative blog post, I know it's not filled with resources to back up my claims but anyone who did any work with larger data sets or websites will know that the P.I.T.A. is always the data storage component. Some things that I said probably won't fit with everyone, but in a NUTSHELL this is how you'd go about optimizing your site.
Unfortunately, your question doesn't have a simple answer. For the MySQL portion of it, you would need to investigate database scale-out. You can start looking at it here: http://www.mysql.com/why-mysql/scaleout/mixi.html. There are a number of different ways to set up Apache/PHP web sites across a server farm. One of them involves setting up round robin DNS. This is adding a DNS record with a number of different IP addresses. Your DNS then hands out a different IP address each time the record is requested so that the load is balanced across a number of servers. You can also set up clustering with MySQL, Apache and Heartbeat, but that is more of a high-availability solution than a scaling solution.
When you have a website with so many users you'll already have enough experience to know the answer of the question, you'll also have a lot of money to pay people to find the optimal architecture of your system.
I'm not saying that what I describe below is the Holy Grail, but it is certainly an option:
You will have a big, fragmented database with lots of backups and you'll have a few name servers which will know the location of servers and some rules about the data stored on each server. When data is searched the query will be sent to a name server which will find the server(s) where the answer can be found for the particular query. I've also upvoted N.B.'s answer, I think he is mostly right.
For lots of users, you should have a server with lots of memory and speed. Configure php.ini to allow more memory usage. A server with lots of users should have 4-12GB available. Also, save resources by closing the desktop environment. If you have this many users, you might want to consider a CDN and also make a database request queue.
Well, this is the thing. Let's say that my future PHP CMS needs to handle 500k visitors daily and I need to record them all in a MySQL database (referrer, ip address, time etc.). This way I need to insert 300-500 rows per minute and update 50 more. The main problem is that the script would call the database every time I want to insert a new row, which is every time someone hits a page.
My question, is there any way to locally cache incoming hits first (and what is the best solution for that apc, csv...?) and periodically send them to database every 10 minutes for example? Is this good solution and what is the best practice for this situation?
500k daily is just 5-7 queries per second. If each request is served in 0.2 sec, then you will have almost no simultaneous queries, so there is nothing to worry about.
Even if you have 5 times more users, all should work fine.
You can just use INSERT DELAYED and tune your mysql.
About tuning: http://www.day32.com/MySQL/ - there is a very useful script (it will change nothing, just show you tips on how to optimize settings).
You can use memcache or APC to write the log there first, but with INSERT DELAYED MySQL will do almost the same work, and will do it better :)
Do not use files for this. The DB will handle locks much better than PHP. It's not so trivial to write effective mutexes, so let the DB (or memcache, APC) do this work.
A frequently used solution:
You could implement a counter in memcached which you increment on each visit, and push an update to the database for every 100 (or 1000) hits.
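The batching logic could be sketched like this. BatchedCounter is a hypothetical class that keeps its count in a PHP variable so the snippet runs on its own; on a real site every request is a separate PHP run, so the pending count would have to live in memcached (for example via Memcached::increment) rather than in an object. The table name in the comment is made up.

```php
<?php
// Hypothetical hit counter: buffers hits and flushes to the database in batches.
class BatchedCounter
{
    private $pending = 0;
    private $batchSize;
    private $flush;

    public function __construct($batchSize, callable $flush)
    {
        $this->batchSize = $batchSize;
        $this->flush = $flush;
    }

    public function hit()
    {
        $this->pending++;            // with memcached this would be an increment
        if ($this->pending >= $this->batchSize) {
            call_user_func($this->flush, $this->pending);  // one write per batch
            $this->pending = 0;
        }
    }
}

$dbWrites = 0;
$total = 0;
$counter = new BatchedCounter(100, function ($count) use (&$dbWrites, &$total) {
    $dbWrites++;                     // pretend: UPDATE stats SET hits = hits + $count
    $total += $count;
});

for ($i = 0; $i < 250; $i++) {
    $counter->hit();
}
echo "$dbWrites writes, $total hits stored\n";
```

The trade-off is visible in the numbers: 250 hits cost only 2 database writes, but the last 50 hits are still sitting in the counter and would be lost if the cache were flushed before the next batch.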
We do this by storing locally on each server to CSV, then having a minutely cron job to push the entries into the database. This is to avoid needing a highly available MySQL database more than anything - the database should be able to cope with that volume of inserts without a problem.
Save them to a directory-based database (or flat file, depends) somewhere and at a certain time, use a PHP code to insert/update them into your MySQL database. Your php code can be executed periodically using Cron, so check if your server has Cron so that you can set the schedule for that, say every 10 minutes.
Have a look at this page: http://damonparker.org/blog/2006/05/10/php-cron-script-to-run-automated-jobs/. Some code has already been written and is ready for you to use :)
One way would be to use the Apache access.log. You can get quite fine-grained logging by using the cronolog utility with Apache. Cronolog will handle the storage of a very big number of rows in files, and can rotate them based on volume, day, year, etc. Using this utility will prevent your Apache from suffering from log writes.
Then as said by others, use a cron-based job to analyse these log and push whatever summarized or raw data you want in MySQL.
You may think of using a dedicated database (or even database server) for write-intensive jobs, with specific settings. For example you may not need InnoDB storage and could keep a simple MyISAM. And you could even think of another database storage (as said by @Riccardo Galli)
If you absolutely HAVE to log directly to MySQL, consider using two databases. One optimized for quick inserts, which means no keys other than possibly an auto_increment primary key. And another with keys on everything you'd be querying for, optimized for fast searches. A timed job would copy hits from the insert-only to the read-only database on a regular basis, and you end up with the best of both worlds. The only drawback is that your available statistics will only be as fresh as the previous "copy" run.
I have also previously seen a system which records the data into a flat file on the local disc on each web server (be careful to do only atomic appends if using multiple processes), and periodically writes it into the database asynchronously using a daemon process or cron job.
This appears to be the prevailing optimum solution; your web app remains available if the audit database is down and users don't suffer poor performance if the database is slow for any reason.
The only thing I can say is: be sure that you have monitoring on these locally-generated files - a build-up definitely indicates a problem and your Ops engineers might not otherwise notice.
For a high number of write operations and this kind of data you might find MongoDB or CouchDB more suitable
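A minimal sketch of that flat-file approach, under some assumptions: logHit and drainLog are hypothetical helper names, the hit fields and temp-file path are placeholders, and the $insert callback stands in for the real database insert. It relies on file_put_contents() with FILE_APPEND | LOCK_EX so concurrent PHP processes do not interleave their lines.

```php
<?php
// Append one hit as a single line; LOCK_EX keeps concurrent writers from interleaving.
function logHit($file, array $hit)
{
    $line = json_encode($hit) . "\n";
    return file_put_contents($file, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Run from cron: move the file aside, then hand every parsed row to $insert.
function drainLog($file, callable $insert)
{
    if (!file_exists($file)) {
        return 0;
    }
    $batch = $file . '.' . uniqid();
    rename($file, $batch);           // new hits go to a fresh file from now on
    $count = 0;
    foreach (file($batch, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        call_user_func($insert, json_decode($line, true));  // pretend: INSERT INTO hits ...
        $count++;
    }
    unlink($batch);
    return $count;
}

$file = sys_get_temp_dir() . '/hits_' . uniqid() . '.log';  // placeholder path
logHit($file, ['ip' => '203.0.113.5', 'referrer' => 'example.com', 'time' => 1]);
logHit($file, ['ip' => '203.0.113.9', 'referrer' => '', 'time' => 2]);

$rows = [];
$drained = drainLog($file, function ($row) use (&$rows) { $rows[] = $row; });
echo $drained, " rows loaded\n";
```

A production version would want to handle a writer that still has the old file open at the moment of the rename, and to keep the batch file around if the database insert fails.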
Because INSERT DELAYED is not supported by InnoDB (it works only with MyISAM, MEMORY, ARCHIVE, and BLACKHOLE tables), it is not an option for many users.
We use MySQL Proxy to defer the execution of queries matching a certain signature.
This will require a custom Lua script; example scripts are here, and some tutorials are here.
The script will implement a Queue data structure for storage of query strings, and pattern matching to determine what queries to defer. Once the queue reaches a certain size, or a certain amount of time has elapsed, or whatever event X occurs, the query queue is emptied as each query is sent to the server.
You can use a queue strategy using beanstalkd or IronMQ
Right now the setup for my javascript chat works so it's like
function getNewMessages()
{
    //code would go here to get new messages
    getNewMessages();
}
getNewMessages();
And within the function I would use jQuery to make a GET/POST request to retrieve the messages from a PHP script which would
1. Start SQL connection
2. Validate that it's a legit user through SQL
3. retrieve only new message since the last user visit
4. close SQL
This works great and the chat works perfectly. My concern is that this is opening and closing a LOT of SQL connections. It's quite fast, but I'd like to make a small javascript multiplayer game now, and transferring user coordinates as well as the tens of other variables 3 times a second in which I'm opening and closing the sql connection each time and pulling information from numerous tables each time might not be efficient enough to run smoothly, and might be too much strain on the server too.
Is there any better more efficient way of communicating all these variables that I should know about which isn't so hard on my server/database?
Don't use persistent connections unless it's the only solution available to you!
When MySQL detects the connection has been dropped, any temporary tables are dropped, any active transaction is rolled back, and any locked tables are unlocked. Persistent connections only drop when the Apache child exits, not when your script ends, even if the script crashes! You could inherit a connection in the middle of a transaction. Worse, other requests could block, waiting for those tables to unlock, which may take quite a long time.
Unless you have measured how long it takes to connect and identified it as a very large percentage of your script's run time, you should not consider using persistent connections. In fact, that should be what you do here, if you're worried about performance. Check out xhprof or xdebug, profile your code, then start optimizing.
Maybe try to use a different approach to get the new messages from the server: Comet.
Using this technique you do not have to open that many new connections.
http://www.php.net/manual/en/features.persistent-connections.php
and
http://www.php.net/manual/en/function.mysql-pconnect.php
A couple of dozen players at the same time won't hurt the database or cause noticeable lag if you have efficient SQL statements. Likely your database will be hosted on the same server or at least the same network as your game or site, so no worries. If your DB happens to be hosted on a separate server running an 8-bit 16MHz board loaded with MS-DOS, located in the remote Amazon, connected by radio waves hooked up to a crank-powered generator operated by a drunk monkey, you're on your own with this one.
Otherwise, really you should be more worried about exactly how much data you're passing back and forth to your players. If you're passing back and forth coordinates for all objects in an entire world, page load could take a painfully long time, even though the DB query takes a fraction of a second. This is sometimes overcome in games by a "fog of war" feature which doesn't bother notifying the user of every single object in the entire map, only those which are in immediate range of the player. This can easily be done with a single SQL query where object coordinates are in proximity to a player. Though, if you have a stingy host, they will care about the number of connects and queries.
If you're concerned about attracting even more players than that, consider exploring cache methods like pre-building short files storing commonly fetched records or values using fopen(), fgets(), fclose(), etc. Or, use php extensions like apc to store values in memory which persist from page load to page load. memcache or memcached also act similarly, but in a way which acts like a separate server you can connect to, store values which can be shared with other page hits, and query.
To update cached pages or values when you think they might become stale, you can run a cron job every so often to update these files or values. If your host doesn't allow cron jobs, consider making your guests do that legwork: a line of script on a certain page will refresh the cache with new values from a database query after a certain number of page hits. Or cache a date value to check against on every page hit, and if so much time has passed, refresh the cache.
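The "check how old the cache is on every page hit" variant could be sketched as below. It is only an illustration: cachedValue is a hypothetical helper, the temp-file path and the cached array are placeholders, and it uses the file's modification time as the cached date value instead of storing a separate timestamp.

```php
<?php
// Return cached data from $file, rebuilding it when older than $maxAge seconds.
function cachedValue($file, $maxAge, callable $rebuild)
{
    clearstatcache(true, $file);
    if (is_file($file) && (time() - filemtime($file)) < $maxAge) {
        return unserialize(file_get_contents($file));   // still fresh
    }
    $value = call_user_func($rebuild);                   // stale or missing
    file_put_contents($file, serialize($value), LOCK_EX);
    return $value;
}

$file = sys_get_temp_dir() . '/world_cache_' . uniqid() . '.dat';  // placeholder path
$queries = 0;
$rebuild = function () use (&$queries) {
    $queries++;                      // pretend this runs the database query
    return ['players' => 12];
};

cachedValue($file, 60, $rebuild);    // no file yet: runs the query
cachedValue($file, 60, $rebuild);    // fresh file: no query
touch($file, time() - 120);          // simulate the cache getting old
cachedValue($file, 60, $rebuild);    // stale: runs the query again
echo $queries, "\n";
unlink($file);
```

Whichever guest happens to arrive after the cache has gone stale pays for the refresh, which is the "make your guests do the legwork" idea.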
Again, unless you're under the oppressive thumb of a stingy host, or unless you're getting a hundred or more page hits at a time, no need to even be concerned about your database. Databases are not that fragile. If they crashed in a hysterical fit of tears anytime more than one query came their way, the engineers who made it wouldn't have a job for very long.
I know this is quite an annoying "answer" but perhaps you should be thinking about this a different way; after all, this is really not the strongest use of a relational database. Have you considered an XMPP solution? IMO this would be the best tool for the job and both ejabberd and Openfire are trivial to set up these days. The excellent Strophe library can make the front end story easy, and as an added bonus you get HTTP binding (like Comet) so you won't need to poll the server, your latency will go down and you'll be generating less HTTP traffic.
I know it's highly unlikely you're going to change your whole approach just cos I said so, but wanted to provide an alternative perspective.
http://www.ejabberd.im/
http://code.stanziq.com/strophe/