I'm programming a reddit-like website.
Users can display items from categories of their choice.
For this I'm querying a JOIN of the categories a user has subscribed to and the items.
It's a heavy query.
First solution: store the data on disk in a file like "categories_1-2-4-7-10.json" and serve it to users browsing the same categories.
Cons: takes up disk space, heavy load.
I'm now thinking about another solution: views. But I don't really know how they work. Do they regenerate often enough to be a heavy load on the server?
A view would let me query data that has already been JOINed.
Further: I'm only making a view for the frontpage items. I don't need to optimize later pages, as they're not accessed as frequently.
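For illustration, this is roughly the kind of view I have in mind (table and column names here are placeholders, not my real schema):

    <?php
    // Placeholder schema: subscriptions(user_id, category_id), items(id, category_id, created_at, ...).
    $pdo = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');

    // Define the pre-joined view once.
    $pdo->exec("
        CREATE OR REPLACE VIEW frontpage_items AS
        SELECT s.user_id, i.*
        FROM subscriptions s
        JOIN items i ON i.category_id = s.category_id
    ");

    // Fetching a user's frontpage is then a single query against the view.
    $userId = 123; // placeholder: the logged-in user
    $stmt = $pdo->prepare('SELECT * FROM frontpage_items WHERE user_id = ? ORDER BY created_at DESC LIMIT 25');
    $stmt->execute(array($userId));
    $items = $stmt->fetchAll(PDO::FETCH_ASSOC);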
It's a bad idea to store things to disk and then load them for a site. Disk operations are insanely slow compared to in-memory operations.
You can still store JSON documents, but consider storing them in a caching layer.
Something like Redis, which is the new hotness these days (http://redis.io/), or Couchbase (http://www.couchbase.com/).
Store everything in memory and the site will be much faster.
As far as how often to regenerate your views: a good idea is to give them an expiration time. Read about how that works with caching in general. You would set a category view to live in the cache for maybe 1 minute. After a minute the item leaves memory and you make a database query to put a newer version back in. Rinse and repeat.
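A minimal sketch of that pattern, assuming the phpredis extension and a 60-second expiry (the key scheme and the query are placeholders):

    <?php
    // Read-through cache: serve the JSON from Redis if present,
    // otherwise rebuild it from MySQL and store it for 60 seconds.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    $key  = 'frontpage:categories:1-2-4-7-10'; // placeholder: one entry per category combination
    $json = $redis->get($key);

    if ($json === false) {
        $pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');
        $stmt = $pdo->query(
            'SELECT * FROM items
             WHERE category_id IN (1, 2, 4, 7, 10)
             ORDER BY created_at DESC LIMIT 25'
        );
        $json = json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));

        $redis->setex($key, 60, $json); // expires after one minute
    }

    echo $json;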
Related
I've been working on a website lately and want to speed up my application.
I want to cache my users' pages, but the pages are dynamic: if someone posts a new feed item, the homepage is updated with it. If I cache the homepage for one user and a friend of his posts a new feed item, I want that cache to be expired, so that the next time he visits the homepage the application contacts the database, fetches the new feeds, and caches them again.
I'm using memcache, with PHP and MySQL for my DB.
I have tables called friends, feeds and users.
Would it be efficient to cache every user's friends, and when that user posts a feed item, have my app fetch his/her friends and cache a notification under their user IDs, so that when those friends log in, the app checks on every page whether there is a notification to act on (in this case, deleting the cached homepage)?
Profile your application and locate places where you access data that is expensive to fetch (or calculate). Those places are good places to start with memcached, unless you're doing more writes than reads (where you'd likely have to update the cache more often than you could make use of it).
Caching everything you ever access could well lead to nothing but a quite full memcached that holds mostly data that is rarely accessed (while potentially pushing out the things you actually should cache). In many cases you shouldn't use memcached as a 1:1 copy of your database in key-value form.
Before you even start server-side optimizations, you should run YSlow and try to get an A rating. Take a hard look at your JavaScript too. If you are using jQuery, then getting rid of it would vastly improve the overall performance of the site. Front-end optimization is usually much more important.
The next step would be cleaning up the server-side code. Try testing your SQL queries with EXPLAIN. See if you are missing some indexes. And then do some profiling on the PHP side with Xdebug. See where the bottlenecks are.
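For example (the query here is only illustrative), prefixing a suspect query with EXPLAIN shows whether an index is used:

    <?php
    // Dump the execution plan for a query you think is slow.
    $pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');
    $plan = $pdo->query('EXPLAIN SELECT * FROM feeds WHERE user_id = 42 ORDER BY created_at DESC');

    foreach ($plan as $row) {
        // key = NULL means no index was used; a big 'rows' estimate
        // usually means a full table scan, i.e. a candidate for a new index.
        print_r($row);
    }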
And only then start messing with caching. As for Memcached, unless your website runs on top of a cluster of servers, you do not need it. Hell ... it might even be harmful. If your site is located on a single box, you will get much better results with APC, which, unlike Memcached, is not distributed by nature.
Write a class that handles all the DB queries, caches the tables, and runs the queries against the cached tables instead of your DB. Update your cache each time you do an INSERT or UPDATE on a table.
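A rough sketch of that idea (class and method names are made up; here the cached copy only lives for the current request, but the same shape works with APC or memcache behind it):

    <?php
    // All reads go through an in-memory copy of the table,
    // and every write invalidates that copy.
    class CachedTable
    {
        private $pdo;
        private $table;
        private $rows = null;

        public function __construct(PDO $pdo, $table)
        {
            $this->pdo   = $pdo;
            $this->table = $table;
        }

        public function all()
        {
            if ($this->rows === null) {
                $stmt = $this->pdo->query("SELECT * FROM `{$this->table}`");
                $this->rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
            }
            return $this->rows;
        }

        public function insert(array $data)
        {
            $cols  = implode(', ', array_keys($data));
            $marks = implode(', ', array_fill(0, count($data), '?'));
            $stmt  = $this->pdo->prepare("INSERT INTO `{$this->table}` ($cols) VALUES ($marks)");
            $stmt->execute(array_values($data));
            $this->rows = null; // invalidate the cached copy after a write
        }
    }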
I'm concerned about my page loading speed; I know there are a lot of factors that affect page loading time.
Is retrieving records (categories) from an array instead of the DB faster?
It is faster to keep it all in PHP until you have an absurd number of records and use up your RAM.
BUT, both of these things are super fast. Selecting a handful of records from a single table that has an index should take less than a millisecond. Are you sure you know the source of your web page's slowness?
I would be a little bit cautious about having your data in your code. It will make your system less maintainable. How will users change categories?
This gets back to deciding whether you want your site static or dynamic.
Yes, of course retrieving data from an array is much faster than retrieving data from a database, but usually arrays and databases have totally different use cases: data in an array is static (you type the values in code or in a separate file and can't modify them), while data in a database is dynamic.
Yes, it's probably faster to have an array of your categories directly in your PHP script, especially if you need all the categories on every page load. This makes it possible for APC to cache the array (if you have APC running), and it also lessens the traffic to/from the database.
But is this where your bottleneck is? It seems to me that the categories should already be sitting in the query cache and therefore be cheap to retrieve. If this is not your biggest bottleneck, chances are you won't see any decrease in loading times. Make sure to profile your application to find the large bottlenecks, or you will waste your time on getting only small performance gains.
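For example, the categories could simply live in a small PHP file (name and values are invented); with APC running, the file is compiled once and then served from the opcode cache:

    <?php
    // categories.php: hand-maintained (or generated) list of categories.
    return array(
        1 => 'News',
        2 => 'Programming',
        3 => 'Gaming',
        4 => 'Music',
    );

A page would then just do $categories = include 'categories.php'; instead of querying the database.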
If you store categories in a database, you have to connect to the database, prepare a SQL statement, send it to the server, fetch the result set, and (probably) store the results in an array. (But you'll probably already have a connection to the database anyway, and hardware and software are designed to do this kind of work quickly.)
Storing and retrieving categories from a database trades speed for maintenance. Your data is always up to date; it might take a little longer to get it.
You can also store categories as constants or as literals in an array assignment. It would be smart to generate the constants or the array literals from data stored in the database, but you might not have to do that for every page load. If "categories" doesn't change much, you might be able to get away with generating the code once or twice a day, plus whenever someone adds a category. It depends on your application.
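A sketch of that generation step (paths and table names are just an example): a small script, run from cron once or twice a day and whenever a category is added, that writes the array literal out as PHP.

    <?php
    // regenerate_categories.php: rebuilds categories.php from the database.
    $pdo  = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'pass');
    $rows = $pdo->query('SELECT id, name FROM categories')->fetchAll(PDO::FETCH_KEY_PAIR);

    $code = "<?php\nreturn " . var_export($rows, true) . ";\n";
    file_put_contents('/path/to/app/categories.php', $code, LOCK_EX);

    // Pages then load the literal with: $categories = include '/path/to/app/categories.php';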
Storing and retrieving categories from an array trades maintenance for speed. Your data loads a little faster; it might be incomplete.
The unsatisfying answer is that you're not going to be able to tell how different storage and page generation strategies affect page loading speed until you test them. And even testing isn't that easy, because the effect of changing server and database parameters can be, umm, surprising.
(You can also generate static pages from the database using php. I suggest you test some static pages to give you an idea of "best case" performance.)
Suppose you are the proud owner of Facebook:
which data would you store in the app layer [memcached/APC] and which data in the MySQL cache?
Please also explain why you think so.
[I want to get an idea of which data to cache where.]
For memcache, store session data. You typically have to query a large table or hit the filesystem to get it, depending on how it's stored. Putting that in memory avoids hitting the disk for a relatively small amount of data (that is typically critical to one's web application).
For your database cache, put stuff in there that doesn't change very often. We're talking about wall posts, comments, etc. They are queried a lot and rarely change, all things considered. You may also want to consider a flat-file cache, so you can purge individual files with greater ease and divide it up as you see fit.
I generally don't directly cache any arbitrary data with APC; usually I just let it cache stuff automatically and benefit from the lighter load.
This is only one way to do it, but as far as the industry goes, this is a somewhat well-used model.
I need help finding the right caching solution for a client's site. The current site is CentOS, PHP, MySQL and Apache, using Smarty templates (I know they suck, but it was built by someone else). The current models/methods use a fairly good OO structure, but there are WAY too many queries being done for some of the simple page functions. I'm looking to find some sort of caching solution, but I'm a noob when it comes to this and don't know what is available that would fit the current site setup.
It is an auction type site with say 10 auctions displayed on one page at one time -- the time and current bid on each auction being updated via an ajax call returning json every 1 second (it's a penny auction site like beezid.com so updates every second are necessary). As you can see, if the site gets any sort of traffic the number of simultaneous requests could be huge. Obviously this data changes every second because the json data returned has the updated time left in the auction, and possibly updated bid amounts and bid users for each auction.
What I want is the ability to cache certain pages for a given amount of time or based on some other variable changing. For example, memory-caching the page that displays 10 auctions and only updating that cached copy when one of the auctions ends. Or even the script above that returns JSON string data every second: if I were able to cache the first request to this page in memory, serve the following requests from memory, and then re-cache it again after 1 second, that could potentially reduce the server load a lot. But I don't know if this is even possible, or if the overhead of doing something like this outweighs any request-load savings.
I looked into XCache a bit, but I couldn't find a way to set a particular cache time on a specific page, or based on other variables. Maybe I missed something, but does anyone have a recommendation for a caching scheme that would work for these requirements?
Mucho thanks for any input you might have...
Caching can be done using many methods. Memcached springs to mind as being suited to your task, but if the site is ultra busy you may run out of RAM.
When I do caching I often use a simple file cache; while it does involve at least one stat call to determine the freshness of the cached content, it is still fast and marginally better than calling an SQL server.
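Roughly the pattern I mean (paths, lifetimes and function names are only illustrative):

    <?php
    // Simple file cache: one stat call (filemtime) decides whether the
    // cached copy is still fresh; otherwise the caller rebuilds it.
    function file_cache_get($key, $maxAge = 60) {
        $file = '/var/cache/myapp/' . md5($key) . '.cache';
        if (is_file($file) && (time() - filemtime($file)) < $maxAge) {
            return unserialize(file_get_contents($file));
        }
        return false;
    }

    function file_cache_set($key, $value) {
        $file = '/var/cache/myapp/' . md5($key) . '.cache';
        file_put_contents($file, serialize($value), LOCK_EX);
    }

    // Usage: try the cache first, fall back to the expensive SQL only on a miss.
    if (($auctions = file_cache_get('front_auctions', 1)) === false) {
        $auctions = fetch_auctions_from_db(); // hypothetical helper that runs your existing queries
        file_cache_set('front_auctions', $auctions);
    }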
If you must call an SQL server, then it may pay to use a MEMORY (HEAP) table to store much of the precomputed data. This technique is no more efficient than memcached, probably less so, but it saves you installing memcached.
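For the MEMORY (HEAP) table idea, something along these lines (table and column names are an example):

    <?php
    // Keep a small, precomputed per-auction snapshot in a MEMORY table,
    // refreshed by whatever process registers bids.
    $pdo = new PDO('mysql:host=localhost;dbname=auctions', 'user', 'pass');

    $pdo->exec("
        CREATE TABLE IF NOT EXISTS auction_snapshot (
            auction_id  INT PRIMARY KEY,
            current_bid DECIMAL(10,2),
            high_bidder VARCHAR(64),
            ends_at     DATETIME
        ) ENGINE=MEMORY
    ");

    // The 1-second AJAX endpoint reads this tiny in-memory table instead of
    // recomputing the state from the big transactional tables on every hit.
    $stmt = $pdo->query('SELECT * FROM auction_snapshot WHERE ends_at > NOW()');
    echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));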
Zend_Cache can do what you want, and a lot more. It supports a lot of backends, including xcache and memcache, and allows you to cache data, full pages, partial pages, and well, just about anything you can imagine :p.
And in case you are wondering: you can use the Zend_Cache component by itself; you don't have to use the complete Zend Framework for your application.
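A minimal sketch with the File backend (the options are illustrative; swapping in the Xcache or Memcached backend is just a different factory argument):

    <?php
    require_once 'Zend/Cache.php';

    $cache = Zend_Cache::factory(
        'Core',
        'File',
        array('lifetime' => 1, 'automatic_serialization' => true), // 1-second lifetime, per the question
        array('cache_dir' => '/tmp/cache/')
    );

    // Serve the auction JSON from the cache; rebuild it at most once per second.
    if (($json = $cache->load('auction_feed')) === false) {
        $json = build_auction_json(); // hypothetical: your existing code that queries MySQL
        $cache->save($json, 'auction_feed');
    }
    echo $json;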
I have a social network.
The users table is around 60,000 rows.
The friends table is around 1 million rows (used to determine who your friends are).
I want to do a friend feed, wall, whatever you like to call it. It will show things like user status posts (Twitter-type posts); it will show a few different items, but for a start it will just be friend statuses and maybe blog posts.
Basically, you will only see content published by a user ID that is in your friend list.
I have been trying to come up with the best way to do this and haven't gotten very far, but here is my latest idea.
Currently, to build this feed, I have to:
1. Get the list of friend IDs from the large friends table
2. Get the stream data for those friend IDs from the above result
3. JOIN the users table to get the publisher's picture URL and username
4. JOIN the comments table to get comments posted on the feed items
That is one big task to build that feed.
I have 3 ideas so far; this is where your help can come in.
Memcache Option:
Use memcache and cache a user's friend list as an array when the user logs into the site; also, when the user approves a new friend request for a friend to be added to their list, it would rebuild their cache.
In addition to just getting their friends, I could save their friends' picture URLs and usernames; this would speed things up again by eliminating that query when building the friend feed.
File cache Option:
Do the same as the memcache option does, but save this data as an array to a cache file instead of memory, then include this cache file into the page.
I am not sure which is the best method for performance. I understand memcache stores everything in memory, so users that have, like, 20,000 friends could use a lot of memory, whereas a file cache would only put the data in memory when the user needs it, if I am correct. Also, if I did the file method, when a user logs out of the site I would delete their cache file, so the cache folder would never grow too large.
Session cache Option:
Same as the file cache above. I just realized that session data is saved into a file, so wouldn't that make it capable of being used as a cache?
Please give me your opinions or any advice or info you have on this, as I don't have much knowledge of caching. I have read a lot, but sometimes other people's ideas help a lot.
Memcache is your best bet for a lot of reasons:
It's REALLY fast - Everything's in memory, and it's highly optimized for situations just like yours (and caching in general :)
It's distributed - This means that if you have multiple web / app servers running, they can all access the same cache
You can pool multiple servers for memcache - If you've got a few servers that are relatively underutilized (or several dedicated cache servers), you can pool them all together into one big cache
It's super-scalable (for the reasons mentioned prior)
It's got great PHP support - The PECL package for memcache was recently updated with a lot of new goodness
You can even store your user sessions in memcache - just set it up in your php.ini file (see the snippet after this list). This is much faster than storing sessions in databases, and allows your sessions to persist across multiple web hosts (if you're in a load balanced situation)... this will also give your site a bit of a performance boost, as there's no need to hit the filesystem / database for session info on every request.
... and many more ;)
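On the session point above, the configuration is just two settings, either in php.ini or via ini_set() before session_start() (host and port are placeholders, and this assumes the PECL memcache extension is installed):

    <?php
    // Keep PHP sessions in memcache instead of files or the database.
    ini_set('session.save_handler', 'memcache');
    ini_set('session.save_path', 'tcp://127.0.0.1:11211');

    session_start();
    $_SESSION['user_id'] = 42; // now stored in memcache, visible to every web server in the pool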
As to your concerns about the memory footprint of individual cached items, you've got a few options. My initial thought is to just give it a whirl and see how big these cache items really get (you can find several open-source tools to monitor the actual cache usage, such as Cacti). I think they'll be smaller than you'd think.
If they're not, I'd suggest re-thinking your cache strategy as far as what you actually cache, for how long, etc. Maybe you could build the feed from several things already in the cache (i.e. cache individual user data, and then build the feed for a person from all those individual items in cache). There are a lot of good articles out there on that front, just search 'em out :)
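For example (the key scheme and helper names are invented), you could cache each friend's picture URL and username under its own key and pull them all back with one multi-get when assembling the feed:

    <?php
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);

    $friendIds = array(17, 42, 99); // placeholder: taken from the friend-list cache built at login

    $keys = array();
    foreach ($friendIds as $id) {
        $keys[] = "user:$id:profile";
    }

    // One round trip fetches every friend's cached picture URL / username.
    $profiles = $cache->getMulti($keys);

    // Anything missing gets loaded from MySQL once and cached for next time.
    foreach ($friendIds as $id) {
        $key = "user:$id:profile";
        if (!isset($profiles[$key])) {
            $profiles[$key] = load_profile_from_db($id); // hypothetical helper
            $cache->set($key, $profiles[$key], 3600);    // cache for an hour
        }
    }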
The default maximum object size that is allowed in Memcache is 1MB.
#jsaondavis: "session data is saved into a file".
Your statement above is wrong. Sessions can be configured to be stored in a database. The default session handler is just the file handler.
Redis would be a good solution:
Here is a thread on Redis vs. Memcached.
Sounds like Redis allows values up to 512 MB instead of the 1 MB limit... QUITE a bit different :)
Memcached vs. Redis?
Wrong! Memcached is not fixed to a size! It's up to your server's memory and the configuration you set.