I've been working on a website lately and want to speed up my application.
I want to cache my users' pages, but the pages are dynamic: if someone posts a new feed, the homepage is updated with that feed. If I cache the homepage for one user and a friend of his posts a new feed, I want that cache to expire, so that the next time he visits the homepage the application queries the database, fetches the new feeds, and caches them.
I'm using memcache and PHP and MySQL for my DB.
I have tables called friends, feeds, and users.
Would it be efficient to cache every user's friends, so that when a user posts a feed my app fetches his or her friends and caches a notification keyed by each friend's user ID? When those friends log in, the app would check on every page whether there is a notification and act on it (in this case, by deleting the cached homepage).
Regards,
Resul
Profile your application and locate places where you access data that is expensive to fetch (or calculate). Those places are good places to start with memcached, unless you're doing more writes than reads (where you'd likely have to update the cache more often than you could make use of it).
Caching everything you ever access could well lead to nothing more than a nearly full memcached that holds mostly rarely accessed data (while potentially pushing out of the cache the things you actually should cache). In many cases you shouldn't use memcached as a 1:1 copy of your database in key-value form.
Before you even start server-side optimizations, you should run YSlow and try to get an A rating. Take a hard look at your JavaScript too. If you are using jQuery, then getting rid of it would vastly improve the overall performance of the site. Front-end optimization is usually much more important.
The next step would be optimizing and cleaning up the server-side code. Try testing your SQL queries with EXPLAIN. See if you are missing some indexes. And then do some profiling on the PHP side with Xdebug. See where the bottlenecks are.
And only then start messing with caching. As for Memcached, unless your website runs on top of a cluster of servers, you do not need it. Hell .. it might even be harmful. If your site is located on a single box, you will get much better results with APC, which, unlike Memcached, is not distributed by nature.
Write a class that handles all the DB queries, caches the tables, and runs the queries against the cached tables instead of your DB. Update your cache each time you do an INSERT or an UPDATE on a table.
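A minimal sketch of the "touch the cache on every write" idea, applied to the homepage feed from the question. It assumes a tiny in-memory `ArrayCache` class as a stand-in for memcached, and the `$friendsOf` and `$feeds` arrays are hypothetical replacements for the friends and feeds tables. Instead of storing a notification and checking it on every page, the write path simply deletes each friend's cached homepage.

```php
<?php
// Minimal in-memory stand-in for memcached so the sketch runs anywhere.
// In a real app the same get/set/delete calls would go to memcached.
final class ArrayCache
{
    private array $items = [];

    public function get(string $key)
    {
        return $this->items[$key] ?? false; // false on a miss
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}

// Hypothetical data: user id => friend ids (stands in for the friends table).
$friendsOf = [1 => [2, 3], 2 => [1], 3 => [1]];
// Hypothetical feeds "table": each row is [authorId, text].
$feeds = [];
$cache = new ArrayCache();

// Read path: serve the homepage feed from cache, rebuilding it on a miss.
function homepageFeed(int $userId, ArrayCache $cache, array $friendsOf, array $feeds): array
{
    $key = "homepage:$userId";
    $cached = $cache->get($key);
    if ($cached !== false) {
        return $cached;
    }
    $visible = [];
    foreach ($feeds as [$authorId, $text]) {
        if (in_array($authorId, $friendsOf[$userId] ?? [], true)) {
            $visible[] = $text;
        }
    }
    $cache->set($key, $visible);
    return $visible;
}

// Write path: store the post, then drop the cached homepage of every friend.
function postFeed(int $authorId, string $text, ArrayCache $cache, array $friendsOf, array &$feeds): void
{
    $feeds[] = [$authorId, $text];
    foreach ($friendsOf[$authorId] ?? [] as $friendId) {
        $cache->delete("homepage:$friendId");
    }
}
```

The cost of a post grows with the author's number of friends, so for users with very large friend lists a short expiry time on the homepage key may be the cheaper trade-off.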
I am working on a project with a custom HTML5 front end and a backend I've designed from experience. The backend is composed of a message queue and a cache. Currently I've chosen Beanstalk and Memcache because I'm familiar with them, but I am open to suggestions.
My question though comes from how my coder is interfacing with the MySQL DB we are using to store the data. The idea is to pre-cache most or all of the DB so the site runs really fast. It's not a huge DB so RAM for Memcache shouldn't be an issue. However, my coder is using CodeIgniter with GreenBean. I've never heard of GreenBean before and when I google it I get almost nothing that isn't related to greenbeans the food. What little I could find suggested it was an ORM which fits from what my coder has told me.
The problem is this. With raw PDO my pre-caching scheme is simple - I would grab each row from each table and store it in the cache with a key. Then every time I needed that data I would look at the cache first for it and then the DB. If something is changed on the backend then I only need to update that row in the DB and the associated key in the cache.
With an ORM, if I store the entire ORM object serialized into the cache then it holds a bunch of related data. Data that could be incorrect if something were changed. For example, you have a DB of employees that is linked to the office they work in and the dept they work in. The ORM grabs the office and the dept and we store all of that in the cache. But if the office address changes the ORM object for every employee in that office is now stale/incorrect.
In that example, just letting the cache expire probably isn't an issue most of the time. But in my application, that data should really get updated immediately. So in a simple PDO scheme you flush the cache keys related to the data that changed and every future page call gets the updated data. But with an ORM you have lots and lots of cached object instances that might be incorrect and no good way of finding them. So it seems to me you are now left with some form of indexing of your cached objects and when you change something simple you could be flushing and refilling a big chunk of the cache. The site gets really slow then.
Typically I would just cache a DB result after the first time I needed it, but in this case I think that could end up being really slow for the many users who make the first request for a particular set of data. Additionally, there are some search features that could require a lot of data from the DB. Thus my desire to pre-cache.
So in this case I'm thinking an ORM would hurt the site's performance. I'm thinking I'm not the first person to have this issue though. Is there an ORM out there that would handle this scenario well? Is there a better backend architecture I'm missing?
Thanks
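The per-row scheme described in the question can be sketched roughly as follows. This is only an illustration under assumptions: `$db` and `$cache` are plain arrays standing in for MySQL and Memcache, and the function names are made up. The point is that each row lives under its own key and related rows are joined at read time, so changing an office address touches exactly one cache entry no matter how many employees reference it.

```php
<?php
// In-memory stand-ins: $db plays the MySQL tables, $cache plays Memcache.
$db = [
    'office'   => [10 => ['id' => 10, 'address' => '1 Old Street']],
    'employee' => [
        1 => ['id' => 1, 'name' => 'Ann', 'office_id' => 10],
        2 => ['id' => 2, 'name' => 'Bob', 'office_id' => 10],
    ],
];
$cache = [];

// Read one row through the cache: cache first, then the "database".
// A missing row is cached as null.
function fetchRow(string $table, int $id, array &$cache, array $db): ?array
{
    $key = "$table:$id";
    if (!array_key_exists($key, $cache)) {
        $cache[$key] = $db[$table][$id] ?? null;
    }
    return $cache[$key];
}

// Write one row and refresh only its own cache key.
function updateRow(string $table, int $id, array $changes, array &$cache, array &$db): void
{
    $db[$table][$id] = array_merge($db[$table][$id], $changes);
    $cache["$table:$id"] = $db[$table][$id];
}

// Compose the employee view at read time from two independently cached rows.
function employeeWithOffice(int $employeeId, array &$cache, array $db): array
{
    $employee = fetchRow('employee', $employeeId, $cache, $db);
    $office = fetchRow('office', $employee['office_id'], $cache, $db);
    return $employee + ['office_address' => $office['address']];
}
```

An ORM can still be used for writes and for building these rows; the design choice here is only that what goes into the cache is the flat row, not the serialized object graph.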
I'm working on a blog-based website with more than 50k posts. I need suggestions to increase the website's speed.
I have two options:
1: I can fetch the post data from the MySQL database and display it using PHP.
2: A static web page for each post (using a DOM parser, I can update the post contents).
Which one is faster, the database or the file system? Or are there any other suggestions to speed up my website? I'm using GoDaddy shared hosting.
I would suggest:
pagination for the site.
a fetch-only-what-you-need coding style for database queries.
running some load tests to find where your site needs improving.
Sorry, I looked up GoDaddy and they do not allow memcached :(
Use database and implement memcached to cache recently shown pages.
Even with 50 K posts I imagine that most fetches are for a small subset of posts for a specific time period, usually recent posts.
If this is the case a memcache solution would beat any disk based storage.
Automatically generating static pages for frequently retrieved posts is another way.
But base storage in a database is the easiest.
You can't get reliable performance on shared hosting, so just go with what's easiest to work with. Today you may get fast access to the file system, but tomorrow they relocate your app to another silo and the database becomes faster. It's a lot easier to extend a database to add new features, so I'd go with that.
But if you really care about performance you have to make tests to measure it.
You can use page caching for the whole page,
or query caching for the results of database queries.
Using the file system will only give you trouble with updates, deletes, inserts, etc.
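Whole-page caching does not need memcached, which matters on shared hosting. Below is a minimal sketch using output buffering and a cache file; the function name `cachedPage`, the cache file path, and the TTL are all assumptions for illustration, and the post itself would still be rendered from the database by the `$render` callback.

```php
<?php
// Return cached HTML if the cache file is fresh; otherwise render the page,
// store the output in the file, and return it.
function cachedPage(string $cacheFile, int $ttl, callable $render): string
{
    if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
        return (string) file_get_contents($cacheFile);
    }
    ob_start();                       // capture everything $render echoes
    $render();
    $html = (string) ob_get_clean();
    file_put_contents($cacheFile, $html, LOCK_EX); // lock to avoid torn writes
    return $html;
}

// Hypothetical usage for one post page, cached for five minutes:
// echo cachedPage('/path/to/cache/post-42.html', 300, function () {
//     // query MySQL and echo the post HTML here
// });
```

When a post is edited, deleting its cache file forces the next request to re-render it, which keeps the database as the single source of truth.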
I'm hoping to develop a LAMP application that will centre around a small table, probably less than 100 rows, maybe 5 fields per row. This table will need to have the data stored within accessed rapidly, maybe up to once a second per user (though this is the 'ideal', in practice, this could probably drop slightly). There will be a number of updates made to this table, but SELECTs will far outstrip UPDATES.
Available hardware isn't massively powerful (it'll be launched on a VPS with perhaps 512 MB RAM) and it needs to be scalable: there may only be 10 concurrent users at launch, but this could rise to the thousands (and, as we all hope with these things, maybe tens of thousands, but at that level there will be more powerful hardware available).
As such I was wondering if anyone could point me in the right direction for a starting point - all the data retrieved will be the same for all users, so I'm trying to investigate if there is anyway of sharing this data across all users, rather than performing 10,000 identical selects a second. Soooo:
1) Would the mysql_query_cache cache these results and allow access to the data, WITHOUT requiring a re-select for each user?
2) (Apologies for how broad this question is, I'd appreciate even the briefest of responses greatly!) I've been looking into the APC cache as we already use it as an opcode cache. Is there a method of caching the data in APC, doing just one MySQL select per second to update this cache, and then just accessing APC for each user? Or perhaps an alternative cache?
Failing all of this, I may look into having a separate script which handles the queries and outputs the data, and somehow just piping this one script's data to all users. This isn't a fully formed thought and I'm not sure of the implementation, but perhaps a combo of AJAX to pull the outputted data from... "Somewhere"... :)
Once again, apologies for the breadth of these questions. A couple of brief pointers from anyone would be very, very greatly appreciated.
Thanks again in advance
If you're doing something like an AJAX chat which polls the server constantly, you may want to look at node.js instead, which can keep an open connection between server and browser. This way, you can have changes pushed to the user when they happen and you won't need to do all that redundant checking once per second. This can scale very well to thousands of users and is written in JavaScript on the server side, so it is not too difficult.
The problem with using the MySQL query cache is that all cached results for a table get invalidated on any write to that table. You're better off using a caching solution like memcached or APC if you're trying to control that behavior more precisely. And yes, APC would be able to cache that information.
One other thing to keep in mind is that you need to know when to invalidate the cache as well, so you don't have stale data.
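A rough sketch of the "one SELECT per second, shared by everyone" idea with explicit invalidation. The `TtlCache` class is a hypothetical in-memory stand-in with an injectable clock so that it can run anywhere; with APC loaded, `apc_fetch()` and `apc_store()` (or `apcu_fetch()` and `apcu_store()` in APCu) would play the same role and share the data across requests on one server. The loader closure stands in for the real MySQL query.

```php
<?php
// Tiny TTL cache with an injectable clock so the sketch is testable.
// Unlike APC, this array only lives for the duration of one PHP process.
final class TtlCache
{
    private array $items = [];
    /** @var callable */
    private $clock;

    public function __construct(callable $clock)
    {
        $this->clock = $clock;
    }

    // Return the cached value, or call $loader and cache its result for $ttl seconds.
    public function remember(string $key, int $ttl, callable $loader)
    {
        $now = ($this->clock)();
        if (isset($this->items[$key]) && $this->items[$key]['expires'] > $now) {
            return $this->items[$key]['value'];
        }
        $value = $loader();
        $this->items[$key] = ['value' => $value, 'expires' => $now + $ttl];
        return $value;
    }

    // Explicit invalidation, for when an UPDATE hits the table.
    public function forget(string $key): void
    {
        unset($this->items[$key]);
    }
}

// Hypothetical loader: counts how often the "SELECT" really runs.
$selects = 0;
$loadTable = function () use (&$selects): array {
    $selects++;
    return [['id' => 1, 'price' => 100]]; // stands in for the ~100-row table
};

$now = 1000; // fake clock, in seconds
$cache = new TtlCache(function () use (&$now) { return $now; });

$cache->remember('small_table', 1, $loadTable); // miss: runs the query
$cache->remember('small_table', 1, $loadTable); // hit within the same second
$now += 1;
$cache->remember('small_table', 1, $loadTable); // expired: runs the query again
```

Calling `forget()` from the code path that performs an UPDATE removes even the one-second staleness window, at the price of coupling the write path to the cache.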
You can use APC, XCache, or memcached for database query caching, or you can use Varnish or Squid for gateway caching...
I am considering enabling Memcache support for my large-scale REST service. However I have some questions regarding best approaches for these key-value stores.
The setup:
A database wrapper which has functions for select, update and etc.
A REST framework which contains all the API functions (getUser, createUser and etc.)
In my head, the ideal approach would be to integrate Memcache into the database wrapper so that, for example, every SQL query gets md5-hashed and saved in the cache (this is, by the way, what most online resources suggest). However, there is obviously a problem with this approach: if a search query has been cached and one of the users in the search result is updated afterwards, the change won't be reflected in the next request (because the result is served from the cache).
As I see it, I have several ways of handling this:
Implement Memcache in the REST framework for each function (getUser, createUser, etc.) and thereby explicitly handle updating the cache when users get updated. This could end up in redundant code.
Let the cached values expire very quickly and live with the fact that some requests show old cached values.
Do a more advanced implementation of Memcache in the database wrapper so that I can identify which parts (e.g. users) to update in, e.g., a search request.
Could you guide me on which of these approaches, or a completely different one, to take?
Thanks in advance.
Enabling cache for a web application is not something to take lightly.
Maybe you have done that already, but I recommend you first come up with a goal based on business needs or a forecast (e.g. must accept 1000 requests per second), then properly stress-test your system to have numbers before you start changing anything, and then identify your bottleneck.
http://en.wikipedia.org/wiki/Performance_tuning
I usually use profiling tools such as XHProf (by Facebook).
https://github.com/facebook/xhprof
Caching all your data to mirror your database might not be the best approach.
Find out how much memory you can allocate for your cache. If your architecture only allows you to allocate 100 MB for memcache, that will affect your decision about what you cache and how long you cache it.
The best cache is to cache forever. But we all know that data changes. You can start by caching data that is requested often and requires the most resources to fetch.
Always try to make sure you are not working on improving something that will bring you little improvement.
Without understanding your architecture in depth, it would be hazardous for anyone to recommend a caching strategy that best fits your needs.
Maybe you should cache the resulting output of your web services instead? Using a reverse proxy, for example (what @Darrel is talking about), or using output buffering...
http://en.wikipedia.org/wiki/Reverse_proxy
http://php.net/manual/en/book.outcontrol.php
Optimize your database queries before you think about caching. Make sure you use a PHP opcode cache (like APC) and all those things that are standard practice.
http://phplens.com/lens/php-book/optimizing-debugging-php.php
http://blog.digitalstruct.com/2008/01/31/performance-tuning-overview/
If you want to cache data and prevent stale/old data from being served, the trick is to identify your data (by primary key, maybe?) and, when the data is updated or deleted, delete or update the cache entry for that identifier.
<?php
// After inserting into DB, you can also put it in the cache
$memcache->set($userId, $userData);
// After updating or deleting the user, you update or delete the data
$memcache->delete($userId);
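Extending the snippet above to the search-result problem from the question, here is a sketch under assumptions: `$users` and `$cache` are plain arrays standing in for the users table and memcache, and the function names are made up. The search result is cached as a list of ids under an md5 key, and each user is read through its own `user:<id>` key, so deleting that one key after an update is enough for cached searches to show the fresh row.

```php
<?php
// In-memory stand-ins: $users plays the users table, $cache plays memcache.
$users = [
    1 => ['id' => 1, 'name' => 'Alice', 'city' => 'Oslo'],
    2 => ['id' => 2, 'name' => 'Bob', 'city' => 'Oslo'],
];
$cache = [];

// Per-user read-through cache keyed by primary key.
function getUser(int $id, array &$cache, array $users): array
{
    $key = "user:$id";
    if (!isset($cache[$key])) {
        $cache[$key] = $users[$id];
    }
    return $cache[$key];
}

// Cache only the matching ids under a hash of the query, then load each user by id.
function searchByCity(string $city, array &$cache, array $users): array
{
    $key = 'search:' . md5("city=$city");
    if (!isset($cache[$key])) {
        $matches = array_filter($users, fn(array $u): bool => $u['city'] === $city);
        $cache[$key] = array_keys($matches);
    }
    $result = [];
    foreach ($cache[$key] as $id) {
        $result[] = getUser($id, $cache, $users);
    }
    return $result;
}

// After updating a user, delete only that user's cache entry.
function updateUser(int $id, array $changes, array &$cache, array &$users): void
{
    $users[$id] = array_merge($users[$id], $changes);
    unset($cache["user:$id"]);
}
```

One limitation to keep in mind: the list of ids itself can still be stale if an update changes whether a user matches the search, so the search keys probably still want a short expiry.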
A lot of sites will show stale data. When I am on Stack Overflow and my reputation increases and I then go into the Stack Overflow chat, the reputation shown is my old reputation. When I reached a reputation of 20 (the reputation required to chat), I still could not chat for another 5 minutes because the chat system had my old reputation data and did not yet know my reputation had increased enough to allow me to chat. Some data can be stale, while other types of data should never be stale. Consider that when caching data.
Conclusion
Your approaches can all be valid depending on the factors that I talk about above. In fact, you can use a combination of those for all the different type of data you want to cache and how long it is acceptable to show old data for them. Maybe the categories or list of countries (since they do not change often) can be cached for a long time while the reputation (or whatever data changes all the time for all users) should be cached for a short period only.
I have a social network.
The users table is around 60,000 rows.
The friends table is around 1 million rows (used to determine who is your friend).
I want to build a friend feed, wall, whatever you like to call it. It will show things like user status posts (Twitter-type posts). It will eventually show a few different items, but to start with it will just be friend statuses and maybe blog posts.
Basically, you will only see content published with a user ID that is in your friend list.
I have been trying to come up with the best way and haven't gotten very far but here is my latest idea.
Currently, to build this feed, I have to:
Get the list of friend IDs from the large friends table
Get the stream data for the friend IDs from the above result
JOIN the users table to get the publisher's picture URL and username
Then JOIN the comments table to get comments posted on the feed items
That is one big task to build that feed
I have 3 ideas so far, this is where your help can come in.
Memcache Option:
Use memcache and cache a user's friend list as an array when the user logs into the site. Also, when the user approves a new friend request, rebuild their cache.
In addition to just getting their friends, I could save their friends' picture URLs and usernames; this would speed things up again by eliminating that query when building the friend feed.
File cache Option:
Do the same as the memcache option, but save this data as an array to a cache file instead of memory, then include this cache file in the page.
I am not sure which method is best for performance. I understand memcache stores everything in memory, so users that have something like 20,000 friends could use a lot of memory, whereas a file cache would only put it in memory when the user needs it, if I am correct. Also, if I used the file method, I would delete a user's cache file when they log out of the site, so the cache folder would never hold too many files.
Session cache Option:
Same as file cache above, I just realized that session data is saved into a file so wouldn't that make it capable of being a cache?
Please give me your opinions or any advice or info you have on this, as I don't have much knowledge of caching. I have read a lot, but sometimes other people's ideas help a lot.
Memcache is your best bet for a lot of reasons:
It's REALLY fast - Everything's in memory, and it's highly optimized for situations just like yours (and caching in general :)
It's distributed - This means that if you have multiple web / app servers running, they can all access the same cache
You can pool multiple servers for memcache - If you've got a few servers that are relatively underutilized (or several dedicated cache servers), you can pool them all together into one big cache
It's super-scalable (for the reasons mentioned prior)
It's got great PHP support - The PECL package for memcache was recently updated with a lot of new goodness
You can even store your user sessions in memcache - just set it up in your php.ini file. This is much faster than storing sessions in databases, and allows your sessions to persist across multiple web hosts (if you're in a load balanced situation)... this will also give your site a bit of a performance boost as there's no need to hit the filesystem / database for session info on every request.
... and many more ;)
As to some of your concerns about memory footprint of individual cached items you've got a few options. My initial thought is to just give it a whirl, see how big these cache items really get (you can find several open-source things to monitor the actual cache usage, such as cacti). I think they'll be smaller than you'd think.
If they're not, I'd suggest re-thinking your cache strategy as far as what you actually cache, for how long, etc. Maybe you could build the feed from several things already in the cache (i.e. cache individual user data, and then build the feed for a person from all those individual items in cache). There are a lot of good articles out there on that front, just search 'em out :)
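As a sketch of building from individually cached items: the friend list holds only ids, and each friend's username and picture live once under their own key, so a user with 20,000 friends does not drag 20,000 copies of profile data into their own cache entry. Everything here is an assumption for illustration: `$cache` is a plain array standing in for memcache, `loadUserFromDb()` is a hypothetical fallback for the users table, and `cacheGetMulti()` mimics a multi-key fetch such as `Memcached::getMulti()` in the PECL memcached extension.

```php
<?php
// In-memory stand-in for memcache.
$cache = [
    'friends:1' => [2, 3],
    'user:2'    => ['username' => 'bob', 'picture' => '/img/bob.png'],
    // 'user:3' is deliberately missing to show the fallback path
];

// Hypothetical fallback that would query the users table on a cache miss.
function loadUserFromDb(int $id): array
{
    return ['username' => "user$id", 'picture' => "/img/user$id.png"];
}

// Fetch many keys at once and return only those that were found.
function cacheGetMulti(array $keys, array $cache): array
{
    return array_intersect_key($cache, array_flip($keys));
}

// Build the id => profile map needed to render a user's friend feed.
function friendProfiles(int $userId, array &$cache): array
{
    $friendIds = $cache["friends:$userId"] ?? [];
    $keys = array_map(fn(int $id): string => "user:$id", $friendIds);
    $found = cacheGetMulti($keys, $cache);

    $profiles = [];
    foreach ($friendIds as $id) {
        $key = "user:$id";
        if (!isset($found[$key])) {
            // Miss: load from the "database" and fill the cache for next time.
            $found[$key] = $cache[$key] = loadUserFromDb($id);
        }
        $profiles[$id] = $found[$key];
    }
    return $profiles;
}
```

When a user changes their picture, only their own `user:<id>` key needs to be refreshed; every friend's feed picks it up on the next read.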
The default maximum object size that is allowed in Memcache is 1MB.
@jsaondavis:
"session data is saved into a file".
Your above statement is not always true. Sessions can be configured to be stored in a database. The default session handler is file-based.
Redis would be a good solution.
Here is a thread on Redis vs. Memcached.
It sounds like Redis allows values of up to 512 MB instead of Memcached's default 1 MB item limit... QUITE a bit different :)
Memcached vs. Redis?
Wrong! Memcached is not fixed to one size! It's up to your computer's memory and the configuration you set.