Optimising a big WordPress site - PHP

I'm looking at optimising a rather large site I've been adding to and adding to. The database has become pretty big (maybe 100,000 posts) and it has started slowing down somewhat and giving me "MySQL has gone away" errors. I've been reading about database optimisation and have read some people saying you should only be looking to use 1-15 queries on a page.
Do people agree with the suggestion that only a handful of queries should be used on any page?
Am I correct in thinking that every time I use a WordPress function such as get_permalink() I am creating a new query and a new connection to the database?
I have some loops in there that loop through 100+ users at a time and use functions such as get_user_meta() in these loops - so would this mean I am literally making 100 database queries, or are they somehow cached by WordPress?
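For illustration, the kind of loop in question might look like this (a sketch; get_user_meta() and update_meta_cache() are real WordPress functions, while $user_ids and the 'points' meta key are hypothetical):

    <?php
    // Without priming, the first get_user_meta() call for each user fires
    // its own query: roughly 100 queries for 100 users. update_meta_cache()
    // loads the meta for every listed user into the object cache in a
    // single query, so the calls below become cache hits for this request.
    update_meta_cache( 'user', $user_ids ); // $user_ids: hypothetical array of 100+ IDs

    foreach ( $user_ids as $user_id ) {
        $points = get_user_meta( $user_id, 'points', true ); // served from cache
        echo esc_html( $points );
    }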

With issues like this, the thing to do is take the caching out of the hands of WordPress and make the server do the work.
Software like WordPress and Drupal has its own caching systems, and you should enable them, but even with them in use there's still a certain amount of overhead for the software to load and serve the page.
So I suggest you investigate a server caching engine such as Varnish.
This will dramatically reduce the server load for most sites like yours; if you have a lot of requests for the same page over and over, Varnish will take over the caching and WordPress will never even have to know that the page is being requested. No more loading PHP and the WordPress core for every request, no more database session with every request.
If your back-end CMS software is starting to slow down, this is the single most effective way of speeding it up.
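For a page to be served from Varnish, PHP has to mark the response as cacheable and avoid setting cookies; a minimal sketch (the header values are illustrative, not tuned for any particular site):

    <?php
    // Sketch: mark a PHP response as cacheable by a proxy such as Varnish.
    // Varnish derives its TTL from Cache-Control (s-maxage, then max-age)
    // and, by default, will not cache a response that sets cookies.
    header('Cache-Control: public, s-maxage=120'); // proxy may serve this copy for 120s
    header_remove('Set-Cookie');                   // a Set-Cookie header would make it uncacheable

    echo '<html>...</html>'; // the page body, rendered as usual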

Related

File caching vs MySQL storage of Twitter/Facebook/other API results

I have a few sites with Twitter & Facebook feeds, and one that references a health club schedule (quite a large, complicated data tree). I am starting to get into caching to improve page load times, and am also interested in keeping bandwidth usage down, as these sites are hosted on our own VPS.
Right now I have the Twitter and Facebook results each serialized/unserialized to a simple data file, rewritten every 10 minutes. Would it be better to write this data to the MySQL database? And if so, what is a good method for accomplishing this?
Also, the Twitter feed result contains only what I need, so it is nice and small (the 3 most recent tweets). But the Facebook result is larger, and I sort through it with PHP for display - should I store THAT result or the raw feed? Does it matter?
For the other, larger JSON object, would the file vs MySQL recommendation be the same?
I appreciate any insights and would be happy to show an example of the JSON schedule object if it makes a difference.
P.S. APC is not a viable option as it seemed to break all my WordPress installs yesterday. However, we are running on FastCGI.
If it's just a cache I would go for a file, but I don't think it will really matter. Unless, of course, you have thousands or millions of these cache files - then MySQL should be the way to go. If you are doing anything else with the cache (like storing multiple versions or searching the text), then I would go for MySQL.
As for speed, only cache what you're using. So store the processed results, not the raw ones. Why process it every time? Try to cache it in a format as close to the actual output as possible.
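A minimal sketch of that approach - cache the processed result to a file and rebuild it every 10 minutes (the cache path and build_tweet_html() are hypothetical):

    <?php
    // Sketch: file cache for processed API results with a 10-minute lifetime.
    function cached($file, $ttl, callable $build) {
        if (is_file($file) && time() - filemtime($file) < $ttl) {
            return unserialize(file_get_contents($file)); // fresh enough: reuse it
        }
        $data = $build();                                 // the slow part: fetch + process
        file_put_contents($file, serialize($data), LOCK_EX);
        return $data;
    }

    $tweets = cached('/tmp/tweets.cache', 600, function () {
        // boil the raw feed down to exactly what the page outputs
        return build_tweet_html(); // hypothetical fetch-and-process function
    });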
Since you use a VPS, I don't think you'll have an enormous number of visitors, so APC (although very nice) isn't really needed. If you do want a memory cache, you could look at XCache:
http://xcache.lighttpd.net/

Is WordPress suitable for large database querying?

Wonder if anyone has any advice on a project I might be picking up.
It's a bit vague at the mo, but I think it's basically a scientific app, as in: you supply a product, which is then checked against a database of proteins to see if it has a reaction.
From the information I did get - this could really be any scenario, and sounds quite simple dev-wise - the user on the site fills out a form, which is then checked in some way against a db of other similar attributes to see if it's suitable; it also gets added to that db at the end (so the db potentially grows as each user does a check).
The part I'm wondering about is that this could get potentially huge (e.g. 10,000 + unlimited records) - so would anything be gained from building a custom PHP app to handle all this? As in, would WordPress not be suitable for the backend, and should I be catering for time-intensive queries?
Thanks for looking
No.
See ticket http://core.trac.wordpress.org/ticket/9864 (Performance issues with large number of pages). I filed this ticket 3+ years ago, and there is still no resolution. Since that time, the code in question got even more complicated, and started using a heavier version of the internal query library.
Those severe issues are with pages, but posts are equally taxing in queries. And if you have meta-data, it taxes the server even further. To top this off, most leading caching plugins for WP exclude web robots, so every few weeks when Google, Baidu and Yandex start hammering your machine for the 10k pages, it'll bring your machine down.
This just means that you can't use it natively for large content sets, but you can still use most of WordPress with additional code customizations for parts that'll be outside of the WP database/construct.
Edit: to clarify - what I was saying is that solely using WP's native database structure, as defined by the schema in the Codex, and query_posts() / WP_Query() to perform the queries is the inefficiency I was referring to. The native WP storage/query system doesn't handle large volumes of pages/posts very efficiently. However, bypassing some of the native functionality will likely work fine.
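For example, one way to bypass WP_Query for a bulk lookup while still running inside WordPress is to query the tables directly through $wpdb (a sketch; $wpdb and its methods are the real WordPress database layer, while the 'protein_id' meta key is hypothetical):

    <?php
    global $wpdb;

    // Sketch: a single targeted JOIN instead of WP_Query's layered queries.
    $rows = $wpdb->get_results( $wpdb->prepare(
        "SELECT p.ID, p.post_title, m.meta_value
           FROM {$wpdb->posts} p
           JOIN {$wpdb->postmeta} m ON m.post_id = p.ID
          WHERE p.post_status = 'publish'
            AND m.meta_key = %s
          LIMIT 500",
        'protein_id'
    ) );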

What are the advantages and disadvantages of caching in web development in PHP, and how does it affect the database?

In PHP, what are the advantages and disadvantages of caching in web development, and how does it affect the database?
Caching works in many different ways, but for PHP specifically I can think of a few:
Database calls: they are slow, require computation, and can be quite intensive. If you've got repeated calls, caching the query is golden. There are two levels: the PHP side, where you control the cache, and the database side, where the database does.
Running PHP code means the webserver calls the PHP interpreter, which parses the code and then runs it. A PHP cache (opcode cache) can cache the parsing part and go straight to the running part. Then there's the next generation of compiling PHP code directly to C and running it from there (like Facebook does).
Computations: if you're doing math or heavy lifting of repeated operations, you can cache the result instead of calculating it every time - see the sketch below.
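As a trivial sketch of that last point, memoizing a computation within a request (expensive_score() is a hypothetical heavy calculation):

    <?php
    // Sketch: per-request memoization of a repeated computation.
    function score($input) {
        static $memo = [];                           // survives between calls in this request
        if (!isset($memo[$input])) {
            $memo[$input] = expensive_score($input); // hypothetical heavy calculation
        }
        return $memo[$input];
    }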
Advantages:
speed
fewer resources used
reuse
being smart
Disadvantages:
stale data
overhead
complexity
I'll only deal with the disadvantages here:
First, stale data; this means that when you use cached content/data you are at risk of presenting old data that's no longer relevant to the new situation. If you've cached a query of products, but in the meantime the product manager has deleted four products, the users will get listings to products that don't exist. There's a great deal of complexity in figuring out how to deal with this, but mostly it's about creating hashes/identifiers for caches that mean something to the state of the data in the cache, or business logic that resets the cache (or updates it, or appends to it) with the new data bits. This is a complicated field, and depends very much on your requirements.
Then there's overhead: all the business logic you use to keep your data somewhere between fast and stale, which leads to complexity, and complexity leads to more code that you need to maintain and understand. You'll easily lose oversight of where data exists in the caching complex, at what level, and how to fix stale data when you get it. It can easily get out of hand, so instead of caching on complex logic you revert to simple timestamps, and just say that a query is cached for a minute or so, and hope for the best (which, admittedly, can be quite effective and not too crazy). You could give your cache entries life-times (say, live for X minutes in the cache) vs. access counts (live for 10 requests) vs. fixed times (live until 10pm), and variations thereof. The more variation, the more complexity, of course.
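The simple-timestamp version boils down to something like this (a sketch using APCu; run_query() is hypothetical):

    <?php
    // Sketch: time-based caching - the entry lives for 60 seconds, then is rebuilt.
    $sql = 'SELECT * FROM products';
    $key = 'products:' . md5($sql);      // identifier tied to the query
    $products = apcu_fetch($key, $hit);
    if (!$hit) {
        $products = run_query($sql);     // hypothetical database call
        apcu_store($key, $products, 60); // "cached for a minute or so"
    }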
However, having said that, caching can turn a bog of a system into quite a snappy little vixen without too much effort or complexity. A little can get you a long way, and writing systems that use caching as a core component is something I'd recommend.
The main advantage, and also the goal, of caching is speeding up page loads and minimizing the system resources needed to serve a page.
The main disadvantage lies in how it's implemented by the developers, and then in maintaining a proper caching system for the website and keeping it manageable for the admin.
The above statements are in general terms.
Caching is used to reduce hefty/slow operations (heavy calculations/parsing/database operations) which will consistently produce the same result. Caching this result will reduce the server load and speed up the application (because the hefty/slow operation does not need executing).
The disadvantage is that it will often increase the complexity of the application, because the cache should be purged/altered when the result of the operation will no longer be the result that was cached.
Simple example: a website whose navigation is stored in the database could cache the navigation once it has been fetched from the database, thus reducing the total number of db calls, because we no longer need to execute a query to retrieve the navigation.
When the navigation changes (e.g. a page has been added), the cached value for the navigation should be rebuilt, because the cached navigation does not yet reflect the latest change: the new page is not present there.
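That example in code form (a sketch with APCu; load_navigation_from_db() and insert_page_into_db() are hypothetical):

    <?php
    // Sketch: cache the navigation, purge it explicitly on change.
    function get_navigation() {
        $nav = apcu_fetch('site:navigation', $hit);
        if (!$hit) {
            $nav = load_navigation_from_db();    // hypothetical query
            apcu_store('site:navigation', $nav); // no TTL: valid until purged
        }
        return $nav;
    }

    function add_page($page) {
        insert_page_into_db($page);              // hypothetical write
        apcu_delete('site:navigation');          // cached nav is stale - rebuild on next request
    }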
When a page is cached, instead of regenerating the page every time, the server stores a copy of what it sends to your browser. The next time a visitor requests the same page, the script knows it has already generated one recently, and simply sends that to the browser without all the hassle of re-running database queries or searches.
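In PHP that idea usually looks like output buffering plus a file copy (a sketch; the path and 5-minute lifetime are placeholders, and the cache directory is assumed to exist):

    <?php
    // Sketch: whole-page output cache via output buffering.
    $cacheFile = '/tmp/pages/' . md5($_SERVER['REQUEST_URI']) . '.html';

    if (is_file($cacheFile) && time() - filemtime($cacheFile) < 300) {
        readfile($cacheFile); // serve the stored copy: no queries, no rendering
        exit;
    }

    ob_start();               // capture everything the page sends to the browser
    // ... normal page generation: queries, templates, etc. ...
    file_put_contents($cacheFile, ob_get_contents(), LOCK_EX); // keep a copy
    ob_end_flush();           // and send it to the visitor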
Advantages of caching:
Reduced load on web servers and the database
Pages download faster
Disadvantages:
As information is stored in the cache, it can make the page heavy.
Sometimes updated information does not show, because the cache has not been refreshed.
The advantages and disadvantages of caching in web development depend entirely on the context!
The main advantage is reduced data retrieval time, whether from the database or at page load time.
The disadvantage is the separate maintenance, or the third-party services or tools, that it requires.

PHP caching techniques

Hi, this is more of an information request really.
I'm currently working on a pretty large event listing website and have started thinking about some caching for the data sets being used.
I have been messing with APC this week and have seen some real improvements during testing; however, what I'm struggling to get my head around is the best practices and techniques required when trying to cache data that changes frequently.
Say, for example, the user hits the home page. By default this displays the latest 10 events happening, and if that user is logged in those events are location-specific. Is it possible to deploy some kind of caching system when dealing with logged-in states and data that changes frequently? The system currently allows the user to "show more events", which is an ajax request that pulls extra results from the db.
I haven't really found anything on this, as I'm not sure what to search for, but I'm really interested to know the techniques used in advanced caching systems, especially those that deal with data that changes and data specific to users.
I mean, is it even worth it? Are there other performance boosters when dealing with this sort of criteria?
Any articles or tips and info on this will be greatly appreciated!! Please let me know if any other info is required!!
Your basic solutions are:
file cache
memcached/redis
APC
Each is used for a slightly different goal.
File cache is usually something you utilize when you can pre-render files or parts of them. It is used in templating solutions, partial views (MVC), CSS frameworks. That sort of stuff.
Memcached and Redis are both more or less equal, except Redis is more of a NoSQL-oriented thing. They are used for distributed cache (multiple servers, same cached data) and for storing sessions, if you have a cluster of webservers.
APC is good for two things: opcode cache and data cache. Faster than memcached, but works on each server separately.
Bottom line: in a huge project you will use all of them, each for a different task.
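A minimal sketch of the data-cache case using PHP's Memcached extension (the key, TTL and fetch_latest_events() are hypothetical):

    <?php
    // Sketch: data caching with Memcached.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $events = $mc->get('events:latest10');
    if ($events === false) {                      // cache miss
        $events = fetch_latest_events();          // hypothetical database call
        $mc->set('events:latest10', $events, 60); // keep for 60 seconds
    }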
So you have opcode caching, which speeds things up by saving already compiled PHP files in cache.
Then you have data caching, where you save variables or objects that take time to get like data built from SQL queries.
Then you have output caching, which is where you save entire blocks of your webpages in files, and output those files instead of building that block of your webpage on each request.
I once wrote a blog post about how to do output caching:
http://www.spotlesswebdesign.com/blog.php?id=17
If it's location-specific and there are a billion locations, your best bet is probably output caching, assuming you have a lot of disk space, but you will have to use your head about what is best, as each situation is very different when it comes to how best to apply caching.
If done correctly, using memcached or similar solutions can give huge boosts to site performance. By altering the cached data directly instead of rehydrating it from the database you can bypass the database entirely for data that either doesn't need to be saved or can be trivially rebuilt. Since the database is often the most critical component in web applications, any load you can take off it is a bonus.
On the other hand, making sure your database queries are as light and efficient as possible will have a much larger impact on performance than most cache tweaks.

Need a php caching recommendation

I need help finding the right caching solution for a client's site. The current site is CentOS, PHP, MySQL and Apache, using Smarty templates (I know they suck, but it was built by someone else). The current models/methods use a fairly good OO structure, but there are WAY too many queries being done for some of the simple page functions. I'm looking to find some sort of caching solution, but I'm a noob when it comes to this and don't know what is available that would fit the current site setup.
It is an auction-type site with, say, 10 auctions displayed on one page at one time - the time and current bid on each auction being updated via an ajax call returning JSON every second (it's a penny auction site like beezid.com, so updates every second are necessary). As you can see, if the site gets any sort of traffic, the number of simultaneous requests could be huge. Obviously this data changes every second, because the JSON data returned has the updated time left in each auction, and possibly updated bid amounts and bidders for each auction.
What I want is the ability to cache certain pages for a given amount of time, or based on some other changed variable. For example, memory-caching the page that displays 10 auctions and only updating that cached copy when one of the auctions ends. Or even the script above that returns JSON string data every second: if I were able to cache the first request to this page in memory, serve the following requests from memory, and then re-cache it again after 1 second, that could potentially reduce the server load a lot. But I don't know if this is even possible, or if the overhead of doing something like this outweighs any request-load savings.
I looked into XCache some, but I couldn't find a way to set a particular cache time on a specific page, or based on other variables. Maybe I missed something, but does anyone have a recommendation for a caching scheme that would work for these requirements?
Mucho thanks for any input you might have...
Caching can be done using many methods. Memcached springs to mind as being suited to your task, but if the site is ultra-busy you may run out of RAM.
When I do caching I often use a simple file cache; while it does involve at least one stat call to determine the freshness of the cached content, it is still fast and marginally better than calling an SQL server.
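Applied to the JSON endpoint above, that file-cache approach becomes a one-second micro-cache (a sketch; build_auction_json() is hypothetical):

    <?php
    // Sketch: 1-second file micro-cache for the auction JSON endpoint.
    // One stat (filemtime) decides freshness; only the first request each
    // second pays for the database work.
    $file = '/tmp/auctions.json';

    if (!is_file($file) || time() - filemtime($file) >= 1) {
        file_put_contents($file, build_auction_json(), LOCK_EX); // hypothetical DB work
    }

    header('Content-Type: application/json');
    readfile($file);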
If you must call an SQL server, it may pay to use a MEMORY (HEAP) table to store much of the precomputed data. This technique is no more efficient than memcached - probably less so - but it saves you installing memcached.
DC
Zend_Cache can do what you want, and a lot more. It supports a lot of backends, including XCache and memcache, and allows you to cache data, full pages, partial pages, and, well, just about anything you can imagine :p.
And in case you are wondering: you can use the Zend_Cache component by itself; you don't have to use the complete Zend framework for your application.
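A minimal standalone sketch (Zend Framework 1-era API; the lifetime, cache_dir and build_auctions_page() are illustrative):

    <?php
    // Sketch: standalone Zend_Cache with the File backend.
    require_once 'Zend/Cache.php';

    $cache = Zend_Cache::factory(
        'Core',                                   // frontend
        'File',                                   // backend (memcache/xcache also available)
        array('lifetime' => 60, 'automatic_serialization' => true),
        array('cache_dir' => '/tmp/cache/')
    );

    if (($auctions = $cache->load('auctions_page')) === false) { // cache miss
        $auctions = build_auctions_page();        // hypothetical expensive rendering
        $cache->save($auctions, 'auctions_page'); // stored for 'lifetime' seconds
    }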
