I'm writing a web service for my application and want to know the best way to handle the potentially large number of requests I might get. Most of the data probably won't change throughout the day, but the particular script I'm writing makes 3 MySQL queries, which seems a little excessive considering the data will probably be the same as on the last request to the script, and if it's not the same then it's no big deal.
Will performance be much better if I save the output XML/JSON to a file, serve that to requesters throughout the day, and overwrite it with the first request of the following day? What's the best way of doing this?
I know Joomla and phpBB and other MySQL intensive applications use caching so as to not make as many MySQL queries, so this is what got me thinking.
EDIT - Forgot to mention I'm on Windows/IIS 7.0
Don't muck about with caching until you need to. Memcache is very often a premature optimisation and should be your last, not first, resort. Adding this sort of caching can result in complicated consistency problems.
Databases are not slow by nature, and certainly not slower than loading a bunch of cached data from flat files. But they can be slow through misuse: for example, if one of your every-page queries writes to a MyISAM table, runs without an index, or is just plain complex and heavy. Attack these situations by fixing your schema first.
You should take a look at memcached: access it through PHP's memcache or memcached extensions.
The memcache module provides a handy procedural and object-oriented interface to memcached, a highly effective caching daemon, which was especially designed to decrease database load in dynamic web applications.
It is designed exactly for what you need and is used by many high-performance web applications.
Memcached was developed to enhance the speed of LiveJournal.com, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. The introduction of memcached dropped the database load enormously.
Note
There are two (2) client libraries in PHP. Some more discussion on this can be found on serverfault.com: memcache-vs-memcached, and here is a comparison.
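A minimal usage sketch with the memcached extension (the host, port, key name and the build_feed() helper are just placeholders):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);     // local memcached daemon (assumed)

$key  = 'feed_xml';
$data = $mc->get($key);

if ($data === false && $mc->getResultCode() === Memcached::RES_NOTFOUND) {
    // cache miss: run the MySQL queries and build the XML/JSON response
    $data = build_feed();               // hypothetical function
    $mc->set($key, $data, 3600);        // keep it for one hour
}

echo $data;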
The quick and dirty way would be something like this.
1. Create a cache filename for this specific URL.
2. Check whether such a file exists in the cache directory and whether it is younger than n minutes.
3. If it is not (or does not exist):
3.1 // business logic goes here => save output e.g. in a variable $output
3.2 Save the contents of $output to cachefilename
3.3 echo $output
4. If it is:
4.1 $output = file_get_contents(cachefilename)
4.2 echo $output
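A minimal PHP sketch of those steps (the cache directory, TTL and the build_response() function are placeholders):

$cacheDir  = __DIR__ . '/cache';                               // assumed writable directory
$cacheFile = $cacheDir . '/' . md5($_SERVER['REQUEST_URI']) . '.cache';
$ttl       = 600;                                              // "n minutes" = 10

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
    // fresh enough: serve the cached copy
    echo file_get_contents($cacheFile);
} else {
    // business logic goes here => save output e.g. in a variable $output
    $output = build_response();                                // hypothetical function
    file_put_contents($cacheFile, $output, LOCK_EX);
    echo $output;
}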
It might not be as elegant as memcache or memcached, but you can use it virtually everywhere.
Memcached is great but might be a bit of an overkill for your purposes. A file-based caching approach like the one presented by Martin will work just fine. There are a number of ready-made file-level caching libraries for PHP out there.
Let's assume you're developing a multiplayer game where the data is stored in a MySQL database: for example, the names and description texts of items, attributes, buffs, NPCs, quests, etc.
That data:
won't change often
is frequently requested
is required server-side
and cannot be cached locally (JSON, JavaScript)
To solve this problem, I wrote a file-based caching system that creates .php files on the server and copies entire MySQL tables into them as pre-defined PHP variables.
Like this:
$item_names = Array(0 => "name", 1 => "name");
$item_descriptions = Array(0 => "text", 1 => "text");
That file contains a lot of data, will end up being around 500 KB in size, and is then loaded on every user request.
Is that a good approach to avoid unnecessary queries, considering that the query cache has been removed in MySQL 8.0? Or is it better to just get the data needed using individual queries, even if that means ending up with hundreds of them per request?
I suggest you use some kind of PSR-6 compliant cache (it could also be filesystem-based); later, when your requests grow, you can easily swap it out for a more performant cache, like a PSR-6 Redis cache.
Example for PSR-6 compatible file system cache.
More info about PSR-6 Caching Interface
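For instance, a minimal sketch with symfony/cache, which provides a PSR-6 filesystem pool (assuming the package is installed via Composer; the key, TTL and load_events_from_mysql() are placeholders):

require 'vendor/autoload.php';

use Symfony\Component\Cache\Adapter\FilesystemAdapter;

$cache = new FilesystemAdapter();            // PSR-6 pool backed by the filesystem

$item = $cache->getItem('latest_events');
if (!$item->isHit()) {
    $item->set(load_events_from_mysql());    // hypothetical function
    $item->expiresAfter(600);                // 10 minutes
    $cache->save($item);
}
$events = $item->get();

Swapping to Redis later only means constructing a different adapter; the calling code stays the same.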
Instead of making your own caching mechanism, you can use Redis as it will handle all your caching requirements.
It will be easy to implement.
Follow these links to learn more about Redis:
REDIS
REDIS IN PHP
REDIS PHP TUTORIALS
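A small sketch with the phpredis extension (connection details, key names and load_events_from_mysql() are assumptions):

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$json = $redis->get('homepage_events');
if ($json === false) {
    // cache miss: query MySQL, then cache the serialized result for 5 minutes
    $json = json_encode(load_events_from_mysql());   // hypothetical function
    $redis->setex('homepage_events', 300, $json);
}
$events = json_decode($json, true);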
In my experience...
You should only optimize for performance when you can prove you have a problem, and when you know where that problem is.
That means in practice that you should write load tests to exercise your application under "reasonable worst-case scenario" loads, and instrument your application so you can see what its performance characteristics are.
Doing any kind of optimization without a load test framework means you're coding on instinct; you may be making things worse without knowing it.
Your solution - caching entire tables in arrays - means every PHP process is loading that data into memory, which may or may not become a performance hit in its own right (do you know which request will need which data?). It also looks like you'll be doing a lot of relational logic in PHP (in your example, gluing the item_name to the item_description). This is something MySQL is really good at; your PHP code could easily be slower than MySQL at joins.
Then you have the problem of cache invalidation - how and when do you refresh the cached data? How does your application behave when the data is being refreshed? I've seen web sites slow to a crawl when cached data was being refreshed.
In short - it's a complicated decision, there are no obvious right/wrong answers. My first recommendation is "build a test framework so you can approach performance based on evidence", my second is "don't roll your own - consider using an ORM with built-in cache support", my third is "consider using something like Redis or memcached to store your cache information".
There are many possible solutions, depending on your requirements. Possible solutions include:
A) File-based caching in JSON format. Data retrieved from the database is saved to a file and reused on the next request before the program hits the database again.
B) Memory-based cache, such as Memcached, APC, Redis, etc. Similar to the above solution, with better performance but more integration code required.
C) Memory-based database, such as a NoSQL store (MongoDB, etc.).
D) Multiple database servers: one master database for writes with multiple slaves for reads, with synchronisation between the servers.
To keep things quick and minimise code changes, I suggest option B.
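And if you ever outgrow a single database (option D), the read/write split can start as simply as picking a connection per query; a very rough sketch with PDO (hostnames and credentials are made up):

$master  = new PDO('mysql:host=db-master;dbname=app',  'user', 'secret');
$replica = new PDO('mysql:host=db-replica;dbname=app', 'user', 'secret');

function connectionFor(string $sql, PDO $master, PDO $replica): PDO
{
    // crude heuristic: only plain SELECTs may hit the replica
    return stripos(ltrim($sql), 'SELECT') === 0 ? $replica : $master;
}

$sql  = 'SELECT id, title FROM pages WHERE id = 42';
$rows = connectionFor($sql, $master, $replica)->query($sql)->fetchAll(PDO::FETCH_ASSOC);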
Hi, this is more of an information request really.
I'm currently working on a pretty large event listing website and have started thinking about some caching for the data sets being used.
I have been messing with APC this week and have seen some real improvements during testing; however, what I'm struggling to get my head around are the best practices and techniques required when trying to cache data that changes frequently.
Say, for example, the user hits the home page: by default this displays the latest 10 events happening, and if that user is logged in those events are location-specific. Is it possible to deploy some kind of caching system when dealing with logged-in states and data that changes frequently? The system currently allows the user to "show more events", which is an AJAX request to pull extra results from the DB.
I haven't really found anything on this, as I'm not sure what to search for, but I'm really interested to know the techniques used in advanced caching systems, especially ones that deal with changing data and user-specific data.
I mean, is it even worth it? Are there other performance boosters for dealing with this sort of scenario?
Any articles, tips, or info on this would be greatly appreciated! Please let me know if any other info is required.
Your basic solutions are:
file cache
memcached/redis
APC
Each is used for a slightly different goal.
A file cache is usually something you utilize when you can pre-render files or parts of them. It is used in templating solutions, partial views (MVC), CSS frameworks, that sort of stuff.
Memcached and Redis are more or less equal, except Redis is more of a NoSQL-oriented thing. They are used for a distributed cache (multiple servers, same cached data) and for storing sessions, if you have a cluster of web servers.
APC is good for two things: opcode caching and data caching. It is faster than memcached, but works for each server separately.
Bottom line: in a huge project you will use all of them, each for a different task.
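For the APC data-cache case, a small sketch using the APCu extension (the key, TTL and load_config_from_mysql() are placeholders):

$key  = 'config_rows';
$rows = apcu_fetch($key, $hit);

if (!$hit) {
    $rows = load_config_from_mysql();   // hypothetical function
    apcu_store($key, $rows, 300);       // keep in shared memory for 5 minutes
}

As noted above, this cache lives in each server's own memory, so it is not shared across a cluster.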
So you have opcode caching, which speeds things up by saving already compiled PHP files in cache.
Then you have data caching, where you save variables or objects that take time to get like data built from SQL queries.
Then you have output caching, which is where you save entire blocks of your webpages in files, and output those files instead of building that block of your webpage on each request.
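A bare-bones sketch of that output-caching idea, using output buffering to capture and reuse a rendered block (the cache path, TTL and render_latest_events_block() are assumptions):

$cacheFile = __DIR__ . '/cache/block_latest_events.html';   // assumed writable path

if (is_file($cacheFile) && time() - filemtime($cacheFile) < 300) {
    readfile($cacheFile);                  // serve the pre-rendered block
} else {
    ob_start();
    render_latest_events_block();          // hypothetical: echoes the HTML block
    $html = ob_get_clean();
    file_put_contents($cacheFile, $html, LOCK_EX);
    echo $html;
}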
I once wrote a blog post about how to do output caching:
http://www.spotlesswebdesign.com/blog.php?id=17
If it's location-specific, and there are a billion locations, your best bet is probably output caching, assuming you have a lot of disk space; but you will have to use your head about what is best, as each situation is very different when it comes to how best to apply caching.
If done correctly, using memcached or similar solutions can give huge boosts to site performance. By altering the cached data directly instead of rehydrating it from the database you can bypass the database entirely for data that either doesn't need to be saved or can be trivially rebuilt. Since the database is often the most critical component in web applications, any load you can take off it is a bonus.
On the other hand, making sure your database queries are as light and efficient as possible will have a much larger impact on performance than most cache tweaks.
I want to build a lightweight caching system (using few server resources) to serve pages cheaply (without SQL queries, without heavy functions...), but the end result depends on multiple user profiles.
I know about the existence of APC, memcached and other third-party systems... The purpose of this question is to learn the most efficient way (and the theoretical explanation) for a specific coding scenario.
Which method should I use? (Which is more efficient in the end?)
A) Multiple reads of different static HTML file parts (shown in the image; I think this performs badly because of the several read accesses to the hard disk)
B) A single file read (a dynamic script) that includes the data needed and reassembles itself (with strpos, str_replace, ...)
C) Some other, better solution
Many thanks.
Note: Sorry for my bad English.
EDIT: Suppose I have already applied APC and memcached to my system; I'm interested in the system scheme/coding/structure.
Using memcached or APC is going to be a lot better than using static files. Generally speaking, you should use memcached if you have a bunch of web servers all serving content and want the cache to be shared between all web nodes, and use APC if you have just one or don't need a shared cache. These are fast because they store your data in memory, and memory access is several orders of magnitude faster than disk reads. That said, using static files as a cache is going to be a lot better than not using any kind of caching at all.
I constantly read on the Internet how it's important to correctly architect my PHP applications so that they can scale.
I have built a simple/small CMS that is written in PHP (think of Wordpress, but waaaay simpler).
I essentially have URLs like such: http://example.com/?page_id=X where X is the id in my MySQL database that has the page content.
How can I configure my application to be load balanced when I'm simply performing PHP read activities?
Would something like Nginx as the front door, set up to route traffic to multiple nodes running the same code to handle example.com/?page_id=X, be enough to "load balance" my site?
Obviously, MySQL is not being load balanced in this situation, but for simplicity let's consider that out of scope for this question.
These are some well known techniques for scaling such an app.
Reduce DB hits
Most often the bottleneck will be your DB, so cache recent pages to reduce DB activity, perhaps in something like memcached.
Design your schema such that it is partition-able.
In the simplest case, separate your data into logical partitions, and store each partition in a separate mysql DB. Craigslist, for example, partitions data by city, and in some cases, by section within that. In your case, you could partition by Id quite simply.
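A toy sketch of partitioning by id (the shard count, DSNs and credentials are assumptions):

// pick one of N databases based on the page id (hosts are placeholders)
$shards = ['mysql:host=db0;dbname=cms', 'mysql:host=db1;dbname=cms'];
$pageId = (int) ($_GET['page_id'] ?? 0);
$dsn    = $shards[$pageId % count($shards)];

$pdo  = new PDO($dsn, 'user', 'secret');
$stmt = $pdo->prepare('SELECT content FROM pages WHERE id = ?');
$stmt->execute([$pageId]);
$page = $stmt->fetchColumn();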
Manage php sessions
Putting nginx in front of a PHP website will not work if you use sessions. Load balancing PHP does have issues, as sessions are persisted on local storage. Therefore you need to do session management explicitly. The traditional solution is to use memcached to store and look up session data keyed by some kind of cookie.
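One common way is to let PHP itself store sessions in memcached (assuming the memcached extension is installed; the hosts are placeholders, and the same values can go in php.ini instead):

// share sessions across web nodes via memcached instead of local files
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'cache1:11211,cache2:11211');
session_start();

$_SESSION['user_id'] = 42;   // now visible to whichever node handles the next request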
Don't optimize prematurely.
Focus on getting your application out so that the next magnitude of current users gets the optimal experience.
Note: Your main potential pain points are discussed here on SO
No, it is not at all important to scale your application if you don't need to.
My view on this is:
Make it work
Make sure it works correctly - testability, robustness
Make it work efficiently enough to be cost effective to run
Then, if you have so much traffic that your system cannot handle it, AND you've already thrown all the hardware that (sensible) money can buy at it, then you need to scale. Not sooner.
Yes, it is relatively easy to scale read workloads, because you can simply perform reads against read-only database replicas. The challenge is to scale write workloads.
A lot of sites have few writes, even if they're really busy.
The correct approach is to use some kind of load balancer such as:
http://www.softwareprojects.com/resources/programming/t-how-to-install-and-configure-haproxy-as-an-http-loa-1752.html
What this does is forward a given user session only to a certain server, hence you don't have to worry about sessions and where they are stored at all. What you do have to worry about is how to distribute the filesystem if the two servers are running on two different machines, especially if you make heavy use of the filesystem. Hope the article above helps...
A few minutes ago, I asked whether it was better to perform many queries at once at log in and save the data in sessions, or to query as needed. I was surprised by the answer, (to query as needed). Are there other good rules of thumb to follow when building PHP/MySQL multi-user apps that speed up performance?
I'm looking for specific ways to create the most efficient application possible.
hashing
know your hashes (arrays/tables/ordered maps/whatever you call them). a hash lookup is very fast, and sometimes, if you have O(n^2) loops, you may reduce them to O(n) by organizing them into an array (keyed by primary key) first and then processing them.
an example:
foreach ($results as $result)
    if (in_array($result->id, $other_results))
        $found++;
is slow - in_array loops through the whole of $other_results, resulting in O(n^2).
foreach ($other_results as $other_result)
$hash[$other_result->id] = true;
foreach ($results as $result)
if (isset($hash[$result->id]))
$found++;
the second one is a lot faster (depending on the result sets - the bigger, the faster), because isset() is (almost) constant time. actually, this is not a very good example - you could do this even faster using built in php functions, but you get the idea.
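for the curious, one way to do it with built-in functions (assuming the rows are objects with a public ->id and the ids are unique):

// build a lookup keyed by id for both sides, then intersect the keys
$hash  = array_fill_keys(array_column($other_results, 'id'), true);
$found = count(array_intersect_key(
    array_fill_keys(array_column($results, 'id'), true),
    $hash
));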
optimizing (My)SQL
mysql.conf: i don't have any idea how much performance you can gain by optimizing your mysql configuration instead of leaving the default. but i've read you can ignore every postgresql benchmark that used the default configuration. afaik configuration matters less with mysql, but why ignore it? rule of thumb: try to fit the whole database into memory :)
explain [query]: an obvious one, a lot of people get wrong. learn about indices. there are rules you can follow, you can benchmark it and you can make a huge difference. if you really want it all, learn about the different types of indices (btrees, hashes, ...) and when to use them.
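for example, from php (or just in the mysql client); the connection details and table are made up:

// look at the query plan of a suspect query
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$plan = $pdo->query('EXPLAIN SELECT * FROM events WHERE city_id = 42 ORDER BY starts_at')
            ->fetchAll(PDO::FETCH_ASSOC);
print_r($plan);
// if "key" is NULL and "rows" is huge, add an index, e.g.:
// ALTER TABLE events ADD INDEX idx_city_starts (city_id, starts_at);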
caching
caching is hard, but if done right it makes the difference (not a difference). in my opinion: if you can live without caching, don't do it. it often adds a lot of complexity and points of failure. google did a bit of proxy caching once (to make the intertubes faster), and some people saw private information of others.
in php, there are 4 different kinds of caching people regularly use:
query caching: almost always translates to memcached (sometimes to APC shared memory). store the result set of a certain query to a fast key/value (=hashing) storage engine. queries (now lookups) become very cheap.
output caching: store your generated html for later use (instead of regenerating it every time). this can result in the biggest speed-ups, but somewhat works against PHP's dynamic nature.
browser caching: what about etags and http responses? if done right you may avoid most of the work right at the beginning (see the sketch after this list)! most php programmers ignore this option because they have no idea what HTTP is.
opcode caching: APC, zend optimizer and so on. makes php code load faster. can help with big applications. got nothing to do with (slow) external datasources though, and the potential is somewhat limited.
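a tiny etag sketch for the browser-caching point above (build_page() is a stand-in for whatever generates your output):

// send an ETag and answer 304 when the client already has this version
$payload = build_page();                       // hypothetical function
$etag    = '"' . md5($payload) . '"';

header('ETag: ' . $etag);
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) && $_SERVER['HTTP_IF_NONE_MATCH'] === $etag) {
    http_response_code(304);                   // nothing changed: send an empty response
    exit;
}
echo $payload;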
sometimes it's not possible to live without caches, e.g. when it comes to thumbnails. image resizing is very expensive, but fortunately easy to control (most of the time).
profiler
xdebug shows you the bottlenecks of your application. if your app is too slow, it's helpful to know why.
queries in loops
there are (php-)experts who do not know what a join is (and for every one you educate, two new ones without that knowledge will surface - and they will write frameworks, see schnalle's law). sometimes, those queries-in-loops are not that obvious, e.g. if they come with libraries. count the queries - if they grow with the results shown, there is something wrong (see the sketch below).
schnalle's law: inexperienced developers do have a primal, insatiable urge to write frameworks and content management systems
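to illustrate the queries-in-loops point ($pdo, the tables and columns are made up):

// bad: one extra query per event (the classic n+1 pattern)
foreach ($pdo->query('SELECT id, venue_id FROM events') as $event) {
    $venue = $pdo->query('SELECT name FROM venues WHERE id = ' . (int) $event['venue_id'])->fetch();
}

// better: a single join, letting mysql do the work
$rows = $pdo->query(
    'SELECT e.id, v.name AS venue
       FROM events e
       JOIN venues v ON v.id = e.venue_id'
)->fetchAll(PDO::FETCH_ASSOC);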
Optimize your MySQL queries first, then the PHP that handles it, and then lastly cache the results of large queries and searches. MySQL is, by far, the most frequent bottleneck in an application. A poorly designed query can take two to three times longer than a well designed query that only selects needed information.
Therefore, if your queries are optimized before you cache them, you have saved a good deal of processing time.
However, on some shared hosts caching is file-system only thanks to a lack of Memcached. In this instance it may be better to run smaller queries than it is to cache them, as the seek time of the hard drive (and waiting for access due to other sites) can easily take longer than the query when your site is under load.
Cache.
Cache.
Speedy indexed queries.