Can a memory-based database replace the need for caching? - php

MySQL has memory-based storage engines, which keep the data in RAM.
As far as I know, there are two storage engines in MySQL that use memory.
One is the MEMORY engine itself.
The not-so-cool feature of this engine is that it only creates non-persistent tables, which means that if the server is restarted, the data is lost.
The other one is the NDB Cluster storage engine.
This doesn't have the drawback of the previous engine: it uses memory, but it also keeps a file-based record of the data.
Now the question: if your database already uses RAM to store and process data, do you need to add another caching layer such as Memcached to boost your product's performance?
How fast is a memory-engine database compared to Memcached?
Does Memcached add any features to your product that a memory-engine database doesn't?
Plus, a memory-engine database gives you more features, such as the ability to run queries, whereas Memcached only lets you fetch raw values by key - so Memcached is kind of like a database engine that only supports the SELECT command.
Am I missing something?

It depends on how you use memcached. If you use it to cache a rendered HTML page that took 30 SQL queries to build, then it will give you a performance boost even over an in-memory database.
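The pattern described here is commonly called cache-aside. A minimal, language-agnostic sketch (in Python, with a dict standing in for Memcached and a counter standing in for the 30 queries; all names are hypothetical):

```python
import time

# A dict plays the role of Memcached; render_page() stands in for
# the expensive page build that issues 30 SQL queries.
cache = {}
TTL = 60  # seconds before a cached page is considered stale
query_count = 0

def render_page(page_id):
    """Simulate an expensive build that runs many SQL queries."""
    global query_count
    query_count += 30  # pretend 30 queries were issued
    return "<html>page %d</html>" % page_id

def get_page(page_id):
    entry = cache.get(page_id)
    if entry is not None and time.time() - entry[1] < TTL:
        return entry[0]  # cache hit: zero queries
    html = render_page(page_id)  # cache miss: build and store
    cache[page_id] = (html, time.time())
    return html

get_page(1)  # miss: 30 queries
get_page(1)  # hit: 0 additional queries
print(query_count)  # -> 30
```

The second call never touches the "database" at all, which is exactly the saving an in-memory storage engine cannot give you: it would still run all 30 queries, just faster.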

A (relational) database and a caching service are complementary. As pointed out, they have different design goals and use cases. (And yet I find the core advantage of a database missing from the post.)
Memcached (and other caches) offer some benefits that cannot be realized under an ACID database model. This is a trade-off, but such caches are designed for maximum distribution and minimum latency. Memcached is not a database: it is a distributed key-value store with some eviction policies. Because it is merely a key-value store, it can "skip" many of the steps involved in querying database data -- at the expense of only directly supporting 1-to-1 operations. No joins, no relationships, a single result, etc.
Remember, a cache is just that: a cache. It is not a reliable information store, nor does it offer the data integrity/consistency found in a (relational) database. Even cache systems that can "persist" data do not necessarily have ACID guarantees (heck, even MyISAM isn't fully ACID!). Some cache systems offer much stronger synchronization/consistency guarantees; memcached is not such a system.
The bottom line is: because of memcached's simple design and model, it will win on latency and distribution within the realm it operates in. How much? Well, that depends...
...first, pick approach(es) with the required features and guarantees and then benchmark the approaches to determine which one(s) are suitable (or "best suited") for the task. (There might not even be a need to use a cache or "memory database" at all or the better approach might be to use a "No SQL" design.)
Happy coding.

Memcached can store items of up to 1 MB each (by default). What it does is take load off the database, in such a way that you don't even connect to the db in order to ask it for data. The majority of websites have a small amount of data that they display to the user (in terms of textual data, not the files themselves).
So to answer - yes, it's a good idea to have Memcached too, since it can help you avoid connecting to the db at all, which removes some overhead at the start.
On the other hand, there's a plethora of engines available for MySQL. Personally, I wouldn't use the MEMORY engine, for many reasons - one of them being the loss of data on restart. InnoDB, the default MySQL engine as of recent releases, already keeps the working data set in memory (controlled by the innodb_buffer_pool_size variable) and is incredibly fast if the data set fits in memory.
There's also the TokuDB engine, which surpasses InnoDB in terms of scaling; both are better than the MEMORY engine. However, it's always a good idea to cache data that is frequently accessed and rarely changed.

Related

File based cache or many mysql queries

Let's assume you're developing a multiplayer game where the data is stored in a MySQL database - for example the names and description texts of items, attributes, buffs, NPCs, quests, etc.
That data:
won't change often
is frequently requested
is required on server-side
and cannot be cached locally (as JSON/JavaScript)
To solve this problem, I wrote a file-based caching system that creates .php files on the server and copies entire MySQL tables into them as pre-defined PHP variables.
Like this:
$item_names = Array(0 => "name", 1 => "name");
$item_descriptions = Array(0 => "text", 1 => "text");
That file contains a lot of data, ends up being around 500 KB in size, and is loaded on every user request.
Is that a good approach to avoid unnecessary queries, considering that the query cache is deprecated in MySQL 8.0? Or is it better to just fetch the data as needed using individual queries, even if that means ending up with hundreds of them per request?
I suggest you use some kind of PSR-6 compliant cache system (it could be filesystem-based as well); later, when your requests grow, you can easily swap it out for a more performant cache, like a PSR-6 Redis cache.
Example of a PSR-6 compatible file system cache.
More info about PSR-6 Caching Interface
Instead of building your own caching mechanism, you can use Redis, as it will handle all your caching requirements.
It is easy to implement.
Follow the links to get to know more about Redis
REDIS
REDIS IN PHP
REDIS PHP TUTORIALS
In my experience...
You should only optimize for performance when you can prove you have a problem, and when you know where that problem is.
That means in practice that you should write load tests to exercise your application under "reasonable worst-case scenario" loads, and instrument your application so you can see what its performance characteristics are.
Doing any kind of optimization without a load test framework means you're coding on instinct; you may be making things worse without knowing it.
Your solution - caching entire tables in arrays - means every PHP process is loading that data into memory, which may or may not become a performance hit in its own right (do you know which request will need which data?). It also looks like you'll be doing a lot of relational logic in PHP (in your example, gluing the item_name to the item_description). This is something MySQL is really good at; your PHP code could easily be slower than MySQL at joins.
Then you have the problem of cache invalidation - how and when do you refresh the cached data? How does your application behave when the data is being refreshed? I've seen web sites slow to a crawl when cached data was being refreshed.
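One common answer to the invalidation question is a TTL combined with explicit eviction on writes. A rough sketch under assumed names (dicts stand in for both the database and the cache):

```python
import time

TTL = 300  # seconds a cached value is trusted
db = {"item:1": "Sword"}   # stand-in for the real database
cache = {}                 # stand-in for Redis/memcached

def read(key):
    entry = cache.get(key)
    if entry is not None and time.time() - entry[1] < TTL:
        return entry[0]            # fresh enough: serve from cache
    value = db[key]                # expensive lookup in real life
    cache[key] = (value, time.time())
    return value

def write(key, value):
    db[key] = value
    cache.pop(key, None)           # invalidate so the next read refetches

read("item:1")          # populates the cache with "Sword"
write("item:1", "Axe")  # updates the db and evicts the stale copy
print(read("item:1"))   # -> Axe
```

Evicting on write avoids the "slow to a crawl during refresh" failure mode for single keys, though a whole-table refresh still needs care (e.g. building the new cache before swapping it in).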
In short - it's a complicated decision, there are no obvious right/wrong answers. My first recommendation is "build a test framework so you can approach performance based on evidence", my second is "don't roll your own - consider using an ORM with built-in cache support", my third is "consider using something like Redis or memcached to store your cache information".
There are many possible solutions, depending on your requirements. Possible approaches:
a) File-based caching in JSON format: data retrieved from the database is saved to a file before the program processes it, for reuse on later requests.
b) A memory-based cache, such as Memcached, APC, or Redis. Similar to the above, with better performance, but more integration code is required.
c) A memory-oriented NoSQL database, such as MongoDB.
d) Multiple database servers: one master database for writes with multiple slaves for reads, with synchronisation between the servers.
To keep things quick and minimise code changes, I suggest option (b).
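Option (a) can be sketched in a few lines; the file name, freshness window, and fetch function below are illustrative assumptions, not part of the original answer:

```python
import json
import os
import time

CACHE_FILE = "items.cache.json"  # hypothetical cache location
MAX_AGE = 600                    # seconds before the file is refreshed

def fetch_from_db():
    # Placeholder for the real database query.
    return {"0": {"name": "Sword", "description": "Sharp."}}

def load_items():
    if os.path.exists(CACHE_FILE):
        if time.time() - os.path.getmtime(CACHE_FILE) < MAX_AGE:
            with open(CACHE_FILE) as f:
                return json.load(f)  # fresh cache: no db round trip
    items = fetch_from_db()          # stale or missing: query and rewrite
    with open(CACHE_FILE, "w") as f:
        json.dump(items, f)
    return items

items = load_items()  # first call queries the db and writes the file
items = load_items()  # later calls read the file instead
```

Using the file's mtime as the freshness marker keeps the scheme stateless; a cron job or an explicit delete on data change works just as well.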

DSE Query optimization and use cases

Should I use Solr for all of my read activity and then Cassandra for all writes, to maximise the performance of DSE? Or can I read using Cassandra - though obviously on a key-value basis - for select activities?
Cassandra is a write-optimised database, so reads may be slow; Solr should be used as a crutch or a 'nitro boost', if you will, and not as the go-to method for reading. If your reads are slow, it may be because the DB design is fundamentally flawed, and that could be dangerous for scaling as well as maintenance.
Maximising the performance of DSE should be based on the pattern of your reads and writes. For example, if your users table is only used for login and a couple of other profile-related lookups, you don't need Solr for it. A few duplicate tables with different keys should suffice.
However if your app is an ERP that requires user data at all times, Solr indexing for faster reads should be considered.
And to reiterate, if your reads are slow, check if a better db design can solve the issue.

Best MySQL storage engine to use for PHP session storage

I want to use MySQL to store session variables. From what I understand this means that on every page request there will be one read and one write to the table.
Which MySQL storage engine is best suited to this task? MyISAM, InnoDB, MariaDB (which I don't see in phpMyAdmin), MEMORY, or something else entirely?
"Best" means nothing. You need to express your constraints: do you need consistency? Durability? High availability? Performance? A combination of all these properties? Can you afford to lose your sessions? Can they fit in memory? Do you need to support concurrent access to the same data?
Without more context, I would choose InnoDB which is the most balanced storage engine. It provides correct performance for OLTP applications, ACID transactions, good reliability, and sensible concurrency management. Session variables access will likely be done using primary keys, and this operation is very efficient with InnoDB.
Now if performance is really a constraint, I would rather use a NoSQL engine (i.e. not MySQL). To store session data, Redis usually does a very good job, and is easy enough to integrate and deploy.
The MEMORY storage engine sounds like the best option. Keep in mind that it is only good for temporary sessions.
http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html
It depends on how you evaluate "betterness":
MyISAM is the most common (many shared hosting packages only let you use MyISAM). It is rather limited in terms of relationship control, so you can set it up really fast and easily. If you want portability and fast implementation across multiple hosting scenarios, MyISAM is best.
InnoDB allows you to create relationships and safeguard data integrity by linking keys in different tables, which means more work but a much more professional DB design. Many shared hosting packages do not support InnoDB, so when exporting a table structure from one environment to another you might have some extra work to do. If you want relationship management and control, InnoDB is best.
As far as data portability is concerned, an InnoDB database will be accepted without complaint by MyISAM (because MyISAM does not check referential integrity: "is there a user number 4 in the user table when I insert a new record into user_car, for example?"). But if you start out with MyISAM, migrating to a fully fledged InnoDB database can be a nightmare: even if your data has all its keys, the table data must be imported in the correct order (user and car before user_car).
MariaDB? Never - simply because fewer people use it, so you will have less support compared to MyISAM and InnoDB. (Strictly speaking, MariaDB is a fork of the MySQL server, not a storage engine.)
Bottom line clincher: INNODB.
If you do not want the overhead of a SQL connection, consider using memcached session storage. See http://php.net/manual/en/memcached.sessions.php

Cassandra is much slower than Mysql for simple operations?

I see a lot of statements like: "Cassandra is very fast on writes" and "Cassandra's reads are really slower than its writes, but much faster than MySQL's".
On my Windows 7 system:
I installed MySQL with the default configuration.
I installed PHP5 with the default configuration.
I installed Cassandra with the default configuration.
Running a simple write test on MySQL, "INSERT INTO wp_test (id,title) VALUES ('id01','test')", gives me a result of 0.0002 s.
For 1000 inserts: 0.1106 s.
Running the same simple write test on Cassandra, $column_family->insert('id01', array('title' => 'test')), gives me a result of 0.005 s.
For 1000 inserts: 1.047 s.
In read tests I also found that Cassandra is much slower than MySQL.
So the question: does 5 ms for one write operation on Cassandra sound correct? Or is something wrong, and it should be at most around 0.5 ms?
When people say "Cassandra is faster than MySQL", they mean when you are dealing with terabytes of data and many simultaneous users. Cassandra (like many distributed NoSQL databases) is optimized for hundreds of simultaneous readers and writers on many nodes, as opposed to MySQL (and other relational DBs), which are optimized to be really fast on a single node but tend to fall to pieces when you try to scale them across multiple nodes. There is a generalization of this trade-off, by the way: the absolute fastest disk I/O is plain old UNIX flat files, and many latency-sensitive financial applications use them for that reason.
If you are building the next Facebook, you want something like Cassandra because a single MySQL box is never going to stand up to the punishment of thousands of simultaneous reads and writes, whereas with Cassandra you can scale out to hundreds of data nodes and handle that load easily. See scaling up vs. scaling out.
Another use case is when you need to apply a lot of batch processing power to terabytes or petabytes of data. Cassandra or HBase are great because they are integrated with MapReduce, allowing you to run your processing on the data nodes. With MySQL, you'd need to extract the data and spray it out across a grid of processing nodes, which would consume a lot of network bandwidth and entail a lot of unneeded complication.
Cassandra benefits greatly from parallelisation and batching. Try doing 1 million inserts on each of 100 threads (each with its own connection and in batches of 100) and see which one is faster.
Finally, Cassandra insert performance should be relatively stable (maintaining high throughput for a very long time). With MySQL, you will find that it tails off rather dramatically once the B-trees used for the indexes grow too large to fit in memory.
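The concurrency suggestion above has roughly the following shape; here a dummy function stands in for the real Cassandra or MySQL client, and the numbers are scaled down so the sketch runs quickly:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def insert_batch(batch):
    """Stand-in for a real batched INSERT; sleeps to mimic a round trip."""
    time.sleep(0.001)
    return len(batch)

# 1000 rows split into batches of 100, inserted from 10 worker threads.
rows = [("id%d" % i, "test") for i in range(1000)]
batches = [rows[i:i + 100] for i in range(0, len(rows), 100)]

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    inserted = sum(pool.map(insert_batch, batches))
elapsed = time.time() - start
print(inserted)  # -> 1000
```

Batching amortises the per-request round trip, and the thread pool keeps many requests in flight at once, which is where Cassandra's design pays off.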
It's likely that the maturity of the MySQL drivers, especially the improved MySQL drivers in PHP 5.3, is having some impact on the tests. It's also entirely possible that the simplicity of the data in your query is affecting the results - maybe with 100-value inserts, Cassandra becomes faster.
Try the same test from the command line and see what the timings are, then try with varying numbers of values. You can't do a single test and base your decision on it.
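One way to follow this advice is to repeat each timing several times and take the minimum, varying the payload size between runs. A sketch with a stand-in for the real insert:

```python
import timeit

def fake_insert(n_values):
    """Stand-in for an INSERT with n_values values per statement."""
    return sum(range(n_values))

# Time 1000 "inserts" per run, 5 runs per size; report the best run,
# since the minimum is the least noisy estimate of the true cost.
for n in (1, 10, 100):
    runs = timeit.repeat(lambda: fake_insert(n), number=1000, repeat=5)
    print(n, min(runs))
```

With the real clients substituted in, this makes the variation between runs visible instead of hiding it behind a single number.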
Many user space factors can impact write performance. Such as:
Dozens of settings in each of the database server's configuration.
The table structure and settings.
The connection settings.
The query settings.
Are you swallowing warnings or exceptions? On the face of it, the MySQL sample would be expected to produce a duplicate key error. It could be failing while doing nothing at all. What Cassandra would do in the same case isn't something I'm familiar with.
My limited experience of Cassandra tells me one thing about inserts: while the performance of everything else degrades as data grows, inserts appear to maintain the same speed. How fast it is compared to MySQL, however, isn't something I've tested.
It might not be so much that inserts are fast, but rather that Cassandra tries never to be slow. If you want a more meaningful test, you need to incorporate concurrency and more variation in scenarios, such as large data sets and various batch sizes. More complex tests might measure latency until data is available after an insert, and read speed over time.
It would not surprise me if Cassandra's first port of call for inserting data is to put it on a queue or simply append it. This is configurable if you look at consistency levels. MySQL similarly allows you to balance performance against reliability/availability, though each has variations on what it does and does not allow.
Outside of that unless you get into the internals it may be hard to tell why one performs better than the other.
I did some benchmarks of a use case I had for Cassandra a while ago. For the benchmark, it would insert tens of thousands of rows first. I had to make the script sleep for a few seconds, because otherwise queries run immediately afterwards would not see the data, and the results would be inconsistent between the implementations I was testing.
If you really want fast inserts, append to a file on ramdisk.

When to use Redis instead of MySQL for PHP applications?

I've been looking at Redis. It looks very interesting. But from a practical perspective, in what cases would it be better to use Redis over MySQL?
Ignoring the whole NoSQL vs SQL debate, I think the best approach is to combine them. In other words, use MySQL for some parts of the system (complex lookups, transactions) and Redis for others (performance, counters, etc.).
In my experience, performance issues related to scalability (lots of users...) eventually force you to add some kind of cache to remove load from the MySQL server, and redis/memcached is very good at that.
I am no Redis expert, but from what I've gathered, the two are pretty different. Redis:
Is not a relational database (no fancy data organisation)
Stores everything in memory (faster, uses less space, but probably less safe in case of a crash)
Is less widely deployed on various web hosts (if you're not hosting yourself)
I think you might want to use Redis when you have a small-ish quantity of data that doesn't need the relational structure that MySQL offers and requires fast access. This could, for example, be session data for a dynamic web interface that needs to be accessed often and fast.
Redis can also be used as a cache for MySQL data that is accessed very often (e.g. load it when a user logs in).
I think you're asking the question the wrong way around, you should ask yourself which one is more suited to an application, rather than which application is suited to a system ;)
MySQL is a relational data store. If configured (e.g. using innodb tables), MySQL is a reliable data-store offering ACID transactions.
Redis is a NoSQL database. It is faster (if used correctly) because it trades reliability for speed (it is rare to run with fsync on every write, as this dramatically hurts performance) and transactions (which can be approximated - slowly - with SETNX).
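The SETNX approximation mentioned here is usually used to build a lock. The sketch below models the idea in-process (a real client would use something like SET key value NX EX ttl against a live Redis server; the names are illustrative):

```python
import time

store = {}  # toy in-process stand-in for the Redis keyspace

def setnx(key, value):
    """SET-if-not-exists: returns True only for the first writer."""
    if key in store:
        return False
    store[key] = value
    return True

def with_lock(key, fn):
    while not setnx(key, "locked"):
        time.sleep(0.01)   # busy-wait; real code would back off or expire
    try:
        return fn()        # critical section runs under the lock
    finally:
        del store[key]     # release so other clients can proceed

result = with_lock("lock:counter", lambda: 40 + 2)
print(result)  # -> 42
```

This is exactly why the answer calls the approximation slow: every competing client spins on SETNX until the holder releases the key, rather than getting the atomicity of a real ACID transaction.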
Redis has some very neat features such as sets, lists and sorted lists.
These slides on Redis list statistics gathering and session management as examples. There is also a Twitter clone written with Redis as an example, but that doesn't mean Twitter uses Redis (Twitter uses MySQL with heavy memcached caching).
MySQL:
1) Structured data
2) ACID
3) Heavy transactions and lookups
Redis:
1) Unstructured data
2) Simple and quick lookups, e.g. the token of a session
3) Use as a caching layer
Redis and SQL (and NoSQL) databases have their own benefits and drawbacks, and often live side by side:
Redis - like local variables moved into a separate application
Easy to move to from local variables/a prototype
Persistent storage
Multiple users/applications all see the same data
Scalability
Failover
(-) Hard to run more advanced queries against the data
NoSQL
Dump raw data into the "database"
All/most of Redis's features
(-) Harder to do advanced queries, compared to SQL
SQL
Advanced queries across the data
All/most of Redis's features
(-) Need to fit data into a "schema" (think of an Excel sheet)
(-) A bit harder to get simple values in and out than with Redis/NoSQL
(Different SQL/NoSQL solutions vary. You should read up on the CAP theorem and ACID to see why one system can't give you all of these at once.)
According to the official website, Redis is an open source (BSD licensed), in-memory data structure store used as a database, cache, and message broker. In practice, Redis is an advanced key-value store. It is extremely fast, with amazingly high throughput: it can perform approximately 110,000 SETs and about 81,000 GETs per second. It also supports a very rich set of data types. Redis keeps the data in memory at all times, but can also persist it to disk. So it comes with a trade-off: amazing speed, with the data set size limited by available memory. In this article, to have some benchmarks for comparison with MySQL, we will use Redis as a caching engine only.
Read Here: Redis vs MySQL Benchmarks
