I've been looking at Redis. It looks very interesting. But from a practical perspective, in what cases would it be better to use Redis over MySQL?
Ignoring the whole NoSQL vs SQL debate, I think the best approach is to combine them. In other words, use MySQL for some parts of the system (complex lookups, transactions) and Redis for others (performance-critical paths, counters, etc.).
In my experience, performance issues related to scalability (lots of users...) eventually force you to add some kind of cache to take load off the MySQL server, and Redis/memcached are very good at that.
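As a rough illustration of the "counters in Redis, MySQL for the rest" split, here is a minimal sketch using the phpredis extension; the key name and the page_stats table are invented for the example:

```php
<?php
// Sketch: count page views in Redis, flush to MySQL from a periodic job.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Hot path: one atomic increment per view, MySQL is never touched.
$redis->incr('views:page:42');

// Cold path (e.g. a cron job): move the accumulated count into MySQL.
$views = (int) $redis->get('views:page:42');
if ($views > 0) {
    $pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $stmt = $pdo->prepare('UPDATE page_stats SET views = views + ? WHERE page_id = 42');
    $stmt->execute([$views]);
    $redis->del('views:page:42');   // note: increments between GET and DEL are lost in this naive sketch
}
```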
I am no Redis expert, but from what I've gathered, both are pretty different. Redis:
Is not a relational database (no fancy data organisation)
Stores everything in memory (faster, but limited by available memory and probably less safe in case of a crash)
Is less widely deployed on various webhosts (if you're not hosting yourself)
I think you might want to use Redis when you have a small-ish quantity of data that doesn't need the relational structure MySQL offers and that requires fast access. This could, for example, be session data in a dynamic web interface that needs to be accessed often and fast.
Redis could also be used as a cache for MySQL data that is going to be accessed very often (e.g. load it when a user logs in).
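A minimal cache-aside sketch of that idea, assuming the phpredis extension and a hypothetical users table; all names are illustrative only:

```php
<?php
// Sketch: serve a user profile from Redis, falling back to MySQL on a miss.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

function loadUser(Redis $redis, PDO $pdo, int $id): array
{
    $key = "user:$id";
    $cached = $redis->get($key);
    if ($cached !== false) {
        return json_decode($cached, true);          // cache hit
    }
    $stmt = $pdo->prepare('SELECT id, name, email FROM users WHERE id = ?');
    $stmt->execute([$id]);
    $user = $stmt->fetch(PDO::FETCH_ASSOC) ?: [];
    $redis->setex($key, 300, json_encode($user));   // cache for 5 minutes
    return $user;
}
```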
I think you're asking the question the wrong way around: ask which store is better suited to your application, rather than which application suits a given system ;)
MySQL is a relational data store. If configured (e.g. using innodb tables), MySQL is a reliable data-store offering ACID transactions.
Redis is a NoSQL database. It is faster (if used correctly) because it trades reliability for speed (it is rare to run with fsync on every write, as this dramatically hurts performance) and full transactions (which can be approximated - slowly - with SETNX).
Redis has some very neat features such as sets, lists and sorted lists.
These slides on Redis list statistics gathering and session management as examples. There is also a Twitter clone written with Redis as an example, but that doesn't mean Twitter uses Redis (Twitter uses MySQL with heavy memcached caching).
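As a rough illustration of the SETNX remark above, here is a sketch of that kind of "poor man's lock" using the phpredis extension; the key name is invented, and newer Redis versions let you do the set-and-expire atomically with SET NX EX instead:

```php
<?php
// Sketch: approximate a critical section with SETNX + EXPIRE.
// Not a production-grade lock: the two calls are not atomic.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

if ($redis->setnx('lock:account:17', getmypid())) {
    $redis->expire('lock:account:17', 10);   // avoid a lock getting stuck forever
    try {
        // ... do the non-atomic work that must not run concurrently ...
    } finally {
        $redis->del('lock:account:17');
    }
} else {
    // someone else holds the lock; retry later
}
```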
MySQL:
1) Structured data
2) ACID
3) Heavy transactions and lookups
Redis:
1) Unstructured data
2) Simple and quick lookups, e.g. a session token
3) Use it as a caching layer
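For point 2 (a session token), the whole interaction with Redis can be a couple of calls; a hedged sketch with phpredis and invented key names:

```php
<?php
// Sketch: store a session token with a sliding 30-minute expiry.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$userId = 42;                                             // whoever just logged in
$token  = bin2hex(random_bytes(16));
$redis->setex("session:$token", 1800, (string) $userId);  // write

// On later requests:
$owner = $redis->get("session:$token");
if ($owner !== false) {
    $redis->expire("session:$token", 1800);               // refresh the TTL
}
```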
Redis, SQL (+NoSQL) have their benefits+drawbacks and often live side by side:
Redis - Local variables moved to a separate application
Easy to move from local variables/prototype
Persistent storage
Multiple users/applications all see the same data
Scalability
Failover
(-) Hard to do more advanced queries on the data
NoSQL
Dump raw data into the "database"
All/most of Redis features
(-) Harder to do advanced queries, compared to SQL
SQL
Advanced queries between data
All/most of Redis features
(-) Need to place data into "schema" (think sheet/Excel)
(-) Bit harder to get simple values in/out than Redis/NoSQL
(Different SQL/NoSQL solutions vary. Read up on the CAP theorem and ACID to see why one system can't give you everything at once.)
According to the official website, Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache and message broker. In practice, Redis is an advanced key-value store. It is extremely fast, with impressively high throughput: it can perform approximately 110,000 SETs and 81,000 GETs per second. It also supports a very rich set of data types. Redis keeps the data in memory at all times, but can also persist it to disk, so it comes with a trade-off: amazing speed, with the dataset size limited by available memory. In this article, to have some benchmarks in comparison to MySQL, we will be using Redis as a caching engine only.
Read Here: Redis vs MySQL Benchmarks
Let's assume you're developing a multiplayer game where the data is stored in a MySQL database: for example, the names and description texts of items, attributes, buffs, NPCs, quests, etc.
That data:
won't change often
is frequently requested
is required on server-side
and cannot be cached locally (JSON, Javascript)
To solve this problem, I wrote a file-based caching system that creates .php files on the server and copies entire MySQL tables into them as pre-defined PHP variables.
Like this:
$item_names = Array(0 => "name", 1 => "name");
$item_descriptions = Array(0 => "text", 1 => "text");
That file contains a lot of data, ends up being around 500 KB in size, and is loaded on every user request.
Is that a good approach to avoid unnecessary queries, considering that the query cache is deprecated in MySQL 8.0? Or is it better to just fetch the data needed with individual queries, even if that means hundreds of them per request?
I suggest you use some kind of PSR-6 compliant cache (it can be filesystem-based as well); later, when your traffic grows, you can easily swap in a more performant backend, like a PSR-6 Redis cache.
Example of a PSR-6 compatible filesystem cache.
More info about PSR-6 Caching Interface
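A minimal PSR-6 sketch, assuming symfony/cache is installed for its FilesystemAdapter; the cache key and the loadItemsFromMysql() helper are invented for the example:

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\Cache\Adapter\FilesystemAdapter;

// Any PSR-6 pool works here; swapping to a Redis-backed adapter later
// only changes this one line.
$cache = new FilesystemAdapter();

$item = $cache->getItem('game.item_names');
if (!$item->isHit()) {
    $item->set(loadItemsFromMysql());   // hypothetical helper that queries MySQL
    $item->expiresAfter(3600);
    $cache->save($item);
}
$itemNames = $item->get();
```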
Instead of making your own caching mechanism, you can use Redis as it will handle all your caching requirements.
It will be easy to implement.
Follow the links to get to know more about Redis
REDIS
REDIS IN PHP
REDIS PHP TUTORIALS
In my experience...
You should only optimize for performance when you can prove you have a problem, and when you know where that problem is.
That means in practice that you should write load tests to exercise your application under "reasonable worst-case scenario" loads, and instrument your application so you can see what its performance characteristics are.
Doing any kind of optimization without a load test framework means you're coding on instinct; you may be making things worse without knowing it.
Your solution - caching entire tables in arrays - means every PHP process is loading that data into memory, which may or may not become a performance hit in its own right (do you know which request will need which data?). It also looks like you'll be doing a lot of relational logic in PHP (in your example, gluing the item_name to the item_description). This is something MySQL is really good at; your PHP code could easily be slower than MySQL at joins.
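To make the join point concrete, a hedged sketch: instead of pairing $item_names and $item_descriptions in PHP, let MySQL return the combined rows for just the items the request needs (the table and column names here are assumptions based on the question):

```php
<?php
// Sketch: one indexed, joined query instead of gluing two PHP arrays together.
$pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');

$stmt = $pdo->prepare(
    'SELECT i.id, i.name, d.text AS description
       FROM items i
       JOIN item_descriptions d ON d.item_id = i.id
      WHERE i.id IN (?, ?, ?)'        // only the items this request needs
);
$stmt->execute([3, 17, 42]);
$items = $stmt->fetchAll(PDO::FETCH_ASSOC);
```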
Then you have the problem of cache invalidation - how and when do you refresh the cached data? How does your application behave when the data is being refreshed? I've seen web sites slow to a crawl when cached data was being refreshed.
In short - it's a complicated decision, there are no obvious right/wrong answers. My first recommendation is "build a test framework so you can approach performance based on evidence", my second is "don't roll your own - consider using an ORM with built-in cache support", my third is "consider using something like Redis or memcached to store your cache information".
There are many possible solutions, depending on your requirements. Possible options:
A) File-based caching in JSON format. Data retrieved from the database is saved to a file for the next use, before the program processes it.
B) A memory-based cache, such as Memcached, APC or Redis. Similar to the solution above, with better performance but more integration code required.
C) A memory-oriented NoSQL database, such as MongoDB.
D) Multiple database servers: one master for writes with multiple slaves for reads, with synchronisation between the servers.
For something quick that minimises code changes, I suggest option B.
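If you go with option B, the integration can stay small; a sketch using PHP's Memcached extension, with an invented key and loader function:

```php
<?php
// Sketch: option B (memory-based cache) wrapped around the existing query.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$data = $mc->get('items:all');
if ($data === false) {                    // miss (or a genuinely false/empty value)
    $data = fetchItemsFromDatabase();     // hypothetical existing function
    $mc->set('items:all', $data, 600);    // keep for 10 minutes
}
```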
MySQL has memory-based storage engines, which keep the data in RAM.
As far as I know, there are two MySQL storage engines that use memory:
One is the MEMORY engine itself.
The not-so-great aspect of this engine is that it only creates in-memory tables, which means that if the server is restarted, the data is lost.
The other one is the Cluster (NDB) storage engine.
This doesn't have the drawback of the previous engine: it uses memory, but it also keeps a file-based record of the data.
Now the question is: if your database is already using RAM to store and process data, do you need to add another caching layer like Memcached to boost your product's performance?
How fast is a memory engined database compared to Memcached?
Does Memcache add any features to your products that a memory engined database doesn't?
Plus, a memory-engined database gives you more features, like being able to run queries, compared to Memcached, which will only give you raw data back; Memcached is kind of like a database engine that only supports a SELECT-by-key command.
Am I missing something?
It depends how you use memcached. If you use it to cache a rendered HTML page that took 30 SQL queries to build, then it will give you a performance boost even over an in-memory database.
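A sketch of that pattern, assuming the Memcached extension and a hypothetical renderPage() that runs the 30 queries:

```php
<?php
// Sketch: cache the finished HTML, not the individual query results.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$html = $mc->get('page:home');
if ($html === false) {
    $html = renderPage();             // hypothetical: the expensive 30-query build
    $mc->set('page:home', $html, 60); // serve the same HTML for the next minute
}
echo $html;
```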
A (relational) database and caching service are complementary. As pointed out, they have different design goals and use-cases. (And yet I find the core advantage of a database missing in the post.)
Memcached (and other caches) offer some benefits that can not be realized under an ACID database model. This is a trade-off, but such caches are designed for maximum distribution and minimum latency. Memcached is not a database: it is a distributed key-value store with some eviction policies. Because it is merely a key-value store, it can "skip" many of the steps in querying database data -- at the expense of only directly supporting 1-1 operations. No joins, no relationships, a single result, etc.
Remember, a cache is just that: a cache. It is not a reliable information store nor does it offer the data-integrity/consistency found in a (relational) database. Even cache systems which can "persist" data do not necessarily have ACID guarantees (heck, even MyISAM isn't fully ACID!). Some cache systems offer much stronger synchronization/consistency guarantees; memcached is not such a system.
The bottom line is, because of memcached's simple design and model, it will win on latency and distribution within the realm it operates in. How much? Well, that depends...
...first, pick approach(es) with the required features and guarantees and then benchmark the approaches to determine which one(s) are suitable (or "best suited") for the task. (There might not even be a need to use a cache or "memory database" at all or the better approach might be to use a "No SQL" design.)
Happy coding.
Memcached can store items of up to 1 MB each (by default). What it does is offload the db: you don't even have to connect to the db to ask it for data. The majority of websites only display a small amount of data to the user (in terms of textual data, not the files themselves).
So to answer: yes, it's a good idea to have Memcached too, since it lets you avoid connecting to the db at all, which removes some overhead right at the start.
On the other hand, there's a plethora of engines available for MySQL. Personally, I wouldn't use the MEMORY engine, for many reasons - one of them being the loss of data. InnoDB, the default MySQL engine as of recent releases, already keeps the working data set in memory (controlled by the innodb_buffer_pool_size variable), and it's incredibly fast if the dataset fits in memory.
There's also the TokuDB engine, which surpasses InnoDB in terms of scaling; both are better than the MEMORY engine. However, it's always a good thing to cache data that's frequently accessed and rarely changed.
I need to update a large db quickly. It may be easier to code in a scripting language but I suspect a C program would do the update faster. Anybody know if there have been comparative speed tests?
It wouldn't.
The update speed depends on:
database configuration (engine used, db config)
hardware of the server, especially the HDD subsystem
network bandwidth between the source and target machine
amount of data transferred
I suspect you think that a scripting language will be a hog in that last part - the amount of data transferred.
Any scripting language will be fast enough to deliver the data. If you have a large amount of data that you need to parse/transform quickly, then yes, C would definitely be the language of choice. However, if you're just sending simple string data to the db, there's no point - although it's not hard to write a simple C program for an UPDATE either; it's almost on par with using PHP's mysql_ functions from a "complexity" point of view.
Are you concerned about speed because you're already dealing with a situation where speed is a problem, or are you just planning ahead?
I can say comfortably that DB interactions are generally constrained by IO, network bandwidth, memory, database traffic, SQL complexity, database configuration, indexing issues, and the quantity of data being selected far more than by the choice of a scripting language versus C.
When you run into bottlenecks, they'll almost always be solved by a better algorithm, smarter use of indexes, faster IO devices, more caching... those sorts of things (beginning with algorithms).
The fourth component of LAMP is a scripting language after all. When fine tuning, memcache becomes an option, as well as persistent interpreters (such as mod_perl in a web environment, for example).
The majority of the cost in database transactions lies on the database side. The cost of interpreting/compiling your SQL statement and evaluating the query execution is much more substantial than any difference to be found in the language that sent it.
It is in rare situations that the application's CPU usage for database-intensive work is a greater factor than the CPU use of the database server, or the disk speed of that server.
Unless your applications are long-running and don't wait on the database, I wouldn't worry about benchmarking them. If they do need benchmarking, you should do it yourself. Data use cases vary wildly and you need your own numbers.
Since C's a lower-level language, it won't have the parsing/type-conversion overhead that the scripting languages will. A MySQL int can map directly onto a C int, whereas a PHP int has various metadata attached to it that needs to be populated/updated.
On the other hand, if you need to do any text manipulation as part of this large update, any speed gains from C would probably be lost in hairpulling/debugging because of its poor string manipulation support versus what you could do with trivial ease in a scripting language like Perl or PHP.
I've heard speculation that the C API is faster, but I haven't seen any benchmarks. For performing large database operations quickly, regardless of programming language, use Stored Procedures: http://dev.mysql.com/tech-resources/articles/mysql-storedprocedures.html.
The speed comes from the fact that there is a reduced strain on the network.
From that link:
Stored procedures are fast! Well, we can't prove that for MySQL yet, and everyone's experience will vary. What we can say is that the MySQL server takes some advantage of caching, just as prepared statements do. There is no compilation, so an SQL stored procedure won't work as quickly as a procedure written with an external language such as C. The main speed gain comes from reduction of network traffic. If you have a repetitive task that requires checking, looping, multiple statements, and no user interaction, do it with a single call to a procedure that's stored on the server. Then there won't be messages going back and forth between server and client, for every step of the task.
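As a hedged illustration of the network-traffic point, the repetitive per-row work can be moved server-side and invoked with a single round trip from PHP; the procedure and table names here are invented:

```php
<?php
// Sketch: one CALL replaces a client-side loop of per-row statements.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Client-side version: N round trips over the network.
// foreach ($accountIds as $id) {
//     $pdo->prepare('UPDATE accounts SET balance = balance * 1.01 WHERE id = ?')->execute([$id]);
// }

// Server-side version: a single round trip; the looping happens inside the procedure.
$pdo->exec('CALL recalculate_balances()');   // hypothetical stored procedure
```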
The C API will be marginally faster, for the simple reason that any other language (regardless of whether it's a "scripting language" or a fully-compiled language) will probably, at some level, be mapping from that language to the C API. Using the C API directly will obviously be a few dozen CPU cycles faster than performing a mapping operation and then using the C API.
But this is just spitting in the ocean. Even accessing main memory is an order of magnitude or two slower than CPU cycles on a modern machine and I/O operations (disk or network access) are several orders of magnitude slower still. There's no point in optimizing to make it a microsecond faster to send the query if it will still take half a second (or even multiple seconds, for queries which are complex or examine/return large amounts of data) to actually run the query.
Choose the language that you will be most productive in and don't worry about micro-optimizing language choice. Even if the language itself becomes a performance issue (which is extremely unlikely), your additional productivity will save more money than the cost of an additional server.
I have found that for large batches of data (gigabytes or more), it is commonly faster overall to dump the data from MySQL into one or more files on an application machine, process it there (with your favourite tool, here: Perl), and then use LOAD DATA LOCAL INFILE to slurp it back into a fresh table while doing as little as possible in SQL (a rough sketch follows the list). While doing that, you should:
remove indexes from the table before LOAD (may not be necessary for MyISAM, but meh).
always, ALWAYS load the data in PK order!
add indexes after being done with loading.
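A rough sketch of the load step only, assuming a MyISAM-style staging table where DISABLE/ENABLE KEYS applies; the table name, file path and connection details are invented for the example:

```php
<?php
$pdo = new PDO(
    'mysql:host=localhost;dbname=app',
    'user',
    'pass',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]   // LOCAL INFILE must be enabled on both ends
);

$pdo->exec('ALTER TABLE events_staging DISABLE KEYS');             // stop index maintenance during the load
$pdo->exec("LOAD DATA LOCAL INFILE '/tmp/events.sorted.csv'
            INTO TABLE events_staging
            FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");  // file pre-sorted in PK order
$pdo->exec('ALTER TABLE events_staging ENABLE KEYS');              // rebuild indexes once, at the end
```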
Another advantage is that it may be much easier to parallelize the processing on a cheap application machine with a bunch of fast-but-volatile disks rather than do concurrent writing to your expensive and non-scalable database master.
Either way. Large datasets usually mean that the DB is the bottleneck.
I'm using memcache for caching (obviously) which is great. But I'm also using it as a cross-request/process data store. For instance, I have a web chat on one of my pages and I use memcache to store the list of online users in it. It works great, but it bothers me that if I have to flush the whole memcache server (for whatever reason) I lose the online list. I also use it to keep a record of views for some content (I then periodically update the actual rows in the DB), and if I clear the cache I lose all data about views (since the last write to the db).
So what I'm asking is this: what should I use instead of memcache for these kinds of things? It needs to be fast and preferably store its data in memory. I think some NoSQL product would be a good fit here, but I've no idea which one. I'd like to use something that I could also use for other use cases in the future; analytics come to mind (what users search for the most, for instance).
I'm using PHP so it has to have good bindings for it.
Redis! It's like memcache on steroids (the good kind). Here are some drivers.
Redis is an advanced key-value store. It is similar to memcached, but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All these data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server-side union, intersection, and difference between sets, and so forth. Redis also supports various kinds of sorting.
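Applied to the online-users list from the question, a minimal sketch with the phpredis extension (the key name is invented):

```php
<?php
// Sketch: keep the chat's online list in a Redis set instead of memcache.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$redis->sAdd('chat:online', 'alice');      // user joins
$redis->sAdd('chat:online', 'bob');
$redis->sRem('chat:online', 'alice');      // user leaves

$online = $redis->sMembers('chat:online'); // ['bob']
// Unlike memcached, this survives a restart if Redis persistence (RDB/AOF) is enabled.
```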
You could try memcachedb. It uses exactly the same protocol as memcache, but it's a persistent store.
You could also try Cassandra or Redis.
An SQL database is overkill if your storage needs are small. When I was young and dumb, I used a text file and flock()ed it when I needed to access it. This doesn't scale, but I still feel that non-database solutions have been completely ignored in Web 2.0.
Does anyone not use an SQL database for storage? What are the alternatives?
There are a lot of alternatives. But with SQLite, which gives you SQL power combined with the no-fuss nature of file-based storage, there is no need to look for them. SQLite is light enough to be used in cell phones and MP3 players, so I don't see how it could be considered overkill.
So unless your application needs something very specific, don't bother. Most alternatives are a lot harder to use and have less performance.
SQLite is invented for this.
It's just a flat file that contains a complete SQL database. You can query, update, insert and delete, there's little to no overhead in installation, and all you need is the driver (which comes standard in PHP).
SQLite is a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.
Kind of weird that nobody mentioned this already?
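A minimal sketch of using it from PHP via PDO (the file path is invented):

```php
<?php
// Sketch: a complete SQL database living in one file, no server process to run.
$db = new PDO('sqlite:/var/www/data/app.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)');

$stmt = $db->prepare('INSERT INTO notes (body) VALUES (?)');
$stmt->execute(['hello from a flat file']);

foreach ($db->query('SELECT id, body FROM notes') as $row) {
    echo $row['id'] . ': ' . $row['body'] . "\n";
}
```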
CouchDB (http://couchdb.apache.org/index.html) is a non-SQL database and seems to be a popular project these days, as well as Google's Bigtable, or GT.M (http://sourceforge.net/projects/fis-gtm), which has been around forever.
Object databases abound as well; db4objects (http://www.db4o.com/), ZODB (http://www.zope.org/Products/StandaloneZODB), just to name a few.
All of these are supposedly faster and simpler than traditional SQL databases for certain use cases, but none approach the simplicity of a flat file.
A distributed hash table like Google Bigtable or Hadoop is a simple and scalable non-SQL database and often suits websites far better than an SQL database. SQL is great for complex relational data, but most websites don't have this requirement. Most websites store and retrieve data in a few forms and don't need to run complex operations on the data.
Take a look at one of these solutions as they will provide all of the concurrent access that you need but don't subscribe to the traditional ideas of data normalisation. They can be thought of as pretty analogous to a bunch of named text files.
It probably depends how dynamic your web site is. I used wiki software once that used RCS to check text files in and out. I wouldn't recommend that solution for something that gets as many updates as Stack Overflow or Wikipedia. The thing about databases is that they scale well, and the database engine writers have figured out all the fiddly little details of simultaneous access, load balancing, replication, etc.
I would say that it doesn't depend on whether you store less or more information; it depends on how often you request the stored data. Database managers are superb at caching queries, so they are often the better choice performance-wise. However, if you don't need a dynamic web page and are just loading static data, maybe a text file is the better option. Whatever format the data is stored in (e.g. XML, JSON, key=value pairs) doesn't matter much - it's I/O operations that are performance heavy.
When I'm developing web applications, I always use an RDBMS as the primary data holder. If the web application doesn't need to serve dynamic data on every request, I simply apply caching, storing the data in a cache file that gets served as long as no new data has been added to the primary data source (the RDBMS).
I wouldn't choose whether to use an SQL database based on how much data I wanted to store - I would choose based on what kind of data I wanted to store and how it is to be used.
Wikipedia defines a database as "a structured collection of records or data that is stored in a computer system", and I think your answer lies there: if you want to store records such as customer accounts, access rights and so on, then a DB such as MySQL or SQLite or whatever is not overkill. They give you a tried and trusted mechanism for managing those records.
If, on the other hand, your website stores and delivers unchanging file-based content such as PDFs, reports, mp3s and so on then simply storing them in a well-defined directory layout on a disk is more than enough. I would also include XML documents here: if you had for example a production department that created articles for a website in XML format there is no need to put them in a DB - store them on disk and use XSLT to deliver them.
Your choice of SQL or not will also depend on how the content you wish to store is to be retrieved. SQL is obviously good for retrieving many records based on search criteria whereas a directory tree, XML database, RDF database, etc are more likely to be used to retrieve single records.
Choice of storage mechanism is very important when trying to scale high-traffic site and stuffing everything into a SQL DB will quickly become a bottleneck.
It depends what you are storing. My blog uses Blosxom (written in Perl but a similar thing could be done for PHP) where each individual entry is a separate text file. The first line is plain text (the title) and the rest is unrestricted HTML. Following a few simple rules, these are rendered to form a simple but effective blogging framework.
It does have drawbacks but it also means that each post is a discrete file, which works well for updating on a local machine and then publishing to a remote web server. This is limited when it comes to efficient querying though, so certainly not a good choice if you want fine-grained control and web-based interaction with your data.
Check CouchDB.
I have used LINQ to XML as a data source in a .NET project. It was a small solution, and used caching to mitigate performance concerns. I would do it again for the quick site that just needs to keep data in a common place without increasing server requirements.
Depends on what you're storing and how you need to access it. Generally sql provides great reporting and manual management ability. Almost everything needs some way to manage what's stored and report on it.
In Perl I use DBM or Storable for such tasks. A DBM file updates automatically when the tied variable is updated.
One level down from SQL databases is an ISAM (Indexed Sequential Access Method) - basically tables and indexes but no SQL and no explicit relationships among tables. As long as the conceptual basis fits your design, it will scale nicely. I've used Codebase effectively for a long time.
If you want to work with SQL-database-type data, then consider FileMaker.
A simple answer is that you can use any data storage format, from a defined standard, to a database (which generally involves a protocol), to even a bespoke file format.
There are trade-offs for every choice you make in IT, and websites are certainly no different. In the early 2000s, file-based forum systems were popular, as they allowed anyone with limited technical ability to edit pages and posts. Completely static sites swiftly become unmanageable and their content does not benefit from upgrades to the site user interface; however, a correctly coded site can simply be moved to a sub-directory or ripped into the new design. CMSs and dynamic systems bring their own set of problems, namely that there does not yet exist a widely adopted standard for data storage among them, and that they often rely on third-party plugins to provide features between design styles (despite their documentation advocating for separation of function and form).
In 2016, it's pretty uncommon not to use a standard storage mechanism such as a *SQL RDBMS, although static site generators such as Jekyll (which powers a lot of GitHub Pages) and independent players such as October CMS still provide for static file-based storage.
My personal preference is to use an *SQL enabled RDBMS, it provides me syntax that is standardised at least at the vendor level, familiar and powerful syntax, but unlike a lot of people I don't think this is the only way, and in most cases would advocate for using a site-generator to save parts that don't have to be dynamic to a static store as this is the cheapest way to live on the web.
TL;DR: it's up to you; SQL and RDBMS-backed storage are popular.
Well, this is a bit of an open-ended question from the OP and there are two questions ... around SQL alternatives and non-SQL.
In general, in the "Why is SQL good" category... it's a mature and robust standard that provides referential integrity. Java JDBC supports it fully, as do tools like TOAD, and there are many SQL implementations, such as the SQLite referenced earlier.
Now, "for a web-site" on its own is not particularly indicative of anything. Does a web-site need referential integrity? Maybe. If the nature of the web-site is largely unstructured content, then one can consider pretty much any kind of persistent storage, from so-called "NoSQL" databases like AWS DynamoDB to Mongo (not a fan, though).
For managing the complexities of SQL stores - one suggestion versus a list of every persistence store ever created ... is AWS Aurora (part of RDS service). It is multi-region active-active and fully MySQL-compliant. JDBC/ODBC based driver frameworks would work out-of-the-box and it effectively offers "zero administration".
I would check out XML if I were you. See the w3schools XML tutorial section. Tons of possibilities without using an SQL database.