I am thinking about using a NoSQL database (MongoDB) paired with memcached to store sessions within my webapp. The idea is that on each page load, the user data is compared to the data in memcached, and if something has changed, the data is written to both memcached and MongoDB. This way reads are greatly reduced and memcached is used for what it does best.
However, I am a bit concerned about using a non-ACID database for session storage, especially with the memcached layer in front of it. Say something goes wrong while writing the session to the DB, and our users get an instant headache wondering why the product they put in the cart doesn't show up...
What's an appropriate approach to this? Should we go for MySQL session storage, or is it fine to use a non-ACID database for sessions?
Thanks!
I'm using MongoDB as session storage currently. It is possible to avoid the race conditions mentioned by pilif. I found a class that implements a session handler for MongoDB (http://www.jqueryin.com/projects/mongo-session/) and forked it on GitHub to suit my needs (http://github.com/halfdan/MongoSession).
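For reference, here is a minimal sketch of what such a handler can look like, assuming the mongodb/mongodb library and an illustrative "myapp.sessions" collection; the names are mine, not taken from the linked projects:

<?php
// Minimal sketch of a MongoDB-backed session handler (illustrative names only).
$client     = new MongoDB\Client('mongodb://localhost:27017');
$collection = $client->myapp->sessions;

session_set_save_handler(
    function ($savePath, $name) { return true; },                     // open
    function () { return true; },                                     // close
    function ($id) use ($collection) {                                // read
        $doc = $collection->findOne(['_id' => $id]);
        return $doc ? (string) $doc['data'] : '';
    },
    function ($id, $data) use ($collection) {                         // write (upsert)
        $collection->replaceOne(
            ['_id' => $id],
            ['_id' => $id, 'data' => $data, 'updated_at' => new MongoDB\BSON\UTCDateTime()],
            ['upsert' => true]
        );
        return true;
    },
    function ($id) use ($collection) {                                // destroy
        $collection->deleteOne(['_id' => $id]);
        return true;
    },
    function ($maxLifetime) use ($collection) {                       // gc
        $cutoff = new MongoDB\BSON\UTCDateTime((time() - $maxLifetime) * 1000);
        $collection->deleteMany(['updated_at' => ['$lt' => $cutoff]]);
        return true;
    }
);
session_start();

Note that a handler like this does no locking, so the race conditions discussed elsewhere on this page still apply to concurrent requests.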
If you don't want to lose your data, stick with ACID tested databases.
What's the payoff you're looking for?
If you want a secure system, you can't trust anything from the user, save for perhaps selected integers, so letting them store the information is typically a really bad idea.
I don't see the payoff for storing sessions outside of your MySQL database. You can cron a cleanup job on the tables if that's your concern, but why bother? Some users will shop on a site, get distracted for a while, and come back a day or two later.
If you use cookies or something really temporary to store their session info, there is a good chance their shopping time was wasted. Users really value their time... so if you store their session info in the database, you can write something sexy to manage that data.
Plus, a nice side effect of this is that you'll generate a lot of residual information about what people like on your website that wouldn't otherwise be available to you later on. You could even treat some of it like a poll, where the items people add to their cart can inform how you manage your business, order inventory, or focus your marketing.
If you go with something really temporary then you lose out on getting residual benefits.
Without any locking on the session, be really, really careful about what you are storing. Never, ever store anything that depends on what you have read before, as the data might change between your read and your write - especially in the case of Ajax, where multiple requests can go out at once.
An example of what you must not store in a non-locked session is a shopping cart: to add a product, you have to read, unserialize, add the product and then serialize again. If any other request does the same thing between the first request's read and write, you lose the second request's data.
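To make the failure mode concrete, a sketch of the unsafe read-modify-write (the cart structure is purely illustrative):

<?php
// Unsafe on a non-locked session: two overlapping requests can both read the
// same cart, each add their own item, and the later write silently drops the other.
session_start();                          // with a non-locking handler this does NOT serialize requests
$cart   = $_SESSION['cart'] ?? [];        // 1. read
$cart[] = $_POST['product_id'];           // 2. modify
$_SESSION['cart'] = $cart;                // 3. write - may overwrite a concurrent request's change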
Have a look at this article for details: http://thwartedefforts.org/2006/11/11/race-conditions-with-ajax-and-php-sessions/
Keep sessions on your filesystem (where PHP locks them for you), keep them in your database (where you have to do the locking manually), or never, ever write anything of value to your session if that value is derived from a previous read.
When using memcached as a cache in front of the database, it is you who has to ensure data consistency between the database and the cache. If you want to scale up and add more servers, there is a real chance of getting out of sync with the database, even if everything seems OK.
Instead you may consider Hazelcast. As of 1.9 it also supports the memcache protocol. Unlike memcached, Hazelcast asks you to implement a map persister (MapStore), and it then updates the database itself for the entries that changed. This way you don't have to handle the "check cache, and if the data changed, update the database" kind of logic yourself.
If you write your app so that the user stores all session information client side, and you just verify that information as needed, you won't need to worry about sessions on the server side. This is one of the principles of REST-style architecture. For instance, if the user requests adding an item to their shopping cart, just store the item ID list and count on the client side. When they hit the cart page, you can easily look up the item information from the list of item IDs they tell you are in their cart.
During checkout, go directly against the database with transactions to ensure you aren't getting any race conditions, and check your live inventory. If inventory isn't there when they go to check out, just say, "sorry, we just sold out". Of course, at that point you should go update any caches you have out there that are telling people you have inventory.
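A sketch of that checkout step, assuming PDO/MySQL; the "products" table, its "stock" column, and the $itemId/$quantity values (taken from the client-sent cart) are assumptions for illustration:

<?php
// Decrement stock atomically inside a transaction, checking live inventory first.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$pdo->beginTransaction();
try {
    $stmt = $pdo->prepare('SELECT stock FROM products WHERE id = ? FOR UPDATE');
    $stmt->execute([$itemId]);
    $stock = (int) $stmt->fetchColumn();

    if ($stock < $quantity) {
        $pdo->rollBack();
        exit('Sorry, we just sold out.');
    }

    $pdo->prepare('UPDATE products SET stock = stock - ? WHERE id = ?')
        ->execute([$quantity, $itemId]);
    // ... insert the order rows here ...

    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}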
I would look at how much a user costs to acquire and then ask what it costs to implement a really good system. Keep in mind that users are a biological retry mechanism: "I'm bored... press reload again..." While this isn't the most perfect solution, it is sometimes acceptable compared with the cost of "never lose anything - ever".
If you want additional security, you can have your sessions cached to a separate set of memcache servers so there are no accidental flushes. :)
There are a number of other systems - membase.org, and some other persistent memcache solutions (Java implementations) - that will persist storage to disk. If you want to modify your client somewhat, or how you access memcache, you can do your own replication of memcache session objects.
-daniel
I am using the cache to store our carts. As more carts are added, the system is becoming extremely slow when fetching a single cart.
I use the code below to access a single cart, and I suspect the problem is that a lot of information is stored under one key in the cache.
$carts = Cache::get("carts");
$cart = $carts[$this->reference];
I understand we could use unique keys such as carts:uniqueId, but I need to be able to access all of the carts and loop through them, so I feel this solution doesn't work on its own.
I'd appreciate any advice on how I can solve this issue.
Edit:
As per ka_lin's comment: I need to loop through the carts and bookings so that I can check whether the booking date is still available. If it is no longer available, I need to clear the selected date and also emit an event to alert the user.
I would prefer to keep the data on the server side, as this is built behind an API in order to make integration easier for partners.
If you need to save the carts on the server for some reason, you could always use the database for that.
If you are storing them because users should not lose items when leaving the page, I would recommend using JavaScript and storing them with localStorage.
EDIT:
It sounds like you are using the cache for something that should be handled by a database.
What happens behind the scenes is this: a session-based cart in Laravel gets stored in the filesystem, and without Memcached/APC installed the cache does too, since Laravel's default cache driver is the filesystem. So ultimately you are doing the same thing.
Personally, I don't think that merely using the cart in many places on the website slows the whole site down. There must be another factor that is slowing your website down.
To resolve the issue I created a table called CacheKeys, where I save the unique cart IDs, which I can then loop through to fetch all carts/bookings.
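Roughly, that approach might look like this; CacheKeys is a hypothetical Eloquent model with a "key" column, and the key scheme and TTL are assumptions:

<?php
// One cache entry per cart, plus a CacheKeys row so every cart can be found later.
Cache::put("carts:{$cart->reference}", $cart, now()->addHours(2));
CacheKeys::firstOrCreate(['key' => "carts:{$cart->reference}"]);

// Later: loop over every cart without loading them all under a single giant key.
foreach (CacheKeys::pluck('key') as $key) {
    $cachedCart = Cache::get($key);
    if ($cachedCart === null) {
        continue; // expired or flushed
    }
    // ... check whether the booking date is still available, emit an event if not ...
}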
The mindset with which you approached the problem is correct. The implementation, however, brings lots of pain.
Enter: Lada cache package
In such systems, what's truly slow is the mountain of SQL queries hitting the database. Lada cache helps by automatically saving your query results in Redis and invalidating them when the underlying data is updated.
We've seen queries drop from 2s to 0.5s in production environments, an effective 4x gain.
RTFM: read the manual CAREFULLY, especially the part that says all your models must include the Spiritix\LadaCache\Database\LadaCacheTrait trait, and enjoy a hassle-free cache system.
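A minimal sketch of what that looks like on an Eloquent model (the Cart model name is just an example):

<?php
use Illuminate\Database\Eloquent\Model;
use Spiritix\LadaCache\Database\LadaCacheTrait;

class Cart extends Model
{
    // Without this trait on every model, Lada cache cannot track and
    // invalidate the queries that touch this model's table.
    use LadaCacheTrait;
}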
NOTE: Your backend should be the only system handling your queries against your database.
On my website, users can currently compare multiple companies for a service.
For each company, I calculate the price of the service.
To calculate this price, I have a very large SQL query that filters companies based on the user's previous input and fetches every parameter I need.
Once the query ends, I loop through the company list and calculate the price for each of them and for each additional service the company offers. Then I display those values to the user in HTML.
The user can then add or remove options and place an order with a company.
Then, when a user chooses a price, I send the company_id along with the user's options (the different services they chose) to the server and get the previously calculated price from the user's SESSION.
Prices are stored in the user's session in order to avoid recalculating them, but I have about 6 prices per company and usually about 30 companies for one user request, which means I store in the session an array with around 180 different prices for one user.
I find it quite wasteful to store this many values in the session, and I was wondering: is there a better way to store them? Should I store them in the database?
By the way, on the server side I'm using PHP, with MySQL for the database.
What you're effectively doing is a very primitive form of caching. Sadly, the session is not the best place to do so, for a variety of reasons:
The session can never be shared between users, yet some cached values may be the same for every user. It's good to have a cache that allows you to go the "unique" or "shared" route at will.
Is your session-cached data used on every page? If it is, then forget this point. If it isn't, you're still incurring the cost of fetching the data (which, depending on your server configuration, may involve a few filesystem calls, network calls, or a combination) and deserializing it on every request. If your session payload is large, this can make a significant difference to load times.
Another point to consider is the simple fact that, if you are running a straight-out-of-the-box LAMP stack and have not configured a shared session driver, you're going to find a very nasty surprise when you scale out :-)
Before we go any further, ask yourself these questions:
Do the values in the session change on a user-by-user basis? And if they do, is it by a fixed amount or percentage?
Do the values in the session change often?
How are the values calculated?
If #1 is "No"
You are better off caching one copy and using it for every user.
If #2 is "No"
You are better off denormalizing (i.e. pre-calculating) the values in the database
If #3 is a complicated formula
See #2
In every case
MySQL is a very poor cache driver and should be avoided if you can. Most people replace it with Redis, but that's not a great option either, due to its inability to scale by itself (you need nutcracker/twemproxy to make it shard properly). Either way, if the data to be cached is relatively small, MySQL will do. However, plan for the future.
If you do plan on going the proper cache way, consider a key-value store such as Riak, or a document storage driver such as Aerospike.
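As a rough illustration of the "shared" route with a key-value store (here Memcached, purely as an example): calculatePrices() stands in for the existing SQL query and loop, and the key scheme and TTL are assumptions.

<?php
// Cache the computed price list per (company, selected options) so it can be
// shared between users instead of duplicated in every session.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$key    = 'prices:' . $companyId . ':' . md5(json_encode($selectedOptions));
$prices = $memcached->get($key);

if ($prices === false) {
    $prices = calculatePrices($companyId, $selectedOptions); // the expensive part
    $memcached->set($key, $prices, 600);                     // 10 minutes
}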
Imagine a local Groupon clone. Now imagine a deal that attracted 10x the normal number of visitors, and because visitors were trying to buy the deal in parallel, the MySQL database went down and the deal's maximum purchase limit was exceeded.
I'm looking for best practices in payment processing for highly loaded websites that have to handle parallel payments for a limited quantity of products.
For now the simplest option seems to be locking/unlocking the deal while a customer is trying to purchase it on a third-party payment processor's page.
Any thoughts?
I was with you until you started to talk about a 3rd-party payment processor's page. It's hard to control your user's experience while dishing them off to a 3rd-party site, because you have no idea what they're doing while they're there, whether they got side-tracked, how long they're going to take to finish the transaction, whether they finished the transaction at all, etc.
If processing payments locally is not an option, that's not necessarily a problem - it just presents an issue with how you have to actually think about handling your transactions.
So, if it were me - not thinking about the 3rd party right now; we'll set that aside for a minute - obviously, I'd first make sure my MySQL database was resilient enough not to go down, because that creates a huge problem for reconciling transactions. But things happen, so you need a backup.
My suggestion would be to utilize a caching system which keeps track of the product and the current number of products available. Memcache could be good for this, as it's just a single record which will be pretty easy to grab. You wouldn't have to hit the database at all to get info on your product's availability, and if the database went down, your users/application would be none the wiser, as you'd be getting info about your item straight from Memcache (no MySQL required).
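For example (key names are illustrative, and loadProductFromMysql() is a hypothetical helper):

<?php
// Serve product availability straight from Memcached, falling back to MySQL only on a miss.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$product = $memcached->get("product:$productId");
if ($product === false) {
    $product = loadProductFromMysql($productId);
    $memcached->set("product:$productId", $product); // warm the cache for the next request
}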
This presents an issue (when the database goes down) with storing payment records. When you collect money, you obviously need that transaction information in your database, and if your database is down - well, that's a problem. Memcache is not such a great solution for this, because you're limited to the size of your value and you must know about every key you care about. On top of that, Memcache doesn't have sets or set operations, so you can't append to a value without fear of nuking some data.
So, lets add a different piece of technology, Redis.
A solution for the transaction problem would be to write them to Redis in the event that your MySQL server is not available (or write to both if you really want to, but you don't really need to do that). Then have a background process that knows how to fetch the transaction details from Redis and write them to your MySQL table(s) when the database comes back online. Redis is pretty resilient to crashing and is capable of operating at huge volumes. It also has set operations, so you can easily append data to a set without fear of a race condition during your read/change/write operations.
So, you could store all your transactions in a Redis key as a single set (store them as JSON strings if you like; that'd be pretty easy), and when your DB crashes you can just fetch that data from Redis and write it to MySQL when it comes back online.
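A sketch of that fallback path, assuming the phpredis extension; the key name, the payload fields, and the saveTransactionToMysql() helper are assumptions:

<?php
// Park the transaction in a Redis set when MySQL is unavailable; a background
// worker replays it into MySQL once the database is back.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$transaction = json_encode([
    'product_id' => $productId,
    'user_id'    => $userId,
    'amount'     => $amount,
    'created_at' => time(),
]);

try {
    saveTransactionToMysql($transaction);               // normal path
} catch (PDOException $e) {
    $redis->sAdd('pending_transactions', $transaction); // set append, no read/change/write race
}

// Background worker, run once MySQL is healthy again:
while ($tx = $redis->sPop('pending_transactions')) {
    saveTransactionToMysql($tx);
}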
To keep things simple: if you were going to use Redis to store transactions, you may as well also use it to store your product cache instead of memcache - keep the stack simple.
This takes care of not accessing the database for your Product details, and also keeping track of your (potentially) missed transactions, should MySQL crash. But it doesn't handle the problem of keeping track of product inventory while new transactions come in while MySQL is down, and ensuring that you don't over-sell product.
To handle this case, when a transaction is saved, you can decrement the # of products available (keep it as a flat number, so you're not constantly re-calculating it on page-load). This will tell you instantly if the product is oversold or not. However, what this does not do is protect the time that the "product is in the cart." Once the user puts the product in the cart (which you've allowed because you said you have the inventory), you have the problem of making sure it doesn't sell out before they check out.
The solution to this problem also doubles as your solution to the 3rd-party transaction problem. So you're using a caching mechanism for your products and a fall-back mechanism for your transactions. What you should do now is, when a user tries to buy a product (either puts it in the cart or is shot off to the 3rd-party processor), create a "product reservation" for them. It's probably easiest to make a Redis entry for each of these. Give product reservations an expiry time, say 5 or 10 minutes, maybe even 15 if you like. Every time you see a user on your site, refresh the timeout to make sure they don't run out of time (you can put more logic into this if you desire, obviously). When a transaction is completed and changes from pending to paid, you create your transaction record (MySQL or Redis, depending on database availability), decrement your available quantity, and delete the reservation record.
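The reservation itself can be as simple as a key with a TTL, again sketched with phpredis and illustrative key names:

<?php
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Reserve one unit for this visitor for 10 minutes; refresh the TTL on activity.
$redis->setex("reservation:$productId:$sessionId", 600, 1);

// Quantity effectively for sale = cached stock minus live (un-expired) reservations.
$reserved  = count($redis->keys("reservation:$productId:*")); // fine for small keyspaces; use SCAN at scale
$available = (int) $redis->get("stock:$productId") - $reserved;

// Later, when the payment goes from pending to paid:
$redis->decr("stock:$productId");                        // record the sale against stock
$redis->del("reservation:$productId:$sessionId");        // and release the reservation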
You'd then use your available quantity information, in addition to your un-expired reservation information, to determine the quantity available for sale. If this number ever drops to zero, then you are effectively sold out; but if a certain number of your users don't convert, their expired reservations free up the inventory they didn't buy, allowing you to rinse and repeat that process until you are, in fact, sold out.
This is a pretty long explanation of a fairly robust system, and if you ever run into the situation where your MySQL server crashed, AND redis crashed, you'd be kind of screwed; so it makes sense to have a failover of both of those systems here (which is entirely feasible and possible). It should make for a pretty rock solid checkout/inventory management process.
Hope it helps.
Use a master-slave MySQL configuration with separate read/write connections.
Use caching as much as possible (Redis is a good idea).
Try to put some logic into Redis itself, so you avoid extra round trips to MySQL, and it will be faster.
For transactions it may be wise to use some kind of message queuing system (RabbitMQ); it will allow you to push some tasks into the background.
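For instance, with php-amqplib the web request only publishes the task and returns, while a separate worker consumes the queue and does the slow payment/DB work; the queue name and payload fields here are assumptions:

<?php
require 'vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel    = $connection->channel();
$channel->queue_declare('payments', false, true, false, false); // durable queue

$msg = new AMQPMessage(
    json_encode(['deal_id' => $dealId, 'user_id' => $userId]),
    ['delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT]
);
$channel->basic_publish($msg, '', 'payments');

$channel->close();
$connection->close();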
Despite all this optimization, you will have big problems if the DB, the cache engine, or the MQ fails. But using a master-slave setup for all these services, you will be somewhat on the safe side, i.e. using multiple machines that can continue to work if another machine fails.
And that brings me to the next idea: cloud services with auto scaling (like AWS).
Have you considered a Compensating Service Transaction?
I am considering enabling Memcache support for my large-scale REST service. However I have some questions regarding best approaches for these key-value stores.
The setup:
A database wrapper which has functions for select, update, etc.
A REST framework which contains all the API functions (getUser, createUser, etc.)
In my head, the ideal approach would be to integrate Memcache into the database wrapper so that, for example, every SQL query gets md5-hashed and its result saved in the cache (this is, by the way, what most online resources suggest). However, there is an obvious problem with this approach: if a search query has been cached, and one of the users in the search result is updated after the result was cached, this won't be reflected in the next request (because the stale result is still in the cache).
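In code, that common suggestion looks roughly like this; cachedSelect() is a hypothetical helper standing in for the wrapper's select method, and it is exactly the variant that can serve stale results after an update:

<?php
function cachedSelect(PDO $pdo, Memcached $memcached, $sql, array $params = [])
{
    $key  = md5($sql . serialize($params));
    $rows = $memcached->get($key);
    if ($rows !== false) {
        return $rows;                         // served from cache, possibly stale
    }

    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $memcached->set($key, $rows, 60);         // a short TTL limits, but does not remove, staleness
    return $rows;
}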
As I see it, I have several ways of handling this:
Implement Memcache in the REST framework for each function (getUser, createUser, etc.) and thereby explicitly handle updating the cache when users get updated. This could end up in redundant code.
Let the cached values expire very quickly and live with the fact that some requests show old cached values.
Do a more advanced implementation of Memcache in the database wrapper, so that I can identify which parts (e.g. users) to update in, say, a search request.
Could you guide me on which of these approaches, or a completely different one, to take?
Thanks in advance.
Enabling cache for a web application is not something to take lightly.
Maybe you have done that already, but... I recommend you first come up with a goal based on business needs or a forecast (e.g. must accept 1,000 requests per second), then properly stress-test your system to have numbers before you start changing anything, and then identify your bottleneck.
http://en.wikipedia.org/wiki/Performance_tuning
I usually use profiling tools such as XHProf (by Facebook).
https://github.com/facebook/xhprof
Caching all your data to mirror your database might not be the best approach.
Find out how much memory you can allocate for your cache. If your architecture only allows you to allocate 100 MB for memcache, that will affect your decisions about what you cache and how long you cache it.
The best cache is to cache forever. But we all know that data changes. You can start by caching data that is requested often and requires the most resources to fetch.
Always make sure you are not spending effort on something that will yield little improvement.
Without understanding your architecture in depth, it would be hazardous for anyone to recommend a caching strategy that best fits your needs.
Maybe you should cache the resulting output of your web services instead - using a reverse proxy, for example (what @Darrel is talking about), or using output buffering...
http://en.wikipedia.org/wiki/Reverse_proxy
http://php.net/manual/en/book.outcontrol.php
Optimize your database queries before you think about caching. Make sure you use a PHP opcode cache (like APC) and all those other things that are standard practice.
http://phplens.com/lens/php-book/optimizing-debugging-php.php
http://blog.digitalstruct.com/2008/01/31/performance-tuning-overview/
If you want to cache data and prevent stale/old data from being served, the trick is to identify your data (by primary key, maybe?) and, when the data is updated or deleted, delete or update the cache entry for that identifier.
<?php
// After inserting into DB, you can also put it in the cache
$memcache->set($userId, $userData);
// After updating or deleting the user, you update or delete the data
$memcache->delete($userId);
A lot of sites will show stale data. When I am on Stack Overflow and my reputation increases and I then go into the Stack Overflow chat, the reputation shown is my old reputation. When I reached a reputation of 20 (the reputation required to chat) I still could not chat for another 5 minutes, because the chat system had my old reputation data and did not yet know my reputation had increased enough to allow me to chat. Some data can be stale, while other types of data should never be stale. Consider that when caching data.
Conclusion
Your approaches can all be valid depending on the factors discussed above. In fact, you can use a combination of them for the different types of data you want to cache, based on how long it is acceptable to show old data for each. Maybe the categories or the list of countries (since they do not change often) can be cached for a long time, while reputation (or whatever data changes all the time for all users) should be cached for a short period only.
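In practice this just means a different expiry per type of data; a small sketch, assuming the Memcached extension, with illustrative keys and TTLs:

<?php
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$memcached->set('countries', $countries, 86400);         // near-static: cache for a day
$memcached->set("reputation:$userId", $reputation, 60);  // changes constantly: cache for a minute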
I would like to create an interface for manipulating invoices in a transaction-like manner.
The database consists of an invoices table, which holds billing information, and an invoice_lines table, which holds line items for the invoices. The website is a set of scripts which allow the addition, modification, and removal of invoices and their corresponding lines.
The problem I have is this: I would like the ACID properties of the database to be reflected in the web application.
Atomic: When the user hits save, either the entire invoice is modified or the entire invoice is not changed at all.
Consistent: The application code already ensures consistency, lines cannot be added to non-existent invoices. Invoice IDs cannot be duplicated.
Isolated: If a user is in the middle of a set of changes to an invoice, I would like to hide those changes from other users until the user clicks save.
Durable: If the web site dies, the data should be safe. This already works.
If I were writing a desktop application, it would maintain a connection to the MySQL database at all times, allowing me to simply use the BEGIN TRANSACTION and COMMIT at the beginning and end of the edit.
From what I understand you cannot BEGIN TRANSACTION on one PHP page and COMMIT on a different page because the connection is closed between pages.
Is there a way to make this possible without extensions? From what I have found, only SQL Relay does this (but it is an extension).
You don't want to have long-running transactions, because that will limit concurrency. Have a look at the command pattern: http://en.wikipedia.org/wiki/Command_pattern
The web translation of this type of processing is to use session data, or data stored in the page itself. Typically, after each web page is completed, the data is stored in the session (or in the page itself); at the point where all of the pages have been completed (via data entry) and a "Process" (or "Save") button is hit, the data is converted into database form and saved - even with the relational aspect of the data you mentioned. There are many ways to do this, but I would say most developers use an architecture similar to what I described (session data or state within the page) to satisfy what you are talking about.
You'll get plenty of advice here on different architectures, but I can say that the Zend Framework (http://framework.zend.com) and Doctrine (http://www.doctrine-project.org/) make this fairly easy, since Zend provides much of the MVC architecture and session management, and Doctrine provides the basic CRUD (create, retrieve, update, delete) you are looking for - plus all of the other aspects (uniqueness, commit, rollback, etc.). Keeping the connection to MySQL open may cause timeouts and exhaust the available connections.
Database transactions aren't really intended for this purpose - if you did use them, you'd probably run into other problems.
Also, you can't use them here, because each page request (potentially) uses its own connection and so cannot share a transaction with any other.
Keep the modifications to the invoice somewhere else while the user is editing them, then apply them when she hits save; you can do this final apply step in a transaction (albeit quite a short-lived one).
Long-lived transactions are usually bad.
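A sketch of that final apply step: pending edits accumulate in the session and are applied in one short transaction on save. The tables follow the question (invoices, invoice_lines), but the column names and session structure are assumptions:

<?php
session_start();
$pending = $_SESSION['invoice_edits'];   // e.g. ['invoice' => [...], 'lines' => [[...], ...]]

$pdo = new PDO('mysql:host=localhost;dbname=billing', 'user', 'pass',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$pdo->beginTransaction();
try {
    $invoiceId = $pending['invoice']['id'];

    $pdo->prepare('UPDATE invoices SET customer = ?, total = ? WHERE id = ?')
        ->execute([$pending['invoice']['customer'], $pending['invoice']['total'], $invoiceId]);

    $pdo->prepare('DELETE FROM invoice_lines WHERE invoice_id = ?')
        ->execute([$invoiceId]);

    $insertLine = $pdo->prepare('INSERT INTO invoice_lines (invoice_id, description, amount) VALUES (?, ?, ?)');
    foreach ($pending['lines'] as $line) {
        $insertLine->execute([$invoiceId, $line['description'], $line['amount']]);
    }

    $pdo->commit();
    unset($_SESSION['invoice_edits']);   // atomic from the user's view: all applied, or (on exception) none
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}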
The solution is not to open the transaction during the GET phase. Do all aspects of the transaction (BEGIN TRANSACTION, processing, and COMMIT) during the POST triggered by the "save" button.
Persistent connections may help you:
http://php.net/manual/en/features.persistent-connections.php
From that page: "Another is that when using transactions, a transaction block will also carry over to the next script which uses that connection if script execution ends before the transaction block does."
But I recommend you find another approach to the problem.
For example: create a cache table.
When you need to "commit", transfer the records from the cache table to the "real" tables.
Although there are some good answers already, I think I found a good response to this question, which I was stuck on as well. I think the best approach is to use a framework like Doctrine (O/R mapping) that has this kind of approach implemented in some form. Here is a link to what I'm talking about.