I've got a PHP app that stores arbitrary config info in a file. I would like to read that file once, when the app first starts, save it as some kind of application state variable, and leverage it across potentially thousands of user sessions. My Google-fu is usually pretty good, but in this case the only thing I'm able to come up with is the $_SESSION variable. Using it means reading the config file once per user session, which could mean reading it thousands of times a minute in high-volume installations, which seems inefficient.
When I worked with .NET web apps there was an idea of an application session that could be used to persist app configuration information across multiple user sessions. Does PHP have a similar concept?
Does PHP provide an API for cross-session data management? No.
Does PHP provide a mechanism for reading and updating data? Yes, there are lots of them.
While this sounds like a session handler which is shared across multiple users, its implementation is very different. By default (and by necessity) PHP's sessions are blocking. If access to this shared dataset were blocking, then you would severely limit concurrency.
Given that access to the data must be non-blocking, how do you mediate concurrent updates to the shared data? A lot depends on the frequency of the updates. But there are also questions about capacity and whether you need to support multiple nodes.
Any one-size-fits-all solution for this functionality is going to be severely hampered in capacity and/or performance. There are lots of products PHP will integrate with to provide a suitable storage substrate; however (leaving aside the logic of the interface for your super-session), it is not in the nature of open source software to package up third-party products and hide them behind APIs.
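That said, for the narrow case in the original question (parse a config file once and share the result across requests), a per-server cache such as APCu is a common approach. A minimal sketch, assuming the APCu extension is installed and enabled; the file path and cache key are placeholders:

```php
<?php
// Cache a parsed config file in APCu so it is read from disk only once
// per PHP pool, not once per request or session.

function load_config(string $file = '/etc/myapp/config.ini'): array
{
    $key = 'myapp.config';

    // apcu_fetch() sets $hit to true when the key is already cached.
    $config = apcu_fetch($key, $hit);
    if ($hit) {
        return $config;
    }

    // Cache miss: parse the file once, then share the result with every
    // later request served by this server (TTL 0 = keep until restart).
    $config = parse_ini_file($file, true);
    apcu_store($key, $config, 0);

    return $config;
}
```

Note that APCu is local to each server: in a multi-node setup every node warms its own copy, and concurrent updates are not mediated at all, which is exactly the concurrency question raised above. For genuinely shared, updatable state you would put Redis, memcached, or a database behind the same small function.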
Related
I'm going to introduce round-robin load balancing in our architecture, and I don't really want to use sticky sessions since we don't utilize the cookie in our apps. I'm trying to decide whether I should store the session data for my PHP apps in the Couchbase database or on a mounted AWS S3 bucket. Currently our sessions are stored in the standard way, on the local server.
I'm thinking of moving session storage into the Couchbase database; however, that requires us to change the code to accommodate that capability. Storing the sessions on mounted AWS S3 is preferable since I don't really have to change anything in my code other than the pointer in php.ini. But is it a good idea, and does it have any performance implications? Does anyone have experience with this and perhaps can share your thoughts/results?
Many thanks everyone...
Deciding how to implement sessions based on what is easiest to implement doesn't seem like a very sound basis for choosing an architecture.
S3, particularly if you are already using it in a way in which it is not exactly designed to be used (by mounting it to emulate a filesystem, which it is not; it's an object store), does not seem like an appropriate platform for storing session data, for several reasons: the first is the likelihood of potential performance issues, and the second is the consistency model of S3. A third potential consideration is the per-request pricing for S3, and a fourth is that you'd need to disable any local file cache in s3fs (assuming that's what you're using) or you run the risk of reading stale data... but this would likely introduce additional potential performance issues.
When you create a new object in S3, and then try to download it, it will not necessarily be immediately available in the US Standard Region, which only provides eventual consistency, which means it is sometimes but not always possible to immediately download something you just finished uploading. The other regions provide read-after-write consistency on new objects, but this could potentially come in the form of a tradeoff that increases the time it takes to initially create the new object, or the initial time to fetch it again.
In contrast to new objects, all regions, not just US Standard, provide only eventual consistency when you overwrite an existing object with different content. This means that if you change the contents of a file on S3 and immediately read it again, you may not see the newest version... and that if you delete a file and subsequently try to read it, you might actually still be able to, for a short period of time. Testing this in an attempt to prove it isn't a problem would serve no purpose, since this is their stated consistency model, and behavior you observe today could deteriorate in the future yet still be consistent with that model.
http://aws.amazon.com/s3/faqs/
By contrast, SimpleDB, DynamoDB, and RDS all provide services that are more appropriate for storing session data, with the applicability of each of these services depending on your specific requirements.
Or, you could store the sessions in Couchbase, if that provides suitable performance. I can't comment on that possibility, since I am unfamiliar with that platform... but S3, in spite of being an excellent service, does not seem well-suited for this application.
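As one concrete illustration of the DynamoDB option mentioned above: the AWS SDK for PHP (v3) ships a DynamoDB session handler that plugs into PHP's normal session machinery. A sketch, assuming the SDK is installed via Composer; the region and table name are placeholders:

```php
<?php
// Sketch: PHP sessions stored in DynamoDB via the AWS SDK for PHP (v3).
require 'vendor/autoload.php';

use Aws\DynamoDb\DynamoDbClient;
use Aws\DynamoDb\SessionHandler;

$client = new DynamoDbClient([
    'region'  => 'us-east-1',   // placeholder region
    'version' => 'latest',
]);

// Register DynamoDB as the session save handler...
$handler = SessionHandler::fromClient($client, [
    'table_name' => 'php-sessions',   // placeholder table
]);
$handler->register();

// ...and sessions work as usual from here on.
session_start();
$_SESSION['user_id'] = 42;
```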
One thing, though...
we don't utilize the cookie in our apps
I'm skeptical, since that's typically the way sessions work. How does your server identify the appropriate session for the user connecting to your site, then, if not by a cookie?
I am using sessions in PHP to track if a user is logged in. I do not use it to store any other data about the user; essentially it is like checking a hash table to see if the user has authenticated.
Would there be some advantage to using redis instead of native PHP sessions?
I'm curious about performance, scalability, and security (not really concerned with code complexity).
Using something like Redis for storing sessions is a great way to get more performance out of load balanced servers.
For example on Amazon Web Services, the load balancers have what's called 'sticky sessions'. What this means is that when a user first connects to your web app, e.g. when logging in to it, the load balancer will choose one of your app servers and this user will continue to be served from this server until they exit your application. This is because the sessions used by PHP, for example, will be stored on the app server that they first start using.
Now, if you run Redis on a separate server and then configure PHP on each of your app servers to store its sessions in Redis, you can turn 'sticky sessions' off. This means any of your servers can access the sessions, and therefore the user can be served by a different server with every request to your app. This ultimately makes for more efficient use of your load-balancing set-up.
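As a sketch of what that configuration can look like, assuming the phpredis extension is installed on every app server (these two settings normally live in php.ini; the host and port are placeholders):

```php
<?php
// Point PHP's session handler at a shared Redis server.
ini_set('session.save_handler', 'redis');
ini_set('session.save_path', 'tcp://10.0.0.5:6379');

session_start();

// $_SESSION now reads from and writes to Redis, so any app server behind
// the load balancer can serve any request for this user.
$_SESSION['user_id'] = 42;
```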
You want the session save handler to be fast, because a PHP session blocks all other concurrent requests from the same user until the first request is finished.
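One common mitigation, independent of which handler you pick, is to release the session lock as soon as you have finished writing, via session_write_close(). A minimal sketch; do_expensive_report() is a hypothetical stand-in for slow work:

```php
<?php
session_start();

// Read what this request needs from the session...
$userId = $_SESSION['user_id'] ?? null;

// ...then release the session lock as soon as you're done writing, so
// concurrent requests from the same user stop queueing behind this one.
session_write_close();

// Slow work below no longer holds the lock. (Writes to $_SESSION after
// this point are NOT persisted.)
do_expensive_report($userId);
```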
There are a variety of handlers you could use for PHP sessions across multiple servers: File w/ NFS, MySQL Database, Memcache, and Redis.
The database method (using InnoDB) was the slowest in my experience, followed by File w/ NFS. Locking and write contention are the main factors. Memcache and Redis provide similar performance and are by far the better alternatives, since all operations are in RAM. Redis is my choice because you can enable disk persistence, whereas Memcache is memory-only.
I explain Redis sessions in PHP with Kohana if you want more detail.
I don't really think you need to worry much about sessions unless you get MASSIVE amounts of traffic. PHP handles sessions nicely, and if you store only that little data, it should be fine even with a lot of requests; performance should be close either way, since Redis is not native to PHP.
With 10k users, if each user stores about 1 KB of session data, that would consume 10,000 KB, or roughly 10 MB, which is not much. PHP is smart enough to use a good data structure to hold those values and read and write them quickly. The problem is if the session data is too big, or if for some reason the server consumes too many resources reading it, but that's normally only when the data is too big.
I'm on board with the whole cookieless domains / CDN thing, and I understand how just sending cookies for requests to www.yourdomain.com, while setting up a separate domain like cdn.yourdomain.com to keep unnecessary cookies from being sent can help performance.
What I'm curious about is whether using PHP's native sessions has a negative effect on performance, and if so, how? I know the session key is kept in a cookie, which is small, and so that seems fine.
I'm prompted to ask this question because in the past I've written my web apps and stored a lot of the user's active data, preferences, and authentication information in the $_SESSION variable. However, I notice that some popular web applications out there, like WordPress, don't use $_SESSION at all. But sessions are easy to use and seem fairly secure, especially if you combine them with tracking user-agent / IP changes to prevent session hijacking. So why don't WordPress and other web apps use PHP's sessions? Should I also stop using sessions?
Also, let me clarify that I do realize the server must load the session data to process a page request, but that's not what I'm asking about here. My question is about whether / how sessions impact network performance, especially in regard to the headers being sent / received. For example, does using sessions prevent pages or images on the site from being served from the browser's cache? Is the PHPSESSID cookie the only additional header being sent? These sorts of things.
The standard store for $_SESSION is the file-system with one file per session. This comes with a price:
When two requests access the same session, one request wins over the other, and the loser has to wait until the first request has finished: a race condition controlled by file-locking.
When cookies are used to store the session data (WordPress, CodeIgniter), the race condition is the same, but the locking is not as evident; a browser might do locking within its cookie management.
Using cookies has the downside that you cannot store that much data and that the data gets passed with each request and response. This is likely to trigger security issues as well: steal the cookie and you've got the data. If it's encrypted, an attacker can try to decrypt it to gain the data stored therein.
The historical reason for WordPress is that the platform never used PHP sessions. The root project started around 2000 and gained a lot of traction in 2002 and 2004, when session handling was only available with PHP 4 and PHP 3 was still much more popular.
Later on, when $_SESSION was available, the main design of the application was already done, and it worked. Next to that, in 2004/2005 WordPress decided to start a commercial multi-blog hosting service. This created a need to scale the application(s) across servers, and cookies+database looked easier for the session/user handling than using the $_SESSION implementation. In fact, this is pretty easy and just works, so there never was a need to change it.
For CodeIgniter I cannot say that much. I know that it stores all session information inside a cookie by default, so session is just another name for cookie. Optionally it can be encrypted, but this needs configuration. IIRC it was said that this was done because "most users do not need sessions". For those who do, there is a database backend (requiring additional configuration), so users can change from the cookie to the database store transparently within their application. There is a newer implementation available as well that allows you to change to any store you like, e.g. native PHP sessions, by means of so-called drivers.
However, this does not mean that you can't achieve the same thing based on $_SESSION nowadays. You can replace the store with whatever you like (even cookies :) ), and in a good program design the PHP implementation should be encapsulated anyway.
With that done, you can implement a store whose locking you can better control (e.g. a database) and that works across servers in a load-balanced infrastructure that does not support sticky sessions.
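As a sketch of what such a replaceable store can look like: PHP lets you register any implementation of SessionHandlerInterface via session_set_save_handler(). The example below assumes PHP 8.1+, a PDO connection to MySQL, and a hypothetical sessions(id, data, updated_at) table:

```php
<?php
// A database-backed session store you control. Register it before
// session_start(); $_SESSION keeps working exactly as before.

class PdoSessionHandler implements SessionHandlerInterface
{
    public function __construct(private PDO $pdo) {}

    public function open(string $path, string $name): bool { return true; }

    public function close(): bool { return true; }

    public function read(string $id): string|false
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        return (string) $stmt->fetchColumn();  // '' when the session is new
    }

    public function write(string $id, string $data): bool
    {
        // REPLACE INTO is MySQL-specific; other engines need an upsert.
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, NOW())'
        );
        return $stmt->execute([$id, $data]);
    }

    public function destroy(string $id): bool
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE id = ?');
        return $stmt->execute([$id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        $stmt = $this->pdo->prepare(
            'DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL ? SECOND'
        );
        $stmt->execute([$max_lifetime]);
        return $stmt->rowCount();
    }
}

session_set_save_handler(new PdoSessionHandler($pdo), true);
session_start();
```

A real implementation would add locking to taste (e.g. SELECT ... FOR UPDATE in read()); the point is that the store, and its locking strategy, is now under your control.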
WordPress is a good example of its own implementation of session handling, totally agnostic to whatever PHP offers. That means the wheel has been re-invented. Seen from today, I would not call their design particularly innovative; it fulfills a very specific need in a very specific environment that you can only understand if you know the project's roots.
CodeIgniter is maybe a small step ahead (in an interface sense), as it offers some sort of (unstable) interface to sessions, and it's possible to replace it with any implementation you like. That's much better for new developers, but it's also sort of re-inventing the wheel, because PHP does this already out of the box.
The best thing you can do in an application design is to make the implementation independent from system needs, so to make the storage mechanism of your session data independent from the rest of the program flow. PHP offers this with a pretty direct interface, the $_SESSION array and the session configuration.
As $_SESSION is a superglobal array, you might want to prevent your application from accessing it directly, as this would introduce global state. So in a good design you would have an interface to it, to be able to fully abstract away from the superglobal.
With that done, plus abstraction of the store and its configuration (e.g. all in one session dependency container), you should be able to scale and maintain your application well over as many servers as you like, for whatever reason. Your implementation can then just use cookies if you think that's right for you, yet you will be able to switch to database-backed sessions if you need to, without rewriting large parts of your application.
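A minimal sketch of such an abstraction (the interface and class names here are hypothetical; assumes PHP 8+):

```php
<?php
// A thin layer so application code never touches the $_SESSION
// superglobal directly.

interface SessionStore
{
    public function get(string $key, mixed $default = null): mixed;
    public function set(string $key, mixed $value): void;
}

class NativeSessionStore implements SessionStore
{
    public function get(string $key, mixed $default = null): mixed
    {
        return $_SESSION[$key] ?? $default;
    }

    public function set(string $key, mixed $value): void
    {
        $_SESSION[$key] = $value;
    }
}

// Application code depends only on the interface, so a cookie- or
// database-backed implementation can be swapped in without ripples.
$session = new NativeSessionStore();
$session->set('user_id', 42);
```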
I'm not 100% confident this is the case, but one reason to avoid the built-in $_SESSION mechanism in PHP is if you want to deploy your web application in a high-availability web farm scenario.
Because the default session behavior in PHP is to store session data in files on the local server, it is hard (if not impossible) to have multiple servers processing requests from the same user. This matters if you want to deploy your web application in a web farm environment, where you have a number of PHP web servers processing requests for your app to balance the load.
So, while local session state is generally much faster than a database-based solution, the latter is preferable when you need to process a huge number of requests and a web farm is used to serve that capacity.
As I said in the beginning, I'm not 100% sure whether PHP supports configuring the session state provider to be a database, or a session state server, instead of the local default.
I am building a web-application and have a couple of quick questions. From what I learnt, one should not worry about scalability when initially building the app and should only start worrying when the traffic increases. However, this being my first web-application, I am not quite sure if I should take an approach where I design things in an ad-hoc manner and later "fix" them. I have been reading stories about how people start off with an app that gets millions of users in a week or two. Not that I will face the same situation but I can't help but wonder, how do these people do it?
Currently, I bought a shared hosting account on Lunarpages and that got me started in building and testing the application. However, I am interested in learning how to build the same application in a scalable-manner using the cloud, for instance, Amazon's EC2. From my understanding, I can see a couple of components:
There is a load balancer that first receives requests and then decides where to route each request
This request is then handled by a server replica that then processes the request and updates (if required) the database and sends back the response to the client
If a similar request comes in, then a caching mechanism like memcached kicks in and returns objects from the cache
A blackbox that handles database replication
Specifically, I am trying to do the following:
Setting up a load balancer (my homework revealed that HAProxy is one such load balancer)
Setting up replication so that databases can be synchronized
Using memcached
Configuring Apache to work with multiple web servers
Partitioning the application to use Amazon EC2 and Amazon S3 (my application is something that will need a great deal of storage)
Finally, how can I avoid burning myself when using Amazon services? Because this is just a learning phase, I can probably do with 2-3 servers and a simple load balancer with replication, but I want to avoid accidentally paying loads of money.
I am able to find resources on individual topics but am unable to find something that starts off from the big picture. Can someone please help me get started?
Personally, I think you should be considering how your app will scale initially - as otherwise you'll run into problems down the line.
I'm not saying you need to build it initially as a multi-server system, but if you think you'll need to do it later, be mindful of the concerns now.
In my experience, this includes things like:
Sessions. Unless you use 'sticky' load balancing, you will have to have some way of sharing session state between servers. This probably means storing session data on either shared storage, or in a DB.
File uploads and replication. If you allow users to upload files, or you have a CMS that allows you to upload images/documents, it needs to cater for the fact that these files will also need to find their way onto other nodes in your cluster. However, if you've gone down the shared storage route mentioned above, this should cover it.
DB scalability. If you're using traditional DB servers, you might want to think about how you'll implement scalability at that level. This may mean coding your app so you use one connection string for reads, and another for writes (see the sketch after this list). Then, you are free to implement replication with one master node handling the inserts/updates and cascading the changes to read-only nodes that handle the bulk of the work.
Middleware. You might even want to go down the route of implementing some kind of message-oriented middleware solution to completely hand off business logic functions; this will give you a great deal of flexibility in how you scale the business logic layer in the future, although initially it will be a lot of complication and work for not a great deal of payoff.
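A minimal sketch of the read/write split from the DB scalability point above, using PDO; the hostnames, credentials, and the accounts table are placeholders:

```php
<?php
// One connection for writes (master) and one for reads (replica).
$write = new PDO('mysql:host=db-master;dbname=app', 'user', 'secret');
$read  = new PDO('mysql:host=db-replica;dbname=app', 'user', 'secret');

// Inserts/updates go to the master; replication cascades them outward.
$stmt = $write->prepare('UPDATE accounts SET name = ? WHERE id = ?');
$stmt->execute(['Alice', 7]);

// The bulk of the work (reads) is served by a read-only node. Beware
// replication lag: a row written a moment ago may not be visible yet.
$stmt = $read->prepare('SELECT name FROM accounts WHERE id = ?');
$stmt->execute([7]);
$name = $stmt->fetchColumn();
```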
Have you considered playing around with VMs first? You can run 2-3 VMs on your local machine and set them up like you would actual servers, they just won't be able to handle real traffic levels. If all you're looking for is the learning experience, it might be an ideal way to go about it.
Here's a little background; currently I have:
3 web servers
one DB server, which also hosts memcache for PHP sessions for the 3 web servers.
I have the PHP configs on the 3 servers pointing to the memcache server for sessions. It was working fine until a lot of connections were being produced for reads etc., which then caused connection timeouts.
So I'm currently looking at clustering memcache on each web server for sessions; my only concern is how to make sure that memcache on all the servers has the same information for sessions.
Someone pointed me to http://github.com/trs21219/Memcached-Library because I am using CodeIgniter, but how do I converge my PHP sessions onto this, since memcache seems to be a key-value store? Thanks in advance.
Has anyone checked out http://repcached.sourceforge.net/ and does it work?
I'm not sure you have the same expectations of memcache that its designers had.
First, however, memcache distribution works differently than you expect: there is no mechanism to replicate stored information. Each memcache instance is a simple key-value store, as you've noticed. The distribution is done by the client code which has a list of all configured memcache instances and does a hash of the key to direct it to one of the instances. It is possible for the client to store it everywhere and retrieve it locally, or for it to hash it multiple times for redundancy, but these are not straightforward exercises.
But the other issue is that memcache is designed for reasonably short-lived data that memcache is allowed to throw away at any time. This makes it really good for caching frequently accessed data that can be a little stale (say up to a few minutes old) but might be expensive to retrieve (such as almost a minute to generate from a query).
PHP sessions don't really qualify for this, in my experience. A database can easily support many thousands of PHP sessions with barely visible traffic, but you need a lot of memcache storage to support the same number: 50 KB per session and 5,000 sessions means close to 250 MB, and then there is all the other data you want to put in there. Not enough storage and you get lots of unexplained logouts (as memcache discards session data when under memory pressure) and thus lots of annoyed users who have to keep logging in again.
We've found GREAT advantage in applying MongoDB instead of MySQL for most things, including session handling. It's far faster, far smaller, far easier. We keep MySQL around for transactional needs, but everything else goes into Mongo now. We've relegated memcache to simply caching pages and other data that isn't critical whether it's there or not, much like Smarty does.
There is no need to use third-party libraries to organize a memcached "cluster".
http://ru.php.net/manual/en/memcached.addserver.php
Just use this function to add several servers to the pool, and after that data will be stored and distributed across those servers. The server for storing/retrieving the data for a specific key will be selected according to the consistent key distribution option.
So in this case you don't need to worry about "how to make sure that memcache on all the servers has the same information for sessions".
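A sketch of such a pool on the PHP side, assuming the memcached PECL extension (the hostnames are placeholders):

```php
<?php
// A client-side memcached pool: the client hashes each key to exactly
// one server; nothing is replicated between nodes.

$mc = new Memcached();

// Consistent (ketama) hashing keeps the key->server mapping stable
// when nodes are added or removed.
$mc->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);

$mc->addServers([
    ['web1.internal', 11211],
    ['web2.internal', 11211],
    ['web3.internal', 11211],
]);

$mc->set('greeting', 'hello', 300);  // lands on whichever node the hash picks
var_dump($mc->get('greeting'));

// For sessions, the same pool can be named in php.ini:
//   session.save_handler = memcached
//   session.save_path    = "web1.internal:11211,web2.internal:11211,web3.internal:11211"
```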