I'm looking for an idea of best practices here. I have a web based application that has a number of hooks into other systems. Let's say 5, and each of these 5 systems has a number of flags to determine different settings the user has selected in said systems, lets say 5 settings per system (so 5*5).
I am storing the status of these settings in the user sesion variables and was wondering is that a sufficient way of doing it?
I'm learning php as I go along so not sure about any pitfalls that this could run me into!
There's no size limit on session (apart from obvious memory and disk quota limit). Just keep it sane and don't put your entire database in it.
You have to realize that the session usually times out after 20minutes and then the data is garbage collected eventually. 25 values in the session isn't too much, but be sure you store them somewhere a bit more persistent if you can't afford to lose that data.
It's probably fine for a prototype, but you probably want to consider persisting these settings in MySQL/Postgres/Mongo/etc.
In terms of the number of settings that PHP can support, it depends on how many users you have and how much memory your production environment has.
I've seen PHP $_SESSIONs that contain arrays of hundreds of objects, so your little 5x5 matrix is tiny by comparison.
(The hundreds of ojects examples that I've seen are, however, a bit excessive, so just to clarify that I'm not condoning going that far! ;-))
Related
If I have a website where users login and logout, and each user has 4 session variables being used, how will this affect my site?
Say if I have 100,000 active members, then that would be effectively 400,000 session variables being passed at the same time. Will this affect the loading of my site? I understand php has a memory limit but do not fully understand it.
Thanks
4 variables per user is nothing, but my suggestion would be on a different level: Focus on what's causing actual bottlenecks in your web site. This issue is probably not relevant right now, and is really easy to switch from in the future if this what slows you down. (And it won't)
I bet you have much more important stuff to work on than worry about another variable, and when you get to that amount of active users, your whole structure will probably change, including servers and solutions. Good luck!
To answer shortly - yes, it will affect loading of your site, if you have 100k users. But it won't be only because of sessions, they will be a part of the bottleneck.
It's easy to calculate possible memory consumption and according to that you should decide how to scale your site.
Scaling options are endless (well, as a phrase of course, there's a finite amount of ways to scale programs but still there are many to choose).
If it happens that you attract that many users, chances are you will be able to afford professional help when it comes to scaling your site.
If it's a case of you wondering what those options are, then it might be the best to ask a question with specific things in mind that trouble you when determining when and where the bottlenecks might be.
Also, you don't deploy sites with so many active users on a single server, using default PHP configuration, especially the session one.
The answer, of course, is "it depends".
STRONG SUGGESTION:
1) Establish a "performance baseline"
2) Consider resources like CPU, memory, network and disk
3) Consider OS, Web server, Application and database
4) Run stress tests, and compare your performance between "normal loads" to "high loads"
5) Identify the bottlenecks, and deal with them appropriately
Assuming you're running Linux/Apache/MySql (a total guess on my part - you didn't say), here's an excellent three-part article that might get you started in the right direction:
Tuning LAMP systems, Part 1: Understanding the LAMP architecture
But trust me: worrying about minutia like whether you have 5 session variables instead or 4, instead of trying to gather solid baseline statistics, is NOT going to help you scale to 100,000 users :)!
So I am making a small multiplayer game and I am using php as the backend. I basically need to SET and GET a lot of positions of objects, well one object is one player that has a X/Y position in this case.
I don't need todo it in realtime, but perhaps every 5-20 seconds since it's turn based. I don't mind if I loose data since positions will be set again by the clients every now and then.
I was thinking of doing this with memcached, or redis. Basically each player would be a "key" and this key would contain an object with some relevant information, but the most important thing beeing the X/Y positions.
Perhaps I am going about this the wrong way but, this approach would seem very easy to do, however I am not sure how well it would work since I don't have a lot of experience with either of these soutions.
I should add that we are talking about perhaps 10 players here, hence 10 objects with x/y positions that needs updating every now and then.
Can it be done like this, is there a better solution than memcached/redis? If not which of these two would be better performance-wise? From what I understand it's almost the same thing, just that redis offers some more functionality (Which may not necessarily be needed).
Oh and yes I am also using APC with php obviously. Thanks!
With just 10 objects in the entire data model, I would store them all as a serialized array under a single key. The serialization time will pale in comparison to the memcached call, so you may as well minimize the number of reads and writes to one.
I just checked out the redis online demo, and it looks pretty neat. Thanks for the link. I can't speak to which is better, but memcached in PHP is proven and mature so you can't go wrong there.
Redis is cheapest on resources, especially 32 bit version, e.g. if you use less 2 GB cache memory, which is the case I believe, run 23 bit Redis even if your server is 64 bit.
I see programmers putting a lot of information into databases that could otherwise be put in a file that holds arrays. Instead of arrays, they'll use many tables of SQL which, I believe, is slower.
CitrusDB has a table in the database called "holiday". This table consists of just one date column called "holiday_date" that holds dates that are holidays. The idea is to let the user add holidays to the table. Citrus and the programmers I work with at my workplace will prefer to put all this information in tables because it is "standard".
I don't see why this would be true unless you are allowing the user, through a user interface, to add holidays. I have a feeling there's something I'm missing.
Sometimes you want to design in a bit of flexibility to a product. What if your product is released in a different country with different holidays? Just tweak the table and everything will work fine. If it's hard coded into the application, or worse, hard coded in many different places through the application, you could be in a world of pain trying to get it to work in the new locale.
By using tables, there is also a single way of accessing this information, which probably makes the program more consistent, and easier to maintain.
Sometimes efficiency/speed is not the only motivation for a design. Maintainability, flexibility, etc are very important factors.
The main advantage I have found of storing 'configuration' in a database, rather than in a property file, or a file full of arrays, is that the database is usually centrally stored, whereas a server may often be split across a farm of several, or even hundreds of servers.
I have implemented, in a corporate environment, such a solution, and the power of being able to change configuration at a single point of access, knowing that it will immediately be propagated to all servers, without the concern of a deployment process is actually very powerful, and one that we have come to rely on quite heavily.
The actual dates of some holidays change every year. The flexibility to update the holidays with a query or with a script makes putting it in the database the easiest way. One could easily implement a script that updates the holidays each year for their country or region when it is stored in the database.
Theoretically, databases are designed and tuned to provide faster access to data than doing a disk read from a file. In practice, for small to mid-sized applications this difference is minuscule. Best practices, however, are typically oriented at larger scale. By implementing best practices on your small application, you create one that is capable of scaling up.
There is also the consideration of the accessibility of the data in terms of other aspects of the project. Where is most of the data in a web-based application? In the database. Thus, we try to keep ALL the data in the database, or as much as is feasible. That way, in the future, if you decide that now you need to join the holiday dates again a list of events (for example), all the data is in a single place. This segmenting of disparate layers creates tiers within your application. When each tier can be devoted to exclusive handling of the roles within its domain (database handles data, HTML handles presentation, etc), it is again easier to change or scale your application.
Last, when designing an application, one must consider the "hit by a bus principle". So you, Developer 'A', put the holidays in a PHP file. You know they are there, and when you work on the code it doesn't create a problem. Then.... you get hit by a bus. You're out of commission. Developer 'B' comes along, and now your boss wants the holiday dates changed - we don't get President's Day off any more. Um. Johnny Next Guy has no idea about your PHP file, so he has to dig. In this example, it sounds a little trivial, maybe a little silly, but again, we always design with scalability in mind. Even if you KNOW it isn't going to scale up. These standards make it easier for other developers to pick up where you left off, should you ever leave off.
The answer lays in many realms. I used to code my own software to read and write to my own flat-file database format. For small systems, with few fields, it may seem worth it. Once you learn SQL, you'll probably use it for even the smallest things.
File parsing is slow. String readers, comparing characters, looking for character sequences, all take time. SQL Databases do have files, but they are read and then cached, both more efficiently.
Updating & saving arrays require you to read all, rebuild all, write all, save all, then close the file.
Options: SQL has many built-in features to do many powerful things, from putting things in order to only returning x through y results.
Security
Synchronization - say you have the same page accessed twice at the same time. PHP will read from your flatfile, process, and write at the same time. They will overwrite each other, resulting in dataloss.
The amount of features SQL provides, the ease of access, the lack of things you need to code, and plenty other things contribute to why hard-coded arrays aren't as good.
The answer is it depends on what kind of lists you are dealing with. It seems that here, your list consists of a small, fixed set of values.
For many valid reasons, database administrators like having value tables for enumerated values. It helps with data integrity and for dealing wtih ETL, as two examples for why you want it.
At least in Java, for these kinds of short, fixed lists, I usually use Enums. In PHP, you can use what seems to be a good way of doing enums in PHP.
The benefit of doing this is the value is an in-memory lookup, but you can still get data integrity that DBAs care about.
If you need to find a single piece of information out of 10, reading a file vs. querying a database may not give a serious advantage either way. Reading a single piece of data from hundreds or thousands, etc, has a serious advantage when you read from a database. Rather than load a file of some size and read all the contents, taking time and memory, querying from the database is quick and returns exactly what you query for. It's similar to writing data to a database vs text files - the insert into the database includes only what you are adding. Writing a file means reading the entire contents and writing them all back out again.
If you know you're dealing with very small numbers of values, and you know that requirement will never change, put data into files and read them. If you're not 100% sure about it, don't shoot yourself in the foot. Work with a database and you're probably going to be future proof.
This is a big question. The short answer would be, never store 'data' in a file.
First you have to deal with read/write file permission issues, which introduces security risk.
Second, you should always plan on an application growing. When the 'holiday' array becomes very large, or needs to be expanded to include holiday types, your going to wish it was in the DB.
I can see other answers rolling in, so I'll leave it at that.
Generally, application data should be stored in some kind of storage (not flat files).
Configuration/settings can be stored in a KVP storage (such as Redis) then access it via REST API.
I don't really have any experience with caching at all, so this may seem like a stupid question, but how do you know when to cache your data? I wasn't even able to find one site that talked about this, but it may just be my searching skills or maybe too many variables to consider?
I will most likely be using APC. Does anyone have any examples of what would be the least amount of data you would need in order to cache it? For example, let's say you have an array with 100 items and you use a foreach loop on it and perform some simple array manipulation, should you cache the result? How about if it had a 1000 items, 10000 items, etc.?
Should you be caching the results of your database query? What kind of queries should you be caching? I assume a simple select and maybe a couple joins statement to a mysql db doesn't need caching, or does it? Assuming the mysql query cache is turned on, does that mean you don't need to cache in the application layer, or should you still do it?
If you instantiate an object, should you cache it? How to determine whether it should be cached or not? So a general guide on what to cache would be nice, examples would also be really helpful, thanks.
When you're looking at caching data that has been read from the database in APC/memcache/WinCache/redis/etc, you should be aware that it will not be updated when the database is updated unless you explicitly code to keep the database and cache in synch. Therefore, caching is most effective when the data from the database doesn't change often, but also requires a more complex and/or expensive query to retrieve that data from the database (otherwise, you may as well read it from the database when you need it)... so expensive join queries that return the same data records whenever they're run are prime candidates.
And always test to see if queries are faster read from the database than from cache. Correct database indexing can vastly improve database access times, especially as most databases maintain their own internal cache as well, so don't use APC or equivalent to cache data unless the database overheads justify it.
You also need to be aware of space usage in the cache. Most caches are a fixed size and you don't want to overfill them... so don't use them to store large volumes of data. Use the apc.php script available with APC to monitor cache usage (though make sure that it's not publicly accessible to anybody and everybody that accesses your site.... bad security).
When holding objects in cache, the object will be serialized() when it's stored, and unserialized() when it's retrieved, so there is an overhead. Objects with resource attributes will lose that resource; so don't store your database access objects.
It's sensible only to use cache to store information that is accessed by many/all users, rather than user-specific data. For user session information, stick with normal PHP sessions.
The simple answer is that you cache data when things get slow. Obviously for any medium to large sized application, you need to do much more planning than just a wait and see approach. But for the vast majority of websites out there, the question to ask yourself is "Are you happy with the load time". Of course if you are obsessive about load time, like myself, you are going to want to try to make it even faster regardless.
Next, you have to identify what specifically is the cause of the slowness. You assumed that your application code was the source but its worth examining if there are other external factors such as large page file size, excessive requests, no gzip, etc. Use a site like http://tools.pingdom.com/ or an extension like yslow as a start for that. (quick tip make sure keepalives and gzip are working).
Assuming the problem is the duration of execution of your application code, you are going to want to profile your code with something like xdebug (http://www.xdebug.org/) and view the output with kcachegrind or wincachegrind. That will let you know what parts of your code are taking long to run. From there you will make decisions on what to cache and how to cache it (or make improvements in the logic of your code).
There are so many possibilities for what the problem could be and the associated solutions, that it is not worth me guessing. So, once you identify the problem you may want to post a new question related to solving that specific problem. I will say that if not used properly, the mysql query cache can be counter productive. Also, I generally avoid the APC user cache in favor of memcached.
I am wondering if it is viable to store cached items in Session variables, rather than creating a file-based caching solution? Because it is once per user, it could reduce some extra calls to the database if a user visits more than one page. But is it worth the effort?
If the data you are caching (willing to cache) does not depend on the user, why would you store in the session... which is attached to a user ?
Considering sessions are generally stored in files, it will not optimise anything in comparaison of using files yourself.
And if you have 10 users on the site, you will have 10 times the same data in cache ? I do not think this is the best way to cache things ;-)
For data that is the same fo all users, I would really go with another solution, be it file-based or not (even for data specific to one user, or a group of users, I would probably not store it in session -- except if very small, maybe)
Some things you can look about :
Almost every framework provides some kind of caching mecanism. For instance :
PEAR::Cache_Lite
Zend_Cache
You can store cached data using lots of backend ; for instance :
files
shared memory (using something like APC, for example)
If you have several servers and loads of data, memcached
(some frameworks provide classes to work with those ; switching from one to the other can even be as simple as changing a couple of lines in a config file ^^ )
Next question is : what do you need to cache ? For how long ? but that's another problem, and only you can answer that ;-)
It can be, but it depends largely on what you're trying to cache, as well as some other circumstances.
Is the information likely to change?
Is it a problem if slightly outdated information is shown?
How heavy is the load the query imposes on the database?
What is the latency to the database server? (shouldn't be an issue on local network)
Should the information be cached on a per user basis, or globally for the entire application?
Amount of data involved
etc.
Performance gain can be significant in some cases. On a particular ASP.NET / SQL Server site I've worked on, adding a simple caching mechanism (at application level) reduced the CPU load on the web server by a factor 3 (!) and at the same time prevented a whole bunch of database timeout issues when accessing a certain table.
It's been a while since I've done anything serious in PHP, but I think your only option there is to do this at the session level. Most of my considerations above are still valid however. As for effort; it should take very little effort to implement, assuming your code is sufficiently structured.
Session should only really be used strictly for user specific data. If you're using it to cache things that should be common across multiple sessions, you're duplicating a lot of data needlessly. Why not just use the Cache that comes with ASP.NET (you can use inProcess, rather than SQL if your concern is DB roundtrips, since you'll be storing Cached data in memory)