How many session variables per user is considered too much?

If I have a website where users login and logout, and each user has 4 session variables being used, how will this affect my site?
Say I have 100,000 active members; that would effectively be 400,000 session variables being passed at the same time. Will this affect the loading of my site? I understand PHP has a memory limit but do not fully understand it.
Thanks

Four variables per user is nothing, but my suggestion is on a different level: focus on what is causing actual bottlenecks in your website. This issue is probably not relevant right now, and it is really easy to change later if it turns out to be what slows you down. (And it won't.)
I bet you have much more important stuff to work on than worry about another variable, and when you get to that amount of active users, your whole structure will probably change, including servers and solutions. Good luck!

The short answer: yes, it will affect the loading of your site if you have 100k users. But sessions won't be the only cause; they will be one part of the bottleneck.
It's easy to calculate possible memory consumption and according to that you should decide how to scale your site.
There are many scaling options to choose from (not literally endless, of course, but plenty).
If it happens that you attract that many users, chances are you will be able to afford professional help when it comes to scaling your site.
If you are wondering what those options are, it might be best to ask a question about the specific things that trouble you when working out when and where the bottlenecks will be.
Also, you don't deploy sites with so many active users on a single server, using default PHP configuration, especially the session one.
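As a rough illustration of the memory calculation mentioned above, the sketch below serializes a hypothetical four-value session and multiplies its size by 100,000 users. The key names and values are made up for the example, and serialize() only approximates what PHP's session serializer actually stores. Keep in mind that with the default file-based handler each request loads only its own session, so the total is closer to storage needed than to memory in use at one moment.

```php
<?php
// Hypothetical session contents: four small values per logged-in user.
$session = [
    'user_id'   => 123456,
    'username'  => 'example_user',
    'role'      => 'member',
    'logged_in' => true,
];

// serialize() gives a close approximation of the stored session size.
$bytesPerUser = strlen(serialize($session));

$activeUsers = 100000;
$totalBytes  = $bytesPerUser * $activeUsers;

printf(
    "%d bytes per user, about %.1f MB for %d users\n",
    $bytesPerUser,
    $totalBytes / (1024 * 1024),
    $activeUsers
);
```

Swapping in your real session contents should give a more useful number than any rule of thumb.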

The answer, of course, is "it depends".
STRONG SUGGESTION:
1) Establish a "performance baseline"
2) Consider resources like CPU, memory, network and disk
3) Consider OS, Web server, Application and database
4) Run stress tests, and compare your performance between "normal loads" to "high loads"
5) Identify the bottlenecks, and deal with them appropriately
Assuming you're running Linux/Apache/MySql (a total guess on my part - you didn't say), here's an excellent three-part article that might get you started in the right direction:
Tuning LAMP systems, Part 1: Understanding the LAMP architecture
But trust me: worrying about minutiae like whether you have 5 session variables instead of 4, instead of trying to gather solid baseline statistics, is NOT going to help you scale to 100,000 users :)!

Related

MySQL application performance issue

We want to get better performance with our application based on MySQL and PHP.
The current situation is an e-learning system that receives some "bursts" of queries depending on the day of the week and/or the hour of the day.
(hundreds of students starting drills at the same time from different schools for example)
As you may guess, these systems need real time calculations all the time.
We have very few slow queries in general and we try to improve them when they appear in the logs.
The hardware is self hosted, it's currently a VPS and it's not our hardware, but we keep hardware upgrade as part of the solution.
We have a specific table that is read/write intensive. We think it comes to disk access to that table. (study logs)
We are trying to figure out a hardware and/or software setup that could increase performance especially when that log table is needed.
One solution we are thinking about, is using replication to balance the "write" and "read" queries. (proxySQL + replication)
Our fear with that setup is what happens if the master is unavailable...
One software possibility we are currently developing is creating a "summary" table that is calculated only once a day or so. That should release some stress at least on 2 screens of the application. The bottleneck in that case seemed to be regarding the creation of temporary table and number of join tables.
I can add details as needed, please don't hesitate to ask.
EDIT: reformulate
What are the possible MySQL setups available to get better performance? Replication, cluster, other?
Thank you very much for your time.
Since you didn't include any code in your question, I'll answer in general terms:
Measure memory usage with memory_get_usage(). Call it on the last line of your script.
Check the load average with sys_getloadavg(). Call it on the last line of your script.
Measure running time with microtime(). See: Accurate way to measure execution times of php scripts
Check which queries cost you the most with: select * from sys.x$statement_analysis
Measure performance on a single run first and try to improve that.
Find where you are hashing, for example where you call crypt(). Some hashing methods are expensive. Where the hash is not security-sensitive, you can use a cheap one such as sha1 or md5; for example, if you are hashing user avatars inside a public folder, md5 lowers the cost. But NEVER trade security for performance: for password hashing, never choose an algorithm because it is fast.
Cache whatever can be cached. It can really help overall performance. Read PHP Cache Dynamic Pages To Speed Up Load Times. You can also cache with Apache (How To Configure Content Caching Using Apache Modules On A VPS) and nginx (A Guide to Caching with NGINX and NGINX Plus).
In your queries, avoid constructs that cannot use indexes (you can't always avoid them, but do so where you can). For example, FIND_IN_SET has a high impact on performance, especially when you are dealing with huge archives.
If you really think logging is hurting performance, write the log to another server. You can connect from the current server to MySQL on another server by IP address, or build an API for it.
Always think about a better architecture. Sometimes, when reviewing code, you find something that can be dropped or replaced with a better idea.
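The first three measurements in the list above can be collected in one place. This is a minimal sketch, not a profiler: profile() is a hypothetical helper name, the closure stands in for the code you actually want to measure, and sys_getloadavg() is guarded because it is not available on every platform.

```php
<?php
// Hypothetical helper: runs a piece of work and reports time, memory and load.
function profile(callable $work): array
{
    $startTime = microtime(true);     // wall-clock start, in seconds
    $startMem  = memory_get_usage();  // bytes allocated before the work

    $result = $work();

    return [
        'result'       => $result,
        'seconds'      => microtime(true) - $startTime,
        'memory_delta' => memory_get_usage() - $startMem,
        'peak_memory'  => memory_get_peak_usage(),
        // 1, 5 and 15 minute load averages where the platform supports it.
        'load_avg'     => function_exists('sys_getloadavg') ? sys_getloadavg() : null,
    ];
}

// Placeholder workload; replace it with the code path you care about.
$stats = profile(function () {
    $sum = 0;
    for ($i = 1; $i <= 1000; $i++) {
        $sum += $i;
    }
    return $sum;
});

printf("took %.6f s, peak memory %d bytes\n", $stats['seconds'], $stats['peak_memory']);
```

Numbers from a single run are noisy, so it is worth comparing several runs before and after a change.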

What happens when my PHP website will start having a LOT of members?

This is something I am really curious about and I do not really understand how is that possible.
So let's say I am the owner of Facebook (haha) and I have millions of people visiting my website every day, with thousands and thousands of images, videos, logs, etc.
How do I store all this data?
Do I have more databases in different servers around the world and then I connect to them from a single location?
Do I use an internal API system that requests info from other servers where the data is stored?
For example I know that Facebook has a lot of data centers around the world and hundreds of servers..
How do they connect to these servers? Are the profiles stored in different locations and when I connect to my profile, I will then be using that specific server? Or is there one main server that has the support of other hundreds of servers around the world?
Is there a way to use PHP in a way that I will connect to different servers and to different mySQL (???) databases to store and retrieve data whenever I want?
Sorry if this looks like a silly question, but since I could one day end up working on a successful website, I really want to know what I will have to do and what the logic behind it is.
Thank you very much.
I'll try to answer your (big) question but not from Facebook point of view since their architecture is pretty much known.
First thing you have to know is that you would have to distribute the workload of your web application. Question is how, so in order to determine what's going to be slow, you have to divide your app in segments.
First up is the HTTP server, or the one that accepts all the requests. By going to "www.your-facebook.com", you're contacting a service on an IP. Naturally, you would probably have more than one IP but let's say you have a single entry point.
Now what happens? You have HTTP server software, let's say Apache, and it handles incoming connections. Since Apache dedicates a process or thread (depending on the MPM in use) to each open connection, it needs a certain amount of memory per connection. Eventually it will run out of memory, and then shit hits the fan: stuff stops working and your site is unavailable.
Therefore, you have to somehow scale this part of your application that connects your PHP code / MySQL db to people who want to interact with it.
Let's assume you successfully scaled your Apache and you have a cluster of computers which can accept new computers in order to scale-out. You solved your first problem.
The next part is the layer that does the actual work: it accepts input from the user and saves it somewhere (MySQL). That's the biggest problem you'll have. Why?
Because of the database.
Databases store their data on media such as hard drives. Drives, be they SSDs or mechanical disks, are limited in how fast they can write or retrieve data. RAM is far faster: if I'm not mistaken, it transfers around 6GB/sec, and its seek time is much, much lower than a hard drive's.
Therefore, if X users ask for a piece of information and you can only deliver it at a certain rate, your app crashes or becomes unresponsive, and the layer handling database queries slows down because the hardware cannot deliver data as fast as you need it.
What are the options here? There are many, I won't mention all of them
Split Reads and Writes. Set your database layer in such a way that you have dedicated machines that write the data and completely different ones that read it. You have to use replication and replication has its own quirks - it never works without breaking.
Optimize handling of your data set by sharding your data. Great for read / write performance, screwed up when you need to query multiple shards and merge the data.
Get better hardware, especially storage (such as FusionIO)
Pay for better storage engine (such as TokuDB)
Alleviate load on the database by using caching. The data that your users request probably doesn't change so often that you have to query the db every single time (say you're viewing someone's profile, what's the chance they'll change it every second?). That's why Facebook uses Memcached extensively - a system that stores small pieces of data in RAM, it's easily scalable and what not. Most important, it's damn quick!
Use different solutions next to MySQL. MySQL (and some other databases) aren't good for every type of data storage or retrieval. Someone mentioned NoSQL before. NoSQL solutions are quick, but still immature. They don't do as much as relational databases do. They use methods of delaying disk write (they keep cached copy of data they need to write in RAM) so that they can achieve fast insert rates. That's why it's not unusual to lose data when using NoSQL.
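The caching option in the list above usually follows a "cache-aside" pattern: look in the cache first and only query the database on a miss. The sketch below shows the idea under stated assumptions: ArrayCache is a hypothetical in-process stand-in for Memcached so the example runs without any extension, and loadProfileFromDb() is a made-up placeholder for a real MySQL query.

```php
<?php
// Stand-in for Memcached: a tiny in-process cache with expiry times.
final class ArrayCache
{
    private array $items = [];

    public function get(string $key)
    {
        if (!isset($this->items[$key])) {
            return null; // miss
        }
        [$value, $expiresAt] = $this->items[$key];
        if ($expiresAt < time()) {
            unset($this->items[$key]); // expired entry counts as a miss
            return null;
        }
        return $value;
    }

    public function set(string $key, $value, int $ttl): void
    {
        $this->items[$key] = [$value, time() + $ttl];
    }
}

$dbQueries = 0;

// Hypothetical loader standing in for a MySQL query; counts how often it runs.
function loadProfileFromDb(int $userId, int &$dbQueries): array
{
    $dbQueries++;
    return ['id' => $userId, 'name' => "user{$userId}"];
}

// Cache-aside: try the cache first, fall back to the database on a miss.
function getProfile(ArrayCache $cache, int $userId, int &$dbQueries): array
{
    $key     = "profile:{$userId}";
    $profile = $cache->get($key);
    if ($profile === null) {
        $profile = loadProfileFromDb($userId, $dbQueries);
        $cache->set($key, $profile, 60); // keep for 60 seconds
    }
    return $profile;
}

$cache = new ArrayCache();
getProfile($cache, 42, $dbQueries);
getProfile($cache, 42, $dbQueries); // served from the cache
echo "database queries: {$dbQueries}\n";
```

With a real cache server the shape stays the same; only the get and set calls change, and the expiry time becomes the knob that trades freshness against database load.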
Topic about MySQL vs "insert database or whatever here" is broad, I don't want to go into that but remember - every single one of data stores out there saves data on the hard drive eventually. The difference (physical of course) is how they optimize their flushing to the disk itself.
I also didn't mention various reports you can run by gathering the data (how many men between 19 and 21 have clicked an advert X between 01:15 and 13:37 CET and such) which is what Facebook is actually gathering (scary stuff!).
Third up - the language gluing the data store (MySQL) and output (HTTP server). PHP.
As you can see, most of the work here is already done by Apache and MySQL. The gains from optimizing at the PHP level are small; even Facebook got small results (they claim 50%, but that's UP TO 50%). I tried HipHop extensively, and it is not as fast as it claims to be. Naturally, the Facebook guys mentioned that already, so it's no wonder. The advantage they get comes from replacing Apache with their own server built into HipHop. Some people claim "language X is better than language Y" and they're right, but that's not always the case. Each language has its own advantages and disadvantages.
For example, PHP is widespread but it's slow for certain operations (implementing a Trie with over 1 billion entries, for example). It's great for things like echoing some HTML after parsing the output from the db. It's quick to insert and retrieve data from the database, and that's about 90% of PHP usage: talk to the db, display the data, end.
Therefore, no matter what language you use (say we used C++ instead of PHP), your bottleneck will be the data storage / retrieval layer.
On the other hand, why is using C++ NOT handy? Because there are more people who know how to use PHP than ones who use C++. It's also MUCH slower to develop web apps in C++. Sure, they will execute faster, but who will notice the difference between 1 millisecond and 1 microsecond?
This post is more like an informative blog post, I know it's not filled with resources to back up my claims but anyone who did any work with larger data sets or websites will know that the P.I.T.A. is always the data storage component. Some things that I said probably won't fit with everyone, but in a NUTSHELL this is how you'd go about optimizing your site.
Unfortunately, your question doesn't have a simple answer. For the MySQL portion of it, you would need to investigate database scale-out. You can start looking at it here: http://www.mysql.com/why-mysql/scaleout/mixi.html. There are a number of different ways to set up Apache/PHP web sites across a server farm. One of them involves setting up round robin DNS. This is adding a DNS record with a number of different IP addresses. Your DNS then hands out a different IP address each time the record is requested so that the load is balanced across a number of servers. You can also set up clustering with MySQL, Apache and Heartbeat, but that is more of a high-availability solution than a scaling solution.
By the time you have a website with that many users, you'll already have enough experience to know the answer to this question, and you'll also have plenty of money to pay people to find the optimal architecture for your system.
I'm not saying that what I describe below is the Holy Grail, but it is certainly an option:
You will have a big, fragmented database with lots of backups and you'll have a few name servers which will know the location of servers and some rules about the data stored on each server. When data is searched the query will be sent to a name server which will find the server(s) where the answer can be found for the particular query. I've also upvoted N.B.'s answer, I think he is mostly right.
For lots of users, you should have a server with lots of memory and speed. Configure php.ini to allow more memory usage. A server with lots of users should have 4-12GB available. Also, save resources by not running a desktop environment on the server. If you have this many users, you might want to consider a CDN and also a database request queue.

Difference between server caching and Client Caching for a large dataset?

I am implementing a project in PHP with MySQL. Right now I don't have much data, but I was wondering about the future, when I will have a large dataset and it will slow down searches in the table. To reduce that search time, I was thinking about caching techniques. Which kind of caching, client or server, is better for a large dataset?
Thanks, aby
Server, in my opinion.
A client-side caching technique will have one of two negative outcomes depending on how you do it:
If you cache only what the user has searched for before, the cache won't be of any use unless the user performs exactly the same search again.
If you cache the whole dataset the user will have to download the whole thing, and that will slow your site down and incur bandwidth expenses.
The easiest thing you can do is just add appropriate indexes to the table you're searching. That will be sufficient for 99% of possible applications and should be the first thing you do, before you think about caching at all.
Apologies if I've pitched this answer below your level, I'm not sure exactly what you're doing, what you're planning to cache or how much experience you have.
Pay close attention to indexing in your database schemas. If you do this part right, the database should be able to keep up until your data and traffic are large. The right caching scheme will depend significantly on what your usage patterns are like. You should test as your site grows so that you know where the bottlenecks are and what the best caching scheme will be for your system.

How many variables is too many when storing in _SESSION?

I'm looking for an idea of best practices here. I have a web-based application that has a number of hooks into other systems. Let's say 5, and each of these 5 systems has a number of flags for the different settings the user has selected in said systems, let's say 5 settings per system (so 5*5).
I am storing the status of these settings in the user's session variables and was wondering whether that is a reasonable way of doing it.
I'm learning php as I go along so not sure about any pitfalls that this could run me into!
There's no size limit on session (apart from obvious memory and disk quota limit). Just keep it sane and don't put your entire database in it.
You have to realize that, with the default settings, session data becomes eligible for garbage collection after 24 minutes of inactivity (session.gc_maxlifetime defaults to 1440 seconds) and is removed eventually. 25 values in the session isn't too much, but be sure you store them somewhere a bit more persistent if you can't afford to lose that data.
It's probably fine for a prototype, but you probably want to consider persisting these settings in MySQL/Postgres/Mongo/etc.
In terms of the number of settings that PHP can support, it depends on how many users you have and how much memory your production environment has.
I've seen PHP $_SESSIONs that contain arrays of hundreds of objects, so your little 5x5 matrix is tiny by comparison.
(The examples I've seen with hundreds of objects are, however, a bit excessive, so just to clarify: I'm not condoning going that far! ;-))
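To put a number on the 5x5 case, and to show one way the settings could be made persistent, here is a small sketch. The key names ("system1", "flag1", and so on) are invented for illustration, a plain array stands in for $_SESSION so the example runs from the command line, and JSON in a text column is just one possible storage format, not a recommendation from the answers above.

```php
<?php
// Hypothetical 5 systems x 5 boolean settings, as they might sit in $_SESSION.
$settings = [];
for ($system = 1; $system <= 5; $system++) {
    for ($flag = 1; $flag <= 5; $flag++) {
        $settings["system{$system}"]["flag{$flag}"] = ($flag % 2 === 0);
    }
}

// Approximate size of the data once serialized for session storage.
$sessionBytes = strlen(serialize($settings));

// JSON form that could be saved to a text column and restored later.
$json     = json_encode($settings);
$restored = json_decode($json, true);

// Recursive count includes the 5 system entries, so subtract them.
$flagCount = count($settings, COUNT_RECURSIVE) - count($settings);

printf("%d settings, %d bytes serialized\n", $flagCount, $sessionBytes);
```

Decoding with the second argument set to true gives back nested arrays rather than objects, so the restored value can be dropped straight back into the session.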

How to load balance (scale) a simple PHP application?

I constantly read on the Internet how it's important to correctly architect my PHP applications so that they can scale.
I have built a simple/small CMS that is written in PHP (think of Wordpress, but waaaay simpler).
I essentially have URLs like such: http://example.com/?page_id=X where X is the id in my MySQL database that has the page content.
How can I configure my application to be load balanced where I'm simply performing PHP read activities.
Would something like Nginx as the front door setup to route traffic to multi-nodes running my same code to handle example.com/?page_id=X be enough to "load balance" my site?
Obviously, MySQL is not being load balanced in this situation, though for simplicity - that makes that out of scope for this question.
These are some well known techniques for scaling such an app.
Reduce DB hits
Most often the bottleneck will be your DB, so cache recent pages to reduce DB activity, perhaps in something like memcached.
Design your schema such that it is partition-able.
In the simplest case, separate your data into logical partitions, and store each partition in a separate mysql DB. Craigslist, for example, partitions data by city, and in some cases, by section within that. In your case, you could partition by Id quite simply.
Manage php sessions
Putting nginx in front of a PHP website will not work out of the box if you use sessions. Load balancing PHP has issues because sessions are persisted on each node's local storage by default. Therefore you need to manage sessions explicitly. The traditional solution is to use memcached to store the session data and look it up by a cookie.
Don't optimize prematurely.
Focus on getting your application out so that the next magnitude of current users gets the optimal experience.
Note: Your main potential pain points are discussed here on SO
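Partitioning by id, as suggested above, can be as simple as a modulo over the list of databases. This is a sketch under assumptions: the host names are placeholders, shardFor() is a hypothetical helper, and page ids are taken to be positive integers.

```php
<?php
// Hypothetical list of MySQL hosts, one per partition.
$shards = [
    'db0.example.internal',
    'db1.example.internal',
    'db2.example.internal',
];

// Map a page id to a shard with a simple modulo (assumes non-negative ids).
function shardFor(int $pageId, array $shards): string
{
    return $shards[$pageId % count($shards)];
}

foreach ([1, 2, 3, 4] as $pageId) {
    echo "page {$pageId} -> " . shardFor($pageId, $shards) . "\n";
}
```

The catch with plain modulo is that changing the number of shards remaps most ids, so adding a database later means moving data; that is one reason to decide on the partitioning key early even if you start with a single database.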
No, it is not at all important to scale your application if you don't need to.
My view on this is:
Make it work
Make sure it works correctly - testability, robustness
Make it work efficiently enough to be cost effective to run
Then, if you have so much traffic that your system cannot handle it, AND you've already thrown all the hardware that (sensible) money can buy at it, then you need to scale. Not sooner.
Yes it is relatively easy to scale read-workloads, because you can simply perform reads against readonly database replicas. The challenge is to scale write-workloads.
A lot of sites have few writes, even if they're really busy.
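Sending reads to replicas can start with a small routing function in the application. The sketch below only picks a connection string; the DSNs are placeholders, chooseDsn() is a hypothetical helper, and treating every statement that starts with SELECT as safe for a replica is a simplification.

```php
<?php
// Hypothetical DSNs: one writable primary and two read-only replicas.
$primary  = 'mysql:host=primary.example.internal;dbname=cms';
$replicas = [
    'mysql:host=replica1.example.internal;dbname=cms',
    'mysql:host=replica2.example.internal;dbname=cms',
];

// Send SELECTs to a random replica and everything else to the primary.
function chooseDsn(string $sql, string $primary, array $replicas): string
{
    $isRead = preg_match('/^\s*SELECT\b/i', $sql) === 1;
    if ($isRead && $replicas !== []) {
        return $replicas[array_rand($replicas)];
    }
    return $primary;
}

echo chooseDsn('UPDATE pages SET title = "Home" WHERE id = 1', $primary, $replicas), "\n";
```

Replicas can lag behind the primary, so a read that must see a write made moments earlier may still need to go to the primary.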
The correct approach is to use some kind of load balancer such as:
http://www.softwareprojects.com/resources/programming/t-how-to-install-and-configure-haproxy-as-an-http-loa-1752.html
What this does is forward a given user's session to the same server every time, so you don't have to worry about where sessions are stored at all. What you do have to worry about is how to share the filesystem if the two servers run on different machines, especially if you make heavy use of the filesystem. I hope the article above helps.
