PHP uses cookies, sessions or databases (and ORMs) to remember data, so that it is not lost after a single HTTP request. In Java (I mean servlets etc.), however, there is another solution: in brief, you can choose a scope for an object, i.e. how long it exists. Besides session scope or the lifetime of a single HTTP request, an object can "live" for the whole runtime of the HTTP server and can be initialized when the server starts up.
Data can therefore be shared between different users/sessions, and no database requests are required (database requests reduce the efficiency of the whole web application). I mean they're not required once the HTTP server is running: the object and its state are "remembered".
(And I do as much as I can to reduce SQL requests, even using PHP arrays for data that is read frequently but never actually modified.)
What I need in PHP is a way to:
Remember (store somewhere) data that can be changed and shared between many users, but not in the DB
Without using sessions (or cookies), keep data across many requests (e.g. AJAX: not a single request but many requests to the same URL), which of course must be stored somewhere for some time. For instance, I want to read all rows with a single SQL request, keep them in PHP for a short period, and then send them one row at a time, each row in a separate response to the appropriate AJAX function
Can anyone give me some hints on how to achieve this in PHP, preferably in the easiest possible way?
As a preface to this answer (which I'm sure you've already grasped): PHP's execution model essentially 'restarts' the process between requests, so storing anything across requests in PHP alone is unachievable.
That leaves you with a few options, and they all really rely on the 'strengths' of some kind of database:
Use a simple key-value in-memory persistence layer, like memcached or Redis
Use a NoSQL solution with a bit more structure (and consistency, should this be required) that still works in memory and is comparatively quicker than an RDBMS
Use an RDBMS, because it'll work great, and the quantity of traffic you'll need to topple a well-designed schema on moderate hardware is probably much higher than you think
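To make the question's use case concrete (one SQL query, then one row per AJAX request), here is a minimal sketch of the pattern, assuming PHP 7.4+. A JSON file in the system temp directory stands in for memcached or Redis purely so the example runs without extensions; takeNextRow() and the $loadRows callback are hypothetical names. The read-modify-write is guarded with flock(), which a real key-value store would replace with its own atomic operations.

```php
<?php
/**
 * Hypothetical helper: returns the next unsent row stored under $key.
 * On the first call it runs $loadRows once (e.g. a single SQL query) and
 * queues every row; later calls (e.g. separate AJAX requests) take one row each.
 * A JSON file in a temp directory stands in for memcached/Redis here.
 * Returns null once every row has been handed out.
 */
function takeNextRow(string $key, callable $loadRows, ?string $dir = null): ?array
{
    $dir ??= sys_get_temp_dir();
    $path = $dir . '/rowqueue_' . md5($key) . '.json';

    // 'c+' opens for read/write and creates the file without truncating it.
    $fh = fopen($path, 'c+');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Exclusive lock so two concurrent requests cannot take the same row.
    flock($fh, LOCK_EX);

    $raw = stream_get_contents($fh);
    $queue = ($raw === false || $raw === '')
        ? array_values($loadRows())                         // first request: one query
        : json_decode($raw, true, 512, JSON_THROW_ON_ERROR);

    $row = array_shift($queue);                             // null when the queue is empty

    // Persist the remaining rows for the next request.
    ftruncate($fh, 0);
    rewind($fh);
    fwrite($fh, json_encode($queue, JSON_THROW_ON_ERROR));
    fflush($fh);
    flock($fh, LOCK_UN);
    fclose($fh);

    return $row;
}
```

Each AJAX request would then do something like echo json_encode(takeNextRow('report:42', $loader)); where the key and loader are again hypothetical. With Redis, the same queue maps naturally onto a list (RPUSH the rows once, then LPOP one per request).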
HTH
I constantly read on the Internet how important it is to architect my PHP applications correctly so that they can scale.
I have built a simple/small CMS written in PHP (think of WordPress, but waaaay simpler).
I essentially have URLs like this: http://example.com/?page_id=X, where X is the id of the row in my MySQL database that holds the page content.
How can I configure my application to be load balanced, given that I'm only performing PHP read activities?
Would something like Nginx as the front door, routing traffic to multiple nodes that run the same code to handle example.com/?page_id=X, be enough to "load balance" my site?
Obviously, MySQL is not being load balanced in this situation, but for simplicity, that is out of scope for this question.
These are some well known techniques for scaling such an app.
Reduce DB hits
Most often the bottleneck will be your DB, so cache recently requested pages to reduce DB activity, perhaps in something like memcached.
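A minimal sketch of caching rendered pages, assuming PHP 7.4+. It keys the cache on page_id as in the question and uses output buffering (ob_start()/ob_get_contents()) to capture the rendered HTML. Files in a temp directory stand in for memcached, and cachedPage() and the $render callback are hypothetical names.

```php
<?php
/**
 * Hypothetical page cache keyed by page_id. Returns the cached HTML if it is
 * younger than $ttl seconds; otherwise runs $render (which echoes the page,
 * e.g. after querying MySQL), stores the captured output and returns it.
 * Temp-dir files stand in for memcached in this sketch.
 */
function cachedPage(int $pageId, callable $render, int $ttl = 60, ?string $dir = null): string
{
    $dir ??= sys_get_temp_dir();
    $path = "$dir/page_$pageId.html";

    // Cache hit: a fresh file exists, so the database is not touched at all.
    if (is_file($path) && time() - filemtime($path) < $ttl) {
        return file_get_contents($path);
    }

    // Cache miss: capture whatever the renderer echoes.
    ob_start();
    try {
        $render($pageId);
        $html = ob_get_contents();
    } finally {
        ob_end_clean();
    }

    // Write to a unique temp name, then rename, so readers never see half a page.
    $tmp = $path . '.' . uniqid('', true);
    file_put_contents($tmp, $html);
    rename($tmp, $path);

    return $html;
}
```

With memcached the shape would be the same: look up a key such as "page:$pageId", fall back to rendering on a miss, then store the HTML with an expiry.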
Design your schema such that it is partition-able.
In the simplest case, separate your data into logical partitions and store each partition in a separate MySQL DB. Craigslist, for example, partitions data by city and, in some cases, by section within a city. In your case, you could quite simply partition by id.
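Partitioning by id can be as simple as a deterministic function from id to database. A sketch, assuming PHP 8; the DSNs and shardForPage() are hypothetical, and modulo sharding is only one option (adding a shard later remaps most ids, which is why range- or directory-based schemes are also common).

```php
<?php
/** Hypothetical shard map: each entry is a PDO DSN for one MySQL database. */
const SHARDS = [
    'mysql:host=db0.example.internal;dbname=cms',
    'mysql:host=db1.example.internal;dbname=cms',
    'mysql:host=db2.example.internal;dbname=cms',
];

/** Pick the shard that owns a page: the same id always maps to the same DB. */
function shardForPage(int $pageId, array $shards = SHARDS): string
{
    if ($pageId < 0) {
        throw new InvalidArgumentException('page_id must be non-negative');
    }
    return $shards[$pageId % count($shards)];
}

// Usage (hypothetical credentials):
// $pdo = new PDO(shardForPage((int) $_GET['page_id']), $user, $pass);
```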
Manage php sessions
Putting nginx in front of a PHP website will not work out of the box if you use sessions. Load balancing PHP does have issues, because by default sessions are persisted on the local storage of each server. Therefore you need to manage sessions explicitly. The traditional solution is to store session data in memcached and look it up by the session cookie.
Don't optimize prematurely.
Focus on getting your application out, so that users at the next order of magnitude above your current traffic still get an optimal experience.
Note: Your main potential pain points are discussed here on SO
No, it is not at all important to scale your application if you don't need to.
My view on this is:
Make it work
Make sure it works correctly - testability, robustness
Make it work efficiently enough to be cost effective to run
Then, if you have so much traffic that your system cannot handle it, AND you've already thrown all the hardware that (sensible) money can buy at it, then you need to scale. Not sooner.
Yes, it is relatively easy to scale read workloads, because you can simply perform reads against read-only database replicas. The challenge is to scale write workloads.
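Scaling reads this way typically means routing SELECTs to a replica and everything else to the primary. A minimal sketch, assuming PHP 8 and hypothetical DSNs; DbRouter is a hypothetical class, and its keyword check is deliberately naive. It also ignores replication lag: a request that has just written may not see its own write on a replica.

```php
<?php
/**
 * Hypothetical connection router: returns the DSN a query should use.
 * Reads go to a randomly chosen replica, writes to the single primary.
 */
final class DbRouter
{
    public function __construct(
        private string $primary,
        private array $replicas,
    ) {}

    public function dsnFor(string $sql): string
    {
        // Naive classification by leading keyword; locking reads stay on the primary.
        $isRead = preg_match('/^\s*(SELECT|SHOW)\b/i', $sql) === 1
            && stripos($sql, 'FOR UPDATE') === false;

        if (!$isRead || $this->replicas === []) {
            return $this->primary;
        }
        return $this->replicas[array_rand($this->replicas)];
    }
}
```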
A lot of sites have few writes, even if they're really busy.
The correct approach is to use some kind of load balancer such as:
http://www.softwareprojects.com/resources/programming/t-how-to-install-and-configure-haproxy-as-an-http-loa-1752.html
What this does is forward a given user session only to a given server, so you don't have to worry about sessions and where they are stored at all. What you do have to worry about is how to share the filesystem if the 2 servers are running on two different machines, especially if you make heavy use of the filesystem. Hope the article above helps...
I'm creating a web service that often scrapes data from remote web pages. After scraping this data, I have a simple multidimensional array of information to use. The scraping process is fairly taxing on my server, and the page load takes a while. I was considering adding a simple cache system using a MySQL database, where I create one row per remote web page with the array of information pulled from it stored as a JSON-encoded string. Is this a good enough system? Or would something like a text file per web page be a better idea?
Since you're scraping multiple web pages and you want your data to be persistently cached, you have a few options, the best of which would be to use memcache or a database such as MySQL. Using text files is not a good idea, because you would have to serialize/deserialize your data and read from your filesystem. Querying a database or memcache is many times more efficient.
Since you're probably looking for your cache to be somewhat persistent, I would suggest going with MySQL. You would simply create a table that has an auto-incrementing primary key, with a column for each element in your parsed JSON object. (Note that MySQL currently does not support arrays. To emulate them, you will need to use relational tables, or serialize your array data and store it in a text field. The former method is preferred.)
Every time you scrape a page, you would run an UPDATE statement to update that individual page's information in the database (or an INSERT ... ON DUPLICATE KEY UPDATE, so that a page scraped for the first time gets a row). If you put a unique index on whatever you use to uniquely identify a page (URL, etc.), you will achieve optimal look-up performance.
If you're looking to store the cache locally on 1 server (e.g. if your MySQL server and HTTP server are on the same box), you might be better off using APC, a caching extension for PHP.
If you're looking to store the data remotely (e.g. a dedicated cache box) then I would go with Memcache instead of MySQL.
"When all you have is a hammer ..."
I don't tend to have particularly large APC configs, 64 to 128 MB max. Memcache can go to a couple of gigabytes or maybe more (far more if you run multiple instances). Both are also transient: a restart of Apache or of Memcache (the latter is slightly less likely, or less frequent) will lose the data.
It depends, then, on how often you are willing to process the data to produce the cache, and how long that cache could otherwise be useful for. If it was good for weeks before you re-scraped the pages, MySQL is an entirely suitable backing store.
Potential other options, depending on how many items are being cached and how big the data is, are, as you suggest, a file-based cache, SQLite, or other systems.
Is it "better" (more efficient, faster, more secure, etc) to (A) cache data that is used on every page load in the $_SESSION array (but still querying a table for a flag to reload the data fresh), or (B) to load it from the database each time?
I'm using the cache method (A), but I'm worried that with hundreds of users, memory could become an issue? It's just simple data, like firstname, lastname, birthday, etc.
With either method, there's still a query being run. Thoughts?
If your data is used on every page and is the same for all users, I wouldn't cache it in $_SESSION (which means having a different copy of that data for each user), but with another mechanism, like:
file
In memory, with APC for instance (if only 1 server)
In memory, with memcached, for instance (if you have several servers)
If your data requires long calculations or several DB queries to be obtained, caching it in database could be another possibility (would mean only 1 query to fetch back, and less calculations)
If your data is not the same for each user (which seems to be the case in your situation, as you are caching names, birthdates, ...):
I would make sure I only cache what is necessary
Once you only have a little data to cache, putting it in the session should be quite OK
If you really have that many users, you'll probably have other scalability problems, and will most likely end up using something like memcached anyway, which means you'll have another way of caching ;-)
As a side note: if you are running the same query over and over again, your DB server should cache it by itself (for MySQL, it would go into the "query cache", although that cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0); so it would not be as bad as you think, I suppose, even if not that well optimized ^^
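The question's approach (A), keeping a small per-user copy in the session and re-checking a cheap flag on each request, can be sketched like this, assuming PHP 8. The version flag, the field whitelist and the loader callback are hypothetical; the point is that only the fields pages actually use are kept, and the full row is reloaded only when the stored version no longer matches.

```php
<?php
/**
 * Hypothetical per-user cache held in a session-like array.
 * $currentVersion is whatever cheap flag the app reads each request
 * (e.g. a version or updated_at column); $loadUser runs the full query.
 */
function cachedProfile(array &$session, int $currentVersion, callable $loadUser): array
{
    $cached = $session['profile'] ?? null;
    if ($cached !== null && $cached['version'] === $currentVersion) {
        return $cached['data'];                 // flag unchanged: reuse the session copy
    }

    // Flag changed (or first request): reload and keep only what pages need.
    $row = $loadUser();
    $data = array_intersect_key($row, array_flip(['firstname', 'lastname', 'birthday']));
    $session['profile'] = ['version' => $currentVersion, 'data' => $data];
    return $data;
}

// In a real request: session_start(); $profile = cachedProfile($_SESSION, $version, $loader);
```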
It depends on what your session handler is. Your session handler could be MySQL, in which case the question would not be which is better, but how to optimize your session handling.
The default PHP session handler is files, but it can be changed to MySQL quite easily (via session_set_save_handler()).
If you're talking about non-user-specific data, then just save it to the DB. Worry about optimizing if you run into problems later. It is usually much more beneficial to use a better design pattern than to think about optimizing beforehand. Design your code so you can easily use a different handler for storage, and you won't have optimizing problems later.
If it is user specific, use the session, but use an appropriate session handler if necessary.
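One way to follow the "design your code so you can easily use a different handler" advice is to hide storage behind a small interface. A sketch, assuming PHP 8; CacheStore, ArrayStore, ApcuStore and siteSettings() are hypothetical names. ApcuStore needs the APCu extension (the user-cache successor to APC) and is shown only to illustrate swapping backends; a memcached- or MySQL-backed class would implement the same two methods.

```php
<?php
/** Hypothetical storage abstraction; get() returns null on a miss. */
interface CacheStore
{
    public function get(string $key): mixed;
    public function set(string $key, mixed $value, int $ttl): void;
}

/** Per-request store: handy for tests, forgotten when the script ends. */
final class ArrayStore implements CacheStore
{
    private array $items = [];

    public function get(string $key): mixed
    {
        $item = $this->items[$key] ?? null;
        return ($item !== null && $item['expires'] > time()) ? $item['value'] : null;
    }

    public function set(string $key, mixed $value, int $ttl): void
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttl];
    }
}

/** Shared-memory store on a single server; requires the APCu extension. */
final class ApcuStore implements CacheStore
{
    public function get(string $key): mixed
    {
        $value = apcu_fetch($key, $success);
        return $success ? $value : null;
    }

    public function set(string $key, mixed $value, int $ttl): void
    {
        apcu_store($key, $value, $ttl);
    }
}

/** Calling code depends only on the interface, so the backend is swappable. */
function siteSettings(CacheStore $cache, callable $loadFromDb): array
{
    $settings = $cache->get('site_settings');
    if ($settings === null) {
        $settings = $loadFromDb();
        $cache->set('site_settings', $settings, 300);   // 5-minute TTL
    }
    return $settings;
}
```

Switching backends then means constructing a different class; siteSettings() itself does not change.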
I am wondering if it is viable to store cached items in Session variables, rather than creating a file-based caching solution? Because it is once per user, it could reduce some extra calls to the database if a user visits more than one page. But is it worth the effort?
If the data you are caching (or want to cache) does not depend on the user, why would you store it in the session, which is attached to a user?
Considering that sessions are generally stored in files, it will not optimize anything compared to using files yourself.
And if you have 10 users on the site, you will have the same data in cache 10 times? I do not think this is the best way to cache things ;-)
For data that is the same for all users, I would really go with another solution, be it file-based or not (even for data specific to one user, or a group of users, I would probably not store it in the session, except maybe if it is very small)
Some things you can look into:
Almost every framework provides some kind of caching mechanism. For instance:
PEAR::Cache_Lite
Zend_Cache
You can store cached data using lots of backends; for instance:
files
shared memory (using something like APC, for example)
If you have several servers and loads of data, memcached
(some frameworks provide classes to work with those; switching from one to another can even be as simple as changing a couple of lines in a config file ^^)
The next question is: what do you need to cache, and for how long? But that's another problem, and only you can answer it ;-)
It can be, but it depends largely on what you're trying to cache, as well as some other circumstances.
Is the information likely to change?
Is it a problem if slightly outdated information is shown?
How heavy is the load the query imposes on the database?
What is the latency to the database server? (shouldn't be an issue on a local network)
Should the information be cached on a per user basis, or globally for the entire application?
Amount of data involved
etc.
Performance gains can be significant in some cases. On a particular ASP.NET / SQL Server site I've worked on, adding a simple caching mechanism (at application level) reduced the CPU load on the web server by a factor of 3 (!) and at the same time prevented a whole bunch of database timeout issues when accessing a certain table.
It's been a while since I've done anything serious in PHP, but I think your only option there is to do this at the session level. Most of my considerations above are still valid, however. As for effort, it should take very little effort to implement, assuming your code is sufficiently structured.
Session should only really be used strictly for user specific data. If you're using it to cache things that should be common across multiple sessions, you're duplicating a lot of data needlessly. Why not just use the Cache that comes with ASP.NET (you can use inProcess, rather than SQL if your concern is DB roundtrips, since you'll be storing Cached data in memory)
Is there a difference between caching PHP objects on disk and not caching them? If cached, objects would only be created once for ALL the site visitors; if not, they will be created once for every visitor. Is there a performance difference, or would I be wasting my time doing this?
Basically, when it comes down to it, the main question is:
Multiple objects in memory, PER user (each user has his own set of instantiated objects)
VS
Single objects cached in a file for all users (all users use the same objects, for example the same error handler class, the same template handler class, and the same database handle class)
To use these objects, each PHP script would have to deserialize them anyway. So it's definitely not for the sake of saving memory that you'd cache them on disk -- it won't save memory.
The reason to cache these objects is when creating the object is too expensive. For an ordinary PHP object, this is not the case. But if the object represents the result of a costly database query, or information fetched from a remote web service, for instance, it could be beneficial to cache it locally.
Disk-based cache isn't necessarily a big win. If you're using PHP and concerned about performance, you must be running apps in an opcode-caching environment like APC or Zend Platform. These tools also provide caching you can use to save PHP objects in your application. Memcached is also a popular solution for a fast memory cache for application data.
Also keep in mind not all PHP objects can be serialized, so saving them in a cache, whether disk-based or in-memory, isn't possible for all data. Basically, if the object contains a reference to a PHP resource, you probably can't serialize it.
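When an object holds a resource such as a file handle, a common workaround is to leave the resource out of serialization and reopen it afterwards with the __sleep()/__wakeup() magic methods. A minimal sketch, assuming PHP 8; LogWriter is a hypothetical class. (Closures are stricter still: serialize() throws an Exception for them.)

```php
<?php
/** Hypothetical object that owns a resource (an open file handle). */
final class LogWriter
{
    /** @var resource|null not serializable; rebuilt in __wakeup() */
    private $handle = null;

    public function __construct(private string $path)
    {
        $this->open();
    }

    public function write(string $line): void
    {
        fwrite($this->handle, $line . PHP_EOL);
        fflush($this->handle);
    }

    public function isOpen(): bool
    {
        return is_resource($this->handle);
    }

    /** Serialize only the path; the handle itself cannot outlive the request. */
    public function __sleep(): array
    {
        return ['path'];
    }

    /** Reopen the resource after unserialize(). */
    public function __wakeup(): void
    {
        $this->open();
    }

    private function open(): void
    {
        $this->handle = fopen($this->path, 'a');
    }
}
```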
Is there difference between caching PHP objects on disk rather than not?
As with all performance tweaking, you should measure what you're doing instead of just blindly performing some voodoo rituals that you don't fully understand.
When you save an object in $_SESSION, PHP captures the object's state (serialization) and, with the default file-based session handler, writes it to a file. Upon the next request, PHP creates a new object and re-populates it with this state. This process is much more expensive than just creating the object, since PHP has to do disk I/O and then parse the serialized data. This has to happen on both read and write.
In general, PHP is designed as a shared-nothing architecture. This has its pros and cons, but trying to somehow sidestep it is usually not a very good idea.
Unfortunately there is no right answer for this. The same solution for the same website on the same server can provide better or much worse performance. It really depends on too many factors (application, software, hardware, configuration, server load, etc.).
The points to remember are:
- the slowest part of a server is the hard drive.
- object creation is WAY better than disk access.
=> Stay as far as possible from the HD and cache data in RAM if possible.
If you do not have a performance issue, I would advise doing... nothing.
If you have performance issue: benchmark, benchmark, benchmark. (The only real way to find a better solution).
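In the spirit of "benchmark, benchmark, benchmark", here is a tiny timing helper, assuming PHP 7.3+ for hrtime(). benchmark() and the Config class are hypothetical, and the comparison only measures CPU cost (no disk I/O is involved); a CLI micro-benchmark also says nothing about disk or memory contention under real load, so treat it as a first check.

```php
<?php
/** Run $fn $iterations times; return the median wall-clock time in microseconds. */
function benchmark(callable $fn, int $iterations = 1000): float
{
    if ($iterations < 1) {
        throw new InvalidArgumentException('iterations must be >= 1');
    }
    $samples = [];
    for ($i = 0; $i < $iterations; $i++) {
        $start = hrtime(true);                         // monotonic clock, nanoseconds
        $fn();
        $samples[] = (hrtime(true) - $start) / 1000;   // ns -> microseconds
    }
    sort($samples);
    return $samples[intdiv($iterations, 2)];           // median (upper one for even counts)
}

// Example comparison: building an object vs. restoring it from a serialized copy.
final class Config
{
    public array $values;

    public function __construct()
    {
        $this->values = array_fill(0, 100, 'x');
    }
}

$serialized = serialize(new Config());
$create  = benchmark(fn() => new Config());
$restore = benchmark(fn() => unserialize($serialized));
printf("create: %.2f us, unserialize: %.2f us\n", $create, $restore);
```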
Interesting video on that topic: YouTube Scalability
I think you would be wasting time, unless the data is static and complex to generate.
Say you had an object representing an ACL (Access Control List) stating which user levels have permissions for certain resources.
Populating this ACL might take considerable time, especially if data comes from a database. The cached ACL could be instantiated much quicker.
I have cached SQL query results and the results of time-intensive calculations, with impressive results. Right now I'm working on an application that, for each request, fetches more than 200 database records (with a lot of SQL functions and calculations in them) from a table with more than 200,000 records, and calculates results from the fetched data. I use the Zend_Cache component of Zend Framework to cache the calculated results, so next time I do not need to:
connect to the database
wait for the database server to find my records, calculate my SQL functions, and return results
fetch at least 200 (could even reach 1000) records into memory
step over all this data and calculate what I want from it
I just do:
call the Zend_Cache::load() method, which does some file reading.
That saves me at least 4-5 seconds on each request (very inaccurate; I did not actually profile it, but the performance gain is quite visible)
It can be useful in certain cases, but only after a careful study of the implications and after other kinds of performance improvements (like DB queries, data structures, algorithms, etc.).
The query you cache should be constant (and limited in number), and the data pretty static. To be effective (and worth it), your hard disk access needs to be far quicker than your DB query for that data.
I once used this by serializing cached objects to files, for relatively static content on a home page taking 200+ hits/s with a heavily loaded single-instance DB and unavoidable queries (at my level). I gained about 40% performance on that home page.
The code, when developing it from scratch, is very quick and straightforward, with file_put_contents/file_get_contents and serialize/unserialize. You can name each file after, say, the md5 checksum of your query.
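A sketch of that approach, assuming PHP 8: the cache file is named after the md5 of the SQL text, written with file_put_contents(serialize(...)) and read back with unserialize(file_get_contents(...)). The cachedQuery() name, the $runQuery callback and the filemtime()-based TTL check are additions for illustration; the temp-file-plus-rename() step keeps concurrent readers from seeing a partially written file.

```php
<?php
/**
 * Hypothetical file cache for query results, keyed by md5($sql).
 * $runQuery executes the (constant) query against the database.
 */
function cachedQuery(string $sql, callable $runQuery, int $ttl, string $dir): mixed
{
    $path = $dir . '/q_' . md5($sql) . '.ser';

    // Fresh cache file: skip the database entirely.
    if (is_file($path) && time() - filemtime($path) < $ttl) {
        return unserialize(file_get_contents($path));
    }

    $result = $runQuery($sql);

    // Write to a per-process temp name, then rename: readers see old or new, never half.
    $tmp = $path . '.' . getmypid() . '.tmp';
    file_put_contents($tmp, serialize($result));
    rename($tmp, $path);

    return $result;
}
```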
Having the objects cached in memory is usually better than on disk:
http://code.google.com/p/php-object-cache/
However, benchmark for yourself and compare the results. That's the only way you can know for sure.