How RESTful caching works (and how to use it) - PHP

I developed a really small REST API (using PHP) which provides information about users (it can also update and create users, but that doesn't matter for this question). Just to show the available calls (JSON output, by the way):
/api/users/54216
/api/users/54216?fields=id,name
/api/users/54216/photos
54216 is an example user id.
Until now, I have only used caching to save HTML pages for display, which was not complicated - I have never used a cache to save only data.
What should I do to cache these calls, and how do I use the cache afterwards? My plan (I think...) is to save the data to a JSON file once every X minutes and, when needed, read the cached file and decode it.
In addition, how do you recommend caching the user-specific information? Call no. 1 outputs all of the information and call no. 2 outputs only specific fields, and I don't want to use two cache files because that's really inefficient.
I have never worked in this area (caching [JSON] data for a REST API; it's my first time), so I am very confused.
EDIT:
I am talking about server-side caching.

I suggest you read up on HTTP caching.
The first important principle is to understand how HTTP caching works. There are basically two parts: TTL (Cache-Control) and the stale check (ETag). When a resource is generated by an origin server, you need to think of it as gone. You no longer have control over it; you only get to make suggestions to the client about what to do with it. The two mechanisms you have are the TTL (how long the client should keep the object in cache before checking back) and the stale check (a version identifier of the returned resource) that can be sent with a new GET request to the origin server to say, "Hey, I have this version - is it still good?". This gives the origin server the opportunity to say, "Yep, keep using that one", and provide a new TTL, if it is still valid.
You need to use these two controls in different ways to get the effects you want. For instance, when serving files that will never change (like the CSS for a build) you can set a really long TTL and no ETag. For something that doesn't change very often, but needs to propagate quickly when it does change (like the party members on a reservation), you would set a low TTL (like 1 minute) and an ETag. In this second example, the low TTL of 1 minute helps bursts from clients not overwhelm the origin server (scale), and the ETag allows the origin server to skip constructing the reservation object, provided it has a way to verify the current valid ETag faster than constructing the entire reservation. Another example would be something that doesn't change often and, when it does, can propagate slowly (like a user's ad recommendation profile): you can set a higher TTL (like 6 hours) and not worry so much about an ETag (although it would still be useful).
Ref: https://groups.google.com/d/msg/api-craft/YJMH0XMQJIM/HtdAPEXbQLMJ
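The low-TTL-plus-ETag combination described above can be sketched in PHP. This is a minimal sketch, not the quoted author's code: currentVersion() and buildReservation() are hypothetical stand-ins for your own data layer.

```php
<?php
// Hypothetical helpers standing in for your data layer.
function currentVersion(int $id): string { return '"v42"'; }   // e.g. a version column
function buildReservation(int $id): array { return ['id' => $id, 'party' => 4]; }

$id   = 54216;
$etag = currentVersion($id);             // assumed cheap to compute
header('Cache-Control: max-age=60');     // low TTL: clients re-check after 1 minute
header('ETag: ' . $etag);

// Stale check: if the client's version is still current, skip the expensive build.
if (($_SERVER['HTTP_IF_NONE_MATCH'] ?? null) === $etag) {
    http_response_code(304);             // "yep, keep using that one"
    exit;
}

header('Content-Type: application/json');
echo json_encode(buildReservation($id));
```

The point of the 304 branch is that the expensive buildReservation() call only runs when the version check fails.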
Or, if you want to cache on the server side, have a look at memcached (tutorial).
Also look at reverse proxy cache solutions like Varnish, etc.
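Since the edit says this is about server-side caching, the asker's own plan - write the JSON to a file once every X minutes and decode it when needed - could be sketched like this. fetchUserFromDb() is a hypothetical stand-in for the real query, and the filtering helper addresses the "two cache files" concern:

```php
<?php
// Minimal file-based cache: serve the cached JSON file if it is younger
// than $ttl seconds, otherwise rebuild it from the database.
function getUserCached(int $id, int $ttl = 300): array
{
    $file = sys_get_temp_dir() . "/user_{$id}.json";

    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return json_decode(file_get_contents($file), true);
    }

    $user = fetchUserFromDb($id);                  // hypothetical DB call
    file_put_contents($file, json_encode($user), LOCK_EX);
    return $user;
}

// For call no. 2 (?fields=id,name), reuse the SAME cache file and filter it,
// instead of keeping a second cache per field combination.
function getUserFields(int $id, array $fields): array
{
    return array_intersect_key(getUserCached($id), array_flip($fields));
}

function fetchUserFromDb(int $id): array           // stub for this sketch
{
    return ['id' => $id, 'name' => 'Alice', 'email' => 'a@example.com'];
}
```

One cache file per user; the ?fields=... selection is applied after the cache read, so both call no. 1 and call no. 2 are served from the same file.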

https://devcenter.heroku.com/articles/ios-network-caching-http-headers
This explains caching from an iOS perspective, but the terms Cache-Control, max-age, ETag and Last-Modified are explained well.

Related

Passing class Instances and other data between pages in PHP

I've been looking into the problem of having persistent data available between pages in PHP. This particularly applies to objects that have been set up on one page and need to be accessed later. It seems this is more difficult than I assumed it would be, but there are several ways it could be done, although they all seem a bit awkward to use, especially when the data gets quite complex:
Passing the data via $_GET or $_POST to the next page
Copying the data to a database and retrieving it in the next page
Putting the data in a session or cookie
Serializing the object and recreating it with the same parameters and values
These all seem quite laborious, as they mostly rely on deconstructing your existing data structure and rebuilding it again on the next page. I assume this is to reduce the memory requirements of the PHP server by purging data from one page as soon as it's closed and starting with a 'clean slate'.
Is there a more direct way of passing larger data structures between pages in PHP?
Many thanks,
Kw
I assume this is to reduce the memory requirements of the PHP server by purging data from one page as soon as it's closed
Nope, this is not because of memory-efficiency concerns. It is because the HTTP protocol is stateless: each request must carry all the information necessary to fulfill it.
Counter-example to your proposed scenario:
1. Alice visits page A; some objects are created, and you want them to be available on page B.
2. You track a visit to page B.
2.1. But it's not Alice, it's Bob. How do you determine which objects to show, and where do you get them from?
2.2. It is Alice again, but the request arrived at another machine in your 1000-server farm. Naturally, you don't have the original PHP objects. What do you do now?
If you use $_GET or $_POST you are limited to non-sensitive data, and you expose your objects to any user. You don't want that.
Cookies are limited in size:
cookies are usually limited to 4096 bytes, and you can't store more than 20 cookies per site.
The best way to persist objects between requests (for the same user) is to use Sessions. There are already session save handlers for memcached, redis, mysql etc. You can also write your own if you need something custom.
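As an illustration of the save handlers mentioned above, switching sessions to memcached is mostly a configuration change. This sketch assumes the memcached extension is installed and a server is listening on localhost:11211:

```php
<?php
// Store sessions in memcached instead of files (assumes the memcached
// PHP extension is installed and a server runs on localhost:11211).
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'localhost:11211');
// session_start();              // from here on, $_SESSION reads and writes
// $_SESSION['cart'] = $cart;    // are serialized into memcached transparently
```

Because every web server in a farm can reach the same memcached server, this also solves the 1000-server-farm problem from the counter-example above.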

How to check if the query returned result has changed since last check

For caching purposes, I need to check whether an SQL result set has changed. HTTP_IF_MODIFIED_SINCE will check if the file has changed (correct me if I am wrong). I need a way to check if the SQL results have changed. What's the best approach/way to do so? Thanks.
I need to check whether sql returned set of data has changed
The real answer is: you can't :)
How can you tell that data has changed without checking back? Correct - you simply cannot tell!
Adjust your cache time according to your assumptions about when the data might change. There is no easy way to decide this. Caching means weighing how costly it is to fetch fresh data from the database against how long you (your users) can live with possibly stale data - and not every piece of data is as cacheable as others.
The database itself uses query caches and should be able to serve an unchanged result set from its cache (doing this much more intelligently than PHP possibly could). However, in a multi-user environment those caches tend to be freed early because of heavy database traffic. Here you can help out with your very own caching strategy.
You can use md5() and compare hashes of the serialized result sets:
if (md5(serialize($lastResult)) === md5(serialize($currentResult))) { /* unchanged */ }
See the md5() documentation.
HTTP_IF_MODIFIED_SINCE will check if the file code has changed (correct me if I am wrong)
The If-Modified-Since header tells the application to respond with a 304 status code if it can determine that the resource hasn't changed since the specified date.
For static files, usually the web server handles this itself, and the modification date is the modification date of the static file.
For non-static resources, like a PHP script, the application (i.e. your code) has to handle the If-Modified-Since header itself. Most applications don't. Applications that implement it usually refer to the modification of the data, not the code.
You are probably looking for "optimistic concurrency control". This usually involves a "version" field in the database, which you can then use as the ETag in HTTP (the keyword here is "conditional GET").
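The If-Modified-Since handling described above can be sketched like this. lastModifiedOf() is a hypothetical cheap query against the row's updated_at column, standing in for the expensive full query:

```php
<?php
// Respond 304 if the data hasn't changed since the client's cached copy.
// lastModifiedOf() is a hypothetical cheap lookup of the row's updated_at.
function lastModifiedOf(int $id): int { return 1700000000; }   // unix timestamp

$id       = 54216;
$modified = lastModifiedOf($id);
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $modified) . ' GMT');

$since = $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? '';
if ($since !== '' && strtotime($since) >= $modified) {
    http_response_code(304);   // data unchanged: empty response, no full query
    exit;
}
// ...otherwise run the full query and emit the complete 200 response.
```

The cheap timestamp (or version) lookup only pays off if it really is cheaper than rebuilding the whole result set, as the first answer in this thread points out.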

What's a best way to store a lot of data in cache?

My website sends curl requests to an external service and gets XML responses.
The requests are user specific and the responses are rather heavy (& several requests on the same page), so it takes time to load the page and uses too much server's traffic.
How I tried to solve the problem:
Sending the requests from the client side (JS). Unluckily for me, it becomes rather messy to parse the received data and integrate it into the page's objects.
Putting the responses in the session (as they are user-specific). The session files on the server grow large too fast. I implemented a counter that erases all the responses from the session if their number gets too big (using this now).
Memcache? Too much data to save
Do you think I should use one of the solutions or is there another way to do it?
Use a combination of:
cache
database
You push things into your "data store" (cache plus database). When you need something, look it up in the data store: it checks the cache first and returns the item if it is there; otherwise it falls back to the database. Only if both fail do you fetch the info from the original source.
You could also increase the size of the cache (but that is not a good solution).
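The lookup order described above (cache first, then database) is the classic read-through pattern. A sketch, where ArrayCache is a stand-in for memcached/redis and queryDb() is a hypothetical expensive query:

```php
<?php
// Read-through lookup: try the cache, fall back to the "database",
// then warm the cache so the next call is a hit.
class ArrayCache
{
    private array $store = [];
    public function get(string $key) { return $this->store[$key] ?? null; }
    public function set(string $key, $value): void { $this->store[$key] = $value; }
}

function queryDb(string $key): string
{
    return "db-value-for-$key";           // pretend this is expensive
}

function getData(ArrayCache $cache, string $key)
{
    $value = $cache->get($key);
    if ($value !== null) {
        return $value;                    // cache hit: no database work
    }
    $value = queryDb($key);               // cache miss: hit the database
    $cache->set($key, $value);            // warm the cache for next time
    return $value;
}
```

Swapping ArrayCache for a real Memcached client changes nothing in the calling code, which is the point of hiding the lookup order behind one getData() function.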
Try something like this:
$key = "user_id_" . $user_id . "_category_" . $category_id;
Then store each piece of data under this key:
$memcache->set($key, $data, 0, 3600);

Is it bad to store $_session variables for every user?

Question basically says it all. I get a lot of traffic, about 200k hits a day. I want to store the original referrer (where they came from) in a session variable for various purposes. Is this a good idea, or should I stick it in a database instead?
You can do both at once :). PHP lets you define the storage logic of your sessions in your own scripts, so it is possible to store sessions in a database as well. Check the manual for session_set_save_handler().
Using a database would have its advantages if you use load balancing (or plan to do so). This way all web servers could read the session data from the same database (or cluster), and the load balancer would not have to worry about which request should be forwarded to which web server. If session data is stored in files, which is the default mechanism, then a load balancer has to forward each request of a session to the same physical web server, which is much more complex, as the load balancer has to work at the HTTP level.
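A database-backed handler via session_set_save_handler() might look like this sketch (PHP 8.1+). The `sessions` table layout is an assumption, not a fixed schema:

```php
<?php
// Minimal PDO-backed session handler; assumes a table
//   sessions(id TEXT PRIMARY KEY, data TEXT, updated_at INTEGER).
class DbSessionHandler implements SessionHandlerInterface
{
    public function __construct(private PDO $pdo) {}

    public function open(string $path, string $name): bool { return true; }
    public function close(): bool { return true; }

    public function read(string $id): string
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        return (string) $stmt->fetchColumn();   // '' when the session is new
    }

    public function write(string $id, string $data): bool
    {
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)');
        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy(string $id): bool
    {
        return $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')
                         ->execute([$id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $max_lifetime]);
        return $stmt->rowCount();               // number of expired sessions
    }
}

// session_set_save_handler(new DbSessionHandler($pdo), true);
// session_start();
```

With this registered, every web server behind the load balancer reads and writes the same `sessions` table, so no sticky sessions are needed.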
You could just store the information in a cookie if you only need it for the user's current session. Then you don't need to store it at all on your end.
There are a few downsides as well:
The user may have cookies disabled, so you may not be able to save it.
If you need the information next time, you may not be able to get it, as the cookie could have been deleted.
Cookies are not very secure, so don't save passwords, bank info, etc.
So if this information is required no matter what, maybe this is not the way to go. If the information is optional, then it will work.
The default PHP session handler is the file handler. So, the pertinent questions are:
Are you using more than 1 webserver without sticky sessions (load balancing)?
Are you running out of disk space?
Do you ever intend to do those?
If yes (to any), then store it in a database. Or, even better, calculate the stuff on every request (or cache it somewhere like Memcached). You could also store the stuff in a signed cookie (to prevent tampering).
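The "signed cookie (to prevent tampering)" option can be sketched with an HMAC. This is a minimal sketch; $secret is an assumed server-side key that should live outside the web root in practice:

```php
<?php
// Sign a cookie value with an HMAC so the client can hold the data
// but cannot change it without the server noticing.
function signValue(string $value, string $secret): string
{
    return $value . '.' . hash_hmac('sha256', $value, $secret);
}

// Returns the original value if the signature checks out, null otherwise.
function verifyValue(string $signed, string $secret): ?string
{
    $pos = strrpos($signed, '.');
    if ($pos === false) {
        return null;
    }
    $value = substr($signed, 0, $pos);
    $mac   = substr($signed, $pos + 1);
    // hash_equals() is a constant-time comparison (avoids timing attacks).
    return hash_equals(hash_hmac('sha256', $value, $secret), $mac) ? $value : null;
}

// setcookie('referrer', signValue('https://google.com/', $secret));
```

This only prevents tampering, not reading: the referrer is still visible in the cookie, which is fine for this use case but not for secrets.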

ASP.NET or PHP: Is Memcached useful for storing user-state information?

This question may expose my ignorance as a web developer, but that wouldn't exactly be a bad thing for me now would it?
I have the need to store user-state information. Examples of information that I need to store per user. (define user: unauthenticated visitor)
User arrived to the site from google/bing/yahoo
User utilized the search feature (true/false)
List of previous visited product pages on current visit
It is my understanding that I could store this in the view state, but that causes a problem with page load from the end-users' perspective because a significant amount of non-viewable information is being transferred to and from the end-users even though the server is the only side that needs the info.
On a similar note, it is my understanding that the session state can be used to store such information, but does not this also result in the same information being transferred to the user and stored in their cookie? (Not quite as bad as viewstate, but it does not feel ideal).
This leaves me with either a server-only-session storage system or a mem-caching solution.
Is memcached the only good option here?
Items 1 and 2 seem to be logging information - i.e. there's no obvious requirement to refer back to them, unless you want to keep their search terms. So just write them to a log somewhere and forget about them?
I could store this in the view state
I'm guessing that's an ASP.NET thing - certainly it doesn't mean anything to me, so I can't comment.
it is my understanding that the session state can be used to store such information
Yes - it would be my repository of choice for item 3.
does not this also result in the same information being transferred to the user and stored in their cookie?
No - a session cookie is just a handle to the data stored server-side in PHP (and every other HTTP session implementation I've come across).
This leaves me with either a server-only-session storage system or a mem-caching solution
memcache is only really of any use as a session-storage substrate.
As for whether you should use memcache as your session-storage substrate: on a single server there's not a lot of advantage. Although NT (IIRC) uses fixed-size caches/buffers (whereas Posix/Unix/Linux will use all available memory), I'd still expect most of the session I/O to go to memory rather than disk unless your site is overloaded. The real advantage is if you're running a cluster (but in that scenario you might want to consider something like Cassandra).
IME, sessions are not that great an overhead; however, if you want to keep an indefinite history of the activity in a session, then you should overflow the data out of the session (or keep it separate from the session) to prevent it getting too large.
C.
Memcached is good for read-often/write-rarely key/value data; it is a cache, as the name implies. E.g. if you are serving product info that changes infrequently, you store it in memcached so you are not querying the database repeatedly for semi-static data. You should use session state to store your info; the only thing that will be passed to and fro is the session identifier, not the actual data - that stays on the server.
