I have an application that often retrieves remote websites (via cURL), and I was wondering what my options are regarding caching of those HTTP requests. For example:
application -->curl-->www.example.com
The issue is that cURL could be called hundreds of times in an hour, and every time it would need to make hundreds of HTTP requests that are basically the same. So, what could I do to speed things up? I was experimenting with Traffic Server but wasn't very satisfied with the results. I guess DNS caching is a must, but what else can I do here? The system the app is running on is CentOS.
I don't know why Traffic Server didn't give you satisfactory results, but in general a forward proxy setup with caching is the way to do this. You would of course make sure that the response from www.example.com is cacheable, either via configuration on the caching proxy server, or directly on the origin (example.com). This is probably the biggest confusion in the proxy caching world: expectations of what is cacheable often don't match what the responses actually allow.
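As a minimal sketch, assuming a caching forward proxy (Squid, Traffic Server, ...) is listening locally on port 3128 (the host and port here are assumptions, adjust to your setup), routing cURL through it looks like this:

<?php
// Route the request through the local caching forward proxy; repeated
// fetches of the same cacheable URL are then answered from the proxy's
// cache instead of hitting www.example.com again.
$ch = curl_init('http://www.example.com/');
curl_setopt($ch, CURLOPT_PROXY, '127.0.0.1:3128'); // assumed proxy address
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return body as a string
$body = curl_exec($ch);
curl_close($ch);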
Related
I have this application containing 2 parts:
User-facing SPA web served by Apache
NodeJS API server
When a user goes to the SPA web, Apache will do authentication and prefetch data for that user by making a POST and a GET request to the API before serving the page. This works fine on normal days, but it is pretty slow when there is a lot of traffic.
Apart from improving the API, I think delegating authentication and data fetching to client-side would improve performance.
My questions are:
Does prefetching data in Apache before serving the SPA page hurt performance?
And would it be better to let the client side do the fetching?
Ideally you should make requests to the API directly from your SPA using Ajax; then you can store your static files (HTML, JS, CSS) on Google Cloud Storage or Amazon S3.
1) Yes, it's bad for performance, because it's a needless extra layer between the browser and your API. Also, by default Apache creates a new process to handle each request, so it's bad for your RAM.
2) Yes, it's better, and it's best practice nowadays.
Here are my recommendations:
User-facing SPA web served by Apache
Use Nginx instead. It will be much faster because it is an event-based web server rather than one that launches a process per request.
When a user goes to the SPA web, Apache will do authentication and prefetch data for that user by making a POST and a GET request to the API before serving the page. This works fine on normal days, but it is pretty slow when there is a lot of traffic.
If this happens only for the first call, then you have to make a judgment call on your end. Basically you have 1 request which, on the server, translates into 2 sub-requests, so you have 3 requests in total.
Now if a client refreshes the page and you still re-fetch the information, that is bad design.
Also, doing things on the web server means you have less flexibility in testing changes: if tomorrow you want to change the approach, you have to update your config and reload the server every time.
Does prefetching data in Apache before serving the SPA page hurt performance?
Yes, it does. You increase the load on the server, and you have better control of things on the client side.
And would it be better to let the client side do the fetching?
Yes, and you can add client-side and server-side caching separately to make the system even more performant.
But this all becomes an opinion-based discussion, so you should use the approach that serves your purpose.
I have a PHP, MySQL, Apache based website. The hosting server is located in London.
I opened a CloudFlare account, enabled and configured my website to route via CloudFlare, and enabled caching for static content.
I ran a page load test from different countries and could not see any improvement.
The test tool, however, detects that I am making effective use of a CDN, but there isn't any performance improvement.
1. My static resources each take around 20ms to download when accessed from London.
2. When accessed from other countries, these resources take a good 600ms roughly.
Am I missing something?
A couple of things to check:
Are you sending appropriate Cache-Control headers for the static content, and have you confirmed that CloudFlare is actually caching it? (You should get "HIT" in the CF-Cache-Status response header.)
Have you run multiple tests for the resources? Perhaps they just hadn't been cached yet, so you were measuring the timing of the CloudFlare-to-origin fetch.
CloudFlare by default does not cache pages, only static resources. That may be why your pages are not any faster. If you want your pages to be cached and served faster, you have to set up Page Rules to do so.
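On the first point, if any of the static content is served through PHP, a minimal sketch of sending explicit caching headers looks like this (for purely static files the same headers would be set in the Apache config instead; the one-year max-age is illustrative, not a recommendation):

<?php
// Mark the response as cacheable by shared caches (CloudFlare) and browsers.
header('Cache-Control: public, max-age=31536000'); // up to one year
header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 31536000) . ' GMT');
// After deploying, re-request the resource and inspect the CF-Cache-Status
// response header: HIT means CloudFlare served it from its cache.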
I am having a bit of difficulty understanding the difference between a normal cache (memory, file, DB, etc.) and HTTP caching with a reverse proxy.
Example.
Let's say I have a page divided into 3 parts:
movies
games
apps
When I retrieve those parts from the DB, I cache each part under its own key, and when new data is entered into any of those parts, I flush that part's cache and rebuild it including the new data. So each part only updates if something new was added.
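As a minimal sketch of this per-key approach, assuming Laravel's Cache facade (the framework from the links in the edit below; the key and model names are illustrative):

<?php
// Per-key caching sketch; 'section.movies' and the Movie model are
// illustrative names, not part of the original question.
use Illuminate\Support\Facades\Cache;

// Read side: build the movies section once, then keep serving it from cache.
$movies = Cache::rememberForever('section.movies', function () {
    return Movie::latest()->get(); // hypothetical query
});

// Write side: when a new movie is added, forget only this section's key;
// the games and apps sections stay cached.
Cache::forget('section.movies');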
On the other hand, HTTP caching has something called ESI (Edge Side Includes), with which you can include page partials that have a different cache lifespan from the main page, which is perfect. But
why would I need to use it?
Or what is the advantage over the first method?
Edit
This is similar to what I was after, but still, why would you use/continue to use the reverse proxy over the approaches below?
https://laracasts.com/series/russian-doll-caching-in-laravel
https://www.reddit.com/r/laravel/comments/3b16wr/caching_final_html_from_view/
https://github.com/laracasts/matryoshka
A reverse proxy cache has a few benefits:
Requests served by the proxy never hit your webserver. It's typically cheaper/easier to scale out your proxy servers (e.g. by using commercial providers like Akamai) than to scale out your webserver(s).
This means you can survive traffic spikes and denial-of-service attacks with much less stress.
It also means you can serve traffic that's not close to your origin webserver much faster; if your site serves a global audience, latency can have a big impact on your perceived response time.
You can also take your webserver offline, e.g. for upgrades, without affecting your end users.
Drawbacks of reverse proxy caching:
It's an additional layer of architecture, which brings complexity, maintenance, administration etc.
It may also require additional costs (most sites have separate infrastructure for the reverse proxy servers, or use a commercial CDN provider).
Your solution design needs to manage cache invalidation across even more layers - this can easily become complex and error-prone.
This, in turn, means that debugging and testing the solution can be very hard. If the QA team report an incorrect page, you need to be able to find out where the item was served from - reverse proxy, application cache, database?
I created a small and very simple REST-based webservice with PHP.
This service gets data from a different server and returns the result. It's more like a proxy than a full service.
Client --(REST call)--> PHP Webservice --(Relay call)--> Remote server
<-- Return data ---
In order to keep costs as low as possible, I want to implement caching on the PHP webservice by keeping data in server memory for a period of time and only re-requesting it after a timeout (let's say after 30 mins).
In pseudo-code I basically want to do this:
$id = $_GET["id"];
$result = null;
if (isInCache($id) && !cacheExpired($id, 30)) {
    // Fresh copy in the cache: serve it without contacting the remote server
    $result = getFromCache($id);
} else {
    // Cache miss or expired entry: fetch fresh data and store it under its id
    $result = getDataFromRemoteServer($id);
    saveToCache($id, $result);
}
printData($result);
The code above should get data, identified by an id, from a remote server. If the data is in the cache and 30 minutes have not yet passed, it should be read from the cache and returned as the result of the webservice call. If not, the remote server should be queried.
While thinking on how to do this I realized 2 important aspects:
I don't want to use filesystem I/O operations because of performance concerns. Instead, I want to keep the cache in memory. So, no MySQL and no local file operations.
I can't use sessions because the cached data must be shared across different users, browsers and internet connections worldwide.
So, if I could somehow share objects in memory between multiple GET requests, I would be able to implement this caching system pretty easily I think.
But how could I do that?
Edit: I forgot to mention that I cannot install any modules on that PHP server. It's a pure "webhosting-only" service.
I would not implement the cache at the (PHP) application level. REST is HTTP, therefore you should use a caching HTTP proxy between the internet and the web server. The web server and the proxy could live on the same machine until the application grows (if you are worried about costs).
I see two fundamental problems with application-level or server-level caching:
Using memcached would lead to a situation where a user session is bound to the physical server where the memcached instance lives. This makes horizontal scaling a lot more complicated (and expensive).
Software should be developed in layers. Caching should not be part of the application layer (and/or business logic); it is a different layer using specialized components. And as there are well-known solutions for this (an HTTP caching proxy), they should be used in favour of self-crafted solutions.
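To make that layering concrete, a rough sketch under the assumption that a caching proxy (Squid, Varnish, Traffic Server, ...) sits in front of the service: the PHP side only declares how long a response may be reused, and the 30 minutes match the timeout from the question.

<?php
// The proxy layer, not the application, does the caching; PHP just marks
// the response as cacheable. getDataFromRemoteServer() is the hypothetical
// helper from the question's pseudo-code.
header('Content-Type: application/json');
header('Cache-Control: public, max-age=1800'); // response may be reused for 30 minutes
echo json_encode(getDataFromRemoteServer($_GET["id"]));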
Well, if you do have to use PHP, and you cannot modify the server, and you do want in-memory caching for performance reasons (without first measuring that any other solution has good enough performance), then the solution for you must be to change the webhosting.
Otherwise, you won't be able to do it. PHP does not really have any memory-sharing facilities available. The usual approach is to use Memcached or Redis or something else that runs separately.
And for a start and as a proof of concept, I'd really go with a file-based cache. Accessing a file instead of requesting a remote resource is WAY faster. In fact, you'd probably not notice the difference between a file cache and a memory cache.
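A minimal sketch of such a file-based cache, implementing the helper names from the question's pseudo-code (the cache directory and the hashing scheme are assumptions):

<?php
// File-based cache sketch; implements the helpers used in the question.
define('CACHE_DIR', __DIR__ . '/cache');

function cachePath($id) {
    // Hash the id so arbitrary values can't escape the cache directory
    return CACHE_DIR . '/' . md5($id) . '.cache';
}

function isInCache($id) {
    return is_file(cachePath($id));
}

function cacheExpired($id, $minutes) {
    // The file's modification time doubles as the expiry clock
    return (time() - filemtime(cachePath($id))) > $minutes * 60;
}

function getFromCache($id) {
    return file_get_contents(cachePath($id));
}

function saveToCache($id, $data) {
    if (!is_dir(CACHE_DIR)) {
        mkdir(CACHE_DIR, 0700, true);
    }
    // LOCK_EX avoids two concurrent requests writing the same file at once
    file_put_contents(cachePath($id), $data, LOCK_EX);
}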
I am interested in a scenario where webservers serving a PHP application are set up behind a load balancer.
There will be multiple webservers with APC behind the load balancer. All requests will have to go through the load balancer, which then sends them to one of the web servers to process.
I understand that memcached should be used for distributed caching, but I think having an APC cache on each machine for things like application configuration and other objects that will NOT differ across servers would yield even better performance.
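As a minimal sketch of that idea (the key name and the loader function are hypothetical), each box would lazily populate its own local APC user cache:

<?php
// Each server keeps its own copy in the local APC user cache;
// 'app.config' and loadConfigFromDisk() are illustrative names.
$config = apc_fetch('app.config', $hit);
if (!$hit) {
    $config = loadConfigFromDisk();          // hypothetical loader
    apc_store('app.config', $config, 3600);  // cache locally for an hour
}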
There is also an administrator area for this application. It is also accessed via the load balancer (for example, site.com/admin). In a case like this, how can I call apc_clear_cache to clear the APC object cache on ALL servers?
Externally, your network has a public IP that routes all requests to your load balancer, which distributes load round-robin. From outside, you cannot make a request to clear the cache on each server one at a time, because you don't know which server will handle any given request. Within your network, however, each machine has its own internal IP and can be called directly. Knowing this, you can do some funny/weird things that do work externally.
A solution I like is being able to hit a single URL and get everything done, such as http://www.mywebsite/clearcache.php or something like that. If you like that as well, read on. Remember you can put authentication in front of this so only your admin can hit it, or protect it however you like.
You could create logic where you externally make one request to clear the cache on all servers. Whichever server receives the request will have the same logic to talk to all servers and have each clear its own cache. This sounds weird and a bit Frankenstein, but here goes the logic, assuming we have 3 servers with internal IPs 10.232.12.1, 10.232.12.2 and 10.232.12.3:
1) All servers would have two files called "initiate_clear_cache.php" and "clear_cache.php", identical copies on every server.
2) "initiate_clear_cache.php" would do a file_get_contents for each machine in the network calling "clear_cache.php" which would include itself
for example:
file_get_contents('http://10.232.12.1/clear_cache.php');
file_get_contents('http://10.232.12.2/clear_cache.php');
file_get_contents('http://10.232.12.3/clear_cache.php');
3) The file called "clear_cache.php" does the actual cache clearing for its respective machine (see the sketch after this list).
4) You only need to make a single request now, such as http://www.mywebsite/initiate_clear_cache.php, and you are done.
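A hedged sketch of what "clear_cache.php" might look like (the internal-IP check is an assumption; protect it however fits your setup):

<?php
// clear_cache.php - runs on each web server and clears its own APC cache.
// Only allow calls from inside the 10.232.12.x network (assumed range).
if (strpos($_SERVER['REMOTE_ADDR'], '10.232.12.') !== 0) {
    header('HTTP/1.1 403 Forbidden');
    exit;
}
apc_clear_cache();         // clear the opcode/system cache
apc_clear_cache('user');   // clear the user cache (apc_store'd objects)
echo 'cache cleared on ' . gethostname();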
Let me know if this works for you. I've done something similar in .NET and Node.js, but haven't tried it in PHP yet; I'm sure the concept is the same. :)