How can I cache dynamic data from a REST API using PHP?

UPDATE: I've decided to take the advice below and implement a Memcached tier in my app. Now I have another thought. Would it be possible/a good idea to do an AJAX request on a poll (say every five or ten minutes) that checks Memcached and refreshes the entry when it has expired? That way the latency is never experienced by the end user, because the refresh happens silently in the background.
I'm using Directed Edge's REST API to do recommendations on my web app. The problem I'm encountering is that I query for a significant number of recommendations in multiple places across the site, and the latency is significant, making the page load something like 2-5 seconds for each query. It looks terrible.
I'm not using Directed Edge's PHP bindings, and instead am using some PHP bindings I wrote myself. You can see the bindings on GitHub. I'm connecting to their API using cURL.
How can I cache the data I'm receiving? I'm open to any number of methods so long as they're fairly easy to implement and fairly flexible.
Here's an example of the client code for getting recommendations.
$de = new DirectedEdgeRest();
$item = "user".$uid;
$limit = 100;
$tags = "goal";
$recommendedGoals = $de->getRecommended($item, $tags, $limit);

You can cache to a file using serialize and file_put_contents:
file_put_contents("my_cache", serialize($myObject));
You could also cache to memcached or a database.
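For example, a minimal read-through file cache with a time-to-live, built around the getRecommended() call from the question (the function name, cache path, and 600-second TTL are just illustrative):
// Return cached recommendations while they are fresh; otherwise call
// the API and rewrite the cache file. Sketch only: the cache path and
// the 600-second TTL are assumptions to adjust for your app.
function getRecommendedCached($de, $item, $tags, $limit)
{
    $cacheFile = "/tmp/recs_" . md5($item . $tags . $limit);
    $ttl = 600; // seconds before the cache is considered stale

    if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return unserialize(file_get_contents($cacheFile));
    }

    $data = $de->getRecommended($item, $tags, $limit);
    file_put_contents($cacheFile, serialize($data));
    return $data;
}
With that in place, the client code above only pays the API latency once every ten minutes:
$recommendedGoals = getRecommendedCached($de, $item, $tags, $limit);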

Related

Increase speed of dynamic API calls using PHP

I am calling different APIs on one of my websites. I am able to get optimal results with multi-cURL in PHP. However, I'm noticing that the speed becomes very slow when traffic is a little high. I have read that caching is another way to speed up websites. However, my question is: can I use caching when the API calls I'm making are entirely dependent on user-based inputs? Or is there any alternative solution to this?
It could be that one request is taking too long to load and, as a result, delaying the other requests.
The answer to your question depends on what kind of tasks the user performs with the data. Basically, a cache can be used for tasks that retrieve or query data, and is not suitable for inserting, mutating, or deleting data. There are many ways to implement a cache in your web application, but one of the easiest is to use GET requests for all user requests that only retrieve data, and then configure the web server or a CDN to cache them.
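Even input-dependent calls can be cached server-side, by deriving the cache key from the input itself so that identical queries are served from memory. A sketch using the Memcached extension (the server address, the five-minute TTL, and fetch_from_api() are assumptions standing in for your real setup):
// Key the cache on the user's input so identical queries skip the API.
// fetch_from_api() is a stand-in for your real multi-cURL call.
$mc = new Memcached();
$mc->addServer('localhost', 11211);

$key = 'api_' . md5($userQuery);   // cache key derived from the input
$result = $mc->get($key);

if ($result === false) {           // cache miss: call the API for real
    $result = fetch_from_api($userQuery);
    $mc->set($key, $result, 300);  // keep it for 5 minutes
}
Popular inputs then hit the cache, which is exactly when the load problem appears.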

Parallel processing/forking in PHP to speed up checking large arrays

I have a PHP script on my website that is designed to give a nice overview of a domain name the user enters. It does this job quite well, however it is very slow. This might have something to do with the fact that it's checking an array of 64 possible domain names, and THEN moving on to checking nameservers for A records/MX records/NS records etc.
What I would like to know: is it possible to run multiple threads/child processes of this, so that it will check multiple elements of the array at once and generate the output a lot faster?
I've put an example of my code in a pastebin (to avoid creating a huge and spammy post on here):
http://pastebin.com/Qq9qKtP9
In perl I can do something like this:
$fork = new Parallel::ForkManager($threads);
foreach (Something here) {
    $fork->start and next;  # parent: fork a child and move to the next item
    # ... child does its share of the work here ...
    $fork->finish;          # child: exit when done
}
And I could make the loop run in as many processes as needed. Is something similar possible in PHP, or are there any other ways you can think of to speed this up? The main issue is that Cloudflare has a timeout, and often the check takes long enough that CF blocks the response.
Thanks
You never want to create threads (or additional processes for that matter) in direct response to a web request.
If your frontend is instructed to create 60 threads every time someone clicks on page.php, and 100 people come along and request page.php at once, you will be asking your hardware to create and execute 6,000 threads concurrently, to say nothing of the threads used by operating system services and other software. For obvious reasons, this does not, and never will, scale.
Rather you want to separate out those parts of the application that require additional threads or processes and communicate with this part of the application via some kind of sane RPC. This means that the backend of the application can utilize concurrency via pthreads or forking, using a fixed number of threads or processes, and spreading work as evenly as possible across all available resources. This allows for an influx of traffic; it allows your application to scale.
I won't write out a full implementation here; the pattern itself is fairly trivial.
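Still, as a rough sketch of the fixed-pool idea, assuming the pcntl extension and a CLI backend process ($domains is the question's array of names; check_domain() is an illustrative stand-in):
// Split the work into a fixed number of chunks and fork one child per
// chunk, instead of one process per item. CLI only; requires ext-pcntl.
$workers = 4; // fixed pool size, independent of traffic
$chunks  = array_chunk($domains, (int)ceil(count($domains) / $workers));

foreach ($chunks as $chunk) {
    $pid = pcntl_fork();
    if ($pid === 0) {              // child: process this chunk, then exit
        foreach ($chunk as $domain) {
            check_domain($domain); // illustrative work function
        }
        exit(0);
    }
}
while (pcntl_waitpid(0, $status) > 0) {
    // parent: wait until every child has finished
}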
The first thing you want to do is optimize your code to shorten the execution time as much as possible.
For example, instead of making five dns queries:
$NS  = dns_get_record($murl, DNS_NS);
$MX  = dns_get_record($murl, DNS_MX);
$SRV = dns_get_record($murl, DNS_SRV);
$A   = dns_get_record($murl, DNS_A);
$TXT = dns_get_record($murl, DNS_TXT);
You can call dns_get_record just once, combining the record types:
$DATA = dns_get_record($murl, DNS_NS | DNS_MX | DNS_SRV | DNS_A | DNS_TXT);
and parse out the variables from there.
Instead of outright forking processes to handle several parts concurrently, I'd implement a queue that all of the requests would get pushed into. The query processor would be limited as to how many items it could process at once, avoiding the potential DoS if hundreds or thousands of requests hit your site at the same time. Without some sort of limiting mechanism, you'd end up with so many processes that the server might hang.
As for the processor, in addition to the previously mentioned items, you could try pecl/Gearman as your queue processor. I haven't used it, but it appears to do what you're looking for.
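I haven't used it either, but going by the pecl/gearman documentation, a worker/client pair might look roughly like this (assumes a gearmand server on localhost:4730; 'check_domain' is an illustrative job name):
// worker.php -- run a fixed number of these as long-lived CLI processes
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('check_domain', function (GearmanJob $job) {
    $domain = $job->workload();
    return serialize(dns_get_record($domain, DNS_ALL)); // the slow work
});
while ($worker->work()) {
    // handle one job at a time, forever
}

// client.php -- called from the web request
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$result = $client->doNormal('check_domain', 'example.com');  // blocking
// or: $client->doBackground('check_domain', 'example.com'); // queue and return
The pool size is then simply the number of worker processes you start, which caps concurrency no matter how many requests come in.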
Another way to optimize this would be to implement a caching system that saves the results for, say, a week (or whatever). This would cut down on someone looking up the same site repeatedly in a day (or running a script against your site).
I doubt that it's a good idea to fork the Apache process from PHP. But if you really want to, there is PCNTL (which is not available in the Apache module).
You might have more fun with pthreads. Nowadays you can even download a PHP build which claims to be thread-safe.
And finally, you have the possibility of using classic non-blocking IO, which is what I would prefer in the case of PHP.

YouTube API retrieving data slows my site down significantly when loading

I am using YouTube's API to load current data for videos that users share on the site, in a feed like Facebook's. The thing is, it slows my website down a great amount: about 2-4 seconds per set of data, so one video takes 2-4 seconds, two videos 4-8 seconds, etc. My question is: is there a way to not retrieve ALL of the data, and to speed this up? (I store the title and description of the video in my own database when the user shares it, but the other data I can't.) Here's my code:
$JSON = file_get_contents("http://gdata.youtube.com/feeds/api/videos?q={$videoID}&alt=json");
$JSON_Data = json_decode($JSON);
$ratings = $JSON_Data->{'feed'}->{'entry'}[0]->{'gd$rating'}->{'average'};
$totalRatings = number_format($JSON_Data->{'feed'}->{'entry'}[0]->{'gd$rating'}->{'numRaters'});
$views = number_format($JSON_Data->{'feed'}->{'entry'}[0]->{'yt$statistics'}->{'viewCount'});
I also load the thumbnail, which I may go back to saving on my own server at submission time, but it doesn't seem to be what is slowing things down so much, because it still takes a long time when I remove it.
$thumbnail = "http://img.youtube.com/vi/".$videoID."/2.jpg";
You can use cURL, file_get_contents... that's not the point.
The big point is: CACHE THE RESPONSE!
Use memcached, the file system, a database, or whatever, but never call the API on page load.
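For instance, wrapping the lookup from the question in a memcached check (a sketch; assumes the Memcached extension and a local server, and the ten-minute TTL is arbitrary):
// Serve the YouTube data from memcached and only hit the API on a miss.
$mc = new Memcached();
$mc->addServer('localhost', 11211);

$key = 'yt_' . $videoID;
$JSON_Data = $mc->get($key);

if ($JSON_Data === false) {   // miss: fetch once, then reuse for 10 minutes
    $JSON = file_get_contents("http://gdata.youtube.com/feeds/api/videos?q={$videoID}&alt=json");
    $JSON_Data = json_decode($JSON);
    $mc->set($key, $JSON_Data, 600);
}
The rest of the code (ratings, views, thumbnail) stays exactly as it is.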
As far as I know, this is generally something PHP is not very good at doing.
It simply doesn't support multithreading, and threads are exactly what you want here (perform all the HTTP requests simultaneously, so that their latency overlaps).
Perhaps you can move this part of the logic into the browser, using JavaScript? The XMLHttpRequest object can issue requests asynchronously, so several can be in flight at once.
As far as I know, short of curl_multi, the only way to do it in PHP is to use raw sockets (fsockopen(); fwrite(); fread(); fclose();), but that isn't for the faint of heart... you'll need to be familiar with the HTTP specification.
And finally, does the content change much? Perhaps you can keep a local cache of the HTML in a database, with a cron job (that might run every 30 seconds) to rebuild the cache? This might be a violation of Google's terms of service.
Really the best solution would be to do the server communication with some other language, one that supports threading, and talk to that with your PHP script. I'd probably use Ruby.

Any idea how to implement this?

Any idea how to implement this (http://fluin.com/63) using MySQL+PHP+Javascript(mootools)?
In a nutshell, it's a realtime threaded conversational web app.
Update:
This uses http://www.ape-project.org/home.html
Any idea how to implement realtime stuff without AJAX push (ape)?
Install Firefox.
Install Web Development toolbar
Install Firebug
Install HttpFox
Read docs of above tools re how to use, what they can do.
Go to http://fluin.com/63. Use above tools to inspect.
Read up on Databases and data models, and MySQL.
Build your own.
Well, this depends on your definition of realtime, which, in its technical meaning, is simply impossible with public IP networks and a traditional TCP stack, since you have no control over timing.
Closer to the topic though: to get any web page updated without direct user intervention, you'd have to use JavaScript to poll the server for changes since the last successful poll, and do this at certain intervals. In choosing these intervals you'll have to consider both network/server load and the delay that is comfortable for the user.
The server, of course, will have to store the new data along with when it arrived (creation timestamps are one way of doing it), to be able to distinguish content already delivered to the various clients.
As soon as the server reports new content, it is inserted into the page's DOM via JavaScript and the user sees the response.
This is a bit general, of course, but you should get the idea.
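On the server side, the poll endpoint can be as simple as returning everything newer than the client's last timestamp. A PHP sketch (the messages table, its columns, and the PDO credentials are assumptions):
// poll.php -- return messages created after the client's last poll.
$pdo = new PDO('mysql:host=localhost;dbname=chat', 'user', 'pass');
$since = isset($_GET['since']) ? (int)$_GET['since'] : 0;

$stmt = $pdo->prepare(
    'SELECT id, body, UNIX_TIMESTAMP(created_at) AS ts
       FROM messages
      WHERE UNIX_TIMESTAMP(created_at) > ?
      ORDER BY created_at'
);
$stmt->execute(array($since));

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
// the JavaScript side remembers the largest ts and sends it back next poll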
Isn't it like a shoutbox? Here's an example of one.
Doing this properly using PHP only is very hard. When you have 5 users you could use long-polling, but it will definitely not scale when you have let's say 1000 users.
Using comet with PHP?
The screencast (link) in my post shows how you could implement it, but it has a couple of flaws:
It touches the disk (disk is very slow compared to memory).
To make matters worse, it also polls the disk frequently (filemtime()).
Maybe phet (PHP) is able to scale. You should try that out.
To make it scale I think you need at least:
a good implementation of long-polling (at least long-polling; there are better transports) that can handle load.
keep data in memory (much faster than disk) using something like Redis or memcached.
I would use:
node.js with the socket.io (video) module.
to keep data in memory I would use node_redis (video).

Process feeds simultaneously

I am developing a vertical search engine. When a user searches for an item, our site loads numerous feeds from various markets. Unfortunately, it takes a long time to load, parse, and order the contents of the feeds, and the user experiences some delay. I cannot save these feeds in the DB, nor can I cache them, because their contents are constantly changing.
Is there a way that I can process multiple feeds at the same time in PHP? Should I use popen, or is there a better PHP parallel-processing method?
Thanks!
Russ
If you are using cURL to fetch the feeds, you could take a look at the function curl_multi_exec, which allows you to do several HTTP requests in parallel.
(The given example is too long to be copied here.)
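A condensed sketch of the pattern, though (not the manual's full example; $urls is assumed to be your array of feed URLs):
// Fetch several feeds in parallel instead of one after another.
$mh = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

do {                        // drive all the transfers at once
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // sleep until there is activity
} while ($running > 0);

$feeds = array();
foreach ($handles as $ch) { // collect the responses
    $feeds[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);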
That would at least allow you to spend less time fetching the feeds...
Considering your server is doing almost nothing while it's waiting for the HTTP requests to finish, parallelizing those wouldn't hurt, I guess.
Parallelizing the parsing of those feeds, on the other hand, might do some damage if it's a CPU-intensive operation (it might be, if it's XML parsing and all that).
As a side note: is it really not possible to cache some of this data? Even if it's only for a couple of minutes?
Using a cron job to fetch the most often used data and store it in a cache, for instance, might help a lot...
And I believe a website that responds fast matters more to users than results that are really, really up to date to the second... If your site doesn't respond, they'll go somewhere else!
I agree, people will forgive the caching far sooner than they will forgive a sluggish response time. Just recache every couple of minutes.
You'll have to set up a results page that executes multiple simultaneous requests against the server via JavaScript. You can accomplish this with simple AJAX requests, injecting the returned data into the DOM once each one finishes loading. PHP doesn't currently have any support for threading; parallelizing the requests is the only solution at the moment.
Here's some examples using jQuery to load remote data from a website and inject it into the DOM:
http://docs.jquery.com/Ajax/load#urldatacallback
