How to implement load balancing in a simple REST API? - php

I have a simple REST API written in CakePHP (PHP on Apache). Basically it has just one endpoint, let's say /api/something/?format=json. Calling this endpoint doesn't read anything from the DB; internally it fetches and parses some external website and returns the parsed data to the user in JSON format. The problem is that fetching and parsing data from the external web page may take quite a long time, and therefore I need some load balancing mechanism that will distribute API calls among several servers.
I have never done any load balancing, so I don't even know where to look for info - I am looking for the simplest solution.

Is it a resource that has to be fetched live? Because you could cache the processed data for a certain amount of time.
If it has to be live, doing it in a distributed way is probably not going to solve your problem (except when you're getting back a dataset that is very large).
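If the data can be even slightly stale, a minimal sketch of that approach, assuming the APCu extension is available and using a hypothetical fetch_and_parse() helper for the slow external work, might look like this:

```php
<?php
// Cache the parsed result for a few minutes so only an occasional
// request pays the cost of fetching/parsing the external site.
// fetch_and_parse() is a hypothetical helper, not part of CakePHP.
function get_parsed_data($ttl = 300)
{
    $key = 'api_something_parsed';

    $cached = apcu_fetch($key, $hit);
    if ($hit) {
        return $cached; // served from cache, no external request
    }

    $data = fetch_and_parse();      // the slow part
    apcu_store($key, $data, $ttl);  // keep it for $ttl seconds

    return $data;
}

header('Content-Type: application/json');
echo json_encode(get_parsed_data());
```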

http://en.wikipedia.org/wiki/Load_balancing_(computing)
It's pretty late, but I guess this is what you need! Just get the hardware to do all the good stuff!
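For the load-balancing question itself, the usual software alternative to a hardware appliance is a reverse proxy in front of several identical PHP servers. A minimal sketch with nginx, where the backend hostnames are purely hypothetical:

```nginx
# Round-robin requests across two identical CakePHP/Apache backends.
upstream api_backends {
    server app1.example.com:80;
    server app2.example.com:80;
}

server {
    listen 80;
    server_name api.example.com;

    location /api/ {
        proxy_pass http://api_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Any other balancer (HAProxy, a cloud load balancer, or round-robin DNS) implements the same idea.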

Related

Increase speed of dynamic API calls using PHP

I am calling different APIs on one of my web sites. I am able to get optimal results with multi curl in PHP. However, I'm noticing that the speed becomes very slow when traffic is a little high. I have read that caching is another way to speed up websites. However, my question is: can I use caching when the API calls I am making depend entirely on user input? Or is there any alternative solution to this?
It could be that one request is taking too long to load and, as a result, delaying the other requests.
The answer to your question depends on what kind of task the user performs with the data. Basically, a cache can be used for all tasks related to retrieving and querying data and is not suitable for inserting, mutating, or deleting data. There are many ways to implement caching in a web application, but one of the easiest is to use GET requests for all user requests that only retrieve data, and then configure the web server or a CDN to cache them.
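As a rough sketch of that idea: a shared cache (proxy or CDN) keys entries by the full URL, including the user's query parameters, so the PHP endpoint only needs to declare its GET responses cacheable. The 60-second lifetime below is just an example value:

```php
<?php
// Sketch: mark read-only GET responses as cacheable so a reverse
// proxy or CDN can serve repeated identical requests itself.
if ($_SERVER['REQUEST_METHOD'] === 'GET') {
    header('Cache-Control: public, max-age=60'); // example lifetime
    header('Vary: Accept-Encoding');
}

header('Content-Type: application/json');
// ... call the upstream APIs (e.g. via curl_multi) and build $result ...
$result = array('status' => 'example payload');
echo json_encode($result);
```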

Long PHP process hangs the server. Bad project design?

Hi, I need suggestions for my research project.
I'm building a system that reads RSS feeds produced by Google Alerts and saves the results in a database for later categorization.
I'm using WordPress and the Pods framework to handle the database and UI.
I have 4 objects (pods) with their own tables:
Resources: the site data taken from an alert feed.
Sources: the domain of the site, like stackoverflow.com.
Feeds: the alert query and RSS URL.
Topics: the main topics under which the other objects are categorized.
The program flow is shortly this:
For every topic, take its feeds.
For every feed, load the RSS XML.
For every entry URL in the RSS, if it is newer than the last check, check whether the domain is already saved in the Sources object.
If the source is present, check whether the URL is saved in the Resources object.
If the resource is present (that is, we already have the data for this URL), add the current topic and feed of the loop to the resource (if not already present).
If the resource is not present, save the resource with some data and the current topic and feed.
If the source is not present, save the source with the current topic.
This way I will have a bunch of resources with linked feeds and topics, and the relative sources with their topics.
The problem is that the data grows into the hundreds very fast, and in one month I have already reached more than 1500 resource records.
So now, every time I run the script, since every new entry has to be compared with all the previous ones, the script systematically hangs.
So I need a way to make it more efficient, or to avoid the problem by splitting the process.
Since the script is called via Ajax, I thought this flow would work:
Ask the server for the topics/feeds structure.
For every feed in every topic, ask the server to load the XML and pass it back as an array.
Then, in the front end, for every entry send a compare-and-save call.
Of course the drawback is that I will get plenty of calls.
Another technique I have heard about is to flush the data during the server process, since as I understand it this should trick the server time limit into resetting. But I'm not sure I understood that well.
Of course the best solution would be to rebuild everything using more specific code instead of two general-purpose abstraction layers. But I'm really short on time!
Edit: code here https://github.com/bakaburg1/overseer
I think you're loading too much data at once. The whole point of AJAX is to load data a bit at a time, on-demand as you need it. It doesn't really matter that you have a lot of HTTP calls to the server as long as the scripts you're running to return data don't hang like yours is doing.
I would suggest questioning whether you really need to return all this data at once. If you think you do, a front-end redesign is probably in order. Don't load so much data at once on a single page that it takes forever to load; if you do, caching and other tricks will help, but they will never truly fix the problem.
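One way to read "a bit at a time" in practice is to let each AJAX call process exactly one feed, so no single request runs long enough to hang the server. A rough sketch, where process_feed() is a hypothetical wrapper around the existing load/compare/save logic:

```php
<?php
// Sketch of a per-feed AJAX endpoint: each call handles one feed only.
// process_feed() is a hypothetical function wrapping the current
// "load RSS -> compare entries -> save" logic for a single feed.
$feed_id = isset($_GET['feed_id']) ? (int) $_GET['feed_id'] : 0;

if ($feed_id <= 0) {
    http_response_code(400);
    echo json_encode(array('error' => 'missing feed_id'));
    exit;
}

$result = process_feed($feed_id); // one short, bounded unit of work

header('Content-Type: application/json');
echo json_encode(array(
    'feed_id'     => $feed_id,
    'new_entries' => $result['new'],
    'skipped'     => $result['skipped'],
));
```

The front end then loops over the feed IDs and fires one request per feed, which also makes it easy to show progress to the user.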

Collecting and Processing data with PHP (Twitter Streaming API)

After reading through all of the Twitter Streaming API and Phirehose PHP documentation, I've come across something I have yet to do: collect and process data separately.
The logic behind it, if I understand correctly, is to prevent a logjam at the processing phase that would back up the collecting process. I've seen examples before, but they basically write straight to a MySQL database right after collection, which seems to go against what Twitter recommends you do.
What I'd like some advice/help on is what the best way to handle this is, and how. It seems that people recommend writing all the data directly to a text file and then parsing/processing it with a separate function. But with this method, I'd assume it could be a memory hog.
Here's the catch: it's all going to be running as a daemon/background process. So does anyone have any experience solving a problem like this, or more specifically, with the Twitter Phirehose library? Thanks!
Some notes:
*The connection will be through a socket, so my guess is that the file will constantly be appended to? Not sure if anyone has any feedback on that.
The phirehose library comes with an example of how to do this. See:
Collect: https://github.com/fennb/phirehose/blob/master/example/ghetto-queue-collect.php
Consume: https://github.com/fennb/phirehose/blob/master/example/ghetto-queue-consume.php
This uses a flat file, which is very scalable and fast, ie: Your average hard disk can write sequentially at 40MB/s+ and scales linearly (ie: unlike a database, it doesn't slow down as it gets bigger).
You don't need any database functionality to consume a stream (ie: you just want the next tweet, there's no "querying" involved).
If you rotate the file fairly often, you will get near-realtime performance (if desired).
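The pattern those two scripts implement is straightforward: the collector only appends raw tweets to the current queue file, and the consumer periodically rotates that file aside and works through it line by line. A stripped-down sketch of the consumer side (this is not the actual Phirehose example code, and the paths are illustrative):

```php
<?php
// Simplified flat-file queue consumer: rotate the queue, then process
// the snapshot line by line while the collector keeps appending to a
// fresh file. Paths and storage logic are placeholders.
$queueFile   = '/tmp/tweets.queue';
$workingFile = '/tmp/tweets.processing';

if (file_exists($queueFile) && rename($queueFile, $workingFile)) {
    $handle = fopen($workingFile, 'r');
    while (($line = fgets($handle)) !== false) {
        $tweet = json_decode($line, true);
        if ($tweet !== null) {
            // ... insert into MySQL, run analysis, etc. ...
        }
    }
    fclose($handle);
    unlink($workingFile); // this batch is done
}
```

Because only one line is held in memory at a time, the consumer stays small even if the queue file grows large between runs.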

Which is the best approach to consume webservices and manipulate its data?

Need some advice on the best approach.
Currently we are about to start a new CI (CodeIgniter) web project where we need to rely heavily on data pulled from external web services or APIs.
Is it better to manipulate the data programmatically (in objects or arrays) when I need to sort it, or to store it in a database and query it with ORDER BY, GROUP BY, etc.?
Is there a known architecture or framework for this?
What's the best approach used nowadays, for example how aggregator websites do it when they pull data from various vendor APIs?
I would suggest getting the data using curl etc., manipulating it as arrays, and then storing it.
Make sure you build in some kind of caching as well, so you don't end up making unnecessary requests.
The reason behind my method is to process once rather than every time your site is requested.
After all this while, I've come up with a plan and it's working great!
Consume the web services.
Deserialize the XML to arrays/objects.
Store in cache (APC/file cache; I'm using CodeIgniter, by the way), expiring every 4 hours.
The first request takes 3-4 seconds to complete (the first call to the web service grabs the data and stores it in the cache), while subsequent requests from users take 0.002 seconds thanks to the cached data. Four hours later the cycle repeats, so the data is refreshed from the web service every four hours.
If you are the first user to access the site after each refresh, you are the unlucky chap. But you sacrificed for all the other chaps.
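For reference, a rough sketch of that cache-first pattern inside a CodeIgniter controller using the framework's cache driver; the cache key and the fetch_and_parse_webservice() method are placeholders:

```php
<?php
// Sketch only: cache-first controller using CodeIgniter's cache driver
// (APC with a file-cache fallback). Names are illustrative.
class Products extends CI_Controller
{
    public function index()
    {
        $this->load->driver('cache', array('adapter' => 'apc', 'backup' => 'file'));

        $data = $this->cache->get('ws_products');
        if ($data === FALSE) {
            // Cache miss: the one slow request every 4 hours.
            $data = $this->fetch_and_parse_webservice();
            $this->cache->save('ws_products', $data, 4 * 3600);
        }

        $this->load->view('products_list', array('products' => $data));
    }

    private function fetch_and_parse_webservice()
    {
        // ... curl the web service and deserialize the XML to arrays ...
        return array();
    }
}
```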

Process feeds simultaneously

I am developing a vertical search engine. When a user searches for an item, our site loads numerous feeds from various markets. Unfortunately, it takes a long time to load, parse, and order the contents of the feeds, so the user experiences some delay. I cannot save these feeds in the DB, nor can I cache them, because the contents of the feeds are constantly changing.
Is there a way that I can process multiple feeds at the same time in PHP? Should I use popen, or is there a better PHP parallel processing method?
Thanks!
Russ
If you are using curl to fetch the feeds, you could take a look at the function curl_multi_exec, which allows you to do several HTTP requests in parallel.
(The given example is too long to be copied here.)
That would at least allow you to spend less time fetching the feeds...
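Since the manual's example is long, here is a condensed sketch of the same pattern (the feed URLs are placeholders):

```php
<?php
// Fetch several feeds in parallel with curl_multi, then parse them
// sequentially afterwards. URLs are placeholders.
$urls = array(
    'http://market-a.example.com/feed.xml',
    'http://market-b.example.com/feed.xml',
);

$mh = curl_multi_init();
$handles = array();

foreach ($urls as $i => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}

// Run all transfers until every handle has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

$responses = array();
foreach ($handles as $i => $ch) {
    $responses[$i] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);

// $responses now holds the raw feed bodies, ready for parsing.
```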
Considering your server is doing almost nothing while it's waiting for an HTTP request to end, parallelizing those wouldn't harm, I guess.
Parallelizing the parsing of those feeds, on the other hand, might do some damage if it's a CPU-intensive operation (and it might be, if it's XML parsing and all that).
As a side note: is it really not possible to cache some of this data? Even if it's only for a couple of minutes?
Using a cron job to fetch the most often used data and store it in cache, for instance, might help a lot...
And I believe a website that responds fast is more important to users than results that are really, truly up to date to the second... If your site doesn't respond, they'll go somewhere else!
I agree, people will forgive the caching far sooner than they will forgive a sluggish response time. Just recache every couple of minutes.
You'll have to set up a results page that executes multiple simultaneous requests against the server via JavaScript. You can accomplish this with simple AJAX requests and then inject the returned data into the DOM once it has finished loading. PHP doesn't have any support for threading currently; parallelizing the requests is the only solution at the moment.
Here are some examples using jQuery to load remote data from a website and inject it into the DOM:
http://docs.jquery.com/Ajax/load#urldatacallback
