I have an array of altcoins i want to query using bittrex api. I loop through them and pass it on to the API as a parameter Like so:
$coins = array("BTC-LTC", "BTC-PTOY", "BTC-BLK");
for($i=0; $i<3; $i++) {
$mkt_code = $coins[$i];
getPrice($mkt_code);
}
function getPrice($mkt_code) {
$uri = 'https://bittrex.com/api/v1.1/public/getticker?market='.$mkt_code;
$obj = json_decode(file_get_contents($uri));
/*do something like save prices in a database*/
}
But if i set a cron job to run this file say every minute and array size is over 100 this could take quite a while to execute sequentially. I want a way of doing it concurrently so that i could get the prices all at once, much faster and in real time. Help please, i'm relatively new to this. Thanks
Use Guzzle's concurrent request feature.
Related
I've been working on a script that makes close to a thousand async requests using getAsync and Promise\Settle. Each page requested it then parsed using Symphony crawler filter method (Also slow but a separate issue.)
My code looks something like this:
$requestArray = [];
$request = new Client($url);
foreach ($thousandItemArray as $item) {
$requestArray[] = $request->getAsync(null, $query);
}
$results = Promise\settle($request)->wait(true);
foreach ($results as $item) {
$item->crawl();
}
Is there a way I can crawl the requested pages as they come in rather than waiting for them all and then crawling. Am i right in thinking this would speed things up if possible?
Thanks for your help in advance.
You can. getAsync() returns a promise, so you can assign an action to it using ->then().
$promisesList[] = $request->getAsync(/* ... */)->then(
function (Response $resp) {
// Do whatever you want right after the response is available.
}
);
$results = Promise\settle($request)->wait(true);
P.S.
Probably you want to limit the concurrency level to some number of requests (not to start all the requests at once). If yes, use each_limit() function instead of settle. And vote for my PR to be able to use settle_limit() ;)
Is there a way to sideload (load multiple API calls at the same time) API calls to lessen the impact on API call limits, using PHP?
For example, we're using the EchoNest API to gather information on musicians. When the artist page on our site is accessed, we run multiple functions which each call a different API method that returns the specific data that we need. Everything works and looks awesome!
Here are a few (abbreviated) methods that we're calling that each count against our call limit:
function artistPageNews() {
$artist_name = $_GET['artistname'];
$results = iTunes::search($artist_name, array(
'entity' => 'musicVideo'
))->results;
$echonest_api_key = "OUR_API_KEY";
// News Method
$echonest_news = 'http://developer.echonest.com/api/v4/artist/news?api_key='.$echonest_api_key.'&name='.str_replace(" ", "+", $artist_name).'&format=json&results=2&start=0';
$echonest_news_json = file_get_contents($echonest_news);
$news_json = json_decode($echonest_news_json);
$news_entry = $news_json->response->news;
foreach ($news_entry as $news) {
// Do Magic Stuff Here...
}
}
function artistPageVideos() {
$artist_name = $_GET['artistname'];
$results = iTunes::search($artist_name, array(
'entity' => 'musicVideo'
))->results;
$echonest_api_key = "OUR_API_KEY";
// Videos Method
$echonest_videos = 'http://developer.echonest.com/api/v4/artist/video?api_key='.$echonest_api_key.'&name='.str_replace(" ", "+", $artist_name).'&format=json&results=6&start=0';
$echonest_videos_json = file_get_contents($echonest_videos);
$videos_json = json_decode($echonest_videos_json);
$videos_entry = $videos_json->response->video;
foreach ($videos_entry as $video) {
// Do More Magic Stuff Here...
}
}
We have maybe about 7 (or more) of these methods that are called on each Artist page load. Obviously this can mean trouble when lots of people are viewing the artist pages every hour.
I understand that there's a way to store the more static information into a database and use that info instead of calling the API methods on every request. I am currently exploring that option. But I also read here that there may be a way to 'sideload' the API calls so that you can make multiple requests at one time. In that example, they're using Curl. I'm trying to do this with PHP.
curl https://{subdomain}.zendesk.com/api/v2/help_center/fr/articles.json?include=users \
-v -u {email_address}:{password}
Can anyone help me get started with this or perhaps recommend a better way to do this, such as storing this information into a database or table and pulling from that instead of calling the API every time?
Thanks in advance.
I am working on a real estate website and we're about to get an external feed of ~1M listings. Assuming each listing has ~10 photos associated with it, that's about ~10M photos, and we're required to download each of them to our server so as to not "hot link" to them.
I'm at a complete loss as to how to do this efficiently. I played with some numbers and I concluded, based on a 0.5 second per image download rate, this could take upwards of ~58 days to complete (download ~10M images from an external server). Which is obviously unacceptable.
Each photo seems to be roughly ~50KB, but that can vary with some being larger, much larger, and some being smaller.
I've been testing by simply using:
copy(http://www.external-site.com/image1.jpg, /path/to/folder/image1.jpg)
I've also tried cURL, wget, and others.
I know other sites do it, and at a much larger scale, but I haven't the slightest clue how they manage this sort of thing without it taking months at a time.
Sudo code based on the XML feed we're set to receive. We're parsing the XML using PHP:
<listing>
<listing_id>12345</listing_id>
<listing_photos>
<photo>http://example.com/photo1.jpg</photo>
<photo>http://example.com/photo2.jpg</photo>
<photo>http://example.com/photo3.jpg</photo>
<photo>http://example.com/photo4.jpg</photo>
<photo>http://example.com/photo5.jpg</photo>
<photo>http://example.com/photo6.jpg</photo>
<photo>http://example.com/photo7.jpg</photo>
<photo>http://example.com/photo8.jpg</photo>
<photo>http://example.com/photo9.jpg</photo>
<photo>http://example.com/photo10.jpg</photo>
</listing_photos>
</listing>
So my script will iterate through each photo for a specific listing and download the photo to our server, and also insert the photo name into our photo database (the insert part is already done without issue).
Any thoughts?
I am surprised the vendor is not allowing you to hot-link. The truth is you will not serve every image every month so why download every image? Allowing you to hot link is a better use of everyone's bandwidth.
I manage a catalog with millions of items where the data is local but the images are mostly hot linked. Sometimes we need to hide the source of the image or the vendor requires us to cache the image. To accomplish both goals we use a proxy. We wrote our own proxy but you might find something open source that would meet your needs.
The way the proxy works is that we encrypt and URL encode the encrypted URL string. So http://yourvendor.com/img1.jpg becomes xtX957z. In our markup the img src tag is something like http://ourproxy.com/getImage.ashx?image=xtX957z.
When our proxy receives an image request, it decrypts the image URL. The proxy first looks on disk for the image. We derive the image name from the URL, so it is looking for something like yourvendorcom.img1.jpg. If the proxy cannot find the image on disk, then it uses the decrypted URL to fetch the image from the vendor. It then writes the image to disk and serves it back to the client. This approach has the advantage of being on demand with no wasted bandwidth. I only get the images I need and I only get them once.
You can save all links into some database table (it will be yours "job queue"),
Then you can create a script which in the loop gets the job and do it (fetch image for a single link and mark job record as done)
The script you can execute multiple times f.e. using supervisord. So the job queue will be processed in parallel. If it's to slow you can just execute another worker script (if bandwidth does not slow you down)
If any script hangs for some reason you can easly run it again to get only images that havnt been yet downloaded. Btw supervisord can be configured to automaticaly restart each script if it fails.
Another advantage is that at any time you can check output of those scripts by supervisorctl. To check how many images are still waiting you can easy query the "job queue" table.
Before you do this
Like #BrokenBinar said in the comments. Take into account how many requests per second the host can provide. You don't want to flood them with requests without them knowing. Then use something like sleep to limit your requests per whatever number it is they can provide.
Curl Multi
Anyway, use Curl. Somewhat of a duplicate answer but copied anyway:
$nodes = array($url1, $url2, $url3);
$node_count = count($nodes);
$curl_arr = array();
$master = curl_multi_init();
for($i = 0; $i < $node_count; $i++)
{
$url =$nodes[$i];
$curl_arr[$i] = curl_init($url);
curl_setopt($curl_arr[$i], CURLOPT_RETURNTRANSFER, true);
curl_multi_add_handle($master, $curl_arr[$i]);
}
do {
curl_multi_exec($master,$running);
} while($running > 0);
for($i = 0; $i < $node_count; $i++)
{
$results[] = curl_multi_getcontent ( $curl_arr[$i] );
}
print_r($results);
From: PHP Parallel curl requests
Another solution:
Pthread
<?php
class WebRequest extends Stackable {
public $request_url;
public $response_body;
public function __construct($request_url) {
$this->request_url = $request_url;
}
public function run(){
$this->response_body = file_get_contents(
$this->request_url);
}
}
class WebWorker extends Worker {
public function run(){}
}
$list = array(
new WebRequest("http://google.com"),
new WebRequest("http://www.php.net")
);
$max = 8;
$threads = array();
$start = microtime(true);
/* start some workers */
while (#$thread++<$max) {
$threads[$thread] = new WebWorker();
$threads[$thread]->start();
}
/* stack the jobs onto workers */
foreach ($list as $job) {
$threads[array_rand($threads)]->stack(
$job);
}
/* wait for completion */
foreach ($threads as $thread) {
$thread->shutdown();
}
$time = microtime(true) - $start;
/* tell you all about it */
printf("Fetched %d responses in %.3f seconds\n", count($list), $time);
$length = 0;
foreach ($list as $listed) {
$length += strlen($listed["response_body"]);
}
printf("Total of %d bytes\n", $length);
?>
Source: PHP testing between pthreads and curl
You should really use the search feature, ya know :)
I want to send ~50 requests to different pages on the same domain and then, I'm using DOM object to gain urls to articles.
The problem is that this number of requests takes over 30 sec.
for ($i = 1; $i < 51; $i++)
{
$url = 'http://example.com/page/'.$i.'/';
$client = new Zend_Http_Client($url);
$response = $client->request();
$dom = new Zend_Dom_Query($response); // without this two lines, execution is also too long
$results = $dom->query('li'); //
}
Is there any way to speed this up?
It's a generel problem by design - not the code itself. If you're doing a for-loop over 50 items each opening an request to an remote uri, things get pretty slow since every requests waits until responde from the remote uri. e.g.: a request takes ~0,6 sec to been completed, multiple this by 50 and you get an exection time of 30 seconds!
Other problem is that most webserver limits its (open) connections per client to an specific amount. So even if you're able to do 50 requests simultaneously (which you're currently not), things won't speed up measurably.
In my option there is only one solution (without any deep going changes):
Change the amout of requests per exection. Make chunks from e.g. only 5 - 10 per (script)-call and trigger them by an external call (e.g. run them by cron).
Todo:
Build a wrapper function which is able to save the state of its current run ("i did request 1 - 10 at my last run, so now I have to call 11 - 20) into a file or database and trigger this function by an cron.
Example Code (untested) for better declaration;
[...]
private static $_chunks = 10; //amout of calls per run
public function cronAction() {
$lastrun = //here get last run parameter saved from local file or database
$this->crawl($lastrun);
}
private function crawl($lastrun) {
$limit = $this->_chunks + $lastrun;
for ($i = $lastrun; $i < limit; $i++)
{
[...] //do stuff here
}
//here set $lastrun parameter to new value inside local file / database
}
[...]
I can't think of a way to speed it up but you can increase the timeout limit in PHP if that is your concern:
for($i=1; $i<51; $i++) {
set_time_limit(30); //This restarts the timer to 30 seconds starting now
//Do long things here
}
I have a script that is running on a shared hosting environment where I can't change the available amount of PHP memory. The script is consuming a web service via soap. I can't get all my data at once or else it runs out of memory so I have had some success with caching the data locally in a mysql database so that subsequent queries are faster.
Basically instead of querying the web service for 5 months of data I am querying it 1 month at a time and storing that in the mysql table and retrieving the next month etc. This usually works but I sometimes still run out of memory.
my basic code logic is like this:
connect to web service using soap;
connect to mysql database
query web service and store result in variable $results;
dump $results into mysql table
repeat steps 3 and 4 for each month of data
the same variables are used in each iteration so I would assume that each batch of results from the web service would overwrite the previous in memory? I tried using unset($results) in between iterations but that didn't do anything. I am outputting the memory used with memory_get_usage(true) each time and with every iteration the memory used is increased.
Any ideas how I can fix this memory leak? If I wasn't clear enough leave a comment and I can provide more details. Thanks!
***EDIT
Here is some code (I am using nusoap not the php5 native soap client if that makes a difference):
$startingDate = strtotime("3/1/2011");
$endingDate = strtotime("7/31/2011");
// connect to database
mysql_connect("dbhost.com", "dbusername" "dbpassword");
mysql_select_db("dbname");
// configure nusoap
$serverpath ='http://path.to/wsdl';
$client = new nusoap_client($serverpath);
// cache soap results locally
while($startingDate<=$endingDate) {
$sql = "SELECT * FROM table WHERE date >= ".date('Y-m-d', $startingDate)." AND date <= ".date('Y-m-d', strtotime($startingDate.' +1 month'));
$soapResult = $client->call('SelectData', $sql);
foreach($soapResult['SelectDateResult']['Result']['Row'] as $row) {
foreach($row as &$data) {
$data = mysql_real_escape_string($data);
}
$sql = "INSERT INTO table VALUES('".$row['dataOne']."', '".$row['dataTwo']."', '".$row['dataThree'].")";
$mysqlResults = mysql_query($sql);
}
$startingDate = strtotime($startingDate." +1 month");
echo memory_get_usage(true); // MEMORY INCREASES EACH ITERATION
}
Solved it. At least partially. There is a memory leak using nusoap. Nusoap writes a debug log to a $GLOBALS variable. Altering this line in nusoap.php freed up a lot of memory.
change
$GLOBALS['_transient']['static']['nusoap_base']->globalDebugLevel = 9;
to
$GLOBALS['_transient']['static']['nusoap_base']->globalDebugLevel = 0;
I'd prefer to just use php5's native soap client but I'm getting strange results that I believe are specific to the webservice I am trying to consume. If anyone is familiar with using php5's soap client with www.mindbodyonline.com 's SOAP API let me know.
Have you tried unset() on $startingDate and mysql_free_result() for $mysqlResults?
Also SELECT * is frowned upon even if that's not the problem here.
EDIT: Also free the SOAP result too, perhaps. Some simple stuff to begin with to see if that helps.