I'm trying to make a site for my Battlefield 1 clan; on one of the pages I'd like to display our team and some of their stats.
The Battlefield Tracker API lets me request exactly what I need, so I decided to use PHP cURL requests to pull this data into my site. It all works fine, but it is very slow; sometimes it even hits PHP's 30-second execution limit.
Here is my code
<?php
$data = $connection->query("SELECT * FROM bfplayers");
while ($row = mysqli_fetch_assoc($data)) {
    $psnid = $row['psnid'];

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, "https://battlefieldtracker.com/bf1/api/Stats/BasicStats?platform=2&displayName=" . $psnid);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_HEADER, FALSE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);

    $headers = [
        'TRN-Api-Key: MYKEY',
    ];
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

    $response = curl_exec($ch);
    curl_close($ch);

    $result = json_decode($response, true);
    print($result['profile']['displayName']);
}
?>
I have no idea why it is so slow. Is it because I am running XAMPP on localhost, or because the requests are made one by one inside the loop?
Thanks in advance
Your loop is not optimized in the slightest; if you optimize it, the code could run a lot faster. You create and destroy the cURL handle on every iteration when you could keep re-using the same handle for each player (less CPU, faster), you don't use compressed transfers (enabling compression would probably speed up the downloads), and most importantly, you run the API calls sequentially; doing the requests in parallel would load much faster. Also, you don't urlencode() $psnid, which is probably a bug. Try this:
<?php
$cmh = curl_multi_init();
$curls = array();
$data = $connection->query("SELECT * FROM bfplayers");
while (($row = mysqli_fetch_assoc($data))) {
    $psnid = $row['psnid'];
    $tmp = array();
    $tmp[0] = ($ch = curl_init());
    $tmp[1] = tmpfile();
    $curls[] = $tmp;
    curl_setopt_array($ch, array(
        CURLOPT_URL => "https://battlefieldtracker.com/bf1/api/Stats/BasicStats?platform=2&displayName=" . urlencode($psnid),
        CURLOPT_ENCODING => '', // empty string = accept any compression curl supports
        CURLOPT_SSL_VERIFYPEER => false,
        CURLOPT_SSL_VERIFYHOST => false,
        CURLOPT_HTTPHEADER => array(
            'TRN-Api-Key: MYKEY'
        ),
        CURLOPT_FILE => $tmp[1]
    ));
    curl_multi_add_handle($cmh, $ch);
    curl_multi_exec($cmh, $active); // start the transfer while still fetching rows from the db
}
do {
    do {
        $ret = curl_multi_exec($cmh, $active);
    } while ($ret == CURLM_CALL_MULTI_PERFORM);
    curl_multi_select($cmh, 1);
} while ($active);
foreach ($curls as $curr) {
    fseek($curr[1], 0, SEEK_SET); // https://bugs.php.net/bug.php?id=76268
    $response = stream_get_contents($curr[1]);
    $result = json_decode($response, true);
    print($result['profile']['displayName']);
}
// the rest is just cleanup, the client shouldn't have to wait for this
// OPTIMIZEME: apache version of fastcgi_finish_request() ?
if (is_callable('fastcgi_finish_request')) {
    fastcgi_finish_request();
}
foreach ($curls as $curr) {
    curl_multi_remove_handle($cmh, $curr[0]);
    curl_close($curr[0]);
    fclose($curr[1]);
}
curl_multi_close($cmh);
It runs all the API calls in parallel, uses transfer compression (CURLOPT_ENCODING), overlaps the API requests with fetching rows from the db, and tries to disconnect the client before running the cleanup routines, so it should run much faster.
Also, if mysqli_fetch_assoc() is causing slow round trips to your db, it would probably be even faster to replace it with mysqli_fetch_all().
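For example, a small sketch of fetching everything in one round trip (note that fetch_all() needs the mysqlnd driver, which is the default in modern PHP):
$rows = $connection->query("SELECT * FROM bfplayers")->fetch_all(MYSQLI_ASSOC);
foreach ($rows as $row) {
    $psnid = $row['psnid'];
    // ... build the curl handle for $psnid as above ...
}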
Also, something that would probably be much faster than all of this would be to have a cron job run every minute (or every 10 seconds?) that caches the results, and to show the cached result to the client. Even if the API calls lag, the client's page load wouldn't be affected at all.
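A rough sketch of that approach, assuming a cron entry like * * * * * php /path/to/cache_stats.php; the file names, paths and db credentials here are placeholders, so adapt them to your setup:
<?php
// cache_stats.php - run from cron; fetches the API for every player and writes one JSON cache file
$connection = new mysqli('localhost', 'user', 'pass', 'db'); // placeholder credentials
$rows = $connection->query("SELECT psnid FROM bfplayers")->fetch_all(MYSQLI_ASSOC);
$cache = array();
foreach ($rows as $row) {
    $ch = curl_init("https://battlefieldtracker.com/bf1/api/Stats/BasicStats?platform=2&displayName=" . urlencode($row['psnid']));
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_ENCODING => '',
        CURLOPT_HTTPHEADER => array('TRN-Api-Key: MYKEY'),
    ));
    $cache[$row['psnid']] = json_decode(curl_exec($ch), true);
    curl_close($ch);
}
file_put_contents(__DIR__ . '/stats_cache.json', json_encode($cache));
The page then just reads the cache and makes no API calls during the page load:
$stats = json_decode(file_get_contents(__DIR__ . '/stats_cache.json'), true);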
Related
I have a custom cURL function that has to download a huge number of images from a remote server. I was banned a couple of times in the past when I used file_get_contents(), and I found that curl_multi_init() is a better option, since with one call it can download, for example, 20 images at once.
I wrote a custom function that uses curl_init(), and I am trying to figure out how to implement curl_multi_init() so that in my loop, where I grab a list of 20 URLs from the database, I can call my custom function and only close the connection at the last iteration with curl_close(). In the current situation my function opens a new connection for each URL in the loop. Here is the function:
function downloadUrlToFile($remoteurl, $newfileName){
    $errors = 0;
    $options = array(
        CURLOPT_FILE => fopen('../images/products/'.$newfileName, 'w'),
        CURLOPT_TIMEOUT => 28800,
        CURLOPT_URL => $remoteurl,
        CURLOPT_RETURNTRANSFER => 1
    );
    $ch = curl_init();
    curl_setopt_array($ch, $options);
    $imageString = curl_exec($ch);
    $image = imagecreatefromstring($imageString);
    if($imageString !== false AND !empty($imageString)){
        if ($image !== false){
            $width_orig = imagesx($image);
            if($width_orig > 1000){
                $saveimage = copy_and_resize_remote_image_product($image, $newfileName);
            }else $saveimage = file_put_contents('../images/products/'.$newfileName, $imageString);
        }else $errors++;
    }else $errors++;
    curl_close($ch);
    return $errors;
}
There has to be a way to use curl_multi_init() with my downloadUrlToFile function, because:
I need to change the file name on the fly.
In my function I am also checking several things about the remote image. In the sample function I only check the size and resize it if necessary, but the real function does much more (I cut that part to keep it short, and I also pass more variables to the function).
How should the code be changed so that the loop connects to the remote server only once?
Thanks in advance
Try this pattern for multi cURL:
$urls = array($url_1, $url_2, $url_3);
$content = array();
$ch = array();
$mh = curl_multi_init();
foreach ($urls as $index => $url) {
    $ch[$index] = curl_init();
    curl_setopt($ch[$index], CURLOPT_URL, $url);
    curl_setopt($ch[$index], CURLOPT_HEADER, 0);
    curl_setopt($ch[$index], CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch[$index]);
}
$active = null;
for (;;) {
    curl_multi_exec($mh, $active);
    if ($active < 1) {
        // all downloads completed
        break;
    } else {
        // sleep-wait for more data to arrive on a socket.
        // (without this, we would be wasting 100% cpu of 1 core while downloading,
        // with this, we'll be using about 1-2% cpu of 1 core instead.)
        curl_multi_select($mh, 1);
    }
}
foreach ($ch as $index => $c) {
    $content[$index] = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
    // You can add some functions here and use $content[$index]
}
curl_multi_close($mh);
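Applied to your downloadUrlToFile() case, a rough sketch; it assumes $urls is an array keyed by the new file name you want, e.g. array('newname.jpg' => 'http://remote/img.jpg', ...), drops the fopen()/CURLOPT_FILE part since the content is processed in memory first, and reuses the post-processing from your own function:
$mh = curl_multi_init();
$ch = array();
foreach ($urls as $newfileName => $remoteurl) {
    $ch[$newfileName] = curl_init();
    curl_setopt($ch[$newfileName], CURLOPT_URL, $remoteurl);
    curl_setopt($ch[$newfileName], CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch[$newfileName], CURLOPT_TIMEOUT, 28800);
    curl_multi_add_handle($mh, $ch[$newfileName]);
}
$active = null;
do {
    curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh, 1);
    }
} while ($active);
foreach ($ch as $newfileName => $c) {
    $imageString = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
    curl_close($c);
    // reuse the checks from your downloadUrlToFile() body here:
    if ($imageString !== false && !empty($imageString)) {
        $image = imagecreatefromstring($imageString);
        if ($image !== false) {
            if (imagesx($image) > 1000) {
                copy_and_resize_remote_image_product($image, $newfileName); // your existing helper
            } else {
                file_put_contents('../images/products/' . $newfileName, $imageString);
            }
        }
    }
}
curl_multi_close($mh);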
UPDATE: Setup tested - it works - but my web host cannot handle 600 emails in about 6 seconds. I had each connection wait 20 seconds and then send one mail, and those all went through.
I have a mailing list with 600+ emails
I have a function to send out the 600+ emails
Unfortunately, there is a limit on the execution time (90 seconds), so the script is shut down before it completes. I cannot change the limit with set_time_limit(0), as it is set by my web host (and not in an ini file that I can change either).
My solution is to make POST requests from a main file to a sub file that sends out chunks of 100 mails at a time. But will these be sent without delay, or will each request wait for an answer before the next one is sent?
The code:
for ($i = 0; $i < $mails; $i += 100) {
    $url = 'http://www.bedsteforaeldreforasyl.dk/siteadmin/php/sender.php';
    $myvars = 'start=' . $i . '&emne=' . $emne . '&besked=' . $besked;

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $myvars);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_SAFE_UPLOAD, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);
    $response = curl_exec($ch);
    curl_close($ch);
}
$mails is the total number of recipients.
$start is the start row number in the SQL statement.
Will this (as I hope) start 6 parallel connections, or will it (as I fear) start 6 processes one after the other?
In the receiving script I have:
ignore_user_abort(true);
$q1 = "SELECT * FROM maillist ORDER BY navn LIMIT $start,100";
Create six PHP scripts, one for each batch of 100 emails (or pass a value (e.g. 0-5) to a single script).
Create a main script to call these six sub-scripts.
Use stream_socket_client() to call the sub-scripts.
The six scripts will run simultaneously.
You can catch anything echoed back by the sub-scripts (e.g. status).
$timeout = 120;
$buffer_size = 8192;
$result = array();
$sockets = array();
$id = 0;
$err = '';
header('Content-Type: text/plain; charset=utf-8');
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail1.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail2.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail3.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail4.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail5.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail6.php");
foreach ($urls as $path) {
    $host = $path['host'];
    $path = $path['path'];
    $http = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
    $stream = stream_socket_client("$host:80", $errno, $errstr, 120, STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT);
    if ($stream) {
        $sockets[] = $stream; // supports multiple sockets
        fwrite($stream, $http);
    } else {
        $err .= "$id Failed<br>\n";
    }
}
echo $err;
while (count($sockets)) {
    $read = $sockets;
    $write = NULL;
    $except = NULL;
    stream_select($read, $write, $except, $timeout);
    if (count($read)) {
        foreach ($read as $r) {
            $id = array_search($r, $sockets);
            $data = fread($r, $buffer_size);
            if (strlen($data) == 0) {
                // echo "$id Closed: " . date('h:i:s') . "\n\n\n";
                $closed[$id] = microtime(true);
                fclose($r);
                unset($sockets[$id]);
            } else {
                $result[$id] .= $data;
            }
        }
    } else {
        // echo 'Timeout: ' . date('h:i:s') . "\n\n\n";
        break;
    }
}
var_export($result);
I'll provide some ideas on how the objective can be achieved.
1. Use the curl_multi_* suite of functions. It provides non-blocking cURL requests.
2. Use an asynchronous library like amphp or ReactPHP, though it would essentially provide the same benefit as curl_multi_*, IIRC.
3. Use pcntl_fork() to create separate processes and distribute the job as in worker nodes.
4. Use the pthreads extension, which essentially provides a userland PHP implementation of true multi-threading.
I'll warn you, though: the last two options should be the last resort, since the parallel-processing world comes with some spooky situations which can prove to be really pesky ;-).
I'd also suggest that if you are planning to scale this sort of application, it would be the best course of action to use some external service.
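As a minimal illustration of option 3 (just a sketch, not production code; pcntl_* only works in CLI scripts, and the URL list and worker count here are placeholders):
<?php
// fork a handful of worker processes and give each a slice of the URL list
$urls    = array('http://example.com/a', 'http://example.com/b', 'http://example.com/c', 'http://example.com/d');
$workers = 2;
$chunks  = array_chunk($urls, (int) ceil(count($urls) / $workers));

$pids = array();
foreach ($chunks as $chunk) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die('fork failed');
    } elseif ($pid === 0) {
        // child: process its own chunk, then exit
        foreach ($chunk as $url) {
            file_get_contents($url); // or your cURL download routine
        }
        exit(0);
    }
    $pids[] = $pid; // parent: remember the child and keep forking
}
foreach ($pids as $pid) {
    pcntl_waitpid($pid, $status); // wait for all children to finish
}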
I have a simple CURL script that searches Google for "Batman", then saves the result in a file...
Can someone tell me a good way of iterating through the file to find the title and URL of each search result, please?
This is my code:
function get_remote_file_to_cache()
{
    $the_site = "https://www.google.se/webhp?sourceid=chrome-instant&rlz=1C5CHFA_enSE555SE556&ion=1&espv=2&ie=UTF-8#newwindow=1&q=batman";

    $curl = curl_init();
    $fp = fopen("temp_file.txt", "w");

    curl_setopt($curl, CURLOPT_URL, $the_site);
    curl_setopt($curl, CURLOPT_FILE, $fp);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_exec($curl);

    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if ($httpCode == 404) {
        touch('cache/404_err.txt');
    } /*
     * else { touch('cache/'.rand(0, 99999).'--all_good.txt'); }
     */
    else {
        $contents = curl_exec($curl);
        fwrite($fp, $contents);
    }
    curl_close($curl);
    fclose($fp);
}
echo rand(1, 425).get_remote_file_to_cache();
You can search through the HTML using DOMDocument and DOMXPath:
// Temp:
$sPageHTML = '<html><head></head><body><div class="test">Text here</div></body></html>';

$oDomDocument = new DOMDocument();
$oDomDocument->loadHTML($sPageHTML);

// Now, search the DOM structure for all divs with class "test".
$oXPath = new DOMXPath($oDomDocument);
$results = $oXPath->query('//div[@class="test"]');

// Loop through the results.
foreach ($results as $result) {
    echo 'Innertext: ' . $result->nodeValue;
}
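For the original question (pulling titles and URLs out of the saved Google results), a rough sketch along the same lines; the //h3/a selector is only an assumption about Google's markup at the time and will likely need adjusting:
$html = file_get_contents('temp_file.txt');  // the file your cURL call saved
$doc = new DOMDocument();
libxml_use_internal_errors(true);            // Google's markup is not valid XHTML
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);
// assumption: each result title is an <a> inside an <h3>
foreach ($xpath->query('//h3/a') as $link) {
    echo $link->nodeValue . ' => ' . $link->getAttribute('href') . "\n";
}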
Good luck
If you are still searching, you can find an open-source PHP Google scraper here:
http://scraping.compunect.com/?scrape-google-search (scroll to the bottom for the code)
You can just copy the DOM parsing routines from it; they work very well.
I've got a PHP file on my server which downloads something using cURL from another server and saves it to the database in a nested PHP function. The process takes a little time: when I open the file in my browser I have to wait about a minute, but all the downloaded records are correct.
The problem is with the cron wget/curl download. When I use
wget http://myserver/myscript.php or curl http://myserver/myscript.php, the connection is closed after 1 byte and nothing happens on the server...
Where is my mistake? Do I need some headers? Why don't wget/curl wait for the end of my PHP function the way the browser does? I hope the require of wp-load.php (I need some WordPress functions for this) isn't the problem?
Many thanks for any responses
Code:
<?php
define('WP_USE_THEMES', false);
require_once("wp-load.php");

$licznik = 0;

// WP_Query arguments
$args = array(
    'post_type' => array( 'easy-rooms' ),
    'posts_per_page' => 30
);

// The Query
$query = new WP_Query( $args );

// The Loop
if ( $query->have_posts() ) {
    while ( $query->have_posts() ) {
        $query->the_post();

        $fields = array(
            "funkcja" => "lista_rezerwacji",
            "id_pokoju" => get_the_ID()
        );
        $postvars = '';
        foreach ($fields as $key => $value) {
            $postvars .= $key . "=" . $value . "&";
        }
        $postvars = rtrim($postvars, '&'); // or simply: $postvars = http_build_query($fields);

        $url = "http://some.remote.script.to.download.sth.by.curl";
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_POST, 1); // 0 for a GET request
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postvars);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
        curl_setopt($ch, CURLOPT_TIMEOUT, 120);
        $response = curl_exec($ch);
        curl_close($ch);
        echo $response;

        // THIS IS A FUNCTION IN SOME WORDPRESS PLUGIN, WHICH DOESN'T WORK WHEN I WGET/CURL THIS SCRIPT
        set_reservations($response, get_the_ID());
        $licznik++;
    }
} else {
    // no posts found
}

// Restore original Post Data
print_r($licznik);
wp_reset_postdata();
?>
I can't allow file_get_contents() to run for more than 1 second; if that is not possible, I need to skip to the next loop iteration.
for ($i = 0; $i <= 59; ++$i) {
    $f = file_get_contents('http://example.com');
    // if the request took less than 1 second: do something and continue with the next iteration;
    // otherwise: skip/abort file_get_contents(), do something else, and continue with the next iteration
}
Is it possible to make a function like this?
Actually I'm using curl_multi and I can't figure out how to set a timeout on a WHOLE curl_multi request.
If you are working with HTTP URLs only, you can do the following:
$ctx = stream_context_create(array(
    'http' => array(
        'timeout' => 1
    )
));
for ($i = 0; $i <= 59; $i++) {
    file_get_contents("http://example.com/", 0, $ctx);
}
However, this is just the read timeout, meaning the time between two read operations (or the time before the first read operation). As long as data keeps arriving at a steady rate, the timeout never triggers, so the whole download could still take even an hour.
If you want the whole download to take no more than a second, you can't use file_get_contents() anymore. I would encourage you to use cURL in this case, like this:
// create curl resource
$ch = curl_init();
for ($i = 0; $i < 59; $i++) {
    // set url
    curl_setopt($ch, CURLOPT_URL, "example.com");
    // set timeout for the whole transfer
    curl_setopt($ch, CURLOPT_TIMEOUT, 1);
    // return the transfer as a string
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // $output contains the output string
    $output = curl_exec($ch);
}
// close curl resource to free up system resources (once, after the loop)
curl_close($ch);
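Regarding the curl_multi part of your question: there is no single option for the whole multi request, but you can set CURLOPT_TIMEOUT on each handle and additionally enforce an overall deadline around the curl_multi_exec() loop. A rough sketch, where $urls stands for whatever list you are looping over and the 1-second deadline mirrors your requirement:
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $h = curl_init($url);
    curl_setopt($h, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($h, CURLOPT_TIMEOUT, 1);        // per-handle limit
    curl_multi_add_handle($mh, $h);
    $handles[] = $h;
}
$deadline = microtime(true) + 1.0;              // overall limit for the whole batch
$active = null;
do {
    curl_multi_exec($mh, $active);
    curl_multi_select($mh, 0.1);
} while ($active && microtime(true) < $deadline);
foreach ($handles as $h) {
    $body = curl_multi_getcontent($h);          // empty/partial for requests that hit the deadline
    curl_multi_remove_handle($mh, $h);
    curl_close($h);
    // ... do something with $body, or skip to the next loop iteration ...
}
curl_multi_close($mh);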