Below is my code for loading and storing an XML feed. The time-out is also important, in case the feed is offline or responds too slowly.
Some users do not have file_get_contents enabled. I'm looking for a way to change this to cURL, or to check and use whichever one is enabled, without losing the ability to set a time-out. Any ideas?
function feeder()
{
$cache_time = 3600 * 12; // 12 hours
$cache_file = plugin_dir_path(__FILE__) . 'cache/feed.xml'; // plugin_dir_path() already ends with a slash
$timedif = @(time() - filemtime($cache_file));
$fc_xml_options = get_option('fc_xml_options');
$xml_feed = $fc_xml_options['feed'];
// remove any spaces and/or other whitespace from the connector code
$xml_feed = str_replace(' ', '', $xml_feed);
$xml_feed = preg_replace('/\s+/', '', $xml_feed);
if (file_exists($cache_file) && $timedif < $cache_time)
{
$string = file_get_contents($cache_file);
}
else
{
// set a time-out (5 sec) on fetch feed
$xml_context = array( 'http' => array(
'timeout' => 5,
) );
$pure_context = stream_context_create($xml_context);
$string = file_get_contents($xml_feed, false, $pure_context);
if ($f = @fopen($cache_file, 'w'))
{
fwrite($f, $string, strlen($string));
fclose($f);
}
}
return $string; // assumed: hand the feed contents back; the function's closing brace was missing from the snippet
}
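One way to do the check (a sketch of mine, not from the thread; the helper name fetch_feed() is hypothetical): probe allow_url_fopen via ini_get() and fall back to cURL, keeping the 5-second time-out on both paths.
function fetch_feed($url)
{
    // Preferred path: file_get_contents() with a stream-context time-out.
    if (ini_get('allow_url_fopen')) {
        $context = stream_context_create(array('http' => array('timeout' => 5)));
        return file_get_contents($url, false, $context);
    }
    // Fallback: cURL with matching connect and total time-outs.
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        $string = curl_exec($ch);
        curl_close($ch);
        return $string;
    }
    return false; // neither transport is available
}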
You can use cURL to read a local file. Just change the URL to the file path, prefixed with file://:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "file://full_path_to_file");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
curl_close($ch);
I would like to apologize in advance if this is a stupid question, but I am a junior developer starting a new job (and yes, very afraid of making a huge mistake). Most of my expertise is in Python, SQL, JavaScript, CSS and HTML. However, in my job I've been tasked with deactivating cookies on their website (they have to because of privacy laws in Europe). Some of the pages' backends are written in JavaScript, and I was able to find the cookies and deactivate them, but some are written in PHP. I can tell what the code is and what it does, but since I've never dealt with PHP before, I'm not sure if I should just delete the script or modify it in some way. Any help or advice will be greatly appreciated. This is the code (it is in its own file):
<?php
// Real-time Data Aggregation (RDA)
// error_reporting( E_ALL );
// ini_set('display_errors', 1);
class RDA {
private $session_cookie = '';
private $log_site = '';
private $config = array();
private $raw_payload = '';
private $payload = array();
private $publish_path_map = array();
public function __construct($config){
$this->config = $config;
}
public function process(){
$this->raw_payload = file_get_contents('php://input');
if(!$this->is_json($this->raw_payload)){
echo 'Expected payload was not provided. Script has been aborted.';
return;
}
$this->payload = json_decode($this->raw_payload);
if(array_key_exists('passed_through_rda', $this->payload) && $this->payload->passed_through_rda == 'true') return; // This payload previously passed through an RDA script, so abort to prevent recursion.
if($this->is_test_payload()) return; // When the Test button is clicked from account settings simply echo back the payload and abort.
$this->send_next_webhook_request(); // forward payload to another webhook listener.
if($this->payload->finished != 'true') return; // we only want to react when the event has finished and not when it has been started.
$this->set_publish_path_map(); // sets up an index of publish paths to use as reference to prevent publish recursion.
foreach($this->config['actions'] as $action){
if(!$this->payload_contains_trigger_path($action)) continue; // payload does not contain a trigger path, so skip this action.
$this->authenicate();
$this->publish($action);
}
$this->log_request();
}
private function authenicate(){
if($this->session_cookie != '') return; // session cookie was already created, so skip re-authentication. (The original read $session_cookie, an undefined local variable.)
$endpoint = $this->config['ouc_base_url'] . '/authentication/login';
$config = array(
'skin' => $this->config['skin'],
'account' => $this->config['account'],
'username' => $this->config['username'],
'password' => $this->config['password']
);
$post_fields = http_build_query($config);
$cURLConnection = curl_init($endpoint);
curl_setopt($cURLConnection, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($cURLConnection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($cURLConnection, CURLOPT_HEADER, true);
$api_response = curl_exec($cURLConnection);
$header = curl_getinfo( $cURLConnection );
curl_close($cURLConnection);
$header_content = substr($api_response, 0, $header['header_size']);
$pattern = "#Set-Cookie:\\s+(?<cookie>[^=]+=[^;]+)#m";
preg_match_all($pattern, $header_content, $matches);
$this->session_cookie = implode("; ", $matches['cookie']);
}
private function publish($action){
$endpoint = '/files/publish';
$config = array(
'site' => $action['site'],
'path' => $action['publish_path'],
'include_scheduled_publish' => 'true',
'include_checked_out' => 'true'
);
$this->log_site = $action['site']; // set a site to use to create log files if logging is turned on.
$this->send($endpoint, $config);
}
private function set_publish_path_map(){
foreach($this->config['actions'] as $action){
$this->publish_path_map[$action['site'] . $action['publish_path']] = 1;
}
}
private function log_request(){
if($this->config['log'] != 'true' || $this->log_site == '') return; // don't log when logging is turned off or when log_site is not set
$log_id = uniqid();
$endpoint = '/files/save';
$config = array(
'site' => $this->log_site,
'path' => $this->config['config_file'], // uses the config PCF to do a "save as" to a log file
'new_path' => $this->get_root_relative_folderpath() . '_log/' . $log_id . '.txt',
'text' => $this->raw_payload
);
$this->send($endpoint, $config);
}
private function send_next_webhook_request(){
$next_webhook_url = trim($this->config['next_webhook_url']);
if($next_webhook_url == '') return; // next_webhook_url not entered so just return.
$this->payload->passed_through_rda = 'true';
$connection = curl_init($next_webhook_url);
curl_setopt($connection, CURLOPT_POSTFIELDS, json_encode($this->payload, JSON_UNESCAPED_SLASHES));
curl_setopt($connection, CURLOPT_RETURNTRANSFER, true);
$api_response = curl_exec($connection);
curl_close($connection);
}
private function send($endpoint, $config){
$endpoint = $this->config['ouc_base_url'] . $endpoint;
$post_fields = http_build_query($config);
$connection = curl_init($endpoint);
curl_setopt($connection, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($connection, CURLOPT_COOKIE, $this->session_cookie);
$api_response = curl_exec($connection);
curl_close($connection);
}
private function payload_contains_trigger_path($action){
$site = $action['site'];
$success = array(); // the success node in the webhook payload contains files that were published.
if(!array_key_exists($site, $this->payload->success)) return false; // no success array so just return false.
$success = $this->payload->success->{$site};
$published_paths = array();
foreach($success as $i){
if(!array_key_exists($site . $i->path, $this->publish_path_map)) $published_paths[] = $i->path; // only include paths that aren't also publish targets configured in this script to avoid publish recursion.
}
$trigger_paths = $action['trigger_path'];
$trigger_paths = explode(',', $trigger_paths);
foreach($trigger_paths as $trigger_path){
$trigger_path = trim($trigger_path);
$trigger_path = preg_replace('/(.)[\/]+$/', '$1', $trigger_path); // removes the trailing slash unless the string is only 1 character long, for instance: '/'
if($trigger_path == '') continue;
foreach($published_paths as $path){
if($this->starts_with($path, $trigger_path)) return true;
}
}
return false;
}
private function is_test_payload(){
$account = $this->payload->account;
if($account == '<account name>'){ // This is the account name value used by the test http request.
echo $this->raw_payload;
return true;
}
return false;
}
private function is_json($string){
if(trim($string) == '') return false;
json_decode($string);
return (json_last_error() == JSON_ERROR_NONE);
}
private function starts_with($string, $startString){
$len = strlen($startString);
return (substr($string, 0, $len) === $startString);
}
private function get_root_relative_folderpath(){
$result = $this->get_root_relative_filepath();
$result = str_replace('\\', '/', $result);
$result = preg_replace('/[^\/]+$/', '', $result);
return $result;
}
private function get_root_relative_filepath(){
$result = str_replace($_SERVER['DOCUMENT_ROOT'], '', $_SERVER['SCRIPT_FILENAME']);
return $result;
}
}
?>
For clarification: they have a service that manages cookies, and they were able to turn those off, but a number of cookies are persisting. They are being generated by scripts left over from years ago (I have no idea who wrote this code, or how old it is) and need to be deleted. I just want to make sure that if I delete something it won't cause other bugs on the website.
A method to close all the cookies and sessions:
I think you have to start the session first with session_start():
session_start();
You can read the documentation here:
// http://php.net/manual/en/function.setcookie.php#73484
To destroy and close the sessions, try the code below.
The method below will unset the cookies served by your PHP program:
if (isset($_SERVER['HTTP_COOKIE'])) {
$cookies = explode(';', $_SERVER['HTTP_COOKIE']);
foreach($cookies as $cookie) {
$parts = explode('=', $cookie);
$name = trim($parts[0]);
setcookie($name, '', time()-1000);
setcookie($name, '', time()-1000, '/');
}
}
session_destroy();
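A fuller teardown, adapted from the session_destroy() example in the PHP manual, also expires the session cookie itself with the parameters it was originally set with:
session_start();
$_SESSION = array();
if (ini_get('session.use_cookies')) {
    // Expire the session cookie using its original path/domain/flags.
    $params = session_get_cookie_params();
    setcookie(session_name(), '', time() - 42000,
        $params['path'], $params['domain'],
        $params['secure'], $params['httponly']);
}
session_destroy();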
UPDATE: Setup tested - it works - but my web host cannot handle 600 emails in about 6 seconds. I had each connection wait 20 seconds and then send one mail; those all went through.
I have a mailing list with 600+ emails
I have a function to send out the 600+ emails
Unfortunately, there is a limit on the execution time (90 seconds), and therefore the script is shut down before it completes. I cannot change the limit with set_time_limit(0), as it is set by my web host (and not in an ini file that I can change either).
My solution is to make POST requests from a main file to a sub file that will send out chunks of 100 mails at a time. But will these be sent without delay, or will each one wait for an answer before the next request is sent?
The code:
for($i = 0; $i < $mails; $i += 100) { // the original $i+100 never incremented $i
$url = 'http://www.bedsteforaeldreforasyl.dk/siteadmin/php/sender.php';
$myvars = 'start=' . $i . '&emne=' . $emne . '&besked=' . $besked;
$ch = curl_init( $url );
curl_setopt( $ch, CURLOPT_POST, 1);
curl_setopt( $ch, CURLOPT_POSTFIELDS, $myvars);
curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt( $ch, CURLOPT_HEADER, 0);
curl_setopt( $ch, CURLOPT_SAFE_UPLOAD, 0);
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 0);
curl_setopt( $ch, CURLOPT_TIMEOUT, 1);
$response = curl_exec( $ch );
curl_close($ch);
}
$mails is the total number of recipients
$start is the start row number in the SQL statement
Will this (as I hope) start 6 parallel connections - or will it (as I fear) start 6 processes one after the other?
In the receiving script I have:
ignore_user_abort(true);
$q1 = "SELECT * FROM maillist ORDER BY navn LIMIT $start,100"; // ORDER BY must come before LIMIT
Create six PHP scripts, one for each batch of 100 emails (or pass a value (e.g. 0-5) to a single script).
Create a main script to call these six sub-scripts.
Use stream_socket_client() to call the sub-scripts.
The six scripts will run simultaneously.
You can catch anything echoed back by the sub-scripts (e.g. status).
$timeout = 120;
$buffer_size = 8192;
$result = array();
$sockets = array();
$closed = array();
$err = '';
$id = 0;
header('Content-Type: text/plain; charset=utf-8');
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail1.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail2.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail3.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail4.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail5.php");
$urls[] = array('host' => 'www.example.com','path' => "http://www.example.com/mail6.php");
foreach($urls as $path){
$host = $path['host'];
$path = $path['path'];
$http = "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
$stream = stream_socket_client("$host:80", $errno,$errstr, 120,STREAM_CLIENT_ASYNC_CONNECT|STREAM_CLIENT_CONNECT);
if ($stream) {
$sockets[] = $stream; // supports multiple sockets
fwrite($stream, $http);
}
else {
$err .= "$id Failed<br>\n";
}
}
echo $err;
while (count($sockets)) {
$read = $sockets;
$write = NULL;
$except = NULL;
stream_select($read, $write, $except, $timeout);
if (count($read)) {
foreach ($read as $r) {
$id = array_search($r, $sockets);
$data = fread($r, $buffer_size);
if (strlen($data) == 0) {
// echo "$id Closed: " . date('h:i:s') . "\n\n\n";
$closed[$id] = microtime(true);
fclose($r);
unset($sockets[$id]);
}
else {
$result[$id] .= $data;
}
}
}
else {
// echo 'Timeout: ' . date('h:i:s') . "\n\n\n";
break;
}
}
var_export($result);
I'll provide some ideas on how the objective can be achieved.
1. Use the curl_multi_* suite of functions. It provides non-blocking cURL requests (see the sketch after this list).
2. Use an asynchronous library like amphp or ReactPHP. Though it would essentially provide the same benefit as curl_multi_*, IIRC.
3. Use pcntl_fork() to create separate processes and distribute the job as in worker nodes.
4. Use the pthreads extension, which essentially provides a userland PHP implementation of true multi-threading.
I'll warn you though: the last two options should be the last resort, since the parallel-processing world comes with some spooky situations that can prove to be really pesky ;-).
I'd also suggest that if you are planning to scale this sort of application, the best course of action is to use some external service.
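For the first option, here is a minimal curl_multi sketch, adapted to the sender.php batching from the question (treat it as an outline under those assumptions, not a drop-in):
$mh = curl_multi_init();
$handles = array();
// One handle per batch of 100 mails; all are added before any is executed.
for ($i = 0; $i < $mails; $i += 100) {
    $ch = curl_init('http://www.bedsteforaeldreforasyl.dk/siteadmin/php/sender.php');
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, 'start=' . $i . '&emne=' . $emne . '&besked=' . $besked);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($mh, $ch);
    $handles[$i] = $ch;
}
// Drive all handles concurrently until every request has finished.
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for socket activity instead of busy-looping
} while ($running > 0);
foreach ($handles as $i => $ch) {
    $status[$i] = curl_multi_getcontent($ch); // whatever sender.php echoed back
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);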
Is it possible to make cURL access a URL and return the result as a file resource, the way fopen does?
My goals:
Parse a CSV file
Pass it to fgetcsv
My obstacle: fopen is disabled
My chunk of code (using fopen):
$url = "http://download.finance.yahoo.com/d/quotes.csv?s=USDEUR=X&f=sl1d1t1n&e=.csv";
$f = fopen($url, 'r');
print_r(fgetcsv($f));
Then, I am trying this with cURL:
$curl = curl_init();
curl_setopt($curl, CURLOPT_VERBOSE, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $param);
curl_setopt($curl, CURLOPT_URL, $url);
$content = @curl_exec($curl);
curl_close($curl);
But, as usual, $content is returned as a string.
Now, is it possible for cURL to return a file resource pointer, just like fopen? I'm on PHP < 5.1.x or thereabouts; I mean, I can't use str_getcsv, since it's only available in 5.3.
My error
Warning: fgetcsv() expects parameter 1 to be resource, boolean given
Thanks
Assuming that by "fopen is disabled" you mean "allow_url_fopen is disabled", a combination of CURLOPT_FILE and php://temp makes this fairly easy:
$f = fopen('php://temp', 'w+');
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_FILE, $f);
// Do you need these? Your fopen() method isn't a post request
// curl_setopt($curl, CURLOPT_POST, true);
// curl_setopt($curl, CURLOPT_POSTFIELDS, $param);
curl_exec($curl);
curl_close($curl);
rewind($f);
while ($line = fgetcsv($f)) {
print_r($line);
}
fclose($f);
Basically this creates a pointer to a "virtual" file, and cURL stores the response in it. Then you just reset the pointer to the beginning and it can be treated as if you had opened it as usual with fopen($url, 'r');
You can create a temporary file using fopen() and then fwrite() the contents into it. After that, the newly created file will be readable by fgetcsv(). The tempnam() function should handle the creation of arbitrary temporary files.
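A sketch of that approach, assuming $url is the CSV address from the question and sys_get_temp_dir() is available (PHP 5.2.1+; on older versions substitute a directory such as '/tmp'):
// Fetch with cURL, park the body in a temporary file, then read it with fgetcsv().
$tmp = tempnam(sys_get_temp_dir(), 'csv');
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
file_put_contents($tmp, curl_exec($curl));
curl_close($curl);
$f = fopen($tmp, 'r');
while (($row = fgetcsv($f)) !== false) {
    print_r($row);
}
fclose($f);
unlink($tmp); // remove the temporary file when done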
According to the comments on str_getcsv(), users without access to the function could try the one below. There are also various other approaches in the comments; make sure you check them out.
function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = '\\', $eol = '\n') {
if (is_string($input) && !empty($input)) {
$output = array();
$tmp = preg_split("/".$eol."/",$input);
if (is_array($tmp) && !empty($tmp)) {
foreach ($tmp as $line_num => $line) { // each() is removed in modern PHP, so iterate with foreach
if (preg_match("/".$escape.$enclosure."/",$line)) {
while ($strlen = strlen($line)) {
$pos_delimiter = strpos($line,$delimiter);
$pos_enclosure_start = strpos($line,$enclosure);
if (
is_int($pos_delimiter) && is_int($pos_enclosure_start)
&& ($pos_enclosure_start < $pos_delimiter)
) {
$enclosed_str = substr($line,1);
$pos_enclosure_end = strpos($enclosed_str,$enclosure);
$enclosed_str = substr($enclosed_str,0,$pos_enclosure_end);
$output[$line_num][] = $enclosed_str;
$offset = $pos_enclosure_end+3;
} else {
if (empty($pos_delimiter) && empty($pos_enclosure_start)) {
$output[$line_num][] = substr($line,0);
$offset = strlen($line);
} else {
$output[$line_num][] = substr($line,0,$pos_delimiter);
$offset = (
!empty($pos_enclosure_start)
&& ($pos_enclosure_start < $pos_delimiter)
)
?$pos_enclosure_start
:$pos_delimiter+1;
}
}
$line = substr($line,$offset);
}
} else {
$line = preg_split("/".$delimiter."/",$line);
/*
* Validating against pesky extra line breaks creating false rows.
*/
if (is_array($line) && !empty($line[0])) {
$output[$line_num] = $line;
}
}
}
return $output;
} else {
return false;
}
} else {
return false;
}
}
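Note that, unlike the built-in str_getcsv() (which parses a single line), this fallback returns an array of rows indexed by line number. A hedged usage sketch, assuming $content holds the CSV string fetched with cURL above:
$rows = str_getcsv($content);
print_r($rows); // e.g. $rows[0] is the first CSV row as an array of fields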
There is a redirect to the server for information, and once the response comes back from the server, I want to check the HTTP code and throw an exception if there is any code starting with 4XX. For that I need to know how I can get only the HTTP code from the header. Also, since redirection is involved, I'm afraid cURL will not be useful to me.
So far I have tried this solution, but it's very slow and causes a script time-out in my case. I don't want to increase the script time-out period and wait longer just to get an HTTP code.
Thanks in advance for any suggestion.
Your method with get_headers and requesting the first response line will return the status code of the redirect (if any) and, more importantly, it will do a GET request, which transfers the whole file.
You only need a HEAD request, and then to parse the headers and return the last status code. Following is a code example that does this; it's using $http_response_header instead of get_headers, but the format of the array is the same:
$url = 'http://example.com/';
$options['http'] = array(
'method' => "HEAD",
'ignore_errors' => 1,
);
$context = stream_context_create($options);
$body = file_get_contents($url, NULL, $context);
$responses = parse_http_response_header($http_response_header);
$code = $responses[0]['status']['code']; // last status code
echo "Status code (after all redirects): $code<br>\n";
$number = count($responses);
$redirects = $number - 1;
echo "Number of responses: $number ($redirects Redirect(s))<br>\n";
if ($redirects)
{
$from = $url;
foreach (array_reverse($responses) as $response)
{
if (!isset($response['fields']['LOCATION']))
break;
$location = $response['fields']['LOCATION'];
$code = $response['status']['code'];
echo " * $from -- $code --> $location<br>\n";
$from = $location;
}
echo "<br>\n";
}
/**
* parse_http_response_header
*
* @param  array $headers as in $http_response_header
* @return array status and headers grouped by response, last first
*/
function parse_http_response_header(array $headers)
{
$responses = array();
$buffer = NULL;
foreach ($headers as $header)
{
if ('HTTP/' === substr($header, 0, 5))
{
// add buffer on top of all responses
if ($buffer) array_unshift($responses, $buffer);
$buffer = array();
list($version, $code, $phrase) = explode(' ', $header, 3) + array('', FALSE, '');
$buffer['status'] = array(
'line' => $header,
'version' => $version,
'code' => (int) $code,
'phrase' => $phrase
);
$fields = &$buffer['fields'];
$fields = array();
continue;
}
list($name, $value) = explode(': ', $header, 2) + array('', '');
// header-names are case insensitive
$name = strtoupper($name);
// values of multiple fields with the same name are normalized into
// a comma separated list (HTTP/1.0+1.1)
if (isset($fields[$name]))
{
$value = $fields[$name].','.$value;
}
$fields[$name] = $value;
}
unset($fields); // remove reference
array_unshift($responses, $buffer);
return $responses;
}
For more information see: HEAD first with PHP Streams; at the end it contains example code showing how you can do the HEAD request with get_headers as well.
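If you would rather stick with get_headers(), a minimal sketch of that trick (stream_context_set_default() requires PHP 5.3+):
// Make get_headers() send HEAD instead of GET so no body is transferred.
stream_context_set_default(array('http' => array('method' => 'HEAD')));
$headers = get_headers('http://example.com/', 1); // 1 = parse into name => value pairs
print_r($headers);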
Related: How can one check to see if a remote file exists using PHP?
Something like:
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // without this, curl_exec() echoes the body
curl_exec($ch); // the request must run before curl_getinfo() has a code to report
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
You should try the HttpEngine Class.
Hope this helps.
--
EDIT
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $your_agent_variable);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $your_referer);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$output = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($httpcode ...)
The solution you found looks good. If the server is not able to send you the HTTP headers in time, the problem is that the other server is broken or under very heavy load.
Could you please tell me, is there any limitation on sending requests using multi_curl?
When I tried to send more than 200 requests, it was getting a timeout.
See the code below:
foreach($newUrlArry as $url){
$gatherUrl[] = $url['url'];
}
/* Array slice */
$totalUrlRequest = count($gatherUrl);
if($totalUrlRequest > 10){
$offset = 10;
$index = 0;
$matchedAnchors = array();
$dom = new DOMDocument;
$NoOfTilesRequest = ceil($totalUrlRequest/$offset);
for($sl = 0; $sl<$NoOfTilesRequest;$sl++){
$output = array_slice($gatherUrl, $index, $offset);
$index = $offset+$index;
$responseAction = $this->multiRequestAction($output);
$k=0;
foreach($responseAction as $responseHtml){
@$dom->loadHTML($responseHtml);
$documentLinks = $dom->getElementsByTagName("a");
$chieldFlag = false;
for($i=0;$i<$documentLinks->length;$i++) {
$documentLink = $documentLinks->item($i);
if ($documentLink->hasAttribute('href') AND substr($documentLink->getAttribute('href'), 0, strlen($match)) == $match) {
$description = $documentLink->childNodes;
foreach($description as $words) {
$name = trim($words->nodeName);
if($name == 'em' || $name == 'b' || $name=="span" || $name=="p") {
if(!empty($words->nodeValue)) {
$matchedAnchors[$sl][$k]['anchor'] = trim($words->nodeValue);
$matchedAnchors[$sl][$k]['img'] = 0;
if($documentLink->hasAttribute('rel'))
$matchedAnchors[$sl][$k]['rel'] = 'Y';
else
$matchedAnchors[$sl][$k]['rel'] = 'N';
$chieldFlag = true;
break;
}
}
elseif($name == 'img' ) {
$alt= $words->getAttribute('alt');
if(!empty($alt)) {
$matchedAnchors[$sl][$k]['anchor'] = trim($words->getAttribute('alt'));
$matchedAnchors[$sl][$k]['img'] = 1;
if($documentLink->hasAttribute('rel'))
$matchedAnchors[$sl][$k]['rel'] = 'Y';
else
$matchedAnchors[$sl][$k]['rel'] = 'N';
$chieldFlag = true;
break;
}
}
}
if(!$chieldFlag){
$matchedAnchors[$sl][$k]['anchor'] = $documentLink->nodeValue;
$matchedAnchors[$sl][$k]['img'] = 0;
if($documentLink->hasAttribute('rel'))
$matchedAnchors[$sl][$k]['rel'] = 'Y';
else
$matchedAnchors[$sl][$k]['rel'] = 'N';
}
}
}
$k++;
}
}
}
Both @Phliplip & @lunixbochs have mentioned common cURL pitfalls (max execution time & being denied by the target server).
When sending that many cURL requests to the same server, I try to "be nice" and insert voluntary sleep periods so I don't bombard the host. For a low-end site, 1000+ requests could be like a mini-DDoS!
Here's code that has worked for me. I used it to scrape a client's product data from their old site, since the data was locked in a proprietary database system with NO export function.
<?php
header('Content-type: text/html; charset=utf-8', true);
set_time_limit(0);
$urls = array(
'http://www.example.com/cgi-bin/product?id=500',
'http://www.example.com/cgi-bin/product?id=501',
'http://www.example.com/cgi-bin/product?id=502',
'http://www.example.com/cgi-bin/product?id=503',
'http://www.example.com/cgi-bin/product?id=504',
);
$i = 0;
foreach($urls as $url){
echo $url."\n";
$curl = curl_init($url);
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
curl_setopt($curl, CURLOPT_AUTOREFERER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1 );
curl_setopt($curl, CURLOPT_TIMEOUT, 25 );
$html = curl_exec($curl);
$html = @mb_convert_encoding($html, 'HTML-ENTITIES', 'utf-8');
curl_close($curl);
// now do something with info returned by curl
$i++;
if($i%10==0){
sleep(20);
} else {
sleep(2);
}
}
?>
The main features are:
no max execution time
voluntary sleeping
new curl init & exec for each request.
In my experience, calling sleep() will stop servers from denying you.
However, if by "different different server" you mean that you are sending a small number of requests to a large number of servers, for example:
$urls = array(
'http://www.example-one.com/',
'http://www.example-two.com/',
'http://www.example-three.com/',
'http://www.example-four.com/',
'http://www.example-five.com/',
'http://www.example-six.com/'
);
and you are using set_time_limit(0), then an error may be causing your code to die; try:
ini_set('display_errors',1);
error_reporting(E_ALL);
And tell us the error message you are getting.
PHP doesn't place a restriction on the number of connections using curl_multi_init, but memory usage and time limits will be an issue.
Check your memory_limit setting in your php.ini and try to increase it to see if that helps you.
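A quick way to inspect those limits at runtime (a sketch; shared hosts may silently refuse the overrides):
echo 'memory_limit: ' . ini_get('memory_limit') . "\n";
echo 'max_execution_time: ' . ini_get('max_execution_time') . "\n";
ini_set('memory_limit', '512M'); // may be refused on shared hosting
set_time_limit(0);               // may likewise be disabled by the host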