I am trying to log in to a site and then request numerous URLs to fetch their source and scrape it for images. It works fine with regular curl, but when I use curl_multi I get back the exact same response for every URL. So that I only have to log in once, I am reusing the curl resource (this works fine with regular curl), and I think this may be why the same response is returned.
Does anyone know how to use curl_multi but authenticate first?
Here is the code I am using:
<?php
// LICENSE: PUBLIC DOMAIN
// The author disclaims copyright to this source code.
// AUTHOR: Shailesh N. Humbad
// SOURCE: http://www.somacon.com/p539.php
// DATE: 6/4/2008
// index.php
// Run the parallel get and print the total time
$s = microtime(true);
// Define the URLs
$urls = array(
"http://localhost/r.php?echo=request1",
"http://localhost/r.php?echo=request2",
"http://localhost/r.php?echo=request3"
);
$pg = new ParallelGet($urls);
print "<br />total time: ".round(microtime(true) - $s, 4)." seconds";
// Class to run parallel GET requests and return the transfer
class ParallelGet
{
function __construct($urls)
{
// Create get requests for each URL
$mh = curl_multi_init();
$count = 0;
$ch = curl_init();
foreach($urls as $i => $url)
{
$count++;
if($count == 1)
{
// SET URL FOR THE POST FORM LOGIN
curl_setopt($ch, CURLOPT_URL, 'https://www.example.com/login.php');
// ENABLE HTTP POST
curl_setopt ($ch, CURLOPT_POST, 1);
// SET POST PARAMETERS : FORM VALUES FOR EACH FIELD
curl_setopt ($ch, CURLOPT_POSTFIELDS, 'user=myuser&password=mypassword');
// IMITATE CLASSIC BROWSER'S BEHAVIOUR : HANDLE COOKIES
curl_setopt ($ch, CURLOPT_COOKIEJAR, realpath($_SERVER['DOCUMENT_ROOT']) . '/cookie.txt');
# Setting CURLOPT_RETURNTRANSFER variable to 1 will force cURL
# not to print out the results of its query.
# Instead, it will return the results as a string return value
# from curl_exec() instead of the usual true/false.
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
// EXECUTE 1st REQUEST (FORM LOGIN)
curl_exec ($ch);
}
$ch = curl_init($url);
curl_setopt ($ch, CURLOPT_COOKIEFILE, realpath($_SERVER['DOCUMENT_ROOT']) . '/cookie.txt');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
$ch_array[$i] = $ch;
curl_multi_add_handle($mh, $ch_array[$i]);
}
// Start performing the request
do {
$execReturnValue = curl_multi_exec($mh, $runningHandles);
} while ($execReturnValue == CURLM_CALL_MULTI_PERFORM);
// Loop and continue processing the request
while ($runningHandles && $execReturnValue == CURLM_OK) {
// Wait forever for network
$numberReady = curl_multi_select($mh);
if ($numberReady != -1) {
// Pull in any new data, or at least handle timeouts
do {
$execReturnValue = curl_multi_exec($mh, $runningHandles);
} while ($execReturnValue == CURLM_CALL_MULTI_PERFORM);
}
}
// Check for any errors
if ($execReturnValue != CURLM_OK) {
trigger_error("Curl multi read error $execReturnValue\n", E_USER_WARNING);
}
// Extract the content
foreach($urls as $i => $url)
{
// Check for errors
$curlError = curl_error($ch_array[$i]);
if($curlError == "") {
$res[$i] = curl_multi_getcontent($ch_array[$i]);
} else {
print "Curl error on handle $i: $curlError\n";
}
// Remove and close the handle
curl_multi_remove_handle($mh, $ch_array[$i]);
curl_close($ch_array[$i]);
}
// Clean up the curl_multi handle
curl_multi_close($mh);
// Print the response data
print_r($res);
}
}
?>
You need to enable cookies in curl as well; look it up in the documentation. Don't forget to create the cookie file (an empty file) with read and write permission for curl.
$cookie = tempnam ("/tmp", "CURLCOOKIE");
$ch = curl_init();
curl_setopt( $ch, CURLOPT_URL, $url );
curl_setopt( $ch, CURLOPT_COOKIEJAR, $cookie );
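One way to combine this with curl_multi is to log in on a separate handle first, let that handle go away so the cookie jar is written to disk (libcurl writes the jar when the handle is cleaned up), and only then create one fresh handle per URL that reads the jar through CURLOPT_COOKIEFILE. The following is a rough sketch under those assumptions, not a tested drop-in fix; the helper names login_and_save_cookies(), build_fetch_handles() and fetch_all() are made up for illustration, and the login URL and form fields are the ones from the question.

```php
<?php
// Sketch only. All three function names are hypothetical helpers.

// Log in once and make sure the cookie jar ends up on disk.
function login_and_save_cookies($loginUrl, $postFields, $cookieFile)
{
    $ch = curl_init($loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($ch);
    // The jar is only written when the handle is cleaned up, so release it
    // before any other handle tries to read the file.
    if (PHP_VERSION_ID < 80000) {
        curl_close($ch); // before PHP 8 this closes the handle explicitly
    }
    unset($ch); // on PHP 8+, dropping the last reference frees the handle
    return $body !== false;
}

// Create a separate easy handle per URL; each one reads the saved cookies.
function build_fetch_handles(array $urls, $cookieFile)
{
    $handles = array();
    foreach ($urls as $i => $url) {
        $h = curl_init($url);
        curl_setopt($h, CURLOPT_COOKIEFILE, $cookieFile);
        curl_setopt($h, CURLOPT_RETURNTRANSFER, true);
        $handles[$i] = $h;
    }
    return $handles;
}

// Run all handles in parallel and collect the bodies (null on error).
function fetch_all(array $handles)
{
    $mh = curl_multi_init();
    foreach ($handles as $h) {
        curl_multi_add_handle($mh, $h);
    }
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity, at most 1 second
        }
    } while ($running && $status == CURLM_OK);

    $results = array();
    foreach ($handles as $i => $h) {
        $results[$i] = (curl_error($h) === '') ? curl_multi_getcontent($h) : null;
        curl_multi_remove_handle($mh, $h);
    }
    curl_multi_close($mh);
    return $results;
}

// Possible usage (performs real network requests, so it is left commented out):
// $cookie = tempnam(sys_get_temp_dir(), 'CURLCOOKIE');
// if (login_and_save_cookies('https://www.example.com/login.php',
//         'user=myuser&password=mypassword', $cookie)) {
//     print_r(fetch_all(build_fetch_handles($urls, $cookie)));
// }
```

The point of the split is that the login handle and the parallel handles never share state except through the cookie file, so each URL gets its own response.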
Related
I would like to have a lottery check page written in PHP. The code does not work with the Hungarian lottery database ($url2) but works with the other URL ($url1). Is too much data the problem?
<?php
echo "CURL - function test <br>";
$url1 = "http://www.example.com";
$url2 = "https://bet.szerencsejatek.hu/cmsfiles/otos.html";
function curl_download($Url){
// is cURL installed yet?
if (!function_exists('curl_init')){
die('Sorry cURL is not installed!');
}
// OK cool - then let's create a new cURL resource handle
$ch = curl_init();
// Now set some options (most are optional)
// Set URL to download
curl_setopt($ch, CURLOPT_URL, $Url);
// Set a referer
curl_setopt($ch, CURLOPT_REFERER, "http://www.example.org/yay.htm");
// User agent
curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
// Include header in result? (0 = no, 1 = yes)
curl_setopt($ch, CURLOPT_HEADER, 0);
// Should cURL return or print out the data? (true = return, false = print)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Timeout in seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
// Download the given URL, and return output
$output = curl_exec($ch);
// Close the cURL resource, and free system resources
curl_close($ch);
return $output;
}
echo curl_download($url2);
echo strlen(curl_download($url2));
First of all, it depends on what the error is.
I think you should dump the result of the cURL call. Something like:
if (!curl_errno($ch)) {
switch ($http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE)) {
case 200: # OK
$return = ['result' => 'ok', 'response_text' => $result];
break;
default:
$return = ['result' => 'unexpected_http_code', 'http_code' => $http_code,
'response_text' => $result
];
}
} else {
$return = ['result' => 'curl_error', 'curl_error' => curl_error($ch)];
}
Maybe it's because you didn't configure your SSL settings; the second URL starts with https://
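If HTTPS does turn out to be the cause, one thing to try is making the certificate checks explicit and printing curl's own error message instead of guessing. The sketch below assumes a CA bundle file may be needed on the machine; configure_tls() and describe_result() are hypothetical helper names, and nothing here is confirmed to be the actual problem in the question.

```php
<?php
// Sketch: make certificate verification explicit on a cURL handle.
// configure_tls() is a hypothetical helper, not part of the original code.
function configure_tls($ch, $caBundlePath = null)
{
    // Verify the server certificate and that it matches the host name.
    $ok = curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
    $ok = curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2) && $ok;
    if ($caBundlePath !== null) {
        // Assumption: $caBundlePath points to a PEM file of trusted CA certificates.
        $ok = curl_setopt($ch, CURLOPT_CAINFO, $caBundlePath) && $ok;
    }
    return $ok;
}

// Turn the outcome of curl_exec() into a short diagnostic string.
function describe_result($body, $errno, $error, $httpCode)
{
    if ($body === false || $errno !== 0) {
        return "curl error $errno: $error";
    }
    return "HTTP $httpCode, " . strlen($body) . " bytes";
}

// Possible usage (network access required, so it is left commented out):
// $ch = curl_init($url2);
// configure_tls($ch);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// $body = curl_exec($ch);
// echo describe_result($body, curl_errno($ch), curl_error($ch),
//     curl_getinfo($ch, CURLINFO_HTTP_CODE));
```

An SSL-related failure would then show up as a curl error message rather than as an empty page.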
I am using the ABBYY API for OCR, and I want to get the results in a variable for further processing instead of downloading the result as a file.
<?php
include_once("dBug.php");
// Name of application you created
$applicationId = 'telianewtest';
// Password should be sent to your e-mail after application was created
$password = 'w0Ye61tWZ6fODm7hIUj9XTeJ';
$fileName = '20080118155747372_Page_2.jpg';
// Get path to file that we are going to recognize
$local_directory=dirname(__FILE__).'/images/';
$filePath = $local_directory.'/'.$fileName;
if(!file_exists($filePath))
{
die('File '.$filePath.' not found.');
}
if(!is_readable($filePath) )
{
die('Access to file '.$filePath.' denied.');
}
// Recognizing with English language to rtf
// You can use combination of languages like ?language=english,russian or
// ?language=english,french,dutch
// For details, see API reference for processImage method
$url = 'http://cloud.ocrsdk.com/processImage?language=english&exportFormat=xml';
// Send HTTP POST request and ret xml response
$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $url);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curlHandle, CURLOPT_USERPWD, "$applicationId:$password");
curl_setopt($curlHandle, CURLOPT_POST, 1);
curl_setopt($curlHandle, CURLOPT_USERAGENT, "PHP Cloud OCR SDK Sample");
$post_array = array(
"my_file"=>"@".$filePath,
);
curl_setopt($curlHandle, CURLOPT_POSTFIELDS, $post_array);
$response = curl_exec($curlHandle);
if($response == FALSE) {
$errorText = curl_error($curlHandle);
curl_close($curlHandle);
die($errorText);
}
$httpCode = curl_getinfo($curlHandle, CURLINFO_HTTP_CODE);
curl_close($curlHandle);
// Parse xml response
$xml = simplexml_load_string($response);
if($httpCode != 200) {
if(property_exists($xml, "message")) {
die($xml->message);
}
die("unexpected response ".$response);
}
$arr = $xml->task[0]->attributes();
$taskStatus = $arr["status"];
if($taskStatus != "Queued") {
die("Unexpected task status ".$taskStatus);
}
// Task id
$taskid = $arr["id"];
// 4. Get task information in a loop until task processing finishes
// 5. If response contains "Completed" status - extract url with result
// 6. Download recognition result (text) and display it
$url = 'http://cloud.ocrsdk.com/getTaskStatus';
$qry_str = "?taskid=$taskid";
// Check task status in a loop until it is finished
// TODO: support states indicating error
while(true)
{
sleep(5);
$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $url.$qry_str);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curlHandle, CURLOPT_USERPWD, "$applicationId:$password");
curl_setopt($curlHandle, CURLOPT_USERAGENT, "PHP Cloud OCR SDK Sample");
$response = curl_exec($curlHandle);
$httpCode = curl_getinfo($curlHandle, CURLINFO_HTTP_CODE);
curl_close($curlHandle);
// parse xml
$xml = simplexml_load_string($response);
if($httpCode != 200) {
if(property_exists($xml, "message")) {
die($xml->message);
}
die("Unexpected response ".$response);
}
$arr = $xml->task[0]->attributes();
$taskStatus = $arr["status"];
if($taskStatus == "Queued" || $taskStatus == "InProgress") {
// continue waiting
continue;
}
if($taskStatus == "Completed") {
// exit this loop and proceed to handling the result
break;
}
if($taskStatus == "ProcessingFailed") {
die("Task processing failed: ".$arr["error"]);
}
die("Unexpected task status ".$taskStatus);
}
// Result is ready. Download it
$url = $arr["resultUrl"];
$curlHandle = curl_init();
curl_setopt($curlHandle, CURLOPT_URL, $url);
curl_setopt($curlHandle, CURLOPT_RETURNTRANSFER, 1);
// Warning! This is for easier out-of-the box usage of the sample only.
// The URL to the result has https:// prefix, so SSL is required to
// download from it. For whatever reason PHP runtime fails to perform
// a request unless SSL certificate verification is off.
curl_setopt($curlHandle, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($curlHandle);
curl_close($curlHandle);
// Let user download the result
header('Content-type: application/txt');
header('Content-Disposition: attachment; filename="file.xml"');
echo $response;
?>
I tried to access the $xml variable with no success. Any ideas?
Thank you in advance
(I have included the password since its a demo account, you can check it out if you want)
I'm trying to use curl to do a simple GET with one parameter called redirect_uri. The PHP file that gets called prints an empty string for $_GET["redirect_uri"]: it shows red- and it seems like nothing is being sent.
Code to do the GET:
//Get code from login and display it
$ch = curl_init();
$url = 'http://www.besttechsolutions.biz/projects/facebook/testget.php';
//set the url, number of POST vars, POST data
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_GET,1);
curl_setopt($ch,CURLOPT_GETFIELDS,"redirect_uri=my return url");
//execute post
print "new reply 2 <br>";
$result = curl_exec($ch);
print $result;
// print "<br> <br>";
// print $fields_string;
die("hello");
The testget.php file:
<?php
print "red-";
print $_GET["redirect_uri"];
?>
This is how I usually do get requests, hopefully it will help you:
// create curl resource
$ch = curl_init();
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Follow redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Set maximum redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
// Allow a max of 5 seconds.
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
// set url
if( count($params) > 0 ) {
$query = http_build_query($params);
curl_setopt($ch, CURLOPT_URL, "$url?$query");
} else {
curl_setopt($ch, CURLOPT_URL, $url);
}
// $output contains the output string
$output = curl_exec($ch);
// Check for errors and such.
$info = curl_getinfo($ch);
$errno = curl_errno($ch);
if( $output === false || $errno != 0 ) {
// Do error checking
} else if($info['http_code'] != 200) {
// Got a non-200 error code.
// Do more error checking
}
// close curl resource to free up system resources
curl_close($ch);
return $output;
In this code, $params can be an associative array in which each key is a parameter name and each value is that parameter's value.
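To make the $params part concrete, here is a small sketch of how such an array turns into a query string. build_get_url() is a hypothetical helper name and the www.example.com URL is a placeholder; the parameter is the redirect_uri from the question.

```php
<?php
// build_get_url() is a hypothetical helper used only for illustration.
function build_get_url($url, array $params)
{
    if (count($params) === 0) {
        return $url;
    }
    // http_build_query() URL-encodes names and values (a space becomes "+").
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . http_build_query($params);
}

echo build_get_url(
    'http://www.example.com/testget.php',
    array('redirect_uri' => 'my return url')
);
// prints http://www.example.com/testget.php?redirect_uri=my+return+url
```

Because the parameters travel in the URL itself, no extra cURL option is needed for a GET request beyond CURLOPT_URL.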
I want to get several pages through curl_exec. The first page comes back normally, but all the others return a 302 header. What is the reason?
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, ROOT_URL);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($curl); // here good content
curl_close($curl);
preg_match_all('/href="(\/users\/[^"]+)"[^>]+>\s*/i', $content, $p);
for ($j=0; $j<count($p[1]); $j++){
$new_curl = curl_init();
curl_setopt($new_curl, CURLOPT_URL, NEW_URL.$p[1][$j]);
curl_setopt($new_curl, CURLOPT_RETURNTRANSFER, 0);
$content = curl_exec($new_curl); // here 302
curl_close($new_curl);
preg_match('/[^#]+#[^"]+/i', $content, $p2);
}
Something like this.
You probably want to provide a sample of your code so we can see if you're omitting something.
A 302 response code typically indicates that the server is redirecting you to a different location (found in the Location response header). Depending on what flags you use, CURL can either retrieve that automatically or you can watch for the 302 response and retrieve it yourself.
Here is how you would get CURL to follow the redirects (where $ch is the handle to your curl connection):
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);// allow redirects
You can use curl multi, which is faster and can get data from all the URLs in parallel.
You can use it like this:
// Initialize
$curlOptions = array(CURLOPT_RETURNTRANSFER => 1); // Add whatever else you need.
$curlHandle1 = curl_init($url1);
curl_setopt_array($curlHandle1, $curlOptions);
$curlHandle2 = curl_init($url2);
curl_setopt_array($curlHandle2, $curlOptions);
$multi = curl_multi_init();
curl_multi_add_handle($multi, $curlHandle1);
curl_multi_add_handle($multi, $curlHandle2);
// Run the handles
$running = null;
do {
$status = curl_multi_exec($multi, $running);
} while ($status == CURLM_CALL_MULTI_PERFORM);
while ($running && $status == CURLM_OK) {
if (curl_multi_select($multi) != -1) {
do {
$status = curl_multi_exec($multi, $running);
} while ($status == CURLM_CALL_MULTI_PERFORM);
}
}
// Retrieve results
$response1 = curl_multi_getcontent($curlHandle1);
$status1 = curl_getinfo($curlHandle1);
$response2 = curl_multi_getcontent($curlHandle2);
$status2 = curl_getinfo($curlHandle2);
You can find more information here: http://www.php.net/manual/en/function.curl-multi-exec.php
Check out Example 1 there.
In PHP, how can I determine if any remote file (accessed via HTTP) exists?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops
$data = curl_exec($ch);
curl_close($ch);
if (!$data) {
echo "Domain could not be found";
}
else {
preg_match_all("/HTTP\/\d(?:\.\d)?\s+(\d{3})/",$data,$matches);
$code = end($matches[1]);
if ($code == 200) {
echo "Page Found";
}
elseif ($code == 404) {
echo "Page Not Found";
}
}
Modified version of code from here.
I like curl or fsockopen to solve this problem. Either one can provide header data regarding the status of the file requested. Specifically, you would be looking for a 404 (File Not Found) response. Here is an example I've used with fsockopen:
http://www.php.net/manual/en/function.fsockopen.php#39948
This function will return the response code (the last one in case of redirection), or false in case of a DNS or other error. If only one argument (the URL) is supplied, a HEAD request is made. If a second argument is given, a full request is made and the content of the response, if any, is stored by reference in the variable passed as the second argument.
function url_response_code($url, & $contents = null)
{
$context = null;
if (func_num_args() == 1) {
$context = stream_context_create(array('http' => array('method' => 'HEAD')));
}
$contents = @file_get_contents($url, null, $context);
$code = false;
if (isset($http_response_header)) {
foreach ($http_response_header as $header) {
if (strpos($header, 'HTTP/') === 0) {
list(, $code) = explode(' ', $header);
}
}
}
return $code;
}
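The "last one in case of redirection" behaviour comes from the loop overwriting $code on every status line it meets. Here is a small sketch of just that step, run on a made-up header list so it needs no network access; last_status_code() is a hypothetical name, and unlike the function above it casts the code to an integer.

```php
<?php
// last_status_code() is a hypothetical helper that isolates the header loop.
function last_status_code(array $headers)
{
    $code = false;
    foreach ($headers as $header) {
        // Every response in a redirect chain starts with its own status line.
        if (strpos($header, 'HTTP/') === 0) {
            $parts = explode(' ', $header);
            if (isset($parts[1])) {
                $code = (int) $parts[1];
            }
        }
    }
    return $code;
}

// Made-up headers for a request that was redirected once.
$headers = array(
    'HTTP/1.1 302 Found',
    'Location: http://www.example.com/new',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
);
var_dump(last_status_code($headers)); // int(200)
```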
I recently was looking for the same info. Found some really nice code here: http://php.assistprogramming.com/check-website-status-using-php-and-curl-library.html
function Visit($url){
$agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL,$url );
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch,CURLOPT_VERBOSE,false);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
$page=curl_exec($ch);
//echo curl_error($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if($httpcode >= 200 && $httpcode < 300){
return true;
}
else {
return false;
}
}
if(Visit("http://www.site.com")){
echo "Website OK";
}
else{
echo "Website DOWN";
}
Use Curl, and check if the request went through successfully.
http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/
Just a note that these solutions will not work on a site that does not give an appropriate response for a page that is not found. For example, I just had a problem testing for a page on a site that simply loads its main page when it gets a request it cannot handle, so the site nearly always gives a 200 response, even for non-existent pages.
Some sites show a custom error on a standard page and still do not send a 404 header.
There is not much you can do in these situations unless you know the expected content of the page and test that it is present, or test for some expected error text within the page, and that all gets a bit messy...
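For what it is worth, that kind of content test could look roughly like the sketch below. It assumes you already have the status code and body from one of the earlier snippets; page_seems_to_exist() is a hypothetical helper name, and the expected text and error markers are things you would have to pick per site.

```php
<?php
// page_seems_to_exist() is a hypothetical helper; the marker strings passed
// to it are assumptions that depend entirely on the site being checked.
function page_seems_to_exist($httpCode, $body, $expectedText, array $errorMarkers = array())
{
    // A non-2xx status is treated as "not there" regardless of content.
    if ($httpCode < 200 || $httpCode >= 300) {
        return false;
    }
    // A 200 page that contains known error text is a "soft 404".
    foreach ($errorMarkers as $marker) {
        if ($marker !== '' && stripos($body, $marker) !== false) {
            return false;
        }
    }
    // With no expected text given, the status code alone decides.
    if ($expectedText === '') {
        return true;
    }
    return stripos($body, $expectedText) !== false;
}

// Possible usage with values obtained elsewhere:
// $exists = page_seems_to_exist($httpcode, $page, 'Product 42', array('page not found'));
```

This only moves the guesswork into the marker strings, so it is as fragile as the answer above suggests.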