PHP & CURL scraping - php

I have a problem: when I run this script in Google Chrome, I get a blank page. When I use a link to another website, it works. I do not know what is happening.
$curl = curl_init();
$url = "https://www.danmurphys.com.au/dm/home";
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($curl);
echo $output;

Several conditions can leave your result blank, such as:
A cURL error occurred.
The server sent a redirect without a response body, and cURL did not follow it.
The target host did not return any response body.
So you have to find out which one applies here.
For the first possibility, use curl_error and curl_errno to confirm that cURL did not hit an error while running.
For the second, set the CURLOPT_FOLLOWLOCATION option so that cURL follows the redirect.
For the third, use curl_getinfo. It returns an array that contains "size_download", which is the length of the response body. If it is zero, that is why you see a blank page when printing it.
One more thing: use var_dump to inspect the output (for debugging only). curl_exec may return bool false or null, and printing either of them shows nothing.
Here is an example that uses all of them.
<?php
$curl = curl_init();
$url = "https://www.danmurphys.com.au/dm/home";
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
$output = curl_exec($curl);
$info = curl_getinfo($curl);
$err = curl_error($curl);
$ern = curl_errno($curl);
if ($ern) {
printf("An error occurred: (%d) %s\n", $ern, $err);
exit(1);
}
curl_close($curl);
printf("Response body size: %d\n", $info["size_download"]);
// Debug only.
// var_dump($output);
echo $output;
Hope this can help you.
Update:
You can use CURLOPT_VERBOSE to see the request and response information in detail.
Just add this:
curl_setopt($curl, CURLOPT_VERBOSE, true);
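You do not need to print anything yourself; cURL writes the details at runtime to STDERR (or to the stream set with CURLOPT_STDERR), so in a browser they will not appear on the page.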

Related

php Curl vs postman giving different results on error

I have a PHP cURL script that returns the result of a GET request; it is run as a command from another process. The code is:
<?php
$arr = getopt("f:");
$url = $arr['f'];
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true );
$result = curl_exec($ch);
if (curl_errno($ch)) {
echo curl_error($ch);
} else {
echo $result;
}
curl_close($ch);
?>
When I make an API GET request with cURL to a specific URL that returns a 400 error, curl_error($ch) is "The requested URL returned error: 400 Bad Request".
When I run the same request in Postman, I get a JSON reply such as: {"result_ok":false,"code":400,"message":"Invalid Email: xxxxxxxxxx@ail.com (POST)"}.
How can I get the JSON returned in the cURL request? If I echo $result when there is an error condition, it is null.
The documentation for CURLOPT_FAILONERROR explains:
fail the request if the HTTP code returned is equal to or larger than 400. The default action would be to return the page normally, ignoring that code.
CURLOPT_FAILONERROR is false by default, so either remove the option or set it to false explicitly:
curl_setopt($ch, CURLOPT_FAILONERROR, false);
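With CURLOPT_FAILONERROR left off, the error body arrives in $result like any other response, so the status code has to be checked separately. A minimal sketch, assuming the API replies with JSON that carries a "message" field on errors, as in the Postman output above; fetchWithStatus and summarizeReply are hypothetical helper names, not part of cURL:

```php
<?php
// Hypothetical helper: fetch a URL and keep the body even for HTTP 4xx/5xx replies.
function fetchWithStatus(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // CURLOPT_FAILONERROR is deliberately not set, so a 400 reply is not discarded.
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = curl_errno($ch) ? curl_error($ch) : null;
    curl_close($ch);

    return ['status' => $status, 'body' => $body, 'error' => $error];
}

// Hypothetical helper: turn a status code and a response body into one line of text.
function summarizeReply(int $status, string $body): string
{
    $data = json_decode($body, true);
    if ($status >= 400) {
        // Prefer the API's own "message" field; fall back to the raw body.
        $message = is_array($data) && isset($data['message']) ? $data['message'] : $body;
        return "HTTP $status: $message";
    }
    return $body;
}
```

A caller could then run `$r = fetchWithStatus($url);` and print `$r['error'] ?? summarizeReply($r['status'], (string) $r['body'])`, which shows the transport error if there was one and the server's reply otherwise.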

Curl Drops Parameters

So, I have been fighting with a piece of code that I want to use to get a remote page's source code using cURL.
The code executes successfully, both in the browser and on the command line. However, I get the content of the main file only. When parameters are added, they are not reflected in the output at all.
The Code:
STACK : Ubuntu, Nginx, PHP-FPM 7.2
$urlcontent = 'https://XXX.YYY.COM/file/?var1=value1' ;
// Create a new cURL resource
$curl = curl_init();
if (!$curl) {
die("Couldn't initialize a cURL handle");
}
// Set the file URL to fetch through cURL
curl_setopt($curl, CURLOPT_URL, $urlcontent);
// Set a custom user agent string
curl_setopt($curl, CURLOPT_USERAGENT, 'CodiBot/2.1');
// Follow redirects, if any
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
// Fail the cURL request if the response code is >= 400 (like 404 errors)
curl_setopt($curl, CURLOPT_FAILONERROR, true);
// Return the actual result of the curl result instead of success code
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// Wait for 10 seconds to connect, set 0 to wait indefinitely
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
// Execute the cURL request for a maximum of 50 seconds
curl_setopt($curl, CURLOPT_TIMEOUT, 50);
// Do not check the SSL certificates
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
// Fetch the URL and save the content in $html variable
$html = curl_exec($curl);
// Check if any error has occurred
if (curl_errno($curl))
{
echo 'cURL error: ' . curl_error($curl);
}
else
{
// cURL executed successfully
print_r(curl_getinfo($curl));
print_r($html);
}
curl_close($curl);
PROBLEM
I get the content for https://XXX.YYY.COM/file but not for the corresponding ?var1=value1 part. In other words, although I pass the information to be retrieved from the DB, I get only the HTML of the main file.
I tried:
curl_setopt($ch, CURLOPT_POSTFIELDS, 'foo=1&bar=2&baz=3');
I know the remote server may have CORS enabled, but I tried the same URL using a remote cURL retriever and it succeeded. So, it may not be the remote server.

curl how to format a curl request with an api key in php

Hi, I am new to cURL but need it for a particular project. Could you help me format this code so that it works? I would like to get the results and print the raw JSON on the page. Here is the code I am using:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "curl -u api key: https://api.companieshouse.gov.uk/search/companies");
$x = curl_exec($curl);
curl_close($curl);
print($x);
This is a link to the API page I am trying to use:
https://developer.companieshouse.gov.uk/api/docs/search/companies/companysearch.html
This is the example they give on the page:
curl -uYOUR_APIKEY_FOLLOWED_BY_A_COLON: https://api.companieshouse.gov.uk/search/companies
These are the parameters for the call; if possible, I would like to set them as well:
q (required)
items_per_page (optional)
start_index (optional)
I simplified your request. You can try the code below.
$params = array("q"=>"<Some Value>","items_per_page"=>"<Some Value>","start_index"=>"<Some Value>");
$curl = curl_init();
// API key as the username, followed by a colon (empty password)
curl_setopt($curl, CURLOPT_USERPWD, "api key:");
curl_setopt($curl, CURLOPT_URL, "https://api.companieshouse.gov.uk/search/companies?".http_build_query($params));
// Return the response as a string instead of printing it directly
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$x = curl_exec($curl);
curl_close($curl);
print($x);
Thanks for this opportunity to learn more of curl :-)
I tested the code below and it works. Of course you must be registered on the site and must have obtained your API key. You then provide it as the username of the username/password pair, with the password left empty (this is described on the site).
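$searchvalue = 'Test';
// urlencode() keeps spaces and special characters in the search term from breaking the URL
$curl = curl_init("https://api.companieshouse.gov.uk/search/companies?q=" . urlencode($searchvalue));
curl_setopt($curl, CURLOPT_USERPWD, "your_api_key_provided_by_the_site");
// Return the response as a string so that print_r shows the JSON rather than "1"
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($curl, CURLOPT_HEADER, true); // when debugging: show the response headers
$rest = @curl_exec($curl);
print_r($rest);
if (curl_errno($curl))
{
echo 'Curl error : ' . curl_error($curl);
}
@curl_close($curl);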
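Both answers rely on the same two ingredients: the API key sent as the basic-auth username with an empty password, and the search parameters appended as a query string. A sketch under those assumptions; buildSearchUrl and searchCompanies are hypothetical names, and the key and search values are placeholders:

```php
<?php
// Hypothetical helper: build the search URL, leaving out optional parameters that are not set.
function buildSearchUrl(string $query, ?int $itemsPerPage = null, ?int $startIndex = null): string
{
    $params = array_filter(
        ['q' => $query, 'items_per_page' => $itemsPerPage, 'start_index' => $startIndex],
        function ($value) { return $value !== null; }
    );
    // http_build_query URL-encodes the values, so spaces and '&' in $query are safe.
    return 'https://api.companieshouse.gov.uk/search/companies?' . http_build_query($params);
}

// Hypothetical helper: run the request and return the raw JSON string, or false on failure.
function searchCompanies(string $apiKey, string $query)
{
    $curl = curl_init(buildSearchUrl($query));
    curl_setopt($curl, CURLOPT_USERPWD, $apiKey . ':'); // key as username, empty password
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    $json = curl_exec($curl);
    curl_close($curl);
    return $json;
}
```

Printing the raw JSON would then be `print(searchCompanies('your_api_key', 'Test'));`, with the key replaced by the one issued by the site.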

PHP Curl, Request data accept in a variable

Hello, I am having a very weird problem. I am writing this code to get data from the Kontakt.io API:
$url = 'https://api.kontakt.io/beacon/?device';
$privKey = 'key here';
$cURL = curl_init();
curl_setopt($cURL, CURLOPT_URL, $url);
curl_setopt($cURL, CURLOPT_HTTPGET, true);
curl_setopt($cURL, CURLOPT_HTTPHEADER, array("Api-Key: ".$privKey,
"Accept: application/vnd.com.kontakt+json;version=5",
"Content-Type: application/x-www-form-urlencoded"));
$result = curl_exec($cURL);
The problem is that the data is successfully printed on the screen, but I don't know which variable is printing it. I didn't use the echo command in my code. If I print the $result variable, it prints only '1', meaning success.
The PHP documentation for curl_exec says that it returns true on success or false on failure. By default, curl_exec also prints the result directly to the output.
If you don't want that, set the cURL option CURLOPT_RETURNTRANSFER to 1. This tells curl_exec to return the result on success instead of printing it.
Documentation about CURLOPT_RETURNTRANSFER:
CURLOPT_RETURNTRANSFER TRUE to return the transfer as a string of the return value of curl_exec() instead of outputting it directly.
More information:
http://php.net/curl_exec
http://php.net/curl_setopt
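The three possible return values of curl_exec can be told apart explicitly. A small sketch; describeCurlResult is a hypothetical helper, and the request in the trailing comment only illustrates how it might be called with the code from the question:

```php
<?php
// Hypothetical helper: explain what a curl_exec() return value means.
function describeCurlResult($result): string
{
    if ($result === false) {
        return 'request failed';
    }
    if ($result === true) {
        // Only happens without CURLOPT_RETURNTRANSFER: the body was already printed.
        return 'body was printed directly';
    }
    return 'body returned as a string of ' . strlen($result) . ' bytes';
}

// Illustration only (performs a network request, so it is left commented out):
// $cURL = curl_init('https://api.kontakt.io/beacon/?device');
// curl_setopt($cURL, CURLOPT_RETURNTRANSFER, true);
// echo describeCurlResult(curl_exec($cURL));
```

The strict comparisons matter here: with `==`, an empty response body would compare equal to false and be reported as a failure.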

cURL operation and PHP

I have this frontend code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
//curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept: application/json'));
curl_setopt($ch, CURLOPT_URL, $url);
//make the request
$responseJSON = curl_exec($ch);
$response_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($response_status == 200) { // success
// remove any "problematic" characters from the json string and then decode
if (debug) {
echo "----finish API of getAPI inside basic_function with status==200---";
echo "<br>";
echo "-------the json response is-------" ; //.$responseJSON;
var_dump($responseJSON);
//print_r($responseJSON);
echo "<br>";
}
return json_decode( preg_replace( '/[\x00-\x1F\x80-\xFF]/', '', $responseJSON ) );
}
and I have backend code that is executed when cURL requests its URL. The backend code is therefore being run, so I know cURL is operating.
$output = array(
'status'=>'OK',
'data'=>'12345'
);
$output = json_encode($output);
echo $output;
and $output is shown in the browser as {"status":"OK","data":"12345"}
However, when I went back to the frontend code and did echo $responseJSON, I got nothing. I thought the output {"status":"OK","data":"12345"} would end up in $responseJSON. Any idea?
Here is the output in the browser. Something is very odd: the response_status is 200, which is success, even before the API is parsed by the backend code. I expected status = 200 and the JSON response after the {"status":"OK","data":"12345"}
=========================================================================================
inside the get API of the basic functions
-------url of cURL is -----http://localhost/test/api/session/login/?device_duid=website&UserName=joe&Password=1234&Submit=Submit
----finish API of getAPI inside basic_function with status==200---
-------the json response is-------string(1153)
"************inside Backend API.php******************
---command of api is--/session/login/
---first element of api is--UserName=joe
--second element of api is---Password=1234
---third element of api is----Submit=Submit
----fourth element of api is---
-------inside session login of api-------------
{"status":"OK","data":"12345"}
Have you tried with curl_setopt($ch, CURLOPT_TIMEOUT, 10); commented out?
See what happens if you comment out that line.
Also try with basic code; if that works, something you added later is wrong:
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, false);
// grab URL and pass it to the browser
curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
Try var_dump($responseJSON)
If it returns false try
curl_error ( $ch )
Returns a clear text error message for the last cURL operation.
Are you sure your $url is correct?
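If var_dump shows a string but the decoded value is still empty, it may be worth checking whether the body is valid JSON at all. Judging from the dump in the question, the backend prints debug lines before the JSON object, and json_decode rejects such input. A minimal sketch; decodeOrExplain is a hypothetical helper:

```php
<?php
// Hypothetical helper: decode a response body and report why decoding failed, if it did.
function decodeOrExplain(string $raw): array
{
    $data = json_decode($raw, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return ['ok' => false, 'data' => null, 'error' => json_last_error_msg()];
    }
    return ['ok' => true, 'data' => $data, 'error' => null];
}
```

Calling decodeOrExplain($responseJSON) in the frontend would report a decoding error as long as the backend echoes anything besides the JSON document itself.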
