Scraping a website with cURL request not reading the HTML code - php

Crawling http://www.mfinante.ro/infocodfiscal.html?cod=299 is not working.
It's getting redirected to some other location. But why?
<?php
$url = 'http://www.mfinante.ro/infocodfiscal.html?cod=299';
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_ENCODING ,"");
curl_setopt($curl, CURLOPT_USERAGENT, $agent);
$html = curl_exec($curl);
$redirectURL = curl_getinfo($curl,CURLINFO_EFFECTIVE_URL );
curl_close($curl);
echo $html;
?>
I'm unable to understand why this happening.

You could use htmlspecialchars() to get the source code of the response
echo htmlspecialchars($html);
It's likely that there is a javascript or meta redirect in there somewhere. My JS is so poor i can't really help you with that.
If you can find that, you can build a regular expression to find the URL and then fetch it's contents.

Related

web scraping does not work only on this site

I'm using the same code to get the price of different web pages (7 in particular), all work perfect, but in 1 I can not get any data, could you tell me if it is impossible, if the page has any protection? Thanks in advance.
$source = file_get_contents("https://www.cyberpuerta.mx/Computo-Hardware/Discos-Duros-SSD-NAS/Discos-Duros-Internos-para-PC/Disco-Duro-Interno-Western-Digital-Caviar-Blue-3-5-1TB-SATA-III-6-Gbit-s-7200RPM-64MB-Cache.html");
preg_match("'<span class=\"priceText\">(.*?)</span>'", $source, $price);
echo $price[1];
I hope this result:
$869.00
This code only works badly on the website shown in the code.
Use curl with an agent set, this usually tricks the website protections to believe it's a true user.
$URL = "https://www.cyberpuerta.mx/Computo-Hardware/Discos-Duros-SSD-NAS/Discos-Duros-Internos-para-PC/Disco-Duro-Interno-Western-Digital-Caviar-Blue-3-5-1TB-SATA-III-6-Gbit-s-7200RPM-64MB-Cache.html";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, $URL);
$result =curl_exec($ch);
preg_match("'<span class=\"priceText\">(.*?)</span>'", $result, $price);
echo $price[1];

Why curl return 400 bad request when i try to get page content?

i am trying to get web page content with curl from some websites but they return 400 bad request ( file_get_contents return empty ) here's the function i am using :
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
Put error_reporting(E_ALL); line at the top file where you are calling this function.
It will generate the cause of an error.

Curl return empty in php

$loginUrl = 'http://mp3.zing.vn/json/song/get-source/ZmJmTknNCBmLNzHtZbxtvmLH';
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$loginUrl);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
$result=curl_exec($ch);
curl_close($ch);
var_dump(json_decode($result));
I have a problem to get the data using curl operation. If i use the url only in my browser then it returns the data but here i using var_dump its null. I have consult some post in stackoverflow but i cant sovle this problem.
Where i do some mistake, please help my. Thanks
The URL is invalid, i.e. the path mentioned as the variable $loginURL doesnot exist.
loginUrl = 'http://mp3.zing.vn/json/song/get-source/ZmJmTknNCBmLNzHtZbxtvmLH';

query dynamic link to retrieve json file using cURL and php

Hello I am trying to query the site URL which will return a dynamic json file containing the following information...
{"input_address":"1BEgj5JUUsUz91bR9F6q6YoUPgtAGWAZg9","destination":"1A8JiWcwvpY7tAopUkSnGuEYHmzGYfZPiq","fee_percent":0}
I need to extract the input_address from that json file. This is probably easy although I am very new to php, here is what i have located in xx.php...
<?php
$root_url = 'https://blockchain.info/api/receive?method=create&address=1A8JiWcwvpY7tAopUkSnGuEYHmzGYfZPiq';
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)':
$ch = curl_init();
curl_setopt($ch, CURL_SSL_VERIFYPEER, false);
curl_setopt($ch, CURL_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, 'Content-type:application/json');
curl_setopt($ch, CURLOPT_URL, $root_url);
curl_setopt($ch, CURLOPY_USERAGENT, $agent);
$result=curl_exec($ch);
curl_close($ch);
$gg = json_decode($result);
var_dump(json_decode($result));
$hh = var_dump($gg['input_address']);
//$zz = json_decode($result);
echo 'blah:'. $gg;
echo 'hh:'. $hh;
?>
When I go to xx.php it is returning a blank page and I thought it should return a page that echos the contents of the entire json file? Any help is appreciated.

How to send form in remote page using PHP, jQuery or Curl

I have page name customer.php in this page, I have to call gift.php with query string. And gift.php page has a form set to auto submit.
I have try this using CURL but it gives 404 error. And using php file_get_contents this one also redirect to 404. And tried using jQuery Ajax, I haven't got any result, But Its working with iframe, but I belive it is not the best option
<iframe src="gift.php?name=John"></iframe>
Could someone please help me to do this using curl...
this code not working
$url =urlencode('gift.php?name=john');
$header = array("Accept: application/json");
$ch = curl_init();
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_ENCODING, "gzip");
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$retValue = curl_exec($ch);
$response = json_decode(curl_exec($ch));
$ee = curl_getinfo($ch);
print_r($ee);
print_r($retValue);
For curl don't use $url =urlencode('gift.php?name=john'); and $url must contain domainname. For example $url = 'http://domain.com/gift.php?name=' . urlencode('john');

Categories