Parsing any webpage using CURL on PHP - php

Is it possible to write a PHP function that returns HTML-string of any possible link the same way the browser does? Example of links: "http://google.com", "", "mywebsite.com", "somesite.com/.page/nn/?s=b#85452", "lichess.org"
What I've tried:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSLVERSION, 3);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$data = curl_exec($curl);
if(curl_errno($curl)){
echo 'Curl error: ' . curl_error($curl);
}
echo $data;
curl_close($curl);
Sadly enough, for some links this code returns blank page because of SSL or any other stuff, but for some links it works.
Or is there any alternative to CURL? I just do not understand why php cannot retrieve any html out of the box.

CURL may fail on SSL sites if you're running an older version of PHP. Make sure your OS and PHP version are up-to-date.
You may also opt to use file_get_contents() which works with URLs and is generally a simpler alternative if you just want to make simple GET requests.
$html = file_get_contents('https://www.google.com/');

Related

cURL follow redirect causing 500 server error

I'm trying to replicate the functionality of a PHP 'header' redirect using cURL so that it works within a cronjob triggered script.
The following code runs within a while loop (which I know works) but it returns a 500 internal server error when I try to run it, as though the request is timing out on the server (it should not take any more than 30 seconds).
Is there something wrong with the way I'm doing this:
<?php
$url = 'http://exampleurl/?ld_LeadProductType=PROF&ld_contactFirstName='.rawurlencode($quotes['name']).'&ld_businessName='.rawurlencode($quotes['company_name']).'&QuotePrice=%20/%20Quote%20Price:'.number_format($quotes['quote'],2).'.';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0);
curl_setopt($ch, CURLOPT_TIMEOUT, 128);
$html = curl_exec($ch);
$redirectURL = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );
curl_close($ch);
?>

Getting string instead of xml response in curl

I have the following curl request
$url='http://test/paynetz/epi/fts?login=160&pass=Test#123&ttype=NBFundTransfer&prodid=NSE&amt=50&txncurr=INR&txnscamt=0&clientcode=TkFWSU4%3d&txnid='.urlencode($string).'&date='.urlencode($date).'&custacc=1234567890&udf1=ajeesh&udf2=sam#zz.com&udf3=940000000&udf4=arrackaparmabilhouse&ru=http://www.zwitch.co';
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_HEADER, 0);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
echo $auth = curl_exec($curl);
Im getting this
http://test/paynetz/epi/ftsNBFundTransfer267050dHwIMJR%2FucGOZcnocTnwvISAVaeNZK93Y8veI%2Bb1DtY%3D11
Instead of an xml.Im getting the values only not the xml.
I had 505 error inthe response,so I used urlencode($string) instead of $string
Have you tried adding curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)'); This will confirm that you are a human rather than a bot.
If you are trying to output the XML directly onto a web page, you'll might want to lookup htmlentities().

Can't get url using JSON script parsed with file_get_contents

I have this link I want to parse some information in it or just save it in a file...
can't do it without this simple code:
Example:
<?php
$myFile = 'test.txt';
$get= file_get_contents("http://www.ticketmaster.com/json/resale?command=get_resale_listings&event_id=0C004B290BF2D95F");
file_put_contents($myFile, $get); ?>
The output is:
{"version":1.1,"error":{"invalid":{"cookies":true}},"command":"get_resale_listings"}
I tried many other things like fopen or include did not work either. I don't understand because when I put the url in the browser it shows exactly ALL the code (google chrome) OR even better ask me to save it as a file (explorer). Looks like a browser cookies or something that doesn't load on my localhost ??
thanks for your tips.
You need to access that url with CURL.
The server checks if the client has cookies enabled. Using file_get_content() You do not send any information about client (browser).
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.ticketmaster.com/json/resale?command=get_resale_listings&event_id=0C004B290BF2D95F');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_exec($ch);

Curl in PHP retrieve HTML into variable not working

My code snippet (this comes after previous Curl commands to log-in and store cookies):
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL,"https://www.example.com");
curl_setopt($curl,CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSLVERSION, 3);
curl_setopt($curl, CURLOPT_HEADER, 1);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729)");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curl, CURLOPT_VERBOSE, 1);
curl_setopt($curl, CURLOPT_COOKIEFILE, 'cookie.txt');
$result = curl_exec ($curl);
if ($result === FALSE) {
echo "cURL Error: " . curl_error($curl);
}
curl_close ($curl);
print 'result: '.$result;
The displayed result just prints headers. In other words, the actual source HTML does not appear to be saved to the $result variable. I know the results are there because when I view source on the retrieved page - everything is there. If it makes a difference the page I'm requesting is XML.
Thank you
Replace print 'result: '.$result;
with
echo htmlspecialchars($result);
If you are looking to display the <tags>, you need to do like this.
You've set this variable:
curl_setopt($curl, CURLOPT_HEADER, 1);
This is why you're getting headers.
You're printing the result to the browser and the browser sees XML tags and does XML stuff with them. The data is there because you see it when you View->Source. If you remove the headers your browser may complain about a missing stylesheet and display the raw XML. But as you've already proved to yourself, you have the data.
To elaborate on #rand'Chris's answer, you can remove both:
curl_setopt($curl, CURLOPT_HEADER, 1);
and
curl_setopt($curl, CURLOPT_VERBOSE, 1);
And you will no longer receive headers from example.com. Of course, since you're accessing a different site, this may not be the complete (or even correct) solution.

PHP Curl does not return a values from url

i am using above function to retrieve the two urls , second one works when i am retriving google or other websites, but below things is not getting any response. but when i enter the url in browser i am seeing the response.. can you guide me to fix this issue?
function file_get_contents_curl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$response=file_get_contents_curl('http://XX.XX.XX.XX:YYYYY/Request?Mob=999999999&Message=Rr+vodafone+999999999+10+1234&myTxId=10');
echo "<br/>respnse is...".$response;
$response=file_get_contents_curl('http://www.google.com');
echo "<br/>website respond is...".$response;
Edit:
Issue is with Port number i am trying to access 8090 is not enabled in my server. when i enabled it for outgoing it was working... Thanks every one for the support.
Can you try:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:12.0) Gecko/20100101 Firefox/12.0');
Try to find error give this code before curl close:--
echo "Curl Error :--" . curl_error($ch);
if no error found do like this:-
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
then
print_r($result);
exit;

Categories