file_get_contents is not working for some url - php

I use file_get_contents in PHP. In the code below, the first URL works fine but the second one doesn't.
$URL = "http://test6473.blogspot.com";
$domain = file_get_contents($URL);
print_r($domain);
$add_url= "http://adfoc.us/1575051";
$add_domain = file_get_contents($add_url);
echo $add_domain;
Any suggestions on why the second one doesn't work?

file_get_contents cannot retrieve the second URL because that server checks whether the request comes from a browser or from a script. If it detects a script, it returns the page without its contents.
So I had to make the request look like a browser request, and I used the following code to get the second URL's contents. It may need adapting for other web servers, because they may run different checks.
Still, give the following code a try; with luck it will work for you too.
function getUrlContent($url) {
// Browser-like request headers. cURL builds the request line and the
// Host header from $url, so they are not listed here.
$header = array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.8',
'Cache-Control: max-age=0',
'Connection: keep-alive',
'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
// cURL writes cookies.txt itself when the handle is closed, so the file
// does not need to be created with fopen() first.
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
$url = "http://adfoc.us/1575051";
$html = getUrlContent($url);
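If you would rather stay with file_get_contents, a browser-like User-Agent can also be sent through a stream context. The following is only a sketch: the helper name browserLikeContext(), the header values, and the 15-second timeout are assumptions for illustration, and there is no guarantee that this particular site accepts the request.

```php
<?php
// Sketch: build a stream context that sends browser-like headers.
// browserLikeContext() is a hypothetical helper, not part of PHP.
function browserLikeContext()
{
    $options = array(
        'http' => array(
            'method'     => 'GET',
            'header'     => "Accept: text/html,application/xhtml+xml\r\n" .
                            "Accept-Language: en-US,en;q=0.8\r\n",
            'user_agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
            'timeout'    => 15, // seconds; illustrative value
        ),
    );
    return stream_context_create($options);
}

// Usage (performs a network request):
// $html = file_get_contents('http://adfoc.us/1575051', false, browserLikeContext());
```

If the server also depends on cookies, the cURL version above is probably the better fit, since the stream wrapper does not keep a cookie jar between requests.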
Thanks everyone for the guidance.

Unfortunately it looks like the second site blocks access from unrecognized browsers. Even using curl from the command line doesn't work:
curl -I http://adfoc.us/1575051
gives:
HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Fri, 28 Jun 2013 12:15:40 GMT
Content-Type: text/html
Connection: keep-alive
X-Powered-By: PHP/5.5.0
Set-Cookie: __cfduid=d7cd1bf18c136a288cc2b36065a3b31f01372421740; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.adfoc.us
CF-RAY: 85a4dc6829e06d0
but no content. (Keep in mind that -I sends a HEAD request, so only headers are shown either way; run the command without -I to check the body.) Note that it returns status 200, so if you test the returned string with === false to see whether the request failed, it will look as if it worked.
If you need to spoof the User-Agent (and possibly other headers) to get the site to accept your request, you'll have to use the cURL library and try different combinations until one works. Experimenting with the curl command line first is a good way to cut down the development time.
Here's someone who has been through this before:
php curl: how can i emulate a get request exactly like a web browser?
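Because a 200 response with an empty body passes a plain === false test, it can help to check the body as well. The sketch below uses two hypothetical helpers, hasUsableBody() and lastStatusCode(); the second one reads status lines in the format PHP places in $http_response_header after a file_get_contents call over HTTP.

```php
<?php
// Hypothetical helper: false (transport failure) and an empty or
// whitespace-only body both count as "no usable content".
function hasUsableBody($result)
{
    return is_string($result) && trim($result) !== '';
}

// Hypothetical helper: return the status code of the last response in a
// list of header lines (redirects add one status line per hop), or null.
function lastStatusCode(array $headers)
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage (performs a network request):
// $body = file_get_contents($url);
// if (!hasUsableBody($body)) {
//     echo 'No content, status: ' . lastStatusCode($http_response_header);
// }
```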

It looks like the second URL sometimes responds too slowly, and it may also redirect.
Try cURL with a larger timeout.
Also, turn error reporting on:
error_reporting(-1);
ini_set('display_errors','On');
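A cURL request with explicit timeouts and redirect handling might look like the sketch below. The function name fetchWithTimeout() and the 10 and 30 second limits are assumptions chosen for illustration; adjust them to how slow the site actually is.

```php
<?php
// Sketch: fetch a URL with explicit timeouts and report cURL-level errors.
// fetchWithTimeout() is a hypothetical helper; it requires the curl extension.
function fetchWithTimeout($url, $connectTimeout = 10, $totalTimeout = 30)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,            // follow 301/302 redirects
        CURLOPT_MAXREDIRS      => 5,               // but not endlessly
        CURLOPT_CONNECTTIMEOUT => $connectTimeout, // seconds to establish the connection
        CURLOPT_TIMEOUT        => $totalTimeout,   // seconds for the whole transfer
    ));
    $body  = curl_exec($ch);
    $errno = curl_errno($ch);
    $error = curl_error($ch);
    curl_close($ch);

    return array(
        'body'  => $body,  // false when the transfer failed
        'errno' => $errno, // 0 means no cURL-level error
        'error' => $error,
    );
}

// Usage (performs a network request):
// $r = fetchWithTimeout('http://adfoc.us/1575051');
// if ($r['errno'] !== 0) { echo 'cURL error: ' . $r['error']; }
```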

You can also try this code:
<?php
function getUrlContent($url) {
// Browser-like request headers. cURL builds the request line and the
// Host header from $url, so they are not listed here.
$header = array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.8',
'Cache-Control: max-age=0',
'Connection: keep-alive',
'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36',
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
$url = "https://news.google.com/rss/search?q=apple&hl=en-IN&gl=IN&ceid=IN:en";
$html = getUrlContent($url);
$xml = simplexml_load_string($html);
$json = json_encode($xml);
$array = json_decode($json,TRUE);
print_r($array);
?>

Related

PHP Setting custom header starting with ':'

I need to set some custom headers that start with ":".
$option['headers'][] = ":authority: example.com"; //<-- Here is the problem
$option['headers'][] = "accept-encoding: gzip, deflate, br";
$option['post'] = json_encode(array("Domain"=>"example.com"));
$url = "https://www.google.com";
$ch = curl_init($url);
curl_setopt($ch,CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.62 Safari/537.36");
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_COOKIEFILE,"file.cookie");
curl_setopt($ch,CURLOPT_COOKIEJAR,"file.cookie");
curl_setopt($ch,CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch,CURLOPT_VERBOSE, true);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $option['post']);
curl_setopt($ch, CURLOPT_HTTPHEADER, $option['headers']);
$getdata = curl_exec($ch);
I tried replacing the ":" with chr(58), but the problem is the same. I get error 55, and the log shows "* Failed sending HTTP POST request". If I comment out the first line it works, but I really need that header. I'm stuck here. Any solutions?
:authority: looks like an HTTP/2 pseudo-header, and you can't set those like this with curl. curl will, however, send it itself, using the same content it would set for Host:, so that it works the same way regardless of which HTTP version is eventually used (it also works with HTTP/3).
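In practice that means passing the value as a regular Host header and dropping any other pseudo-headers from the list. A minimal sketch, assuming a hypothetical helper named replacePseudoHeaders():

```php
<?php
// Hypothetical helper: turn ":authority: x" into "Host: x" and drop any
// other header line that starts with ":" (cURL generates those itself).
function replacePseudoHeaders(array $headers)
{
    $out = array();
    foreach ($headers as $line) {
        if (stripos($line, ':authority:') === 0) {
            $out[] = 'Host: ' . trim(substr($line, strlen(':authority:')));
        } elseif (strpos($line, ':') === 0) {
            continue; // e.g. :method, :path, :scheme
        } else {
            $out[] = $line;
        }
    }
    return $out;
}

// Usage:
// curl_setopt($ch, CURLOPT_HTTPHEADER, replacePseudoHeaders($option['headers']));
```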

cURL returns a relative "Location:" url, how to get the full URL?

I'm using cURL in order to bypass redirections (301, 302) and special URLs like adf.ly and such. I have a working code but I couldn't fix this specific problem:
When I'm using cURL to follow the location parameter, on some websites, the last redirection may have a relative url in its Location: result.
To make it clear, I added such redirection in my own website. I'm using
header("Location: /maintenance"); exit();
in my PHP code in order to demonstrate it. I made this quick cURL request:
$url = "http://deci.deals/post/99/comment/427";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1712.4 Safari/537.36');
curl_setopt($ch, CURLOPT_HEADER, 1);
$data = curl_exec( $ch );
curl_close($ch);
echo("<br/>CURRENT_URL: ".$url);
if ( preg_match_all( '/Location:\s*(.+)\s*$/im', $data, $matches, PREG_SET_ORDER ) ) {
$last = array_pop( $matches );
echo "<br/>FOUND: ".$last[1];
}
The output:
CURRENT_URL: http://deci.deals/post/99/comment/427
FOUND: /maintenance
As you can see, the Location value is a relative URL. Is there any way to get the absolute URL instead of this?
Thanks!
From this Wikipedia page:
Relative URLs are URLs that do not include a scheme or a host. In
order to be understood they must be combined with the URL of the
original request.
Client request for http://www.example.com/blog:
GET /blog HTTP/1.1
Host: www.example.com
Server response:
HTTP/1.1 302 Found
Location: /articles/
The URL of the location is expanded by the client to
http://www.example.com/articles/
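In PHP, the same expansion can be sketched as below. resolveLocation() is a hypothetical helper written for this example; it covers absolute, scheme-relative, root-relative and path-relative values, but does not normalise "." or ".." segments. When CURLOPT_FOLLOWLOCATION is enabled, curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) should also report the last URL that was requested, provided it is called before curl_close().

```php
<?php
// Hypothetical helper: resolve a Location header value against the URL of
// the request that produced it. Simplified; no "." / ".." normalisation.
function resolveLocation($base, $location)
{
    // Already absolute (has a scheme): nothing to do.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $location)) {
        return $location;
    }

    $parts     = parse_url($base);
    $scheme    = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $authority = isset($parts['host']) ? $parts['host'] : '';
    if (isset($parts['port'])) {
        $authority .= ':' . $parts['port'];
    }

    // Scheme-relative: //host/path
    if (strpos($location, '//') === 0) {
        return $scheme . ':' . $location;
    }
    // Root-relative: /path
    if (strpos($location, '/') === 0) {
        return $scheme . '://' . $authority . $location;
    }
    // Path-relative: replace the last segment of the base path.
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $scheme . '://' . $authority . $dir . $location;
}

// Usage:
// echo resolveLocation('http://deci.deals/post/99/comment/427', '/maintenance');
```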

Final Effective URL - PHP (Proxy)

I apologize in advance for my English. I have a small problem.
I want to get the final effective URL for this page:
streamuj.tv/video/00e276bf5841bf77c8de?streamuj=original&authorize=ac13bb77d3d863ca362315b9b4dcdf3e
When I open the link in a browser, it takes me to an .flv file.
But when I request it through PHP, I get s3.streamuj.tv/unauthorized.flv
When I try it through getlinkinfo.com/info?link=http%3A%2F%2Fwww.streamuj.tv%2Fvideo%2F00e276bf5841bf77c8de%3Fstreamuj%3Doriginal%26authorize%3Dac13bb77d3d863ca362315b9b4dcdf3e&x=49&y=11
everything is fine, and it reports:
s4.streamuj.tv:8080/vid/d0fe77e1020b6414a16aa5316c759add/58aaf1dd/00e276bf5841bf77c8de_hd.flv?start=0
My PHP CODE:
<?php
session_start();
include "simple_html_dom.php";
$proxy = array("189.3.93.114:8080");
$proxyNum = 0;
$proxy = explode(':', $proxy[$proxyNum]);
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'http://www.streamuj.tv/video/00e276bf5841bf77c8de?streamuj=original&authorize=ac13bb77d3d863ca362315b9b4dcdf3e');
curl_setopt($curl, CURLOPT_FILETIME, true);
curl_setopt($curl, CURLOPT_NOBODY, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HEADER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($curl, CURLOPT_PROXY, $proxy[0]);
curl_setopt($curl, CURLOPT_PROXYPORT, $proxy[1]);
$header = curl_exec($curl);
$info = curl_getinfo($curl);
curl_close($curl);
$u1 = $info['url'];
echo "u1: $u1<br/>";
$u2 = str_replace("flv?start=0","flv",$u1);
echo $u2;
?>
Where is the problem? Why do I get unauthorized.flv?
Solution
Server was checking client legitimacy via user-agent HTTP header parameter.
Using custom user-agent solved the problem.
curl_setopt($curl, CURLOPT_HTTPHEADER, array( 'user-agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2950.0 Iron Safari/537.36' ));
Original post:
Most likely the generated flv URL is not pointing to static place. It
probably uses sessionID + cookie / verifies IP (one of these, or
both).
Without knowing what header you have to request with via CURL, you
probably won't get a relevant response.

Scrape website with javascript using cURL

I'm trying to scrape data from this website:
http://ntthnue.edu.vn/tracuudiem
When I fill in the SBD field with 'TS4740' in the browser, I get the result successfully. However, it doesn't work when I run my code.
Here is my PHP cURL code:
<?php
function getData($id) {
$url = 'http://ntthnue.edu.vn/tracuudiem';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, ['sbd' => $id]);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
echo getData('TS4740');
I just got the old page. Can anybody explain why? Thank you!
Make sure you send all the necessary headers and input data. The server processing this request can run all kinds of checks to see whether it's a "valid" form request, so you need to make the request as close to a regular browser request as possible.
Use tools like Chrome DevTools to see the request and response headers exchanged between your browser and the server, so you understand what your cURL setup should look like. An app like Postman also makes it easy to simulate the request and see what works and what doesn't.
Working example:
<?php
function getData($id) {
$url = 'http://ntthnue.edu.vn/tracuudiem';
$ch = curl_init($url);
$postdata = 'namhoc=2015-2016&kythi_name=Tuy%E1%BB%83n+sinh+v%C3%A0o+l%E1%BB%9Bp+10&hoten=&sbd='.$id.'&btnSearch=T%C3%ACm+ki%E1%BA%BFm';
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Origin: http://ntthnue.edu.vn',
'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36',
'Content-Type: application/x-www-form-urlencoded',
'Referer: http://ntthnue.edu.vn/tracuudiem',
));
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
echo getData('TS4740');
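The hand-encoded $postdata string in the working example can also be built from an array with http_build_query, which is easier to read and edit. This is a sketch: buildSearchBody() is a hypothetical helper, and the Vietnamese field values are simply the decoded form of the percent-encoded string above. One difference from the question's code is worth knowing: when CURLOPT_POSTFIELDS is given an array, cURL sends multipart/form-data, whereas a string is sent as application/x-www-form-urlencoded.

```php
<?php
// Sketch: build the form body from an array instead of encoding it by hand.
// buildSearchBody() is a hypothetical helper; field names and values are
// taken from the working example.
function buildSearchBody($id)
{
    $fields = array(
        'namhoc'     => '2015-2016',
        'kythi_name' => 'Tuyển sinh vào lớp 10',
        'hoten'      => '',
        'sbd'        => $id,
        'btnSearch'  => 'Tìm kiếm',
    );
    // By default spaces are encoded as "+", as in a browser form post.
    return http_build_query($fields);
}

// Usage:
// curl_setopt($ch, CURLOPT_POSTFIELDS, buildSearchBody($id));
```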

php curl super long url malformed

I have a long URL stored in a variable, $mp4, and I'm trying to download it with curl, but I get a "malformed" error. Please help me if you can. Thank you in advance!
The below is what I have in my php script:
exec("curl -o $fnctid.mp4 \"$mp4\"");
Error message:
curl: (3) <url> malformed
Sample url to test download:
http://f26.stream.nixcdn.com/6f4df1d8c248cf149b846c24d32f1c35/514e0209/PreNCT5/22-TaylorSwift-2426783.mp4
The URL currently returns 408 Request Timeout. Once that is fixed, you can use this simple code:
$url = 'http://f26.stream.nixcdn.com/6f4df1d8c248cf149b846c24d32f1c35/514e0209/PreNCT5/22-TaylorSwift-2426783.mp4';
$useragent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2';
$file = __DIR__ . DIRECTORY_SEPARATOR . basename($url);
$fp = fopen($file, 'w+');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_TIMEOUT, 320);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
echo curl_exec($ch);
var_dump(curl_getinfo($ch)); // return request information
curl_close($ch);
fclose($fp);
Another suggestion is to use urlencode:
$url = urlencode($url);
Note that urlencode is meant for encoding a string used in the query part of a URL, as a convenient way to pass variables. Applied to a whole URL it also encodes ":" and "/", so use it only on the individual query values.
Hope this helps.
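If you want to keep calling the curl binary through exec, quoting each argument with escapeshellarg is safer than embedding the variables in a double-quoted string. The sketch below assumes a hypothetical helper buildCurlCommand(). It also trims the URL, on the guess that stray whitespace or a trailing newline in $mp4 could be one cause of the "malformed" error; the actual value of $mp4 is not shown in the question, so treat that as an assumption.

```php
<?php
// Hypothetical helper: build the curl command with every argument quoted
// for the shell. trim() removes whitespace and newlines around the URL.
function buildCurlCommand($url, $outputFile)
{
    return 'curl -o ' . escapeshellarg($outputFile) . ' ' . escapeshellarg(trim($url));
}

// Usage (runs an external command):
// exec(buildCurlCommand($mp4, $fnctid . '.mp4'), $output, $exitCode);
// if ($exitCode !== 0) { echo "curl failed with exit code $exitCode"; }
```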
