I would like to scrape the content of this page, http://whostreams.net/embed/gryr4u074z82x, using cURL.
I've tried setting different user agents and various other options, but I just can't seem to get the content of that page: I often get redirected, or I get a "page moved" error.
I believe it has something to do with the query string being encoded somewhere, but I'm really not sure how to get around that.
$url = 'http://whostreams.net/embed/gryr4u074z82x';
$curl_handle=curl_init();
curl_setopt($curl_handle, CURLOPT_REFERER, 'http://www.fel3arda.com/2018/09/denmark-vs-wales.html');
curl_setopt($curl_handle, CURLOPT_HTTPHEADER, array('Host: whostreams.net'));
curl_setopt($curl_handle, CURLOPT_URL,$url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36');
$query = curl_exec($curl_handle);
curl_close($curl_handle);
echo ($query) ;
What do I need to do to get my PHP code to show the exact content of the page?
curl_exec() needs to be called before curl_close(),
because curl_close() terminates the cURL session and releases its resources. The handle $curl_handle is destroyed as well.
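A minimal sketch of that order, assuming you also want cURL to follow the redirects mentioned in the question: run the request, read any error and the final status while the handle is still open, and only then close it. fetch_page() is a hypothetical name, the user-agent string is just an example, and the sketch deliberately leaves the Host header to cURL, which derives it from the URL.

```php
<?php
/**
 * Hypothetical helper: fetch a URL, following redirects, and report what happened.
 * Everything that reads from the handle runs before curl_close().
 */
function fetch_page(string $url, ?string $referer = null): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow Location headers on 3xx ("moved") responses
        CURLOPT_MAXREDIRS      => 5,    // ...but give up after a few hops
        CURLOPT_CONNECTTIMEOUT => 10,   // seconds
        // Example user agent (assumption): any browser-like string.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    ]);
    if ($referer !== null) {
        curl_setopt($ch, CURLOPT_REFERER, $referer);
    }

    $body   = curl_exec($ch);                            // 1. run the request
    $error  = curl_error($ch);                           // 2. read errors...
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);     //    ...the final status...
    $final  = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); //    ...and the URL after redirects
    curl_close($ch);                                     // 3. only now release the handle

    return [
        'body'   => $body === false ? null : $body,
        'status' => $status,
        'url'    => $final,
        'error'  => $error,
    ];
}

// Usage:
// $page = fetch_page('http://whostreams.net/embed/gryr4u074z82x',
//                    'http://www.fel3arda.com/2018/09/denmark-vs-wales.html');
// echo $page['error'] !== '' ? $page['error'] : $page['body'];
```

Comparing $page['url'] with the requested URL tells you whether a redirect actually happened.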
The code you posted works for me; I just added a <?php opening tag to it.
<?php
$url = 'http://whostreams.net/embed/gryr4u074z82x';
$curl_handle=curl_init();
curl_setopt($curl_handle, CURLOPT_REFERER, 'http://www.fel3arda.com/2018/09/denmark-vs-wales.html');
curl_setopt($curl_handle, CURLOPT_HTTPHEADER, array('Host: whostreams.net'));
curl_setopt($curl_handle, CURLOPT_URL,$url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36');
$query = curl_exec($curl_handle);
curl_close($curl_handle);
echo ($query) ;
I do indeed get the "CLICK HERE TO UNMUTE / STREAM IS OFFLINE / Retrying in seconds" page, plus the heavily obfuscated JavaScript used to start streaming the video from wss://ws.peer5.com.
You say "I just can't seem to get the content of that page". Well, what content are you getting, and what did you expect to get instead? Here is roughly what both my Google Chrome web browser and curl are getting:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1" />
<meta name="viewport" content="width=device-width; initial-scale=1.0">
<script>if(window==window.top) document.location="/"</script>
<link rel="stylesheet" href="/css/embed.min.css?v=0.1" />
<!-- Tssp-->
<!-- PopAds.net Popunder Code for whostreams.net | 2018-09-09,2437207,0,0 -->
<script type="text/javascript" data-cfasync="false">
/*<![CDATA[/* */
/* Generated 2018-09-09 16:26:49 for "PopAds%20CGAPIL%20A", len 1367 */
(function(){ var p=window;p["\x5f\x70\x6fp"]=[["\u0073i\x74e\u0049\x64",2437207],["\u006d\x69\x6e\u0042i\x64",0],["\x70\x6f\u0070un\x64er\x73Pe\x72\x49\x50",0],["\x64\x65\u006c\u0061y\u0042e\x74\u0077een",0],["\u0064\x65\u0066\u0061u\u006ct",false],["\x64\u0065fau\x6c\x74P\x65\u0072\x44a\u0079",0],["\u0074o\u0070\x6dos\x74\x4cay\x65\x72",!1]];var l=["/\x2fc\u0031\x2ep\x6f\u0070a\u0064s\u002en\x65\u0074\u002f\x70o\x70\u002e\u006a\u0073","/\u002f\x63\u0032.p\x6fpa\x64\u0073.n\x65t/\x70\u006fp\u002ej\x73","//w\x77\x77.\x6b\u0061\u006f\x6ariv\u006d\u0068\x79s\x2ec\u006f\x6d\u002f\u0062p\x2ejs","/\x2fww\x77.\x74djo\x61\x6f\x73\u0069\u0062\x65\x73\u002e\x63om\x2f\x78\u002ejs",""],w=0,x,a=function(){if(""==l[w])return;x=p["\u0064\x6f\u0063\u0075\u006de\u006e\u0074"]["\x63\u0072e\x61\x74\u0065\u0045le\u006d\x65n\x74"]("\x73cr\u0069\x70\x74");x["\x74\x79\x70\u0065"]="te\x78\x74\u002f\u006a\x61v\x61\u0073\u0063\x72\x69p\u0074";x["\x61\x73\u0079\u006ec"]=!0;var s=p["\x64\x6fcu\u006de\x6et"]["g\u0065\u0074Ele\x6d\x65n\x74\x73\x42\u0079\x54\x61\x67\u004ea\x6d\x65"]("\x73\u0063r\u0069\u0070\u0074")[0];x["\x73\x72c"]=l[w];if(w<2){x["\u0063ro\u0073\x73Or\u0069g\x69\u006e"]="\x61\x6eo\u006e\x79mo\x75s";};x["\u006f\u006ee\x72\x72\u006f\u0072"]=function(){w++;a()};s["p\x61\x72\u0065n\u0074\u004e\u006f\x64\u0065"]["\u0069nse\x72\x74\x42\x65\x66ore"](x,s)};a()})();
/*]]>/* */
</script>
</head>
<body>
<div class="jwplayer jw-reset jw-skin-glow" id="player"></div>
<div id="btn-unmute" onclick="WSUnmute()">CLICK HERE TO UNMUTE</div>
<div class="tb stream-offline" >
<div class="tb-col">
<img src="/imgs/logo.png" />
<h2>STREAM IS OFFLINE</h2>
<p>Retrying in <span class="counter"></span> seconds</p>
</div>
</div>
<script src="/js/jquery.min.js"></script>
<script>var WSreloadCounter,WSnTries=0,videoStarted = false, startMuted = startMuted();function errorPlaying(){$(".stream-offline .counter").text(10);$(".stream-offline").css("display","table");WSreloadCounter=setInterval(function(){var a=$(".stream-offline .counter").text();if(a>1){a--;$(".stream-offline .counter").text(a)}else{ clearInterval(WSreloadCounter);WSnTries++;if(WSnTries<10){WSreloadStream();}else{ window.location.reload() } }},1000)}function startMuted(){var d=/constructor/i.test(window.HTMLElement)||(function(a){return a.toString()==="[object SafariRemoteNotification]"})(!window.safari||(typeof safari!=="undefined"&&safari.pushNotification));if(d){return true}var c=!!window.chrome&&!!window.chrome.webstore;if(c&&getChromeVersion()>=66){return true}return false}function getChromeVersion(){var a=navigator.userAgent.match(/Chrom(e|ium)\/([0-9]+)\./);return a?parseInt(a[2],10):false};</script>
<script src="//api.peer5.com/peer5.js?id=5yaksk6z3h8drz14s022"></script><script src="//api.peer5.com/peer5.clappr.plugin.js"></script>
<script src="/players/clappr/clappr.min.js?v=0.22"></script>
<script>eval(function(p,a,c,k,e,d){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--){d[e(c)]=k[c]||e(c)}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('6 3;$(4).J(2(){3=C D.E({K:"L://Y.l.k:10/V/H.17?s=-Z&e=W",X:"#3",11:"r%",12:"r%",14:q,13:q,U:"M",I:"",N:"1",O:"",T:{S:2(e){R()},P:2(e){5(2(){$(".9-B").G()},Q);16(!p){p=8;5(2(){6 h=4.o("t")[0],s=4.u("x");s.w("n-v","y.b");s.f="z/d";s.m=8;s.g="//1o.b/1n/18/1l/1q.j";h.i(s)},F);5(2(){6 h=4.o("t")[0],s=4.u("x");s.w("n-v","y.b");s.f="z/d";s.m=8;s.g="//l.k/1i/1h.j";h.i(s)},1g);5(2(){$.1m("",{"1f":"H","a":"A"})},F)}},1e:2(e){$(".9-B").1d()},19:2(e){$("#1a-c").G()},}})});2 1b(){$(".9-1k").1j("1p","1s");6 7=3.1r(3);7=C D.E(7.1t);3.1c();3=7;3.A();3.c()}2 15(){3.c()}',62,92,'||function|player|document|setTimeout|var|newplayer|true|stream||com|unmute|javascript||type|src||appendChild|js|net|whostreams|async|data|getElementsByTagName|videoStarted|false|100||head|createElement|domain|setAttribute|script|aeckcjy|text|play|logo|new|Clappr|Player|15000|fadeOut|gryr4u074z82x|watermark|ready|source|http|bestfit|position|watermarkLink|onPlay|1000|errorPlaying|onError|events|stretching|hls|1536534286|parent|cdn|Xj60CxQUPZV0M5RAeKbFA|8080|width|height|mute|autoPlay|WSUnmute|if|m3u8|d1|onVolumeUpdate|btn|WSreloadStream|destroy|fadeIn|onPause|ref|120000|adcash|pops|css|offline|fa|post|d4|wdaxvjr9dc|display|d4d1faecf77b3799e550953764a305da|configure|none|options'.split('|'),0,{}))
</script><!--Amung / Analytics -->
<div style="display:none;"><img name="viewers" src="//whos.amung.us/cwidget/whostreams/000000ffffff.png"></div>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=UA-112185528-1"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'UA-112185528-1');
</script>
</body>
</html>
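One way to see why curl cannot return more than this: the video is started by the external scripts the page loads (peer5, Clappr), and curl only downloads that markup without running any JavaScript. A rough sketch that lists those scripts, assuming you already have the HTML in a string such as $query; list_script_sources() is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: return the src of every <script> element in an HTML string.
 * curl only downloads this markup; it never executes these scripts.
 */
function list_script_sources(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('script') as $script) {
        $src = $script->getAttribute('src'); // '' for inline scripts
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

// Usage with the output of curl_exec():
// print_r(list_script_sources($query));
```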
I am making a home automation project with Arduino, and I am using Teleduino to remotely control an LED as a test. I want to take the contents of this link and display them on a PHP page.
<!DOCTYPE html>
<html>
<body>
<?php
include 'simple_html_dom.php';
echo file_get_html('http://us01.proxy.teleduino.org/api/1.0/2560.php?k=202A57E66167ADBDC55A931D3144BE37&r=definePinMode&pin=7&mode=1');
?>
</body>
</html>
The problem is that the function does not return anything.
Is something wrong with my code?
Is there any other function I can use to send a request to a page and get that page in return?
I think you would normally use file_get_contents(), but the server seems to protect its data against scraping, so cURL is a better solution here:
<?php
// echo file_get_contents('http://us01.proxy.teleduino.org/api/1.0/2560.php?k=202A57E66167ADBDC55A931D3144BE37&r=definePinMode&pin=7&mode=1');
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "http://us01.proxy.teleduino.org/api/1.0/2560.php?k=202A57E66167ADBDC55A931D3144BE37&r=definePinMode&pin=7&mode=1");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
// $output contains the output string
$output = curl_exec($ch);
echo $output;
// close curl resource to free up system resources
curl_close($ch);
?>
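Instead of hand-writing the query string, http_build_query() can assemble it from an array and URL-encode each value. A minimal sketch; teleduino_url() is a hypothetical helper, and the parameter values are simply the ones from the question's URL.

```php
<?php
/**
 * Hypothetical helper: build a Teleduino API URL from named parameters.
 * http_build_query() joins them with & and URL-encodes each value.
 */
function teleduino_url(string $base, array $params): string
{
    return $base . '?' . http_build_query($params);
}

$url = teleduino_url('http://us01.proxy.teleduino.org/api/1.0/2560.php', [
    'k'    => '202A57E66167ADBDC55A931D3144BE37', // API key from the question
    'r'    => 'definePinMode',                    // request name
    'pin'  => 7,
    'mode' => 1,
]);
// $url can now be passed to CURLOPT_URL as in the answer above.
```

Changing the pin or mode is then a matter of changing array values rather than editing a long string.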
I have been able to parse actual .json files, but I can't seem to parse this link:
http://forecast.weather.gov/MapClick.php?lat=36.321903791028205&lon=-96.80576767853478&FcstType=json
I think it's because the link itself is not a .json file but a URL that returns JSON, and I am having trouble parsing it, even when I start with:
<?php
$url = "http://forecast.weather.gov/MapClick.php?lat=36.321903791028205&lon=-96.80576767853478&FcstType=json";
$json = file_get_contents($url);
$json_a = json_decode($json,true);
// <---------- Current Conditions ----------> //
//Display Location
$location_full = $json_a['location']['areaDescription'];
?>
And on the page where I want to display this information, I have:
<?php
require 'req/weatherinfo.php';
?>
<!DOCTYPE html>
<html>
<head>
<title>PawneeTV Weather</title>
</head>
<body>
<?php echo $location_full; ?><p>
</body>
</html>
Any ideas why it's generating a blank page? I have cleared the errors; now it just doesn't display anything. I've done this many times with a .json file source. It works with this source, http://api.wunderground.com/api/43279e1c0b065c2e/forecast/q/OK/Pawnee.json, but not with a link that ends with =json instead of .json.
You cannot use a plain file_get_contents() call in this case. You can read more about why here.
This code works:
<?php
$url = "http://forecast.weather.gov/MapClick.php?lat=36.321903791028205&lon=-96.80576767853478&FcstType=json";
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $url);
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
$json_a = json_decode($output,true);
// <---------- Current Conditions ----------> //
//Display Location
$location_full = $json_a['location']['areaDescription'];
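The working cURL version differs from the question's code mainly in that it sends a User-Agent header. Assuming that header is what the server requires, a stream context should let file_get_contents() send one too. The sketch below also checks that the response really is JSON before indexing into it, so a rejected request shows a message instead of a blank page. Both function names and the user-agent string are hypothetical.

```php
<?php
/**
 * Hypothetical helper: fetch a URL with file_get_contents(), sending a User-Agent
 * header through a stream context (assumption: the server rejects requests without one).
 */
function fetch_with_user_agent(string $url, string $userAgent): ?string
{
    $context = stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 15, // seconds
        ],
    ]);
    $body = @file_get_contents($url, false, $context); // false on 403, timeout, etc.
    return $body === false ? null : $body;
}

/**
 * Hypothetical helper: decode the forecast JSON and return the location name,
 * or null if the body is missing, is not valid JSON, or lacks the key.
 */
function area_description(?string $json): ?string
{
    if ($json === null) {
        return null;
    }
    $data = json_decode($json, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return null; // e.g. an HTML "Access Denied" page instead of JSON
    }
    return $data['location']['areaDescription'] ?? null;
}

// Usage:
// $json = fetch_with_user_agent($url, 'PawneeTV Weather (you@example.com)');
// $location_full = area_description($json) ?? 'Weather data unavailable';
```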
I have to put this page, http://www.tvindiretta.com/m/, in an iframe. The page is powered by cURL; here is its content. When I try to put this URL, http://www.tvindiretta.com/m/index.php, in an iframe (with the <iframe> tag), the browser redirects to the iframe URL. How can I keep this page inside the iframe? I have to change the user agent. I'm a complete noob in cURL, so please help me. Here is the source code of the /m/index.php page:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.tvindiretta.com/");
curl_setopt($ch, CURLOPT_MAXREDIRS, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page instead of printing it directly
curl_setopt($ch, CURLOPT_HTTPHEADER, array('User-Agent: Mozilla/5.0 (iPhone; U; CPU iPhone OS 2_2_1 like Mac OS X; en-us) AppleWebKit/525.18.1 (KHTML, like Gecko) Version/3.1.1 Mobile/5H11 Safari/525.20'));
$result = curl_exec($ch); // run the request once
curl_close($ch);          // close the handle once
print $result;
?>
I don't think this page redirects based on the user agent, because the following code sends no custom user agent at all:
<?php
if (isset($_GET['get'])){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.tvindiretta.com/m");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch); // a single request is enough
curl_close ($ch);
print $result;
}
else{
?>
<!DOCTYPE HTML>
<html>
<head>
<meta charset="utf-8">
</head>
<body>
<iframe src="test.php?get" style="position:absolute; top:100px; left:100px; width:400px; height:400px;"></iframe>
</body>
</html>
<?php } ?>
It seems to break the page layout, but it still gives me the mobile content.
So I guess the real problem here is the JavaScript code inside that page.
In HTML5 there is a new iframe attribute, "sandbox", which lets you restrict what the iframe's content is allowed to do.
Unfortunately, at the time of writing this seems to be supported only by Chrome and Safari.
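The sandbox idea could be generated from PHP roughly like this. It is a sketch, assuming the redirect comes from a script that navigates the top-level window: a sandboxed frame may not do that unless "allow-top-navigation" is listed. sandboxed_iframe() is a hypothetical name, and the allowed tokens are an assumption to adjust. Avoid combining "allow-scripts" with "allow-same-origin" when the framed page is served from your own domain (as test.php?get is), since that combination lets the framed page remove its own sandbox.

```php
<?php
/**
 * Hypothetical helper: render an iframe whose content may run scripts but may not
 * navigate the top-level window, because "allow-top-navigation" is not granted.
 */
function sandboxed_iframe(string $src, string $allow = 'allow-scripts'): string
{
    return sprintf(
        '<iframe src="%s" width="%d" height="%d" sandbox="%s"></iframe>',
        htmlspecialchars($src, ENT_QUOTES), // escape the URL for use in an attribute
        400,
        400,
        htmlspecialchars($allow, ENT_QUOTES)
    );
}

// Usage, in place of the plain <iframe> in the answer above:
// echo sandboxed_iframe('test.php?get');
```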
One idea here could be to scrape the content of the web page (with DOMDocument in PHP, for instance), keep only the content you are interested in, and try to reproduce their style. It is easier said than done, but I can't see a cleaner way to do it.
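A rough sketch of that scraping idea, assuming you only need the page's visible markup: parse the fetched HTML with DOMDocument, drop every <script> element so nothing can redirect, and keep the body. body_without_scripts() is a hypothetical name; in practice you would narrow it further to the element that holds the listings, and you would still have to restyle the result yourself.

```php
<?php
/**
 * Hypothetical helper: return the inner HTML of <body> with every <script> removed,
 * so the copied markup can no longer redirect the page.
 */
function body_without_scripts(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // scraped HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Copy the live node list to an array first; removing nodes while
    // iterating a live DOMNodeList would skip elements.
    foreach (iterator_to_array($body->getElementsByTagName('script')) as $script) {
        $script->parentNode->removeChild($script);
    }

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with the proxied page from the answer above:
// print body_without_scripts($result);
```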
Since it seems you are interested in getting a TV program, you could look at a dedicated XML scraper such as XMLTV.
I have a web service to which I send an XML request (application/x-www-form-urlencoded) and get a response back. According to its documentation: "These are sent to the URL contained within a query parameter called 'xml'".
When I use a simple HTML form such as the one below, I get a result back. However, when I use my PHP code, I get an error. Perhaps it is because of the sentence quoted above? If so, how do I send the XML in that parameter? I'd be very grateful if someone could point out what I'm doing wrong. Many thanks.
<form method="post" name="form1" action="http://webservicesapi.com/login.pl">
<textarea cols="80" rows="20" name="xml">
<?xml version="1.0"?><request><auth username="hello" password="world" /><method action="login" /></request>
</textarea>
<input type="submit" value="submit XML document">
</form>
This doesn't work:
<?php
// open a http channel, transmit data and return received buffer
function xml_post($xml, $url, $port)
{
$user_agent = $_SERVER['HTTP_USER_AGENT'];
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL, $url); // set url to post to
curl_setopt($ch, CURLOPT_FAILONERROR, 1); // Fail on errors
if (ini_get('open_basedir') == '' && !ini_get('safe_mode')) // original compared 'safe_mode' == 'Off' inside ini_get()
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); // return into a variable
curl_setopt($ch, CURLOPT_PORT, $port); //Set the port number
curl_setopt($ch, CURLOPT_TIMEOUT, 15); // times out after 15s
curl_setopt($ch, CURLOPT_POSTFIELDS, $xml); // add POST fields
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
if($port==443)
{
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
}
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$xml = '<?xml version="1.0"?><request><auth username="hello" password="world" /><method action="login" /></request>';
$url ='http://webservicesapi.com/login.pl';
$port = 80;
$response = xml_post($xml, $url, $port);
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Untitled Document</title>
</head>
<body>
<p><?=nl2br(htmlentities($response));?></p>
</body>
</html>
CURLOPT_POSTFIELDS expects either an associative array or a raw, already-encoded POST string. Since you are passing it a plain string, cURL sends it as the raw POST body, so the service never receives a field named xml. Either of these should work:
$response = xml_post(array('xml' => $xml), $url, $port);
OR
$response = xml_post('xml='.urlencode($xml), $url, $port);
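The string form is the closer match to the HTML form: when CURLOPT_POSTFIELDS receives an array, cURL sends the body as multipart/form-data, whereas an encoded string is sent as application/x-www-form-urlencoded, which is what the form sends. A small sketch of building that string with http_build_query(); xml_form_body() is a hypothetical name, and xml_post() is the function from the question.

```php
<?php
/**
 * Hypothetical helper: encode the XML document as the form field "xml",
 * the way the HTML form does (application/x-www-form-urlencoded).
 */
function xml_form_body(string $xml): string
{
    // http_build_query() URL-encodes the value; same result as 'xml=' . urlencode($xml).
    return http_build_query(['xml' => $xml]);
}

$xml  = '<?xml version="1.0"?><request><auth username="hello" password="world" /><method action="login" /></request>';
$body = xml_form_body($xml);
// $response = xml_post($body, 'http://webservicesapi.com/login.pl', 80);
```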