I'm attempting to use YouTube's API to pull a list of videos and display them. To do this, I need to curl their API and get the XML file it returns, which I will then parse.
When I run the following curl function
function get_url_contents($url) {
    $crl = curl_init();
    $timeout = 5;
    curl_setopt($crl, CURLOPT_URL, $url);
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl);
    curl_close($crl);
    return $ret;
}
against the url
http://gdata.youtube.com/feeds/api/videos?q=Apple&orderby=relevance
The string that is saved is horribly screwed up. There are no < > tags, and half of the characters are missing in most of it. It looks 100% different than if I view it in a browser.
I tried print, echo, and var_dump, and they all show it as completely different, which makes parsing it impossible.
How do I get the file properly from the server?
It's working for me. I'm pretty sure the file is returned without errors; when you print it, the < > tags aren't shown, but if you look at the page source you can see them.
Try this, you can see it work:
$content = get_url_contents('http://gdata.youtube.com/feeds/api/videos?q=Apple&orderby=relevance');
$xml = simplexml_load_string($content);
print_r($xml);
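If the feed loads correctly, you can walk the entries directly with SimpleXML. A quick sketch, assuming the standard Atom <entry>/<title> structure the gdata feed uses:
foreach ($xml->entry as $entry) {
    // each <entry> is one video; the title lives in the default Atom namespace
    echo (string) $entry->title . "\n";
}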
Make use of the client library that Google provides, it'll make your life easier.
http://code.google.com/apis/youtube/2.0/developers_guide_php.html
I'm trying to create a program using PHP where I can load a full webpage and navigate the site while staying on a different domain. The problem I'm having is that I can't load things like stylesheets and images because they are relative links. I need a way to turn the relative links into absolute links.
Right now I can get just plain HTML from the page using this handy bit of code:
echo file_get_contents('http://tumblr.com');
I can't use an iframe to display the webpage.
Your code should work, but you must make sure allow_url_fopen is set to On before running it.
echo file_get_contents('http://othersiteurl.com');
You may also use cURL. Example:
function get_data($url, $timeout = 5) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
Slightly modified code from: https://davidwalsh.name/curl-download
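Since the question is really about the relative links, one rough approach (my own sketch, not from that article) is to inject a <base> tag right after <head>, so the browser resolves stylesheets and images against the original site:
// sketch: assumes the fetched page has a <head> tag; $base is the site being proxied
$base = 'http://tumblr.com/';
$html = get_data($base);
$html = preg_replace('#<head(\s[^>]*)?>#i', '<head$1><base href="' . $base . '">', $html, 1);
echo $html;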
I have a URL like this: https://facebook.com/5. I want to get the HTML of that page, just like view source.
I tried using file_get_contents, but that didn't return the correct content.
Am I missing something?
Is there any other function that I can use?
If I can't get the HTML of that page, what did the developers do while coding the site to prevent this?
Fair warning, this is a bit off topic, but does this task have to be done using PHP?
Since this sounds like a web-scraping task, I think you would get more use out of CasperJS.
With it you can target precisely what you want to retrieve from the Facebook page, rather than grabbing the whole content, which as of this writing I assume is generated by multiple requests and rendered through a virtual DOM.
Please note that I haven't tried retrieving content from Facebook, but I've done this with multiple other services.
Good luck!
You may want to use curl instead: http://php.net/manual/en/curl.examples.php
Edit:
Here is an example of mine:
$url = 'https://facebook.com/5';
$ssl = true;
$ch = curl_init();
$timeout = 3;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
// verify the peer's certificate (needs a usable CA bundle on the system)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, $ssl);
// follow the redirects the URL issues instead of stopping at the first response
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
curl_close($ch);
Note that depending on the website's vhost configuration, a trailing slash on the URL can make a difference.
Edit: Sorry for the undefined variable... I copied it out of a helper method I used. Now it should be alright.
Yet another Edit:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
By adding this option you will follow the redirects that are apparently happening in your example. Since you said it was an example, I didn't actually run it before. Now I did, and it works.
I am performing a cURL request on an SSL page (Page1.php) that in turn performs a cURL request on another SSL page (Page2.php). Both pages are on my site and in the same directory, and both return XML. Through logging I see that Page2.php is being hit and is outputting valid XML. I can also hit Page2.php in a browser and it returns valid XML. However, Page1.php is timing out and never returning the XML.
Here is the relevant code from Page1.php:
$url = "https://mysite.com/page2.php"
$c = curl_init($url);
if ($c)
{
curl_setopt($c,CURLOPT_RETURNTRANSFER, true);
curl_setopt($c,CURLOPT_FOLLOWLOCATION, true);
curl_setopt($c,CURLOPT_CAINFO, "cacert.pem");
curl_setopt($c, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($c, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($c,CURLOPT_TIMEOUT, 30);
curl_setopt($c,CURLOPT_FRESH_CONNECT, true);
$result = curl_exec($c);
curl_close($c);
}
$result never has anything in it.
Page2 has similar options set, but its $result var does have the expected data in it.
I'm a bit of a noob when it comes to PHP so I'm hoping that I'm overlooking something really simple here.
BTW, we are using a WAMP setup with Windows Server 2008.
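A minimal diagnostic sketch (wrapping the curl_exec() call from the snippet above): logging curl_error() when the call fails usually shows whether the inner request timed out or failed SSL verification:
$result = curl_exec($c);
if ($result === false) {
    // curl_errno()/curl_error() describe why the transfer failed (timeout, certificate problem, ...)
    error_log('cURL error ' . curl_errno($c) . ': ' . curl_error($c));
}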
I am trying to use PHP Simple HTML DOM Parser on a bunch of pages, but I have a problem with SOME of them. While on 90% of the pages everything works fine, on some URLs I can't save the cURL output to a string... The URL exists, of course...
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
$data = curl_exec($ch);
$html = str_get_html($data);
if ($html) {
....
}
My code is something like this, but it never gets inside the if. I can echo $data without any problem, but I can't echo $html :S I also tried file_get_html, but nothing. The weird thing is that I don't get any error. How can I configure PHP Simple HTML DOM Parser to throw me the error?
Oh my GOD! It was the file size of the data... I changed this constant in simple_html_dom.php to something bigger and I am fine...
define('MAX_FILE_SIZE', 800000);
Such poor error handling :(
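For anyone hitting the same wall, a small sketch of how to at least detect it, assuming the common behaviour where str_get_html() just returns false when the input is empty or exceeds MAX_FILE_SIZE:
$html = str_get_html($data);
if ($html === false) {
    // str_get_html() silently returns false for empty or oversized input
    error_log('str_get_html() failed; input length was ' . strlen($data) . ' bytes');
}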
I have 2 questions:
1) When I use cURL, my pictures are not shown. Why?
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'www.google.com');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
?>
2) On the destination page I have a text field; how can I fill it using cURL?
Thanks
I think that:
the images are referenced with relative paths, so the paths are incorrect when you load the page through cURL.
you should use the result of curl_exec and use that info to write into your textarea.
To get the result from the curl_exec method, set the CURLOPT_RETURNTRANSFER option, like so:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
Now you can get the value like so:
$result = curl_exec($ch);
It's all in the manual by the way.
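For the text-field part, a small sketch: dump the returned HTML into a textarea, escaped so the browser shows it as text instead of rendering it:
// escape the fetched markup so it shows up as text inside the textarea
echo '<textarea rows="20" cols="100">' . htmlspecialchars($result) . '</textarea>';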
To get the images working you might want to do a regular expression replace where you look for anything within the src="" attribute and prefix it with, let's say, http://www.google.com (or whatever might be correct).
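A rough sketch of that replace, assuming plain src="..." attributes with root-relative paths (anything more involved is better handled with a DOM parser):
$base = 'http://www.google.com';
// prefix every src that does not already start with http:// or https://
$result = preg_replace('#src="(?!https?://)/?#', 'src="' . $base . '/', $result);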