Recursive cURL not returning results - php

I am performing a cURL request on an SSL page (page1.php) that in turn performs a cURL request on an SSL page (page2.php). Both pages are on my site, in the same directory, and both return XML. Through logging I can see that page2.php is being hit and is outputting valid XML. I can also hit page2.php in a browser and it returns valid XML. However, page1.php is timing out and never returning the XML.
Here is the relevant code from Page1.php:
$url = "https://mysite.com/page2.php"
$c = curl_init($url);
if ($c)
{
curl_setopt($c,CURLOPT_RETURNTRANSFER, true);
curl_setopt($c,CURLOPT_FOLLOWLOCATION, true);
curl_setopt($c,CURLOPT_CAINFO, "cacert.pem");
curl_setopt($c, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($c, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($c,CURLOPT_TIMEOUT, 30);
curl_setopt($c,CURLOPT_FRESH_CONNECT, true);
$result = curl_exec($c);
curl_close($c);
}
$result never has anything in it.
page2.php has similar options set, but its $result variable does have the expected data in it.
I'm a bit of a noob when it comes to PHP so I'm hoping that I'm overlooking something really simple here.
BTW, we are using a WAMP setup with Windows Server 2008.
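One way to narrow this down is to ask cURL itself what went wrong before closing the handle. A minimal diagnostic sketch (the error_log destination depends on your php.ini) that could replace the last two lines inside the if block:

    $result = curl_exec($c);
    if ($result === false) {
        // curl_error() names the failure: timeout, SSL verification, DNS, etc.
        error_log("cURL failed: " . curl_error($c));
        // Transfer details: HTTP status reached, timings, redirect count, ...
        error_log(print_r(curl_getinfo($c), true));
    }
    curl_close($c);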

Related

How to run php script

I have 7-8 PHP scripts written that pull data from a remote server and store it on our server. Each script inserts/updates around 3000-4000 records at a time. When I hit any script from the browser it works fine (individually), but if I try to call all the files together by writing header('Location: http://www.example.com/') it breaks. Can anyone suggest a better way to handle this? Someone suggested multi-threading, but I have not used threading yet, so can anyone help me with a better approach/solution? TIA.
Note: your current code doesn't work because header('Location: example.com') redirects the browser to example.com, which means your PHP script has finished running and the browser is now on example.com.
Solution 1:
If allow_url_fopen is "On" in php.ini, you can execute them using file_get_contents():
<?php
$url1 = file_get_contents('http://www.example.com/1.php');
$url2 = file_get_contents('http://www.example.com/2.php');
?>
and so on...
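If you're not sure whether that setting is enabled, you can check it at runtime; a quick sketch:
<?php
// prints "1" (truthy) when remote URLs may be opened with file_get_contents()
var_dump(ini_get('allow_url_fopen'));
?>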
Solution 2:
function initCURL($url) {
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    // Return the response as a string instead of printing it
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // Don't include the response headers in the returned string
    curl_setopt($curl, CURLOPT_HEADER, false);
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}
use it as follows:
<?php
$url1 = initCURL('http://www.example.com/1.php');
$url2 = initCURL('http://www.example.com/2.php');
?>
In these examples, $url1 and $url2 will carry whatever data is returned by the scripts.
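Both solutions above run the scripts one after another. To get closer to the multi-threading idea from the question, PHP's curl_multi functions can run the requests concurrently; a minimal sketch, reusing the example URLs from above:

function fetchAll(array $urls) {
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }
    // Drive all transfers until none remain active
    do {
        curl_multi_exec($mh, $active);
        // Wait for activity on any handle instead of busy-looping
        curl_multi_select($mh);
    } while ($active > 0);
    $results = [];
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}

$results = fetchAll([
    'http://www.example.com/1.php',
    'http://www.example.com/2.php',
]);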

What am I doing wrong here (CURL), no matter what I try it returns empty/null

$url = "http://www.reddit.com/r/{mysubreddit}/new.json";
$fields = "sort=new";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
var_dump($data);
{mysubreddit} is whatever subreddit I want to check. It works fine to grab that URL via Postman, or even in the browser. But when I use PHP/cURL, it returns empty. I've tried replacing the URL with a URL to another site, and it works fine, so the cURL part is working.
Is there something with reddit that I have to set? headers? or explicitly tell it for JSON? Or what?
I thought it might have to do with POST, so I tried GET too, but it's still empty/null.
$url = "http://www.reddit.com/r/{mysubreddit}/new.json?sort=new";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
That doesn't work either.
You just need to add:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
As others have mentioned, reddit is sending you a 302 redirect to https. You would be able to see that by examining the headers returned by curl_getinfo().
Enabling redirect following, as sorak describes, will work. However, it's not a good solution: you will make two HTTP requests on every single API call, which needlessly wastes network round-trips and increases the execution time of your script. Instead, just change the URL you're requesting to be from https://www.reddit.com/ in the first place.
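For reference, here is a rough sketch of the curl_getinfo() check mentioned above, using r/php as a stand-in for {mysubreddit}:

$ch = curl_init("http://www.reddit.com/r/php/new.json?sort=new");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
// 302 here means reddit redirected the plain-http request
echo curl_getinfo($ch, CURLINFO_HTTP_CODE), "\n";
// The Location target cURL would have followed
echo curl_getinfo($ch, CURLINFO_REDIRECT_URL), "\n";
curl_close($ch);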

curl puts redirected url into adressline of browser

I am pretty new to cURL and have only been using it for a short time.
My problem is that I want to get the content of a page (file_get_contents() doesn't work) using cURL. Unfortunately, the site in question has bot protection, meaning it checks whether you are a bot when you first arrive at the site. If you are not a bot, it redirects you to the real site with an absolute path (I guess).
Whenever I load this site with cURL, it appends the redirect path to my own server's address.
For example: my server has the address http://examplepage.com/, and cURL appends the redirected path to that URL, so I end up with something like http://examplepage.com/absolute/path?with=parameters.
On the original site that path works, because they actually have it, but I do not (I just want some HTML content of their site).
Here is my code so far:
<?php
/* getting site */
$website = "https://originalsite.com/?some=parameters";

function curl_download($url) {
    // initialize curl handler
    $c = curl_init();
    // Include response headers in the result? (1 = yes, 0 = no)
    curl_setopt($c, CURLOPT_HEADER, 1);
    // set url to download
    curl_setopt($c, CURLOPT_URL, $url);
    // follow redirection
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);
    // set referer
    curl_setopt($c, CURLOPT_REFERER, "https://originalsite.com/");
    // user agent
    curl_setopt($c, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
    // Should cURL return or print out the data? (true = return, false = print)
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    // timeout in seconds
    curl_setopt($c, CURLOPT_TIMEOUT, 10);
    // download the given URL and return the output
    $output = curl_exec($c);
    // close the cURL resource and free system resources
    curl_close($c);
    return $output;
}

$content = curl_download($website);
echo $content;
?>
So it enters the site where it checks whether I am a bot, and after that it redirects me to the real site (or at least, it tries to).
I have searched the internet and StackOverflow but I couldn't find an answer to my problem.
What's happening is that there is some JavaScript code issuing a redirect once you render the page. Try disabling JavaScript in your browser for a quick test.
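If you'd rather confirm that from PHP than in a browser, you can scan the body cURL returned for common client-side redirect patterns; a rough sketch (the regexes below are heuristics, not an exhaustive list):

$content = curl_download($website);
// Look for a JavaScript location assignment...
if (preg_match('/(window\.)?location(\.href)?\s*=\s*["\']([^"\']+)["\']/i', $content, $m)) {
    echo "JavaScript redirect target: " . $m[3] . "\n";
// ...or a <meta http-equiv="refresh"> tag
} elseif (preg_match('/http-equiv=["\']refresh["\'][^>]*url=([^"\'>]+)/i', $content, $m)) {
    echo "Meta refresh target: " . $m[1] . "\n";
}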

cURL not returning an entire page

I am using cURL to retrieve a page for a small search engine project I am working on, but on some pages it's not retrieving the entire page.
The function that I have setup is:
public function grabSourceCode($url) {
    // Grab the page source with cURL
    $ch = curl_init();
    $timeout = 50;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_USERAGENT, 'NameBot/0.2');
    $source_code = curl_exec($ch);
    curl_close($ch);
    return $source_code;
}
and I am retrieving the page using:
$Crawler->grabSourceCode('https://sedo.com/search/searchresult.php4?keyword=cats&language_output=e&language=e')
On one page I get everything, but on another I only get part of the page.
I have tried using file_get_contents(), but that gives the same result.
It seems to be an issue with dynamic loading of the page: when I run the browser with JavaScript blocked, it shows the same results as the cURL function.
Is there any way to do this in PHP, or would I have to look at another language, such as JavaScript?
Thanks, Daniel
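PHP's cURL cannot execute the page's JavaScript, so anything the page loads dynamically after the initial HTML will never appear in $source_code. One way to confirm that cURL did receive the complete HTTP response (and that the missing content is added later by JavaScript) is to compare the downloaded size against the length the server advertised; a sketch using the URL from the question:

$ch = curl_init('https://sedo.com/search/searchresult.php4?keyword=cats&language_output=e&language=e');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'NameBot/0.2');
$body = curl_exec($ch);
// Bytes actually downloaded vs. the length the server advertised (-1 if unknown)
echo curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD), ' of ',
     curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD), " bytes\n";
curl_close($ch);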

PHP cURL incorrect download

I'm attempting to use YouTube's API to pull a list of videos and display them. To do this, I need to cURL their API and get the XML file returned, which I will then parse.
When I run the following curl function
function get_url_contents($url) {
    $crl = curl_init();
    $timeout = 5;
    curl_setopt($crl, CURLOPT_URL, $url);
    curl_setopt($crl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($crl, CURLOPT_CONNECTTIMEOUT, $timeout);
    $ret = curl_exec($crl);
    curl_close($crl);
    return $ret;
}
against the url
http://gdata.youtube.com/feeds/api/videos?q=Apple&orderby=relevance
The string that is saved is horribly mangled. There are no < > tags, and half of the characters are missing in most of it. It looks 100% different than if I view it in a browser.
I tried print, echo, and var_dump, and they all show it as completely different, which makes parsing it impossible.
How do I get the file properly from the server?
It's working for me. I'm pretty sure the file is returned without errors, but when you print it, the <> tags aren't shown. If you look at the page source, you can see them.
Try this and you'll see it works:
$content = get_url_contents('http://gdata.youtube.com/feeds/api/videos?q=Apple&orderby=relevance');
$xml = simplexml_load_string($content);
print_r($xml);
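If you want to eyeball the raw XML in the browser itself rather than in the page source, escape it first so the tags render as text; a quick sketch:

$content = get_url_contents('http://gdata.youtube.com/feeds/api/videos?q=Apple&orderby=relevance');
// Escape < and > so the browser displays the markup instead of interpreting it
echo '<pre>' . htmlspecialchars($content) . '</pre>';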
Make use of the client library that Google provides; it'll make your life easier.
http://code.google.com/apis/youtube/2.0/developers_guide_php.html
