Non-empty string empty after return - php

I'm writing a PHP script that processes pages via cURL, so I have a function that fetches and returns a page by URL:
function get_url($Url) {
    if (!function_exists('curl_init')) {
        die('Sorry, cURL is not installed!');
    }
    set_time_limit(20);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $Url);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: age_gate_birthday=19901101"));
    curl_setopt($ch, CURLOPT_REFERER, "http://www.facebook.com");
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36");
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    return $output;
}
Echoing $output inside this function always prints a string of HTML; however, if I call this function from another function,
function get_vid($sql, $url) {
    $data = get_url($url);
    ...
the returned value is an empty string, despite the fact that $output had a value while get_url() was doing its thing.
Weirdly enough, the error only occurs with specific URLs and works fine with others.
Thank you for trying to help!
UPDATE: It seems cURL returns FALSE at random on specific links, which appears to be the culprit of this issue; however, curl_error() is empty, so I'm unable to identify the cause.

I think it's because you get an HTTP redirect.
Try checking the HTTP code like this:
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 302) {
    // Handle the HTTP redirect here
}
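If a redirect is indeed the cause, here is a minimal diagnostic sketch along those lines; CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS are standard cURL options, not part of the original get_url():
$output = curl_exec($ch);
if ($output === false) {
    // curl_error() is only meaningful when curl_errno() is non-zero,
    // so log the error number alongside the message.
    error_log('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
}
if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 302) {
    // Let cURL follow the redirect instead of handling it by hand.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    $output = curl_exec($ch);
}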

Related

PHP cURL using my IP address instead of the Server?

I want to make a cURL request to a private page that automatically checks my IP/language and delivers content in that language.
$url = 'https://example.com/';
$ch = curl_init($url);
if ($ch === false)
{
    throw new CurlException('curl_init() returned `false`.');
}
if ( ! ini_get('open_basedir'))
{
    // CURLOPT_FOLLOWLOCATION is disallowed when open_basedir is set
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
}
if (isset($options['curlHeaders']))
{
    curl_setopt($ch, CURLOPT_HTTPHEADER, $options['curlHeaders']);
}
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36');
curl_setopt($ch, CURLOPT_URL, $url);
$content = curl_exec($ch);
if ($content === false)
{
    // there was a problem
    $error = curl_error($ch);
    throw new CurlException('Error retrieving "'.$url.'" ('.$error.')');
}
elseif ($content === true)
{
    throw new CurlException('Unexpected return value: content set to true.');
}
return $content;
The problem is that the content I get back is always in the language matching my public IP address. Say I want the returned content in English, but it loads in French: the locale on my computer is English; the only thing French is my IP address.
I have tried it locally, uploaded the code to a server, and also used a VPN, but I got the same results.
What am I missing here? Is there any caching when sending cURL requests?
It turned out I had to set a language header in the cURL settings, which solved this issue completely:
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept-Language: en-US;q=0.8,en;q=0.4"));

cURL waits for url to load IF results are printed

The following cURL call succeeds every time, if and only if $data is printed after the call, with curl_getinfo() returning
[content_type] => text/html; charset=UTF-8
If $data is not printed, the call sometimes returns the same result as above and sometimes returns $data as "Loading...", which means the page has not finished loading yet, with curl_getinfo() returning
[content_type] => text/html
Furthermore, when using print_r($data), I can see the print_r(curl_getinfo($ch)); on my website being updated several times while the cURL call runs. What... the... F?
(The setopt list has grown larger as I've been trying to find a solution, LOL.)
Oh yeah, and even if I print $data after it's been returned to the caller and caught in another variable, cURL succeeds every time.
Is this normal behaviour? I don't want to print_r($data)!
Is it possible that the URL I'm retrieving contains JavaScript which gets run when I "print" it on my website? Why does it work occasionally without the print_r($data)? Ref: is-there-a-way-to-let-curl-wait-until-the-pages-dynamic-updates-are-done
Edit: Until further notice, I've put the cURL call in a while loop, checking whether the downloaded size is above a certain threshold. I've capped the loop at 10 iterations, and so far that is enough, i.e. it manages to download the content of interest; the time consumed is barely noticeable. (A sketch of this loop follows the function below.)
function curl_get_contents($url) {
    global $dbg;
    $ch = curl_init();
    $timeout = 30;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_NOSIGNAL, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    //curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
    $data = curl_exec($ch);
    if ($dbg) {
        print_r(curl_getinfo($ch)); // This one gets refreshed if print_r($data) is used below
        if (curl_errno($ch)) {
            echo 'Curl error: ' . curl_error($ch);
        } else {
            echo "ALL GOOD <br>";
        }
    }
    curl_close($ch);
    //echo $data; // If I do this...
    //print_r($data); // ... or this, curl is a success 100% of the time.
    return $data;
}
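A sketch of the retry loop from the edit above; the byte threshold is an assumed placeholder and would need tuning for the actual page:
// Retry until the response looks fully rendered, capped at 10 attempts,
// as described in the edit above.
$threshold = 100000; // assumed minimum size in bytes of a fully loaded page
$data = '';
for ($i = 0; $i < 10; $i++) {
    $data = curl_get_contents($url);
    if (strlen($data) >= $threshold) {
        break; // large enough to contain the content of interest
    }
}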

PHP cURL with post data login not working where wget with shell_exec is

I'm trying to access remote files on a website that requires me to log in. The current way I'm doing this is with wget run from shell_exec. This 'works' but is definitely not the way I want to do it. I would prefer to use cURL, but for some reason that will not work for me.
What happens instead of getting the page I requested is that I'm redirected to the login page, which does not happen with the wget method, which should be making the exact same request...
What am I doing wrong?
First of all, this is the current (not nice, but) working method of accessing the page:
shell_exec("wget --post-data='serviceLoginUser=something&serviceLoginPass=something&qq=login' -p https://internal.website.com/somepage.php?get=request -O return.json");
This is my download function, which does not work, and with which I want to replace the old method (updated after a suggestion from Abkarino):
public function download($url, $postData = NULL) {
    // Set cURL options
    curl_setopt($this->ch, CURLOPT_URL, $url);
    curl_setopt($this->ch, CURLOPT_HEADER, 0);
    // To act like a normal browser (needed for the intranet)
    curl_setopt($this->ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17');
    curl_setopt($this->ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($this->ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($this->ch, CURLOPT_VERBOSE, 1);
    //TODO: remove
    curl_setopt($this->ch, CURLOPT_SSL_VERIFYPEER, false);
    // If POST data was supplied:
    if ($postData != NULL) {
        curl_setopt($this->ch, CURLOPT_POSTFIELDS, $postData);
    }
    // Execute the request
    return curl_exec($this->ch);
}
The following code forms the request (currently this code only tests whether the cURL call actually returns the correct data):
include 'Downloader.php';
$dl = new Downloader;
$postData = array(
    'serviceLoginUser' => 'something',
    'serviceLoginPass' => 'something',
    'qq' => 'login'
);
echo $dl->download("https://internal.website.com/somepage.php?get=request", $postData);
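One hedged guess at the difference from wget, not confirmed by the thread: wget's --post-data sends an application/x-www-form-urlencoded body, while passing a PHP array to CURLOPT_POSTFIELDS makes cURL send multipart/form-data, which some login scripts reject; a login that sets a session cookie also needs a cookie jar to survive the redirect. Both adjustments sketched below (the cookie file path is a placeholder):
// Send the credentials urlencoded, as wget's --post-data does
curl_setopt($this->ch, CURLOPT_POST, true);
curl_setopt($this->ch, CURLOPT_POSTFIELDS, http_build_query($postData));
// Persist the session cookie across the login redirect
curl_setopt($this->ch, CURLOPT_COOKIEJAR, '/tmp/cookies.txt');
curl_setopt($this->ch, CURLOPT_COOKIEFILE, '/tmp/cookies.txt');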

Simple html dom div downloading issue

Here's some PHP code that I wrote, based mainly on the docs.
It uses Simple HTML DOM.
The problem is that it doesn't really work, and I don't know why.
<?php
include("simple_html_dom.php");
$context = stream_context_create();
stream_context_set_params($context, array('user_agent' => "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36"));
$html = file_get_html('http://www.ask.fm', 0, $context);
$elem = $html->find('div[id=heads]', 0);
var_dump($elem);
?>
What I want is to set the user agent, which I tried to do above, and then download the div with id "heads". That's not much, but I couldn't figure it out in any way.
<?php
include "simplehtmldom_1_5/simple_html_dom.php";

function curl($url)
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    // You may set this option if you need to follow redirects, though I didn't get any in your case
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    $content = curl_exec($curl);
    curl_close($curl);
    return $content;
}

$html = str_get_html(curl("http://www.ask.fm"));
echo $elem = $html->find('div[id=heads]', 0);
?>
I think this will be useful for you.
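One detail worth noting: the curl() helper above never sets a user agent, even though setting one was part of the original goal. If the site cares, a line like this can be added before curl_exec() (the UA string is simply the one from the question):
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2049.0 Safari/537.36');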

PHP cURL, file_get_contents blank page

I'm trying to get a page's content with cURL or file_get_contents. On many websites it works, but when I try it on a friend's server it doesn't.
I think there is some protection via headers or something like that. I get the following error code: 401 Unauthorized. If I try to reach the same page with a normal browser, it works.
Here is my code using file_get_contents:
$homepage = file_get_contents('http://192.168.1.3');
echo $homepage; // just a test to see if the page is loaded; it's not
if (preg_match("/my regex/", $homepage)) {
    // ... some code
}
I also tried with cURL:
$url = urlencode('http://192.168.1.3');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
$result = curl_exec($ch) or die("Not working");
curl_close($ch);
echo $result; // not working ..
Nothing works; maybe I should add more arguments to curl_setopt...
Thanks.
PS: If I try with wget on Linux, I get an error, but with aria2c it works.
HTTP status 401 means Unauthorized: you need to send the server a username and password.
With file_get_contents, you can pass a stream context as the third parameter, in which you can set header info.
But you'd better use cURL here; file_get_contents is intended for accessing local files, and it's a blocking function. Add the following option for basic authorization:
curl_setopt($ch, CURLOPT_USERPWD, "my_username:my_password");
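A minimal sketch of both suggestions together; CURLAUTH_BASIC is an assumption here, since the server's WWW-Authenticate response header determines which scheme it actually wants:
// cURL with HTTP Basic auth (credentials are placeholders)
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, "my_username:my_password");

// The file_get_contents equivalent, via a stream context
$context = stream_context_create(array(
    'http' => array(
        'header' => 'Authorization: Basic ' . base64_encode('my_username:my_password'),
    ),
));
$homepage = file_get_contents('http://192.168.1.3', false, $context);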
Try this update with a user agent:
<?php
$curlSession = curl_init();
curl_setopt($curlSession, CURLOPT_URL, 'http://192.168.1.3/');
curl_setopt($curlSession, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);
$homepage = curl_exec($curlSession);
curl_close($curlSession);
echo $homepage;
?>
If you're still getting a blank page, you can install an add-on in Firefox to inspect the "request-headers" and "response-headers" and compare them.
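Alternatively, cURL itself can capture the response headers for comparison, without a browser add-on; a minimal sketch (set this before the curl_exec() and curl_close() calls above):
// Include the response headers in the output so they can be inspected
curl_setopt($curlSession, CURLOPT_HEADER, true);
$response = curl_exec($curlSession);
$headerSize = curl_getinfo($curlSession, CURLINFO_HEADER_SIZE);
echo substr($response, 0, $headerSize); // response headers only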
