Hi, I'm trying to get the contents of a JSON file, but I'm running into trouble. My code is:
<?php
$url = 'http://www.taringa.net/api/efc6d445985d5c38c5515dfba8b74e74/json/Users-GetUserData/apptastico';
$ch = curl_init();
$timeout = 0; // set to zero for no timeout
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$file_contents = curl_exec($ch);
curl_close($ch);
var_dump($file_contents);
?>
If I put the address in the browser I get the content without any issue, but when I try to fetch it from PHP I get nothing. I also tried file_get_contents(), with the same result. What can I do?
It appears that they are checking user agents and blocking PHP's default one. Set this before using file_get_contents():
ini_set('user_agent', 'Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1');
If they are checking user agents, they may be doing so to prevent people from doing this kind of thing.
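If you prefer not to change the global setting, here is a minimal sketch of the same idea using a per-request stream context:
$context = stream_context_create(array(
    'http' => array(
        'user_agent' => 'Mozilla/5.0 (Windows NT 5.2; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    ),
));
$file_contents = file_get_contents($url, false, $context);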
Have you read a little bit about cURL on php.net? Here is how it works, adapted from php.net:
$url = 'http://www.taringa.net/api/efc6d445985d5c38c5515dfba8b74e74/json/Users-GetUserData/apptastico';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, FALSE); // set to true to see the header
curl_setopt($ch, CURLOPT_NOBODY, FALSE); // show the body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$content = curl_exec($ch);
curl_close($ch);
echo $content;
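Since the endpoint returns JSON, you would typically decode the response after this; a minimal sketch:
$data = json_decode($content, true); // true = decode into an associative array
if ($data === null) {
    die('JSON decode failed with error code ' . json_last_error());
}
print_r($data);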
I know this is a common problem when using cURL, but I have not found a solution after looking through Stack Overflow and Google.
I've tried different user agents and, depending on the modifications I make to my code below, I get different errors:
The requested URL returned error: 400 Bad Request, followed by resource(19) of type (Unknown)
The requested URL returned error: 400 Bad Request, followed by string(42) of type (Unknown) (I noticed the 42 refers to the '=' in the $target_url)
None of it has pointed me in the direction of solving this problem. I appreciate any advice:
$target_url = "http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=170307";
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)');
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
//curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
if ($html === false) $html = curl_error($ch);
echo stripslashes($html);
curl_close($ch);
var_dump($ch);
*** I should note that I'm actually reading the URL (and a few others) from a file, so maybe there is something wrong with the format of the URL?
I've done this before and had no problem with it, but now I'm stumped.
I read each line/URL and place it into an array, which I loop through later on.
*** If I hardcode the URL then it works fine, but for some reason reading it from the file produces the error.
Don't use stripslashes(); use preg_replace() to filter the URLs:
<?php
$target_url="http://www.hockeydb.com/ihdb/stats/pdisplay.php?pid=170307";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,4);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
$html = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+) ([\"'>]+)#",'$1'.$target_url.'$2$3', $html);
echo $html;
curl_close($ch);
var_dump($ch);
?>
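One more thing worth checking, since you say the hardcoded URL works but the one read from a file does not: lines read from a file usually keep their trailing newline, and cURL will send it as part of the URL, which servers often reject with 400 Bad Request. A minimal sketch of trimming each line first (urls.txt is a placeholder for your file):
$urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($urls as $i => $url) {
    $urls[$i] = trim($url); // also strips stray \r from Windows line endings
}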
I want to fetch a URL from an Instagram bio with PHP.
URL: https://www.instagram.com/sukhcha.in/ (it can be anyone's profile)
I tried using simple_html_dom, but it always shows an HTTPS error while fetching the HTML from the URL.
As advised in my comment, you should use cURL, because it supports the HTTPS protocol:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_TIMEOUT, 0); // Timeout (0 : no timeout)
curl_setopt($ch, CURLOPT_HEADER, false); // Do not download header
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0'); // creates user-agent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not output content
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirections
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skip HTTPS certificate verification (insecure; without it the request may fail if no CA bundle is configured)
curl_setopt($ch, CURLOPT_URL, 'https://www.instagram.com/sukhcha.in/');
$content = curl_exec($ch);
?>
Then you have to use XPath on your $content variable to extract the part you want.
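For example, a sketch with DOMXPath, assuming the bio is exposed in the page's og:description meta tag (this depends on Instagram's markup and may change):
libxml_use_internal_errors(true); // the page is HTML, not valid XML; silence parser warnings
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//meta[@property="og:description"]/@content');
if ($nodes->length > 0) {
    echo $nodes->item(0)->nodeValue;
}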
You can use cURL to get the data:
$url = 'https://weather.com/weather/tenday/l/USMO0460:1:US';
$curl = curl_init($url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Content-Type: application/x-www-form-urlencoded'));
$curl_response = curl_exec($curl);
Debug the response:
echo '<pre>';
print_r($curl_response);
echo '</pre>';
Close the handle:
curl_close($curl);
I'm trying to get the content of this page: http://www.nytimes.com/2014/01/26/us/politics/rand-pauls-mixed-inheritance.html?hp&_r=0
I tried file_get_contents and cURL solutions, but both give me the NYTimes login page, and I have no idea why.
I've tried the approaches from these questions: "file_get_contents()/curl getting unexpected page", "PHP file_get_contents() behaves differently to browser", and "file_get_content get the wrong web".
Is there any solution? Thanks
EDIT:
// This is the cURL code I use ($ch, $link and $timeout are set elsewhere in my script)
$cookieJar = dirname(__FILE__) . '/cookie.txt';
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieJar);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieJar);
curl_setopt($ch, CURLOPT_URL, $link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
Try testing it by saving the cookies to the same directory where the script resides first, so set the cookie path like this:
$cookie = "cookie.txt";
This code works for me, and I got the page:
<?php
function curl_get_contents($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$get_page = curl_get_contents("http://www.nytimes.com/2014/01/26/us/politics/rand-pauls-mixed-inheritance.html?hp&_r=1");
echo $get_page;
?>
I think you need cURL to allow cookies to be saved. Try adding these lines to the cURL setup. For me this worked:
$cookie = dirname(__FILE__) . "/cookie.txt"; // use a forward slash; "\c" in double quotes is a literal backslash, which breaks the path on non-Windows systems
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
Use the Live HTTP Headers Firefox plugin to check what is going on during page access. There can be redirections, cookies being set, etc. Then try to implement this behaviour with PHP cURL (note: set the user agent and the other client headers to the same values as the browser's).
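For instance, a sketch of replaying browser-like headers with cURL (the values are placeholders; copy whatever your own browser actually sends):
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.5',
));
curl_setopt($ch, CURLOPT_REFERER, 'http://www.nytimes.com/'); // some sites also check the referer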
I am trying to make an API call to Wikipedia through http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=xml, but the XML is full of HTML and CSS tags.
Is there a way to fetch only plain text, without the tags? Thanks!
*Edit 1:
$json = json_decode(file_get_contents('http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json'));
$txt = strip_tags($json->text);
var_dump($json);
NULL is displayed.
Question was partially answered here
$url = 'http://en.wikipedia.org/w/api.php?action=parse&page=Petunia&format=json&prop=text';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "TestScript"); // required by wikipedia.org server
$c = curl_exec($ch);
$json = json_decode($c);
var_dump(strip_tags($json->{'parse'}->{'text'}->{'*'}));
I was not able to use file_get_contents but it works fine with cURL.
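For what it's worth, file_get_contents() probably failed for the same reason: Wikipedia rejects requests that carry no User-Agent. A sketch of sending one through a stream context instead of cURL:
$context = stream_context_create(array(
    'http' => array('user_agent' => 'TestScript'), // same agent string as the cURL version above
));
$c = file_get_contents($url, false, $context);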
It is possible to fetch the info or description from Wikipedia by using XML:
$url = "http://en.wikipedia.org/w/api.php?action=opensearch&search=".$term."&format=xml&limit=1";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
curl_setopt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_HEADER, false); // Include head as needed
curl_setopt($ch, CURLOPT_NOBODY, FALSE); // Return body
curl_setopt($ch, CURLOPT_VERBOSE, FALSE); // Minimize logs
curl_setopt($ch, CURLOPT_REFERER, ""); // Referer value
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // No certificate
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow redirects
curl_setopt($ch, CURLOPT_MAXREDIRS, 4); // Limit redirections to four
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return in string
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8"); // Webbot name
$page = curl_exec($ch);
$xml = simplexml_load_string($page);
if((string)$xml->Section->Item->Description) {
print_r(array((string)$xml->Section->Item->Text,
(string)$xml->Section->Item->Description,
(string)$xml->Section->Item->Url));
} else {
echo "sorry";
}
But cURL must be installed on the server... have a nice day...
I've been killing myself all day over this one bug. I can't tell you how much I'd appreciate any possible help on this.
Basically, I have a very simple script. It logs into a website, looks at a file's headers to see if it is an image type, and then downloads it. It then repeats this three times.
The problem here is that I cannot set CURLOPT_NOBODY without curl_exec crashing the entire script with no errors. (I can't even call or get a curl_error!) It would seem that it is impossible to go from CURLOPT_NOBODY, true to CURLOPT_NOBODY, false. The loop below runs one time and then dies.
What could possibly be causing this bug?
Here is the script:
// Log into the Website
curl_setopt($ch, CURLOPT_URL, 'http://myexample.com/login');
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.12) Gecko/2009070611 Firefox/3.0.12");
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_exec($ch);
// Begin the Loop for Finding Images
for ($i = 0; $i < 3; $i++) {
    curl_setopt($ch, CURLOPT_URL, 'http://myexample.com/file.php?id=' . $i);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    $output = curl_exec($ch) or die('WHY DOES THIS DIE!!!');
    $curl_info = curl_getinfo($ch);
    echo '<br/>' . $output;
    // (Normally checks for content type here) Download the file
    curl_setopt($ch, CURLOPT_URL, 'http://myexample.com/file.php?id=' . $i);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_NOBODY, false);
    $filename = 'downloads/test-' . $i . '.jpg';
    $fp = fopen($filename, 'w');
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_exec($ch);
    fclose($fp);
}
I'm running Apache 2.2 and PHP 5.2.13.
Thanks for any help. I can't tell you how much I'd appreciate it; I'm completely stuck here. :(
It looks to me like the cURL library is getting confused, especially when you are reusing the resource. You should do:
$ch = curl_init();
// do stuff with curl
curl_close($ch);
$ch = curl_init();
// another curl call
curl_close($ch);
$ch = curl_init();
// yet another curl call
curl_close($ch);
I got the same errors executing your script, but adding the curl_close() calls and reinitializing with curl_init() seemed to fix the problem. I don't know if this is acceptable for you; if not, I'd use fopen() to do your HTTP downloading, since it's much more intuitive than cURL, unless you need something that fopen() doesn't support.
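If you would rather keep reusing a single handle, another option worth trying (an assumption on my part, not something I tested against your script) is to switch the request method back explicitly after the HEAD request, since CURLOPT_NOBODY puts the handle into HEAD mode:
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_HTTPGET, true); // force the method back to GET for the download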