file_get_contents script works with some websites but not others - php

I'm looking to build a PHP script that parses HTML for particular tags. I've been using this code block, adapted from this tutorial:
<?php
$data = file_get_contents('http://www.google.com');
$regex = '/<title>(.+?)</';
preg_match($regex,$data,$match);
var_dump($match);
echo $match[1];
?>
The script works with some websites (like google, above), but when I try it with other websites (like, say, freshdirect), I get this error:
"Warning: file_get_contents(http://www.freshdirect.com) [function.file-get-contents]: failed to open stream: HTTP request failed!"
I've seen a bunch of great suggestions on StackOverflow, for example to enable extension=php_openssl.dll in php.ini. But (1) my version of php.ini didn't have extension=php_openssl.dll in it, and (2) when I added it to the extensions section and restarted the WAMP server, per this thread, I still had no success.
Would someone mind pointing me in the right direction? Thank you very much!

It just requires a user agent (really, any non-empty string suffices):
file_get_contents("http://www.freshdirect.com", false, stream_context_create(
    array("http" => array("user_agent" => "any"))
));
See the HTTP context options in the PHP manual for more settings.
Of course, you can set user_agent in your ini:
ini_set("user_agent","any");
echo file_get_contents("http://www.freshdirect.com");
... but I prefer to be explicit for the next programmer working on it.
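Putting it together with the original title extraction, here is a minimal sketch (the user-agent string and timeout value are arbitrary choices, not requirements):
$context = stream_context_create(array(
    'http' => array(
        'user_agent' => 'Mozilla/5.0 (compatible; MyScraper/1.0)', // any non-empty string works
        'timeout'    => 30, // read timeout in seconds
    ),
));
$data = file_get_contents('http://www.freshdirect.com', false, $context);
if ($data === false) {
    die('HTTP request failed');
}
if (preg_match('#<title>(.*?)</title>#is', $data, $match)) {
    echo $match[1];
}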

You could also do this with simple_html_dom:
$html = file_get_html('http://google.com/');
$title = $html->find('title', 0)->innertext;
Or, if you prefer, with preg_match (and you should really be using cURL instead of file_get_contents):
function curl($url){
    $headers[] = "User-Agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13";
    $headers[] = "Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    $headers[] = "Accept-Language:en-us,en;q=0.5";
    $headers[] = "Accept-Encoding:gzip,deflate";
    $headers[] = "Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $headers[] = "Keep-Alive:115";
    $headers[] = "Connection:keep-alive";
    $headers[] = "Cache-Control:max-age=0";

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_ENCODING, "gzip");   // let cURL decompress the response
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}

$data = curl('http://www.google.com');
$regex = '#<title>(.*?)</title>#mis';
preg_match($regex, $data, $match);
var_dump($match);
echo $match[1];

Another option: some hosts disable CURLOPT_FOLLOWLOCATION, so a recursive function is what you want; this one will also log any errors to a text file. It also includes a simple example of how to use DOMDocument() to extract content. Obviously it's not extensive, but it's something you could build upon.
<?php
function file_get_site($url){
    if(!function_exists('curl_init')){
        die('cURL must be installed. Ask your host to enable it or uncomment extension=php_curl.dll in php.ini');
    }
    $curl = curl_init();
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: ";
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_HEADER, true); // keep the headers so redirects can be parsed below
    curl_setopt($curl, CURLOPT_REFERER, $url);
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_TIMEOUT, 60);
    $html = curl_exec($curl);
    $status = curl_getinfo($curl);
    curl_close($curl);
    if($status['http_code'] != 200){
        // Follow 301/302 redirects manually, since FOLLOWLOCATION may be disabled.
        if($status['http_code'] == 301 || $status['http_code'] == 302) {
            list($header) = explode("\r\n\r\n", $html, 2);
            $matches = array();
            preg_match("/(Location:|URI:)[^(\n)]*/", $header, $matches);
            $url = trim(str_replace($matches[1], "", $matches[0]));
            $url_parsed = parse_url($url);
            return ($url_parsed !== false) ? file_get_site($url) : '';
        }
        // Anything else: log the status details and the URL, then bail out.
        $oline = '';
        foreach($status as $key => $eline){ $oline .= '['.$key.']'.$eline.' '; }
        $line = $oline." \r\n ".$url."\r\n-----------------\r\n";
        $handle = @fopen('./curl.error.log', 'a');
        if($handle){
            fwrite($handle, $line);
            fclose($handle);
        }
        return FALSE;
    }
    return $html;
}

function get_content_tags($source, $tag, $id = null, $value = null){
    $xml = new DOMDocument();
    @$xml->loadHTML($source); // suppress warnings from malformed HTML
    foreach($xml->getElementsByTagName($tag) as $tags) {
        if($id !== null){
            if($tags->getAttribute($id) == $value){
                return $tags->getAttribute('content');
            }
            continue; // keep looking for the tag with the matching attribute
        }
        return $tags->nodeValue;
    }
}

$source = file_get_site('http://www.freshdirect.com/about/index.jsp');
echo get_content_tags($source, 'title'); // FreshDirect
echo get_content_tags($source, 'meta', 'name', 'description'); // Online grocer providing high quality fresh......
?>

Related

Issue with hyperlinks when pulling Facebook post into web page

I'm using the following script to pull the latest post from my Facebook page.
It does this as expected; however, if the Facebook post contains a hyperlink, the link becomes garbled and no longer works. Try it out if you can using my code, making sure cURL is installed.
<?php
$url = "http://www.facebook.com/feeds/page.php?id=466171083413035&format=json";

// disguises the curl using fake headers and a fake user agent.
function disguise_curl($url)
{
    $curl = curl_init();
    // Setup headers - the same headers from Firefox version 2.0.0.6
    // below was split up because the line was too long.
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, '');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($curl); // execute the curl command
    curl_close($curl); // close the connection
    return $html; // and finally, return $html
}

// uses the function and displays the text off the website
$text = disguise_curl($url);
$json_feed_object = json_decode($text);
$i = 0;
foreach ($json_feed_object->entries as $entry)
{
    echo "<h2>{$entry->title}</h2>";
    $published = date("g:i A F j, Y", strtotime($entry->published));
    echo "<small>{$published}</small>";
    $content = preg_replace("/<img[^>]+\>/i", "", $entry->content);
    echo "<p style='word-wrap:break-word;'>{$content}</p>";
    echo "<hr />";
    $i++;
    if ($i == 1) { break; }
}
?>
EDIT
My hyperlink appears as:
http://www.empireonline.com/news/story.asp?NID=36903<br/><br/>
Has anyone ever come across this issue before? Is there a solution?
Many thanks for any pointers.
My bad.
All I needed to do was a string replace on any URLs to prepend facebook.com.
Here is my code in case it helps any others:
$content = str_replace(' href="/l.php', ' href="http://www.facebook.com/l.php',$content);
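If links other than /l.php ever show up, a slightly more general rewrite is possible; here is a sketch, assuming every root-relative href in the feed should point at facebook.com:
// Prefix any root-relative href with the Facebook domain.
$content = preg_replace(
    '# href="(/[^"]*)"#',
    ' href="http://www.facebook.com$1"',
    $content
);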

Why can I not scrape the title off this site?

I'm using simple-html-dom to scrape the title off of a specified site.
<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.pottermore.com/');
foreach($html->find('title') as $element)
echo $element->innertext . '<br>';
?>
Any other site I've tried works, apple.com for example.
But if I input pottermore.com, it doesn't output anything. Pottermore has flash elements on it, but the home screen I'm trying to scrape the title off of has no flash, just html.
This works for me :)
define('COOKIE', './cookie.txt'); // cookie jar file; must be writable

$url = 'http://www.pottermore.com/';
$html = get_html($url);
file_put_contents('page.htm', $html); // just to test what you have downloaded
echo 'The title from: '.$url.' is: '.get_snip($html, '<title>', '</title>');

function get_html($url)
{
    $ch = curl_init();
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows;U;Windows NT 5.0;en-US;rv:1.4) Gecko/20030624 Netscape/7.1 (ax)');
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE);
    curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

// Return the substring between $start and $end.
function get_snip($string, $start, $end, $trim_start = '1', $trim_end = '1')
{
    $startpos = strpos($string, $start);
    $endpos = strpos($string, $end, $startpos);
    if($trim_start != ''){
        $startpos += strlen($start); // drop the opening marker itself
    }
    if($trim_end == ''){
        $endpos += strlen($end); // keep the closing marker
    }
    return substr($string, $startpos, $endpos - $startpos);
}
Just to confirm what others are saying: if you don't send a user agent string, this site sends 403 Forbidden.
Adding this worked for me:
User-Agent: Mozilla/5.0 (Windows;U;Windows NT 5.0;en-US;rv:1.4) Gecko/20030624 Netscape/7.1 (ax)
The function file_get_html uses file_get_contents under the covers. This function can pull data from a URL, but to do so, it sends a User Agent string.
By default, this string is empty. Some webservers use this fact to detect that a non-browser is accessing its data and opt to forbid this.
You can set user_agent in php.ini to control the User Agent string that gets sent. Or, you could try:
ini_set('user_agent','UA-String');
with 'UA-String' set to whatever you like.
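To confirm the user agent really is the difference, you can compare the status line with and without one set; here is a quick sketch (the UA string is arbitrary, and $http_response_header is populated by file_get_contents even on failure):
// Without a user agent the site should answer 403; with one, 200.
// (@ suppresses the warning that file_get_contents raises on a 403.)
@file_get_contents('http://www.pottermore.com/');
echo $http_response_header[0], "\n"; // e.g. HTTP/1.1 403 Forbidden

ini_set('user_agent', 'Mozilla/5.0 (compatible; TestScript/1.0)');
@file_get_contents('http://www.pottermore.com/');
echo $http_response_header[0], "\n"; // e.g. HTTP/1.1 200 OK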

Grab imdb poster image from search term using php

I would like to get the poster image URL from IMDb using PHP, starting from a search term. For example, given the search term 21 Jump Street, I would like to get back the image URL, or just the IMDb movie URL. With the code below, what I still need is a way to retrieve the movie's URL from a search term.
Here is the code I have:
<?php
include("simple_html_dom.php");
//url to imdb page
$url = 'hereistheurliwanttogetfromsearch';
//get the page content
$imdb_content = file_get_contents($url);
$html = str_get_html($imdb_content);
$name = $html->find('title',0)->plaintext;
$director = $html->find('a[itemprop="director"]',0)->innertext;
$plot = $html->find('p[itemprop="description"]',0)->innertext;
$release_date = $html->find('time[itemprop="datePublished"]',0)->innertext;
$mpaa = $html->find('span[itemprop="contentRating"]',0)->innertext;
$run_time = $html->find('time[itemprop="duration"]',0)->innertext;
$img = $html->find('img[itemprop="image"]',0)->src;
$content = "";
//build content
$content.= '<h2>Film</h2><p>'.$name.'</p>';
$content.= '<h2>Director</h2><p>'.$director.'</p>';
$content.= '<h2>Plot</h2><p>'.$plot.'</p>';
$content.= '<h2>Release Date</h2><p>'.$release_date.'</p>';
$content.= '<h2>MPAA</h2><p>'.$mpaa.'</p>';
$content.= '<h2>Run Time</h2><p>'.$run_time.'</p>';
$content.= '<h2>Full Details</h2><p>'.$url.'</p>';
$content.= '<img src="'.$img.'" />';
echo $content;
?>
Using the API that Kasper Mackenhauer Jacobsen suggested, here's a fuller answer:
$url = 'http://www.imdbapi.com/?i=&t=21+jump+street';
$json_response = file_get_contents($url);
$object_response = json_decode($json_response);
if(!is_null($object_response) && isset($object_response->Poster)) {
    $poster_url = $object_response->Poster;
    echo $poster_url."\n";
}
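For an arbitrary search term, you can let PHP build the query string instead of hard-coding it; a small sketch against the same API:
// Build the request URL from a user-supplied title.
$term = '21 jump street';
$url = 'http://www.imdbapi.com/?' . http_build_query(array('t' => $term));
$response = json_decode(file_get_contents($url));
if ($response !== null && isset($response->Poster)) {
    echo $response->Poster, "\n";
}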
Parsing with regex is generally bad, but there is very little here that could break. It's advisable to use cURL, as it's faster and you can mask your user agent.
The main problem with getting the image from a search is that you first need to know the IMDb ID; then you can load the page and rip the image URL. Hope it helps.
<?php
// Is form posted
if($_SERVER['REQUEST_METHOD'] == 'POST'){
    $find = $_POST['find'];
    // Get IMDb code from search
    $source = file_get_curl('http://www.imdb.com/find?q='.urlencode(strtolower($find)).'&s=tt');
    if(preg_match('#/title/(.*?)/mediaindex#', $source, $match)){
        // Get main page for IMDb id
        $source = file_get_curl('http://www.imdb.com/title/'.$match[1]);
        // Grab the first .jpg image, which is always the main poster
        if(preg_match('#<img src="(.*?)\.jpg"#', $source, $match)){
            $imdb = $match[1].'.jpg'; // re-attach the extension the capture stripped
            // do something with the image
            echo '<img src="'.$imdb.'" />';
        }
    }
}

// The curl function
function file_get_curl($url){
    if(!function_exists('curl_init')){
        die('cURL must be installed');
    }
    $curl = curl_init();
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: ";
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_HEADER, true);
    curl_setopt($curl, CURLOPT_REFERER, $url);
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_TIMEOUT, 5);
    curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
    $html = curl_exec($curl);
    $status = curl_getinfo($curl);
    curl_close($curl);
    if($status['http_code'] != 200){
        // Follow 301/302 redirects manually.
        if($status['http_code'] == 301 || $status['http_code'] == 302) {
            list($header) = explode("\r\n\r\n", $html, 2);
            $matches = array();
            preg_match("/(Location:|URI:)[^(\n)]*/", $header, $matches);
            $url = trim(str_replace($matches[1], "", $matches[0]));
            $url_parsed = parse_url($url);
            return ($url_parsed !== false) ? file_get_curl($url) : '';
        }
        return FALSE;
    }else{
        return $html;
    }
}
?>
<form method="POST" action="">
<p><input type="text" name="find" size="20"><input type="submit" value="Submit"></p>
</form>

How to return an array from a function

I have a function which is supposed to return an array, but it is not working. When I try a print_r, nothing is returned. The strange thing is that if I put a print_r in the function just before the return, it returns the array properly. Hope someone can help. Thank you in advance for your replies. Cheers. Marc.
$url = "http://www.somesite.com";
$path = "somexpath";
$print = print_url_data($url, $path);
print_r($print);

function print_url_data($url, $path)
{
    $content = get_url_data($url, $path);
    foreach ($content as $value)
    {
        $output .= $value->nodeValue . "<br />";
    }
    return $output;
}

function get_url_data($url, $path)
{
    $xml_content = get_url($url);
    $dom = new DOMDocument();
    @$dom->loadHTML($xml_content);
    $xpath = new DomXPath($dom);
    $content_title = $xpath->query($path);
    $tableau = array();
    foreach ($content_title as $node)
        array_push($tableau, utf8_decode(urldecode($node->nodeValue)));
    return $tableau; // What is being returned to the function call
}

function get_url($url)
{
    $curl = curl_init();
    // Setup headers - I used the same headers from Firefox version 2.0.0.6
    // below was split up because php.net said the line was too long. :/
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($curl); // execute the curl command
    curl_close($curl); // close the connection
    return $html; // and finally, return $html
}
Make sure errors and warnings are enabled in your php.ini file. It may help.
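If you can't edit php.ini, the same switches can be flipped at runtime at the top of the script:
// Report every error and print it to the page while debugging.
error_reporting(E_ALL);
ini_set('display_errors', '1');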
Sorry, my mistake: I misplaced the array construction. Below is the code that works, for those who are interested. Thanks to all who took the time to help me out. Cheers.
<?php
$url = "http://www.somesite.com";
$path = "somexpath";
print_r(print_url_data($url, $path));

///////////////////////////////////

function print_url_data($url, $path)
{
    $content = get_url_data($url, $path);
    $tableau = array();
    foreach ($content as $value)
    {
        array_push($tableau, $value->nodeValue);
    }
    return $tableau;
}

function get_url_data($url, $path)
{
    $xml_content = get_url($url);
    $dom = new DOMDocument();
    @$dom->loadHTML($xml_content);
    $xpath = new DomXPath($dom);
    $content_title = $xpath->query($path);
    return $content_title;
}

function get_url($url)
{
    $curl = curl_init();
    // Setup headers - I used the same headers from Firefox version 2.0.0.6
    // below was split up because php.net said the line was too long. :/
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // browsers keep this blank.
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)');
    curl_setopt($curl, CURLOPT_HTTPHEADER, $header);
    curl_setopt($curl, CURLOPT_REFERER, 'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING, 'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER, true);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($curl); // execute the curl command
    curl_close($curl); // close the connection
    return $html; // and finally, return $html
}
Edit:
In light of the new code that you have posted, change:
$output .= $value->nodeValue . "<br />";
to
$output .= $value . "<br />";
This is working for me; I can post a link to a test script on my server if you would like.
(You were referencing $content as an object, but it was declared as an array :] )
If by $node->nodeValue you are parsing XML: I had a similar problem with this a while back. I found that explicitly casting the nodeValue to a string when adding it to the array fixed my problem.
I believe this may be because a reference to the XML object is added to the array instead of the string; once your function ends, the XML object is destroyed and the data is no longer accessible. Hope this helps :)
Example:
array_push($tableau, (string) utf8_decode(urldecode($node->nodeValue)));

CURL Authentication being lost?

I am authenticating a login via cURL just fine. I have a variable I am using to display the returned HTML, and it is returning my user control panel as if I am logged in.
After authenticating, I want to communicate variables with a form on another page within the site, but for some reason the HTML from that page is returning a non-authenticated version of the header (as if the original authentication never took place).
I have a cookies.txt file with 777 permissions, and I have tried just getting the contents of the same page shown when I authenticate; it is as if I am losing any associated session/cookie data somewhere along the way.
Here is my curl.class file -
<?php
class Curl {
    public $curl;
    public $cookieJar = "";

    // Make sure the cookies.txt file has read/write permissions
    public function __construct($cookieJarFile = 'cookies.txt') {
        $this->cookieJar = $cookieJarFile;
    }

    function setup() {
        $header = array();
        $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
        $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
        $header[] = "Cache-Control: max-age=0";
        $header[] = "Connection: keep-alive";
        $header[] = "Keep-Alive: 300";
        $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[] = "Accept-Language: en-us,en;q=0.5";
        $header[] = "Pragma: "; // browsers keep this blank.
        curl_setopt($this->curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7');
        curl_setopt($this->curl, CURLOPT_HTTPHEADER, $header);
        curl_setopt($this->curl, CURLOPT_COOKIEJAR, $this->cookieJar);
        curl_setopt($this->curl, CURLOPT_COOKIEFILE, $this->cookieJar);
        curl_setopt($this->curl, CURLOPT_AUTOREFERER, true);
        curl_setopt($this->curl, CURLOPT_COOKIESESSION, true);
        curl_setopt($this->curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($this->curl, CURLOPT_RETURNTRANSFER, true);
    }

    function get($url) {
        $this->curl = curl_init($url);
        $this->setup();
        return $this->request();
    }

    function getAll($reg, $str) {
        preg_match_all($reg, $str, $matches);
        return $matches[1];
    }

    function postForm($url, $fields, $referer = '') {
        $this->curl = curl_init($url);
        $this->setup();
        curl_setopt($this->curl, CURLOPT_URL, $url);
        curl_setopt($this->curl, CURLOPT_POST, 1);
        curl_setopt($this->curl, CURLOPT_REFERER, $referer);
        curl_setopt($this->curl, CURLOPT_POSTFIELDS, $fields);
        return $this->request();
    }

    function getInfo($info) {
        $info = ($info == 'lasturl') ? curl_getinfo($this->curl, CURLINFO_EFFECTIVE_URL) : curl_getinfo($this->curl, $info);
        return $info;
    }

    function request() {
        return curl_exec($this->curl);
    }
}
?>
And here is my curl.php file -
<?php
include('curl.class.php'); // This path would change to where you store the file

$curl = new Curl();
$url = "http://www.site.com/public/member/signin";
$fields = "MAX_FILE_SIZE=50000000&dado_form_3=1&member[email]=email&member[password]=pass&x=16&y=5&member[persistent]=true";
// Calling URL
$referer = "http://www.site.com/public/member/signin";
$html = $curl->postForm($url, $fields, $referer);
echo($html);
?>
<hr style="clear:both;"/>
<?php
$html = $curl->postForm('http://www.site.com/index.php', 'nid=443&sid=733005&tab=post&eval=yes&ad=&MAX_FILE_SIZE=10000000&ip=63.225.235.30', 'http://www.site.com/public/member/signin');
echo $html; // This will show you the HTML of the current page you are logged into
?>
Any ideas?
As always when doing HTTP scripting, you should use LiveHTTPHeaders or similar to record a manual session first, and then mimic that as closely as possible when you write your curl code.
Also, the command-line curl tool unfortunately offers slightly better debug and tracing options than the PHP binding does, which makes it the better tool for working out exactly what you need to do; once that works, you can convert it to a PHP program.
See http://curl.haxx.se/docs/httpscripting.html for further details.
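That said, you can get some tracing out of the PHP binding too; here is a sketch using standard cURL options (the URL is the login page from the question):
// Dump the full request/response conversation to STDERR and keep a copy
// of the outgoing request headers for inspection.
$curl = curl_init('http://www.site.com/public/member/signin');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_VERBOSE, true);
curl_setopt($curl, CURLINFO_HEADER_OUT, true);
curl_exec($curl);
echo curl_getinfo($curl, CURLINFO_HEADER_OUT);
curl_close($curl);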
Err, please tell us what authentication scheme the server is using. Not all schemes use cookies.
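A quick way to check is to look for a WWW-Authenticate response header; a sketch with get_headers() (again using the question's login URL):
// HTTP auth (Basic/Digest/NTLM) announces itself via WWW-Authenticate;
// if the header is absent, the login is almost certainly cookie-based.
$headers = get_headers('http://www.site.com/public/member/signin', 1);
if (isset($headers['WWW-Authenticate'])) {
    echo 'HTTP auth in use: ' . print_r($headers['WWW-Authenticate'], true);
} else {
    echo 'No WWW-Authenticate header; likely a cookie/session login.';
}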
