Grab image url using php DOM - php

little problem with php DOM.
<?php
$url = "http://www.dogpile.com/search/images?q=" . rawurlencode($_GET['q']);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13");
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$output = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument;
$dom->loadHTML($output);
print_r($dom->getElementById('imageResults'));
?>
I'm trying to get thumbnail and full image url from the results. But I cannot grab required info...

http://simplehtmldom.sourceforge.net/manual.html
Download library read manual and use it!

Related

grab redirected url by using cURL

I am trying to find where I'll be redirected at. So I tried to functions for this, but none of those are working properly.
the links is here. when you try to enter, you will be redirected:
https://lions-mansion.jp/MA141070/
so I tried use cURL,
function redirect1($url) {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$data = curl_exec($ch);
$data = curl_getinfo($ch,CURLINFO_EFFECTIVE_URL );
curl_close($ch);
return $data;
}
and also this:
function redirect($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
if (preg_match('~Location: (.*)~i', $result, $match)) {
$location = trim($match[1]);
}
return $result;
}
But I couldn't find the redirected url.
this page does not use a redirect-scheme that libcurl understands (it uses a html <meta http-equiv="REFRESH"-redirect, unsupported by libcurl), so libcurl can neither tell you where it is being redirected, nor can libcurl auto-follow the redirect (because libcurl does not understand it)
you need to parse out the redirect url yourself from the HTML, eg
function redirect1($url) {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,0);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
$data = curl_exec($ch);
$domd=#DOMDocument::loadHTML($data);
$xp=new DOMXPath($domd);
// <META http-equiv="REFRESH" content="0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo" />
$location=$xp->query("//meta[#http-equiv='REFRESH']")->item(0)->getAttribute("content");
// 0;URL=http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo
$location=substr($location,stripos($location,'URL=')+4);
curl_close($ch);
return $location;
}
var_dump(redirect1('https://lions-mansion.jp/MA141070/'));
output:
C:\projects\misc>php re.php
string(57) "http://sumai.tokyu-land.co.jp/branz/roppongi4/?iad=daikyo"
If keep you CURLOPT_RETURNTRANSFER to true, after executing the CURL command you can use this function call to get the redirect of effective URL:
$finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

CURL/PHP - Trying to parse link giving Call to undefined method DOMDocument::query()

Here is my code:
<?php
$url = "http://www.sportsdirect.com/adidas-goletto-mens-astro-turf-trainers-263244?colcode=26324408";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSLVERSION, 3);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curl, CURLOPT_VERBOSE, true);
$str = curl_exec($curl);
curl_close($curl);
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile($str);
$xpath = new DOMXpath($doc);
$name = $doc->query('//span[#id="ProductName"]')->item(0)->nodeValue;
echo $name;
?>
I am trying to get the name of the product but i receive:
Fatal error: Call to undefined method DOMDocument::query() in /public_html/test.php on line 24
Can you please help me out resolve my problem and find my mistake because i am stuck on this problem for hours?
I just want to get the name of the product which is contained in span element with id ProductName.
Thanks in advance!
You can get the desired result by using PHP regexes:
<?php
$url = "http://www.sportsdirect.com/adidas-goletto-mens-astro-turf-trainers-263244?colcode=26324408";
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSLVERSION, 3);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curl, CURLOPT_VERBOSE, true);
$str = curl_exec($curl);
curl_close($curl);
preg_match_all("/id=\"ProductName\"(.+?)>(.+?)</", $str, $str_out);
$name = $str_out[2][0];
echo $name;
?>
This works using preg_match_all() to extract the product name.

Why CURLINFO_EFFECTIVE_URL is not working on the link?

CURLINFO_EFFECTIVE_URL is refusing to work on this URL.
The result is exactly the same as the original URL, but if you follow the URL in your browser, it clearly redirects.
What is going on here?
Here is my code:
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_exec($ch);
$redirectURL = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
echo $redirectURL;
curl_close($ch);

CURL statements combine into one

I have a bunch of curl statements in my app controller file. How can I possibly combine them all into one curl statement. Is it even possible. Thanks
// used for curl and curl_post
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
// used for curl_twitter
curl_setopt($ch, CURLOPT_USERAGENT,'spider');
// used for curl_post, curl_auth
curl_setopt($ch,CURLOPT_POST, count($post_data_count));
curl_setopt($ch,CURLOPT_POSTFIELDS, $post_data_string);
// used for curl_twitter
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);C
If you use the same curl statement evey time you should create a function and juste pass by parameter the wanted change.
function curlPerform($url, $postfields, $useragent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'){
$ch = curl_init($url);
// used for curl and curl_post
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
// used for curl_post, curl_auth
curl_setopt($ch,CURLOPT_POST, count($postfields));
curl_setopt($ch,CURLOPT_POSTFIELDS, $postfields);
// used for curl_twitter
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$reponse = curl_exec($ch);
curl_close($ch);
return $reponse;
}
$reponse = curlPerform($yourUrl, $yourPostFields); // Each time that you want to run a curl request just call this function DRY
Or use curl_setopt_array
$ch = curl_init();
$options = array(CURLOPT_URL => 'http://www.example.com/',
CURLOPT_HEADER => false
);
curl_setopt_array($ch, $options);
curl_exec($ch);
curl_close($ch);

php curl cookie not registering

I've been playing with this curl facebook login script for a while just trying to get to grips with some of the features in curl, but it seems that I can not get the cookies to register:
php script
function facebookLogin(){
$login_email = 'email';
$login_pass = 'pass';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.facebook.com/login.php');
curl_setopt($ch, CURLOPT_POSTFIELDS,'email='.urlencode($login_email).'&pass='.urlencode($login_pass).'&login=Login');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "http://www.facebook.com");
$page = curl_exec($ch);
echo $page;
}
I have a text file called cookies.txt which is in the same directory as the script, but after running this script nothing is written into the file and therefore no cookies are created, this is a big issue when trying to explore other web pages on the same website as you have to keep logging in.
Where am I going wrong?
Ok it turns out it is registered even if the cookies.txt file is empty but you need to make sure you call this file when you try to explore other parts of the site e.g.
function facebookGoToMessages(){
facebookLogin();
$ch = curl_init ("http://www.facebook.com/messages");
curl_setopt ($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "http://www.facebook.com");
$page = curl_exec ($ch);
echo $page;
}

Categories