Get Links with simple html dom - php

I'm trying to get links from the site
"http://www.perfumesclub.com/es/perfume/mujer/c/".
For this I send a "User-Agent" header and parse the page with Simple HTML DOM, but I get this error:
Fatal error: Call to a member function find() on string in C:\Users\Desktop\www\funciones.php on line 448
This is my code (thanks in advance!):
$url = 'http://www.perfumesclub.com/es/perfume/mujer/c/';
$option = array(
    'http' => array(
        'method' => 'GET',
        'header' => 'User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    )
);
$context = stream_context_create($option);
$html = new simple_html_dom();
$html = file_get_contents ($url, false, $context);
$perfumes = $html->find('.imageProductDouble'); // this is line 448
foreach($perfumes as $perfume) {
    // Get the link
    $enlaces = "http://www.perfumesclub.com" . $perfume->href;
    echo $enlaces . "<br/>";
}

Wrap your file_get_contents call in the str_get_html function:
// method 1
$html = new simple_html_dom();
$html->load( file_get_contents ($url, false, $context) );
// or method 2
$html = str_get_html( file_get_contents ($url, false, $context) );
You're creating a new DOM and assigning it to the variable $html, then reading the URL, which returns a string, and assigning that string to $html. This overwrites your simple_html_dom instance, so when you invoke the find method you have a string instead of an object.

$html is a string after the call to file_get_contents. Try
$html = file_get_html($url);
OR use
$html = str_get_html($html);
after the call to file_get_contents.
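For comparison, here is a minimal sketch of the same link extraction using PHP's built-in DOM extension instead of Simple HTML DOM, so no third-party library is needed. The point is the same as in the answers above: the string returned by file_get_contents must be parsed into a document object before you can query it. The function name extractProductLinks is hypothetical, and the sketch assumes the .imageProductDouble elements are <a> tags with site-relative href values; the sample HTML is made up.

```php
<?php
// Hypothetical helper: parse an HTML string and return absolute links for every
// element carrying the class "imageProductDouble" (assumed to be <a> tags).
function extractProductLinks(string $html, string $baseUrl): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // XPath equivalent of the CSS class selector ".imageProductDouble".
    $nodes = $xpath->query(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' imageProductDouble ')]"
    );

    $links = [];
    foreach ($nodes as $node) {
        $href = $node->getAttribute('href');
        if ($href !== '') {
            $links[] = $baseUrl . $href;
        }
    }
    return $links;
}

// In the real script, $html would come from file_get_contents($url, false, $context).
$html = '<div><a class="imageProductDouble" href="/es/perfume-a/p/">A</a>'
      . '<a class="other" href="/x/">X</a></div>';
print_r(extractProductLinks($html, 'http://www.perfumesclub.com'));
```

Only the element with the matching class is returned; the link with class "other" is skipped.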

Related

Get content of search result with file_get_contents - PHP

Is there a way to get the content of a search result with file_get_contents? I am trying to do this for this site's search results:
http://brillia.com/search/?attribute=1&area=13900,13100,13200,14999,12999,11999
but it doesn't give me the content for the query part ?attribute=1&area=13900,13100,13200,14999,12999,11999. Is something missing in my function, or is file_get_contents not enough for this?
function pageContent(string $url): \DOMDocument
{
    $html = cache()->rememberForever($url, function () use ($url) {
        $opts = [
            "http" => [
                "method" => "GET",
                "header" => "Accept: text/html\r\n"
            ]
        ];
        $context = stream_context_create($opts);
        $file = file_get_contents($url, false, $context);
        return $file;
    });
    $parser = new \DOMDocument();
    libxml_use_internal_errors(true);
    $parser->loadHTML($html = mb_convert_encoding($html,'HTML-ENTITIES', 'ASCII, JIS, UTF-8, EUC-JP, SJIS'));
    return $parser;
}
The page at that URL loads its results with a separate Ajax call to:
http://brillia.com/api/search/?area=13900,13100,13200,14999,12999,11999&key=2CsR0Bzv&mode=1&attribute=1&area=13900%2C13100%2C13200%2C14999%2C12999%2C11999&_=1552729056711
This will give you the desired result.
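<?php
function pageContent( $url ) {
    header('Content-type: text/html; charset=EUC-JP');
    // The <base> tag makes relative URLs in the proxied page resolve against brillia.com
    echo '<base href="http://brillia.com">';
    echo file_get_contents($url);
}
// pageContent() prints the page itself and returns nothing, so call it without echo
pageContent('http://brillia.com/search/?attribute=1&area=13900,13100,13200,14999,12999,11999');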
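If you want to call the Ajax endpoint directly instead of proxying the whole page, the query string can be built with http_build_query rather than by hand. This is only a sketch: the helper name buildSearchApiUrl is hypothetical, the parameter names are copied from the API URL above, and it omits the duplicated area parameter and the trailing _ parameter. Whether the endpoint needs those, and whether the key value stays valid over time, is untested.

```php
<?php
// Hypothetical helper: build the search API URL from its parameters.
// Parameter names are copied from the URL observed in the browser.
function buildSearchApiUrl(array $areas, string $key, int $mode, int $attribute): string
{
    $query = http_build_query([
        'area'      => implode(',', $areas), // http_build_query encodes the commas as %2C
        'key'       => $key,
        'mode'      => $mode,
        'attribute' => $attribute,
    ]);
    return 'http://brillia.com/api/search/?' . $query;
}

$apiUrl = buildSearchApiUrl(['13900', '13100', '13200', '14999', '12999', '11999'], '2CsR0Bzv', 1, 1);
echo $apiUrl, "\n";
// The response can then be fetched as before, e.g.:
// $body = file_get_contents($apiUrl);
```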

Instagram Scraping in PHP

I want to add an Instagram followers feature to my project.
<?php
function callInstagram($url)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => false,
        CURLOPT_SSL_VERIFYHOST => 2));
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
$url = "https://www.instagram.com/xyz/";
$dom = new domDocument();
$dom->loadHTML($result);
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('script type');
?>
I am using DOM to get the content of the HTML from 'script type' onwards, but I'm not able to get it.
You need to actually call the callInstagram($url) function; otherwise your $result variable is undefined. The main routine should therefore begin like this (note the added second line):
$url = "https://www.instagram.com/ravij28/";
$result = callInstagram($url);
$dom = new DOMDocument();
$dom->loadHTML($result);
[..]
Also, when you want to retrieve the scripts on the page, you need to use the tag name, which is just script, not script type. So, the last line of your snippet needs to read:
$tables = $dom->getElementsByTagName('script');
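Once you have the script elements, the type is read as an attribute and the code as the element's text. A minimal sketch, with a hypothetical listScripts helper and made-up HTML standing in for the fetched page:

```php
<?php
// Hypothetical helper: list every <script> element with its type attribute and text.
function listScripts(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // suppress warnings about non-standard HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $scripts = [];
    // The tag name is "script"; "type" is an attribute, not part of the tag name.
    foreach ($dom->getElementsByTagName('script') as $script) {
        $scripts[] = [
            'type' => $script->getAttribute('type'), // '' when the attribute is absent
            'text' => trim($script->textContent),
        ];
    }
    return $scripts;
}

$html = '<html><head><script type="text/javascript">var a = 1;</script>'
      . '<script src="/app.js"></script></head><body></body></html>';
print_r(listScripts($html));
```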

Scraping in PHP

I want to write code that, given a username, dumps the highlighted value below (the number of followers) from the page source of any Instagram user.
I know a bit about cURL and the DOM.
function callInstagram($url)
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_SSL_VERIFYPEER => false,
        CURLOPT_SSL_VERIFYHOST => 2
    ));
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
$url = "https://www.instagram.com/xyz/";
$dom = new DOMDocument();
$dom->loadHTML(callInstagram($url));
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('script');
print_r($tables);
?>
Still building.
It looks like you are trying to get Instagram's data.
It's better to use Instagram's API to achieve your goal.
Link: https://www.instagram.com/developer/
Edit:
Another way: assuming you can get the whole HTML as a string,
use a regex to extract the JSON string.
You can use this regex: _sharedData = (.*);
Finally, use json_decode to convert the JSON string into a PHP array or object.
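A minimal sketch of that regex plus json_decode approach follows. It uses a lazy variant of the regex above, anchored on the closing </script> tag, so the capture cannot run past the end of the assignment. The helper name extractSharedData is hypothetical, and the HTML and JSON are made up; Instagram's real markup may differ or may no longer contain _sharedData at all.

```php
<?php
// Hypothetical helper: pull the JSON assigned to _sharedData out of the page HTML.
function extractSharedData(string $html): ?array
{
    // Lazy (.*?) stops at the first ";" that is followed by </script>;
    // the "s" modifier lets "." match newlines in case the JSON spans lines.
    if (!preg_match('#_sharedData\s*=\s*(.*?);\s*</script>#s', $html, $m)) {
        return null;
    }
    $data = json_decode($m[1], true); // true => decode objects as associative arrays
    return is_array($data) ? $data : null;
}

$html = '<script type="text/javascript">window._sharedData = {"followers": 123};</script>';
$data = extractSharedData($html);
echo $data['followers'], "\n";
```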

Scraping iframe video from other sites through PHP

I want to scrape videos from other sites (e.g. a live video site) into my own site.
How can I scrape the <iframe> video from other websites? Is the process the same as that for scraping images?
$html = file_get_contents('http://website.com/');
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$iframes = $dom->getElementsByTagName('frame');
foreach ($iframes as $iframe) {
    $pic = $iframe->getAttribute('src');
    echo '<li><frame src="'.$pic.'"';
}
This post is a little old, but still, here's my answer:
I'd recommend using cURL and XPath to scrape the site and parse the HTML data. file_get_contents can only fetch remote URLs when allow_url_fopen is enabled, and some hosts disable it for security reasons. You could do something like this:
<?php
function scrape($URL){
    //cURL options
    $options = Array(
        CURLOPT_RETURNTRANSFER => TRUE, //return html data in string instead of printing it out on screen
        CURLOPT_FOLLOWLOCATION => TRUE, //follow header('Location: location');
        CURLOPT_CONNECTTIMEOUT => 60, //max time to try to connect to page
        CURLOPT_HEADER => FALSE, //don't include the response headers in the output
        CURLOPT_USERAGENT => "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0", //User Agent
        CURLOPT_URL => $URL //SET THE URL
    );
    $ch = curl_init($URL);//initialize a cURL session
    curl_setopt_array($ch, $options);//set the cURL options
    $data = curl_exec($ch);//execute cURL (the scraping)
    curl_close($ch);//close the cURL session
    return $data;
}
function parse(&$data, $query, &$dom){
    $Xpath = new DOMXpath($dom); //new Xpath object associated to the domDocument
    $result = $Xpath->query($query);//run the Xpath query through the HTML
    var_dump($result);
    return $result;
}
//new domDocument
$dom = new DomDocument("1.0");
//Scrape and parse
$data = scrape('http://stream-tv-series.net/2013/02/22/new-girl-s1-e6-thanksgiving/'); //scrape the website
libxml_use_internal_errors(true); //don't print warnings about malformed HTML
$dom->loadHTML($data); //load the html data into the dom; without this the Xpath query has nothing to search
$XpathQuery = '//iframe'; //Your Xpath query could look something like this
$iframes = parse($data, $XpathQuery, $dom); //parse the HTML with Xpath
foreach($iframes as $iframe){
    $src = $iframe->getAttribute('src'); //get the src attribute
    echo '<li><iframe src="' . $src . '"></iframe></li>'; //echo the iframes
}
?>
Here are some links that you could find useful:
cURL: http://php.net/manual/fr/book.curl.php
Xpath: http://www.w3schools.com/xpath/
There is also the DOMDocument documentation on php.net. I can't post the link because I don't have enough reputation.
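On the question of whether this is the same as scraping images: as far as the parsing goes, it is. A minimal sketch, with a hypothetical extractSrc helper and made-up HTML, where only the tag name in the XPath query changes. The tag name is concatenated into the query, so it should be a hard-coded value rather than user input.

```php
<?php
// Hypothetical helper: return the src attribute of every <$tag> element that has one.
function extractSrc(string $html, string $tag): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $sources = [];
    foreach ($xpath->query('//' . $tag . '[@src]') as $node) {
        $sources[] = $node->getAttribute('src');
    }
    return $sources;
}

$html = '<body><img src="/cover.jpg">'
      . '<iframe src="https://player.example.com/embed/1"></iframe></body>';

foreach (extractSrc($html, 'iframe') as $src) {
    // Escape the URL before writing it back into HTML.
    echo '<li><iframe src="' . htmlspecialchars($src) . '"></iframe></li>', "\n";
}
```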

PHP: how to load file from different server as string?

I am trying to load an XML file from a different domain name as a string. All I want is an array of the text within the <title></title> tags of the XML file. Since I am using PHP 4, I think the easiest way would be to run a regex on it to get them. Can someone explain how to load the XML as a string? Thanks!
You could use cURL like the example below. I should add that regex-based XML parsing is generally not a good idea, and you may be better off using a real parser, especially if it gets any more complicated.
You may also want to add some regex modifiers to make it work across multiple lines etc., but I assume the question is more about fetching the content into a string.
<?php
$curl = curl_init('http://www.example.com');
//make content be returned by curl_exec rather than being printed immediately
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($curl);
if ($result !== false) {
    //lazy (.*?) so the match stops at the first </title>
    if (preg_match('|<title>(.*?)</title>|i', $result, $matches)) {
        echo "Title is '{$matches[1]}'";
    } else {
        //did not find the title
    }
} else {
    //request failed
    die (curl_error($curl));
}
curl_close($curl);
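Since the question asks for an array of all titles, preg_match_all is the natural extension of the regex above, and it is available in PHP 4. A minimal sketch with a hypothetical getTitlesByRegex helper, written without type declarations so it stays close to PHP 4 syntax; as noted above, this is fragile (it misses <title> tags with attributes and does not handle CDATA or comments).

```php
<?php
// Hypothetical helper: return the text of every <title>...</title> pair.
function getTitlesByRegex($xml)
{
    // i: case-insensitive; s: "." also matches newlines, so multi-line titles are captured;
    // ".*?" is lazy, so each match stops at the next </title>.
    if (preg_match_all('|<title>(.*?)</title>|is', $xml, $matches)) {
        return $matches[1];
    }
    return array();
}

$xml = "<channel><title>Feed</title>\n<item><title>First\nline</title></item></channel>";
print_r(getTitlesByRegex($xml));
```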
First use
$xml = file_get_contents('http://www.example.com/');
to get the file into a variable, then parse the XML. The manual page
http://php.net/manual/en/function.xml-parse.php
has examples in the comments.
If you're loading well-formed XML, skip the character-based parsing and use the DOM functions:
$d = new DOMDocument;
$d->load("http://url/file.xml");
$titles = $d->getElementsByTagName('title');
if ($titles->length > 0) { // a DOMNodeList object is always truthy, so check its length
    echo $titles->item(0)->nodeValue;
}
If you can't use DOMDocument::load() because of how PHP is set up, then use cURL to grab the file and do:
$d = new DOMDocument;
$d->loadXML($grabbedfile);
...
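To get every title into an array, which is what the question asks for, loop over the node list instead of reading only item(0). A minimal sketch with a hypothetical getTitles helper, where $xml stands in for the downloaded file. Note that the DOM extension requires PHP 5 or later, so this does not apply to the asker's PHP 4 setup.

```php
<?php
// Hypothetical helper: collect the text of every <title> element into an array.
function getTitles(string $xml): array
{
    libxml_use_internal_errors(true); // don't print parser warnings for malformed input
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        return []; // not well-formed XML
    }
    $titles = [];
    foreach ($doc->getElementsByTagName('title') as $node) {
        $titles[] = $node->nodeValue;
    }
    return $titles;
}

$xml = '<rss><channel><title>Feed</title>'
     . '<item><title>First</title></item><item><title>Second</title></item>'
     . '</channel></rss>';
print_r(getTitles($xml));
```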
I have this function as a snippet:
function getHTML($url) {
    if (empty($url)) return false;
    $options = array(
        CURLOPT_URL => $url, // URL of the page
        CURLOPT_RETURNTRANSFER => true, // return web page
        CURLOPT_HEADER => false, // don't return headers
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING => "", // handle all encodings
        CURLOPT_USERAGENT => "spider", // who am i
        CURLOPT_AUTOREFERER => true, // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
        CURLOPT_TIMEOUT => 120, // timeout on response
        CURLOPT_MAXREDIRS => 3, // stop after 3 redirects
    );
    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $header = curl_getinfo( $ch );
    curl_close( $ch );
    //Ending all that cURL mess...
    if ($content === false) return false; // request failed
    //Removing line breaks, tabs, NUL bytes and vertical tabs for easier regexing
    $content = str_replace(array("\n", "\r", "\t", "\0", "\x0B"), '', $content);
    //Collapsing runs of whitespace into a single space
    $content = preg_replace('/\s\s+/', ' ', $content);
    return $content;
}
That returns the HTML on a single line, with no line breaks, tabs, or runs of whitespace.
So now you do this preg_match:
$html = getHTML($url);
preg_match('|<title>(.*)</title>|iUsm',$html,$matches);
and $matches[1] will have the info you need.
