Why doesn't the query match the DOM? - php

Here is my code:
$res = file_get_contents("http://www.lenzor.com/photo/search/index/type/user/%D8%B9%D9%84%DB%8C//text/%D9%81%D8%A7%D8%B7%D9%85%D9%87");
$doc = new \DOMDocument();
@$doc->loadHTMLFile($res);
$xpath = new \DOMXpath($doc);
$links = $xpath->query("//ul[@class='user_box']/li");
$result = array();
if (!is_null($links)) {
foreach ($links as $link) {
$href = $link->getAttribute('class');
$result[] = [$href];
}
}
print_r($result);
Here is the content I'm working on, i.e. the result of echo $res.
The result of my code is an empty array, so $links is empty and the foreach never executes. Why doesn't the query //ul[@class='user_box']/li match the DOM?
The expected result is an array containing the class attribute of each li.

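Try this; I hope it helps. There are a few mistakes in your code.
1. You should search like this '//ul[@class="user_box clearfix"]/li' because the class attribute in that HTML source is class="user_box clearfix", i.e. it contains two classes, and @class='user_box' only matches an attribute whose whole value is exactly user_box.
2. You should use loadHTML instead of loadHTMLFile: loadHTMLFile expects a file path or URL, while $res already holds the HTML string.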
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$res = file_get_contents("http://www.lenzor.com/photo/search/index/type/user/%D8%B9%D9%84%DB%8C//text/%D9%81%D8%A7%D8%B7%D9%85%D9%87");
$doc = new \DOMDocument();
$doc->loadHTML($res);
$xpath = new \DOMXpath($doc);
$links = $xpath->query('//ul[@class="user_box clearfix"]/li');
$result = array();
if (!is_null($links)) {
foreach ($links as $link) {
$href = $link->getAttribute('class');
$result[] = [$href];
}
}
print_r($result);
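An exact match on the full attribute string stops working as soon as the site adds or reorders a class. A common XPath 1.0 idiom matches a single class token instead. This is only a minimal sketch against made-up markup (the HTML snippet and the li class names are assumptions, not the real lenzor.com page):

```php
<?php
// Hypothetical markup standing in for the real page.
$html = '<html><body>'
      . '<ul class="user_box clearfix"><li class="first">A</li><li class="second">B</li></ul>'
      . '<ul class="other"><li class="skip">C</li></ul>'
      . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Pad the class attribute with spaces so "user_box" matches only as a whole token.
$query = "//ul[contains(concat(' ', normalize-space(@class), ' '), ' user_box ')]/li";

$result = [];
foreach ($xpath->query($query) as $li) {
    $result[] = $li->getAttribute('class');
}
print_r($result);
```

With this query the ul is found whether its class attribute is "user_box", "user_box clearfix", or "clearfix user_box".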

Related

Getting data from HTML using DOMDocument

I'm trying to get data from HTML using DOM. I can get some data, but can't figure out how to get the rest. Here is an image highlighting the data I want.
http://i.imgur.com/Es51s5s.png
Here is the code itself:
http://pastebin.com/Re8qEivv
And here is my PHP code:
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
$td1 = $td->item(1);
$td2 = $td->item(2);
foreach ($td1->childNodes as $node){
$title = $node->textContent;
}
foreach ($td2->childNodes as $node){
$type = $node->textContent;
}
}
Figured it out
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
$td1 = $td->item(1);
$td2 = $td->item(2);
$title = $td1->childNodes->item(0)->textContent;
$firstURL = $td1->getElementsByTagName('a')->item(0)->getAttribute('href');
$type = $td2->childNodes->item(0)->textContent;
$imageURL = $td2->getElementsByTagName('img')->item(0)->getAttribute('src');
}
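One caveat with the loop above: a header row (th cells only) or a row with fewer than three cells makes item(1) or item(2) return null, and the next method call then fails. A hedged sketch of the same extraction with guards, using a made-up table in place of result.html (the markup and the $rows array are assumptions for illustration):

```php
<?php
// Hypothetical table standing in for result.html.
$html = '<table>'
      . '<tr><th>No</th><th>Title</th><th>Type</th></tr>'
      . '<tr><td>1</td><td><a href="/item/1">First</a></td><td><img src="/img/a.png"></td></tr>'
      . '</table>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$rows = [];
foreach ($dom->getElementsByTagName('tr') as $row) {
    $td = $row->getElementsByTagName('td');
    if ($td->length < 3) {
        continue; // header or short row: item(1)/item(2) would be null
    }
    // item(0) returns null when the cell has no <a> or <img>.
    $a   = $td->item(1)->getElementsByTagName('a')->item(0);
    $img = $td->item(2)->getElementsByTagName('img')->item(0);
    $rows[] = [
        'title'    => trim($td->item(1)->textContent),
        'firstURL' => $a ? $a->getAttribute('href') : null,
        'imageURL' => $img ? $img->getAttribute('src') : null,
    ];
}
print_r($rows);
```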
I have used the following class:
http://sourceforge.net/projects/simplehtmldom/
It is a very simple, easy-to-use class.
You can use
$html->find('#RosterReport > tbody', 0);
to find a specific table, and
$html->find('tr')
$html->find('td')
to find table rows or columns.
Note that $html is a variable holding the full HTML DOM content.

PHP Dom Getting Multiple href From Class

Could someone please help me out?
I'm trying to get multiple hrefs from a page, for example:
The page
<div class="link__ttl">
Version 1
</div>
<div class="link__ttl">
Version 1
</div>
PHP Dom
$data = array();
$data['links'] = array();
$page = $this->curl->get($page);
$dom = new DOMDocument();
@$dom->loadHTML($page);
$divs = $dom->getElementsByTagName('div');
for($i=0;$i<$divs->length;$i++){
if ($divs->item($i)->getAttribute("class") == "link__ttl") {
foreach ($divs as $div) {
$link = $div->getElementsByTagName('a');
$data['links'][] = $link->getAttribute("href");
}
}
}
But this doesn't seem to work and I get an error:
Call to undefined method DOMNodeList::getAttribute()
Could someone help me out here, please? Thanks.
You're testing the divs for the link__ttl class, but then looping over all the divs with foreach anyway. Take only the anchors from the divs that have the class.
You're also trying to call getAttribute on a DOMNodeList; you need to get the underlying DOMNode (for example with item(0)) to read the attribute.
$divs = $dom->getElementsByTagName('div');
for($i=0;$i<$divs->length;$i++){
$div = $divs->item($i);
if ($div->getAttribute("class") == "link__ttl") {
$link = $div->getElementsByTagName('a');
$data['links'][] = $link->item(0)->getAttribute("href");
}
}
Another solution is to use xpath
$path = new DOMXPath($dom);
$as = $path->query('//div[@class="link__ttl"]/a');
for($i=0;$i<$as->length;$i++){
$data['links'][] = $as->item($i)->getAttribute("href");
}
http://codepad.org/pX5qA1BB
$link = $div->getElementsByTagName('a'); retrieves a list of items (a DOMNodeList), and you can't read the "href" attribute value from the list itself.
Try using $link[0] (or $link->item(0)) instead of $link.
Every part of a DOM is a node. Attributes are nodes too, not just elements. Using XPath you can directly fetch a list of href attribute nodes.
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXpath($dom);
$result = [];
foreach ($xpath->evaluate('//div[@class = "link__ttl"]/a/@href') as $href) {
$result[] = $href->value;
}
var_dump($result);
Output: https://eval.in/150202
array(2) {
[0]=>
string(24) "/watch-link-53767-934537"
[1]=>
string(24) "/watch-link-53759-934537"
}

Looping over 2 elements at the same time (PHP / XPath)

I'm trying to extract 2 elements using PHP cURL and XPath.
So far I have the elements separated in a foreach, but I would like to have both at the same time:
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->evaluate("//p[@class='row']/a/@href");
//$elements = $xpath->query("//p[@class='row']/a");
foreach ($elements as $element) {
$url = $element->nodeValue;
//$title = $element->nodeValue;
}
When I echo each one outside the foreach I only get 1 element, and when it's echoed inside the foreach I get all of them.
My question is: how can I get them both at the same time (URL and title), and what's the best way to add them to MySQL using PDO?
Thank you.
There is no need, in this case, to use XPath twice. You could do one query and navigate to the associated other node(s).
For example, find all of the hrefs that you are interested in and get their ownerElement's (the <a>) node value.
$hrefs = $xpath->query("//p[@class='row']/a/@href");
foreach ($hrefs as $href) {
$url = $href->value;
$title = $href->ownerElement->nodeValue;
// Insert into db here
}
Or, find all of the <a>s that you are interested in and get their href attributes.
$anchors = $xpath->query("//p[@class='row']/a[@href]");
foreach ($anchors as $anchor) {
$url = $anchor->getAttribute("href");
$title = $anchor->nodeValue;
// Insert into db here
}
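For the "Insert into db here" step, one option is a PDO prepared statement that is prepared once and executed per link. This is a sketch under stated assumptions: an in-memory SQLite database stands in for MySQL so the example is self-contained (for MySQL you would pass a mysql: DSN and credentials to the PDO constructor), and the links table, its columns, and the sample HTML are all hypothetical.

```php
<?php
// Assumption: in-memory SQLite stands in for MySQL; requires the pdo_sqlite extension.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE links (url TEXT, title TEXT)'); // hypothetical table

// Hypothetical markup standing in for the scraped page.
$html = '<p class="row"><a href="/a">First</a></p>'
      . '<p class="row"><a href="/b">Second</a></p>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Prepare once, execute per row; bound parameters keep scraped text out of the SQL string.
$stmt = $pdo->prepare('INSERT INTO links (url, title) VALUES (:url, :title)');
foreach ($xpath->query("//p[@class='row']/a[@href]") as $anchor) {
    $stmt->execute([
        ':url'   => $anchor->getAttribute('href'),
        ':title' => trim($anchor->nodeValue),
    ]);
}

// Read the rows back to confirm what was stored.
$saved = $pdo->query('SELECT url, title FROM links')->fetchAll(PDO::FETCH_ASSOC);
print_r($saved);
```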
You're overwriting $url on each iteration. Maybe use an array?
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->evaluate("//p[@class='row']/a/@href");
//$elements = $xpath->query("//p[@class='row']/a");
$urls = array();
foreach ($elements as $element){
array_push($urls, $element->nodeValue);
//$title = $element->nodeValue;
}

Xpath for extracting links

I'm creating a scraper for an automoto site. First I want to get all manufacturers, and after that all the model links for each manufacturer, but with the code below I get only the first model in the list. Why?
<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.auto-types.com');
$xpath = new DOMXPath($dom);
$entries = $xpath->query("//li[@class='clearfix_center']/a/@href");
$output = array();
foreach($entries as $e) {
$dom2 = new DOMDocument();
@$dom2->loadHTMLFile('http://www.auto-types.com' . $e->textContent);
$xpath2 = new DOMXPath($dom2);
$data = array();
$data['newLinks'] = trim($xpath2->query("//div[@class='modelImage']/a/@href")->item(0)->textContent);
$output[] = $data;
}
echo '<pre>' . print_r($output, true) . '</pre>';
?>
So I need to get mercedes/100, mercedes/200, mercedes/300, but with my script I get only the first link, mercedes/100.
Please help.
You need to iterate through the results instead of just taking the first item:
$items = $xpath2->query("//div[@class='modelImage']/a/@href");
$links = array();
foreach($items as $item) {
$links[] = $item->textContent;
}
$data['newLinks'] = implode(', ', $links);

Simple HTML DOM gets only 1 element

I'm following a simplified version of the scraping tutorial by NetTuts here, which basically finds all divs with class=preview
http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/comment-page-1/#comments
This is my code. The problem is that when I count $items I get only 1, so it's getting only the first div with class=preview, not all of them.
$articles = array();
$html = new simple_html_dom();
$html->load_file('http://net.tutsplus.com/page/76/');
$items = $html->find('div[class=preview]');
echo "count: " . count($items);
Try using DOMDocument and DOMXPath:
$file = file_get_contents('http://net.tutsplus.com/page/76/');
$dom = new DOMDocument();
@$dom->loadHTML($file);
$domx = new DOMXPath($dom);
$nodelist = $domx->evaluate("//div[@class='preview']");
foreach ($nodelist as $node) { print $node->nodeValue; }
