Target element within specific element domdocument - php

I want to target a tags with class genre within parent div with id test:
<div id="test">
<a class="genre">hello</a>
<a class="genre">hello2</a>
</div>
So far, I can get all the genre a tags:
$xpath = new DOMXPath($doc);
$elements = $xpath->query('//a[@class="genre"]');
... but I want to adjust //a[@class="genre"] so I only target the ones within the test div.

Your own expression already uses every XPath construct you need, so unless I've misunderstood the question, you only have to prefix it with the parent div:
$elements = $xpath->query('//div[@id="test"]/a[@class="genre"]');
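For reference, here is a minimal runnable sketch of that query. The sample markup, including the extra link outside the div, is an assumption added to show that the query is scoped to the div.

```php
<?php
// Sample markup: the target div plus one stray link outside it (assumed for illustration).
$html = <<<HTML
<div id="test">
  <a class="genre">hello</a>
  <a class="genre">hello2</a>
</div>
<a class="genre">outside</a>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// "/" after the div selects direct children only; "//a" would match at any depth.
$elements = $xpath->query('//div[@id="test"]/a[@class="genre"]');

$texts = [];
foreach ($elements as $a) {
    $texts[] = $a->textContent;
}
print_r($texts);
```

The link outside the div is not returned, because the path requires the a element to be a child of the div with id test.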

Related

Get one div above the div of certain class PHP DOMDocument

This is the first time I am using PHP DOMDocument, and I don't know its methods.
I grab HTML that has the following format:
<div class="row abc">...</div>
<div class="row xyz">...</div>
<div class="row qrs">...</div>
...
...
<div class="row">This is what i want to grab</div>
<div class="row show-more-result">Show More</div>
What I am trying to achieve is to first select the div with class show-more-result and then target the div one level above it, which is where my data is.
I have started exploring the PHP DOMDocument class, but I could not find any getElementByClass method.
public function scrapping()
{
// Create a DOMDocument Object to fetch the search results
$dom = new \DOMDocument;
@$dom->loadHTML($this->_response);
$dom->preserveWhiteSpace = false;
$xpath = new \DomXpath($dom);
$show_more_div = $xpath->query('//*[@class="show-more-result"]')->item(0);
$stuff = $show_more_div->textContent;
echo($stuff);
}
I tried to target the show more div, but it says "Trying to get property of non-object", as if $xpath->query() returns nothing.
Please help me target the desired div.
Updated
var_dump($xpath->query('//*[@class="show-more-result"]')->item(0));
// NULL
You're doing a straight string equality:
$show_more_div = $xpath->query('//*[@class="show-more-result"]')->item(0);
                                    ^^^^^^^
But your target div's class is actually row show-more-result. You need to do a substring match instead:
//*[contains(@class, 'show-more-result')]
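Two things the snippet above leaves open can be sketched as follows. First, contains() would also match a longer class name that merely contains the string, so the sketch uses the common concat/normalize-space idiom to match a whole class token. Second, the preceding-sibling axis reaches the div directly above, which is what the question asks for. The wrapping div and the sample rows are assumptions.

```php
<?php
// Sample markup modelled on the question (wrapper div assumed).
$html = <<<HTML
<div>
  <div class="row abc">first</div>
  <div class="row">This is what i want to grab</div>
  <div class="row show-more-result">Show More</div>
</div>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Match "show-more-result" as a whole class token, not just a substring.
$token = 'contains(concat(" ", normalize-space(@class), " "), " show-more-result ")';

// preceding-sibling::div[1] is the nearest div before the matched one.
$target = $xpath->query("//div[$token]/preceding-sibling::div[1]")->item(0);

// item(0) returns null when nothing matches, so check before dereferencing.
$text = $target !== null ? trim($target->textContent) : null;
echo $text, PHP_EOL;
```

Checking for null before reading textContent avoids the "Trying to get property of non-object" notice when the query matches nothing.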

Extract only first level paragraphs from html

I have the following html:
<div id="myID">
<p>I want this</p>
<p>and I want this</p>
<div>
<p>I don't want this</p>
</div>
</div>
I want to extract only the first level <p>...</p> elements.
I've tried using the excellent simple_html_dom library e.g. $html->find('#myID p') but in the case above, this finds all three <p>...</p> elements
Is there a better way to do this?
Instead of having to use some external library why don't you just use the built in classes to handle the dom?
First create a DOMDocument instance using your HTML:
$dom = new DOMDocument();
$dom->loadHtml($yourHtml);
After that use DOMXPath to select your elements:
$xpath = new DOMXpath($dom);
$nodes = $xpath->query("//*[@id='myID']/p");
var_dump($nodes->length); // outputs 2
This selects all p elements which are direct children of the element with the id myID.
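To make the difference between the child step and the descendant step concrete, here is a small self-contained sketch that runs both against the markup from the question. The variable names $direct and $all are only illustrative.

```php
<?php
$html = '<div id="myID"><p>I want this</p><p>and I want this</p>'
      . '<div><p>I don\'t want this</p></div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "/p" matches only direct children of #myID.
$direct = $xpath->query("//*[@id='myID']/p")->length;

// "//p" matches every descendant p, including the nested one.
$all = $xpath->query("//*[@id='myID']//p")->length;

echo "$direct direct, $all total", PHP_EOL;
```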

How to parse multiple elements in portions for html via Simple Html Dom

I am attempting to get various elements inside of an li as shown below. I am pretty new to this so I may not be using the most efficient methods but this is where I have started...
EXAMPLE CODE SIMPLIFIED....
<li id='entry_0' title='09879879'>
<div ....>
<h2> The title text would go here </h2>
<span class='entrySize' ....> 20oz </span>
<span class='entryPrice' ....> $32.09 </span>
<span class='anotherEntry' ....> More Data I need To Grab </span>
.......
</div>
</li>
<li> .... With same structure as above .... 100's of entries like this </li>
I know how to pull individual parts separately but having trouble grasping how to do it grouped within a portion of the html.
$filename = "directory/file.html";
$html = file_get_html($filename);
for($i=0; $i<=count(entryNumber);$i++)
{
$li_id = "entry_".$i;
foreach($html->find('li[id='.$li_id.']') as $li) {
echo $li->innertext;
}
}
So this gets me the content of the list item tag with the id number as the unique attribute. I would like to grab the h2 text, entrySize, entryPrice etc. as I iterate through the list item tags. What I don't understand is how, once I have the list item's content, I can parse its inner tags and attributes. Other parts of the full HTML document may contain tags with the same ids and classes, so I am breaking this down into portions and then looking to parse one section at a time.
I would also like to pull the title attribute out of the li tag.
I hope my explanation makes sense.
You should probably use a DOM parser. PHP comes bundled with one, and there are many others you could use.
http://php.net/dom
PHP Simple HTML DOM Parser
<?php
$html = file_get_contents($page);
$doc = new DOMDocument();
$doc->loadHTML($html);
// now find what you need
$items = $doc->getElementsByTagName('li');
foreach ($items as $item) {
$id = $item->getAttribute('id');
// the ids in the question look like entry_0, entry_1, ...
if (strpos($id, 'entry_') !== false) {
// found matching li, grab its children
}
}
Use this as a baseline, we can't write all the code for you. Check out the PHP docs to finish this :) From what I have so far, you need to follow the docs to make it grab the child values, and handle them.
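One possible way to finish the baseline is to run relative XPath queries with each li as the context node. This is a sketch rather than the only approach. The markup is trimmed from the question, and the $rows array and its keys are hypothetical names.

```php
<?php
// Nowdoc, so the "$" in the price is not treated as a variable.
$html = <<<'HTML'
<ul>
  <li id="entry_0" title="09879879">
    <div>
      <h2> The title text would go here </h2>
      <span class="entrySize"> 20oz </span>
      <span class="entryPrice"> $32.09 </span>
    </div>
  </li>
</ul>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//li[starts-with(@id, "entry_")]') as $li) {
    // Passing $li as the context node and starting with "." keeps each
    // lookup inside this one list item.
    $rows[] = [
        'id'    => $li->getAttribute('id'),
        'title' => $li->getAttribute('title'),
        'name'  => trim($xpath->evaluate('string(.//h2)', $li)),
        'size'  => trim($xpath->evaluate('string(.//span[@class="entrySize"])', $li)),
        'price' => trim($xpath->evaluate('string(.//span[@class="entryPrice"])', $li)),
    ];
}
print_r($rows);
```

Because every lookup is scoped to its li, elements elsewhere in the document that share the same class names are not picked up.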

Zend_Dom_Query query element issue

I have an issue where I have a div that doesn't have a class or id. Is it possible to select a div element when I only know its innerText, i.e.
<div class="thishere"></div>
<div>Search on a this text</div>
If not, the div before it has a class. How do I find its next sibling?
$selector = new Zend_Dom_Query($response->getBody());
$nodes = $selector->query('????');
Using JavaScript, you can loop through every element on the page and find the div with the special class. The next element in the loop will then be the second div, and you can get its contents using element.innerHTML.
$text = <<<text
<div class="thishere"></div>
<div>Search on a this text</div>
text;
$selector = new Zend_Dom_Query ($text);
$nodes = $selector->queryXpath('//div[contains(text(),"Search on a this text")]');
foreach ($nodes as $node)
{
...
}
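The question's second option, going from the div with a class to its next sibling, can be expressed with the following-sibling axis. The sketch below uses PHP's built-in DOMXPath so that it runs without Zend. Passing the same expression to queryXpath() is an assumption that has not been tested here, and the wrapper div is added for illustration.

```php
<?php
$text = '<div><div class="thishere"></div><div>Search on a this text</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($text);
$xpath = new DOMXPath($dom);

// following-sibling::div[1] is the first div after the one with the class.
$next = $xpath->query('//div[@class="thishere"]/following-sibling::div[1]')->item(0);

$content = $next !== null ? $next->textContent : null;
echo $content, PHP_EOL;
```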

DOMXPath union extract with PHP

I'm trying to get img and the div which is coming after the div which contains that img, all in one query.
So I did this:
$nodes = $xpath->query('//div[starts-with(@id, "someid")]/img |
//div[starts-with(@id, "someid")]/following-sibling::div[@class="spec_class"][1]/text()');
Now, I'm able to get the attributes of img tag, but I can't get the text of the following sibling. If I separate the query (two queries - first for the img and second query for the sibling) it works. But how can I do this with only one query? By the way, there is no error in the syntax. But somehow the union doesn't work or maybe I'm not extracting the sibling content right.
Here's the markup (which repeats many times with other text and id="someid_%randomNumber%"):
<div id="someid_1">
<img src="link_to_image.png" />
...some text...
</div>
<div>...another text...</div>
<div class="spec_class">
...Important text...
</div>
I want to get in one query both link_to_image.png and ...Important text...
Your query seems correct.
Example XML:
<div>
<div id="someid-1"><img src="foo"/></div>
<div class="spec_class">bar</div>
<div class="spec_class">baz</div>
</div>
Example PHP Code:
$dom = new DOMDocument;
$dom->loadXml($xhtml);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div…') as $node) {
echo $dom->saveXML($node);
}
Outputs:
<img src="foo"/>bar
Note that you will have to iterate the DOMNodeList returned by the XPath query.
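As a sketch of that iteration, the loop below runs the union query from the question against the example XML and branches on the node type, since a union returns element and text nodes mixed together. The $found array is a hypothetical name used only to collect the results.

```php
<?php
$xhtml = <<<XML
<div>
  <div id="someid-1"><img src="foo"/></div>
  <div class="spec_class">bar</div>
  <div class="spec_class">baz</div>
</div>
XML;

$dom = new DOMDocument();
$dom->loadXML($xhtml);
$xpath = new DOMXPath($dom);

// The union query from the question, split across two strings for readability.
$query = '//div[starts-with(@id, "someid")]/img'
       . ' | //div[starts-with(@id, "someid")]/following-sibling::div[@class="spec_class"][1]/text()';

$found = [];
foreach ($xpath->query($query) as $node) {
    if ($node instanceof DOMElement) {
        // The img element: read its src attribute.
        $found[] = $node->getAttribute('src');
    } elseif ($node instanceof DOMText) {
        // The text node of the following sibling div.
        $found[] = trim($node->nodeValue);
    }
}
print_r($found);
```

Reading an attribute from a text node is the likely reason the sibling's content seemed to be missing, so checking the node class before accessing it keeps both halves of the union usable.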
