Get just the first item with DOMDocument in PHP


I am using the code below to get the elements inside a particular HTML element:
$dom = new DOMDocument();
@$dom->loadHTML($google_html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//span[@class="st"]');
foreach ($tags as $tag) {
    echo $tag->nodeValue;
}
The problem is that this code returns every element with that class, but I only need the first one.
So I don't need a foreach loop.
How can I change the code to get just the first item?

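The following query makes sure the returned DOMNodeList contains only the first matching element in the document. The parentheses matter: without them, //span[@class="st"][1] matches the first such span under each parent, which can be more than one node.
$xpath->query('(//span[@class="st"])[1]');
The following gets that single item out of the DOMNodeList:
$tags = $xpath->query('(//span[@class="st"])[1]');
$first = $tags->item(0);
$text = $first->textContent;
See XPath: Select first element with a specific attribute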
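To illustrate why the position of [1] matters, here is a minimal sketch that runs both forms of the query against a small, made-up HTML snippet (the markup is an assumption for illustration, not the asker's page).

```php
<?php
// Made-up markup: two parents, each containing span.st elements.
$html = '<div><span class="st">one</span><span class="st">two</span></div>'
      . '<div><span class="st">three</span></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// [1] applied per parent: the first span.st inside each <div>.
$perParent = $xpath->query('//span[@class="st"][1]');

// [1] applied to the whole result set: the first span.st in the document.
$firstOnly = $xpath->query('(//span[@class="st"])[1]');

echo $perParent->length, "\n";
echo $firstOnly->length, "\n";

// item(0) returns null when nothing matched, so check before using it.
$first = $firstOnly->item(0);
echo $first !== null ? $first->textContent : 'no match', "\n";
```

Calling item(0) on the unparenthesized query also returns the first node in document order, so both approaches work here; the parenthesized form simply keeps the node list itself down to one entry.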

Related

How to parse html and create array from dt/dd

I have a select element for which I want to dynamically create the option values using information from another website. The information can be seen at http://www.dandwiki.com/wiki/SRD:Classes. It is the list of 'Base Classes'. I have tried using the DOMDocument class, but I can't see any way of using a url instead of an html file. I have tried using file_get_html and a foreach loop, but can't make it work with the format of the data on the website. It is in a dt/dd element, and the elements don't have id's. What would be the best way to pull the information off the website, and create an option value for each class in my select element?

I usually use http://www.bit-101.com/xpath/ to test the query. You would end up with something like this.
<?php
$url = 'http://www.dandwiki.com/wiki/SRD:Classes';
$html = file_get_contents($url);
$dom = new DOMDocument();
// Load the fetched HTML; @ suppresses warnings about malformed markup
@$dom->loadHTML($html);
// Create a new DOMXPath for the document
$Xpath = new DOMXPath($dom);
// Get links
// //big                      selects all <big> elements in the document
// [text()="Base Classes"]    keeps those whose text is "Base Classes"
// /..                        moves up to the parent element
// /..                        moves up to the parent element again
// //a                        selects all <a> elements beneath it
// /text()                    gets their text nodes
$query = '//big[text()="Base Classes"]/../..//a/text()';
$entries = $Xpath->query($query);
foreach ($entries as $entry) {
echo $entry->nodeValue . "<br>";
}
?>
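Since the question asks for option values, here is a rough sketch of how the matched link texts could be turned into option tags. The inline markup is a hypothetical stand-in for the fetched page, whose real structure may differ, and libxml_use_internal_errors() is used in place of the @ operator to silence parser warnings.

```php
<?php
// Hypothetical markup standing in for the page at the URL above.
$html = '<dl><dt><big>Base Classes</big></dt>'
      . '<dd><a href="/b">Barbarian</a></dd>'
      . '<dd><a href="/w">Wizard</a></dd></dl>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$options = [];
// Same path as above, minus /text(), so each result is an <a> element.
foreach ($xpath->query('//big[text()="Base Classes"]/../..//a') as $a) {
    $name = htmlspecialchars(trim($a->textContent));
    $options[] = '<option value="' . $name . '">' . $name . '</option>';
}

echo implode("\n", $options), "\n";
```

The resulting strings can be echoed between the opening and closing select tags.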

How to get only 1 tag using new DOMDocument

I only want to get the first iframe. how should I do that?
here is my code:
$url = "http://www.flixxy.com/10-famous-movie-scenes.htm";
$page = new DOMDocument;
$page->loadHTML(file_get_contents($url));
foreach ($page->getElementsByTagName('iframe') as $node) {
echo $node->getAttribute('src');
}
i'm just new in this. thanks in advance. :)

getElementsByTagName() returns a DOMNodeList of all the matched tags. Since you only want the first tag, the foreach loop is not required. You can simply use the item() method to index into the DOMNodeList and retrieve the first iframe tag:
$nodes = $page->getElementsByTagName('iframe'); // get all the tags
echo $nodes->item(0)->getAttribute('src'); // get the src attribute of the first one
You can shorten it to one line, if you want:
echo $page->getElementsByTagName('iframe')->item(0)->getAttribute('src');
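One caveat with the one-liner: item(0) returns null when the page has no iframe, and calling getAttribute() on null is a fatal error. A minimal defensive sketch follows, using made-up inline HTML and placeholder example.com URLs instead of the real page.

```php
<?php
// Made-up markup; the src values are placeholders.
$html = '<p>Intro</p>'
      . '<iframe src="https://example.com/a"></iframe>'
      . '<iframe src="https://example.com/b"></iframe>';

$page = new DOMDocument();
$page->loadHTML($html);

// item(0) is the first match, or null if there are no iframes at all.
$first = $page->getElementsByTagName('iframe')->item(0);
$src = $first !== null ? $first->getAttribute('src') : null;

echo $src ?? 'no iframe found', "\n";
```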

how do i extract values from multiple divs with xpath

How do I make this code snippet return the values for every div with class age on the page I am parsing, rather than just the first one as it does now?
$nodelist = $xpath->query('//div[@class="age"]')->item(0);
print_r($nodelist->nodeValue);
I have some similar code that returns all the images I want, but I can't seem to modify it to return the matching div values:
$nodelist = $xpath->query( "//div[@class='thumb-wrapper']" );
foreach ($nodelist as $node)
{
$tags = $node->getElementsByTagName('img');
$image = $tags->item(0)->getAttribute('src');
echo '<img src="'. $image .'" alt="image" ><br>';
}

You need to use "*"
Using the star (*) selects every element that is within the preceding
path. So if you wanted to match every element that is within a td tag
(such as p, div, etc.), you would write: //td/*

The problem with this code isn't the XPath, it's what you do with the result once it's returned.
$nodelist = $xpath->query('//div[@class="age"]')->item(0);
print_r($nodelist->nodeValue);
This gets all of the divs, then takes the first one using ->item(0) and assigns that first item to the variable $nodelist.
Using your existing code as an example, you can alter it by removing the ->item(0), assigning all the results to $nodelist, and iterating through them just like the second 'working' example:
$nodelist = $xpath->query('//div[@class="age"]');
foreach ($nodelist as $node)
{
// Do something with each div
}
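As a concrete version of the loop body, here is a small sketch that collects every matching value into an array. The sample markup is invented for illustration.

```php
<?php
// Invented markup with two div.age elements and one unrelated div.
$html = '<div class="age">21</div>'
      . '<div class="name">Sam</div>'
      . '<div class="age">34</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$ages = [];
// No ->item(0) here: iterate over the whole DOMNodeList.
foreach ($xpath->query('//div[@class="age"]') as $node) {
    $ages[] = trim($node->nodeValue);
}

print_r($ages);
```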

PHP DOMDocument, retrieve just content of a div, without div tag

I'm using DOMDocument to retrieve a particular div from an HTML page.
I just want to retrieve the content of this div, without the div tag.
For example:
$dom = new DOMDocument;
$dom->loadHTML($webtext['content']);
$main = $dom->getElementById('inter');
$dom->saveHTML();
Here, i have the result :
<div id="inter">
//SOME THINGS IN MY DIV
</div>
And i just want to have :
//SOME THINGS IN MY DIV
Ideas ? Thanks !

I'm going to go with simple does it. You already have:
$dom = new DOMDocument;
$dom->loadHTML($webtext['content']);
$main = $dom->getElementById('inter');
$dom->saveHTML();
Now, DOMDocument::getElementById() returns one DOMElement, which extends DOMNode, which has the public string property nodeValue. Since you don't specify whether you expect anything but text within that div, I'm going to assume that you want whatever is stored in there as plain text. For that, we remove $dom->saveHTML(); and replace it with:
$divString = $main->nodeValue;
With that, $divString will contain //SOME THINGS IN MY DIV, which, from your example, is the desired output.
If, however, you want the HTML of the inside of it and not just a String representation - replace it with the following instead:
$divString = "";
foreach($main->childNodes as $c)
$divString .= $c->ownerDocument->saveXML($c);
This takes advantage of the inherited DOMNode::childNodes property, a DOMNodeList in which each entry is itself a DOMNode (for reference, see above). We loop through each child, get its ownerDocument (a DOMDocument), and call DOMDocument::saveXML() on it. We pass the current $c node to the function so that only that node is serialized rather than the entire document, and because we are looping over the div's children, we get one child at a time, with no children left behind. (sorry, it's late, couldn't resist.)
Now, after either option, you can do with $divString what you will. I hope this has helped explain the process to you and hopefully you walk away with a better understanding of what is going on instead of rote copying of code just because it works. ^^
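For comparison, here is a minimal sketch of the same child-by-child loop using DOMDocument::saveHTML(), which also accepts a node argument and serializes with HTML rather than XML rules. The sample markup is an assumption for illustration.

```php
<?php
// Sample markup: the div holds both an element and a bare text node.
$html = '<div id="inter"><p>Some <b>things</b></p> in my div</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$main = $dom->getElementById('inter');

$inner = '';
if ($main !== null) {
    // Serialize each child on its own so the wrapping <div> is left out.
    foreach ($main->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
}

echo $inner, "\n";
```

saveXML() would emit empty elements in self-closing form (for example <br/>), so saveHTML() is usually the closer match when the result goes back into an HTML page.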

You can use my custom function to strip the outer div from the content:
$html_string = '<div id="inter">
SOME THINGS IN MY DIV
</div>';
// custom function
function DOMgetinnerHTML($element)
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
Your code will look like this:
$dom = new DOMDocument;
$dom->loadHTML($html_string);
$divs = $dom->getElementsByTagName('div');
$innerHTML_contents = DOMgetinnerHTML($divs->item(0));
echo $innerHTML_contents;
and your output will be:
SOME THINGS IN MY DIV

You can use XPath:
$xpath = new DOMXPath($dom);
foreach($xpath->query('//div[@id="inter"]/*') as $node)
{
    echo $node->nodeValue;
}
Or you can simply edit your existing code:
$main = $dom->getElementById('inter');
echo $main->nodeValue;

confused with xpath

I've got this PHP code loading in some html.
$dom = new DOMDocument();
$dom->loadHTML($somehtml);
$xpath = new DOMXPath($dom);
$divContent = $xpath->query('//table[class="defURLP"]');
echo $divContent;
I'm too confused to understand quite what needs to go on here. What I want is for the variable $divContent to hold the HTML contents of the table with the class name defURLP.
It's currently just returning
object(DOMNodeList)#3 (0) { }

You need to retrieve the first item from the DOMNodeList returned by your XPath query, since there may be more than one in the list. Note also that attributes are addressed with @ in XPath, so the test must be @class, not class; without the @ the query looks for a child element named class and returns an empty list.
// Queries for tables having class defURLP
$tables = $xpath->query('//table[@class="defURLP"]');
// Reference the first one in $divContent
$divContent = $tables->item(0);
// Output its nodeValue
echo $divContent->nodeValue;
Or iterate over the node list with a foreach:
$tables = $xpath->query('//table[@class="defURLP"]');
// Iterate over the whole node list in $tables (if it is multiple nodes)
foreach ($tables as $t) {
    echo $t->nodeValue;
}
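nodeValue only yields the text inside the table. Since the question asks for the HTML contents, here is a hedged sketch that serializes the matched node with saveHTML() instead. It also shows a common XPath idiom for matching one class token when the attribute holds several classes; the sample table and its extra "wide" class are made up.

```php
<?php
// Made-up table carrying two classes.
$html = '<table class="defURLP wide"><tr><td>cell</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// @class="defURLP" would miss this table because the attribute value is
// "defURLP wide"; padding with spaces lets us test for a whole class token.
$query = '//table[contains(concat(" ", normalize-space(@class), " "), " defURLP ")]';
$table = $xpath->query($query)->item(0);

// Passing the node to saveHTML() serializes just that element and its children.
$tableHtml = $table !== null ? $dom->saveHTML($table) : '';

echo $tableHtml, "\n";
```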
