PHP Grab HTML Tag Rows/Hierarchy

So I wanted to know if there was a way to grab a specific HTML tag's information using PHP.
Let's say we had this code:
<ul>
<li>First List</li>
<li>Second List</li>
<li>Third List</li>
</ul>
How could I search the HTML and pull the third list item's value into a variable? Or is there a way you could pull the entire unordered list into an array?

This has not been tested, but one way is to write a function that uses DOMDocument and its getElementsByTagName method, which returns a DOMNodeList from which you can fetch the node at a specific index.
function grabAttributes($file, $tag, $index) {
    $dom = new DOMDocument();
    // loadHTMLFile() parses HTML rather than XML; @ suppresses parser warnings
    if (!@$dom->loadHTMLFile($file)) {
        echo $file . " doesn't exist!\n";
        return;
    }
    $list = $dom->getElementsByTagName($tag); // returns DOMNodeList of given tag
    $node = $list->item($index); // null if there is no node at that index
    return $node ? $node->nodeValue : null;
}
If you call grabAttributes("myfile.html", "li", 2), it returns "Third List".
Or you can write a function that puts the values of every element with a given tag into an array.
function putAttributes($file, $tag) {
    $dom = new DOMDocument();
    // loadHTMLFile() parses HTML rather than XML; @ suppresses parser warnings
    if (!@$dom->loadHTMLFile($file)) {
        echo $file . " doesn't exist!\n";
        return;
    }
    $list = $dom->getElementsByTagName($tag); // returns DOMNodeList of given tag
    $myArray = array(); // array to contain values.
    foreach ($list as $node) { // loop through node list and add to an array.
        $myArray[] = $node->nodeValue;
    }
    return $myArray;
}
If you call putAttributes("myfile.html", "li") it would return array("First List", "Second List", "Third List")
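Since the functions above are described as untested, here is a minimal self-contained sketch of the same idea that parses an HTML string instead of a file, so it can be run without creating myfile.html. The helper name tagValuesFromString is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: collects the text of every <$tag> element in an HTML string.
function tagValuesFromString($html, $tag) {
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $values = array();
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $values[] = trim($node->nodeValue);
    }
    return $values;
}

$html = '<ul><li>First List</li><li>Second List</li><li>Third List</li></ul>';

// The whole list as an array
$items = tagValuesFromString($html, 'li');

// The third item only (index 2), or null if the list is shorter
$third = isset($items[2]) ? $items[2] : null;

echo $third, "\n";
```

Building the array once and indexing into it avoids parsing the document twice when both the single value and the full list are needed.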

Related

How to get a list of all html elements in PHP?

According to the documentation for DOMDocument::getElementsByTagName, I can call the function with "*" argument, and get a list of all HTML elements from some HTML code.
However, with the following code:
<?php
$dom = new DOMDocument();
$dom->loadHTML("<html><body><div>hello</div><div>bye</div></body></html>");
$nodes = $dom->getElementsByTagName("*");
foreach ($nodes as $node) {
$new_text= new DOMText($node->textContent."MODIFIED");
$node->removeChild($node->firstChild);
$node->appendChild($new_text);
}
$content = $dom->saveHTML();
echo $content;
?>
I get a list of only one element, and the result of execution of the code above is:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>hellobyeMODIFIED</html>
while I would expect something like this:
<html><body><div>helloMODIFIED</div><div>byeMODIFIED</div></body></html>
Shouldn't DOMDocument::getElementsByTagName method return a list of as many HTML elements as available in the HTML code?
Note: I need to create DOMText instances explicitly, because I need this to work in PHP 5.4. DOMNode::textContent is accessible for writing only from PHP 5.6

The DOMDocument::getElementsByTagName method actually returns all the tags, if the first argument is '*'. But your code replaces <body> tag (including all child nodes) with a text node at the first iteration.
Iterate the nodes, and modify only the nodes with nodeType property equal to XML_TEXT_NODE:
$nodes = $dom->getElementsByTagName('*');
foreach ($nodes as $node) {
for ($child = $node->firstChild; $child; $child = $child->nextSibling) {
if (! ($child->nodeType === XML_TEXT_NODE && trim($child->textContent))) {
continue;
}
// The textContent is writable since PHP 5.6.1
if (PHP_VERSION_ID >= 50601) {
$child->textContent .= 'MODIFIED';
continue;
}
// For older versions, create DOMText explicitly
$text = new DOMText($child->textContent . 'MODIFIED');
try {
if ($child->parentNode->replaceChild($text, $child))
$child = $text;
} catch (Exception $e) {
trigger_error("Failed to modify text '$child->textContent': "
. $e->getMessage(), E_USER_WARNING);
}
}
}
echo $dom->saveHTML();
Note, for PHP versions 5.6.1 and newer, you don't need to create DOMText instances explicitly, since the DOMNode::textContent property is accessible for read and write. So you can simply modify the text by assigning a string value to this property. Only make sure that the node has no child nodes other than XML_TEXT_NODE.
The code above checks if trim($child->textContent) is not empty, because the document may contain extra space characters (including newline), e.g.:
<div><!-- newline/spaces -->
<span>text</span><!-- newline/spaces -->
</div><!-- newline/spaces -->

This function 'DOMDocument::getElementsByTagName' returns a new instance of class DOMNodeList containing all the elements.
And it works fine:
<?php
$dom = new DOMDocument();
$dom->loadHTML("<html><body><div>hello</div><div>bye</div></body></html>");
$nodes = $dom->getElementsByTagName("*");
foreach ($nodes as $node) {
echo $node->tagName."<br />";
}
?>
It outputs all the tags of your document.
You probably need something like:
<?php
$dom = new DOMDocument();
$dom->loadHTML("<html><body><div>hello</div><div>bye</div></body></html>");
$nodes = $dom->getElementsByTagName("*");
foreach ($nodes as $node) {
if ($node->tagName=='div'){
$node->nodeValue .= "new content";
}
}
$content = $dom->saveHTML();
echo htmlspecialchars($content);
?>

Try this:
foreach($dom->getElementsByTagName('*') as $element ){
}

How to get all child nodes from DOMDocument?

I have the following
$string = '<html><head></head><body><ul id="mainmenu">
<li id="1">Hallo</li>
<li id="2">Welt
<ul>
<li id="3">Sub Hallo</li>
<li id="4">Sub Welt</li>
</ul>
</li>
</ul></body></html>';
$dom = new DOMDocument;
$dom->loadHTML($string);
Now I want to have all li IDs inside one array.
I tried the following:
$all_li_ids = array();
$menu_nodes = $dom->getElementById('mainmenu')->childNodes;
foreach($menu_nodes as $li_node){
if($li_node->nodeName=='li'){
$all_li_ids[]=$li_node->getAttribute('id');
}
}
print_r($all_li_ids);
As you might see, this will print out [1,2]
How do I get all children (the subchildren as well [1,2,3,4])?

In my test, $dom->getElementById('mainmenu') does not return the element. If it does work for you, you don't need XPath to find the ul.
$xpath = new DOMXPath($dom);
$ul = $xpath->query("//*[@id='mainmenu']")->item(0);
$all_li_ids = array();
// Find all inner li tags
$menu_nodes = $ul->getElementsByTagName('li');
foreach($menu_nodes as $li_node){
$all_li_ids[]=$li_node->getAttribute('id');
}
print_r($all_li_ids); // 1,2,3,4
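As a variation on the answer above, the ids can also be collected with a single XPath expression that selects the id attribute of every li below the menu, at any depth. This is only a sketch using the markup from the question; the expression is my own and not from the original answer.

```php
<?php
$string = '<html><head></head><body><ul id="mainmenu">
<li id="1">Hallo</li>
<li id="2">Welt
<ul>
<li id="3">Sub Hallo</li>
<li id="4">Sub Welt</li>
</ul>
</li>
</ul></body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument;
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);

$all_li_ids = array();
// "//li" after the ul matches descendants at any depth; "/@id" selects their id attributes
foreach ($xpath->query("//ul[@id='mainmenu']//li/@id") as $attr) {
    $all_li_ids[] = $attr->nodeValue;
}
print_r($all_li_ids);
```

The values come back as strings in document order.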

One way to do it would be to add another foreach loop over the li elements nested below each top-level li, i.e.:
foreach($menu_nodes as $node){
    if($node->nodeName=='li'){
        $all_li_ids[]=$node->getAttribute('id');
        // Nested <li> elements sit inside a child <ul>, so search below this <li>
        foreach($node->getElementsByTagName('li') as $sub_node){
            $all_li_ids[]=$sub_node->getAttribute('id');
        }
    }
}

How to get value of onclick= using xpath?

I have a string that has lots of <li> sets of data. I want to get this value:
1: call.php?category=fruits&fruitid=123456
inside onclick using XPath. My current XPath doesn't return the onclick value, so I can't parse it further to get the data I need. Could anyone tell me the correct XPath to get the value of onclick?
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($code2);
$xpath = new DOMXPath($dom);
// Empty array to hold all links to return
$result = array();
//Loop through each <li> tag in the dom
foreach($dom->getElementsByTagName('li') as $li) {
//Loop through each <a> tag within the li, then extract the node value
foreach($li->getElementsByTagName('a') as $links){
$result[] = $links->nodeValue;
echo $result[0] . "\n";
}
$onclicks = $xpath->query("//li/a/onclick");
foreach ($onclicks as $onclick) {
echo $onclick->nodeValue . "\n";
}
}
data:
<li><a id="FR123456" onclick="setFood(false);setSeasonFruitID('123456');getit('call.php?category=fruits&fruitid=123456&',detailFruit,false);">mango season</a><img src="http://imagehosting.com/images/fru_123456.png">
</li>

onclick is an attribute, and you use @attribute_name to reference an attribute in XPath:
$onclicks = $xpath->query("//li/a/@onclick");
foreach ($onclicks as $onclick) {
echo $onclick->nodeValue . "\n";
}
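Once the attribute value is available, the URL the question asks for still has to be cut out of the JavaScript string. One possible sketch, assuming the URL is always the first quoted argument of getit(...) as in the sample data (the regular expression and the $urls array are my own assumptions):

```php
<?php
libxml_use_internal_errors(true); // the sample markup contains bare "&" characters

// Sample data from the question (image tag omitted for brevity)
$code2 = '<li><a id="FR123456" onclick="setFood(false);setSeasonFruitID(\'123456\');'
       . 'getit(\'call.php?category=fruits&fruitid=123456&\',detailFruit,false);">mango season</a></li>';

$dom = new DOMDocument;
$dom->loadHTML($code2);
$xpath = new DOMXPath($dom);

$urls = array();
foreach ($xpath->query("//li/a/@onclick") as $onclick) {
    // Capture the first single-quoted argument passed to getit(...)
    if (preg_match("/getit\\('([^']+)'/", $onclick->nodeValue, $m)) {
        $urls[] = rtrim($m[1], '&'); // drop the trailing "&"
    }
}
print_r($urls);
```

If the onclick handlers vary in shape, the pattern would need to be adjusted accordingly.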

Try something like this:
$links = $xpath->query("//li/a");
foreach ($links as $link) {
echo $link->getAttribute('onclick'). "\n";
}

pick the elements INSIDE the ul NOT the ul itself

this is the php code:
include_once('simple_html_dom.php');
$html = file_get_html('URL');
$elem = $html->find('ul[id=members-list]', 0);
echo $elem;
I would like to be able to pick the inside of the UL so the elements per se, not the ul itself.
html as follows:
<ul id="members-list">
<li>1</li>
<li>2</li>
<li>3</li>
<li>4</li>
</ul>
So when I do echo $elem, the output includes the ul itself. I want to leave it out and return just:
<li>1</li>
<li>2</li>
<li>3</li>
<li>4</li>

You just forgot to use the children() method. Consider this example:
$ul = $html->find('ul[id="members-list"]', 0)->children();
foreach($ul as $li) {
echo $li;
}
As stated in the manual:
How to traverse the DOM tree? -> Traverse the DOM Tree
mixed $e->children ( [int $index] ): Returns the Nth child object if index is set, otherwise returns an array of children.
Or the much easier way, the ->innertext magic attribute:
$ul = $html->find('ul[id="members-list"]', 0);
echo $ul->innertext;

You can use:
$('#members-list li')
to iterate over them:
$('#members-list li').each(function(){
console.log(this);//object of current li
});

Have a look at this. It will also print the values inside the li tags:
<?php
$html = file_get_contents('2.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('ul') as $node) {
foreach($node->childNodes as $childNode){
echo $childNode->nodeValue;
}
}
?>
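Note that nodeValue yields only the text of each child, not its markup. If the li tags themselves are wanted, as in the question, one option is to serialize each child node with DOMDocument::saveHTML(), which accepts a node argument since PHP 5.3.6. A rough sketch, with an inline HTML string standing in for 2.html:

```php
<?php
// Inline stand-in for the file contents (assumption for illustration)
$html = '<ul id="members-list"><li>1</li><li>2</li><li>3</li><li>4</li></ul>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$inner = '';
foreach ($dom->getElementsByTagName('ul') as $ul) {
    foreach ($ul->childNodes as $childNode) {
        // Serialize the child itself, tags included
        $inner .= $dom->saveHTML($childNode);
    }
}
echo $inner;
```

The exact whitespace between the serialized elements may differ between libxml versions.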

You have to use .html() on the selected ul element:
include_once('simple_html_dom.php');
$html = file_get_html('URL');
// ->html() is meant to get all child elements with their tags
$elem = $html->find('ul[id=members-list]', 0)->html();
echo $elem;

XPATH Get Attribute of Current Node

Having trouble getting the attribute of the current node in PHP and making a condition based on that attribute...
Example XML
<div class='parent'>
<div class='title'>A Title</div>
<div class='child'>some text</div>
<div class='child'>some text</div>
<div class='title'>A Title</div>
<div class='child'>some text</div>
<div class='child'>some text</div>
</div>
What I am trying to do is traverse the XML in PHP and do different things based on the class of the element/node
Eg.
$doc->loadHTML($xml_string);
$xpath = new DOMXpath($doc);
$nodeLIST = $xpath->query("//div[@class='parent']/div");
foreach ($nodeLIST as $node) {
if (CURRENT DIV NODE ATTRIBUTE EQUALS TITLE) {
SET $TITLE VARIABLE TO THE TEXT() OF THE CURRENT NODE
}
ELSEIF(CURRENT DIV NODE ATTRIBUTE EQUALS CHILD){
SET $CHILD VARIABLE TO THE TEXT() OF THE CURRENT NODE
}
}
I've tried all kind of things like the following...
if ($xpath->query("./[@class='title']/text()",$node)->length > 0) { }
But all I keep getting is PHP errors saying that my XPath syntax is not valid. Can anyone help me?

You can achieve this by using getAttribute() method. Example:
foreach($nodeLIST as $node) {
$attribute = $node->getAttribute('class');
if($attribute == 'title') {
// do something
} elseif ($attribute == 'child') {
// do something
}
}

$node->getAttribute('class') gives you the attribute value, $node->textContent the string contents of the node. I wouldn't dive into XPath to read out the string value.
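To illustrate how the two properties can be combined, here is a small sketch that groups each child under the title that precedes it. The sample strings and the $groups array are made up for illustration (the titles are made distinct so they can serve as array keys).

```php
<?php
// Hypothetical sample data modeled on the question's markup
$xml_string = "<div class='parent'>
<div class='title'>Title A</div>
<div class='child'>text 1</div>
<div class='child'>text 2</div>
<div class='title'>Title B</div>
<div class='child'>text 3</div>
</div>";

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($xml_string);
$xpath = new DOMXPath($doc);

$groups = array();
$title = null; // most recent title seen so far
foreach ($xpath->query("//div[@class='parent']/div") as $node) {
    $class = $node->getAttribute('class');
    if ($class === 'title') {
        $title = $node->textContent;
        $groups[$title] = array();
    } elseif ($class === 'child' && $title !== null) {
        $groups[$title][] = $node->textContent;
    }
}
print_r($groups);
```

Because the nodes are visited in document order, each child lands under the nearest preceding title.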

You can filter the 'title' and 'child' sets in different nodelists:
$titles = $xpath->query("//div[@class='parent']/div[@class='title']");
$children = $xpath->query("//div[@class='parent']/div[@class='child']");
And then process them separately:
foreach ($titles as $title) {
echo $title->textContent."\n";
}
foreach ($children as $child) {
echo $child->textContent."\n";
}
See: http://codepad.viper-7.com/x4LA50
