php - parentNode of a string found with preg_match

I'm trying to access the parentNode of an element found with preg_match, because I would like to read the result found with the regex through the DOM of the document. I can't access it directly through PHP's DOMDocument because the number of divs is variable and they have no ID or any other attribute that I could match on.
To illustrate this: in the example below I'd match match_me with preg_match, and then I'd want to access the parentNode (the div) and put all its child elements (the p's) in a DOMDocument object, so I can easily display them.
<div>
.... variable amount of divs
<div>
<div>
<p>1 match_me</p><p>2</p>
</div>
</div>
</div>

Use DOMXPath to query for the node by the value of its child:
$dom = new DOMDocument();
// Load your doc however necessary...
$xpath = new DOMXPath($dom);
// This query matches the parent div itself; the leading // searches at any depth
$nodes = $xpath->query('//div[p/text() = "1 match_me"]');
$your_div = $nodes->item(0);
// Do something with the children
$p_tags = $your_div->childNodes;
// Or, in this version, the query returns the `<p>`, and `parentNode` is read from it
$nodes = $xpath->query('//p[text() = "1 match_me"]');
$your_div = $nodes->item(0)->parentNode;
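
For reference, here is a minimal end-to-end sketch of the second variant. The markup is only modeled on the question (the nesting depth is arbitrary), and the $texts array is a hypothetical name used to collect the results:

```php
<?php
// Sample markup modeled on the question; the wrapper depth is arbitrary.
$html = '<div><div><div><p>1 match_me</p><p>2</p></div></div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Find the <p> by its exact text, then step up to the enclosing <div>.
$p = $xpath->query('//p[text() = "1 match_me"]')->item(0);

$texts = []; // hypothetical collector for the sibling <p> contents
if ($p !== null) {
    foreach ($p->parentNode->childNodes as $child) {
        // Skip whitespace text nodes; keep element children only.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $texts[] = $child->textContent;
        }
    }
}
print_r($texts);
```

Checking item(0) against null matters here, because a query with no matches returns an empty list rather than failing.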

Related

Search for substrings of text in a node in a XML file

I have this PHP I found in a Q&A forum that queries an XML file:
$doc = new DOMDocument; // Create a new dom document
$doc->preserveWhiteSpace = false; // Set features
$doc->formatOutput = true; // Create indents on xml
$doc->Load('i.xml'); // Load the file
$xpath = new DOMXPath($doc);
$query = '//users/user/firstname[.= "'.$_POST["search"].'"]'; // The xpath (starts from root node)
$names = $xpath->query($query); // A list of matched elements
$Output="";
foreach ($names as $node) {
$Output.=$doc->saveXML($node->parentNode)."\n"; // We get the parent of "<firstname>" element (the entire "<user>" node and its children) (maybe get the parent node directly using xpath)
// and use the saveXML() to convert it to string
}
echo $Output."<br>\n\n"; // The result
echo "<hr><br><b>Below view the results as HTML content. (See also the page's HTML code):</b>\n<pre>".htmlspecialchars($Output)."</pre>";
The script compares the POST input against the values of all the firstname nodes in the XML document and returns the parent node of each firstname node whose value matches.
This script works well, but it only matches when the input equals the entire value of a firstname node. It does not work if I search for a substring of the node's text (e.g. a query for Potato returns Potato, but a query for Pot gives no result for Potato).
So how do I get results where the search term is only a substring of the node's text, rather than the entire value?
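
One way this is commonly handled is the XPath 1.0 contains() function. The sketch below is an assumption-laden illustration, not the original poster's code: the inline XML stands in for i.xml, $search stands in for $_POST["search"], and $matches is a hypothetical result array. Note that contains() is case-sensitive.

```php
<?php
// Hypothetical sample data standing in for i.xml.
$xml = '<users>'
     . '<user><firstname>Potato</firstname></user>'
     . '<user><firstname>Tomato</firstname></user>'
     . '</users>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$search = 'Pot'; // would come from $_POST["search"] in the original script

// XPath 1.0 has no escape syntax for quotes, so this sketch simply rejects
// terms containing a double quote instead of embedding them in the query.
$matches = [];
if (strpos($search, '"') === false) {
    // Select the <user> directly when its <firstname> contains the term.
    $query = '//users/user[firstname[contains(., "' . $search . '")]]';
    foreach ($xpath->query($query) as $user) {
        $matches[] = $doc->saveXML($user);
    }
}
print_r($matches);
```

Selecting the user element in the query itself avoids the separate parentNode step used in the original loop.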

How to get full HTML from DOMXPath::query() method?

I have a document from which I want to extract a specific div with its content untouched.
I do:
$dom = new DOMDocument();
$dom->loadHTML($string);//that's HTML of my document, string
and xpath query:
$xpath = new DOMXPath($dom);
$xpath_resultset = $xpath->query("//div[@class='text']");
/*I'm after div class="text"*/
Now I call item(0) on $xpath_resultset:
$my_content = $xpath_resultset->item(0);
What I get in $my_content is an object (not a string). I can echo its value or settype() it to string, but the result has all the markup stripped.
What should I do to get everything inside div class='text', markup included?

Just pass the node to the DOMDocument::saveHTML method:
$htmlString = $dom->saveHTML($xpath_resultset->item(0));
This will give you a string representation of that particular DOMNode and all its children.
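
As a rough illustration (the sample markup and the $outer/$inner variable names are made up for this sketch), the same call can be applied to each child when only the inner HTML is wanted. Passing a node to saveHTML requires PHP 5.3.6 or later:

```php
<?php
// Sample markup invented for this sketch.
$string = '<html><body><div class="text"><p>Hello <b>world</b></p></div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($string);
$xpath = new DOMXPath($dom);
$node = $xpath->query("//div[@class='text']")->item(0);

// Outer HTML: the div itself plus everything inside it.
$outer = $dom->saveHTML($node);

// Inner HTML: serialize each child and concatenate.
$inner = '';
foreach ($node->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $outer, "\n", $inner, "\n";
```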

getElementsByTagName title is coming back with DOMNodeList Object

Our script uses DOM to parse all the a tags from a document, then loops through the child nodes and extracts information. That works fine. Here's how the code starts:
@$dom->loadHTML($str);
$documentLinks = $dom->getElementsByTagName("a");
Part of the loop
$this->count]['href'] = strip_tags($documentLink->getAttribute('href'));
I now need to get the title tag from each page we're looping through, so I thought I could do
$documentTitle = $dom->getElementsByTagName("title");
$documentLinks = $dom->getElementsByTagName("a");
and then add this to the loop/array to get the document title, but it comes back with "[title] => DOMNodeList Object()". How can I include the title tag in the loop that goes through the a tags/child nodes?
$this->count]['title'] = $documentTitle;

getElementsByTagName returns a DOMNodeList object. You want the text content of the first (should only be one page title) item in the list.
Try this:
$documentTitle = $dom->getElementsByTagName('title')->item(0)->textContent;
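
Put together with the link loop, that might look roughly like the sketch below. The sample page and the $links array are hypothetical stand-ins for the asker's own data structure, and the null check covers pages without a title element:

```php
<?php
// Sample page invented for this sketch.
$str = '<html><head><title>Example page</title></head>'
     . '<body><a href="/a">A</a><a href="/b">B</a></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($str);

// Read the title once, outside the loop; item(0) is null if there is none.
$titleNode = $dom->getElementsByTagName('title')->item(0);
$documentTitle = $titleNode !== null ? $titleNode->textContent : '';

$links = []; // hypothetical stand-in for the asker's array
foreach ($dom->getElementsByTagName('a') as $documentLink) {
    $links[] = [
        'href'  => $documentLink->getAttribute('href'),
        'title' => $documentTitle,
    ];
}
print_r($links);
```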

PHP DOMXPath problem

$xpath = new DOMXpath($doc);
$res = $xpath->query(".//*[@id='post2679883']/tr[2]/td[2]/div[2]");
foreach( $res as $obj ) {
var_dump($obj->nodeValue);
}
I need to select all the elements whose id contains the word "post".
Example:
<div id="post2242424">trarata</div>
<div id="post114525">trarata</div>
<div id="post8568686">trarata</div>
Question number two:
I need to get these elements with their HTML tags, but $obj->nodeValue returns the text without HTML tags.

You could use the XPath function starts-with to filter the nodes in your XPath if all the nodes you want have an id that starts with "post". For example:
$xpath->query(".//*[starts-with(@id, 'post')]/tr[2]/td[2]/div[2]");
The second part, I think, has been answered already: PHP DOMDocument stripping HTML tags
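
A small sketch covering both parts, under the assumption that the markup looks like the divs in the question (the inner b tag, the "other1" div, and the $found array are invented for illustration). It serializes each matched node with DOMDocument::saveHTML so the tags are kept:

```php
<?php
// Sample markup modeled on the question, plus one non-matching div.
$html = '<div id="post2242424">tra<b>rata</b></div>'
      . '<div id="post114525">trarata</div>'
      . '<div id="other1">skip</div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$found = []; // hypothetical map of id => HTML string
foreach ($xpath->query("//*[starts-with(@id, 'post')]") as $obj) {
    // saveHTML($node) keeps the tags, unlike nodeValue.
    $found[$obj->getAttribute('id')] = $doc->saveHTML($obj);
}
print_r($found);
```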

What xPath should I use to display the requested data?

I am using the following script to get the post title and the content of an RSS feed. Its structure is as follows (I hope I did not make any mistakes):
<div id="feedBody">
<div id="feedContent">
<div class="entry">
<h3>TITLE OF POST</h3>
<div base="http://feeds.feedburner.com/blogspot/hyMBI"
class="feedEntryContent"
> CONTENT OF POST </div>
</div>
</div>
</div>
<?php
$dom = new DOMDocument;
libxml_use_internal_errors(TRUE);
$dom->loadHTMLFile('http://feeds.feedburner.com/blogspot/hyMBI');
libxml_clear_errors();
$xPath = new DOMXPath($dom);
$links = $xPath->query('????????????????');
foreach($links as $link) {
printf("%s \n", $link->nodeValue);
}
?>
What XPath should I use to get the data? Is there any way of getting the title and the content separately?
Thanks a million, hopefully this is my last question on my project...

First, you should load the XML using load, not loadHTMLFile.
Judging by your variable name "$links", I guess you're wanting the values of the <link> elements inside the <item> elements. So construct an xpath query that says just that: //item/link.

Basic XPath: //div[@class="entry"] gets you a node list of all entries. You can get the first (or only) entry with //div[@class="entry"][1]. Relative to that node, h3 selects the title element and div[1] selects the content (if it's guaranteed that there's only one div; otherwise specify the class).
You can put them together like //div[@class="entry"][1]/h3 if you like, so that you only ever query from the root node. Otherwise, keep the entry node as the context for the next query, like:
$entries = $xPath->query('//div[@class="entry"][1]');
foreach ($entries as $entry) {
// The second argument is the context node; each call returns a DOMNodeList
$title = $xPath->evaluate('h3[1]', $entry);
$post = $xPath->evaluate('div[1]', $entry);
}
If your RSS returns a whole group of posts, you can leave off the first [1] and loop through the whole group this way.
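
Here is a runnable sketch of that loop over several entries. It assumes the document really has the div-based structure shown in the question; the sample titles and bodies and the $posts array are invented. Wrapping the relative path in the XPath string() function makes evaluate return a plain string instead of a node list:

```php
<?php
// Sample markup following the structure given in the question.
$html = '<div id="feedBody"><div id="feedContent">'
      . '<div class="entry"><h3>First title</h3>'
      . '<div class="feedEntryContent"> First body </div></div>'
      . '<div class="entry"><h3>Second title</h3>'
      . '<div class="feedEntryContent"> Second body </div></div>'
      . '</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xPath = new DOMXPath($dom);

$posts = []; // hypothetical result array, one item per entry
foreach ($xPath->query('//div[@class="entry"]') as $entry) {
    $posts[] = [
        // Paths are relative to $entry, passed as the context node.
        'title'   => $xPath->evaluate('string(h3[1])', $entry),
        'content' => trim($xPath->evaluate(
            'string(div[@class="feedEntryContent"][1])',
            $entry
        )),
    ];
}
print_r($posts);
```

This keeps the title and the content separate for each post, which is what the question asks for.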
