PHP -- Closing end DIV tag

I need to find the closing tag in the code below:
<div class="emph"><div class="level"> Some testing </div></div>
Here I need to find the correct closing tag for the parent DIV. My goal is to add the class name before each closing DIV, like this:
<div class="emph"><div class="level"> Some testing <!--level--></div><!--emph--></div>
For that I need to find the exact closing tag of the parent DIV.
Is that possible in PHP?

You can use SimpleXML or DOMDocument (or any other XML class): for each div element, read its class and append it as a comment at the end of the node's content. It's not exactly finding the closing tag, but it achieves your specified goal.
Sample code:
$dom = new DOMDocument;
$dom->loadXML($xml);
foreach ($dom->getElementsByTagName('div') as $div) {
    $class = $div->getAttribute('class');
    if ($class !== '') {
        // Append a real comment node. Assigning the markup to nodeValue
        // would escape it as text and discard the nested div elements.
        $div->appendChild($dom->createComment($class));
    }
}
echo $dom->saveXML();

While printing the divs in PHP, keep an array: $div_array = array();
As soon as you open a div, do:
array_push($div_array, 'emph'); // or 'level', depending on the class name
When you're ready to print the closing tag, get the class of the most recently opened div with:
array_pop($div_array);
// for example
echo '<!-- '.array_pop($div_array).' -->';
Popping also removes the last entry from the array, which I presume is what you want.
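The stack idea above can be sketched end to end as follows. This is only an illustration: openDiv() and closeDiv() are hypothetical helper names, not part of the original answer, and the sketch assumes you control the code that prints every div.

```php
<?php
// Hypothetical helpers that track the currently open divs on a stack.
$divStack = [];

function openDiv(array &$stack, string $class): string
{
    $stack[] = $class; // remember the class of the div just opened
    return '<div class="' . htmlspecialchars($class) . '">';
}

function closeDiv(array &$stack): string
{
    $class = array_pop($stack); // class of the innermost div still open
    return '<!--' . $class . '--></div>';
}

$out  = openDiv($divStack, 'emph');
$out .= openDiv($divStack, 'level');
$out .= ' Some testing ';
$out .= closeDiv($divStack); // closes "level"
$out .= closeDiv($divStack); // closes "emph"

echo $out, PHP_EOL;
```

Because divs close in the reverse order of opening, a last-in, first-out stack always hands back the class that belongs to the tag being closed.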

Related

How to parse multiple elements in portions for html via Simple Html Dom

I am attempting to get various elements inside of an li as shown below. I am pretty new to this so I may not be using the most efficient methods but this is where I have started...
EXAMPLE CODE SIMPLIFIED....
<li id='entry_0' title='09879879'>
<div ....>
<h2> The title text would go here </h2>
<span class='entrySize' ....> 20oz </span>
<span class='entryPrice' ....> $32.09 </span>
<span class='anotherEntry' ....> More Data I need To Grab </span>
.......
</div>
</li>
<li> .... With same structure as above .... 100's of entries like this </li>
I know how to pull individual parts separately, but I'm having trouble grasping how to do it grouped within a portion of the HTML.
$filename = "directory/file.html";
$html = file_get_html($filename);
for($i=0; $i<=count(entryNumber);$i++)
{
$li_id = "entry_".$i;
foreach($html->find('li[id='.$li_id.']') as $li) {
echo $li->innertext;
}
}
So this gets me the content of the li tag with the id as the unique attribute. I would like to grab the h2 text, entrySize, entryPrice, etc. as I iterate through the li tags. What I don't understand is how, once I have the li content, I can parse that li's inner tags and attributes. Other parts of the full HTML document may have tags with the same id or class as these, so I am breaking the document down into portions and then parsing one section at a time.
I would also like to pull the title attribute out of the li tag.
I hope my explanation makes sense.
You should probably use a DOM parser. PHP comes bundled with one, and there are many others you could use.
http://php.net/dom
PHP Simple HTML DOM Parser
<?php
$html = file_get_contents($page);
$doc = new DOMDocument();
$doc->loadHTML($html);
// now find what you need
$items = $doc->getElementsByTagName('li');
foreach ($items as $item) {
$id = $item->getAttribute('id');
if (strpos($id, 'entry_') === 0) {
// found matching li, grab its children
}
}
Use this as a baseline; we can't write all the code for you. To finish it, follow the PHP docs on how to grab the child values and handle them. :)
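One possible way to grab the child values is to scope XPath queries to each li. The following is a minimal sketch, assuming markup shaped like the simplified example in the question; the inline sample and the array keys are made up for illustration.

```php
<?php
// Sample markup modeled on the question; real pages will differ.
$html = <<<'HTML'
<ul>
<li id="entry_0" title="09879879">
  <div>
    <h2> The title text would go here </h2>
    <span class="entrySize"> 20oz </span>
    <span class="entryPrice"> $32.09 </span>
  </div>
</li>
</ul>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$entries = [];

// Every <li> whose id starts with "entry_"
foreach ($xpath->query('//li[starts-with(@id, "entry_")]') as $li) {
    // Passing $li as the context node limits each lookup to this <li>
    $entries[] = [
        'id'    => $li->getAttribute('id'),
        'title' => $li->getAttribute('title'),
        'name'  => trim($xpath->evaluate('string(.//h2)', $li)),
        'size'  => trim($xpath->evaluate('string(.//span[@class="entrySize"])', $li)),
        'price' => trim($xpath->evaluate('string(.//span[@class="entryPrice"])', $li)),
    ];
}

print_r($entries);
```

Relative paths that start with a dot (.//h2) only look inside the current li, so tags with the same class elsewhere in the document are not picked up.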

Insert HTML codes to specific location

I know this topic has been posted everywhere, but those questions are not what I want. I want to insert some HTML before the page is loaded, without touching the original code in the page.
Suppose my body is rendered by a function called render_body():
function render_body() {
return "<body>
<div class='container'>
<div class='a'>A</div>
<div class='b'>B</div>
</div>
</body>";
}
Now I want to insert HTML using PHP without editing render_body(). I want a function that inserts some divs into the container div.
render_body();
<?php // Insert <div class="c"> inside the container div ?>
As an alternative, you can use XPath: load the output of render_body() into a DOMDocument object and create a DOMXPath object to query the HTML, so you can easily find where to insert the new HTML.
This works most reliably with well-formed HTML; loadHTML() tolerates some malformed markup but emits warnings for it.
// read in the document
$xml = new DOMDocument();
$xml->loadHTML(render_body());
// create an XPath query object
$xpath = new DOMXPath($xml);
// create the HTML node you want to insert
$newDiv = $xml->createElement('div', 'C');
$newDiv->setAttribute('class', 'c');
// find the node to which you want to attach the new content
$container = $xpath->query('//div[@class="container"]')->item(0);
$container->appendChild($newDiv);
// output
echo $xml->saveHTML();
Took a little while as I had to refer to the documentation ... too much jQuery lately, it's ruining my ability to manipulate the DOM without looking things up :\
The only thing I can think of is to turn on output buffering, then use the DOMDocument class to read in the entire buffer and make changes to it. It is worth reading the documentation (http://www.php.net/manual/en/book.dom.php).
For example:
<?php
function render_body() {
return "<body>
<div class='container'>
<div class='a'>A</div>
<div class='b'>B</div>
</div>
</body>";
}
$dom = new DOMDocument();
$dom->loadHTML(render_body());
// get body tag
$body = $dom->getElementsByTagName('body')->item(0);
// add a new element at the end of the body
$element = $dom->createElement('div', 'My new element at the end!');
$body->appendChild($element);
echo $dom->saveHTML(); // echo what is in the dom
?>
EDIT:
As per CD001's suggestions, I have tested this code and it works.
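The output-buffering part mentioned above is not shown in the code, so here is a rough sketch of how the two pieces might fit together. It assumes the page is produced by echoing render_body(); the text "C" inside the new div is a placeholder.

```php
<?php
function render_body() {
    return "<body>
<div class='container'>
<div class='a'>A</div>
<div class='b'>B</div>
</div>
</body>";
}

ob_start();               // capture everything echoed from here on
echo render_body();
$buffer = ob_get_clean(); // fetch the captured markup and stop buffering

$dom = new DOMDocument();
$dom->loadHTML($buffer);
$xpath = new DOMXPath($dom);

// Locate the container div and append <div class="c">C</div> to it
$container = $xpath->query('//div[@class="container"]')->item(0);
if ($container !== null) {
    $newDiv = $dom->createElement('div', 'C'); // "C" is placeholder content
    $newDiv->setAttribute('class', 'c');
    $container->appendChild($newDiv);
}

$result = $dom->saveHTML();
echo $result;
```

Note that saveHTML() serializes the whole document, so the output gains a DOCTYPE and an html wrapper that render_body() did not produce.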

Simple HTML DOM Parser - Get all plaintext rather than text of certain element

I tried all the solutions posted on this question. Although it is similar to mine, its solutions aren't working for me.
I am trying to get the plain text that is outside of <b> but inside <div id="maindiv">.
<div id=maindiv>
<b>I don't want this text</b>
I want this text
</div>
$part is the object that contains <div id="maindiv">.
Now I tried this:
$part->find('!b')->innertext;
The code above is not working. When I tried this
$part->plaintext;
it returned all of the plain text like this
I don't want this text I want this text
I read the official documentation, but I didn't find anything to resolve this:
Query:
$selector->query('//div[@id="maindiv"]/text()[2]')
Explanation:
// - selects nodes regardless of their position in tree
div - selects elements whose node name is 'div'
[@id="maindiv"] - selects only those divs having the attribute id="maindiv"
/ - sets focus to the div element
text() - selects only text elements
[2] - selects the second text element (the first is whitespace)
Note! The actual position of the text element may depend on
your preserveWhitespace setting.
Manual: http://www.php.net/manual/de/class.domdocument.php#domdocument.props.preservewhitespace
Example:
$html = <<<EOF
<div id="maindiv">
<b>I dont want this text</b>
I want this text
</div>
EOF;
$doc = new DOMDocument();
$doc->loadHTML($html);
$selector = new DOMXPath($doc);
$node = $selector->query('//div[@id="maindiv"]/text()[2]')->item(0);
echo trim($node->nodeValue); // I want this text
With Simple HTML DOM, remove the <b> first:
$part->find('b', 0)->outertext = '';
echo $part->innertext; // I want this text
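If you would rather not depend on the position of the text node, a variation on the XPath answer is to keep every direct text node of the div that contains more than whitespace. This is a sketch using the same sample markup, not something taken from the answers above.

```php
<?php
$html = <<<'EOF'
<div id="maindiv">
<b>I dont want this text</b>
I want this text
</div>
EOF;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Direct text-node children of the div that are not whitespace-only
$nodes = $xpath->query('//div[@id="maindiv"]/text()[normalize-space()]');

$texts = [];
foreach ($nodes as $node) {
    $texts[] = trim($node->nodeValue);
}

echo implode(' ', $texts), PHP_EOL;
```

The text inside <b> is a child of the b element rather than of the div, so text() on the div never returns it, whatever the preserveWhiteSpace setting is.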

Zend_Dom_Query query element issue

I have an issue where a div doesn't have a class or id. Is it possible to select a div element when I know its innerText? For example:
<div class="thishere"></div>
<div>Search on a this text</div>
If not, the div before it has a class; how do I find its next sibling?
$selector = new Zend_Dom_Query($response->getBody());
$nodes = $selector->query('????');
Using JavaScript, you can loop through every element on the page and find the div with the special class. The next element in the loop is then the second div, and you can get its contents using element.innerHTML.
$text = <<<text
<div class="thishere"></div>
<div>Search on a this text</div>
text;
$selector = new Zend_Dom_Query ($text);
$nodes = $selector->queryXpath('//div[contains(text(),"Search on a this text")]');
foreach ($nodes as $node)
{
...
}
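For the second part of the question (finding the next sibling of the div that has a class), the following-sibling axis is one option. The sketch below uses PHP's built-in DOMXPath so that it runs without Zend; the same expression should also be usable with queryXpath(), though that is untested here.

```php
<?php
$html = <<<'HTML'
<div class="thishere"></div>
<div>Search on a this text</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The element immediately after div.thishere, but only if it is a div
$query = '//div[@class="thishere"]/following-sibling::*[1][self::div]';
$next = $xpath->query($query)->item(0);

$text = $next !== null ? trim($next->textContent) : null;
echo $text, PHP_EOL;
```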

Retrieving relative DOM nodes in PHP

I want to retrieve the data of the next element tag in a document, for example:
I would like to retrieve <blockquote> Content 1 </blockquote> for every different span only.
<html>
<body>
<span id=12341></span>
<blockquote>Content 1</blockquote>
<blockquote>Content 2</blockquote>
<!-- misc html in between including other spans w/ no relative blockquotes-->
<span id=12342></span>
<blockquote>Content 1</blockquote>
<!-- misc html in between including other spans w/ no relative blockquotes-->
<span id=12343></span>
<blockquote>Content 1</blockquote>
<blockquote>Content 2</blockquote>
<blockquote>Content 3</blockquote>
<blockquote>Content 4</blockquote>
<!-- misc html in between including other spans w/ no relative blockquotes-->
<span id=12344></span>
<blockquote>Content 1</blockquote>
<blockquote>Content 2</blockquote>
<blockquote>Content 3</blockquote>
</body>
</html>
Now two things I'm wondering:
1.)How can I write an expression that matches and only outputs a blockquote that's followed right after a closed element (<span></span>)?
2.)If I wanted, how could I get Content 2, Content 3, etc if I ever have a need to output them in the future while still applying to the rules of the previous question?
Now two things I'm wondering:
1.)How can I write an expression that matches and only outputs a blockquote that's followed right after a closed element (<span></span>)?
Assuming that the provided text is converted to a well-formed XML document (you need to enclose the values of the id attributes in quotes)
Use:
/*/*/span/following-sibling::*[1][self::blockquote]
This means in English: Select all blockquote elements each of which is the first, immediate following sibling of a span element that is a grand-child of the top element of the document.
2.)If I wanted, how could I get Content 2, Content 3, etc if I ever have a need to output them in the future while still applying to the rules of the previous question?
Yes.
You can get all sets of contiguous blockquote elements following a span:
/*/*/span/following-sibling::blockquote
[preceding-sibling::*[not(self::blockquote)][1][self::span]]
You can get the contiguous set of blockquote elements following the (N+1)-st span by:
/*/*/span/following-sibling::blockquote
[preceding-sibling::*
[not(self::blockquote)][1]
[self::span and count(preceding-sibling::span)=$vN]
]
where $vN should be substituted by the number N.
Thus, the contiguous set of blockquote elements following the first span is selected by:
/*/*/span/following-sibling::blockquote
[preceding-sibling::*
[not(self::blockquote)][1]
[self::span and count(preceding-sibling::span)=0]
]
The contiguous set of blockquote elements following the second span is selected by:
/*/*/span/following-sibling::blockquote
[preceding-sibling::*
[not(self::blockquote)][1]
[self::span and count(preceding-sibling::span)=1]
]
etc. ...
See in the XPath Visualizer the nodes selected by the following expression :
/*/*/span/following-sibling::blockquote
[preceding-sibling::*
[not(self::blockquote)][1]
[self::span and count(preceding-sibling::span)=3]
]
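To try these expressions from PHP, something like the following sketch could be used. The helper name blockquotesAfterSpan() and the shortened sample document are assumptions made for illustration; $n plays the role of $vN, so 0 means the first span.

```php
<?php
// Shortened, well-formed sample (id values quoted, as the answer requires).
$xml = <<<'XML'
<html>
<body>
<span id="12341"></span>
<blockquote>Content 1</blockquote>
<blockquote>Content 2</blockquote>
<p>misc</p>
<span id="12342"></span>
<blockquote>Content A</blockquote>
</body>
</html>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Hypothetical helper: the contiguous blockquotes that follow the span
// with $n preceding sibling spans (0 = first span, 1 = second span, ...).
function blockquotesAfterSpan(DOMXPath $xpath, int $n): array
{
    $query = '/*/*/span/following-sibling::blockquote'
           . '[preceding-sibling::*[not(self::blockquote)][1]'
           . '[self::span and count(preceding-sibling::span)=' . $n . ']]';
    $result = [];
    foreach ($xpath->query($query) as $blockquote) {
        $result[] = $blockquote->textContent;
    }
    return $result;
}

print_r(blockquotesAfterSpan($xpath, 0));
print_r(blockquotesAfterSpan($xpath, 1));
```

The inner predicate looks backwards from each blockquote to the nearest sibling that is not a blockquote; the blockquote is kept only if that sibling is the span being asked for.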
Short answer: Load your HTML into DOMDocument, and select the nodes you want with XPath.
http://www.php.net/DOM
Long answer:
$flag = false;
$TEXT = array();
foreach ($body->childNodes as $el) {
if ($el->nodeName === '#text') continue;
if ($el->nodeName === 'span') {
$flag = true;
continue;
}
if ($flag && $el->nodeName === 'blockquote') {
$TEXT[] = $el->firstChild->nodeValue;
$flag = false;
continue;
}
}
Try the following *
/html/body/span/following-sibling::*[1][self::blockquote]
to match any first blockquotes after a span element that are direct children of body or
//span/following-sibling::*[1][self::blockquote]
to match any first blockquotes following a span element anywhere in the document
* edit: fixed Xpath. Credits to Dimitre. My initial version would match any first blockquote after the span, e.g. it would match span p blockquote, which is not what you wanted.
Both of the above would match the "Content 1" blockquotes. If you want to match the other blockquotes following the span (siblings, not descendants), remove the [1].
Example:
$dom = new DOMDocument;
$dom->load('yourFile.xml');
$xp = new DOMXPath($dom);
$query = '/html/body/span/following-sibling::*[1][self::blockquote]';
foreach($xp->query($query) as $blockquote) {
echo $dom->saveXml($blockquote), PHP_EOL;
}
If you want to do that without XPath, you can do
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->load('yourFile.xml');
$body = $dom->getElementsByTagName('body')->item(0);
foreach($body->getElementsByTagName('span') as $span) {
if($span->nextSibling !== NULL &&
$span->nextSibling->nodeName === 'blockquote')
{
echo $dom->saveXml($span->nextSibling), PHP_EOL;
}
}
If the HTML you scrape is not valid XHTML, use loadHTMLFile() instead to load the markup. You can suppress errors with libxml_use_internal_errors(TRUE) and libxml_clear_errors().
Also see Best methods to parse HTML for alternatives to DOM (though I find DOM a good choice).
Besides @Dimitre's good answer, you could also use:
/html
/body
/blockquote[preceding-sibling::*[not(self::blockquote)][1]
/self::span[@id='12341']]
