Dom Document - extract a document id & save

I am trying to extract a specific clump of HTML using dom document.
My code is as follows:
$domd = new DOMDocument('1.0', 'utf-8');
$domd->loadHTML($string);
$this->hook = 'content';
if($this->hook !== '') {
$main = $domd->getElementById($this->hook);
$newstr = "";
foreach($main->childNodes as $node) {
$newstr .= $domd->saveXML($node, LIBXML_NOEMPTYTAG);
}
$domd->loadHTML($newstr);
}
//MORE PARSING USING THE DOMD OBJECT
It works great BUT the foreach is quite slow, and I was wondering if there's a more intelligent way of doing this. I am re-loading the HTML into the $domd so I can keep editing. In the back of my mind I feel I should be saving a fragment, not re-loading the saved $newstr into the object.
Can this be made more elegant or faster?
Thanks!

I'm assuming you want to mutate your existing $domd document, replacing it completely with those child nodes you're grabbing from that content node:
UPDATE: Just realized that since you were reloading using loadHTML, you probably wanted to preserve the html/body nodes that it creates. Code below has been adjusted to empty body and append the fragment there:
$domd = new DOMDocument('1.0', 'utf-8');
$domd->loadHTML($string);
$this->hook = 'content';
if($this->hook !== '') {
$main = $domd->getElementById($this->hook);
$fragment = $domd->createDocumentFragment();
while($main->hasChildNodes()) {
$fragment->appendChild($main->firstChild);
}
$body = $domd->getElementsByTagName("body")->item(0);
while($body->hasChildNodes()) {
$body->removeChild($body->firstChild);
}
$body->appendChild($fragment);
}
//MORE PARSING USING THE DOMD OBJECT
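
A self-contained sketch of the same fragment technique follows. The sample markup in $string is an assumption (the question does not show the real input), and the hook element is looked up with DOMXPath purely so the sketch does not depend on how getElementById resolves ids for this input.

```php
<?php
// Hypothetical input: the question does not show the real $string.
$string = '<html><body><div id="header">Header</div>'
        . '<div id="content"><p>One</p><p>Two</p></div></body></html>';

$domd = new DOMDocument('1.0', 'utf-8');
$domd->loadHTML($string);

// Look up the hook element by id via XPath (an assumption for this sketch).
$xpath = new DOMXPath($domd);
$main  = $xpath->query('//*[@id="content"]')->item(0);
$body  = $domd->getElementsByTagName('body')->item(0);

if ($main !== null && $body !== null) {
    // Appending a node that already has a parent moves it, so this loop
    // drains $main into the fragment without serializing anything.
    $fragment = $domd->createDocumentFragment();
    while ($main->firstChild !== null) {
        $fragment->appendChild($main->firstChild);
    }
    // Empty <body>, then attach the fragment's children in one call.
    while ($body->firstChild !== null) {
        $body->removeChild($body->firstChild);
    }
    $body->appendChild($fragment);
}

// Print the resulting children of <body>.
foreach ($body->childNodes as $child) {
    echo $domd->saveHTML($child), "\n";
}
```

Because the nodes are moved rather than serialized and re-parsed, the document is only parsed once, which is where the original loop spent its time.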

Related

Xml Php Warning Extra Content at end of document

I'm trying to write a droid app that sends and receives XML between the app and a web service. When I try to run the following code
$dom = new domDocument;
$dom = simplexml_load_file('php://input');
$xml = simplexml_import_dom($dom);
$messages = Messages::find_by_sql("SELECT * FROM messages WHERE reciever = '$xml->userName'");
$xmlString = "";
if($messages)
{
foreach($messages as $message)
{
$ts = strtotime($message->ts);
$xmlString=$xmlString."<Message><sender>".$message->sender."</sender><reciever>".$message->reciever."</reciever><timestamp>"."123"."</timestamp><text>".$message->text."</text></Message>";
}
}
else
{
//do something
}
$xmlReturn = new DOMDocument('1.0', 'UTF-8');
$xmlReturn->loadXML($xmlString);
echo($xmlReturn->saveXML());
?>
I get a Warning Extra content at the end of the document.
The error comes from this line: $xmlReturn->loadXML($xmlString);
I'm not 100% sure that you can create an XML document by loading a string, but I've seen similar things done, and if you look here you can see what it outputs, which looks like valid XML to me.

An XML document can have only one root element. You are stringing together multiple <Message>…</Message> elements here, so a root element encapsulating them is missing.
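
As a minimal sketch of that fix, the response can be built with DOMDocument under a single root element instead of concatenating strings. The root name Messages and the sample rows are assumptions for illustration; the field names follow the question's code (including its spelling of reciever).

```php
<?php
// Hypothetical rows standing in for the database result in the question.
$messages = [
    ['sender' => 'alice', 'reciever' => 'bob', 'text' => 'Hi & welcome'],
    ['sender' => 'carol', 'reciever' => 'bob', 'text' => 'Lunch at 12?'],
];

$xmlReturn = new DOMDocument('1.0', 'UTF-8');

// Single root element wrapping every <Message> (the name is an assumption).
$root = $xmlReturn->createElement('Messages');
$xmlReturn->appendChild($root);

foreach ($messages as $row) {
    $message = $xmlReturn->createElement('Message');
    foreach ($row as $name => $value) {
        $field = $xmlReturn->createElement($name);
        // Text nodes are escaped on output, so "&" and "<" in the data are safe.
        $field->appendChild($xmlReturn->createTextNode($value));
        $message->appendChild($field);
    }
    $root->appendChild($message);
}

$xml = $xmlReturn->saveXML();
echo $xml;
```

Building the tree with createElement and createTextNode also avoids broken output when a message text contains characters such as & or <.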

Regex Replacement Dependent On Class

I have the following code that replaces all <img> tags on a page and adds the nCode image resizer to them. The code is as follows:
function ncode_the_content($content) {
return preg_replace("/<img([^`|>]*)>/im", "<img onload=\"NcodeImageResizer.createOn(this);\"$1>", $content);
}
What I need to do is make it so that if an image has the class "noresize", the preg_replace is not applied to it.
I have only managed to get it so that if there is the "noresize" class anywhere on the page it stops resizing all images instead of just the one with the correct class.
Any suggestions?
UPDATE:
Am I even remotely in the right ballpark with this?
function ncode_the_content($content) {
//Load the HTML page
$html = file_get_contents($content);
//Parse it. Here we use loadHTML as a static method
//to parse the HTML and create the DOM object in one go.
@$dom = DOMDocument::loadHTML($html);
//Init the XPath object
$xpath = new DOMXpath($dom);
//Query the DOM
$linksnoresize = $xpath->query( 'img[@class = "noresize"]' );
$links = $xpath->query( 'img[]' );
//Display the results as in the previous example
foreach($links as $link){
echo $link->getAttribute('onload'), 'NcodeImageResizer.createOn(this);';
}
foreach($linksnoresize as $link){
echo $link->getAttribute('onload'), '';
}
}

Here's some untested code:
$dom = new DOMDocument();
// loadHTML() is an instance method; calling it statically fails on PHP 8
$dom->loadHTML($content);
$images = $dom->getElementsByTagName("img");
foreach ($images as $image) {
// skip any image whose class attribute contains "noresize"
if (!strstr($image->getAttribute("class"), "noresize")) {
$image->setAttribute("onload", "NcodeImageResizer.createOn(this);");
}
}
But, if it were me, I would eschew any such inline event handler and instead just find the appropriate elements with JavaScript.
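
A runnable sketch of the DOM approach, under a couple of assumptions: the sample $content is made up, and the class check compares whole class tokens instead of using a substring test, so a class such as "noresizeable" would not be treated as "noresize".

```php
<?php
// Hypothetical post content for illustration.
$content = '<p><img src="a.png" class="photo">'
         . '<img src="b.png" class="photo noresize">'
         . '<img src="c.png" class="noresizeable"></p>';

$dom = new DOMDocument();
$dom->loadHTML($content);

foreach ($dom->getElementsByTagName('img') as $image) {
    // Split the class attribute into whitespace-separated tokens.
    $classes = preg_split('/\s+/', trim($image->getAttribute('class')));
    if (!in_array('noresize', $classes, true)) {
        $image->setAttribute('onload', 'NcodeImageResizer.createOn(this);');
    }
}

// loadHTML() wraps the input in <html><body>, so serialize only the
// children of <body> to get the modified content back.
$body   = $dom->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result;
```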

I ended up just using pure CSS and adding a <div> around the images I didn't want to be resized. I forced the width and height of that div back to auto and then removed the warning message that was displayed above them. Seems to work fine. Thanks for your help :)

Extract and dump a DOM node (and its children) in PHP

I have the following scenario and I've already spent hours trying to handle it: I'm developing a WordPress theme (hence PHP) and I want to check whether the content of a post (which is HTML) contains a tag with a certain id/class. If so, I want to extract it from the content and place it somewhere else.
Example: Let's say the text content of the Wordpress post is
<?php
/* $content actually comes from WP function get_the_content() */
$content = '<p>some text and so forth that I don\'t care about...</p> <div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';
?>
So how can I extract that div with the class (could also live with giving it an ID), output it (with tags and all that) in one place of the template, and output the rest (without the extracted tag, of course) in another place of the template?
I've already tried with the DOMDocument class, p.i.t.a. to me, maybe I'm too stupid.

Try:
$content = '<p>some text and so forth that I don\'t care about...</p> <div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';
$dom = new DomDocument;
$dom->loadHtml($content);
$xpath = new DomXpath($dom);
$contents = '';
foreach ($xpath->query('//div[@class="the-wanted-element"]') as $node) {
$contents = $dom->saveXml($node);
break;
}
echo $contents;
How to get the remaining xml/html:
$content = '<p>some text and so forth that I don\'t care about...</p> <div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';
$dom = new DomDocument;
$dom->loadHtml($content);
$xpath = new DomXpath($dom);
foreach ($xpath->query('//div[@class="the-wanted-element"]') as $node) {
$node->parentNode->removeChild($node);
break;
}
$contents = '';
foreach ($xpath->query('//body/*') as $node) {
$contents .= $dom->saveXml($node);
}
echo $contents;
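
The two steps can also be combined so the content is parsed only once. The helper below is a sketch, not part of the answer: the name split_by_class is hypothetical, the class is matched as a whole token, and saveHTML() is used so the markup is serialized as HTML.

```php
<?php
/**
 * Hypothetical helper: returns [extracted markup, remaining markup].
 * $class is assumed to contain no quote characters.
 */
function split_by_class(string $content, string $class): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($content);
    $xpath = new DOMXPath($dom);

    // Match the class as a whole token, so extra classes on the div are fine.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $extracted = '';
    $node = $xpath->query($query)->item(0);
    if ($node !== null) {
        $extracted = $dom->saveHTML($node);      // markup of the wanted div
        $node->parentNode->removeChild($node);   // drop it from the document
    }

    // Whatever is left inside <body> is the remaining content.
    $rest = '';
    foreach ($xpath->query('//body/node()') as $child) {
        $rest .= $dom->saveHTML($child);
    }

    return [$extracted, $rest];
}

$content = '<p>some text and so forth that I don\'t care about...</p> '
         . '<div class="the-wanted-element"><p>I WANT THIS DIV!!!</p></div>';

[$wanted, $rest] = split_by_class($content, 'the-wanted-element');
echo $wanted, "\n---\n", $rest, "\n";
```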

Replace element using phpquery (php version of jquery)

I want to replace all <span> tags with <p> using phpquery. What is wrong with my code? It finds the span but the replaceWith function is not doing anything.
$event = phpQuery::newDocumentHTML(file_get_contents('event.html'));
$formatted_event = $event->find('span')->replaceWith('<p>');
This documentation indicates this is possible:
http://code.google.com/p/phpquery/wiki/Manipulation#Replacing
http://api.jquery.com/replaceWith/
This is the html that gets returned with and without ->replaceWith('<p></p>') in the code:
<span class="Subhead1">Event 1<br></span><span class="Subhead2">Event 2<br>
August 12, 2010<br>
2:35pm <br>
Free</span>

If you don't mind a plain DOMDocument solution (DOMDocument is used under the hood of phpQuery to parse the HTML fragments), I did something similar a while ago. I adapted the code to do what you need:
$document = new DOMDocument();
// load html from file
$document->loadHTMLFile('event.html');
// find all span elements in document
$spanElements = $document->getElementsByTagname('span');
$spanElementsToReplace = array();
// use temp array to store span elements
// as you can't iterate a live NodeList and replace its nodes at the same time
foreach($spanElements as $spanElement) {
$spanElementsToReplace[] = $spanElement;
}
// create a p element, append the children of the span to the p element,
// replace span element with p element
foreach($spanElementsToReplace as $spanElement) {
$p = $document->createElement('p');
foreach($spanElement->childNodes as $child) {
$p->appendChild($child->cloneNode(true));
}
$spanElement->parentNode->replaceChild($p, $spanElement);
}
// print innerHTML of body element
print DOMinnerHTML($document->getElementsByTagName('body')->item(0));
// --------------------------------
// Utility function to get innerHTML of an element
// -> "stolen" from: http://www.php.net/manual/de/book.dom.php#89718
function DOMinnerHTML($element) {
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
Maybe this can get you in the right direction on how to do the replacement in phpQuery?
EDIT:
I gave the jQuery documentation of replaceWith another look, and it seems that you have to pass in the whole HTML fragment that you want as the new, replaced content.
This code snippet worked for me:
$event = phpQuery::newDocumentHTML(...);
// iterate over the spans
foreach($event->find('span') as $span) {
// make $span a phpQuery object, otherwise its just a DOMElement object
$span = pq($span);
// fetch the innerHTML of the span, and replace the span with <p>
$span->replaceWith('<p>' . $span->html() . '</p>');
}
print (string) $event;
I couldn't find any way to do this with chained method calls in one line.
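
For the plain DOMDocument variant earlier in this answer, a more compact sketch is possible. It is an illustration under assumptions, not a drop-in replacement: the sample markup is made up (wrapped in a div so the parser does not add paragraphs of its own), iterator_to_array() takes the snapshot of the live node list, the children are moved instead of cloned, and the span's attributes are copied to the new p.

```php
<?php
// Hypothetical markup modelled on the question's event.html.
$html = '<div><span class="Subhead1">Event 1<br></span>'
      . '<span class="Subhead2">Event 2<br>Free</span></div>';

$document = new DOMDocument();
$document->loadHTML($html);

// getElementsByTagName() returns a live list, so take a snapshot first.
$spans = iterator_to_array($document->getElementsByTagName('span'));

foreach ($spans as $span) {
    $p = $document->createElement('p');

    // Carry over attributes such as class.
    foreach ($span->attributes as $attribute) {
        $p->setAttribute($attribute->nodeName, $attribute->nodeValue);
    }

    // Move (not clone) the children into the new element.
    while ($span->firstChild !== null) {
        $p->appendChild($span->firstChild);
    }

    $span->parentNode->replaceChild($p, $span);
}

echo $document->saveHTML();
```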

Won't str_replace do a better job at this? It's faster and easier to debug.
Always take into consideration that external libraries may have bugs :)
$htmlContent = str_replace("<span", "<p", $htmlContent);
$htmlContent = str_replace("</span>", "</p>", $htmlContent);

Try:
$formatted_event = $event->find('span')->replaceWith('<p></p>');

Did you try:
$formatted_event = $event->find('span')->replaceWith('p');

You could actually use the wrap function
$event = phpQuery::newDocumentHTML(...);
// iterate over the spans
foreach($event->find('span') as $span) {
// make $span a phpQuery object, otherwise its just a DOMElement object
$span = pq($span);
// wrap the span with <p>
$span->wrap('<p></p>');
}
print (string) $event;

How to get a div via PHP?

I get a page using file_get_contents from a remote server, but I want to filter that page and get a DIV from it that has class "text" using PHP. I started with DOMDocument but I'm lost now.
Any help?
$file = file_get_contents("xx");
$elements = new DOMDocument();
$elements->loadHTML($file);
foreach ($elements as $element) {
if( !is_null($element->attributes)) {
foreach ($element->attributes as $attrName => $attrNode) {
if( $attrName == "class" && $attrNode== "text") {
echo $element;
}
}
}
}

Once you have loaded the document into a DOMDocument instance, you can run XPath queries on it, which may be easier than walking the DOM yourself.
For that, you can use the DOMXPath class.
For example, you should be able to do something like this:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="text"]');
foreach ($tags as $tag) {
var_dump($tag->textContent);
}
(Not tested, so you might need to adapt the XPath query a bit...)
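
A self-contained sketch of that idea, with two additions that are assumptions on my part: an inline string stands in for the remote page, and libxml_use_internal_errors() is used because real-world pages often trigger parser warnings. saveHTML($node) returns the div's markup, where textContent would return only its text.

```php
<?php
// Hypothetical markup standing in for the page fetched with file_get_contents().
$html = '<html><body><div class="menu">Menu</div>'
      . '<div class="text"><p>Wanted content</p></div></body></html>';

// Collect libxml parse warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$found = [];
foreach ($xpath->query('//div[@class="text"]') as $div) {
    // Markup of the element, including its own tags.
    $found[] = $dom->saveHTML($div);
}

echo implode("\n", $found), "\n";
```

Note that @class="text" only matches when the attribute is exactly "text"; a div with several classes needs a token-based test instead.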

Personally, I like Simple HTML Dom Parser.
include "lib.simple_html_dom.php";
$html = file_get_html('http://scrapeyoursite.com');
// find() with an index returns a single element; without one it returns an array
echo $html->find('div.text', 0)->plaintext;
Pretty simple, huh? It accommodates selectors like jQuery :)

You can use simple_html_dom; see the simple_html_dom documentation.
Or use code like this:
include "simple_html_dom.php";
$html = new simple_html_dom();
$html->load_file('www.yoursite.com');
$con_div = $html->find('div', 0); // first div on the page
// print the div's content as plain text
echo $con_div->plaintext;
This finds the first div (index 0 in find('div', 0)) and shows it as plain text.
I hope it helps you.
