How to parse html with php DomXpath, modify and save

After googling, I could not find anything related to my issue.
The problem: I parse a page and find one table (the page contains four tables).
Once I have found it, I want to add one or more rows to the table, but I don't know how to do it. The similar questions I found are about parsing XML and viewing its content.
In my code I have something like this:
$dom = new DOMDocument();
$dom->loadHTML($output->getHTML());
$xpath = new DOMXPath($dom);
$tableProp = $xpath->query('//*[@class="smwb-factbox"][2]');
....
$dom->asHTML();

The solution is simple:
Using methods such as createElement, setAttribute and appendChild I solved my problem. An example follows:
$dom = new DOMDocument();
$dom->loadHTML(mb_convert_encoding($output->getHTML(), 'HTML-ENTITIES', 'utf-8'));
$xpath = new DOMXPath($dom);
$tableProp = $xpath->query('//*[@class="smwb-factbox"][2]');
...
$th_el = $dom->createElement('th', $th_outer_inner_span_a_el);
...
$td_el = $dom->createElement('td', '');
$td_el->appendChild($td_el_outer_span);
$tr_el = $dom->createElement('tr', '');
$tr_el->setAttribute('class', 'smwb-propvalue');
$tr_el->appendChild($th_el);
$tr_el->appendChild($td_el);
$tableProp->item(0)->appendChild($tr_el);
$dom->saveHTML();
...
The idea is pretty simple.
I have a table in MediaWiki: I find it, create a new row, insert it, and then save the document. That's all.
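The steps above can be sketched end to end as follows. This is a minimal illustration, not the original poster's full code: the markup in $html is a hypothetical stand-in for the MediaWiki output, and the cell contents are invented. Note that saveHTML() returns the markup as a string rather than printing it, and that wrapping the path in parentheses, as in (//table[...])[1], indexes the whole result set instead of each group of siblings.

```php
<?php
// Hypothetical markup standing in for the MediaWiki page output.
$html = '<html><body>'
      . '<table class="other"><tr><td>x</td></tr></table>'
      . '<table class="smwb-factbox"><tr><th>Name</th><td>Value</td></tr></table>'
      . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Parentheses make [1] apply to the whole node set, not to each sibling group.
$table = $xpath->query('(//table[@class="smwb-factbox"])[1]')->item(0);

if ($table !== null) {
    // Build <tr class="smwb-propvalue"><th>..</th><td>..</td></tr> and attach it.
    $tr = $dom->createElement('tr');
    $tr->setAttribute('class', 'smwb-propvalue');
    $tr->appendChild($dom->createElement('th', 'Added'));
    $tr->appendChild($dom->createElement('td', 'New row'));
    $table->appendChild($tr);
}

$result = $dom->saveHTML();   // returns the modified document as a string
echo $result;
```

The string in $result can then be written to a file or handed back to whatever produced the original markup.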

Related

PHP DOMDocument how to get that content of this tag?

I am using domDocument hoping to parse this little html code. I am looking for a specific span tag with a specific id.
<span id="CPHCenter_lblOperandName">Hello world</span>
My code:
$dom = new domDocument;
@$dom->loadHTML($html); // the @ is to silence errors and misconfigures of HTML
$dom->preserveWhiteSpace = false;
$nodes = $dom->getElementsByTagName('//span[@id="CPHCenter_lblOperandName"');
foreach($nodes as $node){
echo $node->nodeValue;
}
But I think something is wrong with either the code or the HTML (how can I tell?):
When I count the nodes with echo count($nodes); the result is always 1
Nothing is output in the nodes loop
How can I learn the syntax of these complex queries?
What did I do wrong?

You can simply use getElementById:
$dom->getElementById('CPHCenter_lblOperandName')->nodeValue
or use an XPath selector:
$selector = new DOMXPath($dom);
$list = $selector->query('/html/body//span[@id="CPHCenter_lblOperandName"]');
echo($list->item(0)->nodeValue);
//or
foreach($list as $span) {
$text = $span->nodeValue;
}

Your four-part question gets an answer in three parts:
getElementsByTagName does not take an XPath expression; you need to give it a tag name;
Nothing is output because no tag would ever match the tag name you provided (see #1);
It looks like what you want is XPath, which means you need to create a DOMXPath object - see the PHP docs for more;
Also, a better way to control the libxml errors is to use libxml_use_internal_errors(true) (rather than the '@' operator, which will also hide other, more legitimate errors). That would leave you with code that looks something like this:
<?php
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach($xpath->query("//span[@id='CPHCenter_lblOperandName']") as $node) {
echo $node->textContent;
}
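As for the "how can I tell?" part of the question, one option is to inspect what libxml collected instead of discarding it. The sketch below assumes a small hypothetical HTML string with a deliberately stray closing tag; which messages are reported (if any) depends on the installed libxml version, so no particular output is implied.

```php
<?php
// Hypothetical input; the stray </p> is there on purpose.
$html = '<div><span id="CPHCenter_lblOperandName">Hello world</span></p></div>';

$previous = libxml_use_internal_errors(true);  // buffer parser errors
$dom = new DOMDocument();
$dom->loadHTML($html);
$errors = libxml_get_errors();                 // array of LibXMLError objects
libxml_clear_errors();
libxml_use_internal_errors($previous);         // restore the earlier setting

foreach ($errors as $error) {
    // Each entry carries a message plus the line and column where it occurred.
    printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
}

$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//span[@id='CPHCenter_lblOperandName']");
$text  = $nodes->length > 0 ? $nodes->item(0)->textContent : null;
echo $text, "\n";
```

Checking $nodes->length before calling item(0) avoids dereferencing null when the query matches nothing.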

In PHP, using DomDocument getElementByID not working? What am I doing wrong?

Here is a bit of my code...
$dom = new DomDocument;
$html = $newIDs[0];
$dom->validateOnParse = true;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = true;
$tryID = $dom->getElementById('ID');
echo $tryID;
I am trying to get multiple specific IDs from a website; this snippet shows just one. I have seen this method everywhere, including on here, but when I try to print something out, nothing shows up. I tried testing whether it is reading something in with
if(!$tryID)
{
("Element not found");
}
But it never prints that out either. Lastly, I have used
echo $tryID->nodeValue;
and still nothing... Does anyone know what I am doing wrong?
Also, if I do get this working, can I read multiple different things into different variables from the same $dom? If that makes any sense.

Ok, so your solution.
For a DIV:
<div id="divID" name="notWorking">This is not working!</div>
This will do:
<?php
$dom = new DOMDocument("1.0", "utf-8");
$dom->loadHTMLFile('YourFile.html');
$div = $dom->getElementById('divID');
echo $div->textContent;
$div->setAttribute("name", "yesItWorks");
?>
This should also work without a file, as long as you pass well-formed XML or XHTML content, by changing
$dom->loadHTMLFile('YourFile.html');
to your
$dom->loadHTML($html);
Oh yeah, and of course, to CHANGE the content (For completeness):
$div->removeChild($div->firstChild);
$newText = new DOMText('Yes this works!');
$div->appendChild($newText);
Then you can just Echo it again or something.
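Putting the pieces of this answer together, a minimal sketch might look like the following. It assumes the markup is passed as a string; the loop removes every existing child rather than only the first, and DOMDocument::saveHTML() is given the node so that only the div is serialized.

```php
<?php
$html = '<html><body><div id="divID" name="notWorking">This is not working!</div></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$div = $dom->getElementById('divID');
if ($div !== null) {
    $div->setAttribute('name', 'yesItWorks');

    // Remove all existing children, not just the first one.
    while ($div->firstChild !== null) {
        $div->removeChild($div->firstChild);
    }
    $div->appendChild($dom->createTextNode('Yes this works!'));
}

// Passing a node to saveHTML() serializes only that node.
$fragment = $dom->saveHTML($div);
echo $fragment, "\n";
```

To keep several values from the same document, call getElementById (or an XPath query) once per ID and store each result in its own variable.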

PHP DOMXpath not picking anything up

I'm trying to write a script that grabs the URL of the first image from this website: http://www.slothradio.com/covers/?adv=&artist=pantera&album=vulgar+display+of+power
Here's my script:
$content = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($content);
$xpath = new DOMXpath($doc);
$elements = $xpath->query("*/div[@class='album0']/img");
echo '<pre>';print_r($elements);exit;
When I run that, it outputs
DOMNodeList Object
(
)
Even when I change my query to $xpath->query("*/img"), I still get nothing. What am I doing wrong?

$doc->loadHTMLFile($content); takes a FILE PATH, not HTML content; see the documentation:
http://php.net/manual/en/domdocument.loadhtmlfile.php
Use
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
To output the elements, use
var_dump(iterator_to_array($elements));
//Or
print_r(iterator_to_array($elements));
Thanks
:)

What am I doing wrong?
You are using print_r, but DOMNodeList does not offer any output for that function (because it's an internal class). You can start with outputting the number of items for example. In the end you need to iterate over the node list and deal with each node on your own.
printf("Found %d element(s).\n", $elements->length);
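A further point, not raised in the answers above, is that the path itself may be the reason nothing matches: with no leading slash, "*/div" is evaluated from the document node, so it only considers div elements that are direct children of the root html element. The sketch below uses hypothetical markup (the real page structure is an assumption) to contrast that with "//div", and then iterates the node list to read the src attribute.

```php
<?php
// Hypothetical markup; the structure of the real page is an assumption.
$html = '<html><body>'
      . '<div class="album0"><img src="cover1.jpg"></div>'
      . '<div class="album1"><img src="cover2.jpg"></div>'
      . '</body></html>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Only <div> children of the root <html> element are considered here.
$relative = $xpath->query("*/div[@class='album0']/img");
// The descendant axis searches the whole document.
$anywhere = $xpath->query("//div[@class='album0']/img");

printf("relative: %d, anywhere: %d\n", $relative->length, $anywhere->length);

// Take the src of the first matching image, if there is one.
$firstSrc = null;
foreach ($anywhere as $img) {
    $firstSrc = $img->getAttribute('src');
    break;
}
echo $firstSrc, "\n";
```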

A question about saving a file in php

I've used the following code to do an XSLT in php:
# LOAD XML FILE
$XML = new DOMDocument();
$XML = simplexml_load_file("images/upload/source.xml");
# START XSLT
$xslt = new XSLTProcessor();
$XSL = new DOMDocument();
$XSL->load( 'xsl/transfer.xsl', LIBXML_NOCDATA);
$xslt->importStylesheet( $XSL );
#PRINT
print $XML->saveXML();
print $XML->save("newfile.xml") ;
The code is quite straightforward, we need to load the source xml file and then load up the stylesheet, and indeed it actually works.
The code that causes trouble is the last line:
print $XML->save("newfile.xml") ;
Running it produces the error "Fatal error: Call to undefined method SimpleXMLElement::save()". But I was actually following a tutorial here:
http://devzone.zend.com/article/1713.
Maybe I screwed something up. Could anybody give me a hint? Thanks in advance.
Following your advice, I modified the code like this:
# LOAD XML FILE
$XML = new DOMDocument();
$XML->load("images/upload/source.xml");
# START XSLT
$xslt = new XSLTProcessor();
$XSL = new DOMDocument();
$XSL->load( 'xsl/transfer.xsl', LIBXML_NOCDATA);
$xslt->importStylesheet( $XSL );
#PRINT
print $xslt->transformToXML( $XML );
Now the correctly transformed XML is shown in the browser. I've tried a few approaches but still can't figure out how to write this result to a file instead of showing it in the browser. Any help is appreciated, thanks in advance.

You're changing how $XML is defined. Simply call the load method on $XML instead of using simplexml_load_file:
$XML = new DOMDocument();
$XML->load("images/upload/source.xml");
There's no reason at all to use simplexml since the XSLT processing is all done with DOMDocument. So just replace that one line, and you should be good to go...

$XML = new DOMDocument();
$XML = simplexml_load_file("images/upload/source.xml");
First you store a DOMDocument in $XML, and then you replace it with a SimpleXMLElement. DOMDocument does have a save method, but SimpleXMLElement does not.
Admission: didn't look at the tutorial, so I don't know why/if that one works.

$XML = new DOMDocument();
$XML = simplexml_load_file("images/upload/source.xml");
You're saying that $XML is a DOMDocument and then you replace it with a SimpleXMLElement on line 2
Use
$XML = new DOMDocument();
$XML->load("images/upload/source.xml");
instead

Problem:
$XML = new DOMDocument();
$XML = simplexml_load_file("images/upload/source.xml");
You create a DOMDocument, which you then overwrite with a SimpleXMLElement object. The first line is dead code. You aren't using it at all, since you overwrite it in the next statement.
save is a method of DOMDocument. asXML($file) is the equivalent for SimpleXML (or saveXML($file), which is an alias).
If you look at the tutorial, it's clearly:
$xsl = new DomDocument();
$xsl->load("articles.xsl");
$inputdom = new DomDocument();
$inputdom->load("articles.xml");
So, if you use simplexml_load_file, then you're not really following the tutorial.
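The follow-up in the question (writing the transformed result to a file rather than to the browser) can be sketched as below. This is an illustration under assumptions: the inline XML and stylesheet are invented stand-ins for source.xml and transfer.xsl, the output path in the temp directory is arbitrary, and the xsl extension must be enabled.

```php
<?php
// Hypothetical stand-ins for images/upload/source.xml and xsl/transfer.xsl.
$xml = new DOMDocument();
$xml->loadXML('<articles><article>First</article></articles>');

$xsl = new DOMDocument();
$xsl->loadXML(
    '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
  . '<xsl:output method="xml" indent="yes"/>'
  . '<xsl:template match="/articles">'
  . '<titles><xsl:for-each select="article">'
  . '<title><xsl:value-of select="."/></title>'
  . '</xsl:for-each></titles>'
  . '</xsl:template>'
  . '</xsl:stylesheet>'
);

$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);

// transformToXML() returns the result as a string instead of printing it.
$transformed = $xslt->transformToXML($xml);
if (!is_string($transformed)) {
    throw new RuntimeException('XSLT transformation failed');
}

// Arbitrary output location chosen for this sketch.
$outFile = sys_get_temp_dir() . '/newfile.xml';
$bytes = file_put_contents($outFile, $transformed);
printf("Wrote %d bytes to %s\n", $bytes, $outFile);
```

XSLTProcessor also offers transformToUri($doc, $uri), which writes the result straight to the given location without the intermediate string.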

PHP HTML DOMDocument getElementById problems

A little new to PHP parsing here, but I can't seem to get PHP's DOMDocument to return what is clearly an identifiable node. The HTML loaded will come from the 'net so can't necessarily guarantee XML compliance, but I try the following:
<?php
header("Content-Type: text/plain");
$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;
/*** load the html into the object ***/
$dom->loadHTML($html);
var_dump($dom);
$belement = $dom->getElementById("bid");
var_dump($belement);
?>
Though I receive no error, I only receive the following as output:
object(DOMDocument)#1 (0) {
}
NULL
Should I not be able to look up the <b> tag as it does indeed have an id?

The Manual explains why:
For this function to work, you will need either to set some ID attributes with DOMElement->setIdAttribute() or a DTD which defines an attribute to be of type ID. In the later case, you will need to validate your document with DOMDocument->validate() or DOMDocument->validateOnParse before using this function.
By all means, go for valid HTML & provide a DTD.
Quick fixes:
Call $dom->validate(); and put up with the errors (or fix them); afterwards you can use $dom->getElementById(), regardless of the errors for some reason.
Use XPath if you don't feel like validating: $x = new DOMXPath($dom); $el = $x->query("//*[@id='bid']")->item(0);
Come to think of it: if you just set validateOnParse to true before loading the HTML, it would also work ;P
$dom = new DOMDocument();
$html ='<html>
<body>Hello <b id="bid">World</b>.</body>
</html>';
$dom->validateOnParse = true; //<!-- this first
$dom->loadHTML($html); //'cause 'load' == 'parse
$dom->preserveWhiteSpace = false;
$belement = $dom->getElementById("bid");
echo $belement->nodeValue;
Outputs 'World' here.

Well, you should check whether $dom->loadHTML($html); returns true (success), and I would try
var_dump($belement->nodeValue);
for output to get a clue what might be wrong.
EDIT:
http://www.php-editors.com/php_manual/function.domdocument-get-element-by-id.html - it seems that DOMDocument uses XPath internally.
Example:
$xpath = xpath_new_context($dom);
var_dump(xpath_eval_expression($xpath, "//*[@ID = 'YOURIDGOESHERE']"));
