I am using PHP's DOMDocument to generate an XML file.
I am appending elements to the components tag of a sample XML file, but the closing tag is not written. I want each element to have an explicit closing tag.
My code currently produces this:
<component expiresOn="2022-12-31" id="pam" />
I want output like the following:
<component expiresOn="2022-12-31" id="pam"></component>
My PHP code sample:
$dom = new DOMDocument();
$dom->load("Config.xml");
$components = $dom->getElementsByTagName('components')->item(0);
if(!empty($_POST["pam"])) {
$pam = $_POST["pam"];
$component = $dom->createElement('component');
$component->setAttribute('expiresOn', $expirydate);
$component->setAttribute('id', "pam");
$components->appendChild($component);
}
$dom->save("Config.xml");
I tested the following suggestion and it's not working. The XML and PHP code in that question differ from mine.
$dom->saveXml($dom,LIBXML_NOEMPTYTAG);
Self-closing tags using createElement
I tested the following.
You're trying to use DOMDocument::saveXML to save the new XML back into the original file, but all that function does is return the XML as a string. Since you aren't assigning the result to anything, nothing happens.
If you want to save the XML back to your file, as well as avoiding self-closing tags, you'll need to use the save method as you originally were, and also pass the option:
$dom->save('licenceConfig.xml', LIBXML_NOEMPTYTAG);
See https://3v4l.org/e6N5s for a demo
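The difference between the two serializations can be seen without touching a file. The following is a minimal sketch that builds the document in memory (the attribute values are sample data taken from the question) and compares the default output with the LIBXML_NOEMPTYTAG output:

```php
<?php
// Build a small document in memory; element names mirror the question.
$dom = new DOMDocument('1.0', 'UTF-8');
$components = $dom->appendChild($dom->createElement('components'));

$component = $dom->createElement('component');
$component->setAttribute('expiresOn', '2022-12-31'); // sample value
$component->setAttribute('id', 'pam');
$components->appendChild($component);

// Default serialization collapses empty elements into self-closing tags.
$collapsed = $dom->saveXML();

// LIBXML_NOEMPTYTAG expands them into explicit open/close pairs.
$expanded = $dom->saveXML(null, LIBXML_NOEMPTYTAG);

echo $collapsed, $expanded;

// To write to disk instead, pass the same flag to save():
// $dom->save('Config.xml', LIBXML_NOEMPTYTAG);
```

Note that saveXML() only returns a string, so its result has to be echoed, assigned, or written out by other means.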
Related
I want to parse the HTML code in $raw to get the title and save it to MySQL. I have tried to do it with PHP DOM and the Ganon HTML parser, but when I run it I get an error 500. It would be great if you could solve this problem with Ganon.
function store($raw)
{
include_once('ganon.php');
$html = file_get_dom($raw);
echo $html('title', 0)->parent->getPlainText();
}
store ('<html> all html code </html>');
There are a few problems with your code.
Firstly, you use file_get_dom(), which expects to be passed a file name, so use str_get_dom() instead.
Secondly, the example HTML doesn't contain a title, so this won't work.
Then, when you find the title, you go to the parent element and output from there. You just need to use that node's content.
include_once('ganon.php');
function store($raw)
{
$html = str_get_dom($raw);
echo $html('title', 0)->getPlainText();
}
store ('<html><title>Title</title> all html code </html>');
outputs...
Title
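For comparison, the same extraction can be sketched with PHP's built-in DOMDocument instead of Ganon. This is an alternative technique, not the Ganon API; the function name extractTitle is hypothetical, and the MySQL step from the question is left out:

```php
<?php
// Hypothetical helper: returns the <title> text, or null if there is none.
function extractTitle(string $raw): ?string
{
    $dom = new DOMDocument();

    // Real-world HTML is often invalid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($raw);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $dom->getElementsByTagName('title');

    return $titles->length > 0 ? trim($titles->item(0)->textContent) : null;
}

$found = extractTitle('<html><head><title>Title</title></head><body>all html code</body></html>');
$missing = extractTitle('<html><body>no title here</body></html>');

var_dump($found, $missing);
```

Returning null for a missing title makes the "no title in the HTML" case explicit rather than failing on a null node.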
I created a php parser for editing the html which is created by a CMS. The first thing I do is parse a custom tag for adding modules.
After that, things like links and images are updated or changed as needed. This all works.
Now I noticed that when a custom tag is replaced with the HTML the module generated, this HTML is NOT processed by the rest of the actions.
For example, all links with an href of /pagelink-001 are replaced with the actual link of the current page. This works for the initially loaded HTML, but not for the replaced tag. Below is a short version of the code. I tried saving it with saveHtml() and loading it with loadHtml(), and things like that.
I'm guessing this is because $doc with the loaded html is not updated as such.
My code:
$html = 'Link1<customtag></customtag>';
// Load the html (all other settings are not shown to keep it simple. Can be added if this is important)
$doc->loadHTML($html);
// Replace custom tag
foreach($xpath->query('//customtag') as $module)
{
// Create fragment
$return = $doc->createDocumentFragment();
// Check the kind of module
switch($module)
{
case 'news':
$html = $this->ZendActionHelperThatReturnsHtml;
// <div class="news">Link2</div>
break;
}
// Fill fragment
$return->appendXML($html);
// Replace tag with html
$module->parentNode->replaceChild($return, $module);
}
foreach($doc->getElementsByTagName('a') as $link)
{
// Replace the /pagelink with a correct link
}
In this example Link1 href is replaced with the correct value, however Link2 is not. Link2 does correctly appear as a link and all that works fine.
Any directions of how I can update the $doc with the new html or if that is indeed the problem would be awesome. Or please tell me if I'm completely wrong (and where to look)!
Thanks in advance!!
It seemed that I was right: the returned value was a plain string and not HTML. I discovered in my code the innerHTML function from @Keyvan that I had implemented at some point. This resulted in my function being this:
// Start with the modules, so all that content can be fixed as well
foreach($xpath->query('//customtag') as $module)
{
// Create fragment
$fragment = $doc->createDocumentFragment();
// Check the kind of module
switch($module)
{
case 'news':
$html = htmlspecialchars_decode($this->ZendActionHelperThatReturnsHtml); // Note htmlspecialchars_decode!
break;
}
// Set contents as innerHtml instead of string
$module->innerHTML = $html;
// Append child
$fragment->appendChild($module->childNodes->item(0));
// Replace tag with html
$module->parentNode->replaceChild($fragment, $module);
}
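As a cross-check, here is a minimal sketch using only built-in DOM calls. It assumes the module returns well-formed markup that is not entity-escaped; the sample markup and the /current-page target are hypothetical. Under those assumptions, nodes inserted through a fragment are part of $doc and are found by a later getElementsByTagName() pass:

```php
<?php
$doc = new DOMDocument();

// <customtag> is not valid HTML, so suppress the parser warning.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML('<div><a href="/pagelink-001">Link1</a><customtag></customtag></div>');
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// Copy the matches first so replacing nodes cannot disturb the loop.
foreach (iterator_to_array($xpath->query('//customtag')) as $module) {
    $fragment = $doc->createDocumentFragment();
    // Stand-in for the module output; must be well-formed XML.
    $fragment->appendXML('<div class="news"><a href="/pagelink-001">Link2</a></div>');
    $module->parentNode->replaceChild($fragment, $module);
}

// Second pass: this now sees the original link and the inserted one.
$updated = 0;
foreach ($doc->getElementsByTagName('a') as $link) {
    if ($link->getAttribute('href') === '/pagelink-001') {
        $link->setAttribute('href', '/current-page'); // hypothetical target
        $updated++;
    }
}

echo $updated, "\n";
```

If the module output were escaped (for example &lt;a&gt; instead of <a>), appendXML() would insert it as text and the second pass would find only the original link, which matches the symptom described in the question.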
I am studying HTML parsing in PHP and I am using DOM for this.
I wrote this code in my PHP file:
<?php
$site = new DOMDocument();
$div = $site->createElement("div");
$class = $site->createAttribute("class");
$class->nodeValue = "wrapper";
$div->appendChild($class);
$site->appendChild($div);
$html = $site->saveHTML();
echo $html;
?>
And when I run this on the browser and view the page source, only this code comes out:
<div class="wrapper"></div>
I don't know why it is not showing the whole HTML document, as I expected it to. I am using XAMPP v3.2.1.
Please tell me where I went wrong with this. Thanks.
It's showing the whole HTML you created. A div node with a wrapper class attribute.
See the example in the docs. There the html, head, etc. nodes are explicitly created.
PHP only adds missing DOCTYPE, html and body elements when loading HTML, not when saving.
Adding $site->loadHTML($site->saveHTML()); before $html = $site->saveHTML(); will demonstrate this.
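A minimal sketch of that round trip, using the same div as in the question (the variable names $before and $after are only for illustration):

```php
<?php
$site = new DOMDocument();

$div = $site->createElement('div');
$div->setAttribute('class', 'wrapper');
$site->appendChild($div);

// Serializing a hand-built tree emits exactly the nodes that were created.
$before = $site->saveHTML();

// Loading the same markup lets the HTML parser add the missing
// DOCTYPE, html and body wrappers.
$site->loadHTML($before);
$after = $site->saveHTML();

echo $before, "\n", $after;
```

Only the second output contains the html and body elements, because they are added by the parser during loadHTML(), not by saveHTML().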
I need to work with namespaced XML in my code and process it. For instance:
<system:include file="./test.php" cache="true" />
That would be the final output of the content, but the special tags (like system:include) must be processed before it is sent to the client.
So I will go through all elements of the final output to search for namespaced tags or specific ones. The problem is that if I use DOMDocument and read it as XML, I run into problems with the namespace declaration (Namespace prefix system on include is not defined in Entity).
My test code is:
<?php
$document = new DOMDocument();
$document->loadXML('
<system:include file="./test.php" cache="true" />
');
foreach($document->childNodes as $node) {
var_dump($node->nodeName);
}
?>
I need to do this because I have to process some special tags and convert them to real HTML. For instance: convert <b> to <strong> (just an example!) or do something more useful, like including and caching a specific page using tags.
Another example:
<h7>Hello World!</h7>
Converts to:
<div class="h7">Hello World!</div>
Note: the ob contents will be sent to a specific method that will search for these special tags. So I don't know if I can add the namespace declarations beforehand (it will probably be hard and slow).
Bye!
I can get it to work if I specify a root element in the XML, and then declare the system namespace inside the root element. <root xmlns:system="system">...</root>
<?php
function dump($root) {
foreach($root->childNodes as $node) {
echo $node->nodeName;
echo "\n";
dump($node);
}
}
$doc = new DOMDocument();
$doc->loadXML('<root xmlns:system="system"><system:include file="./test.php" cache="true" /></root>');
dump($doc);
?>
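Building on that, the wrapping can be done in code so the original content does not need to declare the namespace itself. This is a sketch under assumptions: the root element name and the namespace URI urn:example:system are placeholders, and the content is assumed to be well-formed XML apart from the missing declaration:

```php
<?php
// Content as it arrives, without any namespace declaration.
$content = '<system:include file="./test.php" cache="true" />';

// Placeholder namespace URI; any URI works as long as it is used consistently.
$systemNs = 'urn:example:system';

// Wrap the content in a root element that declares the prefix.
$wrapped = '<root xmlns:system="' . $systemNs . '">' . $content . '</root>';

$doc = new DOMDocument();
$doc->loadXML($wrapped);

// Look up the special tags by namespace URI and local name.
$files = [];
foreach ($doc->getElementsByTagNameNS($systemNs, 'include') as $node) {
    $files[] = $node->getAttribute('file');
}

print_r($files);
```

Querying by namespace URI rather than by the literal string system:include keeps the lookup independent of whichever prefix the content happens to use.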
I'm attempting to scrape the value of an input box from a URL. I seem to be having problems with my implementation of XPath.
The page to be scraped looks something like:
<!DOCTYPE html>
<html lang="en">
<head></head>
<body>
<div><span>Blah</span></div>
<div><span>Blah</span> Blah</div>
<div>
<form method="POST" action="blah">
<input name="SomeName" id="SomeId" value="GET ME"/>
<input type="hidden" name="csrfToken" value="ajax:3575644127378754050" id="csrfToken-login">
</form>
</div>
</body>
</html>
and I'm attempting to parse it like this:
$Contents = file_get_contents("https://www.linkedin.com/uas/login");
$Selector = "//input[@id='csrfToken-login']/@value";
print_r($Selector);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHtml($Contents);
$xpath = new DOMXPath($dom);
libxml_use_internal_errors(false);
print_r($xpath->query($Selector));
NB: dump() just wraps print_r() but adds some stack trace info and formatting.
The output is as follows:
14:50:08 scraper.php 181: (Scraper->Test)
//input[@id='csrfToken-login']/@value
14:50:08 scraper.php 188: (Scraper->Test)
DOMNodeList Object
(
)
Which I'm assuming means it was unable to find anything in the document that matches my selector? I've tried a number of variations, just to see if I can get something back:
/input/@value
/input
//input
/div
The only selector which I've been able to get anything from is / which returns the entire document.
What am I doing wrong?
EDIT: As some can't reproduce the problem with the old example, I've replaced it with an almost identical example which also demonstrates the problem but uses a public URL (LinkedIn login page).
There's been a suggestion that this isn't possible due to the parser choking on html5 - (as is the internal page) anyone have any experience of this?
If your selector starts with a single slash (/), it denotes an absolute path from the root. Use a double slash (//) to select all matching elements regardless of their location.
print_r won't work for this. Everything was fine in your code except for actually getting the value.
List classes in PHP usually have a property called length; check that instead.
$Contents = file_get_contents("https://www.linkedin.com/uas/login");
$Selector = "//input[@id='csrfToken-login']/@value";
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHtml($Contents);
$xpath = new DOMXPath($dom);
libxml_use_internal_errors(false);
$b = $xpath->query($Selector);
echo $b->item(0)->value;
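The same approach can be tried offline. The sketch below uses the pared-down markup from the question as an inline string instead of fetching the live page, which may have changed since, and checks length before reading the value:

```php
<?php
// Sample markup copied from the question, including the unclosed input.
$html = <<<'HTML'
<!DOCTYPE html>
<html lang="en">
<head></head>
<body>
<div>
<form method="POST" action="blah">
<input name="SomeName" id="SomeId" value="GET ME"/>
<input type="hidden" name="csrfToken" value="ajax:3575644127378754050" id="csrfToken-login">
</form>
</div>
</body>
</html>
HTML;

$dom = new DOMDocument();

// Collect parser warnings instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//input[@id='csrfToken-login']/@value");

// A DOMNodeList prints as empty with print_r; inspect length instead.
$value = null;
if ($nodes !== false && $nodes->length > 0) {
    $value = $nodes->item(0)->nodeValue;
}

var_dump($value);
```

Checking length first avoids calling a property on null when the selector matches nothing.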
DOMXPath looks fine to me.
As for the XPath, use the descendant-or-self shortcut // to get to the input tag:
//input[@id='SomeId']/@value
I've been to the LinkedIn login page that you specified and it is malformed; even your pared-down example has an unclosed input node. I know nothing about PHP's XPath implementation, but I'm guessing no straight XPath API is ever going to work with a malformed document.
Your XPath is correct, by the way.
You might need an intermediary step using TagSoup to "well form" the source before you start querying it, or Google "tag soup php" for any PHP-specific solutions/implementations.
I hope this helps,
Zachary