Disable html entity encoding in PHP DOMDocument - php

I cannot figure out how to stop DOMDocument from mangling these characters.
<?php
$doc = new DOMDocument();
$doc->substituteEntities = false;
$doc->loadHTML('<p>¯\(°_o)/¯</p>');
print_r($doc->saveHTML());
?>
Expected Output:
¯\(°_o)/¯
Actual Output:
&Acirc;&macr;\(&Acirc;&deg;_o)/&Acirc;&macr;
http://codepad.org/W83eHSsT

I've found a hint in the comments of DOMDocument::loadHTML documentation:
(Comment from <mdmitry at gmail dot com> 21-Dec-2009 05:02: "You can also load HTML as UTF-8 using this simple hack:")
Just add '<?xml encoding="UTF-8">' before the HTML-input:
$doc = new DOMDocument();
//$doc->substituteEntities = false;
$doc->loadHTML('<?xml encoding="UTF-8">' . '<p>¯\(°_o)/¯</p>');
print_r($doc->saveHTML());

Putting
<?xml version="1.0" encoding="utf-8">
at the top of the document takes care of the encoding, for both saveXML and saveHTML.
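As a rough sketch of the workaround described above: the example below loads the same string with the encoding hint and reads it back two ways. The variable names $html and $p are illustrative only, and passing a node to saveHTML() is just one way to skip the DOCTYPE/html/body wrapper that loadHTML() adds.

```php
<?php
// Sketch of the encoding-hint workaround; $html and $p are illustrative names.
$html = '<p>¯\(°_o)/¯</p>';

$doc = new DOMDocument();
// The pseudo XML declaration only tells libxml which charset the input uses.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

// Fetch the paragraph that was parsed from the input.
$p = $doc->getElementsByTagName('p')->item(0);

// The text as stored in the tree.
echo $p->textContent, "\n";

// Serialize only the <p> node instead of the whole document.
echo $doc->saveHTML($p), "\n";
```

Checking textContent first is a quick way to tell whether the problem is in parsing (wrong input charset) or in serialization.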

Related

Creating PHP -> XML with special character '&'

I'm trying to create an XML from PHP with special characters.
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8" ' . 'standalone="yes"?><Root/>');
$data->addChild('NAME', $variable);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xmlAusgabe = $xml->asXML());
$dom->save('../test.xml');
When there is a special character like '&' in it the output is empty.
I thought these characters are available in UTF-8.
Can someone help me?
Embedding XML often causes problems like this :( Try the following.
Replace
$dom->loadXML($xmlAusgabe = $xml->asXML());
with
$xmlAusgabe = $xml->asXML();
$xmlAusgabe = mb_convert_encoding( $xmlAusgabe, 'HTML-ENTITIES', 'UTF-8') ;
$dom->loadXML( $xmlAusgabe );
Then check that both encoding and fileencoding are set to UTF-8. If you use the Vim editor:
set encoding=utf-8
set fileencoding=utf-8
UPDATE
libxml_use_internal_errors( FALSE );
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Root/>');
$xml->addChild('NAME', mb_convert_encoding( $variable, 'HTML-ENTITIES', 'UTF-8') );
$xmlAusgabe = $xml->asXML();
$dom = new DOMDocument();
$dom->loadXML( $xmlAusgabe );
$dom->save('../test.xml');
I checked this code and it runs. To collect parse errors, enable libxml_use_internal_errors(true) and then call:
libxml_get_errors()
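For the '&' case specifically, here is a minimal sketch of two alternatives that do not rely on the 'HTML-ENTITIES' conversion (which is deprecated as of PHP 8.2). It assumes the value is plain text; $variable is a stand-in value and the ALIAS element name is hypothetical. SimpleXMLElement::addChild() does not escape '&' itself, whereas property assignment does.

```php
<?php
// Sketch: two ways to get a literal '&' into a SimpleXML child element.
// $variable is a stand-in value; the ALIAS element name is hypothetical.
$variable = 'Tom & Jerry';

$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><Root/>');

// Option 1: property assignment escapes '&' automatically.
$xml->NAME = $variable;

// Option 2: addChild() expects already-escaped text, so escape it first.
$xml->addChild('ALIAS', htmlspecialchars($variable, ENT_XML1, 'UTF-8'));

// Re-load into DOMDocument only to get pretty-printed output.
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml->asXML());

$output = $dom->saveXML();
echo $output;
```

Both elements should end up with the ampersand serialized as &amp;, which is what any XML parser reading the file will decode back to '&'.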

Create an XML document using the DOM object with whitespace characters

I have a question: I am trying to create a XML file using DomDocument and I would like to have this output:
<?xml version="1.0" encoding="UTF-8"?>
<winstrom version="1.0">
<main_tag>
<child_tag>example</child_tag>
</main_tag>
</winstrom>
The problem is with the second row - if I write it as below then the output is "Invalid Character Error". I guess it is not allowed to have space characters there... However I need it like this, so what are the options?
$dom = new DomDocument('1.0', 'UTF-8');
$root = $dom->createElement('winstrom version=1.0');
$dom->appendChild($root);
$item = $dom->createElement('hlavni_tag');
$root2->appendChild($item);
$text = $dom->createTextNode('example');
$item->appendChild($text);
$dom->formatOutput = true;
echo $dom->saveXML();
There seems to be a misunderstanding of what an XML element is and how it differs from attributes.
Try this code:
<?php
$dom = new DomDocument('1.0', 'UTF-8');
$root = $dom->createElement('winstrom');
$root->setAttribute("version","1.0");
$dom->appendChild($root);
$root2 = $dom->createElement("main_tag"); //You forgot this part
$root->appendChild($root2);
$item = $dom->createElement('hlavni_tag'); //Should it be "child_tag"?
$root2->appendChild($item);
$text = $dom->createTextNode('example');
$item->appendChild($text);
$dom->formatOutput = true;
echo $dom->saveXML();

PHP XML response start tag expected, but i see it in var_dump

I have the following being returned
var_dump:
string(799) "<?xml version="1.0" encoding="ISO-8859-1"?> <serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service" xmlns:com="http://www.webex.com/schemas/2002/06/common" xmlns:att="http://www.webex.com/schemas/2002/06/service/attendee"><serv:header><serv:response><serv:result>SUCCESS</serv:result><serv:gsbStatus>BACKUP</serv:gsbStatus></serv:response></serv:header><serv:body><serv:bodyContent xsi:type="att:registerMeetingAttendeeResponse" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><att:register><att:attendeeID>29281003</att:attendeeID></att:register></serv:bodyContent></serv:body></serv:message>"
I'm trying to use SimpleXML, but I'm first validating the output with this function (sorry, I can't remember where I found it on Stack Overflow):
function isXML($xml){
libxml_use_internal_errors(true);
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($xml);
$errors = libxml_get_errors();
if(empty($errors)){
return true;
}
$error = $errors[0];
if($error->level < 3){
return true;
}
$explodedxml = explode("r", $xml);
$badxml = $explodedxml[($error->line)-1];
$message = $error->message . ' at line ' . $error->line . '. Bad XML: ' . htmlentities($badxml);
return $message;
}
result of isXML()
Start tag expected, '<' not found at line 1. Bad XML: <?xml ve
I see the '<', unless the var_dump is inaccurate. I've broken this thing down as much as I could. Any help would be greatly appreciated.
I stripped the problem down a little more:
$xml = '<?xml version="1.0" encoding="ISO-8859-1"?>
<serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service"/>';
// escape xml special chars - this will provoke the error
$xml = htmlspecialchars($xml);
$document = new DOMDocument();
$document->loadXml($xml);
Output:
Warning: DOMDocument::loadXML(): Start tag expected, '<' not found in Entity, line: 1 in /tmp/...
What happens is that your XML is still escaped/encoded. You do not see that in the browser because the browser interprets the special characters: it treats the response (including the var_dump() output) as HTML. Open the source view to check the actual value.
Debug the code that reads the XML string; you might want to change it or add an html_entity_decode() call there.
HINT: Your XML uses namespaces, so you might be better off with DOM + XPath. Check out DOMXPath::evaluate().
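A minimal sketch of both suggestions, assuming the string really was escaped with htmlspecialchars() somewhere upstream: decode it, load it, and read values with namespace-aware XPath. The XML is a shortened copy of the response from the question; the prefixes registered on the DOMXPath object are free to choose and only need to map to the right namespace URIs.

```php
<?php
// Sketch: undo the escaping, then query the namespaced document with XPath.
// The XML below is a shortened copy of the response shown in the question.
$raw = '<?xml version="1.0" encoding="ISO-8859-1"?>'
    . '<serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service"'
    . ' xmlns:att="http://www.webex.com/schemas/2002/06/service/attendee">'
    . '<serv:header><serv:response><serv:result>SUCCESS</serv:result>'
    . '</serv:response></serv:header>'
    . '<serv:body><serv:bodyContent><att:register>'
    . '<att:attendeeID>29281003</att:attendeeID>'
    . '</att:register></serv:bodyContent></serv:body></serv:message>';

// Simulate the escaped string that triggers "Start tag expected".
$escaped = htmlspecialchars($raw, ENT_QUOTES);

// Decode before parsing; loadXML() would reject $escaped.
$xml = htmlspecialchars_decode($escaped, ENT_QUOTES);

$document = new DOMDocument();
$document->loadXML($xml);

// Register prefixes for the namespaces used in the XPath expressions.
$xpath = new DOMXPath($document);
$xpath->registerNamespace('serv', 'http://www.webex.com/schemas/2002/06/service');
$xpath->registerNamespace('att', 'http://www.webex.com/schemas/2002/06/service/attendee');

// string() makes evaluate() return a plain PHP string instead of a node list.
$result = $xpath->evaluate('string(//serv:result)');
$attendeeId = $xpath->evaluate('string(//att:attendeeID)');

echo $result, ' ', $attendeeId, "\n";
```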

PHP DOMDocument - createDocumentFragment does not work with loadHTML

I have a string that contains HTML and I would like to insert this HTML in a DOMElement.
For that, I did:
$abstract = '<p xmlns:default="http://www.w3.org/1998/Math/MathML">Test string <formula type="inline"><default:math xmlns="http://www.w3.org/1998/Math/MathML"><default:mi>π</default:mi></default:math></formula></p>';
$dom = new \DOMDocument();
#$dom->loadHTML($abstract);
$frag = $dom->createDocumentFragment();
When var dumping the $frag->nodeValue, I am getting null. Any idea?
I am not sure what you expect: you are creating a new fragment but adding no content to it. Even if you did, it would not work, because a document fragment is not a regular node; it is a helper construct for adding an XML fragment to a document.
Here is an example:
$dom = new \DOMDocument();
$body = $dom->appendChild($dom->createElement('body'));
$fragment = $dom->createDocumentFragment();
$fragment->appendXml('<p>first</p>second');
$body->appendChild($fragment);
echo $dom->saveHtml();
Output:
<body><p>first</p>second</body>
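Applied to the string from the question, the same idea might look like the sketch below. It assumes the markup is well-formed XML, since appendXML() uses the XML parser rather than the HTML parser; the wrapper element name "abstract" is hypothetical. The text is read from the element that received the fragment, not from the fragment itself.

```php
<?php
// Sketch: append the asker's markup through a fragment, then read the text
// back from the receiving element. The element name "abstract" is hypothetical.
$abstract = '<p xmlns:default="http://www.w3.org/1998/Math/MathML">Test string '
    . '<formula type="inline">'
    . '<default:math xmlns="http://www.w3.org/1998/Math/MathML">'
    . '<default:mi>π</default:mi></default:math></formula></p>';

$dom = new DOMDocument('1.0', 'UTF-8');
$wrapper = $dom->appendChild($dom->createElement('abstract'));

$frag = $dom->createDocumentFragment();

// appendXML() returns false when the string is not well-formed XML.
$ok = $frag->appendXML($abstract);
if ($ok) {
    // Appending the fragment moves its children into $wrapper.
    $wrapper->appendChild($frag);
}

echo $wrapper->textContent, "\n";
```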

Self-closing tags using createElement

I need to add a self-closing tag to an XML file with DOM in PHP, but I don't know how, because by default the tag looks like this:
<tag></tag>
But it should look like this:
<tag/>
DOM will do that automatically for you
$dom = new DOMDocument;
$dom->appendChild($dom->createElement('foo'));
echo $dom->saveXml();
will give by default
<?xml version="1.0"?>
<foo/>
unless you do
$dom = new DOMDocument;
$dom->appendChild($dom->createElement('foo'));
echo $dom->saveXml($dom, LIBXML_NOEMPTYTAG);
which would then give
<?xml version="1.0"?>
<foo></foo>
Just pass a node param to DOMDocument::saveXML in order to output only a specific node, without any XML declaration:
$doc = new \DOMDocument('1.0', 'UTF-8');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = false;
$node = $doc->createElement('foo');
// Trimming the default carriage return char from output
echo trim($doc->saveXML($node));
will give
<foo/>
not containing any new line / carriage return ending char.
