I am parsing some XML with PHP DOMDocument. This is my code:
$doc = new DOMDocument;
$doc->resolveExternals = true;
$doc->substituteEntities = true;
$doc->load('../poems_xml/'.$pid.'.xml');
$xsl = new DOMDocument;
$xsl->load('../xslt/title.xsl');
$proc = new XSLTProcessor;
$proc->importStylesheet($xsl);
$ptitle = $proc->transformToXML($doc);
I have an entity file declared at the beginning of my .xml:
<?xml version="1.0" encoding="utf-8"?>
<?oxygen RNGSchema="../dtd/dps.rng" type="xml"?>
<?xml-stylesheet href="../dtd/dps.css" type="text/css"?>
<!DOCTYPE TEI SYSTEM "../dtd/entities.ent">
[...]
And the entities file looks like this:
[...]
<!ENTITY d1_AytR_002 "<rs key='d1_AytR_002'>d1_AytR_002</rs>">
[...]
In my .xml I use these entities like so:
...&d1_AytR_002;...
Now, everything works in terms of parsing the file and transforming it via the XSLT and CSS files, except for the entities: they just get ignored. With PHP error logging turned on, I get this:
Notice: DOMDocument::load(): Namespace default prefix was not found in Entity, line: 1 in index.php on line 28
(line 28 of index.php is where the load('../poems_xml/'.$pid.'.xml') instruction is). Can someone shed some light on what I should check/add regarding my entities?
I'm using PHP 5.6.40.
A workaround (and possibly a permanent solution) is to add the namespace to each <!ENTITY> declaration. Because the entity value is delimited by double quotes, the attribute values inside it must use single quotes:
<!ENTITY d1_AytR_002 "<rs xmlns='http://www.tei-c.org/ns/1.0' key='d1_AytR_002'>d1_AytR_002</rs>">
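To check that the workaround has the intended effect, here is a minimal sketch. It assumes an inline DTD in place of the external entities.ent file and a made-up one-paragraph TEI document; only the entity declaration comes from the question. It counts the rs elements that end up in the TEI namespace once the entity has been expanded.

```php
<?php
// Inline DTD standing in for the external entities.ent file (assumption for this demo).
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE TEI [
  <!ENTITY d1_AytR_002 "<rs xmlns='http://www.tei-c.org/ns/1.0' key='d1_AytR_002'>d1_AytR_002</rs>">
]>
<TEI xmlns="http://www.tei-c.org/ns/1.0"><p>&d1_AytR_002;</p></TEI>
XML;

$doc = new DOMDocument;
$doc->substituteEntities = true; // expand entity references while parsing
$doc->loadXML($xml);

// Count the <rs> elements that carry the TEI namespace after expansion.
$teiNs = 'http://www.tei-c.org/ns/1.0';
$rsCount = $doc->getElementsByTagNameNS($teiNs, 'rs')->length;
echo "rs elements in the TEI namespace: $rsCount\n";
```

If the stylesheet matches on namespaced TEI elements, the expanded rs element should then be picked up like any other element in the document.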
Please give me a hint why my code is NOT vulnerable to XXE.
code:
$text = $_POST['textarea'];
$doc= new DOMDocument();
$doc->loadXML($text);
echo $doc->textContent;
testcase 1:
<justsomexmltag>Hello world</justsomexmltag>
result 1:
Hello world
So far so good. However, when I'm trying to inject XML code to retrieve a local file's content:
<?xml version="1.0"?>
<!DOCTYPE log [
<!ENTITY ent SYSTEM "test.txt">
]>
<log><text>&ent;</text></log>
then nothing is printed. "test.txt" is on the same level in the file structure as the php file where I carry out the attack. I have tried
<!ENTITY ent SYSTEM file:///"test.txt">
as well as
<!ENTITY ent SYSTEM file:///full path to the file>
but to no avail.
test.txt:
This is just a test.
Have tried:
<test>This is just a test.</test>
no results.
Any hints?
In response to @Paul Crovella, here's an edit:
Copying and pasting your code resulted in:
DOMDocument::loadXML(): I/O warning : failed to load external entity file:// full path to file name
DOMDocument::loadXML(): Failure to process entity ent in Entity
DOMDocument::loadXML(): Entity 'ent' not defined in Entity
By default libxml will not load external entities precisely to avoid this issue. To convince it to do so you'd need to set either substituteEntities or validateOnParse to true prior to loading. E.g.:
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE log [
<!ENTITY ent SYSTEM "test.txt">
]>
<log><text>&ent;</text></log>
XML;
$dom = new DOMDocument();
$dom->substituteEntities = true;
$dom->loadXML($xml);
echo $dom->textContent;
Outputs:
This is just a test.
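For code that handles untrusted input, the opposite goal usually applies: keep the defaults and refuse documents that declare entities at all. The following is only a sketch of that idea. The helper name parseUntrustedXml is hypothetical, and rejecting every DOCTYPE is one possible policy, not something the answer above prescribes.

```php
<?php
// Hypothetical helper: parse untrusted XML and refuse any document with a DOCTYPE.
function parseUntrustedXml($text)
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom = new DOMDocument();
    // Defaults are kept: substituteEntities and validateOnParse stay false.
    $ok = $dom->loadXML($text);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        throw new InvalidArgumentException('Input is not well-formed XML.');
    }
    if ($dom->doctype !== null) {
        throw new InvalidArgumentException('DOCTYPE declarations are not accepted.');
    }
    return $dom;
}

$safeText = parseUntrustedXml('<justsomexmltag>Hello world</justsomexmltag>')->textContent;
echo $safeText, "\n";

$attack = '<?xml version="1.0"?><!DOCTYPE log [<!ENTITY ent SYSTEM "test.txt">]><log><text>&ent;</text></log>';
try {
    parseUntrustedXml($attack);
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}
echo $rejected ? "rejected\n" : "accepted\n";
```

Checking $dom->doctype after parsing means the decision does not depend on whether the entity would have resolved.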
Thanks to this helpful community I've been able to write an XSL stylesheet that extracts some meta-information from the XML files on my site. Of course, I do not want to reference the stylesheet directly in the XML files, which should be left untouched. Also, I do not want to preprocess the files in OxyGen and upload the resulting metainfo files.
So I simply tried this, in metainfo.php:
<?php echo '<?xml-stylesheet type="text/xsl" href="metainfo.xsl"?>'; include ('sample.xml') ?>
Still, loading metainfo.php displays the whole XML file. The source code looks fine, but when I copy it, save it as XML and open it in OxyGen, there is this little bugger in the code (an invisible character), which apparently is called a BOM:
<?xml-stylesheet type="text/xsl" href="metainfo.xsl"?> <?xml-stylesheet type="text/xsl" href="metainfo.xsl"?>
Might this cause the trouble in the browser too? Or is it something else, more basic?
After some extra work, here's what I figured out as a solution myself:
<?php
# basename() keeps the request parameter from pointing outside xml/
$signatur = basename($_GET['signatur']);
# LOAD XML FILE
$XML = new DOMDocument();
$XML->load( 'xml/'.$signatur.'.xml' );
# START XSLT
$xslt = new XSLTProcessor();
# IMPORT STYLESHEET 1
$XSL = new DOMDocument();
$XSL->load( 'metainfo.xsl' );
$xslt->importStylesheet( $XSL );
#PRINT
print $xslt->transformToXML( $XML );
?>
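For the original goal of letting the browser apply the stylesheet, one likely problem with the echo-then-include attempt is ordering: if the included file starts with an XML declaration (or a BOM), the echoed processing instruction ends up in front of it, and an XML declaration is only allowed at the very start of a document. Here is a hedged sketch of an alternative that inserts the processing instruction through the DOM. The inline $source string is an assumption standing in for the real file.

```php
<?php
// Inline sample standing in for the real XML file (assumption for this demo).
$source = '<?xml version="1.0" encoding="UTF-8"?><record><title>Sample</title></record>';

$doc = new DOMDocument();
$doc->loadXML($source);

// Build the processing instruction and place it before the root element,
// so it follows the XML declaration instead of preceding it.
$pi = $doc->createProcessingInstruction(
    'xml-stylesheet',
    'type="text/xsl" href="metainfo.xsl"'
);
$doc->insertBefore($pi, $doc->documentElement);

$output = $doc->saveXML();
echo $output;
```

When serving this to a browser, an XML content type header would presumably be needed as well. The server-side transformation above avoids that dependency on the client.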
I am getting an error while loading an XML file. I found many answers related to this topic, but I could not work out why the error occurs with my file.
Warning: DOMDocument::load(): Extra content at the end of the document
When I run the script the first time, it succeeds, but when I reload it, it gives the above error instead of adding another node. The next time I reload, it succeeds again. This happens alternately. Could someone tell me why this is happening and how to solve it?
I am using this php code to edit the xml file:
<?php
$dom = new DomDocument("1.0", "UTF-8");
$dom->load('filename.xml');
$noteElem = $dom->createElement('note');
$toElem = $dom->createElement('to', 'Chikck');
$fromElem = $dom->createElement('from', 'ewrw');
$noteElem->appendChild($toElem);
$noteElem->appendChild($fromElem);
$dom->appendChild($noteElem);
$dom->formatOutput = TRUE;
//$xmlString = $dom->saveXML();
//echo $xmlString;
$dom->save('filename.xml');
?>
This is the xml file I am editing:
<?xml version="1.0" encoding="UTF-8"?>
<note>
<to>Chikck</to>
<from>ewrw</from>
</note>
The "Extra content" error occurs because an XML document may have only one root element. Your script appends the new note element to the document itself, so after the first run the file contains two top-level note elements and can no longer be loaded.
You could add a new root element, notes for example, and then add more note elements inside it.
Here's an example using the SimpleXML extension (just because it's the one I use and am familiar with).
New filename2.xml (with the added notes element as root):
<?xml version="1.0" encoding="UTF-8"?>
<notes>
<note>
<to>Chikck</to>
<from>ewrw</from>
</note>
</notes>
PHP script:
<?php
$xml = simplexml_load_file('filename2.xml');
$note = $xml->addChild('note');
$to = $note->addChild('to', 'Chikck');
$from = $note->addChild('from', 'ewrw');
$xml->asXML('filename2.xml');
?>
filename2.xml after running script:
<?xml version="1.0" encoding="UTF-8"?>
<notes>
<note>
<to>Chikck</to>
<from>ewrw</from>
</note>
<note>
<to>Chikck</to>
<from>ewrw</from>
</note>
</notes>
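The same fix can also be sketched with DOMDocument, keeping closer to the code in the question. The XML is loaded from an inline string here (an assumption, so that the sketch does not touch filename2.xml); the key change is appending to documentElement and not to the document.

```php
<?php
// Inline copy of filename2.xml so the sketch runs without touching the disk.
$xml = '<?xml version="1.0" encoding="UTF-8"?><notes><note><to>Chikck</to><from>ewrw</from></note></notes>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$noteElem = $dom->createElement('note');
$noteElem->appendChild($dom->createElement('to', 'Chikck'));
$noteElem->appendChild($dom->createElement('from', 'ewrw'));

// Append under the existing root element, not under the document itself.
$dom->documentElement->appendChild($noteElem);

$noteCount = $dom->getElementsByTagName('note')->length;
echo $dom->saveXML();
```

With a real file, load() and save() would take the place of loadXML() and saveXML().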
I have an XML doc that I need to load with PHP. I am currently using the simplexml_load_file() function, however the xml file is malformed, and consequently I am getting a parse error.
The XML file looks something like this:
...
</result>something1>
</else>
</else>
</resu
...
As you can see, this XML is whack and this function is throwing an error trying to parse it. Also I don't need this data that is corrupted. I would just like to read in the stuff that I can and throw everything else away.
As Jonah Bron suggested, try DOMDocument::loadHTML():
$dom = new DOMDocument();
$dom->strictErrorChecking = false;
libxml_use_internal_errors(true);
$dom->loadHTML($xml);
@Juliusz
I don't think you actually need to set strictErrorChecking for this. To ignore the errors, call libxml_use_internal_errors(true). Essentially, you want to use DOMDocument instead of SimpleXML. I tried the following and it worked without any problems:
<?php
$string = <<<XML
<?xml version='1.0'?>
<document>
<cmd>login</cmd>
<login>Richard</login>
</else>
</else>
</document>
XML;
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($string);
print $dom->saveHTML();
?>
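A variation on the same idea, offered only as a sketch: DOMDocument also has a recover property that asks libxml to keep whatever it managed to parse as XML, so the input does not have to go through the HTML parser. What survives recovery depends on where the input breaks, so the result should be inspected before it is relied on.

```php
<?php
$string = <<<'XML'
<?xml version='1.0'?>
<document>
<cmd>login</cmd>
<login>Richard</login>
</else>
</else>
</document>
XML;

libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
$dom = new DOMDocument();
$dom->recover = true;             // ask libxml to keep what it could parse
$dom->loadXML($string);

$errors = libxml_get_errors();    // array of LibXMLError objects
libxml_clear_errors();

// Read back one of the elements that precede the broken markup.
$cmd = $dom->getElementsByTagName('cmd')->item(0);
$cmdText = $cmd !== null ? $cmd->textContent : null;
echo count($errors), " parse error(s); cmd = ", var_export($cmdText, true), "\n";
```

The collected errors show where the parser gave up, which can help decide whether the recovered part is enough.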
Thusjanthan Kubendranathan
Try cleaning it up with the Tidy extension; it worked well for me.
http://hu2.php.net/manual/en/intro.tidy.php
I'm using PHP5 to create XML files. I have code like this:
$doc = new DOMDocument();
...
$xml_content = $doc->saveXML();
The problem is that the generated XML starts with an XML declaration like this one:
<?xml version="1.0"?>
But I want it to be like this:
<?xml version="1.0" standalone="yes" ?>
I guess I need to call some function on $doc, but I can't figure out which one?
You want to set
$doc->xmlStandalone = true;
It's not a function of the class, it's a property so it's a little harder to find in the docs. You can read about it here.
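A minimal sketch of the property in use; the root element here is a placeholder added only so the document has something to serialize:

```php
<?php
$doc = new DOMDocument('1.0');
$doc->xmlStandalone = true; // adds standalone="yes" to the XML declaration
$doc->appendChild($doc->createElement('root')); // placeholder content

$xml_content = $doc->saveXML();
echo $xml_content;
```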