I've been trying to reproduce the XXE vulnerability on my virtual machine for learning purposes. However, I think I misunderstand something. When I call libxml_disable_entity_loader(false); I would expect external entities to be loaded, but they do not seem to be.
When I set the flag LIBXML_NOENT, it does work. But that leaves me with the question: what does libxml_disable_entity_loader actually do? I have set it to both false and true, with the same results.
So I searched the internet with the same question and noticed a CVE report on the Ubuntu website (I tested on Ubuntu). The CVE stated that libxml is vulnerable to XXE by default. The fix was to stop loading external entities by default. See: https://usn.ubuntu.com/1904-1/
My test setup:
<?php
$xml= '<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "http://www.localhost:8000" >]><foo>&xxe;</foo>';
libxml_disable_entity_loader(false);
$document = new DOMDocument();
$document->loadXML($xml);
?>
I expected my webserver to receive a call, but it did not. The PHP handler gives me no output. So my question: what does libxml_disable_entity_loader actually do?
PS: Could not find anything in libxml code either. Could find the fix to stop loading external entities unless otherwise stated, though.
Thanks!
UPDATE: I have found that it only works with the combination of libxml_disable_entity_loader(false); and the LIBXML_NOENT flag set in the 'loadXML' method.
EDIT:
The function does the following (in the PHP source code):
PHP_LIBXML_API zend_bool php_libxml_disable_entity_loader(zend_bool disable) /* {{{ */
{
    zend_bool old = LIBXML(entity_loader_disabled);
    LIBXML(entity_loader_disabled) = disable;
    return old;
}
However, searching for entity_loader_disabled yields no results in libxml code.
Related
I'm using an instance of PHPs built-in XMLReader to read some kind of user-generated XML file. Usually this XML files content starts like the following sample snippet, where everything works fine:
<?xml version="1.0" encoding="UTF-8"?>
<openimmo>
<uebertragung art="OFFLINE" umfang="VOLL" version="1.2.7" (...)
However, another user uses a different software to send and generate the XML file. The XML generated by this software starts like:
<?xml version="1.0" encoding="UTF-8"?>
<openimmo xmlns="http://www.openimmo.de" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openimmo.de openimmo.xsd">
<uebertragung art="OFFLINE" umfang="VOLL" version="1.2.7" (...)
Which causes my importer to fail with the following error:
XMLReader::read(): Element '{http://www.openimmo.de}openimmo': No matching global declaration available for the validation root.
I'm already doing validation by manually applying some XSD schema. The passed file follows the same schema, just explicitly specifies the xmlns attributes. How can I work around this issue? How can I tell XMLReader to just ignore that xmlns statement?
My code (simplified to the relevant sections) looks like the following snippet:
$reader = new XMLReader();
$success = @$reader->open($path);
if (!$success) { /* error handling */ }
$reader->setSchema($localOpenImmoXsdPath);
/* then starts reading and throws the above exception */
Namespace information is fundamental and there's no way an XML parser is going to ignore it.
Your options are either (a) send the file back to sender, saying it doesn't conform to the agreed schema, or (b) transform the file sent to you so that it does conform, by changing the namespace. That's a fairly simple XSLT transformation.
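A minimal sketch of option (b), assuming PHP's xsl extension is installed. The sample input is a shortened, hypothetical version of the namespaced file from the question, and the stylesheet is one possible way to drop namespaces, not something taken from the OpenImmo specs:

```php
<?php
// Hypothetical input: a shortened version of the namespaced file.
$input = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<openimmo xmlns="http://www.openimmo.de" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openimmo.de openimmo.xsd">
  <uebertragung art="OFFLINE" umfang="VOLL" version="1.2.7"/>
</openimmo>
XML;

$stylesheet = <<<'XSL'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Re-create every element under its local name, i.e. in no namespace. -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
  <!-- Keep plain attributes, drop namespaced ones such as xsi:schemaLocation. -->
  <xsl:template match="@*">
    <xsl:if test="namespace-uri() = ''">
      <xsl:copy/>
    </xsl:if>
  </xsl:template>
  <!-- Copy text, comments and processing instructions unchanged. -->
  <xsl:template match="text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);

$source = new DOMDocument();
$source->loadXML($input);

// $clean holds the document without any namespace declarations.
$clean = $processor->transformToXml($source);
echo $clean;
```

The resulting string could then be handed to XMLReader (for instance through its XML() method instead of open()) before calling setSchema(), so the existing validation keeps working against the namespace-free schema.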
My immediate instinct was to look at the OpenImmo specs to see what they say about namespaces and schema conformance, but unfortunately access to the specs requires registration and licensing. Basically, either the specs allow both these formats, which would be a pretty shoddy spec, or they only allow one of them, in which case you shouldn't be accepting both.
I am using the xsd2php library to parse an XSD which describes an API request body. Then, using the same library (which itself uses jms-serializer), I try to serialize objects:
$payload = new TrackRequest;
$searchCriteria = new SearchCriteriaAType;
$searchCriteria->addToConsignmentNumber(11111);
$payload->setSearchCriteria($searchCriteria);
$levelOfDetail = new LevelOfDetailAType;
$levelOfDetail->setSummary(true);
$payload->setLevelOfDetail($levelOfDetail);
Using basic serializer settings:
$serializerBuilder = SerializerBuilder::create();
$serializerBuilder->addMetadataDir(__DIR__ . '/../../metadata/Tracking', 'TNTExpressConnect\Tracking\XSD');
$serializerBuilder->setPropertyNamingStrategy(new IdenticalPropertyNamingStrategy);
$serializerBuilder->configureHandlers(function (HandlerRegistryInterface $handler) use ($serializerBuilder) {
    $serializerBuilder->addDefaultHandlers();
    $handler->registerSubscribingHandler(new BaseTypesHandler()); // XMLSchema List handling
    $handler->registerSubscribingHandler(new XmlSchemaDateHandler()); // XMLSchema date handling
});
Serialization results in:
<?xml version="1.0" encoding="UTF-8"?>
<result>
<searchCriteria>
<account/>
<alternativeConsignmentNumber/>
<consignmentNumber>
<entry><![CDATA[11111]]></entry>
</consignmentNumber>
<customerReference/>
<pieceReference/>
</searchCriteria>
<levelOfDetail>
<summary>true</summary>
</levelOfDetail>
</result>
Regarding these results I have several questions:
Why is the root element <result> and not <TrackRequest>?
How do I get rid of CDATA?
How do I get rid of the <entry> tags in favor of creating a separate consignmentNumber tag for each entry?
How do I replace <summary>true</summary> with the self-closing tag <summary/>?
I guess I can create a dedicated handler for each of these cases, but maybe there is a built-in solution that I overlooked in the documentation (maybe some config options that can be placed in YAML).
And if I have to create handlers, maybe someone can point me to a more sophisticated example that explains how to do it right.
I'm not a big fan of annotations, so I would prefer to use separate config files.
Thank you in advance.
You should have a look at the YAML Reference. A lot of things can be set up with the metadata files.
To change the "result" to "TrackRequest" add this line to the file:
Vendor\MyBundle\Model\ClassName:
    xml_root_name: TrackRequest ## Changes the root element
To get rid of CDATA in entry, change the property:
    properties:
        entry:
            xml_element:
                cdata: false ## Add this to disable CDATA tags
I just came across the same problems as you did. I hope it helps.
I am parsing XML using SimpleXML in PHP 5 and external entities are not working. The XML parses, but the entities just are blank. The underlying library is libxml2.
This is the code:
libxml_disable_entity_loader(false);
simplexml_load_file($target_file);
It parses the XML as expected, but doesn't resolve the external entities and seems to ignore them.
This is the expected behavior, because you need to tell when loading the document to expand (and therefore remove) these entities:
libxml_disable_entity_loader(false);
simplexml_load_file($target_file, 'SimpleXMLElement', LIBXML_NOENT);
The LIBXML_NOENT constant is also cross-linked on the manual page of libxml_disable_entity_loader.
The function itself only enables or disables the default entity loader. Additionally the parser needs to be told via the libxml2 based option flag, that those entities should be substituted. Only then the default loader (or if you have set it to a different one) would kick in.
Online Demo:
<?php
/**
* @link https://stackoverflow.com/a/29864193/367456
*/
$buffer = <<<XML
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "data://text/plain,test" >]><foo>&xxe;</foo>
XML;
libxml_disable_entity_loader(false);
$xml = simplexml_load_string($buffer);
$xml->asXML('php://output');
$xml = simplexml_load_string($buffer, 'SimpleXMLElement', LIBXML_NOENT);
$xml->asXML('php://output');
Output for 5.2.11 - 5.6.8, php7@20140507 - 20150401, hhvm-3.5.0 - 3.6.1
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "data://text/plain,test">
]>
<foo>&xxe;</foo>
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "data://text/plain,test">
]>
<foo>test</foo>
The XML is based on the exploit examples outlined on XML External Entity (XXE) Processing (OWASP Wiki) and modified for the PHP demonstration regarding your question.
More important than the PHP version is the version of libxml, both on the system and in the PHP binding. I mention this in case the 3v4l.org demo code creates the impression that it always behaved the same in all PHP versions; that is not necessarily the case.
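Since the original question used DOMDocument rather than SimpleXML, here is a sketch of the same demo with DOMDocument. It reuses the data:// entity from the demo above so no web server is needed. It deliberately leaves out the libxml_disable_entity_loader(false) call, on the assumption that the loader is enabled by default and because that function is deprecated as of PHP 8.0; results may differ with other libxml versions.

```php
<?php
$buffer = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ELEMENT foo ANY >
<!ENTITY xxe SYSTEM "data://text/plain,test" >]><foo>&xxe;</foo>
XML;

// Without LIBXML_NOENT the parser keeps the entity reference as-is.
$plain = new DOMDocument();
$plain->loadXML($buffer);
echo $plain->saveXML($plain->documentElement), "\n";

// With LIBXML_NOENT the parser substitutes the entity, which is what
// makes it fetch the external resource.
$expanded = new DOMDocument();
$expanded->loadXML($buffer, LIBXML_NOENT);
echo $expanded->saveXML($expanded->documentElement), "\n";
```

Going by the SimpleXML output shown above, the first line should still contain the unexpanded &xxe; reference and the second should contain the substituted text.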
Related Q&A
Clarifications on XXE vulnerabilities throughout PHP versions (Jun 2014)
Check for malicious XML before allowing DTD loading? (Jul 2014)
I post a question here as a last resort, I have browsed the web and went through many attempts but did not succeed.
Replicating an XXE attack is what I am trying to do, in order to prevent them, but I cannot seem to get my head around the way PHP works with XML entities. For the record I am using PHP 5.5.10 on Ubuntu 12.04, but I have done some tests on 5.4 and 5.3, and libxml2 seems to be version 2.7.8 (which does not seem to include the change to not resolve entities by default).
In the following example, calling libxml_disable_entity_loader() with true or false has no effect, or I am doing something wrong.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY c PUBLIC "bar" "/etc/passwd">
]>
<root>
<test>Test</test>
<sub>&c;</sub>
</root>
XML;
libxml_disable_entity_loader(true);
$dom = new DOMDocument();
$dom->loadXML($xml);
// Prints Test.
print $dom->textContent;
But, I could specifically pass some arguments to loadXML() to allow some options, and that works when the entity is a local file, not when it is an external URL.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY c PUBLIC "bar" "/etc/passwd">
]>
<root>
<test>Test</test>
<sub>&c;</sub>
</root>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_NOENT | LIBXML_DTDLOAD);
// Prints Test.
print $dom->textContent;
Now if we are changing the entity to something else, as in the following example, the entity is resolved but I could not disable it at all using the parameters or function... What is happening?!
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY c "Blah blah">
]>
<root>
<test>Test</test>
<sub>&c;</sub>
</root>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml);
// Prints Test.
print $dom->textContent;
The only way that I could find was to overwrite the properties of the DOMDocument object.
resolveExternals set to 1
substituteEntities set to 1
Then they are resolved, or not.
So to summarise, I would really like to understand what I am obviously missing. Why do those parameters and that function seem to have no effect? Is libxml2 taking precedence over PHP?
Many thanks!
References:
https://www.owasp.org/index.php/XML_External_Entity_%28XXE%29_Processing
http://au2.php.net/libxml_disable_entity_loader
http://au2.php.net/manual/en/libxml.constants.php
http://www.vsecurity.com/download/papers/XMLDTDEntityAttacks.pdf
http://www.mediawiki.org/wiki/XML_External_Entity_Processing
How can I use PHP's various XML libraries to get DOM-like functionality and avoid DoS vulnerabilities, like Billion Laughs or Quadratic Blowup?
Keeping it simple .. As it should be simple :-)
Your first code snippet
libxml_disable_entity_loader does or does not do anything here based on whether your system resolves entities by default or not (mine does not). This is controlled by LIBXML_NOENT option of libxml.
Without it the document processor may not even try translating external entities and therefore libxml_disable_entity_loader has nothing to really influence (if libxml does not load entities by default which seems to be the case in your test-case).
Add LIBXML_NOENT to loadXML() like this:
$dom->loadXML($xml, LIBXML_NOENT);
and you'll quickly get:
PHP Warning: DOMDocument::loadXML(): I/O warning : failed to load external entity "/etc/passwd" in ...
PHP Warning: DOMDocument::loadXML(): Failure to process entity c in Entity, line: 7 in ...
PHP Warning: DOMDocument::loadXML(): Entity 'c' not defined in Entity, line: 7 in ...
Your second code snippet
In this scenario you've enabled entity resolving by using the LIBXML_NOENT option, that's why it goes after /etc/passwd.
The example works just fine on my machine even for external URL - I changed the ENTITY to an external one like this:
<!ENTITY c PUBLIC "bar" "https://stackoverflow.com/opensearch.xml">
It can, however, also be influenced by e.g. the allow_url_fopen PHP INI setting: set it to false and PHP won't ever load a remote file.
Your third code snippet
The XML entity that you've provided is not an external one but an internal one.
Your entity:
<!ENTITY c "Blah blah">
How an internal entity is defined:
<!ENTITY name "entity_value">
Therefore there is no reason for PHP or libxml to prevent resolving such an entity.
Conclusion
I've quickly put up a PHP XXE tester script which tries out different settings and shows whether XXE is successful and in which case.
The only line that should actually show a warning is the "LIBXML_NOENT" one.
If any other line shows "WARNING, external entity loaded!", your setup does allow loading external entities by default.
You should use libxml_disable_entity_loader() regardless of your or your provider's default machine settings. If your app ever gets migrated, it might become vulnerable instantly.
Correct usage
As the MediaWiki page you linked states:
Unfortunately, the way that libxml2 implements the disabling, the library is crippled when external entities are disabled, and functions that would otherwise be safe cause an exception in the entire parsing.
$oldValue = libxml_disable_entity_loader(true);
// do whatever XML-processing related
libxml_disable_entity_loader($oldValue);
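The save-and-restore pattern above can be wrapped in a small helper so the old value is restored even if the XML processing throws. This is only a sketch: the name withEntityLoaderDisabled is hypothetical, and the version check is an assumption based on libxml_disable_entity_loader() being deprecated as of PHP 8.0. On PHP 8 and later the helper does not touch the loader at all, so there it adds no protection if the callback passes LIBXML_NOENT.

```php
<?php
/**
 * Runs $work with the entity loader disabled and restores the previous
 * setting afterwards. Hypothetical helper, not part of PHP or libxml.
 */
function withEntityLoaderDisabled(callable $work)
{
    // Assumption: only call the function on PHP < 8.0, where it is not deprecated.
    $legacy = PHP_VERSION_ID < 80000;
    $oldValue = $legacy ? libxml_disable_entity_loader(true) : null;

    try {
        return $work();
    } finally {
        if ($legacy) {
            libxml_disable_entity_loader($oldValue);
        }
    }
}

// Usage: parse a string while the loader is disabled.
$text = withEntityLoaderDisabled(function () {
    $dom = new DOMDocument();
    $dom->loadXML('<root><test>Test</test></root>');
    return $dom->documentElement->textContent;
});
echo $text, "\n";
```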
Note: libxml_disable_entity_loader() also prohibits loading external xml files directly (not through entities):
<?php
$remote_xml = "https://stackoverflow.com/opensearch.xml";
$dom = new DOMDocument();
if ($dom->load($remote_xml) !== FALSE)
echo "loaded remote xml!\n";
else
echo "failed to load remote xml!\n";
libxml_disable_entity_loader(true);
if ($dom->load($remote_xml) !== FALSE)
echo "loaded remote xml after libxml_disable_entity_loader(true)!\n";
else
echo "failed to load remote xml after libxml_disable_entity_loader(true)!\n";
On my machine:
loaded remote xml!
PHP Warning: DOMDocument::load(): I/O warning : failed to load external entity "https://stackoverflow.com/opensearch.xml" in ...
failed to load remote xml after libxml_disable_entity_loader(true)!
It might perhaps be related to this PHP bug but PHP is being really stupid about it as:
libxml_disable_entity_loader(true);
$dom->loadXML(file_get_contents($remote_xml));
works just fine.
I have switched my Zend Framework version from 1.11 to 1.12.3. In the tests I get a strange error that I cannot explain. Some of my XML fetching and processing routines now fail with:
PHP Fatal error: Uncaught exception 'Zend_Dom_Exception' with message
'Invalid XML: Detected use of illegal DOCTYPE' in ....
In Zend Framework 1.11 I had library/Zend/Dom/Query.php:197:
switch ($type) {
    case self::DOC_XML:
        $success = $domDoc->loadXML($document);
        break;
    ....
In 1.12 the code looks strange
switch ($type) {
    case self::DOC_XML:
        $success = $domDoc->loadXML($document);
        foreach ($domDoc->childNodes as $child) {
            if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
                require_once 'Zend/Dom/Exception.php';
                throw new Zend_Dom_Exception(
                    'Invalid XML: Detected use of illegal DOCTYPE'
                );
            }
        }
        break;
    .....
If I get this right, this routine will not parse any XML document that has a DOCTYPE.
Here is a little example that fails on my computer every time:
require_once 'Zend/Dom/Query.php';
$f = '<?xml version="1.0" standalone="yes"?>' .
'<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>' .
'<hallo>Hallo Welt!</hallo>';
$dom = new Zend_Dom_Query($f);
$results = $dom->queryXpath('//hallo');
Can someone explain this to me?
I tested with Zend Framework 1.12.3 and PHP 5.3.2 and 5.4.6.
I read it the same way as you did. Googled about it for a while and found the following in the HTML <!DOCTYPE> Declaration article from w3schools:
The <!DOCTYPE> declaration must be the very first thing in your HTML document, before the <html> tag.
I've coded a small test based on your example and just moved the <!DOCTYPE> declaration to the top of your XML and it seems to work:
<?php
require_once 'Zend/Dom/Query.php';
$f = <<<XML
<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>
<?xml version="1.0" standalone="yes"?>
<hallo>Hallo Welt!</hallo>
XML;
$dom = new Zend_Dom_Query($f);
$results = $dom->queryXpath('//hallo');
foreach ($results as $result) {
echo $result->C14N();
}
Output:
<hallo>Hallo Welt!</hallo>
OK, I had a little talk with Matthew Weier O'Phinney about why DOCTYPEs are not accepted anymore. The reason is the security patch described here: http://framework.zend.com/security/advisory/ZF2012-02
They disabled the DOCTYPE feature to prevent XXE and XEE.
"I closed the report because it's something we cannot fix, due to security implications. It doesn't matter if it's valid XML -- XEE and XXE vectors utilize perfectly valid XML in order to exploit issues in the underlying XML parser. Because we cannot control what version of libxml is used in every PHP distribution on which ZF is deployed, we must be defensive in our code. Furthermore, the moment we add a switch to disable the XEE and XXE vector checks, folks will use that switch without understanding the reason behind them.
There are a number of tools you can use to pre-process XML -- including pandoc or the PCRE tools in PHP -- if you cannot control the source of the XML and still want to parse it with our tools."
I mentioned that this was already fixed by libxml2 itself in 2012, but he argued that they have no idea which version of libxml2 is used in any particular case.
So what are the solutions?
Use an XML preprocessor
Write a patch that removes these changes (only if you are sure that you use a libxml2 version patched against XXE and XEE)
Write your own components
Use the PHP components SimpleXMLElement or DOMDocument
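As a sketch of the last option, the failing example from the question can be queried with DOMDocument and DOMXPath directly. This assumes you trust the source of the XML or rely on a patched libxml2, because it skips the DOCTYPE check that Zend_Dom_Query performs. No LIBXML_NOENT or LIBXML_DTDLOAD flag is passed, so entities are not substituted and no external DTD is loaded.

```php
<?php
$f = '<?xml version="1.0" standalone="yes"?>' .
    '<!DOCTYPE hallo [<!ELEMENT hallo (#PCDATA)>]>' .
    '<hallo>Hallo Welt!</hallo>';

$doc = new DOMDocument();
$doc->loadXML($f); // default options only

$xpath = new DOMXPath($doc);
$results = $xpath->query('//hallo');

foreach ($results as $result) {
    echo $result->textContent, "\n";
}
```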
Thank you Rolando Isidoro for the help :)