xml:
<lev:Locatie axisLabels="x y" srsDimension="2" srsName="epsg:28992" uomLabels="m m">
<gml:exterior xmlns:gml="http://www.opengis.net/gml">
<gml:LinearRing>
<gml:posList>
222518.0 585787.0 222837.0 585875.0 223229.0 585969.0 223949.0 586123.0 223389.0 586579.0 223305.0 586564.0 222690.0 586464.0 222706.0 586319.0 222424.0 586272.0 222287.0 586313.0 222054.0 586517.0 221988.0 586446.0 222174.0 586305.0 222164.0 586292.0 222172.0 586202.0 222232.0 586143.0 222279.0 586149.0 222358.0 586076.0 222422.0 586018.0 222518.0 585787.0
</gml:posList>
</gml:LinearRing>
</gml:exterior>
</lev:Locatie>
I need to get to the gml:posList. I tried the following
SimpleXML:
$xmldata = new SimpleXMLElement($xmlstr);
$xmlns = $xmldata->getNamespaces(true);
$retval = array();
foreach( $xmldata as $attr => $child ) {
if ( (string)$child !== '' ) {
$retval[$attr] = (string)$child;
}
else {
$retval[$attr] = $child->children( $xmlns['gml'] );
}
}
var_export( $retval );
xpath:
$domdoc = new DOMDocument();
$domdoc->loadXML($xml );
$xpath = new DOMXpath($domdoc);
$xpath->registerNamespace('l', $xmlns['lev'] );
$xpath->registerNamespace('g', $xmlns['gml'] );
var_export( $xml->xpath('//g:posList') );
If I query the attributes for lev:Locatie, I can get them, however, I seem unable to retrieve the gml:posList's value or the attributes for e.g gml:exterior. I know I'm doing something wrong, I just don't see what ...
You're registering the namespaces on the DOMXpath instance, but use a SimpleXMLElement::xpath() call. That will not work. You can register them on the SimpleXMLElement using SimpleXMLElement::registerXpathNamespace() or you switch to DOM and use DOMXpath::evaluate(). The attributes do not have a prefix, so they are not in a namespace. gml:exterior does not have any attributes, only the namespace definition. It looks like an attribute but it is handled differently by the parser.
The nice thing about DOMXpath::evaluate() is that it can return a node list or a scalar, depending on the Xpath expression. So you can fetch a value directly.
For example the gml:posList:
$xmlString = <<<'XML'
<lev:Locatie axisLabels="x y" srsDimension="2" srsName="epsg:28992" uomLabels="m m" xmlns:lev="urn:lev">
<gml:exterior xmlns:gml="http://www.opengis.net/gml">
<gml:LinearRing>
<gml:posList>
222518.0 585787.0 222837.0
</gml:posList>
</gml:LinearRing>
</gml:exterior>
</lev:Locatie>
XML;
$document = new DOMDocument();
$document->loadXML($xmlString);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('g', 'http://www.opengis.net/gml');
var_export(
$xpath->evaluate('normalize-space(//g:posList)')
);
Output:
'222518.0 585787.0 222837.0'
normalize-space() is an Xpath function that replaces every sequence of whitespace characters with a single space and trims the result. Because it is a string function, it triggers an implicit cast of the first node from the location path.
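The SimpleXML route mentioned in this answer can be sketched roughly like this. It is a minimal example, not the asker's full document: the XML is shortened, and urn:lev is a placeholder namespace URI, since the question does not show the real one.

```php
<?php
// Shortened sample; urn:lev is a placeholder namespace URI (assumption).
$xmlString = <<<'XML'
<lev:Locatie srsName="epsg:28992" xmlns:lev="urn:lev">
  <gml:exterior xmlns:gml="http://www.opengis.net/gml">
    <gml:LinearRing>
      <gml:posList>
        222518.0 585787.0 222837.0
      </gml:posList>
    </gml:LinearRing>
  </gml:exterior>
</lev:Locatie>
XML;

$element = new SimpleXMLElement($xmlString);
// Register the prefix on the same object that runs the query.
$element->registerXPathNamespace('g', 'http://www.opengis.net/gml');

// xpath() returns an array of SimpleXMLElement objects.
$nodes = $element->xpath('//g:posList');
$posList = $nodes ? trim((string) $nodes[0]) : '';

// Attributes without a prefix are in no namespace and can be read directly.
$srsName = (string) $element['srsName'];

var_export([$posList, $srsName]);
```

Unlike DOMXpath::evaluate(), SimpleXMLElement::xpath() cannot return a scalar, so the whitespace cleanup happens in PHP with trim() instead of normalize-space().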
I have the following XML:
<root>
<level name="level1">
<!-- More children <level> -->
</level>
<level name="level2">
<!-- Some more children <level> -->
</level>
</root>
How can I extract a <level> directly under <root> so that I can run an XPath query such as $xml->xpath('//some-query') relative to the extracted <level>?
DOMXPath::query's second parameter is the context node. Just pass the DOMNode instance you have previously "found" and your query runs "relative" to that node. E.g.
<?php
$doc = new DOMDocument;
$doc->loadxml( data() );
$xpath = new DOMXPath($doc);
$nset = $xpath->query('/root/level[@name="level1"]');
if ( $nset->length < 1 ) {
die('....no such element');
}
else {
$elLevel = $nset->item(0);
foreach( $xpath->query('c', $elLevel) as $elC) {
echo $elC->nodeValue, "\r\n";
}
}
function data() {
return <<< eox
<root>
<level name="level1">
<c>C1</c>
<a>A</a>
<c>C2</c>
<b>B</b>
<c>C3</c>
</level>
<level name="level2">
<!-- Some more children <level> -->
</level>
</root>
eox;
}
But unless you have to perform multiple separate (possibly complex) subsequent queries, this is most likely not necessary:
<?php
$doc = new DOMDocument;
$doc->loadxml( data() );
$xpath = new DOMXPath($doc);
foreach( $xpath->query('/root/level[@name="level1"]/c') as $c ) {
echo $c->nodeValue, "\r\n";
}
function data() {
return <<< eox
<root>
<level name="level1">
<c>C1</c>
<a>A</a>
<c>C2</c>
<b>B</b>
<c>C3</c>
</level>
<level name="level2">
<c>Ahh</c>
<a>ouch</a>
<c>no</c>
<b>wrxl</b>
</level>
</root>
eox;
}
has the same output using just one query.
DOMXpath::evaluate() allows you to fetch node lists and scalar values from a DOM.
So you can fetch a value directly using an Xpath expression:
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
var_dump(
$xpath->evaluate('string(/root/level[@name="level2"]/@name)')
);
Output:
string(6) "level2"
The Xpath expression
All level element nodes in root:
/root/level
That have a specific name attribute:
/root/level[@name="level2"]
The value you like to fetch (name attribute for validation):
/root/level[@name="level2"]/@name
Cast into a string; if no node was found, the result will be an empty string:
string(/root/level[@name="level2"]/@name)
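To make the string() cast concrete, here is a small sketch that evaluates the expression once for an existing name and once for a name that does not exist. The sample XML and the name level9 are made up for illustration.

```php
<?php
// Hypothetical sample data mirroring the structure from the question.
$xml = '<root><level name="level1"/><level name="level2"/></root>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// Matching node: string() returns the attribute value.
$found = $xpath->evaluate('string(/root/level[@name="level2"]/@name)');

// No matching node: string() of an empty node set is an empty string.
$missing = $xpath->evaluate('string(/root/level[@name="level9"]/@name)');

var_dump($found, $missing);
```

An empty string therefore cannot be told apart from an attribute that exists but is empty; if that distinction matters, check for the node itself.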
Loop over nodes, use them as context
If you need to execute several expressions for the node, it might be better to fetch it separately and use foreach(). The second argument for DOMXpath::evaluate() is the context node.
foreach ($xpath->evaluate('/root/level[@name="level2"]') as $level) {
var_dump(
$xpath->evaluate('string(@name)', $level)
);
}
Node list length
To handle the case where no node was found, you can check the DOMNodeList::$length property.
$levels = $xpath->evaluate('/root/level[@name="level2"]');
if ($levels->length > 0) {
$level = $levels->item(0);
var_dump(
$xpath->evaluate('string(@name)', $level)
);
} else {
// no level found
}
count() expression
You can also check beforehand that there are matching elements with a count() expression.
var_dump(
$xpath->evaluate('count(/root/level[@name="level2"])')
);
Output:
float(1)
Boolean result
It is possible to make that a condition in Xpath and return the boolean value.
var_dump(
$xpath->evaluate('count(/root/level[@name="level2"]) > 0')
);
Output:
bool(true)
Using querypath for parsing XML/HTML makes this all super easy.
$qp = qp($xml) ;
$levels = $qp->find('root')->eq(0)->find('level') ;
foreach($levels as $level ){
//do whatever you want with it , get its xpath , html, attributes etc.
$level->xpath() ; //
}
Excellent beginner tutorial for Querypath
This should work:
$dom = new DOMDocument;
$dom->loadXML($xml);
$levels = $dom->getElementsByTagName('level');
foreach ($levels as $level) {
$levelname = $level->getAttribute('name');
if ($levelname == 'level1') {
//do stuff
}
}
I personally prefer the DOMNodeList class for parsing XML.
I have an XML file that looks something like this:
<booking-info-list>
<booking-info>
<index>1</index>
<pricing-info-index>1</pricing-info-index>
<booking-type>W</booking-class>
<cabin-type>E</cabin-type>
<ticket-type>E</ticket-type>
<booking-status>P</booking-status>
</booking-info>
<booking-info>
<index>2</index>
<pricing-info-index>1</pricing-info-index>
<booking-type>W</booking-class>
<cabin-type>E</cabin-type>
<ticket-type>E</ticket-type>
<booking-status>P</booking-status>
</booking-info>
<booking-info>
<index>3</index>
<pricing-info-index>1</pricing-info-index>
<booking-type>W</booking-class>
<cabin-type>E</cabin-type>
<ticket-type>E</ticket-type>
<booking-status>P</booking-status>
</booking-info>
</booking-info-list>
Is there a simple way to replace/remove the - (hyphen) in all tags?
The hyphen is not a special character in XML node names. It is a problem in SimpleXML only because it is an operator in PHP. There is no need to change the names and possibly destroy the XML.
You can use the curly brace syntax (a quoted string inside {}) to access the elements.
$element = simplexml_load_string($xml);
foreach($element->{'booking-info'} as $element) {
var_dump($element);
}
It is not an issue if you're using Xpath:
$element = simplexml_load_string($xml);
foreach ($element->xpath('//booking-info') as $element) {
var_dump($element);
}
The Xpath expression is a string for PHP.
Or DOM:
$document = new DOMDocument();
$document->loadXml($xml);
foreach ($document->getElementsByTagName('booking-info') as $node) {
var_dump($node);
}
The name is a string for PHP.
Or DOM with XPath:
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
foreach ($xpath->evaluate('//booking-info') as $node) {
var_dump($node);
}
HINT: You have an error in the XML - <booking-type>...</booking-class> has different names for the opening and closing tag.
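A mismatch like this makes the parser reject the document, so it helps to surface the libxml errors before debugging anything else. The following is a rough sketch using a shortened, made-up version of the XML that keeps the same mismatched tag.

```php
<?php
// Shortened sample with the same mismatched tag as in the question.
$xml = '<booking-info><booking-type>W</booking-class></booking-info>';

// Collect libxml errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$element = simplexml_load_string($xml);
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

if ($element === false) {
    foreach ($errors as $error) {
        // Each entry is a LibXMLError object with message and line.
        echo trim($error->message), ' (line ', $error->line, ")\n";
    }
}
```

simplexml_load_string() returns false when the document is not well-formed, which is why the result is compared with === false before it is used.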
I'm looking for a way to transform this:
...[inner content]...
Into this:
...[inner content]...
The context has multiple links a with other showinfo:[integer] values. (I can process those ones)
Thanks for any help,
Bálint
Edit: Thanks to Kaiser's answer, here is the working snippet:
$html = $a;
$dom = new \DOMDocument;
@$dom->loadHTML( $html ); //Cannot guarantee all-valid input
foreach ($dom->getElementsByTagName('a') as $tag) {
// Fixed the strstr() argument order (haystack first) and added an explicit check against false
if ($tag->hasAttribute('href') && strstr($tag->getAttribute('href'), 'showinfo:3875') !== false) {
$tag->setAttribute( 'href', "http://somelink.com/{$tag->textContent}");
// Assign the Converted HTML, prevents failing when saving
$html = $tag;
}
}
return $dom->saveHTML( $dom);
}
You can use DOMDocument for a pretty reliable and fast way to handle DOM nodes and their attributes, etc. Hint: Much faster and more reliable than (most) Regex.
// Your original HTML
$html = '[inner content]';
$dom = new \DOMDocument;
$dom->loadHTML( $html );
Now that you have your DOM ready, you can use either the DOMDocument methods or DOMXPath to search through it and obtain your target element.
Example with XPath:
$xpath = new DOMXpath( $dom );
// Alter the query to your needs
$el = $xpath->query( "//a[starts-with(@href, 'showinfo:')]" );
or, for example, by tag name with the DOMDocument methods:
// Check what we got so we have something to compare
var_dump( 'BEFORE', $html );
foreach ( $dom->getElementsByTagName( 'a' ) as $tag )
{
if (
$tag->hasAttribute( 'href' )
and stristr( $tag->getAttribute( 'href' ), 'showinfo:3875' )
)
{
$tag->setAttribute( 'href', "http://somelink.com/{$tag->textContent}" );
// Assign the Converted HTML, prevents failing when saving
$html = $tag;
}
}
// Now Save Our Converted HTML;
$html = $dom->saveHTML( $html);
// Check if it worked:
var_dump( 'AFTER', $html );
It's as easy as that.
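For comparison, the XPath variant can also do the whole rewrite. This is only a sketch: the original markup is not shown in the question, so the two links below are invented, and http://somelink.com/ is the placeholder target used above.

```php
<?php
// Hypothetical input; the real markup from the question is not available.
$html = '<p><a href="showinfo:3875">foo</a> <a href="showinfo:12">bar</a></p>';

$dom = new DOMDocument();
// Collect parser errors internally so imperfect markup raises no warnings.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
// Select only the anchors whose href is exactly the target value.
foreach ($xpath->query('//a[@href="showinfo:3875"]') as $link) {
    $link->setAttribute('href', 'http://somelink.com/' . $link->textContent);
}

// Serialize only the children of <body> to skip the doctype/html wrapper
// that loadHTML() adds around a fragment.
$body = $dom->getElementsByTagName('body')->item(0);
$result = '';
foreach ($body->childNodes as $child) {
    $result .= $dom->saveHTML($child);
}
echo $result;
```

Letting the XPath predicate do the filtering removes the need for the hasAttribute() and stristr() checks inside the loop.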
I'm trying to extract all of the "name" and "form13FFileNumber" values from xpath "//otherManagers2Info/otherManager2/otherManager" in this document:
https://www.sec.gov/Archives/edgar/data/1067983/000095012314002615/primary_doc.xml
Here is my code. Any idea what I am doing wrong here?
$xml = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadXML($xml);
$x = new DOMXpath($dom);
$other_managers = array();
$nodes = $x->query('//otherManagers2Info/otherManager2/otherManager');
if (!empty($nodes)) {
$i = 0;
foreach ($nodes as $n) {
$i++;
$other_managers[$i]['form13FFileNumber'] = $x->evaluate('form13FFileNumber', $n)->item(0)->nodeValue;
$other_managers[$i]['name'] = $x->evaluate('name', $n)->item(0)->nodeValue;
}
}
As you posted in the comment, you can just register the namespace with your own prefix for Xpath. Namespace prefixes are just aliases. There is no default namespace in Xpath 1.0, so you always have to register and use a prefix.
Location path expressions always return a traversable node list, so you can use foreach to iterate them. Both query() and evaluate() take a context node as the second argument; expressions are evaluated relative to that context. Finally, evaluate() can return scalar values directly. This happens if you cast the node list into a scalar type (like a string) in Xpath or use a function like count().
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('e13', 'http://www.sec.gov/edgar/thirteenffiler');
$xpath->registerNamespace('ecom', 'http://www.sec.gov/edgar/common');
$result = [];
$nodes = $xpath->evaluate('//e13:otherManagers2Info/e13:otherManager2/e13:otherManager');
foreach ($nodes as $node) {
$result[] = [
'form13FFileNumber' => $xpath->evaluate('string(e13:form13FFileNumber)', $node),
'name' => $xpath->evaluate('string(e13:name)', $node),
];
}
var_dump($result);
Demo: https://eval.in/125200
I'm after a way of making simplexml_load_string return a document where all the text values are urldecoded. For example:
$xmlstring = "<my_element>2013-06-19+07%3A20%3A51</my_element>";
$xml = simplexml_load_string($xmlstring);
$value = $xml->my_element;
//and value would contain: "2013-06-19 07:20:51"
Is it possible to do this? I'm not concerned about attribute values, although that would be fine if they were also decoded.
Thanks!
You can run
$value = urldecode( $value );
which will decode your string.
See: http://www.php.net/manual/en/function.urldecode.php
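Applied to a value read through SimpleXML, that might look as follows. This is a minimal sketch; the root wrapper element is an assumption added so that my_element can be accessed as a child.

```php
<?php
// The <root> wrapper is an assumption; the question shows only <my_element>.
$xmlstring = '<root><my_element>2013-06-19+07%3A20%3A51</my_element></root>';
$xml = simplexml_load_string($xmlstring);

// Cast to string first: urldecode() expects a string, not a SimpleXMLElement.
$value = urldecode((string) $xml->my_element);

echo $value;
```

This decodes one value at the point of use and leaves the document itself unchanged.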
As long as each value is inside an element of its own, this is possible. (In SimpleXML you cannot process text nodes on their own; compare with the table in Which DOMNodes can be represented by SimpleXMLElement?)
As others have outlined, this works by applying the urldecode function on each of these elements.
To do that, you need to change and add some lines of code:
$xml = simplexml_load_string($xmlstring, 'SimpleXMLIterator');
if (!$xml->children()->count()) {
$nodes = [$xml];
} else {
$nodes = new RecursiveIteratorIterator($xml, RecursiveIteratorIterator::LEAVES_ONLY);
}
foreach($nodes as $node) {
$node[0] = urldecode($node);
}
This code example makes sure that each leaf is processed and, if the document consists of only the root element, that this one is processed. Afterwards, the whole document is changed so that you can access it as usual. Demo:
<?php
/**
* URL decode all values in XML document in PHP
 * @link https://stackoverflow.com/q/17805643/367456
*/
$xmlstring = "<root><my_element>2013-06-19+07%3A20%3A51</my_element></root>";
$xml = simplexml_load_string($xmlstring, 'SimpleXMLIterator');
$nodes = $xml->children()->count()
? new RecursiveIteratorIterator(
$xml, RecursiveIteratorIterator::LEAVES_ONLY
)
: [$xml];
foreach ($nodes as $node) {
$node[0] = urldecode($node);
}
echo $value = $xml->my_element; # prints "2013-06-19 07:20:51"