I have issue with how to get element xml using xpath php, i already create a php file to extract the "attributes" xml by using xpath php.
What i want is how to extract every element in xml by using xpath.
test.xml
<?xml version="1.0" encoding="UTF-8"?>
<InvoicingData>
<CreationDate> 2014-02-02 </CreationDate>
<OrderNumber> XXXX123 </OrderNumber>
<InvoiceDetails>
<InvoiceDetail>
<SalesCode> XX1A </SalesCode>
<SalesName> JohnDoe </SalesName>
</InvoiceDetail>
</InvoiceDetails>
</InvoicingData>
read.php
<?php
$doc = new DOMDocument();
$doc->loadXML(file_get_contents("test.xml"));
$xpath = new DOMXpath($doc);
$nodes = $xpath->query('//*');
$names = array();
foreach ($nodes as $node)
{
$names[] = $node->nodeName;
}
echo join(PHP_EOL, ($names));
?>
From the code above it will print like this :
CreationDate OrderNumber InvoiceDetails InvoiceDetail SalesCode
SalesName
So, the problem is, how to get the element inside the attribute, basically this is what i want to print :
2014-02-02 XXXX123 XX1A JohnDoe
You use $node->textContent to get the textual value of the node (and its descendants, if any).
In response to your first comment:
You didn't use $node->textContent. Try this:
$doc = new DOMDocument();
$doc->loadXML(file_get_contents("test.xml"));
$xpath = new DOMXpath($doc);
$nodes = $xpath->query('//*');
$names = array();
$values = array(); // created a separate array for the values
foreach ($nodes as $node)
{
$names[] = $node->nodeName;
$values[] = $node->textContent; // push to $values array
}
echo join(PHP_EOL, ($values));
However, if you only want to push the textual values when they're a direct child of an element and still want to collect all node names as well, you could do something like:
foreach ($nodes as $node)
{
$names[] = $node->nodeName;
// check that this node only contains one text node
if( $node->childNodes->length == 1 && $node->firstChild instanceof DOMText ) {
$values[] = $node->textContent;
}
}
echo join(PHP_EOL, ($values));
And if you only care about the nodes that directly contain textual values, you could do something like this:
// this XPath query only selects those nodes that directly contain non-whitespace text
$nodes = $xpath->query('//*[./text()[normalize-space()]]');
$values = array();
foreach ($nodes as $node)
{
// add nodeName as key
// (only works reliable of there's never a duplicate nodeName in your XML)
// and add textContent as value
$values[ $node->nodeName ] = trim( $node->textContent );
}
var_dump( $values );
Related
I'm trying to get the bing search results with XPath. Here is my code:
$html = file_get_contents("http://www.bing.com/search?q=bacon&first=11");
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHtml($html);
$x = new DOMXpath($doc);
$output = array();
// just grab the urls for now
foreach ($x->query("//li[#class='b_algo']") as $node)
{
//$output[] = $node->getAttribute("href");
$tmpDom = new DOMDocument();
$tmpDom->loadHTML($node);
$tmpDP = new DOMXPath($tmpDom);
echo $tmpDP->query("//div[#class='b_title']//h2//a//href");
}
return $output;
This foreach iterates over all results, all I want to do is to extract the link and text from $node in foreach, but because $node itself is an object I can't create a DOMDocument from it. How can I query it?
First of all, your XPath expression tries to match non-existant href subelements, query #href for the attribute.
You don't need to create any new DOMDocuments, just pass the $node as context item:
foreach ($x->query("//li[#class='b_algo']") as $node)
{
var_dump( $x->query("./div[#class='b_title']//h2//a//#href", $node)->item(0) );
}
If you're just interested in the URLs, you could also query them directly:
foreach ($x->query("//li[#class='b_algo']/div[#class='b_title']/h2/a/#href") as $node)
{
var_dump($node);
}
I'm trying to extract all of the "name" and "form13FFileNumber" values from xpath "//otherManagers2Info/otherManager2/otherManager" in this document:
https://www.sec.gov/Archives/edgar/data/1067983/000095012314002615/primary_doc.xml
Here is my code. Any idea what I am doing wrong here?
$xml = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadXML($xml);
$x = new DOMXpath($dom);
$other_managers = array();
$nodes = $x->query('//otherManagers2Info/otherManager2/otherManager');
if (!empty($nodes)) {
$i = 0;
foreach ($nodes as $n) {
$i++;
$other_managers[$i]['form13FFileNumber'] = $x->evaluate('form13FFileNumber', $n)->item(0)->nodeValue;
$other_managers[$i]['name'] = $x->evaluate('name', $n)->item(0)->nodeValue;
}
}
Like you posted in the comment you can just register the namespace with an own prefix for Xpath. Namespace prefixes are just aliases. Here is no default namespace in Xpath, so you always have to register and use an prefix.
However, expressions always return a traversable node list, you can use foreach to iterate them. query() and evaluate() take a context node as the second argument, expression are relative to the context. Last evaluate() can return scalar values directly. This happens if you cast the node list in Xpath into a scalar type (like a string) or use function like count().
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$xpath->registerNamespace('e13', 'http://www.sec.gov/edgar/thirteenffiler');
$xpath->registerNamespace('ecom', 'http://www.sec.gov/edgar/common');
$result = [];
$nodes = $xpath->evaluate('//e13:otherManagers2Info/e13:otherManager2/e13:otherManager');
foreach ($nodes as $node) {
$result[] = [
'form13FFileNumber' => $xpath->evaluate('string(e13:form13FFileNumber)', $node),
'name' => $xpath->evaluate('string(e13:name)', $node),
];
}
var_dump($result);
Demo: https://eval.in/125200
I'm after a way of making simplexml_load_string return a document where all the text values are urldecoded. For example:
$xmlstring = "<my_element>2013-06-19+07%3A20%3A51</my_element>";
$xml = simplexml_load_string($xmlstring);
$value = $xml->my_element;
//and value would contain: "2013-06-19 07:20:51"
Is it possible to do this? I'm not concerned about attribute values, although that would be fine if they were also decoded.
Thanks!
you can run
$value = urldecode( $value )
which will decode your string.
See: http://www.php.net/manual/en/function.urldecode.php
As long as each value is inside an element of its own (in SimpleXML you can not process text-nodes on its own, compare with the table in Which DOMNodes can be represented by SimpleXMLElement?) this is possible.
As others have outlined, this works by applying the urldecode function on each of these elements.
To do that, you need to change and add some lines of code:
$xml = simplexml_load_string($xmlstring, 'SimpleXMLIterator');
if (!$xml->children()->count()) {
$nodes = [$xml];
} else {
$nodes = new RecursiveIteratorIterator($xml, RecursiveIteratorIterator::LEAVES_ONLY);
}
foreach($nodes as $node) {
$node[0] = urldecode($node);
}
This code-example takes care that each leave is processed and in case, it's only the root element, that that one is processed. Afterwards, the whole document is changed so that you can access it as known. Demo:
<?php
/**
* URL decode all values in XML document in PHP
* #link https://stackoverflow.com/q/17805643/367456
*/
$xmlstring = "<root><my_element>2013-06-19+07%3A20%3A51</my_element></root>";
$xml = simplexml_load_string($xmlstring, 'SimpleXMLIterator');
$nodes = $xml->children()->count()
? new RecursiveIteratorIterator(
$xml, RecursiveIteratorIterator::LEAVES_ONLY
)
: [$xml];
foreach ($nodes as $node) {
$node[0] = urldecode($node);
}
echo $value = $xml->my_element; # prints "2013-06-19 07:20:51"
I'm trying to extract 2 elements using PHP Curl and Xpath!
So far have the element separated in foreach but I would like to have them in the same time:
#$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->evaluate("//p[#class='row']/a/#href");
//$elements = $xpath->query("//p[#class='row']/a");
foreach ($elements as $element) {
$url = $element->nodeValue;
//$title = $element->nodeValue;
}
When I echo each one out of the foreach I only get 1 element and when its echoed inside the foreach i get all of them.
My question is how can I get them both at the same time (url and title ) and whats the best way to add them into myqsl using pdo.
thank you
There is no need, in this case, to use XPath twice. You could do one query and navigate to the associated other node(s).
For example, find all of the hrefs that you are interested in and get their ownerElement's (the <a>) node value.
$hrefs = $xpath->query("//p[#class='row']/a/#href");
foreach ($hrefs as $href) {
$url = $href->value;
$title = $href->ownerElement->nodeValue;
// Insert into db here
}
Or, find all of the <a>s that you are interested in and get their href attributes.
$anchors = $xpath->query("//p[#class='row']/a[#href]");
foreach ($anchors as $anchor) {
$url = $anchor->getAttribute("href");
$title = $anchor->nodeValue;
// Insert into db here
}
You're overwriting $url on each iteration. Maybe use an array?
#$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->evaluate("//p[#class='row']/a/#href");
//$elements = $xpath->query("//p[#class='row']/a");
$urls = array();
foreach ($elements as $element){
array_push($urls, $element->nodeValue);
//$title = $element->nodeValue;
}
Question
How can I remove empty xml tags in PHP?
Example:
$value1 = "2";
$value2 = "4";
$value3 = "";
xml = '<parentnode>
<tag1> ' .$value1. '</tag1>
<tag2> ' .$value2. '</tag2>
<tag3> ' .$value3. '</tag3>
</parentnode>';
XML Result:
<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
<tag3></tag3> // <- Empty tag
</parentnode>
What I want!
<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
</parentnode>
The XML without the empty tags like "tag3"
Thanks!
You can use XPath with the predicate not(node()) to select all elements that do not have child nodes.
<?php
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadxml('<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
<tag3></tag3>
<tag2>4</tag2>
<tag3></tag3>
<tag2>4</tag2>
<tag3></tag3>
</parentnode>');
$xpath = new DOMXPath($doc);
foreach( $xpath->query('//*[not(node())]') as $node ) {
$node->parentNode->removeChild($node);
}
$doc->formatOutput = true;
echo $doc->savexml();
prints
<?xml version="1.0"?>
<parentnode>
<tag1>2</tag1>
<tag2>4</tag2>
<tag2>4</tag2>
<tag2>4</tag2>
</parentnode>
This works recursively and removes nodes that:
contain only spaces
do not have attributes
do not have child notes
// not(*) does not have children elements
// not(#*) does not have attributes
// text()[normalize-space()] nodes that include whitespace text
while (($node_list = $xpath->query('//*[not(*) and not(#*) and not(text()[normalize-space()])]')) && $node_list->length) {
foreach ($node_list as $node) {
$node->parentNode->removeChild($node);
}
}
$dom = new DOMDocument;
$dom->loadXML($xml);
$elements = $dom->getElementsByTagName('*');
foreach($elements as $element) {
if ( ! $element->hasChildNodes() OR $element->nodeValue == '') {
$element->parentNode->removeChild($element);
}
}
echo $dom->saveXML();
CodePad.
The solution that worked with my production PHP SimpleXMLElement object code, by using Xpath, was:
/*
* Remove empty (no children) and blank (no text) XML element nodes, but not an empty root element (/child::*).
* This does not work recursively; meaning after empty child elements are removed, parents are not reexamined.
*/
foreach( $this->xml->xpath('/child::*//*[not(*) and not(text()[normalize-space()])]') as $emptyElement ) {
unset( $emptyElement[0] );
}
Note that it is not required to use PHP DOM, DOMDocument, DOMXPath, or dom_import_simplexml().
//this is a recursively option
do {
$removed = false;
foreach( $this->xml->xpath('/child::*//*[not(*) and not(text()[normalize-space()])]') as $emptyElement ) {
unset( $emptyElement[0] );
$removed = true;
}
} while ($removed) ;
If you're going to be a lot of this, just do something like:
$value[] = "2";
$value[] = "4";
$value[] = "";
$xml = '<parentnode>';
for($i=1,$m=count($value); $i<$m+1; $i++)
$xml .= !empty($value[$i-1]) ? "<tag{$i}>{$value[$i-1]}</tag{$i}>" : null;
$xml .= '</parentnode>';
echo $xml;
Ideally though, you should probably use domdocument.