How to replace special chars in XML node with SimpleXMLElement PHP

I have an XML file that looks something like this:
<booking-info-list>
<booking-info>
<index>1</index>
<pricing-info-index>1</pricing-info-index>
<booking-type>W</booking-class>
<cabin-type>E</cabin-type>
<ticket-type>E</ticket-type>
<booking-status>P</booking-status>
</booking-info>
<booking-info>
<index>2</index>
<pricing-info-index>1</pricing-info-index>
<booking-type>W</booking-class>
<cabin-type>E</cabin-type>
<ticket-type>E</ticket-type>
<booking-status>P</booking-status>
</booking-info>
<booking-info>
<index>3</index>
<pricing-info-index>1</pricing-info-index>
<booking-type>W</booking-class>
<cabin-type>E</cabin-type>
<ticket-type>E</ticket-type>
<booking-status>P</booking-status>
</booking-info>
</booking-info-list>
Is there a simple way to replace/remove the - (hyphen) in all tags?

The hyphen is not a special character in XML node names. It is a problem in SimpleXML only because it is an operator in PHP. There is no need to change the names and risk breaking the XML.
You can use the brace syntax, with the name as a quoted string, to access the elements.
$element = simplexml_load_string($xml);
foreach ($element->{'booking-info'} as $bookingInfo) {
    var_dump($bookingInfo);
}
It is not an issue if you use XPath:
$element = simplexml_load_string($xml);
foreach ($element->xpath('//booking-info') as $bookingInfo) {
    var_dump($bookingInfo);
}
The XPath expression is just a string to PHP.
Or DOM:
$document = new DOMDocument();
$document->loadXML($xml);
foreach ($document->getElementsByTagName('booking-info') as $node) {
    var_dump($node);
}
The name is just a string to PHP.
Or DOM with XPath:
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
foreach ($xpath->evaluate('//booking-info') as $node) {
    var_dump($node);
}
HINT: You have an error in the XML - <booking-type>...</booking-class> has different names for the opening and closing tag.
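The brace syntax also works for hyphenated child names. Here is a minimal sketch, assuming the mismatched closing tag has been corrected to </booking-type>; the XML string is a shortened, hypothetical version of the sample above, and the variable names are chosen for illustration only.

```php
<?php
// Shortened sample with the closing tag corrected to </booking-type>.
$xml = <<<'XML'
<booking-info-list>
  <booking-info>
    <index>1</index>
    <booking-type>W</booking-type>
    <booking-status>P</booking-status>
  </booking-info>
  <booking-info>
    <index>2</index>
    <booking-type>W</booking-type>
    <booking-status>P</booking-status>
  </booking-info>
</booking-info-list>
XML;

$list = simplexml_load_string($xml);

$statuses = [];
foreach ($list->{'booking-info'} as $bookingInfo) {
    // Hyphenated child names need the brace syntax as well.
    $statuses[(int) $bookingInfo->index] = (string) $bookingInfo->{'booking-status'};
}

print_r($statuses);
```

Casting to (int) and (string) turns the SimpleXMLElement objects into plain PHP values, which is usually what you want before storing them in an array.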

Related

Extract node information from external URL using query

I want to extract information from specific nodes of an external XML file. This is what I have been trying:
$contents = file_get_contents('https://experiencehermann.com/post-sitemap.xml');
$dom = new DOMDocument;
$dom -> loadXML($contents);
$finder = new DOMXPath($dom);
$nodes = $finder->query('//loc');
foreach ($nodes as $node) {
echo $node->nodeValue ."</br />";
}
The same technique works when I have the XML directly in the PHP file, but not when I pull it from an external source.
Thanks in advance!

As your query is quite simple, you don't even need XPath. You can use the getElementsByTagName() method of the DOMDocument object:
$dom = new DOMDocument;
$dom->loadXML($contents);
$nodes = $dom->getElementsByTagName('loc');
foreach ($nodes as $node) {
    // Skip the <image:loc> entries.
    if ($node->nodeName === 'image:loc') {
        continue;
    }
    echo $node->nodeValue . "<br />\n";
}
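If you prefer to stay with XPath, one likely reason the //loc query finds nothing is a default namespace on the root element: in XPath 1.0 an unprefixed name only matches elements that are in no namespace. The sketch below assumes the remote sitemap declares the standard sitemaps.org namespace (not verified here) and uses a hypothetical inline document instead of the real URL; the prefix sm is an arbitrary choice.

```php
<?php
// Hypothetical, shortened sitemap used in place of the remote file.
$contents = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/post-1</loc>
    <image:image>
      <image:loc>https://example.com/post-1.jpg</image:loc>
    </image:image>
  </url>
  <url>
    <loc>https://example.com/post-2</loc>
  </url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($contents);

$finder = new DOMXPath($dom);
// 'sm' is an arbitrary prefix; only the namespace URI has to match.
$finder->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');

$urls = [];
foreach ($finder->query('//sm:loc') as $node) {
    $urls[] = $node->nodeValue;
}

print_r($urls);
```

Because the query names the sitemap namespace explicitly, the image:loc elements are left out without an extra check in the loop.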

How to query a DOMNode using XPath in PHP?

I'm trying to get the Bing search results with XPath. Here is my code:
$html = file_get_contents("http://www.bing.com/search?q=bacon&first=11");
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHtml($html);
$x = new DOMXpath($doc);
$output = array();
// just grab the urls for now
foreach ($x->query("//li[@class='b_algo']") as $node)
{
    //$output[] = $node->getAttribute("href");
    $tmpDom = new DOMDocument();
    $tmpDom->loadHTML($node);
    $tmpDP = new DOMXPath($tmpDom);
    echo $tmpDP->query("//div[@class='b_title']//h2//a//href");
}
return $output;
This foreach iterates over all results. All I want to do is extract the link and text from $node inside the loop, but because $node is itself an object, I can't create a DOMDocument from it. How can I query it?

First of all, your XPath expression tries to match non-existent href child elements; use @href to query the attribute.
You don't need to create any new DOMDocuments. Just pass $node as the context node:
foreach ($x->query("//li[@class='b_algo']") as $node) {
    var_dump($x->query("./div[@class='b_title']//h2//a//@href", $node)->item(0));
}
If you're only interested in the URLs, you can also query them directly:
foreach ($x->query("//li[@class='b_algo']/div[@class='b_title']/h2/a/@href") as $node) {
    var_dump($node);
}
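To collect both the link and its text, the context-node approach can be sketched as follows. The markup is a hypothetical stand-in that only mimics the structure assumed in the question; the real Bing page may look different.

```php
<?php
// Hypothetical markup that mimics the structure assumed in the question.
$html = <<<'HTML'
<html><body><ul>
  <li class="b_algo"><div class="b_title"><h2><a href="https://example.com/a">First</a></h2></div></li>
  <li class="b_algo"><div class="b_title"><h2><a href="https://example.com/b">Second</a></h2></div></li>
</ul></body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$x = new DOMXPath($doc);

$output = [];
foreach ($x->query("//li[@class='b_algo']") as $node) {
    // The leading "." keeps the expression relative to the context node.
    $link = $x->query(".//div[@class='b_title']//h2/a", $node)->item(0);
    if ($link !== null) {
        $output[] = [
            'url'  => $link->getAttribute('href'),
            'text' => trim($link->textContent),
        ];
    }
}

print_r($output);
```

Selecting the a element rather than the @href attribute gives access to both getAttribute() and textContent from a single query.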

Trouble extracting data from an XML document using XPath

I'm trying to extract all of the "name" and "form13FFileNumber" values from xpath "//otherManagers2Info/otherManager2/otherManager" in this document:
https://www.sec.gov/Archives/edgar/data/1067983/000095012314002615/primary_doc.xml
Here is my code. Any idea what I am doing wrong here?
$xml = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadXML($xml);
$x = new DOMXpath($dom);
$other_managers = array();
$nodes = $x->query('//otherManagers2Info/otherManager2/otherManager');
if (!empty($nodes)) {
$i = 0;
foreach ($nodes as $n) {
$i++;
$other_managers[$i]['form13FFileNumber'] = $x->evaluate('form13FFileNumber', $n)->item(0)->nodeValue;
$other_managers[$i]['name'] = $x->evaluate('name', $n)->item(0)->nodeValue;
}
}

As you noted in the comment, you can register the namespace with your own prefix for XPath. Namespace prefixes are just aliases. There is no default namespace in XPath 1.0, so you always have to register and use a prefix.
Location path expressions always return a traversable node list, so you can iterate them with foreach. query() and evaluate() take a context node as the second argument, and expressions are evaluated relative to that context. Finally, evaluate() can return scalar values directly. This happens if you cast the node list to a scalar type (such as a string) inside the XPath expression, or use a function such as count().
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('e13', 'http://www.sec.gov/edgar/thirteenffiler');
$xpath->registerNamespace('ecom', 'http://www.sec.gov/edgar/common');
$result = [];
$nodes = $xpath->evaluate('//e13:otherManagers2Info/e13:otherManager2/e13:otherManager');
foreach ($nodes as $node) {
    $result[] = [
        'form13FFileNumber' => $xpath->evaluate('string(e13:form13FFileNumber)', $node),
        'name' => $xpath->evaluate('string(e13:name)', $node),
    ];
}
var_dump($result);
Demo: https://eval.in/125200
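The scalar return values of evaluate() can be illustrated with a small sketch. The document below is a hypothetical, heavily shortened stand-in that reuses the namespace URI from the answer; the element values are invented for illustration.

```php
<?php
// Hypothetical, shortened document; the values are made up.
$xml = <<<'XML'
<edgarSubmission xmlns="http://www.sec.gov/edgar/thirteenffiler">
  <otherManagers2Info>
    <otherManager2>
      <otherManager><form13FFileNumber>000-00001</form13FFileNumber><name>Example One</name></otherManager>
    </otherManager2>
    <otherManager2>
      <otherManager><form13FFileNumber>000-00002</form13FFileNumber><name>Example Two</name></otherManager>
    </otherManager2>
  </otherManagers2Info>
</edgarSubmission>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('e13', 'http://www.sec.gov/edgar/thirteenffiler');

// count() yields a number (a float in PHP), not a node list.
$total = $xpath->evaluate('count(//e13:otherManager)');

// string() yields the text of the first matching node in document order.
$firstName = $xpath->evaluate('string(//e13:otherManager/e13:name)');

// If nothing matches, string() yields an empty string instead of failing.
$missing = $xpath->evaluate('string(//e13:doesNotExist)');

var_dump($total, $firstName, $missing);
```

The empty-string result for a missing node is why the string() cast avoids the "property of non-object" error that ->item(0)->nodeValue raises when the node list is empty.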

Extracting and printing an HTML element by its class using DOMDocument

What I want to do is get an element by its class name and output it as actual HTML, not as its nodes or its inner data.
Here is my code:
$html = file_get_contents("www.site.com");
$dom = new DOMDocument('1.0');
$dom->loadHTML($html);
$element = $dom->getElementById('myid');
$string = $element->C14N();
This is how I do it using an ID, but I want to know if there is a way to do this using a class. Apparently there is no getElementByClass method.

There is no built-in method in PHP's DOM extension to do this. You have to walk the elements and check whether their class attribute contains the class name you need:
$html = file_get_contents("www.site.com");
$dom = new DOMDocument('1.0');
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('div') as $element) {
    // Note: strpos() also matches substrings of longer class names.
    if (strpos($element->getAttribute('class'), 'yourClassNameHere') !== false) {
        $string = $element->C14N();
    }
}
You can also use DOMXPath:
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//div[@class='yourClassNameHere']") as $element) {
    $string = $element->C14N();
}
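Both variants above have edge cases: strpos() matches substrings of longer class names, and an exact @class comparison misses elements that carry several classes. A common XPath 1.0 idiom pads the attribute with spaces so that only whole class tokens match. The sketch below uses hypothetical markup with teaser as a placeholder class name, and serializes the match with saveHTML() as an alternative to C14N().

```php
<?php
// Hypothetical markup; 'teaser' stands in for the class name being searched.
$html = '<html><body>'
      . '<div class="box teaser"><p>Hello</p></div>'
      . '<div class="teaser-wide">Other</div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Pad the attribute with spaces so that only whole class tokens match.
$expression = "//*[contains(concat(' ', normalize-space(@class), ' '), ' teaser ')]";

$markup = [];
foreach ($xpath->query($expression) as $element) {
    // saveHTML() with a node argument serializes just that element.
    $markup[] = $dom->saveHTML($element);
}

print_r($markup);
```

Only the first div is matched here; teaser-wide is skipped because it is a different class token.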

ignoring nested elements when parsing xml with php

Probably a simple question for someone to answer.
XML:
<foobar>
<foo>i am a foo</foo>
<bar>i am a bar</bar>
<foo>i am a <bar>bar</bar></foo>
</foobar>
In the above, I want to display all <foo> elements. When the script gets to the line with the nested <bar>, the result is "i am a bar", which isn't the result I had hoped for.
Is it not possible to print out the entire contents of that element as it is, so that I see "i am a <bar>bar</bar>"?
PHP:
$xml = file_get_contents('sample');
$dom = new DOMDocument;
@$dom->loadHTML($xml);
$resources = $dom->getElementsByTagName('foo');
foreach ($resources as $resource) {
    echo $resource->nodeValue . "\n";
}

After some searching and trying to do what I needed with SimpleXML, I arrived at the following conclusion. My issue with SimpleXML was locating the elements. If the XML is structured and the hierarchy is standard, I have no problem.
If the XML is a web page, for example, and the <foo> element can be anywhere, SimpleXML doesn't have a facility like getElementsByTagName to pull out the element wherever it may be.
<?php
$doc = new DOMDocument();
$doc->load('sample');
$element_name = 'foo';
$resources = $doc->getElementsByTagName($element_name);
if ($resources->length > 0) {
    foreach ($resources as $resource) {
        // gen_uuid() is not a PHP built-in; it must be defined elsewhere.
        if (!$resource->hasAttribute('id')) {
            $resource->setAttribute('id', gen_uuid());
        }
        // Serialize each child node, markup included.
        $innerHTML = '';
        foreach ($resource->childNodes as $child) {
            $tmp_doc = new DOMDocument();
            $tmp_doc->appendChild($tmp_doc->importNode($child, true));
            $innerHTML .= rtrim($tmp_doc->saveHTML());
        }
        // Property names are case-sensitive: nodeValue, not nodevalue.
        $resource->nodeValue = $innerHTML;
    }
}
echo $doc->saveHTML();
?>

Rather than writing all that code, you might try XPath. The expression would be "//foo", which gets a list of all the elements in the document named "foo".
http://php.net/manual/en/simplexmlelement.xpath.php
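A minimal sketch of that idea with DOMXPath, using the sample XML from the question as an inline string. It assumes the goal is the inner markup of each <foo> as a string; the variable names are illustrative.

```php
<?php
$xml = <<<'XML'
<foobar>
<foo>i am a foo</foo>
<bar>i am a bar</bar>
<foo>i am a <bar>bar</bar></foo>
</foobar>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$contents = [];
foreach ($xpath->query('//foo') as $foo) {
    $inner = '';
    foreach ($foo->childNodes as $child) {
        // saveXML() with a node argument serializes that node, markup included.
        $inner .= $doc->saveXML($child);
    }
    $contents[] = $inner;
}

print_r($contents);
```

When echoing these strings into an HTML page, pass them through htmlspecialchars() so the browser shows the <bar> tags instead of interpreting them.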
