I've used SO for many years and have always found an answer, but this time I am well and truly lost.
I have an XML file in which I would like to split the Compatbility text into well-formed XML:
`<product>
<item>
<partno>abc123</partno>
<Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility>
</item>
</product>`
In Compatbility, the word "model" changes with each item entry, although the : after the model name is always there, as is the . after each model group.
Should I use SimpleXML, DOM, or XPath to get the following result?
`<product>
<item>
<partno>abc123</partno>
<Compatbility>
<model>model1: 110C, 115C, 117C.</model>
<model>model2: 1835C, 1840C.</model>
<model>model3: 210C, 215C, 3240C.</model>
</Compatbility>
</item>
</product>`
Thanks
For simplexml, you can run a regular expression matching on the text-value of an element.
You can then remove all inner text and add the parsed result as new child elements.
This can be done with all you said: DOMDocument, SimpleXMLElement - both with or without xpath.
Here is a commented example in SimpleXML (online demo):
<?php
/**
* @link http://stackoverflow.com/q/24304095/367456
* @link https://eval.in/164934
*/
$buffer = <<<XML
<product>
<item>
<partno>abc123</partno>
<Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility>
</item>
</product>
XML;
# load the xml string
$xml = simplexml_load_string($buffer);
# obtain the element in question
$compatbility = $xml->item->Compatbility;
# parse its inner text-value for the models by a regex
$pattern = '~(model\\d?: [^.]+\\.) ?~u';
$result = preg_match_all($pattern, $compatbility, $matches);
# remove the text (so called simplexml self-reference)
$compatbility->{0} = '';
# add the parsed models as new model elements
foreach ($matches[1] as $model) {
$compatbility->model[] = $model;
}
# output the xml
$xml->asXML('php://output');
The output it gives is:
<?xml version="1.0"?>
<product>
<item>
<partno>abc123</partno>
<Compatbility><model>model1: 110C, 115C, 117C.</model><model>model2: 1835C, 1840C.</model><model>model3: 210C, 215C, 3240C.</model></Compatbility>
</item>
</product>
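The same split can also be sketched with DOMDocument and DOMXPath, which the answer mentions as an alternative. This is only a minimal sketch under one assumption: because the question says the label before the colon changes from item to item, the pattern below accepts any text up to a ":" as the label, rather than the literal word "model".

```php
<?php
$buffer = '<product><item><partno>abc123</partno>'
    . '<Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility>'
    . '</item></product>';

$doc = new DOMDocument();
$doc->loadXML($buffer);
$xpath = new DOMXPath($doc);

foreach ($xpath->query('//Compatbility') as $node) {
    // Assumption: a group is "<label>: <values>." with no dot inside the values
    preg_match_all('~[^:.]+:[^.]+\.~u', trim($node->textContent), $matches);

    // Remove the original text node(s) before adding the new children
    while ($node->firstChild) {
        $node->removeChild($node->firstChild);
    }

    // Add one <model> element per matched group
    foreach ($matches[0] as $group) {
        $model = $doc->createElement('model');
        $model->appendChild($doc->createTextNode(trim($group)));
        $node->appendChild($model);
    }
}

// Passing the root element skips the XML declaration
echo $doc->saveXML($doc->documentElement);
```

Using createTextNode() rather than the second argument of createElement() avoids problems if a value ever contains an ampersand.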
First, of course, you need to convert the XML into something that you can manipulate (arrays). Then do the usual parsing (using explode). In the end, you will need to create a new XML document. Consider this example:
$xml_string = '<product><item><partno>abc123</partno><Compatbility>model1: 110C, 115C, 117C. model2: 1835C, 1840C. model3: 210C, 215C, 3240C.</Compatbility></item></product>';
$original_xml = simplexml_load_string($xml_string);
$data = json_decode(json_encode($original_xml), true);
$compatbility = $data['item']['Compatbility']; // get all compatibility values
// explode values
$compatbility = array_filter(array_map('trim', explode('.', $compatbility)));
$new_xml = new SimpleXMLElement('<product/>'); // initialize new xml
// add necessary values
$new_xml->addChild('item')->addChild('partno', $data['item']['partno']);
$new_xml->item->addChild('Compatbility');
// loop the values and add them as children
foreach($compatbility as $value) {
// strip the leading "modelN:" label, keeping only the part numbers
$value = trim(preg_replace('/(\w+):/', '', $value));
$new_xml->item->Compatbility->addChild('model', $value);
}
echo $new_xml->asXML(); // output as xml
I have been searching for how to remove the whitespace left between tags when I export XML from PHP. In detail: I first load an XML file, then search it with XPath, then remove the elements that do not match certain brands, and finally export the result as a new XML file. The problem is that the new XML is full of whitespace left behind by the code. I tried trimming it, but that doesn't seem to work correctly.
Here is my code:
<?php
$sXML = simplexml_load_file('file.xml'); //First load the XML
$brands = $sXML->xPath('//brand'); //I do a search for the <brand> tag
function filter(string $input) { //Then I give it a list of variables
switch ($input) {
case 'BRAND 3':
case 'BRAND 4':
return false;
default:
return true;
}
}
array_walk($brands, function($brand) { // Remove all elements that do not match my list
$content = (string) $brand;
if (filter($content)) {
$item = $brand->xPath('..')[0];
unset($item[0]);
}
});
$sXML->asXML('filtred.xml'); // And finally export a new xml
?>
This one is the original XML:
<?xml version="1.0" encoding="utf-8"?>
<products>
<item>
<reference>00001</reference>
<other_string>PRODUCT 1</other_string>
<brand>BRAND 1</brand>
</item>
<item>
<reference>00002</reference>
<other_string>PRODUCT 2</other_string>
<brand>BRAND 2</brand>
</item>
<item>
<reference>00003</reference>
<other_string>PRODUCT 3</other_string>
<brand>BRAND 3</brand>
</item>
<item>
<reference>00004</reference>
<other_string>PRODUCT 4</other_string>
<brand>BRAND 4</brand>
</item>
<item>
<reference>00005</reference>
<other_string>PRODUCT 5</other_string>
<brand>BRAND 5</brand>
</item>
</products>
And the output of the script sends this:
<?xml version="1.0" encoding="utf-8"?>
<products>
<item>
<reference>00001</reference>
<other_string>PRODUCT 1</other_string>
<brand>BRAND 1</brand>
</item>
<item>
<reference>00002</reference>
<other_string>PRODUCT 2</other_string>
<brand>BRAND 2</brand>
</item>
<item>
<reference>00005</reference>
<other_string>PRODUCT 5</other_string>
<brand>BRAND 5</brand>
</item>
</products>
As you can see in the output, there is whitespace between product 2 and product 5 that I need to remove. Any help will be appreciated.
You can force SimpleXML to trim all whitespace when it reads the file, by passing the LIBXML_NOBLANKS option to simplexml_load_file:
$sXML = simplexml_load_file('file.xml', null, LIBXML_NOBLANKS);
Then when you call ->asXML(), all the whitespace will be removed, and you'll get XML all on one line, like this:
<?xml version="1.0" encoding="utf-8"?>
<products><item><reference>00003</reference><other_string>PRODUCT 3</other_string><brand>BRAND 3</brand></item><item><reference>00004</reference><other_string>PRODUCT 4</other_string><brand>BRAND 4</brand></item></products>
To re-generate whitespace based on the remaining structure, you'll need to use DOM rather than SimpleXML - but that's easy to do without changing any of your existing code, because dom_import_simplexml simply "rewraps" the XML without reparsing it.
Then you can use the DOMDocument formatOutput property and save() method to "pretty-print" the document:
$sXML = simplexml_load_file('file.xml', null, LIBXML_NOBLANKS);
// ...
// process $sXML as before
// ...
$domDocument = dom_import_simplexml($sXML)->ownerDocument;
$domDocument->formatOutput = true;
// save() returns the number of bytes written, so there is nothing useful to echo
$domDocument->save('filtered.xml');
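To make the approach easier to try, here is a self-contained sketch that works on a string instead of a file. The sample XML is shortened, and the list of unwanted brands (`$unwanted`) is an assumption for illustration, not the asker's exact filter.

```php
<?php
$source = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<products>
  <item><reference>00001</reference><brand>BRAND 1</brand></item>
  <item><reference>00003</reference><brand>BRAND 3</brand></item>
  <item><reference>00005</reference><brand>BRAND 5</brand></item>
</products>
XML;

// LIBXML_NOBLANKS drops whitespace-only text nodes while parsing
$sXML = simplexml_load_string($source, SimpleXMLElement::class, LIBXML_NOBLANKS);

// Hypothetical filter: remove every <item> whose <brand> is on this list
$unwanted = ['BRAND 3', 'BRAND 4'];
foreach ($sXML->xpath('//item') as $item) {
    if (in_array((string) $item->brand, $unwanted, true)) {
        unset($item[0]); // SimpleXML self-reference: removes the element itself
    }
}

// Re-wrap as DOM (no re-parsing) and let DOM regenerate the indentation
$domDocument = dom_import_simplexml($sXML)->ownerDocument;
$domDocument->formatOutput = true;
$pretty = $domDocument->saveXML();
echo $pretty;
```

Because the blank text nodes are gone before the items are removed, no empty lines are left where the removed items used to be.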
Another possibility is to use preg_replace:
// Get simpleXml as string
$xmlAsString = $yourSimpleXmlObject->asXML();
// Remove newlines
$xmlAsString = preg_replace("/\n/", "", $xmlAsString);
// Remove spaces between tags
$xmlAsString = preg_replace("/>\s*</", "><", $xmlAsString);
var_dump($xmlAsString);
Now you get your XML as string in one line (including the XML declaration).
I'm trying to generate an RSS feed using PHP's SimpleXMLElement. The problem is that I need to prefix elements and can't find a way to do this using the SimpleXMLElement class.
I've tried using $item->addChild('prefix:element', 'value'), but the prefix is stripped in the resulting XML. Any idea why this happens?
I wonder if there is a way to solve this using SimpleXMLElement, or any other way that is cleaner than just echoing the XML.
For clarification, this is my PHP code:
$xml = new SimpleXMLElement('<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0"/>');
$channel = $xml->addChild('channel');
$channel->addChild('title', 'Text');
$channel->addChild('link', 'http://example.com');
$channel->addChild('description', 'An example item from the feed.');
foreach($this->products as $product) {
$item = $channel->addChild('item');
foreach($product as $key => $value)
$item->addChild($key, $value);
}
return $xml->asXML();
And this is the example XML I'm trying to generate:
<?xml version="1.0"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<title>Test Store</title>
<link>http://www.example.com</link>
<description>An example item from the feed</description>
<item>
<g:id>DB_1</g:id>
<g:title>Dog Bowl In Blue</g:title>
<g:description>Solid plastic Dog Bowl in marine blue color</g:description>
...
</item>
...
Thanks in advance
You need to pass the namespace URI of the prefix to add a child element with that prefix:
$item->addChild($key, $value, 'http://base.google.com/ns/1.0');
eval.in demo :
$xml = new SimpleXMLElement('<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0"/>');
$channel = $xml->addChild('channel');
$channel->addChild('title', 'Text');
$channel->addChild('link', 'http://example.com');
$channel->addChild('description', 'An example item from the feed.');
$item = $channel->addChild('item');
$item->addChild('g:foo', 'bar', 'http://base.google.com/ns/1.0');
print $xml->asXML();
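Applied to the loop from the question, this could look like the sketch below. The `$products` array is hypothetical sample data, and the sketch assumes every product key should become a g:-prefixed element.

```php
<?php
$ns = 'http://base.google.com/ns/1.0';

// Hypothetical sample data standing in for $this->products
$products = [
    ['id' => 'DB_1', 'title' => 'Dog Bowl In Blue'],
];

$xml = new SimpleXMLElement('<rss version="2.0" xmlns:g="' . $ns . '"/>');
$channel = $xml->addChild('channel');
$channel->addChild('title', 'Test Store');

foreach ($products as $product) {
    $item = $channel->addChild('item');
    foreach ($product as $key => $value) {
        // The third argument binds the "g" prefix to its namespace URI
        $item->addChild('g:' . $key, $value, $ns);
    }
}

$feed = $xml->asXML();
echo $feed;
```

Elements added without the third argument, such as channel and title here, stay unprefixed.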
I was testing a simple example of how to display XML in the browser using PHP and found this example, which works well:
<?php
$xml = new DOMDocument("1.0");
$root = $xml->createElement("data");
$xml->appendChild($root);
$id = $xml->createElement("id");
$idText = $xml->createTextNode('1');
$id->appendChild($idText);
$title = $xml->createElement("title");
$titleText = $xml->createTextNode('Valid');
$title->appendChild($titleText);
$book = $xml->createElement("book");
$book->appendChild($id);
$book->appendChild($title);
$root->appendChild($book);
$xml->formatOutput = true;
echo "<xmp>". $xml->saveXML() ."</xmp>";
$xml->save("mybooks.xml") or die("Error");
?>
It produces the following output:
<?xml version="1.0"?>
<data>
<book>
<id>1</id>
<title>Valid</title>
</book>
</data>
Now I have two questions about how the output should look.
The first line of the XML file, <?xml version="1.0"?>, should not be displayed; that is, it should be hidden.
How can I display the text node on the next line? In total, I am expecting output in this fashion:
<data>
<book>
<id>1</id>
<title>
Valid
</title>
</book>
</data>
Is it possible to get the desired output, and if so, how can I accomplish that?
Thanks
To skip the XML declaration you can use the result of saveXML on the root node:
$xml_content = $xml->saveXML($root);
file_put_contents("mybooks.xml", $xml_content) or die("cannot save XML");
Please note that saveXML(node) has a different output from saveXML().
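A minimal sketch of that difference, rebuilding the document from the question in a shorter form (the variable names `$withDeclaration` and `$withoutDeclaration` are only for illustration):

```php
<?php
$xml = new DOMDocument('1.0');
$xml->formatOutput = true;

$root = $xml->appendChild($xml->createElement('data'));
$book = $root->appendChild($xml->createElement('book'));
$book->appendChild($xml->createElement('id', '1'));
$book->appendChild($xml->createElement('title', 'Valid'));

// Whole document: starts with the <?xml ...> declaration
$withDeclaration = $xml->saveXML();

// Only the root node and its descendants: no declaration
$withoutDeclaration = $xml->saveXML($root);

echo $withoutDeclaration;
```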
First question:
here is my post where all usable threads with answers are listed: How do you exclude the XML prolog from output?
Second question:
I don't know of any PHP function that outputs text nodes like that.
You could:
read the XML using DOMDocument and save each node as a string
iterate through the nodes
detect text nodes and add new lines to xml string manually
At the end you would have the same XML with text node values in new line:
<node>
some text data
</node>
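A simpler variant of the steps above is to change the text nodes themselves before saving. The sketch below does this for the title element only. Note the assumption it relies on: the added newlines become part of the element's text value, which matters to any consumer that treats whitespace as significant.

```php
<?php
$xml = new DOMDocument('1.0');
$xml->loadXML('<data><book><id>1</id><title>Valid</title></book></data>');

$xpath = new DOMXPath($xml);

// Wrap the text of every <title> in newlines
foreach ($xpath->query('//title/text()') as $text) {
    $text->nodeValue = "\n" . trim($text->nodeValue) . "\n";
}

// Saving the root element also leaves out the XML declaration
$out = $xml->saveXML($xml->documentElement);
echo $out;
```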
I need to get the values of the <name> and <url> tags where subtype="mytype". How can I do this in PHP?
I want the document name and the test.pdf path in my result.
<?xml version="1.0" encoding="UTF-8"?>
<test>
<required>
<item type="binary">
<name>The name</name>
<url visibility="restricted">c:/temp/test/widget.exe</url>
</item>
<item type="document" subtype="mytype">
<name>document name</name>
<url visiblity="visible">c:/temp/test.pdf</url>
</item>
</required>
</test>
Use SimpleXML and XPath, e.g.:
$xml = simplexml_load_file('path/to/file.xml');
$items = $xml->xpath('//item[@subtype="mytype"]');
foreach ($items as $item) {
$name = (string) $item->name;
$url = (string) $item->url;
}
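For a version that runs without a file on disk, here is a self-contained sketch of the same XPath approach. It uses a shortened copy of the XML from the question and collects the matches into a `$results` array (a name chosen here for illustration).

```php
<?php
$source = '<test><required>'
    . '<item type="binary"><name>The name</name><url>c:/temp/test/widget.exe</url></item>'
    . '<item type="document" subtype="mytype"><name>document name</name><url>c:/temp/test.pdf</url></item>'
    . '</required></test>';

$xml = simplexml_load_string($source);

$results = [];
// In XPath, @subtype selects the attribute named "subtype"
foreach ($xml->xpath('//item[@subtype="mytype"]') as $item) {
    // Cast to string: SimpleXML returns element objects, not plain values
    $results[(string) $item->name] = (string) $item->url;
}

print_r($results);
```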
PHP 5.1.2+ has an extension called SimpleXML enabled by default. It's very useful for parsing well-formed XML like your example above.
First, create a SimpleXMLElement instance, passing the XML to its constructor. SimpleXML will parse the XML for you. (This is where I feel the elegance of SimpleXML lies - SimpleXMLElement is the entire library's sole class.)
$xml = new SimpleXMLElement($yourXml);
Now, you can easily traverse the XML as if it were any PHP object. Attributes are accessible as array values. Since you're looking for tags with specific attribute values, we can write a simple loop to go through the XML:
<?php
$yourXml = <<<END
<?xml version="1.0" encoding="UTF-8"?>
<test>
<required>
<item type="binary">
<name>The name</name>
<url visibility="restricted">c:/temp/test/widget.exe</url>
</item>
<item type="document" subtype="mytype">
<name>document name</name>
<url visiblity="visible">c:/temp/test.pdf</url>
</item>
</required>
</test>
END;
// Create the SimpleXMLElement
$xml = new SimpleXMLElement($yourXml);
// Store an array of results, matching names to URLs.
$results = array();
// Loop through all of the tests
foreach ($xml->required[0]->item as $item) {
if ( ! isset($item['subtype']) || $item['subtype'] != 'mytype') {
// Skip this one.
continue;
}
// Cast, because all of the stuff in the SimpleXMLElement is a SimpleXMLElement.
$results[(string)$item->name] = (string)$item->url;
}
print_r($results);
Tested to be correct in codepad.
Hope this helps!
You can use the XML Parser or SimpleXML.
Using PHP, how do I get an entire subset of nodes from an XML document? I can retrieve something like:
<?xml version="1.0" encoding="utf-8"?>
<people>
<certain>
<name>Jane Doe</name>
<age>21</age>
</certain>
<certain>
<name>John Smith</name>
<age>34</age>
</certain>
</people>
But what if I only want to return the child nodes of <people>, like this?
<certain>
<name>Jane Doe</name>
<age>21</age>
</certain>
<certain>
<name>John Smith</name>
<age>34</age>
</certain>
EDIT: I'm trying to get a subset of XML and pass that directly, not an object like simplexml would give me. I am basically trying to get PHP to do what .NET's OuterXml does... return literally the above subset of XML as is... no interpreting or converting or creating a new XML file or anything... just extract those nodes in situ and pass them on. Am I going to have to get the XML file, parse out what I need and then rebuild it as a new XML file? If so then I need to get rid of the <?xml version="1.0" encoding="utf-8"?> bit... ugh.
The answer would be to use XPath.
$people = simplexml_load_string(
'<?xml version="1.0" encoding="utf-8"?>
<people>
<certain>
<name>Jane Doe</name>
<age>21</age>
</certain>
<certain>
<name>John Smith</name>
<age>34</age>
</certain>
</people>'
);
// get all <certain/> nodes
$people->xpath('//certain');
// get all <certain/> nodes whose <name/> is "John Smith"
print_r($people->xpath('//certain[name = "John Smith"]'));
// get all <certain/> nodes whose <age/> child's value is greater than 21
print_r($people->xpath('//certain[age > 21]'));
Take 2
So apparently you want to copy some nodes from a document into another document? SimpleXML doesn't support that. DOM has methods for that but they're kind of annoying to use. Which one are you using? Here's what I use: SimpleDOM. In fact, it's really SimpleXML augmented with DOM's methods.
include 'SimpleDOM.php';
$results = simpledom_load_string('<results/>');
foreach ($people->xpath('//certain') as $certain)
{
$results->appendChild($certain);
}
That routine finds all <certain/> nodes via XPath, then appends them to the new document.
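If adding a third-party library is not an option, the same copy can be sketched with the built-in DOM extension, using DOMDocument::importNode() to bring each node into the new document. The results wrapper element is carried over from the example above.

```php
<?php
$source = new DOMDocument();
$source->loadXML(
    '<people>'
    . '<certain><name>Jane Doe</name><age>21</age></certain>'
    . '<certain><name>John Smith</name><age>34</age></certain>'
    . '</people>'
);

$target = new DOMDocument();
$results = $target->appendChild($target->createElement('results'));

$xpath = new DOMXPath($source);
foreach ($xpath->query('//certain') as $certain) {
    // importNode(..., true) makes a deep copy that belongs to $target
    $results->appendChild($target->importNode($certain, true));
}

$out = $target->saveXML($results);
echo $out;
```

A node cannot be appended directly to a different document, which is why the import step is needed.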
You could use DOMDocument::getElementsByTagName(), or you could:
Use XPath?
<?php
$xml = simplexml_load_file("test.xml");
$result = $xml->xpath("//certain");
print_r($result);
?>
Use DOM and XPath. Xpath allows you to select nodes (and values) from an XML DOM.
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$result = '';
foreach ($xpath->evaluate('/people/certain') as $node) {
$result .= $dom->saveXml($node);
}
echo $result;
Demo: https://eval.in/162149
DOMDocument::saveXml() has a context argument. If provided it saves that node as XML. Much like outerXml(). PHP is able to register your own classes for the DOM nodes, too. So it is even possible to add an outerXML() function to element nodes.
class MyDomElement extends DOMElement {
public function outerXml() {
return $this->ownerDocument->saveXml($this);
}
}
class MyDomDocument extends DOMDocument {
public function __construct($version = '1.0', $encoding = 'utf-8') {
parent::__construct($version, $encoding);
$this->registerNodeClass('DOMElement', 'MyDomElement');
}
}
$dom = new MyDomDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
$result = '';
foreach ($xpath->evaluate('/people/certain') as $node) {
$result .= $node->outerXml();
}
echo $result;
Demo: https://eval.in/162157
See http://www.php.net/manual/en/domdocument.getelementsbytagname.php
The answer turned out to be a combination of the xpath suggestion and outputting with asXML().
Using the example given by Josh Davis:
$people = simplexml_load_string(
'<?xml version="1.0" encoding="utf-8"?>
<people>
<certain>
<name>Jane Doe</name>
<age>21</age>
</certain>
<certain>
<name>John Smith</name>
<age>34</age>
</certain>
</people>'
);
// get all <certain/> nodes
$nodes = $people->xpath('/people/certain');
// start with an empty string so the .= below does not hit an undefined variable
$result = '';
foreach ( $nodes as $node ) {
$result .= $node->asXML()."\n";
}
echo $result;