Parsing XML using PHP

I've consistently had trouble parsing XML with PHP and have never really found "the right way", or at least a standardised way, of parsing XML files.
First, I'm trying to parse this:
<item>
<title>2884400</title>
<description><![CDATA[ ><img width="126" alt="" src="http://userserve-ak.last.fm/serve/126/27319921.jpg" /> ]]></description>
<link>http://www.last.fm/music/+noredirect/Beatles/+images/27319921</link>
<author>anne710</author>
<pubDate>Tue, 21 Apr 2009 16:12:31 +0000</pubDate>
<guid>http://www.last.fm/music/+noredirect/Beatles/+images/27319921</guid>
<media:content url="http://userserve-ak.last.fm/serve/_/27319921/Beatles+2884400.jpg" fileSize="13065" type="image/jpeg" expression="full" width="126" height="126" />
<media:thumbnail url="http://userserve-ak.last.fm/serve/126/27319921.jpg" type="image/jpeg" width="126" height="126" />
</item>
I'm using this code:
$doc = new DOMDocument();
$doc->load('http://ws.audioscrobbler.com/2.0/artist/beatles/images.rss');
$arrFeeds = array();
foreach ($doc->getElementsByTagName('item') as $node) {
$itemRSS = array (
'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue
);
array_push($arrFeeds, $itemRSS);
}
Now I want to get the url attributes of "media:content" and "media:thumbnail". How would I do that? I think I should be using DOMElement::getAttribute, but I haven't managed to get it to work. Can anyone shed some light on this, and also let me know whether this is a good way to parse XML?
Regards,
Shadi

You can use SimpleXML as suggested by the other posters, but you need to use the children() and attributes() functions so you can deal with the different namespaces
Example (untested):
$feed = file_get_contents('http://ws.audioscrobbler.com/2.0/artist/beatles/images.rss');
$xml = new SimpleXMLElement($feed);
foreach ($xml->channel->item as $item) {
foreach ($item->children('http://search.yahoo.com/mrss/') as $media_element) {
var_dump($media_element);
}
}
Alternatively, you can use XPath (again, untested):
$feed = file_get_contents('http://ws.audioscrobbler.com/2.0/artist/beatles/images.rss');
$xml = new SimpleXMLElement($feed);
$xml->registerXPathNamespace('media', 'http://search.yahoo.com/mrss/');
$images = $xml->xpath('/rss/channel/item/media:content/@url');
var_dump($images);
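For reference, here is a minimal self-contained sketch of both approaches, run against an inline sample instead of the live feed. It assumes the feed binds the media prefix to http://search.yahoo.com/mrss/ (the Media RSS namespace); the sample XML, the example.com URLs, and the variable names are made up for illustration.

```php
<?php
// Inline sample standing in for the feed; its structure is an assumption.
$feed = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>2884400</title>
      <media:content url="http://example.com/full.jpg" type="image/jpeg" />
      <media:thumbnail url="http://example.com/thumb.jpg" type="image/jpeg" />
    </item>
  </channel>
</rss>
XML;

$mediaNs = 'http://search.yahoo.com/mrss/';
$xml = new SimpleXMLElement($feed);

// Approach 1: children() selects the elements in the media namespace,
// attributes() then reads their (un-namespaced) attributes.
$urls = [];
foreach ($xml->channel->item as $item) {
    foreach ($item->children($mediaNs) as $name => $mediaElement) {
        $urls[$name] = (string) $mediaElement->attributes()->url;
    }
}
print_r($urls);

// Approach 2: XPath, with the prefix registered explicitly.
$xml->registerXPathNamespace('media', $mediaNs);
$contentUrls = array_map('strval', $xml->xpath('/rss/channel/item/media:content/@url'));
print_r($contentUrls);
```

The XPath variant returns attribute nodes wrapped in SimpleXMLElement objects, which is why they are cast to strings before use.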

Try this. It'll work fine.
$doc = new DOMDocument();
$doc->load('http://ws.audioscrobbler.com/2.0/artist/beatles/images.rss');
$arrFeeds = array();
foreach ($doc->getElementsByTagName('item') as $node) {
$itemRSS = array (
'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue,
'thumbnail' => $node->getElementsByTagName('thumbnail')->item(0)->getAttribute('url')
);
array_push($arrFeeds, $itemRSS);
}

This is how I eventually did it using XMLReader:
<?php
define ('XMLFILE', 'http://ws.audioscrobbler.com/2.0/artist/vasco%20rossi/images.rss');
echo "<pre>";
$items = array ();
$i = 0;
$xmlReader = new XMLReader();
$xmlReader->open(XMLFILE, null, LIBXML_NOBLANKS);
$isParserActive = false;
$simpleNodeTypes = array ("title", "description", "media:title", "link", "author", "pubDate", "guid");
while ($xmlReader->read ())
{
$nodeType = $xmlReader->nodeType;
// Only deal with Beginning/Ending Tags
if ($nodeType != XMLReader::ELEMENT && $nodeType != XMLReader::END_ELEMENT) { continue; }
else if ($xmlReader->name == "item") {
if (($nodeType == XMLReader::END_ELEMENT) && $isParserActive) { $i++; }
$isParserActive = ($nodeType != XMLReader::END_ELEMENT);
}
if (!$isParserActive || $nodeType == XMLReader::END_ELEMENT) { continue; }
$name = $xmlReader->name;
if (in_array ($name, $simpleNodeTypes)) {
// Skip to the text node
$xmlReader->read ();
$items[$i][$name] = $xmlReader->value;
} else if ($name == "media:thumbnail") {
$items[$i]['media:thumbnail'] = array (
"url" => $xmlReader->getAttribute("url"),
"width" => $xmlReader->getAttribute("width"),
"height" => $xmlReader->getAttribute("height"),
"type" => $xmlReader->getAttribute("type")
);
} else if ($name == "media:content") {
$items[$i]['media:content'] = array (
"url" => $xmlReader->getAttribute("url"),
"width" => $xmlReader->getAttribute("width"),
"height" => $xmlReader->getAttribute("height"),
"filesize" => $xmlReader->getAttribute("fileSize"),
"expression" => $xmlReader->getAttribute("expression")
);
}
}
print_r($items);
echo "</pre>";
?>

<?php
#Convert the String Into XML
$xml = new SimpleXMLElement($_POST['name']);
#Iterate through the XML for the data
$values = "VALUES('' , ";
foreach($xml->item as $item)
{
//you now have access to that item
}
?>

Try using SimpleXML: http://us2.php.net/simplexml

You would want something like this:
'content' => $node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'content')->item(0)->getAttribute('url'),
'thumbnail' => $node->getElementsByTagNameNS('http://search.yahoo.com/mrss/', 'thumbnail')->item(0)->getAttribute('url'),
I believe that will work, it's been a while since I've done anything like this.
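As a rough, self-contained sketch of the namespace-aware lookup, the following runs against an inline sample (made up for illustration, including the example.com URL) and guards against items that have no thumbnail, since item(0) returns null in that case.

```php
<?php
// Inline sample; the second item deliberately has no media:thumbnail.
$rss = <<<XML
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>With thumbnail</title>
      <media:thumbnail url="http://example.com/thumb.jpg" />
    </item>
    <item>
      <title>Without thumbnail</title>
    </item>
  </channel>
</rss>
XML;

$mediaNs = 'http://search.yahoo.com/mrss/';
$doc = new DOMDocument();
$doc->loadXML($rss);

$thumbnails = [];
foreach ($doc->getElementsByTagName('item') as $node) {
    // item(0) returns null when the element is absent, so check before use.
    $thumb = $node->getElementsByTagNameNS($mediaNs, 'thumbnail')->item(0);
    $thumbnails[] = $thumb !== null ? $thumb->getAttribute('url') : '';
}
print_r($thumbnails);
```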

You may get the error "Call to a member function getAttribute() on a non-object" if a feed is missing entries like thumbnail, so while I like @Helder Robalo's answer, you should check that a node exists before calling methods such as getAttribute():
<?php
header('Content-type: text/plain; charset=utf-8');
$doc = new DOMDocument();
$doc->load('http://ws.audioscrobbler.com/2.0/artist/beatles/images.rss');
$arrFeeds = array();
foreach ($doc->getElementsByTagName('item') as $node) {
$itemRSS = array (
'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
'date' => $node->getElementsByTagName('pubDate')->item(0)->nodeValue
);
if( $node->getElementsByTagName('thumbnail')->length > 0 )
{
$itemRSS['thumbnail'] = $node->getElementsByTagName('thumbnail')->item(0)->getAttribute('url');
}
else
{
$itemRSS['thumbnail'] = '';
}
array_push($arrFeeds, $itemRSS);
}
print_r($arrFeeds);

Media:content attributes are actually pretty easy to get with SimpleXML:
if(!@$x=simplexml_load_file($feed_url)){
}
else
{
foreach($x->channel->item as $entry)
{
$media = $entry->children('http://search.yahoo.com/mrss/')->attributes();
$url = (string) $media['url'];
}
}

Related

I created a PHP script to convert XML to CSV, but the results are written vertically instead of one line per record with headers

I want to create a PHP script that converts XML to CSV. I get the XML from a URL and create a CSV with the following code. The problem is that the fields are written vertically instead of horizontally.
For example my xml is like:
<product>
<id>1001</id>
<sku>product1</sku>
<name>Product 1 Name</name>
<manufacturer>My Company</manufacturer>
</product>
<product>
<id>1002</id>
<sku>product2</sku>
<name>Product 2 Name</name>
<manufacturer>My Company</manufacturer>
</product>
<product>
<id>1003</id>
<sku>product3</sku>
<name>Product 3 Name</name>
<manufacturer>My Company</manufacturer>
</product>
And I get something like:
id,1001
sku,product1
name,"product 1"
manufacturer,My Company
id,1002
sku,product2
name,"product 2"
manufacturer,My Company
id,1003
sku,product3
name,"product 3"
manufacturer,My Company
instead of this (which is what I want):
"id","sku","name","manufacturer"
"1001","product1","Product 1","My Company"
"1002","product2","Product 2","My Company"
"1003","product3","Product 3","My Company"
My code now is
file_put_contents("products.xml", fopen("https://xml.mysite.com/get.asp?xml=products&key=myxml", 'r'));
if (file_exists('products.xml')){
$xml = simplexml_load_file('products.xml');
file_put_contents("products.csv", "");
$f = fopen('products.csv', 'w');
createCsv($xml, $f);
fclose($f);
}
function createCsv($xml,$f){
foreach ($xml->children() as $item) {
$hasChild = (count($item->children()) > 0)?true:false;
if(!$hasChild){
$put_arr = array($item->getName(),$item);
fputcsv($f, $put_arr ,',','"');
} else {
createCsv($item, $f);
}
}
}
What can I do, please?
SimpleXML (and DOM) can use Xpath to fetch elements from an XML. You would need one expression for the rows and a list of expressions for the columns.
function readRowsFromSimpleXML(
SimpleXMLElement $element, string $rowExpression, array $columnExpressions
): Generator {
foreach ($element->xpath($rowExpression) as $rowNode) {
$row = [];
foreach ($columnExpressions as $column => $expression) {
$row[$column] = (string)($rowNode->xpath($expression)[0] ?? '');
}
yield $row;
}
}
$rows = readRowsFromSimpleXML(
simplexml_load_file('products.xml'),
'//product',
$columns = [
'id' => './id',
'sku' => './sku',
'name' => './name',
'price' => './price',
'manufacturer' => './manufacturer'
]
);
readRowsFromSimpleXML(...) returns a Generator. It does not read the data yet; that only happens when you iterate it, for example with foreach().
Addressing the row and column data explicitly keeps the output more stable. It even works if an element is missing; I added a price column to show this.
To put this into a CSV, iterate the generator:
$fh = fopen('php://stdout', 'w');
fputcsv($fh, array_keys($columns));
foreach ($rows as $row) {
fputcsv($fh, array_values($row));
}
Output:
id,sku,name,price,manufacturer
1001,product1,"Product 1 Name",,"My Company"
1002,product2,"Product 2 Name",,"My Company"
1003,product3,"Product 3 Name",,"My Company"
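For comparison, here is a stripped-down sketch of the same idea without XPath: one fputcsv() call per product, with a fixed column list. The products root element, the sample data, and the php://memory stream are assumptions made so the snippet runs on its own; the enclosure and escape arguments are passed explicitly only to avoid relying on defaults.

```php
<?php
// Hypothetical sample; a <products> root element is assumed here.
$xml = simplexml_load_string(
    '<products>'
    . '<product><id>1001</id><sku>product1</sku><name>Product 1 Name</name><manufacturer>My Company</manufacturer></product>'
    . '<product><id>1002</id><sku>product2</sku><name>Product 2 Name</name></product>'
    . '</products>'
);
$columns = ['id', 'sku', 'name', 'manufacturer'];

$fh = fopen('php://memory', 'w+');
fputcsv($fh, $columns, ',', '"', '\\');        // header row, written once
foreach ($xml->product as $product) {
    $row = [];
    foreach ($columns as $column) {
        // A missing child element casts to an empty string.
        $row[] = (string) $product->{$column};
    }
    fputcsv($fh, $row, ',', '"', '\\');        // one line per product
}
rewind($fh);
$csv = stream_get_contents($fh);
fclose($fh);
echo $csv;
```

The key difference from the recursive version in the question is that a row is collected per product before anything is written, so each record ends up on a single line.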
This works with more complex expressions as well. For example reading a currency attribute of the price element or multiple images:
$columns = [
'id' => './id',
'sku' => './sku',
'name' => './name',
'manufacturer' => './manufacturer',
'price' => './price',
'currency' => './price/@currency',
'image0' => '(./image)[1]',
'image1' => '(./image)[2]'
];
If you need to aggregate values, add a callback to the column definition.
function readRowsFromSimpleXML(
SimpleXMLElement $element, string $rowExpression, array $columnExpressions
): Generator {
foreach ($element->xpath($rowExpression) as $rowNode) {
$row = [];
foreach ($columnExpressions as $column => $options) {
if (is_array($options)) {
[$expression, $callback] = $options;
} else {
$expression = $options;
$callback = null;
}
$values = $rowNode->xpath($expression);
if ($callback) {
$row[$column] = $callback($values);
} else {
$row[$column] = (string)($rowNode->xpath($expression)[0] ?? '');
}
}
yield $row;
}
}
$rows = readRowsFromSimpleXML(
simplexml_load_file('products.xml'),
'//product',
$columns = [
'id' => './id',
'sku' => './sku',
// ...
'categories' => [ './category', fn ($values) => implode(',', $values) ]
]
);
Complex configuration arrays are difficult to maintain. A more encapsulated approach would be a class. The following class works with SimpleXML and DOM. The fields/columns are added with a method.
class XMLRecordsReader implements \IteratorAggregate {
private $_source;
private $_expression = './*';
private $_fields = [];
public function __construct($source) {
if ($source instanceof \SimpleXMLElement) {
$this->_source = dom_import_simplexml($source);
return;
}
if ($source instanceof \DOMNode) {
$this->_source = $source;
return;
}
throw new \InvalidArgumentException('Need SimpleXMLElement or DOMNode $source.');
}
public function setExpression(string $expression): self {
$this->_expression = $expression;
return $this;
}
public function addField(string $name, string $expression, callable $mapper = null): self {
$this->_fields[$name] = [$expression, $mapper];
return $this;
}
public function getIterator(): \Generator {
$xpath = new DOMXpath(
$this->_source instanceof DOMDocument ? $this->_source : $this->_source->ownerDocument
);
foreach ($xpath->evaluate($this->_expression, $this->_source) as $node) {
$record = [];
foreach ($this->_fields as $field => $options) {
[$expression, $mapper] = $options;
$values = $xpath->evaluate($expression, $node);
if ($mapper) {
$record[$field] = $mapper($values);
} else if ($values instanceof DOMNodeList) {
$value = $values[0] ?? null;
$record[$field] = $value->textContent ?? '';
} else {
$record[$field] = (string)($values ?? '');
}
}
yield $record;
}
}
}
$reader = new XMLRecordsReader(
simplexml_load_file('products.xml'),
);
$reader
->addField('id', './id')
->addField('sku', './sku')
->addField('name', './name')
->addField('manufacturer', './manufacturer')
->addField('price', './price')
->addField('currency', './price/@currency')
->addField('image0', '(./image)[1]')
->addField('image1', '(./image)[2]')
->addField(
'categories',
'./category',
fn (\DOMNodeList $values) => implode(
',',
array_map(
fn (\DOMNode $node) => $node->textContent,
iterator_to_array($values)
)
)
);
var_dump(iterator_to_array($reader));

Content of my XML is deleted when I add a root to my XML document

I'm creating an XML document and then trying to add a root:
function makeTransaction() {
$arr = [
"client_id" => $this->client_id,
"client_name" => $this->client_name,
"client_password" => $this->client_password,
"sender_first_name" => "Marie",
"sender_last_name" => "Dupont",
"sender_address" => "Washington Street",
"sender_city" => "New York",
"sender_country" => "USA",
];
$myxml = $this->arrayToXml($arr, "<validate_transaction/>");
}
function arrayToXml($array, $rootElement = null, $xml = null) {
$_xml = $xml;
if ($_xml === null) {
$_xml = new SimpleXMLElement($rootElement !== null ? $rootElement :
'<root/>');
}
foreach ($array as $k => $v) {
if (is_array($v)) {
arrayToXml($v, $k, $_xml->addChild($k));
} else {
$_xml->addChild($k, $v);
}
$doc = new DOMDocument();
$doc->appendChild($doc->createElement('Money'));
$domnode = dom_import_simplexml($_xml);
$doc->documentElement->appendChild($doc->importNode($domnode));
$_xml= simplexml_import_dom($doc);
print($_xml->asXML());
}
I don't understand. I successfully get the XML document at the beginning, but once I add the last part of the function to add the root ("Money"), I only get this result:
<money>
<validate_transaction>
<validate_transaction/>
</money>
and everything inside validate_transaction disappears.
Could you help me, please?
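One likely cause, assuming the rest of the code behaves as described: DOMDocument::importNode() makes a shallow copy unless its second argument is true, so only the validate_transaction element itself is imported, without its children. A minimal sketch with made-up values:

```php
<?php
// Build a small SimpleXML tree similar to the one in the question (values are hypothetical).
$sxml = new SimpleXMLElement('<validate_transaction/>');
$sxml->addChild('client_id', '42');
$sxml->addChild('sender_city', 'New York');

$doc = new DOMDocument();
$doc->appendChild($doc->createElement('Money'));

$domNode = dom_import_simplexml($sxml);
// The second argument requests a deep copy; without it only the
// element itself (no child nodes) is imported.
$doc->documentElement->appendChild($doc->importNode($domNode, true));

$result = $doc->saveXML($doc->documentElement);
echo $result;
```

Passing the node to saveXML() serializes only that subtree, without the XML declaration.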

Simple XML return with PHP and DOMDocument

I am building a simple api to Post/Return XML for my app. Here is the code:
$returnData = array (
"ResultCode" => "0",
"ResultString" => "uppdated"
);
$xml = new DOMDocument();
$dateInfoElement = $xml->createElement("versionCheckResult");
foreach ($returnData as $key => $value) {
$xmlNode = $xml->createElement($key,$value);
$dateInfoElement->appendChild($xmlNode);
}
$xml->appendChild($dateInfoElement);
echo $xml;
Sadly, I am getting no return, not a thing. Php is not my strong side but it seemed easier than working with Node.JS and mongoDB. Can you tell me what I am doing wrong?
If you're using DOMDocument, you need to use this method to display your XML as a string : DOMDocument::saveXML()
$returnData = array (
"ResultCode" => "0",
"ResultString" => "uppdated"
);
$xml = new DOMDocument();
$dateInfoElement = $xml->createElement("versionCheckResult");
foreach ($returnData as $key => $value) {
$xmlNode = $xml->createElement($key,$value);
$dateInfoElement->appendChild($xmlNode);
}
$xml->appendChild($dateInfoElement);
echo $xml->saveXML(); // This should work as expected

Parse RSS/Atom feeds with PHP's HTML DomDocument

How do I find the values for the namespaced elements content:encoded and dc:creator with the following code?
Unfortunately I cannot use SimplePie, MagpieRSS, or even SimpleXML.
I know I have to use $doc->getElementsByTagName, but I cannot figure out where.
<?php
function rss_to_array($tags, $array, $url) {
$doc = new DOMdocument();
@$doc->load($url);
$rss_array = array();
foreach($tags as $tag) {
if ($doc->getElementsByTagName($tag)) {
foreach($doc->getElementsByTagName($tag) AS $node) {
$items = array();
foreach($array AS $key => $values) {
$items[$key] = array();
foreach($values as $value) {
if ($itemsCheck = $node->getElementsByTagName($value)) {
for( $j=0 ; $j < $itemsCheck->length; $j++ ) {
if (($attribute = $itemsCheck->item($j)->nodeValue) != "") {
$items[$key][] = $attribute;
} else if ($attribute = $itemsCheck->item($j)->getAttribute('term')) {
$items[$key][] = $attribute;
} else if ($itemsCheck->item($j)->getAttribute('rel') == 'alternate') {
$items[$key][] = $itemsCheck->item($j)->getAttribute('href');
}
}
}
}
}
array_push($rss_array, $items);
}
}
}
return $rss_array;
}
$rss_item_tags = array('item', 'entry');
$rss_tags = array(
'title' => array('title'),
'description' => array('description', 'content', 'summary'),
'link' => array('link', 'feedburner'),
'category' => array('category')
);
$rssfeed = rss_to_array($rss_item_tags, $rss_tags, $url);
echo '<pre>';
print_r($rssfeed);
echo '</pre>';
exit;
?>
For RSS feeds, try using simplexml_load_file. It creates an object out of the XML and, since RSS 2.0 feeds share the same basic structure, you can do something like:
$feed = simplexml_load_file(your_rss_url_here);
for($i=0; $i < 10; $i++){
// this is assuming there are 10 pieces of content for each RSS you're loading
$link = $feed->channel->item[$i]->link;
// do each for pubdate, author, description, title, etc.
}
http://php.net/manual/en/book.simplexml.php
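Since the question rules out SimpleXML, here is a minimal DOM-only sketch for the two namespaced elements. It assumes the feed uses the usual namespace URIs for content (http://purl.org/rss/1.0/modules/content/) and dc (http://purl.org/dc/elements/1.1/); the sample feed and the firstNsValue() helper are hypothetical.

```php
<?php
// Inline sample feed; a real feed would be loaded with $doc->load($url).
$rss = <<<XML
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Hello</title>
      <dc:creator>Jane</dc:creator>
      <content:encoded><![CDATA[<p>Body</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

$nsContent = 'http://purl.org/rss/1.0/modules/content/';
$nsDc = 'http://purl.org/dc/elements/1.1/';

// Hypothetical helper: text of the first matching namespaced descendant, or ''.
function firstNsValue(DOMElement $parent, string $ns, string $localName): string
{
    $node = $parent->getElementsByTagNameNS($ns, $localName)->item(0);
    return $node !== null ? $node->nodeValue : '';
}

$doc = new DOMDocument();
$doc->loadXML($rss);

$items = [];
foreach ($doc->getElementsByTagName('item') as $item) {
    $items[] = [
        'title'   => $item->getElementsByTagName('title')->item(0)->nodeValue,
        'creator' => firstNsValue($item, $nsDc, 'creator'),
        'content' => firstNsValue($item, $nsContent, 'encoded'),
    ];
}
print_r($items);
```

In the function from the question, the same getElementsByTagNameNS() call could replace getElementsByTagName($value) for the tags that live in a namespace.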

PHP convert XML to JSON

I am trying to convert xml to json in php. If I do a simple convert using simple xml and json_encode none of the attributes in the xml show.
$xml = simplexml_load_file("states.xml");
echo json_encode($xml);
So I am trying to manually parse it like this.
foreach($xml->children() as $state)
{
$states[]= array('state' => $state->name);
}
echo json_encode($states);
and the output for state is {"state":{"0":"Alabama"}} rather than {"state":"Alabama"}
What am I doing wrong?
XML:
<?xml version="1.0" ?>
<states>
<state id="AL">
<name>Alabama</name>
</state>
<state id="AK">
<name>Alaska</name>
</state>
</states>
Output:
[{"state":{"0":"Alabama"}},{"state":{"0":"Alaska"}}]
var dump:
object(SimpleXMLElement)#1 (1) {
["state"]=>
array(2) {
[0]=>
object(SimpleXMLElement)#3 (2) {
["@attributes"]=>
array(1) {
["id"]=>
string(2) "AL"
}
["name"]=>
string(7) "Alabama"
}
[1]=>
object(SimpleXMLElement)#2 (2) {
["@attributes"]=>
array(1) {
["id"]=>
string(2) "AK"
}
["name"]=>
string(6) "Alaska"
}
}
}
Json & Array from XML in 3 lines:
$xml = simplexml_load_string($xml_string);
$json = json_encode($xml);
$array = json_decode($json,TRUE);
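To illustrate what the three lines produce, here is a small sketch using the states XML from the question as an inline string. For elements that have child elements, the attributes end up under an "@attributes" key.

```php
<?php
$xml_string = '<states><state id="AL"><name>Alabama</name></state>'
    . '<state id="AK"><name>Alaska</name></state></states>';

$xml = simplexml_load_string($xml_string);
$array = json_decode(json_encode($xml), true);

// Repeated <state> elements become a list; attributes sit under "@attributes".
echo $array['state'][0]['@attributes']['id'], "\n";
echo $array['state'][0]['name'], "\n";
```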
Sorry for answering an old post, but this article outlines an approach that is relatively short, concise and easy to maintain. I tested it myself and works pretty well.
http://lostechies.com/seanbiefeld/2011/10/21/simple-xml-to-json-with-php/
<?php
class XmlToJson {
public function Parse ($url) {
$fileContents= file_get_contents($url);
$fileContents = str_replace(array("\n", "\r", "\t"), '', $fileContents);
$fileContents = trim(str_replace('"', "'", $fileContents));
$simpleXml = simplexml_load_string($fileContents);
$json = json_encode($simpleXml);
return $json;
}
}
?>
I figured it out. json_encode handles objects differently than strings. I cast the object to a string and it works now.
foreach($xml->children() as $state)
{
$states[]= array('state' => (string)$state->name);
}
echo json_encode($states);
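The same casting approach extends to attributes. As a small sketch on the states XML from the question, the id attribute can be read with array access and cast in the same way:

```php
<?php
$xml = simplexml_load_string(
    '<states><state id="AL"><name>Alabama</name></state>'
    . '<state id="AK"><name>Alaska</name></state></states>'
);

$states = [];
foreach ($xml->children() as $state) {
    $states[] = [
        'id'    => (string) $state['id'],   // attribute via array access
        'state' => (string) $state->name,   // child element via property access
    ];
}
$json = json_encode($states);
echo $json;
```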
I guess I'm a bit late to the party, but I have written a small function to accomplish this task. It also takes care of attributes, text content, and multiple sibling nodes with the same node name.
Disclaimer:
I'm not a PHP native, so please bear with simple mistakes.
function xml2js($xmlnode) {
$root = (func_num_args() > 1 ? false : true);
$jsnode = array();
if (!$root) {
if (count($xmlnode->attributes()) > 0){
$jsnode["$"] = array();
foreach($xmlnode->attributes() as $key => $value)
$jsnode["$"][$key] = (string)$value;
}
$textcontent = trim((string)$xmlnode);
if (strlen($textcontent) > 0)
$jsnode["_"] = $textcontent;
foreach ($xmlnode->children() as $childxmlnode) {
$childname = $childxmlnode->getName();
if (!array_key_exists($childname, $jsnode))
$jsnode[$childname] = array();
array_push($jsnode[$childname], xml2js($childxmlnode, true));
}
return $jsnode;
} else {
$nodename = $xmlnode->getName();
$jsnode[$nodename] = array();
array_push($jsnode[$nodename], xml2js($xmlnode, true));
return json_encode($jsnode);
}
}
Usage example:
$xml = simplexml_load_file("myfile.xml");
echo xml2js($xml);
Example Input (myfile.xml):
<family name="Johnson">
<child name="John" age="5">
<toy status="old">Trooper</toy>
<toy status="old">Ultrablock</toy>
<toy status="new">Bike</toy>
</child>
</family>
Example output:
{"family":[{"$":{"name":"Johnson"},"child":[{"$":{"name":"John","age":"5"},"toy":[{"$":{"status":"old"},"_":"Trooper"},{"$":{"status":"old"},"_":"Ultrablock"},{"$":{"status":"new"},"_":"Bike"}]}]}]}
Pretty printed:
{
"family" : [{
"$" : {
"name" : "Johnson"
},
"child" : [{
"$" : {
"name" : "John",
"age" : "5"
},
"toy" : [{
"$" : {
"status" : "old"
},
"_" : "Trooper"
}, {
"$" : {
"status" : "old"
},
"_" : "Ultrablock"
}, {
"$" : {
"status" : "new"
},
"_" : "Bike"
}
]
}
]
}
]
}
Quirks to keep in mind:
Several tags with the same tag name can be siblings. Other solutions will most likely drop all but the last sibling. To avoid this, every single node, even if it has only one child, is an array that holds an object for each instance of the tag name. (See the multiple <toy> elements in the example.)
Even the root element, of which only one should exist in a valid XML document, is stored as an array with an object for the instance, just to have a consistent data structure.
To be able to distinguish between XML node content and XML attributes, each object's attributes are stored in the "$" child and the content in the "_" child.
Edit:
I forgot to show the output for your example input data
{
"states" : [{
"state" : [{
"$" : {
"id" : "AL"
},
"name" : [{
"_" : "Alabama"
}
]
}, {
"$" : {
"id" : "AK"
},
"name" : [{
"_" : "Alaska"
}
]
}
]
}
]
}
A common pitfall is to forget that json_encode() does not respect elements with a textvalue and attribute(s). It will choose one of those, meaning dataloss.
The function below solves that problem. If one decides to go for the json_encode/decode way, the following function is advised.
function json_prepare_xml($domNode) {
foreach($domNode->childNodes as $node) {
if($node->hasChildNodes()) {
json_prepare_xml($node);
} else {
if($domNode->hasAttributes() && strlen($domNode->nodeValue)){
$domNode->setAttribute("nodeValue", $node->textContent);
$node->nodeValue = "";
}
}
}
}
$dom = new DOMDocument();
$dom->loadXML( file_get_contents($xmlfile) );
json_prepare_xml($dom);
$sxml = simplexml_load_string( $dom->saveXML() );
$json = json_decode( json_encode( $sxml ) );
by doing so, <foo bar="3">Lorem</foo> will not end up as {"foo":"Lorem"} in your JSON.
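The data loss described above can be reproduced in a few lines. This is a sketch on a made-up one-element document; it also shows one manual way to keep both pieces by reading the attribute and the text explicitly.

```php
<?php
$xml = simplexml_load_string('<root><foo bar="3">Lorem</foo></root>');

// Plain round trip: the child keeps its text, the bar attribute is dropped.
$plain = json_decode(json_encode($xml), true);
print_r($plain);

// Reading attribute and text explicitly keeps both.
$foo = $xml->foo;
$safe = [
    'foo' => [
        'bar'   => (string) $foo['bar'],
        'value' => (string) $foo,
    ],
];
echo json_encode($safe);
```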
Try to use this
$xml = ... // Xml file data
// first approach
$Json = json_encode(simplexml_load_string($xml));
---------------- OR -----------------------
// second approach
$Json = json_encode(simplexml_load_string($xml, "SimpleXMLElement", LIBXML_NOCDATA));
echo $Json;
Or
You can use this library : https://github.com/rentpost/xml2array
If your XML is a SOAP file, you can use this:
$xmlStr = preg_replace("/(<\/?)(\w+):([^>]*>)/", "$1$2$3", $xmlStr);
$xml = new SimpleXMLElement($xmlStr);
return json_encode($xml);
Best solution which works like a charm
$fileContents= file_get_contents($url);
$fileContents = str_replace(array("\n", "\r", "\t"), '', $fileContents);
$fileContents = trim(str_replace('"', "'", $fileContents));
$simpleXml = simplexml_load_string($fileContents);
//$json = json_encode($simpleXml); // Remove // if you want to store the result in $json variable
echo '<pre>'.json_encode($simpleXml,JSON_PRETTY_PRINT).'</pre>';
This solution handles namespaces, attributes, and produces consistent result with repeating elements (always in array, even if there is only one occurrence).
Inspired by ratfactor's sxiToArray().
/**
* <root><a>5</a><b>6</b><b>8</b></root> -> {"root":[{"a":["5"],"b":["6","8"]}]}
* <root a="5"><b>6</b><b>8</b></root> -> {"root":[{"a":"5","b":["6","8"]}]}
* <root xmlns:wsp="http://schemas.xmlsoap.org/ws/2004/09/policy"><a>123</a><wsp:b>456</wsp:b></root>
* -> {"root":[{"xmlns:wsp":"http://schemas.xmlsoap.org/ws/2004/09/policy","a":["123"],"wsp:b":["456"]}]}
*/
function domNodesToArray(array $tags, \DOMXPath $xpath)
{
$tagNameToArr = [];
foreach ($tags as $tag) {
$tagData = [];
$attrs = $tag->attributes ? iterator_to_array($tag->attributes) : [];
$subTags = $tag->childNodes ? iterator_to_array($tag->childNodes) : [];
foreach ($xpath->query('namespace::*', $tag) as $nsNode) {
// the only way to get xmlns:*, see https://stackoverflow.com/a/2470433/2750743
if ($tag->hasAttribute($nsNode->nodeName)) {
$attrs[] = $nsNode;
}
}
foreach ($attrs as $attr) {
$tagData[$attr->nodeName] = $attr->nodeValue;
}
if (count($subTags) === 1 && $subTags[0] instanceof \DOMText) {
$text = $subTags[0]->nodeValue;
} elseif (count($subTags) === 0) {
$text = '';
} else {
// ignore whitespace (and any other text if any) between nodes
$isNotDomText = function($node){return !($node instanceof \DOMText);};
$realNodes = array_filter($subTags, $isNotDomText);
$subTagNameToArr = domNodesToArray($realNodes, $xpath);
$tagData = array_merge($tagData, $subTagNameToArr);
$text = null;
}
if (!is_null($text)) {
if ($attrs) {
if ($text) {
$tagData['_'] = $text;
}
} else {
$tagData = $text;
}
}
$keyName = $tag->nodeName;
$tagNameToArr[$keyName][] = $tagData;
}
return $tagNameToArr;
}
function xmlToArr(string $xml)
{
$doc = new \DOMDocument();
$doc->loadXML($xml);
$xpath = new \DOMXPath($doc);
$tags = $doc->childNodes ? iterator_to_array($doc->childNodes) : [];
return domNodesToArray($tags, $xpath);
}
Example:
php > print(json_encode(xmlToArr('<root a="5"><b>6</b></root>')));
{"root":[{"a":"5","b":["6"]}]}
I've used Miles Johnson's TypeConverter for this purpose. It's installable using Composer.
You could write something like this using it:
<?php
require 'vendor/autoload.php';
use mjohnson\utility\TypeConverter;
$xml = file_get_contents("file.xml");
$arr = TypeConverter::xmlToArray($xml, TypeConverter::XML_GROUP);
echo json_encode($arr);
Optimizing Antonio Max's answer:
$xmlfile = 'yourfile.xml';
$xmlparser = xml_parser_create();
// open a file and read data
$fp = fopen($xmlfile, 'r');
//9999999 is the length which fread stops to read.
$xmldata = fread($fp, 9999999);
// converting to XML
$xml = simplexml_load_string($xmldata, "SimpleXMLElement", LIBXML_NOCDATA);
// converting to JSON
$json = json_encode($xml);
$array = json_decode($json,TRUE);
$content = str_replace(array("\n", "\r", "\t"), '', $response);
$content = trim(str_replace('"', "'", $content));
$xml = simplexml_load_string($content);
$json = json_encode($xml);
return json_decode($json,TRUE);
This worked for me
If you would like to only convert a specific part of the XML to JSON, you can use XPath to retrieve this and convert that to JSON.
<?php
$file = @file_get_contents($xml_File, FILE_TEXT);
$xml = new SimpleXMLElement($file);
$xml_Excerpt = @$xml->xpath('/states/state[@id="AL"]')[0]; // [0] gets the node
echo json_encode($xml_Excerpt);
?>
Please note that if your XPath is incorrect, this will die with an error. So if you're debugging this through AJAX calls, I recommend you log the response bodies as well.
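One way to avoid that failure mode is to check the XPath result before indexing into it. This is a sketch only; the xpathToJson() helper is hypothetical and the states XML is inlined so the snippet runs on its own.

```php
<?php
$xml = new SimpleXMLElement(
    '<states><state id="AL"><name>Alabama</name></state>'
    . '<state id="AK"><name>Alaska</name></state></states>'
);

// Hypothetical helper: JSON for the first match, or null if nothing matches.
function xpathToJson(SimpleXMLElement $xml, string $expression): ?string
{
    $matches = $xml->xpath($expression);
    if (!$matches) {
        // Covers an empty result as well as a failed evaluation.
        return null;
    }
    return json_encode($matches[0]);
}

$hit = xpathToJson($xml, '/states/state[@id="AL"]');
$miss = xpathToJson($xml, '/states/state[@id="ZZ"]');

echo $hit, "\n";
var_dump($miss);
```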
This is an improvement of the most upvoted solution by Antonio Max, which also works with XML that has namespaces (by replacing the colon with an underscore). It also has some extra options (and does parse <person my-attribute='name'>John</person> correctly).
function parse_xml_into_array($xml_string, $options = array()) {
/*
DESCRIPTION:
- parse an XML string into an array
INPUT:
- $xml_string
- $options : associative array with any of these keys:
- 'flatten_cdata' : set to true to flatten CDATA elements
- 'use_objects' : set to true to parse into objects instead of associative arrays
- 'convert_booleans' : set to true to cast string values 'true' and 'false' into booleans
OUTPUT:
- associative array
*/
// Remove namespaces by replacing ":" with "_"
if (preg_match_all("|</([\\w\\-]+):([\\w\\-]+)>|", $xml_string, $matches, PREG_SET_ORDER)) {
foreach ($matches as $match) {
$xml_string = str_replace('<'. $match[1] .':'. $match[2], '<'. $match[1] .'_'. $match[2], $xml_string);
$xml_string = str_replace('</'. $match[1] .':'. $match[2], '</'. $match[1] .'_'. $match[2], $xml_string);
}
}
$output = json_decode(json_encode(@simplexml_load_string($xml_string, 'SimpleXMLElement', (!empty($options['flatten_cdata']) ? LIBXML_NOCDATA : 0))), (!empty($options['use_objects']) ? false : true));
// Cast string values "true" and "false" to booleans
if (!empty($options['convert_booleans'])) {
$bool = function(&$item, $key) {
if (in_array($item, array('true', 'TRUE', 'True'), true)) {
$item = true;
} elseif (in_array($item, array('false', 'FALSE', 'False'), true)) {
$item = false;
}
};
array_walk_recursive($output, $bool);
}
return $output;
}
This is better solution
$fileContents= file_get_contents("https://www.feedforall.com/sample.xml");
$fileContents = str_replace(array("\n", "\r", "\t"), '', $fileContents);
$fileContents = trim(str_replace('"', "'", $fileContents));
$simpleXml = simplexml_load_string($fileContents);
$json = json_encode($simpleXml);
$array = json_decode($json,TRUE);
return $array;
I found FTav's answer the most useful, as it is very customizable, but his xml2js function has some flaws. For instance, if child elements have equal tag names, they are all stored in a single object, which means the order of elements is not preserved. In some cases we really want to preserve order, so it is better to store every element's data in a separate object:
function xml2js($xmlnode) {
$jsnode = array();
$nodename = $xmlnode->getName();
$current_object = array();
if (count($xmlnode->attributes()) > 0) {
foreach($xmlnode->attributes() as $key => $value) {
$current_object[$key] = (string)$value;
}
}
$textcontent = trim((string)$xmlnode);
if (strlen($textcontent) > 0) {
$current_object["content"] = $textcontent;
}
if (count($xmlnode->children()) > 0) {
$current_object['children'] = array();
foreach ($xmlnode->children() as $childxmlnode) {
$childname = $childxmlnode->getName();
array_push($current_object['children'], xml2js($childxmlnode, true));
}
}
$jsnode[ $nodename ] = $current_object;
return $jsnode;
}
Here is how it works. Initial xml structure:
<some-tag some-attribute="value of some attribute">
<another-tag>With text</another-tag>
<surprise></surprise>
<another-tag>The last one</another-tag>
</some-tag>
Result JSON:
{
"some-tag": {
"some-attribute": "value of some attribute",
"children": [
{
"another-tag": {
"content": "With text"
}
},
{
"surprise": []
},
{
"another-tag": {
"content": "The last one"
}
}
]
}
}
All solutions here have problems!
... when the representation needs a faithful XML interpretation (without problems with attributes) that reproduces all mixed content (text-tag-text-tag-text-...) and the order of tags. It is also worth remembering that a JSON object "is an unordered set" (keys cannot repeat and have no predefined order). Even ZF's xml2json is wrong (!) because it does not preserve the XML structure exactly.
All solutions here have problems with this simple XML,
<states x-x='1'>
<state y="123">Alabama</state>
My name is <b>John</b> Doe
<state>Alaska</state>
</states>
... @FTav's solution seems better than the 3-line solution, but it also has a small bug when tested with this XML.
Old solution is the best (for loss-less representation)
The solution, today well known as JsonML, is used by the Zorba project and others, and was first presented around 2006 or 2007 by Stephen McKamey and John Snelson (separately).
// the core algorithm is the XSLT of the "jsonML conventions"
// see https://github.com/mckamey/jsonml
$xslt = 'https://raw.githubusercontent.com/mckamey/jsonml/master/jsonml.xslt';
$dom = new DOMDocument;
// loadXML() returns false on failure; the DOMDocument object itself is always truthy
$loaded = $dom->loadXML('
<states x-x=\'1\'>
<state y="123">Alabama</state>
My name is <b>John</b> Doe
<state>Alaska</state>
</states>
');
if (!$loaded) die("\nERROR!");
$xslDoc = new DOMDocument();
$xslDoc->load($xslt);
$proc = new XSLTProcessor();
$proc->importStylesheet($xslDoc);
echo $proc->transformToXML($dom);
This produces:
["states",{"x-x":"1"},
"\n\t ",
["state",{"y":"123"},"Alabama"],
"\n\t\tMy name is ",
["b","John"],
" Doe\n\t ",
["state","Alaska"],
"\n\t"
]
See http://jsonML.org or github.com/mckamey/jsonml. The production rules of this JSON are based on the element, the JSON analog of an XML element.
This syntax is an element definition with recursion, where element-list ::= element ',' element-list | element.
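If XSLT is not available, the same recursive shape can be sketched directly on the DOM. This is only an illustration of the element / element-list recursion, not the official JsonML implementation; domToJsonML is a hypothetical helper name, and comments and processing instructions are simply skipped.

```php
<?php
// Hypothetical helper: converts a DOM node into a JsonML-style PHP array.
// An element becomes [tag-name, {attributes}?, child, child, ...];
// text and CDATA nodes become plain strings, in document order.
function domToJsonML(DOMNode $node)
{
    if ($node->nodeType === XML_TEXT_NODE || $node->nodeType === XML_CDATA_SECTION_NODE) {
        return $node->nodeValue;
    }
    $out = [$node->nodeName];
    if ($node->hasAttributes()) {
        $attrs = [];
        foreach ($node->attributes as $attr) {
            $attrs[$attr->nodeName] = $attr->nodeValue;
        }
        $out[] = $attrs; // json_encode() turns this associative array into an object
    }
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE
            || $child->nodeType === XML_TEXT_NODE
            || $child->nodeType === XML_CDATA_SECTION_NODE) {
            $out[] = domToJsonML($child); // the element-list recursion
        }
    }
    return $out;
}

$dom = new DOMDocument();
$dom->loadXML('<states x-x="1"><state y="123">Alabama</state>My name is <b>John</b> Doe<state>Alaska</state></states>');
$jsonml = domToJsonML($dom->documentElement);
echo json_encode($jsonml), PHP_EOL;
// ["states",{"x-x":"1"},["state",{"y":"123"},"Alabama"],"My name is ",["b","John"]," Doe",["state","Alaska"]]
```

The sample is written on one line so that no whitespace-only text nodes show up in the result; with an indented document they would appear as strings, as in the XSLT output above.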
After looking through all of the answers, I came up with a solution that works fine with my JavaScript functions across browsers (including consoles / Dev Tools):
<?php
// PHP Version 7.2.1 (Windows 10 x86)
function json2xml( $domNode ) {
foreach( $domNode -> childNodes as $node) {
if ( $node -> hasChildNodes() ) { json2xml( $node ); }
else {
if ( $domNode -> hasAttributes() && strlen( $domNode -> nodeValue ) ) {
$domNode -> setAttribute( "nodeValue", $node -> textContent );
$node -> nodeValue = "";
}
}
}
}
function jsonOut( $file ) {
$dom = new DOMDocument();
$dom -> loadXML( file_get_contents( $file ) );
json2xml( $dom );
header( 'Content-Type: application/json' );
// json_encode() on SimpleXML stores attributes under "@attributes"; strip the "@"
// (note that this also removes any "@" that appears inside values)
return str_replace( "@", "", json_encode( simplexml_load_string( $dom -> saveXML() ), JSON_PRETTY_PRINT ) );
}
$output = jsonOut( 'https://boxelizer.com/assets/a1e10642e9294f39/b6f30987f0b66103.xml' );
echo( $output );
/*
Or simply
echo( jsonOut( 'https://boxelizer.com/assets/a1e10642e9294f39/b6f30987f0b66103.xml' ) );
*/
?>
It basically creates a new DOMDocument, loads an XML file into it, traverses each of the nodes and children to collect the data / parameters, and exports it as JSON without the annoying "@" signs.
With the accepted (antonio's) answer, a source like this:
<MyData>
<Level1 myRel="parent" myName="AAA">
<Level2 myRel="child1" myName="BBB">
<Level2 myRel="child2" myName="CCC">
...
gives you an array like:
'Level1' =>
[
0 =>
[
'@attributes' =>
[
'myRel' => 'parent'
'myName' => 'AAA'
],
'Level2' =>
[
0 =>
[
'@attributes' =>
[
'myRel' => 'child1'
'myName' => 'BBB'
],
So, if you want a key-paired array (instead of numeric 0 indexes), keyed by an attribute of your choice, e.g. myName:
'Level1' =>
[
'AAA' =>
[
'@attributes' =>
[
'myRel' => 'parent'
'myName' => 'AAA'
],
'Level2' =>
[
'BBB' =>
[
'@attributes' =>
[
'myRel' => 'child1'
'myName' => 'BBB'
],
then use xmlToArrayByKey($xmlContent, 'myName'). Code here:
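public function xmlToArrayByKey($content, $keyName)
{
    // simplexml_load_string() does not throw; it returns false on malformed XML
    $xml = simplexml_load_string($content, "SimpleXMLElement", LIBXML_NOCDATA);
    if ($xml === false) {
        return ['xmlerror' => 'Could not parse XML'];
    }
    $array = json_decode(json_encode($xml), true);
    return $this->xmlSetChild($array, $keyName);
}
public function xmlSetChild($array, $keyName, $step = 0)
{
    $new_array = [];
    foreach ($array as $key_1 => $value_1) {
        // a numerically indexed list means the element was repeated
        if (is_array($value_1) && isset($value_1[0])) {
            foreach ($value_1 as $idx => $value_2) {
                // json_encode() on SimpleXML stores attributes under "@attributes";
                // fall back to the numeric index when the key attribute is missing
                $keyValue = (is_array($value_2) && isset($value_2['@attributes'][$keyName]))
                    ? $value_2['@attributes'][$keyName]
                    : $idx;
                $new_array[$key_1][$keyValue] = is_array($value_2)
                    ? $this->xmlSetChild($value_2, $keyName, $step + 1)
                    : $value_2;
            }
        } else {
            $new_array[$key_1] = $value_1;
        }
    }
    return $new_array;
}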
//main function ===========================
function xml2array($responce)
{
$doc = new DOMDocument();
$doc->loadXML($responce);
$root = $doc->documentElement;
$output = domNodeToArray($root);
$output['#root'] = $root->tagName;
return $output;
}
//convert function =====================
function domNodeToArray($node)
{
$output = [];
switch ($node->nodeType) {
case XML_CDATA_SECTION_NODE:
case XML_TEXT_NODE:
$output = trim($node->textContent);
break;
case XML_ELEMENT_NODE:
for ($i = 0, $m = $node->childNodes->length; $i < $m; $i++) {
$child = $node->childNodes->item($i);
$v = domNodeToArray($child);
if (isset($child->tagName)) {
$t = $child->tagName;
if (!isset($output[$t])) {
$output[$t] = [];
}
$output[$t][] = $v;
} elseif ($v || $v === '0') {
$output = (string) $v;
}
}
if ($node->attributes->length && !is_array($output)) { // Has attributes but isn't an array
$output = ['#content' => $output]; // Change output into an array.
}
if (is_array($output)) {
if ($node->attributes->length) {
$a = [];
foreach ($node->attributes as $attrName => $attrNode) {
$a[$attrName] = (string) $attrNode->value;
}
$output['#attributes'] = $a;
}
foreach ($output as $t => $v) {
if (is_array($v) && count($v) == 1 && $t != '#attributes') {
$output[$t] = $v[0];
}
}
}
break;
}
return $output;
}
//REQUEST BY SOAP CLIENT==========================================================
$sopeclient = new SoapClient('http://b2b.travel.us/FlightBooking.asmx?wsdl');
$param = array('InputSTR'=>'
<AirSearchQuery>
<Master>
<CompanyId>*****</CompanyId>
<AgentId>1</AgentId>
<BranchId>1</BranchId>
<CoustmerType>AGNT</CoustmerType>
</Master>
<JourneyType>O</JourneyType>
<Currency>USD</Currency>
<Segments>
<Segment id="1">
<Origin>'.$request->depart.'</Origin>
<Destination>'.$request->destination.'</Destination>
<Date>'.$request->departOn.'</Date>
<Time></Time>
</Segment>
</Segments>
</AirSearchQuery>'
);
$responce = $sopeclient->SearchFare($param);
//RESPONSE RECEIVED ======================================================
+"SearchFareResult": "<Itineraries><Itinerary><UniqueID>SK12041915MS7601445MS8750805</UniqueID><TrackID>AAL_LOS_24-02-2022_100_697074770_637812140580760438</TrackID><BaseFare>301.00</BaseFare><Taxes>224.90</Taxes><TotalPrice>525.90</TotalPrice><GrandTotal /><Currency>USD</Currency><FareType>RP</FareType><Adult><NoAdult>1</NoAdult><AdTax>224.9</AdTax><AdtBFare>301.00</AdtBFare></Adult><IndexNumber>0</IndexNumber><Provider>2A</Provider><ValCarrier>MS</ValCarrier><LastTicketingDate /><OutBound>3</OutBound><InBound>0</InBound><Sectors><Sector nearby="" isConnect="" isStopover=""><AirV>SK</AirV><AirlineName>Scandinavian Airlines</AirlineName><AirlineLogoPath>http://www.travelcation.us/AirlineLogo/SKs.gif</AirlineLogoPath><Class>U</Class><CabinClass><Code>Y</Code><Des>ECONOMY</Des></CabinClass><NoSeats>9</NoSeats><FltNum>1204</FltNum><Departure><AirpCode>AAL</AirpCode><Terminal /><Date>24-02-2022</Date><Time>19:15</Time><AirpName>Aalborg</AirpName><CityCode>AAL</CityCode><CityName>Aalborg</CityName><CountryCode>DK</CountryCode><CountryName>Denmark</CountryName><Day>Thu</Day><GEO_Code /></Departure><Arrival><AirpCode>CPH</AirpCode><Terminal>3</Terminal><Date>24-02-2022</Date><Time>20:00</Time><AirpName>Kastrup</AirpName><CityCode>CPH</CityCode><CityName>Copenhagen</CityName><CountryCode>DK</CountryCode><CountryName>Denmark</CountryName><Day>Thu</Day><GEO_Code /></Arrival><EquipType>Canadair Regional Jet CR9</EquipType><ElapsedTime>00:45</ElapsedTime><ActualTime>42:00</ActualTime><TechStopOver>0</TechStopOver><Status>OK</Status><isReturn>false</isReturn><OptrCarrier OptrCarrierDes="Cityjet">WX</OptrCarrier><MrktCarrier MrktCarrierDes="Scandinavian Airlines">SK</MrktCarrier><BaggageInfo>2 pcs</BaggageInfo><TransitTime time="00:00" /></Sector><Sector nearby="" isConnect="" 
isStopover=""><AirV>MS</AirV><AirlineName>EgyptAir</AirlineName><AirlineLogoPath>http://www.travelcation.us/AirlineLogo/MSs.gif</AirlineLogoPath><Class>V</Class><CabinClass><Code>Y</Code><Des>ECONOMY</Des></CabinClass><NoSeats>9</NoSeats><FltNum>760</FltNum><Departure><AirpCode>CPH</AirpCode><Terminal>3</Terminal><Date>25-02-2022</Date><Time>14:45</Time><AirpName>Kastrup</AirpName><CityCode>CPH</CityCode><CityName>Copenhagen</CityName><CountryCode>DK</CountryCode><CountryName>Denmark</CountryName><Day>Fri</Day><GEO_Code /></Departure><Arrival><AirpCode>CAI</AirpCode><Terminal>3</Terminal><Date>25-02-2022</Date><Time>20:05</Time><AirpName>Cairo Intl.</AirpName><CityCode>CAI</CityCode><CityName>Cairo</CityName><CountryCode>EG</CountryCode><CountryName>Egypt</CountryName><Day>Fri</Day><GEO_Code /></Arrival><EquipType>Boeing 738</EquipType><ElapsedTime>05:20</ElapsedTime><ActualTime>42:00</ActualTime><TechStopOver>0</TechStopOver><Status>OK</Status><isReturn>false</isReturn><OptrCarrier OptrCarrierDes="EgyptAir">MS</OptrCarrier><MrktCarrier MrktCarrierDes="EgyptAir">MS</MrktCarrier><BaggageInfo>2 pcs</BaggageInfo><TransitTime time="18:45">Connection of 18 Hours 45 Mins in Kastrup, Copenhagen, Denmark</TransitTime></Sector><Sector nearby="" isConnect="" isStopover=""><AirV>MS</AirV><AirlineName>EgyptAir</AirlineName><AirlineLogoPath>http://www.travelcation.us/AirlineLogo/MSs.gif</AirlineLogoPath><Class>L</Class><CabinClass><Code>Y</Code><Des>ECONOMY</Des></CabinClass><NoSeats>5</NoSeats><FltNum>875</FltNum><Departure><AirpCode>CAI</AirpCode><Terminal>3</Terminal><Date>26-02-2022</Date><Time>08:05</Time><AirpName>Cairo Intl.</AirpName><CityCode>CAI</CityCode><CityName>Cairo</CityName><CountryCode>EG</CountryCode><CountryName>Egypt</CountryName><Day>Sat</Day><GEO_Code /></Departure><Arrival><AirpCode>LOS</AirpCode><Terminal>D</Terminal><Date>26-02-2022</Date><Time>13:15</Time><AirpName>Murtala 
Muhammed</AirpName><CityCode>LOS</CityCode><CityName>Lagos</CityName><CountryCode>NG</CountryCode><CountryName>Nigeria</CountryName><Day>Sat</Day><GEO_Code /></Arrival><EquipType>Boeing 738</EquipType><ElapsedTime>05:10</ElapsedTime><ActualTime>42:00</ActualTime><TechStopOver>0</TechStopOver><Status>OK</Status><isReturn>false</isReturn><OptrCarrier OptrCarrierDes="EgyptAir">MS</OptrCarrier><MrktCarrier MrktCarrierDes="EgyptAir">MS</MrktCarrier><BaggageInfo>2 pcs</BaggageInfo><TransitTime time="12:00">Connection of 12 Hours 0 Mins in Cairo Intl., Cairo, Egypt</TransitTime></Sector></Sectors><FareBasisCodes><FareBasiCode><FareBasis>VOFLOWMS</FareBasis><Airline>MS</Airline><PaxType>ADT</PaxType><Origin /><Destination /><FareRst /></FareBasiCode>
//call method===========================================
$xml2json = xml2array($responce->SearchFareResult);
print_r($xml2json);
die;
//view result ====================================================
array:3 [▼
"Itinerary" => array:63 [▼
0 => array:17 [▼
"UniqueID" => "SK12041915MS7601445MS8750805"
"TrackID" => "AAL_LOS_24-02-2022_100_946417400_637812150487718359"
"BaseFare" => "301.00"
"Taxes" => "224.90"
"TotalPrice" => "525.90"
"GrandTotal" => []
"Currency" => "USD"
"FareType" => "RP"
"Adult" => array:3 [▼
"NoAdult" => "1"
"AdTax" => "224.9"
"AdtBFare" => "301.00"
]
"IndexNumber" => "0"
"Provider" => "2A"
"ValCarrier" => "MS"
"LastTicketingDate" => []
"OutBound" => "3"
"InBound" => "0"
"Sectors" => array:1 [▼
"Sector" => array:3 [▶]
]
"FareBasisCodes" => array:1 [▶]
Looks like the $state->name variable is holding an array. You can use
var_dump($state)
inside the foreach to test that.
If that's the case, you can change the line inside the foreach to
$states[]= array('state' => array_shift($state->name));
to correct it.
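If $state comes from SimpleXML (an assumption, since the answer does not say), $state->name is a SimpleXMLElement rather than a plain array, and casting it to a string is usually the simpler fix. A minimal sketch with made-up sample data:

```php
<?php
// Hypothetical sample data; the element names mirror the snippet above.
$xml = simplexml_load_string(
    '<states><state><name>Alabama</name></state><state><name>Alaska</name></state></states>'
);

$states = [];
foreach ($xml->state as $state) {
    // $state->name is a SimpleXMLElement; the cast extracts its text content.
    $states[] = ['state' => (string) $state->name];
}
print_r($states);
```

The cast leaves the XML object untouched and gives plain strings that are safe to pass to json_encode() or to store.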
$templateData = $_POST['data'];
// initializing or creating array
$template_info = $templateData;
// creating object of SimpleXMLElement
$xml_template_info = new SimpleXMLElement("<?xml version=\"1.0\"?><template></template>");
// function call to convert array to xml
array_to_xml($template_info,$xml_template_info);
//saving generated xml file
$xml_template_info->asXML(dirname(__FILE__)."/manifest.xml") ;
// function definition to convert array to xml
function array_to_xml($template_info, &$xml_template_info) {
    foreach($template_info as $key => $value) {
        if(is_array($value)) {
            if(!is_numeric($key)){
                // count numeric keys: they mean that $key is a repeated element
                $cont = 0;
                foreach(array_keys($value) as $k){
                    if(is_numeric($k)) $cont++;
                }
                if($cont>0){
                    // one <$key> node per list entry
                    for($i=0; $i < $cont; $i++){
                        $subnode = $xml_template_info->addChild($key);
                        array_to_xml($value[$i], $subnode);
                    }
                }else{
                    $subnode = $xml_template_info->addChild($key);
                    array_to_xml($value, $subnode);
                }
            }
            else{
                // numeric keys carry no tag name, so add the children to the current node
                array_to_xml($value, $xml_template_info);
            }
        }
        else {
            // addChild() does not escape "&", so escape the value first
            $xml_template_info->addChild($key, htmlspecialchars((string) $value));
        }
    }
}
If you are an Ubuntu user, install the PHP XML extension (I have PHP 5.6; if you have another version, find the matching package and install it):
sudo apt-get install php5.6-xml
service apache2 restart
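After the restart, it may be worth confirming that the XML support is actually available before running the conversion. The sketch below checks for the function and class used in this thread instead of relying on extension names; the variable names and messages are mine.

```php
<?php
// Check for the pieces the conversion code depends on.
$hasSimpleXml = function_exists('simplexml_load_string');
$hasDom = class_exists('DOMDocument');

if (!$hasSimpleXml || !$hasDom) {
    echo "XML support is missing; install the XML package for your PHP version.", PHP_EOL;
} else {
    echo "SimpleXML and DOM are available.", PHP_EOL;
}
```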
$fileContents = file_get_contents('myDirPath/filename.xml');
$fileContents = str_replace(array("\n", "\r", "\t"), '', $fileContents);
$fileContents = trim(str_replace('"', "'", $fileContents));
$oldXml = $fileContents;
$simpleXml = simplexml_load_string($fileContents);
$json = json_encode($simpleXml);
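The snippet above stops at the JSON string and does not check whether parsing succeeded. A possible continuation, assuming you want a PHP array back and a quiet failure on malformed input, could look like this; xmlStringToArray is a hypothetical helper name, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the XML as an array, or null if it is malformed.
function xmlStringToArray(string $xmlString): ?array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $simpleXml = simplexml_load_string($xmlString);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($simpleXml === false) {
        return null;
    }
    // Round trip through JSON to get nested arrays instead of objects.
    return json_decode(json_encode($simpleXml), true);
}

$array = xmlStringToArray('<root><item>1</item><item>2</item></root>');
print_r($array); // ['item' => ['1', '2']]

$bad = xmlStringToArray('<root><item>');
var_dump($bad); // NULL
```

Keep in mind that this inherits the limits of the json_encode() approach: mixed content and attributes on text-only elements are not preserved.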
