How to convert XML attributes to text nodes - php

I have a PHP script that pulls an XML file from a remote server, and converts it to JSON using simplexml_load_string and json_encode. However, the simplexml_load_string seems to ignore inline attributes, like so:
<AxisFeedrate dataItemId="iid7" timestamp="2012-03-21T15:15:41-04:00" sequence="7" name="Yfrt" subType="ACTUAL" units="MILLIMETER/SECOND">UNAVAILABLE</AxisFeedrate>
In this case the JSON representation would be {AxisFeedrate: 'UNAVAILABLE'}
However, I need to have those attributes available. One idea I've been approaching is replacing strings to turn the attributes into text nodes like so:
<AxisFeedrate>
<dataItemId>iid7</dataItemId>
<timestamp>2012-03-21T15:15:41-04:00</timestamp>
<sequence>7</sequence>
<name>Yfrt</name>
<subType>ACTUAL</subType>
<units>MILLIMETER/SECOND</units>
<value>UNAVAILABLE</value>
</AxisFeedrate>
I can turn the attributes into their own tag elements with regular find/replace, but I'm having trouble wrapping the original text value in a Value tag, at least with find/replace.
What are some good approaches for doing this? The above chunk of XML is in the middle of many similar chunks on different data items, so I couldn't just start by replacing the first closing > with >...

You could use SimpleXML itself to read the attributes.
Example:
<?php
$xml=simplexml_load_string('<AxisFeedrate dataItemId="iid7" timestamp="2012-03-21T15:15:41-04:00" sequence="7" name="Yfrt" subType="ACTUAL" units="MILLIMETER/SECOND">UNAVAILABLE</AxisFeedrate>');
foreach($xml->attributes() as $k=>$v) {
echo $k." -> ".(string)$v."\n";
}
?>
Output:
dataItemId -> iid7
timestamp -> 2012-03-21T15:15:41-04:00
sequence -> 7
name -> Yfrt
subType -> ACTUAL
units -> MILLIMETER/SECOND

Try this regex: ([\w]*?)="(.*?)" with this replace <$1>$2</$1>\n
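As an alternative to string replacement, the same transformation can be sketched on the parsed document with DOM. This is a minimal sketch, not a complete solution: the function name attributesToChildren is made up for illustration, the sample element is shortened, and the code assumes leaf elements (text only, no child elements) with attributes that have no namespace prefix.

```php
<?php
// Hypothetical helper: turns every attribute of $element into a child element
// and wraps the element's own text in a <value> child.
// Assumes $element is a leaf element with non-namespaced attributes.
function attributesToChildren(DOMElement $element): void
{
    $document = $element->ownerDocument;

    // Remember the text, then remove the existing child nodes.
    $text = $element->textContent;
    while ($element->firstChild !== null) {
        $element->removeChild($element->firstChild);
    }

    // Copy the attributes into an array first, because removing
    // attributes changes the live attribute map while iterating.
    foreach (iterator_to_array($element->attributes, false) as $attribute) {
        $child = $document->createElement($attribute->nodeName);
        $child->appendChild($document->createTextNode($attribute->nodeValue));
        $element->appendChild($child);
        $element->removeAttributeNode($attribute);
    }

    // Wrap the original text value in a <value> element.
    $value = $document->createElement('value');
    $value->appendChild($document->createTextNode($text));
    $element->appendChild($value);
}

$document = new DOMDocument();
$document->loadXML(
    '<AxisFeedrate dataItemId="iid7" name="Yfrt" units="MILLIMETER/SECOND">UNAVAILABLE</AxisFeedrate>'
);
attributesToChildren($document->documentElement);

$result = $document->saveXML($document->documentElement);
echo $result, "\n";
```

For a whole feed, the helper could be called for each element returned by an XPath query such as //*[@*], again assuming those elements are leaves.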

You should use SimpleXML. Be aware though, that you have to cast values to string type explicitly, or you'll get objects.
$xml_string = <<<XML
<AxisFeedrate
dataItemId="iid7"
timestamp="2012-03-21T15:15:41-04:00"
sequence="7"
name="Yfrt"
subType="ACTUAL"
units="MILLIMETER/SECOND"
>UNAVAILABLE</AxisFeedrate>
XML;
$xml = simplexml_load_string($xml_string);
$axis_info = array('value' => (string)$xml);
foreach($xml -> attributes() as $attr => $val) {
$axis_info[$attr] = (string) $val;
}
echo json_encode(array("AxisFeedrate" => $axis_info));
Update:
This will give you a more generic version, but notice that the attributes are cast as an array and that this only works on a single element:
$xml_string = <<<XML
<AxisFeedrate dataItemId="iid7" timestamp="2012-03-21T15:15:41-04:00" sequence="7" name="Yfrt" subType="ACTUAL" units="MILLIMETER/SECOND">UNAVAILABLE</AxisFeedrate>
XML;
$xml = simplexml_load_string($xml_string);
$obj_name = $xml -> getName();
$attributes = (array) $xml->attributes();
$axis_info[$obj_name] = $attributes["@attributes"];
$axis_info[$obj_name]['value'] = (string) $xml;
echo json_encode($axis_info);
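Since the snippets above handle a single element, here is a rough sketch of how the idea might extend to a document with many data items. The function name elementToArray and the Samples wrapper element are assumptions made for this illustration, and the 'value' key would collide with an attribute or child element of the same name.

```php
<?php
// Hypothetical helper: converts an element, its attributes, and its children
// into a nested array that json_encode() can serialize.
function elementToArray(SimpleXMLElement $element): array
{
    $data = [];

    // Attributes become plain string entries.
    foreach ($element->attributes() as $name => $value) {
        $data[$name] = (string) $value;
    }

    // Children are collected in lists so repeated names are not overwritten.
    foreach ($element->children() as $child) {
        $data[$child->getName()][] = elementToArray($child);
    }

    // The element's own text, if any, is stored under 'value'.
    $text = trim((string) $element);
    if ($text !== '') {
        $data['value'] = $text;
    }

    return $data;
}

// The <Samples> wrapper is invented for this example.
$xml = simplexml_load_string(
    '<Samples>'
    . '<AxisFeedrate dataItemId="iid7" name="Yfrt">UNAVAILABLE</AxisFeedrate>'
    . '<AxisFeedrate dataItemId="iid8" name="Xfrt">12.5</AxisFeedrate>'
    . '</Samples>'
);

$json = json_encode([$xml->getName() => elementToArray($xml)]);
echo $json, "\n";
```

Storing children as lists keeps the JSON shape the same whether an element occurs once or many times, at the cost of an extra array level.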

Related

Replace element value in DOM

I want to save changed DOM element values back to an existing XML file. I found a replace function, but it is in JavaScript and I need one in PHP.
I tried the save and saveXML functions, but they didn't work. My XML has tags with a colon in the name, such as "iaiext:auction_title". Fetching them with getElementsByTagName works, and cutting the title to 50 characters works too, but how can I replace the old title with the new one if I don't use a path as with simplexml_load_file? How do I express this path in my script?
$dom = new DOMDocument;
$dom->load('p.xml');
$i = 0;
$tytuly = $dom->getElementsByTagName('auction_title');
foreach ($tytuly as $tytul){
$title = $tytul->nodeValue;
$end_title = doTitleCut($title);
//echo "<pre>";
//echo($end_title);
//echo "<pre>";
$i = $i+1;
}
In your loop, you can update a particular node's value the same way you fetch it: with nodeValue. So in your loop, just update it each time...
$tytul->nodeValue = doTitleCut($title);
Then after your loop, you can just echo the new XML out using
echo $dom->saveXML();
or save it using
$dom->save("3.xml");
It is the same basic API in PHP, although browsers implement more, or different, parts of it. The API has gone through five revisions (DOM Level 1 to 4 and DOM LS). DOM 3 added a property to read/write the text content of a node: https://www.w3.org/TR/DOM-Level-3-Core/core.html#Node3-textContent
The following example prefixes the titles:
$xml = <<<'XML'
<auctions>
<auction_title>World!</auction_title>
<auction_title>World &amp; Universe!</auction_title>
</auctions>
XML;
$document = new DOMDocument();
$document->loadXML($xml);
$titleNodes = $document->getElementsByTagName('auction_title');
foreach ($titleNodes as $titleNode) {
$title = $titleNode->textContent;
$titleNode->textContent = 'Hello '.$title;
}
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<auctions>
<auction_title>Hello World!</auction_title>
<auction_title>Hello World &amp; Universe!</auction_title>
</auctions>
PHP's DOMNode::$nodeValue implementation does not match the W3C API definition. It behaves the same as DOMNode::$textContent for reads and does not fully escape on write.
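Applied to the original question, shortening each title in place could look roughly like the sketch below. The cutTitle function is a hypothetical stand-in for the asker's doTitleCut, the sample XML is invented, and mb_substr assumes the mbstring extension is available.

```php
<?php
// Hypothetical stand-in for the doTitleCut() function from the question:
// keeps at most $limit characters of the title.
function cutTitle(string $title, int $limit = 50): string
{
    return mb_substr($title, 0, $limit);
}

// Invented sample document with one overly long title.
$document = new DOMDocument();
$document->loadXML(
    '<auctions><auction_title>Fish &amp; chips, served with a very long '
    . 'description of the side dishes</auction_title></auctions>'
);

foreach ($document->getElementsByTagName('auction_title') as $titleNode) {
    // Reading textContent decodes entities; writing it escapes them again.
    $titleNode->textContent = cutTitle($titleNode->textContent);
}

$result = $document->saveXML();
echo $result;
// To write the result to a file instead: $document->save('3.xml');
```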

How to echo information from xml in PHP

I have some problems echoing a line from an XML file.
How can I do it properly?
I tried:
$test = file_get_contents('');
$test = iconv('WINDOWS-1251', 'UTF-8', $test);
$test = "<xmp>".$test."</xmp>";
I then tried to find the value with preg_match_all, but it doesn't work.
preg_match_all('/<ya:created dc:date="\d+\-\d+\-\d+\T\d+\:\d+\:\d+/', $test, $output_array);
It works on https://www.phpliveregex.com/ but not on my site.
https://www.phpliveregex.com/p/qCH
My XML:
<?xml version="1.0" encoding="WINDOWS-1251"?>
<rdf:RDF
xml:lang="ru"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:ya="http://blogs.yandex.ru/schema/foaf/"
xmlns:img="http://blogs.yandex.ru/schema/foaf/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<foaf:Person>
<ya:publicAccess>allowed</ya:publicAccess>
<foaf:gender>male</foaf:gender>
<ya:created dc:date="2011-01-30T16:43:45+03:00"/>
<ya:lastLoggedIn dc:date="2019-01-16T18:54:55+03:00"/>
<ya:modified dc:date="2019-01-13T21:15:43+03:00"/>
</foaf:Person>
</rdf:RDF>
You would be better off accessing it with something like SimpleXML and XPath. There are at least two ways of doing it. Both rely on XPath, and you have to use the namespaces (the ya: bit) to ensure you get the right element. As XPath returns a list of matches, I just use [0] to get the first one; if there are several, you can use a loop...
$test = file_get_contents("data.xml");
$xml = simplexml_load_string($test);
// Version 1
// Fetch the ya:created element
$created = $xml->xpath("//ya:created")[0];
// Extract the attributes and print the date
echo $created[0]->attributes("http://purl.org/dc/elements/1.1/")->date;
// Version 2
// Extract the dc:date attribute (using the @)
$createdDate = $xml->xpath("//ya:created/@dc:date")[0];
echo $createdDate;
Forgot to say - if you want to use these fields for a database etc. you may need to cast them to a string to make sure they are converted...
$date = (string)$createdDate;
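If there are several dated elements, the loop mentioned above might look like this sketch. It uses a shortened copy of the XML from the question, and the prefixes y and d are arbitrary names bound here with registerXPathNamespace, so the query does not depend on the prefixes used in the document.

```php
<?php
// Shortened copy of the question's XML, used only for illustration.
$xmlString = <<<'XML'
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:ya="http://blogs.yandex.ru/schema/foaf/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <foaf:Person>
    <ya:created dc:date="2011-01-30T16:43:45+03:00"/>
    <ya:modified dc:date="2019-01-13T21:15:43+03:00"/>
  </foaf:Person>
</rdf:RDF>
XML;

$xml = simplexml_load_string($xmlString);

// Bind our own prefixes to the namespace URIs.
$xml->registerXPathNamespace('y', 'http://blogs.yandex.ru/schema/foaf/');
$xml->registerXPathNamespace('d', 'http://purl.org/dc/elements/1.1/');

// Collect the date attribute of every element in the ya namespace that has one.
$dates = [];
foreach ($xml->xpath('//y:*[@d:date]') as $element) {
    $attributes = $element->attributes('http://purl.org/dc/elements/1.1/');
    $dates[$element->getName()] = (string) $attributes->date;
}

print_r($dates);
```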

PHP: Keeping HTML inside XML node without CDATA

I've got an xml like this:
<father>
<son>Text with <b>HTML</b>.</son>
</father>
I'm using simplexml_load_string to parse it into SimpleXmlElement. Then I get my node like this
$xml->father->son->__toString(); //output: "Text with .", but expected "Text with <b>HTML</b>."
I need to handle simple HTML such as:
<b>text</b> or <br/> inside the xml which is sent by many users.
My problem is that I can't just ask them to use CDATA because they won't be able to handle it properly, and they are already used to doing without it.
Also, if possible, I don't want the file to be edited, because the information needs to be exactly what the user sent.
The function simplexml_load_string simply erases the HTML node and anything inside it.
How can I keep the information ?
SOLUTION
To handle the problem I used asXml as explained by @ThW:
$tmp = $xml->father->son->asXml(); //<son>Text with <b>HTML</b>.</son>
I just added a preg_match to erase the node.
A CDATA section is a character node, just like a text node. But it does less encoding/decoding. This is mostly a downside, actually. On the upside something in a CDATA section might be more readable for a human and it allows for some BC in special cases. (Think HTML script tags.)
For an XML API they are nearly the same. Here is a small DOM example (SimpleXML abstracts too much).
$document = new DOMDocument();
$father = $document->appendChild(
$document->createElement('father')
);
$son = $father->appendChild(
$document->createElement('son')
);
$son->appendChild(
$document->createTextNode('With <b>HTML</b><br>It\'s so nice.')
);
$son = $father->appendChild(
$document->createElement('son')
);
$son->appendChild(
$document->createCDataSection('With <b>HTML</b><br>It\'s so nice.')
);
$document->formatOutput = TRUE;
echo $document->saveXml();
Output:
<?xml version="1.0"?>
<father>
<son>With &lt;b&gt;HTML&lt;/b&gt;&lt;br&gt;It's so nice.</son>
<son><![CDATA[With <b>HTML</b><br>It's so nice.]]></son>
</father>
As you can see they are serialized very differently - but from the API view they are basically exchangeable. If you're using an XML parser the value you get back should be the same in both cases.
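The claim that a parser hands back the same value in both cases can be checked with a short sketch like the following, which loads an entity-escaped text node and a CDATA section with the same content and compares what DOM returns for each. The sample strings are shortened for illustration.

```php
<?php
$document = new DOMDocument();
$document->loadXML(
    '<father>'
    . '<son>With &lt;b&gt;HTML&lt;/b&gt;</son>'   // escaped text node
    . '<son><![CDATA[With <b>HTML</b>]]></son>'   // CDATA section
    . '</father>'
);

$sons = $document->getElementsByTagName('son');
$escaped = $sons->item(0)->textContent;
$cdata = $sons->item(1)->textContent;

// Both reads yield the decoded string, markup characters included.
var_dump($escaped === $cdata);
echo $escaped, "\n";
```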
So the first possibility is just letting the HTML fragment be stored in a character node. It is just a string value for the outer XML document itself.
The other way would be using XHTML. XHTML is XML-compatible HTML. You can mix and match different XML formats, so you could add the XHTML fragment as part of the outer XML.
That seems to be what you're receiving. But SimpleXML has some problems with mixed nodes. So here is an example how you can read it in DOM.
$xml = <<<'XML'
<father>
<son>With <b>HTML</b><br/>It's so nice.</son>
</father>
XML;
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$result = '';
foreach ($xpath->evaluate('/father/son[1]/node()') as $child) {
$result .= $document->saveXml($child);
}
echo $result;
Output:
With <b>HTML</b><br/>It's so nice.
Basically you need to save each child of the son element as XML.
SimpleXML is based on the same DOM library internally. That allows you to convert a SimpleXMLElement into a DOM node. From there you can again save each child as XML.
$father = new SimpleXMLElement($xml);
$sonNode = dom_import_simplexml($father->son);
$document = $sonNode->ownerDocument;
$result = '';
foreach ($sonNode->childNodes as $child) {
$result .= $document->saveXml($child);
}
echo $result;

Modify xml nodes using DOM or SIMPLE XML?

I have source XML here: http://www.grilykrby.cz/rss/pf-heureka.xml. I want to use this xml feed and create another modified on my own server. I would like to change every node CATEGORYTEXT which contains word Prislusenstvi. I just tried something but I got only the listing of all categories without changing XML :-(
Here is the example of my code. The row $kategorie="nejaka kategorie"; doesn't work.
<?php
$file = "http://www.grilykrby.cz/rss/pf-heureka.xml";
$xml=simplexml_load_file($file);
foreach ($xml->xpath('//SHOPITEM/CATEGORYTEXT') as $kategorie) {
echo $kategorie."<br />";
$kategorie="nejaka kategorie";
}
file_put_contents('test.xml', $xml->asXML());
?>
$kategorie is just a temp variable used in the loop which contains a copy of the data returned by xpath query. You would need to actually set the value directly in the $xml object.
I would personally also consider doing a str_replace or preg_replace within the XML content itself before parsing it into a simpleXML object.
Final Accepted Answer
<?php
$xml = simplexml_load_file('http://www.grilykrby.cz/rss/pf-heureka.xml');
$i=0;
foreach($xml -> SHOPITEM as $polozka) {
if ($polozka -> CATEGORYTEXT == "Příslušenství") $xml -> SHOPITEM[$i] -> CATEGORYTEXT = "Some other text";
$i++;
}
?>
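For the original requirement (changing every CATEGORYTEXT that contains a given word and then saving the result), a sketch along these lines may help. The SHOP root element and the sample items are assumptions, since the real feed is not shown here, and the search word is written without diacritics as in the question.

```php
<?php
// Invented sample feed; the structure of the real feed is assumed.
$xml = simplexml_load_string(
    '<SHOP>'
    . '<SHOPITEM><CATEGORYTEXT>Grily | Prislusenstvi</CATEGORYTEXT></SHOPITEM>'
    . '<SHOPITEM><CATEGORYTEXT>Krby</CATEGORYTEXT></SHOPITEM>'
    . '</SHOP>'
);

foreach ($xml->xpath('//SHOPITEM/CATEGORYTEXT') as $category) {
    if (strpos((string) $category, 'Prislusenstvi') !== false) {
        // Assigning to index 0 changes the element's text inside the document;
        // assigning to $category itself would only rebind the loop variable.
        $category[0] = 'nejaka kategorie';
    }
}

$result = $xml->asXML();
echo $result;
// To write the modified feed to disk instead: $xml->asXML('test.xml');
```

Matching with strpos rather than == also catches categories where the word is only part of a longer path.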

How to get a specific node text using php DOM

I am trying to get the value (text) of a specific node from an xml document using php DOM classes but I cannot do it right because I get the text content of that node merged with its descendants.
Let's suppose that I need to get the trees from this document:
<?xml version="1.0"?>
<trees>
LarchRedwoodChestnutBirch
<trimmed>Larch</trimmed>
<trimmed>Redwood</trimmed>
</trees>
And I get:
LarchRedwoodChestnutBirchLarchRedwood
You can see that I cannot remove the substring LarchRedwood made by the trimmed trees from the whole text because I would get only ChestnutBirch and it is not what I need.
Any suggestions? (Thanks)
I got it. This works:
function specificNodeValue($node, $implode = true) {
$value = array();
if ($node->childNodes) {
for ($i = 0; $i < $node->childNodes->length; $i++) {
if (!(@$node->childNodes->item($i)->tagName)) {
$value[] = $node->childNodes->item($i)->nodeValue;
}
}
}
return (is_string($implode) ? implode($implode, $value) : ($implode === true ? implode($value) : $value));
}
A given node is like a root, if you get no tagName when you parse its child nodes then it is itself, so the value of that child node it is its own value.
In a badly formed XML document a node could have its value split into many pieces, so put them all into an array to get the whole value of the node.
Use the function above to get needed node value without subnode values merged within.
Parameters are:
$node (required) must be a DOMElement object
$implode (optional) if you want to get a string (true by default) or an array (false) made up by many pieces of value. (Set a string instead of a boolean value if you wish to implode the array using a "glue" string).
You can try this to remove the trimmed node
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
$trees = $doc->getElementsByTagName('trees')->item(0);
foreach ($xpath->query('/trees/*') as $node)
{
$trees->removeChild($node);
}
echo $trees->textContent;
echo $trees->nodeValue;
Note that in PHP, $node->nodeValue and $node->textContent both return the text of an element together with the text of all its descendants. To get only the element's own text, read its direct text-node children.
Ideally, the XML should be:
<?xml version="1.0"?>
<trees>
<tree>Larch</tree>
<tree>Redwood</tree>
<tree>Chestnut</tree>
<tree>Birch</tree>
</trees>
To split "LarchRedwoodChestnutBirch" into separate words (by capital letter), you'll need to use PHP's "PCRE" functions:
http://www.php.net/manual/en/book.pcre.php
Hope that helps!
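Another way to read only the element's own text, without removing the trimmed children first, is to select its direct text nodes with XPath. The sketch below uses the XML from the question and then splits the result at capital letters with preg_split; the variable names are made up for this example.

```php
<?php
$xml = <<<'XML'
<trees>
LarchRedwoodChestnutBirch
<trimmed>Larch</trimmed>
<trimmed>Redwood</trimmed>
</trees>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$ownText = '';
// text() selects only the text nodes that are direct children of <trees>;
// trim() drops the whitespace between the child elements.
foreach ($xpath->query('/trees/text()') as $textNode) {
    $ownText .= trim($textNode->nodeValue);
}

// Split before every capital letter to get the individual names.
$trees = preg_split('/(?=[A-Z])/', $ownText, -1, PREG_SPLIT_NO_EMPTY);

echo $ownText, "\n";
print_r($trees);
```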
