PHP parsing xml xpath - php

XML
<person>
<description>
<p>blah blah blah</p>
<p>kjdsfksdjf</p>
</description>
</person>
<person>
<description>
k kjsdf kk sak kfsdjk sadk
</description>
</person>
I'd like to parse the description so that it returns the html tags that are inside.
I've tried both of these, without success
$description = ereg_replace('<description>|</description>','',$person->description->asXML());
$description = $person->description;
Any suggestions?
EDIT
What I'm trying to accomplish is to import an XML file into a MySQL database. Everything is working except what is mentioned above: the paragraph tags inside the description aren't showing up, and they need to be there. The MySQL field "description" is set as a text field. If I parse the XML for output in the browser, then $description = ereg_replace('<description>|</description>','',$person->description->asXML()); works fine, but this isn't true when I'm trying to import into MySQL. Do I need to add something to the MySQL INSERT? mysql_query("UPDATE table SET description = '$value' WHERE id = '$id'");

Please familiarize yourself with the SimpleXml API:
$xml = <<< XML
<person>
<description>
<p>blah blah blah</p>
<p>kjdsfksdjf</p>
</description>
</person>
XML;
$person = simplexml_load_string($xml);
foreach ($person->description->children() as $child) {
echo $child->asXml();
}
gives
<p>blah blah blah</p><p>kjdsfksdjf</p>
Note that SimpleXml isn't capable of doing the same for the second description element you show, because children() only returns element nodes and SimpleXml has no concept of text nodes, e.g.
$xml = <<< XML
<person>
<description>
k kjsdf kk sak kfsdjk sadk
</description>
</person>
XML;
$person = simplexml_load_string($xml);
foreach ($person->description->children() as $child) {
echo $child->asXml();
}
will output nothing. If you want a unified API, use DOM:
$xml = <<< XML
<people>
<person>
<description>
<p>blah blah blah</p>
<p>kjdsfksdjf</p>
</description>
</person>
<person>
<description>
k kjsdf kk sak kfsdjk sadk
</description>
</person>
</people>
XML;
$dom = new DOMDocument;
$dom->loadXml($xml);
$xp = new DOMXPath($dom);
foreach ($xp->query('/people/person/description/node()') as $child) {
echo $dom->saveXml($child);
}
will give
<p>blah blah blah</p>
<p>kjdsfksdjf</p>
k kjsdf kk sak kfsdjk sadk
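If you need the inner markup as a string per description (for example, to store it in a database field) rather than echoing it, the DOM loop above can be wrapped in a small helper. This is only a sketch: innerXml() is a made-up name, not part of the DOM extension, and the sample XML is shortened.

```php
<?php
// Hypothetical helper: returns the markup inside a node, without the node's own tags.
function innerXml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes only that node
        $out .= $node->ownerDocument->saveXML($child);
    }
    return $out;
}

$dom = new DOMDocument;
$dom->loadXML(
    '<people>'
    . '<person><description><p>blah</p><p>more</p></description></person>'
    . '<person><description>plain text</description></person>'
    . '</people>'
);

$descriptions = [];
foreach ($dom->getElementsByTagName('description') as $description) {
    $descriptions[] = innerXml($description);
}

print_r($descriptions);
```

Both kinds of description (with child elements and with plain text) go through the same code path, which is the point of using DOM here.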
For importing XML into MySql, you can also use http://dev.mysql.com/doc/refman/5.5/en/load-xml.html

I'd like to parse the description so that it returns the html tags that are inside.
In XPath you would select the child nodes of the description elements.
Use:
"//person/description/*"
to get all child elements (HTML tags only) or
"//person/description/node()"
to get all child nodes (HTML tags and text nodes).
For instance, this php code:
<?php
$xml = simplexml_load_file("test.xml");
$result = $xml->xpath("//person/description/*");
print_r($result);
?>
Returns an array of SimpleXMLElements which are children of description. Each item is retrieved with all its descendant nodes.
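To turn that array back into markup, each returned element can be serialized with asXML(). A minimal sketch, using an inline string instead of test.xml so it runs on its own (the sample data is made up):

```php
<?php
$xml = simplexml_load_string(
    '<people><person><description><p>one</p><p>two</p></description></person></people>'
);

$markup = '';
foreach ($xml->xpath('//person/description/*') as $element) {
    // asXML() on a non-root element returns that element including its tags
    $markup .= $element->asXML();
}

echo $markup;
```

Keep in mind that the * expression only matches elements, so a description containing nothing but text contributes nothing to $markup.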

Related

PHP XML append to created file

I have the following XML document:
<list>
<person>
<name>Simple name</name>
</person>
</list>
I try to read it, and basically create another "person" element. The output I want to achieve is:
<list>
<person>
<name>Simple name</name>
</person>
<person>
<name>Simple name again</name>
</person>
</list>
Here is how I am doing it:
$xml = new DOMDocument();
$xml->load('../test.xml');
$list = $xml->getElementsByTagName('list') ;
if ($list->length > 0) {
$person = $xml->createElement("person");
$name = $xml->createElement("name");
$name->nodeValue = 'Simple name again';
$person->appendChild($name);
$list->appendChild($person);
}
$xml->save("../test.xml");
What am I missing here?
Edit: I have translated the tags, so that example would be clearer.
Currently, you're appending to the node list returned by getElementsByTagName() instead of the parent element it contains:
$list->appendChild($person);
// ^ DOMNodeList
You should point to the element:
$list->item(0)->appendChild($person);
Sidenote: The text can already be passed as the second argument of ->createElement():
$name = $xml->createElement("name", 'Simple name again');
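Putting both points together, a self-contained sketch might look like this. It loads the XML from a string instead of ../test.xml so it can be run as-is; swap in load() and save() for real files.

```php
<?php
$xml = new DOMDocument();
$xml->loadXML('<list><person><name>Simple name</name></person></list>');

// item(0) gives the actual <list> element (or null if there is none)
$list = $xml->getElementsByTagName('list')->item(0);

if ($list !== null) {
    $person = $xml->createElement('person');
    // The text is passed as the second argument of createElement()
    $person->appendChild($xml->createElement('name', 'Simple name again'));
    $list->appendChild($person);
}

// Serialize only the root element, without the XML declaration
$result = $xml->saveXML($xml->documentElement);
echo $result;
```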

Prepending raw XML using PHP's SimpleXML

Given a base $xml and a file containing a <something> tag with attributes, children and children of its children, I would like to append it as first child and all of its children as raw XML.
Original XML:
<root>
<people>
<person>
<name>John Doe</name>
<age>47</age>
</person>
<person>
<name>James Johnson</name>
<age>13</age>
</person>
</people>
</root>
XML in file:
<something someval="x" otherthing="y">
<child attr="val" ..> { some children and values ... }</child>
<child attr="val2" ..> { some children and values ... }</child>
...
</something>
Result XML:
<root>
<something someval="x" otherthing="y">
<child attr="val" ..> { some children and values ... }</child>
<child attr="val2" ..> { some children and values ... }</child>
...
</something>
<people>
<person>
<name>John Doe</name>
<age>47</age>
</person>
<person>
<name>James Johnson</name>
<age>13</age>
</person>
</people>
</root>
This tag would contain several children both direct and recursively, so it would not be practical to build the XML via the SimpleXML operations. Besides, keeping it in a file would result in lower maintenance costs.
Technically it would simply be prepending one child. The problem is that this child would have other children and so on.
On the PHP addChild page there's a comment that says:
$x = new SimpleXMLElement('<root name="toplevel"></root>');
$f1 = new SimpleXMLElement('<child pos="1">alpha</child>');
$x->{$f1->getName()} = $f1; // adds $f1 to $x
However, this does not seem to treat my XML as raw XML therefore causing < and > escaped tags to appear. Several warnings concerning namespaces seem to appear as well.
I suppose I could do a quick replace of such tags but I am not sure whether it could cause future problems and it certainly does not feel right.
Manually hacking the XML is not an option and neither is adding children one by one. Choosing a different library could be.
Any clues on how to get this working?
Thanks!
I'm really not sure if that will work. Try this or downvote this, but I hope it helps. Using DOMDocument:
<?php
$xml = new DOMDocument();
// The input is XML, so load it with loadXML() rather than loadHTML()
$xml->loadXML($yourOriginalXML);
// createElement() only accepts a tag name; parse the raw XML into a fragment instead
$newNode = $xml->createDocumentFragment();
$newNode->appendXML($someXMLtoPrepend);
$nodeRoot = $xml->getElementsByTagName('root')->item(0);
$nodeOriginal = $xml->getElementsByTagName('people')->item(0);
$nodeRoot->insertBefore($newNode, $nodeOriginal);
$finalXmlAsString = $xml->saveXML();
?>
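Sometimes the encoding can cause problems. DOM works with UTF-8 internally, so if your strings are in another encoding, try converting them first (ISO-8859-1 here is only an assumed example; use your real source encoding, and make sure any encoding named in the XML declaration still matches):
<?php
$xml = new DOMDocument();
$xml->loadXML(mb_convert_encoding($yourOriginalXML, 'UTF-8', 'ISO-8859-1'));
// Parse the raw XML to prepend into a fragment owned by the same document
$newNode = $xml->createDocumentFragment();
$newNode->appendXML(mb_convert_encoding($someXMLtoPrepend, 'UTF-8', 'ISO-8859-1'));
$nodeRoot = $xml->getElementsByTagName('root')->item(0);
$nodeOriginal = $xml->getElementsByTagName('people')->item(0);
$nodeRoot->insertBefore($newNode, $nodeOriginal);
$finalXmlAsString = $xml->saveXML();
?>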
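Since the question starts from SimpleXML, here is a sketch of one way to prepend raw XML without leaving SimpleXML for good: import the element into DOM, insert a document fragment, and keep using the original SimpleXML object afterwards. The sample data is shortened and made up; $rawXml stands in for the contents of the file.

```php
<?php
$root = new SimpleXMLElement(
    '<root><people><person><name>John Doe</name></person></people></root>'
);
// Stand-in for the raw XML read from the file
$rawXml = '<something someval="x"><child attr="val">text</child></something>';

// Both objects share the same underlying document, so changes made
// through DOM are visible through $root as well
$rootNode = dom_import_simplexml($root);

// A fragment can be filled from a raw XML string, children included
$fragment = $rootNode->ownerDocument->createDocumentFragment();
$fragment->appendXML($rawXml);

// Insert before the current first child, i.e. prepend
$rootNode->insertBefore($fragment, $rootNode->firstChild);

echo $root->asXML();
```

Note that appendXML() expects a well-formed fragment without an XML declaration, so a file starting with <?xml ... ?> would need that line stripped first.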

php xml dom extract data from non-standard xml

Hello I have xml like this:
<specs><my>base</my><root>none</root></specs>
<books>
<item>
<id>14</id>
<title>How to live</title>
</item>
<item>
...
</item>
</books>
How can I extract the value from <my>, and then from <title>?
When the XML contains only data such as <specs><my>base</my><root>none</root></specs>, this code works for me. How should I modify it so that it also works with data such as books in the XML?
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXPath($dom);
$entry = $xpath->query("//xml/specs/my");
foreach($entry as $ent){
echo $ent->nodeValue;
}
I simply added a wrapping root element, because an XML document must have exactly one root:
$xml="<xml>".$xml."</xml>";
Now $xpath->query("//xml/specs/my"); works, as well as $xpath->query("//xml/books/item");
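A runnable sketch of that approach, with made-up sample data based on the question. It assumes the input has no <?xml ... ?> declaration, since a declaration inside the wrapper would make the document invalid.

```php
<?php
// Two top-level elements: not a well-formed document on its own
$xml = '<specs><my>base</my><root>none</root></specs>'
     . '<books><item><id>14</id><title>How to live</title></item></books>';

$dom = new DOMDocument();
// Wrap everything in a single artificial root element before parsing
$dom->loadXML('<xml>' . $xml . '</xml>');
$xpath = new DOMXPath($dom);

$my = $xpath->query('/xml/specs/my')->item(0)->nodeValue;

$titles = [];
foreach ($xpath->query('/xml/books/item/title') as $title) {
    $titles[] = $title->nodeValue;
}

echo $my, "\n";
print_r($titles);
```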

Parsing XML file

I've got a problem with parsing an XML file (N.B. a well-formed one).
Consider XML file like this:
<?xml version="1.0" encoding="utf-8" ?>
<root>
<list>
<item no="1">
<title>Item's 1 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
<item no="2">
<title>Item's 2 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
</list>
</root>
I need to get the contents of each item in the list and put them in an array. Generally not a problem, but in this case I can't get my head around it.
The problem lies in the contents of <content>. It is a string with tags in between, and I can't find a way to extract it. SimpleXML returns/echoes just the string, with the <special> tags and everything inside them stripped out. Like this:
Some long content with inside.
I'd ideally want it to get a string like this:
Some long content with <special>tags</special> inside
How do I get it?
You could use DOMDocument which is built into PHP.
<?php
$xml = <<<END
<?xml version="1.0" encoding="utf-8" ?>
<root>
<list>
<item no="1">
<title>Item's 1 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
<item no="2">
<title>Item's 2 title</title>
<content>Some long content with <special>tags</special> inside</content>
</item>
</list>
</root>
END;
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml);
$nodes = $doc->getElementsByTagName('content');
foreach ( $nodes as $node )
{
$temp_doc = new DOMDocument('1.0', 'UTF-8');
foreach ( $node->childNodes as $child )
$temp_doc->appendChild($temp_doc->importNode($child, true));
echo $temp_doc->saveHTML(); // Outputs: Some long content with <special>tags</special> inside
}
To select the top level "content" elements (in case there are "content" elements inside), you can use DOMXPath.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML($xml); // $xml from the example above
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('/root/list/item/content');
foreach ( $nodes as $node )
{
$temp_doc = new DOMDocument('1.0', 'UTF-8');
foreach ( $node->childNodes as $child )
$temp_doc->appendChild($temp_doc->importNode($child, true));
echo $temp_doc->saveHTML(); // Outputs: Some long content with <special>tags</special> inside
}
SimpleXML just doesn't support mixed content (text nodes with element nodes as siblings). I suggest you use XMLReader instead.
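A rough sketch of the XMLReader route, assuming readInnerXml() is available in your build (it is part of the XMLReader extension). The sample XML is a shortened, made-up version of the one in the question.

```php
<?php
$xml = '<root><list>'
     . '<item no="1"><content>Some long content with <special>tags</special> inside</content></item>'
     . '<item no="2"><content>Other content</content></item>'
     . '</list></root>';

$reader = new XMLReader();
$reader->XML($xml);

$contents = [];
while ($reader->read()) {
    // Only react to opening <content> tags
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'content') {
        // readInnerXml() returns the markup inside the current element
        $contents[] = $reader->readInnerXml();
    }
}
$reader->close();

print_r($contents);
```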
You could use SimpleXML's asXML function. It returns the node it is called on as an XML string:
$xml = simplexml_load_file($file);
foreach($xml->list->item as $item) {
$content = $item->content->asXML();
echo $content."\n";
}
will print:
<content>Some long content with <special>tags</special> inside</content>
<content>Some long content with <special>tags</special> inside</content>
It's a little ugly, but you could then clip out the <content> and </content> tags with substr:
$content = substr($content,9,-10);

Parsing an XML with Xpath in PHP

Consider the following code :
$dom = new DOMDocument();
$dom->loadXML($file);
$xmlPath = new DOMXPath($dom);
$arrNodes = $xmlPath->query('*/item');
foreach($arrNodes as $item){
//missing code
}
The $file is an xml and each item has a title and a description.
How can I display them (title and description)?
$file = "<item>
<title>test_title</title>
<desc>test</desc>
</item>";
I suggest using PHP's SimpleXML. You still get XPath functionality, but with an easier approach. For example, you would access attributes like this:
$name = $item['name'];
Here's an example:
xmlfile.xml:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<items>
<item title="Hello World" description="Hellowing the world.." />
<item title="Hello People" description="greeting people.." />
</items>
</xml>
do.php:
<?php
$xml_str = file_get_contents('xmlfile.xml');
$xml = new SimpleXMLElement($xml_str);
$items = $xml->xpath('*/item');
foreach($items as $item) {
echo $item['title'], ': ', $item['description'], "\n";
}
If your item looks like this:
<item>
<title>foo</title>
<description>frob</description>
</item>
You could use getElementsByTagName() and nodeValue:
foreach($arrNodes as $item){
print $item->getElementsByTagName('title')->item(0)->nodeValue;
}
Are title and description attributes? E.g., does an item look like this:
<item title="foo" description="frob" />
If so, you could just use getAttribute():
...
foreach($arrNodes as $item){
print $item->getAttribute('title');
}
The right XPath expression should be:
/*/item/title | /*/item/desc
Or
/*/item/*[self::title or self::desc]
This evaluates to a node set containing the title and desc elements in document order.
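A short sketch of how that expression could fill in the missing loop body. The <items> wrapper is an assumption: the expression expects item to be a child of the document's root element, which the one-element sample in the question does not show.

```php
<?php
// Assumed structure: <item> elements directly under the root element
$file = '<items><item><title>test_title</title><desc>test</desc></item></items>';

$dom = new DOMDocument();
$dom->loadXML($file);
$xmlPath = new DOMXPath($dom);

$lines = [];
foreach ($xmlPath->query('/*/item/title | /*/item/desc') as $node) {
    // nodeName tells title and desc apart; nodeValue is the text content
    $lines[] = $node->nodeName . ': ' . $node->nodeValue;
}

echo implode("\n", $lines), "\n";
```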
