Hello Everyone,
I'm trying to access data in an XML file:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://dublincore.org/documents/dcmi-namespace/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2013-04-15T12:14:31Z</responseDate>
<ListRecords>
<record>
<header>
<identifier>
a1b31ab2-9efe-11df-9922-efbb156aa6c1:01442b82-59a4-627e-800f-c63de74fc109
</identifier>
<datestamp>2012-08-16T14:42:52Z</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:description>...</dc:description>
<dc:date>1921</dc:date>
<dc:identifier>K11510</dc:identifier>
<dc:source>Waterschap Vallei & Eem</dc:source>
<dc:source>...</dc:source>
<dc:source>610</dc:source>
<dc:coverage>Bunschoten</dc:coverage>
<dc:coverage>Veendijk</dc:coverage>
<dc:coverage>Spakenburg</dc:coverage>
</oai_dc:dc>
</metadata>
<about>...</about>
</record>
This is an example of the XML.
I need to access data like dc:date, dc:source, etc.
Does anyone have any ideas?
Best regards,
Tim
-- UPDATE --
I'm now trying this:
foreach( $xml->ListRecords as $records )
{
foreach( $records AS $record )
{
$data = $record->children( 'http://www.openarchives.org/OAI/2.0/oai_dc/' );
$rows = $data->children( 'http://purl.org/dc/elements/1.1/' );
echo $rows->date;
break;
}
break;
}
You have nested elements that are in different XML namespaces. Concretely, there are two additional namespaces involved:
$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
$nsUriDc = 'http://purl.org/dc/elements/1.1/';
The first one is for the <oai_dc:dc> element, which contains the elements of the second namespace: the <dc:*> elements like <dc:description> and so on. Those are the elements you're looking for.
In your code you already have a good sense of how this works:
$data = $record->children( 'http://www.openarchives.org/OAI/2.0/oai_dc/' );
$rows = $data->children( 'http://purl.org/dc/elements/1.1/' );
However there is a little mistake: the $data children are not children of $record but of $record->metadata.
You also do not need to nest two foreach into each other. The code example:
$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
$nsUriDc = 'http://purl.org/dc/elements/1.1/';
$records = $xml->ListRecords->record;
foreach ($records as $record)
{
$data = $record->metadata->children($nsUriOaiDc);
$rows = $data->children($nsUriDc);
echo $rows->date;
break;
}
/** output: 1921 **/
If you run into problems like these, you can use $record->asXML('php://output'); to print the element(s) you have currently traversed to.
I think this is what you're looking for. Hope it helps ;)
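To collect every dc:source value rather than just the date, the same children() technique can be extended. This is a minimal sketch, assuming a trimmed-down copy of the response inlined as a string; the variable names $xmlString and $result are made up for illustration.

```php
<?php
// Hypothetical, shortened OAI-PMH response used only for this illustration.
$xmlString = <<<'XML'
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:date>1921</dc:date>
          <dc:source>Waterschap Vallei &amp; Eem</dc:source>
          <dc:source>610</dc:source>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
XML;

$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
$nsUriDc    = 'http://purl.org/dc/elements/1.1/';

$xml    = new SimpleXMLElement($xmlString);
$result = array();

foreach ($xml->ListRecords->record as $record) {
    // Step into <oai_dc:dc>, then switch to the dc: namespace for its children.
    $dc = $record->metadata->children($nsUriOaiDc)->dc->children($nsUriDc);

    $sources = array();
    foreach ($dc->source as $source) {
        $sources[] = (string) $source; // cast each element to its text content
    }

    $result[] = array('date' => (string) $dc->date, 'sources' => $sources);
}

print_r($result);
```

Casting to string matters here: without it the array would hold SimpleXMLElement objects instead of plain values.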
You can use DOMDocument for this, for example to access dc:date:
$STR='
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://dublincore.org/documents/dcmi-namespace/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2013-04-15T12:14:31Z</responseDate>
<ListRecords>
<record>
<header> <identifier> a1b31ab2-9efe-11df-9922-efbb156aa6c1:01442b82-59a4-627e-800f-c63de74fc109 </identifier>
<datestamp>2012-08-16T14:42:52Z</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:description>...</dc:description>
<dc:date>1921</dc:date>
<dc:identifier>K11510</dc:identifier>
<dc:source>Waterschap Vallei & Eem</dc:source>
<dc:source>...</dc:source>
<dc:source>610</dc:source>
<dc:coverage>Bunschoten</dc:coverage>
<dc:coverage>Veendijk</dc:coverage>
<dc:coverage>Spakenburg</dc:coverage>
</oai_dc:dc>
</metadata>
<about>...</about>
</record>';
$doc = new DOMDocument;
$STR = str_replace("&", "&amp;", $STR); // escape bare ampersands before they go into loadXML()
$doc->recover = TRUE; // the sample string is truncated (unclosed tags), so let libxml recover
libxml_use_internal_errors(true); // keep the resulting parser warnings out of the output
$doc->loadXML($STR);
$doc->encoding = 'UTF-8';
print_r($doc->getElementsByTagName('dc')->item(0)->getElementsByTagName('date')->item(0)->textContent);
output:
1921
Or, to access dc:source:
$source= $doc->getElementsByTagName('dc')->item(0)->getElementsByTagName('source');
foreach($source as $value){
echo $value->textContent."\n";
}
output:
Waterschap Vallei & Eem
...
610
Or, to collect everything into an array:
$array=array();
$source= $doc->getElementsByTagName('dc')->item(0)->getElementsByTagName("*");
foreach($source as $value){
$array[$value->localName][]=$value->textContent."\n";
}
print_r($array);
output:
Array
(
[description] => Array
(
[0] => ...
)
[date] => Array
(
[0] => 1921
)
[identifier] => Array
(
[0] => K11510
)
[source] => Array
(
[0] => Waterschap Vallei & Eem
[1] => ...
[2] => 610
)
[coverage] => Array
(
[0] => Bunschoten
[1] => Veendijk
[2] => Spakenburg
)
)
Using XPath makes dealing with namespaces more straightforward:
<?php
// load the XML into a DOM document
$doc = new DOMDocument;
$doc->load('oai-response.xml'); // or use $doc->loadXML($xml) for an XML string
// bind the DOM document to an XPath object
$xpath = new DOMXPath($doc);
// map all the XML namespaces to prefixes, for use in XPath queries
$xpath->registerNamespace('oai', 'http://www.openarchives.org/OAI/2.0/');
$xpath->registerNamespace('oai_dc', 'http://www.openarchives.org/OAI/2.0/oai_dc/');
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');
// identify each record using an XPath query
// collect data as either strings or arrays of strings
foreach ($xpath->query('oai:ListRecords/oai:record/oai:metadata/oai_dc:dc') as $item) {
$data = array(
'date' => $xpath->evaluate('string(dc:date)', $item), // $item is the context for this query
'source' => array(),
);
foreach ($xpath->query('dc:source', $item) as $source) {
$data['source'][] = $source->textContent;
}
print_r($data);
}
I have an XML file, output.xml:
<?xml version="1.0" encoding="UTF-8"?>
<items>
<item>
<key type="your_id">CYBEX-525A/DA-IPOD</key>
<key type="web">cybex-525at-arc-trainer</key>
<key type="web">standard-console-2573</key>
<key type="name">Cybex 525AT Arc Trainer</key>
<key type="name">Standard console</key>
<review>
<order_id>1544346 1</order_id>
<author_nick>Brock</author_nick>
<author_email>bb#GMAIL.COM</author_email>
<date type="accepted">2013-10-14</date>
<comment type="overall">This cardio machine is exceptional. It works every part of your leg muscles if you rotate through the height settings and include calf-raises and squats during your routine. It also works your shoulders and biceps if you focus on working them while operating the arm poles. Unlike a standard elliptical it will raise your heart rate and cause you to sweat heavily soon after you start your routine. If you're a runner and are used to using a treadmill, you will feel satisfied after using this machine. It is kind of addictive because your body feels so good during and after use. I have combined 30 minutes on the treadmill with 30 minutes on the Arc for weight-loss, muscle tone, and cardiovascular training.</comment>
<score type="overall">5</score>
</review>
</item>
</items>
I need to save it in a database. I'm using the code below to read the data:
if(file_exists('output.xml')){
$languages = simplexml_load_file("output.xml");
//echo '<pre>'; print_r($languages) ; die;
foreach($languages as $item){
echo '<pre>'; print_r($item->key) ; die;
foreach($item->review as $review){
$order_id = $review[0]->order_id;
$authorName = $review[0]->author_nick;
$authorEmail = strtolower($review[0]->author_email);
$comment = $review[0]->comment;
$score = $review[0]->score;
$date = $review[0]->date;
}
}
}
I need to get the values of <key type="your_id">CYBEX-525A/DA-IPOD</key> and <key type="web">cybex-525at-arc-trainer</key>, but I am unable to get that data.
When I run echo '<pre>'; print_r($item->key) ; die; within the loop, I get the following output:
SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => your_id
)
[0] => CYBEX-525A/DA-IPOD
[1] => cybex-525at-arc-trainer
[2] => standard-console-2573
[3] => Cybex 525AT Arc Trainer
[4] => Standard console
)
Is there any way to get all of this data?
DOMXPath::evaluate() allows you to use XPath expressions to fetch nodes and scalar values from an XML DOM. The expression determines whether the result is a node list or a scalar value. You can iterate a node list with foreach().
$document = new DOMDocument();
$document->loadXml($xmlString);
$xpath = new DOMXPath($document);
foreach ($xpath->evaluate('/items/item') as $item) {
var_dump(
[
'id' => $xpath->evaluate('string(key[@type="your_id"])', $item)
]
);
foreach ($xpath->evaluate('review', $item) as $review) {
var_dump(
[
'nick' => $xpath->evaluate('string(author_nick)', $review),
'email' => $xpath->evaluate('string(author_email)', $review)
]
);
}
}
Output:
array(1) {
["id"]=>
string(18) "CYBEX-525A/DA-IPOD"
}
array(2) {
["nick"]=>
string(5) "Brock"
["email"]=>
string(12) "bb#GMAIL.COM"
}
The second argument of DOMXPath::evaluate() is a context node for the expression, so it is easy to iterate a list of nodes and fetch data relative to each of them.
XPath functions like string() cast the first node of a list into a scalar value. If the list is empty, an empty value of that type is returned.
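The casting behaviour can be checked with a small sketch. The XML string and the variable names below are made up for illustration; only DOMDocument, DOMXPath and evaluate() are taken from the answer.

```php
<?php
$document = new DOMDocument();
$document->loadXML(
    '<items><item><key type="web">a</key><key type="web">b</key></item></items>'
);
$xpath = new DOMXPath($document);

// Two nodes match, but string() only uses the first one.
$first = $xpath->evaluate('string(/items/item/key[@type="web"])');

// Nothing matches, so string() yields an empty string instead of an error.
$missing = $xpath->evaluate('string(/items/item/key[@type="nope"])');

// count() returns an XPath number, which PHP exposes as a float.
$count = $xpath->evaluate('count(/items/item/key)');

var_dump($first, $missing, $count);
```

Because a missing node silently becomes an empty string, it can be worth checking count() first when the difference between "empty" and "absent" matters.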
SimpleXML allows you to fetch arrays of nodes using SimpleXMLElement::xpath(). There is no direct way to fetch scalar values, but the implemented magic methods allow a compact syntax.
You will have to cast the returned SimpleXMLElement objects to strings.
$items = new SimpleXMLElement($xmlString);
foreach ($items->xpath('item') as $item) {
var_dump(
[
'id' => (string)$item->xpath('key[@type="your_id"]')[0]
]
);
foreach ($item->xpath('review') as $review) {
var_dump(
[
'nick' => (string)$review->author_nick,
'email' => (string)$review->author_email
]
);
}
}
If you don't want to use DOMDocument, you can try:
foreach($languages->item as $item){ //loop through item(s)
foreach($item->key as $key) { //loop through key(s)
if ($key["type"] == "your_id") echo $key; //echo CYBEX-525A/DA-IPOD
}
print_r($item->review); //prints review data
}
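If all keys are needed rather than only your_id, the same nested loop can group them by their type attribute. This is a sketch under the assumption that a shortened copy of output.xml is inlined as a string; $keysByType is a hypothetical name.

```php
<?php
// Shortened copy of output.xml from the question, inlined for illustration.
$xmlString = <<<'XML'
<items>
  <item>
    <key type="your_id">CYBEX-525A/DA-IPOD</key>
    <key type="web">cybex-525at-arc-trainer</key>
    <key type="web">standard-console-2573</key>
  </item>
</items>
XML;

$items      = new SimpleXMLElement($xmlString);
$keysByType = array();

foreach ($items->item as $item) {          // loop through item(s)
    foreach ($item->key as $key) {         // loop through key(s)
        // Cast both the attribute and the element text to plain strings.
        $keysByType[(string) $key['type']][] = (string) $key;
    }
}

print_r($keysByType);
```

Note that this accumulates keys across all items; with several <item> elements you would probably reset the array per item.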
You can use the following as an example:
$xml="".$rs_rssfeed['rss_url']."";
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
$xpath = new DOMXPath($xmlDoc);
$count = $xpath->evaluate('count(//item)');
if($count>0)
{
$x=$xmlDoc->getElementsByTagName('item');
for ($i=0; $i<$count; $i++) // $count was already computed above
{
$item_title=$x->item($i)->getElementsByTagName('title')->item(0)->nodeValue;
$item_link=$x->item($i)->getElementsByTagName('link')->item(0)->nodeValue;
$item_desc=$x->item($i)->getElementsByTagName('description')->item(0)->nodeValue;
$item_pubdate=$x->item($i)->getElementsByTagName('pubDate')->item(0)->nodeValue;
}
}
I'm exploring XML and PHP, mostly XPath and other parsers.
Here be the xml:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<actors>
<actor id="1">Christian Bale</actor>
<actor id="2">Liam Neeson</actor>
<actor id="3">Michael Caine</actor>
</actors>
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
<items>
<item id="7">Pizza</item>
<item id="8">Cheese</item>
<item id="9">Cane</item>
</items>
</root>
Here be my path & code:
$xml = simplexml_load_file('xpath.xml');
$result = $xml -> xpath('/root/actors');
echo '<pre>'.print_r($result,1).'</pre>';
Now, said path returns:
Array
(
[0] => SimpleXMLElement Object
(
[actor] => Array
(
[0] => Christian Bale
[1] => Liam Neeson
[2] => Michael Caine
)
)
)
Whereas a seemingly similar line of code, which I would have thought would result in the singers, doesn't. Meaning:
$result = $xml -> xpath('/root/foo:singers');
Results in:
Array
(
[0] => SimpleXMLElement Object
(
)
)
Now I would've thought the foo: namespace in this case is a non-issue and both paths should result in the same sort of array of singers/actors respectively? How come that is not the case?
Thank-you!
Note: As you can probably gather I'm quite new to xml so please be gentle.
Edit: When I go /root/foo:singers/foo:singer I get results, but not before. Also with just /root I only get actors and items as results, foo:singers are completely omitted.
SimpleXML is, for a number of reasons, simply a bad API.
For most purposes I suggest PHP's DOM extension. (Or for very large documents a combination of it along with XMLReader.)
For using namespaces in xpath you'll want to register those you'd like to use, and the prefix you want to use them with, with your xpath processor.
Example:
$dom = new DOMDocument();
$dom->load('xpath.xml');
$xpath = new DOMXPath($dom);
// The prefix *can* match that used in the document, but it's not necessary.
$xpath->registerNamespace("ns", "http://www.foo.org/");
foreach ($xpath->query("/root/ns:singers") as $node) {
echo $dom->saveXML($node);
}
Output:
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
DOMXPath::query returns a DOMNodeList containing matched nodes. You can work with it essentially the same way you would in any other language with a DOM implementation.
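The combination of XMLReader and DOM mentioned above for very large documents could look roughly like this. It is only a sketch: the XML is a shortened inline copy of xpath.xml, and for a real file XMLReader::open() would be used instead of XML().

```php
<?php
// Shortened copy of xpath.xml from the question, inlined so the sketch is self-contained.
$xmlString = <<<'XML'
<root xmlns:foo="http://www.foo.org/">
  <foo:singers>
    <foo:singer id="4">Tom Waits</foo:singer>
    <foo:singer id="5">B.B. King</foo:singer>
  </foo:singers>
</root>
XML;

$reader = new XMLReader();
$reader->XML($xmlString);

$doc   = new DOMDocument();
$names = array();

// Stream through the document node by node instead of loading it all at once.
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT
        && $reader->namespaceURI === 'http://www.foo.org/'
        && $reader->localName === 'singer'
    ) {
        // Expand only the current element into a DOM node and work with it as usual.
        $node    = $doc->importNode($reader->expand(), true);
        $names[] = $node->textContent;
    }
}
$reader->close();

print_r($names);
```

Matching on namespaceURI and localName avoids depending on whichever prefix the document happens to use.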
You can use // expression like:
$xml -> xpath( '//foo:singer' );
to select all foo:singer elements no matter where they are.
EDIT:
The SimpleXMLElement is selected; you just can't see its namespaced child nodes with print_r(). Use SimpleXMLElement methods like SimpleXMLElement::children() to access them.
// example 1
$result = $xml->xpath( '/root/foo:singers' );
foreach( $result as $value ) {
print_r( $value->children( 'foo', TRUE ) );
}
// example 2
print_r( $result[0]->children( 'foo', TRUE )->singer );
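Another option that stays within SimpleXML is registerXPathNamespace(), which binds a prefix of your own choosing for XPath queries. A minimal sketch, assuming a shortened inline copy of the XML; the prefix f and the variable $names are arbitrary choices.

```php
<?php
// Shortened copy of xpath.xml from the question, inlined for illustration.
$xmlString = <<<'XML'
<root xmlns:foo="http://www.foo.org/">
  <foo:singers>
    <foo:singer id="4">Tom Waits</foo:singer>
    <foo:singer id="5">B.B. King</foo:singer>
    <foo:singer id="6">Ray Charles</foo:singer>
  </foo:singers>
</root>
XML;

$xml = new SimpleXMLElement($xmlString);

// The prefix used in the query is bound to the namespace URI, not to the document's prefix.
$xml->registerXPathNamespace('f', 'http://www.foo.org/');

$names = array();
foreach ($xml->xpath('/root/f:singers/f:singer') as $singer) {
    $names[] = (string) $singer; // cast the element to its text content
}

print_r($names);
```

Casting each result to a string sidesteps the print_r() confusion, because you look at the extracted values rather than at the object dump.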
I'm trying to teach myself to handle the simplexml_load_file() function and the object it returns.
So I have looked closely at the question "simpleXMLElement attributes and foreach".
I copied it bit by bit into my PHP browser and ran it.
test.xml:
<?xml version="1.0" encoding="utf-8"?>
<response result="0">
<reports>
<get count="2">
<row a="first" b="second" comment="test" c=""/>
<row a="first1" b="second2" comment="test2" c=""/>
</get>
</reports>
</response>
modified the php like this:
PHP:
$xml = simplexml_load_file('test.xml');
$rows = $xml->xpath('reports/get/row');
foreach($rows as $row)
{
foreach($row->attributes() as $key)
{
echo ('test: '.$key['a'] .' '.$key['b'].' '.$key['comment'].' '.$key['c'].'<br>') ;
}
}
I get no errors but only 2 lines :
test
test
No data.
Can anyone tell me why ?
You are doing a foreach over $row->attributes(). Therefore each iteration of the loop is a different attribute. None of the attributes have a $key['a'] value set.
You probably want to do:
foreach($rows as $row){
$key = $row->attributes();
echo 'test: '.$key['a'] .' '.$key['b'].' '.$key['comment'].' '.$key['c'].'<br>';
}
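If the attribute names are not known in advance, iterating attributes() with a key and a value gives both the name and the content. A small sketch, with test.xml inlined as a string and $result as a made-up name:

```php
<?php
// test.xml from the question, inlined for illustration.
$xmlString = <<<'XML'
<response result="0">
  <reports>
    <get count="2">
      <row a="first" b="second" comment="test" c=""/>
      <row a="first1" b="second2" comment="test2" c=""/>
    </get>
  </reports>
</response>
XML;

$xml    = new SimpleXMLElement($xmlString);
$result = array();

foreach ($xml->xpath('reports/get/row') as $row) {
    $attrs = array();
    // Each iteration yields one attribute: its name as the key, its value as an object.
    foreach ($row->attributes() as $name => $value) {
        $attrs[$name] = (string) $value;
    }
    $result[] = $attrs;
}

print_r($result);
```

This also shows why the original inner loop printed nothing useful: each $key there was a single attribute, not the whole set.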
After doing a print_r($rows), I came up with the following. Converting each row into a plain array with json_encode()/json_decode() puts the attributes under the '@attributes' key, which the inner loop then hands you as $fo:
foreach($rows as $row){
$xmlObjElement = json_decode(json_encode((array)$row), TRUE);
foreach($xmlObjElement as $fo){
print_r( $fo );
}
}
Output:
Array
(
[a] => first1
[b] => second2
[comment] => test2
[c] =>
)
Now you can access like $fo['a'] etc...
I have some xml, this is a simple version of it.
<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>
Using SimpleXML on the content,
$obj = simplexml_load_string( $xml );
I can use $obj->xpath( '//items/item' ); and get access to the @attributes.
I need an array result, so I have tried the json_decode(json_encode($obj),true) trick, but that seems to remove access to the @attributes (i.e. abc="123").
Is there another way of doing this, that provides access to the attributes and leaves me with an array?
You need to call the attributes() method.
Sample code:
$xmlString = '<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>';
$xml = new SimpleXMLElement($xmlString);
foreach( $xml->items->item as $value){
$my_array[] = strval($value->attributes());
}
print_r($my_array);
You can go the json_encode and json_decode route and add the parts you're missing, because json_encode follows some specific rules when it serializes a SimpleXMLElement.
If you're interested in the rules and their details, I have written two blog posts about it:
SimpleXML and JSON Encode in PHP – Part I
SimpleXML and JSON Encode in PHP – Part II
Perhaps more interesting for you is the third part, which shows how you can modify the JSON serialization and provide your own format (e.g. to preserve the attributes):
SimpleXML and JSON Encode in PHP – Part III and End
It ships with a full-blown example; here is an excerpt of the code:
$xml = '<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>';
$obj = simplexml_load_string($xml, 'JsonXMLElement');
echo $json = json_encode($obj, JSON_PRETTY_PRINT), "\n";
print_r(json_decode($json, TRUE));
The JSON and array output are as follows; note that the attributes are part of it:
{
"items": {
"item": [
{
"@attributes": {
"abc": "123"
},
"@text": "item one"
},
{
"@attributes": {
"abc": "456"
},
"@text": "item two"
}
]
}
}
Array
(
[items] => Array
(
[item] => Array
(
[0] => Array
(
[@attributes] => Array
(
[abc] => 123
)
[@text] => item one
)
[1] => Array
(
[@attributes] => Array
(
[abc] => 456
)
[@text] => item two
)
)
)
)
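The excerpt relies on a JsonXMLElement class that is not shown here. The following is a hypothetical re-creation for illustration only, not the code from the blog post: it extends SimpleXMLElement, implements JsonSerializable, and uses "@attributes" and "@text" as key names.

```php
<?php
// Hypothetical re-creation; the class in the blog post may be implemented differently.
class JsonXMLElement extends SimpleXMLElement implements JsonSerializable
{
    public function jsonSerialize(): array
    {
        $array = array();

        // Attributes are collected under "@attributes".
        $attributes = array();
        foreach ($this->attributes() as $name => $value) {
            $attributes[$name] = (string) $value;
        }
        if ($attributes) {
            $array['@attributes'] = $attributes;
        }

        // Child elements are grouped by name; repeated names become a list.
        // json_encode() serializes each child through this same method.
        $children = array();
        foreach ($this->children() as $name => $child) {
            $children[$name][] = $child;
        }
        foreach ($children as $name => $list) {
            $array[$name] = count($list) === 1 ? $list[0] : $list;
        }

        // Non-whitespace text content is stored under "@text".
        $text = trim((string) $this);
        if ($text !== '') {
            $array['@text'] = $text;
        }

        return $array;
    }
}

$xml = '<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>';

$obj     = simplexml_load_string($xml, 'JsonXMLElement');
$decoded = json_decode(json_encode($obj), true);
print_r($decoded);
```

This sketch ignores namespaces and mixed content, which is part of why a generic XML-to-array conversion is hard to get right.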
$xml = new SimpleXMLElement($xmlString);
$xml is now an object. To get the value of an attribute:
$xml->something['id'];
Where 'id' is the name of the attribute.
While it's theoretically possible to write a generic conversion from XML to PHP or JSON structures, it is very hard to capture all the subtleties that might be present - the distinction between child elements and attributes, text content alongside attributes (as you have here) or even alongside child elements, multiple child nodes with the same name, whether order of child elements and text nodes is important (e.g. in XHTML or DocBook), etc, etc.
If you have a specific format you need to produce, it will generally be much easier to use an API - like SimpleXML - to loop over the XML and produce the structure you need.
You don't specify the structure you want to achieve, but the general approach given your input would be to loop over each item, and either access known attributes, or loop over each attribute:
$sxml = simplexml_load_string( $xml );
$final_array = array();
foreach ( $sxml->items->item as $xml_item )
{
$formatted_item = array();
// Text content of item
$formatted_item['content'] = (string)$xml_item;
// Specifically get 'abc' attribute
$formatted_item['abc'] = (string)$xml_item['abc'];
// Maybe one of the attributes is an integer
$formatted_item['foo_id'] = (int)$xml_item['foo_id'];
// Or maybe you want to loop over lots of possible attributes
foreach ( $xml_item->attributes() as $attr_name => $attr_value )
{
$formatted_item['attrib:' . $attr_name] = (string)$attr_value;
}
// Add it to a final list
$final_array[] = $formatted_item;
// Or maybe you want that array to be keyed on one of the attributes
$final_array[ (string)$xml_item['key'] ] = $formatted_item;
}
Here is a class I've found that is able to process XML into array very nicely: http://outlandish.com/blog/xml-to-json/ (backup). Converting to json is a matter of a json_encode() call.
Here is the XML
<us:ItemMaster>
<us:ItemMasterHeader>
<oa:ItemID agencyRole="Product_Number">
<oa:ID>9227950</oa:ID>
</oa:ItemID>
<oa:ItemID agencyRole="Prefix_Number">
<oa:ID>AAG</oa:ID>
</oa:ItemID>
<oa:ItemID agencyRole="Stock_Number_Butted">
<oa:ID>5035</oa:ID>
</oa:ItemID>
<oa:ItemID agencyRole="Manufacturer_Sku_Number">
<oa:ID>5035</oa:ID>
</oa:ItemID>
</us:ItemMasterHeader>
</us:ItemMaster>
I want to extract the Product_Number, Prefix_Number, Stock_Number_Butted and Manufacturer_Sku_Number.
Can you advise how to do it using regex in PHP?
I don't want to use an XML parser for this because that gets very lengthy, and I have many large XML files to process.
Thanks!
Update:
For those who are looking for the same thing:
I found that XPath is the best way to proceed, and I found this link very helpful.
Here is the code:
<?php
echo "<pre>";
$info = array();
$xmlStr = file_get_contents("http://officedealersolution.highviews.co.cc/sftp/ecdb.individual_items/AAG5035.xml");
$xml = new SimpleXMLElement($xmlStr);
$res = $xml->xpath("//us:DataArea/us:ItemMaster/us:ItemMasterHeader/oa:ItemID[@agencyRole=\"Product_Number\"]/oa:ID");
$info['Product_Number'] = $res[0];
$res = $xml->xpath("//us:DataArea/us:ItemMaster/us:ItemMasterHeader/oa:ItemID[@agencyRole=\"Prefix_Number\"]/oa:ID");
$info['Prefix_Number'] = $res[0];
$res = $xml->xpath("//us:DataArea/us:ItemMaster/us:ItemMasterHeader/oa:ItemID[@agencyRole=\"Stock_Number_Butted\"]/oa:ID");
$info['Stock_Number_Butted'] = $res[0];
$res = $xml->xpath("//us:DataArea/us:ItemMaster/us:ItemMasterHeader/oa:ItemID[@agencyRole=\"Manufacturer_Sku_Number\"]/oa:ID");
$info['Manufacturer_Sku_Number'] = $res[0];
print_r($info);
echo "</pre>";
?>
Outputs:
Array
(
[Product_Number] => SimpleXMLElement Object
(
[0] => 9227950
)
[Prefix_Number] => SimpleXMLElement Object
(
[0] => AAG
)
[Stock_Number_Butted] => SimpleXMLElement Object
(
[0] => 5035
)
[Manufacturer_Sku_Number] => SimpleXMLElement Object
(
[0] => 5035
)
)
Here is a very good xpath tutorial by w3schools http://www.w3schools.com/xpath/xpath_syntax.asp
When all you use is a hammer, everything looks like a nail.
Regex is completely the wrong tool for the job. Use one of PHP's XML extensions (such as DOMDocument) instead.
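With DOMDocument and DOMXPath, the extraction might look like the sketch below. The namespace URIs are placeholders, since the snippet in the question does not show the real xmlns declarations; substitute the ones from the actual files.

```php
<?php
// The namespace URIs below are placeholders (assumptions), not the real ones.
$data = <<<'XML'
<us:ItemMaster xmlns:us="http://example.com/us" xmlns:oa="http://example.com/oa">
  <us:ItemMasterHeader>
    <oa:ItemID agencyRole="Product_Number"><oa:ID>9227950</oa:ID></oa:ItemID>
    <oa:ItemID agencyRole="Prefix_Number"><oa:ID>AAG</oa:ID></oa:ItemID>
  </us:ItemMasterHeader>
</us:ItemMaster>
XML;

$doc = new DOMDocument();
$doc->loadXML($data);

$xpath = new DOMXPath($doc);
$xpath->registerNamespace('oa', 'http://example.com/oa');

$parsed = array();
foreach ($xpath->query('//oa:ItemID') as $itemId) {
    // Key each ID by the agencyRole attribute of its parent ItemID element.
    $role          = $itemId->getAttribute('agencyRole');
    $parsed[$role] = $xpath->evaluate('string(oa:ID)', $itemId);
}

print_r($parsed);
```

One query plus a loop covers every agencyRole, so new roles in the files need no extra code.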
If the file is valid XML, the following code will get what you want assuming $data contains the XML data as string.
$xml = new SimpleXmlElement($data);
$nss = $xml->getNamespaces(true);
$us = $xml->children($nss['us']);
$im = $us->ItemMaster;
$imh = $im->ItemMasterHeader;
$oa = $imh->children($nss['oa']);
$parsed_data=array();
foreach($oa->ItemID as $item_id){
$attr = $item_id->attributes();
$role = (string)($attr->agencyRole);
$id = (string)($item_id->ID);
$parsed_data[$role] = $id;
}
print_r($parsed_data);