Here is the XML
<us:ItemMaster>
<us:ItemMasterHeader>
<oa:ItemID agencyRole="Product_Number">
<oa:ID>9227950</oa:ID>
</oa:ItemID>
<oa:ItemID agencyRole="Prefix_Number">
<oa:ID>AAG</oa:ID>
</oa:ItemID>
<oa:ItemID agencyRole="Stock_Number_Butted">
<oa:ID>5035</oa:ID>
</oa:ItemID>
<oa:ItemID agencyRole="Manufacturer_Sku_Number">
<oa:ID>5035</oa:ID>
</oa:ItemID>
</us:ItemMasterHeader>
</us:ItemMaster>
I want to extract the Product_Number, Prefix_Number, Stock_Number_Butted and Manufacturer_Sku_Number.
Can you advise how to do this using regex in PHP?
I don't want to use an XML parser for this, as that is getting very lengthy and I have many large XML files to process.
Thanks!
Update:
For those who are looking for the same thing:
I found that XPath is the best way to proceed, and I found this link very helpful.
Here is the code:
<?php
echo "<pre>";
$info = array();
$xmlStr = file_get_contents("http://officedealersolution.highviews.co.cc/sftp/ecdb.individual_items/AAG5035.xml");
$xml = new SimpleXMLElement($xmlStr);
$res = $xml->xpath("//us:DataArea/us:ItemMaster/us:ItemMasterHeader/oa:ItemID[@agencyRole=\"Product_Number\"]/oa:ID");
$info['Product_Number'] = $res[0];
$res = $xml->xpath("//us:DataArea/us:ItemMaster/us:ItemMasterHeader/oa:ItemID[@agencyRole=\"Prefix_Number\"]/oa:ID");
$info['Prefix_Number'] = $res[0];
$res = $xml->xpath("//us:DataArea/us:ItemMaster/us:ItemMasterHeader/oa:ItemID[@agencyRole=\"Stock_Number_Butted\"]/oa:ID");
$info['Stock_Number_Butted'] = $res[0];
$res = $xml->xpath("//us:DataArea/us:ItemMaster/us:ItemMasterHeader/oa:ItemID[@agencyRole=\"Manufacturer_Sku_Number\"]/oa:ID");
$info['Manufacturer_Sku_Number'] = $res[0];
print_r($info);
echo "</pre>";
?>
Outputs:
Array
(
[Product_Number] => SimpleXMLElement Object
(
[0] => 9227950
)
[Prefix_Number] => SimpleXMLElement Object
(
[0] => AAG
)
[Stock_Number_Butted] => SimpleXMLElement Object
(
[0] => 5035
)
[Manufacturer_Sku_Number] => SimpleXMLElement Object
(
[0] => 5035
)
)
Here is a very good xpath tutorial by w3schools http://www.w3schools.com/xpath/xpath_syntax.asp
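The four near-identical queries above could also be collapsed into one loop. This is only a minimal sketch: the namespace URIs `http://example.com/us` and `http://example.com/oa` are hypothetical placeholders, because the excerpt in the question does not show the real `xmlns` declarations. Casting to string gives plain values instead of SimpleXMLElement objects.

```php
<?php
// Hypothetical namespace URIs: the question's excerpt omits the real declarations.
$xmlStr = <<<XML
<us:ItemMaster xmlns:us="http://example.com/us" xmlns:oa="http://example.com/oa">
  <us:ItemMasterHeader>
    <oa:ItemID agencyRole="Product_Number"><oa:ID>9227950</oa:ID></oa:ItemID>
    <oa:ItemID agencyRole="Prefix_Number"><oa:ID>AAG</oa:ID></oa:ItemID>
    <oa:ItemID agencyRole="Stock_Number_Butted"><oa:ID>5035</oa:ID></oa:ItemID>
    <oa:ItemID agencyRole="Manufacturer_Sku_Number"><oa:ID>5035</oa:ID></oa:ItemID>
  </us:ItemMasterHeader>
</us:ItemMaster>
XML;

$oaUri = 'http://example.com/oa'; // assumption, see above
$xml = new SimpleXMLElement($xmlStr);
$xml->registerXPathNamespace('oa', $oaUri);

$info = array();
// Select every oa:ItemID that carries an agencyRole attribute.
foreach ($xml->xpath('//oa:ItemID[@agencyRole]') as $itemId) {
    $role = (string) $itemId['agencyRole'];
    // oa:ID lives in the oa namespace, so ask for children in that namespace.
    $info[$role] = (string) $itemId->children($oaUri)->ID;
}
print_r($info);
```

The array is keyed by whatever roles the file contains, so new roles need no code change.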
When all you use is a hammer, everything looks like a nail.
Regex is completely the wrong tool for the job. Use one of PHP's XML extensions (such as DOMDocument) instead.
If the file is valid XML, the following code will get what you want, assuming $data contains the XML as a string.
$xml = new SimpleXmlElement($data);
$nss = $xml->getNamespaces(true);
$us = $xml->children($nss['us']);
$im = $us->ItemMaster;
$imh = $im->ItemMasterHeader;
$oa = $imh->children($nss['oa']);
$parsed_data=array();
foreach($oa->ItemID as $item_id){
$attr = $item_id->attributes();
$role = (string)($attr->agencyRole);
$id = (string)($item_id->ID);
$parsed_data[$role] = $id;
}
print_r($parsed_data);
Related
I'm exploring XML and PHP, mostly XPath and other parsers.
Here be the xml:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<actors>
<actor id="1">Christian Bale</actor>
<actor id="2">Liam Neeson</actor>
<actor id="3">Michael Caine</actor>
</actors>
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
<items>
<item id="7">Pizza</item>
<item id="8">Cheese</item>
<item id="9">Cane</item>
</items>
</root>
Here be my path & code:
$xml = simplexml_load_file('xpath.xml');
$result = $xml -> xpath('/root/actors');
echo '<pre>'.print_r($result,1).'</pre>';
Now, said path returns:
Array
(
[0] => SimpleXMLElement Object
(
[actor] => Array
(
[0] => Christian Bale
[1] => Liam Neeson
[2] => Michael Caine
)
)
)
Whereas a seemingly similar line of code, which I would have thought would return the singers, doesn't. Meaning:
$result = $xml -> xpath('/root/foo:singers');
Results in:
Array
(
[0] => SimpleXMLElement Object
(
)
)
Now I would've thought the foo: namespace in this case is a non-issue and both paths should result in the same sort of array of singers/actors respectively? How come that is not the case?
Thank-you!
Note: As you can probably gather I'm quite new to xml so please be gentle.
Edit: When I go /root/foo:singers/foo:singer I get results, but not before. Also with just /root I only get actors and items as results, foo:singers are completely omitted.
SimpleXML is, for a number of reasons, simply a bad API.
For most purposes I suggest PHP's DOM extension. (Or for very large documents a combination of it along with XMLReader.)
For using namespaces in xpath you'll want to register those you'd like to use, and the prefix you want to use them with, with your xpath processor.
Example:
$dom = new DOMDocument();
$dom->load('xpath.xml');
$xpath = new DOMXPath($dom);
// The prefix *can* match that used in the document, but it's not necessary.
$xpath->registerNamespace("ns", "http://www.foo.org/");
foreach ($xpath->query("/root/ns:singers") as $node) {
echo $dom->saveXML($node);
}
Output:
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
DOMXPath::query returns a DOMNodeList containing matched nodes. You can work with it essentially the same way you would in any other language with a DOM implementation.
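As a rough illustration of working with that DOMNodeList, the sketch below collects each singer's id attribute and text into a plain array. It loads a trimmed copy of the question's XML from a string so that it runs on its own; the variable names are only for illustration.

```php
<?php
// Trimmed copy of the question's document, inlined so the sketch is self-contained.
$xmlStr = '<root xmlns:foo="http://www.foo.org/" xmlns:bar="http://www.bar.org">
<foo:singers>
<foo:singer id="4">Tom Waits</foo:singer>
<foo:singer id="5">B.B. King</foo:singer>
<foo:singer id="6">Ray Charles</foo:singer>
</foo:singers>
</root>';

$dom = new DOMDocument();
$dom->loadXML($xmlStr);

$xpath = new DOMXPath($dom);
// The prefix "ns" is arbitrary; only the namespace URI has to match the document.
$xpath->registerNamespace('ns', 'http://www.foo.org/');

$singers = array();
foreach ($xpath->query('/root/ns:singers/ns:singer') as $node) {
    // getAttribute() and textContent are standard DOMElement members.
    $singers[$node->getAttribute('id')] = $node->textContent;
}
print_r($singers);
```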
You can use // expression like:
$xml -> xpath( '//foo:singer' );
to select all foo:singer elements no matter where they are.
EDIT:
SimpleXMLElement is selected, you just can't see the child nodes with print_r(). Use SimpleXMLElement methods like SimpleXMLElement::children to access them.
// example 1
$result = $xml->xpath( '/root/foo:singers' );
foreach( $result as $value ) {
print_r( $value->children( 'foo', TRUE ) );
}
// example 2
print_r( $result[0]->children( 'foo', TRUE )->singer );
I'm working with a web service that is submitting an HTTP POST to a PHP page as follows:
FORM/POST PARAMETERS:
None
HEADERS:
Content-Type: text/xml
BODY:
<?xml version="1.0"?>
<mogreet>
<event>message-in</event>
<type>command_sms</type>
<campaign_id>12345</campaign_id>
<shortcode>123456</shortcode>
<msisdn>15552345678</msisdn>
<carrier><![CDATA[T-Mobile]]></carrier>
<carrier_id>2</carrier_id>
<message><![CDATA[xxxx testing]]></message>
</mogreet>
I need to be able to convert each of the XML elements into PHP variables so I can update a database. I've never had to work with an incoming POST with XML data before and I'm not sure where to start. I am familiar with processing incoming GET/POST requests, but not raw XML.
I think you'll need to use $HTTP_RAW_POST_DATA. After that you could use SimpleXMLElement as @ChristianGolihardt suggested.
Note that HTTP_RAW_POST_DATA is only available if the always_populate_raw_post_data setting has been enabled in php.ini. Otherwise, it may be easiest to do this:
$postData = file_get_contents("php://input");
...
$xml = new SimpleXMLElement($postData);
...
This will eliminate all the SimpleXMLElement objects and return your array:
from an xml string:
<?php
$xml='<?xml version="1.0"?>
<mogreet>
<event>message-in</event>
<type>command_sms</type>
<campaign_id>12345</campaign_id>
<shortcode>123456</shortcode>
<msisdn>15552345678</msisdn>
<carrier><![CDATA[T-Mobile]]></carrier>
<carrier_id>2</carrier_id>
<message><![CDATA[xxxx testing]]></message>
</mogreet>';
$xml = simplexml_load_string($xml);
$xml_array = json_decode(json_encode((array) $xml), 1);
print_r($xml_array);
?>
from an xml file:
$xml = simplexml_load_file("mogreet.xml");
$xml_array = json_decode(json_encode((array) $xml), 1);
print_r($xml_array);
output:
Array
(
[event] => message-in
[type] => command_sms
[campaign_id] => 12345
[shortcode] => 123456
[msisdn] => 15552345678
[carrier] => Array
(
)
[carrier_id] => 2
[message] => Array
(
)
)
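Note that the two CDATA fields (carrier and message) come out as empty arrays in the output above. One possible way around that, sketched here rather than taken from the answer, is to loop over the child elements and cast each one to a string, which also reads CDATA content. The `$body` string stands in for the request body; in a real handler it would come from `file_get_contents("php://input")`.

```php
<?php
// Stand-in for the raw request body (normally read from php://input).
$body = '<?xml version="1.0"?>
<mogreet>
<event>message-in</event>
<type>command_sms</type>
<campaign_id>12345</campaign_id>
<shortcode>123456</shortcode>
<msisdn>15552345678</msisdn>
<carrier><![CDATA[T-Mobile]]></carrier>
<carrier_id>2</carrier_id>
<message><![CDATA[xxxx testing]]></message>
</mogreet>';

$xml = simplexml_load_string($body);

$fields = array();
foreach ($xml->children() as $name => $node) {
    // The string cast returns the element's text, including CDATA sections.
    $fields[$name] = (string) $node;
}
print_r($fields);
```

This only covers a flat document like this one; nested elements would need their own handling.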
Take a look at SimpleXMLElement:
http://php.net/manual/de/class.simplexmlelement.php
$xmlstr = $_POST['key'];
$xml = new SimpleXMLElement($xmlstr);
// work with $xml like this ($xml is already the <mogreet> root element):
$event = $xml->event;
You can see the key if you do this:
print_r($_POST);
Most of the time when we work with this kind of API we want to log the request, because we cannot see it otherwise:
$debugFile = 'debug.log';
file_put_contents($debugFile, print_r($_POST, true), FILE_APPEND);
Also take a look at the answer from Matt Browne for getting the raw input.
I have an XML file that contains a tag like this:
<image><![CDATA[ _abc.jpg ]]></image>
$xml = simplexml_load_file('test1.xml');
foreach($xml as $product)
{
echo $product->image;
}
Please tell me how to parse the data inside the CDATA section in PHP.
You could make use of simplexml_load_string with these libxml options:
<?php
$xml='<image><![CDATA[ _abc.jpg ]]></image>';
$xml = simplexml_load_string($xml,'SimpleXMLElement', LIBXML_NOCDATA | LIBXML_NOBLANKS);
print_r($xml);
OUTPUT :
SimpleXMLElement Object
(
[0] => _abc.jpg
)
To print just the _abc.jpg value, do this: echo $xml[0];
I have some xml, this is a simple version of it.
<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>
Using SimpleXML on the content,
$obj = simplexml_load_string( $xml );
I can use $obj->xpath( '//items/item' ); and get access to the @attributes.
I need an array result, so I have tried the json_decode(json_encode($obj),true) trick, but that appears to remove access to the @attributes (i.e. abc="123").
Is there another way of doing this that provides access to the attributes and leaves me with an array?
You need to call the attributes() function.
Sample code:
$xmlString = '<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>';
$xml = new SimpleXMLElement($xmlString);
foreach( $xml->items->item as $value){
$my_array[] = strval($value->attributes());
}
print_r($my_array);
You can go the route with json_encode and json_decode and you can add the stuff you're missing because that json_encode-ing follows some specific rules with SimpleXMLElement.
If you're interested into the rules and their details, I have written two blog-posts about it:
SimpleXML and JSON Encode in PHP – Part I
SimpleXML and JSON Encode in PHP – Part II
Perhaps more interesting for you is the third part, which shows how you can modify the JSON serialization and provide your own format (e.g. to preserve the attributes):
SimpleXML and JSON Encode in PHP – Part III and End
It ships with a full blown example, here is an excerpt in code:
$xml = '<xml>
<items>
<item abc="123">item one</item>
<item abc="456">item two</item>
</items>
</xml>';
$obj = simplexml_load_string($xml, 'JsonXMLElement');
echo $json = json_encode($obj, JSON_PRETTY_PRINT), "\n";
print_r(json_decode($json, TRUE));
The JSON output and the resulting array are as follows; note that the attributes are part of them:
{
"items": {
"item": [
{
"@attributes": {
"abc": "123"
},
"@text": "item one"
},
{
"@attributes": {
"abc": "456"
},
"@text": "item two"
}
]
}
}
Array
(
[items] => Array
(
[item] => Array
(
[0] => Array
(
[@attributes] => Array
(
[abc] => 123
)
[@text] => item one
)
[1] => Array
(
[@attributes] => Array
(
[abc] => 456
)
[@text] => item two
)
)
)
)
$xml = new SimpleXMLElement($xmlString);
$xml is now an object. To get the value of an attribute:
$xml->something['id'];
Where 'id' is the name of the attribute.
While it's theoretically possible to write a generic conversion from XML to PHP or JSON structures, it is very hard to capture all the subtleties that might be present - the distinction between child elements and attributes, text content alongside attributes (as you have here) or even alongside child elements, multiple child nodes with the same name, whether order of child elements and text nodes is important (e.g. in XHTML or DocBook), etc, etc.
If you have a specific format you need to produce, it will generally be much easier to use an API - like SimpleXML - to loop over the XML and produce the structure you need.
You don't specify the structure you want to achieve, but the general approach given your input would be to loop over each item, and either access known attributes, or loop over each attribute:
$sxml = simplexml_load_string( $xml );
$final_array = array();
foreach ( $sxml->items->item as $xml_item )
{
$formatted_item = array();
// Text content of item
$formatted_item['content'] = (string)$xml_item;
// Specifically get 'abc' attribute
$formatted_item['abc'] = (string)$xml_item['abc'];
// Maybe one of the attributes is an integer
$formatted_item['foo_id'] = (int)$xml_item['foo_id'];
// Or maybe you want to loop over lots of possible attributes
foreach ( $xml_item->attributes() as $attr_name => $attr_value )
{
$formatted_item['attrib:' . $attr_name] = (string)$attr_value;
}
// Add it to a final list
$final_array[] = $formatted_item;
// Or maybe you want that array to be keyed on one of the attributes
$final_array[ (string)$xml_item['key'] ] = $formatted_item;
}
Here is a class I've found that is able to process XML into array very nicely: http://outlandish.com/blog/xml-to-json/ (backup). Converting to json is a matter of a json_encode() call.
Hello Everyone,
I'm trying to access data in a XML file:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://dublincore.org/documents/dcmi-namespace/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2013-04-15T12:14:31Z</responseDate>
<ListRecords>
<record>
<header>
<identifier>
a1b31ab2-9efe-11df-9922-efbb156aa6c1:01442b82-59a4-627e-800f-c63de74fc109
</identifier>
<datestamp>2012-08-16T14:42:52Z</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:description>...</dc:description>
<dc:date>1921</dc:date>
<dc:identifier>K11510</dc:identifier>
<dc:source>Waterschap Vallei &amp; Eem</dc:source>
<dc:source>...</dc:source>
<dc:source>610</dc:source>
<dc:coverage>Bunschoten</dc:coverage>
<dc:coverage>Veendijk</dc:coverage>
<dc:coverage>Spakenburg</dc:coverage>
</oai_dc:dc>
</metadata>
<about>...</about>
</record>
This is an example of the XML.
I need to access data like dc:date, dc:source, etc.
Does anyone have any ideas?
Best regards,
Tim
-- UPDATE --
I'm now trying this:
foreach( $xml->ListRecords as $records )
{
foreach( $records AS $record )
{
$data = $record->children( 'http://www.openarchives.org/OAI/2.0/oai_dc/' );
$rows = $data->children( 'http://purl.org/dc/elements/1.1/' );
echo $rows->date;
break;
}
break;
}
You have nested elements that are in different XML namespaces. Concretely, two additional namespaces are involved:
$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
$nsUriDc = 'http://purl.org/dc/elements/1.1/';
The first one is for the <oai_dc:dc> element, which contains the elements of the second namespace: the <dc:*> elements such as <dc:description>. Those are the elements you're looking for.
Your code already shows that you have a good idea of how this works:
$data = $record->children( 'http://www.openarchives.org/OAI/2.0/oai_dc/' );
$rows = $data->children( 'http://purl.org/dc/elements/1.1/' );
However, there is a small mistake: the $data children are not children of $record but of $record->metadata.
You also do not need to nest two foreach loops inside each other. A code example:
$nsUriOaiDc = 'http://www.openarchives.org/OAI/2.0/oai_dc/';
$nsUriDc = 'http://purl.org/dc/elements/1.1/';
$records = $xml->ListRecords->record;
foreach ($records as $record)
{
$data = $record->metadata->children($nsUriOaiDc);
$rows = $data->children($nsUriDc);
echo $rows->date;
break;
}
/** output: 1921 **/
If you are running into problems like these, you can make use of $record->asXML('php://output'); to show which element(s) you are currently traversing to.
I think this is what you're looking for. Hope it helps ;)
Use DOMDocument for this, for example to access dc:date:
$STR='
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dc="http://dublincore.org/documents/dcmi-namespace/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2013-04-15T12:14:31Z</responseDate>
<ListRecords>
<record>
<header> <identifier> a1b31ab2-9efe-11df-9922-efbb156aa6c1:01442b82-59a4-627e-800f-c63de74fc109 </identifier>
<datestamp>2012-08-16T14:42:52Z</datestamp>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:description>...</dc:description>
<dc:date>1921</dc:date>
<dc:identifier>K11510</dc:identifier>
<dc:source>Waterschap Vallei & Eem</dc:source>
<dc:source>...</dc:source>
<dc:source>610</dc:source>
<dc:coverage>Bunschoten</dc:coverage>
<dc:coverage>Veendijk</dc:coverage>
<dc:coverage>Spakenburg</dc:coverage>
</oai_dc:dc>
</metadata>
<about>...</about>
</record>';
$dom= new DOMDocument;
$STR= str_replace("&", "&amp;", $STR); // disguise &s going IN to loadXML()
// $dom->substituteEntities = true; // collapse &s going OUT to transformToXML()
$dom->recover = TRUE;
@$dom->loadHTML('<?xml encoding="UTF-8">' .$STR);
// dirty fix
foreach ($dom->childNodes as $item)
if ($item->nodeType == XML_PI_NODE)
$dom->removeChild($item); // remove hack
$dom->encoding = 'UTF-8'; // insert proper
print_r($dom->getElementsByTagName('dc')->item(0)->getElementsByTagName('date')->item(0)->textContent);
output:
1921
or access to dc:source
$source= $dom->getElementsByTagName('dc')->item(0)->getElementsByTagName('source');
foreach($source as $value){
echo $value->textContent."\n";
}
output:
Waterschap Vallei & Eem
...
610
or collect everything into an array
$array=array();
$source= $dom->getElementsByTagName('dc')->item(0)->getElementsByTagName("*");
foreach($source as $value){
$array[$value->localName][]=$value->textContent."\n";
}
print_r($array);
output:
Array
(
[description] => Array
(
[0] => ...
)
[date] => Array
(
[0] => 1921
)
[identifier] => Array
(
[0] => K11510
)
[source] => Array
(
[0] => Waterschap Vallei & Eem
[1] => ...
[2] => 610
)
[coverage] => Array
(
[0] => Bunschoten
[1] => Veendijk
[2] => Spakenburg
)
)
Using XPath makes dealing with namespaces more straightforward:
<?php
// load the XML into a DOM document
$doc = new DOMDocument;
$doc->load('oai-response.xml'); // or use $doc->loadXML($xml) for an XML string
// bind the DOM document to an XPath object
$xpath = new DOMXPath($doc);
// map all the XML namespaces to prefixes, for use in XPath queries
$xpath->registerNamespace('oai', 'http://www.openarchives.org/OAI/2.0/');
$xpath->registerNamespace('oai_dc', 'http://www.openarchives.org/OAI/2.0/oai_dc/');
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');
// identify each record using an XPath query
// collect data as either strings or arrays of strings
foreach ($xpath->query('oai:ListRecords/oai:record/oai:metadata/oai_dc:dc') as $item) {
$data = array(
'date' => $xpath->evaluate('string(dc:date)', $item), // $item is the context for this query
'source' => array(),
);
foreach ($xpath->query('dc:source', $item) as $source) {
$data['source'][] = $source->textContent;
}
print_r($data);
}