How to reach till the desired node in xpath result? - php

As mentioned in the question title, I am trying the code below to reach the desired node in an xpath result.
<?php
$xpath = '//*[@id="topsection"]/div[3]/div[2]/div[1]/div/div[1]';
$html = new DOMDocument();
@$html->loadHTMLFile('http://www.flipkart.com/samsung-galaxy-ace-s5830/p/itmdfndpgz4nbuft');
$xml = simplexml_import_dom($html);
if (!$xml) {
echo 'Error while parsing the document';
exit;
}
$source = $xml->xpath($xpath);
echo "<pre>";
print_r($source);
?>
This is the source code I am using to scrape the price from an e-commerce site.
It works and gives the output below:
Array
(
[0] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => line
)
[div] => SimpleXMLElement Object
(
[@attributes] => Array
(
[class] => prices
[itemprop] => offers
[itemscope] =>
[itemtype] => http://schema.org/Offer
)
[span] => Rs. 10300
[div] => (Prices inclusive of taxes)
[meta] => Array
(
[0] => SimpleXMLElement Object
(
[@attributes] => Array
(
[itemprop] => price
[content] => Rs. 10300
)
)
[1] => SimpleXMLElement Object
(
[@attributes] => Array
(
[itemprop] => priceCurrency
[content] => INR
)
)
)
)
)
)
Now, how do I directly reach [content] => Rs. 10300?
I tried:
echo $source[0]['div']['meta']['@attributes']['content']
but it doesn't work.

Try echo (String) $source[0]->div->meta[0]['content'];.
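Basically, when print_r shows an element as an object, you can't walk into it like a nested array: child elements are reached with the object -> syntax, while array syntax is used for attributes and for picking an element by index.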

The print_r of a SimpleXMLElement does not show the real object structure, so you need to know how access works:
$source[0]->div->meta['content']
        |    |    |      `- attribute access
        |    |    `- element access, defaults to the first one
        |    `- element access, defaults to the first one
        |
    standard array access to get
    the first SimpleXMLElement of xpath()
    operation
With your address, that expression yields the following (print_r again):
SimpleXMLElement Object
(
[0] => Rs. 10300
)
Cast it to string in case you want the text-value:
$rs = (string) $source[0]->div->meta['content'];
However you can already directly access that node with the xpath expression (if that is a single case).
Learn more about how to access a SimpleXMLElement in the Basic SimpleXML usage examples in the PHP manual.
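Both access styles from this answer can be tried without fetching the live page. The following is a minimal sketch, assuming a small inline document that mirrors the print_r output in the question; the markup and the variable names ($markup, $line, $priceViaXpath) are made up for illustration and are not the real page source.

```php
<?php
// Assumed sample markup, reduced from the print_r output in the question.
$markup = <<<XML
<div class="line">
  <div class="prices" itemprop="offers">
    <span>Rs. 10300</span>
    <div>(Prices inclusive of taxes)</div>
    <meta itemprop="price" content="Rs. 10300"/>
    <meta itemprop="priceCurrency" content="INR"/>
  </div>
</div>
XML;

$line = simplexml_load_string($markup);

// Property access walks to child elements, array access reads an attribute.
$price = (string) $line->div->meta['content'];

// Alternative: let the XPath expression select the attribute node itself.
$nodes = $line->xpath('//meta[@itemprop="price"]/@content');
$priceViaXpath = isset($nodes[0]) ? (string) $nodes[0] : null;

echo $price, PHP_EOL;
echo $priceViaXpath, PHP_EOL;
```

Selecting by itemprop rather than by position is a design choice of this sketch: it should keep working if the order of the meta elements changes.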

Related

SimpleXML Skipping Attributes

Test XML:
<?xml version="1.0" encoding="UTF-8"?>
<Transfer>
<ABR recordLastUpdatedDate="20180329" replaced="N">
<ABN status="ACT" ABNStatusFromDate="20000214">80007321682</ABN>
<EntityType>
<EntityTypeInd>PUB</EntityTypeInd>
<EntityTypeText>Australian Public Company</EntityTypeText>
</EntityType>
<MainEntity>
<NonIndividualName type="MN">
<NonIndividualNameText>BLACK CABS COMBINED PTY LTD</NonIndividualNameText>
</NonIndividualName>
<BusinessAddress>
<AddressDetails>
<State>VIC</State>
<Postcode>3166</Postcode>
</AddressDetails>
</BusinessAddress>
</MainEntity>
</ABR>
</Transfer>
PHP Script:
$f='test.xml';
$reader=new XMLReader();
$reader->open($f);
while($reader->read()){
if($reader->nodeType==XMLReader::ELEMENT && $reader->name=='ABR'){
$doc=new DOMDocument('1.0','UTF-8');
$xml=simplexml_import_dom($doc->importNode($reader->expand(),true));
print_r($xml);
}
}
$reader->close();
PHP Output:
SimpleXMLElement Object
(
[@attributes] => Array
(
[recordLastUpdatedDate] => 20180329
[replaced] => N
)
[ABN] => 80007321682
[EntityType] => SimpleXMLElement Object
(
[EntityTypeInd] => PUB
[EntityTypeText] => Australian Public Company
)
[MainEntity] => SimpleXMLElement Object
(
[NonIndividualName] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => MN
)
[NonIndividualNameText] => BLACK CABS COMBINED PTY LTD
)
[BusinessAddress] => SimpleXMLElement Object
(
[AddressDetails] => SimpleXMLElement Object
(
[State] => VIC
[Postcode] => 3166
)
)
)
)
The Problem:
The attributes for the ABN element (status and ABNStatusFromDate) are not in the output, even though other attributes are.
Please help me understand why those attributes in particular are missing.
PS - Dummy text so SO doesn't give me warnings about my post being mostly code
Answer: print_r is not meant to be used to display a SimpleXML object.
I can access the attribute directly via $xml->ABN['status'].
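As a minimal sketch of that direct access, the snippet below loads a reduced copy of the test XML (only the ABR and ABN elements, an assumption made to keep it short) and reads the attributes that print_r did not show. In the output above, the element whose attributes are missing is the one shown as plain text.

```php
<?php
// Reduced copy of the test XML from the question.
$xml = simplexml_load_string(
    '<ABR recordLastUpdatedDate="20180329" replaced="N">'
    . '<ABN status="ACT" ABNStatusFromDate="20000214">80007321682</ABN>'
    . '</ABR>'
);

// Array syntax reads attributes; the cast turns each into a plain string.
$status   = (string) $xml->ABN['status'];
$fromDate = (string) $xml->ABN['ABNStatusFromDate'];
$abn      = (string) $xml->ABN;

echo "$abn $status $fromDate", PHP_EOL;

// asXML() serializes the real node, attributes included,
// which makes it a more reliable debugging aid than print_r.
echo $xml->ABN->asXML(), PHP_EOL;
```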

Access xml object

I'm using YouTube API. I got following result from API :
SimpleXMLElement Object
(
[@attributes] => Array
(
[rel] => alternate
[href] => http://www.youtube.com/watch?v=blabla
)
)
I'm confused by this object. I want to access @attributes. How can I do it?
The @attributes part of the print_r output is just the element's attributes, which can be accessed via $obj['attrname'].
<?php
$obj = new SimpleXMLElement('<foo rel="alternate" href="http://www.youtube.com/watch?v=blabla" />');
print_r($obj); // to verify that the sample data fits your actual data
echo $obj['rel'], ' | ', $obj['href'];
prints
SimpleXMLElement Object
(
[@attributes] => Array
(
[rel] => alternate
[href] => http://www.youtube.com/watch?v=blabla
)
)
alternate | http://www.youtube.com/watch?v=blabla
See also Example #5 Using attributes in the SimpleXML documentation.
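If the attribute names are not known in advance, they can also be collected in a loop. This is a small sketch using SimpleXMLElement::attributes() on the same sample element; the $attrs array is a name introduced here for illustration.

```php
<?php
$obj = new SimpleXMLElement('<foo rel="alternate" href="http://www.youtube.com/watch?v=blabla" />');

// attributes() is iterable: keys are attribute names, values are
// SimpleXMLElement objects that need a string cast.
$attrs = [];
foreach ($obj->attributes() as $name => $value) {
    $attrs[$name] = (string) $value;
}

print_r($attrs);
```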

Export Array of SimpleXMLElement objects to MySQL Database

I have a SOAP Response from a Web Service and have extracted the XML data as a SimpleXMLElement. I have then iterated through this object to extract the various fields I need and save them into an array, which becomes an array of SimpleXMLElement objects.
I am now trying to export this data into a MySQL Database which, according to my research, means turning the array into a String and then using mysql_query("INSERT INTO (whatever) VALUES (whatever)");. I have tried implode and serialize but neither works and I get the error:
Fatal error: Uncaught exception 'Exception' with message 'Serialization of 'SimpleXMLElement' is not allowed'
This is what the array I have created from the SimpleXMLElement looks like:
Array
(
[0] => Array
(
[uid] => SimpleXMLElement Object
(
[0] => WOS:000238186400009
)
[journal] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => source
)
)
[publication] => SimpleXMLElement Object
(
[@attributes] => Array
(
[type] => item
)
[0] => Abundance of hedgehogs (Erinaceus europaeus) in relation to the density and distribution of badgers (Meles meles)
)
[year] => 2006
[author1] => SimpleXMLElement Object
(
[0] => Young, RP
)
[address] => SimpleXMLElement Object
(
[0] => Cent Sci Lab, Sand Hutton, Yorks, England
)
[author2] => SimpleXMLElement Object
(
[0] => Davison, J
)
[author3] => SimpleXMLElement Object
(
[0] => Trewby, ID
)
[citations] => SimpleXMLElement Object
(
[@attributes] => Array
(
[local_count] => 15
[coll_id] => WOS
)
)
) ... etc ...
)
Can anyone help me with the method to get this data into my database, please? Do I need to change it into (yet) another format?
You have to iterate through your array to create a new array filled with strings instead of SimpleXMLElement objects, such as:
<?php
// your array (already built)
$arraySimpleXml = array(
"example1" => new SimpleXMLElement("<test>value</test>"),
"example2" => new SimpleXMLElement("<test>value2</test>")
);
// output array, to store in database
$result = array();
foreach($arraySimpleXml as $key => $simpleXml) {
$result[$key] = $simpleXml->asXML();
}
// gets your result as a string => you can now insert it into mysql
$dbInsertion = serialize($result);
?>
So I worked out how to change the data into a standard array rather than an array of SimpleXMLElements so that I can successfully insert it into a MySQL database.
When iterating the SimpleXMLElement object to extract the data I needed I cast the type as String so that now my array has the format (as opposed to above):
Array
(
[0] => Array
(
[uid] => WOS:000238186400009
[journal] => JOURNAL OF ZOOLOGY
[publication] => Abundance of hedgehogs (Erinaceus europaeus) in relation to the density and distribution of badgers (Meles meles)
[year] => 2006
[author1] => Young, RP
[address] => Cent Sci Lab, Sand Hutton, Yorks, England
[author2] => Davison, J
[author3] => Trewby, ID
[citations] => 15
)
)
Thought I'd post this in case anyone has a similar problem in future. To do this, when iterating the data instead of:
$uid = $record->UID;
I did:
$uid = (string)$record->UID;
I did this for each of the data fields I required. The cast ensures the data is stored as a String, which removes the SimpleXMLElement wrapper.
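A compact sketch of that casting step is shown below. The record layout is hypothetical: only the UID element name comes from the post, while the REC wrapper, the year and citations elements and the $row array are assumptions chosen to resemble the dumped array.

```php
<?php
// Hypothetical record; element names other than UID are assumptions.
$record = new SimpleXMLElement(
    '<REC>'
    . '<UID>WOS:000238186400009</UID>'
    . '<year>2006</year>'
    . '<citations local_count="15" coll_id="WOS"/>'
    . '</REC>'
);

// Cast every field while building the row so no SimpleXMLElement survives.
$row = [
    'uid'       => (string) $record->UID,
    'year'      => (int) $record->year,
    'citations' => (int) $record->citations['local_count'],
];

// The row now holds only scalars, so serialize() and implode() both work.
$serialized = serialize($row);
echo implode(', ', $row), PHP_EOL;
```

Because the values are plain scalars at this point, they can also be bound as parameters of a prepared statement rather than concatenated into the query string.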

How to set attribute for nodes with text content?

I am trying to iterate over a set of nodes returned by xpath and set a certain attribute on each node. However, it works only for nodes without content or with empty (whitespace) content. I have tried two approaches with the same result (maybe they are the same at some deeper level, I don't know). The commented-out line is the second approach.
$temp = simplexml_load_string (
'<toolbox>
<hammer/>
<screwdriver> </screwdriver>
<knife>
sharp
</knife>
</toolbox>' );
echo "vanilla toolbox: ";
print_r($temp);
$nodes = $temp->xpath('//*[not(@id)]');
foreach($nodes as $obj) {
$tempdom = dom_import_simplexml($obj);
$tempdom->setAttributeNode(new DOMAttr('id', 5));
//$obj->addAttribute('bagr', 5);
}
echo "processed toolbox: ";
print_r($temp);
This is the output. The id attribute is missing on the knife node:
vanilla toolbox: SimpleXMLElement Object
(
[hammer] => SimpleXMLElement Object
(
)
[screwdriver] => SimpleXMLElement Object
(
[0] =>
)
[knife] =>
sharp
)
processed toolbox: SimpleXMLElement Object
(
[@attributes] => Array
(
[id] => 5
)
[hammer] => SimpleXMLElement Object
(
[@attributes] => Array
(
[id] => 5
)
)
[screwdriver] => SimpleXMLElement Object
(
[@attributes] => Array
(
[id] => 5
)
[0] =>
)
[knife] =>
sharp
I'm unable to reproduce what you describe. The changed XML is:
<?xml version="1.0"?>
<toolbox id="5">
<hammer id="5"/>
<screwdriver id="5"> </screwdriver>
<knife id="5">
sharp
</knife>
</toolbox>
It's exactly your code; maybe you're using a different LIBXML version? See the LIBXML_VERSION constant (codepad viper has 20626 (2.6.26)).
More likely, though, it is just the print_r output for a SimpleXMLElement object.
It does not show the attributes of the last element, even on a brand new object, but the attribute is still accessible.
You will see when you print_r($temp->knife['id']); that the attribute is set (as you can see in the earlier XML output).
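A short way to check this is to compare what asXML() serializes with what array access returns. The sketch below uses a trimmed toolbox (an assumption to keep it brief) and the addAttribute() variant from the commented-out line, here writing an id attribute.

```php
<?php
$toolbox = simplexml_load_string('<toolbox><hammer/><knife>sharp</knife></toolbox>');

// Add an id to every element that does not have one yet.
foreach ($toolbox->xpath('//*[not(@id)]') as $node) {
    $node->addAttribute('id', '5');
}

// The attribute is present on the text-only element as well.
$knifeId = (string) $toolbox->knife['id'];
echo $knifeId, PHP_EOL;

// asXML() shows the actual document, independent of print_r's view.
echo $toolbox->asXML(), PHP_EOL;
```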

php SimpleXml & arrays issue

When I print_r($var) I get the result below.
SimpleXMLElement Object
(
[SEND_FILE] => SimpleXMLElement Object
(
[FILEID] => 123
[GUID] => 456
[SUMMARY] => SimpleXMLElement Object
(
[NB_PAYMENTS] => 1
)
)
)
How can I get the value of the FILEID element in a variable? If I do
print $result->SEND_FILE->FILEID[0]
then I just get the number - what I want, no mention of a SimpleXML Object.
But if I put this variable in an array, as such
$res['file_id'] = $result->SEND_FILE->FILEID[0]
and then print_r($res) I get:
Array
(
[file_id] => SimpleXMLElement Object
(
[0] => 307466
)
)
How can I get it to remove the [0] / SimpleXMLElement Object?
This may not look very elegant, but try casting the result to an integer (if the type is known):
$res['file_id'] = (int)$result->SEND_FILE->FILEID[0]
Why do you append the [0] at the end? You don't need it. You should simply do
print $result->SEND_FILE->FILEID;
And that should be enough.
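Putting both answers together, a minimal sketch might look like the following. The RESULT root element and the sample values are assumptions based on the print_r output in the question.

```php
<?php
// Assumed sample document matching the structure shown in the question.
$result = simplexml_load_string(
    '<RESULT><SEND_FILE><FILEID>123</FILEID><GUID>456</GUID></SEND_FILE></RESULT>'
);

$res = [];
// Without a cast the array would hold a SimpleXMLElement object.
$res['file_id'] = (string) $result->SEND_FILE->FILEID;
// Cast to int instead when the value is known to be numeric.
$res['guid'] = (int) $result->SEND_FILE->GUID;

print_r($res);
```

print (like echo) converts the element to a string implicitly, which is why the question's print statement showed only the number; storing the element in an array performs no such conversion.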
