I am using XPath to parse text from a webpage, but it returns an object. How can I return this as a string?
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
libxml_use_internal_errors(FALSE);
$username = $xml->xpath("//span[@class='user']");
var_dump of the $username array:
object(SimpleXMLElement)#3 (2) { ["@attributes"]=> array(1) { ["class"]=> string(4) "user" } [0]=> string(11) "bubblebubble1210" }
list($node) = $username; // xpath() returns an array; take the first match
var_dump($node);
// object(SimpleXMLElement)#3 (1) { [0]=> string(11) "bubblebubble1210" }
$node will still be an object above, but you can cast it explicitly with (string) or use echo which will cast it implicitly.
Because xpath() returns an array, you can use $username[0]->asXML(); to get the full XML string, tags included, of that particular SimpleXMLElement object.
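A minimal, self-contained sketch of both suggestions, assuming a made-up page fragment and the placeholder username example_user (neither comes from the original post):

```php
<?php
// Hypothetical markup standing in for the real page source.
$source = '<html><body><span class="user">example_user</span></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
libxml_use_internal_errors(false);

// xpath() returns an array of SimpleXMLElement objects (empty if nothing matched).
$matches = $xml->xpath("//span[@class='user']");

// Casting the first match yields its text content as a plain string.
$username = isset($matches[0]) ? (string) $matches[0] : '';

// asXML() returns the element's markup rather than just its text.
$markup = isset($matches[0]) ? $matches[0]->asXML() : '';

var_dump($username);
var_dump($markup);
```

The isset() guard is a design choice for the case where the page contains no matching span; without it, indexing an empty result would raise a warning.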
Related
Using PHP, I would like to pull all the links out of an unordered list and put their text in an array. So the output would be: array[0]='Benefits', array[1]='Cost Savings', etc.
<ul>
<li><a href="#">Benefits</a></li>
<li><a href="#">Cost Savings</a></li>
<li><a href="#">Member listing</a></li>
</ul>
Using: preg_match_all('/<a href=\"(.*?)\"[.*]?>(.*?)<\/a>/i', $content, $matches);
I get:
array(3) { [0]=> array(3) { [0]=> string(24) "<a href="#">Benefits</a>" [1]=> string(28) "<a href="#">Cost Savings</a>" [2]=> string(30) "<a href="#">Member listing</a>" } [1]=> array(3) { [0]=> string(1) "#" [1]=> string(1) "#" [2]=> string(1) "#" } [2]=> array(3) { [0]=> string(8) "Benefits" [1]=> string(12) "Cost Savings" [2]=> string(14) "Member listing" } }
But I need to put it into one array.
To fetch the links, you can use DOMDocument and DOMXPath:
$html = '<html><body><ul>
<li><a href="#">Benefits</a></li>
<li><a href="#">Cost Savings</a></li>
<li><a href="#">Member listing</a></li>
</ul></body></html>';
$dom = new DOMDocument();
$dom->loadHTML( $html ); // loads the html into the class
$xpath = new DOMXPath( $dom );
$items = $xpath->query('//ul/li/a'); // matches every <a> inside an <li> of a <ul>, anywhere in the document
$array = array();
foreach( $items as $item )
{
$array[] = $item->nodeValue; // text of the link only; use $dom->saveHTML( $item ) if you want the element's HTML instead
}
// Array
// (
// [0] => Benefits
// [1] => Cost Savings
// [2] => Member listing
// )
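If the regular expression from the question is kept instead, the flat array is already available as the second capture group. The sketch below assumes the list markup shown in the question and swaps the original `[.*]?` for `[^>]*`, which is a substitution of mine, so that extra attributes on the link are tolerated:

```php
<?php
// Sample markup assumed from the question.
$content = '<ul>
<li><a href="#">Benefits</a></li>
<li><a href="#">Cost Savings</a></li>
<li><a href="#">Member listing</a></li>
</ul>';

// Group 1 captures the href value, group 2 captures the link text.
preg_match_all('/<a href="(.*?)"[^>]*>(.*?)<\/a>/i', $content, $matches);

// $matches[2] holds one entry per link, in document order.
$labels = $matches[2];

print_r($labels);
```

A regular expression is fine for markup this regular; for arbitrary HTML the DOM approach above is the more robust choice.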
I am uploading an XML file, converting its data into a PHP array, and storing it in a database. My problem is that an empty element is automatically converted into an empty array inside the loop, but I want it to be an empty string. Because I get an array, it is difficult to store the value in the database. Please help me with a solution.
Current Output :
array(19) {
["EMP_NAME"]=>
string(12) "ABC"
["EMP_ADDRESS"]=>
string(1) "MUMBAI"
["DEPARTMENT"]=>
string(1) "IT"
["LOCATION"]=>
array(0) {
}
}
Expected Output :
array(19) {
["EMP_NAME"]=>
string(12) "ABC"
["EMP_ADDRESS"]=>
string(1) "MUMBAI"
["DEPARTMENT"]=>
string(1) "IT"
["LOCATION"]=>
string(1) ""
}
This is my php code to get data from xml and looping through array.
$xml = file_get_contents('uploads/data.xml');
$xml = simplexml_load_string($xml);
$xml_array = json_decode(json_encode((array) $xml), 1);
$data = ($xml_array);
foreach($data as $val){
//var_dump($val);
}
You can fix the array after the conversion by replacing every empty array with an empty string:
<?php
function fixContent(&$val, $key = null) {
if (is_array($val)) {
if (!$val) $val = '';
else fix($val);
}
}
function fix(&$arr) {
array_walk($arr, 'fixContent');
array_walk_recursive($arr, 'fixContent');
}
$xml = "<?xml version='1.0'?>
<document>
<title>Forty What?</title>
<from>Joe</from>
<to>Jane</to>
<body></body>
</document>";
$xml = simplexml_load_string($xml);
$xml_array = json_decode(json_encode((array) $xml), 1);
fix($xml_array);
$data = $xml_array;
var_dump($data);
?>
Output:
array(4) {
["title"]=>
string(11) "Forty What?"
["from"]=>
string(3) "Joe"
["to"]=>
string(4) "Jane"
["body"]=>
string(0) ""
}
Demo: https://paiza.io/projects/qQC5pfvhGz_FniCIK6S_9g
Generic conversions are often not the best solution. They allow the source to control the output - especially if you use debug features like serializing a SimpleXMLElement instance.
Read the data from XML and add it to a result. This puts your code in control of the output. You can change the keys, validate and convert values, add defaults, ...
$xml = <<<'XML'
<EMP>
<EMP_NAME>ABC</EMP_NAME>
<EMP_ADDRESS>MUMBAI</EMP_ADDRESS>
<DEPARTMENT>IT</DEPARTMENT>
<LOCATION></LOCATION>
</EMP>
XML;
$employee = new SimpleXMLElement($xml);
$result = [
'EMP_NAME' => (string)$employee->EMP_NAME,
'EMP_ADDRESS' => (string)$employee->EMP_ADDRESS,
'DEPARTMENT' => (string)$employee->DEPARTMENT,
'LOCATION' => (string)$employee->LOCATION
];
var_dump($result);
Output:
array(4) {
["EMP_NAME"]=>
string(3) "ABC"
["EMP_ADDRESS"]=>
string(6) "MUMBAI"
["DEPARTMENT"]=>
string(2) "IT"
["LOCATION"]=>
string(0) ""
}
Or with DOM:
$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXpath($document);
$result = [
'EMP_NAME' => $xpath->evaluate('string(/EMP/EMP_NAME)'),
'EMP_ADDRESS' => $xpath->evaluate('string(/EMP/EMP_ADDRESS)'),
'DEPARTMENT' => $xpath->evaluate('string(/EMP/DEPARTMENT)'),
'LOCATION' => $xpath->evaluate('string(/EMP/LOCATION)')
];
var_dump($result);
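When there are many fields, the same idea can be driven by a list of expected keys. This is only a sketch: the $fields list is an assumption, and PHONE is a hypothetical element that is absent from the sample XML, included to show that a missing element also yields an empty string:

```php
<?php
$xml = <<<'XML'
<EMP>
  <EMP_NAME>ABC</EMP_NAME>
  <EMP_ADDRESS>MUMBAI</EMP_ADDRESS>
  <DEPARTMENT>IT</DEPARTMENT>
  <LOCATION></LOCATION>
</EMP>
XML;

// Keys the code expects; PHONE is hypothetical and not present in the XML.
$fields = ['EMP_NAME', 'EMP_ADDRESS', 'DEPARTMENT', 'LOCATION', 'PHONE'];

$employee = new SimpleXMLElement($xml);
$result = [];
foreach ($fields as $field) {
    // Casting an empty or missing child element to string gives ''.
    $result[$field] = trim((string) $employee->{$field});
}

var_dump($result);
```

Every value in $result is a string, so the array can be passed to a database layer without special handling for empty elements.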
<div>A/C:front<span style="color:red;margin:8px">/
</span>Anti-Lock Brakes<span style="color:red;margin:8px">/
</span>Passenger Airbag<span style="color:red;margin:8px">/
</span>Power Mirrors<span style="color:red;margin:8px">/
</span>Power Steering<span style="color:red;margin:8px">/
</span>Power Windows<span style="color:red;margin:8px">/
</span>Driver Airbag<span style="color:red;margin:8px">/
</span>No Accidents<span style="color:red;margin:8px">/
</span>Power Door Locks<span style="color:red;margin:8px">/</span>
</div>
It appears like this on the website:
A/C:front/Anti-Lock Brakes/Passenger Airbag/Power Mirrors/Power Steering/Power Windows/Driver Airbag/No Accidents/Power Door Locks/
I used $content = file_get_contents('url'); and now I need to sift through the data.
I need to fetch each one of the options above and put them in an array, something like:
$option = array("A/C:front","Anti-Lock Brakes","Passenger Airbag",....);
Any idea how to do this using PHP?
With the source code everything is easier:
<?php
$dom = new DOMDocument;
@$dom->loadHTMLFile('http://www.sayuri.co.jp/used-cars/B37659-Nissan-Tiida%20Latio-japanese-used-cars');
$xpath = new DOMXPath($dom);
$nodes = iterator_to_array($xpath->query('//h4/following-sibling::div')->item(0)->childNodes);
$items = array_map(function ($node) {
return $node->nodeValue;
}, array_filter($nodes, function ($node) {
return $node->nodeValue != '/';
}));
var_dump($items);
This gave me the following:
array(9) {
[0]=>
string(9) "A/C:front"
[2]=>
string(16) "Anti-Lock Brakes"
[4]=>
string(16) "Passenger Airbag"
[6]=>
string(13) "Power Mirrors"
[8]=>
string(14) "Power Steering"
[10]=>
string(13) "Power Windows"
[12]=>
string(13) "Driver Airbag"
[14]=>
string(12) "No Accidents"
[16]=>
string(16) "Power Door Locks"
}
You might want to use array_values() on $items to reset the indexes. That's all!
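A variation that works directly on the snippet from the question, without fetching the live page: select only the text nodes that are direct children of the div, so the separator spans never enter the result and no re-indexing is needed. The shortened sample markup and the `//div/text()` expression are assumptions for illustration; on the real page the expression would need to be narrowed to the right div:

```php
<?php
// Shortened copy of the markup from the question.
$html = '<div>A/C:front<span style="color:red;margin:8px">/
</span>Anti-Lock Brakes<span style="color:red;margin:8px">/
</span>Passenger Airbag<span style="color:red;margin:8px">/</span>
</div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_use_internal_errors(false);

$xpath = new DOMXPath($dom);

$options = [];
// text() selects the div's own text nodes, not the text inside the spans.
foreach ($xpath->query('//div/text()') as $textNode) {
    $text = trim($textNode->nodeValue);
    if ($text !== '') { // skip whitespace-only nodes
        $options[] = $text;
    }
}

print_r($options);
```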
Sounds like you need DOMDocument. Specifically, the getElementsByTagName function. So using your example, I suggest this. Please adjust to suit your needs:
// Get the contents of the URL.
$content = file_get_contents('url');
// Parse the HTML using `DOMDocument`
$dom = new DOMDocument();
@$dom->loadHTML($content);
// Search the parsed DOM structure for `span` elements.
$option = array();
foreach($dom->getElementsByTagName('span') as $span){
$option[] = $span->nodeValue;
}
// Dumps the values in `option` for review.
echo '<pre>';
print_r($option);
echo '</pre>';
Considering this code:
<div class="a">foo</div>
<div class="a"><div id="1">bar</div></div>
If I want to fetch all the values of divs with class a, I'll do the following query:
$q = $xpath->query('//div[@class="a"]');
However, I'll get this result:
foo
bar
But I want to get the actual value including the children tags. So it'll look like that:
foo
<div id="1">bar</div>
How can I accomplish that with XPath and DOMDocument only?
Solved by the function provided here.
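Note that the nodeValue property in PHP DOM does not behave like .innerHTML in a browser: $node->nodeValue returns only the text content of the node, with the child tags stripped, which is exactly the result shown in the question. To get the inner HTML, serialize each child node with saveHTML() and concatenate the results.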
You can try to use
$xml = '<?xml version=\'1.0\' encoding=\'UTF-8\' ?>
<root>
<div class="a">foo</div>
<div class="a"><div id="1">bar</div></div>
</root>';
$xml = simplexml_load_string($xml);
var_dump($xml->xpath('//div[@class="a"]'));
But in this case you will have to iterate objects.
Output:
array(2) {
[0]=>
object(SimpleXMLElement)#2 (2) {
["@attributes"]=>
array(1) {
["class"]=>
string(1) "a"
}
[0]=>
string(3) "foo"
}
[1]=>
object(SimpleXMLElement)#3 (2) {
["@attributes"]=>
array(1) {
["class"]=>
string(1) "a"
}
["div"]=>
string(3) "bar"
}
}
Try something like:
$doc = new DOMDocument;
$doc->loadHTML('<div>Your HTML here.</div>');
$xpath = new DOMXpath($doc);
$node = $xpath->query('//div[@class="a"]')->item(0);
$html = $node->ownerDocument->saveHTML($node); // Get HTML of DOMElement.
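Note that saveHTML($node) returns the outer HTML, including the matched div itself. To get only the inner markup asked for in the question, one option is to serialize each child node and concatenate the results. innerHtml() below is a hypothetical helper name, not a built-in DOM function:

```php
<?php
// Hypothetical helper: concatenates the HTML of every child of $node.
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div class="a">foo</div><div class="a"><div id="1">bar</div></div>');
libxml_use_internal_errors(false);

$xpath = new DOMXPath($doc);

$results = [];
foreach ($xpath->query('//div[@class="a"]') as $node) {
    $results[] = innerHtml($node);
}

print_r($results);
```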
I am able to pull the necessary information using XPath, as var_dump shows for the following code. When I add a foreach loop to return all ["href"] values, I get a blank page. Any ideas where I am messing up?
$dom = new DOMDocument();
@$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
$links = $rss->href;
foreach ($links as $link){
echo $link;
}
Here is the array of information.
array(96) {
[0]=>
object(SimpleXMLElement)#3 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(49) "/p/18351/test1.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(36) "test1"
}
[1]=>
object(SimpleXMLElement)#4 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(43) "/p/18351/test2.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(30) "test2"
}
[2]=>
object(SimpleXMLElement)#5 (2) {
["@attributes"]=>
array(2) {
["href"]=>
string(48) "/p/18351/test3.html"
["class"]=>
string(10) "highzoom1"
}
[0]=>
string(35) "test3"
}
Instead of:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
use:
$hrefs = $xml->xpath("/html/body//a[@class='highzoom1']/@href");
The original XPath expression (the first one above) selects every a element whose class attribute has the value 'highzoom1' and that is a descendant of a body element that is a child of the top element (named html) in the XML document.
However, you want to select the href attributes of these a elements, not the a elements themselves.
The second XPath expression above selects exactly the href attributes of these a elements.
$links = $rss->href;
will never work, as $rss is an array of SimpleXMLElement objects, and an array has no href property. Instead, loop over the array and read the attribute with array-style access:
$rss = $xml->xpath("/html/body//a[@class='highzoom1']");
foreach($rss as $link) {
echo $link['href']; // SimpleXML exposes attributes via array access, not ->
}
Or you can index into $rss directly:
echo $rss[5]['href']; // echo out the href of the 6th link found.
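Both suggestions can be combined into a short, self-contained sketch. The markup below is made up for illustration; only the class name and the href paths are borrowed from the question:

```php
<?php
// Hypothetical page fragment; the third link lacks the class and must be ignored.
$source = '<html><body>'
    . '<a class="highzoom1" href="/p/18351/test1.html">test1</a>'
    . '<a class="highzoom1" href="/p/18351/test2.html">test2</a>'
    . '<a href="/other.html">other</a>'
    . '</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
libxml_use_internal_errors(false);

// Select the href attributes themselves rather than the a elements.
$attributes = $xml->xpath("/html/body//a[@class='highzoom1']/@href");

$hrefs = [];
foreach ($attributes as $attribute) {
    // Each result is a SimpleXMLElement; cast it to get the attribute value.
    $hrefs[] = (string) $attribute;
}

print_r($hrefs);
```

Casting inside the loop leaves $hrefs as a plain array of strings, which is easier to store or compare than SimpleXMLElement objects.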