Finding value of nodes using XMLDOM in PHP

I need to extract information from an XML file using XMLDOM.
Below is myroot.xml:
<?xml version='1.0' encoding='ISO-8859-1'?>
<myroot xml:lang='en'>
<delta>
<history>
<detail>
<id>one</id>
<degree>
<dname>alpha</dname>
<dates>
<StartDate>
<Year>1998</Year>
</StartDate>
<EndDate>
<Year>2002</Year>
</EndDate>
</dates>
</degree>
</detail>
<detail>
<id>two</id>
<degree>
<dname>beta</dname>
<dates>
<StartDate>
<Year>2006</Year>
</StartDate>
<EndDate>
<Year>2008</Year>
</EndDate>
</dates>
</degree>
</detail>
</history>
</delta>
</myroot>
Here is my code:
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$rootxmldoc = $doc->load('myroot.xml');
$xpath = new DOMXPath($rootxmldoc);
$items = $hrxml_obj->getElementsByTagName("detail");
$subitemarray = array();
$icounter = 0;
foreach ($items as $item) {
$query = "//dates/*/Year"; //xpath of all occurrence of Year
$entries = $xpath->query($query, $item);
foreach ($entries as $entry) {
$dates["startdate"] = "todo"; //extract StartDate
$dates["enddate"] = "todo"; //extract EndDate
}
$subitemarray[$icounter++] = dates;
}
var_dump($subitemarray);
Ideally I'd like to extract the dates using XPath, but I can't get it to work. Any help is appreciated. The problem is how to use XPath while looping.

With XPath, go directly to your dates tag, and then use DOMElement::getElementsByTagName() to get StartDate and EndDate. (You could also reach the dates tag with DOMDocument::getElementsByTagName(), but XPath gives you more flexibility should you need it.) getElementsByTagName() returns a DOMNodeList, but if the structure is constant you know you only need the first element of the list. So:
// $xml omitted; saved in a variable for testing purposes
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$items = $doc->getElementsByTagName("detail");
$subitemarray = array();
$icounter = 0;
foreach ($items as $item) {
$query = ".//dates"; // leading "." keeps the search inside the current detail element
$entries = $xpath->query($query, $item);
foreach ($entries as $entry) {
$startDate = $entry->getElementsByTagName("StartDate")->item(0)->nodeValue;
$endDate = $entry->getElementsByTagName("EndDate")->item(0)->nodeValue;
$dates["startdate"] = $startDate; //extract StartDate
$dates["enddate"] = $endDate; //extract EndDate
}
$subitemarray[$icounter++] = $dates;
}
var_dump($subitemarray);
Or only with XPath:
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$items = $doc->getElementsByTagName("detail");
$subitemarray = array();
$icounter = 0;
foreach ($items as $item) {
$queryStart = ".//dates/StartDate"; // relative to the current detail ($item)
$entriesStart = $xpath->query($queryStart, $item);
$dates["startdate"] = $entriesStart->item(0)->nodeValue;
$queryEnd = ".//dates/EndDate";
$entriesEnd = $xpath->query($queryEnd, $item);
$dates["enddate"] = $entriesEnd->item(0)->nodeValue;
$subitemarray[$icounter++] = $dates;
}
var_dump($subitemarray);
And lastly, using only one XPath query:
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$items = $doc->getElementsByTagName("detail");
$subitemarray = array();
$icounter = 0;
foreach ($items as $item) {
$query = ".//dates/*[contains(local-name(), 'Date')]"; // relative to the current detail
$entries = $xpath->query($query, $item);
$dates["startdate"] = $entries->item(0)->nodeValue;
$dates["enddate"] = $entries->item(1)->nodeValue;
$subitemarray[$icounter++] = $dates;
}
var_dump($subitemarray);
The query gets every child of a dates element inside the current detail element whose name contains the word "Date". Again, if the structure is constant, you can assume that the first result is StartDate and the second is EndDate.
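If you only need the text, DOMXPath::evaluate() can return a string directly when the expression is wrapped in string(). This avoids indexing into node lists. The following is a minimal sketch that assumes the question's XML, inlined here (with the closing </myroot> tag and without the XML declaration), is the input. The leading "." in each expression is what keeps the lookup inside the current detail element.

```php
<?php
// The question's XML, inlined so the example is self-contained.
$xml = <<<'XML'
<myroot xml:lang='en'>
  <delta>
    <history>
      <detail>
        <id>one</id>
        <degree>
          <dname>alpha</dname>
          <dates>
            <StartDate><Year>1998</Year></StartDate>
            <EndDate><Year>2002</Year></EndDate>
          </dates>
        </degree>
      </detail>
      <detail>
        <id>two</id>
        <degree>
          <dname>beta</dname>
          <dates>
            <StartDate><Year>2006</Year></StartDate>
            <EndDate><Year>2008</Year></EndDate>
          </dates>
        </degree>
      </detail>
    </history>
  </delta>
</myroot>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$subitemarray = array();
foreach ($xpath->query('//detail') as $detail) {
    // string(...) makes evaluate() return the text of the first match
    // (or "" if nothing matches); ".//" restricts the search to $detail.
    $subitemarray[] = array(
        'startdate' => $xpath->evaluate('string(.//dates/StartDate/Year)', $detail),
        'enddate'   => $xpath->evaluate('string(.//dates/EndDate/Year)', $detail),
    );
}
var_dump($subitemarray);
```

Dropping the leading "." would make both entries report 1998 and 2002. A path that starts with // is evaluated from the document root, whatever context node you pass.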

Related

Why doesn't the query match the DOM?

Here is my code:
$res = file_get_contents("http://www.lenzor.com/photo/search/index/type/user/%D8%B9%D9%84%DB%8C//text/%D9%81%D8%A7%D8%B7%D9%85%D9%87");
$doc = new \DOMDocument();
@$doc->loadHTMLFile($res);
$xpath = new \DOMXpath($doc);
$links = $xpath->query("//ul[@class='user_box']/li");
$result = array();
if (!is_null($links)) {
foreach ($links as $link) {
$href = $link->getAttribute('class');
$result[] = [$href];
}
}
print_r($result);
Here is the content I'm working on, i.e. the output of echo $res.
The result of my code is an empty array, so $links is empty and the foreach is never executed. Why doesn't the //ul[@class='user_box']/li query match the DOM?
The expected result is an array containing the class attribute of each li.
Try this. There are a few mistakes in your code:
1. Search for '//ul[@class="user_box clearfix"]/li'. The class attribute in that HTML source is class="user_box clearfix", so it contains two classes, and @class="..." compares against the entire attribute value.
2. Use loadHTML instead of loadHTMLFile. $res already holds the HTML markup returned by file_get_contents(), not a file name.
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$res = file_get_contents("http://www.lenzor.com/photo/search/index/type/user/%D8%B9%D9%84%DB%8C//text/%D9%81%D8%A7%D8%B7%D9%85%D9%87");
$doc = new \DOMDocument();
$doc->loadHTML($res);
$xpath = new \DOMXpath($doc);
$links = $xpath->query('//ul[@class="user_box clearfix"]/li');
$result = array();
if (!is_null($links)) {
foreach ($links as $link) {
$href = $link->getAttribute('class');
$result[] = [$href];
}
}
print_r($result);
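@class="user_box clearfix" only matches when the attribute is exactly that string. It stops matching if the site reorders or adds classes. A common XPath 1.0 idiom is to pad the normalized attribute with spaces and then look for the single class as a whole word. The following is a minimal sketch on a small inline HTML string. The markup is made up for illustration and is not taken from the site.

```php
<?php
// Hypothetical markup standing in for the page from the question.
$html = '<html><body>'
    . '<ul class="user_box clearfix"><li class="u1">A</li><li class="u2">B</li></ul>'
    . '<ul class="other"><li class="x">C</li></ul>'
    . '<ul class="user_boxes"><li class="y">D</li></ul>'
    . '</body></html>';

libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Matches any <ul> whose class list contains the token "user_box",
// regardless of other classes or their order; "user_boxes" does not match.
$query = "//ul[contains(concat(' ', normalize-space(@class), ' '), ' user_box ')]/li";

$result = array();
foreach ($xpath->query($query) as $li) {
    $result[] = $li->getAttribute('class');
}
print_r($result);
```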

Parsing HTML to extract array of DIV content by class

$html = file_get_contents("https://www.wireclub.com/chat/room/music");
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = array();
foreach($xpath->evaluate('//div[@class="message clearfix"]/node()') as $childNode) {
$result[] = $dom->saveHtml($childNode);
}
echo '<pre>'; var_dump($result);
I would like the content of each individual DIV in its own array element, so that each one can be processed separately.
This code lumps the contents of all the DIVs together.
You could retrieve all the divs and read each one's nodeValue:
$dom = new DOMDocument();
$dom->loadHTML($html);
$result = array();
$myDivs = $dom->getElementsByTagName('div');
foreach($myDivs as $key => $value) {
$result[] = $value->nodeValue;
}
var_dump($result);
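To select by class instead, you could use an XPath query like the one in your own code:
$xpath = new DOMXPath($dom);
$classname = 'message'; // class to look for, e.g. "message" from the question; contains() also matches "messages"
$result = array();
$myElem = $xpath->query("//*[contains(@class, '$classname')]");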
foreach($myElem as $key => $value) {
$result[] = $value->nodeValue;
}
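To keep each DIV's content separate, select the DIVs themselves rather than their child nodes. The /node() step flattens the children of every matching DIV into one list. Then serialize each DIV's children into its own string. The following is a minimal sketch on made-up markup. Only the class name "message clearfix" comes from the question.

```php
<?php
// Hypothetical markup standing in for the chat page.
$html = '<html><body>'
    . '<div class="message clearfix"><b>alice</b> hello</div>'
    . '<div class="message clearfix"><b>bob</b> hi there</div>'
    . '<div class="sidebar">ignore me</div>'
    . '</body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$result = array();
foreach ($xpath->query('//div[@class="message clearfix"]') as $div) {
    // Concatenate the serialized children of this one div ("inner HTML").
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    $result[] = $inner; // one array element per div
}
var_dump($result);
```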

Getting data from HTML using DOMDocument

I'm trying to get data from HTML using DOM. I can get some data, but can't figure out how to get the rest. Here is an image highlighting the data I want.
http://i.imgur.com/Es51s5s.png
Here is the HTML source:
http://pastebin.com/Re8qEivv
And here is my PHP code:
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
$td1 = $td->item(1);
$td2 = $td->item(2);
foreach ($td1->childNodes as $node){
$title = $node->textContent;
}
foreach ($td2->childNodes as $node){
$type = $node->textContent;
}
}
Figured it out:
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
if ($td->length < 3) {
continue; // skip header rows (th cells) and rows with too few cells
}
$td1 = $td->item(1);
$td2 = $td->item(2);
$title = $td1->childNodes->item(0)->textContent;
$firstURL = $td1->getElementsByTagName('a')->item(0)->getAttribute('href');
$type = $td2->childNodes->item(0)->textContent;
$imageURL = $td2->getElementsByTagName('img')->item(0)->getAttribute('src');
}
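In the loop above, $title, $firstURL, $type and $imageURL are overwritten on every row, so after the loop only the last row's values remain. If you need all of them, collect one array per row. The following is a minimal sketch on a made-up table. It assumes the same column layout as the answer: title and link in the second cell, type and image in the third.

```php
<?php
// Hypothetical table with the column layout the answer assumes.
$html = '<table>'
    . '<tr><th>#</th><th>Title</th><th>Type</th></tr>'
    . '<tr><td>1</td><td><a href="/a">First</a></td><td>Book<img src="a.png"></td></tr>'
    . '<tr><td>2</td><td><a href="/b">Second</a></td><td>Film<img src="b.png"></td></tr>'
    . '</table>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);

$rows = array();
foreach ($dom->getElementsByTagName('tr') as $row) {
    $td = $row->getElementsByTagName('td');
    if ($td->length < 3) {
        continue; // header row (th cells) or malformed row
    }
    $link = $td->item(1)->getElementsByTagName('a')->item(0);
    $img  = $td->item(2)->getElementsByTagName('img')->item(0);
    $rows[] = array(
        'title'    => trim($td->item(1)->textContent),
        'firstURL' => $link ? $link->getAttribute('href') : null,
        'type'     => trim($td->item(2)->textContent),
        'imageURL' => $img ? $img->getAttribute('src') : null,
    );
}
print_r($rows);
```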
I have used the following class:
http://sourceforge.net/projects/simplehtmldom/
It is a very simple, easy-to-use class. You can use
$html->find('#RosterReport > tbody', 0);
to find a specific table, and
$html->find('tr')
$html->find('td')
to find table rows or cells.
Note that $html is the variable holding the full HTML DOM content.

How do I rename XML values using php?

How do I rename a value in xml using PHP? This is what I've got so far:
<?php
$q = $_GET["q"];
$q = stripslashes($q);
$q = explode('|^', $q);
$old = $q[0];
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->Load("test.xml");
$xpath = new DOMXPath($dom);
$query1 = 'channel/item[title="' . $old . '"]/title';
$entries = $xpath->query($query1);
foreach ($entries as $entry)
{
$oldchapter = $entry->parentNode->removeChild($entry);
$item = $dom->getElementsByTagName('item');
foreach ($item as $items)
{
$title = $dom->createElement('title', $q[1]);
$items->appendChild($title);
}
}
$dom->save("test.xml");
Basically, it takes two titles from the URL (the existing title and the one the user wants to change it to, in the form oldtitle|^newtitle) and puts them into an array.
What I've tried is removing the existing title and then creating a new title element with the new value from the URL, but it doesn't seem to work. Where am I going wrong, or is there an easier way to do this?
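The way to do this is with DOMNode::replaceChild(). Most of your code is correct; you've just over-complicated the DOM part slightly. In particular, your inner loop appends a new title to every item element in the document, not just to the item whose title you removed.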
Try this:
<?php
$q = $_GET["q"];
$q = stripslashes($q);
$q = explode('|^', $q);
$old = $q[0];
$dom = new DOMDocument;
// Do this *before* loading the document
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->Load("test.xml");
$xpath = new DOMXPath($dom);
$query1 = 'channel/item[title="' . $old . '"]/title';
$entries = $xpath->query($query1);
// This is all you need to do in the loop
foreach ($entries as $oldTitle) {
$newTitle = $dom->createElement('title', $q[1]);
$oldTitle->parentNode->replaceChild($newTitle, $oldTitle);
}
$dom->save("test.xml");
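Here is the same replaceChild() approach on an in-memory document, so it can be tried without test.xml or a query string. The feed below is a made-up stand-in for test.xml, and $old and $new are hard-coded in place of the $_GET values. This sketch builds the replacement title with createTextNode(), which escapes characters such as &. The value argument of createElement() is not escaped that way.

```php
<?php
// Made-up stand-in for test.xml.
$xml = '<rss><channel>'
    . '<item><title>Chapter 1</title></item>'
    . '<item><title>Chapter 2</title></item>'
    . '</channel></rss>';
$old = 'Chapter 2';         // would come from $_GET["q"] in the question
$new = 'Chapter 2 & more';  // contains "&" to show that it gets escaped

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Without a context node, PHP evaluates this relative to the root element (<rss>).
// $old is pasted into the expression as-is, so a title containing " would break it.
$entries = $xpath->query('channel/item[title="' . $old . '"]/title');
foreach ($entries as $oldTitle) {
    $newTitle = $dom->createElement('title');
    $newTitle->appendChild($dom->createTextNode($new));
    $oldTitle->parentNode->replaceChild($newTitle, $oldTitle);
}
echo $dom->saveXML();
```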

XPath to parse XML and insert into MySQL

I'm trying to use XPath together with DOMDocument to parse my XML and insert the data into a table.
All my variables are inserted correctly except $halftimescore. Why is this?
Here is my code:
<?php
define('INCLUDE_CHECK',true);
require 'db.class.php';
$dom = new DOMDocument();
$dom ->load('main.xml');
$xpath = new DOMXPath($dom);
$queryResult = $xpath->query('//live/Match/Results/Result[@name="HT"]');
foreach($queryResult as $resulty) {
$halftimescore=$resulty->getAttribute("value");
}
$Match = $dom->getElementsByTagName("Match");
foreach ($Match as $match) {
$matchid = $match->getAttribute("id");
$home = $match->getElementsByTagName("Home");
$hometeam = $home->item(0)->getAttribute("name");
$homeid = $home->item(0)->getAttribute("id");
$away = $match->getElementsByTagName("Away");
$awayid = $away->item(0)->getAttribute("id");
$awayteam = $away->item(0)->getAttribute("name");
$leaguename = $match->getElementsByTagName("league");
$league = $leaguename->item(0)->nodeValue;
$leagueid = $leaguename->item(0)->getAttribute("id");
foreach ($match->getElementsByTagName('Result') as $result) {
$resulttype = $result->getAttribute("name");
$score = $result->getAttribute("value");
$scoreid = $result->getAttribute("value");
}
mysql_query("
INSERT INTO blabla
(home_team, match_id, ht_score, away_team)
VALUES
('".$hometeam."', '".$matchid."', '".$halftimescore."', '".$awayteam."')
");
}
Because you populate $halftimescore outside the main loop, in a loop of its own, it ends up holding only one value (the last one), since each iteration overwrites the previous one.
Instead, run the XPath query inside the main loop, using the current Match element as the context node, like this:
// ...
$xpath = new DOMXPath($dom);
/*
Remove these lines from here...
$queryResult = $xpath->query('//live/Match/Results/Result[@name="HT"]');
foreach($queryResult as $resulty) {
$halftimescore=$resulty->getAttribute("value");
}
*/
$Match = $dom->getElementsByTagName("Match");
foreach ($Match as $match) {
// and do the query here instead:
$result = $xpath->query('./Results/Result[@name="HT"]', $match);
if ($result->length < 1) {
// handle this error - the node was not found
$halftimescore = null;
} else {
$halftimescore = $result->item(0)->getAttribute("value");
}
// ...
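Here is a self-contained version of the per-match query. It assumes main.xml roughly follows the structure implied by the question's code (live > Match > Home/Away/Results/Result); the sample below is invented for illustration. The database insert is left out. If you do insert, a prepared statement (PDO or mysqli) avoids quoting problems, and the mysql_* functions were removed in PHP 7.0.

```php
<?php
// Invented sample shaped like the XML the question's code expects.
$xml = '<live>'
    . '<Match id="1"><Home id="10" name="Reds"/><Away id="20" name="Blues"/>'
    . '<Results><Result name="HT" value="1-0"/><Result name="FT" value="2-1"/></Results></Match>'
    . '<Match id="2"><Home id="30" name="Greens"/><Away id="40" name="Whites"/>'
    . '<Results><Result name="HT" value="0-0"/></Results></Match>'
    . '</live>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$rows = array();
foreach ($dom->getElementsByTagName('Match') as $match) {
    // Relative path: only Result elements under this Match are considered.
    $ht = $xpath->query('./Results/Result[@name="HT"]', $match);
    $rows[] = array(
        'match_id'  => $match->getAttribute('id'),
        'home_team' => $match->getElementsByTagName('Home')->item(0)->getAttribute('name'),
        'away_team' => $match->getElementsByTagName('Away')->item(0)->getAttribute('name'),
        // null when this match has no half-time result
        'ht_score'  => $ht->length > 0 ? $ht->item(0)->getAttribute('value') : null,
    );
}
print_r($rows);
```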
