I have HTML like the following:
<table>
<thead>
<tr>
<th>Name</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABC</td>
<td><a data-permission="allow"></a></td>
</tr>
<tr>
<td>B</td>
<td><a data-permission="allow"></a></td>
</tr>
<tr>
<td>C</td>
<td><a data-permission="allow"></a></td>
</tr>
<tr>
<td>D</td>
<td><a data-permission="allow"></a></td>
</tr>
<tr>
<td>E</td>
<td><button type="button" data-permission="allow"></button></td>
</tr>
</tbody>
</table>
Now I am looking for the nodes that have a "data-permission" attribute (a, button, etc.) in the example above.
To do that I am using the code below. What I am trying to do is remove the whole <a>...</a>, <button>...</button>, or any other element that has a "data-permission" attribute, and then return only the remaining HTML. How can I achieve that?
$dom = new DOMDocument;
$dom->loadHTML($output);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@data-permission-id');
foreach ($nodes as $node) {
echo $node->nodeValue;
//$node->parentNode->removeChild($node); throws the error "Not Found Error"
}
Note: I have tried $node->parentNode->removeChild($node); inside the loop, but it throws the error. After deleting the tag, I also want to get the remaining HTML. I have read "How to delete element with DOMDocument?", but it didn't help.
To clear a node, set its value to an empty string: $node->nodeValue = "";
$dom = new DOMDocument;
$dom->loadHTML($output);
echo "Previous : ".PHP_EOL.$dom->textContent.PHP_EOL;
$xpath = new DOMXPath($dom);
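$nodes = $xpath->query("//*[@data-permission='allow']");
foreach ($nodes as $node) {
// Empty the element's content; the element itself stays in the document
$node->nodeValue = "";
}
// Serialize once, after all matching nodes have been cleared
echo $dom->saveHTML();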
Live demo : https://eval.in/885719
Live demo with your table data : https://eval.in/885780
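Setting nodeValue to an empty string clears an element's content but leaves the tag itself in place. To drop the elements entirely, one option is to select the elements that carry the attribute (rather than the attribute nodes, which is most likely why removeChild reported "Not Found Error" in the question) and remove each one from its parent. The following is a minimal sketch under those assumptions; the sample markup and the variable names $elements, $table and $result are made up for illustration.

```php
<?php
// Sample markup standing in for $output from the question (assumption).
$output = '<table><tbody>'
    . '<tr><td>ABC</td><td><a data-permission="allow">Edit</a></td></tr>'
    . '<tr><td>E</td><td><button type="button" data-permission="allow">Go</button></td></tr>'
    . '</tbody></table>';

$dom = new DOMDocument;
$dom->loadHTML($output);
$xpath = new DOMXPath($dom);

// Select the elements that carry the attribute, not the attribute nodes.
// iterator_to_array() copies the matches into a plain array before removal.
$elements = iterator_to_array($xpath->query('//*[@data-permission]'));
foreach ($elements as $element) {
    $element->parentNode->removeChild($element);
}

// Serialize only the table so the html/body wrapper added by loadHTML is left out.
$table = $dom->getElementsByTagName('table')->item(0);
$result = $dom->saveHTML($table);
echo $result;
```

Passing a node to saveHTML() returns the markup for that subtree only, which is one way to get "the remaining HTML" without the doctype and wrapper tags.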
I know this is probably covered in other threads, but I've searched all over StackOverflow and tried many solutions, which is why I'm asking.
With this html:
<div class="someclass">
<table>
<tbody>
<tr>
<th class="state">Status</th>
<th class="name">Name</th>
<th class="type">Type</th>
<th class="length">Length</th>
<th class="height">Height</th>
</tr>
<tr>
<td class="state state2"></td>
<td class="name"></td>
<td class="type t18"></td>
<td class="length">2000 m</td>
<td class="height"></td>
</tr>
<tr>
<td class="state state1"></td>
<td class="name"></td>
<td class="type t18"></td>
<td class="length">2250 m</td>
<td class="height"></td>
</tr>
<tr>
<td class="state state1"></td>
<td class="name"></td>
<td class="type t18"></td>
<td class="length">3000 m</td>
<td class="height"></td>
</tr>
<tr>
<td class="state state2"></td>
<td class="name"></td>
<td class="type t18"></td>
<td class="length">2250 m</td>
<td class="height"></td>
</tr>
</tbody>
</table>
</div>
Now, this is the PHP code I have so far :
$dom = new DOMDocument();
$dom->loadHtmlFile('http://www.whatever.com');
$dom->preserveWhiteSpace = false;
$xp = new DOMXPath($dom);
$col = $xp->query('//td[contains(@class, "state1") and (contains(@class, "state"))]');
$length = 0;
foreach( $col as $n ) {
$parent = $n->parentNode;
$length += $parent->childNodes->item(3)->nodeValue;
}
echo 'Length: ' . $length;
I need to:
1.- Sum the 'length' values so I can echo them, stripping the ' m' suffix from each value.
2.- Understand what I'm getting wrong with the 'parentNode', 'childNodes' and 'item()' parts. After many tries, all I get is 'Length: 0'.
I know this isn't the place to get a fully detailed explanation, but it is really hard to find tutorials targeting these specific issues. It would be great if someone could give some advice on where I can get this information.
Thanks very much in advance.
Edited the 'Concat' part for simplicity.
Navigation through DOMDocument for a specified childNode value by using DOMXpath
function getInt($string)
{
preg_match("/[0-9]+/i", $string, $val);
$out = 0;
if (isset($val) && !empty($val))
{
$out = $val[0];
}
return intval($out);
}
$dom = new DOMDocument();
$dom->loadHtml($html);
$dom->preserveWhiteSpace = false;
$xp = new DOMXPath($dom);
$length = 0;
foreach($xp->query('//td[@class="state state1"]/following-sibling::*[3]') as $element)
{
$value = $element->nodeValue;
$length += getInt($value);
}
echo $length;
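As for why the question's childNodes->item(3) approach can come up empty: childNodes counts every child node of the row, and that may include whitespace text nodes between the cells, so index 3 is not necessarily the fourth td. A variation that avoids positional indexes altogether is to run a second XPath query relative to each matching row. This is only a sketch: the sample rows are trimmed down from the question's table, and $rows, $cell and $total are illustrative names.

```php
<?php
// Sample rows modelled on the table in the question (assumption).
$html = '<table><tbody>'
    . '<tr><td class="state state2"></td><td class="length">2000 m</td></tr>'
    . '<tr><td class="state state1"></td><td class="length">2250 m</td></tr>'
    . '<tr><td class="state state1"></td><td class="length">3000 m</td></tr>'
    . '</tbody></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

$total = 0;
// Rows that contain a cell whose class list includes the token "state1".
$rows = $xp->query('//tr[td[contains(concat(" ", normalize-space(@class), " "), " state1 ")]]');
foreach ($rows as $row) {
    // Query relative to the row, so no childNodes index is involved.
    $cell = $xp->query('td[contains(concat(" ", normalize-space(@class), " "), " length ")]', $row)->item(0);
    if ($cell !== null) {
        // An explicit (int) cast reads the leading digits of "2250 m".
        $total += (int) trim($cell->nodeValue);
    }
}
echo 'Length: ' . $total, PHP_EOL;
```

The concat/normalize-space idiom matches a whole class token, so "state1" does not accidentally match a class such as "state10".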
I'm trying to find the span tags on a website similar to this: http://www.pointstreak.com/prostats/leagueschedule.html?leagueid=49&seasonid=14225. The tags I need are these:
However, when I use code such as the following:
$my_url = 'http://www.pointstreak.com/prostats/leagueschedule.html?leagueid=49&seasonid=14225';
$html = file_get_contents($my_url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
//Put your XPath Query here
$my_xpath_query = "//span";
$result_rows = $xpath->query($my_xpath_query);
// Create an array to hold the content of the nodes
$statsListings = array();
//here we loop through our results (a DOMNodeList object)
foreach ($result_rows as $result_object) {
$statsListings[] = $result_object->nodeValue;
}
echo json_encode($statsListings);
The only output I get is [].
If I replace $statsListings[] = $result_object->nodeValue; with $statsListings[] = $result_object->childNodes->item(0)->nodeValue;, I still get the same [] as output. When there are clearly span tags with values, why am I getting nothing?
XPath is not at fault here.
The span tags are added dynamically. Look at the page's source code rather than the DOM structure, which may already have been modified by JavaScript: use "view-source:" and you will see exactly the HTML that XPath gets to parse.
It would be a good idea to look at the table with class tablelines; it probably has everything you need.
Skip the "maincolor" and "tableheader" rows and start processing at the rows with the "light" class.
<table width="98%" class="tablelines" cellpadding="2" border="0" cellspacing="1">
<tr class="maincolor">
<td colspan="8" align="right">All Times Local</td>
</tr>
<tr class="tableheader">
<td width="4%">
<b>GN</b>
</td>
<td nowrap width="21%">
<b>AWAY</b>
</td>
<td nowrap width="21%">
<b>HOME</b>
</td>
<td width="14%"><b>DATE</b></td>
<td width="11%"><b>TIME</b></td>
<td width="8%"><b>SCORE</b></td>
<td nowrap align="right" width="*"><b>BOXSCORE</b></td>
<td nowrap align="center" width="4%"><b>GS</b></td>
</tr>
<tr class="light">
<td></td>
<td>Sioux City
<b>1</b></td>
<td>Sioux Falls
<b>5</b></td>
<td>Tue, Apr 14</td>
<td> 7:05 PM</td>
<td> <b>1 - 5</b> </td>
<td align="right">
<img src="/images/gamelive_icon.gif" title="Click here for Game Live!" alt="Click here for Game Live" border="0">
Final</td>
<td align="center">
<img src="/images/playersection/prostats/gslink.gif" border="0">
</td>
</tr>
For example, try this:
$my_url = 'http://www.pointstreak.com/prostats/leagueschedule.html?leagueid=49&seasonid=14225';
$html = file_get_contents($my_url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
//Put your XPath Query here
$my_xpath_query = "//tr[@class='light']/td";
$result_rows = $xpath->query($my_xpath_query);
echo $result_rows->length;
// Create an array to hold the content of the nodes
$statsListings = array();
//here we loop through our results (a DOMNodeList object)
foreach ($result_rows as $result_object) {
$statsListings[] = $result_object->nodeValue;
}
echo json_encode($statsListings);
I have probably found what you need, and in convenient JSON form:
http://www.pointstreak.com/ajax/trending_ajax.html?action=divisionscoreboard&divisionid=12299&seasonid=14225
{"trending_list":null,"lacrosse_list":null,"hockey_list":null,"soccer_list":null,"baseball_list":null,"softball_list":null,"basketball_list":null,"news_list":null,"news_hockey_list":null,"news_baseball_list":null,"news_baseball_list2":null,"news_softball_list":null,"news_basketball_list":null,"games_list":[{"status":"FINAL","hometeam":"Sioux Falls","homescore":"4","awayteam":"Muskegon","awayscore":"2","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"15\/05","link":"..\/prostats\/boxscore.html?gameid=2672134"},{"status":"FINAL","hometeam":"Muskegon","homescore":"1","awayteam":"Sioux Falls","awayscore":"6","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:15 pm","gamedate":"10\/05","link":"..\/prostats\/boxscore.html?gameid=2672133"},{"status":"FINAL","hometeam":"Muskegon","homescore":"2","awayteam":"Sioux Falls","awayscore":"3","timeremaining":"0:00","currentperiod":"1st","schedtime":"7:15 pm","gamedate":"09\/05","link":"..\/prostats\/boxscore.html?gameid=2672132"},{"status":"FINAL","hometeam":"Dubuque","homescore":"3","awayteam":"Muskegon","awayscore":"4","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"05\/05","link":"..\/prostats\/boxscore.html?gameid=2662061"},{"status":"FINAL","hometeam":"Muskegon","homescore":"0","awayteam":"Dubuque","awayscore":"6","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:15 pm","gamedate":"02\/05","link":"..\/prostats\/boxscore.html?gameid=2662060"},{"status":"FINAL","hometeam":"Sioux Falls","homescore":"7","awayteam":"Tri-City","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"02\/05","link":"..\/prostats\/boxscore.html?gameid=2662055"},{"status":"FINAL","hometeam":"Muskegon","homescore":"3","awayteam":"Dubuque","awayscore":"1","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:15 pm","gamedate":"01\/05","link":"..\/prostats\/boxscore.html?gameid=2662059"},{"status":"FINAL","hometeam":"Sioux Falls","homescore":"4","awayteam":"Tri-City","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:04 pm","gamedate":"01\/05","link":"..\/prostats\/boxscore.html?gameid=2662054"},{"status":"FINAL","hometeam":"Tri-City","homescore":"2","awayteam":"Sioux Falls","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"29\/04","link":"..\/prostats\/boxscore.html?gameid=2664638"},{"status":"FINAL","hometeam":"Dubuque","homescore":"7","awayteam":"Muskegon","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"25\/04","link":"..\/prostats\/boxscore.html?gameid=2662058"}],"division_list":null,"site_network_title":null,"leagueshortname":"USHL","includesportlink":null,"showleaguename":0}
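A response shaped like this can be read with json_decode instead of being scraped. The sketch below uses an abbreviated, hard-coded sample with two games and only some of the fields, so it runs without a network request; in practice the string would presumably come from file_get_contents on the endpoint, and the output format in $lines is just one illustrative choice.

```php
<?php
// Abbreviated sample shaped like the response above (assumption: same field names).
$json = '{"games_list":['
    . '{"status":"FINAL","hometeam":"Sioux Falls","homescore":"4","awayteam":"Muskegon","awayscore":"2","gamedate":"15\/05"},'
    . '{"status":"FINAL","hometeam":"Muskegon","homescore":"1","awayteam":"Sioux Falls","awayscore":"6","gamedate":"10\/05"}'
    . '],"leagueshortname":"USHL"}';

// Decode into associative arrays rather than objects.
$data = json_decode($json, true);

$lines = array();
if (is_array($data) && !empty($data['games_list'])) {
    foreach ($data['games_list'] as $game) {
        // One summary line per game: date, away team and score, home team and score.
        $lines[] = sprintf(
            '%s: %s %s - %s %s',
            $game['gamedate'],
            $game['awayteam'],
            $game['awayscore'],
            $game['hometeam'],
            $game['homescore']
        );
    }
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```

The is_array and !empty checks guard against a failed decode or a null games_list, which the other list fields in the sample response suggest can happen.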
I have a question: I'm trying to read data from an HTML table and store it in a variable called $id. For example, I have this code:
<tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td>View SWF</td>
</tr>
I have another variable, $array[$i], which holds a search query. I want my PHP code to search through the table until it finds the row containing that query; in this case it would be "Party Hat". Once it finds the query, it should look at the ID, which is the "td" just above the name "Party Hat"; the ID in this case is 413. Finally, I want the variable $id to hold that ID. How do I do this? Any help would be HIGHLY appreciated!
Using Tidy, DOMDocument and DOMXPath (make sure the PHP extensions are enabled), you can do something like this:
<?php
$url = "http://example.org/test.html";
function get_data_from_table($id, $url)
{
// retrieve the content of that url
$content = file_get_contents($url);
// repair bad HTML
$tidy = tidy_parse_string($content);
$tidy->cleanRepair();
$content = (string)$tidy;
// load into DOM
$dom = new DOMDocument();
$dom->loadHTML($content);
// make xpath-able
$xpath = new DOMXPath($dom);
// search for the first td of each tr, where its content is $id
$query = "//tr/td[position()=1 and normalize-space(text())='$id']";
$elements = $xpath->query($query);
if ($elements->length != 1) {
// not exactly 1 result as expected? return number of hits
return $elements->length;
}
// our td was found
$element = $elements->item(0);
// get its parent element (tr)
$tr = $element->parentNode;
$data = array();
// iterate over its td elements
foreach ($tr->getElementsByTagName("td") as $td) {
// retrieve the content as text
$data[] = $td->textContent;
}
// return the array of <td> contents
return $data;
}
echo '<pre>';
print_r(
get_data_from_table(
414,
$url
)
);
echo '</pre>';
Your HTML source (http://example.org/test.html):
<table><tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td>View SWF</td>
</tr><tr>
<td>414</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td>View SWF</td>
</tr>
(As you can see, this is not valid HTML, but that doesn't matter.)
This works (although it is a bit ugly; perhaps someone else can come up with a better XPath solution):
$html = <<<HTML
<html>
<body>
<table>
<thead>
<tr>
<td>id</td>
<td>name</td>
<td>a</td>
<td>b</td>
<td>c</td>
</tr>
</thead>
<tbody>
<tr>
<td>413</td>
<td>Party Hat</td>
<td>0</td>
<td>No</td>
<td>a link</td>
</tr>
<tr>
<td>414</td>
<td>Party Hat 2</td>
<td>0</td>
<td>No</td>
<td>a link</td>
</tr>
</tbody>
</table>
</body>
</html>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($html);
$domxpath = new DOMXPath($doc);
$res = $domxpath->query("//*[local-name() = 'td'][text() = 'Party Hat']/../td[position() = '1']");
var_dump($res->length, $res->item(0)->textContent);
Outputs:
int(1)
string(3) "413"
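One possibly tidier formulation puts the name test in a predicate on the row and then steps to the row's first cell, which avoids local-name() and the parent step. This is a sketch rather than the only way to write it; $name is a hypothetical search term and is assumed not to contain a double quote, since it is interpolated into the XPath string.

```php
<?php
// Reduced sample of the table used above (assumption).
$html = '<table><tbody>'
    . '<tr><td>413</td><td>Party Hat</td><td>0</td></tr>'
    . '<tr><td>414</td><td>Party Hat 2</td><td>0</td></tr>'
    . '</tbody></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$name = 'Party Hat'; // hypothetical search term; must not contain a double quote

// Rows whose second cell equals the name, then the first cell of each such row.
$query = sprintf('//tr[normalize-space(td[2]) = "%s"]/td[1]', $name);
$cell = $xpath->query($query)->item(0);

// $id stays null when no row matches.
$id = $cell !== null ? trim($cell->textContent) : null;
var_dump($id);
```

Because the comparison is an exact match after whitespace normalization, the "Party Hat 2" row is not selected.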
Try loading the HTML into a new DOMDocument via loadHTML and processing it like an XML document, with XPath or another type of query.
I use regex for HTML parsing but I need your help to parse the following table:
<table class="resultstable" width="100%" align="center">
<tr>
<th width="10">#</th>
<th width="10"></th>
<th width="100">External Volume</th>
</tr>
<tr class='odd'>
<td align="center">1</td>
<td align="left">
http://xyz.com
</td>
<td align="right">210,779,783<br />(939,265 / 499,584)</td>
</tr>
<tr class='even'>
<td align="center">2</td>
<td align="left">
http://abc.com
</td>
<td align="right">57,450,834<br />(288,915 / 62,935)</td>
</tr>
</table>
I want to get all domains with their volumes (in an array or variable), for example:
http://xyz.com - 210,779,783
Should I use regex or an HTML DOM parser in this case? I don't know how to parse a large table. Can you please help? Thanks.
Here's an XPath example that parses the HTML from the question.
<?php
$dom = new DOMDocument();
$dom->loadHTMLFile("./input.html");
$xpath = new DOMXPath($dom);
$trs = $xpath->query("//table[@class='resultstable'][1]/tr");
foreach ($trs as $tr) {
$tdList = $xpath->query("td[2]/a", $tr);
if ($tdList->length == 0) continue;
$name = $tdList->item(0)->nodeValue;
$tdList = $xpath->query("td[3]", $tr);
$vol = $tdList->item(0)->childNodes->item(0)->nodeValue;
echo "name: {$name}, vol: {$vol}\n";
}
?>
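The td[2]/a query above expects each domain to sit inside a link. In the markup as it appears in the question, the second cell holds the URL as plain text, so against that exact input the loop would skip every row. A variant that reads the cell text directly might look like the sketch below; it assumes the markup is exactly as shown (attributes trimmed for brevity), and the $volumes array is an illustrative name.

```php
<?php
// Table copied from the question, with presentational attributes trimmed.
$html = <<<HTML
<table class="resultstable">
<tr><th>#</th><th></th><th>External Volume</th></tr>
<tr class="odd"><td>1</td><td>
http://xyz.com
</td><td>210,779,783<br />(939,265 / 499,584)</td></tr>
<tr class="even"><td>2</td><td>
http://abc.com
</td><td>57,450,834<br />(288,915 / 62,935)</td></tr>
</table>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$volumes = array();
// Only rows that have td cells; the header row has th cells and is skipped.
foreach ($xpath->query('//table[@class="resultstable"]//tr[td]') as $tr) {
    // Whole text of the second cell, with surrounding whitespace collapsed.
    $name = $xpath->evaluate('normalize-space(td[2])', $tr);
    // First text node of the third cell, i.e. the part before the <br>.
    $vol = $xpath->evaluate('normalize-space(td[3]/text()[1])', $tr);
    $volumes[$name] = $vol;
}
print_r($volumes);
```

DOMXPath::evaluate() is used here because normalize-space() returns a string rather than a node list.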