I have HTML like the following:
<table>
<thead>
<tr>
<th>Name</th>
<th>Action</th>
</tr>
</thead>
<tbody>
<tr>
<td>ABC</td>
<td><a data-permission="allow"></a></td>
</tr>
<tr>
<td>B</td>
<td><a data-permission="allow"></a></td>
</tr>
<tr>
<td>C</td>
<td><a data-permission="allow"></a></td>
</tr>
<tr>
<td>D</td>
<td><a data-permission="allow"></a></td>
</tr>
<tr>
<td>E</td>
<td><button type="button" data-permission="allow"></button></td>
</tr>
</tbody>
</table>
Now I am finding the nodes that have a "data-permission" attribute (a, button, etc.) in the example above.
To do that I am using the code below. What I am trying to do is remove the whole <a>..</a> or <button>...</button> (or any other element) if it has a "data-permission" attribute, and then return only the remaining HTML. How can I achieve that?
$dom = new DOMDocument;
$dom->loadHTML($output);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@data-permission-id');
foreach ($nodes as $node) {
echo $node->nodeValue;
//$node->parentNode->removeChild($node); throws the error "Not Found Error"
}
Note: I have tried $node->parentNode->removeChild($node); inside the loop, but it throws the error. After deleting the tag, I also want to get the remaining HTML. I have read "How to delete element with DOMDocument?" but it doesn't help.
Set the node value to an empty string to clear the matched elements: $node->nodeValue = "";
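$dom = new DOMDocument;
$dom->loadHTML($output);
echo "Previous : ".PHP_EOL.$dom->textContent.PHP_EOL;
$xpath = new DOMXPath($dom);
// Select every element whose data-permission attribute equals "allow".
$nodes = $xpath->query("//*[@data-permission='allow']");
foreach ($nodes as $node) {
    $node->nodeValue = "";
}
// Serialize once, after all matched nodes have been cleared.
echo $dom->saveHTML();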
Live demo : https://eval.in/885719
Live demo with your table data : https://eval.in/885780
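Note that clearing nodeValue only empties an element; the tag itself stays in the output. If the goal is to drop the elements entirely, a minimal sketch could look like the one below. It uses a shortened stand-in for $output, selects the elements themselves with //*[@data-permission] rather than the attribute nodes, and copies the result list into an array before removing anything. A query such as //@data-permission-id returns attribute nodes, which are not children of their element, and that is likely why removeChild() reported "Not Found Error".

```php
<?php
// Shortened stand-in for the $output markup from the question.
$output = '<table><tbody>'
    . '<tr><td>ABC</td><td><a data-permission="allow"></a></td></tr>'
    . '<tr><td>E</td><td><button type="button" data-permission="allow"></button></td></tr>'
    . '</tbody></table>';

$dom = new DOMDocument;
$dom->loadHTML($output);
$xpath = new DOMXPath($dom);

// Select the elements that carry the attribute, not the attribute nodes.
$nodes = iterator_to_array($xpath->query('//*[@data-permission]'));
foreach ($nodes as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize only the table, without the doctype/html/body wrapper
// that loadHTML() adds around a fragment.
$table = $dom->getElementsByTagName('table')->item(0);
$remaining = $dom->saveHTML($table);
echo $remaining, PHP_EOL;
```

Passing a node to saveHTML() is what keeps the wrapper elements out of the returned string.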
<table>
<tr>
<th>Year</th>
<th>Score</th>
</tr>
<tr>
<td>2014</td>
<td>3078</td>
</tr>
</table>
If I have the above table being successfully stored as a variable, how could I append it to a div with an overflow-x style attribute?
I've tried the following snippet but no cigar:
$div = str_get_html('<div style="overflow-x:auto;"></div>');
$div = $div->find('div');
$div = $div->appendChild($table);
return $div;
The expected output is:
<div style="overflow-x:auto;">
<table>
<tr>
<th>Year</th>
<th>Score</th>
</tr>
<tr>
<td>2014</td>
<td>3078</td>
</tr>
</table>
</div>
This should give you a basic idea of the implementation. It uses DOMDocument:
<?php
ini_set('display_errors', 1);
//creating table node
$tableNode='<table><tr><th>Year</th><th>Score</th></tr><tr><td>2014</td><td>3078</td></tr></table>';
$domDocument = new DOMDocument();
$domDocument->encoding="UTF-8";
$domDocument->loadHTML($tableNode);
$domXPath = new DOMXPath($domDocument);
$table = $domXPath->query("//table")->item(0);
//creating empty div node.
$domDocument = new DOMDocument();
$element=$domDocument->createElement("div");
$element->setAttribute("style", "overflow-x:auto;");
$result=$domDocument->importNode($table,true);//importing the node from the other DOMDocument
$element->appendChild($result);
echo $domDocument->saveHTML($element);
I am trying to parse an HTML table in order to get the content of the <td> ID HERE </td> tags using XPath and PHP.
Executing following line
$doc->loadHTMLFile($file);
gives me warnings like this:
PHP Warning: DOMDocument::loadHTMLFile(): Unexpected end tag : tr in...
That's why I am using the following block of code:
libxml_use_internal_errors(true);
$doc->loadHTMLFile($file);
libxml_clear_errors();
Trying to parse this: (the entire page here)
<table class="object-table" cellpadding="0" cellspacing="0">
<tbody>
<tr>
<th width="8%">something here</th>
<th width="89%">something here</th>
<th width="3%">something here</th>
</tr>
<tr class="normal-row">
<td>ID number here</td>
<td>something here
</td>
<td align="center">
<img src="/design/img/hasnt_photo_icon.gif">
</td>
</tr>
<tr class="odd-row">
<td>ID number here</td>
<td>something here
</td>
<td align="center">
<img src="/design/img/hasnt_photo_icon.gif">
</td>
</tr>
</tbody>
</table>
with the following code:
$file = "http://www.sportsporudy.gov.ua/catalog/#c[1]=1";
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTMLFile($file);
libxml_clear_errors();
$xpath = new DOMXPath($doc);
$query = '//tr[@class="odd-row"]';
$elements = $xpath->query($query);
printf("Size of array: %d\n", sizeof($elements));
printElements($elements);
and tried using different queries like
//table[@class="object-table"]/tbody/tr ...
but none of them seem to give me the td tags I need. Maybe that's because of the broken HTML.
Thanks for your advice.
Substantially, your code is fine.
The only error I found is in how the length of $elements is printed: $elements is a DOMNodeList, not an array, so read its length property instead of calling sizeof():
printf( "Size of array: %d\n", $elements->length );
The major problem, however, is that the page's HTML contains only one table with one row: the remaining data is filled in by JavaScript, so you can't retrieve it directly through DOMXPath.
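To check the query independently of the live page, here is a small sketch that runs it against a static copy of the sample markup. The ID values 101 and 102 are hypothetical placeholders for the "ID number here" cells, and td[1] is assumed to be the cell that holds the ID.

```php
<?php
// Static copy of two rows from the sample table; the IDs are placeholders.
$html = '<table class="object-table"><tbody>'
    . '<tr class="normal-row"><td>101</td><td>something here</td></tr>'
    . '<tr class="odd-row"><td>102</td><td>something here</td></tr>'
    . '</tbody></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// First cell of every row with class "odd-row".
$cells = $xpath->query('//tr[@class="odd-row"]/td[1]');
printf("Number of matches: %d\n", $cells->length);

$ids = [];
foreach ($cells as $cell) {
    $ids[] = trim($cell->nodeValue);
}
print_r($ids);
```

If the same query returns nothing for the real page, that points to the rows being absent from the downloaded HTML rather than to a problem in the XPath expression.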
I'm trying to find the span tags on a website similar to this: http://www.pointstreak.com/prostats/leagueschedule.html?leagueid=49&seasonid=14225. The tags I need are these:
However, when I use code such as the following:
$my_url = 'http://www.pointstreak.com/prostats/leagueschedule.html?leagueid=49&seasonid=14225';
$html = file_get_contents($my_url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
//Put your XPath Query here
$my_xpath_query = "//span";
$result_rows = $xpath->query($my_xpath_query);
// Create an array to hold the content of the nodes
$statsListings = array();
//here we loop through our results (a DOMNodeList object)
foreach ($result_rows as $result_object) {
$statsListings[] = $result_object->nodeValue;
}
echo json_encode($statsListings);
The only output I get is [].
If I replace $statsListings[] = $result_object->nodeValue; with $statsListings[] = $result_object->childNodes->item(0)->nodeValue;, I still get the same [] as output. When there are clearly span tags with values, why am I getting nothing?
XPath is not guilty at all.
Span tags are added dynamically. Look at the page's source code rather than the DOM structure shown in the browser, which may already have been modified by JavaScript: use "view-source:" and you will see exactly the HTML that XPath gets to parse.
It would be a good idea to look at the table with class tablelines; it probably contains everything you need.
You should skip the "maincolor" and "tableheader" rows and start processing with the rows of class "light".
<table width="98%" class="tablelines" cellpadding="2" border="0" cellspacing="1">
<tr class="maincolor">
<td colspan="8" align="right">All Times Local</td>
</tr>
<tr class="tableheader">
<td width="4%">
<b>GN</b>
</td>
<td nowrap width="21%">
<b>AWAY</b>
</td>
<td nowrap width="21%">
<b>HOME</b>
</td>
<td width="14%"><b>DATE</b></td>
<td width="11%"><b>TIME</b></td>
<td width="8%"><b>SCORE</b></td>
<td nowrap align="right" width="*"><b>BOXSCORE</b></td>
<td nowrap align="center" width="4%"><b>GS</b></td>
</tr>
<tr class="light">
<td></td>
<td>Sioux City
<b>1</b></td>
<td>Sioux Falls
<b>5</b></td>
<td>Tue, Apr 14</td>
<td> 7:05 PM</td>
<td> <b>1 - 5</b> </td>
<td align="right">
<img src="/images/gamelive_icon.gif" title="Click here for Game Live!" alt="Click here for Game Live" border="0">
Final</td>
<td align="center">
<img src="/images/playersection/prostats/gslink.gif" border="0">
</td>
</tr>
For example, try this:
$my_url = 'http://www.pointstreak.com/prostats/leagueschedule.html?leagueid=49&seasonid=14225';
$html = file_get_contents($my_url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
//Put your XPath Query here
$my_xpath_query = "//tr[@class='light']/td";
$result_rows = $xpath->query($my_xpath_query);
echo $result_rows->length;
// Create an array to hold the content of the nodes
$statsListings = array();
//here we loop through our results (a DOMNodeList object)
foreach ($result_rows as $result_object) {
$statsListings[] = $result_object->nodeValue;
}
echo json_encode($statsListings);
I may have found what you need, already in JSON form:
http://www.pointstreak.com/ajax/trending_ajax.html?action=divisionscoreboard&divisionid=12299&seasonid=14225
{"trending_list":null,"lacrosse_list":null,"hockey_list":null,"soccer_list":null,"baseball_list":null,"softball_list":null,"basketball_list":null,"news_list":null,"news_hockey_list":null,"news_baseball_list":null,"news_baseball_list2":null,"news_softball_list":null,"news_basketball_list":null,"games_list":[{"status":"FINAL","hometeam":"Sioux Falls","homescore":"4","awayteam":"Muskegon","awayscore":"2","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"15\/05","link":"..\/prostats\/boxscore.html?gameid=2672134"},{"status":"FINAL","hometeam":"Muskegon","homescore":"1","awayteam":"Sioux Falls","awayscore":"6","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:15 pm","gamedate":"10\/05","link":"..\/prostats\/boxscore.html?gameid=2672133"},{"status":"FINAL","hometeam":"Muskegon","homescore":"2","awayteam":"Sioux Falls","awayscore":"3","timeremaining":"0:00","currentperiod":"1st","schedtime":"7:15 pm","gamedate":"09\/05","link":"..\/prostats\/boxscore.html?gameid=2672132"},{"status":"FINAL","hometeam":"Dubuque","homescore":"3","awayteam":"Muskegon","awayscore":"4","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"05\/05","link":"..\/prostats\/boxscore.html?gameid=2662061"},{"status":"FINAL","hometeam":"Muskegon","homescore":"0","awayteam":"Dubuque","awayscore":"6","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:15 pm","gamedate":"02\/05","link":"..\/prostats\/boxscore.html?gameid=2662060"},{"status":"FINAL","hometeam":"Sioux Falls","homescore":"7","awayteam":"Tri-City","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"02\/05","link":"..\/prostats\/boxscore.html?gameid=2662055"},{"status":"FINAL","hometeam":"Muskegon","homescore":"3","awayteam":"Dubuque","awayscore":"1","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:15 pm","gamedate":"01\/05","link":"..\/prostats\/boxscore.html?gameid=2662059"},{"status":"FINAL","hometeam":"Sioux 
Falls","homescore":"4","awayteam":"Tri-City","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:04 pm","gamedate":"01\/05","link":"..\/prostats\/boxscore.html?gameid=2662054"},{"status":"FINAL","hometeam":"Tri-City","homescore":"2","awayteam":"Sioux Falls","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"29\/04","link":"..\/prostats\/boxscore.html?gameid=2664638"},{"status":"FINAL","hometeam":"Dubuque","homescore":"7","awayteam":"Muskegon","awayscore":"3","timeremaining":"0:00","currentperiod":"3rd","schedtime":"7:05 pm","gamedate":"25\/04","link":"..\/prostats\/boxscore.html?gameid=2662058"}],"division_list":null,"site_network_title":null,"leagueshortname":"USHL","includesportlink":null,"showleaguename":0}
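A response of this shape can be read with json_decode() instead of XPath. The sketch below works on a trimmed sample (only the first game from the output above) so that it runs without network access; in practice the string would presumably come from file_get_contents() on the URL above.

```php
<?php
// Trimmed sample of the response shown above (first game only).
$json = '{"games_list":[{"status":"FINAL","hometeam":"Sioux Falls","homescore":"4",'
      . '"awayteam":"Muskegon","awayscore":"2","schedtime":"7:05 pm","gamedate":"15\/05"}],'
      . '"leagueshortname":"USHL"}';

// Decode into associative arrays; fall back to an empty list if the
// response is invalid or has no games.
$data = json_decode($json, true);
$games = $data['games_list'] ?? [];

$lines = [];
foreach ($games as $game) {
    $lines[] = sprintf(
        '%s: %s %s - %s %s',
        $game['gamedate'],
        $game['awayteam'],
        $game['awayscore'],
        $game['hometeam'],
        $game['homescore']
    );
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```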
I have this table as output from a program (a string that I load into a DomDocument in PHP):
<table>
<tr>
<td width="50">Â </td>
<td>My content</td>
<td width="50">Â </td>
</tr>
</table>
I need to remove the two <td width="50">Â </td> cells (I don't know why the program adds them, but they are there), so that the result looks like this:
<table>
<tr>
<td>My content</td>
</tr>
</table>
What's the best way to do this in PHP?
Edit:
The program is JasperReport Server. I call the report rendering function from a web application:
//this is the call to the server library that generates the report
$reportGen = $reportServer->runReport($myReport);
$domDoc = new \DomDocument();
$domDoc->loadHTML($reportGen);
return $domDoc->saveHTML($domDoc->getElementsByTagName('table')->item(0));
This returns the table shown above, which is the one I need to fix.
Try this:
<?php
$domDoc = new DomDocument();
$domDoc->loadHTML($reportGen);
$xpath = new DOMXpath($domDoc);
$tags = $xpath->query('//td');
foreach($tags as $tag) {
$value = $tag->nodeValue;
if(preg_match('/^(Â )/',$value))
$tag->parentNode->removeChild($tag);
}
?>
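Since the text match depends on how the stray characters end up encoded, another option is to select the cells by their attribute. This is only a sketch, and it assumes every td with width="50" is one of the unwanted filler cells. $reportGen is replaced here by a hard-coded stand-in string with &nbsp; as the filler content.

```php
<?php
// Stand-in for $reportGen; the filler cells are written with &nbsp; here.
$reportGen = '<table><tr>'
    . '<td width="50">&nbsp;</td><td>My content</td><td width="50">&nbsp;</td>'
    . '</tr></table>';

$domDoc = new DOMDocument();
$domDoc->loadHTML($reportGen);
$xpath = new DOMXPath($domDoc);

// Copy the matches into an array first, then remove each cell from its row.
foreach (iterator_to_array($xpath->query('//td[@width="50"]')) as $cell) {
    $cell->parentNode->removeChild($cell);
}

// Serialize only the first table, as in the question.
$fixed = $domDoc->saveHTML($domDoc->getElementsByTagName('table')->item(0));
echo $fixed, PHP_EOL;
```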
Regex and replace:
$var = '<table>
<tr>
<td width="50">Ã</td>
<td>My interssing content</td>
<td width="50">Ã</td>
</tr>
</table>';
$final = preg_replace('#(<td width="50".*?>).*?(</td>)#', '$1$2', $var);
$final = str_replace('<td width="50"></td>', '', $final);
echo $final;