Getting tags in DOMDocument - php

I'm trying to get the HTML markup of a table in a page:
$new_dom = new DOMDocument();
$table = '';
$nodesTable = $this->dom->getElementsbyTagName("table");
foreach($nodesTable as $nodeTable){
$color = $nodeTable->getAttribute('bordercolordark');
if ($color == '#73BAFF') {
$table = $nodeTable;
}
}
$new_dom->appendChild($table);
echo $new_dom->saveHTML();
Here is somepage.html:
<html>
<table>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
</table>
<table border="1" cellpadding="0" width="500" bordercolorlight="#ACD6FF" bordercolordark="#73BAFF" align="center">
<tr>
<td rowspan="2" colspan="2" bgcolor="#73BAFF"> </td>
<td colspan="3" align="center" bgcolor="#ACD6FF"> Element 1 </td>
<td colspan="3" align="center" bgcolor="#ACD6FF"> Element 2 </td>
</tr>
<tr>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
<td width="50" align="center" bgcolor="#ACD6FF"> 50 </td>
</tr>
<tr>
<td bgcolor="#ACD6FF" width="155" align="center"> Row 1</td>
<td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
<td align="center"> 50 </td>
</tr>
<tr>
<td bgcolor="#ACD6FF" width="155" align="center"> Row 2</td>
<td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
<td align="center"> 60 </td>
</tr>
<tr>
<td bgcolor="#ACD6FF" width="155" align="center"> Row 3</td>
<td bgcolor="#ACD6FF" width="45" align="center"> 30 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
<td align="center"> 70 </td>
</tr>
</table>
<table>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
</table>
<table>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
<tr> <td> 10 </td> </tr>
</table>
</html>
$new_dom just outputs \n instead of HTML markup. I tried looking at this answer: PHP DOMDocument stripping HTML tags, but appending the table this way didn't work either.

Your code throws:
Fatal error: Uncaught exception 'DOMException' with message 'Wrong Document Error'
You cannot move nodes from one document to another directly. To do that, use importNode() with the deep flag set to true:
$dom = new DOMDocument();
$dom->loadHTMLFile('x.html');
$new_dom = new DOMDocument();
$table = null;
$nodesTable = $dom->getElementsByTagName("table");
foreach ($nodesTable as $nodeTable) {
    $color = $nodeTable->getAttribute('bordercolordark');
    if ($color == '#73BAFF') {
        // Deep import: copies the table and all of its descendants into $new_dom.
        $table = $new_dom->importNode($nodeTable, true);
    }
}
// Only append if a matching table was found.
if ($table !== null) {
    $new_dom->appendChild($table);
}
echo $new_dom->saveHTML();
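Because the deep flag is true, this imports the table element together with all of its children; without the flag, only the table element itself would be imported.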
Note: I'd also disable the entity loader in your case with libxml_disable_entity_loader(true); (deprecated as of PHP 8.0, where external entity loading is disabled by default). I am not sure whether XXE attacks work with loadHTML() too, but it is a cheap precaution.

Related

PHP/HTML: Replace everything between <TD colspan=6 ...> .... </TD>

I want to replace everything inside a TD tag that has the attribute/value colspan=6.
I want to replace all of this ...
<TD colspan=6 rowspan=4 align="center" nowrap="1">
<TABLE>
<TR>
<TD width="50%" nowrap=1><font size="3" face="Arial">
Some Text
</font>
</TD>
</TR>
<TR>
<TD width="50%" nowrap=1><font size="3" face="Arial">
Some Text
</font>
</TD>
</TR>
<TR>
<TD width="50%" nowrap=1><font size="3" face="Arial">
Some Text
</font>
</TD>
</TR>
</TABLE>
</TD>
...with these lines:
<TD colspan=12 rowspan=2 align="center" nowrap="1">
<TABLE>
<TR>
<TD>frei</TD>
Some Text
</TR>
</TABLE>
</TD>
Any Ideas? Maybe with simple_html_dom.php?
Solved with PHP and regex:
// \s* tolerates line breaks between the closing tags; the s flag lets . match newlines.
$plan1 = preg_replace('~<TD colspan=6.*?</TR>\s*</TABLE>\s*</TD>~s',
'<TD colspan=12 rowspan=2 align="center" nowrap="1">
<TABLE><TR><TD></TD></TR></TABLE></TD>', $plan1);

Python regex ignore new line

I have a web page that looks like this:
<td valign="top">
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center">Title - secondname</h1></td>
</tr>
<tr>
<td><span class="style10">Cat1 :</span></td>
<td>1st name</td>
</tr>
<tr>
<td width="32%"><span class="style10">Cat2 :</span></td>
<td width="68%"><b><i>secondname</i></b></td>
</tr>
<tr>
<td><span class="style10">cat4 :</span></td>
<td>Bla bla</td>
</tr>
<tr>
<td><span class="style10">Cat3 :</span></td>
<td>thirdName2</td>
</tr>
</table>
</td>
<td valign="top">
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center">Title - secondname</h1></td>
</tr>
<tr>
<td><span class="style10">Cat1 :</span></td>
<td>1st name</td>
</tr>
<tr>
<td width="32%"><span class="style10">Cat2 :</span></td>
<td width="68%"><b><i>secondname</i></b></td>
</tr>
<tr>
<td><span class="style10">cat4 :</span></td>
<td>Bla bla</td>
</tr>
<tr>
<td><span class="style10">Cat3 :</span></td>
<td>thirdName2</td>
</tr>
</table>
</td>
I would like to get certain values from this page using Python regex.
After <div align="center"> I would like to get the href value "/title/name.php", the img src "./movie/image.jpg", and "Title - secondname" from <h1 align="center">Title - secondname</h1>.
I have tried this:
regex = 'class="main_tb3"*\n<a href="(.+?)" target="_blank">\n<img src="(.+?)"'
Please help me.
You can use the regexes below:
For href value: <a href="(.*?)"
For Image src: <img src="(.*?)"
For Title: <h1 align="center">(.*?)</h1>
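To address the newline part of the question: one way to make a single pattern span several lines is to put \s* between the tags (it matches newlines too) and to compile with re.DOTALL so that . can cross line breaks. The following is only a sketch under assumptions: the html string is a shortened stand-in for one cell of the page in the question, and the names pattern and results are made up for illustration.

```python
import re

# Shortened stand-in for one cell of the page shown in the question.
html = """<td valign="top">
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center">Title - secondname</h1></td>
</tr>
</table>
</td>"""

# \s* matches any whitespace, including newlines, between the tags.
# re.DOTALL lets .*? cross the line breaks between the image and the heading.
pattern = re.compile(
    r'<div align="center">\s*'
    r'<a href="([^"]+)"[^>]*>\s*'
    r'<img src="([^"]+)"'
    r'.*?<h1 align="center">(.*?)</h1>',
    re.DOTALL,
)

# One (href, src, title) tuple per matching block.
results = pattern.findall(html)
for href, src, title in results:
    print(href, src, title)
```

This stays brittle if the markup changes (attribute order, extra tags), which is why an HTML parser is usually the more robust choice.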
You will find it a lot simpler to install something like BeautifulSoup to do this:
from bs4 import BeautifulSoup
html = """
<td valign="top">
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center">Title - secondname</h1></td>
</tr>
<tr>
<td><span class="style10">Cat1 :</span></td>
<td>1st name</td>
</tr>
<tr>
<td width="32%"><span class="style10">Cat2 :</span></td>
<td width="68%"><b><i>secondname</i></b></td>
</tr>
<tr>
<td><span class="style10">cat4 :</span></td>
<td>Bla bla</td>
</tr>
<tr>
<td><span class="style10">Cat3 :</span></td>
<td>thirdName2</td>
</tr>
</table>
</td>
<td valign="top">
<table width="100%" border="0" cellspacing="2" cellpadding="1" class="main_tb3">
<tr>
<td colspan="2">
<div align="center">
<a href="/title/name.php" target="_blank">
<img src="./movie/image.jpg" alt="TitleName" border="0" height="100" width="225" />
</a>
</div>
</td>
</tr>
<tr>
<td colspan="2"><h1 align="center">Title - secondname</h1></td>
</tr>
<tr>
<td><span class="style10">Cat1 :</span></td>
<td>1st name</td>
</tr>
<tr>
<td width="32%"><span class="style10">Cat2 :</span></td>
<td width="68%"><b><i>secondname</i></b></td>
</tr>
<tr>
<td><span class="style10">cat4 :</span></td>
<td>Bla bla</td>
</tr>
<tr>
<td><span class="style10">Cat3 :</span></td>
<td>thirdName2</td>
</tr>
</table>
</td>"""
soup = BeautifulSoup(html, "html.parser")
for table in soup.find_all("table", class_="main_tb3"):
    print(table.find('a').get('href'))
    print(table.find('h1').text)
For the HTML you have given, this will print the following:
/title/name.php
Title - secondname
/title/name.php
Title - secondname

xpath got wrong html attribute

So, I have this PHP scraper code and the HTML below, which I want to scrape using XPath.
When I try to scrape every @href, it shows outerHTML 14, when it is supposed to be 14
The @href is cut in half where the spaces are. What causes this?
$content = $xpath->query('//a');
foreach($content as $c){
var_dump(htmlspecialchars($c->C14N())); echo '<br>';
}
The one above is the CURL code.
Here is the HTML.
<div class="outercalendar" id="maincalendar821"><table class="calendarHeader">
<tbody><tr>
<td><input type="button" onclick="AjxGetMainCalendarMonth('2', '2015', '821')" value="<"></td>
<td class="calendarHeader" colspan="5">March 2015</td>
<td><input type="button" onclick="AjxGetMainCalendarMonth('4', '2015', '821')" value=">"></td>
</tr>
</tbody></table>
<table class="calendar">
<tbody><tr>
<td class="calendarDay">S</td>
<td class="calendarDay">M</td>
<td class="calendarDay">T</td>
<td class="calendarDay">W</td>
<td class="calendarDay">T</td>
<td class="calendarDay">F</td>
<td class="calendarDay">S</td>
</tr>
<tr>
<td class="calendar">1</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar">7</td>
</tr>
<tr>
<td class="calendar">8</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar">14</td>
</tr>
<tr>
<td class="calendar">15</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar">21</td>
</tr>
<tr>
<td class="calendar">22</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar">28</td>
</tr>
<tr>
<td class="calendar">29</td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
<td class="calendar"> </td>
</tr>
</tbody></table>
</div>
The issue could be in the structure of the data stored in the tag.
I would suggest starting with a more specific XPath that selects the attribute itself:
//a/@href
so your initial code would be:
$content = $xpath->query('//a/@href');

how to use sql query result in html table

I am trying to create an HTML table showing the results of a PHP SQL query. It is a results page for students. The PHP code is as follows:
$r1=$_GET["r"];
$con=mysqli_connect('localhost','chumspai_tlss','Tls121','chumspai_tlsResult');
if (mysqli_connect_errno())
{
echo "Failed to connect to MySQL: " . mysqli_connect_error();
}
$result = mysqli_query($con,"SELECT * FROM nursery_blue_ WHERE sr_='$r1'");
while($row = mysqli_fetch_array($result))
{
The HTML code is:
<pre>
<form name="frmResult" id="frmResult" action="" method="post" onsubmit="return checkEmpty();">
<table width="80%" cellpadding="5" cellspacing="5" border="0">
<tr>
<td class="heading noborder">Enter Your Roll Number:</td>
<td class="noborder"><input type="text" id="r" name="r" value="" /></td>
</tr>
<tr>
<!--
<td class="heading noborder">Enter Your Name:</td>
<td class="noborder"><input type="text" id="name" name="name" value="" /></td>
</tr>
<tr>
<td class="heading noborder">Search by</td>
<td class="noborder"><input type="radio" id="option" name="option" value="rno" checked="checked" />
Roll No
<input type="radio" id="option" name="option" value="name" />
Name </td>
</tr>
-->
<tr>
<td class="noborder"> </td>
<td class="noborder"><input type="submit" name="submit" value="Search" />
<input type="reset" name="reset" value="Clear" />
</td>
</tr>
<!--<tr>
<td colspan="2"> <embed src="images/wait.swf"></embed></td>
</tr> -->
</table>
</form>
<div style="border:1px solid #000000;">
<table width="100%" cellpadding="10" cellspacing="0" border="0">
<tr>
<td class="heading grey" width="30%">RNO</td>
<td><?php
Print $row['sr_'];
?>
</td>
</tr>
<tr>
<td class="heading grey">NAME</td>
<td class="shade"></td>
</tr>
<tr>
<td class="heading grey">FATHER</td>
<td></td>
</tr>
<tr>
<td class="heading grey">regno</td>
<td></td>
</tr>
</table>
<table width="100%" cellpadding="10" cellspacing="0" border="0">
<tr class="grey">
<td rowspan="2" class="heading">Sr.no </td>
<td rowspan="2" class="heading">Name of subject </td>
<td rowspan="2" class="heading">Maximum Marks</td>
<td colspan="7" class="heading">detail of marks Obtained</td>
<tr class="grey">
<td class="heading">PART ONE</td>
<td class="heading">Total</td>
</tr>
<tr>
<td>1</td>
<td>Urdu</td>
<td></td>
<td> </td>
<td></td>
</tr>
<tr class="shade">
<td>2</td>
<td>English</td>
<td></td>
<td> </td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>Islamyat</td>
<td></td>
<td> </td>
<td></td>
</tr>
<tr class="shade">
<td>4</td>
<td>pakstudies</td>
<td></td>
<td> </td>
<td></td>
</tr>
<tr class="shade">
<td>6</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr>
<td>7</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr class="shade">
<td>8</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr class="shade">
<td>9</td>
<td></td>
<td></td>
<td></td>
<td>0</td>
</tr>
<tr class="grey">
<td colspan="2" class="heading">TOTAL</td>
<td class="heading">1100</td>
<td colspan="4" class="heading"></td>
</tr>
<tr class="grey">
<td colspan="3" class="heading">NOTIFICATION</td>
<td class="heading"></td>
<td class="heading"></td>
<td colspan="2" class="heading"></td>
</tr>
<tr>
<td colspan="7">(i) This provisional result intimation is issued as a notice only. Errors and omissions are excepted.</td>
</tr>
</table>
</pre>
Please help me embed this PHP query in this HTML table and form.
You are not far off.
The variable $row is an array containing your data. Try this to see its structure inside your while loop:
print_r($row);
This prints the name of each item in your array. Note the names down. Then you can do something like this:
...<td><?php echo $row['desired_column_name']; ?></td>...
If you receive data from your mysql query, this should do the trick.
Hope it helps,
Paul
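Try this:
$cnt = 0; // row counter used to build unique element ids
// mysql_* functions were removed in PHP 7, so use mysqli with the existing connection $con.
$result = mysqli_query($con, "select * from emp");
while($row = mysqli_fetch_array($result))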
{
echo "<tr>";
echo "<td id=SrNo$cnt >".$row['eno']."</td>";
echo "<td id=ItemId$cnt >".$row['eId']."</td>";
echo "<td>". "<button name='Update' id='update' onclick='show(".$cnt.")'>UPDATE</button>"."</td>";
echo "<td>". "<button name='Report' id='show' onclick='Report(".$row['SrNo'].")'>REPORT</button>"."</td>";
echo "</tr>";
echo "<div id=show$cnt>";
echo "</div>";
$cnt++;
}

Php HTML DOM parsing

<table width="100%" cellspacing="0" cellpadding="0" border="0" id="Table4">
<tbody>
<tr>
<td valign="top" class="tx-strong-dgrey">
<a class="anc-noul" href="http://www.example.com/catalog/proddetail.asp?logon=&langid=EN&sku_id=0665000FS10129471&catid=25653">
Apple 8GB 3rd Generation iPod Touch</a></td>
</tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
<tr>
<td valign="top" class="tx-normal-grey">
Product detail
<a href="http://www.example.com/catalog/proddetail.asp?logon=&langid=EN&sku_id=0665000FS10129471&catid=25653">
More Info</a></td>
</tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
<tr>
<td valign="top" class="tx-normal-red">
<span class="tx-strong-dgrey">Price:</span>
$189.99</td>
</tr>
<tr>
<td valign="top">You save: $9.00 after instant savings</td>
</tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
<tr>
<td valign="top" class="tx-normal-grey">
<a href="http://www.example.com/catalog/subclass.asp?catid=25653&logon=&langid=EN">
View similar products</a>
<a href="http://www.example.com/catalog/mfr.asp?man=Apple&catid=19&logon=&langid=EN">
View similar products with same brand</a>
</td></tr>
<tr>
<td valign="top" class="element-spacer"/>
</tr>
</tbody>
</table>
I want to be able to get the $189.99.
echo $ret[0]->find('tr', 4)->plaintext;
This outputs: 'Price: $189.99'
I just need $189.99, not 'Price:'
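// Split "Price: $189.99" at the colon and keep the part after it.
$exp = explode(":", $ret[0]->find('tr', 4)->plaintext);
$price = trim($exp[1]); // trim() removes the whitespace left around the amount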
