Scraping a Webpage with DOMDocument and DOMXPath - php

I'm very new to this. I would like to extract a table from a page using PHP and return its HTML after modifying the href values of all anchors.
Here is the table:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1255">
<link rel="stylesheet" type="text/css" href="../CssGraduateE.css">
<title></title>
</head>
<body>
<div>
<br>
<table class="main" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td>
<br><span class="MainHeader">Subjects in Faculty - Electrical Engineering</span><br><br>
<table cellpadding="2" cellspacing="0" border="1" width="100%">
<tbody>
<tr>
<td><span class="SecondHeader"> Subject Number</span></td>
<td><span class="SecondHeader">Subject Name</span></td>
<td><span class="SecondHeader">Points</span></td>
<td><span class="SecondHeader">Semesters</span></td>
<td>Subject Site</td>
</tr>
<tr>
<td>46001 </td>
<td nowrap="">Engineering of Distributed Software Sys</td>
<td>3</td>
<td><br></td>
<td><a target="_newtab" href="http://www.thislinkisok.com/courses/046001">www</a></td>
</tr>
<tr>
<td>46002 </td>
<td nowrap="">Design and Analysis of Algorithms</td>
<td>3</td>
<td>B<br></td>
<td> <br></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<br>
<table border="0">
<tbody>
<tr>
<td>Last Update on :</td>
<td>Wednesday ,9 April 2014</td>
<td></td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
I know how to grab the table I want:
$query = $xpath->query('//table[@class="main"]//table[1]');
but how do I loop through all the links that begin with "../xxx" and modify them to something like this: "www.mynewlink.com/xxx" ?
At the end I would like to return the extracted table as HTML. How do I do this with native DOMDocument and DOMXpath?
Thanks All!

If $html is your string with HTML you get from the external website, you can do something like this:
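$dom = new DOMDocument();
// @ suppresses warnings about malformed markup
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// select anchors inside the main table whose href starts with "../"
foreach($xpath->query('//table[@class="main"]//a[starts-with(@href, "../")]') as $link) {
// replace the leading ".." with the new host
$link->setAttribute('href', preg_replace('#^\.\.#', 'http://www.mynewlink.com', $link->getAttribute('href')));
}
// copy the table into a fresh document so that only its HTML is output
$container = new DOMDocument();
$container->appendChild($container->importNode($xpath->query('//table[@class="main"]')->item(0), true));
echo $container->saveHTML();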
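A self-contained variation of the answer above, offered as a sketch rather than the definitive way to do it. It assumes PHP 7 or later and relies on saveHTML() accepting a node argument, so the second DOMDocument is not needed. The function name rewriteTableLinks, the sample markup, and the base URL are made up for illustration.

```php
<?php
// Hypothetical helper: rewrite "../" links inside table.main and
// return the HTML of that table only.
function rewriteTableLinks(string $html, string $base): string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $table = $xpath->query('//table[@class="main"]')->item(0);
    if ($table === null) {
        return ''; // no matching table in the page
    }

    // Query relative to the table so links elsewhere are untouched.
    foreach ($xpath->query('.//a[starts-with(@href, "../")]', $table) as $link) {
        // Drop the leading ".." (two characters) and prepend the new base.
        $link->setAttribute('href', $base . substr($link->getAttribute('href'), 2));
    }

    // Passing a node to saveHTML() serialises just that subtree.
    return $dom->saveHTML($table);
}

// Sample input (assumed, not taken from the real page).
$sample = '<html><body><table class="main"><tr><td>'
        . '<a href="../courses/046001">www</a>'
        . '<a href="http://www.thislinkisok.com/x">ok</a>'
        . '</td></tr></table></body></html>';

echo rewriteTableLinks($sample, 'http://www.mynewlink.com');
```

Absolute links are left alone because the XPath predicate only matches href values that start with "../".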

Related

how to add manual page break in tcpdf

Hi everyone, I am trying to add a manual page break in TCPDF. I tried the tag shown below, but it doesn't work. How do I force a page break at the location marked in the code?
$content = '
<style>
.chead
{
...
</style>
<table class="body">
<tr>
<td>
<table style="width:595px;">
<tr>
'.$myhead.'
</tr>
</table>
</td>
</tr>
//how to add manual page break here..?
<tcpdf method="AddPage" />
<tr>
<td>
<table style="width:595px;">
<tr>
'.$mybody.'
</tr>
</table>
</td>
</tr>
</table>';
Try using:
<br pagebreak="true">

How to convert english to arabic text in FPDF

I am using the FPDF API to generate PDF documents from English text in HTML format. Is it possible to convert the English words to Arabic in FPDF? I tried but could not get it to work. Any help will be highly appreciated.
Here's my code:
<?php
$pdf=new PDF_HTML();
$pdf->AliasNbPages();
$pdf->SetAutoPageBreak(true, 15);
$pdf->AddPage();
$pdf->SetFont('Arial','B',14);
$pdf->WriteHTML('<para><h1>Created By test Developer</h1><br>
');
$pdf->SetFont('Arial','B',7);
$htmlTable='
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ar" dir="rtl">
<body>
<div class="trn">
<p>The Client</p>
<p> </p>
<table border="1" cellpadding="1" cellspacing="1" style="width:500px">
<tbody>
<tr>
<td>name1</td>
<td>name2</td>
</tr>
<tr>
<td>
<table border="1" cellpadding="1" cellspacing="1" style="width:500px">
<tbody>
<tr>
<td>>بيست</td>
<td> </td>
</tr>
</tbody>
</table>
</td>
<td>
<table border="1" cellpadding="1" cellspacing="1" style="width:500px">
<tbody>
<tr>
<td>name1</td>
<td> </td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td>
<table border="1" cellpadding="1" cellspacing="1" style="width:500px">
<tbody>
<tr>
<td>name1</td>
<td> </td>
</tr>
</tbody>
</table>
</td>
<td>
<table border="1" cellpadding="1" cellspacing="1" style="width:500px">
<tbody>
<tr>
<td>name1</td>
<td> </td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<p> </p>
<p>Your Faithfully,</p>
</div></body></html>
';
$pdf->WriteHTML2("<br><br><br>$htmlTable");
$pdf->SetFont('Arial','B',6);
For more code please visit http://codepad.viper-7.com/C1NSqR

Send HTML email message in PHP

I'm trying to send an HTML formatted invoice but it is sending the message as plain text rather than formatted HTML.
The code is:
$this->load->library('email',$config);
$this->email->set_newline("\r\n");
$this->email->from('sample@email.com', 'Sample');
$this->email->to('sample2@email.com');
$this->email->cc('sample3@email.com');
$this->email->subject('Sample Test');
$this->email->message($message);
$this->email->send();
echo $this->email->print_debugger();
and the content:
$message ='
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<style>
<![CDATA[
body, td { color:#a6a7a3; font-size: 14px; font-family:Arial; font-weight:normal; text-decoration:none; }
a { color:#a6a7a3; font-weight:normal; text-decoration:none; }
table td { border-collapse: collapse;}
]]>
</style>
</head>
<body>
<table>
<tr>
<td>
<table>
<tr>
<td>
<!-- // Begin Template Header \ -->
<table>
<tr>
<td><!--IMAGE--></td>
</tr>
</table><!-- // End Template Header \ -->
</td>
</tr>
<tr>
<td>
<!-- // Begin Template Body \ -->
<table>
<tr>
<td>
<!-- // Begin Module: Standard Content \ -->
<table>
<tr>
<td>
<div>
<strong style="font-size:14px;font-family:Arial;">Dear Sample,</strong><br />
<br />
Thank you for being with us. <!-- Start Transaction Information -->
<table>
<tr>
<td></td>
</tr>
<tr>
<td></td>
</tr>
<tr>
<td><strong>Sample</strong></td>
</tr>
</table>
<table>
<tr>
<td></td>
</tr>
<tr>
<td>Item Name</td>
<td>Quantity</td>
<td>Item Price</td>
<td>Item Code</td>
<td>Shipping</td>
</tr>
<tr>
<td>'.$item.'</td>
<td>'.$quantity.'</td>
<td>'.$price.'</td>
<td>'.$code.'</td>
<td>'.$shipping.'</td>
</tr>
</table><!-- End Transaction Information -->
</div>
</td>
</tr>
</table><!-- // End Module: Standard Content \ -->
</td>
</tr>
</table><!-- // End Template Body \ -->
</td>
</tr>
</table><br />
</td>
</tr>
</table><br />
</body>
</html>';
$this->email->set_mailtype("html");
You might also find this useful (I was bored). You'll get much better device compatibility with this code:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta content="telephone=no" name="format-detection" /><!-- ios -->
<meta content="on" http-equiv="cleartype" /><!-- Internet explorer -->
<style>
<![CDATA[
html,body,.ReadMsgBody, .ExternalClass,.ExternalClass * { background-color:#ffffff;width:100%!important;line-height:100%;margin:0;-webkit-text-size-adjust:100%; }
table,table td { border-collapse:collapse;mso-table-lspace:0pt;mso-table-rspace:0pt; }
img { display:block;border:0 none;outline:none;height:auto;line-height:100%;margin:0;padding:0;text-decoration:none;-ms-interpolation-mode:bicubic; }
.container { width:600px; }
#outlook a { padding:0; }
.yshortcuts,.yshortcuts a,.yshortcuts a:link,.yshortcuts a:visited,.yshortcuts a:hover,.yshortcuts a span { color:#a6a7a3;text-decoration:none!important;border-bottom:none!important;background:none!important;}
]]>
</style>
<title></title>
</head>
<body bgcolor="#FFFFFF" style="min-width:100%;margin-top:0;margin-bottom:0;margin-left:0;margin-right:0;padding-top:0;padding-bottom:0;padding-left:0;padding-right:0;color:#a6a7a3;font-family:Arial,sans-serif;">
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
<!-- // Begin Template Header \ -->
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td><!--IMAGE (style-display:block;!!)--></td>
</tr>
</table><!-- // End Template Header \ -->
</td>
</tr>
<tr>
<td>
<!-- // Begin Template Body \ -->
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
<!-- // Begin Module: Standard Content \ -->
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td style="color:#a6a7a3;font-size:14px;font-family:Arial,sans-serif;text-align:left;">
<b>Dear Sample,</b><br />
</td>
</tr>
<tr>
<td style="color:#a6a7a3;font-size:14px;font-family:Arial,sans-serif;text-align:left;">
Thank you for being with us. <!-- Start Transaction Information -->
</td>
</tr>
<tr>
<td style="color:#a6a7a3;font-size:14px;font-family:Arial,sans-serif;text-align:left;">
<b>Sample</b>
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table border="0" cellpadding="3" cellspacing="0"><tr><td height="3"><table border="0" cellpadding="0" cellspacing="0"><tr><td></td></tr></table></td></tr></table>
</td>
</tr>
<tr>
<td>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">Item Name</td>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">Quantity</td>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">Item Price</td>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">Item Code</td>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">Shipping</td>
</tr>
<tr>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">'.$item.'</td>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">'.$quantity.'</td>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">'.$price.'</td>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">'.$code.'</td>
<td style="text-align:left;font-family:Arial,sans-serif;font-size:14px;color:#a6a7a3;">'.$shipping.'</td>
</tr>
</table><!-- End Transaction Information -->
</td>
</tr>
</table><!-- // End Module: Standard Content \ -->
</td>
</tr>
</table><!-- // End Template Body \ -->
</td>
</tr>
</table>
</body>
</html>

php scrubbing a website for icecast listeners

Can anyone help me extract the current listener count from the page below using PHP?
I have attached my PHP code below as well, but it needs to be modified.
http://209.105.250.69:8382/
The page source is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Icecast Streaming Media Server</title>
<link rel="stylesheet" type="text/css" href="style.css">
</head>
<body topmargin="0" leftmargin="0" rightmargin="0" bottommargin="0">
<h2>Icecast2 Status</h2>
<br><div class="roundcont">
<div class="roundtop"><img src="/corner_topleft.jpg" class="corner" style="display: none"></div>
<table border="0" width="100%" id="table1" cellspacing="0" cellpadding="4"><tr><td bgcolor="#656565">
<a class="nav" href="admin/">Administration</a><a class="nav" href="status.xsl">Server Status</a><a class="nav" href="server_version.xsl">Version</a>
</td></tr></table>
<div class="roundbottom"><img src="/corner_bottomleft.jpg" class="corner" style="display: none"></div>
</div>
<br><br><div class="roundcont">
<div class="roundtop"><img src="/corner_topleft.jpg" class="corner" style="display: none"></div>
<div class="newscontent">
<div class="streamheader"><table cellspacing="0" cellpadding="0">
<colgroup align="left"></colgroup>
<colgroup align="right" width="300"></colgroup>
<tr>
<td><h3>Mount Point /listen.mp3</h3></td>
<td align="right">
M3UXSPF
</td>
</tr>
</table></div>
<table border="0" cellpadding="4">
<tr>
<td>Stream Title:</td>
<td class="streamdata">Quran Kareem Radio</td>
</tr>
<tr>
<td>Stream Description:</td>
<td class="streamdata">Quran Kareem Radio</td>
</tr>
<tr>
<td>Content Type:</td>
<td class="streamdata">audio/mpeg</td>
</tr>
<tr>
<td>Mount started:</td>
<td class="streamdata">Thu, 11 Apr 2013 19:19:59 -0400</td>
</tr>
<tr>
<td>Bitrate:</td>
<td class="streamdata">60</td>
</tr>
<tr>
<td>Current Listeners:</td>
<td class="streamdata">5</td>
</tr>
<tr>
<td>Peak Listeners:</td>
<td class="streamdata">25</td>
</tr>
<tr>
<td>Stream Genre:</td>
<td class="streamdata">Islam</td>
</tr>
<tr>
<td>Stream URL:</td>
<td class="streamdata"><a target="_blank" href="http://qkradio.com.au">http://qkradio.com.au</a></td>
</tr>
<tr>
<td>Current Song:</td>
<td class="streamdata"></td>
</tr>
</table>
</div>
<div class="roundbottom"><img src="/corner_bottomleft.jpg" class="corner" style="display: none"></div>
</div>
<br><br>
<div class="poster">Support icecast development at <a class="nav" target="_blank" href="http://www.icecast.org">www.icecast.org</a>
</div>
</body>
</html>
I have used the following so far but it needs to be modified
<?php
$fp = fsockopen("listen.qkradio.com.au", 8382, $errno, $errstr, 30);
if (!$fp) {
echo "$errstr ($errno)<br />\n";
} else {
for($i=0; $i<30; $i++) {
if(feof($fp)) break;
$fp_data=fread($fp,31337);
usleep(500000);
}
$fp_data=ereg_replace("^.*<body>","",$fp_data);
$fp_data=ereg_replace("</body>.*","",$fp_data);
list($current,$status,$peak,$max,$unique,$bitrate,$song) = explode(",", $fp_data);
if ($status == "0") {
echo "<center>Off Air</center>";
} else {
echo "<TABLE>";
//start edit below - comment with "//" unnecessary lines
echo "<TR><TD>Current Listeners: </TD><TD>".$current."</TD></TR>";
//echo "<TR><TD>Server Status: </TD><TD>".$status."</TD></TR>";
//stop editing from here
echo "</TABLE>";
//start edit below - comment with "//" next line to stop scrolling
echo "<marquee>".$song."</marquee>";
//stop editing here
}}
?>
First, I would use
$html = file_get_contents('http://209.105.250.69:8382/');
to retrieve the remote file. Then I would use:
$doc = new DOMDocument();
$doc->loadHTML($html);
to build a DOM document from it. Then you can use XPath to retrieve the information from it:
$selector = new DOMXPath($doc);
$result = $selector->query('....');
Then, for example, you can use the following code to retrieve stream stats:
$stats = array(
'Stream Title' => '',
'Stream Description' => '',
'Bitrate' => ''
// ...
);
foreach($stats as $key => $val) {
$result = $selector->query("//td[text()='$key:']");
foreach($result as $node) {
$stats[$key] = $node->nextSibling->nextSibling->nodeValue;
}
}
var_dump($stats);
// output the stream description
echo $stats['Stream Description'];
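As a sketch of the same idea that runs on its own, the following collects every label/value pair in one pass. It differs from the code above in two ways that are my assumptions, not part of the answer: it starts from the td cells with class "streamdata" seen in the posted source, and it uses the XPath preceding-sibling axis instead of nextSibling->nextSibling, so whitespace between cells does not matter. The function name parseIcecastStats and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: map each "Label:" cell to the value cell next to it.
function parseIcecastStats(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $stats = array();

    foreach ($xpath->query('//td[@class="streamdata"]') as $valueCell) {
        // The label is the nearest td before the value cell in the same row.
        $labelCell = $xpath->query('preceding-sibling::td[1]', $valueCell)->item(0);
        if ($labelCell === null) {
            continue;
        }
        // "Current Listeners:" becomes the key "Current Listeners".
        $label = rtrim(trim($labelCell->textContent), ':');
        $stats[$label] = trim($valueCell->textContent);
    }

    return $stats;
}

// Sample rows modelled on the status page source (assumed input).
$sample = '<table>'
        . '<tr><td>Current Listeners:</td> <td class="streamdata">5</td></tr>'
        . '<tr><td>Peak Listeners:</td> <td class="streamdata">25</td></tr>'
        . '</table>';

$stats = parseIcecastStats($sample);
echo $stats['Current Listeners'];
```

With the real page, the $sample string would be replaced by the result of file_get_contents() as in the answer above.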

How can i get the entire HTML of an element using regex?

I'm learning regex but can't figure this out. I want to get the entire HTML of an element. How should I proceed?
I already tried this:
/\< td class=\"desc1\"\>(.+)/i
It returns:
Array
(
[0] => < td class="desc1">
[1] =>
)
The code that I'm matching is this:
<table id="profile" cellpadding="1" cellspacing="1">
<thead>
<tr>
<th colspan="2">Jogador TheInFEcT </th>
</tr>
<tr>
<td>Detalhes</td>
<td>Descrição:</td>
</tr>
</thead><tbody>
<tr>
<td class="empty"></td><td class="empty"></td>
</tr>
<tr>
<td class="details">
<table cellpadding="0" cellspacing="0">
<tbody><tr>
<th>Classificação</th>
<td>11056</td>
</tr>
<tr>
<th>Tribo:</th>
<td>Teutões</td>
</tr>
<tr>
<th>Aliança:</th>
<td>-</td>
</tr>
<tr>
<th>Aldeias:</th>
<td>1</td>
</tr>
<tr>
<th>População:</th>
<td>2</td>
</tr><tr>
<td colspan="2" class="empty"></td>
</tr>
<tr>
<td colspan="2"> » Alterar perfil</td>
</tr>
</tbody></table>
</td>
<td class="desc1">
<div>STATUS: OFNAaaaAA</div>
</td>
</tr>
</tbody>
</table>
I need to get the entire code inside the < td class="desc1">, like this:
<div >STATUS: OFNAaaaAA< /div>
</td>
</tr>
</tbody>
</table>
Could someone help me out?
Thanks in advance.
I usually use
$dom = new DOMDocument();
$dom->loadHTML($htmldata);
to parse the HTML into a DOM tree. Then you can use
$node = $dom->getElementById($id);
/* or */
$nodes = $dom->getElementsByTagName($tag);
to get your HTML/XML node.
Now, use
$node->textContent
to get the text inside the node.
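Since textContent returns only text, here is a hedged sketch of how the DOM approach could return the inner HTML instead. It assumes PHP 7 or later and serialises each child node with saveHTML(); the helper name innerHtmlByClass and the sample markup are made up for illustration, and the class name is assumed to contain no quote characters.

```php
<?php
// Hypothetical helper: return the inner HTML of the first <td> whose
// class attribute equals $class, or '' when there is no such cell.
function innerHtmlByClass(string $html, string $class): string
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $node = $xpath->query('//td[@class="' . $class . '"]')->item(0);
    if ($node === null) {
        return '';
    }

    // Serialise every child of the cell; the cell's own tags are left out.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }

    return trim($inner);
}

// Reduced version of the markup from the question (assumed input).
$sample = '<table><tr><td class="details">x</td>'
        . '<td class="desc1"><div>STATUS: OFNAaaaAA</div></td></tr></table>';

echo innerHtmlByClass($sample, 'desc1');
```

Unlike a regex, this keeps working when the cell contains nested tables or other td elements.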
Try this. It does not cover all possible cases, but it should work:
/<td\s+class=['"]\s*desc1\s*['"]\s*>((.|\n)*)<\/td>/i
tested with: http://www.pagecolumn.com/tool/pregtest.htm
Edit: improved solution suggested by Alan Moore:
/<td\s+class=['"]\s*desc1\s*['"]\s*>(.*?)<\/td>/s
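A minimal sketch of how the improved pattern could be used from PHP with preg_match(). The function name extractDesc1 and the sample string are assumptions for illustration. The s modifier lets the dot match newlines, and the lazy (.*?) stops at the first closing td tag, so a cell that itself contains another td would be cut short.

```php
<?php
// Hypothetical helper: return the content of <td class="desc1">, or null
// when the pattern does not match.
function extractDesc1(string $html): ?string
{
    // Same pattern as above, with the quotes escaped for a PHP string.
    $pattern = '/<td\s+class=[\'"]\s*desc1\s*[\'"]\s*>(.*?)<\/td>/s';

    if (preg_match($pattern, $html, $matches) === 1) {
        // $matches[1] holds the first capture group: the cell content.
        return trim($matches[1]);
    }

    return null;
}

// Reduced version of the markup from the question (assumed input).
$sample = "<td class=\"details\">x</td>\n"
        . "<td class=\"desc1\">\n<div>STATUS: OFNAaaaAA</div>\n</td>";

echo extractDesc1($sample);
```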
