Getting the href attribute and text of certain kinds of links - PHP

Of these four links:
<img border="0" src="imagenes/flech.gif" width="6" height="8">
Albano Y Romina Power<br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">
Armando Manzanero<br>
<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>
Baladas Alternativas<br>
I'm trying to capture the href value and the link text of the first three links while leaving out the fourth. In other words, I'm trying to get this:
escuchar-baladas-de-Albano_Y_Romina_Power.html Albano Y Romina Power
escuchar-baladas-de-Armando_Manzanero.html Armando Manzanero
musica-Merengue-de-Banda_Cuisillos.html Banda Cuisillos
I was trying to take advantage of the fact that the first three contain imagenes/flech.gif, which would let me leave out the fourth. The problem is that imagenes/flech.gif does not appear in the same position in each link. Here is my attempt: it gets as far as the href, but it also includes the fourth link.
Thanks for any help

You should use an HTML parser rather than a regex. Try this:
<?php
$html = <<<EOF
<img border="0" src="imagenes/flech.gif" width="6" height="8">
Albano Y Romina Power<br>
<img border="0" src="imagenes/flech.gif" width="6" height="8">
Armando Manzanero<br>
<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">
<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>
Baladas Alternativas<br>
EOF;
$dom = new DOMDocument();
libxml_use_internal_errors(true); // don't print warnings about the malformed fragment
$dom->loadHTML($html);
// Iterate over all the <a> tags
foreach ($dom->getElementsByTagName('a') as $link) {
    $url  = $link->getAttribute('href');
    $text = trim(preg_replace('/[\r\n]+/', '', $link->nodeValue)); // remove line breaks and surrounding spaces
    // If the text doesn't contain one of the banned words...
    if (!preg_match('/(Baladas Alternativas|another text to filter)/', $text)) {
        echo $url . " " . $text . "\n";
    }
}
?>
DEMO
http://ideone.com/5QX83x
RESOURCES
http://htmlparsing.com/php.html
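If the link to skip can't be recognized by its position, the same DOM approach can select the links with DOMXPath and filter them by their text. This is a minimal sketch, not the answer's own code: the markup is a hypothetical reconstruction in which the hrefs of the first three links come from the expected output in the question, while the fourth href (`baladas-alternativas.html`) is made up. `extractLinks` is likewise a hypothetical helper name.

```php
<?php
// Hypothetical sample markup: the first three hrefs come from the expected output
// in the question; the fourth href is invented for illustration.
$html = '<a href="escuchar-baladas-de-Albano_Y_Romina_Power.html">Albano Y Romina Power</a><br>'
      . '<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>'
      . '<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">'
      . '<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>'
      . '<a href="baladas-alternativas.html">Baladas Alternativas</a><br>';

// Returns [href, text] pairs for every <a href>, skipping links whose text is banned.
function extractLinks(string $html, array $bannedTexts): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // the input is a fragment, not a full document
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query('//a[@href]') as $link) {
        // Collapse whitespace (including line breaks) in the visible link text.
        $text = trim(preg_replace('/\s+/', ' ', $link->textContent));
        if (in_array($text, $bannedTexts, true)) {
            continue;
        }
        $result[] = [$link->getAttribute('href'), $text];
    }
    return $result;
}

foreach (extractLinks($html, ['Baladas Alternativas']) as [$href, $text]) {
    echo $href . ' ' . $text . "\n";
}
```

Filtering on the text rather than on the flech.gif image sidesteps the problem that the image is not always inside the link.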

This code will get the first three links:
$a='<img border="0" src="imagenes/flech.gif" width="6" height="8">Albano Y Romina Power<br><img border="0" src="imagenes/flech.gif" width="6" height="8">Armando Manzanero<br><a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html"><img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>Baladas Alternativas<br>';
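// Lazy .*? stops the optional <img> group at the first digits"> (the end of that image tag)
preg_match_all('/<a.*?href="(.+?)">(?:<img.*?\d+">)?(.+?)<\/a>/', $a, $match);
// $match[1] holds the hrefs and $match[2] the link texts; print at most the first three
for ($i = 0; $i < min(3, count($match[1])); $i++) {
    echo $match[1][$i] . " " . $match[2][$i] . "<br>";
}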
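If the unwanted link is not guaranteed to come last, the regex matches can be filtered by their text instead of by position. A minimal sketch on hypothetical sample markup (the fourth href is made up): the `PREG_SET_ORDER` flag makes `preg_match_all` return one array per match, so each href stays paired with its text, and the `[^>]` and `[^"]` classes keep each part of the pattern inside a single tag. A DOM parser is still the more robust choice for real pages.

```php
<?php
// Hypothetical sample markup; the last href is invented for illustration.
$a = '<a href="escuchar-baladas-de-Armando_Manzanero.html">Armando Manzanero</a><br>'
   . '<a name="inicio21" href="musica-Merengue-de-Banda_Cuisillos.html">'
   . '<img border="0" src="imagenes/flech.gif" width="6" height="8">Banda Cuisillos</a><br>'
   . '<a href="baladas-alternativas.html">Baladas Alternativas</a><br>';

// With PREG_SET_ORDER, each $m is one match: $m[1] = href, $m[2] = link text.
preg_match_all('/<a[^>]*?href="([^"]+)"[^>]*>(?:<img[^>]*>)?(.+?)<\/a>/', $a, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    if ($m[2] === 'Baladas Alternativas') {
        continue; // skip the unwanted link by its text, not by its position
    }
    $links[] = $m[1] . ' ' . $m[2];
}
echo implode("\n", $links), "\n";
```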

Related

How to parse specific parts of a table using PHP and Simple HTML DOM

I have managed to parse a table from some website, but I only need some elements:
include('simple_html_dom.php');
$html = file_get_html($kako);
foreach($html->find('table') as $e)
echo $e->innertext . '<br>';
$html->clear();
unset($html);
For this input:
<tr>
<td style="word-wrap: break-word;">
Title
<img alt="Verified" Title="Verified and marked" src="/images/verified.png" width="20" height="20">
<br>
<a href="/file/........." rel="nofollow">
<img alt="........" Title="........." src="......" width="16" height="16"></a>
<a href="magnet:?xt=urn:btih:.......A1337" rel="nofollow">
<img alt="Download ..... using magnet link" Title="Dow.....ing magnet link" src="/images/magnet.svg" width="16" height="16"></a>
Uploaded 5 days ago Size 1.6 GB</td><td class="is-hidden-touch" >1.6 GB
</td>
<td class="is-hidden-touch" style="text-align: center;" >5</td>
<td class="is-hidden-touch" style="text-align: center;">5 days ago</td><td style="text-align: center;">6627</td><td style="text-align: center;">2445</td>
</tr>
I need the title, the magnet link (<a href="magnet:?...">) and the text "Uploaded 5 days ago....".
Is that possible? I have searched all the manuals and couldn't find anything.
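// Magnet link: the href of the <a> whose href contains "magnet:"
foreach($html->find('a') as $element){
    if (strpos($element->href, 'magnet:') !== false){
        $hrf = $element->href;
    }
}
// Title attribute of the magnet icon (the <img> whose title mentions "magnet link")
foreach($html->find('img') as $ele){
    if(strpos($ele->title, 'magnet link') !== false){
        $title = $ele->title;
    }
}
// "Uploaded ..." text: use ->plaintext, since simple_html_dom nodes have no "text" property
foreach($html->find('td') as $eletd){
    if(strpos($eletd->plaintext, 'Uploaded') !== false){
        $text = $eletd->plaintext;
    }
}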
I have tailored the code to your HTML. Hope this works for you!
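If you'd rather not depend on the third-party simple_html_dom library, PHP's built-in DOMDocument and DOMXPath can pull out the same pieces. This is a swapped-in technique, not the answer's code, and it is only a sketch: it runs on a trimmed copy of the row from the question (the "....." placeholders are kept as given), and it assumes "title" means the text at the start of the first cell rather than the icon's title attribute.

```php
<?php
// Trimmed copy of the row from the question, wrapped in <table> so it parses as a table.
$html = '<table><tr><td style="word-wrap: break-word;">
Title
<img alt="Verified" Title="Verified and marked" src="/images/verified.png" width="20" height="20">
<br>
<a href="/file/........." rel="nofollow">
<img alt="........" Title="........." src="......" width="16" height="16"></a>
<a href="magnet:?xt=urn:btih:.......A1337" rel="nofollow">
<img alt="magnet" src="/images/magnet.svg" width="16" height="16"></a>
Uploaded 5 days ago Size 1.6 GB</td><td class="is-hidden-touch">1.6 GB</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // the input is a fragment, not a full document
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// First text node of the first cell, with whitespace normalized.
$title = $xpath->evaluate('normalize-space(//tr/td[1]/text()[1])');
// href of the first link whose href starts with "magnet:".
$magnet = $xpath->evaluate('string(//a[starts-with(@href, "magnet:")]/@href)');
// The text node (not the whole cell) that contains "Uploaded".
$uploaded = $xpath->evaluate('normalize-space(//td/text()[contains(., "Uploaded")])');

echo $title, "\n", $magnet, "\n", $uploaded, "\n";
```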

Wrap all unwrapped text with <p>

I have this string:
$str = 'সাংবাদিক<p>দলীয় সূত্রে</p>'
.'<img width="600" src="img/1.jpg">বিলুপ্ত হওয়া পাবনা'
.'বিলুপ্ত হওয়া পাবনা<img width="600" src="img/1.jpg">'
.'বিলুপ্ত হওয়া পাবনা<img width="600" src="img/1.jpg">বিলুপ্ত হওয়া পাবনা'
.'<p>শাহজাদপুর </p>';
and I want to turn it into:
$str = '<p>সাংবাদিক</p><p>দলীয় সূত্রে</p>'
.'<img width="600" src="img/1.jpg"><p>বিলুপ্ত হওয়া পাবনা</p>'
.'<p>বিলুপ্ত হওয়া পাবনা</p><img width="600" src="img/1.jpg">'
.'<p>বিলুপ্ত হওয়া পাবনা</p><img width="600" src="img/1.jpg"><p>বিলুপ্ত হওয়া পাবনা</p>'
.'<p>শাহজাদপুর </p>';
I tried this regex:
$str = preg_replace('/^(?!<p>).*(?!<\/p>)/m', '<p>$0</p>', $str);
but it doesn't work properly. Please help.
This isn't a job for a regex but for DOMDocument. Since you are working with HTML fragments rather than a whole HTML document, you need to wrap your string in a basic HTML skeleton, both to avoid surprises from the parser's auto-correction and to declare the document encoding:
$str = 'সাংবাদিক<p>দলীয় সূত্রে</p>'
.'<img width="600" src="img/1.jpg">বিলুপ্ত হওয়া পাবনা'
.'বিলুপ্ত হওয়া পাবনা<img width="600" src="img/1.jpg">'
.'বিলুপ্ত হওয়া পাবনা<img width="600" src="img/1.jpg">বিলুপ্ত হওয়া পাবনা'
.'<p>শাহজাদপুর </p>';
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('<html><head><meta charset="UTF-8" /></head><body>' . $str . '</body></html>');
$bodyNode = $dom->getElementsByTagName('body')->item(0);
$result = '';
foreach ($bodyNode->childNodes as $childNode) {
$result .= ($childNode->nodeType === XML_TEXT_NODE)
? '<p>' . $dom->saveHTML($childNode) . '</p>'
: $dom->saveHTML($childNode);
}
echo $result;
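One side effect of the loop above is that whitespace-only text between tags (a newline, for example) would also be wrapped, producing empty paragraphs. A minimal variant of the same approach, packaged as a hypothetical helper `wrapLooseText`, leaves such nodes unwrapped; the check below uses ASCII input only.

```php
<?php
// Hypothetical helper: wraps the top-level text nodes of an HTML fragment in <p>,
// leaving elements and whitespace-only text untouched.
function wrapLooseText(string $fragment): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Explicit skeleton so loose text lands directly in <body> and UTF-8 is declared.
    $dom->loadHTML('<html><head><meta charset="UTF-8" /></head><body>' . $fragment . '</body></html>');
    libxml_clear_errors();

    $body = $dom->getElementsByTagName('body')->item(0);
    $result = '';
    foreach ($body->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) !== '') {
            $result .= '<p>' . $dom->saveHTML($child) . '</p>';
        } else {
            // Elements and whitespace-only text are serialized unchanged.
            $result .= $dom->saveHTML($child);
        }
    }
    return $result;
}

echo wrapLooseText('intro<p>kept</p><img src="a.jpg">tail'), "\n";
```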

PHP code not running until second submit

I had it working and at some point broke it yesterday and can't figure it out.
I am generating an HTML file from a form input.
I call a function when data is posted to the page, but the foreach loops don't run the first time the form is submitted. Afterwards the page is one asset short: if you upload 3 images, only 2 show up on the generated page.
function AddClientDB($client, $pth, $project){
mysql_connect('localhost', 'dbname', 'pw'); //connect to the MySQL server (host, user, password)
mysql_select_db('tablename'); //select the database
$indx = $project.".html";
$sql="INSERT INTO Clients VALUES (NULL,'$client', '$project', 'http://webpage.net/','$pth','$indx')";
mysql_query($sql) or DIE("Problems with the query:<pre>$sql</pre>" . mysql_error());
//Create client folder
if (!file_exists('uploads/'.$client)) {
mkdir('uploads/'.$client, 0777, true);
echo "Created Folder for Client: ". $client. "<br />";
}
//Make project folder under client
if (!file_exists('uploads/'.$client. '/'. $project)) {
mkdir('uploads/'.$client.'/'.$project, 0777, true);
}
$sql="INSERT INTO Projects VALUES (NULL,'$project', '$client',0,0,'$pth')";
mysql_query($sql) or DIE("Problems with the query:<pre>$sql</pre>" . mysql_error());
$myFile = 'uploads/'.$client.'/'.$project . '/' . $project.".html";
$fh = fopen($myFile, 'w') or die("can't open file");
//Top part of html page to make
$stringDataA = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Title</title>
<link rel="stylesheet" type="text/css" href="http://pixelfirereview.net/styles.css">
<script type="text/javascript">
function popDate(){
var dt=new Date();
document.getElementById("dat").innerHTML=dt;
}
</script>
</head>
<body onload="popDate();">
<div align="center">
<table width="960" border="0" cellspacing="0" cellpadding="0">
<tr>
<td><table width="960" border="0" cellspacing="0" cellpadding="0">
<tr>
<td width="480"></td>
<td width="480"><img src="http://www.pixelfireinc.com/pfp2011/wp-content/uploads/2011/04/PixelfireLogoTopSM1.png" width="340" height="72" border="0" align="right" /></td>
</tr></table></td></tr><tr><td> </td></tr><tr><td>
<table width="960" border="0" cellspacing="0" cellpadding="0"><tr>
<td><div align="center"></div></td></tr><tr>
<td><div align="center">';
//Foreach valid file in the project folder, add it to our $content variable as a string.
//For some reason, the foreach loops only run the second time the form is submitted...if you hit f5 and continue through the warning about resubmitting data, it then fires the code in the foreach loops....
$content = ''; //initialize before the loops append to it
foreach (glob($pth. '*.jpg') as $filename2) {
echo "<br />filename2: ". $filename2. "<br />";
// echo "<br />$filename size " . filesize($filename2) . "<br />";
$content = $content . '<span class="m_title">'.$filename2.'</span><br /><p><img src="http://pixelfirereview.net/'.$filename2.'" /><br /><br /><img src="http://pixelfire.net/clients/images/btn_download.png" width="235" height="50" border="0" align="right" /></p><br />';
}
foreach (glob($pth. '*.png') as $filename3) {
echo "<br />filename3: ". $filename3."<br />";
echo "<br />\n$filename3 size " . filesize($filename3) . "<br />";
$content = $content . '<br /><span class="m_title"><!-- InstanceBeginEditable name="Project Title" -->'.$filename3.'</span><img src="http://pixelfirereview.net/'.$filename3.'" />
<br /><br /><p><img src="http://pixelfire.net/clients/images/btn_download.png" width="235" height="50" border="0" align="right" /></p><br />';
}
foreach (glob($pth. '*.mp4') as $filename4) {
echo "<br />filename4: ". $filename4."<br />";
// echo "<br />\n$filename size " . filesize($filename4) . "<br />";
$content= $content. '<br /><span class="m_title">'.$filename4.'</span><div id="mediaplayer'.$filename4.'"></div><script type="text/javascript" src="http://www.pixelfire.net/clients/jwplayer.js"></script><script type="text/javascript">
jwplayer("mediaplayer'.$filename4.'").setup({
flashplayer: "http://www.pixelfire.net/clients/player.swf",
file: "http://pixelfirereview.net/'.$filename4.'",
width: "960",
height: "565",
autoplay: "false",
image: "http://www.pixelfire.net/clients/images/VideoPreview.jpg",
repeat: "always",
controlbar: "bottom",
});
</script><br />
<p><img src="http://pixelfire.net/clients/images/btn_download.png" width="235" height="50" border="0" align="right" /></p><br />';
}
foreach (glob($pth. '*.wav') as $filename5) {
$content = $content . '<span class="m_title">'.$filename5.'</span><div id="mediaplayer'.$filename5.'"></div> <script type="text/javascript" src="http://www.pixelfire.net/clients/jwplayer.js"></script> <br /><script type="text/javascript">
jwplayer("mediaplayer'.$filename5.'").setup({
flashplayer: "http://www.pixelfire.net/clients/player.swf",
file: "http://pixelfirereview.net/'.$filename5.'",
width: "960",
height: "565",
autoplay: "false",
image: "http://www.pixelfire.net/clients/images/VideoPreview.jpg",
repeat: "always",
controlbar: "bottom",
});
</script><br />
<p><img src="http://pixelfire.net/clients/images/btn_download.png" width="235" height="50" border="0" align="right" /></p>';
}
//Create the lower half of the html page
$stringDataB= '</div></td></tr><tr><td height="20"> </td></tr><tr><td><table width="960" border="0" cellspacing="0" cellpadding="0"><tr>
<td width="620" valign="top"><table width="550" border="0" cellspacing="0" cellpadding="1"><tr>
<td width="125" class="m_main"><div align="right">Last Modified:</div></td>
<td width="400"><table width="100%" border="0" cellspacing="0" cellpadding="0"><tr>
<div id="dat" style="margin-top:80px;background:#333333;padding:.5em;" align="left" class="m_main_alt"></div></div></td></tr></table></td></tr></table></td><td width="350" valign="top">
<table width="350" border="0" cellspacing="1" cellpadding="0"><tr>
<td></td>
</tr></table></td></tr></table></td></tr><tr><td height="100"></td></tr></table></td></tr><tr><td></td></tr><tr><td>
<div align="center" class="footer">© 2013 PixelFire Productions<br /> (425) 917-1400 </div></td> </tr> </table></div></body></html>';
//Put the pieces together, top html, content, bottom html.
$stringData = $stringDataA . $content . $stringDataB;
fwrite($fh, $stringData);
fclose($fh);
}
It creates the HTML page and everything, but the first time it runs, $content is empty. If you press F5 after submitting the form and click Continue at the message about resubmitting data, it then runs the code in the foreach loops, and $content contains the string that gets placed between the top and bottom halves of the HTML.
Any ideas why that might be?
I got it working by splitting it into a couple of functions and forcing the order in which they run.
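The self-answer doesn't show the final code, but one plausible reading of the symptoms (an empty page first, then one asset short) is that glob() scanned the project folder before the uploaded files had been written to it. A minimal sketch of forcing that order, under that assumption: `saveAssets` and `buildContent` are hypothetical names, and `copy()` stands in for `move_uploaded_file()` so the example can run outside a real upload request.

```php
<?php
// Step 1 (hypothetical): put every asset into the project folder.
function saveAssets(array $sourceFiles, string $projectDir): void
{
    if (!is_dir($projectDir)) {
        mkdir($projectDir, 0777, true);
    }
    foreach ($sourceFiles as $src) {
        // In a real upload handler this would be move_uploaded_file($tmpName, $dest).
        copy($src, $projectDir . '/' . basename($src));
    }
}

// Step 2 (hypothetical): only now scan the folder and build the page content.
function buildContent(string $projectDir): string
{
    $content = '';
    foreach (glob($projectDir . '/*.jpg') as $file) {
        $content .= '<span class="m_title">' . htmlspecialchars(basename($file)) . "</span><br />\n";
    }
    return $content;
}

// Usage: create three dummy .jpg files, save them, then build the content.
$tmp = sys_get_temp_dir() . '/demo_' . uniqid();
mkdir($tmp . '/in', 0777, true);
$sources = [];
foreach (['a.jpg', 'b.jpg', 'c.jpg'] as $name) {
    file_put_contents($tmp . '/in/' . $name, 'x');
    $sources[] = $tmp . '/in/' . $name;
}
saveAssets($sources, $tmp . '/project');    // files are on disk first
$content = buildContent($tmp . '/project'); // glob() now sees all of them
echo substr_count($content, 'm_title'), "\n";
```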

What am I doing wrong with str_replace()?

Here is the HTML code:
<a href="?loadurl=/search/Battlefield 3/1/99/0/">
<img src="static/img/next.gif" border="0" alt="Next" />
</a>
And this is the PHP code:
//Fix Icons
$toremove = str_replace("next.gif\" border=\"0\" alt=\"Next\">", "dot.jpg\" border=\"0\" alt=\"Next\"><i class=\"icon-magnet\" style=\"color: #ffdd00;text-decoration: none;\"></i>", $toremove);
What am I doing wrong?
Any help would be appreciated :)
~Kazilotus
Your HTML uses XHTML syntax (<img ... />), but your PHP is looking for HTML syntax (<img ... >). str_replace() only matches the exact characters, so you need to decide which syntax to use and stick with it.
For example, in your sample code this line:
$toremove = str_replace("next.gif\" border=\"0\" alt=\"Next\">", "dot.jpg\" border=\"0\" alt=\"Next\"><i class=\"icon-magnet\" style=\"color: #ffdd00;text-decoration: none;\"></i>", $toremove);
should be:
$toremove = str_replace("next.gif\" border=\"0\" alt=\"Next\" />", "dot.jpg\" border=\"0\" alt=\"Next\" /><i class=\"icon-magnet\" style=\"color: #ffdd00;text-decoration: none;\"></i>", $toremove);
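A small sketch that checks the fix against the HTML from the question, and shows an alternative that tolerates both syntaxes: a preg_replace() pattern ending in `\s*/?>` accepts `>` as well as ` />`. The regex variant is an addition, not part of the answer.

```php
<?php
$html = '<a href="?loadurl=/search/Battlefield 3/1/99/0/">
<img src="static/img/next.gif" border="0" alt="Next" />
</a>';

$icon = '<i class="icon-magnet" style="color: #ffdd00;text-decoration: none;"></i>';

// str_replace() matches the exact characters, so the needle must end in ' />'.
$viaStr = str_replace('next.gif" border="0" alt="Next" />', 'dot.jpg" border="0" alt="Next" />' . $icon, $html);

// Alternative: the \s*/?> at the end matches both '>' and ' />'.
$viaRegex = preg_replace('~next\.gif" border="0" alt="Next"\s*/?>~', 'dot.jpg" border="0" alt="Next" />' . $icon, $html);

var_dump($viaStr === $viaRegex);
```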

How to store a script in a variable as a string using PHP

I want to store the script below in a variable using a textarea form, but when I do and then echo the variable, the browser interprets the code. I don't want this; I just want to store the script "as is" in a variable so I can use that variable for further manipulation. Any ideas how I can accomplish this using a textarea form in PHP?
<img border="0" src="http://ws.assoc-amazon.com/widgets/q?_encoding=UTF8&ASIN=B008EYEYBA&Format=_SL110_&ID=AsinImage&MarketPlace=US&ServiceVersion=20070822&WS=1&tag=mytwitterpage-20" ><img src="http://www.assoc-amazon.com/e/ir?t=mytwitterpage-20&l=as2&o=1&a=B008EYEYBA" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />
By the way, below is the exact PHP variable I want when the user puts the script above in a textarea:
$str = "<img border="0" src="http://ws.assoc-amazon.com/widgets/q?_encoding=UTF8&ASIN=B008EYEYBA&Format=_SL110_&ID=AsinImage&MarketPlace=US&ServiceVersion=20070822&WS=1&tag=mytwitterpage-20" ><img src="http://www.assoc-amazon.com/e/ir?t=mytwitterpage-20&l=as2&o=1&a=B008EYEYBA" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />";
Any help is greatly appreciated. Thanks.
Use it like this:
<?php
$str = '<img border="0" src="http://ws.assoc-amazon.com/widgets/q?_encoding=UTF8&ASIN=B008EYEYBA&Format=_SL110_&ID=AsinImage&MarketPlace=US&ServiceVersion=20070822&WS=1&tag=mytwitterpage-20" ><img src="http://www.assoc-amazon.com/e/ir?t=mytwitterpage-20&l=as2&o=1&a=B008EYEYBA" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" />' ;
echo htmlentities($str);
?>
If you need to output your value in HTML, you need to replace your < and > with &lt; and &gt;. The best way is to use the htmlspecialchars() function for that.
Also, as you were correctly told, your string has unescaped double quotes in it: either escape the inner quotes as \" or wrap the whole string in single quotes.
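A minimal sketch of both points, using a shortened copy of the second image tag from the question: the variable keeps the raw markup, and escaping happens only when it is echoed back into a page. In a real form handler the value would come from $_POST instead; the field name `script` is hypothetical.

```php
<?php
// A single-quoted literal needs no escaping for the inner double quotes.
$str = '<img src="http://www.assoc-amazon.com/e/ir?t=mytwitterpage-20&l=as2&o=1&a=B008EYEYBA" width="1" height="1" border="0" alt="" />';

// $str still holds the raw markup, so string functions can work on it unchanged.
// Escape only at output time: < > & and quotes become entities, so the browser
// displays the markup as text instead of rendering it.
$escaped = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

echo '<textarea name="script">' . $escaped . '</textarea>', "\n";
```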
