I was recently asked to take part in a project that, for now, aims to parse a chunk of HTML with PHP. Using the website I have been assigned, I went through inspect element to fill in the missing parts of my code. The actual aim is to print (using echo) certain data on localhost, without storing it in a database or anywhere else. Attached is the HTML and PHP code in a few screenshots (I couldn't upload the raw code, not sure why). Thanks in advance!
PHP code:
<?php
include_once('simple_html_dom.php');
$html = new simple_html_dom();
// Website link to scrape
$website = 'https://www1.gsis.gr/webtax3/etak/faces/main.jspx?_adf.ctrl-state=16kjeyshcz_4&_afrLoop=70130840737831';
// Create DOM from URL or file
$html = file_get_html($website, false, null, 0);
//$html = str_get_html('<html><body><div id="pt1:r1:0:t3::db">Hello</div><div class="xx8">Goodbye</div></body></html>');
//$ret = $html->find('.xx8', 0)->plaintext;
if (is_array($html)) {
foreach($html->find('div[class=xx8]')->outertext as $data) {
echo $data->outertext;
}
}
?>
HTML code (via inspect element, where "Δε βρέθηκαν γήπεδα", Greek for "No plots found", is the custom text of the page I mentioned):
<div id="pt1:r1:1:t3::db" class="xx8"
style="position:relative;width:100%;overflow:hidden" _afrcolcount="30"><table
class="xxb xy3" style="table-layout:fixed;position:relative;width:2097px;"
cellspacing="0" _totalwidth="2097" _selstate="{}" _rowcount="0" _startrow="0">
<colgroup span="30"><col style="width:80px;"><col style="width:110px;"><col
style="width:105px;"><col style="width:105px;"><col style="width:105px;"><col
style="width:75px;"><col style="width:35px;"><col style="width:50px;"><col
style="width:55px;"><col style="width:80px;"><col style="width:65px;"><col
style="width:65px;"><col style="width:55px;"><col style="width:95px;"><col
style="width:65px;"><col style="width:55px;"><col style="width:75px;"><col
style="width:75px;"><col style="width:60px;"><col style="width:60px;"><col
style="width:60px;"><col style="width:60px;"><col style="width:50px;"><col
style="width:50px;"><col style="width:60px;"><col style="width:50px;"><col
style="width:62px;"><col style="width:125px;"><col style="width:55px;"><col
style="width:55px;"></colgroup></table>Δε βρέθηκαν γήπεδα.</div>
I'm trying to assign values set by the user in the URL to a table in HTML using PHP, but it doesn't seem to work.
At start of body:
<?php
$border = $_GET['border'];
$cellpadding = $_GET['cellpadding'];
$bgcolor = $_GET['bgcolor'];
?>
Below, in table:
<table <?php echo" style='border: $border; padding: $cellpadding; background-color: $bgcolor;' "; ?> >
Thank You.
I don't think there's a problem with your code. The code below does the same thing, but I prefer embedding PHP in HTML this way. If it still doesn't work, open inspect element, check the generated HTML there, and post it here.
<table style="border:<?=$border?>; padding:<?=$cellpadding?>; background-color:<?=$bgcolor?>;">
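One thing neither version handles: if a parameter is missing from the URL, reading $_GET['border'] raises a notice or warning, and the raw user input lands in the page unescaped. Here is a minimal sketch of one way to cover both, assuming PHP 7 or later (for the ?? operator). The helper name table_style() and the default values are made up for illustration:

```php
<?php
// Hypothetical helper: builds the style attribute value from request parameters.
// The defaults are assumptions for illustration, not values from the question.
function table_style(array $params): string
{
    $border      = $params['border']      ?? '1px solid black';
    $cellpadding = $params['cellpadding'] ?? '0';
    $bgcolor     = $params['bgcolor']     ?? 'transparent';

    $style = "border: $border; padding: $cellpadding; background-color: $bgcolor;";

    // Escape so quotes or angle brackets in the URL cannot break out of the attribute.
    return htmlspecialchars($style, ENT_QUOTES, 'UTF-8');
}

echo '<table style="' . table_style($_GET) . '">';
```

With a URL such as ?border=2px solid red, only the border changes and the other two properties fall back to the defaults.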
I'm using the FPDF library to output a PDF from HTML. The .pdf is being created, because I can email it to myself and it arrives in the correct format, but if I want to download the .pdf as an option without emailing, the output is illegible. The output appears in a browser window (see attached screenshot) and I'm unsure how to fix this issue.
I've also attached a screenshot of how we have our report options set up. 1. HTML 2. PDF 3. Download 4. Email -- the HTML and Email options work, the PDF and Download options do not. I'm focusing on the Download option in this question.
This is the output code I've tried, with no luck:
//$pdf->Output("D","D:/example2.pdf");
//$content = $pdf->Output("","S");
//$pdf->Output(); //Outputs on browser screen
//Outputs on browser screen
$pdf->Output();
//echo file_get_contents($pdf);
//readfile($pdf);
$pdf->Output(); is what generates the illegible output. The readfile, echo and file_get_contents attempts throw errors. $pdf->Output(... gives me an error that says "Incorrect Output Destination" (see attached screenshot).
Need guidance -- thanks for any help.
here is the full code:
<?php
$m_header = '<link href="shared/report.css" rel=stylesheet type="text/css">';
$m_body_tag = ' scroll=no';
require_once($DOCUMENT_ROOT."inc/top-2.inc.php");
$i_get_sid = isset($_GET["sid"]) ? (int)$_GET["sid"] : $i_sid;
$i_get_pass = isset($_GET["a"]) ? $_GET["a"] : $_SESSION['r_pass'];
$i_get_pass = addslashes($i_get_pass);
$i_pdf_file_url = 'report.php?sid='.$i_get_sid.'&a='.urlencode($i_get_pass).'&b=/report.pdf';
echo '<table cellpadding=0 cellspacing=0 border=0 width="100%">';
echo '<tr vAlign=top><td height=7><img src="images/1x1.gif" width=1 height=7></td></tr>';
echo '<tr height=28 style="background: url(images/bookm-bg.gif) repeat-x"><td width="100%"><nobr>';
echo '<img src="images/1x1.gif" width=5 height=1><img src="images/bookm-42.gif" width=67 height=28 border=0><img src="images/bookm-s1.gif" width=10 height=28 border=0><img src="images/bookm-51.gif" width=65 height=28 border=0> <img src="images/button-downloadpdf.gif" width=80 height=28 border=0> <img src="images/button-emailpdf.gif" width=80 height=28 border=0>';
echo '</nobr></td><td><nobr><font style="font-size: 10px;">Close Window </font></nobr></td></tr></table>';
// This is what I added as a workaround:
echo '<p><a href="'.$i_pdf_file_url.'" name="plugin" width=100% height=100% fullscreen=yes style="position: absolute;">Click Here to open the PDF</a></p>';
// This is what should display the PDF in the browser but won't work:
echo '<p><embed type="application/pdf" src="'.$i_pdf_file_url.'" name="plugin" width=100% height=100% fullscreen=yes style="position: absolute;"></p>';
require_once($DOCUMENT_ROOT."inc/btm-2.inc.php");
?>
From documentation:
Destination where to send the document. It can be one of the following:
I: send the file inline to the browser. The PDF viewer is used if available.
D: send to the browser and force a file download with the name given by name.
F: save to a local file with the name given by name (may include a path).
S: return the document as a string.
The default value is I.
If you want to force a download with a given name, create a link that points to your PDF script and use the D destination (F would only save the file on the server):
$pdf->Output("__name__","D");
If you want to display pdf for preview use this:
$pdf->Output("__name__","I");
Then, in your HTML inside the PDF tab, use an iframe to embed the PDF:
<iframe src="pdf_preview.php" frameborder="0"></iframe>
There are other ways to do this, but this should be the easiest.
I am writing a simple HTML email design editor in PHP and also show a demo of how this will look.
I think it would also be very useful to show the user how this will look in an email client such as gmail with images turned off.
What is my best approach for this? Anybody know how this is done in gmail/hotmail etc?
Do I simply remove img -> src and CSS background: url with a regular expression?
I would like to remove the background parts from:
background="url" used in tables and
background-image:url(url); used in inline CSS
I found this question which has the same kind of idea, although I would like to actually remove the img tags and background-images from the HTML text.
Or could this code be modified to work with background images also?
I would also suggest using PHP DOM instead of regular expressions, which are often inaccurate for HTML. Here is some example code you could use to strip all the img tags and all the background attributes from your string:
// ...loading the DOM
$dom = new DOMDocument();
@$dom->loadHTML($string); // Using @ to hide any parse warning sometimes resulting from markup errors
$dom->preserveWhiteSpace = false;
// Here we strip all the img tags in the document
$images = $dom->getElementsByTagName('img');
$imgs = array();
foreach($images as $img) {
$imgs[] = $img;
}
foreach($imgs as $img) {
$img->parentNode->removeChild($img);
}
// This part strips all 'background' attribute in (all) the body tag(s)
$bodies = $dom->getElementsByTagName('body');
$bodybg = array();
foreach($bodies as $bg) {
$bodybg[] = $bg;
}
foreach($bodybg as $bg) {
$bg->removeAttribute('background');
}
$str = $dom->saveHTML();
I've selected the body tags rather than the tables because, in standard HTML, <table> has no background attribute, only bgcolor.
To strip the background property from inline CSS, you can use sabberworm's PHP CSS Parser to parse the CSS retrieved from the DOM. Try this:
// Selecting all the elements since each one could have a style attribute
$alltags = $dom->getElementsByTagName('*');
$tags = array();
foreach($alltags as $tag) {
    $tags[] = $tag;
}
foreach($tags as $tag) {
    // Wrap the inline style in a dummy rule so the parser accepts it
    $oParser = new CSSParser("p{".$tag->getAttribute('style')."}");
    $oCss = $oParser->parse();
    foreach($oCss->getAllRuleSets() as $oRuleSet) {
        $oRuleSet->removeRule('background');
        $oRuleSet->removeRule('background-image');
    }
    // Strip the dummy selector wrapper again
    $css = $oCss->__toString();
    $css = substr_replace($css, '', 0, 3);
    $css = substr_replace($css, '', -2, 2);
    if($css) {
        $tag->setAttribute('style', $css);
    } else {
        // Nothing left (or no style at all): drop the attribute so no background survives
        $tag->removeAttribute('style');
    }
}
Using all this code together, for example if you have a
$string = '<!DOCTYPE html>
<html><body background="http://yo.ur/background/dot/com" etc="an attribute value">
<img src="http://your.pa/th/to/image"><img src="http://anoth.er/path/to/image">
<div style="background-image:url(http://inli.ne/css/background);border: 1px solid black">div content...</div>
<div style="background:url(http://inli.ne/css/background);border: 1px solid black">2nd div content...</div>
</body></html>';
The PHP will output
<!DOCTYPE html>
<html><body etc="an attribute value">
<div style="border: 1px solid black;">div content...</div>
<div style="border: 1px solid black;">2nd div content...</div>
</body></html>
To fully mimic the behavior of Gmail or similar webmail clients, you would replace the img tags and the background: CSS properties so that they display a placeholder, making it clear to the user that an image belongs there.
Since the message is usually loaded in an iframe, I believe your best bet is to clean the message server-side, removing all unwanted tags and replacing images accordingly on preview.
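The placeholder idea can be sketched server-side with PHP's DOM extension. This is only an illustration under assumptions: the function name block_images(), the data-original-src attribute and the placeholder URL are all hypothetical, and CSS backgrounds are not handled here.

```php
<?php
// Hypothetical helper: points every <img> at a placeholder so the user still
// sees where an image would be. $placeholder is an assumed URL.
function block_images(string $html, string $placeholder = 'placeholder.png'): string
{
    $dom = new DOMDocument();
    // @ hides warnings caused by imperfect markup
    @$dom->loadHTML($html);

    foreach ($dom->getElementsByTagName('img') as $img) {
        // Keep the original URL in a data attribute so the change is reversible
        $img->setAttribute('data-original-src', $img->getAttribute('src'));
        $img->setAttribute('src', $placeholder);
    }

    return $dom->saveHTML();
}

echo block_images('<html><body><p>Hi</p><img src="http://example.com/a.png" alt="logo"></body></html>');
```

Keeping the original URL on the element means a "show images" action could restore it later without fetching the message again.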
I will agree with Michal that it is not wise to use just regex to validate your HTML and you probably should traverse the DOM tree just to be safe.
Why don't you take a look at washtml by Frederic Motte used by roundcube to get you started?
Using regular expressions to parse html is usually not recommended.
I think a better approach would be to parse the html server-side, and manipulate it to remove the images or the image src attributes. A library I've had success with is http://simplehtmldom.sourceforge.net/, but I think you can use official PHP DOM extensions.
Removing background images might be trickier. You might have to use something like http://www.pelagodesign.com/sidecar/emogrifier/ to apply something like {background: none} to the HTML elements. However, CSS background images are not supported in the latest versions of Microsoft Outlook, so I would recommend not using them at all from the start, so that the emails look consistent across most email clients.
Like tkone mentioned: perhaps JavaScript / jQuery is the answer.
This will look at all images in your preview area and change the source to a placeholder image. The 'hideBG' class sets the background image to the placeholder as well.
jQuery
$("#previewArea img").each(function(){
$(this).attr("src","placeholder.jpg");
$(this).addClass("hideBG");
});
CSS
.hideBG{
background: url("placeholder.jpg");
}
Not tested, but should work - depending on your setup and needs.
I've asked a similar question (in solution, not actual problem): How to strip specific tags and specific attributes from a string? (Solution)
It's a server side library which cleans (and formats) HTML input according to predefined settings. Have it remove any src attributes and all background properties.
You could always do this on the client end as well.
Using this hypothetical code, you should be able to do something like this, pretending that modern browsers all work the same: (or use jQuery or something)
var xhr = new XMLHttpRequest();
xhr.open('GET', URL_FOR_EMAIL, true);
xhr.onreadystatechange = function(event){
    if(xhr.readyState === 4 && xhr.status === 200){
        // Parse only once the response has arrived
        var email = HTMLParser(xhr.responseText);
        var imgs = email.getElementsByTagName('img');
        // The list is live, so walk it backwards while removing
        for(var i = imgs.length - 1; i >= 0; i--){
            imgs[i].parentNode.removeChild(imgs[i]);
        }
        // attach the email body to the DOM
        // do something with the images
    }
};
xhr.send();
HTMLParser from MDN
function HTMLParser(aHTMLString){
var html = document.implementation.createDocument("http://www.w3.org/1999/xhtml", "html", null),
body = document.createElementNS("http://www.w3.org/1999/xhtml", "body");
html.documentElement.appendChild(body);
body.appendChild(Components.classes["@mozilla.org/feed-unescapehtml;1"]
.getService(Components.interfaces.nsIScriptableUnescapeHTML)
.parseFragment(aHTMLString, false, null, body));
return body;
}
I think the best way to do it and keep the change reversible is to use a tag that does not process the "src" attribute.
Example: replace every "img" with "br".
Print the filtered HTML first, then reverse it with Ajax by searching for every br that has a src attribute.
1) convert
<table style="width: 700px; height: 300px; background-color: #ff0000;" border="0">
to
<table width="700" height="300" bgcolor="#ff0000" border="0">
2) convert
<table style="width: 700px; background-color: #ff0000;" border="0">
to
<table width="700" bgcolor="#ff0000" border="0">
I'm using Joomla 1.5.x, where the client can add tables or nested tables in the content. TinyMCE uses inline styles for table dimensions and background properties, but when we try to generate a PDF from the front end, the inline styles on tables are ignored. Joomla uses TCPDF to generate the PDF; I've updated the version, but the problem is the same. When I converted the CSS properties into HTML attributes, the table was generated with the correct formatting. So I thought I would replace all inline styles on table, td and th with HTML attributes. I tried many threads from this website but was unable to make use of them, as I'm poor at regular expressions.
Can anyone please help me do this?
Thanks in advance.
I think you can use PHP DOM for this.
1) Use DOMElement::getAttribute() to get the value of the style attribute.
2) Use $split = explode(";", $style) to separate the CSS declarations.
3) For each entry $split[$i], use $attribute = explode(":", $split[$i]) to get the property's name and its value.
4) Now $attribute holds two values: the property name and the value of that property.
5) Use DOMElement::setAttribute() to set those values as attributes.
Putting it all together:
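$dom = new DOMDocument();
@$dom->loadHTML($html); // @ hides warnings caused by markup errors
// CSS properties and the HTML attributes they map to
$map = array('width' => 'width', 'height' => 'height', 'background-color' => 'bgcolor');
foreach (array('table', 'td', 'th') as $tagName) {
    foreach ($dom->getElementsByTagName($tagName) as $node) {
        $split = explode(";", $node->getAttribute("style"));
        foreach ($split as $declaration) {
            $attribute = array_map('trim', explode(":", $declaration, 2));
            if (count($attribute) == 2 && isset($map[$attribute[0]])) {
                // Drop the px unit: "700px" becomes "700"
                $node->setAttribute($map[$attribute[0]], str_replace('px', '', $attribute[1]));
            }
        }
        // Remove the inline style, as in the examples in the question
        $node->removeAttribute("style");
    }
}
$html = $dom->saveHTML();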
This way doesn't involve regex :) But you need to modify it in order to fit your context.