What does the beginning of this HTML optimization code do? - php

The difficult part is figuring out what the stripwhitespace() function does. stripBuffer() is fairly straightforward, but I've been staring at this little piece of code for a while now trying to decipher it, to no avail. The cryptic variable names and lack of comments don't help much, either. I also had to remove some hyperlinks from the credits because of the spam prevention on this site.
<?php
/* ---------------------------------
26 January, 2008 - 2:55pm:
The example below is adapted from a post by londrum 8:29 pm on June 7, 2007:
"crunch up your HTML into a single line
a handy little script..."
This PHP code goes at the very TOP of the PHP-enabled HTML webpage
above EVERYTHING else. Recommendation: use a PHP include file for this
to have only one file to maintain.
--------------------------------- */
function stripwhitespace($bff){
$pzcr=0;
$pzed=strlen($bff)-1;
$rst="";
while($pzcr<$pzed){
$t_poz_start=stripos($bff,"<textarea",$pzcr);
if($t_poz_start===false){
$bffstp=substr($bff,$pzcr);
$temp=stripBuffer($bffstp);
$rst.=$temp;
$pzcr=$pzed;
}
else{
$bffstp=substr($bff,$pzcr,$t_poz_start-$pzcr);
$temp=stripBuffer($bffstp);
$rst.=$temp;
$t_poz_end=stripos($bff,"</textarea>",$t_poz_start);
$temp=substr($bff,$t_poz_start,$t_poz_end-$t_poz_start);
$rst.=$temp;
$pzcr=$t_poz_end;
}
}
return $rst;
}
function stripBuffer($bff){
/* carriage returns, new lines */
$bff=str_replace(array("\r\r\r","\r\r","\r\n","\n\r","\n\n\n","\n\n"),"\n",$bff);
/* tabs */
$bff=str_replace(array("\t\t\t","\t\t","\t\n","\n\t"),"\t",$bff);
/* opening HTML tags */
$bff=str_replace(array(">\r<a",">\r <a",">\r\r <a","> \r<a",">\n<a","> \n<a","> \n<a",">\n\n <a"),"><a",$bff);
$bff=str_replace(array(">\r<b",">\n<b"),"><b",$bff);
$bff=str_replace(array(">\r<d",">\n<d","> \n<d",">\n <d",">\r <d",">\n\n<d"),"><d",$bff);
$bff=str_replace(array(">\r<f",">\n<f",">\n <f"),"><f",$bff);
$bff=str_replace(array(">\r<h",">\n<h",">\t<h","> \n\n<h"),"><h",$bff);
$bff=str_replace(array(">\r<i",">\n<i",">\n <i"),"><i",$bff);
$bff=str_replace(array(">\r<i",">\n<i"),"><i",$bff);
$bff=str_replace(array(">\r<l","> \r<l",">\n<l","> \n<l","> \n<l","/>\n<l","/>\r<l"),"><l",$bff);
$bff=str_replace(array(">\t<l",">\t\t<l"),"><l",$bff);
$bff=str_replace(array(">\r<m",">\n<m"),"><m",$bff);
$bff=str_replace(array(">\r<n",">\n<n"),"><n",$bff);
$bff=str_replace(array(">\r<p",">\n<p",">\n\n<p","> \n<p","> \n <p"),"><p",$bff);
$bff=str_replace(array(">\r<s",">\n<s"),"><s",$bff);
$bff=str_replace(array(">\r<t",">\n<t"),"><t",$bff);
/* closing HTML tags */
$bff=str_replace(array(">\r</a",">\n</a"),"></a",$bff);
$bff=str_replace(array(">\r</b",">\n</b"),"></b",$bff);
$bff=str_replace(array(">\r</u",">\n</u"),"></u",$bff);
$bff=str_replace(array(">\r</d",">\n</d",">\n </d"),"></d",$bff);
$bff=str_replace(array(">\r</f",">\n</f"),"></f",$bff);
$bff=str_replace(array(">\r</l",">\n</l"),"></l",$bff);
$bff=str_replace(array(">\r</n",">\n</n"),"></n",$bff);
$bff=str_replace(array(">\r</p",">\n</p"),"></p",$bff);
$bff=str_replace(array(">\r</s",">\n</s"),"></s",$bff);
/* other */
$bff=str_replace(array(">\r<!",">\n<!"),"><!",$bff);
$bff=str_replace(array("\n<div")," <div",$bff);
$bff=str_replace(array(">\r\r \r<"),"><",$bff);
$bff=str_replace(array("> \n \n <"),"><",$bff);
$bff=str_replace(array(">\r</h",">\n</h"),"></h",$bff);
$bff=str_replace(array("\r<u","\n<u"),"<u",$bff);
$bff=str_replace(array("/>\r","/>\n","/>\t"),"/>",$bff);
$bff=ereg_replace(" {2,}",' ',$bff);
$bff=ereg_replace(" {3,}",' ',$bff);
$bff=str_replace("> <","><",$bff);
$bff=str_replace(" <","<",$bff);
/* non-breaking spaces */
$bff=str_replace(" "," ",$bff);
$bff=str_replace(" "," ",$bff);
/* Example of EXCEPTIONS where I want the space to remain
between two form buttons at */
/* <!-- http://websitetips.com/articles/copy/loremgenerator/ --> */
/* name="select" /> <input */
$bff=str_replace(array("name=\"select\" /><input"),"name=\"select\" /> <input",$bff);
return $bff;
}
ob_start("stripwhitespace");
?>

It looks to me as if it crunches everything before the textarea and after the textarea but it leaves the contents of a textarea alone.
While this code may be somewhat interesting, PHP is not fast at string manipulation, and running dozens of str_replace calls over the whole buffer on every request is a bad, bad idea.
I predict you'd get better performance by using gzip/deflate on the web server to compress the script output before sending.
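To put a rough number on that prediction, one could compare the raw and gzipped size of some whitespace-heavy markup. This is only a sketch: gzipSavings() is a made-up helper name, and it assumes the zlib extension is loaded. In production the compression would normally be switched on in the web server itself, or PHP's built-in ob_gzhandler could be passed to ob_start() as the output callback.

```php
<?php
// Hypothetical helper (not part of the original script): report how many
// bytes a chunk of HTML takes before and after gzip compression.
function gzipSavings(string $html): array
{
    // gzencode() comes from the zlib extension; level 6 is a common default.
    $compressed = gzencode($html, 6);

    return [
        'raw'  => strlen($html),
        'gzip' => strlen($compressed),
    ];
}
```

Runs of repeated tabs and newlines compress very well, which is why stripping them by hand tends to save little once gzip is in place.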

It's definitely a mess, but it seems as if it strips unnecessary white-space from a string, except from within textareas.

It's obvious what stripBuffer does: it tries to strip redundant whitespace from its input.
stripwhitespace works as follows:
function stripwhitespace($input){
    $currentPosition=0; // start from the first char
    $endPosition=strlen($input)-1; // where to stop
    $returnValue="";
    // while there is more input to process
    while($currentPosition<$endPosition){
        // find start of next <textarea> tag
        $startOfNextTextarea=stripos($input,"<textarea",$currentPosition);
        if($startOfNextTextarea===false){
            // no textarea tag remaining:
            // strip ws from remaining input, append to $returnValue and done!
            $bufferToStrip=substr($input,$currentPosition);
            $temp=stripBuffer($bufferToStrip);
            $returnValue.=$temp;
            $currentPosition=$endPosition; // to cause the function to return
        }
        else{
            // <textarea> found
            // strip ws from input in the range [current_position, start_of_textarea)
            $bufferToStrip=substr($input,$currentPosition,$startOfNextTextarea-$currentPosition);
            // append to return value
            $temp=stripBuffer($bufferToStrip);
            $returnValue.=$temp;
            $endOfNextTextarea=stripos($input,"</textarea>",$startOfNextTextarea);
            // get contents of <textarea>, append to return value without stripping ws
            $temp=substr($input,$startOfNextTextarea,$endOfNextTextarea-$startOfNextTextarea);
            $returnValue.=$temp;
            // continue looking for textareas after the end of this one
            $currentPosition=$endOfNextTextarea;
        }
    }
    return $returnValue;
}
I admit this would be considerably harder if you couldn't "intuitively" tell what it's trying to do, given the special treatment the content of <textarea> tags gets in HTML.

In pseudo code (ish):
bff is the initial buffer
pzcr is the current start
pzed is the end position (length of the buffer minus one)
rst will have the filtered text appended to it.
while the current start is before the end
    t_poz_start is the first position of a textarea tag (at or after the current start)
    if there is no textarea found
        bffstp becomes the substring of the buffer starting at pzcr
        temp is bffstp, stripped
        append temp to rst
        set the current start to the end (this ends the loop)
    else
        set bffstp to the substring between the current start and the start of the textarea tag
        temp is bffstp, stripped
        append temp to rst
        skip the textarea:
        temp will be the substring from the start of the textarea to its closing tag
        append temp (unfiltered) to rst
        set the next start to the end of the textarea (at the start of its closing tag)
    end the if
end the while
return the appended buffer (rst)
Hmm. For an HTML compressor, this code is itself bloated as well as hard to read. Regular expressions, used well, should be able to do a much better job of this.
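As a rough illustration of the regular-expression route, something along these lines might do. It is a sketch, not a drop-in replacement: collapseWhitespace() is a made-up name, it only protects <textarea> blocks (not <pre>), and like the original it removes whitespace between adjacent tags, which can change how inline elements render.

```php
<?php
// Hypothetical helper: collapse whitespace everywhere except inside
// <textarea>...</textarea> blocks.
function collapseWhitespace(string $html): string
{
    // Split on complete textarea blocks and keep them in the result.
    // With one capture group, odd indexes hold the captured blocks.
    $parts = preg_split(
        '#(<textarea\b.*?</textarea>)#is',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            $out .= $part; // textarea block: copy verbatim
            continue;
        }
        $part = preg_replace('/\s+/', ' ', $part);    // collapse whitespace runs
        $part = preg_replace('/>\s+</', '><', $part); // drop whitespace between tags
        $out .= $part;
    }

    return $out;
}
```

It could be registered the same way as the original, with ob_start('collapseWhitespace'), since an output callback receives the buffer as its first argument.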

Related

php Format arrays and objects etc output in text logfile

This should have a pretty obvious answer, but I have spent several hours looking at existing similar questions and none of them are working for me.
My code generates logfiles for (manual) debugging etc.
If I use print_r($array,TRUE) to capture the output from an array as a string and then echo it with <pre> tags to display it on screen, it's really easy to view and understand what's going on.
However, when I write the same info to the logfile, fwrite doesn't preserve the line breaks and indentation, so there is a splurge of info that takes a significant amount of time to make sense of, especially for larger arrays and objects.
I have tried using an output buffer:
$string=print_r($array,TRUE);
ob_start();
echo "<pre>$string</pre>";
$outputBuffer = ob_get_contents();
ob_end_clean();
fwrite($handle,$outputBuffer);
However, all that's now happening is that I see the <pre> tags added into the basic, non-layout output
e.g.
<pre>DOING QUERY: SELECT * FROM event_triggers WHERE DateTime<='2015-09-16 13:04:30'</pre><pre>Completed checking for event triggers</pre>
You can't just add HTML tags to a document, open it in an editor and expect the HTML tags to be rendered correctly.
You either have to set up your log file as an HTML file (it doesn't necessarily have to be valid, so just add .html to the file name and open it in the browser) or use var_dump to echo out the variables.
Rename the file to use an .html extension and just open it with a browser. The browser will treat it as an HTML document and honor the line breaks inside <pre></pre>, which is displayed as a block much like <p></p>.
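If the log is going to be read in a plain-text viewer instead, another option is to skip the <pre> wrapper altogether, because the string returned by print_r($value, true) already contains the newlines and indentation. A minimal sketch, where logDump() and its parameters are names invented for this example:

```php
<?php
// Hypothetical helper: append a labelled, human-readable dump to a text log.
function logDump(string $logFile, string $label, $value): void
{
    // print_r(..., true) returns the formatted dump as a string,
    // including its own line breaks and indentation.
    $entry = $label . ': ' . print_r($value, true) . PHP_EOL;

    // FILE_APPEND keeps earlier entries; LOCK_EX avoids interleaved writes.
    file_put_contents($logFile, $entry, FILE_APPEND | LOCK_EX);
}
```

Whether the layout survives then depends only on the viewer showing newlines as they are, which most text editors do.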

Zend PDF wysiwyg editor output

I'm currently building a PDF editor. I have a problem with implementing the processing of the tags.
I want to allow the following tags:
[h1],[h2],[h3],[h4],[h5],[h6],[strong]
I've built a class with a method called drawSplittedText() (code below).
The [h1] tag will change the font size and the font weight. As you can see in the code, I'm outputting lines of text.
Example text line:
This is your [strong]boarding pass[/strong], please save this PDF file on your smartphone or tablet and [strong]show it at the gate[/strong].
I'd like to make the text between the [strong] tags bold. To do this with Zend_PDF I need to set the TTF file with the bold text and then find the current X-coordinate and call $this->pdf()->drawText(text, X-coordinate, Y-coordinate, charset). I've been thinking and trying for hours to write the code that makes this possible (tried using explode, preg_match_all, etc.), but I can't get it to work...
I believe I'm not the only one with this problem, and I hope someone has thought about this and can help a little by telling how he or she did it...
Hope to hear from someone and thanks in advance!
/**
 * drawSplittedText()
 *
 * @param array $text
 * @return object Application_Plugin_PdfPlugin
 */
public function drawSplittedText(Array $text)
{
    // Count the number of rows.
    $textRowCount = count($text);
    $i = 0;
    foreach ($text as $row)
    {
        // Replace tabs, because they're not outputted properly.
        $row = str_replace("\t", ' ', $row);
        // If the character encoding of the current row is not UTF-8, convert the row to UTF-8.
        if (($rowEncoding = mb_detect_encoding($row)) != 'UTF-8') {
            $row = iconv($rowEncoding, 'UTF-8', $row);
        }
        // Output row on PDF
        $this->pdf()->drawText($row, $this->_defaultMarginleft, $this->currentY, 'UTF-8');
        $this->newLine();
        ++$i;
    }
    return $this;
}
The code above is probably where most people start when rendering text with Zend_Pdf, but unfortunately you are going to have to develop something a little more complex to achieve your goals.
Firstly, you are going to need to keep track of the current x and y location, along with the current font type and size.
Then you'll need a helper function/method to calculate how much space a chunk of text is going to need when rendered in the current font and size.
I would then suggest breaking up your rendering code as follows:
function writeParagraph( $text )
{
    // Looks for the next tag and sends all text before that tag to the
    // writeText() function. When it gets to a tag it changes the current
    // font/size accordingly, then continues sending text until it runs out
    // of text or reaches another tag. If you need to deal with nested tags
    // then this function may have to call itself recursively.
}
function writeText( $text )
{
    // The first thing this function needs to do is call getStringWidth() to
    // determine the width of the text that it is being asked to render and if
    // the line is too long, shorten it. In practice, a better approach is to
    // split the words into an array based on the space between each word and
    // then use a while() loop to start building the string to be rendered
    // (start with first word, then add second word, then add third word, etc),
    // in each iteration testing the length of the current string concatenated
    // with the next word to see if the resulting string will still fit. If you
    // end up with a situation where adding the next word to the current string
    // will result in a string that is too long, render the current string and
    // a line feed then start the process again, using the next word as the first
    // word in the new string. You will probably want to write a bonus line feed
    // at the end as well (unless, of course, you just wrote one!).
}
function getStringWidth( $str )
{
    // This needs to return the width of $str
}
I have a sample class (https://github.com/jamesggordon/Wrap_Pdf) that implements the writeText() and getStringWidth() functions/methods, plus includes all of the other stuff, like current location, current style, etc. If you can't figure out the code for the writeParagraph() function let me know and I'll include it in Wrap_Pdf.
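To make the two ideas above a bit more concrete, here is a library-independent sketch of the tag splitting and the greedy word wrap. Both function names are invented for this example, nested tags are not handled, and $measure stands in for whatever getStringWidth() ends up doing with the current font and size; nothing here calls Zend_Pdf itself.

```php
<?php
// Hypothetical helper: split one line into segments, each flagged bold or not,
// based on [strong]...[/strong] markers.
function splitStrongSegments(string $line): array
{
    $tokens = preg_split(
        '#(\[/?strong\])#',
        $line,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $segments = [];
    $bold = false;
    foreach ($tokens as $token) {
        if ($token === '[strong]') {
            $bold = true;   // following text is bold
        } elseif ($token === '[/strong]') {
            $bold = false;  // back to the regular font
        } else {
            $segments[] = ['text' => $token, 'bold' => $bold];
        }
    }

    return $segments;
}

// Hypothetical helper: greedy word wrap. $measure is any callable that
// returns the rendered width of a string in the current font.
function wrapWords(string $text, float $maxWidth, callable $measure): array
{
    $lines = [];
    $current = '';
    foreach (preg_split('/\s+/', trim($text)) as $word) {
        $candidate = $current === '' ? $word : $current . ' ' . $word;
        if ($current !== '' && $measure($candidate) > $maxWidth) {
            $lines[] = $current; // line is full: emit it and start a new one
            $current = $word;
        } else {
            $current = $candidate;
        }
    }
    if ($current !== '') {
        $lines[] = $current;
    }

    return $lines;
}
```

A renderer would then loop over the segments, switch fonts when the bold flag changes, draw each piece, and advance the X-coordinate by the measured width of what was just drawn.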

Can I use PHP to save only the visible elements of the output as a new file?

I've got a php file that takes an xml file (generated by an outside source) and reformats it with CSS & HTML. A number of the XML tags are things I don't want to see in the final version, so I have them hidden. The end result is something like this:
<html>
<div style="display: none">
content i don't want to see
</div>
content I do want to see.
</html>
Is there a way I can take the resulting html file as it's displayed in the browser window,
content I do want to see.
…and save that as a text file? I want it to ignore all the hidden <div> tags and only save what can otherwise be selected and copied by the user.
I've looked around for an answer to this but I'm not even really sure what I'm looking for or how to search it.
I've also tried this:
ob_start();
file_put_contents('filename.htm', ob_get_contents());
ob_end_flush();
… but that doesn't solve it. I have a number of character entities in the outputted text (&gt; etc.) that need to be saved as they are displayed, and ob_get_contents() takes the page's source code, not the displayed version.
This matters because the outputted page is also PHP that has been generated based on other factors, so I need to use html unicode values to keep the $ signs and quotes from messing up the source PHP.
I hope that was clear. Thanks in advance for any help or suggestions.
I think you have to strip out the unneeded part manually with a regex, maybe something like:
$content_raw = ob_get_contents();
// preg_replace() takes the pattern (with delimiters) first, then the replacement, then the subject
$content_stripped = preg_replace('#<div style="display: none">[^<>]*</div>#', '', $content_raw);
file_put_contents('filename.htm', $content_stripped);
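Building on the regex idea above, a sketch that also turns the result into plain text could look like the following. visibleText() is a made-up name, and the pattern assumes the hidden blocks use exactly that style attribute and contain no nested tags; anything more varied would call for a real HTML parser.

```php
<?php
// Hypothetical helper: reduce generated HTML to roughly the text a visitor sees.
function visibleText(string $html): string
{
    // Drop the hidden <div> blocks (no nested tags assumed).
    $html = preg_replace('#<div style="display: none">[^<>]*</div>#i', '', $html);

    // Remove the remaining tags, then decode entities such as &gt; or &#36;
    // into the characters that would be displayed.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    return trim($text);
}
```

The captured page could then be saved with something like file_put_contents('filename.txt', visibleText($content_raw)), provided the buffer is read after the page has been generated.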

Loading multiline text from database to TextArea

I have some multi-line text saved in a MySQL database (VARCHAR 255). When I load it and process it using the standard PHP function nl2br, it echoes fine (multi-line). But when I load the multi-line text from the database, run it through nl2br and then send it to JavaScript (so that it gets displayed in a textarea), it isn't displayed! What's wrong?
echo "<SCRIPT>FillElements('".$subject."','".$text."');</SCRIPT>";
P.S.
FillElements function:
function FillElements(Sub,Txt)
{
    document.getElementById('txtSubject').value=Sub;
    document.getElementById('txtMessage').value=Txt;
}
Textareas don't actually store their contents in an attribute like value in the same manner as input elements. They store the contents between the <textarea> and </textarea> tags, meaning that the contents are treated as CDATA in the document.
<textarea>
This is my Content
</textarea>
Produces a text area with "This is my Content" as the contents.
The implication of this is that you cannot use the code you have to alter the contents of a textarea. You have to alter the innerHTML property of the textarea. I have set up a simple example here:
http://jsfiddle.net/wFZWQ/
As an aside, since you are populating the fields using PHP on the creation of the page, why not merely fill the data in the HTML markup, this seems like a long way round to do it.
Also, since you don't appear to be using it, have you seen jQuery (http://jquery.com)? It abstracts a lot of things away, so instead of typing document.getElementById("the_id") to get an element you can use CSS selectors and merely write $("#the_id") to get the same element. You also get a load of useful functions that make writing JavaScript much easier.
Newline tags (<br />) don't cause actual new lines in <textarea>.
You can pass the "real" newlines (\n) to your <textarea>, though.
I created a fiddle for that.
EDIT: For the updated FillElements code:
$subject = "String\nWith\nMultiple\nLines";
printf('<script type="text/javascript">FillElements(%s)</script>',
json_encode($subject)
);
My guess is that your HTML source code looks like this:
<script>FillElements("foo","foo
bar
baz");</script>
Correct?
In JavaScript, strings cannot span multiple lines...
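One way around that, in the spirit of the json_encode() suggestion earlier in this thread, is to let PHP emit both arguments as JSON string literals, so that a real newline reaches the page as the two characters \n. This is a sketch with an invented helper name; it assumes the raw database text is passed in rather than the nl2br() output, since <br /> tags would show up literally in a textarea.

```php
<?php
// Hypothetical helper: build the <script> call with JSON-encoded arguments.
function fillElementsScript(string $subject, string $text): string
{
    // The HEX flags escape <, >, &, ' and " inside the strings so the
    // literals cannot break out of the surrounding <script> element.
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;

    return sprintf(
        '<script>FillElements(%s, %s);</script>',
        json_encode($subject, $flags),
        json_encode($text, $flags)
    );
}
```

Echoing fillElementsScript($subject, $text) would then replace the hand-built echo from the question.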

markdown: render linebreaks within block elements as <br>

I know this has been asked (Python Markdown nl2br extension, etc) but none of those answers is doing it for me.
I would like to render markdown so that linebreaks occurring within a <p> element will be rendered as <br>. Example: they type
Here is line one.
And line two.

New paragraph.
should render as
<p>Here is line one.<br>And line two.</p>
<p>New paragraph.</p>
I know that if you want that, you should type two spaces at the end of the line you want to <br>. I am trying to make it so my users don't have to do that, but rather, enter text as though they were using a typewriter (for those who know what that is). One hard return, new line; two hard returns, new paragraph.
I've been working with https://parsedown.org/ and have also experimented with https://commonmark.thephpleague.com; also the Python markdown module with nl2br extension (tried their example verbatim, did not work for me). Whatever I do, I end up with either too many or not enough linebreaks, depending.
I have tried what I thought would be clever and elegant: style my markdown's <p> with white-space: "pre" (also tried pre-line). That works, unless the user has done it "right" with two spaces, in which case you get the unwanted double <br> effect.
Also tried nl2br($markdown) with likewise unreliable results.
I want non-technical users to be able to use some basic formatting as easily as possible, and markdown seems just the thing, but for this detail. I don't want to write a CMS just to work around this. For example, I've thought of adding a boolean markdown property on the entity and letting them choose, yadda yadda... don't wanna go there. I've thought of doing some string-replacement or regexp magic, either at database-write time or just before rendering. But again, hoping to avoid getting too complicated. (To make it a little more challenging, I will also have to import a few thousand legacy records that are non-markdown, and potentially deal with issues around old ones versus new.)
Maybe I'm overlooking a simple, sane way out. Any thoughts as to the best strategy?
Update: by popular demand, code examples of what does not work. It's a Zend MVC application that involves Doctrine entities I call MOTD and MOTW (Message Of The Day and Message Of The Week, respectively); these have a string property called content. Generically I think of these entities as Notes and they implement a NoteInterface. When I retrieve these from the database (via a NotesService class that internally uses a custom Doctrine repository class), it's time to render the content as markdown before the controller assigns it to the view:
// from NotesService.php
use Parsedown;
// stuff omitted...
/**
 * gets MOT(D|W) by date
 *
 * @param DateTime $date
 * @param string $type
 * @param boolean $render_markdown
 * @return NoteInterface|null
 */
public function getNoteByDate(DateTime $date, string $type, bool $render_markdown = true) : ?NoteInterface
{
    $entity = $this->getRepository()->findByDate($date,$type);
    if ($entity && $render_markdown) {
        $content = $entity->getContent();
        $entity->setContent($this->parsedown($content));
    }
    return $entity;
}
The point of the boolean $render_markdown is for when we want raw markdown, i.e., when it's going to populate a textarea element of a form.
And the parsedown() method, quite simply:
public function parsedown(string $content) : string
{
    if (! $this->parseDown) {
        $this->parseDown = new Parsedown();
    }
    // nope...
    // return nl2br($this->parseDown->text($content));
    return $this->parseDown->text($content);
}
Inside a viewscript, I just go, e.g.,
if ($this->notes['motd']):
    // echo nl2br($this->notes['motd']->getContent());
    echo $this->notes['motd']->getContent();
else:
    ?><p class="font-italic no-note">no MOTD for this date</p><?php
endif;
Now, if in the editing form they input this as content:
here is a line
and here is another

now, new paragraph.
and then we save it in the database, when you select it back out and run it through $parsedown->text($content), you get this HTML:
<p>here is a line
and here is another</p>
<p>now, new paragraph.</p>
Please note, the example input above does not have any space characters preceding the linebreaks. When you do type two spaces before the linebreaks, yeah, it works great. But I don't think my users want to think about that. So using nl2br() helps, except when it results in too many consecutive <br>s in the HTML.
My latest thinking is, use a CSS solution and an input filter that strips <space><space> at the end of lines. When it works, I'll add the story to my memoir. :-)
There may be some more desirable way to achieve this, but finally I decided to
(1) filter the input (at create|update time) with regexp pattern substitution to remove trailing '  ' (two consecutive space characters) from lines. I happen to be using ZendFramework's Zend\Filter\PregReplace, but it's a de facto wrapper for preg_replace('/( {2,})(\R)/m', '$2', $content).
(2) Use CSS to make newlines act like <br> when I display these entities, e.g.,
#motd .card-body p { white-space: pre-line }
Seems to be working for me.
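For anyone not using Zend\Filter\PregReplace, step (1) can be sketched with plain preg_replace(). The function name is made up for this example, and the pattern is a slightly simplified form of the one above (the spaces are not captured because they are thrown away anyway):

```php
<?php
// Hypothetical input filter: remove two or more trailing spaces that sit
// directly before a line break, so markdown's "hard break" syntax never
// reaches the renderer and CSS white-space: pre-line handles all breaks.
function stripHardBreakSpaces(string $content): string
{
    // \R matches any newline sequence (\n, \r\n, \r); $1 puts it back.
    return preg_replace('/ {2,}(\R)/', '$1', $content);
}
```

Single trailing spaces are left alone, since markdown only treats two or more as a forced line break.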
