Regular expression that will only compress certain sections of the page - php

I have a function that strips unneeded whitespace from the output of my PHP page before saving the page to an HTML file for caching purposes.
However, in some sections of my page I have source code in pre tags, and this whitespace affects how the code is displayed. My skill with regular expressions is horrible, so I am basically looking for a solution that stops this function from messing with code inside:
<pre></pre>
This is the PHP function:
function sanitize_output($buffer)
{
$search = array(
'/\>[^\S ]+/s', //strip whitespaces after tags, except space
'/[^\S ]+\</s', //strip whitespaces before tags, except space
'/(\s)+/s', // shorten multiple whitespace sequences
);
$replace = array(
'>',
'<',
'\\1',
);
$buffer = preg_replace($search, $replace, $buffer);
return $buffer;
}
Thanks for your help.
Here's what I found to be working:
Solution:
function stripBufferSkipPreTags($buffer){
$poz_current = 0;
$poz_end = strlen($buffer)-1;
$result = "";
while ($poz_current < $poz_end){
$t_poz_start = stripos($buffer, "<pre", $poz_current);
if ($t_poz_start === false){
$buffer_part_2strip = substr($buffer, $poz_current);
$temp = stripBuffer($buffer_part_2strip);
$result .= $temp;
$poz_current = $poz_end;
}
else{
$buffer_part_2strip = substr($buffer, $poz_current, $t_poz_start-$poz_current);
$temp = stripBuffer($buffer_part_2strip);
$result .= $temp;
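$t_poz_end = stripos($buffer, "</pre>", $t_poz_start);
if ($t_poz_end === false){
// unclosed <pre>: keep the rest of the buffer untouched and stop
$result .= substr($buffer, $t_poz_start);
break;
}
$temp = substr($buffer, $t_poz_start, $t_poz_end-$t_poz_start);
$result .= $temp;
$poz_current = $t_poz_end;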
}
}
return $result;
}
function stripBuffer($buffer){
// change new lines and tabs to single spaces
$buffer = str_replace(array("\r\n", "\r", "\n", "\t"), ' ', $buffer);
// multispaces to single...
$buffer = preg_replace('/ {2,}/', ' ', $buffer);
// remove single spaces between tags
$buffer = str_replace("> <", "><", $buffer);
// remove single spaces around &nbsp;
$buffer = str_replace(" &nbsp;", "&nbsp;", $buffer);
$buffer = str_replace("&nbsp; ", "&nbsp;", $buffer);
return $buffer;
}
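The same idea (leave `<pre>` blocks alone, compress everything else) can also be sketched with `preg_split()` and its `PREG_SPLIT_DELIM_CAPTURE` flag. This is only a minimal sketch: the function name `minify_outside_pre` is hypothetical, it assumes `<pre>` blocks are well formed and not nested, and it simply collapses whitespace runs rather than reproducing the rules above.

```php
<?php
// Hypothetical helper: collapse whitespace everywhere except inside <pre>...</pre>.
function minify_outside_pre(string $html): string
{
    // Split on complete <pre> blocks; the capture group keeps them in the result.
    $parts = preg_split('~(<pre\b[^>]*>.*?</pre>)~is', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        // PCRE error (for example, backtrack limit): return the input unchanged.
        return $html;
    }

    $out = '';
    foreach ($parts as $i => $part) {
        // With a single capture group, odd indexes hold the captured <pre> blocks.
        $out .= ($i % 2 === 1) ? $part : preg_replace('~\s+~', ' ', $part);
    }
    return $out;
}

$page = "<p>a   b</p>\n<pre>x\n  y</pre>\n\n<p>c</p>";
$minified = minify_outside_pre($page);
```

An unclosed `<pre>` simply does not match the pattern here, so its contents would be compressed like any other text.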

Regular expressions are known to be evil (see this and this) when it comes to parsing HTML.
That said, try to do what you need in another way, like using a DOM parser and customizing its HTML output functions.

If you are compressing for disk-space, you should consider using gz compression. (php.net/gz_deflate)
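As a rough illustration of that suggestion, the sketch below compresses a string in memory with `gzencode()` and restores it with `gzdecode()`. It assumes the zlib extension is enabled; the sample markup is made up for the example.

```php
<?php
// Sample markup standing in for a cached page (hypothetical content).
$html = str_repeat("<p>Hello   world</p>\n", 200);

// gzencode() returns a gzip-format string; level 9 is the strongest setting.
$packed = gzencode($html, 9);

// gzdecode() restores the original bytes.
$restored = gzdecode($packed);

printf("%d bytes -> %d bytes\n", strlen($html), strlen($packed));
```

Repetitive markup like this compresses very well; real pages will vary.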

Related

Removing empty CSS Selector with PHP preg_replace

I'm trying to get a whole CSS file through compression. It works fine but I'd like to remove empty CSS selectors. This file is procedurally generated so it leaves behind some empty tags and instead of removing them all by hand:
$buffer = str_replace('#content>#columns>#article{}', '', $buffer);
$buffer = str_replace('.menuDeeper88{}', '', $buffer);
I tried to remove them like this:
$buffer = preg_replace('/\}[.*?]\{\}/', '\}', $buffer);
Which I would imagine goes like this (just a clarification):
Replace any case of '}ANY_CHARACTERS{}' with ''
But the preg_replace method didn't work. I was hoping someone here could help me make it work.
I thank you in advance.
You can use this approach:
$css = "#content>#columnsA{min-width: 500px;}#articleA{ }.menuUpper88{}#content>#columnsB>{min-width: 700px;}#articleB{}.menuUpper88{}.header {background-color: #fff;background-image: url(image.gif);background-repeat: no-repeat;background-position: top left;}";
$css_splitted_by_closed_bracket = preg_split("/\s*\}/i", $css);
function get_css_without_empty_selector($data)
{
$result = "";
foreach ($data as $item) {
if($item != "" && substr($item, -1) != '{') {
$result .= $item . "}";
}
}
return $result;
}
get_css_without_empty_selector($css_splitted_by_closed_bracket);
Running code here: https://3v4l.org/9Cskr
Use this:
preg_replace('/(?:[^\r\n,{}]+)(?:,(?=[^}]*{)|\s*{[\s]*})/', '', $buffer);
Tested with: https://regex101.com/r/pX0wR0/3
Regular expression modified from: What is the REGEX of a CSS selector
I did not get Jim's answer to work with grouped selectors.
Try this:
preg_replace('/[a-zA-Z0-9\s#=",-:()\[\]]+\{\s*\}/', '', $buffer);
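A simpler variant of these patterns can be sketched as follows: match any run of characters that contains no braces, followed by a brace pair holding only whitespace. The function name `remove_empty_css_rules` is hypothetical, and the sketch assumes already-minified CSS without comments or braces inside strings.

```php
<?php
// Hypothetical helper: drop rules whose declaration block is empty.
function remove_empty_css_rules(string $css): string
{
    // [^{}]+  the selector (anything up to the opening brace)
    // \{\s*\} a block containing nothing but optional whitespace
    return preg_replace('/[^{}]+\{\s*\}/', '', $css);
}

$css = '#a{color:red}#b{}.c{ }#d{x:y}';
$cleaned = remove_empty_css_rules($css);
```

A single pass does not clean up wrappers that become empty afterwards, such as an `@media` block whose only rule was removed; running the replacement again would be needed for that.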

Converting html tags to docx and updating TOC using XML in php

I need to implement a module for exporting HTML to a docx document in PHP. I created a template and set some variables inside it. I replace these variables with data queried from the database. This worked until I needed to add some HTML tags with style attributes and a TOC. I was using str_replace to convert some simple tags like <br/>, <p>, etc., but it does not work once I add styling attributes like align and color.
Are there any ready-made open source systems that convert HTML tags, including their styles, to Word?
Can I create a TOC after all the replacements have been done?
I use http://phpword.codeplex.com/ to do this.
The way I use it: upload an existing doc with tags like %name% in it.
These tags will be replaced by $name variable and the system will output the new document.
In order to fix a bug where phpword was unable to replace variables, I had to modify the Template.php file. Look for the method setValue and change its body to:
$pattern = '|\$\{([^\}]+)\}|U';
preg_match_all($pattern, $this->_documentXML, $matches);
$openedTagPattern= '/<[^>]+>/';
$closedTagPattern= '/<\/[^>]+>/';
foreach ($matches[0] as $value) {
$modified= preg_replace($openedTagPattern, '', $value);
$modified= preg_replace($closedTagPattern, '', $modified);
$this->_header1XML = str_replace($value, $modified, $this->_header1XML);
$this->_header2XML = str_replace($value, $modified, $this->_header2XML);
$this->_header3XML = str_replace($value, $modified, $this->_header3XML);
$this->_documentXML = str_replace($value, $modified, $this->_documentXML);
$this->_footer1XML = str_replace($value, $modified, $this->_footer1XML);
$this->_footer2XML = str_replace($value, $modified, $this->_footer2XML);
$this->_footer3XML = str_replace($value, $modified, $this->_footer3XML);
}
if(substr($search, 0, 2) !== '${' && substr($search, -1) !== '}') {
$search = '${'.$search.'}';
}
if(!is_array($replace)) {
$replace = utf8_encode($replace);
}
$this->_header1XML = str_replace($search, $replace, $this->_header1XML);
$this->_header2XML = str_replace($search, $replace, $this->_header2XML);
$this->_header3XML = str_replace($search, $replace, $this->_header3XML);
$this->_documentXML = str_replace($search, $replace, $this->_documentXML);
$this->_footer1XML = str_replace($search, $replace, $this->_footer1XML);
$this->_footer2XML = str_replace($search, $replace, $this->_footer2XML);
$this->_footer3XML = str_replace($search, $replace, $this->_footer3XML);
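To see what the first half of that patch is doing, here is a standalone sketch on a plain string. Word often splits a placeholder such as `${name}` across several XML runs, so the literal text never appears in the document XML; stripping the tags found inside each `${...}` match puts it back together. The function name `clean_placeholders` is hypothetical and is not part of PHPWord.

```php
<?php
// Hypothetical helper: remove XML tags that ended up inside ${...} placeholders.
function clean_placeholders(string $xml): string
{
    return preg_replace_callback(
        '|\$\{([^\}]+)\}|U',            // same placeholder pattern as in the patch
        function (array $m): string {
            // Drop every tag (opening or closing) inside the matched placeholder.
            return preg_replace('/<[^>]+>/', '', $m[0]);
        },
        $xml
    );
}

$broken = '<w:t>${na</w:t><w:t>me}</w:t>';
$fixed = clean_placeholders($broken);
```

Note that `/<[^>]+>/` already matches closing tags as well, so the separate closed-tag pattern in the patch is harmless but not strictly needed.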

including leading whitespaces when using fgets in php

I am using PHP to read a simple text file with the fgets() command:
$file = fopen("filename.txt", "r") or exit('oops');
$data = "";
while(!feof($file)) {
$data .= fgets($file) . '<br>';
}
fclose($file);
The text file has leading white spaces before the first character of each line. The fgets() is not grabbing the white spaces. Any idea why? I made sure not to use trim() on the variable. I tried this, but the leading white spaces still don't appear:
$data = str_replace(" ", "&nbsp;", $data);
Not sure where to go from here.
Thanks in advance,
Doug
UPDATE:
The text appears correctly if I dump it into a textarea but not if I ECHO it to the webpage.
fgets() does read the whitespace. I don't know exactly what you are doing with the $data variable, but if you simply display it on an HTML page, you won't see the leading whitespace. That's because browsers collapse runs of whitespace when rendering HTML. Try this code to read and show your file content:
$file = fopen('file.txt', 'r') or exit('error');
$data = '';
while(!feof($file))
{
$data .= '<pre>' . fgets($file) . '</pre><br>';
}
fclose($file);
echo $data;
The PRE tag displays $data with its whitespace preserved.
Try it with:
$data = preg_replace('/\s+/', ' ', $data);
fgets should not trim whitespace.
Try reading the file with file_get_contents; it successfully reads the whitespace at the beginning of the file.
$data = file_get_contents("xyz.txt");
$data = str_replace(" ","~",$data);
echo $data;
Hope this helps
I currently have the same requirement and experienced that some characters are written as a tab character.
What I did was:
$tabChar = '&nbsp;&nbsp;&nbsp;&nbsp;'; // non-breaking spaces per tab; adjust to taste
$regularChar = '&nbsp;';
$file = fopen('/file.txt', 'r');
$lines = '';
while (($line = fgets($file)) !== false) {
    $l = str_replace("\t", $tabChar, $line);
    // keep working on $l so the tab replacement is not thrown away
    $l = str_replace(" ", $regularChar, $l);
    // ...
    // replacing can be done till it matches your needs
    $lines .= $l; // maybe append <br /> if necessary
}
fclose($file);
$result = '<pre>' . $lines . '</pre>';
This one worked for me, maybe it helps you too :-).
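Putting the answers above together, a minimal sketch might read the lines with `fgets()`, escape them, and wrap the whole text in a single `<pre>` element so the browser keeps the leading whitespace. The function name `file_lines_as_pre` is hypothetical, and a `php://memory` stream stands in for the real file so the example is self-contained.

```php
<?php
// Hypothetical helper: read every line from an open handle and wrap it in <pre>.
function file_lines_as_pre($handle): string
{
    $text = '';
    while (($line = fgets($handle)) !== false) {
        $text .= $line; // fgets() keeps leading whitespace and the newline
    }
    // Escape markup characters so file content is shown, not interpreted.
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// In-memory stream standing in for fopen("filename.txt", "r").
$handle = fopen('php://memory', 'w+');
fwrite($handle, "  indented <b>\n\tline two\n");
rewind($handle);

$html = file_lines_as_pre($handle);
fclose($handle);
```

Because `<pre>` preserves line breaks too, no `<br>` tags are needed between lines.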

remove breaks from php url string?

I for the life of me can't figure this out...
I've got a form, it ends up passing some fields as a query string.
$f = $_GET['full'];
$t = $_GET['title'];
$illegal = array("'", "#039;");
$f = str_replace($illegal, "", $f);
$t = str_replace($illegal, "", $t);
For some weird reason, apostrophes were messing up what I need, so I removed them as they were turning up (' for the first occurrence and #039; afterwards).
Now $t outputs a useable string. $f however can contain breaks. And they end up in the string as
<br+%2F>
I've tried
$f = preg_replace("/\r?\n/", "", $f);
and
$breaks = array("<br+%2F>", "/\r?\n/", "<br />");
$f = str_replace($breaks, "%20", $f);
but when I output it in a URL I STILL get
http://www.domain.com?t=good%20string&f=bad<br+%2F>string
This is causing a 404 page not found error. :(
EDIT
$f = html_entity_decode($_GET['full']);
$t = html_entity_decode($_GET['title']);
$illegal = array("'", "#039;");
$f = str_replace($illegal, "", $f);
$t = str_replace($illegal, "", $t);
$f = htmlspecialchars($f);
$breaks = array("<br+%2F>%0D%0A<br+%2F>%0D%0A", "<br+%2F>%0D%0A");
$f = str_replace($breaks, "%20", $f);
if (---private-stuff---) {
header( 'Location: /?title=' . $t . '&full=' . $f . '&fifth=fith%20percent');
}
The last $f STILL contains
<br+%2F>%0D%0A<br+%2F>%0D%0A
in the URL. Shouldn't ONE of those functions have stripped it out?!?
Use html_entity_decode():
$string = html_entity_decode("&#039;", ENT_QUOTES);
echo $string; // '
so in your case that will be
$f = html_entity_decode($_GET['full']);
$t = html_entity_decode($_GET['title']);
Since you are getting a 404, it may be that you have a .htaccess rule (ModRewrite) that is trying to route you to some other page (that does not exist). Sounds like you may have two separate issues...
To get your actual text back out of the query string, Ibu is correct...use html_entity_decode()
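One thing worth checking in the question's code: values in `$_GET` are already URL-decoded, so `$f` would contain a literal `<br />` followed by CR/LF rather than `<br+%2F>%0D%0A`, and searching for the encoded form never matches. A minimal sketch of an alternative, assuming that is what is happening, is to strip the breaks from the decoded text and let `http_build_query()` do the encoding. The function name `strip_breaks` and the sample values are hypothetical.

```php
<?php
// Hypothetical helper: replace <br> tags and line breaks with a single space.
function strip_breaks(string $text): string
{
    // Any run of <br>, <br/>, <br /> tags and CR/LF characters becomes one space.
    $text = preg_replace('~(?:<br\s*/?>|\r|\n)+~i', ' ', $text);
    return trim($text);
}

// Sample value standing in for $_GET['full'] (already URL-decoded by PHP).
$full = strip_breaks("bad<br />\r\n<br />\r\nstring");

// Let PHP percent-encode the values instead of building the query by hand.
$query = http_build_query(
    ['title' => 'good string', 'full' => $full],
    '',
    '&',
    PHP_QUERY_RFC3986
);
```

The resulting `$query` can then be appended after `/?` in the `Location` header.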

PHP custom function is truncating text but I don't want it to

I am passing a large amount of text to a PHP function and having it return it compressed. The text is being cut off. Not all of it is being passed back out. Like some of the words at the very end aren't showing up after being compressed. Does PHP limit this somewhere?
function compress($buffer) {
/* remove comments */
$buffer = preg_replace('!/\*[^*]*\*+([^/][^*]*\*+)*/!', '', $buffer);
/* remove tabs, spaces, newlines, etc. */
$buffer = str_replace(array("\r\n", "\r", "\n", "\t", '  ', '    ', '    '), '', $buffer);
return $buffer;
}
That is the function. It's from http://www.antedes.com/blog/webdevelopment/three-ways-to-compress-css-files-using-php
Is there like a setting in php.ini to fix this?
Your compress() function looks decent for CSS files, not JS. This is what I use to "compress" CSS (including jquery-ui and other monsters):
function compress_css($string)
{
$string = preg_replace('~/\*[^*]*\*+([^/][^*]*\*+)*/~', '', $string);
$string = preg_replace('~\s+~', ' ', $string);
$string = preg_replace('~ *+([{}+>:;,]) *~', '$1', trim($string));
$string = str_replace(';}', '}', $string);
$string = preg_replace('~[^{}]++\{\}~', '', $string);
return $string;
}
and for JavaScript files this one: https://github.com/mishoo/UglifyJS2 (or this: http://lisperator.net/uglifyjs/#demo)
I'm sure there are other good tools for the same tasks, just find what suits you and use that.
