Remove whitespace/new lines/comments. Regex? - php

I'm currently writing a sort of CSS minifier, and I think an example of what I'm trying to achieve is the simplest way to explain it.
I'm trying to transform this:
/* CSS Content */
.class{
text-align:center;
border-bottom:1px solid #ccc;
}
.anotherclass, .another, .another{
text-align:center;
border-bottom:1px solid #ccc;
}
Into:
.class{text-align:center;border-bottom:1px solid #ccc;}.anotherclass,.another,.another{text-align:center;border-bottom:1px solid #ccc;}
That is: remove comments, unnecessary whitespace, and newlines.
So far I've managed to remove the comments and newlines (using one expression plus a function that explodes the string on \n and then joins the parts). The whitespace is more difficult, since whitespace inside the {} should be removed, but not the whitespace between the colon and the semicolon.
Since I'm quite inexperienced with regular expressions, have no good reference book at hand, and Google doesn't seem to have the answer, I'm wondering if anyone here can help me create one good regex to accomplish this.
Thanks in advance!

If you must do it this way, then this will minify the example you gave:
(/\*[^/]+\*/|^\t*|^\s*|\n|\s+(?=[\{.])|(?<=[\{;])\s+)
It assumes the flavour of regex you're using supports positive lookbehinds (PCRE, which PHP's preg_* functions use, does).
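From PHP, the pattern only needs delimiters and can be passed to preg_replace as-is. A sketch using the question's sample stylesheet as input; the `~` delimiter is my choice, to avoid escaping the slashes in the pattern. Note that comments containing a `/` are not matched by `[^/]+`.

```php
<?php
// The question's sample stylesheet.
$css = "/* CSS Content */\n.class{\ntext-align:center;\nborder-bottom:1px solid #ccc;\n}\n"
     . ".anotherclass, .another, .another{\ntext-align:center;\nborder-bottom:1px solid #ccc;\n}\n";

// Same alternation as in the answer; "~" is the delimiter because the pattern contains "/".
$pattern = '~(/\*[^/]+\*/|^\t*|^\s*|\n|\s+(?=[\{.])|(?<=[\{;])\s+)~';

// Every match is simply deleted.
$min = preg_replace($pattern, '', $css);
echo $min, PHP_EOL;
```

For this particular input the result is exactly the single line shown in the question.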

I guess you could try something like this:
(?<=[{};])\s+|\/\*.+?\*\/
http://rubular.com/r/djDtEPzEV0
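This removes whitespace that directly follows a brace or a semicolon, plus all comments. Whitespace after the commas in a selector list is left in place, and without the s modifier .+? does not cross newlines, so comments spanning several lines are not removed.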
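If a single regex is not a hard requirement, a few ordered passes are easier to read and also handle the commas. This is a sketch of that alternative (the function name minify_css is hypothetical), not a full minifier: it does not understand quoted strings, so whitespace inside something like content: "a  b" would be collapsed too. Spaces around : are deliberately left alone, because in selectors such as div :first-child that space is significant.

```php
<?php
// Hypothetical helper: multi-pass CSS minifier sketch (ignores quoted strings).
function minify_css(string $css): string
{
    $patterns = [
        '~/\*.*?\*/~s',        // 1. comments; "s" lets .*? cross newlines
        '~\s+~',               // 2. collapse every whitespace run to a single space
        '~\s*([{};,])\s*~',    // 3. drop the spaces around { } ; and ,
    ];
    $replacements = ['', ' ', '$1'];

    // preg_replace applies the patterns in order; trim() removes a leftover edge space.
    return trim(preg_replace($patterns, $replacements, $css));
}
```
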

Related

strip_tags or other solution

I have this text stored in MySQL (it was added directly), and I don't want to lose the tags, only the styles and formatting they have:
<p style="margin-bottom: 20px; padding: 0px; border: 0px;"><span style="line-height: 1.428571429;">Allí, el club crema</span><p>
I used strip_tags, but it removes the entire tag:
strip_tags($data, "<p>");
I want it like this:
<p>Allí, el club crema<p>
I hope you can help. Thank you very much in advance for your answers.
Warning, anti-pattern: using regex on markup is generally a bad idea. However, it's sometimes more convenient, so to hell with it:
$data = preg_replace('/(<\w+) [^>]+/', '$1', $data);
There is no built-in PHP function for that. strip_tags removes a tag completely, and allowing a tag keeps it in place, attributes included. You'll need to load the HTML into an XML parser and rebuild the output, or (the route I would advise) use a regex to strip out the HTML attributes after you've stripped away the tags you don't need anyway. See also this question.
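The two steps recommended here, strip_tags first and then the attribute-stripping regex from the previous answer, can be combined like this. A sketch using the question's sample string; note that a `>` inside an attribute value would break the `[^>]+` part.

```php
<?php
$data = '<p style="margin-bottom: 20px; padding: 0px; border: 0px;">'
      . '<span style="line-height: 1.428571429;">Allí, el club crema</span><p>';

// 1. Drop every tag except <p> (strip_tags keeps the allowed tag's attributes).
$data = strip_tags($data, '<p>');

// 2. Remove the attributes from the remaining opening tags.
$data = preg_replace('/(<\w+) [^>]+/', '$1', $data);

echo $data, PHP_EOL;
```
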

preg_replace UNLESS string exists

I'm trying to add CSS styling to all hyperlinks unless they have a "donttouch" attribute.
E.g.
Style this: <a href="http://whatever.com">style me</a>
Don't style this: <a href="http://whatever.com" donttouch>don't style me</a>
Here's my preg_replace without the "donttouch" exclusion, which works fine.
preg_replace('/<a(.*?)href="([^"]*)"(.*?)>(.*?)<\/a>/','<a$1href="$2"$3><span style="color:%link_color%; text-decoration:underline;">$4</span></a>', $this->html)
I've looked all over the place, and would appreciate any help.
Find (also works in Notepad++):
(?s)(<a (?:(?!donttouch)[^>])+>)(.*?)</a>
Replace with (Replace all in Notepad++):
\1<span style="whatever">\2</span></a>
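In PHP the same pattern works with preg_replace; the inline (?s) could stay, but here it is written as the s modifier, and \1/\2 become $1/$2. A sketch with two sample links, reusing the span from the question's replacement. Because (?!donttouch) is checked at every character of the opening tag, a link whose href merely contains the text "donttouch" is skipped as well.

```php
<?php
$html = '<a href="http://whatever.com">style me</a> '
      . '<a href="http://whatever.com" donttouch>don\'t style me</a>';

// Group 1: an opening <a ...> tag that never contains "donttouch"; group 2: the link text.
$pattern = '~(<a (?:(?!donttouch)[^>])+>)(.*?)</a>~s';

$styled = preg_replace(
    $pattern,
    '$1<span style="color:%link_color%; text-decoration:underline;">$2</span></a>',
    $html
);
echo $styled, PHP_EOL;
```
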
This can be accomplished without a regular expression. Instead, use a CSS attribute selector.
For example, use these rules:
a { font-weight: bold; color: green }
a[donttouch=''] { font-weight: normal; color: blue }
Technically, you are still styling the elements that have the 'donttouch' attribute, but you can set those rules back to default values. This will be more efficient than attempting to use a regular expression to parse your HTML, which is usually a bad idea.

Using functions inside preg_replace

$TOPIC_CONTENT = preg_replace("!<code>(.+)</code>!is","<div style='color: #00FF00;
background-color: #000000; border-radius: 5px; margin: 5px;"<pre>".htmlspecialchars("$0")."</pre></div>",$TOPIC_INFO->content);
How can I get this to work? I have no idea how to pull this off, and I know my current way is invalid.
Use preg_replace_callback. Be a little careful with your regex: I think you want .+? instead of just .+, because the greedy version runs from the first <code> to the last </code> and swallows everything in between. The usual mantra is "don't parse HTML with regex," but for something as simple as this I don't see the harm.
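A sketch of the preg_replace_callback version with the lazy .+?. It escapes only group 1 (the code itself), not the whole match $0, which would also escape the <code> tags. Here $content stands in for $TOPIC_INFO->content, and the sample input is made up.

```php
<?php
// Made-up sample; in the question this is $TOPIC_INFO->content.
$content = 'A <code>1 < 2</code> B <code><b>x</b></code>';

$result = preg_replace_callback(
    '!<code>(.+?)</code>!is',
    function (array $m): string {
        // $m[1] is the text between the tags; escape it so it is displayed literally.
        return '<div style="color: #00FF00; background-color: #000000; border-radius: 5px; margin: 5px;">'
            . '<pre>' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '</pre></div>';
    },
    $content
);
echo $result, PHP_EOL;
```
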
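Besides preg_replace_callback, as in tandu's answer, you can also use the /e modifier: your replacement string is then *e*valuated as PHP code and its result is used. Note that /e was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, so on current PHP versions preg_replace_callback is the way to go.
On PHP versions that still support it, you could do: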
preg_replace("!<code>(.+?)</code>!ise",
'"<pre style=\"color: #0f0; background: #000;\">" . htmlspecialchars("$1") . "</pre>"',
$string);

preg_replace capture group check

I'm trying to add a new BBCode to my phpfusion application, using preg_replace. Here's the code:
$text = preg_replace(
"#\[gameimg( float:(left|center|right))?\]((http|ftp|https|ftps)://)(.*?)(\.(jpg|jpeg|gif|png|JPG|JPEG|GIF|PNG))\[/gameimg\]#sie",
"'<span style=\"display: block; max-width: 350px; margin: 0 0 5px 5px; $1\"><img src=\"'
. (strlen('$3') > 0 ? '$3' : BASEDIR.GAMESDIR)
. '$5$6\" alt=\"$5$6\" style=\"border:0px; max-width: 350px;\" /></span>'",
$text
);
If I supply an absolute URL (for example [gameimg]http://localhost/dirname/file.jpg[/gameimg]), everything works fine and the picture shows up as expected. But if I omit the protocol and hostname and use a relative URL ([gameimg]dirname/file.jpg[/gameimg]), I expect the given URL to be prefixed with the BASEDIR.GAMESDIR constants, but it doesn't work at all: instead of the image, it prints the original BBCode without any replacement. What am I doing wrong?
A couple of things here:
That is a giant preg_replace call. Maybe you could break this problem into smaller parts so that it is easier to understand/maintain?
You are using the "ignore case" modifier (i), yet you have things like (jpg|JPG). This is redundant.
Your question asks why [gameimg] tags without http:// don't get matched. Well, this is because of the required ((http|ftp|https|ftps)://) in your regex. You should make this section optional by adding a ?, like this:
((http|ftp|https|ftps)://)?
With the group optional, $3 is empty for a relative URL, so your strlen('$3') > 0 check falls back to BASEDIR.GAMESDIR.
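Since /e no longer exists in PHP 7+, the same BBCode can be handled with preg_replace_callback, which also makes the relative/absolute branch plain PHP instead of code inside a string. A sketch under assumptions: parse_gameimg is a hypothetical name, the BASEDIR and GAMESDIR values are placeholders (phpfusion defines the real ones), and the groups are renumbered with (?:...) so only float, protocol and path are captured.

```php
<?php
// Placeholder values for the sketch; phpfusion defines the real constants.
const BASEDIR = '/';
const GAMESDIR = 'games/';

// Hypothetical helper. The protocol group is optional, so relative paths match too.
function parse_gameimg(string $text): string
{
    $pattern = '#\[gameimg(?: float:(left|center|right))?\]'  // 1: optional float value
        . '((?:https?|ftps?)://)?'        // 2: optional protocol
        . '(.*?\.(?:jpe?g|gif|png))'       // 3: rest of the URL, ending in an image extension
        . '\[/gameimg\]#si';               // "i" already covers JPG, PNG, ...

    return preg_replace_callback($pattern, function (array $m): string {
        $float = $m[1] !== '' ? ' float: ' . $m[1] . ';' : '';
        // No protocol captured: treat the path as relative to the games directory.
        $url = $m[2] !== '' ? $m[2] . $m[3] : BASEDIR . GAMESDIR . $m[3];
        $src = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $alt = htmlspecialchars($m[3], ENT_QUOTES, 'UTF-8');
        return '<span style="display: block; max-width: 350px; margin: 0 0 5px 5px;' . $float . '">'
            . '<img src="' . $src . '" alt="' . $alt . '" style="border: 0; max-width: 350px;" /></span>';
    }, $text);
}

echo parse_gameimg('[gameimg float:left]dirname/file.jpg[/gameimg]'), PHP_EOL;
```
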

regexp for html p tag with style element

Sorry, I'm new to regexps. I need a regexp for this type of tag, specifically this one:
<p style="font-family: Georgia;">Text</p>
I have a regexp that matches all p tags: <\s*p[^>]*>([^<]*)<\s*\/\s*p\s*>/
I found it on this site. I tried to figure it out myself and wrote this: <\s*p[\"\^font\-\family\:\()\Georgia\;\$\"\]*>([^<]*)<\s*\/\s*p\s*>/ but I know it's wrong. Can anyone help me? :) I need a regexp for exactly this piece: style="font-family: Georgia;". Thank you.
If you just want to match style="font-family: Georgia;", use this: \w+="[^"]*" (using [^"]* rather than .+ keeps the match from running past the closing quote).
For matching the entire thing, \s*<p \w+="[^"]*">([^<]*)</p>\s* can do the trick. This will capture all paragraphs with exactly one attribute with a double-quoted value. Not really flexible. If you only want to capture paragraphs with exactly the font-family Georgia style, use \s*<p style="font-family: Georgia;">([^<]*)</p>\s*
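With preg_match_all, the exact-match pattern returns each matching paragraph's text in capture group 1. A sketch; the sample HTML is made up, and `~` is used as the delimiter so that </p> needs no escaping.

```php
<?php
// Made-up sample with one matching and one non-matching paragraph.
$html = '<p style="font-family: Georgia;">Text</p><p style="font-family: Arial;">Other</p>';

preg_match_all('~\s*<p style="font-family: Georgia;">([^<]*)</p>\s*~', $html, $matches);

// $matches[1] holds the captured paragraph texts.
print_r($matches[1]);
```
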
If you find yourself using a lot of regular expressions to "parse" HTML, an HTML parser might save you a lot of trouble. PHP's DOM extension (the DOMDocument class) can parse HTML.
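A sketch of the parser route, assuming the dom extension is enabled (it is in most PHP builds): load the markup with DOMDocument and select the paragraphs with an XPath attribute test instead of a regex. The sample HTML is made up; note that the attribute comparison is an exact string match, just like the regex.

```php
<?php
$html = '<div><p style="font-family: Georgia;">Text</p><p>Other</p></div>';

$doc = new DOMDocument();
// Collect libxml warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query('//p[@style="font-family: Georgia;"]') as $p) {
    $texts[] = $p->textContent;
}
print_r($texts);
```
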
