Parsing CSS with a regex - PHP

I want to scan through a CSS file and capture both the comments and the CSS rules. I've come up with a regex that's almost there, but it's not quite perfect: it misses rules with multiple selectors, e.g.:
ul.menu li a, # Won't capture this line
ul.nice-menu li a { text-decoration: none; cursor:pointer; }
Here's the regex that I'm working with:
(\/\*[^.]+\*\/\n+)?([\t]*[a-zA-Z0-9\.# -_:#]+[\t\s]*\{[^}]+\})
I've been testing this at rubular.com, and here is what it currently matches and what the array output looks like:
Result 1
[0] /* Index */
/*
GENERAL
PAGE REGIONS
- Header bar region
- Navigation bar region
- Footer region
SECTION SPECIFIC
- Homepage
- News */
[1] html { background: #ddd; }
Result 2
[0]
[1] body { background: #FFF; font-family: "Arial", "Verdana", sans-serif; color: #545454;}
I should point out that I'm still new to regular expressions, so if anyone can show me where I'm going wrong, it'd be much appreciated :)
BTW: I'm using PHP and preg_match_all.
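A hedged reading of why the first line is missed: in [a-zA-Z0-9\.# -_:#] the sequence " -_" (space, hyphen, underscore) is a range from space (0x20) to underscore (0x5F). That range happens to include the comma, but nothing in the class matches a newline, so the selector can never continue onto the next line and the match only starts at "ul.nice-menu li a {". The sketch below is written in JavaScript, whose regex syntax is close to PCRE for this pattern; the pattern, the extractRules name and the output shape are illustrative assumptions, and it is still not a real CSS parser.

```javascript
// Hypothetical revision of the question's pattern, written as a JS RegExp.
// Group 1: an optional /* ... */ comment; its body may not contain "*/",
//          so the group holds exactly one comment.
// Group 2: the selector list; [^{}\/;]+ allows commas AND newlines, but not
//          braces, slashes or semicolons (so it never runs into a comment
//          or swallows an @import statement).
// Group 3: the declarations between the braces.
const RULE = /(\/\*(?:[^*]|\*(?!\/))*\*\/)?\s*([^{}\/;]+)\{([^}]*)\}/g;

function extractRules(css) {
  const rules = [];
  for (const m of css.matchAll(RULE)) {
    rules.push({
      comment: m[1] || null,
      selector: m[2].trim().replace(/\s+/g, ' '), // join a selector list split over lines
      body: m[3].trim(),
    });
  }
  return rules;
}

const sample = [
  '/* Index */',
  'html { background: #ddd; }',
  '',
  'ul.menu li a,',
  'ul.nice-menu li a { text-decoration: none; cursor:pointer; }',
].join('\n');

console.log(extractRules(sample));
```

With this sample the second rule's selector comes back as "ul.menu li a, ul.nice-menu li a". Nested blocks such as @media still defeat it.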

CSS cannot be fully parsed with a regex (see CSS Grammar: http://www.w3.org/TR/CSS2/grammar.html). A rule can be split over several lines, for example, and your current pattern doesn't handle a selector list that spans lines. If you need to do this properly, you should read the CSS spec and use a tool like ANTLR to generate a parser.
Here is an example from the W3C spec (http://www.w3.org/TR/CSS2/syndata.html):
@import "subs.css";
@import "print-main.css" print;
@media print {
  body { font-size: 10pt }
}
h1 {color: blue }
No normal regex is powerful enough to deal with arbitrarily nested {...} blocks, let alone the contents of the imported stylesheets.
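To illustrate the nesting point, here is a minimal JavaScript sketch (not a real CSS parser, and not taken from the W3C spec) that splits a stylesheet into top-level statements by counting brace depth, which a single ordinary regex cannot do for arbitrary nesting. The splitTopLevel name is hypothetical; it skips braces inside comments and quoted strings but ignores escapes and the rest of the CSS grammar.

```javascript
// Split a stylesheet into top-level statements by tracking { } depth.
// Simplifications (assumptions): braces inside /* comments */ and quoted
// strings are skipped, escapes are not handled, a comment stays attached to
// the statement that follows it, and trailing unterminated text is dropped.
function splitTopLevel(css) {
  const statements = [];
  let depth = 0;
  let start = 0;
  let i = 0;
  while (i < css.length) {
    const ch = css[i];
    if (ch === '/' && css[i + 1] === '*') {
      const end = css.indexOf('*/', i + 2);
      i = end === -1 ? css.length : end + 2; // jump past the comment
      continue;
    }
    if (ch === '"' || ch === "'") {
      const end = css.indexOf(ch, i + 1);
      i = end === -1 ? css.length : end + 1; // jump past the string
      continue;
    }
    if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) { // an outer block such as @media {...} just closed
        statements.push(css.slice(start, i + 1).trim());
        start = i + 1;
      }
    } else if (ch === ';' && depth === 0) { // a block-less statement like @import
      statements.push(css.slice(start, i + 1).trim());
      start = i + 1;
    }
    i++;
  }
  return statements;
}

const w3cExample = [
  '@import "subs.css";',
  '@import "print-main.css" print;',
  '@media print {',
  '  body { font-size: 10pt }',
  '}',
  'h1 {color: blue }',
].join('\n');

console.log(splitTopLevel(w3cExample));
```

For the W3C example this yields four statements, with the whole @media block (including the nested body rule) kept together as one.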

What language are you using?
You should probably just use a library to parse the CSS. Libraries can save you a lot of grief.

Related

Stop text peeking above bottom of div

Right now, I have PHP outputting a list of tags from an SQL database and creating each of them as an <a> tag that looks something like: <a class="tag" href="tags/test-tag" style="background-color:rgb(150,150,255)" title="test tag"> test tag </a> with css:
.tags {
margin-block-start: 0;
margin-block-end: 0;
margin-left: 5px;
overflow: hidden;
font-size: 16px;
display: inline;
margin-top: 2px;
text-overflow: "";
}
As it stands this looks pretty good, but after 3-4 lines (depending on title length) the tags reach the end of the div and keep going, and the top of the first tag that wraps below the edge stays visible despite overflow: hidden.
[Screenshot: two rows of tags, with the third row barely visible ("peeking") above the bottom of the div]
Is there any way to fully hide any overflowing text? I've changed values around many times to no avail, but I haven't had time to work on this in a while, so I couldn't really say what precisely I've done. Any help would be greatly appreciated.

You can use white-space: nowrap to prevent text from wrapping to the next line and keep it all on one line.
.tag {
white-space: nowrap;
}
This will keep the text from wrapping and any text that exceeds the width of the parent container will be hidden due to the overflow: hidden property.

Is there a way in css to clear all font-family + font-size style declarations?

I have a page which is a CMS/WYSIWYG/MS Word nightmare.
It pulls many paragraphs of text from a database, some of which have retained MS Word's bizarre HTML tags, including font declarations!!! Ahh!
In one sentence I can have things like:
<span style="font-family:Verdana">this is some</span>
<span style="font-family:arial">ugly text!</span>
I was wondering if there is a way of removing all font-family and font-size styles so they will adopt the master stylesheet's CSS?
I'd prefer to not get into massive preg_replace conditions if I can avoid it.
Thanks

CSS:
span {
font-family: inherit !important; /* take the parent's font; "initial" would reset to the browser default instead */
font-size: inherit !important;
}

Well, if you're getting inline styles in many places, I would add this to the body CSS
body {
font-family: Arial, Helvetica, sans-serif !important;
font-size: 16px !important;
}
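Note that a rule on body alone won't override inline styles on the spans themselves: a value inherited from body always loses to a declaration set directly on an element, inline styles included. So if all of the inline font styling is on spans, target spans instead of the body.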
I chose these two fonts because they are the "default" fonts for Windows and Mac/iOS.
Of course you can choose your own font size. The only downside is that if you want a different font and font size in other places, you'll have to use more !important rules.

You can use the !important rule for this, but you will have to explicitly list each element you want it to apply to (or use the universal selector *).
http://jsfiddle.net/b8RKm/
* { font-family: Tahoma, sans-serif !important; }

preg_replace UNLESS string exists

I'm trying to add CSS styling to all hyperlinks unless it has a "donttouch" attribute.
E.g.
Style this: <a href="http://whatever.com">style me</a>
Don't style this: <a href="http://whatever.com" donttouch>don't style me</a>
Here's my preg_replace without the "donttouch" exclusion, which works fine.
preg_replace('/<a(.*?)href="([^"]*)"(.*?)>(.*?)<\/a>/','<a$1href="$2"$3><span style="color:%link_color%; text-decoration:underline;">$4</span></a>', $this->html)
I've looked all over the place, and would appreciate any help.

Find (this also works in Notepad++):
(?s)(<a (?:(?!donttouch)[^>])+>)(.*?)</a>
Replace with (Replace all in Notepad++):
\1<span style="whatever">\2</span></a>
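For reference, here is the same Find/Replace pair as a JavaScript sketch; in PHP's preg_replace the pattern would additionally need delimiters (for example ~, with the s modifier). The styleLinks name and the sample markup are hypothetical. The negative lookahead (?!donttouch) is checked before every character of the opening tag, so a tag that contains donttouch anywhere cannot match.

```javascript
// Group 1: an opening <a ...> tag; (?!donttouch) is tested before every
//          attribute character, so a tag containing "donttouch" never matches.
// Group 2: the link text, lazy so it stops at the first </a>.
// The s flag plays the role of the inline (?s): "." also matches newlines.
const LINK = /(<a (?:(?!donttouch)[^>])+>)(.*?)<\/a>/gs;

function styleLinks(html, style) {
  // A replacer function avoids "$" in the style string being read as a backreference.
  return html.replace(LINK, (_, openTag, text) =>
    `${openTag}<span style="${style}">${text}</span></a>`);
}

const input =
  '<a href="http://whatever.com">style me</a> ' +
  '<a href="http://whatever.com" donttouch>don\'t style me</a>';
console.log(styleLinks(input, 'color:red; text-decoration:underline;'));
```

Like the original pattern, this also skips a link whose URL merely contains the text donttouch, and, being a regex over HTML, it can be fooled by a > inside an attribute value.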

This can be accomplished without a regular expression. Instead, use a CSS attribute selector.
For example, use these rules:
a { font-weight: bold; color: green }
a[donttouch=''] { font-weight: normal; color: blue }
Technically, you are still styling the elements that have the donttouch attribute, but you can set them back to your default values. This will be more efficient than trying to parse your HTML with a regular expression, which is usually a bad idea.

Remove inline CSS and classes from text with PHP

I have a text in this form:
aaaa bbbbb cccccc a:link {text-decoration: none;font-family: Verdana, Arial, Helvetica, sans-serif;color: #ffffff; } a:hover {text-decoration: underline; } .intro{font-size: 11px;font-weight: bold;line-height: 18px;color : #ffffff;padding-left: 25px;font-family: Verdana, Arial, Helvetica, sans-serif; } ddddd eeeeee
I would like to remove all the css with the classes. The output should be:
aaaa bbbbb cccccc ddddd eeeeee
Can anyone show me a preg_match example? I found an example that removes everything between the brackets {}, but I need everything that is CSS to be removed.
Thanks
Nik

The trouble you will have with removing the CSS parts from that string is that it's very hard to determine which parts are CSS and which aren't.
You say you want to be left with aaaa bbbbb cccccc ddddd eeeeee, but in fact from your original string, aaaa bbbbb cccccc would be valid parts of the CSS selector. They would select elements named <aaaa> or <bbbbb> or <cccccc>.
Granted, these are not valid HTML elements, but CSS can be used to style arbitrary XML as well as HTML, so they could very easily be valid elements. If you're using XHTML, they could appear in your page quite legitimately under a custom namespace.
But it gets worse. I assume the text wouldn't actually be aaaa bbbbb cccccc, but an arbitrary string of words. In that case, consider that it may be a sentence like 'I am strong'. Here, strong would be part of the text you want to keep, but <strong> is also a valid HTML element (as is <I>, for that matter), so even if you stick to plain HTML, it would be impossible to tell from the string above whether it was intended as a CSS selector or as part of the text to keep. You simply couldn't write a regex that would be completely reliable in all cases.
As you say, you can remove everything inside the {} braces fairly easily, but the selectors outside the braces would be very hard to separate reliably from surrounding text.
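A hedged illustration of that limit: the JavaScript sketch below removes each {...} block together with the single word directly in front of it, assuming every selector is one token with no spaces. That is a hypothetical heuristic, not a general solution. It produces the requested output for the example, but a descendant selector such as ul li a leaves ul li behind, which is exactly the ambiguity described above.

```javascript
// Heuristic: drop "selector { ... }" where the selector is a single
// whitespace-free token (a:link, a:hover, .intro). Multi-word selectors
// are NOT handled, because they cannot be told apart from ordinary words.
function stripCssBlocks(text) {
  return text.replace(/\s*[^\s{}]+\s*\{[^}]*\}/g, '');
}

const example = 'aaaa bbbbb cccccc a:link {text-decoration: none;font-family: Verdana, Arial, Helvetica, sans-serif;color: #ffffff; } a:hover {text-decoration: underline; } .intro{font-size: 11px;font-weight: bold;line-height: 18px;color : #ffffff;padding-left: 25px;font-family: Verdana, Arial, Helvetica, sans-serif; } ddddd eeeeee';

console.log(stripCssBlocks(example));                      // aaaa bbbbb cccccc ddddd eeeeee
console.log(stripCssBlocks('ul li a {color: red} hello')); // ul li hello
```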

Limiting Characters to a specific width

I have run into this problem a few times. I have no problem limiting characters in a string to a specific number. However, some characters are longer than others so it wraps to another line or makes my DIV element wider and not uniform with the rest of my website.
For instance:
Literacy - Is it the Sa
Machu Picchu, Cuzco, Pe
are exactly the same number of characters (23, including spaces), but the Machu Picchu one is wider on screen.
Is there any way to give strings a uniform size based on their actual rendered width rather than on their number of characters? Surely someone has come up with a solution to this before, right?

First (obvious) solution: switch to a fixed-width font such as Courier, Lucida Console, Consolas, etc.
Second solution: use the GD library to write strings to a graphic object and measure that object.

You'd probably have to play with GD and imagefontwidth(): http://ar2.php.net/manual/es/function.imagefontwidth.php

Without writing an algorithm in PHP to limit characters based on "font-widths" for the specific font you are using, you can use a monospace font.
Alternatively, I'm sure a JavaScript solution could be written to test the widths, but I'm not sure how off the top of my head.
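The "font-widths" idea could be sketched roughly as follows (JavaScript here, but the same loop ports directly to PHP). The width table, the fallback width and the truncateByWidth name are hypothetical placeholders, not measurements of any real font, so this is only ever an approximation.

```javascript
// HYPOTHETICAL per-character widths in pixels; real values depend on the
// font, size, browser and operating system.
const SAMPLE_WIDTHS = { i: 3, l: 3, ' ': 4, m: 12, M: 13, W: 13 };
const FALLBACK_WIDTH = 8; // assumed width for any character not in the table

// Keep characters while their summed width still fits in maxWidth.
function truncateByWidth(text, maxWidth, widths = SAMPLE_WIDTHS, fallback = FALLBACK_WIDTH) {
  let total = 0;
  let out = '';
  for (const ch of text) {
    const w = widths[ch] ?? fallback;
    if (total + w > maxWidth) break; // the next character would overflow
    total += w;
    out += ch;
  }
  return out;
}

console.log(truncateByWidth('iiiiii', 24)); // iiiiii (6 x 3 = 18 px)
console.log(truncateByWidth('mmmmmm', 24)); // mm (2 x 12 = 24 px)
```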

This can't be done in PHP -- the best you can do is approximate. Different browsers and different operating systems render font widths differently for the same font. So even if you manually stored an array of character font widths for the font on your browser and os, it might not match up with others.
It's usually better to work your design around the possibility of different-width fonts than to try to force it.
That being said, this can be done perfectly (without approximation) in JavaScript, albeit with a bit of a hack. There are a number of possible methods, but here's one: render the full string in a div with the width you are looking for, then measure the div's height. If it is taller than one line could possibly be, start a loop that progressively removes the last word. Keep going until the div is only one line tall.
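A minimal sketch of that loop, assuming a browser and a div that already has the target CSS width. The function names and the element id are hypothetical, and the one-line height is taken by measuring a single character rather than by reading line-height (which can be the keyword "normal"). The measuring step is passed in as a callback, so the loop itself can be tried without a page.

```javascript
// Drop the last word until fitsOnOneLine(candidate) reports a single line.
// A lone word that is still too long is returned unchanged.
function truncateToOneLine(text, fitsOnOneLine) {
  const words = text.split(' ');
  while (words.length > 1 && !fitsOnOneLine(words.join(' '))) {
    words.pop();
  }
  return words.join(' ');
}

// Browser-only measurer: div is assumed to have the target width set in CSS.
function makeDomMeasurer(div) {
  div.textContent = 'X';
  const oneLineHeight = div.offsetHeight; // height of exactly one line
  return (candidate) => {
    div.textContent = candidate;
    return div.offsetHeight <= oneLineHeight;
  };
}

// In a page (hypothetical element id and variable):
// const div = document.getElementById('title');
// div.textContent = truncateToOneLine(fullTitle, makeDomMeasurer(div));
```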

Use CSS for formatting so they all have uniform widths - no jagged edges on the right side. Something like this:
p { text-align: justify; }

Had some fun working on this CSS + JS solution. It wasn't tested extensively (Firefox + IE7/8), but it should work OK...
<!DOCTYPE html>
<html>
<head>
<script type="text/javascript" src="jquery.js"></script>
<style>
.monospaced, .not-monospaced {
  font: normal normal 16px/16px Verdana; /* not a monospaced font */
  clear: both;
}
.monospaced span {
  float: left;
  display: block;
  width: 16px; /* every character gets the same 16px-wide cell */
  text-align: center;
}
</style>
<script>
$(document).ready(function () {
  $('.monospaced').each(function () {
    var monospace = $(this).html();
    // .trim() does not work in older IE http://api.jquery.com/jQuery.trim/ (view comments),
    // so trim with a regex; replace() returns a new string, so assign the result back
    monospace = monospace.replace(/(^[\s\xA0]+|[\s\xA0]+$)/g, '');
    var mono = monospace.split('');
    for (var i = 0; i < mono.length; i++) {
      // a span holding only a plain space would collapse to zero height
      if (mono[i] == ' ')
        mono[i] = '&nbsp;';
      mono[i] = '<span>' + mono[i] + '</span>';
    }
    $(this).html(mono.join(''));
  });
});
</script>
</head>
<body>
<div class="not-monospaced">
This is supposed to be monospaced...
</div>
<div class="not-monospaced">
mmmm mm mmmmmmmm mm mm mmmmmmmmmm...
</div>
<div class="monospaced">
This is supposed to be monospaced...
</div>
<div class="monospaced">
mmmm mm mmmmmmmm mm mm mmmmmmmmmm...
</div>
</body>
</html>

