CSS (/ Wordpress) White Space property behaving strangely - php

Despite setting the "white-space" property of my paragraphs to "normal" in CSS, some of my lines are breaking in strange places. Does anyone have an idea as to what may be the cause? Here is an example of this.
Thanks a lot

These lines are probably separated by different p elements.
word-break: break-all;
word-break: break-word;
word-break: keep-all;
overflow-wrap: break-word;
hyphens: manual;
hyphens: auto;
line-break: loose;
line-break: strict;
line-break: anywhere;
text-wrap: balance;
These are some styles you can apply to change break points in text.

Upon investigation, it turns out my WordPress install had been adding stray &nbsp; entities to my <p> elements. I found a solution in another thread, which involves adding the following code to my functions.php file:
function replace_content($content) {
    $content = htmlentities($content, null, 'utf-8');
    // Replace the &nbsp; entities produced above with ordinary spaces
    $content = str_replace("&nbsp;", " ", $content);
    $content = html_entity_decode($content);
    return $content;
}
add_filter('the_content','replace_content', 999999999);
[Credits to 'Bruno' https://wordpress.stackexchange.com/questions/29702/wordpress-automatically-adding-nbsp]
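A more direct variant, sketched under the assumption that the stray characters arrive either as literal U+00A0 characters or as &nbsp; / &#160; entities, replaces them without the encode/decode round trip. The function name strip_nbsp_sketch is hypothetical, and in WordPress it would still have to be registered with add_filter as shown above.

```php
<?php
// Hypothetical helper: swap non-breaking spaces for ordinary spaces.
function strip_nbsp_sketch(string $content): string
{
    // "\u{00A0}" is the non-breaking space, written with the PHP 7+ Unicode escape.
    return str_replace(["\u{00A0}", '&nbsp;', '&#160;'], ' ', $content);
}

echo strip_nbsp_sketch("one\u{00A0}two&nbsp;three"); // one two three
```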

Related

How to remove whitespace in inline styles?

I have a PHP script which generates an HTML email. To keep the size under Google's 102kB limit, I'm trying to squeeze as many unnecessary characters out of the code as possible.
I currently use Emogrifier to inline the CSS and then TinyMinify to minify.
The output from this still has spaces between properties and values in the inlined styles (eg style="color: #ffffff; font-weight: 16px")
I've developed the following regex to remove the extra whitespace, but it also affects the actual content too (eg this & that becomes this &that)
$out = preg_replace("/(;|:)\s([a-zA-Z0-9#])/", "$1$2", $newsletter);
How can I modify this regex so that it is limited to inline styles, or is there a better approach?
There is no bulletproof way to avoid matching the payload (style="" can appear anywhere) or actual CSS values (as in content: 'a: b'). Furthermore, also consider:
shortening the values: red is shorter than #f00, which is shorter than #ff0000
removing leading and trailing cruft, such as whitespace and semicolons
redesigning your HTML: e.g. using <ins> and <strong> can be effectively shorter than using inline CSS
One approach would be to match all inline style HTML attributes first and then operate on their content only, but you have to test for yourself how well this works:
$out = preg_replace_callback(
    '/( style=")([^"]*)("[ >])/',  // Find all appropriate HTML attributes
    function( $aMatch ) {  // Per match
        // Kill any amount of any kind of spaces after colon or semicolon only
        $sInner = preg_replace(
            '/([;:])\\s*([a-zA-Z0-9#])/',  // Escaping backslash in PHP string context
            '$1$2',
            $aMatch[2]  // Second sub match
        );
        // Kill any amount of leading and trailing semicolons and/or spaces
        $sInner = preg_replace(
            array( '/^\\s*;*\\s*/', '/\\s*;*\\s*$/' ),
            '',
            $sInner
        );
        return $aMatch[1]. $sInner. $aMatch[3];  // New HTML attribute
    },
    $newsletter
);
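To see why scoping the replacement to the attribute matters, here is a condensed sketch that runs the naive pattern from the question and an attribute-scoped callback on the same input. The sample $newsletter string is made up (the question gives no input), and the scoped pattern is a simplified variant of the one above, not a drop-in replacement.

```php
<?php
// Assumed sample input: one inline style plus text containing an entity.
$newsletter = '<p style="color: #ffffff; font-weight: bold; ">this &amp; that</p>';

// Naive pattern from the question: it also hits "&amp; that" in the text.
$naive = preg_replace('/(;|:)\s([a-zA-Z0-9#])/', '$1$2', $newsletter);

// Scoped variant: only the value inside style="..." is rewritten.
$scoped = preg_replace_callback(
    '/(\sstyle=")([^"]*)(")/',
    function (array $m): string {
        // Drop whitespace after colons and semicolons
        $inner = preg_replace('/([;:])\s+/', '$1', $m[2]);
        // Drop leading/trailing semicolons and whitespace
        return $m[1] . trim($inner, "; \t\n\r") . $m[3];
    },
    $newsletter
);

echo $naive, "\n";  // <p style="color:#ffffff;font-weight:bold; ">this &amp;that</p>
echo $scoped, "\n"; // <p style="color:#ffffff;font-weight:bold">this &amp; that</p>
```

The naive result shows the damaged text ("&amp;that"), while the scoped result leaves the content alone. It still shares the limits mentioned above: a literal style="..." inside text content would be rewritten too.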
You haven't provided sample input for us to use, but you have mentioned that you are dealing with html. This should sound alarm bells that using regex as a direct solution is ill-advised. When intending to process valid html, you should be using a dom parser to isolate the style attributes.
Why shouldn't you use regex to isolate the inline style declarations? Simply put: regex is "DOM-unaware". It doesn't know when it is inside or outside of a tag. (I'll provide a contrived monkeywrench in my demo to express this vulnerability.) Furthermore, using a DOM parser adds the benefit of correctly handling different types of quoting. While regex can be written to match/acknowledge balanced quoting, it adds considerable bloat (when executed well) and damages the readability and maintainability of your script.
In my demo, I'll show how spaces after colons, semicolons, and commas can be simply/accurately purged after isolating true inline style declarations. I've gone that little bit farther (since color hexcode condensing was mentioned on this page) to show how regex can be used to reduce some six character hexcodes to three characters.
Code:
$html = <<<HTML
<div style='font-family: "Times New Roman", Georgia, serif; background-color: #ffffff; '>
<p>Some text
<span class="ohyeah" style="font-weight: bold; color: #ff6633 !important; border: solid 1px grey;">
Monkeywrench: style="padding: 3px;"
</span>
&
<strong style="text-decoration: underline; ">Underlined</strong>
</p>
<h1 style="margin: 1px 2px 3px 4px;">Heading</h1>
<span style="background-image: url('images/not_a_hexcode_ffffff.png'); ">Text</span>
</div>
HTML;
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('*') as $node) {
    $style = $node->getAttribute('style');
    if ($style) {
        $patterns = ['~[:;,]\K\s+~', '~#\K([\da-f])\1([\da-f])\2([\da-f])\3~i'];
        $replaces = ['', '\1\2\3'];
        $node->setAttribute('style', preg_replace($patterns, $replaces, $style));
    }
}
$html = $dom->saveHtml();
echo $html;
Output:
<div style='font-family:"Times New Roman",Georgia,serif;background-color:#fff;'>
<p>Some text
<span class="ohyeah" style="font-weight:bold;color:#f63 !important;border:solid 1px grey;">
Monkeywrench: style="padding: 3px;"
</span>
&
<strong style="text-decoration:underline;">Underlined</strong>
</p>
<h1 style="margin:1px 2px 3px 4px;">Heading</h1>
<span style="background-image:url('images/not_a_hexcode_ffffff.png');">Text</span>
</div>
The above snippet uses \K in the patterns to avoid the use of lookaround and excess capture groups.
I am not writing a pattern that removes the space before !important because I have read some (not so recent) posts that some browsers express buggy behavior without the space.
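As a small illustration of the \K remark, the sketch below applies the same two patterns to a single made-up style string and compares the \K form with an equivalent lookbehind. The sample string and variable names are assumptions for the example only.

```php
<?php
// Assumed sample style value
$style = 'font-weight: bold; color: #ff6633 !important; background: #ffffff;';

// \K resets the start of the match, so only the whitespace after it is replaced.
$withK = preg_replace('~[:;,]\K\s+~', '', $style);

// Equivalent lookbehind form, shown for comparison.
$withLookbehind = preg_replace('~(?<=[:;,])\s+~', '', $style);

// Collapse hex codes of the form #aabbcc to #abc.
$short = preg_replace('~#\K([\da-f])\1([\da-f])\2([\da-f])\3~i', '\1\2\3', $withK);

echo $short; // font-weight:bold;color:#f63 !important;background:#fff;
```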

preg_replace HTML code in PHP

I want to remove strings like the one below from some HTML code:
<span style="font-size: 0.8px; letter-spacing: -0.8px; color: #ecf6f6">3</span>
so I came up with this regex:
$pattern = "/<span style=\"font-size: \\d(\\.\\d)?px; letter-spacing: -\\d(\\.\\d)?px; color: #\\w{6}\">\\w\\w?</span>/um";
However, the regex doesn't work. Can someone point out what I did wrong? I'm new to PHP.
When I tested with a simple regex, it worked, so the problem lies with this regex.
$str = $_POST["txtarea"];
$pattern = $_POST["regex"];
echo preg_replace($pattern, "", $str);
As much as I would advocate DOMDocument to do the job here, you would still need some regular expression down the line, so ...
The expression for the px numeric value can be simply [\d.-]+, since you're not trying to validate anything.
The contents of the span can be simplified to [^<]* (i.e. anything but an opening angle bracket):
$re = '/<span style="font-size: [\d.-]+px; letter-spacing: [\d.-]+px; color: #[0-9a-f]{3,6}">[^<]*<\/span>/';
echo preg_replace($re, '', $str);
Do not use regex for this problem. Use an HTML parser. Here is a solution in Python with BeautifulSoup, because I like this library for these tasks:
import re
from bs4 import BeautifulSoup

with open('Path/to/file', 'r') as content_file:
    content = content_file.read()
soup = BeautifulSoup(content, 'html.parser')
# Remove every span whose style attribute matches the pattern
style_re = re.compile(r"font-size: \d(\.\d)?px; letter-spacing: -\d(\.\d)?px; color: #\w{6}")
for span in soup.find_all('span', {'style': style_re}):
    span.extract()
with open('Path/to/file.modified', 'w') as output_file:
    output_file.write(str(soup))
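Since the question is about PHP, the same parser-based idea can be sketched with DOMDocument and DOMXPath. This is a minimal sketch, not a tested production routine: the sample $html and the names $stylePattern and $cleaned are assumptions, and the style check reuses the simplified expression from the regex answer.

```php
<?php
// Assumed sample input
$html = '<p>Hello<span style="font-size: 0.8px; letter-spacing: -0.8px; color: #ecf6f6">3</span> world</p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$stylePattern = '~^font-size: [\d.]+px; letter-spacing: -[\d.]+px; color: #[0-9a-f]{3,6}$~i';

// Copy the result list to an array so that removing nodes during the loop is safe.
foreach (iterator_to_array($xpath->query('//span[@style]')) as $span) {
    if (preg_match($stylePattern, $span->getAttribute('style'))) {
        $span->parentNode->removeChild($span);
    }
}

$cleaned = trim($dom->saveHTML());
echo $cleaned;
```

The regex is still there, but it only ever sees the value of a real style attribute, so text that merely looks like a span tag is left alone.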
You have a slash ( / ) in your closing tag (</span>).
You need to escape it or use a delimiter other than a slash.
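For illustration, here is the pattern from the question with ~ as the delimiter, so the slash in the closing tag needs no escaping. The sample $str is an assumption. Single quotes are used so that the backslashes do not have to be doubled.

```php
<?php
// Assumed sample input
$str = 'before<span style="font-size: 0.8px; letter-spacing: -0.8px; color: #ecf6f6">3</span>after';

// Same expression as in the question, but delimited with ~ instead of /
$pattern = '~<span style="font-size: \d(\.\d)?px; letter-spacing: -\d(\.\d)?px; color: #\w{6}">\w\w?</span>~u';

$result = preg_replace($pattern, '', $str);
echo $result; // beforeafter
```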

How to wordwrap to 3 lines?

Hi guys, I want to know how to word wrap to 3 lines. This is my code:
$string = "this is a long string that i want condensed isjdjfibfhkbfhdbfsdbfsdhfbdsfhsbdhfkhvbsdhvshfsdjkvfjhjfdsffsdfsdfdsfdshfbdkhsvfvdfdhfsdvf";
echo wordwrap($string,55,"<br />\n",TRUE);
.wordwrap {
word-wrap: break-word;
}
You could put the text in a div or other block element that has the style overflow: scroll.
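If the limit has to be enforced on the PHP side, one possible sketch is to wrap first and then keep only the first three lines. The helper name wrap_to_lines_sketch and the trailing "..." marker are assumptions, and $maxLines is assumed to be at least 1.

```php
<?php
// Hypothetical helper: wrap $text at $width columns and keep at most $maxLines lines.
function wrap_to_lines_sketch(string $text, int $width, int $maxLines): string
{
    // Wrap with plain newlines first so the lines are easy to count
    $lines = explode("\n", wordwrap($text, $width, "\n", true));
    if (count($lines) > $maxLines) {
        $lines = array_slice($lines, 0, $maxLines);
        $lines[$maxLines - 1] .= '...'; // mark that text was cut off
    }
    return implode("<br />\n", $lines);
}

echo wrap_to_lines_sketch('aaaa bbbb cccc dddd eeee', 4, 3);
```

With the string from the question, the call would be wrap_to_lines_sketch($string, 55, 3).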

Nested BBCode quotes: how to?

Hi, I'm using a pretty basic BBCode parser.
Could you guys help me with a problem of mine?
When, for example, this is written:
[quote=tanab][quote=1][code]a img{
text-decoration: none;
}[/code][/quote][/quote]
the output is this:
tanab said:
[quote=1]
a img{
text-decoration: none;
}
[/quote]
How would I go about fixing that? I'm really bad at the whole preg_replace stuff.
This is my parser:
function bbcode($input){
    $input = htmlentities($input);
    $search = array(
        '/\[b\](.*?)\[\/b\]/is',
        '/\[i\](.*?)\[\/i\]/is',
        '/\[img\](.*?)\[\/img\]/is',
        '/\[url=(.*?)\](.*?)\[\/url\]/is',
        '/\[code\](.*?)\[\/code\]/is',
        '/\[\*\](.*?)/is',
        '/\\t(.*?)/is',
        '/\[quote=(.*?)\](.*?)\[\/quote\]/is',
    );
    $replace = array(
        '<b>$1</b>',
        '<i>$1</i>',
        '<img src="$1">',
        '$2',
        '<div class="code">$1</div>',
        '<ul><li>$1</li></ul>',
        ' ',
        '<div class="quote"><div class="quote-writer">$1 said:</div><div class="quote-body">$2</div></div>',
    );
    return preg_replace($search,$replace,$input);
}
This could be adapted with a recursive regex:
'/\[quote=(.*?)\](((?R)|.*?)+)\[\/quote\]/is'
This will at least ensure that the output divs are not incorrectly nested, but you would still have to run the regex two or three times to catch all quote blocks.
Otherwise it would require rewriting your code with preg_replace_callback, which I cannot be bothered to showcase, since this has come up a few dozen times already (try the site search!) and has been solved before.
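One way to avoid the recursive pattern, sketched here as an assumption rather than the approach the answer had in mind, is to convert only the innermost quotes and repeat until none are left. The function name render_quotes_sketch is hypothetical. The replacement markup is copied from the parser in the question.

```php
<?php
// Hypothetical helper: render nested [quote=...] blocks from the inside out.
function render_quotes_sketch(string $input): string
{
    // The body may not contain another "[quote=" opener, so only innermost quotes match.
    $pattern = '/\[quote=([^\]]*)\]((?:(?!\[quote=).)*?)\[\/quote\]/is';
    $replace = '<div class="quote"><div class="quote-writer">$1 said:</div>'
             . '<div class="quote-body">$2</div></div>';
    do {
        // $count receives the number of replacements made in this pass
        $input = preg_replace($pattern, $replace, $input, -1, $count);
    } while ($count > 0);
    return $input;
}

$out = render_quotes_sketch('[quote=tanab][quote=1]hi[/quote][/quote]');
echo $out;
```

Each pass turns the deepest remaining quotes into divs, which exposes the next level for the following pass, so any nesting depth is handled without a fixed number of runs.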

How to extract image filename from style/background-image tag?

I found lots of posts about extracting a filename from an img tag, but none about extracting it from an inline CSS style attribute. Here's the source string:
<span style="width: 40px; height: 30px; background-image: url("./files/foo/bar.png");" class="bar">FOO</span>
What I want to get is bar.png.
I tried this:
$pattern = "/background-image: ?.png/";
preg_match($pattern, $string, $matches);
But this didn't work out.
Any help appreciated.
You need to read up about regular expressions.
"/background-image: ?.png/"
means "background-image:" followed optionally by a space, followed by any single character, followed (directly) by "png".
Exactly what you need depends on how much variation you need to allow for in the layout of the tag, but it will be something like
'/background-image\s*:\s*url\s*\(\s*"(?:[^"]*\/)?([^"\/]+)"/'
where all the "\s*" are optional spaces, "\(" is a literal opening parenthesis, and the capturing parentheses grab the final path segment, which contains no slash.
Generally, regexp is not a good tool for parsing HTML, but in this limited case it might be OK.
$string = '<span style="width: 40px; height: 30px; background-image: url("./files/foo/bar.png");" class="bar">FOO</span>';
// [^\1] inside a character class is not a backreference, so a lazy .+? is used instead
$pattern = '/background-image:\s*url\(\s*([\'"]*)(?P<file>.+?)\1\s*\)/i';
$matches = array();
if (preg_match($pattern, $string, $matches)) {
    echo basename($matches['file']); // bar.png
}
Something along these lines:
$style = "width: 40px; height: 30px; background-image: url('./files/foo/bar.png');";
preg_match("/url[\s]*\(([\'\"])([^\'\"]+)([\'\"])\)/", $style, $matches);
var_dump($matches[2]);
It won't work for filenames that contain ' or ". It basically matches anything between the parentheses of url() that is not ' or ".
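To end up with just bar.png rather than the whole path, the captured value can be passed through basename(). The sketch below is a slightly relaxed variant of the pattern above (the quotes are optional) and shares the same limitation for filenames that contain quotes or a closing parenthesis. The variable names are illustrative.

```php
<?php
$style = "width: 40px; height: 30px; background-image: url('./files/foo/bar.png');";

// Capture whatever sits between url( and ), ignoring optional quotes
if (preg_match('~url\(\s*[\'"]?([^\'")]+)[\'"]?\s*\)~i', $style, $m)) {
    $file = basename($m[1]); // strip the directory part
    echo $file; // bar.png
}
```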
