PHP wont recognise double line feed - php

I am running a RST to php conversion and am using preg_match.
this is the rst i am trying to identify:
An example of the **Horizon Mapping** dialog box is shown below. A
summary of the main features is given below.
.. figure:: horizon_mapping_dialog_horizons_tab.png
**Horizon Mapping** dialog box, *Horizons* tab
Some of the input values to the **Horizon Mapping** job can be changed
during a Workflow using the internal programming language, IPL. For
details, refer to the *IPL User Guide*.
and I am using this regex:
$match = preg_match("/.. figure:: (.*?)(\n{2}[ ]{3}.*\n)/s", $text, &$result);
however it is returning as false.
here is a link of the expression working on regex
http://regex101.com/r/oB3fW7.

Are you sure that the line break is \n, is doubt, use \R:
$match = preg_match("/.. figure:: (.*?)(\R{2}[ ]{3}.*\R)/s", $text, &$result);
\R stands for either \n, \r and \r\n

My instinct would be to do some troubleshooting around the s flag as well as the $result variable passed by reference. To achieve the same without any interference from dots and the return variable, can you please try this regex:
..[ ]figure::[ ]([^\r\n]*)(?:\n|\r\n){2}[ ]{3}[^\r\n]*\R
In code, please try exactly like this:
$regex = "~..[ ]figure::[ ]([^\r\n]*)(?:\n|\r\n){2}[ ]{3}[^\r\n]*\R~";
if(preg_match($regex,$text,$m)) echo "Success! </br>";
Finally:
If this does not working, you might have a weird Unicode line break that php is not catching. To debug, for each character of your string, iterate through all the string's characters
Iterate: foreach(str_split($text) as $c) {
Print the character: echo $c . " value = "
Print the value from this function: . _uniord($c) . "<br />"; }

Related

PHP regex replace based on \v character (vertical tab)

I have a character string like (ascii codes):
32,13,7,11,11,
"string1,blah;like: this...", 10,10, 32,32,32,32, 138,138, 32,32,32,32, 13,7, 11,11,
"string2/lorem/example-text...", 10,10, 32,32,32,32,32, 143,143,143,143,143
So the sequence is:
any characters, followed by my search string, followed by any
characters
11,11
the string I want to replace
any non-printable characters
If the block contains string1 then I need to replace the next string with something else. The second string always starts directly after the 11,11.
I'm using PHP.
I thought something like this, but I am not getting the correct result:
$updated = preg_replace("/(.*string1.*?\\v+)([[:print:]]+)([[:ascii:]]*)/mi", "$1"."new string"."$3", $orig);
This puts "new string" between the 10,10 and the 138,138 (and replaces the 32's).
Also tried \xb instead of \v.
Normally I test with regex101, but not sure how to do that with non-printable characters. Any suggestions from regex guru's?
Edit: the expected output is the sequence:
32,13,7,11,11,
"string1,blah;like: this...", 10,10, 32,32,32,32, 138,138, 32,32,32,32, 13,7, 11,11,
"new string", 10,10, 32,32,32,32,32, 143,143,143,143,143
Edit: sorry for the confusion regarding the ascii codes.
Here's a complete example:
<?php
$s = chr(32).chr(32).chr(7).chr(11).chr(11);
$s .= "string1,blah;like: this...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(138).chr(138);
$s .= chr(32).chr(32).chr(32).chr(32).chr(13).chr(7).chr(11).chr(11);
$s .= "string2/lorem/example-text...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(32).chr(143).chr(143).chr(143);
$result = preg_replace('/(.*string1.*?\v+)([[:print:]]+)([[:ascii:]]*)/mi', "$1"."new string"."$3", $s);
echo "\n------------------------\n";
echo $result;
echo "\n------------------------\n";
The text string2/lorem/example-text... should be replaced by new string.
My php-cli halted every time preg_match has reached char(138) and I don't know why.
I will throw my hat on this RegEx (note: \v matches a new-line | no flags are set):
"[^"]*"[^\x0b]+\v{2}"\K[^"]*
PHP code:
$source = chr(32).chr(13).chr(7).chr(11).chr(11)."\"string1,blah;like: this...\"".chr(10).
chr(10).chr(32).chr(32).chr(32).chr(32).chr(138).chr(138).chr(32).chr(32).chr(32).chr(32).
chr(13).chr(7).chr(11).chr(11)."\"string2/lorem/example-text...\"".chr(10).chr(10).chr(32).
chr(32).chr(32).chr(32).chr(32).chr(143).chr(143).chr(143).chr(143).chr(143);
echo preg_replace('~"[^"]*"[^\x0b]+\v{2}"\K[^"]*~', "new string", $source);
Beautiful output:
"string1,blah;like: this..."
��
"new string"
�����
Live demo
Solved. It was a combination of things:
/mis was needed (instead of /mi)
\x0b was needed (instead of \v)
Complete working example:
<?php
$s = chr(32).chr(32).chr(7).chr(11).chr(11);
$s .= "string1,blah;like: this...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(138).chr(138);
$s .= chr(32).chr(32).chr(32).chr(32).chr(13).chr(7).chr(11).chr(11);
$s .= "string2/lorem/example-text...". chr(10).chr(10).chr(32).chr(32).chr(32).chr(32).chr(32).chr(143).chr(143).chr(143);
$result = preg_replace('/(.*string1.*?\x0b+)([[:print:]]+)/mis', "$1"."new string", $s);
echo "\n------------------------\n";
echo $result;
echo "\n------------------------\n";
Thanks for everyone's suggestions. It put me on the right track.

Regex to select url except when = is directly infront of it

I'm trying to use a regex to find and replace all URLs in a forum system. This works but it also selects anything that is within bbcode. This shouldn't be happening.
My code is as follows:
<?php
function make_links_clickable($text){
return preg_replace('!(([^=](f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i', '$1', $text);
}
//$text = "https://www.mcgamerzone.com<br>http://www.mcgamerzone.com/help/support<br>Just text<br>http://www.google.com/<br><b>More text</b>";
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Unparsed text:</b><br>";
echo $text;
echo "<br><br>";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
?>
All urls that occur in bb-code are following up on a = character, meaning that I don't want anything that starts with = to be selected.
I basically have that working but this results in selecting 1 extra character in in front of the string that should be selected.
I'm not very familiar with regex. The final output of my code is this:
<b>Unparsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa<br>
<br>
<b>Parsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa
You can match and skip [url=...] like this:
\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)
See regex demo
That way, you will only match the URLs outside the [url=...] tag.
IDEONE demo:
function make_links_clickable($text){
return preg_replace('~\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)~iu', '$1', $text);
}
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
You can use a negative lookbehind (?<!=) instead of your negated class. It asserts that what is going to be matched isn't preceded by something.
Example

Keep all html whitespaces in php mysql

i want to know how to keep all whitespaces of a text area in php (for send to database), and then echo then back later. I want to do it like stackoverflow does, for codes, which is the best approach?
For now i using this:
$text = str_replace(' ', '&nbs p;', $text);
It keeps the ' ' whitespaces but i won't have tested it with mysql_real_escape and other "inject prevent" methods together.
For better understanding, i want to echo later from db something like:
function jack(){
var x = "blablabla";
}
Thanks for your time.
Code Blocks
If you're trying to just recreate code blocks like:
function test($param){
return TRUE;
}
Then you should be using <pre></pre> tags in your html:
<pre>
function test($param){
return TRUE;
}
</pre>
As plain html will only show one space even if multiple spaces/newlines/tabs are present. Inside of pre tags spaces will be shown as is.
At the moment your html will look something like this:
function test($param){
return TRUE;
}
Which I would suggest isn't desirable...
Escaping
When you use mysql_real_escape you will convert newlines to plain text \n or \r\n. This means that your code would output something like:
function test($param){\n return TRUE;\n}
OR
<pre>function test($param){\n return TRUE;\n}</pre>
To get around this you have to replace the \n or \r\n strings to newline characters.
Assuming that you're going to use pre tags:
echo preg_replace('#(\\\r\\\n|\\\n)#', "\n", $escapedString);
If you want to switch to html line breaks instead you'd have to switch "\n" to <br />. If this were the case you'd also want to switch out space characters with - I suggest using the pre tags.
try this, works excellently
$string = nl2br(str_replace(" ", " ", $string));
echo "$string";

preg_replace between > and <

I have this:
<div> 16</div>
and I want this:
<div><span>16</span></div>
Currently, this is the only way I can make it work:
preg_replace("/(\D)(16)(\D)/", "$1<span>$2</span>$3", "<div> 16</div>")
If I leave off the $3, I get:
<div><span>16</span>/div>
Not quite sure what your after, but the following is more generic:
$value = "<div> 16 </div>";
echo(preg_replace('%(<(\D+)[^>]*>)\s*([^\s]*)\s*(</\2>)%', '\1<span>\3</span>\4', $value));
Which would result in:
<div><span>16</span></div>
Even if the value were:
<p> 16 </div>
It would result in:
<p><span>16</span></p>
I think you meant to say you're using the following:
print preg_replace("/(\\D+)(16)(\\D+)/", "$1<span>$2</span>$3", "<div>16</div>");
There's nothing wrong with that. $3 is going to contain everything matched in the second (\D+) group. If you leave it off, obviously it's not going to magically appear.
Note that your code in the question had some errors:
You need to escape your \'s in a string.
You need to use \D+ to match multiple characters.
You have a space before 16 in your string, but you're not taking this into account in your regex. I removed the space, but if you want to allow for it you should use \s* to match any number of whitespace characters.
The order of your parameters was incorrect.
Try following -
$str = "<div class=\"number\"> 16</div>";
$formatted_str = preg_replace("/(<div\b[^>]*>)(.*?)<\/div>/i", "$1<span>$2</span></div>", $s);
echo $formatted_str;
This is what ended up working the best:
preg_replace('/(<.*>)\s*('. $page . ')\s*(<.*>)/i', "$1" . '<span class="curPage">' . "$2" . '</span>' . "$3", $pagination)
What I found was that I didn't know for sure what tags would precede or follow the page number.

nl2br for paragraphs

$variable = 'Afrikaans
Shqip - Albanian
Euskara - Basque';
How do I convert each new line to paragraph?
$variable should become:
<p>Afrikaans</p>
<p>Shqip - Albanian</p>
<p>Euskara - Basque</p>
Try this:
$variable = str_replace("\n", "</p>\n<p>", '<p>'.$variable.'</p>');
The following should do the trick :
$variable = '<p>' . str_replace("\n", "</p><p>", $variable) . '</p>';
Be careful, with the other proposals, some line breaks are not catch.
This function works on Windows, Linux or MacOS :
function nl2p($txt){
return str_replace(["\r\n", "\n\r", "\n", "\r"], '</p><p>', '<p>' . $txt . '</p>');
}
$array = explode("\n", $variable);
$newVariable = '<p>'.implode('</p><p>', $array).'</p>'
<?php
$variable = 'Afrikaans
Shqip - Albanian
Euskara - Basque';
$prep0 = str_replace(array("\r\n" , "\n\r") , "\n" , $variable);
$prep1 = str_replace("\r" , "\n" , $prep0);
$prep2 = preg_replace(array('/\n\s+/' , '/\s+\n/') , "\n" , trim($prep1));
$result = '<p>'.str_replace("\n", "</p>\n<p>", $prep2).'</p>';
echo $result;
/*
<p>Afrikaans</p>
<p>Shqip - Albanian</p>
<p>Euskara - Basque</p>
*/
?>
Explanation:
$prep0 and $prep1: Make sure each line ends with \n.
$prep2: Remove redundant whitespace. Keep linebreaks.
$result: Add p tags.
If you don't include $prep0, $prep1 and $prep2, $result will look like this:
<p>Afrikaans
</p>
<p>Shqip - Albanian
</p>
<p>Euskara - Basque</p>
Not very nice, I think.
Also, don't use preg_replace unless you have to. In most cases, str_replace is faster (at least according to my experience). See the comments below for more information.
Try:
$variable = 'Afrikaans
Shqip - Albanian
Euskara - Basque';
$result = preg_replace("/\r\n/", "<p>$1</p>", $variable);
echo $result;
I know this is a very old thread, but I want to highlight, that suggested solutions can have some issues in HTML world:
They do not check whether there is already a p tag around respective paragraph. This can result in extra paragraphs. At least some browsers will then show this as extra paragraphs, meaning <p>line1<p>line2</p>line3</p> will result in 3 paragraphs, which may not be the intention.
In fact, there is a bunch of tags, that are not expected inside of p, as per the spec of phrasing content. Or rather there is a limited set of tags, tha can be.
They will change new lines inside tags, where you want to preserve new lines as is. pre and textarea are the ones, where you could generally want that. code, samp, kbd and var are an example of other common values, but technically it can be any tag with white-space CSS property set to either pre, pre-wrap, pre-line or break-spaces.
They usually only check for \r\n or just \r or \n, while there are actually more symbols, that would mean new line, and they also have respective HTML entities, which can easily occur in HTML string.
To "combat" these flaws, at least, to an extent, I've just released a nl2tag library, which can also "convert" new lines to <li> items and has an "improved" nl2br logic (mostly for the sake of whitespace retention).
It's far from perfect (check the readme for limitations), but should cover you in case of relatively simple HTML string.

Categories