Regex to insert line breaks after a specific series of tokens - php

I am trying to convert a multi-line string to insert line breaks \n after two final closing parentheses occur in series, i.e. )) becomes ))\n.
There is also likely to be a ')' just before the '))', effectively creating ')))'.
These two or three parentheses may or may not be "spread out" by indeterminate amounts of whitespace, e.g. )), ) ), ))), ) ) ), )) ), ) )) and so on.
I've tried the following:
//Example message
$message = '(item (name 286) (Index 31) (Image "item001") (class money coin) (code 4 110 0) (country 2) (plural 1) (buy 0))
(item(name 7904)(Index 7904) (specialty (Dex 10)(defense 55)(hp 3500)(dodge 71) ))
(item(name 7905)(Index 7905)(country 2)
(level 80)(specialty(hp 3400) ) )
(item(name 7906)(Index 7906)(level 80) (specialty(Str 10)) ) ';
// Converts all lines into one line
$message = preg_replace("/[\r\n]*/","",$message);
// Replace '))' with '))\n' - doesn't work.
$message = preg_replace("/[)s+)]s*/","\n",$message);
$InititemLines = explode("\n", $message);
for ($line = 0; $line < count($InititemLines); $line++) {
    echo "Line #<b>{$line}</b> : " . $InititemLines[$line] . "<br />\n";
}
To convert all lines into one, I used:
$message = preg_replace("/[\r\n]*/","",$message);
Then, to replace )) with ))\n, I tried the following (but it doesn't work):
$message = preg_replace("/[)s+)]s*/","))\n",$message);
I want the output to be like this:
Line #0: (item (name 286) (Index 31) (Image "item001") (class money coin) (code 4 110 0) (country 2) (plural 1) (buy 0))
Line #1: (item(name 7904)(Index 7904) (specialty (Dex 10)(defense 55)(hp 3500)(dodge 71) ))
Line #2: (item(name 7905)(Index 7905)(country 2)(level 80)(specialty(hp 3400) ) )
Line #3: (item(name 7906)(Index 7906)(level 80) (specialty(Str 10)) )

This will insert a line break after the "))" at the end of ALL lines, whether a line ends in ))) or )):
$message = preg_replace('/(\)?\s*\)\s*\))/', "$1\n", $message);
This regex means:
find an optional single closing parenthesis ')'. We escape it as '\)' because parentheses have a special meaning in regex. It's optional because of the trailing '?',
followed by 0 or more whitespace characters, denoted by '\s' (0 or more is denoted by '*'),
followed by another ')',
followed by another 0 or more whitespace characters,
followed by another ')'.
Then we surround the whole sequence with a pair of '(' and ')', meaning "group this section, so we can reference it later" as $1. We do this so we can replace the match with itself followed by \n, i.e. ))\n.
And then a more elegant solution might be (depending on your requirements) to subsequently also strip any excess whitespace from before every ')':
$message = preg_replace('/\s+\)/', ')', $message);
This regex means:
find 1 or more whitespace characters,
followed by a closing ')',
and replace the whole match with just ')'.
(In your example, I believe this will strip all the excess whitespace, while leaving the spaces in your strings alone.)

Thank you, I got it working fine with:
$message = preg_replace("/(\s*\)\s*\)?\s*\))/", "$1\n", $message);
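For reference, the whole approach can be put together as a runnable sketch. splitItems() is a hypothetical helper name; it joins the lines, applies a variant of the pattern that captures the optional third ')' as well (so no parenthesis is lost), and drops the empty leftover produced by the trailing space in the sample. It assumes that runs of two or more ')' only occur at the end of an item; a nested "))" in the middle of an item would also get a line break.

```php
<?php
// Sketch: put each top-level "(item ...)" on its own line.
// Assumption: only the end of an item produces two or three ')' in a row.
function splitItems(string $message): array
{
    // Join all lines into one, as in the question (\R = any newline sequence).
    $oneLine = preg_replace('/\R+/', '', $message);

    // An optional ')', then two more ')', with any whitespace between them.
    // The whole run is captured and kept; only "\n" is added after it.
    $withBreaks = preg_replace('/(\)?\s*\)\s*\))/', '$1' . "\n", $oneLine);

    // Split, trim each line and drop empty leftovers (e.g. a trailing space).
    $lines = array_map('trim', explode("\n", $withBreaks));
    return array_values(array_filter($lines, fn (string $line): bool => $line !== ''));
}

$message = '(item (name 286) (Index 31) (Image "item001") (class money coin) (code 4 110 0) (country 2) (plural 1) (buy 0))
(item(name 7904)(Index 7904) (specialty (Dex 10)(defense 55)(hp 3500)(dodge 71) ))
(item(name 7905)(Index 7905)(country 2)
(level 80)(specialty(hp 3400) ) )
(item(name 7906)(Index 7906)(level 80) (specialty(Str 10)) ) ';

foreach (splitItems($message) as $i => $line) {
    echo "Line #{$i}: {$line}\n";
}
```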

Related

Limited number of break lines in text in PHP

Assume a string like this:
Section 1
Section 2 (after 1 line break)
Section 3 (after 2 line breaks)
Section 4 (after 4 line breaks)
Section 5 (after 1 line break)
My intention is to allow only N line breaks and replace the rest with a space in PHP. For example, if N=3, the text above would be output like this:
Section 1
Section 2 (after 1 line break)
Section 3 (after 2 line breaks) Section 4 (after 4 line breaks) Section 5 (after 1 line break)
My code is below but I am looking for a better way:
function limitedBreaks($str = '', $n = 5)
{
    $str = nl2br($str);
    $chars = str_split($str);
    $counter = 0;
    foreach ($chars as $key => $char) {
        if ($char == "<br/>") {
            if ($counter > $n) {
                $chars[$key] = ' ';
            } else {
                $counter += 1;
            }
        }
    }
    return implode(' ', $chars);
}
This really is a job better suited for regex than exploding and iterating.
$str = preg_replace('~\v+~', "\n\n", $str);
Here \v+ matches one or more vertical whitespace characters (line breaks), which are substituted with two \n newlines. You will get a standard one-line gap between your content (non-linebreak-only) lines. This results in:
Section 1
Section 2 (after 1 line break)
Section 3 (after 2 line breaks)
Section 4 (after 4 line breaks)
Section 5 (after 1 line break)
If you only want to target more than N breaks, use e.g. \v{4,} (assuming the EOL is \n) to normalize all runs of 4 or more newlines. If your file uses Windows EOLs (\r\n), be aware that you need to double the count, or use (\r\n|\n){4,}, since \r and \n each count as one match of \v.
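To illustrate the CRLF caveat, here is a small sketch (the helper name hasLongGap is hypothetical) that tests the same 4-break threshold with \v{4,}, with (\r\n|\n){4,}, and with PCRE's \R escape. \R is an alternative not used in the answer: it matches any newline sequence and treats \r\n as a single line break, so it counts breaks the same way on Unix and Windows files.

```php
<?php
// Sketch: how different patterns count "4 or more line breaks".
// \v counts every vertical whitespace character, so one "\r\n" counts twice.
// (\r\n|\n) and \R both treat "\r\n" as a single line break.
function hasLongGap(string $text, string $pattern): bool
{
    return preg_match($pattern, $text) === 1;
}

$unix    = "Section 3\n\n\n\nSection 4";   // 4 LF line breaks
$windows = "Section 1\r\n\r\nSection 2";   // only 2 CRLF line breaks

var_dump(hasLongGap($unix, '~\v{4,}~'));           // bool(true)
var_dump(hasLongGap($windows, '~\v{4,}~'));        // bool(true): \r and \n counted separately
var_dump(hasLongGap($windows, '~(\r\n|\n){4,}~')); // bool(false)
var_dump(hasLongGap($windows, '~\R{4,}~'));        // bool(false)
```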
That's the basic idea. Seeing as you want to replace 4+ newlines with horizontal space, merging the lines instead of normalizing line break quantity, you would simply:
$str = preg_replace('~(\r\n|\n){4,}~', " ", $str);
This would give you:
Section 1
Section 2 (after 1 line break)
Section 3 (after 2 line breaks) Section 4 (after 4 line breaks)
Section 5 (after 1 line break)
Here the gap with 4 or more EOLs was substituted with space and merged with the preceding line. The rest of the "acceptably spaced" lines are still in their places.
However it seems that you want to merge all subsequent rows into a single line after any gap of 4+ EOLs. Is that really the requirement? The first example I posted is a fairly standard operation for normalizing content with irregular linebreaks; especially "level items" like section headings!
OP: thanks for explaining your use case, it makes sense. This can be done with regex, without loops:
$str = preg_replace_callback('~(\r\n|\n){4,}(?<therest>.+)~s', function($match) {
    return ' ' . preg_replace('~\v+~', ' ', $match['therest']);
}, $str);
Here we capture (as named group therest) all the content that follows four or more linebreaks using preg_replace_callback, and inside the callback we preg_replace all vertical spaces in "the rest" with a single horizontal space. This results in:
Section 1
Section 2 (after 1 line break)
Section 3 (after 2 line breaks) Section 4 (after 4 line breaks) Section 5 (after 1 line break) Section 17 after a hundred bundred line breaks"
For convenience, here's the regex above wrapped in a function:
function fuse_breaks(string $str): string {
    $str = preg_replace_callback('~(\r\n|\n){4,}(?<therest>.+)~s', function($match) {
        return ' ' . preg_replace('~\v+~', ' ', $match['therest']);
    }, $str);
    return $str;
}
// Usage:
$fused_text = fuse_breaks($source_text);
Your example with N=3 shows either 4 line breaks (if the empty lines count) or 2 line breaks.
To make things clearer, here is a function limitedLines, which reduces the text to a specific number of lines:
$str = "
line 1
line 2
line 3
line 4
line 5
line 6
";
function limitedLines(string $str = '', int $maxLines = 5): string {
    $maxLines = $maxLines < 1 ? 1 : $maxLines;
    $arr = explode("\n", trim($str));
    while (count($arr) > $maxLines) {
        $last = array_pop($arr);
        $arr[array_key_last($arr)] .= ' ' . trim($last);
    }
    return implode("\n", $arr);
}
$result = limitedLines($str, 3);
print_r($result);
This will print:
line 1
line 2
line 3 line 4 line 5 line 6

preg_replace automatically adds a single quote in Thai text

My intention is to break runs of 50 characters that do not contain spaces with a \n.
My code is like this:
$string= preg_replace('/([^\s<>]{50})(?=[^\s])/u', "$1\n$2", '[ติดต่อนัดหมายชมโครงการหรือสอบถามข้อมูลเพิ่มเติมที่]');
And the result adds \n and an additional ' on the new line:
[ติดต่อนัดหมายชมโครงการหรือสอบถามข้อมูลเพิ่มเติมที\n
่]
But with an arbitrary value:
$string= preg_replace('/([^\s<>]{50})(?=[^\s])/u', "$1\n$2", '[ติดต่อนัดหมายชมโครงการหรือสอบถามข้อมูลติดต่อนัดัด]');
The result has no ':
[ติดต่อนัดหมายชมโครงการหรือสอบถามข้อมูลติดต่อนัดัด\n
]
Why does it add an additional ' on the new line?
How can I avoid it?
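The ' on the new line is most likely not an added quote: it appears to be U+0E48 THAI CHARACTER MAI EK (่), a combining tone mark that belongs to the preceding ที. With the /u modifier, {50} counts Unicode code points, so the cut can land between a base letter and its combining marks; the orphaned mark then renders on its own and looks like an apostrophe. In the second string the cut happens to fall just before ], which is not a mark. A possible fix, sketched here with a hypothetical breakLongRuns() helper, counts extended grapheme clusters with PCRE's \X instead of code points. The width of 10 in the usage line is only chosen so that the short sample actually gets a break.

```php
<?php
// Sketch: break runs of non-space text every $width grapheme clusters instead of
// every $width code points. \X matches one extended grapheme cluster, i.e. a base
// character together with any combining marks (such as Thai vowel and tone marks).
function breakLongRuns(string $text, int $width = 50): string
{
    // Each repetition: one cluster that does not start with whitespace, '<' or '>'.
    $pattern = '/((?:(?![\s<>])\X){' . $width . '})(?=\S)/u';
    return preg_replace($pattern, '$1' . "\n", $text);
}

$input = '[ติดต่อนัดหมายชมโครงการหรือสอบถามข้อมูลเพิ่มเติมที่]';

echo breakLongRuns($input, 10), "\n";
```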

How to remove part of string in php?

I have this string
NOTEBOOK > ABC
TABLET > DFG
I want to remove everything after '>' including '>'
I tried this
$category = substr($categoryGet,0,strrpos($categoryGet.">",">"));
No result so far
You can use preg_replace
$category = preg_replace('/[ \t]*>[^>\n]*$/m', '', $categoryGet);
The regular expression matches optional spaces, then >, followed by any characters other than > and newline until the end of the line. Excluding \n keeps each match on a single line. The m modifier makes $ match the end of each line in the string.
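A self-contained sketch of the preg_replace approach applied to the sample input. It assumes the goal is to drop the last '>' on each line plus everything after it (and the space before it); stripAfterArrow is a hypothetical name. The character class excludes \n because [^>] on its own also matches line breaks, and could then swallow a following line that contains no '>'.

```php
<?php
// Sketch: remove the last '>' and everything after it on each line.
// [^>\n]* cannot cross a line break; [ \t]* also drops the space before '>'.
function stripAfterArrow(string $text): string
{
    return preg_replace('/[ \t]*>[^>\n]*$/m', '', $text);
}

$categoryGet = "NOTEBOOK > ABC\nTABLET > DFG";
echo stripAfterArrow($categoryGet), "\n"; // prints NOTEBOOK and TABLET on separate lines
```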
$str="NOTEBOOK > ABC
TABLET > DFG";
$x = explode("\n", $str); // break into lines
foreach ($x as $y) {
    $k = explode('>', $y);
    echo trim($k[0]) . "\n"; // or whatever form you want the output to be
}

Altering .htaccess with PHP - removing a rewrite rule

I am using PHP to add/remove static pages. Once a page has been deleted, I want to remove its rule from the .htaccess file; however, what I've tried throws an error:
Warning: preg_replace() [function.preg-replace]: Unknown modifier '' in ...
The code:
$page_name = $row['page_name']; // Example: help
preg_replace('/RewriteRule ' . preg_quote('^' . $page_name . '/?$ page.php?mode=') . '.*/i', '', $htaccess);
This is an example of what it should fully remove:
RewriteRule ^help/?$ page.php?mode=help
You have to escape the expression delimiter by passing it to preg_quote as the second argument.
$htaccess = preg_replace('/RewriteRule ' . preg_quote('^' . $page_name . '/?$ page.php?mode=', '/') . '.*/i', '', $htaccess);
Otherwise your / won't be escaped. As stated in the documentation, "the special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -", and / is not one of them; it is only escaped when you pass it as the delimiter.
Use it like this, with ~ as the delimiter so that / needs no escaping:
preg_replace("~pattern~msi", "replacement", $subject);
Also, it is good practice to process the file line by line rather than changing the whole text at once,
so:
foreach (file('.htaccess') as $line)
{
    // replace in each line
}
Then output everything, and store a copy of the old .htaccess ...
Arsen
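Combining both answers, here is a sketch that escapes the rule with preg_quote() (passing the delimiter as the second argument) and filters the file line by line. removeRewriteRule() is a hypothetical helper that works on a string; reading the real file, keeping a backup and writing it back are left to the caller and only mentioned in a comment. Like the original .* version, the pattern matches the rule up to mode=, so the mode value itself is not checked.

```php
<?php
// Sketch: remove the RewriteRule for one page, processing the content line by line.
// preg_quote() gets the delimiter '~' as its second argument, so every special
// character in the rule, and the delimiter itself, is escaped.
function removeRewriteRule(string $htaccess, string $pageName): string
{
    $pattern = '~^\s*RewriteRule\s+'
        . preg_quote('^' . $pageName . '/?$ page.php?mode=', '~')
        . '~i';

    $kept = [];
    foreach (preg_split('/\R/', $htaccess) as $line) {
        if (preg_match($pattern, $line) !== 1) {
            $kept[] = $line; // keep every line that is not the rule for $pageName
        }
    }
    return implode("\n", $kept);
}

// For the real file, the caller could e.g. copy('.htaccess', '.htaccess.bak')
// and then file_put_contents('.htaccess', ...) with the result.
$htaccess = implode("\n", [
    'RewriteEngine On',
    'RewriteRule ^help/?$ page.php?mode=help',
    'RewriteRule ^about/?$ page.php?mode=about',
]);
echo removeRewriteRule($htaccess, 'help'), "\n";
```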

Convert Single Line Comments To Block Comments

I need to convert single line comments (//...) to block comments (/*...*/). I have nearly accomplished this in the following code; however, I need the function to skip any single line comment that is already inside a block comment. Currently it matches every single line comment, even one that is inside a block comment.
## Convert Single Line Comment to Block Comments
function singleLineComments( &$output ) {
    // create_function() was removed in PHP 8.0; an anonymous function does the same job
    $output = preg_replace_callback('#//(.*)#m',
        function ($match) {
            return "/* " . trim($match[1]) . " */";
        }, $output
    );
}
As already mentioned, "//..." can occur inside block comments and string literals. So if you create a small "parser" with the aid of a bit of regex trickery, you could first match either of those things (string literals or block comments), and only after that test whether "//..." is present.
Here's a small demo:
$code ='A
B
// okay!
/*
C
D
// ignore me E F G
H
*/
I
// yes!
K
L = "foo // bar // string";
done // one more!';
$regex = '~
("(?:\\\\.|[^\r\n\\\\"])*+") # group 1: matches double quoted string literals
|
(/\*[\s\S]*?\*/) # group 2: matches multi-line comment blocks
|
(//[^\r\n]*+) # group 3: matches single line comments
~x';
preg_match_all($regex, $code, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
foreach ($matches as $m) {
    if (isset($m[3])) {
        echo "replace the string '{$m[3][0]}' starting at offset: {$m[3][1]}\n";
    }
}
Which produces the following output:
replace the string '// okay!' starting at offset: 6
replace the string '// yes!' starting at offset: 56
replace the string '// one more!' starting at offset: 102
Of course, there are more string literals possible in PHP, but you get my drift, I presume.
HTH.
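The demo above only reports offsets. Here is a sketch of the same idea as an actual replacement, using preg_replace_callback(): string literals and block comments are matched first and returned untouched, and only a // comment found outside them is rewritten. blockifyLineComments() is a hypothetical name, and like the demo it only knows about double-quoted strings, not single-quoted strings or heredocs.

```php
<?php
// Sketch: convert // comments to /* */ comments, skipping strings and block comments.
function blockifyLineComments(string $code): string
{
    $regex = '~("(?:\\\\.|[^\r\n\\\\"])*+")'  // group 1: double-quoted string literal
           . '|(/\*[\s\S]*?\*/)'              // group 2: block comment
           . '|//([^\r\n]*+)~';               // group 3: text of a single-line comment

    return preg_replace_callback($regex, function (array $m): string {
        // Group 3 is the last group, so it is only present in $m when it matched.
        if (isset($m[3])) {
            return '/* ' . trim($m[3]) . ' */';
        }
        return $m[0]; // string literal or block comment: keep as-is
    }, $code);
}

echo blockifyLineComments('$x = 1; // set x'), "\n"; // prints: $x = 1; /* set x */
```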
You could try a negative lookbehind: http://www.regular-expressions.info/lookaround.html
## Convert Single Line Comment to Block Comments
function sinlgeLineComments( &$output ) {
    $output = preg_replace_callback('#^((?:(?!/\*).)*?)//(.*)#m',
        function ($match) {
            // $match[1] is the code before the //, $match[2] is the comment text
            return $match[1] . "/* " . trim($match[2]) . " */";
        }, $output
    );
}
However, I worry about strings with // in them, like:
$x = "some string // with slashes";
which would get converted.
If your source file is PHP, you could use the tokenizer extension to parse the file with better precision:
http://php.net/manual/en/tokenizer.examples.php
Edit:
I forgot that a lookbehind must be fixed-length; you can get around that by nesting a negative lookahead inside a repeated group instead, as in the expression above. It should work now. I tested it with:
$foo = "// this is foo";
sinlgeLineComments($foo);
echo $foo . "\n";
$foo2 = "/* something // this is foo2 */";
sinlgeLineComments($foo2);
echo $foo2 . "\n";
$foo3 = "the quick brown fox";
sinlgeLineComments($foo3);
echo $foo3 . "\n";
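Following the tokenizer suggestion, here is a sketch built on token_get_all(), assuming the input is PHP source and the tokenizer extension is available. blockifyWithTokenizer() is a hypothetical name. Because PHP itself classifies strings and /* */ comments, only real // comments are rewritten; # comments are left alone. A // comment whose text contains */ would still need extra handling.

```php
<?php
// Sketch: use PHP's own tokenizer so strings, heredocs and block comments
// are never mistaken for // comments.
function blockifyWithTokenizer(string $source): string
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ';' are returned as plain strings.
        if (!is_array($token)) {
            $out .= $token;
            continue;
        }
        [$id, $text] = $token;
        if ($id === T_COMMENT && strncmp($text, '//', 2) === 0) {
            // Before PHP 8.0 a // comment token also contains the trailing newline.
            $comment = rtrim($text, "\r\n");
            $newline = substr($text, strlen($comment));
            $out .= '/* ' . trim(substr($comment, 2)) . ' */' . $newline;
        } else {
            $out .= $text;
        }
    }
    return $out;
}
```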
