How to strip out extra asterisks in a string using preg_replace() - php

I know how to strip out extra spaces, dashes, and periods using preg_replace(), but I need to know what format below is correct for stripping out extra asterisks in a string.
These lines of code work for stripping out extra spaces, dashes, and periods:
// Strips out extra spaces
$string = preg_replace('/\s\s+/', ' ',$string);
// Strips out extra dashes
$string = preg_replace('/-+/', '-', $string);
// Strips out extra periods
$string = preg_replace('/\.+/', '.', $string);
Which of the following is correct for stripping out extra asterisks?
// Version 1: Strips out extra asterisks
$string = preg_replace('/\*+/', '*', $string);
// Version 2: Strips out extra asterisks
$string = preg_replace('/*+/', '*', $string);
Thank you in advance.
By the way, is there a list somewhere that shows all the characters that need to be escaped with a backslash when using regular expressions in PHP?

Try this:
$string = preg_replace('/\*{2,}/', '*', $string);
This will replace any instances of multiple asterisks next to one another with one asterisk.
Or, if you wanted to just get rid of all asterisks:
$string = preg_replace('/[\*]+/', '', $string);
It's worth noting that * is a special character in regular expressions; so, you must escape it with a backslash.
Also, here's a good regex reference:
http://www.regular-expressions.info/reference.html
Here's how you could combine multiple character replacements into one regex:
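$string = preg_replace('/([*.])\1+/', '$1', $string);
This collapses runs of repeated asterisks as well as runs of repeated periods, each down to a single character. The backreference \1 makes sure a run only consists of repeats of the same character, so a mixed sequence such as *.* is left untouched.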
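As a quick check of the escaped pattern, and as a partial answer to the question about which characters need escaping, here is a minimal sketch. It relies on PHP's built-in preg_quote(), which escapes regex metacharacters for you; the sample string and variable names are made up for illustration.

```php
<?php
// Collapse runs of two or more asterisks into a single asterisk.
$input = 'a***b**c*d';
$collapsed = preg_replace('/\*{2,}/', '*', $input);
echo $collapsed, "\n"; // a*b*c*d

// preg_quote() escapes regex metacharacters, so a literal character can be
// dropped into a pattern without remembering whether it is special.
$char = '*';
$pattern = '/(?:' . preg_quote($char, '/') . '){2,}/';
echo $pattern, "\n"; // /(?:\*){2,}/
echo preg_replace($pattern, $char, $input), "\n"; // a*b*c*d
```

Passing the delimiter as the second argument to preg_quote() makes it escape that character as well.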

replacing spaces and newlines with commas

In my preg_replace RegEx here
$string = preg_replace('~[^[:alnum:],]*,[^[:alnum:]]*~', ',', $string);
I've been trying to separate words from each other with commas, and it worked. But then I tried it on strings like
x
y
z
and
x y z
to replace the whitespace and newlines with commas, so I tried using [[:space:]] and [[:blank:]], but they seem to handle spaces rather than newlines.
How do I handle the newlines? I tried using my old replacement /[\s,]+/ for newlines and whitespace, but it still had no effect. I know I can run two replacements like
$string = preg_replace('/[\s,]+/', ',', $string);
$string = preg_replace('~[^[:alnum:],]*,[^[:alnum:]]*~', ',', $string);
but I'd prefer to merge them into one regex for performance.
Try the following:
$string = preg_replace('~[^[:alnum:],]*,[^[:alnum:]]*|[\s,]+~', ',', $string);
It replaces each run of whitespace (including newlines) and commas with a single comma, while still applying your original comma cleanup.
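The merged pattern can be tried on the inputs from the question. This is only a sketch: the toCommaList() helper name is hypothetical, and the trim() call is an added assumption so that leading or trailing whitespace does not turn into a stray comma.

```php
<?php
// Hypothetical helper: turn any mix of whitespace, newlines and commas
// between words into a single comma.
function toCommaList(string $text): string
{
    $pattern = '~[^[:alnum:],]*,[^[:alnum:]]*|[\s,]+~';
    // trim() first so surrounding whitespace does not become a comma.
    return preg_replace($pattern, ',', trim($text));
}

echo toCommaList("x\ny\nz"), "\n";   // x,y,z
echo toCommaList("x y z"), "\n";     // x,y,z
echo toCommaList("x , y,  z"), "\n"; // x,y,z
```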

PHP - Removing Brackets and Special Characters

I'm looking to modify a PHP string so I can use it as an anchor tag.
I used the method found here: Remove all special characters from a string
It worked well to remove ampersands from my strings, but it doesn't seem to be removing or affecting the brackets or punctuation.
Here's what I'm currently using:
$name_clean = preg_replace('/ [^A-Za-z0-9\-]/', '', $name); // REMOVES SPECIAL CHARACTERS
$name_slug = str_replace(' ', '-', $name_clean); // REPLACES SPACES WITH DASHES IN TITLE
$link = strtolower( $name_slug ); // CREATES LOWERCASE SLUG VERSION OF TITLE_SLUG
My string (in this case $name) = St. John's (Newfoundland).
The output I get = #st.-john'snewfoundland)
I'd like to remove the periods, apostrophes and brackets altogether.
Any help would be greatly appreciated!
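Your regex pattern / [^A-Za-z0-9\-]/ contains a space after the opening /, so it only matches a special character that comes directly after a space. Removing that space will strip the periods, apostrophes and brackets. Note that the corrected class also strips spaces, so replace spaces with dashes before running it rather than after.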
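Put together, the slug steps might look like the sketch below. The slugify() name is hypothetical; the order of operations (dashes first, stripping second) is the point being illustrated.

```php
<?php
// Hypothetical helper name; builds an anchor-friendly slug.
function slugify(string $name): string
{
    // Convert spaces to dashes first: the character class below does not
    // allow spaces, so stripping first would delete them outright.
    $slug = str_replace(' ', '-', $name);
    // Remove everything except ASCII letters, digits and dashes.
    $slug = preg_replace('/[^A-Za-z0-9\-]/', '', $slug);
    return strtolower($slug);
}

echo slugify("St. John's (Newfoundland)"), "\n"; // st-johns-newfoundland
```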

Multiple preg_replace on a variable

Is it safe to use multiple preg_replace and str_replace calls on a variable?
$this->document->setDescription(tokenTruncate(str_replace(array("\r", "\n"), ' ', preg_replace( '/\s+/', ' ',preg_replace("/[^\w\d ]/ui", ' ', $custom_meta_description))),160));
This is the code I am using to remove newlines, whitespace and all non-alphanumeric characters (excluding Unicode letters). The innermost preg_replace handles the non-alphanumeric characters, but it removes dots too. Is there any way to keep dots, commas and - separators?
Thanks!
What you want can be done in a single expression:
preg_replace('/(?:\s|[^\w.,-])+/u', ' ', $custom_meta_description);
It replaces each run of whitespace (tabs and newlines included) and of characters that are not word characters, dots, commas or hyphens with a single space.
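A small sketch of that single expression on a made-up ASCII sample string shows which characters survive:

```php
<?php
// Sample input is invented for illustration.
$raw = "Hello,\r\n  world!  It's   100% fine - really.";

// Runs of whitespace or of characters outside \w . , - become one space.
$clean = preg_replace('/(?:\s|[^\w.,-])+/u', ' ', $raw);

echo $clean, "\n"; // Hello, world It s 100 fine - really.
```

The apostrophe, exclamation mark and percent sign are replaced, while the comma, hyphen and final dot are kept.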
What you're trying to do can be achieved with a single preg_replace statement:
$str = preg_replace('#[^\p{Xwd}\s]++#u', '', $str);
$this->document->setDescription(tokenTruncate($str, 160));
The above preg_replace() statement removes anything that is not a Unicode letter, digit, underscore or whitespace character from the supplied string. The u modifier makes PCRE treat the pattern and the subject as UTF-8.
See the Unicode Reference for more details.

PHP regexp - remove all leading, trailing and standalone hyphens

I'm trying to remove all leading, trailing and standalone hyphens from string:
-on-line - auction- website
Desired result:
on-line auction website
I came up with a working solution:
^-|(?<=\s)-|-(?=\s)|-$
But it looks to me a little bit amateur (isn't it?). So do you have a better solution?
You can use this pattern:
(?<!\S)-|-(?!\S)
example:
echo preg_replace('~(?<!\S)-|-(?!\S)~', '', '-on-line - auction- website');
Another possible pattern that uses a conditional statement: -(?(?!\S)|(?<!\S.))
This last one is interesting because it has a single branch that starts with a literal character. This lets the regex engine quickly test only the positions in the string where that character appears (thanks to internal optimisations that run before the "normal" regex engine walk).
Note that the conditional isn't mandatory; it can be replaced with a non-capturing group by adding a : (the result is the same, but the pattern is longer):
-(?:(?!\S)|(?<!\S.))
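To compare the two patterns, here is a short sketch on the string from the question. The final space-collapsing step is an added assumption: both patterns remove only the hyphen, so the spaces on either side of a standalone hyphen remain.

```php
<?php
$input = '-on-line - auction- website';

// Lookaround version: a hyphen with no non-space character on at least one side.
$a = preg_replace('~(?<!\S)-|-(?!\S)~', '', $input);

// Single-branch version with a non-capturing group.
$b = preg_replace('~-(?:(?!\S)|(?<!\S.))~', '', $input);

var_dump($a === $b); // bool(true)

// Both leave two spaces where " - " used to be, so collapse them separately.
echo preg_replace('/ {2,}/', ' ', $a), "\n"; // on-line auction website
```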
I guess it can be shortened to:
$repl = preg_replace('/(^|\s)-|-(\s|$)/', '$1$2', $str);
You can try the following:
-(?!\w)|(?<!\w)-
This either matches a dash which is followed by something that is not a word character, or a dash that is preceded by something that is not a word character.
Or if you want to put it otherwise, match all dashes which are not between two word characters.
There's no reason you have to do everything in one regex. Split it into two or three.
s/^-\s*//; # Strip a leading hyphen and optional space
s/\s*-$//; # Strip a trailing hyphen and optional space
s/\s+-\s+/ /g; # Change every space-hyphen-space sequence to a single space.
That's sed/Perl syntax; adjust accordingly for preg_replace, which replaces all matches by default.
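A direct PHP translation of those three substitutions could look like the sketch below. Run on the string from the question, it also shows a gap: a hyphen attached to a word on one side only (as in "auction- ") is not covered by any of the three steps.

```php
<?php
$string = '-on-line - auction- website';

$string = preg_replace('/^-\s*/', '', $string);    // leading hyphen and optional space
$string = preg_replace('/\s*-$/', '', $string);    // trailing hyphen and optional space
$string = preg_replace('/\s+-\s+/', ' ', $string); // space-hyphen-space to one space

echo $string, "\n"; // on-line auction- website
```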
In PHP you can use trim to remove given characters from both the beginning and the end of a string (rtrim only handles the end, so it is not needed in addition). After that you can use str_replace to remove the - from the middle.
$string = '-on-line - auction- website';
$string = trim($string, "-");
$string = str_replace("- ", " ", $string);
$string = str_replace("  ", " ", $string); //remove double spaces left by " - "
var_dump($string);
the result:
string(23) "on-line auction website"
You can stack that up into one line if you want:
$string = str_replace("  ", " ", str_replace("- ", " ", trim($string, "-")));

How do I remove blank lines from text in PHP?

I need to remove blank lines (with whitespace or absolutely blank) in PHP. I use this regular expression, but it does not work:
$str = ereg_replace('^[ \t]*$\r?\n', '', $str);
$str = preg_replace('^[ \t]*$\r?\n', '', $str);
I want text like this (where blank or whitespace-only lines may appear between the lines shown):
blahblah
blahblah
adsa
sad asdasd
to become:
blahblah
blahblah
adsa
sad asdasd
// New line is required to split non-blank lines
preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
The above regular expression says:
/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/
1st Capturing group (^[\r\n]*|[\r\n]+)
  1st Alternative: ^[\r\n]*
    ^ assert position at start of the string
    [\r\n]* match a single character present in the list below
      Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
      \r matches a carriage return (ASCII 13)
      \n matches a line-feed (newline) character (ASCII 10)
  2nd Alternative: [\r\n]+
    [\r\n]+ match a single character present in the list below
      Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
      \r matches a carriage return (ASCII 13)
      \n matches a line-feed (newline) character (ASCII 10)
[\s\t]* match a single character present in the list below
  Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
  \s matches any whitespace character [\r\n\t\f ]
  \t matches a tab (ASCII 9)
[\r\n]+ match a single character present in the list below
  Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
  \r matches a carriage return (ASCII 13)
  \n matches a line-feed (newline) character (ASCII 10)
Your ereg_replace() solution is a dead end because the ereg/eregi functions were deprecated in PHP 5.3 and removed in PHP 7. Your preg_replace() call won't even compile because the pattern has no delimiters, but if you add delimiters and set multiline mode, it will work fine:
$str = preg_replace('/^[ \t]*[\r\n]+/m', '', $str);
The m modifier allows ^ to match the beginning of a logical line rather than just the beginning of the whole string. The start-of-line anchor is necessary because without it the regex would match the newline at the end of every line, not just the blank ones. You don't need the end-of-line anchor ($) because you're actively matching the newline characters, but it doesn't hurt.
The accepted answer gets the job done, but it's more complicated than it needs to be. The regex has to match either the beginning of the string (^[\r\n]*, multiline mode not set) or at least one newline ([\r\n]+), followed by at least one newline ([\r\n]+). So, in the special case of a string that starts with one or more blank lines, they'll be replaced with one blank line. I'm pretty sure that's not the desired outcome.
But most of the time it replaces two or more consecutive newlines, along with any horizontal whitespace (spaces or tabs) that lies between them, with one linefeed. That's the intent, anyway. The author seems to expect \s to match just the space character (\x20), when in fact it matches any whitespace character. That's a very common mistake. The actual list varies from one regex flavor to the next, but at minimum you can expect \s to match whatever [ \t\f\r\n] matches.
Actually, in PHP you have a better option:
$str = preg_replace('/^\h*\v+/m', '', $str);
\h matches any horizontal whitespace character, and \v matches vertical whitespace.
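As a rough check, the \h and \v pattern can be run against a made-up sample that mixes \n and \r\n line endings with empty and whitespace-only lines:

```php
<?php
// Sample text with blank lines, a tab/space-only line and CRLF endings.
$str = "blahblah\n\n  \t\nblahblah\r\n\r\nadsa\n \nsad asdasd\n";

// Remove every line that holds nothing but horizontal whitespace.
$result = preg_replace('/^\h*\v+/m', '', $str);

echo json_encode($result), "\n";
// "blahblah\nblahblah\r\nadsa\nsad asdasd\n"
```

The line endings of the non-blank lines are left as they were, so the \r\n after the second "blahblah" survives.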
Just explode the lines of the text to an array, remove empty lines using array_filter and implode the array again.
$tmp = explode("\n", $str);
$tmp = array_filter($tmp);
$str = implode("\n", $tmp);
Or in one line:
$str = implode("\n", array_filter(explode("\n", $str)));
I have not benchmarked it, but this may be faster than preg_replace.
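One caveat with the explode/array_filter approach: without a callback, array_filter() drops every falsy value, so a line containing only "0" is removed while a whitespace-only line is kept. A sketch with an explicit callback (the sample text is invented) avoids both problems:

```php
<?php
$str = "first\n\n0\n   \nlast";
$lines = explode("\n", $str);

// Default filter: loses the "0" line, keeps the whitespace-only line.
$default = implode("\n", array_filter($lines));
echo json_encode($default), "\n"; // "first\n   \nlast"

// Explicit callback: keep a line only if it has non-whitespace content.
$strict = implode("\n", array_filter($lines, function (string $line): bool {
    return trim($line) !== '';
}));
echo json_encode($strict), "\n"; // "first\n0\nlast"
```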
The comment from Bythos from Jamie's link above worked for me:
/^\n+|^[\t\s]*\n+/m
I didn't want to strip all of the new lines, just the empty/whitespace ones. This does the trick!
There is no need to overcomplicate things. This can be achieved with a simple short regular expression:
$text = preg_replace("/(\R){2,}/", "$1", $text);
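The (\R) matches any newline sequence (\n, \r\n or \r).
The {2,} matches two or more occurrences.
The $1 uses the captured newline (the platform-specific EOL) as the replacement.
Note that lines containing only spaces or tabs are not removed by this pattern.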
This was answered a long time ago, but preg_replace allows a much simpler pattern:
$result = preg_replace('/\s*($|\n)/', '\1', $subject);
Pattern: Remove all white-space before a new-line -or- at the end of the string.
Longest match wins:
As the whitespace class \s has the greedy quantifier * and includes \n, consecutive empty lines are matched.
As \s includes \r as well, \r\n newline sequences are supported; a single \r (without \n) is not.
And when $ matches the end of the buffer, the backreference \1 is empty, which handles trailing whitespace at the very end, too.
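A minimal sketch of this pattern on an invented sample illustrates both effects: the blank lines in the middle collapse into one newline, and the trailing whitespace (including the final newline) disappears.

```php
<?php
// "a" with trailing spaces, two blank-ish lines, then "b" with a trailing tab.
$subject = "a  \n\n \nb\t\n";

$result = preg_replace('/\s*($|\n)/', '\1', $subject);

echo json_encode($result), "\n"; // "a\nb"
```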
If leading (empty) lines need to be removed as well, they have to match while not capturing, too (this was not directly asked for but could be appropriate):
$result = preg_replace('/^(?:\s*\n)+|\s*($|\n)/', '\1', $subject);
#                        '----------'
Pattern: Also remove all leading white-space (first line(s) are empty).
And if the new-line at the end of the buffer should be normalized differently (always a newline at the end instead of never), it needs to be added: . "\n".
This variant is portable to \r\n, \r and \n new-line sequences ((?>\r\n|\r|\n)) or \R:
$result = preg_replace('/^(?> |\t|\r\n|\r|\n)+|(?> |\t|\r\n|\r|\n)*($|(?>\r\n|\r|\n))/', '\1', $subject);
# or:
$result = preg_replace('/^(?:\s*\R)+|\s*($|\R)/', '\1', $subject);
Pattern: Support all new-line sequences.
This is with the downside that the new-lines can not be normalized (e.g. any of the three to \n).
Therefore, it can make sense to normalize new-lines before removing:
$result = preg_replace(['/(?>\r\n|\n|\r)/', '/\s*($|\n)/'], ["\n", '\1'], $subject);
# or:
$result = preg_replace(['/\R/u', '/\s*($|\n)/'], ["\n", '\1'], $subject);
This also opens up the opportunity to do some normalization beyond the line handling, for example removing trailing whitespace and adding a missing newline at the end of the file.
More advanced line normalization is then possible, for example no empty lines at the beginning and end, and otherwise no more than two consecutive empty lines:
$result = preg_replace(
['/[ \t]*($|\R)/u', '/^\n*|(\n)\n*$|(\n{3})\n+/'],
["\n" , '\1\2' ],
$subject
);
The second pattern benefits from the first pattern's replacements having already been applied.
The power of preg_replace here lies in wisely choosing which backreference(s) to replace with.
Using multiple patterns can also greatly simplify things and keep the process maintainable.
Try this one:
$str = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\r\n", $str);
If you output this to a text file, it will give the same output in the simple Notepad, WordPad and also in text editors, for example Notepad++.
Use this:
$str = preg_replace('/^\s+\r?\n$/D', '', $str);
function trimblanklines($str) {
return preg_replace('`\A[ \t]*\r?\n|\r?\n[ \t]*\Z`','',$str);
}
This one only removes them from the beginning and end, not the middle (if anyone else was looking for this).
The accepted answer leaves an extra line-break at the end of the string. Using rtrim() will remove this final linebreak:
rtrim(preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string));
From this answer, the following works fine for me!
$str = "<html>
<body>";
echo str_replace(array("\r", "\n"), '', $str);
<?php
function del_blanklines_in_array_q($ar){
    $strip = array();
    foreach($ar as $k => $v){
        $ll = strlen($v);
        while($ll--){
            if(ord($v[$ll]) > 32){ // hex 0x20, int 32: ASCII SPACE
                $strip[] = $v; break;
            }
        }
    }
    return $strip;
}
function del_blanklines_in_file_q($in, $out){
    // in filename, out filename
    $strip = del_blanklines_in_array_q(file($in));
    file_put_contents($out, $strip );
}
$file = "file_name.txt";
$file_data = file_get_contents($file);
$file_data_after_remove_blank_line = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $file_data );
file_put_contents($file,$file_data_after_remove_blank_line);
nl2br(preg_replace('/^\v+/m', '', $r_msg))
