Multiple preg_replace on a variable - php

Is it safe to use multiple preg_replace and str_replace calls on a variable?
$this->document->setDescription(tokenTruncate(str_replace(array("\r", "\n"), ' ', preg_replace( '/\s+/', ' ',preg_replace("/[^\w\d ]/ui", ' ', $custom_meta_description))),160));
This is the code I am using to remove newlines, extra whitespace and all non-alphanumeric characters (Unicode letters are kept). The innermost preg_replace handles the non-alphanumeric characters, but it removes dots too. Is there any way to keep dots, commas and - separators?
Thanks!

What you want can be done in a single expression:
preg_replace('/(?:\s|[^\w.,-])+/u', ' ', $custom_meta_description);
It replaces each run of whitespace (spaces, tabs, newlines) and of characters other than word characters (letters, digits, underscore), dots, commas and hyphens with a single space.
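As a rough illustration of the single-expression approach, here is a minimal sketch. The helper name cleanDescription and the sample string are made up for this example; the trim() call is an addition to drop the space left where the string ends in a removed character.

```php
<?php
// Hypothetical helper (not from the question): collapse whitespace and
// disallowed characters into single spaces, keeping dots, commas and hyphens.
function cleanDescription(string $text): string
{
    // One pass: any run of whitespace or of characters that are not
    // word characters, '.', ',' or '-' becomes a single space.
    $text = preg_replace('/(?:\s|[^\w.,-])+/u', ' ', $text);

    // A removed character at either end leaves a space behind, so trim it.
    return trim($text);
}

echo cleanDescription("Hello,\r\n  world! Price: 10.5 - ok?"), "\n";
// prints "Hello, world Price 10.5 - ok"
```

The result could then be passed to the question's own tokenTruncate() before calling setDescription().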

What you're trying to do can be achieved with a single preg_replace statement:
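$str = preg_replace('#\P{Xwd}++#u', '', $str);
$this->document->setDescription(tokenTruncate($str, 160));
The above preg_replace() call removes every character that is not a Unicode letter, digit or underscore (\p{Xwd} is PCRE's "Perl word" property). The u modifier is needed so that the string is treated as UTF-8. Note that whitespace is not part of \p{Xwd}, so it is removed as well; pass ' ' as the replacement if the words should stay separated.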
See the Unicode Reference for more details.
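Since \p{Xwd} covers letters, digits and the underscore only, whitespace is not part of it. If whitespace should survive, one option is an explicit negated class built from Unicode properties. The pattern below is a sketch of that idea rather than the answer's own code, and the sample string is invented:

```php
<?php
// Sample input with a non-ASCII letter (U+00E9) and some punctuation.
$input = "Caf\u{00E9}: 2 cups, please!";

// Remove everything that is not a Unicode letter (\p{L}),
// a Unicode digit (\p{N}) or whitespace (\s).
$clean = preg_replace('/[^\p{L}\p{N}\s]+/u', '', $input);

echo $clean, "\n";
```

Here the colon, comma and exclamation mark are dropped, while the accented letter, the digit and the spaces are kept.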

Related

Regex split words by spaces and all punctuation marks except

I am trying to split a file into words separated by any type and any amount of whitespace and punctuation marks, except for the following punctuation marks: ' - ’. How would I do this? This is what I currently have, but it isn't splitting on periods.
$words = preg_split("/((?![a-zA-Z'-’])\s)+/",$file);
Using preg_match_all is simpler:
preg_match_all("~[A-Z'’-]+~ui", $str, $m);
$words = $m[0];
I added the u modifier because ’ is outside the ASCII range.
If you need characters other than ASCII letters, quotes or hyphens, add them to the character class.
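A short run of the preg_match_all approach on an invented sample sentence, to show what ends up in $words:

```php
<?php
// Sample text (made up for illustration).
$str = "It's a well-known fact: dogs, cats... and birds!";

// Collect every run of ASCII letters, apostrophes (straight or curly)
// and hyphens; everything else acts as a separator.
preg_match_all("~[A-Z'’-]+~ui", $str, $m);
$words = $m[0];

print_r($words);
// $words holds: It's, a, well-known, fact, dogs, cats, and, birds
```

One thing to be aware of: a hyphen or quote standing on its own (as in "a - b") also matches the class and would be returned as a separate "word".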

How to remove more than one whitespace

Hello guys, I currently have a problem with my preg_replace:
preg_replace('#[^a-zA-z\s]#', '', $string)
It keeps all alphabetic letters and whitespace, but I want every run of more than one whitespace character to be reduced to a single one. Any idea how this can be done?
$output = preg_replace('!\s+!', ' ', $input);
From Regular Expression Basic Syntax Reference
\d, \w and \s
Shorthand character classes matching digits, word characters (letters, digits, and underscores), and whitespace (spaces, tabs, and line breaks). Can be used inside and outside character classes.
The character type \s stands for at least five different characters: horizontal tab (9), line feed (10), form feed (12), carriage return (13) and ordinary space (32); newer PCRE versions also include vertical tab (11). The following code finds every substring of $string that consists of two or more \s characters and keeps only the first one. For example, if line feed, horizontal tab and ordinary space occur immediately after one another, the line feed alone will remain after the replacement is done.
$string = preg_replace('#(\s)\s+#', '\1', $string);
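preg_replace(array('#\s+#', '#[^a-zA-Z\s]#'), array(' ', ''), $string);
Note that this replaces every run of whitespace with a single space. (The class is written a-zA-Z here; the A-z range in the question also matches [, \, ], ^, _ and `.) If you want to collapse consecutive whitespace of the same kind (for example two newlines into one newline), you need to work out separate logic for that, because \s+ will match "\n \n \n" (5 whitespace characters in a row) as a single run.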
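With an array of patterns, preg_replace applies them one after another, so the order matters. The sketch below (sample string invented, character class written as a-zA-Z) compares collapsing first with stripping first:

```php
<?php
$string = "Hello,   World - 2 \n lines";

// Collapse whitespace first, then strip non-letters: removing "-" and "2"
// afterwards leaves their surrounding spaces behind as a longer run.
$collapseFirst = preg_replace(
    array('#\s+#', '#[^a-zA-Z\s]#'),
    array(' ', ''),
    $string
);

// Strip non-letters first, then collapse: the final pass cleans up
// any runs of whitespace created by the removal.
$stripFirst = preg_replace(
    array('#[^a-zA-Z\s]#', '#\s+#'),
    array('', ' '),
    $string
);

var_dump($collapseFirst); // "Hello World   lines" (three spaces)
var_dump($stripFirst);    // "Hello World lines"
```

Stripping first is usually the safer order when the goal is single spaces throughout.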
try using trim instead
<?php
$something = " Error";
echo $something."\n";
echo "------"."\n";
echo trim($something);
?>
Output:
 Error
------
Error
The question is old and misses some details. Let's assume the OP wanted to replace every run of consecutive horizontal whitespace with a single space.
Example:
"\t\t \t \t" => " "
"\t\t \t\t" => "\t \t"
One possible solution would be simply to use the generic character type \h, which stands for horizontal whitespace:
preg_replace('/\h+/', ' ', $text)
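A small check of how \h behaves around line breaks, using an invented input string:

```php
<?php
// Tabs and spaces on two lines; the "\n" is vertical whitespace.
$text = "a\t\t b\n\tc";

// \h matches horizontal whitespace only, so the newline is left alone.
$result = preg_replace('/\h+/', ' ', $text);

var_dump($result === "a b\n c"); // bool(true)
```

Each run of tabs and spaces becomes one space, and the line structure of the text is preserved.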

PHP Trim() of everything that is not "text"

I have a source of strings that typically looks like this
word1
phrase with more words than one
a phrase prefaced by whitespace that is not whitespace in code
wordX
NOTE! The whitespace before the words and phrases looks like whitespace to the naked eye but is not being trimmed by trim().
Is there any way to use either trim() or preg_replace() to KEEP the whitespace within the phrases but trim what is outside (which looks like whitespace but isn't)?
EDIT: I have no idea what character the whitespace-looking spaces before and after the words and phrases are.
This will replace every run of whitespace characters (spaces, tabs, and line breaks) with a single space:
$output = preg_replace('!\s+!', ' ', $input);
EDIT:
For the leading whitespace, you can either trim() it, or use this instead:
$output = preg_replace('!^\s+!', '', preg_replace('!\s+!', ' ', $input));
I think it could be done as a single regex; if a regex guru manages to do it, I'd want that person's answer to be accepted instead.
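The question never establishes what the invisible characters are. A common culprit is the non-breaking space (U+00A0), which trim() does not strip by default. Assuming that is the case here (an assumption, not something the question confirms), a sketch of a Unicode-aware trim could look like this; unicodeTrim is a hypothetical helper name:

```php
<?php
// Hypothetical helper: strip leading and trailing whitespace, including
// Unicode separators such as U+00A0, while leaving inner spaces untouched.
function unicodeTrim(string $s): string
{
    // \s covers ordinary whitespace, \p{Z} covers Unicode separator characters.
    return preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+\z/u', '', $s);
}

$nbsp = "\xC2\xA0"; // UTF-8 encoding of U+00A0 (non-breaking space)
$raw  = $nbsp . $nbsp . "phrase with more words than one" . $nbsp;

var_dump(trim($raw) === $raw);   // bool(true): plain trim() changes nothing
var_dump(unicodeTrim($raw));     // "phrase with more words than one"
```

If this does not help, printing bin2hex($raw) shows the actual bytes, which makes it possible to identify the character and add it to the class.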

Normalize spaces in a string?

I need to normalize the spaces in a string:
Remove multiple adjacent spaces
Remove spaces at the beginning and end of the string
E.g. " my name is " => my name is
I tried
str_replace('  ',' ',$str);
I also tried the approach from "php Replacing multiple spaces with a single space", but that didn't work either.
Replace any occurrence of 2 or more spaces with a single space, and trim:
$str = preg_replace('/ {2,}/', ' ', trim($input));
Note: using the whitespace character class \s here is a fairly bad idea since it will match linebreaks and other whitespace that you might not expect.
Use a regex
$text = preg_replace("~\\s{2,}~", " ", $text);
The \s approach strips away newlines too, and the / {2,}/ approach ignores tabs, as well as spaces at the beginning of a line right after a newline.
If you want to keep newlines and get a more accurate result, I'd suggest this impressive answer to a similar question, and this improvement on that answer. Following their notes, the answer to your question is:
$norm_str = preg_replace('/[^\S\r\n]+/', ' ', trim($str));
In short, this is taking advantage of double negation. Read the links to get an in-depth explanation of the trick.
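To make the double negation concrete: [^\S\r\n] reads as "not a non-whitespace character and not a line break", that is, any whitespace except \r and \n. A short run on an invented two-line string:

```php
<?php
// Two lines with messy spacing and a tab (sample data for illustration).
$str = "  my   name\tis  \n  second   line  ";

// Collapse runs of whitespace other than \r and \n into a single space;
// trim() handles the very start and end of the whole string.
$norm_str = preg_replace('/[^\S\r\n]+/', ' ', trim($str));

var_dump($norm_str === "my name is \n second line"); // bool(true)
```

The newline survives, but the spaces directly around it are only collapsed, not removed, since trim() works on the string as a whole rather than line by line.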

How to strip out extra asterisks in a string using preg_replace()

I know how to strip out extra spaces, dashes, and periods using preg_replace(), but I need to know what format below is correct for stripping out extra asterisks in a string.
These lines of code work for stripping out extra spaces, dashes, and periods:
// Strips out extra spaces
$string = preg_replace('/\s\s+/', ' ',$string);
// Strips out extra dashes
$string = preg_replace('/-+/', '-', $string);
// Strips out extra periods
$string = preg_replace('/\.+/', '.', $string);
Which of the following is correct for stripping out extra asterisks?
// Version 1: Strips out extra asterisks
$string = preg_replace('/\*+/', '*', $string);
// Version 2: Strips out extra asterisks
$string = preg_replace('/*+/', '*', $string);
Thank you in advance.
By the way, is there a list somewhere that shows all the characters that need to be escaped with a backslash when used in a regular expression in PHP?
Try this:
$string = preg_replace('/\*{2,}/', '*', $string);
This will replace any instances of multiple asterisks next to one another with one asterisk.
Or, if you wanted to just get rid of all asterisks:
$string = preg_replace('/[\*]+/', '', $string);
It's worth noting that * is a special character in regular expressions; so, you must escape it with a backslash.
Also, here's a good regex reference:
http://www.regular-expressions.info/reference.html
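Rather than memorising which characters need escaping, PHP's preg_quote() can escape a literal string for use inside a pattern. A minimal sketch, with variable names and sample data chosen for illustration:

```php
<?php
// The literal character whose repeats should be collapsed.
$char = '*';

// preg_quote() escapes regex metacharacters; the second argument
// also escapes the delimiter used in the pattern.
$pattern = '/(?:' . preg_quote($char, '/') . '){2,}/';

$string = preg_replace($pattern, $char, 'a***b**c*d');

echo $pattern, "\n"; // /(?:\*){2,}/
echo $string, "\n";  // a*b*c*d
```

The same code works unchanged for other special characters such as '.' or '+'.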
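Here's how you could combine multiple character replacements into one regex:
$string = preg_replace('/([*.])\1+/', '$1', $string);
This collapses repeated asterisks as well as repeated periods. The backreference \1 makes sure only repeats of the same character are merged, so a mixed sequence such as ".*" is left alone.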
