How do I compress space characters with PHP regular expressions?

$string = 'Hello this is a bunch of numbers: 333 and letters and $pecial Characters#!*(';
$foo = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);
echo $foo;
The above returns:
Hello this is a bunch of numbers 333 and letters and pecial Characters
I want to keep spaces, but not when there's more than one in a row: any run of spaces should collapse to a single space. How can that be done?
It should look like:
Hello this is a bunch of numbers 333 and letters and pecial Characters

$foo = preg_replace("/[^a-zA-Z0-9\s]/", "", $string);
$foo = preg_replace('/\s+/',' ',$foo);

One regex will do it:
$foo = preg_replace('/[^a-zA-Z0-9\s]|(\s)\s+/', '$1', $string);

I haven't worked with PHP, but in Perl it's something like:
s/\s+/ /g
i.e. replace any sequence of one or more spaces with a single space.
So I imagine the PHP to compress spaces would be:
$foo = preg_replace("/\s{2,}/", " ", $string);
I don't think there should be any problems with running two preg_replace lines, especially if it makes the code clearer.

Replace globally (edit - tested):
/(?<=\s)\s+|[^a-zA-Z0-9\s]+/
with ""
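One caveat, as a minimal sketch: the input string below is made up (it is not from the question) so that a special character sits between two single spaces. In that case the single-pattern versions seem to leave a double space behind, because the two spaces only become adjacent after the character is removed, while the two-pass version collapses them.

```php
<?php
// Hypothetical input: '&' sits between two single spaces.
$string = 'numbers:  333 & letters';

// Two passes: strip unwanted characters, then collapse whitespace runs.
$twoPass = preg_replace('/[^a-zA-Z0-9\s]/', '', $string);
$twoPass = preg_replace('/\s+/', ' ', $twoPass);

// One pass, alternation with a capture group (unset group => empty replacement).
$onePass = preg_replace('/[^a-zA-Z0-9\s]|(\s)\s+/', '$1', $string);

// One pass, lookbehind variant.
$lookbehind = preg_replace('/(?<=\s)\s+|[^a-zA-Z0-9\s]+/', '', $string);

echo "[$twoPass]\n";    // [numbers 333 letters]
echo "[$onePass]\n";    // [numbers 333  letters]  (two spaces before "letters")
echo "[$lookbehind]\n"; // [numbers 333  letters]  (two spaces before "letters")
```

If inputs like this can occur, the two preg_replace calls are probably the safer choice.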

Related

Removing 'words' contained in strings with non-alphanumeric characters?

What is the recommended method in PHP for removing 'words' in strings with non-alphanumeric characters please?
$string = "Test let's test 123. https://youtu.be/dQw4w9WgXcQ EOTest.";
desired result:
"Test test 123. EOTest.";
Method 1 - regex
Method 2 - explode(), foreach() and str_replace or preg_replace

Try using the preg_split, preg_grep, and implode functions, like so:
$string = "Test let's test 123. https://youtu.be/dQw4w9WgXcQ EOTest.";
$words = preg_split('/\s+/', $string); // split on one or more spaces
$filter = preg_grep('/^[A-Za-z\d.]+$/', $words); // allow dot, letters, and numbers
$result = implode(' ', $filter); // turn it into a string
print_r($result); // -> Test test 123. EOTest.
I hope that helps!

PHP rtrim all trailing special characters

I'm writing a function that detects and removes all trailing special characters from a string. It should convert strings like:
"hello-world"
"hello-world/"
"hello-world--"
"hello-world/%--+..."
into "hello-world".
Does anyone know a trick for this that doesn't require writing a lot of code?

Just for fun
[^a-z\s]+
Explanation:
[^x]: one character that is not x
\s: a whitespace character (space, tab, newline, carriage return, vertical tab)
+: one or more of the preceding item
PHP:
$re = "/[^a-z\\s]+/i";
$str = "Hello world\nhello world/\nhello world--\nhellow world/%--+...";
$subst = "";
$result = preg_replace($re, $subst, $str);

Try this (note that it removes special characters anywhere in the string, not only at the end):
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
or, to keep apostrophes as well:
preg_replace('/[^A-Za-z0-9\-\']/', '', $string); // Also keeps apostrophes.

You could use a regex like this, depending on your definition of "special characters":
function clean_string($input) {
return preg_replace('/\W+$/', '', $input);
}
It replaces any characters that are not a word character (\W) at the end of the string $ with nothing. \W will match [^a-zA-Z0-9_], so anything that is not a letter, digit, or underscore will get replaced. To specify which characters are special chars, use a regex like this, where you put all your special chars within the [] brackets:
function clean_string($input) {
return preg_replace('/[\/%.+-]+$/', '', $input);
}
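As a quick check, here is a small sketch that runs the second clean_string() above against the four inputs from the question. The $inputs array and the loop are only for illustration.

```php
<?php
// Same pattern as the second clean_string() above:
// strip a trailing run of the listed characters ("/", "%", ".", "+", "-").
function clean_string($input) {
    return preg_replace('/[\/%.+-]+$/', '', $input);
}

// The four sample strings from the question.
$inputs = ['hello-world', 'hello-world/', 'hello-world--', 'hello-world/%--+...'];

foreach ($inputs as $input) {
    echo clean_string($input), "\n"; // prints hello-world each time
}
```

The hyphen inside "hello-world" survives because the pattern is anchored with $ and only consumes the run of listed characters at the very end.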

This is what you are looking for:
([^\n\w\d \"]*)$
It removes any trailing characters that are not word characters (letters, digits, underscore), spaces, or newlines.
Just call it like this:
preg_replace('/([^\n\w\s]*)$/', '', $string);

regex to also match accented characters

I have the following PHP code:
$search = "foo bar que";
$search_string = str_replace(" ", "|", $search);
$text = "This is my foo text with qué and other accented characters.";
$text = preg_replace("/$search_string/i", "<b>$0</b>", $text);
echo $text;
Obviously, "que" does not match "qué". How can I change that? Is there a way to make preg_replace ignore all accents?
The characters that have to match (Spanish):
á,Á,é,É,í,Í,ó,Ó,ú,Ú,ñ,Ñ
I don't want to replace all accented characters before applying the regex, because the characters in the text should stay the same:
"This is my foo text with qué and other accented characters."
and not
"This is my foo text with que and other accented characters."

The solution I finally used:
$search_for_preg = str_ireplace(["e","a","o","i","u","n"],
["[eé]","[aá]","[oó]","[ií]","[uú]","[nñ]"],
$search_string);
$text = preg_replace("/$search_for_preg/iu", "<b>$0</b>", $text)."\n";

$search = str_replace(
['a','e','i','o','u','ñ'],
['[aá]','[eé]','[ií]','[oó]','[uú]','[nñ]'],
$search);
This, and the same for upper case, will accomplish your request. A side note: the ñ replacement sounds invalid to me, as 'niño' is totally different from 'nino'.

If you want to use the captured text in the replacement string, you have to use character classes in your $search variable (anyway, you set it manually):
$search = "foo bar qu[eé]";
And so on.

You could try defining an array like this:
$vowel_replacements = array(
"e" => "eé",
// Other letters mapped to their other versions
);
Then, before your preg_replace call, do something like this:
foreach ($vowel_replacements as $vowel => $replacements) {
    // str_replace takes (search, replace, subject) and returns the new string
    $search_string = str_replace($vowel, "[$replacements]", $search_string);
}
If I'm remembering my PHP right, that should replace your vowels with a character class of their accented forms, so the accented text in the original stays in place. It also lets you change the search string far more easily; you don't have to remember to replace the vowels with their character classes. All you have to remember is to use the non-accented form in your search string.
(If there's some special syntax I'm forgetting that does this without a foreach, please comment and let me know.)
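Putting the answers above together, here is a minimal sketch. The function name highlight_terms() and the $map array are made up for this example, and it assumes the search terms are given in lower case and that both the script and the text are UTF-8. It uses strtr() with an array so that already-replaced text is never scanned again, and preg_quote() so that regex metacharacters in the search terms are treated literally.

```php
<?php
// Hypothetical helper, not from the original posts.
function highlight_terms(string $text, string $search): string
{
    // Map each plain letter to a class that also accepts its accented form.
    $map = [
        'a' => '[aá]', 'e' => '[eé]', 'i' => '[ií]',
        'o' => '[oó]', 'u' => '[uú]', 'n' => '[nñ]',
    ];

    $alternatives = [];
    $terms = preg_split('/\s+/', $search, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($terms as $term) {
        // Escape metacharacters first, then widen the plain letters.
        $alternatives[] = strtr(preg_quote($term, '/'), $map);
    }
    if ($alternatives === []) {
        return $text; // nothing to search for
    }

    // i = case-insensitive, u = treat pattern and subject as UTF-8.
    $pattern = '/' . implode('|', $alternatives) . '/iu';
    return preg_replace($pattern, '<b>$0</b>', $text);
}

$text = 'This is my foo text with qué and other accented characters.';
echo highlight_terms($text, 'foo bar que');
// This is my <b>foo</b> text with <b>qué</b> and other accented characters.
```

Because $0 is the matched text itself, the accented characters in the original text are left untouched.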

PHP Regex: Remove words less than 3 characters

I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);

Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary: a position with a word character (letter, digit, or underscore) on one side and a non-word character, or the start or end of the string, on the other
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
Reference
Word Boundaries

You can use the word boundary anchor \b:
Replace: \b[a-z]{1,2}\b with ''
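For reference, a small sketch applying the word-boundary pattern to the string from the question. It uses the variant with \s* so the whitespace after each removed word goes too; the variable name $result is only for illustration.

```php
<?php
// Sample string from the question.
$text = 'an of and then some an ee halved or or whenever';

// Remove words of one or two lower-case letters, plus any whitespace after them.
$result = trim(preg_replace('~\b[a-z]{1,2}\b\s*~', '', $text));

echo $result; // and then some halved whenever
```

Since \b is zero-width, adjacent short words such as "or or" are both matched, which the space-delimited pattern in the question could not do.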

Use this (it requires whitespace after the word, so a short word at the very end of the string is kept):
preg_replace('/(\b.{1,2}\s)/','',$your_string);

Although some of the solutions here worked, they had a problem with my language's multi-character letters, such as "ch". A simple explode and implode worked for me.
$maxWordLength = 3;
$string = "my super string";
$exploded = explode(" ", $string);
foreach($exploded as $key => $word) {
if(mb_strlen($word) < $maxWordLength) unset($exploded[$key]);
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"

To me, it seems that this hack works fine with most PHP versions:
$string2 = preg_replace('/\b[a-zA-Z0-9]{1,2}\b/i', "", trim($string1));
Where [a-zA-Z0-9] are the accepted Char/Number range.

Regex to trim text between tags

I expected this to be a simple regex but I guess my head isn't screwed on this morning!
I'm taking the source code of a page and tidying it up with a bunch of other preg_replaces, so by the time we get to the regex below, the result is already a single line string with things like comments stripped out, etc.
All I'm looking to do now is trim the text between > and < characters to remove extra whitespace. I.e.
<p> hello world </p>
should become
<p>hello world</p>
I figured this would do the trick, but it seems to do nothing?
$data = trim(preg_replace('/>(\s*)([^\s]*?)(\s*)</', '>$2<', $data));
Cheers.

Here's a ridiculous way to do it lol:
$str = "<p> hello world </p>";
$strArr = explode(" ", $str);
$strArr = array_filter($strArr);
var_dump(implode(" ",$strArr));
Use the power of arrays to remove the white spaces lol

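You can use preg_replace_callback to apply trim() to each text run while replacing. (The /e modifier that was once used for this was deprecated in PHP 5.5 and removed in PHP 7.)
$data = preg_replace_callback('/>([^<]*)</', function ($m) { return '>' . trim($m[1]) . '<'; }, $data);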

A regex could be:
>\s+(.*[^\s])\s+<
but don't use it; there are better ways to reach that goal (for example, HTML Tidy).

You may use this snippet of code.
$x = '<p> hello world </p>';
$foo = preg_replace('/>\s+/', '>', $x); //first remove space after ">" symbol
$foo = htmlentities(preg_replace('/\s+</', '<', $foo)); //now remove space before "<" symbol
echo $foo;
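As for why the pattern in the question appears to do nothing: [^\s]*? cannot match across the space inside "hello world", so the regex never reaches the closing <. A minimal sketch of one possible fix follows, swapping in [^<]*? so the text run may contain inner spaces. The function name trim_between_tags() is made up for this example.

```php
<?php
// Hypothetical helper: trims whitespace at both ends of each text run between tags.
function trim_between_tags(string $html): string
{
    // [^<]*? is lazy, so the trailing \s* picks up the whitespace before "<".
    return preg_replace('/>\s*([^<]*?)\s*</', '>$1<', $html);
}

echo trim_between_tags('<p> hello world </p>'); // <p>hello world</p>
```

Note that this also drops whitespace-only gaps between adjacent tags, which can change how inline elements render, so a proper HTML tool is likely still the better option for anything non-trivial.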
