How to trim out the space between non-letters with PHP?

That is, how can I keep the spaces between letters (a-z, case-insensitive) and remove the spaces between non-letters?

This strips the whitespace between two alphanumeric characters, which is the reverse of what the question asks for:
$trimmed = preg_replace('~([a-z0-9])\s+([a-z0-9])~i', '\1\2', $your_text);

This will strip any whitespace that is between two non-alpha characters:
preg_replace('/(?<![a-z])\s+(?![a-z])/i', '', $text);
This will strip any whitespace that has a non-alpha character on either side (big difference):
preg_replace('/(?<![a-z])\s+|\s+(?![a-z])/i', '', $text);
By using negative look-ahead and negative look-behind assertions, the beginning and end of the string are treated as non-alpha as well.
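To make the "big difference" between the two patterns concrete, here is a small sketch that runs both on the same input. The sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical sample input: letters, digits and a dash separated by single spaces.
$text = 'a b 1 2 c - d';

// Whitespace with a non-letter (or a string edge) on BOTH sides.
$both = preg_replace('/(?<![a-z])\s+(?![a-z])/i', '', $text);

// Whitespace with a non-letter (or a string edge) on EITHER side.
$either = preg_replace('/(?<![a-z])\s+|\s+(?![a-z])/i', '', $text);

echo $both, PHP_EOL;   // a b 12 c - d
echo $either, PHP_EOL; // a b12c-d
```

Only the space between 1 and 2 has non-letters on both sides, so the first pattern removes just that one. The second pattern keeps only the space between a and b, the one place where both neighbours are letters.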

Related

creating regex with letters and accents

I need to create a regular expression that matches word, whitespace, word. It can't start with whitespace or have more than one whitespace character between the words. Each word must allow letters, including accented ones. I'm using this pattern:
^([^\+\*\.\|\(\)\[\]\{\}\?\/\^\s\d\t\n\r<>ºª!#"·#~½%¬&=\'¿¡~´,;:_®¥§¹×£µ€¶«²¢³\$\-\\]+\s{0,1}?)*$/
Examples:
-Graça+whitespace+anotherWord -> match
-whitespace+Graça+whitespace+anotherWord -> don't match
-Graça+whitespace+whitespace+anotherword -> don't match
In general, it is a validation that allows firstname + whitespace + lastname with accented characters and a-z characters,
and I have to exclude all special characters like /*-+)(!/($=
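You can try this pattern: ^[\x{0041}-\x{02B3}]+\s[\x{0041}-\x{02B3}]+ (in PHP, \x{...} values above FF require the u modifier).
Explanation: since you are using characters not matched by \w, you have to define your own range of word characters. \x{0041} is the character with Unicode code point U+0041 (A), and \x{02B3} is the upper end of the range. Note that this range also covers some non-letters, such as [ and ×.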
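As an alternative to a hand-picked code point range, the Unicode property \p{L} ("any letter") can be used. This is only a sketch: it assumes "letters and accents" means any Unicode letter, that exactly two words are wanted, and that the input is UTF-8. The function name is made up.

```php
<?php
// Hypothetical helper; assumes $name is a UTF-8 string.
function isTwoWordName(string $name): bool
{
    // \p{L}+  one or more letters, accented or not
    // \s      exactly one whitespace character
    // \z      absolute end of the string (a trailing newline is rejected)
    // u       treat pattern and subject as UTF-8
    return preg_match('/^\p{L}+\s\p{L}+\z/u', $name) === 1;
}

var_dump(isTwoWordName('Graça anotherWord'));  // bool(true)
var_dump(isTwoWordName(' Graça anotherWord')); // bool(false)
var_dump(isTwoWordName('Graça  anotherword')); // bool(false)
var_dump(isTwoWordName('Graça+anotherWord'));  // bool(false)
```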
For just spaces, use str_replace:
$string = str_replace(' ', '', $string);
For all whitespace, use preg_replace:
$string = preg_replace('/\s+/', '', $string);
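A quick way to see the difference between the two calls, using a made-up sample string that contains a space, a tab and a newline:

```php
<?php
// Hypothetical sample containing a space, a tab, and a newline.
$string = "a b\tc\nd";

// Removes only the literal space character.
$spacesOnly = str_replace(' ', '', $string);

// Removes every whitespace character that \s matches.
$allWhitespace = preg_replace('/\s+/', '', $string);

var_dump($spacesOnly === "ab\tc\nd"); // bool(true)
var_dump($allWhitespace === 'abcd');  // bool(true)
```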

Regular expression to remove trailing chars

I'm looking for a regular expression in PHP that could transform incoming strings like this:
abaisser_negation_pronominal_question => abaisser_n_p_q
abaisser_pronominal_question => abaisser_p_q
abaisser_negation_question => abaisser_n_q
abaisser_negation_pronominal => abaisser_n_p
abaisser_negation_voix_passive_pronominal => abaisser_n_v_p_p
abaisser => abaisser
With PHP code close to something like:
$line=preg_replace("/<h3>/im", "", $line);
How would you do it?
You can use:
$input = preg_replace('/(_[A-Za-z])[^_\n]*/', '$1', $input);
Explanation:
This regex searches for (_[A-Za-z])[^_\n]*, which means an underscore followed by a single letter, then any run of characters up to the next underscore or newline.
It captures the first part, (_[A-Za-z]), in group 1, referenced as $1.
The replacement is $1, which leaves only the underscore and the first letter in the output.
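The explanation above can be checked against the sample strings from the question. The wrapper function name below is made up; the pattern and replacement are the ones from the answer.

```php
<?php
// Hypothetical helper wrapping the replacement from the answer.
function abbreviateSuffixes(string $input): string
{
    return preg_replace('/(_[A-Za-z])[^_\n]*/', '$1', $input);
}

// Inputs and expected outputs copied from the question.
$cases = [
    'abaisser_negation_pronominal_question'     => 'abaisser_n_p_q',
    'abaisser_pronominal_question'              => 'abaisser_p_q',
    'abaisser_negation_question'                => 'abaisser_n_q',
    'abaisser_negation_pronominal'              => 'abaisser_n_p',
    'abaisser_negation_voix_passive_pronominal' => 'abaisser_n_v_p_p',
    'abaisser'                                  => 'abaisser',
];

foreach ($cases as $input => $expected) {
    $actual = abbreviateSuffixes($input);
    echo $input, ' => ', $actual, ($actual === $expected ? '' : ' (mismatch)'), PHP_EOL;
}
```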
You could use \K or positive lookbehind.
$input = preg_replace('~_.\K[^_\n]*~', '', $input);
The pattern _. matches an underscore and the character that follows it. \K then resets the start of the reported match, so those two characters are kept out of the text that gets replaced. Next, [^_\n]* matches zero or more characters that are neither an underscore nor a newline, that is, everything up to the next _ or \n. Replacing that match with an empty string gives the desired output.
$input = preg_replace('~(?<=_.)[^_\n]*~', '', $input);
The lookbehind (?<=_.) asserts that the current position is preceded by an underscore and one more character; [^_\n]* then matches all the characters up to the next underscore or newline.
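A small side-by-side run of the \K variant and the lookbehind variant, using one of the sample strings from the question (variable names are illustrative only):

```php
<?php
// Sample string taken from the question.
$input = 'abaisser_negation_voix_passive_pronominal';

// \K variant: "_" and the next character are matched but excluded from the replaced text.
$withK = preg_replace('~_.\K[^_\n]*~', '', $input);

// Lookbehind variant: the position must be preceded by "_" plus one character.
$withLookbehind = preg_replace('~(?<=_.)[^_\n]*~', '', $input);

echo $withK, PHP_EOL;          // abaisser_n_v_p_p
echo $withLookbehind, PHP_EOL; // abaisser_n_v_p_p
```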
You can use this regex:
$input = preg_replace('/_(.)[^\n_]+/', '_$1', $input);
It captures the character after the _, matches up to the next \n or _, and replaces the whole match with _$1, that is, an underscore plus the captured character.
$line = preg_replace("/_([a-z])([a-z]*)/i", "_$1", $line);

preg_replace doesn't match with two spaces between words

I need to format uppercase words in bold, but it doesn't work if the word contains two spaces.
Is there any way to make the regex match only words that end with a colon?
$str = "BAKA NO TEST: hey";
$str = preg_replace('~[A-Z]{4,}\s[A-Z]\s{2,}(?:\s[A-Z]{4,})?:?~', '<b>$0</b>', $str);
Expected output: <b>BAKA NO TEST:</b> hey
but it returns <b>BAKA</b> NO TEST: hey
The original $str is a multiline text, so there are many lowercase and uppercase words, but I need to change only some of them.
You can do it like this:
$txt = preg_replace('~[A-Z]+(?:\s[A-Z]+)*:~', '<b>$0</b>', $txt);
Explanations:
[A-Z]+ # uppercase letter one or more times
(?: # open a non-capturing group
\s # a whitespace character (space, tab, newline,...)
[A-Z]+ # uppercase letter one or more times
)* # close the group and repeat it zero or more times
: # a literal colon
If you want a more tolerant pattern, you can replace \s with \s+ to allow more than one space between words.
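To illustrate the difference between \s and \s+ here, a short sketch with a made-up input that has two spaces between BAKA and NO:

```php
<?php
// Hypothetical input: note the two spaces between BAKA and NO.
$str = 'BAKA  NO TEST: hey';

// Exactly one whitespace character between words.
$strict = preg_replace('~[A-Z]+(?:\s[A-Z]+)*:~', '<b>$0</b>', $str);

// One or more whitespace characters between words.
$tolerant = preg_replace('~[A-Z]+(?:\s+[A-Z]+)*:~', '<b>$0</b>', $str);

echo $strict, PHP_EOL;   // BAKA  <b>NO TEST:</b> hey
echo $tolerant, PHP_EOL; // <b>BAKA  NO TEST:</b> hey
```

Since \s also matches newlines, an uppercase word at the end of one line can be joined to a label on the next line. If that matters for your text, \h (horizontal whitespace only) may be a better fit.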
Unless you have some good reason to use that regexp, try something simpler, like:
/([A-Z\s]+):/
Also, just so you know, you can use an asterisk to match zero or more whitespace characters: \s*

Regex to strip specific characters

I have been using the following regex to replace all punctuation in a string:
preg_replace('/[^\w\s]/', '', $tweet);
with \w being shorthand for [a-zA-Z0-9_] and \s included so that whitespace is kept. I learned this here: Strip punctuation in an address field in PHP. But now I need the regex to strip all characters except:
a-z and A-Z
{ and }
So it should strip out all dots, commas, numbers etc. What is the correct regex for this?
preg_replace('/[^a-zA-Z{} ]/', '', $tweet);
Possibly faster variant as proposed by FakeRainBrigand in a comment, thanks:
preg_replace('/[^a-zA-Z{} ]+/', '', $tweet);
preg_replace('/[^a-z{}]/i', '', $tweet);
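Note that the answers differ in whether spaces survive: the first two character classes include a space, the last one does not. A quick sketch with a made-up tweet:

```php
<?php
// Hypothetical sample tweet.
$tweet = 'Hello, {world} 123!';

// Keeps letters, braces and spaces.
$keepSpaces = preg_replace('/[^a-zA-Z{} ]+/', '', $tweet);

// Keeps letters and braces only; spaces are stripped too.
$noSpaces = preg_replace('/[^a-z{}]/i', '', $tweet);

var_dump($keepSpaces); // string(14) "Hello {world} "
var_dump($noSpaces);   // string(12) "Hello{world}"
```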

Remove all but valid characters

Valid characters include the alphabet (abcd..), numbers (0123456789), spaces, ' and ".
I need to strip any other characters than these from a string in PHP.
Thanks :)
You can do this:
$str = preg_replace('/[^a-z0-9 "\']/', '', $str);
Here the character class [^a-z0-9 "'] will match any character except the listed ones (note the inverting ^ at the beginning of the character class), and each match is replaced by an empty string.
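One thing to watch: as written, the class lists only lowercase a-z, so uppercase letters are stripped as well. Whether that is wanted depends on how "the alphabet" is meant; adding the i modifier keeps both cases. A sketch with a made-up string:

```php
<?php
// Hypothetical sample with uppercase letters, quotes and punctuation.
$str = 'Say "Hi", it\'s #1!';

// Pattern as given: uppercase letters are not in the class, so they are removed.
$lowerOnly = preg_replace('/[^a-z0-9 "\']/', '', $str);

// Assumed variant with the i modifier: both cases are kept.
$anyCase = preg_replace('/[^a-z0-9 "\']/i', '', $str);

echo $lowerOnly, PHP_EOL; // ay "i" it's 1
echo $anyCase, PHP_EOL;   // Say "Hi" it's 1
```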
Gumbo's answer is correct for your given specification. But if your "specification" is only "symbolic", what you eventually need might be like the following:
$str = preg_replace('{ [^ \w \s \' " ] }x', '', $str );
[^ ]: negated character class (all except these inside)
\w: word characters (letters, digits, and underscore)
\s: white space
\': ' (escaped for the PHP string)
": "
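The practical difference between the explicit list and the \w-based class shows up with underscores (and uppercase letters), which \w keeps. A small sketch with a made-up input:

```php
<?php
// Hypothetical sample containing an underscore, a dash and punctuation.
$str = 'snake_case-name!';

// Explicit list: the underscore is not listed, so it is removed.
$listed = preg_replace('/[^a-z0-9 "\']/', '', $str);

// \w-based class in extended (x) mode: the underscore is a word character and stays.
$wordBased = preg_replace('{ [^ \w \s \' " ] }x', '', $str);

echo $listed, PHP_EOL;    // snakecasename
echo $wordBased, PHP_EOL; // snake_casename
```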
