Regex to remove single characters from string

Consider the following strings
breaking out a of a simple prison
this is b moving up
following me is x times better
All strings are lowercased already. I would like to remove any "loose" a-z characters, resulting in:
breaking out of simple prison
this is moving up
following me is times better
Is this possible with a single regex in php?

$str = "breaking out a of a simple prison
this is b moving up
following me is x times better";
$res = preg_replace("#\\b[a-z]\\b ?#i", "", $str);
echo $res;

How about:
preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $string);
Note this also catches single characters that are at the beginning or end of the string, but not single characters that are adjacent to punctuation (they must be surrounded by whitespace).
If you also want to remove characters immediately before punctuation (e.g. 'the x.'), then this should work properly in most (English) cases:
preg_replace('/(^|\s)[a-z]\b/', '$1', $string);
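To see the whitespace-delimited pattern from this answer at work, here is a minimal sketch run against the three sample strings from the question. The helper name removeLooseLetters is made up for illustration and is not part of the answer.

```php
<?php
// Hypothetical helper: removes single a-z letters that are delimited by
// whitespace or by the start/end of the string.
function removeLooseLetters(string $s): string
{
    // $1 puts back the leading whitespace (or nothing at the start of the string).
    return preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $s);
}

$samples = [
    'breaking out a of a simple prison',
    'this is b moving up',
    'following me is x times better',
];
foreach ($samples as $sample) {
    echo removeLooseLetters($sample), PHP_EOL;
}
// Prints:
// breaking out of simple prison
// this is moving up
// following me is times better
```

One thing to keep in mind: the match consumes the whitespace after the letter, so when two loose letters sit next to each other (as in 'me x y times') the second one has no leading whitespace left to match and stays in place.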

As a one-liner:
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);
This matches a single lowercase letter (\p{Ll}) which is preceded or followed by whitespace (\s), removing both. The word boundaries (\b) ensure that only single letters are indeed matched. The /u modifier makes the regex Unicode-aware.
The result: A single letter surrounded by spaces on both sides is reduced to a single space. A single letter preceded by whitespace but not followed by whitespace is removed completely, as is a single letter only followed but not preceded by whitespace.
So
This a is my test sentence a. o How funny (what a coincidence a) this is!
is changed to
This is my test sentence. How funny (what coincidence) this is!
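The example above can be reproduced with a short script. This is only a sketch that feeds the test sentence from this answer through the same one-liner.

```php
<?php
$subject = 'This a is my test sentence a. o How funny (what a coincidence a) this is!';

// Either "whitespace + single lowercase letter" or "single lowercase letter + whitespace"
// is removed; \b keeps the pattern from matching letters inside longer words.
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);

echo $result, PHP_EOL;
// Prints: This is my test sentence. How funny (what coincidence) this is!
```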

You could try something like this:
preg_replace('/\b\S\s\b/', "", $subject);
This is what it means:
\b # Assert position at a word boundary
\S # Match a single character that is a “non-whitespace character”
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
\b # Assert position at a word boundary
Update
As raised by Radu, because I've used the \S this will match more than just a-zA-Z. It will also match 0-9_. Normally, it would match a lot more than that, but because it's preceded by \b, it can only match word characters.
As mentioned in the comments by Tim Pietzcker, be aware that this won't work when the single character is followed by a non-word character, as in "test a (hello)". It also fails when more than one space follows the single character, like this
test a  hello
but you could fix that by changing the expression to \b\S\s*\b

Try this one:
$sString = preg_replace("#\b[a-z]{1}\b#m", ' ', $sString);

Related

Regular expression alphanumeric with dash and underscore and space, but not at the beginning or at the end of the string [duplicate]

I want to design an expression for not allowing whitespace at the beginning and at the end of a string, but allowing in the middle of the string.
The regex I've tried is this:
\^[^\s][a-z\sA-Z\s0-9\s-()][^\s$]\

This should work:
^[^\s]+(\s+[^\s]+)*$
If you want to include character restrictions:
^[-a-zA-Z0-9-()]+(\s+[-a-zA-Z0-9-()]+)*$
Explanation:
The leading ^ and trailing $ anchor the pattern to the start and end of the string.
In the first regex, [^\s]+ means at least one non-whitespace character and \s+ means at least one whitespace character. The parentheses () group the second and third fragments, and the * after the group means zero or more repetitions of it.
So the expression reads: the string begins with at least one non-whitespace character, followed by any number of groups, each made of at least one whitespace character followed by at least one non-whitespace character.
For example, the input 'A' matches because it satisfies the leading condition of at least one non-whitespace character. The input 'AA' matches for the same reason. The input 'A A' also matches: the first A satisfies the leading condition, and ' A' is one group of whitespace followed by non-whitespace.
' A' does not match because it does not begin with a non-whitespace character. 'A ' does not match because the trailing space is not followed by a non-whitespace character, so it cannot complete a group.
If you want to restrict which characters are accepted, see the second regex. It allows only a-z, A-Z, 0-9, the hyphen and () between the whitespace runs, including at the beginning and end.
Regex playground: http://www.regexr.com/
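The walkthrough above can be checked in PHP with preg_match. This is a minimal sketch; the helper name hasNoOuterWhitespace is hypothetical and only wraps the first pattern from this answer.

```php
<?php
// Hypothetical helper wrapping the first pattern from the answer.
function hasNoOuterWhitespace(string $s): bool
{
    return preg_match('/^[^\s]+(\s+[^\s]+)*$/', $s) === 1;
}

foreach (['A', 'AA', 'A A', ' A', 'A '] as $input) {
    printf("'%s' => %s\n", $input, hasNoOuterWhitespace($input) ? 'match' : 'no match');
}
// Prints:
// 'A' => match
// 'AA' => match
// 'A A' => match
// ' A' => no match
// 'A ' => no match
```

In PHP (PCRE), $ also matches just before a final newline, so "A\n" passes this check. If that matters, the D modifier or \z in place of $ rules it out.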

This RegEx will allow neither white-space at the beginning nor at the end of your string/word.
^[^\s].+[^\s]$
Any string of three or more characters that doesn't begin or end with whitespace will be matched. Shorter strings such as 'ab' are rejected, because [^\s], .+ and [^\s] each need at least one character.
Explanation:
^ denotes the beginning of the string.
\s denotes whitespace, so [^\s] denotes NOT whitespace. You could alternatively use \S to denote the same.
. denotes any character except a line break.
+ is a quantifier which means one or more times. That is, the element which + follows can be repeated one or more times.
You can use this as RegEx cheat sheet.

In cases when you have a specific pattern, say, ^[a-zA-Z0-9\s()-]+$, that you want to adjust so that spaces at the start and end were not allowed, you may use lookaheads anchored at the pattern start:
^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$
^^^^^^^^^^^^^^^^^^^^
Here,
(?!\s) - a negative lookahead that fails the match if (since it is after ^) immediately at the start of string there is a whitespace char
(?![\s\S]*\s$) - a negative lookahead that fails the match if, (since it is also executed after ^, the previous pattern is a lookaround that is not a consuming pattern) immediately at the start of string, there are any 0+ chars as many as possible ([\s\S]*, equal to [^]*) followed with a whitespace char at the end of string ($).
In JS, you may use the following equivalent regex declarations:
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = /^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = new RegExp("^(?!\\s)(?![^]*\\s$)[a-zA-Z0-9\\s()-]+$")
var regex = new RegExp(String.raw`^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$`)
If you know there are no linebreaks, [\s\S] and [^] may be replaced with .:
var regex = /^(?!\s)(?!.*\s$)[a-zA-Z0-9\s()-]+$/
See the regex demo.
JS demo:
var strs = ['a b c', ' a b b', 'a b c '];
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/;
for (var i=0; i<strs.length; i++){
console.log('"',strs[i], '"=>', regex.test(strs[i]))
}

If the string must be at least 1 character long, newlines are allowed in the middle together with any other characters, and the first and last characters can really be anything except whitespace (including ##$!...), then you are looking for:
^\S$|^\S[\s\S]*\S$
explanation and unit tests: https://regex101.com/r/uT8zU0
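A quick way to see how this pattern treats single characters and embedded newlines is a small PHP check. The helper name isTrimmed is an assumption made for this sketch.

```php
<?php
// Hypothetical helper: true when the string is non-empty and neither
// starts nor ends with whitespace; anything is allowed in between.
function isTrimmed(string $s): bool
{
    return preg_match('/^\S$|^\S[\s\S]*\S$/', $s) === 1;
}

foreach (['a', 'ab', "a\nb", ' a', 'a ', ''] as $case) {
    echo json_encode($case), ' => ', isTrimmed($case) ? 'match' : 'no match', PHP_EOL;
}
// Prints:
// "a" => match
// "ab" => match
// "a\nb" => match
// " a" => no match
// "a " => no match
// "" => no match
```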

This worked for me:
^[^\s].+[a-zA-Z]+[a-zA-Z]+$
Hope it helps.

How about:
^\S.+\S$
This will match any string of at least three characters that doesn't begin or end with any kind of whitespace.

^[^\s].+[^\s]$
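That's it! It allows any string of three or more characters (any character apart from \n) with no whitespace at the beginning or end. If you want \n to be allowed in the middle, add the s modifier so that . also matches newlines, or replace .+ with [\s\S]+. Note that . loses its special meaning inside a character class, so a class such as [.\n] matches only literal dots and newlines.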

pattern="^[^\s]+[-a-zA-Z\s]+([-a-zA-Z]+)*$"
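This accepts only letters, hyphens and whitespace after the leading non-whitespace characters, and does not allow whitespace at the start. Note that [-a-zA-Z\s]+ can itself end in whitespace and the final group is optional, so trailing whitespace is still accepted.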

This is the regex for no whitespace at the beginning nor at the end, but only one in between. It also works without a 3-character limit:
\^([^\s]*[A-Za-z0-9]\s{0,1})[^\s]*$\ - just remove {0,1} and add * in order to allow unlimited whitespace in between.

As a modification of @Aprillion's answer, I prefer:
^\S$|^\S[ \S]*\S$
It will not match a space at the beginning, end, or both.
It matches any number of spaces between a non-whitespace character at the beginning and end of a string.
It also matches only a single non-whitespace character (unlike many of the answers here).
It will not match any newline (\n), \r, \t, \f, nor \v in the string (unlike Aprillion's answer). I realize this isn't explicit to the question, but it's a useful distinction.

Letters and numbers divided only by one space. Also, no spaces allowed at beginning and end.
/^[a-z0-9]+( [a-z0-9]+)*$/gi

I found a reliable way to do this is just to specify what you do want to allow for the first character and check the other characters as normal e.g. in JavaScript:
RegExp("^[a-zA-Z][a-zA-Z- ]*$")
So that expression accepts only a single letter at the start, and then any number of letters, hyphens or spaces thereafter. Note that it still accepts a space or hyphen at the very end.

Use /^[^\s].([A-Za-z]+\s)*[A-Za-z]+$/. It accepts only one whitespace character between words and none at the beginning or end.

If we do not need to restrict the valid character set (that is, we accept characters from any language) and just want to prevent spaces at the start and end, the simplest pattern is:
/^(?! ).*[^ ]$/
Try on HTML Input:
input:invalid {box-shadow:0 0 0 4px red}
/* Note: ^ and $ are removed from the pattern because the HTML pattern attribute already matches against the whole value. */
<input pattern="(?! ).*[^ ]">
Explanation
^ Start of string
(?!...) Negative lookahead: what follows must not match ...
(space) Just a space; use \s to also cover tabs and line-break characters
(?! ) Do not accept a space as the first character of the following set (.*)
. Any character (except \n\r line breaks)
* Zero or more (length of the set)
[^ ] Set/class of any character except space
$ End of string
Try it live: https://regexr.com/6e1o4

^[^0-9 ]{1}([a-zA-Z]+\s{1})+[a-zA-Z]+$
- allows no more than one whitespace character between words, and no spaces at the start or end.
^[^0-9 ]{1}([a-zA-Z ])+[a-zA-Z]+$
- allows more than one space between words, and no spaces at the start or end.

Other answers introduce a limit on the length of the match. This can be avoided using Negative lookaheads and lookbehinds:
^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$
This starts by checking that the first character is not whitespace ^(?!\s). It then captures the characters you want a-zA-Z0-9\s non greedily (*?), and ends by checking that the character before $ (end of string/line) is not \s.
Check that lookaheads/lookbehinds are supported in your platform/browser.
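PCRE, which PHP uses, supports both lookaheads and lookbehinds, so the pattern can be tried there directly. The sketch below is only an illustration; the helper name matchesLookaround is made up.

```php
<?php
// Hypothetical helper wrapping the lookahead/lookbehind pattern.
function matchesLookaround(string $s): bool
{
    return preg_match('/^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$/', $s) === 1;
}

var_dump(matchesLookaround('abc 123')); // bool(true)
var_dump(matchesLookaround('a'));       // bool(true)
var_dump(matchesLookaround(' abc'));    // bool(false): leading whitespace
var_dump(matchesLookaround('abc '));    // bool(false): trailing whitespace
var_dump(matchesLookaround('abc!'));    // bool(false): "!" is not in the class
var_dump(matchesLookaround(''));        // bool(true): the empty string passes
```

Because the group is quantified with *?, the empty string is accepted as well. Using +? instead would require at least one character.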

Here you go,
\b^[^\s][a-zA-Z0-9]*\s+[a-zA-Z0-9]*\b
\b refers to word boundary
\s+ means allowing white-space one or more at the middle.

(^(\s)+|(\s)+$)
This expression matches the leading and trailing whitespace of the string.

Unexpected behavior of preg_replace() with regular expression containing \h on à [duplicate]

I want to replace all empty spaces on the beginning of all new lines. I have two regex replacements:
$txt = preg_replace("/^　+/m", '', $txt);
$txt = preg_replace("/^[^\S\r\n]+/m", '', $txt);
Each of them matches a different kind of empty space. However, both kinds may be present, and in any order, so I want to match occurrences of all of them at the beginning of new lines. How can I do that?
NOTE: The first regex matches an ideographic space, \u3000 char, which is only possible to check in the question raw body (SO rendering is not doing the right job here). The second regex matches only ASCII whitespace chars other than LF and CR. Here is a demo proving the second regex does not match what the first regex matches.

Since you want to remove any horizontal whitespace from a Unicode string you need to use
\h regex escape ("any horizontal whitespace character (since PHP 5.2.4)")
u modifier (see Pattern Modifiers)
Use
$txt = preg_replace("/^\h+/mu", '', $txt);
Details
^ - start of a line (m modifier makes ^ match all line start positions, not just string start position)
\h+ - one or more horizontal whitespaces
u modifier will make sure the Unicode text is treated as a sequence of Unicode code points, not just code units, and will make all regex escapes in the pattern Unicode aware.
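As a small illustration of the details above, the following sketch strips a mix of ideographic spaces (U+3000), ASCII spaces and tabs from the start of each line. The sample text is invented, and the "\u{3000}" string escape assumes PHP 7.0 or later.

```php
<?php
// "\u{3000}" is the ideographic space; this escape syntax needs PHP 7.0+.
$txt = "\u{3000} \tfirst line\n \u{3000}second line";

// m: ^ matches at every line start; u: treat pattern and subject as UTF-8.
$clean = preg_replace('/^\h+/mu', '', $txt);

echo $clean, PHP_EOL;
// Prints:
// first line
// second line
```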

Regex Matches white space but not tab (php)

How do I write a regex which matches whitespace but not tabs or newlines?
Thanks, everyone.
[[:blank:]]{2,} <-- This isn't good for me because it matches spaces and tabs (though not newlines).

As per my original comment, you can use this.
Code
See regex in use here
Note: The link contains whitespace characters: tab, newline, and space. Only space is matched.
[^\S\t\n\r]
So your regex would be [^\S\t\n\r]{2,}
Explanation
[^\S\t\n\r] Match any character not present in the set.
\S Matches any non-whitespace character. Since it's a double negative it will actually match any whitespace character. Adding \t, \n, and \r to the negated set ensures we exclude those specific characters as well. Basically, this regex is saying:
Match any whitespace character except \t\n\r
This principle in regex is often used with word characters \w to negate the underscore _ character: [^\W_]
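To make the double-negative class concrete, here is a sketch that uses [^\S\t\n\r]{2,} with preg_replace to collapse runs of two or more spaces while leaving tabs and newlines alone. The replacement step and the sample text are assumptions for illustration; the question only asks for the match.

```php
<?php
$text = "a  b\t\tc   d\n\ne";

// Runs of 2+ whitespace characters other than tab, LF and CR become one space.
$collapsed = preg_replace('/[^\S\t\n\r]{2,}/', ' ', $text);

echo json_encode($collapsed), PHP_EOL;
// Prints: "a b\t\tc d\n\ne"
```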

[ ]{2,} works normally (not sure about php)
or even / {2,}/

Php Regex to insert character after first all-capital letter word in a string

I'm trying to use a preg_replace or similar php function to:
- identify the first all capital letter word in a string,
- and insert a character directly after it (a dash or semi-colon will do)
- the all capital letter word should be 3 characters long or more.
So far I have the regular expression:
/(?<!\ )([^A-Z{3,}])/
But, this isn't working in terms of only words that are 3+ characters. I'm also not sure I have it 'strictly' only looking at the very first word.
I believe that once I have the regex sorted out - this
$string = "LONDON On November 12th twelve people...";
$replaced_string = preg_replace('/myregex/',': ', $string);
will output as the following
LONDON: On November 12th twelve people...

It's a fairly simple regex, really:
$replacedString = preg_replace('/\b([A-Z]{3,})\b/', '$1: ', $string);
It works like this:
\b: word boundary. This detects the start and end of a "word"
([A-Z]{3,}): Match 3 or more upper-case characters. The brackets capture this part of the match, so we can use it in the replacement string
\b: Another word boundary
Replace this match with:
'$1: ': the $1 refers back to the first captured group (the 3 or more upper case characters). To this, we're adding a colon and a space. That will be our replacement string
This will add the colon and space after all upper-case words of 3 or more characters. To replace only 1 word, just pass a limit to preg_replace:
$replaced = preg_replace('/\b([A-Z]{3,})\b/', '$1: ', $string, 1);
Where that last argument is the number of matches you wish to replace. -1 for all, 1 for 1, 2 for 2, etc...
Judging by your sample string, the upper-case words are city names. It's possible for city names to contain a dash, or even a space. To address this, you might want to match all strings containing upper-case chars, dashes and spaces:
$replaceAll = preg_replace('/\b([A-Z -]{2,}[A-Z])\b/', '$1: ', $string);
What changed:
([A-Z -]{2,}: The capturing match start with upper-case chars (2 or more, not 3), but also matches spaces and dashes.
[A-Z]): The last character of the captured group must be an upper-case character, this avoids capturing the trailing spaces or dashes. The result is that we capture stuff like "NEW YORK" or "FOO-TOWN", but not "ON - Something".
The rest is the same as before. If you want to allow for other characters that might occur (like a dot) just add them to the first part of the capturing group. The most complete pattern will probably be something like this:
$replaced = preg_replace('/\b([A-Z][A-Z .-]+[A-Z])\b/', '$1: ', $string);
This ensures the captured group starts, and ends with an upper case character, and contains any number of upper-case chars, spaces, dots and dashes in between. So this will match something like "ST. LEWIS", too
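The limit argument can be tried on the sample string from the question. One detail in this sketch is an assumption about the desired output: the match does not include the space that follows the word, so a replacement of '$1: ' would leave two spaces after the colon. The sketch therefore uses '$1:' and relies on the original space.

```php
<?php
$string = 'LONDON On November 12th twelve people...';

// Limit 1: only the first all-caps word of 3+ letters gets the colon.
// The space after the word is not part of the match, so it is kept as-is.
$replaced = preg_replace('/\b([A-Z]{3,})\b/', '$1:', $string, 1);
echo $replaced, PHP_EOL;
// Prints: LONDON: On November 12th twelve people...

// Later all-caps words are left untouched because of the limit.
$second = preg_replace('/\b([A-Z]{3,})\b/', '$1:', 'PARIS and ROME', 1);
echo $second, PHP_EOL;
// Prints: PARIS: and ROME
```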

Remove whitespace around certain character with preg_replace

I have a string where I want to remove all the whitespace around a certain character using preg_replace. In my case this character is /.
For example:
first part / second part would become first part/second part
Or let's say that character is : now:
first part : second part would become first part:second part
I couldn't find an example on how to do this... Thanks!

$string = preg_replace("/\s*([\/:])\s*/", "$1", $string);
Explanation:
\s* means any amount (*) of whitespace (\s)
[\/:] is either a / or a :. If you want another character, just add it here.
the brackets are a capture group which you reference with the $1 meaning that if it matches a : then the $1 will mean :.
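If the set of separator characters is not fixed, the character class can be built at runtime. The helper below is a hypothetical sketch (trimAround is not a PHP function) and assumes the separators are single-byte characters; preg_quote() is the standard function for escaping them.

```php
<?php
// Hypothetical helper: strips whitespace around any of the given separator characters.
function trimAround(string $s, string $chars): string
{
    // preg_quote() escapes regex metacharacters, including the "/" delimiter.
    $class = preg_quote($chars, '/');
    return preg_replace('/\s*([' . $class . '])\s*/', '$1', $s);
}

echo trimAround('first part / second part', '/:'), PHP_EOL;
// Prints: first part/second part
echo trimAround('first part : second part', '/:'), PHP_EOL;
// Prints: first part:second part
```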

Replace : with your character.
$string = preg_replace("/\s*:\s*/", ":", $string);
In English:
Replace any amount of whitespace (including 0), then a : and then any amount of whitespace again, by just a :.

match optional space followed by your character (captured in brackets) followed by another optional space and then replace by your captured character
preg_replace('/\s*(:)\s*/',"$1",$str);
