regex exclude space from \W - php

What I have now: preg_match("[\W|_]",$string), which matches any non-word character and underscores. However, I only want to match strings containing \w and individual spaces in the middle of the string (as opposed to $string starting or ending with any number of spaces), but not underscores. Thanks for your help!
Examples that should be matched: Example 123 or One Two Three.
Examples that should be rejected: example& or (starting with one or more spaces, and with multiple spaces between "Example" and "of") Example of foo.

Ah, so you don't need to catch the results of the match - just to test whether or not the string matches some pattern. That can be done with...
$pattern = '/^[A-Z0-9](?:[A-Z0-9 ]*[A-Z0-9])?$/i';
... but that's destined to fail if you want to cover letters outside of ASCII range. You should use this instead then:
$pattern = '/^[\p{L}0-9](?:[\p{L}0-9 ]*[\p{L}0-9])?$/u';
Check the demo to see that in action.
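A quick way to check the pattern against the examples from the question is sketched below. Note that the answer's pattern still accepts several consecutive spaces in the middle of the string; the `$strict` variant is my own assumption (not part of the answer) for the case where only single spaces between words are wanted.

```php
<?php
// Pattern from the answer: a letter/digit at both ends, spaces allowed in between.
$pattern = '/^[\p{L}0-9](?:[\p{L}0-9 ]*[\p{L}0-9])?$/u';
// Hypothetical stricter variant: runs of letters/digits separated by exactly one space.
$strict = '/^[\p{L}0-9]+(?: [\p{L}0-9]+)*$/u';

$samples = ['Example 123', 'One Two Three', 'example&', '  Example  of foo', 'Example  of foo', 'foo_bar'];
foreach ($samples as $s) {
    // Print 1 (match) or 0 (no match) for each pattern.
    printf("%-22s answer=%d strict=%d\n", "'$s'", preg_match($pattern, $s), preg_match($strict, $s));
}
```

Both patterns reject underscores, because _ is in neither character class.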

Related

Regex - Escaping square brackets along with boundary

I have a website where users can have custom actions when a keyword is detected in a sentence. How I currently do matches is like the following:
$output = array();
preg_match('/\b' . $keyword . '\b/', $phrase, $output);
If I find a match (if(count($output) > 0) {), then the custom action is run. This is for spoken sentences, so keywords are things like operator; we also have a custom one called [silence], so when silence is detected it runs an action.
However, when the keyword contains brackets, for example [silence], the regex fails because of the square brackets. I have tried escaping both, like \b\[silence\]\b, but this does not detect a match.
Also this is in PHP
Thanks in advance,
Joe
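The "word boundary" expression \b only matches between a word character and a non-word character (or at the start/end of the string next to a word character). [ and ] are not word characters, so \b\[ fails whenever [ is preceded by a space or by the start of the string, and \]\b fails the same way at the other end.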
From Regex tutorial :
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
Simply put: \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b. A “word character” is a character that can be used to form words. All characters that are not “word characters” are “non-word characters”.
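So you need to "rewrite" the \b expression to suit your need, like:
(?<=[\s\.,;])\[silence\](?=[\s\.,;])
First, a lookbehind for a "delimiter character" that is not part of the match (space, dot, comma, ... you probably need to add a few more), followed by your expression, followed by a lookahead for a delimiter character again. Note that, as written, a delimiter is required on both sides, so the keyword is not found at the very start or end of the phrase.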
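One possible way to wire this into the original code is sketched below. The helper name containsKeyword is hypothetical, and the negative lookarounds (?<!\w) and (?!\w) are my substitution for the delimiter list above: they only require that no word character touches the keyword, so they also succeed at the start and end of the phrase. preg_quote() escapes the brackets (and any other metacharacters) so the keyword does not have to be escaped by hand.

```php
<?php
// Hypothetical helper: whole-keyword match that also works for keywords like "[silence]".
function containsKeyword(string $keyword, string $phrase): bool
{
    // Escape regex metacharacters in the keyword, including the "/" delimiter.
    $quoted = preg_quote($keyword, '/');
    // The keyword must not be directly preceded or followed by a word character.
    return preg_match('/(?<!\w)' . $quoted . '(?!\w)/', $phrase) === 1;
}

var_dump(containsKeyword('[silence]', 'hello [silence] operator')); // bool(true)
var_dump(containsKeyword('[silence]', '[silence]'));                // bool(true)
var_dump(containsKeyword('operator', 'cooperators'));               // bool(false)
```

For plain-word keywords this behaves like \bword\b; for keywords that start or end with punctuation it is stricter than nothing at all but looser than a fixed delimiter list, so adjust it if your phrases need different rules.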

PHP/Laravel trim all but last word in a namespace

Trying to trim a fully qualified namespace so as to use just the last word. Example namespace is App\Models\FruitTypes\Apple where that final word could be any number of fruit types. Shouldn't this...
$fruitName = 'App\Models\FruitTypes\Apple';
trim($fruitName, "App\\Models\\FruitTypes\\");
...do the trick? It is returning an empty string. If I try to trim just App\\Models\\ it returns FruitTypes\Apples as expected. I know the backslash is an escape character, but doubling should treat those as actual backslashes.
If you want to use native functionality for this rather than string manipulation, then ReflectionClass::getShortName will do the job:
$reflection = new ReflectionClass('App\\Models\\FruitTypes\\Apple');
echo $reflection->getShortName();
Apple
See https://3v4l.org/eVl9v
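As for why the trim() call in the question returns an empty string: the second argument of trim() is a list of individual characters to strip from both ends, not a prefix. The sketch below illustrates that, followed by a prefix-based alternative; the variable names $prefix and $short are my own.

```php
<?php
$fruitName = 'App\Models\FruitTypes\Apple';

// The second argument is a set of characters, not a prefix string.
// Every character of $fruitName occurs in that set, so everything is stripped.
$stripped = trim($fruitName, 'App\\Models\\FruitTypes\\');
var_dump($stripped); // string(0) ""

// Hypothetical alternative: remove the prefix only if the string starts with it.
$prefix = 'App\\Models\\FruitTypes\\';
$short = strncmp($fruitName, $prefix, strlen($prefix)) === 0
    ? substr($fruitName, strlen($prefix))
    : $fruitName;
echo $short, "\n"; // Apple
```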
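preg_match() with the regex pattern \\([[:alpha:]]*)$ should do the trick. In a single-quoted PHP string each regex backslash has to be doubled again, and the match is returned through the third parameter:
preg_match('/\\\\([[:alpha:]]*)$/', $fruitName, $trimmed);
Your result will then live in $trimmed[1]. If you don't mind the pattern being a bit less explicit, you could do: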
preg_match('/([[:alpha:]]*)$/', $fruitName, $trimmed);
And your result would then be in $trimmed[0].
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
preg_match - php.net
(matches is the third parameter that I named $trimmed, see documentation for full explanation)
An explanation for the regex pattern
\\ matches the character \ literally to establish the start of the match.
The parentheses () create a capturing group to return the match or a substring of the match.
In the capturing group ([[:alpha:]]*):
[:alpha:] matches an alphabetic character [a-zA-Z]
The * quantifier means match between zero and unlimited times, as many times as possible
Then $ asserts position at the end of the string.
So basically, "Find the last \ then return all letters between this and the end of the string".
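Put together, the regex approach might look like the sketch below. The strrpos()/substr() variant is my own addition for comparison, not something from the answer, and the variable names $viaRegex and $viaSubstr are illustrative.

```php
<?php
$fruitName = 'App\Models\FruitTypes\Apple';

// Four backslashes in a single-quoted PHP string produce the regex \\,
// which matches one literal backslash.
$viaRegex = null;
if (preg_match('/\\\\([[:alpha:]]*)$/', $fruitName, $trimmed) === 1) {
    $viaRegex = $trimmed[1]; // first capture group: the letters after the last backslash
}

// Non-regex alternative: take everything after the last backslash.
$pos = strrpos($fruitName, '\\');
$viaSubstr = $pos === false ? $fruitName : substr($fruitName, $pos + 1);

echo $viaRegex, "\n", $viaSubstr, "\n"; // Apple, Apple
```

The regex version only accepts letters in the final segment; the substr() version keeps whatever follows the last backslash, including digits or underscores.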

Match 2 or more uppercase characters in entire string

I'm trying to create a pattern in PHP that matches 2 or more upper case characters in a string.
I've tried the following, but it only matches 2 or more upper case characters in a row, not the entire string:
preg_match('/[A-Z]{2,}/', $string);
For example, the string "aBcDe" or "Red Apple" should return true.
You just have to allow other characters between your uppercase letters:
^(?:.*?\p{Lu}){2}
Demo
I used \p{Lu} here to include Unicode characters as well. If you don't want that just use [A-Z] instead like you did in your pattern.
This simply means:
^ from the start of the string
(?: group:
.*? match anything, but as few chars as possible
\p{Lu} match an uppercase letter
){2} ... two times
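In PHP this could be wrapped as in the sketch below. The function name hasTwoUppercase is hypothetical; the /u modifier is added so that PCRE treats the pattern and subject as UTF-8 rather than as single bytes.

```php
<?php
// Hypothetical helper; the pattern is the one explained above plus the /u modifier.
function hasTwoUppercase(string $string): bool
{
    // Twice in a row: as few characters as possible, then one uppercase letter.
    return preg_match('/^(?:.*?\p{Lu}){2}/u', $string) === 1;
}

var_dump(hasTwoUppercase('aBcDe'));       // bool(true)
var_dump(hasTwoUppercase('Red Apple'));   // bool(true)
var_dump(hasTwoUppercase('Red apple'));   // bool(false)
var_dump(hasTwoUppercase('École Émile')); // bool(true)
```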
If all you need to do is identify that a string contains at least 2 uppercase characters then you can use the following:
[A-Z].*?[A-Z]
Try it here.
If you need to identify the specific uppercase characters in the string then things get more complicated.
UPDATE: As Lucas mentioned, you need a different regex if you want unicode support.
\p{Lu}.*?\p{Lu}
^.*[A-Z].*[A-Z].*$
A simple pattern stating the same would do. See demo.
https://regex101.com/r/pT4tM5/23
[A-Z].*[A-Z]
is about as simple as it gets - match an uppercase followed by anything repeated any number of times followed by any other uppercase letter.
If you need to match the whole line/string that has at least 2 upper case letters, you can also use
^(?=(?:.*[A-Z]){2}).+$
Demo here.
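If the uppercase letters themselves are needed, and not just a yes/no answer, one option (my suggestion, not taken from the answers above) is to count single-letter matches with preg_match_all() instead of building a larger pattern:

```php
<?php
$string = 'aBcDe';

// preg_match_all returns the number of full matches and fills $matches[0] with them.
$count = preg_match_all('/\p{Lu}/u', $string, $matches);

var_dump($count >= 2); // bool(true)
print_r($matches[0]);  // the uppercase letters found: B and D
```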

REGEX - match words that contain letters repeating next to each other

I'm looking for a regex that matches words in which one or more letters are repeated next to each other.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that can exactly match:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain what regexp I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than a space
* quantifier, zero or more occurrences of \S
(\w) matches a single word character, captures it in \1
(?=\1+) positive lookahead. Asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches anything other than a space
EDIT
If the repeating must be more than once, a slight modification of the regex would do the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
Use this if you want to discard words like apple etc.
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See demo.
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word-boundary \b limit the search to words. Note that the word boundary can be removed, however, it reduces the number of steps to obtain a match (as the lazy quantifier). Note too, that if you are looking for words (in the common meaning), you need to remove the word boundary and to use [a-zA-Z] instead of \w.
(\w)\1{2} checks if a repeated character is present. A word character is captured in group 1 and must be followed with the content of the capture group (the backreference \1).
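Applied to the sentence from the question, the last pattern can be used with preg_match_all() as sketched below. The extra words "An apple too." and the second, looser pattern (a letter repeated at least once) are my additions, included to show the difference between "doubled" and "tripled" letters.

```php
<?php
$str = 'This is an exxxmaple oooonnnnllllyyyyy! An apple too.';

// Words containing a character repeated three times in a row.
preg_match_all('/\b\w*?(\w)\1{2}\w*/', $str, $arr);
print_r($arr[0]); // exxxmaple, oooonnnnllllyyyyy

// Looser variant: words containing any doubled character.
preg_match_all('/\b\w*?(\w)\1\w*/', $str, $doubled);
print_r($doubled[0]); // exxxmaple, oooonnnnllllyyyyy, apple, too
```

$arr[0] holds the full matches (the words); $arr[1] holds the captured repeated character for each word.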

Regex to remove single characters from string

Consider the following strings
breaking out a of a simple prison
this is b moving up
following me is x times better
All strings are lowercased already. I would like to remove any "loose" a-z characters, resulting in:
breaking out of simple prison
this is moving up
following me is times better
Is this possible with a single regex in php?
$str = "breaking out a of a simple prison
this is b moving up
following me is x times better";
$res = preg_replace("#\\b[a-z]\\b ?#i", "", $str);
echo $res;
How about:
preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $string);
Note this also catches single characters that are at the beginning or end of the string, but not single characters that are adjacent to punctuation (they must be surrounded by whitespace).
If you also want to remove characters immediately before punctuation (e.g. 'the x.'), then this should work properly in most (English) cases:
preg_replace('/(^|\s)[a-z]\b/', '$1', $string);
As a one-liner:
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);
This matches a single lowercase letter (\p{Ll}) which is preceded or followed by whitespace (\s), removing both. The word boundaries (\b) ensure that only single letters are indeed matched. The /u modifier makes the regex Unicode-aware.
The result: A single letter surrounded by spaces on both sides is reduced to a single space. A single letter preceded by whitespace but not followed by whitespace is removed completely, as is a single letter only followed but not preceded by whitespace.
So
This a is my test sentence a. o How funny (what a coincidence a) this is!
is changed to
This is my test sentence. How funny (what coincidence) this is!
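The transformation above can be reproduced with a small wrapper such as the following sketch. The function name removeLooseLetters is hypothetical, and falling back to the input when preg_replace() returns null is my own choice.

```php
<?php
// Hypothetical wrapper around the one-liner above.
function removeLooseLetters(string $subject): string
{
    // Remove "space + single lowercase letter" or "single lowercase letter + space".
    // preg_replace returns null on error; fall back to the unchanged input in that case.
    return preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject) ?? $subject;
}

echo removeLooseLetters('This a is my test sentence a. o How funny (what a coincidence a) this is!'), "\n";
// This is my test sentence. How funny (what coincidence) this is!

echo removeLooseLetters('breaking out a of a simple prison'), "\n";
// breaking out of simple prison
```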
You could try something like this:
preg_replace('/\b\S\s\b/', "", $subject);
This is what it means:
\b # Assert position at a word boundary
\S # Match a single character that is a “non-whitespace character”
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
\b # Assert position at a word boundary
Update
As raised by Radu, because I've used the \S this will match more than just a-zA-Z. It will also match 0-9_. Normally, it would match a lot more than that, but because it's preceded by \b, it can only match word characters.
As mentioned in the comments by Tim Pietzcker, be aware that this won't work if your subject string needs to remove single characters that are followed by non-word characters, as in test a (hello). It will also fall over if there are extra spaces after the single character, like this:
test a    hello
but you could fix that by changing the expression to \b\S\s*\b
Try this one:
$sString = preg_replace("#\b[a-z]{1}\b#m", ' ', $sString);
