I have a string :
$s = "I am not foo+bar";
I want to remove the first portion of $s, from the beginning of the string up to and including "foo+", so that "I am not foo+bar" becomes:
$s == "bar"
How can I achieve that with PHP?
Edit : I have a "+" sign inside the string. Why is preg_replace not replacing it? The pattern that I've used is /^(.*?\bfoo+)\b/. Any ideas?
You should be able to use a regex to find everything up until a certain word. For your example,
/^(.*?\bfoo)\b/
This should work with preg_replace. The ^ makes sure we start at the beginning of the string. .*? lazily matches anything (excluding newlines; add the s modifier to allow newlines as well) up to the first foo.
Simply put: \b allows you to perform a "whole words only" search using a regular expression in the form of \bword\b. A "word character" is a character that can be used to form words. All characters that are not "word characters" are "non-word characters".
-http://www.regular-expressions.info/wordboundaries.html
Regex demo: https://regex101.com/r/gJ3nS7/3
Rough untested replacement example using preg_quote.
$s = preg_replace('/^(.*?\b' . preg_quote('foo', '/') . ')\b/', '', $s);
Longer example: the + is a special character (a quantifier), so it has to be escaped as \+. It is also a non-word character, so a \b placed right after it only matches when the next character is a word character. You can put the \+ into an alternation with the word boundary, and that should work.
https://regex101.com/r/gJ3nS7/5
/^(.*?\bfoo(?:\+|\b))/
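As a rough sketch of how this could be used from PHP: the first call applies the pattern above as-is, and the second wraps the same idea in a helper. The helper name stripThrough() is hypothetical (it is not from the question), and it assumes the search word should only be wrapped in \b on the sides where it starts or ends with a word character.

```php
<?php
$s = 'I am not foo+bar';

// The pattern from the answer, applied directly.
echo preg_replace('/^(.*?\bfoo(?:\+|\b))/', '', $s), PHP_EOL; // bar

// Hypothetical helper: remove everything up to and including the first $word.
function stripThrough(string $s, string $word): string
{
    // preg_quote() escapes regex metacharacters such as "+" inside $word.
    $quoted = preg_quote($word, '/');
    // \b is only useful next to a word character, so add it conditionally.
    $lead  = preg_match('/^\w/', $word) ? '\b' : '';
    $trail = preg_match('/\w$/', $word) ? '\b' : '';
    return preg_replace('/^.*?' . $lead . $quoted . $trail . '/s', '', $s, 1);
}

echo stripThrough($s, 'foo+'), PHP_EOL; // bar
echo stripThrough($s, 'foo'), PHP_EOL;  // +bar
```

If the word does not occur, preg_replace finds nothing to replace and the string comes back unchanged.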
Related
I have a regular expression to escape all special characters in a search string. This works great, however I can't seem to get it to work with word boundaries. For example, with the haystack
add +
or
add (+)
and the needle
+
the regular expression /\+/gi matches the "+". However the regular expression /\b\+/gi doesn't. Any ideas on how to make this work?
Using
add (plus)
as the haystack and /\bplus/gi as the regex, it matches fine. I just can't figure out why the escaped characters are having problems.
\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:
add +
...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
Try changing it to:
/\b\s?\+/gi
Edit:
Extend this concept as far as you want. If you want the first + after any word boundary:
/\b[^+]*\+/gi
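To see where the boundaries actually fall, here is a small sketch. It is written in PHP to stay consistent with the other threads on this page, although the question's /gi flags suggest JavaScript; PHP has no g flag, so the flags are dropped. The helper name wordBoundaryOffsets() is made up for illustration, and the + is written escaped as \+.

```php
<?php
// Hypothetical helper: list the offsets of every word boundary in $subject.
function wordBoundaryOffsets(string $subject): array
{
    preg_match_all('/\b/', $subject, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[0] is [matched text, offset]; keep only the offsets.
    return array_column($m[0], 1);
}

print_r(wordBoundaryOffsets('add +')); // offsets 0 and 3, none next to the "+"

var_dump(preg_match('/\b\+/', 'add +'));        // int(0): no boundary directly before "+"
var_dump(preg_match('/\b\s?\+/', 'add +'));     // int(1): boundary, optional space, then "+"
var_dump(preg_match('/\b\s?\+/', 'add (+)'));   // int(0): the "(" gets in the way
var_dump(preg_match('/\b[^+]*\+/', 'add (+)')); // int(1): skips anything that is not "+"
```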
Boundaries are very conditional assertions; what they anchor depends on what they touch. See this answer for a detailed explanation, along with what else you can do to deal with it.
I have a website where users can have custom actions when a keyword is detected in a sentence. How I currently do matches is like the following:
$output = array();
preg_match('/\b' . $keyword . '\b/', $phrase, $output);
If I find a match (if(count($output) > 0) {), then the custom action is run. This is for spoken sentences, so it is for things like operator. We also have a custom one called [silence], so when silence is detected it runs an action.
However, when the keyword contains brackets, for example [silence], the regex fails because of the square brackets. I have tried escaping both, like \b\[silence\]\b, but this does not detect a match.
Also this is in PHP
Thanks in advance,
Joe
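The "word boundary" expression \b only matches between a word character and a non-word character (or at the start or end of the string next to a word character). [ and ] are not word characters, so \b\[ can only match when the bracket is directly preceded by a letter, digit or underscore, which is not the case after a space or at the start of the phrase.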
From Regex tutorial :
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
Simply put: \b allows you to perform a “whole words only” search using a regular expression in the form of \bword\b. A “word character” is a character that can be used to form words. All characters that are not “word characters” are “non-word characters”.
So you need to "rewrite" the \b expression that can suit your need, like :
(?<=[\s\.,;])\[silence\](?=[\s\.,;])
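First comes a lookbehind for a "delimiter character" (space, dot, comma, ...; you probably need to add a few more), which is checked but not included in the match, followed by your expression, followed by a lookahead for a delimiter character. Note that, as written, both lookarounds require a delimiter to be present, so the keyword will not match at the very start or very end of the phrase.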
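A variation on the same idea, as a sketch only: instead of listing the allowed delimiters, negative lookarounds can require that the keyword is not touching a word character. This is a swapped-in technique, not the pattern above; it also matches at the start and end of the phrase. The function name phraseContainsKeyword() is hypothetical, and preg_quote() is used so that brackets in the keyword are escaped automatically.

```php
<?php
// Hypothetical helper: does $phrase contain $keyword as a standalone token?
function phraseContainsKeyword(string $phrase, string $keyword): bool
{
    // (?<!\w) : not preceded by a word character
    // (?!\w)  : not followed by a word character
    $pattern = '/(?<!\w)' . preg_quote($keyword, '/') . '(?!\w)/';
    return preg_match($pattern, $phrase) === 1;
}

var_dump(phraseContainsKeyword('please hold [silence] thank you', '[silence]')); // bool(true)
var_dump(phraseContainsKeyword('[silence]', '[silence]'));                       // bool(true)
var_dump(phraseContainsKeyword('call the operator.', 'operator'));               // bool(true)
var_dump(phraseContainsKeyword('two cooperators', 'operator'));                  // bool(false)
```

For plain keywords such as operator this behaves like \boperator\b, so one pattern can serve both kinds of keyword.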
I have been looking around and googling about regex and I came up with this to make sure some variable has letters in it (and nothing else).
/^[a-zA-Z]*$/
In my mind ^ denotes the start of the string, the token [a-zA-Z] should in my mind make sure only letters are allowed. The star should match everything in the token, and the $-sign denotes the end of the string I'm trying to match.
But it doesn't work, when I try it on regexr it doesn't work sadly. What's the correct way to do it then? I would also like to allow hyphen and spaces, but figured just starting with letters are good enough to start with and expand.
Short answer : this is what you are looking for.
/^[a-zA-Z]+$/
The star * quantifier means "zero or more", meaning your regexp will match every time, even with an empty string as subject. You need the + quantifier instead (meaning "one or more") to achieve what you need.
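A quick way to check the difference, assuming the patterns are run through PHP's preg_match (the variable names are only for illustration):

```php
<?php
$star = '/^[a-zA-Z]*$/'; // zero or more letters
$plus = '/^[a-zA-Z]+$/'; // one or more letters

var_dump(preg_match($star, ''));      // int(1): the empty string is accepted
var_dump(preg_match($plus, ''));      // int(0): at least one letter is required
var_dump(preg_match($plus, 'TeSt'));  // int(1)
var_dump(preg_match($plus, 'Te4St')); // int(0): the digit is rejected
```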
If you also want to match at least one character which could also be a whitespace or a hyphen you could add those to your character class ^[A-Za-z -]+$ using the plus + sign for the repetition.
If you want to use preg_match to match at least one character which can contain an upper or lowercase character, you could shorten your pattern to ^[a-z]+$ and use the i modifier to make the regex case insensitive. To also match a hyphen and a whitespace, this could look like ^[a-z -]+$
For example:
$strings = [
"TeSt",
"Te s-t",
"",
"Te4St"
];
foreach ($strings as $string) {
if(preg_match('#^[a-z -]+$#i', $string, $matches)){
echo $matches[0] . PHP_EOL;
}
}
That would result in:
TeSt
Te s-t
Could you help me with PHP function/regex that in given text finds all words starting with character ":" ?
..in other words all substrings that start with ":" and are separated with " " (a space)
Since :word should probably be valid, and I guess :word:another should be considered two words, then you cannot say that there is always a space.
Words in natural languages can be followed by dots and other characters.
In digital input, they can be followed by end of line.
I suggest using this regexp:
~:\w+~
It takes any : character followed by at least one word character (letter, digit or underscore) and ends at the first character that is not a word character.
Example: on RegExr.com
You can also try ~:\w+\b~, where \b is a word boundary (here, the end of the word), but I don't think it is necessary, because the greedy \w+ already stops at the end of the word.
Note: \w stands for [a-zA-Z0-9_] meaning it catches underscores _ and digits 0-9 as well. It works pretty much like variable/function naming in PHP
EDIT (some notes on usage):
You said that in given text (I understand that like input with random things) you want to extract all words prepended with :, for example :word. To do that easily, you should use preg_match_all() function with PREG_PATTERN_ORDER flag.
Example:
$regex = '~(:\w+)~';
if (preg_match_all($regex, $input, $matches, PREG_PATTERN_ORDER)) {
foreach ($matches[1] as $word) {
echo $word .'<br/>';
}
}
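The same call can be wrapped so that it returns the words instead of printing them. This is only a sketch: the function name extractColonWords() and the sample input are made up for illustration.

```php
<?php
// Hypothetical helper: return every ":word" token found in $input.
function extractColonWords(string $input): array
{
    preg_match_all('~:\w+~', $input, $matches);
    // $matches[0] holds the full matches (PREG_PATTERN_ORDER is the default).
    return $matches[0];
}

print_r(extractColonWords('see :word:another, then :x_1. and a lone : here'));
// :word, :another and :x_1 are found; the lone ":" is not, because \w+ needs at least one character
```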
regex: /:\w+\s/g
\w Matches any word character
\s Matches a whitespace character
This would work:
// PHP has no g modifier; preg_match_all() finds every match instead.
preg_match_all('/:\w+\s/', $var, $matches);
Sorry, because I don't use PHP. But I suppose that your problem is that PHP would have reserved the character ":" for some reason in its regex implementation ?
Well, in that case, you still can catch any word beginning with ":" and ending with some space this way:
(...)
match('\x3A\S*\s');
("3A" is hexadecimal value for 58, which is the ASCII code for ":")
This should work, I think...
Consider the following strings
breaking out a of a simple prison
this is b moving up
following me is x times better
All strings are lowercased already. I would like to remove any "loose" a-z characters, resulting in:
breaking out of simple prison
this is moving up
following me is times better
Is this possible with a single regex in php?
$str = "breaking out a of a simple prison
this is b moving up
following me is x times better";
$res = preg_replace("#\\b[a-z]\\b ?#i", "", $str);
echo $res;
How about:
preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $string);
Note this also catches single characters that are at the beginning or end of the string, but not single characters that are adjacent to punctuation (they must be surrounded by whitespace).
If you also want to remove characters immediately before punctuation (e.g. 'the x.'), then this should work properly in most (English) cases:
preg_replace('/(^|\s)[a-z]\b/', '$1', $string);
As a one-liner:
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);
This matches a single lowercase letter (\p{Ll}) which is preceded or followed by whitespace (\s), removing both. The word boundaries (\b) ensure that only single letters are indeed matched. The /u modifier makes the regex Unicode-aware.
The result: A single letter surrounded by spaces on both sides is reduced to a single space. A single letter preceded by whitespace but not followed by whitespace is removed completely, as is a single letter only followed but not preceded by whitespace.
So
This a is my test sentence a. o How funny (what a coincidence a) this is!
is changed to
This is my test sentence. How funny (what coincidence) this is!
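The transformation above can be reproduced with a few lines (variable names are only for illustration):

```php
<?php
$subject = 'This a is my test sentence a. o How funny (what a coincidence a) this is!';

// Remove a lone lowercase letter together with one adjacent whitespace character.
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);

echo $result, PHP_EOL;
// This is my test sentence. How funny (what coincidence) this is!
```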
You could try something like this:
preg_replace('/\b\S\s\b/', "", $subject);
This is what it means:
\b # Assert position at a word boundary
\S # Match a single character that is a “non-whitespace character”
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
\b # Assert position at a word boundary
Update
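As raised by Radu, because I've used \S this will match more than just a-zA-Z. It will also match 0-9 and _. Because it is preceded by \b it mostly matches word characters, but it can also match a punctuation character that directly follows a word (in "test. x" the pattern matches ". "). Use \w or [a-z] instead of \S if that matters.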
As mentioned in the comments by Tim Pietzcker, be aware that this won't work if your subject string needs to remove single characters that are followed by non word characters like test a (hello). It will also fall over if there are extra spaces after the single character like this
test a    hello
but you could fix that by changing the expression to \b\S\s*\b
Try this one:
$sString = preg_replace("#\b[a-z]{1}\b#m", ' ', $sString);