Regular Expressions inline options - php

I have a string that contains the possible values inline, and I want to show the right value depending on the situation. I'm not very good with regexes and don't really know how to explain my problem, so here is an example. I've almost got it working:
$string = "This [was a|is the] test!";
preg_replace('/\[(.*)\|(.*)\]/', '$1', $string);
// results in "This was a test!"
preg_replace('/\[(.*)\|(.*)\]/', '$2', $string);
// results in "This is the test!"
This works without problems, but when the string contains two bracketed groups it no longer works, because the match runs on to the closing bracket of the last group.
$string = "This [was a|is the] so this is [bullshit|filler] text";
preg_replace('/\[(.*)\|(.*)\]/', '$1', $string);
// results in "This was a|is the] so this is [bullshit text"
preg_replace('/\[(.*)\|(.*)\]/', '$2', $string);
//results in "This filler text"
Situation 1 should show the values between [ and |, and situation 2 should show the values between | and ].

Your problem is regex greediness. Add a ? after .* so that it consumes only the text inside one pair of square brackets:
preg_replace('/\[(.*?)\|(.*?)\]/', '$1', $string);
Alternatively, you could use the /U (ungreedy) modifier. Better yet, use a more specific match in place of the catch-all .*?.
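To make the difference concrete, here is a minimal sketch that runs the greedy pattern, the lazy pattern, and the U-modifier variant against the two-group string from the question. The variable names are only illustrative.

```php
<?php
// Sample text from the question above.
$string = "This [was a|is the] so this is [bullshit|filler] text";

// Greedy: (.*) runs on to the last "|" and the last "]" in the string.
$greedy = preg_replace('/\[(.*)\|(.*)\]/', '$1', $string);

// Lazy: (.*?) stops at the first "|" and the first "]" that allow a match.
$lazy = preg_replace('/\[(.*?)\|(.*?)\]/', '$1', $string);

// The U modifier inverts greediness, so a plain .* behaves lazily here.
$ungreedy = preg_replace('/\[(.*)\|(.*)\]/U', '$1', $string);

echo $greedy, "\n";   // This was a|is the] so this is [bullshit text
echo $lazy, "\n";     // This was a so this is bullshit text
echo $ungreedy, "\n"; // This was a so this is bullshit text
```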

Instead of using:
(.*)
...to match the stuff inside of the options groups, use this:
([^|\]]*)
That pattern matches anything that is not a | or a ], repeatedly.
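As a rough sketch of how the negated character class could be used for both situations, the snippet below wraps it in a small helper. The function name pickOption and its parameters are made up for illustration.

```php
<?php
// Hypothetical helper: keep option 1 or option 2 of every [a|b] group.
function pickOption(string $text, int $which): string
{
    // [^|\]]* cannot run past a "|" or "]", so each match stays inside one group.
    return preg_replace('/\[([^|\]]*)\|([^|\]]*)\]/', '$' . $which, $text);
}

$string = "This [was a|is the] so this is [bullshit|filler] text";
echo pickOption($string, 1), "\n"; // This was a so this is bullshit text
echo pickOption($string, 2), "\n"; // This is the so this is filler text
```

Because the class excludes both delimiters, the pattern does not depend on backtracking to find the right closing bracket.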

You can forbid | characters in your .* by replacing the . with [^|] (which means “any character except |”).
$string = "This [was a|is the] so this is [bullshit|filler] text";
echo preg_replace('/\[([^|]*)\|([^|]*)\]/', '$1', $string);
// results in "This was a so this is bullshit text"
echo '<br />';
echo preg_replace('/\[([^|]*)\|([^|]*)\]/', '$2', $string);
// results in "This is the so this is filler text"

How to match the whole word when we have a sub-string of it?

Here is my code:
$txt = 'this is a text';
$word = 'is';
echo str_replace($word, '<b>'.$word.'</b>', $txt);
//=> th<b>is</b> <b>is</b> a text
As you can see, my substring is "is" in the example above, and it matches only the "is" part of "this". I need to select the whole word instead, so this is the expected result:
//=> <b>this</b> <b>is</b> a text
So I need to check both the left and right side of the substring and match everything up to the start of the string (^), the end of the string ($), or whitespace (\s).
How can I do that?
You can use preg_replace to achieve that with Regex
http://php.net/manual/en/function.preg-replace.php
If you want to match a substring of a word as well as the word itself, you can check for any word characters around the word you're looking for, like so:
$re = '/(\w*is\w*)/';
$str = 'this is a text';
$subst = '<b>$1</b>';
$result = preg_replace($re, $subst, $str);
echo "The result of the substitution is ".$result;
This will give you:
<b>this</b> <b>is</b> a text
Use a regular expression with word boundary anchors:
$regex = '/\b(\p{L}*' . preg_quote($word, '/') . '\p{L}*)\b/u';
echo preg_replace($regex, '<b>$1</b>', $txt);
where \p{L} stands for a Unicode letter (see Unicode character classes). If Unicode is not supported, replace \p{L} with \S (non-space character), for example.
Output
<b>this</b> <b>is</b> a text
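Put together, the word-boundary approach might look like the sketch below. The helper name highlightWords is hypothetical, and the code assumes PCRE was built with Unicode support so that \p{L} and the u modifier are available.

```php
<?php
// Hypothetical helper: wrap every word containing $needle in <b> tags.
function highlightWords(string $text, string $needle): string
{
    // preg_quote() escapes regex metacharacters in the needle;
    // \p{L}* extends the match to the letters around it (needs the u modifier).
    $regex = '/\b(\p{L}*' . preg_quote($needle, '/') . '\p{L}*)\b/u';
    return preg_replace($regex, '<b>$1</b>', $text);
}

echo highlightWords('this is a text', 'is'), "\n";
// <b>this</b> <b>is</b> a text
```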

Twitter handle regular expression PHP [duplicate]

I'm not very good with regular expressions, so I have to ask:
How can I find out with PHP whether a string contains a word starting with #?
For example, I have a string like "This is for #codeworxx".
I'm sorry, but I have no starting point for that :(
Hope you can help.
Thanks,
Sascha
OK, thanks for the answers, but I made a mistake: how do I implement this with eregi_replace?
$text = eregi_replace('/\B#[^\B]+/','\\1', $text);
This does not work. Why? Don't I have to use the same expression as the pattern?
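Match any # that does not directly follow a word character (\B) and is followed by at least one more character. Note that inside a character class \B is not an assertion, so [^\B]+ does not stop at a boundary; \S+ would express "up to the next whitespace" directly: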
$ cat 1812901.php
<?php
echo preg_match("/\B#[^\B]+/", "This should #match it");
echo preg_match("/\B#[^\B]+/", "This should not# match");
echo preg_match("/\B#[^\B]+/", "This should match nothing and return 0");
echo "\n";
?>
$ php 1812901.php
100
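Break your string up like this:
$string = 'simple sentence with five words';
$words = explode(' ', $string);
Then you can loop through the array and check whether the first character of each word equals "#":
foreach ($words as $word) {
    // skip empty strings, which appear when the text contains double spaces
    if ($word !== '' && $word[0] === '#') { echo $word, "\n"; }
}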
Assuming you define a word as a sequence of characters with no whitespace between them, this should be a good starting point for you:
$subject = "This is for #codeworxx";
$pattern = '/#(.+?)(?:\s|$)/';
preg_match($pattern, $subject, $matches);
print_r($matches);
Explanation:
#(.+?)(?:\s|$) - look for a #, then lazily capture everything that follows (letters, digits, anything that is not whitespace) up to the nearest whitespace character (space, tab, newline) or the end of the string. The (?:\s|$) alternative matters here because #codeworxx sits at the very end of the subject, with no whitespace after it.
See the output of the $matches array for accessing the inner groups and the regex results.
OP, there is no need for a regex; plain PHP string functions will do:
$mystr='This is for #codeworxx';
$str = explode(" ",$mystr);
foreach($str as $k=>$word){
if(substr($word,0,1)=="#"){
print $word;
}
}
Just in case this is helpful to someone in the future:
/((?<!\S)#\w+(?!\S))/
This will match any whitespace-delimited word that starts with "#" and otherwise consists of word characters (letters, digits, underscore). It will not match words with "#" anywhere but at the start of the word.
Matching cases:
#username
foo #username bar
foo #username1 bar #username2
Failing cases:
foo#username
#username$
##username
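The matching and failing cases above can be checked with a short sketch built on preg_match_all. The helper name findTags is invented for this example.

```php
<?php
// Hypothetical helper: return every whitespace-delimited "#word" in $text.
function findTags(string $text): array
{
    // (?<!\S) = no non-whitespace before "#"; (?!\S) = none after the word.
    preg_match_all('/(?<!\S)#\w+(?!\S)/', $text, $matches);
    return $matches[0];
}

// Matching cases
echo implode(', ', findTags('foo #username1 bar #username2')), "\n"; // #username1, #username2

// Failing cases: "#" inside a word, trailing "$", doubled "#"
echo count(findTags('foo#username #username$ ##username')), "\n"; // 0
```

To answer the original yes/no question, it is enough to test whether the returned array is empty.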

PHP preg_match & preg_replace outputting wrong

<?php
$string = "[img image:left]1.jpg[/img]Example Text 1[img image:left]2.jpg[/img] Example Text 2";
preg_match("/\[img\s*[^>]+\s*\](.*?)\[\/\s*img\]/i", $string, $match);
$result = preg_replace("/\[img\s*[^>]+\s*\](.*?)\[\/\s*img\]/i", $match['1'], $string);
echo $result;
?>
This code should output 1.jpg, Example Text 1, 2.jpg, Example Text 2.
However, it shows only 2.jpg, Example Text 2.
I don't know what I'm doing wrong.
There are two fundamental issues:
1. You don't need both preg_match() and preg_replace(); preg_replace() alone is enough, because you can reference your capture groups in the replacement.
2. It looks like you copied the pattern from an HTML regex: [^>]+ inside your [img] tag means "one or more non-> characters". Since the string contains no >, it runs across both tags. It should be [^\]]+, "one or more non-] characters".
Final solution:
$string = "[img image:left]1.jpg[/img]Example Text 1[img image:left]2.jpg[/img] Example Text 2";
$string = preg_replace("/\[img\s*[^\]]+\s*\](.*?)\[\/\s*img\]/i", ' \1 ', $string);
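For reference, a self-contained sketch of that fix with its output is shown below. The extra whitespace clean-up step is an addition for readability and is not part of the answer above.

```php
<?php
$string = "[img image:left]1.jpg[/img]Example Text 1[img image:left]2.jpg[/img] Example Text 2";

// [^\]]+ cannot cross a "]", so each match is confined to a single [img ...] tag.
$pattern = '/\[img\s*[^\]]+\s*\](.*?)\[\/\s*img\]/i';

// Replace each tag pair with its captured file name.
$result = preg_replace($pattern, ' $1 ', $string);

// Collapse the extra spaces so the output is easier to read.
$result = trim(preg_replace('/\s+/', ' ', $result));

echo $result, "\n"; // 1.jpg Example Text 1 2.jpg Example Text 2
```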

how could I combine these regex rules?

I'm detecting #replies in a Twitter stream with the following PHP code using regexes.
$text = preg_replace('!^#([A-Za-z0-9_]+)!', '#$1', $text);
$text = preg_replace('! #([A-Za-z0-9_]+)!', ' #$1', $text);
How can I best combine these two rules without false flagging email#domain.com as a reply?
On second thought, not flagging whatever#email means that the character before the # has to be a non-word character, because anything that can be part of a word could belong to an email address. That leads to:
!(^|\W)#([A-Za-z0-9_]+)!
but then you have to use $2 instead of $1.
Since the ^ does not have to stand at the beginning of the RE, you can use grouping and | to combine those REs.
If you don't want to re-insert the whitespace you captured, you have to use a "positive lookbehind":
$text = preg_replace('/(?<=^|\s)#(\w+)/',
'#$1', $text);
or "negative lookbehind":
$text = preg_replace('/(?<!\S)#(\w+)/',
'#$1', $text);
...whichever you find easier to understand.
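The two styles can be compared side by side in a short sketch. The <b> tags in the replacement are only a stand-in so that the effect is visible; the real replacement markup is whatever you need.

```php
<?php
$text = 'foo bar baz-#qux.com bee #def goo#doo #woo';

// Variant 1: capture the prefix (start of string or whitespace) and re-insert it as $1.
$captured = preg_replace('/(^|\s)#(\w+)/', '$1<b>#$2</b>', $text);

// Variant 2: a negative lookbehind asserts the prefix without consuming it.
$lookbehind = preg_replace('/(?<!\S)#(\w+)/', '<b>#$1</b>', $text);

echo $captured, "\n";
echo $lookbehind, "\n";
// Both print: foo bar baz-#qux.com bee <b>#def</b> goo#doo <b>#woo</b>
```

With the lookbehind there is one group fewer to keep track of in the replacement string, which is the main practical difference.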
Here's how I'd do the combination
$text = preg_replace('!(^| )#([A-Za-z0-9_]+)!', '$1#$2', $text);
$text = preg_replace('/(^|\W)#(\w+)/', '$1#$2', $text);
$text = preg_replace('%(?<!\S)#([A-Za-z0-9_]+)%', '#$1', $text);
(?<!\S) loosely translates to "not preceded by a non-whitespace character". It is a sort of double negation, but it also works at the start of the string or line.
This won't consume any preceding character, needs no capturing group for the prefix, and won't match strings such as "foo-#host.com", which is a valid e-mail address.
Tested:
Input = 'foo bar baz-#qux.com bee #def goo#doo #woo'
Output = 'foo bar baz-#qux.com bee #def goo#doo #woo'
Hu, guys, don't push too far... Here it is :
!^\s*#([A-Za-z0-9_]+)!
I think you can use alternation: look for either the beginning of the string or a whitespace character.
'!(?:^|\s)#([A-Za-z0-9_]+)!'
