PHP capture word that contains special char from string using RegEx

I have special words in a string that I would like to capture based on their prefix.
Example: Special words such as ^to_this should be caught.
I need the word this, because it follows the special prefix ^to_.
Here is my attempt, but it is not working:
preg_match('/\b(\w*^to_\w*)\b/', $str, $specialWordArr);
It returns an empty array.

Your code would be:
<?php
$mystring = 'Special words such as ^to_this should be caught';
$regex = '~[_^;]\w+[_^;](\w+)~';
if (preg_match($regex, $mystring, $m)) {
    $yourmatch = $m[1];
    echo $yourmatch; // => this
}
Explanation:
[_^;] Put the special characters in this character class so that the word must begin with one of them.
\w+ The special character must be followed by one or more word characters.
[_^;] Those word characters must be followed by another special character.
(\w+) If these conditions are met, the following one or more word characters are captured into group 1.

Without some additional examples this will work for what you've posted:
$str = 'Special words such as ^to_this should be caught';
preg_match('/\s\^to_(\w+)\s/', $str, $specialWordArr);
echo $specialWordArr[1]; //this
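The pattern above needs whitespace on both sides of the word, so it would miss a special word at the very start or end of the string, or one followed by punctuation. Here is a minimal sketch of a variant that avoids this, assuming the prefix is always the literal ^to_ and that every such word is wanted. The sample string and the $words variable are made up for illustration.

```php
<?php
// Hypothetical sample string with the prefix at the start, middle, and end.
$str = '^to_start is first, then ^to_this, and finally ^to_end';

// (?<!\w) : the prefix must not be glued to a preceding word character
// \^to_   : the literal prefix (the caret is escaped so it is not an anchor)
// (\w+)   : capture the word that follows the prefix
preg_match_all('/(?<!\w)\^to_(\w+)/', $str, $m);

$words = $m[1];
echo implode(', ', $words); // start, this, end
```

preg_match_all collects every match, whereas preg_match stops after the first one.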

Related

PHP Extract Specific Character from string

I have the string below:
$LINE = "TCNU1573105 HDPE HTA108 155 155 000893520918 PAL990 25.2750 MT 28.9750 MT";
and I want to extract PAL990 from it. More precisely, I want to extract PAL990 or any string that consists of PAL followed by some digits, like PAL222 or PAL123.
I tried many ways and could not get the result. I used
substr($LINE, 77, 3)
but when the value is at a different position I get the wrong value.
You may use
$LINE = "TCNU1573105 HDPE HTA108 155 155 000893520918 PAL990 25.2750 MT 28.9750 MT";
if (preg_match('~\bPAL\d+\b~', $LINE, $res)) {
echo $res[0]; // => PAL990
}
See the PHP demo and this regex demo.
Details
\b - a word boundary
PAL - a PAL substring
\d+ - 1+ digits
\b - a word boundary.
The preg_match function will return the first match.
Note that if your string can contain similar substrings delimited by hyphens or other non-word characters, word boundaries will match those too. In that case, use custom whitespace boundaries instead:
'~(?<!\S)PAL\d+(?!\S)~'
See this regex demo
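To make the difference between the two boundary styles concrete, here is a small sketch. The input line is hypothetical (it is not from the question) and assumes PAL codes may also appear attached to hyphens.

```php
<?php
// Hypothetical line in which PAL codes also appear glued to hyphens.
$line = 'X-PAL123 HDPE PAL990 PAL77-B 25.2750 MT';

// \b treats the hyphen as a boundary, so the hyphenated code is found first.
preg_match('~\bPAL\d+\b~', $line, $byWordBoundary);

// (?<!\S) and (?!\S) demand whitespace (or the string edge) on both sides.
preg_match('~(?<!\S)PAL\d+(?!\S)~', $line, $byWhitespace);

echo $byWordBoundary[0], "\n"; // PAL123
echo $byWhitespace[0], "\n";   // PAL990
```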
EDIT
If you may have an optional whitespace between PAL and digits, you may use
preg_replace('~.*\b(PAL)\s?(\d+)\b.*~s', '$1$2', $LINE)
See this PHP demo and this regex demo.
Or, match the string you need with spaces, and then remove them:
if (preg_match('~\bPAL ?\d+\b~', $LINE, $res)) {
echo str_replace(" ", "", $res[0]);
}
See yet another PHP demo
Note that ? makes the preceding pattern optional (1 or 0 occurrences are matched).
$string = "123ABC1234 *$%^&abc.";
$newstr = preg_replace('/[^a-zA-Z\']/','',$string);
echo $newstr;
Output: ABCabc

Regex to match words starting with hyphen

I have a regex that matches every case except one. The PHP code for the word match is:
$string = preg_replace("/\b".$wordToMatch."\b/","<span class='sp_err' style='background-color:yellow;'>".$wordToMatch."</span>",$string);
When $wordToMatch is "-abc" and $string is "The word -abc should match and abc-abc should not match", this regex fails to match "-abc".
I want to improve the regex so that it matches the standalone "-abc" in $string, but not the "-abc" inside "abc-abc".
In case your keywords can have non-word characters on both ends you can rely on lookarounds for a whole word match:
"/(?<!\\w)".$wordToMatch."(?!\\w)/"
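Here, the negative lookbehind (?<!\w) makes sure there is no word character before the word to match, and the negative lookahead (?!\w) makes sure there is no word character after it. These subpatterns are unambiguous, while the meaning of \b depends on context: in front of -abc, \b requires a word character before the hyphen, which is why the original pattern misses the standalone -abc and matches the one inside abc-abc.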
See regex demo showing that -abc is not matched in abc-abc and matches if it is not enclosed with word characters.
PHP demo:
$wordToMatch = "-abc";
$re = "/(?<!\\w)" . $wordToMatch . "(?!\\w)/";
$str = "abc-abc -abc";
$subst = "!$0!";
$result = preg_replace($re, $subst, $str);
echo $result; // => abc-abc !-abc!
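One caveat with concatenating $wordToMatch directly into the pattern: if the word contains regex metacharacters (a dot, a plus sign, a slash), they are interpreted as regex syntax. Below is a sketch of a safer wrapper, assuming the keyword should always be matched literally. highlightWord is a hypothetical helper name, and the <mark> tag stands in for the span used in the question.

```php
<?php
// Hypothetical helper: wraps every whole-word occurrence of $word in <mark>.
function highlightWord(string $text, string $word): string
{
    // preg_quote() escapes regex metacharacters; the second argument also
    // escapes the delimiter used by this pattern.
    $pattern = '/(?<!\w)' . preg_quote($word, '/') . '(?!\w)/';

    // $0 re-inserts the matched text, so $word never has to be placed in
    // the replacement string.
    return preg_replace($pattern, '<mark>$0</mark>', $text);
}

echo highlightWord('The word -abc should match and abc-abc should not match', '-abc');
// The word <mark>-abc</mark> should match and abc-abc should not match
```

Using $0 in the replacement also sidesteps the case where the keyword itself contains a dollar sign or backslash, which preg_replace would otherwise treat as a backreference.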

Replace all the first character of words in a string using preg_replace()

I have the following string:
This is a sample text. This text will be used as a dummy for "various" RegEx "operations" using PHP.
I want to select and replace the first character of each word (in the example: T,i,a,s,t,T,t,w,b,u,a,d,f,",R,",u,P). How do I do it?
I tried /\b.{1}\w+\b/. I read the expression as "select any character that has length of 1 followed by a word of any length", but it didn't work.
You may try this regex as well:
(?<=\s|^)([a-zA-Z"])
Demo
Your regex, /\b.{1}\w+\b/, starts at a word boundary, matches any one character there (which can even be a space or a punctuation mark if a word character precedes it), and then matches one or more word characters (\w+) up to the next word boundary. So it matches whole words rather than only their first characters.
That \b. is the culprit here.
If you plan to match any non-whitespace character that is preceded by whitespace or sits at the start of the string, you can just use
/(?<!\S)\S/
Or
/(?<=^|\s)\S/
See demo
Then, replace with any symbol you need.
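As a quick illustration of the lookbehind approach, here is a sketch that replaces the first character of every word with an underscore. The sample text and the underscore are arbitrary choices for the example.

```php
<?php
$text = 'This is a "sample" text.';

// (?<!\S) : not preceded by a non-whitespace character, i.e. a word start
// \S      : the first non-whitespace character of the word
$masked = preg_replace('/(?<!\S)\S/', '_', $text);

echo $masked; // _his _s _ _sample" _ext.
```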
You may try the following regex:
(.)[^\s]*\s?
Use preg_match_all and implode the captured group 1:
<?php
// Note the trailing space after "for": without it the two pieces run together.
$string = 'This is a sample text. This text will be used as a dummy for '
    . '"various" RegEx "operations" using PHP.';
$pattern = '/(.)[^\s]*\s?/';
$matches = [];
preg_match_all($pattern, $string, $matches);
$output = implode('', $matches[1]);
echo $output; // Output is TiastTtwbuaadf"R"uP
To replace, use something like preg_replace_callback:
$pattern = '/(.)([^\s]*\s?)/';
$output2 = preg_replace_callback($pattern,
    function ($match) { return '_' . $match[2]; }, $string);
// result: _his _s _ _ample _ext. _his _ext _ill _e _sed _s _ _ummy _or _various" _egEx _operations" _sing _HP.

Make regex more specific - only select "VB", not variations ("VB%")

Consider the following string and regex:
$string = "just/RB convinced/VBN closing/VBG 10dma/NN need/VBN see/VB";
echo preg_replace("/(\w+)\/(JJ|RB|VB)/", "not$1/$2", $string);
I want to concatenate "not" to every word ending in /JJ, /RB or /VB. However, the regex also captures variations on /VB: /VBN and /VBG. The output is
notjust/RB notconvinced/VBN notclosing/VBG 10dma/NN notneed/VBN notsee/VB
However, the expected output is:
notjust/RB convinced/VBN closing/VBG 10dma/NN need/VBN notsee/VB
How can I stop the regex from grabbing these variations?
Use a \b word boundary:
$string = "just/RB convinced/VBN closing/VBG 10dma/NN need/VBN see/VB";
echo preg_replace("/(\w+)\/(JJ|RB|VB)\b/", "not$1/$2", $string);
\b matches only at a position between a word character (letter, digit, or underscore) and either a non-word character or the start/end of the string, so VB followed by N or G no longer matches.
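If the list of tags changes often, the same idea can be wrapped in a small function. This is only a sketch: negateTagged is a hypothetical helper name, and it assumes the tags are plain strings that should be matched literally.

```php
<?php
// Hypothetical helper: prefixes "not" to every word tagged with one of $tags.
function negateTagged(string $text, array $tags): string
{
    // Quote each tag so it is matched literally, then join into an alternation.
    $alternation = implode('|', array_map(
        function ($tag) {
            return preg_quote($tag, '/');
        },
        $tags
    ));

    // The trailing \b rejects longer tags such as VBN or VBG.
    return preg_replace('/(\w+)\/(' . $alternation . ')\b/', 'not$1/$2', $text);
}

$string = 'just/RB convinced/VBN closing/VBG 10dma/NN need/VBN see/VB';
echo negateTagged($string, ['JJ', 'RB', 'VB']);
// notjust/RB convinced/VBN closing/VBG 10dma/NN need/VBN notsee/VB
```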

Replace symbol if it is preceded and followed by a word character

I want to change a specific character only if the characters before and after it are both English letters. In other words, the target character is inside a word, not at its start or end.
For example:
$string = "I am learn*ing *PHP today*";
I want this string to be converted as follows:
$newString = "I am learn'ing *PHP today*";
$string = "I am learn*ing *PHP today*";
$newString = preg_replace('/(\w)\*(\w)/', '$1\'$2', $string);
// $newString = "I am learn'ing *PHP today*"
This will match an asterisk surrounded by word characters (letters, digits, underscores). If you only want to match letters, you can use:
preg_replace('/([a-zA-Z])\*([a-zA-Z])/', '$1\'$2', 'I am learn*ing *PHP today*');
The most concise way is to use word boundaries (\b) in your pattern. A word boundary is a zero-width position between a word character and a non-word character. Since * is a non-word character, \b\*\b requires both neighboring characters to be word characters.
No capture groups, no references.
Code: (Demo)
$string = "I am learn*ing *PHP today*";
echo preg_replace('~\b\*\b~', "'", $string);
Output:
I am learn'ing *PHP today*
To match the asterisk only when it is surrounded by letters, use [a-z] as a character range with the i flag to make the regex case-insensitive. Since the character you want to replace is an asterisk, you also need to escape it with a backslash, because an unescaped asterisk means "match zero or more times" in a regular expression.
$newstring = preg_replace('/([a-z])\*([a-z])/i', "$1'$2", $string);
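One subtlety of the capture-group patterns: the letters on both sides are consumed by the match, so when two asterisks share a neighboring letter, the second one is skipped. The sketch below contrasts that with a lookaround version, which only inspects the neighbors. The input a*b*c is a made-up edge case, not from the question.

```php
<?php
// Hypothetical input with two asterisks that share the middle letter.
$text = 'a*b*c';

// Capture groups consume the letter "b", so the second asterisk is skipped.
$consuming = preg_replace('/([a-z])\*([a-z])/i', "$1'$2", $text);

// Lookarounds only inspect the neighbors, so every asterisk is checked.
$lookaround = preg_replace('/(?<=[a-z])\*(?=[a-z])/i', "'", $text);

echo $consuming, "\n";  // a'b*c
echo $lookaround, "\n"; // a'b'c
```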
To replace every asterisk that is surrounded by word characters (the asterisk must be escaped):
$string = preg_replace('/(\w)\*(\w)/', '$1\'$2', $string);
And to replace the asterisks at the start and end of a word:
$string = preg_replace('/\*(\w+)\*/', '\'$1\'', $string);
