PHP - How to set regex boundary and not match non alphanumeric char?


Consider the following:
$text = 'c c++ c# and other text';
$skills = array('c','c++','c#','java',...);
foreach ($skill as $skill) {
if (preg_match('/\b'.$skill.'\b/', $text)) {
echo $skill.' is matched';
}
}
In the case of 'c', it matches 'c', 'c#', and 'c++'. I've tried using the assertion (?=\s) or [\s|.] in place of the trailing \b, but I need something that behaves like \b.
I've checked other posts, but none seem to cover this exact situation. Thanks!

The problem is that \b matches between c and + or #. You need something like this:
$text = 'c c++ c# and other text';
$skills = array('c','c++','c#','java');
foreach ($skills as $skill) {
if (preg_match('/(?<=^|\s)'.preg_quote($skill, '/').'(?:\s|$)/', $text)) {
echo $skill.' is matched';
}
}
This matches when the skill is preceded by either the start of the string (^) or a whitespace character, and followed by either a whitespace character or the end of the string ($).
You need to use preg_quote(), like I did above, because c++ contains regex special characters.
Also, note the typo (missing s) in foreach ($skills ... ) in your original code.

Part of the problem is that c++ has regex chars in it. You should use preg_quote() on $skill. Then use your lookbehind and lookahead solution.
The other issue is escaping: PHP also treats \ as an escape character in string literals, so a backslash meant for the regex engine may need to be doubled, especially in double-quoted strings.
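A minimal sketch of the combination described above (preg_quote() plus lookarounds), assuming skills are separated from the surrounding text by whitespace. The function name findSkills is hypothetical. The pattern is built in single-quoted strings, and '/' is passed to preg_quote() so the delimiter would be escaped as well.

```php
<?php
// Hypothetical helper: returns the skills that appear in $text as whole,
// whitespace-delimited tokens.
function findSkills(string $text, array $skills): array
{
    $found = [];
    foreach ($skills as $skill) {
        // (?<!\S): not preceded by a non-space char (start of string or whitespace)
        // (?!\S) : not followed by a non-space char (end of string or whitespace)
        $pattern = '/(?<!\S)' . preg_quote($skill, '/') . '(?!\S)/';
        if (preg_match($pattern, $text)) {
            $found[] = $skill;
        }
    }
    return $found;
}

print_r(findSkills('c c++ c# and other text', ['c', 'c++', 'c#', 'java']));
print_r(findSkills('c++ and java only', ['c', 'c++', 'c#', 'java']));
```

The first call finds c, c++ and c#; the second finds only c++ and java, because the c in c++ is followed by a non-space character. A skill directly followed by punctuation (for example "java.") would not match under this whitespace assumption.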

Related

How to change specific first Letter in string to Capital using PHP?

If the first character of my string contains any of the following letters, then I would like to change the first letter to Uppercase: (a,b,c,d,f,g,h,j,k,l,m,n,o,p,q,r,s,t,v,w,y,z) but not (e,i,u,x).
For example,
luke would become Luke
egg would stay the same as egg
dragon would become Dragon
I am trying to achieve this with PHP, here's what I have so far:
<?php if($str("t","t"))
echo ucfirst($str);
else
echo "False";
?>
My code is simply wrong and it doesn't work and I would be really grateful for some help.

Without regex:
function ucfirstWithCond($str){
$exclude = array('e','i','u','x');
if(!in_array(substr($str, 0, 1), $exclude)){
return ucfirst($str);
}
return $str;
}
$test = "egg";
var_dump(ucfirstWithCond($test)); //egg
$test = "luke";
var_dump(ucfirstWithCond($test)); //Luke
Demo:
http://sandbox.onlinephpfunctions.com/code/c87c6cbf8c616dd76fe69b8f081a1fbf61cf2148

You may use
$str = preg_replace_callback('~^(?![eiux])[a-z]~', function($m) {
return ucfirst($m[0]);
}, $str);
See the PHP demo
The ^(?![eiux])[a-z] regex matches any lowercase ASCII letter at the start of the string except e, i, u and x, and the matched letter is turned to uppercase inside the callback function passed to preg_replace_callback.
If you plan to process each word in a string you need to replace ^ with \b, or - to support hyphenated words - with \b(?<!-) or even with (?<!\S) (to require a space or start of string before the word).
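The per-word variant can be sketched as follows, assuming words are separated by whitespace and only lowercase ASCII letters need handling. The function name ucwordsExcept is hypothetical.

```php
<?php
// Hypothetical helper: capitalizes each whitespace-delimited word unless
// it starts with e, i, u or x.
function ucwordsExcept(string $str): string
{
    return preg_replace_callback(
        // (?<!\S) requires start of string or whitespace before the letter
        '~(?<!\S)(?![eiux])[a-z]~',
        function (array $m): string {
            return strtoupper($m[0]);
        },
        $str
    );
}

echo ucwordsExcept('luke ate an egg under a dragon'), "\n";
// Luke Ate An egg under A Dragon
```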

If the first character could be other than a letter then check with an array range from a-z that excludes e,i,u,x:
if(in_array($str[0], array_diff(range('a','z'), ['e','i','u','x']))) {
$str[0] = ucfirst($str[0]);
}
Probably simpler to just check for the excluded characters:
if(!in_array($str[0], ['e','i','u','x'])) {
$str[0] = ucfirst($str[0]);
}

Expecting output is not displaying from php code

This is the code:
<?php
$pattern =' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$text = "kdaiuyq7e611422^^$^vbnvcn^vznbsjhf";
$text_split = str_split($text,1);
$data = '';
foreach($text_split as $value){
if (preg_match("/".$value."/", $pattern )){
$data = $data.$value;
}
if (!preg_match('/'.$value.'/', $pattern )){
break;
}
}
echo $data;
?>
Current output:
kdaiuyq7e611422^^$^vbnvcn^vznbsjhf
Expected output:
kdaiuyq7e611422
Please help me fix the error in my code. The pattern contains no ^ or $, yet preg_match reports a match, which I don't understand.

Your string $text contains ^, which as a regex matches the beginning of the string $pattern.
So preg_match('/^/', $pattern) returns 1 (a match), and the ^ is appended to $data.
You should escape the ^ so that it is treated as a literal character rather than a special one, as in preg_match('/\^/', $pattern). preg_quote() does this escaping for you.
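A minimal sketch of that fix applied to the original loop, keeping the asker's variable names. The two preg_match() calls are merged into one check, which is an editorial simplification rather than part of the answer.

```php
<?php
$pattern = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$text = 'kdaiuyq7e611422^^$^vbnvcn^vznbsjhf';

$data = '';
foreach (str_split($text) as $value) {
    // preg_quote() turns "^" into "\^" and "$" into "\$", so each
    // character is searched for literally instead of acting as an anchor.
    if (!preg_match('/' . preg_quote($value, '/') . '/', $pattern)) {
        break; // stop at the first character that is not in the allowed list
    }
    $data .= $value;
}

echo $data; // kdaiuyq7e611422
```

Since every check is a literal single-character lookup, a non-regex function such as strpos() would work here as well.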

There is no need to split your string up like that, the whole point of a regular expression is you can specify all the conditions within the expression. You can condense your entire code down to this:
$pattern = '/^[[:word:] ]+/';
$text = 'kdaiuyq7e611422^^$^vbnvcn^vznbsjhf';
preg_match($pattern, $text, $matches);
echo $matches[0];

Kris has accurately isolated that escaping in your method is the monkey wrench. This can be solved with preg_quote() or wrapping pattern characters in \Q ... \E (force characters to be interpreted literally).
Slapping that bandaid on your method (as you have done while answering your own question) doesn't help you to see what you should be doing.
I recommend that you do away with the character mask, the str_split(), and the looped calls of preg_match(). Your task can be accomplished far more briefly/efficiently/directly with a single preg_match() call. Here is the clean way that obeys your character mask fully:
Code: (Demo)
$text = "kdaiuyq7e611422^^$^vbnvcn^vznbsjhf";
echo preg_match('/^[a-z\d ]+/i',$text,$out)?$out[0]:'No Match';
Output:
kdaiuyq7e611422
miknik's method was close to this, but it did not maintain 100% accuracy given your question requirements. I'll explain:
[:word:] is a POSIX Character Class (functioning like \w) that represents letters(uppercase and lowercase), numbers, and an underscore. Unfortunately for miknik, the underscore is not in your list of wanted characters, so this renders the pattern slightly inaccurate and may be untrustworthy for your project.
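The difference only shows up when the input contains an underscore. A small sketch with a hypothetical input string:

```php
<?php
// Hypothetical input containing an underscore, which is not in the asker's list.
$text = 'abc_123^xyz';

preg_match('/^[[:word:] ]+/', $text, $posix);  // [:word:] includes "_"
preg_match('/^[a-z\d ]+/i', $text, $strict);   // letters, digits and space only

echo $posix[0], "\n";  // abc_123
echo $strict[0], "\n"; // abc
```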

PHP preg_match to allow only numbers,spaces '+' and '-'

I need to check to see if a variable contains anything OTHER than 0-9 and the "-" and the "+" character and the " "(space).
The preg_match I have written does not work. Any help would be appreciated.
<?php
$var="+91 9766554433";
if(preg_match('/[0-9 +\-]/i', $var))
echo $var;
?>

You have to add * as a quantifier to the whole character class and add anchors to the start and end of the regex: ^ and $ mean that the subject must contain nothing but the inner regex from start to end. Also, the i modifier is unnecessary since there is no need for case-insensitivity in this regex.
This should do the work.
if(!preg_match('/^[0-9 +-]*$/', $var)){
//variable contains char not allowed
}else{
//variable only contains allowed chars
}

Just negate the character class:
if ( preg_match('/[^0-9 +-]/', $var) )
echo $var;
or add anchors and quantifier:
if ( preg_match('/^[0-9 +-]+$/', $var) )
echo $var;
The case insensitive modifier is not mandatory in your case.
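The anchored form can be wrapped in a small helper, sketched here under the assumption that an empty string should be rejected. The function name hasOnlyPhoneChars is hypothetical, and \z is used in place of $ because $ also matches just before a trailing newline.

```php
<?php
// Hypothetical helper: true when $value consists only of digits, spaces, "+" and "-".
function hasOnlyPhoneChars(string $value): bool
{
    // \z anchors at the very end of the string.
    return preg_match('/^[0-9 +-]+\z/', $value) === 1;
}

var_dump(hasOnlyPhoneChars('+91 9766554433')); // bool(true)
var_dump(hasOnlyPhoneChars('+91 97665x4433')); // bool(false)
var_dump(hasOnlyPhoneChars("12345\n"));        // bool(false)
```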

You can try regex101.com to test your regex to match your criteria and then on the left panel, you'll find code generator, which will generate code for PHP, Python, and Javascript.
$re = "/^[\\d\\s\\+\\-]+$/i";
$str = "+91 9766554433";
preg_match($re, $str, $matches);
You can take a look here.

Try this and see if it works. I haven't gotten around to testing it beforehand, so I apologize if it doesn't work.
if(!preg_match('/^[0-9 +-]*$/', $var)){
//variable contains char not allowed
}else{
//variable only contains allowed chars
}

PHP Regex: Remove words less than 3 characters

I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);

Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
In the demo, see the substitutions at the bottom.
Explanation
\b is a word boundary that matches a position where one side is a word character (a letter, digit or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
Reference
Word Boundaries

You can use the word boundary assertion \b:
Replace: \b[a-z]{1,2}\b with ''
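Applied to the sample string from the question, that replacement might look like the sketch below. Adding \s* to consume the whitespace after each removed word is an assumption made to avoid leftover double spaces.

```php
<?php
$text = 'an of and then some an ee halved or or whenever';

// Drop words of one or two letters together with any whitespace after them.
$result = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $text);

echo trim($result); // and then some halved whenever
```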

Use this
preg_replace('/(\b.{1,2}\s)/','',$your_string);

Although some of the solutions here worked, they had a problem with my language's multi-character letters, such as "ch". A simple explode and implode worked for me.
$maxWordLength = 3;
$string = "my super string";
$exploded = explode(" ", $string);
foreach($exploded as $key => $word) {
if(mb_strlen($word) < $maxWordLength) unset($exploded[$key]);
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"

To me, it seems that this works fine with most PHP versions:
$string2 = preg_replace("/\b[a-zA-Z0-9]{1,2}\b/i", "", trim($string1));
Where [a-zA-Z0-9] are the accepted Char/Number range.

remove all punctuation except hyphen & underscore of a string

I want to remove all punctuation from the beginning and end of a string, except hyphens and underscores.
Example: if the input is spice-b32. or lg_b32; then the string after using preg_replace() should be spice-b32 and lg_b32 respectively.
I also tried preg_match('/^[A-Za-z0-9]/',$inm) for data validation and $inm=preg_replace('/^\PL+|\PL\z/','',$inm); for the cleanup, but when the input is a!-read_ the result is a!-read
but output should be: a-read
If this preg_replace() or preg_match() is not correct, please help.

If I understand correctly what you want, then something like this will do for you:
$inm=preg_replace('/[,.!?]*([-_]+)[,.!?]*/',
'\1',
preg_replace('/\b[.,?!]+|[.,!?]+\b/', '', $inm));
Feel free to add other characters that need to be stripped off to the character groups.

How about
$arr = array('spice-b32.', 'lg_b32;', 'a!-read_');
foreach ($arr as $str) {
echo preg_replace('/^[^\P{P}_-]+|[^\P{P}_-]+$/u', '', $str),"\n";
}
This will remove all punctuation (except _ and -) from the beginning or end of a string.
output:
spice-b32
lg_b32
a!-read_
