I need a regex to check whether a string contains only numbers, letters, hyphens or underscores.
$string1 = "This is a string*";
$string2 = "this_is-a-string";
if(preg_match('******', $string1)){
echo "String 1 not acceptable";
// String2 acceptable
}
Code:
if(preg_match('/[^a-z_\-0-9]/i', $string))
{
echo "not valid string";
}
Explanation:
[] => character class definition
^ => negate the class
a-z => chars from 'a' to 'z'
_ => underscore
- => hyphen '-' (escaped as \- here, because it sits between other characters in the class)
0-9 => numbers (from zero to nine)
The 'i' modifier at the end of the regex makes the match case-insensitive. Without it, you would need to add the upper-case range A-Z to the character class yourself.
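A minimal sketch of the negated-class check above, wrapped in a helper. The function name isValidIdentifier is hypothetical, and rejecting the empty string is an added assumption: the negated class on its own accepts it, because an empty string contains no disallowed character.

```php
<?php
// Hypothetical helper: true when $s holds only letters, digits, hyphens or underscores.
function isValidIdentifier(string $s): bool
{
    // Assumption: an empty string is not acceptable.
    if ($s === '') {
        return false;
    }
    // preg_match returns 1 as soon as one disallowed character is found.
    return preg_match('/[^a-z_\-0-9]/i', $s) === 0;
}

var_dump(isValidIdentifier('This is a string*')); // bool(false)
var_dump(isValidIdentifier('this_is-a-string'));  // bool(true)
```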
if(!preg_match('/^[\w-]+$/', $string1)) {
echo "String 1 not acceptable";
// String2 acceptable
}
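One caveat with the anchored pattern, shown as a small sketch: by default $ also matches just before a trailing newline, so a subject such as "abc\n" passes. Using \z, or adding the D modifier, closes that gap. The variable name is illustrative only.

```php
<?php
// "$" also matches just before a final newline, so this subject passes.
$subject = "abc\n";
var_dump(preg_match('/^[\w-]+$/', $subject));  // int(1)

// "\z" matches only at the very end of the subject.
var_dump(preg_match('/^[\w-]+\z/', $subject)); // int(0)

// The D (dollar-end-only) modifier gives "$" the same strict meaning.
var_dump(preg_match('/^[\w-]+$/D', $subject)); // int(0)
```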
Here is one equivalent of the accepted answer for the UTF-8 world.
if (!preg_match('/^[\p{L}\p{N}_-]+$/u', $string)){
//Disallowed Character In $string
}
Explanation:
[] => character class definition
\p{L} => matches any kind of letter character from any language
\p{N} => matches any kind of numeric character
_- => matches underscore and hyphen
+ => Quantifier — Matches between one to unlimited times (greedy)
/u => Unicode modifier. Pattern and subject strings are treated as UTF-8. Also
causes escape sequences to match Unicode characters
Note that if the hyphen is the first or last character in the class definition, it does not need to be escaped. If it appears elsewhere in the class definition, it needs to be escaped, as it will otherwise be seen as a range operator rather than a hyphen.
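A short sketch comparing the Unicode-aware class with an ASCII-only one. It assumes the source file and the subjects are UTF-8 encoded; the sample strings are made up for illustration.

```php
<?php
$unicode = '/^[\p{L}\p{N}_-]+$/u';
$ascii   = '/^[a-z0-9_-]+$/i';

var_dump(preg_match($unicode, 'naïve_café-1')); // int(1)
var_dump(preg_match($unicode, '关键字'));        // int(1)
var_dump(preg_match($unicode, 'no spaces'));    // int(0): space is not in the class

// Without \p{L} and /u, the bytes of "é" fall outside a-z.
var_dump(preg_match($ascii, 'café'));           // int(0)
```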
\w\- is probably the best option, but here is another alternative.
Use [:alnum:]
if(!preg_match("/[^[:alnum:]\-_]/",$str)) echo "valid";
Why use a regex? PHP has built-in functions that can do this.
<?php
$valid_symbols = array('-', '_');
$string1 = "This is a string*";
$string2 = "this_is-a-string";
if(preg_match('/\s/',$string1) || !ctype_alnum(str_replace($valid_symbols, '', $string1))) {
echo "String 1 not acceptable";
}
?>
preg_match('/\s/',$string1) checks for whitespace
!ctype_alnum(str_replace($valid_symbols, '', $string1)) checks that only letters and digits remain once the valid symbols have been removed
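The same idea as a reusable function, as a rough sketch. The name isAcceptable is hypothetical. It assumes the default "C" locale, where ctype_alnum accepts only ASCII letters and digits. Under that assumption the separate whitespace check appears redundant, because ctype_alnum already returns false for any string containing whitespace (and for the empty string).

```php
<?php
// Hypothetical helper: letters, digits and the listed symbols only.
function isAcceptable(string $s, array $validSymbols = ['-', '_']): bool
{
    // Remove the extra allowed symbols, then require what is left
    // to consist solely of alphanumeric characters.
    $stripped = str_replace($validSymbols, '', $s);
    return ctype_alnum($stripped);
}

var_dump(isAcceptable('This is a string*')); // bool(false)
var_dump(isAcceptable('this_is-a-string'));  // bool(true)
```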
Some of my data contains non-English letters, so I want to accept letters from any language, plus spaces and some special characters.
The special characters are: ' () - &
I tried /^[\p{L} -()']+$/ but it's not working with something like Castaٌeda and word Castaٌeda
I want the first character to be any language letter , then a combination of all the allowed characters .
The string could be:
first-second
first second
first'second
first & second
first&second
first(second)
first (second)
first-second-third
first second third
first second third(fourth)
first-second-third(fourth)
..
I want the first character to be any language letter, then a combination of all the allowed characters.
You should re-arrange the current regex to require the first char to be a letter, and the character class to follow should be quantified with * (zero or more occurrences).
However, there are some things to note:
You may have hard spaces between the words, so it makes sense to replace the literal space with \s or \h (and use the u modifier in PHP to make them Unicode aware), or add the \x{00A0} pattern into the character class to match hard spaces
You need to escape a hyphen between single chars in the character class to make it match a literal hyphen, else, it creates a range of chars that the pattern can match
If you later need to allow more characters when fine-tuning the pattern, add them before the hyphen so that the hyphen stays last in the class.
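The hyphen point can be checked with a small sketch. In the attempted class, " -(" is read as a range from the space (0x20) to "(" (0x28), so it matches characters such as "#" but not the hyphen itself. The two reduced patterns below are illustrative, not taken from the question.

```php
<?php
$range   = '/^[ -(]$/'; // hyphen between two chars: range 0x20..0x28
$literal = '/^[ (-]$/'; // hyphen last in the class: literal hyphen

var_dump(preg_match($range, '#'));   // int(1): '#' (0x23) is inside the range
var_dump(preg_match($range, '-'));   // int(0): '-' (0x2D) is outside it
var_dump(preg_match($literal, '-')); // int(1)
var_dump(preg_match($literal, '#')); // int(0)
```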
So, you may use
$pattern = "~^\p{L}[\p{L}\p{M}\h().'&-]*$~u";
Details
^ - start of string
\p{L} - any Unicode letter
[\p{L}\p{M}\h().'&-]* - zero or more
\p{L} - letters
\p{M} - diacritics
\h - horizontal whitespace
().'&- - these specific chars
$ - an end of string (better, add D modifier, or replace $ with \z to avoid matching before the last \n).
See the PHP demo:
$arr = ["first-second", "first second", "first'second", "first & second", "first&second", "first(second)", "first (second)", "first-second-third", "first second third", "first second third(fourth)", "first-second-third(fourth)", "word Castaٌeda", "Alfonso Lista (Potia)", "Bacolod-Kalawi (Bacolod-Grande)", "Balindong (Watu)", "President Manuel A. Roxas", "Enrique B. Magalona (Saravia)", "Bacolod-Kalawi (Bacolod-Grande)", "Datu Blah T. Sinsuat", "Don Victoriano Chiongbian (Don Mariano Marcos)", "Bulalacao (San Pedro)", "Hinoba-an (Asia)"];
$pattern = "~^\p{L}[\p{L}\p{M}\h().'&-]*$~u";
foreach ($arr as $s) {
echo $s;
if (preg_match($pattern, $s)) {
echo " => VALID\n";
} else {
echo " => INVALID\n";
}
}
Output:
first-second => VALID
first second => VALID
first'second => VALID
first & second => VALID
first&second => VALID
first(second) => VALID
first (second) => VALID
first-second-third => VALID
first second third => VALID
first second third(fourth) => VALID
first-second-third(fourth) => VALID
word Castaٌeda => VALID
Alfonso Lista (Potia) => VALID
Bacolod-Kalawi (Bacolod-Grande) => VALID
Balindong (Watu) => VALID
President Manuel A. Roxas => VALID
Enrique B. Magalona (Saravia) => VALID
Bacolod-Kalawi (Bacolod-Grande) => VALID
Datu Blah T. Sinsuat => VALID
Don Victoriano Chiongbian (Don Mariano Marcos) => VALID
Bulalacao (San Pedro) => VALID
Hinoba-an (Asia) => VALID
I am looking to find and replace words in a long string. I want to find words that look like this: $test$ and replace them with nothing.
I have tried a lot of things and can't figure out the regular expression. This is the last one I tried:
preg_replace("/\b\\$(.*)\\$\b/im", '', $text);
No matter what I do, I can't get it to replace words that begin and end with a dollar sign.
Use single quotes instead of double quotes and remove the double escape.
$text = preg_replace('/\$(.*?)\$/', '', $text);
Also, a word boundary \b does not consume any characters; it asserts that there is a word character on one side and not on the other. Since $ is not a word character, you need to remove the word boundaries for this to work. The pattern contains no letters, so the i modifier is useless here, and it has no anchors, so remove the m (multi-line) modifier as well.
Also, * is a greedy quantifier. Therefore, .* will match as much as it can while still allowing the remainder of the regular expression to match. To be clear on this, the greedy version replaces the entire string:
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*)\$/', '', $text));
# => string(0) ""
I recommend using the non-greedy quantifier *? here. The question mark tells the engine not to be greedy: as soon as it finds a closing $, it stops.
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*?)\$/', '', $text));
# => string(10) " bar  quz "
Edit
To fix your problem, you can use \S which matches any non-white space character.
$text = '$20.00 is the $total$';
var_dump(preg_replace('/\$\S+\$/', '', $text));
# string(14) "$20.00 is the "
There are three different positions that qualify as word boundaries \b:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
$ is not a word character, so don't use \b or it won't work. Also, there is no need for the double escaping and no need for the im modifiers:
preg_replace('/\$(.*)\$/', '', $text);
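A quick sketch of why \b gets in the way here. The sample subjects are made up. With \b, the pattern only matches when the dollar signs are glued to word characters, which is the opposite of what the question wants.

```php
<?php
$withBoundary    = '/\b\$test\$\b/';
$withoutBoundary = '/\$test\$/';

// No word character next to either "$", so \b cannot match.
var_dump(preg_match($withBoundary, '$test$ here'));    // int(0)
// Here "a" and "b" touch the dollar signs, so both \b assertions hold.
var_dump(preg_match($withBoundary, 'a$test$b'));       // int(1)
// Dropping \b matches the standalone token.
var_dump(preg_match($withoutBoundary, '$test$ here')); // int(1)
```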
I would use:
preg_replace('/\$[^$]+\$/', '', $text);
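One observable difference between the negated class and the lazy .*? form, as a small sketch with a made-up subject: without the s modifier the dot does not match a newline, while [^$] does, so only the negated class removes a token that spans two lines.

```php
<?php
$text = "a\$x\ny\$b"; // a$x<newline>y$b

// The dot stops at the newline, so nothing is replaced.
$lazy = preg_replace('/\$.*?\$/', '', $text);
// [^$] matches the newline too, so the whole token is removed.
$negated = preg_replace('/\$[^$]+\$/', '', $text);

var_dump($lazy === $text); // bool(true)
var_dump($negated);        // string(2) "ab"
```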
You can use preg_quote to handle the escaping for you:
$t = preg_replace('/' . preg_quote('$', '/') . '.*?' . preg_quote('$', '/') . '/', '', $text);
echo $t;
From the docs:
This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
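A sketch of how this could look when the marker is only known at run time. The helper name stripDelimited and its parameters are hypothetical. It reuses the lazy .*? from the snippet above and passes "/" as the second argument so that the pattern delimiter is escaped as well.

```php
<?php
// Hypothetical helper: remove every run of text enclosed by $marker.
function stripDelimited(string $text, string $marker): string
{
    // Escape regex metacharacters in the marker, plus the "/" delimiter.
    $q = preg_quote($marker, '/');
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace('/' . $q . '.*?' . $q . '/', '', $text) ?? $text;
}

var_dump(stripDelimited('$foo$ bar $baz$', '$')); // string(5) " bar "
var_dump(stripDelimited('*a* b *c*', '*'));       // string(3) " b "
```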
Contrary to your use of word boundary markers (\b), you actually want the inverse effect (\B)-- you want to make sure that there ISN'T a word character next to the non-word character $.
You also don't need to use capturing parentheses because you are not using a backreference in your replacement string.
\S+ means one or more non-whitespace characters, matched greedily.
Code: (Demo)
$text = '$foo$ boo hi$$ mon$k$ey $how thi$ $baz$ bar $foobar$';
var_export(
preg_replace(
'/\B\$\S+\$\B/',
'',
$text
)
);
Output:
' boo hi$$ mon$k$ey $how thi$  bar '
I want to allow only alpha numeric characters and spaces, so I use the following;
$name = preg_replace('/[^a-zA-z0-9 ]/', '', $str);
However, that is allowing underscores "_" which I don't want. Why is this and how do I fix it?
Thanks
The character class range is for a range of characters between two code points. The character _ is included in the range A-z, and you can see this by looking at the ASCII table:
... Y Z [ \ ] ^ _ ` a b ...
So it's not only the underscore that's being let through, but those other characters you see above, as stated in the documentation:
Ranges operate in ASCII collating sequence. ... For example, [W-c] is equivalent to [][\^_`wxyzabc], matched caselessly.
To prevent this from happening, you can perform a case insensitive match with a single character range in your character class:
$name = preg_replace('/[^a-z0-9 ]/i', '', $str);
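To see exactly which characters the A-z range lets through, a small sketch can walk the ASCII codes between 'A' and 'z' and collect everything that matches [A-z] but is not a letter. The variable names are illustrative.

```php
<?php
$leaked = '';
for ($code = ord('A'); $code <= ord('z'); $code++) {
    $ch = chr($code);
    // Inside the A-z range, yet not an ASCII letter.
    if (preg_match('/[A-z]/', $ch) === 1 && !ctype_alpha($ch)) {
        $leaked .= $ch;
    }
}
echo $leaked, "\n"; // [\]^_`

// The case-insensitive single range removes the underscore and brackets.
echo preg_replace('/[^a-z0-9 ]/i', '', 'a_b [c]'), "\n"; // ab c
```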
You have a mistake in your expression. The last Z must be a capital letter.
$name = preg_replace('/[^a-zA-Z0-9 ]/', '', $str); // A-Z, not A-z
I'm trying to write a regular expression which could match a string that possibly includes Chinese characters. Examples:
hahdj5454_fd.fgg"
example.com/list.php?keyword=关键字
example.com/list.php?keyword=php
I am using this expression:
$matchStr = '/^[a-z 0-9~%.:_\-\/[^x7f-xff]+$/i';
$str = "http://example.com/list.php?keyword=关键字";
if ( ! preg_match($matchStr, $str)){
exit('WRONG');
}else{
echo "RIGHT";
}
It matches plain English strings like dasdsdsfds or http://example.com/list.php, but it doesn't match strings containing Chinese characters. How can I resolve this?
Assuming you want to extend the set of letters that this regex matches from ASCII to all Unicode letters, then you can use
$matchStr = '#^[\pL 0-9~%.:_/-]+$#u';
I've removed the [^x7f-xff part which didn't make any sense (in your regex, it would have matched an opening bracket, a caret, and some ASCII characters that were already covered by the a-z and 0-9 parts of that character class).
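A quick check of this pattern against the sample strings, as a sketch. Note that the URL from the question also contains ? and =, which are in neither the original class nor the one above, so that URL still fails. The extended pattern below adds them, which is an assumption about what should be allowed rather than part of the answer.

```php
<?php
$pattern  = '#^[\pL 0-9~%.:_/-]+$#u';
// Assumption: "?" and "=" should be accepted as well.
$extended = '#^[\pL 0-9~%.:_/?=-]+$#u';
$url      = 'http://example.com/list.php?keyword=关键字';

var_dump(preg_match($pattern, '关键字'));                      // int(1)
var_dump(preg_match($pattern, 'http://example.com/list.php')); // int(1)
var_dump(preg_match($pattern, $url));                          // int(0): "?" and "=" are not allowed
var_dump(preg_match($extended, $url));                         // int(1)
```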
This works:
$str = "http://mysite/list.php?keyword=关键字";
if (preg_match('/[\p{Han}]/simu', $str)) {
echo "Contains Chinese Characters";
}else{
exit('WRONG'); // Does not contain Chinese characters
}
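The same check as a small helper, as a sketch only. The name containsHan is hypothetical. Only the u modifier is kept here, on the assumption that s, i and m have no effect on a pattern with no dot, letters or anchors.

```php
<?php
// Hypothetical helper: true when $s contains at least one Han character.
function containsHan(string $s): bool
{
    return preg_match('/\p{Han}/u', $s) === 1;
}

var_dump(containsHan('http://mysite/list.php?keyword=关键字')); // bool(true)
var_dump(containsHan('http://mysite/list.php?keyword=php'));    // bool(false)
```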
How do I get "Lrumipsm1" from "Lörum ipsäm 1!"?
So what I need is to only get a-z and 0-9 from a string, using php.
E.g. by using a regular expression (pcre) and replacing all characters that are not within the class of "acceptable" characters by ''.
$in = "Lörum ipsäm 1!";
$result = preg_replace('/[^a-z0-9]+/i', '', $in);
echo $result;
see also: http://docs.php.net/preg_replace
edit:
[a-z0-9] is the class of all characters a....z and 0...9
[^...] negates a class, i.e. [^a-z0-9] contains all characters that are not within a...z0...9
+ is a quantifier with the meaning "1 or more times", [^a-z0-9]+ matches one or more (consecutive) characters that are not within a...z0..9.
The option i makes the pattern case-insensitive, i.e. [a-z] also matches A...Z
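The effect of the i option can be seen by running the same replacement with and without it. This is a small sketch that assumes the source file is UTF-8 encoded.

```php
<?php
$in = "Lörum ipsäm 1!";

// Case-insensitive: upper-case letters survive.
echo preg_replace('/[^a-z0-9]+/i', '', $in), "\n"; // Lrumipsm1

// Case-sensitive: the upper-case "L" is removed as well.
echo preg_replace('/[^a-z0-9]+/', '', $in), "\n";  // rumipsm1
```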
You can also do this:
$in = "Lörum ipsäm 1!";
$result = preg_replace('/[^[:alnum:]]/i', '', $in);
echo $result;