I need a RegExp to match only Latin characters and nothing else - php

I have written the following code to check if the given string is Latin or it contains some other non-latin chars like Persian. The issue is that it always returns true for both of the following strings:
$str = "Hello, What's up?";
or
$str = "Hello, سلام";
While for the second string it should return false since it contains Persian characters (non-latin) too.
$default_rule = "/[a-zA-Z0-9\(\)\*_\-\!\#\$\%\^\&\*\,\.\"\'\]\[]*/";
$rule = ($rule==null) ? $default_rule : $rule;
if(preg_match($rule, $str)==true)
{
// always returns true
}

Your pattern will return true if the string contains zero or more of those characters you've specified. In other words, it will return true for any string at all. You need to put start (^) and end ($) anchors around it. Also you don't need to escape most of those characters (the character class causes them to be treated as literal characters):
$default_rule = '/^[a-zA-Z0-9()*_\-!#$%^&*,."\'\][]*$/';
However, this will also match an empty string. To make sure that the string is not empty, use the + quantifier (one or more) instead of *:
$default_rule = '/^[a-zA-Z0-9()*_\-!#$%^&*,."\'\][]+$/';
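One thing to watch with the character class above: it contains neither a space nor "?", so the question's first sample string would still be rejected. Here is a minimal sketch, assuming those two characters should be allowed as well (the function name is_latin_only is made up for illustration):

```php
<?php
// Hypothetical helper: true only if $str is non-empty and consists solely of
// the allowed ASCII characters. The space and "?" are an assumed addition to
// the character class from the answer above.
function is_latin_only(string $str): bool
{
    return preg_match('/^[a-zA-Z0-9 ?()*_\-!#$%^&*,."\'\][]+$/', $str) === 1;
}

var_dump(is_latin_only("Hello, What's up?"));                        // Latin only
var_dump(is_latin_only("Hello, \u{0633}\u{0644}\u{0627}\u{0645}"));  // contains Persian letters
var_dump(is_latin_only(""));                                         // empty string
```

Comparing against 1 with === also keeps a pattern error (where preg_match returns false) from being mistaken for a match.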

Related

Ensure a string contains specific characters, each at most once

I would like to test if a string is empty or if it only contains specific characters, each at most once.
Example:
Given $valid = 'ABCDE', the following strings are:
$a = ''; // valid, empty
$b = 'CE'; // valid, only contains C and E, each once
$c = 'AZ'; // invalid, contains Z
$d = 'DAA'; // invalid, contains A twice
Any quick way of doing this, (possibly) using regex?
We can try using the following regex pattern:
^(?!.*(.).*\1)[ABCDE]{0,5}$
Here is an explanation of the regex:
^ from the start of the string
(?!.*(.).*\1) assert that the same letter does not repeat
[ABCDE]{0,5} then match 0-5 letters
$ end of the string
Sample PHP script:
$input = "ABCDE";
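// Single quotes matter here: in a double-quoted PHP string, "\1" is an octal escape, not a backreference.
if (preg_match('/^(?!.*(.).*\1)[ABCDE]{0,5}$/', $input)) {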
echo "MATCH";
}
The negative lookahead (?!.*(.).*\1) works by checking whether it can capture any single character and then find that same character again later in the string. Take the OP's invalid input DAA: the negative lookahead fails when it captures the first A and then sees A again. Note carefully that lookarounds can have their own capture groups.
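To check the pattern against the four strings from the question, a small sketch along these lines could be used (the function name each_at_most_once is only illustrative):

```php
<?php
// Hypothetical wrapper around the pattern above. The pattern is single-quoted
// so that \1 reaches the regex engine as a backreference.
function each_at_most_once(string $input): bool
{
    return preg_match('/^(?!.*(.).*\1)[ABCDE]{0,5}$/', $input) === 1;
}

foreach (['', 'CE', 'AZ', 'DAA'] as $candidate) {
    echo var_export($candidate, true), ' => ',
        each_at_most_once($candidate) ? 'valid' : 'invalid', "\n";
}
```

The empty string passes because the lookahead needs at least one character to capture, and {0,5} allows zero letters.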

Find and replace string with condition in php

I am a newbie in PHP. I want to replace certain characters in a string. My code is below:
$str="this 'is' a new 'string and i wanna' replace \"in\" \"it here\"";
$find = [
'\'',
'"'
];
$replace = [
['^', '*'],
['#', '#']
];
$result = null;
$odd = true;
for ($i=0; $i < strlen($str); $i++) {
if (in_array($str[$i], $find)) {
$key = array_search($str[$i], $find);
$result .= $odd ? $replace[$key][0] : $replace[$key][1];
$odd = !$odd;
} else {
$result .= $str[$i];
}
}
echo $result;
The output of the above code is:
this ^is* a new ^string and i wanna* replace #in# #it here#
but I want the output to be:
this ^is* a new 'string and i wanna' replace #in# "it here"
That is, the quote characters should be replaced only when the text has both a left and a right quotation mark (this applies to both ' and "). If the text has only a left or only a right quotation mark, nothing should be replaced.
Ok, I don't know what all that code is trying to accomplish.
But anyway here is my go at it
$str = "this 'is' a new 'string and i wanna' replace \"in\" \"it here\"";
$str = preg_replace(["/'([^']+)'/",'/"([^"]+)"/'], ["^$1*", "#$1#"], $str, 1);
print_r($str);
You can test it here
Output
this ^is* a new 'string and i wanna' replace #in# "it here"
Using preg_replace and a fairly simple regular expression, we can replace the quotes. The trick here is the fourth parameter of preg_replace, $limit, which the manual defines like this:
limit The maximum possible replacements for each pattern in each subject string. Defaults to -1 (no limit).
Therefore, setting this to 1 limits it to the first match only. Because we pass an array of patterns, each pattern is treated as a separate operation, so each one is allowed $limit replacements, or 1 in this case. (The number of replacements actually made is reported through the optional fifth parameter, $count.)
Whether or not this fits every use case you have I cannot say, but it's the most straightforward way to do it for the example you provided.
As for the match itself /'([^']+)'/
/ opening and closing "delimiters" for the expression (it's a required thing, although it doesn't have to be /)
' literal match, matches ' one time (the opening quote)
( ... ) capture group (group1) so we can use it in the replacement, as $1
[^']+ character set with a [^ not modifier, match anything not in the set, so anything that is not a ' one or more times, greedy
' literal match, matches ' one time (the ending quote)
The replacement "^$1*"
^ literal, adds this char in
$1 use the contents of the capture group (group1)
* literal, adds the char in
Hope that helps understand how it works.
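To see how the limit is applied per pattern, here is a small sketch that also passes the optional fifth argument so the replacement total can be inspected (the variable names are only illustrative):

```php
<?php
$str = "this 'is' a new 'string and i wanna' replace \"in\" \"it here\"";

$patterns     = ["/'([^']+)'/", '/"([^"]+)"/'];
$replacements = ['^$1*', '#$1#'];

// Fourth argument: limit of 1 replacement per pattern.
// Fifth argument: receives the total number of replacements made.
$limited = preg_replace($patterns, $replacements, $str, 1, $count);

echo $limited, "\n"; // this ^is* a new 'string and i wanna' replace #in# "it here"
echo $count, "\n";   // 2 (one replacement for each of the two patterns)
```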
UPDATE
Ok I think I finally deciphered what you want:
string will be replaced for if any word have left and right quotation. example..'word'..here string will be changed..but 'word...in this case not change or word' also not be changed.
This seems like you are trying to say only "whole" words with no spaces.
So in that case we have to adjust our regular expression like this:
$str = preg_replace(["/'([-\w]+)'/",'/"([-\w]+)"/'], ["^$1*", "#$1#"], $str);
So we removed the limit argument and made the character class stricter:
[-\w]+ the \w means word characters, in other words a-zA-Z0-9_, and the - is a literal hyphen (it should go first or last in the class so that it is not read as a range)
What we are saying with this is: match only strings that start and end with a quote (single or double), and only if the text between them consists of word characters and hyphens. This does not include the space. This way, in the first case (your example) it produces the same result, but if you were to flip it to
//[ORIGINAL] this 'is' a new 'string and i wanna' replace \"in\" \"it here\"
this a new 'string and i wanna' replace 'is' \"it here\" \"in\"
You would get this output
this a new 'string and i wanna' replace ^is* "it here" #in#
Before this change you would have gotten
this a new ^string and i wanna* replace 'is' #it here# "in"
In other words, it would have replaced only the first occurrence; now it replaces anything between the quotes if and only if it's a whole word.
As a final note, you can be even more strict if you only want alphabetic characters by changing the character class to [a-zA-Z]+, which matches only a to z, upper or lower case. The example above, by contrast, also matches the digits 0 to 9, the - hyphen and the _ underscore, in any combination with the letters.
Hope that is what you need.
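For reference, a runnable sketch of the whole-word variant applied to the flipped string (the variable names are only illustrative):

```php
<?php
$str = "this a new 'string and i wanna' replace 'is' \"it here\" \"in\"";

// No limit this time: every quoted run made up only of word characters
// and hyphens is replaced; quoted text containing spaces is left alone.
$result = preg_replace(
    ["/'([-\\w]+)'/", '/"([-\\w]+)"/'],
    ['^$1*', '#$1#'],
    $str
);

echo $result, "\n"; // this a new 'string and i wanna' replace ^is* "it here" #in#
```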

modify values in variable string with php

Consider example:
$mystring = "us100ch121jp23uk12";
I) I want to change the value of jp by adding 1, so that the string becomes
us100ch121jp24uk12
II) Is there a way to separate the numeric part and the alphabetic part of the above string into:
[us,100]
[ch,121]
[jp,24]
[uk,12]
my code:
$string = "us100ch121jp23uk12";
$search_for = "us";
$pairs = explode("[]", $string); // I dont know the parameters.
foreach ($pairs as $index=>$pair)
{
$numbers = explode(',',$pair);
if ($numbers[0] == $search_for){
$numbers[1] += 1; // 23 + 1 = 24
$pairs[$index] = implode(',',$numbers); //push them back
break;
}
}
$new_string = implode('|',$pairs);
using Evan sir's suggestions
$mystring = "us100ch121jp22uk12";
preg_match_all("/([A-Za-z]+)(\d+)/", $mystring, $output);
//echo $output[0][4];
foreach($output[0] as $key=>$value) {
// echo "[".$value."]";
echo "[".substr($value, 0, 2).",".substr($value, 2, strlen($value) - 2)."]"."<br>";
}
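If you use preg_match_all("/([A-Za-z]+)(\d+)/", $string, $output);, it will fill $output with three arrays. The first array will contain the full country-number strings (e.g. 'us100'). The second will contain the country strings (e.g. 'us'). The third will contain the numbers (e.g. '100'). Note the class [A-Za-z] rather than [A-z]: the latter also matches the punctuation characters that sit between Z and a in ASCII, such as [ and _.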
Since the second and third arrays will have matching indexes ($output[1][0] will be 'us' and $output[2][0] will be '100'), you could just cycle through those and do whatever you'd like to them.
Here is more information about using regular expressions in PHP. The site also contains information about regular expressions in general, which are a useful tool for any programmer!
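Putting both parts of the question together, a minimal sketch might look like this. It assumes the country codes are plain ASCII letters and that "jp" does not appear inside a longer code; the variable names are made up for illustration:

```php
<?php
$mystring = "us100ch121jp23uk12";

// II) Split the string into [letters, number] pairs.
// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all('/([A-Za-z]+)(\d+)/', $mystring, $matches, PREG_SET_ORDER);
$pairs = [];
foreach ($matches as $match) {
    $pairs[] = [$match[1], (int) $match[2]];
}

// I) Add 1 to the number that follows "jp", leaving the rest untouched.
$updated = preg_replace_callback(
    '/(jp)(\d+)/',
    function (array $m): string {
        return $m[1] . ((int) $m[2] + 1);
    },
    $mystring
);

print_r($pairs);
echo $updated, "\n"; // us100ch121jp24uk12
```

preg_replace_callback is used for the increment because the replacement depends on arithmetic over the captured number, which a plain replacement string cannot do.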
You can do it using regular expressions in PHP. See tutorial:
http://w3school.in/w3schools-php-tutorial/php-regular-expression/
Function Description
ereg_replace() Finds the string specified by pattern and replaces it with replacement if found. (POSIX regex function; deprecated in PHP 5.3 and removed in PHP 7, so use preg_replace() instead.)
eregi_replace() Works like ereg_replace(), except that the search for pattern is not case sensitive. (Also removed in PHP 7.)
preg_replace() Works like ereg_replace(), but uses Perl-compatible (PCRE) regular expressions, which must be wrapped in delimiters such as /.../.
preg_match() Searches a string for a pattern and returns 1 if the pattern matches, 0 if it does not, or false on error.
Expression Description
[0-9] It matches any decimal digit from 0 through 9.
[a-z] It matches any character from lowercase a through lowercase z.
[A-Z] It matches any character from uppercase A through uppercase Z.
[a-zA-Z] It matches any letter from a through z, lowercase or uppercase. (Writing this as [a-Z] is an invalid range and will not compile.)
p+ It matches any string containing at least one p.
p* It matches any string containing zero or more p’s.
p? It matches any string containing zero or one p.
p{N} It matches any string containing a sequence of N p’s.
p{2,3} It matches any string containing a sequence of two or three p’s.
p{2,} It matches any string containing a sequence of at least two p’s.
p$ It matches any string with p at the end of it.
^p It matches any string with p at the beginning of it.
[^a-zA-Z] It matches any single character that is not a letter from a through z or A through Z.
p.p It matches any string containing p, followed by any character, in turn followed by another p.
^.{2}$ It matches any string containing exactly two characters.
<b>(.*)</b> It matches any string enclosed within <b> and </b>.
p(hp)* It matches any string containing a p followed by zero or more instances of the sequence hp.
You can also use JavaScript:
http://www.w3schools.com/jsref/jsref_obj_regexp.asp
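A few of the entries in the expression table are easy to misread, so here is a quick sketch that checks them with preg_match (the variable names are only illustrative):

```php
<?php
// p? allows at most one p, while p* allows any number of them.
$optional = preg_match('/^p?$/', 'pp'); // 0: "pp" has more than one p
$star     = preg_match('/^p*$/', 'pp'); // 1

// [^a-zA-Z] matches a single character that is not an ASCII letter.
$noNonLetter  = preg_match('/[^a-zA-Z]/', 'abc'); // 0: only letters present
$hasNonLetter = preg_match('/[^a-zA-Z]/', 'ab1'); // 1: the digit matches

// [a-Z] is an out-of-order range, so the pattern does not compile.
// PHP raises a warning (suppressed here with @) and preg_match returns false.
$invalidRange = @preg_match('/[a-Z]/', 'x');

var_dump($optional, $star, $noNonLetter, $hasNonLetter, $invalidRange);
```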

Filter everything but letters and "-"

I am trying to write a function that filters an input down to only letters and the symbol "-". I want that symbol because the input contains names, and someone may be called Jean-Paul. This is my current code:
if(!preg_match('/^[a-zA-Z]+$/',$string)) {
// Containing something other than a-z and A-Z
}
$string = 'Jean-Paul'; is now reported as containing illegal characters. How can I make it accept "-"?
if (!preg_match('/^[A-Z-]+$/i', $string)) {
// Contains something other than A-Z (case-insensitive) or -
}
A - is treated as a literal dash inside a character class if it's the first or last character there.
Be aware that "Jean-Rémy" will still fail. Are you sure you want to restrict yourself to ASCII letters?
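If accented names should pass, one option is the Unicode letter property \p{L} together with the u modifier. This is a sketch under the assumption that the input is valid UTF-8; the function name is_valid_name is made up for illustration:

```php
<?php
// Hypothetical helper: accepts any Unicode letters plus the hyphen.
// The /u modifier makes PCRE treat the pattern and subject as UTF-8.
function is_valid_name(string $name): bool
{
    return preg_match('/^[\p{L}-]+$/u', $name) === 1;
}

var_dump(is_valid_name('Jean-Paul'));         // ASCII letters and a hyphen
var_dump(is_valid_name("Jean-R\u{00E9}my"));  // contains an accented e (U+00E9)
var_dump(is_valid_name('Jean_Paul2'));        // underscore and digit are rejected
```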
If by "filter" you mean delete unwanted characters, then use
$s = preg_replace("/[^a-z-]/i", "", $s);
or
$s = preg_replace("/[^a-z-]/i", "", iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $s));

Reg exp to Null/empty string if string contains non alphanumeric characters

I'm looking for a PHP preg_replace that turns a string into a null/empty string if it contains anything other than alphanumeric characters or spaces.
e.g. Strings
$string = "This string is ok";
$string = "Thi$ string is NOT ok, and should be emptied";
When I say emptied/nulled I mean it will make the string "".
So basically anything a-z A-Z 0-9 or space is ok
Any ideas?
if(preg_match('~[^a-z0-9 ]~i', $str))
$str = '';
You can use this pattern (note the possessive quantifier) to match "invalid" strings:
^[a-zA-Z0-9 ]*+.+$
Here's a snippet:
<?php
$test = array(
"This string is ok",
"Thi$ string is NOT ok, and should be emptied",
"No way!!!",
"YES YES YES"
);
foreach ($test as $str) {
echo preg_replace('/^[a-zA-Z0-9 ]*+.+$/', '<censored!>', $str)."\n";
}
?>
The above prints (as seen on ideone.com):
This string is ok
<censored!>
<censored!>
YES YES YES
It works by using possessive repetition (i.e. no backtracking) to match as many valid characters as possible with [a-zA-Z0-9 ]*+. If there's anything left after this, i.e. .+ matches, then we must have gotten stuck at an invalid character, so the whole string gets matched (and thus replaced). Otherwise the string remains untouched.
The string '<censored!>' is used as replacement here for clarity; you can use the empty string '' if that's what you need.
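To see why the possessive quantifier matters, compare it with the ordinary greedy version on a string made up entirely of valid characters. This is only a sketch; the variable names are illustrative:

```php
<?php
$possessive = '/^[a-zA-Z0-9 ]*+.+$/'; // the pattern from the answer above
$greedy     = '/^[a-zA-Z0-9 ]*.+$/';  // same pattern without the possessive +

$valid = 'YES YES YES';

// Possessive: the class consumes the whole string and never gives any back,
// so .+ has nothing left to match and the pattern fails, as intended.
$possessiveResult = preg_match($possessive, $valid);

// Greedy: the class backtracks by one character so that .+ can match it,
// which means every non-empty string would be treated as "invalid".
$greedyResult = preg_match($greedy, $valid);

var_dump($possessiveResult, $greedyResult); // int(0), int(1)
```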
References
regular-expressions.info/Possessive Quantifier
