How to check letters from any language, spaces and some special characters? - php

I have some data, and some of it contains non-English letters, so I want to check for letters from any language, plus spaces and some special characters.
The special characters are: ' () - &
I tried /^[\p{L} -()']+$/ but it's not working with something like Castaٌeda and word Castaٌeda
I want the first character to be any language letter, then a combination of all the allowed characters.
The string could be:
first-second
first second
first'second
first & second
first&second
first(second)
first (second)
first-second-third
first second third
first second third(fourth)
first-second-third(fourth)
..

I want the first character to be any language letter, then a combination of all the allowed characters.
You should re-arrange the current regex to require the first char to be a letter, and the character class to follow should be quantified with * (zero or more occurrences).
However, there are some things to note:
You may have hard (non-breaking) spaces between the words, so it makes sense to replace the literal space with \s or \h (and use the u modifier in PHP to make them Unicode aware), or add the \x{00A0} pattern into the character class to match hard spaces.
You need to escape a hyphen between single chars in the character class to make it match a literal hyphen; otherwise it creates a range of chars that the pattern can match.
You should add any other allowed chars before the hyphen later, when you need to fine tune the pattern.
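The hyphen point can be checked directly. The following is a minimal sketch, assuming PHP 7 or later; matchesWhole() is a hypothetical helper name introduced only for this illustration, and the second pattern is just the question's class with the hyphen moved to the end:

```php
<?php
// Hypothetical helper for illustration: true when the pattern matches the string.
function matchesWhole(string $pattern, string $s): bool
{
    return preg_match($pattern, $s) === 1;
}

// In the question's class, " -(" is a range from space (0x20) to "(" (0x28),
// so a literal "-" (0x2D) is not part of the class at all.
$rangePattern = "/^[\\p{L} -()']+$/u";

// With the hyphen moved to the end of the class it is a literal hyphen.
$literalPattern = "/^[\\p{L} ()'-]+$/u";

var_dump(matchesWhole($rangePattern, "first-second"));   // bool(false)
var_dump(matchesWhole($literalPattern, "first-second")); // bool(true)

// The accidental range also lets unintended characters such as "#" (0x23) through.
var_dump(matchesWhole($rangePattern, "first#second"));   // bool(true)
```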
So, you may use
$pattern = "~^\p{L}[\p{L}\p{M}\h().'&-]*$~u";
Details
^ - start of string
\p{L} - any Unicode letter
[\p{L}\p{M}\h().'&-]* - zero or more
\p{L} - letters
\p{M} - diacritics
\h - horizontal whitespace
().'&- - these specific chars
$ - an end of string (better, add D modifier, or replace $ with \z to avoid matching before the last \n).
See the PHP demo:
$arr = ["first-second", "first second", "first'second", "first & second", "first&second", "first(second)", "first (second)", "first-second-third", "first second third", "first second third(fourth)", "first-second-third(fourth)", "word Castaٌeda", "Alfonso Lista (Potia)", "Bacolod-Kalawi (Bacolod-Grande)", "Balindong (Watu)", "President Manuel A. Roxas", "Enrique B. Magalona (Saravia)", "Bacolod-Kalawi (Bacolod-Grande)", "Datu Blah T. Sinsuat", "Don Victoriano Chiongbian (Don Mariano Marcos)", "Bulalacao (San Pedro)", "Hinoba-an (Asia)"];
$pattern = "~^\p{L}[\p{L}\p{M}\h().'&-]*$~u";
foreach ($arr as $s) {
echo $s;
if (preg_match($pattern, $s)) {
echo " => VALID\n";
} else {
echo " => INVALID\n";
}
}
Output:
first-second => VALID
first second => VALID
first'second => VALID
first & second => VALID
first&second => VALID
first(second) => VALID
first (second) => VALID
first-second-third => VALID
first second third => VALID
first second third(fourth) => VALID
first-second-third(fourth) => VALID
word Castaٌeda => VALID
Alfonso Lista (Potia) => VALID
Bacolod-Kalawi (Bacolod-Grande) => VALID
Balindong (Watu) => VALID
President Manuel A. Roxas => VALID
Enrique B. Magalona (Saravia) => VALID
Bacolod-Kalawi (Bacolod-Grande) => VALID
Datu Blah T. Sinsuat => VALID
Don Victoriano Chiongbian (Don Mariano Marcos) => VALID
Bulalacao (San Pedro) => VALID
Hinoba-an (Asia) => VALID
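The remark about $ matching before a final newline can be sketched as follows. This is only an illustration under the assumption that the input may carry a trailing "\n" (for example a line read from a file); the variable names are made up for the example:

```php
<?php
// Same pattern as above, once ending in $ and once ending in \z.
$withDollar = "~^\\p{L}[\\p{L}\\p{M}\\h().'&-]*$~u";
$withZ      = "~^\\p{L}[\\p{L}\\p{M}\\h().'&-]*\\z~u";
// The D modifier makes $ match only at the very end of the string.
$withD      = "~^\\p{L}[\\p{L}\\p{M}\\h().'&-]*$~uD";

$input = "first-second\n"; // note the trailing newline

var_dump(preg_match($withDollar, $input)); // int(1): $ matches before the final \n
var_dump(preg_match($withZ, $input));      // int(0): \z only matches at the very end
var_dump(preg_match($withD, $input));      // int(0)
```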

Related

regex to remove variable prefix with or without a delimiter

I am trying to process historic military service numbers which have a very variable format. The key thing is to remove any prefix, but also to keep any suffix. Prefixes most commonly have a delimiter of a space, slash or dash, but sometimes they do not. In these cases the prefix is always one or more uppercase letters. In all other cases both prefixes and suffixes can contain letters or numbers and whilst typically uppercase, can be lower!
Currently my php code is
$cleanServiceNumber = preg_replace("/^.*[\/\s-]/","",$serviceNumber);
and typical values and desired results are
AB/12345 => 12345
CD-23456 => 23456
EF 34567 => 34567
5/45678 => 45678
GH/56789/A => 56789/A
GH/56789B => 56789B
XY67890 => 67890 <<< fails to do any replace and returns XY67890
I'm afraid my basic regex skills are failing me in terms of sorting the last example!
This regex removes zero or more digits followed by one or more non-digits at the beginning of the string: /^\d*\D+/
Demo
$serviceNumbers = array(
'AB/12345',
'CD-23456',
'EF 34567',
'5/45678',
'GH/56789/A',
'GH/56789B',
'XY67890');
foreach ($serviceNumbers as $serviceNumber) {
$cleanServiceNumber = preg_replace("/^\d*\D+/","",$serviceNumber);
echo $cleanServiceNumber . "\n";
}
Output:
12345
23456
34567
45678
56789/A
56789B
67890
You can add an alternation of [A-Z]+, but you should also make the other alternation more efficient by searching for non-delimiter characters followed by a delimiter:
$cleanServiceNumber = preg_replace("/^(?:[^\/ -]+[\/ -]|[A-Z]+)/","",$serviceNumber);
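As a rough check, the alternation pattern can be wrapped in a function and run against the values from the question. stripServicePrefix() is a hypothetical name chosen for this sketch; the expected values are the ones the question lists:

```php
<?php
// Hypothetical wrapper; the pattern is the alternation shown above.
function stripServicePrefix(string $serviceNumber): string
{
    // Either "non-delimiters followed by one delimiter" or a run of uppercase letters.
    return (string) preg_replace('/^(?:[^\/ -]+[\/ -]|[A-Z]+)/', '', $serviceNumber);
}

// Input => desired result, copied from the question.
$expected = [
    'AB/12345'   => '12345',
    'CD-23456'   => '23456',
    'EF 34567'   => '34567',
    '5/45678'    => '45678',
    'GH/56789/A' => '56789/A',
    'GH/56789B'  => '56789B',
    'XY67890'    => '67890',
];

foreach ($expected as $input => $want) {
    $got = stripServicePrefix($input);
    echo $input, ' => ', $got, ($got === $want ? '' : '  (MISMATCH)'), PHP_EOL;
}
```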
Here is another try for a regex which looks like:
/^([A-Za-z]+(\d+\W|\W)?|\d+\W)/
It has 2 parts which detects the type of prefixes you have:
[A-Za-z]+(\d+\W|\W)? => One or more letters, optionally followed by either digits and a non-word character, or just a non-word character. The trailing ? makes that ending optional.
\d+\W => One or more digits followed by a non-word character.
Snippet:
<?php
$tests = [
'AB/12345',
'CD-23456',
'EF 34567',
'5/45678',
'GH/56789/A',
'GH/56789B',
'XY67890',
'XY67890/90/A'
];
foreach($tests as $test){
echo $test," => ",preg_replace("/^([A-Za-z]+(\d+\W|\W)?|\d+\W)/","",$test),PHP_EOL;
}
Demo: https://3v4l.org/9hJLJ
The pattern you tried ^.*[\/\s-] first matches until the end of the string because the dot is greedy. Then it will backtrack until it can match either a /, - or a whitespace char.
This will not work for GH/56789/A as it will backtrack until the last / and it will not work for XY67890 as it does not match any of the characters in the character class.
You could match from the start of the string either 1 or more chars a-zA-Z or 1 or more digits 0-9 and at the end match an optional /, - or a horizontal whitespace character.
^(?:[A-Za-z]+|\d+)[/\h-]?
For example
$serviceNumbers = [
"AB/12345",
"CD-23456",
"EF 34567",
"5/45678",
"GH/56789/A",
"GH/56789B",
"XY67890"
];
foreach ($serviceNumbers as $serviceNumber) {
echo preg_replace("~^(?:[A-Za-z]+|\d+)[/\h-]?~","",$serviceNumber) . PHP_EOL;
}
Output
12345
23456
34567
45678
56789/A
56789B
67890
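The greedy backtracking described in this answer can be observed by looking at what the original pattern actually matches. This is a small sketch; the lazy variant at the end is not from any of the answers and is shown only to contrast greedy and lazy matching (it still does nothing for XY67890):

```php
<?php
// The pattern from the question: greedy .* followed by one delimiter.
$greedy = '/^.*[\/\s-]/';

// .* first runs to the end, then backtracks to the LAST delimiter.
preg_match($greedy, 'GH/56789/A', $m);
echo $m[0], PHP_EOL;                       // GH/56789/

// No delimiter at all, so the pattern cannot match and nothing is replaced.
var_dump(preg_match($greedy, 'XY67890'));  // int(0)

// Illustration only: a lazy .*? stops at the FIRST delimiter instead.
echo preg_replace('/^.*?[\/\s-]/', '', 'GH/56789/A'), PHP_EOL; // 56789/A
```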

Split String With preg_match

I have this string:
$productList="
Saluran Dua(Bothway)-(TAN007);
Speedy Password-(INET PASS);
Memo-(T-Memo);
7-pib r-10/10-(AM);
FBI (R/N/M)-(Rr/R(A));
";
I want the result to look like this:
Array(
[0]=>TAN007
[1]=>INET PASS
[2]=>T-Memo
[3]=>AM
[4]=>Rr/R(A)
);
I used :
$separator = '/\-\(([A-z ]*)\)/';
preg_match_all($separator, $productList, $match);
$value=$match[1];
but the result:
Array(
[0]=>INET PASS
[1]=>AM
);
There must be something wrong in my code. Can anybody help with this?
Your regex does not include all the characters that can appear in the piece of text you want to capture.
The correct regex is:
$match = array();
preg_match_all('/-\((.*)\);/', $productList, $match);
Explanation (from the inside to outside):
.* matches any run of characters other than a newline;
(.*) is the expression above put into parentheses to capture the match in $match[1];
-\((.*)\); is the above in context: it matches only if it is preceded by -( and followed by );. The parentheses are escaped to use their literal values and not their special regex meaning;
there is no need to escape - in regex; it has special interpretation only when it is used inside character ranges ([A-Z], f.e.) but even there, if the dash character (-) is right after the [ or right before the ] then it has no special meaning; e.g. [-A-Z] means: dash (-) or any capital letter (A to Z).
Now, print_r($match[1]); looks like this:
Array
(
[0] => TAN007
[1] => INET PASS
[2] => T-Memo
[3] => AM
[4] => Rr/R(A)
)
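Because .* is greedy and only stops at newlines, the pattern above relies on every entry sitting on its own line. If the list might arrive on a single line, one possible variation is to split on ; first and match each entry separately. This is a sketch, not the answer's method; extractCodes() is a hypothetical name, and it assumes that ; never occurs inside an entry:

```php
<?php
// Hypothetical helper: returns the text inside the final "-( ... )" of each entry.
function extractCodes(string $productList): array
{
    $codes = [];
    foreach (explode(';', $productList) as $entry) {
        $entry = trim($entry);
        // Anchor on the end of the entry so nested parentheses stay in the capture.
        if ($entry !== '' && preg_match('/-\((.*)\)$/', $entry, $m)) {
            $codes[] = $m[1];
        }
    }
    return $codes;
}

// The same data as in the question, but on a single line.
$oneLine = "Saluran Dua(Bothway)-(TAN007);Speedy Password-(INET PASS);"
         . "Memo-(T-Memo);7-pib r-10/10-(AM);FBI (R/N/M)-(Rr/R(A));";

print_r(extractCodes($oneLine));
// TAN007, INET PASS, T-Memo, AM, Rr/R(A)
```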
For the 1st line you need 0-9,
for the 3rd line you need a - in the character class, and
for the last line you need ( and ).
Try this:
#\-\(([a-zA-Z/0-9(\)\- ]*)\)#
try with this ReGex
$separator = '#\-\(([A-Za-z0-9/\-\(\) ]*)\)#';

modify values in variable string with php

Consider example:
$mystring = "us100ch121jp23uk12";
I) I want to change the value of jp by adding 1, so that the string becomes
us100ch121jp24uk12
II) Is there a way to separate the numeric and alphabetic parts of the above string into:
[us , 100]
[ch,121]
[jp,24]
[uk,12]
my code:
$string = "us100ch121jp23uk12";
$search_for = "us";
$pairs = explode("[]", $string); // I dont know the parameters.
foreach ($pairs as $index=>$pair)
{
$numbers = explode(',',$pair);
if ($numbers[0] == $search_for){
$numbers[1] += 1; // 23 + 1 = 24
$pairs[index] = implode(',',$numbers); //push them back
break;
}
}
$new_string = implode('|',$pairs);
Update, using Evan's suggestion:
$mystring = "us100ch121jp22uk12";
preg_match_all("/([A-Za-z]+)(\d+)/", $mystring, $output);
//echo $output[0][4];
foreach($output[0] as $key=>$value) {
// echo "[".$value."]";
echo "[".substr($value, 0, 2).",".substr($value, 2, strlen($value) - 2)."]"."<br>";
}
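If you use preg_match_all("/([A-Za-z]+)(\d+)/", $string, $output);, it will fill $output with an array that contains three arrays. The first array will be the country-number strings (e.g. 'us100'). The second will contain the country strings (e.g. 'us'). The third will contain the numbers (e.g. '100'). Note that the letter class is written [A-Za-z] rather than [A-z], because the range A-z also includes the characters [ \ ] ^ _ ` that sit between Z and a in ASCII.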
Since the second and third arrays will have matching indexes ($output[1][0] will be 'us' and $output[2][0] will be '100'), you could just cycle through those and do whatever you'd like to them.
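Both parts of the question can be sketched on top of those matching indexes. This is a minimal example, assuming the codes consist of ASCII letters only; incrementCode() is a hypothetical helper name, and the lookbehind is an extra safeguard added here so that a code is not matched in the middle of a longer run of letters:

```php
<?php
$mystring = "us100ch121jp23uk12";

// II) Split into [letters, number] pairs using the two capture groups.
preg_match_all('/([A-Za-z]+)(\d+)/', $mystring, $output);

$pairs = [];
foreach ($output[1] as $i => $code) {
    // $output[1][$i] and $output[2][$i] belong to the same match.
    $pairs[] = [$code, (int) $output[2][$i]];
}
print_r($pairs); // [us, 100], [ch, 121], [jp, 23], [uk, 12]

// I) Add 1 to the number that follows a given code (hypothetical helper).
function incrementCode(string $s, string $code): string
{
    $pattern = '/(?<![A-Za-z])(' . preg_quote($code, '/') . ')(\d+)/';
    return preg_replace_callback($pattern, function (array $m) {
        return $m[1] . ((int) $m[2] + 1);
    }, $s);
}

echo incrementCode($mystring, 'jp'), PHP_EOL; // us100ch121jp24uk12
```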
Here is more information about using regular expressions in PHP. The site also contains information about regular expressions in general, which are a useful tool for any programmer!
You can do it using regular expressions in PHP. See tutorial:
http://w3school.in/w3schools-php-tutorial/php-regular-expression/
Function Description
ereg_replace() Finds the text specified by a POSIX pattern and replaces it with the replacement if found. Deprecated in PHP 5.3 and removed in PHP 7.0.
eregi_replace() Works like ereg_replace(), except that the search is not case sensitive. Also removed in PHP 7.0.
preg_replace() Works like ereg_replace(), but uses Perl-compatible (PCRE) patterns; the pattern and replacement parameters may also be arrays.
preg_match() Searches a string for a pattern and returns 1 if the pattern matches, 0 if it does not, and false on error.
Expression Description
[0-9] Matches any decimal digit from 0 through 9.
[a-z] Matches any character from lowercase a through lowercase z.
[A-Z] Matches any character from uppercase A through uppercase Z.
[a-Z] Not a valid range (a sorts after Z), so PCRE rejects the pattern; use [a-zA-Z] to match any letter from a through z in either case.
p+ Matches one or more p's.
p* Matches zero or more p's.
p? Matches zero or one p, i.e. the p is optional (this is not the same as p*).
p{N} Matches a sequence of exactly N p's.
p{2,3} Matches a sequence of two or three p's.
p{2,} Matches a sequence of at least two p's.
p$ Matches a p at the end of the string.
^p Matches a p at the beginning of the string.
[^a-zA-Z] Matches any single character that is not a letter from a through z or A through Z.
p.p Matches a p, followed by any character (except a newline), followed by another p.
^.{2}$ Matches any string containing exactly two characters.
<b>(.*)</b> Matches any text enclosed within <b> and </b> and captures it.
p(hp)* Matches a p followed by zero or more instances of the sequence hp.
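A few of these entries are easy to get wrong, so here is a short sketch that checks them with preg_match(). The @ operator is used only to silence the compilation warning for the invalid range:

```php
<?php
// p? allows at most one p, whereas p* allows any number of them.
var_dump(preg_match('/^p?$/', 'pp')); // int(0)
var_dump(preg_match('/^p*$/', 'pp')); // int(1)

// [a-Z] is an out-of-order range: the pattern fails to compile,
// and preg_match() returns false (a warning is raised without @).
$invalid = @preg_match('/[a-Z]/', 'x');
var_dump($invalid);                   // bool(false)

// [^a-zA-Z] matches a single non-letter character anywhere in the string.
var_dump(preg_match('/[^a-zA-Z]/', 'abc1')); // int(1), because of the "1"
var_dump(preg_match('/[^a-zA-Z]/', 'abc'));  // int(0)
```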
You can also use JavaScript:
http://www.w3schools.com/jsref/jsref_obj_regexp.asp

PHP preg_match - only allow alphanumeric strings and - _ characters

I need the regex to check if a string only contains numbers, letters, hyphens or underscore
$string1 = "This is a string*";
$string2 = "this_is-a-string";
if(preg_match('******', $string1)){
echo "String 1 not acceptable";
// String2 acceptable
}
Code:
if(preg_match('/[^a-z_\-0-9]/i', $string))
{
echo "not valid string";
}
Explanation:
[] => character class definition
^ => negate the class
a-z => chars from 'a' to 'z'
_ => underscore
- => hyphen '-' (You need to escape it)
0-9 => numbers (from zero to nine)
The 'i' modifier at the end of the regex makes it case-insensitive. Without it, you would need to add the uppercase characters to the class as well, by including A-Z.
if(!preg_match('/^[\w-]+$/', $string1)) {
echo "String 1 not acceptable";
// String2 acceptable
}
Here is one equivalent of the accepted answer for the UTF-8 world.
if (!preg_match('/^[\p{L}\p{N}_-]+$/u', $string)){
//Disallowed Character In $string
}
Explanation:
[] => character class definition
\p{L} => matches any kind of letter from any language
\p{N} => matches any kind of numeric character
_- => matches underscore and hyphen
+ => quantifier: matches one or more times (greedy)
/u => Unicode modifier: the pattern and subject strings are treated as UTF-8, and escape sequences such as \p{L} match Unicode characters
Note that if the hyphen is the last character in the class definition it does not need to be escaped. If it appears elsewhere in the class definition it needs to be escaped, as it will otherwise be seen as a range operator rather than a literal hyphen.
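The difference between the \w version and the \p{L} version shows up as soon as the input contains non-ASCII letters. The following is a sketch with hypothetical function names; it assumes the strings are UTF-8 encoded. Without the u modifier, \w works on bytes, so in the default C locale the multi-byte letter is not expected to pass the ASCII check:

```php
<?php
// Hypothetical helpers wrapping the two patterns from the answers above.
function isAsciiSlug(string $s): bool
{
    return preg_match('/^[\w-]+$/', $s) === 1;
}

function isUnicodeSlug(string $s): bool
{
    return preg_match('/^[\p{L}\p{N}_-]+$/u', $s) === 1;
}

$plain   = "this_is-a-string";
$bad     = "This is a string*";
$unicode = "Casta\u{00F1}eda_1"; // contains the letter n with tilde, as UTF-8

var_dump(isAsciiSlug($plain));     // bool(true)
var_dump(isUnicodeSlug($plain));   // bool(true)
var_dump(isAsciiSlug($bad));       // bool(false): space and * are not allowed
var_dump(isUnicodeSlug($bad));     // bool(false)
var_dump(isUnicodeSlug($unicode)); // bool(true)
var_dump(isAsciiSlug($unicode));   // result depends on the byte-level \w tables
```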
\w\- is probably the best, but here is another alternative:
use [:alnum:]
if(!preg_match("/[^[:alnum:]\-_]/",$str)) echo "valid";
Why use a regex? PHP has some built-in functionality to do that:
<?php
$valid_symbols = array('-', '_');
$string1 = "This is a string*";
$string2 = "this_is-a-string";
if(preg_match('/\s/',$string1) || !ctype_alnum(str_replace($valid_symbols, '', $string1))) {
echo "String 1 not acceptable";
}
?>
preg_match('/\s/',$string1) will check for whitespace
!ctype_alnum(str_replace($valid_symbols, '', $string1)) will check for valid_symbols

PHP: replace characters and make exceptions (preg_replace)

How do I:
replace characters in a word using preg_replace(), but make an exception if they are part of a certain word?
replace an uppercase character with an uppercase replacement even if the replacement is lowercase, and vice versa?
example:
$string = 'Newton, Einstein and Edison. end';
echo preg_replace('/n/i', '<b>n</b>', $string);
from: <b>n</b>ewto<b>n</b>, Ei<b>n</b>stei<b>n</b> a<b>n</b>d Ediso<b>n</b>. e<b>n</b>d
to: <b>N</b>ewto<b>n</b>, Ei<b>n</b>stei<b>n</b> a<b>n</b>d Ediso<b>n</b>. end
In this case I want all the n letters to be replaced unless they are part of the word end, and Newton should not change to newton.
echo preg_replace('/((?<!\be)n|n(?!d\b))/i', '<b>\1</b>', $string);
It matches any letter 'n' that is either not preceded by [word boundary + e] or not followed by [d + word boundary].
The general case: /((?<!\b$PREFIX)$LETTER|$LETTER(?!$SUFFIX\b))/i
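The general case could be wrapped in a function along these lines. This is a sketch only: boldLetterExcept() and its parameter names are hypothetical, preg_quote() is added so that the pieces are treated literally, and the prefix must have a fixed length because it is used inside a lookbehind:

```php
<?php
// Hypothetical helper: wraps every $letter in <b>...</b>, keeping its original
// case, except where it is preceded by $prefix and followed by $suffix as a whole word.
function boldLetterExcept(string $text, string $letter, string $prefix, string $suffix): string
{
    $l = preg_quote($letter, '/');
    $p = preg_quote($prefix, '/');
    $s = preg_quote($suffix, '/');

    $pattern = '/((?<!\b' . $p . ')' . $l . '|' . $l . '(?!' . $s . '\b))/i';

    // $1 is the matched letter itself, so uppercase stays uppercase.
    return (string) preg_replace($pattern, '<b>$1</b>', $text);
}

echo boldLetterExcept('Newton, Einstein and Edison. end', 'n', 'e', 'd'), PHP_EOL;
// <b>N</b>ewto<b>n</b>, Ei<b>n</b>stei<b>n</b> a<b>n</b>d Ediso<b>n</b>. end
```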
