Combine Multiple Regex into One - php

I am trying to write code that hyphenates a string of Latin verse. There are a few constraints, which I have tried to handle, but I do not get the desired output. My code is given below:
<?php
$string = "impulerittantaenanimis caelestibusirae";
$precedingC = precedingConsonant($string);
$xrule = xRule($precedingC);
$consonantc = consonantCT($xrule);
$consonantp = consonantPT($consonantc);
$cbv = CbetweenVowels($consonantp);
$tv = twoVowels($cbv);
echo $tv;
function twoVowels($string)
{
return preg_replace('/([aeiou])([aeiou])/', '$1-$2', $string);
}
function CbetweenVowels($string)
{
return preg_replace('/([aeiou])([^aeiou])([aeiou])/', '$1-$2$3', $string);
}
function consonantPT($string)
{
return preg_replace('/([^aeiou]p)(t[aeiou])/', '$1-$2', $string);
}
function consonantCT($string)
{
return preg_replace('/([^aeiou]c)(t[aeiou])/', '$1-$2', $string);
}
function precedingConsonant($string)
{
$arr1 = str_split($string);
$length = count($arr1);
for($j=0;$j<$length;$j++)
{
if(isVowel($arr1[$j]) && !isVowel($arr1[$j+1]) && !isVowel($arr1[$j+2]) && isVowel($arr1[$j+3]))
{
$pc++;
}
}
function strAppend2($string)
{
$arr1 = str_split($string);
$length = count($arr1);
for($i=0;$i<$length;$i++)
{
$check = $arr1[$i+1].$arr1[$i+2];
$check2 = $arr1[$i+1].$arr1[$i+2].$arr1[$i+3];
if($check=='br' || $check=='cr' || $check=='dr' || $check=='fr' || $check=='gr' || $check=='pr' || $check=='tr' || $check=='bl' || $check=='cl' || $check=='fl' || $check=='gl' || $check=='pl' || $check=='ch' || $check=='ph' || $check=='th' || $check=='qu' || $check2=='phl' || $check2=='phr')
{
if(isVowel($arr1[$i]) && !isVowel($arr1[$i+1]) && !isVowel($arr1[$i+2]) && isVowel($arr1[$i+3]))
{
$updatedString = substr_replace($string, "-", $i+1, 0);
return $updatedString;
}
}
else
{
if(isVowel($arr1[$i]) && !isVowel($arr1[$i+1]) && !isVowel($arr1[$i+2]) && isVowel($arr1[$i+3]))
{
$updatedString = substr_replace($string, "-", $i+2, 0);
return $updatedString;
}
}
}
}
$st1 = $string;
for($k=0;$k<$pc;$k++)
{
$st1 = strAppend2($st1);
}
return $st1;
}
function xRule($string)
{
return preg_replace('/([aeiou]x)([aeiou])/', '$1-$2', $string);
}
function isVowel($ch)
{
if($ch=='a' || $ch=='e' || $ch=='i' || $ch=='o' || $ch=='u')
{
return true;
}
else
{
return false;
}
}
function isConsonant($ch)
{
if($ch=='a' || $ch=='e' || $ch=='i' || $ch=='o' || $ch=='u')
{
return false;
}
else
{
return true;
}
}
?>
I believe if I combine all these functions it will result in the desired output. However I will specify my constraints below :
Rule 1 : When two or more consonants are between vowels, the first consonant is joined to the preceding vowel; for example - rec-tor, trac-tor, ac-tor, delec-tus, dic-tator, defec-tus, vic-tima, Oc-tober, fac-tum, pac-tus,
Rule 2 : 'x' is joined to the preceding vowel; as, rex-i.
However, we make a special exception for the following consonant groups: br, cr, dr, fr, gr, pr, tr; bl, cl, fl, gl, pl, phl, phr, ch, ph, th, qu. These are joined to the following vowel, for example con-sola-trix.
Rule 3 : When 'ct' follows a consonant, that consonant and 'c' are both joined to the first vowel for example - sanc-tus and junc-tum
Similarly for 'pt' we apply the same rule for example - scalp-tum, serp-tum, Redemp-tor.
Rule 4 : A single consonant between two vowels is joined to the following vowel for example - ma-ter, pa-ter AND Z is joined to the following vowel.
Rule 5 : When two vowels come together they are divided, if they be not a diphthong; as au-re-us. The diphthongs are "ae", "oe", "au".

If you look carefully at each rule, you can see that they all involve a vowel at the beginning (a preceding vowel). Once you realize that, you can build a single pattern by factoring out [aeiou] at the beginning:
$pattern = '~
(?<=[aeiou]) # each rule involves a vowel at the beginning (also called a
# "preceding vowel")
(?:
# Rule 2: capture particular cases
( (?:[bcdfgpt]r | [bcfgp] l | ph [lr] | [cpt] h | qu ) [aeiou] x )
|
[bcdfghlmnp-tx]
(?:
# Rule 3: When "ct" follows a consonant, that consonant and "c" are both
# joined to the first vowel
[cp] \K (?=t)
|
# Rule 1: When two or more consonants are between vowels, the first
# consonant is joined to the preceding vowel
\K (?= [bcdfghlmnp-tx]+ [aeiou] )
)
|
# Rule 4: a single consonant between two vowels is joined to the following
# vowel
(?:
\K (?= [bcdfghlmnp-t] [aeiou] )
|
# Rule 2: "x" is joined to the preceding vowel
x \K (?= [a-z] | (*SKIP)(*F) )
)
|
# Rule 5: When two vowels come together they are divided, if they not be a
# diphthong ("ae", "oe", "au")
\K (?= [aeiou] (?<! a[eu] | oe ) )
)
~xi';
This pattern is designed to match only the position where the hyphen goes (except for the particular cases of Rule 2). That's why it uses \K so often, to restart the match result at that position, and lookaheads, to test what follows without consuming characters.
$string = <<<EOD
Aeneadum genetrix, hominum diuomque uoluptas,
alma Uenus, caeli subter labentia signa
quae mare nauigerum, quae terras frugiferentis
concelebras, per te quoniam genus omne animantum
EOD;
$result = preg_replace($pattern, '-$1', $string);
Ae-ne-a-dum ge-ne-trix, ho-mi-num di-u-om-qu-e u-o-lup-tas,
al-ma U-e-nus, cae-li sub-ter la-ben-ti-a sig-na
qu-ae ma-re nau-i-ge-rum, qu-ae ter-ras fru-gi-fe-ren-tis
con-ce-leb-ras, per te qu-o-ni-am ge-nus om-ne a-ni-man-tum
Note that I didn't include several letters like k, y and z that are rare in Latin; feel free to include them if you need to handle transliterated Greek words or similar.
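To see the \K-plus-lookahead idea in isolation, here is a minimal sketch that applies only Rule 4 (a single consonant between two vowels). The function name hyphenateSingleConsonant is made up for illustration, and unlike the full pattern above it consumes the vowel and then discards it with \K instead of using a lookbehind:

```php
<?php
// Hypothetical helper: applies only Rule 4, not the full rule set.
function hyphenateSingleConsonant(string $text): string
{
    // [aeiou] is consumed, \K drops it from the reported match, and the
    // lookahead checks "consonant + vowel" without consuming anything,
    // so the match is an empty string at the hyphen position.
    return preg_replace('~[aeiou]\K(?=[bcdfghlmnp-t][aeiou])~i', '-', $text);
}

echo hyphenateSingleConsonant('mater pater'), PHP_EOL; // ma-ter pa-ter
```

Because the match itself is empty, the replacement string is just the hyphen and nothing from the input has to be captured and put back.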

Related

How to use regex validation in Laravel for uppercase letters, numbers and square brackets and then list as a postcode range?

I want to convert the user input to a list of postcodes, e.g.
If the user types SW[1-3] return a list of SW1, SW2, SW3.
I also need it to work alphabetically, if the user inputs LD1[A-D] return a list of LD1A, LD1B, LD1C, LD1D.
If the user doesn't use square brackets and just does LD1 then it should just return LD1 in the list.
This is what I've tried so far but I can't get the validation to work and I'm not sure I have the best method for creating the list.
public function store(Request $request) {
$request->validate([
'postcode' => 'required|string|regex:/^[A-Z0-9][\[A-Z\-0-9\]]*$/u',
]);
$postcode_input = $request->postcode;
$postcode_list = [];
// check for []'s
$match = preg_match('/\[[^\]]*\]/', $postcode_input, $matches);
if($match) {
$letters = explode("[", $postcode_input)[0] ?? false;
$numbers = explode("-", str_replace( array('[',']') , '' , $matches )[0]) ?? false;
if($letters && $numbers[0] && $numbers[1]) {
foreach (range($numbers[0], $numbers[1]) as $number) {
$postcode_list[] = $letters . $number;
}
}
} else {
$postcode_list[] = $postcode_input;
}
if(count($postcode_list) > 0) {
foreach($postcode_list as $pc) {
$pc = preg_replace("/\s+/", "", $pc);
echo $pc . "<br>";
}
}
}
Assuming that the range for [A-Z] is a single uppercase character, you might use a pattern with capture groups and a branch reset to access the same group number in the alternation.
^([A-Z\d]+)\[(?|(\d+)-(\d+)|([A-Z])-([A-Z]))]|([A-Z\d]+)$
^ Start of string
([A-Z\d]+) Capture group 1 Match 1+ times A-Z or a digit
\[ Match [
(?| Branch reset group
(\d+)-(\d+) Capture 1+ digits in group 2 and group 3 with - in between
| Or
([A-Z])-([A-Z]) Capture a single char A-Z in group 2 and group 3 with - in between
) Close branch reset
] Match ]
| Or
([A-Z\d]+) Capture group 4 Match 1+ times A-Z or a digit (in case of only LD1)
$ End of string
Example code, using the value from group 2 and group 3 for the range:
$pattern = "/^([A-Z\d]+)\[(?|(\d+)-(\d+)|([A-Z])-([A-Z]))]|([A-Z\d]+)$/";
$strings = [
"SW[1-3]",
"LD1[A-D]",
"LD1"
];
foreach ($strings as $s) {
if (preg_match($pattern, $s, $match)) {
if (array_key_exists(4, $match)) {
echo $match[4] . PHP_EOL;
continue;
}
foreach (range($match[2], $match[3]) as $m) {
echo $match[1] . $m . PHP_EOL;
}
}
}
Output
SW1
SW2
SW3
LD1A
LD1B
LD1C
LD1D
LD1
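If you need the list as an array (for example to store it instead of echoing it), the same idea can be wrapped in a function. This is only a sketch: the name expandPostcodeRange is hypothetical, and the pattern is a slightly stricter variant of the one above in which the bracket part is an optional group, so that both ^ and $ apply to every input:

```php
<?php
// Hypothetical helper, not part of Laravel or of the answer above.
function expandPostcodeRange(string $input): array
{
    // Prefix, then an optional [from-to] range that is either
    // digits-digits or letter-letter (branch reset keeps groups 2 and 3).
    $pattern = '/^([A-Z\d]+)(?:\[(?|(\d+)-(\d+)|([A-Z])-([A-Z]))\])?$/';

    if (!preg_match($pattern, $input, $m)) {
        return []; // input does not look like a postcode or postcode range
    }
    if (!isset($m[2])) {
        return [$m[1]]; // no brackets: the input is the only entry
    }

    $list = [];
    foreach (range($m[2], $m[3]) as $suffix) {
        $list[] = $m[1] . $suffix;
    }
    return $list;
}

print_r(expandPostcodeRange('LD1[A-D]'));
```

An empty array signals invalid input here; in a controller you would probably turn that into a validation error instead.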

Regex: match single equal sign with negative lookahead =

I am building something that requires the user to input conditions and then I will parse it with PHP. I want to build a preg_replace that replaces = with == but == remains untouched.
Examples
a=b => a==b
a==b => a==b
a = b => a == b
a == b => a == b
So basically if a user forgets that the condition needs == instead of =, the system will allow that too.
You can use this regex,
(?<![=!])=(?![=!])
This ensures that a = is only selected if it is neither preceded nor followed by = or !, and replaces it with ==.
Sample PHP codes,
$arr = array("a=b", "a==b", "a = b", "a == b", "a!=b");
foreach($arr as $s) {
echo $s, ' --> ', preg_replace('/(?<![=!])=(?![=!])/', '==', $s) , "\n";
}
Prints,
a=b --> a==b
a==b --> a==b
a = b --> a == b
a == b --> a == b
a!=b --> a!=b
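If the conditions may also contain <= and >=, the lookbehind can be widened so those operators are left alone too. This is a sketch under that assumption; the function name normalizeEquals is made up, and the lookahead only rejects a following =:

```php
<?php
// Hypothetical helper: turns a lone = into == and leaves
// ==, ===, !=, <= and >= untouched.
function normalizeEquals(string $condition): string
{
    // Not preceded by =, !, < or >, and not followed by another =.
    return preg_replace('/(?<![=!<>])=(?!=)/', '==', $condition);
}

foreach (['a=b', 'a == b', 'a!=b', 'a<=b', 'a>=b'] as $c) {
    echo $c, ' --> ', normalizeEquals($c), PHP_EOL;
}
```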
Another option is to use positive lookbehind to assert what is on the left is either a word character \w or a whitespace character \s
(?<=[\w\s])=+
For example:
$result = preg_replace('/(?<=[\w\s])=+/', '==', $str);
You could get the desired result by doing the following :
$string = "a == b" ;
$string = str_replace('==', '=', $string);
$string = str_replace('=', '==', $string);
var_dump($string);
This works by first replacing every == with = and then replacing every = with ==. Note that it also turns != into !==.

Regex to remove everything but numbers and one character

I need to remove everything from a string except the numbers and, if present, one following letter. It's a street name that I need to extract the house number from. There may be more content after the house number, but not necessarily.
The original string is something like
Wagnerstrasse 3a platz53,eingang 3,Zi.3005
I extract the street with number like this:
preg_match('/^([^\d]*[^\d\s]) *(\d.*)$/', $address, $match);
Then, I do an if statement on "Wagnerstrasse 3a"
if (preg_replace("/[^0-9]/","",$match[2]) == $match[2])
I need to change the regex so that it also captures one following letter, even if there is a space in between, but only if it is a single letter, so that my if is true in that case. Better still would be a regex that removes everything except the parts shown below:
Wagnerstrasse 3a <-- expected result: 3a
Wagnerstrasse 3 a <--- expected result 3 a
Wagnerstrasse 3 <--- expected result 3
Wagnerstrasse 3 a bac <--- expected result 3 a
You can try something like this that uses word boundaries:
preg_match('~\b\d+(?: ?[a-z])?\b~', $txt, $m)
The letter is in an optional group with an optional space before. Even if there is no letter the last word boundary will match with the digit and what follows (space, comma, end of the string...).
Note: to avoid a number in the street name, you can try to anchor your pattern at the first comma in a lookahead, for example:
preg_match('~\b\d+(?: ?[a-z])?\b(?= [^\s]*,)~', $txt, $m)
I'll leave it to you to refine this subpattern for your cases.
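As a rough sketch, the word-boundary pattern can be wrapped in a small function and run against the cases from the question. The name extractHouseNumber is hypothetical, and the i flag is my addition so that uppercase letters such as 3A are accepted as well:

```php
<?php
// Hypothetical helper: returns the first number, optionally followed by
// a single letter (with or without a space), or null if there is none.
function extractHouseNumber(string $address): ?string
{
    if (preg_match('~\b\d+(?: ?[a-z])?\b~i', $address, $m)) {
        return $m[0];
    }
    return null;
}

$tests = ['Wagnerstrasse 3a', 'Wagnerstrasse 3 a', 'Wagnerstrasse 3', 'Wagnerstrasse 3 a bac'];
foreach ($tests as $t) {
    echo $t, ' => ', extractHouseNumber($t), PHP_EOL;
}
```

The trailing \b is what stops "3 a bac" from being read as "3 b": a letter only counts if a word boundary follows it directly.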
<?php
$s1 = 'Wagnerstrasse 3 platz53,eingang 3,Zi.3005';
$s2 = 'Wagnerstrasse 3a platz53,eingang 3,Zi.3005';
$s3 = 'Wagnerstrasse 3A platz53,eingang 3,Zi.3005';
$s4 = 'Wagnerstrasse 3 a platz53,eingang 3,Zi.3005';
$s5 = 'Wagnerstrasse 3 A platz53,eingang 3,Zi.3005';
//test each of $s1..$s5
//[A-Za-z] instead of [A-z]: the range A-z would also match [ \ ] ^ _ and `
preg_match('#^(.+? [0-9]* *[A-Za-z]?)[^A-Za-z]#', $s1, $m);
//if you want only the street number
//preg_match('#^.+? ([0-9]* *[A-Za-z]?)[^A-Za-z]#', $s1, $m);
echo $m[1];
?>
After more research and hours of checking addresses (so many addresses), I found a solution which hasn't failed so far. I may have missed a case, but it seems to work quite well. And it's a regex one has not seen before... The regex fails if there are no numbers in the line, so I did some hacking (note the long run of nines...).
Basically, the regex is excellent at finding numbers at the end and preserves numbers in the middle of the text, but it fails in the case mentioned above and when the street starts with a number. So I added another little hack: move the leading number to the end and catch it as the house number.
if ($this->startsWithNumber($data))
{
$tmp = explode(' ', $data);
$data = trim(str_replace($tmp[0], '', $data)) . ' ' . $tmp[0];
}
// sentinel appended when the line has no digits, so the pattern still matches
$sentinel = str_repeat('9', 71);
if (!preg_match('/[0-9]/',$data))
{
$data .= ' ' . $sentinel;
}
$data = preg_replace("/[^ \w]+/",'',$data);
$pcre = '/\A\s*
(.*?) # street
\s*
\x2f? # slash
(
\pN+\s*[a-zA-Z]? # number + letter
(?:\s*[-\x2f\pP]\s*\pN+\s*[a-zA-Z]?)* # cut
) # number
\s*\z/ux';
// the pattern is stored in $pcre, so pass that variable to preg_match
preg_match($pcre, $data, $h);
// a house number containing the sentinel means no real number was found
if (isset($h[2]) && strpos($h[2], $sentinel) !== false) {
$h[2] = null;
}
$this->receiverStreet[] = (isset($h[1])) ? $h[1] : null;
$this->receiverHouseNo[] = (isset($h[2])) ? $h[2] : null;
public function startsWithNumber($str)
{
return preg_match('/^\d/', $str) === 1;
}

How to add an OR statement correctly

I am trying to validate serial numbers such as 20140831-123 or 20140831-1234, so that the form accepts our new serial numbers, which end in 4 digits. So far I have tried an elseif statement and an or operator, with no results. What am I doing wrong? Is there a way to change the regular expression itself to accept 3 or 4 digits at the end of the serial?
if($name == 'newserial1'){
$newserial1 = $_POST['newserial1'];
if($newserial1 != '') {
if(!preg_match('/^([0-9]{8}-)([0-9]{3})$/', $newserial1) ||
(!preg_match('/^([0-9]{8}-)([0-9]{4})$/', $newserial1))) {
$result['valid'] = false;
$result['reason'][$name] = 'Incorrect Serial Number.';
}
}
}
Just use the regex below, which matches either 3 or 4 digits at the end:
^([0-9]{8}-)([0-9]{3,4})$
Explanation:
^ Asserts that we are at the start.
([0-9]{8}-) Captures 8 digits and a following - symbol.
([0-9]{3,4}) Remaining three or four digits are captured by the second group.
$ Asserts that we are at the end.
Use \d{3,4}$ to match 3 or 4 digits at the end.
Here is the complete regex pattern
^(\d{8})-(\d{3,4})$
Pattern explanation:
^ the beginning of the string
( group and capture to \1:
\d{8} digits (0-9) (8 times)
) end of \1
- '-'
( group and capture to \2:
\d{3,4} digits (0-9) (between 3 and 4 times)
) end of \2
$ the end of the string
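For completeness, here is a sketch of how the {3,4} pattern could slot into the validation from the question. The original condition "!A || !B" is true for every input, because no string can match both the 3-digit and the 4-digit pattern, so every serial was rejected. The function name isValidSerial is made up, and the hard-coded $newserial1 stands in for the $_POST value:

```php
<?php
// Hypothetical helper: true for 8 digits, a hyphen, then 3 or 4 digits.
function isValidSerial(string $serial): bool
{
    return preg_match('/^[0-9]{8}-[0-9]{3,4}$/', $serial) === 1;
}

$name = 'newserial1';
$newserial1 = '20140831-1234'; // stands in for $_POST['newserial1']
$result = ['valid' => true, 'reason' => []];

if ($newserial1 != '' && !isValidSerial($newserial1)) {
    $result['valid'] = false;
    $result['reason'][$name] = 'Incorrect Serial Number.';
}

var_dump($result['valid']);
```

If you prefer to keep two separate patterns, the equivalent condition would join the two negated checks with && rather than ||.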
Your code works fine; just remove the not operator (!) from your if clauses and pass a matches array to preg_match:
if($name == 'newserial1'){
$newserial1 = $_POST['newserial1'];
if($newserial1 != '') {
if(preg_match('/^([0-9]{8}-)([0-9]{3})$/', $newserial1, $matches) ||
(preg_match('/^([0-9]{8}-)([0-9]{4})$/', $newserial1, $matches))) {
//$result['valid'] = false;
//$result['reason'][$name] = 'Incorrect Serial Number.';
$result['matches'] = $matches[2];
}
}
}

Validating String in PHP with pattern match

Hey could someone help me to test if a string matches 3 double digit figures separated by a colon? For Example:
12:13:14
I understand I should be using preg_match but I can't work out how
Preferably the first number should be between 0 and 23 and the second two numbers should be between 0 and 59 like a time but I can always work that out with if statements.
Thanks
This answer does correct matching across the entire string (other answers will match the regexp within a longer string), without any extra tests required:
if (preg_match('/^((?:[0-1][0-9])|(?:2[0-3])):([0-5][0-9]):([0-5][0-9])$/', $string, $matches))
{
print_r($matches);
}
else
{
echo "Does not match\n";
}
You could use preg_match with numeric comparisons, for example on $string = '23:24:25';
preg_match('~^(\d{2}):(\d{2}):(\d{2})$~', $string, $matches);
// $matches holds the full match plus three groups, so a successful match has 4 entries
if (count($matches) != 4 || $matches[1] > 23 || $matches[2] > 59 || $matches[3] > 59 ......)
die('The digits are not right');
Or you can even ditch the regular expressions and use explode with numeric comparisons.
$numbers = explode(':', $string);
if (count($numbers) != 3 || $numbers[0] > 23 || $numbers[1] > 59 || $numbers[2] > 59 ......)
die('The digits are not right');
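One way the explode approach could be filled out is sketched below. The function name isValidTime is hypothetical, and the strlen/ctype_digit checks are my assumption about what "double digit figures" should mean (exactly two digits per part):

```php
<?php
// Hypothetical helper: validates HH:MM:SS without a regular expression.
function isValidTime(string $time): bool
{
    $parts = explode(':', $time);
    if (count($parts) !== 3) {
        return false;
    }
    foreach ($parts as $part) {
        // each part must be exactly two digits
        if (strlen($part) !== 2 || !ctype_digit($part)) {
            return false;
        }
    }
    [$hours, $minutes, $seconds] = array_map('intval', $parts);
    return $hours <= 23 && $minutes <= 59 && $seconds <= 59;
}

var_dump(isValidTime('12:13:14')); // bool(true)
var_dump(isValidTime('24:00:00')); // bool(false)
```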
$regex = "/\d\d\:\d\d\:\d\d/";
$subject = "12:13:14";
preg_match($regex, $subject, $matches);
print_r($matches);
if (preg_match ('/\d\d:\d\d:\d\d/', $input)) {
// matches
} else {
// doesnt match
}
\d means any digit, so groups of two of those with : in between.
