Make a negative lookahead case-insensitive - php

I have the following expression:
$exp = "/^(?!.*?that).*$/";
which is meant to match any line that does not contain "that".
I have the following three sentences:
$str = array(
"I like this sentence.", #line1
"I like that sentence.", #line2
"I link THAT sentence." #line3
);
The match is case-sensitive and therefore only lines 1 and 3 are matched. So far so good.
However, I would like to make it case-insensitive, so that it only matches line 1. I have tried with an inline modifier, i.e. "(?i: ... )":
$exp = "/^(?!.*?(?i:that)).*$/";
and as a flag, i.e. "/ ... /i":
$exp = "/^(?!.*?that).*$/i";
but to no avail.
I run the search with the following loop:
foreach($str as $s) {
preg_match_all($exp, $s, $matches);
var_dump($matches);
}
with output:
array (size=1)
0 =>
array (size=1)
0 => string 'I like this sentence.' (length=21)
array (size=1)
0 =>
array (size=0)
empty
array (size=1)
0 =>
array (size=1)
0 => string 'I link THAT sentence.' (length=21)
and an online demo is available here: https://regex101.com/r/bs9rzF/1
I would be grateful for any tips about how I can make my regular expression case-insensitive.
EDIT: I was incorrectly using "?-i" instead of "?i", as some contributors correctly point out. Fixed now.

Your first regex, ^(?!.*?that).*$, is case-sensitive because it uses no case-insensitivity modifier.
It matches the first and third sentences because it only requires that the lowercase word that does not appear in the sentence. That holds for both of them: the third sentence contains THAT, which is not the same as that.
To match only the first sentence, you can use the inline modifier (?i) like
(?i)^(?!.*?that).*$
See here
BTW, your /^(?!.*?that).*$/i regex is also correct.

You were close:
^(?!.*?(?i)that).*$
See a demo on regex101.com. In your expression ((?-i)) you were turning the modifier off.
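The variants suggested in the answers can be compared side by side. This is a minimal sketch using the three sentences from the question; preg_grep() and the array labels are my own choice for illustration, not part of the original answers.

```php
<?php
// The three sample sentences from the question.
$lines = [
    "I like this sentence.",
    "I like that sentence.",
    "I link THAT sentence.",
];

// Three ways to make the negative lookahead case-insensitive.
$patterns = [
    'trailing flag'   => '/^(?!.*?that).*$/i',
    'leading (?i)'    => '/(?i)^(?!.*?that).*$/',
    'scoped (?i:...)' => '/^(?!.*?(?i:that)).*$/',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // preg_grep() keeps only the elements that match the pattern.
    $results[$label] = array_values(preg_grep($pattern, $lines));
    echo $label, ': ', implode(' | ', $results[$label]), "\n";
}
```

Each variant should keep only "I like this sentence.", because both "that" and "THAT" now trigger the lookahead.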

Related

Why is preg_match behaving differently to preg_replace (resulting in different matches) in php?

Given the following string and regular expression, the resulting behavior is something I don't understand. preg_match delivers what I am expecting while preg_replace doesn't make sense to me.
$string = 'aaa [Ticket#RS-123456] äüö [xxx] ccc ddd';
$re = '#(.*)?(\[Ticket\#)(.*)(\])(.*)?#siU';
What I finally need in this example is the string RS-123456 (or whatever string would be at this position). This string should match at the 3rd position ($3), if I don't completely misunderstand regular expressions.
preg_match($re, $string, $matches_pm);
Result (as expected):
Array(
[0] => aaa [Ticket#RS-123456]
[1] => aaa
[2] => [Ticket#
[3] => RS-123456 // That's exactly what I would expect
[4] => ]
)
$res_pr = preg_replace($re, "$3", $string);
Result (unexpected):
RS-123456 äüö [xxx] ccc ddd
I hope anyone can open my eyes and show me where my logical failure is hiding.
Both match the same text, but preg_match returns the first match only while preg_replace replaces the match (that is not the entire string) with Group 3 contents leaving äüö [xxx] ccc ddd in the resulting string.
Use
$re = '#(.*)(\[Ticket\#)(.*?)(\])(.*)#si';
to get the same results with preg_match and preg_replace.
See the PHP demo.
However, preg_match is the preferred way here:
if (preg_match('~\[Ticket#\K[^]]+~i', $string, $matches_pm)) {
echo $matches_pm[0];
}
See this PHP demo.
Pattern details
\[Ticket# - a literal [Ticket# substring
\K - match reset operator discarding the currently matched text
[^]]+ - 1 or more chars other than ]
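To see why the two functions appear to disagree, it helps to print the full match next to the replacement result. The following is a small sketch using the string and patterns from this thread; the variable names are mine.

```php
<?php
$string = 'aaa [Ticket#RS-123456] äüö [xxx] ccc ddd';

// Original pattern: the U flag makes the quantifiers lazy, so the overall
// match ends right after the first "]" and the tail of the string is untouched.
$lazy = '#(.*)?(\[Ticket\#)(.*)(\])(.*)?#siU';
preg_match($lazy, $string, $m);
$matched  = $m[0];
$replaced = preg_replace($lazy, '$3', $string);

// Adjusted pattern: the greedy outer groups make the match span the whole
// string, so replacing it with group 3 leaves only the ticket id.
$full         = '#(.*)(\[Ticket\#)(.*?)(\])(.*)#si';
$replacedFull = preg_replace($full, '$3', $string);

echo "matched:       $matched\n";
echo "replaced:      $replaced\n";
echo "replaced full: $replacedFull\n";
```

preg_replace() only rewrites the portion that $matched shows; everything outside it is copied through unchanged.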

How to separate string to number in single word with PHP?

I have the word AK747, I use regex to detect if a string (at least 2 chars, e.g. AK) is followed by a number (at least two digits, e.g. 747).
EDIT : (sorry that I wasn't clear on this guys)
I need to do this above because :
In some cases I need to split the keyword so that a search can match AK-747. When I search for the string 'AK-747' with the keyword 'AK747', it won't find a match unless I use levenshtein in the database, so I prefer splitting AK747 into AK and 747.
My code:
$strNumMatch = preg_match('/^[a-zA-Z]{2,}[0-9]{2,}$/',
$value, $match);
if(isset($match[0]))
echo $match[0];
How do I split to array ['AK', '747'] for example with preg_split() or any other way?
$input = 'AK-747';
if (preg_match('/^([a-z]{2,})-?([0-9]{2,})$/i', $input, $result)) {
unset($result[0]);
}
print_r($result);
The output:
Array
(
[1] => AK
[2] => 747
)
You may try this:
preg_match('/[0-9]{2,}/', $value, $matches, PREG_OFFSET_CAPTURE);
$position = $matches[0][1];
$letters = substr($value, 0, $position);
$numbers = substr($value, $position);
This way you get the position of the first number and split there.
EDIT:
Starting from your original approach this could look somewhat like this:
$strNumMatch = preg_match('/^([a-zA-Z]{2,})([0-9]{2,})$/', $value, $match, PREG_OFFSET_CAPTURE);
if($strNumMatch){
$position = $match[2][1]; // offset of the digit group; preg_match() filled $match, not $matches
$letters = substr($value, 0, $position);
$numbers = substr($value, $position);
$alternative = $letters.'-'.$numbers;
}
preg_split() is a very sensible and direct call since you desire an indexed array containing the two substrings.
Code: (Demo)
$input = 'AK-747';
var_export(preg_split('/[a-z]{2,}\K-?/i',$input));
Output:
array (
0 => 'AK',
1 => '747',
)
The \K means "restart the fullstring match". Effectively, everything to the left of \K is retained as the first element in the result array and everything to right (the optional hyphen) is omitted because it is considered the delimiter. Pattern Demo
Code: (Demo)
I process a small battery of inputs to show what can be done and explain after the snippet.
$inputs=['AK747','AK-747','AK-','AK']; // variations as I understand them
foreach($inputs as $input){
echo "$input returns: ";
var_export(preg_split('/[a-z]{2,}\K-?/i',$input,2,PREG_SPLIT_NO_EMPTY));
echo "\n";
}
Output:
AK747 returns: array (
0 => 'AK',
1 => '747',
)
AK-747 returns: array (
0 => 'AK',
1 => '747',
)
AK- returns: array (
0 => 'AK',
)
AK returns: array (
0 => 'AK',
)
preg_split() takes a pattern that matches a variable substring and uses it as a delimiter. If - were present in every input string, then explode('-',$input) would be most appropriate. However, - is optional in this task, so the pattern must allow - to be optional (this is what the ? quantifier does in all of the patterns on this page).
Now, you couldn't just use a pattern like /-?/, that would split the string on every character. To overcome this, you need to tell the regex engine the exact expected location for the optional -. You do this by referencing [a-z]{2,} before the -? (single intended delimiter).
The pattern /[a-z]{2,}-?/i does a fair job of finding the correct location for the optional hyphen, but now the trouble is, the leading letters in the string are included as part of the delimiting substring.
Sometimes, "lookarounds" can be used in regex patterns to match but not consume substrings. A "positive lookbehind" is used to match a preceding substring, however "variable length lookbehinds" are not permitted in php (and most other regex flavors). This is what the invalid pattern would look like: /(?<=[a-z]{2,})-?/i.
The way around this technicality is to "restart the fullstring match" using the \K token (aka a lookbehind alternative) just before the optional hyphen. To correctly target only the intended delimiter, the leading letters must be "matched/consumed" then "discarded" -- that's what \K does.
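The difference between consuming the letters and discarding them with \K can be seen directly. This is a small sketch on the 'AK-747' input from above; the variable names are mine.

```php
<?php
$input = 'AK-747';

// Without \K the letters belong to the delimiter, so they are lost
// and an empty leading element is produced.
$withoutK = preg_split('/[a-z]{2,}-?/i', $input);

// With \K only the optional hyphen is treated as the delimiter.
$withK = preg_split('/[a-z]{2,}\K-?/i', $input);

var_export($withoutK); // array ( 0 => '', 1 => '747', )
echo "\n";
var_export($withK);    // array ( 0 => 'AK', 1 => '747', )
echo "\n";
```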
As for the inclusion of the 3rd and 4th parameter of preg_split()...
I've set the 3rd parameter to 2. This is just like the limit parameter that explode() has. It instructs the function to not make more than 2 output elements. For this case, I could have used NULL or -1 to mean "unlimited", but I could NOT leave the parameter empty -- it must be assigned to allow for the declaration of the 4th parameter.
I've set the 4th parameter to PREG_SPLIT_NO_EMPTY which instructs the function to not generate empty output elements.
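As a rough illustration of those two parameters in isolation, here is a sketch on a made-up input ('a--b' is my own example, not one of the inputs above):

```php
<?php
$input = 'a--b'; // hypothetical input with two adjacent delimiters

// Default: no limit, empty pieces are kept.
$all = preg_split('/-/', $input);

// -1 means "no limit"; it has to be passed so the 4th parameter can be given.
$noEmpty = preg_split('/-/', $input, -1, PREG_SPLIT_NO_EMPTY);

// A limit of 2 stops after the first split; the rest stays in one piece.
$limited = preg_split('/-/', $input, 2);

var_export($all);     // array ( 0 => 'a', 1 => '', 2 => 'b', )
echo "\n";
var_export($noEmpty); // array ( 0 => 'a', 1 => 'b', )
echo "\n";
var_export($limited); // array ( 0 => 'a', 1 => '-b', )
echo "\n";
```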
Ta-Da!
p.s. a preg_match_all() solution is as easy as using a pipe and two anchors:
$inputs=['AK747','AK-747','AK-','AK']; // variations as I understand them
foreach($inputs as $input){
echo "$input returns: ";
var_export(preg_match_all('/^[a-z]{2,}|\d{2,}$/i',$input,$out)?$out[0]:[]);
echo "\n";
}
// same outputs as above
You can make the - optional with ?.
/([A-Za-z]{2,}-?[0-9]{2,})/
https://regex101.com/r/tIgM4F/1

PHP find multiple currency numbers in string

I'm writing php script which will recognise bank payment reports.
For example, I have this code:
$str = "Customer Name /First Polises number - SAT431223 (5.20 eur), BOS32342 (33,85 euro), (32,10 eiro), (78.66 €), €1232,2, (11.45)";
And I need to find all these currency combinations in the string, so the output would be like this:
5.20
33.85
32.10
78.66
1232.20
11.45
How can I do that? I know the function preg_match(), but I don't understand how to write pattern for that case.
preg_match will give you only the first match found, but you can use preg_match_all to get an array of all matches.
Here's everything you need to know about how to build regex patterns:
http://php.net/manual/en/reference.pcre.pattern.syntax.php
You need a pattern like this: /[0-9]+[,.]{1}[0-9]{2}/
/ - delimiter; it can be another character, but it must appear at the beginning and end of the pattern.
[0-9] - matches digits
+, {1} and {2} - these define the number of characters. + is "one or more"; a number in {} is the exact number of characters.
[,.]{1} - this matches exactly one ({1}) character from the set of , and .
Example code:
$matches = array();
preg_match_all('/[0-9]+[,.]{1}[0-9]{2}/', $str, $matches);
var_dump($matches);
Result:
array (size=1)
0 =>
array (size=5)
0 => string '5.20' (length=4)
1 => string '33,85' (length=5)
2 => string '32,10' (length=5)
3 => string '78.66' (length=5)
4 => string '11.45' (length=5)
I would do this with:
/([0-9]+[,.][0-9]+)/g
Matching:
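Numbers (one or more times)
Dot or Comma
Numbers (one or more times)
Note the g: it is regex101's global flag for getting all matches. PHP's preg functions have no g modifier, so in PHP pass /([0-9]+[,.][0-9]+)/ to preg_match_all() instead.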
Example and more detailed break-down of the regex: https://regex101.com/r/eH6aX6/1
That will match any double values in the provided sentence which are not necessarily currency...
Hope it points you in the right direction
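Neither pattern above produces the exact list the question asks for, since "1232,2" has a single decimal digit and the commas are kept. One possible way to get there, as a sketch only: allow one or two decimals, then normalise the separator and pad with number_format(). The pattern and the variable names below are my assumptions, and like the answers above this matches any decimal number, not just currency amounts.

```php
<?php
$str = "Customer Name /First Polises number - SAT431223 (5.20 eur), BOS32342 (33,85 euro), (32,10 eiro), (78.66 €), €1232,2, (11.45)";

// Digits, a dot or comma, then one or two decimal digits.
preg_match_all('/\d+[.,]\d{1,2}/', $str, $matches);

$amounts = array_map(
    function (string $raw): string {
        // Use a dot as the decimal separator, then pad to two decimals.
        $value = (float) str_replace(',', '.', $raw);
        return number_format($value, 2, '.', '');
    },
    $matches[0]
);

echo implode("\n", $amounts), "\n";
// 5.20
// 33.85
// 32.10
// 78.66
// 1232.20
// 11.45
```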

Regular Expression on PHP

I have a problem in splitting a string using regex.
I have searched about regex to split string on uppercase word, but what I need is to split string like in the following example.
Having this example data:
This is First SentenceThis is Second Sentence
... the string should be split like this:
This is First Sentence
This is Second Sentence
Anyone know the solution for this?
You can use the \K token combined with a lookahead assertion.
$str = 'This is First SentenceThis is Second Sentence';
$results = preg_split('~[a-z]\K(?=[A-Z])~', $str);
print_r($results);
Or utilize both look-behind and lookahead assertions:
$results = preg_split('~(?<=[a-z])(?=[A-Z])~', $str);
Output
Array
(
[0] => This is First Sentence
[1] => This is Second Sentence
)

Regular expression in PHP being too greedy on words

I know I'm just being simple-minded at this point but I'm stumped. Suppose I have a textual target that looks like this:
Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.
Using this RegExp: \s[A-Z][A-Z]\d\d\d\d\s, how would I extract, individually, the first and second occurrences of the matching strings? "JH6781" and "RB1223", respectively. I guarantee that the matching string will appear exactly twice in the target text.
Note: I do NOT want to change the existing string at all, so str_replace() is not an option.
Erm... how about using this regex:
/\b[A-Z]{2}\d{4}\b/
It means 'match boundary of a word, followed by exactly two capital English letters, followed by exactly four digits, followed by a word boundary'. So it won't match 'TGX7777' (word boundary is followed by three letters - pattern match failed), and it won't match 'TX77777' (four digits are followed by another digit - fail again).
And that's how it can be used:
$str = "Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.";
preg_match_all('/\b[A-Z]{2}\d{4}\b/', $str, $matches);
var_dump($matches[0]);
// array
// 0 => string 'JH6781' (length=6)
// 1 => string 'RB1223' (length=6)
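The word-boundary behaviour described above can be checked on the two counter-examples. A small sketch; the $candidates list and $verdicts array are mine:

```php
<?php
$pattern = '/\b[A-Z]{2}\d{4}\b/';

// One valid id plus the two counter-examples from the explanation.
$candidates = ['JH6781', 'TGX7777', 'TX77777'];

$verdicts = [];
foreach ($candidates as $candidate) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $verdicts[$candidate] = preg_match($pattern, $candidate) === 1;
    echo $candidate, ' => ', $verdicts[$candidate] ? 'match' : 'no match', "\n";
}
// JH6781 => match
// TGX7777 => no match
// TX77777 => no match
```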
$s='Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother\'s HG766 id was RB1223.';
$n=preg_match_all('/\b[A-Z][A-Z]\d\d\d\d\b/',$s,$m);
gives the result $n=2, then
print_r($m);
gives the result
Array
(
[0] => Array
(
[0] => JH6781
[1] => RB1223
)
)
You could use a combination of preg_match with the offset parameter(5th) and strpos to select the first and second occurrence.
Alternatively you could use preg_match_all and just use the first two array entries
<?php
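// First occurrence.
$first = preg_match($regex, $subject, $match);
// Second occurrence: resume the search just past the start of the first match.
// strpos() needs the haystack as its first argument.
$second = preg_match($regex, $subject, $match2, 0, strpos($subject, $match[0]) + 1);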
?>
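A variation on the same idea, as a sketch: PREG_OFFSET_CAPTURE reports where the first match starts, so strpos() is not needed. The pattern is the word-boundary one from the earlier answer, and the variable names are mine.

```php
<?php
$subject = "Johnny was really named for his 1234 grandfather, John Hugenot, but his T5677 id was JH6781 and his little brother's HG766 id was RB1223.";
$regex   = '/\b[A-Z]{2}\d{4}\b/';

$firstId  = null;
$secondId = null;

// With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
if (preg_match($regex, $subject, $m1, PREG_OFFSET_CAPTURE)) {
    $firstId = $m1[0][0];
    // Continue searching right after the end of the first match.
    $next = $m1[0][1] + strlen($m1[0][0]);
    if (preg_match($regex, $subject, $m2, PREG_OFFSET_CAPTURE, $next)) {
        $secondId = $m2[0][0];
    }
}

echo $firstId, "\n";  // JH6781
echo $secondId, "\n"; // RB1223
```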
