php regex to match text - php

I need a PHP regex to match text that is not preceded by the word "Total" or "maximum" (case-insensitive) in the text below.
[1]
[1m]
[1mk][1mks]
[1mark]
[1marks]
(1mk)
12mk
12 mark
13 mark
[Total: 15]
Total: 16 mark
Total 1 mark
Total 12 mark
Total: 9 mark
Total: 10 mark
[Total: 11 marks] Total 6 mark
maximum 5 marks
maximum:5 marks
Note: This text is all on one long line.
The regex should match the following
[1]
[1m]
[1mk][1mks]
[1mark]
[1marks]
(1mk)
12mk
12 mark
13 mark
I have tried this one, but it's not working:
/(?<!Total\:\s|Total\s|maximum\s|maximum\:\s)[\[|\(]?([0-9]{1,2})(\s|(?=marks|mark|mks|mk|m|\]))?(\]|marks|mark|mks|mk|m)[\]|\)]?/i
EDIT
https://www.debuggex.com/r/yNNN_B3iQmGyYWoz
EDIT2
e.g. '12 mark' should be returned only if it is not "Total[:]\s+ 12 mark" or "maximum[:]\s+12 mark"

Try this: (?:\[?\b(?:Total|maximum):?\s?\d+\s?[^ ]+(*SKIP)(*FAIL))|(\d++\s?[^ )\]]*)
(Use ignore case.)
Explanation
Part 1
(?:\[? Non capturing group that may have a [
\b Boundary
(?:Total|maximum) non capturing group matching either literal
:?\s?\d+\s? Maybe a : maybe a space, some digits, maybe another space.
[^ ]+ A bunch of non spaces.
(*SKIP)(*FAIL))| Plot twist: Anything matching Part 1 FAILS
Part 2
This is captured, for real.
\d++\s? digits, maybe followed by a space.
[^ )\]]* And maybe stuff that's not a space, ), or ].
The PHP should look something like this:
preg_match_all(
'/(?:\[?\b(?:Total|maximum):?\s?\d+\s?[^ ]+(*SKIP)(*FAIL))|(\d++\s?[^ )\]]*)/i',
"YOUR STRING",
$matches
);
print_r($matches[0]);
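As a quick check, the pattern can be run against the sample line from the question. This is only a sketch: the variable names are illustrative, and the sample string assumes the items are joined with single spaces, as in the one-line text described in the question.

```php
<?php
// Sample text from the question, assumed to be joined into one line with single spaces.
$text = '[1] [1m] [1mk][1mks] [1mark] [1marks] (1mk) 12mk 12 mark 13 mark '
      . '[Total: 15] Total: 16 mark Total 1 mark Total 12 mark Total: 9 mark '
      . 'Total: 10 mark [Total: 11 marks] Total 6 mark maximum 5 marks maximum:5 marks';

// Part 1 consumes "Total ..." / "maximum ..." items and discards them via (*SKIP)(*FAIL);
// Part 2 captures the remaining number-plus-unit items.
$pattern = '/(?:\[?\b(?:Total|maximum):?\s?\d+\s?[^ ]+(*SKIP)(*FAIL))|(\d++\s?[^ )\]]*)/i';

preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Note that the surrounding brackets are not part of the matches, so [1mk] comes back as 1mk and (1mk) as 1mk.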

Actually, I would go for a two-step solution. First, clean up the trashy words by replacing them using this regexp:
(Total:?\s?|maximum:?\s?)
Then matching all the content you really need is easy:
\[?\(?([0-9]{1,2}\s?marks?|[0-9]{1,2}\s?mk?s?)\)?\]?
No idea how to use debuggex.com but I tested all regular expressions in pspad so it definitely works.

Related

Match multiple same occurence after one specific character chain REGEX

Thanks to anyone that will try to help me.
I am struggling to write a regex that handles this case:
I want every match of "Heure Pleine Saison Basse" that occurs after the first occurrence of "Acheminement conso".
Using the raw text below, I want to match "Heure Pleine Saison Basse" 3, 5, 6 and 7, but not 1 and 2.
Do not use the trailing numbers in the pattern; they are only there to help you understand which strings I want to match.
This example regex only matches the last occurrence:
Acheminement[\s\S]*(Heure Pleine Saison Basse)
Here is a raw text example:
Electricité n° de\n
compteur ancien\n
index nouvel\n
index conso\n
kWh/Qté prix unitaire\n
HT en euros montant HT\n
en euros taux de\n
TVA\n
Contribution cee du 14/07/22 au 13/08/22 143020,00495 70,7920,0%\n
Evolutions arenh du 14/07/22 au 13/08/22 14302-0,03149 -450,3720,0%\n
Consommation  du 14/07/22 au 13/08/22 154\n
Heure Pleine Saison Basse 1
Heure Pleine Saison Basse 2
Heure Creuse Saison Basse 2
Acheminement conso\n
kWh/Qté prix unitaire\n
HT en euros montant HT\n
en euros taux de\n
TVA\n
Composante de comptage du 1
Composante de comptage du 2
Composante de soutirage du 1
Composante de soutirage du 2
Composante de gestion 1
Composante de gestion 2
Consommation du 14/07/22 au 31/07/22 Heure Pleine Saison Basse 56200,02000 112,4020,0%\n
Heure Creuse Saison Basse 26840,01700 45,6320,0%\n
Consommation du 01/08/22 au 13/08/22\n
Heure Pleine Saison Basse 3
Heure Creuse Saison Basse 4
Heure Pleine Saison Basse 5
Heure Pleine Saison Basse 6
Heure Pleine Saison Basse 7
Services et prestations techniques conso\n
kWh/Qté prix unitaire\n
HT en euros montant HT\n
en euros taux de\n
TVA\n
Espace Client Gratuit\n
Taxes et Contributions conso\n
You can use
'/(?:\G(?!\A)|Acheminement conso)[\s\S]*?\KHeure Pleine Saison Basse/u'
'/(?:\G(?!\A)|Acheminement conso).*?\KHeure Pleine Saison Basse/su'
See the regex demo. Details:
(?:\G(?!\A)|Acheminement conso) - either Acheminement conso or the end of the previous match (\G(?!\A) is matching what \G operator matches except the position at the start of string that is "cancelled" with the (?!\A) negative lookahead)
[\s\S]*? - any zero or more chars as few as possible
\K - omit the text matched so far
Heure Pleine Saison Basse - a fixed string.
The u flag is necessary when you have to deal with Unicode strings.
The s flag is useful to make . match any characters including line breaks.
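A minimal sketch of how the second pattern could be used with preg_match_all follows. The text is a shortened, hypothetical excerpt of the invoice above, and the variable names are illustrative.

```php
<?php
// Shortened, hypothetical excerpt of the invoice text (ASCII only, for brevity).
$text = "Heure Pleine Saison Basse 1\n"
      . "Heure Pleine Saison Basse 2\n"
      . "Acheminement conso\n"
      . "Heure Pleine Saison Basse 3\n"
      . "Heure Creuse Saison Basse 4\n"
      . "Heure Pleine Saison Basse 5\n";

$pattern = '/(?:\G(?!\A)|Acheminement conso).*?\KHeure Pleine Saison Basse/su';

// PREG_OFFSET_CAPTURE adds the byte offset of each match, which shows
// that only the occurrences after "Acheminement conso" are returned.
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

foreach ($matches[0] as [$match, $offset]) {
    // Print the match together with the rest of its line (" 3", " 5").
    $lineEnd = strpos($text, "\n", $offset);
    echo substr($text, $offset, $lineEnd - $offset), "\n";
}
```

Because of \K, each reported match starts at "Heure Pleine Saison Basse" itself, not at "Acheminement conso" or at the end of the previous match.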

Regular Expression for replace all non digit expect symbols

I can't figure this out. I think it is possible with only one pattern; please help me improve it.
I have this string: 2 / 3 items, and I want to get only 2 / 3
Items can also be written in Cyrillic, so: 2 / 3 штуки
So I think the best way is to use \D to remove all non-digits (result: 23)
But this also deletes the slash that I want to keep. How can I do this?
// this was my solution for now,
// but it does not work for Cyrillic because I get an error
// it returns: 2 / 3 �
// maybe it is something to do with the encoding?
preg_replace('#[a-zA-Zа-яА-Я]*#', '', '2 / 3 штуки');
// so I chose to do this, but I don't know how to keep the slash
preg_replace('#[\D]*#', '', '2 / 3 штуки');
// it returns: 23
# How to get 2 / 3 ?
You can use
if (preg_match('~\d+\s*/\s*\d+~u', $text, $match)) {
echo $match[0];
}
Also, if the fraction part is optional, use
preg_match('~\d+(?:\s*/\s*\d+)?~u', $text, $match)
And if you need to extract all occurrences, use preg_match_all:
preg_match_all('~\d+(?:\s*/\s*\d+)?~u', $text, $matches)
See the regex demo and the PHP demo. Note that preg_match extracts the match rather than removing it (as is the case with preg_replace).
Pattern details
\d+ - one or more digits
\s*/\s* - / enclosed with zero or more whitespaces
\d+ - one or more digits
Note that u is used in case the whitespace in your string can be other than regular ASCII whitespace, like \xA0.
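The extraction can be wrapped in a small helper, sketched below. The function name extractFraction is hypothetical and only serves the illustration.

```php
<?php
// Hypothetical helper: returns the first "N / M" fragment, or null if none is found.
function extractFraction(string $text): ?string
{
    return preg_match('~\d+\s*/\s*\d+~u', $text, $match) === 1 ? $match[0] : null;
}

var_dump(extractFraction('2 / 3 items'));  // the Latin-script example from the question
var_dump(extractFraction('2 / 3 штуки'));  // the Cyrillic example from the question
var_dump(extractFraction('no digits'));    // no fraction present
```

Since the pattern only describes the part to keep, it does not matter which alphabet the trailing word is written in.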

Conditional regex length based on the first character

There is a string with numbers I need to validate with PHP preg_match.
If it starts with 10, 20 or 30, I need 7 more digits after the initial 2, but in all other cases I need exactly 8 digits and don't care what the leading characters are.
The first part is the simple one
/^(1|2|3)0\d{7}$/
But how can I add an ELSE part? There I need a simple
^\d{8}$
I need to match these examples:
101234567
201234567
12345678
33445566
You may use
^(?:[1-3]0\d{7}|(?![1-3]0)\d{8})$
See the regex demo
Details
^ - start of string
(?: - start of a non-capturing group:
[1-3]0\d{7} - 1, 2 or 3, then 0 and any 7 digits
| - or
(?![1-3]0)\d{8} - no 10, 20 or 30 immediately at the start of the string are allowed, then any 8 digits are matched
) - end of the group
$ - end of the string.
Here's an alternative using (?(?=regex)then|else) aka conditionals:
^(?(?=[1-3]0)[1-3]0\d{7}|\d{8})$
It literally says: if [1-3]0 is right at the start, match [1-3]0\d{7}, else match \d{8}.
Demo: https://regex101.com/r/LXoHyk/1 (examples shamelessly taken from Wiktor's answer)
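Both patterns can be compared in PHP with a sketch like the one below. The constant and function names are hypothetical, and the last two inputs are extra negative cases that are not from the question.

```php
<?php
// Both patterns from the answers above; the names are hypothetical.
const LOOKAHEAD_PATTERN   = '/^(?:[1-3]0\d{7}|(?![1-3]0)\d{8})$/';
const CONDITIONAL_PATTERN = '/^(?(?=[1-3]0)[1-3]0\d{7}|\d{8})$/';

function isValidNumber(string $value, string $pattern = LOOKAHEAD_PATTERN): bool
{
    return preg_match($pattern, $value) === 1;
}

// The first four values come from the question; the last two are added
// negative cases (8 digits starting with 10, and 9 digits not starting with 10/20/30).
$values = ['101234567', '201234567', '12345678', '33445566', '10123456', '123456789'];

foreach ($values as $value) {
    printf("%-10s %s\n", $value, isValidNumber($value) ? 'valid' : 'invalid');
}
```

Passing CONDITIONAL_PATTERN as the second argument should give the same verdicts, since both patterns encode the same if/else rule.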

Combine two regular expressions for php

I have these two regular expressions:
^(((98)|(\+98)|(0098)|0)(9){1}[0-9]{9})+$
^(9){1}[0-9]{9}+$
How can I combine these patterns?
Valid phone numbers:
start only with 0098, +98, 98, 09 or 9
Samples:
00989151855454
+989151855454
989151855454
09151855454
9151855454
You haven't provided what passes and what doesn't, but I think this will work if I understand correctly...
/^\+?0{0,2}98?/
Live demo
^ Matches the start of the string
\+? Matches 0 or 1 plus symbols (the backslash is to escape)
0{0,2} Matches between 0 and 2 (0, 1, or 2) of the 0 character
9 Matches a literal 9
8? Matches 0 or 1 literal 8 characters
Looking at your second regex, it looks like you want to make the first part ((98)|(\+98)|(0098)|0) in your first regex optional. Just make it optional by putting ? after it and it will allow the numbers allowed by second regex. Change this,
^(((98)|(\+98)|(0098)|0)(9){1}[0-9]{9})+$
to,
^(?:98|\+98|0098|0)?9[0-9]{9}$
The ? after the non-capturing group makes the group, which contains the alternatives you want to allow, optional.
I've made a few more corrections to the regex. The {1} is redundant, since matching exactly once is the default behavior of any token, and you don't need to group parts of a regex unless you need the groups. I've also removed the outermost parentheses and the + after them, as they are not needed.
Demo
This regex
^(?:98|\+98|0098|0)?9[0-9]{9}$
matches
00989151855454
+989151855454
989151855454
09151855454
9151855454
Demo: https://regex101.com/r/VFc4pK/1/
However, note that this requires a 9 as the first digit after the country code or the leading 0.
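In PHP, the combined pattern could be used roughly as follows. The helper name isValidPhone is hypothetical, and the last sample is an added negative case that is not from the question.

```php
<?php
// Hypothetical helper built on the combined pattern from the answer above.
function isValidPhone(string $number): bool
{
    return preg_match('/^(?:98|\+98|0098|0)?9[0-9]{9}$/', $number) === 1;
}

$samples = [
    '00989151855454',
    '+989151855454',
    '989151855454',
    '09151855454',
    '9151855454',
    '8151855454',  // added negative case: does not start with 9
];

foreach ($samples as $sample) {
    printf("%-15s %s\n", $sample, isValidPhone($sample) ? 'valid' : 'invalid');
}
```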

PHP Regex to capture names if prefixed with key words

I'm in need of a PHP regular expression to capture the first initial and last name of people listed in a text document, but only when the sentence or line contains one of a few keywords (from, with, of, and, as, observed). My current attempt captures list items, i.e. "A. General" or "B. Issues", because it doesn't seem to care about what's in front of the names.
I've been using preg_match_all() in the hope of it returning an array of names (first initial, last name).
Example text
"from J. Smith and B. Miller"
"as T. Baker observed M. Kelly"
"We inquired with B. Brown, T. Stark and J. Maddox."
I've tried
$regex = "/[from|with|of|and|as|observed|,|.]\s+([A-Z]. \w+)/";
$regex = "/((from|with|of|and|as|observed|,|.)\s+([A-Z]. \w+))/";
$regex = "/\b(from|with|of|and|as|observed|,|.)\s+([A-Z].\ \w+)/";
$regex = "/\b(from|with|of|and|as|observed|,|.|\b)\s+([A-Z].\ \w+)/";
I cannot make it only capture when the word list is before the names. I can't use ^ to check 'starts with'. I'm horrible at regex and guess until it works. I feel the solution requires some sort of look-behind assertion, though I'm not sure how it works.
Output
Should be an array
[ 'J. Smith', 'B. Miller' ]
[ 'T. Baker', 'M. Kelly' ]
[ 'B. Brown', 'T. Stark', 'J. Maddox' ]
UPDATE
Final Regexp
$regex = "/\b(?:from|with|of|and|as|observed|,)\s+([A-Z].\ \w+)/";
Seems to work with the few documents I have. Thanks everyone!!
You can use this modified version of your third regex :
\b(?:from|with|of|and|as|observed|,)\s+([A-Z].\ \w+)
You need to escape . in the first group or it will accept any character. Not relevant after edit
PHP's preg functions have no g flag; use preg_match_all() to find every occurrence of the pattern, and you will be able to access the results in $matches[1].
(The added ?: in the first group prevents it from being captured; you can remove it if you need to know the keyword, but then the results will be stored in $matches[2].)
Edit : Removed \. in first group to not match end of sentences (see author comment).
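A sketch of the keyword-prefixed pattern with preg_match_all follows. The helper name extractNames is hypothetical. As an assumption that goes beyond the pattern above, the dot after the initial is escaped so that it only matches a literal ".".

```php
<?php
// Hypothetical helper. The dot after the initial is escaped here (an assumption),
// so "A1 Smith" style text would not be picked up as an initial.
function extractNames(string $line): array
{
    $regex = '/\b(?:from|with|of|and|as|observed|,)\s+([A-Z]\. \w+)/';
    preg_match_all($regex, $line, $matches);
    return $matches[1];  // group 1 holds "initial + last name"
}

$lines = [
    'from J. Smith and B. Miller',
    'as T. Baker observed M. Kelly',
    'We inquired with B. Brown, T. Stark and J. Maddox.',
    'A. General',  // list item without a keyword: yields no names
];

foreach ($lines as $line) {
    print_r(extractNames($line));
}
```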
You can try looking for a capital letter followed by a dot and a word
[A-Z]\.\s\w+
I think this should work
/(?!^from|with|of|and|as|observed|\s)([A-Z]{1,}\.\s\w*)/g
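Where
(?!...) is a negative lookahead: it asserts that the text at the current position does not start with any of the listed alternatives (the ^ inside it applies only to from). It does not consume or discard anything, so it does not require a keyword before the name.
^ = matches the beginning of the line/sentence/string
Then the second group matches one or more capital letters ([A-Z]{1,}), then a dot \., a whitespace character \s and zero or more word characters \w*
The /g at the end stands for "global search" on regexr; PHP has no g modifier, so use preg_match_all() there instead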
https://regexr.com/3pa9o
