Example text:
data=1 type=old
data=2 type=test (2)
type=test data=3 (3)
I need to get the data id from lines 2 and 3.
My code:
(data=([\d]+)|type=test)\s+(?!\1)((?1))
but it doesn't capture data=3.
You need the g (global) and m (multiline) flags on your regex:
/(data=([\d]+)|type=test)\s+(?!\1)((?1))/gm
In its simplest form, you may use
^(?=.*type=test).*data=(\d+)
See the regex demo
You may add word/whitespace boundaries later if necessary, e.g.
^(?=.*\btype=test\b).*\bdata=(\d+)\b
^(?=.*(?<!\S)type=test(?!\S)).*(?<!\S)data=(\d+)(?!\S)
The point is
^ - start of string (or of each line when the m flag is set)
(?=.*type=test) - a positive lookahead: type=test must occur somewhere to the right of the current position, after any 0+ chars other than line break chars
.* - any 0+ chars other than line break chars as many as possible
data= - a string
(\d+) - Group 1: 1+ digits
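As a rough illustration of the lookahead approach above, here is a minimal sketch in Python's re module (an assumption on my side, since the question does not name a language). The variable names text, pattern and ids are made up for the example, and the multiline flag is needed so that ^ matches at every line.

```python
import re

# Sample text from the question; each record sits on its own line.
text = "data=1 type=old\ndata=2 type=test (2)\ntype=test data=3 (3)"

# re.MULTILINE makes ^ match at the start of every line, not only the string.
pattern = re.compile(r"^(?=.*type=test).*data=(\d+)", re.MULTILINE)

# findall returns the contents of Group 1 for every matching line.
ids = pattern.findall(text)
print(ids)
```

Because the lookahead only checks that type=test is present somewhere on the line, the order of the two fields does not matter.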
I have a variable that can contain one of a few variations. It also contains a number, which can be any number.
The variations:
($stuksprijs+9.075);
($stuksprijs-9.075);
($stuksprijs*9.075);
($m2+9.075);
($m2-9.075);
($m2*9.075);
($o+9.075);
($o-9.075);
($o*9.075);
These are the only variations; only the number in them can change, and that number is what I need.
So there can be:
($m2+5);
or
($o+8.25);
or
($stuksprijs*3);
How can I get the number from those variations? How can I get the 9.075 or 5 or 8.25 or 3 from my above examples with regular expression?
I am trying to fix this with PHP, my variable that contains the string is: $explodeberekening[1]
I read multiple regex tutorials and got it to work for a single string that never changes, but how can I write a regex to get the number from above variations?
As per my comment, which seems to have worked, you can try:
^\(\$(?:stuksprijs|m2|o)[+*-](\d+(?:\.\d+)?)\);$
The number is captured in the 1st capture group. See the online demo.
A quick breakdown:
^ - Start string anchor.
\(\$ - Literally match "($".
(?: - Open a non-capture group to list alternation:
stuksprijs|m2|o - Match one of these literal alternatives.
) - Close non-capture group.
[+*-] - Match one of the symbols from the character-class.
( - Open 1st capture group:
\d+ - 1+ digits.
(?:\.\d+)? - Extra optional non-capture group to match a literal dot and 1+ digits.
) - Close 1st capture group.
\); - Literally match ");".
$ - End string anchor.
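The breakdown above can be tried out with a short sketch. It uses Python's re module rather than PHP (in PHP the same pattern would go through preg_match), and the names NUMBER_RE and extract_number are hypothetical helpers for this example only.

```python
import re

# Same pattern as in the answer; Group 1 holds the number.
NUMBER_RE = re.compile(r"^\(\$(?:stuksprijs|m2|o)[+*-](\d+(?:\.\d+)?)\);$")

def extract_number(expression):
    """Return the number as a string, or None if the format is unexpected."""
    match = NUMBER_RE.search(expression)
    return match.group(1) if match else None

for sample in ["($m2+5);", "($o+8.25);", "($stuksprijs*3);"]:
    print(sample, "->", extract_number(sample))
```

The number comes back as a string; convert it with float() if arithmetic is needed.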
I've got a list of URLs that end with a random string, like this:
paris-chambre-double-classique-avec-option-petit-dejeuner-a-lhotel-trianon-rive-gauche-4-pour-2-personnes-8ae0676c-aba2-4cf2-9391-91096a247672
paris-chambre-double-standard-avec-petit-dejeuner-et-acces-spa-pour-2-personnes-a-lhotel-le-mareuil-4-f707b0fe-31cb-4507-b7b3-7b91695bff9c
villes-deurope-visite-des-plus-grands-monuments-et-acces-aux-activites-etou-transport-avec-un-pass-par-destination-6a04659b-62c4-4995-9d0f-5e473df520cd
paris-chambre-doubletriplequadruple-confort-avec-petit-dejeuner-a-lhotel-de-france-gare-de-lyon-pour-2-a-4-pers-404f5780-9818-4599-af6b-be53b85a8185
paris-chambre-double-standard-avec-pdj-et-croisiere-sur-la-seine-en-option-a-lhotel-prince-albert-lyon-bercy-pour-2-33d0b087-5701-4199-9d9c-147cca687263.html
For a few days now I have been trying to use a regex to convert these lines into:
/paris-chambre-double-classique-avec-option-petit-dejeuner-a-lhotel-trianon-rive-gauche-4-pour-2-personnes-8ae0676c-aba2-4cf2-9391-91096a247672
/paris-chambre-double-standard-avec-petit-dejeuner-et-acces-spa-pour-2-personnes-a-lhotel-le-mareuil-4-f707b0fe-31cb-4507-b7b3-7b91695bff9c
villes-deurope-visite-des-plus-grands-monuments-et-acces-aux-activites-etou-transport-avec-un-pass-par-destination-6a04659b-62c4-4995-9d0f-5e473df520cd.html
/paris-chambre-doubletriplequadruple-confort-avec-petit-dejeuner-a-lhotel-de-france-gare-de-lyon-pour-2-a-4-pers-404f5780-9818-4599-af6b-be53b85a8185
paris-chambre-double-standard-avec-pdj-et-croisiere-sur-la-seine-en-option-a-lhotel-prince-albert-lyon-bercy-pour-2-33d0b087-5701-4199-9d9c-147cca687263.html
The problem is the random string :
3d0b087-5701-4199-9d9c-147cca687263
33d0b087-5701-4199-9d9c-147cca687263
I need to remove this part, including the hyphen in front of it, then append .html and prepend a / to the URL, like this:
/paris-chambre-doubletriplequadruple-confort-avec-petit-dejeuner-a-lhotel-de-france-gare-de-lyon-pour-2-a-4-pers.html
paris-chambre-double-standard-avec-pdj-et-croisiere-sur-la-seine-en-option-a-lhotel-prince-albert-lyon-bercy-pour-2.html
Thanks for your help. Regex is driving me crazy.
This is for a new Linux server, running MySQL 5, PHP 5 and Apache 2.
The lines appear to end with some sort of hexadecimal identifier (it has the 8-4-4-4-12 layout of a UUID), which means it can only contain the letters a to f and digits.
To match this hash, you can use the following regex (it does include the initial dash):
\-[0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12}
See here for a demo
Once you have matched what you want to remove, you can replace it with the PHP preg_replace function.
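To make the replace step concrete, here is a minimal sketch using Python's re.sub, which plays the role of preg_replace here. The helper name to_path is hypothetical, and stripping an already present .html before appending a new one is my own assumption, added because one of the sample lines already ends in .html.

```python
import re

# The identifier pattern from above (the escaped dashes are optional).
HASH_RE = re.compile(
    r"-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

def to_path(slug):
    """Drop the trailing identifier, then add a leading / and a .html suffix."""
    without_hash = HASH_RE.sub("", slug)
    # Assumption: avoid a doubled extension when the input already has one.
    if without_hash.endswith(".html"):
        without_hash = without_hash[: -len(".html")]
    return "/" + without_hash + ".html"

print(to_path("hotel-de-france-pour-2-a-4-pers-404f5780-9818-4599-af6b-be53b85a8185"))
```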
You could use this pattern to capture the part you want to keep into a group: ^(.+)(?:-[0-9a-zA-Z]+){5}$
and use /\1.html as the replacement pattern.
Explanation:
^ - match beginning of a string
(.+) - capturing group: match one or more of any characters
(?:...) - non-capturing group
-[0-9a-zA-Z]+ - match hyphen - literally, then any letter (lower or uppercase) or any digit one or more times
{5} - match (?:-[0-9a-zA-Z]+) exactly five times
$ - match end of string
Replace pattern:
/ - a / literally, prepended to the URL
\1 - refers to first capturing group
.html - .html literally
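For comparison, the capture-group variant might look like this in Python's re module (again standing in for preg_replace). The function name rewrite_slug is hypothetical, and the replacement here puts a forward slash in front of the kept part, as the question asks.

```python
import re

# Keep everything before the last five hyphen-separated alphanumeric chunks.
SLUG_RE = re.compile(r"^(.+)(?:-[0-9a-zA-Z]+){5}$")

def rewrite_slug(slug):
    """Return /<kept part>.html, or the input unchanged if it does not match."""
    return SLUG_RE.sub(r"/\1.html", slug)

print(rewrite_slug("lhotel-le-mareuil-4-f707b0fe-31cb-4507-b7b3-7b91695bff9c"))
```

Note that this pattern does not check that the five chunks look like a hexadecimal identifier, and that a line already ending in .html is left untouched because the dot is not in the character class.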
I am looking to build a regular expression to parse a string, which can take one of the following two forms:
Part 1 (Part 2 - Part 3)
or
Part 1 (Part 2)
The following regular expression matches the first string and captures all three parts:
(.*)\((.*)(?:-)(.*)\)
But I am unable to adapt it so that it matches both strings. I want one regex to match both. I am not sure whether that is even possible.
You may use
'~(.*)\((.*?)(?:-(.*))?\)~'
See the regex demo
Details
(.*) - Group 1: any 0+ chars other than line break chars, as many as possible
\( - a ( char
(.*?) - Group 2: any 0+ chars other than line break chars, as few as possible
(?:-(.*))? - an optional group matching a - and then capturing into Group 3 any 0+ chars other than line break chars, as many as possible
\) - a ) char.
If there can be no other parentheses than those shown in the string, you may optimize the pattern to ^([^()]*)\(([^()-]*)(?:-([^()]*))?\)$.
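A small sketch of how the optional third group behaves, written with Python's re module as a stand-in for PHP's preg_match. The helper split_parts is hypothetical, and trimming the surrounding whitespace is an extra step I added for readability.

```python
import re

# Group 3 sits inside an optional non-capturing group, so it may be absent.
PARTS_RE = re.compile(r"(.*)\((.*?)(?:-(.*))?\)")

def split_parts(text):
    """Return (part1, part2, part3); part3 is None when there is no dash."""
    match = PARTS_RE.search(text)
    if match is None:
        return None
    return tuple(g.strip() if g is not None else None for g in match.groups())

print(split_parts("Part 1 (Part 2 - Part 3)"))
print(split_parts("Part 1 (Part 2)"))
```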
I'm new to regular expressions and am trying to extract the text in a string that starts with a value in brackets at the beginning of a new line and runs until the next such value in brackets.
My string:
(1x) cat
dog
(2) ele(4)phant
tiger
(x) fish
bird
I need to get:
- "1x" and "cat\r\ndog"
- "2" and "ele(4)phant\r\ntiger"
- "x" and "fish\r\nbird"
My regex:
(\r\n)*(\((.*?)\))(.*)
This gets me:
Match 1
Full match 0-8 `(1x) cat`
Group 2. 0-4 `(1x)`
Group 3. 1-3 `1x`
Group 4. 4-8 ` cat`
Match 2
Full match 13-28 `(2) ele(4)phant`
Group 2. 13-16 `(2)`
Group 3. 14-15 `2`
Group 4. 16-28 ` ele(4)phant`
Match 3
Full match 35-44 `(x) fish `
Group 2. 35-38 `(x)`
Group 3. 36-37 `x`
Group 4. 38-44 ` fish `
The problem is that my regex seems to stop at the end of the line so the strings on the new line (dog, tiger, bird) are missing.
Do you have an idea how to also get the content of the next lines until the next match?
You may use
'~^\(([^()]*)\)(.*(?:\R(?!\([^()]*\)).*)*)~m'
See the regex demo
Details
^ - start of a line (due to m modifier, ^ matches the start of a line rather than the start of the whole string)
\( - a (
([^()]*) - Group 1:
[^()]* - 0+ chars other than ( and ) (you might use your .*? here, if you do not want to overflow across lines, and want to match ( inside (...))
\) - a ) char
(.*(?:\R(?!\([^()]*\)).*)*) - Group 2:
.* - the rest of the line
(?:\R(?!\([^()]*\)).*)* - 0+ sequences of
\R(?!\([^()]*\)) - line break not followed with (...) substring
.* - rest of the line
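The same idea can be sketched in Python, with two substitutions that are my own assumptions: Python's re module has no \R, so the line break is written as \r?\n, and .* is written as [^\r\n]* so that a carriage return is never swallowed into the captured text. The names BLOCK_RE and parse_blocks are hypothetical.

```python
import re

# Group 1: the bracketed value at the start of a line.
# Group 2: the rest of that line plus any following lines that do not
#          themselves start with a bracketed value.
BLOCK_RE = re.compile(
    r"^\(([^()]*)\)([^\r\n]*(?:\r?\n(?!\([^()]*\))[^\r\n]*)*)",
    re.MULTILINE,
)

def parse_blocks(text):
    """Return a list of (label, body) pairs with surrounding spaces trimmed."""
    return [(label, body.strip()) for label, body in BLOCK_RE.findall(text)]

sample = "(1x) cat\r\ndog\r\n(2) ele(4)phant\r\ntiger\r\n(x) fish\r\nbird"
for label, body in parse_blocks(sample):
    print(repr(label), repr(body))
```

The (4) inside ele(4)phant does not start a new block because the negative lookahead is only checked right after a line break.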
I'm working through a bunch of text in which I'm looking for the following strings:
INT.
EXT.
INT./EXT.
EXT./INT.
The text under analysis is, for instance,
17 INT. BLOOM HOUSE - NIGHT 17
27 INT./EXT. BLOOM HOUSE - (PRESENT) DAY 27
Calls in PHP to, for instance,
preg_match("/^\w.*(INT\.\/EXT\.|EXT\.\/INT\.|EXT\.|INT\.)(.*)$/", $a_line, $matches);
and variants of that don't quite handle the greediness right (or so I think, anyway), and something gets left out, usually INT./EXT. or EXT./INT. items. Any advice? Thanks!
True, you need to use lazy dot matching with \w.*?, but you can also optimize the pattern to shorten the alternation group like this:
/^\w.*?(INT\.(?:\/EXT\.)?|EXT\.(?:\/INT\.)?)(.*)$/
See the regex demo
Also, if you are processing the text as a whole, you will need the /m multiline modifier.
Details:
^ - start of a string
\w - a word char
.*? - any 0+ chars other than line break chars as few as possible up to the first
(INT\.(?:\/EXT\.)?|EXT\.(?:\/INT\.)?) - Group 1 capturing either:
INT\.(?:\/EXT\.)? - INT. followed with optional /EXT. substring
| - or
EXT\.(?:\/INT\.)? - EXT. followed with optional /INT. substring
(.*) - Group 2: any 0+ chars other than line break chars up to the...
$ - end of string.
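Here is a minimal sketch of the shortened alternation, using Python's re module in place of preg_match and the multiline flag because the sample is processed as one block of text. The names SCENE_RE and find_scenes are hypothetical.

```python
import re

# Lazy .*? stops at the first INT. or EXT.; the optional groups then pick up
# the /EXT. or /INT. continuation when it is present.
SCENE_RE = re.compile(
    r"^\w.*?(INT\.(?:/EXT\.)?|EXT\.(?:/INT\.)?)(.*)$",
    re.MULTILINE,
)

def find_scenes(script):
    """Return (location type, rest of line) for every scene heading."""
    return [(kind, rest.strip()) for kind, rest in SCENE_RE.findall(script)]

script = (
    "17 INT. BLOOM HOUSE - NIGHT 17\n"
    "27 INT./EXT. BLOOM HOUSE - (PRESENT) DAY 27"
)
for kind, rest in find_scenes(script):
    print(kind, "|", rest)
```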