Match just once with regex - php

I'm using this regex to match words that contain no digits, and it works well:
(?:searchForThis|\G).+?(\b[^\d\s]+?\b)
The problem is that the regex searches the entire document, not only the line that contains searchForThis.
So if searchForThis appears twice, the words after both occurrences are matched.
I want it to stop after the first such line and not search the lines that follow.
Any help please?
I'm using the regex with PHP.
Example of the problem here: http://www.rubular.com/r/vPhk8VbqZR
In the example you will see :
Match 1
1. word
Match 2
1. worldtwo
Match 3
1. wordfive
Match 4
1. word
Match 5
1. worldtwo
Match 6
1. wordfive
But I need only :
Match 1
1. word
Match 2
1. worldtwo
Match 3
1. wordfive
As you can see, every word is matched twice.
===========Edit for more details as asked ===========================
In my php I have :
define('CODE_REGEX', '/(?:searchForThis|\G(?<!^)).*?(\b[a-zA-Z]+\b)/iu');
Output :
if (preg_match_all(CODE_REGEX, $content, $result))
return trim($result[1][0].' '.$result[1][1].' '.$result[1][2].' '.$result[1][3].' '.$result[1][4].' '.$result[1][5]);
Thank you

You can use this pattern instead:
/(?:\A[\s\S]*?searchForThis|\G).*?(\b[a-z]+\b)/iu
or
/(?:\A(?s).*?searchForThis|\G)(?-s).*?(\b[a-z]+\b)/iu
To handle several lines between the first "searchForThis" and the next one (or the end of the string), you can use this pattern instead. With your example string it also returns "After" and "this".
/(?:\A.*?searchForThis|\G)(?>[^a-z]++|\b[a-z]++\S)*?(?!searchForThis)(\b[a-z]+\b)/ius
Note: in all three patterns you can replace \A with ^, since multiline mode is not used. Be careful with Rubular, which is designed for Ruby regexes: m in Ruby corresponds to s in PHP (the dotall/singleline mode), whereas m in PHP is the multiline mode (^ matches at the start of every line).
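As a quick check of the first pattern, here is a minimal sketch. The sample text is made up for illustration (it is not the string from the Rubular link), and it contains "searchForThis" on two different lines:

```php
<?php
// Hypothetical sample text: "searchForThis" appears on two different lines.
$content = implode("\n", [
    'intro 111 line',
    'searchForThis 123 word 456 worldtwo 789 wordfive',
    'unrelated 42 line',
    'searchForThis 999 again',
]);

// \A ties the first branch to the very start of the string, so it can only
// succeed once. Every later match has to continue from \G, the position
// where the previous match ended, and "." cannot cross the line break.
$pattern = '/(?:\A[\s\S]*?searchForThis|\G).*?(\b[a-z]+\b)/iu';

preg_match_all($pattern, $content, $result);
print_r($result[1]); // word, worldtwo, wordfive
```

The second "searchForThis" line is never reached: once the chain of \G matches breaks at the end of the first line, neither branch can start a new match.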

You can do it in two stages:
// get the rest of the first line that contains 'searchForThis'
// ("." stops at the line break, so no trailing \n or m modifier is needed)
preg_match('/searchForThis(?<line>.*)/', $text, $results);
$line = $results['line'];
// get every word from this line
preg_match_all('/\b[a-z]+\b/i', $line, $results);
$words = $results[0];
Another way, based on Casimir's great answer (just for readability):
preg_match_all('/(?s:^.*?searchForThis|\G).*?(?<words>\b[a-z]+\b)/iu', $str, $results);
$words = $results['words'];
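The two-stage idea can also be wrapped in a small function that copes with a missing marker. This is only a sketch; the function name wordsAfterMarker and the sample text are assumptions made for illustration:

```php
<?php
/**
 * Hypothetical helper: returns the letter-only words that follow the first
 * occurrence of $marker on its line, or an empty array if $marker is absent.
 */
function wordsAfterMarker(string $text, string $marker): array
{
    // Stage 1: "." does not match a newline by default, so the capture stops
    // at the end of the first line that contains the marker.
    if (!preg_match('/' . preg_quote($marker, '/') . '(.*)/', $text, $m)) {
        return [];
    }
    // Stage 2: collect runs of letters delimited by word boundaries.
    preg_match_all('/\b[a-z]+\b/i', $m[1], $words);
    return $words[0];
}

$text = "searchForThis 1 word 22 worldtwo\nsearchForThis 3 other";
print_r(wordsAfterMarker($text, 'searchForThis')); // word, worldtwo
print_r(wordsAfterMarker($text, 'missing'));       // empty array
```

Checking the return value of preg_match avoids an undefined-index notice when the marker does not occur at all.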

Related

Regex in conditional or with exact number of digits

I've been struggling to write a regex that uses the "or" operator (alternation).
For example, given the following paths:
Allowed numbers: 1, 2, 5, 6, 20
"/path/item/1"
"/path/item/2"
"/path/item/5"
etc
The regex that I have been testing is:
"/\/path\/item\/(1|2|5|6|20)/"
What I want is for the regex to match only if the number is 1, 2, 5, 6, etc.
But for the number 20, the regex matches on the 2 rather than on 20.
How can I validate each value independently, so that 2 matches only 2 and not 20, and 20 matches only 20 and not 2?
What would a regex that implements this validation look like?
Example
You need to restrict the search such that the matched digits bring you to the end of the string:
"/\/path\/item\/(1|2|5|6|20)$/"
This will mean that the digits must exactly match, and does not involve any re-ordering of the permitted values in your regex.
Demonstrated here
The key is to list the longer numbers first in the capturing or non-capturing group, for example:
^\/path\/item\/(20|1|2|5|6)$
or
^\/path\/item\/(?:20|1|2|5|6)$
or
\/path\/item\/(?:20|1|2|5|6)
Test
$re = '/^\/path\/item\/(20|1|2|5|6)$/s';
$str = '/path/item/20';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
The expression is explained on the top right panel of this demo, if you wish to explore further or modify it, and in this link, you can watch how it would match against some sample inputs step by step, if you like.
The problem with your pattern is that when the input is 20, the alternative 2 matches first and the 0 that follows is simply ignored. This can be resolved by listing 20 first, like this:
\/path\/item\/(20|1|2|5|6)\/
View Here: https://regex101.com/r/aJf1Q8/1
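To see the effect of anchoring in practice, here is a minimal sketch. The function name isAllowedItemPath is an assumption, and ~ is used as the delimiter so the slashes need no escaping:

```php
<?php
// Hypothetical helper: true only when the path ends in one of the allowed ids.
function isAllowedItemPath(string $path): bool
{
    // ^ and $ force the alternation to cover the whole id, so "2" cannot
    // satisfy the pattern when the id is really "20" or "200".
    return preg_match('~^/path/item/(?:1|2|5|6|20)$~', $path) === 1;
}

var_dump(isAllowedItemPath('/path/item/2'));   // bool(true)
var_dump(isAllowedItemPath('/path/item/20'));  // bool(true)
var_dump(isAllowedItemPath('/path/item/200')); // bool(false)
var_dump(isAllowedItemPath('/path/item/3'));   // bool(false)
```

With both anchors in place the order of the alternatives no longer matters, because the engine backtracks from 2 to 20 when the end-of-string check fails.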

Combine two regular expressions for php

I have these two regular expressions:
^(((98)|(\+98)|(0098)|0)(9){1}[0-9]{9})+$
^(9){1}[0-9]{9}+$
How can I combine these expressions into one?
A valid phone number:
starts with one of: 0098, +98, 98, 09 or 9
Samples:
00989151855454
+989151855454
989151855454
09151855454
9151855454
You haven't provided what passes and what doesn't, but I think this will work if I understand correctly...
/^\+?0{0,2}98?/
Live demo
^ Matches the start of the string
\+? Matches 0 or 1 plus symbols (the backslash is to escape)
0{0,2} Matches between 0 and 2 (0, 1, and 2) of the 0 character
9 Matches a literal 9
8? Matches 0 or 1 of the literal 8 characters
Looking at your second regex, it looks like you want to make the first part ((98)|(\+98)|(0098)|0) in your first regex optional. Just make it optional by putting ? after it and it will allow the numbers allowed by second regex. Change this,
^(((98)|(\+98)|(0098)|0)(9){1}[0-9]{9})+$
to,
^(?:98|\+98|0098|0)?9[0-9]{9}$
^ this makes the non-grouping pattern optional which contains various alternations you want to allow.
I've made a few more corrections to the regex. {1} is redundant, because matching exactly once is the default for any token. You also don't need to group parts of a regex unless you actually use the groups, so I've removed the outermost parentheses and the + after them.
Demo
This regex
^(?:98|\+98|0098|0)?9[0-9]{9}$
matches
00989151855454
+989151855454
989151855454
09151855454
9151855454
Demo: https://regex101.com/r/VFc4pK/1/
Note, however, that this requires a 9 as the first digit after the country code or the leading 0.
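The combined pattern can be checked against the sample numbers from the question with a short sketch. The function name isValidMobile and the two rejected inputs are assumptions added for illustration:

```php
<?php
// Hypothetical helper wrapping the combined pattern.
function isValidMobile(string $number): bool
{
    // Optional prefix (98, +98, 0098 or 0), then a 9, then exactly nine digits.
    return preg_match('/^(?:98|\+98|0098|0)?9[0-9]{9}$/', $number) === 1;
}

$samples = ['00989151855454', '+989151855454', '989151855454', '09151855454', '9151855454'];
foreach ($samples as $sample) {
    var_dump(isValidMobile($sample)); // bool(true) for every sample
}

var_dump(isValidMobile('8151855454')); // bool(false): does not start with 9
var_dump(isValidMobile('915185545'));  // bool(false): one digit short
```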

PHP Regex to get text between 2 words with numbers

I'm trying to get the text between two words in a longer string.
Example:
My string:
...'Total a Facturar 123,061 221,063 26,860161,16080,580310,760 358,297 Recepcionado'...
I'm using
/(?<=Total a Facturar )(.*?) Recepcionado/
I need only the run of numbers that are joined together (26,860161,16080,580310,760),
but with my pattern I get 123,061 221,063 26,860161,16080,580310,760 358,297 Recepcionado.
The numbers in the string are always different; I need the ones that are joined together without a space.
Thanks
EDIT:
Here is the entire string: eval.in/802292
I hope this will be helpful
Regex demo or Regex demo 2
Regex: (?:\d+(?:\,\d+){2,})
For the string above you can also use an exact count: (?:\d+(?:\,\d+){4})
1. (?:\d+) matches one or more digits.
2. (?:\,\d+){2,} matches a comma followed by digits, repeated two or more times.
PHP code: Try this code snippet here
<?php
ini_set('display_errors', 1);
$string = "Total a Facturar 123,061 221,063 26,860161,16080,580310,760 358,297 Recepcionado";
preg_match("#(?:\d+(?:\,\d+){2,})#", $string, $matches);
print_r($matches);

Group regex with fix part

$txt = "toto1 555.4545.555.999.7465.432.674";
$rgx = "/([\w]+)\s([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)/";
preg_match($rgx, $txt, $res);
var_dump($res);
I would like to simplify this pattern by not repeating "([0-9]+)", because I don't know how many groups there will be.
Can anyone tell me how?
Here is a direct answer to the question, as you have stated it:
/[\w]+\s[0-9]+(?:\.[0-9]+)+/
However, note that I have removed all of the numbered capture groups. This could be problematic, depending on what you're actually trying to achieve.
It is not possible to "count" with capture groups in regular expressions, so you would need to write some other code (i.e. not just one match, with one regex, and using back-references) to deal with this if you wish to run any queries like "What digits appear after the fifth "."?"
There are two ways you can do this. If you just need to verify that the string matches the pattern, this regex will do the job: \w+\s(?:[0-9]+\.?)+
However, if you need to split the string into its component parts (in my interpretation, the leading word followed by the sequence of dot-separated numbers), then you could use this pattern: (\w+)\s((?:[0-9]+\.?)+)
The second pattern returns the leading word, toto1, in group 1 and the dot-separated numbers, 555.4545.555.999.7465.432.674, in group 2, which you can then split in PHP if required: $sequence = explode('.', $matches[2]);
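Put together, the capture-then-explode approach might look like the sketch below. The variable names $name and $sequence are chosen for illustration:

```php
<?php
$txt = 'toto1 555.4545.555.999.7465.432.674';

$name = null;
$sequence = [];

// Group 1 is the leading word, group 2 the whole dot-separated run of numbers.
if (preg_match('/(\w+)\s((?:[0-9]+\.?)+)/', $txt, $matches)) {
    $name = $matches[1];
    $sequence = explode('.', $matches[2]);
}

var_dump($name);    // string(5) "toto1"
print_r($sequence); // 555, 4545, 555, 999, 7465, 432, 674
```

Because \.? also accepts a trailing dot, an input ending in "674." would leave an empty last element after explode; filter it out if that case can occur.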
What you need can be obtained with a preg_split with a regex matching 1 or more whitespaces or dots:
$txt = "toto1 555.4545.555.999.7465.432.674";
$rgx = '/[\s.]+/';
$res = preg_split($rgx, $txt);
print_r($res);
See the PHP demo
If you need a regex approach, you can use a \G based regex with preg_match_all:
'~(?|([\w]+)|(?!\A)\G[\s.]*([0-9]+))~'
See the regex demo and a PHP demo:
$txt = "toto1 555.4545.555.999.7465.432.674";
$rgx = '~(?|(\w+)|(?!\A)\G[\s.]*([0-9]+))~';
preg_match_all($rgx, $txt, $res);
print_r($res[1]);
Pattern details:
The (?|...) is a branch reset group to reset group IDs in all the branches
(\w+) - Group 1 matches 1+ word chars
| - or (then goes Branch 2)
(?!\A)\G - the end of the previous successful match
[\s.]* - zero or more whitespaces or dots
([0-9]+) - Group 1 (again!) matching 1 or more digits.

PHP PREG_SPLIT on numbers 1-100

I am working on some code to break down the full text of a test that will be copied and pasted in the following format:
1. This is question number one.
A. Answer 1
B. Answer 2
C. Answer 3
D. Answer 4
2. This is question number two.
3. This is another question, number three.
45. Ken has uses his money, $353. How much does he have after spending $214.
I am using the following preg_split:
$questions = preg_split("/[0-9]+\./", $_POST['test']);
My problem has come in with questions like #45 where there are numbers in the question itself and they are followed by a period.
I just want to match the numbers 1-100 followed by a period. Eg.
1.
2.
3.
4.
5.
etc
I think it is better to use multiline flag with ^:
$questions = preg_split('/^ *[0-9]+\. +/m', $_POST['test']);
A number between 1 and 100, followed by a period, can be matched by
/\b(?:100|[1-9][0-9]?)\./
but if the actual rule is to match a number at the start of a line, use
/^\d+\./m
You can use preg_match_all() instead:
preg_match_all('~(?:^|\R)[0-9]+\. \K.+~', $_POST['test'], $matches);
$questions = $matches[0];
Use ^ to anchor the match to the beginning of a line, together with the m (multiline) modifier. PHP has no g modifier; preg_split and preg_match_all already process every match:
/^[0-9]+\.\s/m
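The line-anchored split can be tried on a shortened version of the sample test. This is a sketch: the $test variable stands in for $_POST['test'], and the PREG_SPLIT_NO_EMPTY flag and the trim step are additions that none of the answers use:

```php
<?php
// Hypothetical pasted test text; single quotes keep "$353" literal.
$test = implode("\n", [
    '1. This is question number one.',
    'A. Answer 1',
    'B. Answer 2',
    '2. This is question number two.',
    '45. Ken has $353. How much does he have after spending $214.',
]);

// ^ with the m modifier matches at the start of every line, so numbers inside
// a question ("$353.", "$214.") are never treated as separators.
$questions = preg_split('/^ *[0-9]+\. +/m', $test, -1, PREG_SPLIT_NO_EMPTY);
$questions = array_map('trim', $questions);

print_r($questions);
```

PREG_SPLIT_NO_EMPTY drops the empty piece that would otherwise precede "1. ", and trim removes the line break left at the end of each question.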
