PHP - preg_match_all - a little advanced - php

I need to find a specific part of the text in a string.
That text needs to have:
12 characters (letters and numbers only)
the whole string must contain at least 3 digits
3*4 characters with spaces (e.g. K9X6 6GM6 LM11)
every block from the example above must contain at least 1 digit
words like this, line, spod shouldn't be recognized
So I ended up with this code:
preg_match_all("/(?<!\S)(?i:[a-z\d]{4}|[a-z\d]{12})(?!\S)/", $input_lines, $output_array);
But it doesn't work for all of the requirements. Of course I can use preg_replace or str_replace to remove all (!,?,#) and then count the digits in a loop to check whether there are 4 or more, but I wonder if it is possible to do this with preg_match_all...
Here is a string to search in:
?K9X6 6GM6 LM11 // not recognized - but it should be
!K9X6 6GM6 LM11 // not recognized - but it should be
K0X6 0GM7 LM12! // not recognized - but it should be
K1X6 1GM8 LM13# // not recognized - but it should be
K2X6 2GM9 LM14? // not recognized - but it should be
K3X6 3GM0 LM15# // not recognized - but it should be
K4X6 4GM1 LM16* // not recognized - but it should be
K5X65GM2LM17
bla bla bla
this shouldn't be visible
spod also shouldn't be visible
but line below should be!!
K9X66GM6LM11! (see that "!" at the end? Help me with this)
A correct preg_match_all should return this:
K9X6
6GM6
LM11
K9X6
6GM6
LM11
K0X6
0GM7
LM12
K1X6
1GM8
LM13
K2X6
2GM9
LM14
K3X6
3GM0
LM15
K4X6
4GM1
LM16
K5X65GM2LM17
K9X66GM6LM11
working example: http://www.phpliveregex.com/p/bHX

The following should do the trick:
\b(?:(?=.{0,3}?\d)[A-Za-z\d]{4}\s??){3}\b
Demo
[A-Za-z\d]{4} matches 4 letters/digits
(?=.{0,3}?\d) checks there's a digit in these 4 characters
\s?? matches a whitespace character, but tries not to match it if possible
\b makes sure everything isn't contained in a larger word
Note that this will also allow strings like K2X6 2GM9LM14; I'm not sure whether you want these to match or not.
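A minimal sketch of how the pattern above could be run with preg_match_all on a few of the sample lines. The preg_split step afterwards is an assumption added here (it is not part of the answer): it breaks the spaced matches into their 4-character blocks, which is the shape of the expected output in the question. Since every block must contain a digit, three blocks already imply at least 3 digits overall.

```php
<?php
// Pattern from the answer above: three 4-character alphanumeric blocks,
// each containing a digit, optionally separated by one whitespace character.
$pattern = '/\b(?:(?=.{0,3}?\d)[A-Za-z\d]{4}\s??){3}\b/';

// A few lines taken from the question's sample input.
$input = "?K9X6 6GM6 LM11\n"
       . "K0X6 0GM7 LM12!\n"
       . "K5X65GM2LM17\n"
       . "bla bla bla\n"
       . "spod also shouldn't be visible\n"
       . "K9X66GM6LM11!";

preg_match_all($pattern, $input, $matches);

// Assumed post-processing step: split spaced matches into their blocks,
// leaving the unspaced 12-character matches whole.
$blocks = [];
foreach ($matches[0] as $match) {
    foreach (preg_split('/\s+/', $match) as $block) {
        $blocks[] = $block;
    }
}

print_r($blocks);
```

Without the split, $matches[0] holds the full matches such as "K9X6 6GM6 LM11", with the surrounding punctuation already excluded.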


Regex can't limit search range

I have the following problem.
I have a pattern like this:
/(?<=template=")(.*?)(.*\/)/gm
And a text like this:
template="test/widgets/glasgow.phtml"}}
My regex should find the path in front of my file name; I need to cut it out so that the result looks like this:
template="glasgow.phtml"}}
That works fine, but the problem is that I sometimes have a text that looks like this:
block="core/template" template="test/widgets/getcallus.phtml"}}</p>
It cuts everything out up to the </.
This is what gets cut out:
test/widgets/getcallus.phtml"}}</
Instead of:
test/widgets/
I have tried to limit the end with $ but it doesn't do anything.
I am testing it on regexr.com
https://regexr.com/50hi2
You may use the following pattern:
template="\K[^"\/]*\/[^"\/]*\/
See the regex demo. In PHP, you may get rid of backslashes if you specify another regex delimiter:
$regex = '~template="\K[^"/]*/[^"/]*/~';
Details
template=" - literal text
\K - match reset operator
[^"\/]* - 0 or more chars other than / and "
\/ - a / char
[^"\/]* - 0 or more chars other than / and "
\/ - a / char
It is equal to template="\K(?:[^"\/]*\/){2}, where (?:...){2} repeats the non-capturing group sequence of patterns twice.
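As a rough illustration, the pattern above can be passed to preg_replace with an empty replacement. The two inputs are the sample strings from the question; the variable names are only illustrative.

```php
<?php
// Pattern from the answer above, with ~ as the delimiter so / needs no escaping.
// \K drops the leading template=" from the match, so only the two
// directory segments after it are replaced.
$regex = '~template="\K[^"/]*/[^"/]*/~';

$inputs = [
    'template="test/widgets/glasgow.phtml"}}',
    'block="core/template" template="test/widgets/getcallus.phtml"}}</p>',
];

$results = [];
foreach ($inputs as $input) {
    // Replace the matched directory prefix with an empty string.
    $results[] = preg_replace($regex, '', $input);
}

print_r($results);
```

Because the character classes exclude ", the match cannot run past the closing quote of the attribute, which is what went wrong with .* in the original pattern.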
Be careful with (.*?)(.*\/)
This pattern is prone to excessive backtracking (a ReDoS risk): the two adjacent unbounded quantifiers can split the characters before the last / between them in many different ways, and the engine may try all of them.
To keep a regex close to yours, you can use
/(?<=template=")([^"]*?\/)*([^"]*)"/
([^"]*?\/)* matches as many blocks as possible, each consisting of characters other than " followed by a /.
https://regex101.com/r/SMSv5R/2

regex to remove complete HTML entity

We have a requirement to remove special characters from text strings. For example, we may get a string that looks like this; the &#174; is the registered trademark symbol:
PEPSI&#174; Bottle 20 oz<br><br>
I'm not great with regex, and can't figure out how to edit the existing code to produce that.
Here's what we currently have:
$ui = "PEPSI&#174; Bottle 20 oz<br><br>";
$ui = preg_replace('/[^A-Za-z0-9\.\' -]/', '', $ui);
This results in PEPSI174 Bottle 20 ozbrbr.
Our desired result is PEPSI Bottle 20 oz<br><br>.
How can I edit the regex to make sure that
It doesn't remove valid HTML tags like <br>, and
If it does find a special character entity, it removes not only the special characters (the & and #), but also the numbers and semicolon?
We don't want to have it remove all the numbers, as obviously the string can contain numbers; it's only numbers that are part of the entity code that we need to remove.
You could use this, but I can't guarantee it covers all possible HTML entities:
$res = preg_replace('/&[A-Za-z0-9#]+;/', '', $ui);
That says: replace any substring that
- starts with &
- is followed by one or more alphanumeric characters or # in any order
- is followed by ;.
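A short sketch of the replacement in action, assuming the input carries the numeric entity &#174; described in the question. The second input, with a named entity, is a made-up example.

```php
<?php
// Pattern from the answer above: "&", then one or more letters, digits or "#", then ";".
$pattern = '/&[A-Za-z0-9#]+;/';

// Numeric entity, as in the question.
$ui = 'PEPSI&#174; Bottle 20 oz<br><br>';
$res = preg_replace($pattern, '', $ui);

// Named entity (hypothetical input); the spaces around it are left in place.
$named = preg_replace($pattern, '', 'Fish &amp; Chips');

echo $res, PHP_EOL;
echo $named, PHP_EOL;
```

The <br> tags survive because they contain neither & nor ;, and the digits in "20 oz" survive because they are not enclosed between & and ;.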

Regex starts with x or x prefixed or suffixed

I'm trying to write a pattern that matches strings like the following, so that I can convert every matching line into a list item <li>:
-Give on result
&Second new text
-The third text
Another paragraph without list.
-New list here
In natural language: match every string that starts with - and ends with the newline character \n.
I tried the following pattern, which works fine:
/^([-|-]\w+\s*.*)?\n*$/gum
Of course we can write it simply without the square brackets ^(-\w+\s*.*)?\n*$ but for debugging I used it as described.
In the example above, when I replace the second - with &, giving ^([-|&]\w+\s*.*)?\n*$, it works fine too and it matches the second line of the sample string. However, I was not able to make it match a - that is prefixed or suffixed with white space.
I changed the sample string to:
- Give on result
&Second new text
-The third text
Another paragraph without list.
-New list here
and I tried the following pattern:
/^([-|\- |&| -]\w+\s*.*)?\n*$/gum
However, it failed to match any - prefixed or suffixed with white space.
Here is a live demo for the original working pattern:
As I understand it, you want to match a line that starts with an element e (e being & or -), where e may be preceded or followed by space(s).
^\s*[&-]\s*(.*)$
If you do not want multiline matching, simply do not use the m modifier.
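A minimal sketch of the conversion to <li> items using the pattern above with preg_replace. The sample text is adapted from the question (with a leading space added before the third marker to exercise the prefix case), and the <li>$1</li> replacement is an assumption about the desired output.

```php
<?php
// Pattern from the answer above; the m modifier makes ^ and $ match
// at the start and end of every line.
$pattern = '/^\s*[&-]\s*(.*)$/m';

$text = "- Give on result\n"
      . "&Second new text\n"
      . " -The third text\n"
      . "Another paragraph without list.\n"
      . "-New list here";

// $1 is the line text without the marker and the spaces around it.
$html = preg_replace($pattern, '<li>$1</li>', $text);

echo $html, PHP_EOL;
```

Note that \s also matches newlines, so a line holding only a marker could pull in the following line; \h limits the match to horizontal whitespace.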
^(\h*(?:-|&)\h*\w+\s*.*)\n*$
You can try this. | inside [] has no special meaning. See demo.
https://regex101.com/r/nS2lT4/3
A string may start with whitespace, then it should have either - or &, which may be followed by spaces. Then it should have at least one alphanumeric character, which may be followed by whitespace. Then it can have anything or nothing. At the end, it consumes all the newlines that follow, or none if there are none.

Search a String for Alpha Numeric Characters in a Pattern

I have a string that contains 5 words. One of the words in the string is a ham radio call sign, which can be any one of the thousands of call signs in the US. To extract the call sign from the string I need to use the patterns below. The call sign can be in any of the 5 positions in the string. The number is never the first character and never the last character. The string is actually put together from an array, since it is originally read from a text file.
$string = $word[1] $word[2] $word[3] etc....
So the search can be either done on the whole string or each piece of the array.
Patterns:
1 Number and 3 Letters Example: AB4C A4BC
1 Number and 4 Letters Example: A4BCD
1 Number and 5 Letters Example: AB4CDE
I have tried everything I can think of and searched until I can't search any more. I am sure I am overthinking this.
A two-step regular expression like this would do it:
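$str = "hello A4AB there BC5AD";
$signs = array();
// Step 1: candidates start and end with a capital letter; {2,4} in the
// middle gives an overall length of 4 to 6 characters.
preg_match_all('/[A-Z][A-Z\d]{2,4}[A-Z]/', $str, $possible_signs);
foreach ($possible_signs[0] as $possible_sign) {
    // Step 2: keep only candidates that contain exactly one digit.
    if (preg_match('/^\D+\d\D+$/', $possible_sign)) {
        array_push($signs, $possible_sign);
    }
}
print_r($signs); //Array ([0] => A4AB [1] => BC5AD)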
Explanation
This is a regular expression approach, using two patterns. I don't think it could be done with one and still satisfy the exact requirements of the matching rules.
The first pattern enforces the following requirements:
substring starts and ends with a capital letter
substring contains only other capital letters or numbers between the first and last letter
substring is, overall, not more than 6 characters long
What I can't do in that same pattern, for complex REGEX reasons I won't go into (unless someone knows a way and can correct me), is enforce that only one number is contained.
@jeroen's answer does enforce this in a single pattern, but in turn does not enforce the correct length of the substring. Either way, we need a second pattern.
So after grabbing the initial matches, we loop over the results. We then apply each to a second pattern that enforces simply that there is only one number in the substring.
If so, we green-light the substring and it's added to the $signs array.
Hope this helps.
It depends on what the other words can contain, but you could use a regular expression like:
#\b[a-z]+\d[a-z]+\b#i
                    ^ case insensitive
 ^^ a word boundary
   ^^^^^^ One or more letters
         ^^ One number
You can make it more restrictive by using {1,3} instead of + for the letters so that you have a sequence of 1 to 3 letters.
The complete expression would be something like:
$success = preg_match('#\b[a-z]+\d[a-z]+\b#i', $input_string, $matches);
where $matches[0] will contain the matched value, see the manual.
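One possible way to combine the ideas from both answers in a single pattern is to add a lookahead that limits the overall word length. This is only a sketch and not taken from either answer: the lookahead (?=[a-z\d]{4,6}\b) and the sample input are assumptions based on the 4 to 6 character call sign formats listed in the question.

```php
<?php
// Hypothetical single-pattern variant (not from the answers above):
// the lookahead limits the whole word to 4-6 letters/digits, and the rest
// requires letters, exactly one digit, then letters.
$pattern = '#\b(?=[a-z\d]{4,6}\b)[a-z]+\d[a-z]+\b#i';

// Made-up input: three plausible call signs, one word that is too long,
// and one word that starts with the digit.
$input = 'hello A4AB there BC5AD AB4CDE toolong1word 4ABC';

preg_match_all($pattern, $input, $matches);

print_r($matches[0]);
```

Whether this is strict enough depends on what the other words in the string can contain, as noted above.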

PHP Regex - Finding two consecutive words with unknown number of spaces (" ") between them

I am trying to create a PHP REGEX that will match if two words appear next to each other with ANY number of spaces between them.
For example, match "Daniel   Baylis" (3 spaces between 'Daniel' and 'Baylis'). I tried with this but it doesn't seem to work:
"/DANIEL[ ]{1,5}BAYLIS/" (this was to check up to 5 spaces, which is the most I expect in the data)
and
"/DANIEL[ ]{*}BAYLIS/"
I need to extract names from within larger bodies of text and names can appear anywhere within that text. User input error is what creates the multiple spaces.
Thanks all! - Dan
/DANIEL[ ]+BAYLIS/ should do... + will match one or more occurrences of the previous character (or character class), in this case a literal space.
Also, assuming you want to match regardless of case, you'll need to make your regex case-insensitive, which I'm not sure how to do in PHP (it depends on which flavor of regex you use... It's been a long time since I last touched that...)
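With PHP's preg_* functions (PCRE), case-insensitive matching is enabled by adding the i modifier after the closing delimiter. A small sketch, with made-up sample texts:

```php
<?php
// The i modifier after the closing delimiter makes the match case-insensitive.
$pattern = '/DANIEL[ ]+BAYLIS/i';

$texts = [
    'Signed by Daniel   Baylis on Monday', // three spaces
    'Signed by DANIEL BAYLIS on Monday',   // one space
    'Signed by DanielBaylis on Monday',    // no space: should not match
];

$found = [];
foreach ($texts as $text) {
    // preg_match returns 1 on a match and 0 otherwise.
    $found[] = preg_match($pattern, $text);
}

print_r($found);
```

If tabs or other whitespace can also appear between the names, \s+ could be used in place of [ ]+.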
