How to do preg_replace that only matches particular conditions? - php

I am struggling to write a preg_replace command that achieves what I need.
Essentially I have the following array (all the items follow one of these four patterns):
$array = array('Dogs/Cats', 'Dogs/Cats/Mice', 'ANIMALS/SPECIES Dogs/Cats/Mice', '(Animals/Species) Dogs/Cats/Mice' );
I need to be able to get the following result:
Dogs/Cats = Dogs or Cats
Dogs/Cats/Mice = Dogs or Cats or Mice
ANIMALS/SPECIES Dogs/Cats/Mice = ANIMALS/SPECIES Dogs or Cats or Mice
(Animals/Species) Dogs/Cats/Mice = (Animals/Species) Dogs or Cats or Mice
So basically replace slashes in anything that isn't capital letters or brackets.
I am starting to grasp it but still need some guidance:
preg_replace('/(\(.*\)|[A-Z]\W[A-Z])[\W\s\/]/', '$1 or', $array);
As you can see, this recognises the first patterns, but I don't know where to go from there.
Thanks!

You might use the \G anchor to assert the position at the end of the previous match, and \K to forget what was matched so far, so that only the / is matched.
You could optionally match ANIMALS/SPECIES or (Animals/Species) at the start.
(?:^(?:\(\w+/\w+\)\h+|[A-Z]+/[A-Z]+\h+)?|\G(?!^))\w+\K/
Explanation
(?: Non capturing group
^ Assert start of string
(?: Non capturing group, match either
\(\w+/\w+\)\h+ Match (, 1+ word chars, /, 1+ word chars and ), followed by 1+ horizontal whitespace chars
| Or
[A-Z]+/[A-Z]+\h+ Match 1+ times [A-Z], / and again 1+ times [A-Z], followed by 1+ horizontal whitespace chars
)? Close non capturing group and make it optional
| Or
\G(?!^) Assert the position at the end of the previous match (but not at the start of the string)
)\w+ Close non capturing group and match 1+ times a word char
\K/ Forget what was matched, and match a /
In the replacement, use a space, or and a space, i.e. " or ".
For example
$array = array('Dogs/Cats', 'Dogs/Cats/Mice', 'ANIMALS/SPECIES Dogs/Cats/Mice', '(Animals/Species) Dogs/Cats/Mice');
$re = '~(?:^(?:\(\w+/\w+\)\h+|[A-Z]+/[A-Z]+\h+)?|\G(?!^))\w+\K/~';
$array = preg_replace($re, " or ", $array);
print_r($array);
Result:
Array
(
[0] => Dogs or Cats
[1] => Dogs or Cats or Mice
[2] => ANIMALS/SPECIES Dogs or Cats or Mice
[3] => (Animals/Species) Dogs or Cats or Mice
)

Given the way you present your problem and your example strings, doing:
$result = preg_replace('~(?:\S+ )?[^/]*+\K.~', ' or ', $array);
looks sufficient. In other words, you only have to check whether there is a space: (?:\S+ )? optionally consumes the beginning of the string up to and including that space, [^/]*+ consumes everything up to the next slash, and \K discards all of it from the match result, so that only the slash (matched by .) is replaced.
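Here is a quick sanity check that assumes the four sample strings from the question are the whole input. The sketch runs the \G-based pattern from the first answer and this shorter pattern over the same array, so you can confirm that they agree.

```php
<?php
// Sample array taken from the question.
$array = ['Dogs/Cats', 'Dogs/Cats/Mice', 'ANIMALS/SPECIES Dogs/Cats/Mice', '(Animals/Species) Dogs/Cats/Mice'];

// \G-based pattern: continue from the previous match, \K keeps only the "/".
$withG = preg_replace('~(?:^(?:\(\w+/\w+\)\h+|[A-Z]+/[A-Z]+\h+)?|\G(?!^))\w+\K/~', ' or ', $array);

// Shorter pattern: skip an optional "prefix " part, then everything up to the next "/".
$short = preg_replace('~(?:\S+ )?[^/]*+\K.~', ' or ', $array);

var_dump($withG === $short); // both arrays should be identical
print_r($short);
```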
But to avoid future disappointments, it is sometimes useful to play devil's advocate: consider more complex cases and ask embarrassing questions.
What if a category, a subcategory or an item contains a space?
~
(?:^
    (?:
        \( [^)]* \)
      |
        \p{Lu}+ (?> [ ] \p{Lu}+ \b )*
        (?> / \p{Lu}+ (?> [ ] \p{Lu}+ \b )* )*
    )
    [ ]
)?
[^/]*+ \K .
~xu
In the same way, to deal with hyphens, single quotes or other separators, you can replace [ ] with [^\pL/] (a class that excludes letters and the slash) or with something more specific.
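The following small sketch shows what the extended pattern gains you. Its inputs are hypothetical, with category names that contain spaces (ANIMAL KINGDOM/SPECIES and (Animal Kingdom) are made up for illustration). The short pattern also replaces the slash inside the uppercase prefix, while the extended one leaves that slash alone.

```php
<?php
// Hypothetical inputs whose category names contain spaces.
$array = ['ANIMAL KINGDOM/SPECIES Dogs/Cats', '(Animal Kingdom) Guinea Pigs/Cats'];

$simple = '~(?:\S+ )?[^/]*+\K.~';

// Extended pattern from this answer (x: ignore layout whitespace, u: UTF-8 / \p{Lu}).
$extended = '~
(?:^
    (?:
        \( [^)]* \)
      |
        \p{Lu}+ (?> [ ] \p{Lu}+ \b )*
        (?> / \p{Lu}+ (?> [ ] \p{Lu}+ \b )* )*
    )
    [ ]
)?
[^/]*+ \K .
~xu';

$bySimple   = preg_replace($simple, ' or ', $array);
$byExtended = preg_replace($extended, ' or ', $array);

print_r($bySimple);   // first item: the slash in the prefix is replaced too
print_r($byExtended); // prefixes are left intact
```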

Related

Extract Key from URL with Preg_Match in PHP

I have this URL:
https://test.com/file/5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
I need preg_match to produce these matches:
$match[0]=5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
$match[1]=5gdxyYpb
$match[2]=_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
I tried different patterns; the closest was this one: e\/(.*?)\#(.*)
Any recommendations? (Preferably using preg_match.)
Thank you,
You might use two capturing groups, and use \K so that the first part of the URL is left out of the reported match.
https?://.*/\K([^#\s]+)#(\S+)
https?:// Match the protocol with optional s, then ://
.*/ Match until the last occurrence of /
\K Forget what is matched until here
([^#\s]+) Capture group 1, match 1+ occurrences of any char except a # or whitespace char
# Match the #
(\S+) Capture group 2, match 1+ occurrences of a non whitespace char
$url = "https://test.com/file/5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE";
// Same pattern as explained above; single quotes keep the backslashes literal.
$pattern = '~https?://.*/\K([^#\s]+)#(\S+)~';
if (preg_match($pattern, $url, $matches)) {
    print_r($matches);
}
Output
Array
(
[0] => 5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
[1] => 5gdxyYpb
[2] => _FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
)
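If a regex is not strictly required, PHP's built-in parse_url() can do the splitting. This is a swapped-in alternative, not what the answer proposes. It assumes that the key is always the last path segment and that the second part is the URL fragment (the text after #).

```php
<?php
$url = 'https://test.com/file/5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE';

// parse_url() splits the URL into scheme, host, path, fragment, ...
$parts = parse_url($url);

// Assumption: the key is the last path segment, the rest is the fragment after "#".
$key = basename($parts['path']);
$fragment = $parts['fragment'] ?? '';

echo $key, "\n", $fragment, "\n";
```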

Capturing groups in string using preg_match

I'm having trouble parsing a text file in CodeIgniter. For each line in the file I need to capture these groups of data:
- progressive number
- operator
- manufacturer
- model
- registration
- type
Here is an example of the lines in the file:
8 SIRIO S.P.A. BOMBARDIER INC. BD-100-1A10 I-FORZ STANDARD
9 ESERCENTE PRIVATO PIAGGIO AERO INDUSTRIES S.P.A. P.180 AVANTI II I-FXRJ SPECIALE/STANDARD
10 MIGNINI & PETRINI S.P.A. ROBINSON HELICOPTER COMPANY R44 II I-HIKE SPECIALE/STANDARD
11 MIGNINI & PETRINI S.P.A. ROBINSON HELICOPTER COMPANY R44 II I-HIKE STANDARD
12 BLUE PANORAMA AIRLINES S.P.A. THE BOEING COMPANY 737-86N I-LCFC STANDARD
To parse each line I'm using the following code:
if ($fh = fopen($filePath, 'r')) {
while (!feof($fh)) {
$line = trim(fgets($fh));
if(preg_match('/^(\d{1,})\s+(\w{1,})\s+(\w{1,})\s+(\w{1,})\s+(\w{1,})\s+(\w{1,})$/i', $line, $matches))
{
$regs[] = array(
'Operator' => $matches[1],
'Manufacturer' => $matches[2],
'Model' => $matches[3],
'Registration' => $matches[4],
'Type' => $matches[5]
);
$this->data['error'] = FALSE;
}
}
fclose($fh);
}
The code above doesn't work, I think because some groups of data consist of more than one word, for example "SIRIO S.P.A."
Any hints on how to fix this?
Thanks a lot for any help
You should not use \w to capture the data, because some of the characters in your text, like &, ., - and /, are not word characters. Moreover, some fields consist of several space-separated words, so you should replace \w{1,} with \S+(?: \S+)*, which captures such text into the groups you defined.
Try changing your regex to this:
^\s*(\d+)\s+(\S+(?: \S+)*)\s+(\S+(?: \S+)*)\s+(\S+(?: \S+)*)\s+(\S+(?: \S+)*)\s+(\S+(?: \S+)*)$
Explanation of what \S+(?: \S+)* does in the regex above.
\S+ - \S is the opposite of \s: it matches any non-whitespace character (not a space, tab, newline or any other whitespace). Hence \S+ matches one or more non-whitespace characters.
(?: \S+)* - ?: only turns the group into a non-capturing group; inside it there is a single space followed by \S+, and the whole group has a * quantifier. So this matches a space followed by one or more non-whitespace characters, zero or more times.
So \S+(?: \S+)* will match abc or abc xyz or abc pqr xyz and so on, but the match stops as soon as two or more consecutive spaces appear, because the regex only allows a single space before each \S+. This is what separates the fields, so it assumes that the fields in your file are separated by at least two spaces (or a tab), while words within a field are separated by exactly one space.
Hope my explanation is clear. If you still have any doubts, feel free to ask.
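Note also that the key mapping in the question's loop is off by one: with this regex, $matches[1] is the progressive number and the five text fields are in $matches[2] to $matches[6]. Below is a minimal sketch of the corrected mapping. It assumes, as the explanation above requires, that fields are separated by at least two spaces. The sample lines are hypothetical re-spacings of the question's lines, and an array stands in for the file.

```php
<?php
// Hypothetical sample: fields separated by TWO spaces, words inside a field by one.
// In the real code these lines would come from fgets() on the file.
$lines = [
    '8  SIRIO S.P.A.  BOMBARDIER INC.  BD-100-1A10  I-FORZ  STANDARD',
    '12  BLUE PANORAMA AIRLINES S.P.A.  THE BOEING COMPANY  737-86N  I-LCFC  STANDARD',
];

$pattern = '/^\s*(\d+)\s+(\S+(?: \S+)*)\s+(\S+(?: \S+)*)\s+(\S+(?: \S+)*)\s+(\S+(?: \S+)*)\s+(\S+(?: \S+)*)$/';

$regs = [];
foreach ($lines as $line) {
    if (preg_match($pattern, trim($line), $matches)) {
        // Group 1 is the progressive number; the text fields start at group 2.
        $regs[] = [
            'Number'       => $matches[1],
            'Operator'     => $matches[2],
            'Manufacturer' => $matches[3],
            'Model'        => $matches[4],
            'Registration' => $matches[5],
            'Type'         => $matches[6],
        ];
    }
}

print_r($regs);
```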

regex expected value in a position depends on a random value in another position

I need a regex to find all shortcode tag pairs that look like [sc1-g-data]b[/sc1-g-data]. The number next to sc can vary, but it must be the same in both tags.
So something like \[sc(.*?)\-((.|\n)*?)\[\/sc(.*?)\- won't work, as it also matches mismatched tag pairs like [sc1-g-data]b[/sc2-g-data], which I don't want.
So the expected number in the second tag depends on a random number in the first tag.
You may use a regex like:
\[(sc\d*-[^\]\[]*)\]([\s\S]*?)\[\/\1\]
\[ - a [ char
(sc\d*-[^\]\[]*) - Capturing group 1: sc, 0+ digits, -, and then 0+ chars other than ] and [
\] - a ] char
([\s\S]*?) - Capturing group 2: any 0+ chars, as few as possible
\[\/ - a [/ string
\1 - the same text stored in Group 1
\] - a ] char
PHP demo:
$pattern = '~\[(sc\d*-[^][]*)](.*?)\[/\1]~s';
$string = '[sc1-g-data]a[/sc1-g-data] ';
if (preg_match($pattern, $string, $matches)) {
print_r($matches);
}
Mind the single-quoted string literal: if you use a double-quoted one, you will need to write \\1 instead of \1, because in PHP "\1" is an octal escape (the character with code 1), while '\1' is a backslash followed by 1.
Output:
Array
(
[0] => [sc1-g-data]a[/sc1-g-data]
[1] => sc1-g-data
[2] => a
)
If your tags can be anything between brackets, like [blah]...[/blah], you can use:
\[(.*?)\].*?\[\/\1\]
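Since the question asks for all tag pairs, preg_match_all() is the natural fit. The small sketch below uses a made-up input containing a matching pair, a mismatched pair and a pair whose content spans two lines (the s modifier lets . match the newline). Only the two properly closed pairs are returned.

```php
<?php
// Hypothetical input: a matching pair, a mismatched pair, and a two-line pair.
$string = "[sc1-g-data]a[/sc1-g-data] [sc1-g-data]b[/sc2-g-data] [sc22-y]c\nd[/sc22-y]";
$pattern = '~\[(sc\d*-[^][]*)](.*?)\[/\1]~s';

$count = preg_match_all($pattern, $string, $m);

// $m[1] holds the tag names, $m[2] the contents of each matched pair.
echo $count, "\n";
print_r($m[1]);
print_r($m[2]);
```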

Regex of number inside brackets

I need to get the float number inside the brackets.
I tried '([0-9]*[.])?[0-9]+', but it returns the first number in the string, like 6 in the first example.
I also tried
'/\((\d+)\)/'
but it returns 0.
Please note that the extracted number can be either an int or a float.
Can you please help?
As you also need to match the brackets, you need to add escaped \( and \) to the regular expression:
$str = 'Serving size 6 pieces (40)';
$str1 = 'Per bar (41.5)';
preg_match('#\(([0-9]*[.]?[0-9]+)\)#', $str, $matches);
print_r($matches);
preg_match('#\(([0-9]*[.]?[0-9]+)\)#', $str1, $matches);
print_r($matches);
Output:
Array
(
[0] => (40)
[1] => 40
)
Array
(
[0] => (41.5)
[1] => 41.5
)
You could escape the brackets (and the dot):
$str = 'Serving size 6 pieces (41.5)';
if (preg_match('~\((\d+\.?\d*)\)~', $str, $matches)) {
print_r($matches);
}
Outputs:
Array
(
[0] => (41.5)
[1] => 41.5
)
Regex:
\( # open bracket
( # capture group
\d+ # one or more digits
\.? # optional dot
\d* # zero or more digits
) # end capture group
\) # close bracket
You could also use this to get at most one digit after the dot:
'~\((\d+\.?\d?)\)~'
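The escaped dot matters here. An unescaped . matches any character, so a pattern written with .? would also accept something like (4x5). The small sketch below compares both forms; the (4x5) input is hypothetical.

```php
<?php
$unescaped = '~\((\d+.?\d*)\)~';  // "." matches ANY character
$escaped   = '~\((\d+\.?\d*)\)~'; // "\." matches only a literal dot

// Hypothetical input where the brackets do not contain a number.
$size = 'Box size (4x5)';
$looseHit  = preg_match($unescaped, $size, $loose);  // 1: "x" is accepted as the "dot"
$strictHit = preg_match($escaped, $size, $strict);   // 0: no match

// The escaped form still handles a real decimal from the answer's example.
preg_match($escaped, 'Serving size 6 pieces (41.5)', $ok);

var_dump($looseHit, $loose[1] ?? null, $strictHit, $ok[1]);
```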
You need to escape the brackets
preg_match('/\((\d+(?:\.\d+)?)\)/', $search, $matches);
explanation
\( escaped bracket to look for
( open subpattern
\d a digit
+ one or more occurrences of the preceding token
( open group
?: don't save the data in a subpattern (non-capturing group)
\. escaped dot
\d a digit
+ one or more occurrences of the preceding token
) close group
? zero or one occurrence of the group
) close subpattern
\) escaped closing bracket to look for
It matches numbers like 1, 1.1, 11, 11.11, 111 and 111.111, but NOT .1 or .
https://regex101.com/r/ei7bIM/1
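You can check the list of matching and non-matching values above by wrapping each candidate in parentheses and running the pattern over it. The candidate list is the examples from the answer plus 1., which is added here only for illustration.

```php
<?php
$pattern = '/\((\d+(?:\.\d+)?)\)/';

// Examples from the answer, plus "1." added for illustration.
$candidates = ['1', '1.1', '11', '11.11', '111', '111.111', '.1', '.', '1.'];

$matched = [];
$rejected = [];
foreach ($candidates as $value) {
    if (preg_match($pattern, '(' . $value . ')', $m)) {
        $matched[] = $m[1];
    } else {
        $rejected[] = $value;
    }
}

echo 'Matched: ', implode(', ', $matched), "\n";
echo 'Rejected: ', implode(', ', $rejected), "\n";
```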
You could match an opening parenthesis, use \K to reset the starting point of the reported match and then match your value:
\(\K\d+(?:\.\d+)?(?=\))
That would match:
\( Match (
\K Reset the starting point of the reported match
\d+ Match one or more digits
(?: Non capturing group
\.\d+ Match a dot and one or more digits
)? Close non capturing group and make it optional
(?= Positive lookahead that asserts what follows is
\) Match )
) Close positive lookahead
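Because \K drops the ( and the lookahead does not consume the ), $matches[0] holds just the number. The question also wants an int or a float. One common approach, assumed here rather than taken from the answer, is to add 0 to the numeric string: this yields an int for "40" and a float for "41.5". The inputs come from the earlier answer, plus a hypothetical string without a number.

```php
<?php
// Inputs from the earlier answer, plus a hypothetical string with no number in brackets.
$inputs = ['Serving size 6 pieces (40)', 'Per bar (41.5)', 'No number here'];

$values = [];
foreach ($inputs as $str) {
    if (preg_match('/\(\K\d+(?:\.\d+)?(?=\))/', $str, $m)) {
        // $m[0] is only the number: \K dropped "(", the lookahead did not consume ")".
        // Adding 0 converts the numeric string to int ("40") or float ("41.5").
        $values[] = $m[0] + 0;
    } else {
        $values[] = null; // no "(number)" found
    }
}

var_dump($values);
```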

Regex/PHP Replace any repeating (but flexible) word group

How can I match "Any Group" when it is repeated as "ANY GROUP" or "ANYGROUP",
$string = "Foo Bar (Any Group - ANY GROUP Baz)
Foo Bar (Any Group - ANYGROUP Baz)";
so that both lines become "Foo Bar (Any Group - Baz)"?
The separator is always a hyphen (-).
This post extends the question "Regex/PHP Replace any repeating word group".
This matches "Any Group - ANY GROUP", but not "Any Group - ANYGROUP", where the words are repeated without a space between them.
$result = preg_replace(
'%
( # Match and capture
(?: # the following:...
[\w/()]{1,30} # 1-30 "word" characters
[^\w/()]+ # 1 or more non-word characters
){1,4} # 1 to 4 times
) # End of capturing group 1
([ -]*) # Match any number of intervening characters (space/dash)
\1 # Match the same as the first group
%ix', # Case-insensitive, verbose regex
'\1\2', $subject);
This is ugly (as I said it would be), but it should work:
$result = preg_replace(
'/((\b\w+)\s+) # One repeated word
\s*-\s*
\2
|
((\b\w+)\s+(\w+)\s+) # Two repeated words
\s*-\s*
\4\s*\5
|
((\b\w+)\s+(\w+)\s+(\w+)\s+) # Three
\s*-\s*
\7\s*\8\s*\9
|
((\b\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+) # Four
\s*-\s*
\11\s*\12\s*\13\s*\14\b/ix',
'\1\3\6\10-', $subject);
A solution for up to 6 words:
$result = preg_replace(
'/
(\(\s*)
(([^\s-]+)
\s*?([^\s-]*)
\s*?([^\s-]*)
\s*?([^\s-]*)
\s*?([^\s-]*)
\s*?([^\s-]*))
(\s*\-\s*)
\3\s*\4\s*\5\s*\6\s*\7\s*\8\s*
/ix',
'\1\2\9',
$string);
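The small harness below tries both answers on the question's two lines. The lines are kept as separate array elements so that no match can run across the line break. The patterns and replacement strings are copied from the answers above; only the variable names ($lines, $upToFour, $upToSix) are new.

```php
<?php
// The two example lines from the question, as separate array elements.
$lines = [
    'Foo Bar (Any Group - ANY GROUP Baz)',
    'Foo Bar (Any Group - ANYGROUP Baz)',
];

// Pattern from the first answer (1 to 4 repeated words).
$upToFour = '/((\b\w+)\s+) # One repeated word
    \s*-\s*
    \2
    |
    ((\b\w+)\s+(\w+)\s+) # Two repeated words
    \s*-\s*
    \4\s*\5
    |
    ((\b\w+)\s+(\w+)\s+(\w+)\s+) # Three
    \s*-\s*
    \7\s*\8\s*\9
    |
    ((\b\w+)\s+(\w+)\s+(\w+)\s+(\w+)\s+) # Four
    \s*-\s*
    \11\s*\12\s*\13\s*\14\b/ix';

// Pattern from the second answer (up to 6 words inside the parentheses).
$upToSix = '/
    (\(\s*)
    (([^\s-]+)
    \s*?([^\s-]*)
    \s*?([^\s-]*)
    \s*?([^\s-]*)
    \s*?([^\s-]*)
    \s*?([^\s-]*))
    (\s*\-\s*)
    \3\s*\4\s*\5\s*\6\s*\7\s*\8\s*
    /ix';

$a = preg_replace($upToFour, '\1\3\6\10-', $lines);
$b = preg_replace($upToSix, '\1\2\9', $lines);

print_r($a);
print_r($b);
```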
