Regular Expression Help Needed - Advanced Search and Replace - php

I have a string like
"'Joe'&#[Uk Customers.First Name](contact:16[[---]]first_name) +#[Uk Customers.Last Name](contact:16[[---]]last_name)"
My requirement is to find the pattern
#[A.B](contact:**digit**[[---]]**field**)
There can be many occurrences of the pattern in a single string.
Each occurrence (the entire pattern) should be replaced with dynamic text generated from its digit and field values.
For example, the string above contains two matches:
1st match is : array(digit => 16, field =>first_name)
2nd match is : array(digit => 16, field =>last_name)
Elsewhere I have a few rules:
if digit is 16 and field is first_name replace pattern with "John"
if digit is 16 and field is last_name replace pattern with "Doe"
So the output string will be "'Joe'&John +Doe"
Thanks in advance.

The matching part is fairly straightforward. This will do the trick:
#\[[^.]+\.[^.]+\]\(contact:(\d+)\[\[---\]\]([^)]+)\)
In PHP (and other languages that support named capture groups), you can name the groups so that each match array also contains the keys "digit" and "field":
#\[[^.]+\.[^.]+\]\(contact:(?<digit>\d+)\[\[---\]\](?<field>[^)]+)\)
Example PHP code:
$regex = '/#\[[^.]+\.[^.]+\]\(contact:(?<digit>\d+)\[\[---\]\](?<field>[^)]+)\)/';
$text = '"\'Joe\'&#[Uk Customers.First Name](contact:16[[---]]first_name) +#[Uk Customers.Last Name](contact:16[[---]]last_name)"';
preg_match_all($regex, $text, $matches, PREG_SET_ORDER);
var_dump($matches);
Result:
array(2) {
[0]=>
array(5) {
[0]=>
string(55) "#[Uk Customers.First Name](contact:16[[---]]first_name)"
["digit"]=>
string(2) "16"
[1]=>
string(2) "16"
["field"]=>
string(10) "first_name"
[2]=>
string(10) "first_name"
}
[1]=>
array(5) {
[0]=>
string(53) "#[Uk Customers.Last Name](contact:16[[---]]last_name)"
["digit"]=>
string(2) "16"
[1]=>
string(2) "16"
["field"]=>
string(9) "last_name"
[2]=>
string(9) "last_name"
}
}
I'm not really clear on the logic you want to use for the replacement, so I'm afraid I can't help there without some clarification.
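A minimal sketch of the replacement step, assuming the rules from the question can be kept in a lookup array keyed by digit and then by field (`$rules` is a hypothetical name, not something from the question). preg_replace_callback() passes each match, including the named groups, to a callback whose return value replaces the whole match. Leaving a match untouched when no rule covers it is also an assumption. Note that the space before `+` in the input is preserved, so the result is `'Joe'&John +Doe`.

```php
<?php
// Hypothetical rule table: digit => [field => replacement text].
$rules = [
    '16' => [
        'first_name' => 'John',
        'last_name'  => 'Doe',
    ],
];

$regex = '/#\[[^.]+\.[^.]+\]\(contact:(?<digit>\d+)\[\[---\]\](?<field>[^)]+)\)/';
$text  = '\'Joe\'&#[Uk Customers.First Name](contact:16[[---]]first_name) +#[Uk Customers.Last Name](contact:16[[---]]last_name)';

// The callback runs once per match; its return value replaces the entire match.
$result = preg_replace_callback($regex, function (array $m) use ($rules): string {
    // Fall back to the original matched text if no rule exists for this pair.
    return $rules[$m['digit']][$m['field']] ?? $m[0];
}, $text);

echo $result, "\n";
```

Adding a rule is then a matter of adding an entry to the lookup array rather than changing the regex.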

Related

Conditional regex pattern for preg_match_all PHP

I have a pattern. Whenever a specific capturing group has nothing to match, the regex skips ahead and finds another match, even if that means skipping past the next group.
There are 4 capturing groups:
first group, 2nd group, 3rd group, 4th group
The 3rd group is not always there. My sample string contains 3 sets, and the first one has no characters for the 3rd group. I want a conditional for the 3rd group: if it does not find any character, it should capture a blank or a space.
Demo: https://regex101.com/r/zK0aW4/1
it should be like this: https://regex101.com/r/sD4eB7/1
but I don't know how to express that condition.
If the third group is not present, it should capture a blank. How do I write this in a regex pattern?
For example:
$string = "\nTHIS IS FIRST PATTERN 63101 0789158126 0-0000000-000-0000\n4415 THIS IS FIRST \nPATTERN 49401-9528\n0406842931 Third match 0-0000000-000-0000\n11403 THIS IS FIRST PATTERN 49401-\n9595\n0112853789 Third match 0-0000000-000-0000";
preg_match_all(
"/([A-Z ,\.\-\&#\\\\n\/0-9&]+)(\d{10})([A-Z a-z]+)(\d{1}-\d{7}-\d{3}-\d{4}|\d{1}-\d{7}-\d{2}-\d{4})/",
$string,
$matches
);
This should output something like:
array(5) {
[0]=>
array(3) {
[0]=>
string(58) "\nTHIS IS FIRST PATTERN 63101 0789158126 0-0000000-000-0000"
[1]=>
string(81) "\n4415 THIS IS FIRST \nPATTERN 49401-9528\n0406842931 Third match 0-0000000-000-0000"
[2]=>
string(82) "\n11403 THIS IS FIRST PATTERN 49401-\n9595\n0112853789 Third match 0-0000000-000-0000"
}
[1]=>
array(3) {
[0]=>
string(28) "\nTHIS IS FIRST PATTERN 63101"
[1]=>
string(40) "\n4415 THIS IS FIRST \nPATTERN 49401-9528\n"
[2]=>
string(41) "\n11403 THIS IS FIRST PATTERN 49401-\n9595\n"
}
[2]=>
array(3) {
[0]=>
string(10) "0789158126"
[1]=>
string(10) "0406842931"
[2]=>
string(10) "0112853789"
}
[3]=>
array(3) {
[0]=>
string(1) " "
[1]=>
string(13) " Third match "
[2]=>
string(13) " Third match "
}
[4]=>
array(3) {
[0]=>
string(18) "0-0000000-000-0000"
[1]=>
string(18) "0-0000000-000-0000"
[2]=>
string(18) "0-0000000-000-0000"
}
}
Try this: https://regex101.com/r/zK0aW4/2
((?:[A-Z ,.&#\/0-9-]|&|\\n)+?)(\d{10})([A-Z a-z]+)?(\d{1}-\d{7}-\d{3}-\d{4}|\d{1}-\d{7}-\d{2}-\d{4})
Because the character class in your first group matches so many characters, the group was extending too far. Changing it to a non-greedy (lazy) quantifier (*? or +?) makes it match as little as possible, so it stops as soon as the following groups can match. The ? after the third group makes that group optional; when it does not participate, preg_match_all() reports it as an empty string.
Character classes (surrounded by [ and ]) match single characters only. I assumed that you wanted to match only a literal & and \n, so I moved those out of the character class.
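A runnable sketch of the suggested pattern in PHP, assuming the question's double-quoted sample string (so each \n in it is a real newline). Inside a single-quoted PHP literal, \n is passed through unchanged and PCRE reads it as a newline. The separate |& alternative is left out here because & is already in the character class, and the two alternatives for the last group are folded into the equivalent \d{2,3}.

```php
<?php
$string = "\nTHIS IS FIRST PATTERN 63101 0789158126 0-0000000-000-0000\n4415 THIS IS FIRST \nPATTERN 49401-9528\n0406842931 Third match 0-0000000-000-0000\n11403 THIS IS FIRST PATTERN 49401-\n9595\n0112853789 Third match 0-0000000-000-0000";

// Group 1: lazy, so it stops as soon as ten digits can follow.
// Group 3: optional; preg_match_all() reports "" when it does not participate.
$regex = '/((?:[A-Z ,.&#\/0-9-]|\n)+?)(\d{10})([A-Z a-z]+)?(\d-\d{7}-\d{2,3}-\d{4})/';

preg_match_all($regex, $string, $matches);
var_dump($matches[2], $matches[3]);
```

In this sample the first set still has a single space before the last group, so group 3 captures " " there.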

Regex to split numbers from letters

I want to split the numbers from the letters in a string, but I'm having a problem.
Inputs look like:
input example 1 : A5
input example 2 : C16
input example 3 : A725
input example 4 : X05
Result must be:
Result example 1 :'A','5'
Result example 2 : 'C','16'
Result example 3 : 'A','725'
Result example 4 : 'X','05'
I tried it with the regex below, but it doesn't give a good result:
preg_split('/(?=\d+)/', $input)
You also need to add a negative look-behind to make sure the empty position chosen for the split is not between two digits.
Currently, for the string A725, your regex splits at the empty string before 7, 2 and 5, because each of them is followed by at least one digit.
You can use this regex:
preg_split('/(?<!\d)(?=\d+)/', $input)
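A quick check of the look-behind version against all four inputs from the question (the loop and variable names are only for illustration):

```php
<?php
$inputs = ['A5', 'C16', 'A725', 'X05'];
$results = [];

foreach ($inputs as $input) {
    // Split only where the previous character is not a digit and the next one
    // is, i.e. at the single letter/digit boundary.
    $results[$input] = preg_split('/(?<!\d)(?=\d+)/', $input);
}

foreach ($results as $input => $parts) {
    echo $input, ' => ', implode(',', $parts), "\n";
}
```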
You can use:
$s = 'A5,C16,A725,X05';
if (preg_match_all("~(?>[a-z]+|\d+)~i", $s, $arr))
var_dump($arr[0]);
gives:
array(8) {
[0]=>
string(1) "A"
[1]=>
string(1) "5"
[2]=>
string(1) "C"
[3]=>
string(2) "16"
[4]=>
string(1) "A"
[5]=>
string(3) "725"
[6]=>
string(1) "X"
[7]=>
string(2) "05"
}
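If the letter/number pairs should stay together instead of forming one flat list, a variant (a swapped-in approach, not part of the answer above) captures both parts in two groups per match and uses PREG_SET_ORDER:

```php
<?php
$s = 'A5,C16,A725,X05';

// Group 1 = letters, group 2 = digits; PREG_SET_ORDER gives one array per match.
preg_match_all('~([a-z]+)(\d+)~i', $s, $sets, PREG_SET_ORDER);

$result = [];
foreach ($sets as $m) {
    $result[] = [$m[1], $m[2]];
}
print_r($result);
```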

preg_split using PREG_SPLIT_DELIM_CAPTURE

I want to split a string with a regular expression, but I also want to keep the text I split on:
php > var_dump(preg_split("/(\^)/","category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ"), null, PREG_SPLIT_DELIM_CAPTURE);
array(4) {
[0]=> string(34) "category=Telecommunications & CATV"
[1]=> string(18) "ORcategory!=ORtest"
[2]=> string(16) "caused_byISEMPTY"
[3]=> string(2) "EQ"
}
NULL
int(2)
What I don't understand is why I am not getting an array such as:
array(7) {
[0]=> "category=Telecommunications & CATV"
[1]=> "^"
[2]=> "ORcategory!=ORtest"
[3]=> "^"
[4]=> "caused_byISEMPTY"
[5]=> "^"
[6]=> "EQ"
}
Additionally, how could I change my regular expression to match both "^OR" and "^"? I was having trouble with a lookbehind assertion such as:
$regexp = "/(?<=\^)OR|\^/";
This will work as expected:
var_dump(preg_split('/(\^)/','category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ', -1, PREG_SPLIT_DELIM_CAPTURE));
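The closing parenthesis of preg_split() was in the wrong place: in your code, null and PREG_SPLIT_DELIM_CAPTURE were passed to var_dump() instead of preg_split(), which is why var_dump() also printed NULL and int(2).
For your additional question, put the longer alternative first so that "^OR" is tried before "^":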
/(\^OR|\^)/
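Putting the corrected call and the alternation together, a short sketch (the variable names are illustrative):

```php
<?php
$query = 'category=Telecommunications & CATV^ORcategory!=ORtest^caused_byISEMPTY^EQ';

// -1 means "no limit"; PREG_SPLIT_DELIM_CAPTURE adds the text matched by the
// parenthesised group to the result. \^OR is listed first so it wins over the
// bare \^ wherever both could match.
$parts = preg_split('/(\^OR|\^)/', $query, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);
```

Because "^OR" is consumed as a delimiter, the piece after it starts with "category" rather than "ORcategory".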

preg_match not returning expected results

I'm attempting to use a regexp to parse a search string that may occasionally contain special syntax. The syntax I'm looking for is [special keyword : value], and I want each match put into an array. Keep in mind that the search string will contain other text that is not intended to be parsed.
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
$specialKeywords = array();
preg_match("/\[{1}.+\:{1}.+\]{1}/", $searchString, $specialKeywords);
var_dump($specialKeywords);
Output:
array(1) { [0]=> string(42) "[StartDate:2010-11-01][EndDate:2010-11-31]" }
Desired Output:
array(2) { [0]=> string(22) "[StartDate:2010-11-01]"
[1]=> string(20) "[EndDate:2010-11-31]"}
Please let me know if I am not being clear enough.
Your .+ matches across the boundaries between the two [...] parts because it matches any character, and as many of them as possible. You could be more restrictive about which characters may be matched. Also {1} is redundant and can be dropped.
/\[[^:]*:[^\]]*\]/
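should work more reliably. Use it with preg_match_all() rather than preg_match(), because preg_match() stops after the first match.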
Explanation:
\[ # match a [
[^:]* # match any number of characters except :
: # match a :
[^\]]* # match any number of characters except ]
\] # match a ]
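A minimal sketch applying this pattern with preg_match_all() to a hypothetical search string that mixes ordinary words ("cable", "modem") with the special syntax. As a variant not in the answer, the keyword and value are wrapped in capture groups so that array_combine() can build a keyword => value map:

```php
<?php
// Hypothetical input: plain search words around the special syntax.
$searchString = 'cable [StartDate:2010-11-01] modem [EndDate:2010-11-31]';

// Same pattern as above, with the keyword and the value captured separately.
$regex = '/\[([^:]*):([^\]]*)\]/';

// preg_match_all() collects every match, not just the first one.
preg_match_all($regex, $searchString, $m);

$specialKeywords = $m[0];                  // full "[keyword:value]" matches
$byKeyword = array_combine($m[1], $m[2]);  // keyword => value
print_r($specialKeywords);
print_r($byKeyword);
```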
This:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
preg_match_all('/\[.*?\]/', $searchString, $match);
print_r($match);
gives the expected result, though I'm not sure it satisfies all of your constraints.
Try the following:
$searchString = "[StartDate:2010-11-01][EndDate:2010-11-31]";
$specialKeywords = array();
preg_match_all("/\[\w+:\d{4}-\d\d-\d\d\]/i", $searchString, $specialKeywords);
var_dump($specialKeywords[0]);
Outputs:
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
Use this regex: "/\[(.*?)\:(.*?)\]{1}/" together with preg_match_all(); it returns:
array(3) {
[0]=>
array(2) {
[0]=>
string(22) "[StartDate:2010-11-01]"
[1]=>
string(20) "[EndDate:2010-11-31]"
}
[1]=>
array(2) {
[0]=>
string(9) "StartDate"
[1]=>
string(7) "EndDate"
}
[2]=>
array(2) {
[0]=>
string(10) "2010-11-01"
[1]=>
string(10) "2010-11-31"
}
}
/\[.+?\:.+?\]/
I suggest this pattern; it is less complex but handles the same cases as Tim's.

Does anyone see why my preg_match RegEx is not returning results?

In another thread I was given a regex that has been verified to work. I can see it working on Rubular, but when I plug the regex into preg_match, I get absolutely nothing.
Here is the regex with my preg_match function:
preg_match('/^!!([0-9]{5}) +.*? +[MF] ([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) + ([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/', $res, $matches);
All I am getting is an empty array returned.
The problem is that you have added two extra spaces into the regular expression that should not be there and that cause the match to fail.
/^!!([0-9]{5}) +.*? +[MF] ([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) + ([A-Z])...
(the two extra spaces are the one right after [MF] and the one right before ([A-Z]))
Whitespace is significant (by default) in regular expressions. A space in a regular expression matches a space in the target string. Removing these two spaces fixes the problem.
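For reference, a sketch of the call with the two spaces removed, run against the input string shown in the ideone output below (assumed here to be the contents of $res):

```php
<?php
$res = '!!92519 C 01 M600200BLNBRN D55420090205';

// The question's pattern with the two stray literal spaces removed
// (after [MF], and between " +" and ([A-Z])).
$regex = '/^!!([0-9]{5}) +.*? +[MF]([0-9]{3})([0-9]{3})([A-Z]{3})([A-Z]{3}) +([A-Z])[0-9]{3}([0-9]{4})([0-9]{2})([0-9]{2})/';

if (preg_match($regex, $res, $matches)) {
    print_r($matches);
} else {
    echo "no match\n";
}
```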
See it working on ideone (this time it is a PHP example).
array(10) {
[0]=>
string(39) "!!92519 C 01 M600200BLNBRN D55420090205"
[1]=>
string(5) "92519"
[2]=>
string(3) "600"
[3]=>
string(3) "200"
[4]=>
string(3) "BLN"
[5]=>
string(3) "BRN"
[6]=>
string(1) "D"
[7]=>
string(4) "2009"
[8]=>
string(2) "02"
[9]=>
string(2) "05"
}
