php string parsing to array - php

I have a string that looks like:
'- 10 TEST (FOO 3 TEST.BAR 213 BAZ (\HELLO) TEST'
Every other string has the same format. How can I get the values of FOO, BAR and BAZ? For instance, I would like to end up with an array like:
'FOO' => 3,
'BAR' => 213,
'BAZ' => HELLO

preg_match is your friend :)

You want to use preg_match to first grab the matches and then put them into an array. This will give you what you are looking for:
$str = '- 10 TEST (FOO 3 TEST.BAR 213 BAZ (\HELLO) TEST';
preg_match('/FOO (\d+).+BAR (\d+).+BAZ \(\\\\(\w+)\)/i', $str, $match);
$array = array(
'FOO' => $match[1],
'BAR' => $match[2],
'BAZ' => $match[3]
);
print_r($array);
This assumes that the first two values are numbers and the last one consists of word characters.
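As a small variation on the above, here is a sketch that uses named groups so the keys come straight out of the pattern, and that checks the return value of preg_match before touching the matches. The group names and the $result variable are my own choice, not something from the question.

```php
<?php
$str = '- 10 TEST (FOO 3 TEST.BAR 213 BAZ (\HELLO) TEST';

// Same pattern as above, but each capture is a named group.
$pattern = '/FOO (?<FOO>\d+).+BAR (?<BAR>\d+).+BAZ \(\\\\(?<BAZ>\w+)\)/i';

$result = [];
if (preg_match($pattern, $str, $m)) {
    // $m holds both numeric and named keys; keep only the named ones.
    foreach (['FOO', 'BAR', 'BAZ'] as $key) {
        $result[$key] = $m[$key];
    }
}
print_r($result);
```

If the string does not match, $result simply stays empty instead of producing undefined index notices.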

Assuming neither the constants nor the values contain spaces,
this will work for the given example:
$str = '(CONST1 value1 X.CONST2 value2 CONST3 (\VALUE3) X...)';
preg_match('/\((\S+)\s+(\S+)\s+.*?\.(\S+)\s+(\S+)\s+(\S+)\s+\(\\\\(\S+)\)/', $str, $m);
$arr = array();
for($i=1; $i<count($m);$i+=2) {
$arr[$m[$i]] = $m[$i+1];
}
print_r($arr);
output:
Array
(
[CONST1] => value1
[CONST2] => value2
[CONST3] => VALUE3
)
explanation
\( : an open parenthesis
(\S+) : 1st group, CONST1, all chars that are not spaces
\s+ : 1 or more spaces
(\S+) : 2nd group, value1
\s+.*?\. : some spaces plus any chars plus a dot
(\S+) : 3rd group, CONST2
\s+ : 1 or more spaces
(\S+) : 4th group, value2
\s+ : 1 or more spaces
(\S+) : 5th group, CONST3
\s+ : 1 or more spaces
\( : an open parenthesis
\\\\ : backslash
(\S+) : 6th group, VALUE3
\) : a closing parenthesis

Related

Finding sentences between characters

I am trying to find sentences between pipe | and dot ., e.g.
| This is one. This is two.
The regex pattern I use :
preg_match_all('/(:\s|\|+)(.*?)(\.|!|\?)/s', $file0, $matches);
So far I could not manage to capture both sentences. The regex I use captures only the first sentence.
How can I solve this problem?
EDIT: as can be seen from the regex, I am trying to find the sentences BETWEEN (: or |) AND (. or ! or ?)
A colon or a pipe indicates the starting point for sentences.
The sentences might be:
: Sentence one. Sentence two. Sentence three.
| Sentence one. Sentence two?
| Sentence one. Sentence two! Sentence three?
I would keep it simple and just match on:
\s*[^.|]+\s*
This matches any run of characters that contains no pipe or full stop. Note that the surrounding \s* is part of the match, so leading whitespace is kept in each result; apply trim() to the matches if you need clean sentences.
$input = "| This is one. This is two.";
preg_match_all('/\s*[^.|]+\s*/s', $input, $matches);
print_r($matches[0]);
This prints:
Array
(
[0] => This is one
[1] => This is two
)
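A minimal sketch of the same idea with the whitespace cleaned up afterwards. It assumes sentences only end in a full stop, as in the sample input; the $sentences variable name is mine.

```php
<?php
$input = "| This is one. This is two.";

// Grab every run of characters that is neither a full stop nor a pipe.
preg_match_all('/[^.|]+/', $input, $matches);

// Trim each piece and drop anything that was whitespace only.
$sentences = array_values(array_filter(array_map('trim', $matches[0]), 'strlen'));
print_r($sentences);
```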
This does the job:
$str = '| This is one. This is two.';
preg_match_all('/(?:\s|\|)+(.*?)(?=[.!?])/', $str, $m);
print_r($m);
Output:
Array
(
[0] => Array
(
[0] => | This is one
[1] => This is two
)
[1] => Array
(
[0] => This is one
[1] => This is two
)
)
Another option is to use \G to get iterative matches: it asserts the position at the end of the previous match, and each sentence is captured in a group that must be followed by a dot and 0+ horizontal whitespace chars.
(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*
In parts
(?: Non capturing group
\|\h* Match | and 0+ horizontal whitespace chars
| Or
\G(?!^) Assert position at the end of previous match
) Close group
( Capture group 1
[^.\r\n]+ Match 1+ times any char other than . or a newline
) Close group
\.\h* Match 1 . and 0+ horizontal whitespace chars
For example
$re = '/(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*/';
$str = '| This is one. This is two.
John loves Mary.| This is one. This is two.';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);
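Output
Array
(
[0] => Array
(
[0] => | This is one.
[1] => This is one
)
[1] => Array
(
[0] => This is two.
[1] => This is two
)
[2] => Array
(
[0] => | This is one.
[1] => This is one
)
[3] => Array
(
[0] => This is two.
[1] => This is two
)
)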
To keep it simple, find everything between | and . and then split:
$input = "John loves Mary. | This is one. This is two. | Sentence 1. Sentence 2.";
if (preg_match_all('/\|\s*([^|]+)\./', $input, $matches)) {
foreach($matches[1] as $match) {
print_r(preg_split('/\.\s*/', $match));
}
}
Prints:
Array
(
[0] => This is one
[1] => This is two
)
Array
(
[0] => Sentence 1
[1] => Sentence 2
)
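Since the question also mentions : as a start marker and ! and ? as terminators, here is a rough sketch that splits one line on all of those at once. The function name splitSentences is made up, and it assumes none of these characters occur inside a sentence. Any text in front of the first marker would be returned as a piece too.

```php
<?php
// Hypothetical helper: split a line on the start markers (| :) and terminators (. ! ?).
function splitSentences(string $line): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty pieces before the first marker
    // and after the last terminator.
    $parts = preg_split('/\s*[|:.!?]+\s*/', $line, -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

print_r(splitSentences('| Sentence one. Sentence two! Sentence three?'));
print_r(splitSentences(': Sentence one. Sentence two. Sentence three.'));
```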

Validate url parameters with preg_match

Valid example
12[red,green],13[xs,xl,xxl,some other text with chars like _&-##%]
number[anythingBut ()[]{},anythingBut ()[]{}](,number[anythingBut ()[]{},anythingBut ()[]{}]) or nothing
Full match 12[red,green]
Group 1 12
Group 2 red,green
Full match 13[xs,xl,xxl,some other text with chars like _&-##%]
Group 1 13
Group 2 xs,xl,xxl,some other text with chars like _&-##%
Not valid example
13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]
What I tried is this: (\d+(?=\[))\[([^\(\[\{\}\]\)]+)\], but it also matches invalid input such as the example above.
If you just need to validate the input, you can add some anchors:
^(?:\d+\[[^\(\[\{\}\]\)]+\](?:,|$))+$
If you also need to get all the matching parts, you can use another regex. Using only one will not work well.
$in = '12[red,green],13[xs,xl,xxl,some other text with chars like _&-##%],13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]';
preg_match_all('/(\d+)\[([^][{}()]+)(?=\](?:,|$))/', $in, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => 12[red,green
[1] => 13[xs,xl,xxl,some other text with chars like _&-##%
)
[1] => Array
(
[0] => 12
[1] => 13
)
[2] => Array
(
[0] => red,green
[1] => xs,xl,xxl,some other text with chars like _&-##%
)
)
Explanation:
/ : regex delimiter
(\d+) : group 1, 1 or more digits
\[ : open square bracket
( : start group 2
[^][{}()]+ : 1 or more characters that are not a parenthesis, curly brace or square bracket (open or close)
) : end group 2
(?= : positive lookahead, make sure we have after
\] : a close square bracket
(?:,|$) : non capture group, a comma or end of string
) : end lookahead
/ : regex delimiter
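To put the two steps together (validate first, then extract), something along these lines might do. This is only a sketch: parseParams is a made-up name, the validation pattern is a slightly stricter variant that also rejects a trailing comma, and returning an array keyed by the number is my own assumption about what is convenient (repeated numbers would overwrite each other).

```php
<?php
// Hypothetical helper: returns [number => [values...]], or null if the input is invalid.
function parseParams(string $in): ?array
{
    // One item: digits followed by [ ... ] with no brackets, braces or parentheses inside.
    $item = '\d+\[[^][{}()]+\]';

    // Step 1: the whole string must be a comma-separated list of items.
    if (!preg_match('/^' . $item . '(?:,' . $item . ')*\z/', $in)) {
        return null;
    }

    // Step 2: the input is known to be valid, so a simple pattern is enough to extract.
    preg_match_all('/(\d+)\[([^][{}()]+)\]/', $in, $sets, PREG_SET_ORDER);

    $out = [];
    foreach ($sets as $set) {
        $out[$set[1]] = explode(',', $set[2]);
    }
    return $out;
}

print_r(parseParams('12[red,green],13[xs,xl,xxl,some other text with chars like _&-##%]'));
var_dump(parseParams('13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]'));
```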

PHP Regex Word Boundary exclude underscore _

I'm using the regex word boundary \b to match foo in the following $sentence, but the result is not what I need: the underscore is killing me. I want the underscore to act as a word boundary, just like a hyphen or a space:
$sentence = "foo_foo_foo foo-foo_foo";
             X   X   X   YES X   X
Expected:
$sentence = "foo_foo_foo foo-foo_foo";
             YES YES YES YES YES YES
My code:
preg_match("/\bfoo\b/i", $sentence);
You would have to create DIY boundaries.
(?:\b|_\K)foo(?=\b|_)
Does this do what you want?:
preg_match_all("/foo/i", $sentence, $matches);
var_dump($matches);
You can subtract _ from the \w and use unambiguous word boundaries:
/(?<![^\W_])foo(?![^\W_])/i
Note \bfoo = (?<!\w)foo and foo(?!\w) = foo\b, and subtracting _ from \w (which is equal to [^\W]) results in [^\W_].
In PHP, you can use preg_match_all to find all occurrences:
preg_match_all("/(?<![^\W_])foo(?![^\W_])/i", $sentence, $matches);
To replace / remove all occurrences, you may use preg_replace:
preg_replace("/(?<![^\W_])foo(?![^\W_])/i", "YES", $sentence);
A complete PHP example:
$sentence = "foo_foo_foo foo-foo_foo";
if (preg_match_all("/(?<![^\W_])foo(?![^\W_])/i", $sentence, $matches)) {
print_r($matches[0]);
}
// => Array( [0] => foo [1] => foo [2] => foo [3] => foo [4] => foo [5] => foo)
echo PHP_EOL . preg_replace("/(?<![^\W_])foo(?![^\W_])/i", "YES", $sentence);
// => YES_YES_YES YES-YES_YES
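If the word is not hard-coded, the same lookarounds can be wrapped in a small function. This is only a sketch: findWordOffsets is a name I made up, and it returns the byte offset of every hit so you can see which occurrences were matched.

```php
<?php
// Hypothetical helper: offsets of $word where neither neighbour is a letter or digit
// (the underscore counts as a boundary here, unlike with \b).
function findWordOffsets(string $word, string $text): array
{
    // preg_quote keeps any regex metacharacters in $word literal.
    $pattern = '/(?<![^\W_])' . preg_quote($word, '/') . '(?![^\W_])/i';
    preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE);

    // Each entry of $m[0] is [matched text, offset]; keep the offsets only.
    return array_column($m[0], 1);
}

print_r(findWordOffsets('foo', 'foo_foo_foo foo-foo_foo'));
print_r(findWordOffsets('foo', 'foobar foo1 foo'));
```

The first call reports all six occurrences; the second one only reports the last foo, because foobar and foo1 continue with a letter or digit.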

Regex to remove outer brackets

I have been using this /\(\s*([^)]+?)\s*\)/ regex to remove outer brackets with PHP preg_replace function (Read more in my previous question Regex to match any character except trailing spaces).
This works fine when there is only one pair of brackets, but the problem is when there are more, for example ( test1 t3() test2) becomes test1 t3( test2) instead of test1 t3() test2.
I am aware of regex limitations, but it would be nice if I could just make it not match anything when there is more than one pair of brackets.
So, example behavior is good enough:
( test1 test2 ) => test1 test2
( test1 t3() test2 ) => (test1 t3() test2)
EDIT:
I would like to keep trimming trailing white spaces inside removed brackets.
You can use this recursive regex based code that will work with nested brackets also. Only condition is that brackets should be balanced.
$arr = array('Foo ( test1 test2 )', 'Bar ( test1 t3() test2 )', 'Baz ((("Fdsfds")))');
foreach ($arr as $str) {
    // Quote the result as well so the line matches the output below
    echo "'$str' => '" .
        preg_replace('/ \( \s* ( ( [^()]*? | (?R) )* ) \s* \) /x', '$1', $str) . "'\n";
}
OUTPUT:
'Foo ( test1 test2 )' => 'Foo test1 test2'
'Bar ( test1 t3() test2 )' => 'Bar test1 t3() test2'
'Baz ((("Fdsfds")))' => 'Baz (("Fdsfds"))'
Try this
$result = preg_replace('/\(([^)(]+)\)/', '$1', $subject);
Update
\(([^\)\(]+)\)(?=[^\(]+\()
RegEx explanation
\( # Match the character “(” literally
( # Match the regular expression below and capture its match into backreference number 1
[^\)\(] # Match a single character NOT present in the list below
# A ) character
# A ( character
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\) # Match the character “)” literally
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
[^\(] # Match any character that is NOT a ( character
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\( # Match the character “(” literally
)
You may want this (my guess is that this is what you originally wanted):
$result = preg_replace('/\(\s*(.+)\s*\)/', '$1', $subject);
This would get
"(test1 test2)" => "test1 test2"
"(test1 t3() test2)" => "test1 t3() test2"
"( test1 t3(t4) test2)" => "test1 t3(t4) test2"
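For the behaviour asked for in the question (do nothing at all when there is more than one pair of brackets), a plain count before the replace may be enough. A sketch under that assumption; stripSingleBrackets is a made-up name, and strings with several pairs are returned completely unchanged.

```php
<?php
// Hypothetical helper: remove the brackets only if the string contains exactly one pair.
function stripSingleBrackets(string $s): string
{
    if (substr_count($s, '(') !== 1 || substr_count($s, ')') !== 1) {
        return $s; // zero or several pairs: leave the string alone
    }
    // Lazy [^()]*? plus \s* on both sides trims the whitespace inside the brackets.
    return preg_replace('/\(\s*([^()]*?)\s*\)/', '$1', $s) ?? $s;
}

echo stripSingleBrackets('( test1 test2 )'), "\n";
echo stripSingleBrackets('( test1 t3() test2 )'), "\n";
```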

regex to match 3 parts from a given string

Example input:
hjkhwe5boijdfg
I need to split this into 3 variables as below:
hjkhwe5 (any length, always ends in some number (can be any number))
b (always a single letter, can be any letter)
oijdfg (everything remaining at the end, numbers or letters in any combination)
I've got the PHP preg_match_all call set up but have no idea how to write this regex. Could someone give me a hand?
Have a try with:
$str = 'hjkhwe5boijdfg';
preg_match("/^([a-z]+\d+)([a-z])(.*)$/", $str, $m);
print_r($m);
output:
Array
(
[0] => hjkhwe5boijdfg
[1] => hjkhwe5
[2] => b
[3] => oijdfg
)
Explanation:
^ : beginning of line
( : 1st group
[a-z]+ : 1 or more letters
\d+ : followed by 1 or more digits
) : end of group 1
( : 2nd group
[a-z] : 1 letter
) : end group 2
( : 3rd group
.* : any number of any char
) : end group 3
$ : end of line
You can use preg_match as:
$str = 'hjkhwe5boijdfg';
if(preg_match('/^(\D*\d+)(\w)(.*)$/',$str,$m)) {
// $m[1] has part 1, $m[2] has part 2 and $m[3] has part 3.
}
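A sketch of the same split wrapped in a function with named groups, so the three parts do not have to be picked out by index. The function name splitParts and the key names are my own. Note that the first part is matched lazily, so when several "digit followed by letter" positions exist, the earliest one wins.

```php
<?php
// Hypothetical helper: returns the three parts, or null if the string does not fit.
function splitParts(string $str): ?array
{
    // head:   anything, as short as possible, ending in a digit
    // letter: exactly one letter
    // rest:   whatever remains
    if (!preg_match('/^(?<head>.*?\d)(?<letter>[a-z])(?<rest>.*)$/is', $str, $m)) {
        return null;
    }
    return ['head' => $m['head'], 'letter' => $m['letter'], 'rest' => $m['rest']];
}

print_r(splitParts('hjkhwe5boijdfg'));
```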
