Regex that recognizes everything except text between quotes? - php

I need to make a regex that recognizes everything except text between quotes.
Here is an example:
my_var == "Hello world!"
I want to get my_var but not Hello world!.
I tried (?<!\")([A-Za-z0-9]+) but it didn't work.

If you had taken the time to search Google or Stack Overflow, you would have found that this question has already been answered, not only by me but by many other users.
@Pappa's answer using a negative lookbehind will only match a simple test case, not everything in a string that is not enclosed in quotes. I would opt for a negative lookahead in this case, if you want to match all word characters in any given data.
/[\w.-]+(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/
See live demo
Example:
<?php
$text = <<<T
my_var == "Hello world!" foo /(^*#&^$
"hello" foobar "hello" FOO "hello" baz
Hi foo, I said "hello" $&#^$(#$)#$&*#(*$&
T;
preg_match_all('/[\w.-]+(?![^"]*"(?:(?:[^"]*"){2})*[^"]*$)/', $text, $matches);
print_r($matches);
Output
Array
(
[0] => Array
(
[0] => my_var
[1] => foo
[2] => foobar
[3] => FOO
[4] => baz
[5] => Hi
[6] => foo
[7] => I
[8] => said
)
)

You have an accepted answer, but I am still submitting one since I believe this answer is better at capturing more edge cases:
$s = 'my_var == "Hello world!" foo';
if (preg_match_all('/[\w.-]+(?=(?:(?:[^"]*"){2})*[^"]*$)/', $s, $arr))
print_r($arr[0]);
OUTPUT:
Array
(
[0] => my_var
[1] => foo
)
This works by using a lookahead to make sure that an even number of double quotes follows each match (it requires balanced double quotes and no escaping).
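To make the stated limitation concrete, here is a minimal sketch that wraps the pattern above in a helper (the name wordsOutsideQuotes is made up for this illustration) and runs it on a balanced and an unbalanced input. With an unmatched quote the parity check flips, so the result is no longer meaningful.

```php
<?php
// Hypothetical helper for illustration; the pattern is the one from this answer.
function wordsOutsideQuotes(string $s): array
{
    // A token only matches when an even number of double quotes follows it.
    preg_match_all('/[\w.-]+(?=(?:(?:[^"]*"){2})*[^"]*$)/', $s, $m);
    return $m[0];
}

$balanced   = wordsOutsideQuotes('my_var == "Hello world!" foo');
$unbalanced = wordsOutsideQuotes('a "b c');

print_r($balanced);   // my_var, foo
print_r($unbalanced); // b, c: the unmatched quote inverts inside and outside
```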

As much as I'll regret getting downvoted for answering this, I was intrigued, so did it anyway.
(?<![" a-zA-Z])([A-Za-z0-9\-_\.]+)

This simple solution hasn't been mentioned (see demo):
"[^"]*"(*SKIP)(*F)|[\w.-]+
Reference
How to match pattern except in situations s1, s2, s3
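As a rough sketch of how this behaves in PHP (the sample string is reused from the question): the first alternative consumes a whole quoted string and then forces a failure, with (*SKIP) preventing the engine from retrying inside that span, so only the second alternative can produce matches.

```php
<?php
$s = 'my_var == "Hello world!" foo';

// Left alternative: match a quoted string, then discard it via (*SKIP)(*F).
// Right alternative: the tokens we actually want.
preg_match_all('/"[^"]*"(*SKIP)(*F)|[\w.-]+/', $s, $m);

print_r($m[0]); // my_var, foo
```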

Related

Extracting all the emojis from a string using REGEX

I have been trying to extract all the emojis from a string using the regex listed below. However, it is sometimes inaccurate, as it adds extra emojis to the result.
The regex that I am using is this one:
preg_match_all('/([0-9|#][\x{20E3}])|[\x{00ae}|\x{00a9}|\x{203C}|\x{2047}|\x{2048}|\x{2049}|\x{3030}|\x{303D}|\x{2139}|\x{2122}|\x{3297}|\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{1F000}-\x{1FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F9FF}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F9FF}][\x{1F000}-\x{1FEFF}]?/u', $string, $emojis);
When I try to print $emojis[0] after this, the result is sometimes inaccurate.
For example,
CODE:
$string = "Get into it !!! 🤰🏻🍴";
preg_match_all('/([0-9|#][\x{20E3}])|[\x{00ae}|\x{00a9}|\x{203C}|\x{2047}|\x{2048}|\x{2049}|\x{3030}|\x{303D}|\x{2139}|\x{2122}|\x{3297}|\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{1F000}-\x{1FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F9FF}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F9FF}][\x{1F000}-\x{1FEFF}]?/u', $string, $emojis);
print_r($emojis[0]);
OUTPUT:
Array ( [0] => 🤰 [1] => 🏻 [2] => 🍴 )
This is not expected as the second element in the above array was not in the inputted string.
Is this a REGEX issue? Is there any better REGEX for this? Or anything other than REGEX to extract emojis?
You are dealing with "Fitzpatrick modifiers".
I haven't had a close look at your regex pattern to make refinements, but I can offer a quick solution.
Use: (?:[\x{1f3fb}-\x{1f3ff}](*SKIP)(*FAIL))| at the start of your pattern to disqualify the modifiers.
Code: (Demo)
$string = "Pregnant Woman: 🤰🏻 Pregnant Woman: 🤰 Fork and Knife: 🍴 Light Skin Tone: 🏻 (a pale skin tone modifier)";
//$string = "Get into it !!! 🤰🏻🍴";
preg_match_all('/(?:[\x{1f3fb}-\x{1f3ff}](*SKIP)(*FAIL))|[0-9|#][\x{20E3}]|[\x{00ae}|\x{00a9}|\x{203C}|\x{2047}|\x{2048}|\x{2049}|\x{3030}|\x{303D}|\x{2139}|\x{2122}|\x{3297}|\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{1F000}-\x{1FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F9FF}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F9FF}][\x{1F000}-\x{1FEFF}]/u', $string, $emojis);
print_r($emojis[0]);
Output:
Array
(
[0] => 🤰
[1] => 🤰
[2] => 🍴
)
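A stripped-down sketch of the same idea, assuming PHP 7+ for the \u{...} string escapes. The single range U+1F300 to U+1F9FF below is only a stand-in for the full pattern from the question, chosen to keep the example readable.

```php
<?php
// Pregnant woman + light skin tone modifier + fork and knife.
$string = "Get into it !!! \u{1F930}\u{1F3FB}\u{1F374}";

// Skin tone modifiers (U+1F3FB..U+1F3FF) are consumed and rejected first,
// so the simplified emoji range after the | never sees them.
$pattern = '/[\x{1F3FB}-\x{1F3FF}](*SKIP)(*FAIL)|[\x{1F300}-\x{1F9FF}]/u';

preg_match_all($pattern, $string, $emojis);
print_r($emojis[0]); // two entries: the base emoji and the fork and knife
```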

Regex to Match Passed Function/Method Parameters

I've had a good look around for a question that asked this before; alas, my search for a PHP preg_match search returned no results (maybe my searching skills fell short, I suppose justified considering it's a Regex question!).
Consider the text below:
The quick __("brown ") fox jumps __('over the') lazy __("dog")
Currently I need to 'scan' for the given method __('') above, which may include spacing and either kind of quotation mark (' or "). My best attempt after numerous 'iterations':
(__\("(.*?)"\))|(__\('(.*?)'\))
Or at its simplest form:
__\((.*?)\)
To break this down:
Anything that starts with __
Escaped ( and quotation mark " or '. Thus, \(\"
(.*?) Non-greedy match of all characters
Escaped closing " and last bracket.
| between the two expressions match either/or.
However, this only gets partial matches, and spaces are throwing off the search entirely. Apologies if this has been asked before, please link me if so!
Tester Link for the pattern provided above:
PHP Live Regex Test Tool
When the searched method string uses single quotes it will end up in another capture group than if it has double quotes. So in fact, your regular expression works (except for the spaces, see further down), but you'd have to look at a different index in your result array:
$input = 'The quick __("brown ") fox jumps __(\'over the\') lazy __("dog")';
// using your regular expression:
$res = preg_match_all("/(__\(\"(.*?)\"\))|(__\('(.*?)'\))/", $input, $matches);
print_r ($matches);
Note that you need preg_match_all instead of preg_match to get all matches.
Output:
Array
(
[0] => Array
(
[0] => __("brown ")
[1] => __('over the')
[2] => __("dog")
)
[1] => Array
(
[0] => __("brown ")
[1] =>
[2] => __("dog")
)
[2] => Array
(
[0] => brown
[1] =>
[2] => dog
)
[3] => Array
(
[0] =>
[1] => __('over the')
[2] =>
)
[4] => Array
(
[0] =>
[1] => over the
[2] =>
)
)
So, the result array has 5 elements, the first one representing the complete match, and all the others correspond to the 4 capture groups you have in your regular expression. As the capture groups for single quotes are not those of the double quotes, you'll find the matches at different places.
To "solve" this, you could use a back reference in your regular expression, which would look back to see which was the opening quote (single or double) and require the same to be repeated at the end:
$res = preg_match_all("/__\(([\"'])(.*?)\\1\)/", $input, $matches);
Note the back reference \1 (the backslash had to be escaped with another one). This refers back to the first capture group, where we have ["'] (again an escape was necessary) to match both kinds of quotes.
You also wanted to deal with spaces. On your PHP Live Regex you used a test string that had such spaces between the brackets and quotes. To deal with these so they still match the method strings correctly, the regular expression should get two additional \s*:
$res = preg_match_all("/__\(\s*([\"'])(.*?)\\1\s*\)/", $input, $matches);
Now the output is:
Array
(
[0] => Array
(
[0] => __("brown ")
[1] => __('over the')
[2] => __("dog")
)
[1] => Array
(
[0] => "
[1] => '
[2] => "
)
[2] => Array
(
[0] => brown
[1] => over the
[2] => dog
)
)
... and the text captured by the groups is now nicely arranged.
See this code run on eval.in and PHP Live Regex.
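Putting the pieces together, a small sketch of how the captured strings might be collected. The helper name extractTranslatables is invented for this example; the pattern is the back-reference version from above, written in a single-quoted PHP string so that fewer backslashes need doubling.

```php
<?php
// Hypothetical helper: returns the string literal passed to each __() call.
function extractTranslatables(string $input): array
{
    // Group 1 = opening quote, group 2 = the text, \1 = the same quote again.
    preg_match_all('/__\(\s*(["\'])(.*?)\1\s*\)/', $input, $matches);
    return $matches[2];
}

$input = 'The quick __("brown ") fox jumps __( \'over the\' ) lazy __("dog")';
print_r(extractTranslatables($input)); // "brown " (with trailing space), "over the", "dog"
```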
When working with stuff like this, don't forget about escaping:
<?php
ob_start();
?>
The quick __("brown ") fox jumps __( 'over the' ) lazy __("dog").
And __("everyone says \"hi\"").
<?php
$content = ob_get_clean();
$re = <<<RE
/__ \(
\s*
(?: # group the alternatives so the opening and closing parts apply to both quote styles
" ( (?: \\\\. | [^"])+ ) "
|
' ( (?: \\\\. | [^'])+ ) '
)
\s*
\)
/x
RE;
preg_match_all($re, $content, $matches, PREG_SET_ORDER);
foreach($matches as $match)
echo end($match), "\n";
How about this:
(__(\('[^']+'\)|\("[^"]+"\)))
Instead of the non greedy ., use any char but the quotes [^'] or [^"]
Enclose double and single quotes with square brackets as a character class:
$str = 'The quick __( "brown ") fox jumps __(\'over the\') lazy __("dog")';
preg_match_all("/__\(\s*([\"']).*?\\1\s*\)/ium", $str, $matches);
echo '<pre>';
var_dump($matches[0]);
// the output:
array (size=3)
0 => string '__( "brown ")'
1 => string '__('over the')'
2 => string '__("dog")'
And here is example with the same solution on phpliveregex.com:
http://www.phpliveregex.com/p/exF
(section preg_match_all)

Split this values part of an insert query

Is there any way to achieve the following? I need to take this $query and split it into its various elements (the reason is that I have to reprocess an insert query). As you can see, this will work for regular string blocks or numbers, but not where a number occurs inside a string. Is there a way to say |\d but not where that \d occurs within a ' quoted string '?
$query = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6')";
$matches = preg_split("#',|\d,#",substr($query,1,-1));
echo $query;
print'<pre>[';print_r($matches);print']</pre>';
So just to be clear about expected results:
0:'this is\'nt very, funny (I dont think)'
1:'is it'
2:12345
3:'nope'
4:'like with 2,4,6'.
** Additionally I don't mind if each string is not quoted - I can requote them myself.
You could use (*SKIP)(*F) to skip the parts that are inside single quotes and match , outside:
'(?:\\'|[^'])*'(*SKIP)(*F)|,
(?:\\'|[^']) Inside the single quotes matches escaped \' or a character that is not a single quote.
See Test at regex101.com
$query = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6')";
$matches = preg_split("~'(?:\\\\'|[^'])*'(*SKIP)(*F)|,~", substr($query,1,-1));
print_r($matches);
outputs to (test at eval.in)
Array
(
[0] => 'this is\'nt very, funny (I dont think)'
[1] => 'is it'
[2] => 12345
[3] => 'nope'
[4] => 'like with 2,4,6'
)
Not absolutely sure, if that is what you mean :)
('(?:(?!(?<!\\)').)*')|(\d+)
Try this. Grab the captures. Each string is quoted as well. See demo.
http://regex101.com/r/dK1xR4/3
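A hedged sketch of how this pattern could be applied with preg_match_all. Both the pattern and the sample data are written as nowdoc strings so the backslashes reach the regex engine unchanged; the capture groups are dropped because only the whole matches are read here.

```php
<?php
// Sample data from the question, without the surrounding parentheses.
$data = <<<'TXT'
'this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6'
TXT;

// A quoted string ends at the first quote NOT preceded by a backslash;
// outside of quoted strings, a run of digits is taken as a number.
$pattern = <<<'RE'
~'(?:(?!(?<!\\)').)*'|\d+~
RE;

preg_match_all($pattern, $data, $m);
print_r($m[0]); // five entries, one per value
```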
You could try matching through preg_match_all instead of splitting.
<?php
$data = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6')";
$regex = "~'(?:\\\\'|[^'])+'|(?<=,|\()[^',)]*(?=,|\))~";
preg_match_all($regex, $data, $matches);
print_r($matches[0]);
?>
Output:
Array
(
[0] => 'this is\'nt very, funny (I dont think)'
[1] => 'is it'
[2] => 12345
[3] => 'nope'
[4] => 'like with 2,4,6'
)
If you don't mind using preg_match_all, the solution could look like this. The regex uses negative lookbehind assertions, (?<!\\\\), so it matches quoted strings whose delimiting quotes are not preceded by a backslash, and the alternation with the vertical bar ensures that numbers that are part of a larger match are ignored.
$query = "('this is\'nt very, funny (I dont think)','is it',12345,'nope','like with 2,4,6',6789)";
preg_match_all( "/(?<!\\\\)\'.+?(?<!\\\\)\'|\d+/", substr( $query, 1, -1 ), $matches );
print_r( $matches );
/* output:
Array (
[0] => Array
(
[0] => 'this is\'nt very, funny (I dont think)'
[1] => 'is it'
[2] => 12345
[3] => 'nope'
[4] => 'like with 2,4,6'
[5] => 6789
)
)
*/
,(?=(?:[^']*'[^']*')*[^']*$)
Try this. This will split according to what you want. Replace by \n. See demo.
http://regex101.com/r/dK1xR4/4
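For illustration, the same pattern used with preg_split. The sample input here deliberately has no escaped quotes: the lookahead simply counts ' characters up to the end of the string, so an escaped \' inside a value (as in the question's first string) would throw off the count.

```php
<?php
// Simplified input without any \' inside the values (an assumption of this sketch).
$values = "'is it',12345,'nope','like with 2,4,6'";

// Split on a comma only if an even number of single quotes follows it,
// which means the comma itself is not inside a quoted string.
$parts = preg_split("~,(?=(?:[^']*'[^']*')*[^']*$)~", $values);

print_r($parts); // 'is it', 12345, 'nope', 'like with 2,4,6'
```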

PHP preg_split with two delimiters unless a delimiter is within quotes

Further on from my previous question about preg_split, which was answered super fast, thanks to nick, I would really like to extend the scenario to not split the string when a delimiter is within quotes. For example:
If I have the string foo = bar AND bar=foo OR foobar="foo bar", I'd like to split the string on every space or = character but include the = character in the returned array (which works great currently), but I don't want to split the string if either of the delimiters is within quotes.
I've got this so far:
<!doctype html>
<?php
$string = 'foo = bar AND bar=foo';
$array = preg_split('/ +|(=)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
?>
<pre>
<?php
print_r($array);
?>
</pre>
Which gets me:
Array
(
[0] => foo
[1] => =
[2] => bar
[3] => AND
[4] => bar
[5] => =
[6] => foo
)
But if I changed the string to:
$string = 'foo = bar AND bar=foo OR foobar = "foo bar"';
I'd really like the array to be:
Array
(
[0] => foo
[1] => =
[2] => bar
[3] => AND
[4] => bar
[5] => =
[6] => foo
[7] => OR
[8] => foobar
[9] => =
[10] => "foo bar"
)
Notice the "foo bar" wasn't split on the space because it's in quotes?
Really not sure how to do this within the RegEx or if there is even a better way but all your help would be very much appreciated!
Thank you all in advance!
Try
$array = preg_split('/(?: +|(=))(?=(?:[^"]*"[^"]*")*[^"]*$)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
The
(?=(?:[^"]*"[^"]*")*[^"]*$)
part is a lookahead assertion making sure that there is an even number of quote characters ahead in the string, therefore it will fail if the current position is between quotes:
(?= # Assert that the following can be matched:
(?: # A group containing...
[^"]*" # any number of non-quote characters followed by one quote
[^"]*" # the same (to ensure an even number of quotes)
)* # ...repeated zero or more times,
[^"]* # followed by any number of non-quotes
$ # until the end of the string
)
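Running the suggested pattern against the longer string from the question gives the array that was asked for. A quick sketch for checking it:

```php
<?php
$string = 'foo = bar AND bar=foo OR foobar = "foo bar"';

// Same split as before, but each delimiter must be followed by an even
// number of double quotes, i.e. it must sit outside any quoted section.
$array = preg_split(
    '/(?: +|(=))(?=(?:[^"]*"[^"]*")*[^"]*$)/',
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($array); // foo, =, bar, AND, bar, =, foo, OR, foobar, =, "foo bar"
```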
I was able to do this by adding quoted strings as a delimiter a-la
"(.*?)"| +|(=)
The quoted part will be captured. It seems like this is a bit tenuous and I did not test it extensively, but it at least works on your example.
But why bother splitting?
After a look at this old question, this simple solution comes to mind, using a preg_match_all rather than a preg_split. We can use this simple regex to specify what we want:
"[^"]*"|\b\w+\b|=
See online demo.
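A short sketch of that matching approach on the question's string. Note that this version tokenizes rather than splits, so any character not covered by one of the three alternatives is silently dropped.

```php
<?php
$string = 'foo = bar AND bar=foo OR foobar = "foo bar"';

// Try a whole quoted string first, then a bare word, then an equals sign.
preg_match_all('/"[^"]*"|\b\w+\b|=/', $string, $m);

print_r($m[0]); // the eleven tokens the question asks for
```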

Regular Expressions: get what is outside of the brackets

I'm using PHP and I have text like:
first [abc] middle [xyz] last
I need to get what's inside and outside of the brackets. Searching in StackOverflow I found a pattern to get what's inside:
preg_match_all('/\[.*?\]/', $m, $s)
Now I'd like to know the pattern to get what's outside.
Regards!
You can use preg_split for this as:
$input ='first [abc] middle [xyz] last';
$arr = preg_split('/\[.*?\]/',$input);
print_r($arr);
Output:
Array
(
[0] => first
[1] => middle
[2] => last
)
This allows some surrounding spaces in the output. If you don't want them you can use:
$arr = preg_split('/\s*\[.*?\]\s*/',$input);
preg_split splits the string based on a pattern. The pattern here is [ followed by anything followed by ]. The regex to match anything is .*. Also, [ and ] are regex metacharacters used for character classes; since we want to match them literally, we need to escape them, which gives \[.*\]. By default .* is greedy and will try to match as much as possible; in this case it would match abc] middle [xyz. To avoid this we make it non-greedy by appending a ?, giving \[.*?\]. Since "anything" here actually means anything other than ], we can also use \[[^]]*?\]
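The difference between the three variants discussed above can be checked directly. A small sketch:

```php
<?php
$input = 'first [abc] middle [xyz] last';

// Greedy: .* runs from the first [ to the last ], swallowing " middle ".
$greedy = preg_split('/\[.*\]/', $input);

// Lazy: .*? stops at the nearest ].
$lazy = preg_split('/\[.*?\]/', $input);

// Negated class: never crosses a ], so no laziness is needed.
$negated = preg_split('/\[[^\]]*\]/', $input);

var_dump($greedy === ['first ', ' last']);           // bool(true)
var_dump($lazy === ['first ', ' middle ', ' last']); // bool(true)
var_dump($negated === $lazy);                        // bool(true)
```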
EDIT:
If you want to extract words that are both inside and outside the [], you can use:
$arr = preg_split('/\[|\]/',$input);
which splits the string on a [ or a ]
$inside = '\[.+?\]';
$outside = '[^\[\]]+';
$or = '|';
preg_match_all(
"~ $inside $or $outside~x",
"first [abc] middle [xyz] last",
$m);
print_r($m);
or less verbose
preg_match_all("~\[.+?\]|[^\[\]]+~", $str, $matches)
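If it matters which pieces came from inside the brackets, one option (not taken from the answers here, just a sketch) is preg_split with PREG_SPLIT_DELIM_CAPTURE: the captured bracket contents then sit at the odd indexes of the result.

```php
<?php
$input = 'first [abc] middle [xyz] last';

// The capture group makes preg_split return the bracket contents as well.
$parts = preg_split('/\s*\[(.*?)\]\s*/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

$outside = [];
$inside  = [];
foreach ($parts as $i => $part) {
    // Even indexes are text between delimiters, odd indexes are captures.
    if ($i % 2 === 0) {
        $outside[] = $part;
    } else {
        $inside[] = $part;
    }
}

print_r($outside); // first, middle, last
print_r($inside);  // abc, xyz
```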
Use preg_split instead of preg_match.
preg_split('/\[.*?\]/', 'first [abc] middle [xyz] last');
Result:
array(3) {
[0]=>
string(6) "first "
[1]=>
string(8) " middle "
[2]=>
string(5) " last"
}
ideone
Everyone says that you should use preg_split, but only one person replied with an expression that meets your needs, and I think that one is a little too verbose (though he has since updated his answer to address that).
This expression is what most of the replies have stated.
/\[.*?\]/
But that only prints out
Array
(
[0] => first
[1] => middle
[2] => last
)
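and you stated you wanted what's inside and outside the brackets, so an update would be:
/[\[.*?\]]/
Note that the outer square brackets turn this into a character class, so it splits on any single [, ], ., * or ? character. This gives you: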
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
but as you can see, it's capturing whitespace as well, so let's go a step further and get rid of that:
/[\s]*[\[.*?\]][\s]*/
This will give you a desired result:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
This, I think, is the expression you're looking for.
Here is a LIVE Demonstration of the above Regex
