Using REGEX with escaped quotes inside quotes - php

I have a PHP preg_match_all and REGEX question.
I have the following code:
<?php
$string= 'attribute1="some_value" attribute2="<h1 class=\"title\">Blahhhh</h1>"';
preg_match_all('/(.*?)\s*=\s*(\'|"|&#?\w+;)(.*?)\2/s', trim($string), $matches);
print_r($matches);
?>
That does not seem to pick up escaped quotes when I want to pass in HTML that contains quotes. I have tried numerous solutions for this, using the basic quotes-inside-quotes REGEX fixes, but none of them seem to work for me; I can't seem to place them correctly inside this pre-existing REGEX.
I am not a REGEX master; can someone please point me in the right direction?
The result I am trying to achieve is this:
Array
(
[0] => Array
(
[0] => attribute1="some_value"
[1] => attribute2="<h1 class=\"title\">Blahhhh</h1>"
)
[1] => Array
(
[0] => attribute1
[1] => attribute2
)
[2] => Array
(
[0] => "
[1] => "
)
[3] => Array
(
[0] => some_value
[1] => <h1 class=\"title\">Blahhhh</h1>
)
)
Thanks.

You can solve this with a negative lookbehind assertion:
'/(.*?)\s*=\s*(\'|"|&#?\w+;)(.*?)(?<!\\\\)\2/s'
The added lookbehind (?<!\\\\) means the closing quote must not be preceded by a \. This gives me:
Array
(
[0] => Array
(
[0] => attribute1="some_value"
[1] => attribute2="<h1 class=\"title\">Blahhhh</h1>"
)
[1] => Array
(
[0] => attribute1
[1] => attribute2
)
[2] => Array
(
[0] => "
[1] => "
)
[3] => Array
(
[0] => some_value
[1] => <h1 class=\"title\">Blahhhh</h1>
)
)
This regex isn't perfect because of the entity you put in there as a delimiter: like the quotes, it can be escaped with \ as well. I'm not sure whether that is really intended.
See also this great question/answer: Split string by delimiter, but not if it is escaped.
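For reference, here is a runnable sketch of the lookbehind approach on the question's input. It adds a leading \s* (my addition, not part of the answer above) so that the space between the two attributes does not end up in the name group, and it uses array_combine() only to make the printed result easier to read.

```php
<?php
// Input from the question: the second value contains backslash-escaped quotes.
$string = 'attribute1="some_value" attribute2="<h1 class=\"title\">Blahhhh</h1>"';

// In a single-quoted PHP string, (?<!\\\\) reaches PCRE as (?<!\\):
// the closing delimiter \2 only matches if it is NOT preceded by a backslash.
// The leading \s* (an addition) keeps the separating space out of group 1.
$pattern = '/\s*(.*?)\s*=\s*(\'|"|&#?\w+;)(.*?)(?<!\\\\)\2/s';

preg_match_all($pattern, trim($string), $matches);

// $matches[1] holds the attribute names, $matches[3] the values without their quotes.
print_r(array_combine($matches[1], $matches[3]));
```

A value that legitimately ends in a backslash (for example "C:\dir\\") would still be cut short by the lookbehind, which is the kind of case the linked "split string by delimiter, but not if it is escaped" answers deal with.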

Related

Obtain specific data with preg_match_all

I have different texts which aren't well formatted, so I need a pattern that works with all of them and returns some specific elements (text) from them. Let's say I have this text:
"AL TEST232 KW 12*/13*/17 TEST kw16TEST123 kw 15*"
and I want my preg_match_all() to return something like this:
Array
(
[0] => Array
(
[0] => AL TEST232
[1] => 12/13/17
)
[1] => Array
(
[0] => TEST
[1] => 16
)
[2] => Array
(
[0] => TEST123
[1] => 15
)
)
Is this possible with a single pattern?
You can use:
preg_match_all('~(\w[\s\w]*?\w)\s*kw\s*([\d/*]+)~i', $input, $matches);
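To get exactly the shape asked for (one label/number pair per match, without the asterisks), a sketch along these lines should work. It uses the pattern with the i flag, which is needed so that kw also matches the uppercase KW in the sample; PREG_SET_ORDER to group each match's captures together; and str_replace() to drop the * characters. The arrow function assumes PHP 7.4 or later.

```php
<?php
$input = "AL TEST232 KW 12*/13*/17 TEST kw16TEST123 kw 15*";

// i flag: "kw" must also match "KW" in the first part of the text.
// PREG_SET_ORDER: $matches[n] = [full match, label, number part].
preg_match_all('~(\w[\s\w]*?\w)\s*kw\s*([\d/*]+)~i', $input, $matches, PREG_SET_ORDER);

// Keep [label, number] and strip the asterisks from the number part.
$result = array_map(
    fn(array $m): array => [$m[1], str_replace('*', '', $m[2])],
    $matches
);

print_r($result);
```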

PHP preg_match is mismatching a curly apostrophe with other types of curly quotes. How to avoid?

I have the following variable content:
$content_content = '“I can’t do it, she said.”';
I want to match every "word" in it, including the contractions, so I use preg_match_all as follows:
if (preg_match_all('/([a-zA-Z0-9’]+)/', $content_content, $matches))
{
echo '<pre>';
print_r($matches);
echo '</pre>';
}
However, it seems that by including ’ in the regular expression, I am also catching the curly double quotes; the code above outputs:
Array
(
[0] => Array
(
[0] => ��
[1] => I
[2] => can’t
[3] => do
[4] => it
[5] => she
[6] => said
[7] => ��
)
[1] => Array
(
[0] => ��
[1] => I
[2] => can’t
[3] => do
[4] => it
[5] => she
[6] => said
[7] => ��
)
)
How can I include ’ without it also including the “ and ”?
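This is because, without a Unicode modifier, PCRE works on bytes: the "fancy" apostrophe ’ (U+2019) is stored as the three UTF-8 bytes E2 80 99, so the character set matches each of those bytes on its own. The curly double quotes “ and ” (E2 80 9C and E2 80 9D) start with the same two bytes E2 80, which is what shows up as �� in your output. You need to enable Unicode (UTF-8) mode using its respective modifier, u: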
preg_match_all('/([a-zA-Z0-9’]+)/u', $content_content, $matches)
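A small sketch comparing both modes on the question's string. It assumes the PHP file itself is saved as UTF-8, since the ’ in the pattern and the text in the subject must use the same encoding for /u to line them up.

```php
<?php
$content_content = '“I can’t do it, she said.”';

// Byte mode: the class contains the bytes E2, 80 and 99 separately,
// so the leading bytes E2 80 of “ and ” are matched as well.
preg_match_all('/[a-zA-Z0-9’]+/', $content_content, $bytes);

// UTF-8 mode: ’ is a single character, so “ and ” no longer match.
preg_match_all('/[a-zA-Z0-9’]+/u', $content_content, $chars);

print_r($chars[0]);
```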

Regex with unknown character length

Simple question for you folks.
Sorry that I have to ask it.
On my website, I want to use signatures at "random" places in my text. The problem is, there could be multiple DIFFERENT signatures in this given string.
The signature code is ~~USERNAME~~
So anything like
~~timtj~~
~~foobar~~
~~totallylongusername~~
~~I-d0n't-us3-pr0p3r-ch#r#ct3r5~~
I have tried using preg_match for this, with no success. I understand that the third parameter is used to store the matches, but I cannot get a proper match because of the format.
Should I not use preg_match, or am I just not able to use signatures in this manner?
You could use preg_match_all with this modified regex:
preg_match_all('/~~(.*?)~~/', $str, $matches);
The code...
<?php
$str="~~I-d0n't-us3-pr0p3r-ch#r#ct3r5~~";
preg_match_all('/~~(.*?)~~/', $str, $matches);
print_r($matches[1]);
OUTPUT:
Array
(
[0] => I-d0n't-us3-pr0p3r-ch#r#ct3r5
)
This should work, but usernames mustn't contain ~~
preg_match_all('!~~(.*?)~~!', $str, $matches);
Output:
Array
(
[0] => Array
(
[0] => ~~timtj~~
[1] => ~~foobar~~
[2] => ~~totallylongusername~~
[3] => ~~I-d0n't-us3-pr0p3r-ch#r#ct3r5~~
)
[1] => Array
(
[0] => timtj
[1] => foobar
[2] => totallylongusername
[3] => I-d0n't-us3-pr0p3r-ch#r#ct3r5
)
)
The first sub array contains the complete matched strings and the other sub arrays contain the matched groups.
You could change the order by using the flag PREG_SET_ORDER, see http://php.net/preg_match_all#refsect1-function.preg-match-all-parameters
<?php
$str = "~~timtj~~ ~~foobar~~ ~~totallylongusername~~ ~~I-d0n't-us3-pr0p3r-ch#r#ct3r5~~";
preg_match_all("!~~(.*?)~~!", $str, $matches, PREG_SET_ORDER);
print_r($matches);
This code produces the following output
Array
(
[0] => Array
(
[0] => ~~timtj~~
[1] => timtj
)
[1] => Array
(
[0] => ~~foobar~~
[1] => foobar
)
[2] => Array
(
[0] => ~~totallylongusername~~
[1] => totallylongusername
)
[3] => Array
(
[0] => ~~I-d0n't-us3-pr0p3r-ch#r#ct3r5~~
[1] => I-d0n't-us3-pr0p3r-ch#r#ct3r5
)
)
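If the end goal is to turn each marker in the text into a rendered signature, preg_replace_callback() can do the matching and the replacing in one pass with the same pattern. This is only a sketch: the <span class="signature"> markup is a hypothetical example, and htmlspecialchars() is there because usernames like the last one contain characters that are not safe to drop into HTML as-is.

```php
<?php
$text = "Thanks for the help! ~~timtj~~ Agreed. ~~I-d0n't-us3-pr0p3r-ch#r#ct3r5~~";

// Same lazy pattern as above: capture everything between a pair of ~~ markers.
$html = preg_replace_callback('/~~(.*?)~~/', function (array $m): string {
    // Hypothetical markup: escape the captured username and wrap it in a span.
    return '<span class="signature">' . htmlspecialchars($m[1], ENT_QUOTES) . '</span>';
}, $text);

echo $html, "\n";
```

In a real site the callback would more likely look the username up and return that user's stored signature; the escaping step stays the same.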

split string with delimiter that are not inside specific characters

I have a string in the following format:
,"value","value2","3",("this is, a test"), "3"
How can I split by commas when they are not within parenthesis?
Edit: Sorry, slight problem/correction: inside the parentheses the format is actually
,"value","value2","3",(THIS IS THE FORMAT "AND QUOTES, INSIDE"), "3"
Consider this code:
$str = ',"value","value2","3",(THIS IS THE FORMAT \) "AND QUOTES, INSIDE"), "3"';
$regex = '#(\(.*?(?<!\\\)\))\s*,|,#';
$arr = preg_split( $regex, $str, 0, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY );
print_r($arr);
OUTPUT:
Array
(
[0] => "value"
[1] => "value2"
[2] => "3"
[3] => (THIS IS THE FORMAT \) "AND QUOTES, INSIDE")
[4] => "3"
)
The quotes are already sufficient to delimit the comma, so you don't need parens as well. If you take out the parens, str_getcsv() will work on it just fine. If you don't have control of the source, you can strip them yourself:
$str = str_replace('",("', '","', $str);
$str = str_replace('"), "', '", "', $str);
print_r(str_getcsv($str));
Edit for updated question:
You're still OK as long as there are no unescaped parens in the file. Just convert close parens to open parens (since str_getcsv() accepts only a single character as the enclosure), and then use the open paren as your enclosure character:
$str = str_replace(')', '(', $str);
print_r(str_getcsv($str, ',', '('));
Result:
Array
(
[0] =>
[1] => "value"
[2] => "value2"
[3] => "3"
[4] => THIS IS THE FORMAT "AND QUOTES, INSIDE"
[5] => "3"
)
The above solutions work fine, but I have one more:
preg_match_all('#(,)?("|(\())(.+?)((?(3)\)|"))(,)?#',$str,$arr);
The output of this one is:
Array
(
[0] => Array
(
[0] => ,"value",
[1] => "value2",
[2] => "3",
[3] => ("this is, a test"),
[4] => "3"
)
[1] => Array
(
[0] => ,
[1] =>
[2] =>
[3] =>
[4] =>
)
[2] => Array
(
[0] => "
[1] => "
[2] => "
[3] => (
[4] => "
)
[3] => Array
(
[0] =>
[1] =>
[2] =>
[3] => (
[4] =>
)
[4] => Array
(
[0] => value
[1] => value2
[2] => 3
[3] => "this is, a test"
[4] => 3
)
[5] => Array
(
[0] => "
[1] => "
[2] => "
[3] => )
[4] => "
)
[6] => Array
(
[0] => ,
[1] => ,
[2] => ,
[3] => ,
[4] =>
)
)
So $arr[4] contains the values.
Here’s a simple tokenizer that you can use to split the input into strings and other characters:
preg_match_all('/"(?:[^\\\\"]|\\\\.)*"|[^"]/', $input, $tokens);
If you want to parse the input, just iterate over the tokens and do whatever syntax checks you want. You can identify the strings by the quote at the beginning and end of the token.
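As an illustration of iterating the tokens, here is a sketch of a hypothetical splitTopLevel() helper that splits on commas outside both quoted strings and parentheses. It relies on the tokenizer pattern with the backslash in \\. doubled for PHP's single-quoted string, and it does not try to handle escaped parentheses such as \).

```php
<?php
/**
 * Hypothetical helper: split $input on commas that are neither inside a
 * "..." string nor inside parentheses, using the tokenizer regex.
 */
function splitTopLevel(string $input): array
{
    // Each token is a complete "..." string (with \-escapes) or one non-quote character.
    preg_match_all('/"(?:[^\\\\"]|\\\\.)*"|[^"]/', $input, $tokens);

    $fields = [];
    $current = '';
    $depth = 0; // current parenthesis nesting level
    foreach ($tokens[0] as $token) {
        if ($token === '(') {
            $depth++;
        } elseif ($token === ')' && $depth > 0) {
            $depth--;
        } elseif ($token === ',' && $depth === 0) {
            $fields[] = trim($current); // a top-level comma ends the current field
            $current = '';
            continue;
        }
        $current .= $token;
    }
    $fields[] = trim($current);

    // Drop empty fields, such as the one produced by the leading comma.
    return array_values(array_filter($fields, fn(string $f): bool => $f !== ''));
}

$str = ',"value","value2","3",(THIS IS THE FORMAT "AND QUOTES, INSIDE"), "3"';
print_r(splitTopLevel($str));
```

Because a quoted string is always a single token, a comma or paren inside "AND QUOTES, INSIDE" never reaches the depth or comma checks.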
preg_match_all("/,?\"(.*?)\",?/", $myString, $result);
Edit: The only quick solution I can think of for escaped quotes is to replace them temporarily and add them back afterwards:
preg_match_all("/,?\"(.*?)\",?/", str_replace('\"', "'", $myString), $result);

What is the regex for the text between quotes?

Ok, I have tried looking at other answers, but couldn't get mine solved. So here is the code:
{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}
I need to get every second value in quotes (the "name" values are constant). I worked out that I need the text between :" and ", but I can't manage to write a regex for that.
EDIT: I'm using preg_match_all in PHP. And it's between :" and ", not " and " as someone else edited.
Why on earth would you attempt to parse JSON with regular expressions? PHP already parses JSON properly, with built-in functionality.
Code:
<?php
$input = '{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}';
print_r(json_decode($input, true));
?>
Output:
Array
(
[chg] => -0.71
[vol] => 40700
[time] => 11.08.2011 12:29:09
[high] => 1.417
[low] => 1.360
[last] => 1.400
[pcl] => 1.410
[turnover] => 56,560.25
)
You may need to escape characters or add delimiters (such as a forward slash at the front and back) depending on your language, but it's basically:
:"([^"]*)"
or
/:"([^"]*)"/
I've tested this in Groovy as below, and it works:
import java.util.regex.*;
String test='{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}'
// Capture everything between :" and the next "
Pattern p = Pattern.compile(':"([^"]*)"');
// Find every match in the input
Matcher m = p.matcher(test);
while (m.find())
// group(1) is the text inside the quotes, so no quote or colon stripping is needed
System.out.println("Found comment: " + m.group(1));
Output was:
Found comment: -0.71
Found comment: 40700
Found comment: 11.08.2011 12:29:09
Found comment: 1.417
Found comment: 1.360
Found comment: 1.400
Found comment: 1.410
Found comment: 56,560.25
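For comparison, the same capture-group idea in PHP: preg_match_all() puts group 1 in $m[1], so nothing has to be stripped afterwards. Like any regex approach here, this sketch assumes every value is a quoted string without escaped quotes inside it, which is why json_decode() remains the more robust choice.

```php
<?php
$test = '{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}';

// Group 1 captures everything between :" and the next ".
preg_match_all('/:"([^"]*)"/', $test, $m);

foreach ($m[1] as $value) {
    echo "Found: $value\n";
}
```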
PHP Example
<?php
$subject = '{"chg":"-0.71","vol":"40700","time":"11.08.2011 12:29:09","high":"1.417","low":"1.360","last":"1.400","pcl":"1.410","turnover":"56,560.25"}';
$pattern = '/(?<=:")[^"]*/';
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>
Output is:
Array ( [0] => Array ( [0] => Array ( [0] => -0.71 [1] => 8 ) [1] => Array ( [0] => 40700 [1] => 22 ) [2] => Array ( [0] => 11.08.2011 12:29:09 [1] => 37 ) [3] => Array ( [0] => 1.417 [1] => 66 ) [4] => Array ( [0] => 1.360 [1] => 80 ) [5] => Array ( [0] => 1.400 [1] => 95 ) [6] => Array ( [0] => 1.410 [1] => 109 ) [7] => Array ( [0] => 56,560.25 [1] => 128 ) ) )
