php preg_match_all url with specific word between double quotes - php

I am having a very hard time coming up with a regex that works in this situation.
I am trying to use preg_match_all to capture the URL between quotes that contains "720.mp4", resulting in the URL without the double quotes.
[{"file":"https:\/\/cw012.videohost.com\/files\/videos\/2017\/09\/18\/1505738417e8b76-720.mp4?h=33wg3l1i1G0XcJxvT82x7Q&ttl=1505769928",
In the end I want the above to result in:
https:\/\/cw012.videohost.com\/files\/videos\/2017\/09\/18\/1505738417e8b76-720.mp4?h=33wg3l1i1G0XcJxvT82x7Q&ttl=1505769928
Any ideas? I am very new to regex. I have done my reading but can't apply what I have read to this specific case.

As a simple approach you can use:
preg_match_all('#"([^"]*720\.mp4[^"]*)"#', $str, $m);
var_dump($m[1]);
The steps are straightforward. We want a literal ". Then we open a capture group with (, match anything that is not a ", then the literal string 720.mp4 (with an escaped dot, because an unescaped . matches any character). Then again anything but ", close the group with ), and match a final ".
$m[1] is the content of the capture group we want. $m[0] contains the entire match with the quotes.
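As a minimal sketch, here is the pattern applied to the sample line from the question. The str_replace step is an addition that the question did not ask for: it assumes the \/ sequences are JSON escapes that you may want turned back into plain slashes. The variable names $urls and $clean are illustrative only.

```php
<?php
// Sample input copied from the question (a JSON fragment with escaped slashes).
$str = '[{"file":"https:\/\/cw012.videohost.com\/files\/videos\/2017\/09\/18\/1505738417e8b76-720.mp4?h=33wg3l1i1G0XcJxvT82x7Q&ttl=1505769928",';

// Capture every double-quoted value that contains the literal text 720.mp4.
preg_match_all('#"([^"]*720\.mp4[^"]*)"#', $str, $m);

// $m[1] holds the captured URLs, still with the escaped slashes (\/).
$urls = $m[1];

// Assumption: the backslashes are JSON escapes, so \/ can be replaced by /.
$clean = str_replace('\/', '/', $urls);

print_r($urls);
print_r($clean);
```

If the whole input is valid JSON, decoding it with json_decode and reading the "file" key would avoid the regex entirely, but the question only shows a fragment, so that is left as an option.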

Related

Use regex to quote the name in name-value pair of a list of pairs

I am trying to put quotes around the names of name-value pairs separated by commas. I use preg_replace and regex to achieve that. However, my pattern is not working properly.
$str="f1=1,f2='2',f3='a',f4=4,f5='5'";
$newstr=Preg_replace(/'(?.[^=]+)'/,"'$1'",$str);
I expected $newstr to come out like so:
'f1'=1,'f2'='2','f3'='a','f4'=4,'f5'='5'
But it doesn't, and the quotes don't contain the name.
What should the pattern be and how can I use the comma to get all of them correctly?
There are a few issues with your attempt:
PHP does not have a regex-literal syntax as in JavaScript, so starting the regex value with a forward slash is a syntax error. It should be a string, so start with a quote. Maybe you accidentally swapped the slash and quote at the start and the end.
(?. is not valid. Maybe you intended (?:, but then there is no capture group and $1 is not a valid back reference. To have the capture group, you should not have (?., but just (.
[^=]+ could include substrings like 1,f2. There should be logic to not start matching while still inside a value (whether quoted or not).
I would suggest a regex where you match both parts around the = (both key and value), and then in the replacement, just reproduce the second part without change. This will ensure you don't accidentally use anything on the value side for wrapping in quotes:
$newstr = preg_replace("/([^,=]+)=('[^']*'|[^,]*)/","'$1'=$2",$str);
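A short sketch of that replacement in use, first on the string from the question and then on a made-up input ($tricky is a hypothetical example, not from the question) where a quoted value contains a comma:

```php
<?php
// Key: anything up to the "=". Value: a single-quoted string, or anything up to the next comma.
$pattern = "/([^,=]+)=('[^']*'|[^,]*)/";

$str = "f1=1,f2='2',f3='a',f4=4,f5='5'";
$newstr = preg_replace($pattern, "'$1'=$2", $str);
echo $newstr, PHP_EOL;
// 'f1'=1,'f2'='2','f3'='a','f4'=4,'f5'='5'

// Hypothetical input: the comma inside 'a,b' belongs to the value, not to the list.
$tricky = "f1='a,b',f2=2";
$trickyOut = preg_replace($pattern, "'$1'=$2", $tricky);
echo $trickyOut, PHP_EOL;
// 'f1'='a,b','f2'=2
```

Because the value is consumed as part of the match, the comma inside the quoted value is never mistaken for a separator.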
Basically, match the beginning of the string or a comma (with a lookbehind, so it is not part of the match) and then capture everything up to the =:
$reg = "/(?<=^|,)([^=]+)/";
$str = "f1=1,f2='2',f3='a',f4=4,f5='5'";
print_r(preg_replace($reg, "'$1'", $str));
// output:
// 'f1'=1,'f2'='2','f3'='a','f4'=4,'f5'='5'
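This will also work, with a different approach, but it assumes there are no commas in the values or names other than the separators, and that each name is at least two characters long (a one-character name only gets the closing quote).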
$newstr = preg_replace("/(.)(?==)|(?<=,|^)(.)/", "$1'$2", $str);
But I believe string and simple array operations will be faster, as the regex is getting complex and takes many steps to match. Here is the same output using array functions only.
$newstr = implode(",", array_map(function($element){ return "'". implode("'=", explode("=", $element)); }, explode(",", $str)));
Regex is not always faster than string or array operations, but it can do complex things with little code.

exploding a search string

I'm trying to make a search string, which can accept a query like this:
$string = 'title -launch category:technology -tag:news -tag:"outer space"$';
Here's a quick explanation of what I want to do:
$ = suffix indicating that the match should be exact
" = double quotes indicate that the multi-word is taken as a single keyword
- = a prefix indicating that the keyword is excluded
Here's my current parser:
$string = preg_replace('/(\w+)\:"(\w+)/', '"${1}:${2}', $string);
$array = str_getcsv($string, ' ');
I was using the code above, but it doesn't work as intended for keywords like -tag:"outer space". It doesn't recognize strings starting with the - character, and it breaks the keyword at the whitespace between outer and space, despite the double quotes.
EDIT: What I'm trying to do with that code is to preg_replace -tag:"outer space" into "-tag:outer space" so that they won't be broken when I pass the string to str_getcsv().
You may use preg_replace like this:
preg_replace('/(-?\w+:)"([^"]+)"/', '"$1$2"', $str);
The regex matches:
(-?\w+:) - Capturing group 1: an optional - (? matches 1 or 0 occurrences), then 1+ letters/digits/underscores and a :
" - a double quote (it will be removed)
([^"]+) - Capturing group 2: one or more chars other than a double quote
" - a double quote
The replacement pattern is "$1$2": a ", the capturing group 1 value, the capturing group 2 value, and a ".
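As a small sketch, this is the replacement applied to the search string from the question. The variable name $rewritten is illustrative only.

```php
<?php
$string = 'title -launch category:technology -tag:news -tag:"outer space"$';

// Move the opening quote in front of the (optionally negated) prefix,
// so that -tag:"outer space" becomes "-tag:outer space".
$rewritten = preg_replace('/(-?\w+:)"([^"]+)"/', '"$1$2"', $string);

echo $rewritten, PHP_EOL;
// title -launch category:technology -tag:news "-tag:outer space"$
```

The rewritten string can then be passed to str_getcsv() with a space delimiter, as in the question's parser.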
Here's how I did it:
$string = preg_replace('/(\-?)(\w+?\:?)"(\w+)/', '"$1$2$3', $string);
$array = str_getcsv($string, ' ');
I considered formats like -"top ten" for quoted multi-word keywords that don't have a category/tag + colon prefix.
I'm sorry for being slow. I'm new to regex, PHP and programming in general, and this is also my first post on Stack Overflow. I'm trying to learn it as a personal hobby. I'm glad that I learned something new today. I'll be reading more about regex since it looks like it can do a lot of stuff.

Extract value from between quotes with regex

I have the following string:
feature name="osp"
I need to extract part of the strings out and put them into a new string. The word feature can change and the word inside quotes can change so I need to be able to capture any instance possible. The name=" " part is always the same. The result I need is:
feature osp
I need to filter out the name= and quotes from the string.
I've used ^\w*\s to get the first feature part but can't figure out how to extract osp from the string using a regex. I've been looking at "RegEx: Grabbing values between quotation marks" but can't get a regex that combines both to get the result I need. I'm working in PHP, so I'm using preg_match at the moment. Can anyone help with this?
I'd go with
(\w+)\s+name\s*=\s*"([^"]*)
It's a little bit slower, but it allows for an arbitrary number of spaces and it captures the first word correctly, even with Alexandru's test.
See it work here at regex101.
Regards
Try something like this:
preg_match('/(.+)name="(.+?)"/', $string, $matches);
echo $matches[1] . $matches[2];
An improved version of @vuryss's answer:
preg_match('/(.*?)name="(.*?)"/ims', $string, $matches);
echo $matches[1] . $matches[2];
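For comparison, here is a sketch of the stricter pattern from the first answer, (\w+)\s+name\s*=\s*"([^"]*), used with preg_match. The helper name extractFeature and the second test input are hypothetical; only 'feature name="osp"' comes from the question.

```php
<?php
// Hypothetical helper: returns "<word> <quoted value>" or null when nothing matches.
function extractFeature(string $input): ?string
{
    if (preg_match('/(\w+)\s+name\s*=\s*"([^"]*)/', $input, $m)) {
        // $m[1] is the leading word, $m[2] is the text inside the quotes.
        return $m[1] . ' ' . $m[2];
    }
    return null;
}

$first = extractFeature('feature name="osp"');
$second = extractFeature('module   name = "core"'); // made-up input with extra spaces

echo $first, PHP_EOL;  // feature osp
echo $second, PHP_EOL; // module core
```

Since the leading word is captured with \w+ rather than .+ or .*?, the separating space is added explicitly instead of being carried along inside the first group.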

Regex grab all text between brackets, and NOT in quotes

I'm attempting to match all text between {brackets}, however not if it is in quotation marks:
For example:
$str = 'value that I {want}, vs value "I do {NOT} want" ';
my results should snatch "want", but omit "NOT". I've searched stackoverflow desperately for the regex that could perform this operation with no luck. I've seen answers that allow me to get the text between quotes but not outside quotes and in brackets. Is this even possible?
And if so how is it done?
So far this is what I have:
preg_match_all('/{([^}]*)}/', $str, $matches);
But unfortunately it only gets all text inside brackets, including {NOT}
It's quite tricky to get this done in one go. I even wanted to make it compatible with nested brackets so let's also use a recursive pattern :
("|').*?\1(*SKIP)(*FAIL)|\{(?:[^{}]|(?R))*\}
OK, let's explain this mysterious regex:
("|') # match either a single quote or a double and put it in group 1
.*? # match anything ungreedy until ...
\1 # match what was matched in group 1
(*SKIP)(*FAIL) # make it skip this match since it's a quoted set of characters
| # or
\{(?:[^{}]|(?R))*\} # match a pair of brackets (even if they are nested)
Some php code:
$input = <<<INP
value that I {want}, vs value "I do {NOT} want".
Let's make it {nested {this {time}}}
And yes, it's even "{bullet-{proof}}" :)
INP;
preg_match_all('~("|\').*?\1(*SKIP)(*FAIL)|\{(?:[^{}]|(?R))*\}~', $input, $m);
print_r($m[0]);
Sample output:
Array
(
[0] => {want}
[1] => {nested {this {time}}}
)
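The pattern returns the matches with their outer braces, while the question asked for the text inside them. A minimal sketch of one way to strip the braces afterwards, using the question's own string (the variable name $inner is illustrative):

```php
<?php
$str = 'value that I {want}, vs value "I do {NOT} want" ';

preg_match_all('~("|\').*?\1(*SKIP)(*FAIL)|\{(?:[^{}]|(?R))*\}~', $str, $m);

// Each full match starts with "{" and ends with "}", so drop the first and last character.
$inner = array_map(function ($match) {
    return substr($match, 1, -1);
}, $m[0]);

print_r($inner);
// Array
// (
//     [0] => want
// )
```

For nested input, this only removes the outermost pair, so {nested {this {time}}} would become nested {this {time}}.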
Personally I'd process this in two passes. The first to strip out everything in between double quotes, the second to pull out the text you want.
Something like this perhaps:
$str = 'value that I {want}, vs value "I do {NOT} want" ';
// Get rid of everything in between double quotes
$str = preg_replace("/\".*\"/U","",$str);
// Now I can safely grab any text between curly brackets
preg_match_all("/\{(.*)\}/U",$str,$matches);
Working example here: http://3v4l.org/SRnva

Regex extra spaces in string not in double or single quotes - PHP

I would like to replace extra spaces (instances of consecutive whitespace characters) with one space, as long as those extra spaces are not in double or single quotes (or any other enclosures I may want to include).
I saw some similar questions, but I could not find a direct response to my needs above. Thank you!
Hope you're still looking, or come back to check! This seems to work for me:
'/\s+((["\']).*?(?=\2)\2)|\s\s+/'
...and replace with ' $1' (a single space followed by $1)
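A quick sketch of this expression with preg_replace. It assumes the replacement is a single space followed by $1, which matches the later remark that a leading space gets added when the input starts with a quote. The input string is made up for illustration.

```php
<?php
$pattern = '/\s+((["\']).*?(?=\2)\2)|\s\s+/';

// Made-up input: extra spaces outside the quotes and inside them.
$input = 'a   b  "c   d"  e';

// First alternative: whitespace plus a quoted string, re-emitted as one space plus the quoted string.
// Second alternative: two or more whitespace characters, where $1 is empty, leaving one space.
$output = preg_replace($pattern, ' $1', $input);

echo $output, PHP_EOL;
// a b "c   d" e
```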
EDIT
Also, if you need to allow for escaped quotes like \" or \', you could use this expression:
'/\s+((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/'
It gets a bit stickier if you want to add support for "balanced" quotes like brackets (e.g. () or {})
END EDIT
Let me know if you find problems or would like some explanation!
HOPEFULLY FINAL EDIT AND WARNINGS
Potential problem: if a quoted string sits at the very beginning of the input (string variable or file), it either will not be treated as a quoted string (so its whitespace gets collapsed), or it will throw off everything after it, so that text NOT in quotes is treated as quoted and vice versa.
A potential change that might remedy this is to use the following match expression
/(?:^|\s+)((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/
this replaces \s+ with (?:^|\s+) at the beginning of the expression
this will add a space at the beginning of the variable if the string starts with a quote - just trim() or remove that whitespace to continue
I seem to have used the "line by line" approach (like sed, if I'm not mistaken) to reach my original results. If you process the whole file or whole string at once, a carriage return plus line feed counts as two whitespace characters (\r and \n each match \s), so newlines get turned into single spaces (unless they are inside quotes and "dot-matches-newline" is used, of course)
this could be resolved by replacing the . and \s shorthand character classes with the specific characters you want to match, like the following:
/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/
this does not require the dot-matches-newline switch and only replaces multiple spaces or tabs - not newlines - with a single space (and of course, only if they are not quoted)
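A sketch of this last expression on an input that begins with a quoted string, again assuming the replacement is a space followed by $1. The input is made up; ltrim() removes the leading space that the (?:^|[ \t]+) branch introduces.

```php
<?php
$pattern = '/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/';

// Made-up input that starts with a quoted string.
$input = '"a  b"   c    d';

$collapsed = preg_replace($pattern, ' $1', $input);
// The quoted string at position 0 is re-emitted with a leading space.
$result = ltrim($collapsed);

echo $result, PHP_EOL;
// "a  b" c d
```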
EXAMPLE
This link shows an example of the first expression and last expression in use on sample text on http://codepad.viper-7.com
You could do it in several steps. Consider the following example:
$str = 'This is a string with "Bunch of extra spaces". Leave them "untouched !".';
$id = 0;
$buffer = array();
$str = preg_replace_callback('|".*?"|', function($m) use (&$id, &$buffer) {
$buffer[] = $m[0];
return '__' . $id++;
}, $str);
$str = preg_replace('|\s+|', ' ', $str);
$str = preg_replace_callback('|__(\d+)|', function($m) use ($buffer) {
return $buffer[$m[1]];
}, $str);
echo $str;
This will output the string:
This is a string with "Bunch of extra spaces". Leave them "untouched !".
Although this is not the prettiest solution.
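A possible variation on the same idea, sketched here as a different technique rather than as the answer's own method: preg_split with PREG_SPLIT_DELIM_CAPTURE keeps the quoted parts as separate array elements, so no placeholders are needed. The function name collapseOutsideQuotes is hypothetical, and like the code above it only handles double quotes without escapes.

```php
<?php
// Hypothetical helper: collapse whitespace runs everywhere except inside "..." sections.
function collapseOutsideQuotes(string $text): string
{
    // With one capture group and DELIM_CAPTURE, the result alternates:
    // even indexes are unquoted text, odd indexes are the quoted sections.
    $parts = preg_split('/("[^"]*")/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = preg_replace('/\s+/', ' ', $part);
        }
    }

    return implode('', $parts);
}

$demo = collapseOutsideQuotes('This  is   "a   b"  end');
echo $demo, PHP_EOL;
// This is "a   b" end
```

This avoids the risk of the input already containing text that looks like a placeholder such as __0.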
