I have a URL:
https:\u002F\u002Fsite.vid.com\u002F93836af7-f465-4d2c-9feb-9d8128827d85\u002F6njx6dp3gi.m3u8?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjb3VudHJ5IjoiSU4iLCJkZXZpY2VfaWQiOiI1NjYxZTY3Zi0yYWE3LTQ1MjUtOGYwYy01ODkwNGQyMjc3ZmYiLCJleHAiOjE2MTA3MjgzNjEsInBsYXRmb3JtIjoiV0VCIiwidXNlcl9pZCI6MH0.c3Xhi58DnxBhy-_I5yC2XMGSWU3UUkz5YgeVL1buHYc","
And I want to match it using preg_match_all. My regex expression is:
preg_match_all('/(https:\/\/site\.vid\.com\/.*\",")/', $input_lines, $output_array);
But I am not able to match the special characters \ and u002F with the code above. I tried using an escaping function, but it still does not match. I know it may be a lame question, but if anyone could help me match or escape \ and u002F in preg_match_all, that would be helpful.
Question Edit:
I want to use only preg_match_all because I am trying to extract the above URL from an HTML page.
You may use
preg_match_all('~https:(?://|(?:\\\\u002F){2})site\.vid\.com(?:/|\\\\u002F)[^"]*~', $string)
See the regex demo. Details:
https: - a literal string (if s is optional, use https?:)
(?://|(?:\\u002F){2}) - a non-capturing group matching either // or (|) two occurrences of \u002F
site\.vid\.com - a literal site.vid.com string (the dot is a metacharacter that matches any char but line break chars, so it must be escaped)
(?:/|\\u002F) - a non-capturing group matching / or \u002F text
[^"]* - a negated character class matching zero or more chars other than ".
See the PHP demo:
$re = '~https:(?://|(?:\\\\u002F){2})site\.vid\.com(?:/|\\\\u002F)[^"]*~';
$str = 'https:\\u002F\\u002Fsite.vid.com\\u002F93836af7-f465-4d2c-9feb-9d8128827d85\\u002F6njx6dp3gi.m3u8?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjb3VudHJ5IjoiSU4iLCJkZXZpY2VfaWQiOiI1NjYxZTY3Zi0yYWE3LTQ1MjUtOGYwYy01ODkwNGQyMjc3ZmYiLCJleHAiOjE2MTA3MjgzNjEsInBsYXRmb3JtIjoiV0VCIiwidXNlcl9pZCI6MH0.c3Xhi58DnxBhy-_I5yC2XMGSWU3UUkz5YgeVL1buHYc","';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches[0]);
// => Array( [0] => https:\u002F\u002Fsite.vid.com\u002F93836af7-f465-4d2c-9feb-9d8128827d85\u002F6njx6dp3gi.m3u8?token=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjb3VudHJ5IjoiSU4iLCJkZXZpY2VfaWQiOiI1NjYxZTY3Zi0yYWE3LTQ1MjUtOGYwYy01ODkwNGQyMjc3ZmYiLCJleHAiOjE2MTA3MjgzNjEsInBsYXRmb3JtIjoiV0VCIiwidXNlcl9pZCI6MH0.c3Xhi58DnxBhy-_I5yC2XMGSWU3UUkz5YgeVL1buHYc )
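If the extracted URL is then needed in its usual form, the \u002F escapes can be turned back into slashes after matching. The following is only a sketch: the $html fragment, its shortened path and the token value are made up for illustration, and it assumes \u002F is the only escape present in the URL.

```php
<?php
// Hypothetical page fragment; the real page content is an assumption.
$html = '{"src":"https:\u002F\u002Fsite.vid.com\u002Fabc\u002Fplaylist.m3u8?token=xyz","type":"hls"}';

// Same pattern as in the answer: \\\\ in a single-quoted PHP string
// becomes \\ in the regex, which matches one literal backslash.
$re = '~https:(?://|(?:\\\\u002F){2})site\.vid\.com(?:/|\\\\u002F)[^"]*~';
preg_match_all($re, $html, $matches);

// Replace each literal \u002F sequence with a plain slash.
$urls = array_map(function ($raw) {
    return str_replace('\u002F', '/', $raw);
}, $matches[0]);

print_r($urls);
// Array ( [0] => https://site.vid.com/abc/playlist.m3u8?token=xyz )
```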
Related
How can I use preg_match_all() to get 1a1a-1a1a and 2B2B2-B2 from the following string:
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';
My aim is to capture every # followed by a uuid.
I tried:
preg_match_all("/#(.*)/", $string, $matches);
preg_match_all("/#.*?/U", $string, $matches);
preg_match_all("/#([^\"]+)/si", $a, $matches);
but none of them works.
Use the /(?<=#)[\w-]+/ pattern, which matches any run of word or - characters that follows a #:
preg_match_all("/(?<=#)[\w-]+/", $string, $matches);
print_r($matches[0]);
Output
Array
(
[0] => 1a1a-1a1a
[1] => 2B2B2-B2
)
Check result in demo
The #(.*) regex matches a # and then greedily any 0 or more chars other than line break chars (i.e. the rest of the line). /#.*?/U is a synonymous pattern: the U flag inverts greediness, so it is equal to /#.*/, except that the text after # is not captured into a group. #([^\"]+) matches # and captures into Group 1 one or more chars other than ", so it matches up to the first " or to the end of the string if there is no ".
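To see the behaviour described above, the three attempts can be run side by side. This is a small sketch; it uses $string for all three calls (the third attempt in the question used $a) and single-quoted patterns so that no escaping of " is needed.

```php
<?php
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';

// Attempt 1: greedy .* runs to the end of the line, so there is a single match.
preg_match_all('/#(.*)/', $string, $m1);

// Attempt 2: the U flag inverts greediness, so .*? behaves like a greedy .*
preg_match_all('/#.*?/U', $string, $m2);

// Attempt 3: [^"]+ stops only at a double quote, and the string contains none.
preg_match_all('/#([^"]+)/si', $string, $m3);

print_r($m1[1]); // Array ( [0] => 1a1a-1a1a and #2B2B2-B2 too )
print_r($m2[0]); // Array ( [0] => #1a1a-1a1a and #2B2B2-B2 too )
print_r($m3[1]); // Array ( [0] => 1a1a-1a1a and #2B2B2-B2 too )
```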
I suggest using
preg_match_all('~#\K[\w-]+~', $s, $matches)
See the regex demo. #\K[\w-]+ will match # and \K will remove it from the match, and [\w-]+ will match 1 or more word or - chars that will be returned.
To make the pattern a bit more restrictive, say, to only match letters or digits after # that can be hyphen separated, you may use
'~#\K[A-Z0-9]+(?:-[A-Z0-9]+)*~i'
See this regex demo. Here, [A-Z0-9]+ matches 1 or more alphanumeric chars and (?:-[A-Z0-9]+)* will match 0 or more repetitions of a - followed with 1+ alphanumeric chars. i modifier will make the pattern case insensitive.
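The difference between the two \K patterns shows up once the input contains an underscore or a trailing hyphen. The sketch below extends the sample string with two made-up tags, #foo_bar and #x-, purely for illustration.

```php
<?php
// The last two tags are hypothetical additions to the original sample.
$s = 'Hello #1a1a-1a1a and #2B2B2-B2 too, plus #foo_bar and #x-';

// Loose: any run of word chars or hyphens after #.
preg_match_all('~#\K[\w-]+~', $s, $loose);

// Restrictive: alphanumeric chunks separated by single hyphens.
preg_match_all('~#\K[A-Z0-9]+(?:-[A-Z0-9]+)*~i', $s, $strict);

print_r($loose[0]);  // 1a1a-1a1a, 2B2B2-B2, foo_bar, x-
print_r($strict[0]); // 1a1a-1a1a, 2B2B2-B2, foo, x
```

The restrictive pattern stops at the underscore and does not include a hyphen that is not followed by another alphanumeric chunk.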
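Your regexes are matching:
#(.*) Matches # and captures in a group any character 0+ times, greedily, including spaces, so it matches everything after the first # in your example
#.*? Matches # followed by any character 0+ times, non-greedily, which on its own matches only the #. With the U flag from your attempt the greediness is inverted, so it matches to the end of the line instead
#([^\"]+) Matches # and captures in a group one or more characters that are not a ", which again matches everything after the first # in your example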
To capture every # followed by a uuid, you could use a character class to list what you would allow to match and repeat that pattern preceded by a dash in a non capturing group 1+ times.
If you want to match the uuid only, you could capture the values in a capturing group.
#([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)+)
$string = 'Hello #1a1a-1a1a and #2B2B2-B2 too';
preg_match_all("/#([a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)+)/", $string, $matches);
print_r($matches[1]);
Result
Array
(
[0] => 1a1a-1a1a
[1] => 2B2B2-B2
)
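Try this; it captures the run of word characters and hyphens after '#', no matter how many characters there are. (A plain \w would stop at the hyphen, so - is added to the class.)
preg_match_all("/#([\w-]+)/", $string, $matches);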
I am trying to write a RegEx for preg_match_all in php to match a string inside 2 $ symbols, like $abc$ but only if it doesn't have a space, for example, I don't need to match $ab c$.
I wrote this regex /[\$]\S(.*)[\$]/U and some variations but can't get it to work.
Thanks for your help guys.
Overview
Your regex: [\$]\S(.*)[\$]
[\$] - No point in escaping $ inside [] because it's already interpreted as the literal character. No point putting \$ inside [] because \$ is the escaped version. Just use one or the other [$] or \$.
\S(.*) Matches any non-whitespace character (once), followed by any character (except \n) any number of times
Code
See regex in use here
\$\S+\$
\$ Match $ literally
\S+ Match any non-whitespace character one or more times
\$ Match $ literally
Usage
$re = '/\$\S+\$/';
$str = '$abc$
$ab c$';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
I think this will suit your needs.
https://regex101.com/r/WgUwh9/1
\$([a-zA-Z]*)\$
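It will match any run of ASCII letters (a-z, A-Z) of any length, without spaces, between two $ signs. Note that * also allows an empty run, so $$ matches too; use + instead to require at least one letter.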
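The two suggested patterns behave differently on edge cases. The sketch below runs both on a made-up test string (the $$ and $a1$ items are assumptions added for illustration, not from the question).

```php
<?php
// Hypothetical sample covering a space, an empty pair and a digit.
$str = '$abc$ $ab c$ $$ $a1$';

// \S+ needs at least one non-whitespace char between the two $ signs.
preg_match_all('/\$\S+\$/', $str, $nonSpace);

// [a-zA-Z]* accepts letters only, and also an empty run.
preg_match_all('/\$([a-zA-Z]*)\$/', $str, $lettersOnly);

print_r($nonSpace[0]);    // $abc$, $a1$
print_r($lettersOnly[0]); // $abc$, $$
```

Neither pattern matches $ab c$, which is the requirement from the question; they differ only in whether digits and an empty pair are accepted.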
I need to get all matches in a string where a word begins with # and then contains only alphanumeric 0-9a-z characters. For example, from the string #ww#ee x##vx #ss #aa assadd #sfsd I need to get these pieces:
#ss
#aa
#sfsd
I am trying:
$str = "#ww#ee x##vx #ss #aa assadd #sfsd";
preg_match_all("#(^|\s)\#([0-9a-z]+)(\s+|$)#ui", $str, $matches);
var_dump( $matches );
But this gives only #ss and #sfsd, and skips #aa.
What would be right pattern for this?
You can use the following regex
'~\B(?<!#)#([0-9a-z]+)(?:\s|$)~iu'
See the regex demo and here is an IDEONE demo:
$re = '~\B(?<!#)#([0-9a-z]+)(?:\s|$)~ui';
$str = "#ww#ee x##vx #ss #aa assadd #sfsd";
preg_match_all($re, $str, $matches);
print_r($matches);
The regex explanation:
\B - match the non-word boundary location (that is, everywhere except between ^ and \w, \w and $, \W and \w, \w and \W)
(?<!#) - fail the match if there is a # before the current location
# - a # symbol (does not have to be escaped)
([0-9a-z]+) - Group 1 (since the (...) are not escaped, they capture a subpattern and store it in a special memory slot)
(?:\s|$) - a non-capturing group (only meant to group alternatives) matching a whitespace (\s) or $.
The ~ui modifiers allow proper handling of Unicode strings (u) and make the pattern case insensitive (i).
Note that \B forces a non-word character (or the start of the string) to appear before #. But you do not want to match if another # precedes the #wwww-like string. Thus, we have to use the negative lookbehind (?<!#), which restricts the matches even further.
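A side-by-side run makes the difference visible. In this sketch the question's pattern is rewritten with ~ delimiters (so # needs no escaping); it skips #aa because its trailing (\s+|$) consumes the space that the next match would need for its leading (^|\s).

```php
<?php
$str = '#ww#ee x##vx #ss #aa assadd #sfsd';

// Pattern from the question, with ~ as the delimiter.
preg_match_all('~(^|\s)#([0-9a-z]+)(\s+|$)~ui', $str, $before);

// Pattern from the answer: \B and the lookbehind consume no characters,
// so the space after #ss does not have to be matched twice.
preg_match_all('~\B(?<!#)#([0-9a-z]+)(?:\s|$)~ui', $str, $after);

print_r($before[2]); // ss, sfsd
print_r($after[1]);  // ss, aa, sfsd
```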
How can I extract hashtags from a non-ASCII string using regex?
For example:
$str = '#Hello #سلام #hello-again #سلام_دوباره #hello_again';
I don't want hashtags that contain bad characters like ! # $ % ^ ♫ ► to be accepted.
I tried this, but it accepts bad characters:
preg_match_all('/#([^\s]+)/', $str, $matches);
It accepts #►☻
You may use the following regex:
'/#([\w-]+)/u'
See regex demo. The /u modifier will allow handling of Unicode symbols, and \w will match Unicode letters.
The regex breakdown:
# - a # symbol
([\w-]+) - 1 or more characters that are either letters, numbers, underscores or hyphens.
See the IDEONE demo
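As a quick check of the pattern, the sketch below runs it on the sample string with the unwanted #►☻ tag appended. It assumes the source file is saved as UTF-8 and that PHP's PCRE build has Unicode support, which the u modifier relies on.

```php
<?php
// Sample from the question plus the tag that should be rejected.
$str = '#Hello #سلام #hello-again #سلام_دوباره #hello_again #►☻';

// With the u modifier, \w also covers non-ASCII letters and digits.
preg_match_all('/#([\w-]+)/u', $str, $matches);

// Five hashtags are captured; #►☻ yields nothing because
// neither symbol is a word character or a hyphen.
print_r($matches[1]);
```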
In PHP I am trying to complete a simple task of pulling some information from a string using preg_match_all
I have a string like this for example 0(a)1(b)2(c)3(d)4(e)5(f)
and I am trying to return all the contents inside of each () BUT having respect for the fact that escaped parenthesis might exist inside of these.
I have tried multiple combinations, but I just can't get any regular expression to handle something like 4(here are some escaped parens\(\) more text) and return here are some escaped parens\(\) more text rather than here are some escaped parens\(\)
I have a regular expression that works, but not with escaped parenthesis
[0-9]*\(([^ESCAPED PARENTHESIS])*?\)
Can someone give me an idea on how to accomplish this?
You can use a negative lookbehind so that the regex engine only matches a closing parenthesis that is not preceded by a backslash:
\((.+?)(?<!\\)\)
See Demo https://regex101.com/r/oU9sF2/1
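In PHP the lookbehind pattern could be used roughly as follows. This is a sketch: the sample string is adapted from the question, and the backslash inside the lookbehind has to be written as \\\\ in a single-quoted string.

```php
<?php
// Single quotes keep \( and \) as literal backslash + parenthesis.
$str = '0(a)1(b)4(here are some escaped parens\(\) more text)5(f)';

// \\\\ in the PHP string becomes \\ in the regex: one literal backslash.
preg_match_all('~\((.+?)(?<!\\\\)\)~', $str, $matches);

print_r($matches[1]);
// a, b, "here are some escaped parens\(\) more text", f
```

One limitation to keep in mind: a closing parenthesis that follows an escaped backslash (\\ then a parenthesis) would still be treated as escaped by this lookbehind.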
You can use this regex to match your text:
// In a single-quoted PHP string, \\\\ is needed to give the regex one literal backslash.
preg_match_all('/(?<!\\\\)\((.*?)(?<!\\\\)\)/', $str, $matches);
print_r($matches[1]);
You can use this pattern:
$pattern = <<<'EOD'
~[0-9]+\([^)\\]*+(?s:\\.[^)\\]*)*+\)~
EOD;
The idea is to match all characters up to a closing parenthesis or a backslash. When a backslash is reached, the next character is matched too, followed again by all characters that are not a closing parenthesis or a backslash, and so on until the closing parenthesis is reached.
Note: possessive quantifiers *+ are only here to limit the backtracking when there is no closing parenthesis.
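A short run of the nowdoc pattern, as a sketch: the first sample string is adapted from the question, and the second one ($unclosed) is a made-up input without a closing parenthesis, which the pattern rejects without backtracking through every position.

```php
<?php
// Nowdoc: backslashes are taken literally, so \\ reaches the regex as written.
$pattern = <<<'EOD'
~[0-9]+\([^)\\]*+(?s:\\.[^)\\]*)*+\)~
EOD;

$str = '0(a)1(b)4(here are some escaped parens\(\) more text)5(f)';
preg_match_all($pattern, $str, $matches);
print_r($matches[0]);
// 0(a), 1(b), "4(here are some escaped parens\(\) more text)", 5(f)

// Hypothetical input whose only ) is escaped: no match is found.
$unclosed = '7(unclosed \) text';
$count = preg_match_all($pattern, $unclosed, $none);
var_dump($count); // int(0)
```

Note that this pattern has no capturing group, so the results include the leading digits and the parentheses.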
Here is a working regex:
[0-9]*\(([^()\\]*(?:\\.[^()\\]*?)*)\)
See regex demo
See IDEONE demo:
$re = '~[0-9]*\(([^()\\\\]*(?:\\\\.[^()\\\\]*?)*)\)~s';
$str = "0(a)1(b)2(c)3(d)4(here are some escaped parens\(\) more text)5(f)";
preg_match_all($re, $str, $matches);
print_r($matches[1]);
Regex breakdown:
[0-9]* - matches 0 or more digits
\( - matches a literal (
([^()\\]*(?:\\.[^()\\]*?)*) - matches and captures
[^()\\]* - 0 or more symbols other than \, ( and )
(?:\\.[^()\\]*?)* - matches 0 or more sequences of...
\\. - a backslash and the character it escapes (any character, thanks to the s modifier), followed by
[^()\\]*? - as few as possible characters other than \, ( and )
\) - matches a literal )