Is start ^ and end $ required for multiple regex patterns - php

If given [name=anystring] or #anystring where anystring is a string which has already had any whitespace removed, I wish to return anystring.
Before attempting both, I successfully performed them individually.
$pattern = "/^#(.+)$/";
preg_match($pattern, '#anystring', $matches);
preg_match($pattern, '[name=anystring]', $matches);
$pattern = "/^\\[name=(.+)\\]$/";
preg_match($pattern, '#anystring', $matches);
preg_match($pattern, '[name=anystring]', $matches);
And then I tried to combine them.
# with start ^ and end $ on both
$pattern = "/^#(.+)$|^\\[name=(.+)\\]$/";
preg_match($pattern, '#anystring', $matches);
preg_match($pattern, '[name=anystring]', $matches);
# with start ^ only on the first pattern and end $ only on the second
$pattern = "/^#(.+)|\\[name=(.+)\\]$/";
preg_match($pattern, '#anystring', $matches);
preg_match($pattern, '[name=anystring]', $matches);
While I "kind of" get what I am looking for, the second pattern [name=(.+)] returns an array with three elements.
Should I have an end $ after the first pattern and a start ^ before the second pattern? Could leaving them out cause the second pattern to return an array with three elements?
EDIT: This shows how one version returns more array elements than the other.
<?php
$pattern = "/^(?:#(.+)|\\[name=(.+)\\])$/s";
preg_match($pattern, '#anystring', $matches);
print_r($matches);
preg_match($pattern, '[name=anystring]', $matches);
print_r($matches);
Array
(
[0] => #anystring
[1] => anystring
)
Array
(
[0] => [name=anystring]
[1] =>
[2] => anystring
)

You are looking for a branch reset group, in which every alternative starts numbering its capturing groups from the same ID, so both alternatives below capture into Group 1:
^(?|#(.+)|\[name=(.+)])$
  ^^
See the regex demo
Details
^ - start of string
(?| - start of the branch reset group
#(.+) - a # and then Group 1 capturing 1+ chars other than line break chars, as many as possible
| - or
\[name= - a [name= substring
(.+) - Group 1 (again) matching 1+ chars other than line break chars, as many as possible
] - a ]
) - end of the branch reset group
$ - end of string.
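To tie this back to the question, here is a minimal sketch that wraps the branch reset pattern in a function. The function name extractSelectorName is made up for illustration. Without the branch reset, the [name=...] alternative captures into group 2 and group 1 is reported as an empty string, which is why the question saw three array elements.

```php
<?php
// Hypothetical helper name; the pattern is the branch reset regex shown above.
function extractSelectorName(string $input): ?string
{
    // (?| ... ) lets both alternatives write their capture into group 1.
    $pattern = '/^(?|#(.+)|\[name=(.+)])$/';

    // preg_match() returns 1 on a match, 0 on no match and false on error.
    if (preg_match($pattern, $input, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

var_dump(extractSelectorName('#anystring'));       // "anystring"
var_dump(extractSelectorName('[name=anystring]')); // "anystring"
var_dump(extractSelectorName('anystring'));        // NULL
```

Either input shape now yields a two-element $matches array, so the caller can always read index 1.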

You can combine 2 regexes using a non capturing group:
(?:pattern1|pattern2)
I wrote this regex which will capture on both strings:
(?:\[\w+=(?<bracketword>\w+)\]|\#(?<word>\w+))
Your match array will have a non-empty value under either the bracketword key or the word key.
Check it out on the regex101 link below.
https://regex101.com/r/AmgHTS/1/
You can also use the start and end anchors ^ and $ if you like. In my edited regex, the test string has two lines (one for each string), so I had to use the multiline flag too.
https://regex101.com/r/AmgHTS/2/
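A minimal sketch of reading the named groups in PHP follows. The function name extractWord is hypothetical, and the pattern is anchored with ^ and $ as an assumption about the intended use. Note that \w+ is stricter than the .+ in the question, so values containing non-word characters will not match.

```php
<?php
// Hypothetical helper; the named groups come from the regex above,
// anchored here so that the whole input has to match.
function extractWord(string $input): ?string
{
    $pattern = '/^(?:\[\w+=(?<bracketword>\w+)\]|#(?<word>\w+))$/';

    if (preg_match($pattern, $input, $m) !== 1) {
        return null;
    }

    // A group that did not take part in the match is either missing
    // from $m or an empty string, so look for a non-empty value.
    foreach (['bracketword', 'word'] as $name) {
        if (($m[$name] ?? '') !== '') {
            return $m[$name];
        }
    }

    return null;
}

var_dump(extractWord('[name=anystring]')); // "anystring"
var_dump(extractWord('#anystring'));       // "anystring"
```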

To capture only the string itself in both cases, use a lookbehind like this:
(?<=#|name=)([^\[#\]]+)
https://regex101.com/r/AmgHTS/4/
For more, check:
https://regex101.com/r/AmgHTS/5
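As a rough sketch, the lookbehind pattern can be used from PHP like this. The function name lookbehindExtract is an assumption for illustration. The pattern is not anchored, so it also finds a value in the middle of a longer string, which may or may not be what you want.

```php
<?php
// Hypothetical helper around the lookbehind pattern shown above.
function lookbehindExtract(string $input): ?string
{
    // The lookbehind requires "#" or "name=" directly before the match,
    // without making it part of the matched text.
    $pattern = '/(?<=#|name=)([^\[#\]]+)/';

    return preg_match($pattern, $input, $m) === 1 ? $m[1] : null;
}

var_dump(lookbehindExtract('#anystring'));       // "anystring"
var_dump(lookbehindExtract('[name=anystring]')); // "anystring"
var_dump(lookbehindExtract('x#y'));              // "y", since the pattern is unanchored
```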

Related

Extract Key from URL with Preg_Match in PHP

I have this URL:
https://test.com/file/5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
I need to produce this result with preg_match:
$match[0]=5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
$match[1]=5gdxyYpb
$match[2]=_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
I tried different patterns; the closest was this one: e\/(.*?)\#(.*)
Any recommendation is welcome (with preg_match, if necessary).
Thank you,
You might use 2 capturing groups and make use of \K to not match the first part of the url to get the desired matches.
https?://.*/\K([^#\s]+)#(\S+)
https?:// Match the protocol with optional s, then ://
.*/ Match until the last occurrence of /
\K Forget what is matched until here
([^#\s]+) Capture group 1, match 1+ occurrences of any char except a # or whitespace char
# Match the #
(\S+) Capture group 2, match 1+ occurrences of a non whitespace char
Regex demo | Php demo
$url = "https://test.com/file/5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE";
$pattern = '~https?://.*/\K([^#\s]+)#(\S+)~';
$res = preg_match($pattern, $url, $matches);
print_r($matches);
Output
Array
(
[0] => 5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
[1] => 5gdxyYpb
[2] => _FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE
)
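If a regex turns out not to be strictly required, the same three values can probably be obtained with PHP's built-in parse_url() and basename(). This is only a sketch of that alternative, not part of the answer above, and the function name splitFileUrl is made up for illustration.

```php
<?php
// Hypothetical helper: returns [key#fragment, key, fragment], or null
// when the URL has no path or no fragment.
function splitFileUrl(string $url): ?array
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['path'], $parts['fragment'])) {
        return null;
    }

    // The last path segment is the part between the final "/" and the "#".
    $key = basename($parts['path']);

    return [$key . '#' . $parts['fragment'], $key, $parts['fragment']];
}

$url = 'https://test.com/file/5gdxyYpb#_FWRc4T12baPrppZIwVQ5i18Sq16f7TXU82LJwY_BjE';
print_r(splitFileUrl($url));
```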

regex: select all characters before and after a specific string

I want to select all text before and after a specific substring. I used the following expression to do that, but it is not selecting all the needed text:
/^(?:(?!\<\?php echo[\s?](.*?)\;[\s?]\?\>).)*/
for example:
$re = '/^(?:(?!\<\?php echo[\s?](.*?)\;[\s?]\?\>).)*/';
$str = 'customFields[<?php echo $field["id"]; ?>][type]';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
It selects only the part customFields[, while the expected result is customFields[ and ][type].
Check this link for debugging.
The pattern ^(?:(?!\<\?php echo[\s?](.*?)\;[\s?]\?\>).)* uses a tempered greedy token: starting at the beginning of the string ^, it matches any character except a newline, one at a time, for as long as the negative lookahead succeeds, that is, until the <?php echo ... ?> part starts at the current position.
That will only match customFields[
For your example data you could make use of a tempered greedy token (regex demo), but you could also just use a negated character class together with (*SKIP)(*FAIL):
^[^[]+\[|<\?php echo\s(.*?)\;\s\?\>(*SKIP)(*FAIL)|\]\[[^]]*\]
Regex demo | Php demo
For example
$re = '/^[^[]+\[|<\?php echo\s(.*?)\;\s\?\>(*SKIP)(*FAIL)|\]\[[^]]*\]/';
$str = 'customFields[<?php echo $field["id"]; ?>][type]';
preg_match_all($re, $str, $matches, PREG_SET_ORDER);
print_r($matches);
Result
Array
(
[0] => Array
(
[0] => customFields[
)
[1] => Array
(
[0] => ][type]
)
)
To get a more exact match you might also use capturing groups:
^((?:(?!<\?php echo[\s?](?:.*?)\;\s\?>).)*)<\?php echo\s(?:.*?)\;[\s?]\?>(.*)$
regex demo | Php demo
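A minimal sketch of the capturing-group variant in PHP follows. It uses a slightly simplified form of the pattern above, with the [\s?] classes written as \s, which is an assumption about the intended whitespace. The function name splitAroundEcho is hypothetical.

```php
<?php
// Hypothetical helper: returns the text before and after the echo tag,
// or null when the input does not contain such a tag.
function splitAroundEcho(string $str): ?array
{
    // Group 1: everything up to the echo tag (tempered greedy token).
    // Group 2: everything after the closing tag.
    $re = '/^((?:(?!<\?php echo\s.*?;\s\?>).)*)<\?php echo\s.*?;\s\?>(.*)$/';

    if (preg_match($re, $str, $m) !== 1) {
        return null;
    }

    return [$m[1], $m[2]];
}

$str = 'customFields[<?php echo $field["id"]; ?>][type]';
print_r(splitAroundEcho($str));
```

For the example string this yields customFields[ as the first element and ][type] as the second.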
What about using positive lookarounds:
(.*)(?=\<\?php echo)|(?<=\?\>)(.*)
Demo

Match regex pattern that isn't within a bbcode tag

I am attempting to create a regex pattern that will match words in a string that begin with #.
Regex that solves this initial problem is '~(#\w+)~'
A second requirement of the code is that it must also ignore any matches that occur within [quote] and [/quote] tags
A couple of attempts that have failed are:
(?:[0-9]+|~(#\w+)~)(?![0-9a-z]*\[\/[a-z]+\])
/[quote[\s\]][\s\S]*?\/quote](*SKIP)(*F)|~(#\w+)~/i
Example: the following string should have an array output as displayed:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
//run regex match
preg_match_all('regex', $string, $results);
//dump results
var_dump($results[1]);
//results: array consisting of:
[1]=>"#friends"
[2]=>"#john"
[3]=>"#doe"
You may use the following regex (based on another related question):
'~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s'
See the regex demo. The regex accounts for nested [quote] tags.
Details
(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F) - matches the pattern inside capturing parentheses and then (*SKIP)(*F) make the regex engine omit the matched text:
\[quote] - a literal [quote] string
(?:(?1)|.)*? - any 0+ (but as few as possible) occurrences of the whole Group 1 pattern ((?1)) or any char (.)
\[/quote] - a literal [/quote] string
| - or
#\w+ - a # followed with 1+ word chars.
PHP demo:
$results = [];
$string = "#friends #john [quote]#and #jane[/quote] #doe";
$rx = '~(\[quote\](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';
preg_match_all($rx, $string, $results);
print_r($results[0]);
// => Array ( [0] => #friends [1] => #john [2] => #doe )
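To illustrate the claim about nested tags, here is a small sketch that runs the same regex on a string with one [quote] inside another. The function name tagsOutsideQuotes and the nested test string are made up for this example.

```php
<?php
// Hypothetical helper around the regex from the answer above.
function tagsOutsideQuotes(string $string): array
{
    // Group 1 matches a whole [quote]...[/quote] block, recursing via (?1)
    // for nested blocks; (*SKIP)(*F) then discards that block.
    $rx = '~(\[quote](?:(?1)|.)*?\[/quote])(*SKIP)(*F)|#\w+~s';
    preg_match_all($rx, $string, $results);

    return $results[0];
}

print_r(tagsOutsideQuotes('#friends #john [quote]#and #jane[/quote] #doe'));
print_r(tagsOutsideQuotes('#a [quote]#b [quote]#c[/quote] #d[/quote] #e'));
```

The second call should return only #a and #e, because the inner block is consumed as part of the outer one.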

Words finder regex fails

I'm using this pattern to check if certain words exist in a string:
/\b(apple|ball|cat)\b/i
It works on this string: cat ball apple
but not on this one: no spaces catball smallapple
How can the pattern be modified so that the words match even if they are combined with other words and even if there are no spaces?
Remove \b from the regex. \b will match a word boundary, and you want to match the string that is not a complete word.
You can also remove the capturing group (denoted by ()) as it is not required any longer.
Use
/apple|ball|cat/i
Regex Demo
An IDEONE PHP demo:
$re = "/apple|ball|cat/i";
$str = "no spaces catball smallapple";
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Results:
[0] => cat
[1] => ball
[2] => apple
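The difference between the two patterns can be seen side by side in the sketch below. The function name findWords and its $wholeWordsOnly flag are assumptions made for illustration.

```php
<?php
// Hypothetical helper: finds apple, ball or cat, either as whole words
// only (with \b) or anywhere inside longer words (without \b).
function findWords(string $str, bool $wholeWordsOnly): array
{
    $re = $wholeWordsOnly ? '/\b(?:apple|ball|cat)\b/i' : '/apple|ball|cat/i';
    preg_match_all($re, $str, $matches);

    return $matches[0];
}

$str = 'no spaces catball smallapple';
print_r(findWords($str, true));  // no matches: every word is glued to another one
print_r(findWords($str, false)); // cat, ball, apple
```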

Multiple Hash Tags removal

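function getHashTagsFromString($str) {
    $matches = array();
    $hashtag = array();
    // Collect the text captured after each # into $hashtag
    if (preg_match_all('/#([^\s]+)/', $str, $matches)) {
        for ($i = 0; $i < sizeof($matches[1]); $i++) {
            $hashtag[$i] = $matches[1][$i];
        }
    }
    // Return an empty array when nothing matched
    return $hashtag;
}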
test string $str = "STR
this is a string
with a #tag and
another #hello #hello2 ##hello3 one
STR";
Using the above function I get the tags, but I am not able to remove both # characters from ##hello3. How can I do that using a single regular expression?
Update your regular expression as follows:
/#+(\S+)/
Explanation:
/ - starting delimiter
#+ - match the literal # character one or more times
(\S+) - match (and capture) one or more non-whitespace characters (\S is shorthand for [^\s])
/ - ending delimiter
Regex101 Demo
The output will be as follows:
Array
(
[0] => tag
[1] => hello
[2] => hello2
[3] => hello3
)
Demo
EDIT: To match all the hash tags use:
preg_match_all('/#\S+/', $str, $match);
To remove, instead of preg_match_all you should use preg_replace for replacement.
$repl = preg_replace('/#\S+/', '', $str);
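Putting both parts together, a minimal sketch might look like the following. The function names getHashTags and stripHashTags are hypothetical, and the test string is a plain-string version of the one in the question.

```php
<?php
// Hypothetical helper: returns the tag names without any leading # characters.
function getHashTags(string $str): array
{
    // #+ consumes one or more # characters, (\S+) captures the tag name.
    preg_match_all('/#+(\S+)/', $str, $matches);

    return $matches[1];
}

// Hypothetical helper: removes every hash tag, including its # characters.
function stripHashTags(string $str): string
{
    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace('/#\S+/', '', $str) ?? $str;
}

$str = "this is a string\nwith a #tag and\nanother #hello #hello2 ##hello3 one";
print_r(getHashTags($str)); // tag, hello, hello2, hello3
echo stripHashTags($str), PHP_EOL;
```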
