php regexp how to capture substring to variable


In PHP, suppose I have a string:
$string = 'gardens, countryside #teddy135';
How do I capture the #username from that string into a new variable? The username begins with #, is preceded by a space, and ends at a space or at the end of the string.
I want to end up with:
$string = 'gardens, countryside';
$username = '#teddy135';

Use the following regex:
\s#(\w+)\b
\s: Matches one whitespace character
#: Matches # literally
(\w+): Matches one or more word characters (letters, digits, and _) and puts them in the first capturing group
\b: Word boundary
Code:
$re = "/\\s#(\\w+)\\b/";
$str = "gardens, countryside #teddy135 #tushar and #abc";
preg_match_all($re, $str, $matches);
// $matches[1] holds the usernames without the leading #: teddy135, tushar, abc

$regex = "/\s(#\S+)/";
$mystr = "gardens, countryside #teddy135 #xyz-12 and #abc.abc";
preg_match_all($regex, $mystr, $matches);
print_r($matches);
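To end up with both variables from the question, a minimal sketch could first capture the tag and then strip it from the original string. The pattern below is an adaptation of the answers above, not one of them verbatim: it keeps the # inside the capture group and uses the lookahead (?=\s|$) so the tag must end at whitespace or at the end of the string. It assumes only the first tag is wanted.

```php
<?php
// Input from the question.
$string = 'gardens, countryside #teddy135';

// \s        one whitespace character before the tag
// (#\w+)    the '#' plus the word characters that follow, captured as group 1
// (?=\s|$)  the tag must be followed by whitespace or the end of the string
$pattern = '/\s(#\w+)(?=\s|$)/';

$username = null;
if (preg_match($pattern, $string, $m)) {
    $username = $m[1];
    // Remove the first matched tag (and its leading space) from the string.
    $string = preg_replace($pattern, '', $string, 1);
}

echo $string, "\n";   // gardens, countryside
echo $username, "\n"; // #teddy135
```

If the string contains no tag, $username stays null and $string is left unchanged.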

Related

PHP Regular Expression Exclusion

Here is the sample PHP code:
<?php
$str = '10,000.1 $100,000.1';
$pattern = '/(?!\$)\d+(,\d{3})*\.?\d*/';
$replacement_str = 'Without$sign';
echo preg_replace($pattern, $replacement_str, $str);?>
The goal is to replace only the numbers that have no dollar sign (i.e. "$100,000.1" should not be replaced). But the above code replaces both 10,000.1 and the 100,000.1 in $100,000.1. How can I achieve the exclusion?

The assertion in (?!\$)\d+ is always true: the lookahead is checked at the position where \d+ must match a digit, and a digit can never be a $.
Because the . and the trailing digits are optional, the pattern can also match text ending in a dot, for example 0,000.
Instead, you can assert a whitespace boundary to the left and optionally match a dot followed by one or more digits:
(?<!\S)\d+(?:,\d{3})*(?:\.\d+)?\b
Example:
$str = '10,000.1 $100,000.1';
$pattern = '/(?<!\S)\d+(?:,\d{3})*(?:\.\d+)?\b/';
$replacement_str = 'Without$sign';
echo preg_replace($pattern, $replacement_str, $str);
Output (if the goal is to remove the numbers, the replacement text "Without$sign" is not what you want):
Without$sign $100,000.1
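To see the difference between the two patterns side by side, the following sketch collects what each one matches in the sample string. The variable names $original, $fixed, $a and $b are only illustrative.

```php
<?php
$str = '10,000.1 $100,000.1';

// Pattern from the question: (?!\$) is tested at the same position where \d+
// must match, so it can never reject anything.
$original = '/(?!\$)\d+(,\d{3})*\.?\d*/';

// Pattern from the answer: (?<!\S) requires the start of the string or
// whitespace immediately to the left of the number.
$fixed = '/(?<!\S)\d+(?:,\d{3})*(?:\.\d+)?\b/';

preg_match_all($original, $str, $a);
preg_match_all($fixed, $str, $b);

print_r($a[0]); // 10,000.1 and 100,000.1
print_r($b[0]); // 10,000.1 only
```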

Split and catch text by a variable delimiter

I have a text which includes delimiter tags in the following format:
<\!--[od]+-\d+--\>
Example:
<!--od-14-->
<!--od-1-->
<!--od-65-->
I need a regex which will split the text and capture the \d+ numeric argument of each delimiter, as well as the text after it.
Here's the regex I came up with; the problem is that it does not return text spanning multiple lines.
https://regex101.com/r/xvw8Xw/2

One option is to make the dot match a newline, for example with the inline modifier (?s). Then use a non-greedy match with a positive lookahead to assert the next comment or the end of the string:
(?s)<\!--[od]+-(\d+)-->(.*?)(?=<!--|$)
(?s) Inline modifier, makes the dot match a newline
<\!-- Match <!--
[od]+-(\d+)--> Match o or d one or more times (which might just be od), then -, capture one or more digits in group 1, and match -->
(.*?) Match any character, including a newline, zero or more times, non-greedy, and capture it in group 2
(?=<!--|$) Positive lookahead, assert that what is on the right is <!-- or the end of the string
For example, using the /s flag in the pattern:
$re = '/<\!--[od]+-(\d+)-->(.*?)(?=<!--|$)/s';
$str = '<!--od-1--> cdskc sdkjc
dsd
sk<!--od-2-->cscdscsdcsd
cdscs
csdcsdc
<!--od-432-->cdcdscsd';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);
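With PREG_SET_ORDER each entry of $matches holds one delimiter and its text, so the result can be turned into an id => text map. The following is a minimal sketch of that step; the sample text, the $sections name, and the decision to trim each block are assumptions, and the pattern is the one above without the redundant backslash before !.

```php
<?php
$re = '/<!--[od]+-(\d+)-->(.*?)(?=<!--|$)/s';
$str = "<!--od-1--> first\nblock<!--od-2-->second\nblock\n<!--od-432-->third";

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

$sections = [];
foreach ($matches as $m) {
    // $m[1] is the numeric argument, $m[2] the text that follows the delimiter.
    $sections[(int) $m[1]] = trim($m[2]);
}

print_r($sections);
```

Note that if the same number appears in two delimiters, the later block overwrites the earlier one in this map.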

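This expression might also work here without the s flag, because [\s\S] already matches newlines:
<!--od-(\d+)--\>([\s\S]*?)(?=<|$)
Note that in m mode $ also matches at the end of every line, so the lazy capture would stop at the first line break.
Or this one in s mode:
<!--od-(\d+)--\>(.*?)(?=<|$)
Both stop at the next < character, so they assume the text between delimiters contains no other <.
Test: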
$re = '/<!--od-(\d+)--\>(.*?)(?=<|$)/s';
$str = '<!--od-1--> cdskc sdkjc
dsd
sk<!--od-2-->cscdscsdcsd
cdscs
csdcsdc
<!--od-432-->cdcdscsd';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
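Since the question asks about splitting, another option is preg_split with the PREG_SPLIT_DELIM_CAPTURE flag. This is a different technique from the answers above, shown here only as a sketch. The sample text and the $parts and $pairs names are assumptions, and any text before the first delimiter is discarded.

```php
<?php
$str = "<!--od-1--> first\nblock<!--od-2-->second<!--od-432-->third";

// PREG_SPLIT_DELIM_CAPTURE also returns what group 1 of the delimiter captured,
// so the result alternates between text pieces and numeric arguments.
$parts = preg_split('/<!--od-(\d+)-->/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// The first element is the text before the first delimiter (empty here).
array_shift($parts);

// Pair each numeric argument with the text that follows it.
$pairs = array_chunk($parts, 2);

print_r($pairs);
```

Because empty pieces are kept, a delimiter that is followed directly by another delimiter still gets its own pair with an empty text.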

Matching all of a certain character after a Positive Lookbehind

I have been trying to get the regex right for this all morning and I have hit a wall. In the following string I want to match every forward slash which follows .com/<first_word>, with the exception of any / after the URL.
$string = "http://example.com/foo/12/jacket Input/Output";
match------------------------^--^
The length of the words between slashes should not matter.
Regex: (?<=.com\/\w)(\/) results:
$string = "http://example.com/foo/12/jacket Input/Output"; // no match
$string = "http://example.com/f/12/jacket Input/Output";
matches--------------------^
Regex: (?<=\/\w)(\/) results:
$string = "http://example.com/foo/20/jacket Input/O/utput"; // misses the /'s in the URL
matches----------------------------------------^
$string = "http://example.com/f/2/jacket Input/O/utput"; // don't want the match between Input/Output
matches--------------------^-^--------------^
Because a lookbehind cannot contain variable-length quantifiers (it must be fixed-length), I am wondering whether I have gone down the wrong path and should look for another regex combination.
Is the positive lookbehind the right way to do this? Or am I missing something other than copious amounts of coffee?
NOTE: tagged with PHP because the regex should work in any of the preg_* functions.

If you want to use preg_replace then this regex should work:
$re = '~(?:^.*?\.com/|(?<!^)\G)[^/\h]*\K/~';
$str = "http://example.com/foo/12/jacket Input/Output";
echo preg_replace($re, '|', $str);
//=> http://example.com/foo|12|jacket Input/Output
This replaces each / that follows the first path segment after the leading .com/ with a |.
The negative lookbehind (?<!^) is needed to avoid replacing anything in a string that does not start with a .com URL, such as /foo/bar/baz/abcd.

Use \K here along with \G, then grab the groups.
^.*?\.com\/\w+\K|\G(\/)\w+\K
See demo.
https://regex101.com/r/aT3kG2/6
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
preg_match_all($re, $str, $matches);
To replace:
$re = "/^.*?\\.com\\/\\w+\\K|\\G(\\/)\\w+\\K/m";
$str = "http://example.com/foo/12/jacket Input/Output";
$subst = "|";
$result = preg_replace($re, $subst, $str);

Another idea based on \G and \K:
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';
The non-capturing group (?: ... ) either sets the entry point with ^\S+\.com/\w or glues the match to the end of the previous one with \G(?!^).
\w*+\K/ possessively matches any number of word characters up to a slash; \K resets the start of the reported match, so only the slash is matched.
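The last pattern comes without a full example, so here is a minimal sketch that applies it with preg_replace. The second input, taken from the earlier remark about strings that do not start with a .com URL, is included to check that such strings are left alone. The $inputs and $results names are only illustrative.

```php
<?php
$re = '~(?:^\S+\.com/\w|\G(?!^))\w*+\K/~';

$inputs = [
    'http://example.com/foo/12/jacket Input/Output',
    '/foo/bar/baz/abcd',
];

$results = [];
foreach ($inputs as $input) {
    // Only the slash after \K is part of the match, so only it is replaced.
    $results[] = preg_replace($re, '|', $input);
}

print_r($results);
// [0] => http://example.com/foo|12|jacket Input/Output
// [1] => /foo/bar/baz/abcd
```

The chain of \G matches breaks at the space after "jacket", which is why the slash in Input/Output is not touched.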

Replace all the first character of words in a string using preg_replace()

I have the following string:
This is a sample text. This text will be used as a dummy for "various" RegEx "operations" using PHP.
I want to select and replace the first character of each word (in the example: T,i,a,s,t,T,t,w,b,u,a,a,d,f,",R,",u,P). How do I do it?
I tried /\b.{1}\w+\b/. I read the expression as "select any character that has a length of 1, followed by a word of any length", but it didn't work.

You may try this regex as well:
(?<=\s|^)([a-zA-Z"])

Your regex, /\b.{1}\w+\b/, finds a word boundary, then matches any single character at that position (which can even be whitespace if a letter, digit, or underscore precedes it), followed by one or more word characters (\w) up to the next word boundary.
That \b. is the culprit here.
If you plan to match any non-whitespace character that is at the start of the string or preceded by whitespace, you can just use
/(?<!\S)\S/
Or
/(?<=^|\s)\S/
Then, replace with any symbol you need.
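As a short illustration of the /(?<!\S)\S/ pattern, the sketch below collects the first characters and then masks them with an underscore. The shortened sample string and the choice of _ as the replacement are assumptions made for the example.

```php
<?php
$string = 'a dummy for "various" RegEx';

// (?<!\S) means "not preceded by a non-whitespace character", i.e. the start
// of the string or right after whitespace; \S is the first character of a word.
$pattern = '/(?<!\S)\S/';

preg_match_all($pattern, $string, $m);
$initials = implode('', $m[0]);

$masked = preg_replace($pattern, '_', $string);

echo $initials, "\n"; // adf"R
echo $masked, "\n";   // _ _ummy _or _various" _egEx
```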

You may try the following regex:
(.)[^\s]*\s?
Use preg_match_all and implode group 1 of the result:
<?php
// Note the trailing space after "for", so the two parts join with a space.
$string = 'This is a sample text. This text will be used as a dummy for '
. '"various" RegEx "operations" using PHP.';
$pattern = '/(.)[^\s]*\s?/';
$matches = [];
preg_match_all($pattern, $string, $matches);
$output = implode('', $matches[1]);
echo $output; //Output is TiastTtwbuaadf"R"uP
To replace, use something like preg_replace_callback:
$pattern = '/(.)([^\s]*\s?)/';
$output2 = preg_replace_callback($pattern,
function($match) { return '_' . $match[2]; }, $string);
//result: _his _s _ _ample _ext. _his _ext _ill _e _sed _s _ _ummy _or _various" _egEx _operations" _sing _HP.

change line to bold, if it ends with a colon, and only contains one word

How do I make a word bold if it is the only word on a line and is followed by a colon?
The data comes from a text field in a MySQL database, and the code is PHP.

You can capture the word and substitute it wrapped in <b> tags:
^(\w+):$
Sample code:
$re = "/^(\\w+):$/m";
$str = "abc:\nabc\nabc:xyz\n";
$subst = '<b>$1</b>';
$result = preg_replace($re, $subst, $str);
Pattern explanation (with the m flag):
^ the beginning of a line
( group and capture to \1:
\w+ word characters (a-z, A-Z, 0-9, _) (1 or more times)
) end of \1
: ':'
$ the end of a line (before a \n, or at the end of the string)

Use this:
$replaced = preg_replace('~^\w+:$~m', '<b>$0</b>', $yourstring);
Explanation
The m flag makes ^ and $ match at the start and end of each line rather than only at the start and end of the whole string
The ^ anchor asserts that we are at the beginning of a line
The \w+ matches one or more word chars
: matches the colon
The $ anchor asserts that we are at the end of a line
We replace with <b>, the overall match (referenced by $0) and </b>
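Text submitted through a web form often uses \r\n line endings, in which case a plain :$ may not match because a \r sits between the colon and the \n. The sketch below is a variation on the answers above that tolerates an optional \r. The sample text is hypothetical, and keeping the colon outside the <b> tag is a choice made here, not something the question specifies.

```php
<?php
$text = "Ingredients:\r\nflour and sugar\r\nNote: keep dry\r\nSteps:";

// ^(\w+):     a single word at the start of a line, followed by a colon
// (?=\r?$)    the colon must be the last thing on the line (optionally before \r)
// m flag      ^ and $ work per line instead of per string
$pattern = '/^(\w+):(?=\r?$)/m';

$html = preg_replace($pattern, '<b>$1</b>:', $text);

echo $html;
```

Here "Ingredients:" and "Steps:" are wrapped, while "Note: keep dry" is left alone because the colon is not at the end of the line.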
