Match pattern and exclude substrings with preg_match_all

I need to find all the strings placed between START and END, excluding the PADDING substring from each matched string. The best way I've found is:
$r="stuffSTARTthisPADDINGisENDstuffstuffSTARTwhatPADDINGIwantPADDINGtoPADDINGfindENDstuff" ;
preg_match_all('/START(.*?)END/',str_replace('PADDING','',$r),$m);
print(join($m[1]));
> thisiswhatIwanttofind
I want to do this with the smallest code size possible. Is there a shorter way that uses only preg_match_all and no str_replace, and that ideally returns the string directly without joining arrays? I've tried some lookaround expressions, but I can't find the right one.

$r="stuffSTARTthisPADDINGisENDstuffstuffSTARTwhatPADDINGIwantPADDINGtoPADDINGfindENDstuff";
echo preg_replace('/(END.*?START|PADDING|^[^S]*START|END.*$)/', '', $r);
This should return thisiswhatIwanttofind using a single regular expression pattern.
Explanation:
END.*?START # Remove everything from an END up to the next START
PADDING # Remove PADDING
^[^S]*START # Remove everything up to and including the first START (assumes no "S" occurs before it)
END.*$ # Remove the last END and everything after it, up to the end of the string
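A possible variation on the pattern above, as a sketch only: the leading junk is matched with a lazy ^.*?START instead of ^[^S]*START, so a capital "S" before the first START does not stop the match. The helper name stripOutsideMarkers() is made up for illustration, and the s modifier is added on the assumption that the input may span several lines.

```php
<?php
// Hypothetical helper: keeps only the text between START and END,
// with every PADDING removed, using one preg_replace() call.
function stripOutsideMarkers(string $input): string
{
    // Alternatives, tried in order at each position:
    //   END.*?START  everything from an END up to the next START
    //   PADDING      the padding itself
    //   ^.*?START    everything up to and including the first START
    //   END.*$       the last END and the rest of the string
    $pattern = '/END.*?START|PADDING|^.*?START|END.*$/s';
    return preg_replace($pattern, '', $input);
}

$r = "stuffSTARTthisPADDINGisENDstuffstuffSTARTwhatPADDINGIwantPADDINGtoPADDINGfindENDstuff";
echo stripOutsideMarkers($r), "\n"; // thisiswhatIwanttofind

// A prefix containing "S" is handled as well.
echo stripOutsideMarkers("Some SstuffSTARTaPADDINGbENDtail"), "\n"; // ab
```

The order of the alternatives matters: END.*?START has to be tried before END.*$, otherwise the first END would swallow the rest of the string.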

$r="stuffSTARTthisPADDINGisENDstuffstuffSTARTwhatPADDINGIwantPADDINGtoPADDINGfindENDstuff" ;
preg_match_all('/(?:START)(.*?)(?:END)/',str_replace('PADDING','',$r),$m);
var_dump(implode('',$m[1])); // join without a separator to get thisiswhatIwanttofind
This would work, but I guess you want something faster.

You can also use preg_replace_callback like this:
$str = preg_replace_callback('#.*?START(.*?)END((?!.*?START.*?END).*$)?#',
    function ($m) {
        // $m[1] holds the text between START and END
        return str_replace('PADDING', '', $m[1]);
    }, $r);
echo $str . "\n"; // prints thisiswhatIwanttofind

Related

Simple pattern works fine with preg_match_all; how do I use it with preg_replace?

Thanks for your help.
My goal is to use preg_replace with a pattern to remove some very simple strings.
Using only preg_replace, on this string or others, I need to remove any content between <tag and the next > symbol. The pattern is simple:
$x = '#<\w+(\s+[^>]*)>#is';
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
preg_match_all($x, $s, $Q);
print_r($Q[1]);
[1] => Array
(
[0] => class="td1"
[1] => class="td2"
)
This works great!
Now I try to remove those strings using the same pattern:
$new_string = '';
$Q = preg_replace($x, "\\1$new_string", $s);
print_r($Q);
The result is completely different.
What is wrong with my use of preg_replace?
Using only preg_replace(), how can I remove these strings?
(We could use foreach (...) to remove each string, but where is the error in my code?)
The result I expect for this input:
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';
is this output:
$Q = 'DATA<td>111</td><td>222</td>DATA';

Let's break down your RegEx, #<\w+(\s+[^>]*)>#is, and see if that helps.
# // Start delimiter
< // Literal `<` character
\w+ // One or more word-characters, a-z, A-Z, 0-9 or _
( // Start capturing group
\s+ // One or more whitespace characters
[^>]* // Zero or more characters that are not the literal `>`
) // End capturing group
> // Literal `>` character
# // End delimiter
is // Ignore case and `.` matches all characters including newline
Given the input DATA<td class="td1">DATA this matches <td class="td1"> and captures class="td1". The difference between match and capture is very important.
When you use preg_match you'll see the entire match at index 0, and any subsequent captures at incrementing indexes.
When you use preg_replace the entire match will be replaced. You can use the captures, if you so choose, but you are replacing the match.
I'm going to say that again: whatever you pass as the replacement string will replace the entirety of the found match. If you say $1 or \\1, you are saying replace the entire match with just the capture.
Going back to the sample after the breakdown, using $1 is the equivalent of calling:
str_replace('<td class="td1">', ' class="td1"', $string);
which you can see here: https://3v4l.org/ZkPFb
To your question "how to change [0] by $new_string", you are doing it correctly, it is your RegEx itself that is wrong. To do what you are trying to do, your pattern must capture the tag itself so that you can say "replace the HTML tag with all of the attributes with just the tag".
As one of my comments noted, this is where you'd invert the capturing. You aren't interested in capturing the attributes; you are throwing those away. Instead, you are interested in capturing the tag itself:
$string = 'DATA<td class="td1">DATA';
$pattern = '#<(\w+)\s+[^>]*>#is';
echo preg_replace($pattern, '<$1>', $string);
Demo: https://3v4l.org/oIW7d
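To make the match/capture distinction concrete, here is a small sketch that runs both patterns against the full sample string from the question. Nothing here goes beyond the two patterns already shown; only the variable names $m and $clean are mine.

```php
<?php
$s = 'DATA<td class="td1">111</td><td class="td2">222</td>DATA';

// Original pattern: the capture (index 1) holds only the attributes,
// while the match (index 0) is the whole opening tag.
preg_match('#<\w+(\s+[^>]*)>#is', $s, $m);
// $m[0] is '<td class="td1">'
// $m[1] is ' class="td1"' (note the leading space matched by \s+)

// Inverted capture: keep the tag name, throw the attributes away.
$clean = preg_replace('#<(\w+)\s+[^>]*>#is', '<$1>', $s);
echo $clean, "\n"; // DATA<td>111</td><td>222</td>DATA
```

The closing </td> tags are left alone because \w+ cannot match the "/" that follows "<".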

How can I remove the numeric suffix in PHP?

For example, if I want to get rid of the repeating numeric suffix from the end of an expression like this:
some_text_here_1
Or like this:
some_text_here_1_5
and I want to end up with something like this:
some_text_here
What's the best and most flexible solution?

$newString = preg_replace("/_?\d+$/","",$oldString);
This uses a regex to match an optional underscore (_?) followed by one or more digits (\d+), but only if they are the last characters in the string ($), and replaces them with the empty string.
To remove any number of repeated numeric suffixes, wrap the whole regex (except the $) in a group and put a + after it:
$newString = preg_replace("/(_?\d+)+$/","",$oldString);
If you only want to remove a numeric suffix when it comes after an underscore (e.g. you want some_text_here14 to stay unchanged, but some_text_here_14 to be changed), then it should be:
$newString = preg_replace("/(_\d+)+$/","",$oldString);
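A quick way to check the underscore-only variant against the cases mentioned in this thread is sketched below. The function name stripNumericSuffix() is hypothetical, and the group is written as non-capturing, (?:...), since the captured text is never used.

```php
<?php
// Hypothetical helper: removes one or more trailing "_<digits>" groups.
function stripNumericSuffix(string $value): string
{
    // (?:_\d+)+  one or more "_digits" groups
    // $          anchored at the end of the string
    return preg_replace('/(?:_\d+)+$/', '', $value);
}

$samples = ['some_text_here_1', 'some_text_here_1_5', 'some_text_here14', 'abc_123_def'];
foreach ($samples as $sample) {
    echo $sample, ' => ', stripNumericSuffix($sample), "\n";
}
// some_text_here_1 => some_text_here
// some_text_here_1_5 => some_text_here
// some_text_here14 => some_text_here14
// abc_123_def => abc_123_def
```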

Updated to handle more than one suffix.
strrpos is simpler and faster than a regex for such a simple string problem.
$str = "some_text_here_13_15";
// Assumes $str contains at least one underscore.
while (is_numeric(substr($str, strrpos($str, "_") + 1))) {
    $str = substr($str, 0, strrpos($str, "_"));
}
echo $str;
strrpos finds the last "_" in $str; if the part after it is numeric, that part is removed together with the underscore.
https://3v4l.org/OTdb9
Just to give you an idea of why I don't think regex is a good fit here, this is the performance:
Regex:
https://3v4l.org/Tu8o2/perf#output
0.027 seconds for 100 runs.
My code with added numeric check:
https://3v4l.org/dkAqA/perf#output
0.003 seconds for 100 runs.
Oddly enough, this new code performs even better than before; in this test the regex is roughly nine times slower.
You be the judge on what is best.

First, use preg_replace() with the regex /\d+/ to remove all digits. Then trim any underscores from the right using rtrim(), passing _ as the second parameter. Note that this removes digits anywhere in the string, not only in the suffix, so it only suits input whose only digits are in the suffix.
I've combined the two in the following example:
$string = "some_text_here_1";
echo rtrim(preg_replace('/\d+/', '', $string), '_'); // some_text_here
I've also created an example of this on 3v4l.
Hope this helps! :)

$string = 'abc_def_ghi_123';
$reg = '#_\d+$#';
$replace = '';
echo preg_replace($reg, $replace, $string);
This would give:
abc_def_ghi_123 > abc_def_ghi
abc_def_1 > abc_def
abc_def_ghi > abc_def_ghi
abc_def_ > abc_def_
abc_123_def > abc_123_def
To handle a case like abc_def_123_345 > abc_def, change the pattern line to:
$reg = '#(?:_\d+)+$#';

How to remove everything before the second occurrence of an underscore

I couldn't find a solution using search.
I am looking for a PHP solution to remove all characters before the second occurrence of an underscore (including the underscore itself).
For example:
this_is_a_test
Should output as:
a_test
I currently have this code, but it only removes everything up to the first occurrence:
preg_replace('/^[^_]*.s*/', '$1', 'this_is_a_test');

Using a slightly different approach,
$s='this_is_a_test';
echo implode('_', array_slice( explode( '_', $s ),2 ) );
/* outputs */
a_test

preg_replace('/^.*_.*_(.*)$/U', '$1', 'this_is_a_test');
Note the U modifier, which makes .* match as few characters as possible.

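You can also use explode and implode along with array_splice, like this:
$str = "this_is_a_test";
$parts = explode('_', $str); // array_splice() takes its array by reference, so it needs a variable
echo implode('_', array_splice($parts, 2)); // a_test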

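Why go the complicated way? Here is a suggestion using strpos and substr (it assumes the string contains at least two underscores):
<?php
$str = "this_is_a_test";
$first = strpos($str, "_");              // position of the first underscore
$second = strpos($str, "_", $first + 1); // position of the second underscore
echo substr($str, $second + 1);          // a_test
?>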

Try this one.
<?php
$string = 'this_is_a_test';
$explode = explode('_', $string, 3);
echo $explode[2];
?>

I'm still in favor of a regular expression in this case:
preg_replace('/^.*?_.*?_/', '', 'this_is_a_test');
Or (which looks more complex here but is easily adjustable to N..M underscores):
preg_replace('/^(?:.*?_){2}/', '', 'this_is_a_test');
The use of the question mark in .*? makes the match non-greedy; and the pattern has been expanded from the original post to "match up through" the second underscore.
Since the goal is to remove text, the matched portion is simply replaced with an empty string; there is no need for a capture group or for a backreference in the replacement value.
If the input doesn't include two underscores then nothing is removed; this can be adjusted very easily with the second regular expression if the rules are further clarified.
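As a sketch of how the second form adjusts to a different number of underscores, the count can be passed in as a parameter. The function name removeThroughNthUnderscore() and the choice to return the input unchanged for counts below 1 are my own assumptions, not part of the answer above.

```php
<?php
// Hypothetical helper: removes everything up to and including
// the $n-th underscore; leaves the input alone if there are fewer.
function removeThroughNthUnderscore(string $value, int $n): string
{
    if ($n < 1) {
        return $value; // nothing sensible to remove
    }
    // (?:.*?_){n}  n repetitions of "as little as possible, then _"
    $pattern = '/^(?:.*?_){' . $n . '}/';
    return preg_replace($pattern, '', $value);
}

echo removeThroughNthUnderscore('this_is_a_test', 2), "\n"; // a_test
echo removeThroughNthUnderscore('this_is_a_test', 3), "\n"; // test
echo removeThroughNthUnderscore('one_two', 2), "\n";        // one_two
```

Because $n is an int and is checked before use, building the quantifier by string concatenation is safe here.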

PHP preg_replace: remove an entire line (from a block of many lines) if it contains an occurrence of a word

Guys (preg_replace gurus),
I am looking for a preg_replace snippet that I can use in a PHP file so that, if a word appears in a particular line, that entire line is deleted or replaced with an empty line.
Pseudocode:
$unwanted_lines=array("word1","word2","word3");
$new_block_of_lines=preg_replace($unwanted_lines, block_of_lines);
Thanks.

The expression
First, let's work out the expression you will need to match the array of words:
/(?:word1|word2|word3)/
The (?: ... ) expression creates a group without capturing its contents into a memory location. The words are separated by a pipe symbol, so that it matches either word.
To generate this expression with PHP, you need the following construct:
$unwanted_words = array("word1", "word2", "word3");
$unwanted_words_match = '(?:' . join('|', array_map(function($word) {
return preg_quote($word, '/');
}, $unwanted_words)) . ')';
You need preg_quote() to generate a valid regular expression from a regular string, unless you're sure that it's valid, e.g. "abc" doesn't need to be quoted.
See also: array_map() preg_quote()
Using an array of lines
You can split the block of text into an array of lines:
$lines = preg_split('/\r?\n/', $block_of_lines);
Then, you can use preg_grep() with PREG_GREP_INVERT to keep only the lines that don't match and produce another array:
$wanted_lines = preg_grep("/$unwanted_words_match/", $lines, PREG_GREP_INVERT);
See also: preg_split() preg_grep()
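Putting the array-of-lines pieces together, a minimal end-to-end sketch could look like this. The sample text and the extra word "a.b" are made up to show why preg_quote() matters; array_values() is added because preg_grep() keeps the original array keys.

```php
<?php
$unwanted_words = array('word1', 'word2', 'a.b');

// Quote each word so characters like "." are matched literally.
$alternation = implode('|', array_map(function ($word) {
    return preg_quote($word, '/');
}, $unwanted_words));

// Made-up sample block with mixed \n and \r\n line endings.
$block_of_lines = "keep one\nhas word1 here\r\nkeep two\nliteral a.b here\nnot aXb here";

$lines  = preg_split('/\r?\n/', $block_of_lines);
$wanted = preg_grep('/(?:' . $alternation . ')/', $lines, PREG_GREP_INVERT);

// preg_grep() preserves keys (0, 2, 4 here), so reindex before further use.
$wanted = array_values($wanted);

echo implode("\n", $wanted), "\n";
// keep one
// keep two
// not aXb here
```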
Using a single preg_replace()
To match a whole line containing an unwanted word inside a block of text with multiple lines, you need to use line anchors, like this:
/^.*(?:word1|word2|word3).*$/m
Using the /m modifier, the anchors ^ and $ match the start and end of the line respectively. The .* on both sides "flush" the expression left and right of the matched word.
One thing to note is that $ matches just before the actual line ending character (either \r\n or \n). If you perform replacement using the above expression it will not replace the line endings themselves.
You need to match those extra characters by extending the expression like this:
/^.*(?:word1|word2|word3).*$(?:\r\n|\n)?/m
I've added (?:\r\n|\n)? behind the $ anchor to match the optional line ending. This is the final code to perform the replacement:
$replace_match = '/^.*' . $unwanted_words_match . '.*$(?:\r\n|\n)?/m';
$result = preg_replace($replace_match, '', $block_of_lines);
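For reference, the single preg_replace() approach can be wrapped up as below. This is only a sketch: the function name removeLinesContaining() and the early return for an empty word list are my additions (an empty alternation would otherwise match every line).

```php
<?php
// Hypothetical helper: deletes every line of $text that contains
// at least one of $words, including that line's line ending.
function removeLinesContaining(string $text, array $words): string
{
    if (count($words) === 0) {
        return $text; // an empty (?:) would match every line
    }
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);

    // ^ and $ match per line because of the m modifier;
    // (?:\r\n|\n)? also consumes the line ending, if there is one.
    $pattern = '/^.*(?:' . implode('|', $quoted) . ').*$(?:\r\n|\n)?/m';
    return preg_replace($pattern, '', $text);
}

$block = "keep one\nhas word1 here\nkeep two\nword3 at start\n";
echo removeLinesContaining($block, array('word1', 'word2', 'word3'));
// keep one
// keep two
```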

This regular expression removes a matching line from a multi-line string (the m modifier makes ^ and $ match at line boundaries):
$newstring = preg_replace("/^.*word1.*$/m", "", $string);

As @jack pointed out, let's just use preg_quote() and array_map():
$array = array('word1', 'word2', 'word3', 'word#4', 'word|4');
$text = 'This is some random data1
This is some word1 random data2
This is some word2 random data3
This is some random data4
This is some word#4 random data5
This is some word|4 random data6
This is some word3 random data7'; // Some data
$array = array_map(function($v){
return preg_quote($v, '#');
}, $array); // Escape it
$regex = '#^.*('. implode('|', $array) .').*$#m'; // construct our regex
$output = preg_replace($regex, '', $text); // remove lines
echo $output; // output

return part of a string

I'm trying to return a certain part of a string. I've looked at substr, but I don't believe it's what I'm looking for.
Using this string:
/text-goes-here/more-text-here/even-more-text-here/possibly-more-here
How can I return everything between the first two slashes, i.e. text-goes-here?
Thanks,

$str="/text-goes-here/more-text-here/even-more-text-here/possibly-more-here";
$x=explode('/',$str);
echo $x[1];
print_r($x);// to see all the string split by /

<?php
$String = '/text-goes-here/more-text-here/even-more-text-here/possibly-more-here';
$SplitUrl = explode('/', $String);
# First non-empty element (index 0 is the empty string before the leading slash)
echo $SplitUrl[1]; // text-goes-here
# You can also use array_shift, but you need to call it twice
$Split = array_shift($SplitUrl);
$Split = array_shift($SplitUrl);
echo $Split; // text-goes-here
?>

The explode methods above certainly work. The reason for reading the second element (index 1) is that explode() produces an empty element whenever the string starts with the delimiter or the delimiter appears twice in a row. Another possible solution is to use regular expressions:
<?php
$str="/text-goes-here/more-text-here/even-more-text-here/possibly-more-here";
preg_match('#/(?P<match>[^/]+)/#', $str, $matches);
echo $matches['match'];
The (?P<match> ... part tells it to match with a named capture group. If you leave out the ?P<match> part, you'll end up with the matching part in $matches[1]. $matches[0] will contain the part with the forward slashes like "/text-goes-here/".

Just use preg_match:
preg_match('#/([^/]+)/#', $string, $match);
$firstSegment = $match[1]; // "text-goes-here"
where
# - start of regex (the delimiter can be any non-alphanumeric, non-backslash, non-whitespace character)
/ - a literal /
( - beginning of a capturing group
[^/] - anything that isn't a literal /
+ - one or more of the preceding (one or more characters that aren't a literal /)
) - end of capturing group
/ - a literal /
# - end of regex (must match the opening delimiter)
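Since preg_match() leaves $match empty when nothing matches, it may be worth checking its return value before reading $match[1]. A small sketch, with firstPathSegment() as a made-up name and null as an assumed "not found" value:

```php
<?php
// Hypothetical helper: returns the text between the first two slashes,
// or null when the string has no "/segment/" part.
function firstPathSegment(string $path): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match('#/([^/]+)/#', $path, $match) === 1) {
        return $match[1];
    }
    return null;
}

$string = '/text-goes-here/more-text-here/even-more-text-here/possibly-more-here';
var_dump(firstPathSegment($string));       // string(14) "text-goes-here"
var_dump(firstPathSegment('no-slashes'));  // NULL
var_dump(firstPathSegment('/only-one'));   // NULL, the pattern needs a closing /
```

The last case shows a limit of this pattern: a path with a single segment and no trailing slash does not match, so explode() may be the simpler choice for such input.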

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
