Replace content between special characters using preg_replace() - php

I have a paragraph like this:
== one ===
==== two ==
= three ====
etc.
The number of = signs varies on every line of the paragraph.
I want to write a preg_replace() expression that will allow me to replace the texts between the = signs.
example:
== DONE ===
==== DONE ==
= DONE ====
I tried preg_replace("/\=+(.*)\=+/","DONE", $paragraph) but that doesn't work. Where am I going wrong?

You can use:
$str = preg_replace('/^=+\h*\K.+?(?=\h*=)/m', 'DONE', $str);
RegEx Breakup:
^ # Line start
=+ # Match 1 or more =
\h* # Match 0 or more horizontal spaces
\K # resets the starting point of the reported match
.+? # match 1 or more of any character (non-greedy)
(?=\h*=) # Lookahead: 0 or more horizontal spaces followed by a = must come next
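As a quick check, the pattern above can be run against the paragraph from the question. This is only a minimal sketch: the helper name replaceBetweenEquals is made up for illustration, and the sample text is the one from the question.

```php
<?php
// Hypothetical helper name; the pattern itself is the one from the answer above.
function replaceBetweenEquals(string $text, string $replacement): string
{
    // /m lets ^ match at the start of every line, not only at the start of the string.
    // \K drops the leading "=" run and spaces from the reported match,
    // so only the text between the delimiters is replaced.
    return preg_replace('/^=+\h*\K.+?(?=\h*=)/m', $replacement, $text);
}

$paragraph = "== one ===\n==== two ==\n= three ====";
echo replaceBetweenEquals($paragraph, 'DONE'), "\n";
// prints:
// == DONE ===
// ==== DONE ==
// = DONE ====
```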

You have to place the =s back.
Also, instead of .* use [^=]* (matches characters, which are not =) so that the =s don't get eaten up for the replacement.
Additionally, you don't have to escape =:
preg_replace("/(=+)([^=]*)(=+)/","$1 DONE $3", $paragraph);
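To see where the original attempt goes wrong, it helps to run both patterns side by side. The sketch below uses the sample paragraph from the question; the variable names are only illustrative.

```php
<?php
$paragraph = "== one ===\n==== two ==\n= three ====";

// Original attempt: the match covers the whole line, "=" runs included,
// so the replacement overwrites the delimiters as well.
$broken = preg_replace('/\=+(.*)\=+/', 'DONE', $paragraph);

// Capturing both "=" runs and writing them back keeps the delimiters.
// [^=]* stops at the next "=", so it cannot swallow the closing run.
// It also consumes the surrounding spaces, which the replacement puts back.
$fixed = preg_replace('/(=+)([^=]*)(=+)/', '$1 DONE $3', $paragraph);

echo $broken, "\n\n", $fixed, "\n";
```

With this input, $broken comes out as three lines that each read DONE, while $fixed keeps the = signs on both sides.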

Related

Convert String like *italic* to <i>italic</i> but do not convert if line breaks are within the asterisks limiters

I have the following Regex in my PHP code:
// markers for italic set *Text*
if (substr_count($line, '*')>=2)
{
$line = preg_replace('#\*{1}(.*?)\*{1}#', '<i>$1</i>', $line);
}
which works great.
However, when a $line holds a <br>, e.g.
*This is my text<br>* Some other text
Then the regex still considers the text and transforms it to:
<i>This is my text<br></i> Some other text
The goal is to not translate the text if a <br> is encountered. How to do that with a Regex - using a so called "negative lookahead" or how can the existing Regex be changed?
Note: Strings like *This is my text*<br>Some other text<br>And again *italic*<br>END should still be considered and transformed.
Idea: Or should I explode the $line and then iterate over the results with the regex?!
Using match-what-you-don't-want and discard technique, you may use this regex in PHP (PCRE):
\*[^*]*<br>\*(*SKIP)(*F)|\*([^*]*)\*
and replace with <i>$1</i>
PHP code:
$r = preg_replace('/\*[^*]*<br>\*(*SKIP)(*F)|\*([^*]*)\*/', '<i>$1</i>', $input);
Explanation:
\*: Match a *
[^*]*: Match 0 or more non-* characters
<br>: Match <br>
\*: Match closing *
(*SKIP)(*F): PCRE verbs to discard and skip this match
|: OR
\*([^*]*)\*: Match string enclosed by *s
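The two example lines from the question can be used to try this out. This is a sketch only; the function name italicizeSkipBr is made up for illustration.

```php
<?php
// Hypothetical helper name; the pattern is the one explained above.
function italicizeSkipBr(string $line): string
{
    // First alternative: "*...<br>*" is matched, then (*SKIP)(*F) throws it away
    // and stops the engine from retrying inside that span.
    // Second alternative: any remaining "*...*" pair is captured and wrapped in <i>.
    return preg_replace('/\*[^*]*<br>\*(*SKIP)(*F)|\*([^*]*)\*/', '<i>$1</i>', $line);
}

echo italicizeSkipBr('*This is my text<br>* Some other text'), "\n";
// prints: *This is my text<br>* Some other text
echo italicizeSkipBr('*This is my text*<br>Some other text<br>And again *italic*<br>END'), "\n";
// prints: <i>This is my text</i><br>Some other text<br>And again <i>italic</i><br>END
```

Note that the first alternative only fires when <br> sits directly before the closing *, which is the case described in the question.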
You can replace matches of the regular expression
\*(?:(?!<br>)[^*])+\*
with
'<i>$0</i>'
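where $0 holds the whole matched string, including both asterisks. To drop the asterisks, wrap the inner part in a capture group and use '<i>$1</i>' instead.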
The regular expression can be broken down as follows.
\* # match '*'
(?: # begin a non-capture group
(?!<br>) # negative lookahead asserts that next four chars are not '<br>'
[^*] # match any char other than '*'
)+ # end non-capture group and execute one or more times
\* # match '*'
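In PHP this could look roughly like the sketch below. It assumes the asterisks should not appear in the output, so the inner part is wrapped in a capture group and $1 is used in the replacement; the function name italicizeTempered is made up for illustration.

```php
<?php
// Hypothetical helper name. Same idea as the pattern above, but with a capture
// group around the inner part so the asterisks are not copied into the output.
function italicizeTempered(string $line): string
{
    // (?!<br>)[^*] consumes one non-asterisk character at a time,
    // and refuses to step onto the start of a "<br>".
    return preg_replace('/\*((?:(?!<br>)[^*])+)\*/', '<i>$1</i>', $line);
}

echo italicizeTempered('*This is my text<br>* Some other text'), "\n";
// prints: *This is my text<br>* Some other text
echo italicizeTempered('*This is my text*<br>Some other text<br>And again *italic*<br>END'), "\n";
// prints: <i>This is my text</i><br>Some other text<br>And again <i>italic</i><br>END
```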

Regular Expression for numbers unless all digits are identical

I need help to write a regular expression to match numbers which may be broken up into sections by spaces or dashes e.g:
606-606-606
606606606
123 456-789
However, matches should be rejected if all the digits of the number are identical (or if there are any other characters besides [0-9 -]):
111 111 111
111111111
123456789a
If spaces/dashes weren't allowed, the Regex would be simple:
/^(\d)(?!\1*$)\d*$/
But how would I allow dashes and spaces in the number?
EDIT
How would I also allow letters in the same regex (dashes and spaces should still be allowed), e.g.:
aaaaa - it's not ok
aa-aaa-aaa-aaaaa - it's not OK
ababab - it's OK
ab-ab-ab - it's OK
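This rule checks only numbers.
^(?!(?:(\d)\1+[ -]*)+$)\d[\d -]+$
Desired results can be achieved by this Regular Expression (the hyphen is placed last in the character class, because PCRE2, used by PHP 7.3 and later, rejects a hyphen that directly follows \d inside a class):
^(?!(?:(\d)\1+[ -]*)+$)\d[\d -]+$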
Explanations:
^ # Start of string
(?! # Negative lookahead to check for repeated digits
(?: # Non-capturing group
(\d) # Capture a digit
\1+ # One or more digits equal to the most recently captured one
[ -]* # Any spaces and dashes between
)+ # One or more repetitions of the group
$ # End of string
) # End of negative lookahead
\d # Start of match with a digit
[\d -]+ # One or more digits/dashes/spaces
$ # End of string
The theory behind this regex is to use a lookaround to check whether the string consists only of repeated digits, based on the captured digit. If the lookaround finds no such match, the string is accepted.
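One thing to watch out for: because (\d) sits inside the repeated group, the lookahead captures a new digit on every repetition, so a string made of runs such as 112233 is rejected as well, while 1 1 1 slips through. If the rule is strictly "not all digits identical", a variant that captures the first digit only once may be closer to the requirement. The following is a sketch under that assumption, not a drop-in replacement; the function name is made up for illustration.

```php
<?php
// Hypothetical helper. Assumption: a value is valid when it contains only
// digits, spaces and dashes, starts with a digit, and not every digit is the same.
function isNumberWithMixedDigits(string $value): bool
{
    // (\d) captures the first digit once; (?:[ -]*\1)* then walks over further
    // copies of that same digit. If that reaches the end of the string,
    // all digits are identical and the negative lookahead rejects the value.
    // The D modifier stops $ from matching before a trailing newline.
    return preg_match('/^(?!(\d)(?:[ -]*\1)*[ -]*$)\d[\d -]*$/D', $value) === 1;
}

foreach (['606-606-606', '123 456-789', '112233', '111 111 111', '1 1 1', '123456789a'] as $value) {
    echo $value, ' => ', isNumberWithMixedDigits($value) ? 'OK' : 'rejected', "\n";
}
```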
Even if you can, I wonder whether a regex is the right tool to solve this problem. Just imagine your fellow developers scratching their heads trying to understand your code: how much time do you grant them? Even worse, what if you need to alter the rules?
A small function with some comments could make them happy...
function checkNumberWithSpecialRequirements($number)
{
    // ignore given set of characters
    $cleanNumber = str_replace([' ', '-'], '', $number);
    // handle empty string
    if ($cleanNumber == '')
        return false;
    // check whether non-digit characters are inside
    if (!ctype_digit($cleanNumber))
        return false;
    // check if a character differs from the first (not all equal)
    for ($index = 1; $index < strlen($cleanNumber); $index++)
    {
        if ($cleanNumber[$index] != $cleanNumber[0])
            return true;
    }
    return false;
}

How to merge multiple regexes into one regex

I have three preg_match() calls in one if statement.
Is it possible to merge the three regexes into one?
This is my if with the three regexes:
if(!preg_match("#\s#",$file) && !preg_match("#\.\.\/#",$file) && (preg_match_all("#/#",$file,$match)==1)):
(I want: no whitespace, no "../" and only one "/".)
thanks for your help.
EDIT
The requirements as a list (more readable):
no "space"
no "../"
1 "/"
It's quite simple. Let's start step by step crafting this regex:
First of all, let's use anchors to define the beginning and end of the string: ^$
I want: no "space", we've got \S which matches a non-white space character: ^\S+$
no "../", let's add a negative lookahead ^(?!.*[.][.]/)\S+$, note that we don't need to escape the dot inside a character class. As for the forwardslash, we'll use different delimiters
one optional "/", we could add a negative lookahead that prevents 2 forwardslashes ^(?!(?:.*/){2})(?!.*[.][.]/)\S+$
Let's define the delimiters and add the s modifier so that . also matches newlines: ~^(?!(?:.*/){2})(?!.*[.][.]/)\S+$~s
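The finished pattern from the steps above can be wrapped in a small function to try a few paths. This is just a sketch; the function name and the sample paths are made up for illustration.

```php
<?php
// Hypothetical helper name; the pattern is the one built step by step above.
function isAllowedPathLookahead(string $file): bool
{
    // (?!(?:.*/){2})  rejects anything containing two or more slashes
    // (?!.*[.][.]/)   rejects anything containing "../"
    // \S+             requires one or more non-whitespace characters
    return preg_match('~^(?!(?:.*/){2})(?!.*[.][.]/)\S+$~s', $file) === 1;
}

foreach (['dir/file.txt', 'file.txt', 'a/b/c', '../file', 'my file'] as $file) {
    echo $file, ' => ', isAllowedPathLookahead($file) ? 'OK' : 'rejected', "\n";
}
```

Only the first two sample paths pass; the others fail on the two-slash rule, the "../" rule and the whitespace rule respectively.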
You can use:
if (preg_match('~^(?!.*?(?: |\.\./))(?!(.*?/){2}).*$~', $file)) {
...
}
Why not this:
if (preg_match('~((?>[^\s/.]++|\.(?!\./))*)/?(?1)\z~A', $str))
echo 'OK';
details:
~
( # capture group 1
(?>
[^\s./]++ # all that is not a space, a dot or a slash
| # OR
\.(?!\./) # a dot not followed by another dot and a slash
)*
)
/? # optional /
(?1) # repeat the capture group 1
\z # anchor for end of the string
~A # anchored pattern
Note: if you want to exclude the empty string, two possibilities:
if (preg_match('~(?=.)((?>[^\s/.]++|\.(?!\./))*)/?(?1)\z~A', $str))
or
if (preg_match('~((?>[^\s/.]++|\.(?!\./))*)/?(?1)\z~A', $str, $m) && $m[0] !== '')
You cannot merge the three because you have a match_all.
I would replace preg_match_all by substr_count, because pattern is static, so it should be faster.
if(!preg_match("#\s|\.\./#",$file) && (substr_count($file,'/')<=1))
Edit: replaced ==1 by <=1 for / being optional
Edit2: We do not lose too much readability by just merging the two negative patterns
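Wrapped in a function, the check above might look like the following sketch. The function name and the sample paths are made up for illustration.

```php
<?php
// Hypothetical helper name; the logic is the condition from the answer above.
function isAllowedPathSimple(string $file): bool
{
    // One regex covers both forbidden things: any whitespace, or "../".
    // Counting slashes needs no regex at all, so substr_count() does it.
    return !preg_match('#\s|\.\./#', $file) && substr_count($file, '/') <= 1;
}

foreach (['dir/file.txt', 'file.txt', 'a/b/c', '../file', 'my file'] as $file) {
    echo $file, ' => ', isAllowedPathSimple($file) ? 'OK' : 'rejected', "\n";
}
```

If exactly one slash is required, as in the original condition, <= 1 would go back to === 1.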

PHP Look behind Regex with variable distance

I need to match a sequence of characters but only if it's not preceded by a "?" or "#" with 0 or more (any) number of wildcard characters in between.
$extension_regex =
'/
(?<!\?|\#) # Negative look behind not "?" or "#"
\/ # Match forward slash
[^\/\?#]+ # Has one or more of any character except forward slash, question mark and hash
\. # Dot
([^\/\?#]+) # Has one or more of any character except forward slash, question mark and hash
/iux';
Examples:
"?randomcharacters/index.php" should not get matched
"#randomcharacters/index.php" should not get matched
"randomcharacters/index.php" should get matched
I understand that the lookbehind is not working because it sees that "/index.php" is not preceded by ? or #. But I can't figure out how to add wildcard "distance" between the ? or # and the /index.php.
The Answer
Based on @Jerry's answer, here is the full regex:
$extension_regex =
'~
^
(?:
(?!
[?#]
.*
/
[^/?#]+
\.
[^/?#]+
)
.
)*
/
[^/?#]+
\.
([^/?#]+)
~iux';
You cannot put a variable-width pattern inside a lookbehind in PCRE, but you could work around that with a negative lookahead, something like this:
^(?:(?![#?].*/index.php).)*(/index.php)
I added the capture group just to get the part you want to match, even though it might not be actually useful here.
^(?:(?![#?].*/index.php).)* will basically match any character, as long as there's no # or ? followed by the string you want to match (/index.php) immediately ahead.
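Applied to the three examples from the question, the workaround could be tested like this. It is a sketch only: the function name is made up, /index.php is hard-coded as in the pattern above, and the dot is escaped here so that it matches a literal dot.

```php
<?php
// Hypothetical helper name; '/index.php' is the fixed target from the answer above.
function matchesUnprefixedIndex(string $url): bool
{
    // Each character is consumed only if it is not a "#" or "?" that is
    // followed, anywhere later in the string, by "/index.php".
    $pattern = '~^(?:(?![#?].*/index\.php).)*(/index\.php)~';
    return preg_match($pattern, $url) === 1;
}

var_dump(matchesUnprefixedIndex('?randomcharacters/index.php')); // bool(false)
var_dump(matchesUnprefixedIndex('#randomcharacters/index.php')); // bool(false)
var_dump(matchesUnprefixedIndex('randomcharacters/index.php'));  // bool(true)
```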
In C#, you might otherwise be able to use:
(?<![#?].*)/index.php
This may help:
$extension_regex = 'string';
$arr = array('?', '#', '0');//these are the forbidden characters
if(in_array(substr($extension_regex, 0, 1), $arr))
echo "true";
else
echo "false";

Consolidate repeating pattern

I am working on a script that develops certain strings of alphanumeric characters, separated by a dash -. I need to test the string to see if there are any sets of characters (the characters that lie in between the dashes) that are the same. If they are, I need to consolidate them. The repeating chars would always occur at the front in my case.
Examples:
KRS-KRS-454-L
would become:
KRS-454-L
DERP-DERP-545-P
would become:
DERP-545-P
<?php
$s = 'KRS-KRS-454-L';
echo preg_replace('/^(\w+)-(?=\1)/', '', $s); // KRS-454-L
?>
This uses a positive lookahead (?=...) to check for repeated strings.
Note that \w also contains the underscore. If you want to limit to alphanumeric characters only, use [a-zA-Z0-9].
Also, I've anchored with ^ as you've mentioned: "The repeating chars would always occur at the front [...]"
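A few more inputs show how the lookahead version behaves at the edges. This is only a sketch; the function name and the extra test strings are made up for illustration.

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer above.
function collapseLeadingDuplicate(string $s): string
{
    // Removes the first segment and its dash when the same text follows.
    return preg_replace('/^(\w+)-(?=\1)/', '', $s);
}

echo collapseLeadingDuplicate('KRS-KRS-454-L'), "\n";        // KRS-454-L
echo collapseLeadingDuplicate('OKAY-666-A'), "\n";           // OKAY-666-A
// The pattern is anchored and matches once, so a triple repeat loses one copy only.
echo collapseLeadingDuplicate('DERP-DERP-DERP-545-P'), "\n"; // DERP-DERP-545-P
// The lookahead checks a prefix only, so "AB" followed by "ABC" also counts as a repeat.
echo collapseLeadingDuplicate('AB-ABC-1'), "\n";             // ABC-1
```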
Try the pattern:
/([a-z]+)(?:-\1)*(.*)/i
and replace it with:
$1$2
A demo:
$tests = array(
'KRS-KRS-454-L',
'DERP-DERP-DERP-545-P',
'OKAY-666-A'
);
foreach ($tests as $t) {
echo preg_replace('/([a-z]+)(?:-\1)*(.*)/i', '$1$2', $t) . "\n";
}
produces:
KRS-454-L
DERP-545-P
OKAY-666-A
A quick explanation:
([a-z]+) # group the first "word" in match group 1
(?:-\1)* # match a hyphen followed by what was matched in
# group 1, and repeat it zero or more times
(.*) # match the rest of the input and store it in group 2
In the replacement string, $1 and $2 are replaced by what was matched by group 1 and group 2 in the pattern above.
Use this regex ((?:[A-Z-])+)\1{1} and replace the matched string with $1.
\1 is used together with {1} in the above regex. It looks for one repeated instance of the captured characters.
You need back references. Using perl syntax, this would work for you:
$line =~ s/([A-Za-z0-9]+-)\1+/\1/gi;
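For PHP, the same back-reference idea might be written as in the sketch below. It assumes, as stated in the question, that the repeats only occur at the front, so the pattern is anchored with ^; the function name is made up for illustration.

```php
<?php
// Hypothetical helper name. Assumption: repeats only occur at the start of the string.
function collapseLeadingRepeats(string $s): string
{
    // The dash is part of the captured segment, so "AB-" cannot match the
    // beginning of "ABC-", and \1+ swallows any number of extra copies.
    return preg_replace('/^([A-Za-z0-9]+-)\1+/', '$1', $s);
}

echo collapseLeadingRepeats('KRS-KRS-454-L'), "\n";        // KRS-454-L
echo collapseLeadingRepeats('DERP-DERP-DERP-545-P'), "\n"; // DERP-545-P
echo collapseLeadingRepeats('OKAY-666-A'), "\n";           // OKAY-666-A
```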
