RegEx for adding a space in a special pattern - php

Quick note: I know many markdown parsers don't care about this issue (strict CommonMark parsers do require the space). It's for the sake of visual consistency in the .md file, and also for experimentation.
Sample:
# this
##that
###or this other
Goal: read each line and, if a markdown header does not have a space after the pound/hash sign, add one so that it looks like:
# this
## that
### or this other
My non-regex attempt:
function inelegantFunction(string $string) {
    $array = explode('#', $string);
    $num = count($array);
    $text = end($array);
    return str_repeat('#', $num - 1) . " " . $text;
}
echo inelegantFunction("###or this other");
// returns ### or this other
This works, but it has no mechanism to handle the unlikely case of seven '#' (markdown headers only go up to six levels).
Regardless of efficacy, I would like to figure out how to do this with regex in PHP (and perhaps JavaScript, if that matters).

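Try matching (?m)^#++\K\S, which finds lines starting with one or more number signs followed directly by a non-whitespace character (\K drops the # run from the match, so only that character is matched), then replace the match with ' $0' (a space plus the matched character) in your function: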
return preg_replace('~(?m)^#++\K\S~', ' $0', $string);
To limit the number of #s to six, use:
(?m)^(?!#{7})#++\K\S
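As a minimal sketch, the six-level pattern can be wrapped in a function and run over a whole multi-line document at once. The function name addHeaderSpace and the sample text are illustrative assumptions, not part of the question.

```php
<?php
// Hypothetical helper: insert a space after a run of 1-6 leading '#' on every line.
function addHeaderSpace(string $text): string
{
    // (?m)      ^ matches at the start of every line
    // (?!#{7})  skip lines that start with seven or more '#'
    // #++       the '#' run, matched possessively (no backtracking into it)
    // \K        drop the '#' run from the reported match
    // \S        the first non-whitespace character right after the run
    return preg_replace('~(?m)^(?!#{7})#++\K\S~', ' $0', $text);
}

$sample = "# this\n##that\n###or this other\n#######seven";
echo addHeaderSpace($sample), "\n";
```

Because #++ is possessive, a line that already has a space (# this) is left alone: the # run cannot give back a '#' for \S to match, which a plain #+ would do.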

I'm guessing that a simple expression with a character class as the right boundary might work here, maybe:
(#)([a-z])
If other characters may follow the #, we can simply add them to [a-z].
$re = '/(#)([a-z])/m';
$str = '#this
##that
###that
### or this other';
$subst = '$1 $2';
$result = preg_replace($re, $subst, $str);
echo "The result of the substitution is ".$result;
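One thing worth checking with (#)([a-z]): it is not anchored to the start of a line, so it also edits a # in the middle of a line, and it skips uppercase letters. The sketch below compares it with an anchored variant; ^(#{1,6})([a-z]) with the m and i flags is an assumed adjustment, not part of this answer, and the sample text is made up.

```php
<?php
// Pattern from the answer: any '#' directly followed by a lowercase letter.
$unanchored = '/(#)([a-z])/m';
// Assumed variant: only a run of 1-6 '#' at the start of a line, any letter case.
$anchored = '/^(#{1,6})([a-z])/mi';

$text = "##that\n#This\nsee #tag";

echo preg_replace($unanchored, '$1 $2', $text), "\n\n";
echo preg_replace($anchored, '$1 $2', $text), "\n";
```

With the anchored version, "see #tag" stays untouched, "#This" gets its space, and a line with seven '#' no longer matches because #{1,6} can never reach the letter.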


Regular Expression That Contains At Least One Of Each

I'm trying to capitalize "words" that have at least one number, letter, and special character such as a period or dash.
Things like: 3370.01b, 6510.01.b, m-5510.30, and drm-2013-c-004914.
I don't want it to match things like: hello, sk8, and mixed-up
I'm trying to use lookaheads, as suggested, but I can't get it to match anything.
$output = preg_replace_callback('/\b(?=.*[0-9]+)(?=.*[a-z]+)(?=.*[\.-]+)\b/i', function($matches){return strtoupper($matches[0]);}, $input);
You can use this regex to match the strings you want:
(?=\S*[a-z])(?=\S*\d)[a-z\d]+(?:[.-][a-z\d]+)+
Explanation:
(?=\S*[a-z]) - This lookahead ensures that there is at least one letter in the upcoming word
(?=\S*\d) - This lookahead ensures that there is at least one digit in the upcoming word
[a-z\d]+(?:[.-][a-z\d]+)+ - This part matches an alphanumeric word containing at least one special character, . or -
Here is a PHP demo based on your code:
$input = '3370.01b, 6510.01.b, m-5510.30, and drm-2013-c-004914 hello, sk8, and mixed-up';
$output = preg_replace_callback('/(?=\S*[a-z])(?=\S*\d)[a-z\d]+(?:[.-][a-z\d]+)+/i', function($matches){return strtoupper($matches[0]);}, $input);
echo $output;
Prints,
3370.01B, 6510.01.B, M-5510.30, and DRM-2013-C-004914 hello, sk8, and mixed-up
Regular expression:
https://regex101.com/r/sdmlL8/1
(?=.*\d)(.*)([-.])(.*)
PHP code:
https://ideone.com/qEBZQc
$input = '3370.01b';
$output = preg_replace_callback('/(?=.*\d)(.*)([-.])(.*)/i', function($matches){return strtoupper($matches[0]);}, $input);
echo $output; // 3370.01B
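I don't think you ever captured anything to put into $matches: your pattern consists only of zero-width assertions (\b and lookaheads), so every match is an empty string...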
$input = '3370.01b foo';
$output = preg_replace_callback('/(?=.*[0-9])(?=.*[a-z])(\w+(?:[-.]\w+)+)/i', function($matches){return strtoupper($matches[0]);}, $input);
echo $output;
Output
3370.01B foo
Sandbox
https://regex101.com/r/syJWMN/1
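The working answers differ in how far their lookaheads can see: \S* stops at the next whitespace, while .* scans the rest of the line. The comparison below uses a made-up input, 'abc.def 123', where that matters: the .* version finds the required digit in a different word and uppercases "abc.def" anyway.

```php
<?php
// Pattern from the answer just above: .* lets the lookaheads look past the current word.
$dotStar = '/(?=.*[0-9])(?=.*[a-z])(\w+(?:[-.]\w+)+)/i';
// Pattern from the first answer: \S* keeps the lookaheads inside the current word.
$nonSpace = '/(?=\S*[a-z])(?=\S*\d)[a-z\d]+(?:[.-][a-z\d]+)+/i';

$upper = fn(array $m): string => strtoupper($m[0]);

$input = 'abc.def 123'; // hypothetical input: letters and a dot in one word, digits in another
echo preg_replace_callback($dotStar, $upper, $input), "\n";  // ABC.DEF 123
echo preg_replace_callback($nonSpace, $upper, $input), "\n"; // abc.def 123
```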

Modify my regex so that pattern search should not be case sensitive

I have the following PHP code that removes whole words matching the patterns:
$patterns = ["re", "get", "ER"];
$string = "You are definitely getting better today";
$alternations = implode('|', $patterns);
$re = '(?!(?<=\s)(?:'.$alternations.')(?=\s))\S*(?:'.$alternations.')\S*';
$string = preg_replace('#'.$re.'#', '', $string);
$string = preg_replace('#\h{2,}#', ' ', $string);
echo $string;
I want two modifications:
The pattern search should not be case-sensitive, e.g. the pattern ER must remove "better" from $string.
If a removed word in $string has line breaks before or after it, only one line break should be removed.
If $string is:
You are definitely getting
better
today
Output must be
You definitely
today
You may use:
$patterns = ["re", "get", "ER"];
$string = "You are definitely getting\nbetter\ntoday";
$alternations = implode('|', $patterns);
$re = '\R?(?!(?<=\s)(?:'.$alternations.')(?=\s))\S*(?:'.$alternations.')\S*';
$string = preg_replace('#'.$re.'#i', '', $string);
$string = preg_replace('#\h{2,}#', ' ', $string);
echo $string;
While the i modifier makes the matching case-insensitive, the less obvious part is that you need to add an optional line break pattern.
That line break can be matched in various ways, but in PHP PCRE you can easily match it with the \R construct.
Adding a ? quantifier after it makes it match 1 or 0 times, i.e. makes it optional, so the whole pattern can still match where there is no line break, e.g. at the start of the string.
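Wrapped into a function, the pattern can be checked against both inputs from the question. The name removeMatchingWords and the preg_quote call (which makes any regex metacharacters in a pattern, including the # delimiter, match literally) are additions for this sketch, not part of the answer.

```php
<?php
// Hypothetical helper wrapping the answer's approach.
function removeMatchingWords(string $string, array $patterns): string
{
    // Escape each pattern so characters like '.', '|' or '#' are taken literally.
    $quoted = array_map(fn(string $p): string => preg_quote($p, '#'), $patterns);
    $alternations = implode('|', $quoted);

    // \R? also consumes one optional line break in front of the removed word;
    // the i flag lets "ER" match the "er" in "better".
    $re = '\R?(?!(?<=\s)(?:' . $alternations . ')(?=\s))\S*(?:' . $alternations . ')\S*';
    $string = preg_replace('#' . $re . '#i', '', $string);

    // Collapse runs of two or more horizontal whitespace characters into one space.
    return preg_replace('#\h{2,}#', ' ', $string);
}

$patterns = ["re", "get", "ER"];
var_dump(removeMatchingWords("You are definitely getting better today", $patterns));
var_dump(removeMatchingWords("You are definitely getting\nbetter\ntoday", $patterns));
```

Note that in the multi-line case a single space remains before the line break ("You definitely \ntoday"), because only runs of two or more spaces are collapsed.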

preg replace would ignore non-letter characters when detecting words

I have an array of words and a string, and I want to add a hashtag to the words in the string that have a match in the array. I use this loop to find and replace the words:
foreach($testArray as $tag){
$str = preg_replace("~\b".$tag."~i","#\$0",$str);
}
Problem: let's say I have the words "is" and "isolated" in my array. I will get ##isolated in the output. This means that the word "isolated" is matched once for "is" and once for "isolated", and the pattern ignores the fact that "#isolated" no longer starts with "is" but with "#".
Here is an example, but it is only an example; I don't want to solve just this case but every other possibility too:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
Output will be:
this #is ##isolated #is an example of this and that
You may build a regex with an alternation group enclosed with word boundaries on both ends and replace all the matches in one pass:
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
echo preg_replace('~\b(?:' . implode('|', $testArray) . ')\b~i', '#$0', $str);
// => this #is #isolated #is an example of this and that
The regex will look like this:
~\b(?:is|isolated|somethingElse)\b~
If you want to make your approach work, you might add a negative lookbehind after \b: "~\b(?<!#)".$tag."~i","#\$0". The lookbehind will fail all matches that are preceded with #. See this PHP demo.
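A minimal sketch of that loop with the (?<!#) lookbehind follows. The helper name hashtagWords is hypothetical, and two details are additions not in the answer: preg_quote escapes each word, and a trailing \b stops "is" from tagging the start of longer words such as "island".

```php
<?php
// Hypothetical helper: the per-word loop from the question, fixed with a lookbehind.
function hashtagWords(string $str, array $words): string
{
    foreach ($words as $tag) {
        // \b      start of a word (also matches right after an inserted '#')
        // (?<!#)  ...but reject positions already preceded by '#'
        // \b      end of the word (assumed addition: whole words only)
        $pattern = '~\b(?<!#)' . preg_quote($tag, '~') . '\b~i';
        $str = preg_replace($pattern, '#$0', $str);
    }
    return $str;
}

echo hashtagWords("this is isolated is an example of this and that", ['is', 'isolated', 'somethingElse']), "\n";
```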
Another way is to split your string into words and to build an associative array from your original array of words (to avoid using in_array):
$str = "this is isolated is an example of this and that";
$testArray = array('is','isolated','somethingElse');
$hash = array_flip(array_map('strtolower', $testArray));
$parts = preg_split('~\b~', $str);
for ($i=1; $i<count($parts); $i+=2) {
$low = strtolower($parts[$i]);
if (isset($hash[$low])) $parts[$i-1] .= '#';
}
$result = implode('', $parts);
echo $result;
This way, your string is processed only once, whatever the number of words in your array.
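The same lookup-table idea can also be expressed as a single preg_replace_callback pass over runs of word characters, swapped in here instead of preg_split. The function name hashtagKnownWords is hypothetical; this is a sketch of an alternative, not the answer's code.

```php
<?php
// Hypothetical helper: tag every word whose lowercase form is in the list.
function hashtagKnownWords(string $str, array $words): string
{
    // Lowercased words become array keys, so each lookup is a cheap isset().
    $hash = array_flip(array_map('strtolower', $words));

    // Visit every run of word characters exactly once.
    return preg_replace_callback('~\w+~', function (array $m) use ($hash): string {
        return isset($hash[strtolower($m[0])]) ? '#' . $m[0] : $m[0];
    }, $str);
}

echo hashtagKnownWords("this is isolated is an example of this and that", ['is', 'isolated', 'somethingElse']), "\n";
```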

Operation on string in PHP. Remove part of string

How can I remove part of a string, for example:
##lang_eng_begin##test##lang_eng_end##
##lang_fr_begin##school##lang_fr_end##
##lang_esp_begin##test33##lang_esp_end##
I always want to pull the middle part out of each string: test, school, test33.
I read about ltrim, substr and others, but I had no good idea how to do this, because each string can have a different length, for example:
'eng', 'fr'
I just want the string in the middle, between ## and ##. Maybe someone can help me? I tried:
foreach ($article as $art) {
    $title = str_replace("##lang_eng_begin##", "", $art->title);
    $art->cleanTitle = str_replace("##lang_eng_end##", "", $title);
}
But ##lang_eng_end## can change to ##lang_ger_end## in the next row, so I have no idea how to fix that.
If your strings are always in this format, an explode approach looks easy:
$str = "##lang_eng_begin##test##lang_eng_end## ";
$res = explode("##", $str)[2];
echo $res;
You may use a regex and extract the value between a ## that is not at the start of the string and the next ##:
$re = "/(?!^)##(.*?)##/";
$str = "##lang_eng_begin##test##lang_eng_end## ";
preg_match($re, $str, $match);
print_r($match[1]);
See the PHP demo. Here, the regex matches a ## that is not at the string start ((?!^)##), then captures into Group 1 any 0+ chars other than newline as few as possible ((.*?)) up to the first ## substring.
Or, remove all ##...## substrings with preg_replace:
$re = "/##.*?##/";
$str = "##lang_eng_begin##test##lang_eng_end## ";
echo preg_replace($re, "", $str);
See another demo. Here, we just remove all non-overlapping substrings beginning with ##, then having any 0+ chars other than a newline up to the first ##.
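Since the language code changes from row to row, another option is to capture it and require the same code in the end marker with a backreference. This is a sketch of an alternative pattern, not one from the answers; extractTitles is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns [language code => inner text] for each row.
function extractTitles(array $rows): array
{
    $result = [];
    foreach ($rows as $row) {
        // ([a-z]+) captures the language code; \1 requires the same code in the end marker.
        if (preg_match('~##lang_([a-z]+)_begin##(.*?)##lang_\1_end##~', $row, $m)) {
            $result[$m[1]] = $m[2];
        }
    }
    return $result;
}

$rows = [
    '##lang_eng_begin##test##lang_eng_end##',
    '##lang_fr_begin##school##lang_fr_end##',
    '##lang_esp_begin##test33##lang_esp_end##',
];
print_r(extractTitles($rows));
```

Rows whose begin and end codes differ (e.g. eng and ger) are simply skipped.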

How to not perform preg_replace if subject starts with quote

I'm trying to convert plain links to HTML links using preg_replace. However it's replacing links that are already converted.
To combat this I'd like it to ignore the replacement if the link starts with a quote.
I think a positive lookahead may be needed but everything I've tried hasn't worked.
$string = 'test http://www.example.com';
$string = preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", "$1", $string);
var_dump($string);
The above outputs:
http://www.example.com">test</a> http://www.example.com
When it should output:
test http://www.example.com
You might get by with lookarounds.
Lookarounds are zero-width assertions that check whether something does or does not match immediately before or after the current position. They do not consume any characters.
That being said, a negative lookbehind might be what you need in your situation:
(?<![">])\bhttps?://\S+\b
In PHP this would be:
<?php
$string = 'I want to be transformed to a proper link: http://www.google.com ';
$string .= 'But please leave me alone ';
$string .= '(https://www.google.com).';
$regex = '~ # delimiter
(?<![">]) # a neg. lookbehind
https?://\S+ # http:// or https:// followed by not a whitespace
\b # a word boundary
~x'; # verbose to enable this explanation.
$string = preg_replace($regex, "<a href='$0'>$0</a>", $string);
echo $string;
?>
See a demo on ideone.com. However, a proper HTML parser may be more appropriate.
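To see the lookbehind skip an existing link, the sketch below runs the same pattern on an input that already contains an anchor. The helper name linkify and the sample input are hypothetical, and the replacement uses double-quoted href attributes instead of the single quotes above.

```php
<?php
// Hypothetical helper built on the negative-lookbehind pattern from this answer.
function linkify(string $html): string
{
    // (?<![">]) rejects URLs right after '"' (inside href="...") or '>' (anchor text).
    $regex = '~(?<![">])\bhttps?://\S+\b~';
    return preg_replace($regex, '<a href="$0">$0</a>', $html);
}

$input = '<a href="http://www.example.com">test</a> http://www.example.com';
echo linkify($input), "\n";
```

Running it twice gives the same result as running it once, because every URL it produces sits after a '"' or a '>'. Since \S+ stops only at whitespace, a bare URL glued to markup (e.g. followed by </a>) would still be mishandled.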
Since preg_replace accepts arrays of patterns and replacements, this might be convenient depending on what you want to achieve:
<?php
$string = 'test http://www.example.com';
$rx = array("&(<a.+https?:\/\/[\w]+[^ \,\"\n\r\t<]*>)(.*)(<\/a\>)&si", "&(\s){1,}(https?:\/\/[\w]+[^ \,\"\n\r\t<]*)&");
$rp = array("$1$2$3", "$2");
$string = preg_replace($rx,$rp, $string);
var_dump($string);
// DUMPS:
// 'testhttp://www.example.com'
The Idea
You can split your string at the already existing anchors, and only parse the pieces in between.
The Code
$input = 'test http://www.example.com';
// Split the string at existing anchors
// PREG_SPLIT_DELIM_CAPTURE flag includes the delimiters in the results set
// The third argument is the limit (-1 = no limit); the flags go in the fourth
$parts = preg_split('/(<a.*?>.*?<\/a>)/is', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
// Use array_map to parse each piece, and then join all pieces together
$output = join(array_map(function ($key, $part) {
// Because we return the delimiter in the results set,
// every $part with an uneven key is an anchor.
    return $key % 2
        ? $part
        : preg_replace("/((https?:\/\/[\w]+[^ \,\"\n\r\t<]*))/is", "$1", $part);
}, array_keys($parts), $parts));
echo $output;
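A self-contained sketch of the split-at-anchors idea, with the limit argument in its proper place and plain-text pieces at even indexes. The function name linkifyOutsideAnchors, the '<a href="$0">$0</a>' replacement, and the sample input are assumptions for illustration; the URL pattern is adapted from the question's.

```php
<?php
// Hypothetical helper: only link URLs that are not inside an existing <a>...</a>.
function linkifyOutsideAnchors(string $input): string
{
    // -1 = no limit; PREG_SPLIT_DELIM_CAPTURE puts the captured anchors at odd indexes.
    $parts = preg_split('~(<a\b.*?</a>)~is', $input, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // odd index: an existing anchor, keep it as-is
        }
        // Even index: plain text between anchors, wrap bare URLs in links.
        $parts[$i] = preg_replace('~https?://\w+[^ ,"\n\r\t<]*~i', '<a href="$0">$0</a>', $part);
    }
    return implode('', $parts);
}

$input = '<a href="http://www.example.com">test</a> http://www.example.com';
echo linkifyOutsideAnchors($input), "\n";
```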
