Php, Regex preg_replace_callback - php

I am using preg_replace_callback. Here is what I am trying to do:
$result = '[code]some code here[/code]';
$result = preg_replace_callback('/\[code\](.*)\[\/code\]/is', function($matches){
return '<div>'.trim($matches[1]).'</div>';
}, $result);
The idea is to replace every [code] with <div> and every [/code] with </div>, and to trim the code between them.
The problem shows up with a string like this:
$result = '[code]some code[/code]some text[code]some code[/code]';
What I want is a result with two separate divs:
$result = '<div>some code</div>some text<div>some code</div>';
The result I get is:
$result = '<div>some code[/code]some text[code]some code</div>';
I know the reason and I understand the regex, but I couldn't come up with a solution. If anyone knows how to make it work I would be very thankful. Thank you all and have a nice day.

Your problem is greedy matching:
/\[code\](.*?)\[\/code\]/is
Should behave as you want it to.
Regex repetition is greedy by default: a quantifier captures as many characters as it can, then gives them back one at a time if it finds that it can't match what's left of the pattern after the repetition. By adding a question mark after the quantifier, you indicate that you want to match non-greedily, or lazily, meaning that the engine will try to match the rest of the regular expression first and grow the repetition only when it has to.
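To make the difference concrete, here is a minimal sketch that runs the greedy and the lazy pattern on the sample string from the question. The variable names $input, $wrap, $greedy and $lazy are only illustrative and do not come from the original posts.

```php
<?php
// Sample string from the question.
$input = '[code]some code[/code]some text[code]some code[/code]';

// Same callback as in the question: wrap the trimmed capture in a div.
$wrap = function ($matches) {
    return '<div>' . trim($matches[1]) . '</div>';
};

// Greedy (.*) runs from the first [code] to the LAST [/code].
$greedy = preg_replace_callback('/\[code\](.*)\[\/code\]/is', $wrap, $input);

// Lazy (.*?) stops at the NEAREST [/code], so each block is matched on its own.
$lazy = preg_replace_callback('/\[code\](.*?)\[\/code\]/is', $wrap, $input);

echo $greedy, "\n"; // <div>some code[/code]some text[code]some code</div>
echo $lazy, "\n";   // <div>some code</div>some text<div>some code</div>
```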

You don't need preg_replace_callback(), since the pattern itself can capture the "trimmed" content:
$pattern = '~\[code]\s*+((?>[^[\s]++|\s*+(?!\[/code])\[?+)*+)\s*+\[/code]~i';
$replacement = '<div>$1</div>';
$result = preg_replace($pattern, $replacement, $result);
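If the possessive quantifiers above are hard to read, the same idea (let the pattern do the trimming, so no callback is needed) can be sketched with a lazy capture surrounded by \s*. This is a simpler substitute for the pattern above, not the same pattern, and it has not been tuned for performance on long inputs. The sample input is made up for illustration.

```php
<?php
// Hypothetical input with extra whitespace inside the tags.
$input = "[code]  some code \n[/code]some text[CODE]\tmore code[/code]";

// \s* on both sides swallows leading and trailing whitespace, so group 1
// already holds the trimmed content. "s" lets . match newlines, "i" makes
// the tag names case-insensitive.
$result = preg_replace('~\[code\]\s*(.*?)\s*\[/code\]~is', '<div>$1</div>', $input);

echo $result; // <div>some code</div>some text<div>more code</div>
```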

Related

How to remove repeated sequence of characters in a string?

Imagine if:
$string = "abcdabcdabcdabcdabcdabcdabcdabcd";
How do I remove the repeated sequence of characters (all characters, not just alphabets) in the string so that the new string would only have "abcd"? Perhaps running a function that returns a new string with removed repetitions.
$new_string = remove_repetitions($string);
The input string always has the form shown above. I don't know how else to explain it, since English is not my first language. Other examples are:
$string = "EqhabEqhabEqhabEqhabEqhab";
$string = "o=98guo=98guo=98gu";
Note that I want it to work with other sequences of characters as well. I tried using regex but I couldn't figure out a way to accomplish it. I am still new to PHP and regex.
For details, see https://algorithms.tutorialhorizon.com/remove-duplicates-from-the-string/
Different programming languages have different ways to remove duplicate characters from a string.
Example in PHP:
<?php
$str = "Hello World!";
echo count_chars($str,3);
?>
Output: " !HWdelor" (each character once, in byte order, so the space comes first)
https://www.w3schools.com/php/func_string_count_chars.asp
Here, if we wish to remove the repeating substrings, I can't think of a way other than knowing in advance what we wish to keep, since the patterns seem complicated.
In that case, we could simply use a capturing group for the desired output and then remove everything else:
(abcd|Eqhab|guo=98)
I'm guessing there should be a simpler way to do this, though.
Test
$re = '/.+?(abcd|Eqhab|guo=98)\1.+/m';
$str = 'abcdabcdabcdabcdabcdabcdabcdabcd
EqhabEqhabEqhabEqhabEqhab
o98guo=98guo=98guo=98guo=98guo=98guo=98guo98';
$subst = '$1';
$result = preg_replace($re, $subst, $str);
echo $result;
You did not say what exactly to remove. A "sequence of characters" can be as small as a single character.
So this simple regex should work:
// PHP's PCRE functions have no "g" flag; preg_replace already replaces every match.
preg_replace('/(.)(?=.*?\1)/', '', 'abcdabcdabcdabcdabcdabcd');
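Neither answer above finds the repeated unit without knowing it in advance. One possible approach, sketched here under the assumption that the whole string consists of nothing but copies of one unit (as in the question's examples), is a backreference anchored at both ends. The function name remove_repetitions is taken from the question; the pattern is a suggestion, not something from the original answers.

```php
<?php
function remove_repetitions(string $string): string
{
    // ^(.+?)  lazily tries the shortest candidate unit first
    // \1+$    requires the rest of the string to be one or more copies of it
    // "s"     lets . match newlines too ("all characters")
    // If nothing matches, or PCRE reports an error, return the input unchanged.
    return preg_replace('/^(.+?)\1+$/s', '$1', $string) ?? $string;
}

echo remove_repetitions("abcdabcdabcdabcdabcdabcdabcdabcd"), "\n"; // abcd
echo remove_repetitions("EqhabEqhabEqhabEqhabEqhab"), "\n";        // Eqhab
echo remove_repetitions("o=98guo=98guo=98gu"), "\n";               // o=98gu
echo remove_repetitions("no repetition"), "\n";                    // no repetition
```

Because the capture is lazy, the shortest repeating unit wins: "aaaa" becomes "a", not "aa".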

PHP how to get substring between certain keywords

I have a (strange) string like:
EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR
The pattern I need to look for can only be defined by keywords: EREF+, MREF+, CRED+ and others. I know there are 19 keywords, but the string may contain different subsets of these 19 keywords. I don't know if the order stays the same, from what I can tell EREF+ will most likely be the first keyword, but the order may as well differ. I also don't know which of the 19 keywords might be the last one in the string as that may change case by case.
My first approach was to just use explode() twice, with keyword 1 and keyword 2 – but if the keywords change order (and I cannot guarantee they don't) I would have to go through all possible combinations.
Anyway, here's the first (working) code I used:
<?php
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";
function getBetween($content,$start,$end){
$r = explode($start, $content);
if (isset($r[1])){
$r = explode($end, $r[1]);
return $start.$r[0];
}
return '';
}
$start = "EREF+";
$end = "MREF+";
$output = getBetween($string,$start,$end);
echo $output;
?>
So now I am looking into regex to come up with a solution that extracts a substring between two keywords, where any of the keywords can be the start delimiter while any other keyword may be the end delimiter.
Since there are literally thousands of regex questions around, I took some time and tried to adapt from other solutions, but no success until now. I must confess regex is voodoo to me and I cannot seem to remember the patterns for more than a minute. I found this thread which is pretty close to what I am trying to achieve, and tried a few tweaks but I cannot get it to work properly.
Here's my code so far:
<?php
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";
$matches = array();
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];
$pattern = sprintf('/(?:%s):(.*?)/', join('|', array_map(function($keyword) {
return preg_quote($keyword, '/');
}, $keywords)));
preg_match_all($pattern, $string, $matches);
print_r($matches);
?>
... whereas the constructed pattern looks like this:
/(?:EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+):(.*?)/
Can anyone advise please? Any help appreciated!
Thanks
You can use this regex:
/(?<=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+)(.+?)(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$)/
It will match the strings between defined keywords.
(?<=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+) # look backward for a keyword
(.+?) # match one or more characters, non-greedy
(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$) # Look forward for a keyword or end of string
Edit:
If you want to know what keyword caused the split you can use this regex:
/((?:EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+))(.+?)(?=EREF\+|MREF\+|CRED\+|SVWZ\+|ABWA\+|$)/
It will capture the first keyword and the text between keywords.
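As a rough sketch of how that second regex could be used from PHP, the snippet below builds the alternation from the keyword list (reusing preg_quote as in the question) and collects the results into an array keyed by keyword. The names $alternation and $fields are made up for this example, and only five of the 19 keywords are listed because those are the only ones given.

```php
<?php
$string = "EREF+012345678901234MREF+ABCDEF01234567890123CRED+DE12ABC01234567890SVWZ+ABCEDFG HIJ 01234567890 123,45ABWA+ABCDEFGHIJKLMNOPQR";
$keywords = ['EREF+', 'MREF+', 'CRED+', 'SVWZ+', 'ABWA+'];

// Escape each keyword (the + is a regex metacharacter) and join with |.
$alternation = implode('|', array_map(function ($keyword) {
    return preg_quote($keyword, '/');
}, $keywords));

// Group 1: the keyword. Group 2: the text up to the next keyword or the end.
$pattern = '/(' . $alternation . ')(.+?)(?=' . $alternation . '|$)/';

$fields = [];
if (preg_match_all($pattern, $string, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $fields[$match[1]] = $match[2];
    }
}

print_r($fields);
// $fields['EREF+'] is "012345678901234", $fields['SVWZ+'] is
// "ABCEDFG HIJ 01234567890 123,45", and so on.
```

Since the keywords are matched wherever they occur, the order in which they appear in the string does not matter. A keyword that happens to occur inside a value would split that value, though.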

preg_replace with Regex - find number-sequence in URL

I'm a regex newbie, so sorry for this "simple" question:
I've got a URL like the following:
http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx
What I'm trying to achieve is getting the number sequence (the job ID) right before the ".aspx" with preg_replace.
I've already figured out that the regex for finding it could be
(?!.*-).*(?=\.)
Now preg_replace needs the opposite of that regular expression. How can I achieve that? Also worth mentioning:
The URL can contain multiple numbers. I only need the sequence right before ".aspx". Also, there could be some parameters after the ".aspx", like "&mobile=true".
Thank you for your answers!
You can use:
$re = '/[^-.]+(?=\.aspx)/i';
preg_match($re, $input, $matches);
echo $matches[0]; //=> 146370543
This matches a run of characters that are neither hyphens nor dots and that is followed by .aspx, which is checked with the lookahead (?=\.aspx).
You can just use preg_match (you don't need preg_replace, as you don't want to change the original string) and capture the number before the .aspx. Assuming .aspx is always at the end of the string, the simplest way I could think of is:
<?php
$string = "http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx";
$regex = '/([0-9]+)\.aspx$/';
preg_match($regex, $string, $results);
print $results[1];
?>
A short explanation:
$results contains an array of results. The first element holds the text matched by the complete regex, which would be 146370543.aspx in this example. The second element holds the group captured by the parentheses around [0-9]+.
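The question mentions that parameters such as "&mobile=true" may follow ".aspx", in which case the $ anchor above would not match. A possible variation, sketched here with the sample URL plus that suffix, accepts either the end of the string or a separator character after ".aspx". The variable names $url and $jobId and the choice of separators (?, & and #) are assumptions for this example.

```php
<?php
$url = 'http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-146370543.aspx&mobile=true';

$jobId = null;
// (\d+)           the digits directly before ".aspx"
// (?:[?&#]|$)     ".aspx" must be followed by a separator or the end of the string
if (preg_match('/(\d+)\.aspx(?:[?&#]|$)/i', $url, $m)) {
    $jobId = $m[1];
}

echo $jobId; // 146370543
```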
You can get the opposite by using this regex:
(\D*)\d+(.*)
MATCH 1
1. [0-100] `http://stellenanzeige.monster.de/COST-ENGINEER-AUTOMOTIVE-m-w-Job-Mainz-Rheinland-Pfalz-Deutschland-`
2. [109-114] `.aspx`
And if you just want the number for that URL, you can use this regex:
(\d+)

I need to find a way to explode a specific string that has quotes in it

I'm having serious trouble with this and I'm not really experienced enough to understand how I should go about it.
To start off I have a very long string known as $VC. Each time it's slightly different but will always have some things that are the same.
$VC is an htmlspecialchars() string that looks something like
Example Link... Lots of other stuff in between here... 80] ,[] ,"","3245697351286309258",[] ,["812750926... and it goes on ...80] ,[] ,"","6057413202557366578",[] ,["103279554... and it continues on
In this case the <a> tag is always the same so I take my information from there. The numbers listed after it such as ,"3245697351286309258",[] and ,"6057413202557366578",[] will also always be in the same format, just different numbers and one of those numbers will always be a specific ID.
I then find the specific ID I want, which is always the number between pid%3D and %26oid.
$pid = explode("pid%3D", $VC, 2);
$pid = explode("%26oid", $pid[1], 2);
$pid = $pid[0];
In this case that number is 6057413202557366578. Next I want to explode $VC in a way that lets me put everything after ,"6057413202557366578",[] into a variable as its own string.
This is where things start to break down. What I want to do is the following
$vinfo = explode(',"'.$pid.'",[]',$VC,2);
$vinfo = $vinfo[1]; //Everything after the value I used to explode it.
Now naturally I did look around and try other things such as preg_split and preg_replace but I've got to admit, it is beyond me and as far as I can tell, those don't let you put your own variable in the middle of them (e.g. ',"'.$pid.'",[]').
If I'm understanding the whole regular expression idea, there might be other problems in that if I look for it without the $pid variable (e.g. just the surrounding characters), it will pick up the similar parts of the string before it gets to the one I want, (e.g. the ,"3245697351286309258",[]).
I hope I've explained this well enough, the main question though is - How can I get the information after that specific part of the string (',"'.$pid.'",[]') into a variable?
I hope this does what you want:
pid%3D(?P<id>\d+).*?"(?P=id)",\[\](?P<vinfo>.*?)}\);<\/script>
It captures the number after pid%3D in group id, and everything after "id",[] (until the next occurrence of });</script>) in group vinfo.
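For reference, this is roughly how that pattern could be called from PHP. The sample $VC below is a made-up, heavily shortened stand-in for the real string, and the ~ delimiter and the s modifier are choices made for this sketch rather than part of the original answer.

```php
<?php
// Hypothetical shortened input; the real $VC is much longer.
$VC = 'pid%3D6057413202557366578%26oid ... ,"","3245697351286309258",[] ,["812750926 ... ,"","6057413202557366578",[] ,["103279554 ...});</script>';

// Named groups: "id" is the number after pid%3D, (?P=id) requires that same
// number again between quotes, and "vinfo" is everything up to });</script>.
$pattern = '~pid%3D(?P<id>\d+).*?"(?P=id)",\[\](?P<vinfo>.*?)\}\);</script>~s';

if (preg_match($pattern, $VC, $m)) {
    echo $m['id'], "\n";    // 6057413202557366578
    echo $m['vinfo'], "\n"; //  ,["103279554 ...
}
```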
The problem of capturing more than you want is solved with capture groups: you wrap part of a regular expression in parentheses to capture it.
You can use preg_match_all for more robust regular expression capture. It fills an array whose first element holds the strings that matched the entire pattern, followed by one element per capture group holding the partial matches for that group. We'll start by matching the parts of the string you want. There are no capture groups at this point:
$text = 'Example Link... Lots of other stuff in between here... 80] ,[] ,"","3245697351286309258",[] ,["812750926... and it goes on ...80] ,[] ,"","6057413202557366578",[] ,["103279554... and it continues on"';
$pattern = '/,"\\d+",\\[\\]/';
preg_match_all($pattern,
$text,
$out, PREG_PATTERN_ORDER);
echo $out[0][0]; // prints ,"3245697351286309258",[]
Now to get just the pids into a variable, you can add a capture group to your pattern. The capture group is created by adding parentheses:
$text = ...
$pattern = '/,"(\\d+)",\\[\\]/'; // the \d+ match will be captured
preg_match_all($pattern,
$text,
$out, PREG_PATTERN_ORDER);
$pids = $out[1];
echo $pids[0]; // echo 3245697351286309258
Notice the first (and only in this case) capture group is in $out[1] (which is an array). What we have captured is all the digits.
To capture everything else, assuming everything is between square brackets, you could match more and capture it. To address the question, we'll use two capture groups. The first will capture the digits and the second will capture everything matching square brackets and everything in between:
$text = ...;
$pattern = '/,"(\\d+)",\\[\\] ,(\\[.+?\\])/';
preg_match_all($pattern,
$text,
$out, PREG_PATTERN_ORDER);
$pids = $out[1];
$contents = $out[2];
// Loop instead of indexing [0] and [1] directly, so a missing second match does not raise a warning.
foreach ($pids as $i => $pid) {
    echo $pid . "=" . $contents[$i] . "\n";
}
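On the worry in the question that the regex functions don't let you put your own variable in the middle of a pattern: they do, as long as the variable is escaped with preg_quote() first. Below is a minimal sketch using a made-up, shortened $VC; the names $delimiter and $parts are only illustrative.

```php
<?php
// Hypothetical shortened input; the real $VC is much longer.
$VC = 'pid%3D6057413202557366578%26oid ... 80] ,[] ,"","3245697351286309258",[] ,["812750926... 80] ,[] ,"","6057413202557366578",[] ,["103279554...';

$vinfo = '';
if (preg_match('/pid%3D(\d+)%26oid/', $VC, $m)) {
    $pid = $m[1];

    // The literal text to split on, built from the variable.
    $delimiter = ',"' . $pid . '",[]';

    // preg_quote() escapes [ ] and any other metacharacters, so the
    // delimiter is matched literally. A limit of 2 splits once only.
    $parts = preg_split('/' . preg_quote($delimiter, '/') . '/', $VC, 2);

    // Everything after the delimiter, or '' if it was not found.
    $vinfo = $parts[1] ?? '';
}

echo $vinfo; //  ,["103279554...
```

For a fixed delimiter like this one, explode($delimiter, $VC, 2) gives the same result. One thing that may be worth checking: if $VC really has been passed through htmlspecialchars(), the double quotes in it are stored as &quot;, so a delimiter containing a literal " would not be found unless the string is first decoded with htmlspecialchars_decode() or the delimiter is written in its encoded form.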

Make me understand preg_replace

I've been looking all over the internet for some useful information and I think I found too much. I'm trying to understand regular expressions but don't get it.
Let's say, for instance, $data="A bunch of text [link=123] another bunch of text.", and the tag should get replaced with "< a href=\"123.html\">123< /a>".
I've been trying around a lot with code similar to this:
$find = "/[link=[([0-9])]/";
$replace = "< a href=\"$1\">$1< /a>";
echo preg_replace ($find, $replace, $data);
but the output is always the same as the original $data.
I think I have to see something relevant to my problem to understand the basics.
Remove the extra [] around the (), and add + after the [0-9] to quantify it. Also, escape the [] that make up the tag itself.
$find = "/\[link=(\d+)\]/"; // "\d" is equivalent to "[0-9]"
$replace = "<a href=\"$1.html\">$1</a>";
echo preg_replace($find,$replace,$data);
The regex would be \[link=([\d]+)\]
A good source for a quick overview of regular expressions can be found here: http://www.regular-expressions.info/
If you are really interested in the power of regular expressions, you should buy this book: Mastering Regular Expressions
A good program for testing your regex on a Windows client is RegEx-Trainer.
You are missing the + quantifier, and as a result your pattern only matches if there is a single digit following link=.
There is also an extra pair of [..]; as a result, the outer [...] is treated as a character class.
You also forgot to escape the opening [ and the closing ].
Solution:
$find = "/\[link=([0-9]+)\]/";
<?php
$data= "A bunch of text [link=123] another bunch of text.";
$find = '/\[link=([0-9]+?)\]/';
echo preg_replace($find, '<a href="$1.html">$1</a>', $data);
