How to remove repeated sequence of characters in a string? - php

Imagine if:
$string = "abcdabcdabcdabcdabcdabcdabcdabcd";
How do I remove the repeated sequence of characters (all characters, not just alphabets) in the string so that the new string would only have "abcd"? Perhaps running a function that returns a new string with removed repetitions.
$new_string = remove_repetitions($string);
The input string always looks like the one above: one sequence repeated back to back. I don't know how else to explain since English is not my first language. Other examples are:
$string = "EqhabEqhabEqhabEqhabEqhab";
$string = "o=98guo=98guo=98gu";
Note that I want it to work with other sequences of characters as well. I tried using Regex but I couldn't figure out a way to accomplish it. I am still new to PHP and Regex.

For details: https://algorithms.tutorialhorizon.com/remove-duplicates-from-the-string/
Different programming languages have different ways to remove duplicate characters from a string.
Example in PHP:
<?php
$str = "Hello World!";
echo count_chars($str,3);
?>
Output: " !HWdelor" (the result starts with a space, because the space in the input counts as a character)
https://www.w3schools.com/php/func_string_count_chars.asp

Here, if we wish to remove the repeating substrings, I can't think of a way other than knowing in advance what we wish to collect, since the patterns seem complicated.
In that case, we could simply use a capturing group, add our desired output to it, then remove everything else:
(abcd|Eqhab|guo=98)
I'm guessing there should be a simpler way to do this, though.
Test
$re = '/.+?(abcd|Eqhab|guo=98)\1.+/m';
$str = 'abcdabcdabcdabcdabcdabcdabcdabcd
EqhabEqhabEqhabEqhabEqhab
o98guo=98guo=98guo=98guo=98guo=98guo=98guo98';
$subst = '$1';
$result = preg_replace($re, $subst, $str);
echo $result;

You did not say what exactly to remove. A "sequence of characters" can be as small as just 1 character.
So this simple regex should work:
preg_replace('/(.)(?=.*?\1)/', '', 'abcdabcdabcdabcdabcdabcd');
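If the whole string is assumed to be one unit repeated back to back, as in the question's examples, a backreference anchored at both ends can find the unit without listing it in advance. This is a minimal sketch under that assumption; remove_repetitions is the name used in the question, and the pattern is one possible approach rather than a definitive one:

```php
<?php
// Sketch: collapse a string that consists of one unit repeated two or more times.
// Assumes the entire string is that repetition; any other input is returned unchanged.
function remove_repetitions(string $string): string
{
    // \A(.+?)  shortest candidate unit, starting at the very beginning
    // \1+\z    the same unit repeated one or more times, up to the very end
    // The s modifier lets . match newlines as well.
    return preg_replace('/\A(.+?)\1+\z/s', '$1', $string) ?? $string;
}

echo remove_repetitions("abcdabcdabcdabcdabcdabcdabcdabcd"), "\n"; // abcd
echo remove_repetitions("EqhabEqhabEqhabEqhabEqhab"), "\n";        // Eqhab
echo remove_repetitions("o=98guo=98guo=98gu"), "\n";               // o=98gu
```

Because the unit is matched lazily, the shortest repeating unit wins, so "aaaa" becomes "a" rather than "aa".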

Related

Multiple preg_replace

I have many strings that all start with # and a pseudonym, and I want to change these pseudonyms via regex to the real names.
I don't have many pseudonyms (maybe 5 to 10), so I can go with a simple regex like:
$find = array('#alex', '#donald');
$replace = array('Alex A.', 'Donald B.' );
$result= preg_replace($find, $replace, $feed->itemTitle);
My problem is that I already have a preg_replace on these strings that removes the link. So far this is my regex:
<?php echo preg_replace('#(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?).....#',' ',$feed->itemTitle); ?>
I can't come up with a solution that combines the two regexes (regex is something I am not comfortable with).
Already having a preg_replace for the links isn't a problem; don't worry about that.
If you want, you can build a giant pattern with capture groups to be used with preg_replace_callback, which lets the callback function choose the replacement string to return according to the capture group number. However, this isn't a good way.
Since you want to replace fixed strings (#alex and #donald are fixed strings), the best and fastest way is to use strtr (even if you parse the string a second time):
$trans = array('#alex' => 'Alex A.',
'#donald' => 'Donald B.');
$result = strtr($feed->itemTitle, $trans);
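As a self-contained illustration of the strtr approach, here is a small sketch. The $title value is a made-up stand-in for $feed->itemTitle; only the translation table comes from the answer above:

```php
<?php
// Hypothetical feed title; the real value would come from $feed->itemTitle.
$title = '#alex and #donald posted a new photo';

$trans = [
    '#alex'   => 'Alex A.',
    '#donald' => 'Donald B.',
];

// strtr() with an array replaces each key with its value in a single pass,
// trying the longest keys first and never re-scanning text it has replaced.
$result = strtr($title, $trans);

echo $result, "\n"; // Alex A. and Donald B. posted a new photo
```

Since no regex is involved, the # in the keys needs no escaping, and the existing link-removing preg_replace can run before or after this step.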

Converting links occurring inside a string

I am attempting to change a URL occurring in a string, e.g. http://www.bbc.co.uk/, so that it appears inside an HTML link, e.g. <a href="http://www.bbc.co.uk/">http://www.bbc.co.uk/</a>.
However, for some reason my regex conversion does not work. Can someone please point me in the correct direction?
$text = "I love this website http://www.bbc.co.uk/";
$x = preg_replace("#[a-z]+://[^<>\s]+[[a-z0-9]/]#i", "\\0", $text);
var_dump($x);
outputs I love this website http://www.bbc.co.uk/ (No html link)
Your weird character class is at fault:
[[a-z0-9]/]
Double square brackets are for POSIX character classes like [[:digit:]].
You meant to write just:
[a-z0-9/]
It is because your regex is not giving you a match: the trailing [[a-z0-9]/] is read as a character class followed by a literal /], which never appears in the text. Try something like this:
$pattern = '#https?://.*\b#i';
$replace = '<a href="$0">$0</a>';
$x = preg_replace($pattern, $replace, $text);
Note that I am not actually trying to validate the URL format here, so I just accept anything like http(s):// up to the last word boundary on the line (the .* is greedy). It didn't seem as if you were going for a true URL validation regex anyway (i.e. validating that there is at least one ., that the TLD component has 2-6 characters, etc.), so I figured I would give you the simplest pattern that would match.
Use this:
$x = preg_replace('#http://[?=&a-z0-9._/-]+#i', '<a target="_blank" href="$0">$0</a>', $text);
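To see the replacement end to end, here is a minimal sketch using the question's $text. The pattern is a deliberately loose assumption (anything after http:// or https:// up to whitespace or an angle bracket), not a URL validator:

```php
<?php
$text = "I love this website http://www.bbc.co.uk/";

// Match http:// or https:// followed by a run of characters that are
// neither whitespace nor angle brackets, and wrap the match ($0) in a link.
$linked = preg_replace('#https?://[^\s<>]+#i', '<a href="$0">$0</a>', $text);

echo $linked, "\n";
// I love this website <a href="http://www.bbc.co.uk/">http://www.bbc.co.uk/</a>
```

A pattern this loose also swallows trailing punctuation such as a full stop right after the URL, so real input may need a stricter character class.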

A more efficient string cleaning Regex in PHP

Okay, I was hoping someone could help me with a little regex-fu.
I am trying to clean up a string.
Basically, I am:
Replacing all characters except A-Za-z0-9 with a replacement.
Replacing consecutive duplicates of the replacement with a single instance of the replacement.
Trimming the replacement from the beginning and end of the string.
Example Input:
(&&(%()$()#&#&%&%%(%$+-_The dog jumped over the log*(&)$%&)#)##%&)&^)##)
Required Output:
The+dog+jumped+over+the+log
I am currently using this very discombobulated code and just know there is a much more elegant way to accomplish this....
function clean($string, $replace){
    $ok = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    $ok .= $replace;
    $pattern = "/[^".preg_quote($ok, "/")."]/";
    return trim(preg_replace('/'.preg_quote($replace.$replace).'+/', $replace, preg_replace($pattern, $replace, $string)),$replace);
}
Could a Regex-Fu Master please grace me with a simpler/more efficient solution?
A much better solution suggested and explained by Botond Balázs and hakre:
function clean($string, $replace, $skip=""){
    // Escape $skip
    $escaped = preg_quote($replace.$skip, "/");
    // Regex pattern
    // Replace all consecutive occurrences of "Not OK"
    // characters with the replacement
    $pattern = '/[^A-Za-z0-9'.$escaped.']+/';
    // Execute the regex
    $result = preg_replace($pattern, $replace, $string);
    // Trim and return the result
    return trim($result, $replace);
}
I'm not a "regex ninja" but here's how I would do it.
function clean($string, $replace){
    // Remove all "not OK" characters from the beginning and the end:
    $result = preg_replace('/^[^A-Za-z0-9]+/', '', $string);
    $result = preg_replace('/[^A-Za-z0-9]+$/', '', $result);
    // Replace all consecutive occurrences of "not OK"
    // characters with the replacement:
    $result = preg_replace('/[^A-Za-z0-9]+/', $replace, $result);
    return $result;
}
I guess this could be simplified more, but when dealing with regexes, clarity and readability are often more important than being clever or writing super-optimal code.
Let's see how it works:
/^[^A-Za-z0-9]+/:
^ matches the beginning of the string.
[^A-Za-z0-9] matches all non-alphanumeric characters
+ means "match one or more of the previous thing"
/[^A-Za-z0-9]+$/:
same thing as above, except $ matches the end of the string
/[^A-Za-z0-9]+/:
same thing as above, except it matches mid-string too
EDIT: OP is right that the first two can be replaced with a call to trim():
function clean($string, $replace){
    // Replace all consecutive occurrences of "not OK"
    // characters with the replacement:
    $result = preg_replace('/[^A-Za-z0-9]+/', $replace, $string);
    return trim($result, $replace);
}
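Running the shortened version against the example input from the question shows that it produces the required output. This is a sketch only; clean_sketch is a hypothetical name chosen to keep the example separate from the functions above:

```php
<?php
// Sketch: collapse each run of non-alphanumeric characters into one
// replacement, then trim the replacement from both ends.
function clean_sketch(string $string, string $replace): string
{
    $result = preg_replace('/[^A-Za-z0-9]+/', $replace, $string);
    // trim() treats its second argument as a list of characters to strip.
    return trim($result, $replace);
}

$input = '(&&(%()$()#&#&%&%%(%$+-_The dog jumped over the log*(&)$%&)#)##%&)&^)##)';
echo clean_sketch($input, '+'), "\n"; // The+dog+jumped+over+the+log
```

The + quantifier is what removes the need for a separate "collapse duplicates" step: each run of unwanted characters is replaced once, so consecutive replacements never appear.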
I don't want to sound super-clever, but I would not call it regex-fu.
What you do is actually pretty much in the right direction because you use preg_quote; many others are not even aware of that function.
However, it is probably in the wrong place, because you quote characters for use inside a character class, and that has similar but different quoting rules from the rest of a regex.
Additionally, regular expressions were designed with a case like yours in mind. That is probably the part where you are looking for a wizard. Let's see some options for making your negated character class more compact (I leave out the code that generates it to make this more visible):
[^0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]
There are constructs like 0-9, A-Z and a-z that can represent exactly that. As you can see, - is a special character inside a character class: it is not meant literally but denotes a range from one character to another:
[^0-9A-Za-z]
So that is already more compact and represents the same. There are also notations like \d and \w which might be handy in your case. But I take the first variant for a moment, because I think it's already pretty visible what it does.
The other part is the repetition. Let's see, there is + which means one or more. So you want to replace one or more of the non-matching characters. You use it by adding it at the end of the part that should match one or more times (and by default it's greedy, so if there are 5 characters, those 5 will be taken, not 4):
[^0-9A-Za-z]+
I hope this is helpful. Another step would be to also just drop the non-matching characters at the beginning and end, but it's early in the morning and I'm not that fluent with that.
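One caveat about the shorthand notations mentioned above: \w also matches the underscore, so its negation \W is not an exact substitute for [^0-9A-Za-z]. A small sketch with a made-up input string shows the difference:

```php
<?php
// Hypothetical input containing an underscore, a space and a hyphen.
$input = 'snake_case name-2';

// Explicit ranges: the underscore is not in the class, so it is replaced.
$explicit = preg_replace('/[^0-9A-Za-z]+/', '+', $input);
echo $explicit, "\n";  // snake+case+name+2

// \W is the negation of \w, and \w includes the underscore, so it is kept.
$shorthand = preg_replace('/\W+/', '+', $input);
echo $shorthand, "\n"; // snake_case+name+2
```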

Replace from one custom string to another custom string

How can I replace a string starting with 'a' and ending with 'z'?
Basically, I want to be able to do the same thing as str_replace but be indifferent to the values in between two strings in a 'haystack'.
Is there a built-in function for this? If not, how would I go about efficiently making a function that accomplishes it?
That can be done with Regular Expression (RegEx for short).
Here is a simple example:
$string = 'coolAfrackZInLife';
$replacement = 'Stuff';
$result = preg_replace('/A.*Z/', $replacement, $string);
echo $result;
The above example will return coolStuffInLife
A little explanation of the given RegEx /A.*Z/:
- The slashes mark the beginning and end of the RegEx;
- A and Z are the start and end characters between which you need to replace;
- . matches any single character (except a newline by default);
- * means zero or more of the preceding element (in our case, any characters);
- You can optionally use + instead of *, which will match only if there is something in between.
Take a look at Rubular.com for a simple way to test your RegExes. It also provides a short RegEx reference.
$string = "I really want to replace aFGHJKz with booo";
$new_string = preg_replace('/a[a-zA-Z]+z/', 'boo', $string);
echo $new_string;
Be wary of the regex: do you want to find the first z or the last z? Can only letters appear in between? Alphanumerics? There are various scenarios you'd need to explain before I could expand on the regex.
Use preg_replace so you can use regex patterns.
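The "first z or last z" question comes down to greedy versus lazy matching. A minimal sketch with a made-up string that contains two candidate end characters:

```php
<?php
// Hypothetical input: one 'a' followed by two possible closing 'z' characters.
$string = '[aXXzYYz]';

// Greedy: .* extends as far as possible, so the match ends at the last z.
$greedy = preg_replace('/a.*z/', 'boo', $string);
echo $greedy, "\n"; // [boo]

// Lazy: .*? extends as little as possible, so the match ends at the first z.
$lazy = preg_replace('/a.*?z/', 'boo', $string);
echo $lazy, "\n";   // [booYYz]
```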

PHP regex title conversion / negative look ahead / toLowerCase

I'm trying to convert some titles in my html pages to <h2>. The pattern is simple.
<?php
$test = "<p><strong>THIS IS A TEST</strong></p><div>And this is Random STUFF</div><p><strong>CP</strong></p>";
$pattern = "/<p><strong>([A-Z ]*?)<\/strong><\/p>/";
$replacement = "<h2>$1</h2>";
$test = preg_replace($pattern, $replacement, $test);
?>
Basically, grab anything that's between <p><strong></strong></p> that is capitalized. Easy enough, so here's the complicated bit.
Firstly, I need to make a single exception. <p><strong>CP</strong></p> must not be converted to <h2>. I tried adding ?!(CP) right after the <p><strong> but it doesn't work.
Secondly, I need to capitalize only the first letter. When I use "ucfirst" with "strtolower" on the preg_replace (e.g. ucfirst(strtolower(preg_replace($pattern, $replacement, $test)));), it converts all the characters in the string to lowercase, and ucfirst doesn't work because it sees "<" as the first character.
Any hints or am I even going in the right direction?
EDIT
Thanks for the help, it was definitely better to use preg_replace_callback. I found that all my titles were more than 3 characters so I added the limiter. Also added special characters.
Here's my final code:
$pattern = "/<p><strong>([A-ZÀ-ÿ0-9 ']{3,}?)<\/strong><\/p>/";
$replacement = "<h2>$1</h2>";
$test[$i] = preg_replace_callback($pattern, function ($matches) { return "<h2>".ucfirst(mb_strtolower($matches[1]))."</h2>"; }, $test[$i]);
Try http://php.net/manual/de/function.preg-replace-callback.php .
You can create a custom function that is called on every match. In this function you can decide a) not to replace CP and b) to return the result of ucfirst instead of the plain $1.
Hope this helps & good luck.
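Both requirements can also be handled in one call by combining a negative lookahead with a callback. This is a sketch under the assumption that titles are plain ASCII (hence strtolower rather than mb_strtolower) and that "CP" is the only exception:

```php
<?php
$test = "<p><strong>THIS IS A TEST</strong></p><div>And this is Random STUFF</div><p><strong>CP</strong></p>";

// (?!CP<\/strong>) is a negative lookahead: it rejects a match whose
// title is exactly "CP" without consuming any characters.
$pattern = '/<p><strong>(?!CP<\/strong>)([A-Z ]+)<\/strong><\/p>/';

$result = preg_replace_callback(
    $pattern,
    function (array $matches): string {
        // Lowercase the captured title, then capitalize its first letter.
        return '<h2>' . ucfirst(strtolower($matches[1])) . '</h2>';
    },
    $test
);

echo $result, "\n";
// <h2>This is a test</h2><div>And this is Random STUFF</div><p><strong>CP</strong></p>
```

The lookahead has to sit inside the pattern as (?!...) with its own parentheses; writing ?!(CP) without the enclosing group is not a lookahead.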
