Replace a character between two words - php

I have a string like blah blah [START]Hello-World[END] blah blah.
I want to replace - with , between [START] and [END].
So the result should be blah blah [START]Hello,World[END] blah blah.

I would suggest using preg_replace_callback:
$string = "blah-blah [START]Hello-World. How-are-you?[END] blah-blah" .
" [START]More-text here, [END] end of-message";
$string = preg_replace_callback('/(\[START\])(.*?)(\[END\])/', function($matches) {
return $matches[1] . str_replace("-", ",", $matches[2]). $matches[3];
}, $string);
echo $string;
Output:
blah-blah [START]Hello,World. How,are,you?[END] blah-blah [START]More,text here, [END] end of-message
The idea of the regular expression is to capture three parts: [START], [END], and the text between them. preg_replace_callback passes these three fragments to the callback, which performs a simple str_replace on the middle part and returns the three fragments joined back together.
This way the replacement happens for every occurrence of the hyphen (or whatever character you replace), not only the first one.
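The same callback idea can be wrapped in a small helper. This is only a sketch: the name replaceBetween and its parameters are made up for illustration, preg_quote is used so the markers are matched literally, and the s flag is an addition that lets the middle part span line breaks.

```php
<?php
// Hypothetical helper (not part of the answer above): replaces $search with
// $replace only inside text enclosed by the $open and $close markers.
function replaceBetween(string $text, string $open, string $close, string $search, string $replace): string
{
    // preg_quote escapes characters such as [ and ] so the markers are literal.
    $pattern = '/(' . preg_quote($open, '/') . ')(.*?)(' . preg_quote($close, '/') . ')/s';

    return preg_replace_callback($pattern, function ($m) use ($search, $replace) {
        // $m[1] and $m[3] are the markers, $m[2] is the text between them.
        return $m[1] . str_replace($search, $replace, $m[2]) . $m[3];
    }, $text);
}

echo replaceBetween('a-b [START]Hello-World. How-are-you?[END] c-d', '[START]', '[END]', '-', ','), PHP_EOL;
// prints: a-b [START]Hello,World. How,are,you?[END] c-d
```

Hyphens outside the markers are left alone because only $m[2] goes through str_replace.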

You can use a regular expression to accomplish what you need:
$string = "blah blah [START]Hello-World[END] blah blah";
$string = preg_replace('/\[START\](.*)-(.*)\[END\]/', '[START]$1,$2[END]', $string);
Here's what the regular expression does:
\[START\] The backslashes escape the square brackets, so the pattern looks for the literal text [START].
(.*) This captures everything after [START] up to the dash and is referenced later as $1.
- This matches the character you want to replace, in our case the dash.
(.*) This captures everything after the dash and is referenced later as $2.
\[END\] This matches the literal [END] that closes the section.
In the replacement, [START]$1,$2[END], the references $1 and $2 are filled in with the text captured above, with a comma where the dash used to be.
The var_dump of $string would be:
string(43) "blah blah [START]Hello,World[END] blah blah"
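One limitation is worth checking before relying on this pattern: because the first (.*) is greedy, a single pass replaces only one hyphen per [START]...[END] block, namely the last one. A quick sketch with a made-up input that has two hyphens between the markers:

```php
<?php
// Made-up input with two hyphens between the markers.
$input = 'blah [START]Hello-Big-World[END] blah';

// The greedy (.*) grabs as much as it can, so the literal "-" in the
// pattern ends up matching only the LAST hyphen before [END].
$single = preg_replace('/\[START\](.*)-(.*)\[END\]/', '[START]$1,$2[END]', $input);

echo $single, PHP_EOL; // blah [START]Hello-Big,World[END] blah
```

For text that may contain several hyphens, a callback-based replacement avoids this.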

Related

How to get a string in regex and delete other after matching the string

My input is the following:
1 blah blah blah #username_. sblah sblah sblah
The output I need is the following:
username_.
For now, I have this expression:
^.*\#([a-zA-Z0-9\.\_]+)$
which works on the following:
1 blah blah blah #username_.
but if I use it on the full line it does not work.
So it gets the username and deletes everything before it,
but how can I make it delete the rest once it has the username?
Note: I use regex101 for testing; if you know a better tool, please mention it below.
Your pattern uses ^ and $, which means it has to match the whole line, but as written it only describes part of it.
Adding a .* after the capture group lets the pattern cover the rest of the line, and it then matches as expected.
"/^.*\#([a-zA-Z0-9\.\_]+).*$/"
https://3v4l.org/i4pVd
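As a rough sketch of how the full-line pattern can be used from PHP (the variable names are made up here, and the unneeded escapes inside the pattern are dropped), either read capture group 1 with preg_match or keep only that group with preg_replace:

```php
<?php
$line = '1 blah blah blah #username_. sblah sblah sblah';
$re   = '/^.*#([a-zA-Z0-9._]+).*$/';

// Option 1: read the username from capture group 1.
$username = null;
if (preg_match($re, $line, $m)) {
    $username = $m[1];
}

// Option 2: replace the whole line with the captured group only.
$onlyUser = preg_replace($re, '$1', $line);

echo $username, PHP_EOL; // username_.
echo $onlyUser, PHP_EOL; // username_.
```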
Another way to do it is to use a partial regex like this.
It skips everything up to the # and then captures up to and including the next dot.
$str = "1 blah blah blah #username_. sblah sblah sblah";
preg_match("/.*?\#(.*?\.)/", $str, $match);
var_dump($match);
https://3v4l.org/mvBYI
To match the username in your example data, you could use preg_match and omit the $ that asserts the position at the end of the string. Note that you don't have to escape the #, the dot, or the underscore in the character class.
To get the username in your example data, you could also use:
#\K[\w.]+
That would match
# Match literally
\K Forget what was previously matched
[\w.]+ Match 1+ times a word character or a dot
Regex demo
$re = '/#\K[\w.]+/';
$str = '1 blah blah blah #username_. sblah sblah sblah #test';
preg_match($re, $str, $matches);
echo $matches[0]; // username_.
Demo php
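If a line can contain more than one name, the same \K pattern should also work with preg_match_all. A small sketch (the variable name $all is arbitrary):

```php
<?php
$str = '1 blah blah blah #username_. sblah sblah sblah #test';

// \K drops the "#" from the reported match, so $all[0] holds the bare names.
preg_match_all('/#\K[\w.]+/', $str, $all);

print_r($all[0]); // lists username_. and test
```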

PHP Regex Negation For Youtube URLs

Let's say I have HTML in a database that looks like this:
Hello world!
<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>
Blah blah blah...
https://www.youtube.com/watch?v=df82vnx07s
Blah blah blah...
<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>
Now I want to use PHP regex to grab the 2nd and 3rd URLs, but ignore the first.
The regex I have so far is:
\s*[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)
It works pretty well, but I don't know how to make it exclude/negate the first type of URL, one which starts with: href="
Please help, thanks!
You can use the "negative lookbehind" regular expression feature to accomplish what you're after. I've modified the very beginning of your regex by adding ((?<!href=[\'"])http) to implement one. Hope it helps!
$regex = '/((?<!href=[\'"])http)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)/';
$useCases = [
1 => '<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>',
2 => "<a href='https://www.youtube.com/watch?v=m7t75u72vd'>ABC</a>",
3 => 'https://www.youtube.com/watch?v=df82vnx07s',
4 => '<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>'
];
foreach ($useCases as $index => $useCase) {
$matches = [];
preg_match($regex, $useCase, $matches);
if ($matches) {
echo 'The regex was matched in usecase #' . $index . PHP_EOL;
}
}
// Echoes:
// The regex was matched in usecase #3
// The regex was matched in usecase #4
All you need is to add a (?![^<]*>) negative lookahead that fails the match if it is followed by 0+ chars other than < and then a >:
[a-zA-Z\/:.]*youtu(?:be\.com\/watch\?v=|\.be\/)([a-zA-Z0-9\-_]+)(?![^<]*>)
                                                                ^^^^^^^^^^
See the regex demo
Note I also escaped . symbols to match literal dots, and used a non-capturing group with be part. You may replace ([a-zA-Z0-9\-_]+) with [a-zA-Z0-9_-]+ if you are not interested in the capture, and you also may replace [a-zA-Z\/\/:\.]* part with a more precise pattern, like https?:\/\/[a-zA-Z.]*.
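A short sketch of the lookahead version in PHP. The sample HTML is modelled on the question (the exact anchor markup is an assumption), and ~ is used as the delimiter purely as a matter of taste:

```php
<?php
// Sample HTML modelled on the question; the anchor markup is an assumption.
$html = '<a href="https://www.youtube.com/watch?v=m7t75u72vd">ABC</a>' . "\n"
      . 'https://www.youtube.com/watch?v=df82vnx07s' . "\n"
      . '<p>https://www.youtube.com/watch?v=nvs70fh17f3fg</p>';

// The trailing (?![^<]*>) rejects any URL that sits inside a tag, i.e. one
// that is followed by a ">" with no "<" in between.
$re = '~[a-zA-Z\/:.]*youtu(?:be\.com\/watch\?v=|\.be\/)([a-zA-Z0-9\-_]+)(?![^<]*>)~';

preg_match_all($re, $html, $found);
print_r($found[1]); // video IDs of the two URLs outside the href attribute
```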
Example solution:
(?![^<]*>)[a-zA-Z\/\/:\.]*youtu(be.com\/watch\?v=|.be\/)([a-zA-Z0-9\-_]+)
Visualization with an explanation

php preg_match and regex regular expression

I want to use the regex:
/(.*)[.\s][sS](\d{1,20})[eE](\d{1,100}).*/i
to strip the episode string (e.g. S04E05) from the title of a TV series such as The Big Bang Theory S04E05.
I've tested my regex with http://www.phpliveregex.com/ and everything works fine. But when I include it in my website, I get the whole title including the episode string.
The return value of preg_match is 0.
My Code:
$ret=preg_match("/(.*)[.\s][sS](\d{1,20})[eE](\d{1,100}).*/i", $title,$output);
if($ret==1){
$title_without=$output[1];
}
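Note that inside a double-quoted string PHP processes some backslash sequences itself (for example \n, \t and \\), so it is safer to double the backslashes of regex shorthand classes or to put the pattern in single quotes.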
You can use your regex inside a preg_replace function inside single quotes so that you do not have to double backslashes:
$title= "The Big Bang Theory S04E05";
$ret=preg_replace('/^(.*)[.\s]s\d{1,20}e\d{1,100}(.*)/i', '\1\2', $title);
echo $ret;
See IDEONE demo. Result: The Big Bang Theory.
The back-references \1\2 will restore the substrings before and after the episode substring.
Since you are using /i modifier, you need not use [eE] or [Ss], just use single letters in any case.
To return the substring before the episode and the episode substring itself, just use the capturing groups with preg_match like here:
$title= "The Big Bang Theory S04E05";
$ret=preg_match('/^(.*)[.\s](s\d{1,20}e\d{1,100})/i', $title, $match);
echo $match[1] . PHP_EOL; // => The Big Bang Theory
echo $match[2]; // => S04E05
See another demo
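If the season and episode numbers are needed separately, named groups are one option. The helper below is a hypothetical sketch (parseEpisodeTitle and the array keys are made-up names), built on the same pattern as above:

```php
<?php
// Hypothetical helper: splits "Name S04E05" style titles into their parts.
function parseEpisodeTitle(string $title): ?array
{
    // Single quotes keep the backslashes intact; /i covers s/S and e/E.
    $re = '/^(?<name>.*)[.\s]s(?<season>\d{1,20})e(?<episode>\d{1,100})/i';

    if (!preg_match($re, $title, $m)) {
        return null; // no episode marker found
    }

    return [
        'name'    => $m['name'],
        'season'  => (int) $m['season'],
        'episode' => (int) $m['episode'],
    ];
}

print_r(parseEpisodeTitle('The Big Bang Theory S04E05'));
// name: The Big Bang Theory, season: 4, episode: 5
```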
You could look for words and match all but the last one:
$matches = array();
$regex = "/^([\w ]*) [\w]+$/i";
$title = "The Big Bang Theory S04E05";
preg_match_all ($regex, $title, $matches);
Now all your matches are in $matches; $matches[1][0] holds the title without its last word.

Remove hashtags from the end of a sentence

I would like to remove all words from the end of a text that start with a space and a # sign.
URLs or hashtags within a sentence should not be removed.
Example text:
hello world #dontremoveme foobar http://example.com/#dontremoveme #remove #removeme #removeüäüö
I tried this but it removes all hashtags:
$tweet = "hello world #dontremoveme foobar http://example.com/#dontremoveme #remove #removeme #removeüäüö";
preg_match_all("/(#\w+)/", $tweet, $matches);
var_dump( $matches );
My idea is to check every word starting at the end of the text for a leading # with a space in front, until it's no longer the case.
How to translate that into a regular expression?
You could use something like so: ( #[^# ]+?)+$ and replace it with an empty string.
An example is available here. Since you have non-ASCII characters, the negated class [^# ] (any character except # and a space) is what lets the pattern cope with them.
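The following regex matches all words starting with a [Space]# at the end of the line (the g flag is regex101 notation; leave it out in PHP, where preg_replace already replaces every match).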
/( #\S+)*$/g
https://regex101.com/r/eH4bJ2/1
This will do the job:
$tweet = "hello world #dontremoveme foobar http://example.com/#dontremoveme #remove #removeme #removeüäüö";
$res = preg_replace("/ #\p{L}+\b(?!\s+\p{L})/u", '', $tweet);
echo $res,"\n";
Output:
hello world #dontremoveme foobar http://example.com/#dontremoveme
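Another way to express the asker's idea of "check from the end until a word no longer starts with #" is to anchor the whole run of hashtags to the end of the text. This is a sketch of that alternative, not the pattern from the answer above; it assumes the source file and the input are UTF-8:

```php
<?php
$tweet = 'hello world #dontremoveme foobar http://example.com/#dontremoveme #remove #removeme #removeüäüö';

// One or more " #word" groups, optionally followed by whitespace,
// that run all the way to the end of the text.
$clean = preg_replace('/(?:\s+#\S+)+\s*$/u', '', $tweet);

echo $clean, PHP_EOL;
// hello world #dontremoveme foobar http://example.com/#dontremoveme
```

Because of the $ anchor, a hashtag in the middle of the sentence or a # inside a URL is never part of the match.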

Regex: Replace unknown number of occurrences after a given marker

I am trying to figure out a way of replacing a / with - in the GET part of a href tag in a html file looking like this:
blah blah <a href="aaaaa/aaaaa/aaaaa/?q=43/23"> blah blah <a
href="aaaaa/aaaaa/aaaaa/?q=43/11/1"> blah blah blah
So basically I'm looking to make the two URLs end with ?q=43-23 and ?q=43-11-1 respectively.
How can I do this with a preg_replace? I can obviously get the 43/23 to be 43-23 with
/(\?.+?)\/(.+?)$/is
And I can get 43/11/1 to be 43-11-1 with
/(\?.+?)\/(.+?)\/(.+?)$/is
But how can I do this in a single regex taking into account that there may be an unlimited number of slashes after the ?. Any suggestions or someone who can point me in the right direction?
I think this should be enough for your content:
print preg_replace_callback('~\?q=([^&"]*)~', function($m) {
return '?q='. str_replace('/', '-', $m[1]);
}, $s);
// for PHP < 5.3.0 (create_function was deprecated in PHP 7.2 and removed in PHP 8.0)
print preg_replace_callback('~\?q=([^&"]*)~', create_function(
'$m', 'return "?q=". str_replace("/", "-", $m[1]);'
), $s);
Output:
blah blah <a href="aaaaa/aaaaa/aaaaa/?q=43-23"> blah blah <a
href="aaaaa/aaaaa/aaaaa/?q=43-11-1"> blah blah blah
blah blah blah blah blah blah blah
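A variant of the same callback approach, sketched under the assumption that every URL of interest sits in a double-quoted href attribute: it rewrites the whole query string rather than only the q parameter. The pattern and the variable names are illustrative.

```php
<?php
// Sample input taken from the question.
$s = 'blah blah <a href="aaaaa/aaaaa/aaaaa/?q=43/23"> blah blah <a' . "\n"
   . 'href="aaaaa/aaaaa/aaaaa/?q=43/11/1"> blah blah blah';

// Group 1: href=" plus the path up to and including the "?".
// Group 2: the query string up to the closing quote.
$out = preg_replace_callback('~(href="[^"?]*\?)([^"]*)~', function ($m) {
    // Only the query string has its slashes replaced; the path is untouched.
    return $m[1] . str_replace('/', '-', $m[2]);
}, $s);

echo $out, PHP_EOL;
```

The slashes in the path part stay as they are because they are captured in group 1 and returned unchanged.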
This is not the simplest search and replace because of how regex engines handle repeated capture groups. Applying repeated capture group principles, you can use the regex to capture the repeating group and then do a simple string replace.
preg_replace_callback('/
( # start capture
\? # question mark
.+? # reluctantly capture all until...
) # end capture
( # start capture
(?: # start group (no capture)
\/ # ...a literal slash
.+? # reluctantly capture all until...
) # end group
+ # repeat capture group
) # end capture
( # start capture
\b # ...a word boundary
) # end capture
/isx', function ($matches) {
return $matches[1] . str_replace('/', '-', $matches[2]) . $matches[3];
}, $str);
You do the string replace on the second match which is the repeated group capture. The word boundary at the end is necessary, but it can be replaced with something more sensible or correct such as " (if you know the URL ends here), or even ("|').
You can use this regex to match an unlimited amount of (slash) levels after the query parameter q=.
// Using tilde delimiters because hash signs are interpreted as comments here :)
~q=((?:[^/]+|/|)*)$~i
For example with the string "aaaaa/aaaaa/aaaaa/?q=43/11/1/5/10" the first captured group will contain 43/11/1/5/10.
Afterwards you can do the following to replace slashes with hyphens:
<?php $string = str_replace( '/', '-', $string );
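When the input is a single URL string rather than a whole HTML document, the split can also be done without a regex. A minimal sketch, assuming the first ? marks the start of the query string (hyphenateQuery is a made-up name):

```php
<?php
// Hypothetical helper: replaces slashes with hyphens after the first "?".
function hyphenateQuery(string $url): string
{
    $pos = strpos($url, '?');
    if ($pos === false) {
        return $url; // no query string, nothing to change
    }

    // Keep the path as it is and only touch the part from "?" onwards.
    return substr($url, 0, $pos) . str_replace('/', '-', substr($url, $pos));
}

echo hyphenateQuery('aaaaa/aaaaa/aaaaa/?q=43/11/1/5/10'), PHP_EOL;
// aaaaa/aaaaa/aaaaa/?q=43-11-1-5-10
```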
