Preg_match word boundaries in php - php

I'm trying to match a whole word, using Unicode-aware word boundaries, like this:
if(preg_match("/(?<!\p{L})jaunā(?!\p{L}) Iel.*/iu", "Jaunā. Iela") > 0){
echo "<h1>Match!</h1>";
}
else{
echo "<h1>dont match</h1>";
}
Why am I getting “dont match”?
I want to find the word “jaunā” in all its variations, for example:
text ,jaunā text
text jaunā, text
text ,jaunā! text
jaunā text
jaunā, text
etc.
Thanks.

You can try the following regex:
/(?<!\p{L})jaunā(?!\p{L}).*? Iel.*/iu
In your input, jaunā is followed by a . character, so the lazy .*? is there to match any characters between jaunā and Iela.

Your regex doesn't account for the dot after jaunā.
Try this:
if(preg_match("/(?<!\p{L})jaunā(?!\p{L})\. Iel.*/iu", "Jaunā. Iela") > 0) {
echo "<h1>Match!</h1>";
}
else{
echo "<h1>dont match</h1>";
}
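If the goal is only to find the word in any of the listed variations, the part of the pattern after the lookahead can be dropped. Below is a minimal sketch of that idea; the helper name containsWord is hypothetical, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: true when $word occurs in $text with no letter
// directly before or after it.
function containsWord(string $word, string $text): bool
{
    // preg_quote() escapes regex metacharacters in $word;
    // \p{L} (any Unicode letter) requires the u modifier.
    $pattern = '/(?<!\p{L})' . preg_quote($word, '/') . '(?!\p{L})/iu';
    return preg_match($pattern, $text) === 1;
}

$samples = [
    'text ,jaunā text',
    'text jaunā, text',
    'text ,jaunā! text',
    'jaunā text',
    'Jaunā. Iela',
    'jaunās text', // a letter follows the word, so this one should not match
];
foreach ($samples as $sample) {
    echo $sample, ' => ', containsWord('jaunā', $sample) ? 'match' : 'no match', "\n";
}
```

Only the last sample should report no match, because the lookahead rejects a letter directly after the word.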

Related

How to use preg_replace with url encoded $_GET?

I have a URL which looks like this: https://URL.DOMAIN/blog.php?id=43&q=echo%20%27test%27.
When I use <?php echo $_GET['q'] ?> it displays echo 'test' which is what I want.
I am using this variable inside a preg_replace function which is basically made to apply a yellow background under matched strings:
preg_replace('/\b('.$_GET['q'].')\b/iu', '<span class="research-news-found">$1</span>', $news_content);
It works perfectly for "normal" strings like "apple" or whatever, but when there is a ' inside the search query it doesn't match anything.
Code example
$news_content = $news_display['news_description'];
if(isset($_GET['q'])){
$news_content = preg_replace('/\b('.$_GET['q'].')\b/iu', '<span class="research-news-found">$1</span>', $news_content);
}
$news_display['news_description'] contains the text output from DB.
Remove the trailing word boundary \b: ' is not a word character, so a \b placed after it only matches when a word character follows. The ? added here makes the last character of the query optional:
$news_content = preg_replace('/\b('.$_GET['q'].'?)/iu',
'<span class="research-news-found">$1</span>',
$news_content);
But if you are hoping that it will actually echo test, then no. You would need to restructure your question to state what you want to achieve, not how to get this replacement to work.
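Because the query comes straight from $_GET, any regex metacharacters in it become part of the pattern. A hedged sketch of one way to handle that is shown below: it escapes the query with preg_quote() and uses lookarounds instead of \b. The helper name highlightQuery is hypothetical, and HTML escaping of the content is left out.

```php
<?php
// Hypothetical helper: wraps every whole-word occurrence of $query in a
// highlight span.
function highlightQuery(string $query, string $content): string
{
    // Escape regex metacharacters (and the / delimiter) in the user input.
    $quoted = preg_quote($query, '/');
    // Lookarounds instead of \b, so a query that starts or ends with a
    // non-word character such as ' can still match.
    $pattern = '/(?<!\w)(' . $quoted . ')(?!\w)/iu';
    // preg_replace() returns null on error; fall back to the original text.
    return preg_replace($pattern, '<span class="research-news-found">$1</span>', $content)
        ?? $content;
}

echo highlightQuery("echo 'test'", "Run echo 'test' in a shell."), "\n";
```

With this approach a query such as a.c is matched literally rather than as "a, any character, c".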

PHP regex to match command with parameter

I'm using this code inside PHP
case preg_match('/\/start( .*)?/', $text):
echo "got you";
break;
With this regex, all I need to do is match the following structure.
$text needs to be:
/start
or
/start xyz
Where "xyz" stands for arbitrary content. These are the only two formats the regex should accept. For some reason my regex does not work as expected.
This should do the trick:
^\/start\s?[\S]*$
Here is an example in Python:
import re
textlist = ["/start xyz", "/start", "/start not to match"]
regex = r"^/start\s?[\S]*$"
for text in textlist:
    thematch = re.search(regex, text)
    if thematch:
        print("match found")
    else:
        print("no match sir!")
What it's doing: the line starts with /start, optionally followed by one whitespace character, then any number of non-whitespace characters (including none), and then the line ends.
Hopefully that helps!
EDIT:
PHP version of this code:
$textlist = array("/start xyz", "/start", "/start not to match");
$regex = '#^/start\s?[\S]*$#';
foreach ($textlist as $text) {
    // preg_match() returns 1 on a match and 0 otherwise
    if (preg_match($regex, $text)) {
        print("match found\n");
    } else {
        print("no match sir!\n");
    }
}
Demo here: https://3v4l.org/OFpnG
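The pattern in the question has no anchors, so it also matches input such as "/started" or "say /start". If the parameter may itself contain spaces, a sketch along the following lines might fit better than the single-token pattern above. The helper name isStartCommand is hypothetical.

```php
<?php
// Hypothetical helper: true when $text is exactly "/start" or
// "/start <anything>".
function isStartCommand(string $text): bool
{
    // ^ and \z anchor the whole string; the optional group requires a
    // space before the parameter. The s modifier lets . match newlines.
    return preg_match('~^/start(?: .*)?\z~s', $text) === 1;
}

foreach (['/start', '/start xyz', '/started', 'say /start'] as $text) {
    echo $text, ' => ', isStartCommand($text) ? 'accepted' : 'rejected', "\n";
}
```

Using ~ as the delimiter avoids having to escape the slash in /start.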

preg_replace a hashtag that doesn't end with ;

I am currently using preg_replace to replace hashtags with HTML links, as shown below. The problem is that the text being checked may also contain HTML, so CSS such as color: #000000; makes it try to convert the hex code into a link.
I basically need my regex to skip the replacement when the hashtag is immediately followed by ;. Here's what I currently have:
$str = preg_replace('/#([a-zA-Z0-9!_%]+)/', '#$1', $str);
Example input: 'I like #action movies!'
Expected output: 'I like #action movies!'
I cannot use the end of the string to check this, because chunks of text are checked at any given time, so the supplied string could be #computer text text text #computer, for instance.
Appreciate any assistance.
In your regex, you can require that the hashtag is followed by a character that is neither ; nor a word character, or by the end of the line or string:
/#([a-zA-Z0-9!_%]+)([^;\w]{1}|$)/
Then use $1 and $2 in the replacement:
'#$1$2'
Your code will look like this:
$str = preg_replace('/#([a-zA-Z0-9!_%]+)([^;\w]{1}|$)/', '#$1$2',$str);
Here you can see some tests: https://regex101.com/r/yN4tJ6/65
Until a regex guru comes to your rescue (if ever...), and because you are in PHP, here is a solution in a few lines of code.
$str="hi #def; #abc #ghi"; // just a test case (the first one needs to be skipped)
if (preg_match_all('/#([a-zA-Z0-9!_%]+.?)/', $str,$m)){
    foreach($m[1] as $k) if(substr($k,-1)!=';') {
        $k=trim($k);
        $str=str_replace("#$k","<a href='http://wxample.com/tags/$k'>#$k</a>",$str);
    }
}
print "$str\n";
You can add a condition that checks whether the last character of the string is ; and act accordingly.
Example:
if (substr($str, -1)==';'){
//do nothing
}
else {
$str = preg_replace('/#([a-zA-Z0-9!_%]+)/', '#$1', $str);
}
Hope this helps.
This regex should work:
#([\w!%]+(?=[\s,!?.\n]|$))
Demo: https://regex101.com/r/KrRiD3/2
Your PHP code:
$str = 'I like #strategy games #f1f1f1; #e2e2e2; #action games!';
$str = preg_replace('/#([\w!%]+(?=[\s,!?.\n]|$))/', '#$1', $str);
echo $str;
output:
I like #strategy games #f1f1f1; #e2e2e2; #action games!
You can use the code below. I am new to regex, so it may not be very polished, but it works:
$data = "<p style='color:#00000;'>Heloo</p> #computer text text text #computer #say #goo1d #sd! #say_hello";
echo preg_replace("/(?<!\:)(\s+)\#([\w]+)(?!\;)/",'#$2',$data);
This is the expression I used:
/(?<!\:)(\s+)\#([\w]+)(?!\;)/
The output is:
<p style='color:#00000;'>Heloo</p> #computer text text text #computer #say #goo1d #sd! #say_hello
I hope it helps someone.
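Another option, sketched below, is a possessive quantifier combined with a negative lookahead. The possessive ++ stops the engine from backtracking into a shorter tag, so #000000; is rejected as a whole. The helper name linkHashtags and the /tags/ link format are assumptions, since the replacement markup in the question was lost.

```php
<?php
// Hypothetical helper: links hashtags unless they are directly followed by ;
function linkHashtags(string $text): string
{
    // ++ is possessive: once the tag characters are consumed they are not
    // given back, so (?!;) is tested only at the true end of the tag.
    $pattern = '/#([a-zA-Z0-9!_%]++)(?!;)/';
    // The link format is an assumption, used here for illustration only.
    return preg_replace($pattern, '<a href="/tags/$1">#$1</a>', $text) ?? $text;
}

echo linkHashtags('color: #000000; I like #action movies'), "\n";
```

Because the check is a lookahead, it does not depend on where the hashtag sits in the string, which suits text that is processed in chunks.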

Regex to select url except when = is directly infront of it

I'm trying to use a regex to find and replace all URLs in a forum system. This works but it also selects anything that is within bbcode. This shouldn't be happening.
My code is as follows:
<?php
function make_links_clickable($text){
return preg_replace('!(([^=](f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i', '$1', $text);
}
//$text = "https://www.mcgamerzone.com<br>http://www.mcgamerzone.com/help/support<br>Just text<br>http://www.google.com/<br><b>More text</b>";
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Unparsed text:</b><br>";
echo $text;
echo "<br><br>";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
?>
All URLs that occur in BBCode follow a = character, meaning that I don't want anything that starts with = to be selected.
I basically have that working, but it results in selecting one extra character in front of the string that should be selected.
I'm not very familiar with regex. The final output of my code is this:
<b>Unparsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa<br>
<br>
<b>Parsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa
You can match and skip [url=...] like this:
\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)
That way, you will only match the URLs outside the [url=...] tag.
IDEONE demo:
function make_links_clickable($text){
return preg_replace('~\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)~iu', '$1', $text);
}
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
You can use a negative lookbehind, (?<!=), instead of your negated character class. It asserts that the match is not preceded by =, without consuming a character.
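A minimal sketch of the lookbehind approach follows. The helper name makeLinksClickable and the <a> markup in the replacement are assumptions (the replacement in the original post was lost), and the Cyrillic ranges are left out to keep the pattern ASCII-only.

```php
<?php
// Sketch: link bare URLs, skipping any URL directly preceded by "="
// (as in [url=...]).
function makeLinksClickable(string $text): string
{
    // (?<!=) is a zero-width check, so no extra character ends up in the
    // match; \b keeps the match from starting in the middle of a word.
    $pattern = '~(?<!=)\b((?:f|ht)tps?://[-a-zA-Z0-9()#:%_+.\~?&;/=]+)~i';
    // The <a> markup is an assumption made for this example.
    return preg_replace($pattern, '<a href="$1">$1</a>', $text) ?? $text;
}

$text = '[url=https://a.example/x]here[/url] see https://b.example/y now';
echo makeLinksClickable($text), "\n";
```

Note that a URL directly followed by punctuation such as a full stop would include that character, since . is part of the character class.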

For every line beginning with 4 spaces, add text-indent tags

I've got text where some lines are indented with 4 spaces. I've been trying to write a regex which would find every line beginning with 4 spaces and put a <span class="indented"> at the beginning and a </span> at the end. I'm no good at regex yet, though, so it came to nothing. Is there a way to do it?
(I'm working in PHP, in case there's an option easier than regex).
Example:
Text text text
    Indented text text text
More text text text
A bit more text text.
to:
Text text text
<span class="indented">Indented text text text</span>
More text text text
A bit more text text.
The following will match lines starting with at least 4 spaces or a tab character:
$str = preg_replace("/^(?: {4,}|\t *)(.*)$/m", "<span class=\"indented\">$1</span>", $str);
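As a quick check of the pattern above, here is a small runnable sketch. The wrapper name wrapIndentedLines is hypothetical, and the sample text assumes Unix line endings.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function wrapIndentedLines(string $text): string
{
    // m modifier: ^ and $ match at the start and end of every line.
    return preg_replace(
        '/^(?: {4,}|\t *)(.*)$/m',
        '<span class="indented">$1</span>',
        $text
    ) ?? $text;
}

$text = "Text text text\n    Indented text text text\nMore text text text";
echo wrapIndentedLines($text), "\n";
```

The leading spaces are matched outside the capture group, so they are dropped from the wrapped line.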
I had to do something similar, and one thing I might suggest is changing the goal formatting to
<span class="tab"></span>Indented text text text
You can then set your CSS to something like .tab {display:inline-block; width:4em;} and, instead of using preg_replace and regexes, do
$str = str_replace("    ", "<span class='tab'></span>", $str);
This has the benefit that 8 spaces easily turn into a double-width tab.
I think this should work:
// get each line as an item in an array
$array_of_lines = explode("\n", $your_string_of_lines);
$output = array();
foreach ($array_of_lines as $line) {
    // First four characters
    $first_four = substr($line, 0, 4);
    if ($first_four == '    ') {
        $line = trim($line);
        $line = '<span class="indented">'.$line.'</span>';
    }
    $output[] = $line;
}
echo implode("\n", $output);
