ignore url in PHP regex


I've got a utility where I'm trying to enforce brand standards in an application where the function will wrap brand words in a span with a class.
public function filterBrandWords($text)
{
// look up the brand words from the config settings
$filter_terms = ['brandword1', 'brandword2', 'brandword3'];
$filtered_text = $text;
foreach ($filter_terms as $word) {
$match_count = preg_match_all('/' . $word . '/i', $text, $matches);
for ($i = 0; $i < $match_count; $i++) {
$brand_string = trim($matches[0][$i]);
$lower = strtolower($brand_string);
$new = '<span class="font-semibold">' . substr($lower, 0, 3) . '</span>' . substr($lower, 3);
$filtered_text = preg_replace('/\b' . $brand_string . '\b/', $new, $filtered_text);
}
}
return $filtered_text;
}
This works, but I noticed that it also filters text containing the brand URL.
I tried changing $match_count = preg_match_all('/' . $word . '/i', $text, $matches); to $match_count = preg_match_all('/' . $word . 'com$' . '/i', $text, $matches); in the hope that it would ignore matches followed by com.
What have I gotten wrong in the regex?
If I do
echo filterBrandWords('brandword1');
the output is
<span class="font-semibold">bra</span>ndword1
With a URL, the output is
<span class="font-semibold">bra</span>ndword1.com
In those cases, I want to skip the filter and return the text unchanged.

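If you want to skip anything that looks like a URL, you can use something like this as your regex:
(?|.*\.(com|net|org))
Despite the (? prefix, this is a branch-reset group rather than a negative lookahead: it simply matches any text containing .com, .net or .org (a broad test for URLs), and the ! in front of preg_match() does the negating. Note that the check applies to the whole input, so text that merely contains a URL is returned unfiltered as well. Insert it into your function as I have done here: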
function filterBrandWords($text)
{
// look up the brand words from the config settings
$filter_terms = ['brandword1', 'brandword2', 'brandword3'];
$filtered_text = $text;
if(!preg_match('/(?|.*\.(com|net|org))/', $filtered_text)) { // if it resembles a URL, skip it
foreach ($filter_terms as $word) {
$match_count = preg_match_all('/' . $word . '/i', $text, $matches);
for ($i = 0; $i < $match_count; $i++) {
$brand_string = trim($matches[0][$i]);
$lower = strtolower($brand_string);
$new = '<span class="font-semibold">' . substr($lower, 0, 3) . '</span>' . substr($lower, 3);
$filtered_text = preg_replace('/\b' . $brand_string . '\b/', $new, $filtered_text);
}
}
}
return $filtered_text;
}
Now call the function with something resembling a URL:
echo filterBrandWords('brandword1.com');
And the entire URL is just returned:
brandword1.com
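If only the URL occurrences should be skipped while other mentions in the same text are still wrapped, a negative lookahead can be attached to each brand word instead. The following is a minimal sketch under stated assumptions: the function name wrapBrandWords and the short list of top-level domains (com, net, org) are illustrative, not part of the original code.

```php
<?php
// Hypothetical helper: wraps brand words unless they are directly
// followed by a TLD-like suffix such as ".com" (assumed list: com, net, org).
function wrapBrandWords(string $text, array $terms): string
{
    foreach ($terms as $word) {
        // (?!...) is the negative lookahead: the match is rejected when
        // the word is immediately followed by .com, .net or .org.
        $pattern = '/\b' . preg_quote($word, '/') . '\b(?!\.(?:com|net|org)\b)/i';
        $text = preg_replace_callback($pattern, static function (array $m): string {
            $lower = strtolower($m[0]);
            return '<span class="font-semibold">' . substr($lower, 0, 3) . '</span>' . substr($lower, 3);
        }, $text);
    }
    return $text;
}

echo wrapBrandWords('Visit brandword1.com or try BrandWord1 today', ['brandword1']), "\n";
// Visit brandword1.com or try <span class="font-semibold">bra</span>ndword1 today
```

A word followed by a different suffix (for example .io) would still be wrapped, so the suffix list would need to match the URLs you actually expect.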

Related

Adding custom masks to phone numbers

I'm creating a simple function to mask phone numbers. My phone numbers have 9 digits, and I want to preg_replace them with a given mask such as 2-2-2-1-2 or 3-2-2-2.
I tried this:
$mask = explode('-', '3-2-2-2');
$pattern = '';
$replace = '';
foreach ($mask as $key => $value) {
if ($key == 0) {
$pattern = '/\(?(\d{' . $value . '})\)?[- ]';
$replace = '$' . ++$key . '-';
continue;
}
if ($key == count($mask) - 1) {
$pattern .= '?(\d{' . $value . '})/';
$replace .= '$' . ++$key;
break;
}
$pattern .= '?(\d{' . $value . '})[- ]';
$replace .= '$' . ++$key . '-';
}
return preg_replace($pattern, $replace, '902000810');
and the result is 902-00-08-10. Sometimes, though, I get the error preg_replace(): No ending delimiter '/' found. How can I refactor this to avoid the error?

Assuming:
$num = '902000810';
$mask = explode('-', '3-2-2-2');
Regex is not the only way to format a phone number from the mask.
Using formatted strings:
$maskPH = array_map(fn($i) => "%{$i}s", $mask);
$formatI = implode('', $maskPH);
$formatO = implode('-', $maskPH);
$result = vsprintf($formatO, sscanf($num, $formatI));
Using unpack:
$format = array_reduce($mask, function ($c, $i) {
static $j = 0;
return "{$c}A{$i}_" . $j++ . "/";
});
$result = implode('-', unpack($format, $num));

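preg_replace(): No ending delimiter '/' found
means that the finished pattern does not end with a / as its last character.
The three fragments are concatenated into one pattern, so only the first may open with / and only the last may close with it. The closing / is appended only in the last-segment branch, and that branch is never reached when the mask has a single segment, because the $key == 0 branch hits continue first.
Build the pattern body without delimiters and add them once, after the loop.
From:
$pattern = '/\(?(\d{' . $value . '})\)?[- ]';
$pattern .= '?(\d{' . $value . '})/';
$pattern .= '?(\d{' . $value . '})[- ]';
To:
$pattern = '\(?(\d{' . $value . '})\)?[- ]';
$pattern .= '?(\d{' . $value . '})';
$pattern .= '?(\d{' . $value . '})[- ]';
and, after the loop:
$pattern = '/' . $pattern . '/';
(A single-segment mask then still ends in a required [- ], so it needs its own branch if you want to support it.)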
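One way to avoid delimiter problems altogether is to assemble the capture groups first and wrap the pattern in delimiters in a single place. The sketch below assumes the input is already a bare digit string and that the mask segments add up to its length; the function name maskPhone is made up for illustration.

```php
<?php
// Hypothetical helper: formats a digit string according to a mask like "3-2-2-2".
function maskPhone(string $digits, string $mask): string
{
    $groups = array_map('intval', explode('-', $mask));
    if (array_sum($groups) !== strlen($digits)) {
        throw new InvalidArgumentException('Mask does not cover the number');
    }

    $body = '';
    $refs = [];
    foreach ($groups as $i => $len) {
        $body .= '(\d{' . $len . '})';   // one capture group per mask segment
        $refs[] = '$' . ($i + 1);        // matching back-reference
    }

    // Delimiters and anchors are added exactly once, so the pattern is
    // always terminated, whatever the number of segments.
    return preg_replace('/^' . $body . '$/', implode('-', $refs), $digits);
}

echo maskPhone('902000810', '3-2-2-2'), "\n";   // 902-00-08-10
echo maskPhone('902000810', '2-2-2-1-2'), "\n"; // 90-20-00-8-10
```

Because the loop has no first/last special cases, a single-segment mask works the same way as any other.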

Highlight keyword on search results php

I am trying to bold each keyword the user types wherever it appears in the search results.
I have this code and it is working fine.
for($k=$no_words; $k>0 ;$k--) {
$w=trim($search_array[$k-1]);
if($w!='')
{
$result[$i]['title'] = preg_replace('/(' . preg_quote($search_array[$k-1], '/') . ')/siU', '<b>\\1</b>', $result[$i]['title']);
$result[$i]['description'] = preg_replace('/(' . preg_quote($search_array[$k-1], '/') . ')/siU', '<b>\\1</b>', $result[$i]['description']);
}
}
My problem is as follows:
I have this keyword: this is my keyword
When I type: " this is my keyword " I get this result: "this is my keyword"
But when I type: " This is keyword " I get this result: "this is my keyword" without any of the words being bolded.
What am I doing wrong?

I suppose you need the following:
$search_array = array_unique(explode(' ', $search));
foreach ($search_array as $k => $v)
{
$w = trim($v);
if ($w)
{
$result[$i]['title'] = preg_replace('/(' . preg_quote($w, '/') . ')/siU', '<b>\\1</b>', $result[$i]['title']);
$result[$i]['description'] = preg_replace('/(' . preg_quote($w, '/') . ')/siU', '<b>\\1</b>', $result[$i]['description']);
}
}
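Running one preg_replace per word has a side effect: a later word can match inside the <b> tags inserted by an earlier one (a search for "b", for instance). A possible alternative, sketched here with a made-up function name highlightTerms, is to combine all the words into a single alternation and replace in one pass.

```php
<?php
// Hypothetical helper: bolds every search term in $text in a single pass.
function highlightTerms(string $text, string $search): string
{
    // Split the search phrase on whitespace and drop empty pieces and duplicates.
    $terms = array_filter(array_unique(preg_split('/\s+/', trim($search))), 'strlen');
    if (!$terms) {
        return $text;
    }

    // Longest terms first, so a longer term is preferred over its own prefix.
    usort($terms, static fn(string $a, string $b): int => strlen($b) <=> strlen($a));

    $alternation = implode('|', array_map(static fn(string $t): string => preg_quote($t, '/'), $terms));

    return preg_replace('/(' . $alternation . ')/iu', '<b>$1</b>', $text);
}

echo highlightTerms('this is my keyword', 'This is keyword'), "\n";
// <b>this</b> <b>is</b> my <b>keyword</b>
```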

extract part of a string before and after a word

I need to extract and show a few words before and after a query word, something like Google search results. For example:
$str = "hi user! welcome to new php open source world, we are trying to learn you something!";
$query = "new php";
$result = "... welcome to new php open source ...";
I searched Google and SO but didn't find a clear answer, or maybe my PHP knowledge wasn't enough.
Is there a workable, easy-to-use function for this?

function yourFuncName($str, $query, $numOfWordToAdd) {
list($before, $after) = explode($query, $str);
$before = rtrim($before);
$after = ltrim($after);
$beforeArray = array_reverse(explode(" ", $before));
$afterArray = explode(" ", $after);
$countBeforeArray = count($beforeArray);
$countAfterArray = count($afterArray);
$beforeString = "";
if($countBeforeArray < $numOfWordToAdd) {
$beforeString = implode(' ', array_reverse($beforeArray)) . ' '; // undo the reversal and keep a space before the query
}
else {
for($i = 0; $i < $numOfWordToAdd; $i++) {
$beforeString = $beforeArray[$i] . ' ' . $beforeString;
}
}
$afterString = "";
if($countAfterArray < $numOfWordToAdd) {
$afterString = implode(' ', $afterArray);
}
else {
for($i = 0; $i < $numOfWordToAdd; $i++) {
$afterString = $afterString . $afterArray[$i] . ' ';
}
}
$string = $beforeString . $query . ' ' . $afterString;
return $string;
}
Output is: user! welcome to new php open source world, ($numOfWordToAdd = 3)

Here is a working example; I think it is clear what I did and how:
<?php
$str = "hi user! welcome to new php open source world, we are trying to learn you something!";
$query = "new php";
$expl = explode($query, $str);
// items on the left side of middle string
$expl_left = explode(" ", $expl[0]);
$left_cnt = count($expl_left);
$new_left = $expl_left[$left_cnt-3] . " " . $expl_left[$left_cnt-2];
// items on the right side of middle string
$expl_right = explode(" ", $expl[1]);
$new_right = $expl_right[1] . " " . $expl_right[2];
// new string formated
$new = "... " . $new_left . " " . $query . " " . $new_right . " ...";
print $new;
?>
If you have any questions, feel free to ask.

$result = preg_replace('/(.+)?([^\s]+.{10}'.$query.'.{10}[^\s]+)(.+)?/', '... $2 ...', $str);
This will return the same result from the same string and query you gave. If the before or after length starts or ends (respectively) in the middle of a word, it will continue until it completes the word before it stops.

Assuming a "word" is any series of non-whitespace characters, the following will extract 3 words on either side of new php out of the string $subject, but accept less if necessary:
if (preg_match('/(?:\S+\s+){1,3}new php(?:\s+\S+){1,3}/', $subject, $regs)) {
$result = $regs[0];
}
Change the 3s to any number you like.
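The same pattern can be wrapped in a small function so that the query and the word count become parameters. This is only a sketch: the name searchSnippet is invented here, the query is passed through preg_quote() so that regex metacharacters in it are taken literally, and {0,n} is used instead of {1,n} so that a query at the very start or end of the text still matches.

```php
<?php
// Hypothetical helper: returns up to $words words of context on each side
// of $query, or null when the query does not occur in $subject.
function searchSnippet(string $subject, string $query, int $words = 3): ?string
{
    $pattern = '/(?:\S+\s+){0,' . $words . '}'
             . preg_quote($query, '/')
             . '(?:\s+\S+){0,' . $words . '}/i';

    return preg_match($pattern, $subject, $m) ? '... ' . $m[0] . ' ...' : null;
}

$str = "hi user! welcome to new php open source world, we are trying to learn you something!";
echo searchSnippet($str, 'new php', 2), "\n";
// ... welcome to new php open source ...
```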

I used the following function with explode:
public static function returnSearch($query, $str, $wordcount) {
$explode = explode($query, $str);
$result = null;
//if explode count is one the query was not found
if (count($explode) == 1) {
$result = implode(' ', array_slice(str_word_count($explode[0], 2), -$wordcount, $wordcount)) . " ";
}
//if explode count is more than one the query was found at least one time
if (count($explode) > 1) {
//check for if the string begins with the query
if (!empty($explode[0])) {
$result = "..." . implode(' ', array_slice(str_word_count($explode[0], 2), -$wordcount, $wordcount)) . " ";
}
$result = $result . $query;
if (!empty($explode[1])) {
$result = $result . " " . implode(' ', array_slice(str_word_count($explode[1], 2), 0, $wordcount)) . "...";
}
}
//return result
return $result;
}

Corrected function from @Can Vural: it won't scramble the phrase before the match and it's case-insensitive, which is very useful for displaying PHP search results:
function render_search_words($str, $query, $numOfWordToAdd) {
list($before, $after) = array_pad(preg_split('/' . preg_quote($query, '/') . '/i', $str, 2), 2, ''); // quote the query; tolerate a missing match
$before = rtrim($before);
$after = ltrim($after);
$beforeArray = explode(" ", $before);
$afterArray = explode(" ", $after);
$countBeforeArray = count($beforeArray);
$countAfterArray = count($afterArray);
// keep the last $numOfWordToAdd words before the match, in their original order
$beforeString = implode(' ', array_slice($beforeArray, -$numOfWordToAdd));
$afterString = "";
if($countAfterArray < $numOfWordToAdd) {
$afterString = implode(' ', $afterArray);
}
else {
for($i = 0; $i < $numOfWordToAdd; $i++) {
$afterString = $afterString . $afterArray[$i] . ' ';
}
}
$string = '...'.$beforeString . ' <span>' . $query . '</span> ' . $afterString.'...';
return $string;
}

How to wrap words of string every 1000 characters in php

I have a big string and an array of words that must be replaced with some changes, such as being wrapped in a link. The first issue is to wrap whole words or combinations of words. The second issue is to apply that step no more often than every 1000 characters.
$string="lalala word lalala blah, blah lalala combination of words lalala lalala...";
$patterns=array('word','combination of words');
$replacements=array('word','combination of words');
For example, what should I do with the snippet above?

It sounds to me like you're looking for wordwrap(). You can then use preg_replace_callback() to apply it to your search patterns and make the replacements:
foreach ($patterns as $pattern) {
$regex = '/' . preg_quote($pattern, '/') . '/';
$string = preg_replace_callback($regex, function($match) {
return '<a href="#">'
. wordwrap(htmlspecialchars($match[0]), 1000, '<br />') // $match is an array; [0] is the full match
. '</a>';
}, $string);
}

SOLUTION:
<?php
function set_keys_by_words($content, $key, $words,$before,$after) {
$positions = array();
$string = '';
for ($i = 0; $i < count($words); $i++) {
$string = preg_replace('/\b' . $words[$i] . '\b/ui', $key . $words[$i], $content);
$position = mb_strpos($string, $key);
if ($position != '') {
$positions[(int) $position] = $words[$i];
}
}
ksort($positions);
$word = '';
$number = '';
$i = 0;
foreach ($positions as $k => $v) {
$i++;
if ($i == 1) {
$number = $k;
$word = $v;
}
}
if ((int) $number) {
$word_len = strlen($word);
$part_after = preg_replace('/\b' . $word . '\b/ui', $before . $word . $after, mb_substr($content, 0, $number + $word_len));
echo $part_after . mb_substr($content, $number + $word_len, 1000);
$content = mb_substr($content, $number + $word_len + 1000);
if ($content != '') {
set_keys_by_words($content, $key, $words, $before, $after); // pass all five arguments when recursing
}
} else if ($number == '' && $content != '') {
echo $content;
}
}
?>
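The function above appears to link the first matching word, copy the next 1000 characters untouched, and then recurse on the rest. A non-recursive sketch of that idea, which returns the result instead of echoing it, might look like the following. The name linkEvery, the href="#" placeholder and the byte-based (not multibyte-aware) offsets are all assumptions made for the example.

```php
<?php
// Hypothetical helper: links a matching word, then leaves at least $gap
// bytes untouched before the next word may be linked.
function linkEvery(string $text, array $words, int $gap = 1000): string
{
    if (!$words) {
        return $text;
    }
    $alternation = implode('|', array_map(static fn(string $w): string => preg_quote($w, '/'), $words));
    $pattern = '/\b(?:' . $alternation . ')\b/i';

    $out = '';
    $offset = 0;
    while (preg_match($pattern, $text, $m, PREG_OFFSET_CAPTURE, $offset)) {
        [$hit, $pos] = $m[0];
        $end = $pos + strlen($hit);
        // Text before the hit, then the linked hit itself.
        $out .= substr($text, $offset, $pos - $offset) . '<a href="#">' . $hit . '</a>';
        // The next $gap bytes are copied without any replacement.
        $out .= substr($text, $end, $gap);
        $offset = min(strlen($text), $end + $gap);
    }
    return $out . substr($text, $offset);
}

echo linkEvery('word aaa word bbb word', ['word'], 10), "\n";
// <a href="#">word</a> aaa word bbb <a href="#">word</a>
```

If one search phrase is a prefix of another, listing the longer phrase first in $words lets it take precedence in the alternation.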

Trouble with regular expression for comments code

I am currently making a homepage where logged-in users can write comments. The comment string is first run through a function that str_replaces emoticons. After that, I want it to replace
[url=www.whatever.com]linktext[/url]
with:
<a href='www.whatever.com'>linktext</a>
The reason is that I want to strip all HTML that isn't produced by my comment code from the text, in case some users decide to get creative.
I thought it would be best to use preg_replace, but the code I ended up with (partly from reading about regular expressions in my trusty O'Reilly SQL and PHP book and partly from the web) is pretty bonkers and, most importantly, doesn't work.
Any help would be appreciated, thanks.
It's probably possible to do the whole replacement at once rather than in 2 segments as I have done. I just decided that getting 2 smaller parts to work first would be easier, and that I could merge them afterwards.
Code:
function text_format($string)
{
$pattern="/([url=)+[a-zA-Z0-9]+(])+/";
$string=preg_replace($pattern, "/(<a href=\')+[a-zA-Z0-9]+(\'>)+/", $string);
$pattern="/([\/url])+/";
$string=preg_replace($pattern, "/(<\/a>)+/", $string);
return $string;
}

It looks like you're using something similar to BBCode. Why not use a BBCode parser, such as this one?
http://nbbc.sourceforge.net/
It also handles smileys, replacing them with images. If you use their test page, you will still see the text, though, because they don't host the images and they set the alt text to the smiley.

I experimented a bit with the following:
function text_format($string)
{
return preg_replace('#\[url=([^\]]+)\]([^\[]*)\[/url\]#', '<a href="$1">$2</a>', $string);
}
However, one immediate fault with this is that if linktext is empty, there will be nothing between <a> and </a>. One way around it would be to do another pass with something like this:
preg_replace('#<a href="([^"]*)"></a>#', '<a href="$1">$1</a>', $string);
Another option would be to use preg_replace_callback and put this logic inside your callback function.
Finally, this is obviously a common "problem" and has been solved many times by others, and if using a more mature open sourced solution is an option, I'd recommend looking for one.
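For illustration, the preg_replace_callback variant mentioned above could be sketched as follows. The function name text_format_cb is invented for the example. The callback falls back to the URL when the link text is empty and HTML-escapes both parts, but it does not filter URL schemes, so on its own it should not be treated as safe against hostile input.

```php
<?php
// Hypothetical sketch: converts [url=...]text[/url] to an anchor tag,
// using the URL itself as link text when none is given.
function text_format_cb(string $string): string
{
    return preg_replace_callback(
        '#\[url=([^\]]+)\]([^\[]*)\[/url\]#i',
        static function (array $m): string {
            $href = htmlspecialchars($m[1], ENT_QUOTES);
            $text = trim($m[2]) === '' ? $m[1] : $m[2];
            return '<a href="' . $href . '">' . htmlspecialchars($text, ENT_QUOTES) . '</a>';
        },
        $string
    );
}

echo text_format_cb('[url=www.whatever.com]linktext[/url]'), "\n";
// <a href="www.whatever.com">linktext</a>
echo text_format_cb('[url=www.whatever.com][/url]'), "\n";
// <a href="www.whatever.com">www.whatever.com</a>
```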

@Lauri Lehtinen's answer is good for learning the idea behind the technique, but you shouldn't use it in practice because it would make your site extremely vulnerable to XSS attacks. Also, link spammers would appreciate the lack of rel="nofollow" on the generated links.
Instead, use something like:
<?php
// \author Daniel Trebbien
// \date 2010-06-22
// \par License
// Public Domain
$allowed_uri_schemes = array('http', 'https', 'ftp', 'ftps', 'irc', 'mailto');
/**
* Encodes a string in RFC 3986
*
* \see http://tools.ietf.org/html/rfc3986
*/
function encode_uri($str)
{
$str = urlencode('' . $str);
$search = array('%3A', '%2F', '%3F', '%23', '%5B', '%5D', '%40', '%21', '%24', '%26', '%27', '%28', '%29', '%2A', '%2B', '%2C', '%3B', '%3D', '%2E', '%7E');
$replace = array(':', '/', '?', '#', '[', ']', '@', '!', '$', '&', '\'', '(', ')', '*', '+', ',', ';', '=', '.', '~'); // gen-delims / sub-delims / unreserved
return str_ireplace($search, $replace, $str);
}
function url_preg_replace_callback($matches)
{
global $allowed_uri_schemes;
if (empty($matches[1]))
return $matches[0];
$href = trim($matches[1]);
if (($i = strpos($href, ':')) !== FALSE) {
if (strrpos($href, '/', $i) === FALSE) {
if (!in_array(strtolower(substr($href, 0, $i)), $allowed_uri_schemes))
return $matches[0];
}
}
// unescape `\]`, `\\\]`, `\\\\\]`, etc.
for ($j = strpos($href, '\\]'); $j !== FALSE; $j = strpos($href, '\\]', $j)) {
for ($i = $j - 2; $i >= 0 && $href[$i] == '\\' && $href[$i + 1] == '\\'; $i -= 2)
/* empty */;
$i += 2;
$h = '';
if ($i > 0)
$h = substr($href, 0, $i);
for ($numBackslashes = floor(($j - $i)/2); $numBackslashes > 0; --$numBackslashes)
$h .= '\\';
$h .= ']';
if (($j + 2) < strlen($href))
$h .= substr($href, $j + 2);
$href = $h;
$j = $i + floor(($j - $i)/2) + 1;
}
if (!empty($matches[2]))
$href .= str_replace('\\\\', '\\', $matches[2]);
if (empty($matches[3]))
$linkText = $href;
else {
$linkText = trim($matches[3]);
if (empty($linkText))
$linkText = $href;
}
$href = htmlspecialchars(encode_uri(htmlspecialchars_decode($href)));
return "<a href=\"$href\" rel=\"nofollow\">$linkText</a>";
}
function render($input)
{
$input = htmlspecialchars(strip_tags('' . $input));
$input = preg_replace_callback('~\[url=((?:[^\]]|(?<!\\\\)(?:\\\\\\\\)*\\\\\])*)((?<!\\\\)(?:\\\\\\\\)*)\]' . '((?:[^[]|\[(?!/)|\[/(?!u)|\[/u(?!r)|\[/ur(?!l)|\[/url(?!\]))*)' . '\[/url\]~i', 'url_preg_replace_callback', $input);
return $input;
}
which I believe is safe against XSS. This version has the added benefit that it is possible to write out links to URLs that contain ']'.
Evaluate this code with the following "test suite":
echo render('[url=http://www.bing.com/][[/[/u[/ur[/urlBing[/url]') . "\n";
echo render('[url=][/url]') . "\n";
echo render('[url=http://www.bing.com/][[/url]') . "\n";
echo render('[url=http://www.bing.com/][/[/url]') . "\n";
echo render('[url=http://www.bing.com/][/u[/url]') . "\n";
echo render('[url=http://www.bing.com/][/ur[/url]') . "\n";
echo render('[url=http://www.bing.com/][/url[/url]') . "\n";
echo render('[url=http://www.bing.com/][/url][/url]') . "\n";
echo render('[url= javascript: window.alert("hi")]click me[/url]') . "\n";
echo render('[url=#" onclick="window.alert(\'hi\')"]click me[/url]') . "\n";
echo render('[url=http://www.bing.com/] [/url]') . "\n";
echo render('[url=/?#[\\]@!$&\'()*+,;=.~] [/url]') . "\n"; // link text should be `/?#[]@!$&'()*+,;=.~`
echo render('[url=http://localhost/\\\\]d]abc[/url]') . "\n"; // href should be `http://localhost/%5C`, link text should be `d]abc`
echo render('[url=\\]][/url]') . "\n"; // link text should be `]`
echo render('[url=\\\\\\]][/url]') . "\n"; // link text should be `\]`
echo render('[url=\\\\\\\\\\]][/url]') . "\n"; // link text should be `\\]`
echo render('[url=a\\\\\\\\\\]bcde\\]fgh\\\\\\]ijklm][/url]') . "\n"; // link text should be `a\\]bcde]fgh\]ijklm`
Or, just look at the Codepad results.
As you can see, it works.
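The core of the XSS protection is the scheme allow-list. Isolated from the rest of the code, the idea can be sketched roughly as below. The function name hasAllowedScheme and its shortened default list are assumptions for the example, and the check is deliberately stricter than the callback above (a host:port value without a scheme is rejected). It is a simplified illustration rather than a complete defense.

```php
<?php
// Hypothetical helper: accepts relative references and URLs whose scheme
// is on the allow-list; rejects everything else (e.g. javascript:).
function hasAllowedScheme(string $href, array $allowed = ['http', 'https', 'ftp', 'mailto']): bool
{
    $href = trim($href);
    // RFC 3986 scheme: a letter followed by letters, digits, "+", "." or "-".
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $href, $m)) {
        return true; // no scheme at all, so this is a relative reference
    }
    return in_array(strtolower($m[1]), $allowed, true);
}

var_dump(hasAllowedScheme('http://www.bing.com/'));             // bool(true)
var_dump(hasAllowedScheme(' javascript: window.alert("hi")')); // bool(false)
var_dump(hasAllowedScheme('/relative/path'));                   // bool(true)
```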
