Preg Match circumflex with ^ in php - php

I know I am going to get a lot of asinine comments, but I cannot figure this out no matter what I do. I have a function here
$filter = mysql_query("SELECT * FROM `filter`");
$fil = mysql_fetch_array($filter);
$bad = $fil['filter'];
$bword = explode(",", $bad);
function wordfilter($output,$bword){
$badWords = $bword;
$matchFound = preg_match_all("/(" . implode($badWords,"|") . ")/i",$output,$matches);
if ($matchFound) {
$words = array_unique($matches[0]);
foreach($words as $word) {
$output = preg_replace("/$word/","*****",$output);
}
}
return $output;
}
I know bad word filters are frowned upon, but my client has requested this.
I have a list in the database; here are a few entries:
^ass$,^asses$,^asshopper,^cock$,^coon,^cracker$,^cum$,^dick$,^fap$,^heeb$,^hell$,^homo$,^humping,^jap$,^mick$,^muff$,^paki$,^phap$,^poon$,^spic$,^tard$,^tit$,^tits$,^twat$,^vag$,ass-hat,ass-pirate,assbag
As you can see, I am using a circumflex and a dollar sign on certain words.
The problem is with the first three words beginning with ass: the filter blocks them even when I write something like glasses or grasshoppers, while everything after the first three works fine. I tried adding three entries before these in case position was the problem, but unfortunately it isn't.
Is there something wrong with how I have written this?

Extending from comment:
Try to use \b to detect words:
$matchFound = preg_match_all('/\b('.implode("|", $badWords).')\b/i',$output,$matches);
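To make the \b suggestion concrete, here is a minimal sketch. It assumes that a leading ^ or trailing $ in a stored entry is meant as "start of word" or "end of word"; buildFilterPattern and wordFilterSketch are hypothetical names, not part of the original function.

```php
<?php
// Hypothetical helper: turn stored entries such as "^ass$" or "^asshopper"
// into one pattern. Assumption: ^ means "word starts here", $ means "word ends here".
function buildFilterPattern(array $entries): ?string
{
    $parts = [];
    foreach ($entries as $entry) {
        $entry = trim($entry);
        if ($entry === '') {
            continue;
        }
        $startsWord = $entry[0] === '^';
        $endsWord = substr($entry, -1) === '$';
        $core = trim($entry, '^$');
        if ($core === '') {
            continue;
        }
        // preg_quote() escapes regex metacharacters that appear in the word itself.
        $parts[] = ($startsWord ? '\b' : '') . preg_quote($core, '/') . ($endsWord ? '\b' : '');
    }
    return $parts === [] ? null : '/(?:' . implode('|', $parts) . ')/i';
}

// Hypothetical wrapper: replace every match in a single pass.
function wordFilterSketch(string $text, array $entries): string
{
    $pattern = buildFilterPattern($entries);
    return $pattern === null ? $text : preg_replace($pattern, '*****', $text);
}

echo wordFilterSketch('glasses and grasshoppers', ['^ass$', '^asses$', '^asshopper']), "\n";
// prints "glasses and grasshoppers" (unchanged)
```

Doing the replacement in one pass also avoids the second preg_replace("/$word/", ...) loop, which matches the found word anywhere without any boundary check.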

Related

PHP - Find occurence in array and then place the replaced part at the start

Here is my code:
function TranslatedTitle($Title) {
ConnectWithMySQLDatabase();
$v = mysql_query("SELECT * FROM `ProductTranslations`");
while($vrowis = mysql_fetch_array($v)){
$English[] = $vrowis['English'];
$Bulgarian[] = $vrowis['Bulgarian'];
}
$TranslatedTitle = str_replace($English, $Bulgarian, $Title);
return $TranslatedTitle;
}
I am using this code to fetch data from a MySQL table, search the title for certain English phrases, and replace each one with its Bulgarian counterpart.
Example:
I have very big blue eyes.
Will be translated to:
I have very големи сини eyes.
It takes the phrase big blue and replaces it with големи сини at the position where it was found.
How can I instead move the replaced part to the beginning of the string, so that my example ends up as големи сини I have very eyes.
The example sentence has no meaning; I made it up for illustration.
I would try looping through the $English array and when finding the matching word move it to the beginning, then translating... something like:
foreach($English as $word){
$pos = strpos($Title, $word);
if ($pos !== false) {
//english word found
$Title = $word . ' ' . str_replace($word, '', $Title); // remove only the matched phrase, not every English phrase
break;
}
}
Then
$TranslatedTitle = str_replace($English, $Bulgarian, $Title);
First off, you will want to use PDO to interact with your database. The mysql_* functions were deprecated in PHP 5.5 and removed in PHP 7.0, and because they offer no prepared statements they make SQL injection easy to introduce. You can manipulate your strings using strpos (see php.net/manual/en/function.strpos.php). The steps are: find the text to replace, translate it, remove the original phrase from wherever it is with $strip = str_replace($word, "", $Title), and finally build the result in a new variable like this: $variable = $translate . $strip. Hope that helps.
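The steps described in these answers can be put together roughly as follows. This is only a sketch: translateAndMoveToFront is a hypothetical name, the two arrays are assumed to be parallel (same index means same phrase), and only the first phrase found is moved to the front.

```php
<?php
// Hypothetical helper: move the first matching phrase to the front, translated.
// Assumption: $english[$i] corresponds to $bulgarian[$i].
function translateAndMoveToFront(string $title, array $english, array $bulgarian): string
{
    foreach ($english as $i => $phrase) {
        if ($phrase === '') {
            continue;
        }
        $pos = strpos($title, $phrase);
        if ($pos === false) {
            continue;
        }
        // Cut the phrase out of its original position.
        $rest = substr($title, 0, $pos) . substr($title, $pos + strlen($phrase));
        // Collapse the double space left behind by the removal.
        $rest = trim(preg_replace('/ {2,}/', ' ', $rest));
        // Translate any remaining phrases in place, then prepend the moved one.
        return $bulgarian[$i] . ' ' . str_replace($english, $bulgarian, $rest);
    }
    return $title; // nothing to translate
}

echo translateAndMoveToFront('I have very big blue eyes.', ['big blue'], ['големи сини']), "\n";
// prints "големи сини I have very eyes."
```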

Regex, PHP - finding words that need correction

I have a long string with words. Some of the words have special letters.
For example a string "have now a rea$l problem with$ dolar inp$t"
and I have a special letter "$".
I need to find and return all the words containing special letters as quickly as possible.
What I did is write a function that splits the string on spaces and then loops over all the words, searching for the special character in each one. When it finds one, it saves the word in an array. I have been told that regexes would perform much better, but I don't know how to implement this with them.
What is the best approach?
I am new to regex, but I understand it can help me with this task.
My code (forbidden is a constant):
The code works for now, but only for one forbidden character.
function findSpecialChar($x){
$special = "";
$exploded = explode(" ", $x);
foreach ($exploded as $word){
if (strpos($word,$forbidden) !== false)
$special .= $word;
}
return $special;
}
You could use preg_match like this:
// Set your special word here.
$special_word = "café";
// Set your sentence here.
$string = "I like to eat food at a café and then read a magazine.";
// Run it through 'preg_match'.
preg_match("/(?:\W|^)(\Q$special_word\E)(?:\W|$)/i", $string, $matches);
// Dump the results to see it working.
echo '<pre>';
print_r($matches);
echo '</pre>';
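The output would be (element 0 also includes the non-word characters matched on either side, here the surrounding spaces):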
Array
(
[0] => café
[1] => café
)
Then if you wanted to replace that, you could do this using preg_replace:
// Set your special word here.
$special_word = "café";
// Set your replacement here.
$special_word_replacement = " restaurant ";
// Set your sentence here.
$string = "I like to eat food at a café and then read a magazine.";
// Run it through 'preg_replace'.
$new_string = preg_replace("/(?:\W|^)(\Q$special_word\E)(?:\W|$)/i", $special_word_replacement, $string);
// Echo the results.
echo $new_string;
And the output for that would be:
I like to eat food at a restaurant and then read a magazine.
I am sure the regex could be refined to avoid having to add spaces before and after " restaurant " like I do in this example, but this is the basic concept I believe you are looking for.
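For the original question (returning every word that contains a forbidden character), a preg_match_all call may be closer to what is needed. This is a sketch under some assumptions: findWordsContaining is a hypothetical name, words are taken to be runs of non-whitespace, and the forbidden characters are single-byte.

```php
<?php
// Hypothetical helper: return every whitespace-delimited word that contains
// at least one of the characters in $forbiddenChars.
function findWordsContaining(string $text, string $forbiddenChars): array
{
    if ($forbiddenChars === '') {
        return [];
    }
    // preg_quote() escapes characters such as $ so they are treated literally
    // inside the character class.
    $class = '[' . preg_quote($forbiddenChars, '/') . ']';
    preg_match_all('/\S*' . $class . '\S*/', $text, $matches);
    return $matches[0];
}

print_r(findWordsContaining('have now a rea$l problem with$ dolar inp$t', '$'));
// Array ( [0] => rea$l [1] => with$ [2] => inp$t )
```

Passing several characters, for example '$#', makes the same call cover more than one forbidden character.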

Repeat pattern using preg_match

I want to validate the strings below: data between backticks may repeat any number of times, as long as each item is followed by a comma; if it is not followed by a comma, it must be followed by a ")". Whitespace is allowed only outside the backticks, not inside them.
I am not experienced with regex, so I don't know how to allow a repeated pattern. Below is my pattern so far.
Thanks
UPDATED
// first 3 lines should match
$lines[] = "(`a-z0-9_-`,`a-z0-9_-`,`a-z0-9_-`,`a-z0-9_-`)";
$lines[] = "( `a-z0-9_-`, `a-z0-9_-` ,`a-z0-9_-` , `a-z0-9_-` )";
$lines[] = "(`a-z0-9_-`,
`a-z0-9_-`
,`a-z0-9_-` ,`a-z0-9_-`)";
// these lines below should not match
$lines[] = "(`a-z0-9_-``a-z0-9_-`,`a-z0-9_-`,`a-z0-9_-`)";
$lines[] = "(`a-z0-9_-``a-z0-9_-`,`a-z0-9_-`.`a-z0-9_-`";
$pattern = '/~^\(\s*(?:[a-z0-9_-]+\s*,?\s*)+\)$~/';
$result = array();
foreach($lines as $key => $line)
{
if (preg_match($pattern, $line))
{
$result[$key] = 'Found match.';
}
else
{
$result[$key] = 'Not found a match.';
}
}
print("<pre>" . print_r($result, true). "</pre>");
You're very close. It looks like you want this:
$pattern = "~^\(\s*`[a-z0-9_-]+`\s*(?:,\s*`[a-z0-9_-]+`\s*)*\)$~";
The two problems with your regex were:
You had two sets of delimiters (slashes and tildes) - pick just one and stick with it. My personal preference is parentheses because then you don't have to escape anything "just because delimiters", but also it helps me remember that the entire match is the first entry in the match array.
By making the comma optional, you were allowing things you didn't want. The solution does involve repeating yourself a little, but it is more accurate.
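A quick way to check the suggested pattern against a few inputs is sketched below. isBacktickList is a hypothetical wrapper name, and the sample strings are shortened stand-ins for the lines in the question.

```php
<?php
// Hypothetical wrapper around the pattern suggested above.
function isBacktickList(string $line): bool
{
    // One backticked item, then zero or more ", item" repetitions, then ")".
    $pattern = '~^\(\s*`[a-z0-9_-]+`\s*(?:,\s*`[a-z0-9_-]+`\s*)*\)$~';
    return preg_match($pattern, $line) === 1;
}

var_dump(isBacktickList('(`a`,`b`,`c`)'));   // bool(true)
var_dump(isBacktickList('( `a` , `b` )'));   // bool(true)
var_dump(isBacktickList('(`a``b`,`c`)'));    // bool(false): missing comma
var_dump(isBacktickList('(`a`,`b`.`c`'));    // bool(false): dot and no ")"
```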
Well you weren't very clear about the matching rules for the data between the brackets, and you didn't really specify if you wanted to capture anything so...I took a best guess based on context of your code, hopefully this will suit your needs.
edit: fixed code block so it would show the backticks in the pattern, also changed the delimiter from ~ to / since OP was confused about that
$pattern = '/^\((\s*`[a-z0-9_-]+`\s*[,)])+$/';
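Here is a generic repeat pattern. Note that [^repeat_string] is a negated character class: it rejects a single character from that set, not the whole string.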
preg_match_all("/start_string([^repeat_string].*?)end_string/si", $input, $output);
var_dump($output);

Function which searches for a word in a text and highlights all the words which contain it

This function searches for words (from the $words array) inside a text and highlights them.
function highlightWords(Array $words, $text){ // Loop through array of words
foreach($words as $word){ // Highlight word inside original text
$text = str_replace($word, '<span class="highlighted">' . $word . '</span>', $text);
}
return $text; // Return modified text
}
Here is the problem:
Let's say $words = array("car", "drive");
Is there a way for the function to highlight not only the word car, but also words which contain the letters "car", like cars, carmania, etc.?
Thank you!
What you want is a regular expression, more specifically preg_replace or preg_replace_callback (the callback version is recommended in your case).
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
// define your word list
$toHighlight = array("car","lane");
Because you need a regular expression to search your words and you might want or need variation or changes over time, it's bad practice to hard code it into your search words. Hence it's best to walk over the array with array_map and transform the searchword into the proper regular expression (here just enclosing it with / and adding the "accept everything until punctuation" expression)
$searchFor = array_map('addRegEx',$toHighlight);
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . $word . '[^ ,\,,.,?,\.]*/';
}
Next you want to replace each word you found with its highlighted version, which means you need a dynamic replacement: use preg_replace_callback instead of plain preg_replace, so that it calls a function for every match it finds and uses the return value as the result. Here we enclose the found word in span tags:
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
$result = preg_replace_callback($searchFor,'highlight',$searchString);
print $result;
yields
The <span class='highlight'>car</span> is driving in the <span class='highlight'>carpark</span>, he's not holding to the right <span class='highlight'>lane</span>.
So just paste these code fragments after the other to get the working code, obviously. ;)
edit: the complete code below was altered a bit = placed in routines for easy use by original requester. + case insensitivity
complete code:
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
$toHighlight = array("car","lane");
$result = customHighlights($searchString,$toHighlight);
print $result;
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . $word . '[^ ,\,,.,?,\.]*/i';
}
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
function customHighlights($searchString,$toHighlight){
// define your word list
$searchFor = array_map('addRegEx',$toHighlight);
$result = preg_replace_callback($searchFor,'highlight',$searchString);
return $result;
}
I haven't tested it, but I think this should do it:-
$text = preg_replace('/\b(\w*' . preg_quote($word, '/') . '\w*)\b/', '<span class="highlighted">$1</span>', $text);
This looks for the string inside a complete bounded word and then puts the span around the whole lot using preg_replace and regular expressions.
function replace($format, $string, array $words)
{
foreach ($words as $word) {
$string = \preg_replace(
sprintf('#\b(?<string>[^\s]*%s[^\s]*)\b#i', \preg_quote($word, '#')),
\sprintf($format, '$1'), $string);
}
return $string;
}
// courtesy of http://slipsum.com/#.T8PmfdVuBcE
$string = "Now that we know who you are, I know who I am. I'm not a mistake! It
all makes sense! In a comic, you know how you can tell who the arch-villain's
going to be? He's the exact opposite of the hero. And most times they're friends,
like you and me! I should've known way back when... You know why, David? Because
of the kids. They called me Mr Glass.";
echo \replace('<span class="red">%s</span>', $string, [
'mistake',
'villain',
'when',
'Mr Glass',
]);
Since it's using an sprintf format for the surrounding string, you can change your replacement accordingly.
Excuse the 5.4 syntax

PHP: Bolding of overlapping keywords in string

This is a problem that I have figured out how to solve, but I want to solve it in a simpler way... I'm trying to improve as a programmer.
Have done my research and have failed to find an elegant solution to the following problem:
I have a hypothetical array of keywords to search for:
$keyword_array = array('he','heather');
and a hypothetical string:
$text = "What did he say to heather?";
And, finally, a hypothetical function:
function bold_keywords($text, $keyword_array)
{
$pattern = array();
$replace = array();
foreach($keyword_array as $keyword)
{
$pattern[] = "/($keyword)/is";
$replace[] = "<b>$1</b>";
}
$text = preg_replace($pattern, $replace, $text);
return $text;
}
The function (not too surprisingly) is returning something like this:
"What did <b>he</b> say to <b>he</b>ather?"
Because it is not recognizing "heather" when there is a bold tag in the middle of it.
What I want the final solution to do is, as simply as possible, return one of the two following strings:
"What did <b>he</b> say to <b>heather</b>?"
"What did <b>he</b> say to <b><b>he</b>ather</b>?"
Some final conditions:
--I would like the final solution to deal with a very large number of possible keywords
--I would like it to deal with the following two situations (lines represent overlapping strings):
One string engulfs the other, like the following two examples:
-- he, heather
-- sanding, and
Or one string does not engulf the other:
-- entrain, training
Possible way to solve:
-A regex that ignores tags in keywords
-Long way (that I am trying to avoid):
*Search string for all occurrences of each keyword, store an array of positions (start and end) of keywords to be bolded
*Process this array recursively to combine overlapping keywords, so there is no redundancy
*Add the bold tags (starting from the end of the string, to avoid the positions of information shifting from the additional characters)
Many thanks in advance!
Example
$keyword_array = array('he','heather');
$text = "What did he say to heather?";
$pattern = array();
$replace = array();
sort($keyword_array, SORT_NUMERIC);
foreach($keyword_array as $keyword)
{
$pattern[] = '/\b(' . preg_quote($keyword, '/') . ')\b/is'; // \b keeps "he" from matching inside "heather"
$replace[] = "<b>$1</b>";
}
$text = preg_replace($pattern, $replace, $text);
echo $text; // What did <b>he</b> say to <b>heather</b>?
You need to change your regex pattern so that each term you are searching for must be followed by whitespace or punctuation; that way the pattern does not match terms that are followed by an alphanumeric character.
A simplistic, lazy-ish approach off the top of my head:
Sort your initial array by item length, descending. No more "not recognized because there's already a tag in the middle" issues!
Edit: The nested tags issue is then easily fixed by extending your regex so that >foo and foo< are no longer matched.
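For comparison, the "long way" outlined in the question (collect positions, merge overlaps, insert tags from the end) is not much code either. The sketch below is one possible take: boldMergedKeywords is a hypothetical name, matching is byte-based and case-insensitive for ASCII only, and ranges that merely touch are merged as well.

```php
<?php
// Hypothetical helper: bold every keyword occurrence, merging overlapping hits
// so that no tag ever lands inside another keyword.
function boldMergedKeywords(string $text, array $keywords): string
{
    // 1. Collect [start, end) byte ranges for every occurrence of every keyword.
    $ranges = [];
    foreach ($keywords as $keyword) {
        if ($keyword === '') {
            continue;
        }
        $offset = 0;
        while (($pos = stripos($text, $keyword, $offset)) !== false) {
            $ranges[] = [$pos, $pos + strlen($keyword)];
            $offset = $pos + 1; // advance one byte so overlapping hits are found too
        }
    }
    if ($ranges === []) {
        return $text;
    }

    // 2. Sort by start position and merge overlapping or touching ranges.
    usort($ranges, function (array $a, array $b): int {
        return $a[0] <=> $b[0];
    });
    $merged = [array_shift($ranges)];
    foreach ($ranges as $range) {
        $last = count($merged) - 1;
        if ($range[0] <= $merged[$last][1]) {
            $merged[$last][1] = max($merged[$last][1], $range[1]);
        } else {
            $merged[] = $range;
        }
    }

    // 3. Insert tags from the end so earlier offsets stay valid.
    foreach (array_reverse($merged) as $range) {
        $start = $range[0];
        $end = $range[1];
        $text = substr($text, 0, $start)
            . '<b>' . substr($text, $start, $end - $start) . '</b>'
            . substr($text, $end);
    }
    return $text;
}

echo boldMergedKeywords('What did he say to heather?', ['he', 'heather']), "\n";
// prints "What did <b>he</b> say to <b>heather</b>?"
```

Because the tags are added only after all positions are known, both the engulfing case (sanding, and) and the partial overlap case (entrain, training) end up as one bold span.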
