filtering bad words from text

This function filters emails from the text and returns the matched patterns:
function parse($text, $words)
{
$resultSet = array();
foreach ($words as $word){
$pattern = 'regex to match emails';
preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE );
$this->pushToResultSet($matches);
}
return $resultSet;
}
In a similar way, I want to match bad words in the text and return them as $resultSet.
Here is my code to filter bad words:
$badwords = array('shit', 'fuck'); // Here we can use all bad words from database
$text = 'Man, I shot this f*ck, sh/t! fucking fu*ker sh!t f*cking sh\t ;)';
echo "filtered words <br>";
echo $text."<br/>";
$words = explode(' ', $text);
foreach ($words as $word)
{
$bad= false;
foreach ($badwords as $badword)
{
if (strlen($word) >= strlen($badword))
{
$wordOk = false;
for ($i = 0; $i < strlen($badword); $i++)
{
if ($badword[$i] !== $word[$i] && ctype_alpha($word[$i]))
{
$wordOk = true;
break;
}
}
if (!$wordOk)
{
$bad= true;
break;
}
}
}
echo $bad ? 'beep ' : ($word . ' '); // Here $bad words can be returned and replace with *.
}
This replaces bad words with beep.
But I want to push the matched bad words to $this->pushToResultSet() and return them, as in the first snippet for email filtering.
Can I do this with my bad-word filtering code?

Roughly converting David Atchley's answer to PHP, does this work as you want it to?
$blocked = array('fuck','shit','damn','hell','ass');
$text = 'Man, I shot this f*ck, damn sh/t! fucking fu*ker sh!t f*cking sh\t ;)';
$matched = preg_match_all("/(".implode('|', $blocked).")/i", $text, $matches);
$filter = preg_replace("/(".implode('|', $blocked).")/i", 'beep', $text);
var_dump($filter);
var_dump($matches);
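To get a result set like the one in the email example, the matches can be collected together with their offsets. The following is only a minimal sketch: findBadWords is a hypothetical helper name (not from the original posts), and it assumes plain substring matching is acceptable. It uses preg_quote so that words containing regex metacharacters do not break the pattern.

```php
<?php
// Hypothetical helper (an assumption, not part of the original code):
// returns every bad-word match together with its byte offset in $text.
function findBadWords(string $text, array $badwords): array
{
    if ($badwords === []) {
        return [];
    }
    // Escape each word so characters like * or / are matched literally.
    $alternatives = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $badwords);
    $pattern = '/(' . implode('|', $alternatives) . ')/i';

    preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    $resultSet = [];
    foreach ($matches[0] as $m) {
        // With PREG_OFFSET_CAPTURE each entry is [matched text, offset].
        $resultSet[] = ['word' => $m[0], 'offset' => $m[1]];
    }
    return $resultSet;
}

$hits = findBadWords('damn, that was a Damn shame', ['damn', 'hell']);
```

Note that this sketch only finds the words as spelled in the list; obfuscated spellings such as f*ck would still need the character-by-character comparison from the question or a more tolerant pattern.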

JSFiddle for working example.
Yes, you can match bad words (saving for later), replace them in the text and build the regex dynamically based on an array of bad words you're trying to filter (you might store it in DB, load from JSON, etc.). Here's the main portion of the working example:
var blocked = ['fuck','shit','damn','hell','ass'],
matchBlocked = new RegExp("("+blocked.join('|')+")", 'gi'),
text = $('.unfiltered').text(),
matched = text.match(matchBlocked),
filtered = text.replace(matchBlocked, 'beep');
Please see the JSFiddle link above for the full working example.

Related

PHP find tags in content text and wrap in <a> tags and set limit the number of links

I'm finding the keywords "denounce,and,demoralized" in a string and wrapping each one in an HTML <a> tag to turn it into a link, using the following function:
function link2tags($text, $tags){
$tags = preg_replace('/\s+/', ' ', trim($tags));
$words = explode(',', $tags);
$linked = array();
foreach ( $words as $word ){
$linked[] = '<a href="...">'.$word.'</a>';
}
return str_replace($words, $linked, $text);
}
echo link2tags('we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment', 'denounce,and,demoralized');
The output of the above function is as follows...
Output:
we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment
Here the word "and" is linked twice. I want to limit the number of links per word,
so that repeated words are linked only once.

You need to find only the first occurrence of each word and replace just that one. Check the code below:
function link2tags($text, $tags){
$tags = preg_replace('/\s+/', ' ', trim($tags));
$words = explode(',', $tags);
$linked = array();
$existingLinks = array();
foreach ( $words as $word ){
if (!in_array($word, $existingLinks)) {
$existingLinks[] = $word;
$linked[] = '<a href="...">'.$word.'</a>';
}
}
foreach ($existingLinks as $key => $value) {
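// Quote the word and match it as a whole word, replacing only the first occurrence
$text = preg_replace('/\b'.preg_quote($value, '/').'\b/', $linked[$key], $text, 1);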
}
return $text;
}
Hope it helps you.
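A single-pass variant avoids one pitfall of replacing word by word: a later keyword can match inside markup that an earlier replacement inserted. The sketch below is an assumption-laden illustration, not the original poster's code: linkTagsOnce and the /tag/ URL prefix are hypothetical, since the real link target is not shown in the question.

```php
<?php
// Hypothetical helper: links only the first occurrence of each tag,
// scanning the text once. The '/tag/' base URL is a made-up placeholder.
function linkTagsOnce(string $text, string $tags, string $baseUrl = '/tag/'): string
{
    // Split on commas, trim whitespace, drop empty entries.
    $words = array_filter(array_map('trim', explode(',', $tags)), 'strlen');
    if (!$words) {
        return $text;
    }
    $alternatives = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/';

    $seen = [];
    return preg_replace_callback($pattern, function ($m) use (&$seen, $baseUrl) {
        $word = $m[1];
        if (isset($seen[$word])) {
            return $word; // already linked once, leave later occurrences alone
        }
        $seen[$word] = true;
        return '<a href="' . $baseUrl . rawurlencode($word) . '">' . $word . '</a>';
    }, $text);
}

$linkedText = linkTagsOnce('we denounce this and that and more', 'denounce, and');
```

Because the callback sees each match exactly once and never rescans its own output, the inserted <a> tags cannot be matched again.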

You can check whether a word has already been used, as below:
if(!in_array($word,$alreadyusedword)) {
$linked[] = '<a href="...">'.$word.'</a>';
$alreadyusedword[] = $word;
}

Php search replacer

I have a search string $str (something like "test"), a wrap string $wrap (something like "|"), and a text string $text (something like "This is a test Text").
$str occurs once in $text. What I want is a function that wraps $str with the wrapper defined in $wrap and outputs the modified text (even if $str occurs more than once in $text).
It should not output the whole text, though, only 1-2 words before $str, then 1-2 words after $str, and "..." (only if $str isn't the first or last word). It should also be case-insensitive.
Example:
$str = "Text"
$wrap = "<span>|</span>"
$text = "This is a really long Text where the word Text appears about 3 times Text"
Output would be:
"...long <span>Text</span> where...word <span>Text</span> appears...times <span>Text</span>"
My code (obviously doesn't work):
$tempar = preg_split("/$str/i", $text);
if (count($tempar) <= 2) {
$result = "... ".substr($tempar[0], -7).$wrap.substr($tempar[1], 7)." ...";
} else {
$amount = substr_count($text, $str);
for ($i = 0; $i < $amount; $i++) {
$result = $result.".. ".substr($tempar[$i], -7).$wrap.substr($tempar[$i+1], 0, 7)." ..";
}
}
If you have a tip or a solution, don't hesitate to let me know.

I have taken your approach and made it more flexible. If $str or $wrap changes you could have escaping issues within the regex pattern so I have used preg_quote.
Note that I added $placeholder to make it clearer, but you can use $placeholder = "|" if you don't like [placeholder].
function wrapInString($str, $text, $element = 'span') {
$placeholder = "[placeholder]"; // The string that will be replaced by $str
$wrap = "<{$element}>{$placeholder}</{$element}>"; // Dynamic string that can handle more than just span
$strExp = preg_quote($str, '/');
$matches = [];
$matchCount = preg_match_all("/(\w+\s+)?(\w+\s+)?({$strExp})(\s+\w+)?(\s+\w+)?/i", $text, $matches);
$response = '';
for ($i = 0; $i < $matchCount; $i++) {
if (strlen($matches[1][$i])) {
$response .= '...';
}
if (strlen($matches[2][$i])) {
$response .= $matches[2][$i];
}
$response .= str_replace($placeholder, $matches[3][$i], $wrap);
if (strlen($matches[4][$i])) {
$response .= $matches[4][$i];
}
if (strlen($matches[5][$i]) && $i == $matchCount - 1) {
$response .= '...';
}
}
return $response;
}
$text = "text This is a really long Text where the word Text appears about 3 times Text";
var_dump(wrapInString("text", $text));
// string(107) "<span>text</span> This...long <span>Text</span> where...<span>Text</span> appears...times <span>Text</span>"
To make the replacement case insensitive you can use the i regex option.

If I understand your question correctly, just a little bit of implode and explode magic is needed:
$text = "This is a really long Text where the word Text appears about 3 times Text";
$arr = explode("Text", $text);
print_r(implode('&lt;span&gt;Text&lt;/span&gt;', $arr));
If you specifically need to render the span tags using HTML, just write it that way
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));

Use the pattern below to match your word and the 1-2 words before and after it:
/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i
In PHP code it can be:
$str = "Text";
$wrap = "<span>|</span>";
$text = "This is a really long Text where the word Text appears about 3 times Text";
$temp = str_replace('|', $str, $wrap); // <span>Text</span>
// find the pattern and 1-2 words before and after
// (to make it case-sensitive, delete 'i' from the pattern)
if(preg_match_all('/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i', $text, $match)) {
$res = array_map(function($x) use($str, $temp) { return '... '.str_replace($str, $temp, $x) . ' ...';}, $match[0]);
echo implode(' ', $res);
}
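The pattern above hard-codes the search word. As a rough sketch of how it could be built from $str instead, the function below quotes the search string, keeps up to two words of context on each side, and wraps every hit while preserving its original case. excerptAround is a hypothetical name, and the sketch assumes $str is non-empty and that $wrap contains a single | marking where the match goes.

```php
<?php
// Hypothetical helper: builds the context pattern from $str and wraps each hit.
// Assumes $str is non-empty and $wrap contains one '|' placeholder.
function excerptAround(string $str, string $wrap, string $text): string
{
    $quoted  = preg_quote($str, '/');
    // Up to two words before and up to two words after the search string.
    $pattern = '/(?:\w+\s+){0,2}' . $quoted . '(?:\s+\w+){0,2}/i';
    if (!preg_match_all($pattern, $text, $found, PREG_OFFSET_CAPTURE)) {
        return '';
    }
    list($open, $close) = array_pad(explode('|', $wrap, 2), 2, '');

    $out = '';
    $lastEnd = 0;
    foreach ($found[0] as $hit) {
        list($snippet, $offset) = $hit;
        // Wrap the search string inside the snippet, keeping its original case.
        $wrapped = preg_replace_callback('/' . $quoted . '/i', function ($m) use ($open, $close) {
            return $open . $m[0] . $close;
        }, $snippet);
        // Leading dots only when the snippet does not start the text.
        $out .= ($offset > 0 ? '...' : '') . $wrapped;
        $lastEnd = $offset + strlen($snippet);
    }
    // Trailing dots only when text remains after the last snippet.
    return $out . ($lastEnd < strlen($text) ? '...' : '');
}

$text = "This is a really long Text where the word Text appears about 3 times Text";
$excerpt = excerptAround('text', '<span>|</span>', $text);
```

Using the matched text ($m[0]) rather than $str in the wrapper is a design choice: it keeps "Text" capitalized as it appears in the source even though the search itself is case-insensitive.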

PHP Regex expression excluding <pre> tag

I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their descriptions, using PHP's preg_replace function.
The issue is that it also replaces acronyms inside <pre> tags, which I use to present source code.
Could you modify this expression so that it won't replace acronyms contained inside <pre> tags (not only as direct children, but at any depth)? Is that possible?
The PHP code is:
$text = preg_replace(
"|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
, "<acronym title=\"$fulltext\">$acronym</acronym>"
, $text
);

You can use a PCRE SKIP/FAIL regex trick (also works in PHP) to tell the regex engine to only match something if it is not inside some delimiters:
(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b
This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.
See demo on regex101.com
Here is a sample PHP demo:
<?php
$acronym = "ASCII";
$fulltext = "American Standard Code for Information Interchange";
$re = "/(?s)<pre[^<]*>.*?<\\/pre>(*SKIP)(*F)|\\b$acronym\\b/";
$str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
$subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
<pre>ASCII
Sometext
Moretext</pre>More text 
<acronym title="American Standard Code for Information Interchange">ASCII</acronym>
More text<pre>More
lines
ASCII
lines</pre>
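The same SKIP/FAIL idea can be wrapped in a small function. This is only a sketch under a few assumptions: expandAcronymOutsidePre is a hypothetical name, the acronym is quoted with preg_quote, and the title text is escaped with htmlspecialchars so it is safe inside the attribute. A callback is used so that characters such as $ in the description are not treated as backreferences.

```php
<?php
// Hypothetical helper: wraps whole-word occurrences of $acronym in <acronym>,
// except where they appear inside <pre>...</pre> blocks.
function expandAcronymOutsidePre(string $html, string $acronym, string $fulltext): string
{
    // First alternative consumes a whole <pre> block and then fails with
    // (*SKIP)(*F), so the engine never retries inside it.
    $pattern = '~<pre\b[^>]*>.*?</pre>(*SKIP)(*F)|\b' . preg_quote($acronym, '~') . '\b~s';
    $title = htmlspecialchars($fulltext, ENT_QUOTES);

    return preg_replace_callback($pattern, function ($m) use ($title) {
        return '<acronym title="' . $title . '">' . $m[0] . '</acronym>';
    }, $html);
}

$html = "<pre>ASCII\ncode</pre> ASCII text <pre class=\"x\">ASCII</pre>";
$expanded = expandAcronymOutsidePre($html, 'ASCII', 'X & Y');
```

Like the answer's pattern, this only handles well-formed, non-nested <pre> blocks; it is not an HTML parser.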

It is also possible to use preg_split and keep the code block as a group, only replace the non-code block part then combine it back as a complete string:
function replace($s) {
return str_replace('"', '&quot;', $s); // do something with `$s`
}
$text = 'Your text goes here...';
// -1 means "no limit"; passing null for the limit is deprecated as of PHP 8.1
$parts = preg_split('#(<\/?[-:\w]+(?:\s[^<>]+?)?>)#', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$text = "";
$x = 0;
foreach ($parts as $v) {
if (trim($v) === "") {
$text .= $v;
continue;
}
if ($v[0] === '<' && substr($v, -1) === '>') {
if (preg_match('#^<(\/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
$x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
}
$text .= $v; // this is a HTML tag…
} else {
$text .= !$x ? replace($v) : $v; // process or skip…
}
}
return $text;
Taken from here.

Preg Replace in PHP for Heading Tags

I have Markdown text that I have to convert without using library functions, so I used preg_replace for this. It works fine for some cases, but not for headings.
For example, Heading
=======
should be converted to <h1>Heading</h1> and also
##Sub heading should be converted to <h2>Sub heading</h2>
###Sub heading should be converted to <h3>Sub heading</h3>
I have tried
$text = preg_replace('/##(.+?)\n/s', '<h2>$1</h2>', $text);
The above code works, but I need to count the hash symbols and choose the heading tag based on that count.
Can anyone help me, please?

Try using preg_replace_callback.
Something like this -
$regex = '/(#+)(.+?)\n/s';
$line = "##Sub heading\n ###sub-sub heading\n";
$line = preg_replace_callback(
$regex,
function($matches){
$h_num = strlen($matches[1]);
return "<h$h_num>".$matches[2]."</h$h_num>";
},
$line
);
echo $line;
The output would be something like this -
<h2>Sub heading</h2> <h3>sub-sub heading</h3>
EDIT
For the combined problem of using = for headings and # for sub-headings, the regex gets a bit more complicated, but the principle remains the same using preg_replace_callback.
Try this -
$regex = '/(?:(#+)(.+?)\n)|(?:(.+?)\n\s*=+\s*\n)/';
$line = "Heading\n=======\n##Sub heading\n ###sub-sub heading\n";
$line = preg_replace_callback(
$regex,
function($matches){
//var_dump($matches);
if($matches[1] == ""){
return "<h1>".$matches[3]."</h1>";
}else{
$h_num = strlen($matches[1]);
return "<h$h_num>".$matches[2]."</h$h_num>";
}
},
$line
);
echo $line;
Whose Output is -
<h1>Heading</h1><h2>Sub heading</h2> <h3>sub-sub heading</h3>
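A line-anchored variant may be easier to reason about. The sketch below is an alternative under stated assumptions, not the answer's exact approach: convertHeadings is a hypothetical name, lines are assumed to end in \n, and the m modifier makes ^ and $ match at line boundaries so each heading is handled on its own line and the newlines are preserved.

```php
<?php
// Hypothetical helper: converts "=" underlined headings and "#" prefixed
// headings to HTML, one line at a time (assumes \n line endings).
function convertHeadings(string $markdown): string
{
    // A text line followed by a line made of "=" characters becomes <h1>.
    $markdown = preg_replace('/^(.+)\n=+[ \t]*$/m', '<h1>$1</h1>', $markdown);

    // One to six leading "#" characters select <h1> through <h6>.
    return preg_replace_callback('/^(#{1,6})[ \t]*(.+?)[ \t]*$/m', function ($m) {
        $level = strlen($m[1]);
        return "<h{$level}>{$m[2]}</h{$level}>";
    }, $markdown);
}

$htmlHeadings = convertHeadings("Heading\n=======\n##Sub heading\n###Sub sub\nplain");
```

Unlike the pattern in the answer, this requires the hashes to start the line, so a # in the middle of a sentence is left alone.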

Use preg_match to capture the leading run of # characters:
$string = "#####asdsadsad";
preg_match("/^#+/", $string, $matches);
var_dump(strlen($matches[0])); // int(5)
Based on that count you can do whatever you want.
Or, use the preg_replace_callback function.
$input = "#This is my text";
$pattern = '/^(#+)(.+)/';
$mytext = preg_replace_callback($pattern, 'parseHashes', $input);
var_dump($mytext);
function parseHashes($input) {
var_dump($input);
$matches = array();
preg_match_all('/(#)/', $input[1], $matches);
var_dump($matches[0]);
var_dump(count($matches[0]));
$cnt = count($matches[0]);
if ($cnt <= 6 && $cnt > 0) {
return '<h' . $cnt . ' class="if you want class here">' . $input[2] . '</h' . $cnt . '>';
} else {
// This is not a valid h tag, so return the matched text unchanged
// (returning false would replace it with an empty string).
return $input[0];
}
}

How do I return a part of text with a certain word in the middle?

If this is the input string:
$input = 'In biology (botany), a "fruit" is a part of a flowering
plant that derives from specific tissues of the flower, mainly one or
more ovaries. Taken strictly, this definition excludes many structures
that are "fruits" in the common sense of the term, such as those
produced by non-flowering plants';
And now I want to perform a search on the word tissues and consequently return only a part of the string, defined by where the result is, like this:
$output = '... of a flowering plant that derives from specific tissues of the flower, mainly one or more ovaries ...';
The search term may be in the middle.
How do I perform the aforementioned?

An alternative to my other answer using preg_match:
$word = 'tissues';
$matches = array();
$found = preg_match("/\b(.{0,30}$word.{0,30})\b/i", $string, $matches);
if ($found == 0) {
// string not found
} else {
$output = $matches[1];
}
This may be better as it uses word boundaries.
EDIT: To surround the search term with a tag, you'll need to slightly alter the regex. This should do it:
$word = 'tissues';
$matches = array();
$found = preg_match("/\b(.{0,30})$word(.{0,30})\b/i", $string, $matches);
if ($found == 0) {
// string not found
} else {
$output = $matches[1] . "<strong>$word</strong>" . $matches[2];
}

Use strpos to find the location of the word and substr to extract the quote. For example:
$word = 'tissues';
$pos = strpos($string, $word);
if ($pos === FALSE) {
// string not found
} else {
$start = $pos - 30;
if ($start < 0)
$start = 0;
$output = substr($string, $start, 70);
}
Use stripos for case insensitive search.
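Putting the strpos/substr idea into a reusable function might look like the sketch below. excerptAroundWord and its $radius parameter are hypothetical names introduced for illustration; the function uses stripos for a case-insensitive search, collapses line breaks (the question's input spans several lines), and adds the "..." markers only where text was actually cut off. It works on bytes, so multibyte text would need the mb_* equivalents.

```php
<?php
// Hypothetical helper: returns $radius bytes of context on each side of the
// first case-insensitive occurrence of $word, or null when it is not found.
function excerptAroundWord(string $text, string $word, int $radius = 30): ?string
{
    $pos = stripos($text, $word);
    if ($pos === false) {
        return null;
    }
    $start  = max(0, $pos - $radius);
    $length = ($pos - $start) + strlen($word) + $radius;

    // Cut the window and collapse newlines and repeated spaces.
    $snippet = preg_replace('/\s+/', ' ', substr($text, $start, $length));

    $prefix = $start > 0 ? '... ' : '';
    $suffix = ($start + $length) < strlen($text) ? ' ...' : '';
    return $prefix . trim($snippet) . $suffix;
}

$quote = excerptAroundWord("0123456789 tissues 0123456789", 'TISSUES', 3);
```

This cuts at a fixed byte distance, so the excerpt can begin or end mid-word; the preg_match answer with \b is the better fit when whole words matter.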
