In PHP, get entire word from MySQL search result using "LIKE" - php

What I want is this:
Let's suppose I searched "goo" using a query that goes like this: ...WHERE message LIKE '%goo%' and it returned a result, for example: I love Google to make my searches, but I'm starting to worry about privacy. This row is displayed as a result because the word Google matches my search criteria.
How do I, based on my search string, save the entire matched word (Google) in a variable?
I need this because I'm using a regular expression that highlights the searched word and displays some content before and after it, but it only works when the searched word exactly matches a word in the result. It is also poorly constructed, so it doesn't work well with words that are not surrounded by spaces.
This is the regular expression code
<?=preg_replace('/^.*?\s(.{0,'.$size.'})(\b'.$_GET['s'].'\b)(.{0,'.$size.'})\s.*?$/',
'...$1<strong>$2</strong>$3...',$message);?>
What I want is to replace $_GET['s'] with a variable containing the whole word found by my query.
How do I achieve this ?

It would probably be easier to change your regular expression to match any word containing the term. What about:
<?=preg_replace('/^.*?(.{0,'.$size.'})(\b\S*'.$_GET['s'].'\S*\b)(.{0,'.$size.'}).*?$/i',
'...$1<strong>$2</strong>$3...',$message);?>
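To get the whole matched word into a variable, as the question asks, the same idea can be used with preg_match. This is a minimal sketch, assuming the term is a non-empty plain string; the names findWholeWord, $term and $word are made up for illustration, and preg_quote is added so that regex metacharacters in the term are treated literally:

```php
<?php
// Hypothetical helper: returns the first whole word in $message that
// contains $term (case-insensitive), or null when nothing matches.
function findWholeWord(string $message, string $term): ?string
{
    // preg_quote() escapes regex metacharacters in the user input;
    // its second argument also escapes the "/" delimiter.
    $pattern = '/[\p{L}\p{N}_]*' . preg_quote($term, '/') . '[\p{L}\p{N}_]*/iu';

    return preg_match($pattern, $message, $m) === 1 ? $m[0] : null;
}

$message = "I love Google to make my searches, but I'm starting to worry about privacy";
$word = findWholeWord($message, 'goo');
echo $word, PHP_EOL; // Google
```

The value in $word could then be used in place of $_GET['s'] in the highlighting expression.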

I read your discussion on this, and a more robust implementation might be in order, especially taking your need to support diacritics into account. Using a single regular expression to fix all your problems might seem tempting, but the more complicated it becomes, the harder it gets to maintain or expand upon. To quote Jamie Zawinski:
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
As I have problems with iconv on my local machine, I used a simpler implementation instead. Feel free to use something more complicated or robust if your situation requires it.
I use a simple regular expression in this solution to split the text into runs of alphanumeric characters (also known as "words"). The \p{L}\p{M} part of the regular expression makes sure letters and combining marks outside ASCII (multibyte characters) are treated as part of a word too.
You can see this code working on IDEone.
<?php
function stripAccents($p_sSubject) {
$sSubject = (string) $p_sSubject;
$sSubject = str_replace('æ', 'ae', $sSubject);
$sSubject = str_replace('Æ', 'AE', $sSubject);
$sSubject = strtr(
utf8_decode($sSubject)
, utf8_decode('àáâãäåçèéêëìíîïñòóôõöøùúûüýÿÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝ')
, 'aaaaaaceeeeiiiinoooooouuuuyyAAAAAACEEEEIIIINOOOOOOUUUUY'
);
return $sSubject;
}
function emphasiseWord($p_sSubject, $p_sSearchTerm){
// -1 means "no limit"; passing null for the limit is deprecated since PHP 8.1
$aSubjects = preg_split('#([^a-z0-9\p{L}\p{M}]+)#iu', $p_sSubject, -1, PREG_SPLIT_DELIM_CAPTURE);
foreach($aSubjects as $t_iKey => $t_sSubject){
$sSubject = stripAccents($t_sSubject);
if(stripos($sSubject, $p_sSearchTerm) !== false || mb_stripos($t_sSubject, $p_sSearchTerm) !== false){
$aSubjects[$t_iKey] = '<strong>' . $t_sSubject . '</strong>';
}
}
$sSubject = implode('', $aSubjects);
return $sSubject;
}
/////////////////////////////// Test \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
$aTest = array(
'goo' => 'I love Google to make my searches, but I`m starting to worry about privacy.'
, 'peo' => 'people, People, PEOPLE, peOple, people!, people., people?, "people, people" péo'
, 'péo' => 'people, People, PEOPLE, peOple, people!, people., people?, "people, people" péo'
, 'gen' => '"gente", "inteligente", "VAGENS", and "Gente" ...vocês da física que passam o dia protegendo...'
, 'voce' => '...vocês da física que passam o dia protegendo...'
, 'o' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
, 'ø' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
, 'ae' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
, 'Æ' => 'Characters like æ,ø,å,Æ,Ø and Å are used in Denmark, Sweden and Norway'
);
$sContent = '<dl>';
foreach($aTest as $t_sSearchTerm => $t_sSubject){
$sContent .= '<dt>' . $t_sSearchTerm . '</dt><dd>' . emphasiseWord($t_sSubject, $t_sSearchTerm) .'</dd>';
}
$sContent .= '</dl>';
echo $sContent;
?>

I don't understand the importance of matching everything else in the search string. Wouldn't this simply be enough?
<?=preg_replace('/\b\S*'.$_GET['s'].'\S*\b/i', '<strong>$0</strong>', $message);?>
As far as I can tell, you are only wrapping the matched word in an HTML tag and not doing anything to the rest of the string.
The above regex works fine for cases where you are only matching whole words, captures multiple matches within a string (should there be more than one) and also works fine with case insensitivity.
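Because the term comes straight from user input, it may be worth escaping both the pattern and the output. The following is only a sketch under that assumption: highlightTerm is a hypothetical helper, the term is assumed to be non-empty, and preg_split is used instead of preg_replace so that each piece can be HTML-escaped separately.

```php
<?php
// Hypothetical helper: wraps every word containing $term in <strong>
// and HTML-escapes everything else.
function highlightTerm(string $message, string $term): string
{
    // The whole expression is one capture group, so with
    // PREG_SPLIT_DELIM_CAPTURE the matches land at the odd indexes.
    $pattern = '/(\b\S*' . preg_quote($term, '/') . '\S*\b)/i';
    $parts = preg_split($pattern, $message, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        $escaped = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $parts[$i] = ($i % 2 === 1) ? "<strong>$escaped</strong>" : $escaped;
    }

    return implode('', $parts);
}

echo highlightTerm('I love Google, and googling is fun', 'goo'), PHP_EOL;
// I love <strong>Google</strong>, and <strong>googling</strong> is fun
```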

Related

Using preg_replace to modify link to file

How to use preg_replace to replace some of a link, but keep the original link as text?
I tried using https://www.phpliveregex.com/#tab-preg-replace, but preg_replace is far too complex for my knowledge.
In short I would like to transform this:
!f:\cases\case\20190813_case.pdf!
To this:
<a href='file://server-files/data/cases/case/20190813_case.pdf'>f:\cases\case\20190813_case.pdf</a>
So that the user sees the network drive as a letter, but the link is actually a link via the server name.
$string = '!f:\cases\case\20190813_case.pdf!'; // single quotes, so \201 is not read as an octal escape
$string = str_ireplace("F:\\", "file://server-files/Data/", $string);
$string = preg_replace("/\!(.*?)\!/", "<a href='$1'>$1</a>", $string);
This gives:
<a href='file://server-files/Data/cases\case\20190813_case.pdf'>file://server-files/Data/cases\case\20190813_case.pdf</a>
It works fine, but I would like to format link text like this
<a href='file://server-files/Data/cases\case\20190813_case.pdf'>f:\cases\case\20190813_case.pdf</a>
Does anyone know if it is possible?
Might it also be possible to skip the str_ireplace and do it all in the preg_replace line?
EDIT
The actual text is like this (I had to anonymize some parts).
Vi har afleveret et skitseprojekt til et nyt domicil for XXXXX
XXXXXXXX.
Mappen kan ses her !F:\A-sager\XXXXXXXX - nyt
domicil\8-Forslag\D-Sendt\fremlagt for bygherren\20190813 domicil.pdf!
Projektet er endnu ikke offentligt.
The text is urlencoded and stored in a XML file.
There is no reason to use regular expressions for simple string replacements. I'm not saying you shouldn't get over that barrier and learn them; they're just not needed here.
<?php
$str = '!f:\cases\case\20190813_case.pdf!';
$str1 = substr($str, 1, strlen($str) -2);
$str2 = substr($str, 4, strlen($str) -5);
echo "<a href='file://{$str2}'>{$str1}</a>";
//<a href='file://cases\case\20190813_case.pdf'>f:\cases\case\20190813_case.pdf</a>
//if slashes are wrong...
var_dump(str_replace('\\', '/', $str1)) ;//see const DIRECTORY_SEPARATOR
//string(31) "f:/cases/case/20190813_case.pdf"
PHP has a string function for about everything you could ever need.
Update: You stated that there can be multiple links in one "string" (in a question since deleted). You've not provided an example of the format, though. Assuming a delimiter of ! and that you want to use PCRE, try:
<?php
$str = '!f:\cases\case\20190813_case1.pdf!!f:\cases\case\20190813_case2.pdf!!f:\cases\case\20190813_case3.pdf!';
preg_match_all('#!(.*?)!#', $str, $matches);
var_dump($matches[1]);
There are often many ways to accomplish the same basic string manipulation (strtok, explode, etc).
...Seeing your update, it sounds like you could use an XML parser, iterate over the text nodes, and apply the examples I've provided, specifically the regular expression, to isolate each path. Watch for false positives if exclamation marks appear elsewhere in the text. Ask if you get stuck on anything else specific, and good luck!
Typically I'd say aim to write the code that is most clear and concise. Readable.
I suggest:
$str = <<<'EOD'
Vi har afleveret et skitseprojekt til et nyt domicil for XXXXX XXXXXXXX.
Mappen kan ses her !F:\A-sager\XXXXXXXX - nyt domicil\8-Forslag\D-Sendt\fremlagt for bygherren\20190813 domicil.pdf!
Projektet er endnu ikke offentligt.
EOD;
echo preg_replace_callback('~!f:(.*?)!~i', function ($m) {
return '<a href="file://server-files/Data'
. strtr(rawurlencode($m[1]), ['%5C'=> '/'])
. '">f:' . $m[1] . '</a>';
}, $str);

Converting text to smiley if multiple smileys are combined together not working

I'm trying to convert text ($icon) to a smiley image ($image). I used to do it with str_replace(), but that seems to perform the replacements sequentially, and as such it also replaces items in previously converted results (for example in the <img> tag).
I am now using the following code:
foreach($smiliearray as $image => $icon){
$pattern[]="/(?<!\S)" . preg_quote($icon, '/') . "(?!\S)/u";
$replacement[]=" <img src='$image' border='0' alt=''> ";
}
$text = preg_replace($pattern,$replacement,$text);
This code works, but only if the smiley code is surrounded by whitespace. So if someone types ":);)", it isn't recognized as two separate smileys, whereas ":) ;)" is.
How can I fix it so that also a string of smileys (not separated by space) are converted?
Note that there can be unlimited kinds of smiley codes and smiley images. I do not know beforehand which ones, because other people can submit codes and smileys, so it is not just ":)" and ";)", but can also be "rofl", ":eh", ":-{", etc.
I can partially fix it by adding a \W non-word to the end of the lookahead: (?!\S\W), and further by adding a 2nd $pattern and $replacement with a \W in the lookbehind. But I don't think that is the way it should be done, and it only partially solves it.
I used to do it with str_replace(), but that seems to perform the
replace sequentially and as such it also replaces items in previously
converted results...
A good and true reason to use strtr(). You don't even need Regular Expressions:
<?php
// I assume your original array looks like this
$origSmileys = [
"/1.png" => ':)',
"/2.png" => ':(',
"/3.png" => ':P',
"/4.png" => '>:('
];
// sample input string
$str = " I'm :) but :(>:(:( now :P";
// iterating over smileys to add html tag
$newSmileys = array_map(function($value) {
return "<img src='$value' border='0' alt=''>";
}, array_flip($origSmileys));
// replace
echo strtr($str, $newSmileys);
Live demo
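The difference between the two functions can be seen on a single overlapping pair of codes. This is a small sketch with a made-up two-entry map (the image paths are placeholders), comparing str_replace() with the array form of strtr():

```php
<?php
// Made-up smiley map for illustration: code => replacement HTML.
$map = [
    ':('  => "<img src='/sad.png' alt=''>",
    '>:(' => "<img src='/angry.png' alt=''>",
];

// str_replace() applies the pairs one after another, so ":(" is
// replaced first and ">:(" can no longer be found as a whole.
$sequential = str_replace(array_keys($map), array_values($map), '>:(');

// strtr() scans the input once, tries the longest key first, and
// never re-scans text it has already replaced.
$singlePass = strtr('>:(', $map);

echo $sequential, PHP_EOL; // ><img src='/sad.png' alt=''>
echo $singlePass, PHP_EOL; // <img src='/angry.png' alt=''>
```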

PHP regular expression help needed on special characters

Here is my code:
$string="According to a report on the Times of In#dia, &#8220 Telan#gana Rashtra Samiti chief K Chandrasekhar #Rao has seen a #sinister motive behind the protests against the formation of Telangana";
preg_match_all('/(?!\b)(#\w+\b)/' ,$string, $matches);
foreach($matches[1] as $match){
$string = str_replace("$match","[h]".$match."[/h]",$string);
}
echo $string;
output
According to a report on the Times of In#dia, &[h]#8220[/h] Telan#gana
Rashtra Samiti chief K Chandrasekhar [h]#Rao[/h] has seen a
[h]#sinister[/h] motive behind the protests against the formation of
Telangana
I want to replace only the strings that start with #, but it is also replacing &#8220 with &[h]#8220[/h]. Please help me with this.
Try using a positive lookbehind. The (?!\b) check only rejects a # that directly follows a word character (as in In#dia); in &#8220 the # follows &, so it still matches:
/(?<=\s|^)(#\w+\b)/
Which makes sure there's either a space or the beginning of the string before the hashed word.
You can use this in a preg_replace:
$string="According to a report on the Times of In#dia, &#8220 Telan#gana Rashtra Samiti chief K Chandrasekhar #Rao has seen a #sinister motive behind the protests against the formation of Telangana";
$result = preg_replace('/(?<=\s|^)(#\w+\b)/', "[h]$1[/h]", $string);
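As a quick check of the lookbehind, here is a short sketch run on shortened sample strings (the samples are made up for illustration and are not the original text):

```php
<?php
$pattern = '/(?<=\s|^)(#\w+\b)/';

// "#" inside a word (In#dia) and "#" after "&" (&#8220) are left alone;
// only a "#" after whitespace or at the very start is wrapped.
$first  = preg_replace($pattern, '[h]$1[/h]', 'In#dia, &#8220 chief #Rao saw a #sinister motive');
$second = preg_replace($pattern, '[h]$1[/h]', '#start and mid#dle');

echo $first, PHP_EOL;  // In#dia, &#8220 chief [h]#Rao[/h] saw a [h]#sinister[/h] motive
echo $second, PHP_EOL; // [h]#start[/h] and mid#dle
```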

Find a pattern within two or more sets of text

I have lots of data that I need to search through for certain patterns.
Problem is when looking for said patterns I have no reference to what I'm looking for.
Or in other words, I have two paragraphs. Each on similar topics. I need to be able to compare both paragraphs and find patterns. Phrases said in both paragraphs and how many times both were said.
I can't seem to find a solution, because with preg_match and similar functions you're required to supply the thing you're looking for.
Example paragraphs
Paragraph 1:
Bee Pollen is made by honeybees, and is the food of the young bee. It
is considered one of nature's most completely nourishing foods as it
contains nearly all nutrients required by humans. Bee-gathered pollens
are rich in proteins (approximately 40% protein), free amino acids,
vitamins, including B-complex, and folic acid.
Paragraph 2:
Bee Pollen is made by honeybees. It is required for the fertilization
of the plant. The tiny particles consist of 50/1,000-millimeter
corpuscles, formed at the free end of the stamen in the heart of the
blossom, nature's most completely nourishing foods. Every variety of
flower in the universe puts forth a dusting of pollen. Many orchard
fruits and agricultural food crops do, too.
So from those examples these patterns:
Bee Pollen is made by honeybees
and:
nature's most completely nourishing foods
Both phrases are found in both paragraphs.
This is potentially a complex question depending on whether you're looking for similar phrases or phrases that match word for word.
Finding exact word-for-word matches is quite simple: all you need to do is split on common breaks like punctuation marks (e.g. .,;:) and perhaps on conjunctions as well (e.g. and, or). The problem comes with, for example, adjectives: two phrases might be exactly the same except for one word, like so:
The world is spinning around its axis at a tremendous speed.
The world is spinning around its axis at a magnificent speed.
This won't match because tremendous and magnificent are used in place of one another. Potentially you could work around this, however, that would be a more complex question.
Answer
If we stick to the simple side of things we can achieve phrase matching with just a few lines of code (4 in this example; not including the formatting for comments/readability).
$wordSplits = 'and or on of as'; //List of words to split on
preg_match_all('/(?<m1>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para1, $matches1);
preg_match_all('/(?<m2>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para2, $matches2);
$commonPhrases = array_filter( //Removes blank $key=>$value pairs
array_intersect( //Finds matching patterns
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para1 values - removes leading and following spaces
}, $matches1['m1']),
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para2 values - removes leading and following spaces
}, $matches2['m2'])
)
);
var_dump($commonPhrases);
/**
OUTPUT:
array(2) {
[0]=>
string(31) "bee pollen is made by honeybees"
[5]=>
string(41) "nature's most completely nourishing foods"
}
*/
The above code finds matches by splitting on punctuation (defined in the [...] of the preg_match_all pattern) and on the words in the word list (matching a listed word only when it has a space before and after it).
Wordlist
You can change the word list to include any breaks you like, editing the list until you get the phrases you are after, examples:
$wordSplits = 'and or';
$wordSplits = 'and but if or';
$wordSplits = 'a an as and by but because if in is it of off on or';
Punctuation
You can add any punctuation marks you like into the list (between [ and ]), however remember that some characters do have special meanings and may need to be escaped (or placed appropriately): - and ^ should become \- and \^ or be placed where their special meaning doesn't come into play.
You may consider changing:
([.,;:\-]|
To:
([.,;:\-] | //Adding a space before the pipe
So that you only split punctuation marks which are followed by a space. For example: this would mean that items like 50,000 won't be split.
Spaces and breaks
You may also consider changing the spaces to \s so that tabs and newlines etc are included and not just spaces. Like so:
'/(?<m1>.*?)([.,;:\-]|\s'.str_replace(' ', '\s|\s', trim($wordSplits)).'\s)/i'
This would also apply to:
([.,;:\-]\s|
If you decide to go down that route.
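Putting the punctuation and whitespace variations together might look like the following. It is only a sketch: buildSplitPattern and extractPhrases are hypothetical helper names, the sample sentence is made up, and accepting punctuation at the very end of the text is an addition that the patterns above do not have.

```php
<?php
// Hypothetical helper: builds the split pattern from a space-separated word list.
function buildSplitPattern(string $wordSplits): string
{
    $words = array_map(
        fn (string $w): string => preg_quote($w, '/'),
        preg_split('/\s+/', trim($wordSplits))
    );

    // Punctuation only counts before whitespace or the end of the text;
    // split words need whitespace on both sides.
    return '/(?<phrase>.*?)(?:[.,;:\-](?:\s|$)|\s(?:' . implode('|', $words) . ')\s)/is';
}

// Hypothetical helper: returns the lower-cased, trimmed, non-empty phrases.
function extractPhrases(string $text, string $wordSplits): array
{
    preg_match_all(buildSplitPattern($wordSplits), $text, $m);
    $phrases = array_map(fn (string $p): string => strtolower(trim($p)), $m['phrase']);

    return array_values(array_filter($phrases, fn (string $p): bool => $p !== ''));
}

print_r(extractPhrases('Prices rose to 50,000 units. Demand fell or stayed flat; supply grew.', 'and or'));
```

With this input, 50,000 stays in one piece because its comma is not followed by whitespace.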
I've been working on this code, don't know if it suits your needs... Feel free to expand it!
$p1 = "Bee Pollen is made by honeybees, and is the food of the young bee. It is considered one of nature's most completely nourishing foods as it contains nearly all nutrients required by humans. Bee-gathered pollens are rich in proteins (approximately 40% protein), free amino acids, vitamins, including B-complex, and folic acid.";
$p2 = "Bee Pollen is made by honeybees. It is required for the fertilization of the plant. The tiny particles consist of 50/1,000-millimeter corpuscles, formed at the free end of the stamen in the heart of the blossom, nature's most completely nourishing foods. Every variety of flower in the universe puts forth a dusting of pollen. Many orchard fruits and agricultural food crops do, too.";
// Strip strings of periods etc.
$p1 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p1));
$p2 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p2));
// Extract words from first paragraph
$w1 = explode(" ", $p1);
// Build search string
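$search = '';
$found = array();
// Initialise the state read in the else branch below, so the first miss doesn't hit undefined variables
$add = FALSE;
$old_search = '';
$num_occured = 0;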
foreach ($w1 as $word) {
//echo 'Word: ' . $word . "<br />";
$search .= ' ' . $word;
$search = trim($search);
//echo '. . Search string: '. $search . "<br /><br />";
if (substr_count($p2, $search)) {
$old_search = $search;
$num_occured = substr_count($p2, $search);
//echo " . . . found!" . "<br /><br /><br />";
$add = TRUE;
} else {
//echo " . . . not found! Generating new search string: " . $word . '<br />';
if ($add) {
$found[] = array('pattern' => $old_search, 'occurences' => $num_occured);
$add = FALSE;
}
$old_search = '';
$search = $word;
}
}
print_r($found);
The above code finds occurrences of patterns from the first string in the second one.
I'm sure it can be written better, but since it's past midnight (local time), I'm not as "fresh" as I'd like to be...
Codepad-link

My PHP function strip_tags is not working according to my expectations

I am accepting comments as input on my website, where I want to allow a few HTML tags, like
<h2>, <h3>, and so on,
and ban the rest.
But I am also using a function which checks parts of the string and replaces them with smileys,
let us say '<3' for a heart and ':D' for lol.
When I use the function sanitizeHTML(), which is the following:
public function sanitizeHTML($inputHTML, $allowed_tags = array('<h2>', '<h3>', '<p>', '<br>', '<b>', '<i>', '<a>', '<ul>', '<li>', '<blockquote>', '<span>', '<code>', '<img>')) {
$_allowed_tags = implode('', $allowed_tags);
$inputHTML = strip_tags($inputHTML, $_allowed_tags);
return preg_replace('#<(.*?)>#ise', "'<' . $this->removeBadAttributes('\${1}1') . '>'", $inputHTML);
}
function removeBadAttributes($inputHTML) {
$bad_attributes = 'onerror|onmousemove|onmouseout|onmouseover|' . 'onkeypress|onkeydown|onkeyup|javascript:';
return stripslashes(preg_replace("#($bad_attributes)(\s*)(?==)#is", 'SANITIZED ', $inputHTML));
}
It removes bad attributes and allows only valid tags, but when a string like <3 (for a heart) comes in, this function removes the part of the string after <3.
Note:
Smiley codes that do not contain the HTML special characters < or > work fine.
You're using PCRE to parse html, which is never a good idea. The expression <(.*?)> will match everything from < up to the next >. You need something more like <[^>]+>. However, that still has problems (and will capture <3). You could use a negative lookahead (<(?!3)[^>]+>) to handle that specific case, but there are a lot of other cases to consider. You may want to consider using a DOM parser instead.
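To see what each of the three expressions actually matches, here is a small sketch run on a made-up comment (the input string and the array labels are only for illustration):

```php
<?php
$input = 'I <3 you <b>bold</b>';

$patterns = [
    'lazy dot'           => '#<(.*?)>#s',
    'negated class'      => '#<[^>]+>#',
    'negative lookahead' => '#<(?!3)[^>]+>#',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    preg_match_all($pattern, $input, $m);
    $results[$label] = $m[0]; // full matches only
    echo $label, ': ', implode(' | ', $m[0]), PHP_EOL;
}
// lazy dot: <3 you <b> | </b>
// negated class: <3 you <b> | </b>
// negative lookahead: <b> | </b>
```

The first two patterns swallow everything from <3 up to the next >, which is why only the lookahead version leaves the heart alone, and only for this one smiley.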
