I want want my output like this when I search a keyword like
"programming"
php programming language
How to do this in php mysql?
Any idea?
Just perform a str_replace on the returned text.
$search = 'programming';
// $dbContent = the response from the database
$dbContent = str_replace( $search , '<b>'.$search.'</b>' , $dbContent );
echo $dbContent;
Any instance of "programming", even if as part of a larger word, will be wrapped in <b> tags.
For instances where more than one word are used
$search = 'programming something another';
// $dbContent = the response from the database
$search = explode( ' ' , $search );
function wrapTag($inVal){
return '<b>'.$inVal.'</b>';
}
$replace = array_map( 'wrapTag' , $search );
$dbContent = str_replace( $search , $replace , $dbContent );
echo $dbContent;
This will split the $search into an array at the spaces, and then wrap each match in the <b> tags.
You could use <b> or <strong> tags (See What's the difference between <b> and <strong>, <i> and <em>? for a dicussion about them).
$search = #$_GET['q'];
$trimmed = trim($search);
function highlight($req_field, $trimmed) //$req_field is the field of your table
{
preg_match_all('~\w+~', $trimmed, $m);
if(!$m)
return $req_field;
$re = '~\\b(' . implode('|', $m[0]) . ')\\b~';
return preg_replace($re, '<b>$0</b>', $req_field);
}
print highlight($req_field, $trimmed);
In this way, you can bolden the searched keywords. Its quite easy and works well.
The response is actually a bit more complicated than that. In the common search results use case, there are other factors to consider:
you should take into account uppercase and lowercase (Programming, PROGRAMMING, programming etc);
if your content string is very long, you wouldn't want to return the whole text, but just the searched query and a few words before and after it, for context;
This guy figured it out:
//$h = text
//$n = keywords to find separated by space
//$w = words near keywords to keep
function truncatePreserveWords ($h,$n,$w=5,$tag='b') {
$n = explode(" ",trim(strip_tags($n))); //needles words
$b = explode(" ",trim(strip_tags($h))); //haystack words
$c = array(); //array of words to keep/remove
for ($j=0;$j<count($b);$j++) $c[$j]=false;
for ($i=0;$i<count($b);$i++)
for ($k=0;$k<count($n);$k++)
if (stristr($b[$i],$n[$k])) {
$b[$i]=preg_replace("/".$n[$k]."/i","<$tag>\\0</$tag>",$b[$i]);
for ( $j= max( $i-$w , 0 ) ;$j<min( $i+$w, count($b)); $j++) $c[$j]=true;
}
$o = ""; // reassembly words to keep
for ($j=0;$j<count($b);$j++) if ($c[$j]) $o.=" ".$b[$j]; else $o.=".";
return preg_replace("/\.{3,}/i","...",$o);
}
Works like a charm!
Related
Trying to convert a PHP function to Python, i am newbie in case of python, tthats what i tried
Python ->
def stopWords(text, stopwords):
stopwords = map(to_lower(x),stopwords)
pattern = '/[0-9\W]/'
text = re.sub(pattern, ',', text)
text_array = text.partition(',');
text_array = map(to_lower(x), text_array);
keywords = []
for term in text_array:
if(term in stopwords):
keywords.append(term)
return filter(None, keywords)
stopwords = open('stop_words.txt','r').read()
text = "All words in the English language can be classified as one of the eight different parts of speech."
print(stopWords(text, stopwords))
PHP ->
function stopWords($text, $stopwords)
{
// Remove line breaks and spaces from stopwords
$stopwords = array_map(
function ($x)
{
return trim(strtolower($x));
}
, $stopwords);
// Replace all non-word chars with comma
$pattern = '/[0-9\W]/';
$text = preg_replace($pattern, ',', $text);
// Create an array from $text
$text_array = explode(",", $text);
// remove whitespace and lowercase words in $text
$text_array = array_map(
function ($x)
{
return trim(strtolower($x));
}
, $text_array);
foreach($text_array as $term)
{
if (!in_array($term, $stopwords))
{
$keywords[] = $term;
}
};
return array_filter($keywords);
}
$stopwords = file('stop_words.txt');
$stopwords = file('stop_words.txt');
$text = "All words in the English language can be classified as one of the eight different parts of speech.";
print_r(stopWords($text, $stopwords));
I am getting the error in python on cmd:
IndentationError: unindent does not match any outer indentation level
Plz figure out what i am doing wrong, and "file" alternative in python
The for should be indented, as you write it, it seems to be out of the function. Moreover, the last return is not aligned both with the for or the function.
A correct indentation should look like this:
def stopWords(text, stopwords):
stopwords = map(to_lower(x),stopwords)
pattern = '/[0-9\W]/'
text = re.sub(pattern, ',', text)
text_array = text.partition(',');
text_array = map(to_lower(x), text_array);
keywords = []
for term in text_array:
if(term in stopwords):
keywords.append(term)
return filter(None, keywords)
I'm currently in the process of setting my first website implementing SQL.
I wish to use one of the columns from a table to identify the most commonly used word in the columns.
So, that is to say:
// TABLE = STUFF
// COLUMN0 = Hello there
// COLUMN1 = Hello I am Stuck
// COLUMN2 = Hi dude
// COLUMN3 = What's Up?
Therefore I wish to return a string of 'HELLO' as the most common word.
I should say I am using PHP and Dreamweaver to communicate with the SQL server, so I am placing the SQL query with in the relevant SQL line of a Recordset, with the result to be consequently placed on the site.
Any help would be great.
Thanks
You can calculate the most common words in PHP like this:
function extract_common_words($string, $stop_words, $max_count = 5) {
$string = preg_replace('/ss+/i', '', $string);
$string = trim($string); // trim the string
$string = preg_replace('/[^a-zA-Z -]/', '', $string); // only take alphabet characters, but keep the spaces and dashes too…
$string = strtolower($string); // make it lowercase
preg_match_all('/\b.*?\b/i', $string, $match_words);
$match_words = $match_words[0];
foreach ( $match_words as $key => $item ) {
if ( $item == '' || in_array(strtolower($item), $stop_words) || strlen($item) <= 3 ) {
unset($match_words[$key]);
}
}
$word_count = str_word_count( implode(" ", $match_words) , 1);
$frequency = array_count_values($word_count);
arsort($frequency);
//arsort($word_count_arr);
$keywords = array_slice($frequency, 0, $max_count);
return $keywords;
}
I'm currently investigating some new ideas for long tail SEO. I have a site where people can create their own blogs, which brings pretty good long tail traffic already. I'm already displaying the article title inside the article's title tags.
However, often the title does not match well for keywords in the content, and I'm interested in maybe adding some keywords into the title that php has actually determined would be best.
I've tried using a script which I made to work out what the most common words are on a page. This works ok but the problem with this is it comes up with pretty useless words.
It's occurred to me that what would be useful is to make a php script that would extract frequently occurring pairs (or sets of 3) words and then put them in an array ordered by how often they occur.
My problem: how to parse text in a more dynamic way to look for recurring pairs or triplets of words. How would I go about this?
function extractCommonWords($string, $keywords){
$stopWords = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
$string = preg_replace('/\s\s+/i', '', $string); // replace whitespace
$string = trim($string); // trim the string
$string = preg_replace('/[^a-zA-Z0-9 -]/', '', $string); // only take alphanumerical characters, but keep the spaces and dashes too…
$string = strtolower($string); // make it lowercase
preg_match_all('/\b.*?\b/i', $string, $matchWords);
$matchWords = $matchWords[0];
foreach ( $matchWords as $key=>$item ) {
if ( $item == '' || in_array(strtolower($item), $stopWords) || strlen($item) <= 3 ) {
unset($matchWords[$key]);
}
}
$wordCountArr = array();
if ( is_array($matchWords) ) {
foreach ( $matchWords as $key => $val ) {
$val = strtolower($val);
if ( isset($wordCountArr[$val]) ) {
$wordCountArr[$val]++;
} else {
$wordCountArr[$val] = 1;
}
}
}
arsort($wordCountArr);
$wordCountArr = array_slice($wordCountArr, 0, $keywords);
return $wordCountArr;
}
For the sake of including some code - here's another primitive adaptation that returns multi-word keywords of a given length and occurrences - rather than strip all common words it only filters those that are at the start and end of a keyword. It still returns some nonsense but that is unavoidable really.
function getLongTailKeywords($str, $len = 3, $min = 2){ $keywords = array();
$common = array('i','a','about','an','and','are','as','at','be','by','com','de','en','for','from','how','in','is','it','la','of','on','or','that','the','this','to','was','what','when','where','who','will','with','und','the','www');
$str = preg_replace('/[^a-z0-9\s-]+/', '', strtolower(strip_tags($str)));
$str = preg_split('/\s+-\s+|\s+/', $str, -1, PREG_SPLIT_NO_EMPTY);
while(0<$len--) for($i=0;$i<count($str)-$len;$i++){
$word = array_slice($str, $i, $len+1);
if(in_array($word[0], $common)||in_array(end($word), $common)) continue;
$word = implode(' ', $word);
if(!isset($keywords[$len][$word])) $keywords[$len][$word] = 0;
$keywords[$len][$word]++;
}
$return = array();
foreach($keywords as &$keyword){
$keyword = array_filter($keyword, function($v) use($min){ return !!($v>$min); });
arsort($keyword);
$return = array_merge($return, $keyword);
}
return $return;
}
run code *on random BBC News article
The problem with just ignoring common words, grammar and punctuation though is that they still carry meaning within a sentence. If you remove them you are at best changing the meaning or at worst generating unintelligible phrases. Even the idea of extracting "keywords" itself is flawed because words can have different meanings - when you remove them from a sentence you take them out of context.
It's not my area but there are complex studies into natural languages and there is no easy solution - though the general theory goes like this: A computer cannot decipher the meaning of a single piece of text, it has to rely on cross referencing a semantically tagged corpus of related material (which is a huge overhead).
I have the following string stored in a variable in PHP.
The words inside the quotes should be in '''bold'''. Like '''this''' and '''that'''
Here triple quotes ''' are used to represent that the word should be shown bold.
What is the most efficient way to replace this with the <strong> tag?
i would say regex with something like that :
$new_string = preg_replace('/\'\'\'([^\']+)\'\'\'/', '<strong>$1</strong>', $string);
Even though #atrepp's answer is correct, I ended up using the following function
function makeBold($string) {
$quote = ''''';
$count = substr_count($string, $quote);
for ($i = 0; $i <= $count/2; $i++) {
$string = preg_replace("/$quote/", '<strong>', $string, 1);
$string = preg_replace("/$quote/", '</strong>', $string, 1);
}
return $string;
}
because
My string was actually in encoded form (ie) it had ' instead of '
His answer doesn't work when the word to be bolded has ' in it
I'm looking for a way that I can extract the first letter of each word from an input field and place it into a variable.
Example: if the input field is "Stack-Overflow Questions Tags Users" then the output for the variable should be something like "SOQTU"
$s = 'Stack-Overflow Questions Tags Users';
echo preg_replace('/\b(\w)|./', '$1', $s);
the same as codaddict's but shorter
For unicode support, add the u modifier to regex: preg_replace('...../u',
Something like:
$s = 'Stack-Overflow Questions Tags Users';
if(preg_match_all('/\b(\w)/',strtoupper($s),$m)) {
$v = implode('',$m[1]); // $v is now SOQTU
}
I'm using the regex \b(\w) to match the word-char immediately following the word boundary.
EDIT:
To ensure all your Acronym char are uppercase, you can use strtoupper as shown.
Just to be completely different:
$input = 'Stack-Overflow Questions Tags Users';
$acronym = implode('',array_diff_assoc(str_split(ucwords($input)),str_split(strtolower($input))));
echo $acronym;
$initialism = preg_replace('/\b(\w)\w*\W*/', '\1', $string);
If they are separated by only space and not other things. This is how you can do it:
function acronym($longname)
{
$letters=array();
$words=explode(' ', $longname);
foreach($words as $word)
{
$word = (substr($word, 0, 1));
array_push($letters, $word);
}
$shortname = strtoupper(implode($letters));
return $shortname;
}
Regular expression matching as codaddict says above, or str_word_count() with 1 as the second parameter, which returns an array of found words. See the examples in the manual. Then you can get the first letter of each word any way you like, including substr($word, 0, 1)
The str_word_count() function might do what you are looking for:
$words = str_word_count ('Stack-Overflow Questions Tags Users', 1);
$result = "";
for ($i = 0; $i < count($words); ++$i)
$result .= $words[$i][0];
function initialism($str, $as_space = array('-'))
{
$str = str_replace($as_space, ' ', trim($str));
$ret = '';
foreach (explode(' ', $str) as $word) {
$ret .= strtoupper($word[0]);
}
return $ret;
}
$phrase = 'Stack-Overflow Questions IT Tags Users Meta Example';
echo initialism($phrase);
// SOQITTUME
$s = "Stack-Overflow Questions IT Tags Users Meta Example";
$sArr = explode(' ', ucwords(strtolower($s)));
$sAcr = "";
foreach ($sArr as $key) {
$firstAlphabet = substr($key, 0,1);
$sAcr = $sAcr.$firstAlphabet ;
}
using answer from #codaddict.
i also thought in a case where you have an abbreviated word as the word to be abbreviated e.g DPR and not Development Petroleum Resources, so such word will be on D as the abbreviated version which doesn't make much sense.
function AbbrWords($str,$amt){
$pst = substr($str,0,$amt);
$length = strlen($str);
if($length > $amt){
return $pst;
}else{
return $pst;
}
}
function AbbrSent($str,$amt){
if(preg_match_all('/\b(\w)/',strtoupper($str),$m)) {
$v = implode('',$m[1]); // $v is now SOQTU
if(strlen($v) < 2){
if(strlen($str) < 5){
return $str;
}else{
return AbbrWords($str,$amt);
}
}else{
return AbbrWords($v,$amt);
}
}
}
As an alternative to #user187291's preg_replace() pattern, here is the same functionality without needing a reference in the replacement string.
It works by matching the first occurring word characters, then forgetting it with \K, then it will match zero or more word characters, then it will match zero or more non-word characters. This will consume all of the unwanted characters and only leave the first occurring word characters. This is ideal because there is no need to implode an array of matches. The u modifier ensures that accented/multibyte characters are treated as whole characters by the regex engine.
Code: (Demo)
$tests = [
'Stack-Overflow Questions Tags Users',
'Stack Overflow Close Vote Reviewers',
'Jean-Claude Vandàmme'
];
var_export(
preg_replace('/\w\K\w*\W*/u', '', $tests)
);
Output:
array (
0 => 'SOQTU',
1 => 'SOCVR',
2 => 'JCV',
)