I need help writing a function for smart highlighting of fragments in a text.
Src text = "Regulation is mediated via many different mechanisms"
Highlight string = "mediate via"
Expected Result = "Regulation is [b]mediated via[/b] many different mechanisms"
I found one solution on Google, but it does not work correctly when the words in the string have varying endings.
<?php
$string = "The monkey hangs from the door";
$keyword = "the";
function highlightkeyword($str, $search) {
$occurrences = substr_count(strtolower($str), strtolower($search));
$newstring = $str;
$match = array();
for ($i=0;$i<$occurrences;$i++) {
$match[$i] = stripos($str, $search, $i);
$match[$i] = substr($str, $match[$i], strlen($search));
$newstring = str_replace($match[$i], '[#]'.$match[$i].'[@]', strip_tags($newstring)); // distinct open/close markers
}
$newstring = str_replace('[#]', '<b>', $newstring);
$newstring = str_replace('[@]', '</b>', $newstring);
return $newstring;
}
?>
More examples:
Ex1:
src = is mediated via many
search = mediate via
result = is [b]mediated via[/b] many
Ex2:
src = are meddling in local affairs.
search = meddle in
result = are [b]meddling in[/b] local affairs.
Ex3:
src = who can not get married in France.
search = marry in
result = who can not get [b]married in[/b] France.
Note: the search string contains "marry in", but the source contains "married in".
To make patterns recognizable you can use the power of regex
function highlightkeyword($keyword, $string) {
return preg_replace("/{$keyword}/", '<strong>\\0</strong>', $string);
}
Examples
$string = "Regulation is mediated via many different mechanisms";
$keyword = "mediate.*? via";
echo highlightkeyword($keyword, $string), PHP_EOL;
Regulation is <strong>mediated via</strong> many different mechanisms
$string = "Who can not get married in France.";
$keyword = "marr(ied|y)";
echo highlightkeyword($keyword, $string), PHP_EOL;
Who can not get <strong>married</strong> in France.
$string = "Who can not marry in France.";
$keyword = "marr(ied|y)";
echo highlightkeyword($keyword, $string), PHP_EOL;
Who can not <strong>marry</strong> in France.
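The patterns above are written by hand for each keyword. As a rough sketch of how the pattern could be derived from the plain search string instead, the hypothetical helper below (highlightStems, not part of the answer) drops the last letter of every word longer than three letters and lets \w* absorb whatever ending the source text uses. The stem length and the three-letter cutoff are assumptions chosen to fit the question's examples, and the code assumes ASCII search words.

```php
<?php
// Hypothetical helper, not from the answer above: match each search word by its stem.
function highlightStems($text, $search, $trim = 1)
{
    $search = trim($search);
    if ($search === '') {
        return $text; // nothing to highlight
    }
    $parts = array();
    foreach (preg_split('/\s+/', $search) as $word) {
        // Assumption: words longer than 3 letters may change their last $trim letters.
        $stem = strlen($word) > 3 ? substr($word, 0, -$trim) : $word;
        $parts[] = preg_quote($stem, '/') . '\w*';
    }
    // Words must appear in order, separated by whitespace; matching is case insensitive.
    $pattern = '/\b' . implode('\s+', $parts) . '/i';
    return preg_replace($pattern, '[b]$0[/b]', $text);
}

echo highlightStems("who can not get married in France.", "marry in"), PHP_EOL;
// who can not get [b]married in[/b] France.
```

Because every word gets a trailing \w*, a short word such as "in" would also match "into"; a stricter variant could leave short words without the wildcard.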
I have a search string $str (something like "test"), a wrap string $wrap (something like "|") and a text string $text (something like "This is a test Text").
$str occurs once in $text. What I want is a function that wraps $str with the wrapper defined in $wrap and outputs the modified text (even if $str occurs more than once in $text).
It should not output the whole text, though, only 1-2 words before $str, 1-2 words after it, and "..." (only where $str isn't the first or last word). It should also be case insensitive.
Example:
$str = "Text"
$wrap = "<span>|</span>"
$text = "This is a really long Text where the word Text appears about 3 times Text"
Output would be:
"...long <span>Text</span> where...word <span>Text</span> appears...times <span>Text</span>"
My code (obviously it doesn't work):
$tempar = preg_split("/$str/i", $text);
if (count($tempar) <= 2) {
$result = "... ".substr($tempar[0], -7).$wrap.substr($tempar[1], 7)." ...";
} else {
$amount = substr_count($text, $str);
for ($i = 0; $i < $amount; $i++) {
$result = $result.".. ".substr($tempar[$i], -7).$wrap.substr($tempar[$i+1], 0, 7)." ..";
}
}
If you have a tip or a solution, don't hesitate to let me know.
I have taken your approach and made it more flexible. If $str or $wrap changes you could have escaping issues within the regex pattern so I have used preg_quote.
Note that I added $placeholder to make it clearer, but you can use $placeholder = "|" if you don't like [placeholder].
function wrapInString($str, $text, $element = 'span') {
$placeholder = "[placeholder]"; // The string that will be replaced by $str
$wrap = "<{$element}>{$placeholder}</{$element}>"; // Dynamic string that can handle more than just span
$strExp = preg_quote($str, '/');
$matches = [];
$matchCount = preg_match_all("/(\w+\s+)?(\w+\s+)?({$strExp})(\s+\w+)?(\s+\w+)?/i", $text, $matches);
$response = '';
for ($i = 0; $i < $matchCount; $i++) {
if (strlen($matches[1][$i])) {
$response .= '...';
}
if (strlen($matches[2][$i])) {
$response .= $matches[2][$i];
}
$response .= str_replace($placeholder, $matches[3][$i], $wrap);
if (strlen($matches[4][$i])) {
$response .= $matches[4][$i];
}
if (strlen($matches[5][$i]) && $i == $matchCount - 1) {
$response .= '...';
}
}
return $response;
}
$text = "text This is a really long Text where the word Text appears about 3 times Text";
var_dump(wrapInString("text", $text));
string(107) "<span>text</span> This...long <span>Text</span> where...<span>Text</span> appears...times <span>Text</span>"
To make the replacement case insensitive you can use the i regex option.
If I understand your question correctly, just a little bit of implode and explode magic is needed:
$text = "This is a really long Text where the word Text appears about 3 times Text";
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));
If you specifically need to render the span tags using HTML, just write it that way
$arr = explode("Text", $text);
print_r(implode('<span>Text</span>', $arr));
Use the pattern below to get your word and 1-2 words before and after it:
/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i
In PHP code it can be:
$str = "Text";
$wrap = "<span>|</span>";
$text = "This is a really long Text where the word Text appears about 3 times Text";
$temp = str_replace('|', $str, $wrap); // <span>Text</span>
// find the pattern and 1-2 words before and after
// (to make it case sensitive, delete 'i' from the pattern)
if(preg_match_all('/((\w+\s+){1,2}|^)text((\s+\w+){1,2}|$)/i', $text, $match)) {
$res = array_map(function($x) use($str, $temp) { return '... '.str_replace($str, $temp, $x) . ' ...';}, $match[0]);
echo implode(' ', $res);
}
I have a string and some keywords. I want to highlight the keywords that occur in the string, and I also want to print only the highlighted words: if apple matches, then only apple should be printed.
$string = "apple computer";
$keyword = "apple,orange,bike";
I am using the following function to highlight specific characters in a string.
$str = preg_replace("/($keyword)/i","<span style='color:orange;'>$0</span>",$string);
The problem is I want to show only those characters which are highlighted, currently it shows all the characters.
This would meet your need.
$string = " apple computer orange";
$keywords = "apple, orange";
$exp_kwd = explode(",", $keywords);
$res = "<span style='color:orange;'>";
foreach($exp_kwd as $val){
if(strpos($string, trim($val)) !== false){ // strpos() returns 0 for a match at the start
$res .= trim($val)." ";
}
}
$res = $res."</span>";
echo $res;
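A regex-based variant of the same idea, as a minimal sketch: it builds one alternation from the comma-separated keywords and prints only the words that actually matched. The function name matchedKeywords is hypothetical, and the \b word boundaries are an assumption (they stop "apple" from matching inside "pineapple").

```php
<?php
// Hypothetical helper: return only the keywords found in $string, wrapped in one span.
function matchedKeywords($string, $keywordCsv)
{
    // Split the list, trim each keyword, and drop empty entries.
    $keywords = array_filter(array_map('trim', explode(',', $keywordCsv)), 'strlen');
    if (!$keywords) {
        return '';
    }
    // Escape each keyword so it is matched literally inside the pattern.
    $alternatives = implode('|', array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keywords));
    if (!preg_match_all('/\b(?:' . $alternatives . ')\b/i', $string, $m)) {
        return ''; // no keyword occurs in the string
    }
    return "<span style='color:orange;'>" . implode(' ', $m[0]) . "</span>";
}

echo matchedKeywords("apple computer", "apple,orange,bike"), PHP_EOL;
// <span style='color:orange;'>apple</span>
```

The matches are returned in the order they appear in the string, not in the order of the keyword list.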
Hopefully this also will work
$string = "apple computer orange tested";
$keyword = "apple,orange,bike,tested";
$pattern="/".str_replace(",","/,/",$keyword)."/";
$pattern=explode(",",$pattern);
$string=explode(" ",$string);
$keyword =explode(",",$keyword);
$string=implode(",",(preg_filter($pattern, $keyword, $string)));
echo $string="<span style='color:orange;'>$string</span>";
$string = "Im On #Here";
$keyword = "#";
$var = strrchr($string,$keyword);
if(empty($var))
{
echo 'No Occurrence Found';
}
else
{
echo '<span style="color:orange;">'.$var.'</span>';
}
I am trying to use a script to search a text file and return words that meet certain criteria:
* The word is listed only once.
* It is not one of the words in an ignore list.
* It is among the top 10% of the longest words.
* It does not contain repeating letters.
* The final list would be a random ten words that meet the above criteria.
* If any of the above is false, then the words reported would be null.
I've put together the following, but the script dies at arsort() saying it expects an array. Can anyone suggest a change to make arsort work? Or suggest an alternative (simpler) script to find metadata? I realize this second question may be better suited for another StackExchange site.
<?php
$fn="../story_link";
$str=readfile($fn);
function top_words($str, $limit=10, $ignore=""){
if(!$ignore) $ignore = "the of to and a in for is The that on said with be was by";
$ignore_arr = explode(" ", $ignore);
$str = trim($str);
$str = preg_replace("#[&].{2,7}[;]#sim", " ", $str);
$str = preg_replace("#[()°^!\"§\$%&/{(\[)\]=}?´`,;.:\-_\#'~+*]#", " ", $str);
$str = preg_replace("#\s+#sim", " ", $str);
$arraw = explode(" ", $str);
foreach($arraw as $v){
$v = trim($v);
if(strlen($v)<3 || in_array($v, $ignore_arr)) continue;
$arr[$v]++;
}
arsort($arr);
return array_keys( array_slice($arr, 0, $limit) );
}
$meta_keywords = implode(", ", top_words( strip_tags( $html_content ) ) );
?>
The problem occurs when the loop never executes $arr[$v]++, so $arr is never defined. arsort() then receives null as its argument instead of an array, which is the error you see.
The solution is to define $arr as an array before the loop, to cover the cases where $arr[$v]++; isn't executed.
function top_words($str, $limit=10, $ignore=""){
if(!$ignore) $ignore = "the of to and a in for is The that on said with be was by";
$ignore_arr = explode(" ", $ignore);
$str = trim($str);
$str = preg_replace("#[&].{2,7}[;]#sim", " ", $str);
$str = preg_replace("#[()°^!\"§\$%&/{(\[)\]=}?´`,;.:\-_\#'~+*]#", " ", $str);
$str = preg_replace("#\s+#sim", " ", $str);
$arraw = explode(" ", $str);
$arr = array(); // Defined $arr here.
foreach($arraw as $v){
$v = trim($v);
if(strlen($v)<3 || in_array($v, $ignore_arr)) continue;
$arr[$v]++;
}
arsort($arr);
return array_keys( array_slice($arr, 0, $limit) );
}
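The loop stays empty in the question's script for a reason worth noting: readfile() prints the file and returns the number of bytes read, so $str never holds the text (file_get_contents() returns the contents as a string), and top_words() is then called with $html_content, which is never set. As a simpler alternative, here is a minimal sketch built on array_count_values(), which always returns an array. The function name topWords and the shortened ignore list are assumptions for illustration.

```php
<?php
// Hypothetical helper: most frequent words, skipping short words and an ignore list.
function topWords($text, $limit = 10, $ignore = array('the', 'and', 'for', 'that', 'said', 'with', 'was'))
{
    // Split on anything that is not a letter or digit; fall back to an empty list on failure.
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: array();
    $words = array_filter($words, function ($w) use ($ignore) {
        return strlen($w) >= 3 && !in_array($w, $ignore, true);
    });
    $counts = array_count_values($words); // always an array, even when empty
    arsort($counts);
    // preserve_keys = true keeps numeric words such as "787" as keys.
    return array_keys(array_slice($counts, 0, $limit, true));
}

// In the original script the text would come from: $text = file_get_contents($fn);
echo implode(', ', topWords("the cat and the dog and the cat")), PHP_EOL;
// cat, dog
```

An empty input simply yields an empty array, so implode() produces an empty string instead of an error.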
I came across some excellent code that works well for this:
<?php
function extract_keywords($str, $minWordLen = 3, $minWordOccurrences = 2, $asArray = false, $maxWords = 5, $restrict = true)
{
$str = str_replace(array("?","!",";","(",")",":","[","]"), " ", $str);
$str = str_replace(array("\n","\r"," "), " ", $str);
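$str = strtolower($str); // keep the lowercased result instead of discarding it
if (!function_exists('keyword_count_sort')) { // avoid a redeclaration error when called twice
function keyword_count_sort($first, $sec)
{
return $sec[1] - $first[1];
}
}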
$str = preg_replace('/[^\p{L}0-9 ]/', ' ', $str);
$str = trim(preg_replace('/\s+/', ' ', $str));
$words = explode(' ', $str);
// If we don't restrict tag usage, we'll remove common words from array
if ($restrict == false) {
$commonWords = array('a','able','about','above', 'get a list here http://www.wordfrequency.info','you\'ve','z','zero');
$words = array_udiff($words, $commonWords,'strcasecmp');
}
// Restrict Keywords based on values in the $allowedWords array
// Use if you want to limit available tags
if ($restrict == true) {
$allowedWords = array('engine','boeing','electrical','pneumatic','ice','pressurisation');
$words = array_uintersect($words, $allowedWords,'strcasecmp');
}
$keywords = array();
while(($c_word = array_shift($words)) !== null)
{
if(strlen($c_word) < $minWordLen) continue;
$c_word = strtolower($c_word);
if(array_key_exists($c_word, $keywords)) $keywords[$c_word][1]++;
else $keywords[$c_word] = array($c_word, 1);
}
usort($keywords, 'keyword_count_sort');
$final_keywords = array();
foreach($keywords as $keyword_det)
{
if($keyword_det[1] < $minWordOccurrences) break;
array_push($final_keywords, $keyword_det[0]);
}
$final_keywords = array_slice($final_keywords, 0, $maxWords);
return $asArray ? $final_keywords : implode(', ', $final_keywords);
}
$text = "Many systems that traditionally had a reliance on the pneumatic system have been transitioned to the electrical architecture. They include engine start, API start, wing ice protection, hydraulic pumps and cabin pressurisation. The only remaining bleed system on the 787 is the anti-ice system for the engine inlets. In fact, Boeing claims that the move to electrical systems has reduced the load on engines (from pneumatic hungry systems) by up to 35 percent (not unlike today’s electrically power flight simulators that use 20% of the electricity consumed by the older hydraulically actuated flight sims).";
echo extract_keywords($text);
// Advanced Usage
// $exampletext = "The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog. The quick brown fox jumped over the lazy dog.";
// echo extract_keywords($exampletext, 3, 1, false, 5, false);
?>
In PHP, I'd like to crop the following sentence:
"Test 1. Test 2. Test 3."
and transform this into 2 strings:
"Test 1. Test 2." and "Test 3."
How do I achieve this?
Do I use strpos?
Many thanks for any pointers.
// Split once at "2." and put the delimiter back on the first part.
$str = explode("2.", "Test 1. Test 2. Test 3.", 2);
$nstr1 = $str[0].'2.'; // "Test 1. Test 2."
$nstr2 = isset($str[1]) ? ltrim($str[1]) : ''; // "Test 3."
To separate the first two sentences, this should do it quickly:
$str = "Lorem Ipsum dolor sit amet etc etc. Blabla 2. Blabla 3. Test 4.";
$p1 = "";
$p2 = "";
explode_paragraph($str, $p1, $p2); // fills $p1 and $p2
echo $p1; // two first sentences
echo $p2; // the rest of the paragraph
function explode_paragraph($str, &$part1, &$part2) {
    $first = strpos($str, "."); // tries to find the first dot
    if ($first !== false) {
        $second = strpos($str, ".", $first + 1); // tries to find the second dot
        if ($second !== false) { // a second one?
            $part1 = substr($str, 0, $second + 1); // up to and including the second dot
            $part2 = ltrim(substr($str, $second + 1)); // everything after it
        } else { // only one dot: part1 will be everything, no part2
            $part1 = $str;
            $part2 = "";
        }
    } else { // no sentences at all.. put something in part1 ?
        $part1 = ""; // $part1 = $str;
        $part2 = "";
    }
}
$string="Inspired by arthropod insects and spiders, BAIUST researchers have created an entirely new type of semi-soft robots capable of standing and walking using drinking straws and inflatable tubing. Inspired by arthropod insects and spiders, BAIUST researchers have created an entirely new type of semi-soft robots capable of standing and walking using drinking straws and inflatable tubing. Inspired by arthropod insects and spiders, BAIUST researchers have created an entirely new type of semi-soft robots capable of standing and walking using drinking straws and inflatable tubing.
";
call :
echo short_description_in_complete_sentence($string,3,1000,2);
function:
function short_description_in_complete_sentence($string,$start_point,$end_point,$sentence_number=1){
$final_string='';
//$div_string=array();
$short_string=substr($string,$start_point,$end_point);
$div_string=explode('.',$short_string);
for($i=0;$i<$sentence_number;$i++){
if(!Empty($div_string[$sentence_number-1])){
$final_string=$final_string.$div_string[$i].'.';
}else{ $final_string='Invalid sentence number or total character number!';}
}
return $final_string;
}
Something like that?
$str = 'Test 1. Test 2. Test 3.';
$strArray = explode('.', $str);
$str1 = $strArray[0] . '.' . $strArray[1] . '.'; // "Test 1. Test 2."
$str2 = trim($strArray[2]) . '.'; // "Test 3."
What is the main reason why you must cut the text after "Test 2." and not before? The best solution depends on what you want to do now and what you will eventually want to do with the result.
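If the goal is simply "the first two sentences, then the rest", one possible sketch is to split on the whitespace that follows a period and rejoin the pieces. The helper name splitAfterSentences is hypothetical, and the code assumes that every sentence ends with a period followed by whitespace, so abbreviations such as "e.g. this" would be split as well.

```php
<?php
// Hypothetical helper: return the first $n sentences and the remainder.
function splitAfterSentences($text, $n = 2)
{
    // Split at whitespace preceded by a period; the lookbehind keeps the period attached.
    $sentences = preg_split('/(?<=\.)\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    return array(
        implode(' ', array_slice($sentences, 0, $n)),
        implode(' ', array_slice($sentences, $n)),
    );
}

list($first, $rest) = splitAfterSentences("Test 1. Test 2. Test 3.");
echo $first, PHP_EOL; // Test 1. Test 2.
echo $rest, PHP_EOL;  // Test 3.
```

When the text has fewer than $n sentences, the second element is an empty string.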
I want to convert any title, e.g. of a blog entry, to a user-friendly URL. I used rawurlencode() to do that, but it gives me a lot of strange strings like %s.
The algorithm should handle German characters like Ö, Ä, etc. I want to make a URL from the title and be able to get the title back by decoding the URL.
I tried some of this code: http://pastebin.com/L1SwESBn that is provided in some other questions but it seems to be one way.
E.g. HÖRZU.de -> hoerzu-de -> HÖRZU.de
Any ideas?
You want to create slugs, but from experience I can tell you that the decoding possibilities are limited. For example, "Foo - Bar" will become "foo-bar", so how could you possibly know afterwards that it wasn't "foo bar" or "foo-bar" all along?
And what about characters that you don't want in your slug and that have no representation, like " ` "?
So you can either use a 1-to-1 conversion like rawurlencode() or you can create a slug. Here is an example function, but as I said, no reliable decoding is possible. That is simply in its nature, since you have to throw away information.
function sanitizeStringForUrl($string){
$string = html_entity_decode($string); // decode first so entities like &Ouml; get lowercased too
$string = mb_strtolower($string, 'UTF-8'); // strtolower() does not reliably lowercase Ö, Ä, Ü
$string = str_replace(array('ä','ü','ö','ß'),array('ae','ue','oe','ss'),$string);
$string = preg_replace('#[^\w\s]#u','',$string); // the u flag keeps multibyte characters intact
$string = preg_replace('#\s+#u',' ',$string);
$string = str_replace(' ','-',trim($string));
return $string;
}
function url_title($str, $separator = 'dash', $lowercase = FALSE)
{
if ($separator == 'dash')
{
$search = '_';
$replace = '-';
}
else
{
$search = '-';
$replace = '_';
}
$trans = array(
'&\#\d+?;' => '',
'&\S+?;' => '',
'\s+' => $replace,
'[^a-z0-9\-\._]' => '',
$replace.'+' => $replace,
$replace.'$' => $replace,
'^'.$replace => $replace,
'\.+$' => ''
);
$str = strip_tags($str);
foreach ($trans as $key => $val)
{
$str = preg_replace("#".$key."#i", $val, $str);
}
if ($lowercase === TRUE)
{
$str = strtolower($str);
}
return trim(stripslashes($str));
}
The most elegant way I think is using a Behat\Transliterator\Transliterator.
You need to extend this class with your own class because it is abstract, something like this:
<?php
use Behat\Transliterator\Transliterator;
class Urlizer extends Transliterator
{
}
And then, just use it:
$text = "Master Ápiu";
$urlizer = new Urlizer();
$slug = $urlizer->transliterate($text, "-");
echo $slug; // master-apiu
Of course, you should add the package to your Composer dependencies as well.
composer require behat/transliterator
More info here https://github.com/Behat/Transliterator
There is no reliable way to 'decode' the slug back to its original form. The best solution here would be to store the slug and its original title together in the database.
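To illustrate the lookup approach, here is a minimal sketch in which a plain array stands in for the database table. The names slugify and rememberTitle and the transliteration map are assumptions for illustration, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical slug function: transliterate German characters, then dash-separate.
function slugify($title)
{
    // Assumed mapping; both cases are listed because strtolower() only handles ASCII reliably.
    $map = array('Ä' => 'ae', 'Ö' => 'oe', 'Ü' => 'ue', 'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss');
    $slug = strtolower(strtr($title, $map));
    // Collapse every run of characters outside a-z and 0-9 into a single dash.
    $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
    return trim($slug, '-');
}

// Hypothetical store: the array stands in for a table with slug and title columns.
function rememberTitle(array &$store, $title)
{
    $slug = slugify($title);
    $store[$slug] = $title; // a real table would also need to handle slug collisions
    return $slug;
}

$titlesBySlug = array();
$slug = rememberTitle($titlesBySlug, 'HÖRZU.de');
echo $slug, ' -> ', $titlesBySlug[$slug], PHP_EOL;
// hoerzu-de -> HÖRZU.de
```

The title is recovered by lookup, not by decoding, so it does not matter how much information the slug function throws away.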