Reliable and effective custom search & replace function - preg or str replace - php

In a few different guises I've asked about this "filter" on here and WPSE. I'm now taking a different approach to it, and I'd like to make it solid and reliable.
My situation:
When I create a post in my WordPress CMS, I want to run a filter which searches for certain terms and replaces them with links.
I have the terms that I want to search for in two arrays: $glossary_terms and $species_terms.
$species_terms is a list of scientific names of fishes, such as Apistogramma panduro.
$glossary_terms is a list of fishkeeping glossary terms such as abdomen, caudal-fin and Gram's Method.
There are a few nuances worth noting:
Speed is not an issue, as I will be running this filter in the background rather than when a user visits the page or when an author submits/edits a species profile or post.
Some of the post content being filtered may contain HTML with these terms in, like <img src="image.jpg" title="Apistogramma panduro male" />. Obviously these shouldn't be replaced.
Species are often referred to with an abbreviated Genus, so instead of Apistogramma panduro, you'll often see A. panduro. This means I need to search & replace all of the species terms as an abbreviation too - Apistogramma panduro, A. panduro, Satanoperca daemon, S. daemon etc.
If caudal-fin and caudal both exist in the glossary terms, caudal-fin should be replaced first.
I was contemplating simply adding a preg_replace which searched for the terms, but only with a space on the left, (i.e. ( )term) and a space, comma, exclamation, full-stop or hyphen on the right (i.e. term(, . ! - )) but that won't help me to not break the image HTML.
Example content
<br />
It looks very similar to fishes of the <i>B. foerschi</i> group/complex but its breeding strategy, adult size and observed behaviour preclude its inclusion in that assemblage.
Instead it appears to be a member of the <i>B. coccina</i> group which currently includes <i>B. brownorum</i>, <i>B. burdigala</i>, <i>B. coccina</i>, <i>B. livida</i>, <i>B. miniopinna</i>, <i>B. persephone</i>, <i>B. tussyae</i>, <i>B. rutilans</i> and <i>B. uberis</i>.
Of these it's most similar in appearance to <i>B. uberis</i> but can be distinguished by its noticeably shorter dorsal-fin base and overall blue-greenish (vs. green/reddish) colouration.
Members of this group are characterised by their small adult size (< 40 mm SL), a uniform red or black base body colour, the presence of a midlateral body blotch in some species and the fact they have 9 abdominal vertebrae compared with 10-12 in the other species groups. In addition all are obligate peat swamp dwellers (Tan and Ng, 2005).<br />
^^^ This example here has had the correct links manually inserted. The filter shouldn't break these links!
It looks very similar to fishes of the B. foerschi group/complex but its breeding strategy, adult size and observed behaviour preclude its inclusion in that assemblage.
Instead it appears to be a member of the B. coccina group which currently includes B. brownorum, B. burdigala, B. coccina, B. livida, B. miniopinna, B. persephone, B. tussyae, B. rutilans and B. uberis.
Of these it's most similar in appearance to B. uberis but can be distinguished by its noticeably shorter dorsal-fin base and overall blue-greenish (vs. green/reddish) colouration.
Members of this group are characterised by their small adult size (< 40 mm SL), a uniform red or black base body colour, the presence of a midlateral body blotch in some species and the fact they have 9 abdominal vertebrae compared with 10-12 in the other species groups. In addition all are obligate peat swamp dwellers (Tan and Ng, 2005).
^^^ Same example pre-formatting.
[caption id="attachment_542" align="alignleft" width="125" caption="Amazonas Magazine - now in English!"]<img class="size-thumbnail wp-image-542" title="Amazonas English" src="/wp-content/uploads/2011/12/Amazonas-English-1-288x381.jpg" alt="Amazonas English" width="125" height="165" />[/caption]
Edited by Hans-Georg Evers, the magazine 'Amazonas' has been widely-regarded as among the finest regular publications in the hobby since its launch in 2005, an impressive achievment considering it's only been published in German to date. The long-awaited English version is just about to launch, and we think a subscription should be top of any serious fishkeeper's Xmas list...
The magazine is published in a bi-monthly basis and the English version launches with the January/February 2012 issue with distributors already organised in the United States, Canada, the United Kingdom, South Africa, Australia, and New Zealand. There are also mobile apps availablen which allow digital subscribers to read on portable devices.
It's fair to say that there currently exists no better publication for dedicated hobbyists with each issue featuring cutting-edge articles on fishes, invertebrates, aquatic plants, field trips to tropical destinations plus the latest in husbandry and breeding breakthroughs by expert aquarists, all accompanied by excellent photography throughout.
U.S. residents can subscribe to the printed edition for just $29 USD per year, which also includes a free digital subscription, with the same offer available to Canadian readers for $41 USD or overseas subscribers for $49 USD. Please see the Amazonas website for further information and a sample digital issue!
Alternatively, subscribe directly to the print version here or digital version here.
^^^ This will likely only have a few Glossary terms in rather than any species links.
Example terms
$species_terms
339 => 'Aulonocara maylandi maylandi',
340 => 'Aulonocara maylandi kandeensis',
341 => 'Aulonocara sp. "walteri"',
342 => 'Aulonocara sp. "stuartgranti maleri"',
343 => 'Aulonocara stuartgranti',
344 => 'Benthochromis tricoti',
345 => 'Boulengerochromis microlepis',
346 => 'Buccochromis lepturus',
347 => 'Buccochromis nototaenia',
348 => 'Betta brownorum',
349 => 'Betta foerschi',
350 => 'Betta coccina',
351 => 'Betta uberis'
As you can see above, the general format for these scientific names is "Genus species", but can often include "sp." or "aff." (for species which aren't officially described) and "Genus species subspecies" formats.
$glossary_terms
1 => 'abdomen',
2 => 'caudal',
3 => 'caudal-fin',
4 => 'caudal-fin peduncle',
5 => 'Gram\'s Method'
If anyone can come up with a filter which meets all these conditions and requirements, I'd like to offer a bounty.
Thanks in advance,

I think it's better to use DOMDocument functionality than regexps. Here is a working prototype:
// Each dynamically constructed regexp will contain at most 70 subpatterns
define('GROUPS_PER_REGEXPS', 70);

$speciesTerms = array(
    339 => '(?:Aulonocara|A\.) maylandi maylandi',
    340 => '(?:Aulonocara|A\.) maylandi kandeensis',
    344 => '(?:Benthochromis|B\.) tricoti',
    345 => '(?:Boulengerochromis|B\.) microlepis',
);

function matchTerms($text) {
    // Globals are not good. I left it for the simplicity
    global $speciesTerms;
    $result = array();
    // Work through the terms in chunks of GROUPS_PER_REGEXPS;
    // the third argument of array_chunk() preserves the term ids as keys
    foreach (array_chunk($speciesTerms, GROUPS_PER_REGEXPS, true) as $chunk) {
        // Maps capturing group identifiers to term ids
        $termMapping = array();
        // Dynamically construct regexp
        $groups = '';
        $c = 1;
        foreach ($chunk as $termId => $termPattern) {
            if (!empty($groups)) {
                $groups .= '|';
            }
            // Match word boundaries, so we don't capture "B. tricotisomeramblingstring"
            $groups .= '(\b' . $termPattern . '\b)';
            $termMapping[$c++] = $termId;
        }
        $regexp = "/$groups/m";
        preg_match_all($regexp, $text, $matches, PREG_OFFSET_CAPTURE);
        for ($i = 1; $i < $c; $i++) {
            foreach ($matches[$i] as $matchData) {
                // matchData[0] holds matched string, e.g. Benthochromis tricoti
                // matchData[1] holds offset, e.g. 15
                if (isset($matchData[0]) && !empty($matchData[0])) {
                    $result[] = array(
                        'text' => $matchData[0],
                        'offset' => $matchData[1],
                        'id' => $termMapping[$i],
                    );
                }
            }
        }
    }
    // Sort by offset in descending order, so that splitting the text node
    // from the end does not shift the offsets of earlier matches
    usort($result, function($a, $b) {
        return $b['offset'] <=> $a['offset'];
    });
    return $result;
}

// loadHTML() is an instance method, so create the document first
$doc = new DOMDocument();
$doc->loadHTML($html);
// Stack will be used to avoid recursive functions
$stack = new SplStack;
$stack->push($doc);
while (!$stack->isEmpty()) {
    $node = $stack->pop();
    if ($node->nodeType == XML_TEXT_NODE && $node->parentNode instanceof DOMElement) {
        // $node represents text node
        // and it's inside a tag (second condition in the statement above)
        // Check that this text is not wrapped in <a> tag
        // as we don't want to wrap it twice
        if ($node->parentNode->tagName != 'a') {
            $matches = matchTerms($node->wholeText);
            foreach ($matches as $match) {
                // Create new link element in the DOM
                $link = $doc->createElement('a', $match['text']);
                $link->setAttribute('href', 'species/' . $match['id']);
                $link->setAttribute('class', 'link_species');
                // Save the text after the link
                $remainingText = $node->splitText($match['offset'] + strlen($match['text']));
                // Save the text before the link
                $linkText = $node->splitText($match['offset']);
                // Replace $linkText with $link node
                // i.e. 'something' becomes '<a href="species/$id" class="link_species">something</a>'
                $node->parentNode->replaceChild($link, $linkText);
            }
        }
    }
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $childNode) {
            $stack->push($childNode);
        }
    }
}
$body = $doc->getElementsByTagName('body');
echo $doc->saveHTML($body->item(0));
Implementation details
I've only shown how to replace species terms; glossary terms can be handled the same way. Links take the form "species/$id". Abbreviations are handled correctly. DOMDocument is a very reliable parser: it can deal with broken markup and is fast.
?: in a regexp marks a subpattern as non-capturing, so it is not counted as a capturing group (documentation on subpatterns). Without proper counting of subpatterns, we can't retrieve the termId. The idea is that we build one big regexp pattern by joining all the regexps specified in the $speciesTerms array, separated by a pipe |. The final regexp for the first two species would be (spaces added for clarity):
First capturing group Alternation Second capturing group
( (?:Aulonocara|A\.) maylandi maylandi ) | ( (?:Aulonocara|A\.) maylandi kandeensis )
So, the text "Examples: Aulonocara maylandi maylandi, A. maylandi kandeensis" will give following matches:
$matches[1] = array('Aulonocara maylandi maylandi') // Captured by the first group
$matches[2] = array('A. maylandi kandeensis') // Captured by the second group
We can clearly say that all elements in matches[1] are referring to the species Aulonocara maylandi maylandi or A. maylandi maylandi which has id = 339.
In short: Use (?:) if you're using subpatterns in $speciesTerms.
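The group-to-id mapping described above can be checked in isolation. This is a minimal sketch using the two example patterns; $groupToId and $found are names chosen for illustration, and PREG_SET_ORDER is used here instead of the prototype's PREG_OFFSET_CAPTURE simply to keep the result easy to read:

```php
<?php
// Term ids mapped to patterns; (?:...) keeps the genus alternation non-capturing.
$terms = array(
    339 => '(?:Aulonocara|A\.) maylandi maylandi',
    340 => '(?:Aulonocara|A\.) maylandi kandeensis',
);

// One capturing group per term, remembering which group belongs to which id.
$groups = array();
$groupToId = array();
$n = 1;
foreach ($terms as $id => $pattern) {
    $groups[] = '(\b' . $pattern . '\b)';
    $groupToId[$n++] = $id;
}
$regexp = '/' . implode('|', $groups) . '/';

$text = 'Examples: Aulonocara maylandi maylandi, A. maylandi kandeensis';
preg_match_all($regexp, $text, $matches, PREG_SET_ORDER);

$found = array();
foreach ($matches as $match) {
    // Index 0 is the whole match; the only non-empty group tells us the term.
    for ($i = 1; $i < count($match); $i++) {
        if ($match[$i] !== '') {
            $found[] = array('id' => $groupToId[$i], 'text' => $match[$i]);
        }
    }
}
print_r($found);
```

Here the first match is attributed to id 339 and the second to id 340. If the genus alternation were written with plain parentheses, it would take up a group number of its own and the mapping would be off by one.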
UPDATE
Each dynamically created regexp is limited to a maximum number of subpatterns, which is defined as a constant at the top. This avoids hitting the PCRE limit on the number of subpatterns in a regexp.
Important notes:
If you have a lot of terms you should rewrite matchTerms, because a regexp has a limit on the number of subpatterns. In this case it's optimal to prebuild an array of regexps, one for every N terms.
matchTerms generates the regexps on every call; obviously this only needs to be done once
It's possible to use advanced regexps in speciesTerms
Use mb_strlen instead of strlen if you're using multibyte encodings
Supplied $html will be wrapped in a <body> tag (unless it's already wrapped)
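The hand-written patterns in $speciesTerms could also be generated from the plain names given in the question. This is only a sketch, assuming every name starts with the genus followed by a space; speciesPattern and longestFirst are hypothetical helper names, not part of the prototype above:

```php
<?php
// Turn "Genus species ..." into a pattern that also matches the abbreviated
// genus, e.g. "Betta coccina" becomes "(?:Betta|B\.) coccina".
function speciesPattern(string $name): string
{
    // Split the genus from the rest of the name at the first space.
    $parts = explode(' ', $name, 2);
    $genus = $parts[0];
    $rest = $parts[1] ?? '';
    $abbreviation = substr($genus, 0, 1) . '.';

    $pattern = '(?:' . preg_quote($genus, '/') . '|' . preg_quote($abbreviation, '/') . ')';
    if ($rest !== '') {
        $pattern .= ' ' . preg_quote($rest, '/');
    }
    return $pattern;
}

// Sort longer terms first, so that "caudal-fin" is tried before "caudal".
function longestFirst(array $terms): array
{
    usort($terms, function ($a, $b) {
        return strlen($b) <=> strlen($a);
    });
    return $terms;
}

echo speciesPattern('Betta coccina'), "\n";
print_r(longestFirst(array('abdomen', 'caudal', 'caudal-fin', 'caudal-fin peduncle')));
```

Because alternation tries branches from left to right, putting longer terms first is one way to satisfy the "caudal-fin before caudal" requirement. Names that end in a non-word character, such as Aulonocara sp. "walteri", would need something other than \b as the trailing boundary.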

It would be better to parse the HTML rather than trying to use regular expressions. Regex is good when you have something specific you want to match, but gets quirky when you're trying to NOT match certain things.
Using http://simplehtmldom.sourceforge.net/ :
function addLinks(&$p, $species, $terms) {
// much easier to say "not in an anchor tag" with parsed content than with regex
if ($p->tag != 'a') {
// pull out existing elements so they aren't replaced
$children = array();
$x = 0;
foreach ($p->children as &$e) {
$children[] = $e->outertext;
$e->outertext = '---child-'.$x.'---';
$x++;
}
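foreach($species as $s) {
$p->innertext = str_replace(
$s,
// the species URL scheme is an assumption, modelled on the glossary links below
'<a href="species/'.strtolower(str_replace(' ','-',$s)).'">'.$s.'</a>',
$p->innertext);
}
foreach($terms as $t) {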
$p->innertext = str_replace(
$t,
'<a href="glossary/'.
strtolower($t[0]).'/'.
strtolower(str_replace(' ','-',$t)).'">'.$t.'</a>',
$p->innertext);
}
// restore previous child elements
foreach($children as $x => $e) {
$p->innertext = str_replace('---child-'.$x.'---', $e, $p->innertext);
}
foreach ($p->children() as &$e) {
addLinks($e, $species, $terms);
}
}
}
$html = new simple_html_dom();
// you may have to wrap $content in a div. not exactly sure how partial content is handled
$html->load($content);
addLinks($html, $species_terms, $glossary_terms);
$content = $html->save();
I haven't used simple_html_dom a whole lot, but that should get you pointed in the right direction.

Related

Any way to optimize my solution for a faster and more elegant approach?

In MySQL, I have a column near_to where I save entries like Public Transportation,Business center
On the frontend I want to display some icons based on these entries.
For example an icon when there is Public Transportation inside the field, or Business Center or Fitness Center and so on.
This is my solution so far. My question is, is there any way to make this faster and more elegant?
if (strpos($req['near_to'],'Pub') !==false) {
echo '<li>public transportation icon</li>';
}
if (strpos($req['near_to'],'Fitn') !==false) {
echo '<li>fitness icon</li>';
}
if (strpos($req['near_to'],'Busi') !==false) {
echo '<li>business icon</li>';
}
I made up a little snippet with preg_replace. This way you can define the mapping in one array and get the final result in one statement by running preg_replace on the array itself.
<?php
$subject = "Public Transportation";
//$subject = "Business center";
$patterns = array(
"/Pub.*/" => "<li>public transportation icon</li>",
"/Fitn.*/" => "<li>fitness icon</li>",
"/Busi.*/" => "<li>business icon</li>"
);
$html = preg_replace(array_keys($patterns), array_values($patterns), $subject);
echo($html);
UPDATE
If you want to match a subject against more than one of the patterns, then the patterns must be different. The patterns in the above example match through to the end of the string (because of the .*), therefore only one icon will be returned, as you said in your comment.
If we change the patterns as below, you'll see multiple icons in the HTML result. I assumed that the strings are constant and that they are separated by ',', where the ',' is optional, hence the '?' in the pattern.
<?php
$subject = "Public Transportation,Fitness Center,Business Center"; //$subject = "Business center";
$patterns = array(
"/Public Transportation,?/" => "<li>public transportation icon</li>",
"/Fitness Center,?/" => "<li>fitness icon</li>",
"/Business Center,?/" => "<li>business icon</li>"
);
$html = preg_replace(array_keys($patterns), array_values($patterns), $subject);
echo($html);
The above will return
<li>public transportation icon</li><li>fitness icon</li><li>business icon</li>
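Since near_to is a comma-separated list, a plain lookup table is another option that avoids regular expressions altogether. This is a rough sketch, assuming the stored entries are exactly the three names from the question apart from case and surrounding spaces; $icons and renderIcons are made-up names:

```php
<?php
// Map from a lowercased near_to entry to its icon markup.
$icons = array(
    'public transportation' => '<li>public transportation icon</li>',
    'fitness center'        => '<li>fitness icon</li>',
    'business center'       => '<li>business icon</li>',
);

function renderIcons(string $nearTo, array $icons): string
{
    $html = '';
    foreach (explode(',', $nearTo) as $entry) {
        // Normalise case and whitespace before looking the entry up.
        $key = strtolower(trim($entry));
        if (isset($icons[$key])) {
            $html .= $icons[$key];
        }
    }
    return $html;
}

echo renderIcons('Public Transportation,Business center', $icons);
```

Unlike the preg_replace version, entries that have no icon are dropped from the output rather than left in as plain text.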

Extract a full substring from a partial substring (needle)

As you can see below, I'm attempting to extract the complete substring of an exploded array by using just a few characters to match the substring.
$keyword = array('Four Wheel', 'Power', 'Trailer');
function customSearch($keyword, $featurelistarray){
$key = ''; //possibly reset output
foreach($featurelistarray as $key => $arrayItem){
if( stristr( $arrayItem, $keyword ) ){
$termname = $key;
}
}
}
The array ($featurelistarray) comprises vehicle options, four wheel drive, four wheel disc brakes, power windows, power door locks, floor mats, trailer tow package, and many many more.
The point is to list all the options for a given category, and using the $keyword array to define the category.
I would also like to alphabetize the results. Thank you for the help!
To further explain, the $featurelistarray is exploded from a CSV field. The CSV field has a long length of options listed.
$featurelist=$csvdata['Options'];
$featurelistarray=explode(',',$featurelist);
$termname = $featurelistarray[0];
As you can see, $termname is assigned the first position of the exploded array. This was the original code for these features, but I need more control for $termname.
It seems to me you are trying to perform database operations without a database. I'd suggest loading the input into some kind of database.
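If the data has to stay in the exploded array, the filtering and sorting the question asks for can be sketched in plain PHP. This assumes a case-insensitive substring match is good enough; featuresMatching is a hypothetical name and the sample options are taken from the question's description:

```php
<?php
// Return every feature that contains at least one keyword, sorted alphabetically.
function featuresMatching(array $features, array $keywords): array
{
    $matches = array();
    foreach ($features as $feature) {
        $feature = trim($feature);
        foreach ($keywords as $keyword) {
            // stripos() is case-insensitive, so 'Power' finds 'power windows'.
            if (stripos($feature, $keyword) !== false) {
                $matches[] = $feature;
                break; // one matching keyword is enough
            }
        }
    }
    sort($matches, SORT_STRING | SORT_FLAG_CASE);
    return $matches;
}

$featurelist = 'power windows,floor mats,four wheel drive,trailer tow package,power door locks,four wheel disc brakes';
$featurelistarray = explode(',', $featurelist);
$keyword = array('Four Wheel', 'Power', 'Trailer');
print_r(featuresMatching($featurelistarray, $keyword));
```

The function returns the full option strings, so the partial keywords only select entries and never truncate them.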

PHP performant search a text for given usernames

I am currently dealing with a performance issue where I cannot find a way to fix it. I want to search a text for usernames mentioned with the # sign in front. The list of usernames is available as PHP array.
The problem is usernames may contain spaces or other special characters. There is no limitation for it. So I can't find a regex dealing with that.
Currently I am using a function which gets the whole line after the # and checks char by char which usernames could match for this mention, until there is just one username left which totally matches the mention. But for a long text with 5 mentions it takes several seconds (!!!) to finish. for more than 20 mentions the script runs endlessly.
I have some ideas, but I don't know if they may work.
Going through username list (could be >1.000 names or more) and search for all #Username without regex, just string search. I would say this would be far more inefficient.
Checking on writing the usernames with JavaScript if space or punctual sign is inside the username and then surround it with quotation marks. Like #"User Name". Don't like that idea, that looks dirty for the user.
Don't start with one character, but maybe 4. and if no match, go back. So same principle like on sorting algorithms. Divide and Conquer. Could be difficult to implement and will maybe lead to nothing.
How does Facebook or twitter and any other site do this? Are they parsing the text directly while typing and saving the mentioned usernames directly in the stored text of the message?
This is my current function:
$regular_expression_match = '~(?:^|\\s)#(.+?)(?:\n|$)~';
$matches = false;
$offset = 0;
while (preg_match($regular_expression_match, $post_text, $matches, PREG_OFFSET_CAPTURE, $offset))
{
$line = $matches[1][0];
$search_string = substr($line, 0, 1);
$filtered_usernames = array_keys($user_list);
$matched_username = false;
// Loop, make the search string one by one char longer and see if we have still usernames matching
while (count($filtered_usernames) > 1)
{
$filtered_usernames = array_filter($filtered_usernames, function ($username_clean) use ($search_string, &$matched_username) {
$search_string = utf8_clean_string($search_string);
if (strlen($username_clean) == strlen($search_string))
{
if ($username_clean == $search_string)
{
$matched_username = $username_clean;
}
return false;
}
return (substr($username_clean, 0, strlen($search_string)) == $search_string);
});
if ($search_string == $line)
{
// We have reached the end of the line, so stop
break;
}
$search_string = substr($line, 0, strlen($search_string) + 1);
}
// If there is still one in filter, we check if it is matching
$first_username = reset($filtered_usernames);
if (count($filtered_usernames) == 1 && utf8_clean_string(substr($line, 0, strlen($first_username))) == $first_username)
{
$matched_username = $first_username;
}
// We can assume that $matched_username is the longest matching username we have found due to iteration with growing search_string
// So we use it now as the only match (Even if there are maybe shorter usernames matching too. But this is nothing we can solve here,
// This needs to be handled by the user, honestly. There is a autocomplete popup which tells the other, longer fitting name if the user is still typing,
// and if he continues to enter the full name, I think it is okay to choose the longer name as the chosen one.)
if ($matched_username)
{
$startpos = $matches[1][1];
// We need to get the endpos, cause the username is cleaned and the real string might be longer
$full_username = substr($post_text, $startpos, strlen($matched_username));
while (utf8_clean_string($full_username) != $matched_username)
{
$full_username = substr($post_text, $startpos, strlen($full_username) + 1);
}
$length = strlen($full_username);
$user_data = $user_list[$matched_username];
$mentioned[] = array_merge($user_data, array(
'type' => self::MENTION_AT,
'start' => $startpos,
'length' => $length,
));
}
$offset = $matches[0][1] + strlen($search_string);
}
Which way would you go? The problem is the text will be displayed often and parsing it every time will consume a lot of time, but I don't want to heavily modify what the user had entered as text.
I can't find out what's the best way, and even why my function is so time consuming.
A sample text would be:
Okay, #Firstname Lastname, I mention you!
Listen #[TEAM] John, you are a team member.
#Test is a normal name, but #Thât♥ should be tracked too.
And see #Wolfs garden! I just mean the Wolf.
Usernames in that text would be
Firstname Lastname
[TEAM] John
Test
Thât♥
Wolf
So yes, there is clearly nothing I know where a name may end. Only thing is the newline.
I think the main problem is, that you can't distinguish usernames from text and it's a bad idea, to lookup maybe thousands of usernames in a text, also this can lead to further problems, that John is part of [TEAM] John‌ or JohnFoo...
It's necessary to separate the usernames from the other text. Assuming that you're using UTF-8, you could put the usernames between an invisible zero-width space \xE2\x80\x8B and a zero-width non-joiner \xE2\x80\x8C.
The usernames can then be extracted quickly and with little effort and, if needed, still be verified against the database.
$txt = "
Okay, #\xE2\x80\x8BFirstname Lastname\xE2\x80\x8C, I mention you!
Listen #\xE2\x80\x8B[TEAM] John\xE2\x80\x8C, you are a team member.
#\xE2\x80\x8BTest\xE2\x80\x8C is a normal name, but
#\xE2\x80\x8BThât♥\xE2\x80\x8C should be tracked too.
And see #\xE2\x80\x8BWolfs\xE2\x80\x8C garden! I just mean the Wolf.";
// extract usernames
if(preg_match_all('~#\xE2\x80\x8B\K.*?(?=\xE2\x80\x8C)~s', $txt, $out)){
print_r($out[0]);
}
Array
(
[0] => Firstname Lastname
[1] => [TEAM] John
[2] => Test
[3] => Thât♥
[4] => Wolfs
)
echo $txt;
Okay, #​Firstname Lastname, I mention you!
Listen #​[TEAM] John‌, you are a team member.
#​Test‌ is a normal name, but
#​Thât♥‌ should be tracked too.
And see #​Wolfs‌ garden! I just mean the Wolf.
Could use any characters you like and that possibly don't occur elsewhere for separation.
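For the markers to be present, they have to be added when the mention is created, for example when the author picks a name from the autocomplete popup. This is a small sketch of that round trip; wrapMention and extractMentions are hypothetical helpers, not functions from the original code:

```php
<?php
// U+200B ZERO WIDTH SPACE and U+200C ZERO WIDTH NON-JOINER as UTF-8 bytes.
const MENTION_OPEN  = "\xE2\x80\x8B";
const MENTION_CLOSE = "\xE2\x80\x8C";

// Wrap a chosen username in the invisible delimiters.
function wrapMention(string $username): string
{
    return '#' . MENTION_OPEN . $username . MENTION_CLOSE;
}

// Pull every delimited username back out of a stored text.
function extractMentions(string $text): array
{
    $pattern = '~#' . MENTION_OPEN . '\K.*?(?=' . MENTION_CLOSE . ')~s';
    preg_match_all($pattern, $text, $out);
    return $out[0];
}

$text = 'And see ' . wrapMention('Wolf') . 's garden, ' . wrapMention('[TEAM] John') . '!';
print_r(extractMentions($text));
```

With the name delimited at write time, "Wolf" can be told apart from "Wolfs" without consulting the username list at all.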

PHP: Most efficient way to display a variable within text when the text could be one of many possibilities

Below is a link to my original question:
PHP: How to display a variable (a) within another variable(b) when variable (b) contains text
Ok here's more to the problem, all your suggestions work but now I'm looking for the most efficient method to my specific problem.
In my database I have several blocks of text. When a user(described as $teamName) logs in to the site, they are randomly assigned one of these blocks of text. Each block of text is different and may have different variables in it.
The problem is I don't have knowledge of which block of text is assigned to the user without actually viewing the database or running a query. So at the moment I have to query the database and select the $newsID that corresponds to the block of text that the user has been assigned.
Because I have preset the blocks of text, I know what they contain, so I can now do a switch($newsID) and, depending on the value of $newsID, pass the correct values to the sprintf() function.
There are, however, many blocks of text, so there will be many instances of case "": and break;. I would like the site to work so that if at any stage I change a block of text to something different, the variables passed to sprintf() are updated automatically, rather than me manually updating sprintf() within each switch() case:.
Sorry for the long post, hope it makes sense.
EDIT:
I have these predetermined blocks of text in my database in my teamNews table:
For $newsID = 1:
"$teamName is the name of a recently formed company hoping to take over the lucrative hairdryer design
$sector"
For $newsID = 2:
"The government is excited about the potential of ".$teamName.", after they made an annoucement that they have hired $HoM"
For $newsID = 3:
"It is rumored that $teamName are valuing their hairdryer at $salePrice. People are getting excited."
When a user($teamName) logs into the game they are randomly assigned one of these blocks of text with $newsID of 1,2 or 3.
Lets say the user is assigned the block of text with $newsID = 2. So now their username($teamName) is inserted into the database into the same row as their selected text.
Now I want to display the text corresponding to this user so I do the following:
$news = news ($currentStage,$teamName);
switch ($ID)
{
case "1":
sprintf($teamName,$sector)
echo $news."<br/><hr/>";
break;
case "2":
sprintf($teamName,$Hom)
break;
case "3":
sprintf($teamName,$saleprice)
break;
}
$currentStage--;
}
With the function
function news($period,$teamName)
{
$news = mysql_query("
SELECT `content`,`newsID` FROM `teamnews` WHERE `period` = '$period' && `teamName` = '$teamName'
") or die($news."<br/><br/>".mysql_error());
$row = mysql_fetch_assoc($news);
$news = $row['content'];
$ID = $row ['newsID'];
return $news,$ID;
}
The problem is that in reality there are about 20 different blocks of text that the user could be assigned to. So I will have many case:'s.
Also, if I want to change all the text blocks in the database, I would also have to manually change all the variables in the sprintf calls in each case:
I am wondering whether there is a better way to do this, so that if I change the text in the database then the parameters passed to sprintf will change accordingly.
So if I use
$replaces = array(
'teamName' => 'Bob the team',
'sector' => 'murdering',
'anotherSector' => 'giving fluffy bunnies to children'
);
is it possible to do this:
$replaces = array(
'$teamName' => '$teamName',
'$sector' => '$sector',
'$anotherSector' => '$anothersector'
);
I suggest you have fixed set of named placeholders, and use either the str_replace() or eval() (evil) methods of substitution.
So you would (for example) always have a $teamName and a $sector - and you might only sometimes use $anotherSector. And you have these two strings:
1 - $teamName, is the name of a recently formed company hoping to take over the lucrative $sector.
2 - The people at $teamName hate working in $sector, they would much rather work in $anotherSector
If you were to do:
$replaces = array(
'$teamName' => 'Bob the team',
'$sector' => 'murdering',
'$anotherSector' => 'giving fluffy bunnies to children'
);
$news = str_replace(array_keys($replaces),array_values($replaces),$news);
You would get
1 - Bob the team, is the name of a recently formed company hoping to take over the lucrative murdering.
2 - The people at Bob the team hate working in murdering, they would much rather work in giving fluffy bunnies to children
As long as your placeholders have known names, they don't all have to be present in the string - only the relevant ones will be replaced.
You could create a simple template language, and store templates in your database.
You can use strtr for this.
function replaceTemplateVars($str, $data) {
// change the key format to correspond to the template replacement format
$replacepairs = array();
foreach($data as $key => $value) {
$replacepairs["{{{$key}}}"] = $value;
}
// do the replacement in bulk
return strtr($str, $replacepairs);
}
// store your teamNews table text in this format
// double curly braces is easier to spot and less ambiguous to parse than `$name`.
$exampletemplate = '{{teamName}} is {{sector}} the {{otherteam}}!!';
// get $values out of your database for the user
$values = array(
'teamName' => 'Bob the team',
'sector' => 'murdering',
'otherteam' => 'fluffy bunnies'
);
echo replaceTemplateVars($exampletemplate, $values);
// this will echo "Bob the team is murdering the fluffy bunnies!!"
If you have needs more ambitious than this, such as looping or filters, you should find a third-party php template language and use it.
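One practical difference between strtr and the str_replace approach shows up when one placeholder is a prefix of another. The placeholder names below are invented purely to demonstrate this:

```php
<?php
$pairs = array(
    '$team'     => 'Bob',
    '$teamName' => 'Bob the team',
);
$text = 'Welcome $teamName';

// str_replace applies the pairs one after another, so '$team' is replaced
// inside '$teamName' before the longer placeholder is ever tried.
$viaStrReplace = str_replace(array_keys($pairs), array_values($pairs), $text);

// strtr tries the longest key first and never rescans text it has replaced.
$viaStrtr = strtr($text, $pairs);

echo $viaStrReplace, "\n"; // Welcome BobName
echo $viaStrtr, "\n";      // Welcome Bob the team
```

Delimited placeholders such as {{teamName}} sidestep the prefix problem as well, since the closing braces mark where each name ends.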
What about function eval?
http://php.net/eval

Searching keywords(from a matrix) in a string(around 500 char)

Hey, basically what I am trying to do is automatically assign tags to a user input string. I have 5 tags to assign, and each tag has around 10 keywords. A string can only be assigned one tag. In order to assign a tag to a string, I need to search it for words matching the keywords of all five tags.
Example:
TAGS: Keywords
Drink: Beer, whiskey, drinks, drink, pint, peg.....
Fitness: gym, yoga, massage, exercise......
Apparels: men's shirt, shirt, dress......
Music: classical, western, sing, salsa.....
Food: meal, grilled, baked, delicious.......
User String: Take first step to reach your fitness goals, Pay Rs 199 for Aerobics, Yoga, Kick Boxing, Bollywood Dance and more worth Rs 1000 at The very Premium F Chisel Bounce, Koramangala.
Now I need to decide on a tag for the above string, and I need a time-efficient algorithm for this. I don't know how to go about matching keywords against strings, but I do have a thought about deciding the tag: maintain a count for each tag and, as a keyword is matched, increase the count for the respective tag. If at any time the count for any tag reaches 5, we can stop and decide on that tag, which saves us from searching the whole string.
Please give any advice you have on this. I will be using php just so you know.
thanks
Interesting topic! What you are looking for is something similar to latent semantic indexing. There is a question about it here.
If the number of tags and keywords is small, I would save myself writing a complex algorithm and simply do:
$tags = array(
'drink' => array('beer', 'whiskey', ...),
...
);
$string = 'Take first step ...';
$bestTag = '';
$bestTagCount = 0;
foreach ($tags as $tag => $keywords) {
$count = 0;
foreach ($keywords as $keyword) {
// compare in lower case, so that 'yoga' also counts 'Yoga'
$count += substr_count(strtolower($string), strtolower($keyword));
}
if ($count > $bestTagCount) {
$bestTagCount = $count;
$bestTag = $tag;
}
}
var_dump($bestTag);
The algorithm is pretty obvious, but only suited for a small number of tags/keywords.
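The early-stop idea from the question can be combined with a keyword lookup so that the string is scanned only once. This is a sketch under the assumption that keywords are single words (a phrase like "men's shirt" would need extra handling); assignTag and its threshold parameter are illustrative names:

```php
<?php
function assignTag(string $text, array $tags, int $threshold = 5): string
{
    // Build a keyword => tag lookup once.
    $lookup = array();
    foreach ($tags as $tag => $keywords) {
        foreach ($keywords as $keyword) {
            $lookup[strtolower($keyword)] = $tag;
        }
    }

    $counts = array_fill_keys(array_keys($tags), 0);
    // Split on anything that is not a lowercase letter, digit or apostrophe.
    $words = preg_split("/[^a-z0-9']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (isset($lookup[$word])) {
            $tag = $lookup[$word];
            if (++$counts[$tag] >= $threshold) {
                return $tag; // stop early once a tag is clearly ahead
            }
        }
    }

    // No tag reached the threshold: fall back to the highest count.
    $bestTag = '';
    $bestCount = 0;
    foreach ($counts as $tag => $count) {
        if ($count > $bestCount) {
            $bestCount = $count;
            $bestTag = $tag;
        }
    }
    return $bestTag;
}

$tags = array(
    'drink'   => array('beer', 'whiskey', 'drink', 'pint'),
    'fitness' => array('gym', 'yoga', 'massage', 'exercise'),
);
var_dump(assignTag('Pay Rs 199 for Aerobics, Yoga, Kick Boxing', $tags));
```

Matching whole words through the lookup also avoids counting a keyword inside a longer word, which substr_count would do.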
If you don't mind using an external API, you should try one of these:
http://www.zemanta.com/
http://www.opencalais.com/
Benjamin Nowack: Linked Data Entity Extraction with Zemanta and OpenCalais
To give an example, Zemanta will return the following tags (among other things) for your User String:
Bollywood, Kickboxing, Koramangala, Aerobics, Boxing, Sports, India, Asia
Open Calais will return
Sports, Hospitality Recreation, Health, Recreation, Human behavior, Kick, Yoga, Chisel
Aerobics, Meditation, Indian philosophy, Combat sports, Aerobic exercise, Exercise
