How to replace glossary terms in HTML text with links? - php

I would like to run a str_replace or preg_replace which looks for certain words (found in $glossary_terms) in my $content and replaces them with links (like <a href="...">term</a>).
However, the $content is full HTML and my links/images are being affected too, which isn't what I'm after.
An example of $content is:
<div id="attachment_542" class="wp-caption alignleft" style="width: 135px"><img class="size-thumbnail wp-image-542" title="Amazonas English" src="http://www.seriouslyfish.com/dev/wp-content/uploads/2011/12/Amazonas-English-1-288x381.jpg" alt="Amazonas English" width="125" height="165" /><p class="wp-caption-text">Amazonas Magazine - now in English!</p></div>
<p>Edited by Hans-Georg Evers, the magazine ‘Amazonas’ has been widely-regarded as among the finest regular publications in the hobby since its launch in 2005, an impressive achievment considering it’s only been published in German to date. The long-awaited English version is just about to launch, and we think a subscription should be top of any serious fishkeeper’s Xmas list…</p>
<p>The magazine is published in a bi-monthly basis and the English version launches with the January/February 2012 issue with distributors already organised in the United States, Canada, the United Kingdom, South Africa, Australia, and New Zealand. There are also mobile apps availablen which allow digital subscribers to read on portable devices.</p>
<p>It’s fair to say that there currently exists no better publication for dedicated hobbyists with each issue featuring cutting-edge articles on fishes, invertebrates, aquatic plants, field trips to tropical destinations plus the latest in husbandry and breeding breakthroughs by expert aquarists, all accompanied by excellent photography throughout.</p>
<p>U.S. residents can subscribe to the printed edition for just $29 USD per year, which also includes a free digital subscription, with the same offer available to Canadian readers for $41 USD or overseas subscribers for $49 USD. Please see the Amazonas website for further information and a sample digital issue!</p>
<p>Alternatively, subscribe directly to the print version here or digital version here. Just gonna add this to the end of the post so I can do some testing.</p>
I came across this link, but I wasn't sure if such a method would work with nested HTML.
Is there any way I can str_replace or preg_replace content within <p> tags only; excluding any nested <a>, <img> or <h1/2/3/4/5> tags?
Thanks in advance,

A "by-the-book solution" would be like this:
<?php
$html = "<your HTML string>";
$glossary_terms = array('fishes', 'invertebrates', 'aquatic plants');
$dom = new DOMDocument;
$dom->loadHTML($html);
dom_link_glossary($dom, $glossary_terms);
echo $dom->saveHTML();
// wraps all occurrences of the glossary terms in links
function dom_link_glossary(&$document, &$glossary) {
$xpath = new DOMXPath($document);
$urls = array();
$pattern = array();
// build a normalized lookup (case-insensitive, whitespace-agnostic)
foreach ($glossary as $term) {
$term_norm = preg_replace('/\s+/', ' ', strtoupper(trim($term)));
$pattern[] = preg_replace('/ /', '\\s+', preg_quote($term_norm, '/'));
$urls[$term_norm] = '/glossary/initial/' . rawurlencode($term);
}
$pattern = '/\b(' . implode('|', $pattern) . ')\b/i';
$text_nodes = $xpath->query('//text()[not(ancestor::a)]');
foreach($text_nodes as $original_node) {
$text = $original_node->nodeValue;
$hitcount = preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);
if ($hitcount == 0) continue;
$offset = 0;
$parent = $original_node->parentNode;
$refnode = $original_node->nextSibling;
$parent->removeChild($original_node);
foreach ($matches[0] as $i => $match) {
$term_txt = $match[0];
$term_pos = $match[1];
$term_norm = preg_replace('/\s+/', ' ', strtoupper($term_txt));
// insert any text before the term instance
$prefix = substr($text, $offset, $term_pos - $offset);
$parent->insertBefore($document->createTextNode($prefix), $refnode);
// insert the actual term instance as a link
$link = $document->createElement("a", $term_txt);
$link->setAttribute("href", $urls[$term_norm]);
$parent->insertBefore($link, $refnode);
$offset = $term_pos + strlen($term_txt);
if ($i == $hitcount - 1) { // last match, append remaining text
$suffix = substr($text, $offset);
$parent->insertBefore($document->createTextNode($suffix), $refnode);
}
}
}
}
?>
Here is how dom_link_glossary() works:
It normalizes the glossary terms (trim, uppercase, white-space) and builds a lookup array and a regex pattern that matches all terms.
It uses XPath to find all text nodes that are not already part of a link. Text nodes are returned irrespective of their nesting depth (i.e. no recursion necessary on our part). I use \b to prevent partial matches.
For each text node that contains terms:
The original text node is deleted ($parent->removeChild())
Now new nodes are created and inserted into the DOM: text nodes for anything before (or after) a glossary term, element nodes (<a>) for the actual glossary terms.
The solution preserves original case and white space, therefore
term will become <a href="...">term</a>
Term will become <a href="...">Term</a>
Foo Bar will become <a href="...">Foo Bar</a>. Surplus whitespace or line breaks in the HTML will not break the mechanism.
Note that it is perfectly all-right to use regex on the plain text node values. It is not okay to use regex on full HTML.
I would recommend pairing the glossary terms with their respective URLs in an array, instead of calculating the URLs in the function. That way you can make multiple terms point to the same URL.
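A minimal sketch of that last recommendation, assuming the glossary is passed in as a term => URL array. The helper name build_glossary_lookup() and the example URLs are made up for illustration; the normalization is the same as in dom_link_glossary().

```php
<?php
// Hypothetical helper: builds the lookup array and one regex from a term => URL map.
function build_glossary_lookup(array $glossary_urls): array
{
    $urls = [];
    $alternatives = [];
    foreach ($glossary_urls as $term => $url) {
        // Same normalization as above: trim, uppercase, collapse whitespace.
        $term_norm = preg_replace('/\s+/', ' ', strtoupper(trim($term)));
        // Quote the term for the regex, then let any whitespace run match a space.
        $alternatives[] = str_replace(' ', '\s+', preg_quote($term_norm, '/'));
        $urls[$term_norm] = $url;
    }
    $pattern = '/\b(' . implode('|', $alternatives) . ')\b/i';
    return [$pattern, $urls];
}

// Two spellings share one glossary page (the URLs are invented examples).
list($pattern, $urls) = build_glossary_lookup([
    'fishes'         => '/glossary/f/fish',
    'fish'           => '/glossary/f/fish',
    'aquatic plants' => '/glossary/a/aquatic-plants',
]);

preg_match_all($pattern, "Fish and aquatic\n plants", $matches);
foreach ($matches[0] as $hit) {
    // Normalize the hit the same way to find its URL.
    $key = preg_replace('/\s+/', ' ', strtoupper($hit));
    echo $hit, ' => ', $urls[$key], "\n";
}
```

Inside dom_link_glossary() the returned $pattern and $urls would simply take the place of the ones built in the first foreach loop.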

You can try this:
$content = preg_replace('/(<p\sclass=\"wp\-caption\-text\">)[^<]+(<\/p>)/i', '', $content);

Related

php preg_match excluding text within html tags/attributes to find correct place to cut a string

I am trying to determine the absolute position of certain words within a block of html, but only if they are outside of an actual html tag. For instance, if I wanted to determine the position of the word "join" using preg_match in this text:
<p>There are 14 more days until our <a href="/somepage.html" target="_blank" rel="noreferrer noopener" aria-label="join us">holiday special</a> so come join us!</p>
I could use:
preg_match('/join/', $post_content, $matches, PREG_OFFSET_CAPTURE, $offset);
The problem is that this is matching the word within the aria-label attribute, when what I need is the one just after the link. It would be fine to match between the <a> and </a>, just not inside the brackets themselves.
My actual end goal, most of which (I think) I already have working aside from this last piece: I am trimming a block of html (not a full document) to cut off at a specific word count. I am trying to determine which character that last word ends at, and then joining the left side of the html block with only the html from the right side, so all html tags close gracefully. I thought I had it working until I ran into an example like the one above, where the last word also appears within an html attribute, causing me to split the string at the wrong location. This is my code so far:
$post_content = strip_tags ( $p->post_content, "<a><br><p><ul><li>" );
$post_content_stripped = strip_tags ( $p->post_content );
$post_content_stripped = preg_replace("/[^A-Za-z0-9 ]/", ' ', $post_content_stripped);
$post_content_stripped = preg_replace("/\s+/", ' ', $post_content_stripped);
$post_content_stripped_array = explode ( " " , trim($post_content_stripped) );
$excerpt_wordcount = count( $post_content_stripped_array );
$cutpos = 0;
while($excerpt_wordcount>48){
$thiswordrev = "/" . strrev($post_content_stripped_array[$excerpt_wordcount - 1]) . "/";
preg_match($thiswordrev, strrev($post_content), $matches, PREG_OFFSET_CAPTURE, $cutpos);
$cutpos = $matches[0][1] + (strlen($thiswordrev) - 2);
array_pop($post_content_stripped_array);
$excerpt_wordcount = count( $post_content_stripped_array );
}
if($pwordcount>$excerpt_wordcount){
preg_match_all('/<\/?[^>]*>/', substr( $post_content, strlen($post_content) - $cutpos ), $closetags_result);
$excerpt_closetags = "" . $closetags_result[0][0];
$post_excerpt = substr( $post_content, 0, strlen($post_content) - $cutpos ) . $excerpt_closetags;
}else{
$post_excerpt = $post_content;
}
I am actually searching the string in reverse in this case, since I am walking word by word backwards from the end of the string, so I know that my html brackets are backwards, eg:
>p/<!su nioj emoc os >a/<laiceps yadiloh>"su nioj"=lebal-aira "renepoon rerreferon"=ler "knalb_"=tegrat "lmth.egapemos/"=ferh a< ruo litnu syad erom 41 era erehT>p<
But it's easy enough to flip all of the brackets before doing the preg_match, or I am assuming should be easy enough to have the preg_match account for that.
Do not use regex to parse HTML.
You have a simple objective: limit the text content to a given number of words, ensuring that the HTML remains valid.
To this end, I would suggest looping through text nodes until you count a certain number of words, and then removing everything after that.
$dom = new DOMDocument();
$dom->loadHTML($post_content);
$xpath = new DOMXPath($dom);
$all_text_nodes = $xpath->query("//text()");
$words_left = 48;
foreach( $all_text_nodes as $text_node) {
$text = $text_node->textContent;
$words = explode(" ", $text); // TODO: maybe preg_split on /\s/ to support more whitespace types
$word_count = count($words);
if( $word_count < $words_left) {
$words_left -= $word_count;
continue;
}
// reached the threshold
$words_that_fit = implode(" ", array_slice($words, 0, $words_left));
// If the above TODO is implemented, this will need to be adjusted to keep the specific whitespace characters
$text_node->textContent = $words_that_fit;
$remove_after = $text_node;
while( $remove_after->parentNode) {
while( $remove_after->nextSibling) {
$remove_after->parentNode->removeChild($remove_after->nextSibling);
}
$remove_after = $remove_after->parentNode;
}
break;
}
$output = substr($dom->saveHTML($dom->getElementsByTagName("body")->item(0)), strlen("<body>"), -strlen("</body>"));
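The TODO about preg_split could be handled along these lines. This is only a sketch: take_words() is a hypothetical helper, not part of the code above, and unlike explode(" ") it does not count empty strings between consecutive spaces as words.

```php
<?php
// Hypothetical helper: keep at most $limit words of $text and preserve the
// original whitespace between the words that are kept.
// Returns [kept text, number of words kept].
function take_words(string $text, int $limit): array
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured whitespace runs are returned
    // too, so words and separators alternate in $parts.
    $parts = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $kept = '';
    $count = 0;
    foreach ($parts as $part) {
        $is_word = preg_match('/\S/', $part) === 1;
        if ($is_word && $count === $limit) {
            break; // the next word would exceed the limit
        }
        $kept .= $part;
        if ($is_word) {
            $count++;
        }
    }
    // Drop the separator left dangling after the last kept word.
    return [rtrim($kept), $count];
}

list($kept, $count) = take_words("one  two\tthree four", 3);
var_dump($kept, $count);
```

In the loop above, $count would be subtracted from $words_left, and $kept assigned to the text node once the threshold is reached.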
Ok, I figured out a workaround. I don't know if this is the most elegant solution, so if someone sees a better one I would still love to hear it, but for now I realized that I don't have to actually have the html in the string I am searching to determine the position to cut, I just need it to be the same length. I grabbed all of the html elements and just created a dummy string replacing all of them with the same number of asterisks:
// create faux string with placeholders instead of html for search purposes
preg_match_all('/<\/?[^>]*>/', $post_content, $alltags_result);
$tagcount = count( $alltags_result );
$post_content_dummy = $post_content;
foreach($alltags_result[0] as $thistag){
$post_content_dummy = str_replace($thistag, str_repeat("*",strlen($thistag)), $post_content_dummy);
}
Then I just use $post_content_dummy in the while loop instead of $post_content, in order to find the cut position, and then $post_content for the actual cut. So far seems to be working fine.
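The same masking can also be done in a single pass with preg_replace_callback, which avoids the separate preg_match_all and str_replace loop. A sketch, where mask_tags() is a made-up name and the tag regex is the one used above (so it still assumes no ">" appears inside attribute values):

```php
<?php
// Replace every tag with asterisks of the same length so that offsets in the
// masked string line up with offsets in the original string.
function mask_tags(string $html): string
{
    return preg_replace_callback(
        '/<\/?[^>]*>/',
        function (array $m): string {
            return str_repeat('*', strlen($m[0]));
        },
        $html
    );
}

$post_content = '<p>Our <a href="/x.html" aria-label="join us">special</a> so come join us!</p>';
$masked = mask_tags($post_content);

// Search the masked copy; the offset is valid for the original string too.
preg_match('/join/', $masked, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches[0][1] > strpos($post_content, '</a>')); // bool(true): the hit is after the link
```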

Find a pattern within two or more sets of text

I have lots of data that I need to search through for certain patterns.
Problem is when looking for said patterns I have no reference to what I'm looking for.
Or in other words, I have two paragraphs. Each on similar topics. I need to be able to compare both paragraphs and find patterns. Phrases said in both paragraphs and how many times both were said.
I can't seem to find a solution, because with preg_match and similar functions you're required to supply the thing you're looking for.
Example paragraphs
Paragraph 1:
Bee Pollen is made by honeybees, and is the food of the young bee. It
is considered one of nature's most completely nourishing foods as it
contains nearly all nutrients required by humans. Bee-gathered pollens
are rich in proteins (approximately 40% protein), free amino acids,
vitamins, including B-complex, and folic acid.
Paragraph 2:
Bee Pollen is made by honeybees. It is required for the fertilization
of the plant. The tiny particles consist of 50/1,000-millimeter
corpuscles, formed at the free end of the stamen in the heart of the
blossom, nature's most completely nourishing foods. Every variety of
flower in the universe puts forth a dusting of pollen. Many orchard
fruits and agricultural food crops do, too.
So from those examples these patterns:
Bee Pollen is made by honeybees
and:
nature's most completely nourishing foods
Both phrases are found in both paragraphs.
This is potentially a complex question depending on whether you're looking for similar phrases or phrases that match word for word.
Finding exact word-for-word matches is quite simple: all you need to do is split on common breaks like punctuation marks (e.g. .,;:) and perhaps on conjunctions as well (e.g. and, or). However, the problem comes with, for example, adjectives: two phrases might be exactly the same except for one word, like so:
The world is spinning around its axis at a tremendous speed.
The world is spinning around its axis at a magnificent speed.
This won't match because tremendous and magnificent are used in place of one another. Potentially you could work around this, however, that would be a more complex question.
Answer
If we stick to the simple side of things we can achieve phrase matching with just a few lines of code (4 in this example; not including the formatting for comments/readability).
$wordSplits = 'and or on of as'; //List of words to split on
preg_match_all('/(?<m1>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para1, $matches1);
preg_match_all('/(?<m2>.*?)([.,;:\-]| '.str_replace(' ', ' | ', trim($wordSplits)).' )/i', $para2, $matches2);
$commonPhrases = array_filter( //Removes blank $key=>$value pairs
array_intersect( //Finds matching patterns
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para1 values - removes leading and following spaces
}, $matches1['m1']),
array_map(function($item){
return(strtolower(trim($item))); //Cleans array for $para2 values - removes leading and following spaces
}, $matches2['m2'])
)
);
var_dump($commonPhrases);
/**
OUTPUT:
array(2) {
[0]=>
string(31) "bee pollen is made by honeybees"
[5]=>
string(41) "nature's most completely nourishing foods"
}
*/
The above code splits both on punctuation (defined in the [...] of the preg_match_all pattern) and on the words in the word list, which are joined into the pattern so that each word only matches with a preceding and following space.
Wordlist
You can change the word list to include any breaks you like, editing the list until you get the phrases you are after, examples:
$wordSplits = 'and or';
$wordSplits = 'and but if or';
$wordSplits = 'a an as and by but because if in is it of off on or';
Punctuation
You can add any punctuation marks you like into the list (between [ and ]), however remember that some characters do have special meanings and may need to be escaped (or placed appropriately): - and ^ should become \- and \^ or be placed where their special meaning doesn't come into play.
You may consider changing:
([.,;:\-]|
To:
([.,;:\-] | //Adding a space before the pipe
So that you only split punctuation marks which are followed by a space. For example: this would mean that items like 50,000 won't be split.
Spaces and breaks
You may also consider changing the spaces to \s so that tabs and newlines etc are included and not just spaces. Like so:
'/(?<m1>.*?)([.,;:\-]|\s'.str_replace(' ', '\s|\s', trim($wordSplits)).'\s)/i'
This would also apply to:
([.,;:\-]\s|
If you decide to go down that route.
I've been working on this code, don't know if it suits your needs... Feel free to expand it!
$p1 = "Bee Pollen is made by honeybees, and is the food of the young bee. It is considered one of nature's most completely nourishing foods as it contains nearly all nutrients required by humans. Bee-gathered pollens are rich in proteins (approximately 40% protein), free amino acids, vitamins, including B-complex, and folic acid.";
$p2 = "Bee Pollen is made by honeybees. It is required for the fertilization of the plant. The tiny particles consist of 50/1,000-millimeter corpuscles, formed at the free end of the stamen in the heart of the blossom, nature's most completely nourishing foods. Every variety of flower in the universe puts forth a dusting of pollen. Many orchard fruits and agricultural food crops do, too.";
// Strip strings of periods etc.
$p1 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p1));
$p2 = strtolower(str_replace(array('.', ',', '(', ')'), '', $p2));
// Extract words from first paragraph
$w1 = explode(" ", $p1);
// Build search string
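$search = '';
$old_search = '';
$num_occured = 0;
$add = FALSE; // becomes TRUE once the current search string has matched
$found = array();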
foreach ($w1 as $word) {
//echo 'Word: ' . $word . "<br />";
$search .= ' ' . $word;
$search = trim($search);
//echo '. . Search string: '. $search . "<br /><br />";
if (substr_count($p2, $search)) {
$old_search = $search;
$num_occured = substr_count($p2, $search);
//echo " . . . found!" . "<br /><br /><br />";
$add = TRUE;
} else {
//echo " . . . not found! Generating new search string: " . $word . '<br />';
if ($add) {
$found[] = array('pattern' => $old_search, 'occurences' => $num_occured);
$add = FALSE;
}
$old_search = '';
$search = $word;
}
}
print_r($found);
The above code finds occurrences of patterns from the first string in the second one.
I'm sure it can be written better, but since it's past midnight (local time), I'm not as "fresh" as I'd like to be...

I need to find a string in a string then replace that and text around it

I have a string that contains markers which I need to replace with text from a database. The string itself is stored in a database, and the markers are auto-filled with data from a different part of the database.
$text = '<span data-field="la_lname" data-table="user_properties">
{Listing Agent Last Name}
</span>
<br>RE: The new offer<br>Please find attached....';
If I can find the data marker with:
strpos($text, 'la_lname');
can I use that to select everything in and between the <span> and </span> tags,
so the new string looks like:
'Sommers<br>RE: The new offer<br>Please find attached....'
I thought I could explode the string based on the <span> tags but that opens up a lot of problems as I need to keep the text intact and formatted as it is. I just want to insert the data and leave everything else untouched.
To get what's between two parts of a string
for example if you have
<span>SomeText</span>
If you want to get SomeText then I suggest using a function that gets whatever is between two parts that you put as parameters
<?php
function getbetween($content,$start,$end) {
$r = explode($start, $content);
if (isset($r[1])){
$r = explode($end, $r[1]);
return $r[0];
}
return '';
}
$text = '<span>SomeText</span>';
$start = '<span>';
$end = '</span>';
$required_text = getbetween($text,$start,$end);
$full_line = $start.$required_text.$end;
$text = str_replace($full_line, 'WHAT TO REPLACE IT WITH HERE',$text);
You could try preg_replace or use a DOM Parser, which is far more useful for navigating HTML-like-structure.
I should add that while regular expressions should work just fine in this example, you may need to do more complex things in the future or traverse more intricate DOM structures for your replacements, so a DOM Parser is the way to go in this case.
Using PHP Simple HTML DOM Parser
$html = str_get_html('<span data-field="la_lname" data-table="user_properties">{Listing Agent Last Name}</span><br>RE: The new offer<br>Please find attached....');
$html->find('span', 0)->innertext = 'New value of span';
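For the preg_replace route, something like the following might do for simple cases. It is a sketch under a few assumptions: the $values array stands in for the database lookup, the marker spans are never nested, and data-field is always double-quoted.

```php
<?php
// Hypothetical data that would normally come from the database.
$values = ['la_lname' => 'Sommers'];

$text = '<span data-field="la_lname" data-table="user_properties">
{Listing Agent Last Name}
</span>
<br>RE: The new offer<br>Please find attached....';

// Replace each whole <span data-field="..."> ... </span> with its value.
// The /s modifier lets .*? run across the line breaks inside the span.
$filled = preg_replace_callback(
    '/<span\b[^>]*\bdata-field="([^"]+)"[^>]*>.*?<\/span>/s',
    function (array $m) use ($values): string {
        // Leave unknown markers untouched.
        return array_key_exists($m[1], $values)
            ? htmlspecialchars($values[$m[1]])
            : $m[0];
    },
    $text
);

echo $filled;
```

Everything outside the matched span, including the line break after it, is left exactly as it was.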

PHP Search Text Highlight Function

I have a PHP highlighting function which makes certain words bold.
Below is the function, and it works great, except when the array: $words contains a single value that is: b
For example someone searches for: jessie j price tag feat b o b
This will have the following entries in the array $words: jessie,j,price,tag,feat,b,o,b
When a 'b' shows up, my whole function goes wrong, and it displays a whole bunch of wrong html tags. Of course I can strip out any 'b' values from the array, but this isn't ideal, as the highlighting isn't working as it should with certain queries.
This sample script:
function highlightWords2($text, $words)
{
$text = ($text);
foreach ($words as $word)
{
$word = preg_quote($word);
$text = preg_replace("/\b($word)\b/i", '<b>$1</b>', $text);
}
return $text;
}
$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');
echo highlightWords2($string, $words);
Will output:
<<<b>b</b>><b>b</b></<b>b</b>>>jessie</<<b>b</b>><b>b</b></<b>b</b>>> j price <<<b>b</b>><b>b</b></<b>b</b>>>tag</<<b>b</b>><b>b</b></<b>b</b>>> feat <<b>b</b>><b>b</b></<b>b</b>> <<b>b</b>>o</<b>b</b>> <<b>b</b>><b>b</b></<b>b</b>>
And this only happens because there are "b"'s in the array.
Can you guys see anything that I could change to make it work properly?
Your problem is that when your function goes through and looks for all the b's to bold, it sees the bold tags and tries to bold them as well.
@symcbean was close but forgot one thing.
$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');
print hl($string, $words);
function hl($inp, $words)
{
$replace=array_flip(array_flip($words)); // remove duplicates
$pattern=array();
foreach ($replace as $k=>$fword) {
$pattern[]='/\b(' . $fword . ')(?!>)\b/i';
$replace[$k]='<b>$1</b>';
}
return preg_replace($pattern, $replace, $inp);
}
Notice the added "(?!>)". That is a negative lookahead assertion: it only allows a match if the string is not followed by a ">", which is what follows the "b" in both the opening and the closing bold tag. I only check for ">" after the string because looking for "<" in front of it would not catch the closing bold tag. The above code works exactly as expected.
Your base problem is that you quite wildly replace plain text strings inside HTML. That causes your problem for small strings, as you replace text in tags and attributes as well.
Instead you need to apply your search and replace only to the text between HTML tags. Additionally, you don't want to highlight inside another highlight.
Regular expressions are quite limited for this kind of task. Instead use an HTML parser; in PHP this is, for example, DOMDocument. With an HTML parser it is possible to search only inside the HTML text elements (and not other things like tags, attributes and comments).
You can find a highlighter for text in a previous answer of mine, with a detailed description of how it works. The question is "Ignore html tags in preg_replace" and it is quite similar to yours, so this snippet is probably helpful. It uses <span> instead of <b> tags:
$doc = new DOMDocument;
$doc->loadXML($str);
$xp = new DOMXPath($doc);
$anchor = $doc->getElementsByTagName('body')->item(0);
if (!$anchor)
{
throw new Exception('Anchor element not found.');
}
// search elements that contain the search-text
$r = $xp->query('//*[contains(., "'.$search.'")]/*[FALSE = contains(., "'.$search.'")]/..', $anchor);
if (!$r)
{
throw new Exception('XPath failed.');
}
// process search results
foreach($r as $i => $node)
{
$textNodes = $xp->query('.//child::text()', $node);
// extract $search textnode ranges, create fitting nodes if necessary
$range = new TextRange($textNodes);
$ranges = array();
while(FALSE !== $start = strpos($range, $search))
{
$base = $range->split($start);
$range = $base->split(strlen($search));
$ranges[] = $base;
};
// wrap every each matching textnode
foreach($ranges as $range)
{
foreach($range->getNodes() as $node)
{
$span = $doc->createElement('span');
$span->setAttribute('class', 'search_hightlight');
$node = $node->parentNode->replaceChild($span, $node);
$span->appendChild($node);
}
}
}
If you adopt it for multiple search terms, I would add an additional class with a number depending on the search term so you can nicely style it with CSS in different colors.
Additionally you should remove duplicate search terms and make the xpath expression aware to not look for text that is already part of an element that has the highlight span assigned.
If it were me I'd have used javascript.
But using PHP, since the problem only seems to be duplicate entries in the search, just remove them, also you can run preg_replace just once rather than multiple times....
$string = 'jessie j price tag feat b o b';
$words = array('jessie','tag','b','o','b');
print hl($string, $words);
function hl($inp, $words)
{
$replace=array_flip(array_flip($words)); // remove duplicates
$pattern=array();
foreach ($replace as $k=>$fword) {
$pattern[]='/\b(' . $fword . ')\b/i';
$replace[$k]='<b>$1</b>';
}
return preg_replace($pattern, $replace, $inp);
}
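The "run preg_replace just once" idea can be taken a step further by joining all words into one alternation. Because the text is then scanned a single time, the inserted <b> tags are never looked at again. A sketch, assuming $text is plain text (not HTML) and the words are non-empty; highlight_once() is a made-up name:

```php
<?php
// Highlight all words in one pass so inserted <b> tags are never rescanned.
function highlight_once(string $text, array $words): string
{
    // Remove duplicates and escape regex metacharacters in each word.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, array_unique($words));
    if (!$quoted) {
        return $text; // nothing to highlight
    }
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    return preg_replace($pattern, '<b>$1</b>', $text);
}

echo highlight_once('jessie j price tag feat b o b', ['jessie', 'tag', 'b', 'o', 'b']);
// <b>jessie</b> j price <b>tag</b> feat <b>b</b> <b>o</b> <b>b</b>
```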

String parsing help

I have a paragraph of text in the following format:
text text text <age>23</age>. text text <hobbies>...</hobbies>
I want to be able to
1) Extract the text found between each <age> and <hobbies> tag found in the string. So for example, I would have an array called $ages which will contain all ages found between all the <age></age> tags, and then another array $hobbies which will have the text between the <hobbies></hobbies> tags found throughout the string.
2) Be able to replace the tags which are extracted with a marker, such as {age_444}, so e.g the above text would become
text text text {age_444}. text text {hobbies_555}
How can this be done?
//Extract the age
preg_match_all("#<age>(.*?)</age>#",$string,$match);
$ages=$match[1];
//Extract the hobby
preg_match_all("#<hobbies>(.*?)</hobbies>#",$string,$match);
$hobbies=$match[1];
//Replace the age
$agefn=create_function('$match','$query=mysql_query("select ageid...where age=".$match[1]); return "<age>{age_".mysql_fetch_object($query)->ageid."}</age>"');
$string=preg_replace_callback("#<age>(.*?)</age>#",$agefn,$string);
//Replace the hobby
$hobfn=create_function('$match','$query=mysql_query("select hobid...where hobby=".$match[1]); return "<hobbies>{hobbies_".mysql_fetch_object($query)->hobid."}</hobbies>"');
$string=preg_replace_callback("#<hobbies>(.*?)</hobbies>#",$hobfn,$string);
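Note that create_function() was removed in PHP 8.0 and the mysql_* functions in PHP 7.0, so on a current PHP the same idea would be written with closures. A rough sketch: extract_and_mark() is a hypothetical helper, and the $id_for callbacks below return fixed IDs in place of the database lookup, which is left out here.

```php
<?php
// Hypothetical helper: collects the contents of every <$tag>...</$tag> pair and
// swaps each pair for a {tag_id} marker. $id_for stands in for the ID lookup.
function extract_and_mark(string $string, string $tag, callable $id_for): array
{
    $values = [];
    $quoted = preg_quote($tag, '#');
    $marked = preg_replace_callback(
        '#<' . $quoted . '>(.*?)</' . $quoted . '>#s',
        function (array $m) use (&$values, $tag, $id_for): string {
            $values[] = $m[1]; // remember the extracted text
            return '{' . $tag . '_' . $id_for($m[1]) . '}';
        },
        $string
    );
    return [$marked, $values];
}

$string = 'text text text <age>23</age>. text text <hobbies>chess</hobbies>';
// Stand-in lookups; a real version would query the database with a prepared statement.
list($string, $ages) = extract_and_mark($string, 'age', function ($v) { return 444; });
list($string, $hobbies) = extract_and_mark($string, 'hobbies', function ($v) { return 555; });
echo $string; // text text text {age_444}. text text {hobbies_555}
```

Here the tags themselves are replaced by the marker, as asked in point 2 of the question.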
If your source document is a kind of well-formed XML (or if it can easily be brought into this shape at least), you can use XSLT/XSL-FO to transform your document.
Finding information enclosed by <> tags and rearranging/extracting it is one of the main features. You can use XSLT/XSL-FO stand-alone or within various languages (Java, C, even Visual Basic).
What you need is your source document and a document describing the transformation rules. The rendering machine or library function will do the rest.
Hope that helps. Good luck
$string = '<age>23</age><hobbies>hobbietext</hobbies>';
$ages = array();
$hobbies = array();
$ageTemp = explode('<age>', $string );
foreach($ageTemp as $key=>$value)
{
if($key === 0) continue; // text before the first <age> tag
$age = explode('</age>', $value);
if(count($age) > 1) $ages[] = $age[0];
}
$hobbiesTemp = explode('<hobbies>', $string );
foreach($hobbiesTemp as $key=>$value)
{
if($key === 0) continue; // text before the first <hobbies> tag
$hobbie = explode('</hobbies>', $value);
if(count($hobbie) > 1) $hobbies[] = $hobbie[0];
}
The final arrays are $hobbies and $ages.
After that you just replace the tags in your string like this:
foreach($ages as $key=>$value)
{
$string = str_replace('<age>'.$value.'</age>', '{age_'.$yourId.'}', $string);
}
foreach($hobbies as $key=>$value)
{
$string = str_replace('<hobbies>'.$value.'</hobbies>', '{hobbies_'.$yourId.'}', $string);
}
