Basically, I have a variable that contains a few paragraphs of text, and another variable holding a word that I want to make bold within those paragraphs (by wrapping <strong></strong> tags around it). The problem is that I don't want to make every instance of the word bold, or else I'd just use str_replace(). I want to be able to wrap the first, second, fourth, or whichever instance of this text in the tags, at my own discretion.
I've searched on Google for quite a while, but it's hard to find any relevant results, probably because of my wording.
I guess that preg_replace() could do the trick for you. The following example should skip 2 instances of the word "foo" and highlight the third one:
preg_replace(
'/((?:.*?foo.*?){2})(foo)/',
'\1<strong>\2</strong>',
'The foo foo fox jumps over the foo dog.'
);
(Sorry, I forgot the two question marks that disable greediness in my first post. I have edited them in now.)
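The same idea can be generalized so the occurrence number is a parameter. The following is only a sketch, assuming PHP 7 or later: wrap_nth() is a hypothetical helper name, the pattern is anchored at the start of the text, and the limit argument of preg_replace() is set to 1 so that only one occurrence is ever wrapped.

```php
<?php
// Hypothetical helper: wrap only the $n-th occurrence of $word in <strong> tags.
function wrap_nth($text, $word, $n) {
    if ($n < 1) {
        return $text;
    }
    $w = preg_quote($word, '/'); // escape regex metacharacters in the word
    // Skip ($n - 1) occurrences lazily, then capture the next one.
    // The s modifier lets "." cross line breaks between paragraphs.
    $pattern = '/^((?:.*?' . $w . '){' . ($n - 1) . '}.*?)(' . $w . ')/s';
    return preg_replace($pattern, '$1<strong>$2</strong>', $text, 1);
}

echo wrap_nth('The foo foo fox jumps over the foo dog.', 'foo', 3);
// prints: The foo foo fox jumps over the <strong>foo</strong> dog.
```

If the text contains fewer than $n occurrences, the pattern does not match and the text comes back unchanged.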
You can probably reference 'Replacing the nth instance of a regex match in Javascript' and modify it to work for your needs.
Since you said you wanted to be able to define which instances should be highlighted and it sounds like that will be arbitrary, something like this should do the trick:
// Define which instances of a word you want highlighted
$wordCountsToHighlight = array(1, 2, 4, 6);
// Split up the paragraph into an array of words
$wordsInParagraph = explode(' ', $paragraph);
// Initialize our count
$wordCount = 0;
// Find out the maximum count (because we can stop our loop after we find this one)
$maxCount = max($wordCountsToHighlight);
// Here's the word we're looking for
$wordToFind = 'example';
// Go through each word
foreach ($wordsInParagraph as $key => $word) {
if ($word == $wordToFind) {
// If we find the word, up our count.
$wordCount++;
// If this count is one of the ones we want replaced, do it
if (in_array($wordCount, $wordCountsToHighlight)) {
$wordsInParagraph[$key] = '<strong>' . $word . '</strong>';
}
// If this is equal to the maximum number of replacements, we are done
if ($wordCount == $maxCount) {
break;
}
}
}
// Put our paragraph back together
$newParagraph = implode(' ', $wordsInParagraph);
It's not pretty and probably isn't the quickest solution, but it'll work.
Related
Basically what I am trying to do here is get a text input (a paragraph), and then save each word into an array. Then I want to check each word in the array against the original paragraph to see how many times it occurred. By doing this I am hopefully going to be able to check what the topic is. Originally I started this is as an open ended school project, but I am more interested in finding out how to do this for my own sanity.
Here is my code (this is after I requested the text input in html code above):
$paragraph = $_POST['text'];
// Collapse runs of extra spaces (two passes over double spaces)
$paragraph = str_replace('  ', ' ', $paragraph);
$paragraph = str_replace('  ', ' ', $paragraph);
$paragraph = strtolower($paragraph);
$words = explode(" ",$paragraph);
$count = count($words);
for($x = 0; $x < $count; $x++) {
echo $words[$x];
echo "<br/>";
}
So far I have been able to get the words all lowercase and to replace all the extra spaces in my text, and then subsequently save that to an array. For now I am just displaying the words.
This is where I have run into some problems. I was thinking I could have a multidimensional array where it would be something along the lines of
$words[1]["word"][0]["amount"];
The word would be the actual word in the paragraph, and amount would count how many times it showed up in the paragraph. If anyone has basic concepts for doing this, or there is something I am missing here I would appreciate your help. The main thing I need help with is checking the amount of times each word shows up in the paragraph. I couldn't get this to work (it was within the prior for loop):
substr_count($words[$x],$paragraph)
To recap, I am trying to take a paragraph, save each different word into an array (I have managed to do this successfully) and then save the amount of times the word shows up in the paragraph into a different array (or a multidimensional array). Once I get this data I am going to see which words I used the most, while filtering out filler words like "the" and "a".
You would be better off using preg_replace('/\W+/', ' ', $paragraph); and simplifying the rest of your code to this:
$paragraph = preg_replace('/\W+/', ' ', $paragraph);
$filter = array('the', 'a');
$words = explode(' ',$paragraph);
$countWords = array();
foreach($words as $w)
{
if(trim($w) != "" && array_search($w, $filter) === false)
{
if(!isset($countWords[$w]))
$countWords[$w] = 0;
$countWords[$w] += 1;
}
}
This will give you how many times each word is used. And if you don't care about case, then you can use $countWords[strtolower($w)] instead. Also, with the $filter array I added, you can add whatever words that you don't want to count in there.
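For comparison, the counting loop can also be expressed with PHP's built-in array functions. This is a minimal sketch rather than a drop-in replacement: word_frequencies() is a hypothetical name, the stop-word list is just the two words from the question, and without the u modifier \W treats non-ASCII letters as separators.

```php
<?php
// Hypothetical helper: count how often each word occurs, ignoring stop words.
function word_frequencies($paragraph, array $stopWords = array('the', 'a')) {
    // Lowercase first, then split on runs of non-word characters, dropping empty pieces.
    $words = preg_split('/\W+/', strtolower($paragraph), -1, PREG_SPLIT_NO_EMPTY);
    // Remove the filler words before counting.
    $words = array_diff($words, $stopWords);
    // array_count_values() builds the word => count map in one call.
    $counts = array_count_values($words);
    // Most frequent words first, keeping the word keys.
    arsort($counts);
    return $counts;
}

print_r(word_frequencies('The cat saw a cat and the dog.'));
```

Because the result is sorted by count, the most used words (the likely topic) sit at the start of the array.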
Horrible title, I know.
I want to have some kind of wordwrap, but obviously can not use wordwrap() as it messes up UTF-8.. not to mention markup.
My issue is that I want to get rid of stuff like this "eeeeeeeeeeeeeeeeeeeeeeeeeeee" .. but then longer of course. Some jokesters find it funny to put that stuff on my site.
So when I have a string like this "Hello how areeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee you doing?" I want to break up the 'areeee'-thing with the zero width space (&#8203;) character.
The strings aren't always the same letter, and they are always inside larger strings, so strlen, substr, and wordwrap don't really fit the bill.
Who can help me out?
This is not a PHP solution, but if your problem is only how the text is displayed, why not use the CSS3 property word-wrap?
If your container is a div with id="example", you can write:
#example
{
word-wrap: break-word;
}
Do this in 3 steps:
1. Split the string on whitespace.
2. Check each word with strlen() and trim it if it is too long.
3. Concatenate the words back together.
The downside is that legitimate words longer than 10 characters would be cut as well, so I would suggest adding a check for whether the same letter repeats over and over.
EXAMPLE
$string = "Hello how areeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee you doing?";
$strArr = explode(" ",$string);
foreach($strArr as $word) {
if(strlen($word) > 10) {
$word = substr($word,0,10);
}
$wordArr[] = $word;
}
$newString = implode(" ",$wordArr);
print $newString; // Prints "Hello how areeeeeeee you doing?"
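If the long run should be kept but made breakable, as the question asks, a zero width space can be inserted instead of truncating. This is only a sketch under a few assumptions: break_long_runs() is a hypothetical name, the input is valid UTF-8 (with the u modifier preg_replace() returns null otherwise), PHP 7 or later is available for the "\u{200B}" escape, and HTML markup is not handled, so a long tag or URL would be broken up too.

```php
<?php
// Hypothetical helper: insert a zero width space (U+200B) after every $max
// consecutive non-whitespace characters, so browsers may wrap there.
function break_long_runs($text, $max = 10) {
    $zwsp = "\u{200B}";
    // The u modifier makes \S count whole UTF-8 characters, not bytes.
    // The lookahead avoids adding a break at the very end of a word.
    $pattern = '/(\S{' . (int) $max . '})(?=\S)/u';
    return preg_replace($pattern, '$1' . $zwsp, $text);
}

$string = "Hello how areeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee you doing?";
echo break_long_runs($string);
```

Ordinary short words are untouched, and removing the inserted character restores the original string.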
Consider the following string
$input = "string with {LABELS} between brackets {HERE} and {HERE}";
I want to temporarily remove all labels (= whatever is between curly braces) so that an operation can be performed on the rest of the string:
$string = "string with between brackets and";
For argument's sake, the operation prepends the word 'yes' to every word that starts with 'b'.
function operate($string) {
$words = explode(' ', $string);
foreach ($words as $word) {
// substr() returns the first character; strpos() would search for a needle instead
$output[] = (substr($word, 0, 1) == 'b') ? "yes$word" : $word;
}
return implode(' ', $output);
}
The output of this function would be
"string with yesbetween yesbrackets and"
Now I want to insert the temporarily deleted labels back into place:
"string with {LABELS} yesbetween yesbrackets {HERE} and {HERE}"
My question is: how can I accomplish this? Important: I am not able to alter operate(), so the solution should contain a wrapper function around operate() or something. I have been thinking about this for quite a while now, but am confused as to how to do this. Could you help me out?
Edit: it would be too much to put the actual operate() in this post. It will not really add value (except make the post longer). There is not much difference between the output of operate() here and the real one. I will be able to translate any ideas from here, to the real-world situation :-)
The answer to this depends on whether or not you are able to understand operate(), even if you can't change it.
If you have absolutely no insight into operate(), your problem is simply unsolvable: To reinsert your labels you need one of
Their offset or relative position (You can't know them, if you don't know operate())
A marker for their place (You can't have them, if you don't know how operate() will work on them)
If you have at least some insight into operate(), this becomes something between solvable and easy:
If operate($a . $b)==operate($a) . operate($b), then you just split your original input at the labels, run the non-label parts (but not the labels) through operate(), then reassemble.
If operate() is guaranteed to leave a placeholder string alone, and that placeholder is itself guaranteed not to be part of the normal input ("\0" and friends come to mind), then you extract your labels in order, replace them with the placeholder, run the result through operate(), and afterwards replace the placeholders with your saved labels (in order).
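The first option can be sketched as follows. It only applies if operate() really distributes over concatenation, which holds for the example operate() from the question but is an assumption for the real one. The operate() below is a stand-in based on the question's description, and operate_around_labels() is a hypothetical wrapper name.

```php
<?php
// Stand-in for the real operate(): prefixes words starting with 'b' with 'yes'.
function operate($string) {
    $output = array();
    foreach (explode(' ', $string) as $word) {
        $output[] = (substr($word, 0, 1) == 'b') ? "yes$word" : $word;
    }
    return implode(' ', $output);
}

// Hypothetical wrapper: run operate() on the text between labels only.
function operate_around_labels($input) {
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates text, label, text, ...
    $parts = preg_split('/(\{.*?\})/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even indexes are plain text; odd indexes are labels and stay untouched.
            $parts[$i] = operate($part);
        }
    }
    return implode('', $parts);
}

echo operate_around_labels("string with {LABELS} between brackets {HERE} and {HERE}");
// prints: string with {LABELS} yesbetween yesbrackets {HERE} and {HERE}
```

No placeholder is needed here, because the labels never pass through operate() at all.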
Edit
After reading your comments, here are some lines of code
$input = "string with {LABELS} between brackets {HERE} and {HERE}";
//Extract labels and replace with \0
$tmp=preg_split('/(\{.*?\})/',$input,-1,PREG_SPLIT_DELIM_CAPTURE);
$labels=array();
$txt=array();
$islabel=false;
foreach ($tmp as $t) {
if ($islabel) $labels[]=$t;
else $txt[]=$t;
$islabel=!$islabel;
}
$txt=implode("\0",$txt);
//Run through operate()
$txt=operate($txt);
//Reassemble
$txt=explode("\0",$txt);
$result='';
foreach ($txt as $t)
$result.=$t.array_shift($labels);
echo $result;
Here's what I would do as a first attempt. Split your string into single words, then feed them into operate() one by one, depending on whether the word is 'braced' or not.
$input = "string with {LABELS} between brackets {HERE} and {HERE}";
$inputArray = explode(' ',$input);
foreach($inputArray as $key => $value) {
if(!preg_match('/^{.*}$/',$value)) {
$inputArray[$key] = operate($value);
}
}
$output = implode(' ',$inputArray);
This is a problem that I have figured out how to solve, but I want to solve it in a simpler way... I'm trying to improve as a programmer.
Have done my research and have failed to find an elegant solution to the following problem:
I have a hypothetical array of keywords to search for:
$keyword_array = array('he','heather');
and a hypothetical string:
$text = "What did he say to heather?";
And, finally, a hypothetical function:
function bold_keywords($text, $keyword_array)
{
$pattern = array();
$replace = array();
foreach($keyword_array as $keyword)
{
$pattern[] = "/($keyword)/is";
$replace[] = "<b>$1</b>";
}
$text = preg_replace($pattern, $replace, $text);
return $text;
}
The function (not too surprisingly) is returning something like this:
"What did <b>he</b> say to <b>he</b>ather?"
Because it is not recognizing "heather" when there is a bold tag in the middle of it.
What I want the final solution to do is, as simply as possible, return one of the two following strings:
"What did <b>he</b> say to <b>heather</b>?"
"What did <b>he</b> say to <b><b>he</b>ather</b>?"
Some final conditions:
--I would like the final solution to deal with a very large number of possible keywords
--I would like it to deal with the following two situations (lines represent overlapping strings):
One string engulfs the other, like the following two examples:
-- he, heather
-- sanding, and
Or one string does not engulf the other:
-- entrain, training
Possible way to solve:
-A regex that ignores tags in keywords
-Long way (that I am trying to avoid):
*Search string for all occurrences of each keyword, store an array of positions (start and end) of keywords to be bolded
*Process this array recursively to combine overlapping keywords, so there is no redundancy
*Add the bold tags (starting from the end of the string, to avoid the positions of information shifting from the additional characters)
Many thanks in advance!
Example
$keyword_array = array('he','heather');
$text = "What did he say to heather?";
$pattern = array();
$replace = array();
sort($keyword_array, SORT_NUMERIC);
foreach($keyword_array as $keyword)
{
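// \b requires a word boundary on both sides, so 'he' no longer matches inside 'heather'
$pattern[] = '/\b(' . preg_quote($keyword, '/') . ')\b/i';
$replace[] = '<b>$1</b>';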
}
$text = preg_replace($pattern, $replace, $text);
echo $text; // What did <b>he</b> say to <b>heather</b>?
You need to change your regex pattern so that each term you search for must be followed by whitespace or punctuation; that way the pattern does not match a term that is followed by an alphanumeric character.
A simplistic and lazy-ish approach off the top of my head:
Sort your initial array by item length, descending! No more "not recognized because there's already a tag in the middle" issues!
Edit: The nested tags issue is then easily fixed by extending your regex so that >foo and foo< are no longer matched.
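One way to combine the length sort with a single pass is to build one alternation from all keywords, longest first. This is a hedged sketch, not the code of either answer: bold_keywords_single_pass() is a hypothetical name, and for keywords that overlap without one containing the other (entrain, training) only the first match in the text gets bolded.

```php
<?php
// Hypothetical helper: bold all keywords in one pass so tags are never nested.
function bold_keywords_single_pass($text, array $keywords) {
    if (!$keywords) {
        return $text; // an empty alternation would match everywhere
    }
    // Longest keywords first, so 'heather' is tried before 'he' at each position.
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    // Escape regex metacharacters in every keyword.
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keywords);
    $pattern = '/(' . implode('|', $quoted) . ')/i';
    return preg_replace($pattern, '<b>$1</b>', $text);
}

echo bold_keywords_single_pass("What did he say to heather?", array('he', 'heather'));
// prints: What did <b>he</b> say to <b>heather</b>?
```

Because the text is scanned only once, an inserted tag can never get in the way of a later keyword.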
I am relatively new to php, and hope someone can help me with a replace regex, or maybe a match replace I am not exactly sure.
I want to automatically make the second occurrence of a match bold, then make the 4th occurrence italic, and then underline the 7th occurrence.
This is basically for SEO purposes in content.
I have done some replacements before, and was thinking this should do the trick:
preg_replace( pattern, replacement, subject [, limit ])
I already know the values I want to use:
'pattern' - a word that is already defined, like [word].
'replacement' - a variable I am getting from a MySQL db.
'subject' - text from a db.
Lets say I have this content: This explains more or less what I want to do.
This is an example of the text that I want to replace. In this text I want to make the second occurrence of the word example bold. Then I want to skip the next time example occurs in the text, and make the 4th occurrence of the word example italic. Then I want to skip the 5th and 6th occurrences of the word example, and lastly underline the 7th occurrence. In this example I have used a hyperlink as the underline example, as I do not see an underline function in the text editor. The word example may appear more times in the text, but my only requirement is to underline once, make bold once, and make italic once. I may later decide to put quotes around the word "example" as well, but that is not yet a priority.
It is also important for the code not to throw an error if there are not at least 7 occurrences of the word.
How would I do this, any ideas would be appreciated.
You could use preg_split to split the text at the matches, apply the modifications, and then put everything back together:
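// The limit counts only the text pieces, not the captured delimiters,
// so 8 pieces are needed for the 7th match to appear at index 13
$parts = preg_split('/(example)/', $str, 8, PREG_SPLIT_DELIM_CAPTURE);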
if (isset($parts[3])) $parts[3] = '<b>'.$parts[3].'</b>';
if (isset($parts[7])) $parts[7] = '<i>'.$parts[7].'</i>';
if (isset($parts[13])) $parts[13] = '<u>'.$parts[13].'</u>';
$str = implode('', $parts);
The index formula for the i-th match is index = i · 2 - 1.
The regular expression itself cannot count, and the preg_ functions provide little help. You need a workaround. If you were to actually search for just a word, you might want to use string functions. Otherwise try:
// just counting
if (7 >= preg_match_all($pattern, $subject, $matches)) {
$cb_num = 0;
$subject = preg_replace_callback($pattern, "cb_ibu", $subject);
}
function cb_ibu($match) {
global $cb_num;
$match = $match[0];
switch (++$cb_num) {
case 2: return "<b>$match</b>";
case 4: return "<i>$match</i>";
case 7: return "<u>$match</u>";
default: return $match;
}
}
The trick is to have a callback which does the accounting. And there it's quite easy to add any rules.
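The same accounting can be done without a global variable by using a closure that captures the counter by reference. This is a sketch assuming PHP 7 or later; format_occurrences() and its occurrence-to-tag map are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical helper: wrap selected occurrences of a match in the given tags.
// $tagsByOccurrence maps the 1-based occurrence number to a tag name.
function format_occurrences($pattern, $subject, array $tagsByOccurrence) {
    $count = 0;
    return preg_replace_callback(
        $pattern,
        function ($m) use (&$count, $tagsByOccurrence) {
            $count++; // the callback runs once per match, in order
            if (isset($tagsByOccurrence[$count])) {
                $tag = $tagsByOccurrence[$count];
                return "<$tag>{$m[0]}</$tag>";
            }
            return $m[0]; // leave every other occurrence untouched
        },
        $subject
    );
}

echo format_occurrences('/example/', 'example example example example', array(2 => 'b', 4 => 'i'));
// prints: example <b>example</b> example <i>example</i>
```

If the text has fewer matches than the highest number in the map, the missing occurrences are simply never reached, so no error is raised.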
That's an interesting question. My implementation would be:
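function replace_exact($word, $tag, $string, $limit) {
// str_replace() has no limit argument (its 4th parameter only returns a count),
// so locate the $limit-th occurrence with strpos() instead
$offset = 0;
for ($i = 0; $i < $limit; $i++) {
$pos = strpos($string, $word, $offset);
if ($pos === false) return $string; // fewer than $limit occurrences
$offset = $pos + strlen($word);
}
// Wrap only that occurrence in the tag
return substr_replace($string, '<'.$tag.'>'.$word.'</'.$tag.'>', $pos, strlen($word));
}
Use it like this:
$source_text = replace_exact('Example', 'b', $source_text, 2);
$source_text = replace_exact('Example', 'i', $source_text, 4);
echo $source_text;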
I haven't benchmarked this, but I would expect plain string functions to be at least as fast as preg_replace here.