I am being "thick this morning", so please excuse this simple question. I have an array of keywords, e.g. array('keyword1', 'keyword2', ...), and a string of text (about the length of a blog post, i.e. 200-800 words rather than just a few). What is the best way to search the string for the keywords and replace each one with an href link? So in the text, 'keyword1' (as plain text) will become <a href='apage'>keyword1</a>, and so on.
See, I said I was being thick this morning.
Thanks in advance.
Typical preg_replace case:
$text = "There is some text keyword1 and lorem ipsum keyword2.";
$keywords = array('keyword1', 'keyword2');
$regex = '/('.implode('|', $keywords).')/i';
// You might want to make the replacement string more dependent on the
// keyword matched, but you'll have to tell us more about it
$output = preg_replace($regex, '<a href="apage">\\1</a>', $text);
print_r($output);
Now the above doesn't do a very "smart" replace in the sense that the href is not a function of the matched keyword, while in practice you will probably want to do that. Look into preg_replace_callback for more flexibility here, or edit the question and provide more information regarding your goal.
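As a rough sketch of the preg_replace_callback route mentioned above: the function name link_keywords_cb, the keyword-to-href map, and the page1.html / page2.html targets are all made up for illustration, since the question does not say where each keyword should point.

```php
<?php
// Hypothetical helper: $hrefs maps keyword => target URL (both assumed).
function link_keywords_cb(string $text, array $hrefs): string
{
    // Lower-case the keys so the lookup works for case-insensitive matches.
    $lookup = array_change_key_case($hrefs, CASE_LOWER);

    // Escape each keyword so it is treated literally inside the pattern.
    $quoted = array_map(function ($keyword) {
        return preg_quote((string) $keyword, '/');
    }, array_keys($lookup));

    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    return preg_replace_callback($pattern, function ($m) use ($lookup) {
        // Pick the href that belongs to the keyword that actually matched.
        $href = $lookup[strtolower($m[1])];
        return '<a href="' . htmlspecialchars($href) . '">' . $m[1] . '</a>';
    }, $text);
}

$output = link_keywords_cb(
    'There is some text keyword1 and lorem ipsum Keyword2.',
    array('keyword1' => 'page1.html', 'keyword2' => 'page2.html')
);
echo $output;
```

The matched text is reused as the link text, so the capitalization found in the source is preserved.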
WHY would you use regex instead of just str_replace() !? Regex works, but it over complicates such an incredibly simple question.
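For comparison, a minimal str_replace() sketch of what this comment suggests, reusing the 'apage' target from the question. Note the trade-offs: str_replace() is case-sensitive, it also matches inside longer words, and it processes the keywords one after another, so it is only a safe choice when none of that matters for your text.

```php
<?php
$text = "There is some text keyword1 and lorem ipsum keyword2.";
$keywords = array('keyword1', 'keyword2');

// Build one replacement link per keyword, in the same order as $keywords.
$links = array();
foreach ($keywords as $keyword) {
    $links[] = '<a href="apage">' . $keyword . '</a>';
}

// Each keyword is replaced by the link at the same array position.
$output = str_replace($keywords, $links, $text);
echo $output;
```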
Related
I would like to extract the last 4 words from a string as a single chunk.
For example, if I have:
"Are You Looking For The Best Website Review"
I'd like to catch the last four words as:
"The Best Website Review"
I only have basic coding knowledge and have tried every variation I could find within this forum.
The closest I've come is by using the suggestion of Rick Kuipers (How to obtain the last word of a string) but this gives me the words as individual values.
$split = explode(" ", $str);
echo $split[count($split)-4];
echo $split[count($split)-3];
echo $split[count($split)-2];
echo $split[count($split)-1];
My coding knowledge is limited so any suggestions would be appreciated.
Simply use str_word_count along with array_splice, like so:
$str = "Are You Looking For The Best Website Review";
$arr = str_word_count($str,1);
echo implode(' ',array_splice($arr,-4));
Edited
If your text contains hyphens, as in Are-You-Looking-For-The-Best-Website-Review, then you can use str_replace like so:
$arr = str_word_count(str_replace('-',' ',$str),1);
echo implode(' ',array_splice($arr,-4));
You can use a simple regular expression for this:
<?php
$subject = "Are You Looking For The Best Website Review";
$pattern = '/(\w+\s+\w+\s+\w+\s+\w+)\s*$/u';
preg_match($pattern, $subject, $tokens);
var_dump($tokens[1]); // group 1 excludes any trailing whitespace matched by \s*
The output is:
string(23) "The Best Website Review"
This also has the advantage that the type and amount of whitespace between words is irrelevant, and trailing whitespace at the end of the string is ignored.
Edit: considering your comment on the answer posted by @Uchiha, this is a variant of the pattern that shows how you can easily match words delimited by characters other than whitespace:
$pattern = '/(\w+[-\s]+\w+[-\s]+\w+[-\s]+\w+)\s*$/u';
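If the number of words needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: last_words is a made-up name, and treating both whitespace and hyphens as separators is an assumption carried over from the pattern above.

```php
<?php
// Hypothetical helper: return the last $count words of $text as one string.
function last_words(string $text, int $count): string
{
    if ($count <= 0) {
        return '';
    }
    // Split on runs of whitespace and/or hyphens, dropping empty pieces.
    $words = preg_split('/[-\s]+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);

    // A negative offset counts from the end; if there are fewer words
    // than requested, array_slice() simply returns all of them.
    return implode(' ', array_slice($words, -$count));
}

echo last_words("Are You Looking For The Best Website Review", 4);
```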
I can't tell you guys how many hours I've spent on this one. I simply want to IGNORE any instances of keywords that are BETWEEN the strong tags. Whether they are directly next to the tags or somewhere in between with other words. All while keeping the keywords case-insensitive.
Example:
The man drove in his car. Then <strong>the man walked to the boat.</strong>
The word boat should be ignored and Car should be replaced.
$keywords = array(
'boat',
'car',
);
$p = implode('|', array_map('preg_quote', $keywords));
$string = preg_replace("/\b($p)\b/i", 'gokart', $string, 4);
You can use a SKIP-FAIL regex to replace only what is clearly outside the non-identical delimiters:
<strong>.*?<\/strong>(*SKIP)(*FAIL)|\b(boat|car)\b
Here is an IDEONE demo:
$str = "The man drove in his car.Then <strong>the man walked to the boat.</strong>";
$keywords = array('boat','car');
$p = implode('|', array_map('preg_quote', $keywords));
$result = preg_replace("#<strong>.*?<\/strong>(*SKIP)(*FAIL)|\b($p)\b#i", "gokart", $str);
echo $result;
NOTE that in this case, we most probably are not interested in a tempered greedy token solution inside the SKIP-FAIL block (that I posted initially, see revision history) since we do not care what is in between the delimiters.
Regex is probably not the best way to do something like this.
It would probably be best to use a DOM parser or something similar to properly find the <strong> tags.
A few of the answers here offer some good options: RegEx: Matching text that is not inside and part of a HTML tag
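A minimal sketch of the DOM route, assuming PHP's DOM extension is available and the input is a small ASCII HTML fragment (loadHTML() needs extra care with other encodings). The function name replace_outside_strong and the wrapper div with id "fragment" are invented for this example.

```php
<?php
// Hypothetical helper: replace keywords in text that is not inside <strong>.
function replace_outside_strong(string $html, array $keywords, string $replacement): string
{
    $doc = new DOMDocument();

    // Wrap the fragment in a known container and silence parser warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div id="fragment">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $quoted = array_map(function ($keyword) {
        return preg_quote($keyword, '/');
    }, $keywords);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    // Select only text nodes that have no <strong> ancestor.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()[not(ancestor::strong)]') as $node) {
        $node->nodeValue = preg_replace($pattern, $replacement, $node->nodeValue);
    }

    // Serialize the children of the wrapper to get the fragment back.
    $out = '';
    $wrapper = $xpath->query('//div[@id="fragment"]')->item(0);
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = replace_outside_strong(
    'The man drove in his Car. Then <strong>the man walked to the boat.</strong>',
    array('boat', 'car'),
    'gokart'
);
echo $result;
```

Because the replacement is a plain-text node value here, this sketch suits text-for-text swaps; inserting new markup would need createElement() calls instead.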
Currently I am replacing all the quotes inside a text with special quotes.
But how can I change my regex so that only quotes inside the text are replaced, and not the ones used in HTML tags?
$text = preg_replace('/"(?=\w)/', "»", $text);
$text = preg_replace('/(?<=\w)"/', "«", $text);
I am not that good with regular expressions. The problem is that I need to replace the opening quotes with a different symbol than the closing quotes.
If you do need more information, say so.
Any help is appreciated!
EDIT
Test Case
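<p>This is a "wonderful long text". At least it should be. Here we have a <a href="http://wwww.site-to-nowhere.com" target="_blank">link</a>.</p>
The expected output should be:
<p>This is a »wonderful long text«. At least it should be. Here we have a <a href="http://wwww.site-to-nowhere.com" target="_blank">link</a>.</p>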
Right now it is like this:
<p>This is a »wonderful long text«. At least it should be. Here we have a <a href=»http://wwww.site-to-nowhere.com« target=»_blank«>link</a>.</p>
EDIT 2
Thanks to Kamehameha's answer, I've added the following code to my script:
$text = preg_replace("/\"([^<>]*?)\"(?=[^>]+?<)/", "»\1«", $text);
What worked great in the regex tester does not replace anything. Did I do anything wrong?
This regex works for the given strings.
Search for - "([^<>]*?)"(?=[^>]*?<)
Replace with - »\1«
Testing it -
INPUT -
<p>This is a "wonderful long text". "Another wonderful ong text" At least it should be. Here we have a link.</p>
OUTPUT -
<p>This is a »wonderful long text«. »Another wonderful ong text« At least it should be. Here we have a link.</p>
EDIT 1-
Executing this in PHP -
$str = '<p>This is a "wonderful long text". "Another wonderful ong text" At least it should be. Here we have a link.</p>';
var_dump(preg_replace('/"([^<>]*?)"(?=[^>]*?<)/', '»\1«', $str));
Its output -
/** OUTPUT **/
string '<p>This is a »wonderful long text«. »Another wonderful ong text« At least it should be. Here we have a link.</p>' (length=196)
EDIT 2-
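You have called the preg_replace function correctly, but in the replacement string you have used \1 inside double quotes (""). PHP itself interprets "\1" as an octal escape sequence, so the string ends up containing a single byte with value 1 instead of a backslash followed by 1, and preg_replace never sees a back-reference.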
To make it more clear, try this and see what happens -
echo '»\1«';
echo "»\1«";
The second \1 should not be visible.
So the solution would be one of these -
preg_replace('/"([^<>]*?)"(?=[^>]*?<)/', '»\1«', $str)
preg_replace("/\"([^<>]*?)\"(?=[^>]*?<)/", "»\\1«", $str)
preg_replace("/\"([^<>]*?)\"(?=[^>]*?<)/", "»$1«", $str)
Read the Replacement section in this page for more clarity.
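The difference between the quoting styles can be checked directly. The sketch below uses a shortened version of the test string (an assumption, to keep it brief) and compares what PHP hands to preg_replace in each case.

```php
<?php
// How PHP parses each replacement string before preg_replace() sees it:
$single  = '»\1«';   // backslash + digit survive: a back-reference
$double  = "»\1«";   // "\1" is an octal escape: one byte with value 1
$escaped = "»\\1«";  // escaped backslash: same content as $single
$dollar  = "»$1«";   // $1 is not a valid variable name, so it stays literal

$str = '<p>This is a "wonderful long text".</p>';
$pattern = '/"([^<>]*?)"(?=[^>]*?<)/';

$good = preg_replace($pattern, $single, $str);  // back-reference is expanded
$bad  = preg_replace($pattern, $double, $str);  // control byte is inserted

echo $good, "\n";
echo bin2hex($double), "\n"; // shows the 01 byte between the two quote marks
```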
EDIT 3-
A regex that covers text which might not be enclosed within tags-
\"([^<>]*?)\"(?=(?:[^>]*?(?:<|$)))
Could also use a negative lookahead:
(?![^<]*>)"([^"]+)"
Replace with: »\1«
For the record, there is a simple PHP solution that was not mentioned and that efficiently skips over all the <a...</a> tags.
Search: <a.*?<\/a>(*SKIP)(*F)|"([^"]*)"
Replace: »\1«
In the Demo, look at the Substitutions at the bottom.
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
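In PHP, the search and replace strings above could be applied roughly as follows. The function name convert_quotes_skip_links is made up, the ~ delimiter is chosen only to avoid escaping the slash, and the sample input is the question's test case.

```php
<?php
// Hypothetical wrapper around the SKIP/FAIL pattern shown above.
function convert_quotes_skip_links(string $html): string
{
    // Left branch: consume a whole <a...</a> element, then (*SKIP)(*F)
    // discards it and resumes after it. Right branch: a quoted run of text.
    $pattern = '~<a.*?</a>(*SKIP)(*F)|"([^"]*)"~s';
    return preg_replace($pattern, '»$1«', $html);
}

$input = '<p>This is a "wonderful long text". Here we have a '
       . '<a href="http://wwww.site-to-nowhere.com" target="_blank">link</a>.</p>';
$converted = convert_quotes_skip_links($input);
echo $converted;
```

Quotes inside other tags (for example a class attribute on the <p>) are not protected by this pattern; it only skips link elements.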
Use this regex:
(?<=^|>)[^><]+?(?=<|$)
This will match non html strings.
And then do your regex on the resultant string
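A sketch of that two-step idea using preg_replace_callback: the outer pattern is the one above, the inner quote pattern and the function name replace_quotes_in_text are assumptions for illustration.

```php
<?php
// Hypothetical helper: run the quote replacement only on non-tag text.
function replace_quotes_in_text(string $html): string
{
    // Step 1: find each run of text that sits between tags.
    return preg_replace_callback('/(?<=^|>)[^><]+?(?=<|$)/', function ($m) {
        // Step 2: convert quote pairs inside that run only.
        return preg_replace('/"([^"]*)"/', '»$1«', $m[0]);
    }, $html);
}

$input = '<p>This is a "wonderful long text". Here we have a '
       . '<a href="http://example.com" target="_blank">link</a>.</p>';
$fixed = replace_quotes_in_text($input);
echo $fixed;
```

A quote pair that is split across a tag boundary would not be converted, because each text run is processed on its own.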
I am writing a PHP function which is supposed to convert certain keywords into links. It uses Cyrillic words in UTF-8.
So I came up with this:
function keywords($text){
$keywords = Db::get('keywords'); //array with words and corresponding links
foreach ($keywords as $value){
$keyword = $value['keyword'];
$link = $value['link'];
$text = preg_replace('/(?<!\pL)('.$keyword.')(?!\pL)/iu', '<a href="'.$link.'">$1</a>', $text);
}
return $text;
}
So far this runs like a charm, but now I want to replace phrases with links - phrases that may contain other keywords. For example I want the word "car" to link to one place, and "blue car" to other.
Any ideas?
As written in the comment, I'm posting this as an answer, hoping it is useful to you.
You could first replace each keyword in the text with a placeholder and then, once the entire text has been parsed, substitute the real links for those placeholders.
For example, take the phrase:
"I have a car, a blue car."
We already ordered the keyword list from longest to shortest, so we check "blue car" first. We find it in the text, so we put in the placeholder and obtain:
"I have a car, a [[1]]."
The second keyword in the list is "car"; after substitution in the text, we obtain:
"I have a [[2]], a [[1]]."
Finally, when all keywords have been substituted, you only have to replace the placeholders in order, using the preg_replace in your function, to get the text with links.
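The steps above could be sketched like this. The function name link_phrases, the keyword => URL array shape, and the [[n]] token format are assumptions; the sketch also uses one token per match rather than per keyword so the original capitalization is kept, and it assumes no keyword consists only of digits or brackets (which could otherwise match inside a token).

```php
<?php
// Hypothetical helper: $links maps keyword or phrase => target URL.
function link_phrases(string $text, array $links): string
{
    // Longest keywords first, so "blue car" is handled before "car".
    uksort($links, function ($a, $b) {
        return strlen((string) $b) <=> strlen((string) $a);
    });

    $replacements = array(); // token => final link markup
    foreach ($links as $keyword => $href) {
        $pattern = '/(?<!\pL)' . preg_quote((string) $keyword, '/') . '(?!\pL)/iu';
        $text = preg_replace_callback($pattern, function ($m) use ($href, &$replacements) {
            // Park the finished link under a token that no keyword can match.
            $token = '[[' . count($replacements) . ']]';
            $replacements[$token] = '<a href="' . htmlspecialchars($href) . '">' . $m[0] . '</a>';
            return $token;
        }, $text);
    }

    // Swap every token for its link in a single pass.
    return $replacements ? strtr($text, $replacements) : $text;
}

$linked = link_phrases(
    'I have a car, a blue car.',
    array('car' => '/car', 'blue car' => '/blue-car')
);
echo $linked;
```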
I am trying to convert specific keywords in text, which are stored in array, to the links.
Example text:
$text='This text contains many keywords, but also formated keywords.'
So now I want to convert the word keywords to the #keywords.
I used the very simple preg_replace function
preg_replace('/keywords/i',' keywords ',$text);
but obviously it converts to link also the string already formatted as a link, so I get a messy html like:
$text='This text contains many keywords, but also formated keywords" title="keywords">keywords</a>.'
Expected result:
$text='This text contains many keywords, but also formated keywords.'
Any suggestions?
THX
EDIT
We are one step from the perfect function, but still not working well in this case:
$text='This text contains many keywords, but also formated
keywords.'
In this case it replaces also the word keywords in the href, so we again get the messy code like
keywords.com/keywords" title="keywords">keywords</a>
I'm not great with regular expressions, but maybe this one will work:
/[^#>"]keywords/i
What I think it will do is ignore any instances of #keywords, >keywords, and "keywords and find the rest.
EDIT:
After testing it out, it looks like that replaces the space before the word as well, and doesn't work if keywords is the beginning of the string. It also didn't preserve original capitalization. I have tested this one, and it works perfectly for me:
$string = "Keywords and keywords, plus some more keywords with the original keywords.";
$string = preg_replace("/(?<![#>\"])keywords/i", "$0", $string);
echo $string;
The first three are replaced, preserving the original capitalization, and the last one is left untouched. This one uses a negative lookbehind and backreferences.
EDIT 2:
OP edited question. With the new example provided, the following regex will work:
$string = 'This text contains many keywords, but also formated keywords.';
$string = preg_replace("/(?<![#>\".\/])keywords/i", "$0", $string);
echo $string;
// outputs: This text contains many keywords, but also formated keywords.
This will replace all instances of keywords that are not preceded by #, >, ", ., or /.
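To see the lookbehind at work on text that already contains a link, here is a small sketch. The input string and the '#keyword' link format are assumptions, since the original example markup is not shown here.

```php
<?php
// Assumed input: two plain occurrences and one that is already a link.
$string = 'Keywords and keywords, plus the original <a href="#keywords">keywords</a>.';

// (?<![#>"]) rejects matches directly preceded by #, > or ",
// which is how the occurrences inside the existing link are skipped.
$string = preg_replace('/(?<![#>"])keywords/i', '<a href="#$0">$0</a>', $string);
echo $string;
```

The lookbehind only inspects the single character before the match, so a keyword elsewhere in an attribute (for instance after a space in a title) would still be replaced.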
Here is the problem:
The keyword could be inside the href, the title, or the text of the link, and anywhere in there (like if the keyword was sanity and you already had href="insanity"). Or even worse, you could have a non-keyword link that happens to contain a keyword, something like:
Click here to find more keywords and such!
In the above example, even though it fits every other possible criteria (it's got spaces before and after being the easiest one to test for), it still would result in a link within a link, which I think breaks the internet.
Because of this, you need to use lookaheads and lookbehinds to check if the keyword is wrapped in a link. But there is one catch: lookbehinds have traditionally had to be of fixed length in PCRE (meaning no open-ended wildcards such as .* inside them).
I thought I'd be the hero and show you the easy fix for your issue, which would be something to the effect of:
'/(?<!\<a.?>)(list|of|keywords)(?!\<\/a>)/'
Except you can't do that because the lookbehind in this case has that wildcard. Without it, you end up with a super greedy expression.
So my proposed alternative is to use regex to find all link elements, then str_replace to swap them out with placeholders, and then restore the original links from the placeholders at the end.
Here's how I did it:
$text='This text contains many keywords, but also formated keywords.';
$keywords = array('text', 'formatted', 'keywords');
// This is just to make the regex easier: a non-capturing group of escaped keywords
$keyword_list_pattern = '(?:' . implode('|', array_map('preg_quote', $keywords)) . ')';
// First, get all link elements that contain a keyword (never crossing a closing </a>)
preg_match_all('/<a\b(?:(?!<\/a>).)*?' . $keyword_list_pattern . '.*?<\/a>/is', $text, $links);
$links = array_unique($links[0]); // Cleaning up array for next step.
// Second, swap out all matches with a placeholder, and build restore array:
$restore_links = array();
foreach($links as $count => $link) {
$link_key = "xxx_{$count}_xxx";
$restore_links[$link_key] = $link;
$text = str_replace($link, $link_key, $text);
}
// Third, we build a nice replacement array for the keywords:
$keyword_links = array();
foreach($keywords as $keyword) {
$keyword_links[$keyword] = "<a href='#$keyword'>$keyword</a>";
}
// Merge the restore links to the bottom of the keyword links for one mass replacement:
$keyword_links = array_merge($keyword_links, $restore_links);
// strtr() works in a single pass, so text it has just inserted is not rescanned
$text = strtr($text, $keyword_links);
echo $text;
You can change your RegEx so that it only targets keywords with a space in front, since the formatted keywords do not contain a space. Here is an example.
$text = preg_replace('/ keywords/i',' keywords',$text);