Preg_replace cyrillic word or phrase - not both - php

I am writing a PHP function which is supposed to convert certain keywords into links. It uses Cyrillic words in UTF-8.
So I came up with this:
function keywords($text){
    $keywords = Db::get('keywords'); // array of keywords and their corresponding links
    foreach ($keywords as $value){
        $keyword = $value['keyword'];
        $link = $value['link'];
        $text = preg_replace('/(?<!\pL)('.$keyword.')(?!\pL)/iu', '<a href="'.$link.'">$1</a>', $text);
    }
    return $text;
}
So far this runs like a charm, but now I want to replace phrases with links - phrases that may contain other keywords. For example, I want the word "car" to link to one place and "blue car" to another.
Any ideas?

As mentioned in the comment, I am posting this as an answer, hoping it is useful to you.
You could first replace each keyword in the text with a placeholder and then, once the entire text has been parsed, substitute the real linked words for the placeholders.
For example, take the phrase:
"I have a car, a blue car."
Assuming the keyword list is already sorted from longest to shortest, we check "blue car" first. We find it in the text, so we put in a placeholder and obtain:
"I have a car, a [[1]]."
The second keyword in the list is "car"; after substitution in the text, we obtain:
"I have a [[2]], a [[1]]."
Finally, when all keywords have been substituted, you only have to replace the placeholders in their order using the preg_replace in your function, and get the text with links.
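The placeholder steps above could be sketched roughly as follows. This is only an illustration under assumptions: the function name link_keywords and the keyword => link array are made up here and stand in for the rows returned by Db::get('keywords'), and keywords are assumed to consist of letters and spaces, so they can never match inside a [[n]] placeholder.

```php
<?php
// Hypothetical helper: $links maps keyword => URL (stand-in for the Db::get('keywords') rows).
function link_keywords(string $text, array $links): string
{
    // Longest keywords first, so "blue car" is handled before "car".
    uksort($links, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $restore = []; // placeholder => final link markup
    $i = 0;
    foreach ($links as $keyword => $href) {
        $pattern = '/(?<!\pL)' . preg_quote($keyword, '/') . '(?!\pL)/iu';
        $text = preg_replace_callback($pattern, function ($m) use (&$restore, &$i, $href) {
            // One placeholder per match, so the original capitalization is kept.
            $token = '[[' . (++$i) . ']]';
            $restore[$token] = '<a href="' . htmlspecialchars($href) . '">' . $m[0] . '</a>';
            return $token;
        }, $text);
    }
    // Swap every placeholder for its link in a single pass.
    return strtr($text, $restore);
}

echo link_keywords('I have a car, a blue car.', ['car' => '/car', 'blue car' => '/blue-car']);
// I have a <a href="/car">car</a>, a <a href="/blue-car">blue car</a>.
```

Because "blue car" is already hidden behind a placeholder when "car" is processed, the shorter keyword cannot touch the longer phrase.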


Is there a way to trim a word from a string if it is a duplicate word and not trim when there is no duplicate word?

I need a combination of if statements in which a specific word is occurring twice, then only the first one of these instances needs to be removed.
In my code $heading_title is a string to display as a product name.
Now in two cases the heading title uses the same word twice.
heading title = Same Same words used
or
heading title = Words Words are the same
or
heading title = correct words used
Now in the case of Same and Words I want the string to be trimmed so it will display like:
Same words used and Words are the same
the last heading title is ok. Is there a way to accomplish that?
I tried some answers about trim here on Stack, but I cannot get it to work, so only the initial code is pasted.
<h1 class="heading-title" itemprop="name"><?php echo $heading_title; ?></h1>
The result is that for most products it is OK, but in those two cases where the first word is repeated, it doesn't look nice for a product name.
I would suggest solving this with a regex. As I understand your question, you want to remove a duplicated word from the start of the string. A pattern like this would help:
^(\w+)\s(\1)(?=\s)
Code Sample:
$re = '/^(\w+)\s(\1)(?=\s)/m';
$str = 'Words Words are the same
correct words used
Same words used and Words are the same
Same words used and Words are the same Same';
$subst = '$1';
$result = preg_replace($re, $subst, $str);
echo $result;
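Applied to a single heading, the same idea might be wrapped in a small helper. This is a sketch only: the function name drop_leading_duplicate is an assumption, and the pattern is a slight variation on the one above that also accepts a title consisting of nothing but the repeated word.

```php
<?php
// Hypothetical helper: drops the first of two identical leading words.
function drop_leading_duplicate(string $title): string
{
    // \1 must repeat the first word exactly; the lookahead requires
    // whitespace or the end of the string right after the repeat.
    return preg_replace('/^(\w+)\s+\1(?=\s|$)/u', '$1', $title);
}

foreach (['Same Same words used', 'Words Words are the same', 'correct words used'] as $title) {
    echo drop_leading_duplicate($title), "\n";
}
// Same words used
// Words are the same
// correct words used
```

In the template, the heading could then be printed with <?php echo drop_leading_duplicate($heading_title); ?>.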

Modify regex in a script that highlights certain words in a text by adding span classes to them

I am working on a code that highlights certain words, using a regex.
Here it is:
function addRegEx($word){
return "/\b(\w+)?".$word."(\w+)?\b/i";
}
function highlight($word){
return "<span class=\"highlighted\">".$word[0]."</span>";
}
function customHighlights($searchString,$toHighlight){
$searchFor = array_map('addRegEx',$toHighlight);
$result = preg_replace_callback($searchFor,'highlight',$searchString);
return $result;
}
Let's say I use the function customHighlights to search for the word "car" in a certain text:
Using the boundary - \b - method, the script searches for the word car in the text.
In the regex, I have added (\w+)? in front and after the word, so the script would match words that contain "car" - cars, sportcars, etc...
The problem is, it messes up the inner html, for example:
This is a great car. Click here for more
The script will match the word car in the url of the link above, adding span classes to it and messing up the html.
How would you modify the regex and avoid this?
Use a regular expression that looks for the word after the last > (or the beginning of the text), and require that the part between that point and the word contains no tag start <.
Code
<?php
$str = 'This is a great car. Click here for more cars';
$word = 'car';
$exp = "/((^|>)[^<]*)(\b(\w+)?".$word."(\w+)?\b)/i";
$repl = "\\1<span class=\"highlighted\">\\3</span>";
var_dump(preg_replace($exp, $repl, $str));
?>
Output
string(141) "This is a great <span class="highlighted">car</span>. Click here for more <span class="highlighted">cars</span>"
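Note that the pattern above can highlight only one word per stretch of text between tags, because the greedy [^<]* runs to the last candidate in that stretch. A different approach, sketched here under assumptions, is to split the HTML on tags and run the original highlight regex only on the text pieces. The helper name highlight_outside_tags and the sample markup are made up for illustration, and the split assumes that attribute values never contain a literal >.

```php
<?php
// Hypothetical helper: highlights $word only in text between tags.
function highlight_outside_tags(string $html, string $word): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates text, tag, text, tag, ...
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/\b\w*' . preg_quote($word, '/') . '\w*\b/i';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) { // even indexes are text, odd indexes are tags
            $parts[$i] = preg_replace($pattern, '<span class="highlighted">$0</span>', $part);
        }
    }
    return implode('', $parts);
}

echo highlight_outside_tags('A great car. <a href="/car/">More cars</a>', 'car');
// A great <span class="highlighted">car</span>. <a href="/car/">More <span class="highlighted">cars</span></a>
```

The "car" inside the href is left alone because it sits in a tag piece, which is never passed to preg_replace.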
Have you considered doing the highlighting on the client side with JavaScript? jQuery or similar could allow you to iterate over nodes to find where to highlight instead of working with the raw HTML.
I can't help you much with the regular expression though.

PHP regex to match expressions not containing a pattern

I have the following text:
"This is a test text. Test, comma instead of space."
I iterate through each word and want to replace each word with a distinct link. Let's say
wordToReplace
My problem is that consecutive matches of the word "test" (to use the above example) replace the href and anchor text so I'm left with links inside links which is not good at all.
(to give a base idea of my problem this is what I'm left with. It has some additional markup.)
This is a<a href="/index.php?r<a href="/index.php?r=texts/addWord"
class="wordLink info label" id="yt2">text</a>/addWord" class="wordLink
info label" id="yt1">test</a>text.<a href="/index.php?r=texts/addWord"
class="wordLink info label" id="yt3">Test</a><a
href="/index.php?r=texts/addWord" class="wordLink info label"
id="yt4">comma</a>instead of<a href="/index.php?r=texts/addWord"
class="wordLink info label" id="yt6">space</a>
I'm trying
preg_replace("/[^\">]".$word."[^\"<]/", $link, $text->text);
but I don't think I'm on the right track at all.
Thanks for the time
From your one line of code, I'm assuming that you have that line in a loop which iterates through all the $words you want to replace, which causes the problem.
What you need to do is put all those replacements into only one preg_replace call. For that, regexes provide alternatives. So say, your list of words consisted of test, text and this. Then you could do:
preg_replace('/(test|text|this)/', $link, $text->text);
And if you have all your words in an array $words then you can generate the regex simply with:
$wordList = implode('|', $words);
preg_replace('/('.$wordList.')/', $link, $text->text);
You might want to add an i at the very end of your regex if your checks are supposed to be case-insensitive.
In case some of your words are parts of other words (e.g. you want to replace text and texture), you could check for word boundaries:
preg_replace('/\b('.$wordList.')\b/', $link, $text->text);
Is this what you are looking for?
EDIT: If your $link was pre-generated for every $word in your loop before, you can now replace that word with $1. If your $link was something like
$link = ''.$word.'';
you can now simply use
$link = '$1';
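Put together, the single-pass replacement might look like the sketch below. The word list is the one from the example above, and the anchor markup is an assumption modeled on the href and class visible in the question's broken output. Each word is passed through preg_quote in case it contains regex metacharacters.

```php
<?php
// Hypothetical word list; the link markup is an assumption based on the question's output.
$words = ['test', 'text', 'this'];
$text  = 'This is a test text. Test, comma instead of space.';

// Escape each word, then join them into one alternation.
$wordList = implode('|', array_map(function ($w) {
    return preg_quote($w, '/');
}, $words));

// One pass over the text: links that were just inserted are never rescanned.
$linked = preg_replace(
    '/\b(' . $wordList . ')\b/i',
    '<a href="/index.php?r=texts/addWord" class="wordLink">$1</a>',
    $text
);
echo $linked;
```

Since preg_replace continues scanning after each replaced match rather than inside it, the "links inside links" problem from the loop version cannot occur here.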

Automatically convert keywords to links in php

I am trying to convert specific keywords in text, which are stored in array, to the links.
Example text:
$text='This text contains many keywords, but also formated keywords.'
So now I want to convert the word keywords to the #keywords.
I used the very simple preg_replace function
preg_replace('/keywords/i',' keywords ',$text);
but obviously it converts to link also the string already formatted as a link, so I get a messy html like:
$text='This text contains many keywords, but also formated keywords" title="keywords">keywords</a>.'
Expected result:
$text='This text contains many keywords, but also formated keywords.'
Any suggestions?
THX
EDIT
We are one step from the perfect function, but still not working well in this case:
$text='This text contains many keywords, but also formated
keywords.'
In this case it replaces also the word keywords in the href, so we again get the messy code like
keywords.com/keywords" title="keywords">keywords</a>
I'm not great with regular expressions, but maybe this one will work:
/[^#>"]keywords/i
What I think it will do is ignore any instances of #keywords, >keywords, and "keywords and find the rest.
EDIT:
After testing it out, it looks like that replaces the space before the word as well, and doesn't work if keywords is the beginning of the string. It also didn't preserve original capitalization. I have tested this one, and it works perfectly for me:
$string = "Keywords and keywords, plus some more keywords with the original keywords.";
$string = preg_replace("/(?<![#>\"])keywords/i", "$0", $string);
echo $string;
The first three are replaced, preserving the original capitalization, and the last one is left untouched. This one uses a negative lookbehind and backreferences.
EDIT 2:
OP edited question. With the new example provided, the following regex will work:
$string = 'This text contains many keywords, but also formated keywords.';
$string = preg_replace("/(?<![#>\".\/])keywords/i", "$0", $string);
echo $string;
// outputs: This text contains many keywords, but also formated keywords.
This will replace all instances of keywords that are not preceded by #, >, ", ., or /.
Here is the problem:
The keyword could be inside the href, the title, or the text of the link, and anywhere in there (like if the keyword was sanity and you already had href="insanity"). Or even worse, you could have a non-keyword link that happens to contain a keyword, something like:
Click here to find more keywords and such!
In the above example, even though it fits every other possible criteria (it's got spaces before and after being the easiest one to test for), it still would result in a link within a link, which I think breaks the internet.
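Because of this, you need to use lookaheads and lookbehinds to check if the keyword is wrapped in a link. But there is one catch: lookbehinds cannot contain unbounded quantifiers such as * or + (PCRE has traditionally required each lookbehind alternative to match a fixed length), so wildcards are out.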
I thought I'd be the hero and show you the easy fix for your issue, which would be something to the effect of:
'/(?<!\<a.?>)[list|of|keywords](?!\<\/a>)/'
Except you can't do that because the lookbehind in this case has that wildcard. Without it, you end up with a super greedy expression.
So my proposed alternative is to use regex to find all link elements, then str_replace to swap them out with placeholders, and then restore the original links from the placeholders at the end.
Here's how I did it:
$text='This text contains many keywords, but also formated keywords.';
$keywords = array('text', 'formatted', 'keywords');
//This is just to make the regex easier
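// Escape the keywords and join them into a group. Square brackets would create a
// character class, and implode() expects the separator as its first argument.
$keyword_list_pattern = '(?:' . implode('|', array_map('preg_quote', $keywords)) . ')';
// First, get all link elements that contain a keyword. The negative lookahead keeps
// a match from running past the end of the link it started in.
preg_match_all('/<a\b(?:(?!<\/a>).)*?' . $keyword_list_pattern . '.*?<\/a>/is', $text, $links);
$links = array_unique($links[0]); // Cleaning up array for next step.
// Second, swap out all matches with a placeholder, and build restore array:
$restore_links = array();
foreach($links as $count => $link) {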
$link_key = "xxx_{$count}_xxx";
$restore_links[$link_key] = $link;
$text = str_replace($link, $link_key, $text);
}
// Third, we build a nice replacement array for the keywords:
foreach($keywords as $keyword) {
$keyword_links[$keyword] = "<a href='#$keyword'>$keyword</a>";
}
// Merge the restore links to the bottom of the keyword links for one mass replacement:
$keyword_links = array_merge($keyword_links, $restore_links);
$text = str_replace(array_keys($keyword_links), $keyword_links, $text);
echo $text;
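As a different technique from the placeholder swap above (not the answerer's method), PCRE's backtracking verbs (*SKIP)(*FAIL) can make the regex step over existing links and tags in a single preg_replace call. The sketch below is offered under assumptions: the function name link_outside_anchors and the sample markup are made up, the '#keyword' href format follows the question, and links are assumed not to be nested.

```php
<?php
// Hypothetical helper: links keywords unless they sit inside an <a> element or any tag.
function link_outside_anchors(string $html, array $keywords): string
{
    $alternation = implode('|', array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keywords));

    // Left branch: match a whole <a>...</a> element or a single tag, then discard
    // it with (*SKIP)(*FAIL) so scanning resumes after it.
    // Right branch: a keyword as a whole word in the remaining plain text.
    $pattern = '/(?:<a\b[^>]*>.*?<\/a>|<[^>]*>)(*SKIP)(*FAIL)|\b(?:' . $alternation . ')\b/is';

    return preg_replace($pattern, '<a href="#$0">$0</a>', $html);
}

echo link_outside_anchors(
    'Many keywords, plus <a href="/keywords" title="keywords">keywords</a>.',
    ['keywords']
);
// Many <a href="#keywords">keywords</a>, plus <a href="/keywords" title="keywords">keywords</a>.
```

The href, title, and text of the existing link are all skipped as one unit, so no placeholder bookkeeping is needed.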
You can change your regex so that it only targets keywords with a space in front, since the formatted keywords do not contain a space. Here is an example.
$text = preg_replace('/ keywords/i',' keywords',$text);

PHP replace keywords in string with href link

I am being "thick this morning", so please excuse this simple question. I have an array of keywords, e.g. array('keyword1','keyword2'....), and a string of text (about the length of a blog post, i.e. 200-800 words rather than just a few). What is the best way to search the string for the keywords and replace them with an href link? So in the text, 'keyword1' (as plain text) would become <a href='apage'>keyword1</a>, and so on.
See, I said I was being thick this morning.
Thanks in advance.
Typical preg_replace case:
$text = "There is some text keyword1 and lorem ipsum keyword2.";
$keywords = array('keyword1', 'keyword2');
$regex = '/('.implode('|', $keywords).')/i';
// You might want to make the replacement string more dependent on the
// keyword matched, but you'll have to tell us more about it
$output = preg_replace($regex, '\\1', $text);
print_r($output);
Now the above doesn't do a very "smart" replace in the sense that the href is not a function of the matched keyword, while in practice you will probably want to do that. Look into preg_replace_callback for more flexibility here, or edit the question and provide more information regarding your goal.
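A minimal sketch of the preg_replace_callback variant mentioned above, where the href depends on the matched keyword. The keyword => URL map and its paths are invented for illustration, since the question does not say where the links should point, and the map keys are assumed to be lowercase.

```php
<?php
// Hypothetical keyword => URL map (keys assumed lowercase).
$links = ['keyword1' => '/page-one', 'keyword2' => '/page-two'];
$text  = 'There is some text keyword1 and lorem ipsum keyword2.';

// Build one case-insensitive alternation from the escaped keywords.
$regex = '/\b(' . implode('|', array_map(function ($k) {
    return preg_quote($k, '/');
}, array_keys($links))) . ')\b/i';

$output = preg_replace_callback($regex, function ($m) use ($links) {
    // Look the href up by the lowercased match; keep the original capitalization in the text.
    $href = $links[strtolower($m[1])];
    return '<a href="' . htmlspecialchars($href) . '">' . $m[1] . '</a>';
}, $text);

echo $output;
// There is some text <a href="/page-one">keyword1</a> and lorem ipsum <a href="/page-two">keyword2</a>.
```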
WHY would you use regex instead of just str_replace() !? Regex works, but it over complicates such an incredibly simple question.
