I would like to modify HTML like
I am <b>Sadi, novice</b> programmer.
to
I am <b>Sadi, learner</b> programmer.
To do this, I will search using the string "novice programmer". How can I do it? Any ideas?
The search may use more than one word, such as "novice programmer", or even a whole sentence. Extra white space (e.g. new lines, tabs) and any tags must be ignored during the search, but the tags must be preserved in the replacement.
It is a sort of converter. It would be better if the search were case insensitive.
Thank you
Sadi
More clarification:
I got some nice replies with possible solutions, but please keep posting if you have any ideas.
I would like to clarify the problem in case anyone missed it. The main post shows the problem as an example scenario.
1) The problem is to find and replace a string without considering the tags. Tags may show up within a single word. The string may contain multiple words. Tags only appear in the content string or document; the search phrase never contains tags.
We could easily remove all the tags and do a plain text operation, but then another problem shows up.
2) The tags must be preserved, even after replacing the text. That is what the example shows.
Thank you Again for helping
OK, I think this is what you want. It takes your search and replacement strings, splits them into arrays of words delimited by spaces, generates a regexp that finds the search sentence with any number of whitespace characters and HTML tags between the words, and replaces it with the replacement sentence, putting the same tags back between the words.
If the search sentence has more words than the replacement, any 'orphaned' tags are added on the end; if the replacement has more words than the search, the extra words are simply separated by spaces. It also handles regexp characters in the find and replace strings.
<?php
function htmlFriendlySearchAndReplace($find, $replace, $subject) {
$findWords = explode(" ", $find);
$replaceWords = explode(" ", $replace);
$findRegexp = "/";
for ($i = 0; $i < count($findWords); $i++) {
$findRegexp .= preg_replace("/([\\$\\^\\|\\.\\+\\*\\?\\(\\)\\[\\]\\{\\}\\\\\\-])/", "\\\\$1", $findWords[$i]);
if ($i < count($findWords) - 1) {
$findRegexp .= "((?:\s|<[^>]*>)+)";
}
}
$findRegexp .= "/i";
$replaceRegexp = "";
for ($i = 0; $i < count($findWords) || $i < count($replaceWords); $i++) {
if ($i < count($replaceWords)) {
$replaceRegexp .= str_replace("$", "\\$", $replaceWords[$i]);
}
if ($i < count($findWords) - 1) {
$replaceRegexp .= "$" . ($i + 1);
} else {
if ($i < count($replaceWords) - 1) {
$replaceRegexp .= " ";
}
}
}
return preg_replace($findRegexp, $replaceRegexp, $subject);
}
?>
Here are the results of a few tests:
Original : <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <b>Advanced Programmer</b>
Original : Hi, <b>Novice Programmer</b>
Search : Novice Programmer
Replace : Advanced Programmer
Result : Hi, <b>Advanced Programmer</b>
Original : I am not a <b>Novice</b> Programmer
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b>Advanced</b> Programmer
Original : Novice <b>Programmer</b> in the house
Search : Novice Programmer
Replace : Advanced Programmer
Result : Advanced <b>Programmer</b> in the house
Original : <i>I am not a <b>Novice</b> Programmer</i>
Search : Novice Programmer
Replace : Advanced Programmer
Result : <i>I am not a <b>Advanced</b> Programmer</i>
Original : I am not a <b><i>Novice</i> Programmer</b> any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i> Programmer</b> any more
Original : I am not a <b><i>Novice</i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a <b><i>Advanced</i></b> Programmer any more
Original : I am not a Novice<b> <i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced<b> <i> </i></b> Programmer any more
Original : I am not a Novice <b><i> </i></b> Programmer any more
Search : Novice Programmer
Replace : Advanced Programmer
Result : I am not a Advanced <b><i> </i></b> Programmer any more
Original : <i>I am a <b>Novice</b> Programmer</i> too, now
Search : Novice Programmer too
Replace : Advanced Programmer
Result : <i>I am a <b>Advanced</b> Programmer</i> , now
Original : <i>I am a <b>Novice</b> Programmer</i>, now
Search : Novice Programmer
Replace : Advanced Programmer Too
Result : <i>I am a <b>Advanced</b> Programmer Too</i>, now
Original : <i>I make <b>No money</b>, now</i>
Search : No money
Replace : Mucho$1 Dollar$
Result : <i>I make <b>Mucho$1 Dollar$</b>, now</i>
Original : <i>I like regexp, you can do [A-Z]</i>
Search : [A-Z]
Replace : [Z-A]
Result : <i>I like regexp, you can do [Z-A]</i>
I would do this:
if (preg_match('/(.*)novice((?:<.*>)?\s(?:<.*>)?programmer.*)/', $inString, $attributes)) {
$inString = $attributes[1].'learner'.$attributes[2];
}
It should match any of the following:
novice programmer
novice</b> programmer
novice </b>programmer
novice<span> programmer
A text version of what the regex states would be something like this: match any set of characters until you reach "novice" and put it into a capturing group. Then optionally match something that starts with '<', has any number of characters after it, and ends with '>' (without capturing it on its own). Then match exactly one whitespace character, optionally followed by another '<...>' sequence, which must then be followed by "programmer" and any number of characters. Everything after "novice" goes into the second capture group.
I would do some specific testing though, as I may have missed some stuff. Regex is a programmer's best friend!
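One thing worth testing is the greedy <.*>, which can run from the first '<' to the last '>' in a longer string. A minimal sketch of the same idea with a tighter pattern, using preg_replace and the example string from the question (the variable $outString is only for illustration):

```php
<?php
$inString = 'I am <b>Sadi, novice</b> programmer.';

// Group 1 captures whatever sits between the two words: any mix of
// whitespace and tags. [^>]* stops at the first ">", unlike the greedy .*
$outString = preg_replace(
    '/novice((?:\s|<[^>]*>)+)programmer/i',
    'learner$1programmer',
    $inString
);

echo $outString; // I am <b>Sadi, learner</b> programmer.
```

Because the gap is captured and written back unchanged, the tag stays exactly where it was.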
Well, there might be a better way, but off the top of my head (assuming that tags won't appear in the middle of words, HTML is well-formed, etc.)...
Essentially, you'll need three things (sorry if this sounds patronising, not intended that way):
1. A method of sub-string matching that ignores tags.
2. A way of making the replacement preserving the tags.
3. A way of putting it all together.
1 - This is probably the most difficult bit. One method would be to iterate through all of the characters in the source string (strings are basically arrays of characters so you can access the characters as if they are array elements), attempting to match as many characters as possible from the search string, stopping when you've either matched all of the characters or run out of characters to match. Any characters between and including '<' and '>' should be ignored. Some pseudo-code (check this over, it's late and there may be mistakes):
findMatch(startingPos : integer, subject : string, searchString : string){
//Variables for keeping track of characters matched, positions, etc.
inTag = false;
matchFound = false;
matchedCharacters = 0;
matchStart = 0;
matchEnd = 0;
for(i from startingPos to length(subject) - 1){
//Work out when entering or exiting tags, ignore tag contents
if(subject[i] == '<' || subject[i] == '>'){
inTag = !inTag;
}
else if(!inTag){
//Check if the character matches expected in search string
if(subject[i] == searchString[matchedCharacters]){
if(!matchFound){
matchFound = true;
matchStart = i;
}
matchedCharacters++;
//If all of the characters have been matched, return the start and end positions of the substring
if(matchedCharacters == length(searchString)){
matchEnd = i;
return matchStart, matchEnd;
}
}
else{
//Reset counts if not found
matchFound = false;
matchedCharacters = 0;
}
}
}
//If no full matches were found, return error
return -1;
}
2 - Split the HTML source code into three strings - the bit you want to work on (between the two positions returned by the matching function) and the part before and after. Split up the bit you want to modify using, for example:
$parts = preg_split("/(<[^>]*>)/",$string, -1, PREG_SPLIT_DELIM_CAPTURE);
Keep a record of where the tags are, concatenate the non-tag segments and perform substring replace on this as normal, then split the modified string up again and reassemble with the tags in place.
3 - This is the easy part, just concatenate the modified part and the other two bits back together.
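For what it's worth, the three steps can be sketched in PHP roughly like this. It is only a sketch under some assumptions: the function name replaceIgnoringTags is made up, it only replaces the first match, it works on bytes rather than multibyte characters, and it takes a shortcut in step 2 by moving any tags found inside the match to just after the replacement text instead of putting them back between the words.

```php
<?php
// Hypothetical helper, not from the thread. Replaces the first occurrence of
// $find in $html, ignoring tags and extra whitespace while searching.
// Simplification: tags inside the matched range are kept, but moved to just
// after the replacement text.
function replaceIgnoringTags(string $html, string $find, string $replace): string
{
    $words = preg_split('/\s+/', trim($find), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === []) {
        return $html; // nothing to search for
    }

    // Even-indexed pieces are text, odd-indexed pieces are tags.
    $pieces = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);

    $plain = '';   // the text with all tags removed
    $map = [];     // $map[$i] = byte offset in $html of byte $i of $plain
    $offset = 0;
    foreach ($pieces as $index => $piece) {
        if ($index % 2 === 0) {
            for ($i = 0; $i < strlen($piece); $i++) {
                $map[] = $offset + $i;
            }
            $plain .= $piece;
        }
        $offset += strlen($piece);
    }

    // Words may be separated by any run of whitespace in the plain text.
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    $pattern = '/' . implode('\s+', $quoted) . '/i';
    if (!preg_match($pattern, $plain, $m, PREG_OFFSET_CAPTURE)) {
        return $html; // no match
    }

    // Translate the match back to offsets in the original HTML.
    $start = $map[$m[0][1]];
    $end = $map[$m[0][1] + strlen($m[0][0]) - 1] + 1;

    // Collect the tags that sat inside the matched range.
    preg_match_all('/<[^>]*>/', substr($html, $start, $end - $start), $tags);

    return substr($html, 0, $start) . $replace . implode('', $tags[0]) . substr($html, $end);
}

echo replaceIgnoringTags('I am <b>Sadi, novice</b> programmer.', 'Novice  programmer', 'learner programmer');
// prints: I am <b>Sadi, learner programmer</b>.
```

The tags keep their relative order, so well-formed input stays well-formed, but note that the closing </b> ends up after "programmer" rather than after "learner". Putting tags back between individual words needs a word-by-word mapping on top of this.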
I may have horribly over complicated this mind, if so just ignore me.
Unless cOm's already written it, the regex would be the best way to go:
$cleaned_string = preg_replace('/<[^>]*>/', '', $raw_text);
Or something like that. I would need to research/test the regex.
Then you can just use a simple $foobar = str_replace($find, $replace_with, $cleaned_string); to find the text you want to replace.
Didn't realize he wanted to put the HTML back in. It's all regex for that, and more than I know at the moment.
Knowing what I do know, technique-wise I would probably use an expression that didn't ignore whitespace between the words, but did between the < and > brackets, then use the variable-containing abilities of regex to output.
Interesting problem.
I would use the DOM and XPath to find the closest nodes containing that text and then use substring matching to find out which bit of the string is in what node. That will involve character-per-character matching and possible backtracking, though.
Here is the first part, finding the container nodes:
<?php
error_reporting(E_ALL);
header('Content-Type: text/plain; charset=UTF-8');
$doc = new DOMDocument();
$doc->loadHTML(<<<EOD
<p>
<span>
<i>
I am <b>Sadi, novice</b> programmer.
</i>
</span>
</p>
<ul>
<li>
<div>
I am <em>Cornholio, novice</em> programmer of television shows.
</div>
</li>
</ul>
EOD
);
$xpath = new DOMXPath($doc);
// First, get a list of all nodes containing the text anywhere in their tree.
$nodeList = $xpath->evaluate('//*[contains(string(.), "programmer")]');
$deepestNodes = array();
// Now only keep the deepest nodes, because the XPath query will also return HTML, BODY, ...
foreach ($nodeList as $node) {
$deepestNodes[] = $node;
$ancestor = $node;
while (($ancestor = $ancestor->parentNode) && ($ancestor instanceof DOMElement)) {
$deepestNodes = array_filter($deepestNodes, function ($existingNode) use ($ancestor) {
return ($ancestor !== $existingNode);
});
}
}
foreach ($deepestNodes as $node) {
var_dump($node->tagName);
}
I hope that helps you along.
Since you didn't give exact specifics on what you will use this for, I will use your example of "I am sadi, novice programmer".
$before = 'I am <b>sadi, novice</b> programmer';
$after = preg_replace('/I am (<.*>)?(.*), novice(<.*>)? programmer/', 'I am $1$2, learner$3 programmer', $before);
Alternatively, for any text:
$string = '<b>Hello</b>, world!';
$orig = 'Hello';
$replace = 'Goodbye';
$pattern = "/(<.*>)?$orig(<.*>)?/";
$final = "$1$replace$2";
$result = preg_replace($pattern,$final,$string);
//$result should now be '<b>Goodbye</b>, world!'
Hope that helped. :d
Edit: An example of your example, with the second piece of code:
$string = 'I am sadi, novice programmer.';
$orig = 'novice';
$replace = 'learner';
$pattern = "/(<.>)?$orig(<.>)?/";
$final = "$1$replace$2";
$result = htmlspecialchars(preg_replace($pattern,$final,$string));
echo $result;
The only problem is if you were searching for something that was more than a word long.
Edit 2: Finally came up with a way to do it across multiple words. Here's the code:
function htmlreplace($string,$orig,$replace)
{
$orig = explode(' ',$orig);
$replace = explode(' ',$replace);
$result = $string;
while (count($orig)>0)
{
$shift = array_shift($orig);
$rshift = array_shift($replace);
$pattern = "/$shift\s?(<.*>)?/";
$replacement = "$rshift$1";
$result = preg_replace($pattern,$replacement,$result);
}
$result .= implode(' ',$replace);
return $result;
}
Have fun! :d
The titles on my website are formed like this:
It's no use going back to yesterday, because at that time I was... Lewis Carroll
The format is always: The phrase… (author).
I want to delete everything after the ellipsis (…), leaving only the sentence as the title. I thought of writing a PHP function that splits the title into an array of parts, so I could work on each part and find the only pattern the titles share, the ellipsis, and then delete everything after it. But when I do that, one element of the array contains the following:
was...
At position 8 of the array, the word and the ellipsis come together, and I don't know how to find a pattern to delete the author from the title. Any ideas?
<?php
$a = get_the_title(155571);
$search = '... ';
if(preg_match("/{$search}/i", $a)) {
echo 'true';
}
?>
I tried with the code above and found the ellipsis, but I needed to bring it into an array to delete the part I need. I tried something like this:
<?php
define('WP_USE_THEMES', false);
require('./wp-blog-header.php');
global $wpdb;
$title_array = explode(' ', get_the_title(155571));
$search = '... ';
if (array_key_exists("/{$search}/i",$title_array)) {
echo "true";
}
?>
I started doing it this way, but it doesn't work, any ideas?
Thanks,
If you use a regex, you need to escape the search string, as preg_quote() would do, because a dot is a special character in a pattern.
But in your simple case, I would not use a regex and would just search for the three dots from the end of the string.
Note: if the ellipsis is added by the browser, there is no way to detect it in PHP.
$title = 'The phrase... (author).';
echo getPlainTitle($title);
function getPlainTitle(string $title) {
$rpos = strrpos($title, '...');
return ($rpos === false) ? $title : substr($title, 0, $rpos);
}
will output
The phrase
First of all, since you're working with regular expressions, you need to remember that . has a special meaning there: it means "any character". So /... / just means "any three characters followed by a space", which isn't what you want. To match a literal . you need to escape it as \.
Secondly, rather than searching or splitting, you could achieve what you want by replacing part of the string. For instance, you could find everything after the ellipsis, and replace it with an empty string. To do that you want a pattern of "dot dot dot followed by anything", where "anything" is spelled .*, so \.\.\..*
$title = preg_replace('/\.\.\..*/', '', $title);
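The question shows the ellipsis both as three dots and as the single character "…", so here is a rough sketch that covers both. The function name stripAfterEllipsis is made up for the example, and it assumes the title is UTF-8:

```php
<?php
// Hypothetical helper: cuts a title at the first ellipsis, whether it is
// written as three dots or as the single character U+2026.
function stripAfterEllipsis(string $title): string
{
    // \.{3} matches three literal dots; \x{2026} matches the one-character
    // ellipsis (this needs the u flag). The s flag lets .* cross newlines.
    $result = preg_replace('/\s*(?:\.{3}|\x{2026}).*$/us', '', $title);

    // preg_replace returns null on error, e.g. for invalid UTF-8 input.
    return $result ?? $title;
}

echo stripAfterEllipsis("It's no use going back to yesterday, because at that time I was... Lewis Carroll");
// prints: It's no use going back to yesterday, because at that time I was
```

A title without any ellipsis comes back unchanged.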
I have a string that is formatted by symbols (symbols similar to those used to format questions on this site).
Rules:
**Hello** means bold = < b>Hello< /b>
*Hello\\ means bulleted list = < li>Hello< /li>
Hello\\ means line break = Hello< br>
I want to replace:
Every first occurrence of ** with < b> and every second ** with < /b>.
The same for * with < li> and \\ with < /li>.
All \\ that occur without a * somewhere in the string before, should be converted to < br>.
Example string:
$myString = 'Hello my **Friend**,\\here is the stuff you need to buy for me:*knife\\*water bottle\\***fake ID**\\\\\\Thank you in advance and don not forget the **fake ID**!\\Sincerely yours\\Eddy'
Note: This style is not my invention. It is in use and I have to convert it.
I preg_match()-ed parts of it to get the stuff between the tags.
$myString = 'Hello my **Friend**,\\here is the stuff you need to buy for me:*knife\\*water bottle\\***fake ID**\\\\\\Thank you in advance and don not forget the **fake ID**!\\Sincerely yours\\Eddy';
$result = array();
$firstBold = '<b>'. preg_match('~\*\*(.*?)\*\*~', $myString, $firstBold) . </b>;
$result += $firstBold
// and so on...
(Ignore mistakes in this; it's written from memory.)
I didn't consider the words before the first bold, but it's basically the same.
This will get the job done in the end, but it seems cumbersome to me. I am looking for a more elegant way to do this.
What is the best way to solve this in PHP?
You can use preg_replace. Because of your markup, the order of the replacements matters.
http://php.net/manual/en/function.preg-replace.php
$myString = preg_replace("/[*][*]([^*]+)[*][*]/",'<b>${1}</b>',$myString);
$myString = preg_replace("/[*]([^\/]+)[\/][\/]/",'<li>${1}</li>',$myString);
$myString = str_replace("//",'<br/>',$myString);
This function searches for words (from the $words array) inside a text and highlights them.
function highlightWords(Array $words, $text){ // Loop through array of words
foreach($words as $word){ // Highlight word inside original text
$text = str_replace($word, '<span class="highlighted">' . $word . '</span>', $text);
}
return $text; // Return modified text
}
Here is the problem:
Let's say $words = array("car", "drive");
Is there a way for the function to highlight not only the word car, but also words which contain the letters "car", like cars, carmania, etc.?
Thank you!
What you want is a regular expression, more specifically preg_replace or preg_replace_callback (in your case the callback version would be recommended).
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
// define your word list
$toHighlight = array("car","lane");
Because you need a regular expression to search your words and you might want or need variation or changes over time, it's bad practice to hard code it into your search words. Hence it's best to walk over the array with array_map and transform the searchword into the proper regular expression (here just enclosing it with / and adding the "accept everything until punctuation" expression)
$searchFor = array_map('addRegEx',$toHighlight);
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . $word . '[^ ,\,,.,?,\.]*/';
}
Next you want to replace each word you found with its highlighted version, which means you need a dynamic replacement: use preg_replace_callback instead of the regular preg_replace, so that it calls a function for every match it finds and uses the return value as the result. Here we enclose the found word in span tags.
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
$result = preg_replace_callback($searchFor,'highlight',$searchString);
print $result;
yields
The <span class='highlight'>car</span> is driving in the <span class='highlight'>carpark</span>, he's not holding to the right <span class='highlight'>lane</span>.
So just paste these code fragments one after the other to get the working code. ;)
Edit: the complete code below was altered a bit: it is wrapped in functions for easy use by the original requester, and made case insensitive.
Complete code:
<?php
$searchString = "The car is driving in the carpark, he's not holding to the right lane.\n";
$toHighlight = array("car","lane");
$result = customHighlights($searchString,$toHighlight);
print $result;
// add the regEx to each word, this way you can adapt it without having to correct it everywhere
function addRegEx($word){
return "/" . $word . '[^ ,\,,.,?,\.]*/i';
}
function highlight($word){
return "<span class='highlight'>$word[0]</span>";
}
function customHighlights($searchString,$toHighlight){
// define your word list
$searchFor = array_map('addRegEx',$toHighlight);
$result = preg_replace_callback($searchFor,'highlight',$searchString);
return $result;
}
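A compact variant of the same idea, as a rough sketch only: the function name highlightContaining is made up, and it swaps the array of patterns for a single alternation so the text is scanned once and the inserted span markup is never re-scanned. Each term goes through preg_quote so characters like "." or "/" in a search word are matched literally.

```php
<?php
// Hypothetical variant of highlightWords(): wraps every whole word that
// contains one of the search terms, case-insensitively.
function highlightContaining(array $words, string $text): string
{
    if ($words === []) {
        return $text; // nothing to highlight
    }

    // Escape each term so regex metacharacters are matched literally.
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $words);

    // \w* on both sides extends the match to the whole surrounding word.
    $pattern = '/\w*(?:' . implode('|', $quoted) . ')\w*/i';

    // $0 is the full match, i.e. the complete word.
    return preg_replace($pattern, '<span class="highlighted">$0</span>', $text);
}

echo highlightContaining(array('car', 'drive'), 'My cars drive to Carmania.');
```

With the sample sentence, "cars", "drive" and "Carmania" are each wrapped in a span, while "My" and "to" are left alone.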
I haven't tested it, but I think this should do it:-
$text = preg_replace('/\b(\w*' . preg_quote($word, '/') . '\w*)\b/', '<span class="highlighted">$1</span>', $text);
This looks for the string inside a complete bounded word and then puts the span around the whole lot using preg_replace and regular expressions.
function replace($format, $string, array $words)
{
foreach ($words as $word) {
$string = \preg_replace(
sprintf('#\b(?<string>[^\s]*%s[^\s]*)\b#i', \preg_quote($word, '#')),
\sprintf($format, '$1'), $string);
}
return $string;
}
// courtesy of http://slipsum.com/#.T8PmfdVuBcE
$string = "Now that we know who you are, I know who I am. I'm not a mistake! It
all makes sense! In a comic, you know how you can tell who the arch-villain's
going to be? He's the exact opposite of the hero. And most times they're friends,
like you and me! I should've known way back when... You know why, David? Because
of the kids. They called me Mr Glass.";
echo \replace('<span class="red">%s</span>', $string, [
'mistake',
'villain',
'when',
'Mr Glass',
]);
Since it's using an sprintf format for the surrounding string, you can change your replacement accordingly.
Excuse the 5.4 syntax
This is a problem that I have figured out how to solve, but I want to solve it in a simpler way... I'm trying to improve as a programmer.
Have done my research and have failed to find an elegant solution to the following problem:
I have a hypothetical array of keywords to search for:
$keyword_array = array('he','heather');
and a hypothetical string:
$text = "What did he say to heather?";
And, finally, a hypothetical function:
function bold_keywords($text, $keyword_array)
{
$pattern = array();
$replace = array();
foreach($keyword_array as $keyword)
{
$pattern[] = "/($keyword)/is";
$replace[] = "<b>$1</b>";
}
$text = preg_replace($pattern, $replace, $text);
return $text;
}
The function (not too surprisingly) is returning something like this:
"What did <b>he</b> say to <b>he</b>ather?"
Because it is not recognizing "heather" when there is a bold tag in the middle of it.
What I want the final solution to do is, as simply as possible, return one of the two following strings:
"What did <b>he</b> say to <b>heather</b>?"
"What did <b>he</b> say to <b><b>he</b>ather</b>?"
Some final conditions:
--I would like the final solution to deal with a very large number of possible keywords
--I would like it to deal with the following two situations (lines represent overlapping strings):
One string engulfs the other, like the following two examples:
-- he, heather
-- sanding, and
Or one string does not engulf the other:
-- entrain, training
Possible way to solve:
-A regex that ignores tags in keywords
-Long way (that I am trying to avoid):
*Search string for all occurrences of each keyword, store an array of positions (start and end) of keywords to be bolded
*Process this array recursively to combine overlapping keywords, so there is no redundancy
*Add the bold tags (starting from the end of the string, to avoid the positions of information shifting from the additional characters)
Many thanks in advance!
Example
$keyword_array = array('he','heather');
$text = "What did he say to heather?";
$pattern = array();
$replace = array();
foreach($keyword_array as $keyword)
{
// \b on both sides keeps "he" from matching inside "heather"
$pattern[] = "/\b(" . preg_quote($keyword, '/') . ")\b/is";
$replace[] = "<b>$1</b>";
}
$text = preg_replace($pattern, $replace, $text);
echo $text; // What did <b>he</b> say to <b>heather</b>?
You need to change your regex pattern to recognize that each "term" you are searching for is followed by whitespace or punctuation, so that the pattern does not match terms followed by an alphanumeric character.
A simplistic and lazy-ish approach off the top of my head:
Sort your initial array by item length, descending! No more "not recognized because there's already a tag in the middle" issues!
Edit: The nested tags issue is then easily fixed by extending your regex so that >foo and foo< are no longer matched.
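A rough sketch of the longest-first idea, with one change that is my own assumption rather than part of the suggestion above: instead of running one pattern per keyword, all keywords go into a single alternation, so the text is scanned once and already-inserted tags are never looked at again. The function name boldKeywordsLongestFirst is made up.

```php
<?php
// Hypothetical rewrite of bold_keywords(): longest keywords first, one pass.
function boldKeywordsLongestFirst(string $text, array $keywords): string
{
    if ($keywords === []) {
        return $text; // nothing to bold
    }

    // Longest first, so "heather" is tried before "he" at the same position.
    usort($keywords, function ($a, $b) { return strlen($b) - strlen($a); });

    // Escape each keyword so regex metacharacters are matched literally.
    $quoted = array_map(function ($k) { return preg_quote($k, '/'); }, $keywords);
    $pattern = '/(?:' . implode('|', $quoted) . ')/i';

    // $0 is the matched keyword as it appears in the text.
    return preg_replace($pattern, '<b>$0</b>', $text);
}

echo boldKeywordsLongestFirst("What did he say to heather?", array('he', 'heather'));
// prints: What did <b>he</b> say to <b>heather</b>?
```

This covers the "one string engulfs the other" case. It does not merge partially overlapping keywords such as entrain and training: in "entraining" only the keyword that matches first gets wrapped.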
We use custom bbcode in our news posts
[newsImage]imageName.jpg[/newsImage]
I'd like to use preg_match to get the imageName.jpg from between those tags. The whole post is stored in a variable called $newsPost.
I'm new to regex and I just can't figure out the right expression to use in preg_match to get what I want.
Any help is appreciated. Also, do any of you know a good resource for learning what each of the characters in regex do?
preg_match_all('/\[newsImage\]([^\[]+)\[\/newsImage\]/i', $newsPost, $images);
The variable $images should then contain your list of matches.
http://www.php.net/manual/en/regexp.introduction.php
To answer your second question: A very good regex tutorial is regular-expressions.info.
Among other things, it also contains a regular expression syntax reference.
Since different regex flavors use a different syntax, you'll also want to look at the regex flavor comparison page.
As Rob said, but escaping the last ]:
preg_match('/\[newsImage\]([^\[]+)\[\/newsImage\]/i', $newsPost, $images);
$images[1] will contain the name of image file.
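To make the difference between the two calls concrete, here is a small sketch with a made-up post containing two images. With preg_match_all, index 1 of the result holds every captured file name rather than just the first one. The sample text and the variable $imageNames are only for illustration.

```php
<?php
// Hypothetical sample post with two [newsImage] tags.
$newsPost = 'Intro [newsImage]first.jpg[/newsImage] text [newsImage]second.png[/newsImage]';

// $matches[0] holds the full matches, $matches[1] the first capture group.
preg_match_all('/\[newsImage\]([^\[]+)\[\/newsImage\]/i', $newsPost, $matches);

$imageNames = $matches[1];
print_r($imageNames); // the two file names: first.jpg and second.png
```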
This is not exactly what you asked for, but you can replace your [newsImage] tags with <img> tags using the following code. It's not perfect, as it will fall down if you have an empty tag, e.g. [newsImage][/newsImage].
function process_image_code($text) {
//regex looks for [newsImage]sometext[/newsImage]
$urlReg ="/((?:\[newsImage]{1}){1}.{1,}?(?:\[\/newsImage]){1})/i";
$pregResults = preg_split ($urlReg , $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$output = "";
//loop array to build the output string
for($i = 0; $i < sizeof($pregResults); $i++) {
//if the array item has a regex match process the result
if(preg_match($urlReg, $pregResults[$i]) ) {
$pregResults[$i] = preg_replace ("/(?:\[\/newsImage]){1}/i","\" alt=\"Image\" border=\"0\" />",$pregResults[$i] ,1);
// find if it has a http:// at the start of the image url
if(preg_match("/(?:\[newsImage]http:\/\/?){1}/i",$pregResults[$i])) {
$pregResults[$i] = preg_replace ("/(?:\[newsImage]?){1}/i","<img src=\"",$pregResults[$i] ,1);
}else {
$pregResults[$i] = preg_replace ("/(?:\[newsImage]?){1}/i","<img src=\"http://",$pregResults[$i] ,1);
}
$output .= $pregResults[$i];
}else {
$output .= $pregResults[$i];
}
}
return $output;
}
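The same conversion can probably be done in one step with preg_replace_callback. This is only a sketch: the function name newsImagesToHtml is made up, it keeps the habit of prefixing http:// when the name has no scheme, and it additionally escapes the value with htmlspecialchars, which the code above does not do.

```php
<?php
// Hypothetical alternative to process_image_code(): one callback per tag pair.
function newsImagesToHtml(string $text): string
{
    return preg_replace_callback(
        '/\[newsImage\]([^\[]+)\[\/newsImage\]/i',
        function (array $m): string {
            $src = trim($m[1]);
            // Prefix a scheme when the name does not already start with one.
            if (!preg_match('/^https?:\/\//i', $src)) {
                $src = 'http://' . $src;
            }
            // Escape quotes and angle brackets before building the attribute.
            return '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" alt="Image" border="0" />';
        },
        $text
    );
}

echo newsImagesToHtml('See [newsImage]example.com/a.jpg[/newsImage]!');
// prints: See <img src="http://example.com/a.jpg" alt="Image" border="0" />!
```

An empty [newsImage][/newsImage] pair does not match the pattern, so it is left in the text as-is instead of breaking the output.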