PHP | Replicate specific word excluding the title attribute


I'm trying to replace the word "custom" with <span>custom</span>.
With the str_replace() function it works, but it also replaces the word inside the title attribute. I don't want that, because a <span> tag inside the title attribute is wrong.
How can I replace the word "custom" without touching the title attribute?
This is my code:
$oldText = "custom";
$newText = "<span>custom</span>";
$string = "<a href='#' title='Products custom'>Products custom</a>";
$result = str_ireplace($oldText, $newText, $string); // the title attribute gets the <span> too
This is just one example.
The word custom can also be placed in the middle of a string or at the beginning...
Thanks

You'll probably have to use PHP's DOM parser to do that. A regular expression will not work for all cases.
A) With DOM
I would start off with this Stack Overflow answer and then adapt it to what you want to do. Since you are replacing custom with <span>custom</span>, you'll be creating a new DOM element. Replacing the text content won't work, because <span> would be escaped to &lt;span&gt;.
So I would do this:
1. Use preg_match_all() with a pattern such as /\bcustom\b/ to get the offsets of all matches in the text node:
// Search for the word, delimited by word boundaries to
// avoid matching 'custom' in 'customization' or 'customer'.
$pattern = '/\b' . preg_quote($word_to_search, '/') . '\b/';
if (preg_match_all($pattern, $child->wholeText, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
    var_export($matches);
}
2. Convert these byte offsets to character offsets. PREG_OFFSET_CAPTURE reports offsets in bytes, but in UTF-8 a character can take one to four bytes, and the DOM methods count characters:
function char_offset($string, $byte_offset, $encoding = null)
{
    // Count the characters that precede the byte offset
    $substr = substr($string, 0, $byte_offset);
    return mb_strlen($substr, $encoding ?: mb_internal_encoding());
}
3. Use DOMText::splitText() to split the text node into two text nodes at that character offset.
4. Create a <span> element with DOMDocument::createElement():
$new_text = 'custom'; // or whatever.
$spanElement = $domNode->ownerDocument->createElement('span', $new_text);
5. Insert this span element before the second text node with DOMNode::insertBefore().
6. Correct the second text node to remove the word custom from its beginning.
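Putting those steps together, a minimal sketch could look like the following. It assumes the input is a UTF-8 HTML fragment, that the dom and mbstring extensions are available, and that keeping the matched word's original case is fine; wrap_word_in_span() and the wrapper <div id="wrap"> are hypothetical names, not part of the original answer. It processes matches right to left so that splitting a text node never invalidates the offsets still to be handled.

```php
<?php
// Hypothetical helper: wrap whole-word matches of $word in <span> inside
// text nodes only, so attribute values such as title='...' are never touched.
function wrap_word_in_span(string $html, string $word): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // hide warnings about sloppy markup
    // The meta tag declares UTF-8; the wrapper <div> lets us serialize just the fragment.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="wrap">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $pattern = '/\b' . preg_quote($word, '/') . '\b/iu';

    // Copy the node list first, because splitting nodes changes the tree.
    $textNodes = iterator_to_array($xpath->query('.//text()', $wrap));
    foreach ($textNodes as $textNode) {
        $text = $textNode->nodeValue;
        if (!preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE)) {
            continue;
        }
        // Right to left, so earlier offsets still point into $textNode.
        foreach (array_reverse($matches[0]) as [$found, $byteOffset]) {
            // PREG_OFFSET_CAPTURE gives bytes; splitText() counts characters.
            $charOffset = mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
            $rest = $textNode->splitText($charOffset); // $rest now starts with the word
            // createTextNode() avoids the unescaped-& pitfall of createElement()'s value argument.
            $span = $doc->createElement('span');
            $span->appendChild($doc->createTextNode($found));
            $rest->parentNode->insertBefore($span, $rest);
            // Drop the word from the front of $rest; the <span> now holds it.
            $rest->nodeValue = mb_substr($rest->nodeValue, mb_strlen($found, 'UTF-8'), null, 'UTF-8');
        }
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrap_word_in_span("<a href='#' title='Products custom'>Products custom</a>", 'custom'), "\n";
```

Note that libxml re-serializes the markup, so single-quoted attributes typically come back double-quoted.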
B) With a regex
But if your case is always inside an <a> tag, you could have a go with something like this: https://regex101.com/r/ksPqxe/1
For an explanation of the regex, see the description in the right-hand column there. The i flag makes the match case-insensitive; remove it if you need a case-sensitive match. The s flag makes . also match newlines. The matching has to be lazy (.*? instead of .*), so I used the U (ungreedy) flag, which makes quantifiers lazy by default, and kept writing .*.
This solution will not handle several occurrences of custom in the same link, but you'll probably only have it once. If you need that, use one regex to get the text content of the link and then a second one to replace every instance of custom with <span>custom</span>.
<?php
$pattern = '/(<a[^>]*>.*)\bcustom\b(.*<\/a>)/isU';
// Or without the ungreedy flag:
//$pattern = '/(<a[^>]*>.*?)\bcustom\b(.*?<\/a>)/is';
$substitution = '$1<span>custom</span>$2';
$inputs = [
"<a href='#' title='Products custom'>Products custom</a>",
'Custom stuff',
'<a href=\"https://www.customer.com\" title=\"customer"
data-type="custom">Customer stuff</a>',
'customize it!',
];
$results = [];
foreach ($inputs as $input) {
$result = preg_replace($pattern, $substitution, $input);
$results[] = "$input\n$result\n";
}
print implode(str_repeat('-', 80) . "\n", $results);
Output:
<a href='#' title='Products custom'>Products custom</a>
<a href='#' title='Products custom'>Products <span>custom</span></a>
--------------------------------------------------------------------------------
Custom stuff
Custom stuff
--------------------------------------------------------------------------------
<a href=\"https://www.customer.com\" title=\"customer"
data-type="custom">Customer stuff</a>
<a href=\"https://www.customer.com\" title=\"customer"
data-type="custom">Customer stuff</a>
--------------------------------------------------------------------------------
customize it!
customize it!
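The two-step idea mentioned above (one regex to isolate each link's content, a second one to replace every occurrence inside it) can be sketched with preg_replace_callback(). It makes the same assumption as the regex above, that only text inside <a> elements matters; nested tags inside the link whose attributes contain the word would still be changed. wrap_in_links() is a hypothetical name. Unlike $substitution above, it reuses the matched text ($0), so Custom keeps its capital letter.

```php
<?php
// Hypothetical helper: wrap every whole-word $word found in the content of <a> elements.
function wrap_in_links(string $html, string $word): string
{
    $wordPattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_replace_callback(
        // Regex 1: opening tag, link content (lazy, may span lines), closing tag.
        '/(<a\b[^>]*>)(.*?)(<\/a>)/is',
        function (array $m) use ($wordPattern): string {
            // Regex 2 runs on the link content only; the opening tag (and its title) is left alone.
            return $m[1] . preg_replace($wordPattern, '<span>$0</span>', $m[2]) . $m[3];
        },
        $html
    );
}

echo wrap_in_links("<a href='#' title='custom'>custom and Custom</a>", 'custom'), "\n";
// <a href='#' title='custom'><span>custom</span> and <span>Custom</span></a>
```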

Related

Removing double quotes from href

I have an HTML string and need to remove the inner double quotes from the href of an anchor tag, such as href="https://example.com/abc?name="xyz&123"".
$content = '<p style="abc" rel="blah blah"> Hello I am p </p> ';
should return
$content = '<p style="abc" rel="blah blah"> Hello I am p </p> ';
I have tried
preg_replace('/<a\s+[^>]*href\s*=\s*"([^"]+)"[^>]*>/', '<a href="\1">', $content)
but this removes all attributes from the anchor tag except href. I can't find a pattern that works only inside the href.
I'm looking for some PHP code that does this.

You may try:
(<a href=".*?)"(.*?)"(.*)
Explanation of the above regex:
(<a href=".*?) - The first capturing group, which captures everything before the first " inside the href value. Lazy matching (.*?) is what makes it stop there.
" - Matches " literally.
(.*?) - The second capturing group, which captures the data between the inner quotes (xyz&123).
(.*) - The third capturing group, which captures everything after the second inner ".
$1\'$2\'$3 - In the replacement, the captured groups are joined with single quotes in place of the inner double quotes.
You can find the demo of the above regex in here.
Sample implementation in PHP:
<?php
$re = '/(<a href=".*?)"(.*?)"(.*)/m';
$str = '<p style="abc" rel="blah blah"> Hello I am p </p> ';
$subst = '$1\'$2\'$3';
$result = preg_replace($re, $subst, $str);
echo $result;
You can find the sample run of the above code in here.

I have tried preg_replace('/<a\s+[^>]*href\s*=\s*"([^"]+)"[^>]*>/', '<a href="\1">', $content), but this removes all attributes from the anchor tag except for href.
Maybe be a bit more generic - and leave all that <a ...> stuff out of the equation to begin with?
Not too many HTML elements have a href attribute to begin with - and even if you encountered a different one with such a href value, it would not make sense there either, so it would need replacing as well anyway.
#href="(\S+)"# as a greedy pattern looking for & capturing the longest possible non-whitespace string between href=" and ".
That gives href="https://example.com/abc?name="xyz&123"" as the full match, and just the https://example.com/abc?name="xyz&123" as the partial one.
Let’s feed the latter into str_replace() to get rid of the " characters, using preg_replace_callback():
$content = preg_replace_callback('#href="(\S+)"#', function($m) {
return 'href="'.str_replace('"', '', $m[1]).'"';
}, $content);
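To compare the two answers, here is a small sketch that runs both patterns on the problem href quoted above. The link text Click is a hypothetical placeholder.

```php
<?php
// Hypothetical input: the broken href from the question, with placeholder link text.
$content = '<a href="https://example.com/abc?name="xyz&123"">Click</a>';

// First answer: swap the inner double quotes for single quotes.
$first = preg_replace('/(<a href=".*?)"(.*?)"(.*)/m', '$1\'$2\'$3', $content);

// Second answer: match the whole href value greedily, then strip every " inside it.
$second = preg_replace_callback('#href="(\S+)"#', function (array $m): string {
    return 'href="' . str_replace('"', '', $m[1]) . '"';
}, $content);

echo $first, "\n";  // <a href="https://example.com/abc?name='xyz&123'">Click</a>
echo $second, "\n"; // <a href="https://example.com/abc?name=xyz&123">Click</a>
```

The first keeps the parameter value quoted; the second drops the quotes entirely. Because \S+ is greedy, the second would also swallow a later " that appears before the next whitespace, for example one in the link text.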

Search and replace each unique word that begins with # symbol in string, even if they're similar

I want to replace all words in the string that start with #. If I use str_replace() everything works fine until the usernames become similar. I need something that replaces each exact word in full, without affecting other similar words. For example, #johnny and #johnnys would be problematic. Maybe a regex could help?
function myMentions($str){
$str = "Hello #johnny, how is #johnnys doing?"; //let's say this is our param
$regex = "~(#\w+)~"; //my regex to extract all words beginning with #
if(preg_match_all($regex, $str, $matches, PREG_PATTERN_ORDER)){
foreach($matches[1] as $matches){ //iterate over match results
$link = "<a href='www.google.com'>$matches</a>"; //wrap my matches in links
$str = str_replace($matches,$link,$str); //replace matches with links
}
}
return $str;
}
Output should be: Hello <a href=''>#johnny</a>, how is <a href=''>#johnnys</a> doing?
Instead I am getting: Hello <a href=''>#johnny</a>, how is <a href=''>#johnny</a>s doing?
(NOTE: The extra "s" on #johnnys isn't wrapped.)
It doesn't recognize that #johnny and #johnnys are two different words, so str_replace() replaces both in one go. Basically, the function takes one word and replaces all similar words at once.

Your code is unnecessarily complex; you just need a single preg_replace():
function myMentions($str){
return preg_replace("~#\w+~", "<a href='www.google.com'>\$0</a>", $str);
}
$str = "Hello #johnny, how is #johnnys doing?";
echo myMentions($str);
// => Hello <a href='www.google.com'>#johnny</a>, how is <a href='www.google.com'>#johnnys</a> doing?
See the PHP demo.
The preg_replace("~#\w+~", "<a href='www.google.com'>\$0</a>", $str) matches all non-overlapping occurrences of # + 1 or more word chars, and wraps them with <a href='www.google.com'> and </a> texts. Note the $0 is a backreference to the whole match.
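If each link should point at the mentioned user rather than a fixed URL, a capture group around the name can be used alongside $0. The /user/ path is a hypothetical URL scheme and linkMentions() a hypothetical name; this is only a sketch of the $0 versus $1 distinction.

```php
<?php
// Hypothetical variant: $1 is the name without '#', $0 is the whole mention.
function linkMentions(string $str): string
{
    return preg_replace('~#(\w+)~', "<a href='/user/\$1'>\$0</a>", $str);
}

echo linkMentions("Hello #johnny, how is #johnnys doing?"), "\n";
// Hello <a href='/user/johnny'>#johnny</a>, how is <a href='/user/johnnys'>#johnnys</a> doing?
```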

preg_match_all and foreach only replacing last match

I have the following code, which should make plain text links clickable. However, if there are several links, it only replaces the last one.
Code:
$nc = preg_match_all('#<pre[\s\S]*</pre>#U', $postbits, $matches_code);
foreach($matches_code[0] AS $match_code)
{
$match = null;
$matches = null;
$url_regex = '#https?://(\w*:\w*@)?[-\w.]+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?[^<\.,:;"\'\s]+#';
$n = preg_match_all($url_regex, $match_code, $matches);
foreach($matches[0] AS $match)
{
$html_url = '<a href="' . $match . '">' . $match . '</a>';
$match_string = str_replace($match, $html_url, $match_code);
}
$postbits = str_replace($match_code, $match_string, $postbits);
}
Result:
http://www.google.com
http://www.yahoo.com
http://www.microsoft.com/ <-- only this one is clickable
Expected result (all three links clickable):
http://www.google.com
http://www.yahoo.com
http://www.microsoft.com/
Where is my error?

if there are several links it only replaces the last one
Where is my error?
Actually, it's replacing all 3 links, but each time it starts again from the original string.
foreach($matches[0] AS $match)
{
$html_url = '<a href="' . $match . '">' . $match . '</a>';
$match_string = str_replace($match, $html_url, $match_code);
}
The loop is executed 3 times; each time it replaces 1 link in $match_code and assigns the result to $match_string. On the first iteration, $match_string gets the text with a clickable google.com. On the second iteration, $match_string gets the text with a clickable yahoo.com, but because that replacement started again from the original $match_code, google.com is no longer clickable. That's why only your last link ends up clickable.
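One way to keep the original loop structure is to run each str_replace() on the string produced by the previous iteration instead of on $match_code. This minimal sketch uses a hypothetical sample block and a simpler URL pattern than the question's, assumed good enough here; array_unique() is added so a URL that appears twice is not wrapped twice. str_replace() still misbehaves if one URL is a prefix of another (e.g. http://a.com and http://a.com/x), which is one more reason to prefer a single preg_replace().

```php
<?php
// Hypothetical sample: one <pre> block with three plain-text URLs.
$match_code = "<pre>\nhttp://www.google.com\nhttp://www.yahoo.com\nhttp://www.microsoft.com/\n</pre>";
preg_match_all('#https?://[^\s<]+#', $match_code, $matches);

// Start from the original once, then keep replacing on the updated string.
$match_string = $match_code;
foreach (array_unique($matches[0]) as $match) {
    $html_url = '<a href="' . $match . '">' . $match . '</a>';
    $match_string = str_replace($match, $html_url, $match_string);
}

echo $match_string, "\n";
```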
There are a couple of things you may also want to correct in your code:
The regex #<pre[\s\S]*</pre>#U is better constructed as #<pre.*</pre>#Us. The class [\s\S]* is normally used in JavaScript, where there is no s flag to allow dots matching newlines.
I don't get why you're using that pattern to match URLs. I think you could simply use https?://\S+. I'll also link you to some alternatives here.
You're using 2 preg_match_all() calls and 1 str_replace() call for the same text, where you could wrap it up in 1 preg_replace().
Code
$postbits = "
<pre>
http://www.google.com
http://www.yahoo.com
http://www.microsoft.com/ <-- only this one clickable
</pre>";
$regex = '#\G((?:(?!\A)|.*<pre)(?:(?!</pre>).)*)(https?://\S+?)#isU';
$repl = '\1<a href="\2">\2</a>';
$postbits = preg_replace( $regex, $repl, $postbits);
ideone demo
Regex
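\G Anchors each match where the previous one ended (or at the start of the subject for the first match), so the matches must be contiguous.
Group 1
(?:(?!\A)|.*<pre) Either continues from the previous match (any position that is not the start of the string) or, at the start or when the current tag has no more URLs, skips ahead to the next <pre tag.
(?:(?!</pre>).)*) Consumes characters up to the next URL without crossing a closing </pre>; the final ) closes group 1.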
Group 2
(https?://\S+?) Matches 1 URL.
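If the \G pattern feels hard to maintain, a more readable alternative (a different technique from the single preg_replace() above) is to let preg_replace_callback() find each <pre> block and linkify URLs only inside it. The sketch assumes <pre> blocks are not nested and tightens the https?://\S+ idea to stop at <; linkify_pre() and the sample string are hypothetical.

```php
<?php
// Hypothetical helper: turn plain URLs into links, but only inside <pre>...</pre>.
function linkify_pre(string $html): string
{
    return preg_replace_callback(
        '#<pre\b.*?</pre>#is', // each <pre> block, lazily, across newlines
        function (array $m): string {
            // Inside one block, wrap every URL in a single pass ($0 = the URL).
            return preg_replace('#https?://[^\s<]+#i', '<a href="$0">$0</a>', $m[0]);
        },
        $html
    );
}

$postbits = "http://outside.example\n<pre>\nhttp://www.google.com\nhttp://www.yahoo.com\nhttp://www.microsoft.com/\n</pre>";
echo linkify_pre($postbits), "\n"; // the URL before <pre> stays plain text
```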

Function to find marked-up text and replace it with code tags

I made this function to find specific character sequences in a text string and convert them to HTML tags:
function ccfc($content)
{
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// $code_block = preg_replace($reg_exUrl, "{$url[0]} ", $content);
if(preg_match($reg_exUrl, $content, $url)) {
// make the urls hyper links
$content = preg_replace($reg_exUrl, "<a href=\"{$url[0]}\">{$url[0]}</a> ", $content);
} else {
// if no urls in the text just return the text
$content = $content;
}
$code_block = preg_replace_callback(
'/([\`]{3})(.*?)([\`]{3})/s',
function($matches) {
$matches[2] = htmlentities($matches[2]);
return '<pre><code>'. $matches[2] .'</code></pre>';
},
$content);
$bold = preg_replace_callback(
'/([\*]{2})(.*?)([\*]{2})/s',
function($matches) {
$matches[2] = htmlentities($matches[2]);
return '<b>'. $matches[2] .'</b>';
},
$code_block);
$italic = preg_replace_callback(
'/([\*]{1})(.*?)([\*]{1})/s',
function($matches) {
$matches[2] = htmlentities($matches[2]);
return '<i>'. $matches[2] .'</i>';
},
$bold);
return $italic;
}
The first part finds URLs like http://www.google.com and converts them to links.
The second finds ``` code content ``` and converts it to <pre><code> code content </code></pre>.
The third finds ** content ** and converts it to <b> content </b>.
The fourth finds * content * and converts it to <i> content </i>.
But if code is written outside the ``` ``` markers, the browser executes it. How can I make the remaining text go through htmlentities() too?

Rather than calling htmlentities after running the text through your converter functions, call it before you do the converting:
function ccfc($content) {
$content = htmlentities($content);
This won't affect the characters used in the markup (* and `), and you can also set the double_encode flag to false to ensure that already-encoded content (e.g. &amp; in links) does not get encoded twice. See the PHP manual for the settings:
$content = htmlentities($content, ENT_QUOTES, 'UTF-8', false);
This setting will treat the text as UTF-8, encode both single and double quotes, and will not re-encode an already-encoded link like http://example.com?p=1&amp;q=2.
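A quick way to see what the fourth argument (double_encode) changes; the sample strings are made up for illustration:

```php
<?php
$raw     = 'Fish & Chips <b>';  // not yet encoded
$encoded = 'Fish &amp; Chips';  // already encoded once

// A bare & is always encoded, whatever double_encode says.
echo htmlentities($raw, ENT_QUOTES, 'UTF-8', false), "\n";     // Fish &amp; Chips &lt;b&gt;
// With double_encode = true (the default) the existing &amp; is encoded again.
echo htmlentities($encoded, ENT_QUOTES, 'UTF-8', true), "\n";  // Fish &amp;amp; Chips
// With double_encode = false the existing entity is left alone.
echo htmlentities($encoded, ENT_QUOTES, 'UTF-8', false), "\n"; // Fish &amp; Chips
```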
On another note, you don't need to use preg_replace_callback for your substitutions; you can use the captured text in the replacement expression. Here's an example from the code formatting regex:
$code_block = preg_replace(
'/`{3}(.*?)`{3}/s',
"<pre><code>$1</code></pre>",
$content);
<b> and <i> are not deprecated in HTML5, but they carry no emphasis meaning; if you are using them to place emphasis on text, you could replace them with <strong> and <em> respectively. If the markup is solely for presentation, it is better to enclose the text in a <span> element and give it a class which has bold or italic formatting.
Here is the full code with the htmlentities moved and preg_replace replacements:
function ccfc($content)
{
$content = htmlentities($content, ENT_QUOTES, NULL, false);
$reg_exUrl = "/((http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/?\S*)?)/";
// make the urls hyperlinks
$content = preg_replace($reg_exUrl, "<a href='$1'>$1</a>", $content);
# replace ``` with code blocks
$content = preg_replace(
'/`{3}(.*?)`{3}/s',
"<pre><code>$1</code></pre>",
$content);
# replace **text** with strong text
$content = preg_replace(
'/\*{2}([^\*].*?)\*{2}/s',
"<strong>$1</strong>",
$content);
# replace *text* with em text
$content = preg_replace(
'/\*(.*?)\*/s',
"<em>$1</em>",
$content);
return $content;
}
A quick explanation of how preg_replace works: when you use parentheses in a regular expression, whatever the part inside them matches is captured, and you can refer to it in the replacement as $1, $2, $3, etc. The contents of the first set of parentheses are in $1, the contents of the second set in $2, and so on. For example, take this regular expression:
/(\w+) and (\w+)/
and the input string bread and butter, bread matches the expression in the first set of parens and butter matches the expression in the second set; $1 would be set to bread and $2 to butter. This becomes useful when we do preg_replace, as we can use $1 and $2 in the replacement string:
$str = preg_replace("/(\w+) and (\w+)/", "I love $2 on $1", "bread and butter");
echo $str;
Output:
I love butter on bread
Anything that is in the match string but isn't captured will disappear -- like the and in this example.
In the replacements in your code, the text between the delimiters (* and `) needs to be kept, so it is captured in parentheses; the delimiters themselves are not needed, so they aren't in parentheses.
Explanation of the other characters in the regexes:
?, *, +, {2} : these are quantifiers - they dictate the number of times the preceding pattern should appear. ? means 0 or 1 times; * is 0 or more times; + is one or more times; {2} means twice; {500} would mean 500 times.
\w represents any letter, digit, or _
. matches any character except a newline (with the s flag, newlines too)
.*? matches a string of any length, including length 0, but as short as possible (lazy)
\** would match 0 or more * characters; to match *, you have to escape it (i.e. \*) so that the regex engine doesn't interpret it as a quantifier
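The difference between .* and .*? matters for the *text* rule: with a greedy .* the match runs from the first * to the last one. A small comparison on a made-up string:

```php
<?php
$text = '*one* and *two*';

// Greedy: .* grabs as much as possible, so the first * pairs with the last *.
$greedy = preg_replace('/\*(.*)\*/s', '<em>$1</em>', $text);
// Lazy: .*? stops at the next *, giving two separate matches.
$lazy = preg_replace('/\*(.*?)\*/s', '<em>$1</em>', $text);

echo $greedy, "\n"; // <em>one* and *two</em>
echo $lazy, "\n";   // <em>one</em> and <em>two</em>
```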

How to replace new lines by regular expressions

How can I match any number of newlines with a regular expression?
$var = "<p>some text</p><p>another text</p><p>more text</p>";
$search = array("</p>\s<p>");
$replace = array("</p><p>");
$var = str_replace($search, $replace, $var);
I need to remove every new line (\n), not <br/>, between two paragraphs.

To begin with, str_replace() (which you referenced in your original question) is used to find a literal string and replace it. preg_replace() is used to find something that matches a regular expression and replace it.
In the following code sample I use \s+ to find one or more occurrences of whitespace (newline, tab, space...). \s is whitespace, and the + quantifier means one or more of the previous thing.
<?php
// Test string with white space and line breaks between paragraphs
$var = "<p>some text</p> <p>another text</p>
<p>more text</p>";
// Regex - Use ! as delimiters, so that you don't have to escape the
// forward slash in '</p>'. This regex looks for a closing P, then one or more (+)
// whitespace characters, then an opening P. i makes the search case-insensitive.
$search = '!</p>\s+<p>!i';
// We replace the matched text with a closing P followed by an opening P with no
// whitespace in between.
$replace = '</p><p>';
// echo to test or use '=' to store the results in a variable.
// preg_replace returns a string in this case.
echo preg_replace($search, $replace, $var);
?>
Live Example
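The question's original attempt fails for the reason given above: str_replace() does not interpret regex syntax, and in a double-quoted PHP string "\s" is just the two characters \ and s. A side-by-side check with a made-up two-paragraph string:

```php
<?php
$var = "<p>some text</p>\n  <p>another text</p>";

// str_replace() looks for the literal text </p>\s<p>, which is not in $var.
$literal = str_replace("</p>\s<p>", "</p><p>", $var);
// preg_replace() interprets \s+ as "one or more whitespace characters".
$regex = preg_replace('!</p>\s+<p>!i', '</p><p>', $var);

var_dump($literal === $var);                                // bool(true)
var_dump($regex === '<p>some text</p><p>another text</p>'); // bool(true)
```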

I find it odd to build huge HTML strings and then use some string search-and-replace hack to format them afterwards...
When constructing HTML with PHP, I like using arrays:
$htmlArr = array();
foreach ($dataSet as $index => $data) {
$htmlArr[] = '<p>Line#'.$index.' : <span>' . $data . '</span></p>';
}
$html = implode("\n", $htmlArr);
This way, every HTML line has its own $htmlArr[] value. Moreover, if you need your HTML to be "pretty printed", you can simply have a method that indents your HTML by prepending whitespace to every array element according to some set of rules. For example, if we have:
$htmlArr = array(
'<ol>',
'<li>Item 1</li>',
'<li>Item 2</li>',
'<li>Item 3</li>',
'</ol>'
);
Then the formatting function algorithm would be (a very simple one, considering that the HTML is well constructed):
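define('TAB_SPACE', 4); // indent width in spaces (assumed value)
function indent_html(array $array): array
{
    $indent = 0; // Initial indent
    foreach ($array as &$value) {
        $open = preg_match_all('#<[a-z][^>]*>#i', $value);    // opening tags on this line
        $closed = preg_match_all('#</[a-z][^>]*>#i', $value); // closing tags on this line
        // A line that starts with a closing tag sits one level further out
        $lineIndent = strncmp($value, '</', 2) === 0 ? $indent - 1 : $indent;
        $value = str_repeat(' ', max(0, $lineIndent) * TAB_SPACE) . $value;
        $indent += $open - $closed; // Next line's indent
    }
    unset($value); // drop the reference left by foreach
    return $array;
}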
Then implode("\n", $array) for the prettified HTML
After the question edit by Felix Kling, I realize that this has nothing to do with the question. Sorry about that :) Thanks though for the clarification.
