Removing double quotes from href - php

I have an HTML string and need to remove the double quotes from inside the href of an anchor tag.
$content = '<p style="abc" rel="blah blah"> Hello I am p </p> ';
should return
$content = '<p style="abc" rel="blah blah"> Hello I am p </p> ';
I have tried
preg_replace('/<a\s+[^>]*href\s*=\s*"([^"]+)"[^>]*>/', '<a href="\1">', $content)
but this removes all attributes from the anchor tag except href. I have not been able to find anything that works only inside the href value.
I'm looking for some PHP code that does this.

You may try:
(<a href=".*?)"(.*?)"(.*)
Explanation of the above regex:
(<a href=".*?) - First capturing group: captures <a href=" and everything up to the next ". Notice the lazy matching, which makes it stop at the first such quote.
" - Matches " literally.
(.*?) - Represents second capturing group capturing data xyz&123 which is in between ".
(.*) - Represents 3rd capturing group which captures everything after the ".
$1\'$2\'$3 - For the replacement part; use the captured groups along with single quotes.
You can find the demo of the above regex in here.
Sample implementation in PHP:
<?php
$re = '/(<a href=".*?)"(.*?)"(.*)/m';
$str = '<p style="abc" rel="blah blah"> Hello I am p </p> ';
$subst = '$1\'$2\'$3';
$result = preg_replace($re, $subst, $str);
echo $result;
You can find the sample run of the above code in here.

I have tried preg_replace('/<a\s+[^>]*href\s*=\s*"([^"]+)"[^>]*>/', '<a href="\1">', $content) regex. but this removes all attributes from anchor tag except for href.
Maybe be a bit more generic - and leave all that <a ...> stuff out of the equation to begin with?
Not too many HTML elements have a href attribute to begin with - and even if you encountered a different one with such a href value, it would not make sense there either, so it would need replacing as well anyway.
Use #href="(\S+)"# as a greedy pattern that finds and captures the longest possible non-whitespace string between href=" and ".
That gives href="https://example.com/abc?name="xyz&123"" as the full match, and just https://example.com/abc?name="xyz&123" as the captured group.
Let’s feed the latter into str_replace to get rid of the ", using preg_replace_callback:
$content = preg_replace_callback('#href="(\S+)"#', function($m) {
return 'href="'.str_replace('"', '', $m[1]).'"';
}, $content);
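As a quick check of the callback approach, here is a minimal sketch. The function name and the sample input are assumptions; the input is built around the href value quoted above. The pattern relies on the URL containing no whitespace, and on the closing quote being the last " in the non-whitespace run that starts at the URL, so markup with no whitespace before a later quoted value could confuse it.

```php
<?php
// Hypothetical wrapper around the callback shown above.
function strip_quotes_inside_href(string $content): string
{
    return preg_replace_callback('#href="(\S+)"#', function (array $m): string {
        // $m[1] is the href value, including any stray inner quotes.
        return 'href="' . str_replace('"', '', $m[1]) . '"';
    }, $content);
}

// Assumed sample input: other attributes must survive untouched.
$content = '<p style="abc"> Hello <a href="https://example.com/abc?name="xyz&123"" rel="nofollow">link</a> </p>';
echo strip_quotes_inside_href($content), "\n";
```

Only the href value changes; style and rel are left alone because the pattern never looks at the rest of the tag.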

Related

PHP | Replicate specific word excluding the title attribute

I'm trying to replace the word "custom" with <span>custom</span>.
With the str_replace() function it works, but it also replaces the word inside the title attribute. I don't want that, because a span tag inside the title is an error.
How can I replace the word "custom" without touching the title attribute?
This is my code:
$oldText = "custom";
$newText = "<span>custom</span>";
$string = "<a href='#' title='Products custom'>Products custom</a>";
str_ireplace($oldText, $newText,$string);
This is just one example.
The word custom can also be placed in the middle of a string or at the beginning...
Thanks
You'll probably have to use PHP's DOM parser to do that. Writing a regular expression to solve it will not work for all cases.
A) With DOM
I would start off with this Stackoverflow answer and then change it a bit to accomplish what you want to do. As you are replacing custom by <span>custom</span> you'll be creating a new DOM element. Replacing the text content won't work because <span> will be escaped and replaced by &lt;span&gt;.
So I would do this:
use preg_match_all() with a pattern such as /\bcustom\b/ to get all the offsets of the found items in the text:
// Search for the word, but delimited by word boundaries to
// avoid matching 'custom' in 'customization' or 'customer'.
$pattern = '/\b' . preg_quote($word_to_search) . '\b/';
if (preg_match_all($pattern, $child->wholeText, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE)) {
var_export($matches);
}
convert these offsets in bytes to offsets in chars (this is because UTF-8 can have chars of 1 or n bytes):
function char_offset($string, $byte_offset, $encoding = null)
{
$substr = substr($string, 0, $byte_offset);
return mb_strlen($substr, $encoding ?: mb_internal_encoding());
}
use DOMText::splitText() to split the text nodes into two text nodes with the offset in char unit.
create a <span> element with DOMDocument::createElement()
$new_text = 'custom'; // or whatever.
$spanElement = $domNode->ownerDocument->createElement('span', $new_text);
insert this span element before the second text node with DOMNode::insertBefore()
correct the second text node to remove the custom word at the beginning.
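The steps above can be put together roughly as follows. This is a sketch, not the answerer's code: the function name is hypothetical, the input is assumed to be a UTF-8 HTML fragment, and it splits the text node twice (before and after the word) instead of trimming the second node by hand. It needs the dom and mbstring extensions.

```php
<?php
// Hypothetical helper: wraps whole-word matches of $word in <span>,
// touching text nodes only, so attributes such as title stay intact.
function wrap_word_in_span(string $html, string $word): string
{
    $doc = new DOMDocument();
    // The meta tag declares UTF-8 input; the <div> is a throwaway container
    // so that only the fragment is serialized back at the end.
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
        . '<body><div>' . $html . '</div></body></html>'
    );
    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//body/div')->item(0);
    $pattern = '/\b' . preg_quote($word, '/') . '\b/iu';

    // Snapshot the text nodes first, because the loop below modifies the tree.
    foreach (iterator_to_array($xpath->query('.//text()', $root)) as $node) {
        $text = $node->nodeValue;
        if (!preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
            continue;
        }
        // Work from the last match to the first so earlier offsets stay valid.
        foreach (array_reverse($m[0]) as [$match, $byteOffset]) {
            // preg offsets are in bytes; splitText() counts characters.
            $charOffset = mb_strlen(substr($text, 0, $byteOffset), 'UTF-8');
            $matchNode = $node->splitText($charOffset);        // begins at the word
            $matchNode->splitText(mb_strlen($match, 'UTF-8')); // cut off what follows
            $span = $doc->createElement('span');
            $matchNode->parentNode->replaceChild($span, $matchNode);
            $span->appendChild($matchNode);                    // move the word into the span
        }
    }

    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrap_word_in_span("<a href='#' title='Products custom'>Products custom</a>", 'custom'), "\n";
```

Note that the serializer may normalize the markup (for example, attribute quoting), so the output is not guaranteed to be byte-for-byte identical to the input outside the replaced words.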
B) With a regex
But if your case is always in a <a> tag then you could have a go with something like this: https://regex101.com/r/ksPqxe/1
For an explanation of the regex, see the description in the right-hand column of that page. The i flag makes the match case-insensitive; remove it if you need a case-sensitive match. The s flag makes . match newlines as well. The quantifiers have to be lazy, so either write .*? or, as I did in the end, set the U (ungreedy) flag and write .*.
This solution will not handle the case of several custom words in the link. But you'll probably only have it once. If you need that then use one regex to get the text content of the link and then a second one to replace all instances of custom by <span>custom</span>.
<?php
$pattern = '/(<a[^>]*>.*)\bcustom\b(.*<\/a>)/isU';
// Or without the ungreedy flag:
//$pattern = '/(<a[^>]*>.*?)\bcustom\b(.*?<\/a>)/is';
$substitution = '$1<span>custom</span>$2';
$inputs = [
"<a href='#' title='Products custom'>Products custom</a>",
'Custom stuff',
'<a href=\"https://www.customer.com\" title=\"customer"
data-type="custom">Customer stuff</a>',
'customize it!',
];
$results = [];
foreach ($inputs as $input) {
$result = preg_replace($pattern, $substitution, $input);
$results[] = "$input\n$result\n";
}
print implode(str_repeat('-', 80) . "\n", $results);
Output:
<a href='#' title='Products custom'>Products custom</a>
<a href='#' title='Products custom'>Products <span>custom</span></a>
--------------------------------------------------------------------------------
Custom stuff
Custom stuff
--------------------------------------------------------------------------------
<a href=\"https://www.customer.com\" title=\"customer"
data-type="custom">Customer stuff</a>
<a href=\"https://www.customer.com\" title=\"customer"
data-type="custom">Customer stuff</a>
--------------------------------------------------------------------------------
customize it!
customize it!

Search and replace each unique word that begins with # symbol in string, even if they're similar

I want to replace all words in a string that start with #. If I use str_replace, everything works until two usernames are similar. I need a way to replace each exact word in full without affecting other, similar words. For example, #johnny and #johnnys would be problematic. Maybe a regex could help?
function myMentions($str){
$str = "Hello #johnny, how is #johnnys doing?"; //let's say this is our param
$regex = "~(#\w+)~"; //my regex to extract all words beginning with #
if(preg_match_all($regex, $str, $matches, PREG_PATTERN_ORDER)){
foreach($matches[1] as $matches){ //iterate over match results
$link = "<a href='www.google.com'>$matches</a>"; //wrap my matches in links
$str = str_replace($matches,$link,$str); //replace matches with links
}
}
return $str;
}
Output should be: Hello <a href=''>#johnny</a>, how is <a href=''>#johnnys</a> doing?
Instead i am getting: Hello <a href=''>#johnny</a>, how is <a href=''>#johnny</a> s doing?
(NOTE: The extra "s" on #johnnys isn't wrapped.)
It doesn't recognize that #johnny and #johnnys are two different words, so str_replace replaces both in one go. Basically, the function takes one word and replaces every occurrence of it at once, including inside longer words.
Your code is unnecessarily complex; a single preg_replace is all you need:
function myMentions($str){
return preg_replace("~#\w+~", "<a href='www.google.com'>\$0</a>", $str);
}
$str = "Hello #johnny, how is #johnnys doing?";
echo myMentions($str);
// => Hello <a href='www.google.com'>#johnny</a>, how is <a href='www.google.com'>#johnnys</a> doing?
See the PHP demo.
The preg_replace("~#\w+~", "<a href='www.google.com'>\$0</a>", $str) matches all non-overlapping occurrences of # + 1 or more word chars, and wraps them with <a href='www.google.com'> and </a> texts. Note the $0 is a backreference to the whole match.
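If each mention should link to its own page rather than a fixed URL, preg_replace_callback can build the link per match. This is a minimal sketch; the function name and the /users/ URL scheme are assumptions made for illustration only.

```php
<?php
// Hypothetical variant: one link per mention, built in a callback.
function link_mentions(string $str): string
{
    return preg_replace_callback('~#(\w+)~', function (array $m): string {
        // $m[0] is the whole match (#johnny), $m[1] the name without the #.
        $url = '/users/' . rawurlencode($m[1]); // assumed URL scheme
        return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">'
            . htmlspecialchars($m[0], ENT_QUOTES) . '</a>';
    }, $str);
}

echo link_mentions("Hello #johnny, how is #johnnys doing?"), "\n";
```

Because \w+ is greedy, #johnnys is consumed as one match, so the shorter #johnny can never be replaced inside it.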

Php select from string

Hi, I'm new to PHP and I need a little help.
I need to take the text that is between the two * characters in a PHP string and put it inside an HTML tag:
$text = "this is an *example*";
But I really don't know how, and I need help.
Personally, I would use explode; you can then piece the sentence back together if the example appears in the middle of a sentence.
<?php
$text = "this is an *example*";
$pieces = explode("*", $text);
echo $pieces[0];
?>
Edit:
Since you're looking for what basically amounts to custom BB Code use this
$text = "this is an *example*";
$find = '~[\*](.*?)[\*]~s';
$replace = '<span style="color: green">$1</span>';
echo preg_replace($find,$replace,$text);
You can put this in a function and have it parse any text passed to it. You can also turn the find and replace variables into arrays to support more codes.
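A minimal sketch of that idea, assuming a hypothetical function name and a second, made-up code (_text_) added purely for illustration:

```php
<?php
// Hypothetical function; preg_replace accepts parallel arrays of
// patterns and replacements and applies them in order.
function parse_codes(string $text): string
{
    $find = [
        '~\*(.*?)\*~s', // *text*  (same idea as the pattern above)
        '~_(.*?)_~s',   // _text_  (assumed extra code)
    ];
    $replace = [
        '<span style="color: green">$1</span>',
        '<em>$1</em>',
    ];
    return preg_replace($find, $replace, $text);
}

echo parse_codes("this is an *example* with _emphasis_"), "\n";
```

Each pattern runs over the result of the previous one, so the order of the arrays matters if one code's replacement could contain another code's delimiter.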
You really should use a DOM parser for things like this, but if you can guarantee it will always be the * character, you can use a regex:
$text = "this is an *example*";
$regex = '/(?<=\*)(.*?)(?=\*)/';
$replacement = 'ostrich';
$new_text = preg_replace($regex, $replacement, $text);
echo $new_text;
Returns
this is an *ostrich*
Here is how the regex works:
Positive Lookbehind (?<=\*)
\* matches the character * literally (case sensitive)
1st Capturing Group (.*?)
.*? matches any character (except for line terminators)
*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Positive Lookahead (?=\*)
\* matches the character * literally (case sensitive)
In short, the lookbehind and lookahead check for the * on either side of the text without making it part of the match, so preg_replace() leaves those characters intact.

Regular expression to extract data from a column and put it in another column

I have a MySQL table with a few columns.
Column 1 contains html code:
<p style="xxx"><img src="path/to/file.png(or jpg)"></p>
I want to extract the src (path/file.xxx) to column 2, and then remove the whole P tag from column 1.
I tried a few techniques, such as
preg_match('/\< *[img][^\>]*[src] *= *[\"\']{0,1}([^\"\']*)/i', $row->image, $matches);
But nothing seems to work.
Anything simple and light to use?
[] represents a character set, not a sequence of characters.
$html = '<p style="xxx"><img src="path/to/file.png(or jpg)"></p>';
preg_match('/<img src="([^"]*)">/', $html, $m);
echo $m[0] . "\n";
echo $m[1] . "\n";
outputs:
<img src="path/to/file.png(or jpg)">
path/to/file.png(or jpg)
With all the disclaimers about using regex to handle html, you can use a simple preg_replace:
$replaced = preg_replace('~<p[^>]*><img src="([^"]+)"></p>~', '$1', $yourstring);
Explanation
<p matches the beginning of the tag
[^>]* matches any chars that are not a >
><img src=" matches literal characters
The parentheses in ([^"]+) capture any chars that are not a " to Group 1 (this is what you want)
"></p> matches literal characters
We replace the whole string with $1, which is a backreference to the content captured by Group 1
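To get both pieces the question asks for (the src for column 2, and column 1 without the P tag), one match can feed both results. This is a sketch with a hypothetical function name; it works on a plain string and leaves the database read and update out entirely. The pattern is slightly more tolerant than the one above (optional whitespace, either quote style), which is an assumption about the stored markup.

```php
<?php
// Hypothetical helper: returns [src, html with the <p><img></p> removed].
// If no image paragraph is found, src is null and the html is unchanged.
function split_image_column(string $html): array
{
    $pattern = '~<p[^>]*>\s*<img[^>]*\bsrc\s*=\s*["\']([^"\']+)["\'][^>]*>\s*</p>~i';
    if (!preg_match($pattern, $html, $m)) {
        return [null, $html];
    }
    // Remove only the first matching paragraph from the original value.
    $remaining = preg_replace($pattern, '', $html, 1);
    return [$m[1], $remaining];
}

[$src, $rest] = split_image_column('<p style="xxx"><img src="path/to/file.png"></p>');
echo $src, "\n";
```

The first element would go into column 2 and the second back into column 1.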

Having trouble with preg_split and using regular expressions

I'm having a tough time learning regex and preg_split.
I'm trying to apply what I've learned and can't seem to get a simple search going.
I've tried many variations, but can't separate between the bold tags, and only the bold tags.
<?php
$string = "<b>this is</b> <i>not</b> <b>bold</b>";
$find = '/<b>/'; // works as expected, separating at <b>
$find = '/<b>|<\/b>/'; // works as expected, separating at either <b> or </b>
$find = '/<b>*<\/b>/'; // why doesn't this work?
$find = '/^<b>*<\/b>/'; // why doesn't this work?
$find = '/<b>.<\/b>/'; // why doesn't this work
$result = preg_split($find, $string);
print_r($result);
?>
As you can see, I'm trying to incorporate the ., + or start ^ / finish $ characters.
What am I doing so very wrong where it isn't working the way I expected?
Thanks for all your help!
p.s. found this which is very helpful too
The first two "why doesn't this work" are matching <b followed by zero or more > characters, followed by </b>. The last one matches <b> then any single character then </b>.
I'm not sure what you're trying to do exactly, but this would split on start and end bold tags: <\/?b> - it matches <, followed by an optional /, followed by b>.
$find = '/<b>*<\/b>/'; // why doesn't this work?
Matches "<b", zero or more ">", followed by "</b>".
Perhaps you meant this:
$find = '/<b>.*?<\/b>/';
That would match "<b>", followed by a string of any length, ending at the first occurrence of "</b>". I'm not sure why you would split on that though; applied to the string above, you would get an array of three elements:
""
" <i>not</b> "
""
To match everything inside "<b>" and "</b>" you need preg_match_all():
preg_match_all('#<b>(.*?)</b>#i', $str, $matches);
// $matches[1] will contain the text inside each bold tag, theoretically
Do note that nested tags are not a great fit for regular expressions; for those you would want to use DOMDocument.
$find = '/^<b>*<\/b>/'; // why doesn't this work?
Matches "<b" at the start of the string, zero or more ">", followed by "</b>".
$find = '/<b>.<\/b>/'; // why doesn't this work
Matches "<b>", followed by any character, followed by "</b>".
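To compare the two approaches on the string from the question, here is a small sketch. The variable names are illustrative; the patterns are the ones suggested in the answers above.

```php
<?php
$string = "<b>this is</b> <i>not</b> <b>bold</b>";

// Split on either an opening or a closing bold tag.
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
$parts = preg_split('/<\/?b>/', $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($parts);

// Capture the text between each <b> ... </b> pair instead.
preg_match_all('#<b>(.*?)</b>#i', $string, $matches);
print_r($matches[1]);
```

Splitting keeps everything that is not a tag, including the " <i>not" fragment, while preg_match_all keeps only what sits between a <b> and the next </b>.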
