Okay, I might not be going about this the right way, but here goes...
I have this code that takes a link and extracts the text between the tags:
$string = $item;
$pattern = '/\<a([^>]*)\>([^<]*)\<\/a\>/i';
$replacement = '$2';
$message = preg_replace($pattern, $replacement, $string);
There are a few items in this string that have ampersands (in the text portion, not the tag portion); however, most don't. I'm trying to figure out a way to either incorporate the ampersand into the current pattern or do another preg_replace on $message to remove the ampersand after the tags are stripped away.
THANKS!
There's always $message = str_replace('&', '', $message);
Incidentally, if you are trying to strip tags from HTML input, there is also strip_tags().
For example, if your input is
$text = 'Text';
Then strip_tags($text) will produce Text.
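A minimal sketch of the two suggestions above, using a made-up `$item` since the original example markup was lost: `strip_tags()` removes the `<a>` wrapper, then `str_replace()` drops the ampersands. Removing `&` from `Salt & Pepper` leaves two spaces behind.

```php
<?php
// Hypothetical input: a link whose text contains an ampersand.
$item = '<a href="http://example.com/menu">Salt & Pepper</a>';

// Step 1: remove the tag itself, keeping only the link text.
$message = strip_tags($item);              // "Salt & Pepper"

// Step 2: remove every literal "&" from the remaining text.
$message = str_replace('&', '', $message); // "Salt  Pepper" (two spaces)

echo $message;
```

If the markup contains the entity `&amp;` rather than a bare `&`, this would leave `amp;` behind; running `html_entity_decode()` first is one way around that.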
Do you want to remove everything after the ampersand? Then it's
'/\<a([^>]*)\>([^<&]*)[^<]*\<\/a\>/i';
Otherwise, you'll need a 2nd operation.
BTW: Your regex will also match other tags starting with <a, such as the <author> or the <audio> tag.
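To compare the one-step pattern above with a two-step alternative, here is a sketch on a made-up input. The `\<` and `\>` escapes are dropped because `<` and `>` are not special in PCRE; otherwise the patterns are the ones quoted above.

```php
<?php
$item = '<a href="http://example.com/menu">Salt & Pepper</a>'; // made-up input

// One step: group 2 stops at the first "&"; [^<]* swallows the rest of the text.
$oneStep = preg_replace('/<a([^>]*)>([^<&]*)[^<]*<\/a>/i', '$2', $item);

// Two steps: extract the whole link text, then strip only the ampersands.
$twoStep = preg_replace('/<a([^>]*)>([^<]*)<\/a>/i', '$2', $item);
$twoStep = str_replace('&', '', $twoStep);

var_dump($oneStep, $twoStep);
```

The one-step version keeps only `"Salt "`, while the two-step version keeps `"Salt  Pepper"`.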
Related
I'm using the following code to convert URLs within a text into real HTML links:
$string = preg_replace("!(http|https|ftp|ftps)://([.]?[&;%#:=a-zA-Z0-9_/?-])*!", "\\0", $string);
$string = preg_replace("!(^| |\n)(www([.]?[&;%#:=a-zA-Z0-9_/?-])+)!", "\\1\\2", $string);
Now there is a problem with URLs containing the # symbol (and perhaps other special characters too), and I think I need to call the urlencode() function somehow within this preg_replace() call.
Any ideas?
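One way to run a PHP function on each match is `preg_replace_callback()`, which passes the match array to a closure instead of using a fixed replacement string. The sketch below reuses the first pattern from the question; the function name `linkify` is hypothetical. It uses `htmlspecialchars()` rather than `urlencode()`, on the assumption that the goal is a valid `href` attribute: `urlencode()` applied to a whole URL would also encode `:` and `/` (giving `http%3A%2F%2F...`).

```php
<?php
// Hypothetical helper: wrap every URL found in $text in an <a> tag.
function linkify(string $text): string
{
    return preg_replace_callback(
        '!(http|https|ftp|ftps)://([.]?[&;%#:=a-zA-Z0-9_/?-])*!',
        function (array $m): string {
            // $m[0] is the whole matched URL; escape it for use inside HTML.
            $url = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
            return '<a href="' . $url . '">' . $url . '</a>';
        },
        $text
    );
}

echo linkify('See http://example.com/page?a=1&b=2#top for details.');
```

The `#` fragment is kept as-is and `&` becomes `&amp;`, which is how it should appear inside an HTML attribute.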
How can I replace contents of a tag with its link
$str = 'This <strong>string</strong> contains a local link
and a remote link';
$str = strip_tags($str,'<a>'); // strip out the <strong> tag
$str = ?????? // how can I strip out the local link anchor tag, but leave the remote link?
echo $str;
Desired output:
This string contains a local link and a remote link
Or, better yet, replace the contents of the remote link with its URL:
This string contains a local link and a http://remo.te/link.com
How can I achieve the final output?
To replace your remotely linked anchor with the URL:
<a href="(https?://[^"]+)">.*?</a>
$1
To remove the anchor around a local URL:
<a href="(?!https?://)[^"]+">(.*?)</a>
$1
Explanation:
Both expressions match <a href=", "> and </a> literally. The first one then matches a remote URL (http, optional s, :// and everything up to the closing ") in a capture group that we can reference with $1. The second expression matches any href that does not start with the protocol used previously, and then captures the actual text of the link into $1.
Please note that regular expressions aren't the best solution for parsing HTML, since HTML is not a regular language. However, your use case seems "simple" enough that a regular expression will do. This will not work with every possible link markup, but it can be expanded to allow for such cases (hence my earlier note about HTML not being regular).
PHP
$str = 'This <strong>string</strong> contains a local link and a remote link';
$str = strip_tags($str,'<a>');
$str = preg_replace('~<a href="(https?://[^"]+)".*?>.*?</a>~', '$1', $str);
$str = preg_replace('~<a href="(?!https?://)[^"]+">(.*?)</a>~', '$1', $str);
echo $str;
// This string contains a local link and a http://remo.te/link.com
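Because the example markup in the question was lost, here is the same two-pass approach on a hypothetical input string. The local path `/local/link.html` is made up; the remote URL comes from the desired output. The second pattern is the negative-lookahead version described in the explanation above.

```php
<?php
// Hypothetical input (the local href is invented for illustration).
$str = 'This <strong>string</strong> contains a <a href="/local/link.html">local link</a>'
     . ' and a <a href="http://remo.te/link.com">remote link</a>';

$str = strip_tags($str, '<a>'); // keep only <a> tags
// Pass 1: absolute http(s) links -> their URL.
$str = preg_replace('~<a href="(https?://[^"]+)".*?>.*?</a>~', '$1', $str);
// Pass 2: remaining links whose href does not start with http(s):// -> their text.
$str = preg_replace('~<a href="(?!https?://)[^"]+">(.*?)</a>~', '$1', $str);

echo $str;
```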
Note: HTML is not a regular language and can't be reliably parsed using a regular expression. Use a DOM parser instead.
However, if you're absolutely sure of the format, you can use a regex. The whole task just needs to be split into two steps:
/* Replace relative URIs with their anchor text */
$str = preg_replace('#<a[^>]*href="(?=/)[^"]+">([^<]+)</a>#', '$1', $str);
/* Replace absolute URIs with their href */
$str = preg_replace('#<a[^>]*href="((?!/)[^"]+)">[^<]+</a>#', '$1', $str);
Of course, this would fail if one of the attribute values contains a >. Using a DOM parser would be the right solution if you care about those corner cases.
Output:
This string contains a local link
and a http://remo.te/link.com
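A runnable version of the two lookahead replacements, again on a hypothetical input (`/local/link.html` is invented for illustration). `strip_tags()` is applied at the end to remove `<strong>`, as the question does; the answer itself does not include that step.

```php
<?php
$str = 'This <strong>string</strong> contains a <a href="/local/link.html">local link</a>'
     . ' and a <a href="http://remo.te/link.com">remote link</a>'; // hypothetical input

/* Relative URIs (href starting with "/") -> anchor text */
$str = preg_replace('#<a[^>]*href="(?=/)[^"]+">([^<]+)</a>#', '$1', $str);
/* Everything else -> the href itself */
$str = preg_replace('#<a[^>]*href="((?!/)[^"]+)">[^<]+</a>#', '$1', $str);
/* Drop remaining tags such as <strong> */
$str = strip_tags($str);

echo $str;
```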
This is possible using the DOMDocument class, for example:
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadHTML($str);
and then process the links returned by:
$doc->getElementsByTagName('a')
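A fuller sketch of the DOMDocument approach, assuming the goal from the question (remote `http(s)` links become their URL, other links become their text). The wrapper `<div>`, the function name `rewriteLinks`, and the sample href `/local/link.html` are illustrative choices, not part of the answer. The links are copied into an array first because `getElementsByTagName()` returns a live list that changes as nodes are replaced. `loadHTML()` assumes ISO-8859-1 when no charset is declared, so non-ASCII input would need extra handling.

```php
<?php
// Hypothetical helper: replace each <a> with its href (http/https) or its text.
function rewriteLinks(string $html): string
{
    libxml_use_internal_errors(true); // silence warnings about HTML fragments
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    // Copy the live node list so replacing nodes doesn't disturb iteration.
    $links = iterator_to_array($doc->getElementsByTagName('a'));
    foreach ($links as $a) {
        $href = $a->getAttribute('href');
        $text = preg_match('~^https?://~i', $href) ? $href : $a->textContent;
        $a->parentNode->replaceChild($doc->createTextNode($text), $a);
    }

    // Serialize only the children of the wrapper <div>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$str = 'This <strong>string</strong> contains a <a href="/local/link.html">local link</a>'
     . ' and a <a href="http://remo.te/link.com">remote link</a>'; // hypothetical input
echo strip_tags(rewriteLinks($str));
```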
Here's how I solved it:
$str = 'This <strong>string</strong> contains a local link and a remote link';
$str = preg_replace('/<a [^>]*?href="(http:\/\/[A-Za-z0-9\\.:\/]+?)">([\\s\\S]*?)<\/a>/','\\1', $str); // strip remote links and replace with href
$str = strip_tags($str); // strip any local links
echo $str;
Result:
This string contains a local link and a http://remo.te/link.com
If this string is not created dynamically and you know the href values in advance, you can try:
$str = 'This <strong>string</strong> contains a local link
and a remote link';
$str = str_replace(array('', ''), ' ' , $str);
$str = strip_tags($str,'<a>'); // strip out the <strong> tag
echo $str;
Result:
This string contains a local link and a remote link
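Simple HTML DOM might just be your best bet here:
// assumes the Simple HTML DOM library (simple_html_dom.php) is already included
$doc = str_get_html($html);
foreach ($doc->find('a') as $a) {
    // links whose href starts with "http" become the href, all others their text
    $a->outertext = preg_match('/^http/', $a->href) ? $a->href : $a->text();
}
echo $doc;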
In my case, I needed something that replaces the anchor tag but keeps both the link AND the inner text of the anchor. I therefore modified @Sam's solution and added an extra matching group for the inner text.
$text = strip_tags($html,'<a>');
$text = preg_replace('~<a href="(https?://[^"]+)".*?>(.*?)</a>~', '$2 ($1)', $text);
For <a href="https://stackoverflow.com">Stackoverflow</a>, the above code would output Stackoverflow (https://stackoverflow.com).
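A small runnable sketch of this variant; the function name `linkWithUrl` and the sample sentence are made up. Relative links are left untouched, since the pattern only matches `http://` and `https://` hrefs.

```php
<?php
// Hypothetical helper: turn <a href="URL">text</a> into "text (URL)".
function linkWithUrl(string $html): string
{
    $text = strip_tags($html, '<a>'); // drop every tag except <a>
    // $1 = absolute URL, $2 = link text.
    return preg_replace('~<a href="(https?://[^"]+)".*?>(.*?)</a>~', '$2 ($1)', $text);
}

echo linkWithUrl('Visit <a href="https://stackoverflow.com">Stackoverflow</a> <em>today</em>');
```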
So basically I have a large string (a few paragraphs long).
I need to remove all text from this string that is not surrounded by any HTML tags.
For example, this string:
<h1>This is the title</h1>This is a bit of text with no HTML around it<p>This is within a paragraph tag</p>
Should be converted to:
<h1>This is the title</h1><p>This is within a paragraph tag</p>
I believe this is best done with regex, although I am not very familiar with its syntax.
All help is greatly appreciated.
This is what I ended up using:
<?php
$string = '<h1>This is the title</h1>This is a bit of text with no HTML around it<p>This is within a paragraph tag</p>';
$pattern = '/(<\/[^>]+>)[^<]*(<[^>]+>)/';
$replacement = '$1$2';
echo preg_replace($pattern, $replacement, $string);
?>
You could use this regex, (<\/[^>]+>)[^<]*(<[^>]+>), and replace with $1$2.
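One caveat with this regex: it only removes text that sits between a closing tag and the next tag, so bare text at the very start or end of the string survives. If that matters, here is a sketch that swaps in DOMDocument instead of a regex and keeps only the element children; the `<div>` wrapper, the function name `removeBareText`, and the sample input are illustrative assumptions.

```php
<?php
// Hypothetical helper: keep top-level elements, drop top-level bare text.
function removeBareText(string $html): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        // Keep elements (<h1>, <p>, ...); skip text nodes between them.
        if ($child->nodeType === XML_ELEMENT_NODE) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

$input = 'Lead text<h1>Title</h1>Bare<p>Para</p>Trailing';
echo preg_replace('/(<\/[^>]+>)[^<]*(<[^>]+>)/', '$1$2', $input), "\n"; // regex version
echo removeBareText($input), "\n";                                      // DOM version
```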
I want to remove the space between two words with a regex; however, this does not seem to work.
$pattern = "#\<a href=\"(.+?) (.+?)\">#is";
$txt = preg_replace($pattern, "\<a href=\"\\1%20\\2\">", $txt);
I also need this to work for multiple words, but only within the tags, as the rest of the text should keep its spaces. So a str_replace won't work (I think?).
Any tips?
The stable solution would be: Use DOM to retrieve the href value, use str_replace() to remove the spaces and then write back the value using DOM again.
Don't use regexes to handle html / xml.
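A minimal sketch of that DOM approach; the `<div>` wrapper, the function name `encodeHrefSpaces`, and the sample markup are illustrative assumptions. Only `href` values are touched, so spaces in the surrounding text and in the link text are preserved.

```php
<?php
// Hypothetical helper: replace spaces in every <a href> with %20.
function encodeHrefSpaces(string $html): string
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    // Only attributes change here; no nodes are added or removed.
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $href = str_replace(' ', '%20', $a->getAttribute('href'));
            $a->setAttribute('href', $href);
        }
    }

    // Serialize only the children of the wrapper <div>.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo encodeHrefSpaces('See <a href="my file name.pdf">my file</a> here');
```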
Try this regex to remove whitespace:
\s+(?=[^()]*\
You can try the regex:
$txt = preg_replace('~(?:href="|(?<!^)\G)\K([^" ]*)\s+~', "$1%20", $txt); // no "g" flag: preg_replace() already replaces every match
\G matches at the end of the previous match so that you can replace multiple spaces in a single attribute.
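A runnable sketch of the `\G` approach, without the `g` modifier (PHP does not support it, and `preg_replace()` already replaces every match). The function name and sample markup are hypothetical. `\K` drops `href="` from the reported match, so only each word plus its trailing whitespace is replaced; a run of several spaces collapses into a single `%20`.

```php
<?php
// Hypothetical helper: encode the spaces inside href="..." values as %20.
function encodeHrefSpacesRegex(string $txt): string
{
    // href=" starts a run; (?<!^)\G continues exactly where the last match ended.
    // \K keeps href=" out of the replaced text; $1 is the word before the spaces.
    return preg_replace('~(?:href="|(?<!^)\G)\K([^" ]*)\s+~', '$1%20', $txt);
}

echo encodeHrefSpacesRegex('<a href="my file name.pdf">my file</a>');
```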
OK, I have a section of code with things like:
<a title="title" href="http://example.com">Text</a>
I need to reformat these somehow so that they become:
<b>Text</b>
There are at least 24 links being changed, and they all have different titles and hrefs. Thanks in advance, Austin.
Although not optimal, you can do this with regular expressions:
$string = '<a title="title" href="http://example.com">Text</a>';
$string = preg_replace("/<a\s(.+?)>(.+?)<\/a>/is", "<b>$2</b>", $string);
echo($string);
This essentially says, look for a part of the string that has the form <a*>{TEXT}</a>, copy the {TEXT}, and replace that whole matched string with <b>{TEXT}</b>.
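Since `preg_replace()` replaces every match, the same call handles all of the links in one pass. Here is a sketch with two made-up links that have different titles and hrefs:

```php
<?php
$html = '<a title="first" href="http://example.com/1">One</a> and '
      . '<a title="second" href="http://example.com/2">Two</a>'; // made-up input

// (.+?) is lazy, so for simple markup like this each match ends at the first </a>.
$bold = preg_replace('/<a\s(.+?)>(.+?)<\/a>/is', '<b>$2</b>', $html);

echo $bold;
```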
Try this,
$link = '<a title="title" href="http://example.com">Text</a>';
echo $formatted = "<b>".strip_tags($link)."</b>";
Check this link out as well, I think this is what you are looking for.
You should read up on regular expressions, because you will need them sooner or later anyway. If you don't care about the content of the href attribute, you can use:
s/<a(?:\s[^>]*)?>([^<]+)<\/a>/<b>\1<\/b>/
The part between the first pair of slashes searches for the opening tag (either <a> alone or with some attributes; in that case a whitespace character \s is required, so that tags such as <abbr> are not matched as well), then some text, which is stored by the brackets, and then the closing tag. The part between the second pair of slashes is the replacement, where \1 denotes the text matched by the brackets in the first part.
See also PHP’s preg_replace function. The final expression would then look like this (tested):
preg_replace('/<a(?:\s[^>]*)?>([^<]+)<\/a>/i', '<b>\\1</b>', '<a title="title" href="http://example.com">Text</a>');
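To check that the `(?:\s[^>]*)?` part keeps tags like `<abbr>` from matching, here is a small sketch with a made-up string containing both kinds of tag:

```php
<?php
$html = '<abbr title="HyperText Markup Language">HTML</abbr> and '
      . '<a title="title" href="http://example.com">Text</a>'; // made-up input

// "<a" must be followed directly by whitespace or ">", so "<abbr" is skipped.
$result = preg_replace('/<a(?:\s[^>]*)?>([^<]+)<\/a>/i', '<b>\\1</b>', $html);

echo $result;
```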
To replace one tag with another while matching and reusing its attribute values at the same time:
$string = '<img src="https://example.com/img.png" alt="Alternative"/>';
$string = preg_replace('/<img src="(.+?)" alt="(.+?)"\\/>/is','<amp-img src="$1" alt="$2"></amp-img>',$string);
//$string is now '<amp-img src="https://example.com/img.png" alt="Alternative"></amp-img>'