urlencode within preg_replace, possible? - php

I'm using the following code to convert URLs within a text into real HTML links:
$string = preg_replace("!(http|https|ftp|ftps)://([.]?[&;%#:=a-zA-Z0-9_/?-])*!", "<a href=\"\\0\">\\0</a>", $string);
$string = preg_replace("!(^| |\n)(www([.]?[&;%#:=a-zA-Z0-9_/?-])+)!", "\\1<a href=\"http://\\2\">\\2</a>", $string);
Now there is a problem with URLs containing the # symbol (and perhaps other special characters too), and I would somehow need to call urlencode() inside this preg_replace() call.
Any ideas?
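The usual way to run a PHP function such as urlencode() on each match is preg_replace_callback(), which passes every match to a callback and uses its return value as the replacement. Below is a minimal sketch, assuming the goal is simply to build a safe link from each matched URL: it reuses the first pattern above and escapes the URL with htmlspecialchars(). The helper name linkify() is hypothetical. Calling urlencode() on a whole URL would also encode the :// and / characters, so if you need it, apply it to individual parts (for example query values) inside the callback.

```php
<?php
// Hypothetical helper: wraps each matched URL in an <a> tag.
// preg_replace_callback() hands every match to the closure, so any PHP
// function (htmlspecialchars(), urlencode(), ...) can be applied per match.
function linkify(string $text): string
{
    $pattern = '!(http|https|ftp|ftps)://([.]?[&;%#:=a-zA-Z0-9_/?-])*!';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is the whole matched URL; escape it for use inside HTML.
        $url = htmlspecialchars($m[0], ENT_QUOTES);
        return '<a href="' . $url . '">' . $url . '</a>';
    }, $text);
}

echo linkify('see http://example.com/a?b=1&c=2#top now'), "\n";
// prints: see <a href="http://example.com/a?b=1&amp;c=2#top">http://example.com/a?b=1&amp;c=2#top</a> now
```

The # stays inside the match because it is part of the character class; only & is turned into &amp; by htmlspecialchars().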

Related

Include all emojis and some special characters in hashtag link using Regex in PHP

I'm extracting usernames that start with a # sign from a string and converting them to hashtag links. My code is:
$str = "#John_Smith #💥DanielCarter and #Jack🙂Foster are the good programmers";
$regex = "/#+([a-zA-Z0-9#_]+)/";
$string = preg_replace($regex, '$0', $str);
Now the problem is that the hashtag link stops at the first emoji instead of continuing through the rest of the name, and I know the pattern below is used to match emojis:
(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])
But I don't know how to use it in this situation.
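One option, sketched under the assumption that a hashtag should continue through letters, digits, underscores and emoji: add the u modifier so PCRE works on UTF-8 code points, and use Unicode properties instead of a-zA-Z. \p{L} covers letters in any script, \p{N} digits, and \p{So} ("other symbol") most single-code-point emoji. The \ud83c... surrogate notation above is JavaScript style and PHP's PCRE does not accept \u escapes. The /hashtag/ URL prefix and the hashtagLinks() name are hypothetical placeholders.

```php
<?php
// Sketch: turn "#tag" into a link, letting the tag include emoji.
// The /u modifier makes PCRE treat the subject as UTF-8 instead of bytes.
function hashtagLinks(string $str): string
{
    $regex = '/#([\p{L}\p{N}_\p{So}]+)/u';

    return preg_replace_callback($regex, function (array $m): string {
        // Hypothetical URL scheme; rawurlencode() percent-encodes the emoji bytes.
        $href = '/hashtag/' . rawurlencode($m[1]);
        return '<a href="' . $href . '">' . $m[0] . '</a>';
    }, $str);
}

echo hashtagLinks('#John_Smith #💥DanielCarter and #Jack🙂Foster are the good programmers'), "\n";
```

Emoji built from several code points (skin-tone modifiers, variation selectors, zero-width-joiner sequences) would need further additions to the class, such as \p{Sk}, \p{Mn} or \x{200D}.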

Remove redundant tags '<p><br></p>' in php?

I want to remove the redundant <p><br></p> tags at the beginning and end of the string, and leave only one of them in the middle.
Input:
<p><br></p><p><br></p><p><br></p><p>gfdsgfdsgfds</p><p><br></p><p><br></p><p><br></p><p>gfdsgfdsgfdsgfds</p><p><br></p><p><br></p><p><br></p>
Desired output:
<p>gfdsgfdsgfds</p><p><br></p><p>gfdsgfdsgfdsgfds</p>
Alternative desired output:
<p>gfdsgfdsgfds</p><p><br></p><p><br></p><p><br></p><p>gfdsgfdsgfdsgfds</p>
I've tried using preg_replace:
$string = preg_replace('/(<p><br></p>)+/', '', $string);
But the result is null.
You need to escape the slash / character in your regex:
$string = preg_replace('/(<p><br><\/p>)+/', '', $string);
Also note that this removes every occurrence of the pattern, including runs of several, resulting in the following:
<p>gfdsgfdsgfds</p><p>gfdsgfdsgfdsgfds</p>
To collapse the duplicates but leave one instance, you could do the following:
$string = preg_replace('/(<p><br><\/p>)+/', '<p><br></p>', $string);
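That last replacement still leaves one <p><br></p> at the very start and at the very end of the string. To reach the first desired output, one possible sketch (the function name is hypothetical) trims the runs at both ends first and then collapses the remaining ones. Using # as the delimiter means the / in </p> needs no escaping.

```php
<?php
// Hypothetical helper: drop empty paragraphs at both ends, keep one in between.
function cleanEmptyParagraphs(string $html): string
{
    // Step 1: remove every run of <p><br></p> anchored at the start (^) or end ($).
    $html = preg_replace('#^(?:<p><br></p>)+|(?:<p><br></p>)+$#', '', $html);

    // Step 2: collapse each remaining run in the middle to a single instance.
    return preg_replace('#(?:<p><br></p>)+#', '<p><br></p>', $html);
}

$input = '<p><br></p><p><br></p><p><br></p><p>gfdsgfdsgfds</p><p><br></p><p><br></p><p><br></p><p>gfdsgfdsgfdsgfds</p><p><br></p><p><br></p><p><br></p>';
echo cleanEmptyParagraphs($input), "\n";
// prints: <p>gfdsgfdsgfds</p><p><br></p><p>gfdsgfdsgfdsgfds</p>
```

Stopping after step 1 gives the alternative desired output instead.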
HTML Purifier (http://htmlpurifier.org/) may also help you. It can clean up HTML code and, if needed, remove JavaScript, for example.

PHP: Preg_match and replace all

I have text containing hyperlinks in an unusual format, and I want to replace all of them with normal HTML hyperlinks.
This works for a single hyperlink:
$string = '<u>\\n\\\\*HYPERLINK \\"http://www.youtube.com/watch?v=A0VUsoeT9aM\\"A Youtube Video</u>';
$pattern = '/http[?.:=\\w\\d\\/]*/';
$namePattern = '/(?:")([\\s\\w]*)</';
preg_match($pattern, $string, $matches);
preg_match($namePattern, $string, $nameMatches);
echo '<a href="'.$matches[0].'">'.$nameMatches[1].'</a>';
But the text contains more than one hyperlink, and I want to convert all of them:
<?php
$input = 'Blablabla Beginning Text <u>\\n\\\\*HYPERLINK \\"http://www.youtube.com/watch?v=A0VUsoeT9aM\\"1.A Youtube Video</u> blablabla Text Middle <u>\\n\\\\*HYPERLINK \\"http://www.youtube.com/watch?v=A0VUsoeT9aM\\"2. A Youtube Video</u> blabla Text after';
//To become:
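$output = 'Blablabla Beginning Text <a href="http://www.youtube.com/watch?v=A0VUsoeT9aM">1. A Youtube Video</a>blablabla Text Middle <a href="http://www.youtube.com/watch?v=A0VUsoeT9aM">2. A Youtube Video</a> blabla Text after';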
?>
How would I do that?
Since you want to replace the found matches, use preg_replace(), which does exactly that. However, you'll run into one obvious problem: there are currently two calls to preg_match(). Should those be replaced by two calls to preg_replace()? No. Combine them.
$pattern = '/http[?.:=\w\d\\/]*/';
$namePattern = '/(?:")([\s\w]*)</';
These can be combined into the following (I added . to the $namePattern part so that it also works with the second example text, where the link description contains a dot):
$replacePattern = '/(http[?.:=\w\d\\/]*)\\\\"([\s\w.]*)</';
This works because link and text are separated by \\" in the original text. I tested the pattern with preg_match_all() and it works. Also, wrapping the first pattern in () turns it into a capture group, so both parts are now grouped.
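A sketch of that preg_match_all() check, run against the multi-link $input from the question: the captured groups come back as parallel arrays, one entry per link.

```php
<?php
// Input from the question (single quotes, so \\ is one literal backslash).
$input = 'Blablabla Beginning Text <u>\\n\\\\*HYPERLINK \\"http://www.youtube.com/watch?v=A0VUsoeT9aM\\"1.A Youtube Video</u> blablabla Text Middle <u>\\n\\\\*HYPERLINK \\"http://www.youtube.com/watch?v=A0VUsoeT9aM\\"2. A Youtube Video</u> blabla Text after';

$replacePattern = '/(http[?.:=\w\d\\/]*)\\\\"([\s\w.]*)</';

// $matches[1] holds every URL (group 1), $matches[2] every description (group 2).
preg_match_all($replacePattern, $input, $matches);
print_r($matches[1]);
print_r($matches[2]);
```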
$replacePattern = '/(http[?.:=\w\d\\/]*)\\\\"([\s\w.]*)</';
// ^-group1-----------^ ^-group2-^
These groups can now be used in the replacement string.
$replaceWith = '<a href="\\1">\\2</a><';
Here \\1 refers to the first group and \\2 to the second. The < at the end is necessary because preg_replace() replaces the whole matched pattern (not just the groups), and since the < is the last character of the pattern, we would lose it if it weren't in the replacement.
All you need now is to call preg_replace() with these parameters:
$output = preg_replace($replacePattern, $replaceWith, $string);
All occurrences of $replacePattern will now be replaced with the corresponding $replaceWith, and the result is saved in $output.
If you want a larger part to be removed, just extend the $replacePattern.
$replacePattern = '/<u>.*?(http[?.:=\w\d\\/]*)\\\\"([\s\w.]+)<\/u>/';
$replaceWith = '<a href="\\1">\\2</a>';
Here .*? matches any characters but is lazy (non-greedy), meaning it stops at the first occurrence of whatever comes after it (here, http...).
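Putting the pieces together, a runnable sketch of the larger pattern applied to the question's $input. The link descriptions are kept exactly as they appear in the input (1.A without a space).

```php
<?php
$input = 'Blablabla Beginning Text <u>\\n\\\\*HYPERLINK \\"http://www.youtube.com/watch?v=A0VUsoeT9aM\\"1.A Youtube Video</u> blablabla Text Middle <u>\\n\\\\*HYPERLINK \\"http://www.youtube.com/watch?v=A0VUsoeT9aM\\"2. A Youtube Video</u> blabla Text after';

// Group 1 = URL, group 2 = link text; the lazy .*? skips the \n\\*HYPERLINK \" prefix.
$replacePattern = '/<u>.*?(http[?.:=\w\d\\/]*)\\\\"([\s\w.]+)<\/u>/';
$replaceWith = '<a href="\\1">\\2</a>';

$output = preg_replace($replacePattern, $replaceWith, $input);
echo $output, "\n";
```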

PHP preg_replace ampersand

Okay, I might not be going about this the right way, but here goes.
I have this code that takes a link and extracts the text between the tags:
$string = $item;
$pattern = '/\<a([^>]*)\>([^<]*)\<\/a\>/i';
$replacement = '$2';
$message = preg_replace($pattern, $replacement, $string);
A few items in this string have ampersands (in the text portion, not the tag portion), but most don't. I'm trying to figure out a way to either incorporate the ampersand into the current pattern or run another preg_replace on $message to remove the ampersand after the tags are stripped away.
THANKS!
There's always $message = str_replace('&', '', $message);
Incidentally, if you are trying to strip tags from HTML input, there is also strip_tags().
For example, if your input is
$text = '<a href="...">Text</a>';
then strip_tags($text) will produce Text.
Do you want to remove everything after the ampersand? Then it's
'/\<a([^>]*)\>([^<&]*)[^<]*\<\/a\>/i';
Otherwise, you'll need a 2nd operation.
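A short comparison of the two suggestions above, using a made-up link whose text contains an ampersand: the extra str_replace() pass removes only the & character, while the single-pattern version drops everything from the & onward.

```php
<?php
$item = '<a href="/menu">Salt&Pepper</a>'; // hypothetical sample input

// Original pattern: keep only the link text (group 2).
$message = preg_replace('/\<a([^>]*)\>([^<]*)\<\/a\>/i', '$2', $item);
// Second pass: strip the ampersand itself.
$withoutAmp = str_replace('&', '', $message);

// One pass: group 2 stops at the first "&"; the rest of the text is discarded.
$beforeAmp = preg_replace('/\<a([^>]*)\>([^<&]*)[^<]*\<\/a\>/i', '$2', $item);

echo $withoutAmp, "\n"; // SaltPepper
echo $beforeAmp, "\n";  // Salt
```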
BTW: Your regex will also match other tags starting with <a, such as the <author> or the <audio> tag.

Replacing HTML attributes using a regex in PHP

OK, I know that I should use a DOM parser, but this is to stub out some proof-of-concept code for a later feature, so I want to quickly get some functionality working on a limited set of test code.
I'm trying to strip the width and height attributes from chunks of HTML; in other words, replace
width="number" height="number"
with a blank string.
The function I'm trying to write looks like this at the moment:
function remove_img_dimensions($string,$iphone) {
$pattern = "width=\"[0-9]*\"";
$string = preg_replace($pattern, "", $string);
$pattern = "height=\"[0-9]*\"";
$string = preg_replace($pattern, "", $string);
return $string;
}
But that doesn't work.
How do I make that work?
PHP is unique among the major languages in that, although regexes are specified in the form of string literals like in Python, Java and C#, you also have to use regex delimiters like in Perl, JavaScript and Ruby.
Be aware, too, that you can use single-quotes instead of double-quotes to reduce the need to escape characters like double-quotes and backslashes. It's a good habit to get into, because the escaping rules for double-quoted strings can be surprising.
Finally, you can combine your two replacements into one by means of a simple alternation:
$pattern = '/(width|height)="[0-9]*"/i';
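Applied to the function from the question, a sketch using delimiters and the combined alternation. Two adjustments are my own, not part of the answer: \s* in front of the attribute also removes the space before it, and the unused $iphone parameter is dropped.

```php
<?php
// Remove width="..." and height="..." attributes in one pass.
// '/.../i' supplies the delimiters PCRE requires; \s* also eats the preceding space.
function remove_img_dimensions(string $string): string
{
    return preg_replace('/\s*(width|height)="[0-9]*"/i', '', $string);
}

echo remove_img_dimensions('<img src="a.png" width="100" height="50">'), "\n";
// prints: <img src="a.png">
```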
Your pattern needs start and end delimiter characters, like this:
$pattern = "/height=\"[0-9]*\"/";
$string = preg_replace($pattern, "", $string);
"/" is the usual choice, but most other non-alphanumeric characters work too ("|pattern|", "#pattern#", and so on).
I think you're missing the delimiters (which can be //, || or various other pairs of characters) that need to surround a regular expression in the string. Try changing your $pattern assignments to this form:
$pattern = "/width=\"[0-9]*\"/";
...if you want a case-insensitive comparison, add an 'i' after the closing delimiter, like this:
$pattern = "/width=\"[0-9]*\"/i";
Hope this helps!
David
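As the answers above point out, the delimiter character itself is interchangeable. A small sketch illustrating this: the same height pattern with three different delimiters gives identical results, and choosing one that does not appear in the pattern avoids having to escape it.

```php
<?php
$html = '<img height="50" src="a.png">';

// The same regex written with three different delimiters.
$a = preg_replace('/height="[0-9]*"/', '', $html);
$b = preg_replace('|height="[0-9]*"|', '', $html);
$c = preg_replace('#height="[0-9]*"#', '', $html);

var_dump($a === $b && $b === $c); // bool(true)
echo $a, "\n";                     // <img  src="a.png">
```

The doubled space in the result is the whitespace that surrounded the removed attribute.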
