I am trying to take a string of HTML and transform every URL in it that ends in "_page.php" so that only its basename, without the ".php" extension, remains. For example, with this string:
<br/>http://www.website.com/folder/A_page.php TEXT
<br/>http://www.website.com/folder/B_page.php TEXT
<br/>http://www.website.com/folder/C_page.php TEXT
<br/>http://www.website.com/folder/D_dont.php TEXT
I want it to look like:
<br/>A_page TEXT
<br/>B_page TEXT
<br/>C_page TEXT
<br/>http://www.website.com/folder/D_dont.php TEXT
I wrote this:
$str = preg_replace('!(http)(s)?:\/\/[a-zA-Z0-9.?&_/]+_page.php!', '$0',$str);
which finds the right number of matches, but it replaces each one with $0, which is the entire matched URL, so the URLs don't change at all. Doing this:
$str = preg_replace('!(http)(s)?:\/\/[a-zA-Z0-9.?&_/]+_page.php!', '$1',$str);
Gets me:
http TEXT
http TEXT
http TEXT
http://www.website.com/folder/D_dont.php TEXT
So I figured that if I switched $1 to $2, it would return the body of the URL, which I could parse and return like this:
$str = preg_replace('!(http)(s)?:\/\/[a-zA-Z0-9.?&_/]+_page.php!', basename('$2','.php'),$str);
$2 turns up empty though. How can I capture the body of the link in preg_replace?
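You don't need all those parentheses (in your pattern, $2 is the optional (s) group, which is why it comes back empty). Capture only the part you want to keep, the file name without the .php extension, and that is $1:
$str = preg_replace('!https?://[a-zA-Z0-9.?&_/]*/(\w+_page)\.php!', '$1', $str);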
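To use a function in the replacement, use a callback: in your code, basename('$2','.php') is evaluated once, on the literal string '$2', before preg_replace even runs. Match the entire URL and then get the basename from it, which in this case is $0, or $m[0] inside the callback:
$str = preg_replace_callback('!https?:\/\/[a-zA-Z0-9.?&_/]+_page\.php!',
function($m) { return basename($m[0], '.php'); },
$str);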
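A self-contained sketch that runs both fixes on part of the sample input from the question, so the results can be compared. It assumes the URLs only contain the characters from the original character class; `$html` is just the example string.

```php
<?php
$html = "<br/>http://www.website.com/folder/A_page.php TEXT\n"
      . "<br/>http://www.website.com/folder/D_dont.php TEXT";

// Option 1: capture only the file name without ".php" and use it as the replacement.
$viaGroup = preg_replace('!https?://[a-zA-Z0-9.?&_/]*/(\w+_page)\.php!', '$1', $html);

// Option 2: match the whole URL and let basename() strip the path and the ".php" suffix.
$viaCallback = preg_replace_callback(
    '!https?://[a-zA-Z0-9.?&_/]+_page\.php!',
    function ($m) {
        return basename($m[0], '.php');
    },
    $html
);

echo $viaGroup, "\n";
// <br/>A_page TEXT
// <br/>http://www.website.com/folder/D_dont.php TEXT
```

Both calls leave the D_dont.php URL alone, because it does not end in "_page.php".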
I have a string like:
#test hello how are you #abcdef
How can I automatically convert every word that starts with # into a link like:
https://example.com/test
https://example.com/abcdef
I've tried using regex and preg_replace, but I can't get it quite right.
Try regex #(\S+) with substitution https://example.com/$1:
$string = '#test hello how are you #abcdef';
$result = preg_replace('/#(\S+)/', 'https://example.com/$1', $string);
print($result);
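One thing to watch: \S+ takes everything up to the next whitespace, so punctuation stuck to a hashtag becomes part of the captured name (and therefore of the URL). A small sketch comparing what \S+ and \w+ capture; the sample sentence is made up for illustration.

```php
<?php
$string = 'Hi #test, see #abcdef!';

// \S+ stops only at whitespace, so trailing punctuation is captured too.
preg_match_all('/#(\S+)/', $string, $loose);

// \w+ stops at the first character that is not a letter, digit or underscore.
preg_match_all('/#(\w+)/', $string, $strict);

// $loose[1] is ['test,', 'abcdef!']
// $strict[1] is ['test', 'abcdef']
```

If the captured name is used to build a URL, '/#(\w+)/' keeps the comma and the exclamation mark out of the link target.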
I have a #mentions feature on my site; users enter the mentions themselves, so they can type something like:
#foo Hello This is some mention text included.
I would like to remove just the text (everything after #foo). The content comes through streamitem_content:
$json['streamitem_content_usertagged'] =
preg_replace('/(^|\s)#(\w+)/', '\1#$1',
$json['streamitem_content']);
Give this a try
$json['streamitem_content'] = '#foo Hello This is some mention text included.';
$json['streamitem_content_usertagged'] =
preg_replace('/#(\w+)/', '#$1',
$json['streamitem_content']);
echo $json['streamitem_content_usertagged'];
Output:
#foo Hello This is some mention text included.
preg_replace only replaces what it matches, so you don't need to match content you aren't interested in. If you do want to capture multiple parts of a string, though, remember that the group numbers increase by one with each pair of parentheses. So this
preg_replace('/(^|\s)#(\w+)/', '\1#$1',
$json['streamitem_content']);
would actually be
preg_replace('/(^|\s)#(\w+)/', '$1#$2',
$json['streamitem_content']);
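To see how the numbering works, here is a small sketch that brackets each mention: $1 holds whatever (^|\s) matched (the start of the string or a single whitespace character) and $2 holds the name. The square brackets are only a stand-in for whatever link markup you actually want.

```php
<?php
$content = '#foo says hi to #bar';

// Group 1 is (^|\s), group 2 is (\w+); both are reused in the replacement.
$tagged = preg_replace('/(^|\s)#(\w+)/', '$1[$2]', $content);

echo $tagged, "\n"; // [foo] says hi to [bar]
```

Putting $1 back in the replacement is what preserves the space before #bar.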
Update:
$json['streamitem_content'] = '#foo Hello This is some mention text included.';
$json['streamitem_content_usertagged'] =
preg_replace('/#(\w+).*$/', '#$1',
$json['streamitem_content']);
echo $json['streamitem_content_usertagged'];
Output:
#foo
If the content after #foo can extend over multiple lines, use the s modifier.
Regex101 Demo: https://regex101.com/r/tX1rO0/1
In short, the regex says: find a #, then capture all consecutive a-zA-Z0-9_ characters; after those characters, match everything else up to the end of the string, so it gets removed.
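A sketch of why the s modifier matters for multi-line content; the two-line sample text is invented for illustration. Without s, "." cannot cross the newline, and "$" (without the m modifier) only matches at the very end of the string, so the pattern finds no match at all.

```php
<?php
$content = "#foo Hello\nThis text is on a second line.";

// Without s: ".*" stops at the newline, so ".*$" never reaches the end; nothing is replaced.
$withoutS = preg_replace('/#(\w+).*$/', '#$1', $content);

// With s: "." also matches newlines, so everything after #foo is removed.
$withS = preg_replace('/#(\w+).*$/s', '#$1', $content);
```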
You can use this:
preg_replace('/^\s*#(\w+)/', '#$1',
$json['streamitem_content']);
This removes the leading whitespace and includes the # in the hyperlink's text (not the link argument).
If you need to keep the leading whitespace intact:
preg_replace('/^(\s*)#(\w+)/', '$1#$2',
$json['streamitem_content']);
You could use explode() and str_replace() instead; they may be faster than the preg functions.
Assuming the line is available as a variable (e.g. $mention):
$mention = $json['streamitem_content'];
$mention_parts = explode(" ", $mention);
$the_part_you_want = str_replace('#','', $mention_parts[0]);
// or you could use $the_part_you_want = ltrim($mention_parts[0], '#');
$json['streamitem_content_usertagged'] = '#' . $the_part_you_want;
Or use trim($mention_parts[0]) to remove any unwanted whitespace.
You could use fewer variables and reuse $mention as an array, but this seemed a clearer way to illustrate the principle.
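A runnable version of the explode() idea, assuming the mention is always the first word of the content. Passing a limit of 2 to explode() is an optional tweak so the rest of the text is not split up needlessly; the $json array is filled in by hand here.

```php
<?php
$json = ['streamitem_content' => '#foo Hello This is some mention text included.'];

$mention = $json['streamitem_content'];

// Split on the first space only: element 0 is the mention, element 1 the rest of the text.
$mention_parts = explode(' ', trim($mention), 2);

// Drop the leading "#" to get the bare user name (e.g. for building a link).
$the_part_you_want = ltrim($mention_parts[0], '#');

$json['streamitem_content_usertagged'] = '#' . $the_part_you_want;
```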
I have a PHP variable in which I need to display #value values as links.
The code looks like this.
$reg_exUrl = "/\#::(.*?)/";
// The Text you want to filter for urls
$text = "This is a #simple text from which we have to perform #regex operation";
// Check if there is a url in the text
if(preg_match($reg_exUrl, $text, $url)) {
// make the urls hyper links
echo preg_replace($reg_exUrl, ''.$url[0].'', $text);
} else {
// if no urls in the text just return the text
echo "IN Else #$".$text;
}
By using \w, you can match a word made up of alphanumeric characters and underscores. Change your expression to this:
$reg_exUrl = "/#(.*?)\w+/";
$reg_exUrl = "/\#::(.*?)/";
This doesn't match, for the following reasons:
1. There is no need to escape #, because it is not a special character.
2. Since you want to match just # followed by a word, there is no need for ::. It makes the pattern require two literal colons right after the #, which your text doesn't contain.
3. (.*?) is lazy because of the ? after *, so it matches as few characters as possible (here, none) and won't match the whole word you need.
If you still want to go with your pattern, you can modify it to
$reg_exUrl = "/#(.*?)\w+/";
But a more efficient pattern that still works is
$reg_exUrl = "/#\w+/";
It's not clear to me exactly what you need to match. If you want to replace a # followed by any word characters:
$text = "This is a #simple text from which we have to perform #regex operation";
$reg_exUrl = "/#(\w+)/";
echo preg_replace($reg_exUrl, '$1', $text);
//Output:
//This is a simple text from which we have to perform regex operation
The replacement uses $0 to refer to the whole match and $1 to refer to the first captured group.
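A sketch that uses both references in one link: $0 keeps the # in the visible text, while $1 supplies the bare word for the URL. https://example.com/tag/ is a placeholder, not a real endpoint.

```php
<?php
$text = "This is a #simple text from which we have to perform #regex operation";

// $0 is the whole match ("#simple"); $1 is the captured word ("simple").
// https://example.com/tag/ is a placeholder URL.
$linked = preg_replace('/#(\w+)/', '<a href="https://example.com/tag/$1">$0</a>', $text);

echo $linked, "\n";
```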
$text = 'Hello #demo here!';
$pattern = '/#(.*?)[ ]/';
$replacement = '<strong>${1}</strong> ';
echo preg_replace($pattern, $replacement, $text);
This works; I get HTML like this: Hello <strong>demo</strong> here!. But it doesn't work when #demo is at the end of the string, for example $text = 'Hello #demo';. How can I change my pattern so it returns the same output whether or not the tag is at the end of the string?
Question 2:
What if the string is $text = 'Hello #demo!';? How do I keep the ! out of the bold text? It should stop at a space, the end of the string, or any non-word character.
Sorry for my bad English; I hope you understand what I need.
In order to select a word beginning with the # symbol, this regex will work:
$pattern = "/#(\w+)\b/";
`\w` is a shorthand character class for `[a-zA-Z0-9_]`. `\b` is a word-boundary anchor that matches at the beginning or end of a word, in this case the end. So the regex is saying: select something starting with a '#' followed by one or more word characters, up to the end of the word.
Reference: http://www.regular-expressions.info/tutorial.
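A quick sketch of that pattern on the three cases from the question. Because the pattern no longer consumes the trailing space, the replacement no longer needs the extra space that the original '<strong>${1}</strong> ' had. boldTags() is a hypothetical helper name used only for illustration.

```php
<?php
// Wrap the word after "#" in <strong>; punctuation and spaces stay outside.
function boldTags(string $text): string
{
    return preg_replace('/#(\w+)\b/', '<strong>$1</strong>', $text);
}

echo boldTags('Hello #demo here!'), "\n"; // Hello <strong>demo</strong> here!
echo boldTags('Hello #demo'), "\n";       // Hello <strong>demo</strong>
echo boldTags('Hello #demo!'), "\n";      // Hello <strong>demo</strong>!
```

Since \w+ is greedy, it already stops at a word boundary, so the trailing \b mainly documents the intent.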
You could use a word boundary; that's what they're for:
$pattern = '/#(.+?)\b/';
This will also work for question 2.
You can add an option to match the end of the string:
#(.*?)(?= |\p{P}?$)
Replace with <strong>$1</strong>.
You can also use \p{P} (any Unicode punctuation symbol) to prevent punctuation from bold formatting.
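A sketch of the lookahead pattern in PHP, with the replacement from this answer. The third sample string is made up to show a limitation: punctuation that is followed by more text is not at the end of the string, so it still ends up inside the bold text.

```php
<?php
// Lazy capture that stops before a space, or before optional punctuation at the very end.
$pattern = '/#(.*?)(?= |\p{P}?$)/';

$a = preg_replace($pattern, '<strong>$1</strong>', 'Hello #demo here!');
$b = preg_replace($pattern, '<strong>$1</strong>', 'Hello #demo!');

// The comma is not at the end of the string, so it is captured along with the word.
$c = preg_replace($pattern, '<strong>$1</strong>', 'Hello #demo, here');
```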
I made this function to find specific character sequences in a text string and convert them to HTML tags:
function ccfc($content)
{
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// $code_block = preg_replace($reg_exUrl, "{$url[0]} ", $content);
if(preg_match($reg_exUrl, $content, $url)) {
// make the urls hyper links
$content = preg_replace($reg_exUrl, "{$url[0]} ", $content);
} else {
// if no urls in the text just return the text
$content = $content;
}
$code_block = preg_replace_callback(
'/([\`]{3})(.*?)([\`]{3})/s',
function($matches) {
$matches[2] = htmlentities($matches[2]);
return '<pre><code>'. $matches[2] .'</code></pre>';
},
$content);
$bold = preg_replace_callback(
'/([\*]{2})(.*?)([\*]{2})/s',
function($matches) {
$matches[2] = htmlentities($matches[2]);
return '<b>'. $matches[2] .'</b>';
},
$code_block);
$italic = preg_replace_callback(
'/([\*]{1})(.*?)([\*]{1})/s',
function($matches) {
$matches[2] = htmlentities($matches[2]);
return '<i>'. $matches[2] .'</i>';
},
$bold);
return $italic;
}
The first part finds URLs like http://www.google.com and converts them to links.
The second finds ``` code content ``` and converts it to <pre><code> code content </code></pre>.
The third finds ** content ** and converts it to <b> content </b>.
The fourth finds * content * and converts it to <i> content </i>.
But any HTML written outside the ``` ``` is not escaped, so the browser renders or executes it. How can I apply htmlentities() to the rest of the text as well?
Rather than calling htmlentities after running the text through your converter functions, call it before you do the converting:
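function ccfc($content) {
$content = htmlentities($content); // escape everything first
// ...then run the existing URL, code, bold and italic conversions and return $content
}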
This won't affect the characters used in the markup (* and `). You can also set the double_encode flag to false to ensure that already-encoded content (e.g. &amp; in links) is not encoded twice; see the PHP manual for the settings:
$content = htmlentities($content, ENT_QUOTES, 'UTF-8', false);
This setting treats the text as UTF-8 and encodes both single and double quotes, but will not double-encode an already-encoded link like http://example.com?p=1&amp;q=2.
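A small sketch of what double_encode changes, using a made-up already-encoded URL. Note that false only protects existing entities; a bare & that is not part of an entity is still encoded.

```php
<?php
$alreadyEncoded = 'http://example.com?p=1&amp;q=2';

// double_encode = true (the default) turns "&amp;" into "&amp;amp;".
$twice = htmlentities($alreadyEncoded, ENT_QUOTES, 'UTF-8');

// double_encode = false leaves existing entities alone...
$once = htmlentities($alreadyEncoded, ENT_QUOTES, 'UTF-8', false);

// ...but a bare "&" and any markup are still encoded.
$raw = htmlentities('a & b <script>', ENT_QUOTES, 'UTF-8', false);
```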
On another note, you don't need to use preg_replace_callback for your substitutions; you can use the captured text in the replacement expression. Here's an example from the code formatting regex:
$code_block = preg_replace(
'/`{3}(.*?)`{3}/s',
"<pre><code>$1</code></pre>",
$content);
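A sketch of the escape-first order with that code-block replacement; the input string is invented for illustration. htmlentities() does not touch the backticks, so the ``` markers still match after escaping, and both the HTML outside the block and the script inside it come out as visible text.

```php
<?php
$input = "Text with <b>html</b> and ```<script>alert(1)</script>```";

// Escape everything first, so HTML outside the backticks is shown rather than rendered.
$content = htmlentities($input, ENT_QUOTES, 'UTF-8', false);

// Then turn the ``` markers into a code block; $1 is not interpolated by PHP here
// because variable names cannot start with a digit.
$content = preg_replace('/`{3}(.*?)`{3}/s', "<pre><code>$1</code></pre>", $content);
```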
As noted in my comment, <b> and <i> only style text; they don't convey emphasis or importance. If you are using them to place emphasis on text, you could replace them with <strong> and <em> respectively; if the markup is solely for presentation, it is better to enclose the text in a <span> element and give it a class which has bold or italic formatting.
Here is the full code, with htmlentities moved to the top and the callbacks replaced by plain preg_replace calls:
function ccfc($content)
{
$content = htmlentities($content, ENT_QUOTES, NULL, false);
$reg_exUrl = "/((http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/?\S*)?)/";
// make the urls hyperlinks
$content = preg_replace($reg_exUrl, "<a href='$1'>$1</a>", $content);
# replace ``` with code blocks
$content = preg_replace(
'/`{3}(.*?)`{3}/s',
"<pre><code>$1</code></pre>",
$content);
# replace **text** with strong text
$content = preg_replace(
'/\*{2}([^\*].*?)\*{2}/s',
"<strong>$1</strong>",
$content);
# replace *text* with em text
$content = preg_replace(
'/\*(.*?)\*/s',
"<em>$1</em>",
$content);
return $content;
}
A quick explanation of how preg_replace works: when you use parentheses in a regular expression, you capture the text matched inside them, and you can refer to it in the replacement string as $1, $2, $3, etc. The contents of the first set of parentheses are in $1, the contents of the second set in $2, and so on. For example, take this regular expression:
/(\w+) and (\w+)/
and the input string "bread and butter": bread matches the expression in the first set of parentheses and butter matches the expression in the second set, so $1 would be set to bread and $2 to butter. This becomes useful with preg_replace, as we can use $1 and $2 in the replacement string:
$str = preg_replace("/(\w+) and (\w+)/", "I love $2 on $1", "bread and butter");
echo $str;
Output:
I love butter on bread
Anything that is matched but not captured disappears, like the " and " in this example.
In the replacements in your code, the text between the delimiters (* and `) needs to be kept, so it is captured in parentheses; the delimiters themselves are not needed, so they aren't in parentheses.
Explanation of the other characters in the regexes:
?, *, +, {2} : these are quantifiers - they dictate the number of times the preceding pattern should appear. ? means 0 or 1 times; * is 0 or more times; + is one or more times; {2} means twice; {500} would mean 500 times.
\w matches any digit, letter, or _
. matches any character except a newline, unless the s modifier is used
.*? matches a string of any length, including length 0; the ? after * makes it lazy, so it matches as few characters as possible
\** would match 0 or more * characters; to match *, you have to escape it (i.e. \*) so that the regex engine doesn't interpret it as a quantifier.
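The difference between greedy .* and lazy .*? is easiest to see with two emphasized words on one line (sample text made up for illustration):

```php
<?php
$text = '*one* and *two*';

// Greedy: .* runs to the last "*", swallowing the text in between.
$greedy = preg_replace('/\*(.*)\*/', '<em>$1</em>', $text);

// Lazy: .*? stops at the first closing "*", so each pair is handled separately.
$lazy = preg_replace('/\*(.*?)\*/', '<em>$1</em>', $text);
```

This is why the bold and italic patterns in the code above use .*? rather than .*.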