I'm having issues getting preg_replace to work right. I'm trying to create my own custom markdown. The replacement mostly produces the HTML I want; however, the user input ends up outside of the blockquote. Here is an example of what I am talking about.
Here's my code.
<?php
$user_input = '> My quote';
$syntax = array(
'/>\s+(.*?)/is'
);
$replace_with_html = array(
'<blockquote><h3>Quote</h3><p>$1</p></blockquote>'
);
$replaced = preg_replace($syntax, $replace_with_html, $user_input);
print($replaced);
Here's the user input.
> My quote
And here is the result.
<blockquote><h3>Quote</h3><p></p></blockquote>My quote
What I want is
<blockquote><h3>Quote</h3><p>My quote</p></blockquote>
As you can see, the user input is in the wrong place (at the end of the final HTML code). Is there a way to fix this and place it within the paragraph tags?
You don't need to make arrays, use this:
$user_input = '> My quote';
$syntax = '/>\s+(.*)/s';
$replace_with_html = '<blockquote><h3>Quote</h3><p>$1</p></blockquote>';
$replaced = preg_replace($syntax, $replace_with_html, $user_input);
print($replaced);
This works the same way:
$user_input = '> My quote';
$syntax = ['/>\s+(.*)/s'];
$replace_with_html = ['<blockquote><h3>Quote</h3><p>$1</p></blockquote>'];
$replaced = preg_replace($syntax, $replace_with_html, $user_input);
print($replaced);
Either way, you WANT the dot's quantifier to be greedy, so remove the ?.
Without this adjustment, the lazy (.*?) matches zero characters because nothing follows it in the pattern, so you're only replacing the >\s+ part.
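To make the difference concrete, here is a minimal sketch that runs both versions of the pattern against the input from the question. The variable names $lazy and $greedy are mine, not part of the original code.

```php
<?php
$user_input = '> My quote';
$html = '<blockquote><h3>Quote</h3><p>$1</p></blockquote>';

// Lazy: (.*?) is the last thing in the pattern, so it is satisfied
// with zero characters and only "> " gets replaced.
$lazy = preg_replace('/>\s+(.*?)/s', $html, $user_input);

// Greedy: (.*) runs to the end of the input, so $1 holds the quote text.
$greedy = preg_replace('/>\s+(.*)/s', $html, $user_input);

echo $lazy, "\n";   // <blockquote><h3>Quote</h3><p></p></blockquote>My quote
echo $greedy, "\n"; // <blockquote><h3>Quote</h3><p>My quote</p></blockquote>
```

The first line of output is exactly the broken result from the question; the second is the desired one.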
That said, let me solve some problems that you haven't encountered yet...
How do you know where to stop quoting?
What if someone wants to use > to mean "greater than"?
Consider this new pattern and how it may help you tackle some future challenges:
/^>\s+(\S+(?:\s\S+)*)/m
The pattern will match (after > at the start of a line and 1 or more whitespace characters) one or more non-whitespace characters optionally followed by: a single whitespace character (this can be a space/tab/return/newline) then one or more non-whitespace characters.
Effectively this says, you want to continue matching "quote" text until there are 2 or more consecutive whitespace characters (or else to the end of the string).
This adjustment should give your users the ability to accurately/conveniently quote-format their text while appropriately leaving innocent > characters alone.
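As a rough illustration of that behaviour, the sketch below feeds the pattern a quote line followed by a blank line and a sentence that uses > as "greater than". The sample input is made up for this example.

```php
<?php
$pattern = '/^>\s+(\S+(?:\s\S+)*)/m';
$html = '<blockquote><h3>Quote</h3><p>$1</p></blockquote>';

// Hypothetical input: one quoted line, a blank line, then an innocent ">".
$user_input = "> My quote\n\nIs 5 > 3?";

$replaced = preg_replace($pattern, $html, $user_input);
echo $replaced, "\n";
```

The quote stops at the blank line (two consecutive newlines), and the > in "5 > 3" is left alone because it is not at the start of a line.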
I'm really stuck with this one program...
I'm learning how to program and I'm starting with PHP right now.
I need to get titles out of articles.
I already asked this question, and I managed to get the first title of the text in many ways. For example, if the text was:
Hello
I'm learning how
to write this code.
then I got the "Hello" part, for example, like this:
<?php
$string = "Hello
I'm learning how
to write this code.";
$str=strstr($string,"\n",true);
echo $str . "<br />";
?>
However, there can be a lot of titles in the article, each one separated by blank lines from the text above and below it, and I cannot manage to get all of these titles.
Here's what I tried:
<?php
$string="
Good text

Good text is good but I have no idea
how to code this.

Another title

I need to get you,
but don't know how.";
$get = substr($string, strpos($string, $finda), -1);
$finda="\n";
$getFinal=strstr($get, $finda, true);
echo $getFinal;
?>
But this doesn't work because there is a "\n" after every line. How can I identify only those blank lines? I tried to find them:
$getRow = explode("\n", $string);
foreach($getRow as $row){
if(strlen($row) <= 1){
but I don't know what to do next.
Do you have any ideas? Can you help?
Thank you in advance:)
You can use a regular expression like this:
<?php
$string="
Good text

Good text is good but I have no idea
how to code this.

Another title

I need to get you,
but don't know how.";
preg_match_all('/^\n(.+?)\n\n/m', $string, $matches);
var_dump($matches[1]);
?>
Outputs:
array(2) {
[0] =>
string(9) "Good text"
[1] =>
string(13) "Another title"
}
Explanation of the regular expression
Regular expressions are a compact way to describe constraints for a string. Either to check that it verifies a given pattern or to capture some of its parts. In this case, we want to capture some parts of the string (titles).
'/^\n(.+?)\n\n/m' is the regular expression used to solve your problem. The actual expression is between the slashes, while the trailing m is a modifier. It makes ^ match at the start of every line rather than only at the start of the string.
We are left with ^\n(.+?)\n\n which can be read from left to right.
^ indicates the beginning of a line and \n represents the "new line" character. Coupled (^\n), they represent an empty line.
Parentheses indicate what we want to capture. In this case, the title, which can be any number of any characters. The . represents any character except a newline, and the + indicates that we want any number of occurrences of that character (but at least one; the * can be used to allow zero occurrences). The ? makes the quantifier lazy: we don't want to go too far and capture the whole string, so it stops at the first opportunity it has to match the remaining part of the regular expression.
Then, the two \n represent the end of the title line and the end of the empty line following it.
As we used preg_match_all instead of preg_match, every occurrence of the pattern will be matched instead of the first one only.
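To see the difference between the two functions, here is a small sketch that runs the same pattern through both. The sample text is shortened and made up for the example; the variable names are mine.

```php
<?php
// Titles are surrounded by blank lines, as in the question.
$string = "\nGood text\n\nBody line one\nbody line two.\n\nAnother title\n\nMore body text.";
$pattern = '/^\n(.+?)\n\n/m';

// preg_match stops after the first match.
$found = preg_match($pattern, $string, $first);

// preg_match_all keeps going and collects every match.
$count = preg_match_all($pattern, $string, $all);

echo $first[1], "\n";  // Good text
print_r($all[1]);      // "Good text" and "Another title"
```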
Regular expressions are really powerful and I invite you to learn them further.
While iterating over the lines, you could have a variable that stores what you are currently doing. What I mean is that you could have 3 states: processing_text, expecting_title, got_title.
Each time you find that $row == "" (meaning there was an empty line, only containing a \n), you set your variable to expecting_title. If the var==expecting_title, you store/echo the next row you encounter and set the variable to got_title. This way, when you encounter the next empty line, you won't set the variable to expecting_title, but to processing_text.
Some pseudocode to get you started:
foreach ($getRow as $row)
    if (state == expecting_title)
        processTitle($row)
        state=got_title
    if ($row == "")
        if (state == processing_text)
            state=expecting_title
        else
            state=processing_text
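One way to turn that pseudocode into runnable PHP is sketched below. The function name extractTitles and the choice to collect titles into an array are my assumptions; the state names follow the description above. It treats a line as empty after trimming, so Windows line endings don't trip it up.

```php
<?php
// Sketch of the three-state approach: a title is the first line after
// a blank line that follows ordinary text (or the start of the input).
function extractTitles(string $text): array
{
    $titles = [];
    $state = 'processing_text';

    foreach (explode("\n", $text) as $row) {
        $row = rtrim($row, "\r"); // tolerate \r\n line endings

        if ($state === 'expecting_title') {
            $titles[] = $row;
            $state = 'got_title';
        }

        if (trim($row) === '') {
            // A blank line after text announces a title; a blank line
            // after a title just leads back into the text.
            $state = ($state === 'processing_text') ? 'expecting_title' : 'processing_text';
        }
    }

    return $titles;
}

$string = "\nGood text\n\nGood text is good but I have no idea\nhow to code this.\n\nAnother title\n\nI need to get you,\nbut don't know how.";
$titles = extractTitles($string);
print_r($titles);
```

Like the pseudocode, this assumes exactly one blank line before each title; two blank lines in a row would record an empty title.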
Or, you can always use regex, as the other answer mentioned, but that's another story.
I cannot find a way to allow a space in this regex, which extracts the text between the title tags:
<title>my exemple</title>
here is the regex
$pattern = "/<title>(.+)<\/title>/i";
I tried
/<title>(.+)<\/title>/i\s
/<title>(.+)<\/title>/i\S
/<title>\s(.+)<\/title>/i
/<title>(.+)\s<\/title>/i
here is the full function
function getSiteTitle(){
    $RefURL = (is_null($_SERVER['HTTP_REFERER'])) ? 'Un know' : $_SERVER['HTTP_REFERER'];
    if($RefURL != 'Un know'){
        $con = file_get_contents($RefURL) or die (" can't open URL referer ");
        $pattern = "/<title>(.+)<\/title>/i";
        preg_match($pattern,$con,$match);
        $result = array($match[1],$RefURL);
        return $result;
    }
}
I have verified that I receive a keyword in my referer, because it works pretty well with keywords without spaces.
Thank you.
If you want to capture HTML on multiple lines (is that what you mean by "spaces"?), you'll need to turn on the s modifier, which allows the . character to match newline characters, as well.
This should work:
/<title>(.+)<\/title>/is
How about
$pattern = "/<title>\s*(.+?)\s*<\/title>/i";
then the first capturing group will contain only the keyword, which may contain spaces (the lazy .+? leaves the trailing whitespace to the \s* after it), like:
<title> key word </title>
// result is "key word"
Add the s modifier to the end (/.../is) if you want to allow newlines inside the title as well.
If I got what you want right, you could also use this approach:
$pattern = "/<title>(.+)<\/title>/is";
and then trim the first capturing group.
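A minimal sketch of that approach, assuming $con holds the fetched page (here replaced by a made-up HTML string):

```php
<?php
// Hypothetical page content; in the question this comes from file_get_contents().
$con = "<html><head><title>\n   key word \n</title></head></html>";

$title = null;
if (preg_match('/<title>(.+)<\/title>/is', $con, $match)) {
    // The capture still carries the surrounding whitespace; trim() removes it.
    $title = trim($match[1]);
}

var_dump($title);
```

Checking the return value of preg_match before reading $match[1] also avoids a notice when the page has no title at all.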
Selecting the text between the title tags, and the tags as well:
/<title>(.+)<\/title>/
Doing the same even if they are spread over multiple lines:
/<title>(.+)<\/title>/s
Doing the same as above but ignoring case (lower or upper case doesn't matter):
/<title>(.+)<\/title>/is
Now we are using lookbehind and lookahead in order to only select the text between the tags:
/(?<=<title>)(.+)(?=<\/title>)/is
Please change the flags (i and s) the way you need them.
If that doesn't solve your problem I don't know what will :)
Here you can see an example of how my last regex would work: http://regexr.com?37ukf
EDIT:
OK, try to test this code somewhere:
<?php
$title = '<title> My Example </title>';
preg_match('/(?<=<title>)(.+)(?=<\/title>)/is', $title, $match);
var_dump($match);
?>
You'll see that it works perfectly fine. Now with this knowledge go ahead and check if $con truly looks the way you think it should. And do a var_dump of your $match instead of looking for specific indices.
I am attempting to change a string occurrence, e.g. http://www.bbc.co.uk/, so that it appears inside an HTML link (an <a> element pointing at that URL).
However, for some reason my regex conversion does not work. Can someone please point me in the correct direction?
$text = "I love this website http://www.bbc.co.uk/";
$x = preg_replace("#[a-z]+://[^<>\s]+[[a-z0-9]/]#i", "\\0", $text);
var_dump($x);
outputs I love this website http://www.bbc.co.uk/ (No html link)
Your weird character class is at fault:
[[a-z0-9]/]
Double square brackets are for POSIX character classes like [[:digit:]].
You meant to write just:
[a-z0-9/]
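With that one change the original pattern matches the URL. The sketch below shows it in use; note that the <a href> markup in the replacement is my assumption, since the replacement string shown in the question contains no HTML.

```php
<?php
$text = "I love this website http://www.bbc.co.uk/";

// Corrected character class at the end: [a-z0-9/] instead of [[a-z0-9]/]
// Assumed replacement: wrap the whole match ($0) in an anchor element.
$x = preg_replace('#[a-z]+://[^<>\s]+[a-z0-9/]#i', '<a href="$0">$0</a>', $text);

echo $x, "\n";
```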
It is because your regex is not giving you a match: the trailing [[a-z0-9]/] requires the URL to end with the literal characters /], which your text does not contain. Try something like this:
$pattern = '#https?://.*\b#i';
$replace = '$0';
$x = preg_replace($pattern, $replace, $text);
Note that I am not actually trying to validate the URL format here, so I just accept anything from http(s):// up to the last word boundary on the line (.* is greedy). It didn't seem as if you were going for a true URL validation regex anyway (i.e. validating there is at least one ., that the TLD component has 2-6 characters, etc.), so I just figured I would give you the simplest pattern that would match.
Use this:
$x = preg_replace('#http://[?=&a-z0-9._/-]+#i', '<a target="_blank" href="$0">$0</a>', $text);
I have a forum that supports hashtags. I'm using the following line to convert all hashtags into links. I'm using the (^|\(|\s|>) pattern to avoid picking up named anchors in URLs.
$str=preg_replace("/(^|\(|\s|>)(#(\w+))/","$1$2",$str);
I'm using this line to pick up hashtags to store them in a separate field when the user posts their message, this picks up all hashtags EXCEPT those at the start of a new line.
preg_match_all("/(^|\(|\s|>)(#(\w+))/",$Content,$Matches);
Using the m & s modifiers doesn't make any difference. What am I doing wrong in the second instance?
Edit: the input text could be plain text or HTML. Example of problem input:
#startoftextreplacesandmatches #afterwhitespacereplacesandmatches <b>#insidehtmltagreplacesandmatches</b> :)
#startofnewlinereplacesbutdoesnotmatch :(
Your replace operation has a problem which you have evidently not yet come across - it will allow unescaped HTML special characters through. The reason I know this is because your regex allows hashtags to be prefixed with >, which is a special character.
For that reason, I recommend you use this code to do the replacement, which will double up as the code for extracting the tags to be inserted into the database:
$hashtags = array();
$expr = '/(?:(?:(^|[(>\s])#(\w+))|(?P<notag>.+?))/';
$str = preg_replace_callback($expr, function($matches) use (&$hashtags) {
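// isset() rather than !empty(): empty('0') is true, so a literal 0 would be misread as a hashtag
if (isset($matches['notag']) && $matches['notag'] !== '') {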
// This takes care of HTML special characters outside hashtags
return htmlspecialchars($matches['notag']);
} else {
// Handle hashtags
$hashtags[] = $matches[2];
return htmlspecialchars($matches[1]).'#'.htmlspecialchars($matches[2]).'';
}
}, $str);
After the above code has been run, $str will contain the modified string, properly escaped for direct output, and $hashtags will be populated with all the tags matched.
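Here is a self-contained sketch of the same idea that can be run as-is. The sample text and the /tag/ link format are hypothetical, chosen only for illustration; adapt the markup to whatever your forum uses. It also shows that a hashtag at the start of a new line is picked up, because \s matches the preceding newline.

```php
<?php
$hashtags = [];
$expr = '/(?:(?:(^|[(>\s])#(\w+))|(?P<notag>.+?))/';

// Hypothetical sample: a tag at the start, one in parentheses, one on a new line.
$str = "#first and (#second)\n#third";

$out = preg_replace_callback($expr, function ($m) use (&$hashtags) {
    if (isset($m['notag']) && $m['notag'] !== '') {
        // Ordinary text: escape it and pass it through.
        return htmlspecialchars($m['notag']);
    }
    // Hashtag: remember it and emit a link (the URL scheme is an assumption).
    $hashtags[] = $m[2];
    return htmlspecialchars($m[1])
        . '<a href="/tag/' . rawurlencode($m[2]) . '">#' . htmlspecialchars($m[2]) . '</a>';
}, $str);

echo $out, "\n";
print_r($hashtags);
```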
I'm creating some custom BBcode for a forum. I'm trying to get the regular expression right, but it has been eluding me for two days. Any expert advice is welcome.
The input (e.g. sample forum post):
[quote=Bob]I like Candace. She is nice.[/quote]
I agree, she is very nice. I like Ashley, too, and especially [Ryan] when he's drinking.
Essentially, I want to encase any names (from a specified list) in [user][/user] BBcode... except, of course, those being quoted, because doing that causes some terrible parsing errors. Below is an example of how I want the output to be.
The desired output:
[quote=Bob]I like [user]Candace[/user]. She is nice.[/quote]
I agree, she is very nice. I like [user]Ashley[/user], too, and especially [[user]Ryan[/user]] when he's drinking.
My current code:
$searchArray = array(
'/(?i)(Ashley|Bob|Candace|Ryan|Tim)/'
);
$replaceArray = array(
"[user]\\0[/user]"
);
$text = preg_replace($searchArray, $replaceArray, $input);
$input is of course set to the post contents (i.e. the first example listed above). How can I achieve the results I want? I don't want the regex to match when a name is preceded by an equals sign (=), but putting a [^=] in front of the names in the regex will make it match any non-equals sign character (i.e. spaces), which then messes up the formatting.
Update
The problem is that by using \1 instead of \0 it is omitting the first character before the names (because anything but = is matched). The output results in this:
[quote=Bob]I like[user]Candace[/user]. She is nice.[/quote]
I agree, she is very nice. I like[user]Ashley[/user], too, and especially [user]Ryan[/user]] when he's drinking.
You were on the right track with the [^=] idea. You can put it outside the names' capture group, in a group of its own, and instead of \\0, which is the full match, use \\1 and \\2, i.e. the first and second capture groups:
$searchArray = array(
'/(?i)([^=])(Ashley|Bob|Candace|Ryan|Tim)/'
);
$replaceArray = array(
"\\1[user]\\2[/user]"
);
$text = preg_replace($searchArray, $replaceArray, $input);
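A different technique that avoids consuming the preceding character altogether is a negative lookbehind, (?<!=), which only asserts that the name is not preceded by an equals sign. The sketch below is an alternative to the code above, not the same method; the \b word boundaries are my addition so that, say, "Tim" inside a longer word is not wrapped.

```php
<?php
$input = "[quote=Bob]I like Candace. She is nice.[/quote]\n"
       . "I agree, she is very nice. I like Ashley, too, and especially [Ryan] when he's drinking.";

// (?<!=) looks behind without matching anything, so names at the very
// start of the text still match and no extra capture group is needed.
$text = preg_replace(
    '/(?<!=)\b(Ashley|Bob|Candace|Ryan|Tim)\b/i',
    '[user]$1[/user]',
    $input
);

echo $text, "\n";
```

Bob after the equals sign is skipped, while Candace, Ashley and Ryan are wrapped with the surrounding spaces and brackets intact.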