I'm building a simple text editor that needs to handle creating paragraphs.
Paragraphs will be in WikiDot syntax. Long story short, this is what I need to change:
+ paragraph 1
changes to
<h1>paragraph 1</h1>
++ subparagraph 1
changes to
<h2>subparagraph 1</h2>
How do I do this in PHP?
To expand on @CrayonViolent's answer (for cases where the first replace interferes with the second):
<?php
$content = "Hello, world
+ Big Heading
++ Smaller heading
Additional content";
function r($m){
$tag = "h".strlen($m[1]);
return "<{$tag}>{$m[2]}</{$tag}>";
}
$content = preg_replace_callback('/^(\+{1,6})\s?(.*)$/m','r', $content);
echo $content;
?>
I also added the m (multi-line) flag to the regex for slightly better matching; it will only produce headers <h1> through <h6>.
$content = preg_replace ("~^\+\+(.*?)\n\n~",'<h2>$1</h2>',$content);
$content = preg_replace ("~^\+(.*?)\n\n~",'<h1>$1</h1>',$content);
Related
I have some HTML paragraphs and I want to wrap every word in a <span>. Now I have
$paragraph = "This is a paragraph.";
$contents = explode(' ', $paragraph);
$i = 0;
$span_content = '';
foreach ($contents as $c){
$span_content .= '<span>'.$c.'</span> ';
$i++;
}
$result = $span_content;
The above code works just fine for normal cases, but sometimes $paragraph contains HTML tags, for example
$paragraph = "This is an image: <img src='/img.jpeg' /> This is a <a href='/abc.htm'/>Link</a>'";
How can I avoid wrapping the "words" inside HTML tags, so that the HTML tags still work but the other words are wrapped in spans? Thanks a lot!
Some (*SKIP)(*FAIL) mechanism?
<?php
$content = "This is an image: <img src='/img.jpeg' /> ";
$content .= "This is a <a href='/abc.htm'/>Link</a>";
$regex = '~<[^>]+>(*SKIP)(*FAIL)|\b\w+\b~';
$wrapped_content = preg_replace($regex, "<span>\\0</span>", $content);
echo $wrapped_content;
See a demo on ideone.com as well as on regex101.com.
To leave out the Link as well, you could go for:
(?:<[^>]+> # same pattern as above
| # or
(?<=>)\w+(?=<) # lookarounds with a word
)
(*SKIP)(*FAIL) # all of these alternatives shall fail
|
(\b\w+\b)
See a demo for this on regex101.com.
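For reference, here is a minimal sketch of how the verbose pattern above could be used from PHP. It assumes the x (extended) flag so that the whitespace and # comments are ignored, and it uses a sample string whose anchor tag is well-formed; the variable names are only illustrative.

```php
<?php
$content = "This is a <a href='/abc.htm'>Link</a>";

$regex = '~
    (?:<[^>]+>            # a complete tag
      |                   # or
      (?<=>)\w+(?=<)      # a word sitting directly between two tags
    )
    (*SKIP)(*FAIL)        # discard whatever the alternatives above matched
    |
    (\b\w+\b)             # any other word ends up in group 1
~x';

$wrapped = preg_replace($regex, '<span>$1</span>', $content);
echo $wrapped;
// <span>This</span> <span>is</span> <span>a</span> <a href='/abc.htm'>Link</a>
```

Note that the lookaround alternative only protects a single word directly between two tags; link text made of several words would still be wrapped.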
The short version is you really do not want to attempt this.
The longer version: if you are dealing with HTML then you need an HTML parser; regexes alone won't do. It becomes even messier because you are not starting with an HTML document but with an HTML fragment (which may or may not be well-formed). Hence you need to use an HTML parser to identify the non-HTML extents, separate them out and feed them into a secondary parser (which might well use regexes) for translation, then put the translated content back into the DOM before serializing the document.
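A rough sketch of that parser-based route, using PHP's bundled DOM extension. The function name wrapWordsInSpans and the wrapper id wrap-root are made up for this illustration, and the sketch assumes plain ASCII input (loadHTML needs an encoding hint for anything else). Unlike the regex above, it also wraps words inside link text, since those are ordinary text nodes.

```php
<?php
// Hypothetical helper: wraps each word of every text node in <span>.
function wrapWordsInSpans(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // tolerate sloppy fragments
    // The wrapper div gives the fragment a single, easy-to-find root.
    $doc->loadHTML('<div id="wrap-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $root = $xpath->query('//div[@id="wrap-root"]')->item(0);

    // Snapshot the text nodes first, because the loop below modifies the tree.
    $textNodes = iterator_to_array($xpath->query('.//text()', $root));
    foreach ($textNodes as $node) {
        $fragment = $doc->createDocumentFragment();
        // Split into words and whitespace runs, keeping the whitespace.
        $pieces = preg_split('/(\s+)/', $node->nodeValue, -1,
            PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        foreach ($pieces as $piece) {
            if (trim($piece) === '') {
                $fragment->appendChild($doc->createTextNode($piece));
            } else {
                $span = $doc->createElement('span');
                $span->appendChild($doc->createTextNode($piece));
                $fragment->appendChild($span);
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo wrapWordsInSpans("This is a <a href='/abc.htm'>Link</a>");
```

Attribute values are never touched here because only text nodes are rewritten; the serializer may normalize attribute quoting, though.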
I have this text in a MySql database:
First paragraph very long.
Second paragraph very long.
Third paragraph.
I add p tags and it works:
$text = preg_replace("/\n/","<p>",$text);
$text = '<p>'.$text;
I try to add line breaks when I echo to a html page. I tried 3 different things. But none of them seem to work:
$text = preg_replace("/<\/p>/","</p>\n\n",$text);
$text = preg_replace("/<\/p>/","</p><br><br>",$text);
$text = nl2br($text);
echo $text;
If I go to the web inspector in the Safari browser, I get this:
<p>First paragraph very long.</p><p>Second paragraph very long.</p><p>Third paragraph.</p>
I would like to have this:
<p>First paragraph very long.</p>\n\n
<p>Second paragraph very long.</p>\n\n
<p>Third paragraph.</p>\n\n
It seems that my regex does not select <\/p> even when I escape it. I do not understand. What is wrong?
Presuming you need newline control chars (and not html line break tags):
$text = "First paragraph very long.\nSecond paragraph very long.\nThird paragraph.";
$text = '<p>' . preg_replace("~\n~", "</p>\n\n<p>", trim($text)) . '</p>';
Note that trim is used in case you have leading or trailing newlines, and ~ is used as the delimiter because / is a poor choice when dealing with HTML: it forces you to escape all over the place.
It is not apparent in the above example, but taking some of your regex as an example:
preg_replace("~</p>~","</p>\n\n",$text);
is much easier to read than:
preg_replace("/<\/p>/","</p>\n\n",$text);
Also, you don't need regex; you could just use str_replace:
$text = '<p>' . str_replace("\n", "</p>\n\n<p>", trim($text)) . '</p>';
Or even explode/implode:
$text = '<p>' . implode("</p>\n\n<p>", explode("\n", trim($text))) . '</p>';
If it was html line breaks you wanted, then you could just edit the replacement argument to:
"</p><br><br><p>"
in any of the above, but it would probably be better to use some css:
p{
margin-bottom:10px;
}
You don't need regex, simple str_replace works (in your example):
$text = str_replace( "</p><p>","</p>\n<p>",$text );
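As a side note on why the three attempts in the question appeared to do nothing: the original code only ever inserts opening <p> tags, so the string contains no </p> for the regex to find, and no newlines for nl2br to convert. The closing tags shown in the web inspector are added by the browser when it builds the DOM. A small check, reusing the question's sample text:

```php
<?php
$text = "First paragraph very long.\nSecond paragraph very long.\nThird paragraph.";

// The question's original two steps.
$text = preg_replace("/\n/", "<p>", $text);
$text = '<p>' . $text;

var_dump(strpos($text, '</p>'));  // bool(false): there is no closing tag to match
var_dump(strpos($text, "\n"));    // bool(false): the newlines are already gone

echo $text;
// <p>First paragraph very long.<p>Second paragraph very long.<p>Third paragraph.
```

Viewing the page source rather than the inspector shows this raw string.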
I'm using the code below to highlight one word in text loaded with file_get_contents and jump to an anchor.
$file='
IAR6=1002
SHF6=1
REF6=0002
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
TY8=2
DATE8=20130820182429
STAT_N8=1002
SH8=1
OP8=S123
SEQ8=0002120000081
';
$Seq = '0002110000001'; // quoted: an unquoted literal with a leading 0 is parsed as octal
$text = preg_replace("/\b($Seq)\b/i", '<span class="highlight"><a name="here">\1</a></span>', $file);
For now this highlights: 0002110000001
I would like to highlight the whole part that shares the same index number.
Example:
looking for 0002110000001
highlight only this part of the text, where the index is 7
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
Any help will be appreciated.
EDIT:
I'll try to be more specific.
The file contains many blocks, each always starting with TYx (x is an auto-incremented number).
I have the SEQ number for my search, in this example 0002110000001.
The preg_replace call above finds 0002110000001 and highlights it.
What I need is to highlight everything between TY7 and TY8 instead of only 0002110000001.
I hope this is clear enough despite my bad English.
Thanks
You can make use of stripos() and explode() in PHP
<?php
$file='
IAR6=1002
SHF6=1
REF6=0002
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
TY8=2
DATE8=20130820182429
STAT_N8=1002
SH8=1
OP8=S123
SEQ8=0002120000081
';
//$Seq = "0002110000001";
$Seq = "7";
$new_arr=explode(PHP_EOL,$file);
foreach($new_arr as $k=>$v)
{
if(stripos($v,$Seq)!==false)
{
echo "$v\n";
}
}
OUTPUT :
TY7=2
DATE7=20130820182357
STAT_N7=1002
SEQ7=0002110000001
STA7=000005
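Searching for the bare digit works on this sample, but it would also match any line that merely contains a 7 in its value. A possible alternative, sketched under the assumption that every key is made of uppercase letters and underscores followed by the block index, is to look up the index from the SEQ line first and then wrap the whole run of lines carrying that index. The function name highlightBlock is invented for this example.

```php
<?php
// Hypothetical helper: highlights the block whose SEQ<n> line equals $seq.
function highlightBlock(string $file, string $seq): string
{
    // Step 1: find the index n, e.g. "SEQ7=0002110000001" yields "7".
    $find = '/^SEQ(\d+)=' . preg_quote($seq, '/') . '\r?$/m';
    if (!preg_match($find, $file, $m)) {
        return $file; // sequence not found: return the text unchanged
    }
    $index = $m[1];

    // Step 2: match consecutive lines of the form KEY<n>=value.
    $line = '[A-Z_]+' . $index . '=.*';
    $block = '/^' . $line . '(?:\R' . $line . ')*/m';

    // Step 3: wrap the first such run in the highlight markup from the question.
    return preg_replace(
        $block,
        '<span class="highlight"><a name="here">$0</a></span>',
        $file,
        1
    );
}

$file = "REF6=0002\nTY7=2\nDATE7=20130820182357\nSTAT_N7=1002\n"
      . "SEQ7=0002110000001\nSTA7=000005\nTY8=2\nDATE8=20130820182429\n";

echo highlightBlock($file, '0002110000001');
```

The sequence number is passed as a string on purpose, so its leading zeros survive.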
What is the easiest way of applying highlighting of some text excluding text within OCCASIONAL tags "<...>"?
CLARIFICATION: I want the existing tags PRESERVED!
$t =
preg_replace(
"/(markdown)/",
"<strong>$1</strong>",
"This is essentially plain text apart from a few html tags generated with some
simplified markdown rules: <a href=markdown.html>[see here]</a>");
Which should display as:
"This is essentially plain text apart from a few html tags generated with some simplified markdown rules: see here"
... BUT NOT MESS UP the text inside the anchor tag (i.e. <a href=markdown.html> ).
I've heard the arguments of not parsing html with regular expressions, but here we're talking essentially about plain text except for minimal parsing of some markdown code.
Actually, this seems to work ok:
<?php
$item="markdown";
$t="This is essentially plain text apart from a few html tags generated
with some simplified markdown rules: <a href=markdown.html>[see here]</a>";
//_____1. apply emphasis_____
$t = preg_replace("|($item)|","<strong>$1</strong>",$t);
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=
// <strong>markdown</strong>.html>[see here]</a>"
//_____2. remove emphasis if WITHIN opening and closing tag____
$t = preg_replace("|(<[^>]+?)(<strong>($item)</strong>)([^<]+?>)|","$1$3$4",$t);
// this preserves the text before ($1), after ($4)
// and inside <strong>..</strong> ($2), but without the tags ($3)
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=markdown.html>
// [see here]</a>"
?>
A string like $item="odd|string" would cause some problems, but I won't be using that kind of string anyway... (probably needs htmlentities(...) or the like...)
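For search terms containing regex metacharacters, preg_quote (rather than htmlentities) is probably the tool to reach for, with the delimiter passed as its second argument. A minimal sketch of the same two-step idea with a quoted term; the sample string is invented for the illustration and the capture groups are simplified compared with the code above:

```php
<?php
$item = 'odd|string';
$t = 'An odd|string here, see <a href=odd|string.html>[link]</a>';

// Escape regex metacharacters and the | delimiter used below.
$quoted = preg_quote($item, '|');

// 1. apply emphasis everywhere
$t = preg_replace("|($quoted)|", '<strong>$1</strong>', $t);

// 2. remove the emphasis again where it landed inside a tag
$t = preg_replace("|(<[^>]+?)<strong>($quoted)</strong>([^<]+?>)|", '$1$2$3', $t);

echo $t;
// An <strong>odd|string</strong> here, see <a href=odd|string.html>[link]</a>
```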
You could split the string into tag/no-tag parts using preg_split:
$parts = preg_split('/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
Then you can iterate over the parts, skipping every second part (i.e. the tag parts, which sit at the odd indexes), and apply your replacement to the rest:
for ($i=0, $n=count($parts); $i<$n; $i+=2) {
$parts[$i] = preg_replace("/(markdown)/", "<strong>$1</strong>", $parts[$i]);
}
At the end put everything back together with implode:
$str = implode('', $parts);
But note that this is really not the best solution. You would be better off using a proper HTML parser like PHP's DOM library. See for example these related questions:
Highlight keywords in a paragraph
Regex / DOMDocument - match and replace text not in a link
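Put together, the preg_split steps from this answer might look like the following runnable sketch. The sample string is a shortened version of the one in the question.

```php
<?php
$str = 'simplified markdown rules: <a href=markdown.html>[see here]</a>';

// Split into text and tag parts; the captured tags are kept in the result.
$parts = preg_split(
    '/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/',
    $str,
    -1,
    PREG_SPLIT_DELIM_CAPTURE
);

// Even indexes hold plain text, odd indexes hold the tags.
for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
    $parts[$i] = preg_replace('/(markdown)/', '<strong>$1</strong>', $parts[$i]);
}

$str = implode('', $parts);
echo $str;
// simplified <strong>markdown</strong> rules: <a href=markdown.html>[see here]</a>
```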
First replace the string wherever it follows a tag, forcing a tag in front of your text so that the leading text qualifies too:
$show=preg_replace("|(>[^<]*)(markdown)|i",'$1<strong>$2</strong>',"<null>$t");
Then delete your forced tag:
$show=preg_replace("|<null>|",'',$show);
You could split your string into an array at every '<' or '>' using preg_split(), then loop through that array and replace only in entries not beginning with a '>'. Afterwards you combine your array into a string using implode().
This regex should strip all HTML opening and closing tags: /(<.*?>)+/
You can use it with preg_replace like this:
$test = "Hello <strong>World!</strong>";
$regex = "/(<.*?>)+/";
$result = preg_replace($regex,"",$test);
actually this is not very efficient, but it worked for me
$your_string = '...';
$search = 'markdown';
$left = '<strong>';
$right = '</strong>';
$left_Q = preg_quote($left, '#');
$right_Q = preg_quote($right, '#');
$search_Q = preg_quote($search, '#');
while(preg_match('#(>|^)[^<]*(?<!'.$left_Q.')'.$search_Q.'(?!'.$right_Q.')[^>]*(<|$)#isU', $your_string))
$your_string = preg_replace('#(^[^<]*|>[^<]*)(?<!'.$left_Q.')('.$search_Q.')(?!'.$right_Q.')([^>]*<|[^>]*$)#isU', '${1}'.$left.'${2}'.$right.'${3}', $your_string);
echo $your_string;
$variable = 'Afrikaans
Shqip - Albanian
Euskara - Basque';
How do I convert each new line to paragraph?
$variable should become:
<p>Afrikaans</p>
<p>Shqip - Albanian</p>
<p>Euskara - Basque</p>
Try this:
$variable = str_replace("\n", "</p>\n<p>", '<p>'.$variable.'</p>');
The following should do the trick :
$variable = '<p>' . str_replace("\n", "</p><p>", $variable) . '</p>';
Be careful: with the other proposals, some line breaks are not caught.
This function works on Windows, Linux or MacOS :
function nl2p($txt){
return str_replace(["\r\n", "\n\r", "\n", "\r"], '</p><p>', '<p>' . $txt . '</p>');
}
$array = explode("\n", $variable);
$newVariable = '<p>'.implode('</p><p>', $array).'</p>';
<?php
$variable = 'Afrikaans
Shqip - Albanian
Euskara - Basque';
$prep0 = str_replace(array("\r\n" , "\n\r") , "\n" , $variable);
$prep1 = str_replace("\r" , "\n" , $prep0);
$prep2 = preg_replace(array('/\n\s+/' , '/\s+\n/') , "\n" , trim($prep1));
$result = '<p>'.str_replace("\n", "</p>\n<p>", $prep2).'</p>';
echo $result;
/*
<p>Afrikaans</p>
<p>Shqip - Albanian</p>
<p>Euskara - Basque</p>
*/
?>
Explanation:
$prep0 and $prep1: Make sure each line ends with \n.
$prep2: Remove redundant whitespace. Keep linebreaks.
$result: Add p tags.
If you don't include $prep0, $prep1 and $prep2, $result will look like this:
<p>Afrikaans
</p>
<p>Shqip - Albanian
</p>
<p>Euskara - Basque</p>
Not very nice, I think.
Also, don't use preg_replace unless you have to. In most cases, str_replace is faster (at least according to my experience). See the comments below for more information.
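If a single regex call is acceptable, the normalisation steps above can be collapsed with preg_split and the \R escape, which matches any line-break sequence. This is only a sketch of that alternative, not a replacement for the str_replace version; the function name linesToParagraphs is invented here.

```php
<?php
// Hypothetical helper name; not part of PHP itself.
function linesToParagraphs(string $text): string
{
    // \R matches \r\n, \n, \r and other line breaks; empty pieces are dropped.
    $lines = preg_split('/\R/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // Trim stray whitespace and drop lines that were whitespace only.
    $lines = array_filter(array_map('trim', $lines), function ($line) {
        return $line !== '';
    });

    if (count($lines) === 0) {
        return '';
    }
    return '<p>' . implode("</p>\n<p>", $lines) . '</p>';
}

echo linesToParagraphs("Afrikaans\r\nShqip - Albanian \n\nEuskara - Basque");
// <p>Afrikaans</p>
// <p>Shqip - Albanian</p>
// <p>Euskara - Basque</p>
```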
Try:
$variable = 'Afrikaans
Shqip - Albanian
Euskara - Basque';
$result = preg_replace('/^(.+?)\r?$/m', '<p>$1</p>', $variable);
echo $result;
I know this is a very old thread, but I want to highlight that the suggested solutions can have some issues in the HTML world:
They do not check whether there is already a p tag around the respective paragraph. This can result in extra paragraphs. At least some browsers will then show this as extra paragraphs, meaning <p>line1<p>line2</p>line3</p> will result in 3 paragraphs, which may not be the intention.
In fact, there are a bunch of tags that are not expected inside of p, as per the spec of phrasing content. Or rather, there is a limited set of tags that can be.
They will change new lines inside tags where you want to preserve new lines as is. pre and textarea are the ones where you would generally want that. code, samp, kbd and var are examples of other common candidates, but technically it can be any tag with the white-space CSS property set to pre, pre-wrap, pre-line or break-spaces.
They usually only check for \r\n or just \r or \n, while there are actually more symbols that mean a new line, and they also have respective HTML entities, which can easily occur in an HTML string.
To "combat" these flaws, at least, to an extent, I've just released a nl2tag library, which can also "convert" new lines to <li> items and has an "improved" nl2br logic (mostly for the sake of whitespace retention).
It's far from perfect (check the readme for limitations), but should cover you in case of relatively simple HTML string.