replace string pattern in HTML text with PHP - php

For my customer I wrote a custom web-based WYSIWYG HTML editor. It allows them to format basic HTML text and insert images. When they insert an image, I insert a placeholder pattern such as ##image1##. The produced HTML can look like this:
<p>some text and some more text</p>
<p>some text and some <b>bold text</b></p>
<div>##image1##</div>
<p>more text can follow here</p>
<div>##image2##</div>
When outputting this HTML I search through it and replace the placeholders ##image1##, ##image2## and so on with the HTML markup that actually displays the images. My replace code is here:
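// first find all occurrences of the image string
preg_match_all('|##(.+)##|', $inputHTML, $matches);
$output = $inputHTML;
foreach ($matches[1] as $imageName) {
    // $imageHTML is the image markup built for $imageName;
    // replace one occurrence per pass
    $output = preg_replace('|##(.+)##|', $imageHTML, $output, 1);
}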
This works most of the time, but some variations of the input HTML produce a strange result. One piece of HTML that produces a strange result is:
<div>##image1##</div><p class="align-justify"><strong>Peter Dekleva</strong>, <strong>Damir Lisica</strong>, <strong>Anej Kočevar</strong> in <strong>Gregor Jakac</strong> so glasbeniki, ki v svoji glasbi združujejo silovite instrumentalne vložke, markantne melodije in močna besedila.</p><div>##image2##</div><p class="align-justify">Video dvojček skladbe Brez strahu torej prikazuje oblico sproščenih trenutkov iz zaodrja, veličasnih posnetkov s koncertnega dogajanja, priprav na nastope, nepredvidljive zaključke noči.</p>
If I edit that HTML and add a line break before <div>##image2##</div> then it is parsed correctly. Any idea what is happening here and why I have problems?
I am also open to suggestions for a better way of doing this. I can insert something else instead of ##image1## when inserting an image in my WYSIWYG editor... Thanks

This is because the + quantifier is greedy, so .+ matches everything up to the last instance of ## on the line. Try adding a ? after the + to make it ungreedy.
|##(.+?)##|
The reason a line break fixes the problem is that, by default, the . doesn't match line breaks. However, if you had used |##(.+)##|s instead, the line break wouldn't have fixed the problem.
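The difference between the greedy and the ungreedy pattern can be checked with a minimal sketch. The sample string below is a shortened, made-up version of the problematic HTML, and the variable names $greedy and $lazy are only for illustration.

```php
<?php
// Two placeholders on a single line, as in the problematic input.
$inputHTML = '<div>##image1##</div><p>text</p><div>##image2##</div>';

// Greedy: .+ runs to the last ## on the line, so there is only one match.
preg_match_all('|##(.+)##|', $inputHTML, $greedy);

// Ungreedy: .+? stops at the first closing ##, so each placeholder matches.
preg_match_all('|##(.+?)##|', $inputHTML, $lazy);

echo count($greedy[1]), " greedy match(es)\n";
echo implode(', ', $lazy[1]), "\n";
```

With the greedy pattern the single capture spans from image1 to image2, which is why the first replacement swallows the text between the two placeholders.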
Edit I just noticed that churk's answer to your previous question would have also worked correctly.

You should create the <img/> tag directly. Failing that, if you never use # in your image names, use [^#] instead of . so the match cannot run past the closing ##.
Also, if you are not sure that ## won't be used elsewhere in the HTML, match the surrounding <div> too:
<div>##([^#]+)##</div>
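One way to apply the negated character class in a single pass is preg_replace_callback, sketched below. The $images lookup table, its paths and the sample HTML are assumptions for illustration only; the real editor would supply its own data.

```php
<?php
// Hypothetical map from placeholder name to image URL.
$images = ['image1' => '/img/a.jpg', 'image2' => '/img/b.jpg'];

$inputHTML = '<div>##image1##</div><p>text</p><div>##image2##</div><div>##missing##</div>';

$output = preg_replace_callback(
    '|##([^#]+)##|',
    function (array $m) use ($images): string {
        // Leave unknown placeholders untouched.
        if (!isset($images[$m[1]])) {
            return $m[0];
        }
        return '<img src="' . htmlspecialchars($images[$m[1]]) . '" alt="">';
    },
    $inputHTML
);

echo $output, "\n";
```

Because every placeholder is handled by the callback in one call, there is no need for a separate preg_match_all pass or a loop with a replacement limit.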

Related

PHP output string and maintain spacing [duplicate]

Any ideas why formatted text from DB, when echo-ed out in php loses its formatting, i.e. no new lines? Thanks!
Use nl2br().
New lines are ignored by browser. That's why you see all text without line breaks. nl2br() converts new lines to <br /> tags that are displayed as new lines in browsers.
If you want to display your text in a <textarea>, you don't need to convert new lines to <br />. If you do it anyway, you will see "<br />" as literal text where the new lines were.
Because there are no html tags for formatting!
Try the nl2br function.
You could try adding the nl2br() function...
something like this: echo nl2br($your_text_variable);
It should work ;-)
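As a minimal sketch of the nl2br() suggestion: the sample text is made up, and the htmlspecialchars() call is an addition that assumes the stored value is plain text rather than HTML.

```php
<?php
// Sample value as it might come from the database (plain text, not HTML).
$text = "First line\nSecond line <b>not bold</b>";

// Escape first so stored text cannot inject markup, then add <br /> tags.
$html = nl2br(htmlspecialchars($text));

echo $html, "\n";
```

nl2br() inserts the <br /> before each newline and keeps the newline itself, so the page source stays readable.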
The reason
This is the default behavior for all user agents. If you look at the page source, you'll see that your text has the same formatting as the text in the database (or textarea).
The reason for your confusion is probably that in one case you see the text in a <textarea> tag, which displays preformatted text and does not interpret tags, while in the other case the text is interpreted as HTML (where whitespace is not significant).
Browsers don't display new lines unless specifically asked to, using a <br> tag or any block-level tag.
No tags == no new lines.
The fix
If you store preformatted text in the database,
you should wrap the output in the <pre> tag.
You may want to convert the formatting characters to the HTML tags you need using a set of functions like nl2br, str_replace etc.
You may also correct your structure to store the HTML in the database instead of just plain text (however markup looks like a better solution).
See similar question:
How do I keep whitespace formatting using PHP/HTML?
The difference between the two images you show is that one has the text in a <textarea></textarea> and the other does not ... if you want 1:1: <textarea><?php echo $yourVariable;?></textarea>
It does output what you say to output. If the text is pre-formatted, put it inside the HTML <pre></pre> tag in your output script.
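The <pre> approach could look like the sketch below. The helper name preformatted() is hypothetical, and escaping with htmlspecialchars() assumes the stored text is plain text.

```php
<?php
// Hypothetical helper: wrap plain text in <pre> so the browser keeps
// its line breaks and runs of spaces exactly as stored.
function preformatted(string $text): string
{
    return '<pre>' . htmlspecialchars($text) . '</pre>';
}

echo preformatted("a\n  b & c"), "\n";
```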
This should be helpful in answering.
How do I keep whitespace formatting using PHP/HTML?
Set up string preprocessing code for both the input to the database and the output to the display page.

RegEx replace not working in PHP

I've written a regular expression to get the first two paragraphs from a database clob which stores its content in HTML formatting.
I've checked with these online RegEx builder/checkers here and here and they both seem to be doing what I want them to do (I've altered the RegEx slightly since these checkers to handle the new line formatting which I found afterwards).
However when I go to use this in my PHP it doesn't seem to want to get just the group I'm after, and instead matches everything.
Here is my preg_replace line:
$description = preg_replace('/(^.*?)((<p[^>]*>.*?<\/p>\s*){2})(.*)/', "$2", $description);
And here is my testing content in the format of the content I am getting
<p>
Paragraph 1</p>
<p>
Paragraph 2</p>
<p>
Paragraph 3</p>
I've had a look at this SO Post which didn't help.
Any Ideas?
EDIT
As pointed out in one of the comments you cannot Regex HTML in PHP (Don't know why, I'm not really bothered by that).
Now I'm opening the option for getting it in PL/SQL as well.
select
DBMS_LOB.substr(description, 32000, 1) /* How do I make this into a regular expression? */
from
blog_posts
Your input contains newlines, therefore you have to add the s modifier:
/(^.*?)((<p[^>]*>.*?<\/p>\s*){2})(.*)/s
Otherwise, .* breaks on newlines and the regex doesn't match.
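The effect of the s modifier can be seen with the test content from the question. This is a minimal sketch; the variable names $withoutS and $withS are only for illustration.

```php
<?php
$description = "<p>\nParagraph 1</p>\n<p>\nParagraph 2</p>\n<p>\nParagraph 3</p>";
$pattern = '/(^.*?)((<p[^>]*>.*?<\/p>\s*){2})(.*)/';

// Without s: . cannot cross the newline inside each <p>, so nothing
// matches and preg_replace returns the input unchanged.
$withoutS = preg_replace($pattern, '$2', $description);

// With s: . also matches newlines, so group 2 holds the first two paragraphs.
$withS = preg_replace($pattern . 's', '$2', $description);

echo $withS;
```

An unchanged return value is what makes it look as if the expression "matches everything": the pattern did not match at all, so no replacement took place.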
You could take a look at the PHP Simple HTML DOM Parser. Going by their manual, you could do something like this:
$html = str_get_html('your html string');
// find('p') should return all the paragraph elements in your string
foreach ($html->find('p') as $element) {
    echo $element->plaintext . '<br>';
}
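A similar result can be sketched without a third-party library by using PHP's DOMDocument class. This swaps in a different parser from the one named above and assumes the DOM extension is enabled; the helper name firstParagraphs() is hypothetical.

```php
<?php
// Hypothetical helper: return the first $count <p> elements as HTML.
function firstParagraphs(string $html, int $count = 2): string
{
    $doc = new DOMDocument();
    // Collect parser warnings for imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $out = '';
    $seen = 0;
    foreach ($doc->getElementsByTagName('p') as $p) {
        if ($seen >= $count) {
            break;
        }
        $out .= $doc->saveHTML($p);
        $seen++;
    }
    return $out;
}

$description = "<p>\nParagraph 1</p>\n<p>\nParagraph 2</p>\n<p>\nParagraph 3</p>";
echo firstParagraphs($description), "\n";
```

Note that loadHTML() may need an explicit encoding hint for non-ASCII content; that is left out of this sketch.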

What is the best way to parse text and code in my PHP blog?

Usually, I use nl2br() and it does come out just like it's entered in the textarea, but this causes a problem when using bbcode or posting code in <code> or <pre> tags, since it adds extra line breaks.
For example this code
[sub-title]test[/sub-title]
some text here.
I'd like it to look like that when displayed in the browser, but because [sub-title] becomes <div class="sub-title"> the <br /> adds an extra line break, so it will look like this (with 2 line breaks in between)
**test**
some text here.
I haven't fully looked into it yet, but could the PHP bbcode parser help, or is the only/best solution to use regex?
You can still use nl2br() if you first strip the newline that directly follows a closing ] bracket.
Example
$message = nl2br(preg_replace('#(\\]{1})(\\s?)\\n#Usi', ']', stripslashes($message)));
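The same idea, spelled out step by step, might look like the sketch below. The simplified pattern, the sample message and the [sub-title] conversion rule are assumptions for illustration, not the original parser.

```php
<?php
$message = "[sub-title]test[/sub-title]\nsome text here.\nsecond line";

// 1. Drop the newline (and any spaces or tabs before it) right after a
//    closing bracket, so a tag that becomes a block element does not
//    also get a <br />.
$message = preg_replace('/\][ \t]*\r?\n/', ']', $message);

// 2. Escape the text and convert the remaining newlines.
$html = nl2br(htmlspecialchars($message));

// 3. Convert the bbcode tag (hypothetical rule for [sub-title]).
$html = preg_replace(
    '#\[sub-title\](.*?)\[/sub-title\]#s',
    '<div class="sub-title">$1</div>',
    $html
);

echo $html, "\n";
```

The <div> already starts a new line in the browser, so only the newline between the two text lines ends up as a <br />.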

Problem displaying the mysql content in Paragraphs

I insert questions (which might be a few paragraphs long) into a SQL table using PHP and then I display them on a webpage.
But when I display a question it loses its formatting. I mean it just shows the whole question as one paragraph, even though there were many paragraphs before.
<td width=\"700px\" bgcolor=\"#EAD57F\"><font color=\"#4A2A0B\">Question :</font><font color=\"#5E450B\">".$row2['Question']."</font></td>
$row2['Question'] is the question that I am getting from my SQL table by running the SELECT query.
So if I post something like:
a
s
d
f
into my input box,
the output looks like: asdf
How should I resolve this?
Best
Zeeshan
You probably save your paragraphs separated by a "new line" character. To translate that in HTML check the nl2br PHP function (in HTML new line is the <br /> tag).
Are you storing them as plain text, or do they contain HTML tags? If they are stored as plain text you should put them in a <pre> tag or something equivalent in order to preserve the spacing. Alternatively, you could do the encoding into HTML, putting in <p> tags and such where necessary, but that is complicated and easy to get wrong.
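A minimal sketch of the "encode into HTML" option, assuming the question is stored as plain text and paragraphs are separated by blank lines. The helper name textToParagraphs() is hypothetical.

```php
<?php
// Hypothetical helper: turn plain text into <p> blocks, keeping single
// line breaks inside a paragraph as <br />.
function textToParagraphs(string $text): string
{
    // Split on one or more blank lines; \R matches any newline sequence.
    $blocks = preg_split('/\R{2,}/', trim($text));

    $html = '';
    foreach ($blocks as $block) {
        $html .= '<p>' . nl2br(htmlspecialchars($block)) . "</p>\n";
    }
    return $html;
}

echo textToParagraphs("a\ns\n\nd");
```

In the table cell from the question, the result of textToParagraphs($row2['Question']) would be concatenated in place of the raw value.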
