Make HTML readable again - php

I have some HTML code in a file created by an online JS editor
<h1>Title</h1><p>Some text</p><p>Some text</p>
that is not easily readable offline.
I'd like to split it like this with PHP, which is more readable:
<h1>Title</h1>
<p>Some text</p>
<p>Some text</p>
I can do a string replace that adds a newline after each closing tag, but then every save adds another newline, so they pile up if I save several times.
Do you have any suggestion?
Thank you.
P.S. the online JS editor is Summernote, maybe there is a config to work around this?

What you are looking for is called "unminify HTML". There are some online tools that can do the work, such as:
unminify.com
textfixer.com
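If an online tool is not an option, the string-replace idea from the question can be made safe to run repeatedly. The following is a minimal sketch, not a full formatter: splitBlockTags is a hypothetical helper name, and the list of tags is an assumption to adjust to whatever Summernote emits. It replaces any whitespace between a closing block tag and the next tag with exactly one newline, so saving several times gives the same result.

```php
<?php
// Hypothetical helper: puts each block-level element on its own line.
// The tag list below is an assumption; extend it as needed.
function splitBlockTags(string $html): string
{
    // Match a closing block tag plus any whitespace that sits before the next tag.
    $pattern = '~(</(?:h[1-6]|p|div|ul|ol|li|blockquote|table|tr)>)\s*(?=<)~i';

    // Replace that whitespace with a single newline; fall back to the input on a regex error.
    return preg_replace($pattern, '$1' . "\n", $html) ?? $html;
}

$once  = splitBlockTags('<h1>Title</h1><p>Some text</p><p>Some text</p>');
$twice = splitBlockTags($once); // running it again changes nothing

echo $once, "\n";
```

Because existing whitespace is consumed before the newline is written, the function is idempotent. It does not indent nested elements and does not know about pre blocks, so it only suits simple editor output.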

Following the suggestions of Mohamed, I found Tidy.
Tidy comes with both a shell command (http://tidy.sourceforge.net/) and a PHP library (http://php.net/manual/en/book.tidy.php). Both work very well and provide several tools to maintain HTML code.

Related

Include Line Breaks with htmlspecialchars

I am creating a pattern library similar to A List Apart's here: http://patterns.alistapart.com/. I am using the following code snippet to grab the contents of the HTML file and output it on the page (see the code area on A List Apart's example):
echo "<code class=\"col-md-8 col-sm-6 prettyprint pattern-markup language-markup\">".htmlspecialchars(file_get_contents($dir.'/'.$ff))."</code>\n";
It is pulling in the code, but all line breaks in the HTML are lost, so the code runs together. Is it possible to add the line breaks back into the code view? I know nl2br is an option, but that only helps when outputting the HTML in visual mode, not as code.
Thanks!
This has absolutely nothing to do with stripping newlines and everything to do with how basic HTML works.
Try this:
<p>Hello
World</p>
Do you see newlines? Nope.
Try applying the CSS white-space: pre-wrap to your element.
Just use pre tags and everything will work fine. But place the code like this:
<pre>Start from here
Not from here
</pre>
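Both answers above come down to the same thing: the newlines are still in the output, the browser just collapses them unless the element preserves whitespace. A minimal sketch of the pre approach, where renderCodeView is a hypothetical helper name and the class name is taken from the question:

```php
<?php
// Hypothetical helper: escapes the markup and wraps it so the browser keeps line breaks.
function renderCodeView(string $source): string
{
    // Escape so the markup is displayed as text instead of being rendered.
    $escaped = htmlspecialchars($source, ENT_QUOTES, 'UTF-8');

    // <pre> preserves whitespace; start the content right after the opening tag
    // to avoid an empty first line.
    return '<pre><code class="language-markup">' . $escaped . '</code></pre>';
}

$view = renderCodeView("<p>Hello\nWorld</p>");
echo $view, "\n";
```

Keeping the original code element and adding white-space: pre-wrap in CSS would presumably work just as well; the pre wrapper simply needs no stylesheet change.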

Every time <br> when needed

A part of my HTML looks like this:
Now of course I want some of this text to go to a new line. I can put <br> after each line, but I was wondering whether there is a better/easier way to do it because it could be pretty annoying with long text like this. Like could I use a foreach for this or something?
Put your text inside a pre tag:
<pre>
.. your text
</pre>
and you do not need to add br after each line.
As it seems to be a portion of code, I suggest using SyntaxHighlighter. You will not have to care about either indentation or line numbers. See the official website.
You can use this
<p style="word-wrap: break-word;width:40px"> your content</p>
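On the question of whether a loop could add the breaks: if the text is available as a PHP string, no foreach is needed, because the built-in nl2br inserts a br before every newline. A minimal sketch, assuming the text sits in a variable named $text (the actual markup from the question is not shown, so the sample text is made up):

```php
<?php
// Sample text standing in for the content from the question (assumption).
$text = "First line\nSecond line\nThird line";

// Escape first so any special characters in the text are shown literally,
// then let nl2br add a <br /> in front of each newline.
$withBreaks = nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));

echo $withBreaks, "\n";
```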

Download text-only webpage

The question title says it all, after a bit of Googling and several days of tinkering with code, I cannot figure out how to download the plain text of a webpage.
Using strip_tags(); still leaves the JavaScript and CSS and trying to clean it up with regex also causes issues.
Is there any (simple or complicated) way to download a webpage (say a Wikipedia article) in plain-text using PHP?
I downloaded the page using PHP's file_get_contents(); as here:
$homepage = file_get_contents('http://www.example.com/');
As I said, I tried using strip_tags(); etc but I can't get the plain text.
I've tried using: http://millkencode.googlecode.com/svn/trunk/htmlxtractor/ContentExtractor.php to get the main content but it doesn't seem to work.
This is not nearly as easy as it seems. I'd recommend looking at something like PHP Simple HTML DOM Parser. Aside from JavaScript and CSS being hard to remove (and regex not being the proper tool for HTML), there could still be inline styling and similar leftovers.
This, of course, is relative to the complexity of the HTML. strip_tags could be sufficient in some cases.
Use this code:
require_once('simple_html_dom.php');
$content=file_get_html('http://en.wikipedia.org/wiki/FYI');
$title=$content->find("#firstHeading",0)->plaintext ;
$text=$content->find("#bodyContent",0)->plaintext;
echo $title.$text;
http://simplehtmldom.sourceforge.net
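An alternative that needs no third-party library is PHP's bundled DOM extension. The sketch below is only an approximation of "plain text": htmlToPlainText is a hypothetical helper name, it assumes the DOM extension is enabled, and it joins text nodes with single spaces, which can leave a stray space before punctuation that follows an inline tag.

```php
<?php
// Hypothetical helper: drops <script> and <style>, then collects the remaining text.
function htmlToPlainText(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Remove the elements whose content strip_tags() would leave behind.
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Collect every remaining text node, collapsing runs of whitespace.
    $parts = [];
    foreach ($xpath->query('//text()') as $textNode) {
        $piece = trim(preg_replace('/\s+/', ' ', $textNode->nodeValue));
        if ($piece !== '') {
            $parts[] = $piece;
        }
    }

    return implode(' ', $parts);
}

$sample = '<html><head><style>p{color:red}</style><script>var x = 1;</script></head>'
        . '<body><h1>Title</h1><p>Some text</p></body></html>';
$plain = htmlToPlainText($sample);
echo $plain, "\n";
```

In the scenario from the question, $sample would be replaced by the string returned from file_get_contents. Note that loadHTML assumes ISO-8859-1 unless the page declares its charset, so non-ASCII pages may need extra handling.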

Regular expression to match block of HTML

First I'll show you a sample of the code I'm working with:
<div class="entry">
<p>Any HTML content could go here!</p>
</div>
</div><!--/post -->
Normally I'd use a regex rule such as the following to look for a prefix and a suffix and grab everything in between:
(?<=<div class="entry">).*(?=</div><!--/post -->)
However, that doesn't appear to be working, as it seems to be pulling the white space in between the following parts instead of the HTML content itself:
<div class="entry">
<p>
Any help/suggestions would be much appreciated as I've been bashing my head with this one for a good few hours now.
Many thanks in advance.
Don't use regex to parse HTML. You need an XML parser or similar.
Search Stack Overflow for the best one, for example: Robust and Mature HTML Parser for PHP
You can also consider php strip_tags().
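As an illustration of the parser route, here is a minimal sketch using PHP's bundled DOM extension. extractEntryHtml is a hypothetical helper name, and the XPath expression assumes the class attribute is exactly "entry", as in the sample from the question. (The likely reason the original pattern fails is that . does not match newlines unless the s modifier is set.)

```php
<?php
// Hypothetical helper: returns the inner HTML of the first <div class="entry">, or null.
function extractEntryHtml(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate fragments and sloppy markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $entry = $xpath->query('//div[@class="entry"]')->item(0);
    if ($entry === null) {
        return null;
    }

    // Serialize each child node to rebuild the inner HTML.
    $inner = '';
    foreach ($entry->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }

    return trim($inner);
}

$page = "<div class=\"post\">\n<div class=\"entry\">\n"
      . "<p>Any HTML content could go here!</p>\n</div>\n</div><!--/post -->";
$entryHtml = extractEntryHtml($page);
echo $entryHtml, "\n";
```

Unlike the lookbehind pattern, this does not depend on line breaks or on the trailing comment being present.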

Help with changing how jWYSIWYG editor works

In jWYSIWYG editor, pushing enter inserts <br />s.
Instead of this, I would prefer that pushing enter would wrap chunks in <p> tags.
WHAT IS OUTPUT
line
<br />
new line
WHAT I WANT
<p>line</p>
<p>new line</p>
A quick examination of the config suggests I can't do it without hacking the plugin internally.
Do you suggest I hack the plugin, or use PHP to do it? The incoming HTML is parsed with HTML Purifier, so if that could do it, that would be great.
So - where should I do it, in the plugin or PHP?
Any quick implementations of how to do it?
Thanks
You could search and replace the <br>s with newlines, and then use HTML Purifier's %AutoFormat.AutoParagraph directive.
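If pulling the conversion into HTML Purifier is not convenient, the same effect can be approximated in plain PHP for simple input. This is only a sketch and not what AutoParagraph does internally: brToParagraphs is a hypothetical helper name, and it assumes the editor output consists of inline content separated by br tags, with no existing block elements.

```php
<?php
// Hypothetical helper: turns br-separated chunks into <p> blocks.
function brToParagraphs(string $html): string
{
    // Split on <br>, <br/> or <br />, swallowing the whitespace around each tag.
    $chunks = preg_split('~\s*<br\s*/?>\s*~i', trim($html));

    $paragraphs = [];
    foreach ($chunks as $chunk) {
        if ($chunk !== '') { // consecutive <br>s produce empty chunks; skip them
            $paragraphs[] = '<p>' . $chunk . '</p>';
        }
    }

    return implode("\n", $paragraphs);
}

$converted = brToParagraphs("line\n<br />\nnew line");
echo $converted, "\n";
```

Running this before HTML Purifier would presumably be the safer order, so that the purifier still validates the final markup.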
