How to convert html into one line string? - php

I have some large templates as html:
<% /*
This is basically all the markup and interface
/* %>
<div id="test" class="test-right"></div>
<div id="test" class="test-right"></div>
which I need as a one-line string, for example:
<% /*\n This is basically all the markup and interface\n*/ %>\n<div id=\"test\" class=\"test-right\"></div>\n <div id=\"test\" class=\"test-right\"></div>
How can you do that?
# Escape every double quote with bash parameter expansion (// replaces all matches)
original_string='test this id="one" test this id="two"'
string_to_replace_Suzi_with='\"'
result_string=${original_string//\"/"$string_to_replace_Suzi_with"}

If you want to replace newlines with a literal \n, you can use awk:
awk -v ORS='\\n' 1 file
This sets the output record separator ORS to a literal \n.
The 1 triggers the default action in awk, print the entire record.
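Note that ORS is printed after every record, including the last one, so the output ends with a trailing \n, and the double quotes are left unescaped. A minimal sketch that also escapes backslashes and double quotes, as in the expected output from the question; the function name html_to_one_line and the sample file are made up for illustration:

```shell
#!/usr/bin/env bash
# html_to_one_line is a hypothetical helper name, not part of any tool.
html_to_one_line() {
  # sed: escape backslashes first, then double quotes.
  # awk: join the lines with a literal \n, without a trailing separator.
  sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' "$1" |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 } END { print "" }'
}

# Made-up sample input standing in for the real template.
sample="$(mktemp)"
printf '%s\n' '<div id="test" class="test-right"></div>' '<div id="b"></div>' > "$sample"
one_line="$(html_to_one_line "$sample")"
rm -f "$sample"
printf '%s\n' "$one_line"
```

For the two sample lines this prints <div id=\"test\" class=\"test-right\"></div>\n<div id=\"b\"></div> on a single line.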

If you are looking for a pure bash way to do this, you can run them in command-line one at a time or in a script.
# Store the contents of the file in a variable
fileContents="$(<file)"
# Create a temporary file with mktemp
tempfile="$(mktemp)"
# Use parameter substitution to replace every newline character with
# an empty string and write the result to the temporary file
printf "%s\n" "${fileContents//$'\n'/}" > "$tempfile"
# Replace the original file with the temporary file
mv "$tempfile" file
But since bash is slower for this trivial task, use Awk and reset ORS from newline to an empty string:
awk -v ORS="" 1 file
<% /* This is basically all the markup and interface/* %><div id="test" class="test-right"></div> <div id="test" class="test-right"></div>
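If what you need is the escaped string from the question rather than plain joining, the same parameter expansion can build it in memory without a temp file. A sketch, assuming bash (the $'...' quoting and ${var//...} are not POSIX sh); the variable names and the sample string are mine:

```shell
#!/usr/bin/env bash
# Made-up sample standing in for the file contents.
html=$'<div id="a">\n</div>'

backslash='\'
quote='"'
newline=$'\n'

# Quoting the pattern and the replacement keeps every character literal.
escaped=${html//"$backslash"/"$backslash$backslash"}   # \  becomes \\
escaped=${escaped//"$quote"/"$backslash$quote"}        # "  becomes \"
escaped=${escaped//"$newline"/"${backslash}n"}         # newline becomes \n

printf '%s\n' "$escaped"
```

Backslashes are escaped first so that the backslashes added by the later steps are not doubled.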

Related

Regex: Exclude first line with brackets

I am trying to strip all HTML brackets, except anything from the first line of code using this REGEX
(?ms)(?!\A)<[^>]*>
It's very close to working, unfortunately it strips the closing brackets from the first line as well. The example I am working with is:
<div id="uniquename">https://www.example.com?item_id=10302</div>
<div id="uniqname2">
<div id="uniqname3">
<h2 id="uniqnametitle">Title</h2>
<div class="row">
<div class="large-3 columns">Example:</div>
<div class="large-9 columns"><b>Sub example</b></div>
</div>
<div class="row">
<div class="large-3 columns">Additional</div>
The current REGEX removes all other HTML tags and excludes the first line with the exception of the trailing div close tag and outputs the following:
<div id="uniquename">https://www.example.com?item_id=10302
Title
Example:
Sub example
Additional
If there is a better way to perform the REGEX than excluding the first line I am open to suggestions. Skipping the first line seems to be the easiest way, however, I need the end bracket to stay intact.
What am I missing in my REGEX?

You can try this
(?ms)((?<firstline>\A[^\n]*)|(<[^>]*>))
With substitution
$firstline
Playground for your example - https://regex101.com/r/ASItOP/3

UPDATE 1 : just realized it could be massively simplified
gawk 'NR==!_ || (NF=NF)*/./' FS='<[^>]+>' OFS=
mawk 'NR==!_ || (NF=NF)*/./' FS='^(<[^>]+>)+|(<[/][^>]+>)+$' OFS=
<div id="uniquename">https://www.example.com?item_id=10302</div>
Title
Example:
Sub example
Additional

You should use an HTML parser in general...
However, you can do:
$ cat <(head -n 1 file) <(sed 1d file | sed -E 's/<[^>]*>//g; /^$/d')
Or an awk:
$ awk 'FNR==1 {print; next}
{gsub(/<[^>]*>/,""); if ($0) print}' file
Either prints:
<div id="uniquename">https://www.example.com?item_id=10302</div>
Title
Example:
Sub example
Additional
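The same idea also fits in a single sed call by restricting both commands to every line except the first. A sketch, assuming GNU sed; strip_tags_keep_first is a hypothetical helper name and the sample is a shortened copy of the input from the question:

```shell
#!/usr/bin/env bash
# strip_tags_keep_first is a hypothetical helper name.
strip_tags_keep_first() {
  # "1!" applies a command to every line except line 1:
  # first strip the tags, then drop lines that ended up empty.
  sed -E -e '1!s/<[^>]*>//g' -e '1!{/^$/d;}' "$1"
}

sample="$(mktemp)"
cat > "$sample" <<'EOF'
<div id="uniquename">https://www.example.com?item_id=10302</div>
<div id="uniqname2">
<h2 id="uniqnametitle">Title</h2>
<div class="large-3 columns">Example:</div>
EOF
stripped="$(strip_tags_keep_first "$sample")"
rm -f "$sample"
printf '%s\n' "$stripped"
```

Like the other regex answers, this only works while no tag spans more than one line.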

Remove line breaks in a xml file, in tags and between, keeping the structure

Long title :)
Anyway, I have many XML files that I want to clean up on the fly, converting them with PHP's preg_replace as they are output.
I can't make the changes permanent, so I've written a PHP function that goes through the file.
What I can't get right is the regex pattern.
https://regex101.com/r/bN5eF4/7
I want to match:
<all-tags with-their="attribute"
even-if-there="are-more">
and all the content between the start and end tag
even if there
are line breaks
in between them
</all-tags>
I bet it's very simple, but I've never handled RegEx very well... sadly.
Edited
It seems people want me to build a parser function with SimpleXML that goes through the XML file and removes the line breaks?
In the same process, I want to remove some elements with their content, depending on what it says in their attributes. Profiling so to speak.
I thought doing line breaks and profiling before processing the xml file with Xsltprocessor would be the faster choice?

I managed to do it with 2 regex patterns.
Input:
<all-tags
with-their="attribute"
even-if-there="are-more"
aa="1">
and all the content between
the start and end tag
</all-tags>
<meta-tag />
1. remove newline before open tag and after closing tag https://regex101.com/r/PPzkWv/2/
/(?<=\>)(\n+)|(\n+)(?=\<)/
Output:
<all-tags
with-their="attribute"
even-if-there="are-more"
aa="1">and all the content between
the start and end tag</all-tags><meta-tag />
2. from the output remove newlines inside tags without breaking semantic https://regex101.com/r/GvBc7J/3/
/(\s?\n+\s+|\n)/
Final output:
<all-tags with-their="attribute" even-if-there="are-more" aa="1">and all the content between the start and end tag</all-tags><meta-tag />
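For checking the two steps outside PHP, here is a rough command-line equivalent in awk rather than preg_replace. It is a sketch, not a translation of the exact patterns above: the third gsub is a simplified stand-in for the second regex, and flatten_xml is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# flatten_xml is a hypothetical helper name.
flatten_xml() {
  awk '
    { buf = buf $0 "\n" }                  # collect the whole file in one string
    END {
      gsub(/>\n+/, ">", buf)               # step 1: newlines right after ">"
      gsub(/\n+</, "<", buf)               # step 1: newlines right before "<"
      gsub(/[ \t]*\n+[ \t]*/, " ", buf)    # step 2: remaining newlines become one space
      print buf
    }' "$1"
}

sample="$(mktemp)"
cat > "$sample" <<'EOF'
<all-tags
with-their="attribute"
even-if-there="are-more"
aa="1">
and all the content between
the start and end tag
</all-tags>
<meta-tag />
EOF
flat="$(flatten_xml "$sample")"
rm -f "$sample"
printf '%s\n' "$flat"
```

On the sample input this gives the same single line as the final output above.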

Try following regex:
/(?<=\>)(\r?\n)|(\r?\n)(?=\<\/)/
Here you are searching for a newline character right after > or right before </ and replacing it with an empty string.
See demo at Regex101
Based on your sample input text, it will remove all newlines and emit content as:
<all-tags with-their="attribute" even-if-there="are-more">and all the content between the start and end tag</all-tags>

Regexp pattern to match (and remove) multi-line PHP code from include files

I'm loading a template file into a string for further processing (using file_get_contents). This template may contain PHP code which I need to remove before sending the reformatted template contents to stdout. The PHP code should not be executed, it should just be removed.
Example:
<h1>This is a template. This is HTML code.</h1>
<?php
// This is a PHP comment.
uselessFunction ('foo', $bar);
/* This is another PHP comment */
?>
<p>This is more HTML code followed by </p><?= outputUselessInfo ('Blah blah') ?>
<h1>More HTML</h1>
<? echo "foo " . $bar; ?>
<p>That's all, folks</p>
I need to strip out all PHP code, leaving me with:
<h1>This is a template. This is HTML code.</h1>
<p>This is more HTML code followed by
<h1>More HTML</h1>
<p>That's all, folks</p>
What regexp pattern would match all PHP code, either single line or multi-line, either long or short tags (and, e.g. by means of preg_replace, remove it, leaving no empty lines as a result of this operation)?
I've been staring myself blind at it but I can't see my way out. According to Google I'm the first one dumb enough to try this, because I haven't managed to find any ready-to-use patterns there.
(PS: I know the use of short tags in PHP is generally discouraged; I just want to cover the possibility.)

Try the following regular expression (replace with ""):
/\n?<\?(php|=)?(.*?)\?>\n?/ms
Explained:
\n? - Tests for a newline
< - Tests for start tag
\? - Tests for '?' after the start tag
(php|=)? - Tests for the 'php' or '=' after the start tag
(.*?) - Tests for any PHP code
\?> - Tests for end tag
\n? - Tests for a newline
/ms - The s modifier lets . match newlines, so the match can span multiple lines (m is not needed here)
EDIT: Fixed Multiline Support

Or try this one
/(<[a-z].*?>.*?>)/gm
but it takes out all html.
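O.k., another try
/(<\?.*?\?>)/gms
Now it should be following the assignment. (In PHP, drop the g: it is not a valid modifier there, and preg_replace already replaces every match.)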

Form deleting spaces

<form action="class.php" method="POST">
thread link:
<br>
<input type="text" name="thread">
<input type="submit" value="Submit">
</form>
I have this simple form. Upon entering a string starting with many spaces, something like
"          test"
my PHP code
echo 'test:'.$_POST['thread'];
will print test: test. It will erase all spaces except one.
Where did all the spaces go and why does this happen?

The HTML specification says that the renderer collapses multiple spaces into one. That is useful in some cases. To avoid it, you can put the content of this field in a <pre></pre> block, like this:
echo '<pre>test:'.$_POST['thread'].'</pre>';

The form does not delete spaces. Neither does your PHP code. The spaces are still there in resulting HTML document (generated by your PHP code in response to form submission). They just get rendered as a single space, since in most contexts, any sequence of whitespace characters in HTML content is equivalent to a single space. This is defined in CSS 2.1 spec, in the description of the white-space property.
Thus, to prevent the collapse of spaces, the simple way is to set white-space: pre in CSS. It also prevents line breaks in the content, but this is probably not a problem here. Using the pre element in HTML causes this setting, but it also sets font family to monospace.
So this is just a matter of HTML and CSS, independently of PHP. Example:
<p> Hello world!</p>
<p style="white-space: pre"> Hello world!</p>

You need to convert the spaces to HTML entities:
$thread = str_replace(' ', '&nbsp;', $_POST['thread']);
and now echo 'test:'.$thread will output your text with the spaces preserved.

This is the most basic thing about HTML. Any whitespace is equivalent and is treated as a single space.
You should never use multiple spaces to try to layout your text in HTML ( like you could do in Word for instance ). You should use css styles like margin or padding instead.
The answers that propose to replace the spaces with &nbsp; are correct, but they leave you on the wrong track.

What good is new line character?

I don't really get it: what's the purpose of a new line character?
If I do this:
<?php
echo "This is a test. \n";
echo "This is another test.";
?>
Code results in both sentences being on the same line. Why doesn't the \n cause the second sentence to appear on a second line?
The sentences are each on their own line if I do:
<?php
echo "This is a test. <br>";
echo "This is another test.";
?>
But I have also seen people do this:
<?php
echo "This is a test. <br>\n";
echo "This is another test.";
?>
Which essentially results in the same output as the second code snippet. Someone care to explain this?

The HTML standard treats a line break as just another white space character, which is why the <br> tag exists. Note however a line break will work within a <pre> tag, or an element with the white-space:pre CSS style.
The third example is just to make "pretty" HTML: it makes it easier to "view source" and check it by eye. Otherwise, you have a big long string of HTML.

as you have observed there are different ways to create a new line.
<br />
this is not a new line character; it is an XHTML tag, which means it works in XHTML.
strictly speaking, the tag forces a line break rather than being a new line character itself. In XHTML the tag must be closed.
XHTML specs
<br>
this is an HTML tag which forces a line break. A closing tag is prohibited.
HTML 4.01 specs
\n
is an escape sequence for the ASCII new line char LF. A common problem is the use of '\n' when communicating using an Internet protocol that mandates the use of ASCII CR+LF for ending lines. Writing '\n' to a text mode stream works correctly on Windows systems, but produces only LF on Unix, and something completely different on more exotic systems. Using "\r\n" in binary mode is slightly better, as it works on many ASCII-compatible systems, but still fails in the general case. One approach is to use binary mode and specify the numeric values of the control sequence directly, "\x0D\x0A".
PHP_EOL
is a PHP new line constant that holds the correct new line sequence for the system PHP is running on.
so the message is: use everything in its right place.

<br> will give you a new line in the user's view; \n will give you a new line in source code, ie. developer's view.

When the html is rendered, only the "<br />" renders the break line. However the markup is much easier to read when "<br />\n" is printed, so that everything is not in one long line.

\n is code based
<br /> is HTML tag based
There is a clear distinction between the two.

Your problem is the way html is rendered. If you look in the source code, the first example will have the two lines on separate lines, but the browser does not see line breaks as breaks that should be displayed. This allows you to have a long paragraph in your source:
rghruio grgo rhgior hiorghrg hg rgui
ghergh ugi rguihg rug hughuigharug
hruauruig huilugil guiui rui ghruf hf
uhrguihgui rhguibuihfioefhw8u
beruvbweuilweru gwui rgior
That would only wrap as the browser needed it to, allowing it to be easily editable at the right line length, but displayed at any resolution.

HTML does not care about new lines in the source code, you can put it all in one line. It will only interpret <br /> as a new line. You should use an \n to beautify your HTML-output though, but the better way is to not output it with PHP, but to use it in the HTML itself and only embed PHP stuff into it, like this:
<ul id="menu">
<?php foreach ($menu_items as $item): ?>
<li>
<a href="<?= htmlspecialchars($item['link']) ?>" title="<?= htmlspecialchars($item['title']) ?>">
<?= htmlspecialchars($item['title']) ?>
</a>
</li>
<?php endforeach; ?>
</ul>
That way you won't have to bother with formatting inside PHP, but you automagically have it, by design, in HTML. Also, you separate Model-logic and View-logic from each other like this and leave the output to your HTTP Server rather than the PHP engine.

That's because you're creating HTML and viewing it in a browser, and whitespace is more or less ignored there. Ten spaces don't produce a bigger gap than one space, but that doesn't mean that the space character doesn't work. Try setting the content type to text/plain or look at the HTML source to see the effect of the newline.

The correct XHTML syntax for it would be
echo "This is the test code <br />\n";
The <br /> renders a new line onscreen, the "\n" renders a new line in the source code.

The new line character is useful elsewhere, such as in a PDF. You're correct that the new line character has very little to do with HTML; as other people have said, it's treated as just another white space character. It is useful inside the <pre> tag, though. It can also be used to format the HTML output to make it easier to read. (It's a little annoying to try to find a piece of HTML in a string that's 1000 characters wide.)
The new line character is also useful when storing data in the database. Usually you want to store the data without HTML special characters such as <br /> so that it can be easily used in other formats (again, such as PDF). On output, you want to use the nl2br() function to convert the new lines to <br />s.

The new line character is useful for string functions.
For example:
explode( "\n" , $input );
Would split a string by a new line.
str_replace( "\n" , '<br />' , $input );
Would replace every newline in $input with a br tag. (The double quotes matter: in single quotes, '\n' is a literal backslash followed by n.)
Also because PHP also has a CLI, \n is useful for formatting:
eg.
echo 'Hello world';
Would, in the CLI, output;
Hello worldphp>
echo 'Hello world' . "\n";
would output;
Hello world
php>

Although it also has uses when writing web-based scripts, keep in mind PHP is more than a web engine; it also has a CLI where the br tag is useless.

<br /> is also useless if you're running a script from the command line.
$ php -f file.php
Output <br />$
I know not too many people use PHP from the command line, but it does come up:
file.php:
<?php
print "Output\n";
?>
At the command line:
$ php -f file.php
Output
$
