New line formatting when using HTML file as Word file?

I'm writing a PHP application for a client who needs an existing HTML page of mine to be "exported" as a Word file. Put simply, this is how it's done:
if (isset($_GET["word"])) {
header("Content-type: application/vnd.ms-word");
header("Content-Disposition: attachment;Filename=some_file.doc");
}
This runs when a "word" flag is present in the page's query string, e.g.:
whateverpage.php?somequery=string&someother=test&word
Despite how complex this HTML page is, it transfers pretty well to a nicely formatted Word file just by changing the content type. The only problem I'm having is that line breaks (HTML <br> tags) aren't rendered properly. For example, if my HTML has something that looks like
Aug
01
with a BR between the lines, it always ends up showing
Aug 01
in the generated Word file.
I've done some Googling and lots of tests with various other things but nothing seems to format properly with a simple new line.
Does anyone know how to properly format a new line character in a Word file that's being created from an HTML file?
Any help is greatly appreciated.
Edit:
I've tried wrapping the said line in a P tag, ala:
<p>Aug<br>01</p>
Without luck. I've also tried making a basic document in Word, saving it as an HTML file, and looking at the generated (i.e. sloppy) Word HTML source. There is some CSS in there that I thought might give me a clue, but I tried everything and nothing seemed to work properly. Word seems to add an 'MsoNormal' class to wrapped paragraphs; I tried adding this, but it just removes any font formatting I had and doesn't help. Here is the CSS Word creates itself:
p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-unhide:no;
mso-style-qformat:yes;
mso-style-parent:"";
margin-top:0cm;
margin-right:0cm;
margin-bottom:10.0pt;
margin-left:0cm;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:Calibri;
mso-fareast-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";
mso-bidi-theme-font:minor-bidi;
mso-fareast-language:EN-US;}

I had this same problem. I was tagging my line breaks like so:
<br/>
When I changed it to just
<br>
my line breaks started working.
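If the page already contains many XHTML-style breaks, the change described in this answer could be applied to the whole output in one pass. This is only a sketch: the helper name normalizeLineBreaks is made up for illustration, and whether Word really treats the two spellings differently rests on this answer's report, not on anything verified here.

```php
<?php
// Hypothetical helper (not from the original post): rewrite XHTML-style
// breaks such as <br/> or <br /> to plain <br>.
function normalizeLineBreaks(string $html): string
{
    // Fall back to the untouched input if the regex engine reports an error.
    return preg_replace('#<br\s*/>#i', '<br>', $html) ?? $html;
}

// Same "word" flag as in the question: buffer the page and pass the
// complete output through the helper before it is sent as a .doc file.
if (isset($_GET['word'])) {
    header('Content-Type: application/vnd.ms-word');
    header('Content-Disposition: attachment; filename=some_file.doc');
    ob_start('normalizeLineBreaks');
}
```

Using an output buffer callback means the existing page markup does not have to be edited by hand.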

Your problem is probably due to the fact that when you switch the content type to a Word document, the browser doesn't render it as HTML. My guess is that you need to add a newline to the Word document if you want a line break.
How to insert this line break? I'm not sure, but you could always try:
echo "Aug\r\n01";
Where \r\n are the newline characters.

How about, if you want to maintain a line-break, just echo "<p>Aug</p><p>01</p>"; it ain't pretty, but it should effect the line break you're looking for.

Related

Replace only on lines which contain no html tags

My knowledge of regular expressions is still limited; I hope the example demonstrates my problem. I parse a text and render HTML. Currently, my problem is adding paragraph markup to every line of text that ends with a line feed and has no markup of its own.
An example text:
<h1>Header</h1>\nA simple text with less of words. Yes much more lines.\n<h2>Tests</h2>\nThe solution is still active in his tests.\n
I would like to wrap each line in paragraph markup (<p> before and </p> after) if it has no markup or is an empty line, like ''.
The result for the example above should look like:
<h1>Header</h1>\n<p>A simple text with less of words. Yes much more lines.</p>\n<h2>Tests</h2>\n<p>The solution is still active in his tests.</p>\n
What I tried
My current regex handles that, but it fails when a line is empty or when an empty line follows a tag, like </code>\n.
'#(?![a-z][0-9]).(.*\n)#'
I also tried a negative lookahead for the closing bracket of an HTML tag, like #(?!\>).(.*\n)#.
Online test
https://regex101.com/r/khYWy4/2

Use another tool if you can!
Depending on how you are going to use this, I will recommend that you find a solution which is not based on regex. This task is better solved by iterating the lines in a proper script or program, perhaps the one which generates the html in the first place, and injects the tags you need.
Having said that, I appreciate that sometimes there is no optimal solution.
My attempt to solve your case
I have updated your example with a substitution which does seem to do what you want.
\n([^<>\n;]+?)\n
Substitute with
\n<p>\1</p>\n
The updated example:
https://regex101.com/r/khYWy4/3
Be aware of a few things here:
I ignore any lines which already contain any html tags.
I ignore any lines which contain a semicolon, to avoid tags in your code block.
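For reference, the same substitution can be tried directly in PHP with preg_replace(). This is a minimal sketch that uses the sample text from the question; the variable names are assumptions for illustration.

```php
<?php
// Sample text from the question, with real line feeds.
$text = "<h1>Header</h1>\nA simple text with less of words. Yes much more lines.\n"
      . "<h2>Tests</h2>\nThe solution is still active in his tests.\n";

// Same pattern as in the answer: a line with no <, > or ; between two line feeds.
// The replacement is built from pieces so that $1 stays a literal backreference.
$result = preg_replace('/\n([^<>\n;]+?)\n/', "\n" . '<p>$1</p>' . "\n", $text);

echo $result;
```

Note that each match consumes its trailing line feed, so when two plain lines follow each other directly, only the first one is wrapped.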
Disclaimer!
These simple skips were made just to make your example work. Depending on what your other cases look like, I cannot guarantee that this will work for a larger set of data.
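The line-by-line approach recommended at the top of this answer might look roughly like the following. It is a sketch under two assumptions that are not in the original post: the function name wrapPlainLines is hypothetical, and empty lines are left alone rather than wrapped.

```php
<?php
// Hypothetical helper: wrap every non-empty line that contains no HTML tag
// in <p>...</p> and leave all other lines untouched.
function wrapPlainLines(string $text): string
{
    $lines = explode("\n", $text);
    foreach ($lines as $i => $line) {
        $trimmed = trim($line);
        // Assumption: skip empty lines and lines that already contain markup.
        if ($trimmed === '' || preg_match('/<[^>]+>/', $trimmed)) {
            continue;
        }
        $lines[$i] = '<p>' . $trimmed . '</p>';
    }
    return implode("\n", $lines);
}

echo wrapPlainLines("<h1>Header</h1>\nA simple text with less of words.\n");
```

Unlike the regex, this handles several plain lines in a row and does not need the semicolon exclusion.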

What exactly is meant by 'line feeds' in HTML and PHP? How are they added in HTML and PHP code?

I was reading the PHP Manual and came across the following paragraph:
Line feeds have little meaning in HTML, however it is still a good
idea to make your HTML look nice and clean by putting line feeds in. A
linefeed that follows immediately after a closing ?> will be removed
by PHP. This can be extremely useful when you are putting in many
blocks of PHP or include files containing PHP that aren't supposed to
output anything. At the same time it can be a bit confusing. You can
put a space after the closing ?> to force a space and a line feed to
be output, or you can put an explicit line feed in the last echo/print
from within your PHP block.
I have the following questions about the paragraph above:
What exactly is meant by 'line feeds' in HTML?
How do I add them to HTML code as well as PHP code and make them visible in a web browser? What HTML entities/tags/characters are used to achieve this?
Is the meaning of 'line feed' the same in HTML and PHP? If not, what is the difference between the two contexts?
Why does the PHP Manual say this in the very first line of the paragraph? What does it mean by the sentence below?
"Line feeds have little meaning in HTML"
How can it be useful to remove a linefeed that immediately follows a closing ?> tag when someone is putting in many blocks of PHP or include files containing PHP that aren't supposed to output anything?
Could someone please clear up these doubts in simple, easy-to-understand language? Working code examples alongside the answer would help me understand the concept more clearly.
Thank You.

What exactly is meant by 'line feeds' in HTML?
It is a general computing term.
The character (0x0a in ASCII) which advances the paper by one line in a teletype or printer, or moves the cursor to the next line on a display.
— source: Wiktionary
How to add them to the HTML code
Press the enter key on your keyboard. Note that (with a couple of exceptions like <pre>) all whitespace characters are interchangeable in HTML. A new line will be treated as a space.
as well as PHP code
Ditto … or you could use the escape sequence \n inside a string literal.
and make visible in a web browser?
The material you quoted is talking about making source code look nice. You generally don't want line feed characters to be visible in a browser.
You could use a <pre> element instead.
Outside of <pre> elements (and the CSS setting they have by default) you can use a space instead of a new line for the same effect in HTML.
What HTML entities/tags/characters are used to achieve this?
… but the advice given in the last sentence of the material you quoted is probably a better approach.
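The manual's remark about the closing tag can be checked with a small experiment. This is only a sketch: renderTemplate is a hypothetical helper that uses eval() so the demo fits in one file; a real project would simply put the template text in its own file.

```php
<?php
// Hypothetical helper for this demo only: run a template string the way PHP
// would run a file that starts in HTML mode, and capture what it outputs.
function renderTemplate(string $template): string
{
    ob_start();
    eval('?>' . $template);
    return ob_get_clean();
}

// A line feed directly after the closing tag is removed: A and B end up adjacent.
$tight = renderTemplate("A<?php ?>\nB");

// A space after the closing tag keeps both the space and the line feed.
$spaced = renderTemplate("A<?php ?> \nB");

// JSON encoding makes the invisible characters visible.
echo json_encode([$tight, $spaced]), "\n"; // prints ["AB","A \nB"]
```

This is why a file made of many PHP blocks that print nothing does not leave a column of blank lines in the output.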

A 'line feed' means a new line in both HTML and PHP; only the syntax is different.
In HTML, you can use the <br> or <br/> tag for a line break. This tag starts a new line in the rendered output when the page is displayed in a browser.
You can take the following example for <br> tag:
<html> <body>
<p> To break lines<br>in a text,<br/>use the br element. </p>
</body> </html>
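Output:
To break lines
in a text,
use the br element.
In PHP, you can use "\n" for a line feed. It must be in a double-quoted string; in single quotes, '\n' is just a backslash followed by the letter n.
If you are printing a string in PHP, then instead of writing,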
echo "New \nLine";
you can use the nl2br() function to get a visible line break in the browser, like:
echo nl2br("New \nLine");
Output:
New
Line
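To see what nl2br() actually produces, it helps to look at the returned string rather than the rendered page. A minimal sketch, with variable names chosen only for illustration:

```php
<?php
// "\n" in double quotes is a real line feed character.
$text = "New \nLine";

// nl2br() inserts an HTML break before each line feed and keeps the line feed.
$html = nl2br($text);

// JSON encoding shows the line feed explicitly as \n.
echo json_encode($html, JSON_UNESCAPED_SLASHES), "\n"; // prints "New <br />\nLine"
```

The browser ignores the remaining line feed and renders the break tag, which is why the two words appear on separate lines.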

body in a php file

I created a custom index.php for a WordPress theme. I just renamed the .html file to .php. Everything seems to work fine, except that extra characters are printed when I run the page.
These characters are printed at the start of the body area in the browser: " --> "
I am confused about where these characters come from. I can create a .php file with complete HTML contents, right? Or do I need to make some modification?

<!--this is a HTML comment line -->
If you forget to delete the trailing --> characters after deleting the first part of a comment like this, you might see exactly that. We cannot know without seeing your code.

As an answer to the last question: you can mix PHP and plain HTML. Whenever you write PHP, your code must be within
<?php ... CODE HERE ... ?>
Inline php however is not a good programming pattern in my opinion.

PHP adding empty text node before including html file

I'm using PHP as a templating engine, and I've noticed that when I include a view file, an empty text node is added before the content of that view.
For example, I have an HTML file I want to include that has the following content:
<p>Some text</p>
Then I include that file like this:
<div><?php require_once('file/path.htm'); ?></div>
(Notice that I've removed any spaces between the div and the PHP tag.) After PHP includes the file, it adds an empty text node (which I'll mark like this: "") that adds space before the p tag, so I get something like this:
Some previous content...
<div>
"" //empty text node
<p>Some text</p>
</div>
This is quite problematic since it ruins content composition. Is there any solution to this?

FSou1 has it right: it's the charset. It can also be solved by saving the file as UTF-8 without BOM:
Open your PHP include file in Notepad++ (download here: http://notepad-plus-plus.org/)
Select Encoding --> Encode in UTF-8 without BOM
Empty nodes disappear. Hope that helps someone. This was driving me crazy.

I just had the same problem and was lucky enough to find the answer: it's the charset. It may seem strange, but when you save your file as UTF-8, you get an empty text node in your markup. When your file is in cp1251, you don't have this problem.

This was my second issue caused by a BOM (both took over an hour of debugging, Googling and hair-pulling).
I just found this small (Windows-only) drag-and-drop program that checks for a BOM and can remove it:
File BOM Detector by Brynt Younce
Softpedia.com/get/System/File-Management/File-BOM-Detector.shtml
Small, easy and simple. There also seems to be a PHP solution for all platforms, but I have not tested it.
Take a look if interested:
Github.com/emrahgunduz/BomCleaner
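If re-saving the file is not an option, the BOM can also be dropped when the file is read. The following is a minimal sketch and is not taken from either tool above; the helper name stripUtf8Bom is hypothetical. It also assumes the view is plain HTML, because reading it with file_get_contents() instead of require_once means any PHP inside the view would not be executed.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 byte order mark (EF BB BF), if any.
function stripUtf8Bom(string $contents): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($contents, $bom, 3) === 0) {
        return substr($contents, 3);
    }
    return $contents;
}

// Usage sketch with the view path from the question (plain HTML views only):
// echo stripUtf8Bom(file_get_contents('file/path.htm'));
```

Saving the file without a BOM, as described in the answers, remains the simpler fix because it removes the stray bytes at the source.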

How do I maintain text layout when using file_get_contents?

Let's say I have the following text file:
This is the first line of the text file.
This is the second line,
and here goes the third.
When using
echo file_get_contents($_SERVER['DOCUMENT_ROOT'] . "/file.txt");
The output is
This is the first line of the text
file. This is the second line, and
here goes the third.
How do I prevent the layout from changing?
Thanks in advance :-)

In HTML, new line characters (\n or \r\n) don't cause actual line breaks to appear in the rendered page. Here are two possible solutions:
Use the nl2br() function to convert newlines to <BR> tags. This will work for some layouts, but not for ASCII art or others that rely on multiple spaces (which are reduced to one in HTML).
echo nl2br(file_get_contents(...));
Wrap the result in <pre> tags. This will keep all layout, but can look a bit ugly. You can style <pre> tags with CSS to make them prettier, if you'd prefer.
echo '<pre>' . file_get_contents(...) . '</pre>';
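One thing neither option covers is text that contains characters such as < or &, which the browser would interpret as markup. A possible refinement, sketched here with a hypothetical helper name (renderPreformatted) and not part of the original answer, is to escape the text before wrapping it:

```php
<?php
// Hypothetical helper: escape the text, then wrap it in <pre> so line feeds
// and runs of spaces survive in the browser.
function renderPreformatted(string $text): string
{
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</pre>';
}

echo renderPreformatted("This is the second line,\nand here goes the third.");
```

The same escaping step works in front of nl2br() as well, as long as htmlspecialchars() is applied first.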

Every time you use PHP, the work should be done in two parts:
Write your page in pure HTML. Make it work as desired. Ask HTML questions if any.
Write a PHP script which produces the same text.
An HTML tag that could help you is <pre>
Another way to tell the browser that what follows is plain text is to send the appropriate HTTP header.
So, if this text is the only text being displayed on the page,
header("Content-type: text/plain");
readfile($_SERVER['DOCUMENT_ROOT'] . "/file.txt");
