Rendering plain text through PHP - php

For some reason, I want to serve my robots.txt via a PHP script. I have set up Apache so that the robots.txt request (in fact, every file request) goes to a single PHP script.
The code I am using to render robots.txt is:
echo "User-agent: wget\n";
echo "Disallow: /\n";
However, it is not processing the newlines. How do I serve robots.txt correctly, so that search engines (or any client) see it properly? Do I have to send some special headers for txt files?
EDIT 1:
Now I have the following code:
header("Content-Type: text/plain");
echo "User-agent: wget\n";
echo "Disallow: /\n";
which still does not display newlines (see http://sarcastic-quotes.com/robots.txt ).
EDIT 2:
Some people mentioned that it's just fine and simply not displayed in the browser. I was just curious how this one displays correctly: http://en.wikipedia.org/robots.txt
EDIT 3:
I downloaded both mine and Wikipedia's through wget, and see this:
$ file en.wikipedia.org/robots.txt
en.wikipedia.org/robots.txt: UTF-8 Unicode English text
$ file sarcastic-quotes.com/robots.txt
sarcastic-quotes.com/robots.txt: ASCII text
FINAL SUMMARY:
The main issue was that I was not setting the header. However, there is another internal bug which sets the Content-Type to HTML (my request is actually served through an internal proxy, but that's another issue).
Some comments saying that browsers don't display newlines were only half-correct: modern browsers display newlines correctly if the Content-Type is text/plain. I am selecting the answer that most closely matched the real problem and was free of that slightly misleading misconception :). Thanks everyone for the help and your time!
thanks
JP

Yes, you forgot to set the Content Type of your output to text/plain:
header("Content-Type: text/plain");
Your output is probably being sent as HTML, where a newline is collapsed into a space; to actually display a line break there, you would need the <br /> tag.

header('Content-Type: text/plain') is correct.
You must call this function before anything is written to your output, including whitespace. Check for whitespace before your opening <?php tag.
If your Content-Type header has been set to text/plain, no browser in its right mind would collapse whitespace. That behaviour is exclusive to HTML and similar formats.
I'm sure you have your reasons, but as a rule, serving static content through PHP uses unnecessary server resources. Depending on the setup, every hit to PHP can mean a new process and a few megabytes of memory. You can use Apache config directives to point to different robots files based on headers like User-Agent; I'd look into that.
It's likely that search engines ignore the Content-Type header, so this shouldn't be an issue anyway.
Hope this helps.
-n
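The advice above (send text/plain, and do so before any output) can be sketched roughly as follows. This is only a minimal sketch, not the asker's actual script: build_robots_txt() and its rule array are hypothetical names introduced for illustration, and the charset parameter is an optional addition.

```php
<?php
// Hypothetical helper: builds a robots.txt body from
// an array of user-agent => list of disallowed paths.
function build_robots_txt(array $rules): string
{
    $lines = [];
    foreach ($rules as $agent => $paths) {
        $lines[] = "User-agent: {$agent}";
        foreach ($paths as $path) {
            $lines[] = "Disallow: {$path}";
        }
    }
    // One "\n" per line; a text/plain response shows these as line breaks.
    return implode("\n", $lines) . "\n";
}

// header() only works before any output, so check first and log where
// the stray output (e.g. whitespace before <?php) started.
if (headers_sent($file, $line)) {
    error_log("Output already started at {$file}:{$line}");
} else {
    header('Content-Type: text/plain; charset=utf-8');
}

echo build_robots_txt(['wget' => ['/']]);
```

If the browser still shows everything on one line, inspecting the response headers (for example with curl -I) should reveal whether something between PHP and the client is overriding the Content-Type.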

<?php header("Content-Type: text/plain"); ?>
User-agent: wget
Disallow: /
BTW, the newlines are there just fine. They're just not displayed in a browser when the response is sent as HTML: in HTML, browsers collapse all whitespace, including newlines, to a single space.
deceze$ curl http://sarcastic-quotes.com/robots.txt
User-agent: wget
Disallow: /

I was having a similar issue, and neither "\n" nor PHP_EOL worked. I finally used:
header('Content-Disposition: attachment; filename="plaintext.txt"');
header("Content-Type: text/plain");
echo "some data";
echo chr(13).chr(10);
The echo of BOTH characters did the trick.
Hope it helps someone.
Bye
anankin

You must set the content type of the document you are serving. In the case of a .txt text file:
header("Content-Type: text/plain");
The IANA has information about some of the more popular MIME (content) types.

Both echo and printf write \n as a real newline when it is inside a double-quoted string; <br> is only needed when the output is HTML.
In your case you are not using HTML, so keep the \n and set the MIME type to text/plain. I believe this is the proper way to do this.

Related

Characters being rendered strangely in HTML/PHP

I'm presenting an RSS feed with this part of a PHP function:
echo "<li><a target='_blank' href='$item_link'>$item_title</a></li>";
Using an example, this output the following in HTML:
<li>
<a target='_blank' href='http://www.internationalaccountingbulletin.com/news/ey-shinnihon-will-audit-toshibas-corrected-accounts-while-under-investigation-4639900'>
EY ShinNihon will audit Toshiba’s corrected accounts… while under investigation
</a>
</li>
The titles have a large amount of discrepancy when it comes to the symbols used.
It outputs this
EY ShinNihon will audit Toshiba’s corrected accounts… while under investigation
as
EY ShinNihon will audit Toshiba’s corrected accounts… while under investigation
with apostrophes and ellipses (among others) being various symbols prefixed by â€.
How can I convert these symbols back to the originals in PHP?
Choose your character encoding to match what you are editing. Check this site to learn more: http://htmlpurifier.org/docs/enduser-utf8.html
I took out the charset meta tag because I understood that it was bad practice for speed/SEO. When I put it back in, the problem was rectified, thank you. However, is there an alternative that is better practice? Setting headers via PHP: is that preferable or worse?
So your problem was that you were outputting text in some encoding, without informing the browser what encoding you're giving it, and the browser therefore misinterpreting the text in the wrong encoding, leading to garbage characters. You always need to inform clients about what encoding you're sending them text in. The primary method to do that over HTTP is an HTTP Content-Type header. That way the browser is informed about the type of content it receives before it actually receives the content. Which is exactly as it should be.
HTML <meta> tags are only a fallback. You should include them, since they help specify the encoding of the HTML document should it ever be used outside of an HTTP context (e.g. you just open it from your hard disk, no HTTP involved, no HTTP Content-Type header, no way to specify the encoding... other than the HTML <meta> tag). But again, it should only be a fallback. And there's absolutely no issue with SEO or speed; wherever you got that from, it's pure FUD.
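To make the misinterpretation concrete, here is a small sketch that reproduces the garbage characters and shows the header that avoids them. It assumes the mbstring extension is available and that the browser fell back to Windows-1252, which is a guess about the asker's setup; misread_utf8_as_cp1252() is a hypothetical name used only for illustration.

```php
<?php
// Hypothetical helper: re-reads UTF-8 bytes as if they were Windows-1252
// (an assumed browser fallback) and returns the result as UTF-8.
function misread_utf8_as_cp1252(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');
}

// Declaring the encoding in the HTTP header, before any output,
// tells the client how to decode the bytes that follow.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$apostrophe = "\xE2\x80\x99"; // U+2019 (right single quotation mark) as UTF-8 bytes

// Each of the three bytes is decoded as its own Windows-1252 character,
// which is where the "â€" prefix in the question comes from.
echo misread_utf8_as_cp1252($apostrophe), "\n";
```

With the charset declared in the header, the same three bytes are decoded as a single apostrophe, so no conversion of the titles themselves should be needed.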
This will work for you.
First, just use the mb_convert_encoding() function; it will work for you.
$item_title = addslashes('this is your text');
$item_title = mb_convert_encoding($item_title, "HTML-ENTITIES", 'UTF-8');

symbols displayed at run time but not present in code

I am creating a site with HTML and PHP.
When I run my PHP page in the browser on localhost (XAMPP server), some symbols (ï»¿) are displayed, but when I check my HTML/PHP code, no symbol or script like ¿ or » is found.
If I am wrong somewhere, please let me know.
That's a UTF-8 byte order mark (BOM). You should configure your editor to save UTF-8 without BOM. It isn't mandatory for the UTF-8 encoding; in fact, its use is discouraged and it only causes problems.
Additionally, make sure your web server is sending an appropriate Content-Type HTTP header:
Content-Type: text/plain; charset=utf-8
¿ and » can be written as HTML entities, so they may look different in the PHP code than in the browser. You can find them, for example, here. Also, you possibly have an issue with a BOM.
My best guess: you have an issue with encoding (UTF vs. ISO). Look up the encoding your editor uses when saving, and send it to the browser, e.g. header("Content-type: text/html; charset=UTF-8").
Sounds like you're dealing with a character encoding problem.
Try declaring the encoding in your headers:
header("Content-Type: text/html; charset=UTF-8");
This needs to be sent before any text is output to the client.

Header makes two empty lines before write my text

I have a problem sending a file to the client. I want to send plain text using header, but the output file, instead of having just my content, has two empty lines at the beginning. I don't know why this happens.
The variable I use holds XML like this:
$section = '<book>xbook<author>nmauthor</author></book>';
I use this code to send the file.
header("Content-type:text/plain");
header("Content-Disposition: attachment;filename=file.xml");
header("Content-Transfer-Encoding:binary");
header("Pragma:no-cache");
header("Expires:0");
echo $section;
I will be very grateful if someone helps me.
Could be many things... For one, if you have no whitespace before your opening PHP tag, look at the file's encoding. If you're using UTF-8 in particular, make sure it is saved "without BOM", as a BOM could cause your problem.
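One way to check the BOM theory without relying on an editor is to inspect the first three bytes of the script or of the included files. The following is a rough sketch under that assumption; has_utf8_bom() and strip_utf8_bom() are hypothetical helper names, not built-in PHP functions.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the given bytes start with a UTF-8 BOM.
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: returns the bytes without a leading UTF-8 BOM.
function strip_utf8_bom(string $bytes): string
{
    return has_utf8_bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Example: check this very file. Any include that starts with a BOM
// emits those bytes as output before header() can run.
if (has_utf8_bom((string) file_get_contents(__FILE__))) {
    error_log(__FILE__ . ' starts with a UTF-8 BOM');
}
```

The same check can be run over every file the script includes, since stray output can come from any of them, not only from the entry script.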

browsers do not read new line in a php file

I have a password-protected text file. To protect it, I used a password protector script (which works great), but it required me to rename the text file to .php on my server. This went fine; however, when I open this text file in any browser on Windows, I no longer see any new lines (I used to see them).
I tried writing "\n", "\r" and "\r\n". I think it has to do with the browser thinking it's a .php file.
This is because the server is sending a different MIME type. It is now sending text/html (the default type returned by PHP) rather than text/plain.
Your browser is then expecting HTML. Line breaks are just like any other white space in HTML, so they are essentially meaningless for what you are trying to do.
You can use this to fix it:
header('Content-Type: text/plain');
Be sure to put that at the top of your code, or at least before you output anything.
This causes the server to send the MIME type you are expecting.
By default, the output of a PHP script is rendered as HTML, which means that whitespace is folded. If you want to change this back to text, then you need to set the Content-Type header to "text/plain", either in the web server or via the header() function.
That's because the browser sees the content as HTML, and in HTML a newline is just whitespace.
I am not sure if I understood your question properly, but there is a solution in either case:
-You output the text as HTML: in this case, you have to use
<br>
-You want to write it with new lines in the file: use the PHP constant
PHP_EOL
which means end of line. This always inserts the correct line break for the platform PHP runs on.
You need to use <br>, as HTML is being rendered in the browser.
Browsers use HTML to format text (when it is contained in HTML, of course).
Use <br> or <p>.
Also, considering your file is a .php file, it's normal behaviour for your web server to send it as text/html.
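If the output has to stay HTML, the <br> suggestion can be applied without editing the text by hand. Here is a minimal sketch assuming the file content is UTF-8; text_to_html() is a hypothetical name chosen for illustration.

```php
<?php
// Hypothetical helper: escapes plain text for HTML and turns each
// newline into a <br /> tag so line breaks survive HTML rendering.
function text_to_html(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return nl2br($escaped);
}

// Sample text standing in for the protected file's content.
echo text_to_html("first line\nsecond line & more");
```

Sending Content-Type: text/plain, as suggested in the other answers, avoids the conversion entirely; this variant is only useful when the surrounding page must remain HTML.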

problem injecting UTF8 encoded JSONP into a Win 1255 encoded webpage

I am developing a third-party service embedded in websites as a JS snippet that, amongst other things, needs to fetch some JSONP data from my PHP server and display the text contained in the JSON object on the embedding website.
I am using jQuery, so I issue the following .getJSON request:
$.getJSON("http://localhost/php/server.php?a=gfs"+"&callback=?",function(Obj) {
doSomething(Obj);
});
and on the PHP side (server.php) I have:
<?php
header('Content-Type: text/javascript; charset=utf8');
$retval = file_get_contents('../scripts/file.json');
//change to json php
$callback = $_GET['callback'];
echo $callback . '(' . $retval . ')';
?>
These work perfectly in FF, but fail in IE when the embedding website is encoded with something different than utf8, specifically a Windows 1255 (Hebrew) web page, in the sense that the text contained in file.json is displayed as gibberish. Changing the encoding of the website (in the browser, not the source) to unicode "fixes" the problem with the displayed text from the json, while of course makes the rest of the page look like gibberish... I had a similar problem with FF, before I added the header(...) line to the php script.
What should I do? Does anyone know why it works well in FF and not in IE? Is there an additional definition such as the header(...) one required for IE specifically?
Constraints:
I have no control on the embedding website
file.json has to be encoded in utf8 (that's how my db works)
The same code needs to be able to handle both utf8 encoded pages and non utf8 pages
urgh.
It seems that the fix for IE is changing the header definition from:
header('Content-Type: text/javascript; charset=utf8');
to
header('Content-Type: text/javascript; charset=utf-8');
Yep, the missing "-" in the charset name. It turns out that utf8 (without the dash) is not understood by IE, while it is understood by FF.
The joy.
Hope this might prove helpful to someone, sometime, and save some wasted time.
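For reference, the corrected server side might look roughly like the sketch below. It is not the original server.php: build_jsonp() is a hypothetical helper, the callback-name check is an added precaution that the question does not mention, and the inline JSON stands in for the contents of file.json.

```php
<?php
// Hypothetical helper: wraps an already-encoded JSON string in a JSONP call.
// The callback name is limited to identifier-like characters (an added
// precaution, not part of the original script).
function build_jsonp(string $callback, string $json): string
{
    if (!preg_match('/^[A-Za-z_$][A-Za-z0-9_$.]*\z/', $callback)) {
        throw new InvalidArgumentException('Invalid callback name');
    }
    return $callback . '(' . $json . ');';
}

// Note the hyphen: "utf-8", not "utf8".
if (!headers_sent()) {
    header('Content-Type: text/javascript; charset=utf-8');
}

// The fallback callback name and the sample payload are assumptions.
$callback = $_GET['callback'] ?? 'callback';
$json = json_encode(['greeting' => 'שלום'], JSON_UNESCAPED_UNICODE);

echo build_jsonp($callback, $json);
```

Because the charset travels in the response header, the script's own encoding is declared independently of the embedding page, which is what the Windows-1255 case needs.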
