Generate a raw preview of files which outputs formatting - php

Building a Raw view of text in my application, like pastebin does:
http://pastebin.com/raw.php?i=nfVT7b0Z
So I wrote a script which just echoes out the $contents, but newlines, spaces, and tabs are not being preserved. Looking at the pastebin source above, you will notice they don't wrap the text in any HTML tags such as <pre>. How then are they getting the text to format?
Are they using some special header?

The content you see is not HTML. It's text. A browser displays a text/plain response verbatim, so newlines, spaces, and tabs are kept. You can use the following header to achieve the same result with output sent by PHP.
Content-Type: text/plain; charset=utf-8
Example:
<?php
header('Content-Type: text/plain; charset=utf-8');
echo "First Line\nSecond Line";
?>

Curl was my friend here. I noticed in the response this header:
Content-Type: text/plain; charset=utf-8
So simply do:
header("Content-Type: text/plain; charset=utf-8");
Before outputting any text.

Related

How to get file content with a proper utf-8 encoding using file_get_contents?

I need to get the content of a remote file in utf-8 encoding. The file is in utf-8. When I display that file on screen, it has proper encoding:
http://www.parfumeriafox.sk/source_file.html
(notice the ň and č characters, for example, these are alright).
When I run this code:
<?php
$url = 'http://parfumeriafox.sk/source_file.html';
$csv = file_get_contents_utf8($url);
header('Content-type: text/html; charset=utf-8');
print $csv;
function file_get_contents_utf8($fn) {
$content = file_get_contents($fn);
return mb_convert_encoding($content, 'utf-8');
}
(you can run it using http://www.parfumeriafox.sk/encoding.php), then I get question marks instead of those special characters. I have done a lot of research on this: I have tried the standard file_get_contents function, I have even used some stream bla bla php context function, and I also tried the fopen and fread functions to read the file at the binary level. Nothing seems to work. I have tried that with and without sending the header. This is supposed to be perfectly simple, so what am I doing wrong? When I check that string with some encoding detect function, it returns UTF-8.
You can see which character set your browser decided the document was by opening the developer console and looking at document.characterSet:
> document.characterSet
"windows-1250"
With this knowledge we can ask iconv to convert from "windows-1250" to utf-8 for us:
<?php
$text = file_get_contents("source_file.csv");
$text = iconv("windows-1250", "utf-8", $text);
print($text);
The output is valid utf-8, and levanduľa is displayed correctly as well.
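If the source encoding is not always the same, the conversion can be made conditional. The following is only a sketch: to_utf8 is a hypothetical helper name, and the Windows-1250 fallback is an assumption taken from what the browser reported above. It needs the mbstring and iconv extensions.

```php
<?php
// Hypothetical helper: return $bytes as UTF-8, converting from $fallback
// only when the input is not already valid UTF-8.
function to_utf8(string $bytes, string $fallback = 'Windows-1250'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already valid UTF-8, leave it untouched
    }
    $converted = iconv($fallback, 'UTF-8', $bytes);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $fallback to UTF-8");
    }
    return $converted;
}

// "levandu\xBEa" is "levanduľa" encoded as Windows-1250.
echo to_utf8("levandu\xBEa"), "\n";
```

Checking for valid UTF-8 first avoids converting content twice, which would corrupt text that was already correct.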
How about this one????
For this one I used header('Content-Type: text/plain; charset=Windows-1250');
bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn
This code works for me
<?php
header('Content-Type: text/plain;charset=Windows-1250');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
The problem is not with file_get_contents()
I saved $data to a file and the characters were correct, but my text editor still did not decode them correctly.
$data = file_get_contents('http://www.parfumeriafox.sk/source_file.html');
file_put_contents('doc.txt',$data);
UPDATE
There seems to be one problematic character.
It renders as ¾
Its hex value is 0xBE (190 decimal)
I tried these two character sets. Neither worked.
header('Content-Type: text/plain; charset=ISO 8859-1');
header('Content-Type: text/plain; charset=ISO 8859-2');
END OF UPDATE
It works by adding a header WITHOUT charset=utf-8.
These two headers work
header('Content-Type: text/plain');
header('Content-Type: text/html');
These two headers do NOT work
header('Content-Type: text/plain; charset=utf-8');
header('Content-Type: text/html; charset=utf-8');
This code is tested and displayed all characters.
<?php
header('Content-Type: text/plain');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
<?php
header('Content-Type: text/html');
echo file_get_contents('http://www.parfumeriafox.sk/source_file.html');
?>
I looked at some of the problematic characters and their hex values in the saved file (viewed in Notepad++ with UTF-8 encoding) and checked those hex values against the candidate character sets.
The values matched the character set listed as Latin2.
On Wikipedia's Windows code page article, Latin2 corresponds to Windows-1250. This is a different encoding from ISO 8859-2 (also called Latin-2), which is why that charset did not work.
bergamot, citrón, tráva, rebarbora, bazalka;levanduľa, škorica, hruška;céderové drevo, vanilka, pižmo, amberlyn
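The hex check described above can also be done in code. This is only a sketch: decode_candidates is a hypothetical helper, and the list of charsets is simply the ones tried in this answer. It decodes the problematic byte 0xBE under each candidate so the results can be compared.

```php
<?php
// Hypothetical helper: decode the same raw bytes under several candidate
// charsets so the results can be compared by eye.
function decode_candidates(string $bytes, array $charsets): array
{
    $results = [];
    foreach ($charsets as $charset) {
        $results[$charset] = iconv($charset, 'UTF-8', $bytes);
    }
    return $results;
}

$candidates = decode_candidates("\xBE", ['ISO-8859-1', 'ISO-8859-2', 'Windows-1250']);
foreach ($candidates as $charset => $text) {
    printf("%-13s 0xBE => %s\n", $charset, $text);
}
```

Of these three, only Windows-1250 maps 0xBE to ľ. ISO-8859-1 gives ¾, which matches the wrong rendering seen earlier, and ISO-8859-2 gives ž.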

Sending HTML Email in php?

Here is the code.
$to = 'youraddress@example.com';
$subject = 'Test HTML email';
//create a boundary string. It must be unique
//so we use the MD5 algorithm to generate a random hash
$random_hash = md5(date('r', time()));
//define the headers we want passed. Note that they are separated with \r\n
$headers = "From: webmaster@example.com\r\nReply-To: webmaster@example.com";
//add boundary string and mime type specification
$headers .= "\r\nContent-Type: multipart/alternative; boundary=\"PHP-alt-".$random_hash."\"";
//define the body of the message.
ob_start(); //Turn on output buffering
?>
--PHP-alt-<?php echo $random_hash; ?>
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Hello World!!!
This is simple text email message.

--PHP-alt-<?php echo $random_hash; ?>
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

<h2>Hello World!</h2>
<p>This is something with <b>HTML</b> formatting.</p>

--PHP-alt-<?php echo $random_hash; ?>--
<?php
//copy current buffer contents into $message variable and delete current output buffer
$message = ob_get_clean();
//send the email
$mail_sent = @mail( $to, $subject, $message, $headers );
//if the message is sent successfully print "Mail sent". Otherwise print "Mail failed"
echo $mail_sent ? "Mail sent" : "Mail failed";
?>
I don't follow it well, and I hope someone can help me.
1. Why should I generate a random hash?
2. Why must I add a boundary string and MIME type specification to the headers?
3. Why use ob_start();?
4. What do the following lines mean? Could I delete them?
--PHP-alt-<?php echo $random_hash; ?>
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Thank you.
1. Generating a random hash keeps the boundary string from colliding with your content.
2. The boundary string tells the email client where each part of the message starts and stops. Since you want to send HTML email, you must specifically tell the email client it will be receiving HTML, not just any content.
3. Without ob_start(), the HTML and everything else after ?> would be sent directly to the browser, i.e., to the user viewing your site. Instead you want to store it in a variable and use that as the message.
4. Content-Type tells the email client what kind of content you are sending and how it is encoded. Of course you cannot delete these lines. It would be like sending you a PDF file without saying it is a PDF and without a proper extension: you won't know what to do with it.
Note
Emails, websites, anything which has a structure (including most files) usually are laid out in a structure of "header" and "body".
The header tells the file reader what to expect in the "body". The "body" is the actual content the reader should do something with.
I am not certain why the random hash is used here, but I think it is just additional safety to ensure a unique boundary string, preventing name collisions between parts.
As to the content-type: you need to specify that to tell the mail client that it should render HTML, and to indicate that your message is multipart. Multipart means there's more than one part, in your case a text-based part and a HTML part.
The boundary is used to separate the contents of one part from the contents of another part, and from the header.
Using the PHP output buffer (ob_start and ob_get_clean) is not necessary at all; you can also just build the message with quoted strings or heredoc syntax. An advantage of using the output buffer is that you can end the PHP block (using ?>) and have your IDE help you write the HTML. You do not need a separate ob_end_clean() call, because ob_get_clean() already returns the buffer contents and ends the buffer.
1. You don't have to. It's just that it makes things easier: the delimiter must be a string that's not part of the mail content.
2. You need a boundary to split the message into parts. An e-mail message is nothing more than a stream of characters. You need a MIME type so the e-mail client can know what each part contains. Otherwise, it could not know whether it's HTML or not (or a JPEG picture, or a PowerPoint presentation...)
3. Honestly, it looks like an overcomplicated replacement for regular string assignments. Rather than doing $message = 'Hello World!';, it prints Hello World! to standard output and captures the standard output into a variable.
4. These lines mean that you are finishing one part of the message and starting a new one that contains HTML. You can delete them if you don't want to add another message part that contains HTML but... isn't that what you want to do?
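To make the points above concrete, here is a minimal sketch of the same message built with ordinary strings instead of output buffering. build_alternative_mail is a hypothetical helper, the MIME-Version header is an addition that was not in the question's code, and random_bytes is just one way to get a boundary that is unlikely to appear in the content.

```php
<?php
// Hypothetical helper: build the headers and body of a multipart/alternative
// message as plain strings, without output buffering.
function build_alternative_mail(string $from, string $text, string $html, string $boundary): array
{
    $headers = "From: $from\r\n"
             . "MIME-Version: 1.0\r\n"
             . "Content-Type: multipart/alternative; boundary=\"$boundary\"";

    // Each part: delimiter line, part headers, blank line, part content.
    $body = "--$boundary\r\n"
          . "Content-Type: text/plain; charset=\"iso-8859-1\"\r\n"
          . "Content-Transfer-Encoding: 7bit\r\n"
          . "\r\n"
          . $text . "\r\n"
          . "--$boundary\r\n"
          . "Content-Type: text/html; charset=\"iso-8859-1\"\r\n"
          . "Content-Transfer-Encoding: 7bit\r\n"
          . "\r\n"
          . $html . "\r\n"
          . "--$boundary--\r\n"; // the trailing "--" closes the last part

    return [$headers, $body];
}

// The boundary only has to be a string that does not occur in the content.
$boundary = 'PHP-alt-' . bin2hex(random_bytes(16));
[$headers, $message] = build_alternative_mail(
    'webmaster@example.com',
    "Hello World!!!\r\nThis is simple text email message.",
    '<h2>Hello World!</h2><p>This is something with <b>HTML</b> formatting.</p>',
    $boundary
);

// Sending is left commented out so the sketch has no side effects:
// $sent = mail('youraddress@example.com', 'Test HTML email', $message, $headers);
```

Written this way, the blank line between each part's headers and its content is explicit, which is easy to lose when the message is typed between ?> and <?php.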

How to get by Content-Id the attachements of a multipart response with PHP CURL

I can't figure out how to effectively get a specific part of a multipart HTTP response.
I'm trying to get a PDF report from a Jasper Reports server by sending SOAP over PHP cURL, and I didn't find anything in the cURL documentation about how to handle a multipart response.
I would like to get 'report' part of the RAW response below.
Thanks for any help.
ilyas
------=_Part_6_9317140.1311257231623
Content-Type: text/xml; charset=UTF-8
Content-Transfer-Encoding: binary
Content-Id: <B33D7700BFCF12CC2A64A7F6FB84CEAE
sutffs here
------=_Part_6_9317140.1311257231623
Content-Type: application/pdf
Content-Transfer-Encoding: binary
Content-Id: <**report**>
binary PDF content here
Try this:
<?php
$str = '------=_Part_6_9317140.1311257231623
Content-Type: text/xml; charset=UTF-8
Content-Transfer-Encoding: binary
Content-Id: <B33D7700BFCF12CC2A64A7F6FB84CEAE
sutffs here
------=_Part_6_9317140.1311257231623
Content-Type: application/pdf
Content-Transfer-Encoding: binary
Content-Id: <**report**>
binary PDF content here';
$pattern = '%Content-Type: application/pdf'."\n".
'Content-Transfer-Encoding: binary'."\n".
'Content-Id: ([^\s]+)%';
preg_match($pattern, $str, $matches);
echo $matches[1];
?>
However, I wouldn't vouch for its reliability, because the structure of the returned string might change at any time.
I modified Shef's solution a little, like this:
$pattern = "/report>\r\n\r\n/";
$matches = preg_split($pattern, $output);
file_put_contents('c:/report.pdf', $matches[1]);
And it works now. Thanks Shef.
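A slightly more general variant is to split the response on the boundary and look at each part's headers. This is a sketch under assumptions: find_part_by_content_id is a hypothetical helper, the boundary is assumed to be known (normally it comes from the response's Content-Type header), the Content-Id is assumed to be report, and the sample data is made up.

```php
<?php
// Hypothetical helper: return the body of the part whose Content-Id header
// equals $contentId, or null if no such part exists.
function find_part_by_content_id(string $raw, string $boundary, string $contentId): ?string
{
    // Every part follows a "--boundary" delimiter line.
    foreach (explode('--' . $boundary, $raw) as $part) {
        // Part headers and part body are separated by the first blank line.
        $split = preg_split('/\r?\n\r?\n/', ltrim($part, "\r\n"), 2);
        if (count($split) !== 2) {
            continue; // preamble, closing "--" marker, or malformed part
        }
        [$headers, $body] = $split;
        if (preg_match('/^Content-Id:\s*<([^>]*)>/mi', $headers, $m) && $m[1] === $contentId) {
            // Drop only the line break that precedes the next delimiter.
            return preg_replace('/\r?\n\z/', '', $body);
        }
    }
    return null;
}

// Made-up sample response using the boundary from the question.
$boundary = '----=_Part_6_9317140.1311257231623';
$raw = "--$boundary\r\nContent-Type: text/xml; charset=UTF-8\r\nContent-Id: <envelope>\r\n\r\n<xml/>\r\n"
     . "--$boundary\r\nContent-Type: application/pdf\r\nContent-Transfer-Encoding: binary\r\n"
     . "Content-Id: <report>\r\n\r\n%PDF-1.4 data\r\n--$boundary--\r\n";

$pdf = find_part_by_content_id($raw, $boundary, 'report');
echo $pdf === null ? "part not found\n" : strlen($pdf) . " bytes\n";
```

This still assumes the boundary string never occurs inside the binary content, which is what the boundary is chosen for.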

Proper PHP way to parse email attachments from EML format

I have a file containing an email in "plain text MIME message format". I am not sure if this is the EML format. The email contains attachments, and I want to extract them and create those files again. This is what the attachment part looks like:
...
...
Receive, deliver details
...
...
From: sac ascsac <sacsac@sacascsac.ascsac>
Date: Thu, 20 Jan 2011 18:05:16 +0530
Message-ID: <AANLkTimmSL0iGW4rA3tvSJ9M3eT5yZLTGsqvCvf2fFC3@mail.gmail.com>
Subject: Test attachments
To: ascsacsa@ascsac.com
Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12
--20cf3054ac85d97721049a465e12
Content-Type: multipart/alternative; boundary=20cf3054ac85d97717049a465e10
--20cf3054ac85d97717049a465e10
Content-Type: text/plain; charset=ISO-8859-1
hello this is a test mail. It contains two attachments
--20cf3054ac85d97717049a465e10
Content-Type: text/html; charset=ISO-8859-1
hello this is a test mail. It contains two attachments<br>
--20cf3054ac85d97717049a465e10--
--20cf3054ac85d97721049a465e12
Content-Type: text/plain; charset=US-ASCII; name="simple_test.txt"
Content-Disposition: attachment; filename="simple_test.txt"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gj5n2yx60
aGVsbG8gd29ybGQKYWMgYXNj
...
encoded things here
...
ZyBmZyAKCjIKNDIzCnQ2Mwo=
--20cf3054ac85d97721049a465e12
Content-Type: application/x-httpd-php; name="oscomm_backup_code.php"
Content-Disposition: attachment; filename="oscomm_backup_code.php"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gj5n5gxn1
PD9waHAKCg ...
...
encoded things here
...
X2xpbmsoRklMRU5BTUVfQkFDS1VQKSk7Cgo/Pgo=
--20cf3054ac85d97721049a465e12--
I can see that the part between X-Attachment-Id: f_gj5n2yx60 and ZyBmZyAKCjIKNDIzCnQ2Mwo=, both inclusive,
is the content of the first attachment. I want to parse those attachments (file names and contents) and create those files.
I got this file after parsing a dbx format file using a DBX Parser class available in PHP classes.
I searched in many places and did not find much discussion of this on SO other than "Script to parse emails for attachments". Maybe I missed some terms while searching. In that answer it is mentioned:
you can use the boundries to extract
the base64 encoded information
But I am not sure which are the boundaries and how exactly to use the boundaries? There already must be some libraries or some well defined method of doing this. I guess I will commit many mistakes if I try reinventing the wheel here.
There's a PHP Mailparse extension, have you tried it?
The manual way would be to process the mail line by line. When you hit your first Content-Type header (this one in your example):
Content-Type: multipart/mixed; boundary=20cf3054ac85d97721049a465e12
You have the boundary. This string is used as the boundary between your multiple parts (that's why they call it multipart).
Every time a line starts with two dashes followed by this string, a new part begins. In your example:
--20cf3054ac85d97721049a465e12
Every part will start with headers, a blank line, and content. By looking at the content-type of the headers you can determine which are attachments, what their type is and their filename.
Read the whole content, strip the whitespace, base64_decode it (when the part's Content-Transfer-Encoding is base64), and you've got the binary contents of the file. Does this help?
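The manual approach could look roughly like this. It is a sketch, not a full MIME parser: extract_attachments is a hypothetical helper, it works on the whole message as a string instead of line by line, it only looks at the outer boundary, and it assumes simple quoted filenames as in the example above. The sample message is made up.

```php
<?php
// Hypothetical helper: return [filename => decoded bytes] for every part of a
// multipart message that is marked as an attachment.
function extract_attachments(string $raw): array
{
    // The first Content-Type header that declares a boundary is the outer one.
    if (!preg_match('/^Content-Type:\s*multipart\/[^;]+;\s*boundary="?([^"\r\n;]+)"?/mi', $raw, $m)) {
        return [];
    }
    $attachments = [];
    foreach (explode('--' . $m[1], $raw) as $part) {
        // Part headers and part content are separated by the first blank line.
        $split = preg_split('/\r?\n\r?\n/', ltrim($part, "\r\n"), 2);
        if (count($split) !== 2) {
            continue; // closing "--" marker or malformed part
        }
        [$headers, $body] = $split;
        if (!preg_match('/^Content-Disposition:\s*attachment;\s*filename="?([^"\r\n]+)"?/mi', $headers, $name)) {
            continue; // not an attachment (for example the nested text/html part)
        }
        if (preg_match('/^Content-Transfer-Encoding:\s*base64/mi', $headers)) {
            // Strip line breaks and spaces before decoding.
            $body = base64_decode(preg_replace('/\s+/', '', $body));
        }
        // basename() guards against path components in the file name.
        $attachments[basename($name[1])] = $body;
    }
    return $attachments;
}

// Made-up sample message with one base64 attachment.
$eml = "From: a@example.com\r\nContent-Type: multipart/mixed; boundary=XYZ\r\n\r\n"
     . "--XYZ\r\nContent-Type: text/plain; charset=ISO-8859-1\r\n\r\nhello\r\n"
     . "--XYZ\r\nContent-Type: text/plain; name=\"simple_test.txt\"\r\n"
     . "Content-Disposition: attachment; filename=\"simple_test.txt\"\r\n"
     . "Content-Transfer-Encoding: base64\r\n\r\naGVsbG8g\r\nd29ybGQ=\r\n--XYZ--\r\n";

$files = extract_attachments($eml);
foreach ($files as $filename => $bytes) {
    echo $filename, ': ', strlen($bytes), " bytes\n";
    // file_put_contents($filename, $bytes); // uncomment to write the file
}
```

For real mail (folded headers, encoded filenames, quoted-printable parts), the Mailparse extension mentioned above is likely the safer choice.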

How to pull html encoding from email data using PHP

I'm working with emails and want to display the HTML part in the browser, but I'm not sure how to deal with the encoding. My plan is to run an HTML parser over the entire email and pull out the data in between the tags in the HTML section. Is there an easier/more efficient way to do this?
Here's text encoding
------=_Part_29856965_540743623.1285814590176
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Here's the html encoding
------=_Part_29856965_540743623.1285814590176
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
You can have a look at the ezComponents - Mail component. It has a lot of operations for building and using a MIME
http://ezcomponents.org/docs/tutorials/Mail
