Does charset matter when I echo content from a BLOB field? - php

I have various images in a mysql table (a classic BLOB field).
In my viewImage.php I simply do:
header( 'Content-Type: image/jpeg' );
//> Fetch Image from database then echo $row['blobField'];
Should I specify a charset ?
Should I write:
header( 'Content-Type: image/jpeg; charset=UTF-8' );
? For what I might I think, I believe the response is no we don't need to set a charset

No, since character sets and encodings only matter for text.

No you don't need to set a charset.
The content-type is to tell the browser how to handle a request. if you're sending text/html and image/jpeg then the browser will not be able to handle it.
It has to be one or the other, it can't be both.
Using text/html is kind of a catch-all.

Related

Sending encrypted mail from php with non-ASCII Content-type

I am trying the below code to send encrypted mail where the cleartext is text/html and charset is utf-8.
For what it's worth, I cannot manage to get the mail displayed correctly (in MS Outlook).
As you may notice, I (meanwhile) try to have the relevant Content-Type: header both prepended to the data file to be encrypted and present in the headers parameter of the call to openssl_pkcs7_encrypt (and tried various other combinations as well).
While the desired content-type applies to the message when it is sent unencrypted, it simply does not work with encrypted messages. The result is always as if the "inner" Content-Type had been text/plain;charset=ascii.
I experimented with prepending the header to the data file before encryption only because I think I found some suggestion similar here on SE for a similar situation. But apparently this only makes the header line appear as part of the decrypted message (in other words, the file submitted to openssl_pkcs7_encrypt should really only be the "pure" content (as I had orginally suspected).
But having it in the headers parameter does not help, either: In the headers present in the enncryped file, there is a header added with the correct "outer" S/MIME content type, which overrides my "inner" content type given.
Question: Where and how in the combination of openssl_pkcs7_encrypt() and mail() should I specify the "inner" Content-Type of my encrypted message?
$hdr_to = implode(',',$empfaenger);
$hdr_subject = '=?UTF-8?q?' . quoted_printable_encode($subject) . '?=';
$other_headers = array(
"From" => $hdr_from,
"Content-Type" => "text/html; charset=utf-8",
"X-Mailer" => "PHP/".phpversion()
);
$mailbody = wordwrap($_REQUEST['message']);
//write msg to disk
$msg_fn = tempnam("/tmp","MSG");
$enc_fn = tempnam("/tmp","ENC");
$fp = fopen($msg_fn, "w");
fwrite($fp, 'Content-Type: ' . $other_headers['Content-Type'] . CRLF . CRLF);
fwrite($fp, $mailbody);
fclose($fp);
// Encrypt message
$enc_ok = openssl_pkcs7_encrypt($msg_fn,$enc_fn,$pubkeys,$other_headers,PKCS7_TEXT,1);
//$enc_ok = false; // for debugging: simulate encryption failure
if (!$enc_ok) {
// Will try to send unencrypted instead
} else {
// Seperate headers and body for mail()
$data = str_replace("\n",CRLF,file_get_contents($enc_fn));
$parts = explode(CRLF.CRLF, $data, 2);
$mailbody = $parts[1];
$other_headers = $parts[0];
}
// Send mail
$mail_ok = mail($hdr_to, $hdr_subject, $mailbody, $other_headers);
The problem was the PKCS7_TEXT flag given as parameter to the encryption function.
My intended media type was text/html; charset=utf-8, i.e.,
of type text
of subtype html
with parameter charset=utf-8
and superficially, the type matches what the name PKCS7_TEXT suggests. However, this flag is described as:
Adds text/plain content type headers to encrypted/signed message. If decrypting or verifying, it strips those headers from the output - if the decrypted or verified message is not of MIME type text/plain then an error will occur.
To be precise, it seems that said header and a blank line (to separate it from the message body) are prepended, which turns any previously existing header lines (Content-Type: or other) into part of the body.
So the simple solution is to not use the PKCS7_TEXT flag (unless your message is really plain ASCII² text).
² It may also be okay for latin-1 text, but I won't count on it. After all the description quoted above suggests that an error will occur; then again, UTF-8 text gets decrypted successfully (i.e., without an explicit error), but exhibits the usual codepage related display artefacts .

PHP - html_simple_dom, crawlers encodes innerhtml?

Im using PHP html_simple_dom.
The targeted site is using UTF-8. My php as well as the stream context are set to use UTF 8.
An element (which i inspect by browser) has an innerHTML of "AAA ' BBB", at least as far as when its rendering using my firefox and chrome browsers.
However, my PHP script always fetches this string as "AAA ' BBB".
I can fix this using htmlspecialchars_decode($string, 1), but i really want to know why the PHP script, or rather the website is ("wrongly) encoding the string in the first place when visiting it using my PHP, which is explicitly set to UTF
header('Content-Type: text/html; charset=utf-8');
define("CONTEXT", stream_context_create(
array(
"http" =>
array(
"header" => 'Content-Type: text/html; charset=utf-8'
// also tried 'header' => 'Accept-Charset: UTF-8'
)
)
)
);
targetsite reads UTF-8 - http://mtggoldfish.com.cutercounter.com/
$html = file_get_html($url, false, CONTEXT);
// do things, blurts out every "'" as encoded &#039
Browser inspectors do a bit of transformation to have something human-readable.
Create a simple HTML with only AAA ' BBB in the body, you will see AAA ' BBB in the inspectors.
If you really want to see the content of the page, look at the source code (which is what file_get_html gets)

file_get_contents() breaking ISO-8859-1 encoding

I am trying to read a page using file_get_contents() but I cannot get the character encoding to work.
this is my code:
$username = "masked";
$password = "maskedPass";
$remote_url = 'https://utfws.utfpr.edu.br/aluno01/sistema/mplistahorario.inicio?p_curscodnr=212';
// Create a stream
$opts = array(
'http'=>array(
'method'=>"GET",
'header' => array(
"Authorization: Basic " . base64_encode("$username:$password"),
'Accept-Charset: iso-8859-1'
)
)
);
$context = stream_context_create($opts);
// Open the file using the HTTP headers set above
$file = file_get_contents($remote_url, false, $context);
echo $file;
I tried to change the character encoding to utf-8 but I always get a page with question marks instead of áéíóúãõç.
When I open the page directly in my browser it works just fine. Why is this happening?
It sounds to me like this might just be a problem of lost encoding details.
What you're describing is:
request document from webserver, specifying encoding 8859-1
server responds with document in requested encoding, including header specifying the encoding is 8859-1. This will look correct in a browser.
output document ( but not header data! ) from php ( where this goes isn't specified
open the data in some sort of viewer.
See where the encoding specification was lost, there in step 3?
The data can correctly be decoded with 8859-1, but only will be decoded with 8859-1 if the viewer is configured to use that encoding by default. Some apps may have a default of 8859-1, but UTF-8 is a lot more common these days.
If you load the data into a different storage engine, say mysql, the problem may compound. mysql associates a charset with text data. If your database defaults to utf-8, and you don't tell it the data is actually in 8859-1, but you don't tell it the data is in 8859-1, now you're feeding it data that is assumed to be in utf-8, and the data will be treated as such in the database going forward. Now even if you ask the database for 8859-1 in the future, the data will be re-encoded from utf-8 to 8859-1, but it's not valid utf-8 - it's yet another incorrect set of bytes.
To address this problem, specify the encoding when you view the data, or when you save it to a database.

Performance text/html vs. application/json

While evaluating performance of PHP frameworks I came across a strange problem
Sending a JSON as application/json seems to be much slower than sending with no extra header (which seems to fallback to text/html)
Example #1 (application/json)
header('Content-Type: application/json');
echo json_encode($data);
Example #2 (text/html)
echo json_encode($data);
Testing with apache bench (ab -c10 -n1000) gives me:
Example #1: 350 #/sec
Example #2: 440 #/sec
which shows that setting the extra header seems to be a little bit slower.
But:
Getting the same JSONs via "ajax" (jQuery.getJSON('url', function(j){console.log(j)});) makes the difference very big (timing as seen in Chrome Web Inspector):
Example #1: 340 ms / request
Example #2: 980 ms / request
Whats the matter of this difference?
Is there a reason to use application/json despite the performance difference?
I'll take up the last part of question:
Is there a reason to use application/json despite the performance
difference?
Answer: Yes
Why:
1) text/html can often be malformed json and will go uncaught until you try parsing it. application/json will fail and you can easily debug whenever the json is malformed
2) If you are viewing json in browser, having the header type will format it in a user-friendly formatting. text/html will show it more as a blob.
3) If you are consuming this json on your webpage, application/json will immediately be converted into js object and you may access them as obj.firstnode.childnode etc.
4) callback feature can work on application/json, but not on text/html
Note:
Using gzip will sufficiently alleviate the performance problem. text/html will still be bit faster, but not the recommended way for fetching json objects
Would like to see more insight on performance though. Header length is definitely not causing performance issue. More to do with your webserver analyzing the header format.
Does your server handle gzipping/deflate differently depending on content-type? Mine does. Believe ab does not accept gzip by default. (You can set this in ab with a custom header with the -H flag). But Chrome will always say it accepts gzipping.
You can use curl test to see if the files are different sizes:
curl http://www.example.com/whatever --silent -H "Accept-Encoding: gzip,deflate" --write-out "size_download=%{size_download}\n" --output /dev/null
You can also look at the headers to see if gzipping is applied:
curl http://www.example.com/whatever -I -H "Accept-Encoding: gzip,deflate"

UTF-8 characters don't display correctly

This is my PHP code:
<?php
$result = '';
$str = 'Тугайный соловей';
for ($y=0; $y < strlen($str); $y++) {
$tmp = mb_substr($str, $y, 1);
$result = $result . $tmp;
}
echo 'result = ' . $result;
The output is:
Тугайный Ñоловей
What can I do? I have to put $result into a MySQL database.
What's the encoding of your file? It should be UTF8 too. What's the default charset of your http server? It should be UTF-8 as well.
Encoding only works if:
the file is encoded correctly
the server tells what's the encoding of the delivered file.
When working with databases, you also have to set the right encoding for your DB fields and the way the MySQL client communicates with the server (see mysql_set_charset()). Fields only are not enough because your MySQL client (in this case, PHP) could be set to ISO by default and reinterprets the data. So you end up with UTF8 DB -> ISO client -> injected into UTF8 PHP script. No wonder why it's messed up at the end :-)
How to serve the file with the right charset?
header('Content-type: text/html; charset=utf-8') is one solution
.htaccess file containing AddDefaultCharset UTF-8 is another one
HTML meta content-type might work too but it's always better to send this information using HTTP headers.
PS: you also have to use mb_strlen() because strlen() on UTF8 strings will probably report more than the real length.
If you're going to send a mix of data and don't want to specify utf-8 using a php header, you can add this html to your page:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
I suppose, your code is in windows-1251 encoding since it is Russian :)
convert your string to utf-8:
$str = iconv('windows-1251', 'utf-8', $str);
If your database is UTF-8, it's ok for mysql.
For your echo, if you do it in a web site, put this in the top page:
header('Content-Type: text/html; charset=UTF-8');
Just add this line at the beginning, after the connection with server:
mysqli_set_charset($conn,"utf8");
try this:
header('Content-Type: text/html; charset=UTF-8');
header("Content-type: application/octetstream");
header("Pragma: no-cache");
header("Expires: 0");
//print "$name_field\n$data";
// با این کد درست شد
print chr(255) . chr(254) . mb_convert_encoding("$name_field\n$data", 'UTF-16LE', 'UTF-8');
if you are just using PHP echo with no HTML headers etc., this worked great for me.
$connect = mysqli_connect($host_name, $user_name, $password, $database);
mysqli_set_charset($connect,"utf8");

Categories