PHP using UTF8 characters in URL, url encoding fails - php

In my PHP script I try to send UTF-8 characters to the Google Translate website to get a translation of the text back, but this doesn't work for UTF-8 text such as Chinese, Arabic and Russian, and I can't figure out why. If I try to translate 'как дела' to English I could use this link: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=как дела
And it would return this: [[["how are you","как дела",,,1]],,"ru"]
A fine translation, exactly what I wanted, but if I try to recreate it in PHP I do this (I used bytes in the beginning because my future script will use bytes as starting point):
<?php
$bytes = array(1082,1072,1082,32,1076,1077,1083,1072); // bytes of: как дела
$str = "";
for($i = 0; $i < count($bytes); ++$i) {
$str .= json_decode('"\u' . '0' . strtoupper(dechex($bytes[$i])) . '"'); // returns string: как дела
}
$from = 'ru';
$to = 'en';
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $str;
$call = fopen($url,"r");
$contents = fread($call,2048);
print $contents;
?>
And it outputs: [[["RєR RєRґRμR ° \"° F","какдела",,,0]],,"ru"]
The output doesn't make sense; it appears that my PHP script sent the string 'какдела' to be translated to English. I read something about making UTF-8 characters readable for Google in a URI (or URL): I should convert my bytes to UTF-8 code units and put them in my URL. I haven't yet figured out how to convert bytes to UTF-8 code units, but I first wanted to check whether it would work. I started by converting my text 'как дела' to percent-encoded code units by hand to test it myself. This resulted in the following link: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=%D0%BA%D0%B0%D0%BA+%D0%B4%D0%B5%D0%BB%D0%B0
And when tested in browser it returns: [[["how are you","как дела",,,1]],,"ru"]
Again a fine translation, it appears it works so I tried to implement it in my script with the following code:
<?php
$from = 'ru';
$to = 'en';
$text = "%D0%BA%D0%B0%D0%BA+%D0%B4%D0%B5%D0%BB%D0%B0"; // code units of: как дела
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $text;
$call = fopen($url,"r");
$contents = fread($call,2048);
print $contents;
?>
This script outputs: [[["RєR Rє RґRμR ° \"° F","как дела",,,0]],,"ru"]
Again my script doesn't output what I want, or what I get when I test these URLs in my own browser. I can't figure out what I'm doing wrong or why Google responds with a jumble of characters when I use the link in my PHP file.
Does someone know how to get the output I want? Thanks in advance!
Updated code to set strings in UTF-8, (not working)
I added a lot of settings at the top of the PHP file to make sure everything is in UTF8 format. Also I added a mb_convert_encoding halfway but the output keeps being wrong. The fopen function doesn't send the right UTF-8 string to google.
Output I get:
URL: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=%D0%BA%D0%B0%D0%BA%20%D0%B4%D0%B5%D0%BB%D0%B0
Encoding: ASCII
File contents: [[["RєR Rє RґRμR ° \"° F","как дела",,,0]],,"ru"]
Code I use:
<?php
header('Content-Type: text/html; charset=utf-8');
$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');
$from = 'ru';
$to = 'en';
$text = rawurlencode('как дела');
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $text;
$url = mb_convert_encoding($url, "UTF-8", "ASCII");
$call = fopen($url,"r");
$contents = fread($call,2048);
print 'URL: ' . $url . '<br>';
print 'Encoding: ' . mb_detect_encoding($url) . '<br>';;
print 'File contents: ' . $contents;
?>

Solved! I got a hint from someone outside these forums to look at a Stack Overflow post about setting a user agent. After some more research I found that an answer there was the solution to my problem. Now everything works fine!
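For readers landing here, the pieces discussed in this question can be sketched roughly as follows. This is a minimal sketch, not the code from the answer mentioned above: the helper names (codepoint_to_utf8, codepoints_to_utf8, build_translate_url, translate_context) and the User-Agent string are made up for illustration. It treats the numbers in $bytes as Unicode code points, which is what values like 1082 are. As a side observation, the original loop builds '\u020' for the space (32), which is not a valid JSON escape, and that would explain why the space is missing in 'какдела'.

```php
<?php
// Encode one Unicode code point (e.g. 1082 for "к") as UTF-8 bytes.
function codepoint_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Turn a list of code points into one UTF-8 string.
function codepoints_to_utf8(array $codepoints): string
{
    return implode('', array_map('codepoint_to_utf8', $codepoints));
}

// Build the request URL; rawurlencode() percent-encodes the UTF-8 bytes.
function build_translate_url(string $from, string $to, string $text): string
{
    return 'https://translate.googleapis.com/translate_a/single?client=gtx'
        . '&sl=' . rawurlencode($from)
        . '&tl=' . rawurlencode($to)
        . '&dt=t&q=' . rawurlencode($text);
}

// Stream context that sends a User-Agent header (the value is an assumption).
function translate_context()
{
    return stream_context_create([
        'http' => ['user_agent' => 'Mozilla/5.0 (compatible; ExampleScript/1.0)'],
    ]);
}
```

The request itself would then be something like file_get_contents($url, false, translate_context()). Which user agent strings the service accepts is not something this sketch can promise.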

Related

PHP mail() - Images don't always load

I have a PHP mail script which is basically the following:
$result = mail($to, $subject, $message, $headers);
if(!$result) {
echo "Error";
} else {
echo "Success";
}
The $message is a HTML email that mostly renders fine in my email client except the images seem to only load sporadically.
The images are all like so:
<img src='http://www.mywebsite.com/media/twitter.png' />
I don't understand why some would load and some wouldn't, when they are all set up the same way.
I've read that it's better to embed images into the email as attachments but I'm unsure how to do this. It seems that you add a line like so:
<img src='cid:123456789'>
But what does this reference? How would I encode an image like this?
Any help would be appreciated!! Thanks
You would have to base64 encode the file.
I found a code example on GitHub. I have not tested it myself, but it should give you a good nudge in the right direction...
$picture = file_get_contents($file);
$size = getimagesize($file);
// base64 encode the binary data, then break it into chunks according to RFC 2045 semantics
$base64 = chunk_split(base64_encode($picture));
echo '<img src="data:' . $size['mime'] . ';base64,' . "\n" . $base64 . '" ' . $size[3] . ' />', "\n";
Source : https://gist.github.com/jasny/3938108
Just as a side note: are the images you are using web optimised? Large images might be blocked, or simply not downloaded, by email clients.
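The snippet above produces a data: URI rather than the cid: reference the question asks about. A cid: URL points at the Content-ID header of another part inside the same message, so the image has to travel as a MIME part of the email. A minimal sketch of that, assuming a single inline image; the function name build_inline_image_mail and the identifier logo123 are hypothetical, and real mail is usually easier to send with a mail library:

```php
<?php
// Build headers and body for an HTML email with one inline image.
// $cid is an arbitrary identifier; the HTML must reference it as src="cid:<that id>".
function build_inline_image_mail(string $html, string $imageBytes, string $mime, string $cid): array
{
    $boundary = 'rel_' . bin2hex(random_bytes(8));

    $headers = "MIME-Version: 1.0\r\n"
        . "Content-Type: multipart/related; boundary=\"$boundary\"";

    // First part: the HTML. Second part: the image, tagged with Content-ID.
    $body = "--$boundary\r\n"
        . "Content-Type: text/html; charset=utf-8\r\n"
        . "Content-Transfer-Encoding: base64\r\n\r\n"
        . chunk_split(base64_encode($html))
        . "--$boundary\r\n"
        . "Content-Type: $mime\r\n"
        . "Content-Transfer-Encoding: base64\r\n"
        . "Content-ID: <$cid>\r\n"
        . "Content-Disposition: inline\r\n\r\n"
        . chunk_split(base64_encode($imageBytes))
        . "--$boundary--\r\n";

    return [$headers, $body];
}
```

With this, something like list($headers, $body) = build_inline_image_mail('<img src="cid:logo123">', file_get_contents($file), 'image/png', 'logo123'); followed by mail($to, $subject, $body, $headers) would send the image along with the message instead of linking to it.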

wordpress mail header set else plain text

I hope to get some help with a piece of code. I am using a WordPress theme that sets the mail headers to text/html, which causes problems with plain-text mail, e.g. line breaks no longer show.
I tried setting :
} else {
return 'text/plain';
}
but I don't know php very well so I don't know where to place it to make it work. I would like to set the text/plain for mails not defined.
this is the code for the wp header :
/**
* filter mail headers
*/
function wp_mail($compact) {
if (isset($_GET['action']) && $_GET['action'] == 'lostpassword') return $compact;
if ($compact['headers'] == '') {
//$compact['headers'] = 'MIME-Version: 1.0' . "\r\n";
$compact['headers'] = 'Content-type: text/html; charset=utf-8' . "\r\n";
$compact['headers'].= "From: " . get_option('blogname') . " < " . get_option('admin_email') . "> \r\n";
}
$compact['message'] = str_ireplace('[site_url]', home_url() , $compact['message']);
$compact['message'] = str_ireplace('[blogname]', get_bloginfo('name') , $compact['message']);
$compact['message'] = str_ireplace('[admin_email]', get_option('admin_email') , $compact['message']);
$compact['message'] = html_entity_decode($compact['message'], ENT_QUOTES, 'UTF-8');
$compact['subject'] = html_entity_decode($compact['subject'], ENT_QUOTES, 'UTF-8');
//$compact['message'] = et_get_mail_header().$compact['message'].et_get_mail_footer();
return $compact;
}
Instead of changing that, change your plain line breaks to html.
$message=nl2br($message); // of course use your var name.
That way you keep a single standard format for email. Plain text has nothing special enough to need a separate header in this case; this function converts all line breaks to their HTML version.
Other than newlines, most of your plain text will keep its formatting even in HTML because it has no special tags.
Here is where to place it:
function wp_mail($compact) {
// leave your existing code intact here, don't remove it.
$compact["message"]=nl2br($compact["message"]);
return $compact;
}
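To see what nl2br() does to a message, here is a small standalone sketch. The function name convert_breaks_for_html is hypothetical, and the check for text/html is an added assumption so that mails sent with other headers are left alone:

```php
<?php
// Convert newlines to <br /> only when the mail is being sent as HTML.
// $compact mirrors the array used above: 'headers' and 'message' keys.
function convert_breaks_for_html(array $compact): array
{
    $headers = is_array($compact['headers'])
        ? implode("\r\n", $compact['headers'])
        : (string) $compact['headers'];

    if (stripos($headers, 'text/html') !== false) {
        // nl2br() inserts "<br />" before each newline and keeps the newline.
        $compact['message'] = nl2br($compact['message']);
    }
    return $compact;
}
```

For a message "Hello\nWorld" sent with an HTML content type, the result is "Hello<br />\nWorld"; with any other content type the message is returned unchanged.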

Result from URL request returning weird characters instead of accents

My problem is that the accents are not displayed in the output of print_r().
Here is my code:
<?php
include('./lib/simple_html_dom.php');
error_reporting(E_ALL);
if (isset($_GET['q'])){
$q = $_GET['q'];
$keyword=urlencode($q);
$url="https://www.google.com/search?q=$keyword";
$html=file_get_html($url);
$results=$html->find('li.g');
$G_tot = sizeof($results)-1;
for($g=0;$g<=$G_tot;$g++){
$results=$html->find('li.g',$g);
$array_ttl_google[]=$results->find('h3.r',0)->plaintext;
$array_desc_google[]=$results->find('span.st',0)->plaintext;
$array_href_google[]=$results->find('cite',0)->plaintext;
}
print_r($array_desc_google);
}
?>
Here is the result of print_r:
Array ( [0] => �t� m (plural �t�s)...
What is the resolution in your opinion?
3 basic things you can do:
Set the page encoding to UTF-8: add header('Content-Type: text/html; charset=utf-8'); at the very beginning of your page.
Make sure your code file is saved as UTF-8 (without BOM).
Add a function to convert the parsed string to UTF-8 (in case some other sites are using different encodings).
Your code should look something like this (tested; it worked well with English and Hebrew results):
<?php
header('Content-Type: text/html; charset=utf-8');
include('simple_html_dom.php');
error_reporting(0);
if (isset($_GET['q'])){
$q = $_GET['q'];
$keyword=urlencode($q);
$url="https://www.google.com/search?q=$keyword";
$html=file_get_html($url);
//Make sure we received UTF-8:
$encoding = @mb_detect_encoding($html);
if ($encoding && strtoupper($encoding) != "UTF-8") {
// iconv() returns a string, so re-parse it to keep $html a DOM object
$html = str_get_html(@iconv($encoding, "utf-8//TRANSLIT//IGNORE", $html));
}
//Proceed with your code:
$results=$html->find('li.g');
$G_tot = sizeof($results)-1;
for($g=0;$g<=$G_tot;$g++){
$results=$html->find('li.g',$g);
$array_ttl_google[]= $results->find('h3.r',0)->plaintext;
$array_desc_google[]= $results->find('span.st',0)->plaintext;
$array_href_google[] = $results->find('cite',0)->plaintext;
}
print_r($array_desc_google);
} else {
echo "You forgot to set the 'q' variable in your url.";
}
?>
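The conversion step can also be isolated into a small helper that works on the raw string before it is parsed. This is a sketch under assumptions: the name to_utf8 is hypothetical, it needs the mbstring extension, and the ISO-8859-1 fallback is a guess that should be replaced by the charset the remote server actually declares.

```php
<?php
// Return $raw as UTF-8. $fallback is an assumed source encoding that is used
// only when $raw is not already valid UTF-8.
function to_utf8(string $raw, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}
```

Checking validity first avoids double-encoding text that is already UTF-8, which is a common way to end up with a second kind of garbled output.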

Can't seem to write a valid .ics file?

I need your help. I'm generating an iCal .ics file with a PHP function.
If I open the .ics file with iCal, the application says: "Calendar can’t read this calendar file. No events have been added to your calendar."
However if I validate the file with an online .ics validator it says everything should be fine except the line endings. The validator says this:
Your calendar is using an invalid newline format. Make sure to use
\r\n to end lines rather than just \n (RFC 2445 §4.1).
Congratulations; your calendar validated!
But I'm not sure if this is the "real" reason why iCal can't read the file.
First off, I wonder how to change these line endings?
<?php
function wpse63611_events_feed_output(){
$filename = urlencode( 'My-Events-' . date('Y') . '.ics' );
// Start collecting output
ob_start();
header( 'Content-Description: File Transfer' );
header( 'Content-Disposition: attachment; filename=' . $filename );
header( 'Content-type: text/calendar' );
header( "Pragma: 0" );
header( "Expires: 0" );
?>
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//<?php get_bloginfo('name'); ?>//NONSGML Events //EN
CALSCALE:GREGORIAN
X-WR-CALNAME:<?php echo get_bloginfo('name');?> - Events
<?php
if ( have_posts() ):
$now = new DateTime();
$datestamp =$now->format('Ymd\THis\Z');
while( have_posts() ): the_post();
global $post;
$uid = md5(uniqid(mt_rand(), true))."@mydomain.com";
$start = unixToiCal(get_event_date($post, true, true), 2.0);
$end = unixToiCal(get_event_end_date($post, true, true), 2.0);
$summary = wpse63611_esc_ical_text(get_the_title());
$description = apply_filters('the_excerpt_rss', get_the_content());
$description = wpse63611_esc_ical_text($description); ?>
BEGIN:VEVENT
UID:<?php echo $uid; ?>
<?php echo "\r\n"; ?>
DTSTAMP:<?php echo $datestamp; ?>
<?php echo "\r\n"; ?>
DTSTART:<?php echo $start; ?>
<?php echo "\r\n"; ?>
DTEND:<?php echo $end; ?>
<?php echo "\r\n"; ?>
SUMMARY:<?php echo $summary; ?>
<?php echo "\r\n"; ?>
DESCRIPTION:<?php echo $description; ?>
<?php echo "\r\n"; ?>
END:VEVENT
<?php endwhile; endif; ?>
END:VCALENDAR
<?php
// Collect output and echo
$eventsical = ob_get_contents();
ob_end_clean();
echo $eventsical;
exit();
}
function unixToiCal( $uStamp = 0, $tzone = 0.0 ) {
$uStampUTC = $uStamp + ($tzone * 3600);
$stamp = date("Ymd\THis\Z", $uStampUTC);
return $stamp;
}
function wpse63611_esc_ical_text( $text='' ) {
$text = str_replace("\\", "", $text);
$text = str_replace("\r", "\r\n ", $text);
$text = str_replace("\n", "\r\n ", $text);
return $text;
}
?>
Can you see any problem with this? What could cause the calendar not to work?
UPDATE
Well, I fixed the line endings and the calendar validates fine now. So no errors when validating it, but I still can't get it working in iCal. When I open it it still says the calendar file is not readable. Here is the actual file that is generated by my script … http://cl.ly/383D3M3q3P32
Looking at your iCal file raises two points:
1. As mentioned above, all your lines need to end with \r\n (i.e. you need to ensure that
BEGIN:VCALENDAR
VERSION:2.0
..
are also terminated properly).
2. You need to escape commas in text (see RFC5545 §3.3.11: https://www.rfc-editor.org/rfc/rfc5545#section-3.3.11 ).
You can also run the file through online iCalendar validators; see this answer: https://stackoverflow.com/a/4812081/1167333
You have the wpse63611_esc_ical_text() function to normalize output, but you only apply it to some output fragments. The odd thing is that this function expects Unix-style input ("\n"), while the whole mechanism relies on the source code being saved with Windows-style line endings ("\r\n"). Additionally, you sometimes call the function twice on the same text.
I believe the root problem is that you don't really know what a line ending is. When you hit your keyboard's Enter key, you'll actually get a different character depending on whether your computer runs Windows or some kind of Unix (such as Linux or macOS). On Windows, you'll actually get two characters, represented as "\r\n" in PHP. On Unix, you'll get one character, represented as "\n" in PHP. If your editor is good enough, it'll allow you to save the file with the line ending of your choice, no matter what your computer runs. Check the "Save as" dialogue for further info.
Since you aren't actually typing the ICS file, you need to ensure that PHP generates the appropriate characters. The simplest way is to type and save the source code as you please and then convert the complete output once:
$output = strtr($output, array(
"\r\n" => "\r\n",
"\r" => "\r\n",
"\n" => "\r\n",
));
You'll probably need to clean up your code first.
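Both points from the answers above (escaping text values and converting the complete output once) can be sketched as two small helpers. The function names ical_escape_text and ical_normalize_newlines are hypothetical; the escaping follows the TEXT rules in RFC 5545 §3.3.11 (backslash, semicolon, comma, and newlines written as a literal \n).

```php
<?php
// Escape a value for use in a TEXT property such as SUMMARY or DESCRIPTION.
// strtr() never rescans its own replacements, so backslashes are not doubled twice.
function ical_escape_text(string $text): string
{
    return strtr($text, [
        "\\"   => "\\\\",
        ";"    => "\\;",
        ","    => "\\,",
        "\r\n" => "\\n",
        "\n"   => "\\n",
        "\r"   => "\\n",
    ]);
}

// Convert every line ending in the finished output to CRLF, exactly once.
// The longest match wins in strtr(), so existing "\r\n" pairs stay intact.
function ical_normalize_newlines(string $output): string
{
    return strtr($output, [
        "\r\n" => "\r\n",
        "\r"   => "\r\n",
        "\n"   => "\r\n",
    ]);
}
```

In the feed function above, this would mean escaping each property value as it is written and running ical_normalize_newlines() on the buffer returned by ob_get_contents() before echoing it.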

PHP - ASCII special characters (without MySQL)

I am building a PHP page that accesses a Google account and then shows all emails. I've set a UTF-8 header and meta tag, and I tried many PHP functions to convert the output to UTF-8, but I keep getting strange symbols instead of special characters such as ç, é or ã.
header("Content-Type: text/html; charset: UTF-8");
$message = imap_fetchbody($inbox,$email_number,2);
echo $message;
What should be the output:
çççç
What I get:
=E7=E7=E7=E7
Use imap_qprint (see the first comment on that page for an alternative solution).
It seems to be a known issue, according to the first comment on the imap_fetchbody PHP doc page.
Use imap_qprint or the commenter's solution:
<?php
function ReplaceImap($txt) {
$carimap = array("=C3=A9", "=C3=A8", "=C3=AA", "=C3=AB", "=C3=A7", "=C3=A0", "=20", "=C3=80", "=C3=89");
$carhtml = array("é", "è", "ê", "ë", "ç", "à", " ", "À", "É");
$txt = str_replace($carimap, $carhtml, $txt);
return $txt;
}
$mbox = imap_open("{imap.gmail.com:993/imap/ssl}INBOX", "login", "pass");
$no = 5; // Mail to show (mail number)
$text = imap_fetchbody($mbox, $no, 1);
$text = imap_utf8($text);
$text = ReplaceImap($text);
$text = nl2br($text);
echo $text;
?>
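Note that the replacement table above only lists UTF-8 sequences such as =C3=A7, so it would not match the =E7 from the question, which is the quoted-printable form of the single byte 0xE7 (ç in ISO-8859-1). A more general sketch decodes the quoted-printable text first and then converts the bytes to UTF-8. The function name decode_qp_body is hypothetical, mbstring is required, and the charset passed in is an assumption here: in real code it should come from the Content-Type header of the message part.

```php
<?php
// Decode a quoted-printable body and convert it to UTF-8.
// $charset is the encoding the sender declared for this part.
function decode_qp_body(string $body, string $charset): string
{
    // quoted_printable_decode() is part of core PHP; imap_qprint() does the
    // same job when the IMAP extension is loaded.
    $bytes = quoted_printable_decode($body);
    return mb_convert_encoding($bytes, 'UTF-8', $charset);
}
```

For the example in the question, decode_qp_body('=E7=E7=E7=E7', 'ISO-8859-1') yields the four ç characters as UTF-8, which matches the header the page sends.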
