In php how to display chinese character? - php

what I build now is I grabbing from RSS feed in chinese RSS website, but once I echo out is blank, my code was work on english RSS, I try a lot of decode,iconv, header("Content-Type: text/html; charset=utf-8");, but still the same cannot display any chinese word on my screen.
here is my coding:
header("Content-Type: text/html; charset=utf-8");
function getrssfeed($feed_url){
$Current = date("Y-m-d" ,strtotime("now"));
$content = file_get_contents($feed_url);
$xml = new SimpleXmlElement($content);
$body = "";
foreach($xml->channel->item as $entry){
$body .= get_html_translation_table(htmlspecialchars_decode(strip_tags($Current ." ". $entry->description))) . "\n\n";
//$result = iconv('UTF-8', 'ISO-8859-1//TRANSLIT//IGNORE', $body);
$i++;
if($i==5) {
break;
}
}
echo $body;
}
getrssFeed("http://news.baidu.com/n?cmd=1&class=enternews&tn=rss");
Can you guy help me how to solve my problem ?
thank you

in your HTML header put this
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" ></meta>

Two things you need to do
Set document type or header as
content="text/html;charset=utf-8"
Save those user Chinese characters in database with field collation as utf8_general_ci

may be you can use this function with
mb_convert_encoding
,but at the same time ,you should attention the native document charset must be utf-8 or gb2312

Related

PHP using UTF8 characters in URL, url encoding fails

In my PHP script I try to send utf8 characters to the google translate website for them to send me a translation of the text, but this doesn't work for UTF8 characters such as chinese, arabic and russian and I can't figure out why. If I try to translate 'как дела' to english I could use this link: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=как дела
And it would return this: [[["how are you","как дела",,,1]],,"ru"]
A fine translation, exactly what I wanted, but if I try to recreate it in PHP I do this (I used bytes in the beginning because my future script will use bytes as starting point):
<?php
$bytes = array(1082,1072,1082,32,1076,1077,1083,1072); // bytes of: как дела
$str = "";
for($i = 0; $i < count($bytes); ++$i) {
$str .= json_decode('"\u' . '0' . strtoupper(dechex($bytes[$i])) . '"'); // returns string: как дела
}
$from = 'ru';
$to = 'en';
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $str;
$call = fopen($url,"r");
$contents = fread($call,2048);
print $contents;
?>
And it outputs: [[["RєR RєRґRμR ° \"° F","какдела",,,0]],,"ru"]
The output doesn't make sense, it appears that my PHP script send the string 'какдела' to translate to english for me. I read something about making UTF-8 characters readable for google in a URI (or url). It says I should transfer my bytes to UTF-8 code units and put them in my url. I didn't yet figure out how to transfer bytes to UTF-8 code units, but I first wanted to try if it worked. I started by converting my text 'как дела' to code units (with percents for URL) to test it myself. This resulted in the following link: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=%D0%BA%D0%B0%D0%BA+%D0%B4%D0%B5%D0%BB%D0%B0
And when tested in browser it returns: [[["how are you","как дела",,,1]],,"ru"]
Again a fine translation, it appears it works so I tried to implement it in my script with the following code:
<?php
$from = 'ru';
$to = 'en';
$text = "%D0%BA%D0%B0%D0%BA+%D0%B4%D0%B5%D0%BB%D0%B0"; // code units of: как дела
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $text;
$call = fopen($url,"r");
$contents = fread($call,2048);
print $contents;
?>
This script outputs: [[["RєR Rє RґRμR ° \"° F","как дела",,,0]],,"ru"]
Again my script doesn't output what I want and what I get when I test these URL's in my own browser. I can't figure what I'm doing wrong and why google responds with a mess up of characters if I use the link in my PHP file.
Does someone know how to get the output I want? Thanks in advance!
Updated code to set strings in UTF-8, (not working)
I added a lot of settings at the top of the PHP file to make sure everything is in UTF8 format. Also I added a mb_convert_encoding halfway but the output keeps being wrong. The fopen function doesn't send the right UTF-8 string to google.
Output I get:
URL: https://translate.googleapis.com/translate_a/single?client=gtx&sl=ru&tl=en&dt=t&q=%D0%BA%D0%B0%D0%BA%20%D0%B4%D0%B5%D0%BB%D0%B0
Encoding: ASCII
File contents: [[["RєR Rє RґRμR ° \"° F","как дела",,,0]],,"ru"]
Code I use:
<?php
header('Content-Type: text/html; charset=utf-8');
$TYPO3_CONF_VARS['BE']['forceCharset'] = 'utf-8';
mb_internal_encoding('UTF-8');
mb_http_output('UTF-8');
mb_http_input('UTF-8');
mb_language('uni');
mb_regex_encoding('UTF-8');
ob_start('mb_output_handler');
$from = 'ru';
$to = 'en';
$text = rawurlencode('как дела');
$url = 'https://translate.googleapis.com/translate_a/single?client=gtx&sl=' . $from . '&tl=' . $to . '&dt=t&q=' . $text;
$url = mb_convert_encoding($url, "UTF-8", "ASCII");
$call = fopen($url,"r");
$contents = fread($call,2048);
print 'URL: ' . $url . '<br>';
print 'Encoding: ' . mb_detect_encoding($url) . '<br>';;
print 'File contents: ' . $contents;
?>
Solved! I got the hint from another not from these forums to look at this stackoverflow post about setting a user agent. After some more research I found that this answer was the solution to my problem. Now everything works fine!

Result from URL request returning weird characters instead of accents

My problem is that the accents are not displayed in the output of print_r().
Here is my code:
<?php
include('./lib/simple_html_dom.php');
error_reporting(E_ALL);
if (isset($_GET['q'])){
$q = $_GET['q'];
$keyword=urlencode($q);
$url="https://www.google.com/search?q=$keyword";
$html=file_get_html($url);
$results=$html->find('li.g');
$G_tot = sizeof($results)-1;
for($g=0;$g<=$G_tot;$g++){
$results=$html->find('li.g',$g);
$array_ttl_google[]=$results->find('h3.r',0)->plaintext;
$array_desc_google[]=$results->find('span.st',0)->plaintext;
$array_href_google[]=$results->find('cite',0)->plaintext;
}
print_r($array_desc_google);
}
?>
Here is the result of print_r:
Array ( [0] => �t� m (plural �t�s)...
What is the resolution in your opinion?
3 basic things you can do:
Set the page encoding to UTF-8 - Add at the very begining of your page: header('Content-Type: text/html; charset=utf-8');
Make sure your code file is saved as UTF-8 (without BOM).
Add a function to translate the parsed string to UTF-8 (in case some other sites are using different encodings)
Your code should look something like that (Tested - working great tried with english and hebrew results):
<?php
header('Content-Type: text/html; charset=utf-8');
include('simple_html_dom.php');
error_reporting(0);
if (isset($_GET['q'])){
$q = $_GET['q'];
$keyword=urlencode($q);
$url="https://www.google.com/search?q=$keyword";
$html=file_get_html($url);
//Make sure we received UTF-8:
$encoding = #mb_detect_encoding($html);
if ($encoding && strtoupper($encoding) != "UTF-8")
$html = #iconv($encoding, "utf-8//TRANSLIT//IGNORE", $html);
//Proceed with your code:
$results=$html->find('li.g');
$G_tot = sizeof($results)-1;
for($g=0;$g<=$G_tot;$g++){
$results=$html->find('li.g',$g);
$array_ttl_google[]= $results->find('h3.r',0)->plaintext;
$array_desc_google[]= $results->find('span.st',0)->plaintext;
$array_href_google[] = $results->find('cite',0)->plaintext;
}
print_r($array_desc_google);
} else {
echo "You forgot to set the 'q' variable in your url.";
}
?>

Cannot get the correct utf-8 text from Access

When I tried to get chinese characters from the database, I got weird text.
I tried almost everything, like html_entity_decode, htmlentities, save the file using utf-8, encode in utf-8, but I can't seem to get it right.
How do i get the right text?
Here's my code:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />
<?php
header('Content-Type: text/html; charset=utf-8');
$conn=odbc_connect('vocab','','');
$rs1=odbc_exec($conn,"SELECT MAX(ID) AS MaxId FROM vocab");
$NewMaxID=odbc_result($rs1,"MaxId");
$rand=rand(1,$NewMaxID);
$sql="SELECT word,part_of_speech,chinese FROM vocab WHERE ID=".$rand.";";
$rs=odbc_exec($conn,$sql);
$i=1;
odbc_fetch_row($rs);
$a=(odbc_result($rs,1));
$b=(odbc_result($rs,2));
$c=(odbc_result($rs,3));
//$c="鎮";
//$d=html_entity_decode($c);
//$c=htmlentities($d, ENT_NOQUOTES , "UTF-8");
$rows=array("first"=>$a,"second"=>$b,"third"=>$c);
echo json_encode($rows);
?>
ps: I am using Traditional Chinese version of MS Office.
I encountered this issue a while ago and the only way I could get it to work was to write the HTML into an ADODB.Stream object, save it to a file, and then echo the file:
<?php
define("TEMP_FOLDER", "C:\\__tmp\\");
header('Content-Type: text/html; charset=utf-8');
$stm = new COM("ADODB.Stream") or die("Cannot create COM object.");
$stm->Type = 2; // adTypeText
$stm->Charset = 'utf-8';
$stm->Open();
$stm->WriteText('<html>');
$stm->WriteText('<head>');
$stm->WriteText('<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />');
$stm->WriteText('<title>ADODB test</title>');
$stm->WriteText('</head>');
$stm->WriteText('<body>');
$con = new COM("ADODB.Connection");
$con->Open(
"Driver={Microsoft Access Driver (*.mdb, *.accdb)};" .
"Dbq=C:\\Users\\Public\\Database1.accdb");
$rst = $con->Execute("SELECT word FROM vocab WHERE ID=3");
$stm->WriteText($rst->Fields("word"));
$rst->Close();
$con->Close();
$stm->WriteText('</body>');
$stm->WriteText('</html>');
$tempFile = TEMP_FOLDER . uniqid("", TRUE) . ".txt";
$stm->SaveToFile($tempFile, 2); // adSaveCreateOverWrite
$stm->Close();
echo file_get_contents($tempFile);
unlink($tempFile);
?>

PHP Script not Working: error on line 1 column 538

I have no idea of what is going wrong, i was using this same script to get another XML and it was working just fine.
This is the script:
<?php
header("Content-type: text/xml");
echo "<?xml version=\"1.0\" encoding=\"LATIN-1\"?>";
echo "<!DOCTYPE plist PUBLIC \"-//Apple//DTD PLIST 1.0//EN\" \"http://www.apple.com/DTDs/PropertyList-1.0.dtd\">";
echo "<plist version=\"1.0\">";
function getXML($sql="SELECT IDCargo FROM database.table"){
$conn = mysql_connect ("myserverurl.com", "user", "psw");
$db = mysql_select_db("database");
$result = mysql_query($sql, $conn);
$column = "";
echo "<array>";
while($row = mysql_fetch_assoc($result)){
$column .= "<dict>";
foreach ($row as $key => $value){
$column .= "<key>$key</key>";
$column .= "<string>$value</string>";
}
$column .= "</dict>";
}
echo $column;
echo "</array>";
echo "</plist>";
}
getXML("SELECT IDCargo as ID_CARGO, SequencialDoCandidato as NUMERO_SEQ_CANDIDATO, NomeDoCandidato AS NOME_CANDIDATO, NumeroDoCandidato as NUMERO_CANDIDATO, NomeDoPartido AS SIGLA_PARTIDO, Estado as SIGLA_ESTADO,IDUnidadeEleitoral AS ID_CIDADE_CANDIDATO, UnidadeEleitoral AS CIDADE_CANDIDATO FROM database.table");
mysql_close();
?>
It brings this error:
This page contains the following errors:
error on line 1 at column 538: Encoding error
Below is a rendering of the page up to the first error.
ID_CARGO11NUMERO_SEQ_CANDIDATO10000002965NOME_CANDIDATOAGRECINO DE SOUSANUMERO_CANDIDATO13SIGLA_PARTIDOPTSIGLA_ESTADOACID_CIDADE_CANDIDATO1120CIDADE_CANDIDATO
Can anyone see anything? Give me some help here, please.
I got the script from one that was working just fine.
If I read your code, I can see :
echo "<?xml version=\"1.0\" encoding=\"LATIN-1\"?>";
So I guess you want to output LATIN-1 charset. You need to specify this charset everywhere :
1/ You can try to specify the encoding on the http header :
header("Content-type: text/xml; charset=ISO-8859-1");
2/ And you can also set the charset on the DB client :
mysql_set_charset('latin1', $db);
Your output most likely contains a character or characters in an encoding not supported in the default encoding.
Either update the encoding you use, or escape any characters that does not fit the encoding. The htmlentities php function would be a good start.

PHP - ASCII special characters (without MySQL)

I am doing this PHP page that have access to a Google account and than shows all emails. I've defined a header = UTF-8 and meta too, I used a lot of PHP function to convert the output to UTF but I keep getting strange icons instead of ASCII special characters. Such as ç, é or ã.
header("Content-Type: text/html; charset: UTF-8");
$message = imap_fetchbody($inbox,$email_number,2);
echo $message;
What should be the output:
çççç
What I get:
=E7=E7=E7=E7
Use imap_qprint (see first comment on that page for an alternative solution).
It seems to be a known issue, regarding the first comment on the imap_fetchbody PHP doc page.
Use imap_qprint or use the commenter solution :
<?php
function ReplaceImap($txt) {
$carimap = array("=C3=A9", "=C3=A8", "=C3=AA", "=C3=AB", "=C3=A7", "=C3=A0", "=20", "=C3=80", "=C3=89");
$carhtml = array("é", "è", "ê", "ë", "ç", "à", " ", "À", "É");
$txt = str_replace($carimap, $carhtml, $txt);
return $txt;
}
$mbox = imap_open("{imap.gmail.com:993/imap/ssl}INBOX", "login", "pass");
$no = 5; // Mail to show (mail number)
$text = imap_fetchbody($mbox, $no, 1);
$text = imap_utf8($text);
$text = ReplaceImap($text);
$text = nl2br($text);
echo $text;
?>

Categories