I always work with MySQL, but now I am forced to work with SQL Server and I am lost. I just want to read a row containing Spanish text and I can't make it work. Here is the code; hopefully everything makes sense.
$connection = odbc_connect("Driver={SQL Server Native Client 11.0};Server=$server;Database=$database;", $user, $password);
$sql="SELECT * FROM my_table";
$res=odbc_exec($connection,$sql)or die(exit("Error en odbc_exec"));
while($arr = odbc_fetch_array($res)) {
$var = $arr["OkRef"];
echo "1.- ".iconv("Windows-1256", "UTF-8", "$var")."<br />";
echo "2.- ".iconv("CP437", "UTF-8", $var)."<br />";
echo "3.- ".iconv("CP850", "UTF-8", $var)."<br />";
echo "4.- ".utf8_decode($arr["OkRef"])."<br />";
echo "5.- ".utf8_encode($arr["OkRef"])."<br />";
echo "6.- ".$arr["OkRef"]."<br />";
echo "7.- ".mb_convert_encoding($arr["OkRef"], "utf-8", "windows-1251")."<br />";
echo "8.- ".htmlspecialchars( iconv("iso-8859-1", "utf-8", $var) );
}
}
I get this as result:
1.- ér àçHه¬´§d_meta_packet1Y³§0ت.122) ¸ؤ
2.- Θr ατHσ¼┤ºd_meta_packet1Y│º0╩.122) ╕─
3.- Úr ÓþHÕ¼┤ºd_meta_packet1Y│º0╩.122) ©─
4.- ?r ??H????d_meta_packet1Y??0?.122) ??
5.- ér àçH嬴§d_meta_packet1Y³§0Ê.122) ¸Ä
6.- �r ��H����d_meta_packet1Y��0�.122) ��
7.- йr азH嬴§d_meta_packet1Yі§0К.122) ёД
8.- ér àçH嬴§d_meta_packet1Y³§0Ê.122) ¸Ä
I tried also to add the following (not at once obviously) to make it work as it is:
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
header('Content-Type: text/html;charset=utf-8');
header('Content-Type: text/html;charset=iso-8859-1');
ini_set('mssql.charset', 'UTF-8');
The server is a Microsoft SQL Server Enterprise Edition, and the server Collation is Modern_Spanish_CI_AS.
I know this answer comes late, but I have been in a similar situation recently, so I want to share my experience.
My configuration is almost the same: database and table columns with Cyrillic_General_CS_AS collation. Note that I use the PHP Driver for SQL Server, not the built-in ODBC support.
The steps below helped me resolve my case. I've used the collation from your example.
Database:
CREATE TABLE [dbo].[MyTable] (
[TextInSpanish] [varchar](50) COLLATE Modern_Spanish_CI_AS NULL,
[NTextInSpanish] [nvarchar](50) COLLATE Modern_Spanish_CI_AS NULL
)
INSERT [dbo].[MyTable] (TextInSpanish, NTextInSpanish)
VALUES ('Algunas palabras en español', N'Algunas palabras en español')
PHP:
Set default_charset = "UTF-8" in your php.ini file.
Encode your source files in UTF-8. I use Notepad++ for this step.
Read data from database:
With default connection encoding. For reading data from database use $data = iconv('CP1252', 'UTF-8', $data);
Note, that by default data is returned in 8-bit characters as specified in the code
page of the Windows locale that is set on the system. Any
multi-byte characters or characters that do not map into
this code page are substituted with a single-byte question
mark (?) character. This is the default encoding.
With UTF-8 connection encoding.
Column must be of type 'nchar' or 'nvarchar'.
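As a rough illustration of the first option (default connection encoding), the sketch below converts a hard-coded CP1252 byte string instead of a real database value. The helper name cp1252_to_utf8 is made up for this example, and it assumes the connection really returns Windows-1252 bytes.

```php
<?php
// Hypothetical helper: convert a string returned in the Windows-1252
// code page (the assumed default connection encoding) to UTF-8.
function cp1252_to_utf8(string $data): string
{
    $converted = iconv('CP1252', 'UTF-8', $data);
    if ($converted === false) {
        throw new RuntimeException('iconv could not convert the input');
    }
    return $converted;
}

// "español" as CP1252 bytes: ñ is the single byte 0xF1.
$raw = "espa\xF1ol";
echo cp1252_to_utf8($raw), PHP_EOL;
```

With the "CharacterSet"=>"UTF-8" connection option the driver already returns UTF-8, so this conversion should be skipped in that case; applying it anyway would encode the text twice.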
HTML:
Use: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Working Example:
test.php (PHP 7.1, PHP Driver for SQL Server 4.3, file test.php is UTF-8 encoded):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<meta charset="utf-8">
<?php
// Connection settings
$server = '127.0.0.1\instance,port';
$database = 'database';
$user = 'username';
$password = 'password';
$cinfo = array(
"CharacterSet"=>SQLSRV_ENC_CHAR,
#"CharacterSet"=>"UTF-8",
"Database"=>$database,
"UID"=>$user,
"PWD"=>$password
);
$conn = sqlsrv_connect($server, $cinfo);
if ($conn === false)
{
echo "Error (sqlsrv_connect): ".print_r(sqlsrv_errors(), true);
exit;
}
// Query
$sql = "SELECT * FROM MyTable";
$res = sqlsrv_query($conn, $sql);
if ($res === false) {
echo "Error (sqlsrv_query): ".print_r(sqlsrv_errors(), true);
exit;
}
// Results
while ($arr = sqlsrv_fetch_array($res, SQLSRV_FETCH_ASSOC)) {
# Use next 2 lines with "CharacterSet"=>SQLSRV_ENC_CHAR connection setting
echo iconv('CP1252', 'UTF-8', $arr['TextInSpanish'])."<br>";
echo iconv('CP1252', 'UTF-8', $arr['NTextInSpanish'])."<br>";
# Use next 2 lines with "CharacterSet"=>"UTF-8" connection setting
#echo $arr['TextInSpanish']."<br>";
#echo $arr['NTextInSpanish']."<br>";
}
// End
sqlsrv_free_stmt($res);
sqlsrv_close($conn);
?>
</head>
<body></body>
</html>
Oh my gosh, this did it:
"$data = iconv('CP1252', 'UTF-8', $data);"
Or in my case:
$specialnost = $_POST['specialnost'];
$specialnost = iconv('CP1251', 'UTF-8', $specialnost);
I have been searching for the last three days for a solution! Thank you Zhorov!
I am making a system that automatically generates a contract, the problem is that I am unable to print some of the characters in PDF.
Sérgio Avilla (My name, for example, goes like this) ->
It should come out like this: Sérgio Avilla.
Below is the simplified application code.
<?php
require_once __DIR__ . '/vendor/autoload.php';
include 'config.php';
header("Content-type: text/html; charset=utf-8");
function file_get_contents_utf8($fn) {
$content = file_get_contents($fn);
return mb_convert_encoding($content, 'UTF-8', mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}
$html = file_get_contents_utf8("contratos/".$contrato);
$mpdf = new \Mpdf\Mpdf();
$mpdf->WriteHTML($html);
$mpdf->Output();
?>
I would be grateful if anyone could help me. I have already tested $html: printed directly to the screen it shows all the right characters, so the problem is in mPDF.
The contract HTML file had a meta tag with charset =... in it. I just changed it to charset = utf-8 and it worked.
After:<meta http-equiv=Content-Type content="text/html; charset=utf-8">
Before: <meta http-equiv=Content-Type content="text/html; charset=windows-1252">
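If the contract files cannot be edited by hand, the same change could be applied in code after the content has been converted to UTF-8. This is only a sketch: the function name declare_utf8_charset is invented here, and the regular expression is a blunt tool that rewrites every charset=... occurrence it finds, including any in the body text.

```php
<?php
// Hypothetical helper: make the declared charset match the actual
// encoding once the HTML has been converted to UTF-8.
function declare_utf8_charset(string $html): string
{
    // Keep "charset=" plus an optional opening quote, replace only the name.
    $result = preg_replace('/(charset\s*=\s*["\']?)[\w-]+/i', '${1}utf-8', $html);
    return $result ?? $html; // fall back to the input if the regex fails
}

$before = '<meta http-equiv=Content-Type content="text/html; charset=windows-1252">';
echo declare_utf8_charset($before), PHP_EOL;
```

In the script from the question this would be called on $html right before $mpdf->WriteHTML($html).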
I'm trying to create a social bookmarking site using php and mysql.
When I save a website's URL, I want to be able to save the site's title, favicon and description in a table in my database, then print them on my page using ajax.
How can I extract those elements from a website?
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>
<body>
<?php
$myServer = "localhost";
$myUser = "root";
$myPass = "'100pushups'";
$myDB = "social_bookmarking";
//connection to the database
$connect = mysqli_connect($myServer,$myUser, $myPass)
or die("Couldn't connect to SQLServer on $myServer");
//select a database to work with
$selected = mysqli_select_db($connect, $myDB)
or die("Couldn't open database $myDB");
var_dump($_POST);
//declare the SQL statement that will query the database
$url = "INSERT INTO url (url ) VALUES ('$_POST[url]')";
if (isset($_POST['value']))
{
// Instructions if $_POST['value'] exist
echo 'Your url is ' .$url;
}
$data = get_meta_tags($url);
print_r($data);
if (!mysqli_query($connect, $url)) {
die('Error: ' . mysql_error());
}
else
{
echo "Your information was added to the database";
}
mysqli_close($connect);
?>
</body>
</html>
I know I'm doing something wrong with my url there, but I don't know how to use a variable as an argument in get_meta_tags, since the function only accepts filenames or strings.
You can get the title by using: (courtesy of https://stackoverflow.com/users/54680/jonathan-sampson)
<?php
if ( $_POST["url"] ) {
$doc = new DOMDocument();
@$doc->loadHTML( file_get_contents( $_POST["url"] ) );
$xpt = new DOMXPath( $doc );
$output = $xpt->query("//title")->item(0)->nodeValue;
} else {
$output = "URL not provided";
}
echo $output;
?>
You can get the favicon using:
<?php
$url = $_POST['url'];
$doc = new DOMDocument();
$doc->strictErrorChecking = FALSE;
$doc->loadHTML(file_get_contents($url));
$xml = simplexml_import_dom($doc);
$arr = $xml->xpath('//link[@rel="shortcut icon"]');
echo $arr[0]['href'];
?>
Finally for the description you can use:
<?php
$tags = get_meta_tags($_POST['url']);
$description = $tags['description'];
echo $description;
?>
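The three snippets above each download and parse the page separately. A possible way to combine them is sketched below; it works on an HTML string so the page only has to be fetched once. The function name extract_page_info and the returned array keys are made up for this example, and the favicon query assumes the page uses a link tag whose rel attribute contains "icon".

```php
<?php
// Hypothetical helper: extract title, favicon href and meta description
// from an HTML string. Missing pieces are returned as null.
function extract_page_info(string $html): array
{
    // Collect libxml warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Return the first node matching an XPath query, or null.
    $first = function (string $query) use ($xpath) {
        $nodes = $xpath->query($query);
        return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0) : null;
    };

    $title = $first('//title');
    $icon = $first('//link[contains(@rel, "icon")]');
    $description = $first('//meta[@name="description"]');

    return array(
        'title' => $title ? trim($title->textContent) : null,
        'favicon' => $icon ? $icon->getAttribute('href') : null,
        'description' => $description ? $description->getAttribute('content') : null,
    );
}
```

A call such as extract_page_info(file_get_contents($_POST['url'])) would then give all three values at once; the URL should be validated first, since it comes straight from user input.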
There are very smart scripts and classes out there that help with getting content from the DOM, for instance by using smart selectors. I recommend using one of those.
This is a nice example:
http://simplehtmldom.sourceforge.net/
To get the content of the page, use file_get_contents or an equivalent function.
You can use the file_get_contents() function to get the favicon for a site (unless it thwarts you over https). Example:
$icon = file_get_contents("http://stackoverflow.com/favicon.ico");
// now save it
Another option is using curl. It's an awesome PHP extension if you know how to use it.
Using these methods, you can fetch the HTML content from the sites too, and then parse it with any PHP HTML parser library. You can also use regular expressions (which experts often advise against).
When I tried to get Chinese characters from the database, I got weird text.
I tried almost everything, such as html_entity_decode, htmlentities, saving the file as UTF-8, and encoding in UTF-8, but I can't seem to get it right.
How do I get the right text?
Here's my code:
<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />
<?php
header('Content-Type: text/html; charset=utf-8');
$conn=odbc_connect('vocab','','');
$rs1=odbc_exec($conn,"SELECT MAX(ID) AS MaxId FROM vocab");
$NewMaxID=odbc_result($rs1,"MaxId");
$rand=rand(1,$NewMaxID);
$sql="SELECT word,part_of_speech,chinese FROM vocab WHERE ID=".$rand.";";
$rs=odbc_exec($conn,$sql);
$i=1;
odbc_fetch_row($rs);
$a=(odbc_result($rs,1));
$b=(odbc_result($rs,2));
$c=(odbc_result($rs,3));
//$c="鎮";
//$d=html_entity_decode($c);
//$c=htmlentities($d, ENT_NOQUOTES , "UTF-8");
$rows=array("first"=>$a,"second"=>$b,"third"=>$c);
echo json_encode($rows);
?>
ps: I am using Traditional Chinese version of MS Office.
I encountered this issue a while ago and the only way I could get it to work was to write the HTML into an ADODB.Stream object, save it to a file, and then echo the file:
<?php
define("TEMP_FOLDER", "C:\\__tmp\\");
header('Content-Type: text/html; charset=utf-8');
$stm = new COM("ADODB.Stream") or die("Cannot create COM object.");
$stm->Type = 2; // adTypeText
$stm->Charset = 'utf-8';
$stm->Open();
$stm->WriteText('<html>');
$stm->WriteText('<head>');
$stm->WriteText('<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />');
$stm->WriteText('<title>ADODB test</title>');
$stm->WriteText('</head>');
$stm->WriteText('<body>');
$con = new COM("ADODB.Connection");
$con->Open(
"Driver={Microsoft Access Driver (*.mdb, *.accdb)};" .
"Dbq=C:\\Users\\Public\\Database1.accdb");
$rst = $con->Execute("SELECT word FROM vocab WHERE ID=3");
$stm->WriteText($rst->Fields("word"));
$rst->Close();
$con->Close();
$stm->WriteText('</body>');
$stm->WriteText('</html>');
$tempFile = TEMP_FOLDER . uniqid("", TRUE) . ".txt";
$stm->SaveToFile($tempFile, 2); // adSaveCreateOverWrite
$stm->Close();
echo file_get_contents($tempFile);
unlink($tempFile);
?>
I'm trying to print JSON in Hebrew, but I only get a string of escaped Unicode sequences. How can I make sure the client's browser shows the string in Hebrew?
The code is:
<html>
<head>
<meta charset="utf-8" />
</head>
<body>
<?php
header('Content-Type: text/html; charset=utf-8');
$response = array();
require_once __DIR__.'/db_connect.php';
$db = new DB_CONNECT();
$result = mysql_query(" SELECT * FROM stores") or die(mysql_error());
if (mysql_num_rows($result)>0){
$response["stores"]=array();
while($row = mysql_fetch_array($result)){
$store = array();
$store["_id"]=$row["_id"];
$store["name"]=$row["name"];
$store["store_type"]=$row["store_type"];
array_push($response["stores"],$store);
}
$response["success"] = 1;
$string = utf8_encode(json_encode($response));
echo hebrevc($string);
}else{
$response["success"]=0;
$response["message"]="No stores found";
echo utf8_encode(json_encode($response));
}
?>
</body>
</html>
and the response is:
{{"stores":[{"_id":"1","name":"\u05d7\u05ea\u05d5\u05dc\u05d9","store_type":"\u05de\u05e1\u05e2\u05d3\u05ea \u05d1\u05e9\u05e8\u05d9\u05dd"},{"_id":"2","name":"\u05de\u05e2\u05d3\u05e0\u05d9 \u05de\u05d0\u05de\u05d9","store_type":"\u05de\u05e1\u05e2\u05d3\u05d4 \u05dc\u05e8\u05d5\u05e1\u05d9\u05dd"}],"success":1
A nice constant was added in PHP 5.4: JSON_UNESCAPED_UNICODE
Using it will not escape your Hebrew characters.
echo json_encode($response, JSON_UNESCAPED_UNICODE);
Check out the json_encode reference.
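To see the difference without a database, here is a small sketch that encodes one store name from the response above both ways. The variable names are only for illustration. Note that the escaped form is still valid JSON and decodes to the same data; the flag only changes how it looks on the wire.

```php
<?php
// One store name from the response above, written with PHP's \u{...} escapes.
$response = array("name" => "\u{05d7}\u{05ea}\u{05d5}\u{05dc}\u{05d9}");

$escaped = json_encode($response);                           // default: \uXXXX escapes
$readable = json_encode($response, JSON_UNESCAPED_UNICODE);  // raw UTF-8 characters

echo $escaped, PHP_EOL;
echo $readable, PHP_EOL;

// Both strings decode to exactly the same array.
var_dump(json_decode($escaped, true) === json_decode($readable, true));
```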
The result looks like a UCS-2 string. Try setting the charset of the MySQL connection:
mysql_set_charset('utf8', $conn)
Then remove the utf8_encode calls.
I am loading a HTML from an external server. The HTML markup has UTF-8 encoding and contains characters such as ľ,š,č,ť,ž etc. When I load the HTML with file_get_contents() like this:
$html = file_get_contents('http://example.com/foreign.html');
It messes up the UTF-8 characters and loads Å, ¾, ¤ and similar nonsense instead of proper UTF-8 characters.
How can I solve this?
UPDATE:
I tried both saving the HTML to a file and outputting it with UTF-8 encoding. Neither works, so it means file_get_contents() is already returning broken HTML.
UPDATE2:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Language" content="sk" />
<title>Test</title>
</head>
<body>
<?php
$html = file_get_contents('http://example.com');
echo htmlentities($html);
?>
</body>
</html>
I had a similar problem with the Polish language.
I tried:
$fileEndEnd = mb_convert_encoding($fileEndEnd, 'UTF-8', mb_detect_encoding($fileEndEnd, 'UTF-8', true));
I tried:
$fileEndEnd = utf8_encode ( $fileEndEnd );
I tried:
$fileEndEnd = iconv( "UTF-8", "UTF-8", $fileEndEnd );
And then -
$fileEndEnd = mb_convert_encoding($fileEndEnd, 'HTML-ENTITIES', "UTF-8");
This last one worked perfectly!
Solution suggested in the comments of the PHP manual entry for file_get_contents
function file_get_contents_utf8($fn) {
$content = file_get_contents($fn);
return mb_convert_encoding($content, 'UTF-8',
mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}
You might also try your luck with http://php.net/manual/en/function.mb-internal-encoding.php
Alright. I have found out that file_get_contents() is not causing this problem. There's a different reason, which I talk about in another question. Silly me.
See this question: Why Does DOM Change Encoding?
Example:
$string = file_get_contents(".../File.txt");
$string = mb_convert_encoding($string, 'UTF-8', "ISO-8859-1");
echo $string;
I think you simply have a double conversion of the character type there :D
It may be because you opened an HTML document within an HTML document, so you end up with something that looks like this:
<!DOCTYPE html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<!DOCTYPE html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Test</title>.......
The use of mb_detect_encoding therefore may lead you to other issues.
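The double conversion mentioned above can be reproduced with a single character. The sketch below is only an illustration with hard-coded bytes: it takes text that is already UTF-8, wrongly treats it as ISO-8859-1, and converts it to UTF-8 a second time.

```php
<?php
$utf8 = "\xC3\xA9"; // "é", already encoded as UTF-8 (two bytes)

// Mistake: treat the UTF-8 bytes as ISO-8859-1 and convert again.
$double = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo bin2hex($utf8), PHP_EOL;   // c3a9
echo bin2hex($double), PHP_EOL; // c383c2a9, displayed as "Ã©"

// Undoing exactly one conversion restores the original bytes.
$repaired = iconv('UTF-8', 'ISO-8859-1', $double);
var_dump($repaired === $utf8);  // bool(true)
```

Output such as "Ã©" or "Å" in place of accented letters is a typical sign that a conversion was applied to text that was already UTF-8.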
With Turkish text, mb_convert_encoding and the other charset conversions did not work.
urlencode did not work either, because it converts the space character to +, while percent-encoding requires %20.
This one worked!
$url = rawurlencode($url);
$url = str_replace("%3A", ":", $url);
$url = str_replace("%2F", "/", $url);
$data = file_get_contents($url);
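The same steps wrapped in a function, as a sketch that can be tried without any network access. The name encode_url_for_fetch is invented for this example, and the approach assumes the URL has no query string: characters such as ?, & and = would be percent-encoded as well.

```php
<?php
// Hypothetical helper: percent-encode a URL that contains spaces or
// non-ASCII characters, while keeping ":" and "/" readable.
function encode_url_for_fetch(string $url): string
{
    $encoded = rawurlencode($url);              // space becomes %20, not +
    return str_replace(array('%3A', '%2F'), array(':', '/'), $encoded);
}

// "dosya adı.txt": the dotless ı is written as its UTF-8 bytes C4 B1.
echo encode_url_for_fetch("http://example.com/dosya ad\xC4\xB1.txt"), PHP_EOL;
```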
I managed to solve using this function below:
function file_get_contents_utf8($url) {
$content = file_get_contents($url);
return mb_convert_encoding($content, "HTML-ENTITIES", "UTF-8");
}
file_get_contents_utf8($url);
Try this too
$url = 'http://www.domain.com/';
$html = file_get_contents($url);
//Convert from UTF-8 to ISO-8859-1 (iconv takes the source encoding first)
$html = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $html);
I am working with 35000 lines of data.
$f=fopen("veri1.txt","r");
$i=0;
while(!feof($f)){
$i++;
$line=mb_convert_encoding(fgets($f), 'HTML-ENTITIES', "UTF-8");
echo $line;
}
This code converts my strange characters into normal ones.
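One caveat for newer PHP versions: the 'HTML-ENTITIES' target for mb_convert_encoding is deprecated as of PHP 8.2. A comparable result can be had with mb_encode_numericentity, sketched below. The function name to_numeric_entities is made up here, and the output uses numeric entities only, so it may not match the old output character for character.

```php
<?php
// Hypothetical helper: replace every non-ASCII character in a UTF-8
// string with a decimal numeric HTML entity.
function to_numeric_entities(string $utf8): string
{
    // Map: code points 0x80..0x10FFFF, no offset, full mask.
    $map = array(0x80, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// "zażółć" written as UTF-8 bytes.
echo to_numeric_entities("za\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87"), PHP_EOL;
```

Because the result is pure ASCII, it displays correctly whatever charset the page declares, which is probably why this approach "fixes" pages with a mismatched charset declaration.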
I had a similar problem, what solved it was html_entity_decode.
My code is:
$content = file_get_contents("http://example.com/fr");
$x = new SimpleXMLElement($content);
foreach($x->channel->item as $entry) {
$subEntry = html_entity_decode($entry->description);
}
In here I am retrieving an xml file (in French), that's why I'm using this $x object variable. And only then I decode it into this variable $subEntry.
I tried mb_convert_encoding but this didn't work for me.
Try this function
function mb_html_entity_decode($string) {
if (extension_loaded('mbstring') === true)
{
mb_language('Neutral');
mb_internal_encoding('UTF-8');
mb_detect_order(array('UTF-8', 'ISO-8859-15', 'ISO-8859-1', 'ASCII'));
return mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
}
return html_entity_decode($string, ENT_COMPAT, 'UTF-8');
}