Here is my working code:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
</head>
<body>
<?php
$arabic = "صحيفة اسبوعية مستقلة شاملة تتابع الاخبار فى المنطقة العربية";
$french = "que voulez vous dire?";
if (isset($_POST['search'])) {
$search = $_POST['search'];
$key = $_POST['key'];
$td = substr_count($arabic, $key);
echo $td;
}
echo "<br />" . $arabic;
function count_occurences($char_string, $haystack, $case_sensitive = true) {
if ($case_sensitive === false) {
$char_string = strtolower($char_string);
$haystack = strtolower($haystack);
}
$characters = preg_split('//u', $char_string, -1, PREG_SPLIT_NO_EMPTY);
//$characters = str_split($char_string);
$character_count = 0;
foreach ($characters as $character) {
$character_count = $character_count + substr_count($haystack, $character);
}
return $character_count;
}
?>
<form name="input" action="" method="post">
<input type= "text" name="key" value=""/>
<input type ="submit" name="search" value =" find it !"/>
</form>
</body>
</html>
It works well for $french, but not for $arabic.
There is no error, but if I search for a letter such as ح, the result is always 0, whatever letter I enter.
Is something wrong, or am I missing something about Arabic? I don't understand why $french works: if I enter v, it shows 2 as the result.
You need to use the Multibyte String (mbstring) functions.
You can also set mbstring.func_overload = 7 in your php.ini, and PHP will automatically use the multibyte counterparts of the standard string functions. Note that this setting was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions you have to call the mb_* functions directly.
See the mbstring overloading documentation if a different overload value would suit your needs better.
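As a minimal sketch of the multibyte approach, assuming the mbstring extension is loaded and that both the text and the submitted search key are UTF-8, the count can be done with mb_substr_count. The helper name count_needle is only illustrative and is not part of the original code:

```php
<?php
// Hypothetical helper: counts non-overlapping occurrences of $needle in
// $haystack, treating both strings as UTF-8 text.
function count_needle(string $haystack, string $needle): int
{
    // An empty needle is not a valid argument for mb_substr_count(),
    // so treat it as "nothing found".
    if ($needle === '') {
        return 0;
    }
    return mb_substr_count($haystack, $needle, 'UTF-8');
}

$arabic = "صحيفة اسبوعية مستقلة شاملة تتابع الاخبار فى المنطقة العربية";
echo count_needle($arabic, "ح"), "\n";
```

In the form handler this would replace the substr_count($arabic, $key) call. If the result is still 0, the submitted key is probably not arriving as UTF-8, which points to the page or form encoding rather than the counting function.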
Also, replace
$characters = str_split($char_string);
with
$characters = preg_split('//u', $char_string, -1, PREG_SPLIT_NO_EMPTY);
because str_split is not multibyte-safe (a multibyte counterpart, mb_str_split, only exists from PHP 7.4 onwards).
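To illustrate the difference, here is a small sketch that only needs the PCRE functions bundled with PHP. The helper name utf8_chars is made up for this example, and the source file is assumed to be saved as UTF-8:

```php
<?php
// Splits a UTF-8 string into an array of characters (code points).
// The /u modifier makes PCRE treat the subject as UTF-8.
function utf8_chars(string $text): array
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false when $text is not valid UTF-8.
    return $chars === false ? [] : $chars;
}

// str_split() cuts the two-byte letter into two byte fragments,
// while utf8_chars() keeps it as one element.
echo count(str_split("ح")), " ", count(utf8_chars("ح")), "\n"; // prints: 2 1
```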
Additionally, if no encoding is sent in the headers after you submit the form, or there is some issue with them, you can set this in your php.ini:
default_charset = "UTF-8"
I tested your code with UTF-8 encoding, and it works.
I added a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
I have PHP code that reads a text file and lets the user search for a word, and it works perfectly.
The file's content is in Arabic.
The user enters a search term, and the system displays the matching string together with the line number where it occurs.
What I want now is for the system to read multiple text files and, when the user searches for a word, display the names of the files in which the word was found.
Is this possible, and how long will this process take if I have 100 files?
Code:
<?php
$myFile = "arabic text.txt";
$myFileLink = fopen($myFile, 'r');
$line = 1;
if(isset($_POST["search"]))
{
$search =$_POST['name'];
while(!feof($myFileLink))
{
$myFileContents = fgets($myFileLink);
if( preg_match_all('/('.preg_quote($search,'/').')/i', $myFileContents, $matches))
{
foreach($matches[1] as $match)
{
echo "Found $match on Line $line";
}
}
++$line;
}
}
fclose($myFileLink);
//echo $myFileContents;
?>
<html>
<head>
<meta http-equiv="Content-Language" content="ar-sa">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form action="index.php" method="post">
<p>enter your string <input type ="text" id = "idName" name="name" /></p>
<p><input type ="Submit" name ="search" value= "Search" /></p>
</form>
</body>
</html>
Put your currently working code into a function that returns the content you want. Do not name it readFile: PHP function names are case-insensitive, so that would clash with the built-in readfile().
function searchFile($fileName)
{
// ...your code
return $yourMessage;
}
$fileNames = array("file1.txt", "file2.txt");
$result = array();
foreach($fileNames as $name)
{
$message = searchFile($name);
$result[$name] = $message;
}
You can now iterate over $result and print it however you like.
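To get all the files of a directory into one array you can use scandir('/mydir/').
$files = scandir('/mydir/here/');
foreach($files as $file) {
// strpos() would return 0 (falsy) for a name starting with '.txt' and would also match '.txt' in the middle of a name, so compare the extension instead
if (pathinfo($file, PATHINFO_EXTENSION) === 'txt') {
//dosomething
}
}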
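Putting the two answers together, a rough sketch could look like the following. It assumes all files live in one directory, end in .txt, and are UTF-8 encoded like the search term; the function name files_containing is hypothetical. It uses glob() instead of scandir() only because glob() can filter by extension directly:

```php
<?php
// Returns the names of the .txt files in $dir that contain $search.
// Assumes the files and the search term are both UTF-8.
function files_containing(string $dir, string $search): array
{
    $found = [];
    if ($search === '') {
        return $found;
    }
    // glob() returns false on error, so fall back to an empty list.
    $paths = glob(rtrim($dir, '/') . '/*.txt');
    foreach ($paths ?: [] as $path) {
        $contents = file_get_contents($path);
        // strpos() compares bytes, which is fine for an exact UTF-8 match.
        if ($contents !== false && strpos($contents, $search) !== false) {
            $found[] = basename($path);
        }
    }
    return $found;
}
```

Unlike the original preg_match_all call, this match is case-sensitive, which makes no difference for Arabic text. As for speed, that depends on file sizes and disk, so it is best measured on the real data rather than estimated.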
I am trying to call file_get_contents inside my PHP string, but it is not working; I hope you can tell me why.
I need to include the contents of 3 files, shuffled via an array, in my $linkcontent, which will later be used to overwrite another .php document.
But the file_get_contents part does not run correctly.
if ($content_type == '1') {
$linkcontent = "
$homepageheader
<meta property=\"og:url\" content=\"http://$directory\"/>
<meta property=\"og:image\" content=\"$billedeurl\" />
<br>
$homepage2<br>
$homepage3<br>
$homepage4<br>
$homepagefooter
";
}else{
$linkcontent = "
$homepageheader
<meta property=\"og:url\" content=\"http://$directory\"/>
<meta property=\"og:image\" content=\"$billedeurl\" />
<br>
$first = 'xxxx/2.php';
$second = 'xxxx/3.php';
$third = 'xxxx/4.php';
$array = array($first, $second, $third);
shuffle($array);
foreach($array as $el) {
file_get_contents($el);
}
;";
}
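Try using require instead of file_get_contents, and move the PHP statements out of the double-quoted string: code placed inside a string is never executed, it is only stored as text.
With require, the files you include are interpreted as PHP files.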
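One possible way to combine require with string building is output buffering. The sketch below is only an illustration under that assumption; the function name render_shuffled is hypothetical and the paths are whatever the caller passes in:

```php
<?php
// Runs each file as PHP, in random order, and returns the combined output
// with a <br> after each file.
function render_shuffled(array $paths): string
{
    shuffle($paths);
    $output = '';
    foreach ($paths as $path) {
        ob_start();      // capture whatever the included file prints
        require $path;
        $output .= ob_get_clean() . "<br>\n";
    }
    return $output;
}
```

The else branch could then build the string by concatenation, for example $linkcontent = $homepageheader . render_shuffled(array($first, $second, $third)) . $homepagefooter; with $first, $second and $third assigned before that line rather than inside the quotes.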
I'm Vietnamese and I want to upload a file with a UTF-8 filename like
Tên Tệp Tiếng Việt.JPG
Here is my code
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>utf-8</title>
</head>
<body>
<?php
if(isset($_POST["submit"])) {
if($_FILES["upload"]["error"] > 0 ) echo "FILE ERROR!";
else
{
$base_dir = "D:/";
$fn = $_FILES["upload"]["name"];
$fn2 = $base_dir.$fn;
move_uploaded_file($_FILES["upload"]["tmp_name"],$fn2);
}
}
?>
<form action="" method="post" enctype="multipart/form-data" name="form1" id="form1">
<input type="file" name="upload" id="upload" />
<input type="submit" name="submit" id="submit" value="Send" />
</form>
</body>
</html>
But when I upload it, the file that appears on my computer in D:\ has a name like
Tên Tệp Tiếng Việt.JPG
How can I fix that? Thanks.
I'm on the Chinese version of Windows 8, and I dealt with a similar problem like this:
$filename = iconv("utf-8", "cp936", $filename);
cp stands for code page, and cp936 is code page 936, the default code page of the Simplified Chinese version of Windows.
So I think your problem could be solved in a similar way:
$fn2 = iconv("UTF-8","cp1258", $base_dir.$fn);
I'm not sure whether the default code page of your OS is 1258, so check it yourself by opening a command prompt and running chcp. Then change 1258 to whatever the command gives you.
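Because iconv() returns false when it cannot convert a string, it may be worth checking the result before moving the file. A small sketch, assuming the iconv extension is available; the helper name to_codepage is invented for this example:

```php
<?php
// Converts a UTF-8 file name to the given code page.
// Returns null when iconv() reports a failure, for example because a
// character has no equivalent in the target code page.
function to_codepage(string $utf8Name, string $codepage): ?string
{
    // @ suppresses the notice or warning iconv() raises on failure.
    $converted = @iconv('UTF-8', $codepage, $utf8Name);
    return $converted === false ? null : $converted;
}
```

In the upload script this could be used as $fn2 = to_codepage($base_dir . $fn, 'CP1258'); followed by a fallback, such as a generated ASCII name, when the result is null.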
UPDATE
It seems that PHP filesystem functions can only handle characters that are in the system code page, according to this answer. So you have 2 choices here:
Limit the characters in the filename to the system code page, which in your case is 437. But I'm pretty sure that code page 437 does not include all the Vietnamese characters.
Change your system code page to the Vietnamese one, 1258, and convert the filename to cp1258. Then the filesystem functions should work.
Both choices have drawbacks:
Choice 1: You can't use Vietnamese characters anymore, which is not what you want.
Choice 2: You have to change the system code page, and filename characters are limited to code page 1258.
UPDATE
How to change the system code page:
Go to Control Panel > Region > Administrative > Change system locale and select Vietnamese (Vietnam) in the drop-down menu.
This meta tag has no effect:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
because the web server has already sent a Content-Type header, and thus decided what the encoding will be. Web browsers send forms in the same encoding as the page. The meta tag is only useful when the page is viewed offline.
So you have to send the Content-Type HTTP header yourself:
<?php header("Content-Type: text/html; charset=utf-8"); ?>
Make sure you put this before any HTML or other output is sent.
Alternatively, the accept-charset attribute on the form should work as well:
<form accept-charset="utf-8">
I am Persian and I had the same problem with UTF-8 characters in my language.
I solved it with this code:
$fn = $_FILES["upload"]["name"]; // the file name contains some UTF-8 characters
$name=iconv('utf-8','windows-1256', str_replace('ی', 'ي', $fn));
move_uploaded_file($_FILES["upload"]["tmp_name"],$name );
I am not sure about Vietnamese, but maybe you can use the same code with a few changes (cp1258 is the Vietnamese Windows code page; use whichever code page your system reports):
$fn = $_FILES["upload"]["name"]; // the file name contains some UTF-8 characters
$name=iconv('utf-8','cp1258', $fn);
move_uploaded_file($_FILES["upload"]["tmp_name"],$name );
The only solution I have found so far (as of 2014):
1) I don't store files on my FTP server under UTF-8 names. Instead, I rename the uploaded files with a function like this:
<?php
// use your custom function, for example utf8_encode
// (utf8_encode converts ISO-8859-1 to UTF-8 and is deprecated as of PHP 8.2)
$newname = utf8_encode($_FILES["uploadFile"]["name"]);
move_uploaded_file($_FILES["uploadFile"]["tmp_name"], $newname);
?>
2) Now you can rename (or otherwise process) $newname.
Start by detecting the filename's encoding (before moving the upload):
print_r($_FILES["upload"]);
Paste the filename into a decoder and check its encoding.
Sorry! Your question is about the file name.
You must save your file name with iconv, but read the file without it.
For saving:
<?php
$name = $_FILES['photo']['name'];
$unicode = iconv('windows-1256', 'utf-8', $name);
move_uploaded_file($_FILES['photo']['tmp_name'], 'photos/' . $name);
mysql_query("INSERT INTO `photos` (`filename`) VALUES ('{$unicode}')");
?>
for reading:
<?php
$images = mysql_query('SELECT * FROM `photos`');
if($images && mysql_num_rows($images) > 0) {
while($image = mysql_fetch_assoc($images)) {
$name = iconv('utf-8', 'windows-1256', $image['filename']);
echo '<img src="photos/' . $name . '"/>';
}
mysql_free_result($images);
}?>
function convToUtf8($str)
{
if( mb_detect_encoding($str,"UTF-8, ISO-8859-1, GBK")!="UTF-8" )
{
return iconv("gbk","utf-8",$str);
}
else
{
return $str;
}
}
$filename= convToUtf8($filename) ;
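Try this:
$imgname = $_FILES['img']['name'];
$imgsize = $_FILES['img']['size'];
$imgtmpname = $_FILES['img']['tmp_name'];
$imgtype = $_FILES['img']['type'];
$size = 6 * 1024 * 1024; // maximum size in bytes (6 MB, matching the message below)
$imgtypes = array('image/jpeg','image/gif','image/png');
$folder = "up";
// getimagesize() returns false for non-images; index 0 is the width in pixels
$imginfo = empty($imgtmpname) ? false : getimagesize($imgtmpname);
$imgwidth = $imginfo ? $imginfo[0] : 0;
if(empty($imgname)){
echo "Choose your photo";
}else if(!in_array($imgtype,$imgtypes)){
echo "This photo type is not allowed";
}else if($imgsize > $size){
echo "This photo is bigger than 6 MB";
}else if($imgwidth > 5000){
echo "The image is too wide";
}
else{
// move_uploaded_file() takes two arguments: the temporary file and the destination path
move_uploaded_file($imgtmpname, $folder . "/" . basename($imgname));
}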
You can use this
$fn2 = basename($_FILES['upload']['name']);
I am loading HTML from an external server. The HTML markup is UTF-8 encoded and contains characters such as ľ, š, č, ť, ž etc. When I load the HTML with file_get_contents() like this:
$html = file_get_contents('http://example.com/foreign.html');
it messes up the UTF-8 characters and loads Å, ¾, ¤ and similar nonsense instead of the proper characters.
How can I solve this?
UPDATE:
I tried both saving the HTML to a file and outputting it with UTF-8 encoding. Neither works, which means file_get_contents() is already returning broken HTML.
UPDATE2:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Language" content="sk" />
<title>Test</title>
</head>
<body>
<?php
$html = file_get_contents('http://example.com');
echo htmlentities($html);
?>
</body>
</html>
I had a similar problem with Polish.
I tried:
$fileEndEnd = mb_convert_encoding($fileEndEnd, 'UTF-8', mb_detect_encoding($fileEndEnd, 'UTF-8', true));
I tried:
$fileEndEnd = utf8_encode ( $fileEndEnd );
I tried:
$fileEndEnd = iconv( "UTF-8", "UTF-8", $fileEndEnd );
And then:
$fileEndEnd = mb_convert_encoding($fileEndEnd, 'HTML-ENTITIES', "UTF-8");
This last one worked perfectly!
A solution suggested in the comments of the PHP manual entry for file_get_contents:
function file_get_contents_utf8($fn) {
$content = file_get_contents($fn);
return mb_convert_encoding($content, 'UTF-8',
mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}
You might also try your luck with http://php.net/manual/en/function.mb-internal-encoding.php
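A variation on the helper above that avoids converting text which is already UTF-8 is sketched below. It assumes the mbstring extension is loaded; the name ensure_utf8 and the ISO-8859-1 fallback are assumptions made for illustration, not something the remote server is known to use:

```php
<?php
// Returns $content as UTF-8. Input that is already valid UTF-8 is returned
// untouched; anything else is ASSUMED to be in $fallback (a guess, not a
// reliable detection).
function ensure_utf8(string $content, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content;
    }
    return mb_convert_encoding($content, 'UTF-8', $fallback);
}
```

It would be called as $html = ensure_utf8(file_get_contents($url)); and leaves correctly encoded pages alone, which guards against converting the same text twice.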
Alright. I have found out that file_get_contents() is not causing this problem. There's a different reason, which I describe in another question. Silly me.
See this question: Why Does DOM Change Encoding?
Example:
$string = file_get_contents(".../File.txt");
$string = mb_convert_encoding($string, 'UTF-8', "ISO-8859-1");
echo $string;
I think you simply have a double conversion of the character encoding there.
This may be because you opened an HTML document within an HTML document, so in the end you have something that looks like this:
<!DOCTYPE html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
</head>
<body>
<!DOCTYPE html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Test</title>.......
Using mb_detect_encoding may therefore lead you to other issues.
In Turkish, mb_convert_encoding and other charset conversions did not work.
urlencode did not work either, because it converts the space character to +. For percent-encoding it must be %20.
This worked:
$url = rawurlencode($url);
$url = str_replace("%3A", ":", $url);
$url = str_replace("%2F", "/", $url);
$data = file_get_contents($url);
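An alternative to encoding the whole URL and then restoring ":" and "/" is to encode only the path, one segment at a time. This is a sketch under the assumption that the scheme and host are plain ASCII and that there is no query string; encode_path is a made-up helper name:

```php
<?php
// Percent-encodes each segment of a URL path, leaving the "/" separators alone.
function encode_path(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

$url = 'http://example.com' . encode_path('/dosya adı/ş.txt');
// $url is now http://example.com/dosya%20ad%C4%B1/%C5%9F.txt
```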
I managed to solve it using the function below:
function file_get_contents_utf8($url) {
$content = file_get_contents($url);
return mb_convert_encoding($content, "HTML-ENTITIES", "UTF-8");
}
file_get_contents_utf8($url);
Try this too:
$url = 'http://www.domain.com/';
$html = file_get_contents($url);
//Convert the encoding from UTF-8 to ISO-8859-1
$html = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $html);
I am working with 35000 lines of data.
$f=fopen("veri1.txt","r");
$i=0;
while(!feof($f)){
$i++;
$line=mb_convert_encoding(fgets($f), 'HTML-ENTITIES', "UTF-8");
echo $line;
}
This code converts my strange characters into normal ones.
I had a similar problem; what solved it was html_entity_decode.
My code is:
$content = file_get_contents("http://example.com/fr");
$x = new SimpleXMLElement($content);
foreach($x->channel->item as $entry) {
$subEntry = html_entity_decode($entry->description);
}
Here I am retrieving an XML file (in French), which is why I'm using the $x object. Only then do I decode each description into $subEntry.
I tried mb_convert_encoding, but it didn't work for me.
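For reference, a minimal sketch of what html_entity_decode does when told to produce UTF-8. The wrapper name decode_entities and the flag choice are assumptions for this example:

```php
<?php
// Decodes named and numeric HTML entities into UTF-8 characters.
function decode_entities(string $text): string
{
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_entities('caf&eacute; &amp; th&#233;'), "\n"; // prints: café & thé
```

Passing the charset explicitly avoids depending on the default_charset setting of the server.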
Try this function
function mb_html_entity_decode($string) {
if (extension_loaded('mbstring') === true)
{
mb_language('Neutral');
mb_internal_encoding('UTF-8');
mb_detect_order(array('UTF-8', 'ISO-8859-15', 'ISO-8859-1', 'ASCII'));
return mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES');
}
return html_entity_decode($string, ENT_COMPAT, 'UTF-8');
}