UTF8 not working when Posting - php

I'm having a really strange problem with UTF-8 characters.
I have the following:
All my files are UTF-8
I am using (in my form): accept-charset="utf-8"
I have: <meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
But for some reason, when I post UTF-8 characters like ö ä å
and then echo $_GET[], the output shows: ���
Feels like I've tried everything, all help is very welcome! :)

Browsers send form data in the same encoding that you declared the page to be in. As a sanity test, run this page:
<?php
header("Content-Type:text/html; charset=utf-8");
$file = basename(__FILE__);
if( isset( $_POST['data'] ) ) {
echo $_POST['data'];
}
else {
echo <<<HTML
<form method="POST" action="$file">
<input name="data" type="text">
<input type="submit">
</form>
HTML;
}
Type "äöä" into the form and check whether it comes back correctly. If it doesn't, check your mbstring ini values:
<?php
var_dump(
ini_get("mbstring.http_input"),
ini_get("mbstring.http_output"),
ini_get("mbstring.encoding_translation")
);
The correct values are:
string(4) "pass"
string(4) "pass"
string(1) "0"
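If the echoed text still looks wrong, it can help to check whether the bytes PHP received are valid UTF-8 at all; that tells you whether the browser submitted in another encoding or whether only the display is off. A minimal sketch, with is_valid_utf8() as a hypothetical helper name: it relies on preg_match() with the /u modifier returning false (rather than 0) when the subject is not well-formed UTF-8. In the test page above you would call it on $_POST['data']; here two hard-coded samples stand in for form input.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// preg_match() with /u returns false on malformed UTF-8, and 1 otherwise
// (the empty pattern matches any valid string).
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// "äöä" as it arrives when the form is submitted in each encoding.
$samples = [
    'UTF-8'      => "\xC3\xA4\xC3\xB6\xC3\xA4",
    'ISO-8859-1' => "\xE4\xF6\xE4",
];

foreach ($samples as $encoding => $bytes) {
    printf("%-10s %-12s valid UTF-8: %s\n", $encoding, bin2hex($bytes), is_valid_utf8($bytes) ? 'yes' : 'no');
}
```

If the check says "no" while your page declares UTF-8, the browser is submitting in a different encoding, so the header and meta tag are the first things to verify.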

Related

Hebrew chars from pdf file shows gibberish using PHP

I'm trying to extract the text from a PDF file that contains Hebrew and manipulate it, but when I echo it, it shows these characters instead of Hebrew:
Ço̬mÀÃ6ÜÍzWÃýCW¶°ÐÞ]Aµ±¸¤:ÄÞ[JÞaCå+wÎ[n6GZù>"âÊù+ýÕ9^6ÓF½íoßEcì¸_pùnÚbïjÅÅß^UtýÝ-®»þgåĿٻƷ8ԯβzÅr
I made sure the page is UTF-8 and converted the returned text to UTF-8, but that doesn't fix it.
When the text wasn't converted to UTF-8, it showed these symbols:
��G�W����/��<� ������%�M����>����z.�m47�M �O�4�Nf�/7ʓ쓻#2FGj��,U8�J
I feel like I'm just missing something.
This is my code:
<?php
header('Content-type: text/html; charset=UTF-8');
$text = ''; // default, so $text is defined on the first (GET) request
if (isset($_POST["formReturn"]))
{
$file = $_FILES["gradesPdf"]["tmp_name"];
$text = file_get_contents($file); // raw bytes of the uploaded PDF
$text = utf8_encode($text);       // treats those bytes as ISO-8859-1
}
$html = '
<!DOCTYPE html>
<html lang="he">
<head>
<meta charset="utf-8" />
<title>נסיון</title>
</head>
<body>
<form enctype="multipart/form-data" method="post">
<input type="file" name="gradesPdf" id="gradesPdf">
<br><br>
<button type="submit">run</button>
<input type="hidden" name="formReturn" value="1">
</form>
'. $text .'
</body>
</html>
';
echo $html;
By the way, I can't use pdfParser: I tried the demo on their site and it didn't return the text the way I wanted, I think because my PDF has a table in it.
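Two points may help here. First, file_get_contents() returns the PDF's raw bytes; the text inside a PDF is usually stored in compressed content streams with font-specific encodings, so no character-set conversion will turn it into readable Hebrew, and a PDF text-extraction tool is still needed. Second, utf8_encode() assumes its input is ISO-8859-1 and re-encodes every byte of 0x80 or above, which is why the output fills with characters like Ã and Å, and why it double-encodes text that is already UTF-8 (utf8_encode() is also deprecated as of PHP 8.2). If you do need a "make this UTF-8" step for text that really is either UTF-8 or Latin-1, a minimal sketch that leaves valid UTF-8 untouched might look like this; to_utf8() is a hypothetical helper, and its byte loop mirrors what utf8_encode() does.

```php
<?php
// Hypothetical helper: return $bytes as UTF-8, assuming the input is either
// already UTF-8 or ISO-8859-1 (Latin-1). It cannot recover binary data such
// as compressed PDF streams.
function to_utf8(string $bytes): string
{
    // Valid UTF-8 is returned untouched; utf8_encode() would double-encode it.
    if (preg_match('//u', $bytes) === 1) {
        return $bytes;
    }
    // Latin-1 -> UTF-8: each byte is one code point U+0000..U+00FF.
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= $bytes[$i];
        } else {
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$hebrew = "\xD7\xA9\xD7\x9C\xD7\x95\xD7\x9D"; // "שלום" already encoded as UTF-8
$latin1 = "caf\xE9";                          // "café" in ISO-8859-1

echo to_utf8($hebrew), "\n"; // unchanged
echo to_utf8($latin1), "\n"; // "café" as UTF-8
```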

utf-8 character input fail to PHP regex

<?php
if(isset($_GET['textvalue'])){
$string = $_GET['textvalue']; //preg_match return false
//$string = '한자漢字メ'; //preg_match return true
$stringArray = preg_match('/^[\p{L}]{2,30}$/u', $string);
}
?>
<!DOCTYPE html>
<html>
<body>
<form method="GET">
<input type="text" name="textvalue">
<input type="submit">
</form>
</body>
</html>
I'm trying to match the value from the input against a regex.
Unfortunately, every time I submit the characters, preg_match returns false. But if I use the string from the variable, it returns true.
What's going on, and how do I fix it?
If anyone runs into this problem, I've found the cause. You just need to add this meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
Without the tag above, the browser doesn't know the page is UTF-8, so it falls back to a default encoding and sends the values to PHP as non-UTF-8 bytes. So when preg_match reads the value, it is reading different bytes than what was typed in, and therefore it returns false.
That's why it works when you just use the string literal: HTML is not involved.
Note: even if you try to inspect the value by echoing it out, the browser shows it as the original text, since it decodes the echoed page the same way it encoded the submission. That makes the problem confusing to spot.
Example:
<?php
if(isset($_GET['textvalue'])){
$string = $_GET['textvalue']; //preg_match return false
//$string = '한자漢字メ'; //preg_match return true
$stringArray = preg_match('/^[\p{L}]{2,30}$/u', $string);
}
?>
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
<form method="GET">
<input type="text" name="textvalue">
<input type="submit">
</form>
</body>
</html>
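To see the failure from the PHP side: with the /u modifier, preg_match() returns false (not 0) when the subject is not valid UTF-8, and preg_last_error() then reports PREG_BAD_UTF8_ERROR. A minimal sketch using the question's pattern; check_letters() is a hypothetical helper name, and the byte strings stand in for submitted form values.

```php
<?php
// Hypothetical helper: classify a form value against the question's pattern.
function check_letters(string $value): string
{
    $result = preg_match('/^[\p{L}]{2,30}$/u', $value);
    if ($result === false) {
        // With /u, malformed UTF-8 makes preg_match() fail instead of returning 0.
        return preg_last_error() === PREG_BAD_UTF8_ERROR ? 'not UTF-8' : 'regex error';
    }
    return $result === 1 ? 'match' : 'no match';
}

echo check_letters("\xED\x95\x9C\xEC\x9E\x90"), "\n"; // "한자" as UTF-8
echo check_letters("\xE9t\xE9"), "\n";                // "été" as ISO-8859-1
echo check_letters("a1"), "\n";                       // valid UTF-8, but "1" is not a letter
```

Checking for false separately from 0 is what distinguishes "the encoding is wrong" from "the text contains non-letters".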

How to stop HTML text in textarea to be interpreted as code

I have a textarea that users can edit. After the edit I save the text in a PHP variable $bio. When I want to display it I do this:
<?php
$bio = nl2br($bio);
echo $bio;
?>
But if a user types an HTML tag such as "strong" in their text, my site actually renders the text as bold, which is not what I want.
How can I print/echo the $bio on the screen just as text and not as HTML code?
Thanks in advance!
Replace echo $bio; with echo htmlspecialchars($bio);
http://php.net/htmlspecialchars
Whenever you output text to HTML and want to make sure it cannot break the markup, you should use htmlspecialchars().
In your case you do want the line breaks shown as <br> tags, so escape the text before nl2br() adds them:
$bio = nl2br(htmlspecialchars($bio));
You can also use strip_tags() to remove the HTML tags altogether, but you would still need htmlspecialchars() so that, for example, a stray < character does not break your HTML.
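Putting that answer together as a minimal sketch: render_bio() is a hypothetical helper, and passing ENT_QUOTES and 'UTF-8' explicitly is an assumption (so single and double quotes are escaped too and the charset matches a UTF-8 page). The order matters: escaping first and then calling nl2br() keeps the generated <br /> tags, whereas nl2br() followed by htmlspecialchars() would escape them as well.

```php
<?php
// Hypothetical helper: show user text literally, keeping its line breaks.
function render_bio(string $bio): string
{
    // 1. Escape <, >, &, " and ' so user input cannot form HTML tags.
    // 2. Then insert <br /> before each newline; these are the only real tags left.
    return nl2br(htmlspecialchars($bio, ENT_QUOTES, 'UTF-8'));
}

echo render_bio("<strong>Hi</strong>\nI like \"PHP\" & 'HTML'"), "\n";
```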
You can also use htmlentities(), for example:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title></title>
</head>
<body>
<form method="POST" action="">
<p><textarea rows="8" name="bio" cols="40"></textarea></p>
<p><input type="submit" value="Submit"></p>
</form>
<p>Result:</p>
<?php echo isset($_POST['bio']) ? htmlentities($_POST['bio']) : null; ?>
</body>
</html>

Some characters of $_POST string variable are wrongly displayed

I have 2 PHP pages. After submitting the form on page1, its posted data is displayed on page2. This works fine, but characters like ' and " automatically get a \ in front of them, and the spaces are also gone.
For example, I enter ' " on page1. This is displayed as \' \" on page2. As you can see, the characters get a \ attached and the spaces are gone.
My code:
Page1.php
<html>
<head>
<title>PAGE 1</title>
</head>
<body>
<form enctype="multipart/form-data" action="page2.php" method="post">
<input type="text" name="txtNaam" id="txtNaam" />
<input type="submit" value="Submit">
</form>
</body>
</html>
Page2.php
<?php
// TEST 1
echo $_POST['txtNaam']; // <== \' \"
echo "<br/>";
// TEST 2
echo rawurlencode($_POST['txtNaam']); // <== %5C%27%20%20%20%20%5C%22
echo "<br/>";
// TEST 3
echo urlencode($_POST['txtNaam']); // <== %5C%27++++%5C%22
?>
How can I get these special characters correctly displayed when they are posted?
Try this:
echo stripslashes($_POST['txtNaam']);
Have you tried:
echo htmlspecialchars($_POST['txtNaam'], ENT_QUOTES);
or
echo htmlentities(stripslashes($_POST['txtNaam']), ENT_QUOTES);
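If magic_quotes_gpc is turned on, all $_GET, $_POST and $_COOKIE variables (GPC) in PHP will already have special characters like ", ' and \ escaped with a backslash. (Magic quotes were deprecated in PHP 5.3 and removed in PHP 5.4, so this only affects older installations.)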
To prevent this from happening, you can disable it.
Edit your php.ini like so:
magic_quotes_gpc = Off
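If you can't edit php.ini (on shared hosting, for example), a common workaround on those older PHP versions is to undo the escaping at runtime. A minimal sketch; undo_magic_quotes() and magic_quotes_active() are hypothetical helper names, and the call to get_magic_quotes_gpc() is guarded because that function was removed in PHP 8.0. Separately, the spaces in the question are not actually lost: TEST 2 shows them as %20, and the browser simply collapses runs of whitespace when rendering HTML.

```php
<?php
// Hypothetical helper: strip the backslashes magic_quotes_gpc added.
// Handles nested arrays (e.g. name="tags[]"); array keys are left as-is.
function undo_magic_quotes($value)
{
    if (is_array($value)) {
        return array_map('undo_magic_quotes', $value);
    }
    return stripslashes($value);
}

// Hypothetical helper: true only on old PHP with magic_quotes_gpc = On.
function magic_quotes_active()
{
    // get_magic_quotes_gpc() was deprecated in 7.4 and removed in 8.0.
    return function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc();
}

if (magic_quotes_active()) {
    $_POST = undo_magic_quotes($_POST);
    $_GET = undo_magic_quotes($_GET);
    $_COOKIE = undo_magic_quotes($_COOKIE);
}

// What page2 received in the question, with magic quotes on: \'    \"
$received = "\\'    \\\"";
echo htmlspecialchars(undo_magic_quotes($received), ENT_QUOTES), "\n";
```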
You can also use base64_encode() & base64_decode()

Replace non standard characters in php

I'm trying to replace some non-standard characters like ë, Ë, ç, Ç with numeric entities (for example &#203; for Ë), but I ran into a bit of a problem.
When I try to replace them directly like this, it works fine:
$string = "Ë";
$vname = str_replace("Ë","AAAA",$string);
echo $vname."<br>";
and I get AAAA as a result.
But when I try to replace the characters in a string that I get from a form via POST, the characters don't change. Here is an example:
<?php
if(isset($_POST['submit'])) {
$string = $_POST['title'];
if ($string == "Ë")
echo "Yes";
else
echo "No";
$vname = str_replace("Ë","AAAA",$string);
echo $vname."<br>";
echo $string;
}
?>
<form method="post" name="Form">
Title: <input name="title" type="text" value="" size="20"/>
<input name="submit" type="submit" value="submit"/>
</form>
Any help would be great!!
Most likely your character set is wrong. I would suggest sending the following header when outputting HTML:
<?php header("content-type: text/html; charset=utf-8"); ?>
where the charset matches the encoding your file is saved in.
Edit: some more detail. Your file is saved in one charset (for example Latin-1), while the browser interprets your HTML page as another charset (for example UTF-8). When the browser then sends the Ë character, it sends the UTF-8 bytes 0xC3 0x8B, whereas the same character in your Latin-1 file is the single byte 0xCB. As you can see, these do not match.
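To see the mismatch described above, compare the bytes directly. A minimal sketch with hard-coded byte strings standing in for the submitted form value and for an "Ë" literal in a Latin-1 file; str_replace() compares raw bytes, so it only matches when the needle and the haystack use the same encoding.

```php
<?php
// The same character, Ë (U+00CB), in two encodings.
$fromBrowser  = "\xC3\x8B"; // submitted by a browser treating the page as UTF-8
$inLatin1File = "\xCB";     // an "Ë" literal in a .php file saved as Latin-1

echo bin2hex($fromBrowser), ' vs ', bin2hex($inLatin1File), "\n"; // c38b vs cb

$input = 'x' . $fromBrowser . 'x';
// Latin-1 needle, UTF-8 haystack: no byte match, so nothing is replaced.
$miss = str_replace($inLatin1File, 'AAAA', $input);
// Needle in the same encoding as the input: the replacement works.
$hit = str_replace($fromBrowser, 'AAAA', $input);

var_dump($miss === $input, $hit);
```

Saving the .php file as UTF-8 and sending the UTF-8 header makes the literal in str_replace() and the submitted value share the same bytes.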
Edit: you can also declare the charset in the markup itself, in HTML5 or XHTML:
HTML5
<meta charset="UTF-8"/>
XHTML
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
