Replace non-standard characters in PHP

I'm trying to replace some non-standard characters like ë, Ë, ç, Ç with numeric entities like &#203;, &#39; etc., but I ran into a bit of a problem.
When I try to replace them directly like this, it works fine:
$string = "Ë";
$vname = str_replace("Ë","AAAA",$string);
echo $vname."<br>";
and I get AAAA as a result.
But when I try to replace the characters in a string that I get from a form via POST, the characters don't change. Here is an example:
<?php
if (isset($_POST['submit'])) {
    $string = $_POST['title'];
    if ($string == "Ë") {
        echo "Yes";
    } else {
        echo "No";
    }
    $vname = str_replace("Ë", "AAAA", $string);
    echo $vname . "<br>";
    echo $string;
}
?>
<form method="post" name="Form">
Title: <input name="title" type="text" value="" size="20"/>
<input name="submit" type="submit" value="submit"/>
</form>
Any help would be great!!

Most likely your character set is wrong. I would suggest sending the following header when outputting HTML:
<?php header("content-type: text/html; charset=utf-8"); ?>
The charset should match the encoding your file is saved in.
Edit: some more information. If your file is saved in one charset, for example latin1, while your browser interprets your HTML page as another (UTF-8, for example), the two disagree about the bytes. When the browser sends the Ë character, it sends the UTF-8 bytes 0xC3 0x8B, while the same character in the latin1 file is the single byte 0xCB. As you can see, these do not match.
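A small sketch of the byte mismatch described above, assuming PHP 7+ (for the \u{...} escape) with the mbstring extension enabled. It builds Ë once as UTF-8 and once as latin1 (ISO-8859-1) and shows why str_replace(), which compares raw bytes, finds nothing to replace:

```php
<?php
// Ë (U+00CB) as a UTF-8 page makes the browser send it.
$utf8 = "\u{00CB}";
// The same character as stored in a latin1-encoded source file.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), "\n";   // c38b
echo bin2hex($latin1), "\n"; // cb

// str_replace() compares bytes: the latin1 needle never occurs in the UTF-8 input.
$unchanged = str_replace($latin1, 'AAAA', $utf8);
// With needle and input in the same encoding, the replacement works.
$replaced = str_replace($utf8, 'AAAA', $utf8);

var_dump($unchanged === $utf8); // bool(true)
var_dump($replaced);            // string(4) "AAAA"
```

Saving the PHP file in the same encoding the page declares puts the needle and the submitted data in the same byte form.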
Edit: you can also declare the charset in the markup itself, for HTML5 or XHTML:
HTML5
<meta charset="UTF-8"/>
XHTML
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
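Once the file, the page and the submitted data all use UTF-8, the original goal (turning such characters into numeric entities) does not need one str_replace() per character. A minimal sketch using mbstring's mb_encode_numericentity(); the wrapper name toNumericEntities() is hypothetical, and the characters are written as \u{...} escapes so the example does not depend on how the file itself is saved:

```php
<?php
// Hypothetical wrapper around the real mbstring function mb_encode_numericentity().
function toNumericEntities(string $text): string
{
    // Conversion map: [start, end, offset, mask].
    // Every code point from U+0080 to U+10FFFF becomes &#NNN;, plain ASCII is left alone.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

// ë Ë ç Ç
echo toNumericEntities("\u{00EB} \u{00CB} \u{00E7} \u{00C7}"), "\n";
// &#235; &#203; &#231; &#199;
```

The input has to be valid UTF-8 already; this only works after the charset problem above is fixed.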

Related

Hebrew chars from pdf file shows gibberish using PHP

I'm trying to get text from a PDF file with Hebrew in it and manipulate it, but when I echo it, it shows these characters instead of Hebrew:
Ço̬mÀÃ6ÜÍzWÃýCW¶°ÐÞ]Aµ±¸¤:ÄÞ[JÞaCå+wÎ[n6GZù>"âÊù+ýÕ9^6ÓF½íoßEcì¸_pùnÚbïjÅÅß^UtýÝ-®»þgåĿٻƷ8ԯβzÅr
I made sure the page is in UTF-8 and converted the returned text to UTF-8, but that doesn't fix it.
When the text wasn't converted to UTF-8, it showed these symbols:
��G�W����/��<� ������%�M����>����z.�m47�M �O�4�Nf�/7ʓ쓻#2FGj��,U8�J
I feel like I'm just missing something.
This is my code:
<?php
header('Content-type: text/html; charset=UTF-8');
// Defaults so the page also renders before the form has been submitted.
$text = '';
$formReturn = $_POST["formReturn"] ?? null;
if ($formReturn) {
    $file = $_FILES["gradesPdf"]["tmp_name"];
    $text = file_get_contents($file);
    $text = utf8_encode($text);
}
$html = '
<!DOCTYPE html>
<html lang="he">
<head>
<meta charset="utf-8" />
<title>נסיון</title>
</head>
<body>
<form enctype="multipart/form-data" method="post">
<input type="file" name="gradesPdf" id="gradesPdf">
<br><br>
<button type="submit">run</button>
<input type="hidden" name="formReturn" value="1">
</form>
'. $text .'
</body>
</html>
';
echo $html;
By the way, I can't use pdfParser: I tried the demo on their site and it didn't return the text the way I wanted, I think because my PDF has a table in it.
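One likely part of the problem, which the question does not confirm: file_get_contents() returns the raw bytes of the PDF file, not its text. PDF page text often sits inside compressed content streams (commonly marked /FlateDecode), so those bytes are binary and utf8_encode() only re-labels them. The hypothetical helper below is a quick check under that assumption; it needs the mbstring extension:

```php
<?php
// Hypothetical diagnostic: what did file_get_contents() actually return?
function describePdfBytes(string $bytes): array
{
    return [
        // PDF files begin with the "%PDF-" signature.
        'is_pdf'     => strncmp($bytes, '%PDF-', 5) === 0,
        // Raw PDF bytes are normally not valid UTF-8 text.
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
        // Flate-compressed streams have to be decoded by a PDF parser, not re-encoded.
        'has_flate'  => strpos($bytes, '/FlateDecode') !== false,
    ];
}

// Usage inside the form handler from the question:
// var_dump(describePdfBytes(file_get_contents($_FILES["gradesPdf"]["tmp_name"])));
```

If it reports a PDF that is not valid UTF-8, the text has to be extracted by something that understands the PDF format before any charset conversion makes sense.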

How to stop HTML text in textarea to be interpreted as code

I have a textarea that users can edit. After the edit I save the text in a PHP variable $bio. When I want to display it I do this:
<?php
$bio = nl2br($bio);
echo $bio;
?>
But if a user types an HTML tag such as <strong> in their text, my site actually outputs the text as bold, which is not what I want.
How can I print/echo the $bio on the screen just as text and not as HTML code?
Thanks in advance!
Replace echo $bio; with echo htmlspecialchars($bio);
http://php.net/htmlspecialchars
Whenever you output text to the browser and want to make sure the output does not break the HTML, you should use htmlspecialchars().
In your case you do want to show the <br> tags, so escape the text before you add them:
$bio = nl2br(htmlspecialchars($bio));
You can also use strip_tags() to get rid of the HTML tags altogether, but you would still need htmlspecialchars() so that, for example, a < character does not break your HTML.
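To illustrate why the order matters, a small sketch with a made-up $bio value. ENT_QUOTES and the explicit 'UTF-8' charset are choices of this example, not requirements from the answer:

```php
<?php
$bio = "Hello <strong>world</strong>\nSecond line";

// Escape first, then add <br />: the user's tag is shown as text, the line break stays.
$good = nl2br(htmlspecialchars($bio, ENT_QUOTES, 'UTF-8'));
// Add <br /> first, then escape: the <br /> is escaped too and printed literally.
$bad = htmlspecialchars(nl2br($bio), ENT_QUOTES, 'UTF-8');

echo $good, "\n";
// Hello &lt;strong&gt;world&lt;/strong&gt;<br />
// Second line
echo $bad, "\n";
// Hello &lt;strong&gt;world&lt;/strong&gt;&lt;br /&gt;
// Second line
```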
You can also use htmlentities(), for example:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title></title>
</head>
<body>
<form method="POST" action="">
<p><textarea rows="8" name="bio" cols="40"></textarea></p>
<p><input type="submit" value="Submit"></p>
</form>
<p>Result:</p>
<?php echo isset($_POST['bio']) ? htmlentities($_POST['bio']) : null; ?>
</body>
</html>
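The difference between the two functions shows up with non-ASCII characters. A short comparison, assuming UTF-8 input (the default charset since PHP 5.4, passed explicitly here):

```php
<?php
$input = "<b>caf\u{00E9}</b>"; // <b>café</b>

// htmlspecialchars() only escapes the HTML-special characters (& " ' < >).
$special = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
// htmlentities() additionally converts every character that has a named entity.
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // &lt;b&gt;café&lt;/b&gt;
echo $entities, "\n"; // &lt;b&gt;caf&eacute;&lt;/b&gt;
```

On a page already served as UTF-8, htmlspecialchars() is usually enough to keep user text from being interpreted as markup.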

PHP - GET method adds unnecessary plus symbols to URL

I looked around Stack Overflow for the recommended form method in PHP, either GET or POST. The community recommends GET for passing queries to a simple search engine.
Unfortunately, GET adds plus (+) symbols to the URL to represent spaces. Basically, it's an aesthetic issue.
I tried the trim() function to reduce the spaces; however, it only runs after the data is submitted, when the URL already contains the parameters.
Here is the index.php file that I think is not cooperating with me.
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Document</title>
</head>
<body>
<form action="index.php" method="GET">
<input type="text" name="query" placeholder="Enter Query">
<input type="submit" value="Search">
</form>
<?php
// Empty on the first visit, before the form has been submitted.
$query = $_GET['query'] ?? '';
print htmlspecialchars($query);
?>
</body>
</html>
An example, if needed. If I type this query into the search bar, with a run of spaces between the two words...
sample 1
The URL will be formed this way...
http://localhost/search/index.php?query=sample++++++++++++++1
Is there a way to fix this, or is the POST method the only way to get around it?
You will need to use the POST method.
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Document</title>
</head>
<body>
<form action="index.php" method="POST">
<input type="text" name="query" placeholder="Enter Query">
<input type="submit" value="Search">
</form>
<?php
// The form now posts, so read from $_POST, not $_GET.
$query = $_POST['query'] ?? '';
print htmlspecialchars($query);
?>
</body>
</html>
Use urlencode or str_replace.
urlencode will replace all spaces with plus symbols, while with str_replace you can replace spaces with underscores or with minus symbols instead.
Replace spaces with underscores: str_replace(' ', '_', $url);
URL-encode your $_GET value: urlencode($url);
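For reference, a sketch of what these functions produce for a query containing a run of spaces. Keep in mind that the browser builds the GET URL itself when the form is submitted, so these calls only affect URLs that PHP generates (for example links or a redirect). The whitespace-collapsing step at the end is an addition of this example, not part of the answer:

```php
<?php
$query = 'sample' . str_repeat(' ', 3) . '1'; // "sample   1"

echo urlencode($query), "\n";             // sample+++1
echo rawurlencode($query), "\n";          // sample%20%20%201
echo str_replace(' ', '_', $query), "\n"; // sample___1

// Collapse runs of whitespace first, so each gap becomes a single "+".
$clean = preg_replace('/\s+/', ' ', trim($query));
echo urlencode($clean), "\n";             // sample+1
```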

UTF-8 not working when posting

I have a really strange problem with UTF-8 characters.
I have the following:
All my files are UTF-8
I am using (in my form): accept-charset="utf-8"
I have: <meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
But for some reason, when I post UTF-8 characters like ö ä å
and then echo the $_GET[] value, the output shows: ���
Feels like I've tried everything, all help is very welcome! :)
Browsers send form data in the same encoding that you declared the page to be in. As a sanity test, run this page:
<?php
header("Content-Type:text/html; charset=utf-8");
$file = basename(__FILE__);
if( isset( $_POST['data'] ) ) {
echo $_POST['data'];
}
else {
echo <<<HTML
<form method="POST" action="$file">
<input name="data" type="text">
<input type="submit">
</form>
HTML;
}
Type "äöä" into the form and see whether it comes back correctly. If it doesn't, check your mbstring ini values for:
<?php
var_dump(
ini_get("mbstring.http_input"),
ini_get("mbstring.http_output"),
ini_get("mbstring.encoding_translation")
);
The correct values are:
string(4) "pass"
string(4) "pass"
string(1) "0"
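If the output still shows ���, it can help to look at the bytes that actually arrived. A minimal sketch with a hypothetical helper describeInput(), assuming the mbstring extension: ö ä å sent as UTF-8 are two bytes each, while a single-byte charset such as latin1 sends one byte per character that is not valid UTF-8, which a UTF-8 page typically renders as one � each.

```php
<?php
// Hypothetical helper: report whether a submitted value is valid UTF-8, plus its bytes.
function describeInput(string $value): string
{
    $hex = bin2hex($value);
    if (mb_check_encoding($value, 'UTF-8')) {
        return "valid UTF-8: $hex";
    }
    return "not valid UTF-8 (likely a single-byte charset): $hex";
}

echo describeInput("\u{00F6}\u{00E4}\u{00E5}"), "\n"; // valid UTF-8: c3b6c3a4c3a5
echo describeInput("\xF6\xE4\xE5"), "\n";            // not valid UTF-8 (likely a single-byte charset): f6e4e5

// In the sanity-test page above: echo describeInput($_POST['data']);
```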

Encoding conflict: PHP output corrupted by HTML <head> content

I'm new to PHP, HTML and MySQL, and have encountered the following problem:
I have a PHP document which outputs the results of a MySQL query. Because the MySQL database, and thus the output results, contain some non-standard characters (such as æ or á), my PHP document and the MySQL database/tables are encoded as UTF-8.
For instance, here is an example of a database entry and its correct output:
goahteæjgáda
When the PHP document has nothing in the HTML <head> element (not even comments), the search is successful and the output is displayed correctly (but then I can't apply my external CSS or include the shortcut icon, etc.).
However, if there is anything at all in the HTML <head> element, such as standard metadata concerning content type, links to CSS and icon files, keywords, or even just <!-- comments -->, then either:
the search does not work if the string being searched for contains a non-standard character
OR
any non-standard characters in the resulting output are displayed as � -- for instance, the example above shows up like this after searching for "goahte":
goahte�jg�da
Any help would be appreciated!
Here is my code:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link href='style_mavsulasj.css' rel='stylesheet' type='text/css'/>
<link rel='shortcut icon' href='farben4.gif'/>
<title>search</title>
</head>
<body>
<?php $smj = isset($_GET['smj']) ? $_GET['smj'] : ''; ?>
<div style="width:230px;padding:0px;margin:0px;float:left;">
<form action="" method="GET">
<table border="0">
<tr><td>Search here:</td><td>
<?php if (strlen($smj) > 0) echo "current search"; ?>
</td></tr>
<tr><td colspan="2">Entry:</td></tr>
<tr><td><input type='text' name='smj' value=''/></td><td align='center'><?php echo "<span class='searchCrits'> " . htmlspecialchars($smj, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . "</span>"; ?></td></tr>
<tr><td colspan="2"><input type="submit" value="submit query"/></td></tr>
</table>
</form>
</div>
<div style="width:960px;padding:30px;margin-left:210px;">
<?php
if ($_GET) {
    $connect = mysql_connect("localhost", "root", "root");
    if (!$connect) {
        die("Failed to connect to mysql!<br/>" . mysql_error());
    }
    $toDB = mysql_select_db("bigG_reimport_test", $connect);
    if (!$toDB) {
        die("Failed to connect to database!<br/>" . mysql_error());
    }
    // Escape the user's input before putting it into the SQL string.
    $query = "SELECT * FROM reimport_Sheet1 WHERE smj LIKE '" . mysql_real_escape_string($smj, $connect) . "%'";
    $results = mysql_query($query);
    echo "<span class='header4'>results:<br/>";
    while ($row = mysql_fetch_array($results)) {
        echo "-> " . htmlspecialchars($row['smj'], ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . "<br/>";
    }
    echo "</span><br/>";
}
?>
</div>
</body>
</html>
Make sure:
Your PHP source file is encoded as UTF-8 (yes, this really matters, sadly)
You've set the charset in your MySQL session to UTF-8. See mysql_set_charset
You set the encoding to UTF-8 in your HTTP headers.
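The mysql_* functions used in the question were removed in PHP 7. A sketch of the same prefix search with mysqli, assuming the table and column names from the question: set_charset('utf8mb4') covers the second point of the list above, and a prepared statement replaces the string concatenation. The function names escapeLike() and searchEntries() are hypothetical, and escapeLike() assumes MySQL's default backslash escape character for LIKE.

```php
<?php
// Hypothetical helper: escape LIKE wildcards so "%" and "_" typed by a user match literally.
function escapeLike(string $term): string
{
    return addcslashes($term, '\\%_');
}

/**
 * Hypothetical prefix search on reimport_Sheet1.smj over a UTF-8 connection.
 * Returns the matching smj values as an array of strings.
 */
function searchEntries(mysqli $db, string $term): array
{
    $db->set_charset('utf8mb4'); // session charset matches the UTF-8 page
    $stmt = $db->prepare('SELECT smj FROM reimport_Sheet1 WHERE smj LIKE ?');
    $pattern = escapeLike($term) . '%';
    $stmt->bind_param('s', $pattern);
    $stmt->execute();
    $stmt->bind_result($smj);
    $rows = [];
    while ($stmt->fetch()) {
        $rows[] = $smj;
    }
    $stmt->close();
    return $rows;
}

// Usage (connection details taken from the question):
// $db = new mysqli('localhost', 'root', 'root', 'bigG_reimport_test');
// foreach (searchEntries($db, $_GET['smj'] ?? '') as $smj) {
//     echo '-> ', htmlspecialchars($smj, ENT_QUOTES, 'UTF-8'), '<br/>';
// }
```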
