PHP: Get encoded html entities - php

I'm trying to get the html entities of a UTF-8 string,
Example: example.com/search?q=مرحبا
<?php
echo htmlentities($_GET['q']);
?>
I got:
مرحبا0مرحبا
That is plain UTF-8 text, not HTML entities.
What I need is:
&#1605;&#1585;&#1581;&#1576;&#1575;
I have tried urldecode and htmlentities functions!

Add this code to the start of your file:
header('Content-Type: text/html; charset=utf-8');
The browser needs to know the page is UTF-8. For completeness, this tag can also go in the head section:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
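As a minimal sketch of how the header and the echo fit together, assuming the goal is simply to display the query safely in a UTF-8 page: the helper name renderQuery() and the fallback sample value are hypothetical, not part of the original question.

```php
<?php
// Hypothetical helper: escape a query value for output in a UTF-8 HTML page.
// htmlspecialchars() only escapes <, >, &, and quotes; Arabic text passes through unchanged.
function renderQuery(string $q): string
{
    return htmlspecialchars($q, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Headers must be sent before any output.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Assumption: fall back to a sample value when no q parameter is present.
$q = $_GET['q'] ?? 'مرحبا';
echo renderQuery($q);
```

With the charset declared, the browser renders the UTF-8 bytes directly, so converting every character to an entity is usually not needed for display.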

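I think you can solve it by splitting the string into characters and converting each one to its code point.
Combining Mark Baker's answer and vartec's answer, you get:
<?php
// Split the UTF-8 string into single characters (the /u flag makes the regex Unicode-aware).
$chrArray = preg_split('//u', $_GET['q'], -1, PREG_SPLIT_NO_EMPTY);
$htmlEntities = "";
foreach ($chrArray as $chr) {
    // mb_ord() (PHP 7.2+) stands in for the _uniord() helper from the referenced answer.
    $htmlEntities .= '&#' . mb_ord($chr, 'UTF-8') . ';';
}
echo $htmlEntities;
?>
I have not tested it.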
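If the mbstring extension is available, the same conversion can probably be done in one call. This is a sketch, not the method from the referenced answers: the helper name toNumericEntities() is hypothetical, while mb_encode_numericentity() is a standard mbstring function.

```php
<?php
// Hypothetical helper name; mb_encode_numericentity() itself is part of mbstring.
function toNumericEntities(string $text): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // 0x80..0x10FFFF covers every character outside ASCII; ASCII is left as-is.
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($text, $map, 'UTF-8');
}

echo toNumericEntities('مرحبا'), "\n";
// prints &#1605;&#1585;&#1581;&#1576;&#1575;
```

Note that this encodes every non-ASCII character, whereas the loop above also turns ASCII characters into entities.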

Related

Displaying japanese characters with PHP

I have plain Japanese text stored in a MySQL table with utf8mb_general_ci. I can fetch a row and display it as a single string.
But what I need is to take a single character from the string and use it in a query to match other results (find other words that contain that specific character).
The problem is that when I loop over the string, all I get is ? marks.
I read that I have to use UTF-8 everywhere, and I believe I do.
So what are the steps, from zero, to make sure I can fetch a Japanese string, split it into separate characters, and have queries understand the input (not just a ? mark)?
Here is some basic code as an example, with the same data that I fetch from my DB, which results in the same problem.
<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body>
<?php
$word = "東京のビルの中";
echo $word;
echo strlen($word);
echo "<br>";
$chars = str_split($word);
foreach($chars as $single) {
echo $single . "<br>";
}
?>
</body>
</html>
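This answer works in your case as well. str_split() cuts the string into single bytes, which breaks multibyte UTF-8 characters. Add the mb_str_split() function to your code (it is built in as of PHP 7.4) and call: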
$chars = mb_str_split($word);
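A small sketch of the difference between bytes and characters, assuming the source file itself is saved as UTF-8. The helper name splitUtf8() is hypothetical; it uses the built-in mb_str_split() when present and falls back to a Unicode-aware preg_split() otherwise.

```php
<?php
// Hypothetical helper: split a UTF-8 string into characters, not bytes.
function splitUtf8(string $text): array
{
    if (function_exists('mb_str_split')) {
        return mb_str_split($text, 1, 'UTF-8'); // built in since PHP 7.4
    }
    // Fallback for older versions: an empty pattern with /u splits between characters.
    return preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

$word = "東京のビルの中";
echo strlen($word), "\n";             // number of bytes
echo mb_strlen($word, 'UTF-8'), "\n"; // number of characters
foreach (splitUtf8($word) as $single) {
    echo $single, "\n";
}
```

Each of these characters takes three bytes in UTF-8, which is why a byte-wise split produces fragments that display as ? marks.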

Why does PHP header(charset) work while HTML <meta charset> doesn't?

This one may be easy, but it seems to be a problem for my server (or for me).
I have this piece of code in index.php:
<?php
header('Content-Type: text/plain; charset="UTF-8"');
// Some code for generating data to be displayed
foreach ($ObjectArray as $SingleObject) {
print_r($SingleObject->getAllProperties());
}
And it does this:
But I don't want to use header('Content-Type: text/plain; charset="UTF-8"'); - I'd rather include HTML code from my header.htm:
<html>
<head>
<meta charset="UTF-8">
<title>Test Cards</title>
<link rel="stylesheet" type="text/css" href="style.css">
<script src="jquery-3.1.0.min.js"></script>
<link rel="stylesheet" href="jquery-ui.css">
<script src="jquery-ui.js"></script>
</head>
with my index.php like that:
<?php
include 'view/header.htm';
echo '<body>';
// Some code for generating data to be displayed
foreach ($ObjectArray as $SingleObject) {
    print_r($SingleObject->getAllProperties());
}
// Close the document once, after the loop.
echo '</body>';
echo '</html>';
Unfortunately, this doesn't work well. The charset is still recognized as UTF-8, but the result is far from what I expected:
Please tell me what is happening and how to handle this kind of problem. Is it caused by combining HTML and PHP (does plain PHP apply some formatting when no HTML is present?), or is there a mistake in my code?
Thanks in advance :)
The formatting is preserved in the first case because the content type is text/plain, so the browser shows whitespace and line breaks exactly as sent. In the second case the content type is HTML (text/html), where runs of whitespace collapse into a single space.
You can wrap the output in <pre></pre> tags to preserve formatting when returning HTML.
<?php
include 'view/header.htm';
echo '<body>';
echo '<pre>';
// ...
// your foreach here
// ...
echo '</pre>';
echo '</body>';
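As a sketch of the same idea in reusable form, assuming the output is debug data that may contain characters such as < or &: the helper name preDump() and the sample array are hypothetical and only stand in for $SingleObject->getAllProperties().

```php
<?php
// Hypothetical helper: render any value as preformatted, HTML-safe debug output.
function preDump($value): string
{
    // Passing true as the second argument makes print_r() return the text instead of printing it.
    $text = print_r($value, true);
    // Escape the text so that < and & in the data are not interpreted as markup.
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</pre>';
}

// Sample data standing in for $SingleObject->getAllProperties().
echo preDump(['name' => 'Test <Card>', 'cost' => 3]);
```

Escaping inside the <pre> block keeps the layout of print_r() while avoiding broken markup when a property value contains HTML.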

Getting title of page with php is not UTF-8

I have tried many pieces of code to get the title of a URL with PHP, but all of them had a problem.
For example, the code below, using DOMDocument:
$doc = new DOMDocument();
// The @ suppresses parser warnings caused by invalid markup in the fetched page.
@$doc->loadHTML(file_get_contents("http://www.farsnews.com/newstext.php?nn=13930431001635"));
// find the title
$titlelist = $doc->getElementsByTagName("title");
if($titlelist->length > 0){
echo $titlelist->item(0)->nodeValue;
}
The output of the code is this:
طبق اعلام مهدی تاج گران‌ترین بازیکن ÙÂوتبال ایران معرÙÂی شد
But the title of that page is this:
طبق اعلام مهدی تاج گران‌ترین بازیکن فوتبال ایران معرفی شد
So the problem is with the encoding of the string. Maybe the problem is only with this site.
But how do I fix this and echo out the correct title?
Edit:
I have tested this meta tag:
<META http-equiv="Content-Type" content="text/html; charset=utf-8">
But no results.
Did you check whether PHP's internal encoding is set to UTF-8?
<?php
var_dump(mb_internal_encoding());
?>
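One commonly used workaround is to tell DOMDocument the encoding explicitly by prepending an XML encoding declaration before parsing. The sketch below assumes the fetched page really is UTF-8; the helper name extractTitleUtf8() and the sample markup are hypothetical, and in the real case the HTML string would come from file_get_contents().

```php
<?php
// Hypothetical helper: read the <title> of an HTML string that is known to be UTF-8.
function extractTitleUtf8(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The encoding hint makes the parser treat the bytes as UTF-8 instead of ISO-8859-1.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    return $titles->length > 0 ? trim($titles->item(0)->nodeValue) : null;
}

// Sample markup standing in for the downloaded page.
$html = '<html><head><title>فوتبال ایران</title></head><body></body></html>';
echo extractTitleUtf8($html), "\n";
```

If the page is served in another encoding, it would need to be converted to UTF-8 first, for example with mb_convert_encoding().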

string's result is different after load in domdocument

I want to get the same result after loading the string into DOMDocument. How can I do it?
echo "Café";
$s = <<<HTML
<html>
<head>
</head>
<body>
Café
</body>
</html>
HTML;
$d = new domdocument;
$d->loadHTML($s);
echo $d->textContent;
The first echo's result is: Café
The second echo's result is: CafÃ©
You need to mark your HTML as UTF-8 encoded:
$s = <<<HTML
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
Café
</body>
</html>
HTML;
$d = new domdocument;
$d->loadHTML($s);
echo $d->textContent;
Your problem is the encoding.
The first echo prints the text in your default encoding,
but in the text rendered through DOMDocument
the é is split into two characters.
I don't know how to force the right encoding on DOMDocument,
but I am sure this is your problem.
I hope this helps,
best of luck.
With the first echo, before the HTML, you send the headers with your server's default encoding, and any encoding set afterwards is ignored.
You must first echo the <html> tag, the encoding declaration, and so on,
and only then echo any other values.

Character encoding issues - UTF-8 / Issue while transmitting data on the internet?

I've got data coming from the client side, which sends it like this:
// $booktitle = "Comí habitación bailé"
$xml_obj = new DOMDocument('1.0', 'utf-8');
// node created with booktitle and added to xml_obj
// NO htmlentities / other transformations done
$returnHeader = drupal_http_request($url, $headers = array("Content-Type: text/xml; charset=utf-8"), $method = 'POST', $data = $xml_data, $retry = 3);
When I receive it at my end (via that drupal_http_request) and I do htmlentities on it, I get the following:
Comí habitación bailé
Which when displayed looks like gibberish:
Comí Habitación Bailé
What is going wrong?
Edit 1)
<?php
$title = "Comí habitación bailé";
echo "title=$title\n";
echo 'encoding is '.mb_detect_encoding($title);
$heutf8 = htmlentities($title, ENT_COMPAT, "UTF-8");
echo "heutf8=$heutf8\n";
?>
Running this test script on a Windows machine and redirecting to a file shows:
title=Comí habitación bailé
encoding is UTF-8heutf8=
Running this on a linux system:
title=Comí habitación bailé
encoding is UTF-8PHP Warning: htmlentities(): Invalid multibyte sequence in argument in /home/testaccount/public_html/test2.php on line 5
heutf8=
I think you shouldn't encode the entities with htmlentities just to output the text correctly (as stated in the comments, you should use htmlspecialchars to avoid cross-site scripting). Just set the correct headers and meta tag, and echo the values normally:
<?php
header ('Content-type: text/html; charset=utf-8');
?>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
</body>
</html>
Before PHP 5.4, htmlentities interprets its input as ISO-8859-1 by default; are you passing UTF-8 for the charset parameter?
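The "Invalid multibyte sequence" warning in the test output suggests the string was not valid UTF-8 when it reached htmlentities. A minimal sketch of a defensive wrapper follows, assuming that any non-UTF-8 input is ISO-8859-1; the helper name safeEntities() and that fallback encoding are assumptions, not something the question confirms.

```php
<?php
// Hypothetical helper: make sure the input is valid UTF-8 before encoding entities.
function safeEntities(string $text, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        // Assumption: input that is not valid UTF-8 is in $fallbackEncoding.
        $text = mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
    }
    // Pass the charset explicitly so the result does not depend on the PHP version's default.
    return htmlentities($text, ENT_COMPAT, 'UTF-8');
}

// Assumes this file is saved as UTF-8.
echo safeEntities("Comí habitación bailé"), "\n";
```

Checking the encoding first makes the failure visible at the point where the data arrives, rather than as an empty string later on.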
Try passing the header information as a key/value array.
Something like:
$headers = array("Content-Type" => "text/xml; charset=utf-8");
