Encode a string into character codes - php

I want to encode an email address into its corresponding character codes, so when it is printed the char codes are interpreted by the browser, but the robots get the encoded string instead of the interpreted one.
For example (1):
abc@abc.com
should be sent to the browser as (2) (whitespaces added so the browser shows it):
&#97 ;&#98 ;&#99 ;&#64 ;&#97 ;&#98 ;&#99 ;&#46 ;&#99 ;&#111 ;&#109 ;
so the human reads (1) and web robots read (2).
There should be an easy function or way to do this, but I cannot find one.

You could try this:
<?php
$s = "abc@abc.com";
$obj = array_map(function($x){return "&#". strval(ord($x)) . ";";},str_split($s));
echo implode($obj);
?>
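The snippet above works byte by byte, which is fine for a plain ASCII address. For a UTF-8 string, a minimal sketch that encodes whole code points instead might look like the following. The helper name encode_code_points is hypothetical, and the code assumes the mbstring extension is available (mb_ord exists from PHP 7.2).

```php
<?php
// Hypothetical helper: encodes every code point of a UTF-8 string
// as a decimal numeric character reference such as &#97;.
function encode_code_points(string $s): string
{
    // The "u" modifier makes preg_split treat the subject as UTF-8,
    // so the empty pattern splits between code points, not bytes.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return ''; // the input was not valid UTF-8
    }

    $out = '';
    foreach ($chars as $ch) {
        // mb_ord returns the code point number (assumes mbstring).
        $out .= '&#' . mb_ord($ch, 'UTF-8') . ';';
    }
    return $out;
}

echo encode_code_points('abc@abc.com');
// &#97;&#98;&#99;&#64;&#97;&#98;&#99;&#46;&#99;&#111;&#109;
```

For ASCII-only input this produces the same result as the ord-based version.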

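function encode_everything($string){
    $encoded = "";
    for ($n = 0; $n < strlen($string); $n++){
        $check = htmlentities($string[$n], ENT_QUOTES);
        // Characters that htmlentities leaves unchanged get a numeric entity;
        // the others keep the named entity it returned.
        if ($string[$n] == $check) {
            $encoded .= "&#" . ord($string[$n]) . ";";
        } else {
            $encoded .= $check;
        }
    }
    return $encoded;
}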
Found at:
http://php.net/manual/en/function.htmlentities.php

Here's a neat function I wrote to do something similar - not sure if it's exactly what you're looking for, but it uses php's ord function to take each character in a string and output its ascii equivalent:
$testString = "I hate – Character – 150" ;
function printAscii($string){
for ($n=0;$n<strlen($string);$n++){
echo "<pre>";
echo ord($string[$n]);
echo " ---->";
echo $string[$n];
echo "</pre>";
}
}
printAscii($testString);

Related

Create UTF-8 code from dynamic Unicode in PHP

I am making a dynamic Unicode icon in PHP. I want the UTF-8 code of the Unicode icon.
So far I have done:
$value = "1F600";
$emoIcon = "\u{$value}";
$emoIcon = preg_replace("/\\\\u([0-9A-F]{2,5})/i", "&#x$1;", $emoIcon);
echo $emoIcon; //output 😀
$hex=bin2hex($emoIcon);
echo $hex; // output 26237831463630303b
$hexVal=chunk_split($hex,2,"\\x");
var_dump($hexVal); // output 26\x23\x78\x31\x46\x36\x30\x30\x3b\x
$result= "\\x" . substr($hexVal,0,-2);
var_dump($result); // output \x26\x23\x78\x31\x46\x36\x30\x30\x3b
But when I put the value directly, it prints the correct data:
$emoIcon = "\u{1F600}";
$emoIcon = preg_replace("/\\\\u([0-9A-F]{2,5})/i", "&#x$1;", $emoIcon);
echo $emoIcon; //output 😀
$hex=bin2hex($emoIcon);
echo $hex; // output f09f9880
$hexVal=chunk_split($hex,2,"\\x");
var_dump($hexVal); // output f0\x9f\x98\x80\x
$result= "\\x" . substr($hexVal,0,-2);
var_dump($result); // output \xf0\x9f\x98\x80
\u{1F600} is a Unicode escape sequence used in double-quoted strings, and it must contain a literal value. Trying to use "\u{$value}", as you've seen, doesn't work (for a couple of reasons, but that doesn't matter much here).
If you want to start with "1F600" and end up with 😀 use hexdec to turn it into an integer and feed that to IntlChar::chr to encode that code point as UTF-8. E.g.:
$value = "1F600";
echo IntlChar::chr(hexdec($value));
Outputs:
😀
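If the goal is the \x-escaped UTF-8 byte sequence from the question, a minimal sketch could combine the same idea with bin2hex. It uses mb_chr instead of IntlChar::chr, which is an assumption about the environment: it requires the mbstring extension (PHP 7.2 or later) rather than intl.

```php
<?php
// Assumes the mbstring extension; mb_chr is available from PHP 7.2.
$value = "1F600";                          // code point as a hex string, as in the question
$char  = mb_chr(hexdec($value), 'UTF-8');  // the UTF-8 encoded character

// bin2hex yields the raw UTF-8 bytes as hex; split them into pairs
// and prefix each pair with a literal backslash-x.
$escaped = '\\x' . implode('\\x', str_split(bin2hex($char), 2));

echo $escaped; // \xf0\x9f\x98\x80
```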

Text between less than and greater than not output PHP

I use the str_replace function to replace ##ID## in a YouTube URL with this regular expression: (?P<id>[a-z-A-Z_0-9]+)
This is the code I use:
<?php
$urlbase = 'https://www.youtube.com/watch?v=##ID##';
$lastchange = str_replace('##ID##', '(?P<id>[a-z-A-Z_0-9]+)', $urlbase);
echo $lastchange;
?>
I get the output in the browser like this: https://www.youtube.com/watch?v=(?P[a-z-A-Z_0-9]+). It looks like <id> does not show up!
I tried this simple code:
<?php
echo "This is my <id>";
?>
but I just get "This is my" in the browser!
What's the problem, and how can I fix it? Thanks.
<id> is being interpreted as HTML, so your browser parses it, and since it is not a renderable element, it shows nothing. Try:
<?php
echo "This is my &lt;id&gt;";
?>
As for the str_replace, it's doing exactly what the function is supposed to be doing. If you're looking to use regular expressions in string replacements, use preg_replace
The tag <id> is being hidden by your browser. It is really there if you view the page source. Maybe you should try:
$urlbase = 'https://www.youtube.com/watch?v=##ID##';
$lastchange = str_replace('##ID##', '(?P<id>[a-z-A-Z_0-9]+)', $urlbase);
echo urlencode( $lastchange );
Problem is with the line:
$lastchange = str_replace('##ID##', '(?P<id>[a-z-A-Z_0-9]+)', $urlbase);
str_replace does not use regex.
You will need preg_replace
$pattern = '/(?P<id>[a-z-A-Z_0-9]+)/'; // preg_* patterns need delimiters
$replacement = '##ID##';
$string = $urlbase;
$lastchange = preg_replace($pattern, $replacement, $string);
Also, < and > are reserved characters in HTML with special meaning. If you want to display them, you must use their entity names, &lt; and &gt; respectively in your case.
<?php
echo "This is my &lt;id&gt;";
?>
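Rather than typing the entities by hand, a small sketch using PHP's built-in htmlspecialchars can escape the string at output time. The variable name $pattern is only illustrative; the URL and regular expression are the ones from the question.

```php
<?php
$urlbase = 'https://www.youtube.com/watch?v=##ID##';
$pattern = str_replace('##ID##', '(?P<id>[a-z-A-Z_0-9]+)', $urlbase);

// htmlspecialchars turns < and > into &lt; and &gt;, so the browser
// displays them as text instead of treating <id> as an unknown tag.
echo htmlspecialchars($pattern);
// https://www.youtube.com/watch?v=(?P&lt;id&gt;[a-z-A-Z_0-9]+)
```

The value stored in $pattern is unchanged; only the copy sent to the browser is escaped.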

How to convert Emoji from Unicode in PHP?

I use this table of Emoji and try this code:
<?php print json_decode('"\u2600"'); // This converts to ☀ (black sun with rays) ?>
If I try to convert this \u1F600 (grinning face) through json_decode, I see this symbol — ὠ0.
What's wrong? How do I get the right emoji?
PHP 5
JSON's \u can only handle one UTF-16 code unit at a time, so you need to write the surrogate pair instead. For U+1F600 this is \uD83D\uDE00, which works:
echo json_decode('"\uD83D\uDE00"');
😀
PHP 7
You now no longer need to use json_decode and can just use the \u and the unicode literal:
echo "\u{1F30F}";
🌏
In addition to the answer of Tino, I'd like to add code to convert hexadecimal code like 0x1F63C to a unicode symbol in PHP5 with splitting it to a surrogate pair:
function codeToSymbol($em) {
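if($em >= 0x10000) {
// Code points from U+10000 up need a UTF-16 surrogate pair in JSON.
$first = (($em - 0x10000) >> 10) + 0xD800;
$second = (($em - 0x10000) % 0x400) + 0xDC00;
return json_decode('"' . sprintf("\\u%X\\u%X", $first, $second) . '"');
} else {
// JSON requires exactly four hex digits after \u, so pad with zeros.
return json_decode('"' . sprintf("\\u%04X", $em) . '"');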
}
}
echo codeToSymbol(0x1F63C); // outputs 😼
Example of code parsing string including emoji unicode format
$str = 'Test emoji \U0001F607 \U0001F63C';
echo preg_replace_callback(
'/\\\U([A-F0-9]+)/',
function ($matches) {
return mb_convert_encoding(hex2bin($matches[1]), 'UTF-8', 'UTF-32');
},
$str
);
Output: Test emoji 😇 😼
https://3v4l.org/63dUR

check if the string begin with euro/pound symbol

I'm trying to check whether a string starts with '€' or '£' in PHP.
Here is the code:
$text = "€123";
if($text[0] == "€"){
echo "true";
}
else{
echo "false";
}
//output false
If only check a single char, it works fine
$symbol = "€";
if($symbol == "€"){
echo "true";
}
else{
echo "false";
}
// output true
I have also tried to print the string on browser.
$text = "€123";
echo $text; //display euro symbol correctly
echo $text[0]; // get a question mark
I have tried to use substr(), but the same problem occurred.
Characters, such as '€' or '£' are multi-byte characters. There is an excellent article that you can read here. According to the PHP docs, PHP strings are byte arrays. As a result, accessing or modifying a string using array brackets is not multi-byte safe, and should only be done with strings that are in a single-byte encoding such as ISO-8859-1.
Also make sure your file is encoded with UTF-8: you can use a text editor such as NotePad++ to convert it.
If I reduce the PHP to this, it works, the key being to use mb_substr:
<?php
header ('Content-type: text/html; charset=utf-8');
$text = "€123";
echo mb_substr($text,0,1,'UTF-8');
?>
Finally, it would be a good idea to add the UTF-8 meta-tag in your head tag:
<meta charset="utf-8">
I suggest this as the easiest solution. Convert the symbols to their HTML entity names using htmlentities():
$encoded = htmlentities($text, ENT_QUOTES, "UTF-8");
This gives you either &pound; or &euro; at the start of the string, which you can check with a switch statement (or your if statements):
$symbols = explode(";", $encoded);
switch($symbols[0]) {
case "&pound":
echo "It's Pounds";
break;
case "&euro":
echo "It's Euros";
break;
}
This happens because you’re using a multi-byte character encoding (probably UTF-8) in which both € and £ are recorded using multiple bytes. That means that "€" is a string of three bytes, not just one.
When you use $text[0] you're getting only the first byte of the first character, and so it doesn't match the three bytes of "€". You need to get the first three bytes instead, to check whether one string starts with another.
Here’s the function I use to do that:
function string_starts_with($string, $prefix) {
return substr($string, 0, strlen($prefix)) === $prefix;
}
The question mark appears because the first byte of "€" isn’t enough to encode a whole character: the error is indicated by ‘�’ when available, otherwise ‘?’.
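As a sketch of the same byte-prefix idea applied to several symbols at once, the following helper (leading_currency_symbol is a hypothetical name) uses strncmp to compare only the first strlen($symbol) bytes. It assumes the source file and the input string use the same encoding, for example both UTF-8.

```php
<?php
// Hypothetical helper: returns the currency symbol the string starts with,
// or null if it starts with none of them.
function leading_currency_symbol(string $text, array $symbols = ['€', '£']): ?string
{
    foreach ($symbols as $symbol) {
        // strncmp compares the first strlen($symbol) bytes, so a
        // multi-byte symbol is matched as a whole, not byte by byte.
        if (strncmp($text, $symbol, strlen($symbol)) === 0) {
            return $symbol;
        }
    }
    return null;
}

var_dump(leading_currency_symbol('€123') === '€'); // bool(true)
var_dump(leading_currency_symbol('123€'));         // NULL
```

On PHP 8 and later, the built-in str_starts_with($text, '€') performs the same byte-wise prefix check.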

How to convert unicode to arabic characters in php?

Let us say the string is
$uni_str="06280628002006280628";
In Arabic, it is: بب بب
So how can I convert it in PHP without using HTML entities, like this:
for($i=0; $i<strlen($uni_str); $i+=4)
{
$text_str .= "&#x".substr($uni_str,$i,4).";";
}
This code only solves the problem of displaying the result in an HTML page,
but I want to put the result in a PHP variable.
The result of the code above looks like:
&#x0628;&#x0628;&#x0020;&#x0628;&#x0628;
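I found the solution; hope it helps:
function uni2arabic($uni_str)
{
$All = ""; // initialise the result to avoid an undefined-variable notice
for($i=0; $i<strlen($uni_str); $i+=4)
{
// Build one hex entity per 4-digit code unit, then decode it to UTF-8.
$new="&#x".substr($uni_str,$i,4).";";
$txt = html_entity_decode($new, ENT_COMPAT, "UTF-8");
$All.=$txt;
}
return $All;
}
The variable $All contains the Arabic string.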
Use hex2bin to decode the hex into a sequence of bytes, and then you can unpack each pair of bytes as a UTF-16 code unit (which is what I assume your string represents).
Assuming you are producing UTF-8 text output:
iconv('UTF-16BE', 'UTF-8', hex2bin('06280628002006280628'))
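Spelled out as a minimal sketch that stores the result in a variable, as the question asks. It assumes the iconv extension is enabled and that the hex string holds big-endian UTF-16 code units; the variable names are illustrative.

```php
<?php
$uni_str = "06280628002006280628";

// hex2bin turns the hex digits into raw bytes (two bytes per UTF-16 code unit).
$bytes = hex2bin($uni_str);

// Re-encode the big-endian UTF-16 bytes as UTF-8.
$text = iconv('UTF-16BE', 'UTF-8', $bytes);

echo $text; // بب بب
```

Both hex2bin and iconv return false on malformed input, so real code would check the results before using them.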
The following code allows you to decode the characters as well as re-encode them if necessary
Code :
if (!function_exists('codepoint_encode')) {
function codepoint_encode($str) {
return substr(json_encode($str), 1, -1);
}
}
if (!function_exists('codepoint_decode')) {
function codepoint_decode($str) {
return json_decode(sprintf('"%s"', $str));
}
}
How to use :
header('Content-Type: text/html; charset=utf-8');
var_dump(codepoint_encode('ඔන්ලි'));
var_dump(codepoint_encode('සින්ග්ලිෂ්'));
var_dump(codepoint_decode('\u0d94\u0db1\u0dca\u0dbd\u0dd2'));
var_dump(codepoint_decode('\u0dc3\u0dd2\u0db1\u0dca\u0d9c\u0dca\u0dbd\u0dd2\u0dc2\u0dca'));
Output :
string(30) "\u0d94\u0db1\u0dca\u0dbd\u0dd2"
string(60) "\u0dc3\u0dd2\u0db1\u0dca\u0d9c\u0dca\u0dbd\u0dd2\u0dc2\u0dca"
string(15) "ඔන්ලි"
string(30) "සින්ග්ලිෂ්"
If you want more complex functionality, see How to get the character from unicode code point in PHP?.
