mb_substr returns wrong string length - php

I was using substr to cut my string to exactly 75 characters, but with UTF-8 strings it produced characters like "�". I searched for a solution and found that I could use mb_substr, but the string it returns is no longer 75 characters long, and its length varies from string to string...
$title = "hi mémé hi mémé hi mémé hi mémé hi mémé hi mémé hi mémé hi mémé hi mémé hi mémé hi mémé hi mémé ";
if(strlen($title) > 75){
$title = mb_substr($title , 0 , 72 , 'UTF-8');
$title = $title."...";
}
echo strlen($title); // 93
I think the function adds extra data to the string that makes it longer. I don't know, but I just want a 75-character string. I tried passing a smaller length to mb_substr, but that doesn't work for every string.
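A minimal sketch of what is going on, assuming the script is saved as UTF-8 and the mbstring extension is loaded: strlen() counts bytes, while mb_substr() cuts by characters. Each "é" takes two bytes in UTF-8, so a 75-character string reports more than 75 from strlen(). Measuring with mb_strlen() gives the character count.

```php
<?php
// Same title as in the question: "hi mémé " repeated 12 times.
$title = str_repeat("hi mémé ", 12);

// Compare character counts, not byte counts.
if (mb_strlen($title, 'UTF-8') > 75) {
    $title = mb_substr($title, 0, 72, 'UTF-8') . "...";
}

echo mb_strlen($title, 'UTF-8'), "\n"; // 75 (characters)
echo strlen($title), "\n";             // 93 (bytes)
```

Nothing is added to the string; the 18 extra bytes are the second byte of each of the 18 "é" characters that survive the cut.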

Related

base64_decode does not return Japanese characters correctly in CakePHP

When I encode a string containing Japanese characters with base64_encode and decode it again with base64_decode, the result is displayed incorrectly in IE. My code is very simple:
$text = base64_encode('aサンプル.pdf');
echo base64_decode($text);
It prints a繧オ繝ウ繝励Ν.pdf, but only in IE with CakePHP 2.0; Chrome and Firefox work fine. The same code in the CodeIgniter framework also works fine. What is wrong here? Please help!
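A small check that may help narrow this down, assuming the script is saved as UTF-8: the base64 round trip is byte-exact, so the garbling most likely comes from the charset the browser assumes for the page rather than from base64_decode itself. That diagnosis is an assumption, not something confirmed in the question.

```php
<?php
$original  = 'aサンプル.pdf';
$roundTrip = base64_decode(base64_encode($original));

// The decoded bytes are identical to the input bytes.
var_dump($roundTrip === $original);               // bool(true)

// Assumption: this file is saved as UTF-8, so the bytes are valid UTF-8.
var_dump(mb_check_encoding($roundTrip, 'UTF-8')); // bool(true)
```

If both checks pass, declaring the charset explicitly in the response (for example a Content-Type header with charset=UTF-8, sent before any output) is one thing to try.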

PHP Urlencoded string not showing properly in SMS body

I am trying to send an encoded string by appending it to an API URL used for sending bulk SMS, but not all characters are being encoded correctly. I've tried using utf8_encode, htmlentities and htmlspecialchars on the string before using urlencode, but that didn't work either. Since it is an external API call, the provider only uses urldecode to show the string, which I cannot change. What needs to be done to encode the special characters in the following string so that it shows correctly in the resulting SMS?
Example String:
$message = '2016 brings with it a new year sale at “Orchid” & “home n décor”.';
2016 brings with it a new year sale at “Orchid” & “home n décor”.
After applying urlencode and htmlentities
urlencode(htmlentities($message, ENT_QUOTES, 'UTF-8'))
I get the following output -
2016+brings+with+it+a+new+year+sale+at+%26ldquo%3BOrchid%26rdquo%3B+%26amp%3B+%26ldquo%3Bhome+n+d%26eacute%3Bcor%26rdquo%3B.
When I try to use urldecode on this it gives me result as expected -
2016 brings with it a new year sale at “Orchid” & “home n décor”.
When I try to send the encoded output string as an url parameter and echo the parameter I get the exact same result, without even using the urldecode.
But the problem is in my SMS body, it is always showing in the following pattern -
2016 brings with it a new year sale at &ldquo;Orchid&rdquo; &amp; &ldquo;home n d&eacute;cor&rdquo;.
Why does the SMS not show the decoded string, and how can this be fixed? Plain encoded characters such as & come through without any problem, since & is encoded as %26, but double quotes and characters like é cause issues. Can anyone suggest a workaround?
Use:
$message = "Message Text";
instead of:
$message = urlencode("Message Text");
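A minimal sketch of why the entities survive, assuming the script is saved as UTF-8: urldecode() only reverses percent-encoding, so anything htmlentities() produced stays in the text. Whether the SMS gateway accepts raw UTF-8 is an assumption that would need checking against the provider's documentation.

```php
<?php
$message = '2016 brings with it a new year sale at “Orchid” & “home n décor”.';

// What the receiving side sees after urldecode() in each case.
$viaEntities = urldecode(urlencode(htmlentities($message, ENT_QUOTES, 'UTF-8')));
$direct      = urldecode(urlencode($message));

var_dump($viaEntities === $message); // bool(false): &ldquo;, &amp;, &eacute; remain
var_dump($direct === $message);      // bool(true): the original bytes come back
```

If the HTTP client or API wrapper already URL-encodes its parameters, calling urlencode() yourself encodes the text twice, which fits the suggestion above to pass the plain message.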

PHP function to convert HEX char codes to display equivalent

I am working on some PHP code to identify HEX character codes in a string and convert them to their "as seen on screen" equivalent. Mainly, these HEX codes are for accented characters like é, ç and so on.
For example, I am receiving a string like this:
$str = "caf&#xe9s"; - NOTE there is a semicolon after the 9 (I had to remove it to stop this text editor converting it!)
The HEX part of the string is &#xe9 (again with semicolon at end) - and I am needing to convert that to its "as seen on screen" equivalent, in this case "é". So the converted string would be "cafés".
The following PHP code works, but I have to write one for each HEX code, and there are scores of them.
$keywords = str_replace("&#xe9","é",$keywords); [again the needle part has a semicolon]
Can anyone suggest an existing PHP function that can scan any string for known HEX codes and convert it to the display equivalent?
I am working in UTF8 otherwise.
Thanks for your consideration, sorry if my terminology sounds amateur.
James
http://www.php.net/manual/en/function.html-entity-decode.php
This will convert HTML entities into their associated characters. Since you are working in UTF-8, pass the charset explicitly:
$keywords = html_entity_decode($keywords, ENT_QUOTES, 'UTF-8');
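A short sketch applied to the string from the question, with the semicolon restored. The variable name $decoded is only for illustration.

```php
<?php
// Hexadecimal numeric character reference for "é", semicolon included.
$keywords = "caf&#xe9;s";

// Decode named and numeric entities into UTF-8 characters.
$decoded = html_entity_decode($keywords, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n";                    // cafés
echo mb_strlen($decoded, 'UTF-8'), "\n"; // 5 (characters)
```

This handles named entities (such as &eacute;) and decimal or hexadecimal numeric references in one call, so no per-code str_replace is needed.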

Rendering � in PHP after using the substr() method [duplicate]

I need to show just the first 180 characters of a text stored in a MySQL database, with a "read more" link for users who want to read the full text. So I read the whole text from MySQL and use the substr() function like this:
$some_text = substr($total_text, 0, 180);
Everything is fine, except that the character � shows up at the end of some strings.
What is this and how can I fix it?
It sounds like you're working with multi-byte characters.
Try using mb_substr() instead, passing the encoding explicitly:
$some_text = mb_substr($total_text, 0, 180, 'UTF-8');
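A minimal sketch of the difference, assuming UTF-8 text and a made-up sample string: substr() counts bytes and can cut through the middle of a multi-byte character, which is what shows up as "�".

```php
<?php
// Hypothetical sample: one ASCII letter followed by 200 two-byte characters.
$total_text = "a" . str_repeat("é", 200);

$by_bytes = substr($total_text, 0, 180);             // cuts an "é" in half
$by_chars = mb_substr($total_text, 0, 180, 'UTF-8'); // cuts between characters

var_dump(mb_check_encoding($by_bytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($by_chars, 'UTF-8')); // bool(true)
echo mb_strlen($by_chars, 'UTF-8'), "\n";        // 180
```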
I had this exact issue with a language translation project I've recently been working on.
Apart from altering the charsets in your database, you can try the following after your code above:
echo htmlentities($some_text, ENT_QUOTES, 'UTF-8');

strlen & special chars

I'm having an issue finding a solution here. I'm developing a WordPress theme for a client that uses a for() loop to iterate through the title of the page so that each character can be wrapped in a <span> and displayed vertically. The loop uses strlen() to find the length of the title, but since some of the page titles include '...' or commas, it returns the HTML entity characters instead. I can't figure out what is causing that, and every attempt with htmlspecialchars_decode() or html_entity_decode() fails. Any suggestions? Is there something going on with the for loop that I'm not aware of?
Since it was requested here is the actual code:
$p_title = get_the_title($port_page->ID);
$title = '';
// Note: strlen() and $p_title[$i] both work on bytes, not characters.
for ($i = 0; $i < strlen($p_title); $i++) {
    if ($p_title[$i]) {
        $title .= "<span>$p_title[$i]</span>";
    }
}
I've tried using mb_strlen as well. Searching for a specific character to replace doesn't really solve the problem, since page titles are set arbitrarily by the site owner.
The weird thing is that the title is not encoded in any way and echoes normally before the for loop, so it's as if something is converting it.
This sounds a lot like a character encoding issue with multibyte characters. Can you try replacing strlen() with mb_strlen() and see if it does the job?
http://php.net/manual/en/function.mb-strlen.php
strlen() only returns the number of bytes in a string. Some special characters are represented with multiple bytes, and an HTML entity makes a single "character" like the copyright symbol ("©") occupy several characters (e.g. &copy;).
Your "..." (ellipsis) can be a special character in Unicode for example.
The quick and dirty solution I suggest:
// Example string should be 1 character long, 6 bytes
$text = "&copy;";
$bytes = strlen($text);
mb_internal_encoding('UTF-8');
$text = html_entity_decode($text, ENT_QUOTES, "UTF-8");
$length = mb_strlen($text);
print "String is ".$length." characters long, ".$bytes." bytes long";
Note that I'm assuming your string is already UTF-8. If it isn't, convert it first.
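Applied to the loop in the question, one possible sketch is to decode entities first and then split on characters rather than bytes. The sample title is hypothetical and stands in for the value returned by get_the_title(); it assumes the title arrives with entities such as &#8230; and is otherwise UTF-8.

```php
<?php
// Hypothetical stand-in for get_the_title($port_page->ID).
$p_title = "Caf&eacute; &#8230; menu";

// Turn entities into real characters, then split per UTF-8 character.
$decoded = html_entity_decode($p_title, ENT_QUOTES, 'UTF-8');
$chars   = preg_split('//u', $decoded, -1, PREG_SPLIT_NO_EMPTY);

$title = '';
foreach ($chars as $char) {
    // Re-escape each character so the markup stays valid.
    $title .= '<span>' . htmlspecialchars($char, ENT_QUOTES, 'UTF-8') . '</span>';
}

echo count($chars), "\n"; // 11
echo $title, "\n";
```

Unlike the original loop, this wraps spaces as well and does not skip a "0" character; filter inside the foreach if that is not wanted.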
