How to use substring on non-English characters?


I have a small site where I display search results of post titles.
Some titles are up to 255 characters long, and when I display them in an HTML table the row breaks (it doesn't render correctly), so I use PHP's substr function to trim each title so that it fits in the table row.
For English titles it works great, but for non-English titles it shows blank space, i.e. the whole title disappears.
I am using substr like this:
<? echo htmlspecialchars(substr($row['title'],0,70)); ?>
So how can I trim non-English titles to 70 characters as well?

You should use a multibyte-safe substring function that counts characters rather than bytes, which is what matters for UTF-8:
mb_substr()
http://php.net/manual/en/function.mb-substr.php
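A minimal sketch of the mb_substr() approach, assuming the titles are UTF-8 and the mbstring extension is installed (`short_title()` is a hypothetical helper name). It also shows the likely reason for the blank output: a byte-based substr() can stop in the middle of a multibyte character, and htmlspecialchars() returns an empty string for input that is invalid in the given encoding unless ENT_SUBSTITUTE or ENT_IGNORE is set (ENT_SUBSTITUTE only became part of the default flags in PHP 8.1).

```php
<?php
// Hypothetical helper: cut a UTF-8 title to $max characters, then escape it.
function short_title(string $title, int $max = 70): string
{
    // mb_substr() counts characters, so it never splits a multibyte sequence.
    $short = mb_substr($title, 0, $max, 'UTF-8');
    return htmlspecialchars($short, ENT_QUOTES, 'UTF-8');
}

$title = 'Привет, мир'; // 11 characters, but 20 bytes in UTF-8

// Byte-based substr() stops after the first byte of the third letter here,
// so htmlspecialchars() rejects the invalid UTF-8 and returns ''.
$broken = htmlspecialchars(substr($title, 0, 5), ENT_QUOTES, 'UTF-8');
echo strlen($broken), "\n";          // 0

echo short_title($title, 6), "\n";   // Привет
```

In the template this would become `<? echo short_title($row['title']); ?>`.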

That might occur because substr is not a multibyte-safe function.
You can either use mb_substr() instead - "http://de1.php.net/manual/de/function.mb-substr.php"
Or try the function "wordwrap" (http://php.net/manual/de/function.wordwrap.php), which is made for breaking up strings. It inserts a break string rather than truncating (and the break string must not be empty), so keep only the first line. Note that it also counts bytes, so with the cut flag set it can still split a multibyte character:
<? echo htmlspecialchars(explode("\n", wordwrap($row['title'], 70, "\n", true))[0]); ?>
Another idea: does this also happen when you use only htmlspecialchars() without substr()? That is just a suggestion in case my other two ideas do not help.

Try using this custom function I got from DataLife Engine CMS:
function dle_substr($str, $start, $length, $charset = "utf-8") {
    if (strtolower($charset) == "utf-8") return iconv_substr($str, $start, $length, "utf-8");
    else return substr($str, $start, $length);
}
Use it like this:
<? echo htmlspecialchars(dle_substr($row['title'],0,70)); ?>

Related

PHP - Re-format number with commas

There are several topics with pretty similar questions, but for my specific case I failed to find an answer.
I have a number, such as:
4,063,500.00
I need it reformatted as
4063500
Is there any built-in functionality in PHP or Laravel that does this? Thanks in advance.

Your original number is treated as a string because of the commas. So at a minimum you need to remove them before calling intval(), which truncates the decimal part:
<?php
$num = '4,063,500.00';
echo intval(str_replace(',', '', $num));
And the output is:
4063500
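If you would rather round than truncate (e.g. for 1,234.56), here is a sketch that strips the commas and lets number_format() print the value without separators. It assumes US-style formatting (comma for thousands, dot for decimals); `plain_integer()` is a hypothetical name.

```php
<?php
// Hypothetical helper: "4,063,500.00" -> "4063500", rounding any decimals.
function plain_integer(string $num): string
{
    // Remove the thousands separators, then parse as a float.
    $value = (float) str_replace(',', '', $num);
    // 0 decimals, '.' as decimal point (unused here), '' as thousands separator.
    return number_format($value, 0, '.', '');
}

echo plain_integer('4,063,500.00'), "\n"; // 4063500
echo plain_integer('1,234.56'), "\n";     // 1235 (intval() would give 1234)
```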

You can use a regular expression to remove every character except digits.
For example, removing the extra characters from the number below (keeping only the part before the decimal point):
$num = '4,063,500.00';
$integer_part = explode('.', $num)[0]; // also works when there is no decimal point
$filtered_num = preg_replace("/[^0-9]/", "", $integer_part);
echo $filtered_num;
Output: 4063500

Normalize Name-Surname strings: PHP+REGEX (Spanish chars- UTF8)

I have strings containing a name and surname which I need to normalize with a function so they look like:
Name Surname (I can receive strings like NAME SURNAME, Name SURNAME, etc.)
I've found this snippet:
echo nameize("HÉCTOR MAÑAÇ");
function nameize($str, $a_char = array("'", "-", " ")) {
    //$str contains the complete raw name string
    //$a_char is an array containing the characters we use as separators for capitalization. If you don't pass anything, there are three in there as default.
    $string = strtolower($str);
    foreach ($a_char as $temp) {
        $pos = strpos($string, $temp);
        if ($pos) {
            //we are in the loop because we found one of the special characters in the array, so lets split it up into chunks and capitalize each one.
            $mend = '';
            $a_split = explode($temp, $string);
            foreach ($a_split as $temp2) {
                //capitalize each portion of the string which was separated at a special character
                $mend .= ucfirst($temp2) . $temp;
            }
            $string = substr($mend, 0, -1);
        }
    }
    return ucfirst($string);
}
It works pretty well, but, as you can see by testing this exact example, it doesn't handle Spanish characters (UTF-8). I've tried mb_regex_encoding("UTF-8"); mb_internal_encoding("UTF-8");, UTF-8 headers, etc., but I can't make it work with the "special" Spanish characters.
Any suggestion?

I can't see where you use the multibyte string functions.
Maybe this would be convenient for your needs:
echo mb_convert_case("HÉCTOR MAÑAÇ", MB_CASE_TITLE, "UTF-8");
output:
Héctor Mañaç
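If you want to keep the question's explicit separators (apostrophe, hyphen, space) rather than rely on how MB_CASE_TITLE decides word boundaries, here is a sketch of a multibyte-safe rewrite using mb_strtolower() and mb_strtoupper(). It assumes UTF-8 input; `ucfirst_utf8()` and `nameize_mb()` are hypothetical names, not built-ins.

```php
<?php
// Hypothetical helper: uppercase the first character of a UTF-8 string.
function ucfirst_utf8(string $s): string
{
    if ($s === '') {
        return $s;
    }
    $first = mb_strtoupper(mb_substr($s, 0, 1, 'UTF-8'), 'UTF-8');
    return $first . mb_substr($s, 1, null, 'UTF-8');
}

// Hypothetical multibyte-safe variant of nameize() from the question.
function nameize_mb(string $str, array $separators = ["'", '-', ' ']): string
{
    // Lowercase the whole name first (É -> é, Ñ -> ñ, Ç -> ç).
    $string = mb_strtolower($str, 'UTF-8');
    foreach ($separators as $sep) {
        // The separators are ASCII, so byte-based explode() cannot split
        // a multibyte character; capitalize every chunk and join it back.
        $parts = explode($sep, $string);
        $string = implode($sep, array_map('ucfirst_utf8', $parts));
    }
    return $string;
}

echo nameize_mb("HÉCTOR MAÑAÇ"), "\n";   // Héctor Mañaç
echo nameize_mb("o'neil-garcía"), "\n";  // O'Neil-García
```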

Your function works fine for the given example too. Please check your file's encoding; it must be UTF-8. You can check it in Notepad++.

PHP arabic text compare using strpos

I have an Arabic keyword in a MySQL table like
&#1591; &#1610; &#1585;&#1575;&#1606
// The value above is written as HTML numeric character references (&#...;), which render as Arabic text.
MySQL table collation: utf8_general_ci
I am getting some strings from external resources, for example Twitter.
I would like to match the keyword against the tweet I am getting.
$tweet = 'وينج وأداسي الاماراتية توقعان اتفاقية تعاون لتوفير أنظمة الطائرات بدون طيا';
$keyword = '&#1591; &#1610; &#1585;&#1575;&#1606'; //From db
$status = strpos($tweet, $keyword);
$status is always false.
I have tried utf8_encode(), utf8_decode(), and mb_strpos() without any luck.
I know I need to convert both strings to one common format before comparing them, but which format do I need to convert to?
Please help me with this.

As Arabic characters are encoded as multibyte sequences, you should use functions that support them: grapheme_strpos and mb_strpos (in that order of preference).
Using them instead of plain old strpos should do the trick.
Also, keep in mind that you may have to check for their availability before using them, as not all hosting environments have the intl (grapheme_*) and mbstring (mb_*) extensions enabled:
if (function_exists('grapheme_strpos')) {
    $pos = grapheme_strpos($tweet, $keyword);
} elseif (function_exists('mb_strpos')) {
    $pos = mb_strpos($tweet, $keyword);
} else {
    $pos = strpos($tweet, $keyword);
}
And last but not least, check the docs for the additional arguments these functions take, such as the encoding of the strings.
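One more thing worth ruling out: if the keyword really is stored as numeric character references (&#1591;...) while the tweet is raw UTF-8, no strpos variant will match, because the two strings contain different bytes. Below is a sketch that decodes the references first. It assumes the stored value has no spaces between the references; `contains_keyword()` and the sample strings are hypothetical. Once both sides are UTF-8, plain strpos() would also find the match; mb_strpos() just reports the position in characters instead of bytes.

```php
<?php
// Hypothetical helper: does the UTF-8 tweet contain a keyword that the DB
// stores as HTML numeric character references?
function contains_keyword(string $tweet, string $keywordFromDb): bool
{
    // "&#1591;&#1610;..." -> the actual UTF-8 Arabic letters
    $keyword = html_entity_decode($keywordFromDb, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    if ($keyword === '') {
        return false; // nothing to search for
    }
    return mb_strpos($tweet, $keyword, 0, 'UTF-8') !== false;
}

$keywordFromDb = '&#1591;&#1610;&#1585;&#1575;&#1606;'; // hypothetical DB value
$tweet = 'رحلة طيران جديدة';                              // hypothetical tweet

var_dump(contains_keyword($tweet, $keywordFromDb));        // bool(true)
```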

PHP - Substring after X characters with special-characters

Sorry for the title, I really didn't know how to phrase this...
I often have a string that needs to be cut after X characters; my problem is that this string often contains special characters (HTML entities) like: &egrave;
So I'm wondering: is there a way in PHP to know, without transforming my string, whether I am cutting in the middle of a special char?
Example
This is my string with a special char : è - and I want it to cut in the middle of the "è" but still keeping the string intact
So right now my result with substr would be:
This is my string with a special char : &egra
but I want something like this:
This is my string with a special char : è

The best thing to do here is to store your string as UTF-8 without any HTML entities, and use the mb_* family of functions with UTF-8 as the encoding.
But if your string is ASCII or ISO-8859-1/Windows-1252, you can use the special HTML-ENTITIES encoding of the mbstring library (note that this pseudo-encoding is deprecated as of PHP 8.2):
$s = 'This is my string with a special char : è - and I want it to cut in the middle of the "è" but still keeping the string intact';
echo mb_substr($s, 0, 40, 'HTML-ENTITIES');
echo mb_substr($s, 0, 41, 'HTML-ENTITIES');
However, if your underlying string is UTF-8 or some other multibyte encoding, using HTML-ENTITIES is not safe! This is because HTML-ENTITIES really means "win1252 with high-bit characters as html entities". This is an example of where this can go wrong:
// Assuming that é is in utf8:
mb_substr('é ', 0, 2, 'HTML-ENTITIES') === 'Ã©'
// should be 'é '
When your string is in a multibyte encoding, you must instead convert all html entities to a common encoding before you split. E.g.:
$strings_actual_encoding = 'UTF-8';
$s_noentities = html_entity_decode($s, ENT_QUOTES, $strings_actual_encoding);
$s_trunc_noentities = mb_substr($s_noentities, 0, 41, $strings_actual_encoding);

The best solution would be to store your text as UTF-8 instead of as HTML entities. Other than that, if you don't mind the count being off (&egrave; counts as one character instead of 8), then the following snippet should work:
<?php
$string = 'This is my string with a special char : è - and I want it to cut in the middle of the "è" but still keeping the string intact';
$cut_string = htmlentities(mb_substr(html_entity_decode($string, ENT_QUOTES, 'UTF-8'), 0, 45, 'UTF-8'), ENT_QUOTES, 'UTF-8') . "<br><br>";
echo $cut_string;
Note: If you use a different function to encode the text (e.g. htmlspecialchars()), then use that function instead of htmlentities(). If you use a custom function, then use another custom function that does the opposite of your new custom function instead of html_entity_decode() (and custom function instead of htmlentities()).

The longest HTML 4 named entity (&thetasym;) is 10 characters long, including the ampersand and semicolon (HTML5 defines much longer ones). If you intend to cut the string at X bytes, check bytes X-9 through X-1 for an ampersand. If the corresponding semicolon appears at byte X or later, cut the string after the semicolon instead of after byte X.
However, if you're willing to preprocess the string, Mike's solution will be more accurate because his cuts the string at X characters, not bytes.
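The byte check from the previous answer can be sketched like this, assuming no entity is longer than 10 bytes; `cut_outside_entity()` is a hypothetical name. A stray & whose ; happens to sit just past the cut point would be treated as an entity too.

```php
<?php
// Hypothetical helper: cut $text at $x bytes, but never inside an entity.
function cut_outside_entity(string $text, int $x, int $maxEntityLen = 10): string
{
    if (strlen($text) <= $x) {
        return $text; // already short enough
    }
    // Look for an ampersand in bytes X-9 .. X-1 (for the default length 10).
    $windowStart = max(0, $x - ($maxEntityLen - 1));
    $window = substr($text, $windowStart, $x - $windowStart);
    $amp = strrpos($window, '&');
    if ($amp !== false) {
        $ampPos = $windowStart + $amp;
        $semi = strpos($text, ';', $ampPos);
        // The entity closes at byte X or later: cut after its semicolon instead.
        if ($semi !== false && $semi >= $x && $semi - $ampPos < $maxEntityLen) {
            return substr($text, 0, $semi + 1);
        }
    }
    return substr($text, 0, $x);
}

echo cut_outside_entity('special char : &egrave; - and', 20), "\n"; // special char : &egrave;
```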

You can use html_entity_decode() first to decode all the HTML entities. Then split your string. Then htmlentities() to re-encode the entities.
$decoded_string = html_entity_decode($original_string);
// implement logic to split string here
// then for each string part do the following:
$encoded_string_part = htmlentities($split_string_part);

A somewhat brute-force solution that I'm not really happy with would be a PCRE expression. Let's say that you want to pass 80 characters and the longest possible HTML entity is 7 characters long:
$regex = '~^(.{73}([^&]{7}|.{0,7}$|[^&]{0,6}&[^;]+;))(.*)~mx';
// Note, this could return slightly shorter text
return preg_replace($regex, '$1', $text);
Just so you know:
.{73} - 73 characters
[^&]{7} - okay, we may fill it with anything that doesn't contain &
.{0,7}$ - keep in mind the possible end (this shouldn't be necessary because shorter text wouldn't match at all)
[^&]{0,6}&[^;]+; - up to 6 characters (you'd be at 79th), then & and let it finish
Something that seems much better, but requires a bit of playing with the numbers, is this:
function cut_at_entity_boundary($text, $N) {
    // Text is at most $N bytes long: nothing to cut :)
    if (strlen($text) <= $N) {
        return $text;
    }
    $head = substr($text, 0, $N);
    // Last & before the cut point
    $pos = strrpos($head, '&');
    // No & at all (no entities), a plain cut is safe
    if ($pos === false) {
        return $head;
    }
    // Last ; before the cut point (-1 if there is none)
    $end = strrpos($head, ';');
    if ($end === false) {
        $end = -1;
    }
    // Entity already closed (; is after &)
    if ($end > $pos) {
        return $head;
    }
    // We are inside an entity: find the first ; at or after the cut point
    $end = strpos($text, ';', $N);
    if ($end === false) {
        // Not valid HTML (entity never closed), do whatever you want
        return $head;
    }
    // Include the closing ;
    return substr($text, 0, $end + 1);
}
Check numbers, there may be +/-1 somewhere in indexes...

I think you would have to use a combination of strpos and strrpos to find the next and previous spaces, parse the text between the spaces, check that against a known list of special characters, and if it matches, extend your "cut" to the position of the next space. If you had a code sample of what you have now, we could give you a better answer.

php shorten amount of text displayed

I am trying to limit the amount of text/data being shown from MySQL, not with MySQL LIMIT but by limiting it on the actual page like most blogs do: after a certain point they just display "...more". I know there is a function to do this in PHP, but I can't remember its name. Could someone help me out?

if (strlen($text) > 1000) {
    $text = substr($text, 0, 1000) . ' Read more';
}
Be aware that this can cut words, and HTML tags, in half.

SELECT LEFT(content, 1000) FROM blog
If you load the entire content (for example 30,000 characters) and then call substr(), you are wasting memory just to show 1,000.

You could use a function like this (taken from http://brenelz.com/blog/creating-an-ellipsis-in-php/):
function ellipsis($text, $max=100, $append='…')
{
    if (strlen($text) <= $max) return $text;
    $out = substr($text,0,$max);
    if (strpos($text,' ') === FALSE) return $out.$append;
    return preg_replace('/\w+$/','',$out).$append;
}
Unlike a plain substr(), this won't cut a word in half.
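The function above counts bytes with strlen() and substr(), so on UTF-8 text it has the same problem as the first question on this page. Here is a sketch of a character-based variant, assuming UTF-8 input and the mbstring extension; `ellipsis_utf8()` is a hypothetical name, and the pattern uses \p{L}/\p{N} with the /u modifier so accented or non-Latin letters count as part of a word.

```php
<?php
// Hypothetical multibyte-aware variant of ellipsis(), assuming UTF-8 input.
function ellipsis_utf8(string $text, int $max = 100, string $append = '…'): string
{
    // Count characters, not bytes.
    if (mb_strlen($text, 'UTF-8') <= $max) {
        return $text;
    }
    $out = mb_substr($text, 0, $max, 'UTF-8');
    if (strpos($text, ' ') === false) {
        return $out . $append; // a single long word: hard cut
    }
    // Drop the trailing partial word (letters, digits, underscore in any script).
    return preg_replace('/[\p{L}\p{N}_]+$/u', '', $out) . $append;
}

echo ellipsis_utf8('Héctor Mañaç dice hola', 10), "\n"; // Héctor …
```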

There are a few ways to do it, the easiest probably being substr()
$short = substr($long, 0, $max_len);

string substr ( string $string , int $start [, int $length ] )
It accepts up to three arguments: the string you would like to trim, the start offset, and, optionally, the length of what you'd like returned. Offsets and lengths are counted in bytes, not characters.
http://php.net/manual/en/function.substr.php

Apply the wrap() function to get your shortened text, and replace "99" with the number of characters you want to limit it to.
function wrap($string) {
    $wstring = explode("\n", wordwrap($string, 99, "\n"));
    return $wstring[0];
}

<?
$position=14; // Define how many character you want to display.
$message="You are now joining over 2000 current";
$post = substr($message, 0, $position);
echo $post;
echo "...";
?>
This shows the first 14 characters of your message, followed by "...".
