Rendering � in PHP after using the substr() method [duplicate] - php

This question already has answers here:
Using PHP's substr() with special characters at the end results in question marks
(5 answers)
Closed 9 years ago.
I need to show only the first 180 characters of a text stored in a MySQL database, with a "read more" link for users who want to read the full text. So I read the whole text from MySQL in PHP and use the substr() function like this:
$some_text = substr($total_text, 0, 180);
Everything works, but a � character shows up at the end of some strings.
What is this and how can I fix it?

It sounds like you're working with multi-byte characters.
Try using mb_substr() instead:
$some_text = mb_substr($total_text, 0, 180);
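
As a rough sketch of the "read more" use case from the question, the truncation can be wrapped in a small helper that also reports whether anything was cut off. The function name excerpt() and its return shape are hypothetical, and the sketch assumes the mbstring extension is loaded and the text is UTF-8:

```php
<?php
// Hypothetical helper: returns the first $limit characters of $text and a flag
// telling the caller whether the text was shortened (so it can show "read more").
function excerpt(string $text, int $limit = 180): array
{
    // mb_strlen() and mb_substr() count characters, not bytes.
    $truncated = mb_strlen($text, 'UTF-8') > $limit;
    $short = $truncated ? mb_substr($text, 0, $limit, 'UTF-8') : $text;
    return [$short, $truncated];
}

// "\u{EF}" is i with diaeresis and "\u{E9}" is e with acute accent; escapes keep
// the example independent of the encoding this file is saved in.
[$some_text, $has_more] = excerpt("na\u{EF}ve caf\u{E9}", 4);
echo $some_text, $has_more ? ' ... read more' : '', "\n";
```

Passing the encoding explicitly avoids depending on the configured default encoding, which may not be UTF-8 on every server.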

I had this exact issue with a language translation project I've recently been working on.
Apart from altering the charsets in your database, you can try the following after your code above:
echo htmlentities($some_text, ENT_QUOTES, 'UTF-8');
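
One caveat worth checking before relying on this: escaping does not repair a character that substr() has already cut in half. The sketch below, using a made-up four-letter sample string, shows that htmlentities() returns an empty string for invalid UTF-8 unless ENT_SUBSTITUTE (or ENT_IGNORE) is passed:

```php
<?php
// "caf\u{E9}" is "cafe" with an accented e: 4 characters, 5 bytes in UTF-8.
$total_text = "caf\u{E9}";
$some_text  = substr($total_text, 0, 4); // byte-based cut: "caf" plus half a character

// Default flags: invalid UTF-8 input yields an empty string.
$plain = htmlentities($some_text, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE keeps the valid part and substitutes U+FFFD for the broken byte.
$substituted = htmlentities($some_text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($plain === '');                        // bool(true)
var_dump(substr($substituted, 0, 3) === 'caf'); // bool(true)
```

So the output is either empty or still contains a replacement character; cutting on character boundaries with mb_substr() remains the actual fix.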

Related

preg_replace UTF-8 doesn't work [duplicate]

This question already has answers here:
Matching Unicode letter characters in PCRE/PHP
(5 answers)
Closed 4 years ago.
I've got the following code which works fine on my offline test version but it fails on the online server.
$names = "dimitris giannIs micHalis";
echo preg_replace("/s\b/", "w", mb_convert_case($names, MB_CASE_TITLE, "UTF-8"));
The result I get is Dimitriw Gianniw Michaliw.
However, my real input is not English but UTF-8 text. If I run the example above as it is (in English) it works fine, so I'm guessing I'm doing something wrong with UTF-8.

Typically (but see the Edit below), you need to use the u modifier on your regex to make it work with UTF-8 characters, e.g.:
$words = "qθαεqθε γραεcισ cονσεcτε";
echo preg_replace("/ε\b/u", "α", mb_convert_case($words, MB_CASE_TITLE, "UTF-8"));
Output:
Qθαεqθα Γραεcισ Cονσεcτα
This example on rextester demonstrates the use of the u modifier (note that rextester doesn't support mb_convert_case but that doesn't really affect the result).
Edit
As was pointed out by @CasimiretHippolyte, it is possible to compile the PCRE extension (used by PHP for regex) to handle Unicode characters by default with the --enable-unicode-properties option. This may explain the difference between the results on the offline test version and the online server.
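
For a check that does not depend on how PCRE was built, here is a minimal sketch of what the u modifier changes: without it the pattern works on bytes, with it on UTF-8 code points. The single-letter subject is just an illustration:

```php
<?php
$epsilon = "\u{3B5}"; // Greek small letter epsilon, two bytes in UTF-8

// Without u, "." matches one byte, so a two-byte character is not "one character".
var_dump(preg_match('/^.$/', $epsilon));  // int(0)

// With u, pattern and subject are treated as UTF-8 and "." matches one code point.
var_dump(preg_match('/^.$/u', $epsilon)); // int(1)
```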

character encoding for mixed data [duplicate]

This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 8 years ago.
I'm having an issue with getting the correct character encoding for data being POSTed which is built up from multiple sources (I get the data as a single POST variable). I think they're not in the same character encoding...
For instance, take the symbol £. If I do nothing to the character encoding I get two results:
a = £ and b = £
I've tried using various configurations of iconv() like so;
$data = iconv('UTF-8', 'windows-1252//TRANSLIT', $_POST['data']);
The above results in a = £ and b = �
I've also tried utf8_encode()/utf8_decode() as well as html_entity_decode(), as I think there's a possibility that one of the pound symbols is being generated using htmlentities().
I've tried setting the character encoding in the header which didn't work. I just can't get both instances to work at the same time.
I'm not sure what to try next, any ideas?

I've managed to work around this issue: I found the content that was causing the problem (everything else was already in UTF-8) and ran only that content through utf8_encode().
This appears to work for the £ symbol. I've not found any other characters causing an issue so far.
Note that I am still using iconv() in conjunction with this.
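
The same idea can be sketched without utf8_encode(), which only converts from ISO-8859-1 and is deprecated as of PHP 8.2. The helper name to_utf8() is hypothetical, and the sketch assumes that the pieces can be handled separately before they are joined and that anything which is not valid UTF-8 is Windows-1252; neither assumption is confirmed by the question:

```php
<?php
// Hypothetical helper: leave valid UTF-8 untouched, otherwise treat the bytes
// as Windows-1252 (an assumption) and convert them to UTF-8.
function to_utf8(string $piece): string
{
    if (mb_check_encoding($piece, 'UTF-8')) {
        return $piece;
    }
    return mb_convert_encoding($piece, 'UTF-8', 'Windows-1252');
}

$a = to_utf8("\u{A3}" . '5'); // pound sign already in UTF-8 (bytes C2 A3)
$b = to_utf8("\xA3" . '5');   // pound sign as a single Windows-1252 byte (A3)
var_dump($a === $b);          // bool(true)
```

Detection by validity is only a heuristic: some Windows-1252 byte sequences happen to be valid UTF-8 too, so knowing each source's real encoding is more reliable when that is possible.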

Echoing Arabic text with substr() gives a ? mark at the end of the echoed data [duplicate]

This question already has answers here:
php substr() function with utf-8 leaves � marks at the end
(7 answers)
Closed 8 years ago.
I am using PHP and MySQL, and I want to echo part of the data from a row.
I used this command:
<? echo substr($row['text'],0,500); ?>
It gets 500 characters from the text column of $row, but a ? appears at the end. I am using Arabic text, which may be the reason.
Since Arabic letters are joined to each other, maybe the cut falls in the middle of a joined group: the group is left unfinished, and keeping it whole would take more than 500 characters, so I get the ?.
How can I fix it so a question mark does not appear at the end?

It’s definitely a UTF-8 encoding issue: substr() counts bytes, so it can cut a multi-byte character in half. The solution is to use mb_substr() instead of plain substr(), like so:
<? echo mb_substr($row['text'],0,500,"utf-8"); ?>
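
To see why the byte-based cut fails, the following sketch compares both functions on a short Arabic word (five letters, each two bytes in UTF-8). The sample word is only an illustration and the mbstring extension is assumed to be available:

```php
<?php
// An Arabic greeting written with escapes so the example does not depend on file encoding.
$text = "\u{645}\u{631}\u{62D}\u{628}\u{627}";

var_dump(strlen($text));             // int(10), counted in bytes
var_dump(mb_strlen($text, 'UTF-8')); // int(5), counted in characters

$byte_cut = substr($text, 0, 3);             // one and a half characters
$char_cut = mb_substr($text, 0, 3, 'UTF-8'); // three whole characters

var_dump(mb_check_encoding($byte_cut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($char_cut, 'UTF-8')); // bool(true)
```

The dangling byte at the end of $byte_cut is what the browser renders as ? or �.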

preg_match with cyrillic text [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to match Cyrillic characters with a regular expression
I have a simple PHP script which uses preg_match() to compare a string against some Cyrillic text inside a variable (e.g. $var = 'страница').
However, when I put the Cyrillic text into the variable, it shows up as ???????? in my code:
$var1 = '/?????????????/';
I get the following warning when I run the script:
preg_match(): Compilation failed: nothing to repeat at offset 0
Can anyone suggest a solution?
thanks very much.

Change the encoding of your script, or of all the project's source files, to UTF-8 (for example, in your IDE's settings).
Use the u modifier for Unicode:
preg_match('/abcdef/u', $some_string);
It may also be caused by a wrong code page: which code page does your interpreter use, and which one does the database connection use (if there is one)?
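
Both points can be sketched in a few lines. The first call reproduces the compilation failure: once the editor has turned the Cyrillic letters into literal question marks, the pattern starts with a quantifier that has nothing to repeat. The remaining calls assume the script really contains UTF-8 (written here with escapes) and that PCRE has Unicode property support, as the PCRE bundled with PHP does:

```php
<?php
// A pattern made of "?" characters fails to compile; preg_match() returns false.
// The @ only hides the warning for this demonstration.
var_dump(@preg_match('/????????/', 'x')); // bool(false)

// The word from the question, written with escapes to avoid file-encoding problems.
$var = "\u{441}\u{442}\u{440}\u{430}\u{43D}\u{438}\u{446}\u{430}";

// The u modifier makes pattern and subject UTF-8; \p{Cyrillic} matches Cyrillic letters.
var_dump(preg_match('/^\p{Cyrillic}+$/u', $var)); // int(1)

// Matching the literal word inside a longer string.
var_dump(preg_match('/' . preg_quote($var, '/') . '/u', 'x ' . $var)); // int(1)
```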

Unknown character � after importing excel to MySQL, how to avoid it? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Problem in utf-8 encoding PHP + MySQL
I've imported about 1000 records into MySQL from an Excel file. But now I'm seeing � in some of the text, apparently where there used to be double quotes.
How can I avoid this while importing data?
Can I use the str_replace() function to handle this issue when printing the data on a web page?

Use preg_replace() to do a regex replacement of all unrecognized characters.
Example:
$data = preg_replace("/[^a-zA-Z0-9]/", "", $data);
This example removes every character that is not alphanumeric (anything other than a-z, A-Z, 0-9), including spaces and punctuation.
http://php.net/manual/en/function.preg-replace.php

If your database is simple enough (no serialised values and not gigabytes in size), you could export it entirely (e.g. using phpMyAdmin), open the dump in a text editor, do a search-and-replace, and import it back.

$original_string = str_replace('“', '"', $original_string);
There are a few characters Word does this with, so you will probably also want to do:
$original_string = str_replace("‘", "'", $original_string);
If you see other characters causing the same issue, you can open the document in Word, copy and paste the offending character into your editor, and do a similar replacement.
Since you are most likely looking to replace each character with an equivalent one, you probably do not want to use a regex as suggested in another answer. str_replace() is faster than preg_replace() for this type of use.
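
Several such replacements can be done in one call by passing arrays to str_replace(). The map and the function name straighten_quotes() below are illustrative, not exhaustive, and the sketch assumes the curly quotes are stored intact as UTF-8; if the database already contains a literal �, the original character is lost and cannot be recovered this way:

```php
<?php
// Illustrative map of Word-style "smart" punctuation to ASCII equivalents.
$map = [
    "\u{201C}" => '"', // left double quotation mark
    "\u{201D}" => '"', // right double quotation mark
    "\u{2018}" => "'", // left single quotation mark
    "\u{2019}" => "'", // right single quotation mark
];

// Hypothetical helper: applies every replacement in the map in one pass.
function straighten_quotes(string $text, array $map): string
{
    return str_replace(array_keys($map), array_values($map), $text);
}

echo straighten_quotes("\u{201C}It\u{2019}s here\u{201D}", $map), "\n"; // "It's here"
```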
