preg_match with cyrillic text [duplicate] - php

Possible Duplicate:
How to match Cyrillic characters with a regular expression
I have a simple PHP script that uses preg_match to compare a string against some Cyrillic text stored in a variable (e.g. $var = 'страница').
However, when I enter the Cyrillic text into the variable, it shows up as ???????? in my code:
$var1 = '/?????????????/';
I get the following warning when I run the script:
preg_match(): Compilation failed: nothing to repeat at offset 0
Can anyone suggest a solution?
Thanks very much.

Change the encoding of your scripts, or of all project source files, to UTF-8 (for example, in your IDE's settings).

Use the u modifier for Unicode:
preg_match('/abcdef/u', $some_string);
It may also be caused by a mismatched code page: which encoding does your interpreter use, and which encoding does the database connection use (if there is one)?
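To illustrate the two answers above, here is a minimal sketch, assuming the script file itself is saved as UTF-8. The warning in the question appears because every ? in the mangled pattern is a regex quantifier, and the first one has nothing before it to repeat. The variable names ($subject, $broken, $ok) and the sample sentence are made up for this example.

```php
<?php
// Sketch; assumes this file is saved as UTF-8 (no BOM).
$subject = 'Это страница';

// What PCRE sees once the editor has replaced the letters with "?":
// each "?" is a quantifier, and the first one has nothing to repeat.
$broken = @preg_match('/????????/', $subject);   // @ hides the warning
var_dump($broken);                                // bool(false)

// With the Cyrillic literal intact and the u modifier, the match succeeds.
$ok = preg_match('/страница/u', $subject);
var_dump($ok);                                    // int(1)
```

If the second call still fails, the file was most likely not saved as UTF-8.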

Related

My preg_match only works with utf8_encode [duplicate]

This question already has answers here:
Difference between * and + regex
(7 answers)
Closed 2 years ago.
My PHP code receives a $request from an AJAX call. I am able to extract the $name from this parameter. As this name is in German, the allowed characters also include ä, ö and ü.
I want to validate $name = "Bär" with preg_match. I am sure that the ä arrives in my PHP code as a UTF-8 encoded string. But if I do this
preg_match('/^[a-zA-ZäöüÄÖÜ]*$/', $name);
I get false, although it should be true. I only get true if I do
preg_match(utf8_encode('/^[a-zA-ZäöüÄÖÜ]*$/'), $name);
Can someone explain this to me, and also how I can set PHP to globally encode every string as UTF-8?
PHP strings do not have any specific character encoding. String literals contain the bytes that the interpreter finds between the quotes in the source file.
You have to make sure that the text editor or IDE that you are using is saving files in UTF-8. You'll typically find the character encoding in the settings menu.
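One way to see what the answer above means is to look at the raw bytes. The sketch below writes the bytes out with \x escapes so that it does not depend on how this example file is saved; the variable names are made up for illustration. It presumably mirrors what utf8_encode() did to the pattern in the question, if the script was saved as ISO-8859-1.

```php
<?php
// "ä" as stored by a UTF-8 editor vs. an ISO-8859-1 (Latin-1) editor.
$utf8   = "\xC3\xA4";
$latin1 = "\xE4";

echo bin2hex($utf8), "\n";    // c3a4
echo bin2hex($latin1), "\n";  // e4

// A lone Latin-1 byte is not valid UTF-8.
var_dump(mb_check_encoding($latin1, 'UTF-8'));  // bool(false)

// Converting the Latin-1 byte yields the two-byte UTF-8 sequence.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $utf8);  // bool(true)
```

Running bin2hex() on a literal from your own script is a quick check of which encoding the editor actually used.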
Your regular expression needs two changes. The * quantifier also accepts an empty string; use + to require one or more characters. And if your PHP code is saved as UTF-8 (without BOM), the u flag is required so that the pattern and the subject are treated as Unicode.
$name = "Bär";
$result = preg_match('/^[a-zA-ZäöüÄÖÜ]+$/u', $name);
var_dump($result); //int(1)
To cover all German special characters, ß is still missing from the list.
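A minimal sketch of the pattern with ß added, assuming the file is saved as UTF-8. The test strings and the $pattern variable are made up for illustration.

```php
<?php
// Sketch; assumes this file is saved as UTF-8 (no BOM).
$pattern = '/^[a-zA-ZäöüÄÖÜß]+$/u';

var_dump(preg_match($pattern, 'Straße')); // int(1)
var_dump(preg_match($pattern, 'Bär'));    // int(1)
var_dump(preg_match($pattern, ''));       // int(0), + requires at least one character
var_dump(preg_match($pattern, 'Bär1'));   // int(0), digits are not in the class
```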

preg_replace UTF-8 doesn't work [duplicate]

This question already has answers here:
Matching Unicode letter characters in PCRE/PHP
(5 answers)
Closed 4 years ago.
I've got the following code which works fine on my offline test version but it fails on the online server.
$names = "dimitris giannIs micHalis";
echo preg_replace("/s\b/", "w", mb_convert_case($names, MB_CASE_TITLE, "UTF-8"));
The result I get is Dimitriw Gianniw Michaliw.
But my real data contains UTF-8 (non-English) words instead of English ones. If I run the example above as it is (in English) it works fine, so I'm guessing I'm doing something wrong with UTF-8.
Typically (but see the note below the Edit), you need to use the u modifier on your regex to make it work with UTF-8 characters. e.g.
$words = "qθαεqθε γραεcισ cονσεcτε";
echo preg_replace("/ε\b/u", "α", mb_convert_case($words, MB_CASE_TITLE, "UTF-8"));
Output:
Qθαεqθα Γραεcισ Cονσεcτα
This example on rextester demonstrates the use of the u modifier (note that rextester doesn't support mb_convert_case but that doesn't really affect the result).
Edit
As was pointed out by @CasimiretHippolyte, it is possible to compile the PCRE extension (used by PHP for regex) to handle Unicode characters by default with the --enable-unicode-properties option. This may explain the difference between the results on the offline test version and the online server.
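The effect of the u modifier on \b can be checked in isolation. This is a sketch that assumes the file is saved as UTF-8; the variable names are made up. The result without u is deliberately not stated, because it depends on how the PCRE build in use classifies the individual bytes.

```php
<?php
// Sketch; assumes this file is saved as UTF-8 (no BOM).
$word = 'cονσεcτε';

// With u, ε is one character and \b is evaluated between characters,
// so only the final ε (followed by the end of the string) is replaced.
$withU = preg_replace('/ε\b/u', 'α', $word);
echo $withU, "\n"; // cονσεcτα

// Without u the pattern is matched byte by byte; compare the two results
// on your own server to see whether they differ.
$withoutU = preg_replace('/ε\b/', 'α', $word);
var_dump($withoutU === $withU);
```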

Writing source code in PHP without special characters [duplicate]

This question already has answers here:
Unicode character in PHP string
(8 answers)
Closed 4 years ago.
Is there a way to print special characters in PHP using only ASCII characters in the source code?
For example, in JavaScript, we can use \u00e1 in the middle of text.
In Java we can use \u2202 for example.
And in PHP? How can I use it?
I don't want to include special chars in my source code.
I found three ways to do this.
PHP documentation: http://php.net/manual/en/language.types.string.php#language.types.string.syntax.double
A good explanation in Portuguese: https://pt.stackoverflow.com/questions/293500/escrevendo-c%C3%B3digo-em-php-sem-caracteres-especiais
Syntax added in PHP 7:
\u{[0-9A-Fa-f]+}
the sequence of characters matching the regular expression is a Unicode codepoint,
which will be output to the string as that codepoint's UTF-8 representation
examples:
<?php
echo "\u{00e1}\n";
echo "\u{2202}\n";
echo "\u{aa}\n";
echo "\u{0000aa}\n";
echo "\u{9999}\n";
Syntax for PHP 7 and older PHP versions:
\x[0-9A-Fa-f]{1,2}
the sequence of characters matching the regular expression
is a character (byte) in hexadecimal notation
examples:
<?php
echo "\xc3\xa1\n";   // the two UTF-8 bytes of á, works in any PHP version
echo "\u{00e1}\n";   // the same character, PHP 7+ only, shown for comparison
Using integer-to-byte conversion functions:
<?php
printf('%c%c', 0xC3, 0xA1);
echo chr(0xC3) . chr(0xA1);
printf() Extended Unicode Characters?
http://phptester.net/
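The approaches above can be compared side by side. The sketch below also adds two alternatives that are not part of the original answer: mb_chr() (available from PHP 7.2 with the mbstring extension) and json_decode() on a JSON string literal. All variable names are made up for illustration.

```php
<?php
// All five expressions produce the UTF-8 bytes of "á" (U+00E1)
// using only ASCII characters in the source.
$a = "\u{00e1}";                 // PHP 7+ codepoint escape
$b = "\xc3\xa1";                 // raw UTF-8 bytes, any PHP version
$c = chr(0xC3) . chr(0xA1);      // bytes built with chr()
$d = mb_chr(0x00E1, 'UTF-8');    // PHP 7.2+, mbstring (not in the original answer)
$e = json_decode('"\u00e1"');    // JSON escape decoded to UTF-8 (not in the original answer)

var_dump($a === $b && $b === $c && $c === $d && $d === $e); // bool(true)
echo bin2hex($a), "\n";                                      // c3a1
```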

Rendering � in PHP after using the substr() method [duplicate]

This question already has answers here:
Using PHP's substr() with special characters at the end results in question marks
(5 answers)
Closed 9 years ago.
I need to show just the first 180 characters of a text stored in a MySQL database, with a "read more" link for users who want to read the full text. So I read the whole text from MySQL and use the substr() function like this:
$some_text = substr($total_text, 0, 180);
Everything is fine, but at the end of some strings the character � shows up.
What is this and how can I fix it?
It sounds like you're working with multi-byte characters.
Try using mb_substr() instead:
$some_text = mb_substr($total_text, 0, 180);
I had this exact issue with a language translation project I've recently been working on.
Apart from altering the charsets in your database, you can try the following after your code above:
echo htmlentities($some_text, ENT_QUOTES, 'UTF-8');
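To show why the � appears, here is a minimal sketch with a made-up three-character string. substr() counts bytes, so it can cut a multi-byte character in half, while mb_substr() counts characters. The bytes are written with \x escapes so the example does not depend on the file's encoding.

```php
<?php
$text = "B\xC3\xA4r";                    // "Bär" in UTF-8: 3 characters, 4 bytes

$cut = substr($text, 0, 2);              // "B" plus only the first byte of ä
var_dump(mb_check_encoding($cut, 'UTF-8'));  // bool(false), browsers render this as �

$ok = mb_substr($text, 0, 2, 'UTF-8');   // "B" plus the complete ä
var_dump($ok === "B\xC3\xA4");           // bool(true)
```

Passing the encoding explicitly to mb_substr() avoids relying on the server's internal encoding setting.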

Unknown character � after importing excel to MySQL, how to avoid it? [duplicate]

Possible Duplicate:
Problem in utf-8 encoding PHP + MySQL
I've imported about 1000 records into MySQL from an Excel file. But now I'm seeing � in some of the texts. It seems these were originally double quotes.
How can I avoid this while importing data?
Can I use the str_replace() function to handle this issue when printing the data on a web page?
Use preg_replace to do a regex replacement of all unrecognized characters.
Example:
$data = preg_replace("/[^a-zA-Z0-9]/", "", $data);
This example removes every character that is not alphanumeric (anything other than a-z, A-Z, 0-9), including spaces and punctuation.
http://php.net/manual/en/function.preg-replace.php
If your database is simple enough (no serialised values and not gigabytes in size), you could export it entirely (e.g. using phpMyAdmin), open the dump in a text editor, do a search-and-replace and import it back.
$original_string = str_replace('“', '"', $original_string);
There are a few characters Word does this with, so you will probably also want to do:
$original_string = str_replace("‘", "'", $original_string);
If you see other characters causing the same issue, you can open the document in Word, copy and paste the offending character into your editor, and do a similar replacement.
Since you are most likely looking to replace the character with an equivalent version, you probably do not want to use a regex as suggested in another answer. str_replace is also faster than preg_replace for this type of use.
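The replacements above can be combined into a single call, since str_replace() accepts arrays. This is a sketch with a made-up sample string; the "\u{...}" escapes need PHP 7 or later and keep the source file pure ASCII. It only helps when the typographic quotes are stored intact as UTF-8; if the database already contains a literal �, the original character is lost and cannot be recovered this way.

```php
<?php
// Hypothetical sample: curly double and single quotes as Word produces them.
$original_string = "\u{201C}Hello\u{201D}, it\u{2019}s \u{2018}fine\u{2019}";

// Curly quotes and their plain ASCII equivalents, in matching order.
$search  = ["\u{201C}", "\u{201D}", "\u{2018}", "\u{2019}"];
$replace = ['"',        '"',        "'",        "'"];

$clean = str_replace($search, $replace, $original_string);
echo $clean, "\n"; // "Hello", it's 'fine'
```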
