Why is a Chinese character replaced incorrectly by preg_replace? [duplicate] - php

This question already has answers here:
Matching Unicode letter characters in PCRE/PHP
(5 answers)
Closed 5 years ago.
My code is:
preg_replace('/[中]/', '1', '中,博文大,精中深');
Why is the result:
111,博文大,精111深
Each '中' should be replaced by a single 1, but it is replaced by three 1s instead.
Any help? Thanks

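First of all, please read this article about Unicode characters in regular expressions.
Next, you may need this article about modifiers. I think you need the u modifier in your regexp. Without it, PCRE works on bytes, and in UTF-8 the character 中 is three bytes long, so [中] becomes a class of three separate bytes and each of those bytes is replaced by 1.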
preg_replace('/[中]/u', '1', '中,博文大,精中深');
Please also read the comments in the modifiers article for more examples.
Also, for simple replacements like the one above, you can use str_replace.
str_replace('中', '1', '中,博文大,精中深');
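A small sketch of why the original pattern gives "111", assuming the PHP source file is saved as UTF-8 (as the strings in the question suggest). It prints the byte length of 中 and compares the three calls from this answer.

```php
<?php
// Assumes this file is saved as UTF-8.
// In UTF-8, 中 (U+4E2D) is encoded as three bytes: E4 B8 AD.
$subject = '中,博文大,精中深';

echo strlen('中'), "\n";   // 3 (bytes, not characters)
echo bin2hex('中'), "\n";  // e4b8ad

// Without /u, [中] is a class of three single bytes,
// so each byte of every 中 is replaced on its own.
echo preg_replace('/[中]/', '1', $subject), "\n";   // 111,博文大,精111深

// With /u, pattern and subject are read as UTF-8 code points.
echo preg_replace('/[中]/u', '1', $subject), "\n";  // 1,博文大,精1深

// str_replace() replaces the whole byte sequence, no regex needed.
echo str_replace('中', '1', $subject), "\n";        // 1,博文大,精1深
```

None of the other characters in the sample string contain the bytes E4, B8 or AD, which is why only the two 中 are affected.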

Related

Regex PHP part of string [duplicate]

This question already has answers here:
Variable-length lookbehind-assertion alternatives for regular expressions
(5 answers)
Closed 4 years ago.
I can't get my regular expression to work in PHP. It works in JavaScript (Vue.js):
(?<=.+: )(.*)
I have this string:
NL: abcdef
and I would like to get
abcdef
Can someone please tell me what i am doing wrong?
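PCRE does not support an unbounded variable-length lookbehind such as (?<=.+: ), which JavaScript accepts. There are many ways to solve this in PHP/PCRE; one is to skip the preceding text with \K, which drops everything matched so far from the reported match: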
[^:]+: \K(.*)
If you can add an anchor to the beginning of the string, even better: ^[^:]+: \K(.*)
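A minimal sketch of the anchored \K pattern with preg_match(), using the sample string from the question. After \K, the full match ($m[0]) starts right after ": ", and the capture group holds the same text.

```php
<?php
$input = 'NL: abcdef';

// ^[^:]+: consumes "NL: ", then \K resets the start of the reported match,
// so $m[0] contains only what follows.
if (preg_match('/^[^:]+: \K(.*)/', $input, $m)) {
    echo $m[0], "\n"; // abcdef
}
```

Dropping \K and reading $m[1] from '/^[^:]+: (.*)/' gives the same result; \K is handy when you want the whole match to be the wanted text.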

how to keep special characters using preg_replace in php [duplicate]

This question already has answers here:
Is There a Way to Match Any Unicode Alphabetic Character?
(2 answers)
Closed 4 years ago.
I am trying to use the preg_replace function and it works as expected, except that it also removes accented letters such as Ö. How can I keep Ö?
$string='GÖTEBORG-SEASON-1';
echo $str=preg_replace('/[^A-Za-z-_]/', '', $string);
It outputs GTEBORG-SEASON- (Ö is missing), but I expect GÖTEBORG-SEASON-
Thank you.
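I think I have solved it. I need to match any Unicode letter with \p{L}, use the u modifier, and put the hyphen last in the class so it is always a literal (a hyphen right after \p{L} may be rejected as an invalid range by PCRE2 in PHP 7.3 and later):
preg_replace('/[^\p{L}_-]/u', '', $string);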
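A short comparison on the string from the question, assuming a UTF-8 source file. The hyphen is placed last in both classes so it is always read as a literal.

```php
<?php
$string = 'GÖTEBORG-SEASON-1';

// ASCII-only class (same set as in the question): both UTF-8 bytes
// of Ö (C3 96) are stripped, along with the digit.
$ascii = preg_replace('/[^A-Za-z_-]/', '', $string);

// \p{L} with /u keeps any Unicode letter, including Ö.
$unicode = preg_replace('/[^\p{L}_-]/u', '', $string);

echo $ascii, "\n";   // GTEBORG-SEASON-
echo $unicode, "\n"; // GÖTEBORG-SEASON-
```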

PHP isolate character surrounded by special character [duplicate]

This question already has answers here:
Extract a single (unsigned) integer from a string
(23 answers)
Closed 6 years ago.
I have the following string:
$db_string = '/var/www/html/1_wlan_probes.db';
I want to isolate the number so that only the following is left:
$db_string = '1';
So far I haven't found a simple solution, since the number is random and could be any positive number. I have tried strstr, substr and custom functions, but none produce what I am looking for, or I'm simply overlooking something really simple.
Thanks in advance
You should use the preg_match() function:
$db_string = '/var/www/html/1_wlan_probes.db';
preg_match('/html\/(\d+)/', $db_string, $matches);
print_r($matches[1]); // 1
html\/(\d+) captures all the digits that come right after html/.
It does not matter how long the number is: \d+ matches one or more digits.
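If the directory can change, a variant that looks only at the file name may be more robust. This is a sketch under one assumption not stated in the question: the number always starts the file name and is followed by "_". The helper name extractLeadingNumber is hypothetical.

```php
<?php
// Hypothetical helper: assumes the number starts the file name and is
// followed by "_", e.g. ".../1_wlan_probes.db".
function extractLeadingNumber(string $path): ?string
{
    // basename() drops the directory, so the pattern no longer depends on "html/".
    if (preg_match('/^(\d+)_/', basename($path), $matches)) {
        return $matches[1];
    }
    return null; // no leading number found
}

echo extractLeadingNumber('/var/www/html/1_wlan_probes.db'), "\n"; // 1
echo extractLeadingNumber('/srv/data/2048_wlan_probes.db'), "\n";  // 2048
```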

How to prevent zalgo text using php [duplicate]

This question already has answers here:
Remove special characters that mess with formating
(2 answers)
Closed 7 years ago.
I have some problems with Zalgo text on my imageboard.
Texts like the one below mess up my imageboard. Is there a way to block these characters, or to "fix" and clean up the text?
Example text:
ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
I tried to use this solution:
$cleanMessage = preg_replace("/[^\x20-\xAD\x7F]/", "", $input_lines);
Taken from here: Remove special characters that mess with formating
But it only works for Latin characters.
Can anyone help me?
This regular expression removes every combining mark (Unicode category \p{M}, the "superscript" symbols that Zalgo text stacks on letters) from the $text variable:
$text = preg_replace("~[\p{M}]~uis","", $text);
If $text contains a character with a combining mark, for example กิ, this regex removes the mark and $text will contain just ก.
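A small sketch of the first regex wrapped in a function (the name stripAllMarks is only for illustration). The i and s flags are dropped because they do not affect \p{M}.

```php
<?php
// Removes every combining mark (Unicode category M).
function stripAllMarks(string $text): string
{
    // The u modifier makes PCRE read the subject as UTF-8 code points.
    return preg_replace('~\p{M}~u', '', $text);
}

echo stripAllMarks('กิ'), "\n";                // ก
echo stripAllMarks("e\u{0301}\u{0302}"), "\n"; // e
```

Note that a precomposed letter such as é (U+00E9) is a single letter with no separate combining mark, so this regex leaves it untouched; only decomposed or stacked marks are removed.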
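I then changed this regex so that it does not strip every phonetic mark:
$text = preg_replace("~(?:[\p{M}]{1})([\p{M}])+?~uis","", $text);
Because the lazy +? makes the group match exactly one mark, this pattern is equivalent to ~\p{M}{2}~u: it removes combining marks two at a time. A lone mark on a character is kept, while a stack of marks is thinned out in pairs (an odd-sized stack leaves one mark behind).
Use it if you want to keep German or other languages whose letters legitimately carry a single mark.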
For example, this regex transforms this word:
͐̈ͩ̎Zͮ͌ͦ͆ͦͤÃ̉͛̄ͭ̈̚LͫG̉̋͂̉Oͨ͌̋͗!
into this: ZÄLͫGO!
I hope second regex will help you.
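A sketch comparing the second regex with an alternative that is not from the answer: keeping the first mark after each base character and dropping the rest of the stack. Both function names are hypothetical. bin2hex() is used so the invisible marks are visible in the output.

```php
<?php
// The answer's second pattern: removes marks in consecutive pairs.
function removeMarkPairs(string $text): string
{
    return preg_replace('~(?:[\p{M}]{1})([\p{M}])+?~uis', '', $text);
}

// Alternative (assumption, not the answer's method): keep the first mark,
// drop any further marks stacked on the same character.
function keepFirstMark(string $text): string
{
    return preg_replace('~(\p{M})\p{M}+~u', '$1', $text);
}

// "e" followed by three combining marks: U+0301, U+0302, U+0303.
$input = "e\u{0301}\u{0302}\u{0303}";
echo bin2hex(removeMarkPairs($input)), "\n"; // 65cc83 ("e" + U+0303)
echo bin2hex(keepFirstMark($input)), "\n";   // 65cc81 ("e" + U+0301)
```

With three marks, the pairwise regex keeps the last one and the alternative keeps the first; with a single mark, both leave the character unchanged.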

Regex not working as expected [duplicate]

This question already has answers here:
What regex to use for this
(6 answers)
Closed 4 years ago.
I'm having trouble modifying this regex. Right now it matches . or ?, but I want to change it to match a dot followed by a space. How do I do that?
'/([.|?])/'
By the way, I need the grouping to stay.
What about this:
(\. |\?)
The easiest way would be:
'/(\. )/'
or, if you want a space or a tab or a new-line:
'/(\.\s)/'
Note that I only changed the part in the inner parenthesis as that part seems to be the focus of your question.
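/\.\s/ should also work for matching a dot followed by a space, but it has no capturing group, so use /(\.\s)/ if you need the grouping.
Note: \s matches any whitespace character (space, tab, newline), not only a space.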
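A quick comparison with preg_match_all() on a made-up sample sentence. It also shows that inside a character class "|" is a literal, so [.|?] matches a pipe as well as "." and "?".

```php
<?php
$text = 'One. Two.Three? a|b'; // sample input, for illustration only

// [.|?] matches ".", "|" or "?".
preg_match_all('/([.|?])/', $text, $m1);
echo implode(' ', $m1[1]), "\n"; // . . ? |

// (\. ) matches only a dot immediately followed by a space; group 1 is kept.
preg_match_all('/(\. )/', $text, $m2);
echo count($m2[1]), "\n"; // 1
```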
