How to prevent Zalgo text using PHP [duplicate]

This question already has answers here:
Remove special characters that mess with formating
(2 answers)
Closed 7 years ago.
I have some problems with Zalgo text on my imageboard.
Texts like the one below mess up its layout. Is there a way to block these characters, or to "fix" and clean up the text?
Example text (source):
ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the an​*̶͑̾̾​̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
I tried to use this solution:
$cleanMessage = preg_replace("/[^\x20-\xAD\x7F]/", "", $input_lines);
Taken from here: Remove special characters that mess with formating
But it only works for Latin characters.
Can anyone help me?

This regular expression removes every combining mark (Unicode category M, which covers accents and other marks stacked above or below a letter) from the $text variable:
$text = preg_replace("~[\p{M}]~uis","", $text);
If $text contains a character with a combining mark, for example กิ, this regex removes the mark and the resulting $text contains just ก.
I then improved this regex so that it only filters the second level of marks:
$text = preg_replace("~(?:[\p{M}]{1})([\p{M}])+?~uis","", $text);
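This regex only matches combining marks that come two in a row, so a single mark on a letter is kept. Because the marks are removed in pairs, a run with an odd number of marks keeps its last mark.
Use it if you need to allow German or other languages that legitimately use such marks.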
This regex will transform this word -
͐̈ͩ̎Zͮ͌ͦ͆ͦͤÃ̉͛̄ͭ̈̚LͫG̉̋͂̉Oͨ͌̋͗!
into this: ZÄLͫGO!
I hope the second regex helps.
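A variation on the regexes above, offered only as a sketch: instead of removing marks in pairs, cap the number of combining marks allowed in a row. The function name limitCombiningMarks and its $maxMarks parameter are made up for illustration and are not part of the answer; the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper (not from the answer above): keep at most $maxMarks
// combining marks in each run and drop the surplus.
function limitCombiningMarks(string $text, int $maxMarks = 1): string
{
    $maxMarks = max(0, $maxMarks);
    // \p{M} matches any Unicode combining mark; the u modifier enables UTF-8 mode.
    // Group 1 captures the marks to keep, \p{M}+ matches the marks to drop.
    $pattern = '/(\p{M}{' . $maxMarks . '})\p{M}+/u';
    $cleaned = preg_replace($pattern, '$1', $text);
    // preg_replace() returns null on error (for example invalid UTF-8);
    // this sketch falls back to an empty string in that case.
    return $cleaned ?? '';
}
```

With $maxMarks = 1, a letter carrying one mark is left alone and stacked marks are reduced to the first one. With $maxMarks = 0 every combining mark is removed, like the first regex.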

Related

Preg_split loses foreign letters [duplicate]

This question already has answers here:
What is the best way to split a string into an array of Unicode characters in PHP?
(8 answers)
Closed 2 years ago.
I'm trying to use a script to calculate keyword density. Everything works except for foreign letters (be they Swedish, Estonian, or anything else).
$file contains the text.
Here's where the problem comes in:
$testsource = explode(" ", $file); // This has no problems with non-english letters
FIRST WORD in array: "Mängi"
$source = preg_split("/[(\b\W+\b)]/", $file, 0, PREG_SPLIT_NO_EMPTY); // This removes the non-english letter sometimes and also a letter in front of it
FIRST WORD in array: "ngi"
For this specific word, the problem seems to be the "ä" character (and other non-English characters in other words): my current preg_split removes the "Mä" from the beginning of the word. Words with no special characters are fine.
Question: What can I change in the preg_split so that it doesn't cause this issue?
Ah, never mind. The answer is to change the preg_split line to the following:
$source = preg_split("/[(\b\+\b)\s!##$%*]/", $file, 0, PREG_SPLIT_NO_EMPTY);
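A likely cause of the lost letters is that, without the u modifier, PCRE works on bytes, so the bytes of a multibyte UTF-8 letter such as "ä" can be treated as non-word characters. As a hedged alternative to listing separators by hand, the sketch below splits on anything that is not a Unicode letter or digit. The function name splitWords is hypothetical, and the text is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: split UTF-8 text into words.
function splitWords(string $text): array
{
    // Split on runs of characters that are neither letters (\p{L}) nor
    // digits (\p{N}); the u modifier makes PCRE treat the input as UTF-8.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on error (for example invalid UTF-8).
    return $words === false ? [] : $words;
}
```

Calling splitWords($file) would then keep a word such as "Mängi" intact.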

how to keep special characters using preg_replace in php [duplicate]

This question already has answers here:
Is There a Way to Match Any Unicode Alphabetic Character?
(2 answers)
Closed 4 years ago.
I am using the preg_replace function and it mostly works as expected. The problem is that it also removes non-ASCII letters such as Ö. How can I keep Ö?
$string='GÖTEBORG-SEASON-1';
echo $str=preg_replace('/[^A-Za-z-_]/', '', $string);
It outputs GTEBORG-SEASON- (Ö is missing), but I am expecting GÖTEBORG-SEASON-
Thank you.
I think I have solved it. I need to use something like this:
preg_replace('/[^\p{L}_-]/u', '', $string);
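For reference, here is the same idea as a small runnable sketch. The function name keepLettersHyphenUnderscore is invented for this example; the hyphen is placed last in the character class so that it cannot be read as a range, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: keep only Unicode letters, hyphens and underscores.
function keepLettersHyphenUnderscore(string $string): string
{
    // \p{L} matches any Unicode letter (including Ö); the u modifier is
    // required so that the pattern and the subject are treated as UTF-8.
    $result = preg_replace('/[^\p{L}_-]/u', '', $string);
    // preg_replace() returns null on error; return an empty string then.
    return $result ?? '';
}

$clean = keepLettersHyphenUnderscore('GÖTEBORG-SEASON-1');
```

Here $clean holds GÖTEBORG-SEASON- because the digit is the only character outside the allowed set.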

How do you replace AND update in PHP using preg_replace (or similar)? [duplicate]

This question already has answers here:
What does the $1$2$4 mean in this preg_replace?
(3 answers)
Closed 4 years ago.
I want to loop through an array converting specific key/value pairs that contain markup to HTML.
So an example value for $comment['comment_text'] would be:
This has *bolded* text
And should become:
This has <strong>bolded</strong> text
Here's what I've tried:
$pattern = "/\*\b.*?\b\*/i";
$newComment = preg_replace($pattern, "<strong>$&</strong>", $comment['comment_text']);
And what I get:
This has $& text
I realize I'm mixing up JavaScript with PHP, but reading about backreferences in PHP hasn't made things any clearer.
My strings may have multiple bolded (in markup) instances...
Any help appreciated.
UPDATE:
Apologies - I didn't realize that Stack Overflow was converting asterisks to italics. I converted the example to code.
Also, my confusion came down to the use of $0 vs. $1, which I still don't fully understand. I thought the numbers referred to the matches in the string, so if you had 5 instances you could refer to them by $0 through $4.
If you use $0 you get:
This has <strong>*bolded*</strong> text
But if you use $1 you get the desired result.
Use a capturing group:
$pattern = "/\*\b(.*?)\b\*/";
$newComment = preg_replace($pattern, "<strong>$1</strong>", $comment['comment_text']);
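Here $1 refers to the text captured by group 1 (the first pair of parentheses in the pattern), while $0 refers to the whole match, asterisks included. The numbers identify groups within a single match, not successive matches in the string; preg_replace applies the replacement to every match. I'm assuming that you want the text between the asterisks to be bold.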
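To make the difference between $0 and $1 concrete, here is a short sketch that applies both replacements to a string with two marked-up spans. The sample text and the variable names are made up for illustration.

```php
<?php
// Sample input with two spans wrapped in asterisks (illustrative only).
$text = 'This has *bolded* text and *more* of it';

// The parentheses form capture group 1: the text between the asterisks.
$pattern = '/\*\b(.*?)\b\*/';

// $0 is the whole match, so the asterisks end up inside the <strong> tag.
$wholeMatch = preg_replace($pattern, '<strong>$0</strong>', $text);

// $1 is only the captured text, so the asterisks are dropped.
$groupOnly = preg_replace($pattern, '<strong>$1</strong>', $text);
```

Both calls replace every occurrence in the string, so no loop over matches is needed.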

What is the regular expression I should use in preg_replace() to find tabs inside all quoted strings and replace with a single space? [duplicate]

This question already has answers here:
How do I replace tabs with spaces within variables in PHP?
(9 answers)
Replace all occurences of char inside quotes on PHP
(5 answers)
Closed 5 years ago.
I know what needs to be done, but I have not managed to come up with the correct regex.
What is the regular expression I should use in preg_replace() to find tabs inside all quoted strings and replace them with a single space?
Any help would be much appreciated.
Example:
$string = "\"Foo\tMan\tChoo\""; // double quotes so that \t is a real tab
$string = preg_replace($expression_string, ' ', $string);
echo $string; // desired result ----> "Foo Man Choo"
I think this question is not a duplicate
Answer: you need one preg_replace call for each number of tabs you expect inside a quoted field, up to the maximum. (I never bothered to come up with a more generic solution; this one is usually good enough.)
Also make sure every line ends with a tab, otherwise the last field will not be processed.
Then run the replacements starting with the largest possible number of tabs (2 in the example). Note that the example inserts an underscore where the question asks for a space:
$string = '"Foo'."\t".'Man'."\t".'Choo"'."\t".'boo'."\t".'"Foo'."\t".'Man"'."\t";
$string = preg_replace('/"([^"]*)'."\t".'([^"]*)'."\t".'([^"]*)"'."\t".'/','"$1_$2_$3"'."\t",$string);
$string = preg_replace('/"([^"]*)'."\t".'([^"]*)"'."\t".'/','"$1_$2"'."\t",$string);
echo $string;
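A more generic approach, given here only as a sketch and not as the answer's method, is to match each quoted string with preg_replace_callback and replace the tabs inside it. The function name replaceTabsInQuotes is hypothetical, and the sketch assumes that quoted fields never contain escaped double quotes.

```php
<?php
// Hypothetical helper: replace tabs with spaces, but only inside "..." fields.
function replaceTabsInQuotes(string $line): string
{
    $result = preg_replace_callback(
        '/"[^"]*"/',                      // one quoted field, quotes included
        static function (array $m): string {
            // $m[0] is the whole quoted field; swap its tabs for spaces.
            return str_replace("\t", ' ', $m[0]);
        },
        $line
    );
    // preg_replace_callback() returns null on error; keep the input then.
    return $result ?? $line;
}
```

This handles any number of tabs per field in a single pass and does not require the line to end with a tab.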

PHP preg_replace only multiple occurrences [duplicate]

This question already has answers here:
PHP Preg-Replace more than one underscore
(7 answers)
Closed 1 year ago.
The following code replaces spaces correctly:
$string = preg_replace("/[[:blank:]]+/", "", $string);
But how can I make it replace blanks only when there are two or more in a row? Right now it replaces every space. I searched here and saw people using completely different preg_replace patterns, but those also remove newlines, so it would be great if the code I posted could simply be modified to require more than one blank. I remember reading a tutorial a while back that used something like {2+} in the pattern to match two or more of something, but I'm not sure how to make it work correctly.
/[[:blank:]]{2,}/
That makes it replace only sequences of two or more blanks.
The PHP manual has a chapter about repetition/quantifiers.
$string = preg_replace("/[[:blank:]]+/", " ", $string);
Same as yours, but it replaces each run of blanks with a single space instead of removing it.
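The two answers can be combined into one small sketch: use the {2,} quantifier so single blanks are left alone, and replace each longer run with one space. The function name collapseBlanks is made up for this example. [[:blank:]] matches only spaces and tabs, so newlines are preserved.

```php
<?php
// Hypothetical helper: collapse runs of two or more blanks into one space.
function collapseBlanks(string $text): string
{
    // [[:blank:]] is the POSIX class for space and tab; {2,} means
    // "two or more", so single blanks and newlines are not touched.
    $result = preg_replace('/[[:blank:]]{2,}/', ' ', $text);
    // preg_replace() returns null on error; keep the input in that case.
    return $result ?? $text;
}
```

To remove the runs entirely instead, as in the original code, pass an empty string as the replacement.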
