What does kill octets mean in WordPress's sanitize_user()? [duplicate]


This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 days ago.
In sanitize_user() there is a line of code which is commented "Kill octets":
// Kill octets.
$username = preg_replace( '|%([a-fA-F0-9][a-fA-F0-9])|', '', $username );
I thought that octets were essentially bytes that encode Unicode characters (although I know many Unicode characters are encoded by more than one byte), and therefore I do not understand why they need to be 'killed'.

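sanitize_user() removes characters and sequences that aren't allowed in WordPress user names. Here, "octet" refers to a percent-encoded byte as used in URLs: a % followed by two hexadecimal digits. A user name like Mickey%20Mouse isn't allowed, because %20 is the percent-encoded form of a space and would smuggle a space into the name. The regex strips every such %XX sequence.
In general, sanitize operations strip out disallowed data.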
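As a minimal sketch of what that line does in isolation, the snippet below applies the same pattern to a few sample names. The helper name kill_octets() and the sample inputs are made up for illustration; only the regex comes from the question.

```php
<?php
// Hypothetical helper wrapping the pattern quoted from sanitize_user().
function kill_octets(string $username): string
{
    // Remove every "%" that is followed by exactly two hex digits.
    return preg_replace('|%([a-fA-F0-9][a-fA-F0-9])|', '', $username);
}

// Sample inputs (assumptions, not taken from WordPress itself).
$examples = ['Mickey%20Mouse', 'caf%C3%A9', '100%sure'];
foreach ($examples as $name) {
    echo $name, ' => ', kill_octets($name), PHP_EOL;
}
```

Note that a lone % that is not followed by two hex digits, as in 100%sure, is not matched by this pattern and is left for the other steps of the sanitizer.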

Related

Preg_split loses foreign letters [duplicate]

This question already has answers here:
What is the best way to split a string into an array of Unicode characters in PHP?
(8 answers)
Closed 2 years ago.
I'm trying to use a script to measure keyword density. Everything works except for foreign letters (be it Swedish, Estonian, or anything else).
$file includes the text.
Here's where the problem comes in:
$testsource = explode(" ", $file); // This has no problems with non-english letters
FIRST WORD in array: "Mängi"
$source = preg_split("/[(\b\W+\b)]/", $file, 0, PREG_SPLIT_NO_EMPTY); // This removes the non-english letter sometimes and also a letter in front of it
FIRST WORD in array: "ngi"
For this specific word the problem seems to be the "ä" character (and, for other words, other non-English characters), as my current preg_split removes the "Mä" from the beginning of the word. Words with no special characters are OK.
Question: What can I add to the preg_split not to cause issues?

Ah, never mind, the answer is to change the preg_split line to the following:
$source = preg_split("/[(\b\+\b)\s!##$%*]/", $file, 0, PREG_SPLIT_NO_EMPTY);
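A possible alternative, which is not part of the original answer: without the u modifier, PCRE treats the subject as single bytes, so the bytes of a UTF-8 "ä" do not count as word characters. The sketch below assumes $file is UTF-8 and splits on anything that is not a Unicode letter or digit. The sample text is invented for the example.

```php
<?php
// Sample input (an assumption); the source file must be saved as UTF-8.
$file = 'Mängi mängu, õpi eesti keelt!';

// \p{L} = any Unicode letter, \p{N} = any Unicode number.
// The "u" modifier makes PCRE read the pattern and subject as UTF-8.
$words = preg_split('/[^\p{L}\p{N}]+/u', $file, -1, PREG_SPLIT_NO_EMPTY);

print_r($words);
```

Because the delimiter is defined as "not a letter or digit", punctuation does not have to be listed character by character.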

How to prevent zalgo text using php [duplicate]

This question already has answers here:
Remove special characters that mess with formating
(2 answers)
Closed 7 years ago.
I have some problems with Zalgo on my imageboard.
Texts like below mess up my imageboard. Is there a way to prevent these characters and "fix" or clean up the texts?
Example text Source:
ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e not rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
I tried to use this solution:
$cleanMessage = preg_replace("/[^\x20-\xAD\x7F]/", "", $input_lines);
Taken from here: Remove special characters that mess with formating
But it works only for Latin characters.
Can anyone help me?

This regular expression removes every combining mark (Unicode category M, which covers accents and stacked diacritics) from the $text variable:
$text = preg_replace("~[\p{M}]~uis","", $text);
If $text contains a character with a combining mark, for example กิ, this regex removes the mark and the resulting $text contains just ก.
I then improved the regex so that it only removes stacked marks:
$text = preg_replace("~(?:[\p{M}]{1})([\p{M}])+?~uis","", $text);
This regex matches combining marks in consecutive pairs, so a letter carrying a single mark is left untouched, while stacked marks are stripped two at a time.
Use it if you need to keep German or other languages that rely on such marks.
This regex will transform this word -
͐̈ͩ̎Zͮ͌ͦ͆ͦͤÃ̉͛̄ͭ̈̚LͫG̉̋͂̉Oͨ͌̋͗!
into this: ZÄLͫGO!
I hope the second regex helps.
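As a hedged variation on the two regexes above (not the answerer's code), the sketch below shows two policies side by side: stripping all combining marks, and keeping only the first mark of each run. The input string is built from \u{...} escapes, which requires PHP 7 or later, and is an invented example.

```php
<?php
// "Z" with three stacked marks, "A" with a single diaeresis, then "LGO!".
$zalgo = "Z\u{0351}\u{0352}\u{0353}A\u{0308}LGO!";

// Policy 1: remove every combining mark.
$noMarks = preg_replace('~\p{M}+~u', '', $zalgo);

// Policy 2: keep the first mark of each run and drop the rest,
// so a letter with a single legitimate accent is unchanged.
$oneMark = preg_replace('~(\p{M})\p{M}+~u', '$1', $zalgo);

echo $noMarks, PHP_EOL;
echo $oneMark, PHP_EOL;
```

Policy 2 trades some cleanliness for safety: languages that legitimately stack two or more marks on one letter would still lose the extra marks.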

Only allow English characters/letters/numbers and a few special characters [duplicate]

This question already has answers here:
php POST and non-english language chars passes empty
(2 answers)
PHP: Allow only certain characters in string, without using regex
(1 answer)
Closed 9 years ago.
My problem is that I am making a small search engine from scratch, but it gets messed up if I search in Russian or any language other than English. I was hoping someone could give me regex code that filters out (not just detects, but automatically removes) Russian letters, or any other letters except the English letters and the special characters found on a keyboard (-/:;()$&#". - etc).
Later on, I will implement different language support for my engine, but for now, I want to finish the base of the engine.
Thanks in advance.

You may create an array of allowed characters and then filter those that are not allowed:
$allowed = array_merge(range('a', 'z'), range('A', 'Z'), range(0, 9), array(' ', '+', '/', '-', '*', '.')); // Create an array of allowed characters
$string = 'This is allowed and this not é Ó ½ and nothing 123.'; // test string
$array = str_split($string); // split the string (character length = 1)
echo implode('', array_intersect($array, $allowed)); // Filter and implode !

Why complicate? A regex has to read the contents of the string anyway, so you may as well do it yourself. Read the characters of the string and check each one against the set you want to allow.
Create a hash-set-like structure and check manually whether each character falls in the desired set. (SplObjectStorage only stores objects, so for single characters an associative array keyed by character is the simpler choice.) You can add any characters that you want to accept to this set.
EDIT - You might want to use a regex too, something like [a-zA-Z0-9,./+&-], but using a set could allow you to expand your search engine gradually by adding more characters to the known-characters set.
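The set-based idea could be sketched as follows, assuming an associative array is used as the set and the input is filtered byte by byte. The function name keep_allowed() and the sample input are hypothetical. Every byte of a multi-byte UTF-8 character is outside the ASCII range, so such characters are dropped whole.

```php
<?php
// Build the set: keys are the allowed characters, values are unused.
$allowed = array_fill_keys(
    array_merge(range('a', 'z'), range('A', 'Z'), range('0', '9'), str_split(' ,./+&-')),
    true
);

// Hypothetical helper: keep only the bytes that are keys of the set.
function keep_allowed(string $input, array $allowed): string
{
    $out = '';
    foreach (str_split($input) as $byte) {
        if (isset($allowed[$byte])) {
            $out .= $byte;
        }
    }
    return $out;
}

echo keep_allowed('Привет, world 123!', $allowed), PHP_EOL;
```

Extending the engine later is then a matter of adding keys to $allowed, although real multi-language support would need character-aware splitting (for example with mb_str_split) instead of bytes.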

This may not be the most efficient way, but it works :)
$str='"it is a simple test \ + - é Ó ½ 213 /:;()$&#".~" ';
// + instead of * so the pattern never matches an empty string
$result= preg_replace('/[^\s\w\+\-\\":;#\(\)\$\&\.\/]+/', '', $str);
echo $result;
However, you need to add every special character you want to allow to the character class.

PHP preg_replace only multiple occurrences [duplicate]

This question already has answers here:
PHP Preg-Replace more than one underscore
(7 answers)
Closed 1 year ago.
The following code replaces spaces correctly:
$string = preg_replace("/[[:blank:]]+/", "", $string);
But how can I make it replace only runs of two or more blanks? Right now it replaces all spaces, and I only need it to replace more than one space in a row. I searched on here and saw people use totally different preg_replace patterns, but those also remove newlines, so it would be great if the code I posted could simply be modified to require more than one blank. I remember reading a tutorial a while back that used something like {2+} in the pattern to match two or more of something, but I'm not sure how to make it work correctly.

/[[:blank:]]{2,}/
That will make it replace sequences of two or more.
The php manual has a chapter about repetition/quantifiers.
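Put together, a minimal sketch might look like this. It assumes the goal is to collapse each run of two or more blanks into one space; the function name collapse_blanks() and the sample string are invented for the example.

```php
<?php
// Hypothetical helper: collapse runs of 2+ spaces/tabs into one space.
// [[:blank:]] matches only space and tab, so newlines are left alone.
function collapse_blanks(string $string): string
{
    return preg_replace('/[[:blank:]]{2,}/', ' ', $string);
}

echo collapse_blanks("a  b c\t\td\ne"), PHP_EOL;
```

Single blanks are not matched at all, so a lone tab stays a tab; use + instead of {2,} if every run, including single tabs, should become one space.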

$string = preg_replace("/[[:blank:]]+/", " ", $string);
Same as yours, but it replaces each run of blanks with a single space instead of removing it.

Checking for special characters [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
preg_match php special characters
As part of my registration system I need to check for the existence of special characters in a variable. How can I perform this check? The person who gives the most precise answer gets best answer.

Assuming that you mean characters that have HTML entities when you say "special chars", you can use this:
<?php
$table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'UTF-8');
// Escape anything that is special inside a regex before building the class.
$chars = preg_quote(implode('', array_keys($table)), '/');
// The u modifier is needed because many of the characters are multi-byte UTF-8.
if (preg_match("/[{$chars}]/u", $string) === 1) {
// special chars in string
}
get_html_translation_table gets all the characters that have an HTML entity. If you only want the characters that the function htmlspecialchars converts, then you can pass HTML_SPECIALCHARS instead of HTML_ENTITIES. The return value of get_html_translation_table is an array whose keys are the raw characters and whose values are the corresponding entities.
Next, we want to put all of those characters in a regular expression like [&"<>], which matches any one of the characters inside the square brackets. So we use array_keys to get the keys of the translation table (the unencoded characters), and implode them together into a single string.
Then we put them into the regular expression and use preg_match to see if the string contains any of those characters. You can read more about regular expression syntax at the PHP docs.

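$special_chars = '!@#$%^&*()'; // example set: list the special characters you want to check for
$string = 'pa$$word'; // example value: the string you want to check
// preg_quote escapes regex metacharacters; the brackets make it a character class
if (preg_match('/[' . preg_quote($special_chars, '/') . ']/', $string))
{
// special characters exist in the string.
}
Check the manual page for preg_match for more details.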

A quick google search for "php special characters" brings up some good info:
htmlentities() - http://php.net/manual/en/function.htmlentities.php
htmlspecialchars() - http://php.net/manual/en/function.htmlspecialchars.php
