PHP Regex Strip Away All Emojis


I am trying to strip away all non-allowed characters from a string using regex. Here is my current PHP code:
$input = "👮";
$pattern = "[a-zA-Z0-9_ !##$%^&*();\\\/|<>\"'+\-.,:?=]";
$message = preg_replace($pattern,"",$input);
if (empty($message)) {
echo "The string is empty";
}
else {
echo $message;
}
The emoji gets printed when I run this, but I want it to print "The string is empty".
When I put my regex into http://regexr.com/, it shows that the emoji is not matching, but when I run the code, the emoji gets printed. Any suggestions?

This pattern should do the trick:
$filteredString = preg_replace('/([^-\p{L}\x00-\x7F]+)/u', '', $rawString);
Some sequences are quite rare, so let's explain them:
\p{L} matches any kind of letter from any language
\x00-\x7F matches a single character in the range from index 0 to index 127, i.e. any ASCII character
the u modifier turns on additional functionality of PCRE that is incompatible with Perl; pattern and subject strings are treated as UTF-8.
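A minimal sketch wrapping that pattern in a helper, assuming the input is valid UTF-8. The function name stripNonLetterSymbols() is hypothetical, and the capturing group from the answer is dropped because the replacement never refers to it. Because the pattern uses the u modifier, preg_replace() returns null for invalid UTF-8, so the sketch turns that case into an exception.

```php
<?php
// Hypothetical helper around the answer's pattern: keeps Unicode letters (\p{L}),
// ASCII characters (\x00-\x7F) and the hyphen, and removes everything else.
function stripNonLetterSymbols(string $raw): string
{
    // With the u modifier, PCRE works on UTF-8 code points instead of bytes.
    $filtered = preg_replace('/[^-\p{L}\x00-\x7F]+/u', '', $raw);

    // preg_replace() returns null on error, e.g. when $raw is not valid UTF-8.
    if ($filtered === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $filtered;
}

echo stripNonLetterSymbols("Héllo 👮 wörld"), "\n"; // Héllo  wörld (two spaces remain)
echo stripNonLetterSymbols("日本語👍"), "\n";        // 日本語
```

Note that anything outside \p{L} and ASCII is removed as well, for example combining accents (\p{M}) in decomposed text and non-ASCII digits.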

Your pattern is incorrect. If you want to strip away all the characters that are not in the list provided, then you have to use a negated character class: [^...]. Also, [ and ] are currently being used as delimiters, which means the pattern isn't treated as a character class.
The pattern should be:
$pattern = "~[^a-zA-Z0-9_ !##$%^&*();\\\/|<>\"'+.,:?=-]~";
This should now strip away the emoji and print your message.
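A small sketch of the corrected version, using the same allowed set as the answer. The helper name stripDisallowed() is hypothetical. No u modifier is needed here because every allowed character is ASCII, so each byte of the emoji's UTF-8 encoding is removed on its own. The emptiness check compares against '' instead of calling empty(), since empty() would also treat the string "0" as empty.

```php
<?php
// Delimiters (~) plus a negated class [^...]: every character that is NOT
// in the allowed list gets removed.
function stripDisallowed(string $input): string
{
    // In a single-quoted PHP string, \\\\ becomes \\ (one literal backslash
    // for PCRE) and \' becomes a plain apostrophe.
    $pattern = '~[^a-zA-Z0-9_ !#$%^&*();\\\\/|<>"\'+.,:?=-]~';
    return preg_replace($pattern, '', $input);
}

$message = stripDisallowed("👮");
if ($message === '') {
    echo "The string is empty";
} else {
    echo $message;
}
```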

Related

Regex for Chinese / Japanese letters

Okay, so I already have this regular expression for names allowed on my website.
However, I also wish to add other possible letters that names use.
Does someone have a good regex or know how I can make this more complete? I have searched for quite a while now, and I can't find anything that suits my needs.
This is my current regex for checking names:
$regex = "/^([a-zA-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð ,.'-])+$/";
if(preg_match($regex, $fullname)){
// do something
}

As Lucas Trzesniewski mentioned, \p{L} already includes [a-zA-Z], so I have removed it from the pattern.
Thus, combining the character lists that you included in the example, the pattern looks like this: /^[\p{L}\s,.'-]+$/u
^[...]+$ matches the whole string from start to end; the + requires one or more characters
\p{L} matches any Unicode letter
\s,.'- matches whitespace (space, tab, newline, etc.), comma, period, apostrophe, and hyphen
u is the PCRE_UTF8 modifier; it turns on additional functionality of PCRE that is incompatible with Perl, and pattern and subject strings are treated as UTF-8.
if(preg_match("/^[\p{L}\s,.'-]+$/u", "お元気ですか你好吗how are you你好嗎,.'-") === 1) {
echo "match";
}
else {
echo "no match";
}
// match
if(preg_match("/^[\p{L}\s,.'-]+$/u", "お元気ですか你好吗how are you你好_嗎-,.'") === 1) {
echo "match";
}
else {
echo "no match";
}
// no match, because of the underscore in 你好_嗎
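The same pattern can be packaged as a reusable check. This is a sketch; the function name isValidName() is hypothetical. Comparing with === 1 matters because preg_match() returns 1 on a match, 0 on no match, and false on error (for example invalid UTF-8 when the u modifier is used).

```php
<?php
// \p{L} = any Unicode letter, \s = whitespace, plus , . ' and - (last, so literal).
function isValidName(string $name): bool
{
    return preg_match("/^[\p{L}\s,.'-]+$/u", $name) === 1;
}

var_dump(isValidName("José O'Neil-Smith")); // bool(true)
var_dump(isValidName("李小龍"));            // bool(true)
var_dump(isValidName("user_01"));           // bool(false)
var_dump(isValidName(""));                  // bool(false), + needs at least one character
```

One limitation worth knowing: \p{L} does not include combining marks (\p{M}), so names in scripts that rely on them, such as Devanagari vowel signs, are rejected by this pattern.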

PHP regex to allow newline didn't work

PHP preg_match to accept new line
I want to pass every post/string through PHP's preg_match function. I want to accept all alphanumerics and some special characters. Please help me edit my syntax to allow newlines, since users fill in a textarea and press Enter. The following syntax does not allow a newline.
Please also tell me whether the following special characters are handled properly:
/_:,.?#;-
if (preg_match("/^[0-9a-zA-Z \/_:,.?#;-]+$/", $string)) {
echo 'good';
} else {
echo 'bad';
}

You were almost there!
The DOTALL modifier mentioned by others is irrelevant to your regex.
To allow new lines, we just add \r\n to your character class. Your code becomes:
if (preg_match("/^[\r\n0-9a-zA-Z \/_:,.?#;-]+$/", $string)) {
echo 'good';
} else {
echo 'bad';
}
Note that this test and the regex can be written in a tidier way:
echo (preg_match("~^[\r\n\w /:,.?#;-]+$~",$string))? "***Good!***" : "Bad!";
\w matches letters, digits and underscores, so we can drop those from the character class
Changing the delimiter to ~ lets you use a / without escaping it (only the delimiter itself needs escaping inside the pattern)
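A minimal sketch of the tidier check as a function; the name isAllowedPost() is hypothetical. It uses single quotes so that PCRE itself interprets \r, \n and \w. With double quotes, as in the answer, PHP first turns \r and \n into real carriage-return and line-feed characters, which works just as well inside a character class.

```php
<?php
// \w (letters, digits, underscore), space, the listed symbols,
// and \r / \n so that textarea line breaks are accepted.
function isAllowedPost(string $post): bool
{
    return preg_match('~^[\r\n\w /:,.?#;-]+$~', $post) === 1;
}

var_dump(isAllowedPost("Line one\r\nLine two")); // bool(true)
var_dump(isAllowedPost("path/to: file #1;"));    // bool(true)
var_dump(isAllowedPost("no <tags>"));            // bool(false)
```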

It's always safe to add a backslash before any non-alphanumeric character, so:
/^[0-9a-zA-Z \/\_\:\,\.\?\#\;\-]+$/
You can also use POSIX character classes:
/^[[:alnum:] \/\_\:\,\.\?\#\;\-]+$/
As for the newlines:
/^[[:alnum:] \r\n\/\_\:\,\.\?\#\;\-]+$/
To write that pattern as a PHP string literal (single quotes are easier/safer here):
'/^[[:alnum:] \\r\\n\/\_\:\,\.\?\#\;\-]+$/'

You can use an alternation to factor in the newlines:
/^(?:[0-9a-zA-Z \/_:,.?#;-]|\r?\n)+$/
Btw, you can shorten the expression a bit by replacing [A-Za-z0-9_] with \w, which already covers letters, digits and the underscore:
/^(?:[\w \/:,.?#;-]|\r?\n)+$/
So:
if (preg_match('/^(?:[\w \/:,.?#;-]|\r?\n)+$/', $string)) {
echo "good";
} else {
echo "bad";
}
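A sketch of how the alternation behaves with different line endings; the function name isAllowedPostAlt() is hypothetical, and the class is written with \w alone since \w already includes digits. Because \r is only accepted when immediately followed by \n, a lone "\r" (old Mac-style line ending) is rejected, while "\n" and "\r\n" are accepted.

```php
<?php
// Either one allowed character, or a line break written as \n or \r\n.
function isAllowedPostAlt(string $post): bool
{
    return preg_match('/^(?:[\w \/:,.?#;-]|\r?\n)+$/', $post) === 1;
}

var_dump(isAllowedPostAlt("a\nb"));   // bool(true)
var_dump(isAllowedPostAlt("a\r\nb")); // bool(true)
var_dump(isAllowedPostAlt("a\rb"));   // bool(false)
```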

preg_match a back slash

$pattern = '/\\\p\\\/';
if (preg_match($pattern, "\p\")) {
echo "Correct";
} else {
echo "Incorrect";
}
I don't understand the first \\\p.
Why doesn't \\p work?

Your pattern is wrong: the pattern \\p\\ matches the string \p\, but \\\p\\\ doesn't match anything.
If you want to match the string \\p\\, your pattern should be \\\\p\\\\.

Note that "\p\" is not a valid string:
The final \" escapes the quote, so that the string is not terminated
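The \p is not an escape sequence in a PHP string, so it stays as a backslash followed by p; in a regex, however, \p introduces a Unicode property (as in \p{L}), which is not what you intended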
If you want to say \p\ in a string, you have to write it like this: "\\p\\"
To match \p\, use:
$regex = '~\\\\p\\\\~';
echo (preg_match($regex,"\\p\\")) ? "Matches" : "Doesn't Match";

The problem here is that both strings and regular expressions use escape characters and they need to be doubled in order to effect the intended behaviour.
So, in this case you need four backslashes in the regular expression and two of them in the search string:
if (preg_match('/\\\\p\\\\/', '\\p\\')) {
echo "Hurray!\n";
}
The reason '/\\\p\\\/' works is that \p and \/ have no special meaning in a single-quoted string, so the backslash is kept verbatim. In other words, PHP happens to produce the string you meant; that said, you should use the correct number of escape characters.
Btw, "\\p\" is just plain wrong and will cause a parse error; I'm going to assume that this was a typo.
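A short sketch of the two escaping layers, PHP string literal first and PCRE second. It also checks the claim above: the question's '/\\\p\\\/' and the "correct" '/\\\\p\\\\/' produce exactly the same string.

```php
<?php
// Layer 1 (PHP): in single quotes, \\ becomes one backslash.
$subject = '\\p\\';           // the 3 characters \p\

// Layer 2 (PCRE): a literal backslash is written \\ in the regex, and each of
// those two backslashes is doubled again in the PHP literal.
$pattern = '/\\\\p\\\\/';     // PCRE receives /\\p\\/

// \p and \/ are not escape sequences in single quotes, so the question's
// version keeps those backslashes and ends up identical.
$questionPattern = '/\\\p\\\/';

var_dump(strlen($subject));               // int(3)
var_dump($pattern === $questionPattern);  // bool(true)
var_dump(preg_match($pattern, $subject)); // int(1)
```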

preg_replace PHP not working?

Why doesn't preg_replace return anything in this scenario? I've been trying to figure it out all night.
Here is the text contained within $postContent:
Test this. Here is a quote: [Quote]1[/Quote] Quote is now over.
Here is my code:
echo "Test I'm Here!!!";
$startQuotePos = strpos($postContent,'[Quote]')+7;
$endQuotePos = strpos($postContent,'[/Quote]');
$postStrLength = strlen($postContent);
$quotePostID = substr($postContent,$startQuotePos,($endQuotePos-$postStrLength));
$quotePattern = '[Quote]'.$quotePostID.'[/Quote]';
$newPCAQ = preg_replace($quotePattern,$quotePostID,$postContent);
echo "<br />$startQuotePos<br />$endQuotePos<br />$quotePostID<br />Qpattern:$quotePattern<br />PCAQ: $newPCAQ<br />";
These are my results:
Test I'm Here!!!
35
36
1
Qpattern:[Quote]1[/Quote]
PCAQ:

Inside a regular expression, "[Quote]" is a character class that matches a single character: one of Q, u, o, t, or e.
If you want preg_replace() to find the literal "[Quote]", you need to escape it as "\[Quote\]". preg_quote() is the function you should use: preg_quote("[Quote]").
Your code is also wrong because a regular expression is expected to start with a delimiter. In the preg_replace() call I am showing at the end of my answer, that is #, but you could use another character, as long as it doesn't appear in the regular expression, and it is used also at the end of the regular expression. (In my case, # is followed by a pattern modifier, and pattern modifiers are the only characters allowed after the pattern delimiter.)
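A sketch of what preg_quote() produces. It escapes regex metacharacters such as [ and ], but / is not one of them, so / is only escaped when it is passed as the second (delimiter) argument.

```php
<?php
echo preg_quote("[Quote]"), "\n";        // \[Quote\]
echo preg_quote("[/Quote]", "/"), "\n";  // \[\/Quote\]
echo preg_quote("[/Quote]", "#"), "\n";  // \[/Quote\]

// Wrap the escaped text in the same delimiter that was passed to preg_quote().
$pattern = '#' . preg_quote("[Quote]1[/Quote]", '#') . '#';
echo preg_replace($pattern, '1', "Here is a quote: [Quote]1[/Quote] Done."), "\n";
// Here is a quote: 1 Done.
```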
If you are going to use preg_replace(), it doesn't make sense to first find where "[Quote]" is. I would rather use the following code:
$newPCAQ = preg_replace('#\[Quote\](.+?)\[/Quote\]#i', '\1', $postContent);
I will explain the regular expression I am using:
The i after the final '#' tells preg_replace() to ignore the difference between lowercase and uppercase characters; the string could contain "[QuOte]234[/QuOTE]", and that substring would match the regular expression all the same.
I use a question mark in "(.+?)" to make the quantifier lazy, so that ".+" doesn't greedily match too many characters. Without it, the regular expression could include in a single match a substring like "[Quote]234[/Quote] Other text [Quote]475[/Quote]", while this should be matched as two substrings: "[Quote]234[/Quote]" and "[Quote]475[/Quote]".
The '\1' replacement string tells preg_replace() to use the text matched by the sub-group "(.+?)" as the replacement. In other words, the call to preg_replace() removes the "[Quote]" and "[/Quote]" surrounding other text. (It doesn't replace a "[/Quote]" that isn't paired with a preceding "[Quote]", such as in "[/Quote] Other text [Quote]".)
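A small sketch contrasting the lazy and greedy versions on a post with two quotes; the sample text is made up for illustration.

```php
<?php
$post = "a [Quote]234[/Quote] b [QUOTE]475[/quote] c";

// Lazy (.+?): each [Quote]...[/Quote] pair is unwrapped separately.
$lazy = preg_replace('#\[Quote\](.+?)\[/Quote\]#i', '\1', $post);

// Greedy (.+): a single match runs from the first [Quote] to the last [/Quote].
$greedy = preg_replace('#\[Quote\](.+)\[/Quote\]#i', '\1', $post);

echo $lazy, "\n";   // a 234 b 475 c
echo $greedy, "\n"; // a 234[/Quote] b [QUOTE]475 c
```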

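Your regex must start and end with a delimiter such as '/'; the literal [, ] and / characters inside it must be escaped as well:
$quotePattern = '/\[Quote\]'.preg_quote($quotePostID, '/').'\[\/Quote\]/';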

The reason you don't see anything for the return value of preg_replace is that it returned NULL (see the preg_replace() page of the PHP manual for details). This is what preg_replace returns when an error occurs, which is what happened in your situation. The string value of NULL is a zero-length string. You can see this by using var_dump instead, which will tell you that preg_replace returned NULL.
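A minimal sketch reproducing the failure. The @ operator is used here only to keep the demo quiet; in real code, it is better to let the warning show and check the result for null.

```php
<?php
$postContent = "Test this. Here is a quote: [Quote]1[/Quote] Quote is now over.";

// The pattern string is not a valid delimited PCRE pattern, so compilation
// fails and preg_replace() returns null instead of a string.
$result = @preg_replace('[Quote]1[/Quote]', '1', $postContent);

var_dump($result);                 // NULL
var_dump("PCAQ: " . $result);      // the null is printed as an empty string
```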
Your regular expression is invalid, and as such PHP will raise an E_WARNING level error: Warning: preg_replace(): Unknown modifier '1'
There are a couple of reasons for this. First, preg_* functions use PCRE-style regular expressions, which must be wrapped in an opening and closing delimiter. Because your pattern starts with [, PHP treats that [ and its matching ] as bracket-style delimiters, so only Quote is used as the pattern and everything after the ] is read as modifiers. Second, you should also consider using preg_quote() on your pattern (sans the delimiters) to ensure it is escaped properly.
$postContent = "Test this. Here is a quote: [Quote]1[/Quote] Quote is now over.";
/* Specify a delimiter for your regular expression */
$delimiter = '#';
$startQuotePos = strpos($postContent,'[Quote]')+7;
$endQuotePos = strpos($postContent,'[/Quote]');
$postStrLength = strlen($postContent);
$quotePostID = substr($postContent,$startQuotePos,($endQuotePos-$postStrLength));
/* Make sure you use the delimiter in your pattern and escape it properly */
$quotePattern = $delimiter . preg_quote("[Quote]{$quotePostID}[/Quote]", $delimiter) . $delimiter;
$newPCAQ = preg_replace($quotePattern,$quotePostID,$postContent);
echo "<br />$startQuotePos<br />$endQuotePos<br />$quotePostID<br />Qpattern:$quotePattern<br />PCAQ: $newPCAQ<br />";
The output will be:
35
36
1
Qpattern:#\[Quote\]1\[/Quote\]#
PCAQ: Test this. Here is a quote: 1 Quote is now over.

How to check if a string is in an array?

I basically need a function to check whether each character of a string is in an array.
My code isn't working so far, but here it is anyway:
$allowedChars = array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"," ","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"," ","0","1","2","3","4","5","6","7","8","9"," ","#",".","-","_","+"," ");
$input = "Test";
$input = str_split($input);
if (in_array($input,$allowedChars)) {echo "Yep, found.";}else {echo "Sigh, not found...";}
I want it to say 'Yep, found.' if one of the letters in $input is found in $allowedChars. Simple enough, right? Well, that doesn't work, and I haven't found a function that will search a string's individual characters for a value in an array.
By the way, I want to check against just that array's values. I'm not looking for fancy html_strip_entities or whatever it is; I want to use that exact array for the allowed characters.

You really should look into regex and the preg_match function: http://php.net/manual/en/function.preg-match.php
But, this should make your specific request work:
$allowedChars = array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"," ","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"," ","0","1","2","3","4","5","6","7","8","9"," ","#",".","-","_","+"," ");
$input = "Test";
$input = str_split($input);
$message = "Sigh, not found...";
foreach($input as $letter) {
if (in_array($letter, $allowedChars)) {
$message = "Yep, found.";
break;
}
}
echo $message;

Are you familiar with regular expressions at all? It's sort of the more accepted way of doing what you're trying to do, unless I'm missing something here.
Take a look at preg_match(): http://php.net/manual/en/function.preg-match.php
To address your example, here's some sample code:
$subject = "Hello, this is a string";
$pattern = '/^[a-zA-Z0-9 #._+-]*$/'; // include all the symbols you want to match here
if (preg_match($pattern, $subject))
echo "Yep, matches";
else
echo "Doesn't match :(";
A little explanation of the regex: the '^' matches the beginning of the string, the '[a-zA-Z0-9 #._+-]' part means "any character in this set", the '*' after it means "zero or more of the last thing", and finally the '$' at the end matches the end of the string.

A somewhat different approach:
$allowedChars = array("a","b","c","d","e");
$char_buff = str_split("Test"); // explode() does not accept an empty delimiter
$foundTheseOnes = array_intersect($char_buff, $allowedChars);
if(!empty($foundTheseOnes)) {
echo 'Yep, something was found. Let\'s find out what: <br />';
print_r($foundTheseOnes);
}

Validating the characters in a string is most appropriately done with string functions. preg_match() is the most direct/elegant method for this task.
Code:
$input="Test Test Test Test";
if(preg_match('/^[\w +.#-]*$/',$input)){
echo "Input string does not contain any disallowed characters";
}else{
echo "Input contains one or more disallowed characters";
}
// output: Input string does not contain any disallowed characters
Pattern Explanation:
/ # start pattern
^ # start matching from start of string
[\w +.#-] # match: a-z, A-Z, 0-9, underscore, space, plus, dot, hash, hyphen
* # zero or more occurrences
$ # match until end of string
/ # end pattern
Significant points:
The ^ and $ anchors are crucial to ensure that the entire string is validated versus just a substring of the string.
The \w (a.k.a. "any word character" -> a shorthand character class) is the easy way to write: [a-zA-Z0-9_]
The . dot character loses its "match anything (almost)" meaning and becomes literal when it is written inside of a character class. No escaping slash is necessary.
The hyphen inside of a character class can be written without an escaping backslash (\-) so long as it is positioned at the start or end of the character class. If the hyphen is not at the start/end and it is not escaped, it will create a range of characters between the characters on either side of it. Like it or not, [.-z] will not match a hyphen symbol because it does not exist "between" the dot character and the lowercase letter z on the ASCII table.
The * that follows the character class is the "quantifier". The asterisk means "0 or more" of the preceding character class. In this case, this means that preg_match() will allow an empty string. If you want to deny an empty string, you can use + which means "1 or more" of the preceding character class. Finally, you can be far more specific about string length by using a number or numbers in a curly bracketed expression.
{8} would mean the string must be exactly 8 characters long.
{4,} would mean the string must be at least 4 characters long.
{0,10} would mean the string length must be between 0 and 10.
{5,9} would mean the string length must be between 5 and 9 characters.
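A few quick checks illustrating the points above (anchors, the hyphen range, and a length quantifier). The sample subjects are made up for illustration.

```php
<?php
// Without anchors, * is satisfied by a partial (even empty) match, so any
// subject "passes"; ^ and $ force the whole string to be validated.
var_dump(preg_match('/[\w +.#-]*/', 'bad!'));   // int(1)
var_dump(preg_match('/^[\w +.#-]*$/', 'bad!')); // int(0)

// A hyphen between two characters forms a range: [.-z] does not contain "-".
var_dump(preg_match('/^[.-z]$/', '-'));         // int(0)
var_dump(preg_match('/^[.z-]$/', '-'));         // int(1)

// {5,9}: between 5 and 9 allowed characters.
var_dump(preg_match('/^[\w +.#-]{5,9}$/', 'Test'));    // int(0)
var_dump(preg_match('/^[\w +.#-]{5,9}$/', 'Test #1')); // int(1)
```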
All of that advice aside, if you absolutely must use your array of characters AND you wanted to use a loop to check individual characters against your validation array (and I certainly don't recommend it), then the goal should be to reduce the number of array elements involved so as to reduce total iterations.
Your $allowedChars array has multiple elements that contain the space character, but only one is necessary. You should prepare the array using array_unique() or a similar technique.
str_split($input) may generate an array with duplicate elements. For example, if $input="Test Test Test Test"; then the resultant array from str_split() will have 19 elements, 14 of which will require redundant validation checks.
You could probably eliminate redundancies from str_split() by calling count_chars($input,3) and feeding that to str_split() or alternatively you could call str_split() then array_unique() before performing the iterative process.
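A sketch of that de-duplicated loop, assuming the goal is to confirm that every character is allowed. The function name onlyAllowedChars() is hypothetical, and array_flip() is swapped in (not part of the answer) so that duplicate " " entries collapse into one key and each lookup is an isset() check. count_chars() works on bytes, which is fine here because the allowed list is ASCII.

```php
<?php
function onlyAllowedChars(string $input, array $allowedChars): bool
{
    if ($input === '') {
        return true; // nothing to reject; also sidesteps str_split('') differences between PHP versions
    }

    // Values become keys: duplicates collapse, lookups become isset() checks.
    $allowed = array_flip($allowedChars);

    // count_chars($input, 3) returns a string of the distinct bytes in $input,
    // so "Test Test Test Test" is reduced to its 5 distinct characters.
    foreach (str_split(count_chars($input, 3)) as $char) {
        if (!isset($allowed[$char])) {
            return false;
        }
    }
    return true;
}

$allowedChars = str_split('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 #.-_+');
var_dump(onlyAllowedChars("Test Test Test Test", $allowedChars)); // bool(true)
var_dump(onlyAllowedChars("Test!", $allowedChars));               // bool(false)
```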

Because you're just validating a string, see preg_match() and other PCRE functions for handling this instead.
Alternatively, you can use strcspn() to do...
$check = "abcde...."; // fill in the rest of the characters
$test = "Test";
echo ((strcspn($test, $check) === strlen($test)) ? "Sigh, not found..." : 'Yep, found.');
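For comparison, a sketch of both questions with the built-in span functions, using the allowed characters from the question's array written out as a string. The helper names anyAllowed() and allAllowed() are hypothetical. strcspn() answers "does any allowed character appear?", while strspn() returns the length of the leading run made only of allowed characters, so it covers the whole string exactly when every character is allowed.

```php
<?php
$check = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 #.-_+';

// True if at least one character of $test is in $check.
function anyAllowed(string $test, string $check): bool
{
    return strcspn($test, $check) !== strlen($test);
}

// True if every character of $test is in $check.
function allAllowed(string $test, string $check): bool
{
    return strspn($test, $check) === strlen($test);
}

var_dump(anyAllowed("Test", $check));  // bool(true)
var_dump(anyAllowed("!!!", $check));   // bool(false)
var_dump(allAllowed("Test", $check));  // bool(true)
var_dump(allAllowed("Test!", $check)); // bool(false)
```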

We Keep Coding
