Checking for special characters [duplicate] - php

As part of my registration system I need to check whether a variable contains special characters. How can I perform this check? The person who gives the most precise answer gets best answer.

Assuming that you mean html entities when you say "special chars", you can use this:
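<?php
$string = 'Tom & Jerry'; // example input
$table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'UTF-8');
// preg_quote() escapes any characters that are special inside a regex
$chars = preg_quote(implode('', array_keys($table)), '/');
// The u modifier makes PCRE treat the pattern and the subject as UTF-8
if (preg_match("/[{$chars}]+/u", $string) === 1) {
// special chars in string
}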
get_html_translation_table returns the translation table that htmlentities uses. If you only want the characters that htmlspecialchars converts, pass HTML_SPECIALCHARS instead of HTML_ENTITIES. The return value is an array that maps each character to its escaped entity (for example '&' => '&amp;').
Next, we want to put all of those characters in a regular expression like [&"']+, which matches any run of one or more of the characters inside the square brackets. So we use array_keys to get the keys of the translation table (the unencoded characters), and implode them into a single string.
Then we put them into the regular expression and use preg_match to see if the string contains any of those characters. You can read more about regular expression syntax at the PHP docs.
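The steps above can be wrapped in a small function. This is only a sketch: the function name is made up for illustration, and it assumes that "special characters" means the five characters htmlspecialchars converts when ENT_QUOTES is set.

```php
<?php
// Hypothetical helper name; not part of PHP or of the answer above.
function contains_html_special_chars(string $input): bool
{
    // Keys of the table are the raw characters: " & ' < >
    $table = get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES, 'UTF-8');
    $chars = implode('', array_keys($table));

    // preg_quote() makes the characters safe inside the pattern;
    // the u modifier treats pattern and subject as UTF-8.
    $pattern = '/[' . preg_quote($chars, '/') . ']/u';

    return preg_match($pattern, $input) === 1;
}

var_dump(contains_html_special_chars('Tom & Jerry')); // contains &
var_dump(contains_html_special_chars('plain text'));  // no special characters
```

Swapping HTML_SPECIALCHARS for HTML_ENTITIES widens the check to every character that has a named entity.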

$special_chars = '!@#$%^&*'; // example set: all the special characters you want to check for
$string = 'pa$$word'; // example input: the string you want to check
if (preg_match('/[' . preg_quote($special_chars, '/') . ']/', $string))
{
// special characters exist in the string.
}
Check the manual of preg_match for more details

A quick google search for "php special characters" brings up some good info:
htmlentities() - http://php.net/manual/en/function.htmlentities.php
htmlspecialchars() - http://php.net/manual/en/function.htmlspecialchars.php


Only allow English characters/letters/numbers and a few special characters [duplicate]

My problem is that I am making a small search engine from scratch, but it breaks if I search in Russian or any language other than English. I was hoping someone could give me regex code that filters out (not just detects, but automatically removes) Russian letters, or any other letters except English letters and keyboard special characters (-/:;()$&#". - etc).
Later on, I will implement different language support for my engine, but for now, I want to finish the base of the engine.
Thanks in advance.
You may create an array of allowed characters and then filter those that are not allowed:
$allowed = array_merge(range('a', 'z'), range('A', 'Z'), range(0, 9), array(' ', '+', '/', '-', '*', '.')); // Create an array of allowed characters
$string = 'This is allowed and this not é Ó ½ and nothing 123.'; // test string
$array = str_split($string); // split the string (character length = 1)
echo implode('', array_intersect($array, $allowed)); // Filter and implode !
Why complicate it? A regex has to read the contents of the string anyway, so you may as well do it yourself: read the characters of the string and check their ASCII values.
Create a hash-set-like structure and check manually whether each character falls in the desired set. (PHP's set-like class is SplObjectStorage, but it only stores objects; for single characters a plain array keyed by character does the job.) You can add any characters that you want to accept to this set.
EDIT - You might want to use regex too - something like [a-zA-Z0-9,./+&-] but using a set could allow you to expand your search engine gradually by adding more characters to the known-characters set.
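A minimal sketch of the set-based idea, assuming a plain PHP array keyed by character serves as the set. The function name and the allowed characters are illustrative, not taken from the answer.

```php
<?php
// Hypothetical helper: keeps only the bytes that appear as keys in $allowedSet.
function filter_allowed_chars(string $input, array $allowedSet): string
{
    $out = '';
    // str_split() works byte by byte, so every byte of a multi-byte
    // UTF-8 character (Cyrillic, accented letters) fails the lookup.
    foreach (str_split($input) as $byte) {
        if (isset($allowedSet[$byte])) {
            $out .= $byte;
        }
    }
    return $out;
}

// Build the set once; extend it later by adding more keys.
$allowed = array_fill_keys(
    array_merge(
        range('a', 'z'),
        range('A', 'Z'),
        array_map('strval', range(0, 9)),
        str_split(' ,./+&-')
    ),
    true
);

echo filter_allowed_chars('Привет, world 42!', $allowed), "\n"; // prints ", world 42"
```

Looking up a key with isset() is a constant-time check, and supporting another language later means adding its characters to the set rather than rewriting a pattern.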
This may not be the most efficient way, but it works :)
$str='"it is a simple test \ + - é Ó ½ 213 /:;()$&#".~" ';
// \\\\ in a single-quoted PHP string reaches the regex as \\, i.e. a literal backslash
$result= preg_replace('/[^\s\w+\-\\\\":;#()$&.\/]+/', '', $str);
echo $result;
But you need to list every special character you want to keep.

Different results between preg_replace & preg_match_all

I have a forum that supports hashtags. I'm using the following line to convert all hashtags into links. I'm using the (^|\(|\s|>) pattern to avoid picking up named anchors in URLs.
$str=preg_replace("/(^|\(|\s|>)(#(\w+))/","$1$2",$str);
I'm using this line to pick up hashtags and store them in a separate field when the user posts a message. It picks up all hashtags EXCEPT those at the start of a new line.
preg_match_all("/(^|\(|\s|>)(#(\w+))/",$Content,$Matches);
Using the m & s modifiers doesn't make any difference. What am I doing wrong in the second instance?
Edit: the input text could be plain text or HTML. Example of problem input:
#startoftextreplacesandmatches #afterwhitespacereplacesandmatches <b>#insidehtmltagreplacesandmatches</b> :)
#startofnewlinereplacesbutdoesnotmatch :(
Your replace operation has a problem which you have evidently not yet come across: it will let unescaped HTML special characters through. I know this because your regex allows hashtags to be prefixed with >, which is an HTML special character.
For that reason, I recommend you use this code to do the replacement, which will double up as the code for extracting the tags to be inserted into the database:
$hashtags = array();
$expr = '/(?:(?:(^|[(>\s])#(\w+))|(?P<notag>.+?))/';
$str = preg_replace_callback($expr, function($matches) use (&$hashtags) {
if (!empty($matches['notag'])) {
// This takes care of HTML special characters outside hashtags
return htmlspecialchars($matches['notag']);
} else {
// Handle hashtags
$hashtags[] = $matches[2];
return htmlspecialchars($matches[1]).'#'.htmlspecialchars($matches[2]).'';
}
}, $str);
After the above code has been run, $str will contain the modified string, properly escaped for direct output, and $hashtags will be populated with all the tags matched.

Replace all html codes by preg_replace

I want to replace all HTML codes with an empty space. I think I should use the preg_replace function, but I'm not sure how to do that when the HTML codes look like this:
&#8221;
&#946;
$text="&#946; something &#8221; test...";
$text=preg_replace("&# [what should be here?] ;", " ", $text);
echo $text;
result = something test...
I think it should be only numeric, because I found only numeric ones here: http://www.ascii.cl/htmlcodes.htm
You could look at strip_tags, which removes HTML tags (it does not touch entities). However, those aren't HTML codes; they are called HTML entities.
The regex to match what you want looks like this:
(&#.+?;)
It's rather simple: look for &#, then any characters, as few as possible, up to the next ;.
Edit: As Qtax pointed out, they don't have to be numbers. The dot matches any character.
HTML character references can be defined in two ways. Assuming that you only want to replace numeric character references, you need a regular expression that parses these formats:
&#D; where D is a decimal number
&#xH; where H is a hexadecimal number
The regex that takes care of both:
/&#(\d+|x[\da-f]+);/i
If you want to replace all HTML entities like &foo; you could use something like:
preg_replace('/&(?:[a-z]+|#x[\da-f]+|#\d+);/i', ' ', $text);
If you want to decode them, use html_entity_decode.
&<something>; is the syntax for an HTML entity. If you want to replace all of them, use this regexp:
preg_replace('/&.*?;/', '', $subject); // from ampersand till the next semicolon
It will replace all HTML entities with an empty string, including &auml;, &#x20; and others. Note that it also removes any ordinary text that happens to sit between a plain & and the next semicolon.
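The answers above can be compared on the question's sample input. The sketch below assumes the goal is to blank out both named and numeric references; the function name is hypothetical, and the named-entity part is widened to allow digits (as in &frac12;), which is an assumption beyond the patterns quoted above.

```php
<?php
// Hypothetical helper: replaces character references with a space,
// then collapses the leftover whitespace.
function strip_html_entities(string $text): string
{
    // Named (&auml;), hexadecimal (&#x20;) and decimal (&#946;) references.
    $pattern = '/&(?:[a-z][a-z0-9]*|#x[\da-f]+|#\d+);/i';
    $replaced = preg_replace($pattern, ' ', $text);

    return trim(preg_replace('/\s+/', ' ', $replaced));
}

echo strip_html_entities('&#946; something &#8221; test...'), "\n"; // prints "something test..."

// Decoding instead of removing:
echo html_entity_decode('&#946;', ENT_QUOTES | ENT_HTML5, 'UTF-8'), "\n"; // prints "β"
```

Because the pattern requires a name or number directly after the ampersand, text such as "Tom & Jerry; done" is left alone, which is not the case for the broader /&.*?;/ pattern.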

regex with special characters?

I am looking for a regex that can contain special characters like / \ . ' "
In short, I would like a regex that can match the following:
may contain lowercase
may contain uppercase
may contain a number
may contain space
may contain / \ . ' "
I am making a PHP script to check whether a certain string contains only the above, as a validation check.
The regular expression you are looking for is
^[a-z A-Z0-9\/\\.'"]+$
Remember that in PHP you need to use \ to escape the backslashes and whichever quotation mark you use to delimit the string.
In PHP using preg_match it should look like this:
preg_match("/^[a-z A-Z0-9\\/\\\\.'\"]+$/",$value);
This is a good place to find the regular expressions you might want to use.
http://regexpal.com/
You can always escape them by putting a \ in front of the special characters.
Try this:
preg_match("/^[A-Za-z0-9 \\/\\\\.'\"]+$/", $value) // \\\\ is needed so the regex sees \\, a literal backslash
NikoRoberts is 100% correct.
I would only add the following suggestion: when creating a PHP regex pattern string, always use single quotes. Far fewer characters need to be escaped: only the single quote and the backslash itself (and the backslash only when it precedes a single quote or another backslash, or appears at the end of the string).
When dealing with backslash soup, it helps to print out the (interpreted) regex string. This shows you exactly what is being presented to the regex engine.
Also, a "number" might have an optional sign? Yes? Here is my solution (in the form of a tested script):
<?php // test.php 20110311_1400
$data_good = 'abcdefghijklmnopqrstuvwxyzABCDE'.
'FGHIJKLMNOPQRSTUVWXYZ0123456789+- /\\.\'"';
$data_bad = 'abcABC012~!###$%^&*()';
$re = '%^[a-zA-Z0-9+\- /\\\\.\'"]*$%';
echo($re ."\n");
if (preg_match($re, $data_good)) {
echo("CORRECT: Good data matches.\n");
} else {
echo("ERROR! Good data does NOT match.\n");
}
if (preg_match($re, $data_bad)) {
echo("ERROR! Bad data matches.\n");
} else {
echo("CORRECT: Bad data does NOT match.\n");
}
?>
The following regex will match a single character that fits the description you gave:
[a-zA-Z0-9\ \\\/\.\'\"]
If your point is to ensure that ONLY characters in this range of characters are used in your string, then you can use the negation of this, which would be:
[^a-zA-Z0-9\ \\\/\.\'\"]
In the second case, you could use your regex to find the bad stuff (that you don't want to be included), and if it didn't find anything then your string pattern must be kosher, because I'm assuming that if you find one character that is not in the proper range, then your string is not valid.
So, to put it in PHP syntax:
$regex = "~[^a-zA-Z0-9\ \\\/\.\'\"]~"; // preg_* patterns need delimiters; ~ is used here
if (preg_match($regex, $value)) { // $value stands for the string being validated
// handle the bad stuff
}
Edit 1:
I completely ignored the fact that backslashes are special in PHP double-quoted strings, so here is a correction to the above code:
$regex = "~[^a-zA-Z0-9\\ \\\\\\/\\.\\'\\\"]~";
If that doesn't work, it shouldn't take much to work out how many of the backslashes need to be escaped with a backslash, and which other characters also need to be escaped.
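The negated-class approach can be sketched with a single-quoted pattern, which keeps the backslash count down. The function name is hypothetical, and treating an empty string as valid is an assumption; the question does not say either way.

```php
<?php
// Hypothetical helper: true when $value contains nothing outside
// letters, digits, space, / \ . ' and "
function has_only_allowed_chars(string $value): bool
{
    // In a single-quoted PHP string only \\ and \' are processed,
    // so \\\\ reaches the regex engine as \\ (one literal backslash).
    $disallowed = '~[^a-zA-Z0-9 \\\\/.\'"]~';

    // No disallowed character found means the string is valid.
    return preg_match($disallowed, $value) === 0;
}

var_dump(has_only_allowed_chars('dir\\sub/file.txt "ok"')); // every character is allowed
var_dump(has_only_allowed_chars('bad~char'));               // ~ is not allowed
```

Printing $disallowed with echo shows the pattern exactly as the regex engine receives it, which is a quick way to check the escaping.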

best way to escape and create a slug [duplicate]

I'm somewhat confused about which functions to use to escape and create a slug.
I used this:
$slug_title = mysql_real_escape_string($mtitle);
but someone told me not to use it and to use urlencode() instead.
Which one is better for slugs and security?
As far as I can see, SO inserts - between words:
https://stackoverflow.com/questions/941270/validating-a-slug-in-django
Using either MySQL or URL escaping is not the way to go.
Here is an article that does it better:
function toSlug($string,$space="-") {
if (function_exists('iconv')) {
$string = @iconv('UTF-8', 'ASCII//TRANSLIT', $string);
}
$string = preg_replace("/[^a-zA-Z0-9 -]/", "", $string);
$string = strtolower($string);
$string = str_replace(" ", $space, $string);
return $string;
}
This also works correctly for accented characters.
mysql_real_escape_string() and urlencode() serve different purposes, and neither is appropriate for creating a slug.
A slug is supposed to be a clear & meaningful phrase that concisely describes the page.
mysql_real_escape_string() escapes dangerous characters that can change the purpose of the original query string.
urlencode() escapes invalid URL characters with "%" followed by 2 hex digits that represent their code (e.g. %20 for space). The resulting string will not be clear and meaningful because of the unpleasant character sequences, e.g. http://www.domain.com/bad%20slug%20here%20%3C--
Thus any characters which may be affected by urlencode() should be omitted, except for spaces that are usually replaced with -.
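The rule described above (drop what urlencode() would touch, turn spaces into -) can be sketched as follows. The function name is made up, and collapsing repeated separators and trimming them from the ends are additions beyond what the answers state. The result of iconv transliteration depends on the platform, so the accented-character behaviour is not guaranteed.

```php
<?php
// Hypothetical helper: builds a lowercase, hyphen-separated slug.
function slugify_sketch(string $title, string $separator = '-'): string
{
    // Transliterate accented characters when iconv is available;
    // keep the original string if the conversion fails.
    if (function_exists('iconv')) {
        $converted = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $title);
        if ($converted !== false) {
            $title = $converted;
        }
    }

    $title = strtolower($title);

    // Every run of characters other than a-z and 0-9 becomes one separator.
    $title = preg_replace('/[^a-z0-9]+/', $separator, $title);

    // No separator at the start or end of the slug.
    return trim($title, $separator);
}

echo slugify_sketch('Validating a Slug in Django!'), "\n"; // prints "validating-a-slug-in-django"
```

The slug is for display in the URL only; values that go into a database query still need to be passed through a parameterized query or the database's own escaping.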
