How can I allow only letters and special characters with a regular expression?
I suggest you use GSkinner's regex builder and experiment with the examples on the right-hand side. There are many variations that get this job done. If you want to be explicit, you can use:
/[a-zA-Z!@#$%¨&*()=+\/.{}-]/
Tony's answer will also work, but includes more extra characters than the ones you've defined in your comment.
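A minimal sketch of the explicit-class approach, written in Python for illustration (the thread itself uses PHP). The helper name is_letters_and_specials and the exact set of special characters are assumptions; adjust the class to your own list. The hyphen is placed last so that it is taken literally instead of forming a range.

```python
import re

# Letters plus an illustrative set of special characters (an assumption).
# The hyphen is last in the class so it is a literal, not a range operator.
ALLOWED = re.compile(r"[a-zA-Z!@#$%&*()=+/.{}-]+")

def is_letters_and_specials(text: str) -> bool:
    """Return True if every character of text is in the allowed class."""
    return ALLOWED.fullmatch(text) is not None

print(is_letters_and_specials("hello-world!"))  # True
print(is_letters_and_specials("hello123"))      # False
```

Using fullmatch matters here: an unanchored search would succeed as soon as a single allowed character appears anywhere in the input.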
This
$str = $_REQUEST["htmlstringinput"];
preg_match("([\w\-]+[@#%.])", $str);
allows letters, numbers and the special characters in the range [@#%.],
and this
$str = $_REQUEST["htmlstringinput"];
preg_match("([-a-zA-Z]+[@#%.])", $str);
allows only letters and special characters from the same range as above.
Both worked for me. For further reading and research you can go to: http://gskinner.com/RegExr/
/[\p{L}\p{P}]+/u
matches letters and punctuation characters. Or what did you mean by "special characters"?
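Python's built-in re module does not understand \p{L} or \p{P}, so the sketch below approximates the same check with Unicode general categories from unicodedata. The function name is an assumption made for illustration.

```python
import unicodedata

def is_letters_and_punctuation(text: str) -> bool:
    """True if text is non-empty and every character is a Unicode
    letter (category L*) or punctuation mark (category P*)."""
    return bool(text) and all(
        unicodedata.category(ch)[0] in ("L", "P") for ch in text
    )

print(is_letters_and_punctuation("h\u00e9llo!"))  # True
print(is_letters_and_punctuation("hello 1"))      # False
```

Characters such as $ and + belong to the symbol categories (S*), not punctuation, so they would be rejected by this check; that is one reason it is worth pinning down what "special characters" means.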
All characters that are not a number? How about this:
/[^\d]*/
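One caveat worth checking, sketched here in Python: without anchors, a zero-or-more class also matches the empty string, so the pattern succeeds on any input. The helper has_no_digits is a hypothetical name showing an anchored, one-or-more variant.

```python
import re

# Unanchored and zero-or-more: matches the empty string at position 0,
# so it "succeeds" even on input made only of digits.
print(bool(re.search(r"[^\d]*", "12345")))  # True

def has_no_digits(text: str) -> bool:
    """True only if text is non-empty and contains no digit at all."""
    return re.fullmatch(r"[^\d]+", text) is not None

print(has_no_digits("abc!?"))  # True
print(has_no_digits("abc1"))   # False
```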
Use the following code in .htaccess to block all URLs with numbers (as per the OP's comments):
Options +FollowSymlinks -MultiViews
RewriteEngine on
RewriteCond %{REQUEST_URI} ![0-9]
RewriteRule ^user/ /index.php?goto=missed [NC,L]
I have a link like this
www.example.com/profile.php?name=sagar123
I used this rule:
RewriteRule ^profile/([a-zA-Z0-9_-]+)$ profile.php?name=$1 [L]
and now I can change my URL to look like this:
www.example.com/profile/sagar123
Everything is fine, but now I also want to use Hindi characters, like this:
www.example.com/profile.php?name=सागर (It's working fine)
www.example.com/profile/सागर (It is not working and showing Server error)
Please help me to write a rule or regex to accept all ([a-zA-Z0-9_-]+) and also Hindi Character.
Thanks and regards,
Hindi (Devanagari) characters fall in the \u0900-\u097F range, so you can use that range inside a character class.
To answer your question: most PCRE-based regex engines do not support the \u notation and use the \x{900} format instead:
([\x{900}-\x{97F}a-zA-Z0-9_-]+)$
Python supports \u, so:
([\u0900-\u097Fa-zA-Z0-9_-]+)$
See this for a regex demo in which both English and Hindi characters are matched.
Also, see this for a table mapping literal Hindi characters to their hex values.
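As a quick check of the Python form of the pattern, here is a small sketch. The profile_name helper and the sample paths are assumptions for illustration; the Apache rule itself would use the \x{...} form.

```python
import re

# Devanagari block (U+0900-U+097F) plus the original ASCII set.
SLUG = re.compile(r"^profile/([\u0900-\u097Fa-zA-Z0-9_-]+)$")

def profile_name(path: str):
    """Return the captured name, or None if the path does not match."""
    m = SLUG.match(path)
    return m.group(1) if m else None

print(profile_name("profile/sagar123"))
print(profile_name("profile/\u0938\u093e\u0917\u0930"))  # the Devanagari name from the question
print(profile_name("profile/a b"))                       # None: space is not allowed
```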
Use the (.*) pattern to match any type of character.
Also, with (.*) you don't need a separate + quantifier inside your capturing ( and ) parens: the * already matches any number of characters between ^, which marks the beginning of the URL line, and $, which marks its end.
It should look like...
RewriteRule ^profile/(.*)$ profile.php?name=$1 [L]
If you need further info, I recommend taking a look at Apache.org: Apache mod_rewrite Introduction. They cover most of the characters I've discussed in this post up to this point: ., (, ), +, etc.
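To see how the quantifier and the anchors interact, here is a small Python sketch (mod_rewrite uses PCRE, which behaves the same way for these constructs). The pattern names and the captured helper are illustrative assumptions.

```python
import re

one_char = re.compile(r"^profile/([a-zA-Z0-9_-])$")   # class with no quantifier
one_plus = re.compile(r"^profile/([a-zA-Z0-9_-]+)$")  # one or more from the class
anything = re.compile(r"^profile/(.*)$")              # zero or more of any character

def captured(pattern, path):
    """Return group 1 of the match, or None when the path does not match."""
    m = pattern.match(path)
    return m.group(1) if m else None

print(captured(one_char, "profile/sagar"))  # None: only a single character is allowed
print(captured(one_plus, "profile/sagar"))  # sagar
print(captured(anything, "profile/"))       # empty string
print(captured(anything, "profile/a/b?c"))  # a/b?c
```

The last two lines show the trade-off of (.*): it accepts everything, including an empty name and extra slashes, so any validation has to happen in the script that receives the value.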
I need to write simple routing system, I have only one question.
When I have url/slug like this
/article/1/simple-article-1
What characters should be allowed there.
Of course letters, digits, '-', '/' and?
.htaccess:
Options -Indexes
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?/$1 [L,QSA]
PHP:
if (isset($_SERVER['QUERY_STRING'])) {
    if (!preg_match('/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/', $_SERVER['QUERY_STRING'])) {
        return false;
    }
    $info = explode('/', $_SERVER['QUERY_STRING']);
    ....
}
What characters should be allowed there.
Usually slugs are all lowercase, with accented characters replaced by letters of the English alphabet and blank characters replaced by a - or an _. Punctuation marks like the period, comma, question mark, exclamation point, apostrophe and quotation mark are generally removed. A slug may also be truncated to keep a reasonable length.
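The conventions above can be sketched as a small Python function. The name slugify, the max_length default and the choice to collapse underscores into hyphens are assumptions for illustration, not a standard.

```python
import re
import unicodedata

def slugify(title: str, max_length: int = 60) -> str:
    # Decompose accented letters, then drop the non-ASCII combining marks.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase and remove punctuation, keeping letters, digits, blanks, "_" and "-".
    cleaned = re.sub(r"[^a-z0-9\s_-]", "", ascii_text.lower())
    # Collapse runs of blanks, underscores and hyphens into a single "-".
    slug = re.sub(r"[\s_-]+", "-", cleaned).strip("-")
    # Truncate, without leaving a trailing hyphen behind.
    return slug[:max_length].rstrip("-")

print(slugify("Caf\u00e9 au lait: what's new?"))  # cafe-au-lait-whats-new
```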
The reserved characters that may have a particular meaning in the URI are: !, *, ', (, ), ;, :, @, &, =, +, $, ,, /, ?, #, [ and ]. If the character would conflict with a reserved character's purpose, then the conflicting data must be percent-encoded before the URI is formed.
Once you produce the URI from its component parts, if you want to add characters that are not alpha, digit, -, ., _ or ~, you should always percent-encode them.
Example:
/article/1/i!want!use!the!exclamation!mark <-- bad
/article/1/i%21want%21use%21the%21exclamation%21mark <-- good
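A minimal sketch of the encoding step in Python, using urllib.parse.quote from the standard library. The wrapper name encode_segment is an assumption; passing safe="" means that even "/" is encoded, which is usually what you want for a single path segment.

```python
from urllib.parse import quote, unquote

def encode_segment(segment: str) -> str:
    # Letters, digits, "-", ".", "_" and "~" are never encoded;
    # with safe="" everything else, including "/", is percent-encoded.
    return quote(segment, safe="")

encoded = encode_segment("i!want!use!the!exclamation!mark")
print(encoded)           # i%21want%21use%21the%21exclamation%21mark
print(unquote(encoded))  # i!want!use!the!exclamation!mark
```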
I want to make a URL look pleasing to the eye.
from
/index.php?a=grapes
to
/grapes
However, I'm having a few problems. I want the parameter a to accept a wider variety of characters, such as a-z A-Z 0-9 / _ - . [ ].
from
/index.php?a=Grapes.Are.Green/Red[W4t3r-M3l0n_B1G_Gr4p3]
to
/Grapes.Are.Green/Red[W4t3r-M3l0n_B1G_Gr4p3]
In the index.php file I have
<?php
$a = $_GET["a"];
echo $a;
?>
just to test the URL is working correctly.
Right now what I have in .htaccess
RewriteEngine On
RewriteRule ^([a-zA-Z0-9/_]+)?$ index.php?a=$1
only accepts a-z A-Z 0-9 / _.
If I add - into the square brackets, so that it is one of the characters a can contain, I get the 404 error.
If I add . into the square brackets I get index.php outputted.
If I add [ or ] I get the 404 error.
If anyone has a solution I'd love to see it. Also, if anyone has time please could you explain each part of the RewriteRule saying what the part does. Thanks!
The problem is that some of your characters are "special":
Special characters:
. (full stop) - match any character
* (asterisk) - match zero or more of the previous symbol
+ (plus) - match one or more of the previous symbol
? (question) - match zero or one of the previous symbol
\? (backslash-something) - match special characters
^ (caret) - match the start of a string
$ (dollar) - match the end of a string
[set] - match any one of the symbols inside the square braces.
(pattern) - grouping, remember what the pattern matched as a special variable
So if you want to use them literally in a URL pattern, you have to escape them.
For example
\.s?html? matches ".htm", ".shtm", ".html" or ".shtml"
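The effect of escaping can be checked with a short Python sketch (mod_rewrite uses PCRE, which treats these constructs the same way). The PATH pattern and the accepts helper are assumptions showing one way to write a class containing ., -, [ and ]; whether it resolves the 404 in the asker's setup has not been verified.

```python
import re

# An unescaped "." matches any character; "\." matches only a literal dot.
print(bool(re.fullmatch(r".s?html?", "xhtml")))    # True, probably not intended
print(bool(re.fullmatch(r"\.s?html?", "xhtml")))   # False
print(bool(re.fullmatch(r"\.s?html?", ".shtml")))  # True

# Inside [...] most metacharacters are literal; escape "[" and "]"
# and put "-" last so that it cannot be read as a range.
PATH = re.compile(r"^([a-zA-Z0-9/_.\[\]-]+)?$")

def accepts(path: str) -> bool:
    """True if the whole path consists of allowed characters (or is empty)."""
    return PATH.match(path) is not None

print(accepts("Grapes.Are.Green/Red[W4t3r-M3l0n_B1G_Gr4p3]"))  # True
print(accepts("not allowed"))                                  # False: space
```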
RewriteEngine On
RewriteRule ^(.*)$ index.php?a=$1 [QSA]
The [QSA] thing at the end is what made it work :) Thanks to jedwards for suggesting to use ^(.*)$ which accepts all characters.
Hey guys, can you help me with this? I've got this '/[^A-Za-z]/' but cannot figure out the punctuation part.
Gracious!
The regular expression you are using doesn't allow letters; it's the opposite of what you described in the title.
/[a-z]/i is enough, if you want to accept only letters. If you want to allow letters like à, è, or ç, then you should expand the regular expression; /[\p{L}]/ui should work with all the Unicode letters.
#^[^a-z]+$#i
Your code was correct; you just need the ^ and $ anchors. Anchored this way, the pattern requires every character from the beginning to the end of the string to be something other than a letter. A negated class is preferable to a positive one here.
/[^A-Za-z]*/ will match everything except letters. You shouldn't need to specify numbers or punctuation.
Inside of a character class, the ^ means not.
So you're looking for not a letter.
You want something like
[A-Za-z]+
You can also use the shorthand \w for a "word character" (alphanumeric plus _). Of course some regex engines may differ on support for this, but if it's PCRE it should work. See here (under heading "escape sequences").
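The difference between the negated and the positive class can be sketched in Python as follows. Both helper names are assumptions for illustration.

```python
import re

def only_letters(text: str) -> bool:
    """Positive form: every character must be an ASCII letter."""
    return re.fullmatch(r"[A-Za-z]+", text) is not None

def has_non_letter(text: str) -> bool:
    """Negated form: succeeds as soon as one non-letter is found."""
    return re.search(r"[^A-Za-z]", text) is not None

print(only_letters("Hello"), has_non_letter("Hello"))    # True False
print(only_letters("Hello!"), has_non_letter("Hello!"))  # False True

# \w is broader than letters: it also accepts digits and "_".
print(bool(re.fullmatch(r"\w+", "abc_123")))  # True
```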
This code is being used to parse email; it's stored as a table in a MySQL database. I believe it's PHP code. What does the (.+) do?
/A new order has been successfully placed through(.+)Name:(.+)Company:(.+)Email:(.+)Address 1(.+)Order ID:(.+)Date:(.+)Payment Type:(.+)Order Status:(\s*)Accepted(.*)\n(.+)\$([\d\.]+)\s+X/si
Thanks, super-brainiacs!
That looks like a regular expression match. The (.+) parts are 'wildcard' captures.
This can be run against a string and the necessary information can be extracted. In most cases this string is language independent.
Just a few notes:
/si at the end means 'case insensitive' (i) and 'dot matches all' (s), meaning . will match everything including \n, which it normally does not.
The captures, (.+), can be referenced after matching as $# in your average regex-enabled language, where # is the position at which the (.+) appears in your regex string.
EDIT: You've updated your question so as an example, the section Name:(.+)Company:(.+) will match Name:Some random set of characters Company: Some more random characters where 'Some random set of characters' and 'Some more random characters' are extracted into variables $1 and $2 (because they are first and second in the order in your regex).
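The same idea in a short Python sketch, where the captures are read with groups() instead of $1 and $2. The sample body is hypothetical, since the real message layout is not shown in the thread.

```python
import re

# Hypothetical e-mail fragment; the real message layout is not shown in the thread.
body = "name: Jane Doe\nCompany: Example Ltd\n"

# re.S lets "." match newlines and re.I ignores case, like the /si modifiers.
match = re.search(r"Name:(.+)Company:(.+)", body, re.S | re.I)

# Each (.+) is one capture group; strip the surrounding whitespace it picked up.
name, company = (group.strip() for group in match.groups())
print(name)     # Jane Doe
print(company)  # Example Ltd
```

Because .+ is greedy and the s modifier lets it cross line breaks, each capture runs up to the last place where the following literal text can still match, so the labels in the message need to be unambiguous.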
That's a regex pattern. The relevant PHP documentation starts here: PCRE (Perl-Compatible Regular Expression) Introduction
Each .+ is a placeholder for "any character, one or more times". The parentheses around it make it so that anything matched by that placeholder is "captured", so that it can be used later (in your case, to store it into the database).
It's a regular expression; the (.+) means match any character once, or many times. See here for a handy reference, and here for a tool to test your regular expressions.
It seems to be a quick-and-dirty regex that checks that the substrings 'Name', 'Company', etc. are actually present, with the (.+) pattern capturing the value of each 'field'; in fact, here it captures everything between these substrings.
I think it's an argument for a PHP regex function; given the /.../si delimiters and modifiers, that would be preg_match() rather than ereg().
It's a wildcard. The . means any single character and + means one or more repetitions of the preceding element. For example, [a-zA-Z]+ means one or more characters from the ranges a-z and A-Z, and [a-z]+. means one or more characters from a-z followed by any one character.
The (.+) is a wildcard written in PCRE (regular expression).