PHP URL/slug accept chars - php

I need to write a simple routing system, and I have only one question.
When I have a URL/slug like this:
/article/1/simple-article-1
what characters should be allowed there?
Of course letters, digits, '-', '/', and what else?

.htaccess:
Options -Indexes
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*)$ index.php?/$1 [L,QSA]
PHP:
if (isset($_SERVER['QUERY_STRING'])) {
    if (!preg_match('/^[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*$/', $_SERVER['QUERY_STRING'])) {
        return false;
    }
    $info = explode('/', $_SERVER['QUERY_STRING']);
    ....
}

What characters should be allowed there?
Usually slugs are all lowercase, with accented characters replaced by letters of the English alphabet and blank characters replaced by a - or an _. Punctuation marks such as the period, comma, question mark, exclamation point, apostrophe and quotation mark are generally removed. The slug may also be truncated to keep it at a reasonable length.
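The slug conventions just described can be sketched in a few lines. This is only an illustration in Python, not the asker's PHP code; the function name slugify and the max_length default are my own assumptions.

```python
import re
import unicodedata

def slugify(title: str, max_length: int = 60) -> str:
    """Hypothetical helper: build a lowercase ASCII slug from a title."""
    # Decompose accented characters, then drop the combining marks (e.g. é becomes e).
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    lowered = ascii_text.lower()
    # Remove punctuation: keep only letters, digits, whitespace, - and _.
    cleaned = re.sub(r"[^a-z0-9\s_-]", "", lowered)
    # Collapse runs of whitespace, underscores and hyphens into one hyphen.
    hyphenated = re.sub(r"[\s_-]+", "-", cleaned).strip("-")
    # Truncate, and avoid ending the slug on a hyphen.
    return hyphenated[:max_length].rstrip("-")

print(slugify("Simple Article 1"))  # simple-article-1
```

Characters that have no ASCII decomposition are simply dropped here; a real site would probably want a transliteration table for those.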
The reserved characters that may have a particular meaning in a URI are: !, *, ', (, ), ;, :, @, &, =, +, $, the comma, /, ?, #, [ and ]. If a character would conflict with a reserved character's purpose, it must be percent-encoded before the URI is formed.
Once you produce the URI from its component parts, always percent-encode any character that is not a letter, a digit, -, ., _ or ~.
Example:
/article/1/i!want!use!the!exclamation!mark <-- bad
/article/1/i%21want%21use%21the%21exclamation%21mark <-- good
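To reproduce the "good" form above, here is a small sketch using Python's urllib.parse.quote (in PHP, rawurlencode plays a similar role). The helper name encode_segment is hypothetical.

```python
from urllib.parse import quote, unquote

def encode_segment(segment: str) -> str:
    """Percent-encode one path segment, leaving only unreserved characters."""
    # safe="" means no reserved character is passed through, not even "/".
    return quote(segment, safe="")

encoded = encode_segment("i!want!use!the!exclamation!mark")
print("/article/1/" + encoded)
# Decoding gives back the original text.
print(unquote(encoded))
```

Encode each segment separately and join the segments with / afterwards, otherwise the slashes that separate segments would be encoded too.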

Related

How to allow 1-9 a-z A-Z - _ % in url via htaccess?

I want to allow these characters in the URL: 1-9, a-z, A-Z, -, _ and %.
I have the code below in .htaccess:
RewriteRule ^shop/search/([a-zA-Z0-9_-]+)/?$ shop.php?search=$1 [QSA,NC]
Issue: the rule does not match when a space is passed in the URL.
Example:
domain.com/shop/search/my%20keyword
Basically, I want to allow % in the URL via .htaccess.
How can I do it?
... it is matched against the (%-decoded) URL-path of the request ...
source, emphasis mine.
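mod_rewrite never sees the %: by the time the pattern is matched, the %20 has already been decoded to a space. If you want to accept %20 in the URL, add a space to the character class (written as \s or an escaped space, because an unescaped space separates the arguments of a RewriteRule).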
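A rough way to see this outside Apache is to decode the segment first and then test it against the character class. The sketch below uses Python's re as a stand-in for Apache's regex engine; the function and constant names are made up for the illustration.

```python
import re
from urllib.parse import unquote

# The class from the question, and the same class with a space added.
NO_SPACE = re.compile(r"[a-zA-Z0-9_-]+")
WITH_SPACE = re.compile(r"[a-zA-Z0-9_ -]+")

def matches_after_decoding(pattern: re.Pattern, raw_segment: str) -> bool:
    """Decode the segment as the server would, then test the whole segment."""
    decoded = unquote(raw_segment)  # "my%20keyword" becomes "my keyword"
    return pattern.fullmatch(decoded) is not None

print(matches_after_decoding(NO_SPACE, "my%20keyword"))
print(matches_after_decoding(WITH_SPACE, "my%20keyword"))
```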
Basically, I want to allow % in the URL via .htaccess. How can I do it?
You can use this rewrite rule with a negated character class:
RewriteRule ^shop/search/([^/]+)/?$ shop.php?search=$1 [QSA,NC,L]
[^/]+ matches one or more of any character that is not /, so it also matches whitespace and any other decoded character you want to accept.
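The behaviour of the negated class can be checked with the same pattern in Python. This is only a stand-in for mod_rewrite (decoding is simulated with unquote, and captured_search is a hypothetical name).

```python
import re
from urllib.parse import unquote

# Same pattern as the RewriteRule; IGNORECASE plays the role of the NC flag.
RULE = re.compile(r"^shop/search/([^/]+)/?$", re.IGNORECASE)

def captured_search(raw_path: str):
    """Return what $1 would hold for this path, or None if the rule does not match."""
    match = RULE.match(unquote(raw_path))
    return match.group(1) if match else None

print(captured_search("shop/search/my%20keyword"))
print(captured_search("shop/search/a/b"))
```

The second call returns None: a / inside the search term still ends the segment, so the rule does not match.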

Regex for alphanumeric characters plus brackets and spaces

I am trying to build a regex to match these strings:
jfldfldf ldjfdlf ldfl
ldfldf 8998 dfjldjf 89dfdf dfdf899
ljdljf [dff]dfdf (fdfdf) 898
Requirements:
The string should start with a letter, lowercase or uppercase (a-z, A-Z)
It may contain spaces or brackets (( ) [ ])
Any other special characters are not allowed
I tried /^[a-zA-Z]+[\sa-zA-Z0-9\[\]\(\)].+/m, but it is still accepting other special characters.
So close.
/^[a-zA-Z]+[\sa-zA-Z0-9\[\]\(\)].+/m
          ^                     ^ ^-- missing $
          ^                     ^-- delete this dot
          ^-- you could also delete this plus, but that's not as important
/^[a-zA-Z]{1}[a-zA-Z0-9\ \[\]\(\)]+$/m
\s allows any whitespace, including tabs and newlines, so this should probably be "\ " (a literal space).
Because only the first character has to be an uppercase or lowercase letter, it is strictly {1}; + would mean one or more.
A $ is needed at the end to mark the end of the line, so that nothing else can follow.
The biggest thing that is failing in that regex is the single '.', which is a wildcard matching any character except a newline. The first plus is not needed, the character class needs its own quantifier, and the end-of-line anchor '$' is missing.
/^[a-zA-Z][\sa-zA-Z0-9\[\]\(\)]*$/m
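To check the advice in these answers, here is a quick Python sketch comparing the asker's pattern with an anchored form (one leading letter, then zero or more allowed characters, then $). I used a literal space instead of \s, as one answer suggests; the names ORIGINAL and ANCHORED are mine.

```python
import re

# The asker's pattern: the trailing ".+" lets any character through.
ORIGINAL = re.compile(r"^[a-zA-Z]+[\sa-zA-Z0-9\[\]\(\)].+")
# Anchored form: first a letter, then only letters, digits, spaces and brackets.
ANCHORED = re.compile(r"^[a-zA-Z][ a-zA-Z0-9\[\]()]*$")

samples = [
    "jfldfldf ldjfdlf ldfl",
    "ldfldf 8998 dfjldjf 89dfdf dfdf899",
    "ljdljf [dff]dfdf (fdfdf) 898",
]

for text in samples:
    print(text, bool(ANCHORED.match(text)))

# A string with a forbidden character passes the original but not the anchored form.
print(bool(ORIGINAL.match("abc$def")), bool(ANCHORED.match("abc$def")))
```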

Using .htaccess to make fancy URLs with a wide variety of characters

I want to make a URL look pleasing to the eye,
from
/index.php?a=grapes
to
/grapes
However, I'm having a few problems. I want a to accept a wider variety of characters, such as a-z A-Z 0-9 / _ - . [ ],
from
/index.php?a=Grapes.Are.Green/Red[W4t3r-M3l0n_B1G_Gr4p3]
to
/Grapes.Are.Green/Red[W4t3r-M3l0n_B1G_Gr4p3]
In the index.php file I have
<?php
$a = $_GET["a"];
echo $a;
?>
just to test the URL is working correctly.
Right now what I have in .htaccess
RewriteEngine On
RewriteRule ^([a-zA-Z0-9/_]+)?$ index.php?a=$1
only accepts a-z A-Z 0-9 / _.
If I add - inside the square brackets so that a can contain it, I get a 404 error.
If I add . inside the square brackets, I get index.php as the output.
If I add [ or ], I get a 404 error.
If anyone has a solution, I'd love to see it. Also, if anyone has time, could you please explain what each part of the RewriteRule does? Thanks!
The problem is that some of your characters are "special":
Special characters:
. (full stop) - match any character
* (asterisk) - match zero or more of the previous symbol
+ (plus) - match one or more of the previous symbol
? (question) - match zero or one of the previous symbol
\? (backslash-something) - match special characters
^ (caret) - match the start of a string
$ (dollar) - match the end of a string
[set] - match any one of the symbols inside the square braces.
(pattern) - grouping, remember what the pattern matched as a special variable
So if you want to use them literally in a URL pattern, you have to escape them.
For example
\.s?html? matches ".htm", ".shtm", ".html" or ".shtml"
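To make the effect of escaping concrete, this Python sketch compares a pattern with an escaped dot against the same pattern with a bare dot. Python's re is used as a stand-in here; the basic metacharacters listed above behave the same way in it.

```python
import re

ESCAPED = re.compile(r"\.s?html?")   # the dot must be a literal "."
UNESCAPED = re.compile(r".s?html?")  # the dot matches any single character

for name in [".htm", ".shtm", ".html", ".shtml", "xhtml"]:
    print(
        name,
        ESCAPED.fullmatch(name) is not None,
        UNESCAPED.fullmatch(name) is not None,
    )
```

Only the unescaped version accepts "xhtml", because its first character can be anything.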
RewriteEngine On
RewriteRule ^(.*)$ index.php?a=$1 [QSA]
The [QSA] thing at the end is what made it work :) Thanks to jedwards for suggesting to use ^(.*)$ which accepts all characters.
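If accepting every character with ^(.*)$ is broader than you want, a restricted class can also hold the characters from the question. Inside square brackets the dot is literal, [ and ] need a backslash, and - is safest as the last character. The sketch below checks such a class in Python; I have not tested it in Apache, so treat the pattern as an assumption.

```python
import re

# Letters, digits, /, _, ., [, ] and - (the hyphen is last so it is not read as a range).
ALLOWED = re.compile(r"^[a-zA-Z0-9/_.\[\]-]+$")

for path in [
    "grapes",
    "Grapes.Are.Green/Red[W4t3r-M3l0n_B1G_Gr4p3]",
    "grapes?x",
    "index.php",
]:
    print(path, ALLOWED.match(path) is not None)
```

Note that index.php itself matches once . is allowed, which is likely why the asker saw index.php echoed after adding the dot.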

Match only letters and special characters with RegExp

How can I allow only letters and special characters with a regular expression?
I suggest you use GSkinner's regex builder and experiment with the examples on the right-hand side. There are many variations that get this job done. If you want to be explicit, you can use:
/[a-zA-Z!@#$%¨&*()=+\/.{}-]/
Tony's answer will also work, but includes more extra characters than the ones you've defined in your comment.
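One thing to watch with explicit classes like this: a hyphen between two characters inside the brackets forms a range, so )-= covers everything from ) to = in ASCII, digits included. A small Python sketch of the difference:

```python
import re

RANGE = re.compile(r"[)-=]")    # range from ")" to "=", which includes 0-9
LITERAL = re.compile(r"[)=-]")  # hyphen last: just ")", "=" and "-"

for ch in ["5", "-", "=", "a"]:
    print(
        repr(ch),
        RANGE.search(ch) is not None,
        LITERAL.search(ch) is not None,
    )
```

Putting the hyphen first or last in the class (or escaping it) keeps it literal.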
This:
$str = $_REQUEST["htmlstringinput"];
preg_match("([\w\-]+[@#%.])", $str);
allows letters, numbers and the special characters in the class [@#%.], and this:
$str = $_REQUEST["htmlstringinput"];
preg_match("([-a-zA-Z]+[@#%.])", $str);
allows only letters and the special characters in the same class as above.
Both worked for me. For further reading and research, you can go to: http://gskinner.com/RegExr/
/[\p{L}\p{P}]+/u
matches letters and punctuation characters. Or what did you mean by "special characters"?
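For comparison, Python's built-in re module has no \p{L} or \p{P}, but the same Unicode categories can be checked with unicodedata. This is a sketch of a whole-string check, not a translation of the unanchored pattern above; the function name is hypothetical.

```python
import unicodedata

def only_letters_and_punctuation(text: str) -> bool:
    """True if text is non-empty and every character is a letter (L*) or punctuation (P*)."""
    return bool(text) and all(
        unicodedata.category(ch)[0] in ("L", "P") for ch in text
    )

print(only_letters_and_punctuation("Hello,world!"))
print(only_letters_and_punctuation("abc123"))
print(only_letters_and_punctuation("a+b"))
```

Characters such as + and $ are classed as symbols, not punctuation, so they are rejected here, and the same holds for \p{P}. That is one reason it matters what "special characters" means.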
All characters that are not a number? How about this:
/[^\d]*/
Use the following code in .htaccess to block all URLs with a number (as per the OP's comments):
Options +FollowSymlinks -MultiViews
RewriteEngine on
RewriteCond %{REQUEST_URI} ![0-9]
RewriteRule ^user/ /index.php?goto=missed [NC,L]

.htaccess rewrite rule won't allow ' and @

I have a rewrite rule that rewrites domain.co.uk/member.php?x=$member to domain.co.uk/$member
It looks like this:
RewriteEngine On
RewriteRule ^([a-zA-Z0-9_-]+)$ member.php?x=$1
RewriteRule ^([a-zA-Z0-9_-]+)/$ member.php?x=$1
I've tried just adding ' and @ inside the square brackets, but then I get a 500 Internal Server Error. I need these characters for people's usernames.
How do I do this?
@ is used to specify the user and password in a URI string like this:
http://user:passw...@host/path.
You need to URL-encode it as %40.
Your path will then be /user%40foo.com or something like this.
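As an illustration of that encoding step, here is a sketch in Python (PHP's rawurlencode does a similar job). The helper name encode_username is hypothetical.

```python
from urllib.parse import quote, unquote

def encode_username(username: str) -> str:
    """Percent-encode a username so it can be used as a single path segment."""
    # safe="" encodes every reserved character, including "@" and "'".
    return quote(username, safe="")

print("/" + encode_username("user@foo.com"))
print("/" + encode_username("o'brien"))
# The script receiving the request can recover the original name.
print(unquote("user%40foo.com"))
```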
This should work
From RFC 1738:
The characters ";", "/", "?", ":",
"@", "=" and "&" are the characters
which may be reserved for special
meaning within a scheme. No other
characters may be reserved within a
scheme.
and:
Thus, only alphanumerics, the special
characters "$-_.+!*'(),", and
reserved characters used for their
reserved purposes may be used
unencoded within a URL.
What you should do:
Encode the '@' as %40.
Escape the single quote in the .htaccess like so: \'
