PHP Regular expression preg_replace definition missing - php

I want to validate user input, so I'm using a regular expression.
I'd like the user to insert only alphanumeric content and some specific characters as - ,!$^& etc.
So far the code I've got is :
$validText = preg_replace('#[^A-Za-z0-9\w\ ]#', '',$text);
But it only handles alphanumerics and spaces. How do I define the regular expression to also allow characters such as .'!## etc.? Where do I define them?
By the way, could you please refer me to a good regular expression?
Any suggesti

Simply add them to your group.
$validText = preg_replace('/[^A-Za-z0-9\w .\'!##]/', '',$text);
Note: preg_match() may be better suited in this case. Currently you're allowing invalid characters then removing them. Makes for a poor user experience.
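To illustrate the preg_match() suggestion, here is a minimal sketch that rejects invalid input instead of silently stripping it. The function name is_valid_text() and the exact set of allowed characters are assumptions pieced together from the characters mentioned in the question; adjust the class to your own whitelist.

```php
<?php
// Hypothetical helper: true only if $text consists entirely of allowed characters.
function is_valid_text(string $text): bool
{
    // Allowed (assumed whitelist): word characters (\w), space, and . ' ! # , $ ^ &
    // plus the hyphen, which is placed last so it is literal rather than a range.
    // The D modifier stops $ from matching before a trailing newline.
    return preg_match('/^[\w .\'!#,$^&-]+$/D', $text) === 1;
}

var_dump(is_valid_text("Hello, world!")); // bool(true)
var_dump(is_valid_text("<script>"));      // bool(false)
```

With a check like this you can show the user an error message and let them correct the input, rather than changing what they typed.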

Related

replace special strings in a html page by php

I am looking for a way to replace all similar-looking strings in an entire page with their defined values.
Please do not recommend me other methods of including language constants.
Strings like this :
[_HOME]
[_NEWS]
They all share the same [_*] form.
Now the big issue is how to scan an HTML page and replace these strings with the defined values.
One way to parse the HTML page is to use DOMDocument and then preg_replace() it,
but my main problem is writing a pattern for the replacement.
$pattern = "/[_i]/";
$replacement= custom_lang("/i/");
$doc = new DOMDocument();
$htmlPage = $doc->loadHTML($html);
preg_replace($pattern, $replacement, $htmlPage);
In regex, [ and ] are metacharacters (they delimit a character class), so to match them literally you need to escape them.
The other problem with your expression is _*, which will match zero or more _. You need to replace it with some meaningful match, like _.*, which will match _ and any other characters after that. So your full expression becomes,
/\[_.*?\]/
Hey, why the ?, you might be tempted to ask. The reason is that it performs a non-greedy match. For example,
if [_foo] [_bar] is the subject string, a greedy match returns one match covering the whole of it, because the whole string satisfies the expression, but a non-greedy match gets you two separate matches.
You might be better off being more restrictive, by requiring an _ followed by capital letters. Like,
/\[_[A-Z]+\]/
Update: Using the matched strings and replacing them. To do so we use a concept called back-referencing.
Consider modifying the above expression, enclosing the string in parentheses, like, /\[_([A-Z]+)\]/
Now in the preg_replace arguments we can use the expression in parentheses by back-referencing it with $1. So what you can use is,
preg_replace("/\[_([A-Z]+)\]/e", "my_wonderful_replacer('$1')", $html);
Note: We needed the e modifier to treat the second parameter as PHP code. The e modifier was deprecated in PHP 5.5 and removed in PHP 7.0; on current versions, pass a callback to preg_replace_callback() instead.
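On PHP versions where the e modifier is no longer available, the same back-reference idea can be sketched with preg_replace_callback(). The $translations array below is a hypothetical stand-in for wherever the language constants actually live; it is not part of the original question.

```php
<?php
// Hypothetical lookup table standing in for the asker's language constants.
$translations = ['HOME' => 'Home', 'NEWS' => 'News'];

$html = '<a href="/">[_HOME]</a> <a href="/news">[_NEWS]</a> [_MISSING]';

$result = preg_replace_callback(
    '/\[_([A-Z]+)\]/',
    function (array $m) use ($translations) {
        // $m[0] is the whole placeholder, $m[1] is the captured key.
        // Placeholders without a translation are left untouched.
        return $translations[$m[1]] ?? $m[0];
    },
    $html
);

echo $result, "\n";
// <a href="/">Home</a> <a href="/news">News</a> [_MISSING]
```

The callback receives the match array directly, so no code is built from strings and evaluated.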
If you know the full keyword you are trying to replace (e.g. [_HOME]), then you can just use str_replace() to replace all instances.
No need to make things like this more complex by introducing regex.
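As a rough illustration of the str_replace() route, assuming every placeholder is known in advance (the $map contents are made up for the example):

```php
<?php
// Assumed, fixed list of placeholders and their replacement values.
$map = [
    '[_HOME]' => 'Home',
    '[_NEWS]' => 'News',
];

$html = '<li>[_HOME]</li><li>[_NEWS]</li>';

// str_replace() accepts arrays of search strings and replacements.
$page = str_replace(array_keys($map), array_values($map), $html);

echo $page, "\n"; // <li>Home</li><li>News</li>
```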

php validating and authenticating strings

I have a couple of strings that make up a CustomerInfo object. I want to write a function that authenticates each of those strings.
For instance I have the following values which are stored as strings:
apartment number
Street
City
Name
Telephone
Email
Each of these is received by the server and they need to be stored in a database. However, before I do that, I would like to authenticate the contents of the string variables that carry these values.
I am new to PHP and server side programming in general. I was wondering what are some good and yet simple strategies to accomplish this.
Could someone point me to some ideas and links please.
Thanks
This is where regex is useful. Meet preg_match. You may want to read a tutorial on regular expressions :) And when you get good at them, keep in mind that they can't do everything. They can only parse regular languages. Sometimes people get carried away and try to do too much with regex, so that's just a warning. Here's a simple example usage:
if(!preg_match('/^[\w.%+-]+@[\w.-]+\.[A-Z]{2,5}$/i', $email)){
// Email is invalid.
// Handle it here
}
The pattern /^[\w.%+-]+@[\w.-]+\.[A-Z]{2,5}$/i can be broken down like:
/ --> Delimiter; any character that is not alphanumeric, a backslash, or whitespace can be used.
^ --> Start of the string
[\w.%+-]+ --> One or more (+) characters from the set [\w.%+-], which allows word characters (letters, digits and underscores) and any of the symbols '.%+-'
@ --> A single @ sign
[\w.-]+ --> One or more word characters, dots or hyphens
\. --> A single dot
[A-Z]{2,5} --> 2-5 capital letters
$ --> End of string
/ --> End delimiter (End of regular expression)
i --> Case-insensitive modifier (This means that where I had A-Z before will now also match a-z)
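To see how the pieces of that pattern behave together, here is a small sketch that runs it against a few sample addresses. It assumes the separator in the pattern is the @ sign; the helper name is_email_like() and the sample addresses are made up for the illustration.

```php
<?php
// Hypothetical wrapper around the pattern discussed above.
function is_email_like(string $email): bool
{
    return preg_match('/^[\w.%+-]+@[\w.-]+\.[A-Z]{2,5}$/i', $email) === 1;
}

$samples = [
    'john.doe@example.com',   // local part, @, domain, dot, 3-letter TLD
    'a+b@mail.example.org',   // + is allowed in the local part
    'no-at-sign.example.com', // no @ at all
    'user@host',              // no dot followed by a 2-5 letter suffix
    'x@example.museum',       // 6-letter TLD exceeds {2,5}
];

foreach ($samples as $s) {
    printf("%-24s %s\n", $s, is_email_like($s) ? 'valid' : 'invalid');
}
```

The last sample shows a limitation of the {2,5} quantifier: real top-level domains longer than five letters are rejected.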
In PHP you can use preg_match to check your strings against regular expressions. If you want to use these but are not familiar with regular expressions, you could look at a simple tutorial, or you could just search for a good regular expression in an online regular expression database (for example, one listing regular expressions used for email pattern checking).
You can use built-in PHP filters to perform standard validation:
http://php.net/manual/en/function.filter-var.php
http://php.net/manual/en/filter.filters.validate.php
// Example:
if (filter_var($email_input, FILTER_VALIDATE_EMAIL) === false) {
// Invalid e-mail!
}
Please note, this is only available for PHP 5.2+
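Putting the suggestions together for the fields listed in the question, a validation function might look roughly like this. Everything here is a sketch under assumptions: the function name validate_customer(), the array keys, and the rules themselves (apartment number as a positive integer, a loose telephone pattern, non-empty name/street/city) are illustrative choices, not requirements stated in the question.

```php
<?php
// Hypothetical validator: returns the names of the fields that failed.
function validate_customer(array $in): array
{
    $errors = [];

    // Built-in filter for e-mail addresses.
    if (filter_var($in['email'] ?? '', FILTER_VALIDATE_EMAIL) === false) {
        $errors[] = 'email';
    }

    // Assumed rule: apartment number is an integer >= 1.
    $intOptions = ['options' => ['min_range' => 1]];
    if (filter_var($in['apartment'] ?? '', FILTER_VALIDATE_INT, $intOptions) === false) {
        $errors[] = 'apartment';
    }

    // Assumed rule: optional leading +, then 7-20 digits, spaces, parentheses or hyphens.
    if (preg_match('/^\+?[0-9 ()-]{7,20}$/', $in['telephone'] ?? '') !== 1) {
        $errors[] = 'telephone';
    }

    // Assumed rule: free-text fields must simply not be blank.
    foreach (['name', 'street', 'city'] as $field) {
        if (trim($in[$field] ?? '') === '') {
            $errors[] = $field;
        }
    }

    return $errors;
}
```

An empty array means every field passed; otherwise the returned names tell you which inputs to report back to the user. Validation like this does not replace escaping or prepared statements when the values are written to the database.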

REGEX (PCRE) matching only if zero or once

I have the following problem.
Let's take the input (wikitext)
======hello((my first program)) world======
I want to match "hello", "my first program" and " world" (notice the space).
But for the input:
======hello(my first program)) world======
I want to match "hello(my first program" and " world".
In other words, I want to match any letters, spaces and additionally any single symbols (no double or more).
This should be done with the unicode character properties like \p{L}, \p{S} or \p{Z}, as documented here.
Any ideas?
Addendum 1
The regex has just to stop before any double symbol or punctuation in unicode terms, that is, before any \p{S}{2,} or \p{P}{2,}.
I'm not trying to parse the whole wikitext with this, read my question carefully. The regex I'm looking for IS for the lexer I'm working on, and making it match such inputs will simplify my parser incredibly.
Addendum 2
The pattern must work with preg_match(). I can imagine how I'd have to split it first. Perhaps it would use some lookahead, I don't know, I've tried everything that I could imagine.
Using only preg_match() is a requirement set in stone by the current implementation of the lexer. It must be that way, because that's the natural way of how lexers work: they match sequences in the input stream.
return preg_split('/([\pS\pP])\\1+/', $theString);
Result: http://www.ideone.com/YcbIf
(You need to get rid of the empty strings manually.)
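Rather than filtering the empty strings by hand, preg_split() can drop them itself via the PREG_SPLIT_NO_EMPTY flag. A small sketch using the two inputs from the question; the function name is made up, and the u modifier is an addition here so that \pS and \pP are applied to UTF-8 code points rather than single bytes.

```php
<?php
// Hypothetical wrapper: split on any symbol/punctuation character that is
// immediately repeated, discarding empty pieces.
function split_on_doubled_symbols(string $text): array
{
    return preg_split('/([\pS\pP])\1+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
}

print_r(split_on_doubled_symbols('======hello((my first program)) world======'));
// hello / my first program / " world"

print_r(split_on_doubled_symbols('======hello(my first program)) world======'));
// hello(my first program / " world"
```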
Edit: as a preg_match regex:
'/(?:^|([\pS\pP])\\1+)((?:[^\pS\pP]|([\pS\pP])(?!\\3))*)/'
take the 2nd capture group when it is matched. Example: http://www.ideone.com/ErTVA
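But you could just consume ([\pS\pP])\\1+ and discard it or, if that doesn't match, consume (?:[^\pS\pP]|([\pS\pP])(?!\\1))* (the backreference is \\1 here because it is the only group once the pattern stands alone) and record it, since your lexer is going to use more than one regex anyway?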
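The consume-and-discard versus consume-and-record idea can be sketched as a tiny lexer loop built only on preg_match(). This is an illustration under assumptions, not the asker's lexer: the function name lex() is made up, \G anchors each match at the current offset, the text pattern uses + instead of * so every iteration makes progress, and the u modifier is added for UTF-8 input.

```php
<?php
// Hypothetical two-rule lexer: doubled symbols are separators, everything else is text.
function lex(string $input): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        if (preg_match('/\G([\pS\pP])\1+/u', $input, $m, 0, $offset) === 1) {
            // A symbol repeated two or more times: consume and discard.
        } elseif (preg_match('/\G(?:[^\pS\pP]|([\pS\pP])(?!\1))+/u', $input, $m, 0, $offset) === 1) {
            // Non-symbols, or single symbols not followed by themselves: record.
            $tokens[] = $m[0];
        } else {
            break; // Neither rule matched (e.g. invalid UTF-8); stop rather than loop forever.
        }
        $offset += strlen($m[0]); // Offsets are in bytes, so advance by byte length.
    }

    return $tokens;
}

print_r(lex('======hello((my first program)) world======'));
print_r(lex('======hello(my first program)) world======'));
```

For the two inputs from the question this yields hello, my first program, " world" and hello(my first program, " world" respectively.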
Regular expressions are notoriously overused and ill-suited for parsing languages like this. You can get away with it for a little while, but eventually you will find something that breaks your parser, requiring tweak after tweak and a huge library of unit tests to ensure compliance.
You should seriously consider writing a proper lexer and parser instead.

Regular expression to convert usernames into links like Twitter does

On Twitter, when you write @moustafa
it changes to <a href='user/moustafa'>@moustafa</a>
Now I want to do the same thing:
when I write @moustafa followed by a space, only @moustafa should become the link.
One regular expression that could be used (shamelessly stolen from the @anywhere JavaScript library mentioned in another answer) would be:
\B\@([a-zA-Z0-9_]{1,20})
This looks for a non-word-boundary (to prevent a@b [i.e. emails] from matching) followed by @, then between one and 20 (inclusive) characters in that character class. Of course, the anything-except-space route suggested in other answers would also work; it depends very much on what values are to be (dis)allowed in the label part of the @label.
To use the highlighted regex in PHP, something like the following could be used to replace the mentions in a string $subject.
$subject = 'Hello, @moustafa how are you today?';
echo preg_replace('/\B\@([a-zA-Z0-9_]{1,20})/', '<a href="user/$1">$0</a>', $subject);
The above outputs something like:
Hello, <a href="user/moustafa">@moustafa</a> how are you today?
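To show what the \B guard buys you, here is a short sketch that runs the same idea over text containing both an e-mail address and a mention. It assumes the mention marker is @ as on Twitter and that links point at user/<name> as in the question; the function name linkify_mentions() is made up.

```php
<?php
// Hypothetical helper: wrap @mentions in links, leaving e-mail addresses alone.
function linkify_mentions(string $text): string
{
    // \B requires that @ is NOT preceded by a word character,
    // so the @ inside bob@example.com is skipped.
    return preg_replace(
        '/\B@([a-zA-Z0-9_]{1,20})/',
        '<a href="user/$1">$0</a>',
        $text
    );
}

echo linkify_mentions('Mail bob@example.com or ping @moustafa today'), "\n";
// Mail bob@example.com or ping <a href="user/moustafa">@moustafa</a> today
```

If the text comes from users, escape it (for example with htmlspecialchars()) before adding the links.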
You're looking for a regular expression that matches @username, where username doesn't have a space? You can use:
@[^ ]+
If you know the allowed characters in a username you can be more specific, like if they have to be alphanumeric:
@[A-Za-z0-9]+
Regular Expressions in PHP are just Strings that start and end with the same character. By convention this character is /
So you can use something like this as an argument to any of the many php regular expression functions:
Not space:
"/[^ ]+/"
Alphanumeric only:
"/[A-Za-z0-9]+/"
Why not use the @anywhere JavaScript library that Twitter have recently released?
There are several libraries that perform this selection and linking for you. Currently I know of Java, Ruby, and PHP libraries under mzsanford's Github account: http://github.com/mzsanford/twitter-text-rb

Regular Expression to Detect a Specific Query

I wonder if anyone can construct a regular expression that can detect whether a person searches for something like "site:cnn.com" or "site:www.globe.com.ph/". I've been having the most difficult time figuring it out. Thanks a lot in advance!
Edit: Sorry forgot to mention my script is in PHP.
OK, for input into an arbitrary text field, something as simple as the following will work:
\bsite:(\S+)
where the parentheses will capture whatever site/domain they're trying to search. It won't verify it as valid, but validating urls/domains is complex and there are many easily googlable regexes for doing that, for instance, there's one here.
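In PHP, using that pattern might look like the sketch below. The function name parse_site_query(), the case-insensitive i modifier and the trimming of a trailing slash are assumptions added for the example; the captured value is not validated as a real domain.

```php
<?php
// Hypothetical helper: returns the site from a "site:..." query, or null if absent.
function parse_site_query(string $query): ?string
{
    if (preg_match('/\bsite:(\S+)/i', $query, $m) === 1) {
        // $m[1] is whatever non-space text followed "site:".
        return rtrim($m[1], '/');
    }
    return null;
}

var_dump(parse_site_query('site:cnn.com'));           // string(7) "cnn.com"
var_dump(parse_site_query('site:www.globe.com.ph/')); // string(16) "www.globe.com.ph"
var_dump(parse_site_query('latest news'));            // NULL
```

A null result can be used as the signal to fall back to a regular web search.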
What are you matching against? A referer url?
Assuming you're matching against a referer url that looks like this:
http://www.google.com/search?client=safari&rls=en-us&q=whatever+site:foo.com&ie=UTF-8&oe=UTF-8
A regex like this should do the trick:
\bsite(?:\:|%3[aA])(?:(?!(?:%20|\+|&|$)).)+
Notes:
The colon after 'site' can either be unencoded or it can be percent encoded. Most user agents will leave it unencoded (which I believe is actually contrary to the standard), but this will handle both
I assumed the site:... url would be right-bounded by the equivalent of a space character, end of field (&) or end of string ($)
I made no assumption about whether spaces are encoded as '+' (x-www-form-urlencoded) or as %20 (percent encoding). This will handle both
The (?:...) is a non-capturing group. (?!...) is a negative lookahead.
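If the input really is a referer URL, an alternative to matching the encoded form is to decode the query string first and then apply the simpler \bsite:(\S+) pattern. This swaps in a different technique from the regex above: it relies on parse_url() and parse_str(), and it assumes the search terms are in a parameter named q, as in the example URL. The function name is made up.

```php
<?php
// Hypothetical helper: pull the site:... value out of a search-engine referer URL.
function extract_site_from_referer(string $refererUrl): ?string
{
    $query = parse_url($refererUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // No query string (or an unparsable URL).
    }

    // parse_str() decodes both '+' and %XX sequences for us.
    parse_str($query, $params);
    $q = $params['q'] ?? '';
    if (!is_string($q)) {
        return null; // e.g. q[]=... was sent as an array.
    }

    return preg_match('/\bsite:(\S+)/i', $q, $m) === 1 ? $m[1] : null;
}

$url = 'http://www.google.com/search?client=safari&rls=en-us&q=whatever+site:foo.com&ie=UTF-8&oe=UTF-8';
var_dump(extract_site_from_referer($url)); // string(7) "foo.com"
```

Decoding first also sidesteps one quirk of matching the raw URL: after %20 the \b assertion does not hold, because the 0 is a word character.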
No, it's not for a referrer URL. My PHP script basically spits out information about a domain (e.g. backlinks, pagerank etc.) and I need that regex so it will know what the user is searching for. If the user enters something that doesn't match the regex, it does a regular web search instead.
If this is all you are trying to do, I guess I'd take the simpler approach and just do:
$entry = $_REQUEST['q'];
// explode() is used because split() was removed in PHP 7; the limit of 2 keeps any later colons in the value
$tokens = explode(':', trim($entry), 2);
if (1 < count($tokens) && strtolower($tokens[0]) == 'site')
$site = $tokens[1];
