PHP preg_replace() pattern, string sanitization - php

I have a regex email pattern and would like to strip all but the pattern-matched characters from the string; in short, I want to sanitize the string.
I'm not a regex guru, so what am I missing in the regex?
<?php
$pattern = "/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i";
$email = 'contact<>@domain.com'; // wrong email
$sanitized_email = preg_replace($pattern, NULL, $email);
echo $sanitized_email; // Should be contact@domain.com
?>
Pattern taken from: http://fightingforalostcause.net/misc/2006/compare-email-regex.php (the very first one...)

You cannot filter and match at the same time. You'll need to break it up into a character class for stripping invalid characters and a matching regular expression which verifies a valid address.
$email = preg_replace($filter, "", $email);
if (preg_match($verify, $email)) {
// ok, sanitized
return $email;
}
For the first step, you want to use a negated character class, /[^allowedchars]/.
For the second step, you use the structure /^...@...$/.
Have a look at PHP's filter extension. For cleansing, it uses const unsigned char allowed_list[] = LOWALPHA HIALPHA DIGIT "!#$%&'*+-=?^_`{|}~@.[]";
And there is the monster for validation: line 525 in http://gcov.php.net/PHP_5_3/lcov_html/filter/logical_filters.c.gcov.php - but check out http://www.regular-expressions.info/email.html for a more common and shorter variant.
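The two-step approach (strip with a negated character class, then verify with an anchored pattern) could be sketched like this. The function name sanitizeEmail and both patterns are assumptions for illustration: the whitelist loosely follows the filter extension's allowed list above, and the verification pattern is deliberately simplified rather than a complete RFC validator.

```php
<?php
// Hypothetical helper: sanitize first, then verify the result.
function sanitizeEmail(string $email): ?string
{
    // Step 1: negated class, removes every character NOT in the whitelist.
    $filter = '/[^a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~.@-]/';
    // Step 2: anchored /^...@...$/ check (simplified: local part, "@", dotted domain).
    $verify = '/^[a-zA-Z0-9!#$%&\'*+\/=?^_`{|}~.-]+@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)+$/';

    $email = preg_replace($filter, '', $email);
    if (preg_match($verify, $email)) {
        return $email; // ok, sanitized
    }
    return null; // still not address-shaped after filtering
}

var_dump(sanitizeEmail('contact<>@domain.com')); // string(18) "contact@domain.com"
var_dump(sanitizeEmail('not an email'));         // NULL
```

Keeping the two patterns separate means the filter can be generous while the verification decides whether the cleaned string is acceptable.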

I guess PHP's filter_var() function can also do this, and in a cleaner way.
Have a look at:
http://www.php.net/manual/en/function.filter-var.php
Example:
$email = "chris@exam\\ple.com";
$cleanEmail = filter_var($email, FILTER_SANITIZE_EMAIL); // chris@example.com

Related

php preg replace include slash(/)

I use preg_replace() because my database column does not support "strange letters",
but after the regex I need to keep "/". In the code below, "/" is always removed,
and I need all of these characters kept intact.
<?php
$jurnalName = "TL 110/90-12 K93-N02 AHM+";
$name = htmlspecialchars(htmlentities($jurnalName));
$name = preg_replace('/[^A-Za-z0-9|\- +]/', '', $name);
var_dump($name);
The result is always "TL 11090-12 K93-N02 AHM+", but what I expect is the complete "TL 110/90-12 K93-N02 AHM+".
Add / to the list of characters you want to keep:
<?php
$jurnalName = "TL 110/90-12 K93-N02 AHM+";
$name = htmlspecialchars(htmlentities($jurnalName));
$name = preg_replace('/[^A-Za-z0-9|\-\s\+\/]/', '', $name);
var_dump($name);
(I also replaced the space with \s and escaped the plus sign +. The escape is optional inside a character class, but I think it's good practice for characters that have a special meaning in a regex.)
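A minimal sketch of the same fix, wrapped in a hypothetical function keepAllowedChars. It assumes the | in the answer's class was unintended (inside a character class | is a literal pipe, not alternation, so the answer's pattern also keeps pipe characters) and drops it. It also uses ~ as the pattern delimiter, so / needs no escaping; both are editorial choices, not part of the original answer.

```php
<?php
// Keep letters, digits, hyphen, whitespace, plus sign and slash; drop everything else.
// With ~ as the delimiter, "/" can appear unescaped inside the class.
function keepAllowedChars(string $input): string
{
    return preg_replace('~[^A-Za-z0-9\-\s+/]~', '', $input);
}

var_dump(keepAllowedChars("TL 110/90-12 K93-N02 AHM+"));
// string(25) "TL 110/90-12 K93-N02 AHM+"
```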

PHP replace everything between two symbols with something else

I'm trying to replace the domain name of email addresses, between @ and ., with ***
For example:
$email1 = "info@mytestdomain.com";
$email2 = "info@mytestdomain.net";
These need to become
$email1 = "info@***.com";
$email2 = "info@***.net";
I know I can use the PHP preg_replace function, but I'm not sure which regex I need. So my question is: which regex should I use to replace everything between @ and . with ***?
Thanks
You can use this regex; \K resets the start of the reported match, so only the part after @ is replaced:
$eml = preg_replace('/@\K[^.]+/', '***', $eml);
$email1 = "info@mytestdomain.com";
echo preg_replace("/(.*@)([^\.]+)(\..*)/","$1***$3",$email1);
Output:
info@***.com
You could use a positive lookahead also.
$email1 = "info@mytestdomain.com";
echo preg_replace("/[^@]+(?=\.)/","***",$email1);
Pattern explanation:
[^@]+(?=\.) matches one or more characters other than @, but only if they are followed by a literal dot.
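To compare the approaches, here is the \K version wrapped in a hypothetical maskDomain function, with a couple of extra inputs that are assumptions beyond the question's examples. It replaces only the first domain label, so a subdomain address keeps the rest of its domain.

```php
<?php
// Replace the first domain label (the text between "@" and the next ".") with "***".
// \K drops "@" from the reported match, so only the label itself is replaced.
function maskDomain(string $email): string
{
    return preg_replace('/@\K[^.]+/', '***', $email);
}

echo maskDomain("info@mytestdomain.com"), "\n";  // info@***.com
echo maskDomain("info@mytestdomain.net"), "\n";  // info@***.net
echo maskDomain("first.last@example.com"), "\n"; // first.last@***.com
echo maskDomain("info@mail.example.com"), "\n";  // info@***.example.com
```

By contrast, the lookahead pattern /[^@]+(?=\.)/ is not tied to the @, so on "first.last@example.com" it also matches "first" and produces "***.last@***.com".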

Detecting emails in a text

I'm trying to create a function that turns every occurrence of a plain-text email address in a given string into its HTML link version.
Let's say I have the following code, where htmlizeEmails is the function I'm looking for:
$str = "Send me an email to bob@example.com.";
echo htmlizeEmails($str); // Echoes "Send me an email to <a href="mailto:bob@example.com">bob@example.com</a>."
If possible, I'd like this function to use the filter_var function to check if the email is valid.
Does anyone know how to do this? Thanks!
Edit:
Thanks for the answers. I used Shocker's regex to match potential email addresses, and then replace each one only if filter_var() validates it:
function htmlizeEmails($text) {
    preg_match_all('/([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', $text, $potentialEmails, PREG_SET_ORDER);
    $potentialEmailsCount = count($potentialEmails);
    for ($i = 0; $i < $potentialEmailsCount; $i++) {
        // Only wrap candidates that filter_var() accepts as valid addresses
        if (filter_var($potentialEmails[$i][0], FILTER_VALIDATE_EMAIL)) {
            $text = str_replace($potentialEmails[$i][0], '<a href="mailto:' . $potentialEmails[$i][0] . '">' . $potentialEmails[$i][0] . '</a>', $text);
        }
    }
    return $text;
}
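Because str_replace() replaces every occurrence of the matched string on each pass, a loop like the one above can process the same address more than once when it appears twice in the text. One alternative, not from the original thread, is preg_replace_callback(), which validates and replaces each match in a single pass over the original string. A minimal sketch, assuming the same detection regex and a mailto: link as the HTML form; the name htmlizeEmailsCallback is hypothetical.

```php
<?php
// Hypothetical helper: wrap every valid email address in a mailto: link.
// The regex finds candidates; filter_var() decides which ones are real addresses.
function htmlizeEmailsCallback(string $text): string
{
    $candidate = '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}/';

    return preg_replace_callback($candidate, function (array $m): string {
        $email = $m[0];
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            return $email; // leave invalid candidates untouched
        }
        $safe = htmlspecialchars($email, ENT_QUOTES);
        return '<a href="mailto:' . $safe . '">' . $safe . '</a>';
    }, $text);
}

echo htmlizeEmailsCallback("Send me an email to bob@example.com.");
// Send me an email to <a href="mailto:bob@example.com">bob@example.com</a>.
```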
$str = preg_replace('/([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', '<a href="mailto:$1">$1</a>', $str);
where ([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}) is the regular expression used to detect an email address. This is a general example: email addresses can be more complicated than this and not every address will be covered, but finding the perfect email regex is up to you.
You could always match every sequence of non-space characters and test each one with filter_var(), but this is probably one of those cases where it's just better to use regular expressions.
echo preg_replace('/(([\w!#$%&\'*+\-\/=?^`{|}~]|\\\\\\\\|\\\\?"|\\\\ )+\.)*([\w!#$%&\'*+\-\/=?^`{|}~]|\\\\\\\\|\\\\?"|\\\\ )+@((\w+[\.-])*[a-zA-Z]{2,}|\[(\d{1,3}\.){3}\d{1,3}\])/', '<a href="mailto:$0">$0</a>', $str);
I've tried to follow the standard as best I could without making it ridiculously compliant. And anybody who puts comments in his or her e-mail address can just be forgotten safely, I think. And it definitely works for common e-mails.
EDIT: After a long, difficult struggle, here's my regular expression to match everything:
((([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~]+|"([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~(),:;<>@\[\]]|\\[ \\"])+")\.)*([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~]+|"([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~(),:;<>@\[\]]|\\[ \\"])+"))@((([a-zA-Z0-9]([a-zA-Z0-9]*(\-[a-zA-Z0-9]*)*)?\.)*[a-zA-Z]{2,}|\[((0?\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}(0?\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\]|\[[Ii][Pp][vV]6(:[0-9a-fA-F]{0,4}){6}\]))
Enjoy escaping it!
The code below should work, but a regex is easier to go with.
$str = "Send me an email to bob@example.com.";
function htmlizestring($a){
    if(substr_count($a,"@") != 1){
        return false;
    }else{
        $b4 = stristr($a,"@",true);
        // Start after the last space before "@", or at 0 if there is none
        $spacePos = strrpos($b4," ");
        $b4pos = ($spacePos === false) ? 0 : $spacePos+1;
        $b4 = trim(substr($b4,$b4pos));
        $after = stristr($a,"@");
        // Cut the domain at the next space and strip trailing punctuation
        if(substr_count($after, " ") == 0){
            $after=rtrim($after," .,");
        }else{
            $after=rtrim(stristr($after," ",true)," .,");
        }
        $email = $b4.$after;
        if(filter_var($email, FILTER_VALIDATE_EMAIL)){
            echo "Send me an email at: <a href='mailto:".$email."'>".$email."</a>";
        }else{
            return false;
        }
    }
}
htmlizestring($str);
Note that I use stristr() with its third parameter (before_needle) set to true, which only works in PHP 5.3+.
filter_var is nice to validate an email, but Dominic Sayers' is_email is even better, and my personal choice.
source code: http://code.google.com/p/isemail/source/browse/PHP/trunk/is_email.php
about: http://isemail.info/about

Regex Get Email handle from Email Address

I have an email address that could either be
$email = "x@example.com"; or $email = "Johnny <x@example.com>"
I want to get
$handle = "x"; for either version of $email.
How can this be done in PHP (presumably with a regex)? I'm not very good at regex.
Thanks in advance
Use the regex <?([^<]+?)@ and then get the result from $matches[1].
Here's what it does:
<? matches an optional <.
[^<]+? does a non-greedy match of one or more characters other than <.
@ matches the @ in the email address.
A non-greedy match makes the resulting match the shortest one needed for the regex to succeed. This prevents it from running past the @.
Rubular: http://www.rubular.com/r/bntNa8YVZt
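In PHP, that regex could be used like this. A minimal sketch; the function name emailHandle and its null-on-failure behaviour are assumptions, not part of the answer.

```php
<?php
// Extract the local part ("handle") from either "x@example.com"
// or "Johnny <x@example.com>" using the regex <?([^<]+?)@.
function emailHandle(string $email): ?string
{
    if (preg_match('/<?([^<]+?)@/', $email, $matches)) {
        return $matches[1]; // capture group 1 holds the handle
    }
    return null; // no "@" found
}

echo emailHandle("x@example.com"), "\n";          // x
echo emailHandle("Johnny <x@example.com>"), "\n"; // x
```

For the second input, the match cannot start inside "Johnny " because [^<] cannot cross the <, so the first successful match starts at the < itself.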
Here is a complete PHP solution based on marcog's answer
function extract_email($email_string) {
preg_match("/<?([^<]+?)@([^>]+?)>?$/", $email_string, $matches);
return $matches[1] . "@" . $matches[2];
}
echo extract_email("ice.cream.bob@gmail.com"); // outputs ice.cream.bob@gmail.com
echo extract_email("Ice Cream Bob <ice.cream.bob@gmail.com>"); // outputs ice.cream.bob@gmail.com
Just search the string using this basic email-finding regex, with case-insensitive matching: \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b
It will match most common email addresses in a text: in your first string it will match the whole string, and in the second only the part of the string that is the email address.
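In PHP the regex needs the i modifier, because its classes only list uppercase letters, and preg_match_all() returns every hit. A minimal sketch; the function name findEmails is hypothetical, and note that [A-Z]{2,4} will not match addresses whose top-level domain is longer than four letters.

```php
<?php
// Hypothetical helper: return every email-like substring found in $text.
// The "i" modifier is needed because the character classes only list A-Z.
function findEmails(string $text): array
{
    preg_match_all('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i', $text, $matches);
    return $matches[0]; // full matches only
}

print_r(findEmails("Johnny <x@example.com>"));
```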
To quickly learn regexp this is the best place: http://www.regular-expressions.info
$email = 'x@gmail.com';
preg_match('/([a-zA-Z0-9\-\._\+]+@[a-z0-9A-Z\-\._]+\.[a-zA-Z]+)/', $email, $regex);
$parts = explode('@', $regex[1]); // array_shift() needs a variable, not a function result
$handle = $parts[0];
Try that (not tested).

How can I remove the NULL character from string

I have a PHP variable that contains a string which represents an XML structure. This string contains illegal characters that don't let me build a new SimpleXMLElement object from it. I don't have a way to ask the source of the content to modify their response, so I need to clean this string before I create a SimpleXMLElement object.
I believe the character causing the problem is a NUL (0x00) character, and it's located within one of the text nodes of this XML string.
What is the best way to remove this character, or other characters that could break the SimpleXMLElement object?
$text = str_replace("\0", "", $text);
will replace all null characters in the $text string. You can also supply arrays for the first two arguments, if you want to do multiple replacements.
trim() will also remove null characters, from either end of the source string (but not within).
$text = trim($text);
I've found this useful for socket server communication, especially when passing JSON around, as a null character causes json_decode() to return null.
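For the "other characters" part of the question, one option (not taken from the answers above) is a character class covering the control characters that XML 1.0 does not allow. A minimal sketch; the function name stripInvalidXmlControlChars is hypothetical, and it only handles the C0 control range, not every code point XML forbids.

```php
<?php
// Strip characters that are not allowed in XML 1.0 documents:
// NUL and the other C0 control characters except tab (\x09), LF (\x0A) and CR (\x0D).
// These bytes never occur inside UTF-8 multibyte sequences, so no /u flag is needed.
function stripInvalidXmlControlChars(string $xml): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $xml);
}

$raw = "<root><item>abc\0def</item></root>";
$clean = stripInvalidXmlControlChars($raw);
// $clean no longer contains the NUL byte and can be passed to new SimpleXMLElement($clean)
```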
While it's probably not the primary target of your question, please have a look at PHP's filter functions: http://www.php.net/manual/en/intro.filter.php
Filter functions validate and sanitize values. From the PHP site:
$a = 'joe@example.org';
$b = 'bogus - at - example dot org';
$c = '(bogus@example.org)';
$sanitized_a = filter_var($a, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitized_a, FILTER_VALIDATE_EMAIL)) {
echo "This (a) sanitized email address is considered valid.\n";
}
$sanitized_b = filter_var($b, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitized_b, FILTER_VALIDATE_EMAIL)) {
echo "This sanitized email address is considered valid.";
} else {
echo "This (b) sanitized email address is considered invalid.\n";
}
$sanitized_c = filter_var($c, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitized_c, FILTER_VALIDATE_EMAIL)) {
echo "This (c) sanitized email address is considered valid.\n";
echo "Before: $c\n";
echo "After: $sanitized_c\n";
}
Result:
This (a) sanitized email address is considered valid.
This (b) sanitized email address is considered invalid.
This (c) sanitized email address is considered valid.
Before: (bogus@example.org)
After: bogus@example.org
