I have a PHP variable that contains a string representing an XML structure. This string contains illegal characters that prevent me from building a new SimpleXMLElement object from it. I have no way to ask the source of the content to change their response, so I need to clean this string before I create the SimpleXMLElement object.
I believe the character causing the problem is the null character (0x00), and it's located within one of the text nodes of this XML string.
What is the best way to remove this character, or any other characters that could break the SimpleXMLElement object?
$text = str_replace("\0", "", $text);
This replaces all null characters in the $text string. You can also supply arrays for the first two arguments if you want to do multiple replacements.
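A minimal sketch of the array form: each search string is replaced by the replacement at the same index. The CRLF-to-LF pair is only an illustrative second replacement, not something the question needs.

```php
<?php
// Remove null bytes and, as an illustrative extra, normalize Windows line endings.
// Index 0 of the search array maps to index 0 of the replacement array, and so on.
$text = "a\0b\r\nc";
$clean = str_replace(["\0", "\r\n"], ["", "\n"], $text);
var_dump($clean === "ab\nc"); // bool(true)
```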
trim() also removes null characters, but only from either end of the source string (not from within it).
$text = trim($text);
I've found this useful for socket server communication, especially when passing JSON around, as a null character causes json_decode() to return null.
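If you want to go beyond null bytes, here is a sketch that removes every character XML 1.0 does not allow (only tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF are permitted). It assumes the string is UTF-8; strip_invalid_xml_chars() is a hypothetical helper name, not a PHP built-in.

```php
<?php
// Drop every character outside the XML 1.0 "Char" production.
// Assumes UTF-8 input: with the /u modifier, preg_replace() returns null on malformed UTF-8.
function strip_invalid_xml_chars(string $text): string
{
    $clean = preg_replace(
        '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        '',
        $text
    );
    if ($clean === null) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $clean;
}

// Usage: a null byte and a \x01 control character inside a text node.
$raw = "<root><item>a\0b\x01c</item></root>";
$xml = new SimpleXMLElement(strip_invalid_xml_chars($raw));
echo (string) $xml->item; // abc
```

Unlike FILTER_FLAG_STRIP_LOW or a plain str_replace("\0", ...), this keeps tab, newline and carriage return, which are legal in XML text.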
While it's probably not the primary target of your question, please have a look at PHP's filter functions: http://www.php.net/manual/en/intro.filter.php
Filter functions validate and sanitize values. From the PHP site:
$a = 'joe@example.org';
$b = 'bogus - at - example dot org';
$c = '(bogus@example.org)';
$sanitized_a = filter_var($a, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitized_a, FILTER_VALIDATE_EMAIL)) {
echo "This (a) sanitized email address is considered valid.\n";
}
$sanitized_b = filter_var($b, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitized_b, FILTER_VALIDATE_EMAIL)) {
echo "This (b) sanitized email address is considered valid.\n";
} else {
echo "This (b) sanitized email address is considered invalid.\n";
}
$sanitized_c = filter_var($c, FILTER_SANITIZE_EMAIL);
if (filter_var($sanitized_c, FILTER_VALIDATE_EMAIL)) {
echo "This (c) sanitized email address is considered valid.\n";
echo "Before: $c\n";
echo "After: $sanitized_c\n";
}
Result:
This (a) sanitized email address is considered valid.
This (b) sanitized email address is considered invalid.
This (c) sanitized email address is considered valid.
Before: (bogus@example.org)
After: bogus@example.org
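Applied to the original question, the filter extension can also strip the null byte directly: FILTER_UNSAFE_RAW combined with FILTER_FLAG_STRIP_LOW removes every character below ASCII 32. A sketch; note that this also removes tab, newline and carriage return, which XML does allow, so it may be too aggressive for multi-line documents.

```php
<?php
// FILTER_UNSAFE_RAW leaves the value as-is except for the flags given;
// FILTER_FLAG_STRIP_LOW removes all bytes with a value below 32 (including "\0").
$raw = "<root>a\0b</root>";
$clean = filter_var($raw, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);
echo $clean; // <root>ab</root>
```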
I have an email system where a user writes a message and the system sends it.
I just found a problem; consider this code:
$findEmail = $this->Data->field('body', array('id' => 1610));
//$getUserEmailTemplate is loaded from the database, e.g.:
//Hi, ##MESSAGE##. From: StackOverflow
//It should replace the ##MESSAGE## part with the data from $findEmail (in this example, $74.97 ...)
$getUserEmailTemplate = $findUser['User']['email_template'];
$emailMessage = preg_replace('/\B##MESSAGE##\B/u', $findEmail, $getUserEmailTemplate);
debug($findEmail);
debug($emailMessage);
And consider this $findEmail result as the email input:
$74.97
$735.00s
$emailMessage will then be:
.97
5.00s
How can I fix this? I feel like there's a problem with my preg_replace pattern.
The user template can be anything, as long as it contains ##MESSAGE##; that part will be replaced with the user's message.
Thank you
Pre-parse the replacement text to escape the $ when it is followed by a number (remember that $n has a special meaning when used in the replacement text). See the comment on the php.net docs page:
If there's a chance your replacement text contains any strings such as
"$0.95", you'll need to escape those $n backreferences:
<?php
function escape_backreference($x){
return preg_replace('/\$(\d)/', '\\\$$1', $x);
}
?>
The high-voted function escape_backreference is incomplete in the general case: it will only escape backreferences of the form $n, but not those of the form ${n} or \n.
To escape any potential backreferences, change
$emailMessage = preg_replace('/\B##MESSAGE##\B/u', $findEmail, $getUserEmailTemplate);
to
$emailMessage = preg_replace('/\B##MESSAGE##\B/u', addcslashes($findEmail, '\\$'), $getUserEmailTemplate);
Here is the reason:
The $1 portion of a replacement text stands for the first captured group. So if you have abc-123 and you run preg_match('/([\w]+)-([\d]+)/', ...), the regex engine stores something like $1 = abc and $2 = 123 internally. A reference such as $10 in the replacement is still expanded even when there is no such group; it simply becomes an empty string.
So, for example:
$text = '[shortcode]';
$replacement = ' some $var $101 text';
$result = preg_replace('/\[shortcode\]/', $replacement, $text);
// returns " some $var 1 text"
Because there is no group 10, $10 is replaced by an empty string (the trailing 1 of $101 stays).
That's why you need to escape any $NN in your replacement text before running preg_replace().
Happy coding.
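A different way around the escaping problem is to return the message from a preg_replace_callback() callback: the callback's return value is inserted literally, so $n, ${n} and \n are never interpreted. A minimal sketch; fill_template() is a hypothetical helper name.

```php
<?php
// Replace ##MESSAGE## with $message without any backreference expansion.
function fill_template(string $template, string $message): string
{
    return preg_replace_callback(
        '/\B##MESSAGE##\B/u',
        function (array $m) use ($message): string {
            return $message; // inserted verbatim, "$74" is not treated as group 74
        },
        $template
    );
}

echo fill_template('Hi, ##MESSAGE##. From: StackOverflow', '$74.97');
// Hi, $74.97. From: StackOverflow
```

Because ##MESSAGE## is a fixed string, str_replace('##MESSAGE##', $message, $template) would also work without any escaping, at the cost of dropping the \B checks.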
If a template was ever stored in $getUserEmailTemplate, you overwrote (destroyed) it with this line:
$getUserEmailTemplate = "##MESSAGE##";
So just remove this line and make sure $getUserEmailTemplate really contains something, ideally a template.
My guess is that your template contains "pure" PHP and tries to use $74 as a variable, which does not exist and holds no data. So change the quotes in the template to single quotes (').
guessed template:
$tpl = "Sum: $74.97"; //results in "Sum: .97"
corrected template:
$tpl = 'Sum: $74.97'; //results in "Sum: $74.97"
I have this code in preg_match
if (preg_match("/(for+\([\w\-]+\;[\w\-]+\;[\w\-]+\){)/",$email))
{
$message = "Valid input";
}
else
$message ="Invalid Input";
If the user inputs for(aw;aw;aw){
it outputs Valid input,
but if the user includes spaces, like for (awd ; awd; awd) {
it outputs Invalid Input.
My problem is: how can I ignore or remove the spaces in my string without using explode()?
Need help.
You can match a space like any other character. So for example, you can just add spaces where needed, like below:
if (preg_match("/(for+ *\([\w\-]+ *\; *[\w\-]+ *\; *[\w\-]+\) *{)/",$email))
However, for+ matches "fo" followed by one or more literal r's, so it would also match forrrr; plain for is probably more appropriate there.
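A sketch of the same idea using \s* (any amount of whitespace, including none) around each token, with plain for instead of for+; is_for_header() is a hypothetical name. The last lines show the other route the question asks about: removing all whitespace with preg_replace() first, no explode() needed.

```php
<?php
// Accept "for(a;b;c){" with optional whitespace between every token.
function is_for_header(string $input): bool
{
    $pattern = '/for\s*\(\s*[\w-]+\s*;\s*[\w-]+\s*;\s*[\w-]+\s*\)\s*\{/';
    return preg_match($pattern, $input) === 1;
}

var_dump(is_for_header('for(aw;aw;aw){'));         // bool(true)
var_dump(is_for_header('for (awd ; awd; awd) {')); // bool(true)
var_dump(is_for_header('forrrr(aw;aw;aw){'));      // bool(false)

// Alternative: strip all whitespace first, then a space-free pattern is enough.
$compact = preg_replace('/\s+/', '', 'for (awd ; awd; awd) {'); // "for(awd;awd;awd){"
var_dump(is_for_header($compact)); // bool(true)
```

The pattern is unanchored, like the original, so it also matches a for header embedded in longer input; wrap it in ^...$ if the whole string must be the header.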
I am creating a simple checker function in PHP to validate strings before putting them into an SQL query, but I cannot get the right results from the preg_match function.
$myval = "srg845s4hs64f849v8s4b9s4vs4v165";
$tv = preg_match('/[^a-z0-9]/', $myval);
echo $tv;
Sometimes nothing is echoed to the source code, not even a false value... I want to get 1 as the result of this call, because $myval only contains lowercase letters and numbers.
So is there any way in PHP to detect whether a string contains only lowercase letters and numbers using the preg_match function?
Yes. The circumflex goes outside the [] to anchor the match at the start of the string. You probably also need an asterisk to allow an arbitrary number of characters, and a $ at the end to anchor the end of the string (use + instead of * if an empty string should be rejected):
$tv = preg_match('/^[a-z0-9]*$/', $myval);
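A slightly stricter sketch as a reusable check; is_lower_alnum() is a hypothetical name. It uses + so the empty string is rejected, and \z instead of $, because in PCRE $ also matches just before a trailing newline, so '/^[a-z0-9]+$/' would accept "abc\n".

```php
<?php
// True only for a non-empty string consisting solely of a-z and 0-9.
function is_lower_alnum(string $s): bool
{
    return preg_match('/^[a-z0-9]+\z/', $s) === 1;
}

var_dump(is_lower_alnum('srg845s4hs64f849v8s4b9s4vs4v165')); // bool(true)
var_dump(is_lower_alnum("abc\n"));                          // bool(false)
var_dump(is_lower_alnum('Abc'));                            // bool(false)
```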
Inside a character class, [^a-z] means any character other than a-z.
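If you want to test whether a string contains only lowercase letters and digits, the pattern you wrote already works: it returns 1 as soon as it finds a character outside a-z and 0-9, so 0 means the string is valid. I would present the code this way to get the proper results: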
$myval = "srg845s4hs64f849v8s4b9s4vs4v165";
$tv = preg_match('/[^a-z0-9]/', $myval);
if($tv === 0){
echo "the string only contains lowercase alphanumerics";
}else if($tv === 1){
echo "the string does not only contain lowercase alphanumerics";
}else{
echo "error";
}
I'm trying to create a function that translates every occurrence of a plain-text email address in a given string into its htmlized version.
Let's say I have the following code, where htmlizeEmails is the function I'm looking for:
$str = "Send me an email to bob@example.com.";
echo htmlizeEmails($str); // Echoes "Send me an email to <a href="mailto:bob@example.com">bob@example.com</a>."
If possible, I'd like this function to use the filter_var function to check if the email is valid.
Does anyone know how to do this? Thanks!
Edit:
Thanks for the answers. I used Shocker's regex to match potential email addresses, and then, only if filter_var validates an address, it gets replaced.
function htmlizeEmails($text) {
preg_match_all('/([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', $text, $potentialEmails, PREG_SET_ORDER);
$potentialEmailsCount = count($potentialEmails);
for ($i = 0; $i < $potentialEmailsCount; $i++) {
if (filter_var($potentialEmails[$i][0], FILTER_VALIDATE_EMAIL)) {
$text = str_replace($potentialEmails[$i][0], '<a href="mailto:' . $potentialEmails[$i][0] . '">' . $potentialEmails[$i][0] . '</a>', $text);
}
}
return $text;
}
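One caveat with the loop above: str_replace() replaces every occurrence of an address at once, so an address that appears twice in $text is processed twice, and the second pass rewrites the copy the first pass already wrapped. A sketch that avoids this with preg_replace_callback(), which visits each match exactly once. The pattern is the one from this thread, htmlize_emails() is a hypothetical name, and the htmlspecialchars() call is a defensive addition.

```php
<?php
// Wrap each regex-matched address that filter_var() accepts in a mailto link.
function htmlize_emails(string $text): string
{
    $pattern = '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}/';
    return preg_replace_callback($pattern, function (array $m): string {
        $candidate = $m[0];
        if (filter_var($candidate, FILTER_VALIDATE_EMAIL) === false) {
            return $candidate; // leave invalid candidates untouched
        }
        $safe = htmlspecialchars($candidate, ENT_QUOTES);
        return '<a href="mailto:' . $safe . '">' . $safe . '</a>';
    }, $text);
}

echo htmlize_emails("Send me an email to bob@example.com.");
// Send me an email to <a href="mailto:bob@example.com">bob@example.com</a>.
```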
$str = preg_replace('/([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', '<a href="mailto:$1">$1</a>', $str);
where ([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}) is the regular expression used for detecting an email address (this is a general example; email addresses can be more complicated than this and not all of them will be covered, but finding the perfect email regex is up to you).
There's always matching every sequence of non-space characters and testing those with filter_var, but this is probably one of those cases where it's just better to use regular expressions.
echo preg_replace('/(([\w!#$%&\'*+\-\/=?^`{|}~]|\\\\\\\\|\\\\?"|\\\\ )+\.)*([\w!#$%&\'*+\-\/=?^`{|}~]|\\\\\\\\|\\\\?"|\\\\ )+@((\w+[\.-])*[a-zA-Z]{2,}|\[(\d{1,3}\.){3}\d{1,3}\])/', '<a href="mailto:$0">$0</a>', $str);
I've tried to follow the standard as best I could without making it ridiculously compliant. And anybody who puts comments in his or her e-mail address can just be forgotten safely, I think. And it definitely works for common e-mails.
EDIT: After a long, difficult struggle, here's my regular expression to match everything:
((([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~]+|"([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~(),:;<>@\[\]]|\\[ \\"])+")\.)*([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~]+|"([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~(),:;<>@\[\]]|\\[ \\"])+"))@((([a-zA-Z0-9]([a-zA-Z0-9]*(\-[a-zA-Z0-9]*)*)?\.)*[a-zA-Z]{2,}|\[((0?\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}(0?\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\]|\[[Ii][Pp][vV]6(:[0-9a-fA-F]{0,4}){6}\]))
Enjoy escaping it!
The code below should work fine, but a regex is the easier way to go.
$str = "Send me an email to bob@example.com.";
function htmlizestring($a){
// Only handle strings that contain exactly one "@"
if(substr_count($a,"@") != 1){
return false;
}else{
// Local part: the last word before the "@" (the whole prefix if it has no space)
$b4 = stristr($a,"@",true);
$b4pos = strrpos($b4," ");
$b4 = trim($b4pos === false ? $b4 : substr($b4,$b4pos+1));
// Domain part: from the "@" up to the next space, without trailing punctuation
$after = stristr($a,"@");
if(substr_count($after, " ") == 0){
$after=rtrim($after," .,");
}else{
$after=rtrim(stristr($after," ",true),".,");
}
$email = $b4.$after;
if(filter_var($email, FILTER_VALIDATE_EMAIL)){
echo "Send me an email at: <a href='mailto:".$email."'>".$email."</a>";
}else{
return false;
}
}
}
htmlizestring($str);
I use stristr() with its third parameter set to TRUE, which only works on PHP 5.3+.
filter_var is nice to validate an email, but Dominic Sayers' is_email is even better, and my personal choice.
source code: http://code.google.com/p/isemail/source/browse/PHP/trunk/is_email.php
about: http://isemail.info/about
I have a regex email pattern and would like to strip everything except the pattern-matched characters from the string. In short, I want to sanitize the string...
I'm not a regex guru, so what I'm missing in regex?
<?php
$pattern = "/^([\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+\.)*[\w\!\#$\%\&\'\*\+\-\/\=\?\^\`{\|\}\~]+@((((([a-z0-9]{1}[a-z0-9\-]{0,62}[a-z0-9]{1})|[a-z])\.)+[a-z]{2,6})|(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?)$/i";
$email = 'contact<>@domain.com'; // wrong email
$sanitized_email = preg_replace($pattern, NULL, $email);
echo $sanitized_email; // Should be contact@domain.com
?>
Pattern taken from: http://fightingforalostcause.net/misc/2006/compare-email-regex.php (the very first one...)
You cannot filter and match at the same time. You'll need to break it up into a character class for stripping invalid characters and a matching regular expression which verifies a valid address.
$email = preg_replace($filter, "", $email);
if (preg_match($verify, $email)) {
// ok, sanitized
return $email;
}
For the first case, you want to use a negated character class /[^allowedchars]/.
For the second part you use the structure /^...#...$/.
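A sketch of both steps together. The allowed set in $filter is an assumption copied from the characters PHP's FILTER_SANITIZE_EMAIL keeps, and $verify is a deliberately simplified /^...@...$/ check rather than a full RFC pattern; sanitize_email() is a hypothetical name.

```php
<?php
// Step 1 strips disallowed characters; step 2 checks the overall shape.
function sanitize_email(string $email): ?string
{
    // Negated class: remove everything NOT in the allowed set.
    $filter = '/[^a-zA-Z0-9!#$%&\'*+\-=?^_`{|}~@.\[\]]/';
    $email = preg_replace($filter, '', $email);

    // Simplified structure check: something@something.tld, anchored at both ends.
    $verify = '/^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}\z/';
    return preg_match($verify, $email) === 1 ? $email : null;
}

var_dump(sanitize_email('contact<>@domain.com')); // string(18) "contact@domain.com"
var_dump(sanitize_email('not an email'));         // NULL
```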
Have a look at PHP's filter extension. It uses const unsigned char allowed_list[] = LOWALPHA HIALPHA DIGIT "!#$%&'*+-=?^_`{|}~@.[]"; for cleansing.
And there is the monster for validation: line 525 in http://gcov.php.net/PHP_5_3/lcov_html/filter/logical_filters.c.gcov.php - but check out http://www.regular-expressions.info/email.html for a more common and shorter variant.
I guess PHP's filter_var() function can also do this, and in a cleaner way.
Have a look at:
http://www.php.net/manual/en/function.filter-var.php
example:
$email = "chris@exam\\ple.com";
$cleanEmail = filter_var($email, FILTER_SANITIZE_EMAIL); // chris@example.com
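Sanitizing only removes characters; it does not make the result a valid address. For example, 'bogus - at - example dot org' sanitizes to a string with no @ at all. A sketch that sanitizes and then validates; clean_email() is a hypothetical name.

```php
<?php
// Return the sanitized address, or null if it is still not a valid email.
function clean_email(string $email): ?string
{
    $sanitized = filter_var($email, FILTER_SANITIZE_EMAIL);
    if (filter_var($sanitized, FILTER_VALIDATE_EMAIL) === false) {
        return null;
    }
    return $sanitized;
}

var_dump(clean_email('chris@exam\\ple.com'));          // string(17) "chris@example.com"
var_dump(clean_email('bogus - at - example dot org')); // NULL
```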