Exception on php mb_ereg_match

I am using mb_ereg_match to validate that a domain name does not contain illegal characters.
I am using this regex:
'/:\/\/|www[.][a-zA-Zα-ωΑ-ΩάέύήίόώϋϊΐΰΆΈΏΊΎΌΉΫΪÀàÂâÆæÄäÇçÉéÈèÊêËëÎîÏïÔôŒœÖöÙùÛûÜüŸÿ0-9]+[.]|^[-]+|^[.]+|[-]+$|[.]+$|[-]{2,}|[.]{2,}|[^\w-.]|-[.]|[.]-/u'
As you can see, it contains all the basic Latin characters, digits, the French letters and the whole Greek alphabet.
My validation code is the following:
$utf8 = (mb_detect_encoding($value) == 'UTF-8') ? TRUE : FALSE;
if ($utf8){
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
$matches = mb_ereg_match($pattern, $value);
}else{
preg_match($pattern, $value, $matches);
}
I am trying to validate this:
'geoσσσrge.cσσσσσm.gr'
Here is the error I get:
mb_ereg_match(): mbregex compile err: empty range in char class
The error does not appear all the time. Usually it appears after the page has been idle for a long time; after I refresh, the page returns to normal.
I don't know how to handle this error or how to approach it in order to find the source of the problem.
Any suggestions?

The problem is the character class [^\w-.]: \w through . is not a range the regex engine can understand. Escape the - or move it to the start of the class:
$pattern = '/:\/\/|www[.][a-zA-Zα-ωΑ-ΩάέύήίόώϋϊΐΰΆΈΏΊΎΌΉΫΪÀàÂâÆæÄäÇçÉéÈèÊêËëÎîÏïÔôŒœÖöÙùÛûÜüŸÿ0-9]+[.]|^[-]+|^[.]+|[-]+$|[.]+$|[-]{2,}|[.]{2,}|[^\w\-.]|-[.]|[.]-/u';
$value = 'geoσσσrge.cσσσσσm.gr';
$utf8 = (mb_detect_encoding($value) == 'UTF-8') ? TRUE : FALSE;
if ($utf8){
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
$matches = mb_ereg_match($pattern, $value);
}else{
preg_match($pattern, $value, $matches);
}
or
$pattern = '/:\/\/|www[.][a-zA-Zα-ωΑ-ΩάέύήίόώϋϊΐΰΆΈΏΊΎΌΉΫΪÀàÂâÆæÄäÇçÉéÈèÊêËëÎîÏïÔôŒœÖöÙùÛûÜüŸÿ0-9]+[.]|^[-]+|^[.]+|[-]+$|[.]+$|[-]{2,}|[.]{2,}|[^-\w.]|-[.]|[.]-/u';
$value = 'geoσσσrge.cσσσσσm.gr';
$utf8 = (mb_detect_encoding($value) == 'UTF-8') ? TRUE : FALSE;
if ($utf8){
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
$matches = mb_ereg_match($pattern, $value);
}else{
preg_match($pattern, $value, $matches);
}
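As a minimal sketch, the corrected character class can be tried in isolation with preg_match(). The helper name hasIllegalChars() is hypothetical and not part of the original code. Note that mb_ereg_match() expects a pattern without the /.../u delimiters and only matches at the start of the string, so this sketch sticks to preg_match().

```php
<?php
// Hypothetical helper: returns true when $domain contains a character
// other than a word character, a hyphen, or a dot.
function hasIllegalChars(string $domain): bool
{
    // The hyphen is escaped so it cannot be read as a range operator.
    return preg_match('/[^\w\-.]/u', $domain) === 1;
}

var_dump(hasIllegalChars('example.gr'));
var_dump(hasIllegalChars('exa mple.gr'));
```

Whether \w also covers the Greek and French letters depends on how the /u modifier is handled by your PHP and PCRE build, so it is worth checking that on your own server.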

Related

Url validation with preg_match

I want to make sure the link is from the chosen social type.
help!!
$socialType = 'youtube';
$link = 'https://www.youtube.co.uk/watch?v=DBK-Cy9ge4M';
if (!preg_match("/^(http|https):\\/\\/[a-z0-9_]+$socialType*\\.[_a-z]{2,5}"."((:[0-9]{1,5})?\\/.*)?$/i",$link))
{
return Response::json('inValid');
}
{
return Response::json('Valid');
}

There are two options:
1. With preg_match:
$subject = "https://www.youtube.co.uk/watch?v=DBK-Cy9ge4M";
// Match the scheme and an optional "www." explicitly instead of cutting off a fixed number of characters
$pattern = '~^https?://(www\.)?youtube\.~';
preg_match($pattern, $subject, $matches);
print_r($matches);
2. With strpos (as suggested by Ruslan Osmanov):
$socialType = 'youtube';
$link = 'https://www.youtube.co.uk/watch?v=DBK-Cy9ge4M';
if (strpos($link, $socialType) !== false) {
return Response::json('Valid');
}

You can simply check, if the link contains your substring using strpos:
$link = 'https://www.youtube.co.uk/watch?v=DBK-Cy9ge4M';
$type = 'youtube';
if (strpos($link, $type) !== false) {
// passed
}
Or use a simple regular expression if you want a stricter check:
$reg_type = preg_quote($type, '/');
if (preg_match("/^https?:\/\/(www\.)?$reg_type/", $link)) {
// passed
}
Note that you should escape values passed into a regular expression using preg_quote.
The pattern should be just strict enough; don't overcomplicate it. It's generally impossible to write a perfect regular expression. For example, it is very unlikely that a URL starting with an HTTP(S) prefix + optional "www." + "youtube." does not belong to YouTube.
Also, I wouldn't expect a universal regular expression that covers all kinds of social networks. Each has its own pattern.
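If the check should look only at the host name rather than the whole URL, one possible sketch uses parse_url(). The helper name linkBelongsTo() is hypothetical, and the rule "network name followed by one or more TLD labels" is an assumption made for illustration, not a complete validator.

```php
<?php
// Hypothetical helper: checks that the host of $link is the given
// social network's domain (e.g. "youtube") followed by TLD labels.
function linkBelongsTo(string $link, string $type): bool
{
    $host = parse_url($link, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host component
    }
    $quoted = preg_quote($type, '/');
    // Start of host or a dot, then the name, then one or more labels.
    $pattern = '/(^|\.)' . $quoted . '(\.[a-z]{2,})+$/i';
    return preg_match($pattern, $host) === 1;
}

var_dump(linkBelongsTo('https://www.youtube.co.uk/watch?v=DBK-Cy9ge4M', 'youtube'));
var_dump(linkBelongsTo('https://example.com/?q=youtube', 'youtube'));
```

This still accepts hosts such as youtube.example.com, so a fixed list of allowed hosts per network would be stricter.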

What does "ÿþ" in the content of an URL mean? [duplicate]

I'm trying to read ID3 data in bulk. On some of the tracks, ÿþ appears. I can remove the first 2 characters, but that hurts the tracks that don't have it.
This is what I currently have:
$trackartist=str_replace("\0", "", $trackartist1);
Any suggestions would be greatly appreciated, thanks!

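ÿþ is how the bytes 0xFF 0xFE look when they are displayed as ISO-8859-1 (Latin-1); these two bytes are the byte order mark (BOM) of UTF-16LE.
You can convert your string to UTF-8 with iconv() or mb_convert_encoding():
$trackartist1 = iconv('UTF-16LE', 'UTF-8', $trackartist1);
# Alternative to the line above, using the mbstring extension (argument order: string, to, from)
$trackartist1 = mb_convert_encoding($trackartist1, 'UTF-8', 'UTF-16LE');
# After the conversion the BOM, if still present, is the UTF-8 sequence EF BB BF
$trackartist1 = str_replace("\xEF\xBB\xBF", '', $trackartist1);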
This assumes $trackartist1 is always in UTF-16LE; check the documentation of your ID3 tag library on how to get the encoding of the tags, since this may be different for different files. You usually want to convert everything to UTF-8, since this is what PHP uses by default.
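A minimal sketch of the same idea that strips the BOM before converting, assuming the input really is UTF-16LE and the iconv extension is available. The function name utf16leToUtf8() is hypothetical.

```php
<?php
// Hypothetical helper: converts a UTF-16LE string to UTF-8 and drops
// a leading byte order mark if there is one.
function utf16leToUtf8($raw)
{
    // Strip the UTF-16LE BOM (bytes FF FE) when present.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = (string) substr($raw, 2);
    }
    // iconv() returns false on failure, so callers should check for it.
    return iconv('UTF-16LE', 'UTF-8', $raw);
}

// "AB" in UTF-16LE, preceded by a BOM.
var_dump(utf16leToUtf8("\xFF\xFEA\x00B\x00"));
```

Tracks without the BOM pass through the same function unchanged apart from the conversion, which avoids blindly cutting the first two characters.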

I had a similar problem but was not able to force UTF-16LE as the input charset could change. Finally I detect UTF-8 as follows:
if (!preg_match('~~u', $html)) {
For the case that this fails I obtain the correct encoding through the BOM:
function detect_bom_encoding($str) {
if ($str[0] == chr(0xEF) && $str[1] == chr(0xBB) && $str[2] == chr(0xBF)) {
return 'UTF-8';
}
else if ($str[0] == chr(0x00) && $str[1] == chr(0x00) && $str[2] == chr(0xFE) && $str[3] == chr(0xFF)) {
return 'UTF-32BE';
}
else if ($str[0] == chr(0xFF) && $str[1] == chr(0xFE)) {
if ($str[2] == chr(0x00) && $str[3] == chr(0x00)) {
return 'UTF-32LE';
}
return 'UTF-16LE';
}
else if ($str[0] == chr(0xFE) && $str[1] == chr(0xFF)) {
return 'UTF-16BE';
}
}
And now I'm able to use iconv() as shown in @carpetsmoker's answer:
iconv(detect_bom_encoding($html), 'UTF-8', $html);
I did not use mb_convert_encoding() as it did not remove the BOM (and did not convert the line breaks as iconv() does).

Use regex replacement:
$trackartist1 = preg_replace('/\x00/', '', $trackartist1);
The regex above matches every NUL byte (\x00) in the string, and preg_replace() replaces each of them with nothing. The pattern is single-quoted so that the regex engine, not PHP, interprets \x00.

Reading an email's subject in Unicode out of an IMAP server

I'm using Zend Framework 1's IMAP server connector and I'm trying to fetch an email from server with Unicode characters in its subject. Here's how I do it:
$message = $imapServer->getMessage($message_number);
echo $message->getHeader('subject');
The problem is that it comes out encoded:
=?UTF-8?B?2KjYp9uM?=
I can find the encoding function within Zend_Mail class named _encodeHeader but I can not find the decoding pair! Does anyone know how to decode this string?
And here's the encoder function:
protected function _encodeHeader($value)
{
if (Zend_Mime::isPrintable($value) === false) {
if ($this->getHeaderEncoding() === Zend_Mime::ENCODING_QUOTEDPRINTABLE) {
$value = Zend_Mime::encodeQuotedPrintableHeader($value, $this->getCharset(), Zend_Mime::LINELENGTH, Zend_Mime::LINEEND);
} else {
$value = Zend_Mime::encodeBase64Header($value, $this->getCharset(), Zend_Mime::LINELENGTH, Zend_Mime::LINEEND);
}
}
return $value;
}

Search for a "RFC2047 decoder" and pick one of the existing libraries which does just that. If nothing is usable, roll your own.
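PHP also ships RFC 2047 decoders of its own. As a sketch, assuming the iconv extension is loaded, iconv_mime_decode() can decode the subject from the question; mb_decode_mimeheader() from mbstring is an alternative.

```php
<?php
// Encoded subject taken from the question.
$encoded = '=?UTF-8?B?2KjYp9uM?=';

// The second argument is the decoding mode (0 = default),
// the third is the charset of the returned string.
$subject = iconv_mime_decode($encoded, 0, 'UTF-8');

echo $subject, "\n";
```

In the Zend Framework code from the question this would be applied to the value returned by getHeader('subject').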

Here's how I solved it:
switch (strtolower($encoding)) {
case \Zend_Mime::ENCODING_QUOTEDPRINTABLE:
if (preg_match('/^\s?=\?([^\?]+)\?Q\?/', $str, $matches) === 1) {
$str = preg_replace('/\s?=\?'.preg_quote($matches[1]).'\?Q\?/', ' ', $str);
$str = strtr($str, array('?=' => ''));
$str = trim($str);
}
return \Zend_Mime_Decode::decodeQuotedPrintable($str);
case \Zend_Mime::ENCODING_BASE64:
return base64_decode($encodedText);
case \Zend_Mime::ENCODING_7BIT:
case \Zend_Mime::ENCODING_8BIT:
default:
return $encodedText;
}

Why won't preg_match work?

Ever since "ereg" became deprecated, I began to use "preg_match". Unfortunately, in my code it doesn't accept my valid e-mail address. I am certain that the regular expression I'm using works, so I'm looking either for an alternative way to write this function or for someone to point out what I'm doing wrong.
Here's my function:
function validate($email){
$regex = "^((([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|/|=|\?|\^|_|`|\{|\||\}|~)+(\.([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|/|=|\?|\^|_|`|\{|\||\}|~)+)*)#((((([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.))*([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.(af|ax|al|dz|as|ad|ao|ai|aq|ag|ar|am|aw|au|at|az|bs|bh|bd|bb|by|be|bz|bj|bm|bt|bo|ba|bw|bv|br|io|bn|bg|bf|bi|kh|cm|ca|cv|ky|cf|td|cl|cn|cx|cc|co|km|cg|cd|ck|cr|ci|hr|cu|cy|cz|dk|dj|dm|do|ec|eg|sv|gq|er|ee|et|fk|fo|fj|fi|fr|gf|pf|tf|ga|gm|ge|de|gh|gi|gr|gl|gd|gp|gu|gt| gg|gn|gw|gy|ht|hm|va|hn|hk|hu|is|in|id|ir|iq|ie|im|il|it|jm|jp|je|jo|kz|ke|ki|kp|kr|kw|kg|la|lv|lb|ls|lr|ly|li|lt|lu|mo|mk|mg|mw|my|mv|ml|mt|mh|mq|mr|mu|yt|mx|fm|md|mc|mn|ms|ma|mz|mm|na|nr|np|nl|an|nc|nz|ni|ne|ng|nu|nf|mp|no|om|pk|pw|ps|pa|pg|py|pe|ph|pn|pl|pt|pr|qa|re|ro|ru|rw|sh|kn|lc|pm|vc|ws|sm|st|sa|sn|cs|sc|sl|sg|sk|si|sb|so|za|gs|es|lk|sd|sr|sj|sz|se|ch|sy|tw|tj|tz|th|tl|tg|tk|to|tt|tn|tr|tm|tc|tv|ug|ua|ae|gb|us|um|uy|uz|vu|ve|vn|vg|vi|wf|eh|ye|zm|zw|com|edu|gov|int|mil|net|org|biz|info|name|pro|aero|coop|museum|arpa))|(((([0-9]){1,3}\.){3}([0-9]){1,3}))|(\[((([0-9]){1,3}\.){3}([0-9]){1,3})\])))$";
if (preg_match($regex, $email)) {
return true;
} else {
return false;
}
}
And here is the code that calls the function:
if (validate($email) == false){
$_SESSION['error'] = "You have an invalid email!<br /><br />";
header("Location: contact.php");
die;
}
As I run this code, it shows this error message:
Warning: preg_match() [function.preg-match]: No ending delimiter '^' found in C:\xampp\htdocs\2012\Next\inc\functions.php on line 11

Your regex pattern needs delimiters:
function validate($email){
$regex = "/...your pattern.../";
if (preg_match($regex, $email)) {
return true;
} else {
return false;
}
}
EDIT: You then need to escape any delimiter characters that appear inside your regex.
With / as the delimiter, your pattern contains two of them:
$regex = "/^((([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|\/|=|\?|\^|_|`|\{|\||\}|~)+(\.([a-z]|[0-9]|!|#|$|%|&|'|\*|\+|\-|\/|=|\?|\^|_|`|\{|\||\}|~)+)*)#((((([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.))*([a-z]|[0-9])([a-z]|[0-9]|\-){0,61}([a-z]|[0-9])\.(af|ax|al|dz|as|ad|ao|ai|aq|ag|ar|am|aw|au|at|az|bs|bh|bd|bb|by|be|bz|bj|bm|bt|bo|ba|bw|bv|br|io|bn|bg|bf|bi|kh|cm|ca|cv|ky|cf|td|cl|cn|cx|cc|co|km|cg|cd|ck|cr|ci|hr|cu|cy|cz|dk|dj|dm|do|ec|eg|sv|gq|er|ee|et|fk|fo|fj|fi|fr|gf|pf|tf|ga|gm|ge|de|gh|gi|gr|gl|gd|gp|gu|gt| gg|gn|gw|gy|ht|hm|va|hn|hk|hu|is|in|id|ir|iq|ie|im|il|it|jm|jp|je|jo|kz|ke|ki|kp|kr|kw|kg|la|lv|lb|ls|lr|ly|li|lt|lu|mo|mk|mg|mw|my|mv|ml|mt|mh|mq|mr|mu|yt|mx|fm|md|mc|mn|ms|ma|mz|mm|na|nr|np|nl|an|nc|nz|ni|ne|ng|nu|nf|mp|no|om|pk|pw|ps|pa|pg|py|pe|ph|pn|pl|pt|pr|qa|re|ro|ru|rw|sh|kn|lc|pm|vc|ws|sm|st|sa|sn|cs|sc|sl|sg|sk|si|sb|so|za|gs|es|lk|sd|sr|sj|sz|se|ch|sy|tw|tj|tz|th|tl|tg|tk|to|tt|tn|tr|tm|tc|tv|ug|ua|ae|gb|us|um|uy|uz|vu|ve|vn|vg|vi|wf|eh|ye|zm|zw|com|edu|gov|int|mil|net|org|biz|info|name|pro|aero|coop|museum|arpa))|(((([0-9]){1,3}\.){3}([0-9]){1,3}))|(\[((([0-9]){1,3}\.){3}([0-9]){1,3})\])))$/";
EDIT 2:
About the error you were getting:
Warning: preg_match() [function.preg-match]: No ending delimiter '^' found ...
preg_match() took the ^, the first character of your pattern, as the delimiter, and complained because it did not find a matching ^ closing the pattern.
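A short sketch of the delimiter rules on a much smaller pattern (the patterns and sample URL are illustrative only):

```php
<?php
// '/' as delimiter: a literal slash inside the pattern must be escaped.
$withSlash = '/^https?:\/\//';

// '~' as delimiter: slashes no longer need escaping.
$withTilde = '~^https?://~';

var_dump(preg_match($withSlash, 'http://example.com'));
var_dump(preg_match($withTilde, 'http://example.com'));

// preg_quote() also escapes the delimiter when it is passed
// as the second argument.
$literal = preg_quote('a/b', '/');
var_dump($literal);
```

Picking a delimiter that does not occur in the pattern avoids the extra escaping altogether.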

You can do this without preg_xx altogether:
function validate($email)
{
return filter_var($email, FILTER_VALIDATE_EMAIL);
}
if (false === validate($email)) {
// invalid email given
}
It uses filter_var() together with FILTER_VALIDATE_EMAIL, which returns the address itself when it is valid and false otherwise.

Detect base64 encoding in PHP?

Is there some way to detect if a string has been base64_encoded() in PHP?
We're converting some storage from plain text to base64 and part of it lives in a cookie that needs to be updated. I'd like to reset their cookie if the text has not yet been encoded, otherwise leave it alone.

Apologies for a late response to an already-answered question, but I don't think base64_decode($x,true) is a good enough solution for this problem. In fact, there may not be a very good solution that works against any given input. For example, I can put lots of bad values into $x and not get a false return value.
var_dump(base64_decode('wtf mate',true));
string(5) "���j�"
var_dump(base64_decode('This is definitely not base64 encoded',true));
string(24) "N���^~)��r��[jǺ��ܡם"
I think that in addition to the strict return value check, you'd also need to do post-decode validation. The most reliable way is if you could decode and then check against a known set of possible values.
A more general solution with less than 100% accuracy (more accurate for long strings, unreliable for short ones) is to check whether many bytes of the decoded output fall outside the normal range of UTF-8 (or whatever encoding you use) characters.
See this example:
<?php
$english = array();
foreach (str_split('az019AZ~~~!##$%^*()_+|}?><": Iñtërnâtiônàlizætiøn') as $char) {
echo ord($char) . "\n";
$english[] = ord($char);
}
echo "Max value english = " . max($english) . "\n";
$nonsense = array();
echo "\n\nbase64:\n";
foreach (str_split(base64_decode('Not base64 encoded',true)) as $char) {
echo ord($char) . "\n";
$nonsense[] = ord($char);
}
echo "Max nonsense = " . max($nonsense) . "\n";
?>
Results:
Max value english = 195
Max nonsense = 233
So you may do something like this:
if ( $maxDecodedValue > 200 ) {} //decoded string is Garbage - original string not base64 encoded
else {} //decoded string is useful - it was base64 encoded
You should probably use the mean() of the decoded values instead of the max(), I just used max() in this example because there is sadly no built-in mean() in PHP. What measure you use (mean,max, etc) against what threshold (eg 200) depends on your estimated usage profile.
In conclusion, the only winning move is not to play. I'd try to avoid having to discern base64 in the first place.
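Since PHP has no built-in mean(), a small helper can stand in for it. This is only a sketch of the heuristic described above; the name meanByteValue() is hypothetical, and any threshold you compare it against remains a guess that depends on your data.

```php
<?php
// Hypothetical helper: arithmetic mean of the byte values in $bytes.
function meanByteValue(string $bytes): float
{
    if ($bytes === '') {
        return 0.0; // avoid dividing by zero
    }
    // unpack('C*', ...) yields one integer (0-255) per byte.
    $values = unpack('C*', $bytes);
    return array_sum($values) / count($values);
}

$decoded = base64_decode('Not base64 encoded', true);
if ($decoded !== false) {
    echo "Mean decoded byte value = " . meanByteValue($decoded) . "\n";
}
```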

function is_base64_encoded($data)
{
if (preg_match('%^[a-zA-Z0-9/+]*={0,2}$%', $data)) {
return TRUE;
} else {
return FALSE;
}
};
is_base64_encoded("iash21iawhdj98UH3"); // true
is_base64_encoded("#iu3498r"); // false
is_base64_encoded("asiudfh9w=8uihf"); // false
is_base64_encoded("a398UIhnj43f/1!+sadfh3w84hduihhjw=="); // false
http://php.net/manual/en/function.base64-decode.php#81425

I had the same problem, I ended up with this solution:
if ( base64_encode(base64_decode($data)) === $data){
echo '$data is valid';
} else {
echo '$data is NOT valid';
}
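The same round trip can be wrapped in a function that also uses strict decoding. This is a sketch; the name isBase64RoundTrip() is hypothetical.

```php
<?php
// Hypothetical helper: true when $data survives a strict decode
// followed by a re-encode without changing.
function isBase64RoundTrip(string $data): bool
{
    $decoded = base64_decode($data, true);
    if ($decoded === false) {
        return false; // rejected by strict decoding
    }
    return base64_encode($decoded) === $data;
}

var_dump(isBase64RoundTrip(base64_encode('hello')));
var_dump(isBase64RoundTrip('wtf mate'));
```

Keep in mind that ordinary words whose length is a multiple of four, such as "node", also pass this test, so it cannot tell encoded data from plain text in every case.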

Better late than never: you could use mb_detect_encoding() to find out whether the decoded string appears to be some kind of text:
function is_base64_string($s) {
// first check if we're dealing with an actual valid base64 encoded string
if (($b = base64_decode($s, TRUE)) === FALSE) {
return FALSE;
}
// now check whether the decoded data could be actual text
$e = mb_detect_encoding($b);
if (in_array($e, array('UTF-8', 'ASCII'))) { // YMMV
return TRUE;
} else {
return FALSE;
}
}
UPDATE: for those who like it short:
function is_base64_string_s($str, $enc=array('UTF-8', 'ASCII')) {
return !(($b = base64_decode($str, TRUE)) === FALSE) && in_array(mb_detect_encoding($b), $enc);
}

We can combine three checks into one function to test whether a given string is valid Base64.
function validBase64($string)
{
// Check that the string contains only Base64 characters
if (!preg_match('/^[a-zA-Z0-9\/\r\n+]*={0,2}$/', $string)) {return false;}
// Decode the string in strict mode
$decoded = base64_decode($string, true);
if ($decoded === false) {return false;}
// Encode the result again and compare it to the original
return base64_encode($decoded) === $string;
}

I was about to build a base64 toggle in php, this is what I did:
function base64Toggle($str) {
if (!preg_match('~[^0-9a-zA-Z+/=]~', $str)) {
$check = str_split(base64_decode($str));
$x = 0;
foreach ($check as $char) if (ord($char) > 126) $x++;
if ($x/count($check)*100 < 30) return base64_decode($str);
}
return base64_encode($str);
}
It works perfectly for me.
Here are my complete thoughts on it: http://www.albertmartin.de/blog/code.php/19/base64-detection
And here you can try it: http://www.albertmartin.de/tools

Without the strict flag, base64_decode() will not return FALSE if the input is not valid Base64 encoded data. Use imap_base64() instead; it returns FALSE if $text contains characters outside the Base64 alphabet.
imap_base64() Reference

Here's my solution:
if(empty(htmlspecialchars(base64_decode($string, true)))) {
return false;
}
It will return false if the decoded $string is invalid, for example: "node", "123", " ", etc.

$is_base64 = function(string $string) : bool {
$zero_one = ['MA==', 'MQ=='];
if (in_array($string, $zero_one)) return TRUE;
if (empty(htmlspecialchars(base64_decode($string, TRUE))))
return FALSE;
return TRUE;
};
var_dump('*** These yell false ***');
var_dump($is_base64(''));
var_dump($is_base64('This is definitely not base64 encoded'));
var_dump($is_base64('node'));
var_dump($is_base64('node '));
var_dump($is_base64('123'));
var_dump($is_base64(0));
var_dump($is_base64(1));
var_dump($is_base64(123));
var_dump($is_base64(1.23));
var_dump('*** These yell true ***');
var_dump($is_base64(base64_encode('This is definitely base64 encoded')));
var_dump($is_base64(base64_encode('node')));
var_dump($is_base64(base64_encode('123')));
var_dump($is_base64(base64_encode(0)));
var_dump($is_base64(base64_encode(1)));
var_dump($is_base64(base64_encode(123)));
var_dump($is_base64(base64_encode(1.23)));
var_dump($is_base64(base64_encode(TRUE)));
var_dump('*** Should these yell true? Might be edge cases ***');
var_dump($is_base64(base64_encode('')));
var_dump($is_base64(base64_encode(FALSE)));
var_dump($is_base64(base64_encode(NULL)));

Maybe this is not exactly what you asked for, but I hope it will be useful for somebody.
In my case the solution was to encode all data with json_encode and then base64_encode:
$encoded=base64_encode(json_encode($data));
This value can be stored or used however you need.
Then, to check whether a value is your encoded data rather than just a text string, you simply use
function isData($test_string){
// Note: JSON values such as 0, false or null decode to falsy values and are rejected here
if(base64_decode($test_string,true)&&json_decode(base64_decode($test_string))){
return true;
}else{
return false;
}
}
or alternatively
function isNotData($test_string){
if(base64_decode($test_string,true)&&json_decode(base64_decode($test_string))){
return false;
}else{
return true;
}
}
Thanks to the authors of all the previous answers in this thread :)

Usually a Base64 text has no spaces.
I used this function, which worked fine for me. It treats the string as Base64 when fewer than 1 in 20 of its characters are spaces.
In other words, ordinary text usually has at least 1 space for every 20 chars, so the test is ( spaces / strlen ) < 0.05
function normalizaBase64($data){
if ($data === '') {
return $data; // nothing to decode, and this avoids dividing by zero
}
$spaces = substr_count ( $data ," ");
if (($spaces/strlen($data))<0.05)
{
return base64_decode($data);
}
return $data;
}

Your best option is:
$base64_test = mb_substr(trim($some_base64_data), 0, 76);
return (base64_decode($base64_test, true) === FALSE ? FALSE : TRUE);
