Preg_matching email and looking for specific text - php

I have a bunch of emails that I read as text in my program and they all have phone numbers such as these:
+370 655 54298
+37065782505
37069788505
865782825
65782825
(686) 51852
How would I go about finding them and saving them into a variable?
For now I am doing it like this:
$found = preg_match('^[0-9\-\+]{9,15}^', $text, $num);
But it does not work at all.

Have a look at Google's libphonenumber library.
It has two functions you may find useful:
isPossibleNumber - quickly guesses whether a number could be a phone number using only length information; much faster than full validation.
isValidNumber - fully validates a phone number for a region using length and prefix information.

This should work https://regex101.com/r/E2PzRN/2
#\+?\(?\d+\)?\s?\d+\s?\d+#
<?php
$regex = '#\+?\(?\d+\)?\s?\d+\s?\d+#';
$x = [
'+370 655 54298',
'+37065782505',
'37069788505',
'865782825',
'hjtgfjtdfjtgdfjt',
'65782825',
'(686) 51852',
];
foreach ($x as $y) {
if (preg_match($regex, $y, $match)) {
echo $match[0] . "\n";
}
}
Check it in action here https://3v4l.org/6AlQa

We distinguish here 3 types of phone numbers.
The first type is this one:
+37065782505
37069788505
865782825
65782825
Here the leading + is optional, and we assume these numbers have at least 7 digits.
The regular expression obtained is therefore
(\+?[0-9]{7,})
The second type is this one:
+370 655 54298
Here we have a first block consisting of a + followed by 2 to 6 digits, then one or more further blocks of 2 to 6 digits, each preceded by a space.
The regular expression obtained is therefore
(\+[0-9]{2,6}(\s[0-9]{2,6})+)
The last type is this one:
(686) 51852
This is a first block consisting of 2 to 6 digits surrounded by parentheses, then one or more further blocks of 2 to 6 digits, each preceded by a space.
The regular expression obtained is therefore
(\([0-9]{2,6}\)(\s[0-9]{2,6})+)
The complete extraction code is therefore
preg_match_all("#(\+?[0-9]{7,})|(\+[0-9]{2,6}(\s[0-9]{2,6})+)|(\([0-9]{2,6}\)(\s[0-9]{2,6})+)#",$text,$out);
$found = $out[0];
where $found is an array.
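As a quick check of the combined pattern, here is a minimal sketch that runs it over a made-up sample string. The contents of $text are an assumption for illustration, not data from the question.

```php
<?php
// Sample input invented for illustration; the question's real emails are not shown.
$text = "Call +370 655 54298 or +37065782505. Office: (686) 51852, fax 65782825.";

// Same three alternatives as above, written in a single-quoted string so that
// the backslashes reach the regex engine unchanged.
$pattern = '#(\+?[0-9]{7,})|(\+[0-9]{2,6}(\s[0-9]{2,6})+)|(\([0-9]{2,6}\)(\s[0-9]{2,6})+)#';

preg_match_all($pattern, $text, $out);
$found = $out[0]; // index 0 holds the full matches; the other indexes hold the groups

print_r($found);
```

With this sample, $found should hold the four numbers in the order they appear in the text.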

I would suggest stripping out '+', '(', ')' and ' ' and then testing the result with ctype_digit.
This removes the formatting characters and checks that only digits remain. It assumes the input is meant to be a phone number; if you run it on something else, such as an email address, the result is false.
var_dump(ctype_digit(str_replace([' ', '+', '(', ')'], '', '(686) 51852')));
bool(true)
var_dump(ctype_digit(str_replace([' ', '+', '(', ')'], '', 'r#pm.mr')));
bool(false)

Related

Match/extract all characters between 2 strings

I want to extract John Doe from the string \n*DRIVGo*\nVolledige naam: John Doe\nTelefoonnummer: 0612345678\nIP: 94.214.168.86\n
So I guess the regex pattern needs to extract all characters between 'Volledige naam:' and '\n'. Is there anyone who can help me out?
You may use this regex to capture the name in group 1,
naam:\s+([a-zA-Z ]+)
This assumes the name can only contain letters and spaces, hence the [a-zA-Z ]+ character class.
PHP sample code:
$str = "\n*DRIVGo*\nVolledige naam: John Doe\nTelefoonnummer: 0612345678\nIP: 94.214.168.86\n";
preg_match('/naam:\s+([a-zA-Z ]+)/', $str, $matches);
print_r($matches[1]);
Prints,
John Doe
You can use
^Volledige naam:\s*\K.+
in multiline mode. That is
^ # start of line
Volledige naam:\s*\K # Volledige naam:, whitespaces and "forget" what's been matched
.+ # rest of the line
In PHP:
<?php
$string = <<<DATA
*DRIVGo*
Volledige naam: John Doe
Telefoonnummer: 0612345678
IP: 94.214.168.86
DATA;
$regex = '~^Volledige naam:\s*\K.+~m';
if (preg_match($regex, $string, $match)) {
print_r($match);
}
?>
See a demo on ideone.com as well as on regex101.com.
The required string always starts just after the first ':' (found with indexOf(':')) and ends at the next newline, which a second indexOf call finds when given the first result as its offset. (This assumes neither call reports "not found", which would mean the complete segment is not in the string.)
A regular expression seems less useful here, because the source string does not vary in a way that requires an automaton.
Consider a simple split('\n') operation (optionally limited to the number of pieces you need), followed by further such calls if necessary, to obtain the desired value without any underlying engine.
The logic is the same as what a regex engine does for you internally, but the engine's cost in memory and performance is usually justified only in certain scenarios (for instance code page or locale conversions, or finding words with incorrect declension or punctuation), none of which seem to apply here.
Consider a parser construct with fields and methods that can obtain (point to) the data and verify its integrity when required; in most cases this also lets you quickly serialize and deserialize the results.
Finally, since you indicated your language is PHP: the equivalent of indexOf is strpos, and the following code demonstrates a couple of ways to solve this problem without regex.
$str = "\n*DRIVGo*\nVolledige naam: John Doe\nTelefoonnummer: 0612345678\nIP: 94.214.168.86\n";
$search = chr(10);
$parts = explode($search, $str);
$partsCount = count($parts);
print_r($parts);
if($partsCount > 1) print($parts[1]); //*DRIVGo*
print('-----Same results via different methodology------');
$groupStart = 0;
$groupEnd = $groupStart;
$max = strlen($str);
//While groupEnd has not reached the end of str
while($groupEnd < $max &&
($groupStart = strpos($str, $search, $groupStart)) !== false && // find search in str starting at groupStart; strpos returns false (not -1) when nothing is found
($groupEnd = strpos($str, $search, $groupEnd + 1)) !== false) // find the next occurrence after groupEnd, assign result to groupEnd
{
//Show the start, end, length and resulting substring
print_r([$groupStart, $groupEnd, $groupEnd - $groupStart, substr($str, $groupStart, $groupEnd - $groupStart)]);
//advance the parsing
$groupStart = $groupEnd;
}
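The "parser construct with fields" idea can be sketched without regex as well. The helper name parseFields and the assumption that every field is a "Label: value" line are mine, not part of the question.

```php
<?php
$str = "\n*DRIVGo*\nVolledige naam: John Doe\nTelefoonnummer: 0612345678\nIP: 94.214.168.86\n";

// Hypothetical helper: turn "Label: value" lines into an associative array.
function parseFields(string $text): array
{
    $fields = [];
    foreach (explode("\n", $text) as $line) {
        // Split on the first colon only, so a value may itself contain colons.
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $fields[trim($pair[0])] = trim($pair[1]);
        }
    }
    return $fields;
}

$fields = parseFields($str);
echo $fields['Volledige naam'] ?? 'not found';
```

Lines without a colon, such as *DRIVGo*, are skipped, and a missing field falls back to 'not found' instead of raising a warning.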

Check if an array has this value, then grab it and place it in a variable

I have a string like the one below
20Nov 18:14:xxxxxxxxxx has given 10 points to xxxxx. New bitcoin collection Balance:XXXXXXXX. Ref:675743957424
I will explode it and it will then be turned into an array.
But I want to check if the array has Ref:675743957424 and then place it inside a variable like for example $a.
I want to do this since the string might change from one point to another so the position of Ref is not fixed.
How can I achieve this?
Thanks.
Edited
I tried not exploding it and instead grabbing the data directly; see the code below.
<?php
$line = "20Nov 18:14:xxxxxxxxxx has given 10 points to xxxxx. New bitcoin collection Balance:XXXXXXXX. Ref:675743957424";
// perform a case-insensitive search for the word "Ref"
if (preg_match("/\bRef\b/i", $line, $match)) :
print "Match found!";
//how can I grab the Ref part?
endif;
?>
You have to use:
preg_match ('/Ref:[\d]*/', $line, $matches);
The matches will be saved to the variable $matches, and you can then work with them.
The regex just looks for the string Ref: followed by any number of digits (\d matches any digit and * matches zero or more occurrences of the preceding element, digits in this case).
If you know the exact number of digits and it does not vary, you can use the {NUMBER} quantifier, like:
preg_match ('/Ref:[\d]{12}/', $line, $matches);
In this case, you are looking for exactly 12 digits after Ref:.
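To actually place the result in $a, as the question asks, something along these lines may help. It is a sketch: the variable $refNumber and the use of \d+ (at least one digit) with a capture group are my additions.

```php
<?php
$line = "20Nov 18:14:xxxxxxxxxx has given 10 points to xxxxx. New bitcoin collection Balance:XXXXXXXX. Ref:675743957424";

$a = null;         // stays null when no reference is present
$refNumber = null; // hypothetical extra variable holding only the digits

// \d+ requires at least one digit; the parentheses capture the digits alone.
if (preg_match('/Ref:(\d+)/', $line, $matches)) {
    $a = $matches[0];         // the whole match, "Ref:" plus the digits
    $refNumber = $matches[1]; // just the digits
}

var_dump($a, $refNumber);
```

Because the pattern is not anchored, it finds Ref: wherever it sits in the line, which covers the case where its position changes.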
You can use strpos() to check whether the substring is present in the string. If it is, you can assign it to your variable. See the code below.
$line = "20Nov 18:14:xxxxxxxxxx has given 10 points to xxxxx. New bitcoin collection Balance:XXXXXXXX. Ref:675743957424";
$string_to_check = 'Ref:675743957424';
if (strpos($line, $string_to_check) !== false) { // Ref is present
$a = $string_to_check;
}

Weird whitespace error in PHP

I have phone numbers that I want to format
And I have a pattern matcher that breaks down the numbers into a 10 digit format, and then applies dashes.
It works most of the time. However, I'm having an issue with certain numbers.
$trimmed = trim(preg_replace('/\s+/', '', $v->cust_num));
$tendigit = str_replace(array( '(', ')','-',' ' ), '', $trimmed);
$num = substr($tendigit,0,3)."-".substr($tendigit,3,3)."-".substr($tendigit,6,4);
This will change (555)555 5555, or 555-555 5555 or 5555555555 or (555)-555-5555 or 555-555-5555
to my format of 555-555-5555
However, I came across a few entries in my database that don't seem to want to change.
One of the bad entries is this one. It contains two whitespace characters in front of the 4.
4-035-0100
When it runs through $trimmed and I output $tendigit, it outputs
40350100
as expected. But when I build $num from it, it goes back to
4-035-0100
I would at least expect it to be
403-501-00
It seems there is some hidden whitespace in it that my preg_replace, trim, and str_replace are not catching.
Any ideas??
Thanks
The code below works; I have tried it with the special characters we discovered in the comments. Basically, the regex removes everything that isn't a digit (0-9) and then applies your original formatting.
$trimmed = preg_replace('/\D+/', '', $v->cust_num);
$num = substr($trimmed,0,3)."-".substr($trimmed,3,3)."-".substr($trimmed,6,4);
You can condense your code a little:
$tendigit = preg_replace('/[^\d]/', '', $v->cust_num);
$num = substr($tendigit,0,3)."-".substr($tendigit,3,3)."-".substr($tendigit,6,4);
Though, you should add in some conditions to check that the phone number actually has 10 digits too:
$tendigit = preg_replace('/[^\d]/', '', $v->cust_num);
if(strlen($tendigit) == 10){
$num = substr($tendigit,0,3)."-".substr($tendigit,3,3)."-".substr($tendigit,6,4);
} else {
// catch your error here, eg 'please enter 10 digits'
}
The first line removes every non-digit character ([^\d]).
The conditional statement checks whether $tendigit contains exactly 10 digits.
If it does, it uses your code to format the number.
If it doesn't, you can handle the error.
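Putting the two steps together, a small helper could look like the sketch below. The name formatPhone and the null return for invalid input are my choices, and the non-breaking space in the second call is only an assumed example of a "hidden" character; the thread does not say which character was actually in the database.

```php
<?php
// Hypothetical helper; returns null when the input does not contain exactly 10 digits.
function formatPhone(string $raw): ?string
{
    // Drop every non-digit, including invisible characters that trim() leaves alone.
    $digits = preg_replace('/\D+/', '', $raw);
    if (strlen($digits) !== 10) {
        return null;
    }
    return substr($digits, 0, 3) . '-' . substr($digits, 3, 3) . '-' . substr($digits, 6, 4);
}

var_dump(formatPhone('(555)555 5555'));
// Assumed example: a UTF-8 non-breaking space before an 8-digit number.
var_dump(formatPhone("\xC2\xA04-035-0100"));
```

The second call returns null because only 8 digits remain, so the caller can flag the record instead of storing a half-formatted number.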

PHP preg_match for dimension

I was searching for hours to get a regular expression for matching dimension in a string.
Consider following types of strings,
120x200
100' X 130'
4 acres
0.54
0.488 (90x223)/ GIS0.78
90x160
100x149.7
143.76 X 453.52
6.13 per tax bill
120x378 per tax roll
I want the output to contain only the dimensions, written with either 'X' or 'x'.
From the above string, the expected output is,
120x200,100' X 130',0,0,90x223,90x160,100x149.7,143.76 X 453.52,0,120x378
Is there any possible reg-ex? or any other alternative?
Thanks
This seems to work:
<?php
$str = <<<EOD
120x200
100' X 130'
4 acres
0.54
0.488 (90x223)/ GIS0.78
90x160
100x149.7
143.76 X 453.52
6.13 per tax bill
120x378 per tax roll
EOD;
$lines = explode("\n", $str);
foreach($lines as $line) {
if (preg_match('/-?(\d+(?:\.\d+)?(\'|ft|yd|m|")?)\s*x\s*-?(\d+(?:\.\d+)?(?:\\2)?)/i', $line, $match)) {
echo "{$match[1]}x{$match[3]}\n";
} else {
echo "0\n";
}
}
You can add more units of measurement into the 3rd parenthesized expression if you want to match more things, but this matches whole numbers, real numbers, and optional units of measurement after.
This should get you started:
\d+(\.\d+)?['"]?\s*[xX]\s*\d+(\.\d+)?['"]?
demo : http://regexr.com?386sf
However, it does not cover all of your cases, as they are too customized to be handled by a single regex.
I would recommend using a customized method to parse the input with different cases.
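As a starting point for such a method, the pattern above can be wrapped so that each line yields either a dimension or 0, as in the question's expected output. The function name extractDimension is hypothetical, and the sketch only returns the first dimension found in a line.

```php
<?php
// Hypothetical helper: return the first "W x H" dimension in a line, or "0" if none is found.
function extractDimension(string $line): string
{
    $pattern = '/\d+(\.\d+)?[\'"]?\s*[xX]\s*\d+(\.\d+)?[\'"]?/';
    return preg_match($pattern, $line, $m) ? $m[0] : '0';
}

$lines = ["120x200", "100' X 130'", "4 acres", "0.488 (90x223)/ GIS0.78", "6.13 per tax bill"];
echo implode(',', array_map('extractDimension', $lines));
```

Further special cases can then be added as extra branches inside the function rather than by growing the regex.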

Validate measurements with PHP

I need to validate measurements entered into a form generated by PHP.
I intend to compare them to upper and lower control limits and decide if they fail or pass.
As a first step, I imagine a PHP function which accepts strings representing engineering measurements and converts them to pure numbers before the comparison.
At the moment I'm only expecting measurements of small voltages and currents, so strings like
'1.234uA', '2.34 nA', '39.9mV' or '-1.003e-12'
will be converted to
1.234e-6, 2.34e-9, 3.99e-2 and -1.003e-12, respectively.
But the method should be generalisable to any measured quantity.
function convert($value) {
$units = array('p' => 'e-12',
'n' => 'e-9',
'u' => 'e-6',
'm' => 'e-3');
$unitstring = implode("", array_keys($units));
$matches = array();
$pattern = "/^(-?(?:\\d*\.\\d+|\\d+))\s*([$unitstring])([a-z])$/i"; // -? now applies to both number forms
$result = preg_match($pattern, $value, $matches);
if ($result)
$retval = $matches[1].$units[$matches[2]].$matches[3];
else
$retval = $value;
return $retval;
}
So to explain what the above does:
$units is an array to map unit-prefix to the exponent.
$unitstring conglomerates the units into a single string (in the example it would be 'pnum')
The regular expression will match an optional -, followed by either 0 or more digits, a period and 1 or more digits OR 1 or more digits, followed by one of the unit prefixes (only one) and then a single alphabetical character. There can be any amount of whitespace between the number and the units.
Because of the parentheses and the use of preg_match, the number section, the unit prefix, and the unit are all captured separately in the array $matches as elements 1, 2, and 3. (Element 0 will contain the entire string.)
$result will be 1 if it matched the regex, 0 otherwise.
$retval is constructed by just connecting the number, the exponent (based on the unit prefix from the array) and the units provided, or it will just be the passed in string (such as if you're given the -1.003e-12, it will be returned)
Of course you can tweak some things, but in general this is a good start. Hope it helps.
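Since the question asks for pure numbers to compare against control limits, a variant that returns a float may be easier to work with. This is a sketch under stated assumptions: the name toFloat, the restriction to the units A and V, and the case-sensitive prefixes (so that 'm' is never confused with a future 'M') are my choices, not part of the answer above.

```php
<?php
// Hypothetical helper: convert an engineering-notation string to a float, or null if unparseable.
function toFloat(string $value): ?float
{
    // Prefix exponents from the question; extend the array and the [pnum] class together.
    $exponents = ['p' => -12, 'n' => -9, 'u' => -6, 'm' => -3];

    // Number (optionally in e-notation), then an optional prefix plus unit (A or V).
    $pattern = '/^\s*([-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?)\s*(?:([pnum])?([AV]))?\s*$/';
    if (!preg_match($pattern, $value, $m)) {
        return null;
    }

    $number = (float) $m[1];
    $prefix = $m[2] ?? ''; // unset or empty when no prefix was given
    return $prefix === '' ? $number : $number * 10 ** $exponents[$prefix];
}

// Example check against made-up control limits.
$measured = toFloat('39.9mV');
$pass = $measured !== null && $measured >= 0.03 && $measured <= 0.05;
var_dump($measured, $pass);
```

Returning null for unparseable input lets the form handler distinguish "failed the limits" from "could not be read".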
In your function, first assign an exponent to each unit prefix (-6 for u, -3 for m, and so on), then split the string into a number and a unit prefix (u for micro, m for milli, etc.).
Call the number $num and the prefix's exponent $unit, then normalize:
$num = 1234.0; // example numeric part
$unit = -6; // exponent for the prefix u
$x = 0; // counts how many places the decimal point moves
while (abs($num) >= 10) {
$num = $num / 10;
$x++;
}
$unit = $unit + $x; // the exponent grows as the number shrinks
echo $num . 'e' . $unit; // 1.234e-3
Maybe that will do!
My own possibly simple-minded approach has been to use an array of patterns in preg_replace
function convert($value) {
// Defined inside the function so they are in scope when it runs
$patterns = array('/p[av]/i', '/n[av]/i', '/u[av]/i', '/m[av]/i');
$replacements = array('e-12', 'e-9', 'e-6', 'e-3');
return preg_replace($patterns, $replacements, $value);
}
And it could be extended to higher prefixes, but it seems heavy-handed to keep adding increasingly complex regexes to the $patterns array.
Edit: The comparison, later, should interpret the return value as a real number.
I'm hoping someone can suggest something more elegant.
