I am trying to check with a regex whether an OCR string contains a social security number. I never know the actual number in advance; I just want to find out whether one exists in the string.
For example:
// 81130790K038 (always 12 characters, 8x 0-9, 1x A-Z, 3x 0-9)
Here is what I have, but it doesn't match the correct SSN:
// At the end there is: " 8 11 30 7 90K0 38 "
$example_str = "SGA/SIE 405801/11700/69 Personal? Nr IGe?urlsdalun smfltpgngen?mßxllige .If .lr Juni 2 01 8 0 3 .0 7 .2 01 8 Blatt 1 0 0 0 69 1 3,0 7190 4 . roilcfessmn Fretbetrag am. Frerbetrag mtl.? DSA Glettzone lät.ng? VJ Ur1 üb. unAnspl. Url.Tg.gen. IResturlaub SV-Nummer .kk oJ 8 11 30 7 90K0 38 ";
if(preg_match('/([\s]{1})([0-9\s]{8,16})([A-Z0-9a-z]{1})([0-9\s]{3,6})([\s]{1})/', $example_str, $matches, PREG_OFFSET_CAPTURE))
{
print_r($matches);
}
Result:
Array
(
[0] Array
(
[0] 1 0 0 0 69 1
[1] 133
)
[1] Array
(
[0]
[1] 133
)
[2] Array
(
[0] 1 0 0 0
[1] 134
)
[3] Array
(
[0] 6
[1] 142
)
[4] Array
(
[0] 9 1
[1] 143
)
[5] Array
(
[0]
[1] 146
)
)
Can someone help me with the regex, please?
Best regards,
olli
UPDATE: preg_match_all helps me, thank you.
According to the format, you can use this pattern:
(?<!\S)(?:[0-9] *){8}[A-Z](?: *[0-9]){3}(?!\S)
(?<!\S) asserts that the match is not preceded by a non-whitespace character (so it starts after whitespace or at the start of the string), and (?!\S) asserts the same for the character after the match.
If you want, you can replace each literal space with \h (horizontal whitespace) or \s (any whitespace).
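As a rough sketch of how the pattern above could be applied in PHP: the sample below is a shortened piece of the OCR text from the question, and the step that strips the spaces is an assumption about what the caller wants, not part of the answer.

```php
<?php
// Pattern from the answer above; $ocr is a shortened sample of the question's OCR text.
$pattern = '/(?<!\S)(?:[0-9] *){8}[A-Z](?: *[0-9]){3}(?!\S)/';
$ocr = 'Resturlaub SV-Nummer .kk oJ 8 11 30 7 90K0 38 ';

$ssn = null;
if (preg_match($pattern, $ocr, $m)) {
    // Assumption: the caller wants the compact 12-character form,
    // so the spaces inserted by the OCR are removed.
    $ssn = str_replace(' ', '', $m[0]);
}
var_dump($ssn);
```

If only the yes/no answer is needed, the return value of preg_match is enough and the normalisation step can be dropped.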
I need to figure out a method using PHP to chunk the 1's and 0's into sections.
1001 would look like: array(100,1)
1001110110010011 would look like: array(100,1,1,10,1,100,100,1,1)
It gets different when the sequence starts with 0s: I would like the leading 0s to be split into their own blocks until the first 1 is reached.
00110110 would look like: array(0,0,1,10,1,10)
How would this be done with PHP?
You can use preg_match_all to split your string, using the following regex:
10*|0
This matches either a 1 followed by any number of 0s, or a single 0. Because 10* is greedy, it consumes every 0 that follows a 1, so the second alternative can only match 0s that are not preceded by a 1, that is, those at the start of the string. PHP usage:
$beatstr = '1001110110010011';
preg_match_all('/10*|0/', $beatstr, $m);
print_r($m);
$beatstr = '00110110';
preg_match_all('/10*|0/', $beatstr, $m);
print_r($m);
Output:
Array
(
[0] => Array
(
[0] => 100
[1] => 1
[2] => 1
[3] => 10
[4] => 1
[5] => 100
[6] => 100
[7] => 1
[8] => 1
)
)
Array
(
[0] => Array
(
[0] => 0
[1] => 0
[2] => 1
[3] => 10
[4] => 1
[5] => 10
)
)
Demo on 3v4l.org
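If the split is needed in several places, the same regex could be wrapped in a small function. This is only a sketch: chunkBits() is a hypothetical name, and rejecting input that contains anything other than 0 and 1 is an assumption, since the question does not say how such input should be handled.

```php
<?php
// chunkBits() is a hypothetical helper name, not part of the original answer.
function chunkBits(string $bits): array
{
    // Assumption: any character other than 0 or 1 is invalid input.
    if (preg_match('/[^01]/', $bits)) {
        throw new InvalidArgumentException('Expected only 0s and 1s');
    }
    // Same regex as above: a 1 with its trailing 0s, or a single leading 0.
    preg_match_all('/10*|0/', $bits, $m);
    return $m[0];
}

print_r(chunkBits('1001'));
print_r(chunkBits('00110110'));
```

Because every character ends up in exactly one chunk, joining the chunks with implode('', ...) gives back the original string, which is a cheap sanity check.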
I have a string and I want to match a specific pattern optionally as many times as may occur.
My String
0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL
After 45 and before $595 there could be up to 6 more numbers. How can I optionally match the repeating numbers in that space?
Here's what I have so far:
/([\d.]+) ([\d.]+) ([\d.]+)? (\d+) (\d+) (\d+) \$(\d+)/ig
Here are some samples with expected outputs:
0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL
output: array([0] => 0.91,
[1] => 0.45,
[2] => 0.69,
[3] => 58,
[4] => 47,
[5] => 45,
[6] => 23,
[7] => 83,
[8] => 90,
[9] => 595)
0.91 0.45 0.69 58 47 45 $595 NO IDL
output: array([0] => 0.91,
[1] => 0.45,
[2] => 0.69,
[3] => 58,
[4] => 47,
[5] => 45,
[6] => 595)
0.91 0.45 0.69 0.63 58 47 45 $595 NO IDL
output: Does not match the pattern because we only want 3 of the first items to contain decimals.
This seems to split the last number into multiple numbers. I can't figure out what's going on.
I am using PHP's preg_match for this, so I would like no empty elements in the resulting array if possible. Thanks.
You may validate the string with a positive lookahead triggered at the start of the string, and then match all numbers from the start up to the currency value once the validation succeeds:
'~(?:\G(?!^)|^(?=\d+\.\d+ \d+\.\d+ \d+(?:\.\d+)?(?: \d+)* \$\d))\s*\$?\K\d+(?:\.\d+)?~'
Details
(?:\G(?!^)|^(?=\d+\.\d+ \d+\.\d+ \d+(?:\.\d+)?(?: \d+)* \$\d)) - either the end of the previous match (\G(?!^)) or the start of the string (^) that is followed by:
  \d+\.\d+ - 1+ digits, a dot, 1+ digits
  a space
  \d+\.\d+ - 1+ digits, a dot, 1+ digits
  a space
  \d+ - 1+ digits
  (?:\.\d+)? - an optional fractional part
  (?: \d+)* - 0+ sequences of a space followed by 1+ digits
  a space
  \$\d - a $ and a digit.
\s* - 0+ whitespaces
\$? - an optional $ char
\K - match reset operator
\d+(?:\.\d+)? - an int/float number (1+ digits followed with an optional sequence of . and 1+ digits).
PHP demo:
$strs = ['0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL','0.91 0.45 0.69 58 47 45 $595 NO IDL','0.91 0.45 0.69 0.63 58 47 45 $595 NO IDL'];
$rx = '~(?:\G(?!^)|^(?=\d+\.\d+ \d+\.\d+ \d+(?:\.\d+)?(?: \d+)* \$\d))\s*\$?\K\d+(?:\.\d+)?~';
foreach ($strs as $s) {
echo "$s:\n";
if (preg_match_all($rx, $s, $matches)) {
print_r($matches[0]);
echo "---------\n";
} else {
echo "NO MATCH!!!\n---------\n";
}
}
Output:
0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL:
Array
(
[0] => 0.91
[1] => 0.45
[2] => 0.69
[3] => 58
[4] => 47
[5] => 45
[6] => 23
[7] => 83
[8] => 90
[9] => 595
)
---------
0.91 0.45 0.69 58 47 45 $595 NO IDL:
Array
(
[0] => 0.91
[1] => 0.45
[2] => 0.69
[3] => 58
[4] => 47
[5] => 45
[6] => 595
)
---------
0.91 0.45 0.69 0.63 58 47 45 $595 NO IDL:
NO MATCH!!!
---------
This should give you the expected results (in PHP, drop the g flag, which the preg functions do not support, and call preg_match_all instead):
/([\d\$.]+)/i
Alternatively, match the fixed part of the string up to 45 (the 6th number), then use \G to capture each number that follows:
(?:\d+\.\d+)(?: \d+\.\d+){2}(?: \d+){3}\s*|\G(?!^)(\d+)\s
Explanation
(?:\d+\.\d+)(?: \d+\.\d+){2} Match a number with a decimal part 3 times, separated by spaces
(?: \d+){3} Match a space followed by 1+ digits 3 times. That matches up to and including 45
\s* Match zero or more whitespace characters
| Or
\G(?!^) Assert the position at the end of the previous match; the negative lookahead (?!^) rules out the start of the string
(\d+)\s Capture 1+ digits in group 1, then match a whitespace character
For the first sample, group 1 captures the numbers after 45, one per match: 23, 83 and 90.
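A minimal sketch of how the \G pattern above might be used from PHP, run against the first sample from the question. The filtering step is an assumption about the desired result: the first match (the fixed prefix) does not set group 1, so preg_match_all reports an empty string for it. Note also that the pattern is not anchored, so on its own it does not reject a line with four decimal numbers.

```php
<?php
// Pattern from the answer above; $line is the first sample from the question.
$rx = '~(?:\d+\.\d+)(?: \d+\.\d+){2}(?: \d+){3}\s*|\G(?!^)(\d+)\s~';
$line = '0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL';

preg_match_all($rx, $line, $m);

// The prefix match leaves group 1 empty, so drop empty strings
// and reindex to get only the numbers that follow 45.
$extra = array_values(array_filter($m[1], function ($s) {
    return $s !== '';
}));
print_r($extra);
```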
I am facing a problem I am not able to solve. I have a string consisting of unneeded text and 10-digit numbers that always start with "2" or "6". I need to get those 10-digit numbers into an array. I thought of regex and found the article "Regular Expression for matching a numeric sequence?", which is pretty close to what I need (except for the descending/ascending part). Yet, as I have never been and will NEVER be able to understand regex, I can't modify it to my needs. If anyone could help me out here, I would highly appreciate it!
Here is a sample of my string:
".........693 7098469 - ZQH X Bop. Hrtepou 50 flerpoUrroXn ........210 5014166 - 0E000PA E KapaoAn Anpn-rPou 21
EAArivtg .....................................................210 9618677 - MAPIA KapaoAri Arpn-rptou 21 Elanvolo .. 210 9643623 - MAPIA E ...................................................... 210 9643887 - MAPIA 0 loucrrivou 8 HX.toOrran ..............210 9914534 AIPITAKHE APTEMIOE n Avrtnopou 22
Reptcrrept ....._.........._......._................697 7440896 , -10AN."
Thank you very much in advance!
Greetings from Greece!
As I see it, the digits in your string have a space between them. If you want a strict match, this is the regex:
[62]\d{2}\s*\d{7}
Explanation:
[62] # Start with 6 or 2
\d{2} # 2 more digits
\s* # any number of white spaces
\d{7} # 7 more digits
And here is PHP code that uses preg_match_all to match all occurrences:
preg_match_all("/[62]\d{2}\s*\d{7}/", $text, $matches);
Output:
Array
(
[0] => 693 7098469
[1] => 210 5014166
[2] => 210 9618677
[3] => 210 9643623
[4] => 210 9643887
[5] => 210 9914534
[6] => 697 7440896
)
Maybe like this:
<?php
$x=
".........693 7098469 - ZQH X Bop. Hrtepou 50 flerpoUrroXn ........210 5014166 - 0E000PA E KapaoAn Anpn-rPou 21 EAArivtg ....................................................210 9618677 - MAPIA KapaoAri Arpn-rptou 21 Elanvolo .. 210 9643623 - MAPIA E ...................................................... 210 9643887 - MAPIA 0 loucrrivou 8 HX.toOrran ..............210 9914534 AIPITAKHE APTEMIOE n Avrtnopou 22
Reptcrrept ....._.........._......._................697 7440896 , -10AN.";
$x=str_replace(' ','',$x);
preg_match_all('/((2|6)\d{9})/',$x,$matches);
print_r($matches[0]);
And the result:
Array
(
[0] => 6937098469
[1] => 2105014166
[2] => 2109618677
[3] => 2109643623
[4] => 2109643887
[5] => 2109914534
[6] => 6977440896
)
There is a pretty cool page that visualizes regex code for better understanding:
https://www.debuggex.com/
This should work:
((?:2|6)[0-9]{2} [0-9]{7})
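For illustration, the pattern above could be used like this. The sample is a shortened piece of the string from the question, and removing the space to get a bare 10-digit number is an assumption about the wanted output. Unlike \s*, this pattern requires exactly one space between the 3-digit and 7-digit parts.

```php
<?php
// Pattern from the answer above; $text is a shortened sample of the question's string.
$text = '.........693 7098469 - ZQH X Bop. ........210 5014166 - MAPIA 21 .. 210 9643623 - MAPIA E';

preg_match_all('/((?:2|6)[0-9]{2} [0-9]{7})/', $text, $m);

// Assumption: the caller wants plain 10-digit numbers, so the space is removed.
$numbers = array_map(function ($n) {
    return str_replace(' ', '', $n);
}, $m[1]);
print_r($numbers);
```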
I am trying to group a bunch of text segments from a string and create an array from them.
The string is something like this:
<em>string</em> and the <em>test</em> here.
tableBegin rowNumber:2, columnNumber:2 11 22 33 44 tableEnd
<em>end</em> text here
I was hoping to get an array like the following results
array (0 => '<em>string</em> and the <em>test</em> here.',
1=>'rowNumber:2',
2=>'columnNumber:2',
3=>'11',
4=>'22',
5=>'33',
6=>'44',
7=>'<em>end</em> text here')
11, 22, 33 and 44 are the table cell data the user enters. I want each of them to have a unique index while keeping the rest of the text together.
tableBegin and tableEnd are just markers that delimit the table cell data.
Any help or tips? Thanks a lot!
You may try the following, note that you need PHP 5.3+:
$string = '<em>string</em> and the <em>test</em> here.
tableBegin rowNumber:2, columnNumber:2 11 22 33 44 tableEnd
SOme other text
tableBegin rowNumber:3, columnNumber:3 11 22 33 44 55 tableEnd
<em>end</em> text here';
$array = array();
preg_replace_callback('#tableBegin\s*(.*?)\s*tableEnd\s*|.*?(?=tableBegin|$)#s', function($m)use(&$array){
if(isset($m[1])){ // If group 1 exists, which means if the table is matched
// Split group 1 on runs of whitespace and/or commas, then append the pieces to the array
$array = array_merge($array, preg_split('#[\s,]+#s', $m[1]));
}else{// Else just add everything that's matched
if(!empty($m[0])){
$array[] = $m[0];
}
}
}, $string);
print_r($array);
Output
Array
(
[0] => string and the test here.
[1] => rowNumber:2
[2] => columnNumber:2
[3] => 11
[4] => 22
[5] => 33
[6] => 44
[7] => SOme other text
[8] => rowNumber:3
[9] => columnNumber:3
[10] => 11
[11] => 22
[12] => 33
[13] => 44
[14] => 55
[15] => end text here
)
Regex explanation
tableBegin : match tableBegin
\s* : match a whitespace zero or more times
(.*?) : match everything ungreedy and put it in group 1
\s* : match a whitespace zero or more times
tableEnd : match tableEnd
\s* : match a whitespace zero or more times
| : or
.*?(?=tableBegin|$) : match everything until tableBegin or the end of the string
The s modifier : make dots also match newlines
Here is the ugly way to do it, if you can't find a regex guru out there.
So, this is your text
$string = "<em>string</em> and the <em>test</em> here.
tableBegin rowNumber:2, columnNumber:2 11 22 33 44 tableEnd
<em>end</em> text here";
And this is my code
$E = explode(' ', $string);
$A = $E[0].$E[1].$E[2].$E[3].$E[4].$E[5];
$B = $E[17].$E[18].$E[19];
$All = [$A, $E[8],$E[9], $E[11], $E[12], $E[13], $E[14], $B];
print_r($All);
And this is the output
Array
(
[0] => stringandthetesthere.
[1] => rowNumber:2,
[2] => columnNumber:2
[3] => 11
[4] => 22
[5] => 33
[6] => 44
[7] => endtexthere
)
Of course, the <em> tags won't be visible unless you view the source code.
I am attempting to use RegEx to strip down the following data:
mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109
I am hoping to split it apart by home team (first city), home score (first number), away team (second city), away score (second number), and the game status (in parentheses). This is the regex I have currently, but I feel it is very wrong.
preg_match_all('/mlb_s_left[0-9]=(?P<hometeam>.*?) (?P<homescore>.*?) (?P<awayteam>.*?) (?P<awayscore>.*?)\((?P<time>.*?)\)/', $content, $matches);
I would appreciate any and all help in getting this working.
I have tested the following code snippet in PHP 5.4.5:
<?php
$foo = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton&mlb_s_right1_count=1&mlb_s_url1=http://sports.espn.go.com/mlb/boxscore?gameId=320801110&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed&mlb_s_right2_count=1&mlb_s_url2=http://sports.espn.go.com/mlb/boxscore?gameId=320801109';
preg_match_all('/mlb_s_left\d=\^?(?P<hometeam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<homescore>\d+)\s+\^?(?P<awayteam>[a-zA-Z]+(?:\s+[a-zA-Z]+)*)\s+(?P<awayscore>\d+)\s+\((?P<time>\w+)\)/', $foo, $matches, PREG_SET_ORDER);
print_r($matches);
?>
output:
Array
(
[0] => Array
(
[0] => mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)
[hometeam] => Baltimore
[1] => Baltimore
[homescore] => 3
[2] => 3
[awayteam] => NY Yankees
[3] => NY Yankees
[awayscore] => 12
[4] => 12
[time] => FINAL
[5] => FINAL
)
[1] => Array
(
[0] => mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)
[hometeam] => Chicago Sox
[1] => Chicago Sox
[homescore] => 3
[2] => 3
[awayteam] => Minnesota
[3] => Minnesota
[awayscore] => 2
[4] => 2
[time] => FINAL
[5] => FINAL
)
)
Something like this should get you close.
preg_match_all('/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+(?P<awayteam>\D+)\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/',
$content, $matches);
Note that \d matches any digit, and \D matches anything that is not a digit.
[^)]+ matches one or more characters other than a closing parenthesis; \s+ matches one or more whitespace characters, and \s* matches zero or more whitespace characters.
This wouldn't work very well if you have a city name with a number in it, and if you have a huge string, it's possible it could get hung up somewhere; you might consider splitting it up and matching a bit more piecemeal.
Generally speaking I would avoid .*? as a pattern match, as it basically matches almost anything. It's best for your regular expression to be as specific as possible, based on what you know about the data.
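To make the \D+ approach concrete, here is a hedged sketch that runs the pattern above against a shortened version of the question's data. One thing to be aware of is that \D+ also captures the leading "^" marker in front of some team names; stripping it with ltrim and casting the scores to integers are assumptions about the desired output, not something the answer prescribes.

```php
<?php
// Pattern from the answer above; $content is a shortened version of the question's data.
$content = 'mlb_s_left1=Baltimore 3 ^NY Yankees 12 (FINAL)&mlb_s_right1_1=W: Hughes L: Britton'
    . '&mlb_s_left2=^Chicago Sox 3 Minnesota 2 (FINAL)&mlb_s_right2_1=W: Peavy L: Diamond S: Reed';

$rx = '/mlb_s_left\d+=(?P<hometeam>\D+)\s+(?P<homescore>\d+)\s+(?P<awayteam>\D+)'
    . '\s+(?P<awayscore>\d+)\s*\((?P<time>[^)]+)\)/';

// PREG_SET_ORDER groups the captures per game instead of per subpattern.
preg_match_all($rx, $content, $matches, PREG_SET_ORDER);

$games = array();
foreach ($matches as $g) {
    $games[] = array(
        // \D+ includes the leading "^" marker when present, so strip it here.
        'hometeam'  => ltrim($g['hometeam'], '^'),
        'homescore' => (int) $g['homescore'],
        'awayteam'  => ltrim($g['awayteam'], '^'),
        'awayscore' => (int) $g['awayscore'],
        'time'      => $g['time'],
    );
}
print_r($games);
```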