I want to parse a mobile number and remove the special characters, for example:
+61-426 861 479 ====> 61 426 861 479
Using PHP preg_match_all:
preg_match_all('/(\d{2}) (\d{3}) (\d{3}) (\d{3})/', $part,$matches);
if (count($matches[0])){
foreach ($matches[0] as $mob) {
$records['mobile'][] = $mob;
}
}
Expected output:
+61-426 861 479 ====> 61 426 861 479
You are missing the + and the - in your pattern. You could update the pattern to use 2 capturing groups and call preg_match_all with the PREG_SET_ORDER flag. To add the mobile number to the array, concatenate the values of the first and second groups with a space in between.
\+(\d{2})-(\d{3}(?: \d{3}){2})\b
For example
$part = "+61-426 861 478 +61-426 861 479 ";
preg_match_all('/\+(\d{2})-(\d{3}(?: \d{3}){2})\b/', $part, $matches, PREG_SET_ORDER, 0);
if (count($matches)) {
foreach ($matches as $mob) {
$records['mobile'][] = $mob[1] . ' ' . $mob[2];
}
}
print_r($records);
Result
Array
(
[mobile] => Array
(
[0] => 61 426 861 478
[1] => 61 426 861 479
)
)
If the number is the only thing in the string, you could also replace every run of non-digits (\D+) with a space, and then use ltrim to remove the leading space that replaced the +.
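A minimal sketch of that replace-and-trim approach, assuming $part holds nothing but the phone number:

```php
<?php
// Assumes the string contains only the phone number.
$part = '+61-426 861 479';
// Every run of non-digits ("+", "-", " ") becomes a single space: " 61 426 861 479"
$mobile = preg_replace('/\D+/', ' ', $part);
// Remove the leading space that replaced the "+"
$mobile = ltrim($mobile);
echo $mobile; // 61 426 861 479
```

If the string may contain other text, stick with the capturing-group pattern above, because this approach turns every digit in the string into part of the "number".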
Related
I have a string and I want to match a specific pattern optionally, as many times as it occurs.
My String
0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL
Between 45 and $595 there could be up to 6 more numbers. How can I optionally match these repeating numbers?
Here's what I have so far:
/([\d.]+) ([\d.]+) ([\d.]+)? (\d+) (\d+) (\d+) \$(\d+)/ig
Here are some samples with expected outputs:
0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL
output: array([0] => 0.91,
[1] => 0.45,
[2] => 0.69,
[3] => 58,
[4] => 47,
[5] => 45,
[6] => 23,
[7] => 83,
[8] => 90,
[9] => 595)
0.91 0.45 0.69 58 47 45 $595 NO IDL
output: array([0] => 0.91,
[1] => 0.45,
[2] => 0.69,
[3] => 58,
[4] => 47,
[5] => 45,
[6] => 595)
0.91 0.45 0.69 0.63 58 47 45 $595 NO IDL
output: no match, because only the first 3 items may contain decimals.
This seems to split the last number into multiple numbers. I can't figure out what's going on.
I am using PHP's preg_match for this, so I would like to avoid empty elements in the resulting array if possible. Thanks.
You may validate the string with a positive lookahead triggered at the start of the string, and then match all numbers from the start up to the currency value once the validation succeeds:
'~(?:\G(?!^)|^(?=\d+\.\d+ \d+\.\d+ \d+(?:\.\d+)?(?: \d+)* \$\d))\s*\$?\K\d+(?:\.\d+)?~'
Details
(?:\G(?!^)|^(?=\d+\.\d+ \d+\.\d+ \d+(?:\.\d+)?(?: \d+)* \$\d)) - either the end of the previous match (\G(?!^)) or the start of the string (^) that is followed by:
\d+\.\d+ - 1+ digits, a dot and 1+ digits (a decimal number), followed by a space
\d+\.\d+ - another decimal number, followed by a space
\d+ - 1+ digits
(?:\.\d+)? - an optional fractional part
(?: \d+)* - 0+ sequences of a space followed by 1+ digits
a space, then
\$\d - a $ and a digit.
\s* - 0+ whitespaces
\$? - an optional $ char
\K - match reset operator
\d+(?:\.\d+)? - an int/float number (1+ digits followed with an optional sequence of . and 1+ digits).
PHP demo:
$strs = ['0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL','0.91 0.45 0.69 58 47 45 $595 NO IDL','0.91 0.45 0.69 0.63 58 47 45 $595 NO IDL'];
$rx = '~(?:\G(?!^)|^(?=\d+\.\d+ \d+\.\d+ \d+(?:\.\d+)?(?: \d+)* \$\d))\s*\$?\K\d+(?:\.\d+)?~';
foreach ($strs as $s) {
echo "$s:\n";
if (preg_match_all($rx, $s, $matches)) {
print_r($matches[0]);
echo "---------\n";
} else {
echo "NO MATCH!!!\n---------\n";
}
}
Output:
0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL:
Array
(
[0] => 0.91
[1] => 0.45
[2] => 0.69
[3] => 58
[4] => 47
[5] => 45
[6] => 23
[7] => 83
[8] => 90
[9] => 595
)
---------
0.91 0.45 0.69 58 47 45 $595 NO IDL:
Array
(
[0] => 0.91
[1] => 0.45
[2] => 0.69
[3] => 58
[4] => 47
[5] => 45
[6] => 595
)
---------
0.91 0.45 0.69 0.63 58 47 45 $595 NO IDL:
NO MATCH!!!
---------
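With preg_match_all, this simpler pattern returns all the numbers (drop the g modifier: PHP does not support it, and preg_match_all already finds every match):
/([\d\$.]+)/
Note that it keeps the $ in $595 and does not validate the string, so the third sample matches as well.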
You could match the fixed run of numbers up to 45 (the 6th number) in one go, and then use \G to continue matching the numbers that follow it:
(?:\d+\.\d+)(?: \d+\.\d+){2}(?: \d+){3}\s*|\G(?!^)(\d+)\s
Explanation
(?:\d+\.\d+)(?: \d+\.\d+){2} Match a decimal number (digits with a fractional part) 3 times, separated by spaces
(?: \d+){3} Match a space followed by 1+ digits, 3 times. That matches up to and including 45
\s* Match zero or more whitespace characters
| Or
\G(?!^) Assert the position at the end of the previous match, using a negative lookahead to assert that it is not the start of the string
(\d+)\s Capture the digits in group 1, then match a whitespace character
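Since the original demo link is gone, here is a small sketch of how the \G pattern above could be used from PHP to extract the numbers after 45. The first match (the prefix up to 45) leaves group 1 empty, so empty entries are filtered out; the variable names are illustrative.

```php
<?php
$s  = '0.91 0.45 0.69 58 47 45 23 83 90 $595 NO IDL';
$rx = '~(?:\d+\.\d+)(?: \d+\.\d+){2}(?: \d+){3}\s*|\G(?!^)(\d+)\s~';
preg_match_all($rx, $s, $matches);
// $matches[1][0] is "" because the first alternative has no capturing group;
// array_filter with strlen drops it, array_values re-indexes from 0.
$after45 = array_values(array_filter($matches[1], 'strlen'));
print_r($after45); // 23, 83, 90
```

Because every captured number must be followed by whitespace, matching stops at $595, so the amount itself is not part of the result.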
Assuming I have the string variable:
$str = '
[WhiteTitle "GM"]
[WhiteCountry "Cuba"]
[BlackCountry "United States"]
1. d4 d5 2. Nf3 Nf6 3. e3 c6 4. c4 e6 5. Nc3 Nbd7 6. Bd3 Bd6
7. O-O O-O 8. e4 dxe4 9. Nxe4 Nxe4 10. Bxe4 Nf6 11. Bc2 h6
12. b3 b6 13. Bb2 Bb7 14. Qd3 g6 15. Rae1 Nh5 16. Bc1 Kg7
17. Rxe6 Nf6 18. Ne5 c5 19. Bxh6+ Kxh6 20. Nxf7+ 1-0
';
I would like to extract some information from that variable into an array that looks like this:
Array {
["WhiteTitle"] => "GM",
["WhiteCountry"] => "Cuba",
["BlackCountry"] => "United States"
}
Thanks.
Here is a safer and more compact solution:
$re = '~\[([^]["]*?)\s*"([^]"]+)~'; // Defining the regex
$str = "[WhiteTitle \"GM\"]\n[WhiteCountry \"Cuba\"]\n[BlackCountry \"United States\"]\n\n1. d4 d5 2. Nf3 Nf6 3. e3 c6 4. c4 e6 5. Nc3 Nbd7 6. Bd3 Bd6\n7. O-O O-O 8. e4 dxe4 9. Nxe4 Nxe4 10. Bxe4 Nf6 11. Bc2 h6\n12. b3 b6 13. Bb2 Bb7 14. Qd3 g6 15. Rae1 Nh5 16. Bc1 Kg7\n17. Rxe6 Nf6 18. Ne5 c5 19. Bxh6+ Kxh6 20. Nxf7+ 1-0";
preg_match_all($re, $str, $matches); // Getting all matches
print_r(array_combine($matches[1],$matches[2])); // Creating the final array with array_combine
Regex details:
\[ - opening [
([^]["]*?) - Group 1 matching 0+ characters other than ", [ and ], as few as possible up to
\s* - 0+ whitespaces (to trim the first value)
" - a double quote
([^]"]+) - Group 2 matching 1+ characters other than ] and "
You can use:
preg_match_all('/\[(.*?) "(.*?)"\]/m', $str, $matches, PREG_SET_ORDER);
print_r($matches);
This gives you all the matches in an array: key 0 is the complete match, key 1 the first part and key 2 the second part:
Output:
Array
(
[0] => Array
(
[0] => [WhiteTitle "GM"]
[1] => WhiteTitle
[2] => GM
)
[1] => Array
(
[0] => [WhiteCountry "Cuba"]
[1] => WhiteCountry
[2] => Cuba
)
[2] => Array
(
[0] => [BlackCountry "United States"]
[1] => BlackCountry
[2] => United States
)
)
If you want it in the format you asked for, you can use a simple loop:
$array = array();
foreach($matches as $match){
$array[$match[1]] = $match[2];
}
print_r($array);
Output:
Array
(
[WhiteTitle] => GM
[WhiteCountry] => Cuba
[BlackCountry] => United States
)
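As an alternative to the loop (not part of the original answer), each PREG_SET_ORDER match is itself an array, so array_column() with an index key can build the same key => value map. A sketch using a shortened version of the input string:

```php
<?php
$str = "[WhiteTitle \"GM\"]\n[WhiteCountry \"Cuba\"]\n[BlackCountry \"United States\"]\n\n1. d4 d5 2. Nf3 Nf6";
preg_match_all('/\[(.*?) "(.*?)"\]/', $str, $matches, PREG_SET_ORDER);
// Column 2 (the quoted value) becomes the element, column 1 (the tag name) the key.
$array = array_column($matches, 2, 1);
print_r($array);
```

The result is the same WhiteTitle/WhiteCountry/BlackCountry array as with the loop.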
You can use something like:
<?php
$string = <<< EOF
[WhiteTitle "GM"]
[WhiteCountry "Cuba"]
[BlackCountry "United States"]
1. d4 d5 2. Nf3 Nf6 3. e3 c6 4. c4 e6 5. Nc3 Nbd7 6. Bd3 Bd6
7. O-O O-O 8. e4 dxe4 9. Nxe4 Nxe4 10. Bxe4 Nf6 11. Bc2 h6
12. b3 b6 13. Bb2 Bb7 14. Qd3 g6 15. Rae1 Nh5 16. Bc1 Kg7
17. Rxe6 Nf6 18. Ne5 c5 19. Bxh6+ Kxh6 20. Nxf7+ 1-0
EOF;
$final = array();
preg_match_all('/\[(.*?)\s+(".*?")\]/', $string, $matches, PREG_PATTERN_ORDER);
for($i = 0; $i < count($matches[1]); $i++) {
$final[$matches[1][$i]] = $matches[2][$i];
}
print_r($final);
Output:
Array
(
[WhiteTitle] => "GM"
[WhiteCountry] => "Cuba"
[BlackCountry] => "United States"
)
Ideone Demo:
http://ideone.com/wQYshT
Regex Explanation:
\[(.*?)\s+(".*?")\]
Match the character “[” literally «\[»
Match the regex below and capture its match into backreference number 1 «(.*?)»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\s+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 2 «(".*?")»
Match the character “"” literally «"»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “"” literally «"»
Match the character “]” literally «\]»
I have a file that contains many lines such as these:
2011-03-23 10:11:08 34 57 2 25,5 -
2011-03-23 10:11:12 67 54 3 3,5 -
2011-03-23 10:11:16 76 57 3 2,4 -
2011-03-23 10:11:18 39 41 2 25,5 +
Each line ends with + or -. I'd like the file content to be split after the + or - sign. Lines don't have the same number of characters.
I was trying to read the file using fgets() with auto_detect_line_endings on, but many lines were still combined into a single one.
Output example (this should be two lines, but there is only one; you can see the new line, but PHP doesn't):
2011-03-23 10:11:08 34 57 2 25,5 -
2011-03-23 10:11:12 67 54 3 3,5 -
EDIT:
Code I am using to read the file:
ini_set('auto_detect_line_endings', true);
$handle = fopen($filename, "r");
$index = 1;
if ($handle) {
while (($line = fgets($handle)) !== false) {
if (trim($line) != '') {
$data = preg_split('/\s+/', trim($line));
// Saving into DB...
$index++;
}
}
}
fclose($handle);
To make sure you handle all possible newline combinations, you should use preg_split instead:
LF = \n, CR = \r
LF: Multics, Unix and Unix-like systems (GNU/Linux, OS X, FreeBSD, AIX, Xenix, etc.), BeOS, Amiga, RISC OS and others.
CR: Commodore 8-bit machines, Acorn BBC, ZX Spectrum, TRS-80, Apple II family, Mac OS up to version 9 and OS-9
LF+CR: Acorn BBC and RISC OS spooled text output.
CR+LF: Microsoft Windows, DEC TOPS-10, RT-11 and most other early non-Unix and non-IBM OSes, CP/M, MP/M, DOS (MS-DOS, PC DOS, etc.), Atari TOS, OS/2, Symbian OS, Palm OS, Amstrad CPC
The regex would be /(\r\n|\n\r|\n|\r)/ (CR+LF or LF+CR or LF or CR):
$lines = preg_split('/(\r\n|\n\r|\n|\r)/', $string);
If you don't want any empty lines (lines containing only whitespace count as empty), you can add \s* to the end of the regex, which matches zero or more whitespace characters after each newline (note that this also strips leading whitespace from the following line):
$lines = preg_split('/(\r\n|\n\r|\n|\r)\s*/', $string);
If you don't want empty lines, but lines containing only whitespace should be kept, you can simplify the regex even further:
$lines = preg_split('/[\n\r]+/', $string);
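Applied to the question, a sketch could split the whole content with the first regex and then split each line into fields, as the question's loop does. A string literal with mixed line endings stands in for the file here; with the real file you would load it first, e.g. with file_get_contents($filename) (assuming the file fits in memory).

```php
<?php
// Stand-in for file_get_contents($filename): CR, CR+LF and LF endings mixed.
$string = "2011-03-23 10:11:08 34 57 2 25,5 -\r"
    . "2011-03-23 10:11:12 67 54 3 3,5 -\r\n"
    . "2011-03-23 10:11:16 76 57 3 2,4 -\n"
    . "2011-03-23 10:11:18 39 41 2 25,5 +";
// PREG_SPLIT_NO_EMPTY drops empty pieces, e.g. after a trailing newline.
$lines = preg_split('/(\r\n|\n\r|\n|\r)/', $string, -1, PREG_SPLIT_NO_EMPTY);
$rows = [];
foreach ($lines as $line) {
    // Same field split as in the question; index 6 holds the + or - sign.
    $rows[] = preg_split('/\s+/', trim($line));
}
print_r($rows);
```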
Try this:
<?php
$input = "2011-03-23 10:11:08 34 57 2 25,5 -
2011-03-23 10:11:12 67 54 3 3,5 -
2011-03-23 10:11:16 76 57 3 2,4 -
2011-03-23 10:11:18 39 41 2 25,5 +";
// 1st explode by new line
$output = explode("\n", $input);
print_r($output);
// 2nd remove last character
$result = array();
foreach($output as $op)
{
$result[] = substr($op, 0, -1);
}
print_r($result);
OUTPUT:
Array
(
[0] => 2011-03-23 10:11:08 34 57 2 25,5 -
[1] => 2011-03-23 10:11:12 67 54 3 3,5 -
[2] => 2011-03-23 10:11:16 76 57 3 2,4 -
[3] => 2011-03-23 10:11:18 39 41 2 25,5 +
)
Array
(
[0] => 2011-03-23 10:11:08 34 57 2 25,5
[1] => 2011-03-23 10:11:12 67 54 3 3,5
[2] => 2011-03-23 10:11:16 76 57 3 2,4
[3] => 2011-03-23 10:11:18 39 41 2 25,5
)
DEMO:
http://3v4l.org/0uIe7#v430
How can I find all the numbers that are contained in a string except the ones that have also a letter in them (like A1)?
For example in a String "saddfs 2300 dfsfd 45 A3 A6" I only want to get 2300 and 45.
I know that
preg_match_all('!\d+!', $string, $nums);
can find all numbers, but I don't want to match the numbers in A3 and A6.
Thanks!
Just use word boundaries or string anchors:
preg_match_all('!(^|\b)\d+(\b|$)!', $string, $nums);
Some tests:
php > preg_match_all('!(^|\b)\d+(\b|$)!', 'saddfs 2300 dfsfd 45 A3 A6', $nums);
php > print_r($nums[0]);
Array
(
[0] => 2300
[1] => 45
)
php > preg_match_all('!(^|\b)\d+(\b|$)!', 'saddfs 2300 dfsfd 45 A3 A6 123', $nums);
php > print_r($nums[0]);
Array
(
[0] => 2300
[1] => 45
[2] => 123
)
php > preg_match_all('!(^|\b)[0-9]+(\b|$)!', '789 saddfs 2300 dfsfd 45 A3 A6 123', $nums);
php > print_r($nums[0]);
Array
(
[0] => 789
[1] => 2300
[2] => 45
[3] => 123
)
UPDATE: changed \d to [0-9] per Zsolt Szilagy's suggestion.
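As a possible simplification (not from the original answer): because [0-9] only matches word characters, \b already matches at the very start or end of the string next to a digit, so the ^ and $ alternatives are not strictly needed. A minimal sketch:

```php
<?php
$string = '789 saddfs 2300 dfsfd 45 A3 A6 123';
// \b fails between "A" and "3" (both word characters), so A3 and A6 are skipped.
preg_match_all('/\b[0-9]+\b/', $string, $nums);
print_r($nums[0]); // 789, 2300, 45, 123
```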
Non-robust, quick-and-dirty -- and wrong -- solution:
$ php -a
Interactive shell
php > preg_match_all('/\W\d+\W/', 'saddfs 2300 dfsfd 45 A3 A6', $matches);
php > print_r($matches);
Array
(
[0] => Array
(
[0] => 2300
[1] => 45
)
)
Update Per Aleks G suggestion, laying out the pitfalls to this solution:
First problem: this fails to match pure numbers at the strict beginning or ending of a string. To do that, follow Aleks G pattern, which puts anchor characters in capturing sub-patterns:
preg_match_all('/(^|\W)\d+(\W|$)/', '2300 df A6 242 sfd 45', $matches);
You could make the pattern non-capturing ('/(?:^|\W)\d+(?:\W|$)/') to signal your intent that the parentheses are for grouping, not for capturing -- but this is purely optional as the values you still want remain in $matches[0].
Second problem: \b and \W are not quite the same thing. \b is a "word boundary" while \W is "not a word character". Compare the result of Aleks G and my answer and you'll see that \b gives back pure numbers while \W gives back surrounding space.
Update Per Zsolt Szilagy comment, \d matches the digits in the current character set, so for languages with more digit characters (eg Chinese) you won't get the 0 through 9 expected. Use the character class [0-9] for that.
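Both pitfalls of the \W approach could be avoided in one pattern (a sketch, not part of the original answer): capture only the digits in group 1, and check the following character with a lookahead so the separator is not consumed. With the consuming (\W|$) version, adjacent numbers such as "1 2 3" lose the middle one, because the space before 2 was already used by the previous match.

```php
<?php
$subject = '2300 df A6 242 sfd 45 1 2 3';
// (?:^|\W) groups without capturing, ([0-9]+) captures the number,
// (?=\W|$) checks the next character without consuming it.
preg_match_all('/(?:^|\W)([0-9]+)(?=\W|$)/', $subject, $matches);
print_r($matches[1]); // pure numbers, no surrounding spaces
```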
I have a big string like this:
------%%CreationDate: 11/9/2006 1:01 PM %%BoundingBox: -1 747 53 842 %%HiResBoundingBox: -0.28---------
Now I want to get the values after "%%BoundingBox:".
I need to get "-1 747 53 842" so I can split and process it. How can I do this with preg_match or any other method?
Thanks.
Try the following regex:
/%%BoundingBox: ([^%]*)/
This regex captures everything up to the first % character.
/%%BoundingBox: (.*?)%%/
This regex captures everything up to %%, so a single % inside the value is captured too.
PHP code:
$input = '------%%CreationDate: 11/9/2006 1:01 PM %%BoundingBox: -1 747 53 842 %%HiResBoundingBox: -0.28---------';
preg_match('/%%BoundingBox: ([^%]*)/', $input, $matches);
$output = $matches[1];
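To then split and process the value, as the question asks, a sketch could split the captured string on whitespace and cast each part to an integer. The names $llx, $lly, $urx and $ury are illustrative, following the usual lower-left x/y, upper-right x/y order of a PostScript bounding box.

```php
<?php
$input = '------%%CreationDate: 11/9/2006 1:01 PM %%BoundingBox: -1 747 53 842 %%HiResBoundingBox: -0.28---------';
$coords = [];
if (preg_match('/%%BoundingBox: ([^%]*)/', $input, $matches)) {
    // $matches[1] is "-1 747 53 842 " (trailing space before %%), so trim first.
    $coords = array_map('intval', preg_split('/\s+/', trim($matches[1])));
    [$llx, $lly, $urx, $ury] = $coords;
}
print_r($coords);
```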
You could find the position of "%%BoundingBox:" and "%%HiResBoundingBox:" with strpos() and then extract the value with substr().
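A minimal sketch of that strpos()/substr() approach, assuming both markers are present and %%HiResBoundingBox: follows %%BoundingBox:.

```php
<?php
$text = '------%%CreationDate: 11/9/2006 1:01 PM %%BoundingBox: -1 747 53 842 %%HiResBoundingBox: -0.28---------';
$startTag = '%%BoundingBox:';
$endTag = '%%HiResBoundingBox:';
$value = null;
$start = strpos($text, $startTag);
if ($start !== false) {
    $start += strlen($startTag);                // first character after the marker
    $end = strpos($text, $endTag, $start);     // search only after the start marker
    if ($end !== false) {
        $value = trim(substr($text, $start, $end - $start));
    }
}
var_dump($value); // string(13) "-1 747 53 842"
```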
$text = '------%%CreationDate: 11/9/2006 1:01 PM %%BoundingBox: -1 747 53 842 %%HiResBoundingBox: -0.28---------';
$pattern = "#(%%BoundingBox: )(.*?)( %%HiResBoundingBox)#i";
preg_match_all($pattern, $text, $matches);
print_r($matches[2]);
output:
Array
(
[0] => -1 747 53 842
)
Try this,
$str='------%%CreationDate: 11/9/2006 1:01 PM %%BoundingBox: -1 747 53 842 %%HiResBoundingBox: -0.28---------';
preg_match("/\%\%BoundingBox:\s(.*)\s\%\%/",$str,$match);
print_r($match) gives:
Array ( [0] => %%BoundingBox: -1 747 53 842 %% [1] => -1 747 53 842 )
Then you can find your value by
echo $match[1];// -1 747 53 842
It seems that the value consists only of digits, spaces and minus signs, so:
/%%BoundingBox: ([\s\d-]+)/
Doing so makes it work even if it's not followed by %%; here's an example implementation:
preg_match_all('/%%BoundingBox: ([\s\d-]+)/', $string, $matches);
print_r($matches[1]);
Output:
Array
(
[0] => -1 747 53 842
)
You could make it more strict by enforcing 4 sets of numbers:
preg_match_all('/%%BoundingBox: ((?:\s*\-?\d+){4})/', $string, $matches);
Update
To parse them into key-value pairs, you can do this:
preg_match_all('/%%([^:]++):([^%]*+)/', $string, $matches);
print_r(array_combine($matches[1], array_map('trim', $matches[2])));
Output:
Array
(
[CreationDate] => 11/9/2006 1:01 PM
[BoundingBox] => -1 747 53 842
[HiResBoundingBox] => -0.28---------
)