Regex match position - php

$str1 = '10 sold';
$re = "/(?<Alpha>[a-zA-Z]*)(?<Numeric>[0-9]*)/";
preg_match_all($re, $str1, $str1matches);
echo print_r($str1matches,1);
prints:
Array
(
[0] => Array
(
[0] => 10
[1] =>
[2] => sold
[3] =>
)
[Alpha] => Array
(
[0] =>
[1] =>
[2] => sold
[3] =>
)
[1] => Array
(
[0] =>
[1] =>
[2] => sold
[3] =>
)
[Numeric] => Array
(
[0] => 10
[1] =>
[2] =>
[3] =>
)
[2] => Array
(
[0] => 10
[1] =>
[2] =>
[3] =>
)
)
But why does it print such a long array, and how do I determine at which position my values (xxx and label) will always be available?

I'd use a simple /^([0-9]+)\s*([a-zA-Z]+)$/ regex since you confirm there is a number and then a word in the input string:
preg_match('/^([0-9]+)\s*([a-zA-Z]+)$/', '10 sold', $str1matches, PREG_OFFSET_CAPTURE);
See the PHP demo:
$str1 = '10 sold';
$re = "/^([0-9]+)\s*([a-zA-Z]+)$/";
preg_match($re, $str1, $str1matches, PREG_OFFSET_CAPTURE);
print_r($str1matches[1]);
print_r($str1matches[2]);
The $str1matches[1] will contain an array with the Group 1 (number) value and its position, and the $str1matches[2] will contain an array with the Group 2 (word) value and its position.
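As for why the original pattern returned such a long array: both [a-zA-Z]* and [0-9]* may match nothing, so preg_match_all also reports the empty matches it finds at the space and at the end of the string, and every named group is listed twice (once under its name, once under its number). Below is a minimal sketch of reading the value and offset from the anchored pattern. It reuses the question's group names; the variable names $number, $numberPos, $word and $wordPos are only illustrative.

```php
<?php
$str1 = '10 sold';

// Anchored pattern: one number, optional whitespace, one word.
$re = '/^(?<Numeric>[0-9]+)\s*(?<Alpha>[a-zA-Z]+)$/';

if (preg_match($re, $str1, $m, PREG_OFFSET_CAPTURE)) {
    // With PREG_OFFSET_CAPTURE each group is a [text, byte offset] pair.
    [$number, $numberPos] = $m['Numeric'];
    [$word, $wordPos] = $m['Alpha'];
    echo "$number at offset $numberPos, $word at offset $wordPos\n";
}
```

Because the pattern is anchored and uses + instead of *, there is exactly one match and no empty entries to skip over.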

Related

Split string with multiple length into array

I have a list of strings like this
A45618416541548234
A48432185120148084
A15973357048208202
I want to split these strings and put them into arrays like this
Array
(
[0] => Array
(
[0] => A45
[1] => 6184165
[2] => 41548234
)
[1] => Array
(
[0] => A48
[1] => 4321851
[2] => 20148084
)
[2] => Array
(
[0] => A15
[1] => 9733570
[2] => 48208202
)
)
I want to split the strings into 3 parts - 1st to 3rd character, 4th to 10th, and 11th to 18th.
I tried doing this using substr, but I couldn't make an array like the one above.
How can I accomplish this?
You can achieve what you want with array_map and substr:
$strings = array('A45618416541548234', 'A48432185120148084', 'A15973357048208202');
print_r(array_map(function ($v) {
    return array(substr($v, 0, 3), substr($v, 3, 7), substr($v, 10, 8));
}, $strings));
Output:
Array
(
[0] => Array
(
[0] => A45
[1] => 6184165
[2] => 41548234
)
[1] => Array
(
[0] => A48
[1] => 4321851
[2] => 20148084
)
[2] => Array
(
[0] => A15
[1] => 9733570
[2] => 48208202
)
)
Demo on 3v4l.org
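The same fixed-width split (characters 1 to 3, 4 to 10, and 11 to 18) could also be written as a single anchored regex. This is only a sketch of an alternative, assuming every input is exactly 18 characters long; strings of any other length are skipped, and the variable name $parts is illustrative.

```php
<?php
$strings = ['A45618416541548234', 'A48432185120148084', 'A15973357048208202'];

$parts = [];
foreach ($strings as $s) {
    // Three capture groups of 3, 7 and 8 characters; ^ and $ enforce the total length of 18.
    if (preg_match('/^(.{3})(.{7})(.{8})$/', $s, $m)) {
        $parts[] = array_slice($m, 1); // drop $m[0], the full match
    }
}
print_r($parts);
```

The substr version is simpler when the input is trusted; the regex version also validates the length as it splits.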

A regular expression to match a single character followed by numbers and ends with 'c'

I need a regex for the following pattern:
a single character from [e-g] followed by one or more numbers that ends with character 'c'.
for example
e123f654g933c
expected result:
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
)
)
or
e123f654g933ce99f77g66c
expected result:
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
),
[1] => Array
(
[0] => e99
[1] => f77
[2] => g66
)
)
I tried using the following but I don't know what to do with 'c' part.
I used this ([e-g]{1}[0-9]{1,}c)+ but it fails.
$subject="e123f654g933ce99f786g776c";
preg_match_all('/[e-g]{1}[0-9]{1,}/', $subject, $match);
print '<pre>' . print_r($match,1) . '</pre>';
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
[3] => e99
[4] => f786
[5] => g776
)
)
thanks.
I couldn't manage to generate your multidimensional output array with a single regex function call.
Code (Demo)
$strings = [
'e123f654g933c',
'e123f654g933ce99f77g66c'
];
foreach ($strings as $string) {
var_export(
array_map(
function($v) {
return preg_match_all('/[e-g]\d+/', $v, $out2) ? $out2[0] : []; // split the groups by string format
// or return preg_split('/\d+\K/', $v, 0, PREG_SPLIT_NO_EMPTY);
// or return preg_split('/(?=[e-g])/', $v, 0, PREG_SPLIT_NO_EMPTY);
},
preg_match_all('/(?:[e-g]\d+)+(?=c)/', $string, $out1) ? $out1[0] : [] // split into groups using c
// or explode('c', rtrim($string, 'c'))
// or array_slice(explode('c', $string), 0, -1)
// or preg_split('/c/', $string, 0, PREG_SPLIT_NO_EMPTY)
)
);
echo"\n\n";
}
Output:
array (
0 =>
array (
0 => 'e123',
1 => 'f654',
2 => 'g933',
),
)
array (
0 =>
array (
0 => 'e123',
1 => 'f654',
2 => 'g933',
),
1 =>
array (
0 => 'e99',
1 => 'f77',
2 => 'g66',
),
)
It seems you are looking for
[e-g]\d+
This needs to be matched and extracted in PHP like so...
<?php
$strings = ['e123f654g933c', 'e123f654g933ce99f77g66c'];
$regex = '~[e-g]\d+~';
foreach ($strings as $string) {
if (preg_match_all($regex, $string, $matches)) {
print_r($matches[0]);
}
}
?>
... and yields
Array
(
[0] => e123
[1] => f654
[2] => g933
)
Array
(
[0] => e123
[1] => f654
[2] => g933
[3] => e99
[4] => f77
[5] => g66
)
You may use
'~(?:\G(?!^)|(?=(?:[e-g]\d+)+c))[e-g]\d+~'
See the regex demo. In short, due to the (?:\G(?!^)|(?=(?:[e-g]\d+)+c)) part, [e-g]\d+ only matches when it belongs to a run of one or more [e-g]\d+ chunks that is immediately followed by c.
Details
(?:\G(?!^)|(?=(?:[e-g]\d+)+c)) - match the end of the previous successful match (\G(?!^)) or (|) a location that is followed by one or more occurrences of an e, f or g letter plus 1+ digits, and then a c (due to the (?=(?:[e-g]\d+)+c) positive lookahead)
[e-g]\d+ - an e, f or g letter followed with 1+ digits
PHP demo:
$re = '/(?:\G(?!^)|(?=(?:[e-g]\d+)+c))[e-g]\d+/';
$str = 'e123f654g933c and e123f654g933ce99f77g66c';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
// => Array ( [0] => e123 [1] => f654 [2] => g933 [3] => e123 [4] => f654 [5] => g933 [6] => e99 [7] => f77 [8] => g66 )
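To see the effect of the lookahead and \G, here is a small sketch with two made-up inputs (the strings and the variable names $closed and $late are illustrative, not from the question). Chunks that are not part of a run ending in c are left out.

```php
<?php
$re = '/(?:\G(?!^)|(?=(?:[e-g]\d+)+c))[e-g]\d+/';

// 'e1f2c' is a run closed by c; the trailing 'g3' has no closing c.
preg_match_all($re, 'e1f2c g3', $closed);
print_r($closed[0]); // e1, f2

// Here only 'g3c' is closed by c, so 'e1' and 'f2' are ignored.
preg_match_all($re, 'e1f2 g3c', $late);
print_r($late[0]); // g3
```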
You can't easily achieve this with a single RegExp.
The solution is to split the string at the occurrences of 'c', handle the parts separately, and then build the result array:
<?php
$strings = [
'e123f654g933c',
'e123f654g933ce99f77g66c',
];
foreach ($strings as $input)
{
print_r(extractGroups($input));
}
// Named extractGroups because "match" is a reserved keyword since PHP 8.0.
function extractGroups($input)
{
$result = [];
$parts = array_filter(explode('c', $input));
foreach ($parts as $part)
{
preg_match_all('~[e-g]\d+~', $part, $matches);
$result[] = $matches[0];
}
return $result;
}
The output will be
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
)
)
Array
(
[0] => Array
(
[0] => e123
[1] => f654
[2] => g933
)
[1] => Array
(
[0] => e99
[1] => f77
[2] => g66
)
)

Wrong working regular expression for parsing short terms

I wrote a regular expression in PHP to parse abbreviations from a string.
My code:
$re = "/(([$]?+[А-Яа-я.]+[.]){1,})/";
$str = "г. Братск, ж.р. Южный Падун, ул. Мамырская, 62А, за остановкой";
preg_match_all($re, $str, $matches);
And this script returns:
Array
(
[0] => Array
(
[0] => г.
[1] => ж.
[2] => л.
)
[1] => Array
(
[0] => г.
[1] => ж.
[2] => л.
)
[2] => Array
(
[0] => г.
[1] => ж.
[2] => л.
)
)
But I expect it to work like this:
[1]=>'ж.р.', [2]=>'ул.'
This means my regex captures only part of each abbreviation, though I need the full abbreviation.
On regex101.com, for example, it works well: https://regex101.com/r/wQ7lR7/1
How can I get the full abbreviations ('г.','ж.р.','ул.')?
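You need to use the Unicode modifier, u; see http://php.net/manual/en/reference.pcre.pattern.modifiers.php. Without it, PCRE treats the pattern and the subject as single bytes, so each multibyte Cyrillic letter is split into pieces.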
Example:
$re = "/(([$]?+[А-Яа-я.]+[.]){1,})/u";
$str = "г. Братск, ж.р. Южный Падун, ул. Мамырская, 62А, за остановкой";
preg_match_all($re, $str, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => г.
[1] => ж.р.
[2] => ул.
)
[1] => Array
(
[0] => г.
[1] => ж.р.
[2] => ул.
)
[2] => Array
(
[0] => г.
[1] => ж.р.
[2] => ул.
)
)
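The difference the u modifier makes can be seen by counting what "." consumes. This is a minimal sketch, assuming the script file is saved as UTF-8; the variable names are illustrative.

```php
<?php
// Assumes this file (and therefore the string literal) is UTF-8 encoded.
$abbr = 'ул.';

$bytes    = strlen($abbr);                 // number of bytes in the string
$withoutU = preg_match_all('/./', $abbr);  // "." consumes one byte at a time
$withU    = preg_match_all('/./u', $abbr); // "." consumes one code point at a time

echo "$bytes bytes, $withoutU matches without u, $withU matches with u\n";
// prints "5 bytes, 5 matches without u, 3 matches with u"
```

Each Cyrillic letter takes two bytes in UTF-8, which is why a byte-based character class such as [А-Яа-я.] behaves unpredictably without u.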

How to use preg match in array?

If I have an array:
Array
(
[0] => Array
(
[0] => |174|September|2001|
[1] => |Pengantar=Hello!!!!
[2] => |Tema= Sami Mawon
[3] => |Tema_isi=meet you!!!
[4] => |Kutip=people
[5] => |Kutip_kitab=Efesus
[6] => |Kutip_pasal=4
[7] => |Kutip_ayat=28
[8] => |Tema_sumber=Kiriman dari Maurits albert (romind# )
[9] => [[Kategori:e-humor 2001]]
)
)
How can I get the value of Pengantar, Tema, Tema_isi etc?
You're going to have to loop over the array and call preg_match on each element.
A regex something like this (off the top of my head) would probably work:
/^\|([^=]+)=(.*)$/
Just use preg_match('/^\|([^=]+)=(.*)$/', $subject[$x], $matches); and var_dump($matches); to see the results.
Don't forget that $matches is passed to preg_match by reference: PHP creates it for you, and it is overwritten on each loop cycle, so copy out what you need before the next iteration.
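The loop described above might look like the sketch below. The sample rows are taken from the question; the pattern and the names $rows and $fields are assumptions made for illustration.

```php
<?php
$rows = [
    '|174|September|2001|',
    '|Pengantar=Hello!!!!',
    '|Tema= Sami Mawon',
    '[[Kategori:e-humor 2001]]',
];

$fields = [];
foreach ($rows as $row) {
    // Group 1: the name between the leading | and the = sign; group 2: everything after it.
    if (preg_match('/^\|([^=|]+)=(.*)$/', $row, $m)) {
        $fields[$m[1]] = trim($m[2]); // copy the values out before $m is overwritten
    }
}
print_r($fields);
```

Rows without a name=value pair, such as the first and last ones, simply do not match and are skipped.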
Just use array_walk, a matching regex and a lambda function:
$array = array(
array(
'|174|September|2001|',
'Pengantar=Hello!!!!',
'Tema= Sami Mawon ',
'Kategori:e-humor 2001',
// [...]
)
);
$values = array();
array_walk($array[0],function(&$item1, $key) use(&$values) {
if(preg_match('#[^=]=(.+)#',$item1,$match)){
$values[] = $match[1];
}
});
print_r($values);
The regular expression can be like this, using the named subpattern of "preg_match()":-
$a = array(
array(
'|174|September|2001|',
'|Pengantar=Hello!!!!',
'|Tema= Sami Mawon',
'|Tema_isi=meet you!!!',
'|Kutip=people',
'|Kutip_kitab=Efesus',
'|Kutip_pasal=4',
'|Kutip_ayat=28',
'|Tema_sumber=Kiriman dari Maurits albert (romind# )',
'[[Kategori:e-humor 2001]]',
)
);
$pattern = '/(?<first>\w+)[:=](?<rest>[\w\s]+)/'; // inside a character class | is a literal pipe, so it is not needed
$matches = array();
foreach ($a as $_arrEach) {
foreach ($_arrEach as $_each) {
$result = preg_match($pattern, $_each, $matches[]);
}
}
echo "<pre>";
print_r($matches);
echo "</pre>";
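In each match, the key "first" holds the field name and "rest" holds its value. Note that "rest" stops at the first character that is neither a word character nor whitespace, which is why "Hello!!!!" comes back as "Hello".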
The above will output as:-
Array
(
[0] => Array
(
)
[1] => Array
(
[0] => Pengantar=Hello
[first] => Pengantar
[1] => Pengantar
[rest] => Hello
[2] => Hello
)
[2] => Array
(
[0] => Tema= Sami Mawon
[first] => Tema
[1] => Tema
[rest] => Sami Mawon
[2] => Sami Mawon
)
[3] => Array
(
[0] => Tema_isi=meet you
[first] => Tema_isi
[1] => Tema_isi
[rest] => meet you
[2] => meet you
)
[4] => Array
(
[0] => Kutip=people
[first] => Kutip
[1] => Kutip
[rest] => people
[2] => people
)
[5] => Array
(
[0] => Kutip_kitab=Efesus
[first] => Kutip_kitab
[1] => Kutip_kitab
[rest] => Efesus
[2] => Efesus
)
[6] => Array
(
[0] => Kutip_pasal=4
[first] => Kutip_pasal
[1] => Kutip_pasal
[rest] => 4
[2] => 4
)
[7] => Array
(
[0] => Kutip_ayat=28
[first] => Kutip_ayat
[1] => Kutip_ayat
[rest] => 28
[2] => 28
)
[8] => Array
(
[0] => Tema_sumber=Kiriman dari Maurits albert
[first] => Tema_sumber
[1] => Tema_sumber
[rest] => Kiriman dari Maurits albert
[2] => Kiriman dari Maurits albert
)
[9] => Array
(
[0] => Kategori:e
[first] => Kategori
[1] => Kategori
[rest] => e
[2] => e
)
)
Hope it helps.
Apart from the question of "which pattern", I suggest you take a look at preg_replace, which is able to operate on arrays directly.
$pattern = "...";
$matches = preg_replace($pattern, '\1', $array);
It is not clear from the question, but it looks like you want preg_filter.
This function can perform a regex match and replace on an array, returning a new array containing only the replaced values of the matched items.
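A minimal sketch of the preg_filter idea, using a few of the question's rows. The pattern and the variable names are assumptions for illustration.

```php
<?php
$items = [
    '|174|September|2001|',
    '|Pengantar=Hello!!!!',
    '|Tema= Sami Mawon',
    '[[Kategori:e-humor 2001]]',
];

// Keep only "|name=value" items and replace each one with its value (group 1).
$values = preg_filter('/^\|\w+=\s*(.*)$/', '$1', $items);
print_r($values);
```

Items that do not match are dropped, and the surviving items keep their original keys (1 and 2 here), so apply array_values if a reindexed list is needed.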

Parsing attributes in PHP using regular expressions

Consider that I have the string,
$string = 'tag2 display="users" limit="5"';
Using the preg_match_all function, I need to get the following output.
Required output:
Array
(
[0] => Array
(
[0] => tag2
[1] => tag2
[2] =>
)
[1] => Array
(
[0] => display="users"
[1] => display
[2] => users
)
[2] => Array
(
[0] => limit="5"
[1] => limit
[2] => 5
)
)
I tried using this pattern '/([^=\s]+)="([^"]+)"/' but it does not recognize a parameter with no value (in this case tag2). Instead it gives the output below.
What I am getting
Array
(
[0] => Array
(
[0] => display="users"
[1] => display
[2] => users
)
[1] => Array
(
[0] => limit="5"
[1] => limit
[2] => 5
)
)
What will be the pattern for getting the required output ?
EDIT 1: I also need to get the attributes which are not wrapped in quotes, e.g. attr=val. Sorry for not mentioning this before.
Try this:
<?php
$string = 'tag2 display="users" limit="5"';
preg_match_all('/([^=\s]+)(="([^"]+)")?/', $string, $res);
foreach ($res[0] as $r => $v) {
$o[] = array($res[0][$r], $res[1][$r], $res[3][$r]);
}
print_r($o);
?>
It outputs:
Array
(
[0] => Array
(
[0] => tag2
[1] => tag2
[2] =>
)
[1] => Array
(
[0] => display="users"
[1] => display
[2] => users
)
[2] => Array
(
[0] => limit="5"
[1] => limit
[2] => 5
)
)
I think it's not fully possible to get what you're looking for with one call, but this is pretty close:
$string = 'tag2 display="users" limit=5';
preg_match_all('/([^=\s]+)(?:="?([^"]+)"?|())?/', $string, $res, PREG_SET_ORDER);
print_r($res);
Output:
Array
(
[0] => Array
(
[0] => tag2
[1] => tag2
[2] =>
[3] =>
)
[1] => Array
(
[0] => display="users"
[1] => display
[2] => users
)
[2] => Array
(
[0] => limit=5
[1] => limit
[2] => 5
)
)
As you can see, the first element has no value; I tried to work around that by offering an empty match. So this builds the array you were asking for, but with an additional entry on the empty attribute.
However the main point is the PREG_SET_ORDER flag of preg_match_all. Maybe you can live with this output already.
Maybe you're interested in this little snippet that parses all sorts of attribute styles. <div class="hello" id=foobar style='display:none'> is valid HTML(5), not pretty, I know…
<?php
$string = '<tag2 display="users" limit="5">';
$attributes = array();
$pattern = "/\s+(?<name>[a-z0-9-]+)=(((?<quotes>['\"])(?<value>.*?)\k<quotes>)|(?<value2>[^'\" ]+))/i";
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER); // use $string; $source was never defined
foreach ($matches as $match) {
$attributes[$match['name']] = $match['value'] ?: $match['value2'];
}
var_dump($attributes);
will give you
$attributes = array(
'display' => 'users',
'limit' => '5',
);
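For the exact array shape requested in the question, including a bare name such as tag2 and the unquoted values mentioned in EDIT 1, one possible sketch is shown below. The pattern and the names $sets, $quoted, $bare and $result are assumptions for illustration, not taken from the answers above.

```php
<?php
$string = 'tag2 display="users" limit=5';

// A name, optionally followed by ="quoted value" (group 2) or =bare-value (group 3).
$pattern = '/([^=\s]+)(?:=(?:"([^"]*)"|(\S+)))?/';
preg_match_all($pattern, $string, $sets, PREG_SET_ORDER);

$result = [];
foreach ($sets as $m) {
    // With PREG_SET_ORDER trailing unmatched groups are missing, hence the ?? fallbacks.
    $quoted = $m[2] ?? '';
    $bare   = $m[3] ?? '';
    $result[] = [$m[0], $m[1], $quoted !== '' ? $quoted : $bare];
}
print_r($result);
```

Each entry is [full match, name, value], with an empty string as the value when the attribute has none.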
