Regular expression for parsing abbreviations does not work correctly - PHP

I wrote a regular expression in PHP to parse abbreviations from a string.
My code:
$re = "/(([$]?+[А-Яа-я.]+[.]){1,})/";
$str = "г. Братск, ж.р. Южный Падун, ул. Мамырская, 62А, за остановкой";
preg_match_all($re, $str, $matches);
This script returns:
Array
(
[0] => Array
(
[0] => г.
[1] => ж.
[2] => л.
)
[1] => Array
(
[0] => г.
[1] => ж.
[2] => л.
)
[2] => Array
(
[0] => г.
[1] => ж.
[2] => л.
)
)
But it should return this:
[0]=>'г.', [1]=>'ж.р.', [2]=>'ул.'
In other words, my regex matches only part of each abbreviation, but I need the full abbreviation.
On regex101.com, for example, it works fine: https://regex101.com/r/wQ7lR7/1
How can I get the full abbreviations ('г.', 'ж.р.', 'ул.')?

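You need to use the Unicode modifier u (http://php.net/manual/en/reference.pcre.pattern.modifiers.php). Without it, PCRE treats both the pattern and the subject as sequences of single bytes, so the character class [А-Яа-я] is interpreted as a set of individual bytes rather than whole Cyrillic letters (each of which takes two bytes in UTF-8).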
Example:
$re = "/(([$]?+[А-Яа-я.]+[.]){1,})/u";
$str = "г. Братск, ж.р. Южный Падун, ул. Мамырская, 62А, за остановкой";
preg_match_all($re, $str, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => г.
[1] => ж.р.
[2] => ул.
)
[1] => Array
(
[0] => г.
[1] => ж.р.
[2] => ул.
)
[2] => Array
(
[0] => г.
[1] => ж.р.
[2] => ул.
)
)
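A possible variant, sketched under the assumption that the input is valid UTF-8: instead of the explicit А-Яа-я range, it uses PCRE's \p{Cyrillic} script property, which also covers ё/Ё (outside the А-Я range), and it drops the capturing groups because only the full matches are needed. The helper name extract_abbreviations() is hypothetical; this is not the answer's original pattern.

```php
<?php
// Sketch: one or more "Cyrillic letters followed by a dot" runs, with the subject read as UTF-8 (/u).
// \p{Cyrillic} is the Unicode script property; unlike [А-Яа-я] it also includes ё and Ё.
// extract_abbreviations() is a hypothetical helper name.
function extract_abbreviations(string $str): array
{
    preg_match_all('/\p{Cyrillic}+\.(?:\p{Cyrillic}+\.)*/u', $str, $matches);
    return $matches[0]; // full matches only
}

$str = "г. Братск, ж.р. Южный Падун, ул. Мамырская, 62А, за остановкой";
print_r(extract_abbreviations($str));
```

Words that are not followed by a dot (Братск, Южный, 62А, ...) never satisfy the trailing \. and are skipped.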

Related

Extract a pattern using preg_match

I have a string variable and want to extract the year and the number separately.
$val = '2015(15)';
preg_match ('/(.*?)\((.*?)\)/',$val,$match);
print_r($match);
Output: Array ( [0] => 2015(15) [1] => 2015 [2] => 15 )
Expected: the above is fine, or Array ( [0] => 2015 [1] => 15 )
$val = '2015';
preg_match ('/(.*?)\((.*?)\)/',$val,$match);
print_r($match);
Output: Array ( )
Expected: Array ( [0] => 2015 [1] => )
$val = '(15)';
preg_match ('/(.*?)\((.*?)\)/',$val,$match);
print_r($match);
Output: Array ( [0] => (15) [1] => [2] => 15 )
Expected: Array ( [0] => [1] => 15 )
Solution
Perhaps you can try something like this:
/([0-9]{4})?(?:\(([0-9]*)\))?/
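A minimal sketch of how the suggested pattern could be applied to the three sample inputs. Both parts are optional, so preg_match always succeeds at position 0; when a trailing group does not participate, PHP leaves it out of $match entirely, which is why the missing values are defaulted to an empty string here. The helper name parse_year_number() is hypothetical.

```php
<?php
// Sketch: returns [year, number], using '' for whichever part is absent.
// parse_year_number() is a hypothetical helper name.
function parse_year_number(string $val): array
{
    // Group 1: optional 4-digit year; group 2: optional digits inside parentheses.
    preg_match('/([0-9]{4})?(?:\(([0-9]*)\))?/', $val, $m);
    // Unmatched trailing groups are omitted from $m, so default them to ''.
    return [$m[1] ?? '', $m[2] ?? ''];
}

foreach (['2015(15)', '2015', '(15)'] as $val) {
    print_r(parse_year_number($val));
}
```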

Regex match position

$str1 = '10 sold';
$re = "/(?<Alpha>[a-zA-Z]*)(?<Numeric>[0-9]*)/";
preg_match_all($re, $str1, $str1matches);
echo print_r($str1matches,1);
prints:
Array
(
[0] => Array
(
[0] => 10
[1] =>
[2] => sold
[3] =>
)
[Alpha] => Array
(
[0] =>
[1] =>
[2] => sold
[3] =>
)
[1] => Array
(
[0] =>
[1] =>
[2] => sold
[3] =>
)
[Numeric] => Array
(
[0] => 10
[1] =>
[2] =>
[3] =>
)
[2] => Array
(
[0] => 10
[1] =>
[2] =>
[3] =>
)
)
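But why does it print such a long array, and how can I determine at which position my values (xxx and label) will always be available?
The array is long because every named group is reported twice (once under its name and once under its number), and because both groups use *, so the pattern also produces empty matches (at the space and at the end of the string). I'd use a simple /^([0-9]+)\s*([a-zA-Z]+)$/ regex, since you confirmed that the input string always contains a number followed by a word: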
preg_match('/^([0-9]+)\s*([a-zA-Z]+)$/', '10 sold', $str1matches, PREG_OFFSET_CAPTURE);
A complete example:
$str1 = '10 sold';
$re = "/^([0-9]+)\s*([a-zA-Z]+)$/";
preg_match($re, $str1, $str1matches, PREG_OFFSET_CAPTURE);
print_r($str1matches[1]);
print_r($str1matches[2]);
$str1matches[1] will contain an array with the Group 1 (number) value and its byte offset, and $str1matches[2] will contain an array with the Group 2 (word) value and its byte offset.
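To make the [value, offset] pairs concrete, here is a small sketch that unpacks them with array destructuring and returns null when the input does not have the "number, then word" shape. The helper name parse_count() and the keys of the returned array are hypothetical choices for illustration.

```php
<?php
// Sketch: with PREG_OFFSET_CAPTURE each group is [string $text, int $byteOffset].
// parse_count() and its result keys are hypothetical.
function parse_count(string $s): ?array
{
    if (!preg_match('/^([0-9]+)\s*([a-zA-Z]+)$/', $s, $m, PREG_OFFSET_CAPTURE)) {
        return null; // input is not "<number> <word>"
    }
    [$number, $numberPos] = $m[1];
    [$word, $wordPos] = $m[2];
    return ['number' => $number, 'number_pos' => $numberPos,
            'word' => $word, 'word_pos' => $wordPos];
}

print_r(parse_count('10 sold'));
```

Because the pattern is anchored with ^ and $ and uses + instead of *, there is exactly one match or none, so the positions of the groups never change.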

Error with php regular expression

I have this code:
$text = "###12###hello###43###good###113###thefinalstring";
preg_match_all('/(.*?)###(\d*)###(.*?)/is', $text, $matches, PREG_SET_ORDER);
If I dump $matches, why is "thefinalstring" missing?
Where is the error in the regular expression?
Thanks
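Use this regex:
(.*?)###(\d*)###(.*?)([a-zA-Z]*)
Your original pattern ends with a lazy (.*?) that nothing follows, so it always matches an empty string. Each word after a ### is therefore only picked up as group 1 of the next match, and the final word, which has no ### after it, is never matched at all. The extra ([a-zA-Z]*) group captures it.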
Try this:
$text = "###12###hello###43###good###113###thefinalstring";
preg_match_all('/###(\d*)###([^#]*)/is', $text, $matches, PREG_SET_ORDER);
print_r($matches);
output:
Array
(
[0] => Array
(
[0] => ###12###hello
[1] => 12
[2] => hello
)
[1] => Array
(
[0] => ###43###good
[1] => 43
[2] => good
)
[2] => Array
(
[0] => ###113###thefinalstring
[1] => 113
[2] => thefinalstring
)
)
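If the goal is a lookup from each number to the text after it, the PREG_SET_ORDER result can be flattened with array_column(). This is a sketch on top of the answer's pattern; the variable name $byId is an arbitrary choice, and if the same number appeared twice the later text would overwrite the earlier one.

```php
<?php
// Sketch: turn the PREG_SET_ORDER result into a number => text map.
$text = "###12###hello###43###good###113###thefinalstring";
preg_match_all('/###(\d*)###([^#]*)/is', $text, $matches, PREG_SET_ORDER);

// array_column($rows, $valueKey, $indexKey): group 2 becomes the value, group 1 the key.
// PHP stores numeric-string keys such as "12" as integer keys.
$byId = array_column($matches, 2, 1);
print_r($byId);
```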

preg_match_all and umlauts

I am using preg_match_all to filter strings.
The string I pass to preg_match_all is
$text = "Friedric'h Wöhler";
Then I use
preg_match_all('/(\"[^"]+\"|[\\p{L}\\p{N}\\*\\-\\.\\?]+)/', $text, $arr, PREG_PATTERN_ORDER);
and the result I get when I print $arr is
Array
(
[0] => Array
(
[0] => friedric
[1] => h
[2] => w
[3] => ouml
[4] => hler
)
[1] => Array
(
[0] => friedric
[1] => h
[2] => w
[3] => ouml
[4] => hler
)
)
Somehow the ö character is replaced by ouml, and I am not sure how to fix this.
I am expecting the following result:
Array
(
[0] => Array
(
[0] => Friedric'h
[1] => Wöhler
)
)
Per nhahtdh's comment:
$text = "Friedric'h Wöhler";
preg_match_all('/"[^"]+"|[\p{L}\p{N}*.?\\\'-]+/u', $text, $arr, PREG_PATTERN_ORDER);
echo "<pre>";
print_r($arr);
echo "</pre>";
Gives
Array
(
[0] => Array
(
[0] => Friedric'h
[1] => Wöhler
)
)
If you think preg_match_all() is messy, you could take a look at pattern() from the T-Regx library:
$p = '"[^"]+"|[\p{L}\p{N}*.?\\\'-]+'; // automatic delimiters
$text = "Friedric'h Wöhler";
$result = pattern($p)->match($text)->all();
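The ouml token (and the lowercased words) in the question's output hint that the real input may have been HTML-encoded, e.g. W&ouml;hler, before it reached preg_match_all. That is an assumption; the question does not say so. If it holds, one option is to decode the entities first and then apply the /u pattern from the answer. The helper name tokenize() is hypothetical.

```php
<?php
// Assumption: the text may arrive HTML-encoded (e.g. "W&ouml;hler").
// Decode entities to UTF-8 first, then tokenize with the Unicode-aware pattern.
// tokenize() is a hypothetical helper name.
function tokenize(string $text): array
{
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    preg_match_all('/"[^"]+"|[\p{L}\p{N}*.?\\\'-]+/u', $decoded, $arr);
    return $arr[0];
}

print_r(tokenize("Friedric&#39;h W&ouml;hler"));
```

Plain, already-decoded input passes through html_entity_decode unchanged, so the same function works in both cases.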

Parsing attributes in PHP using regular expressions

Suppose I have the string:
$string = 'tag2 display="users" limit="5"';
Using preg_match_all, I need to get the following result.
Required output:
Array
(
[0] => Array
(
[0] => tag2
[1] => tag2
[2] =>
)
[1] => Array
(
[0] => display="users"
[1] => display
[2] => users
)
[2] => Array
(
[0] => limit="5"
[1] => limit
[2] => 5
)
)
I tried the pattern '/([^=\s]+)="([^"]+)"/', but it does not recognize the parameter without a value (tag2 in this case).
Instead, this is what I am getting:
Array
(
[0] => Array
(
[0] => display="users"
[1] => display
[2] => users
)
[1] => Array
(
[0] => limit="5"
[1] => limit
[2] => 5
)
)
What pattern will produce the required output?
EDIT 1: I also need to get attributes whose values are not wrapped in quotes, e.g. attr=val. Sorry for not mentioning this before.
Try this:
<?php
$string = 'tag2 display="users" limit="5"';
// Group 1: attribute name; group 3: quoted value ('' when the attribute has no value)
preg_match_all('/([^=\s]+)(="([^"]+)")?/', $string, $res);
$o = array();
foreach ($res[0] as $r => $v) {
$o[] = array($res[0][$r], $res[1][$r], $res[3][$r]);
}
print_r($o);
?>
It outputs:
Array
(
[0] => Array
(
[0] => tag2
[1] => tag2
[2] =>
)
[1] => Array
(
[0] => display="users"
[1] => display
[2] => users
)
[2] => Array
(
[0] => limit="5"
[1] => limit
[2] => 5
)
)
I don't think you can get exactly what you're looking for with a single call, but this is pretty close:
$string = 'tag2 display="users" limit=5';
preg_match_all('/([^=\s]+)(?:="?([^"]+)"?|())?/', $string, $res, PREG_SET_ORDER);
print_r($res);
Output:
Array
(
[0] => Array
(
[0] => tag2
[1] => tag2
[2] =>
[3] =>
)
[1] => Array
(
[0] => display="users"
[1] => display
[2] => users
)
[2] => Array
(
[0] => limit=5
[1] => limit
[2] => 5
)
)
As you can see, the first element has no value, so I worked around that by offering an empty match in an extra group. This builds the array you were asking for, but with an additional entry for the attribute that has no value.
The main point, however, is the PREG_SET_ORDER flag of preg_match_all. Maybe this output is already good enough for you.
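Building on the PREG_SET_ORDER idea, here is a minimal sketch that also covers EDIT 1 while producing exactly three entries per attribute. It keeps quoted and unquoted values in separate groups, so an unquoted value stops at the next whitespace. The helper name parse_attrs() is hypothetical, and the pattern is a variation, not the one from the answer above.

```php
<?php
// Sketch: name, then optionally ="quoted value" or =unquoted_value.
// parse_attrs() is a hypothetical helper name.
function parse_attrs(string $s): array
{
    preg_match_all('/([^=\s]+)(?:="([^"]*)"|=(\S*))?/', $s, $sets, PREG_SET_ORDER);
    $out = [];
    foreach ($sets as $m) {
        // Group 3 exists only when the unquoted branch matched; otherwise use
        // group 2 (quoted value), or '' when the attribute has no value at all.
        $out[] = [$m[0], $m[1], $m[3] ?? $m[2] ?? ''];
    }
    return $out;
}

print_r(parse_attrs('tag2 display="users" limit=5'));
```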
You might also be interested in this little snippet that parses all sorts of attribute styles. <div class="hello" id=foobar style='display:none'> is valid HTML(5); not pretty, I know.
<?php
$string = '<tag2 display="users" limit="5">';
$attributes = array();
$pattern = "/\s+(?<name>[a-z0-9-]+)=(((?<quotes>['\"])(?<value>.*?)\k<quotes>)|(?<value2>[^'\" ]+))/i";
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
// value2 is present only when the unquoted branch matched (PHP drops trailing unmatched groups)
$attributes[$match['name']] = $match['value2'] ?? $match['value'];
}
var_dump($attributes);
The resulting $attributes array is:
$attributes = array(
'display' => 'users',
'limit' => '5',
);
