trivial regex assistance - php

I have a string as follows:
$str="1-3";
When I pass it through here:
preg_match('#(\\d+)\\s*-\\s*(\\d+)#', $str, $matches);
I get:
$matches[0] //1-3
$matches[1] //1
$matches[2] //3
Now if you pass something like this:
$str="a-3";
You get
$matches //empty
This is correct, since the pattern only accepts integers.
Now my problem is that I want to implement something that works the same way, but for letters.
Here's what I have so far:
$str="a-d";
preg_match('#(\\w+)\\s*-\\s*(\\w+)#', $str, $matches);
I get:
$matches[0] //a-d
$matches[1] //a
$matches[2] //d
This works great; however, if you do this (notice the integer):
$str="a-5";
I get:
$matches[0] //a-5
$matches[1] //a
$matches[2] //5
What I need is to enforce only alphabetic characters in the second regex, so that passing a-5 is treated as an error.
Essentially, I need the behaviour of the first regex applied to the second one, with letters only.

Simply change the capturing groups to ([a-zA-Z]+), like this:
([a-zA-Z]+)\s*-\s*([a-zA-Z]+)
\w matches any alphanumeric character plus the underscore (_). If you only want to match letters, you need to provide the letter ranges explicitly: a-z for lowercase letters and A-Z for uppercase letters. Note the capital Z in A-Z: the range A-z would also match the characters [, \, ], ^, _ and `.
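As a minimal sketch of how this pattern could be wrapped for validation: the helper name parse_letter_range() is hypothetical, and the ^ ... $ anchors are an added assumption that the whole string must be a range (without them, an input such as a-d5 would still match on its a-d part).

```php
<?php
// Hypothetical helper: returns [start, end] for input like "a-d", or null otherwise.
function parse_letter_range(string $str): ?array
{
    // ^ and $ anchor the whole string; the D modifier makes $ match only at the very end.
    if (preg_match('/^([a-zA-Z]+)\s*-\s*([a-zA-Z]+)$/D', $str, $matches) === 1) {
        return [$matches[1], $matches[2]];
    }
    return null;
}

var_dump(parse_letter_range('a-d'));   // array containing "a" and "d"
var_dump(parse_letter_range('a-5'));   // NULL
var_dump(parse_letter_range('a - d')); // array containing "a" and "d"
```

Returning null rather than an empty array makes the error case easy to tell apart from a successful match.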

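You could use the Unicode property \pL, which matches any letter in any language (the u modifier makes PHP treat the pattern and the subject as UTF-8). Note that (\pL) captures a single letter; use (\pL+) if each side may contain several: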
$arr = [
'a-d',
'1-5',
'1-d',
'ç-é',
];
foreach($arr as $str) {
if (preg_match('/(\pL)\s*-\s*(\pL)/u', $str, $matches)) {
print_r($matches);
} else {
echo "$str : error\n";
}
}
Output:
Array
(
[0] => a-d
[1] => a
[2] => d
)
1-5 : error
1-d : error
Array
(
[0] => ç-é
[1] => ç
[2] => é
)

Related

find a specific word in string php

I have some text in PHP stored in the variable $row. I'd like to find the position of a certain word, which is quite easy. What's not so easy is making my code recognize whether the word it has found is exactly the word I'm looking for or just part of a larger word. Is there a way to do it?
Example of what I'd like to obtain
CODE:
$row= "some ugly text of some kind i'd like to find in someway";
$token= "some";
$pos= -1;
$counter= substr_count($row, $token);
for ($h=0; $h<$counter; $h++) {
$pos= strpos($row, $token, $pos+1);
echo $pos.' ';
}
OUTPUT:
What I obtain:
0 18 48
What I'd like to obtain:
0 18
Any hint?

Use preg_match_all() with word boundaries (\b):
$search = preg_quote($token, '/');
preg_match_all("/\b$search\b/", $row, $m, PREG_OFFSET_CAPTURE);
Here, preg_quote() is used to escape the user input so that it can be used safely in our regular expression. Some characters have special meaning in regular expression language; without proper escaping, those characters would be interpreted as regex syntax instead of literal text, and your regex might not work as intended.
In the preg_match_all() statement, we are supplying the following regex:
/\b$search\b/
Explanation:
/ - starting delimiter
\b - word boundary. A word boundary, in most regex dialects, is a position between a word character (\w) and a non-word character (\W).
$search - escaped search term
\b - word boundary
/ - ending delimiter
In plain English, it means: find all occurrences of the given word (here, some) that stand as whole words.
Note that we're also using PREG_OFFSET_CAPTURE flag here. If this flag is passed, for every occurring match the appendant string offset will also be returned. See the documentation for more information.
To obtain the results you want, you can simply loop through the $m array and extract the offsets:
$result = implode(' ', array_map(function($arr) {
return $arr[1];
}, $m[0]));
echo $result;
Output:
0 18
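Putting the pieces above together, a self-contained sketch might look like this. The function name find_word_offsets() is hypothetical; the offsets returned by PREG_OFFSET_CAPTURE are byte offsets.

```php
<?php
// Hypothetical helper: byte offsets of every whole-word occurrence of $token in $text.
function find_word_offsets(string $text, string $token): array
{
    // Escape the token so that any regex metacharacters in it are taken literally.
    $search = preg_quote($token, '/');
    preg_match_all('/\b' . $search . '\b/', $text, $m, PREG_OFFSET_CAPTURE);
    // Each entry of $m[0] is [matched string, offset]; keep only the offsets.
    return array_column($m[0], 1);
}

$row = "some ugly text of some kind i'd like to find in someway";
echo implode(' ', find_word_offsets($row, 'some')), "\n"; // 0 18
```

The occurrence inside someway is skipped because there is no word boundary between some and way.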

What you're looking for is a regex with word boundaries, combined with the flag that returns the offset of each match (PREG_OFFSET_CAPTURE).
PREG_OFFSET_CAPTURE
If this flag is passed, for every occurring match the appendant
string offset will also be returned. Note that this changes the
value of matches into an array where every element is an array
consisting of the matched string at offset 0 and its string offset
into subject at offset 1.
$row= "some ugly text of some kind i'd like to find in someway";
$pattern= "/\bsome\b/i";
preg_match_all($pattern, $row, $matches, PREG_OFFSET_CAPTURE);
And we get something like this:
Array
(
[0] => Array
(
[0] => Array
(
[0] => some
[1] => 0
)
[1] => Array
(
[0] => some
[1] => 18
)
)
)
Then just loop through the matches and extract the offset where the needle was found in the haystack.
// store the positions of the match
$offsets = array();
foreach($matches[0] as $match) {
$offsets[] = $match[1];
}
// display the offsets
echo implode(' ', $offsets);

Use preg_match():
if(preg_match("/some/", $row))
// [..]
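The first argument is a regex, which can match virtually any pattern you need. Note that /some/ on its own also matches inside longer words such as someway, so add word boundaries (/\bsome\b/) to match only the whole word. Also be aware that regexes are widely considered a poor tool for parsing things like HTML.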

preg_match return longest match

I am trying to return a series of numbers between 5 and 9 digits long. I want to be able to get the longest possible match, but unfortunately preg_match just returns the last 5 characters that match.
$string = "foo 123456";
if (preg_match("/.*(\d{5,9}).*/", $string, $match)) {
print_r($match);
};
will yield results
Array
(
[0] => foo 123456
[1] => 23456
)

Since you want only the numbers, you can just remove the .* from the pattern:
$string = "foo 123456";
if (preg_match("/\d{5,9}/", $string, $match)) {
print_r($match);
};
Note that if the input string is "123456789012", then the code will return 123456789 (which is a substring of a longer sequence of digits).
If you don't want to match a sequence of digits that is part of a longer sequence of digits, then you must add some look-arounds:
preg_match("/(?<!\d)\d{5,9}(?!\d)/", $string, $match)
(?<!\d) checks that there is no digit in front of the sequence of digits. (?<!pattern) is zero-width negative look-behind, which means that without consuming text, it checks that looking behind from the current position, there is no match for the pattern.
(?!\d) checks that there is no digit after the sequence of digits. (?!pattern) is zero-width negative look-ahead, which means that without consuming text, it checks that looking ahead from the current position, there is no match for the pattern.
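To see the look-arounds in action, here is a small sketch. The helper name extract_digit_run() and the sample inputs are made up for illustration.

```php
<?php
// Hypothetical helper: returns the first run of 5 to 9 digits that is not
// part of a longer run of digits, or null if there is none.
function extract_digit_run(string $input): ?string
{
    if (preg_match('/(?<!\d)\d{5,9}(?!\d)/', $input, $match) === 1) {
        return $match[0];
    }
    return null;
}

var_dump(extract_digit_run('foo 123456'));   // string "123456"
var_dump(extract_digit_run('123456789012')); // NULL: 12 digits is too long
var_dump(extract_digit_run('abc 1234'));     // NULL: 4 digits is too short
```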

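Use a lazy (non-greedy) quantifier such as .*? at the start. The greedy .* in your pattern consumes as much as it can, leaving only the minimum 5 digits for the capturing group.
<?php
$string = "foo 123456 bar"; // also works with "foo 123456", "123456", etc.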
if (preg_match("/.*?(\d{5,9}).*/", $string, $match)) {
print_r($match);
};
Result:
Array
(
[0] => foo 123456 bar
[1] => 123456
)
For more information: http://en.wikipedia.org/wiki/Regular_expression#Lazy_quantification
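A side-by-side sketch of the two quantifiers on the same input may make the difference clearer (the variable names $greedy and $lazy are only for illustration):

```php
<?php
$string = 'foo 123456 bar';

// Greedy: .* first grabs the whole string, then backtracks only as far as
// needed for \d{5,9} to match, so the group gets the last 5 digits.
preg_match('/.*(\d{5,9}).*/', $string, $greedy);

// Lazy: .*? grows one character at a time, so \d{5,9} starts at the first
// digit and then takes as many digits as it can.
preg_match('/.*?(\d{5,9}).*/', $string, $lazy);

echo $greedy[1], "\n"; // 23456
echo $lazy[1], "\n";   // 123456
```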

PHP Regex pulling text after period and before space

I'm attempting to pull a certain part out of different varying strings, and am having a really hard time getting the correct regex to do so. Here are a few examples of what I am trying to pull from:
AG055.MA - MAGNUM (Want to return just MA)
WI460.16 - SOMETHING (Want to return 16)
AG055.QB (Want to return QB)
So basically, I just want to pull the characters after the period, but before the space. Nothing else before or after. Can someone give me a hand with getting the correct regex?

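This should work:
<?php
// Run the same pattern over each of the example strings
$texts = array( 'AG055.MA - MAGNUM', 'WI460.16 - SOMETHING', 'AG055.QB' );
foreach ( $texts as $text ) {
preg_match( '/\.([^ ]+)/', $text, $matches );
print_r( $matches );
}
?>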
Output:
Array
(
[0] => .MA
[1] => MA
)
Array
(
[0] => .16
[1] => 16
)
Array
(
[0] => .QB
[1] => QB
)
The regex is saying find a . character, then get any characters after it that are not a space character. The + makes it only return matches where there is a non-space character after the dot.

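preg_match('/\w+\.(\w{2})(?:\s|$)/', $input, $matches);
echo $matches[1];
\w+ means 1 or more word characters (a-z, A-Z, 0-9 and the underscore).
\. means a literal period/dot (the backslash escapes it, because an unescaped . is a metacharacter that matches any character).
(\w{2}) matches and captures exactly 2 word characters.
(?:\s|$) means whitespace or the end of the string, so that an input like AG055.QB, with nothing after the code, also matches.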

preg_match('/^[A-Z0-9]{5}\.([A-Z0-9]{2})/', $string, $matches);
var_dump($matches);
Should return the characters in $matches[1].

Get all occurrences of words between curly brackets

I have a text like:
This is a {demo} phrase made for {test}
I need to get
demo
test
Note: My text can have more than one block of {}, not always two. Example:
This is a {demo} phrase made for {test} written in {English}
I used this expression /{([^}]*)}/ with preg_match but it returns only the first word, not all words inside the text.

Use preg_match_all instead:
preg_match_all($pattern, $input, $matches);
It's much the same as preg_match, with the following stipulations:
Searches subject for all matches to the regular expression given in
pattern and puts them in matches in the order specified by flags.
After the first match is found, the subsequent searches are continued
on from end of the last match.

Your expression is correct, but you should be using preg_match_all() instead to retrieve all matches. Here's a working example of what that would look like:
$s = 'This is a {demo} phrase made for {test}';
if (preg_match_all('/{([^}]*)}/', $s, $matches)) {
echo join("\n", $matches[1]);
}
To also capture the position of each match, you can pass PREG_OFFSET_CAPTURE as the fourth parameter to preg_match_all(). Each match then becomes an array holding the matched string and its offset:
if (preg_match_all('/{([^}]*)}/', $s, $matches, PREG_OFFSET_CAPTURE)) {
foreach ($matches[1] as $match) {
echo "{$match[0]} occurs at position {$match[1]}\n";
}
}

Because { and } are part of the regex quantifier syntax (as in \d{3,}), it is safest to escape them when you mean literal braces:
<?php
$text = <<<EOD
this {is} some text {from}
which I {may} want to {extract}
some words {between} brackets.
EOD;
preg_match_all("!\{(\w+)\}!", $text, $matches);
print_r($matches);
?>
produces
Array
(
[0] => Array
(
[0] => {is}
[1] => {from}
[2] => {may}
[3] => {extract}
[4] => {between}
)
... etc ...
)
This example may be helpful to understand the use of curly brackets in regexes:
<?php
$str = 'abc212def3456gh34ij';
preg_match_all("!\d{3,}!", $str, $matches);
print_r($matches);
?>
which returns:
Array
(
[0] => Array
(
[0] => 212
[1] => 3456
)
)
Note that '34' is excluded from the results because the \d{3,} requires a match of at least 3 consecutive digits.

Matching the portions between pairs of braces with a regex is a quick-and-dirty patch. For properly parsing and processing the input string, especially when braces can be nested, a stack is the better tool.
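A minimal sketch of the stack idea, assuming the goal is to collect the text inside every balanced pair of braces, nested ones included. The function name extract_braced() is hypothetical, and unmatched closing braces are simply ignored.

```php
<?php
// Hypothetical helper: returns the text of every {...} group, including nested ones.
function extract_braced(string $text): array
{
    $stack = [];  // offsets of the '{' characters that are still open
    $found = [];
    $length = strlen($text);
    for ($i = 0; $i < $length; $i++) {
        if ($text[$i] === '{') {
            $stack[] = $i;
        } elseif ($text[$i] === '}' && $stack !== []) {
            // Pair this '}' with the most recently opened '{'.
            $start = array_pop($stack);
            $found[] = substr($text, $start + 1, $i - $start - 1);
        }
    }
    return $found;
}

print_r(extract_braced('This is a {demo} phrase made for {test}')); // demo, test
print_r(extract_braced('outer {a {b} c} end'));                     // b, then "a {b} c"
```

Inner groups are reported before the groups that contain them, because a group is recorded at the moment its closing brace is seen.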

Split string on non-alphanumeric characters and on positions between digits and non-digits

I'm trying to split a string on non-alphanumeric delimiting characters AND at the transitions between digits and non-digits. The end result should be a flat array consisting of alphabetic strings and numeric strings.
I'm working in PHP, and would like to use REGEX.
Examples:
ES-3810/24MX should become ['ES', '3810', '24', 'MX']
CISCO1538M should become ['CISCO' , '1538', 'M']
The input can start with either a DIGIT sequence or an ALPHA sequence.
The separators can be non-ALPHA and non-DIGIT chars, as well as a change from a DIGIT sequence to an ALPHA sequence, and vice versa.

The function to match all occurrences of a regex is preg_match_all(), which outputs a multidimensional array of results. The regex is very simple: any digit ([0-9]) one or more times (+), or (|) any letter ([[:upper:][:lower:]]) one or more times (+). The POSIX classes [:upper:] and [:lower:] cover all uppercase and lowercase letters; avoid the range [A-z], which also matches the punctuation characters that sit between Z and a in ASCII.
The textarea and php tags are included for convenience, so you can drop this into your PHP file and see the results.
<textarea style="width:400px; height:400px;">
<?php
foreach( array(
"ES-3810/24MX",
"CISCO1538M",
"123ABC-ThatsHowEasy"
) as $string ){
// get all matches into an array
preg_match_all("/[0-9]+|[[:upper:][:lower:]]+/",$string,$matches);
// it is the 0th match that you are interested in...
print_r( $matches[0] );
}
?>
</textarea>
Which outputs in the textarea:
Array
(
[0] => ES
[1] => 3810
[2] => 24
[3] => MX
)
Array
(
[0] => CISCO
[1] => 1538
[2] => M
)
Array
(
[0] => 123
[1] => ABC
[2] => ThatsHowEasy
)

$str = "ES-3810/24MX35 123 TEST 34/TEST";
$str = preg_replace(array("#[^A-Z0-9]+#i","#\s+#","#([A-Z])([0-9])#i","#([0-9])([A-Z])#i"),array(" "," ","$1 $2","$1 $2"),$str);
echo $str;
$data = explode(" ",$str);
print_r($data);
I could not think of a cleaner way.

The most direct preg_ function to produce the desired flat output array is preg_split().
Because it doesn't matter what combination of alphanumeric characters are on either side of a sequence of non-alphanumeric characters, you can greedily split on non-alphanumeric substrings without "looking around".
After that preliminary obstacle is dealt with, then split on the zero-length positions between a digit and a non-digit OR between a non-digit and a digit.
/ #starting delimiter
[^a-z\d]+ #match one or more non-alphanumeric characters
| #OR
\d\K(?=\D) #match a number, then forget it, then lookahead for a non-number
| #OR
\D\K(?=\d) #match a non-number, then forget it, then lookahead for a number
/ #ending delimiter
i #case-insensitive flag
Code:
var_export(
preg_split('/[^a-z\d]+|\d\K(?=\D)|\D\K(?=\d)/i', $string, 0, PREG_SPLIT_NO_EMPTY)
);
preg_match_all() isn't a silly technique, but it doesn't return the array: it returns the number of matches and fills a reference variable with a two-dimensional array, of which the first element needs to be accessed. Admittedly, the pattern is shorter and easier to follow.
var_export(
preg_match_all('/[a-z]+|\d+/i', $string, $m) ? $m[0] : []
);
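For completeness, a runnable sketch that applies the preg_split() pattern above to the two sample inputs from the question. The wrapper name split_alnum() is hypothetical.

```php
<?php
// Hypothetical wrapper around the preg_split() call shown above.
function split_alnum(string $string): array
{
    return preg_split(
        '/[^a-z\d]+|\d\K(?=\D)|\D\K(?=\d)/i',
        $string,
        0,
        PREG_SPLIT_NO_EMPTY // drop the empty pieces left between adjacent split points
    );
}

foreach (['ES-3810/24MX', 'CISCO1538M'] as $string) {
    echo implode(', ', split_alnum($string)), "\n";
}
// ES, 3810, 24, MX
// CISCO, 1538, M
```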
