There is a string variable containing dotted number data, say $x = "OP/1.1.2/DIR";. The user may change the position of the number data at any time by modifying it inside the application, and the slash may be replaced by any other character, but the dotted number data is always present. How can I extract the dotted number data, here 1.1.2, from the string?
Use a regular expression:
(\d+(?:\.\d+)*)
Breakdown:
\d+ look for one or more digits
\. a literal decimal . character
\d+ followed by one or more digits again
(...)* this means match 0 or more occurrences of this pattern
(?:...) this tells the engine not to create a backreference for this group (basically, we don't use the reference, so it's pointless to have one)
You haven't given much information about the data, so I've made the following assumptions:
The data will always contain at least one number
The data may contain only a number without a dot
The data may contain multi-digit numbers
The numbers themselves may contain any number of dot/digit pairs
If any of these assumptions are incorrect, you'll have to modify the regular expression.
Example usage:
$x = "OP/1.1.2/DIR";
if (!preg_match('/(\d+(?:\.\d+)*)/', $x, $matches)) {
    // Could not find a matching number in the data - handle this appropriately
} else {
    var_dump($matches[1]); // string(5) "1.1.2"
}
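A minimal sketch wrapping the same pattern in a helper; the function name extractDottedNumber and the sample strings are hypothetical. It reads $matches[0] (the whole match), so the outer capture group is not needed, and it shows that the separator and position do not matter:

```php
<?php
// Hypothetical helper: returns the first dotted number in $s, or null if none.
function extractDottedNumber(string $s): ?string
{
    // Digits followed by any number of ".digits" pairs, as in the pattern above.
    if (preg_match('/\d+(?:\.\d+)*/', $s, $m)) {
        return $m[0];
    }
    return null;
}

foreach (['OP/1.1.2/DIR', 'DIR-10.20.3-OP', '1.1.2|OP|DIR', 'OP/DIR'] as $s) {
    var_dump(extractDottedNumber($s));
}
```

The first run of digits wins, so a stray digit earlier in the string (e.g. OP2/1.1.2) would be returned instead. If the number always contains a dot, changing the final * to + makes the pattern require at least one ".digits" pair.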
Related
How can I find numbers inside certain strings in php?
For example, given this text inside a page, I would like to find the numbers in
|||12345|||
or
|||354|||
I'm interested in the numbers; they always change according to the page I visit (the numbers are the page id and are 3 to 5 characters long).
So the only thing I know for sure is those pipes surrounding the numbers.
Thanks in advance.
Using this regex, \|\|\|\K\d{3,5}(?=\|\|\|), gives several advantages.
https://regex101.com/r/LtbKfM/1
First, three literals without a quantifier can be matched with a simple
strncmp() C call. Also, any time a regex starts with an assertion it is
inherently slower. Therefore, this is the fastest way to match the 3
leading pipe symbols.
Second, using the \K construct excludes whatever was previously
matched from group 0. We don't want to get the 3 pipes in the
match, but we do want to match them.
Edit:
Note that capture group results are not stored in a special string
buffer.
Each group is really a pointer (or offset) and a length.
The pointer (or offset) points to somewhere in the source string.
When a particular group string is extracted (e.g. matches[#]), the
pointer (or offset) and length are used to create and return a string instance.
Using the \K construct simply sets the group 0 pointer (or offset)
to the current position in the string, so group 0 starts with whatever
is matched after \K.
Third, using a lookahead assertion for the 3 trailing pipe symbols does
not consume them, so they remain available for the next match. For example,
|||999|||888||| would get 2 matches, as would
|||999|||||888|||.
The result is an array of just the numbers.
Formatted
\|\|\| # 3 pipe symbols
\K # Exclude previous items from the match (group 0)
\d{3,5} # 3-5 digits
(?= \|\|\| ) # Assertion, not consumed, 3 pipe symbols ahead
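The formatted version above can be used as-is with PHP's x (extended) modifier, which ignores whitespace and # comments inside the pattern. A minimal sketch (findIds is a hypothetical helper name):

```php
<?php
// The formatted pattern, compiled with the x modifier so that whitespace
// and # comments inside it are ignored.
$pattern = '~
    \|\|\|        # 3 pipe symbols
    \K            # exclude previous items from the match (group 0)
    \d{3,5}       # 3-5 digits
    (?= \|\|\| )  # assertion, not consumed: 3 pipe symbols ahead
~x';

// Hypothetical helper: returns every id found in $text.
function findIds(string $text, string $pattern): array
{
    preg_match_all($pattern, $text, $matches);
    // Because of \K, group 0 holds only the digits; no capture group is needed.
    return $matches[0];
}

print_r(findIds('|||999|||888|||', $pattern));   // both ids
print_r(findIds('|||999|||||888|||', $pattern)); // both ids
```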
While S.Kablar's suggestion is perfectly valid, it uses syntax that may be difficult for a beginner.
The more casual way to achieve your goal would be as follows:
$text = 'your input string';
if (preg_match_all('~\|{3}(\d+)\|{3}~', $text, $matches)) {
    foreach ($matches[1] as $number) {
        var_dump($number); // prints something like string(3) "345"
    }
}
The breakdown of the regex:
~ and ~ surround the expression
\| stands for the pipe, which is a special character in regex and must be escaped with a backslash
{3} says 'the previous (the pipe) must be present exactly three times'
( and ) enclose a subpattern so that it is stored under $matches[1]
\d requires a digit
+ says 'the previous (a digit) may be repeated but must have at least one instance'
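A minimal sketch of this approach as a function (findNumbers is a hypothetical name). Because this pattern consumes the closing pipes, two ids that share a pipe run, as in |||999|||888|||, yield only the first one; it also accepts any number of digits rather than just 3 to 5:

```php
<?php
// Hypothetical helper: collects every run of digits wrapped in ||| ... |||.
function findNumbers(string $text): array
{
    if (preg_match_all('~\|{3}(\d+)\|{3}~', $text, $matches)) {
        return $matches[1]; // subpattern 1 holds the digits only
    }
    return [];
}

print_r(findNumbers('foo |||12345||| bar |||354|||'));
print_r(findNumbers('|||999|||888|||')); // the pipes after 999 are consumed
```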
I have a pattern like this:
[X number of digits][c][32 characters (md5)][X]
/* Examples:
2 c jg3j2kf290e8ghnaje48grlrpas0942g 65
5 c kdjeuw84398fj02i397hf4343i013g44 94824
1 c pokdk94jf0934nf0932mf3923249f3j3 3
*/
Note: the spaces in these examples do not exist in the real string.
I need to divide such a string into four parts:
// based on first example
$the_number_of_digits = 2
$separator = c // this is constant
$hashed_string = jg3j2kf290e8ghnaje48grlrpas0942g
$number = 65
How can I do that?
Here is what I've tried so far:
/^(\d+)(c)(\w{32})/
Online Demo
My pattern cannot get last part.
EDIT: I don't want to simply take the remaining digits as the last part. I need an algorithm based on the number at the beginning of the string,
because my string may look like this:
2 c 65 jg3j2kf290e8ghnaje48grlrpas0942g
This regex uses named groups to access the results:
(?<numDigits>\d+)(?<separator>c)(?<hashedString>\w{32})(?<number>\d+)
Edit (from RocketHazmat's helpful comments): since the OP also wants to validate that "number" has the number of digits given by "numDigits",
use the regex provided, then validate the length of number in PHP:
if (strlen($matches['number']) == $matches['numDigits'])
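A minimal sketch combining the named-group regex with that length check. The helper name parseCode is hypothetical, and anchoring the pattern with ^ and $ is an added assumption that the whole string must match:

```php
<?php
// Hypothetical helper: splits a code such as "2cjg3j...942g65" into its parts,
// returning null if it does not match or the digit count is wrong.
function parseCode(string $code): ?array
{
    $re = '/^(?<numDigits>\d+)(?<separator>c)(?<hashedString>\w{32})(?<number>\d+)$/';
    if (!preg_match($re, $code, $m)) {
        return null;
    }
    // Validation step: "number" must have exactly numDigits digits.
    if (strlen($m['number']) != $m['numDigits']) {
        return null;
    }
    return [$m['numDigits'], $m['separator'], $m['hashedString'], $m['number']];
}

print_r(parseCode('2cjg3j2kf290e8ghnaje48grlrpas0942g65'));
var_dump(parseCode('2cjg3j2kf290e8ghnaje48grlrpas0942g654')); // 3 digits, not 2
```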
The fact that one match drives the length of another match suggests that you will need something a bit more complicated than a single expression. However, it need not be that much more complicated: sscanf was designed for this kind of job:
sscanf($code, '%dc%32s%n', $length, $md5, $width);
$number = substr($code, $width, $length);
Live example.
The trick here is that sscanf gives you the width of the string (%n) at exactly the point you need to start cutting, as well as the length (from the first %d), so you have everything you need to do simple string cuts.
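A minimal sketch wrapping that sscanf() call in a function (splitCode is a hypothetical name). Because the trailing number is cut by length rather than matched greedily, anything after it is simply ignored:

```php
<?php
// Hypothetical helper around the sscanf() approach above.
function splitCode(string $code): ?array
{
    $length = $md5 = $width = null;
    // %d reads the digit count, "c" must match literally, %32s reads up to 32
    // non-whitespace characters, and %n stores how many characters were consumed.
    sscanf($code, '%dc%32s%n', $length, $md5, $width);
    if (!is_int($length) || !is_string($md5) || strlen($md5) !== 32 || !is_int($width)) {
        return null; // the code did not have the expected shape
    }
    // Cut exactly $length characters right after the hash.
    return [$length, $md5, substr($code, $width, $length)];
}

print_r(splitCode('2cjg3j2kf290e8ghnaje48grlrpas0942g65'));
print_r(splitCode('2cjg3j2kf290e8ghnaje48grlrpas0942g6599')); // extra digits ignored
```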
Add (\d+) to the end, like you have in the beginning.
/^(\d+)(c)(\w{32})(\d+)/
/(\d)(c)([[:alnum:]]{32})(\d+)/
preg_match('/(\d)(c)([[:alnum:]]{32})(\d+)/', $string, $matches);
$the_number_of_digits = $matches[1];
$separator = $matches[2];
$hashed_string = $matches[3];
$number = $matches[4];
Then, to check whether the string length of $number is equal to $the_number_of_digits, you can use strlen, i.e.:
if (strlen($number) == $the_number_of_digits) {
    // the number has the expected number of digits
}
The main difference from other answers is the use of [[:alnum:]]: unlike \w, it won't match _.
[:alnum:]
Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’
locale and ASCII character encoding, this is the same as
‘[0-9A-Za-z]’.
http://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html
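A quick way to see that difference (classify is a hypothetical helper): with [[:alnum:]]{32}, an underscore cannot slip into the hash part.

```php
<?php
// Hypothetical helper: does $ch match \w and/or [[:alnum:]]?
function classify(string $ch): array
{
    return [
        'w'     => preg_match('/^\w$/', $ch) === 1,
        'alnum' => preg_match('/^[[:alnum:]]$/', $ch) === 1,
    ];
}

foreach (['a', '7', '_'] as $ch) {
    var_dump($ch, classify($ch));
}
```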
Regex101 Demo
Ideone Demo
Regex Explanation:
(\d)(c)([[:alnum:]]{32})(\d+)
Match the regex below and capture its match into backreference number 1 «(\d)»
Match a single digit 0..9 «\d»
Match the regex below and capture its match into backreference number 2 «(c)»
Match the character “c” literally (case sensitive) «c»
Match the regex below and capture its match into backreference number 3 «([[:alnum:]]{32})»
Match a character from the POSIX character class “alnum” (letters and digits; [0-9A-Za-z] in the C locale) «[[:alnum:]]{32}»
Exactly 32 times «{32}»
Match the regex below and capture its match into backreference number 4 «(\d+)»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
I have a String (filename): s_113_2.3gp
How can I extract the number that appears after the second underscore? In this case it's '2', but in some cases it can be a multi-digit number.
Also, the number of digits after the first underscore can vary, so the length of this string is not constant.
You can use a capturing group:
preg_match('/_(\d+)\.\w+$/', $str, $matches);
$number = $matches[1];
\d+ represents 1 or more digits. The parentheses around that capture it, so you can later retrieve it with $matches[1]. The . needs to be escaped, because otherwise it would match any character but line breaks. \w+ matches 1 or more word characters (digits, letters, underscores). And finally the $ represents the end of the string and "anchors" the regular expression (otherwise you would get problems with strings containing multiple .).
This also allows for arbitrary file extensions.
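A minimal sketch of the capturing-group version as a reusable function; the name numberBeforeExtension and the extra filenames are hypothetical:

```php
<?php
// Hypothetical helper: the digits between an underscore and the file extension.
function numberBeforeExtension(string $filename): ?string
{
    if (preg_match('/_(\d+)\.\w+$/', $filename, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(numberBeforeExtension('s_113_2.3gp'));   // a single digit
var_dump(numberBeforeExtension('s_7_1234.mp4'));  // a multi-digit number
var_dump(numberBeforeExtension('s_113_x.3gp'));   // no digits there
```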
As Ωmega pointed out below there is another possibility, that does not use a capturing group. With the concept of lookarounds, you can avoid matching _ at the start and the \.\w+$ at the end:
preg_match('/(?<=_)\d+(?=\.\w+$)/', $str, $matches);
$number = $matches[0];
However, I would recommend profiling before applying this rather small optimization. But it is something to keep in mind (or rather, to read up on!).
Using regex lookaround it is very short code:
$n = preg_match('/(?<=_)\d+(?=\.)/', $str, $m) ? $m[0] : "";
...which reads: find one or more digits \d+ that are between underscore (?<=_) and period (?=\.)
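The two lookaround variants differ when a filename contains more than one period. A small comparison (the helper name and the second filename are made up for illustration):

```php
<?php
// Hypothetical helper: first match of $pattern in $name, or '' if none.
function lookaroundMatch(string $pattern, string $name): string
{
    return preg_match($pattern, $name, $m) ? $m[0] : '';
}

$endAnchored = '/(?<=_)\d+(?=\.\w+$)/'; // digits followed by the final extension
$shorter     = '/(?<=_)\d+(?=\.)/';     // digits followed by any period

foreach (['s_113_2.3gp', 's_113_2.tar.gz'] as $name) {
    printf(
        "%s: endAnchored='%s' shorter='%s'\n",
        $name,
        lookaroundMatch($endAnchored, $name),
        lookaroundMatch($shorter, $name)
    );
}
```

For s_113_2.tar.gz the end-anchored pattern finds nothing, because .tar is not the last extension, while the shorter one still returns 2.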
I use this preg_match condition for matching positive, negative and decimal values:
/^[0-9,-\.]{1,50}$/
But when I enter --34.000 it does not show an error, and when I enter 34...9868 it does not show an error either.
What I want is for it to accept only positive, negative and decimal values.
You'd be better off using something like is_numeric() if you need to check whether it's a number.
And your regex is badly broken: as it stands, it would accept even a string consisting of 50 dots.
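A quick check of is_numeric() against the inputs from the question plus a few valid values (the list is illustrative). Note that it also accepts forms such as 1e5 and .5, which may or may not be what you want:

```php
<?php
// is_numeric() on the problem inputs and some valid numbers.
foreach (['34', '-34.5', '.5', '--34.000', '34...9868', '1e5'] as $input) {
    printf("%-10s %s\n", $input, is_numeric($input) ? 'numeric' : 'not numeric');
}
```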
As yes123 stated, there are better ways to detect if a given input string is a numeric value. If you'd like to stick to regular expressions, the following might be OK for you:
/^-?[0-9]+(?:\.[0-9]+)?$/
Explanation:
match start of the string ( ^)
match a possible - character (-?); the ? means "not required"
match at least one digit ([0-9]+)
optionally match the whole group in the parentheses ((?:...)?); ?: means "do not capture the subpattern"
a decimal point (\.); the . needs to be escaped due to its special meaning
at least one digit ([0-9]+)
match end of the string ($)
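A minimal sketch wrapping this pattern in a validator (isSignedDecimal is a hypothetical name) and running it on the inputs from the question:

```php
<?php
// Hypothetical wrapper around the pattern explained above.
function isSignedDecimal(string $input): bool
{
    return preg_match('/^-?[0-9]+(?:\.[0-9]+)?$/', $input) === 1;
}

foreach (['34', '-34.000', '--34.000', '34...9868', '.5', '34.'] as $input) {
    printf("%-10s %s\n", $input, isSignedDecimal($input) ? 'accepted' : 'rejected');
}
```

One subtlety: $ also matches just before a final newline, so "34\n" is accepted; adding the D modifier (or using \z instead of $) closes that gap.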
You need to split up your regular expression so that it only accepts the characters in the right places. For example:
/^[+\-]?([0-9]+,)*[0-9]+(\.[0-9]+)?$/
To explain this expression:
[+\-]?: This checks for a + or - prefix to the number. It's completely optional, but can only be a + or -.
([0-9]+,)*: This allows an optional set of comma-delimited digit groups, intended for thousands, millions etc. (note that it does not enforce groups of exactly three digits).
[0-9]+: This requires that the value contains at least one digit.
(\.[0-9]+)?: Finally, this allows an optional decimal point with trailing numbers.
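A minimal sketch of this pattern as a validator (isGroupedNumber is a hypothetical name); the 1,2,3 case shows that group sizes are not checked:

```php
<?php
// Hypothetical wrapper around the comma-aware pattern above.
function isGroupedNumber(string $input): bool
{
    return preg_match('/^[+\-]?([0-9]+,)*[0-9]+(\.[0-9]+)?$/', $input) === 1;
}

foreach (['1,234.56', '+5', '-0.25', '1,2,3', '--34', '34...9868'] as $input) {
    printf("%-10s %s\n", $input, isGroupedNumber($input) ? 'accepted' : 'rejected');
}
```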
Try this regex: ^-?\d*\.?\d+$
On its own, however, it does not limit the input to 50 characters.
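If the 50-character limit from the question is still wanted, one option (an addition, not part of the original answer) is a lookahead at the start that checks the total length before the number itself is matched. A sketch, with isLimitedNumber as a hypothetical name:

```php
<?php
// ^(?=.{1,50}$) checks the overall length without consuming anything;
// the rest is the pattern from the answer above.
function isLimitedNumber(string $input): bool
{
    return preg_match('/^(?=.{1,50}$)-?\d*\.?\d+$/', $input) === 1;
}

var_dump(isLimitedNumber('-34.5'));
var_dump(isLimitedNumber(str_repeat('1', 50)));
var_dump(isLimitedNumber(str_repeat('1', 51)));
```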
I have a large string (multiple lines) in which I need to find numbers with a regex. The number I need is always preceded and followed by an exact sequence of characters, so I can use non-capturing groups to pinpoint it. I put together a regex to get this number, but it refuses to work and I can't figure out why!
Below is a small bit of PHP code, which I can't get to work, showing the basic format of what I need:
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$sNumberStripRE = '/.*?(?:sjdhfklsjaf<\\?kjnsdfh)(\\d+)(?:uihrfkjsn\\+%5Bmlknsadlfjncas).*?/gim';
if (preg_match_all($sNumberStripRE, $sTestData, $aMatches))
{
var_dump($aMatches);
}
The number I need is 461, and the characters between the spaces and the number on either side are always the same.
Any help getting the above regex working would be great!
This link, RegExr: My Reg Ex (to an online regex generator and my regex), shows that it should work!
g is an invalid modifier, drop it.
Ideone Link
With regard to that link, which regular expression engine is it using? It is built in Flex, so probably the ActionScript RegExp engine. Regex engines are not all the same; each one varies.
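You have a number of double backslashes; they can be single in those strings (in a single-quoted PHP string, \\d and \d both produce \d, so this is a readability fix rather than the cause of the failure).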
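A minimal sketch of the question's code with both suggestions applied: the g modifier is dropped (preg_match_all() already finds every match) and single backslashes are used. The pattern is otherwise unchanged:

```php
<?php
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
// Same pattern as in the question, without the invalid g modifier.
$sNumberStripRE = '/.*?(?:sjdhfklsjaf<\?kjnsdfh)(\d+)(?:uihrfkjsn\+%5Bmlknsadlfjncas).*?/im';

if (preg_match_all($sNumberStripRE, $sTestData, $aMatches)) {
    var_dump($aMatches[1]); // the captured numbers
}
```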
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$lDelim = ' sjdhfklsjaf<?kjnsdfh';
$rDelim = 'uihrfkjsn+%5Bmlknsadlfjncas ';
$start = strpos($sTestData, $lDelim) + strlen($lDelim);
$length = strpos($sTestData, $rDelim) - $start;
$number = substr($sTestData, $start, $length);
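strpos() returns false when a delimiter is missing, and false then silently behaves like 0 in the arithmetic above, producing a wrong substring instead of an error. A hedged variant with explicit checks (between is a hypothetical helper name); it also searches for the right delimiter only after the left one:

```php
<?php
// Hypothetical helper: the text between $left and $right, or null if either
// delimiter is missing.
function between(string $s, string $left, string $right): ?string
{
    $l = strpos($s, $left);
    if ($l === false) {
        return null;
    }
    $start = $l + strlen($left);
    $r = strpos($s, $right, $start); // look for $right only after $left
    if ($r === false) {
        return null;
    }
    return substr($s, $start, $r - $start);
}

$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
var_dump(between($sTestData, ' sjdhfklsjaf<?kjnsdfh', 'uihrfkjsn+%5Bmlknsadlfjncas '));
```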
Using regex you can accomplish your goal with the following code:
$string='lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
if (preg_match('/(sjdhfklsjaf<\?kjnsdfh)(\d+)(uihrfkjsn\+%5Bmlknsadlfjncas)/', $string, $num_array)) {
$aMatches = $num_array[2];
} else {
$aMatches = "";
}
echo $aMatches;
Explanation:
I declared a variable named $string and set it to the string you initially presented. You indicated that the characters on either side of the numeric value of interest are always the same. The parentheses in the regex create 3 capture groups: backreference 1 contains the characters before the number, backreference 2 contains the number you want, and backreference 3 contains the characters after the number. preg_match stores these in $num_array, so $num_array[2] holds backreference 2 (the number), which I assign to $aMatches. Likewise, $num_array[1] holds backreference 1 and $num_array[3] holds backreference 3.
Here is the explanation of my regular expression:
Match the regular expression below and capture its match into backreference number 1 «(sjdhfklsjaf<\?kjnsdfh)»
Match the characters “sjdhfklsjaf<” literally «sjdhfklsjaf<»
Match the character “?” literally «\?»
Match the characters “kjnsdfh” literally «kjnsdfh»
Match the regular expression below and capture its match into backreference number 2 «(\d+)»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below and capture its match into backreference number 3 «(uihrfkjsn\+%5Bmlknsadlfjncas)»
Match the characters “uihrfkjsn” literally «uihrfkjsn»
Match the character “+” literally «\+»
Match the characters “%5Bmlknsadlfjncas” literally «%5Bmlknsadlfjncas»
Hope this helps and best of luck to you.
Steve