How can I take apart a string? - php

I have a pattern like this:
[X number of digits][c][32 characters (md5)][X]
/* Examples:
2 c jg3j2kf290e8ghnaje48grlrpas0942g 65
5 c kdjeuw84398fj02i397hf4343i013g44 94824
1 c pokdk94jf0934nf0932mf3923249f3j3 3
*/
Note: The spaces in these examples don't exist in the real string.
I need to divide such a string into four parts:
// based on first example
$the_number_of_digits = 2;
$separator = 'c'; // this is constant
$hashed_string = 'jg3j2kf290e8ghnaje48grlrpas0942g';
$number = 65;
How can I do that?
Here is what I've tried so far:
/^(\d+)(c)(\w{32})/
My pattern cannot capture the last part.
EDIT: I don't want to simply select the rest of the digits as the last part. I need an algorithm based on the number at the beginning of the string.
That's because my string might look like this:
2 c 65 jg3j2kf290e8ghnaje48grlrpas0942g

This regex uses named groups to access the results:
(?<numDigits>\d+) (?<separator>c) (?<hashedString>\w{32}) (?<number>\d+)
Edit (from @RocketHazmat's helpful comments): since the OP also wants to validate that "number" has the number of digits given by "numDigits", use the regex provided, then validate the length of number in PHP:
if (strlen($matches['number']) == $matches['numDigits'])
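As a minimal sketch of the two steps described above (match with named groups, then validate the length in PHP), the following may help. The function name split_code is hypothetical, and the pattern is written without spaces and with ^ and $ anchors on the assumption that, as the question states, the real string contains no spaces.

```php
<?php
// Hypothetical helper (not from the answer): splits "<n>c<32 chars><number>"
// and rejects the input when <number> does not have exactly <n> digits.
function split_code(string $code): ?array
{
    // The answer's named groups, without spaces and anchored (assumption).
    $pattern = '/^(?<numDigits>\d+)(?<separator>c)(?<hashedString>\w{32})(?<number>\d+)$/';
    if (!preg_match($pattern, $code, $matches)) {
        return null; // overall shape does not match
    }
    if (strlen($matches['number']) !== (int) $matches['numDigits']) {
        return null; // digit count disagrees with the leading number
    }
    return [
        'numDigits'    => (int) $matches['numDigits'],
        'separator'    => $matches['separator'],
        'hashedString' => $matches['hashedString'],
        'number'       => $matches['number'],
    ];
}

$parts = split_code('2cjg3j2kf290e8ghnaje48grlrpas0942g65');
```

Returning null for both failure cases keeps the caller's check simple; a stricter version could distinguish "wrong shape" from "wrong digit count".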

The fact that one match drives the length of another match suggests that you will need something a bit more complicated than a single expression. However, it need not be that much more complicated: sscanf was designed for this kind of job:
sscanf($code, '%dc%32s%n', $length, $md5, $width);
$number = substr($code, $width, $length);
The trick here is that sscanf gives you the width of the string (%n) at exactly the point you need to start cutting, as well as the length (from the first %d), so you have everything you need to do simple string cuts.
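To make the sscanf approach concrete, here is a small sketch that wraps the two lines above in a function and runs them on the first example from the question (written without spaces). The function name split_with_sscanf is hypothetical; the format string and the substr call are the ones from this answer.

```php
<?php
// Hypothetical wrapper around the answer's sscanf + substr lines.
function split_with_sscanf(string $code): array
{
    // %d  -> leading digit count, "c" -> literal separator,
    // %32s -> the 32-character hash, %n -> characters consumed so far.
    sscanf($code, '%dc%32s%n', $length, $md5, $width);
    // Cut exactly $length characters, starting right after the hash.
    $number = substr($code, $width, $length);
    return [$length, $md5, $number];
}

// Trailing characters beyond the announced length are ignored by the cut.
list($length, $md5, $number) = split_with_sscanf('2cjg3j2kf290e8ghnaje48grlrpas0942g6599');
```

Because the cut is driven by the leading count rather than by "the rest of the digits", the last part has exactly the announced length (or fewer characters if the string is too short, which a caller may want to check).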

Add (\d+) to the end, like you have in the beginning.
/^(\d+)(c)(\w{32})(\d+)/

/(\d)(c)([[:alnum:]]{32})(\d+)/
preg_match('/(\d)(c)([[:alnum:]]{32})(\d+)/', $string, $matches);
$the_number_of_digits = $matches[1];
$separator = $matches[2];
$hashed_string = $matches[3];
$number = $matches[4];
Then, to check if the string length of $number is equal to $the_number_of_digits, you can use strlen, i.e.:
if(strlen($number) == $the_number_of_digits){
}
The main difference from the other answers is the use of [[:alnum:]], which, unlike \w, won't match _.
[:alnum:]
Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’
locale and ASCII character encoding, this is the same as
‘[0-9A-Za-z]’.
http://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html
Regex Explanation:
(\d)(c)([[:alnum:]]{32})(\d+)
Match the regex below and capture its match into backreference number 1 «(\d)»
Match a single character that is a “digit” (by default in PHP, without the u modifier, the ASCII digits 0-9) «\d»
Match the regex below and capture its match into backreference number 2 «(c)»
Match the character “c” literally (case sensitive, since no i modifier is used) «c»
Match the regex below and capture its match into backreference number 3 «([[:alnum:]]{32})»
Match a character from the **POSIX** character class “alnum” (by default in PHP, without the u modifier, ASCII letters and digits) «[[:alnum:]]{32}»
Exactly 32 times «{32}»
Match the regex below and capture its match into backreference number 4 «(\d+)»
Match a single character that is a “digit” (by default in PHP, without the u modifier, the ASCII digits 0-9) «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

Related

how to extract a certain digit from a String using regular expression in php?

I have a String (filename): s_113_2.3gp
How can I extract the number that appears after the second underscore? In this case it's '2' but in some cases that can be a few digits number.
Also the number of digits that appears after the first underscore can vary so the length of this String is not constant.
You can use a capturing group:
preg_match('/_(\d+)\.\w+$/', $str, $matches);
$number = $matches[1];
\d+ represents 1 or more digits. The parentheses around that capture it, so you can later retrieve it with $matches[1]. The . needs to be escaped, because otherwise it would match any character but line breaks. \w+ matches 1 or more word characters (digits, letters, underscores). And finally the $ represents the end of the string and "anchors" the regular expression (otherwise you would get problems with strings containing multiple .).
This also allows for arbitrary file extensions.
As Ωmega pointed out below, there is another possibility that does not use a capturing group. With lookarounds, you can avoid matching the _ at the start and the \.\w+$ at the end:
preg_match('/(?<=_)\d+(?=\.\w+$)/', $str, $matches);
$number = $matches[0];
However, I would recommend profiling before applying this rather small optimization. But it is something to keep in mind (or rather, to read up on!).
With regex lookarounds, the code is very short:
$n = preg_match('/(?<=_)\d+(?=\.)/', $str, $m) ? $m[0] : "";
...which reads: find one or more digits \d+ that are between underscore (?<=_) and period (?=\.)
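As a quick check of the answers above, this sketch wraps the capturing-group pattern and the lookaround pattern in two functions and applies them to the filename from the question. The function names are hypothetical; the patterns are the ones quoted in the answers.

```php
<?php
// Capturing-group variant: the digits end up in $m[1].
function trailing_number(string $filename): string
{
    return preg_match('/_(\d+)\.\w+$/', $filename, $m) ? $m[1] : '';
}

// Lookaround variant: only the digits are matched, so they are in $m[0].
function trailing_number_lookaround(string $filename): string
{
    return preg_match('/(?<=_)\d+(?=\.\w+$)/', $filename, $m) ? $m[0] : '';
}

$a = trailing_number('s_113_2.3gp');
$b = trailing_number_lookaround('s_113_2.3gp');
```

Both functions return an empty string when no number sits between the last underscore and the extension.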

regex to find number of specific length, but with any character except a number before or after it

I'm trying to work out a regex pattern to search a string for a 12 digit number. The number could have any number of other characters (but not numbers) in front or behind the one I am looking for.
So far I have /([0-9]{12})/, which finds 12-digit numbers correctly; however, it will also match part of a 13-digit number in the string.
The pattern should match 123456789012 in the following strings:
"rgergiu123456789012ergewrg"
"123456789012"
"#123456789012"
"ergerg ergerwg erwgewrg \n rgergewrgrewg regewrge 123456789012 ergwerg"
It should match nothing in these strings:
"123456789012000"
"egjkrgkergr 123123456789012"
What you want are look-arounds. Something like:
/(?<![0-9])[0-9]{12}(?![0-9])/
A negative lookbehind or lookahead succeeds only if the given pattern does not appear immediately before or after the current position, and it does so without consuming any characters. So this pattern matches 12 digits only if they are neither preceded nor followed by another digit, and the characters before and after the number are not part of the match.
/\D(\d{12})\D/ (in which case, the number will be capture index 1)
Edit: Whoops, that one doesn't work if the number is the entire string. Use the one below instead.
Or, with negative look-behind and look-ahead: /(?<!\d)\d{12}(?!\d)/ (where the number will be capture index 0)
if( preg_match("/(?<!\d)\d{12}(?!\d)/", $string, $matches) ) {
    $number = $matches[0];
    # ....
}
where $string is the text you're testing
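The sample strings from the question make a convenient check for the lookaround pattern. The sketch below wraps it in a hypothetical function, find_12_digit, which returns the number or null when there is no standalone 12-digit run.

```php
<?php
// Hypothetical helper: returns the standalone 12-digit number, or null.
function find_12_digit(string $string): ?string
{
    // (?<!\d) and (?!\d) reject runs that are part of a longer number.
    return preg_match('/(?<!\d)\d{12}(?!\d)/', $string, $matches) ? $matches[0] : null;
}

$hit  = find_12_digit('rgergiu123456789012ergewrg');
$miss = find_12_digit('123456789012000');
```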

preg_match string

Can someone explain the meaning of this pattern to me?
preg_match('/^(d{1,2}([a-z]+))(?:s*)S (?=200[0-9])/', '21st March 2006', $matches);
So correct me if I'm wrong:
^ = beginning of the line
d{1,2} = digit with minimum 1 and maximum 2 digits
([a-z]+) = one or more letters from a-z
(?:s*)S = no idea...
(?= = no idea...
200[0-9] = a number, starting with 200 and ending with a number (0-9)
Can someone complete this list?
I think you probably meant ^(\d{1,2}([a-z]+))(?:\s*)\S (?=200[0-9]), with the backslashes.
That is, this regexp matches the beginning of the string, followed by one or two digits, one or more lowercase letters, zero or more whitespace characters, one non-whitespace character and a space. Also, all this has to be followed by a number between 2000 and 2009, although that number is not actually matched by the regexp; it's only a look-ahead assertion. Also, the leading digits and letters are captured into $matches[1], and just the letters into $matches[2].
For more information on PHP's PCRE regexp syntax, see http://php.net/manual/en/pcre.pattern.php
regular-expressions.info is a very helpful resource.
/^(d{1,2}([a-z]+))(?:s*)S (?=200[0-9])/
(?:regex) are non-capturing parentheses; they aren't very useful in your example, but could be used to express things like (?:bar)+, to mean 1 or more bars
(?=regex) does a positive lookahead, but matches the position, not the contents. So (?=200[0-9]) in your example makes the regex match only when a year from 2000 to 2009 follows, without matching the year itself.
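To see the pieces described above in action, here is a small sketch that runs the pattern with the backslashes restored. The function name explain_match and the input '21st M 2006' are made up for illustration. Note that, because the pattern allows only a single non-whitespace character (\S) between the ordinal and the space before the year, the question's own input '21st March 2006' does not match as written.

```php
<?php
// Hypothetical helper: returns the $matches array, or null when nothing matches.
function explain_match(string $input): ?array
{
    $pattern = '/^(\d{1,2}([a-z]+))(?:\s*)\S (?=200[0-9])/';
    return preg_match($pattern, $input, $matches) ? $matches : null;
}

// $m[0] is the whole match (the year is only asserted, not consumed),
// $m[1] is digits plus letters, $m[2] is just the letters.
$m = explain_match('21st M 2006');
```

The non-capturing group (?:\s*) contributes nothing to $matches, and the lookahead leaves the year out of $m[0].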

Remove number from large string in specific position [PHP RegEx]

I have a large string (multiple lines) in which I need to find numbers with regex. The number I need is always preceded/followed by an exact sequence of characters, so I can use non-capturing groups to pinpoint the exact number I need. I put together a regex to get this number, but it refuses to work and I can't figure out why!
Below is a small bit of PHP code that I can't get to work, showing the basic format of what I need:
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$sNumberStripRE = '/.*?(?:sjdhfklsjaf<\\?kjnsdfh)(\\d+)(?:uihrfkjsn\\+%5Bmlknsadlfjncas).*?/gim';
if (preg_match_all($sNumberStripRE, $sTestData, $aMatches))
{
    var_dump($aMatches);
}
The number I need is 461, and the characters before/after it (up to the spaces on either side) are always the same.
Any help getting the above regex working would be great!
This link, RegExr: My Reg Ex (to an online regex generator with my regex), shows that it should work!
g is an invalid modifier, drop it.
With regard to that link, which regular expression engine is it working from? It's built in Flex, so probably the ActionScript RegExp engine. They are not all the same; each one varies.
You have a number of double-backslashes, they should probably be single in those strings.
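A minimal sketch of the fix suggested above: the question's pattern with the unsupported g modifier dropped (preg_match_all already finds every match) and single backslashes. The wrapper function extract_numbers is hypothetical. In a single-quoted PHP string, '\\d' and '\d' produce the same pattern, so the doubled backslashes were not what broke the original.

```php
<?php
// Hypothetical wrapper: returns every number captured between the fixed delimiters.
function extract_numbers(string $data): array
{
    // Same pattern as in the question, without "g"; "i" and "m" are kept.
    $pattern = '/.*?(?:sjdhfklsjaf<\?kjnsdfh)(\d+)(?:uihrfkjsn\+%5Bmlknsadlfjncas).*?/im';
    preg_match_all($pattern, $data, $aMatches);
    return $aMatches[1]; // contents of the single capturing group
}

$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$numbers = extract_numbers($sTestData);
```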
$sTestData = 'lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
$lDelim = ' sjdhfklsjaf<?kjnsdfh';
$rDelim = 'uihrfkjsn+%5Bmlknsadlfjncas ';
$start = strpos($sTestData, $lDelim) + strlen($lDelim);
$length = strpos($sTestData, $rDelim) - $start;
$number = substr($sTestData, $start, $length);
Using regex you can accomplish your goal with the following code:
$string='lak sjdhfklsjaf<?kjnsdfh461uihrfkjsn+%5Bmlknsadlfjncas dlk';
if (preg_match('/(sjdhfklsjaf<\?kjnsdfh)(\d+)(uihrfkjsn\+%5Bmlknsadlfjncas)/', $string, $num_array)) {
    $aMatches = $num_array[2];
} else {
    $aMatches = "";
}
echo $aMatches;
Explanation:
I declared a variable entitled $string and made it equal to the variable you initially presented. You indicated that the characters on either side of the numeric value of interest were always the same. I assigned the numerical value of interest to $aMatches by setting $aMatches equal to back reference 2. Using the parentheses in regex you will get 3 matches: backreference 1 which will contain the characters before the number, backreference 2 which will contain the numbers that you want, and backreference 3 which is the stuff after the number. I assigned $num_array as the variable name for those backreferences and the [2] indicates that it is the second backreference. So, $num_array[1] would contain the match in backreference 1 and $num_array[3] would contain the match in backreference 3.
Here is the explanation of my regular expression:
Match the regular expression below and capture its match into backreference number 1 «(sjdhfklsjaf<\?kjnsdfh)»
Match the characters “sjdhfklsjaf<” literally «sjdhfklsjaf<»
Match the character “?” literally «\?»
Match the characters “kjnsdfh” literally «kjnsdfh»
Match the regular expression below and capture its match into backreference number 2 «(\d+)»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below and capture its match into backreference number 3 «(uihrfkjsn\+%5Bmlknsadlfjncas)»
Match the characters “uihrfkjsn” literally «uihrfkjsn»
Match the character “+” literally «\+»
Match the characters “%5Bmlknsadlfjncas” literally «%5Bmlknsadlfjncas»
Hope this helps and best of luck to you.
Steve

Can Someone explain this reg ex to me?

I recently asked a question about formatting a telephone number and got lots of responses. Most of them were great, but there is one I really want to understand, because it worked great. Given the $phone below, how do the other lines work? What are they doing? I'd like to learn.
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Let's break the code into two lines.
preg_replace("~[^0-9]~", "", $phone);
First, we're going to replace matches to a regex with an empty string (in other words, delete matches from the string). The regex is [^0-9] (the ~ on each end is a delimiter). [...] in a regex defines a character class, which tells the regex engine to match one character within the class. Dashes are generally special characters inside a character class, and are used to specify a range (ie. 0-9 means all characters between 0 and 9, inclusive).
You can think of a character class like a shorthand for a big OR condition: ie. [0-9] is a shorthand for 0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9. Note that classes don't have to contain ranges, either -- [aeiou] is a character class that matches a or e or i or o or u (or in other words, any vowel).
When the first character in the class is ^, the class is negated, which means that the regex engine should match any character that isn't in the class. So when you put all that together, the first line removes anything that isn't a digit (a character between 0 and 9) from $phone.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
The second line tries to match $phone against a second expression, and puts the results into an array called $matches, if a match is made. You will note there are three sets of brackets; these define capturing groups -- ie. if there is a match of a pattern as a whole, you will end up with three submatches, which in this case will contain the area code, prefix and suffix of the phone number. In general, anything contained in brackets in a regular expression is capturing (while there are exceptions, they are beyond the scope of this explanation). Groups can be useful for other things too, without wanting the overhead of capturing, so a group can be made non-capturing by prefacing it with ?: (ie. (?:...)).
Each group does a similar thing: [0-9]{3} or [0-9]{4}. As we saw above, [0-9] defines a character class containing the digits between 0 and 9 (as the classes here don't start with ^, these are not negated groups). The {3} or {4} is a repetition operator, which says "match exactly 3 (or 4) of the previous token (or group)". So [0-9]{3} will match exactly three digits in a row, and [0-9]{4} will match exactly four digits in a row. Note that the digits don't have to be all the same (ie. 111), because the character class is evaluated for each repetition (so 123 will match because 1 matches [0-9], then 2 matches [0-9], and then 3 matches [0-9]).
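Putting the two lines together, the following sketch shows what ends up in $matches for the phone number from the question. The function name split_phone is hypothetical; the two regular expressions are the ones being explained.

```php
<?php
// Hypothetical helper: returns [area code, prefix, suffix], or null if
// fewer than ten digits remain after stripping.
function split_phone(string $phone): ?array
{
    // Step 1: delete every character that is not a digit.
    $digits = preg_replace('~[^0-9]~', '', $phone);
    // Step 2: capture groups of 3, 3 and 4 digits.
    if (!preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $digits, $matches)) {
        return null;
    }
    // $matches[0] holds the whole match; the groups start at index 1.
    return [$matches[1], $matches[2], $matches[3]];
}

$parts = split_phone('(407)888-9999');
```

Because the punctuation is removed first, the second pattern never needs to account for the parentheses or the dash.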
In the preg_replace, it looks for anything that is not 0-9 (the ^ inside the [] negates the class, so basically anything that is not a digit) and removes it from the string, given that the replacement is "".
For the first section, it pulls out the first 3 digits with ([0-9]{3}): the {3} is the number of characters to match, the items inside the [] are what to match, and since this is inside parentheses (), it is stored as a match in the array $matches. The second part pulls out the next 3 digits, and the last part pulls out the last 4 digits from $phone, storing each in $matches.
The ~ are delimiters for the regular expressions.
You know it's a regular expression from the regex tag.
So, you are pattern matching.
The pattern you are matching is: [^0-9] followed by the phone number.
[^0-9] means NOT (the '^') any one digit.
So, the match after that is any 3 digits, followed by any 3 digits, followed by any 4 digits.
I don't think it will match, because the () around the area code and the dash are missing from the pattern.
I'd do this:
~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~
"[^0-9]" means everything but digits from 0 to 9. So basically, the first line replaces everything but digits with "" (nothing).
[0-9]{3} means a digit from 0 to 9, 3 times in a row.
So it checks whether you have 3 digits, then 3 digits, then 4 digits, and stores what it matched in $matches.
Check this tutorial:
Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
In PHP you have to delimit a regex pattern with some non-alphanumeric character; "~" is used here.
[^0-9] is the regex pattern used to remove anything from $phone that is not in the 0-9 range. Remember that the ^ at the start of [^...] negates the character class.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Again, in this line of code "~" is the delimiter, and
([0-9]{3}) will capture 3 digits from the string ({} specifies the number, or range, of repetitions to match) into a separate element of the output array (check your $matches variable for the result). Using ( ) in a pattern results in groups/submatches.
