Regex Pattern Differences - "*" and "+"

Regex Pattern Differences - "*" and "+" - php

I am trying to replace " on inch i.e. 12" wall would become 12 inch wall
I have 2 patterns working:
/\b([0-9]+)"/ -> preg_replace('/\b([0-9]+)"/', '$1 inch ', $string)
and
/\b([0-9]*)"/ -> preg_replace('/\b([0-9]*)"/', '$1 inch ', $string)
what is a difference between them then, why + and * works same way here ?
cheers,
/Marcin

The + means find the previous character/group 1 or more times.
The * means find the previous character/group any amount of times (0-infinity)

/\b([0-9]+)"/ requires that there is at least one digit between the word boundary and the ", whereas /\b([0-9]*)"/ also accepts zero digits. So the first does not match a space followed by " and the second does.
If you want to mach both new 15 " tv and new 15" tv you need to match against a space character that may or may not be present:
/\b([0-9]+)\s?"/
This matches a word boundary, followed by a sequence (on or more) numbers, optionally followed by one space (or tab), followed by a ". I presume that's what you are looking for.
If not, you should first define strings that must match and strings that may not match.

Related

How can I taking apart a string?

I have a pattern like this:
[X number of digits][c][32 characters (md5)][X]
/* Examples:
2 c jg3j2kf290e8ghnaje48grlrpas0942g 65
5 c kdjeuw84398fj02i397hf4343i013g44 94824
1 c pokdk94jf0934nf0932mf3923249f3j3 3
*/
Note: Those spaces into those examples aren't exist in the real string.
I need to divide such a string into four parts:
// based on first example
$the_number_of_digits = 2
$separator = c // this is constant
$hashed_string = jg3j2kf290e8ghnaje48grlrpas0942g
$number = 65
How can I do that?
Here is what I've tried so far:
/^(\d+)(c)(\w{32})/
Online Demo
My pattern cannot get last part.
EDIT: I don't want to select the rest of number as last part. I need a algorithm based on the number which is in the beginning of that string.
Because maybe my string be like this:
2 c 65 jg3j2kf290e8ghnaje48grlrpas0942g

This regex uses named groups to access the results:
(?<numDigits>\d+) (?<separator>c) (?<hashedString>\w{32}) (?<number>\d+)
edit: (from #RocketHazmat's helpful comments) since the OP wants to also validate that "number" has the number of digits from "numDigits":
Use the regex provided then validate the length of number in PHP. if(
strlen($matches['number']) == $matches['numDigits'] )
regex demo output (your string as input):

The fact that one match drives the length of another match suggests that you will need something a bit more complicated than a single expression. However, it need not be that much more complicated: sscanf was designed for this kind of job:
sscanf($code, '%dc%32s%n', $length, $md5, $width);
$number = substr($code, $width, $length);
Live example.
The trick here is that sscanf gives you the width of the string (%n) at exactly the point you need to start cutting, as well as the length (from the first %d), so you have everything you need to do simple string cuts.

Add (\d+) to the end, like you have in the beginning.
/^(\d+)(c)(\w{32})(\d+)/

/(\d)(c)([[:alnum:]]{32})(\d+)/
preg_match('/(\d)(c)([[:alnum:]]{32})(\d+)/', $string, $matches);
$the_number_of_digits = $matches[1];
$separator = $matches[2];
$hashed_string = $matches[3];
$number = $matches[4];
Then, to check if the string length of $number is equal to $the_number_of_digits, you can use strlen, i.e.:
if(strlen($number) == $the_number_of_digits){
}
The main difference from other answers is the use of [[:alnum:]], unlike \w, it won't match _.
[:alnum:]
Alphanumeric characters: ‘[:alpha:]’ and ‘[:digit:]’; in the ‘C’
locale and ASCII character encoding, this is the same as
‘[0-9A-Za-z]’.
http://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html
Regex101 Demo
Ideone Demo
Regex Explanation:
(\d)(c)([[:alnum:]]{32})(\d+)
Match the regex below and capture its match into backreference number 1 «(\d)»
Match a single character that is a “digit” (any decimal number in any Unicode script) «\d»
Match the regex below and capture its match into backreference number 2 «(c)»
Match the character “c” literally (case insensitive) «c»
Match the regex below and capture its match into backreference number 3 «([[:alnum:]]{32})»
Match a character from the **POSIX** character class “alnum” (Unicode; any letter or ideograph, digit, other number) «[[:alnum:]]{32}»
Exactly 32 times «{32}»
Match the regex below and capture its match into backreference number 4 «(\d+)»
Match a single character that is a “digit” (any decimal number in any Unicode script) «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

replace substring if follows number, eg screen resolution, php

I have looked everywhere but I am stuck and need some help.
I believe I need a regex to replace a quote " (for inches) at the end of a number.
For example 21.5" with 21.5inch
I need to do it only if the " has a number before it.
Thanks in advance!

The regex for this task is
(\d)"
Here is DEMO with explanation

Try this:
(?<=\d)"
https://regex101.com/r/lC9tZ7/2
It should grab the " as long as it follows a digit.
If it can have spaces between the digit and ", try this:
(?<=\d)\s*"
https://regex101.com/r/cT1bF7/3

Try (\d+\.{0,1}\d+)\s*"
Explanation: Lets try matching 21.54 inch
\d+ matches 21
\.{0,1} escapes decimal notation and matches if there's a . atleast 0 times (i.e., there is no decimal at all) and atmost 1 times (i.e., a number can only have at most 1 decimal). So, we have so far matched 21.
\d+ matches the remaining 54. So far matched 21.54
\s* forgives if there is any space followed by the number
" finally ensures that the number is followed by the inch notation.
Check this demo link here.

Trying to create a regex for 1d barcode(RegexIterator)

I'm trying for couple of days to create a regex for finding the correct picture by the product barcode from the pictures folder.
The folder containing something like 4500 pictures.
The name of the file can be in 4 formats.
XXXXXX.jpg/png - short barcode unknown number of characters(numbers only).
00000(from 1 to unknow number of leading zero)XXXX(then the short barcode).jpg/png
729(as leading number)00000(from 1 to unknow number of leading zero)XXXX(then the short barcode).jpg/png
72900000XXXXXXYYY YYY YYY.jpg/png same as option 3 but with some characters(Y-represent a character).
I came up with something like that:
$i = new RegexIterator($a, '($barcode)\D*|^([0][0-9]+$barcode)\D+|(729[0-9][0-9]+$barcode)\D+|(729[0-9][0-9]+$barcode).+/', RegexIterator::GET_MATCH);
$barcode - can be 7290000232 or 0000232 or 232
But it doesn't working.
Any ideas?

You have four cases that build up on each other:
Only numbers, 1 to unlimited times: \d+
1. with leading zeros: effectively the same as 1., as zeros are numbers ;) No need for a special case here
1. optionally preceeded by 729: (?:729)?\d+ (this may already be used for the cases 1.-3.)
3. with optional characters (zero to unlimited): (?:729)?\d+(?:[a-zA-Z])*
Only the extension is left to be added:
((?:729)?\d+(?:[a-zA-Z])*\.(?:jpg|png))
Now there's one thing left. This regex would match on abc123.jpg, as 123.jpg is perfectly valid. To counter this we add ^ (this denotes the start of the input):
^((?:729)?\d+(?:[a-zA-Z])*\.(?:jpg|png))
demo # regex101
As you insert the barcode (from case 1) yourself there are few adjustments to be made:
^((?:729)?0*?$barcode(?:[a-zA-Z])*\.(?:jpg|png))
Here we have to insert the second case with 0*? (0 zero to unlimited times, lazy).
Regarding the [a-zA-Z]: you have to decide what to allow here. Currently it only allows lowercase and uppercase letters. If you want to allow spaces (for example), then simply add them to the character group: [a-zA-Z ].
For non-latin characters you can use [\x{00BF}-\x{1FFF}\x{2C00}-\x{D7FF}a-zA-Z] (credits to this comment) as your character group, so your regex would then look like:
^((?:729)?0*?123(?:[\x{00BF}-\x{1FFF}\x{2C00}-\x{D7FF}a-zA-Z])*\.(?:jpg|png))
demo # regex101

From what I understand - options 1-3 are all the same (729 is a digit string same as others):
^\d+(?:jpg|png)$
With 4 you are saying 'allow word characters and whitespaces, but only if name starts with 729'. So it is now:
(?:(?:^\d+[.](?:jpg|png)$)|(?:^729\d*[\w\s]+[.](?:jpg|png)$))
Demo here.
\s matches spaces, '\w' matches word characters.

Regex for detecting the same character more than five times?

I'm trying to figure out how to write a regex that can detect if in my string, any character is repeated more than five times consecutively? For example it wouldn't detect "hello", but it would detect "helloooooooooo".
Any ideas?
Edit: Sorry, to clarify, I need it to detect the same character repeated more than five times, not any sequence of five characters. And I also need it to work with any charter, not just "o" like in my example. ".{5,}" is no good because it just detects any sequence of any five characters, not the same character.

This should do it
(\w)\1{5,}
(\w) match any character and put it in the first group
\1{5,} check that the first group match at least 5 times.
Usage :
$input = 'helloooooooooo';
if (preg_match('/(\w)\1{5,}/', $input)) {
# Successful match
} else {
# Match attempt failed
}

Correction, should be (.)\1{5,}, I believe. My mistake. This gets you:
(.) #Any character
\1 #The character captured by (.)
{5,} #At least 5 more repetitions (total of at least 6)
You can also restrict it to letters by using (\w)\1{5,} or ([a-zA-Z])\1{5,}

You can use the regex:
(.)\1{5,}
Explanation:
. : Meta char that matches any
char.
() : Are used for grouping and
remembering the matched single char.
\1 : back reference to the single
char that was remembered in prev
step.
{5,} : Quantifier for 5 or more
and in PHP you can use it as:
$input = 'helloooooooooo';
if(preg_match('/(.)\1{5,}/',$input,$matches)) {
echo "Found repeating char $matches[1] in $input";
}
Output:
Found repeating char o in helloooooooooo

Yep.
(.)\1+
This will match repeated sequences of any character.
The \1 looks at the contents of the first set of brackets. (so if you have more complex regex, you'd need to adjust it to the correct number so it picks up the right set of brackets).
If you need to specify, say more than three of them:
(.)\1{3,}
The \1 syntax is quite powerful -- eg You can also use it elsewhere in your regex to search for the same character appearing in different places in your search string.

Can Someone explain this reg ex to me?

I recently asked a question on formatting a telephone number and I got lots of responses. Most of the responses were great but one i really wanted to figure out what its doing because it worked great. If phone is the following how do the other lines work...what are they doing so i can learn
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);

Let's break the code into two lines.
preg_replace("~[^0-9]~", "", $phone);
First, we're going to replace matches to a regex with an empty string (in other words, delete matches from the string). The regex is [^0-9] (the ~ on each end is a delimiter). [...] in a regex defines a character class, which tells the regex engine to match one character within the class. Dashes are generally special characters inside a character class, and are used to specify a range (ie. 0-9 means all characters between 0 and 9, inclusive).
You can think of a character class like a shorthand for a big OR condition: ie. [0-9] is a shorthand for 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9. Note that classes don't have to contain ranges, either -- [aeiou] is a character class that matches a or e or i or o or u (or in other words, any vowel).
When the first character in the class is ^, the class is negated, which means that the regex engine should match any character that isn't in the class. So when you put all that together, the first line removes anything that isn't a digit (a character between 0 and 9) from $phone.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
The second line tries to match $phone against a second expression, and puts the results into an array called $matches, if a match is made. You will note there are three sets of brackets; these define capturing groups -- ie. if there is a match of a pattern as a whole, you will end up with three submatches, which in this case will contain the area code, prefix and suffix of the phone number. In general, anything contained in brackets in a regular expression is capturing (while there are exceptions, they are beyond the scope of this explanation). Groups can be useful for other things too, without wanting the overhead of capturing, so a group can be made non-capturing by prefacing it with ?: (ie. (?:...)).
Each group does a similar thing: [0-9]{3} or [0-9]{4}. As we saw above, [0-9] defines a character class containing the digits between 0 and 9 (as the classes here don't start with ^, these are not negated groups). The {3} or {4} is a repetition operator, which says "match exactly 3 (or 4) of the previous token (or group)". So [0-9]{3} will match exactly three digits in a row, and [0-9]{4} will match exactly four digits in a row. Note that the digits don't have to be all the same (ie. 111), because the character class is evaluate for each repetition (so 123 will match because 1 matches [0-9], then 2 matches [0-9], and then 3 matches [0-9]).

In the preg_replace it looks for anything that is not, ^ inside of the [], 0-9 (basically not a number) and replaces / removes it from that string given the replacement is "".
For the first section, it pulls out the first 3 numbers ([0-9]{3}) the {3} is the number of characters to match the items inside the [] are what to match and since this is inside of paranthesis () it stores it as a match in the array $matches. The second part pulls out the next 3 numbers and the last part pulls out the last 4 numbers from $phone and stores the matches that were matched in $matches.
The ~ are delimeters for the regular expressions.

You know it's a regular expression from the regex tag.
So, you are pattern matching.
The pattern you are matching is: [^0-9] followed by the phone number.
[^0-9] is NOT '^' any one digit
So, the match after that is any 3 digits, followed by any 3 digits, followed by any 4 digits.
I don't think it will match because of the () around the area code and the dash are missing.
I'd do this:
~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~'

"[^0-9]" means everything but numbers from 0 to 9. So basically, first line replace everything but numbers with "" (nothing)
[0-9]{3} means number from 0 to 9, 3 times in a row.
So it check if you have 3 numbers then 3 numbers than 4 numbers and try to match it with $matches.

Check this tuts
Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php

$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
In php you have to delimit regex pattern in some non-alphanumeric character "~" is used here.
[^0-9] is regex pattern used to remove anything out of $phone that is not in 0-9 range remember [^...] will negate the pattern it precedes.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Again in this line of code you have "~" as delimiter and
([0-9]{3}) this part of pattern will return 3 numbers from string (note: {} is used to specify range/number of characters to match) in a different output array dimension (check your $matches variable for result) using ( ) in a pattern results in groups/submatches

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Regex Pattern Differences - "*" and "+" - php

The + means find the previous character/group 1 or more times. The * means find the previous character/group any amount of times (0-infinity)

Related

How can I taking apart a string?

replace substring if follows number, eg screen resolution, php

Trying to create a regex for 1d barcode(RegexIterator)

Regex for detecting the same character more than five times?

Can Someone explain this reg ex to me?

Categories

Resources