Regex equivalence yields error - PHP

Are these two regexes equivalent, except that the second one doesn't capture the dpthd text?
'/<a name="dpthd_.*><\/a><a.*><\/a><a.*><\/a><h3.*>(.*)<\/h3>(.*)\s*(dpthd)/sU'
'/<a name="dpthd_.*><\/a><a.*><\/a><a.*><\/a><h3.*>(.*)<\/h3>(.*)\s*dpthd/sU'
I just removed the parentheses.
The problem is that the first one doesn't work, while the second works fine.
EDIT>>>>
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
EDIT>>>>
OK, I don't know what the problem was; I added the m modifier and now use preg_match_all:
preg_match_all('/<a name="dpthd_.*><\/a><a.*><\/a><a.*><\/a><h3.*>(.*)<\/h3>(.*)(?:<a name="dpthd_.*>|$)/sUm', $contents, $matches, PREG_OFFSET_CAPTURE);
It seems to work, but I will post some test text later, because I want to know why the first version wasn't working.

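Here is a detailed breakdown of both regexes to compare :) Note that this tool output ignores the s and U modifiers: with s, the dot also matches line breaks, and with U every quantifier is lazy instead of greedy.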
First Regexp
r"""
<a\ name="dpthd_ # Match the characters “<a name="dpthd_” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><a # Match the characters “a><a” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><a # Match the characters “a><a” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><h3 # Match the characters “a><h3” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
> # Match the character “>” literally
( # Match the regular expression below and capture its match into backreference number 1
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
< # Match the character “<” literally
\/ # Match the character “/” literally
h3> # Match the characters “h3>” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 3
dpthd # Match the characters “dpthd” literally
)
"""
Second Regexp
r"""
<a\ name="dpthd_ # Match the characters “<a name="dpthd_” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><a # Match the characters “a><a” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><a # Match the characters “a><a” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><h3 # Match the characters “a><h3” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
> # Match the character “>” literally
( # Match the regular expression below and capture its match into backreference number 1
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
< # Match the character “<” literally
\/ # Match the character “/” literally
h3> # Match the characters “h3>” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
dpthd # Match the characters “dpthd” literally
"""
HTH!
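As a quick check of the equivalence question itself, here is a minimal sketch using Python's re module. It makes two assumptions. First, PHP's U modifier has no Python equivalent, so every quantifier is written lazy explicitly. Second, the sample HTML is made up. The sketch shows that both patterns match the same span with the same groups 1 and 2. The only difference is the extra group 3.

```python
import re

# Python port of the two patterns. PHP's /U makes every quantifier lazy,
# so .* becomes .*? and \s* becomes \s*?; /s maps to re.DOTALL.
WITH_GROUP = re.compile(
    r'<a name="dpthd_.*?></a><a.*?></a><a.*?></a><h3.*?>(.*?)</h3>(.*?)\s*?(dpthd)',
    re.DOTALL,
)
WITHOUT_GROUP = re.compile(
    r'<a name="dpthd_.*?></a><a.*?></a><a.*?></a><h3.*?>(.*?)</h3>(.*?)\s*?dpthd',
    re.DOTALL,
)

# Hypothetical sample input, made up for illustration.
html = ('<a name="dpthd_1"></a><a id="b"></a><a id="c"></a>'
        '<h3 class="t">Title</h3>Some body text\ndpthd')

m1 = WITH_GROUP.search(html)
m2 = WITHOUT_GROUP.search(html)

print(m1.span() == m2.span())            # True: same overall match
print(m1.group(1, 2) == m2.group(1, 2))  # True: same captures 1 and 2
print(m1.group(3))                       # dpthd: only the first pattern has group 3
```

If the two PHP patterns really behave differently on the same input, the cause is probably something other than the removed parentheses.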

Related

PHP regex: each word must end with dot

Can someone help me write a pattern for the preg_match function?
Every word in the string must end with a dot
The first character of the string must be [a-zA-Z]
After each dot there can be a space
There can't be two spaces next to each other
The last character must be a dot (logically, after a word)
Examples:
"Ing" -> false
"Ing." -> true
".Ing." -> false
"Xx Yy." -> false
"XX. YY." -> true
"XX.YY." -> true
Can you please help me test the string? My pattern is:
/^(([a-zA-Z]+)(?! ) \.)+\.$/
I know it's wrong, but I can't figure it out. Thanks
Check how this fits your needs.
/^(?:[A-Z]+\. ?)+$/i
^ matches start
(?: opens a non-capture group for repetition
[A-Z]+ with i flag matches one or more alphas (lower & upper)
\. ? matches a literal dot followed by an optional space
)+ all this once or more until $ end
Here's a demo at regex101
If you want to disallow space at the end, add negative lookbehind: /^(?:[A-Z]+\. ?)+$(?<! )/i
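To see the pattern against the question's examples, here is a small Python sketch. It assumes re.IGNORECASE stands in for PHP's /i flag. It also checks that the lookbehind variant rejects a trailing space.

```python
import re

# Pattern from the answer above; re.IGNORECASE plays the role of PHP's /i.
WORDS_WITH_DOTS = re.compile(r'^(?:[A-Z]+\. ?)+$', re.IGNORECASE)
# Variant with the negative lookbehind that rejects a trailing space.
NO_TRAILING_SPACE = re.compile(r'^(?:[A-Z]+\. ?)+$(?<! )', re.IGNORECASE)

# Examples from the question, with the expected result.
samples = {
    "Ing": False,
    "Ing.": True,
    ".Ing.": False,
    "Xx Yy.": False,
    "XX. YY.": True,
    "XX.YY.": True,
}

for text, expected in samples.items():
    ok = WORDS_WITH_DOTS.search(text) is not None
    print(f"{text!r:12} -> {ok} (expected {expected})")

# "XX. " passes the first pattern but not the lookbehind variant.
print(bool(WORDS_WITH_DOTS.search("XX. ")), bool(NO_TRAILING_SPACE.search("XX. ")))  # True False
```

Because the optional space is a single ` ?`, two consecutive spaces (for example "XX.  YY.") are rejected as well.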
Try this:
$string = "Ing
Ing.
.Ing.
Xx Yy.
XX. YY.
XX.YY.";
if (preg_match('/^([A-Za-z]{1,}\.[ ]{0,})*/m', $string)) {
// Successful match
} else {
// Match attempt failed
}
The Regex in detail:
^ Assert position at the beginning of a line (at beginning of the string or after a line break character)
( Match the regular expression below and capture its match into backreference number 1
[A-Za-z] Match a single character present in the list below
A character in the range between “A” and “Z”
A character in the range between “a” and “z”
{1,} Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\. Match the character “.” literally
[ ] Match the character “ ”
{0,} Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
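One thing to watch with this pattern: the group is quantified with *, so it can succeed with an empty match at the start of any line. A plain if (preg_match(...)) test is therefore true for every input. The Python sketch below checks each sample on its own and makes this visible. It contrasts the pattern with a hypothetical tightened variant that uses + instead of *, [ ]? instead of [ ]{0,}, and a $ anchor. That variant is an illustration, not part of the original answer.

```python
import re

# Python port of the pattern above, tested one line at a time (no /m needed).
LOOSE = re.compile(r'^([A-Za-z]{1,}\.[ ]{0,})*')
# Hypothetical tightened variant: one or more words, at most one space, anchored end.
STRICT = re.compile(r'^([A-Za-z]+\.[ ]?)+$')

lines = ["Ing", "Ing.", ".Ing.", "Xx Yy.", "XX. YY.", "XX.YY."]

for line in lines:
    loose = LOOSE.search(line) is not None    # succeeds even via an empty match
    strict = STRICT.search(line) is not None
    print(f"{line!r:10} loose={loose} strict={strict}")
```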

php - regex - how to extract a number with decimal (dot and comma) from a string (e.g. 1,120.01)?

How do I extract a number with decimals (dot and comma) from a string (e.g. 1,120.01)?
I have a regex, but it doesn't seem to play well with commas:
preg_match('/([0-9]+\.[0-9]+)/', $s, $matches);
The correct regex for matching numbers with commas and decimals is as follows (the first two validate that the number is correctly formatted):
decimal optional (two decimal places)
^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?$
Explained:
number (decimal optional)
^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?$
Options: case insensitive
Assert position at the beginning of the string «^»
Match a single character present in the list below «[+-]?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
The character “+” «+»
The character “-” «-»
Match a single character in the range between “0” and “9” «[0-9]{1,3}»
Between one and 3 times, as many times as possible, giving back as needed (greedy) «{1,3}»
Match the regular expression below «(?:,?[0-9]{3})*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “,” literally «,?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match a single character in the range between “0” and “9” «[0-9]{3}»
Exactly 3 times «{3}»
Match the regular expression below «(?:\.[0-9]{2})?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character “.” literally «\.»
Match a single character in the range between “0” and “9” «[0-9]{2}»
Exactly 2 times «{2}»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
Will Match:
1,432.01
456.56
654,246.43
432
321,543
454325234.31 (the comma in (?:,?[0-9]{3}) is optional, so unseparated digit groups are accepted)
Will Not Match:
324,123.432
,,,312,.32
123,.23
decimal mandatory (two decimal places)
^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}$
Explained:
number (decimal required)
^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}$
Options: case insensitive
Assert position at the beginning of the string «^»
Match a single character present in the list below «[+-]?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
The character “+” «+»
The character “-” «-»
Match a single character in the range between “0” and “9” «[0-9]{1,3}»
Between one and 3 times, as many times as possible, giving back as needed (greedy) «{1,3}»
Match the regular expression below «(?:,?[0-9]{3})*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “,” literally «,?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match a single character in the range between “0” and “9” «[0-9]{3}»
Exactly 3 times «{3}»
Match the character “.” literally «\.»
Match a single character in the range between “0” and “9” «[0-9]{2}»
Exactly 2 times «{2}»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
Will Match:
1,432.01
456.56
654,246.43
324.75
Will Not Match:
1,43,2.01
456,
654,246
324.7523
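The two validation patterns above can be checked mechanically against a few of the listed samples. Here is a short Python sketch. It assumes Python's $ behaves like the PCRE default here, which it does for single-line input.

```python
import re

# The two validation patterns from the answer (decimal optional / mandatory).
DECIMAL_OPTIONAL = re.compile(r'^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?$')
DECIMAL_REQUIRED = re.compile(r'^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*\.[0-9]{2}$')


def check(pattern, samples):
    """Map each sample string to whether the pattern accepts it."""
    return {s: pattern.search(s) is not None for s in samples}


print(check(DECIMAL_OPTIONAL, ["1,432.01", "456.56", "432", "321,543", "324,123.432", "123,.23"]))
print(check(DECIMAL_REQUIRED, ["1,432.01", "324.75", "1,43,2.01", "654,246", "324.7523"]))
```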
Matches numbers separated by commas or dots indiscriminately:
^(\d+(\.|,))+(\d)+$
Explained:
Matches Numbers Separated by , or .
^(\d+(\.|,))+(\d)+$
Options: case insensitive
Match the regular expression below and capture its match into backreference number 1 «(\d+(\.|,))+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «+»
Match a single digit 0..9 «\d+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below and capture its match into backreference number 2 «(\.|,)»
Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
Match the character “.” literally «\.»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «,»
Match the character “,” literally «,»
Match the regular expression below and capture its match into backreference number 3 «(\d)+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «+»
Match a single digit 0..9 «\d»
Will Match:
1,32.543,2
5456.35,3.2,6.1
2,7
1.6
Will Not Match:
1,.2 // two ., side by side
1234,12345.5467. // ends in a .
,125 // begins in a ,
,.234 // begins in a , and two symbols side by side
123,.1245. // ends in a . and two symbols side by side
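The alternation only means "a dot or a comma" when the dot is escaped. An unescaped . matches almost any character, so a string like 1a2 would pass. Here is a minimal Python comparison of the two spellings. The sample strings are a mix of the listed examples and one made-up case.

```python
import re

# Escaped dot: the alternation means "a dot or a comma".
ESCAPED = re.compile(r'^(\d+(\.|,))+(\d)+$')
# Unescaped dot: . matches (almost) any character, so letters slip through.
UNESCAPED = re.compile(r'^(\d+(.|,))+(\d)+$')

for s in ["1,32.543,2", "2,7", "1,.2", "1a2"]:
    print(s, ESCAPED.search(s) is not None, UNESCAPED.search(s) is not None)
```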
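Note: to extract a number from a larger string rather than validate the whole string, drop the ^ and $ anchors, wrap the pattern in a capturing group, and read that group; let me know if you need more specifics.
Description: this type of regex works in most languages with a Perl-style regex engine (PHP, Python, C#, JavaScript, C++ std::regex, etc.). These regular expressions are mainly suited to currency.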
You can use this regex:
/((?:[0-9]+,)*[0-9]+(?:\.[0-9]+)?)/
Explanation:
/(
(?:[0-9]+,)* # Match one or more digits followed by a `comma`.
# Zero or more repetitions of the above pattern.
[0-9]+ # Match one or more digits before `.`
(?: # A non-capturing group
\. # A dot
[0-9]+ # Digits after `.`
)? # Make the fractional part optional.
)/
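Here is a minimal Python sketch of using this pattern to pull the number out of surrounding text. It then turns the match into a float by stripping the thousands separators. The input sentence is hypothetical. Removing commas assumes the comma is always a thousands separator, as in 1,120.01.

```python
import re

# Pattern from the answer above (no anchors, so it can find a number inside text).
NUMBER = re.compile(r'((?:[0-9]+,)*[0-9]+(?:\.[0-9]+)?)')

text = "The invoice total is 1,120.01 USD."   # hypothetical input
match = NUMBER.search(text)
if match:
    raw = match.group(1)                  # '1,120.01'
    value = float(raw.replace(",", ""))   # strip thousands separators before converting
    print(raw, value)                     # 1,120.01 1120.01
```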
Add the comma to the character class in front of the dot:
/([0-9,]+\.[0-9]+)/
# ^ Comma
And this regex:
/((?:\d,?)+\d\.[0-9]*)/
Will match
1,067120.01
121,34,120.01
But not
,,,.01
,,1,.01
12,,,.01
# /(
# (?:\d,?) Matches a digit followed by an optional comma
# + One or more of the previous
# \d Followed by a digit (to prevent it from matching `1234,.123`)
# \. Followed by a dot
# if the fraction should be optional, write `\.?` here instead
# [0-9]* Followed by any number of digits (to require at least one fractional digit, replace the `*` with a `+`)
# )/
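The match and no-match lists above can be reproduced with a quick Python check of the same pattern, using re.search so that it finds a match anywhere in each string, like preg_match does.

```python
import re

PATTERN = re.compile(r'((?:\d,?)+\d\.[0-9]*)')

should_match = ["1,067120.01", "121,34,120.01"]
should_not_match = [",,,.01", ",,1,.01", "12,,,.01"]

for s in should_match + should_not_match:
    m = PATTERN.search(s)
    print(f"{s!r:18} -> {m.group(1) if m else None}")
```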
The locale-aware float (%f) might be used with sscanf.
$result = sscanf($s, '%f');
That doesn't split the parts into an array though. It simply parses a float.
See also: http://php.net/manual/en/function.sprintf.php
A regex approach:
/([0-9]{1,3}(?:,[0-9]{3})*\.[0-9]+)/
This should work
preg_match('/\d{1,3}(,\d{3})*(\.\d+)?/', $s, $matches);
Here is a working regex that accepts numbers with commas and decimals:
/^-?(?:\d+|\d{1,3}(?:,\d{3})+)?(?:\.\d+)?$/
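Here is a short Python check of this pattern on a few inputs. Because both the integer part and the fractional part are optional, the pattern also accepts an empty string. That is worth knowing if the value comes from user input.

```python
import re

# Pattern from the answer above.
PATTERN = re.compile(r'^-?(?:\d+|\d{1,3}(?:,\d{3})+)?(?:\.\d+)?$')

for s in ["1,120.01", "-1234.5", ".75", "12,345,678", "1,12", ""]:
    print(repr(s), PATTERN.search(s) is not None)
```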

Parse a skype log

I need to parse a Skype log, grab all the call durations, and add them up to find the total duration of calls for the entire chat history.
Sample:
[3/12/2012 11:36:44 AM] * Call ended, duration 21:33 *
I think I need to use preg_match with the proper regular expression. If it's possible to store the actual timestamp in an array at the same time, that would be even better.
What I'm really stumped on is the regex needed to grab just the call duration.
Try this:
(?i)\[(?P<time_stamp>[^[]+)\]\s*[*]\s*[a-z ,]+(?P<duration>(?:\d{2}:?){2,3})\s*[*]
Explanation
"
(?i) # Match the remainder of the regex with the options: case insensitive (i)
\[ # Match the character “[” literally
(?P<time_stamp> # Match the regular expression below and capture its match into backreference with name “time_stamp”
[^[] # Match any character that is NOT a “[”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\] # Match the character “]” literally
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[*] # Match the character “*”
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[a-z ,] # Match a single character present in the list below
# A character in the range between “a” and “z”
# One of the characters “ ,”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?P<duration> # Match the regular expression below and capture its match into backreference with name “duration”
(?: # Match the regular expression below
\d # Match a single digit 0..9
{2} # Exactly 2 times
: # Match the character “:” literally
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
){2,3} # Between 2 and 3 times, as many times as possible, giving back as needed (greedy)
)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[*] # Match the character “*”
"
You can use this:
\*.+?([0-9]+:){1,2}([0-9]+)
It catches both HH:MM:SS and MM:SS durations that come after the first *.

regular expression for links in string to exclude dot only at end of line

I created a regular expression that reads a string and turns the URLs it finds into HTML links. I want to exclude the dot at the end of a line (the line containing the link), but my regex also excludes the dots inside the link (as in http://www.website.com/page.html.). Here the trailing dot should be excluded, but not the one in .html. This is my regex:
$text = preg_replace("#(^|[\n \"\'\(<;:,\*])((www|ftp)\.+[a-zA-Z0-9\-_]+\.[^ \"\'\t\n\r< \[\]\),>;:.\*]*)#", "\\1\\2", $text);
How would one do that?
Thanx! Tom
Change your regex to this (the character classes list only A-Z, so use it case-insensitively, e.g. with the i modifier):
\b((?#protocol)https?|ftp)://((?#domain)[-A-Z0-9.]+)((?#file)/[-A-Z0-9+&@#/%=~_|!:,.;]*)?((?#parameters)\?[A-Z0-9+&@#/%=~_|!:,.;]*)?
or this
\b((?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]*)\b
Explanation
"
\b # Assert position at a word boundary
( # Match the regular expression below and capture its match into backreference number 1
# Match either the regular expression below (attempting the next alternative only if this one fails)
http # Match the characters “http” literally
s # Match the character “s” literally
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
| # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
ftp # Match the characters “ftp” literally
| # Or match regular expression number 3 below (the entire group fails if this one fails to match)
file # Match the characters “file” literally
)
:// # Match the characters “://” literally
[-A-Z0-9+&@#/%?=~_|\$!:,.;] # Match a single character present in the list below
# The character “-”
# A character in the range between “A” and “Z”
# A character in the range between “0” and “9”
# One of the characters “+&@#/%?=~_|\$!:,.;”
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[A-Z0-9+&@#/%=~_|\$] # Match a single character present in the list below
# A character in the range between “A” and “Z”
# A character in the range between “0” and “9”
# One of the characters “+&@#/%=~_|\$”
"
Hope this helps.
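Here is a minimal Python sketch of the second pattern applied to the question's example. It keeps the .html part while leaving the trailing dot outside the link. Three things are assumptions: the character classes are written with +&@#/, re.IGNORECASE replaces PHP's i modifier, and the href markup in the replacement is illustrative only.

```python
import re

# Second pattern from the answer, ported to Python; re.IGNORECASE is needed
# because the character classes only list A-Z.
URL = re.compile(
    r'\b((?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]*)\b',
    re.IGNORECASE,
)

text = "See http://www.website.com/page.html."
linked = URL.sub(r'<a href="\1">\1</a>', text)
print(linked)
```

The final \b can only sit between a word character and a non-word character, so the engine backs off the trailing dot and ends the match after "html".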

Regular Expression (preg_match)

This is the code that doesn't work:
<?php
$matchWith = " http://videosite.com/ID123 ";
preg_match_all('/\S\/videosite\.com\/(\w+)\S/i', $matchWith, $matches);
foreach($matches[1] as $value)
{
print 'Hyperlink';
}
?>
What I want is for it not to display the link if there is whitespace before or after it.
So now it should display nothing. But it still displays the link.
This can also match ID12, because 3 is not a space, and the / of http:/ is not a space either. You can try:
preg_match_all('/^\S*\/videosite\.com\/(\w+)\S*$/i', $matchWith, $matches);
So you don't want it to display if there is whitespace. Something like this should work (untested):
preg_match_all('/^\S+?videosite\.com\/(\w+)\S+?$/i', $matchWith, $matches);
You can try this. It works:
if (preg_match('%^\S*?/videosite\.com/(\w+)(?!\S+)$%i', $subject, $regs)) {
#$result = $regs[0];
}
But I am sure that after I post this, you will update your question :)
Explanation:
"
^ # Assert position at the beginning of the string
\S # Match a single character that is a “non-whitespace character”
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\/ # Match the character “/” literally
videosite # Match the characters “videosite” literally
\. # Match the character “.” literally
com # Match the characters “com” literally
\/ # Match the character “/” literally
( # Match the regular expression below and capture its match into backreference number 1
\w # Match a single character that is a “word character” (letters, digits, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
\S # Match a single character that is a “non-whitespace character”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
"
It would probably be simpler to use this regex:
'/^http:\/\/videosite\.com\/(\w+)$/i'
I believe you are referring to the whitespace before http and the whitespace after the directory. So you should use the ^ anchor to require that the string start with http, and the $ anchor at the end to require that it end with a word character.
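Here is a small Python sketch of the difference between the original pattern and the anchored one, using the question's input string. The original finds a "link" because \S can match the first / of // and the final 3. The anchored version rejects input with leading or trailing spaces.

```python
import re

subject = " http://videosite.com/ID123 "   # input from the question (note the spaces)

# Original pattern: \S matches the first "/" of "//" and the final "3",
# so it still finds a link, capturing "ID12".
print(re.findall(r'\S/videosite\.com/(\w+)\S', subject, re.IGNORECASE))  # ['ID12']

# Anchored pattern from the last answer: the whole string must be the URL.
ANCHORED = re.compile(r'^http://videosite\.com/(\w+)$', re.IGNORECASE)
print(ANCHORED.findall(subject))                        # [] (spaces around the URL)
print(ANCHORED.findall("http://videosite.com/ID123"))   # ['ID123']
```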
