preg_match_all for a string

preg_match_all for a string - php

I'm look to just get America/Chicago from this string
Local Time Zone (America/Chicago (CDT) offset -18000 (Daylight))
and have it work for other TimeZones, like America/Los_Angeles,America/New_York and so on. Im not very good with prey_match_all and also, if someone can direct me to a good tutorial on how to properly learn that,because this is the 3rd time ive needed to use it.

here is your solution with my regular expression
Code
$in = "Local Time Zone (America/Chicago (CDT) offset -18000 (Daylight))";
preg_match_all('/\(([A-Za-z0-9_\W]+?)\\s/', $in, $out);
echo "<pre>";
print_r($out[1][0]);
?>
And OUTPUT
America/Chicago
hope this will sure help you.

I'd use the regex: \((\S+)
preg_match_all('/\((\S+)/', $in, $out);
explanation:
The regular expression:
(?-imsx:\((\S+))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

Related

PHP Regex on CD Tracklists

I'm using preg_match to format tracklists so the track number, title and duration are separated into their own cells in a table:
<td>01</td><td>Track Title</td><td>01:23</td>
The problem is that the tracks themselves can take any of the following forms (the leading zeros on the track numbers and durations are not always present):
01. Track Title (01:23)
01. Track Title 01:23
01. Track Title
1 Track Title (01:23)
1 Track Title 01:23
1 Track Title
The following only works on tracks with a timestamp:
/([0-9]+)\.?[\s+](.*)[\s+](\?[0-5]?[0-9]:[0-5][0-9]\)?)/
So I added ? to the timestamp:
/([0-9]+)\.?[\s+](.*)[\s+]((\?[0-5]?[0-9]:[0-5][0-9]\)?)?/
This then works for tracks without a timestamp, but tracks with a timestamp end up with the timestamp stuck with the title, like so:
<td>01</td><td>Track Title 01:23</td><td></td>
EDIT: The tracklists are plaintext and are being pulled from an SQL table before parsing.

Try this:
/^([0-9]+)\.?[\s]+(.*)([\s]+(\(?[0-5]?[0-9]:[0-5][0-9]\)?))?$/U
Note I am used ungreedy pattern modifier U to try to match smallest matching string and I have anchored the beginning and end of the string.

By default regular expressions are greedy so the part for matching title .*
eats the rest of the string because last part with duration is optional.
Use /U modifier to turn on ungreedy behavior - look for PCRE_UNGREEDY on
http://us1.php.net/manual/en/reference.pcre.pattern.modifiers.php

How about:
^([0-9]+)\.?\s+(.*?)(?:\(?([0-5]?[0-9]:[0-5][0-9])\)?)?$
explanation:
The regular expression:
(?-imsx:^([0-9]+)\.?\s+(.*?)(?:\(?([0-5]?[0-9]:[0-5][0-9])\)?)?$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[0-9]+ any character of: '0' to '9' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\s+ whitespace (\n, \r, \t, \f, and " ") (1 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\(? '(' (optional (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
[0-5]? any character of: '0' to '5' (optional
(matching the most amount possible))
----------------------------------------------------------------------
[0-9] any character of: '0' to '9'
----------------------------------------------------------------------
: ':'
----------------------------------------------------------------------
[0-5] any character of: '0' to '5'
----------------------------------------------------------------------
[0-9] any character of: '0' to '9'
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
\)? ')' (optional (matching the most amount
possible))
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

Regex to find information in PHP function

I'm working on a regex but I'm not able to fix it.
I'm scanning documents (.php) with PHP and I'm looking for: $__this('[TEXT]') or $__this("[TEXT]")
So my question is: can somebody help me with a regex that searches in a string for: $__this('[TEXT]') or $__this("[TEXT]") and gives me [TEXT]
UPDATE (with answer, thanks to #Explosion Pills):
$string = '$__this("Foo Bar<br>HelloHello")';
preg_match('/\$__this\(([\'"])(.*?)\1\)/xi', $string, $matches);
print_r($matches);

preg_match('/
\$__this # just $__this. $ is meta character and must be escaped
\( # open paren also must be escaped
([\'"]) # open quote (capture for later use). \' is needed in string
(\[ # start capture. open bracket must also be escaped
.*? # Ungreedily capture whatever is between the quotes
\]) # close the open bracket and end capture
\1 # close the quote (captured earlier)
\) # close the parentheses
/xi' # ignore whitespace in pattern, allow comments, case insensitive
, $document, $matches);
The captured text will be in $matches[2]. This assumes one possible capture per line. If you need more, use preg_match_all.

how about:
preg_match('/\$__this(?:(\'|")\((.+?)\)\1)/', $string);
explanation:
(?-imsx:\$__this(?:(\'|")\((.+?)\)\1))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
\$ '$'
----------------------------------------------------------------------
__this '__this'
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\' '''
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
" '"'
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\( '('
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
.+? any character except \n (1 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\) ')'
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

Here's a solution that will catch strings with quotes and apostrophes in them as well.
$txt = "
blah blah blah
blah \$_this('abc') blah
blah \$_this('a\"b\"c') blah balah \$_this('a\"b\"c\'')
\$_this(\"123\");\$_this(\"1'23\") \$_this(\"1'23\\\"\")
";
$matches = array();
preg_match_all('/(?:\$_this\()(?:[\'"])(.*?[^\\\])(?:[\'"])(?:\))/im', $txt, $matches);
print_r($matches[1]);

How do you create a string to match an regex?

I need to create a formatting documentation. I know the regex that are used to format the text but I don't know how to reproduce an example for that regex.
This one should be an internal link:
'{\[((?:\#|/)[^ ]*) ([^]]*)\]}'
Can anyone create an example that would match this, and maybe explain how he got it. I got stuck at '?'.
I never used this meta-character at the beginning, usually I use it to mark that an literal cannot appear or appear exactly once.
Thanks

(?:...) has the same grouping effect as (...), but without "capturing" the contents of the group; see http://php.net/manual/en/regexp.reference.subpatterns.php.
So, (?:\#|/) means "either # or /".
I'm guessing you know that [^ ]* means "zero or more characters that aren't SP", and that [^]]* means "zero or more characters that aren't right-square-brackets".
Putting it together, one possible string is this:
'{[/abcd asdfasefasdc]}'

See Open source RegexBuddy alternatives and Online regex testing for some helpful tools. It's easiest to have a regex explained by them first. I used YAPE here:
NODE EXPLANATION
----------------------------------------------------------------------
\[ '['
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
\# '#'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
[^ ]* any character except: ' ' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
' '
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
[^]]* any character except: ']' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
\] ']'
----------------------------------------------------------------------
This is under the presumption that { and } in your example are the regex delimiters.
You can just read through the list of explanations and come up with a possible source string such as:
[#NOSPACE NOBRACKET]

I think this is a good post to help design regex. While its fairly easy to write a
general regex to match a string, sometimes its helpfull to look at it in reverse after
its designed. Sometimes it is necessary to see what bizzar things will match.
When mixing a lot of the metachars as literals, its fairly important to format
these kind for ease of reading and to avoid errors.
Here are some samples in Perl which were easier (for me) to prototype.
my #samps = (
'{[/abcd asdfasefasdc]}',
'{[# ]}',
'{[# /# \/]}',
'{[/# {[
| /# {[#\/} ]}',
,
);
for (#samps) {
if (m~{\[([#/][^ ]*) ([^]]*)\]}~)
{
print "Found: '$&'\ngrp1 = '$1'\ngrp2 = '$2'\n===========\n\n";
}
}
__END__
Expanded
\{\[
(
[#/][^ ]*
)
[ ]
(
[^\]]*
)
\]\}
Output
Found: '{[/abcd asdfasefasdc]}'
grp1 = '/abcd'
grp2 = 'asdfasefasdc'
===========
Found: '{[# ]}'
grp1 = '#'
grp2 = ''
===========
Found: '{[# /# \/]}'
grp1 = '#'
grp2 = '/# \/'
===========
Found: '{[/# {[
| /# {[#\/} ]}'
grp1 = '/# {[
|'
grp2 = '/# {[#\/} '
===========

Understanding Pattern in preg_match_all() Function Call

I am trying to understand how preg_match_all() works and when looking at the documentation on the php.net site, I see some examples but am baffled by the strings sent as the pattern parameter. Is there a really thorough, clear explanation out there? For example, I don't understand what the pattern in this example means:
preg_match_all("/\(? (\d{3})? \)? (?(1) [\-\s] ) \d{3}-\d{4}/x",
"Call 555-1212 or 1-800-555-1212", $phones);
or this:
$html = "<b>bold text</b><a href=howdy.html>click me</a>";
preg_match_all("/(<([\w]+)[^>]*>)(.*?)(<\/\\2>)/", $html, $matches, PREG_SET_ORDER);
I've taken an introductory class on PHP, but never saw anything like this. Some clarification would be appreciated.
Thanks!

Those aren't "PHP patterns", those are Regular Expressions. Instead of trying to explain what has been explained before a thousand times in this answer, I'll point you to http://regular-expressions.info for information and tutorials.

You are looking for this,
PHP PCRE Pattern Syntax
PCRE Standard syntax
Note that first one is a subset of second one.

Also have a look at YAPE, which for example gives this nice textual explanation for your first regex:
(?x-ims:\(? (\d{3})? \)? (?(1) [\-\s] ) \d{3}-\d{4})
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?x-ims: group, but do not capture (disregarding
whitespace and comments) (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n):
----------------------------------------------------------------------
\(? '(' (optional (matching the most amount
possible))
----------------------------------------------------------------------
( group and capture to \1 (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\d{3} digits (0-9) (3 times)
----------------------------------------------------------------------
)? end of \1 (NOTE: because you are using a
quantifier on this capture, only the LAST
repetition of the captured pattern will be
stored in \1)
----------------------------------------------------------------------
\)? ')' (optional (matching the most amount
possible))
----------------------------------------------------------------------
(?(1) if back-reference \1 matched, then:
----------------------------------------------------------------------
[\-\s] any character of: '\-', whitespace (\n,
\r, \t, \f, and " ")
----------------------------------------------------------------------
| else:
----------------------------------------------------------------------
succeed
----------------------------------------------------------------------
) end of conditional on \1
----------------------------------------------------------------------
\d{3} digits (0-9) (3 times)
----------------------------------------------------------------------
- '-'
----------------------------------------------------------------------
\d{4} digits (0-9) (4 times)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

The pattern you write about is a mini-language in it's own called Regular Expression. It's specialized on finding patterns in strings, do replacements etc. for everything that follows some sort of pattern.
More specifically it's a Perl Compatible Regular Expression (PCRE).
The handbook for that language is not available on the PHP manual website, you find it here: PCRE Manpage.
A well made step-by-step introduction is on the Regular Expressions Info Website.

How does this regex divide text into sentences?

I know this regex divides a text into sentences. Can someone help me understand how?
/(?<!\..)([\?\!\.])\s(?!.\.)/

You can use YAPE::Regex::Explain to decipher Perl regular expressions:
use strict;
use warnings;
use YAPE::Regex::Explain;
my $re = qr/(?<!\..)([\?\!\.])\s(?!.\.)/;
print YAPE::Regex::Explain->new($re)->explain();
__END__
The regular expression:
(?-imsx:(?<!\..)([\?\!\.])\s(?!.\.))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
(?<! look behind to see if there is not:
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
) end of look-behind
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[\?\!\.] any character of: '\?', '\!', '\.'
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

There is the Regular Expression Analyzer which will do quite the same as toolic already suggested - but completely webbased.

(? # Find a group (don't capture)
< # before the following regular expression
! # that does not match
\. # a literal "."
. # followed by 1 character
) # (End look-behind group)
( # Start a group (capture it to $1)
[\?\!\.] # Containing any one of the characters in the following set "?!."
) # End group $1
\s # followed by a whitespace character " ", \t, etc.
(? # Followed by a group (don't capture)
# after the preceding regular expression
! # that does not have
. # 1 character
\. # followed by a literal "."
) # (End look-ahead group)

The first part (?<!\..) is a negative look-behind. It specifies a pattern which invalidates the match. In this case it's looking for two characters--the first a period and the other one any character.
The second part is a standard capture/group, which could be better expressed: ([?!.]) (you don't need the escapes in the class brackets), that is a sentence ending punctuation character.
The next part is a single (??) white-space character: \s
And the last part is a negative look-ahead: (?!.\.). Again it is guarding against the case of a single character followed by a period.
This should work, relatively well. But I don't think I would recommend it. I don't see what the coder was getting at trying to make sure that just a period wasn't the second most recent character, or that it wasn't the second one to come.
I mean if you are looking to split on terminal punctuation, why don't you want to guard against the same class being two-back or two-ahead? Instead it relies on periods not being there. Thus a more regular expression would be:
/(?<![?!.].)([?!.])\s(?!.[?!.])/

Portions:
([\?\!\.])\s: split by ending character (.,!,or ?) which is followed by a whitespace character (space, tab, newline)
(?<!\..) where the characters before this 'ending character' arent a .+anything
(?!.\.) after the whitespace character any character directly followed by any . isn't allowed.
Those look-ahead ((?!) & look-behind ((?<!) assertions mainly seem to prevent splitting on (whitespaced?) abbreviations (q. e. d. etc.).

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

preg_match_all for a string - php

here is your solution with my regular expression Code $in = "Local Time Zone (America/Chicago (CDT) offset -18000 (Daylight))"; preg_match_all('/\(([A-Za-z0-9_\W]+?)\\s/', $in, $out); echo "<pre>"; print_r($out[1][0]); ?> And OUTPUT America/Chicago hope this will sure help you.

Related

PHP Regex on CD Tracklists

Regex to find information in PHP function

How do you create a string to match an regex?

Understanding Pattern in preg_match_all() Function Call

How does this regex divide text into sentences?

Categories

Resources