PCRE regex for movie data - php

I have a string like this:
<14> south.park.s14e01.locdog.avi [190713856]
I need a PHP regexp to get an array like this:
array(14, 'south.park.s14e01.locdog.avi', 190713856)
Please help.

preg_match('/^<(\d+)> \s+ (\S+) \s+ \[(\d+)\]$/x', $input, $your_array);
Your desired results are in $your_array, starting at index 1 (index 0 holds the whole match).

$test = '<14> south.park.s14e01.locdog.avi [190713856]';
preg_match('/<(\d{2})>\s(.+)\s\[(\d{9})\]/',$test,$m);
print_r($m);//[1] => 14 [2] => south.park.s14e01.locdog.avi [3] => 190713856
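Both answers return every capture as a string, while the question asks for integers in the first and last positions. Here is a minimal sketch, assuming the input always has exactly this shape, that drops the whole match and casts the numeric captures (the variable names are illustrative):

```php
<?php
$input = '<14> south.park.s14e01.locdog.avi [190713856]';

$result = null;
if (preg_match('/^<(\d+)>\s+(\S+)\s+\[(\d+)\]$/', $input, $m)) {
    // $m[0] is the whole match; keep only the three captures and
    // cast the numeric ones from string to int.
    $result = [(int) $m[1], $m[2], (int) $m[3]];
}

var_export($result);
```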

Related

Split string after each number

I have a database full of strings that I'd like to split into an array. Each string contains a list of directions that begin with a letter (U, D, L, R for Up, Down, Left, Right) and a number to tell how far to go in that direction.
Here is an example of one string.
$string = "U29R45U2L5D2L16";
My desired result:
['U29', 'R45', 'U2', 'L5', 'D2', 'L16']
I thought I could just loop through the string, but I don't know how to tell whether the number is one digit long or more.
You can use preg_split to break up the string, splitting on something that looks like a U, L, D or R followed by digits, and using PREG_SPLIT_DELIM_CAPTURE to keep the split text (PREG_SPLIT_NO_EMPTY discards the empty strings left between the delimiters):
$string = "U29R45U2L5D2L16";
print_r(preg_split('/([UDLR]\d+)/', $string, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY));
Output:
Array (
[0] => U29
[1] => R45
[2] => U2
[3] => L5
[4] => D2
[5] => L16
)
Demo on 3v4l.org
A regular expression should help you:
<?php
$string = "U29R45U2L5D2L16";
preg_match_all("/[A-Z]\d+/", $string, $matches);
var_dump($matches[0]); // $matches[0] holds every full-pattern match: U29, R45, U2, L5, D2, L16
Because this task is about text extraction and not about text validation, you can simply split at the zero-width position after one or more digits. In other words, match one or more digits, then forget them with \K so that they are not consumed while splitting.
Code: (Demo)
$string = "U29R45U2L5D2L16";
var_export(
preg_split(
'/\d+\K/',
$string,
0,
PREG_SPLIT_NO_EMPTY
)
);
Output:
array (
0 => 'U29',
1 => 'R45',
2 => 'U2',
3 => 'L5',
4 => 'D2',
5 => 'L16',
)
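The \K answer deliberately skips validation. If you do want to reject malformed strings before extracting, one option is to check the whole string first and then collect the tokens. A sketch, where parseDirections() is a hypothetical helper name (the ?array return type needs PHP 7.1 or later):

```php
<?php
// Hypothetical helper: returns the direction tokens, or null when the
// string is not made only of U/D/L/R letters each followed by digits.
function parseDirections(string $s): ?array
{
    // Anchored check: the entire string must be one or more tokens.
    // \z (unlike $) does not accept a trailing newline.
    if (preg_match('/\A(?:[UDLR]\d+)+\z/', $s) !== 1) {
        return null;
    }
    // Same token pattern, unanchored, to collect every occurrence.
    preg_match_all('/[UDLR]\d+/', $s, $m);
    return $m[0];
}

var_export(parseDirections('U29R45U2L5D2L16'));
```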

Can preg_match() capture unknown number of occurrences?

Let's say I'm having the following string:
$string = 'cats[Garfield,Tom,Azrael]';
I need to capture the following strings:
cats
Garfield
Tom
Azrael
That string can be any word-like text, followed by brackets with the list of comma-separated word-like entries. I tried the following:
preg_match('#^(\w+)\[(\w+)(?:,(\w+))*\]$#', $string, $matches);
The problem is that $matches ignores Tom, matching only the first and the last cat.
Now, I know how to do that with more calls, perhaps combining preg_match() and explode(), so the question is not how to do it in general.
The question is: can that be done in single preg_match(), so I could validate and match on one go?
The underlying question seems to be: is it possible to extract each occurrence of a repeated capture group?
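The answer is no: a capture group inside a quantifier only keeps the value from its last repetition, which is why Tom is lost.
However, several workarounds exist:
The most understandable one uses two steps: you capture the full list and then you split it. Something like: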
$str = 'cats[Garfield,Tom,Azrael,Supermatou]';
if ( preg_match('~(?<item>\w+)\[(?<list>\w+(?:,\w+)*)]~', $str, $m) )
$result = [ $m['item'], explode(',', $m['list']) ];
(or any structure you want)
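A self-contained run of the two-step approach on the sample string. The pattern is anchored with ^ and $ here (the answer's pattern is not) so that the whole string is validated, as the question asks; the [item, list] shape of $result is just one choice:

```php
<?php
$str = 'cats[Garfield,Tom,Azrael,Supermatou]';

$result = null;
// Step 1: validate the format and capture the item and the whole list.
if (preg_match('~^(?<item>\w+)\[(?<list>\w+(?:,\w+)*)]$~', $str, $m)) {
    // Step 2: split the captured list on commas.
    $result = [$m['item'], explode(',', $m['list'])];
}

print_r($result);
```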
Another workaround uses preg_match_all in conjunction with the \G anchor, which matches either at the start of the string or at the position right after the previous successful match:
$pattern = '~(?:\G(?!\A),|(?<item>\w+)\[(?=[\w,]+]))(?<elt>\w+)~';
if ( preg_match_all($pattern, $str, $matches) )
print_r($matches);
This design ensures that all elements are between the brackets.
To obtain a flatter result, you can also write it like this:
$pattern = '~\G(?!\A)[[,]\K\w+|\w+(?=\[[\w,]+])~';
Details of this last pattern:
~
# first alternative (can't be the first match)
\G (?!\A) # position after the last successful match
# (the negative lookahead discards the start of the string)
[[,] # an opening bracket or a comma
\K # reset the start of the reported match to this position
\w+ # an element
| # OR
# second alternative (the first match)
\w+ # the item
(?= # lookahead to check forward if the format is correct
\[ # opening bracket
[\w,]+ # word characters and comma (feel free to be more descriptive
# like \w+(?:,\w+)* or anything you want)
] # closing bracket
)
~
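A runnable check of the flat pattern on the same sample string, plus one input without brackets to show that nothing is matched when the format is wrong:

```php
<?php
$str = 'cats[Garfield,Tom,Azrael,Supermatou]';
$pattern = '~\G(?!\A)[[,]\K\w+|\w+(?=\[[\w,]+])~';

$flat = [];
if (preg_match_all($pattern, $str, $matches)) {
    // No capture groups, so $matches[0] holds every match in order:
    // the item first, then each element of the list.
    $flat = $matches[0];
}
print_r($flat);

// Without brackets the lookahead never succeeds, and \G(?!\A) cannot
// start a chain on its own, so there are no matches at all.
$none = preg_match_all($pattern, 'cats Garfield', $unused);
var_dump($none);
```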
Why not a simple preg_match_all? (It extracts the words, but unlike the patterns above it does not check the format.)
$string = 'cats[Garfield,Tom,Azrael], entity1[child11,child12,child13], entity2:child21&child22&child23';
preg_match_all('#\w+#', $string, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => cats
[1] => Garfield
[2] => Tom
[3] => Azrael
[4] => entity1
[5] => child11
[6] => child12
[7] => child13
[8] => entity2
[9] => child21
[10] => child22
[11] => child23
)
)

PHP regex to extract special string

I am trying to use regex to extract a certain syntax, in my case something like "10.100" or "20.111", in which 2 numbers are separated by dot(.) . So if I provide "a 10.100", it will extract 10.100 from the string. If I provide "a 10.100 20.101", it will extract 10.100 and 20.101.
Until now I have tried to use
preg_match('/^.*([0-9]{1,2})[^\.]([0-9]{1,4}).*$/', $message, $array);
but still no luck. Please provide any suggestion because I don't have strong regex knowledge. Thanks.
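You may use the pattern below. (In your attempt, [^\.] matches any single character except a dot, the opposite of what you want, and preg_match() stops after the first match; use preg_match_all() to get them all.)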
\b[0-9]{1,2}\.[0-9]{1,4}\b
See the regex demo.
Details:
\b - a leading word boundary
[0-9]{1,2} - 1 or 2 digits
\. - a dot
[0-9]{1,4} - 1 to 4 digits
\b - a trailing word boundary.
If you do not care about the whole word option, just remove \b. Also, to match just 1 or more digits, you may use + instead of the limiting quantifiers. So, perhaps
[0-9]+\.[0-9]+
will also work for you.
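To see the difference the word boundaries and limiting quantifiers make, here is a small check of both patterns on a made-up input (the extra values 123.4 and 1.23456 are illustrative only):

```php
<?php
$message = 'a 10.100 20.101 123.4 1.23456';

// 1-2 digits, a literal dot, 1-4 digits, bounded on both sides.
preg_match_all('/\b[0-9]{1,2}\.[0-9]{1,4}\b/', $message, $strict);

// Looser variant: any run of digits on either side of the dot.
preg_match_all('/[0-9]+\.[0-9]+/', $message, $loose);

print_r($strict[0]);
print_r($loose[0]);
```

The strict pattern keeps only 10.100 and 20.101; the loose one also returns 123.4 and 1.23456.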
See a PHP demo:
$re = '/[0-9]+\.[0-9]+/';
$str = 'I am trying to use regex to extract a certain syntax, in my case something like "10.100" or "20.111", in which 2 numbers are separated by dot(.) . So if I provide "a 10.100", it will extract 10.100 from the string. If I provide "a 10.100 20.101", it will extract 10.100 and 20.101.';
preg_match_all($re, $str, $matches);
print_r($matches[0]);
Output:
Array
(
[0] => 10.100
[1] => 20.111
[2] => 10.100
[3] => 10.100
[4] => 10.100
[5] => 20.101
[6] => 10.100
[7] => 20.101
)
Regex: /\d+(?:\.\d+)/
1. \d+ matches one or more digits.
2. (?:\.\d+) matches a dot followed by one or more digits, like .1234
Try this code snippet here
<?php
ini_set('display_errors', 1);
$string='a 10.100 20.101';
preg_match_all('/\d+(?:\.\d+)/', $string, $array);
print_r($array);
Output:
Array
(
[0] => Array
(
[0] => 10.100
[1] => 20.101
)
)
$decimals = "10.5 100.50 10.250";
preg_match_all('/\b\d{2}\.\d+\b/', $decimals, $output); // exactly two digits before the dot, so 100.50 is skipped
print_r($output[0]);
Output:
Array
(
[0] => 10.5
[1] => 10.250
)
Regex Demo | Php Demo

split string by spaces and colon but not if inside quotes

having a string like this:
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
the desired result is:
[0] => Array (
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
what I get with:
preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);
is:
[0] => Array (
[0] => dateto:'2015-10-07
[1] => 15:05'
[2] => xxxx
[3] => datefrom:'2015-10-09
[4] => 15:05'
[5] => yyyy
[6] => asdf
)
Also tried with preg_split("/[\s]+/", $str) but no clue how to escape if value is between quotes. Can anyone show me how and also please explain the regex. Thank you!
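I would use the PCRE verbs (*SKIP)(*F). The first alternative '[^']*' matches a whole quoted section; (*SKIP)(*F) then makes that attempt fail and tells the engine to resume after the quoted section, so whitespace inside quotes can never become a split point. The second alternative \s+ splits on all remaining whitespace: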
preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
DEMO
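A runnable version of the (*SKIP)(*F) split on the question's string, with comments describing each part of the pattern:

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// '[^']*'  matches a whole single-quoted section;
// (*SKIP)  if the match fails after this point, resume after the section;
// (*F)     force that failure, so quoted text is never a split point;
// |\s+     otherwise, split on any run of whitespace.
$parts = preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);

print_r($parts);
```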
Often, when you are looking to split a string, preg_split isn't the best approach (that seems a little counter-intuitive, but it's true most of the time). A more efficient way is to find all items with preg_match_all, using a pattern that describes everything that is not the delimiter (whitespace here):
$pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;
if (preg_match_all($pattern, $str, $m))
$result = $m[0];
pattern details:
~ # pattern delimiter
(?=\S) # the lookahead assertion only succeeds if there is a non-
# white-space character at the current position.
# (This lookahead is useful for two reasons:
# - it allows the regex engine to quickly find the start of
# the next item without having to test each branch of the
# following alternation at each position in the string
# until one succeeds.
# - it ensures that there's at least one non-whitespace character.
# Without it, the pattern may match an empty string.
# )
[^'"\s]* #"'# all that is not a quote or a white-space
(?: # eventual quoted parts
'[^']*' [^'"\s]* #"# single quotes
|
"[^"]*" [^'"\s]* # double quotes
)*
~
demo
Note that with this somewhat long pattern, the five items of your example string are found in only 60 steps. You can also use this shorter, simpler pattern:
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
but it's a little less efficient.
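A quick check that the long and the short patterns return the same five items for the question's string (both are written as nowdocs so the quotes need no escaping):

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Long pattern: leading lookahead, then unquoted text interleaved with
// single- or double-quoted sections.
$long = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

// Short pattern: one or more unquoted runs or quoted sections.
$short = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

preg_match_all($long, $str, $m1);
preg_match_all($short, $str, $m2);

print_r($m1[0]);
var_dump($m1[0] === $m2[0]);
```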
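For your example, you can use preg_split with a negative lookbehind (?<!\d), which splits only on whitespace that does not follow a digit. This works here because every space inside the quotes happens to follow a digit; the pattern does not actually look at quotes: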
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
$matches = preg_split('/(?<!\d)(\s)/', $str);
print_r($matches);
Output:
Array
(
[0] => dateto:'2015-10-07 15:05'
[1] => xxxx
[2] => datefrom:'2015-10-09 15:05'
[3] => yyyy
[4] => asdf
)
Demo:
http://ideone.com/EP06Nt
Regex Explanation:
(?<!\d)(\s)
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\d)»
Match a single character that is a “digit” «\d»
Match the regex below and capture its match into backreference number 1 «(\s)»
Match a single character that is a “whitespace character” «\s»
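Because the lookbehind only tests the character before each whitespace, it can both split inside quotes and miss a split outside them. A quick check on a hypothetical input, compared with the (*SKIP)(*F) split that tracks the quotes:

```php
<?php
$str = "name:'John Smith' age:42 city:'Oslo'";

// Split on whitespace not preceded by a digit.
$lookbehind = preg_split('/(?<!\d)(\s)/', $str);

// Split on whitespace outside single quotes, using (*SKIP)(*F).
$quoteAware = preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);

print_r($lookbehind);
print_r($quoteAware);
```

The lookbehind version splits "John Smith" apart and leaves "age:42 city:'Oslo'" joined, while the quote-aware split returns the three expected items.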

Not able to match regex

I have a string like "5-2,5-12,15-27,5-22,50-3,5-100"
I need a regular expression that matches all occurrences like the ones below:
5-2
5-12
5-22
5-100
What would be the correct regex to match all of them?
Use the regex below:
(?<!\d)5-\d{1,}
DEMO
I'm not sure I fully understand your needs, but how about:
$str = "5-2,5-12,15-27,5-22,50-3,5-100";
preg_match_all('/\b5-\d+/', $str, $matches);
print_r($matches);
or
preg_match_all('/\b\d-\d+/', $str, $matches);
Output:
Array
(
[0] => Array
(
[0] => 5-2
[1] => 5-12
[2] => 5-22
[3] => 5-100
)
)
How about:
Online Demo
/(?<!\d)\d\-\d{1,3}/g
If I understand correctly, the first part of the pattern is a single digit \d, so we need a lookbehind (?<!\d) to exclude longer numbers such as the 15 in 15-27. It is followed by a - and then a number of up to 3 digits; if you need more, remove the 3, i.e. use \d{1,} instead of \d{1,3}. Note that the g flag is JavaScript-style syntax: in PHP, write the pattern without it and use preg_match_all() to get every match.
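A minimal PHP check of the lookbehind approach for the 5-x case asked about, using \d+ in place of \d{1,} (the two are equivalent); preg_match_all() collects every match without any flag:

```php
<?php
$str = '5-2,5-12,15-27,5-22,50-3,5-100';

// (?<!\d) rejects a 5 that ends a longer number (15-27),
// the literal "5-" rejects 50-3, and \d+ takes all following digits.
preg_match_all('/(?<!\d)5-\d+/', $str, $matches);

print_r($matches[0]);
```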
