Regular expression - Extracting strings from a repeating format - php

EDIT - SOLVED:
Thanks for the responses - I learned that this is actually in serialised format and that there's no need to process it using RegEx.
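Since the data turned out to be serialized, here is a minimal sketch of reading it with unserialize() instead of a regex. It assumes the complete serialized value is available, i.e. the question's string wrapped in the a:4:{...} header that serialize() produces for a four-element array. Passing only the bare inner string would not give you the array back.

```php
<?php
// Assumption: this is the full value produced by serialize(); the question
// shows only the part between the braces.
$serialized = 'a:4:{i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";}';

// allowed_classes => false stops unserialize() from instantiating objects,
// which matters if the input is not trusted.
$values = unserialize($serialized, ['allowed_classes' => false]);

// $values is ['1', '3', '5', '6'] (the elements were serialized as strings)
print_r($values);

// Round trip: serializing the array again gives back the same string.
var_dump(serialize($values) === $serialized); // bool(true)
```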
Apologies for the newb question. I have tried many, many variations based on Stack Overflow answers, with no luck. I've also spent a while experimenting with an online regex tool to try to solve this myself.
This is the string I'm checking:
i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";
I'll settle for matching these strings:
i:0;s:1:"1";
i:1;s:1:"3";
i:2;s:1:"5";
i:3;s:1:"6";
But ideally I would like to capture only the values between the quotes
(there could be anywhere from 1 to 10 of these entries),
i.e. regex_result = [1,3,5,6]
These are some of the regexes I've tried.
I've only been able to capture either the first match or the last match, but not all matches. I'm confused as to why the regex isn't "repeating" as I'd expected:
(i:.;s:1:".";)*
(i:.;s:1:".";)+
(i:.;s:1:".";)+?
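A sketch of why these attempts behave this way: a capturing group inside a quantifier is overwritten on every repetition, so PCRE reports only the last iteration (or, with the lazy +?, the single first one), and preg_match() stops after one overall match. Collecting every entry needs preg_match_all(). The last pattern below is my own illustration, not taken from the answers.

```php
<?php
$string = 'i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";';

// Greedy repetition: the whole string matches, but group 1 is overwritten
// on each iteration, so only the last entry survives.
preg_match('/(i:.;s:1:".";)+/', $string, $greedy);
echo $greedy[1], PHP_EOL; // i:3;s:1:"6";

// Lazy repetition: +? stops after one iteration, so group 1 holds the first entry.
preg_match('/(i:.;s:1:".";)+?/', $string, $lazy);
echo $lazy[1], PHP_EOL;   // i:0;s:1:"1";

// preg_match_all() produces one match per entry instead.
preg_match_all('/i:.;s:1:"(.)";/', $string, $all);
print_r($all[1]);          // the values 1, 3, 5 and 6
```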
Thanks

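You can use this regex with preg_match_all() (PHP patterns take no g modifier; preg_match_all() already returns every match):
/(?<=:")\d+(?=";)/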
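A minimal PHP sketch of the lookaround pattern, using the question's string and preg_match_all() to collect every match:

```php
<?php
$string = 'i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";';

// (?<=:") requires :" right before the digits and (?=";) requires ";
// right after them, so only the digits themselves end up in the match.
preg_match_all('/(?<=:")\d+(?=";)/', $string, $matches);

print_r($matches[0]); // the values 1, 3, 5 and 6
```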

"([^"]*)"
Try this. See the demo:
http://regex101.com/r/hQ1rP0/43
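A sketch of this pattern in PHP: the values end up in capture group 1, so they are read from $matches[1] rather than $matches[0] (which still includes the quotes).

```php
<?php
$string = 'i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";';

// Group 1 captures whatever sits between each pair of double quotes.
preg_match_all('/"([^"]*)"/', $string, $matches);

print_r($matches[1]); // the values 1, 3, 5 and 6
```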

You can use \G so that it gets each number within double quotes that is preceded by i:.;s:1:" (here the dot after i: represents any character). The anchor \G matches at the position where the previous match ended.
<?php
$string = 'i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";';
echo preg_match_all('~(?:i:.;s:1:"|(?<!^)\G)(.)(?=";)~', $string, $match);
print_r($match[1]);
?>
Output (the leading 4 is the number of matches, printed by echo):
4Array
(
[0] => 1
[1] => 3
[2] => 5
[3] => 6
)

Related

regex to convert string 018v-s001v => 18v-s1v but 020v_001 => 20v_001

I'm struggling with a regex to convert the following strings:
018v-s001v => 18v-s1v
018v-s001r => 18v-s1r
018r-s002v => 18r-s2v
020v_001 => 20v_001
020r_002 => 20r_002
0001 => 0001
I managed to convert the first three cases, but I'm struggling with the last three: how do I preserve the zeros after the _ and all of the zeros in the last case?
My attempt: (0*)([1-9]{0,4}[vr]?)((-s)?+([0]{0,2}))?+([1-9][vr])?
https://regex101.com/r/2go5KO/1
For your given examples, you could use
000\d+(*SKIP)(*FAIL)|(?<=\b|[a-z])0+
See a demo on regex101.com.
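A minimal sketch running this pattern through preg_replace() over the six sample strings from the question (passed as an array, which preg_replace() accepts and returns as an array):

```php
<?php
$inputs = ['018v-s001v', '018v-s001r', '018r-s002v', '020v_001', '020r_002', '0001'];

// 000\d+(*SKIP)(*FAIL): a run starting with three zeros (like 0001) is
// matched, then discarded, and the engine resumes after it, so it stays intact.
// (?<=\b|[a-z])0+: otherwise, remove zeros at a word boundary or after a letter.
$result = preg_replace('/000\d+(*SKIP)(*FAIL)|(?<=\b|[a-z])0+/', '', $inputs);

print_r($result);
```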
To get the expected result for your example data, you might use preg_replace.
You could match one or more zeros (0+), then capture one or more digits followed by a v or r, using a character class for the letter ([0-9]+[vr]).
Regex
0+([0-9]+[vr])
Replace
Captured group 1 $1
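A sketch applying this regex and replacement to the six sample strings. Because the captured digits must be followed by v or r, the zeros after the underscore and in 0001 are never matched:

```php
<?php
$inputs = ['018v-s001v', '018v-s001r', '018r-s002v', '020v_001', '020r_002', '0001'];

// 0+ consumes the leading zeros; group 1 keeps the digits that follow,
// but only when a v or r comes right after them.
$result = preg_replace('/0+([0-9]+[vr])/', '$1', $inputs);

print_r($result);
```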
How about this one:
$result = preg_replace('/(?:(\d{4})|(0)?(\d{2}\w))(?:([-_])(?:(\d{3})|(\w)(0+)(\d+?\w)))?/m',
'$1$3$4$5$6$8', $subject);
This produces all the results you require from your test strings. It wasn't clear where a zero will definitely appear and where it is only optional, but I'm sure the pattern can be adapted. I also noticed that the separator was sometimes a hyphen (-) and sometimes an underscore (_), and it wasn't clear whether that was just your typing or significant; in any case, I've assumed it could be either.
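A sketch running this replacement over the six sample strings. Groups 2 and 7 hold the zeros being dropped, so they are left out of the replacement string; groups that did not take part in a match expand to empty strings.

```php
<?php
$inputs = ['018v-s001v', '018v-s001r', '018r-s002v', '020v_001', '020r_002', '0001'];

// Branch (\d{4}) keeps four-digit values such as 0001 unchanged.
// Branch (0)?(\d{2}\w) drops an optional leading zero (group 2).
// The optional tail keeps the separator, then either three digits as-is
// (group 5) or a letter (group 6) plus digits without their zeros (group 8).
$pattern = '/(?:(\d{4})|(0)?(\d{2}\w))(?:([-_])(?:(\d{3})|(\w)(0+)(\d+?\w)))?/m';
$result = preg_replace($pattern, '$1$3$4$5$6$8', $inputs);

print_r($result);
```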

preg_match - console.log removing

This is the scenario:
A JS file is loaded into a string using file_get_contents
I want to remove all debugging info from it
To find out what's happening in the PHP code, I am using preg_match
I'm using this expression:
(\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*?[^}(])\);?$
On the regex101 and phpliveregex websites, it matches:
//console.log(abc)
// console.log(abc)
// console.log(abc);
// console.log('abc');
console.log(abc);
console.log('abc' + some_function());
etc...
But when I put it in PHP code like this:
preg_match('/(\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*?[^}(])\);?$/', $js_code, $matches);
if (!empty($matches[0])) print_r($matches[0]);
I don't get any matches. I'm too tired to notice what I'm missing; it's probably something staring at me with its big eyes. :)
Any help would be appreciated.
After some further investigation, I improved my regex pattern to match every combination.
@Jan, your answer pushed me in the right direction.
((\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)(\s*?)\((.*[^}(])(\){1,});?)
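The thread does not spell out why the first pattern found nothing in PHP. One likely cause, offered here as an assumption: without the m modifier, $ matches only at the very end of the subject (or just before a final newline), so in a multi-line JS file a console call that is not on the last line can never satisfy \);?$. The improved pattern above no longer ends in $. A minimal sketch with a hypothetical two-line snippet:

```php
<?php
// Hypothetical two-line JS snippet: the console call is NOT on the last line.
$js_code = "console.log(abc);\nvar x = 1;";

$pattern = 'console\.log\((.*?)\);?$';

// Without m, $ can only match at the end of the subject, so nothing is found.
$withoutM = preg_match('/' . $pattern . '/', $js_code);

// With m, $ also matches just before each newline, so line 1 is found.
$withM = preg_match('/' . $pattern . '/m', $js_code);

echo $withoutM, ' ', $withM, PHP_EOL; // 0 1
```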
Why so complicated? Do you need to distinguish between the different functions (log, etc.)? The following regex matches all of your examples above.
$regex = '/(?<console>(?:\/\/)?\s*console\.[^;]+;)/';
# named capturing group "console": two optional forward slashes,
# followed by optional whitespace,
# then "console." literally and everything up to and including a semicolon
# (no g modifier: PHP has none, and preg_match_all() already finds every match)
preg_match_all($regex, $js_string, $matches);
print_r($matches["console"]);
If, as per your comment, you also need to match the actual method name, you could alter the regex like so:
$regex = '/(?<console>(?:\/\/)?\s*console\.(?<function>[^(]+)[^;]+;)/';
Now $matches["function"] holds the actual method name.
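A small sketch of the named-group version, run on a hypothetical two-line JS string invented for illustration. The pattern relies on each call ending in a semicolon; a call without one would run on to the next semicolon, even across lines.

```php
<?php
// Hypothetical JS source; both calls end with a semicolon.
$js_string = "console.log(abc);\n// console.warn('x');";

$regex = '/(?<console>(?:\/\/)?\s*console\.(?<function>[^(]+)[^;]+;)/';
preg_match_all($regex, $js_string, $matches);

print_r($matches['console']);  // the two complete calls
print_r($matches['function']); // log, warn
```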
So this is what I did to approach your problem. Hopefully it works for you.
// DEFINE THE STRING
$string = "
<br>Other Text Goes Here
//console.log(abc)
// console.log(abc)
// console.log(abc);
// console.log('abc');
<br>More Text Here
console.log(abc);
console.warn('abc' + some_function());
console.log('abc' + some_function());
<br>And More Text Goes Here";
// DO THE PREG_MATCH_ALL TO FIND ALL OCCURRENCES
preg_match_all('~(?://)?\s*console\.[A-Z]+\(.*?$~sim', $string, $matches);
print "<pre>"; print_r($matches[0]); print "</pre>";
That will give you the following:
Array
(
[0] => //console.log(abc)
[1] => // console.log(abc)
[2] => // console.log(abc);
[3] => // console.log('abc');
[4] =>
console.log(abc);
[5] =>
console.warn('abc' + some_function());
[6] =>
console.log('abc' + some_function());
)
Finding them is one thing; actually replacing the occurrences with an empty string is not much different. Something like this should do the trick:
print preg_replace('~((?://)?\s*console\.[A-Z]+\(.*?$)~sim', '', $string);
That will show this in the browser:
Other Text Goes Here
More Text Here
And More Text Goes Here
Here is a working demo for you to take a look at:
http://ideone.com/Vv0cGY
Explanation:
(?://)?\s*console\.[A-Z]+\(.*?$
(?://)? - Look for two optional forward slashes. The ?: at the start makes the group non-capturing: it matches, but doesn't remember the match.
\s* - Look for any whitespace (spaces or newlines) that may or may not be present.
console\.[A-Z]+ - Match console, followed by a literal dot ., followed by at least one letter (the i modifier makes [A-Z] match lowercase letters too).
\(.*?$ - Find an opening parenthesis and grab everything up to the end of the line (the m modifier makes $ match at each line end).
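A quick sketch of why the i modifier matters for [A-Z]+ here, using a single sample line:

```php
<?php
$line = 'console.log(abc);';

// With i, [A-Z]+ also accepts lowercase letters such as "log".
$withI = preg_match('~console\.[A-Z]+\(~i', $line);

// Without i, [A-Z]+ accepts only uppercase letters, so this fails.
$withoutI = preg_match('~console\.[A-Z]+\(~', $line);

echo $withI, ' ', $withoutI, PHP_EOL; // 1 0
```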

How to get the string between two characters in this case?

I want to get the string between "yes" and "yes".
E.g.:
yes1231yesyes4567yes
output:
1231,4567
How do I do this in PHP?
Maybe there is one more output, an empty string '', between "yesyes", right?
In this particular example, you could use preg_match_all() with a regex to capture digits between "yes" and "yes":
preg_match_all("/yes(\d+)yes/", $your_string, $output_array);
print_r($output_array[1]);
// Array
// (
// [0] => 1231
// [1] => 4567
// )
And to achieve your desired output:
echo implode(',', $output_array[1]); // 1231,4567
Edit: as a side note, if you need a looser match that simply matches every run of digits in a string, e.g. 9yes123yes12 from your comment, use the regex (\d+); it will match 9, 123 and 12.
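A self-contained sketch of the answer above, plus one option for the comment about the empty value between "yesyes". The lookaround variant is my own addition, not part of the answer: lookarounds do not consume "yes", so the same "yes" can end one value and start the next, and \d* allows an empty value.

```php
<?php
$your_string = 'yes1231yesyes4567yes';

// Digits between a "yes" and a "yes"; each match consumes both words.
preg_match_all('/yes(\d+)yes/', $your_string, $output_array);
echo implode(',', $output_array[1]), PHP_EOL; // 1231,4567

// Alternative: (?<=yes) and (?=yes) check for "yes" without consuming it,
// so the empty string between "yesyes" is reported as well.
preg_match_all('/(?<=yes)\d*(?=yes)/', $your_string, $all);
var_dump($all[0]); // "1231", "", "4567"
```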
Good Evening,
If you are referring to extracting the digits (as shown in your example), you can achieve this by following Christopher's answer to a very similar question on this topic:
PHP code to remove everything but numbers
On the other hand, if you are looking to remove all instances of the word "yes" from the string, I would recommend using PHP's str_replace function.
Here is an example to get you started:
str_replace("yes", "", "yes I do like cream on my eggs");
I trust that this information is of use.
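For comparison, a sketch of both suggestions applied to the question's input. The \D pattern is one common way to remove everything but digits; it is an assumption here and may not be exactly what the linked answer uses. Note that both approaches return the digits run together rather than as separate values.

```php
<?php
$input = 'yes1231yesyes4567yes';

// Removing every "yes" leaves the two numbers concatenated.
$noYes = str_replace('yes', '', $input);        // "12314567"

// Removing every non-digit character (\D) gives the same result here.
$digitsOnly = preg_replace('/\D/', '', $input);  // "12314567"

// The answer's own example: only the leading "yes" is removed.
$sentence = str_replace('yes', '', 'yes I do like cream on my eggs');
// " I do like cream on my eggs"
```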

preg_split with two patterns (one of them quoted)

I would like to split a string in PHP containing quoted and unquoted substrings.
Let's say I have the following string:
"this is a string" cat dog "cow"
The resulting array should look like this:
array (
[0] => "this is a string"
[1] => "cat"
[2] => "dog"
[3] => "cow"
)
I'm struggling a bit with regex, and I'm wondering whether this is even possible with just one regex/preg_split call...
The first thing I tried was:
[[:blank:]]*(?=(?:[^"]*"[^"]*")*[^"]*$)[[:blank:]]*
But this only splits array[0] and array[3] correctly; the rest is split on a per-character basis.
Then I found this link:
PHP preg_split with two delimiters unless a delimiter is within quotes
(?=(?:[^"]*"[^"]*")*[^"]*$)
This seems like a good starting point to me. However, the result for my example is the same as with the first regex.
I tried combining both: first the one for quoted strings, and then a second sub-regex that should omit quoted strings (hence the [^"]):
(?=(?:[^"]*"[^"]*")*[^"]*$)|[[:blank:]]*([^"].*[^"])[[:blank:]]*
So, two questions:
Is it even possible to achieve what I want with just one regex/preg_split call?
If yes, I would appreciate a hint on how to assemble the regex correctly.
Since matches cannot overlap, you could use preg_match_all like this:
preg_match_all('/"[^"]*"|\S+/', $input, $matches);
Now $matches[0] should contain what you are looking for. The regex first tries to match a quoted string; if that fails, it just collects as many non-whitespace characters as possible. Since alternations are tried from left to right, the quoted version takes precedence.
EDIT: This will not get rid of the quotes though. To do this, you could use capturing groups:
preg_match_all('/(?|"([^"]*)"|(\S+))/', $input, $matches);
Now $matches[1] will contain exactly what you are looking for. The (?| is there so that both capturing groups end up at the same index.
EDIT 2: Since you were asking for a preg_split solution, that is also possible. We can use a lookahead that asserts that the whitespace is followed by an even number of quotes (and then any unquoted text) up until the end of the string:
$result = preg_split('/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/', $input);
Of course, this will not get rid of the quotes, but that can easily be done in a separate step.
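A sketch of both routes on the question's input: the branch-reset match, and the split followed by a separate trim() step to drop the quotes. The split pattern allows trailing unquoted text ([^"]* before $), so input that ends in an unquoted word still splits.

```php
<?php
$input = '"this is a string" cat dog "cow"';

// Match approach: the branch reset (?| ... ) puts the text from either
// alternative into group 1, without the surrounding quotes.
preg_match_all('/(?|"([^"]*)"|(\S+))/', $input, $matches);
print_r($matches[1]);

// Split approach: split on whitespace only when an even number of quotes
// follows, i.e. when the whitespace is outside any quoted string...
$parts = preg_split('/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/', $input);

// ...then strip the quotes in a separate step.
$parts = array_map(function ($part) {
    return trim($part, '"');
}, $parts);
print_r($parts);
```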

Use String for Pattern but Exclude it from Being Removed

I'm pretty new to regex. I've learned a few things along the way, but my knowledge is still poor!
So I want to ask you for clarification on how it works!
Assume I have the following strings; as you can see, they can be formatted slightly differently from one another, but they are very similar:
DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE
Now I want to remove everything between the first A-Z block and the colon (or the end of the line), so, for example, I would keep:
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
So, with my very noob knowledge, I have worked out this lousy regex! :-(
preg_replace( '/^[A-Z](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
But why am I sure this regex will not work? :-)
Please help me!
PS: the title of the question is fairly self-explanatory; I also want to know how to, for example, use a well-known string block to match something else...
preg_replace( '/^[DTSTART](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
...without deleting DTSTART.
Thanks for your time!
Regards
Luca Filosofi
You could use a relatively simple regex like the following.
$subject = 'DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE';
echo preg_replace('/^[A-Z]+\K[^:\n]*/m', '', $subject) . PHP_EOL;
It looks for a series of capital letters at the start of a line, resets the match starting point (that's what \K does) to the end of those and matches anything not a colon or newline (i.e. the parts you want to remove). Those matched parts are then replaced with an empty string.
The output from the above would be
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
If the lines that you are interested in will only ever start with DTSTART or DTEND, then we could be more precise about what to match (e.g. ^DT(?:START|END)), but [A-Z]+ obviously covers both of those.
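A sketch of the more precise variant mentioned above. It also touches on the PS: a fixed prefix such as DTSTART can be matched and then kept out of the replacement with \K.

```php
<?php
$subject = "DTSTART;TZID=\"America/Chicago\":20030819T000000\n"
    . "DTEND;TZID=\"America/Chicago\":20030819T010000\n"
    . "DTSTART;TZID=US/Pacific\n"
    . "DTSTART;VALUE=DATE";

// Only lines starting with DTSTART or DTEND are touched; \K resets the
// start of the match, so the prefix survives the replacement.
$result = preg_replace('/^DT(?:START|END)\K[^:\n]*/m', '', $subject);

echo $result, PHP_EOL;
```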
If you want to retain part of the matched pattern in a substitution, you put parentheses around it and then refer to it by $1 (or whichever grouping it is).
For example, in Perl/sed-style substitution syntax:
s/^(this is a sentence) to edit/$1/
turns "this is a sentence to edit" into "this is a sentence".
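The same substitution in PHP, as a sketch: preg_replace() refers to group 1 as $1 in the replacement string. Applied to two of the question's sample lines (chosen for brevity), it keeps the leading capitals and drops the rest up to the colon or the end of the line.

```php
<?php
// The answer's substitution, written with preg_replace().
$sentence = preg_replace('/^(this is a sentence) to edit/', '$1', 'this is a sentence to edit');
// "this is a sentence"

// Same idea on the question's data: keep group 1 (the leading capitals)
// and drop everything up to the colon or the end of the line.
$data = "DTSTART;TZID=\"America/Chicago\":20030819T000000\nDTSTART;VALUE=DATE";
$result = preg_replace('/^([A-Z]+)[^:\n]*/m', '$1', $data);
// "DTSTART:20030819T000000\nDTSTART"
```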
You can check out this example, which works similarly to your problem (it uses named subpatterns):
<?php
$str = 'foobar: 2008';
preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);
/* This also works in PHP 5.2.2 (PCRE 7.0) and later, however
 * the above form is recommended for backwards compatibility */
// preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
print_r($matches);
?>
The above example will output:
Array
(
[0] => foobar: 2008
[name] => foobar
[1] => foobar
[digit] => 2008
[2] => 2008
)
So if you only need the digits, print $matches['digit'].
You want to remove everything between a semicolon and either a colon or the end of the line, right? So use that as your expression. You're overcomplicating things.
preg_replace('/(?:;.+?(?=:))|(?:;.+?$)/m','',$data);
It's a pretty simple expression. Either match (?:;.+?(?=:)) or (?:;.+?$), which differ only by their terminator (the first one stops just before a colon, using the lookahead (?=:) so that the colon itself is kept; the second one matches up to the end of the line).
Each is a non-capturing group that starts with a semicolon, reluctantly reads in characters, then stops at the terminator. Everything matched by this is removable according to your description.
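A quick check of this approach against the question's data. The sketch ends the first alternative with a lookahead (?=:), so the colon separating the name from the value is kept rather than removed:

```php
<?php
$data = "DTSTART;TZID=\"America/Chicago\":20030819T000000\n"
    . "DTEND;TZID=\"America/Chicago\":20030819T010000\n"
    . "DTSTART;TZID=US/Pacific\n"
    . "DTSTART;VALUE=DATE";

// A semicolon, then as few characters as possible up to (but not including)
// a colon; failing that, up to the end of the line.
$result = preg_replace('/(?:;.+?(?=:))|(?:;.+?$)/m', '', $data);

echo $result, PHP_EOL;
```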
