Use String for Pattern but Exclude it from Being Removed - php

I'm pretty new to regex. I have learned a few things along the way, but my knowledge is still poor!
So I want to ask you for clarification on how it works.
Assume I have the following strings. As you can see, they can be formatted a little differently from one another, but they are very similar:
DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE
Now I want to remove everything between the first A-Z block and the colon, so that, for example, I would keep:
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
So, with my very limited knowledge, I have worked out this shitty regex! :-(
preg_replace( '/^[A-Z](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
But why am I sure this regex will not work!? :-)
Please help me!
PS: the title of the question is pretty self-explanatory. I also want to know how to, for example, use a well-known string block to match another...
preg_replace( '/^[DTSTART](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
...without deleting DTSTART.
Thanks for your time!
Regards
Luca Filosofi

You could use a relatively simple regex like the following.
$subject = 'DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE';
echo preg_replace('/^[A-Z]+\K[^:\n]*/m', '', $subject) . PHP_EOL;
It looks for a series of capital letters at the start of a line, resets the match starting point (that's what \K does) to the end of those and matches anything not a colon or newline (i.e. the parts you want to remove). Those matched parts are then replaced with an empty string.
The output from the above would be
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
If the lines that you are interested in will only ever start with DTSTART or DTEND, then we could be more precise about what to match (e.g. ^DT(?:START|END)), but [A-Z]+ covers both of those.
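As a minimal sketch of the more precise variant mentioned above, assuming the input only ever contains DTSTART and DTEND lines (the variable names $subject and $cleaned are just for illustration):

```php
<?php
// Sample input taken from the question, joined with "\n" line endings.
$subject = implode("\n", [
    'DTSTART;TZID="America/Chicago":20030819T000000',
    'DTEND;TZID="America/Chicago":20030819T010000',
    'DTSTART;TZID=US/Pacific',
    'DTSTART;VALUE=DATE',
]);

// The line must start with DTSTART or DTEND. \K drops the property name
// from the match, so only the parameters between the name and the colon
// (or the end of the line) are replaced.
$cleaned = preg_replace('/^DT(?:START|END)\K[^:\n]*/m', '', $subject);

echo $cleaned . PHP_EOL;
```

Any line that starts with a different property name is left untouched by this pattern.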

If you want to retain part of the matched pattern in a substitution, you put parentheses around it and then refer to it by $1 (or whichever grouping it is).
For example:
s/^(this is a sentence) to edit/$1/
gives "this is a sentence"
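In PHP the same idea might look like the sketch below. It assumes the sample lines from the question and that a ; always introduces the part to drop; $data and $result are illustrative names.

```php
<?php
// Sample input taken from the question, joined with "\n" line endings.
$data = implode("\n", [
    'DTSTART;TZID="America/Chicago":20030819T000000',
    'DTEND;TZID="America/Chicago":20030819T010000',
    'DTSTART;TZID=US/Pacific',
    'DTSTART;VALUE=DATE',
]);

// Group 1 captures the leading property name. The rest of the match
// (from the semicolon up to, but not including, the colon) is dropped
// because the replacement only puts group 1 back.
$result = preg_replace('/^([A-Z]+);[^:\n]*/m', '$1', $data);

echo $result . PHP_EOL;
```

The replacement string is single-quoted so that PHP does not try to interpolate $1 as a variable.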

You can look at this example, which works similarly to your problem:
<?php
$str = 'foobar: 2008';
preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);
/* This also works in PHP 5.2.2 (PCRE 7.0) and later, however
* the above form is recommended for backwards compatibility */
// preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
print_r($matches);
?>
The above example will output:
Array
(
[0] => foobar: 2008
[name] => foobar
[1] => foobar
[digit] => 2008
[2] => 2008
)
So if you need only the digit, print $matches['digit'].

You want to remove everything between a semicolon and either a colon or the end of the line, right? So use that as your expression. You're overcomplicating things.
preg_replace('/;.+?(?=:|$)/m','',$data);
It's a pretty simple expression. It starts at a semicolon, reluctantly reads in characters, then stops just before the terminator, which is either a colon or the end of the line.
The terminator sits in a lookahead, (?=:|$), so it is checked but not consumed: the colon stays in the output, as in your expected DTSTART:20030819T000000. Everything matched by this is removable according to your description.

Related

PHP Regex to capture names if prefixed with key words

I'm in need of a PHP regular expression to capture the first initial and last name of people listed in a text document, but only when the sentence or line contains one of a few keywords (from, with, of, and, as, observed). My current attempt captures list items, i.e. "A. General" or "B. Issues", because it doesn't seem to care about what's in front of the names.
I've been using preg_match_all() in the hope that it returns an array of names (first initial, last name).
Example text
"from J. Smith and B. Miller"
"as T. Baker observed M. Kelly"
"We inquired with B. Brown, T. Stark and J. Maddox."
I've tried
$regex = "/[from|with|of|and|as|observed|,|.]\s+([A-Z]. \w+)/";
$regex = "/((from|with|of|and|as|observed|,|.)\s+([A-Z]. \w+))/";
$regex = "/\b(from|with|of|and|as|observed|,|.)\s+([A-Z].\ \w+)/";
$regex = "/\b(from|with|of|and|as|observed|,|.|\b)\s+([A-Z].\ \w+)/";
I cannot make it only capture when the word list is before the names. I can't use ^ to check 'starts with'. I'm horrible at regex and guess until it works. I feel the solution requires some sort of look-behind assertion, though I'm not sure how it works.
Output
Should be an array
[ 'J. Smith', 'B. Miller' ]
[ 'T. Baker', 'M. Kelly' ]
[ 'B. Brown', 'T. Stark', 'J. Maddox' ]
UPDATE
Final Regexp
$regex = "/\b(?:from|with|of|and|as|observed|,)\s+([A-Z].\ \w+)/";
Seems to work with the few documents I have. Thanks everyone!!
You can use this modified version of your third regex:
\b(?:from|with|of|and|as|observed|,)\s+([A-Z].\ \w+)
You need to escape the . in the first group, or it will accept any character. (Not relevant after the edit.)
PHP has no g modifier; preg_match_all() finds every occurrence of the pattern, and you will be able to access the results in $matches[1].
(The added ?: in the first group prevents it from being captured. You can remove it if you need to know the keyword, but then the results will be stored in $matches[2].)
Edit: removed \. from the first group so that it does not match the ends of sentences (see the author's comment).
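A minimal sketch of how this could be wired up with preg_match_all(), using the example lines from the question. The helper name extractNames is made up for illustration, and the dot after the initial is escaped here, which is an adjustment that is not in the pattern as posted.

```php
<?php
// Hypothetical helper: returns the "X. Lastname" strings that follow
// one of the keywords (or a comma) in a single line of text.
function extractNames(string $line): array
{
    // Same keyword list as the answer; the dot after the initial is
    // escaped so that it only matches a literal ".".
    $regex = '/\b(?:from|with|of|and|as|observed|,)\s+([A-Z]\. \w+)/';
    preg_match_all($regex, $line, $matches);
    return $matches[1]; // group 1 holds the captured names
}

print_r(extractNames('from J. Smith and B. Miller'));
print_r(extractNames('as T. Baker observed M. Kelly'));
print_r(extractNames('We inquired with B. Brown, T. Stark and J. Maddox.'));
```

Calling the helper once per line gives one array of names per sentence, which matches the output format asked for in the question.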
You can try looking for a capital letter followed by a dot and a word
[A-Z]\.\s\w+
I think this should work
/(?!^from|with|of|and|as|observed|\s)([A-Z]{1,}\.\s\w*)/g
Where
(?! ... ) = a negative lookahead: it asserts that the text at this position does not start with any of the listed alternatives (including \s, the space before the name), and it consumes nothing.
^ = matches the beginning of the line/sentence/string
Then the second group matches one or more capital letters ([A-Z]{1,}), then a dot \., a space \s and the word \w*
The /g at the end stands for "global search" (PHP has no g modifier, so use preg_match_all() there instead).
https://regexr.com/3pa9o

How to match all words but "stop" in a string by regex

Another regex question. I use PHP, and have a string: fdjkaljfdlstopfjdslafdj. You can see there is a stop in the middle. I just want to replace everything else, excluding that stop. I tried [^stop], but it also includes the s at the end of the string.
My Solution
Thanks for everyone's help here.
I also figured out a solution using pure regex (I mean within the scope of my regex knowledge; PCRE verbs are too advanced for me), but it needs 2 steps. I don't want to mix PHP functions in, because sometimes the job is outside of coding, e.g. multi-renaming filenames in Total Commander.
Let's look at the string: xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq. For example, I want to keep all the foos in this string and greedily replace every other non-foo run with ---.
First, I need to find all the non-foo text between each pair of foos: (?<=foo).+?(?=foo).
The string will turn into xxxfoo---foo---foo---foolsjunegpq, with only the non-foo words on both sides left.
Then use [^-]+(?=foo)|(?<=foo)[^-]+.
This time: ---foo---foo---foo---foo---. All words but foo have been turned into ---.
i just dont want to include "stop"...
You can skip it by using the PCRE verbs (*SKIP)(*F). Try it like this:
stop(*SKIP)(*F)|.
Demo at regex101
or sequence: (stop)(*SKIP)(*F)|(?:(?!(?1)).)+
or for words: stop(*SKIP)(*F)|\w+
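As a rough sketch of how the sequence variant could be used from PHP, here it is applied to the foo example from the question, with the literal foo written out instead of the (?1) back-reference. The variable names are illustrative only.

```php
<?php
$input = 'xxxfooeoropwfoo,skfhlk;afoofsjre,jhgfs,vnhufoolsjunegpq';

// Left branch: match "foo", then (*SKIP)(*F) rejects that match and keeps
// the engine from retrying inside it. Right branch: consume a run of
// characters, stopping right before the next "foo".
$result = preg_replace('/foo(*SKIP)(*F)|(?:(?!foo).)+/', '---', $input);

echo $result . PHP_EOL; // ---foo---foo---foo---foo---
```

This does in one pass what the two-step lookaround approach in the question does in two.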
[^stop] doesn't mean any text that is NOT stop. It just means any single character that is not one of the 4 characters inside [...], which in this case are s, t, o, p.
Better to split on the text you don't want to match:
$s = 'fdjkaljfdlstopfjdslafdjstopfoobar';
php> $arr = preg_split('/stop/', $s);
php> print_r($arr);
Array
(
[0] => fdjkaljfdl
[1] => fjdslafdj
[2] => foobar
)
You can generalize this to any pattern:
(?<neg>stop)(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|(?&neg))
Demo
Just put the pattern you don't want in the neg group.
This regex will try to do the following for any character position:
Match the pattern you don't want. If it matches, discard it with (*SKIP)(*FAIL) and restart another match at this position.
If the pattern you don't want doesn't match at a particular position, then match anything, until either:
You reach the end of the input string (\Z)
Or the pattern you don't want immediately follows the current matching position ((?&neg))
This approach is slower than manually tuning the expression. You could get better performance at the cost of repeating yourself, which avoids the (?&neg) subroutine call:
stop(*SKIP)(*FAIL)|(?s:.)+?(?=\Z|stop)
But of course, the best approach would be to use the features provided by your language: match the string you don't want, then use code to discard it and keep everything else.
In PHP, you can use the PREG_OFFSET_CAPTURE flag to tell the preg_match_all function to provide you the offsets of each match.
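A minimal sketch of that last suggestion, assuming the unwanted pattern is a plain /stop/ and using a made-up helper name, segmentsOutside. It collects the offsets of every unwanted match and keeps the text between them.

```php
<?php
// Hypothetical helper: returns the pieces of $subject that are NOT
// matched by $pattern, in order of appearance.
function segmentsOutside(string $pattern, string $subject): array
{
    // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
    preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

    $pieces = [];
    $cursor = 0; // byte position just after the previous unwanted match
    foreach ($matches[0] as [$text, $offset]) {
        if ($offset > $cursor) {
            $pieces[] = substr($subject, $cursor, $offset - $cursor);
        }
        $cursor = $offset + strlen($text);
    }
    // Keep whatever follows the last unwanted match.
    if ($cursor < strlen($subject)) {
        $pieces[] = substr($subject, $cursor);
    }
    return $pieces;
}

print_r(segmentsOutside('/stop/', 'fdjkaljfdlstopfjdslafdjstopfoobar'));
```

For a fixed pattern like this one, preg_split() gives the same pieces with less code; the offsets are mainly useful when you also need to know where each piece sits in the original string.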

preg_match - console.log removing

This is the scenario:
A JS file is loaded into a string using file_get_contents
I want to remove all debugging info from it
To find out what's happening in the PHP code, I am using preg_match
I'm using this expression:
(\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*?[^}(])\);?$
On regex101 and phpliveregex websites it matches:
//console.log(abc)
// console.log(abc)
// console.log(abc);
// console.log('abc');
console.log(abc);
console.log('abc' + some_function());
etc...
But when I put it in PHP code like this:
preg_match('/(\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*?[^}(])\);?$/', $js_code, $matches);
if (!empty($matches[0])) print_r($matches[0]);
I don't get any matches. I'm too tired to notice what I am missing. It's probably something staring at me with its big eyes. :)
Any help would be appreciated.
After some further investigation I improved my regex pattern to match every combination.
@Jan
Your answer pushed me in the right direction.
((\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)(\s*?)\((.*[^}(])(\){1,});?)
Why so complicated? Do you need this distinction between the different functions (log, etc.)? The following regex matches all of your above examples. See a working demo here.
$regex = '/(?<console>(?:\/\/)?\s*console\.[^;]+;)/';
# captured group named console with two forward slashes optionally
# followed by whitespaces (or not)
# match console. literally then anything up to a semicolon
preg_match_all($regex, $js_string, $matches);
print_r($matches["console"]);
As per your comment, if you need to match the actual method name as well, you could alter the regex like so:
$regex = '/(?<console>(?:\/\/)?\s*console\.(?<function>[^(]+)[^;]+;)/';
Now $matches["function"] holds the actual method name; see a demo for this here.
So this is what I did to approach your problem. Hopefully it works for you.
// DEFINE THE STRING
$string = "
<br>Other Text Goes Here
//console.log(abc)
// console.log(abc)
// console.log(abc);
// console.log('abc');
<br>More Text Here
console.log(abc);
console.warn('abc' + some_function());
console.log('abc' + some_function());
<br>And More Text Goes Here";
// DO THE PREG_MATCH_ALL TO FIND ALL OCCURRENCES
preg_match_all('~(?://)?\s*console\.[A-Z]+\(.*?$~sim', $string, $matches);
print "<pre>"; print_r($matches[0]); print "</pre>";
That will give you the following:
Array
(
[0] => //console.log(abc)
[1] => // console.log(abc)
[2] => // console.log(abc);
[3] => // console.log('abc');
[4] =>
console.log(abc);
[5] =>
console.warn('abc' + some_function());
[6] =>
console.log('abc' + some_function());
)
Finding them is one thing; replacing each occurrence with an empty string is not much different. Something like this should do the trick:
print preg_replace('~((?://)?\s*console\.[A-Z]+\(.*?$)~sim', '', $string);
That will show this in the browser:
Other Text Goes Here
More Text Here
And More Text Goes Here
Here is a working demo for you to take a look at:
http://ideone.com/Vv0cGY
Explanation:
(?://)?\s*console\.[A-Z]+\(.*?$
(?://)? - Looks for two optional forward slashes. The ?: in front tells it to match them but not capture them.
\s* - Look for any spaces that may or may not be present.
console\.[A-Z]+ - Will match console, followed by a literal dot ., followed by at least one alpha character.
\(.*?$ - Find an open parenthesis and grab everything up through the end of the line.

php preg_match_all between ... and

I'm trying to use preg_match_all to match anything between ... and ..., and the text does wrap across lines. I've done a number of searches on Google and tried different combinations, and nothing is working. I have tried this:
preg_match_all('/...(.*).../m/', $rawdata, $m);
Below is an example of what the format will look like:
...this is a test...
...this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test...
The s modifier allows for . to include new line characters so try:
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);
The m modifier you were using makes ^ and $ act on a per-line basis rather than per string (since you don't use ^ or $, it doesn't make sense here).
You can read more about the modifiers here.
Note the . needs to be escaped as well because it is a special character meaning any character. The ? after the .* makes it non-greedy so it will match the first ... that is found. The {3} says three of the previous character.
Regex101 demo: https://regex101.com/r/eO6iD1/1
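To see the difference the s modifier makes, here is a small sketch with made-up sample data in which the second entry wraps onto a new line ($rawdata mirrors the question's variable; its contents are assumed):

```php
<?php
// Assumed sample data: the second entry contains a line break.
$rawdata = "...this is a test...\n...this is a test that wraps\nonto a second line...";

// With s, the dot also matches newlines, so the wrapped entry is found.
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $withS);

// Without s, the dot stops at the newline and the wrapped entry is missed.
preg_match_all('/\.{3}(.*?)\.{3}/', $rawdata, $withoutS);

print_r($withS[1]);    // two captures
print_r($withoutS[1]); // one capture
```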
Please escape the literal dots, since the dot is also a reserved character in regular expressions, as you use it that way inside your code yourself:
preg_match_all('/\.\.\.(.*)\.\.\./m', $rawdata, $m);
In case what you wanted to state is that there are line breaks within the content to match, you would have to add this explicitly to your code:
preg_match_all('/\.\.\.([\s\S]*?)\.\.\./m', $rawdata, $m);
Check here for reference on what characters the dot includes:
http://www.regular-expressions.info/dot.html
You're almost there,
so you need to update your RE:
/\.{3}(.*)\.{3}/m
RE breakdown
/: delimiter marking the start/end of the pattern
\.: match a literal .
{3}: match exactly 3 (in this case, exactly 3 dots)
(.*): capture anything that comes after the first match (...)
m: multiline mode, in which ^ and $ match at the start and end of each line.
and when you put everything together, you'll have this
$str = "...this is a test...";
preg_match_all('/\.{3}(.*)\.{3}/m', $str, $m);
print_r($m);
outputs
Array
(
[0] => Array
(
[0] => ...this is a test...
)
[1] => Array
(
[0] => this is a test
)
)
DEMO

I have list of webpage URLs, I just need to strip everything except specific value and ID from it using regex

Suppose I have a list of URLs that follow the structure below. I need to strip each one down so all that's left is the abcustomerid=12345. How can I do this using regex with Notepad++?
Here's an example of the variety in each line. I just need to remove everything from each line but leave the abcustomerid=12345, or whatever value follows abcustomerid.
/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi
/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54
/blah/blah/tendid.php?abcustomerid=12525
Each line could have anything different around the abcustomerid, but I just need to remove everything and keep the abcustomerid and the value.
This regex should do it.
(?:&|\?)abcustomerid=(\d+)
Usage:
<?php
$string= '/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi
/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54
/blah/blah/tendid.php?abcustomerid=12525';
preg_match_all('~(?:&|\?)abcustomerid=(\d+)~', $string, $output);
print_r($output[1]);
The ?: tells the regex not to capture that group. We don't want to capture that data because it is irrelevant. The () capture the data we are interested in. The \d+ is one or more digits (the + is the "one or more" part of it). If it can be any value, change that to .+?, which will match anything, but then you will need an anchor for where it should stop. I'd use (?:&|$), which tells it to capture until the next & or the end of the string. If the input is multiline, you'll need to use the m modifier. http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Output:
Array
(
[0] => 53122
[1] => 241
[2] => 12525
)
Demo:
http://sandbox.onlinephpfunctions.com/code/37a4ddea8c50f98a41ac7d45fec98f5f1f58761f
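The "any value" variant described above could look something like this sketch. It reuses the sample URLs from the question and assumes the lines are separated by "\n".

```php
<?php
// Sample URLs from the question, one per line.
$string = implode("\n", [
    '/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi',
    '/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54',
    '/blah/blah/tendid.php?abcustomerid=12525',
]);

// .+? lazily takes any value; (?:&|$) stops it at the next & or, thanks
// to the m modifier, at the end of the current line.
preg_match_all('~[?&]abcustomerid=(.+?)(?:&|$)~m', $string, $output);

print_r($output[1]);
```

Because . does not match a newline by default, the lazy .+? cannot run past the end of a line even when the value is the last parameter.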
Here is the RegEx which takes the abcustomerid with its value.
[?&](abcustomerid=\d+)
However, how are you going to 'remove everything' using Notepad++?
You can use this service to do this (there is a demo at the end of the answer).
Copy your regex and all your data into the Test string form. After it successfully matches everything, look at the Match information window at the middle right of the page. Click the Export matches... button and choose plain text.
You will get something like this:
abcustomerid=53122
abcustomerid=241
abcustomerid=12525
Here is the working Demo.
