preg_match - console.log removing - php

This is the scenario:
A JS file is loaded into a string using file_get_contents.
I want to remove all debugging calls from it.
To find out what's happening in the PHP code, I am using preg_match.
I'm using this expression:
(\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*?[^}(])\);?$
On the regex101 and phpliveregex websites it matches:
//console.log(abc)
// console.log(abc)
// console.log(abc);
// console.log('abc');
console.log(abc);
console.log('abc' + some_function());
etc...
But when I put it in PHP code like this:
preg_match('/(\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*?[^}(])\);?$/', $js_code, $matches);
if (!empty($matches[0])) print_r($matches[0]);
I don't get any matches. I'm too tired to notice what I'm missing. Probably something staring at me with its big eyes. :)
Any help would be appreciated.
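One likely cause, offered here as an assumption: regex101 typically runs with the g and m flags switched on, whereas in PHP `$` only matches at the very end of the subject unless the `m` modifier is given, and preg_match() stops after the first match. The sketch below compares the question's pattern (with a shortened method list) on a hypothetical three-line `$js_code`, with and without `/m`:

```php
<?php
// Hypothetical JS source; only the first two lines contain console calls.
$js_code = "//console.log(abc)\nconsole.log(abc);\nvar x = 1;";

// The question's pattern with a shortened method list; note the ;?$ tail.
$pattern = '/(\/\/)?(\s*?)console\.(log|warn|error)\((.*?[^}(])\);?$/';

// Without /m, $ can only match at the end of the whole string, and the
// last line is not a console call, so nothing matches.
$countWithoutM = preg_match_all($pattern, $js_code, $withoutM);

// With /m, $ also matches just before each newline.
$countWithM = preg_match_all($pattern . 'm', $js_code, $withM);

echo $countWithoutM, ' vs ', $countWithM, PHP_EOL;
print_r(array_map('trim', $withM[0]));
```

The trim() call is only there because `(\s*?)` can pull the preceding newline into the second match.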

After some further investigation I improved my regex pattern to match every combination.
@Jan
Your answer pushed me in the right direction.
((\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)(\s*?)\((.*[^}(])(\){1,});?)

Why so complicated? Do you need to distinguish between the different functions (log, etc.)? The following regex matches your examples that end with a semicolon.
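$regex = '/(?<console>(?:\/\/)?\s*console\.[^;]+;)/';
# captured group named console: two optional forward slashes,
# followed by optional whitespace,
# then console. literally and anything up to (and including) a semicolon;
# [^;]+ also crosses newlines, so a call without a semicolon runs on into the next line
# (PHP has no g modifier; preg_match_all() already finds every match)
preg_match_all($regex, $js_string, $matches);
print_r($matches["console"]);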
As per your comment, if you need to match the actual method name as well, you could alter the regex like so:
$regex = '/(?<console>(?:\/\/)?\s*console\.(?<function>[^(]+)[^;]+;)/';
Now $matches["function"] holds the actual method name.
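A runnable sketch of the named-group version, using a hypothetical `$js_string`. It drops the `g` flag (PHP's preg functions reject it) and relies on each call ending in a semicolon:

```php
<?php
// Hypothetical input: one live call, one commented-out call, one unrelated line.
$js_string = "console.log(abc);\n// console.warn('abc');\nfoo();";

// No trailing g: preg_match_all() already collects all matches.
$regex = '/(?<console>(?:\/\/)?\s*console\.(?<function>[^(]+)[^;]+;)/';

preg_match_all($regex, $js_string, $matches);

print_r($matches['console']);  // the full matched calls
print_r($matches['function']); // just the method names
```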

So this is what I did to approach your problem. Hopefully it works for you.
// DEFINE THE STRING
$string = "
<br>Other Text Goes Here
//console.log(abc)
// console.log(abc)
// console.log(abc);
// console.log('abc');
<br>More Text Here
console.log(abc);
console.warn('abc' + some_function());
console.log('abc' + some_function());
<br>And More Text Goes Here";
// DO THE PREG_MATCH_ALL TO FIND ALL OCCURRENCES
preg_match_all('~(?://)?\s*console\.[A-Z]+\(.*?$~sim', $string, $matches);
print "<pre>"; print_r($matches[0]); print "</pre>";
That will give you the following:
Array
(
[0] => //console.log(abc)
[1] => // console.log(abc)
[2] => // console.log(abc);
[3] => // console.log('abc');
[4] =>
console.log(abc);
[5] =>
console.warn('abc' + some_function());
[6] =>
console.log('abc' + some_function());
)
Finding them is one thing; replacing them with an empty string is not much different. Something like this should do the trick:
print preg_replace('~((?://)?\s*console\.[A-Z]+\(.*?$)~sim', '', $string);
That will show this in the browser:
Other Text Goes Here
More Text Here
And More Text Goes Here
Here is a working demo for you to take a look at:
http://ideone.com/Vv0cGY
Explanation:
(?://)?\s*console\.[A-Z]+\(.*?$
(?://)? - Look for two optional forward slashes. The ?: makes the group non-capturing: it is matched but not remembered.
\s* - Look for any whitespace (including line breaks) that may or may not be present.
console\.[A-Z]+ - Matches console, followed by a literal dot, followed by at least one letter (the i flag makes [A-Z] case-insensitive).
\(.*?$ - Find an opening parenthesis and grab everything up to the end of that line (the m flag makes $ match at each line end).
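Because the pattern ends in `.*?$` with the `m` flag, anything that follows the call on the same line is removed as well. A small sketch with a hypothetical `$js` string showing that behaviour:

```php
<?php
// Hypothetical JS: the second line has another statement after console.log().
$js = "var a = 1;\nconsole.log(a); doThing();\nreturn a;";

// Same pattern as the preg_replace() call above (the outer group is not needed).
$clean = preg_replace('~(?://)?\s*console\.[A-Z]+\(.*?$~sim', '', $js);

// \s* swallowed the newline before console, and .*?$ ran to the end of
// that line, so doThing(); was removed too.
echo $clean, PHP_EOL;
```

So this approach is safest on code that keeps each console call on its own line.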

Related

php preg_match_all between ... and

I'm trying to use preg_match_all to match anything between ... and ..., and the text can wrap onto more than one line. I've done a number of searches on Google and tried different combinations, and nothing is working. I have tried this:
preg_match_all('/...(.*).../m/', $rawdata, $m);
Below is an example of what the format will look like:
...this is a test...
...this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test...
The s modifier allows . to match newline characters too, so try:
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);
The m modifier you were using makes ^ and $ match at the start and end of each line rather than only of the whole string; since your pattern contains no ^ or $, it has no effect.
You can read more about the modifiers here.
Note that the . needs to be escaped as well, because it is a special character meaning "any character". The ? after the .* makes it lazy (non-greedy), so it stops at the first closing ... it finds. The {3} means three of the previous character.
Regex101 demo: https://regex101.com/r/eO6iD1/1
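A quick sketch of the s-modifier version on hypothetical input where the second block wraps onto a new line:

```php
<?php
// Hypothetical input: the second ... block wraps across a line break.
$rawdata = "...this is a test...\n...this is a test this is\na test...";

// s lets . match the newline; the lazy .*? stops at the first closing "...".
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);

print_r($m[1]);
```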
Please escape the literal dots, since the dot is also a reserved character in regular expressions, as you use it inside your code yourself:
preg_match_all('/\.\.\.(.*)\.\.\./m', $rawdata, $m)
In case what you meant is that there are line breaks within the content to match, you would have to allow for them explicitly. Inside a character class the dot is literal, so [.\n\r] would only match dots and line breaks; [\s\S] matches any character, including line breaks (the lazy *? stops at the first closing ...):
preg_match_all('/\.\.\.([\s\S]*?)\.\.\./', $rawdata, $m)
Check here for reference on what characters the dot includes:
http://www.regular-expressions.info/dot.html
You're almost there;
you just need to update your RE:
/\.{3}(.*)\.{3}/m
RE breakdown
/: pattern delimiters (they mark the start and end of the regex)
\.: match a literal .
{3}: match exactly 3 of the previous item (in this case, exactly 3 dots)
(.*): capture everything between the opening and closing dots
m: multiline mode; it only changes how ^ and $ behave and does not let . match line breaks (the s modifier does that)
and when you put it all together, you'll have this:
$str = "...this is a test...";
preg_match_all('/\.{3}(.*)\.{3}/m', $str, $m);
print_r($m);
outputs
Array
(
[0] => Array
(
[0] => ...this is a test...
)
[1] => Array
(
[0] => this is a test
)
)

I have a list of webpage URLs; I just need to strip everything except a specific value and ID from them using regex

Suppose I have a list of URLs that follow the structure below. I need to strip each one so that all that's left is abcustomerid=12345. How can I do this using regex in Notepad++?
Here's an example of the different variety in each line. I just need to remove everything from each line, but leave the abcustomerid=12345 or whatever value that follows abcustomerid.
/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi
/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54
/blah/blah/tendid.php?abcustomerid=12525
Each line could have anything different around the abcustomerid, but I just need to remove everything and keep the abcustomerid and its value.
This regex should do it.
(?:&|\?)abcustomerid=(\d+)
Usage:
<?php
$string= '/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi
/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54
/blah/blah/tendid.php?abcustomerid=12525';
preg_match_all('~(?:&|\?)abcustomerid=(\d+)~', $string, $output);
print_r($output[1]);
The ?: tells the regex not to capture that group; we don't want to capture that data because it is irrelevant. The () captures the data we are interested in. \d+ is one or more digits (the + is the "one or more" part). If the value can be anything, change that to .+?, which will match anything, but then you need an anchor for where it should stop. I'd use (?:&|$), which tells it to stop at the next & or at the end of the string; if the input is multi-line, you'll need the m modifier so that $ also matches at the end of each line. http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Output:
Array
(
[0] => 53122
[1] => 241
[2] => 12525
)
Demo:
http://sandbox.onlinephpfunctions.com/code/37a4ddea8c50f98a41ac7d45fec98f5f1f58761f
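The suggestion above for arbitrary values (swap \d+ for .+?, stop at (?:&|$), add the m modifier) can be sketched like this; the second line and its non-numeric value `ab-12` are hypothetical:

```php
<?php
$string = "/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi\n"
        . "/blah/blah/tendid.php?abcustomerid=ab-12";

// (.+?) grabs any value lazily; (?:&|$) stops it at the next & or at the
// line end, and m makes $ match at the end of every line.
preg_match_all('~(?:&|\?)abcustomerid=(.+?)(?:&|$)~m', $string, $output);

print_r($output[1]);
```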
Here is the RegEx which takes the abcustomerid with its value.
[?&](abcustomerid=\d+)
However, how are you going to "remove everything" using Notepad++?
You can use this service to do it (there is a demo at the end of the answer).
Paste your regex into the regex field and all your data into the Test string form. After it successfully matches everything, look at the Match information window at the middle right of the page. Click the Export matches... button and choose plain text.
You will get something like this:
abcustomerid=53122
abcustomerid=241
abcustomerid=12525
Here is the working Demo.
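Instead of exporting matches, a find-and-replace that keeps only the captured parameter on each line does the removal in one step. A PHP sketch, assuming one URL per line and a numeric value; the same find/replace pair should in principle carry over to Notepad++'s regular-expression mode (with $1 or \1 as the replacement), though that is untested here:

```php
<?php
$string = "/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi\n"
        . "/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54\n"
        . "/blah/blah/tendid.php?abcustomerid=12525";

// ^.*?[?&]            everything up to the ? or & that introduces the parameter
// (abcustomerid=\d+)  the part to keep, captured as group 1
// .*$                 the rest of the line
$result = preg_replace('/^.*?[?&](abcustomerid=\d+).*$/m', '$1', $string);

echo $result, PHP_EOL;
```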

Regular expression - Extracting strings from a repeating format

EDIT - SOLVED:
Thanks for the responses - I learned that this is actually in serialised format and that there's no need to process it using RegEx.
Apologies for the newb question - and I have tried many many variations, based on StackOverflow answers with no luck. I've also spent a while experimenting with an online Regex tool to try and solve this myself.
This is the string I'm checking :
i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";
I'll settle for matching these strings :
i:0;s:1:"1";
i:1;s:1:"3";
i:2;s:1:"5";
i:3;s:1:"6";
But ideally I would like to capture all the values between quotes only.
(there could be anywhere between 1-10 of these kinds of entries)
i.e. regex_result = [1,3,5,6]
These are some of the regexes I've tried.
I've only been able to capture either the first match or the last match, but not all matches. I'm confused as to why the regex isn't "repeating" as I'd expected:
(i:.;s:1:".";)*
(i:.;s:1:".";)+
(i:.;s:1:".";)+?
Thanks
You can use this regex.
/(?<=:")\d+(?=";)/g
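In PHP the same lookaround pattern works without the `g` flag, since preg_match_all() returns every match by itself. A sketch on the question's string:

```php
<?php
$string = 'i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";';

// (?<=:") and (?=";) check for the quotes without including them in the match.
preg_match_all('/(?<=:")\d+(?=";)/', $string, $m);

print_r($m[0]);
```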
"([^"]*)"
Try this. See demo:
http://regex101.com/r/hQ1rP0/43
The pattern gets the character inside double quotes that is preceded by i:.;s:1:" (here the dot after i: represents any character). The (?<!^)\G alternative lets a match also start where the previous match ended, because the \G anchor matches at that position.
<?php
$string = 'i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";';
echo preg_match_all('~(?:i:.;s:1:"|(?<!^)\G)(.)(?=";)~', $string, $match);
print_r($match[1]);
?>
Output (the leading 4 is the match count printed by echo):
4Array
(
[0] => 1
[1] => 3
[2] => 5
[3] => 6
)
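As the question's edit notes, the string is PHP serialized data, so no regex is needed. A sketch, assuming the string is the body of a serialized array whose `a:4:{ ... }` wrapper was left out of the question:

```php
<?php
// Assumption: the complete stored value includes the a:4:{...} array wrapper.
$serialized = 'a:4:{i:0;s:1:"1";i:1;s:1:"3";i:2;s:1:"5";i:3;s:1:"6";}';

// unserialize() rebuilds the array; allowed_classes => false refuses to
// instantiate objects, which is safer for data of unknown origin.
$values = unserialize($serialized, ['allowed_classes' => false]);

print_r($values); // the values come back as strings
```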

Emoticon Matching - PHP

I need to extract different types of terms from a string. I am successfully extracting alphanumeric characters, currency numbers, and different numerical formats with this regex:
$numalpha = '(\d+[a-zA-Z]+)';
$digitsPattern = '(\$|€|£)?\d+(\.\d+)?';
$wordsPattern = '[\p{L}]+';
preg_match_all('/('.$numalpha. '|' .$digitsPattern.'|'.$wordsPattern.')/ui', $str, $matches);
I also need to match emoticons. I compiled the following regex:
#(^|\W)(\>\:\]|\:-\)|\:\)|\:o\)|\:\]|\:3|\:c\)|\:\>|\=\]|8\)|\=\)|\:\}|\:\^\)|\>\:D|\:-D|\:D|8-D|x-D|X-D|\=-D|\=D|\=-3|8-\)|\>\:\[|\:-\(|\:\(|\:-c|\:c|\:-\<|\:-\[|\:\[|\:\{|\>\.\>|\<\.\<|\>\.\<|\>;\]|;-\)|;\)|\*-\)|\*\)|;-\]|;\]|;D|;\^\)|\>\:P|\:-P|\:P|X-P|x-p|\:-p|\:p|\=p|\:-Þ|\:Þ|\:-b|\:b|\=p|\=P|\>\:o|\>\:O|\:-O|\:O|°o°|°O°|\:O|o_O|o\.O|8-0|\>\:\\|\>\:/|\:-/|\:-\.|\:\\|\=/|\=\\|\:S|\:'\(|;'\()($|\W)#
which seems to work to a certain extent.
It seems that it is not working for emoticons situated at the end of the string, even though I specified ($|\W) inside the regex.
------------------EDIT-----------------
I removed the ($|\W) as Tiddo suggested, and it now matches emoticons at the end of the string. The problem is that the regex, which contains (^|\W), also matches the character preceding the emoticon.
For a test string:
$str = ":) Testing ,,:) ::) emotic:-)ons ,:( :D :O hsdhfkd :(";
The matches are as follows:
(
[0] => :)
[1] => ,:)
[2] => ::)
[3] => ,:(
[4] => :D
[5] => :O
[6] => :(
)
(The ',', ' ' and ':' are also matched in the ':)' and ':(' terms)
How can this be fixed?
Actually, if you change the $full assignment to this regex based on positive lookahead:
$full = "#(?=^|\W|\w)(" . $regex .")(?=\w|\W|$)#";
or simply this one without any word boundary:
$full = "#(" . $regex .")#";
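each emoticon is returned without the neighbouring character. Note, though, that neither version checks the surroundings any more, so an emoticon inside a word (such as the :-) in emotic:-)ons) is matched as well. See the working code here http://ideone.com/EcCrD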
Explanation: In your original code you had:
$full = "#(^|\W)(" . $regex . ")(\W|$)#";
This matches and consumes the non-word characters around each emoticon (\W is a non-word character, not a zero-width word boundary). Now consider two emoticons separated by a single non-word character such as a space: the regex matches the first emoticon and consumes the space after it, so for the second emoticon there is no \W left in front of it, and it is missed.
In my answer I use lookahead, which checks the following characters without consuming them, so it works as expected and matches all emoticons.
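A different technique, named plainly: a negative lookbehind (?<!\w) and a negative lookahead (?!\w) keep the "not inside a word" check while consuming nothing. A sketch with a shortened, hypothetical emoticon list (the question's already-escaped alternation could be used in place of $alternation) and the question's test string:

```php
<?php
// Hypothetical, shortened emoticon list. preg_quote() escapes regex
// metacharacters and the # delimiter.
$emoticons = [':)', ':-)', ':(', ':D', ':O'];
$alternation = implode('|', array_map(
    fn (string $e): string => preg_quote($e, '#'),
    $emoticons
));

// (?<!\w) and (?!\w) only check that no word character touches the
// emoticon. As lookarounds they consume nothing, so a single space can
// sit between two emoticons and both are still found.
$full = '#(?<!\w)(?:' . $alternation . ')(?!\w)#';

$str = ':) Testing ,,:) ::) emotic:-)ons ,:( :D :O hsdhfkd :(';
preg_match_all($full, $str, $matches);

print_r($matches[0]);
```

With this input, the :-) inside emotic:-)ons is skipped because it is preceded by a word character, and the other emoticons come back without their neighbours.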

Use String for Pattern but Exclude it from Being Removed

I'm pretty new to regex; I have learned a few things along the way, but my knowledge is still poor!
So I want to ask you for clarification on how it works!
Assume I have the following strings. As you can see, they can be formatted slightly differently from one another, but they are very similar:
DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE
Now I want to remove everything between the first A-Z block and the colon, so, for example, I would keep:
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
So, with my very noob knowledge, I have worked out this shitty regex! :-(
preg_replace( '/^[A-Z](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
but why am I sure this regex will not work!? :-)
Pls help me!
PS: as the title says, I also want to know how, for example, to use a well-known string block to match something else...
preg_replace( '/^[DTSTART](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
...without deleting DTSTART
Thanks for the time!
Regards
Luca Filosofi
You could use a relatively simple regex like the following.
$subject = 'DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE';
echo preg_replace('/^[A-Z]+\K[^:\n]*/m', '', $subject) . PHP_EOL;
It looks for a series of capital letters at the start of a line, resets the match starting point (that's what \K does) to the end of those and matches anything not a colon or newline (i.e. the parts you want to remove). Those matched parts are then replaced with an empty string.
The output from the above would be
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
If the lines that you are interested in will only ever start with DTSTART or DTEND then we could be more precise about what to match (e.g. ^DT(?:START|END)) but [A-Z] obviously covers both of those.
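The more precise variant mentioned above, ^DT(?:START|END), combined with \K, might look like this; the SUMMARY line is hypothetical and only shows that other properties are left alone:

```php
<?php
// The question's sample lines plus one hypothetical line that should stay untouched.
$subject = "DTSTART;TZID=\"America/Chicago\":20030819T000000\n"
         . "DTEND;TZID=\"America/Chicago\":20030819T010000\n"
         . "DTSTART;TZID=US/Pacific\n"
         . "DTSTART;VALUE=DATE\n"
         . "SUMMARY;LANGUAGE=en:Meeting";

// Only lines starting with DTSTART or DTEND are edited; \K keeps the
// property name out of the match, so only the parameters are removed.
$result = preg_replace('/^DT(?:START|END)\K[^:\n]*/m', '', $subject);

echo $result, PHP_EOL;
```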
If you want to retain part of the matched pattern in a substitution, you put parentheses around it and then refer to it by $1 (or whichever grouping it is).
For example:
s/^(this is a sentence) to edit/$1/
gives "this is a sentence"
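Applied to the question's data in PHP, the capture-and-$1 approach could be sketched as follows (an illustration of the technique above, not code from the original answer):

```php
<?php
$subject = "DTSTART;TZID=\"America/Chicago\":20030819T000000\n"
         . "DTEND;TZID=\"America/Chicago\":20030819T010000\n"
         . "DTSTART;TZID=US/Pacific\n"
         . "DTSTART;VALUE=DATE";

// Group 1 captures the leading capitals; $1 writes them back, so only the
// part up to the colon (or the line end) disappears.
$result = preg_replace('/^([A-Z]+)[^:\n]*/m', '$1', $subject);

echo $result, PHP_EOL;
```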
You can check out this example, which works similarly to your problem:
<?php
$str = 'foobar: 2008';
preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);
/* This also works in PHP 5.2.2 (PCRE 7.0) and later, however
 * the above form is recommended for backwards compatibility */
// preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
print_r($matches);
?>
The above example will output:
Array
(
[0] => foobar: 2008
[name] => foobar
[1] => foobar
[digit] => 2008
[2] => 2008
)
So if you only need the digit, print $matches['digit'].
You want to remove everything between a semicolon and either a colon or the end of the line, right? So use that as your expression. You're overcomplicating things.
preg_replace('/(?:;.+?(?=:))|(?:;.+?$)/m','',$data);
It's a pretty simple expression. Either match (?:;.+?(?=:)) or (?:;.+?$), which differ only by their terminator (the first one stops just before a colon, so the colon itself is kept; the second one matches up to the end of the line).
Each is a non-capturing group that starts with a semicolon, lazily reads in characters, then stops at the terminator. Everything matched by this is removable according to your description.
