I'm trying to use preg_match_all to match anything between ... and ..., and the text wraps onto multiple lines. I've done a number of searches on Google and tried different combinations, but nothing works. I have tried this:
preg_match_all('/...(.*).../m/', $rawdata, $m);
Below is an example of what the format will look like:
...this is a test...
...this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test this is a test...
The s modifier lets . match newline characters as well, so try:
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $m);
The m modifier you were using makes ^ and $ match at the start and end of each line rather than only at the start and end of the whole string. Since your pattern contains no ^ or $, it has no effect here.
You can read more about the modifiers here.
Note that the literal dots must be escaped as well, because an unescaped . is a special character meaning any character. The ? after .* makes it non-greedy (lazy), so it stops at the first closing ... it finds. The {3} means exactly three of the previous character.
Regex101 demo: https://regex101.com/r/eO6iD1/1
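A minimal sketch of the difference the s modifier makes, using a hypothetical $rawdata in which the second ... block wraps onto a new line:

```php
<?php
// Hypothetical input: the second "..." block wraps across a newline.
$rawdata = "...this is a test...\n...this is a test this is\na test...";

// Without /s, "." does not match "\n", so the wrapped block is not found.
preg_match_all('/\.{3}(.*?)\.{3}/', $rawdata, $withoutS);

// With /s, "." also matches "\n", so both blocks are captured.
preg_match_all('/\.{3}(.*?)\.{3}/s', $rawdata, $withS);

print_r($withoutS[1]); // only the single-line block
print_r($withS[1]);    // both blocks; the second one contains "\n"
```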
Escape the literal dots, since . is also a regular-expression metacharacter (you already use it that way inside your own pattern):
preg_match_all('/\.\.\.(.*)\.\.\./m', $rawdata, $m);
If the content to match contains line breaks, you have to allow for them explicitly. Note that inside a character class a dot is literal, so [.\n\r] would only match dots and line breaks; [\s\S] matches any character, including line breaks:
preg_match_all('/\.\.\.([\s\S]*?)\.\.\./', $rawdata, $m);
Check here for reference on what characters the dot includes:
http://www.regular-expressions.info/dot.html
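You're almost there; you just need to update your regex to
/\.{3}(.*)\.{3}/m
Regex breakdown:
/: the pattern delimiters (they mark the start and end of the pattern, not of the string)
\.: match a literal .
{3}: match exactly 3 of the previous token (in this case, exactly 3 dots)
(.*): capture everything between the opening and closing ... (greedy, and not across line breaks)
m: multiline mode, which makes ^ and $ match at the start and end of every line. It does not let . match newlines; that is what the s modifier does.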
Putting it all together:
$str = "...this is a test...";
preg_match_all('/\.{3}(.*)\.{3}/m', $str, $m);
print_r($m);
outputs
Array
(
    [0] => Array
        (
            [0] => ...this is a test...
        )
    [1] => Array
        (
            [0] => this is a test
        )
)
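The (.*) in this pattern is greedy. That is harmless when each line holds a single ... pair, but not otherwise. A short sketch with a hypothetical line containing two pairs shows the difference from the lazy (.*?) used in the first answer:

```php
<?php
// Hypothetical line containing two "..." blocks.
$str = "...first... and ...second...";

// Greedy: (.*) runs on to the LAST "..." on the line.
preg_match_all('/\.{3}(.*)\.{3}/', $str, $greedy);

// Lazy: (.*?) stops at the FIRST closing "...", so each block is captured separately.
preg_match_all('/\.{3}(.*?)\.{3}/', $str, $lazy);

print_r($greedy[1]);
print_r($lazy[1]);
```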
I'm pretty lousy at regex, and need help with the following scenario. I need to locate and replace text that has a common structure, but one aspect will be different:
here is a string (with 3 values)
here is another string (with 5 values)
In the above examples, I need to locate and then replace the value in parentheses. I can't search by parens alone, as the string may contain other parens. But the value in the parens that needs to be replaced is consistently constructed: (with # values); the only difference will be the number.
So ideally the regex returns (with 3 values) and (with 5 values) so I can use a simple str_replace to change the text.
This is for a regex in a PHP script.
Try this regex:
\(with\s+\d+\s+values\)
The following regex should work for you:
/\(with (\d+) values\)/
This matches strings of the specified format and puts the number in a capture group so it can be used in the replacement. PHP has no g flag (adding one causes an "Unknown modifier" warning); use preg_match_all to find every occurrence, and note that preg_replace already replaces all occurrences by default.
If, however, there can only be one digit, then the following will work:
/\(with (\d) values\)/
Or, if the number can only be a digit greater than 1, for example, then the following:
/\(with ([2-9]) values\)/
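Since the goal is to replace the parenthesized text, the capture group can be used directly in preg_replace() instead of a separate str_replace. A minimal sketch; the replacement wording "(with N items)" is only a placeholder, and the third input line is hypothetical, added to show that unrelated parentheses are left alone:

```php
<?php
// Sample lines from the question, plus a hypothetical one with an unrelated "(note)".
$lines = [
    'here is a string (with 3 values)',
    'here is another string (with 5 values)',
    'a (note) string (with 12 values)',
];

// preg_replace() accepts an array of subjects and replaces every match in each one.
// $1 re-inserts the captured number; "(with N items)" is only an example replacement.
$result = preg_replace('/\(with (\d+) values\)/', '(with $1 items)', $lines);

print_r($result);
```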
If I got you right, you are looking for exactly three or five items within parentheses (comma separated).
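This could be accomplished with the following pattern (written in free-spacing mode, so it needs the x modifier):
\(                      # "(" literally
(?:[^,()]+,){2}         # two items (no comma or parenthesis), each followed by a comma
(?:(?:[^,()]+,){2})?    # optionally two more such items (five in total)
[^,()]+                 # the last item, without a trailing comma
\)                      # the closing parenthesis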
See a demo on regex101.com.
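The commented pattern only works with the x modifier, so here is a sketch of it as a complete PHP pattern, checked against hypothetical inputs under this answer's reading of the question (comma-separated items):

```php
<?php
// Free-spacing mode (x modifier): whitespace and "#" comments in the pattern are ignored.
$pattern = '/
    \(                      # "(" literally
    (?:[^,()]+,){2}         # two items, each followed by a comma
    (?:(?:[^,()]+,){2})?    # optionally two more items
    [^,()]+                 # the last item, without a trailing comma
    \)                      # the closing parenthesis
/x';

// Hypothetical inputs: three and five items should match, two and four should not.
$samples = ['(a,b,c)', '(a,b,c,d,e)', '(a,b)', '(a,b,c,d)'];
$results = [];
foreach ($samples as $s) {
    $results[$s] = preg_match($pattern, $s);
}
print_r($results);
```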
If you're really only looking for those two variants, you could simply use
\(with (?:3|5) values\)
In general:
\(with \d+ values\)
as proposed by @SchoolBoy.
Something like this maybe
$str ="here is another string (with 5 values)";
preg_match_all("/\(with (\d+) values\)/", $str, $out );
print_r( $out );
Output:
Array
(
    [0] => Array
        (
            [0] => (with 5 values)
        )
    [1] => Array
        (
            [0] => 5
        )
)
It uses the regex
\(with (\d+) values\)
that matches the literal opening parenthesis followed by the string with # values, capturing the actual number #, and finally the closing parenthesis.
It returns the complete match (the parenthesized string) in the first dimension and the actual number in the second.
I am having trouble creating a regex in PHP that extracts all URLs from text like the following:
http://hello.hello/asefaesasef my name is
https://aw3raw.com/asdfase/
www.aer.com/afseaegfefsesef\
domain.com/afsegaesga"
I need to basically extract the URL until I hit a white space, a backslash (\) or a double quote (").
I have the following code:
$column = "adsfahttp://hello.hello/asefaesas\"ef asefa aweoija weeij asd sa https://aw3raw.com/asdfase/ asdafewww.aer.com/afseaegfefsesef\ even ashafueh domain.com/afsegaesga\"asdfasda";
preg_match_all("/(http|https):\/\/\S+[^(\"|\\)]+/",$column,$urls);
echo "Url = \n";
print_r($urls);
So I need to extract the following:
http://hello.hello/asefaesasef
https://aw3raw.com/asdfase
www.aer.com/afseaegfefsesef
domain.com/afsegaesga
I'm struggling to get my head around it as my result is showing as:
Url =
Array
(
    [0] => Array
        (
            [0] => http://hello.hello/asefaesas"ef asefa aweoija weeij asd sa https://aw3raw.com/asdfase/ asdafewww.aer.com/afseaegfefsesef\ even ashafueh domain.com/afsegaesga
        )
    [1] => Array
        (
            [0] => http
        )
)
First, you've got the syntax of character classes wrong. Inside square brackets, parentheses and pipes do not group or alternate; they are simply more literal characters in the set. Just list the characters you're interested in, or in this case, the ones you want to exclude.
What you're doing now is matching some non-whitespace characters (including \ and "), followed by some not-quote, non-backslash characters (including whitespace). You need to combine both criteria into one negated character class:
preg_match_all("~https?://[^\"\s\\\\]+~", $column, $urls);
Notice that this only matches URLs starting with http:// or https://. You can make the protocol optional ("~(?:https?://)?[^\"\s\\\\]+~"), but then the regex will match almost anything, making it useless. Are all your URLs at the beginning of a line, the way you showed them? If so, you can use an anchor instead:
preg_match_all('/(?m)^[^\"\s\\\\]+/', $column, $urls);
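A minimal sketch of the single negated-class approach run on the question's own $column. It uses a single-quoted pattern, which needs fewer backslash escapes than the double-quoted version:

```php
<?php
// Sample input from the question (single quotes keep the " and \ characters literal).
$column = 'adsfahttp://hello.hello/asefaesas"ef asefa aweoija weeij asd sa https://aw3raw.com/asdfase/ asdafewww.aer.com/afseaegfefsesef\ even ashafueh domain.com/afsegaesga"asdfasda';

// One negated class: stop at a double quote, any whitespace, or a backslash.
// In a single-quoted PHP string, '\\\\' becomes the regex \\ (a literal backslash).
preg_match_all('~https?://[^"\s\\\\]+~', $column, $urls);

print_r($urls[0]);
```

As noted above, www.aer.com/... and domain.com/... are not found because they have no protocol.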
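You just need to add a \s to your regex, /(http|https):\/\/\S+[^(\"|\\)\s]+/, so it doesn't match whitespace. Note that the \S+ part can still run past a " or \, so a match may continue beyond those characters.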
This is the scenario:
A JS file is loaded into a string using file_get_contents
I want to remove all debugging info from it
To find out what's happening in the PHP code, I am using preg_match
I'm using this expression:
(\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*?[^}(])\);?$
On the regex101 and phpliveregex websites it matches:
//console.log(abc)
// console.log(abc)
// console.log(abc);
// console.log('abc');
console.log(abc);
console.log('abc' + some_function());
etc...
But when I put it in PHP code like this:
preg_match('/(\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)\((.*?[^}(])\);?$/', $js_code, $matches);
if (!empty($matches[0])) print_r($matches[0]);
I don't get any matches. I'm too tired to notice what I'm missing. Probably something staring at me with its big eyes. :)
Any help would be appreciated.
After some further investigation I improved my regex pattern to match every combination.
@Jan
Your answer pushed me in the right direction.
((\/\/)?(\s*?)console\.(log|debug|info|log|warn|error|assert|dir|dirxml|trace|group|groupEnd|time|timeEnd|profile|profileEnd|count)(\s*?)\((.*[^}(])(\){1,});?)
Why so complicated? Do you need to distinguish between the different functions (log, etc.)? The following regex matches your examples above (note that a call without a trailing semicolon runs on into the following text, up to the next ;).
$regex = '/(?<console>(?:\/\/)?\s*console\.[^;]+;)/';
# captured group named console with two forward slashes optionally
# followed by whitespaces (or not)
# match console. literally then anything up to a semicolon
preg_match_all($regex, $js_string, $matches);
print_r($matches["console"]);
As per your comment, if you need to match the actual method name as well, you could alter the regex like so:
$regex = '/(?<console>(?:\/\/)?\s*console\.(?<function>[^(]+)[^;]+;)/';
Now $matches["function"] holds the actual method name.
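A minimal sketch of the named-group version, run against a hypothetical JS string in which every console call ends with a semicolon (the [^;]+ part relies on that). PHP's preg functions do not accept a g modifier; preg_match_all() already collects every match:

```php
<?php
// Hypothetical JS input; each console call ends with ";".
$js_string = "init();\n// console.log(abc);\nconsole.warn('abc' + some_function());\n";

// Named groups: "console" is the whole call, "function" is the method name.
// No g modifier: preg_match_all() already returns every match.
$regex = '/(?<console>(?:\/\/)?\s*console\.(?<function>[^(]+)[^;]+;)/';

preg_match_all($regex, $js_string, $matches);

print_r($matches['function']);
```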
So this is what I did to approach your problem. Hopefully it works for you.
// DEFINE THE STRING
$string = "
<br>Other Text Goes Here
//console.log(abc)
// console.log(abc)
// console.log(abc);
// console.log('abc');
<br>More Text Here
console.log(abc);
console.warn('abc' + some_function());
console.log('abc' + some_function());
<br>And More Text Goes Here";
// DO THE PREG_MATCH_ALL TO FIND ALL OCCURRENCES
preg_match_all('~(?://)?\s*console\.[A-Z]+\(.*?$~sim', $string, $matches);
print "<pre>"; print_r($matches[0]); print "</pre>";
That will give you the following:
Array
(
[0] => //console.log(abc)
[1] => // console.log(abc)
[2] => // console.log(abc);
[3] => // console.log('abc');
[4] =>
console.log(abc);
[5] =>
console.warn('abc' + some_function());
[6] =>
console.log('abc' + some_function());
)
Finding them is one thing, but replacing the occurrences with an empty string is not much different. Something like this should do the trick:
print preg_replace('~((?://)?\s*console\.[A-Z]+\(.*?$)~sim', '', $string);
That will show this in the browser:
Other Text Goes Here
More Text Here
And More Text Goes Here
Here is a working demo for you to take a look at:
http://ideone.com/Vv0cGY
Explanation:
(?://)?\s*console\.[A-Z]+\(.*?$
(?://)? - Look for an optional pair of forward slashes. The ?: makes the group non-capturing: it is matched but not remembered.
\s* - Look for any whitespace (including newlines) that may or may not be present.
console\.[A-Z]+ - Match console, followed by a literal dot ., followed by at least one letter (case-insensitive because of the i modifier).
\(.*?$ - Find an open parenthesis and grab everything up through the end of the line.
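A small self-contained check of the replacement approach above, run against a hypothetical JS snippet. It uses the same pattern and modifiers as the answer:

```php
<?php
// Hypothetical JS snippet with one commented-out and one live console call.
$js = "var a = 1;\n// console.log(a);\nconsole.warn('x' + f());\nvar b = 2;";

// Same pattern as above (s, i and m modifiers); the lazy .*? stops at the first line end.
$cleaned = preg_replace('~(?://)?\s*console\.[A-Z]+\(.*?$~sim', '', $js);

echo $cleaned, PHP_EOL;
```

The newline that preceded the removed commented-out call stays behind, which is why the result contains an empty line.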
I need to extract different types of terms from a string. I am successfully extracting alphanumeric characters, currency numbers, and different numerical formats with this regex:
$numalpha = '(\d+[a-zA-Z]+)';
$digitsPattern = '(\$|€|£)?\d+(\.\d+)?';
$wordsPattern = '[\p{L}]+';
preg_match_all('/('.$numalpha. '|' .$digitsPattern.'|'.$wordsPattern.')/ui', $str, $matches);
I also need to match emoticons. I compiled the following regex:
#(^|\W)(\>\:\]|\:-\)|\:\)|\:o\)|\:\]|\:3|\:c\)|\:\>|\=\]|8\)|\=\)|\:\}|\:\^\)|\>\:D|\:-D|\:D|8-D|x-D|X-D|\=-D|\=D|\=-3|8-\)|\>\:\[|\:-\(|\:\(|\:-c|\:c|\:-\<|\:-\[|\:\[|\:\{|\>\.\>|\<\.\<|\>\.\<|\>;\]|;-\)|;\)|\*-\)|\*\)|;-\]|;\]|;D|;\^\)|\>\:P|\:-P|\:P|X-P|x-p|\:-p|\:p|\=p|\:-Þ|\:Þ|\:-b|\:b|\=p|\=P|\>\:o|\>\:O|\:-O|\:O|°o°|°O°|\:O|o_O|o\.O|8-0|\>\:\\|\>\:/|\:-/|\:-\.|\:\\|\=/|\=\\|\:S|\:'\(|;'\()($|\W)#
which seems to work up to a certain extent.
It seems that it is not working for emoticons situated at the end of the string, even though I specified
($|\W)
inside the regex.
------------------EDIT-----------------
I removed the ($|\W) as Tiddo suggested, and it now matches emoticons at the end of the string. The problem is that the regex, which contains (^|\W), also matches the character preceding the emoticon.
For a test string:
$str = ":) Testing ,,:) ::) emotic:-)ons ,:( :D :O hsdhfkd :(";
The matches are as follows:
(
[0] => :)
[1] => ,:)
[2] => ::)
[3] => ,:(
[4] => :D
[5] => :O
[6] => :(
)
(The ',', ' ' and ':' are also matched in the ':)' and ':(' terms)
How can this be fixed?
If you change the $full assignment to this regex, based on positive lookahead:
$full = "#(?=^|\W|\w)(" . $regex .")(?=\w|\W|$)#";
or simply this one without any word boundary:
$full = "#(" . $regex .")#";
It will work as you expect without any problem. See the working code here http://ideone.com/EcCrD
Explanation: In your original code you had:
$full = "#(^|\W)(" . $regex . ")(\W|$)#";
This pattern consumes the surrounding non-word characters instead of just checking for them. When two matching emoticons are separated by a single non-word character such as a space, the first match grabs that space through the trailing (\W|$). The next attempt then finds no \W left in front of the second emoticon, so that emoticon is not matched.
My version uses lookaheads, which check the surrounding characters without consuming them, so it matches all emoticons as expected.
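A variation on the same idea, not the answerer's exact code: it keeps the question's boundary intent by using a lookbehind (?<=^|\W) in front and a lookahead (?=\W|$) behind. Neither lookaround consumes a character, so the preceding comma or colon is no longer part of the match. The emoticon list is a hypothetical shortened one:

```php
<?php
// Hypothetical, shortened emoticon list; the full list from the question can be used the same way.
$emoticons = [':-)', ':)', ':(', ':D', ':O'];

// Escape each emoticon so characters such as ")" and "(" are taken literally.
$quoted = array_map(function ($e) {
    return preg_quote($e, '#');
}, $emoticons);

// The lookbehind and lookahead only check for a non-word character (or the string
// start/end) around the emoticon; they do not consume it, so adjacent emoticons
// are not swallowed and the preceding character is not part of the match.
$full = '#(?<=^|\W)(?:' . implode('|', $quoted) . ')(?=\W|$)#u';

$str = ":) Testing ,,:) ::) emotic:-)ons ,:( :D :O hsdhfkd :(";
preg_match_all($full, $str, $matches);

print_r($matches[0]);
```

emotic:-)ons is skipped because its : is preceded by a word character. Dropping the lookbehind would include it.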
I'm pretty new to regex. I have learned a few things along the way, but my knowledge is still poor, so I'd like to ask for clarification on how this works.
Assume I have the following strings. As you can see, they can be formatted slightly differently from one another, but they are very similar:
DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE
Now I want to remove everything between the first A-Z block and the colon, so that, for example, I would keep:
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
With my very limited knowledge I have worked out this clumsy regex :-(
preg_replace( '/^[A-Z](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
But I'm sure this regex won't work. Why? :-)
Please help!
PS: As the title says, I'd also like to know how to use a known string block to match another one, for example:
preg_replace( '/^[DTSTART](?!;[A-Z]=[\w\W]+):$/m' , '' , $data );
...without deleting DTSTART.
Thanks for the time!
Regards
Luca Filosofi
You could use a relatively simple regex like the following.
$subject = 'DTSTART;TZID="America/Chicago":20030819T000000
DTEND;TZID="America/Chicago":20030819T010000
DTSTART;TZID=US/Pacific
DTSTART;VALUE=DATE';
echo preg_replace('/^[A-Z]+\K[^:\n]*/m', '', $subject) . PHP_EOL;
It looks for a series of capital letters at the start of a line, resets the match starting point (that's what \K does) to the end of those and matches anything not a colon or newline (i.e. the parts you want to remove). Those matched parts are then replaced with an empty string.
The output from the above would be
DTSTART:20030819T000000
DTEND:20030819T010000
DTSTART
DTSTART
If the lines that you are interested in will only ever start with DTSTART or DTEND then we could be more precise about what to match (e.g. ^DT(?:START|END)) but [A-Z] obviously covers both of those.
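A sketch of that more precise variant, which also shows a known string block (DTSTART or DTEND) being used as the anchor. The SUMMARY line is hypothetical and only there to show that other lines are left untouched:

```php
<?php
// Lines from the question plus a hypothetical SUMMARY line that should stay untouched.
$subject = "DTSTART;TZID=\"America/Chicago\":20030819T000000\n"
    . "DTEND;TZID=\"America/Chicago\":20030819T010000\n"
    . "DTSTART;VALUE=DATE\n"
    . "SUMMARY;LANGUAGE=en:Meeting";

// Only lines starting with DTSTART or DTEND are touched; \K keeps that prefix out
// of the match, so just the ";..." part before the colon (or line end) is removed.
$result = preg_replace('/^DT(?:START|END)\K[^:\n]*/m', '', $subject);

echo $result, PHP_EOL;
```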
If you want to retain part of the matched pattern in a substitution, you put parentheses around it and then refer to it by $1 (or whichever grouping it is).
For example:
s/^(this is a sentence) to edit/$1/
gives "this is a sentence"
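The same idea in PHP, where preg_replace() also accepts $1 in the replacement string. The second part applies it to the question's data. The pattern ^([A-Z]+)[^:\n]* is an illustrative choice, not taken from the question:

```php
<?php
// PHP equivalent of the s/.../$1/ example.
$sentence = preg_replace('/^(this is a sentence) to edit/', '$1', 'this is a sentence to edit');

// Applied to the question: keep the capital-letter block ($1), drop the rest up to the colon.
$data = "DTSTART;TZID=\"America/Chicago\":20030819T000000\nDTSTART;VALUE=DATE";
$result = preg_replace('/^([A-Z]+)[^:\n]*/m', '$1', $data);

echo $sentence, PHP_EOL, $result, PHP_EOL;
```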
This example works similarly to your problem:
<?php
$str = 'foobar: 2008';
preg_match('/(?P<name>\w+): (?P<digit>\d+)/', $str, $matches);
/* This also works in PHP 5.2.2 (PCRE 7.0) and later, however
 * the above form is recommended for backwards compatibility */
// preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
print_r($matches);
?>
The above example will output:
Array
(
[0] => foobar: 2008
[name] => foobar
[1] => foobar
[digit] => 2008
[2] => 2008
)
So if you only need the digits, print $matches['digit'].
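The same named-subpattern technique applied to the question's lines, as a sketch. The group names name and params are hypothetical:

```php
<?php
// Named subpatterns applied to the question's lines (hypothetical names "name" and "params").
$data = "DTSTART;TZID=\"America/Chicago\":20030819T000000\n"
    . "DTEND;TZID=\"America/Chicago\":20030819T010000\n"
    . "DTSTART;TZID=US/Pacific";

preg_match_all('/^(?<name>[A-Z]+)(?<params>[^:\n]*)/m', $data, $m);

print_r($m['name']);   // the A-Z block of each line
print_r($m['params']); // the part the question wants to remove
```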
You want to remove everything between a semicolon and either a colon or the end of the line, right? So use that as your expression. You're overcomplicating things.
preg_replace('/(?:;.+?(?=:))|(?:;.+?$)/m', '', $data);
It's a pretty simple expression. It matches either (?:;.+?(?=:)) or (?:;.+?$), which differ only in their terminator: the first stops just before a colon, the second at the end of the line.
Each is a non-capturing group that starts with a semicolon, lazily reads in characters, then stops at the terminator. The lookahead (?=:) checks for the colon without consuming it, so the colon stays in the result (DTSTART:20030819T000000). Everything matched is removable according to your description.
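A quick check of the semicolon-based approach on the question's four lines. This sketch uses a lookahead (?=:) in the first alternative so that the colon itself survives, which gives the desired output DTSTART:20030819T000000:

```php
<?php
$data = "DTSTART;TZID=\"America/Chicago\":20030819T000000\n"
    . "DTEND;TZID=\"America/Chicago\":20030819T010000\n"
    . "DTSTART;TZID=US/Pacific\n"
    . "DTSTART;VALUE=DATE";

// Remove from ";" up to (not including) ":" or, if the line has no colon, up to the line end.
// The lookahead (?=:) keeps the colon in the output.
$result = preg_replace('/(?:;.+?(?=:))|(?:;.+?$)/m', '', $data);

echo $result, PHP_EOL;
```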