How to make pattern string in preg_match_all

How to make pattern string in preg_match_all - php

How to make one pattern string for preg_match_all() function If I have multiple strings to match like "G,C,D" & "A,B,C" & "E,C,D"
Currently my string is
$str = "/(?<=\b)([CDEFGAB](?:b|bb|m)*(?:#|##|sus|maj|min|aug)*[\d\/]*(?:[CDEFGAB](?:b|bb‌|m)*(?:#|##|sus|maj|min|aug)*[\d\/]*)*)(?=\s|$|(<.*>))(?! \w)/"
preg_match_all($str,$lyrics , $output_array);
I want to add these combinations ("G,C,D" & "A,B,C" & "E,C,D") in $str string.

NOTE this answer has been dramatically edited over time to arrive at this final conclusion.
by referring to this site: http://www.all-guitar-chords.com/ i came up with this regular expression:
\b(?:G,C,D|A,B,C|E,C,D)|(?:[ABCDEFG](?:#|b)?)(?:\/[ABCDEFG]b)?(?:(?:(?:maj|min|sus|add|aug|dim)(?:\d{0,2}(?:#\d{1,2}|sus\d)?)?)|(?:m\d{0,2}(?:(?:maj|add|#)\d{0,2})?)|(?:-?\d{0,2}(?:\([^)]*\)|#\d{1,2})?))?)
<?php preg_match_all( '`' . $reg_exp . '`', $str, $matches ); ?>
my final expression does NOT capture any backreferences. therefore, $matches[0] will be an array of the matches (because the zero-index array is always an array of the full strings that matched the full expression and subsequent arrays will refer to backreferences, if any).
it was written with as much precision as possible, but this expression could still return false positives, although very unlikely.
the only other way to do this is to use an array of each of the possible chords you are interested in. then iterate that array searching for each string explicitly -- and that would still be prone to false positives.
here is a site to tinker with it: https://regex101.com/r/vW9sJ2/7
NOTE the original question by the OP was how to add certain string combinations into the expression such as "G,C,D" or "A,B,C" (examples given by the OP). however, the musicians i spoke with said they couldn't see how extracting the chord combinations would be helpful as there are so many variations. they weren't even sure how extracting chords like this would help at all. which makes me curious as to what the OP intends to use this for and i hope he can explain for me (especially seeing how much time i spent on this given my obsessesive nature, lol).
EDIT without reply from the OP, not sure if this fulfills the OPs needs, and i've begun to wonder if he needs a more flexible match for the combinations (ex: G,C,D). so i've modified the expression to handle that:
\b(?:[ABCDEFG],[ABCDEFG],[ABCDEFG])\b|(?:[ABCDEFG](?:#|b)?)(?:\/[ABCDEFG]b)?(?:(?:(?:maj|min|sus|add|aug|dim)(?:\d{0,2}(?:#\d{1,2}|sus\d)?)?)|(?:m\d{0,2}(?:(?:maj|add|#)\d{0,2})?)|(?:-?\d{0,2}(?:\([^)]*\)|#\d{1,2})?))?)
and the fiddle for that: https://regex101.com/r/vW9sJ2/7

Here is a way to match your G,C,D-like strings:
\b
(?:(?:[CDEFGAB]
(?:b|bb|m)*
(?:\#|\#\#|sus|maj|min|aug)*
[\d\/]*
(?:[CDEFGAB]
(?:b|bb‌|m)*
(?:\#|\#\#|sus|maj|min|aug)*
[\d\/]*
)*
),?)+
(?=\s|$|(?:<.*>))
(?![[:blank:]]\w)
Or as a one-liner:
\b(?:(?:[CDEFGAB](?:b|bb|m)*(?:\#|\#\#|sus|maj|min|aug)*[\d\/]*(?:[CDEFGAB](?:b|bb‌|m)*(?:\#|\#\#|sus|maj|min|aug)*[\d\/]*)*),?)+(?=\s|$|(?:<.*>))(?![[:blank:]]\w)

Related

Regex for PHP seems simple but is killing me

I'm trying to make a replace in a string with a regex, and I really hope the community can help me.
I have this string :
031,02a,009,a,aaa,AZ,AZE,02B,975,135
And my goal is to remove the opposite of this regex
[09][0-9]{2}|[09][0-9][A-Za-z]
i.e.
a,aaa,AZ,AZE,135
(to see it in action : http://regexr.com?3795f )
My final goal is to preg_replace the first string to only get
031,02a,009,02B,975
(to see it in action : http://regexr.com?3795f )
I'm open to all solution, but I admit that I really like to make this work with a preg_replace if it's possible (It became something like a personnal challenge)
Thanks for all help !

As #Taemyr pointed out in comments, my previous solution (using a lookbehind assertion) was incorrect, as it would consume 3 characters at a time even while substrings weren't always 3 characters.
Let's use a lookahead assertion instead to get around this:
'/(^|,)(?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*/'
The above matches the beginning of the string or a comma, then checks that what follows does not match one of the two forms you've specified to keep, and given that this condition passes, matches as many non-comma characters as possible.
However, this is identical to #anubhava's solution, meaning it has the same weakness, in that it can leave a leading comma in some cases. See this Ideone demo.
ltriming the comma is the clean way to go there, but then again, if you were looking for the "clean way to go," you wouldn't be trying to use a single preg_replace to begin with, right? Your question is whether it's possible to do this without using any other PHP functions.
The anwer is yes. We can take
'/(^|,)foo/'
and distribute the alternation,
'/^foo|,foo/'
so that we can tack on the extra comma we wish to capture only in the first case, i.e.
'/^foo,|,foo/'
That's going to be one hairy expression when we substitute foo with our actual regex, isn't it. Thankfully, PHP supports recursive patterns, so that we can rewrite the above as
'/^(foo),|,(?1)/'
And there you have it. Substituting foo for what it is, we get
'/^((?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]*),|,(?1)/'
which indeed works, as shown in this second Ideone demo.
Let's take some time here to simplify your expression, though. [0-9] is equivalent to \d, and you can use case-insensitive matching by adding /i, like so:
'/^((?![09]\d{2}|[09]\d[a-z])[^,]*),|,(?1)/i'
You might even compact the inner alternation:
'/^((?![09]\d(\d|[a-z]))[^,]*),|,(?1)/i'

Try it in more steps:
$newList = array();
foreach (explode(',', $list) as $element) {
if (!preg_match('/[09][0-9]{2}|[09][0-9][A-Za-z]/', $element) {
$newList[] = $element;
}
}
$list = implode(',', $newList);
You still have your regex, see! Personnal challenge completed.

Try matching what you want to keep and then joining it with commas:
preg_match_all('/[09][0-9]{2}|[09][0-9][A-Za-z]/', $input, $matches);
$result = implode(',', $matches);
The problem you'll be facing with preg_replace is the extra-commas you'll have to strip, cause you don't just want to remove aaa, you actually want to remove aaa, or ,aaa. Now what when you have things to remove both at the beginning and at the end of the string? You can't just say "I'll just strip the comma before", because that might lead to an extra comma at the beginning of the string, and vice-versa. So basically, unless you want to mess with lookaheads and/or lookbehinds, you'd better do this in two steps.

This should work for you:
$s = '031,02a,009,a,aaa,AZ,AZE,02B,975,135';
echo ltrim(preg_replace('/(^|,)(?![09][0-9]{2}|[09][0-9][A-Za-z])[^,]+/', '', $s), ',');
OUTPUT:
031,02a,009,02B,975

Try this:
preg_replace('/(^|,)[1-8a-z][^,]*/i', '', $string);
this will remove all substrings starting with the start of the string or a comma, followed by a non allowed first character, up to but excluding the following comma.
As per #GeoffreyBachelet suggestion, to remove residual commas, you should do:
trim(preg_replace('/(^|,)[1-8a-z][^,]*/i', '', $string), ',');

Capturing a pattern of unknown repitition in PCRE

This may be a quick question for experienced regular expressionists, but I'm having trouble getting my match to execute correctly.
Suppose I had a string that looked like this:
http://aaa-bbbb-cc-ddddd-eee-.sub.dom
I would like to go capture all of the "aaa", "bbbb", "cc", and "ddddd" substrings, but I'm not sure how many there will be (e.g., having all triplets up through "zzz").
This is the regular expression I'm trying to use right now:
/http:\/\/(\w*?\-)+\.sub\.dom/
I wrote it this way because:
I want to match substrings, but I want each to terminate when a - is parsed
I want to capture one or more of these substrings
But it seems to only be saving the last match that it makes (in the above case, it would only match "eee-".
Is there a good way to capture all of the matched substrings?
More information: I'm using PHP's PCRE function preg_replace_callback. Thanks!

No, it is not possible to match an unknown number of capture groups.
If you try to repeat a capture group, it will always contain the last value captured.
Could you explain a bit more broadly what you're trying to do? Perhaps there is another simple way to do it (possibly without regular expressions).

If you want the items in the subdomain, and then all matches between the dashes... This should work:
$string = "http://aaa-bbbb-cc-ddddd-eee-.sub.dom";
preg_match("/^http:\/\/([\w-]+?)\..*$/i", $string, $match);
$parts = explode('-', $match[1]);
print_r($parts);
Short of that you will probably have to build a small parsing script to parse the string yourself if that doesn't do it for you.

Need to negate this regex pattern, but no clue how

I found a regex pattern for PHP that does the exact OPPOSITE of what I'm needing, and I'm wondering how I can reverse it?
Let's say I have the following text: Item_154 ($12)
This pattern /\((.*?)\)/ gets what's inside the parenthesis, but I need to get "Item_154" and cut out what's in parenthesis and the space before the parenthesis.
Anybody know how I can do that?
Regex is above my head apparently...

/^([^( ]*)/
Match everything from the start of the string until the first space or (.
If the item you need to match can have spaces in it, and you only want to get rid of whitespace immediately before the parenthetical, then you can use this instead:
/^([^(]*?)\s*\(/

The following will match anything that looks like text (...) but returns just the text part in the match.
\w+(?=\s*\([^)]*\))
Explanation:
The \w includes alphanumeric and underscore, with + saying match one or more.
The (?= ) group is positive lookahead, saying "confirm this exists but don't match it".
Then we have \s for whitespace, and * saying zero or more.
The \( and \) matches literal ( and ) characters (since its normally a special chat).
The [^)] is anything non-) character, and again * is zero or more.
Hopefully all makes sense?

/(.*)\(.*\)/
What is not in () will now be your 1st match :)

One site that really helped me was http://gskinner.com/RegExr/
It'll let you build a regex and then paste in some sample targets/text to test it against, highlighting matches. All of the possible regex components are listed on the right with (essentially) a tooltip describing the function.

<?php
$string = 'Item_154 ($12)';
$pattern = '/(.*)\(.*?\)/';
preg_match($pattern, $string, $matches);
var_dump($matches[1]);
?>
Should get you Item_154

The following regex works for your string as a replacement if that helps? :-
\s*\(.*?\)
Here's an explanation of what's it doing...
Whitespace, any number of repetitions - \s*
Literal - \(
Any character, any number of repetitions, as few as possible - .*?
Literal - \)
I've found Expresso (http://www.ultrapico.com/) is the best way of learning/working out regular expressions.
HTH

Here is a one-shot to do the whole thing
$text = 'Item_154 ($12)';
$text = preg_replace('/([^\s]*)\s(\()[^)]*(\))/', $1$2$3, $text);
var_dump($text);
//Outputs: Item_154()
Keep in mind that using any PCRE functions involves a fair amount of overhead, so if you are using something like this in a long loop and the text is simple, you could probably do something like this with substr/strpos and then concat the parens on to the end since you know that they should be empty anyway.
That said, if you are looking to learn REGEXs and be productive with them, I would suggest checking out: http://rexv.org
I've found the PCRE tool there to very useful, though it can be quirky in certain ways. In particular, any examples that you work with there should only use single quotes if possible, as it doesn't work with double quotes correctly.
Also, to really get a grip on how to use regexs, I would check out Mastering Regular Expressions by Jeffrey Friedl ISBN-13:978-0596528126
Since you are using PHP, I would try to get the 3rd Edition since it has a section specifically on PHP PCRE. Just make sure to read the first 6 chapters first since they give you the foundation needed to work with the material in that particular chapter. If you see the 2nd Edition on the cheap somewhere, that pretty much the same core material, so it would be a good buy as well.

preg_match returning weird results

I am searching a string for urls...and my preg_match is giving me an incorrect amount of matches for my demo string.
String:
Hey there, come check out my site at www.example.com
Function:
preg_match("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#ise", $string, $links);
echo count($links);
The result comes out as 3.
Can anybody help me solve this? I'm new to REGEX.

$links is the array of sub matches:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
The matches of the two groups plus the match of the full regular expression results in three array items.
Maybe you rather want all matches using preg_match_all.

If you use preg_match_pattern, (as Gumbo suggested), please note that if you run your regex against this string, it will both match the value of your anchor attribute "href" as well as the linked Text which in this case happens to comtain an url. This makes TWO matches.
It would be wise to run an array_unique on your resultset :)

In addition to the advice on how to use preg_match, I believe there is something seriously wrong with the regular expression you are using. You may want to trying something like this instead:
preg_match("_([a-zA-Z]+://)?([0-9a-zA-Z$-\_.+!*'(),]+\.)?([0-9a-zA-Z]+)+\.([a-zA-Z]+)_", $string, $links);
This should handle most cases (although it wouldn't work if there was a query string after the top-level domain). In the future, when writing regular expressions, I recommend the following web-sites to help: http://www.regular-expressions.info/ and especially http://regexpal.com/ for testing them as you're writing them.

Google Style Regular Expression Search

It's been several years since I have used regular expressions, and I was hoping I could get some help on something I'm working on. You know how google's search is quite powerful and will take stuff inside quotes as a literal phrase and things with a minus sign in front of them as not included.
Example: "this is literal" -donotfindme site:examplesite.com
This example would search for the phrase "this is literal" in sites that don't include the word donotfindme on the webiste examplesite.com.
Obviously I'm not looking for something as complex as Google I just wanted to reference where my project is heading.
Anyway, I first wanted to start with the basics which is the literal phrases inside quotes. With the help of another question on this site I was able to do the following:
(this is php)
$search = 'hello "this" is regular expressions';
$pattern = '/".*"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches);
But this outputs "this" instead of the desired this, and doesn't work at all for multiple phrases in quotes. Could someone lead me in the right direction?
I don't necessarily need code even a real nice place with tutorials would probably do the job.
Thanks!

Well, for this example at least, if you want to match only the text inside the quotes you'll need to use a capturing group. Write it like this:
$pattern = '/"(.*)"/';
and then $matches will be an array of length 2 that contains the text between the quotes in element 1. (It'll still contain the full text matched in element 0) In general, you can have more than one set of these parentheses; they're numbered from the left starting at 1, and there will be a corresponding element in $matches for the text that each group matched. Example:
$pattern = '/"([a-z]+) ([a-z]+) (.*)"/';
will select all quoted strings which have two lowercase words separated by a single space, followed by anything. Then $matches[1] will be the first word, $matches[2] the second word, and $matches[3] the "anything".
For finding multiple phrases, you'll need to pick out one at a time with preg_match(). There's an optional "offset" parameter you can pass, which indicates where in the string it should start searching, and to find multiple matches you should give the position right after the previous match as the offset. See the documentation for details.
You could also try searching Google for "regular expression tutorial" or something like that, there are plenty of good ones out there.

Sorry, but my php is a bit rusty, but this code will probably do what you request:
$search = 'hello "this" is regular expressions';
$pattern = '/"(.*)"/';
$regex = preg_match($pattern, $search, $matches);
print_r($matches[1]);
$matches1 will contain the 1st captured subexpression; $matches or $matches[0] contains the full matched patterns.
See preg_match in the PHP documentation for specifics about subexpressions.
I'm not quite sure what you mean by "multiple phrases in quotes", but if you're trying to match balanced quotes, it's a bit more involved and tricky to understand. I'd pick up a reference manual. I highly recommend Mastering Regular Expressions, by Jeffrey E. F. Friedl. It is, by far, the best aid to understanding and using regular expressions. It's also an excellent reference.

Here is the complete answer for all the sort of search terms (literal, minus, quotes,..) WITH replacements . (For google visitors at the least).
But maybe it should not be done with only regular expressions though.
Not only will it be hard for yourself or other developers to work and add functionality on what would be a huge and super complex regular expression otherwise
it might even be that it is faster with this approach.
It might still need a lot of improvement but at least here is a working complete solution in a class. There is a bit more in here than asked in the question, but it illustrates some reasons behind some choices.
class mySearchToSql extends mysqli {
protected function filter($what) {
if (isset(what) {
//echo '<pre>Search string: '.var_export($what,1).'</pre>';//debug
//Split into different desires
preg_match_all('/([^"\-\s]+)|(?:"([^"]+)")|-(\S+)/i',$what,$split);
//echo '<pre>'.var_export($split,1).'</pre>';//debug
//Surround with SQL
array_walk($split[1],'self::sur',array('`Field` LIKE "%','%"'));
array_walk($split[2],'self::sur',array('`Desc` REGEXP "[[:<:]]','[[:>:]]"'));
array_walk($split[3],'self::sur',array('`Desc` NOT LIKE "%','%"'));
//echo '<pre>'.var_export($split,1).'</pre>';//debug
//Add AND or OR
$this ->where($split[3])
->where(array_merge($split[1],$split[2]), true);
}
}
protected function sur(&$v,$k,$sur) {
if (!empty($v))
$v=$sur[0].$this->real_escape_string($v).$sur[1];
}
function where($s,$OR=false) {
if (empty($s)) return $this;
if (is_array($s)) {
$s=(array_filter($s));
if (empty($s)) return $this;
if($OR==true)
$this->W[]='('.implode(' OR ',$s).')';
else
$this->W[]='('.implode(' AND ',$s).')';
} else
$this->W[]=$s;
return $this;
}
function showSQL() {
echo $this->W? 'WHERE '. implode(L.' AND ',$this->W).L:'';
}
Thanks for all stackoverflow answers to get here!

You're in luck because I asked a similar question regarding string literals recently. You can find it here: Regex for managing escaped characters for items like string literals
I ended up using the following for searching for them and it worked perfectly:
(?<!\\)(?:\\\\)*(\"|')((?:\\.|(?!\1)[^\\])*)\1
This regex differs from the others as it properly handles escaped quotation marks inside the string.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

How to make pattern string in preg_match_all - php

Related

Regex for PHP seems simple but is killing me

Capturing a pattern of unknown repitition in PCRE

Need to negate this regex pattern, but no clue how

preg_match returning weird results

Google Style Regular Expression Search

Categories

Resources