Consolidate repeating pattern


I am working on a script that generates strings of alphanumeric characters separated by dashes (-). I need to test each string to see whether any of the sets of characters (the characters between the dashes) are the same. If there are, I need to consolidate them. In my case, the repeated sets would always occur at the front.
Examples:
KRS-KRS-454-L
would become:
KRS-454-L
DERP-DERP-545-P
would become:
DERP-545-P

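<?php
$s = 'KRS-KRS-454-L';
echo preg_replace('/^(\w+)-(?=\1\b)/', '', $s); // KRS-454-L
?>
This uses a positive lookahead (?=...) to check for a repeated set without consuming it. The \b after \1 makes sure the whole set repeats, so a string like AB-ABC-1 is left alone instead of losing its AB- prefix.
Note that \w also matches the underscore. If you want to limit the match to alphanumeric characters only, use [a-zA-Z0-9] instead.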
Also, I've anchored with ^ as you've mentioned: "The repeating chars would always occur at the front [...]"
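Because ^ can only match once per string, the lookahead replacement removes a single repeat, so DERP-DERP-DERP-545-P would come out as DERP-DERP-545-P. A minimal sketch, assuming runs of three or more identical leading sets should also collapse, moves the backreference into a repeated group instead; collapseLeadingRepeats is a hypothetical helper name.

```php
<?php
// Hypothetical helper: collapses any number of identical leading sets.
// ^(\w+)      captures the first set (letters, digits, underscore)
// (?:-\1\b)+  consumes one or more "-<same set>" repeats; \b stops a
//             prefix such as AB in AB-ABC from counting as a repeat
function collapseLeadingRepeats(string $s): string
{
    return preg_replace('/^(\w+)(?:-\1\b)+/', '$1', $s);
}

foreach (['KRS-KRS-454-L', 'DERP-DERP-DERP-545-P', 'OKAY-666-A', 'AB-ABC-1'] as $t) {
    echo collapseLeadingRepeats($t), "\n";
}
// KRS-454-L
// DERP-545-P
// OKAY-666-A
// AB-ABC-1
```

Strings without a leading repeat do not match at all, so preg_replace returns them unchanged.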

Try the pattern:
/([a-z]+)(?:-\1)*(.*)/i
and replace it with:
$1$2
A demo:
$tests = array(
'KRS-KRS-454-L',
'DERP-DERP-DERP-545-P',
'OKAY-666-A'
);
foreach ($tests as $t) {
echo preg_replace('/([a-z]+)(?:-\1)*(.*)/i', '$1$2', $t) . "\n";
}
produces:
KRS-454-L
DERP-545-P
OKAY-666-A
A quick explanation:
([a-z]+) # group the first "word" in match group 1
(?:-\1)* # match a hyphen followed by what was matched in
# group 1, and repeat it zero or more times
(.*) # match the rest of the input and store it in group 2
In the replacement string $1$2, $1 and $2 are replaced by what groups 1 and 2 matched in the pattern above.
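A small sketch that runs the same pattern through preg_match, to show what groups 1 and 2 hold for one of the demo inputs before they are glued back together.

```php
<?php
// Same pattern as the answer; inspect the two capture groups.
$pattern = '/([a-z]+)(?:-\1)*(.*)/i';

if (preg_match($pattern, 'DERP-DERP-DERP-545-P', $m)) {
    echo 'group 1: ', $m[1], "\n";          // DERP
    echo 'group 2: ', $m[2], "\n";          // -545-P
    echo 'result:  ', $m[1] . $m[2], "\n";  // DERP-545-P
}
```

The repeated -DERP parts are matched by the non-capturing group, so they simply disappear from the replacement.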

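Use the regex ((?:[A-Z-])+)\1{1} and replace the matched string with $1.
\1 is a backreference to group 1, so the pattern looks for a run of uppercase letters and dashes that is immediately repeated. (The {1} is redundant: a backreference without a quantifier already matches exactly once.) Be aware that the pattern is not anchored, so it also collapses doubled letters inside a set; for example, BOOK-123 becomes BOK-123.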

You need backreferences. Using Perl syntax, this would work for you:
$line =~ s/([A-Za-z0-9]+-)\1+/$1/gi;
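For reference, a sketch of the same substitution in PHP, assuming PHP 7 or later. preg_replace() replaces every match by default, so Perl's g flag has no PHP counterpart, and the captured group is written as $1 in the replacement.

```php
<?php
// PHP port of the Perl substitution shown above.
// preg_replace() replaces every match by default (no g flag needed);
// with the i flag the backreference \1 also matches case-insensitively.
$tests = ['KRS-KRS-454-L', 'DERP-DERP-DERP-545-P', 'OKAY-666-A'];

$results = [];
foreach ($tests as $line) {
    $results[] = preg_replace('/([A-Za-z0-9]+-)\1+/i', '$1', $line);
}

echo implode("\n", $results), "\n";
// KRS-454-L
// DERP-545-P
// OKAY-666-A
```

Because \1+ allows one or more repeats, all three DERP- sets collapse in a single pass.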

Related

Validate string to contain only qualifying characters and a specific optional substring in the middle

I'm trying to write a regular expression in PHP. I can get it working in other languages, but not in PHP.
I want to validate item names in an array:
They can contain upper and lower case letters, numbers, underscores, and hyphens.
They can contain => as an exact string, not separate characters.
They cannot start with =>.
They cannot finish with =>.
My current code:
$regex = '/^[a-zA-Z0-9-_]+$/'; // contains A-Z a-z 0-9 - _
//$regex = '([^=>]$)'; // doesn't end with =>
//$regex = '~.=>~'; // doesn't start with =>
if (preg_match($regex, 'Field_name_true2')) {
echo 'true';
} else {
echo 'false';
};
// Field=>Value-True
// =>False_name
//Bad_name_2=>

Use negative lookarounds. Negative lookahead (?!=>) at the beginning to prohibit beginning with =>, and negative lookbehind (?<!=>) at the end to prohibit ending with =>.
^(?!=>)(?:[a-zA-Z0-9-_]+(=>)?)+(?<!=>)$

There is absolutely no requirement for lookarounds here.
Anchors and an optional group will suffice.
/^[\w-]+(?:=>[\w-]+)?$/
Here the whole non-capturing group (?:=>[\w-]+)? is optional.
This allows full strings consisting exclusively of [0-9a-zA-Z_-] characters, or split ONCE by =>.
The non-capturing group may occur zero or one time.
In other words, => may occur after one or more [\w-] characters, but if it does occur, it MUST be immediately followed by one or more [\w-] characters until the end of the string.
To cover some of the ambiguity in the question's requirements:
If foo=>bar=>bam is valid, then use /^[\w-]+(?:=>[\w-]+)*$/ which replaces ? (zero or one) with * (zero or more).
If foo=>=>bar is valid then use /^[\w-]+(?:(?:=>)+[\w-]+)*$/ which replaces => (must occur once) with (?:=>)+ (substring must occur one or more times).
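A quick sketch that runs the three variants against the asker's sample names plus two hypothetical ones (foo=>bar=>bam and foo=>=>bar) to show where they differ.

```php
<?php
// The three anchored variants from the answer.
$patterns = [
    'once'        => '/^[\w-]+(?:=>[\w-]+)?$/',      // => at most once
    'many'        => '/^[\w-]+(?:=>[\w-]+)*$/',      // foo=>bar=>bam allowed
    'consecutive' => '/^[\w-]+(?:(?:=>)+[\w-]+)*$/', // foo=>=>bar allowed
];

$names = ['Field_name_true2', 'Field=>Value-True', '=>False_name',
          'Bad_name_2=>', 'foo=>bar=>bam', 'foo=>=>bar'];

foreach ($patterns as $label => $re) {
    foreach ($names as $name) {
        printf("%-12s %-18s %s\n", $label, $name,
            preg_match($re, $name) ? 'valid' : 'invalid');
    }
}
```

All three reject a leading or trailing =>, because [\w-]+ is required at both ends of every alternative.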

Well, your character class is equivalent to [-\w], so you could use
^(?!=>)(?:(?!=>$)(?:[-\w]|=>))+$
This construct uses a "tempered greedy token": (?!=>$) is checked before every step, so an => at the very end can never be consumed.
A shinier, more complicated and surely over-the-top alternative uses subroutines:
(?(DEFINE)
(?<chars>[-\w]) # equals to A-Z, a-z, 0-9, _, -
(?<af>=>) # "arrow function"
(?<item>
(?!(?&af)) # no af at the beginning
(?:(?&af)?(?&chars)++)+
(?!(?&af)) # no af at the end
)
)
^(?&item)$
Note that this pattern needs the x (free-spacing) modifier, so that the whitespace and the # comments are ignored.
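A sketch that checks the tempered-greedy-token pattern against the asker's samples; the variable names are illustrative only.

```php
<?php
// (?!=>)                   : the string must not start with =>
// (?:(?!=>$)(?:[-\w]|=>))+ : each step consumes one [-\w] char or one =>,
//                            but never an => sitting right before the end
$re = '/^(?!=>)(?:(?!=>$)(?:[-\w]|=>))+$/';

$samples = ['Field_name_true2', 'Field=>Value-True', '=>False_name', 'Bad_name_2=>'];
foreach ($samples as $s) {
    echo $s, ': ', preg_match($re, $s) ? 'valid' : 'invalid', "\n";
}
```

Unlike /^[\w-]+(?:=>[\w-]+)?$/, this pattern also accepts repeated or consecutive arrows such as foo=>=>bar.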

For the example data, you can use
^[a-zA-Z0-9_-]+=>[a-zA-Z0-9_-]+$
The pattern matches:
^ Start of string
[a-zA-Z0-9_-]+ Match 1+ times any of the listed ranges or characters (so the string cannot start with =>)
=> Match literally
[a-zA-Z0-9_-]+ Match again 1+ times any of the listed ranges or characters
$ End of string
If you want to allow for optional spaces:
^\h*[a-zA-Z0-9_-]+\h*=>\h*[a-zA-Z0-9_-]+\h*$
Note that [a-zA-Z0-9_-] can be written as [\w-]
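A sketch exercising the space-tolerant pattern. Note that it requires exactly one =>, so a plain name such as Field_name_true2 is rejected.

```php
<?php
// \h matches horizontal whitespace (spaces and tabs) around the names and the arrow.
$re = '/^\h*[a-zA-Z0-9_-]+\h*=>\h*[a-zA-Z0-9_-]+\h*$/';

foreach (['Field=>Value-True', 'Field => Value-True', '=>False_name', 'Field_name_true2'] as $s) {
    echo $s, ': ', preg_match($re, $s) ? 'valid' : 'invalid', "\n";
}
```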

Why regex with lookaheads doesn't match?

I need (in PHP) to split a sentence on a word, but only where that word is not the first or the last one in the sentence. Say the word is "pression"; here is my regex:
/^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$/i
Live here: https://regex101.com/r/CHAhKj/1/
First, it doesn't match.
Second, is it possible at all to split that way? I tried a simplified example:
print_r(preg_split('/^.+pizza.+$/', 'my pizza is cool'));
Live here: http://sandbox.onlinephpfunctions.com/code/10b674900fc1ef44ec79bfaf80e83fe1f4248d02
It prints an array of two empty strings, whereas I expect
['my ', ' is cool']

I need (in PHP) to split a sentence by the word that cannot be the first or the last one in the sentence
You may use this regex:
(?<=[^\s.?]\h)pression(?=\h[^\s.?])
RegEx Details:
(?<=[^\s.?]\h): Lookbehind to assert that immediately before the current position there is a horizontal whitespace character, preceded by a character that is not whitespace, not a dot and not a ?.
pression: Match the word pression
(?=\h[^\s.?]): Lookahead to assert that immediately after the current position there is a horizontal whitespace character, followed by a character that is not whitespace, not a dot and not a ?.
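A sketch of how the lookaround pattern can be passed straight to preg_split(), using a sample sentence. Because both lookarounds are zero-width, the spaces around the word stay in the pieces.

```php
<?php
// Split on "pression" only where it is neither the first nor the last word.
$re = '/(?<=[^\s.?]\h)pression(?=\h[^\s.?])/';

$parts = preg_split($re, 'You can use any regular expression pression inside the lookahead');
var_export($parts);
echo "\n";

// When the word is first or last, nothing is split and one element comes back.
var_export(preg_split($re, 'pression is the first word'));
echo "\n";
var_export(preg_split($re, 'the last word is pression'));
echo "\n";
```

The "pression" inside "expression" is not a split point, because the lookbehind requires a space right before the word.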

First, ^.+?(?=[\s\.\,\:\;])pression(?=[\s\.\,\:\;]).+$ can't match any string at all because the (?=[\s\.\,\:\;])p part requires p to be also either a whitespace char, or a ., ,, : or ;, which invalidates the whole match at once.
Second, the ^.+pizza.+$ pattern does not ensure that the matched pizza is not the first or last word in a sentence, because . matches whitespace, too. It also returns nothing meaningful: preg_split uses each match as a delimiter, and since this pattern matches the entire string, only the two empty strings before and after that match are left.
That said, all you need is:
preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)
Details:
^ - start of string
(.*?\w\W+) - Capturing group 1: any zero or more chars, as few as possible, then a word char and then one or more non-word chars
pression - a word
(\W+\w.*) - Capturing group 2: one or more non-word chars, a word char, and then any zero or more chars as many as possible
$ - end of string.
The s flag makes . match across lines, and the i flag makes the pattern case-insensitive.
See the PHP demo:
$text = "You can use any regular expression pression inside the lookahead ";
if (preg_match('~^(.*?\w\W+)pression(\W+\w.*)$~is', $text, $m)) {
echo $m[1] . " << | >> " . $m[2];
}
// => You can use any regular expression << | >> inside the lookahead

Finding #mentions in string

Trying to replace all occurrences of an #mention with an anchor tag, so far I have:
$comment = preg_replace('/#([^# ])? /', '#$1 ', $comment);
Take the following sample string:
"#name kdfjd fkjd as#name # lkjlkj #name"
Everything matches okay so far, but I want to ignore that single "#" symbol. I've tried using "+" and "{2,}" after the "[^# ]" which I thought would enforce a minimum amount of matches, but it's not working.

Remove the question mark (?) quantifier ("optional") and add a + ("one or more") after your character class instead:
#([^# ]+)

The regex
(^|\s)(#\w+)
might be what you are after.
It matches either the start of the string or a whitespace character, then a # symbol followed by one or more word characters.
For example:
preg_match_all('/(^|\s)(#\w+)/', '#name1 kdfjd fkjd as#name2 # lkjlkj #name3', $result);
print_r($result[2]);
Gives you
Array
(
[0] => #name1
[1] => #name3
)

I like Petah's answer, but I adjusted it slightly:
preg_replace('/(^|\s)#([\w.]+)/', '$1#$2', $text);
The main differences are:
the # symbol is not captured; it is for display only and should not be part of the URL
it allows the . character (note: \w already includes the underscore)
in the replacement, I added $1 at the beginning to preserve the whitespace
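A sketch that extracts the mention names with the adjusted pattern; the sample comment is made up for illustration.

```php
<?php
// Hypothetical sample comment.
$text = '#john.doe said hi to #jane_x, not to mail#bob';

// Group 1 keeps the preceding whitespace (or start of string),
// group 2 holds the name without the # symbol.
preg_match_all('/(^|\s)#([\w.]+)/', $text, $m);
print_r($m[2]); // john.doe, jane_x
```

Because . is allowed in names, a mention at the end of a sentence such as #bob. would be captured as bob. including the period.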

Replacing ? with + will work, but not as you expect: your expression requires a trailing space, so it does not match a #name at the end of the string.
$comment = preg_replace('/#(\w+)/', '$0 ', $comment);
This should do what you want. \w+ matches one or more word characters (a-zA-Z0-9 and the underscore).

I recommend using a lookbehind before matching the #, then matching one or more characters that are not a space or #.
The "one or more" quantifier (+) prevents matching mentions that mention no one.
Using a lookbehind is a good idea because it not only prevents matching email addresses and other unwanted substrings, but also lets the regex engine search for # characters first and only then check the preceding character. This should improve pattern performance, since spaces will consistently outnumber mentions in comments.
If the input text is multiline or may contain newlines, adding the m pattern modifier will make ^ match at every line start. If newlines and tabs are possible, it will be more reliable to use (?<=^|\s)#([^#\s]+).
Code:
$comment = "#name kdfjd ## fkjd as#name # lkjlkj #name";
var_export(
preg_replace(
'/(?<=^| )#([^# ]+)/',
'#$1',
$comment
)
);
Output: (single-quotes are from var_export())
'#name kdfjd ## fkjd as#name # lkjlkj #name'
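A sketch of the \s variant on a hypothetical multiline comment. Since \s also matches the newline, no m modifier is needed for a mention at the start of the second line.

```php
<?php
// Hypothetical comment spanning two lines.
$comment = "#alice kdfjd ## fkjd as#name # lkjlkj\n#bob";

// (?<=^|\s) : preceded by the start of the string or any whitespace
// [^#\s]+   : one or more characters that are neither # nor whitespace
preg_match_all('/(?<=^|\s)#([^#\s]+)/', $comment, $m);
print_r($m[1]); // alice, bob
```

The ##, as#name and lone # are all skipped, just as in the single-line example.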

Try:
'/#(\w+)/i'

Regex matching if maximum two occurrences of dot and dash

I need a regular expression that will match any string containing at most 2 dashes and at most 2 dots.
There does not HAVE to be a dash or a dot, but if there are 3+ dashes, 3+ dots, or both, then the regex must not match the string.
Intended for use in PHP.
I know of easy alternatives using PHP functions, but it is to be used in a large system that just allows filtering using regular expressions.
Example string that will be MATCHED:
hello-world.com
Example string that will NOT be matched:
www.hello-world.easy.com or hello-world-i-win.com

Does this match your expectations?
(?!^.*?([.-]).*\1.*\1.*$)^.*$
(?!^.*?([.-]).*\1.*\1.*$) is a negative lookahead. It captures a . or - in group 1 and then uses the backreference \1 to check whether the same character occurs two more times. As soon as it finds three, the lookahead succeeds and the whole expression fails.
^.*$ then matches everything from start to end, provided the negative lookahead found no such triple.

Use this: (?!^.*?([-.])(?:.*\1){2}.*$)^.*$
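A sketch that checks both backreference patterns against the question's examples plus one hypothetical string with two dashes and two dots.

```php
<?php
// Both patterns reject any string in which the same . or - occurs 3+ times.
$patterns = [
    '/(?!^.*?([.-]).*\1.*\1.*$)^.*$/',
    '/(?!^.*?([-.])(?:.*\1){2}.*$)^.*$/',
];

// Expected result for each sample (the last one is hypothetical).
$samples = [
    'hello-world.com'          => true,
    'www.hello-world.easy.com' => false,
    'hello-world-i-win.com'    => false,
    'a-b.c-d.e'                => true, // two dashes and two dots
];

foreach ($patterns as $re) {
    foreach ($samples as $s => $expected) {
        $matched = preg_match($re, $s) === 1;
        echo $re, ' ', $s, ': ', $matched ? 'match' : 'no match', "\n";
    }
}
```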

This tested regex will do the trick:
$re = '/# Match string with 2 or fewer dots or dashes
^ # Anchor to start of string.
(?=[^.]*(?:\.[^.]*){0,2}$) # Assert 2 or fewer dots.
(?=[^\-]*(?:-[^\-]*){0,2}$) # Assert 2 or fewer dashes.
.* # Ok to match string.
$ # Anchor to end of string.
/sx';
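A usage sketch for the free-spacing pattern above, run against the examples from the question. The pattern is copied so the snippet stands alone.

```php
<?php
// Same pattern as the answer, compiled with the x (free-spacing) and s flags.
$re = '/# Match string with 2 or fewer dots or dashes
    ^                            # Anchor to start of string.
    (?=[^.]*(?:\.[^.]*){0,2}$)   # Assert 2 or fewer dots.
    (?=[^\-]*(?:-[^\-]*){0,2}$)  # Assert 2 or fewer dashes.
    .*                           # Ok to match string.
    $                            # Anchor to end of string.
    /sx';

foreach (['hello-world.com', 'www.hello-world.easy.com', 'hello-world-i-win.com'] as $s) {
    echo $s, ': ', preg_match($re, $s) ? 'match' : 'no match', "\n";
}
```

Counting each character in its own lookahead avoids backreferences entirely, which makes the two limits easy to change independently.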

PHP: How to convert a string that contains upper case characters

I'm working on class names, and I need to check whether a name is in upper camel case and break it up this way:
"UserManagement" becomes "user-management"
or
"SiteContentManagement" becomes "site-content-management"
After an extensive search I only found various uses of ucfirst, strtolower, strtoupper and ucwords, and I can't see how to use them to suit my needs. Any ideas?
Thanks for reading ;)

You can use preg_replace to replace any instance of a character that is not an uppercase letter or a dash (typically a lowercase letter) followed by an uppercase letter with the same two characters separated by a dash:
$dashedName = preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className);
Then followed by a strtolower() to take care of any remaining uppercase letters:
return strtolower($dashedName);
The full function here:
function camel2dashed($className) {
return strtolower(preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className));
}
To explain the regular expression used:
/ Opening delimiter
( Start Capture Group 1
[^A-Z-] Character Class: Any character NOT an uppercase letter and not a dash
) End Capture Group 1
( Start Capture Group 2
[A-Z] Character Class: Any uppercase letter
) End Capture Group 2
/ Closing delimiter
As for the replacement string
$1 Insert Capture Group 1
- Literal: dash
$2 Insert Capture Group 2
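A sketch of the full function with the question's two examples. The third input is a hypothetical already-dashed name, which shows why the dash is excluded from the first character class.

```php
<?php
function camel2dashed(string $className): string
{
    // Insert a dash between a character that is neither an uppercase
    // letter nor a dash and the uppercase letter after it, then lowercase.
    return strtolower(preg_replace('/([^A-Z-])([A-Z])/', '$1-$2', $className));
}

echo camel2dashed('UserManagement'), "\n";        // user-management
echo camel2dashed('SiteContentManagement'), "\n"; // site-content-management
echo camel2dashed('Already-Dashed'), "\n";        // already-dashed
```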

There's no built-in way to do it.
This will ConvertThis into convert-this:
$str = preg_replace('/([a-z])([A-Z])/', '$1-$2', $str);
$str = strtolower($str);

You can use a regex to get each words, then add the dashes like this:
preg_match_all('/[A-Z][a-z]+/', $className, $matches); // get each capitalized word
$newName = strtolower(implode('-', $matches[0])); // add the dashes and lowercase the result

This can be done without any capture groups: just find the zero-width position before each uppercase letter (excluding the first letter of the string), replace it with a hyphen, then call strtolower() on the new string.
Code:
echo strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', $string));
The lookahead (?=...) asserts that an uppercase letter follows without consuming it, and (?!^) stops a hyphen from being inserted at the very start.
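A sketch wrapping the one-liner in a function; toKebab is a hypothetical name.

```php
<?php
// Hypothetical wrapper: (?!^) skips position 0, (?=[A-Z]) matches the empty
// position before each uppercase letter, where preg_replace inserts a hyphen.
function toKebab(string $string): string
{
    return strtolower(preg_replace('~(?!^)(?=[A-Z])~', '-', $string));
}

echo toKebab('UserManagement'), "\n";        // user-management
echo toKebab('SiteContentManagement'), "\n"; // site-content-management
echo toKebab('HTMLParser'), "\n";            // h-t-m-l-parser
```

One consequence of inserting a hyphen before every uppercase letter is that acronyms are split letter by letter, as the HTMLParser line shows.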

The best way to do that might be preg_replace using a pattern that replaces uppercase letters with their lowercase counterparts adding a "-" before them.
You could also go through each letter and rebuild the whole string.
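A sketch of the letter-by-letter approach, assuming ASCII class names; camelToDashedLoop is a hypothetical helper name.

```php
<?php
// Hypothetical helper; assumes ASCII class names (checked byte by byte).
function camelToDashedLoop(string $name): string
{
    $out = '';
    $len = strlen($name);
    for ($i = 0; $i < $len; $i++) {
        $ch = $name[$i];
        // Put a dash before every uppercase letter except the first character.
        if ($i > 0 && $ch >= 'A' && $ch <= 'Z') {
            $out .= '-';
        }
        $out .= strtolower($ch);
    }
    return $out;
}

echo camelToDashedLoop('UserManagement'), "\n";        // user-management
echo camelToDashedLoop('SiteContentManagement'), "\n"; // site-content-management
```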
