Regex - Do not capture word if it contains another word - php

I'm currently fighting with a regex to achieve something and can't succeed on my own.
Here is my string: path/to/folder/#here
What I want is something like this: a:path/a:to/a:folder/#here
So my regex is /([^\/]+)/g, but the problem is that the result (with preg_replace('/([^\/]+)/', 'a:$0', $str)) is: a:path/a:to/a:folder/a:#here
How can I skip the replacement if the captured group contains #?
I tried this one, without any success: /((?!#)[^\/]+)/g

Another option could be to match what you want to avoid, and use a SKIP FAIL approach.
/#[^/#]*(*SKIP)(*F)|[^/]+
/# Match literally
[^/#]*(*SKIP)(*F) Match zero or more chars other than / and #, then skip this match
| Or
[^/]+ Match 1+ occurrences of any char except /
See a regex demo and a PHP demo.
For example
$str = 'path/to/folder/#here';
echo preg_replace('~/#[^/#]*(*SKIP)(*F)|[^/]+~', 'a:$0', $str);
Output
a:path/a:to/a:folder/#here

You can use
(?<![^\/])[^\/#][^\/]*
See the regex demo. Details:
(?<![^\/]) - a negative lookbehind that requires either start of string or a / char to appear immediately to the left of the current location
[^\/#] - a char other than / and #
[^\/]* - zero or more chars other than /.
See the PHP demo:
$text = 'path/to/folder/#here';
echo preg_replace('~(?<![^/])[^/#][^/]*~', 'a:$0', $text);
// => a:path/a:to/a:folder/#here
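As a quick check, the sketch below runs both answers' patterns through preg_replace() on the question's string and on one extra input that starts with a #-segment (the extra input and the $patterns/$results names are illustrative, not from the answers). Both give a:path/a:to/a:folder/#here for the question's string, but only the lookbehind pattern leaves a leading #here untouched, because the SKIP FAIL alternative only skips a # that directly follows a /.

```php
<?php
// The two replacement patterns from the answers above.
$patterns = [
    'skip_fail'  => '~/#[^/#]*(*SKIP)(*F)|[^/]+~',
    'lookbehind' => '~(?<![^/])[^/#][^/]*~',
];

// The question's string plus a hypothetical string that starts with "#".
$inputs = ['path/to/folder/#here', '#here/path'];

$results = [];
foreach ($patterns as $name => $re) {
    foreach ($inputs as $in) {
        // Prefix every matched path segment with "a:".
        $results[$name][$in] = preg_replace($re, 'a:$0', $in);
    }
}

print_r($results);
```

On the extra input, the first pattern produces a:#here/a:path and the second produces #here/a:path.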

Related

How to capture all phrases that don't have a pattern in the middle?

I want to capture all strings that don't have the pattern a[a-z]* between the hyphens, as in the example below:
<?php
$myStrings = array(
    "123-456",
    "123-7-456",
    "123-Apple-456",
    "123-0-456",
    "123-Alphabet-456"
);
foreach ($myStrings as $myStr) {
    // var_dump() prints by itself, so no echo is needed
    var_dump(
        preg_match("/123-(?!a[a-z]*)-456/i", $myStr)
    );
}
?>
You can check the following solution at this Regex101 share link.
^(123-(?:(?![aA][a-zA-Z]*).*)-456)|(123-456)$
It uses a non-capturing group (?:) and a negative lookahead (?!) to find all inner sections that do not start with 'a' (or 'A') followed by any letters. The case with no inner section (123-456) is added as a second alternative (with the | sign), because the first alternative always needs a second hyphen.
A lookahead is a zero-length assertion, so the middle part also needs to be consumed before 456 can match. To consume it, use e.g. \w+- (one or more word characters and a hyphen) inside an optional group that starts with your lookahead condition. See this regex101 demo (i flag for case-insensitive matching).
Further, preg_grep can be used to search an array (see the PHP demo at tio.run).
preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);
There is also an invert option: PREG_GREP_INVERT. If you don't need to check the start and end, a simpler pattern like -a[a-z]*- without a lookahead could be used with it (another PHP demo).
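A minimal sketch of the preg_grep() approach, assuming the sample array from the question. It runs the anchored lookahead pattern and, for comparison, the simpler -a[a-z]*- pattern with PREG_GREP_INVERT. Since preg_grep() preserves the original keys, array_values() is used to reindex the results.

```php
<?php
$myStrings = ["123-456", "123-7-456", "123-Apple-456", "123-0-456", "123-Alphabet-456"];

// Anchored pattern: the optional middle part must not start with a[a-z]*-
$matched = preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);

// Unanchored pattern; PREG_GREP_INVERT returns the entries that do NOT match it.
$inverted = preg_grep('~-a[a-z]*-~i', $myStrings, PREG_GREP_INVERT);

// Keys are preserved (0, 1, 3 here), so reindex before printing.
print_r(array_values($matched));
print_r(array_values($inverted));
```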
Match the pattern and invert the result:
!preg_match('/a[a-z]*/i', $yourStr);
Don't try to do everything with a regex when programming languages exist to do the job.
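A sketch of the same idea applied to the question's array (the $kept name is illustrative). Note that the unanchored /a[a-z]*/i rejects any string containing an a anywhere, which happens to be fine for this sample data.

```php
<?php
$myStrings = ["123-456", "123-7-456", "123-Apple-456", "123-0-456", "123-Alphabet-456"];

// Keep a string only if the unwanted pattern does NOT match it.
$kept = array_values(array_filter($myStrings, function (string $s): bool {
    return !preg_match('/a[a-z]*/i', $s);
}));

print_r($kept);
```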
You are not getting a match because, in the pattern 123-(?!a[a-z]*)-456, the lookahead (?!a[a-z]*) consumes nothing and is always true at that point: after matching the first -, the pattern has to match another hyphen directly, so it effectively becomes 123--456.
If you move the last hyphen inside the lookahead, as in 123-(?!a[a-z]*-)456, you only get one match, for 123-456, because the middle part of the string is still never matched.
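The sketch below illustrates both points by testing the question's pattern and the moved-hyphen variant against a few strings. 123--456 is a hypothetical input, added only to show what the original pattern actually requires.

```php
<?php
// The question's pattern: the lookahead is zero-width, so "--" must follow "123".
$original = '/123-(?!a[a-z]*)-456/i';
// Hyphen moved into the lookahead: still nothing consumes a middle part.
$moved = '/123-(?!a[a-z]*-)456/i';

$tests = ["123--456", "123-456", "123-7-456"];
foreach ($tests as $s) {
    printf("%-10s original=%d moved=%d\n", $s, preg_match($original, $s), preg_match($moved, $s));
}
```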
Another option in PHP is to consume the part that you don't want and then use SKIP FAIL:
^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$
Explanation
^ Start of string
123- Match literally
(?: Non capture group for the alternation
a[a-z]*-(*SKIP)(*F) Match a, then optional chars a-z, then match - and skip the match
| Or
\w+- Match 1+ word chars followed by -
)? Close the non capture group and make it optional to also match when there is no middle part
456 Match literally
$ End of string
Regex demo
Example
$myStrings = array(
    "123-456",
    "123-7-456",
    "123-Apple-456",
    "123-0-456",
    "123-Alphabet-456",
    "123-b-456"
);
foreach ($myStrings as $myStr) {
    if (preg_match("/^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$/i", $myStr, $match)) {
        echo "Match for $match[0]" . PHP_EOL;
    } else {
        echo "No match for $myStr" . PHP_EOL;
    }
}
Output
Match for 123-456
Match for 123-7-456
No match for 123-Apple-456
Match for 123-0-456
No match for 123-Alphabet-456
Match for 123-b-456

preg_replace up to and including the dot

I have one line that I need to replace; here it is:
/myhome/ishere/where/are.you.ha
Here is my regex:
preg_match('/(.+\/)[^.]+(.+\.ha)/', $input_line, $output_array);
It gives me this:
/myhome/ishere/where/.you.ha
But I need a result like this:
/myhome/ishere/where/you.ha
Can anyone please help me remove this dot?
You could write the pattern like this, which gives you 2 capture groups that you can use:
(.+\/)[^.]+\.([^.]+\.ha)$
Explanation
(.+\/) Capture group 1, match 1+ chars and then the last /
[^.]+ Match 1+ non dots
\. Match a dot
([^.]+\.ha) Capture group 2, match 1+ non dots, then .ha
$ End of string
Regex demo | Php demo
If you use $1$2 in the replacement:
$pattern = "/(.+\/)[^.]+\.([^.]+\.ha)$/";
$s = "/myhome/ishere/where/are.you.ha";
echo preg_replace($pattern, "$1$2", $s);
Output
/myhome/ishere/where/you.ha
Or see an example using preg_match with 2 capture groups.
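Since the linked preg_match example is not included here, this is a sketch of what it could look like, assuming the same $pattern and $s as above. Concatenating the two groups gives the same result as the $1$2 replacement.

```php
<?php
$pattern = "/(.+\/)[^.]+\.([^.]+\.ha)$/";
$s = "/myhome/ishere/where/are.you.ha";

// Group 1 holds the directory part, group 2 the part after the removed "are."
if (preg_match($pattern, $s, $m)) {
    $result = $m[1] . $m[2];
    echo $result, PHP_EOL; // /myhome/ishere/where/you.ha
}
```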
You can use
/^(.+\/)[^.]*\.(.*\.ha)$/
See the regex demo. Details:
^ - start of a string
(.+\/) - Group 1: any one or more chars other than line break chars as many as possible and then /
[^.]* - zero or more chars other than a .
\. - a . char
(.*\.ha) - Group 2: any zero or more chars other than line break chars as many as possible and then .ha
$ - end of a string
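A short sketch of this pattern used with preg_replace() and $1$2 as the replacement. The second path is a hypothetical input, added to show the pattern on a shorter string.

```php
<?php
$re = '/^(.+\/)[^.]*\.(.*\.ha)$/';

// The second path is hypothetical, for illustration only.
$paths = ['/myhome/ishere/where/are.you.ha', '/a/b/x.y.ha'];

$out = [];
foreach ($paths as $path) {
    // Keep group 1 (directory incl. last "/") and group 2 (text after the first dot of the file name).
    $out[] = preg_replace($re, '$1$2', $path);
}

print_r($out);
```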
I got the answer: my mistake was not putting +\. in between.
(.+/)[^.]+\.(.+\.ha)
preg_replace('/(.+\/)[^.]+\.(.+\.ha)/', '$1$2', $input_lines);
This is how it works.

Sanitize phone number: regular expression to match all except the first occurrence if it is in the first position

Regarding this post "https://stackoverflow.com/questions/35413960/regular-expression-match-all-except-first-occurence", I'm wondering how, in PHP, to find the first occurrence in a string only if the string starts with a specific character.
I would like to sanitize phone numbers. Example bad phone number:
+49+12423#23492#aosd#+dasd
Regex to remove all "+" except the first occurrence:
\G(?:\A[^\+]*\+)?+[^\+]*\K\+
Problem: it should remove every "+" only if the string starts with "+", not if the first occurrence is at a position greater than 1.
The regex to remove everything except numbers is easy:
[^0-9]*
But I don't know how to combine those two within one regex. Otherwise I would just use preg_replace() twice.
Of course I could use a workaround like if ($str[0] === '+') {...}, but I'd prefer to learn something new (regex :)
Thanks for helping.
You can use
(?:\G(?!\A)|^\+)[^+]*\K\+
See the regex demo. Details:
(?:\G(?!\A)|^\+) - either the end of the preceding successful match or a + at the start of string
[^+]* - zero or more chars other than +
\K - match reset operator discarding the text matched so far
\+ - a + char.
See the PHP demo:
$re = '/(?:\G(?!\A)|^\+)[^+]*\K\+/m';
$str = '+49+12423#23492#aosd#+dasd';
echo preg_replace($re, '', $str);
// => +4912423#23492#aosd#dasd
You seem to want to combine the two queries:
A regex to remove everything except numbers
A regex to remove all "+" except the first occurrence
Here is my two cents:
(?:^\+|\d)(*SKIP)(*F)|.
Replace each match with an empty string. Here is an online demo.
(?:^\+|\d) - A non-capture group to match a starting literal plus or any digit in the range from 0-9.
(*SKIP)(*F) - Consume the previously matched characters, then fail, so they are excluded from the result and matching resumes after them.
| - Or:
. - Any single character other than newline.
I'd like to think that this is a slight adaptation of what some consider "The best regex trick ever" where one would first try to match what you don't want, then use an alternation to match what you do want. With the use of the backtracking control verbs (*SKIP)(*F) we reverse the logic. We first match what we do want, exclude it from the results and then match what we don't want.
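A minimal PHP sketch of this SKIP FAIL pattern with preg_replace(), using the question's sample number plus one hypothetical number without a leading +. Digits and a leading + are skipped, and every other character matched by . is replaced with an empty string.

```php
<?php
// Keep a leading "+" and all digits; everything else matched by "." is removed.
$re = '/(?:^\+|\d)(*SKIP)(*F)|./';

// The second sample is hypothetical, for illustration only.
$samples = ['+49+12423#23492#aosd#+dasd', '49+12#3'];

$clean = [];
foreach ($samples as $s) {
    $clean[] = preg_replace($re, '', $s);
}

print_r($clean);
```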

Remove lines ending with a random format with regex or Notepad++

I've got a list of URLs with a random string at the end, like this:
paris-chambre-double-classique-avec-option-petit-dejeuner-a-lhotel-trianon-rive-gauche-4-pour-2-personnes-8ae0676c-aba2-4cf2-9391-91096a247672
paris-chambre-double-standard-avec-petit-dejeuner-et-acces-spa-pour-2-personnes-a-lhotel-le-mareuil-4-f707b0fe-31cb-4507-b7b3-7b91695bff9c
villes-deurope-visite-des-plus-grands-monuments-et-acces-aux-activites-etou-transport-avec-un-pass-par-destination-6a04659b-62c4-4995-9d0f-5e473df520cd
paris-chambre-doubletriplequadruple-confort-avec-petit-dejeuner-a-lhotel-de-france-gare-de-lyon-pour-2-a-4-pers-404f5780-9818-4599-af6b-be53b85a8185
paris-chambre-double-standard-avec-pdj-et-croisiere-sur-la-seine-en-option-a-lhotel-prince-albert-lyon-bercy-pour-2-33d0b087-5701-4199-9d9c-147cca687263.html
For a few days now I've been trying with regex to convert these lines into:
/paris-chambre-double-classique-avec-option-petit-dejeuner-a-lhotel-trianon-rive-gauche-4-pour-2-personnes-8ae0676c-aba2-4cf2-9391-91096a247672
/paris-chambre-double-standard-avec-petit-dejeuner-et-acces-spa-pour-2-personnes-a-lhotel-le-mareuil-4-f707b0fe-31cb-4507-b7b3-7b91695bff9c
villes-deurope-visite-des-plus-grands-monuments-et-acces-aux-activites-etou-transport-avec-un-pass-par-destination-6a04659b-62c4-4995-9d0f-5e473df520cd.html
/paris-chambre-doubletriplequadruple-confort-avec-petit-dejeuner-a-lhotel-de-france-gare-de-lyon-pour-2-a-4-pers-404f5780-9818-4599-af6b-be53b85a8185
paris-chambre-double-standard-avec-pdj-et-croisiere-sur-la-seine-en-option-a-lhotel-prince-albert-lyon-bercy-pour-2-33d0b087-5701-4199-9d9c-147cca687263.html
The problem is the random string:
3d0b087-5701-4199-9d9c-147cca687263
33d0b087-5701-4199-9d9c-147cca687263
I need to remove this part, together with the - in front of it, then add .html and put a / before the URL, like this:
/paris-chambre-doubletriplequadruple-confort-avec-petit-dejeuner-a-lhotel-de-france-gare-de-lyon-pour-2-a-4-pers.html
paris-chambre-double-standard-avec-pdj-et-croisiere-sur-la-seine-en-option-a-lhotel-prince-albert-lyon-bercy-pour-2.html
Thanks for your help. Regex is driving me crazy.
This is for a new Linux server, running MySQL 5, PHP 5 and Apache 2.
The lines appear to end with some sort of hash, which means it can only contain the letters a to f and digits.
To match this hash, you can use the following regex (it does include the initial dash):
\-[0-9a-f]{8}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{4}\-[0-9a-f]{12}
See here for a demo.
Once you have matched what you want to remove, you can replace it with the PHP preg_replace function.
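As a sketch of that last step, assuming the goal from the question (drop the hash with its leading -, put / in front and .html at the end), the hash pattern can be anchored at the end of each line with an optional .html, so a line that already ends in .html is handled too. The $hash, $lines and $urls names are illustrative.

```php
<?php
// Same 8-4-4-4-12 hex hash as above (the \ before each - is not needed), with its leading dash.
$hash = '-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}';

$lines = [
    'paris-chambre-double-classique-avec-option-petit-dejeuner-a-lhotel-trianon-rive-gauche-4-pour-2-personnes-8ae0676c-aba2-4cf2-9391-91096a247672',
    'paris-chambre-double-standard-avec-pdj-et-croisiere-sur-la-seine-en-option-a-lhotel-prince-albert-lyon-bercy-pour-2-33d0b087-5701-4199-9d9c-147cca687263.html',
];

$urls = [];
foreach ($lines as $line) {
    // Remove the hash (and an existing ".html") at the end, then add "/" and ".html".
    $urls[] = '/' . preg_replace('/' . $hash . '(?:\.html)?$/', '', $line) . '.html';
}

print_r($urls);
```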
You could use this pattern to capture the part you want to keep into a group: ^(.+)(?:-[0-9a-zA-Z]+){5}$
and the replacement pattern is \\\1.html
Explanation:
^ - match beginning of a string
(.+) - capturing group: match one or more of any characters
(?:...) - non-capturing group
-[0-9a-zA-Z]+ - match hyphen - literally, then any letter (lower or uppercase) or any digit one or more times
{5} - match (?:-[0-9a-zA-Z]+) exactly five times
$ - match end of string
Replace pattern:
\\ - \ literally
\1 - refers to first capturing group
.html - .html literally
Demo
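A sketch of this answer's pattern in PHP, applied to a multi-line block with the m modifier so that ^ and $ work per line. It assumes the leading / from the question is wanted, so the replacement is /$1.html rather than the \\\1.html shown above. Lines that already end in .html are not matched, because . is not in [0-9a-zA-Z].

```php
<?php
// Two lines from the question, joined into one text block.
$text = implode("\n", [
    'villes-deurope-visite-des-plus-grands-monuments-et-acces-aux-activites-etou-transport-avec-un-pass-par-destination-6a04659b-62c4-4995-9d0f-5e473df520cd',
    'paris-chambre-doubletriplequadruple-confort-avec-petit-dejeuner-a-lhotel-de-france-gare-de-lyon-pour-2-a-4-pers-404f5780-9818-4599-af6b-be53b85a8185',
]);

// m modifier: ^ and $ match at each line start/end; "." does not cross the newline.
// Group 1 is everything before the last five "-segment" parts (the hash).
$out = preg_replace('/^(.+)(?:-[0-9a-zA-Z]+){5}$/m', '/$1.html', $text);

echo $out, PHP_EOL;
```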

How to use search and replace all the matching words in a sentence in php

I have to search and replace all the words starting with @ and # in a sentence. Can you please let me know the best way to do this in PHP? I tried:
preg_replace('/(\@+|\#+).*?(?=\s)/','--', $string);
This replaces only one word in a sentence. I want all the matches to be replaced.
I cannot use the g flag here as in Perl.
preg_replace replaces all matches by default. If it is not doing so, it is an issue with your pattern or the data.
Try this pattern instead:
(?<!\S)[@#]+\w+
(?<!\S) - do not match if the pattern is preceded by a non-whitespace character.
[@#]+ - match one or more of @ and #.
\w+ - match one or more word characters (letters, digits, underscores). This will preserve punctuation. For example, #foo. would be replaced by "--.". If you don't want this, you could use \S+ instead, which matches all characters that are not whitespace.
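A sketch of this pattern in preg_replace(), assuming the two prefix characters are @ and # and using a hypothetical sample sentence. It also checks the punctuation case described above. No g flag is involved: preg_replace() replaces every match unless a limit is passed.

```php
<?php
// Hypothetical sample sentence.
$string = "@Test let us meet_me@noon see #Prasanth";

// (?<!\S): match only at the start of the string or after whitespace.
$result = preg_replace('/(?<!\S)[@#]+\w+/', '--', $string);
echo $result, PHP_EOL; // -- let us meet_me@noon see --

// \w+ stops at punctuation, so the trailing "." is kept.
$punct = preg_replace('/(?<!\S)[@#]+\w+/', '--', '#foo.');
echo $punct, PHP_EOL; // --.
```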
A word starting with a character implies that there is a space right before that character. Try something like this:
/(?<!\S)[@#].*(?=[^a-z])/
Why not use (?=\s)? Because if there is punctuation right after the word, it's not part of the word. Note: you can replace [^a-z] with any list of characters that are not allowed in your word.
Be careful, though: there are two particular cases where that doesn't work. You have to use 3 preg_replace calls in a row; the other two are for words that begin and end the string:
/^[@#].*(?=[^a-z])/
/(?<!\S)[@#].*$/
Try this:
$string = "@Test let us meet_me@noon see #Prasanth";
$new_pro_name = preg_replace('/(?<!\S)(@\w+|#\w+)/','--', $string);
echo $new_pro_name;
This replaces all the words starting with @ or #.
Output: -- let us meet_me@noon see --
If you want to replace a word after @ or # even if it is in the middle of a word:
$string = "@Test let us meet_me@noon see #Prasanth";
$new_pro_name = preg_replace('/(@\w+|#\w+)/','--', $string);
echo $new_pro_name;
Output: -- let us meet_me-- see --
