Ignoring apostrophe while capturing contents in single quotes REGEX - php

My problem is capturing the content inside single quotes (like 'xyz').
But the apostrophe, which is the same symbol as a single quote ('), gets in the way!
The regex I've written is: /(\w\'\w)(*SKIP)(*F)|(\'[^\']*\')/
The example I have used is: Hello ma'am 'This is Prashanth's book.'
What needs to be captured is: 'This is Prashanth's book.'
But what's captured is: 'This is Prashanth'!
Here is the link to what I tried in an online regex tester.
Any help is greatly appreciated. Thank you!

You can't use [^\'] to capture text that contains a ' character, and in your example, This is Prashanth's book. contains a ' within the text. Use .*? instead of [^\'], and add \B after the closing quote so the match can't end at an apostrophe that is followed by a letter (as in Prashanth's). You can write your regex like this:
(\w'\w)(*SKIP)(*F)|('.*?'\B)
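Also, you don't need to escape a single quote ' in the regex itself, as it has no special meaning there. (Inside a single-quoted PHP string literal it still has to be written as \', but that is PHP string escaping, not regex escaping.)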
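For reference, here is a minimal sketch of the same idea in Python, assuming Python 3.9+ and its built-in `re` module. `re` has no `(*SKIP)(*F)` verbs, so this swaps in the common "match it, then discard it" idiom: the `\w'\w` alternative still consumes in-word apostrophes such as the one in ma'am, and only matches where capture group 1 took part are kept. The helper name `quoted_spans` is hypothetical.

```python
import re

# Alternative 1 swallows in-word apostrophes such as the one in "ma'am".
# Alternative 2 captures a quoted span ending at a ' that is not followed by a word char.
PATTERN = re.compile(r"\w'\w|('.*?'\B)")


def quoted_spans(text: str) -> list[str]:
    """Return quoted spans (quotes included), skipping in-word apostrophes."""
    # Keep only the matches produced by the second alternative (group 1 is set).
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


print(quoted_spans("Hello ma'am 'This is Prashanth's book.'"))
# ["'This is Prashanth's book.'"]
```

The discarded matches play the role of `(*SKIP)(*F)`: once "a'a" has been consumed, the scan resumes after it, so that apostrophe can never open a quoted span.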
From your example, it is not clear whether you want the captured match to include the surrounding quotes or not. If you don't want the ' characters in the match, you can use a lookaround-based regex like this:
(?<=\B').*?(?='\B)
Explanation of regex:
(?<=\B') - This positive lookbehind ensures that the match is preceded by a single quote, and \B ensures that this quote is not itself preceded by a word character (so the apostrophe in ma'am can't start a match)
.*? - Matches the text in a non-greedy (lazy) manner
(?='\B) - Ensures the matched text is followed by a single quote, and \B ensures that this quote is not immediately followed by a word character. E.g. it won't stop at the apostrophe in Prashanth's.
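The lookaround version can be used unchanged with Python's built-in `re` module, because `(?<=\B')` is a fixed-width lookbehind (one character plus a zero-width assertion). A minimal sketch, assuming Python 3; `quoted_text` is a hypothetical helper name:

```python
import re

# (?<=\B')  : start right after a ' that is not preceded by a word character
# .*?       : take as little text as possible...
# (?='\B)   : ...up to a ' that is not followed by a word character
PATTERN = re.compile(r"(?<=\B').*?(?='\B)")


def quoted_text(text):
    # No capture groups, so findall returns the whole matches (quotes excluded).
    return PATTERN.findall(text)


print(quoted_text("Hello ma'am 'This is Prashanth's book.'"))
# ["This is Prashanth's book."]
```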

For the string you have provided, you can use the regex:
\B'\K(?:(?!'\B).)+
Explanation:
\B - a non-word boundary
' - matches a '
\K - forget everything matched so far
(?:(?!'\B).)+ - matches 1+ occurrences of any character (except newline), as long as the current position is not a ' followed by a non-word boundary
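Python's built-in `re` module does not support `\K`, so this Python sketch (a translation under that assumption, not the PHP code) puts the text after the opening quote into a capture group instead. The characters that end up in the result are the same; only the way the leading ' is dropped differs. `inner_text` is a hypothetical name.

```python
import re

# \B'              : an opening ' not preceded by a word character
# ((?:(?!'\B).)+)  : 1+ chars (no newlines), stopping before a ' that is
#                    not followed by a word character; kept in group 1
PATTERN = re.compile(r"\B'((?:(?!'\B).)+)")


def inner_text(text):
    # With exactly one capture group, findall returns group 1 of each match.
    return PATTERN.findall(text)


print(inner_text("Hello ma'am 'This is Prashanth's book.'"))
# ["This is Prashanth's book."]
```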

Related

Regex CSV : Match quotes that are not delimiters

I'm working on a CSV file that was badly built. I created a regex that matches only the quotes that ARE NOT delimiters, and in this link it works. However, could you optimize my regex so that it matches only the quotes and not the letters around them? The constraint is that quotation marks at the beginning or at the end of a field must not be taken into account. Example:
"ModifTextePub";"ModifObservation";"Resume"Vitrine";"Observations"Criteres"";"InternetOK";"NumPhoto";"AmianteLe";"SNavantLe";"ActePrec";"ProprietairesPrec";"Situation";"FraisNotaires"
In this example, only the quote between Resume and Vitrine, and the quotes around Criteres, should be matched.
The regex I am using is
(.){1}(?<!;|\n|\r|\t)(")(?!;|\n|\r|\t)(.){1}
with $1$3 as replacement.
Your regex with negative lookarounds containing positive character classes can be transformed into a pattern with positive lookarounds containing negated character classes:
(?<=[^;\n\r\t])"(?=[^;\n\r\t])
See the regex demo. The replacement will be an empty string.
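Now a match only occurs when a " is immediately preceded and followed by a character other than ;, CR, LF or TAB. Because both lookarounds require an actual character, a quote at the very start or end of the string is never matched.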
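A minimal Python sketch of the same replacement, assuming the built-in `re` module and a shortened copy of the example line (in PHP this would be a `preg_replace` call with an empty replacement string). `strip_inner_quotes` is a hypothetical helper.

```python
import re

# Remove a " only when there is a character on both sides and neither is
# ; CR LF or TAB, i.e. quotes inside a field rather than field delimiters.
INNER_QUOTE = re.compile(r'(?<=[^;\n\r\t])"(?=[^;\n\r\t])')


def strip_inner_quotes(line):
    return INNER_QUOTE.sub("", line)


line = '"ModifTextePub";"Resume"Vitrine";"Observations"Criteres"";"FraisNotaires"'
print(strip_inner_quotes(line))
# "ModifTextePub";"ResumeVitrine";"ObservationsCriteres";"FraisNotaires"
```

The lookarounds are evaluated against the original string, so in Criteres"" the first quote is removed (followed by "), while the second one survives because it is followed by ;.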

How do I escape the brackets in a mysql REGEXP [duplicate]

I have a regular expression to escape all special characters in a search string. This works great, however I can't seem to get it to work with word boundaries. For example, with the haystack
add +
or
add (+)
and the needle
+
the regular expression /\+/gi matches the "+". However the regular expression /\b\+/gi doesn't. Any ideas on how to make this work?
Using
add (plus)
as the haystack and /\bplus/gi as the regex, it matches fine. I just can't figure out why the escaped characters are having problems.
\b is a zero-width assertion: it doesn't consume any characters, it just asserts that a certain condition holds at a given position. A word boundary asserts that the position is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one. (A "word character" is a letter, a digit, or an underscore.) In your string:
add +
...there's a word boundary at the beginning because the a is not preceded by a word character, and there's one after the second d because it's not followed by a word character. The \b in your regex (/\b\+/) is trying to match between the space and the +, which doesn't work because neither of those is a word character.
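To see where \b actually matches, you can list its zero-width matches. This is a small Python sketch, assuming the built-in `re` module (for ASCII text like this, Python's word characters are the same letters, digits and underscore); `boundary_positions` is a hypothetical helper.

```python
import re


def boundary_positions(text):
    # \b is zero-width: every match is empty, and its start is a boundary position.
    return [m.start() for m in re.finditer(r"\b", text)]


print(boundary_positions("add +"))       # [0, 3]: before "a" and after the last "d"
print(boundary_positions("add (plus)"))  # [0, 3, 5, 9]: 5 is just before "plus"
```

There is no boundary at position 4 (between the space and the +), which is exactly where /\b\+/ would need one.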
Try changing it to:
/\b\s?\+/gi
Edit:
Extend this concept as far as you want. If you want the first + after any word boundary:
/\b[^+]*\+/gi
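A quick Python check of the original pattern and the two suggested ones against the first haystack, assuming the built-in `re` module (the /gi flags are dropped because they don't affect these results); `first_match` is a hypothetical helper.

```python
import re


def first_match(pattern, text):
    # Return the first matched substring, or None when there is no match.
    m = re.search(pattern, text)
    return m.group() if m else None


haystack = "add +"
print(repr(first_match(r"\b\+", haystack)))       # None
print(repr(first_match(r"\b\s?\+", haystack)))    # ' +'
print(repr(first_match(r"\b[^+]*\+", haystack)))  # 'add +'
```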
Boundaries are very conditional assertions; what they anchor depends on what they touch. See this answer for a detailed explanation, along with what else you can do to deal with it.

PHP tricky regex to get quoted string up until certain words

I have strings that look like either of these:
First rounder 'John Smith' had a good game.
Second rounder 'Jim O'Rielly' is on fire!
What I ultimately want is to get both names between quotes John Smith and Jim O'Rielly, however the tricky part is the names that include apostrophe like the second.
I was initially using '/\'([^\']*)\'/' to get the text inside the quotes, but it doesn't work for the second case: it only returns Jim O.
I then thought to use .+?(?=had) in order to get everything up to the word had, but it needs to be either had or is, and I don't want the words First rounder, etc.
I need to essentially combine these, so I can get only the text inside the quotes, but UP UNTIL either word had or is, and I just want the text without quotes.
Unless there is a trick to handle the second case by ignoring the apostrophe in the name (I thought of addslashes(), but how do I know which apostrophes to add slashes to?), can anyone suggest a better solution? Bonus points for ignoring any special characters that I haven't considered may be found in the name :)
You can alternate between matching non-'s, and matching 's which have word characters on either side. This way 's in the middle of a word will be matched, but 's at either end of a word won't.
'((?:[^']+|\b'\b)+)'
https://regex101.com/r/L9Em5l/1
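A minimal Python sketch of this pattern, assuming the built-in `re` module (the pattern itself is used unchanged); `quoted_name` is a hypothetical helper that returns capture group 1 of the first match.

```python
import re

# Group 1: runs of non-' characters, or a ' with a word character on both sides.
NAME = re.compile(r"'((?:[^']+|\b'\b)+)'")


def quoted_name(sentence):
    m = NAME.search(sentence)
    return m.group(1) if m else None


print(quoted_name("First rounder 'John Smith' had a good game."))  # John Smith
print(quoted_name("Second rounder 'Jim O'Rielly' is on fire!"))    # Jim O'Rielly
```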
Another option is to match any char except ' using a negated character class.
Then accept a ' only if it is followed by a word boundary, and repeat that 0+ times so it is optional and a name without a single quote in it also matches.
'([^']+(?:'\b[^']++)*)'
Explanation
'( Match starting ' and open capture group 1
[^']+ Match 1+ times any char except a '
(?: Non capture group
'\b[^']++ Match ' and word boundary, match 1+ times any char except ' using a possessive quantifier
)* Close group and repeat 0+ times so this will be optional
)' Close group 1 and match the closing '
If you don't want the negated character class to match newlines, you could use [^'\r\n]+ instead.
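The same check in Python for this second pattern, assuming the built-in `re` module. The answer's possessive [^']++ is written as a plain [^']+ here, since possessive quantifiers only arrived in Python 3.11; in this pattern that makes no difference to what matches, because the next token must be a ' that the negated class can never give back. `quoted_name` is a hypothetical helper.

```python
import re

# '([^']+(?:'\b[^']+)*)' : possessive ++ replaced by + (same matches here).
NAME = re.compile(r"'([^']+(?:'\b[^']+)*)'")


def quoted_name(sentence):
    m = NAME.search(sentence)
    return m.group(1) if m else None


print(quoted_name("First rounder 'John Smith' had a good game."))  # John Smith
print(quoted_name("Second rounder 'Jim O'Rielly' is on fire!"))    # Jim O'Rielly
```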

What is the difference between 2 regex patterns?

I want users to enter a username containing only alphanumeric and dot characters.
So I wrote a regex pattern as following:
'/([a-zA-Z0-9\.]+)/'
But I want to know is it the same with:
'/([a-zA-Z0-9.]+)/'
Are these two patterns the same? Thank you for your help! :-)
You don't need to escape a dot inside a character class. Inside a character class, both a plain dot . and an escaped dot \. match a literal dot, so both regexes are the same.
For validation purposes, I would also suggest adding anchors, like '/^[a-zA-Z0-9.]+$/'. Anchors make the regex match the whole string rather than a substring. For example, /[a-zA-Z0-9.]+/ would match the substring foo in the input ()foo, but the anchored /^[a-zA-Z0-9.]+$/ won't match that string at all. The anchored regex only matches strings made up entirely of one or more alphanumeric or dot characters; if the string contains any other character, it doesn't match.
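A small Python sketch of both points, assuming the built-in `re` module: the escaped and unescaped dot classes accept the same strings, and an anchored check rejects ()foo while an unanchored search finds foo inside it. Here `re.fullmatch` plays the role of ^...$ (it additionally rejects a trailing newline, which $ would allow); `is_valid_username` is a hypothetical helper.

```python
import re

ESCAPED = r"[a-zA-Z0-9\.]+"  # dot escaped inside the class
PLAIN = r"[a-zA-Z0-9.]+"     # dot unescaped inside the class: still a literal dot


def is_valid_username(name):
    # re.fullmatch must cover the whole string, like ^...$ (but it also rejects
    # a trailing "\n", which $ would still allow).
    return re.fullmatch(PLAIN, name) is not None


# An unanchored search finds a valid substring inside an invalid string.
print(re.search(PLAIN, "()foo").group())  # foo
print(is_valid_username("()foo"))         # False
print(is_valid_username("john.doe42"))    # True
```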

REGEX - match words that contain letters repeating next to each other

I'm looking for a regex that matches words in which a letter is repeated more than once, with the repeats next to each other.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that matches exactly:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain what regex I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than whitespace
* quantifier, zero or more occurrences of \S
(\w) matches a single word character and captures it in \1
(?=\1+) positive lookahead. Asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches anything other than whitespace
EDIT
If the letter must be repeated more than once (at least three in a row), a slight modification of the regex does the trick:
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
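A minimal Python sketch of both patterns, assuming the built-in `re` module and the example sentence with the word apple added for contrast. Because the patterns contain a capture group, `re.findall` would return only the captured letter (PHP's `preg_match_all` likewise puts full matches in `$arr[0]` and the group in `$arr[1]`), so the sketch collects `group(0)` from `re.finditer`. `repeated_letter_words` is a hypothetical helper.

```python
import re

TEXT = "This is an apple exxxmaple oooonnnnllllyyyyy!"


def repeated_letter_words(pattern, text):
    # finditer + group(0) returns the whole token; findall would return only
    # the captured letter, because each pattern has one capture group.
    return [m.group(0) for m in re.finditer(pattern, text)]


# A letter followed by at least one copy of itself (e.g. the "pp" in apple).
print(repeated_letter_words(r"\S*(\w)(?=\1+)\S*", TEXT))
# ['apple', 'exxxmaple', 'oooonnnnllllyyyyy!']

# A letter followed by at least two more copies of itself (three in a row).
print(repeated_letter_words(r"\S*(\w)(?=\1{2,})\S*", TEXT))
# ['exxxmaple', 'oooonnnnllllyyyyy!']
```

Since \S matches any non-whitespace character, the trailing ! ends up in the last match.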
Use this if you want to discard words like apple, where a letter appears only twice in a row:
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See the demos:
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
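Both patterns can be checked the same way in Python, assuming the built-in `re` module and the example sentence with apple added; full matches are taken from `re.finditer` because a capture group would make `re.findall` return only the repeated letter. `full_matches` is a hypothetical helper.

```python
import re

TEXT = "This is an apple exxxmaple oooonnnnllllyyyyy!"


def full_matches(pattern, text):
    # Whole matches, not the content of capture group 1.
    return [m.group(0) for m in re.finditer(pattern, text)]


# Both require some letter to appear at least three times in a row.
print(full_matches(r"\b\w*(\w)(?=\1\1+)\w*\b", TEXT))
print(full_matches(r"\b(?=[^\s]*(\w)\1\1+)\w+\b", TEXT))
# both print ['exxxmaple', 'oooonnnnllllyyyyy']
```

Unlike the \S-based patterns, \w+ stops before the !.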
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word boundary \b limit the search to words. The word boundary could be removed, but, like the lazy quantifier, it reduces the number of steps needed to find a match. Note too that if you are looking for words in the common sense (letters only), you need to remove the word boundary and use [a-zA-Z] instead of \w, because \w also matches digits and the underscore.
(\w)\1{2} checks if a repeated character is present. A word character is captured in group 1 and must be followed with the content of the capture group (the backreference \1).
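A minimal Python sketch, assuming the built-in `re` module and a hypothetical sample sentence that includes the token 1000, to illustrate the note about \w: the \w version also accepts the digit run, while the letters-only variant (no \b, [a-zA-Z] instead of \w) does not. `full_matches` is a hypothetical helper.

```python
import re

TEXT = "This is an apple exxxmaple oooonnnnllllyyyyy! Room 1000"


def full_matches(pattern, text):
    # Whole matches, not the content of capture group 1.
    return [m.group(0) for m in re.finditer(pattern, text)]


# \w also matches digits, so the run "000" in 1000 qualifies.
print(full_matches(r"\b\w*?(\w)\1{2}\w*", TEXT))
# ['exxxmaple', 'oooonnnnllllyyyyy', '1000']

# Letters-only variant: no \b, [a-zA-Z] instead of \w.
print(full_matches(r"[a-zA-Z]*?([a-zA-Z])\1{2}[a-zA-Z]*", TEXT))
# ['exxxmaple', 'oooonnnnllllyyyyy']
```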
