Regex: Replace Characters In-between Two Characters - php

I'm having trouble using Regex to replace strings that have a ? in between two characters. Two examples of what I'd like Regex to match are:
• Replace thi?s question mark but not this one?
• ? Replace the lonely question mark
What's the best way to:
a) Match a character surrounded by other characters
b) Match a character that is on its own and has no characters before or after it
I'm using PHP preg_match and MySQL REGEXP to perform these pattern matches. For MySQL I've tried:
SELECT description
FROM locations
WHERE description
REGEXP '/|([^?]+)\/'
For PHP I've tried:
preg_match('/|([^?]+)\/', $string);

I suggest this one for PHP:
(?<!\w(?=\? ))\?(?!\s*$)\s*
(?!\s*$) is a negative lookahead that prevents a ? from matching if it is at the end of the string (I allowed trailing whitespace just in case).
(?<!\w(?=\? )) is a little more complex. It will prevent a match if the ? is preceded by a \w character (typically read as [a-zA-Z0-9_]) and followed by a space.
I don't know whether MySQL supports lookbehinds, though.
|([^?]+)\
This is your current regex, and I don't think your PHP code runs. The \ at the end doesn't escape anything useful: it escapes the closing / delimiter, so PHP never finds the end of the pattern and preg_match() fails with a "No ending delimiter" warning.
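A minimal sketch of the suggested pattern with preg_replace(), run on the two sentences from the question. The question doesn't say what the ? should become, so replacing it with an empty string is an assumption, and the third sentence is a hypothetical extra case for the "word character, ?, space" exception.

```php
<?php
// Pattern from the answer above:
//   (?<!\w(?=\? ))  skip a ? that follows a word char AND is followed by a space
//   \?              the literal question mark
//   (?!\s*$)        skip a ? at the end of the string (trailing whitespace allowed)
//   \s*             also consume any whitespace right after the ?
$pattern = '/(?<!\w(?=\? ))\?(?!\s*$)\s*/';

$samples = [
    'Replace thi?s question mark but not this one?',
    '? Replace the lonely question mark',
    'Why? Because it works', // hypothetical extra case
];

// Assumption: the matched ? is simply removed (replaced with '').
$results = [];
foreach ($samples as $s) {
    $results[] = preg_replace($pattern, '', $s);
}

print_r($results);
```

The first two sentences lose their "replaceable" ? (plus the space after the lonely one), while the final ? of the first sentence and the "Why? " case are left alone.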

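Pattern
/(\w+)\?(\w+)/
PHP
<?php
// Arguments are ($pattern, $replacement, $subject); the ? is escaped so it matches literally.
// PHP has no /g flag: preg_replace() already replaces every match.
echo preg_replace('/(\w+)\?(\w+)/', '$1$2', 'thi?s');
?>
Result
this
Hope this helps!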

Related

path to php in regex

I am currently working on a project involving regex in PHP. I wanted to know why this recursive regular expression doesn't work in PHP, or how I can get it to work:
{{test":"([a-f0-9]{32})"},{"test2":"([a-z]{3})}}
The result should be an array containing:
[a-f0-9]{32}
[a-z]{3}
Maybe this regex helps:
/.sample.\/.*?(\d+\.\d+\.\d+)-/
You should escape the . or it will mean any character.
If it does not matter what comes after the dash, you don't need the $ anchor for the end of the string.
This finds the first occurrence of a number in the string, because .*? is lazy (not greedy): it matches only as much as necessary for the rest of the pattern to succeed.
You can use this if only letters are allowed between / and the number:
/.sample.\/[a-zA-Z]*(\d+\.\d+\.\d+)-/
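To see the lazy/greedy difference concretely, here is a sketch using hypothetical subject strings that contain version numbers; only the patterns come from the answer above (the greedy variant is added for comparison).

```php
<?php
// Hypothetical subject with two version numbers after "sample_/".
$subject = 'a/sample_/pkg-1.22.3-and-4.5.6-linux';

// Lazy .*? stops at the FIRST x.y.z- after the slash.
preg_match('/.sample.\/.*?(\d+\.\d+\.\d+)-/', $subject, $lazy);

// Greedy .* backtracks from the end of the string, so it finds the LAST one.
preg_match('/.sample.\/.*(\d+\.\d+\.\d+)-/', $subject, $greedy);

// Letters-only variant: nothing but [a-zA-Z] may sit between / and the number.
preg_match('/.sample.\/[a-zA-Z]*(\d+\.\d+\.\d+)-/', 'a/sample_/pkg1.22.3-linux', $letters);

echo $lazy[1], "\n";
echo $greedy[1], "\n";
echo $letters[1], "\n";
```

The lazy pattern captures 1.22.3 and the greedy one 4.5.6. Note that the letters-only pattern would fail on the first subject, because a - sits between "pkg" and the number.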

RegEx expression to match only words with a-z and no umlauts

Can you help me out with this one? I have a list of words like this:
sachbearbeiter/-in
referent/-in
anlagenführer/-in
it-projektleiter/-in
I want to select only:
sachbearbeiter/-in
referent/-in
This is my current regex: ([a-z]+)/-(in)
The problem is that it matches all of them, even the ones with - and with ü.
Thank you in advance.
You can use anchors to match the word you want:
^([a-z]+)/-(in)$
^---- Here ----^
Update: regarding your comment, if you want to accept umlauts, you can use \w together with the Unicode (u) modifier like this:
^(\w+)/-(in)$
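A small check of both anchored patterns against the word list from the question, sketched with preg_grep(). It assumes the PHP file is saved as UTF-8; using ~ as the delimiter is a choice made here so that the / needs no escaping.

```php
<?php
// Word list from the question (file assumed to be UTF-8).
$words = ['sachbearbeiter/-in', 'referent/-in', 'anlagenführer/-in', 'it-projektleiter/-in'];

// ^ and $ force the whole entry to match, so "it-" or "ü" disqualify an entry.
$asciiOnly = array_values(preg_grep('~^([a-z]+)/-(in)$~', $words));

// In PHP the u modifier also makes \w Unicode-aware, so "ü" is accepted;
// the hyphen in "it-projektleiter" is still not a word character.
$withUmlauts = array_values(preg_grep('~^(\w+)/-(in)$~u', $words));

print_r($asciiOnly);
print_r($withUmlauts);
```

preg_grep() keeps the original array keys, which is why the results are passed through array_values().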
You need to anchor the pattern to the beginning and end of the string so that it matches the whole entry exactly.
Change your regex to:
^([a-z]+)/-(in)$
^ -> stands for beginning of string
$ -> stands for end of string
Your current regex, ([a-z]+)/-(in), does not escape the / character, and it also matches substrings, so every entry that contains a matching substring is reported.
The regex should be ^([a-z]+)\/-(in), i.e. the entry must start with lowercase letters only, followed by an escaped /.

Using backreference with php preg_match_all

I'm quite new to regex and PHP, and I'm facing an issue I can't handle alone.
I've prepared this regex to find patterns starting with an uppercase letter. In words, it reads something like:
capture any pattern that
starts with one or more uppercase letters
then one or more letters or characters from the list
then a space or a punctuation mark
and I use a backreference to say that I want this pattern up to 3 times:
([A-ZÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ]{1,}[a-zàáâãäåçèéêëìíîïðòóôõöùúûüýÿ;:«0-9]{1,}[\s-….?,;]\1{1,3})
According to https://regex101.com/r/pB3nY7/2, it works as a JavaScript regex but not as a PHP regex.
I've read the other posts and made sure that:
I use single quotes instead of double quotes
and I "protected" (escaped) the \ in my PHP script:
'#([A-ZÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ]{1,}[a-zàáâãäåçèéêëìíîïðòóôõöùúûüýÿ;:«0-9]{1,}[\\s-….?,;]\\1{1,3})#'
But it still can't match any pattern starting with an uppercase letter.
Thank you in advance for all advice you may provide,
Regards,
Charles
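I have tested it on http://www.phpliveregex.com/. Note that the - goes last in the character class so that PHP reads it as a literal hyphen, and ^ stays outside the repeated group so that the group can actually repeat:
^([A-ZÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜ]{1,}[a-zàáâãäåçèéêëìíîïðòóôõöùúûüýÿ;:«0-9]{1,}[\s….?,;-]{1,1}){1,3}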
To be more general, you could use Unicode properties:
^(\p{Lu}+[\p{Ll};:«0-9]+[\s\p{P}]){1,3}
Where \p{Lu} stands for an uppercase letter, \p{Ll} for a lowercase letter and \p{P} for a punctuation character. In PHP, add the u modifier so that these properties are applied to UTF-8 characters:
preg_match('/^(\p{Lu}+[\p{Ll};:«0-9]+[\s\p{P}]){1,3}/u', $string, $match);
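A quick check of the \p{...} version, sketched with hypothetical sample strings. It also illustrates why the original \1 could not work in PHP: the PCRE documentation states that a backreference inside the group it refers to fails the first time that group is used, so (a\1) never matches, whereas JavaScript treats a reference to a not-yet-set group as an empty match. Repeating the group with {1,3}, as the answer does, expresses "up to 3 times" directly.

```php
<?php
// Answer's pattern; the u modifier makes \p{Lu}, \p{Ll} and « work on UTF-8 characters.
$pattern = '/^(\p{Lu}+[\p{Ll};:«0-9]+[\s\p{P}]){1,3}/u';

$hit  = preg_match($pattern, 'Éric Martin Paris rest', $m); // hypothetical input
$miss = preg_match($pattern, 'éric Martin', $none);          // starts with a lowercase letter

// $m[0] is the whole run (up to three words); $m[1] holds only the last repetition.
var_dump($hit, $m[0], $m[1], $miss);

// A group that refers back to itself cannot match on its first pass in PCRE.
$selfRef = preg_match('/(a\1)/', 'aa');
var_dump($selfRef);
```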

PHP Regex Help (converting from preg_match_all to preg_replace)

I'm having a bit of difficulty converting some regex from use in preg_match_all to use in preg_replace.
Basically, via regex only, I would like to match uppercase characters that are preceded by either a space, the beginning of the text, or a hyphen. This part is not a problem; I have the following, which works well:
preg_match_all('/(?<= |\A|-)[A-Z]/',$str,$results);
echo '<pre>' . print_r($results,true) . '</pre>';
Now, what I'd like to do is use preg_replace to return the string containing only the uppercase characters that match my criteria above. If I port the regex straight into preg_replace, it obviously replaces the characters I want to keep.
Any help would be much appreciated :)
Also, I'm fully aware regex isn't the best solution for this in terms of efficiency, but nonetheless I would like to use preg_replace.
According to De Morgan's laws,
if you want to keep letters that are
A-Z, and
preceded by [space], \A, or -
then you'd want to remove characters that are
not A-Z, or
not preceded by [space], \A, or -
Perhaps this (replace match with empty string)?
/[^A-Z]|(?<! |\A|-)./
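A short sketch of the De Morgan version in action; the input string is hypothetical. Every match is either a non-uppercase character or a character without an allowed predecessor, so replacing each match with an empty string leaves only the wanted capitals.

```php
<?php
// Hypothetical input string for illustration.
$str = 'Hello McDonald-Smith';

// Remove anything that is not A-Z, or that is not preceded by a space, \A or "-".
$initials = preg_replace('/[^A-Z]|(?<! |\A|-)./', '', $str);

echo $initials, "\n";
```

This prints HMS: the D in "McDonald" is removed because it follows "c", not a space or hyphen.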
I think it will be something like this:
$sString = preg_replace('#.*?(?<= |\A|-)([A-Z])([a-z]+)#m',"$1", $sString);

How can I create my own regex to "parse" HTML links?

The strings look like hyperlinks, such as http://somethings. This is what I need:
I need to match them only if they are not preceded by the character "; if there is no character before them at all, they must still be matched;
The somethings part can contain any kind of character (it is a link, after all) except whitespace, which marks the end of the link; I know it's permitted by the RFC, but it is the only way I know to delimit the link;
These strings have already been filtered with htmlentities($str, ENT_QUOTES, "UTF-8"), which is why any kind of character can appear. Is this secure, or do I risk XSS problems or broken HTML?
There can be multiple occurrences to replace, not just one, and the match must be case-insensitive;
This is my current regex:
preg_replace('#\b[^"](((http|https|ftp)://).+)#', '<a class="lforum" href="$1">$1</a>', $str);
But it only matches strings that START with ", and I want the opposite. Any help answering this question would be appreciated. Thanks!
For both of your cases you'll want lookbehind assertions.
\b(?<!")(\w)\b - negative lookbehind to match only if not preceded by "
(?<=ThisShouldBePresent://)(.*) - positive lookbehind to match only if preceded by your string.
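Applying the negative lookbehind to the original link problem might look like the sketch below. It assumes the quote is still a literal " when the pattern runs: htmlentities() with ENT_QUOTES turns " into &quot;, so on already-escaped text the lookbehind would have to test for that sequence instead. The function name linkify is hypothetical; the class name lforum comes from the question.

```php
<?php
// Hypothetical helper: wrap http/https/ftp URLs in <a> tags unless a " comes right before them.
function linkify(string $str): string
{
    // (?<!")          not directly after a double quote
    // \b              start at a word boundary, so "xhttp://" is not picked up
    // (?:https?|ftp)  scheme, matched case-insensitively via the i modifier
    // ://\S+          everything up to the next whitespace character
    return preg_replace(
        '~(?<!")\b((?:https?|ftp)://\S+)~i',
        '<a class="lforum" href="$1">$1</a>',
        $str
    );
}

echo linkify('see http://example.com/a?b=1 and "http://skip.me" or FTP://files.example'), "\n";
```

Because \S+ stops only at whitespace, a link followed directly by punctuation (for example a final period) will include that character, which follows the question's "whitespace ends the link" rule.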
Something like this: preg_match('/\b[^"]/',$input_string);
This looks for a word-break (\b), followed by any character other than a double quote ([^"]).
Something like this: preg_match('~(((ThisShouldBePresent)://).+)~', $input_string);
I've assumed the brackets you specified in the question (and the plus sign) were intended as part of the regex rather than characters to search for.
I've also taken #ThiefMaster's advice and changed the delimiter to ~ to avoid having to escape the //.
