Incomprehensible logic of preg_replace in PHP

I have a string:
$string = "1\n2\n3\n4\n\n\n";
And a pattern:
$pattern = '/\s*$/';
// \s* - any spaces
// $ - end of string
Why, when I call:
preg_replace($pattern, "\n5", $string);
is the output:
"1\n2\n3\n4\n5\n5"
What is wrong with my pattern, and how can I change it so that the result will be:
"1\n2\n3\n4\n5"

$ can match both right before a final line break and at the very end of the string.
Your main issue in this case, however, is that \s* matches zero or more whitespace characters.
First, the trailing \n\n\n␄ (␄ standing for the end of the string) is matched by \s*$ and replaced.
The search then continues at the end of that match, where there is nothing left for \s* to consume, so \s*$ effectively acts as a plain $.
And the end of the text ␄ still matches $, so that empty match is replaced as well.
It's not that $ by itself always matches twice, but the combination \s*$ can.
The simplest fix is not to allow a match of zero whitespace characters, but to make the pattern only eat up whitespace that actually exists:
/\s+$/
That's sufficient in your case, as you only care about the end of the string and cleaning out excess linebreaks anyway.
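A small PHP check makes the two matches visible. This is a sketch using only the question's string and patterns; preg_match_all is used here just to count the matches, and the variable names are illustrative.

```php
<?php
$string = "1\n2\n3\n4\n\n\n";

// \s*$ finds two matches: the trailing "\n\n\n", then an empty match at the
// very end of the string, because \s* is allowed to match nothing there.
$count = preg_match_all('/\s*$/', $string, $matches);
var_dump($count); // int(2)

// Each match is replaced, so "\n5" is inserted twice.
$twice = preg_replace('/\s*$/', "\n5", $string);
var_dump($twice === "1\n2\n3\n4\n5\n5"); // bool(true)

// \s+ needs at least one whitespace character, so the empty match at the
// end is impossible and the replacement happens only once.
$once = preg_replace('/\s+$/', "\n5", $string);
var_dump($once === "1\n2\n3\n4\n5"); // bool(true)
```

For this particular job, rtrim($string) . "\n5" gives the same result without a regex, although rtrim's default character list (" \t\n\r\0\x0B") is not exactly the same set as \s.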

Related

PHP preg_match: pattern for a number at the end of a string, always preceded by a space, which is in turn preceded by any characters?

I am trying to extract a number from a string. The string is a filename whose contents vary, but it always ends with a space followed by a number.
So, strings like "a b 1212" or "a 1212". I am not sure whether filenames can ever start with whitespace, but something like " 1212" should not produce a match.
So far I am using:
preg_match('#^(.)* (\d*)$#', $str, $matches);
but that returns a match for something like " 1212".
I thought about reversing the string and doing this:
preg_match('#^(\d*)(\s*)(.*)$#', strrev($str), $matches);
var_dump(strrev($matches[1]));
and then checking that $matches[1] != ''. That seems to work, but I am trying to understand regex better, so that I can do this without reversing the string and find the proper pattern for this case: get the number at the end of a string, which is always preceded by a space, which in turn is always preceded by anything.
Ideas?
You can try this approach:
preg_match("/^\S.* (\b\d+)$/", $str, $matches);
echo end($matches)."\n";
For instance, if you use the following variable:
$str = "1234 lkjsdhf ldjfh 1223";
end($matches) will return 1223.
Whereas if you use the following variable:
$str = " 1212";
there is no match, so $matches is an empty array, end($matches) returns false, and echo prints an empty line.
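The first answer can be wrapped in a small helper to check the examples from the question. This is a sketch: the function name trailingNumber is hypothetical, the \b is dropped because a digit right after a literal space is always at a word boundary, and single quotes keep PHP from touching the backslashes.

```php
<?php
// Hypothetical helper: returns the trailing number as a string,
// or null when the string does not match.
function trailingNumber(string $str): ?string
{
    // ^\S    first character must not be whitespace
    // .*     anything in between (greedy, then backtracks to the space
    //        that is followed only by digits)
    // (\d+)$ the digits after that space, up to the end of the string
    if (preg_match('/^\S.* (\d+)$/', $str, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

var_dump(trailingNumber('1234 lkjsdhf ldjfh 1223')); // string(4) "1223"
var_dump(trailingNumber('a b 1212'));                // string(4) "1212"
var_dump(trailingNumber(' 1212'));                   // NULL
var_dump(trailingNumber('1212'));                    // NULL
```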
You may use this regex:
^\S.*\b(\d+)$
RegEx Details:
^: Start
\S: Match a non-whitespace
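.*: Match 0 or more of any character (except a newline)
\b: Word boundary (the character before the digits must be a non-word character, not necessarily a space, so a string like a-1212 also matches)
(\d+): Match one or more digits in capture group #1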
$: End
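To see how the two answers differ, the sketch below runs both patterns over a few inputs; firstCapture is a hypothetical helper. Only the input a-1212 is treated differently: the word-boundary version captures 1212 from it, while the space version rejects it because there is no space before the digits.

```php
<?php
// Hypothetical helper: returns capture group 1, or null when there is no match.
function firstCapture(string $pattern, string $str): ?string
{
    return preg_match($pattern, $str, $m) === 1 ? $m[1] : null;
}

$spacePattern    = '/^\S.* (\d+)$/';  // first answer, without the redundant \b
$boundaryPattern = '/^\S.*\b(\d+)$/'; // this answer

foreach (['a b 1212', '1212', ' 1212', 'a-1212'] as $str) {
    printf(
        "%-12s space: %-8s boundary: %s\n",
        var_export($str, true),
        var_export(firstCapture($spacePattern, $str), true),
        var_export(firstCapture($boundaryPattern, $str), true)
    );
}
```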

Regular expression for alphanumerics with dashes, underscores and spaces, but no space at the beginning or end of the string [duplicate]

I want to design an expression that does not allow whitespace at the beginning or at the end of a string, but allows it in the middle of the string.
The regex I've tried is this:
\^[^\s][a-z\sA-Z\s0-9\s-()][^\s$]\
This should work:
^[^\s]+(\s+[^\s]+)*$
If you want to include character restrictions:
^[-a-zA-Z0-9-()]+(\s+[-a-zA-Z0-9-()]+)*$
Explanation:
The starting ^ and ending $ anchor the pattern to the start and end of the string.
Considering the first regex I gave, [^\s]+ means at least one non-whitespace character and \s+ means at least one whitespace character. Note also that the parentheses () group together the second and third fragments, and the * at the end means zero or more repetitions of this group.
So, if you take a look, the expression reads: begin with at least one non-whitespace character, and end with any number of groups, each made of at least one whitespace character followed by at least one non-whitespace character.
For example, if the input is 'A' then it matches, because it satisfies the "begins with at least one non-whitespace character" condition. The input 'AA' matches for the same reason. The input 'A A' also matches, because the first A satisfies the "at least one non-whitespace character" condition, and then ' A' satisfies the "any number of groups of at least one whitespace character followed by at least one non-whitespace character" condition.
' A' does not match, because the "begins with at least one non-whitespace character" condition is not satisfied. 'A ' does not match, because the trailing space would have to start a group, and every group must end with at least one non-whitespace character.
If you want to restrict which characters are accepted, see the second regex. I have allowed -, a-z, A-Z, 0-9, ( and ) in every non-whitespace part, including the first and last characters. Only these are allowed.
Regex playground: http://www.regexr.com/
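The examples from the explanation above can be checked directly with preg_match. This sketch also shows one PHP-specific caveat that applies to the anchored patterns in these answers: by default $ also matches just before a final newline, and the D modifier switches that off.

```php
<?php
// Pattern from the answer: a run of non-whitespace characters, followed by
// any number of (whitespace run + non-whitespace run) groups.
$pattern = '/^[^\s]+(\s+[^\s]+)*$/';

foreach (['A', 'AA', 'A A', ' A', 'A '] as $input) {
    $result = preg_match($pattern, $input) === 1 ? 'match' : 'no match';
    echo var_export($input, true), ' => ', $result, "\n";
}

// PCRE's $ also matches right before a final "\n", so "A\n" is accepted.
// The D (PCRE_DOLLAR_ENDONLY) modifier makes $ match only at the very end.
var_dump(preg_match($pattern, "A\n"));                  // int(1)
var_dump(preg_match('/^[^\s]+(\s+[^\s]+)*$/D', "A\n")); // int(0)
```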
This RegEx will allow neither white-space at the beginning nor at the end of your string/word.
^[^\s].+[^\s]$
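Any string of at least three characters that doesn't begin or end with whitespace will be matched (and, since . does not match line breaks, it cannot contain any in the middle).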
Explanation:
^ denotes the beginning of the string.
\s denotes white-spaces and so [^\s] denotes NOT white-space. You could alternatively use \S to denote the same.
. denotes any character except a line break.
+ is a quantifier meaning one or more times: the item that + follows can be repeated one or more times.
$ denotes the end of the string.
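Because [^\s], .+ and [^\s] each consume at least one character, this pattern cannot match strings shorter than three characters. A quick PHP check (a sketch, with sample inputs chosen for illustration) makes that limit visible; the ^\S$|^\S[\s\S]*\S$ answer further down is one way around it.

```php
<?php
$pattern = '/^[^\s].+[^\s]$/';

// The shortest possible match is three characters long, so 'a' and 'ab'
// are rejected even though they have no surrounding whitespace.
foreach (['a', 'ab', 'abc', 'a b', ' ab', 'ab '] as $input) {
    $result = preg_match($pattern, $input) === 1 ? 'match' : 'no match';
    echo var_export($input, true), ' => ', $result, "\n";
}
```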
In cases when you have a specific pattern, say, ^[a-zA-Z0-9\s()-]+$, that you want to adjust so that spaces at the start and end are not allowed, you may use lookaheads anchored at the pattern start:
^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$
Here, the two lookaheads added right after ^ are:
(?!\s) - a negative lookahead that fails the match if there is a whitespace char immediately at the start of the string (it runs right after ^)
(?![\s\S]*\s$) - a negative lookahead that fails the match if, from the start of the string, any 0+ chars, as many as possible ([\s\S]*, equal to [^]* in JS), are followed by a whitespace char at the end of the string ($). It also runs right after ^, because the previous lookaround does not consume any characters.
In JS, you may use the following equivalent regex declarations:
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = /^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = new RegExp("^(?!\\s)(?![^]*\\s$)[a-zA-Z0-9\\s()-]+$")
var regex = new RegExp(String.raw`^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$`)
If you know there are no linebreaks, [\s\S] and [^] may be replaced with .:
var regex = /^(?!\s)(?!.*\s$)[a-zA-Z0-9\s()-]+$/
JS demo:
var strs = ['a b c', ' a b b', 'a b c '];
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/;
for (var i = 0; i < strs.length; i++) {
  console.log('"' + strs[i] + '" =>', regex.test(strs[i]));
}
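The same lookahead-based pattern can be used from PHP, with one change: PCRE does not treat [^] as "any character" by default, so the [\s\S] spelling is the one to use. A sketch of the JS demo ported to preg_match:

```php
<?php
// PCRE version of the pattern; [\s\S] replaces JavaScript's [^].
$regex = '/^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/';

foreach (['a b c', ' a b b', 'a b c '] as $str) {
    printf("\"%s\" => %s\n", $str, preg_match($regex, $str) === 1 ? 'true' : 'false');
}
```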
If the string must be at least 1 character long, newlines are allowed in the middle together with any other characters, and the first and last characters can really be anything except whitespace (including ##$!...), then you are looking for:
^\S$|^\S[\s\S]*\S$
explanation and unit tests: https://regex101.com/r/uT8zU0
This worked for me:
^[^\s].+[a-zA-Z]+[a-zA-Z]+$
Hope it helps.
How about:
^\S.+\S$
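This will match any string of at least three characters that doesn't begin or end with any kind of whitespace (and, since . excludes newlines, has no line breaks in the middle).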
^[^\s].+[^\s]$
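That's it! It allows any string of at least three characters, containing any character apart from \n, without whitespace at the beginning or end. If you want to allow \n in the middle, either add the s modifier (so that . also matches \n) or replace .+ with [\s\S]+; [.\n]+ would not work, because inside a character class the dot is a literal dot.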
pattern="^[^\s]+[-a-zA-Z\s]+([-a-zA-Z]+)*$"
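This accepts a non-whitespace start followed by letters, hyphens and whitespace, so it won't allow spaces at the start. Note that it still allows whitespace at the end (e.g. "ab " matches), because the final group ([-a-zA-Z]+)* may repeat zero times.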
This is a regex for no whitespace at the beginning and at most one whitespace character in between. It also works without a 3-character limit:
\^([^\s]*[A-Za-z0-9]\s{0,1})[^\s]*$\ - just replace {0,1} with * to allow any amount of whitespace in between. Note that, as written, it still accepts a single trailing whitespace character (e.g. "a ").
As a modification of Aprillion's answer, I prefer:
^\S$|^\S[ \S]*\S$
It will not match a space at the beginning, end, or both.
It matches any number of spaces between a non-whitespace character at the beginning and end of a string.
It also matches only a single non-whitespace character (unlike many of the answers here).
It will not match if the string contains any newline (\n), \r, \t, \f or \v (unlike Aprillion's answer). I realize this isn't explicit in the question, but it's a useful distinction.
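The difference between the two variants shows up with tabs and newlines in the middle of the string. This PHP sketch runs both patterns over a few illustrative inputs (json_encode is only used to print the control characters visibly):

```php
<?php
$aprillion = '/^\S$|^\S[\s\S]*\S$/'; // any characters, including newlines, in the middle
$spaceOnly = '/^\S$|^\S[ \S]*\S$/';  // only plain spaces (no tabs/newlines) in the middle

foreach (['x', 'a b', "a\nb", "a\tb", ' a', 'a '] as $str) {
    printf(
        "%-8s aprillion: %d  spaceOnly: %d\n",
        json_encode($str),
        preg_match($aprillion, $str),
        preg_match($spaceOnly, $str)
    );
}
```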
Letters and numbers separated by single spaces only, with no spaces allowed at the beginning or end.
/^[a-z0-9]+( [a-z0-9]+)*$/gi
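In PHP, the same check needs only the i modifier: the preg_* functions have no g modifier (preg_match_all covers global matching), and in JavaScript g is unnecessary for a yes/no test() and makes repeated calls on the same regex object stateful through lastIndex. A sketch with illustrative sample strings:

```php
<?php
// Words of letters/digits separated by exactly one space; i = case-insensitive.
$pattern = '/^[a-z0-9]+( [a-z0-9]+)*$/i';

$samples = ['Hello World 42', 'Hello  World', ' Hello', 'Hello ', 'Hello-World'];
foreach ($samples as $s) {
    $result = preg_match($pattern, $s) === 1 ? 'match' : 'no match';
    echo str_pad(var_export($s, true), 18), $result, "\n";
}
```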
A reliable way I found is to specify what you do want to allow for the first character, and check the other characters as normal, e.g. in JavaScript:
RegExp("^[a-zA-Z][a-zA-Z- ]*$")
So that expression accepts only a single letter at the start, and then any number of letters, hyphens or spaces thereafter.
Use /^[^\s].([A-Za-z]+\s)*[A-Za-z]+$/. It only accepts one space between words, and no spaces at the beginning or end.
If we don't need a specific class of valid characters (i.e. we accept characters from any language) and only want to prevent spaces at the start and end, the simplest pattern can be:
/^(?! ).*[^ ]$/
Try it on an HTML input:
input:invalid {box-shadow:0 0 0 4px red}
/* Note: ^ and $ are removed from the pattern, because an HTML input already matches its pattern attribute against the whole value. */
<input pattern="(?! ).*[^ ]">
Explanation
^ Start of the string
(?!...) Negative lookahead: what follows must not match ...
(a literal space) Just a space, unlike \s, which also matches tabs and newline characters
(?! ) Do not accept a space as the first character of the following part (.*)
. Any character (except line breaks such as \n and \r)
* Zero or more times (the length of that part)
[^ ] Character class: any character except a space
$ End of the string
Try it live: https://regexr.com/6e1o4
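A PHP sketch of the same pattern, with the u modifier added as an assumption so that UTF-8 text from any language is handled character by character. The tab case illustrates the point from the explanation: only a literal space is rejected at the ends, not every \s character.

```php
<?php
// u: treat subject and pattern as UTF-8. Only the literal space is excluded
// at the start (via the lookahead) and at the end (via [^ ]).
$pattern = '/^(?! ).*[^ ]$/u';

foreach (['a', 'Привет мир', ' a', 'a ', "\ta"] as $s) {
    $result = preg_match($pattern, $s) === 1 ? 'match' : 'no match';
    echo json_encode($s, JSON_UNESCAPED_UNICODE), ' => ', $result, "\n";
}
```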
^[^0-9 ]{1}([a-zA-Z]+\s{1})+[a-zA-Z]+$
- for no more than one whitespace character between words, and no spaces at the start or end.
^[^0-9 ]{1}([a-zA-Z ])+[a-zA-Z]+$
- for more than one whitespace character allowed in between, and no spaces at the start or end.
Other answers introduce a limit on the length of the match. This can be avoided using negative lookaheads and lookbehinds:
^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$
This starts by checking that the first character is not whitespace, ^(?!\s). It then captures the characters you want, a-zA-Z0-9\s, non-greedily (*?), and ends by checking that the character before $ (end of string/line) is not \s.
Check that lookaheads/lookbehinds are supported in your platform/browser.
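PCRE supports both lookaheads and lookbehinds, so the pattern works unchanged in PHP. A sketch with illustrative inputs; note that, because ([a-zA-Z0-9\s])*? may repeat zero times, the pattern as written also accepts the empty string.

```php
<?php
$pattern = '/^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$/';

foreach (['a', 'ab', 'a b c', ' a', 'a ', ''] as $s) {
    $result = preg_match($pattern, $s) === 1 ? 'match' : 'no match';
    echo var_export($s, true), ' => ', $result, "\n";
}
```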
Here you go,
\b^[^\s][a-zA-Z0-9]*\s+[a-zA-Z0-9]*\b
\b refers to a word boundary
\s+ allows one or more whitespace characters in the middle.
(^(\s)+|(\s)+$)
This expression matches the leading and trailing whitespace of the text.

PHP preg_replace fails. How can I find out the reason?

$string = '## aaa bbb';
$pattern = '/^(\n)?\s{0,}#{1,6}\s+| {0,}(\n)?\s{0,}#{0,} {0,}(\n)?\s{0,}$/';
$replacement = '$1$2$3';
echo preg_replace($pattern, $replacement, $string);
If there are around 50 (or fewer) spaces between "aaa" and "bbb", I DO get the correct result. BUT if I increase the number, say to 100 spaces between "aaa" and "bbb", I get null. How can I find out the reason?
The pattern matches two alternatives: one is ^(\n)?\s*#{1,6}\s+ and the other is " *(\n)?\s*#* *(\n)?\s*$" (note the leading space; {0,} is written as * here).
The first one is OK, though it is better to make the \n itself optional, (\n?), rather than making the whole group optional.
The second one is a very inefficient pattern, because there are \s* patterns that follow an optional \n pattern, which is in turn preceded by a " *" (space-star) pattern. When there is no \n, \s* may "fall into" " *": since \s also matches a space, a run of spaces can be split between these quantifiers in many different ways, and the engine tries them all. That leads to catastrophic backtracking when part of the pattern matches but the final subpatterns fail.
So, you may use
/^(\n?)\s*#{1,6}\s+| *(?:(\n)\s*)?#* *(?:(\n)\s*)?$/
The crucial point here is the (?:(\n)\s*)? parts: inside them \n is obligatory, so they are only tried after all regular spaces have been matched by " *", and \s* is only tried if there is a \n before it. This ensures that no subpattern can "fall into" another.
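As for the "how can I find out the reason" part: preg_replace returns null when PCRE gives up, and preg_last_error() (or preg_last_error_msg(), available since PHP 8.0) tells you why, for example PREG_BACKTRACK_LIMIT_ERROR when pcre.backtrack_limit is exceeded. The sketch below assumes PHP 8.0+; replaceOrFail is a hypothetical helper that turns the silent null into an exception, applied here to the rewritten pattern and a subject with 100 spaces.

```php
<?php
// Hypothetical helper: like preg_replace, but throws with PCRE's own error
// message instead of silently returning null. Requires PHP 8.0+.
function replaceOrFail(string $pattern, string $replacement, string $subject): string
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        throw new RuntimeException(
            'preg_replace failed: ' . preg_last_error_msg() . ' (code ' . preg_last_error() . ')'
        );
    }
    return $result;
}

$subject = '## aaa' . str_repeat(' ', 100) . 'bbb';

// The rewritten pattern from the answer: each \s* only runs after a mandatory \n.
$fixed = '/^(\n?)\s*#{1,6}\s+| *(?:(\n)\s*)?#* *(?:(\n)\s*)?$/';

try {
    $out = replaceOrFail($fixed, '$1$2$3', $subject);
    echo strlen($out), "\n"; // "aaa" + 100 spaces + "bbb"
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Passing the question's original pattern to the same helper should surface the error message instead of a bare null; which error you get depends on pcre.backtrack_limit and on whether pcre.jit is enabled.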

PHP regex: OR clause doesn't work

I need to write a regex to check whether a string has spaces at the beginning or at the end, whether the whole string is composed of spaces, or whether the string contains only numbers.
I've written this regex:
$regex = '/^(\s+ )| ^(\d+)$/';
but it doesn't work. What's wrong?
First things first: get your spaces right!
For example, (\s+ ) will match at least one whitespace character (\s+) followed by another, literal space ( )! The same applies to the space between | and ^. This way you require a literal space every time, which leads to wrong results.
If I get you right and you want to match on strings which
start with one or more spaces OR
end with one or more spaces OR
consist only of spaces OR
consist only of numbers
I'd use
/^(?:\s+.*|.*\s+$|\d+$)/
This way you match spaces at the start of the string (\s+.*) or (|) spaces at the end of the string (.*\s+$) or a completely numeric string (\d+$).
Insert capturing groups as needed.
This will match in case the whole string consists of spaces, too, because technically the string then starts with spaces.
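A PHP sketch that runs the pattern over one sample of each case from the list above, plus a few strings that should not match (sample inputs are illustrative):

```php
<?php
// Branches: leading whitespace | trailing whitespace | digits only.
// A whitespace-only string is caught by the first branch.
$regex = '/^(?:\s+.*|.*\s+$|\d+$)/';

foreach (['  abc', 'abc  ', '   ', '12345', '12a', 'a b', ''] as $s) {
    $result = preg_match($regex, $s) === 1 ? 'match' : 'no match';
    echo var_export($s, true), ' => ', $result, "\n";
}

// The /^\s*\d*\s*$/ from the other answers checks something different:
var_dump(preg_match('/^\s*\d*\s*$/', ''));      // int(1): the empty string matches
var_dump(preg_match('/^\s*\d*\s*$/', '  abc')); // int(0): leading spaces alone are not enough
```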
The space before ^(\d+) means your regex can't match the numeric string.
It should be like below:
$regex = '/^\s*\d*\s*$/';
First of all, remove the space between | and ^. You are trying to match a space before the beginning of the line (^), so that cannot work.
I do not exactly understand what you want. Either a string that only consists of white spaces, or a number that may have white spaces at the beginning or end? Try this:
$regex = '/^\s*\d*\s*$/';

preg_replace curly brace when it is the only character on the line?

Let's say I have the following string:
Some Text Here }
}
How can I do a preg_replace so that only the "}" on the line by itself gets replaced?
I would expect the following to work, but it doesn't:
preg_replace('/^(\s*)(\})(\s*)/', etc);
The following should work:
preg_replace('/^\s*\}\s*$/m', $replacement, $subject);
The s* means any number of the character s. What you probably mean is \s*, any number of whitespace characters.
You need to enable multiline mode for the ^ anchor to work on a per line basis; the default setting is that ^ is the beginning and $ the end of the entire string, not a single line.
Remember the $ anchor, otherwise something like }hello would also get matched.
^ and $ match the beginning and end of a string. You need the m modifier to make them match the beginning and end of a line.
Your RE will not work as expected. s* matches zero or more occurrences of s. It's very likely that you wanted to use \s* instead, to match whitespace.
preg_replace('/^(\s*)(\})(\s*)$/m', $replacement, $subject);
A version that does not need the multiline modifier ([^\S\n] means any whitespace except a newline), which could be used inside a larger regex should matching across lines be needed:
/(^|\n)([^\S\n]*\}[^\S\n]*)(?=\n|$)/
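The difference between the /m answers and the last one shows up when the brace line is preceded by a blank line, because \s also matches "\n". The sketch uses a hypothetical subject with an extra blank line and "X" as a stand-in replacement; the $1 in the second call is an assumption of this sketch, used to put back the newline that (^|\n) consumes.

```php
<?php
$subject = "Some Text Here }\n\n}\n";

// m modifier: ^ and $ work per line. Because \s also matches "\n", this
// version swallows the blank line before the brace and the newline after it.
$multiline = preg_replace('/^\s*\}\s*$/m', 'X', $subject);
var_dump($multiline === "Some Text Here }\nX"); // bool(true)

// [^\S\n] means "whitespace except newline", so only the brace's own line is
// touched. $1 puts back the newline consumed by (^|\n).
$lineSafe = preg_replace('/(^|\n)([^\S\n]*\}[^\S\n]*)(?=\n|$)/', '$1X', $subject);
var_dump($lineSafe === "Some Text Here }\n\nX\n"); // bool(true)
```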
