I want a regular expression that ignores A-Z, 0-9 and whitespace. Currently I have the pattern below. It works for letters and digits, but whitespace is still matched and I need it left alone.
preg_match_all("/[^\w]/",$string,$matches);
\s represents whitespace characters. So if you want to match every character except word characters (\w) or whitespace characters (\s), try this:
preg_match_all("/[^\w\s]/",$string,$matches);
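A minimal sketch of what the corrected pattern picks up, assuming a made-up sample string (the `$string` value below is not from the question). Note that `\w` also covers the underscore, so `_` is ignored along with letters and digits.

```php
<?php
// Hypothetical sample string, used only for illustration.
$string = 'Hello, World_1! 42';

// Match every character that is neither a word character nor whitespace.
preg_match_all('/[^\w\s]/', $string, $matches);

// $matches[0] now holds the comma and the exclamation mark.
print_r($matches[0]);
```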
How to write a regex that matches whitespace but not tabs and newlines?
Thanks, everyone.
[[:blank:]]{2,} <-- This isn't good enough for me, because it matches a space or a tab (though not newlines).
As per my original comment, you can use this.
Code
See regex in use here
Note: The link contains whitespace characters: tab, newline, and space. Only space is matched.
[^\S\t\n\r]
So your regex would be [^\S\t\n\r]{2,}
Explanation
[^\S\t\n\r] Match any character not present in the set.
\S Matches any non-whitespace character. Since it's a double negative it will actually match any whitespace character. Adding \t, \n, and \r to the negated set ensures we exclude those specific characters as well. Basically, this regex is saying:
Match any whitespace character except \t\n\r
This principle in regex is often used with word characters \w to negate the underscore _ character: [^\W_]
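The double negation can be checked quickly in PHP. This is only a sketch: the sample strings are assumptions chosen to contain tabs, newlines, and runs of spaces.

```php
<?php
// Hypothetical sample: runs of spaces, tabs, and newlines.
$sample = "a  b\t\tc\n\nd   e";

// Whitespace minus tab, newline, and carriage return, two or more times.
preg_match_all('/[^\S\t\n\r]{2,}/', $sample, $matches);
// $matches[0] holds only the two runs of spaces.

// Same principle with word characters: \w minus the underscore.
preg_match_all('/[^\W_]+/', 'foo_bar9', $words);
// $words[0] holds 'foo' and 'bar9'.
```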
[ ]{2,} works normally (not sure about php)
or even / {2,}/
I am writing a script to clean up a file line-by-line with non-ascii characters, but I am having trouble with a regex pattern. I need a regex pattern that matches any line that starts with an asterisk, may have an equals, and will contain non-ascii characters and spaces. I know how to match a non-ascii character, but not in the same set as other positively defined characters.
Here is a sample line that I need to match:
* = Ìÿð ÿð
Here is the pattern I have so far:
/\*[^[:ascii:]]+[\r\n]/
This will match lines that start with an asterisk and contain non-ascii characters, but not if the line has spaces or an equals sign in it.
Try the following expression:
^\*\s*=?\s*[[:^ascii:]\s]+[\r\n]*$
This matches the start-of-line ^, then a literal asterisk \*, then zero or more whitespace characters \s*, followed by an optional equals sign =? and again zero or more whitespace characters \s*.
The next part matches one or more characters that are any combination of non-ascii characters and whitespace, [[:^ascii:]\s]+. Check the docs for the syntax of character classes.
Finally, the expression matches any combination of carriage returns and newlines that may end the line.
Regex101 Demo
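A small PHP check of the expression, applied to one line at a time. The second and third test lines are made up for contrast. The pattern is used without the u modifier here, so it works on bytes, and every byte of a non-ascii character falls outside the ascii range.

```php
<?php
// Pattern from the answer above, applied to a single line per call.
$pattern = '/^\*\s*=?\s*[[:^ascii:]\s]+[\r\n]*$/';

// The first line is the sample from the question; the others are hypothetical.
$lines = [
    "* = Ìÿð ÿð\r\n",
    "* = plain ascii\r\n",
    "Ìÿð without an asterisk\r\n",
];

$results = [];
foreach ($lines as $line) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[] = preg_match($pattern, $line);
}
// Only the first line matches.
```

To scan a whole file in one call instead, the m modifier would be needed so that ^ and $ apply to each line.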
Maybe this - (edit: changed after reread )
# ^\*(?=.*[^\0-\177])
^
\*
(?= .* [^\0-\177] )
Consider the following strings
breaking out a of a simple prison
this is b moving up
following me is x times better
All strings are lowercased already. I would like to remove any "loose" a-z characters, resulting in:
breaking out of simple prison
this is moving up
following me is times better
Is this possible with a single regex in php?
$str = "breaking out a of a simple prison
this is b moving up
following me is x times better";
$res = preg_replace("#\\b[a-z]\\b ?#i", "", $str);
echo $res;
How about:
preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $string);
Note this also catches single characters that are at the beginning or end of the string, but not single characters that are adjacent to punctuation (they must be surrounded by whitespace).
If you also want to remove characters immediately before punctuation (e.g. 'the x.'), then this should work properly in most (English) cases:
preg_replace('/(^|\s)[a-z]\b/', '$1', $string);
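A short comparison of the two variants on sample input. The variable names are made up for this sketch. It also shows one side effect worth knowing about: the second pattern does not consume the whitespace after the letter, so a letter removed mid-sentence leaves two spaces behind.

```php
<?php
$input = 'this is b moving up';

// Variant 1: the letter must sit between whitespace or string edges.
$strict = preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $input);
// "this is moving up"

// Variant 2: the letter only needs a word boundary after it.
$loose = preg_replace('/(^|\s)[a-z]\b/', '$1', $input);
// "this is  moving up" (two spaces remain)

$beforeDot = preg_replace('/(^|\s)[a-z]\b/', '$1', 'the x.');
// "the ." (the space before the letter is kept)
```

If the leftover spacing matters, a second pass such as collapsing runs of spaces would be one way to tidy it up.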
As a one-liner:
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);
This matches a single lowercase letter (\p{Ll}) which is preceded or followed by whitespace (\s), removing both. The word boundaries (\b) ensure that only single letters are indeed matched. The /u modifier makes the regex Unicode-aware.
The result: A single letter surrounded by spaces on both sides is reduced to a single space. A single letter preceded by whitespace but not followed by whitespace is removed completely, as is a single letter only followed but not preceded by whitespace.
So
This a is my test sentence a. o How funny (what a coincidence a) this is!
is changed to
This is my test sentence. How funny (what coincidence) this is!
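The example above can be reproduced with a few lines of PHP, using the sentence and pattern exactly as given:

```php
<?php
$subject = 'This a is my test sentence a. o How funny (what a coincidence a) this is!';

// Remove a lone lowercase letter together with the whitespace
// before it or, failing that, the whitespace after it.
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);

echo $result, PHP_EOL;
```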
You could try something like this:
preg_replace('/\b\S\s\b/', "", $subject);
This is what it means:
\b # Assert position at a word boundary
\S # Match a single character that is a “non-whitespace character”
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
\b # Assert position at a word boundary
Update
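As raised by Radu, because I've used \S this will match more than just a-zA-Z. It will also match 0-9 and _. Note that the leading \b only requires a transition between a word character and a non-word character, so \S can also match a punctuation character that directly follows a word: in foo, bar the pattern matches the comma and the space. Replace \S with [a-z] if only letters should be removed.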
As mentioned in the comments by Tim Pietzcker, be aware that this won't work if you need to remove single characters that are followed by non-word characters, as in test a (hello). It will also fall over if there are extra spaces after the single character, like this:
test a   hello
but you could fix that by changing the expression to \b\S\s*\b
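A quick sketch of how both versions behave, with made-up inputs. The second case is included because \b also sits between a word character and punctuation, which lets \S pick up a comma that directly follows a word.

```php
<?php
$pattern = '/\b\S\s\b/';

// A lone letter followed by one space is removed.
$basic = preg_replace($pattern, '', 'this is b moving up');
// "this is moving up"

// Punctuation right after a word is removed too, along with the space.
$punct = preg_replace($pattern, '', 'foo, bar');
// "foobar"

// The \s* variant copes with extra spaces after the letter.
$extra = preg_replace('/\b\S\s*\b/', '', 'test a   hello');
// "test hello"
```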
Try this one:
$sString = preg_replace("#\b[a-z]{1}\b#m", ' ', $sString);
I have this regex:
/[^a-z\s]/i
This is supposed to match any character from a-z and A-Z and any whitespace encountered. It works for characters, but not for spaces. Why?
I'm checking in php like this :
if (preg_match('/[^a-z\s]/i', $username)) {
...
}
I'm checking to see if the username contains any other character than letters ( a-z,A-Z ) or than space.
Your regex should look like this:
/^[a-z\s]+$/i
if (preg_match('/^[a-z\s]+$/i', $username)) {
//the username is ok.
}
/[^a-z\s]/i will only match characters that aren't in the case-insensitive set a-z and space. Try removing the ^, which negates the characters inside your brackets. The pattern to match all letters and spaces should read:
/[a-z\s]/i
Note that \s won't just match spaces. It will match any whitespace character (like tabs and newlines) as well.
If you want to force matches to begin with a letter or space, you must move the ^ outside of the brackets like so:
/^[a-z\s]/i
Finally, if you're trying to match strings that begin with one or more occurrences of letters and spaces, you need to add the + quantifier. Otherwise it will only match a single character:
/^[a-z\s]+/i
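To make the difference concrete, here is a small sketch. The function name isLettersAndSpaces is hypothetical and not from the thread. It anchors the positive class at both ends, while the negated class from the question is shown answering the opposite question.

```php
<?php
// Hypothetical helper name, used only for this illustration.
function isLettersAndSpaces(string $username): bool
{
    // Anchored at both ends: every character must be a letter or whitespace.
    return preg_match('/^[a-z\s]+$/i', $username) === 1;
}

// The negated class asks the opposite question:
// "is there at least one character that is NOT a letter or whitespace?"
$hasInvalid = preg_match('/[^a-z\s]/i', 'John_Smith') === 1;
```

Either check works for validation: accept when the anchored pattern matches, or reject when the negated pattern finds anything.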
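Because the ^ character is only an anchor outside the brackets; as the first character inside them it negates the class instead. If you use ^ and $ as the start- and end-of-string markers, they need to appear at the very beginning and end of the pattern respectively.
So, for exactly one letter or whitespace character, it sounds like you'd want: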
^[a-zA-Z\s]$
or if you want to match multiples of alpha and/or spaces then:
^[a-zA-Z\s]*$
Works for me! Perhaps you should include a fully reproducible example, but it picks up spaces for me.
You can also rewrite this regex to do the opposite, which is a bit more obvious for me personally:
/^[a-z\s]*$/i
I'm still kinda new to using Regular Expressions, so here's my plight. I have some rules for acceptable usernames and I'm trying to make an expression for them.
Here they are:
1-15 Characters
a-z, A-Z, 0-9, and spaces are acceptable
Must begin with a-z or A-Z
Cannot end in a space
Cannot contain two spaces in a row
This is as far as I've gotten with it.
/^[a-zA-Z]{1}([a-zA-Z0-9]|\s(?!\s)){0,14}[^\s]$/
It works, for the most part, but doesn't match a single character such as "a".
Can anyone help me out here? I'm using PCRE in PHP if that makes any difference.
Try this:
/^(?=.{1,15}$)[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/
The look-ahead assertion (?=.{1,15}$) checks the length and the rest checks the structure:
[a-zA-Z] ensures that the first character is an alphabetic character;
[a-zA-Z0-9]* allows any number of following alphanumeric characters;
(?: [a-zA-Z0-9]+)* allows any number of sequences of a single space (not \s that allows any whitespace character) that must be followed by at least one alphanumeric character (see PCRE subpatterns for the syntax of (?:…)).
You could also remove the look-ahead assertion and check the length with strlen.
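A minimal sketch that wraps the pattern in a function and runs it against the rules from the question. The function name isAcceptableUsername and the sample names are assumptions made for this example.

```php
<?php
// Hypothetical helper name; the pattern is the one given above.
function isAcceptableUsername(string $name): bool
{
    $pattern = '/^(?=.{1,15}$)[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/';
    return preg_match($pattern, $name) === 1;
}

var_dump(isAcceptableUsername('a'));            // single letter is accepted
var_dump(isAcceptableUsername('John Smith 42'));
var_dump(isAcceptableUsername('John  Smith'));  // two spaces in a row
var_dump(isAcceptableUsername('John '));        // trailing space
var_dump(isAcceptableUsername('1john'));        // starts with a digit
```

Keep in mind that $ also matches just before a trailing newline; if that matters for your input, the D modifier restricts it to the very end of the string.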
Make everything after your first character optional:
^[a-zA-Z](([a-zA-Z0-9]|\s(?!\s)){0,13}[^\s])?$
The main problem with your regexp is that it needs at least two characters to have a match:
one for the [a-zA-Z]{1} part
one for the [^\s] part
Besides this problem, I see some parts of your regexp that could be improved:
The [^\s] class will match any character except whitespace, so a dot or semicolon will be accepted. Try using the [a-zA-Z0-9] class here to ensure the last character is a valid one.
You can delete the {1} part at the beginning, as the regexp will match exactly one character by default.
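These points can be confirmed against the pattern from the question. The variable names below are made up for this sketch. It also shows a further consequence of the two mandatory characters: with {0,14} in between, a 16-character name is accepted.

```php
<?php
// The pattern as posted in the question.
$original = '/^[a-zA-Z]{1}([a-zA-Z0-9]|\s(?!\s)){0,14}[^\s]$/';

// A single character cannot satisfy both mandatory parts.
$single = preg_match($original, 'a');

// [^\s] lets punctuation through as the last character.
$dot = preg_match($original, 'ab.');

// 1 + 14 + 1 characters: one more than the intended maximum of 15.
$sixteen = preg_match($original, str_repeat('a', 16));
```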