Does anyone have a regex that only allows a-z and ONLY allows the first letter of each word to be capitalized?
So "Im detective John kimble" would match, but "Im a Cop yOu iDiot" would not be allowed.
This regex will match a word that starts with a lower-case or capital letter and continues with lower-case letters only.
[a-zA-Z][a-z]*
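Now you can extend the regex to match multiple such words, depending on what exactly you want. The words need an explicit separator (a single space here): without one, ([a-zA-Z][a-z]*)+ would accept "yOu" as the two "words" "y" and "Ou". You also have to be a bit careful to handle strange cases like an empty sentence.
([a-zA-Z][a-z]*( [a-zA-Z][a-z]*)*)? // Matches the empty sentence as well
[a-zA-Z][a-z]*( [a-zA-Z][a-z]*)* // Must have at least one word
Then you need to consider whether the start and end anchors (^ and $) are relevant for your pattern: without them, a match anywhere inside the string is enough.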
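A minimal PHP sketch of the anchored, space-separated version, using preg_match(). The helper name isCapitalizedSentence is hypothetical, and words are assumed to be separated by single spaces:

```php
<?php
// Hypothetical helper: true when every word is one letter (either case)
// followed only by lower-case letters, with single spaces between words.
function isCapitalizedSentence(string $text): bool
{
    // ^ and $ force the whole string to match, not just a substring.
    return preg_match('/^[a-zA-Z][a-z]*( [a-zA-Z][a-z]*)*$/', $text) === 1;
}

var_dump(isCapitalizedSentence("Im detective John kimble")); // bool(true)
var_dump(isCapitalizedSentence("Im a Cop yOu iDiot"));       // bool(false)
```

Because the pattern requires at least one letter, the empty string is rejected; wrap the body in an optional group if an empty sentence should pass.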
You really don't need a regex for this, because I don't really see how it is an offence.
You can simply correct the case:
$str = "joHn KiMBle";
echo ucwords(strtolower($str)); // John Kimble
In CSS you can capitalize the first letter of each word with:
.title {
text-transform: capitalize;
}
In PHP, the string function ucfirst() uppercases the first character of a string (ucwords() does the same for every word):
$foo = ucfirst($foo);
To allow only a-z, use this regex in JavaScript:
var pat = /^[a-z]+$/;
Try using
^[a-zA-Z][a-z]*( [a-zA-Z][a-z]*)*$
Hope it helps
You can use this pattern to check that:
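^(?>[A-Za-z][a-z]*+(?![A-Z])|[^A-Za-z]++)+$
The (?![A-Z]) lookahead stops a capital letter from directly following lower-case letters inside a word, so "yOu" is rejected instead of being split into "y" and "Ou".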
Doable without regex.
// true when $text contains no digits and no capitals except at the start of a word
strcspn($text, "0123456789") === strlen($text) &&
ucwords($text) === ucwords(strtolower($text))
Okay, I was hoping someone could help me with a little regex-fu.
I am trying to clean up a string.
Basically, I am:
Replacing all characters except A-Za-z0-9 with a replacement.
Replacing consecutive duplicates of the replacement with a single instance of the replacement.
Trimming the replacement from the beginning and end of the string.
Example Input:
(&&(%()$()#&#&%&%%(%$+-_The dog jumped over the log*(&)$%&)#)##%&)&^)##)
Required Output:
The+dog+jumped+over+the+log
I am currently using this very discombobulated code and just know there is a much more elegant way to accomplish this....
function clean($string, $replace){
$ok = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
$ok .= $replace;
$pattern = "/[^".preg_quote($ok, "/")."]/";
return trim(preg_replace('/'.preg_quote($replace.$replace).'+/', $replace, preg_replace($pattern, $replace, $string)),$replace);
}
Could a Regex-Fu Master please grace me with a simpler/more efficient solution?
A much better solution suggested and explained by Botond Balázs and hakre:
function clean($string, $replace, $skip=""){
// Escape $replace and $skip for use inside the character class
$escaped = preg_quote($replace.$skip, "/");
// Regex pattern
// Replace all consecutive occurrences of "Not OK"
// characters with the replacement
$pattern = '/[^A-Za-z0-9'.$escaped.']+/';
// Execute the regex
$result = preg_replace($pattern, $replace, $string);
// Trim and return the result
return trim($result, $replace);
}
I'm not a "regex ninja" but here's how I would do it.
function clean($string, $replace){
// Remove all "not OK" characters from the beginning and the end:
$result = preg_replace('/^[^A-Za-z0-9]+/', '', $string);
$result = preg_replace('/[^A-Za-z0-9]+$/', '', $result);
// Replace all consecutive occurrences of "not OK"
// characters with the replacement:
$result = preg_replace('/[^A-Za-z0-9]+/', $replace, $result);
return $result;
}
I guess this could be simplified more, but when dealing with regexes, clarity and readability are often more important than being clever or writing super-optimal code.
Let's see how it works:
/^[^A-Za-z0-9]+/:
^ matches the beginning of the string.
[^A-Za-z0-9] matches all non-alphanumeric characters
+ means "match one or more of the previous thing"
/[^A-Za-z0-9]+$/:
same thing as above, except $ matches the end of the string
/[^A-Za-z0-9]+/:
same thing as above, except it matches mid-string too
EDIT: OP is right that the first two can be replaced with a call to trim():
function clean($string, $replace){
// Replace all consecutive occurrences of "not OK"
// characters with the replacement:
$result = preg_replace('/[^A-Za-z0-9]+/', $replace, $string);
return trim($result, $replace);
}
I don't want to sound super-clever, but I would not call it regex-fu.
What you do is actually pretty much in the right direction, because you use preg_quote(); many others are not even aware of that function.
However, you probably use it in the wrong place: you quote characters for use inside a character class, and quoting inside a character class follows similar but different rules than quoting elsewhere in a regex.
Additionally, regular expressions have been designed with a case like yours in mind. That is probably the part where you are looking for a wizard, so let's see some options for making your negative character class more compact (I leave out the code that generates it to make this more visible):
[^0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]
There are constructs like 0-9, A-Z and a-z that represent exactly that. As you can see, - is a special character inside a character class: it is not meant literally but denotes a range of characters from one to another:
[^0-9A-Za-z]
So that is already more compact and represents the same thing. There are also notations like \d and \w which might be handy in your case (note that \w also matches the underscore). But I'll stick with the first variant for now, because I think it's already pretty clear what it does.
The other part is the repetition. Let's see, there is + which means one or more. So you want to replace one or more of the non-matching characters. You use it by adding it at the end of the part that should match one or more times (and by default it's greedy, so if there are 5 characters, those 5 will be taken, not 4):
[^0-9A-Za-z]+
I hope this is helpful. Another step would be to also just drop the non-matching characters at the beginning and end, but it's early in the morning and I'm not that fluent with that.
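Dropping the non-matching characters at the beginning and end can be folded into the same call, because preg_replace() accepts arrays of patterns and replacements and applies them in order. A sketch; the name cleanInOnePass is hypothetical, and $replace is assumed to contain no "$" or "\" (those are special in replacement strings):

```php
<?php
// Hypothetical helper: removes leading/trailing runs of non-alphanumerics,
// then collapses every remaining run into a single $replace.
function cleanInOnePass(string $string, string $replace): string
{
    return preg_replace(
        [
            '/^[^0-9A-Za-z]+|[^0-9A-Za-z]+$/', // 1st pass: drop runs at both ends
            '/[^0-9A-Za-z]+/',                 // 2nd pass: collapse runs mid-string
        ],
        ['', $replace],
        $string
    );
}

echo cleanInOnePass('(&&(%()$()#&#&%&%%(%$+-_The dog jumped over the log*(&)$%&)#)##%&)&^)##)', '+');
// The+dog+jumped+over+the+log
```

Ordering matters: if the mid-string pattern ran first, the ends would already be turned into "+" and would need a trim() afterwards.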
I'm stuck writing a preg_match
I have a string:
XPMG_ar121023.txt
and need to extract the 2 letters between XPMG_ and the first digit (0-9).
$str = 'XPMG_ar121023.txt';
preg_match('/('XPMG_')|[0-9\,]))/', $str, $match);
print_r($match);
Maybe this isn't the best option: My characters will always be
You can just do
$str = "XPMG_ar121023.txt" ;
preg_match('/_([a-z]+)/i', $str, $match);
var_dump($match[1]);
Output
string 'ar' (length=2)
This is too simple for a regular expression. Just $match = substr($str, 5, 2) would get what you're asking for.
Let me walk through this step by step so as to help you solve similar problems in the future. Suppose we have the following format for our filenames:
XPMG_ar121023.txt
We know what we want to capture, we want the "ar" right after the _ and just before the numbers begin. So our expression would look something like this:
_[a-z]+
This is pretty straightforward. We're starting by looking for an underscore, followed by one or more letters between a and z (the + means "one or more"). The square brackets define a character class. Our class consists of the alphabet, but you can put specific digits and other characters in there too if you like.
Now, because we want to capture only the letters, we need to put parentheses around that part of the pattern:
_([a-z]+)
In the result we will now have access to only that subpattern. Next we put our delimiters in place to specify where our pattern begins, and ends:
/_([a-z]+)/
And lastly, after our closing delimiter we can add some modifiers. As it is written, our pattern only looks for lower-case letters. We can add the i modifier to make this case-insensitive:
/_([a-z]+)/i
Voila, we're done. Now we can pass it into preg_match to see what it spits out:
preg_match( "/_([a-z]+)/i", "XPMG_ar121023.txt", $match );
This function takes a pattern as the first parameter, a string to match it against as the second, and lastly a variable to spit the results into. When all is said and done, we can check $match for our data.
The results of this operation follow:
array(2) {
[0]=> string(3) "_ar"
[1]=> string(2) "ar"
}
This is the contents of $match. Notice our full pattern is found in the first index of the array, and our captured portion is provided in the second index of the array.
echo $match[1]; // ar
Hope this helps.
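If the filenames always begin with the fixed XPMG_ prefix, the pattern can be anchored to that prefix and can require a digit right after the letters, so an underscore elsewhere in the name cannot produce a false match. A sketch, assuming exactly two letters always follow the prefix:

```php
<?php
$str = 'XPMG_ar121023.txt';

// ^XPMG_     : the fixed prefix at the start of the string
// ([a-z]{2}) : capture exactly two letters
// (?=\d)     : lookahead requiring a digit next, without consuming it
// i          : case-insensitive, so "AR" would match as well
if (preg_match('/^XPMG_([a-z]{2})(?=\d)/i', $str, $match)) {
    echo $match[1]; // ar
}
```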
Well, why not:
$letters = $str[5].$str[6];
:)
After all, you'll always need the 2 chars after the fixed prefix, there are many ways that do not require a regexp (substr() being the best anyway)
I basically need a function to check whether each of a string's characters is in an array.
My code isn't working so far, but here it is anyway:
$allowedChars = array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"," ","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"," ","0","1","2","3","4","5","6","7","8","9"," ","#",".","-","_","+"," ");
$input = "Test";
$input = str_split($input);
if (in_array($input,$allowedChars)) {echo "Yep, found.";}else {echo "Sigh, not found...";}
I want it to say 'Yep, found.' if one of the letters in $input is found in $allowedChars. Simple enough, right? Well, that doesn't work, and I haven't found a function that will search a string's individual characters for a value in an array.
By the way, I want it to be just that array's values. I'm not looking for fancy html_strip_entities or whatever it is; I want to use that exact array for the allowed characters.
You really should look into regex and the preg_match function: http://php.net/manual/en/function.preg-match.php
But, this should make your specific request work:
$allowedChars = array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"," ","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"," ","0","1","2","3","4","5","6","7","8","9"," ","#",".","-","_","+"," ");
$input = "Test";
$input = str_split($input);
$message = "Sigh, not found...";
foreach($input as $letter) {
if (in_array($letter, $allowedChars)) {
$message = "Yep, found.";
break;
}
}
echo $message;
Are you familiar with regular expressions at all? It's sort of the more accepted way of doing what you're trying to do, unless I'm missing something here.
Take a look at preg_match(): http://php.net/manual/en/function.preg-match.php
To address your example, here's some sample code (updated to address issues raised in the comments):
$subject = "Hello, this is a string";
$pattern = '/^[a-zA-Z0-9 #._+-]*$/'; // include all the symbols you want to match here
if (preg_match($pattern, $subject))
echo "Yep, matches";
else
echo "Doesn't match :(";
A little explanation of the regex: the '^' matches the beginning of the string, the '[a-zA-Z0-9 #._+-]' part means "any character in this set", the '*' after it means "zero or more of the last thing", and finally the '$' at the end matches the end of the string.
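To keep using the exact $allowedChars array the question insists on, the character class can be generated from it. This is a sketch: the helper name onlyAllowedChars is hypothetical, and the array is rebuilt with range() for brevity (same characters as the question's array). preg_quote() escapes characters such as "-", ".", "+" and "#", so every entry is taken literally inside the class:

```php
<?php
// Same characters as the question's $allowedChars, built with range() for brevity.
$allowedChars = array_merge(range('a', 'z'), range('A', 'Z'), range('0', '9'),
    [' ', '#', '.', '-', '_', '+']);

// Hypothetical helper: true when every character of $subject is in $allowed.
function onlyAllowedChars(string $subject, array $allowed): bool
{
    // Join the unique entries and escape them for use inside [...].
    $class = preg_quote(implode('', array_unique($allowed)), '/');
    // Anchors make the whole string, not a substring, subject to the check.
    return preg_match('/^[' . $class . ']*$/', $subject) === 1;
}

var_dump(onlyAllowedChars("Hello world", $allowedChars));  // bool(true)
var_dump(onlyAllowedChars("Hello, world", $allowedChars)); // bool(false)
```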
A somewhat different approach:
$allowedChars = array("a","b","c","d","e");
$char_buff = str_split("Test"); // explode() does not accept an empty delimiter
$foundTheseOnes = array_intersect($char_buff, $allowedChars);
if(!empty($foundTheseOnes)) {
echo 'Yep, something was found. Let\'s find out what: <br />';
print_r($foundTheseOnes);
}
Validating the characters in a string is most appropriately done with string functions. preg_match() is the most direct/elegant method for this task.
Code:
$input="Test Test Test Test";
if(preg_match('/^[\w +.#_-]*$/',$input)){
echo "Input string does not contain any disallowed characters";
}else{
echo "Input contains one or more disallowed characters";
}
// output: Input string does not contain any disallowed characters
Pattern Explanation:
/ # start pattern
^ # start matching from start of string
[\w +.#_-] # match: a-z, A-Z, 0-9, underscore, space, plus, dot, hash, hyphen
* # zero or more occurrences
$ # match until end of string
/ # end pattern
Significant points:
The ^ and $ anchors are crucial to ensure that the entire string is validated versus just a substring of the string.
The \w (a.k.a. "any word character" -> a shorthand character class) is the easy way to write: [a-zA-Z0-9_]
The . dot character loses its "match anything (almost)" meaning and becomes literal when it is written inside of a character class. No escaping slash is necessary.
The hyphen inside of a character class can be written without an escaping slash (\-) so long as it is positioned at the start or end of the character class. If the hyphen is not at the start/end and it is not escaped, it will create a range of characters between the characters on either side of it. Like it or not, [.-z] will not match a hyphen symbol because it does not exist "between" the dot character and the lowercase letter z on the ASCII table.
The * that follows the character class is the "quantifier". The asterisk means "0 or more" of the preceding character class. In this case, this means that preg_match() will allow an empty string. If you want to deny an empty string, you can use + which means "1 or more" of the preceding character class. Finally, you can be far more specific about string length by using a number or numbers in a curly bracketed expression.
{8} would mean the string must be exactly 8 characters long.
{4,} would mean the string must be at least 4 characters long.
{0,10} would mean the string length must be between 0 and 10 (write the 0 explicitly; many PCRE versions treat {,10} as literal text rather than a quantifier).
{5,9} would mean the string length must be between 5 and 9 characters.
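As a quick illustration of a length quantifier, here is the same character class with {5,9} in place of *, so the whole string must also be 5 to 9 characters long (the test strings are made up for this sketch):

```php
<?php
// ^ and $ still anchor the check to the whole string; {5,9} bounds its length.
$pattern = '/^[\w +.#-]{5,9}$/';

var_dump(preg_match($pattern, 'Test'));           // int(0): only 4 characters
var_dump(preg_match($pattern, 'Test Test'));      // int(1): 9 characters
var_dump(preg_match($pattern, 'Test Test Test')); // int(0): 14 characters
```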
All of that advice aside, if you absolutely must use your array of characters AND you wanted to use a loop to check individual characters against your validation array (and I certainly don't recommend it), then the goal should be to reduce the number of array elements involved so as to reduce total iterations.
Your $allowedChars array has multiple elements that contain the space character, but only one is necessary. You should prepare the array using array_unique() or a similar technique.
str_split($input) runs the risk of generating an array with duplicate elements. For example, if $input="Test Test Test Test"; then the resultant array from str_split() will have 19 elements, 14 of which will require redundant validation checks.
You could probably eliminate redundancies from str_split() by calling count_chars($input,3) and feeding that to str_split() or alternatively you could call str_split() then array_unique() before performing the iterative process.
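A sketch of that loop-based approach, assuming the goal is "every character must be allowed". The helper name allCharsAllowed is hypothetical. It checks each distinct character only once, via count_chars($input, 3), and flips the allowed array so lookups are isset() calls instead of linear in_array() scans (array_flip() also collapses the duplicate spaces):

```php
<?php
// Hypothetical helper: loop-based validation that inspects each distinct
// character of $input once.
function allCharsAllowed(string $input, array $allowedChars): bool
{
    // Allowed characters become array keys; duplicates collapse automatically.
    $allowed = array_flip($allowedChars);

    // count_chars() mode 3 returns a string of the distinct bytes in $input.
    $distinct = count_chars($input, 3);
    for ($i = 0, $n = strlen($distinct); $i < $n; $i++) {
        if (!isset($allowed[$distinct[$i]])) {
            return false; // found a character that is not in the array
        }
    }
    return true;
}

// Same characters as the question's array, built with range() for brevity.
$allowedChars = array_merge(range('a', 'z'), range('A', 'Z'), range('0', '9'),
    [' ', '#', '.', '-', '_', '+']);

var_dump(allCharsAllowed("Test Test Test Test", $allowedChars)); // bool(true)
var_dump(allCharsAllowed("Test!", $allowedChars));               // bool(false)
```

For "Test Test Test Test" the loop runs 5 times (T, e, s, t, space) instead of 19.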
Because you're just validating a string, see preg_match() and other PCRE functions for handling this instead.
Alternatively, you can use strcspn() to do...
$check = "abcde...."; // fill in the rest of the characters
$test = "Test";
echo ((strcspn($test, $check) === strlen($test)) ? "Sigh, not found..." : 'Yep, found.');
$string = "Hot_Chicks_call_me_at_123456789";
How can I strip it so that I only have the numbers after the last letter in the string above?
Example, I need a way to check a string and remove everything in front of (the last UNDERSCORE FOLLOWED by the NUMBERS)
Any smart solutions for this?
Thanks
BTW, it's PHP!
Without using a regular expression
$string = "Hot_Chicks_call_me_at_123456789";
$parts = explode('_', $string);
echo end($parts); // end() takes its argument by reference, so it needs a variable
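Another regex-free option, sketched here as an alternative rather than the answer's own method, skips building an array: strrpos() finds the position of the last underscore and substr() returns everything after it:

```php
<?php
$string = "Hot_Chicks_call_me_at_123456789";

// Position of the last "_"; false if the string has no underscore at all.
$pos = strrpos($string, '_');

// With no underscore, fall back to the whole string.
$number = $pos === false ? $string : substr($string, $pos + 1);
echo $number; // 123456789
```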
If it always ends in a number you can just match /(\d+)$/ with regex, is the formatting consistent? Is there anything between the numbers like dashes or spaces?
You can use preg_match for the regex part.
<?php
$subject = "abcdef_sdlfjk_kjdf_39843489328";
preg_match('/(\d+)$/', $subject, $matches);
if ( count( $matches ) > 1 ) {
echo $matches[1];
}
I only recommend this solution if speed isn't an issue, and if the formatting is completely consistent.
PHP's PCRE Regular Expression engine was built for this kind of task
$string = "Hot_Chicks_call_me_at_123456789";
$new_string = preg_replace('{^.*_(\d+)$}x','$1',$string);
//same thing, but with whitespace ignoring and comments turned on for explanations
$new_string = preg_replace('{
^.* #match any character at start of string
_ #up to the last underscore
(\d+) #followed by all digits repeating at least once
$ #up to the end of the string
}x','$1',$string);
echo $new_string . "\n";
To be a bit churlish, your stated specification would suggest the following algorithm:
def trailing_number(s):
    results = list()
    for char in reversed(s):
        if char.isalpha(): break
        if char.isdigit(): results.append(char)
    return ''.join(reversed(results))
It returns only the digits from the end of the string up to the first letter it encounters.
Of course this example is in Python, since I don't know PHP nearly as well. However it should be easily translated as the concept is easy enough ... reverse the string (or iterate from the end towards the beginning) and accumulate digits until you find a letter and break (or fall out of the loop at the beginning of the string).
In C it would be more efficient to use something a bit like for (p = s + strlen(s); p > s; p--) to walk backwards through the string (examining p[-1] on each step), saving a pointer to the most recently encountered digit until we break or drop out of the loop at the beginning of the string. Then return the pointer into the middle of the string where our most recent (leftmost) digit was found.
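Since the question asks for PHP, here is one possible translation of the Python trailing_number() above. The name trailingNumber is hypothetical, and it assumes the ctype extension (enabled in most PHP builds). It walks backwards, stops at the first letter, skips anything that is neither a letter nor a digit, and prepends digits so no final reversal is needed:

```php
<?php
// Hypothetical PHP port of trailing_number(): collect digits from the end
// of the string until the first letter is reached.
function trailingNumber(string $s): string
{
    $digits = '';
    for ($i = strlen($s) - 1; $i >= 0; $i--) {
        $char = $s[$i];
        if (ctype_alpha($char)) {
            break; // reached the last letter: stop
        }
        if (ctype_digit($char)) {
            $digits = $char . $digits; // prepend to keep original order
        }
        // any other character (e.g. "_") is skipped, as in the Python version
    }
    return $digits;
}

echo trailingNumber("Hot_Chicks_call_me_at_123456789"); // 123456789
```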