How to match alphanumeric and symbols using PHP?

I'm working with UTF-8 encoded text stored in the variable $title.
Using preg_replace, how do I append an extra space if the $title string ends with:
an upper- or lower-case letter
a digit
a symbol, e.g. ? or !

This should do the trick:
$title = preg_replace('/^(.*[\w?!])$/', '$1 ', $title);
In essence, if the string ends in one of the listed characters, it appends a single space.
If the string doesn't match the pattern, preg_replace() returns the original string unchanged, so you're still good.
If you need to expand the list of endings, just add them to the character class [\w?!].
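Since the question mentions UTF-8, it may be worth noting that without the u modifier \w covers only ASCII letters, digits, and underscore by default. The following is a minimal sketch of a Unicode-aware variant, not a definitive implementation. The function name appendTrailingSpace is hypothetical, and it assumes that "letter" and "digit" should mean any Unicode letter (\p{L}) or number (\p{N}).

```php
<?php
// Hypothetical helper: append one space when the title ends in a letter, digit, "?" or "!".
function appendTrailingSpace(string $title): string
{
    // \p{L} = any Unicode letter, \p{N} = any Unicode number; the u modifier enables UTF-8 mode.
    // \z anchors at the very end of the string (unlike $, it does not stop before a final newline).
    $result = preg_replace('/([\p{L}\p{N}?!])\z/u', '$1 ', $title);

    // preg_replace() returns null on error (for example, invalid UTF-8), so fall back to the input.
    return $result ?? $title;
}

var_dump(appendTrailingSpace('Café'));  // string(6) "Café "
var_dump(appendTrailingSpace('done.')); // string(5) "done."
```

The fallback to the original string is a design choice for this sketch; depending on the application, reporting the error may be preferable.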

Use a positive lookbehind just before the end of the string, and replace that (empty) position with a space.
$title = preg_replace('/(?<=[A-Za-z0-9?!])$/',' ', $title);

You may want to try the pattern below to see if it does what you need.
<?php
// THE REGEX BELOW MATCHES THE ENDING LOWER & UPPER-CASED CHARACTERS, DIGITS
// AND SYMBOLS LIKE "?" AND "!" AND EVEN A DOT "."
// HOWEVER YOU CAN IMPROVISE ON YOUR OWN
$rxPattern = "#([\!\?a-zA-Z0-9\.])$#";
$title = "What is your name?";
var_dump($title);
// AND HERE, YOU APPEND A SINGLE SPACE AFTER THE MATCHED STRING
$title = preg_replace($rxPattern, "$1 ", $title);
var_dump($title);
// THE FIRST var_dump($title) PRODUCES:
// 'What is your name?' (length=18)
// AND THE SECOND var_dump($title) PRODUCES
// 'What is your name? ' (length=19) <== NOTICE THE LENGTH FROM ADDED SPACE.
Cheers...

You need
$title=preg_replace("/.*[\w?!]$/", "\\0 ", $title);

Related

how to clean a dirty csv string using php regex

my string may be like this:
# *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?
in fact - it is a dirty csv string - having names of jpg images
I need to remove any non-alphanum chars - from both sides of the string
then - inside the resulting string - remove the same - except commas and dots
then - remove duplicates commas and dots - if any - replace them with single ones
so the final result should be:
lorem.jpg,ipsum.jpg,dolor.jpg
I firstly tried to remove any white space - anywhere
$str = str_replace(" ", "", $str);
then I used various forms of trim functions - but it is tedious and a lot of code
the additional problem is - duplicates commas and dots may have one or more instances - for example - .. or ,,,,
is there a way to solve this using regex, pls ?
List of modeled steps following your words:
Step 1
"remove any non-alphanum chars from both sides of the string"
translated: remove leading and trailing consecutive [^a-zA-Z0-9] characters
regex: replace ^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$ with $1
Step 2
"inside the resulting string - remove the same - except commas and dots"
translated: remove any [^a-zA-Z0-9.,]
regex: replace [^a-zA-Z0-9.,] with empty string
Step 3
"remove duplicates commas and dots - if any - replace them with single ones"
translated: replace consecutive [,.] with a single instance
regex: replace (\.{2,}) with .
regex: replace (,{2,}) with ,
PHP Demo:
https://onlinephp.io/c/512e1
<?php
$subject = " # *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";
$firstStep = preg_replace('/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', '$1', $subject);
$secondStep = preg_replace('/[^a-zA-Z0-9.,]/', '', $firstStep);
$thirdStepA = preg_replace('/\.{2,}/', '.', $secondStep);
$thirdStepB = preg_replace('/,{2,}/', ',', $thirdStepA);
echo $thirdStepB; //lorem.jpg,ipsum.jpg,dolor.jpg
Look at
https://www.php.net/manual/en/function.preg-replace.php
It replaces anything inside a string based on a pattern. \s represents any whitespace character, but watch out for NBSP (the non-breaking space; \h matches it).
Example 4
$str = preg_replace('/\s\s+/', '', $str);
It will be something like that.
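As a hedged illustration of the NBSP remark, here is a small sketch that collapses every run of whitespace, including non-breaking spaces, into a single space. The helper name collapseWhitespace is made up for this example, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: collapse each run of whitespace (including NBSP) to one space.
function collapseWhitespace(string $str): string
{
    // [\s\h] combines general whitespace with horizontal whitespace;
    // in UTF-8 mode (u modifier) \h includes U+00A0, the non-breaking space.
    $result = preg_replace('/[\s\h]+/u', ' ', $str);

    // preg_replace() returns null on error, so fall back to the input.
    return $result ?? $str;
}

$dirty = "ip\u{00A0}\u{00A0} sum.jpg";
echo collapseWhitespace($dirty), "\n"; // ip sum.jpg
```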
Can you try this:
$string = ' # *lorem.jpg,,,, ip sum.jpg,dolor .jpg,-/ ?';
// keep only alphanumerics, commas, and dots
$result = preg_replace("/[^A-Za-z0-9,.]/", '', $string);
// collapse repeated commas and repeated dots
$result = preg_replace('/,+/', ',', $result);
$result = preg_replace('/\.+/', '.', $result);
// remove commas, semicolons, dots, and spaces from the end
$result = preg_replace("/[ ,;.]*$/", '', $result);
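preg_replace() also accepts arrays of patterns and replacements, which it applies in order, so the separate calls can be folded into one. The sketch below is only one possible arrangement; the function name cleanCsvNames is hypothetical, and it assumes that only ASCII letters, digits, commas, and dots should survive.

```php
<?php
// Hypothetical helper: clean a "dirty" comma-separated list of file names.
function cleanCsvNames(string $dirty): string
{
    $patterns = [
        '/[^A-Za-z0-9.,]+/', // drop everything except letters, digits, commas, dots
        '/,{2,}/',           // collapse runs of commas
        '/\.{2,}/',          // collapse runs of dots
        '/^[.,]+|[.,]+$/',   // trim leftover commas and dots at both ends
    ];
    $replacements = ['', ',', '.', ''];

    // Patterns are applied one after another to the result of the previous one.
    return preg_replace($patterns, $replacements, $dirty) ?? $dirty;
}

echo cleanCsvNames(' # *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?'), "\n";
// lorem.jpg,ipsum.jpg,dolor.jpg
```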

Find words starting and ending with dollar signs $ in PHP

I am looking to find and replace words in a long string. I want to find words that look like this: $test$ and replace them with nothing.
I have tried a lot of things and can't figure out the regular expression. This is the last one I tried:
preg_replace("/\b\\$(.*)\\$\b/im", '', $text);
No matter what I do, I can't get it to replace words that begin and end with a dollar sign.
Use single quotes instead of double quotes and remove the double escape.
$text = preg_replace('/\$(.*?)\$/', '', $text);
Also, a word boundary \b does not consume any characters; it asserts that there is a word character on one side and not on the other. You need to remove the word boundaries for this to work. Your regular expression contains no letters, so the i modifier is useless here, and it has no anchors, so remove the m (multi-line) modifier as well.
Also, * is a greedy operator: .* will match as much as it can while still allowing the remainder of the regular expression to match. To be clear, it will replace the entire string:
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*)\$/', '', $text));
# => string(0) ""
I recommend using the non-greedy operator *? here. Adding the question mark says: don't be greedy; as soon as you find an ending $, stop, you're done.
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*?)\$/', '', $text));
# => string(10) " bar  quz "
Edit
To fix your problem, you can use \S which matches any non-white space character.
$text = '$20.00 is the $total$';
var_dump(preg_replace('/\$\S+\$/', '', $text));
# string(14) "$20.00 is the "
There are three different positions that qualify as word boundaries \b:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
$ is not a word character, so don't use \b or it won't work. Also, there is no need for the double escaping and no need for the im modifiers:
preg_replace('/\$(.*)\$/', '', $text);
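The word-boundary rules listed above can be checked directly with preg_match(). This is a small illustrative sketch; the sample strings and variable names are made up for the demonstration.

```php
<?php
// "$" is not a word character, so \b next to it only matches
// when the character on the OTHER side is a word character.

// Left of "$test$" is a space, right is a space: neither \b can match.
$withBoundaries = preg_match('/\b\$test\$\b/', 'pay $test$ now');

// Without the boundaries the same text matches.
$withoutBoundaries = preg_match('/\$test\$/', 'pay $test$ now');

// Here word characters ("a" and "b") touch the dollar signs, so \b does match.
$gluedToWords = preg_match('/\b\$test\$\b/', 'a$test$b');

var_dump($withBoundaries);    // int(0)
var_dump($withoutBoundaries); // int(1)
var_dump($gluedToWords);      // int(1)
```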
I would use:
preg_replace('/\$[^$]+\$/', '', $text);
You can use preg_quote to help you out on 'quoting':
$t = preg_replace('/' . preg_quote('$', '/') . '.*?' . preg_quote('$', '/') . '/', '', $text);
echo $t;
From the docs:
This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
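A short sketch of the run-time case the docs describe, where the marker is only known when the code runs. The function name stripDelimited and its parameters are hypothetical, and the lazy .*? is an assumption carried over from the answer above.

```php
<?php
// Hypothetical helper: remove every "<marker>...<marker>" span from $text.
function stripDelimited(string $text, string $marker): string
{
    // Quote the marker so characters such as $ ( * are treated literally;
    // the second argument also escapes the / delimiter used below.
    $quoted = preg_quote($marker, '/');

    return preg_replace('/' . $quoted . '.*?' . $quoted . '/', '', $text) ?? $text;
}

var_dump(stripDelimited('a $x$ b', '$'));    // string(4) "a  b"
var_dump(stripDelimited('a (*x(* b', '(*')); // string(4) "a  b"
```

Note that the spaces on both sides of the removed span are left in place, which is why two spaces remain between "a" and "b".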
Contrary to your use of word boundary markers (\b), you actually want the inverse effect (\B): you want to make sure that there ISN'T a word character next to the non-word character $.
You also don't need capturing parentheses, because you are not using a backreference in your replacement string.
\S+ means one or more non-whitespace characters, matched greedily (it still backtracks, so it is not possessive).
Code:
$text = '$foo$ boo hi$$ mon$k$ey $how thi$ $baz$ bar $foobar$';
var_export(
    preg_replace(
        '/\B\$\S+\$\B/',
        '',
        $text
    )
);
Output:
' boo hi$$ mon$k$ey $how thi$  bar '

PHP Regex: Remove words less than 3 characters

I'm trying to remove all words of less than 3 characters from a string, specifically with RegEx.
The following doesn't work because it is looking for double spaces. I suppose I could convert all spaces to double spaces beforehand and then convert them back after, but that doesn't seem very efficient. Any ideas?
$text='an of and then some an ee halved or or whenever';
$text=preg_replace('# [a-z]{1,2} #',' ',' '.$text.' ');
echo trim($text);
Removing the Short Words
You can use this:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $yourstring);
Explanation
\b is a word boundary: it matches a position where one side is a word character (a letter, digit, or underscore) and the other side is not (for instance a space character, or the beginning of the string)
[a-z]{1,2} matches one or two letters
\b another word boundary
Replace with the empty string.
Option 2: Also Remove Trailing Spaces
If you also want to remove the spaces after the words, we can add \s* at the end of the regex:
$replaced = preg_replace('~\b[a-z]{1,2}\b\s*~', '', $yourstring);
You can use the word boundary tag: \b:
Replace: \b[a-z]{1,2}\b with ''
Use this
preg_replace('/(\b.{1,2}\s)/','',$your_string);
Although some of the solutions here worked, they had a problem with my language's multi-letter characters, such as "ch". A simple explode and implode worked for me.
$minWordLength = 3;
$string = "my super string";
$exploded = explode(" ", $string);
foreach ($exploded as $key => $word) {
    // drop every word shorter than the minimum length
    if (mb_strlen($word) < $minWordLength) unset($exploded[$key]);
}
$string = implode(" ", $exploded);
echo $string;
// outputs "super string"
To me, it seems that this works fine with most PHP versions:
$string2 = preg_replace('/\b[a-zA-Z0-9]{1,2}\b/', '', trim($string1));
where [a-zA-Z0-9] is the accepted range of letters and digits.
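Pulling the ideas in this thread together, here is a hedged sketch that removes short words and then tidies the leftover spaces. The function name removeShortWords and its $minLength parameter are hypothetical. It assumes that "word" means a run of Unicode letters delimited by whitespace (or the string edges), which is a stricter definition than \b gives.

```php
<?php
// Hypothetical helper: remove whitespace-delimited words shorter than $minLength letters.
function removeShortWords(string $text, int $minLength = 3): string
{
    if ($minLength < 2) {
        return $text; // nothing can be shorter than one letter
    }
    $max = $minLength - 1;

    // (?<!\S) and (?!\S) require whitespace or a string edge on each side;
    // \p{L} with the u modifier matches any Unicode letter.
    $stripped = preg_replace('/(?<!\S)\p{L}{1,' . $max . '}(?!\S)/u', '', $text) ?? $text;

    // Collapse the gaps left behind and trim both ends.
    return trim(preg_replace('/\s+/', ' ', $stripped) ?? $stripped);
}

echo removeShortWords('an of and then some an ee halved or or whenever'), "\n";
// and then some halved whenever
```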

preg_replace vs trim PHP

I am working with a slug function, and I don't fully understand some of it, so I am looking for some help with an explanation.
My first question is about this line in my slug function $string = preg_replace('# +#', '-', $string); Now I understand that this replaces all spaces with a '-'. What I don't understand is what the + sign is in there for which comes after the white space in between the #.
Which leads to my next problem. I want a trim function that will get rid of spaces but only the spaces after they enter the value. For example someone accidentally entered "Arizona " with two spaces after the a and it destroyed the pages linked to Arizona.
So after all my rambling I basically want to figure out how I can use a trim to get rid of accidental spaces but still have the preg_replace insert '-' in between words.
ex.. "Sun City West " = "sun-city-west"
This is my full slug function-
function getSlug($string){
    if(isset($string) && $string <> ""){
        $string = strtolower($string);
        //var_dump($string); echo "<br>";
        $string = preg_replace('#[^\w ]+#', '', $string);
        //var_dump($string); echo "<br>";
        $string = preg_replace('# +#', '-', $string);
    }
    return $string;
}
You can try this:
function getSlug($string) {
    return preg_replace('#\s+#', '-', trim($string));
}
It first trims whitespace from the beginning and end of the string, and then replaces each remaining run of whitespace with a single - character.
Here your regex is:
#\s+#
which is:
# = regex delimiter
\s = any space character
+ = match the previous character or group one or more times
# = regex delimiter again
so the regex here means: "match any sequence of one or more whitespace characters"
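To see what the + changes in practice, compare the patterns on a string with a run of three spaces. This is only an illustrative sketch; the sample strings and variable names are invented.

```php
<?php
$input = 'a   b'; // three spaces between the letters

// Without +, every single space is replaced on its own.
$withoutPlus = preg_replace('# #', '-', $input);

// With +, the whole run of spaces counts as one match.
$withPlus = preg_replace('# +#', '-', $input);

// \s+ additionally treats tabs and newlines as part of the run.
$anyWhitespace = preg_replace('#\s+#', '-', "a \t\n b");

var_dump($withoutPlus);   // string(5) "a---b"
var_dump($withPlus);      // string(3) "a-b"
var_dump($anyWhitespace); // string(3) "a-b"
```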
The + means at least one of the preceding character, so it matches one or more spaces. The # signs are one of the ways of marking the start and end of a regular expression's pattern block.
For a trim function, PHP handily provides trim() which removes all leading and trailing whitespace.
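Putting trim() together with the original slug steps might look like the sketch below. The function name getSlugTrimmed is hypothetical, and the order of operations (strip unwanted characters first, then trim, then insert dashes) is an assumption chosen so that input such as "West !" does not end in a dash.

```php
<?php
// Hypothetical variant of the question's getSlug() that also ignores stray outer spaces.
function getSlugTrimmed(string $string): string
{
    $string = strtolower($string);
    // Remove everything except word characters and spaces.
    $string = preg_replace('#[^\w ]+#', '', $string) ?? $string;
    // Trim after stripping, so spaces exposed by the previous step are removed too.
    $string = trim($string);
    // Replace each remaining run of spaces with a single dash.
    return preg_replace('# +#', '-', $string) ?? $string;
}

echo getSlugTrimmed('Sun City West  '), "\n"; // sun-city-west
echo getSlugTrimmed('Arizona  '), "\n";       // arizona
```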

Regex to insert dot (.) after characters, before new line

I'm reformatting some text, and sometimes I have a string, where there is a sentence which is not ended by a dot.
I'm running various checks for this purpose, and one more I'd like is to "Add dot after last character before new line".
I'm not sure how to form the regular expression for this:
$string = preg_replace("/???/", ".\n", $string);
Try this one:
$string = preg_replace("/(?<![.])(?=[\n\r]|$)/", ".", $string);
The negative lookbehind (?<![.]) checks that the previous character is not a dot.
The positive lookahead (?=[\n\r]|$) checks that the next character is a newline, or that we are at the end of the string.
Like this, I suppose:
<?php
$string = "Add dot after last character before new line\n";
// The existing \n is kept, so the replacement only needs to add the dot.
$string = preg_replace("/(.)$/", "$1.", $string);
print $string;
?>
This way the dot is added after the word "line" in the sentence and before the \n.
demo : http://ideone.com/J4g7tH
I'd do:
$string = "Add dot after last character before new line\n";
$string = preg_replace("/([^.\r\n])$/s", "$1.", $string);
Thanks for all the answers, but none of them really caught all scenarios right.
I fumbled my way to a good solution using the word boundary assertion:
// Add dot after every word boundary that is followed by a new line.
$string = preg_replace("/\b\n/", ".\n", $string);
Note that \b must be written without square brackets: inside a character class, [\b] matches a backspace character, not a word boundary.
This works for me:
$content = preg_replace("/(\w+)(\n)/", "$1.$2", $content);
It will match a word immediately followed by a new line, and add a dot in between.
Will match:
Hello\n
Will not match:
Hello \n
or
Hello.\n
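For comparison, here is a hedged sketch that uses lookarounds so nothing needs to be captured, and that also handles Windows line endings and a last line without a trailing newline. The function name addMissingDots is hypothetical, and it keeps the same assumption as the answer above: only lines ending directly in a word character get a dot.

```php
<?php
// Hypothetical helper: add a dot at the end of each line that ends in a word character.
function addMissingDots(string $text): string
{
    // (?<=\w)       the previous character must be a word character
    // (?=\r?\n|\z)  what follows must be a line break or the very end of the string
    // The match itself is empty, so the dot is inserted without replacing anything.
    return preg_replace('/(?<=\w)(?=\r?\n|\z)/', '.', $text) ?? $text;
}

$input = "Hello\nWorld.\nBye \nEnd";
var_dump(addMissingDots($input) === "Hello.\nWorld.\nBye \nEnd."); // bool(true)
```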
