preg_replace to remove stand-alone numbers - php

I'm looking to remove all standalone numbers from a string, where a standalone number has no adjacent characters (including dashes). Example:
Test 3 string 49Test 49test9 9
Should return Test string 49Test 49test9
So far I've been playing around with:
$str = 'Test 3 string 49Test 49test9 9';
$str= preg_replace('/[^a-z\-]+(\d+)[^a-z\-]+?/isU', ' ', $str);
echo $str;
However, with no luck; this returns
Test string 9Test 9test9
which drops part of the string. I thought of adding [0-9] to the character classes, but to no avail. What am I missing? It seems so simple.
Thanks in advance

Try using a word boundary and negative look-arounds for hyphens, eg
$str = preg_replace('/\b(?<!-)\d+(?!-)\b/', '', $str);
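Since an empty replacement leaves the spaces around each removed number in place, a follow-up pass can tidy them. The sketch below wraps the pattern from this answer in a helper; the function name and the whitespace clean-up step are assumptions added for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper name; the first pattern is the one from the answer above.
function removeStandaloneNumbers(string $text): string
{
    // Drop digit runs that form a whole "word" and do not touch a hyphen.
    $stripped = preg_replace('/\b(?<!-)\d+(?!-)\b/', '', $text);
    // Removing a number leaves its neighbouring spaces behind, so collapse them.
    return trim(preg_replace('/\s{2,}/', ' ', $stripped));
}

echo removeStandaloneNumbers('Test 3 string 49Test 49test9 9'), "\n";
echo removeStandaloneNumbers('keep 5-abc and abc-5 but drop 7'), "\n";
```

The second call shows that numbers glued to a hyphen on either side are left alone.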

Not that complicated, if you watch the spaces :)
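<?php
$str = 'Test 3 string 49Test 49test9 9';
// Replace with a single space (an empty replacement would glue "Test" and
// "string" together), then trim the space left over at either end.
$str = trim(preg_replace('/(\s(\d+)\s|\s(\d+)$|^(\d+)\s)/iU', ' ', $str));
echo $str; // Test string 49Test 49test9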

Try this, I tried to cover your additional requirement to not match on 5-abc
\s*(?<!\B|-)\d+(?!\B|-)\s*
and replace with a single space!
The problem then is to extend the word boundary to include the character -. I achieved this with negative look-arounds that check for - or \B (not a word boundary).
Additionally, I am matching the surrounding whitespace with \s*, so you have to replace with a single space.

I would suggest using
explode(" ",$str)
to get an array of the "words" in your string. Then it should be easier to filter out single numbers.
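A minimal sketch of that idea, assuming words are separated by single spaces. The helper name is hypothetical, and ctype_digit() is used here as one possible test for "made only of digits".

```php
<?php
// Hypothetical helper: split on spaces and drop tokens made only of digits.
function dropNumericWords(string $text): string
{
    $words = explode(' ', $text);
    $kept = array_filter($words, function (string $word): bool {
        // ctype_digit() is true only for non-empty, all-digit strings,
        // so "49Test" and "5-abc" are kept.
        return !ctype_digit($word);
    });
    return implode(' ', $kept);
}

echo dropNumericWords('Test 3 string 49Test 49test9 9'), "\n";
```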

Related

Preg_replace Tag Replace Dashes With HTML Tag

I am partially disabled. I write a LOT of wordpress posts in 'text' mode and to save typing I will use a shorthand for emphasis and strong tags. Eg. I'll write -this- for <em>this</em>.
I want to add a function in wordpress to regex replace word(s) that have a pair of dashes with the appropriate html tag. For starters I'd like to replace -this- with <em>this</em>
Eg:
-this- becomes <em>this</em>
-this-. becomes <em>this</em>.
What I can't figure out is how to replace the bounding chars. I want it to match the string, but then retain the chars immediately before and after.
$pattern = '/\s\-(.*?)\-(\s|\.)/';
$replacement = '<em>$1</em>';
return preg_replace($pattern, $replacement, $content);
...this does the 'search' OK, but it can't get me the space or period after.
Edit: The reason for wanting a space as the beginning boundary and then a space OR a period OR a comma OR a semi-colon as the ending boundary is to prevent problems with truly hyphenated words.
So pseudocode:
1. find the space + string + (space or punctuation)
2. replace with space + open_htmltag + string + close_htmltag + whatever the next char is.
Ideas?
Quoting the question: "a space as the beginning boundary and then a space OR a period OR a comma OR a semi-colon as the ending boundary"
You can try with capturing groups with <em>$1</em>$2 as substitution.
[ ]-([^-]*)-([ .,;])
sample code:
$re = "/-([^-]*)-([ .,;])/i";
$str = " -this-;\n -this-.\n -this- ";
$subst = '<em>$1</em>$2';
$result = preg_replace($re, $subst, $str);
Note: Use a single space instead of \s, which matches any whitespace character [\r\n\t\f ]
Edited by o/p: Did not need opening space as delimiter. This is the winning answer.
You can try with Positive Lookahead as well with only single capturing group.
-([^-]*)-(?=[ .,;])
substitution string: <em>$1</em>
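To see that the two variants agree, here is a small comparison. The sample sentence is made up for illustration; the patterns are the ones given in this answer.

```php
<?php
$text = "I like -this-; and -that-. But well-known stays.";

// Variant 1: capture the trailing delimiter and put it back via $2.
$withGroup = preg_replace('/-([^-]*)-([ .,;])/', '<em>$1</em>$2', $text);

// Variant 2: only look ahead at the delimiter, so it is never consumed.
$withLookahead = preg_replace('/-([^-]*)-(?=[ .,;])/', '<em>$1</em>', $text);

echo $withGroup, "\n";
var_dump($withGroup === $withLookahead);
```

The hyphen in "well-known" is left untouched because no closing hyphen followed by a delimiter comes after it.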
You can use this regex:
(-)(.*?)(-)
and substitute with <em>$2</em>, since the second group holds the text between the dashes.
Edit: as an improvement you can also use -(.*?)- and refer to the text as capturing group \1
In the code below, the regex pattern will start at a hyphen and collect any non-hyphen characters until the next hyphen occurs. It then wraps the collected text in an em tag. The hyphens are discarded.
Note: If you use a hyphen for its intended purposes, this may cause problems. You may want to devise an escape character for that.
$str = "hello -world-. I am -radley-.";
$replace = preg_replace('/-([^-]+?)-/', '<em>$1</em>', $str);
echo $str; // no formatting
echo '<br>';
echo $replace; // formatting
Result:
hello -world-. I am -radley-.
hello <em>world</em>. I am <em>radley</em>.

PHP Word Censor help, censor word based on a criteria using regex

Hi all, I'm trying to build a regular expression that satisfies the following criteria.
Word to be censored in this example: "view".
Character to be used for censoring: "%", since "*" messes up my post formatting.
Examples of word use:
view
views
preview
I went to see the great view
The view was great wasn't it.
Example after word censor:
%%%%
views
preview
I went to see the great %%%%
The %%%% was great wasn't it.
Here is some code I have:
$string = preg_replace_callback('/\s*'. preg_quote($word, '\\') .'\s*/is', 'bbcode_callback_censored', $string);
Trouble is, this matches everything right now since I use "*" in the regex after "\s". Any ideas what I could do to fulfill my criteria?
Don't match for whitespace, use a word boundary
Try
$string = preg_replace_callback('/\b'. preg_quote($word, '/') .'\b/is', 'bbcode_callback_censored', $string);
You just need to make sure that the content of $word does not start or end with a non-word character; otherwise the word boundary will not work.
\b is a word boundary. It matches on a change from a word character (as defined in \w) to a non word character as defined in \W, or the other way round.
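A small sketch of the word-boundary approach applied to the examples from the question. The asker's bbcode_callback_censored() is not shown, so an inline callback stands in for it; the helper name and the one-mask-character-per-letter behaviour are assumptions.

```php
<?php
// Hypothetical helper; an inline callback replaces the asker's unseen callback.
function censorWord(string $text, string $word, string $mask = '%'): string
{
    // '/' is passed to preg_quote() because it is the pattern delimiter.
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
    return preg_replace_callback($pattern, function (array $m) use ($mask): string {
        // One mask character per character of the matched word.
        return str_repeat($mask, strlen($m[0]));
    }, $text);
}

foreach (['view', 'views', 'preview', "The view was great wasn't it."] as $line) {
    echo censorWord($line, 'view'), "\n";
}
```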
Alternative: whitespace boundary
If you don't like the word boundary because the word to replace may start or end with non-word characters, like "#view", define your own "whitespace boundary", e.g. like this:
(?<=^|\s)#view(?=$|\s)
Would look in your code like this
$string = preg_replace_callback('/(?<=^|\s)'. preg_quote($word, '/') .'(?=$|\s)/is', 'bbcode_callback_censored', $string);
(?<=^|\s) will match if there is the Start of the String or whitespace before
(?=$|\s) will match if there is the end of the String or whitespace ahead
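The whitespace boundary can be tried out with a word that starts with a non-word character. This is only a sketch: the helper name and the sample text are made up, and a plain replacement string is used in place of the asker's callback.

```php
<?php
// Hypothetical helper: masks $word only when whitespace or the string edges
// surround it, so words starting with "#" are handled too.
function censorBetweenWhitespace(string $text, string $word): string
{
    $pattern = '/(?<=^|\s)' . preg_quote($word, '/') . '(?=$|\s)/i';
    return preg_replace($pattern, str_repeat('%', strlen($word)), $text);
}

echo censorBetweenWhitespace('#view is fine, x#view and #views are not #view', '#view'), "\n";
```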
You don't need the regex engine for this simple task; str_replace replaces all occurrences of the search string with the replacement string. Note that this version only catches the word when it has a space on both sides.
$string = str_replace(' view ', ' %%%% ', $string);
This will do...
$text_without_views = preg_replace('/(^|\W)view($|\W)/','\1%%%%\2',$text_with_views);
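$str = "view viewing view view";
// \b already matches at the start and end of the string, and replacing with
// the bare mask avoids introducing extra spaces.
$str = preg_replace('/\bview\b/i', '%%%%', $str);
echo $str; // %%%% viewing %%%% %%%%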

Regex to add spacing between sentences in a string in php

I use a Spanish dictionary API that returns definitions with small issues. This specific problem happens when the definition has more than one sentence. Sometimes the sentences are not properly separated by a space character, so I receive something like this:
This is a sentence.Some other sentence.Sometimes there are no spaces between dots. See?
I'm looking for a regex that would replace "." with ". " when the dot is immediately followed by a character other than a space. The preg_replace() should return:
This is a sentence. Some other sentence. Sometimes there are no spaces between dots. See?
So far I have this:
echo preg_replace('/(?<=[a-zA-Z])[.]/','. ',$string);
The problem is that it also adds a space when there is already a space after the dot. Any ideas? Thanks!
Try this regular expression:
echo preg_replace('/(?<!\.)\.(?!(\s|$|\,|\w\.))/', '. ', $string);
echo preg_replace( '/\.([^, ])/', '. $1', $string);
It works!
You just need to add a look-ahead so that a space is added only if the next character is something other than whitespace and the dot is not at the end of the string:
$string = preg_replace('/(?<=[a-zA-Z])[.](?!\s|$)/','. ',$string);
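As a quick check of the look-ahead idea, the sketch below uses alternation inside the look-ahead, (?!\s|$), because inside a character class $ is a literal dollar sign rather than an end-of-string anchor. The helper name is hypothetical.

```php
<?php
// Hypothetical helper: add a space after a dot that follows a letter,
// unless whitespace or the end of the string comes next.
function spaceAfterDots(string $text): string
{
    return preg_replace('/(?<=[a-zA-Z])\.(?!\s|$)/', '. ', $text);
}

$raw = 'This is a sentence.Some other sentence.Sometimes there are no spaces between dots. See?';
echo spaceAfterDots($raw), "\n";
echo spaceAfterDots('Done.'), "\n"; // a final dot stays as it is
```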

How to replace one or more consecutive spaces with one single character?

I want to turn a string into an SEO-friendly URL. Multiple blank spaces should be collapsed, each single space replaced by a hyphen (-), the result passed through strtolower, and no special characters allowed.
I am currently using code like this:
$string = htmlspecialchars("This Is The String");
$string = strtolower(str_replace(' ', '-', $string));
The above code will generate multiple hyphens. I want to collapse multiple spaces into one, so that only a single hyphen is generated. In short, I am trying to produce an SEO-friendly, URL-like string. How do I do it?
You can use preg_replace to replace any sequence of whitespace chars with a dash...
$string = preg_replace('/\s+/', '-', $string);
The outer slashes are delimiters for the pattern - they just mark where the pattern starts and ends
\s matches any whitespace character
+ causes the previous element to match 1 or more times. By default, this is 'greedy' so it will eat up as many consecutive matches as it can.
See the manual page on PCRE syntax for more details
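Putting the pieces from the question together (lowercasing, dropping special characters, one hyphen per run of whitespace) could look like the sketch below. The function name is hypothetical and the character whitelist assumes plain ASCII input; accented letters would simply be dropped.

```php
<?php
// Hypothetical helper; assumes ASCII-only slugs are acceptable.
function slugify(string $text): string
{
    $text = strtolower(trim($text));
    // Drop everything except ASCII letters, digits, whitespace and hyphens.
    $text = preg_replace('/[^a-z0-9\s-]/', '', $text);
    // Collapse each run of whitespace or hyphens into a single hyphen.
    $text = preg_replace('/[\s-]+/', '-', $text);
    return trim($text, '-');
}

echo slugify('  This   Is The  String! '), "\n";
```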
echo preg_replace('~(\s+)~', '-', $yourString);
What you want is to "slugify" a string. Try searching SO or Google for "php slugify" or "php slug".

Insert separators into a string in regular intervals

I have the following string in php:
$string = 'FEDCBA9876543210';
The string can have 2 or more (I mean more) hexadecimal characters.
I want to group the string in pairs, like:
$output_string = 'FE:DC:BA:98:76:54:32:10';
I wanted to use regex for that, I think I saw a way to do like "recursive regex" but I can't remember it.
Any help appreciated :)
If you don't need to check the content, there is no use for regex.
Try this
$outputString = chunk_split($string, 2, ":");
// generates: FE:DC:BA:98:76:54:32:10:
You might need to remove the last ":".
Or this :
$outputString = implode(":", str_split($string, 2));
// generates: FE:DC:BA:98:76:54:32:10
Sounds like you want a regex like this:
/([0-9a-f]{2})/${1}:/gi
Which, in PHP is...
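<?php
$string = 'FEDCBA9876543210';
// PHP has no "g" modifier; preg_replace() already replaces every match.
$pattern = '/([0-9A-F]{2})/i';
$replacement = '${1}:';
// rtrim() removes the colon that the replacement leaves after the last pair.
echo rtrim(preg_replace($pattern, $replacement, $string), ':');
?>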
You can make sure there are two or more hex characters doing this:
if (preg_match('!^\d*[A-F]\d*[A-F][\dA-F]*$!i', $string)) {
...
}
No need for a recursive regex. By the way, "recursive regex" is a contradiction in terms in the formal sense: a regular language (which a regex parses) can't be recursive, by definition. (PCRE does offer recursion through (?R), but it is not needed here.)
If you also want to check that the characters are grouped in pairs with colons in between, ignoring the two-hex-character condition for a second, use:
if (preg_match('!^[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
Now if you want to add the condition requiring two hex characters, use a positive lookahead:
if (preg_match('!^(?=[\d:]*[A-F][\d:]*[A-F])[\dA-F]{2}(?::[\dA-F]{2})*$!i', $string)) {
...
}
To explain how this works: the first thing it does is check (with a positive lookahead, i.e. (?=...)) that you have zero or more digits or colons followed by a hex letter, followed by zero or more digits or colons and then another hex letter. This ensures there are two hex letters in the expression.
After the positive lookahead is the original expression that makes sure the string is pairs of hex digits.
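The pair-checking part on its own can be exercised as follows. This sketch leaves out the two-letter lookahead and only validates the colon-separated layout; the helper name is hypothetical.

```php
<?php
// Hypothetical helper: true when the string is made of two-character hex
// groups separated by single colons, e.g. "FE:DC:BA".
function isColonSeparatedHex(string $value): bool
{
    return preg_match('/^[\dA-F]{2}(?::[\dA-F]{2})*$/i', $value) === 1;
}

var_dump(isColonSeparatedHex('FE:DC:BA:98')); // pairs with colons
var_dump(isColonSeparatedHex('FE:DCB'));      // last group has three characters
var_dump(isColonSeparatedHex('FEDC'));        // missing colon
```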
Recursive regular expressions are usually not possible. You may use a regular expression recursively on the results of a previous regular expression, but most regular expression grammars will not allow recursivity. This is the main reason why regular expressions are almost always inadequate for parsing stuff like HTML. Anyways, what you need doesn't need any kind of recursivity.
What you want, simply, is to match a group multiple times. This is quite simple:
preg_match_all("/[a-f0-9]{2}/i", $string, $matches);
This will fill $matches with all occurrences of two hexadecimal digits (in a case-insensitive way). To replace them, use preg_replace:
echo preg_replace("/([a-f0-9]{2})/i", '\1:', $string);
There will probably be one ':' too many at the end; you can strip it with substr:
echo substr(preg_replace("/([a-f0-9]{2})/i", '\1:', $string), 0, -1);
While it is not horrible practice to use rtrim(chunk_split($string, 2, ':'), ':'), I prefer to use direct techniques that avoid "mopping up" after making modifications.
Code: (Demo)
$string = 'FEDCBA9876543210';
echo preg_replace('~[\dA-F]{2}(?!$)\K~', ':', $string);
Output:
FE:DC:BA:98:76:54:32:10
Don't be intimidated by the regex. The pattern says:
[\dA-F]{2} # match exactly two numeric or A through F characters
(?!$) # that is not located at the end of the string
\K # restart the fullstring match
When I say "restart the fullstring match" I mean "forget the previously matched characters and start matching from this point forward". Because there are no additional characters matched after \K, the pattern effectively delivers the zero-width position where the colon should be inserted. In this way, no original characters are lost in the replacement.
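The same \K idea extends to other group sizes. The sketch below is an illustration rather than part of the answer: the function name, the $size and $sep parameters and the added i flag are assumptions, and $sep is used as a replacement string, so it should not contain $ or backslash sequences.

```php
<?php
// Hypothetical helper built around the pattern above.
function insertSeparator(string $hex, int $size = 2, string $sep = ':'): string
{
    // Match $size hex characters that are not at the end of the string, then
    // reset the match start with \K so only the empty position is replaced.
    $pattern = '~[\dA-F]{' . $size . '}(?!$)\K~i';
    return preg_replace($pattern, $sep, $hex);
}

echo insertSeparator('FEDCBA9876543210'), "\n";
echo insertSeparator('FEDCBA9876543210', 4), "\n";
echo insertSeparator('ABC'), "\n"; // odd length: the last character stands alone
```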
