PHP regexp - remove all leading, trailing and standalone hyphens - php

I'm trying to remove all leading, trailing and standalone hyphens from a string:
-on-line - auction- website
Desired result:
on-line auction website
I came up with a working solution:
^-|(?<=\s)-|-(?=\s)|-$
But it looks to me a little bit amateur (isn't it?). So do you have a better solution?

You can use this pattern:
(?<!\S)-|-(?!\S)
example:
echo preg_replace('~(?<!\S)-|-(?!\S)~', '', '-on-line - auction- website');
Another possible pattern that uses a conditional statement: -(?(?!\S)|(?<!\S.))
This last one is interesting because it has a single branch that starts with a literal character. This lets the regex engine quickly test only the positions in the string where that character appears (thanks to internal optimisations applied before the "normal" regex engine walk).
Note that the conditional statement isn't mandatory and can also be replaced with a non-capturing group adding a : (it doesn't change the result but it's longer):
-(?:(?!\S)|(?<!\S.))
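As a quick check, the three patterns above can be run side by side. This is only a minimal sketch: the array keys and variable names are illustrative and not part of the original answer. Note that removing the standalone hyphen leaves behind the two spaces that surrounded it, so a second pass that collapses whitespace may be wanted.

```php
<?php
// Sample input from the question.
$input = '-on-line - auction- website';

// The three equivalent patterns discussed above, using ~ as the delimiter.
$patterns = [
    'two branches'        => '~(?<!\S)-|-(?!\S)~',
    'conditional'         => '~-(?(?!\S)|(?<!\S.))~',
    'non-capturing group' => '~-(?:(?!\S)|(?<!\S.))~',
];

$results = [];
foreach ($patterns as $label => $pattern) {
    // Replace every matching hyphen with an empty string.
    $results[$label] = preg_replace($pattern, '', $input);
    echo $label, ': ', var_export($results[$label], true), PHP_EOL;
}

// Optional second pass: collapse the run of spaces left behind.
$collapsed = preg_replace('~\s+~', ' ', $results['two branches']);
echo $collapsed, PHP_EOL;
```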

I guess it can be shortened to:
$repl = preg_replace('/(^|\s)-|-(\s|$)/', '$1$2', $str);

You can try the following:
-(?!\w)|(?<!\w)-
This either matches a dash which is followed by something that is not a word character, or a dash that is preceded by something that is not a word character.
Or if you want to put it otherwise, match all dashes which are not between two word characters.
Regex101 Demo

There's no reason you have to do everything in one regex. Split it into two or three.
s/^-\s*//; # Strip leading hyphens and optional space
s/\s*-$//; # Strip trailing hyphens and optional space
s/\s+-\s+/ /; # Change any space-hyphen-space sequences to a single space.
That's the sed/Perl syntax. You'll adjust accordingly for the preg_replace syntax.
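A direct translation of those three substitutions to preg_replace might look like the sketch below (the variable name is arbitrary). On the sample input from the question, these steps leave the hyphen in "auction-" in place, because it is neither at the end of the string nor surrounded by whitespace, so that case would need an extra step.

```php
<?php
$str = '-on-line - auction- website';

// preg_replace() replaces every match by default, so no /g flag is needed.
$str = preg_replace('/^-\s*/', '', $str);    // strip a leading hyphen and optional space
$str = preg_replace('/\s*-$/', '', $str);    // strip a trailing hyphen and optional space
$str = preg_replace('/\s+-\s+/', ' ', $str); // space-hyphen-space becomes one space

echo $str, PHP_EOL;
```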

In PHP you can use trim and rtrim to remove any characters from the beginning and end of the string. After that you can use str_replace to remove the - from the middle.
$string = '-on-line - auction- website';
$string = trim($string, "-");
$string = rtrim($string,"-");
$string = str_replace("- ", " ", $string);
$string = str_replace("  ", " ", $string); //remove double spaces left by " - "
var_dump($string);
the result:
string(23) "on-line auction website"
You can stack that up into one line if you want:
$string = str_replace("  ", " ", str_replace("- ", " ", rtrim(trim($string, "-"),"-")));

Related

Preg_replace Tag Replace Dashes With HTML Tag

I am partially disabled. I write a LOT of wordpress posts in 'text' mode and to save typing I will use a shorthand for emphasis and strong tags. Eg. I'll write -this- for <em>this</em>.
I want to add a function in wordpress to regex replace word(s) that have a pair of dashes with the appropriate html tag. For starters I'd like to replace -this- with <em>this</em>
Eg:
-this- becomes <em>this</em>
-this-. becomes <em>this</em>.
What I can't figure out is how to replace the bounding chars. I want it to match the string, but then retain the chars immediately before and after.
$pattern = '/\s\-(.*?)\-(\s|\.)/';
$replacement = '<em>$1</em>';
return preg_replace($pattern, $replacement, $content);
...this does the 'search' OK, but it can't get me the space or period after.
Edit: The reason for wanting a space as the beginning boundary and then a space OR a period OR a comma OR a semi-colon as the ending boundary is to prevent problems with truly hyphenated words.
So pseudocode:
1. find the space + string + (space or punctuation)
2. replace with space + open_htmltag + string + close_htmltag + whatever the next char is.
Ideas?
a space as the beginning boundary and then a space OR a period OR a comma OR a semi-colon as the ending boundary
You can try with capturing groups with <em>$1</em>$2 as substitution.
[ ]-([^-]*)-([ .,;])
DEMO
sample code:
$re = "/-([^-]*)-([ .,;])/i";
$str = " -this-;\n -this-.\n -this- ";
$subst = '<em>$1</em>$2';
$result = preg_replace($re, $subst, $str);
Note: Use a single space instead of \s, which matches any whitespace character [\r\n\t\f ].
Edited by o/p: Did not need opening space as delimiter. This is the winning answer.
You can try with Positive Lookahead as well with only single capturing group.
-([^-]*)-(?=[ .,;])
substitution string: <em>$1</em>
DEMO
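The lookahead variant can be wrapped in a small function, sketched below. The name dashesToEm is hypothetical and only for illustration. Because the lookahead checks the following character without consuming it, no second capturing group is needed in the replacement.

```php
<?php
// Hypothetical helper name; not from the original answers.
function dashesToEm(string $content): string
{
    // Match -text- only when it is followed by a space, period, comma
    // or semicolon. The lookahead does not consume that character,
    // so it stays in the output untouched.
    return preg_replace('/-([^-]*)-(?=[ .,;])/', '<em>$1</em>', $content);
}

$str = " -this-;\n -this-.\n -this- ";
echo dashesToEm($str), PHP_EOL;

// A single hyphenated word has no closing hyphen, so it is left alone.
echo dashesToEm('a well-known fact'), PHP_EOL;
```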
You can use this regex:
(-)(.*?)(-)
Check the substitution section:
Working demo
Edit: as an improvement you can also use -(.*?)- and utilize capturing group \1
In the code below, the regex pattern will start at a hyphen and collect any non-hyphen characters until the next hyphen occurs. It then wraps the collected text in an em tag. The hyphens are discarded.
Note: If you use a hyphen for its intended purposes, this may cause problems. You may want to devise an escape character for that.
$str = "hello -world-. I am -radley-.";
$replace = preg_replace('/-([^-]+?)-/', '<em>$1</em>', $str);
echo $str; // no formatting
echo '<br>';
echo $replace; // formatting
Result:
hello -world-. I am -radley-.
hello <em>world</em>. I am <em>radley</em>.

Meaning of a simple pattern of preg_replace (#\s+#)?

Sorry for the very basic question, but there's simply no easy way to search for a string like that, either here or on Google or SymbolHound. I also haven't found an answer in the PHP Manual (Pattern Syntax & preg_replace).
This code is inside a function that receives the $content and $length parameters.
What is that preg_replace for?
$the_string = preg_replace('#\s+#', ' ', $content);
$words = explode(' ', $the_string);
if( count($words) <= $length )
Also, would it be better to use str_word_count instead?
This pattern replaces successive space characters (note, not just spaces, but also line breaks or tabs) with a single, conventional space (' '). \s+ says "match a sequence, made up of one or more space characters".
The # signs are delimiters for the pattern. It is probably more common to see patterns delimited by forward slashes. (The preg_* functions always require delimiters; the older POSIX ereg functions took patterns without them, but those were removed in PHP 7.)
http://php.net/manual/en/regexp.reference.delimiters.php
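To illustrate, here is a small sketch with made-up sample text showing that the delimiter choice does not change what \s+ matches, and that splitting on a single space is reliable once the whitespace has been normalised.

```php
<?php
// Mixed whitespace: newline + tab, three spaces, and a Windows line break.
$content = "Hello,\n\tthere.   How are\r\nyou?";

// The same pattern written with three different delimiters.
$a = preg_replace('#\s+#', ' ', $content);
$b = preg_replace('/\s+/', ' ', $content);
$c = preg_replace('~\s+~', ' ', $content);

// After normalising, splitting on a single space yields one entry per word.
$words = explode(' ', $a);

echo $a, PHP_EOL;
echo count($words), PHP_EOL;
```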
Relying on spaces to find words in a string is generally not the best approach - we can use the \b word boundary marker instead.
$sentence = "Hello, there. How are you today? Hope you're OK!";
preg_match_all('/\b[\w-]+\b/', $sentence, $words);
That says: grab all substrings within the greater string that consist only of word characters (letters, digits, underscores) or hyphens, and which are enclosed by word boundaries.
$words[0] is now an array of the words used in the sentence.
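A runnable version of the snippet above, as a sketch: preg_match_all() fills its third argument with an array of arrays, so the matched words end up in $words[0]. Note that "you're" is split into "you" and "re", because the apostrophe is not in the character class.

```php
<?php
$sentence = "Hello, there. How are you today? Hope you're OK!";

// preg_match_all() returns the number of full matches and fills $words;
// $words[0] holds the matched substrings.
$count = preg_match_all('/\b[\w-]+\b/', $sentence, $words);

print_r($words[0]);
echo $count, PHP_EOL;
```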
# is delimiter
Often used delimiters are forward slashes (/), hash signs (#) and
tildes (~). The following are all examples of valid delimited
patterns.
$the_string = preg_replace('#\s+#', ' ', $content);
It will replace multiple whitespace characters (\s) with a single space.
\s+ is used to match one or more whitespace characters.
You are replacing them with a single space, using preg_replace('#\s+#', ' ', $content);
str_word_count might be suitable, but you might need to specify additional characters which count as words, or the function reports wrong values when using UTF-8 characters.
str_word_count($str, 1, characters_that_are_not_considered_word_boundaries);
EXAMPLE:
print_r(str_word_count('holóeóó what',1));
returns
Array ( [0] => hol [1] => e [2] => what )
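One possible alternative for UTF-8 text, which is not part of the original answers, is to count runs of Unicode letters with preg_match_all and the u modifier. The sketch below assumes PCRE was built with Unicode property support, and uses "\u{00F3}" escapes (PHP 7+) for ó so that the example does not depend on the file encoding.

```php
<?php
// "\u{00F3}" is the letter ó written as an escape sequence.
$text = "hol\u{00F3}e\u{00F3}\u{00F3} what";

// \p{L}+ matches runs of Unicode letters; the u modifier makes PCRE
// treat both the pattern and the subject as UTF-8.
preg_match_all('/\p{L}+/u', $text, $matches);

print_r($matches[0]);
echo count($matches[0]), PHP_EOL;
```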

Finding #mentions in string

Trying to replace all occurrences of an #mention with an anchor tag, so far I have:
$comment = preg_replace('/#([^# ])? /', '#$1 ', $comment);
Take the following sample string:
"#name kdfjd fkjd as#name # lkjlkj #name"
Everything matches okay so far, but I want to ignore that single "#" symbol. I've tried using "+" and "{2,}" after the "[^# ]" which I thought would enforce a minimum amount of matches, but it's not working.
Replace the question mark (?) quantifier ("optional") and add in a + ("one or more") after your character class:
#([^# ]+)
The regex
(^|\s)(#\w+)
Might be what you are after.
It basically means, the start of the line, or a space, then an # symbol followed by 1 or more word characters.
E.g.
preg_match_all('/(^|\s)(#\w+)/', '#name1 kdfjd fkjd as#name2 # lkjlkj #name3', $result);
var_dump($result[2]);
Gives you
Array
(
[0] => #name1
[1] => #name3
)
I like Petah's answer but I adjusted it slightly
preg_replace('/(^|\s)#([\w.]+)/', '$1#$2', $text);
The main differences are:
the # symbol is not included. That's for display only, should not be in the URL
allows . character (note: \w includes underscore)
in the replacement, I added $1 at the beginning to preserve the whitespace
Replacing ? with + will work but not as you expect.
Your expression does not match #name at the end of string.
$comment = preg_replace('~#(\w+)~', '$0 ', $comment);
This should do what you want. \w+ stands for one or more word characters (a-zA-Z0-9_).
I recommend using a lookbehind before matching the # then one or more characters which are not a space or #.
The "one or more" quantifier (+) prevents the matching of mentions that mention no one.
Using a lookbehind is a good idea because it not only prevents the matching of email addresses and other such unwanted substrings, it asks the regex engine to primarily search #s then check the preceding character. This should improve pattern performance since the number of spaces should consistently outnumber the number of mentions in comments.
If the input text is multiline or may contain newlines, then adding an m pattern modifier will tell ^ to match all line starts. If newlines and tabs are possible, it will be more reliable to use (?<=^|\s)#([^#\s]+).
Code: (Demo)
$comment = "#name kdfjd ## fkjd as#name # lkjlkj #name";
var_export(
preg_replace(
'/(?<=^| )#([^# ]+)/',
'#$1',
$comment
)
);
Output: (single-quotes are from var_export())
'#name kdfjd ## fkjd as#name # lkjlkj #name'
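The whitespace-aware variant mentioned above can be tried with preg_match_all, as in the sketch below. The sample text and names are made up for illustration. A mention glued to a preceding letter and a lone # are both skipped.

```php
<?php
$comment = "#alice hi\n#bob\tsee #carol not a#dave or # alone";

// (?<=^|\s) requires the start of the string or any whitespace before the #;
// [^#\s]+ then captures one or more characters that are neither # nor whitespace.
preg_match_all('/(?<=^|\s)#([^#\s]+)/', $comment, $matches);

// $matches[1] holds the captured names without the leading #.
print_r($matches[1]);
```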
Try:
'/#(\w+)/i'

Regex to add spacing between sentences in a string in php

I use a Spanish dictionary API that returns definitions with small issues. This specific problem happens when the definition has more than one sentence. Sometimes the sentences are not properly separated by a space character, so I receive something like this:
This is a sentence.Some other sentence.Sometimes there are no spaces between dots. See?
I'm looking for a regex that would replace "." with ". " when the dot is immediately followed by a character other than a space. The preg_replace() should return:
This is a sentence. Some other sentence. Sometimes there are no spaces between dots. See?
So far I have this:
echo preg_replace('/(?<=[a-zA-Z])[.]/','. ',$string);
The problem is that it also adds a space when there is already a space after the dot. Any ideas? Thanks!
Try this regular expression:
echo preg_replace('/(?<!\.)\.(?!(\s|$|\,|\w\.))/', '. ', $string);
echo preg_replace( '/\.([^, ])/', '. $1', $string);
It works!
You just need to add a look-ahead so that a space is added only if the next character is not whitespace and the dot is not at the end of the string:
$string = preg_replace('/(?<=[a-zA-Z])[.](?!\s|$)/','. ',$string);
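A small sketch of this approach follows, with the lookahead written as (?!\s|$) so that a dot at the very end of the string is left alone. The function name spaceAfterDots is hypothetical. Abbreviations such as "e.g." would also get a space inserted, so this is only suitable when such cases do not occur in the input.

```php
<?php
// Hypothetical helper name, for illustration only.
function spaceAfterDots(string $text): string
{
    // A dot that is preceded by an ASCII letter and not followed by
    // whitespace or the end of the string gets a space appended.
    return preg_replace('/(?<=[a-zA-Z])\.(?!\s|$)/', '. ', $text);
}

$string = 'This is a sentence.Some other sentence.Sometimes there are no spaces between dots. See?';
echo spaceAfterDots($string), PHP_EOL;
```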

How to replace one or more consecutive spaces with one single character?

I want to generate the string like SEO friendly URL. I want that multiple blank space to be eliminated, the single space to be replaced by a hyphen (-), then strtolower and no special chars should be allowed.
For that I am currently using code like this:
$string = htmlspecialchars("This Is The String");
$string = strtolower(str_replace(' ', '-', $string));
The above code will generate multiple hyphens. I want to eliminate that multiple space and replace it with only one space. In short, I am trying to achieve the SEO friendly URL like string. How do I do it?
You can use preg_replace to replace any sequence of whitespace chars with a dash...
$string = preg_replace('/\s+/', '-', $string);
The outer slashes are delimiters for the pattern - they just mark where the pattern starts and ends
\s matches any whitespace character
+ causes the previous element to match 1 or more times. By default, this is 'greedy' so it will eat up as many consecutive matches as it can.
See the manual page on PCRE syntax for more details
echo preg_replace('~(\s+)~', '-', $yourString);
What you want is to "slugify" a string. Try a search on SO or Google for "php slugify" or "php slug".
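A minimal slug sketch, assuming ASCII-only input, is shown below. The function name slugify is hypothetical and not a PHP built-in. It lowercases the text, turns every run of characters other than a-z and 0-9 into a single hyphen, and strips hyphens from both ends.

```php
<?php
// Hypothetical helper; assumes ASCII input (accented letters would be
// replaced by hyphens rather than transliterated).
function slugify(string $text): string
{
    $text = strtolower(trim($text));
    // Any run of characters other than a-z or 0-9 becomes one hyphen,
    // which also collapses multiple spaces.
    $text = preg_replace('~[^a-z0-9]+~', '-', $text);
    // Remove hyphens left at the start or end.
    return trim($text, '-');
}

echo slugify('  This   Is  The String! '), PHP_EOL;
```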
