matching space only before character regex and replacing space with dash - php

What I'm after is to replace a space with a dash, BUT only if that space has a character after it. The reason is that I have an array of strings, some of which have spaces inserted after the final word or character (this is out of my control).
e.g
In the example below I have used %20 to show where a space is
string1 = farmer%20jones
string2 = farmer%20jim%20
I have the following regex preg_replace('/\s./', '-', $string);
I think I'm half way there, but the above searches for a space preceding a character and replaces that with a -
What I get with the above regex is
string1 = farmer-jones
string2 = farmer-jim-
What I want is:
string1 = farmer-jones
string2 = farmer-jim
I don't want that trailing - to be added.
Any help much appreciated

You can use negative lookahead here:
$repl = preg_replace('/\h+(?!$)/', '-', $string);
\h+ will match 1 or more horizontal whitespace.
(?!$) will assert that next position is not end of line.
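As a quick check of the lookahead idea, here is a minimal sketch. The helper name dashify is made up for illustration, and it uses (?=\S) rather than (?!$). That swap is my own assumption: with two or more trailing spaces, \h+(?!$) can backtrack and still turn the first of them into a dash. The final rtrim is optional and only drops the trailing whitespace the regex leaves alone.

```php
<?php
// Hypothetical helper (not from the original answer): dash-separate words,
// ignoring whitespace that has no character after it.
function dashify(string $s): string
{
    // \h+    : one or more horizontal whitespace characters
    // (?=\S) : only if a non-whitespace character follows
    $dashed = preg_replace('/\h+(?=\S)/', '-', $s);

    // Drop the trailing whitespace that the regex deliberately skipped.
    return rtrim($dashed);
}

echo dashify("farmer jones"), "\n"; // farmer-jones
echo dashify("farmer jim "), "\n";  // farmer-jim
echo dashify("farmer jim  "), "\n"; // farmer-jim
```

If the strings only ever carry a single trailing space, the (?!$) version from the answer behaves the same way.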

Related

How can I split a string by white spaces that are not preceded by a certain character?

I want to split a string only at white spaces that do not have a certain delimiter (: in my case) before them. E.g.:
$string = "Time: 10:40 Request: page.php Action: whatever this is Refer: Facebook";
Then from something like this I want to achieve an array such that:
$array = ["Time: 10:40", "Request: page.php", "Action: whatever this is", "Refer: Facebook"];
I've tried the following so far:
$split = preg_split('/(:){0}\s/', $visit);
But this is still splitting at every occurrence of a white space.
Edit: I think I asked the wrong question, however "whatever this is" should stay as a single string
Edit 2: The bits before the colons are known and stay the same, maybe incorporating those somehow makes the task easier (of not splitting at whitespace characters in strings that should stay together)?
You can use a lookahead in your split regex:
/\h+(?=[A-Z][a-z]*: )/
The regex \h+(?=[A-Z][a-z]*: ) matches one or more horizontal whitespace characters that are followed by a word starting with an upper-case letter, then a colon and a space.
You can do it like this:
$string = "Time: 10:40 Request: page.php Action: whatever this is Refer: Facebook";
$split = preg_split('/\h+(?=[A-Z][a-z]*:)/', $string);
print_r($split);
Another option could be to match what is before the colon and then match upon the next part that starts with a space, non whitespace chars and colon:
\S+:\h+.*?(?=\h+\S+:)\K\h+
\S+: Match 1+ times a non whitespace char, followed by a colon
\h+ Match 1+ times a horizontal whitespace char
.*? Match any char except a newline non greedy
(?=\h+\S+:) Positive lookahead, assert what is on the right is 1+ horizontal whitespace chars, 1+ non whitespace chars and a colon
\K\h+ Forget what was matched using \K and match 1+ horizontal whitespace chars
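To compare the approaches side by side, here is a small sketch run against the string from the question. Variants 1 and 2 use the patterns quoted above. Variant 3 is my own assumption, based on the question's Edit 2 that the labels are known in advance; the $labels array is hypothetical.

```php
<?php
$string = "Time: 10:40 Request: page.php Action: whatever this is Refer: Facebook";

// Variant 1: split on whitespace that is followed by a capitalised label and a colon.
$byLookahead = preg_split('/\h+(?=[A-Z][a-z]*:)/', $string);

// Variant 2: match "label: value", reset the match start with \K, and split
// on the whitespace that precedes the next "label:".
$byKeep = preg_split('/\S+:\h+.*?(?=\h+\S+:)\K\h+/', $string);

// Variant 3 (assumption): the labels are fixed and known, so list them explicitly.
$labels = ['Time', 'Request', 'Action', 'Refer'];
$byLabels = preg_split('/\h+(?=(?:' . implode('|', $labels) . '):)/', $string);

print_r($byKeep);
```

All three produce the four-element array the question asks for. The explicit label list is the strictest of the three, since a value that happens to contain a capitalised word followed by a colon would not trigger a split.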

Php Regex to insert character after first all-capital letter word in a string

I'm trying to use a preg_replace or similar php function to:
- identify the first all capital letter word in a string,
- and insert a character directly after it (a dash or semi-colon will do)
- the all capital letter word should be 3 characters long or more.
So far I have the regular expression:
/(?<!\ )([^A-Z{3,}])/
But, this isn't working in terms of only words that are 3+ characters. I'm also not sure I have it 'strictly' only looking at the very first word.
I believe that once I have the regex sorted out - this
$string = "LONDON On November 12th twelve people...";
$replaced_string = preg_replace('/myregex/',': ', $string);
will output as the following
"LONDON: On November 12th twelve people..."
It's a fairly simple regex, really:
$replacedString = preg_replace('/\b([A-Z]{3,})\b/', '$1: ', $string);
It works like this:
\b: word boundary. This detects the start and end of a "word"
([A-Z]{3,}): Match 3 or more upper-case characters. The brackets capture this part of the match, so we can use it in the replacement string
\b: Another word boundary
Replace this match with:
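'$1: ': the $1 refers back to the first captured group (the 3 or more upper-case characters). To this, we're adding a colon and a space. That will be our replacement string. Note that a space which already follows the word is not part of the match and stays in place, so the output ends up with two spaces after the colon; use '$1:' if you want only one.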
This will add the colon and space after all upper-case words of 3 or more characters. To replace only 1 word, just pass a limit to preg_replace:
$replaced = preg_replace('/\b([A-Z]{3,})\b/', '$1: ', $string, 1);
Where that last argument is the number of matches you wish to replace. -1 for all, 1 for 1, 2 for 2, etc...
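A short sketch of the effect of the limit argument, using a sample sentence I extended with a second upper-case word (NATO) so the difference is visible. The replacement here is '$1:' rather than '$1: ', which is my own tweak: the space that already follows the word is left in place by the regex.

```php
<?php
$string = "LONDON On November 12th twelve people met NATO officials";

// No limit: every upper-case word of 3+ letters gets a colon.
$all = preg_replace('/\b([A-Z]{3,})\b/', '$1:', $string);

// Limit of 1: only the first match is replaced.
$first = preg_replace('/\b([A-Z]{3,})\b/', '$1:', $string, 1);

echo $all, "\n";   // LONDON: On November 12th twelve people met NATO: officials
echo $first, "\n"; // LONDON: On November 12th twelve people met NATO officials
```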
Judging by your sample string, the upper-case words are city names. It's possible for city names to contain a dash, or even a space. To address this, you might want to match all strings containing upper-case chars, dashes and spaces:
$replaceAll = preg_replace('/\b([A-Z -]{2,}[A-Z])\b/', '$1: ', $string);
What changed:
([A-Z -]{2,}: The captured group starts with 2 or more characters (not 3), each of which may be an upper-case letter, a space or a dash.
[A-Z]): The last character of the captured group must be an upper-case letter, which avoids capturing trailing spaces or dashes. The result is that we capture things like "NEW YORK" or "FOO-TOWN", but not "ON - Something".
The rest is the same as before. If you want to allow for other characters that might occur (like a dot) just add them to the first part of the capturing group. The most complete pattern will probably be something like this:
$replaced = preg_replace('/\b([A-Z][A-Z .-]+[A-Z])\b/', '$1: ', $string);
This ensures the captured group starts, and ends with an upper case character, and contains any number of upper-case chars, spaces, dots and dashes in between. So this will match something like "ST. LEWIS", too
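Here is a minimal sketch that tries the last pattern on a few made-up headlines (the sample strings are mine, not from the question). It again uses '$1:' and a limit of 1 so only the leading place name is touched.

```php
<?php
$pattern = '/\b([A-Z][A-Z .-]+[A-Z])\b/';

// Hypothetical sample headlines.
$headlines = [
    "NEW YORK On Monday the council met",
    "ST. LEWIS The storm passed overnight",
    "FOO-TOWN Nothing happened",
];

foreach ($headlines as $headline) {
    // Limit 1: only the first all-caps run (the dateline) gets a colon.
    echo preg_replace($pattern, '$1:', $headline, 1), "\n";
}
// NEW YORK: On Monday the council met
// ST. LEWIS: The storm passed overnight
// FOO-TOWN: Nothing happened
```

One caveat worth knowing: because spaces are allowed inside the group, a one-letter upper-case word right after the name gets absorbed. "ST. LEWIS A storm" is captured as "ST. LEWIS A", so sentences starting with "A" or "I" may need extra handling.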

Regular expression to remove trailing chars

I'm looking for a regular expression in PHP that could transform incoming strings like this:
abaisser_negation_pronominal_question => abaisser_n_p_q
abaisser_pronominal_question => abaisser_p_q
abaisser_negation_question => abaisser_n_q
abaisser_negation_pronominal => abaisser_n_p
abaisser_negation_voix_passive_pronominal => abaisser_n_v_p_p
abaisser => abaisser
With the Php code close to something like:
$line=preg_replace("/<h3>/im", "", $line);
How would you do it?
You can use:
$input = preg_replace('/(_[A-Za-z])[^_\n]*/', '$1', $input);
Explanation:
This regex, (_[A-Za-z])[^_\n]*, matches an underscore followed by a single letter, and then any characters up to the next underscore or newline
It captures the first part, (_[A-Za-z]), in group $1
Replacement is $1 leaving underscore and first letter in the replacement string
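A quick sketch running that replacement over some of the sample inputs from the question:

```php
<?php
$inputs = [
    'abaisser_negation_pronominal_question',
    'abaisser_negation_voix_passive_pronominal',
    'abaisser',
];

$outputs = [];
foreach ($inputs as $input) {
    // Keep "_" plus the first letter ($1) and drop the rest of each segment.
    $outputs[] = preg_replace('/(_[A-Za-z])[^_\n]*/', '$1', $input);
}

echo implode("\n", $outputs), "\n";
// abaisser_n_p_q
// abaisser_n_v_p_p
// abaisser
```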
You could use \K or positive lookbehind.
$input = preg_replace('~_.\K[^_\n]*~', '', $input);
The pattern _. matches an underscore and the character that follows it. \K then discards everything matched so far (the underscore plus that character) from the reported match, so those two characters are left untouched. [^_\n]* then matches zero or more characters that are neither _ nor a newline, i.e. everything up to the next _ or newline. Replacing that part with an empty string gives the desired output.
$input = preg_replace('~(?<=_.)[^_\n]*~', '', $input);
The lookbehind (?<=_.) asserts that the current position is preceded by an _ and one more character; the pattern then matches all the characters up to the next underscore or newline.
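To see that the \K and lookbehind versions agree, here is a small sketch on a two-line input (the input is assembled from the question's examples):

```php
<?php
$input = "abaisser_negation_question\nabaisser_pronominal_question";

// \K version: match "_x", forget it, then remove the rest of the segment.
$withKeep = preg_replace('~_.\K[^_\n]*~', '', $input);

// Lookbehind version: remove the rest of the segment wherever "_x" precedes it.
$withLookbehind = preg_replace('~(?<=_.)[^_\n]*~', '', $input);

echo $withKeep, "\n";
// abaisser_n_q
// abaisser_p_q
var_dump($withKeep === $withLookbehind); // bool(true)
```

The lookbehind works here because _. has a fixed length of two characters; PCRE lookbehinds generally need a known length, while \K has no such restriction.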
You can use regex
$input = preg_replace('/_(.)[^\n_]+/', '_$1', $input);
What it does is capture the character after _ and match till \n or _ is encountered and replaced with the _$1 which means _ plus the character captured.
$line = preg_replace("/_([a-z])([a-z]*)/i", "_$1", $line);

Preg_replace Tag Replace Dashes With HTML Tag

I am partially disabled. I write a LOT of wordpress posts in 'text' mode and to save typing I will use a shorthand for emphasis and strong tags. Eg. I'll write -this- for <em>this</em>.
I want to add a function in wordpress to regex replace word(s) that have a pair of dashes with the appropriate html tag. For starters I'd like to replace -this- with <em>this</em>
Eg:
-this- becomes <em>this</em>
-this-. becomes <em>this</em>.
What I can't figure out is how to replace the bounding chars. I want it to match the string, but then retain the chars immediately before and after.
$pattern = '/\s\-(.*?)\-(\s|\.)/';
$replacement = '<em>$1</em>';
return preg_replace($pattern, $replacement, $content);
...this does the 'search' OK, but it can't get me the space or period after.
Edit: The reason for wanting a space as the beginning boundary and then a space OR a period OR a comma OR a semi-colon as the ending boundary is to prevent problems with truly hyphenated words.
So pseudocode:
1. find the space + string + (space or punctuation)
2. replace with space + open_htmltag + string + close_htmltag + whatever the next char is.
Ideas?
a space as the beginning boundary and then a space OR a period OR a comma OR a semi-colon as the ending boundary
You can try with capturing groups with <em>$1</em>$2 as substitution.
[ ]-([^-]*)-([ .,;])
sample code:
$re = "/-([^-]*)-([ .,;])/i";
$str = " -this-;\n -this-.\n -this- ";
$subst = '<em>$1</em>$2';
$result = preg_replace($re, $subst, $str);
Note: Use a single literal space instead of \s, which matches any whitespace character ([\r\n\t\f ]).
Edited by o/p: Did not need opening space as delimiter. This is the winning answer.
You can try with Positive Lookahead as well with only single capturing group.
-([^-]*)-(?=[ .,;])
substitution string: <em>$1</em>
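Putting the boundary rules from the question together, here is a rough sketch. The function name dashes_to_em is hypothetical, and the lookbehind on the opening dash is my own addition to keep ordinary hyphenated words out of it; treat it as a starting point rather than a finished filter.

```php
<?php
// Hypothetical helper name; the boundary rules follow the question's description.
function dashes_to_em(string $content): string
{
    // (?<!\S)       : the opening dash is at the start or preceded by whitespace
    // ([^-\n]+)     : the emphasised text (no dashes or newlines)
    // (?=[\s.,;]|$) : the closing dash is followed by whitespace, . , ; or the end
    return preg_replace('/(?<!\S)-([^-\n]+)-(?=[\s.,;]|$)/', '<em>$1</em>', $content);
}

echo dashes_to_em('This is -really important-, but a well-known fact. -Done-.'), "\n";
// This is <em>really important</em>, but a well-known fact. <em>Done</em>.
```

Because both boundaries are lookarounds, the surrounding space or punctuation is never consumed, so nothing has to be put back in the replacement string.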
You can use this regex:
(-)(.*?)(-)
With this pattern the text between the dashes is the second group, so the substitution is <em>$2</em>.
Edit: as an improvement you can also use -(.*?)- and utilize capturing group \1
In the code below, the regex pattern will start at a hyphen and collect any non-hyphen characters until the next hyphen occurs. It then wraps the collected text in an em tag. The hyphens are discarded.
Note: If you use a hyphen for its intended purposes, this may cause problems. You may want to devise an escape character for that.
$str = "hello -world-. I am -radley-.";
$replace = preg_replace('/-([^-]+?)-/', '<em>$1</em>', $str);
echo $str; // no formatting
echo '<br>';
echo $replace; // formatting
Result:
hello -world-. I am -radley-.
hello <em>world</em>. I am <em>radley</em>.

Regex to remove single characters from string

Consider the following strings
breaking out a of a simple prison
this is b moving up
following me is x times better
All strings are lowercased already. I would like to remove any "loose" a-z characters, resulting in:
breaking out of simple prison
this is moving up
following me is times better
Is this possible with a single regex in php?
$str = "breaking out a of a simple prison
this is b moving up
following me is x times better";
$res = preg_replace("#\\b[a-z]\\b ?#i", "", $str);
echo $res;
How about:
preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $string);
Note this also catches single characters that are at the beginning or end of the string, but not single characters that are adjacent to punctuation (they must be surrounded by whitespace).
If you also want to remove characters immediately before punctuation (e.g. 'the x.'), then this should work properly in most (English) cases:
preg_replace('/(^|\s)[a-z]\b/', '$1', $string);
As a one-liner:
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);
This matches a single lowercase letter (\p{Ll}) which is preceded or followed by whitespace (\s), removing both. The word boundaries (\b) ensure that only single letters are indeed matched. The /u modifier makes the regex Unicode-aware.
The result: A single letter surrounded by spaces on both sides is reduced to a single space. A single letter preceded by whitespace but not followed by whitespace is removed completely, as is a single letter only followed but not preceded by whitespace.
So
This a is my test sentence a. o How funny (what a coincidence a) this is!
is changed to
This is my test sentence. How funny (what coincidence) this is!
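The example above can be reproduced with a few lines:

```php
<?php
$subject = 'This a is my test sentence a. o How funny (what a coincidence a) this is!';

// Remove a single lowercase letter together with the whitespace on one side of it.
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);

echo $result, "\n";
// This is my test sentence. How funny (what coincidence) this is!
```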
You could try something like this:
preg_replace('/\b\S\s\b/', "", $subject);
This is what it means:
\b # Assert position at a word boundary
\S # Match a single character that is a “non-whitespace character”
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
\b # Assert position at a word boundary
Update
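As raised by Radu, because I've used \S this will match more than just a-zA-Z. It will also match 0-9 and _, and, since \b only requires a transition between a word character and a non-word character, punctuation that directly follows a word (the comma in "hello, world", for example). Use [a-z] in place of \S to restrict the match to letters.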
As mentioned in the comments by Tim Pietzcker, be aware that this won't work if your subject string needs to remove single characters that are followed by non word characters like test a (hello). It will also fall over if there are extra spaces after the single character like this
test a   hello
but you could fix that by changing the expression to \b\S\s*\b
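For comparison, here is a sketch of a stricter variant. Both the [a-z] class and the \s+ quantifier are my own choices rather than the answer's, and strip_single_letters is a made-up name. It only removes a lone lowercase letter together with the whitespace after it, so punctuation is left alone.

```php
<?php
// Hypothetical helper: drop stand-alone lowercase letters followed by whitespace.
function strip_single_letters(string $s): string
{
    // \b[a-z] : a single letter at a word boundary
    // \s+\b   : followed by whitespace and then the start of the next word
    return preg_replace('/\b[a-z]\s+\b/', '', $s);
}

echo strip_single_letters('breaking out a of a simple prison'), "\n"; // breaking out of simple prison
echo strip_single_letters('test a   hello'), "\n";                    // test hello
echo strip_single_letters('hello, world'), "\n";                      // hello, world

// The \S version also removes punctuation that follows a word:
echo preg_replace('/\b\S\s\b/', '', 'hello, world'), "\n";            // helloworld
```

This still relies on \b, so text with apostrophes (where the letter after the apostrophe sits on a word boundary) or a single letter at the very end of the string would need separate handling.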
Try this one:
$sString = preg_replace("#\b[a-z]{1}\b#m", ' ', $sString);
