I have 2 texts in a string:
%Juan%
%Juan Gonzalez%
And I want to only be able to get %Juan% and not the one with the Space, I have been trying several Regexes witout luck. I currently use:
/%(.*)%/U
but it gets both things, I tried adding and playing with [^\s] but it doesnt works.
Any help please?
The issue is that . matches any character but a newline. The /U ungreedy mode only makes .* lazy and it captures a text from the % up to the first % to the right of the first %.
If your strings contain one pair of %...%, you may use
/%(\S+)%/
See the regex demo
The \S+ pattern matches 1+ characters other than a whitespace, and the whole [^\h%] negated character class that matches any character but a horizontal space and % symbol.
If you have multiple %...% pairs, you may use
/%([^\h%]+)%/
See another regex demo, where \h matches any horizontal whitespace.
PHP demo:
$re = '/%([^\h%]+)%/';
$str = "%Juan%\n%Juan Gonzalez%";
preg_match_all($re, $str, $matches);
print_r($matches[1]);
i need to format uppercase words to bold but it doesn't work if the word contains two spaces
is there any way to make regex match only with words which end with colon?
$str = "BAKA NO TEST: hey";
$str = preg_replace('~[A-Z]{4,}\s[A-Z]\s{2,}(?:\s[A-Z]{4,})?:?~', '<b>$0</b>', $str);
output: <b>BAKA NO TEST:</b> hey
but it returns <b>BAKA</b> NO TEST: hey
the original $str is a multiline text so there are many lowercase and uppercase words but i need to change only some
You can do it like this:
$txt = preg_replace('~[A-Z]+(?:\s[A-Z]+)*:~', '<b>$0</b>', $txt);
Explanations:
[A-Z]+ # uppercase letter one or more times
(?: # open a non capturing group
\s # a white character (space, tab, newline,...)
[A-Z]+ #
)* # close the group and repeat it zero or more times
If you want a more tolerant pattern you can replace \s by \s+ to allow more than one space between each words.
Unless you have some good reason to use that regexp, try something simpler, like:
/([A-Z\s]+):/
Also, just so you know, you can use asterisk to specify none or more space characters: \s*
Trying to replace all occurrences of an #mention with an anchor tag, so far I have:
$comment = preg_replace('/#([^# ])? /', '#$1 ', $comment);
Take the following sample string:
"#name kdfjd fkjd as#name # lkjlkj #name"
Everything matches okay so far, but I want to ignore that single "#" symbol. I've tried using "+" and "{2,}" after the "[^# ]" which I thought would enforce a minimum amount of matches, but it's not working.
Replace the question mark (?) quantifier ("optional") and add in a + ("one or more") after your character class:
#([^# ]+)
The regex
(^|\s)(#\w+)
Might be what you are after.
It basically means, the start of the line, or a space, then an # symbol followed by 1 or more word characters.
E.g.
preg_match_all('/(^|\s)(#\w+)/', '#name1 kdfjd fkjd as#name2 # lkjlkj #name3', $result);
var_dump($result[2]);
Gives you
Array
(
[0] => #name1
[1] => #name3
)
I like Petah's answer but I adjusted it slightly
preg_replace('/(^|\s)#([\w.]+)/', '$1#$2', $text);
The main differences are:
the # symbol is not included. That's for display only, should not be in the URL
allows . character (note: \w includes underscore)
in the replacement, I added $1 at the beginning to preserve the whitespace
Replacing ? with + will work but not as you expect.
Your expression does not match #name at the end of string.
$comment = preg_replace('##(\w+)#', '$0 ', $comment);
This should do what you want. \w+ stands for letter (a-zA-Z0-9)
I recommend using a lookbehind before matching the # then one or more characters which are not a space or #.
The "one or more" quantifier (+) prevents the matching of mentions that mention no one.
Using a lookbehind is a good idea because it not only prevents the matching of email addresses and other such unwanted substrings, it asks the regex engine to primarily search #s then check the preceding character. This should improve pattern performance since the number of spaces should consistently outnumber the number of mentions in comments.
If the input text is multiline or may contain newlines, then adding an m pattern modifier will tell ^ to match all line starts. If newlines and tabs are possible, is will be more reliable to use (?<=^|\s)#([^#\s]+).
Code: (Demo)
$comment = "#name kdfjd ## fkjd as#name # lkjlkj #name";
var_export(
preg_replace(
'/(?<=^| )#([^# ]+)/',
'#$1',
$comment
)
);
Output: (single-quotes are from var_export())
'#name kdfjd ## fkjd as#name # lkjlkj #name'
Try:
'/#(\w+)/i'
Consider the following strings
breaking out a of a simple prison
this is b moving up
following me is x times better
All strings are lowercased already. I would like to remove any "loose" a-z characters, resulting in:
breaking out of simple prison
this is moving up
following me is times better
Is this possible with a single regex in php?
$str = "breaking out a of a simple prison
this is b moving up
following me is x times better";
$res = preg_replace("#\\b[a-z]\\b ?#i", "", $str);
echo $res;
How about:
preg_replace('/(^|\s)[a-z](\s|$)/', '$1', $string);
Note this also catches single characters that are at the beginning or end of the string, but not single characters that are adjacent to punctuation (they must be surrounded by whitespace).
If you also want to remove characters immediately before punctuation (e.g. 'the x.'), then this should work properly in most (English) cases:
preg_replace('/(^|\s)[a-z]\b/', '$1', $string);
As a one-liner:
$result = preg_replace('/\s\p{Ll}\b|\b\p{Ll}\s/u', '', $subject);
This matches a single lowercase letter (\p{Ll}) which is preceded or followed by whitespace (\s), removing both. The word boundaries (\b) ensure that only single letters are indeed matched. The /u modifier makes the regex Unicode-aware.
The result: A single letter surrounded by spaces on both sides is reduced to a single space. A single letter preceded by whitespace but not followed by whitespace is removed completely, as is a single letter only followed but not preceded by whitespace.
So
This a is my test sentence a. o How funny (what a coincidence a) this is!
is changed to
This is my test sentence. How funny (what coincidence) this is!
You could try something like this:
preg_replace('/\b\S\s\b/', "", $subject);
This is what it means:
\b # Assert position at a word boundary
\S # Match a single character that is a “non-whitespace character”
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
\b # Assert position at a word boundary
Update
As raised by Radu, because I've used the \S this will match more than just a-zA-Z. It will also match 0-9_. Normally, it would match a lot more than that, but because it's preceded by \b, it can only match word characters.
As mentioned in the comments by Tim Pietzcker, be aware that this won't work if your subject string needs to remove single characters that are followed by non word characters like test a (hello). It will also fall over if there are extra spaces after the single character like this
test a hello
but you could fix that by changing the expression to \b\S\s*\b
Try this one:
$sString = preg_replace("#\b[a-z]{1}\b#m", ' ', $sString);
I need to remove blank lines (with whitespace or absolutely blank) in PHP. I use this regular expression, but it does not work:
$str = ereg_replace('^[ \t]*$\r?\n', '', $str);
$str = preg_replace('^[ \t]*$\r?\n', '', $str);
I want a result of:
blahblah
blahblah
adsa
sad asdasd
will:
blahblah
blahblah
adsa
sad asdasd
// New line is required to split non-blank lines
preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
The above regular expression says:
/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/
1st Capturing group (^[\r\n]*|[\r\n]+)
1st Alternative: ^[\r\n]*
^ assert position at start of the string
[\r\n]* match a single character present in the list below
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\r matches a carriage return (ASCII 13)
\n matches a fine-feed (newline) character (ASCII 10)
2nd Alternative: [\r\n]+
[\r\n]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\r matches a carriage return (ASCII 13)
\n matches a fine-feed (newline) character (ASCII 10)
[\s\t]* match a single character present in the list below
Quantifier: Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ]
\tTab (ASCII 9)
[\r\n]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\r matches a carriage return (ASCII 13)
\n matches a fine-feed (newline) character (ASCII 10)
Your ereg-replace() solution is wrong because the ereg/eregi methods are deprecated. Your preg_replace() won't even compile, but if you add delimiters and set multiline mode, it will work fine:
$str = preg_replace('/^[ \t]*[\r\n]+/m', '', $str);
The m modifier allows ^ to match the beginning of a logical line rather than just the beginning of the whole string. The start-of-line anchor is necessary because without it the regex would match the newline at the end of every line, not just the blank ones. You don't need the end-of-line anchor ($) because you're actively matching the newline characters, but it doesn't hurt.
The accepted answer gets the job done, but it's more complicated than it needs to be. The regex has to match either the beginning of the string (^[\r\n]*, multiline mode not set) or at least one newline ([\r\n]+), followed by at least one newline ([\r\n]+). So, in the special case of a string that starts with one or more blank lines, they'll be replaced with one blank line. I'm pretty sure that's not the desired outcome.
But most of the time it replaces two or more consecutive newlines, along with any horizontal whitespace (spaces or tabs) that lies between them, with one linefeed. That's the intent, anyway. The author seems to expect \s to match just the space character (\x20), when in fact it matches any whitespace character. That's a very common mistake. The actual list varies from one regex flavor to the next, but at minimum you can expect \s to match whatever [ \t\f\r\n] matches.
Actually, in PHP you have a better option:
$str = preg_replace('/^\h*\v+/m', '', $str);
\h matches any horizontal whitespace character, and \v matches vertical whitespace.
Just explode the lines of the text to an array, remove empty lines using array_filter and implode the array again.
$tmp = explode("\n", $str);
$tmp = array_filter($tmp);
$str = implode("\n", $tmp);
Or in one line:
$str = implode("\n", array_filter(explode("\n", $str)));
I don't know, but this is maybe faster than preg_replace.
The comment from Bythos from Jamie's link above worked for me:
/^\n+|^[\t\s]*\n+/m
I didn't want to strip all of the new lines, just the empty/whitespace ones. This does the trick!
There is no need to overcomplicate things. This can be achieved with a simple short regular expression:
$text = preg_replace("/(\R){2,}/", "$1", $text);
The (\R) matches all newlines.
The {2,} matches two or more occurrences.
The $1 Uses the first backreference (platform specific EOL) as the replacement.
This has been already answered long time ago but can greatly benefit for preg_replace and a much simplified pattern:
$result = preg_replace('/\s*($|\n)/', '\1', $subject);
Pattern: Remove all white-space before a new-line -or- at the end of the string.
Longest match wins:
As the white-space \s has a greedy quantifier * and contains \n consecutive empty lines are matched.
As \s contains \r as well, \r\n new-line sequences are supported, however single \r (without \n) are not.
And when $ matches the end of the buffer the backreference \1 is empty allowing to handle trailing whitespace at the very end, too.
If leading (empty) lines need to be removed as well, they have to match while not capturing, too (this was not directly asked for but could be appropriate):
$result = preg_replace('/^(?:\s*\n)+|\s*($|\n)/', '\1', $subject);
# '----------'
Pattern: Also remove all leading white-space (first line(s) are empty).
And if the new-line at the end of the buffer should be normalized differently (always a newline at the end instead of never), it needs to be added: . "\n".
This variant is portable to \r\n, \r and \n new-line sequences ((?>\r\n|\r|\n)) or \R:
$result = preg_replace('/^(?> |\t|\r\n|\r|\n)+|(?> |\t|\r\n|\r|\n)*($|(?>\r\n|\r|\n))/', '\1', $subject);
# or:
$result = preg_replace('/^(?:\s*\R)+|\s*($|\R)/', '\1', $subject);
Pattern: Support all new-line sequences.
This is with the downside that the new-lines can not be normalized (e.g. any of the three to \n).
Therefore, it can make sense to normalize new-lines before removing:
$result = preg_replace(['/(?>\r\n|\n|\r)/', '/\s*($|\n)/'], ["\n", '\1'], $subject);
# or:
$result = preg_replace(['/\R/u', '/\s*($|\n)/'], ["\n", '\1'], $subject);
It ships with the opportunity to do some normalization apart from the line handling.
For example removal of the trailing white-space and fixing the missing new-line at the end of file.
Then doing more advanced line normalization, for example zero empty lines at the beginning and end; otherwise not more than two consecutive empty lines:
$result = preg_replace(
['/[ \t]*($|\R)/u', '/^\n*|(\n)\n*$|(\n{3})\n+/'],
["\n" , '\1\2' ],
$subject
);
The secondary pattern benefits from the first patterns replacements already.
The power with preg_replace relies here in choosing the backreference(s) to replace with wisely.
Also using multiple patterns can greatly simplify things and keep the process maintainable.
Try this one:
$str = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\r\n", $str);
If you output this to a text file, it will give the same output in the simple Notepad, WordPad and also in text editors, for example Notepad++.
Use this:
$str = preg_replace('/^\s+\r?\n$/D', '', $str);
function trimblanklines($str) {
return preg_replace('`\A[ \t]*\r?\n|\r?\n[ \t]*\Z`','',$str);
}
This one only removes them from the beginning and end, not the middle (if anyone else was looking for this).
The accepted answer leaves an extra line-break at the end of the string. Using rtrim() will remove this final linebreak:
rtrim(preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string));
From this answer, the following works fine for me!
$str = "<html>
<body>";
echo str_replace(array("\r", "\n"), '', $str);
<?php
function del_blanklines_in_array_q($ar){
$strip = array();
foreach($ar as $k => $v){
$ll = strlen($v);
while($ll--){
if(ord($v[$ll]) > 32){ //hex /0x20 int 32 ascii SPACE
$strip[] = $v; break;
}
}
}
return $strip;
}
function del_blanklines_in_file_q($in, $out){
// in filename, out filename
$strip = del_blanklines_in_array_q(file($in));
file_put_contents($out, $strip );
}
$file = "file_name.txt";
$file_data = file_get_contents($file);
$file_data_after_remove_blank_line = preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $file_data );
file_put_contents($file,$file_data_after_remove_blank_line);
nl2br(preg_replace('/^\v+/m', '', $r_msg))