$data contains tabs, leading spaces and multiple spaces. I wish to replace all tabs with a space, collapse multiple spaces into a single space, and remove leading spaces.
In fact, something that would turn this input data:
[ asdf asdf asdf asdf ]
Into output data:
[asdf asdf asdf asdf]
How do I do this?
Trim, replace tabs and extra spaces with single spaces:
$data = preg_replace('/[ ]{2,}|[\t]/', ' ', trim($data));
$data = trim(preg_replace('/\s+/', ' ', $data));
Assuming the square brackets aren't part of the string and you're just using them for illustrative purposes, then:
$new_string = trim(preg_replace('!\s+!', ' ', $old_string));
You might be able to do that with a single regex but it'll be a fairly complicated regex. The above is much more straightforward.
Note: I'm also assuming you don't want "AB\t\tCD" (\t is a tab) to become "AB" and "CD" separated by two spaces; the whole run of whitespace is collapsed into a single space.
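A minimal sketch of the one-liner above wrapped in a helper, so the behavior is easy to check. The function name normalize_spaces is made up for illustration and does not come from the answers:

```php
<?php
// Hypothetical helper (the name is an assumption, not from the thread).
// Collapses every run of whitespace (spaces, tabs, newlines) into one
// space, then strips leading and trailing whitespace.
function normalize_spaces(string $s): string
{
    return trim(preg_replace('/\s+/', ' ', $s));
}

echo '[' . normalize_spaces("  asdf \t asdf    asdf\tasdf  ") . ']', PHP_EOL;
```

Two adjacent tabs are treated as one run, so "AB\t\tCD" ends up with a single space between AB and CD.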
$data = trim($data);
That gets rid of your leading (and trailing) spaces.
$pattern = '/\s+/';
$data = preg_replace($pattern, ' ', $data);
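That turns any run of one or more whitespace characters (spaces, tabs, newlines) into a single space.
$data = str_replace("\t", " ", $data);
That replaces tabs with spaces; it is only needed if you skip the \s+ step, which already matches tabs.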
$new_data = preg_replace("/[\t\s]+/", " ", trim($data));
This answer takes the question completely literally: it is only concerned with spaces and tabs. Granted, the OP probably also wants to include other kinds of whitespace in what gets trimmed/compressed, but let's pretend he wants to preserve embedded CR and/or LF.
First, let's set up some constants. This will allow for both ease of understanding and maintainability, should modifications become necessary. I put in some extra spaces so that you can compare the similarities and differences more easily.
define( 'S', '[ \t]+' ); # Stuff you want to compress; in this case ONLY spaces/tabs
define( 'L', '/\A'.S.'/' ); # stuff on the Left edge will be trimmed
define( 'M', '/'.S.'/' ); # stuff in the Middle will be compressed
define( 'R', '/'.S.'\Z/' ); # stuff on the Right edge will be trimmed
define( 'T', ' ' ); # what we want the stuff compressed To
We are using \A and \Z escape characters to specify the beginning and end of the subject, instead of the typical ^ and $ which are line-oriented meta-characters. This is not so much because they are needed in this instance as much as "defensive" programming, should the value of S change to make them needed in the future.
Now for the secret sauce: we are going to take advantage of some special semantics of preg_replace, namely (emphasis added)
If there are fewer elements in the replacement array than in the pattern array, any extra patterns will be replaced by an empty string.
function trim_press( $data ){
return preg_replace( [ M, L, R ], [ T ], $data );
}
So instead of a pattern string and replacement string, we are using a pattern array and replacement array, which results in the extra patterns L and R being trimmed.
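To see the array semantics in action, here is a self-contained sketch of the same idea. It uses local variables instead of the S/L/M/R/T constants, and the name trim_press_sketch is illustrative only:

```php
<?php
// Same approach as trim_press() above, written with local variables.
// The function name is an assumption made for this example.
function trim_press_sketch(string $data): string
{
    $s = '[ \t]+';                 // spaces and tabs only
    $patterns = [
        '/' . $s . '/',            // middle: compress to one space
        '/\A' . $s . '/',          // left edge: remove
        '/' . $s . '\Z/',          // right edge: remove
    ];
    // Only one replacement is supplied, so the 2nd and 3rd patterns
    // are replaced by an empty string.
    return preg_replace($patterns, [' '], $data);
}

var_dump(trim_press_sketch(" \t a  b\t\tc \n d  "));
```

The embedded newline survives, and so do the single spaces next to it, because only the edges of the whole subject are trimmed.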
In case you need to remove &nbsp; too:
$data = trim(preg_replace('/(?:\s|&nbsp;)+/', ' ', $data));
After much frustration I found this to be the best solution, as it also removes non-breaking spaces, which can be two bytes long (in UTF-8):
$data = html_entity_decode(str_replace('&nbsp;', ' ', htmlentities($data)));
$data = trim(preg_replace('/\h/', ' ', $data)); // replaces more space character types than \s
See billynoah
Just use this regex
$str = trim(preg_replace('/\s\s+/', ' ', $str));
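It replaces every run of two or more whitespace characters (spaces, tabs, newlines) with one space.
The + sign in a regex means "one or more times", so \s\s+ matches one whitespace character followed by one or more others.
A single tab or space on its own is left untouched; use \s+ if lone tabs should become spaces too.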
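A small comparison, as a sketch, of how \s\s+ and \s+ treat a lone tab. The sample string is made up for illustration:

```php
<?php
$input = "a\tb  c \t d";

// \s\s+ needs at least two whitespace characters, so the lone tab stays.
$twoOrMore = preg_replace('/\s\s+/', ' ', $input);

// \s+ also matches a single whitespace character, so the tab becomes a space.
$oneOrMore = preg_replace('/\s+/', ' ', $input);

var_dump($twoOrMore === "a\tb c d");
var_dump($oneOrMore === "a b c d");
```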
my string may be like this:
# *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?
in fact - it is a dirty csv string - having names of jpg images
I need to remove any non-alphanum chars - from both sides of the string
then - inside the resulting string - remove the same - except commas and dots
then - remove duplicates commas and dots - if any - replace them with single ones
so the final result should be:
lorem.jpg,ipsum.jpg,dolor.jpg
I firstly tried to remove any white space - anywhere
$str = str_replace(" ", "", $str);
then I used various forms of trim functions - but it is tedious and a lot of code
the additional problem is - duplicates commas and dots may have one or more instances - for example - .. or ,,,,
is there a way to solve this using regex, pls ?
List of modeled steps following your words:
Step 1
"remove any non-alphanum chars from both sides of the string"
translated: remove leading and trailing consecutive [^a-zA-Z0-9] characters
regex: replace ^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$ with $1
Step 2
"inside the resulting string - remove the same - except commas and dots"
translated: remove any [^a-zA-Z0-9.,]
regex: replace [^a-zA-Z0-9.,] with empty string
Step 3
"remove duplicates commas and dots - if any - replace them with single ones"
translated: replace consecutive [,.] with a single instance
regex: replace (\.{2,}) with .
regex: replace (,{2,}) with ,
PHP Demo:
https://onlinephp.io/c/512e1
<?php
$subject = " # *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?";
$firstStep = preg_replace('/^[^a-zA-Z0-9]*(.*?)[^a-zA-Z0-9]*$/', '$1', $subject);
$secondStep = preg_replace('/[^a-zA-Z0-9.,]/', '', $firstStep);
$thirdStepA = preg_replace('/\.{2,}/', '.', $secondStep);
$thirdStepB = preg_replace('/,{2,}/', ',', $thirdStepA);
echo $thirdStepB; //lorem.jpg,ipsum.jpg,dolor.jpg
Look at
https://www.php.net/manual/en/function.preg-replace.php
It replaces anything in a string that matches a pattern. \s matches whitespace characters, but take care with NBSP (non-breaking space): \h matches it.
Example 4
$str = preg_replace('/\s\s+/', '', $str);
It will be something like that
Can you try this :
$string = ' # *lorem.jpg,,,, ip sum.jpg,dolor .jpg,-/ ?';
// this leaves only alphanumerics, commas and dots
$result = preg_replace("/[^A-Za-z0-9,.]/", '', $string);
// this collapses repeated dots and repeated commas
$result = preg_replace('/,+/', ',', $result);
$result = preg_replace('/\.+/', '.', $result);
// this removes commas, semicolons, dots and spaces from the end
$result = preg_replace("/[ ,;.]*$/", '', $result);
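The steps discussed in this thread can be combined into one helper. This is only a sketch: the function name clean_image_list is hypothetical, and it assumes that mixed runs such as ",." do not occur in the input:

```php
<?php
// Hypothetical helper combining the steps above (name is an assumption).
function clean_image_list(string $s): string
{
    // 1. Drop everything except letters, digits, commas and dots.
    $s = preg_replace('/[^A-Za-z0-9,.]/', '', $s);
    // 2. Collapse repeated commas and repeated dots.
    $s = preg_replace('/,+/', ',', $s);
    $s = preg_replace('/\.+/', '.', $s);
    // 3. Strip commas and dots left over at either end.
    return trim($s, ',.');
}

echo clean_image_list(' # *lorem.jpg,,, ip sum.jpg,dolor ..jpg,-/ ?'), PHP_EOL;
```

Using trim() with a character list handles both ends at once, so no separate regex is needed for the leading and trailing separators.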
I am looking to replace 4 spaces at the start of a line to tabs, but nothing further when there is text present.
My initial regex of / {4}+/ or /[ ]{4}+/ for the sake of readability clearly worked but obviously any instance found with four spaces would be replaced.
$string = ' this is some text --> <-- are these tabs or spaces?';
$string .= "\n and this is another line singly indented";
// I wrote 4 spaces, a tab, then 4 spaces here but unfortunately it will not display
$string .= "\n \t and this is third line with tabs and spaces";
$pattern = '/[ ]{4}+/';
$replace = "\t";
$new_str = preg_replace( $pattern , $replace , $string );
echo '<pre>'. $new_str .'</pre>';
This is an example of what I had originally. The expression converts the indentation correctly, except that the 4 spaces between --> and <-- are also replaced by a tab. I want the text after the indentation left unaltered.
My best effort so far has been (^) start of line ([ ]{4}+) the pattern (.*?[;\s]*) anything up til the first non space \s
$pattern = '/^[ ]{4}+.*?[;\s]*/m';
which... almost works but for the fact that the indentation is now lost, can anybody help me understand what I am missing here?
[edit]
For clarity, what I am trying to do is change the start-of-text indentation from spaces to tabs. I really don't understand why this is confusing to anybody.
To be as clear as possible (using the value of $string above):
First line has 8 spaces at the start, some text with 4 spaces in the middle.
I am looking for 2 tabs at the start and no change to spaces in the text.
Second line has 4 spaces at the start.
I am looking to have only 1 tab at the start of the line.
Third line has 4 spaces, 1 tab and 4 spaces.
I am looking to have 3 tabs at the start of the line.
If you're not a regular expression guru, this will probably make most sense to you and be easier to adapt to similar use cases (this is not the most efficient code, but it's the most "readable" imho):
// spaces2tabs(): convert leading indentation from spaces to tabs.
// preg_replace_callback replaces every regex match with the result of
// applying the given anonymous function to the $matches array
function spaces2tabs($s_with_spaces) {
    // before anything else, replace existing tabs with 4 spaces
    // to permit homogeneous translation
    $s_with_spaces = str_replace("\t", '    ', $s_with_spaces);
    return preg_replace_callback(
        '/^([ ]+)/m',
        function ($ms) {
            // $ms[0] - is the full match
            // $ms[1] - is the first (...) group from the regex
            // ...here you can add extra logic to handle
            // leading spaces that are not a multiple of 4
            return str_repeat("\t", intdiv(strlen($ms[1]), 4));
        },
        $s_with_spaces
    );
}
// example (using dots to make spaces visible for explaining)
$s_with_spaces = <<<EOS
no indent
....4 spaces indent
........8 spaces indent
EOS;
$s_with_spaces = str_replace('.', ' ', $s_with_spaces);
$s_with_tabs = spaces2tabs($s_with_spaces);
If you want a performant but hard to understand or tweak one-liner instead, the solutions in the comments from the regex-gurus above should work :)
P.S. In general preg_replace_callback (and its equivalent in JavaScript) is a great "swiss army knife" of structured text processing. I have, shamefully, even written parsers for mini-languages using it ;)
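For comparison, here is one possible single-regex variant. It is a sketch, not taken from the comments mentioned above (they are not reproduced here), and it assumes the indentation consists only of groups of 4 spaces and/or tabs. The function name indent_to_tabs is made up:

```php
<?php
// ^ with the m flag matches at the start of every line; \G matches where
// the previous match ended. Together they let the replacement continue
// only while it is still inside the leading indentation.
function indent_to_tabs(string $s): string
{
    return preg_replace('/(?:^|\G)(?: {4}|\t)/m', "\t", $s);
}

$in  = "        two levels    gap\n    one level\n    \t    three levels";
$out = indent_to_tabs($in);
var_dump($out);
```

The 4 spaces before "gap" are untouched because neither ^ nor \G can match in the middle of a line once the chain of indentation matches has been broken by text.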
The way I would do it is this.
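$str = "...";
// \G matches at the start of the line or right after the previous match,
// so every leading group of 4 spaces is replaced, but spaces inside the text are not
$pattern = '/\G[ ]{4}/';
$replace = "\t";
$multiStr = explode("\n", $str);
foreach ($multiStr as &$line) {
    // normalize existing tabs to 4 spaces first
    $line = str_replace("\t", "    ", $line);
    $line = preg_replace($pattern, $replace, $line);
}
unset($line); // break the reference left over from the loop
$results = implode("\n", $multiStr);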
Please check the code thoroughly, as I wrote it quickly and from intuition.
I can't run a PHP server to test it :( but it should help you resolve this problem.
I need to remove all square brackets from a string and keep the string. I've been looking around, but in every topic I found the OP wants to replace the string with something.
So: [[link_to_page]]
should become: link_to_page
I think I should use php regex, can someone assist me?
Thanks in advance
You can simply use a str_replace.
$string = str_replace(array('[[',']]'),'',$string);
Note that this would also remove a '[[' that has no closing ']]', and a ']]' that has no opening '[['.
It's not entirely clear what you want - but...
If you simply want to "remove all square brackets" without worrying about pairing/etc then a simple str_replace will do it:
str_replace( array('[',']') , '' , $string )
That is not (and doesn't need to be) a regex.
If you want to unwrap paired double brackets, with unknown contents, then a regex replace is what you want, which uses preg_replace instead.
Since [ and ] are metacharacters in regex, they need to be escaped with a backslash.
To match all instances of double-bracketed text, you can use the pattern \[\[\w+\]\]. To remove the brackets, put the contents into a capture group (by surrounding them with parentheses) and replace each match with that group, like so:
$output = preg_replace( '/\[\[(\w+)\]\]/' , '$1' , $string );
The \w matches any alphanumeric character or underscore. If you want to allow more or fewer characters, the class can be changed, e.g. \[\[([a-z\-_]+)\]\] or whatever makes sense.
If you want to act on the contents of the square brackets, see the answer by fluminis.
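As a quick illustration of the capture-group replacement, here is a sketch with a made-up sample string. Only properly paired double brackets around \w characters are unwrapped:

```php
<?php
$text = 'See [[link_to_page]] and [[other_page]], but not [single] or [[broken].';

// $1 refers to the contents of the first capture group (\w+).
$out = preg_replace('/\[\[(\w+)\]\]/', '$1', $text);

echo $out, PHP_EOL;
```

Single brackets and unbalanced pairs are left alone, which is the main difference from the plain str_replace approach.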
You can use preg_replace:
$repl = preg_replace('/(\[|\]){2}/', '', '[[link_to_page]]');
OR using str_replace:
$repl = str_replace(array('[[', ']]'), '', '[[link_to_page]]');
If you want only one match :
preg_match('/\[\[([^\]]+)\]\]/', $yourText, $matches);
echo $matches[1]; // will echo link_to_page
Or, if you want to extract all the links from a text:
preg_match_all('/\[\[([^\]]+)\]\]/', $yourText, $matches, PREG_SET_ORDER);
foreach ($matches as $link) {
    echo $link[1]; // $link[0] is the full match, $link[1] the captured link
}
How to read '/\[\[([^\]]+)\]\]/'
/ start the regex
\[\[ two [ characters, which need to be escaped because [ is a metacharacter
([^\]]+) capture one or more characters that are not a ]
\]\] two ] characters, which need to be escaped because ] is a metacharacter
/ end the regex
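A short sketch of the same pattern with preg_match_all in its default mode, where $matches[1] holds the contents of the first capture group for every match. The sample text is an assumption:

```php
<?php
$text = 'Go to [[home]] or [[about_us]].';

// Default flag (PREG_PATTERN_ORDER): $matches[0] lists the full matches,
// $matches[1] lists what the first (...) group captured.
preg_match_all('/\[\[([^\]]+)\]\]/', $text, $matches);

$links = $matches[1];
print_r($links);
```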
Try
$string = preg_replace('/(\[\[)|(\]\])/', '', $string);
I was looking for a way to remove excess whitespace from within a string (that is, if 2 or more spaces are next to each other, leave only 1 and remove the others). I found "Remove excess whitespace from within a string" and wanted to use this solution:
$foo = preg_replace( '/\s+/', ' ', $foo );
but this removes new lines as well, while I want to keep them.
Is there any way to keep newlines while removing excess whitespace?
http://www.php.net/manual/en/regexp.reference.escape.php
defines \h
any horizontal whitespace character (since PHP 5.2.4)
so probably you are looking for
$foo = preg_replace( '/\h+/', ' ', $foo );
example: http://ideone.com/NcOiKW
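If some of your characters turn into � after preg_replace (for example, the Cyrillic capital letter Р, whose UTF-8 encoding ends in the byte 0xA0, which \h treats as a non-breaking space when matching byte by byte), add the u modifier so the subject is handled as UTF-8:
$value = preg_replace('/\h+/u', ' ', $value);
mb_ereg_replace is another option, but it takes the pattern without delimiters, so write the character class explicitly:
$value = mb_ereg_replace('[ \t]+', ' ', $value);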
if you want to remove excess of only-spaces (not tabs, new-lines, etc) you could use HEX code to be more specific:
$text = preg_replace('/\x20+/', ' ', $text);
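A minimal sketch showing that \h+ keeps newlines while collapsing spaces and tabs. The helper name squeeze_horizontal is made up, and the input is assumed to be valid UTF-8 (with the u modifier, preg_replace returns null on invalid UTF-8):

```php
<?php
// \h matches horizontal whitespace only, so "\n" is left alone.
// The u modifier makes PCRE treat pattern and subject as UTF-8.
// Assumes valid UTF-8 input; the function name is hypothetical.
function squeeze_horizontal(string $s): string
{
    return preg_replace('/\h+/u', ' ', $s);
}

$out = squeeze_horizontal("first   line\t\twith  gaps\nsecond \t line");
var_dump($out);
```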
I have a set of strings like this:
Pants [+$50]
Shirts [+$10]
Jeans [+$5]
Jackets [+$100]
How can I remove the ' [xxx]' in these lines and leaving just the item name (without the trailing space)? I was told to define a regular expression, not sure how that works...
That's actually a bit of a confusing regex, since [ and ] are special characters:
$str = 'Pants [+$50]';
$str = rtrim(preg_replace('/\[[^\]]*\]/', '', $str));
// 'Pants'
Basically the pattern \[[^\]]*\] means: match a literal [ followed by 0 or more characters that are not ], followed by a ]. The second argument of preg_replace is what the match gets replaced with; in this case the empty string, since we want to remove it. Then we use rtrim to trim any trailing whitespace.
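Applied to the whole list from the question, the same pattern could be used like this. It is a sketch that assumes the items are held in an array:

```php
<?php
$items = ['Pants [+$50]', 'Shirts [+$10]', 'Jeans [+$5]', 'Jackets [+$100]'];

// Remove the bracketed part, then trim the space that preceded it.
$names = array_map(
    function (string $s): string {
        return rtrim(preg_replace('/\[[^\]]*\]/', '', $s));
    },
    $items
);

print_r($names);
```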
Try this one:
The RegEx
(?im)[ \t]*\[[^\]\[]+\][ \t]*$
Code
$result = preg_replace('/^(.+?)[ \t]*\[[^\][]+\][ \t]*$/im', '$1', $subject);
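Because of the m flag, this version can process all lines in one call. A sketch, assuming the lines are joined with "\n" (with "\r\n" line endings the $ anchor would not match before the "\r"):

```php
<?php
$subject = implode("\n", ['Pants [+$50]', 'Shirts [+$10]', 'Jeans [+$5]', 'Jackets [+$100]']);

// ^ and $ match at every line boundary thanks to the m flag;
// $1 keeps only the item name captured by (.+?).
$result = preg_replace('/^(.+?)[ \t]*\[[^\][]+\][ \t]*$/im', '$1', $subject);

echo $result, PHP_EOL;
```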