Replacing space indentation with tabs

Replacing space indentation with tabs - php

I am looking to replace 4 spaces at the start of a line to tabs, but nothing further when there is text present.
My initial regex of / {4}+/ or /[ ]{4}+/ for the sake of readability clearly worked but obviously any instance found with four spaces would be replaced.
$string = ' this is some text --> <-- are these tabs or spaces?';
$string .= "\n and this is another line singly indented";
// I wrote 4 spaces, a tab, then 4 spaces here but unfortunately it will not display
$string .= "\n \t and this is third line with tabs and spaces";
$pattern = '/[ ]{4}+/';
$replace = "\t";
$new_str = preg_replace( $pattern , $replace , $string );
echo '<pre>'. $new_str .'</pre>';
This was an example of what I had originally, using the regex given the expression works perfectly with regards to the conversion but for the fact that the 4 spaces between the ----><---- are replaced by a tab. I am really looking to have text after indentation unaltered.
My best effort so far has been (^) start of line ([ ]{4}+) the pattern (.*?[;\s]*) anything up til the first non space \s
$pattern = '/^[ ]{4}+.*?[;\s]*/m';
which... almost works but for the fact that the indentation is now lost, can anybody help me understand what I am missing here?
[edit]
For clarity what I am trying to do is change the the start of text indentation from spaces to tabs, I really don't understand why this is confusing to anybody.
To be as clear as possible (using the value of $string above):
First line has 8 spaces at the start, some text with 4 spaces in the middle.
I am looking for 2 tabs at the start and no change to spaces in the text.
Second line has 4 spaces at the start.
I am looking to have only 1 tab at the start of the line.
Third line has 4 spaces, 1 tab and 4 spaces.
I am looking to have 3 tabs at the start of the line.

If you're not a regular expression guru, this will probably make most sense to you and be easier to adapt to similar use cases (this is not the most efficient code, but it's the most "readable" imho):
// replace all regex matches with the result of applying
// a given anonymous function to a $matches array
function tabs2spaces($s_with_spaces) {
// before anything else, replace existing tabs with 4 spaces
// to permit homogenous translation
$s_with_spaces = str_replace("\t", ' ', $s_with_spaces);
return preg_replace_callback(
'/^([ ]+)/m',
function ($ms) {
// $ms[0] - is full match
// $ms[1] - is first (...) group fron regex
// ...here you can add extra logic to handle
// leading spaces not multiple of 4
return str_repeat("\t", floor(strlen($ms[1]) / 4));
},
$s_with_spaces
);
}
// example (using dots to make spaces visible for explaining)
$s_with_spaces = <<<EOS
no indent
....4 spaces indent
........8 spaces indent
EOS;
$s_with_spaces = str_replace('.', ' ');
$s_with_tabs = tabs2spaces($s_with_spaces);
If you want a performant but hard to understand or tweak one-liner instead, the solutions in the comments from the regex-gurus above should work :)
P.S. In general preg_replace_callback (and its equivalent in Javascript) is a great "swiss army knife" of structured text processing. I have, shamefully, even writtent parsers to mini-languages using it ;)

The way I would do it is this.
$str = "...";
$pattern = "'/^[ ]{4}+/'";
$replace = "\t";
$multiStr = explode("\n", $str);
$out = "";
foreach ($multiStr as &$line) {
$line = str_replace("\t", " ",$line);
$out .= preg_replace( $pattern , $replace , $line )
}
$results = implode("\n", $out);
Please re-evaluate the code thoroughly as I have done this on a quick and intuitive way.
As I can't run a PHP server to test it :( but should help you resolved this problem.

Related

Regex not working with white spaces [duplicate]

I know this comment PHP.net.
I would like to have a similar tool like tr for PHP such that I can run simply
tr -d " " ""
I run unsuccessfully the function php_strip_whitespace by
$tags_trimmed = php_strip_whitespace($tags);
I run the regex function also unsuccessfully
$tags_trimmed = preg_replace(" ", "", $tags);

To strip any whitespace, you can use a regular expression
$str=preg_replace('/\s+/', '', $str);
See also this answer for something which can handle whitespace in UTF-8 strings.

A regular expression does not account for UTF-8 characters by default. The \s meta-character only accounts for the original latin set. Therefore, the following command only removes tabs, spaces, carriage returns and new lines
// http://stackoverflow.com/a/1279798/54964
$str=preg_replace('/\s+/', '', $str);
With UTF-8 becoming mainstream this expression will more frequently fail/halt when it reaches the new utf-8 characters, leaving white spaces behind that the \s cannot account for.
To deal with the new types of white spaces introduced in unicode/utf-8, a more extensive string is required to match and removed modern white space.
Because regular expressions by default do not recognize multi-byte characters, only a delimited meta string can be used to identify them, to prevent the byte segments from being alters in other utf-8 characters (\x80 in the quad set could replace all \x80 sub-bytes in smart quotes)
$cleanedstr = preg_replace(
"/(\t|\n|\v|\f|\r| |\xC2\x85|\xc2\xa0|\xe1\xa0\x8e|\xe2\x80[\x80-\x8D]|\xe2\x80\xa8|\xe2\x80\xa9|\xe2\x80\xaF|\xe2\x81\x9f|\xe2\x81\xa0|\xe3\x80\x80|\xef\xbb\xbf)+/",
"_",
$str
);
This accounts for and removes tabs, newlines, vertical tabs, formfeeds, carriage returns, spaces, and additionally from here:
nextline, non-breaking spaces, mongolian vowel separator, [en quad, em quad, en space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, zero width space, zero width non-joiner, zero width joiner], line separator, paragraph separator, narrow no-break space, medium mathematical space, word joiner, ideographical space, and the zero width non-breaking space.
Many of these wreak havoc in xml files when exported from automated tools or sites which foul up text searches, recognition, and can be pasted invisibly into PHP source code which causes the parser to jump to next command (paragraph and line separators) which causes lines of code to be skipped resulting in intermittent, unexplained errors that we have begun referring to as "textually transmitted diseases"
[Its not safe to copy and paste from the web anymore. Use a character scanner to protect your code. lol]

Sometimes you would need to delete consecutive white spaces. You can do it like this:
$str = "My name is";
$str = preg_replace('/\s\s+/', ' ', $str);
Output:
My name is

$string = str_replace(" ", "", $string);
I believe preg_replace would be looking for something like [:space:]

You can use trim function from php to trim both sides (left and right)
trim($yourinputdata," ");
Or
trim($yourinputdata);
You can also use
ltrim() - Removes whitespace or other predefined characters from the left side of a string
rtrim() - Removes whitespace or other predefined characters from the right side of a string
System: PHP 4,5,7
Docs: http://php.net/manual/en/function.trim.php

If you want to remove all whitespaces everywhere from $tags why not just:
str_replace(' ', '', $tags);
If you want to remove new lines and such that would require a bit more...

Any possible option is to use custom file wrapper for simulating variables as files. You can achieve it by using this:
1) First of all, register your wrapper (only once in file, use it like session_start()):
stream_wrapper_register('var', VarWrapper);
2) Then define your wrapper class (it is really fast written, not completely correct, but it works):
class VarWrapper {
protected $pos = 0;
protected $content;
public function stream_open($path, $mode, $options, &$opened_path) {
$varname = substr($path, 6);
global $$varname;
$this->content = $$varname;
return true;
}
public function stream_read($count) {
$s = substr($this->content, $this->pos, $count);
$this->pos += $count;
return $s;
}
public function stream_stat() {
$f = fopen(__file__, 'rb');
$a = fstat($f);
fclose($f);
if (isset($a[7])) $a[7] = strlen($this->content);
return $a;
}
}
3) Then use any file function with your wrapper on var:// protocol (you can use it for include, require etc. too):
global $__myVar;
$__myVar = 'Enter tags here';
$data = php_strip_whitespace('var://__myVar');
Note: Don't forget to have your variable in global scope (like global $__myVar)

This is an old post but the shortest answer is not listed here so I am adding it now
strtr($str,[' '=>'']);
Another common way to "skin this cat" would be to use explode and implode like this
implode('',explode(' ', $str));

You can do it by using ereg_replace
$str = 'This Is New Method Ever';
$newstr = ereg_replace([[:space:]])+', '', trim($str)):
echo $newstr
// Result - ThisIsNewMethodEver

you also use preg_replace_callback function . and this function is identical to its sibling preg_replace except for it can take a callback function which gives you more control on how you manipulate your output.
$str = "this is a string";
echo preg_replace_callback(
'/\s+/',
function ($matches) {
return "";
},
$str
);

$string = trim(preg_replace('/\s+/','',$string));

Is old post but can be done like this:
if(!function_exists('strim')) :
function strim($str,$charlist=" ",$option=0){
$return='';
if(is_string($str))
{
// Translate HTML entities
$return = str_replace(" "," ",$str);
$return = strtr($return, array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES)));
// Choose trim option
switch($option)
{
// Strip whitespace (and other characters) from the begin and end of string
default:
case 0:
$return = trim($return,$charlist);
break;
// Strip whitespace (and other characters) from the begin of string
case 1:
$return = ltrim($return,$charlist);
break;
// Strip whitespace (and other characters) from the end of string
case 2:
$return = rtrim($return,$charlist);
break;
}
}
return $return;
}
endif;
Standard trim() functions can be a problematic when come HTML entities. That's why i wrote "Super Trim" function what is used to handle with this problem and also you can choose is trimming from the begin, end or booth side of string.

A simple way to remove spaces from the whole string is to use the explode function and print the whole string using a for loop.
$text = $_POST['string'];
$a=explode(" ", $text);
$count=count($a);
for($i=0;$i<$count; $i++){
echo $a[$i];
}

The \s regex argument is not compatible with UTF-8 multybyte strings.
This PHP RegEx is one I wrote to solve this using PCRE (Perl Compatible Regular Expressions) based arguments as a replacement for UTF-8 strings:
function remove_utf8_whitespace($string) {
return preg_replace('/\h+/u','',preg_replace('/\R+/u','',$string));
}
- Example Usage -
Before:
$string = " this is a test \n and another test\n\r\t ok! \n";
echo $string;
this is a test
and another test
ok!
echo strlen($string); // result: 43
After:
$string = remove_utf8_whitespace($string);
echo $string;
thisisatestandanothertestok!
echo strlen($string); // result: 28
PCRE Argument Listing
Source: https://www.rexegg.com/regex-quickstart.html
Character Legend Example Sample Match
\t Tab T\t\w{2} T ab
\r Carriage return character see below
\n Line feed character see below
\r\n Line separator on Windows AB\r\nCD AB
CD
\N Perl, PCRE (C, PHP, R…): one character that is not a line break \N+ ABC
\h Perl, PCRE (C, PHP, R…), Java: one horizontal whitespace character: tab or Unicode space separator
\H One character that is not a horizontal whitespace
\v .NET, JavaScript, Python, Ruby: vertical tab
\v Perl, PCRE (C, PHP, R…), Java: one vertical whitespace character: line feed, carriage return, vertical tab, form feed, paragraph or line separator
\V Perl, PCRE (C, PHP, R…), Java: any character that is not a vertical whitespace
\R Perl, PCRE (C, PHP, R…), Java: one line break (carriage return + line feed pair, and all the characters matched by \v)

There are some special types of whitespace in the form of tags.
You need to use
$str=strip_tags($str);
to remove redundant tags, error tags, to get to a normal string first.
And use
$str=preg_replace('/\s+/', '', $str);
It's work for me.

replace multiple spaces, tabs and newlines into one space except commented text

I need replace multiple spaces, tabs and newlines into one space except commented text in my html.
For example the following code:
<br/> <br>
<!--
this is a comment
-->
<br/> <br/>
should turn into
<br/><br><!--
this is a comment
--><br/><br/>
Any ideas?

The new solution
After thinking a bit, I came up with the following solution with pure regex. Note that this solution will delete the newlines/tabs/multi-spaces instead of replacing them:
$new_string = preg_replace('#(?(?!<!--.*?-->)(?: {2,}|[\r\n\t]+)|(<!--.*?-->))#s', '$1', $string);
echo $new_string;
Explanation
(? # If
(?!<!--.*?-->) # There is no comment
(?: {2,}|[\r\n\t]+) # Then match 2 spaces or more, or newlines or tabs
| # Else
(<!--.*?-->) # Match and group it (group #1)
) # End if
So basically when there is no comment it will try to match spaces/tabs/newlines. If it does find it then group 1 wouldn't exist and there will be no replacements (which will result into the deletion of spaces...). If there is a comment then the comment is replaced by the comment (lol).
Online demo
The old solution
I came up with a new strategy, this code require PHP 5.3+:
$new_string = preg_replace_callback('#(?(?!<!--).*?(?=<!--|$)|(<!--.*?-->))#s', function($m){
if(!isset($m[1])){ // If group 1 does not exist (the comment)
return preg_replace('#\s+#s', ' ', $m[0]); // Then replace with 1 space
}
return $m[0]; // Else return the matched string
}, $string);
echo $new_string; // Output
Explaining the regex:
(? # If
(?!<!--) # Lookahead if there is no <!--
.*? # Then match anything (ungreedy) until ...
(?=<!--|$) # Lookahead, check for <!-- or end of line
| # Or
(<!--.*?-->) # Match and group a comment, this will make for us a group #1
)
# The s modifier is to match newlines with . (dot)
Online demo
Note: What you are asking and what you have provided as expected output are a bit contradicting. Anyways if you want to remove instead of replacing by 1 space, then just edit the code from '#\s+#s', ' ', $m[0] to '#\s+#s', '', $m[0].

It's much simpler to do this in several runs (as is done for instance in php markdown).
Step1: preg_replace_callback() all comments with something unique while keeping their original values in a keyed array -- ex: array('comment_placeholder:' . md5('comment') => 'comment', ...)
Step2: preg_replace() white spaces as needed.
Step3: str_replace() comments back where they originally were using the keyed array.
The approach you're leaning towards (splitting the string and only processing the non-comment parts) works fine too.
There almost certainly is a means to do this with pure regex, using ugly look-behinds, but not really recommended: the regex might yield backtracking related errors, and the comment replacement step allows you to process things further if needed without worrying about the comments themselves.

I’d do the following:
split the input into comment and non-comment parts
do replacement on the non-comment parts
put everything back together
Example:
$parts = preg_split('/(<!--(?:(?!-->).)*-->)/s', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
foreach ($parts as $i => &$part) {
if ($i % 2 === 0) {
// non-comment part
$part = preg_replace('/\s+/', ' ', $part);
} else {
// comment part
}
}
$output = implode('', $parts);

You can use this:
$pattern = '~\s*+(<br[^>]*>|<!--(?>[^-]++|-(?!->))*-->)\s*+~';
$replacement = '$1';
$result = preg_replace($pattern, $replacement, $subject);
This pattern captures br tags and comments, and matches spaces around. Then it replaces the match by the capture group.

How to replace new lines by regular expressions

How can I set any quantity of new lines with a regular expression?
$var = "<p>some text</p><p>another text</p><p>more text</p>";
$search = array("</p>\s<p>");
$replace = array("</p><p>");
$var = str_replace($search, $replace, $var);
I need to remove every new line (\n), not <br/>, between two paragraphs.

To begin with, str_replace() (which you referenced in your original question) is used to find a literal string and replace it. preg_replace() is used to find something that matches a regular expression and replace it.
In the following code sample I use \s+ to find one or more occurrences of white space (new line, tab, space...). \s is whitespace, and the + modifier means one or more of the previous thing.
<?php
// Test string with white space and line breaks between paragraphs
$var = "<p>some text</p> <p>another text</p>
<p>more text</p>";
// Regex - Use ! as end holders, so that you don't have to escape the
// forward slash in '</p>'. This regex looks for an end P then one or more (+)
// whitespaces, then a begin P. i refers to case insensitive search.
$search = '!</p>\s+<p>!i';
// We replace the matched regex with an end P followed by a begin P w no
// whitespace in between.
$replace = '</p><p>';
// echo to test or use '=' to store the results in a variable.
// preg_replace returns a string in this case.
echo preg_replace($search, $replace, $var);
?>
Live Example

I find it odd to have huge HTML strings, and then using some string search and replace hack to format that afterwards...
When constructing HTML with PHP, I like using arrays:
$htmlArr = array();
foreach ($dataSet as $index => $data) {
$htmlArr[] = '<p>Line#'.$index.' : <span>' . $data . '</span></p>';
}
$html = implode("\n", $htmlArr);
This way, every HTML line has its separate $htmlArr[] value. Moreover, if you need your HTML to be "pretty print", you can simply have some sort of method that will indent your HTML by prepending whitespaces at the beginning of every array elements depending on some rule set. For example, if we have:
$htmlArr = array(
'<ol>',
'<li>Item 1</li>',
'<li>Item 2</li>',
'<li>Item 3</li>',
'</ol>'
);
Then the formatting function algorithm would be (a very simple one, considering that the HTML is well constructed):
$indent = 0; // Initial indent
foreach & $value in $array
$open = Count how many opened elements
$closed = Count how many closed elements
$value = str_repeat(' ', $indent * TAB_SPACE) . $value;
$indent += $open - $closed; // Next line's indent
end foreach
return $array
Then implode("\n", $array) for the prettyfied HTML
After the question edit by Felix Kling, I realize that this has nothing to do with the question. Sorry about that :) Thanks though for the clarification.

Remove excess whitespace from within a string

I receive a string from a database query, then I remove all HTML tags, carriage returns and newlines before I put it in a CSV file. Only thing is, I can't find a way to remove the excess white space from between the strings.
What would be the best way to remove the inner whitespace characters?

Not sure exactly what you want but here are two situations:
If you are just dealing with excess whitespace on the beginning or end of the string you can use trim(), ltrim() or rtrim() to remove it.
If you are dealing with extra spaces within a string consider a preg_replace of multiple whitespaces " "* with a single whitespace " ".
Example:
$foo = preg_replace('/\s+/', ' ', $foo);

$str = str_replace(' ','',$str);
Or, replace with underscore, & nbsp; etc etc.

none of other examples worked for me, so I've used this one:
trim(preg_replace('/[\t\n\r\s]+/', ' ', $text_to_clean_up))
this replaces all tabs, new lines, double spaces etc to simple 1 space.

$str = trim(preg_replace('/\s+/',' ', $str));
The above line of code will remove extra spaces, as well as leading and trailing spaces.

If you want to replace only multiple spaces in a string, for Example: "this string have lots of space . "
And you expect the answer to be
"this string have lots of space", you can use the following solution:
$strng = "this string have lots of space . ";
$strng = trim(preg_replace('/\s+/',' ', $strng));
echo $strng;

There are security flaws to using preg_replace(), if you get the payload from user input [or other untrusted sources]. PHP executes the regular expression with eval(). If the incoming string isn't properly sanitized, your application risks being subjected to code injection.
In my own application, instead of bothering sanitizing the input (and as I only deal with short strings), I instead made a slightly more processor intensive function, though which is secure, since it doesn't eval() anything.
function secureRip(string $str): string { /* Rips all whitespace securely. */
$arr = str_split($str, 1);
$retStr = '';
foreach ($arr as $char) {
$retStr .= trim($char);
}
return $retStr;
}

$str = preg_replace('/[\s]+/', ' ', $str);

You can use:
$str = trim(str_replace(" ", " ", $str));
This removes extra whitespaces from both sides of string and converts two spaces to one within the string. Note that this won't convert three or more spaces in a row to one!
Another way I can suggest is using implode and explode that is safer but totally not optimum!
$str = implode(" ", array_filter(explode(" ", $str)));
My suggestion is using a native for loop or using regex to do this kind of job.

To expand on Sandip’s answer, I had a bunch of strings showing up in the logs that were mis-coded in bit.ly. They meant to code just the URL but put a twitter handle and some other stuff after a space. It looked like this
? productID =26%20via%20#LFS
Normally, that would‘t be a problem, but I’m getting a lot of SQL injection attempts, so I redirect anything that isn’t a valid ID to a 404. I used the preg_replace method to make the invalid productID string into a valid productID.
$productID=preg_replace('/[\s]+.*/','',$productID);
I look for a space in the URL and then remove everything after it.

I wrote recently a simple function which removes excess white space from string without regular expression implode(' ', array_filter(explode(' ', $str))).

Laravel 9.7 intruduced the new Str::squish() method to remove extraneous whitespaces including extraneous white space between words: https://laravel.com/docs/9.x/helpers#method-str-squish

$str = "I am a PHP Developer";
$str_length = strlen($str);
$str_arr = str_split($str);
for ($i = 0; $i < $str_length; $i++) {
if (isset($str_arr[$i + 1]) && $str_arr[$i] == ' ' && $str_arr[$i] == $str_arr[$i + 1]) {
unset($str_arr[$i]);
}
else {
continue;
}
}
echo implode("", $str_arr);

How do I replace tabs with spaces within variables in PHP?

$data contains tabs, leading spaces and multiple spaces. I wish to replace all tabs with a space. Multiple spaces with one single space, and remove leading spaces.
In fact somthing that would turn this input data:
[ asdf asdf asdf asdf ]
Into output data:
[asdf asdf asdf asdf]
How do I do this?

Trim, replace tabs and extra spaces with single spaces:
$data = preg_replace('/[ ]{2,}|[\t]/', ' ', trim($data));

$data = trim(preg_replace('/\s+/g', '', $data));

Assuming the square brackets aren't part of the string and you're just using them for illustrative purposes, then:
$new_string = trim(preg_replace('!\s+!', ' ', $old_string));
You might be able to do that with a single regex but it'll be a fairly complicated regex. The above is much more straightforward.
Note: I'm also assuming you don't want to replace "AB\t\tCD" (\t is a tab) with "AB CD".

$data = trim($data);
That gets rid of your leading (and trailing) spaces.
$pattern = '/\s+/';
$data = preg_replace($pattern, ' ', $data);
That turns any collection of one or more spaces into just one space.
$data = str_replace("\t", " ", $data);
That gets rid of your tabs.

$new_data = preg_replace("/[\t\s]+/", " ", trim($data));

This answer takes the question completely literally: it is only concerned with spaces and tabs. Granted, the OP probably also wants to include other kinds of whitespace in what gets trimmed/compressed, but let's pretend he wants to preserve embedded CR and/or LF.
First, let's set up some constants. This will allow for both ease of understanding and maintainability, should modifications become necessary. I put in some extra spaces so that you can compare the similarities and differences more easily.
define( 'S', '[ \t]+' ); # Stuff you want to compress; in this case ONLY spaces/tabs
define( 'L', '/\A'.S.'/' ); # stuff on the Left edge will be trimmed
define( 'M', '/'.S.'/' ); # stuff in the Middle will be compressed
define( 'R', '/'.S.'\Z/' ); # stuff on the Right edge will be trimmed
define( 'T', ' ' ); # what we want the stuff compressed To
We are using \A and \Z escape characters to specify the beginning and end of the subject, instead of the typical ^ and $ which are line-oriented meta-characters. This is not so much because they are needed in this instance as much as "defensive" programming, should the value of S change to make them needed in the future.
Now for the secret sauce: we are going to take advantage of some special semantics of preg_replace, namely (emphasis added)
If there are fewer elements in the replacement array than in the pattern array, any extra patterns will be replaced by an empty string.
function trim_press( $data ){
return preg_replace( [ M, L, R ], [ T ], $data );
}
So instead of a pattern string and replacement string, we are using a pattern array and replacement array, which results in the extra patterns L and R being trimmed.

In case you need to remove too.
$data = trim(preg_replace('/\s+|nbsp;/g', '', $data));

After much frustration I found this to be the best solution, as it also removes non breaking spaces which can be two characters long:
$data = html_entity_decode(str_replace(' ',' ',htmlentities($data)));
$data = trim(preg_replace('/\h/', ' ', $data)); // replaces more space character types than \s
See billynoah

Just use this regex
$str = trim(preg_replace('/\s\s+/', ' ', $str));
it will replace all tabs and spaces by one space,
here sign + in regex means one or more times,
pattern means, that wherever there are two or more spaces, replace it by one space

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Replacing space indentation with tabs - php

Related

Regex not working with white spaces [duplicate]

replace multiple spaces, tabs and newlines into one space except commented text

How to replace new lines by regular expressions

Remove excess whitespace from within a string

How do I replace tabs with spaces within variables in PHP?

Categories

Resources