Converting entire source code from tabs to 4 spaces - php

I just realized after committing the CakePHP source to GitHub that they're now using tabs to indent code rather than four spaces. They also define this in the .editorconfig file, which I've changed to this:
root = true
[*]
indent_style = space
indent_size = 4
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
Is there a way to run through the entire source code and safely convert all tabs to four spaces for indentation? My reasoning is every developer on the repo uses four spaces and mixing and matching will cause the code to look out of place when looking at it on GitHub. And I'm just a fan of consistency :)
If I go down the home-brew route and write my own script for this, I don't really mind what language, although I'm more confident in PHP (not the best suited for the job, I know). Is this as simple as doing a preg_replace('~\t~', '    ', $fileText) on each file?

Try this in the directory you wish to execute it in:
find ./ -type f -exec sed -i 's/\t/XXXX/g' {} \;
That should replace each tab with 4 spaces once you replace the X's with spaces.
To use a different indent width, change the number of spaces between \t/ and /g.
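The same directory-wide conversion can be sketched in Python for cases where more control is wanted than a one-line sed gives. This is only a minimal sketch under stated assumptions: the function name convert_tree, the ".php" extension filter, the UTF-8 encoding, and the decision to skip the .git directory are all illustrative choices, not part of the original answer. Note that it expands every tab in a matching file, including any inside string literals.

```python
import os


def convert_tree(root, extensions=(".php",), tab_size=4):
    """Expand tabs in matching files under root; return the paths that changed."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so repository internals are never touched.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # newline="" keeps the file's original line endings intact.
            with open(path, "r", encoding="utf-8", newline="") as handle:
                original = handle.read()
            # expandtabs() restarts its column count after every line break.
            converted = original.expandtabs(tab_size)
            if converted != original:
                with open(path, "w", encoding="utf-8", newline="") as handle:
                    handle.write(converted)
                changed.append(path)
    return changed
```

Calling convert_tree(".") from the repository root would rewrite the matching files in place, so it is worth running on a clean working copy where the result can be reviewed with git diff before committing.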

A straight replacement of tabs with spaces will result in misalignment when tabs follow space characters that encroach on that tab region.
A basic Python script that uses the expandtabs() string method keeps the code looking the same as when it was written. This example uses a tab width of 4:
#!/usr/bin/env python3
#
# Convert tabs in source code or text to spaces, being careful to align text as it
# was written with the original tab settings, which default to 4 spaces per tab.
#
# usage:
#   ./tabs2spaces.py <file_to_convert>
import os
import sys

spaces_per_tab = 4

if len(sys.argv) < 2:
    sys.exit('no file argument specified')

filename = sys.argv[1]
# Keep the original next to the converted file, with an "old_" prefix on its name.
old_filename = os.path.join(os.path.dirname(filename), 'old_' + os.path.basename(filename))
os.rename(filename, old_filename)

# newline='' preserves the original line endings in both files.
with open(old_filename, 'r', newline='') as fo, open(filename, 'w', newline='') as fn:
    for line in fo:
        fn.write(line.expandtabs(spaces_per_tab))
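The misalignment described above is easy to see on a single line. The following is a small illustration with a made-up input string, comparing a plain replacement with expandtabs() for a tab width of 4:

```python
line = "ab\tcd"  # a tab that follows two characters

naive = line.replace("\t", " " * 4)  # always inserts 4 spaces
aligned = line.expandtabs(4)         # pads only up to the next tab stop

print(repr(naive))    # 'ab    cd'
print(repr(aligned))  # 'ab  cd'
```

For tabs that sit at the very start of a line the two results are identical; they differ only when other characters precede the tab.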

Not sure if you have access to or already use Sublime Text 2, but it can automatically convert all the tabs to spaces for you:
How to replace four spaces with a tab in Sublime Text 2?

Related

PhpStorm 2016.2 find and replace multiline text

In PhpStorm 2016.2 I have a new project that has been inherited and [badly] needs updating.
There are many pages each with opening line like so (example):
<?
include ("/inc/db.php");
I need to replace this line with several lines such as:
<?php
include "siteheader.php";
require "class.myclass.inc.php";
$dataBase = new DbObj();
Previously I have simply copied and pasted multiline code into the PhpStorm search/replace function, and that has (usually but not always) produced the correct changes, although they are all squished into single lines, making them harder to read (EOL characters are removed).
In this instance I am looking specifically at the "replace in path" function, as I need to apply this change to many pages.
I have read the manual but can see no option for this. I think I could possibly use a regular expression, but this would not be ideal (escaping etc.).
I have also looked for a suitable plugin in the PhpStorm Plugin Repository but have not found one.
Is there a way of searching and/or replacing multiline text in path in PhpStorm 2016.2?
Cheers
Unfortunately, there is no easy-to-use multi-line search or replace across multiple files (the Find/Replace in Path functionality).
Right now you have to use the Regex option for that; it is the only option that works.
Watch these tickets (star/vote/comment) to get notified of any progress in this regard.
https://youtrack.jetbrains.com/issue/IDEA-69435
https://youtrack.jetbrains.com/issue/IDEA-61925
https://youtrack.jetbrains.com/issue/IDEA-145720
Manually writing regex-compatible text can be quite error-prone, so you might use this multi-step trick:
1. Type your new text in one file to start with.
2. Select that text and invoke the Replace in Path... dialog. With the Regex option pre-selected, it should automatically escape your selection to be regex-compatible.
3. Copy that already-escaped text somewhere (the clipboard should be enough).
4. Close the dialog and go back to the original file.
5. Select the text you want to replace and invoke the Replace in Path... dialog. It will have your initial text already filled in and regex-compatible.
6. Paste the previously copied escaped text into the Replace field.
7. Execute the find/replacement.
On a related note: https://stackoverflow.com/a/38672886/783119
You can do multiline Find&Replace with Regex option turned on
Find:
<\?\ninclude \("/inc/db\.php"\);
Replace:
<?php\ninclude "siteheader.php"; \nrequire "class.myclass.inc.php"; \n\$dataBase = new DbObj();
As you can see you need to do some additional work to escape some special characters and put \n instead of new lines, but it works. I've just checked.
P.S.
Indeed, it was possible to simply paste multiline text in previous versions, but it's not possible anymore. ;-(
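The find pattern above can also be checked outside the IDE. The following is a rough sketch using Python's re module rather than PhpStorm; the sample page content is made up for illustration. Passing a function as the replacement inserts the new text literally, so the replacement itself needs no escaping.

```python
import re

# Find pattern from the answer above: ?, ( ) and . are escaped, \n marks the line break.
find = r'<\?\ninclude \("/inc/db\.php"\);'

replacement = (
    '<?php\n'
    'include "siteheader.php";\n'
    'require "class.myclass.inc.php";\n'
    '$dataBase = new DbObj();'
)

# Hypothetical page content, used only to exercise the pattern.
page = '<?\ninclude ("/inc/db.php");\necho "hello";\n'

# A function replacement is inserted as-is, with no backslash processing.
updated = re.sub(find, lambda match: replacement, page)
print(updated)
```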
Type Alt+Enter to add a new line in either the "search" or the "replace" field.
On a Mac:
open the 'Find' or 'Replace' tool, click into the text area and press the following keys once for every new line you want to create:
⌘ + 'Shift' + 'Enter'
Besides the suggestions on how to use regex for multiline, in case you want to match two pieces of code with arbitrary lines in the middle, you can use [\s\S]* instead of [\n.]* (which doesn't have the expected result). Example:
//you can match the $result-related code using `\$result([\s\S]*)while`
$result = DB::exec($query);
//blabla
//something else
while ($row = $result->fetch()) {
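The difference between the two character classes can be reproduced with Python's re module, which, like most regex flavors, treats a dot inside square brackets as a literal dot. A minimal sketch using the snippet above as input:

```python
import re

code = (
    "$result = DB::exec($query);\n"
    "//blabla\n"
    "//something else\n"
    "while ($row = $result->fetch()) {\n"
)

# [\s\S] matches any character at all, including line breaks.
spanning = re.search(r"\$result([\s\S]*)while", code)

# [\n.] matches only a newline or a literal dot, so it cannot cross the code.
literal_dot = re.search(r"\$result([\n.]*)while", code)

print(spanning is not None)  # True
print(literal_dot is None)   # True
```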
\s works as expected to match all whitespaces and newlines.
In my case I wanted to find switch ... case ... continue; syntax, so switch(\s|.)*continue worked as expected

Pushing to github - strange white spaces

When I push my code into github repo and display random files, they contain some strange whitespaces, e.g.:
// 'auth' => MPATH.'auth', // Authentication module
// 'database' => MPATH.'database', // Database access
In my IDE the code is perfectly lined up, on github - it behaves like above, in totally random places. Is there any way to fix this?
I use tabs for indents.
I would suggest using spaces over tabs going forward. You can set your editor to input 2, 4 or however many spaces each time you hit Tab. I believe this will save you much headache, because a space is always exactly the same width, whereas tab width can change depending on context.
For now you can convert the tabs like this
expand -t2 foo-tabs.php > foo-spaces.php

How to remove first whitespace of a string

I am building a new version of a telephone configuration manager, and I am stuck on a stupid problem. You see, these telephone .cfg configurations are really static. The old version I made produced the configuration without a problem.
It looks like this:
## Configuration header
configuration_1="parram"
configuration_2="parram"
configuration_3="parram"
etc.
Now in the new version the configuration is given as this:
whitespace
## Configuration header
configuration_1="parram"
configuration_2="parram"
configuration_3="parram"
Note that the whitespace really is whitespace, and that the phone does not accept the configuration because it wants the first line to be the # header.
So I figured the easy way to fix this is to just remove the first blank line, but how? How can I tell PHP to delete the first line?
OK, look at this: image
The first two screenshots are from phpMyAdmin, where you can see that inside a textarea there is no whitespace, but when just echoing it out you suddenly see it. The strange thing is that manually changing the configuration with phpMyAdmin somehow removes it, but it has to be done automatically.
If you have the contents as a string, just run ltrim.
It will strip all whitespace from the start of the string.
$str = ltrim($str);
This is how to remove only the first whitespace character:
$s = ' Text';
$arr = str_split($s);
array_shift($arr);
$s = implode('', $arr);
die($s);
If you got this configuration in a string, as your title says, you can just trim the string.
$config = trim($config);
I would delete everything up to the first hash:
$contents = substr($contents, strpos($contents, "#"));
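The idea of discarding everything before the first hash can be compared with plain trimming in a short Python sketch (Python rather than PHP, with a made-up configuration string; str.lstrip plays the role of ltrim here):

```python
config = '\n## Configuration header\nconfiguration_1="parram"\n'

stripped = config.lstrip()  # removes every leading whitespace character

start = config.find("#")    # index of the first hash, or -1 if there is none
from_hash = config[start:] if start != -1 else config

print(repr(stripped))
print(stripped == from_hash)
```

Both approaches give the same result when only whitespace precedes the header; slicing from the hash would additionally drop any other stray characters in front of it.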
You can use the PHP function trim.
Put the following code on the first line of the file:
ob_start();
Put the following code just before you display the content:
ob_end_clean();

Perl Regex: separate by spaces and tabs but avoid spaces on filenames?

I'm parsing the output of a command line application that looks like the following:
0644 1276317623781623132132 Crappy little message filename.txt
0644 1276317623781623132132 Crappy little message My File.txt
0644 1276317623781623132132 Crappy little message Crazy FILE.txt
Sometimes fields are spaced by tabs, sometimes by spaces. How can I write a Regex to separate the fields? I was using preg_split with [\s]+, but this messes up the message and file names. I'm pretty lost here.
The solution is to build a more specific regex to match.
For example, assuming the last separator is a tab, you can capture the fields with:
preg_match('/^([0-9]{4}).*([0-9]{22})[\s]*([^\t]*)[\s]*(.*)$/', $string, $aMatches);
You can vary that to match your needs if the format fluctuates. If the last separator is not a tab but a run of spaces, look for the required number of spaces instead, and so on.
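To see what the pattern captures, here is a small check using Python's re module with the same expression. The input line is hypothetical: it assumes a single space between the first fields and a tab before the filename, which the flattened sample output above does not confirm.

```python
import re

# The pattern from the answer above, unchanged.
pattern = re.compile(r"^([0-9]{4}).*([0-9]{22})[\s]*([^\t]*)[\s]*(.*)$")

# Hypothetical input line: a tab is assumed before the filename.
line = "0644 1276317623781623132132 Crappy little message\tMy File.txt"

match = pattern.match(line)
mode, number, message, filename = match.groups()
print(mode, number, message, filename, sep=" | ")
# 0644 | 1276317623781623132132 | Crappy little message | My File.txt
```

Because the third group is [^\t]*, the message keeps its internal spaces and stops only at the tab, which is what lets a filename containing spaces survive intact.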

Replace all "\" characters which are *not* inside "<code>" tags

First things first: Neither this, this, this nor this answered my question. So I'll open a new one.
Please read
Okay okay. I know that regexes are not the way to parse general HTML. Please take note that the created documents are written using a limited, controlled HTML subset. And people writing the docs know what they're doing. They are all IT professionals!
Given the controlled syntax it is possible to parse the documents I have here using regexes.
I am not trying to download arbitrary documents from the web and parse them!
And if the parsing does fail, the document is edited, so it'll parse. The problem I am addressing here is more general than that (i.e. not replace patterns inside two other patterns).
A little bit of background (you can skip this...)
In our office we are supposed to "pretty print" our documentation. Hence why some came up with putting it all into Word documents. So far we're thankfully not quite there yet. And, if I get this done, we might not need to.
The current state (... and this)
The main part of the docs is stored in a TikiWiki database. I've created a daft PHP script which converts the documents from HTML (via LaTeX) to PDF. One of the must-have features of the selected wiki system was a WYSIWYG editor, which, as expected, leaves us with documents with a less than formal DOM.
Consequently, I am transliterating the document using "simple" regexes. It all works (mostly) fine so far, but I encountered one problem I haven't figured out on my own yet.
The problem
Some special characters need to be replaced by LaTeX markup. For example, the \ character should be replaced by $\backslash$ (unless someone knows another solution?).
Except while in a verbatim block!
I do replace <code> tags with verbatim sections. But if this code block contains backslashes (as is the case for Windows folder names), the script still replaces these backslashes.
I reckon I could solve this using negative LookBehinds and/or LookAheads. But my attempts did not work.
Granted, I would be better off with a real parser. In fact, it is something on my "in-brain-roadmap", but it is currently out of the scope. The script works well enough for our limited knowledge domain. Creating a parser would require me to start pretty much from scratch.
My attempt
Example Input
The Hello \ World document is located in:
<code>C:\documents\hello_world.txt</code>
Expected output
The Hello $\backslash$ World document is located in:
\begin{verbatim}C:\documents\hello_world.txt\end{verbatim}
This is the best I could come up with so far:
<?php
$patterns = array(
"special_chars2" => array( '/(?<!<code[^>]*>.*)\\\\[^$](?!.*<\/code>)/U', '$\\backslash$'),
);
foreach( $patterns as $name => $p ){
$tex_input = preg_replace( $p[0], $p[1], $tex_input );
}
?>
Note that this is only an excerpt, and the [^$] is another LaTeX requirement.
Another attempt which seemed to work:
<?php
$patterns = array(
"special_chars2" => array( '/\\\\[^$](?!.*<\/code>)/U', '$\\backslash$'),
);
foreach( $patterns as $name => $p ){
$tex_input = preg_replace( $p[0], $p[1], $tex_input );
}
?>
... in other words: leaving out the negative lookbehind.
But this looks more error-prone than with both lookbehind and lookahead.
A related question
As you may have noticed, the pattern is ungreedy (/.../U). So will this match only as little as possible inside a <code> block, considering the look-arounds?
If it were me, I would try to find an HTML parser and use that.
Another option is to split the string into <code>.*?</code> chunks and the other parts, update only the other parts, and then recombine everything.
$x="The Hello \ World document is located in:\n<br>
<code>C:\documents\hello_world.txt</code>";
$r=preg_split("/(<code>.*?<\/code>)/", $x,-1,PREG_SPLIT_DELIM_CAPTURE);
for($i=0;$i<count($r);$i+=2)
$r[$i]=str_replace("\\","$\\backslash$",$r[$i]);
$x=implode($r);
echo $x;
Here are the results.
The Hello $\backslash$ World document is located in:
C:\documents\hello_world.txt
Sorry if my approach is not suitable for you.
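The same split-and-recombine idea can be sketched in Python, where re.split with a capturing group behaves like preg_split with PREG_SPLIT_DELIM_CAPTURE. This is only an illustration of the technique, using the example input from the question:

```python
import re

text = (
    "The Hello \\ World document is located in:\n"
    "<code>C:\\documents\\hello_world.txt</code>"
)

# The capturing group keeps each <code>...</code> block in the result list,
# so even indexes hold ordinary text and odd indexes hold code blocks.
parts = re.split(r"(<code>.*?</code>)", text, flags=re.DOTALL)
for i in range(0, len(parts), 2):
    parts[i] = parts[i].replace("\\", "$\\backslash$")

result = "".join(parts)
print(result)
```

Only the text outside the <code> blocks is touched; the blocks themselves are passed through unchanged, ready for a later step that turns them into verbatim sections.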
I reckon I could solve this using negative LookBehinds and/or LookAheads.
You reckon wrong. Regular expressions are not a replacement for a parser.
I would suggest that you pipe the HTML through htmltidy, then read it with a DOM parser, and then transform the DOM to your target output format. Is there anything preventing you from taking this route?
Parser FTW, ok. But if you can't use a parser, and you can be certain that <code> tags are never nested, you could try the following:
1. Find <code>.*?</code> sections of your file (you probably need to turn on dot-matches-newlines mode).
2. Replace all backslashes inside that section with something unique like #?#?#?#
3. Replace the section found in 1 with that new section
4. Replace all backslashes with $\backslash$
5. Replace all <code> with \begin{verbatim} and all </code> with \end{verbatim}
6. Replace #?#?#?# with \
FYI, regexes in PHP don't support variable-length lookbehind. So that makes this conditional matching between two boundaries difficult.
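The six steps above can be sketched as follows. This is a Python illustration rather than the asker's PHP, and it assumes, as the answer does, that <code> tags are never nested and that the placeholder string never occurs in the document; the function name to_latex is made up for the example.

```python
import re

PLACEHOLDER = "#?#?#?#"  # assumed never to occur in the document itself


def to_latex(html):
    # Steps 1-3: hide backslashes inside each <code> block behind the placeholder.
    protected = re.sub(
        r"<code>.*?</code>",
        lambda m: m.group(0).replace("\\", PLACEHOLDER),
        html,
        flags=re.DOTALL,
    )
    # Step 4: every backslash still present lies outside a code block.
    protected = protected.replace("\\", "$\\backslash$")
    # Step 5: swap the tags for a verbatim environment.
    protected = protected.replace("<code>", "\\begin{verbatim}")
    protected = protected.replace("</code>", "\\end{verbatim}")
    # Step 6: restore the protected backslashes.
    return protected.replace(PLACEHOLDER, "\\")
```

The order matters: the verbatim markers are inserted only after step 4, so their own backslashes are not escaped by mistake.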
Pandoc? Pandoc converts between a bunch of formats. You can also concatenate a bunch of files together and then convert them. Maybe a few shell scripts combined with your PHP scraping scripts?
With your "expected input" and the command pandoc -o text.tex test.html the output is:
The Hello \textbackslash{} World document is located in:
\verb!C:\documents\hello_world.txt!
pandoc can read from stdin, write to stdout or pipe right into a file.
Provided that your <code> blocks are not nested, this regex would find a backslash after ^ start-of-string or </code> with no <code> in between.
((?:^|</code>)(?:(?!<code>).)+?)\\
 |            |                 |
 |            |                 \-- backslash
 |            \-- least amount of anything not followed by <code>
 \-- start-of-string or </code>
And replace it with:
$1$\backslash$
You'd have to run this regex in "singleline" mode, so . matches newlines. You'd also have to run it multiple times, specifying global replacement is not enough. Each replacement will only replace the first eligible backslash after start-of-string or </code>.
Write a parser based on an HTML or XML parser like DOMDocument. Traverse the parsed DOM and replace the \ on every text node that is not a descendent of a code node with $\backslash$ and every node that is a code node with \begin{verbatim} … \end{verbatim}.
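As a rough illustration of the parser-based route, here is a sketch using Python's standard html.parser module. It is an event-based parser, not a DOM like PHP's DOMDocument, and the class and function names are made up for the example. All tags other than <code> are simply dropped, and only the backslash rule from the question is handled.

```python
from html.parser import HTMLParser


class LatexConverter(HTMLParser):
    """Escape backslashes in text nodes, but copy <code> contents verbatim."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.code_depth = 0  # greater than 0 while inside a <code> element

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self.code_depth += 1
            self.out.append("\\begin{verbatim}")

    def handle_endtag(self, tag):
        if tag == "code" and self.code_depth > 0:
            self.code_depth -= 1
            self.out.append("\\end{verbatim}")

    def handle_data(self, data):
        # Text outside <code> gets the LaTeX escape; text inside is left alone.
        if self.code_depth == 0:
            data = data.replace("\\", "$\\backslash$")
        self.out.append(data)


def html_to_latex(html):
    converter = LatexConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.out)
```

With the example input from the question, html_to_latex returns the expected output shown there. A full converter would need handlers for the other tags in the controlled HTML subset.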
