Perl Regex: separate by spaces and tabs but avoid spaces on filenames? - php

I'm parsing the output of a command line application that looks like the following:
0644 1276317623781623132132 Crappy little message filename.txt
0644 1276317623781623132132 Crappy little message My File.txt
0644 1276317623781623132132 Crappy little message Crazy FILE.txt
Sometimes fields are spaced by tabs, sometimes by spaces. How can I write a Regex to separate the fields? I was using preg_split with [\s]+, but this messes up the message and file names. I'm pretty lost here.

The solution is to build a more specific regex that matches the whole line instead of splitting on whitespace.
For example, assuming the separator before the filename is a tab, you can capture the fields with:
preg_match('/^([0-9]{4}).*([0-9]{22})[\s]*([^\t]*)[\s]*(.*)$/', $string, $aMatches);
You can vary that to suit your needs if the format above fluctuates. If the last separator is not a tab but a run of spaces, match the required number of spaces instead.
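For comparison, here is a minimal Python sketch of the same match-the-whole-line idea using the standard re module. Like the answer, it assumes a tab (or run of tabs) separates the message from the filename. LINE_RE and parse_line are hypothetical names, and \s+ stands in for the original .* and [\s]* between the fixed fields.

```python
import re

# Hypothetical helper mirroring the preg_match() pattern above.
# Assumption: the 4-digit mode and 22-digit number are whitespace-separated,
# and a tab (or run of tabs) separates the message from the filename.
LINE_RE = re.compile(r'^([0-9]{4})\s+([0-9]{22})\s+([^\t]*)\t+(.*)$')

def parse_line(line):
    """Return (mode, number, message, filename), or None if the line doesn't fit."""
    m = LINE_RE.match(line.rstrip('\r\n'))
    return m.groups() if m else None

sample = "0644\t1276317623781623132132\tCrappy little message\tMy File.txt"
print(parse_line(sample))
```

Because [^\t]* cannot cross a tab, spaces inside the message and the filename survive intact. A line with no tab at all returns None; if your output uses a run of spaces instead, you could try swapping \t+ for two or more spaces, provided the message itself never contains such a run.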

Related

Easy way to find file with ?> at the end?

I am getting this error:
Getting Warning - Cannot modify header information
I'm 99% sure it's because of a file ending with ?> and then some white space after that.
My problem is, I have looked at 15 possible files, but there are hundreds more to check. Is there an easy linux command to find the files ending with ?> and some whitespace after it? Or perhaps is there another way you guys solve this?
You are facing an EOF problem.
The whitespace after the closing ?> at the end of a file is sent as output, which breaks any later header() call; you need to find every file that ends with ?> followed by whitespace.
You can use a regex with a project-wide find tool; the regex would be: \?>\s+\z. The ? must be escaped, because an unescaped (?> starts an atomic group instead of matching a literal ?>.
The \z anchor matches only at the very end of the subject, so this finds ?> plus whitespace only at EOF.
I recommend Sublime Text 3, because its search and replace supports regex; there are Sublime Text find & replace examples if you want to learn how.
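If you would rather scan the tree from the command line than in an editor, here is a minimal Python sketch of the same check. It reads each .php file as bytes and tests the escaped pattern against the whole file. TRAILING_CLOSE_TAG and files_with_trailing_whitespace are hypothetical names, and Python spells PCRE's \z anchor as \Z. Note that PHP swallows a single newline directly after ?>, so a file ending in exactly "?>\n" is reported here but is harmless.

```python
import os
import re

# "?>" then one or more whitespace bytes, then the very end of the file.
# The "?" is escaped; in PCRE, "(?>" would open an atomic group instead.
# Python spells PCRE's end-of-subject anchor \z as \Z.
TRAILING_CLOSE_TAG = re.compile(rb'\?>\s+\Z')

def files_with_trailing_whitespace(root, suffix='.php'):
    """Yield paths under root whose content ends with ?> followed by whitespace."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as fh:  # bytes: no decoding errors on odd encodings
                if TRAILING_CLOSE_TAG.search(fh.read()):
                    yield path
```

Usage, with a hypothetical project path: for path in files_with_trailing_whitespace('path/to/project'): print(path)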

PHP preg_match match consecutive newline chars

I am trying to prevent certain kinds of posts on my site, which are mostly meant to make it look like they contain some content but are just spam. Specifically, the posts are a few random words, some newline characters, and a random character.
So, I know some legit users might have use for using two newline chars (to create a blank line between paragraphs), but I figure 3+ can be marked as spam.
I tested this regex on regex101 and it works fine, but it never triggers when I test on my site. Any ideas why? When I uncomment the echo line, it shows the number 4 for my test data, so I know it sees the newlines. Is my regex formed incorrectly?
Test data:
This is a potential
spam post
Code:
//echo substr_count($lowercaseBody, "\n");
if (preg_match('/\n{3,}./', $lowercaseBody)){
error("Stop Spamming my chan you .");
}
The data likely contains CRLFs, not just LFs.
The substr_count test does not care about the interleaved CRs, but your regex pattern does.
Use (\r?\n) instead of \n to allow both CRLFs and LFs (browsers typically submit textarea line breaks as CRLF, while other clients and operating systems may send bare LFs):
if (preg_match('/(\r?\n){3,}./', $lowercaseBody)){
error("Stop Spamming my chan you .");
}
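The CRLF explanation above can be checked with a small Python sketch using the re module, whose \r, \n and {3,} behave like PCRE's here. The body string is hypothetical test data with CRLF line endings, as a browser form would typically submit it.

```python
import re

# Hypothetical spam-like body with CRLF line endings.
body = "this is a potential\r\n\r\n\r\n\r\nspam post"

lf_only = re.search(r'\n{3,}.', body)          # original pattern
crlf_aware = re.search(r'(\r?\n){3,}.', body)  # suggested pattern

print(body.count("\n"))        # 4, like substr_count($lowercaseBody, "\n")
print(lf_only is not None)     # False: a \r sits between each pair of \n
print(crlf_aware is not None)  # True
```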

PHP Detect 1 or more unicode spaces only

I have searched in vain to find a fix for this issue. I have an editable field in a web page that contains a user-entered space. When I copy the space and enter it into a program called IVI32, which I guess you would call a Unicode text program, I get the following info.
The space character is defined as FFFE2000. I need to detect when this field has one or more of these spaces and nothing else. I have tried the following with preg_match:
'/\s+/u'
'/^[0 :-]+$/ '
'/\A\s*\z/'
Nothing works and I am completely stumped. Any help from some Unicode experts out there will be greatly appreciated.
There was an error in my code that was preventing anything from working properly (the product of no time off!). Here's what works for anyone else who wants to detect whether an element contains only whitespace that PHP's trim() cannot remove:
if (!preg_match('/\\s/', $test_string)) { /* do something */ }
if (!preg_match('/\s+/u', $test_string)) { /* do something */ }
if (!preg_match('/[\pZ\pC]+/u', $test_string)) { /* do something */ }
For anyone who is interested, the space is pasted immediately after the end of this sentence.
Would this work?
preg_match('/^ +$/', $subject);
It matches one or more ordinary spaces (U+0020) and nothing else, whereas \s also matches tabs and newlines (and, in Unicode mode, non-breaking spaces).
Have a try with:
/\p{WhiteSpace}/
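The key detail for "only whitespace" is anchoring: an unanchored \s+ only reports that some whitespace exists somewhere. Here is a minimal Python sketch of the anchored check; Python 3's str patterns are Unicode-aware by default, so \s covers U+2000 (EN QUAD) and the non-breaking space, unlike PHP's trim(), which by default strips only " \t\n\r\0\x0B". is_only_whitespace is a hypothetical name. In PHP the analogous idea would be an anchored pattern with the u modifier, such as '/^[\pZ\s]+$/u', which is untested here.

```python
import re

# fullmatch() anchors at both ends, like ^...$ (or \A...\z) in PCRE.
ONLY_WHITESPACE = re.compile(r'\s+')

def is_only_whitespace(text):
    """True if text is non-empty and every character is Unicode whitespace."""
    return ONLY_WHITESPACE.fullmatch(text) is not None

print(is_only_whitespace("\u2000\u2000"))  # True: two EN QUAD characters
print(is_only_whitespace(" \u00a0 "))      # True: includes a non-breaking space
print(is_only_whitespace("\u2000x"))       # False: contains a letter
print(is_only_whitespace(""))              # False: empty field
```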

Converting entire source code from tabs to 4 spaces

I just realized after committing the CakePHP source to GitHub that they're now using tabs to indent code rather than four spaces. They also define this in the .editorconfig file, which I've changed to this:
root = true
[*]
indent_style = space
indent_size = 4
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
Is there a way to run through the entire source code and safely convert all tabs to four spaces for indentation? My reasoning is every developer on the repo uses four spaces and mixing and matching will cause the code to look out of place when looking at it on GitHub. And I'm just a fan of consistency :)
If I'm going down the home-brew route and writing my own script for this, I don't really mind which language, although I'm more confident in PHP (not the best suited for the job, I know). Is this as simple as doing a preg_replace('~\t~', '    ', $fileText) on each file?
Try this in the directory you wish to execute it in:
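find . -type f -name '*.php' -not -path './.git/*' -exec sed -i 's/\t/XXXX/g' {} \;
That should replace each tab with 4 spaces (once you replace the X's with spaces). The -name '*.php' test limits it to PHP sources, so images and other binary files are left alone (add more -name tests for other file types), and -not -path './.git/*' keeps it out of the Git metadata.
Adjust the number of spaces between t/ and /g to however many you want: just get rid of the X's and put spaces there. This assumes GNU sed, which understands \t in the pattern and accepts -i without a backup suffix.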
A straight replacement of tabs with spaces misaligns text whenever a tab follows other characters within the same tab stop, because such a tab should only pad to the next tab stop, not a full four columns.
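A tiny Python illustration of that misalignment, comparing a plain replace() with str.expandtabs() at a tab width of 4; the sample line is made up.

```python
# A tab that follows two characters inside the first 4-column tab stop.
line = "ab\tc"

naive = line.replace("\t", "    ")  # every tab becomes exactly 4 spaces
aligned = line.expandtabs(4)         # the tab pads only to the next multiple of 4

print(repr(naive))    # 'ab    c': 'c' lands in column 6
print(repr(aligned))  # 'ab  c': 'c' lands in column 4, as an editor shows it
```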
A basic Python script using the expandtabs() string method keeps the code looking the same as when it was written. Example is for a tab width of 4:
#!/usr/bin/env python3
#
# convert source code or text from tabs to spaces, being careful to align text as it was conceived
# with the original tab space settings, which is defaulted to 4 spaces per tab.
#
# usage:
# ./tabs2spaces.py <file_to_convert>
import os
import sys

spaces_per_tab = 4

if len(sys.argv) < 2:
    sys.exit('no file argument specified')  # stop instead of crashing on sys.argv[1]

filename = sys.argv[1]
# keep a backup beside the original: dir/file.php -> dir/old_file.php
head, tail = os.path.split(filename)
old_filename = os.path.join(head, 'old_' + tail)
os.rename(filename, old_filename)

# surrogateescape round-trips any bytes; newline='' keeps the original line endings
opts = dict(encoding='utf-8', errors='surrogateescape', newline='')
with open(old_filename, 'r', **opts) as fo, open(filename, 'w', **opts) as fn:
    for line in fo:
        # expandtabs() pads each tab to the next multiple of spaces_per_tab columns
        fn.write(line.expandtabs(spaces_per_tab))
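The script converts one file per run. To cover the entire source tree as the question asks, a minimal Python sketch could walk the directory and apply expandtabs() to each matching file in place. TAB_WIDTH, SKIP_DIRS, EXTENSIONS and convert_tree are hypothetical names, and the .php/.ctp extension list is an assumption about which files to touch. Literal tab characters inside strings or heredocs are expanded too, so review the diff before committing.

```python
import os

# Hypothetical settings: adjust to the project being converted.
TAB_WIDTH = 4
SKIP_DIRS = {'.git'}           # never rewrite version-control metadata
EXTENSIONS = ('.php', '.ctp')  # assumed source file types to convert

# surrogateescape round-trips undecodable bytes; newline='' keeps line endings
OPEN_OPTS = dict(encoding='utf-8', errors='surrogateescape', newline='')

def convert_tree(root, tab_width=TAB_WIDTH):
    """Expand tabs in place for matching files under root; return changed paths."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]  # prune the walk
        for name in sorted(filenames):
            if not name.endswith(EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            with open(path, 'r', **OPEN_OPTS) as fh:
                text = fh.read()
            # expandtabs() restarts its column count after every \n or \r
            converted = text.expandtabs(tab_width)
            if converted != text:
                with open(path, 'w', **OPEN_OPTS) as fh:
                    fh.write(converted)
                changed.append(path)
    return changed
```

Returning only the changed paths makes it easy to review what was touched before committing.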
Not sure if you have access to or already use Sublime Text 2, but it can automatically convert all the tabs to spaces for you:
How to replace four spaces with a tab in Sublime Text 2?

I need to create a horizontal list from a vertical list of 500 words for a function

I'm creating a 'bad words' filter using str_ireplace() and I have a list of about 500 bad words, all in a long vertical list. Any idea how I could quickly and easily turn it into a horizontal, comma-delimited list without manually typing a comma after every word and backspacing?
And yes.. I could probably do this in 20 minutes, but I've had this problem before so I'm asking for all the future times I run into this too.
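I'd just use find and replace. If whatever editor you use for coding can't cope with finding carriage returns, try Word, Notepad++, etc.
In PHP it would be something like this (list "\r\n" first: str_replace() works through the search array in order, so replacing "\r" first would turn every CRLF into two separators):
str_replace(array("\r\n", "\r", "\n"), ", ", $string);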
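Or load the list straight into an array, which str_ireplace() also accepts as its search argument (FILE_IGNORE_NEW_LINES drops the trailing newline from each word):
$file = file("list.txt", FILE_IGNORE_NEW_LINES);
print_r($file);
Or, if you want to do it from the shell, this sed one-liner does it: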
sed -e :a -e '$!N;s/\n/, /;ta' list.txt
file_put_contents('filename', implode(',', file('filename', FILE_IGNORE_NEW_LINES)));
This rewrites the file in place as a single comma-separated line.
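The same idea as a small Python sketch, for when the list lives outside a PHP project; to_comma_list is a hypothetical helper. splitlines() treats \n, \r\n and \r alike, so CRLF files produce no doubled separators, and blank lines are skipped.

```python
def to_comma_list(text, sep=", "):
    """Join one-word-per-line text into a single delimited line, skipping blanks."""
    return sep.join(w.strip() for w in text.splitlines() if w.strip())

print(to_comma_list("foo\r\nbar\nbaz\n"))  # foo, bar, baz
```

To rewrite a file, read it, pass the contents through to_comma_list(), and write the result back.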
