Delete first two characters from each line in a text file - sed - PHP

I am using a PHP file which executes sed:
shell_exec("C:\\cygwin64\\bin\\bash.exe --login -c 'sed -i -r \'s/.{2}//\' $text_files_path/File.txt 2>&1'");
This statement deletes the first 2 characters from File.txt.
How can I delete the first 2 characters from each line in the file?
File.txt:
< TTGCATGCAAAAATTT
< AAAAAAATTTTGCTGA
< AAGGTTCCCCCTTAGT
Edit 1:
shell_exec("C:\\cygwin64\\bin\\bash.exe --login -c 'sed -i -r 's/^..//' $text_files_path/File.txt 2>&1'");
This works, but it concatenates all the lines together:
File.txt after the above command:
TTGCATGCAAAAATTTAAAAAAATTTTGCTGAAAGGTTCCCCCTTAGT

Please don't call sed via bash to do something that PHP can do natively. It's a complete anti-pattern. Worryingly, I have seen the exact same thing in another question quite recently...
I hope you've got plenty of free disk space:
$input_filename = "$text_files_path/File.txt";
$output_filename = 'path/to/temp/output.txt';

$input_file = fopen($input_filename, 'rb');
$output_file = fopen($output_filename, 'wb');

while (($line = fgets($input_file)) !== false) {
    fwrite($output_file, substr($line, 2));
}

fclose($input_file);
fclose($output_file);

rename($output_filename, $input_filename);
Open the input file for reading and the temporary output file for writing. Use binary mode in both cases to avoid issues related to different line endings on different systems.
Read each line of the input and write the substring starting at offset 2 (i.e., with the first two characters dropped) to the temporary output.
Close both files and then overwrite the input with the temporary file.
Technically this could actually be implemented in-place but the resulting script would be much more complicated and you would run further risk of corrupting your input file if things went wrong.

If you just want to use PHP, then you can explode() the file's contents into individual lines and use substr() to drop the first two characters before joining the lines back into a single string separated by newlines:
// Set the results array.
$result = array();

// Read the file and split it into lines.
$file = $text_files_path . '/File.txt';
$lines = explode("\n", file_get_contents($file));

// Cut the first two characters off each line and add it to the results array.
foreach ($lines as $line) {
    $result[] = substr($line, 2);
}

// Join the result back into a single string.
$result = implode("\n", $result);

s/^..// should give you the result you need.
The ^ anchors the match to the start of the line, and each . matches any single character.
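For reference, here is one way the full call could be quoted (a sketch only, reusing the cygwin path and $text_files_path from the question; paths containing spaces would need extra care). Putting the bash -c argument in double quotes avoids having to backslash-escape the single quotes around the sed script, which is the sort of nested quoting that easily mangles the command:
// Sketch: the sed script stays in single quotes; the bash -c argument
// is double-quoted so nothing inside it needs backslash-escaping.
$cmd = "C:\\cygwin64\\bin\\bash.exe --login -c \"sed -i 's/^..//' $text_files_path/File.txt 2>&1\"";
echo shell_exec($cmd);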

Related

Reading a log file into a reversed array: is it the best method when looking for a keyword near the bottom?

I am reading from log files which can be anything from a small file up to 8-10 MB of logs; the typical size is probably around 1 MB. The key thing is that the keyword I'm looking for is normally near the end of the document, in probably 95% of cases. I then extract the 1000 characters after the keyword.
If I use this approach:
$lines = explode("\n", $body);
$reversed = array_reverse($lines);
foreach ($reversed as $line) {
    // Search for my keyword
}
would it be more efficient than using:
$pos = stripos($body,$keyword);
$snippet_pre = substr($body, $pos, 1000);
What I am not sure about is how stripos() works: does it just search through the document one character at a time, so that if there are 10,000 characters after the keyword I won't have to read those into memory? The first option, on the other hand, has to read everything into memory even though it probably only needs the last 100 lines. Could I alter it to read 100 lines into memory, then search lines 101-200 if the first 100 were not successful, or is the operation so light that it doesn't really matter?
I have a second question, assuming the array_reverse() approach is the best one: how would I extract the next 1000 characters after I have found the keyword? Here is my woeful attempt:
$body = $this_is_the_log_content;
$lines = explode("\n", $body);
$reversed = array_reverse($lines);
foreach ($reversed as $line) {
    $pos = stripos($line, $keyword);
    $snippet_pre = substr($line, $pos, 1000);
}
The reason I don't think that will work is that each $line might only be a few hundred characters long, so would the better solution be to split the content into chunks of, say, 2,000 characters and keep the previous chunk as a backup variable? Something like this:
$body = $this_is_the_log_content;
$lines = str_split($body, 2000);
$reversed = array_reverse($lines);
$previous_line = $line;
foreach ($reversed as $line) {
    $pos = stripos($line, $keyword);
    if ($pos) {
        $line = $previous_line . ' ' . $line;
        $pos1 = stripos($line, $keyword);
        $snippet_pre = substr($line, $pos, 1000);
    }
}
I'm probably massively over-complicating this?
I would strongly consider using a tool like grep for this. You can call this command line tool from PHP and use it to search the file for the word you are looking for and do things like give you the byte offset of the matching line, give you a matching line plus trailing context lines, etc.
Here is a link to the grep manual: http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
Play with the command a bit on the command line to get it the way you want it, then call it from PHP using exec(), passthru(), or similar depending on how you need to capture/display the content.
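For example (a sketch only; $keyword and $logfile are placeholders you would supply, and -F, -b and -A are standard GNU grep options):
// Print the byte offset (-b) of each line containing the keyword, treated
// as a fixed string (-F), plus 5 lines of trailing context (-A 5).
$cmd = 'grep -F -b -A 5 ' . escapeshellarg($keyword) . ' ' . escapeshellarg($logfile);
exec($cmd, $output);
print_r($output);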
Alternatively, you can simply fopen() the file and use fseek() to position the file pointer near the end, then search for the string from there. Once you find your needle, you can read the file from that offset until you reach the end of the file or the number of log entries you want.
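A rough sketch of that idea (my assumptions, not code from the answer: the match lies within the last 64 KB of the file, and you want the keyword plus the following characters, as in the question's own snippet):
$window = 65536; // only search the last 64 KB of the log
$size = filesize($logfile);
$fp = fopen($logfile, 'rb');
fseek($fp, max(0, $size - $window));
$tail = stream_get_contents($fp);
fclose($fp);

// Find the last occurrence of the keyword in the tail and grab
// the 1000 characters starting at that position.
$pos = strripos($tail, $keyword);
if ($pos !== false) {
    $snippet = substr($tail, $pos, 1000);
}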
Either of these might be preferable to reading the entire log file into memory and then trying to work with it.
The other thing to consider is whether 1000 characters is meaningful. Typically log files would have lines that vary in length. To me it would seem that you should be more concerned about getting the next X lines from the log file, not the next Y characters. What if a line has 2000 characters, are you saying you only want to get half of it? That may not be meaningful at all.

Ensure fgetcsv() reads the entire line

I am using PHP to import data from a CSV file using fgetcsv(), which yields an array for each row. Initially, I had the character limit set at 1024, like so:
while ($data = fgetcsv($fp, 1024)) {
    // do stuff with the row
}
However, a CSV with 200+ columns surpassed the 1024 limit on many rows. This caused the line read to stop in the middle of a row, and then the next call to fgetcsv() would start where the previous one left off and so on until an EOL was reached.
I have since upped this limit to 4096, which should take care of the majority of cases, but I would like to put a check in to be sure that the entire line was read after each line is fetched. How do I go about this?
I was thinking to check the end of the last element of the array for end of line characters (\n, \r, \r\n), but wouldn't these be parsed out by the fgetcsv() call?
Just omit the length parameter. It's optional in PHP 5.
while ($data = fgetcsv($fp)) {
    // do stuff with the row
}
Just don't specify a limit, and fgetcsv() will slurp in as much as is necessary to capture a full line. If you do specify a limit, then it's entirely up to YOU to scan the file stream and ensure you're not slicing something down the middle.
However, note that not specifying a limit can be risky if you don't have control over the generation of this .csv in the first place. It would be easy to swamp your server with a malicious CSV that has many terabytes of data on a single line.
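If you do want to keep a hard cap, one option (a sketch of my own, not from the answers above, and it assumes fields do not contain embedded newlines) is to read the raw line with fgets() using the same limit and check for a trailing newline before parsing with str_getcsv(); fgets() stops at either the newline or the length limit, so a missing newline on a non-final line means the line was longer than your cap:
$limit = 4096;
while (($raw = fgets($fp, $limit)) !== false) {
    if (substr($raw, -1) !== "\n" && !feof($fp)) {
        // The line was longer than $limit and got cut short:
        // bail out (or consume the rest of the line) instead of
        // silently parsing a partial row.
        break;
    }
    $data = str_getcsv($raw);
    // do stuff with the row
}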
Thank you for the suggestions, but these solutions don't really solve the issue of accounting for the longest line while still providing a limit. I was able to accomplish this by using the wc -L UNIX command via shell_exec() to determine the longest line in the file before starting the line fetching. The code is below:
// open the CSV file to read lines
$fp = fopen($sListFullPath, 'r');

// use wc to figure out the longest line in the file
$longestArray = explode(" ", shell_exec('wc -L ' . $sListFullPath));
$longest_line = (int)$longestArray[0] + 4; // add a little padding for EOL chars

// check against a user-defined maximum length
if ($longest_line > $line_length_max) {
    // alert user that the length of at least one line in the CSV is too long
}

// read in the data
while ($data = fgetcsv($fp, $longest_line)) {
    // do stuff with the row
}
This approach ensures that every line is read in its entirety and still provides a safety net for really long lines without stepping through the entire file with PHP line by line.
I would be careful with your final solution. I was able to upload a file named /.;ls -a;.csv to perform command injection. Make sure you validate the file path if you use this approach. Also, it might be a good idea to provide a default length in case your wc call fails for any reason.
// use wc to find max line length
// uses a hardcoded default if wc fails
// this is relatively safe from command
// injection since the file path is a tmp file
$wc = explode(" ", shell_exec('wc -L ' . $validated_file_path));
$longest_line = (int)$wc[0];
$length = ($longest_line) ? $longest_line + 4 : $default_length;
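If the path is not guaranteed to be a temporary file you control, wrapping it in escapeshellarg() is a cheap extra safeguard (a sketch using the same variables as above):
// Same as above, but the path is shell-escaped before being handed to wc.
$wc = explode(" ", shell_exec('wc -L ' . escapeshellarg($validated_file_path)));
$longest_line = (int)$wc[0];
$length = ($longest_line) ? $longest_line + 4 : $default_length;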
fgetcsv() is normally used to read a CSV file line by line, but when it does not behave that way, you may need to check the line-ending (PHP_EOL) handling on your machine.
Simply open:
C:\xampp\php\php.ini
and search for:
;auto_detect_line_endings = Off
Uncomment it and set it to:
auto_detect_line_endings = On
Then restart Apache and check again; it should work.
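If editing php.ini is not an option, the same setting can also be toggled at runtime (a sketch; $csv_path is a placeholder, and the setting must be in effect before the file is opened):
// Enable line-ending auto-detection for streams opened after this call.
ini_set('auto_detect_line_endings', '1');

$fp = fopen($csv_path, 'r');
while (($data = fgetcsv($fp)) !== false) {
    // do stuff with the row
}
fclose($fp);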

How to pass a file as an argument to php exec?

I would like to know how I can pass the content of a file (a CSV in my case) as an argument to a command-line executable (in C or Objective-C) that is called by exec() in PHP.
Here is what I have done so far: the user loads the content of their file from a URL like this:
http://www.myserver.com/model.php?fileName=test.csv
Then the following code lets PHP parse and load the CSV file:
<?php
$f = $_GET['fileName'];
$handle = fopen("$f", "r");
$data = array();
while (($line = fgetcsv($handle)) !== FALSE) {
    $data[] = $line;
}
?>
Where I'm stuck is how to pass the content of this CSV file as an argument to exec(). Even if I can assume the CSV is known to have only two columns, the number of rows is user-specific, so I cannot pass all the values one by one as parameters, e.g.
exec("/path_to_executable/model -a $data[0][0] -b $data[0][1] .....");
The only alternative solution I can think of would be to write something like this:
exec("/path_to_executable/model -fileName test.csv");
and have the command-line executable do the CSV parsing, but in that case I think I need to have the CSV file physically written on the server side. I'm wondering what happens if several people access the page at the same time with their own different CSV files: would they overwrite each other's files?
I guess there must be a much more proper way to do this that I have not figured out. Any ideas? Thanks!
I would recommend having that data on disk and loading it within the command-line utility; it is much less messing about. But if you can't do that, just pass it in one (unparsed) line at a time:
$command = "/path_to_executable/model";
foreach ($fileData as $line) {
    $command .= ' ' . escapeshellarg($line);
}
exec($command);
Note that escapeshellarg() already wraps each value in quotes, so there is no need to add your own. You can then fetch the data into your utility by looping over argv, where argv[1] is the first line, argv[2] is the second line, and so on (argv[0] is the program name).
You could use popen() to get a handle on the process to write to. If you need to go both ways (read/write) and might require some more power, have a look at proc_open().
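A minimal sketch of the popen() route, assuming the executable reads the CSV from its standard input ($csvFilename is a placeholder; the /path_to_executable/model path is taken from the question):
// Stream the CSV to the executable's standard input, one line at a time.
$proc = popen('/path_to_executable/model', 'w');
foreach (file($csvFilename) as $line) {
    fwrite($proc, $line);
}
pclose($proc);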
You could also just write your data to some random file (to avoid multiple users kicking each other's race-conditioned butts). Something along the lines of
<?php
$csv = file_get_contents('http://www.myserver.com/model.php?fileName=test.csv');
$filename = '/tmp/' . uniqid(sha1($csv)) . '.csv';
file_put_contents($filename, $csv);

exec('/your/thing < ' . escapeshellarg($filename));
unlink($filename);
And since you're also in charge of the executable, you might figure out how to get the number of arguments passed (hint: argc) and read them in (hint: argv). Passing them through line by line looks like this:
<?php
$fh = fopen($filename, 'r'); // the temporary CSV written above
$csvRow = fgetcsv($fh);
if ($csvRow) {
    $escaped = array_map('escapeshellarg', $csvRow);
    exec('/your/thing ' . join(' ', $escaped));
}

How do I loop through two files and combine them?

I have two text files and want to loop through both, combining each pair of lines (line 1 of the first text file with line 1 of the second text file, and so on for thousands of lines) and then doing some function with the result.
I am familiar with looping through one file, for which the code is given below:
$lines = file('data.txt');
foreach ($lines as $line) {
    // some function
}
But how do I do this for two files and combine both lines?
Not sure what you mean by search through the table, but to open both files and do stuff with them:
$file1 = fopen("/path/to/file1.txt", "r"); // Open file with read-only access
$file2 = fopen("/path/to/file2.txt", "r");
$combined = fopen("/path/to/combined.txt", "w"); // In case you want to write the combined lines to a new file

while (!feof($file1) && !feof($file2)) {
    // Grab a line of each file; note that trim() clips the carriage return/newline
    // off the end of the line -- remove it if you don't need that.
    $line1 = trim(fgets($file1));
    $line2 = trim(fgets($file2));

    $combline = $line1 . $line2;

    // Write to the new combined file, adding a carriage return/newline at the
    // end of the combined line to replace the one trimmed off.
    fwrite($combined, $combline . "\r\n");

    // You can do whatever with the data from $line1, $line2, or the combined
    // $combline after getting them.
}
Note: you might run into trouble if you hit the end of one file before the other, which will happen whenever they aren't the same length. You might need some if statements to set $line1 or $line2 to "" once feof() is true for their respective files, so that the loop keeps running until both files have reached the end of file.
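A sketch of that adjustment (my own addition, not part of the original answer): loop until both files are exhausted and substitute an empty string for whichever file has already ended:
while (!feof($file1) || !feof($file2)) {
    $line1 = feof($file1) ? "" : trim(fgets($file1));
    $line2 = feof($file2) ? "" : trim(fgets($file2));
    fwrite($combined, $line1 . $line2 . "\r\n");
}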
You can do this programmatically as Crayon and Tim have shown. If both files have the same number of lines, it should work. If the line counts differ, you will have to loop over the larger file to make sure you get all lines, or check EOF on both.
To combine line by line, I often use the unix command paste which is very fast. This also accounts for files with different lengths. Run this on the command line:
paste file1 file2 > output.txt
See the manpage for paste for command line options, field delimiters.
man paste
Example:
$file1 = fopen("file1.txt", "rb");
$file2 = fopen("file2.txt", "rb");

while (!feof($file1)) {
    $combined = fread($file1, 8192) . " " . fread($file2, 8192);
    // now insert $combined into db
}

fclose($file1);
fclose($file2);
You will want to use the longer of the two files in the while condition.
You may need to adjust the number of bytes read in fread() depending on how long your lines are.
Change " " to whatever delimiter you want.

sed doesn't work with PHP exec function

echo "sed -i 's/NULL/\\N/g' ".$_REQUEST['para'].".sql";
The above statement works. But it fails when I use it in exec like this...
exec("sed -i 's/NULL//\/\/\N/g' ".$_REQUEST['para'].".sql");
You should escape backslashes with backslashes, not with forward slashes, like this:
exec("sed -i 's/NULL/\\\\N/g' ".$_REQUEST['para'].".sql");
EDIT I wrote the answer without looking at what the code actually does. Don't do this, because $_REQUEST['para'] can be whatever the user wants, which can be used for code injection. Use the PHP functions as the other answer suggests.
Although it's entirely up to you, my advice is not to call system commands unnecessarily. In PHP, you can use preg_replace() to do what sed does here.
preg_replace("/NULL/", "\\N", file_get_contents($_REQUEST['para'] . ".sql"))
Building on ghostdog's idea, here's code that will actually do what you want (the snippet above never writes the replaced text back to the file):
//basename protects against directory traversal
//ideally we should also do a is_writable() check
$file = basename($_REQUEST['para'].".sql");
$text = file_get_contents($file);
$text = str_replace('NULL', '\\N', $text); //no need for a regex
file_put_contents($file, $text);
Admittedly, however, if the file in question is more than a few meg, this is inadvisable as the whole file will be read into memory. You could read it in chunks, but that'd get a bit more complicated:
$file = basename($_REQUEST['para'].".sql");
$tmpFile = tempnam("/tmp", "FOO");

$in = fopen($file, 'r');
$tmp = fopen($tmpFile, 'w');

while ($line = fgets($in)) {
    $line = str_replace('NULL', '\\N', $line);
    fputs($tmp, $line);
}

fclose($tmp);
fclose($in);

rename($tmpFile, $file);
If the file is 100+ meg, honestly, calling sed directly like you are will be faster. When it comes to large files, the overhead of trying to reproduce a tool like sed/grep with its PHP equivalent just isn't worth it. However, you need to at least take some steps to protect yourself if you're going to do so:
Taking some basic steps to secure amnom's code:
$file = basename($_REQUEST['para'].".sql");
if (!is_writable($file))
    throw new Exception('bad filename');
exec("sed -i 's/NULL/\\\\N/g' ".escapeshellarg($file));
First, we call basename(), which strips any path from our filename (e.g., if an attacker submitted the string '/etc/passwd', we'd at least now be limiting them to the file 'passwd' in the current working directory).
Next, we ensure that the file is, in fact, writable. If not, we shouldn't continue.
Finally, we escapeshellarg() on the file. Failure to do so allows arbitrary command execution. e.g., if the attacker submitted the string /etc/passwd; rm -rf /; #, you'd end up with the command sed 's/blah/blah/' /etc/passwd; rm -rf /; #.sql. It should be clear that while that exact command may not work, finding one that actually would is trivial.
