Remove lines with specific pattern at the beginning of them

Remove lines with specific pattern at the beginning of them - php

I have a text file of approximately 25,000 lines. About 525kb.
Some lines have random text at the beginning.
Some have long strings of semicolons.
Some others only have three semi-colons and then a space and optionally more text on the same line. These are the lines I want to remove.
Here is a sample....
;;; Updated Time 20120706122706
;;; Generic DEveloper Output
;;; Some Random Comments
;;; I got some more...
;;; Yet another uneeded line
;;; Thanks for using StackOverflow <http://stackoverflow.com>, or...
;;; Not.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Banana Production
[Data_Release_Version]
Version=12586
Released=20120706122706
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; Baseline Properties
[BaseLineProperties]
Comment=BaselineProperties
----- and so on.
Once it gets to the first line with 4 or more ; on the line, I need the rest of the file as there are no ";;; " lines.
Trying to find something fast instead of reading everything line and writing it back out if it doesn't match ";;; ".
File is ASCII (possibly UTF-8) text type file.
Any ideas?
Thank you for your time, assistance and knowledge.

What I would suggest is to use file_get_contents() and save file's contents in a variable as a string, then use explode() that string at every newline character, then in a foreach loop, use preg_match() to check if the line begins with 3 semicolons and a space, if it dosent, put it in another array named $output. After foreach, implode() $output and add a newline character and use file_put_contents() to print it in another file. Hope this helps :-)
code:
<?php
$string = file_get_contents($filename);
$array = explode("\n",$string);
foreach($array as $arr) {
if(!(preg_match("^;;;\s",$arr))) {
$output[] = $arr;
}
}
$out = implode("\n",$output);
file_put_contents($path,$out);
?>

Depends.. I would try to load into a string, then do a explode() with newline, so it's in array, then run a foreach with a skip on any that doesnt have strpos == 0 -AND- strpos !== false, you can put in a continue to skip to the next line if it doesnt match.
Another option, is to parse, and skip, or even using fseek, and such. Depends on alot of different factors to determine whats going to be fastest.
You can implode later on, and add the newlines back in, and then push out a file, and/or use line breaks. Depending where the output is supposed to go.

I think you gave the answer yourself:
Make a script that reads the input file line by line in a loop (while). It writes every line into an output file if two conditions are met: 1. a flag ("done") is FALSE and 2. the line does NOT start with ";;; " (not the blank). This removes those lines starting with three semicolons. Once you come about a line containing more semicolons you set the flag to TRUE, thus the remaining lines wil be copied without being examined.

Related

PHP - Write string in a certain position of a file

I have the following string in a text file:
<!--
///////////////////////////////////////////////////////////////////////////////////////
//
// Individuals
//
///////////////////////////////////////////////////////////////////////////////////////
-->
I need to write text two lines below the above part. There are many other sections like the one above, but each one doesn't repeat, so I know there is only one "Individuals".
I need to overwrite the file, so how can I set the position of my output to there?

There are two approaches:
Read the file line by line. Check each line if it matches to certain conditions. If yes, insert lines.
Read the full file, replace the text wanted and write the full file.
As long as your traffic is not too high, method 2 will be simpler. In order to replace the full block, I suggest this regex:
/(<!--[\s\r\n/]*Individuals[\s/]*)[^\s/]*([\s/]*-->)/
It matches the full block containing "Individuals" followed by some content on the new line. The block before the separate content is being caught by parenthesis, in order to do a replace:
$content = file_get_contents('myfile.txt');
$newContent = preg_replace("#(<!--[\s\r\n/]*Individuals[\s/]*)[^\s/]*([\s/]*-->)#", "$1$myNames$2", $content);
file_put_contents('myfile.txt', $newContent);
The preg_replace concatenates the first part of the block, some slashes and "Individuals", together with $myNames, and together with the closing part of the block to a new block.
I have not tested this finally in the shell, so I apologize for a possible typo. But I'm sure this will work out.

re writing or formatting contents in .txt file to add numbering in front and semi colon at the end

I have a .txt file with content(see first image) I need the content in such a way that It should be numbered and comma at the end of every line(see second image).
I want to insert say: "1"=>" in front of the first line. The numbering will increase on the second line so having about 4747 lines the last number will be 4747.
then insert: ", at the end of every line.
I have some knowlege in PHP so if somebody has solution or idea that will be helpful. I have been formatting this manually and that is very time consuming

Using PHP:
open your original file (consider fopen() or file_get_contents)
Split into an array by newline (explode())
process array line by line (for or foreach)
prepend the current line number
append quote and comma
loop to next array member
When finished - save content to file (file_put_contents())
That should be enough for you to work it out for yourself!

Use PHP's file(), which will give you an array with one element for each line. After that it should be a simple matter of iterating over the array with foreach and concatenating the appropriate bits onto the string. Then use fopen() and fwrite() to write the edited lines to your output file.

Counting effective line count in sources by PHP

I am writing a PHP program which reads file with file_get_contents then attempts to count effective lines in that source file. It must not count empty lines or lines containing comments only. Sample file:
<?php
/**
* blah blah
*/
class Test {
// testfunc
function testfunc(){
return;
}
}
The number of lines in such a file should be 5. Here is what I've got so far:
$f=file_get_contents($this->file);
$f=preg_replace('|/\*.*?\*/|s','',$f);
$f=preg_replace('/^\s*$/','',$f); // <-- does not work
$f=preg_replace("/\n\n*/s","\n",$f);
$count=count(explode("\n",$f));
But for some reason it does not eliminate white-spaces. Is there a better way to get this done?
The following code does the job, since I don't care much about the spaces, but I still wonder, why my original line labeled "does not work" is not removing spaces from empty lines. Is there some extra character at the end? File format is unix.
$f=preg_replace('/ */','',$f); // removes all spaces properly.

Change /^[\s\t]*$/ to be /^\s*$/ms and that should fix it.
The \s class includes tabs, so no need to add \t. The s makes it match newline characters and the m option makes ^ and $ work when data contains multiple lines (matches line breaks).
Also, it might be better to change /\n\n/s to be /[\r\n]{2,}/.

I would just use trim() and then test each line.
foreach ($lines as $line) {
if (strlen(trim($line)) > 0) {
$total++;
}
}
Then, you're set up to test for other conditions as well, such as comment lines and what not. I suspect that this will be faster than doing a find/replace on a potentially large document, but you should test it either way, and choose the fastest method.

What is the proper New Line Character in Outlook Contact Export?

I have a CSV parser, that takes Outlook 2010 Contact Export .CSV file, and produces an array of values.
I break each row on the new line symbol, and each column on the comma. It works fine, until someone puts a new line inside a field (typically Address). This new line, which I assume is "\n" or "\r\n", explodes the row where it shouldn't, and the whole file becomes messed up from there on.
In my case, it happens when Business Street is written in two lines:
123 Apple Dr. Unit A
My code:
$file = file_get_contents("outlook.csv");
$rows = explode("\r\n",$file);
foreach($rows as $row)
{
$columns = explode(",",$row);
// Further manipulation here.
}
I have tried both "\n" and "\r\n", same result.
I figured I could calculate the number of columns in the first row (keys), and then find a way to not allow a new line until this many columns have been parsed, but it feels shady.
Is there another character for the new line that I can try, that would not be inside the data fields themselves?

The most common way of handling newlines in CSV files is to "quote" fields which contain significant characters such as newlines or commas. It may be worth looking into whether your CSV generator does this.
I recommend using PHP's fgetcsv() function, which is intended for this purpose. As you've discovered, splitting strings on commas works only in the most trivial cases.
In cases, where that doesn't work, a more sophisticated, reportedly RFC4180-compliant parser is available here.

I also recommend fgetcsv()
fgetcsv will also take care of commas inside strings ( between quotes ).
Interesting parsing tutorial
+1 to the previous answer ;)
PS: fgetcsv is a bit slower then opening the file and explode the contents etc. But imo it's worth it.

file_put_contents, file_append and line breaks

I'm writing a PHP script that adds numbers into a text file. I want to have one number on every line, like this:
1
5
8
12
If I use file_put_contents($filename, $commentnumber, FILE_APPEND), the result looks like:
15812
If I add a line break like file_put_contents($filename, $commentnumber . "\n", FILE_APPEND), spaces are added after each number and one empty line at the end (underscore represents spaces):
1_
5_
8_
12_
_
_
How do I get that function to add the numbers the way I want, without spaces?

Did you tried with PHP EOL constant?
file_put_contents($filename, $commentnumber . PHP_EOL, FILE_APPEND)
--- Added ---
I just realize that my file editor does the same, but don't worrie, is just a ghost character that the editor places there to signal that there is a newline
You could try this
A file with EOL after the last number looks like:
1_
2_
3_
EOF
but a file without that last character looks like
1_
2_
3
EOF
where _ means a space character
You could try to parse the file contents using php to see what's inside
$lines = explode( PHP_EOL, file_get_contents($file));
foreach($lines as $line ) {
var_dump($line);
}
...tricky

pauls answer has the correct approach but he has a mistake.
what you need ist the following:
file_put_contents($filename, trim($commentnumber).PHP_EOL, FILE_APPEND);
the PHP_EOL constant makes sure to use the right line ending on mac, windows and unix systems
the trim function removes any newline or whitespace on both sides of the string.
converting to integer would be a huge mistake because
1. you might end up having zero, expecially because of white space or special characters (wherever they come from...)
2. ids dont necessarily need to be integers

Ohh Guys! Just Use
\r\n
insted of \n

There is nothing in the code you provided that would generate those spaces, unless $commentnumber already contains the space to begin with. If that is the case, simply use trim($commentnumber) instead.
There is also nothing in your code that would explain empty lines at the bottom of the file, unless $commentnumber can be an empty string. If that is the case, and you want it to output the number 0 instead, use intval($commentnumber).
Of course, you need only one of those two. If you want to preserve string-like content, use trim(); if you always want integers, use intval(), which already trims it automatically.
It is also possible that you accidentally wrote " \n" instead of "\n" in your actual code, but in the code you posted here it is correct.

annoyingregistration, what you have there is absolutely fine.
PHP_EOL and "\n" are exactly the same.
The code you provided theres nothing wrong with it so it must be the value of $commentnumber that has a space at the end of it. as stated, run your $commentnumber through the trim() function.
file_put_contents($filename, trim($commentnumber . "\n"), FILE_APPEND);
Good luck.

After reading your code and responses, I have come up with a theory...
Since I can't see that there's anything wrong with your code, how did you open and read the file? Did you actually open it in a text editor? Did you use a PHP script to do it? If so, open the file with a text editor and check that there are actually spaces at the end of each line. If there is actually is...well, ignore the rest of this answer, then. If not, just read on.
For instance, if you use something like this:
<?php
$lines = file($filename);
if($lines) // Error reading
die();
foreach($lines as $line)
echo $line."<br />";
Then you would always a whitespace at the end of the line because of the way file() work. Make sure each $line does not have a whitespace - such as a newline character - at the end.
Since HTML handles all whitespaces - spaces, tabs, newlines etc. - as spaces, if there is a whitespace at the end of $line, then those would appear as spaces in the HTML output.
Solution: use rtrim($line) to remove whitespaces at the end of the lines. Using the following code:
<?php
$lines = file($filename);
if($lines) // Error reading
die();
foreach($lines as $line)
echo rtrim($line)."<br />";
wouldn't have the same problems as the first example, and all spaces at the end of the lines would be gone.

its because each time you write to the file, the file is being finished, file_put_contents inserts an extra line break at the end

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Remove lines with specific pattern at the beginning of them - php

Related

PHP - Write string in a certain position of a file

re writing or formatting contents in .txt file to add numbering in front and semi colon at the end

Counting effective line count in sources by PHP

What is the proper New Line Character in Outlook Contact Export?

file_put_contents, file_append and line breaks

Categories

Resources