How to read parts of a delimited file


I have a file that looks like this:
1028806~HDR~20110815~15-AUG-2011~C~23:10~~~~~~~
1028806~DTL~C3914A~HWP-C3914A~1000949~A~LASERJET MAINT KIT 8100/N/DN~HEWLETT PACKARD~2659~12~0~0~475.75~658.75~0~3~Y~2~~2~475.75~5~~~009088336~~3179~10.60~N~8.25~8.50~20.50~~088698601976~44103109~6A~20030627~NNY~~A~S~~~~~~N~~~~~~20.50~8.50~8.25~~~~~~~~~~~~~~~~
1028806~DTL~70023301~OKI-70023301~1002121~A~OKILAN 6020E+ 10/100BASE-TX ETHERNET EXT~OKI PRINTING SOLUTIONS~2703~0~0~0~55.17~80.00~0~0~Y~0~~0~55.17~0~~~009117000~~2160~2.79~N~8.00~8.75~14.00~~000000180016~44101700~ACC-IMPACT~19950723~NNY~~A~S~~~~~~N~~~~~~14.00~8.75~8.00~~~~~~~~~~~~~~~~
1028806~DTL~PRO7T~APC-PRO7T~1003150~A~Professional-grade Protection for Computers and Electronics~AMERICAN POWER CONVERSION~20664~7~0~0~21.60~36.00~0~0~Y~0~~0~21.60~7~~~008112000~~4400~2.00~N~1.90~6.90~12.40~~731304000181~39121610~SURG~19950723~NNY~~A~S~~~~~~N~~~~~~12.40~6.90~1.90~~~~~~~~~~~~~~~~
1028806~DTL~PER7~APC-PER7~1003418~A~Surge suppressor ( external ) / 7 output connector(s)~AMERICAN POWER CONVERSION~20664~496~50~0~9.30~15.25~0~3~Y~86~~363~9.30~44~~~008118000~~4400~1.85~N~2.10~6.90~11.50~~731304000112~39121610~SURG~20011025~NNY~~A~S~~~~~~N~~~~~~11.50~6.90~2.10~~~~~~~~~~~~~~~~
1028806~DTL~PRO7~APC-PRO7~1003761~A~APC SurgeArrest Professional - Surge suppressor ( external ) - AC 120 V - 7 outp~AMERICAN POWER CONVERSION~20664~88~0~0~17.59~30.00~0~0~Y~12~~52~17.59~24~~~008112000~~4400~1.95~N~2.25~7.50~12.25~~731304000174~39121610~SURG~19950723~NNY~~A~S~~~~~~N~~~~~~12.25~7.50~2.25~~~~~~~~~~~~~~~~
I need to use a script to read certain parts of each line (the part number and the description, for example C3914A and LASERJET MAINT KIT 8100/N/DN in the line below):
1028806~DTL~C3914A~HWP-C3914A~1000949~A~LASERJET MAINT KIT 8100/N/DN~HEWLETT PACKARD~2659~12~0~0~475.75~658.75~0~3~Y~2~~2~475.75~5~~~009088336~~3179~10.60~N~8.25~8.50~20.50~~088698601976~44103109~6A~20030627~NNY~~A~S~~~~~~N~~~~~~20.50~8.50~8.25~~~~~~~~~~~~~~~~
The file has over 300k items, so going through it manually is not an option. How can I get a script to read only these parts when I don't know how long the part numbers and descriptions are, while ignoring all the other ~ characters?
Thanks

fgetcsv() can help here. It is a little more memory-conservative than loading the whole file at once and explode()'ing all the lines into a giant array.
if (($handle = fopen("/path/to/file", "r")) !== FALSE) {
    // a length of 0 lets fgetcsv() read lines of any length
    while (($data = fgetcsv($handle, 0, "~")) !== FALSE) {
        echo $data[2] . " " . $data[6] . "\n";
    }
    // close the handle only if it was opened successfully
    fclose($handle);
}
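The same loop can be narrowed to the detail rows only. This is a minimal sketch, assuming the rows of interest are the ones whose second field is DTL (the HDR row has a different layout) and that the part number and description sit at indexes 2 and 6, as in the sample. The name readParts is made up for illustration, and the in-memory stream stands in for the real file. The enclosure and escape arguments are spelled out only so the call behaves the same across PHP versions (an empty escape string needs PHP 7.4 or later).

```php
<?php
// Hypothetical helper: yields [part number, description] for each DTL row.
function readParts($handle): Generator
{
    // Length 0 means "no line length limit"; "~" is the delimiter.
    while (($data = fgetcsv($handle, 0, "~", '"', "")) !== false) {
        // Skip blank lines and non-detail rows such as the HDR row.
        if (!isset($data[6]) || $data[1] !== 'DTL') {
            continue;
        }
        yield [$data[2], $data[6]];
    }
}

// An in-memory stream stands in for the real file here.
$handle = fopen('php://memory', 'w+');
fwrite($handle, "1028806~HDR~20110815~15-AUG-2011~C~23:10~~~~~~~\n");
fwrite($handle, "1028806~DTL~C3914A~HWP-C3914A~1000949~A~LASERJET MAINT KIT 8100/N/DN~HEWLETT PACKARD~2659\n");
rewind($handle);

$rows = iterator_to_array(readParts($handle), false);
fclose($handle);

foreach ($rows as [$partNumber, $description]) {
    echo $partNumber, ' ', $description, "\n";
}
```

Because the generator hands back one row at a time, memory use stays flat even for a 300k-line file as long as the rows are consumed in a loop rather than collected into an array.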

Looks like you can explode on tilde:
$fields = explode('~', $line);
$part_num = $fields[2];
$desc = $fields[6];

// read the file
$lines = file('file.txt');
// loop through each line
foreach ($lines as $line) {
    // split on the ~ delimiter; with a limit of 8, indexes 0-6 are clean
    // fields and index 7 holds the unsplit rest of the line
    $parts = explode('~', $line, 8);
    echo $parts[2]; // output the part number
    echo $parts[6]; // output the description
}
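The third argument of explode() caps the number of returned elements, and the last element holds the remainder of the string, separators included. The sketch below shows why the limit has to be one more than the index of the last field you want, using a shortened line from the question:

```php
<?php
$line = '1028806~DTL~C3914A~HWP-C3914A~1000949~A~LASERJET MAINT KIT 8100/N/DN~HEWLETT PACKARD~2659';

// Limit 7: index 6 is the last element, so it swallows the rest of the line.
$seven = explode('~', $line, 7);

// Limit 8: index 6 is a clean field; the unsplit remainder lands in index 7.
$eight = explode('~', $line, 8);

echo $seven[6], "\n"; // description plus everything after it
echo $eight[6], "\n"; // description only
```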

Related

preg_split : including empty field

I want to split a TSV string. The structure is:
abc\tdef\tghi\tjklm
where \t is a tab character.
If I use preg_split to split such string $i
$field=preg_split("/\t/", $i);
$field[3] is jklm.
However, if I have another string
abc\tdef\t\t
$field[3] is not a valid index.
How can I force empty fields into $field, such that all $field arrays would have an equal number of indexes?

If you only need to extract tab-separated values, you can use the built-in PHP function fgetcsv(). It is more robust than a hand-written function. Please try this:
if (($handle = fopen("test.csv", "r")) !== FALSE) {
// extract csv using tab delimiter
while (($data = fgetcsv($handle, 1000, "\t")) !== FALSE) {
print_r($data);
}
fclose($handle);
}

Like this?
$str ="abc\tdef\t\t";
var_dump(explode("\t", $str));
https://3v4l.org/7qOPJ
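explode() keeps empty fields, so the string above splits into four elements. If some rows might carry fewer tabs than others, array_pad() can bring every row to the same length. This is a minimal sketch; the column count of 4 and the helper name splitRow are assumptions for illustration.

```php
<?php
// Hypothetical helper: split a TSV row and pad it to a fixed number of columns.
function splitRow(string $row, int $columns = 4): array
{
    // array_pad() appends empty strings until the array has $columns elements.
    return array_pad(explode("\t", $row), $columns, '');
}

var_dump(splitRow("abc\tdef\t\t")); // four elements, the last two empty
var_dump(splitRow("abc\tdef"));     // also four elements after padding
```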

Make Php string without break lines

I need to generate random sentences from a dictionary. The dictionary has one word per line. I first load it into an array, then use a for loop to pick some entries at random. When I output the result, it appears on one line in the browser, but in the page source every word is on its own line. I then need to create a set of XML files for a search engine, and these newlines are indexed as line-break characters (\n, \r) and show up as a symbol in the XML source. So my question is: how can I build a sentence that is on one line in the source code too? Thanks.
Here is a piece of my code. The random selection is not included; I only wrote a for loop for illustration.
$file = fopen("test.txt", "r");
$data = array();
while (($buffer = fgets($file)) !== false) {
$data[] = $buffer;
}
$sentence = '';
for ($i=0;$i<10;$i++){
$sentence = $sentence . $data[$i];
}

Use the trim() function to strip the newline characters.
In your code, use:
$data[] = trim($buffer);
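fgets() returns each line with its line ending attached, which is where the breaks in the page source come from. Below is a minimal sketch of the trimmed version, joined with spaces so the words do not run together. The sample words are placeholders for the dictionary file.

```php
<?php
// Lines as fgets() would return them, line endings included.
$rawLines = ["apple\r\n", "banana\n", "cherry\n"];

// trim() strips whitespace, including \r and \n, from both ends of each line.
$data = array_map('trim', $rawLines);

// implode() joins the words with single spaces on one line.
$sentence = implode(' ', $data);

echo $sentence; // apple banana cherry
```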

correct regex date pattern for dd/mm/yyyy

I need to update the same line, which is also including a date in dd/mm/yyyy format along with some string, in a group of files. I have checked answers here given to similar questions however couldn’t make any of the patterns suggested run in my code.
My current PHP code is:
<?php
// get the system date
$sysdate = date("d/m/Y");
// open the directory
$dir = opendir($argv[1]);
$files = array();
// sorts the files alphabetically
while (($file = readdir($dir)) !== false) {
$files[] = $file;
}
closedir($dir);
sort($files);
// process each file in sorted order
foreach ($files as $file) {
$lines = '';
// filename extension is '.hql'
if (strpos($file,".hql") != false || strpos($file,".HQL") != false)
{
$procfile = $argv[1] . '\\' . $file;
echo "Converting filename: " . $procfile . "\n";
$handle = fopen($procfile, "r");
$lines = fread($handle, filesize($procfile));
fclose($handle);
$string = $lines;
// What are we going to change runs in here
$pattern = '[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]';
$replacement = $sysdate;
$lines = preg_replace($pattern, $replacement, $string);
echo $lines;
$newhandle = fopen($procfile, 'w+');
fwrite($newhandle, $lines);
fclose($newhandle);
// DONE
}
}
closedir($dir);
?>
When I run this code on command prompt, it doesn’t give any error message and it seems to be running properly. But once it finishes and I check my files, I see that the content of each file is getting deleted and they all become 0 KB files with nothing in them.

You have no delimiters set in place for your regular expression.
A delimiter can be any (non-alphanumeric, non-backslash, non-whitespace) character.
You want to use a delimiter besides / so you avoid having to escape / already in your pattern.
You could use the following to change your format:
$pattern = '~[0-9]{4}/[0-9]{2}/[0-9]{2}~';

This one also does basic checks (month between 1-12, day between 1-31):
(0(?!0)|[1-2]|3(?=[0-1]))\d\/(0(?!0)|1(?=[0-2]))\d\/\d{4}
See it live: http://regex101.com/r/jG9nD5

You should surround the regular expression with delimiter characters.
For example:
$pattern = '![0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]!';
/ is commonly used, but because the regular expression contains / itself, I used ! instead.
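The missing delimiters also explain the empty files: when the pattern is invalid, preg_replace() emits a warning and returns null, and writing null to a file writes nothing. Here is a minimal sketch that checks the result before saving. The sample string is made up, and the fixed date stands in for date("d/m/Y").

```php
<?php
$string  = "load_date = '2013/05/17'";
$sysdate = '01/06/2013'; // stands in for date("d/m/Y")

// Without delimiters the pattern is invalid; @ hides the warning for this demo.
$broken = @preg_replace('[0-9][0-9][0-9][0-9]/[0-9][0-9]/[0-9][0-9]', $sysdate, $string);

// With delimiters the replacement works.
$fixed = preg_replace('![0-9]{4}/[0-9]{2}/[0-9]{2}!', $sysdate, $string);

var_dump($broken); // NULL
echo $fixed, "\n";

// A file should only be overwritten when the replacement did not fail.
$safeToWrite = ($fixed !== null);
var_dump($safeToWrite);
```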

Besides the lack of delimiters (# and ~ are favorites, if / is used in the pattern), you are looking for 4 digits at the beginning: yyyy/mm/dd. Decide what you're looking for. You might also be able to do something like
[0-9]{4}/[0-9]{2}/[0-9]{2}
or even
\d{4}/\d{2}/\d{2}
... I know those will work in Perl, but I haven't tried them with PHP (they ought to work, as the "p" in preg stands for Perl, but no guarantees).

Why use regex? Use DateTime class for validation.
var_dump(validateDate('2012-02-28', 'Y-m-d')); # true
var_dump(validateDate('28/02/2012', 'd/m/Y')); # true
var_dump(validateDate('30/02/2012', 'd/m/Y')); # false
function
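The body of validateDate is not shown here. One way such a helper could be written with DateTime::createFromFormat() is sketched below; treat it as an assumption about what the answer intended, not as the original code.

```php
<?php
// Hypothetical reconstruction of the helper called above.
function validateDate(string $date, string $format = 'Y-m-d'): bool
{
    $d = DateTime::createFromFormat($format, $date);
    // Formatting the parsed value back must reproduce the input exactly;
    // this rejects dates that PHP would silently roll over (30/02 becomes 01/03).
    return $d !== false && $d->format($format) === $date;
}

var_dump(validateDate('28/02/2012', 'd/m/Y')); // bool(true)
var_dump(validateDate('30/02/2012', 'd/m/Y')); // bool(false)
```

The round-trip comparison matters because createFromFormat() accepts out-of-range days and adjusts the date instead of failing.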

Your code can be rewritten in short like this:
#!/usr/bin/php
<?php
// get the system date
$sysdate = date('d/m/Y');
// change working directory to the specified one
chdir($argv[1]);
// loop over the *.hql files in sorted order
foreach (glob('*.{hql,HQL}', GLOB_BRACE) as $file) {
echo "Converting filename: $argv[1]\\$file\n";
$contents = file_get_contents($file);
$contents = preg_replace('#\d{4}/\d{2}/\d{2}#', $sysdate, $contents);
echo $contents;
file_put_contents($file, $contents);
}
The problem was the missing PCRE regex delimiters, as others have already pointed out. Even with that fixed, the code could be cleaner.
The glob and file_get_contents functions are available as of PHP 4.3.0. The file_put_contents function is available as of PHP 5.
glob makes your code more succinct, readable, and even portable, as you won't have to mention the directory separator anywhere except in the info message. You used \\ but should have used DIRECTORY_SEPARATOR if you wanted your code to be portable.
The file_get_contents function fetches the whole contents of a file as a string. The file_put_contents function does the opposite: it stores a string in a file. If you want it in PHP 4, use this implementation:
if (!function_exists('file_put_contents')):
function file_put_contents($filename, $data) {
$handle = fopen($filename, 'w');
$result = fwrite($handle, $data);
fclose($handle);
return $result;
}
endif;
Also notice that the final ?> is not necessary in PHP.

PHP: Reading word by word?

I am trying to read a file one word at a time. So far I have been able to use fgets() to read line by line or up to a certain number of bytes, but that is not what I am looking for. I want one word at a time, up to the next white space, \n, or EOF.
Does anyone know how to do this in PHP? In C++ I just use the 'cin >> var' command.

You can do this with:
$filecontents = file_get_contents('words.txt');
$words = preg_split('/[\s]+/', $filecontents, -1, PREG_SPLIT_NO_EMPTY);
print_r($words);
This will give you an array of words.

To some of the replies in this topic I say: do not reinvent the wheel.
In PHP use:
str_word_count ( string $string [, int $format [, string $charlist ]] )
format:
0 = return only the number of words;
1 = return an array containing all the words found;
2 = return an associative array, where the key is the position of the word in the string and the value is the word itself;
charlist:
a list of additional characters that are to be considered part of a word.
See the PHP manual page function.str-word-count.php.
[CAUTION]
Nobody knows how big your file is. If its contents are large, there are other, more flexible solutions.
(^‿◕)
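A short sketch of format 1 and of the charlist argument. By default, digits are not treated as word characters, so a word containing a digit is split unless the digit is listed in charlist. The sample text is made up.

```php
<?php
$text = 'Hello fri3nd';

// Format 1 returns a plain list of the words found; "3" splits the second word.
$default = str_word_count($text, 1);

// Adding "3" to the charlist keeps "fri3nd" together.
$withDigit = str_word_count($text, 1, '3');

print_r($default);
print_r($withDigit);
```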

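You would have to use fgetc() to read one character at a time until you hit a word boundary, then do something with the word. Example:
$fp = fopen("file.txt", "r");
$wordBoundaries = array("\n", "\r", "\t", " ");
$wordBuffer = "";
// compare against false explicitly so a "0" character does not end the loop
while (($c = fgetc($fp)) !== false) {
    if (in_array($c, $wordBoundaries, true)) {
        // end of a word: do something with it, then clear the buffer
        if ($wordBuffer !== "") {
            doSomethingWithBuffer($wordBuffer);
            $wordBuffer = "";
        }
    } else {
        // add the character to the buffer
        $wordBuffer .= $c;
    }
}
// handle the last word if the file does not end with whitespace
if ($wordBuffer !== "") {
    doSomethingWithBuffer($wordBuffer);
}
fclose($fp);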

You can use the fgets() function, which reads the file line by line, and then explode() each line to extract the words separated by spaces.
Try this code:
$handle = fopen("inputfile.txt", "r");
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        // strip the trailing newline, then split the line into words
        $word_arr = explode(" ", trim($line));
        foreach ($word_arr as $word) {
            echo $word . "\n"; // one word per line
        }
    }
    fclose($handle);
} else {
    // error while opening the file
    echo "error";
}
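The line-by-line approach can be wrapped in a generator so the caller receives one word at a time, which is the closest match to reading with cin >> var. This is a minimal sketch; the name readWords is made up, and the in-memory stream stands in for a real file.

```php
<?php
// Hypothetical helper: yields one word at a time without loading the whole file.
function readWords($handle): Generator
{
    while (($line = fgets($handle)) !== false) {
        // Split on any run of whitespace and drop empty pieces.
        foreach (preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            yield $word;
        }
    }
}

$handle = fopen('php://memory', 'w+');
fwrite($handle, "first second\n\nthird\tfourth");
rewind($handle);

$words = iterator_to_array(readWords($handle), false);
fclose($handle);

print_r($words);
```

Splitting on \s+ instead of a single space also handles tabs, blank lines, and repeated spaces.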

PHP change whole line if a word exist in a line of .txt file

I'm using preg_replace to search for a word match within a line of a text file. If it is found, I would like to replace the entire line. My current problem is what I've tried thus far only replaces the exact word and not the entire line.
PHP
$database = "1.txt";
$id = "OFFENSIVEWORD1";
$str = file_get_contents($database);
$str = preg_replace("/\b".$id."\b/","********",$str);
$fp = fopen($database,'w');
fwrite($fp,$str);
fclose($fp);
1.TXT FILE
LOVING YOU MY FRIEND etc.
OFFENSIVEWORD1 YOU MY FRIEND etc.
OFFENSIVEWORD2 YOU MY FRIEND etc.
OFFENSIVEWORD3 YOU MY FRIEND etc.
EXPECTED OUTPUT
LOVING YOU MY FRIEND etc.
********
OFFENSIVEWORD2 YOU MY FRIEND etc.
OFFENSIVEWORD3 YOU MY FRIEND etc.
Thanks.

You need to change
$str = preg_replace("/\b".$id."\b/","********",$str);
to
$str = preg_replace("/\b" . $id . "\b.*\n/ui", "********\n", $str);
to make it work. Watch out for the different newline conventions across operating systems.
Update: it is better to use
$str = preg_replace("/.*\b" . $id . "\b.*\n/ui", "********\n", $str);
in case your offensive word is not at the start of the line.
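The patterns above require a trailing \n, so a matching last line without a line break is left untouched. A variant using the m (multiline) modifier avoids that, and preg_quote() guards against regex metacharacters in the word. This is a minimal sketch that assumes \n line endings; the name maskLines is made up for illustration.

```php
<?php
// Hypothetical helper: replace every line containing $word with a mask.
function maskLines(string $text, string $word, string $mask = '********'): ?string
{
    // preg_quote() escapes regex metacharacters in the word.
    // m: ^ and $ match at line boundaries; i: case-insensitive.
    $pattern = '/^.*\b' . preg_quote($word, '/') . '\b.*$/mi';
    return preg_replace($pattern, $mask, $text);
}

$text = "LOVING YOU MY FRIEND etc.\n"
      . "OFFENSIVEWORD1 YOU MY FRIEND etc.\n"
      . "OFFENSIVEWORD2 YOU MY FRIEND etc.";

$masked = maskLines($text, 'OFFENSIVEWORD1');
echo $masked;
```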

This should do it as well (this is tested and works):
<?php
$str = "\nhello this is PROFANE 1\nclean statement";
$lines = explode("\n", $str);
$newfile = '';
foreach ($lines as $line) {
    // ** change PROFANE 1 and PROFANE 2 to the real words you want to filter out
    if (stripos($line, 'PROFANE 1') !== false || stripos($line, 'PROFANE 2') !== false) {
        // replace every character of the line with an asterisk
        $line = preg_replace("/./", "*", $line);
    }
    $newfile .= "\n" . $line;
}
echo "<pre>" . $newfile . "</pre>";
?>
