Data Base Model - Huge (over 1 million lines) text files [closed] - php

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have to work on a project involving a huge amount of data stored in a raw text file. Each field is delimited by its size, i.e. field 1 runs from position 0 to 3, and so on (it is not a CSV file).
The file contains over a million lines.
I need to store it in a database. I checked several posts about the best way to go about it, and it seems the technology choice matters less than the algorithm. I'm open to PHP, Perl or Python. Feel free to suggest anything.
Now, the file structure in itself is a bit tricky. Here is an example:
A880780093vvd47aa8db20d4133e6f587cf046054e8316000212093659D11001
C880780093d47aa8db20d4133e6f587cf046054e831600021209365907000 0711012012C
A880780093vvcaacb22bfb091127f9c9e14175d858ee25000212093681O11001
C880780093caacb22bfb091127f9c9e14175d858ee2500021209368107000 0611012012ADI
D880780093caacb22bfb091127f9c9e14175d858ee250002120936810700011012012HK00210Z
A880780093vvb92f937a3fd1268c1478deb174a1bfca86000212093750S11041
C880780093b92f937a3fd1268c1478deb174a1bfca8600021209375007000 3911012012PB
C880780093b92f937a3fd1268c1478deb174a1bfca8600021209375007000 3911012012B 1002
E880780093212093750b92f937a3fd1268c1478deb174a1bfca8600007000110120120100000127000000000000
C880780093b92f937a3fd1268c1478deb174a1bfca8600021209375007000 3911012012B
Basically, there are 6 types of lines, from A to F; line A is the header of the block. Lines B and C have the exact same length and fields. Line D is a possible complement to line C, meaning that it is attached to a line C but not required; also meaning there cannot be a line D without a line C. Lines E and F are independent lines, only attached to line A. (all lines are part of a block, so they could all be "attached" to a line A, or a virtual block ID)
How would I go about creating a model that would allow me to:
- modify some data on some lines based on some criteria (e.g., if the 5th char of line C is 4, then the 10th becomes 7)
- keep track of the modified lines (i.e., I want to be able to link them to their original selves)
- rebuild the original text file, deleting the original lines and replacing them with their modified versions
- insert new lines in the block: if line C has 7th char = 0, then I add a D line below it
- keep the line order intact (if one line is inserted, every following line moves down by one rank)
I thought about using a parent_id foreign key in all 5 line tables (one per each line type, since they do not have the same fields); thus resolving the line ordering issue, but I am stuck at rebuilding the modified file version. I also thought about dividing the file into blocks (starting by a line A), then linking lines to block ID...
Any suggestion would be greatly appreciated!
Thanks a lot in advance!

Go through the file line by line and use a stack. Something along these lines:
<?php
// You'd have to implement the database yourself!
$db = new Database();
$db->startTransaction();
$stack = array();
$handle = fopen("my-file", "r");
$i = 0;
while (($buffer = fgets($handle, 4096)) !== false) {
    if (!isset($buffer[0])) {
        continue;
    }
    switch ($buffer[0]) {
        case "A":
            // Do something ...
            break;
        case "C":
            // Do something ...
            break;
        case "D":
            // A D line is only valid right after a C line.
            if ($i === 0 || $stack[$i - 1][0] != "C") {
                trigger_error("Line D without preceding line C");
            }
            // Do something ...
            break;
        // More stuff ...
    }
    $stack[$i++] = $buffer;
    // Assumes your Database::insert() accepts bound parameters; never
    // interpolate the raw line into the SQL string.
    $db->insert("INSERT INTO table (line) VALUES (?)", array($buffer));
}
fclose($handle);
$db->commitTransaction();
?>
Of course there are better solutions than the ugly switch, but it's quick'n'dirty. Your database design question is impossible to answer because we have no clue about the requirements. All in all, consider posting your work and asking specific questions about a small piece of the big problem, rather than asking others to solve the whole thing.
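To make the requirements in the question a little more concrete, here is a minimal in-memory sketch in PHP. It is not a database design and not the answerer's method. The function names (loadBlocks, applyRules, rebuild) and the record layout are assumptions made for illustration. Each record keeps both its original and its current text, the position of a record inside its block is its line order, and the text of the inserted D line is a placeholder because the question does not give the D layout.

```php
<?php
// Hypothetical model: a block is an ordered list of records, and every
// record remembers the line it came from ('original') and what it is now.
function loadBlocks(iterable $lines): array
{
    $blocks = [];
    foreach ($lines as $raw) {
        $line = rtrim($raw, "\r\n");
        if ($line === '') {
            continue;
        }
        // An "A" line opens a new block; other lines join the current one.
        if ($line[0] === 'A' || $blocks === []) {
            $blocks[] = [];
        }
        $blocks[count($blocks) - 1][] = ['original' => $line, 'current' => $line];
    }
    return $blocks;
}

function applyRules(array $blocks, string $newDLine): array
{
    foreach ($blocks as $b => $block) {
        $out = [];
        foreach ($block as $rec) {
            $isC = $rec['current'][0] === 'C';
            // Example rule from the question: 5th char of a C line is 4 => 10th becomes 7.
            if ($isC && ($rec['current'][4] ?? '') === '4' && strlen($rec['current']) >= 10) {
                $rec['current'] = substr_replace($rec['current'], '7', 9, 1);
            }
            $out[] = $rec;
            // Example rule from the question: 7th char of a C line is 0 => add a D line below it.
            if ($isC && ($rec['current'][6] ?? '') === '0') {
                // Inserted lines have no original, which marks them as new.
                $out[] = ['original' => null, 'current' => $newDLine];
            }
        }
        $blocks[$b] = $out;
    }
    return $blocks;
}

function rebuild(array $blocks): string
{
    $text = '';
    foreach ($blocks as $block) {
        foreach ($block as $rec) {
            $text .= $rec['current'] . "\n";
        }
    }
    return $text;
}
```

If this were moved into a database, one possible mapping (again an assumption) is a single table holding a block id, a sequence number within the block, the original text and the current text, so that rebuilding the file is an ordered read.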

Related

How to add new column to a flat file using PHP?

Quick update: The reason I need this solution is that this one php file is used to expand the flat file for about hundred users (that all use the same php file, but have their own flat files)
SOLUTION:
I worked with this one more day, rephrased the question and got a really great answer. I add it here for future help for others:
$content = file_get_contents("newstest.db");
$content = preg_replace('/(^ID:.*\S)/im', '$1value4:::', $content);
$content = preg_replace('/(^\d+.*\S)/im', '$1:::', $content);
file_put_contents("newstest.db", $content);
The original content of the flat file used when testing the code was:
ID:::value1:::value2:::value3:::
1:::My:::first:::line:::
2:::My:::second:::line:::
3:::Your:::third:::line:::
ORIGINAL QUESTION:
I have a PHP script I am trying to modify. Being a PHP newbie, and having searched both here and on Google without finding a solution, I am asking here.
I need to add more values (columns) in the flat file, automatically if the "column" does not exist from before.
Because this one PHP file is shared with many users (each with their own flat file), I need a way to automatically add new "columns" in their flat files if the column does not exist. Doing it manually is very time consuming, and I bet there is an easy way.
INFO:
The flat file is named "newstest.db"
The flat file has this layout:
id:::title:::short:::long:::author:::email:::value1:::value2:::value3:::
So the divider is :::
I understand the basics: I need to add, for instance, "value4:::" after "value3:::" in the first line of newstest.db, then add ::: to the other existing lines to update all lines and prepare them for the new "value4".
Today the php uses this to connect to the flat file:
($filesource is the path to the flat file including its name. Unique for each user.)
$connect_to_file = connect_pb ($filesource);
And to write to the file I use:
insert_pb($filesource,"$new_id:::$title:::$short:::$long:::$author:::$email:::$value1:::::::::");
(As you can see, value 2 and 3 are not used in this case, but they are in others.)
QUESTION:
Is there a quick/ existing php code to use to add a new column if it doesn't already exist? Or do I need to make the php code for this specific task?
I understand that the code must do something along:
If "value4" does not exist in line 0 in $filesource
then add "value4:::" at the end of line 0,
and for each of the other lines add ":::" at the end.
I don't know where to start, but I have tried for some hours.
I understand this:
update_pb(pathtofiletosaveto,"id","x == (ID of news)","value in first line","value to add");
But I don't know how to make an if statement as in 1) above, neither how to update the line 0 in the flat file to add "value4:::" at the end etc.
MY CODE (does not work as intended):
OR, maybe I need to read only line 1 in the file (newstest.db), and then replace it with a new line if "value4" is not in line 1?
A suggestion, but I don't know how do all:
(It's probably full of errors, as I have tried to read up and find examples and combining code.)
<?php
// specify the file
$file_source="newstest.db";
// get the content of the file
$newscontent = file($file_source, true);
$lines ='';
// handle the content, add "value4:::" and ":::" to the lines
foreach ($newscontent as $line_num => $linehere) {
// add "value4:::" at the end of first line only, and put it in memory
if($line_num[0]) {$lines .= $linehere.'value4:::';}
else {
// then add ":::" to the other lines and add them to memory
$lines .= $linehere.':::';
}
// echo results just to see what is going on
//echo 'Line nr'.$line_num.':<br />'.$lines.'<br /><br />';
}
// add
// to show the result
echo "Here is the result:<br /><br />".$lines."<br /><br />";
//Write new content to $file_source
$f = fopen($file_source, 'w');
fwrite($f,$lines);
fclose($f);
echo "done updating database flat file";
?>
This ALMOST works...
But it does NOT add "value4:::" to the end of the first line,
and it does not add ":::" to the end of the next lines, but to the beginning...
So a couple of questions remain:
1) How can I search line 0 for "value4", and then write "value4:::" at the end of the line if it is missing?
2) How can I add ":::" at the end of each line, and not at the beginning?
I kindly ask you to help me with this.
Do you absolutely have to use PHP for this task? It seems like something you only need to do once, and is much easier to do in a different way.
For example, if you have a *nix shell with sed, sed -i 's/$/:::/' <file> will do that task for you.
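If it does have to be PHP, the two remaining questions can be answered with a short function. In the attempt above, $line_num is an integer, so $line_num[0] evaluates to null and the header branch never runs; and file() keeps the trailing newline of every line unless FILE_IGNORE_NEW_LINES is passed, which is why the appended ":::" shows up at the start of the following line. The sketch below is one way to do it; the function name addColumn is made up for illustration.

```php
<?php
// Sketch: add a column to a ":::"-delimited flat file, unless the header
// line already contains it. Returns true when the file was changed.
function addColumn(string $path, string $column): bool
{
    // FILE_IGNORE_NEW_LINES strips the line endings, so anything appended
    // lands at the end of the line rather than after the newline.
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false || $lines === []) {
        return false;
    }

    // Question 1: look for the column name among the header fields.
    if (in_array($column, explode(':::', $lines[0]), true)) {
        return false; // already there, nothing to do
    }

    // Question 2: the header gets "name:::", every other line gets ":::".
    foreach ($lines as $i => $line) {
        if ($line === '') {
            continue; // leave blank lines alone
        }
        $lines[$i] = $line . ($i === 0 ? $column . ':::' : ':::');
    }

    file_put_contents($path, implode("\n", $lines) . "\n", LOCK_EX);
    return true;
}
```

Calling addColumn($filesource, "value4") for each user's file would then be safe to repeat, since a second call finds the column and leaves the file untouched.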

PHP select and save text template [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm new to PHP/HTML and I don't even know whether PHP can do this.
What I want to do is select some text from an input and save it to a txt file.
I have a form which takes all of the pasted text and saves it to a .txt file. But there is a lot of unnecessary text, which makes the .txt file big and hard to read.
This is the text I usually paste:
# userid name uniqueid connected ping loss state
# 122 "bryerzavala" [U:1:167845174] 03:25 152 0 active
# 118 "zhabka" [U:1:12080791] 05:41 109 0 active
The only part I need is [U:1:167845174], so it would be great if there were a way to throw away the unneeded text and save only the [U:X:XXXXXXXXX] part to the file.
There are some options to do what you want, it all depends on how the input text is formatted:
If the input text is always formatted the same way for every line, meaning uniqueid is going to be located in the same position you could use substr http://php.net/manual/en/function.substr.php
If the position of uniqueid varies on every line you could use a regular expression to extract the part of the text line you are looking for, in this case you could use preg_match http://php.net/manual/en/function.preg-match.php
Here is a simple approach; maybe you will like it.
I'll use the example text you provided to simulate the input data.
old.txt
# userid name uniqueid connected ping loss state
# 122 "bryerzavala" [U:1:167845174] 03:25 152 0 active
# 118 "zhabka" [U:1:12080791] 05:41 109 0 active
process.php
<?php
// Read the pasted text (saved as old.txt above).
$file = file_get_contents('./old.txt');
// var_dump($file); // for debug
// Collect every [U:X:XXXXXXXXX] token; full matches end up in $out[0].
preg_match_all('/\[U:\d:\d+\]/', $file, $out, PREG_PATTERN_ORDER);
var_dump($out); // for debug
// save the matches in your file, one per line
$fp = fopen("incomming.txt", "w");
foreach ($out[0] as $data) {
    fwrite($fp, $data.PHP_EOL);
}
// close file
fclose($fp);
echo '<hr/>';
$lines = file('incomming.txt');
var_dump($lines);

Array with tricky duplication

I was using the PHP function parse_ini_file() to build an array structure from an INI file I have, but I was more or less forced to write a custom parser because the native function has no support for nested files, nor does it parse the file's comments.
Although real INI files do not support complex hierarchies, with such a feature I could easily manipulate other information I have (or so I hope).
Having the comments in the file is not really a requirement, but if I allow the INI file to be edited through a GUI and I don't have access to them, the resulting file would not make much sense.
And this "not required requirement" is the problem I can't solve.
Right... I'm publishing the class (and a supporting interface) in this Gist because it's a little big.
Do not judge the method Reader::process() too hard because this is not the real class. The real one involves an entire Stream Reader that would just complicate things if posted here. That fragment, however, is a valid replacement.
The INI File I'm using for testing purposes is this one:
[info]
; Comment Section 1: author
author="Bruno Augusto"
; Comment Section 1: copyright
copyright="MIT"
[descriptor]
; Comment Section 2: name
name="Item Name"
; Comment Section 2: description
description="Item Description"
; Comment Section 2: Line 1
; Comment Section 2, Line 2 (version)
version=1.0
[contents]
; Comment Section 3: Line 1
; Comment Section 3, Line 2 (files)
files[] = config.ini
files[] = folder/file1.php
files[] = folder/folder2/file2.php
files[] = folder/folder2/folder3/file3.php
files[] = folder/folder2/folder3/file4.php
[hierarchyTest]
; Comment Section 4: levels
levels[first\second\third] = "foo"
levels[first\second\third] = "anotherone"
levels[first\second\third] = "onemore"
levels[first\second\third\fourth] = "bar"
levels[first\second\third\fourth] = "baaz"
[hierarchyTest2]
; Comment Section 5: Line 1
; Comment Section 5, Line 2 (levels2)
levels2[first.second.third] = "baaz"
Usage is as simple as instantiating the Reader class with the path of the INI file and invoking the Reader::read() method. We're all big boys here, I think I can suppress this. :p
The buggy comments are processed starting at line 68 and added to the final structure at line 81*.
The problem is that currently all the values are being accumulated after each iteration. The one to blame is the array_merge() used at line 69. By removing it:
$comments = $line;
Almost everything works fine, except that multi-line comments hold only the last line, as you would expect from a direct assignment.
Initially, I thought I could simply empty the array after using it, so I added this instruction at line 84:
$comments = array();
And here is the weirdness. It works well for almost all entries, but the sections named contents and hierarchyTest have no comments added.
I've isolated the problem: it is caused by the hierarchy parsing, starting at line 92, but I have no idea how to solve it.
The isolation holds up, because if I add one more entry to the last section (hierarchyTest2) with the format:
levels2[first.second.third] = "baazaaa"
Its comments vanish too.
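Without the Gist in front of us the actual cause cannot be confirmed, but the symptoms would be consistent with repeated keys (files[], levels[...]) overwriting the comments stored by their first occurrence once the buffer has been emptied. The sketch below illustrates one way to avoid that: buffer comment lines, attach them to the next entry only when the buffer is non-empty, and then reset it. It is a simplified, hypothetical parser (parseIniWithComments is a made-up name), not the Reader class from the question, and it does not build the nested hierarchy.

```php
<?php
// Hypothetical, simplified parser: collects values and comments per key.
function parseIniWithComments(array $lines): array
{
    $result  = [];
    $section = '';
    $pending = []; // comment lines seen since the last entry

    foreach ($lines as $raw) {
        $line = trim($raw);
        if ($line === '') {
            continue;
        }
        if ($line[0] === ';') {
            $pending[] = trim(substr($line, 1));
            continue;
        }
        if (preg_match('/^\[([^\]]+)\]$/', $line, $m)) {
            $section = $m[1];
            continue;
        }
        // key, optional [subkey], "=", value
        if (!preg_match('/^([^=\[\s]+)(?:\[[^\]]*\])?\s*=\s*(.*)$/', $line, $m)) {
            continue;
        }
        $key = $m[1];
        if (!isset($result[$section][$key])) {
            $result[$section][$key] = ['values' => [], 'comments' => []];
        }
        $result[$section][$key]['values'][] = trim($m[2], '"');

        // Only touch the stored comments when there is something to add,
        // so later occurrences of the same key cannot wipe them out.
        if ($pending !== []) {
            $result[$section][$key]['comments'] = array_merge(
                $result[$section][$key]['comments'],
                $pending
            );
        }
        $pending = []; // reset after every entry, not after every line
    }
    return $result;
}
```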

Read single line from a big text file

I have a 10MB text file.
The length of the lines may vary.
Which is the most efficient way (fast and memory friendly) to read just one specific line from this file? e.g. get_me_the_line($nr, $file_resource)
I don't know of a way to just jump to the line, if the lines are of varying length. However you can iterate through lines pretty quickly when not using them for anything, and return the one of interest.
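function ReadLineNumber($file, $number)
{
    // $number is zero-based: 0 returns the first line; false means no such line.
    $handle = fopen($file, "r");
    $line = false;
    for ($i = 0; $handle && $i <= $number; $i++) {
        $line = fgets($handle);
        if ($line === false)
            break;
    }
    if ($handle)
        fclose($handle);
    return $line;
}
Edit
The loop reads and discards every line before the requested one, so $number is a zero-based line reference. Call the function with $number - 1 if you would prefer that 1 mean the first line in the file.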
As the lines are of varying length you have to look at each character as it might denote the end of the line. Quickest would be loading the file in chunks that are sized like the blocksize of the filesystem and counting the linebreaks until you are on the desired line.
A better way would be to have an index file that stores information about the file containing the lines. Using a database could also be a better idea.
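One way to read the index idea is to record the byte offset at which each line starts, so that a later lookup can seek straight to it. The following is a minimal sketch under that assumption; buildLineIndex and readIndexedLine are made-up names, and the index is kept in a PHP array here rather than in a separate file or database.

```php
<?php
// Scan the file once and remember where every line begins (in bytes).
function buildLineIndex(string $path): array
{
    $offsets = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $offsets;
    }
    while (true) {
        $pos = ftell($handle);          // position before reading the line
        if (fgets($handle) === false) { // end of file
            break;
        }
        $offsets[] = $pos;
    }
    fclose($handle);
    return $offsets;
}

// Jump directly to a zero-based line number using the index.
function readIndexedLine(string $path, array $offsets, int $number)
{
    if (!isset($offsets[$number])) {
        return false;
    }
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return false;
    }
    fseek($handle, $offsets[$number]);
    $line = fgets($handle);
    fclose($handle);
    return $line;
}
```

The index only pays off when the same file is queried many times and does not change in between; for a single lookup, the plain loop above does the same amount of reading.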
If the file is REALLY large (several GB or more) and your application is running on *nix, you may not want PHP to process the file at all, and instead use existing Unix tools optimized for this kind of line processing. One such tool is sed, and an example of printing a specific line from a huge file can be found here.
It should be trivial to wrap this in a shell_exec() call, or similar, to write the function you are looking for.

How can I tell what line a file resource is currently "on" in PHP?

Using PHP, it's possible to read off the contents of a file using fopen and fgets. Each time fgets is called, it returns the next line in the file.
How does fgets know what line to read? In other words, how does it know that it last read line 5, so it should return the contents of line 6 this time? Is there a way for me to access that line-number data?
(I know it's possible to do something similar by reading the entire contents of the file into an array with file, but I'd like to accomplish this with fopen.)
There is a "position" kept in memory for each file that is opened ; it is automatically updated each time you are reading a line/character/whatever from the file.
You can get this position with ftell, and modify it with fseek :
ftell: Returns the current position of the file read/write pointer
fseek: Seeks on a file pointer
You can also use rewind to... rewind... the position of that pointer.
This does not give you the position as a line number, but as something closer to a character number (actually, you get the position as a number of bytes from the beginning of the file); once you have that, reading a line is just a matter of reading characters until you hit an end-of-line character.
BTW : as far as I remember, these functions are coming from the C language -- PHP itself being written in C ;-)
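A small sketch of how that byte position moves while reading. The temporary file and the variable names are made up for illustration.

```php
<?php
// Create a two-line scratch file to read from.
$path = tempnam(sys_get_temp_dir(), 'pos');
file_put_contents($path, "first\nsecond\n");

$handle = fopen($path, 'r');
$start = ftell($handle);       // position right after opening the file
$line1 = fgets($handle);       // reads "first\n" and advances the pointer
$afterFirst = ftell($handle);  // a byte count, not a line number

fseek($handle, $afterFirst);   // seeking to the same spot changes nothing
$line2 = fgets($handle);

rewind($handle);               // back to the beginning
$again = fgets($handle);       // the first line once more

fclose($handle);
unlink($path);
```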
Files are just a stream of data, read from the beginning to the end. The OS will remember the position you've read so far in that file. If needed, doing so in the application as well is fairly simple. The OS only cares about byte positions though, not lines.
Just imagine dealing out a deck of 52 cards sequentially. You hand out the first card, then the second. When you want to give out the third card, you don't need to start counting from the beginning again, or even remember where you were; you just hand out the next available card, and that will be the third.
Reading lines takes a bit more work, since you'd want to buffer data read from the actual file for performance's sake, but there is not much more to it than recording the offset of the last piece of data you handed out, finding the next newline character, and handing off all the data between those two points.
Neither PHP nor the OS has any real need to keep the line number around, since all the system cares about is the "next line". If you want to know the line number, keep a counter and increment it every time your app reads a line.
$lineno = 0;
while (($buffer = fgets($handle, 4096)) !== false) {
    $lineno++; // keep track of the line number
    ...
}
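I have this old sample; I hope it can help you :)
$File = file('path');
$array = array();
$linenr = 5;
foreach ($File as $line_num => $line)
{
    // array_push() returns the new element count, so its result must not
    // be assigned back to $array; appending with [] avoids the problem.
    $array[] = $line;
}
echo $array[($linenr-1)];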
You could just call fgets and increment a var $line_number each time you call it. That would tell you the line it is on.
