Handling text file with unknown newline positions

My problem is simple: I have a text file that I parse, inserting all the data into a database and doing some processing for each line. The text file is a log of the SMSes received by my gateway, and it is supposed to contain one line per SMS. If an SMS does not have any new lines in its body, everything is all right. On the other hand, if an SMS is sent like this:
"Test
TestOnANewLine"
I get a log file in which the entry breaks onto a new line every time. A sample follows:
2012-01-01 10:10:10,4C64DCD6.req,192.168.999.999,+12223334444,OK -- SMPP - 999.999.999.999:9999,SubmitUser=user;Sender=sender;SMSCMsgId=999999999;Text="Test1
NewLineTest
AnotherNEwLineTEst"
The log file is interpreted like this:
date time, smsid, ip that processed it, number that is being sent to, status --connection type - ip that is sent from, user that submitted; sender name that is displayed; sms connection id; body of the sms
As for the language, I am using PHP, and the code is a simple
foreach($lines as $line)
{ explode and do stuff }
How do I handle this situation? At this point any help is appreciated
Thanks in advance!!

fgetcsv could handle the line breaks enclosed in '"', but it would fail with an additional '"' character in the body...
So what about some irresponsible regexp usage?
preg_match_all('#^(\d{4}-\d{2}-\d{2}[^,]+),([^,]+),([^,]+),([^,]+),([^,]+),SubmitUser=([^;]+);Sender=([^;]+);SMSCMsgId=([^;]+);Text="([\w\d\s\.\-,:;\'"]+)"\r?$#im', $file, $matches);
should do the job for not-too-crazy texts. You may want to adapt the \w\d\s.-,:;'" character class to your needs.

Couldn't you loop through the lines until you can parse a date from one?
Maybe also take into account that the previous line ended with a double quote?
I know it's not foolproof, but without some recognisable "end of message" character(s), this is the best I could think of :P

First of all, thank you for all the feedback; it was really valuable and helped me solve this issue. For anyone else who looks through this post and wants a solution, here is mine:
I changed what I treat as the end of a line from the regular \r\n to \r\n2, which means I consider a new entry to start if and only if there is a regular line break \r\n and the next physical line begins with a 2 (the first digit of the year).
The code that solved it is:
$data = file_get_contents($backup_file);
$lines=explode("\r\n2",$data);
foreach($lines as $line)
{
//explode and do stuff
}
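
One side effect of splitting on "\r\n2" is that explode() removes the delimiter, so every entry after the first loses its leading 2. Below is a minimal sketch of a variant that keeps it, assuming every entry starts with a yyyy-mm-dd date followed by a space. The sample data and variable names are illustrative, and an SMS body line that itself begins with such a date would still be split incorrectly.

```php
<?php
// Illustrative sample: two log entries, the first with a multi-line SMS body.
$data = "2012-01-01 10:10:10,a.req,Text=\"Test1\r\nNewLineTest\"\r\n"
      . "2012-01-01 10:10:11,b.req,Text=\"Single line\"\r\n";

// Split only on line breaks that are followed by a date (yyyy-mm-dd).
// The lookahead (?=...) is not consumed, so each entry keeps its date.
$entries = preg_split('/\r?\n(?=\d{4}-\d{2}-\d{2} )/', rtrim($data));

foreach ($entries as $entry) {
    // explode and do stuff; here we just print the timestamp field
    echo strtok($entry, ","), PHP_EOL;
}
```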

Try this to get all the log entries normalized into a single array item per log entry (i.e. combine entries that span multiple line breaks into a single item):
$line_array = file('/path/to/file');
$log_array = array();
$i = -1;
$date_pattern = '/^[0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}/';
foreach ($line_array as $line) {
    if (1 === preg_match($date_pattern, $line)) {
        // this is a new log entry
        // let's trim the whitespace from the end of the last log array entry since we are done with it
        if (isset($log_array[$i])) {
            $log_array[$i] = rtrim($log_array[$i]);
        }
        // start a new log array entry
        $i++;
        $log_array[$i] = $line;
    } elseif ($i >= 0) {
        // this is not a new log entry, so append it to the current one
        // (lines before the first dated entry are skipped)
        $log_array[$i] .= $line;
    }
}
// the final entry still has its trailing line break, so trim it as well
if (isset($log_array[$i])) {
    $log_array[$i] = rtrim($log_array[$i]);
}
After that you should be able to work with $log_array to extract the data you need. By the way, when you loop through $log_array, it would probably be helpful to extract the message text first. If you do a greedy preg_match on the double quotes, you shouldn't have any problems with messages that have quotes within them, as the greedy match will find the largest possible matching string, which in your case would be everything between the quotes bounding the message content.
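
As a rough illustration of that greedy match, here is a sketch that assumes the message is always the last field and is introduced by Text=" as in the sample log. The entry below is made up for the example.

```php
<?php
// Illustrative entry: the SMS body spans two lines and contains its own quotes.
$entry = '2012-01-01 10:10:10,4C64DCD6.req,SMSCMsgId=999999999;Text="He said "hi"'
       . "\n" . 'and left"';

// .* is greedy and the s modifier lets the dot match newlines, so the match
// runs from the first quote after Text= to the last quote in the entry.
if (preg_match('/Text="(.*)"/s', $entry, $m)) {
    $text = $m[1];
    // Everything before Text= can now be split on commas and semicolons.
    $head = substr($entry, 0, strpos($entry, 'Text="'));
    echo $text, PHP_EOL;
}
```

Extracting the text first means the remaining fields can be exploded without worrying about commas or semicolons inside the message.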

Related

Can you write line by line to a PHP variable?

I have a script that generates Javascript based on user form inputs. At present the code is outputted to a txt file on the server, but I'd like to put it into a MySql database.
Writing line by line to a txt file is easy with fopen, and helpful with my script due to the way the code is generated and wrapped around user inputs (various loops etc).
However, I'd really like to write the output to a variable and then send that to the database, but I can't see any way of accomplishing this.
I'm sure it is possible, but the information I've found online only deals with quite basic variable creation.
A dirty solution would be to write to the txt file as I currently do, and then load the text file into a variable and then delete the text file. But this seems silly and clearly a waste of processing time.
Very new to PHP, so sorry if the above seems dumb.

It's not too difficult: you can declare the variable with the first line and then incrementally append to it, with the \n escape sequence (representing a new line) separating each line. You can also use the built-in PHP_EOL constant instead, as suggested in the comments. The .= assignment operator appends the string following the operator to the variable's existing value.
$lines = "my first line";
while (condition) {
$lines .= PHP_EOL . "my next line";
}
A derivative way of doing this would be to insert all the lines inside the loop and start with just declaring an empty string.
$lines = "";
while (condition) {
$lines .= "my next line" . PHP_EOL;
}
Note that this method will add a trailing newline at the end, which you can trim off if needed.
Alternatively, another way would be to create an array, push to it, and then use the implode function to glue together the array into a string using a newline.
$lines = array();
while (condition) {
array_push($lines, "my next line");
}
$lines = implode(PHP_EOL, $lines);

PHP variables look the same but are not equal (I'm confused)

OK, so I shave my head, but if I had hair I wouldn't need a razor because I'd have torn it all out tonight. It's gone 3am and what looked like a simple solution at 00:30 has become far from it.
Please see the code extract below..
$psusername = substr($list[$count],16);
if ($psusername == $psu_value){
$answer = "YES";
}
else {
$answer = "NO";
}
$psusername holds the value "normann" which is taken from a URL in a text based file (url.db)
$psu_value also holds the value "normann" which is retrieved from a cookie set on the user's computer (or a parameter in the browser address bar - URL).
However, and I'm sure you can guess my problem, the variable $answer contains "NO" from the test above.
All the PHP I know I've picked up from Google searches and you guys here, so I'm no expert, which is perhaps evident.
Maybe this is a schoolboy error, but I cannot figure out what I'm doing wrong. My assumption is that the data types differ. Ultimately, I want to compare the two variables and have a TRUE result when they contain the same information (i.e normann = normann).
So if you very clever fellows can point out why two variables echo what appears to be the same information but are in fact different, it'd be a very useful lesson for me and make my users very happy.

Do they echo the same thing when you do:
echo gettype($psusername) . "\n" . gettype($psu_value);

Since I can't see what data is stored in the array $list (and the index $count), I cannot suggest a full solution to your problem.
But I can suggest that you insert this code right before the if statement:
var_dump($psusername);
var_dump($psu_value);
and see why the two variables are not identical.
The var_dump function will output the content stored in the variable along with its type (string, integer, array, etc.) and, for strings, the length, so you will be able to figure out why the if statement is evaluating to false.

Since it looks like you have non-printable characters in your string, you can strip them out before the comparison. This will remove whatever is not printable in your character set:
$psusername = preg_replace("/[[:^print:]]/", "", $psusername);

0D 0A is a new line. The first byte is the carriage return (CR) character and the second is the line feed (LF) character. They are also known as \r and \n.
You can just trim it off using trim().
$psusername = trim($psusername);
Or if it only occurs at the end of the string then rtrim() would do the job:
$psusername = rtrim($psusername);
If you are getting the values from the file using file() then you can pass FILE_IGNORE_NEW_LINES as the second argument, and that will remove the new line:
$contents = file('url.db', FILE_IGNORE_NEW_LINES);
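
To see the problem in isolation, here is a small sketch with hypothetical values: one string as it might come out of the file with a trailing CR LF, and one as it might come from the cookie. bin2hex makes the invisible bytes visible.

```php
<?php
// Hypothetical values: one read from a file with a trailing CRLF, one from a cookie.
$psusername = "normann\r\n";
$psu_value  = "normann";

var_dump($psusername == $psu_value);   // bool(false)
echo bin2hex($psusername), PHP_EOL;    // 6e6f726d616e6e0d0a

// Strip only the trailing line-break characters before comparing.
$answer = (rtrim($psusername, "\r\n") === $psu_value) ? "YES" : "NO";
echo $answer, PHP_EOL;                 // YES
```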

I just want to thank all who responded. After viewing the outputs in my logfile in hex format, I realised that it was the carriage return values causing the variables to mismatch and, as I mentioned, I was able to resolve this (trim them) with the following code:
$psusername = preg_replace("/[^[:alnum:]]/u", '', $psusername);
I also know that the system within which the profiles and usernames are created allow both upper and lower case values to match, so I took the precaution of building that functionality into my code as an added measure of completeness.
And I'm happy to say, the code functions perfectly now.
Once again, thanks for your responses and suggestions.

PHP Advanced Regex Splitting

I'm facing a slight issue with an idea.
I use a chat feature within an online forum on all my computing devices. I also use it on mobile, which causes slight issues with formatting, input, etc. I've had the idea to relay all the chat from a relay account to my own mobile-friendly site.
I haven't started on sending messages yet, although I know how to read messages. How to output them is the issue.
I sniffed outgoing packets on my computer as the chat uses ajax. I was then able to find the following url: http://server05.ips-chat-service.com/get.php?room=xxxx&user=xxxx&access_key=xxxx
The page outputs something similar to this: ~~||~~1419344231,1,kondaxdesign,Could somebody send a quick message for me__C__ please?,,10248~~||~~1419344237,1,tom.bridges,its a iso and a vm what more do we need to know?,,10880~~||~~
That string would output this in chat: http://i.stack.imgur.com/j7CM6.png
I unfortunately don't have much knowledge of regex, or of any other function that would split this. Would anybody be able to assist me in getting the 1) Name, 2) Chat Data and 3) Timestamp?
As you can see, the string is something like this: ~~||~~[timestamp],1,[name],[data],,[some integer]~~||~~
Cheers.
After reading through the string output, when somebody leaves chat, this is sent: ~~||~~1419344521,2,wegface,TIMEOUT,2_10828,0~~||~~
The beginning of the log starts with 1,224442 before the first ~~||~~.

You would first explode the string into records, then use str_getcsv to parse each record. Here is a script that does that, without any formatting on output; I've named the variables after the descriptions in the question.
I wouldn't use a regular expression to parse the string, as better functionality is available (explode and str_getcsv).
$string = "~~||~~1419344231,1,kondaxdesign,Could somebody send a quick message for me__C__ please?,,10248~~||~~1419344237,1,tom.bridges,its a iso and a vm what more do we need to know?,,10880~~||~~";
//Split so we have each chat record to loop around
foreach( explode("~~||~~", $string) as $segments) {
//Read the CSV properly
$chat = str_getcsv($segments);
if( count($chat) <> 6 ) { continue; } //Skip any that don't have all the data
$timestamp = $chat[0];
$name = $chat[2];
$data = $chat[3];
$some_integer = $chat[5];
echo $name .' said - '. $data .'<br />';
}

PHP preg_replace inside for loop

I'm currently trying out the PHP preg_replace function and I've run into a small problem. I want to replace all the <br /> tags with a div with an ID, unique for every div, so I thought I would put it into a for loop. But strangely, it only handles the first line and gives it an ID of 49, which is the last ID it can get. Here's my code:
$res = mysqli_query($mysqli, "SELECT * FROM song WHERE id = 1");
$row = mysqli_fetch_assoc($res);
mysqli_set_charset("utf8");
$lyric = $row['lyric'];
$lyricHTML = nl2br($lyric);
$lines_arr = preg_split('[<br />]',$lyricHTML);
$lines = count($lines_arr);
for($i = 0; $i < $lines; $i++) {
$string = preg_replace(']<br />]', '</h4><h4 id="no'.$i.'">', $lyricHTML, 1);
echo $i;
}
echo '<h4>';
echo $string;
echo '</h4>';
How it works is that I have a large amount of text in my database, and when I put it into the lyric variable, it's just plain text. But when I run nl2br on it, it gets a <br /> after every line, which I use here. I get the number of <br /> tags using the little "lines_arr" method as you can see, and then basically iterate in a for loop.
The only problem is that it only outputs on the first line and gives that an ID of 49. When I move it outside the for loop and remove the limit, it works and all lines get an <h4> around them, but then I don't get the unique ID I need.
This is some text I pulled out from the database
Mama called about the paper turns out they wrote about me
Now my broken heart´s the only thing that's broke about me
So many people should have seen what we got going on
I only wanna put my heart and my life in songs
Writing about the pain I felt with my daddy gone
About the emptiness I felt when I sat alone
About the happiness I feel when I sing it loud
He should have heard the noise we made with the happy crowd
Did my Gran Daddy know he taught me what a poem was
How you can use a sentence or just a simple pause
What will I say when my kids ask me who my daddy was
I thought about it for a while and I'm at a loss
Knowing that I´m gonna live my whole life without him
I found out a lot of things I never knew about him
All I know is that I´ll never really be alone
Cause we gotta lot of love and a happy home
And my goal is to give every line an <h4 id="no1">TEXT</h4> for example, and the number after no, like no1 or no4 should be incremented every iteration, that's why I chose a for-loop.

Looks like you need to escape your regexp
preg_replace('/<br \/>/', ...);
Really though, this is a classic XY Problem. Instead of asking us how to fix your solution, you should ask us how to solve your problem.
Show us some example text in the database and then show us how you would like it to be formatted. It's very likely there's a better way.
I would use array_walk for this.
$lines = preg_split("/[\r\n]+/", $row['lyric']);
array_walk($lines, function(&$line, $idx) {
    $line = sprintf('<h4 id="no%d">%s</h4>', $idx+1, $line);
});
echo implode("\n", $lines);
Output
<h4 id="no1">Mama called about the paper turns out they wrote about me</h4>
<h4 id="no2">Now my broken heart's the only thing that's broke about me</h4>
<h4 id="no3">So many people should have seen what we got going on</h4>
...
<h4 id="no16">Cause we gotta lot of love and a happy home</h4>
Explanation of solution
nl2br doesn't really help us here. It inserts a <br /> before each line break, but then we'd just end up splitting the string on the br. We might as well split on the line breaks to start with. I'm going to use /[\r\n]+/ because it splits on one or more \r or \n characters, which covers \r, \n, and \r\n.
$lines = preg_split("/[\r\n]+/", $row['lyric']);
Now we have an array of strings, each containing one line of lyrics. But we want to wrap each string in an <h4 id="noX">...</h4> where X is the number of the line.
Ordinarily we would use array_map for this, but the array_map callback does not receive an index argument. Instead we will use array_walk which does receive the index.
One more note about this line is the use of &$line as the callback parameter. This allows us to alter the contents of $line and have it "saved" in our original $lines array. (See Example #1 in the PHP docs to compare the difference.)
array_walk($lines, function(&$line, $idx) {
Here's where the h4 comes in. I use sprintf for formatting HTML strings because I think they are more readable. And it allows you to control how the arguments are output without adding a bunch of view logic in the "template".
Here's the world's tiniest template: '<h4 id="no%d">%s</h4>'. It has two inputs, %d and %s. The first will be output as a number (our line number), and the second will be output as a string (our lyrics).
$line = sprintf('<h4 id="no%d">%s</h4>', $idx+1, $line);
Close the array_walk callback function
});
Now $lines is an array of our newly-formatted lyrics. Let's output the lyrics by separating each line with a \n.
echo implode("\n", $lines);
Done!

If the text in your db has one lyric per line, why not just explode it on the \n character?
Try to find a solution without the preg family of functions where a plain string function will do, because they tend to be heavier:
I would go like this:
$lyric = $row['lyric'];
$lyrics = explode("\n", $lyric);
$lyricsHtml = array();
$i = 0;
foreach ($lyrics as $val) {
    $i++;
    $lyricsHtml[] = '<h4 id="no'.$i.'">'.$val.'</h4>';
}
$lyricsHtml = implode("\n", $lyricsHtml);

Another way, with preg_replace_callback:
$id = 0;
$lyric = preg_replace_callback('~(^)|$~m',
function ($m) use (&$id) {
return (isset($m[1])) ? '<h4 id="no' . ++$id . '">' : '</h4>'; },
$lyric);

Parsing s-expressions with PHP

Well, I need to parse two text files, one named Item.txt and one named Message.txt. They are configuration files for a game server: Item contains a line for each item in the game, and Message has item names, descriptions, server messages, etc. I know this is far less than ideal, but I can't change the way this works, or the format.
The idea is in Item.txt I have lines in this format
(item (name 597) (Index 397) (Image "item030") (desc 162) (class general etc) (code 4 9 0 0) (country 0 1 2) (plural 1) (buy 0) (sell 4) )
If I have the php variable $item which is equal to 397 (Index), I need to first get the 'name' (597).
Then I need to open Message.txt and find this line
( itemname 597 "Blue Box")
Then return "Blue Box" to PHP as a variable.
What I'm trying to do is return the item's name with the item's Index.
I know this is probably something really basic, but I've searched through dozens of file operation tutorials and still can't seem to find what I need.
Thanks

Following method doesn't actually 'parse' the files, but it should work for your specific problem...
(Note: not tested)
Given:
$item = 397;
open Item.txt:
$lines = file('Item.txt');
search index $item and get $name:
$name = '';
foreach($lines as $line){ // iterate lines
if(strpos($line, '(Index '.$item.')')!==false){
// Index found
if(preg_match('#\(name ([^\)]+)\)#i', $line, $match)){
// name found
$name = $match[1];
}
break;
}
}
if(empty($name)) die('item not found');
open Message.txt:
$lines = file('Message.txt');
search $name and get $msg:
$msg = '';
foreach($lines as $line){ // iterate lines
if(strpos($line, 'itemname '.$name.' "')!==false){
// name found
if(preg_match('#"([^"]+)"#', $line, $match)){
// msg found
$msg = $match[1];
}
break;
}
}
$msg should now contain Blue Box:
echo $msg;

Not sure if your problem is with parsing the expressions, or reading files per se since you mention "file operation tutorials".
Those parenthetical expressions in your files are called s-expressions. You may want to google for an s-expression parser and adapt it to php.

You should look into the serialize function, which allows data to be stored to a textfile in a format that PHP can reinterpret easily when it needs to be reloaded.
Serializing this data as an array and saving it down to the textfiles would allow you to access it by array keys. Let's take your example. As an array, the data you described would look something like this:
$items[397]['name'] = 'bluebox';
Serializing the item array would put it in a format that could be saved and later accessed.
$data = serialize($items);
//then save data down to the text files using fopen or your favorite class
You could then load the file and unserialize its contents to end up with the same array. The serialize and unserialize functions are directly intended for this application.
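
A minimal round-trip sketch of that idea follows. The file name is a hypothetical choice for the example, and note that this only applies if you control the file format, which the question says is not the case for Item.txt and Message.txt. unserialize should also only be used on data you trust.

```php
<?php
// The array from the example above.
$items = array(397 => array('name' => 'bluebox'));

// Hypothetical location for the serialized data.
$file = sys_get_temp_dir() . '/items.ser';

// Save the array as a string, then load it back into an identical array.
file_put_contents($file, serialize($items));
$loaded = unserialize(file_get_contents($file));

echo $loaded[397]['name'], PHP_EOL; // bluebox
```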

The first text file has several features that you can use to help parse it. It is up to you to decide if it is well formed and reliable enough to key on.
I noticed:
1) a record is delimited by a single line break
2) the record is further delimited by a set of parens ()
3) the record is typed using a word (e.g. item)
4) each field is delimited by parens
5) each field is named and the name is the first 'word'
6) anything after the first word is data, delimited by spaces
7) data with double quotes are string literals, everything else is a number
A method:
read to the end of line char and store that
strip the opening and closing parens
strip all closing )
split at ( and store in temp array (see: http://www.php.net/manual/en/function.explode.php)
element 0 is the type (e.g. item)
for elements 1-n, split at space and store in temp array.
element 0 in this new array will be the key name, the rest is data
Once you have all the data compartmentalized, you can then store it in an associative array or database. The exact structure of the array is difficult for me to envision without actually getting into it.
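
The steps above can be sketched roughly as follows for a single Item.txt record. parse_record() is a hypothetical helper name, not part of PHP, and the sketch assumes that no data value contains spaces or parens, so a line such as ( itemname 597 "Blue Box") from Message.txt would need different handling.

```php
<?php
// A sketch of the steps above for one record. parse_record() is a
// hypothetical helper, not a built-in function.
function parse_record($line)
{
    $line = trim($line);
    // Strip the outer opening and closing parens.
    $line = preg_replace('/^\(\s*|\s*\)$/', '', $line);
    // Strip all remaining closing parens, then split at "(".
    $parts = explode('(', str_replace(')', '', $line));
    // Element 0 is the record type (e.g. item).
    $record = array('type' => trim(array_shift($parts)), 'fields' => array());
    foreach ($parts as $part) {
        // The first word is the field name, the rest is data.
        $words = preg_split('/\s+/', trim($part), -1, PREG_SPLIT_NO_EMPTY);
        $name = array_shift($words);
        // Drop the quotes around string literals.
        $record['fields'][$name] = array_map(function ($w) {
            return trim($w, '"');
        }, $words);
    }
    return $record;
}

$rec = parse_record('(item (name 597) (Index 397) (Image "item030") (desc 162) (class general etc) (code 4 9 0 0) (country 0 1 2) (plural 1) (buy 0) (sell 4) )');
echo $rec['type'], ' ', $rec['fields']['name'][0], PHP_EOL; // item 597
```

Every field ends up as an array of strings keyed by the field name, so a lookup by Index is a matter of comparing $rec['fields']['Index'][0] while looping over the lines.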
