Parsing s-expressions with PHP - php

Well, I need to parse 2 text files, one named Item.txt and one named Message.txt. They are configuration files for a game server: Item.txt contains a line for each item in the game, and Message.txt has item names, descriptions, server messages, etc. I know this is far less than ideal, but I can't change the way this works, or the format.
In Item.txt I have lines in this format:
(item (name 597) (Index 397) (Image "item030") (desc 162) (class general etc) (code 4 9 0 0) (country 0 1 2) (plural 1) (buy 0) (sell 4) )
If I have the php variable $item which is equal to 397 (Index), I need to first get the 'name' (597).
Then I need to open Message.txt and find this line
( itemname 597 "Blue Box")
Then return "Blue Box" to PHP as a variable.
What I'm trying to do is return the item's name with the item's Index.
I know this is probably something really basic, but I've searched through dozens of file operation tutorials and still can't seem to find what I need.
Thanks

The following method doesn't actually 'parse' the files, but it should work for your specific problem...
(Note: not tested)
Given:
$item = 397;
open Item.txt:
$lines = file('Item.txt');
search index $item and get $name:
$name = '';
foreach ($lines as $line) { // iterate lines
    if (strpos($line, '(Index '.$item.')') !== false) {
        // Index found
        if (preg_match('#\(name ([^\)]+)\)#i', $line, $match)) {
            // name found
            $name = $match[1];
        }
        break;
    }
}
if (empty($name)) die('item not found');
open Message.txt:
$lines = file('Message.txt');
search $name and get $msg:
$msg = '';
foreach ($lines as $line) { // iterate lines
    if (strpos($line, 'itemname '.$name.' "') !== false) {
        // name found
        if (preg_match('#"([^"]+)"#', $line, $match)) {
            // msg found
            $msg = $match[1];
        }
        break;
    }
}
$msg should now contain Blue Box:
echo $msg;

Not sure if your problem is with parsing the expressions, or reading files per se since you mention "file operation tutorials".
Those parenthetical expressions in your files are called s-expressions. You may want to google for an s-expression parser and adapt it to php.
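To give an idea of what such a parser could look like, here is a minimal sketch of a recursive s-expression reader. The function names sexpr_tokenize and sexpr_parse are made up for this example, and it assumes quoted strings never contain an escaped double quote, so treat it as a starting point rather than a complete parser.

```php
<?php
// Hypothetical helper: split one s-expression into tokens
// (parentheses, double-quoted strings, bare atoms).
function sexpr_tokenize(string $text): array
{
    preg_match_all('/[()]|"[^"]*"|[^\s()"]+/', $text, $m);
    return $m[0];
}

// Hypothetical helper: consume tokens and build nested PHP arrays.
function sexpr_parse(array &$tokens)
{
    $token = array_shift($tokens);
    if ($token === null || $token === ')') {
        throw new RuntimeException('malformed s-expression');
    }
    if ($token === '(') {
        $list = [];
        while ($tokens && $tokens[0] !== ')') {
            $list[] = sexpr_parse($tokens);
        }
        if (!$tokens) {
            throw new RuntimeException('missing closing parenthesis');
        }
        array_shift($tokens); // drop the closing ")"
        return $list;
    }
    // atoms and quoted strings both come back as plain strings
    return trim($token, '"');
}

$tokens = sexpr_tokenize('(item (name 597) (Index 397) (Image "item030"))');
$tree = sexpr_parse($tokens);
// $tree[0] is the record type; every following element is a list
// whose first entry is the field name.
print_r($tree);
```

With the tree in hand, finding the name for a given Index is a matter of looping over the sub-lists and comparing their first element.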

You should look into the serialize function, which allows data to be stored to a textfile in a format that PHP can reinterpret easily when it needs to be reloaded.
Serializing this data as an array and saving it down to the textfiles would allow you to access it by array keys. Let's take your example. As an array, the data you described would look something like this:
$items[397]['name'] = 'bluebox';
Serializing the item array would put it in a format that could be saved and later accessed.
$data = serialize($items);
//then save data down to the text files using fopen or your favorite class
You could then load the file and unserialize its contents to end up with the same array. The serialize and unserialize functions are directly intended for this application.
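A round trip could look roughly like the sketch below. The file name and the sample data are invented for illustration, and this only helps if you are free to keep your own copy of the data in this format, which the original question says is not the case for the game server files themselves.

```php
<?php
// Hypothetical data and file location, for illustration only.
$items = [397 => ['name' => 'Blue Box']];
$path = sys_get_temp_dir() . '/items.ser';

// save: serialize() turns the array into a string
file_put_contents($path, serialize($items));

// load: unserialize() rebuilds the array; allowed_classes => false
// keeps it from instantiating objects found in the file
$loaded = unserialize(file_get_contents($path), ['allowed_classes' => false]);

echo $loaded[397]['name'], "\n";
```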

The first text file has several features that you can use to help parse it. It is up to you to decide if it is well formed and reliable enough to key on.
I noticed:
1) a record is delimited by a single line break
2) the record is further delimited by a set of parens ()
3) the record is typed using a word (e.g. item)
4) each field is delimited by parens
5) each field is named and the name is the first 'word'
6) anything after the first word is data, delimited by spaces
7) data with double quotes are string literals, everything else is a number
A method:
read to the end of line char and store that
strip the opening and closing parens
strip all closing )
split at ( and store in temp array (see: http://www.php.net/manual/en/function.explode.php)
element 0 is the type (e.g. item)
for elements 1-n, split at space and store in temp array.
element 0 in this new array will be the key name, the rest is data
Once you have all the data compartmentalized, you can then store it in an associative array or database. The exact structure of the array is difficult for me to envision without actually getting into it.
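The steps above can be sketched as follows. The function name parse_record and the shape of the returned array are my own choices, and the sketch assumes one flat record per line with no spaces or parentheses inside the quoted strings.

```php
<?php
// Hypothetical helper following the listed steps for a single line.
function parse_record(string $line): array
{
    $line = trim($line);
    $line = substr($line, 1, -1);         // strip the opening and closing parens
    $line = str_replace(')', '', $line);  // strip all remaining closing parens
    $parts = explode('(', $line);         // split at "("

    // element 0 is the record type (e.g. item)
    $record = ['type' => trim(array_shift($parts)), 'fields' => []];

    foreach ($parts as $part) {
        // split each field at whitespace: first word is the key, the rest is data
        $words = preg_split('/\s+/', trim($part), -1, PREG_SPLIT_NO_EMPTY);
        if (!$words) {
            continue;
        }
        $key = array_shift($words);
        $record['fields'][$key] = array_map(fn($w) => trim($w, '"'), $words);
    }
    return $record;
}

$rec = parse_record('(item (name 597) (Index 397) (Image "item030") (code 4 9 0 0) )');
print_r($rec);
```

Every field ends up as an array of strings, so single values such as the name are read with $rec['fields']['name'][0].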

Related

Reading a text file with specific code tag information in php

I would like to read a file, generally a text file, in which each record starts with a line carrying a specific code (field name) and ends with another specific code. On each line the code is separated from its value by the ^ character. I want to read this in PHP and dump it into an SQL database.
text file e.g.
001^UK2000009
008^S54/01/R/M/X,
009^Male
110^text1
200^text2
001^UK2000008
008^S54/012/R/M/X
009^Female
110^text1a
200^text2a
and so on...
This is similar to what PHP's File_MARC does.
thanks in advance
First you have to read the file with PHP's file functions; then you can get each column name and its value in the following way.
Read a single line from the file, then use explode to break that line into separate elements, using a space as the delimiter.
$columns = explode(' ', $line_variable);
Each key and value within a column is delimited by the ^ (caret) symbol, so we can use explode for that as well.
$newColumn = [];
foreach ($columns as $column) {
    $splited = explode('^', $column);
    $newColumn[][$splited[0]] = $splited[1];
}
print_r($newColumn);
This is just to give you an idea of how you can achieve your task; the rest is up to you.
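For the sample in the question, where every line holds exactly one code^value pair, grouping the lines into records might look like the sketch below. It assumes that a line with code 001 always starts a new record (the question does not confirm this), and the function name parse_records is made up.

```php
<?php
// Assumption: code "001" marks the first line of every record.
function parse_records(array $lines): array
{
    $records = [];
    $current = null;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '' || strpos($line, '^') === false) {
            continue; // skip blank or malformed lines
        }
        [$code, $value] = explode('^', $line, 2);
        if ($code === '001') {
            if ($current !== null) {
                $records[] = $current; // close the previous record
            }
            $current = [];
        }
        if ($current !== null) {
            $current[$code] = $value;  // a repeated code would overwrite the earlier value
        }
    }
    if ($current !== null) {
        $records[] = $current;         // close the last record
    }
    return $records;
}

$records = parse_records([
    '001^UK2000009', '008^S54/01/R/M/X,', '009^Male',
    '001^UK2000008', '008^S54/012/R/M/X', '009^Female',
]);
print_r($records);
```

Each element of $records is then one row's worth of data, ready to be bound to a prepared INSERT statement.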

Best Way to Recognize Person's First and Last Name in Text String

I'm trying to extract people's names from text files, which I am reading line by line. With the way the file is structured, both the first and last name should almost always be on the same line and will be within the first few lines of the file. Currently, I search for the first name in an array of ~2300 names and then assume that the following word is the last name. My issue with my current approach is that it doesn't correctly match the names and thus may incorrectly identify a different word in the file as the name. For example, my name is Daniel, but the function skips over my name and recognizes Virginia (a word later in the file) as my first name. Am I doing anything wrong and is there a better way of doing this? I am pretty new to PHP, so chances are I'm making a silly mistake.
Clarifications: The file is a raw text file containing data that is extracted from pictures of resumes via OCR. For the purposes of my project, I am assuming that there is always a first & last name (no middle), and that both will be on the same line
$name = $this->search($line);
if (count($name) > 0 && empty($fname) && empty($lname)){
    $fname = $name[0];
    $lname = $name[1];
}
function search($str){ //$str is the current file line being read
    require "utils".DIRECTORY_SEPARATOR."dictionary-first-names.php";
    $arr = explode(" ", $str);
    for ($i = 0; $i < count($arr); $i++){
        if (in_array(mb_strtolower($arr[$i]), $dict)){
            return array($arr[$i], $arr[$i+1]); //shouldn't have array out of bounds as first & last name should be on the same line
        }
    }
}
Here is a pastebin link to dictionary-first-names.php, since it's very long: https://pastebin.com/cRFkR4fh
You can use Named Entity Recognition (NER) methods; spaCy and CoreNLP are two well-known libraries for that purpose. But you should do that in Python.
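If you want to stay in PHP, a more defensive version of the dictionary lookup is sketched below. I can't tell from the question why "Daniel" is skipped, so this only guards against some likely causes, which are guesses on my part: punctuation stuck to the word by OCR, case differences in the dictionary, and a first name that is the last word on the line. The function name find_name is made up, and $dict is assumed to be a plain list of first names.

```php
<?php
// Hypothetical rewrite of search(); returns [first, last] or null.
function find_name(string $line, array $dict): ?array
{
    // lowercase the dictionary once and flip it for fast isset() lookups
    $lookup = array_flip(array_map('mb_strtolower', $dict));
    $words = preg_split('/\s+/u', trim($line), -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        return null; // the line was not valid UTF-8
    }

    // keep only letters, apostrophes and hyphens (drops OCR punctuation)
    $clean = fn($w) => (string) preg_replace('/[^\p{L}\'-]/u', '', $w);

    foreach ($words as $i => $word) {
        $first = $clean($word);
        if ($first !== '' && isset($lookup[mb_strtolower($first)]) && isset($words[$i + 1])) {
            return [$first, $clean($words[$i + 1])];
        }
    }
    return null; // no first name found on this line
}

$name = find_name('Resume: Daniel, Smith', ['daniel', 'virginia']);
print_r($name);
```

Because the function returns null when nothing matches, the caller should check for null instead of calling count() on the result.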

Handling text file with unknown newline positions

My problem is simple: I have a text file, which I process line by line, inserting the data into a database and doing some work for each new line. The problem is that the text file is a log of SMSes received by my gateway, and I expect one line per SMS. If an SMS does not have any new lines in its body, everything is alright; on the other hand, if an SMS is sent like this:
"Test
TestOnANewLine"
I get a log file that breaks onto a new line every time. A sample follows:
2012-01-01 10:10:10,4C64DCD6.req,192.168.999.999,+12223334444,OK -- SMPP - 999.999.999.999:9999,SubmitUser=user;Sender=sender;SMSCMsgId=999999999;Text="Test1
NewLineTest
AnotherNEwLineTEst"
The log file is interpreted like this:
date time, smsid, ip that processed it, number that is being sent to, status --connection type - ip that is sent from, user that submitted; sender name that is displayed; sms connection id; body of the sms
As for the language, I am using PHP, and the code is a simple
foreach($lines as $line)
{ explode and do stuff }
How do I handle this situation? At this point any help is appreciated
Thanks in advance!!
fgetcsv could handle the line breaks enclosed in '"', but with an additional '"' character in the body it would fail...
So what about some irresponsible regexp usage?
preg_match_all('#^(\d{4}-\d{2}-\d{2}[^,]+),([^,]+),([^,]+),([^,]+),([^,]+),SubmitUser=([^;]+);Sender=([^;]+);SMSCMsgId=([^;]+);Text="([\w\s.\-,:;\'"]+)"$#im', $file, $matches);
should do the job for not too crazy texts; you may need to adapt the [\w\s.\-,:;'"] character class to your needs.
Couldn't you loop through the lines, appending each one to the current entry, until you reach one that starts with a parseable date?
Maybe also take into account that the previous line ended with a double quote?
I know it's not foolproof, but without some recognisable "end of message" character(s) this is the best I could think of :P
First of all, thank you for all the feedback; it was really valuable and helped me solve this issue. For other people who look through this post and want a solution, here is mine:
I changed the way I interpret the end of a line from the regular \r\n to \r\n2, which means that I only consider a new line in my file reading if there is a regular line break \r\n and the new physical line starts with a 2 (the first digit of the year).
The actual solved part is:
$data = file_get_contents($backup_file);
$lines=explode("\r\n2",$data);
foreach($lines as $line)
{
//explode and do stuff
}
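One side effect of explode("\r\n2", ...) is that the delimiter, including the leading 2, is removed from every entry after the first. A variation that keeps the full timestamp is to split with a lookahead, sketched below on a made-up sample string. It is still a heuristic: an SMS body line that itself starts with a timestamp followed by a comma would be split as well.

```php
<?php
// Hypothetical sample: two log entries, the first with a line break in its body.
$data = "2012-01-01 10:10:10,a.req,Text=\"Test1\r\nNewLineTest\"\r\n"
      . "2012-01-01 10:10:11,b.req,Text=\"Hi\"";

// Split only at line breaks that are followed by "YYYY-MM-DD HH:MM:SS,".
// The lookahead (?=...) is not consumed, so the date stays in the entry.
$entries = preg_split('/\r?\n(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},)/', $data);

foreach ($entries as $entry) {
    // explode and do stuff
    echo strlen($entry), "\n";
}
```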
Try this to get all the log entries normalized into a single array item per log entry (i.e. combine entries across multiple line breaks into a single item)
$line_array = file('/path/to/file');
$log_array = array();
$i = -1;
$date_pattern = '/^[0-9]{4}-[0-9]{2}-[0-9]{2}\s[0-9]{2}:[0-9]{2}:[0-9]{2}/';
foreach ($line_array as $line) {
    if (1 === preg_match($date_pattern, $line)) {
        // this is a new log entry
        // let's trim the whitespace from the end of the last log array entry since we are done with it
        if (isset($log_array[$i])) {
            $log_array[$i] = rtrim($log_array[$i]);
        }
        // start a new log array entry
        $i++;
        $log_array[$i] = $line;
    } elseif ($i >= 0) {
        // this is not a new log entry, so append it to the current one
        // (lines before the first dated line are skipped)
        $log_array[$i] .= $line;
    }
}
// the loop only trims an entry when the next one starts, so trim the last one here
if (isset($log_array[$i])) {
    $log_array[$i] = rtrim($log_array[$i]);
}
After that you should be able to work with $log_array to extract the data you need. By the way, when you loop through $log_array, it would probably be helpful to extract the message text first. If you do a greedy preg_match on the double quotes, you shouldn't have any problems with messages that have quotes within them, as the greedy match will find the largest possible matching string, which in your case would be everything between the quotes bounding the message content.
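As a rough illustration of that greedy match, here is a sketch on a single made-up entry in the question's log format. It assumes the entry has already been trimmed so that the closing quote of the message is its last double quote.

```php
<?php
// Hypothetical log entry whose body contains both quotes and a line break.
$entry = '2012-01-01 10:10:10,4C64DCD6.req,192.168.999.999,+12223334444,'
       . 'OK -- SMPP - 999.999.999.999:9999,SubmitUser=user;Sender=sender;'
       . 'SMSCMsgId=999999999;Text="Test1 "quoted"' . "\n" . 'NewLine"';

// (.*) is greedy, so it runs up to the LAST double quote in the entry;
// the s modifier lets "." match the embedded line break too.
$text = null;
if (preg_match('/Text="(.*)"/s', $entry, $m)) {
    $text = $m[1];
}
var_dump($text);
```

Once the message text is out of the way, the rest of the entry can be split on commas and semicolons without the body getting in the way.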

Reading specific CSV value in PHP

I have the following CSV file:
08-0018421032;00-0018151831;G-20009429-0;G-20009429-0;0374048-0
27-001842101232;10-0018151831;G-30009429-0;G-50009429-0;7374048-0
36-0018421033232;20-0018151831;G-40009429-0;G-60009429-0;8374048-0
As you can see the separator is the ; symbol.
I then send this info to php via a jquery plugin which works perfect since I can read the file in PHP. The following code grabs the CSV file (Which is the $csvfile variable) and I can see the lines in it:
$file = fopen("upload/$csvfile", "r");
while (!feof($file) ) {
$line = fgetcsv($file, 1024,';');
print $line[0].'<br/>';
}
fclose($file);
What I need is to be able to select not only the line but on the value in it. To go to a specific value, for example in the first line the 3rd value would be G-20009429-0 and I would assign this to a php variable to be used later on.
Right now I have no idea how to grab a specific value in a line and also when I print the $line[0] it shows the values in a vertical order instead of a horizontal order. What I mean with this is that it shows the following output:
00-0018151831
10-0018151831
20-0018151831
Instead of showing me like this:
08-0018421032;00-0018151831;G-20009429-0;G-20009429-0;0374048-0
Maybe it's the lack of sleep, but I am stuck here. Just to repeat, the CSV file is read by PHP correctly, since I can do a print_r on it and it shows all the lines in it. The question is how to manipulate the information once I have the CSV, and how to grab a specific value in a specific line. Thank you.
$line is an array containing every element from that row. $line[0] is the first element of the row, $line[1] the second element and so on. Try var_dump($line). What you're doing is you output every first element of every row.
If you want to output every element in one line, just concatenate the array again:
echo join(';', $line);
But then that's missing the point of fgetcsv, which helpfully separates those elements into an array for you so you can work with them.
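To grab one specific value, you can collect the rows into an array and index it by row and column. The sketch below parses an in-memory string with str_getcsv so that it runs on its own; with the uploaded file you would fill $rows from fgetcsv($file, 1024, ';') in a loop instead. The variable names are just for illustration.

```php
<?php
// Sample data from the question, held in a string instead of a file.
$csv = "08-0018421032;00-0018151831;G-20009429-0;G-20009429-0;0374048-0\n"
     . "27-001842101232;10-0018151831;G-30009429-0;G-50009429-0;7374048-0\n";

$rows = [];
foreach (preg_split('/\R/', trim($csv)) as $line) {
    // separator ";", enclosure and escape character passed explicitly
    $rows[] = str_getcsv($line, ';', '"', '\\');
}

// rows and columns are zero-based: first line, third value
$value = $rows[0][2];
echo $value, "\n";
```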

PHP: Formatting irregular CSV files into HTML tables

My client receives a set of CSV text files periodically, where the elements in each row follow a consistent order and format, but the commas that separate them are inconsistent. Sometimes one comma will separate two elements and other times it will be two or four commas, etc ...
The PHP application I am writing attempts to do the following things:
PSEUDO-CODE:
1. Upload csv.txt file from client's local directory.
2. Create new HTML table.
3. Insert the first three fields FROM csv.txt into HTML table row.
4. Iterate STEP 2 while the FIRST field equals the First field below it.
5. If they do not equal, CLOSE HTML table.
6. Check to see if FIRST field is NOT NULL, IF TRUE, GOTO step 2, Else close HTML table.
I have no trouble with steps 1 and 2. Step 3 is where it gets tricky, since the fields in the csv.txt files are not always separated by the same number of commas. They are, however, always in the same relative order and format. I am also having issues with step 4: I don't know how to check if the beginning field in a row matches the beginning field in the row below it. Step 5 should be relatively simple. For step 6, I need to find an equivalent of a "GOTO" function in PHP.
Please let me know if any part of the question is unclear. I appreciate your help.
Thank you in advance!
If you want to group the rows by their first element, you can try something like this:
read the next row via fgetcsv()
filter out empty elements (a,,b,c -> a,b,c)
if the row still contains fields (i.e. is not empty), append the row to "its" group
That's not exactly what you've described but it may be what you want ;-)
<?php
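$fp = fopen('test.csv', 'rb') or die('!fopen');
$groups = array();
// fgetcsv() returns false at end of file (or on error), which ends the loop
while (($fields = fgetcsv($fp)) !== false) {
    // drop empty elements (a,,b,c -> a,b,c) and reindex so $row[0] always exists
    $row = array_values(array_filter($fields, function ($v) { return $v !== null && $v !== ''; }));
    if ( !empty($row) ) {
        // PHP creates the group array on first use, so no error suppression is needed
        $groups[$row[0]][] = $row;
    }
}
fclose($fp);
foreach( $groups as $g ) {
    echo '
<table>';
    foreach( $g as $row ) {
        echo '
<tr>
<td>', join('</td><td>', array_map('htmlentities', $row)), '</td>
</tr>
';
    }
    echo '</table>';
}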
Why not simply start by going through and replacing any run of multiple commas with a single comma? E.g.:
abc,def,,ghi,,,,jkl
becomes:
abc,def,ghi,jkl
and then just continue normally.
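In PHP that replacement is a one-liner with preg_replace, as in the small sketch below. Keep in mind that it also removes fields that are meant to be empty and does not respect commas inside quoted fields, so it only fits if neither occurs in your files.

```php
<?php
$line = 'abc,def,,ghi,,,,jkl';

// collapse every run of two or more commas into a single comma
$clean = preg_replace('/,{2,}/', ',', $line);

$fields = explode(',', $clean);
print_r($fields);
```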
If you mean that there are different numbers of commas on each line, then as far as I can see it is actually impossible to do what you want to do by looking at the commas alone. For example:
ab,c,d,ef // could group columns a-f in that way, but
a,bc,de,f // could also group columns a-f
... and you would have no way of knowing which was the proper arrangement, unless you're given some other instructions or the type of data is identifiable by regular expression as someone else said.
If on the other hand you just mean that sometimes there are blanks, but there are still the same number of columns, like this:
a,b,,d,e,f
a,,c,d,e,f
... then you can still form the table correctly. I would recommend using explode(',', $line) in that case and then doing your processing on the elements of the exploded array without worrying about what is inside them.
