Need help parsing some data in PHP - php

I need to be able to parse this sort of data in PHP:
Acct: 1
email : bleh#gmail.com
status : online
--------------------------------------------------
Acct: 2
email : dfgg#fgfg.com
status : banned
--------------------------------------------------
Acct: 3
signedupname : SomeUsername
somethingelse : offline
--------------------------------------------------
As you can see, the data is largely random. The only things that remain the same are the ----- line separating each entry and the Acct: 1 bits. The padding around each : often changes, as do the variable names used for each bit of data.
I've tried going through and parsing it all myself using regex, but I am definitely not skilled enough to do it properly. At the end of this I want data in the following form:
Acct: <integer>
var1: <string>
var2: <string>
If that makes any sense at all. It doesn't need to be that efficient, as I will only need to do this about once a day and I don't mind waiting however long it takes.
Thank you. :)
Edit: This is what I did on my own:
<?php
$data = file_get_contents('records.txt');
$data = explode('******** Saved Host list with acct/variables ********', $data);
$data = explode('--------------------------------------------------', $data[1]);
foreach($data as &$dataz)
{
$dataz = trim($dataz);
}
$data = str_replace('Acct:', "\nAcct:", $data);
foreach($data as $dataz)
{
preg_match('/Acct: (.*)/', $dataz, $match);
$acct = $match[1];
preg_match('/: (.*)/', $dataz, $match);
$var1 = $match[1];
echo $var1;
}
?>
I got as far as extracting the Acct: part, but anything beyond that I simply can't get my head around.

This piece of code will take your entire input and produce an associative array for each record in your input:
// replace ----- with actual number of dashes
foreach (explode('-----', $input) as $entry) {
$entry = trim($entry);
$record = array();
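foreach (explode("\n", $entry) as $line) {
// A limit of 2 keeps any colons inside the value intact
$parts = explode(':', $line, 2);
if (count($parts) < 2) continue; // skip blank or malformed lines
$varname = trim($parts[0]);
$value = trim($parts[1]);
$record[$varname] = $value;
}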
// Do anything you want with $record here
}
Edit: I just had a look at the code you posted. You really don't need regular expressions for what you're trying to do. Regex can be really handy when used in the right place, but most of the time, it's not the right thing to use.
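For reference, here is a minimal, self-contained sketch of the same idea. The function name parse_records() and the sample string are made up for illustration, and it assumes PHP 7 or later and that every separator line consists of at least five dashes:

```php
<?php
// Hypothetical helper, not part of the original answer.
function parse_records(string $input): array
{
    $records = [];
    // Split on any line made up of five or more dashes.
    foreach (preg_split('/^-{5,}\s*$/m', $input) as $entry) {
        $record = [];
        foreach (preg_split('/\R/', trim($entry)) as $line) {
            // A limit of 2 keeps colons inside the value intact.
            $parts = explode(':', $line, 2);
            if (count($parts) === 2) {
                $record[trim($parts[0])] = trim($parts[1]);
            }
        }
        if ($record !== []) {
            $records[] = $record; // ignore empty trailing entries
        }
    }
    return $records;
}

// Made-up sample in the format shown in the question.
$sample = "Acct: 1\nemail   : bleh#gmail.com\nstatus : online\n"
        . "--------------------------------------------------\n"
        . "Acct: 3\nsignedupname : SomeUsername\n"
        . "--------------------------------------------------\n";

$records = parse_records($sample);
print_r($records);
```

Each record keeps whatever variable names appear in the input, so nothing has to be known about them in advance.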

Related

Extracting meaningful data from this complicated string in PHP

I'm receiving some structured data for my PHP application, but the format is somewhat unpredictable and difficult to deal with. I don't get a say in the initial format of the data. What I get is a string (sample given below).
[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80],[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64],[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70],[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]
The above is data for 5 football players. This is what I need to get:
[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78]
[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80]
[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64]
[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70]
[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]
Now, what I've done manually in the above example I need to do reliably with PHP. As you see, each player has a set of data. In order to split the big string into individual players, I can't just explode it by "],[" because that substring appears within each player's data too an unpredictable number of times.
Each player has a certain number of statistics (accurate_pass, touches etc) but they don't all have the same statistics. For instance, player #1 has "saves" and the others don't. Player #4 has "won_contest" and the others don't. There is no way to know who will have which stats. That means I can't just count commas until the new player or something similar.
Each player has a number before his name, but that number has an unpredictable number of digits and there's no way to discern it from other numbers which may appear in the string.
What I see as a constant occurrence for all players is the last bit: before the last closed bracket there are always 3 integers divided by commas. This type of substring (INT,INT,INT]) doesn't seem to appear in any other situation. Maybe this could be of some use?
A "hard" way to do this is parenthesis counting (less common in PHP, more common in text parsing languages)...
<?php
$str = "[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80],[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64],[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70],[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]";
$line = ',';
$paren_count = 0;
$lines = array();
for($i=0; $i<strlen($str); $i++)
{
// Square brackets: the $str{$i} offset syntax was removed in PHP 8
$line .= $str[$i];
if($str[$i] == '[') $paren_count++;
elseif($str[$i] == ']')
{
$paren_count--;
if($paren_count == 0)
{
$lines[] = substr($line,1);
$line = '';
}
}
}
print_r($lines);
?>
Looks like @Boundless's answer is correct: you can use json_decode, but you first need to do a couple of things to the string you get, which is already close to valid JSON.
This worked for me:
<?php
$str = "[9484,'Víctor Valdés',8,[[['accurate_pass',[15]],['touches',[42]],['saves',[4]],['total_pass',[24]],['good_high_claim',[2]],['formation_place',[1]]]],1,'GK',1,0,0,'GK',31,183,78],[1320,'Carles Puyol',7.76,[[['accurate_pass',[50]],['touches',[75]],['aerial_won',[3]],['total_pass',[55]],['total_tackle',[1]],['formation_place',[6]]]],2,'DC',5,0,0,'D(CLR)',35,178,80],[5780,'Dani Alves',8.21,[[['accurate_pass',[58]],['touches',[99]],['total_scoring_att',[1]],['total_pass',[66]],['total_tackle',[6]],['aerial_lost',[1]],['fouls',[4]],['formation_place',[2]]]],2,'DR',22,0,0,'D(CR)',30,173,64],[83686,'Marc Bartra',8.31,[[['accurate_pass',[64]],['touches',[88]],['won_contest',[1]],['total_scoring_att',[1]],['aerial_won',[1]],['total_pass',[66]],['total_tackle',[5]],['aerial_lost',[1]],['fouls',[1]],['formation_place',[5]]]],2,'DC',15,0,0,'D(C)',22,181,70],[13471,'Adriano',6.72,[[['accurate_pass',[16]],['touches',[28]],['aerial_won',[2]],['total_pass',[18]],['total_tackle',[1]],['formation_place',[3]]]],2,'DL',21,1,31,'D(CLR),M(LR)',29,172,67]";
$str = '[' . $str . ']';
$str = str_replace('\'','"', $str);
//convert string to array
$arr = json_decode($str);
//now it's a php array so you can access any value
//echo '<pre>';
//print_r( $arr );
//echo '</pre>';
echo $arr[0][1]; //prints "Víctor Valdés"
?>
Your string looks like JSON but it is not valid JSON so json_decode() will not work.
Your specific case could be converted to valid JSON by wrapping the string in a pair of [] and replacing the single quotes with double quotes:
$string = str_replace("'", '"', $your_string);
var_dump(json_decode('[' . $string . ']'));
Of course the best solution would be to make sure that valid JSON is supplied because this will break easily if your text strings contain for example double quotes.
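A small sketch of that conversion with an explicit failure check, so a broken input is reported instead of silently yielding null. The function name decode_players() and the shortened sample string are invented here, and the quote replacement assumes that no value contains a double quote or an escaped single quote, which the feed does not guarantee:

```php
<?php
// Hypothetical helper; only safe under the quoting assumption stated above.
function decode_players(string $raw): array
{
    // Wrap in [] and swap single quotes for double quotes to get valid JSON.
    $json = '[' . str_replace("'", '"', $raw) . ']';
    $decoded = json_decode($json, true);
    if (!is_array($decoded)) {
        throw new InvalidArgumentException('Not valid JSON: ' . json_last_error_msg());
    }
    return $decoded;
}

// Shortened, made-up sample in the same shape as the question's data.
$raw = "[9484,'Víctor Valdés',8,[[['saves',[4]]]],1,'GK'],"
     . "[1320,'Carles Puyol',7.76,[[['touches',[75]]]],2,'DC']";

$players = decode_players($raw);
echo $players[1][1], PHP_EOL; // name of the second player
```

Once decoded, each player is an ordinary nested PHP array, so there is no need to split the original string by hand.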
Try parsing as json, then pulling out what you want. Assuming that the data comes in blocks of 4 you can try:
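$arr = json_decode($str);
$groups = array();
for($i = 0; $i < count($arr) - 3; $i += 4)
{
// collect each block of 4 in a separate array so $arr is not modified while looping
$groups[] = array($arr[$i], $arr[$i + 1], $arr[$i + 2], $arr[$i + 3]);
}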
Why not count the [ in a loop? Here's a quick untested loop that could get you started.
$output = array('');
$brackets = 0;
$index = 0;
foreach (str_split($input) as $ch) {
if ($ch == '[') {
$brackets++;
}
$output[$index] .= $ch;
if ($ch == ']') {
$brackets--;
if ($brackets === 0) {
$index++;
$output[$index] = '';
}
}
}
Not very elegant though...
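A slightly tidier sketch of the same bracket-counting idea, which records where each top-level group starts and cuts it out with substr(), so the commas between groups are dropped. The function name split_top_level() and the short test string are made up, and it assumes brackets never occur inside the quoted text values:

```php
<?php
// Hypothetical helper: split a string into its top-level [...] groups.
function split_top_level(string $input): array
{
    $groups = [];
    $depth = 0;
    $start = 0;
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        if ($input[$i] === '[') {
            if ($depth === 0) {
                $start = $i; // a new top-level group begins here
            }
            $depth++;
        } elseif ($input[$i] === ']' && $depth > 0) {
            $depth--;
            if ($depth === 0) {
                // Byte offsets are fine for UTF-8: '[' and ']' are single bytes.
                $groups[] = substr($input, $start, $i - $start + 1);
            }
        }
    }
    return $groups;
}

// Made-up input with the same nesting pattern as the question's data.
$groups = split_top_level("[1,'A',[[['x',[2]]]],3],[22,'B(C)',[[['y',[4]]]],5]");
print_r($groups);
```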

How to search for certain number of letter within a text file with PHP?

I have a large text file with the names, locations and dates of birth of lots of people. I need to find names based on the number of characters they contain. How can I do this with PHP?
In the text file, data is organised like this:
Name-Location ID DOB
Bob-LA 110 12/01/1987
Lia-CA 111 11/09/1984
Neil-LA 112 17/10/1982
Emon-CA 113 07/12/1991
Elita-CA 113 13/06/1983
Ron-CA 114 16/02/1979
and so on
Now I wish to search for people whose names have a certain number of letters and who share a location (say, everyone whose name has 4 letters and who is from CA, e.g. Emon-CA). How can I do that?
I can normally search through a file using PHP, where I know the string I am looking for. But here I actually don't know how to set the condition to show up my desired results. Can someone please help me?
Thanks in advance.
You can try
$filename = "log.txt";
foreach ( new TextFileFilterIterator($filename) as $line ) {
list($name, $location, $id, $dob) = $line;
if (strlen($name) == 4 && $location == "CA") {
echo implode(",", $line), PHP_EOL;
}
}
Output
Emon,CA,113,07/12/1991
Class Used
class TextFileFilterIterator extends ArrayIterator {
private $filter;
function __construct($filename) {
parent::__construct(array_filter(array_map("trim", file($filename))));
}
public function current() {
$c = array_filter(explode(" ", parent::current()));
list($n, $l) = explode("-", array_shift($c));
array_unshift($c, $n, $l);
return array_map("trim", $c);
}
}
I'd suggest using regular expressions, something like this:
// assume $text contains the contents of your text file
$namelength = 4; // change this as needed
$location = 'CA'; // again, change as needed
If you just want to count the results:
// The m modifier makes ^ and $ match at the start and end of every line
$pattern = '|^\s*(\w{'.$namelength.'})-'.preg_quote($location, '|').'\s+(\d+)\s+(\d{2}/\d{2}/\d{4})\s*$|m';
$count = preg_match_all($pattern, $text, $matches);
Otherwise, if you want to do something with each match:
// PREG_SET_ORDER gives one array per matched line
if(preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)){
foreach($matches as $match){
$name = $match[1];
$id = $match[2];
$dob = $match[3];
// Do something with each name.
}
}
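If one long pattern feels hard to maintain, the same filter can be sketched line by line. The function name find_people() and the returned array keys are invented for this example, and it assumes every data line has exactly three whitespace-separated fields in the form "Name-Location ID DOB":

```php
<?php
// Hypothetical helper; field layout is assumed from the question's sample.
function find_people(array $lines, int $nameLength, string $location): array
{
    $found = [];
    foreach ($lines as $line) {
        // Split on runs of whitespace so uneven spacing does not matter.
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) !== 3 || strpos($fields[0], '-') === false) {
            continue; // header remnants, blank or malformed lines
        }
        list($name, $place) = explode('-', $fields[0], 2);
        // strlen() counts bytes, which equals letters for plain ASCII names.
        if (strlen($name) === $nameLength && $place === $location) {
            $found[] = ['name' => $name, 'location' => $place,
                        'id' => $fields[1], 'dob' => $fields[2]];
        }
    }
    return $found;
}

$lines = [
    'Name-Location ID DOB',
    'Bob-LA 110 12/01/1987',
    'Lia-CA 111 11/09/1984',
    'Neil-LA 112 17/10/1982',
    'Emon-CA 113 07/12/1991',
    'Elita-CA 113 13/06/1983',
    'Ron-CA 114 16/02/1979',
];
$found = find_people($lines, 4, 'CA');
print_r($found);
```

For a real file, file($filename, FILE_IGNORE_NEW_LINES) would supply the $lines array. Names with accented letters would need mb_strlen() instead of strlen().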

Split a string, remember the positions of splitting

Assume I have the following string:
I have | been very busy lately and need to go | to bed early
By splitting on "|", you get:
$arr = array(
[0] => I have
[1] => been very busy lately and need to go
[2] => to bed early
)
The first split is after 2 words, and the second split 8 words after that. The positions after how many words to split will be stored: array(2, 8, 3). Then, the string is imploded to be passed on to a custom string tagger:
tag_string('I have been very busy lately and need to go to bed early');
I don't know what the output of tag_string will be exactly, except that the total words will remain the same. Examples of output would be:
I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p
I-ee have been-vb very busy-df lately-nn and need-f to go to bed-uu early-yy
This will lengthen the string by an unknown number of characters. I have no control over tag_string. What I know is (1) the number of words will be the same as before and (2) the array was split after 2, and thereafter after 8 words, respectively. I now need a solution to explode the tagged string into the same array as before:
$string = "I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p";
function split_string_again() {
// split after 2nd, and thereafter after 8th word
}
With output:
$arr = array(
[0] => I have-nn
[1] => been-vb very-vb busy lately and-rr need to-r go
[2] => to bed early-p
)
So to be clear (I wasn't before): I cannot split by remembering the strpos, because strpos before and after the string went through the tagger, aren't the same. I need to count the number of words. I hope I have made myself more clear :)
You wouldn't want to count the number of words, you would want to count the string length (strlen). If it is the same string without the pipes, then you want to split it with substr after a certain amount.
$strCounts = array();
foreach ($arr as $item) {
$strCounts[] = strlen($item);
}
// Later on.
$arr = array();
$i = 0;
foreach ($strCounts as $count) {
$arr[] = substr($string, $i, $count);
$i += $count; // increment the start position by the length
}
I have not tested this, simply a "theory" and probably has some kinks to work out. There may be a better way to go about it, I just don't know it.
Interesting question, although I think the rope data structure still applies it might be a little overkill since word placement won't change. Here is my solution:
$str = "I have | been very busy lately and need to go | to bed early";
function get_breaks($str)
{
$breaks = array();
$arr = explode("|", $str);
foreach($arr as $val)
{
$breaks[] = str_word_count($val);
}
return $breaks;
}
$breaks = get_breaks($str);
echo "<pre>" . print_r($breaks, 1) . "</pre>";
$str = str_replace("|", "", $str);
function rebreak($str, $breaks)
{
$return = array();
$old_break = 0;
$arr = str_word_count($str, 1);
foreach($breaks as $break)
{
$return[] = implode(" ", array_slice($arr, $old_break, $break));
$old_break += $break;
}
return $return;
}
echo "<pre>" . print_r(rebreak($str, $breaks), 1) . "</pre>";
echo "<pre>" . print_r(rebreak("I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p", $breaks), 1) . "</pre>";
Let me know if you have any questions, but it is pretty self explanatory. There are definitely ways to improve this as well.
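A variant of the same approach that treats any run of non-whitespace characters as one word, in case the tagger ever emits digits or symbols that str_word_count() would not count as part of a word. Both function names are made up for this sketch:

```php
<?php
// Hypothetical helpers; a "word" here is any run of non-whitespace characters.
function count_words_per_part(string $text, string $separator = '|'): array
{
    $counts = [];
    foreach (explode($separator, $text) as $part) {
        $counts[] = count(preg_split('/\s+/', $part, -1, PREG_SPLIT_NO_EMPTY));
    }
    return $counts;
}

function regroup_words(string $tagged, array $counts): array
{
    $words = preg_split('/\s+/', $tagged, -1, PREG_SPLIT_NO_EMPTY);
    $parts = [];
    $offset = 0;
    foreach ($counts as $count) {
        // Take the next $count words and glue them back together.
        $parts[] = implode(' ', array_slice($words, $offset, $count));
        $offset += $count;
    }
    return $parts;
}

$counts = count_words_per_part('I have | been very busy lately and need to go | to bed early');
$parts = regroup_words(
    'I have-nn been-vb very-vb busy lately and-rr need to-r go to bed early-p',
    $counts
);
print_r($parts);
```

This only holds as long as the tagger never adds or removes whitespace inside a word, which matches the guarantee stated in the question.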
I'm not quite sure I understood what you actually wanted to achieve. But here are a couple of things that might help you:
str_word_count() counts the number of words in a string. preg_match_all('/\p{L}[\p{L}\p{Mn}\p{Pd}\x{2019}]*/u', $string, $foo); does pretty much the same, but on UTF-8 strings.
strpos() finds the first occurrence of a string within another. You could easily find the positions of all | with this:
$pos = -1;
$positions = array();
while (($pos = strpos($string, '|', $pos + 1)) !== false) {
$positions[] = $pos;
}
I'm still not sure I understood why you can't just use explode() for this, though.
<?php
$string = 'I have | been very busy lately and need to go | to bed early';
$parts = explode('|', $string);
$words = array();
foreach ($parts as $s) {
$words[] = str_word_count($s);
}

Parsing single CSV column delimited by ~HEADER#

I have a CSV file which needs some additional processing. I've got most of our custom functionality completed. Where I'm stuck at the moment is the latest addition to the feed: multiple categories in one column. Here is a quick example of the new field setup.
Category01#Things~Category01#Will~Category01#Be~Category01#Here~Category02#Testing~Category02#More text here~Category02#Any data~Category02#No more data for this category~LastCategory#This~LastCategory#Is~LastCategory#The~LastCategory#End
I would need to build an array in PHP from each category available, similar to;
$category01 = array('Things', 'Will', 'Be', 'Here');
Any help would be greatly appreciated. Thanks!
If I understand your question and the format correctly (categories separated by ~, each listed as "SomeString#Category Name"), then this should do the trick. However, I don't think this has anything to do with the CSV format.
$pairs = explode('~', $string);
$cats = array();
foreach ($pairs as $pair) {
list($cat_number, $cat_name) = explode('#', $pair);
$cats[] = $cat_name;
}
Gahhh. the goggles, they do nothing!
If you're unable to change that output form (and it SHOULD be changed to something nicer), you'll have to go brute force:
$csv = '...';
$categories = array();
$parts = explode('~', $csv);
foreach($parts as $part) {
$bits = explode('#', $part);
$category = (int)substr($bits[0], 8);
if (!isset($categories[$category])) {
$categories[$category] = array();
}
$categories[$category][] = $bits[1];
}
Of course, this'll blow up on your LastCategory stuff at the tail end of that "csv". so... let me again STRONGLY urge you fix up whatever's generating that so-called "csv" in the first place.
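If the generator really cannot be changed, one way around the LastCategory problem is to key the result by the full label instead of a number cut out of it. A rough sketch, where the function name group_by_category() and the shortened sample field are invented and every item is assumed to have the form "Label#Value":

```php
<?php
// Hypothetical helper; groups values under their full category label.
function group_by_category(string $field): array
{
    $categories = [];
    foreach (explode('~', $field) as $pair) {
        // A limit of 2 keeps any further '#' characters inside the value.
        $bits = explode('#', $pair, 2);
        if (count($bits) !== 2) {
            continue; // skip items without a '#'
        }
        // Appending to a missing key creates the inner array automatically.
        $categories[$bits[0]][] = $bits[1];
    }
    return $categories;
}

// Shortened version of the field shown in the question.
$field = 'Category01#Things~Category01#Will~Category02#More text here'
       . '~LastCategory#This~LastCategory#Is';
$categories = group_by_category($field);
print_r($categories);
```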

Need a php script diagnosis for a small snippet of code

Can somebody tell me what I am doing wrong really? I am going nuts, the following code works perfect on localhost/WIN and when I try it on the webhost/linux it gives warnings:
$lines = file('english.php');
foreach($lines as $line) {
$matches=array();
if (preg_match('/DEFINE\(\'(.*?)\',\s*\'(.*)\'\);/i', $line, $matches)) {
$keys[] = $matches[1];
$values[] = $matches[2];
}
}
$lang = array_combine($keys, $values);
When I test on webhost:
Warning: array_combine() expects parameter 1 to be array, null given in /home/xx/public_html/xx on line 616
But on local server (windows xp) it works perfect. I do not know what I am doing wrong, please help me resolve this nightmare :(
Thanks.
I don't see anything obviously wrong with your code, but I'm curious why you're building separate arrays and then combining them rather than just building a combined array:
// Make sure this file is local to the system the script is running on.
// If it's a "url://" path, you can run into allow_url_fopen problems.
$lines = file('english.php');
// No need to reinitialize each time.
$matches = array();
$lang = array();
foreach($lines as $line) {
if (preg_match('/DEFINE\(\'([^\']*)\',\s*\'([^\\\\\']*(?:\\\\.[^\\\\\']*)*)\'\);/i', $line, $matches)) {
$lang[$matches[1]] = $matches[2];
}
}
(I've also changed your regex to handle single quotes.)
Are the php versions the same?
And are you sure you have transferred all your files to the webhost?
It seems your $keys variable is null, because you're not initializing it anywhere.
My best guess is that the english.php file on your server is empty (or does not exist), so nothing is ever stored in the $keys variable.
Try adding initial values for both arrays before the foreach statement:
$lines = file('english.php');
$keys = array();
$values = array();
foreach($lines as $line) {
$matches=array();
if (preg_match('/DEFINE\(\'(.*?)\',\s*\'(.*)\'\);/i', $line, $matches)) {
$keys[] = $matches[1];
$values[] = $matches[2];
}
}
$lang = array_combine($keys, $values);
That way, even if the file doesn't exist or is empty you're covering all possible paths.
You should always code as if everything could go wrong, not the other way around :)
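Along the same lines, here is a sketch that fails loudly when the file cannot be read, which would also surface a wrong relative path or a filename whose upper/lower case differs on the Linux host (Linux filesystems are case-sensitive, Windows usually is not). The function name load_defines() and the temporary demo file are made up for illustration:

```php
<?php
// Hypothetical helper; the name and the error handling are illustrative only.
function load_defines(string $path): array
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read language file: $path");
    }
    $lang = [];
    foreach (file($path, FILE_IGNORE_NEW_LINES) as $line) {
        if (preg_match('/DEFINE\(\'(.*?)\',\s*\'(.*)\'\);/i', $line, $matches)) {
            $lang[$matches[1]] = $matches[2];
        }
    }
    return $lang;
}

// Demo with a throwaway file standing in for english.php.
$path = tempnam(sys_get_temp_dir(), 'lang');
file_put_contents($path, "<?php\ndefine('HELLO', 'Hello world');\ndefine('BYE', 'Goodbye');\n");
$lang = load_defines($path);
unlink($path);
print_r($lang);
```

In the real script, passing an absolute path such as __DIR__ . '/english.php' avoids depending on the current working directory.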
