extracting multiple fields from a text file using php - php

what is the best way of extracting multiple (~40 values) from a text file using php?
the data is more or less like:
NAMEA valuea
NAMEB valueb
I'm looking for a proper* approach to extracting this data into a data-structure, because i will need to specify regexs for all of them (all 40).
did i make myself clear?
*meaning, the default/painful method would be for me to do:
$namea = extractfunction("regexa", $textfilevalue);
$nameb = extractfunction("regexb", $textfilevalue);
... 40 times!
The lines may not be in the same order, or be present in each file. Every NAMEA is text like: "Registration Number:", or "Applicant Name:" (ie, with spaces in what i was calling as NAMEA)
Response to the Col.
i'm looking for a sensible "way" of writing my code, so its readable, modifiable, builds an object/array thats easily callable, etc... "good coding style!" :)
@Adam - They do actually... and contain slashes as well...
@Alix - Freaking marvelous man! That was GOOD! Would you also happen to have any insights on how I can "truncate" the resultant array by removing everything from "key_x" and beyond? Should I open that as a new question?

Here is my take at it:
somefile.txt:
NAMEA valuea
NAMEB valueb
PHP Code:
$file = file_get_contents('./somefile.txt');
$string = preg_replace('~^(.+?)\s+(.+?)$~m', '$1=$2', $file);
$string = str_replace(array("\r\n", "\r", "\n"), '&', $string);
$result = array();
parse_str($string, $result);
echo '<pre>';
print_r($result);
echo '</pre>';
Output:
Array
(
[NAMEA] => valuea
[NAMEB] => valueb
)
You may also be able to further simplify this by using str_getcsv() on PHP 5.3+.
EDIT: My previous version fails for keys that have spaces, as @Col. Shrapnel noticed. I didn't read the question with enough attention. Since you seem to be using keys that always have : appended, a possible solution is this:
$string = preg_replace('~^(.+?):\s+(.+?)$~m', '$1=$2', $file);
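To remove everything from key_x to the end of the file you can do something like this (this assumes key_x is present; if it is not, strpos() returns false and the result is an empty string):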
$string = substr($string, 0, strpos($string, 'key_x'));
So the whole thing would look like this:
somefile.txt:
Registration Number: valuea
Applicant Name: valueb
PHP Code:
$file = file_get_contents('./somefile.txt');
$string = substr($file, 0, strpos($file, 'key_x'));
$string = preg_replace('~^(.+?):\s+(.+?)$~m', '$1=$2', $string);
$string = str_replace(array("\r\n", "\r", "\n"), '&', $string);
$result = array();
parse_str($string, $result);
echo '<pre>';
print_r($result);
echo '</pre>';
Output:
Array
(
[Registration_Number] => valuea
[Applicant_Name] => valueb
)
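Note that parse_str() rewrites spaces in keys to underscores and URL-decodes the input, so values containing &, + or % will not survive intact. A minimal sketch of a line-by-line alternative, assuming every key ends with a colon as in the sample. The helper name parseFields and its $stopKey parameter are made up for illustration, and the sample values are invented:

```php
<?php
// Hypothetical helper: parse "Key: value" lines into an associative array.
// Parsing stops at the first line whose key equals $stopKey (if given).
function parseFields(string $text, ?string $stopKey = null): array
{
    $result = [];
    foreach (preg_split('/\R/', $text) as $line) {
        // Split on the first colon only, so values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) < 2) {
            continue; // no colon: not a "Key: value" line
        }
        $key = trim($parts[0]);
        if ($key === $stopKey) {
            break; // drop this key and everything after it
        }
        $result[$key] = trim($parts[1]);
    }
    return $result;
}

// Invented sample data, including a slash in a value.
$text = "Registration Number: 12/345\nApplicant Name: John Doe\nkey_x: ignored\nLater Key: ignored too";
$fields = parseFields($text, 'key_x');
print_r($fields);
```

Keys keep their spaces here, so you would read a value as $fields['Registration Number'].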

As far as I understand it, you can use file() to get an array of lines and then parse each line with a regexp.
If you add a = sign between names and values, you'll be able to get the whole thing at once using parse_ini_file().
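A rough sketch of the INI idea, using parse_ini_string() on an in-memory string instead of parse_ini_file() so it runs without a file. It assumes keys without spaces and values free of INI-special characters; the sample data is the NAMEA/NAMEB example from the question:

```php
<?php
$text = "NAMEA valuea\nNAMEB valueb";

// Turn the first run of spaces/tabs on each line into "=".
$ini = preg_replace('/^(\S+)[ \t]+/m', '$1=', $text);

// INI_SCANNER_RAW keeps values as plain strings (no "yes"/"no" conversion).
$data = parse_ini_string($ini, false, INI_SCANNER_RAW);
print_r($data);
```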

Assuming your keys (namea, nameb) never have spaces in them:
$contents = file('some_file.txt'); // read file as array of lines
$data = array();
foreach ($contents as $line) { // iterate over the lines
    // pull out key and value into $matches; skip lines that don't match (e.g. blank lines)
    if (preg_match('/^(\S+)\s+(.*)/', $line, $matches)) {
        $data[$matches[1]] = rtrim($matches[2]); // store key/value pair in $data array
    }
}
var_dump($data); // what did we get?

Related

PhP regex for a string, what would be the best way to do it?

I have an array with rule field that has a string like this:
FREQ=MONTHLY;BYDAY=3FR
FREQ=MONTHLY;BYDAY=3SA
FREQ=WEEKLY;UNTIL=20170728T080000Z;BYDAY=MO,TU,WE,TH,FR
FREQ=MONTHLY;UNTIL=20170527T100000Z;BYDAY=4SA
FREQ=WEEKLY;BYDAY=SA
FREQ=WEEKLY;INTERVAL=2;BYDAY=TH
FREQ=WEEKLY;BYDAY=TH
FREQ=WEEKLY;UNTIL=20170610T085959Z;BYDAY=SA
FREQ=MONTHLY;BYDAY=2TH
Each line is a different array, I am giving a few clues to get an idea of what I need.
What I need is to write a regex that would take off all unnecessary values.
So, I don't need FREQ= ; BYDAY= etc. I basically need the values after = but each one I want to store in a different variable.
Taking third one as an example it would be:
$frequency = WEEKLY
$until = 20170728T080000Z
$day = MO, TU, WE, TH, FR
It doesn't have to be necessarily one regex, there can be one regex for each value. So I have one for FREQ:
preg_match("/[^FREQ=][A-Z]+/", $input_line, $output_array);
But I can't do it for the rest unfortunately, how can I solve this?
One way to go is PHP array destructuring (note that it assigns by position, so it relies on every rule having the same fields in the same order):
$str = "FREQ=WEEKLY;UNTIL=20170728T080000Z;BYDAY=MO,TU,WE,TH,FR";
preg_match_all('~(\w+)=([^;]+)~', $str, $matches);
[$freq, $until, $byday] = $matches[2]; // As of PHP 7.1 (before that, use list())
echo $freq, " ", $until, " ", $byday;
// WEEKLY 20170728T080000Z MO,TU,WE,TH,FR
Be more general
Using extract function:
preg_match_all('~(\w+)=([^;]+)~', $str, $m);
$m[1] = array_map('strtolower', $m[1]);
$vars = array_combine($m[1], $m[2]);
extract($vars);
echo $freq, " ", $until, " ", $byday;
Note: For this problem I recommend the general approach @revo posted; it's concise, safe, and easy on the eyes. Keep in mind, though, that regular expressions come with a performance penalty compared to fixed-string functions, so if you can use strpos/substr/explode/..., try them first rather than reaching for a preg_-based solution by reflex.
Since the separators are fixed and don't seem to occur in the values you are interested in, and you furthermore rely on knowledge of the keys (FREQ=, etc.), you don't need regular expressions (as much as I like to use them anywhere I can, and you can use them here). Why not simply explode and split in this case?
$lines = explode("\n", $text);
foreach($lines as $line) {
    $parts = explode(';', $line);
    $frequency = $until = $day = $interval = null;
    foreach($parts as $part) {
        list($key, $value) = explode('=', $part);
        switch($key) {
            case 'FREQ':
                $frequency = $value;
                break;
            case 'INTERVAL':
                $interval = $value;
                break;
            // and so on
        }
    }
    doSomethingWithTheValues();
}
This may be more readable and efficient if your use-case is as simple as stated.
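If you'd rather not write one case per key, the same explode idea can fill an associative array instead. This is only a sketch; the helper name parseRule is made up for illustration. Optional fields such as UNTIL or INTERVAL simply come back as null when a rule doesn't have them:

```php
<?php
// Hypothetical helper: turn "FREQ=WEEKLY;BYDAY=TH" into an associative array.
function parseRule(string $rule): array
{
    $fields = [];
    foreach (explode(';', $rule) as $part) {
        // Limit 2 keeps any "=" inside the value intact.
        $pair = explode('=', $part, 2);
        if (count($pair) === 2) {
            $fields[$pair[0]] = $pair[1];
        }
    }
    return $fields;
}

$rule = parseRule('FREQ=WEEKLY;UNTIL=20170728T080000Z;BYDAY=MO,TU,WE,TH,FR');
$frequency = $rule['FREQ'] ?? null;
$until     = $rule['UNTIL'] ?? null;    // null when the rule has no UNTIL
$interval  = $rule['INTERVAL'] ?? null; // null here: this rule has no INTERVAL
$day       = $rule['BYDAY'] ?? null;
```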
You can use the pattern
;?[A-Z]+=
together with preg_split():
preg_split('/;?[A-Z]+=/', $str);
Explanation:
; matches a semicolon
? makes the preceding character optional (zero or one occurrence)
[A-Z]+ matches one or more uppercase letters
= matches a literal =
If you want each line in a separate array, you can do it this way:
# split the input into an array with one element per line
$lines = preg_split('/[\n\r]+/', $str);
# initialize the array for the results
$results = array();
# loop through the lines
foreach($lines as $line){
    # split each line by the regex mentioned above and
    # put the resulting array into the results array
    $results[] = preg_split('/;?[A-Z]+=/', $line);
}
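One thing to watch with this pattern: it also matches the key at the very start of the line, so the result begins with an empty string. A small sketch of how the PREG_SPLIT_NO_EMPTY flag drops it, using the third rule from the question as input:

```php
<?php
$str = 'FREQ=WEEKLY;UNTIL=20170728T080000Z;BYDAY=MO,TU,WE,TH,FR';

// Without the flag the result starts with an empty string,
// because the pattern matches "FREQ=" at offset 0.
$values = preg_split('/;?[A-Z]+=/', $str, -1, PREG_SPLIT_NO_EMPTY);
print_r($values);
```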

Remove everything before http in every element of array

I have an array called $urls and I want to remove everything before http for every element in the array
suppose
$urls[1] = hd720\u0026url=http%3A%2F%2Fr2---sn-h50gpup0nuxaxjvh-hg0l.googlevideo.com%2Fvideoplayback%3Fexpire%3D1387559704%26fexp%3D937407%252C908540%252C941239%252C916623%252C909717%252C932295%252C936912%252C936910%252C923305%252C936913%252C907231%252C907240%252C921090%
I want it to be
$urls[1] = http%3A%2F%2Fr2---sn-h50gpup0nuxaxjvh-hg0l.googlevideo.com%2Fvideoplayback%3Fexpire%3D1387559704%26fexp%3D937407%252C908540%252C941239%252C916623%252C909717%252C932295%252C936912%252C936910%252C923305%252C936913%252C907231%252C907240%252C921090%
Here I gave an example only for $urls[1], but I want to remove every character up to http for ALL elements of the array
I tried
$urls = strstr($urls, 'http');
$urls = preg_replace('.*(?=http://)', '', $urls);
Both didn't work
Use array_map() with a callback function:
$urls = array_map(function($url) {
    return preg_replace('~.*(?=http://)~', '', urldecode($url)); // the pattern has no capture group, so replace with ''
}, $urls);
strstr coupled with array_map gives you the expected result.
$furls = array_map('filterArr',$urls);
function filterArr($v)
{
return urldecode(strstr($v,'http'));
}
print_r($furls);
I'd do it like this:
foreach($urls as $key=>$val) {
$e = &$urls[$key]; // notice the & sign
// now whatever you do with $e will go back
// into the original array element
$e = preg_replace(.............);
}
I always use this technique to convert arrays since it's fast and efficient. The array_walk / array_filter way is also good but much slower when your array is medium to big.
You can cut everything before http with explode(). It works on a single string, so apply it to one element at a time:
$parts = explode("http", $urls[1], 2); // split at the first "http" only
$before = $parts[0]; // the part before http, e.g. hd720\u0026url=
echo $before;
Note that $parts[1] holds the part after http: %3A%2F%2Fr2---sn-h50...
So you can rebuild the URL like this:
$fixedUrl = 'http' . $parts[1]; // http%3A%2F%2Fr2---sn-h50gpup0nuxaxjvh-hg0l...
You are just missing the delimiters around your regex; preg_replace() also works directly on an array:
$urls = preg_replace('~.*(?=http://)~', '', $urls);
// delimiters: the ~ at each end of the pattern
I used ~ to avoid escaping the //. With / as the delimiter it would be:
$urls = preg_replace('/.*(?=http:\/\/)/', '', $urls);
// delimiters: the / at each end of the pattern
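The sample value in the question is URL-encoded (http%3A%2F%2F...), so a lookahead for http:// only matches after urldecode(). A sketch without regex that cuts at the first "http" and leaves elements without it untouched; the example.com value is a shortened, made-up stand-in for the long URL in the question:

```php
<?php
$urls = [
    'hd720\u0026url=http%3A%2F%2Fexample.com%2Fvideoplayback', // shortened stand-in
    'no-url-here',
];

// Keep each element from the first "http" onward; leave it untouched
// when "http" does not occur (strstr() would return false there).
$urls = array_map(function ($url) {
    $pos = strpos($url, 'http');
    return $pos === false ? $url : substr($url, $pos);
}, $urls);
print_r($urls);
```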

In comma delimited string is it possible to say "exists" in php

In a comma delimited string, in php, as such: "1,2,3,4,4,4,5" is it possible to say:
if(!/*4 is in string bla*/){
// add it via the .=
}else{
// do something
}
In arrays you can do in_array(); but this isn't a set of arrays and I don't want to have to convert it to an array ....
Try exploding it into an array before searching:
$str = "1,2,3,4,4,4,5";
$exploded = explode(",", $str);
if(in_array($number, $exploded)){
echo 'In array!';
}
You can also replace numbers and modify the array before "sticking it back together" with implode:
$strAgain = implode(",", $exploded);
You could do this with regex:
$re = '/(^|,)' . preg_quote($your_number, '/') . '(,|$)/'; // PHP concatenates strings with ".", not "+"
if(preg_match($re, $your_string)) {
// ...
}
But that's not exactly the clearest of code; someone else (or even yourself, months later) who had to maintain the code would probably not appreciate having something that's hard to follow. Having it actually be an array would be clearer and more maintainable:
$values = explode(',', $your_string);
if(in_array((string)$number, $values)) {
// ...
}
If you need to turn the array into a string again, you can always use implode():
$new_string = implode(',', $values);
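If you really don't want to build an array, a plain string search also works as long as you compare whole items. A minimal sketch; the helper name listContains is made up for illustration:

```php
<?php
// Hypothetical helper: does $needle occur as a whole item in the list?
function listContains(string $list, string $needle): bool
{
    // Wrapping both in commas stops "4" from matching inside "14" or "40".
    return strpos(',' . $list . ',', ',' . $needle . ',') !== false;
}

$str = '1,2,3,4,4,4,5';
if (!listContains($str, '6')) {
    $str .= ',6'; // add it via .=
}
```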

Separate each word into a Array

I have a file with contents like :
Apple 100
banana 200
Cat 300
I want to search for a particular string in the file and get the next word. Eg: I search for cat, I get 300. I have looked up this solution: How to Find Next String After the Needle Using Strpos(), but that didn't help and I didn't get the expected output. I would be glad if you can suggest any method without using regex.
I'm not sure this is the best approach, but with the data you've provided, it'll work.
Get the contents of the file with fopen() and fread()
Separate the values into array elements with explode()
Iterate over your array and check each element's index as odd or even. Copy to new array.
Not perfect, but on the right track.
<?php
$filename = 'data.txt'; // Let's assume this is the file you mentioned
$handle = fopen($filename, 'r');
$contents = fread($handle, filesize($filename));
fclose($handle);
$clean = trim(preg_replace('/\s+/', ' ', $contents)); // collapse all whitespace to single spaces
$flat_elems = explode(' ', $clean);
$multi = array();
$ii = count($flat_elems);
for ($i = 0; $i + 1 < $ii; $i += 2) {
    $multi[$flat_elems[$i]] = $flat_elems[$i + 1]; // even index = key, odd index = value
}
print_r($multi);
This outputs an associative array like this:
Array
(
[Apple] => 100
[banana] => 200
[Cat] => 300
)
Try this, it doesn't use regex, but it will be inefficient if the string you're searching is long:
function get_next_word($string, $preceding_word)
{
    // Turns the string into an array by splitting on spaces
    $words_as_array = explode(' ', $string);
    // Search the array of words for the word before the word we want to return
    if (($position = array_search($preceding_word, $words_as_array)) !== FALSE)
        return $words_as_array[$position + 1]; // Returns the next word
    else
        return false; // Could not find word
}
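$find = 'Apple';
// $content holds the file contents; preg_quote() escapes any regex characters in $find
preg_match_all('/' . preg_quote($find, '/') . '\s(\d+)/', $content, $matches);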
print_r($matches);
You might benefit from using named regex subpatterns to capture the information you're looking for.
For example, to find the number (1 <= value <= 9999) that follows a given word:
/*String to search*/
$str = "cat 300";
/*String to find*/
$find = "cat";
/*Search for value*/
preg_match("/^" . preg_quote($find, "/") . "\s+(?P<value>[0-9]{1,4})$/", $str, $r);
/*Print results*/
print_r($r);
In cases where a match is found the results array will contain the number you're looking for indexed as 'value'.
This approach can be combined with
file_get_contents($file);
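Since the question asked for a method without regex, here is a minimal sketch that walks the lines and compares the first word case-insensitively (the question searches for "cat" while the file has "Cat"). The helper name valueFor is made up for illustration, and the text is inlined so the example runs on its own:

```php
<?php
// Hypothetical helper: return the word that follows $needle on its line,
// or null when no line starts with $needle. Comparison ignores case.
function valueFor(string $text, string $needle): ?string
{
    foreach (explode("\n", $text) as $line) {
        // Split on whitespace without regex: strtok() walks the tokens.
        $word = strtok(trim($line), " \t");
        $next = strtok(" \t");
        if ($word !== false && $next !== false && strcasecmp($word, $needle) === 0) {
            return $next;
        }
    }
    return null;
}

$text = "Apple 100\nbanana 200\nCat 300"; // file_get_contents() could supply this
$value = valueFor($text, 'cat');
echo $value;
```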

PHP - get line number of regex result

I'm trying to write some PHP that will read a CSS file, find all occurrences of the #group comment, and their line number. This is what I have so far, but it's returning the character count rather than the line number.
$file = 'master.css';
$string = file_get_contents($file);
$matches = array();
preg_match_all('/\/\* #group.*?\*\//m', $string, $matches, PREG_OFFSET_CAPTURE);
list($capture, $offset) = $matches[0];
$line_number = substr_count(substr($string, 0, $offset), "\n") + 1;
echo '<pre>';
print_r($matches[0]);
echo '</pre>';
Try using file() rather than file_get_contents(). The difference is that file() returns the file contents as an array, one element per line, rather than as a string like file_get_contents does. I should note that file() returns the newline character at the end of each line as part of the array element. If you don't want that, add the FILE_IGNORE_NEW_LINES flag as a second parameter.
From there, you can use preg_grep() to return only the elements of the initial array that match the pattern. preg_grep() preserves the original keys, so you can read them to determine which lines matched; the keys are zero-based, so add 1 to get the line number:
An example:
myfile.txt:
hello world
how are you
say hello back!
line_find.php:
$filename = "myfile.txt";
$fileContents = file($filename);
$pattern = "/hello/";
$linesFound = preg_grep($pattern, $fileContents);
echo "<pre>", print_r($linesFound, true), "</pre>";
Result:
Array
(
[0] => hello world
[2] => say hello back!
)
Hope that helps.
This is not going to be optimal, but if you don't care about that :
$line_number = 1 + substr_count($string, "\n", 0, $index);
It's just counting the number of new line characters found up until that index you get from the offset capture.
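Putting the offset idea together for every match, not just one. This is a sketch with an inlined, made-up CSS string standing in for master.css; with PREG_OFFSET_CAPTURE each entry in $matches[0] is a pair of matched text and byte offset:

```php
<?php
// Made-up sample standing in for the contents of master.css.
$string = "body {}\n/* #group header */\nh1 {}\n\n/* #group footer */\np {}";
preg_match_all('/\/\* #group.*?\*\//', $string, $matches, PREG_OFFSET_CAPTURE);

$lines = [];
// With PREG_OFFSET_CAPTURE each match is a [text, byteOffset] pair.
foreach ($matches[0] as [$text, $offset]) {
    // Newlines before the match + 1 = 1-based line number.
    $lines[] = [substr_count($string, "\n", 0, $offset) + 1, $text];
}
print_r($lines);
```

Here the two comments are reported on lines 2 and 5.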
