Right now I'm using the command below to get the volume names of the mounted disks in OSX:
$exec = "df -lH | grep \"/Volumes/*\" | tr -s \" \" | sed 's/ /;/g'";
And parsing the output using this code:
$lines = explode("\n", $output);
$i = 0;
foreach ($lines as $line) {
$driveinfo = explode(";", $line);
$driveinfo[7] = trim($driveinfo[0]);
if (!empty($driveinfo[0]))
$allremovabledrives[$driveinfo[0]] = $driveinfo;
$i++;
}
This works fine if the Volume label doesn't have spaces in it:
[/dev/disk1s1] => Array
(
[0] => /dev/disk1s1
[1] => 32G
[2] => 31G
[3] => 674M
[4] => 98%
[5] => 0
[6] => 0
[7] => /dev/disk1s1
[8] => /Volumes/LUMIX
)
But if I mount a disk with a volume name that has spaces, disaster strikes and extra array values get added:
[/dev/disk4] => Array
(
[0] => /dev/disk4
[1] => 4.0T
[2] => 1.2T
[3] => 2.8T
[4] => 29%
[5] => 140741078
[6] => 347553584
[7] => /dev/disk4
[8] => /Volumes/My
[9] => Passport
[10] => Pro
)
Can anybody help me solve this problem? I'm not well versed in sed and command-line utilities ...
OK, the volume name is always the last field, and you know how many fields there are (9), so I would just split on whitespace and ask for that many fields. And not bother with any sed/awk/grep/tr stuff since you're already in a full-fledged programming system that can do what those commands do more efficiently within its own process space.
First, you can pass the list of volumes you want info about to df as arguments, which means you don't need the grep:
$df = shell_exec('df -lH /Volumes/*');
Now split on newline and get rid of the headers:
$rows = explode("\n", trim($df)); // trim() avoids an empty row after the final newline
array_shift($rows);
Start building your result:
$result = array();
Here's where we don't need to use shell utilities just to make it possible to do with explode what we can already do with preg_split. The regular expression /\s+/ matches 1 or more whitespace characters in a row, so we don't get extra fields. The limit (9) means it only splits into 9 fields no matter how many more spaces there are - so the spaces in the last field (the volume name) get left alone.
foreach ($rows as $row) {
$cols = preg_split('/\s+/', $row, 9);
$result[$cols[0]] = $cols;
}
After all that, $result should look like you want.
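Putting the steps together, here is a minimal sketch that runs against a hardcoded sample instead of calling df. The sample text and the function name parse_df are made up for illustration; real df output on your machine will differ.

```php
<?php
// Hypothetical sample of macOS `df -lH /Volumes/*` output; real values will differ.
$df = <<<TXT
Filesystem     Size   Used  Avail Capacity iused ifree %iused  Mounted on
/dev/disk1s1    32G    31G   674M    98%       0     0  100%   /Volumes/LUMIX
/dev/disk4     4.0T   1.2T   2.8T    29% 140741078 347553584   29%   /Volumes/My Passport Pro
TXT;

/**
 * Parse df output into rows keyed by device, keeping spaces in the mount point.
 */
function parse_df(string $df): array
{
    $rows = explode("\n", trim($df));
    array_shift($rows); // drop the header line

    $result = [];
    foreach ($rows as $row) {
        // Split on runs of whitespace, but stop after 9 fields so the
        // mount point (the last field) keeps its internal spaces.
        $cols = preg_split('/\s+/', trim($row), 9);
        if (count($cols) === 9) {
            $result[$cols[0]] = $cols;
        }
    }
    return $result;
}

$volumes = parse_df($df);
echo $volumes['/dev/disk4'][8], "\n"; // /Volumes/My Passport Pro
```

To use it with live data, pass the string returned by shell_exec('df -lH /Volumes/*') to parse_df() instead of the sample.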
I want to replace the first character of a string with a specific character depending on its value,
A = 0
B = 1
C = 2
Is there a way to do this based on rules? In total I will have 8 rules.
Ok, so I'm editing this to add more information as I don't think some people understand / want to help without the full picture...
My string will be any length between 5 and 10 characters
Capitals will not factor into this, it is not case sensitive
Currently there is no code, I'm not sure the best way to do this. I can write an if statement on a substring, but I know straight away that is inefficient.
Below is the before and after that I am expecting, I have kept these examples simple but all I am looking to do is replace the first character with a specific character depending on its value. For now, there are eight rules, but this could grow in the future
INPUT OUTPUT
ANDREW 1NDREW
BRIAN 2RIAN
BOBBY 2OBBY
CRAIG 3RAIG
DAVID 4AVID
DUNCAN 4UNCAN
EDDIE 5DDIE
FRANK 6RANK
GEOFF 7EOFF
GIANA 7IANA
HAYLEY 8AYLEY
So as you can see, it's pretty straightforward, but is there a simple way to specify what each character should be replaced by?
Assuming all the rules are for single characters, as in the example, it would be easiest to code them into a dictionary (an associative array):
$rules = array('A' => '0', 'B' => '1' /* etc... */);
$str[0] = $rules[$str[0]];
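A slightly fuller sketch of the lookup-table idea follows. The mapping is taken from the INPUT/OUTPUT table in the question (A becomes 1, B becomes 2, and so on); the function name replace_first_char and the choice to leave strings without a matching rule unchanged are my own assumptions.

```php
<?php
// Rule table: first letter => replacement. Only the rules visible in the
// question's examples are listed; extend it as new rules appear.
const RULES = [
    'A' => '1', 'B' => '2', 'C' => '3', 'D' => '4',
    'E' => '5', 'F' => '6', 'G' => '7', 'H' => '8',
];

function replace_first_char(string $str, array $rules = RULES): string
{
    if ($str === '') {
        return $str;
    }
    $key = strtoupper($str[0]);          // rules are case-insensitive
    if (!isset($rules[$key])) {
        return $str;                     // no rule: leave the string unchanged
    }
    return $rules[$key] . substr($str, 1);
}

echo replace_first_char('ANDREW'), "\n"; // 1NDREW
echo replace_first_char('brian'), "\n";  // 2rian
echo replace_first_char('ZOE'), "\n";    // ZOE
```

Adding a ninth rule later is then a one-line change to the table rather than another if branch.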
I think this is what you want.
<?php
$input = array('ANDREW','BRIAN','BOBBY','CRAIG','DAVID','DUNCAN','EDDIE','FRANK','GEOFF','GIANA','HAYLEY');
$array = range('A','Z');
$array = array_flip(array_filter(array_merge(array(0), $array)));
$output = [];
foreach($input as $k=>$v){
$output[] = $array[$v[0]].substr($v, 1);
}
print_r($output);
?>
Output:
Array (
[0] => 1NDREW
[1] => 2RIAN
[2] => 2OBBY
[3] => 3RAIG
[4] => 4AVID
[5] => 4UNCAN
[6] => 5DDIE
[7] => 6RANK
[8] => 7EOFF
[9] => 7IANA
[10] => 8AYLEY
)
DEMO: https://3v4l.org/BHLPk
I'm using the League CSV library, but the exact same thing happens when I use the built-in PHP functions. Here is my spreadsheet:
ID column A column B column C
123 apple orange pear
And here is my code:
$stmt = (new Statement())
->offset(0)
->limit(2)
;
$records = $stmt->process($csv);
foreach ($records as $record) {
print_r($record);
}
Finally, here is the output. Notice, the ID value (123) is hanging off to the left. I'm not even sure what that is supposed to mean.
Array
(
[0] => ID
[1] => column A
[2] => column B
123 [3] => column C
[4] => apple
[5] => orange
[6] => pear
)
Edit: Here is the raw CSV file. Newline character is possibly a carriage return?
ID,column A,column B,column C
123,apple,orange,pear
This was really lame. Since I'm on a Mac, I needed to add this line to the top of the script:
ini_set('auto_detect_line_endings',TRUE);
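As far as I know, the auto_detect_line_endings setting is deprecated as of PHP 8.1, so on newer versions it may be safer to normalize the line endings yourself before parsing. Here is a minimal sketch of that idea using only built-in functions; the $raw string stands in for the file contents and is made up for illustration.

```php
<?php
// Simulated file contents with classic Mac (CR-only) line endings.
$raw = "ID,column A,column B,column C\r123,apple,orange,pear\r";

// Convert CRLF and lone CR to LF so each record ends up on its own line.
$normalized = preg_replace('/\r\n?/', "\n", $raw);

$records = [];
foreach (explode("\n", trim($normalized)) as $line) {
    // Separator, enclosure and escape are passed explicitly.
    $records[] = str_getcsv($line, ',', '"', '\\');
}

print_r($records[1]); // the row starting with 123
```

Splitting on newlines like this assumes no quoted field contains a line break; for files that do, writing the normalized text to a temporary stream and reading it with fgetcsv would be the more careful route.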
Can the preg_match() function include groups it did not find in the matches array?
Here is the pattern I'm using:
/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/
What I'm trying to do is parse a human-readable size into bytes. This pattern fits my requirement, but only if I can retrieve matches in the absolute group order.
This can produce up to 5 match groups, which would result in a matches array with indices 0-5. However, if the string does not match all groups, then the matches array may have, for example, group 5 actually at index 3.
What I'd like is the final match in that pattern (5) to always be at the same index of the matches array. Because multiple groups are optional it's very important that when reading the matches array we know which group in the expression got matched.
Example situation: The regex tester at regexr.com will show all 5 groups including those not matched always in the correct order. By enabling the "global" and "multi-line" flags and using the following text, you can hover over the blue matches for a good visual.
500.2 KiB
256M
700 Mb
1.2GiB
You'll notice that not all groups are always matched, however the group indexes are always in the correct order.
Edit: Yes I did already try this in PHP with the following:
$matches = [];
$matchesC = 0;
$matchesN = 6;
if (!preg_match("/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/", $size, $matches) || ($matchesC = count($matches)) < $matchesN) {
print_r($matches);
throw new \Exception(sprintf("Could not parse size string. (%d/%d)", $matchesC, $matchesN));
}
When $size is "256M" that print_r($matches); returns:
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
)
Groups 4 and 5 are missing.
In PHP, groups that do not participate in the match are not initialized with an empty string, so for the '256M' string groups 4 and 5 are simply unset. preg_match drops such unmatched groups from the end of the array.
In your case, you can make the capturing groups themselves mandatory and make the patterns inside them optional.
$arr = array('500.2 KiB', '256M', '700 Mb', '1.2GiB');
foreach ($arr as $s) {
if (preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $s, $m)) {
print_r($m);
}
}
Output:
Array
(
[0] => 500.2 KiB
[1] => 500
[2] => .2
[3] => K
[4] => i
[5] => B
)
Array
(
[0] => 256M
[1] => 256
[2] =>
[3] => M
[4] =>
[5] =>
)
Array
(
[0] => 700 Mb
[1] => 700
[2] =>
[3] => M
[4] =>
[5] => b
)
Array
(
[0] => 1.2GiB
[1] => 1
[2] => .2
[3] => G
[4] => i
[5] => B
)
See the PHP demo.
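Since the groups now sit at fixed indices, turning the match into a byte count is straightforward. Below is a rough sketch; the function name parse_size and the unit conventions are my assumptions, not something stated in the question: an "i" selects powers of 1024, anything else powers of 1000, and a lowercase "b" is treated as bits.

```php
<?php
/**
 * Convert a human-readable size such as "1.2GiB" or "700 Mb" to bytes.
 * Assumed conventions: "i" => powers of 1024, otherwise powers of 1000;
 * a lowercase "b" means bits, so the result is divided by 8.
 */
function parse_size(string $size): float
{
    // Groups 3-5 always participate; only their contents are optional.
    if (!preg_match('~^([0-9]+)(\.[0-9]+)?\s?([^ib]?)(i?)(b?)$~i', $size, $m)) {
        throw new InvalidArgumentException("Could not parse size string: $size");
    }
    [, $int, $frac, $prefix, $binary, $unit] = $m;

    $power = 0;
    if ($prefix !== '') {
        $pos = stripos('KMGTPE', $prefix);
        if ($pos === false) {
            throw new InvalidArgumentException("Unknown unit prefix: $prefix");
        }
        $power = $pos + 1; // K => 1, M => 2, G => 3, ...
    }

    $base  = $binary !== '' ? 1024 : 1000;
    $bytes = (float) ($int . $frac) * ($base ** $power);

    return $unit === 'b' ? $bytes / 8 : $bytes;
}

echo parse_size('256M'), "\n";   // 256000000
echo parse_size('700 Mb'), "\n"; // 87500000
```

If you want "M" to mean 1024 * 1024 regardless of the "i", change the $base line accordingly.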
You can use T-Regx which can handle such cases with ease! It always checks whether a group is matched, even if it's last and unmatched. It also can differentiate between "" (matched empty) or null (unmatched):
pattern('^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$')
->match($size)
->first(function (Match $match) {
// whether the group was used in a pattern
$match->hasGroup(14);
// whether the group was matched, even if last or empty string
$match->matched(5);
// group, or default value if not matched
$match->group(5)->orReturn('unmatched');
});
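If you would rather stay with the built-in functions, the PREG_UNMATCHED_AS_NULL flag offers a similar distinction between "matched empty" and "unmatched". The sketch below assumes PHP 7.4 or later, where trailing unmatched groups are also reported as null, so the array always has the same size; on 7.2 and 7.3 the trailing groups are still dropped.

```php
<?php
// The original pattern, with the optional groups left as they were.
$pattern = '/^([0-9]+)(\.[0-9]+)?\s?([^iIbB])?([iI])?([bB])?$/';

// With this flag, groups that did not participate are null instead of
// being an empty string or missing from the array.
preg_match($pattern, '256M', $m, PREG_UNMATCHED_AS_NULL);

var_dump(count($m)); // 6 entries on PHP 7.4+: the full match plus five groups
var_dump($m[3]);     // "M"
var_dump($m[5]);     // NULL, because group 5 did not participate
```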
I am trying to get the score table from this page: http://www.skysports.com/football/competitions/bundesliga/table. I do this with
$bundes = file('http://www.skysports.com/football/competitions/bundesliga/table');
And when I try to display the array $bundes, I do it with this:
echo '<pre>', print_r($bundes, true), '</pre>';
The part which I am trying to display comes out like this:
[1437] =>
[1022] => German Bundesliga 2015/16
# Team Pl W D L F A GD Pts Last 6
1 [1059] => [1060] => Bayern Munich [1061] => [1062] => 9 9 0 0 29 4 25 27 [1072] =>
[1073] =>
[1074] =>
This is the first row of the table. I can display $bundes[1060] and get the output Bayern Munich, but how can I get the values from $bundes[1062]? The values are 9, 9, 0, 0, 29, 4, 25 and 27, and I need to display each of them in <td></td>.
When I try to echo $bundes[1062] I get nothing.
A more reliable way of extracting the data is using DOM manipulation classes to do something like:
$doc = new \DOMDocument();
// @ suppresses the warnings libxml raises for imperfect real-world HTML
@$doc->loadHTMLFile('http://www.skysports.com/football/competitions/bundesliga/table');
$xpath = new \DOMXPath($doc);
$rows = $xpath->query('//tbody/tr');
$data = [];
foreach ($rows as $i => $row) {
$columns = $xpath->query('td', $row);
foreach ($columns as $column) {
$data[$i][] = trim($column->textContent);
}
}
print_r($data);
Which gives you:
Array
(
[0] => Array
(
[0] => 1
[1] => Bayern Munich
[2] => 9
[3] => 9
[4] => 0
[5] => 0
[6] => 29
[7] => 4
[8] => 25
[9] => 27
[10] =>
)
...
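To get from there to the <td></td> output the question asks for, something along these lines should work. This is only a sketch: the inline $html is a stand-in I made up so the example runs without a network request, and the real page's markup may differ.

```php
<?php
// Stand-in markup; the real page structure is an assumption and may differ.
$html = <<<HTML
<table><tbody>
<tr><td>1</td><td><a href="#">Bayern Munich</a></td><td>9</td><td>9</td><td>0</td>
<td>0</td><td>29</td><td>4</td><td>25</td><td>27</td></tr>
</tbody></table>
HTML;

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$out = '';
foreach ($xpath->query('//tbody/tr') as $row) {
    $out .= '<tr>';
    foreach ($xpath->query('td', $row) as $cell) {
        // textContent drops nested tags such as the team link
        $out .= '<td>' . htmlspecialchars(trim($cell->textContent)) . '</td>';
    }
    $out .= "</tr>\n";
}
echo $out;
```

libxml_use_internal_errors(true) is used here in place of the @ operator; both keep the parser warnings out of the output.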
Regarding Dagon's comment, no terms can disallow crawling and extracting the data (as long as you do so at a reasonable rate that does not impact the website's performance). Terms of use and copyright law, however, do dictate what you can and cannot do with the crawled content (e.g. republishing it).
Web scraping may be against the terms of use of some websites. The enforceability of these terms is unclear (see "FAQ about linking – Are website terms of use binding contracts?").
- Wikipedia, Web scraping: Legal issues
BTW, the page's robots meta tag does allow INDEX.
In a text file, I have the following strings:
ID | LABEL | A | B | C
--------------------------------------
9999 | Oxygen Isotopes | | 0.15 | 1
8733 | Enriched Uranium | | 1 | 1
I would like to extract the ID and LABEL fields of each line using a regular expression.
How can I achieve this?
I am not certain why you insisted on regex.
As the columns appear to be separated by the | symbol, using the PHP function explode seems like an easier solution.
You would be able to loop through the lines and refer to each column using typical array index notation, for example: $line[0] and $line[1] for ID and LABEL respectively.
I doubt regex is the best solution here.
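Try this to split the text file into an array of lines. Note that explode takes the separator first and the string second; splitting on "\n" assumes Unix line endings, so this might or might not work depending on the OS of the machine you created the txt file on.
$lines = explode("\n", $text);
$final_lines = array();
foreach ($lines as $line) {
// Split on the pipe and trim each cell, so empty columns survive
$parts = array_map('trim', explode('|', $line));
$final_lines[] = $parts;
}
Now you can access all of the data through the line number and then the column. Lines 0 and 1 are the header and the dashed separator, so
$final_lines[3][0]
will contain 8733.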
You could use preg_split on every line:
$array = preg_split('/\s*\|\s*/', $inputLine, 3); // limit 3: ID, label, and the rest of the line
Then as in djdy's answer, the ID will be in $array[0] and the label in $array[1].
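Applied to the whole file, that could look roughly like the sketch below. The $text sample mirrors the question's data, and the variable name $labelsById is just something I picked for illustration.

```php
<?php
$text = <<<TXT
ID   | LABEL            | A | B    | C
--------------------------------------
9999 | Oxygen Isotopes  |   | 0.15 | 1
8733 | Enriched Uranium |   | 1    | 1
TXT;

$labelsById = [];
// \R matches any line ending, so this works for LF, CRLF and CR files.
foreach (preg_split('/\R/', trim($text)) as $n => $line) {
    if ($n < 2) {
        continue; // skip the header and the dashed separator
    }
    // Limit 3: ID, LABEL, and everything else lumped into a third element.
    [$id, $label] = preg_split('/\s*\|\s*/', trim($line), 3);
    $labelsById[$id] = $label;
}
print_r($labelsById);
```

Because the separator pattern swallows the whitespace around each pipe, no extra trimming of the ID or label is needed.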
No regex needed:
<?php
$file = file('file.txt');
$ret = array();
foreach($file as $k=>$line){
if($k<2){continue;}
list($ret['ID'][],
$ret['LABEL'][],
$ret['A'][],
$ret['B'][],
$ret['C'][]) = array_map('trim', explode('|',$line)); // trim the padding around each cell
}
print_r($ret);
//Label: Oxygen Isotopes ID:9999
echo 'Label: '.$ret['LABEL'][0].' ID:'.$ret['ID'][0];
/*
Array
(
[C] => Array
(
[0] => 1
[1] => 1
)
[B] => Array
(
[0] => 0.15
[1] => 1
)
[A] => Array
(
[0] =>
[1] =>
)
[LABEL] => Array
(
[0] => Oxygen Isotopes
[1] => Enriched Uranium
)
[ID] => Array
(
[0] => 9999
[1] => 8733
)
)
*/
?>
Regular expressions might not be the best approach here. I'd read in each line as a string and use explode('|', $line) to make an array of strings, trimming each piece. Index 0 is your ID, index 1 is your label, and so on for A, B, and C if you want. That's a more robust solution than using regex.
A regular expression that gets the ID might be something like
\d{4} \|
(the pipe has to be escaped, because an unescaped | means alternation). You could do something similar for the label field, but again, this isn't as robust as just using explode.
Although a regular expression is not the best approach here, one might look like this:
preg_match_all("/(\d{4}.?)\|(.*?)\|/s", $data, $matches);
$matches[1] and $matches[2] (the 2nd and 3rd elements of $matches) will carry the required data.
Try
$str = file_get_contents($filename);
preg_match_all('/^\s*(\d*)\s*\|\s*(.*?)\s*\|/m', $str, $matches);
// $matches[1] will have ids
// $matches[2] will have labels
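For what it's worth, here is that pattern run against an inline copy of the sample data, with the two result columns paired up using array_combine. The inline $str replaces the file_get_contents call only so that the sketch is self-contained.

```php
<?php
$str = <<<TXT
ID   | LABEL            | A | B    | C
--------------------------------------
9999 | Oxygen Isotopes  |   | 0.15 | 1
8733 | Enriched Uranium |   | 1    | 1
TXT;

// The m flag makes ^ match at the start of every line. The header and the
// dashed line do not match, because they do not start with digits and a pipe.
preg_match_all('/^\s*(\d*)\s*\|\s*(.*?)\s*\|/m', $str, $matches);

// Pair each ID with its label.
$labelsById = array_combine($matches[1], $matches[2]);
print_r($labelsById);
```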