I have a source file containing a select element with some options, like this:
<option value="TTO">1031</option><option value="187">187</option><option value="TWO">2SK8</option><option value="411">411</option><option value="AEL">Abec 11</option><option value="ABE">Abec11</option><option value="ACE">Ace</option><option value="ADD">Addikt</option><option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option><option value="ALG">Alligator</option><option value="ALM">Almost</option>
I would like to read this file using PHP and regex, but I don't really know how. Does anybody have an idea? It would be nice to have an array with the three-character code as the key and the longer string as the value (so, for example, $arr['TWO'] == '2SK8').
<?php
$options= '
<option value="TTO">1031</option><option value="187">187</option><option value="TWO">2SK8</option><option value="411">411</option><option value="AEL">Abec 11</option><option value="ABE">Abec11</option><option value="ACE">Ace</option><option value="ADD">Addikt</option><option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option><option value="ALG">Alligator</option><option value="ALM">Almost</option>
';
preg_match_all( '#(<option value="([^"]+)">([^<]+)<\/option>)#', $options, $arr);
$result = array();
foreach ($arr[0] as $i => $value)
{
    $result[$arr[2][$i]] = $arr[3][$i];
}
print_r($result);
?>
output:
Array
(
[TTO] => 1031
[187] => 187
[TWO] => 2SK8
[411] => 411
[AEL] => Abec 11
[ABE] => Abec11
[ACE] => Ace
[ADD] => Addikt
[AFF] => Affiliate
[ALI] => Alien Workshop
[ALG] => Alligator
[ALM] => Almost
)
What about something like this :
$html = <<<HTML
<option value="TTO">1031</option><option value="187">187</option>
<option value="TWO">2SK8</option><option value="411">411</option>
<option value="AEL">Abec 11</option><option value="ABE">Abec11</option>
<option value="ACE">Ace</option><option value="ADD">Addikt</option>
<option value="AFF">Affiliate</option><option value="ALI">Alien Workshop</option>
<option value="ALG">Alligator</option><option value="ALM">Almost</option>
HTML;
$matches = array();
if (preg_match_all('#<option\s+value="([^"]+)">([^<]+)</option>#', $html, $matches)) {
    $list = array();
    $num_matches = count($matches[0]);
    for ($i=0 ; $i<$num_matches ; $i++) {
        $list[$matches[1][$i]] = $matches[2][$i];
    }
    var_dump($list);
}
The output ($list) would be :
array
'TTO' => string '1031' (length=4)
187 => string '187' (length=3)
'TWO' => string '2SK8' (length=4)
411 => string '411' (length=3)
'AEL' => string 'Abec 11' (length=7)
'ABE' => string 'Abec11' (length=6)
'ACE' => string 'Ace' (length=3)
'ADD' => string 'Addikt' (length=6)
'AFF' => string 'Affiliate' (length=9)
'ALI' => string 'Alien Workshop' (length=14)
'ALG' => string 'Alligator' (length=9)
'ALM' => string 'Almost' (length=6)
A few explanations:
I'm using preg_match_all to match as many times as possible
([^"]+) means "everything that is not a double quote (as that one would mark the end of the value), at least one time, and as many times as possible (+)"
([^<]+) means about the same thing, but with < instead of " as the end marker
preg_match_all will give me an array containing in $matches[1] the list of everything that matched the first set of (), and in $matches[2] what matched the second set of ()
so I need to iterate over the results to reconstruct the list that interests you :-)
Hope this helps -- and that you understood what it does and how, so you can help yourself, the next time ;-)
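As a side note on the last step: because $matches[1] and $matches[2] are parallel lists, the loop can also be replaced by array_combine. A minimal sketch, assuming a shortened version of the input (the sample string below is only for illustration):

```php
<?php
// Shortened sample input, for illustration only
$html = '<option value="TTO">1031</option><option value="TWO">2SK8</option><option value="ALI">Alien Workshop</option>';

$list = array();
if (preg_match_all('#<option\s+value="([^"]+)">([^<]+)</option>#', $html, $matches)) {
    // $matches[1] holds the codes and $matches[2] the labels, in the same order,
    // so they can be zipped into code => label pairs directly
    $list = array_combine($matches[1], $matches[2]);
}
print_r($list);
```

If the same code appears twice in the input, the later label overwrites the earlier one, just as it does in the loop version.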
As a sidenote : using regex to "parse" HTML is generally not such a good idea... If you have a full HTML page, you might want to take a look at DOMDocument::loadHTML.
If you don't and the format of the options is not well-defined... Well, maybe it might prove useful to add some stuff to the regex, as a precaution... (Like accepting spaces here and there, accepting other attributes, ...)
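For comparison, here is a rough sketch of the DOMDocument route mentioned above. It assumes the options sit inside a select element and that the DOM extension is available; the sample markup is shortened and only illustrative:

```php
<?php
// Shortened sample markup, for illustration only
$html = '<select><option value="TTO">1031</option><option value="TWO">2SK8</option><option value="ALI">Alien Workshop</option></select>';

// Collect libxml warnings internally instead of printing them for imperfect markup
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$list = array();
foreach ($dom->getElementsByTagName('option') as $option) {
    // value attribute becomes the key, the visible text becomes the value
    $list[$option->getAttribute('value')] = trim($option->textContent);
}
print_r($list);
```

This version does not care about extra attributes or whitespace inside the tags, which is the main reason to prefer a parser when the format is not tightly controlled.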
Try this out. Just load the file's contents into $raw_html and use this regex to collect the matches. With PREG_SET_ORDER, the three-character code from the $i-th option is $out[$i][1], and the longer string is $out[$i][2]. You can convert that to an associative array as needed.
$regex = '|<option value="(.{3})">([^<]+)</option>|';
preg_match_all($regex, $raw_html, $out, PREG_SET_ORDER);
print_r($out);
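One possible way to do that conversion is array_column (PHP 5.5+), which can pick column 2 as the value and column 1 as the key from each match set. A small sketch with a shortened, illustrative $raw_html:

```php
<?php
// Shortened sample input, for illustration only
$raw_html = '<option value="TTO">1031</option><option value="TWO">2SK8</option>';

$regex = '|<option value="(.{3})">([^<]+)</option>|';
preg_match_all($regex, $raw_html, $out, PREG_SET_ORDER);

// Each $out[$i] is [full match, code, label]; index by the code, keep the label
$list = array_column($out, 2, 1);
print_r($list);
```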
I need some help. I want to ignore the comma inside one specific string. The input is a comma-separated (CSV) line, but one of the names contains a comma, and I need the split to ignore that one.
What I have so far is
<?php
$pattern = '/([\\W,\\s]+Inc.])|[,]/';
$subject = 'hypertext language, programming, Amazon, Inc., 100';
$limit = -1;
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
$result = preg_split ($pattern, $subject, $limit, $flags);
?>
Result is
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon',
3 => ' Inc.',
4 => ' 100',
);
?>
And I want the result to be
$result (php code):
<?php
array (
0 => 'hypertext language',
1 => ' programming',
2 => ' Amazon, Inc.',
3 => ' 100',
);
?>
Thanks for your help :)
Note that [\W,\s] = \W, since \W matches any character that is not a letter, digit or underscore, which already covers commas and whitespace. However, it seems you just want to split on a , that is not followed by optional whitespace and then Inc.
You may use a negative lookahead to achieve this:
/,(?!\s*Inc\.)/
  ^^^^^^^^^^^^
The (?!\s*Inc\.) lookahead fails the match for any , that is followed by zero or more whitespace characters (\s*) and then the literal characters Inc.
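Plugged into preg_split with the subject from the question, this could look roughly like the sketch below. DELIM_CAPTURE is dropped here since the pattern no longer has a capturing group:

```php
<?php
$subject = 'hypertext language, programming, Amazon, Inc., 100';

// Split on every comma except one directly followed by optional whitespace and "Inc."
$result = preg_split('/,(?!\s*Inc\.)/', $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($result);
```

The leading spaces of each piece are kept, as in the desired output; wrapping the call in array_map('trim', ...) would remove them if that is preferred.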
From your tutorial, if I pull the Amazon information as a CSV, I get the following format, which you can then parse with one of PHP's native functions. This shows you don't need explode or a regex to handle this data. Use the right tool for the job:
<?php
$csv =<<<CSV
"amzn","Amazon.com, Inc.",765.56,"11/2/2016","4:00pm","-19.85 - -2.53%",10985
CSV;
$array = str_getcsv($csv);
var_dump($array);
Output:
array (size=7)
0 => string 'amzn' (length=4)
1 => string 'Amazon.com, Inc.' (length=16)
2 => string '765.56' (length=6)
3 => string '11/2/2016' (length=9)
4 => string '4:00pm' (length=6)
5 => string '-19.85 - -2.53%' (length=15)
6 => string '10985' (length=5)
I have this array
array (size=7)
0 => string 'Â bs-0468R(20UG)' (length=16)
1 => string 'Â bs-1338R(1ML)' (length=15)
2 => string 'Â bs-1557G(NO BSA)' (length=18)
3 => string 'Â bs-3295R(NO BSA)' (length=18)
4 => string '" bs-0730R' (length=10)
5 => string '" bs-3889R' (length=10)
6 => string 'bs-0919R (NO BSA)' (length=17)
I want to throw away everything else and keep only the strings that start with bs.
What is the best way of doing it?
Something like this:
$result = array_filter($array, function ($i) {
    return strpos($i, 'bs')===0;
});
I love preg_grep:
$result = preg_grep('/^bs/', $array);
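One detail worth knowing is that preg_grep keeps the original keys. A short sketch with a reduced version of the array; the "\xC2\xA0" prefix is my assumption about what the odd leading characters in the dump are (a UTF-8 non-breaking space):

```php
<?php
$array = array(
    "\xC2\xA0bs-0468R(20UG)", // assumed: leading non-breaking space, as the dump suggests
    '" bs-0730R',
    'bs-0919R (NO BSA)',
);

// Keep only the entries that begin with "bs"; original keys are preserved
$result = preg_grep('/^bs/', $array);

// Reindex from 0 if consecutive keys are needed
$reindexed = array_values($result);
print_r($reindexed);
```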
I agree with @Casimir et Hippolyte. If you know you're always going to have a controlled dataset such as your example (rare), you can also index into the string with array-style offsets, which PHP supports on strings directly. This assumes every string has at least two characters:
$result = array_filter($array, function ($v) {
    return $v[0] . $v[1] === 'bs';
});
Regex is amazing and not a performance problem in most situations. However, I have had cases where plain string functions were far faster and more efficient when it counted. I understand that this does not apply to the majority of applications, but it is worth noting.
I need to parse a lot of files, get the header declaration from each of them, and add them all to an array. It doesn't matter if there are duplicates, since I'll use array_unique afterwards to keep only the unique ones.
Some files have comments at the top, so I can't just pick the first line. The declaration looks like this:
private ["_aaaaaaa", "_bbbbbb", "_ccccc", "_dddddddd"];
but sometimes it can be like this (no space)
private["_aaaaaaa","_bbbbbb","_ccccc","_dddddddd"];
or like this (if the guy who wrote it didn't pay attention)
private["_aaaaaaa", "_bbbbbb","_ccccc", "_dddddddd"];
So far I've got this:
<?php
$str = 'private ["_aaaaaaa","_bbbbbb","_ccccc","_dddddddd"];';
$arr = Array();
$start = 'private [';
$end = '];';
$pattern = sprintf(
'/%s(.+?)%s/ims',
preg_quote($start, '/'), preg_quote($end, '/')
);
if (preg_match($pattern, $str, $matches)) {
list(, $match) = $matches;
echo $match;
}
?>
which outputs :
"_aaaaaaa","_bbbbbb","_ccccc","_dddddddd"
That still doesn't cover every variant, though. And how would I turn that into an array?
Is there a simple way of doing this? I've got the function that walks all the files in a folder and its subfolders. I just need to parse all the files first and build this array, which I'll later use in my main function.
Any help would be appreciated.
-Thanks
This should work -
/*
Function-> get_header()
Input -> The header string.
Output -> An array of header's parameters.
*/
function get_header($string){
    if(preg_match("/private\s?\[(.*?)\];/", $string, $matches)){
        return preg_split("/(\s*)?,(\s*)?/",$matches[1]);
    }
    return Array();
}
//Assuming these to be the different file headers.
$headers = Array(
'private ["_aaaaaaa", "_bbbbbb", "_ccccc","_dddddddd"];',
'private ["_4","_3","_2","_1" ];',
'private["_a", "_b","_c", "_d"];'
);
$header_arr = Array();
foreach($headers as $h){
    $header_arr = array_merge($header_arr, get_header($h));
}
var_dump($header_arr);
OUTPUT-
/*
array
0 => string '"_aaaaaaa"' (length=10)
1 => string '"_bbbbbb"' (length=9)
2 => string '"_ccccc"' (length=8)
3 => string '"_dddddddd"' (length=11)
4 => string '"_4"' (length=4)
5 => string '"_3"' (length=4)
6 => string '"_2"' (length=4)
7 => string '"_1" ' (length=5)
8 => string '"_a"' (length=4)
9 => string '"_b"' (length=4)
10 => string '"_c"' (length=4)
11 => string '"_d"' (length=4)
*/
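Note that the values above still carry their quotes, and entry 7 keeps a trailing space, so array_unique would treat '"_1" ' and '"_1"' as different. A possible variant, sketched under the assumption that only the bare names are wanted, pulls out the quoted names directly (get_header_names is a hypothetical helper name, and the sample headers are illustrative):

```php
<?php
// Hypothetical helper: returns the names inside private [...] without their quotes
function get_header_names($string) {
    if (!preg_match('/private\s*\[(.*?)\];/s', $string, $m)) {
        return array();
    }
    // Grab whatever sits between each pair of double quotes
    preg_match_all('/"([^"]*)"/', $m[1], $names);
    return $names[1];
}

// Illustrative headers with inconsistent spacing and one repeated name
$headers = array(
    'private ["_a", "_b","_c" ];',
    'private["_b","_d"];',
);

$all = array();
foreach ($headers as $h) {
    $all = array_merge($all, get_header_names($h));
}

// Drop duplicates and reindex from 0
$unique = array_values(array_unique($all));
print_r($unique);
```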
In PHP I have an array like this:
array
0 => string 'open' (length=4)
1 => string 'http://www.google.com' (length=21)
2 => string 'blank' (length=5)
but it could also be like:
array
0 => string 'blank' (length=5)
1 => string 'open' (length=4)
2 => string 'http://www.google.com' (length=21)
now it is easy to find "blank" with in_array("blank", $array) but how can I see if one string is starting with "http"?
I've tried with
array_search('http', $array); // not working
array_search('http://www.google.com', $array); // is working
Everything after http could vary.
Do I need a regex, or how else can I check whether a string in the array starts with http?
Thanks for any advice
"Welcome to PHP, there's a function for that."
Try preg_grep
preg_grep("/^http\b/i",$array);
Regex explained:
/^http\b/i
 ^\  / ^ `- Case insensitive match
 | \/  `--- Boundary character
 |  `------ Literal match of http
 `--------- Start of string
Try using the preg_grep function, which returns an array of the entries that match the pattern. Note that /http/ matches http anywhere in a string; use /^http/ to match only at the start.
$array = array("open", "http://www.google.com", "blank");
$search = preg_grep('/http/', $array);
print_r($search);
Solution without regex:
$input = array('open', 'http://www.google.com', 'blank');
$output = array_filter($input, function($item){
    return strpos($item, 'http') === 0;
});
Output:
array (size=1)
1 => string 'http://www.google.com' (length=21)
You can use preg_grep
$match = preg_grep("/http/",$array);
if(!empty($match)) echo "http exists in the array of strings.";
or you can use foreach and preg_match
foreach($array as $check) {
    if (preg_match("/http/", $check)) {
        echo "http exists in the array of strings.";
        break; // stop at the first match so the message is printed once
    }
}
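If only a yes/no answer is needed for "does any entry start with http", the check could be wrapped up as in the sketch below. has_http_entry is a hypothetical helper name, and the pattern is anchored with ^ so that http in the middle of a string does not count:

```php
<?php
// Hypothetical helper: true if at least one entry starts with "http" (case-insensitive)
function has_http_entry(array $items) {
    return count(preg_grep('/^http/i', $items)) > 0;
}

var_dump(has_http_entry(array('open', 'http://www.google.com', 'blank')));
var_dump(has_http_entry(array('open', 'blank')));
```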
This question is different from "Split, a string, at every nth position, with PHP", in that I want to split a string like the following:
foo|foo|foo|foo|foo|foo
Into this (every 2nd |):
array (3) {
0 => 'foo|foo',
1 => 'foo|foo',
2 => 'foo|foo'
}
So, basically, I want a function similar to explode() (I really doubt that what I'm asking will be built-in), but which 'explodes' at every nth appearance of a certain string.
How is this possible?
You can use explode + array_chunk + array_map + implode
$string = "foo|foo|foo|foo|foo|foo";
$array = stringSplit($string,"|",2);
var_dump($array);
Output
array
0 => string 'foo|foo' (length=7)
1 => string 'foo|foo' (length=7)
2 => string 'foo|foo' (length=7)
Function used
function stringSplit($string, $search, $chunk) {
    $groups = array_chunk(explode($search, $string), $chunk);
    return array_map(function ($group) use ($search) {
        return implode($search, $group);
    }, $groups);
}
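When the number of pieces is not a multiple of the chunk size, array_chunk simply leaves a shorter final group. A small sketch of the same approach showing that case (split_every_nth is a hypothetical name, used here so it does not clash with the function above):

```php
<?php
// Same explode + array_chunk + implode approach, under a hypothetical name
function split_every_nth($string, $delimiter, $n) {
    $groups = array_chunk(explode($delimiter, $string), $n);
    return array_map(function ($group) use ($delimiter) {
        return implode($delimiter, $group);
    }, $groups);
}

// Five pieces in groups of two: the last group holds the single leftover piece
$parts = split_every_nth('a|b|c|d|e', '|', 2);
print_r($parts);
```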