Php Regex: how to match repeated patterns - php

Given the following text:
bond0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
eth0: 11329920252 12554462 0 0 0 0 0 3561 13072970332 12899522 0 0 0 0 0 0
I need to capture the column values. I thought of something along these lines:
Regex: `(\w+):(?:\s+(\d+))+`
PHP: `preg_match_all('/(\w+):(?:\s+(\d+))+/sim', $data, $regs);`
But unfortunately it captures only the first column.
Array
(
[0] => Array
(
[0] => dummy0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[1] => bond0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2] => eth0: 11329920252 12554462 0 0 0 0 0 3561 13072970332 12899522 0 0 0 0 0 0
[3] => ip6tnl0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[4] => lo: 51675995 100695 0 0 0 0 0 0 51675995 100695 0 0 0 0 0 0
[5] => sit0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[6] => tunl0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
)
[1] => Array
(
[0] => 0
[1] => 0
[2] => 0
[3] => 0
[4] => 0
[5] => 0
[6] => 0
)
)
Any suggestions? Thanks
====EDIT====
Just to be clear: I know that I could use preg_match to search for \d+ values, or split the whole string into lines and run explode on each line, but I'm interested in a regex solution where the first column is the first member of the resulting array (I actually forgot to put the capturing parentheses in the first draft of the question), followed by the data columns, with every line put in its own dedicated array.

Why use preg_match or preg_match_all at all?
$results = array();
foreach (preg_split("/\r\n|\r|\n/", $data) as $line)
{
list($key, $values) = explode(":", $line);
$results[$key] = preg_split("/\s+/", trim($values)); // \s+ so runs of whitespace do not create empty columns
}
This should work as long as there is no more than one : on every line. Seems to me like it's the shortest and fastest way to write this too.

Here you go:
$data = explode("\n", $data);
$out = array();
foreach ($data as $d) {
preg_match_all('/\s(\d+)/', $d, $matches);
// $matches[1] now holds the captured numbers ($matches[0] would also include the leading whitespace).
// Add it to the array of rows:
$out[] = $matches[1];
}
You now have a jagged array of lines and columns. So, to reference line two, column four, you can use $out[1][3].

I know that you are looking for a preg_match solution, but this is in case you don't find any useful answer:
<?php
$val = "bond0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
eth0: 11329920252 12554462 0 0 0 0 0 3561 13072970332 12899522 0 0 0 0 0 0";
$arr1 = explode("\n",$val);
foreach ($arr1 as $value) {
$exp = explode(":",$value);
$ex = preg_replace('/\s+/', ' ',trim($exp[1]));
$arr[$exp[0]] = explode(" ",$ex);
}
var_dump($arr);
?>
results:
array (size=2)
'bond0' =>
array (size=17)
0 => string '' (length=0)
1 => string '0' (length=1)
2 => string '0' (length=1)
3 => string '0' (length=1)
4 => string '0' (length=1)
5 => string '0' (length=1)
6 => string '0' (length=1)
7 => string '0' (length=1)
8 => string '0' (length=1)
9 => string '0' (length=1)
10 => string '0' (length=1)
11 => string '0' (length=1)
12 => string '0' (length=1)
13 => string '0' (length=1)
14 => string '0' (length=1)
15 => string '0' (length=1)
16 => string '0' (length=1)
'eth0' =>
array (size=17)
0 => string '' (length=0)
1 => string '11329920252' (length=11)
2 => string '12554462' (length=8)
3 => string '0' (length=1)
4 => string '0' (length=1)
5 => string '0' (length=1)
6 => string '0' (length=1)
7 => string '0' (length=1)
8 => string '3561' (length=4)
9 => string '13072970332' (length=11)
10 => string '12899522' (length=8)
11 => string '0' (length=1)
12 => string '0' (length=1)
13 => string '0' (length=1)
14 => string '0' (length=1)
15 => string '0' (length=1)
16 => string '0' (length=1)

You can do it like this:
$subject = <<<LOD
dummy0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bond0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
eth0: 11329920252 12554462 0 0 0 0 0 3561 13072970332 12899522 0 0 0 0 0 0
ip6tnl0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
lo: 51675995 100695 0 0 0 0 0 0 51675995 100695 0 0 0 0 0 0
sit0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tunl0: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
LOD;
$pattern = '~^(?<item>\w+):|\G\h+(?<value>\d+)~m';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
$i=-1;
foreach($matches as $match) {
if (isset($match['value']))
$result[$i]['values'][] = $match['value'];
else {
$i++;
$result[$i]['item'] = $match['item'];
}
}
print_r($result);
You will obtain the format you describe in your EDIT.
Pattern details:
~ # pattern delimiter
^ # anchor for the line start (in multiline mode)
(?<item>\w+) # named capture "item"
:
| # OR
\G # force the match to be contiguous with the previous match
\h+ # one or more horizontal whitespace characters
(?<value>\d+) # named capture "value"
~m # pattern delimiter, m modifier for multiline mode
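A minimal sketch of the same idea, assuming you would rather have the interface name as the array key than as a separate 'item' entry. The function name parse_columns and the shortened sample input are hypothetical; the pattern is the one explained above.

```php
<?php
// Hypothetical helper: groups each run of numbers under the name that precedes it.
function parse_columns(string $subject): array
{
    // First branch matches "name:" at a line start; second branch matches a
    // number that directly follows the previous match (\G).
    $pattern = '~^(?<item>\w+):|\G\h+(?<value>\d+)~m';
    preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);

    $result = [];
    $current = null;
    foreach ($matches as $match) {
        if (isset($match['value'])) {
            // A value only counts once a name has been seen.
            if ($current !== null) {
                $result[$current][] = $match['value'];
            }
        } else {
            $current = $match['item'];
            $result[$current] = [];
        }
    }
    return $result;
}

$sample = "bond0: 0 0 0\neth0: 11329920252 12554462 3561";
print_r(parse_columns($sample));
```

The values stay strings, which is probably what you want for counters such as 11329920252 on a 32-bit build.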

Related

PHP : Combinations without permutations

This code gives me every possible combination of n values ($n_valeurs) whose sum is x ($x_entrees).
function GETall_distri_pres($n_valeurs, $x_entrees, $combi_presences = array()) {
if ($n_valeurs == 1) {
$combi_presences[] = $x_entrees;
return array($combi_presences);
}
$combinaisons = array();
// $tiroir is the number of socks in the next drawer
for ($tiroir = 0; $tiroir <= $x_entrees; $tiroir++) {
$combinaisons = array_merge($combinaisons, GETall_distri_pres(
$n_valeurs - 1,
$x_entrees - $tiroir,
array_merge($combi_presences, array($tiroir))));
}
return $combinaisons;
}
I need to generate only unique distributions, for example not both [2,1,1,0] and [1,2,1,0], only [2,1,1,0].
var_dump(GETall_distri_pres(3,3)) will give :
array (size=10)
0 =>
array (size=3)
0 => int 0
1 => int 0
2 => int 3
1 =>
array (size=3)
0 => int 0
1 => int 1
2 => int 2
2 =>
array (size=3)
0 => int 0
1 => int 2
2 => int 1
3 =>
array (size=3)
0 => int 0
1 => int 3
2 => int 0
4 =>
array (size=3)
0 => int 1
1 => int 0
2 => int 2
5 =>
array (size=3)
0 => int 1
1 => int 1
2 => int 1
6 =>
array (size=3)
0 => int 1
1 => int 2
2 => int 0
7 =>
array (size=3)
0 => int 2
1 => int 0
2 => int 1
8 =>
array (size=3)
0 => int 2
1 => int 1
2 => int 0
9 =>
array (size=3)
0 => int 3
1 => int 0
2 => int 0
Do you have any ideas?
This would be one approach: before returning the computed set, filter it by building a fresh associative array that uses the normalized (sorted) combination as the key. Permutations of the same combination then overwrite each other, so only one of them is preserved:
<?php
function GETall_distri_pres($n_valeurs, $x_entrees, $combi_presences = array()) {
if ($n_valeurs == 1) {
$combi_presences[] = $x_entrees;
return array($combi_presences);
}
$combinaisons = array();
// $tiroir is the number of socks in the next drawer
for ($tiroir = 0; $tiroir <= $x_entrees; $tiroir++) {
$combinaisons = array_merge($combinaisons, GETall_distri_pres(
$n_valeurs - 1,
$x_entrees - $tiroir,
array_merge($combi_presences, array($tiroir))));
}
// filter out permutations
$filteredCombinations = [];
array_walk($combinaisons, function($entry) use(&$filteredCombinations) {
arsort($entry);
$filteredCombinations[join('', $entry)] = $entry;
});
return array_values($filteredCombinations);
}
$result = GETall_distri_pres(3, 3);
print_r($result);
The output obviously is:
Array
(
[0] => Array
(
[0] => 3
[1] => 0
[2] => 0
)
[1] => Array
(
[0] => 2
[1] => 1
[2] => 0
)
[2] => Array
(
[0] => 1
[1] => 1
[2] => 1
)
)
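An alternative to generating every permutation and filtering afterwards is to generate only non-increasing sequences in the first place. The following is a sketch under that assumption, not the code from the answer above; the function name unique_distributions and its $max parameter are hypothetical.

```php
<?php
// Hypothetical helper: lists every way to write $sum as $count non-negative
// integers in non-increasing order, so no two results are permutations.
function unique_distributions(int $count, int $sum, ?int $max = null): array
{
    if ($count < 1) {
        return [];
    }
    $max = $max ?? $sum; // largest value allowed in this slot
    if ($count === 1) {
        // The last slot must take everything that is left, if it fits.
        return $sum <= $max ? [[$sum]] : [];
    }
    $results = [];
    // Try the largest value first so the output starts with [sum, 0, ...].
    for ($first = min($sum, $max); $first >= 0; $first--) {
        // Later slots may not exceed $first, which keeps the order non-increasing.
        foreach (unique_distributions($count - 1, $sum - $first, $first) as $rest) {
            $results[] = array_merge([$first], $rest);
        }
    }
    return $results;
}

print_r(unique_distributions(3, 3));
```

For (3, 3) this yields [3,0,0], [2,1,0] and [1,1,1], the same three distributions as the filtered version, without building the ten intermediate ones.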

Divide the array and create new array

I have an array like this:
array (size=6)
0 => string '2 16 10 4 0 0 0 0 0'
1 => string '0 0 0 4'
2 => string '2 15 8 6 0 0 0 0 0'
3 => string '0 0 0 3'
4 => string '3 18 12 5 0 0 0 0 0'
5 => string '0 0 0 2'
I want to divide the array and create a new array like
array1 (size = 1)
0 => '2 16 10 4 0 0 0 0 0 0 0 0 0 4'
array2 (size = 1)
0 => '2 15 8 6 0 0 0 0 0 0 0 0 3'
array3 (size = 2)
0 => '3 18 12 5 0 0 0 0 0 0 0 0 2'
array_chunk() works fine in general, but it did not work for my array.
Use array_chunk($array_name, 2).
This will return a multidimensional array.
You can do it through array_chunk() and foreach()
$new_array = array_chunk($original_array,2);
$final_array = [];
foreach($new_array as $arr){
$final_array[] = $arr[0].' '.$arr[1];
}
print_r($final_array);
Output:- https://eval.in/928261
Note:- If you want to remove extra white-spaces in-between the strings, then use preg_replace()
$new_array = array_chunk($original_array,2);
$final_array = [];
foreach($new_array as $arr){
$final_array[] = preg_replace('/\s+/', ' ', $arr[0]).' '.preg_replace('/\s+/', ' ', $arr[1]);
}
print_r($final_array);
Output:-https://eval.in/928265
Method #1: (Demo)
$prepped_copy=preg_replace('/\s+/',' ',$array); // reduce spacing throughout entire array
while($prepped_copy){ // iterate while there are any elements in the array
$result[]=implode(' ',array_splice($prepped_copy,0,2)); // concat 2 elements at a time
}
var_export($result);
Method #2 (Demo)
$pairs=array_chunk(preg_replace('/\s+/',' ',$array),2); // reduce spacing and pair elements
foreach($pairs as $pair){
$result[]="{$pair[0]} {$pair[1]}"; // concat 2 elements at a time
}
var_export($result);
Both Output:
array (
0 => '2 16 10 4 0 0 0 0 0 0 0 0 4',
1 => '2 15 8 6 0 0 0 0 0 0 0 0 3',
2 => '3 18 12 5 0 0 0 0 0 0 0 0 2',
)
To my surprise, Method #1 was actually slightly faster using the small sample dataset (but not noticeably so).
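A further variant, sketched here on the assumption that the input may contain an odd number of elements: implode() over each chunk avoids touching a missing second element. The five-element $array is a hypothetical sample based on the question's data.

```php
<?php
// Hypothetical sample: the last element has no partner.
$array = [
    '2 16 10 4  0 0 0 0 0',
    '0 0 0 4',
    '2 15 8 6  0 0 0 0 0',
    '0 0 0 3',
    '3 18 12 5 0 0 0 0 0',
];

// Collapse runs of whitespace, pair the elements, then join each pair.
$result = array_map(
    function (array $pair): string {
        return implode(' ', $pair); // works for chunks of one or two elements
    },
    array_chunk(preg_replace('/\s+/', ' ', $array), 2)
);

var_export($result);
```

The unpaired last element is passed through unchanged instead of raising an undefined-offset notice.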

How do I make my word unscrambler return more relevant results

I am building a word unscrambler (php/mysql) that takes user input of between 2 and 8 letters and returns words of between 2 and 8 letters that can be made from those letters, not necessarily using all of the letters, but definitely not including more letters than supplied.
The user will enter something like MSIKE or MSIKEI (two i's), or any combination of letters or multiple occurrences of a letter.
The query below will find all occurrences of words that contain M, S, I, K, or E.
However, the query below also returns words that have multiple occurrences of letters not requested. For example, the word meek would be returned, even though it has two e's and the user didn't enter two e's, or the word kiss, even though the user didn't enter s twice.
SELECT word
FROM words
WHERE word REGEXP '[msike]'
AND has_a=0
AND has_b=0
AND has_c=0
AND has_d=0
-- (we skip e), or we could add has_e=1
AND has_f=0
-- ...and so on, skipping the letters m, s, i, k, and e
AND has_w=0
AND has_x=0
AND has_y=0
AND has_z=0
Note the columns has_a, has_b, etc are either 1 if the letter occurs in the word or 0 if not.
I am open to any changes to the table schema.
This site: http://grecni.com/texttwist.php is a good example of what I am trying to emulate.
Question is how to modify the query to not return words with multiple occurrences of a letter, unless the user specifically entered a letter multiple times. Grouping by word length would be an added bonus.
Thanks so much.
EDIT: I altered the db per the suggestion of @awei. The has_{letter} columns are now count_{letter} and store the total number of occurrences of the respective letter in the respective word. This could be useful when a user enters a letter multiple times, for example when the user enters MSIKES (two s).
Additionally, I have abandoned the REGEXP approach as shown in the original SQL statement. Working on doing most of the work on the PHP side, but many hurdles still in the way.
EDIT: Included first 10 rows from table
id word alpha otcwl ospd csw sowpods dictionary enable vowels consonants start_with end_with end_with_ing end_with_ly end_with_xy count_a count_b count_c count_d count_e count_f count_g count_h count_i count_j count_k count_l count_m count_n count_o count_p count_q count_r count_s count_t count_u count_v count_w count_x count_y count_z q_no_u letter_count scrabble_points wwf_points status date_added
1 aa aa 1 0 0 1 1 1 aa a a 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 1 2015-11-12 05:39:45
2 aah aah 1 0 0 1 0 1 aa h a h 0 0 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 6 5 1 2015-11-12 05:39:45
3 aahed aadeh 1 0 0 1 0 1 aae hd a d 0 0 0 2 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 9 8 1 2015-11-12 05:39:45
4 aahing aaghin 1 0 0 1 0 1 aai hng a g 1 0 0 2 0 0 0 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 6 10 11 1 2015-11-12 05:39:45
5 aahs aahs 1 0 0 1 0 1 aa hs a s 0 0 0 2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 4 7 6 1 2015-11-12 05:39:45
6 aal aal 1 0 0 1 0 1 aa l a l 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3 4 1 2015-11-12 05:39:45
7 aalii aaiil 1 0 0 1 1 1 aaii l a i 0 0 0 2 0 0 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 6 1 2015-11-12 05:39:45
8 aaliis aaiils 1 0 0 1 0 1 aaii ls a s 0 0 0 2 0 0 0 0 0 0 0 2 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 6 6 7 1 2015-11-12 05:39:45
9 aals aals 1 0 0 1 0 1 aa ls a s 0 0 0 2 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 4 4 5 1 2015-11-12 05:39:45
10 aardvark aaadkrrv 1 0 0 1 1 1 aaa rdvrk a k 0 0 0 3 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 2 0 0 0 1 0 0 0 0 0 8 16 17 1 2015-11-12 05:39:45
Think you've already done the hard work with your revised schema. All you need to do now is modify the query to look for <= the number of counts of each letter as specified by the user.
E.g. if the user entered "ALIAS":
SELECT word
FROM words
WHERE count_a <= 2
AND count_b <= 0
AND count_c <= 0
AND count_d <= 0
AND count_e <= 0
AND count_f <= 0
AND count_g <= 0
AND count_h <= 0
AND count_i <= 1
AND count_j <= 0
AND count_k <= 0
AND count_l <= 1
AND count_m <= 0
AND count_n <= 0
AND count_o <= 0
AND count_p <= 0
AND count_q <= 0
AND count_r <= 0
AND count_s <= 1
AND count_t <= 0
AND count_u <= 0
AND count_v <= 0
AND count_w <= 0
AND count_x <= 0
AND count_y <= 0
AND count_z <= 0
ORDER BY CHAR_LENGTH(word), word;
Note: As requested, this is ordering by word length, then alphabetically. Have used <= even for <= 0 just to make it easier to modify by hand for other letters.
This returns "aa", "aal" and "aals" (but not "aalii" or "aaliis" since they both have two "i"s).
See SQL Fiddle Demo.
Since you have two different requirements, I suggest implementing two different solutions.
Where you don't care about dup letters, build a SET datatype with the 26 letters. Populate the bits according to what the word has. This ignores duplicate letters. This also facilitates looking for words with a subset of the letters: (the_set & ~the_letters) = 0.
Where you do care about dups, sort the letters in the word and store that as the key. "msike" becomes "eikms".
Build a table that contains 3 columns:
eikms -- non unique index on this
msike -- the real word - probably good to have this as the PRIMARY KEY
SET('m','s','i','k','e') -- for the other situation.
msikei and meek would be entered as
eikms
msikei
SET('m','s','i','k','e') -- (or, if more convenient: SET('m','i','s','i','k','e'))
ekm
meek
SET('e','k','m')
REGEXP is not practical for your task.
Edit 1
I think you also need a column that indicates whether there are any doubled letters in the word. That way, you can distinguish that kiss is allowed for msikes but not for msike.
Edit 2
A SET or an INT UNSIGNED can hold 1 bit for each of the 26 letters -- 0 for not present, 1 for present.
msikes and msike would both go into the set with exactly 5 bits turned on. The value to INSERT would be 'm,s,i,k,e,s' for msikes. Since the rest needs to involve Boolean arithmetic, maybe it would be better to use INT UNSIGNED. So...
a is 1 (1 << 0)
b is 2 (1 << 1)
c is 4 (1 << 2)
d is 8 (1 << 3)
...
z is (1 << 25)
To INSERT you use the | operator. bad becomes
(1 << 1) | (1 << 0) | (1 << 3)
Note how the bits are laid out, with 'a' at the bottom:
SELECT BIN((1 << 1) | (1 << 0) | (1 << 3)); ==> 1011
Similarly 'ad' is 1001. So, does 'ad' match 'bad'? The answer comes from
SELECT b'1001' & ~b'1011' = 0; ==> 1 (meaning 'true')
That means that all the letters in 'ad' (1001) are found in 'bad' (1011). Let's introduce "bed", which is 11010.
SELECT b'11010' & ~b'1011' = 0; ==> FALSE because of 'e' (10000)
But 'dad' (1001) will work fine:
SELECT b'1001' & ~b'1011' = 0; ==> TRUE
So, now comes the "dup" flag. Since 'dad' has dup letters, but 'bad' did not, your rules say that it is not a match. But it took the "dup" to finish the decision.
If you have not had a course in Boolean arithmetic, well, I have just presented the first couple of chapters. If I covered it too fast, find a math book on such and jump in. "It's not rocket science."
So, back to what code is needed to decide whether my_word has a subset of letters and whether it is allowed to have duplicate letters:
SELECT $my_mask & ~tbl.mask = 0, dup FROM tbl;
Then apply the suitable AND / OR between the two results to finish the logic.
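The bit arithmetic above can also be tried out on the PHP side. This is a rough sketch, assuming lowercase English letters only; letter_mask and uses_only are hypothetical helper names, and the check follows the 'ad' versus 'bad' example (word mask AND NOT available-letters mask equals zero).

```php
<?php
// Hypothetical helper: one bit per letter, 'a' in the lowest bit.
function letter_mask(string $word): int
{
    $mask = 0;
    foreach (str_split(strtolower($word)) as $ch) {
        if ($ch >= 'a' && $ch <= 'z') {
            $mask |= 1 << (ord($ch) - ord('a'));
        }
    }
    return $mask;
}

// True when every letter of $word also occurs in $letters (duplicates ignored).
function uses_only(string $word, string $letters): bool
{
    return (letter_mask($word) & ~letter_mask($letters)) === 0;
}

printf("%b\n", letter_mask('bad'));      // binary form of the mask for 'bad'
var_dump(uses_only('ad', 'bad'));        // subset test from the answer
var_dump(uses_only('bed', 'bad'));       // 'e' is not available
```

As in the SQL version, 'dad' passes this test against 'bad', so the separate "dup" flag (or a per-letter count) is still needed to reject it.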
With the limited Regex support on MySQL, best I can do is a PHP script for generating the query, presuming it only includes English letters. It seems making an expression to exclude invalid words is easier than one that includes them.
<?php
$inputword = str_split('msikes');
$counter = array();
// range() includes 'z', which the original "$l < 'z'" loop skipped
foreach (range('a', 'z') as $l) {
$counter[$l] = 0;
}
foreach ($inputword as $l) {
$counter[$l]++;
}
$nots = '';
foreach ($counter as $l => $c) {
if (!$c) {
$nots .= $l;
unset($counter[$l]);
}
}
$conditions = array();
if(!empty($nots)) {
// exclude words that have letters not given
$conditions[] = "[" . $nots . "]";
}
foreach ($counter as $l => $c) {
$letters = array();
for ($i = 0; $i <= $c; $i++) {
$letters[] = $l;
}
// exclude words that have the current letter more times than given
$conditions[] = implode('.*', $letters);
}
$sql = "SELECT word FROM words WHERE word NOT RLIKE '" . implode('|', $conditions) . "'";
echo $sql;
Something like this might work for you:
// Input Word
$WORD = strtolower('msikes');
// Alpha Array
$Alpha = range('a', 'z');
// Turn it into letters.
$Splited = str_split($WORD);
$Letters = array();
// Count occurrence of each letter, use letter as key to make it unique
foreach( $Splited as $Letter ) {
$Letters[$Letter] = array_key_exists($Letter, $Letters) ? $Letters[$Letter] + 1 : 1;
}
// Build a list of letters that shouldn't be present in the word
$ShouldNotExists = array_filter($Alpha, function ($Letter) use ($Letters) {
return ! array_key_exists($Letter, $Letters);
});
#### Building SQL Statement
// Letters to skip
$SkipLetters = array();
foreach( $ShouldNotExists as $SkipLetter ) {
$SkipLetters[] = "`has_{$SkipLetter}` = 0";
}
// count condition (for multiple occurrences)
$CountLetters = array();
foreach( $Letters as $K => $V ) {
$CountLetters[] = "`count_{$K}` <= {$V}";
}
$SQL = 'SELECT `word` FROM `words` WHERE '.PHP_EOL;
$SQL .= '('.implode(' AND ', $SkipLetters).')'.PHP_EOL;
$SQL .= ' AND ('.implode(' AND ', $CountLetters).')'.PHP_EOL;
$SQL .= ' ORDER BY LENGTH(`word`), `word`'.PHP_EOL;
echo $SQL;
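Since the question's EDIT mentions moving more of the work to the PHP side, here is a small sketch of a per-word check that could be applied to candidate rows after a coarse SQL filter. The function name can_build and the $candidates list are hypothetical; count_chars() is a standard PHP function.

```php
<?php
// Hypothetical helper: true when $word can be spelled from $letters,
// using each supplied letter at most as often as it was entered.
function can_build(string $word, string $letters): bool
{
    // Mode 1 returns only the bytes that occur, as byte value => count.
    $available = count_chars(strtolower($letters), 1);
    foreach (count_chars(strtolower($word), 1) as $byte => $needed) {
        if (($available[$byte] ?? 0) < $needed) {
            return false;
        }
    }
    return true;
}

// Hypothetical candidate list, e.g. rows returned by a looser query.
$candidates = ['meek', 'kiss', 'mike', 'skim'];
$matches = array_values(array_filter($candidates, function (string $w): bool {
    return can_build($w, 'msike');
}));
print_r($matches);
```

With the input msike this keeps mike and skim and drops meek and kiss; with msikes, kiss would pass as well.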

how to get a value of an array while unserialize the array

I have a serialized array like below:
a:22:{s:18:"'myprofiledefault'";s:1:"2";s:19:"'myprofilepersonal'";s:1:"0";s:14:"'myprofilejob'";s:1:"0";s:16:"'myprofileleave'";s:1:"0";s:21:"'myprofilepermission'";s:1:"0";s:28:"'myprofilebonus & commision'";s:1:"0";s:19:"'myprofiledocument'";s:1:"0";s:28:"'myprofileemergency contact'";s:1:"0";s:19:"'myprofilebenifits'";s:1:"0";s:17:"'view empdefault'";s:1:"0";s:18:"'view emppersonal'";s:1:"0";s:13:"'view empjob'";s:1:"0";s:15:"'view empleave'";s:1:"0";s:20:"'view emppermission'";s:1:"0";s:27:"'view empbonus & commision'";s:1:"0";s:18:"'view empdocument'";s:1:"0";s:27:"'view empemergency contact'";s:1:"0";s:18:"'view empbenifits'";s:1:"0";s:15:"'view empnotes'";s:1:"0";s:17:"'view emponboard'";s:1:"0";s:18:"'view empoffboard'";s:1:"0";s:16:"'view empcharts'";s:1:"0";}
If I unserialize this and print it, it looks like this:
Array(['myprofiledefault'] => 2
['myprofilepersonal'] => 0
['myprofilejob'] => 0
['myprofileleave'] => 0
['myprofilepermission'] => 0
['myprofilebonus & commision'] => 0
['myprofiledocument'] => 0
['myprofileemergency contact'] => 0
['myprofilebenifits'] => 0
['view empdefault'] => 0
['view emppersonal'] => 0
['view empjob'] => 0
['view empleave'] => 0
['view emppermission'] => 0
['view empbonus & commision'] => 0
['view empdocument'] => 0
['view empemergency contact'] => 0
['view empbenifits'] => 0
['view empnotes'] => 0
['view emponboard'] => 0
['view empoffboard'] => 0
['view empcharts'] => 0)
My question is: how do I get the value for an individual key?
I am trying this:
echo $ret['myprofilepersonal'];
but this is not working; it shows an "undefined index" error. How can I get this value?
If we suppose that you unserialize the array you posted like so:
$ret = unserialize(...);
Then you need to access the values using something like this:
echo $ret["'myprofiledefault'"];
// or by escaping single quote:
echo $ret['\'myprofiledefault\''];
because, as far as I can see, every key is itself wrapped in single quotes.
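If the quoted keys are awkward to work with, one option is to strip the quotes once after unserializing. This is only a sketch: the two-entry array below is a shortened, hypothetical stand-in for the data in the question, and $clean is a name introduced for illustration.

```php
<?php
// Small stand-in for the serialized data in the question: the keys
// themselves contain single quotes.
$serialized = serialize([
    "'myprofiledefault'"  => '2',
    "'myprofilepersonal'" => '0',
]);

$ret = unserialize($serialized);

// Rebuild the array with the surrounding quotes stripped from each key.
$clean = [];
foreach ($ret as $key => $value) {
    $clean[trim($key, "'")] = $value;
}

echo $clean['myprofiledefault'], "\n";
echo $clean['myprofilepersonal'], "\n";
```

After this, $clean['myprofilepersonal'] works as originally attempted, while $ret still needs the quoted form of the key.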

Formatting raw text gathered from another website using php

I am trying to retrieve a table from another website, which is based on several variables passed to it via a form. I have worked out that the url details after the ? correspond to those variables and have created a form on my page to post those variables and create url, which I have then put into a file_get_contents process, whereby I collect the table as data (I have narrowed the get to the div in which the table is housed).
My problem is that the data is shown as a string of plain text on my page with no formatting (i.e. no columns or rows).
Here is the code to retrieve the data:
<?php
$page = file_get_contents($stats_url);
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach($divs as $div) {
// Loop through the DIVs looking for one with an id of "statstable"
// Then echo out its contents (pardon the pun)
if ($div->getAttribute('id') === 'statstable') {
echo $div->nodeValue;
}
}
?>
Here is a sample of the data returned:
NameGamesInnsNot OutsRunsHigh ScoreAvg50's100'sDucksStrike RateBowled (%)Caught (%)LBW (%)Stumped (%)Run Out (%)Not Out (%)Did Not Bat (%)%Games Won%Games Drawn%Games Lost%Team RunsCatchesStumpingsRun OutsOwais Fareed 1 1 0 72 7272 1 0 0 - 0 1 (100) 0 0 0 0 0 0 0 100 42.6 0 0 0 Atif Ali 2 2 0 28 2814 0 0 1 - 2 (100) 0 0 0 0 0 0 0 0 100 11.62 0 0 0 Craig Hills 2 2 0 20 1310 0 0 0 - 0 1 (50) 1 (50) 0 0 0 0 0 0 100 8.3 1 0 0 Dale Skeath 2 2 0 16 128 0 0 0 - 1 (50) 1 (50) 0 0 0 0 0 0 0 100 6.64 1 0 0 ash ashim 2 2 1 16 10*16 0 0 0 - 0 1 (50) 0 0 0 1 (50) 0 0 0 100 6.64 0 0 0 Hussain Dalvi 1 1 0 11 1111 0 0 0 - 0 1 (100) 0 0 0 0 0 0 0 100 6.51 0 0 0 Azhar Ali 1 1 0 11 1111 0 0 0 - 0 1 (100) 0 0 0 0 0 0 0 100 6.51 0 0 0 A Hammed 1 1 0 10 1010 0 0 0 - 0 1 (100) 0 0 0 0 0 0 0 100 5.92 0 0 0 M Ali 1 1 0 5 55 0 0 0 - 1 (100) 0 0 0 0 0 0 0 0 100 2.96 0 0 0 Simon Pleasant 1 1 0 5 55 0 0 0 - 0 1 (100) 0 0 0 0 0 0 0 100 6.94 0 0 0
How can I then take this text and recompile it as a table?
Check out PHP Simple HTML DOM Parser
It works brilliantly for this stuff.
http://simplehtmldom.sourceforge.net/
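Staying with the built-in DOM extension is also possible: nodeValue returns only the text of a node, whereas DOMDocument::saveHTML() called with a node returns that node's markup, table tags included. The sketch below assumes the div has the id statstable as in the question; the inline $page string is a hypothetical stand-in for the fetched page.

```php
<?php
// Hypothetical stand-in for the result of file_get_contents($stats_url).
$page = '<html><body><div id="statstable"><table>'
      . '<tr><th>Name</th><th>Games</th></tr>'
      . '<tr><td>Atif Ali</td><td>2</td></tr>'
      . '</table></div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep warnings about sloppy markup quiet
$doc->loadHTML($page);
libxml_clear_errors();

// Find the div by id and serialize it together with its child elements.
$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="statstable"]')->item(0);
$tableHtml = $div !== null ? $doc->saveHTML($div) : '';

echo $tableHtml;
```

Echoing $tableHtml into your page keeps the rows and columns, because the table, tr and td elements are preserved rather than flattened to text.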
