Extract tabular data using regular expressions [closed] - php
Closed 8 years ago.
I have a text file with multiple occurrences of tables like the one shown below:
_____________________________________
Heading 1 | Heading 2
_______________ | ___________________
Label1 18857.10 | Label3 710.00
Label2 2361.50 | Label4 0.00
| Label5 2531.37
| Label6 0.00
| Label7 0.00
| Label8 0.01
________________| ___________________
16495.60 | Label9 3969.06
_______________ | ___________________
I want to store the numerical values into variables using regular expressions. Since I'm new to regular expressions, I couldn't find a way to do it. Can anyone help me with this?
$table="_____________________________________
Heading 1 | Heading 2
_______________ | ___________________
Label1 18857.10 | Label3 710.00
Label2 2361.50 | Label4 0.00
| Label5 2531.37
| Label6 0.00
| Label7 0.00
| Label8 0.01
________________| ___________________
16495.60 | Label9 3969.06
_______________ | ___________________
";
$num = preg_match_all('/(\w+) (\d+(\.\d+)?)/', $table, $result);
for($i=0; $i<$num; $i++){
echo "{$result[1][$i]} = {$result[2][$i]}<br>";
}
If your table is exactly what you showed, this works.
regex: /(\w+) (\d+(\.\d+)?)/
The slashes / at the beginning and end delimit the regex.
(\w+) means "match any letter, digit or underscore one or more times".
One space follows. You can add + after the space to match more than one, or use \s instead of the space to match any whitespace character, such as a tab.
(\d+(\.\d+)?) ... \d+ means one or more digits, (\.\d+) means a dot followed by one or more digits, and the question mark makes the preceding group (\.\d+) optional.
preg_match_all() stores the matches in its third parameter and returns the number of matches. With the default flag (PREG_PATTERN_ORDER), $result[0][$i] is the whole match, $result[1][$i] is the first sub-expression (\w+), $result[2][$i] is the second group (\d+(\.\d+)?), and $result[3][$i] is the decimal part (\.\d+). The decimal part is already contained in $result[2][$i], so you don't need $result[3][$i]; it is mentioned only for explanation :)
The code prints:
Heading = 1
Heading = 2
Label1 = 18857.10
Label3 = 710.00
Label2 = 2361.50
Label4 = 0.00
Label5 = 2531.37
Label6 = 0.00
Label7 = 0.00
Label8 = 0.01
Label9 = 3969.06
edit: sorry, it doesn't work, it didn't match that naked 16495.60 value. Let me think a bit more...
...
$regex='/([a-zA-Z0-9]+)? +(\d+(\.\d+)?)/';
is a bit better. Here's how it works:
[a-zA-Z0-9]+ matches one or more letters or digits
? after the parenthesis means the whole parenthesized expression is optional
a space followed by + matches one or more spaces
(\d+(\.\d+)?) matches one or more digits followed by an optional { dot and one or more digits }
The regex contains neither | nor a newline, so every match stays within a single field of the table.
The result variable should be:
array (size=4)
0 =>
array (size=12)
0 => string 'Heading 1' (length=9)
1 => string 'Heading 2' (length=9)
2 => string 'Label1 18857.10' (length=15)
3 => string 'Label3 710.00' (length=13)
4 => string 'Label2 2361.50' (length=14)
5 => string 'Label4 0.00' (length=11)
6 => string 'Label5 2531.37' (length=14)
7 => string 'Label6 0.00' (length=11)
8 => string 'Label7 0.00' (length=11)
9 => string 'Label8 0.01' (length=11)
10 => string ' 16495.60' (length=19)
11 => string 'Label9 3969.06' (length=14)
1 =>
array (size=12)
0 => string 'Heading' (length=7)
1 => string 'Heading' (length=7)
2 => string 'Label1' (length=6)
3 => string 'Label3' (length=6)
4 => string 'Label2' (length=6)
5 => string 'Label4' (length=6)
6 => string 'Label5' (length=6)
7 => string 'Label6' (length=6)
8 => string 'Label7' (length=6)
9 => string 'Label8' (length=6)
10 => string '' (length=0)
11 => string 'Label9' (length=6)
2 =>
array (size=12)
0 => string '1' (length=1)
1 => string '2' (length=1)
2 => string '18857.10' (length=8)
3 => string '710.00' (length=6)
4 => string '2361.50' (length=7)
5 => string '0.00' (length=4)
6 => string '2531.37' (length=7)
7 => string '0.00' (length=4)
8 => string '0.00' (length=4)
9 => string '0.01' (length=4)
10 => string '16495.60' (length=8)
11 => string '3969.06' (length=7)
3 =>
array (size=12)
0 => string '' (length=0)
1 => string '' (length=0)
2 => string '.10' (length=3)
3 => string '.00' (length=3)
4 => string '.50' (length=3)
5 => string '.00' (length=3)
6 => string '.37' (length=3)
7 => string '.00' (length=3)
8 => string '.00' (length=3)
9 => string '.01' (length=3)
10 => string '.60' (length=3)
11 => string '.06' (length=3)
edit2: GRAB THOSE SNIPPETS AGAIN! There should be a backslash before the dot in (\.\d+)! I formatted it wrong and it disappeared. Rewrote it, should be fine now.
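To go one step further and actually store the numbers, the matches can be folded into an array keyed by label. This is only a sketch under a few assumptions: labels are unique within one table, the "Heading" cells are skipped by name, and values without a label (like 16495.60) go into a separate list. The names $values and $unlabeled are made up for this example.

```php
<?php
// Shortened copy of the table from the question (rule lines omitted).
$table = "Heading 1 | Heading 2
Label1 18857.10 | Label3 710.00
Label2 2361.50 | Label4 0.00
 | Label5 2531.37
 16495.60 | Label9 3969.06
";

// PREG_SET_ORDER groups the result per match:
// $m[0] is the whole match, $m[1] the label (may be empty), $m[2] the number.
preg_match_all('/([a-zA-Z0-9]+)? +(\d+(\.\d+)?)/', $table, $matches, PREG_SET_ORDER);

$values = [];     // label => number
$unlabeled = [];  // numbers that have no label in front of them
foreach ($matches as $m) {
    if ($m[1] === 'Heading') {
        continue; // assumption: heading cells are not data
    }
    if ($m[1] === '') {
        $unlabeled[] = (float) $m[2];
    } else {
        $values[$m[1]] = (float) $m[2];
    }
}

print_r($values);
print_r($unlabeled);
```

PREG_SET_ORDER is used here instead of the default ordering because it keeps each label next to its number, which makes the loop easier to read.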
Related
Can not match the last group of numbers using php preg_match()
preg_match_all("/(\d{12}) (?:,|$)/","111762396541,561572500056,561729950637,561135281443",$matches);
var_dump($matches):
array (size=2)
0 =>
array (size=4)
0 => string '561762396543,' (length=13)
1 => string '561572500056,' (length=13)
2 => string '561729950637,' (length=13)
3 => string '561135281443' (length=12)
1 =>
array (size=4)
0 => string '561762396543' (length=12)
1 => string '561572500056' (length=12)
2 => string '561729950637' (length=12)
3 => string '561135281443' (length=12)
But I want $matches to look like this:
array (size=4)
0 => string '561762396543,' (length=13)
1 => string '561572500056,' (length=13)
2 => string '561729950637,' (length=13)
3 => string '561135281443' (length=12)
I want to match groups of numbers (12 digits each) plus a trailing comma if there is one. The exception is the last group, which doesn't have to be followed by a comma because it reaches the end of the line.
Try this instead:
preg_match_all("/(\d{12}(?:,|$))/","111762396541,561572500056,561729950637,561135281443",$matches);
When the $ is inside character class brackets [ ], it matches a literal $ character, not the end of the line.
EDIT: If you want to include the comma in your matches, just use the code sample above and look at $matches[0].
If you want a simpler syntax that works at any word boundary, \b matches both at the position before a comma and at the end of the line. It is zero-width, though, so the comma itself is not part of the match:
preg_match_all("/(\d{12}\b)/","111762396541,561572500056,561729950637,561135281443",$matches);
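A small self-contained check of the alternation approach, as a sketch: here the capturing parentheses are dropped so that only the whole matches in $matches[0] are returned, which is the flat shape the question asks for.

```php
<?php
$input = '111762396541,561572500056,561729950637,561135281443';

// Twelve digits followed by either a comma or the end of the string.
// The alternation sits in a non-capturing group, so only $matches[0] is filled.
preg_match_all('/\d{12}(?:,|$)/', $input, $matches);

print_r($matches[0]);
```

The first three entries keep their trailing comma and the last one does not.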
How to limit a variable search to a single line of text?
Considering this sample text:
grupo1, tiago1A, bola1A, mola1A, tijolo1A, pedro1B, bola1B, mola1B, tijolo1B, raimundo1C, bola1C, mola1C, tijolo1C, joao1D, bola1D, mola1D, tijolo1D, felipe1E, bola1E, mola1E, tijolo1E,
grupo2, tiago2A, bola2A, mola2A, tijolo2A, pedro2B, bola2B, mola2B, tijolo2B, raimundo2C, bola2C, mola2C, tijolo2C, joao2D, bola2D, mola2D, tijolo2D, felipe2E, bola2E, mola2E, tijolo2E,
grupo3, tiago3A, bola3A, mola3A, tijolo3A, pedro3B, bola3B, mola3B, tijolo3B, raimundo3C, bola3C, mola3C, tijolo3C, joao3D, bola3D, mola3D, tijolo3D, felipe3E, bola3E, mola3E, tijolo3E,
grupo4, tiago4A, bola4A, mola4A, tijolo4A, pedro4B, bola4B, mola4B, tijolo4B, raimundo4C, bola4C, mola4C, tijolo4C, joao4D, bola4D, mola4D, tijolo4D, felipe4E, bola4E, mola4E, tijolo4E,
grupo5, tiago5A, bola5A, mola5A, tijolo5A, pedro5B, bola5B, mola5B, tijolo5B, raimundo5C, bola5C, mola5C, tijolo5C, joao5D, bola5D, mola5D, tijolo5D, felipe5E, bola5E, mola5E, tijolo5E,
I would like to capture the 20 values that follow grupo3 and store them in groups of 4. I am using this:
/grupo3,((.*?),(.*?),(.*?),(.*?)),/
but this only returns the first 4 comma-separated values after grupo3. I need to generate this array structure:
Match 1
Group 1 tiago3A
Group 2 bola3A
Group 3 mola3A
Group 4 tijolo3A
Match 2
Group 1 pedro3B
Group 2 bola3B
Group 3 mola3B
Group 4 tijolo3B
Match 3
Group 1 raimundo3C
Group 2 bola3C
Group 3 mola3C
Group 4 tijolo3C
Match 4
Group 1 joao3D
Group 2 bola3D
Group 3 mola3D
Group 4 tijolo3D
Match 5
Group 1 felipe3E
Group 2 bola3E
Group 3 mola3E
Group 4 tijolo3E
You can try the following:
/,(.*?),(.*?),(.*?),(.*?),.*?$/m
The /m at the end is the multi-line flag, and the $ before it matches the end of a line.
Edit: To get every 4 elements from the 3rd paragraph only:
/grupo3,((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)),/
And you can get the desired output in PHP like this:
preg_match('/grupo3,((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)),/', $str, $matches);
$groups = [];
unset($matches[0]);
$matches = array_values($matches);
$count = count($matches);
$j=0;
for($i=1;$i<$count;$i++) {
    if($i%5 == 0) {
        $j++;
        continue;
    }
    $groups[$j][] = $matches[$i];
}
var_dump($groups);
Output will be something like:
array (size=5)
0 =>
array (size=4)
0 => string ' tiago3A' (length=8)
1 => string ' bola3A' (length=7)
2 => string ' mola3A' (length=7)
3 => string ' tijolo3A' (length=9)
1 =>
array (size=4)
0 => string 'pedro3B' (length=7)
1 => string ' bola3B' (length=7)
2 => string ' mola3B' (length=7)
3 => string ' tijolo3B' (length=9)
2 =>
array (size=4)
0 => string 'raimundo3C' (length=10)
1 => string ' bola3C' (length=7)
2 => string ' mola3C' (length=7)
3 => string ' tijolo3C' (length=9)
3 =>
array (size=4)
0 => string 'joao3D' (length=6)
1 => string ' bola3D' (length=7)
2 => string ' mola3D' (length=7)
3 => string ' tijolo3D' (length=9)
4 =>
array (size=4)
0 => string 'felipe3E' (length=8)
1 => string ' bola3E' (length=7)
2 => string ' mola3E' (length=7)
3 => string 'tijolo3E' (length=0)
Please forgive the lateness of this answer. This is the comprehensive answer with a clean/direct solution that I would have posted earlier if this page wasn't put on hold. This is as refined a solution as I can devise without knowing more about how your input data is generated/accessed.
The input:
$text='grupo1, tiago1A, bola1A, mola1A, tijolo1A, pedro1B, bola1B, mola1B, tijolo1B, raimundo1C, bola1C, mola1C, tijolo1C, joao1D, bola1D, mola1D, tijolo1D, felipe1E, bola1E, mola1E, tijolo1E,
grupo2, tiago2A, bola2A, mola2A, tijolo2A, pedro2B, bola2B, mola2B, tijolo2B, raimundo2C, bola2C, mola2C, tijolo2C, joao2D, bola2D, mola2D, tijolo2D, felipe2E, bola2E, mola2E, tijolo2E,
grupo3, tiago3A, bola3A, mola3A, tijolo3A, pedro3B, bola3B, mola3B, tijolo3B, raimundo3C, bola3C, mola3C, tijolo3C, joao3D, bola3D, mola3D, tijolo3D, felipe3E, bola3E, mola3E, tijolo3E,
grupo4, tiago4A, bola4A, mola4A, tijolo4A, pedro4B, bola4B, mola4B, tijolo4B, raimundo4C, bola4C, mola4C, tijolo4C, joao4D, bola4D, mola4D, tijolo4D, felipe4E, bola4E, mola4E, tijolo4E,
grupo5, tiago5A, bola5A, mola5A, tijolo5A, pedro5B, bola5B, mola5B, tijolo5B, raimundo5C, bola5C, mola5C, tijolo5C, joao5D, bola5D, mola5D, tijolo5D, felipe5E, bola5E, mola5E, tijolo5E,';
The method:
var_export(preg_match('/^grupo3, \K.*(?=,)/m',$text,$out)?array_chunk(explode(', ',$out[0]),4):'fail');
Use preg_match() to extract the single line, then use explode() to split the string on "comma space", then use array_chunk() to store the values in an array of 5 subarrays containing 4 elements each.
The pattern targets grupo3, at the start of the line, restarts the full match using \K, then greedily matches every non-newline character and stops just before the last comma in the line. The positive lookahead (?=,) keeps the final comma out of the full string match.
My method does not retain any leading or trailing spaces, just the values themselves.
Output:
array (
  0 =>
  array (
    0 => 'tiago3A',
    1 => 'bola3A',
    2 => 'mola3A',
    3 => 'tijolo3A',
  ),
  1 =>
  array (
    0 => 'pedro3B',
    1 => 'bola3B',
    2 => 'mola3B',
    3 => 'tijolo3B',
  ),
  2 =>
  array (
    0 => 'raimundo3C',
    1 => 'bola3C',
    2 => 'mola3C',
    3 => 'tijolo3C',
  ),
  3 =>
  array (
    0 => 'joao3D',
    1 => 'bola3D',
    2 => 'mola3D',
    3 => 'tijolo3D',
  ),
  4 =>
  array (
    0 => 'felipe3E',
    1 => 'bola3E',
    2 => 'mola3E',
    3 => 'tijolo3E',
  ),
)
p.s. If the search term ($needle) is to be dynamic, you can use something like this to achieve the same result:
$needle='grupo3';
// if the needle may include any regex-sensitive characters, use preg_quote($needle,'/') at $needle
var_export(preg_match('/^'.$needle.', \K.*(?=,)/m',$text,$out)?array_chunk(explode(', ',$out[0]),4):'fail');
/* or this is equivalent...
if(preg_match('/^'.$needle.', \K.*(?=,)/m',$text,$out)){
    $singles=explode(', ',$out[0]);
    $groups=array_chunk($singles,4);
    var_export($groups);
}else{
    echo 'fail';
}
*/
JpGraph : no error but broken picture
I'm building a graph with JpGraph but I have a small problem. No error is displayed, but the picture is a "broken picture". Here is my code:
$graph = new Graph(1000,300);
$graph->img->SetMargin(40,30,50,40);
$graph->SetScale("textlin");
$graph->title->Set("Graph");
$graph->ygrid->Show();
$graph->ygrid->SetColor('blue#0.7');
$graph->ygrid->SetLineStyle('dashed');
$graph->xgrid->Show();
$graph->xgrid->SetColor('red#0.7');
$graph->xgrid->SetLineStyle('dashed');
$graph->title->SetFont(FF_ARIAL,FS_BOLD,11);
$bigCourbe = array();
$i = 0;
foreach($bigData as $data) {
    var_dump($data);
    $courbe = new LinePlot($data);
    $courbe->value->Show();
    $courbe->value->SetFont(FF_ARIAL,FS_NORMAL,9);
    $courbe->value->SetFormat('%d');
    $courbe->mark->SetType(MARK_FILLEDCIRCLE);
    $courbe->mark->SetFillColor("green");
    $courbe->mark->SetWidth(2);
    $courbe->SetWeight(10);
    echo $function[$i];
    $courbe->SetLegend($function[$i++]);
    $bigCourbe[] = $courbe;
}
$graph->xaxis->title->Set("Heures");
$graph->yaxis->title->SetFont(FF_FONT1,FS_BOLD);
$graph->xaxis->title->SetFont(FF_FONT1,FS_BOLD);
foreach ($bigCourbe as $elem) {
    $graph->Add($elem);
}
//$graph->Stroke();
So, the result seems good:
array (size=24)
0 => string '0.000' (length=5)
1 => string '0.000' (length=5)
2 => string '0.000' (length=5)
3 => string '0.000' (length=5)
4 => string '0.000' (length=5)
5 => string '0.000' (length=5)
....
Index HC <- it's the $function[$i] var
array (size=24)
0 => string '0.200' (length=5)
1 => string '0.200' (length=5)
2 => string '0.100' (length=5)
3 => string '0.200' (length=5)
4 => string '0.200' (length=5)
5 => string '0.200' (length=5)
....
Index HP
array (size=24)
0 => string '0.000' (length=5)
1 => string '0.000' (length=5)
2 => string '0.000' (length=5)
3 => string '0.000' (length=5)
4 => string '0.000' (length=5)
5 => string '0.000' (length=5)
Index HPTE
array (size=24)
0 => string '0.200' (length=5)
1 => string '0.200' (length=5)
2 => string '0.100' (length=5)
3 => string '0.200' (length=5)
4 => string '0.200' (length=5)
5 => string '0.200' (length=5)
....
Total Heure
I do not see the problem... If I remove the debug output and uncomment the $graph->Stroke(); line, I get the broken picture. Can you help me? Thank you in advance.
In my case (PHP 7.1.0) it was a Deprecated message. Just put ini_set('display_errors', 0); at the top of your script and try again. Or comment out $graph->Stroke(); and see what error (or warning) you get.
I know this is an old question, but I just had the same problem and fixed it by removing everything before the <?php. This was suggested in the link below (check 1.b). Solution Source
How do you display the graph? Do you have an HTML page? Please show the HTML code. I have a similar problem where I get no error message, only the broken image icon: jpGraph can't include PHP file
One thing I would recommend is to add these lines:
//save to file
$fileName = "/tmp/imagefile.png";
$graph->img->Stream($fileName);
If the image in the file is correct, then you can rule out the graph-generating code.
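The answers above share one idea: any byte sent before the image data (a warning, a blank line before the opening PHP tag, debug output) corrupts the picture. One way to test for that is to look at the first bytes of what was produced. The sketch below assumes the graph is emitted as PNG; startsWithPngSignature() is a hypothetical helper written for this example, not part of JpGraph.

```php
<?php
// Hypothetical helper: true if $bytes begins with the 8-byte PNG signature.
function startsWithPngSignature(string $bytes): bool
{
    return strncmp($bytes, "\x89PNG\r\n\x1a\n", 8) === 0;
}

// Simulated payloads: a clean image and one preceded by a stray newline,
// as happens when a blank line sits before the opening PHP tag.
$clean = "\x89PNG\r\n\x1a\n" . 'rest of the image data';
$polluted = "\n" . $clean;

var_dump(startsWithPngSignature($clean));    // bool(true)
var_dump(startsWithPngSignature($polluted)); // bool(false)
```

In practice the same check could be run on file_get_contents() of the saved /tmp/imagefile.png and on the body returned by the script's URL. If the file passes and the HTTP response does not, the extra output comes from outside the graph code.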
What array function should I use for creating an index?
Hello guys, I am trying to create an index of all the words on an HTML page that my crawler parses. So far I have managed to break the HTML page down into an array of words and filter out all the stop words. At this stage I have a few problems. The array of words from the parsed HTML page contains repeated words. I like that, because I still have to record how many times a word appeared in the page. The array looks like this:
$wordsFromHTML = array (size=119)
0 => string 'web' (length=3)
1 => string 'giants' (length=6)
2 => string 'vryheid' (length=7)
3 => string 'news' (length=4)
4 => string 'access' (length=6)
5 => string 'mails' (length=5)
6 => string 'mobile' (length=6)
7 => string 'february' (length=8)
8 => string 'access' (length=6)
9 => string 'mails' (length=5)
10 => string 'web' (length=3)
11 => string 'february' (length=8)
12 => string 'access' (length=6)
13 => string 'mails' (length=5)
14 => string 'desktop' (length=7)
15 => string 'february' (length=8)
16 => string 'hosting' (length=7)
17 => string 'web' (length=3)
18 => string 'giants' (length=6)
19 => string 'vryheid' (length=7)
20 => string 'february' (length=8)
22 => string 'us' (length=2)
Now I want to save all the words from $wordsFromHTML to $indexArray, which is my final index. It should look like this:
$indexArray = array('web'=>array('url'=>array(0,10,17)))
The problem is how to keep recording the position (the $wordsFromHTML key) of each repeated word in the final index array. The index array should contain only unique words; if a word that already exists comes in again, we reuse the existing entry for the same URL and append the new position to it. Hope you understand my question.
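One possible approach, as a sketch: no special array function is needed, because a plain foreach that appends each position under the word and URL builds exactly this structure. The $url value is a placeholder invented for the example, and the word list is the first 18 entries from the dump above.

```php
<?php
$url = 'http://example.com/page.html'; // hypothetical URL of the crawled page
$wordsFromHTML = [
    'web', 'giants', 'vryheid', 'news', 'access', 'mails', 'mobile', 'february', 'access',
    'mails', 'web', 'february', 'access', 'mails', 'desktop', 'february', 'hosting', 'web',
];

$indexArray = [];
foreach ($wordsFromHTML as $position => $word) {
    // The word and URL entries are created on first sight;
    // every later occurrence just appends its position.
    $indexArray[$word][$url][] = $position;
}

print_r($indexArray['web'][$url]);        // positions 0, 10, 17
echo count($indexArray['web'][$url]);     // number of occurrences: 3
```

The number of times a word appears on the page is then simply the count of its position list, so it does not have to be stored separately.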
Smarty value of array extraction using date_format result on other array value as a key
I have two arrays in a Smarty template: $months and $contract.
{$months|var_dump} gets this:
array (size=12)
1 => string 'января' (length=12)
2 => string 'февраля' (length=14)
3 => string 'марта' (length=10)
4 => string 'апреля' (length=12)
5 => string 'мая' (length=6)
6 => string 'июня' (length=8)
7 => string 'июля' (length=8)
8 => string 'августа' (length=14)
9 => string 'сентября' (length=16)
10 => string 'октября' (length=14)
11 => string 'ноября' (length=12)
12 => string 'декабря' (length=14)
The array values are the Russian names of the months in the genitive case.
{$contract|var_dump} gets this:
'date_till' => '1355518365' (length=10)
So I first need to derive a month number from $contract.date_till. That is usually done like this:
{$contract.date_till|date_format:"%m"}
And now the question is: how do I extract a month name from the $months array using the month number produced from $contract.date_till with date_format? I've tried many variants described in the Smarty manuals, but none of them works. For example, this one doesn't:
{$months[{$contract.date_till|date_format:"%m"}]}
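{assign var=monthNo value=$contract.date_till|date_format:"%m"}
{$months.$monthNo}
This will give you the month name for the given date. Note that %m is zero-padded ("01" to "09"), while the keys of $months are the integers 1 to 12, so for January through September the value has to be converted to an integer before the lookup.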
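If doing the lookup in PHP before assigning data to the template is an option, the zero-padding question disappears. This is a sketch of that alternative, not the Smarty solution itself: it uses gmdate() with the 'n' format (month number without a leading zero) and reads the timestamp as UTC, which is an assumption about how date_till should be interpreted.

```php
<?php
$months = [
    1 => 'января', 2 => 'февраля', 3 => 'марта', 4 => 'апреля',
    5 => 'мая', 6 => 'июня', 7 => 'июля', 8 => 'августа',
    9 => 'сентября', 10 => 'октября', 11 => 'ноября', 12 => 'декабря',
];
$contract = ['date_till' => '1355518365'];

// 'n' yields 1..12 without a leading zero, matching the integer keys of $months.
$monthNo = (int) gmdate('n', (int) $contract['date_till']);
$monthName = $months[$monthNo];

echo $monthName; // декабря
```

The resulting $monthName could then be assigned to the template like any other variable.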