How to limit a variable search to a single line of text? - php

Considering this sample text:
grupo1, tiago1A, bola1A, mola1A, tijolo1A, pedro1B, bola1B, mola1B, tijolo1B, raimundo1C, bola1C, mola1C, tijolo1C, joao1D, bola1D, mola1D, tijolo1D, felipe1E, bola1E, mola1E, tijolo1E,
grupo2, tiago2A, bola2A, mola2A, tijolo2A, pedro2B, bola2B, mola2B, tijolo2B, raimundo2C, bola2C, mola2C, tijolo2C, joao2D, bola2D, mola2D, tijolo2D, felipe2E, bola2E, mola2E, tijolo2E,
grupo3, tiago3A, bola3A, mola3A, tijolo3A, pedro3B, bola3B, mola3B, tijolo3B, raimundo3C, bola3C, mola3C, tijolo3C, joao3D, bola3D, mola3D, tijolo3D, felipe3E, bola3E, mola3E, tijolo3E,
grupo4, tiago4A, bola4A, mola4A, tijolo4A, pedro4B, bola4B, mola4B, tijolo4B, raimundo4C, bola4C, mola4C, tijolo4C, joao4D, bola4D, mola4D, tijolo4D, felipe4E, bola4E, mola4E, tijolo4E,
grupo5, tiago5A, bola5A, mola5A, tijolo5A, pedro5B, bola5B, mola5B, tijolo5B, raimundo5C, bola5C, mola5C, tijolo5C, joao5D, bola5D, mola5D, tijolo5D, felipe5E, bola5E, mola5E, tijolo5E,
I would like to capture the 20 values that follow grupo3 and store them in groups of 4.
I am using this: (Demo)
/grupo3,((.*?),(.*?),(.*?),(.*?)),/
but this only returns the first 4 comma separated values after grupo3.
I need to generate this array structure:
Match 1
Group 1 tiago3A
Group 2 bola3A
Group 3 mola3A
Group 4 tijolo3A
Match 2
Group 1 pedro3B
Group 2 bola3B
Group 3 mola3B
Group 4 tijolo3B
Match 3
Group 1 raimundo3C
Group 2 bola3C
Group 3 mola3C
Group 4 tijolo3C
Match 4
Group 1 joao3D
Group 2 bola3D
Group 3 mola3D
Group 4 tijolo3D
Match 5
Group 1 felipe3E
Group 2 bola3E
Group 3 mola3E
Group 4 tijolo3E

You can try the following:
/,(.*?),(.*?),(.*?),(.*?),.*?$/m
The m flag at the end enables multi-line mode, so the $ before it matches at the end of each line rather than only at the end of the whole text. Demo
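To illustrate what the m flag changes, here is a minimal sketch on a made-up two-line string. The sample text and the variable names are assumptions for illustration only; they are not part of the question's data.

```php
<?php
// Made-up two-line sample; only its shape matters here.
$sample = "a1, b1,\na2, b2,";

// Without the m flag, ^ anchors only at the very start of the subject.
preg_match_all('/^\w+/', $sample, $withoutFlag);

// With the m flag, ^ also anchors right after each newline.
preg_match_all('/^\w+/m', $sample, $withFlag);

var_export($withoutFlag[0]); // a single match: 'a1'
var_export($withFlag[0]);    // two matches: 'a1' and 'a2'
```

The same applies to $, which in multi-line mode also matches just before each newline.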
Edit: To get every 4 elements from the third line only:
/grupo3,((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)),/
Demo
And you can get the desired output in PHP like:
preg_match('/grupo3,((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)), ((.*?),(.*?),(.*?),(.*?)),/', $str, $matches);
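$groups = [];
unset($matches[0]); // drop the full match
$matches = array_values($matches); // re-index from 0
$count = count($matches);
$j = 0;
// The outer groups sit at indexes 0, 5, 10, ...; the loop starts at 1 and
// skips every multiple of 5, so only the inner values are kept.
for ($i = 1; $i < $count; $i++) {
    if ($i % 5 == 0) {
        $j++;
        continue;
    }
    $groups[$j][] = $matches[$i];
}
var_dump($groups);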
Output will be something like:
array (size=5)
0 =>
array (size=4)
0 => string ' tiago3A' (length=8)
1 => string ' bola3A' (length=7)
2 => string ' mola3A' (length=7)
3 => string ' tijolo3A' (length=9)
1 =>
array (size=4)
0 => string 'pedro3B' (length=7)
1 => string ' bola3B' (length=7)
2 => string ' mola3B' (length=7)
3 => string ' tijolo3B' (length=9)
2 =>
array (size=4)
0 => string 'raimundo3C' (length=10)
1 => string ' bola3C' (length=7)
2 => string ' mola3C' (length=7)
3 => string ' tijolo3C' (length=9)
3 =>
array (size=4)
0 => string 'joao3D' (length=6)
1 => string ' bola3D' (length=7)
2 => string ' mola3D' (length=7)
3 => string ' tijolo3D' (length=9)
4 =>
array (size=4)
0 => string 'felipe3E' (length=8)
1 => string ' bola3E' (length=7)
2 => string ' mola3E' (length=7)
3 => string ' tijolo3E' (length=9)
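As a possible alternative to spelling out all 20 capture groups, a \G-anchored pattern can return one match per block of four values, which is the "Match n / Group 1..4" shape the question asks for. This is a different technique from the answer above, offered only as a sketch. It assumes the values are separated by exactly ", " and contain no commas, and it uses an abbreviated sample in place of the question's text.

```php
<?php
// Abbreviated sample in the same shape as the question's text
// (assumption: ", " separators and a trailing comma on every line).
$text = "grupo2, tiago2A, bola2A, mola2A, tijolo2A, pedro2B, bola2B, mola2B, tijolo2B,\n"
      . "grupo3, tiago3A, bola3A, mola3A, tijolo3A, pedro3B, bola3B, mola3B, tijolo3B,\n"
      . "grupo4, tiago4A, bola4A, mola4A, tijolo4A, pedro4B, bola4B, mola4B, tijolo4B,";

// Either start at "grupo3," at the beginning of a line, or continue (\G)
// from the end of the previous match. (?!^) stops \G from matching at a line start.
$pattern = '/(?:^grupo3,|\G(?!^),) ([^,\n]+), ([^,\n]+), ([^,\n]+), ([^,\n]+)/m';

preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

foreach ($matches as $n => $match) {
    // $match[0] is the full match; $match[1] to $match[4] are the four values.
    echo 'Match ', $n + 1, ': ', implode(' | ', array_slice($match, 1)), "\n";
}
```

The chain of matches ends at the line break because the pattern requires a space after each comma, so values on the following line are never picked up.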

Please forgive the lateness of this answer. This is the comprehensive answer with a clean/direct solution that I would have posted earlier if this page wasn't put on hold. This is as refined a solution as I can devise without knowing more about how your input data is generated/accessed.
The input:
$text='grupo1, tiago1A, bola1A, mola1A, tijolo1A, pedro1B, bola1B, mola1B, tijolo1B, raimundo1C, bola1C, mola1C, tijolo1C, joao1D, bola1D, mola1D, tijolo1D, felipe1E, bola1E, mola1E, tijolo1E,
grupo2, tiago2A, bola2A, mola2A, tijolo2A, pedro2B, bola2B, mola2B, tijolo2B, raimundo2C, bola2C, mola2C, tijolo2C, joao2D, bola2D, mola2D, tijolo2D, felipe2E, bola2E, mola2E, tijolo2E,
grupo3, tiago3A, bola3A, mola3A, tijolo3A, pedro3B, bola3B, mola3B, tijolo3B, raimundo3C, bola3C, mola3C, tijolo3C, joao3D, bola3D, mola3D, tijolo3D, felipe3E, bola3E, mola3E, tijolo3E,
grupo4, tiago4A, bola4A, mola4A, tijolo4A, pedro4B, bola4B, mola4B, tijolo4B, raimundo4C, bola4C, mola4C, tijolo4C, joao4D, bola4D, mola4D, tijolo4D, felipe4E, bola4E, mola4E, tijolo4E,
grupo5, tiago5A, bola5A, mola5A, tijolo5A, pedro5B, bola5B, mola5B, tijolo5B, raimundo5C, bola5C, mola5C, tijolo5C, joao5D, bola5D, mola5D, tijolo5D, felipe5E, bola5E, mola5E, tijolo5E,';
The method: (PHP Demo)
var_export(preg_match('/^grupo3, \K.*(?=,)/m',$text,$out)?array_chunk(explode(', ',$out[0]),4):'fail');
Use preg_match() to extract the single line, then use explode() to split the string on "comma space", then use array_chunk() to store in an array of 5 subarrays containing 4 elements each.
The pattern targets grupo3, at the start of the line, then restarts the full match using \K then greedily matches every non-newline character and stops just before the last comma in the line. The positive lookahead (?=,) doesn't store the final comma in the full string match.
(Pattern Demo)
My method does not retain any leading and trailing spaces, just the values themselves.
Output:
array (
0 =>
array (
0 => 'tiago3A',
1 => 'bola3A',
2 => 'mola3A',
3 => 'tijolo3A',
),
1 =>
array (
0 => 'pedro3B',
1 => 'bola3B',
2 => 'mola3B',
3 => 'tijolo3B',
),
2 =>
array (
0 => 'raimundo3C',
1 => 'bola3C',
2 => 'mola3C',
3 => 'tijolo3C',
),
3 =>
array (
0 => 'joao3D',
1 => 'bola3D',
2 => 'mola3D',
3 => 'tijolo3D',
),
4 =>
array (
0 => 'felipe3E',
1 => 'bola3E',
2 => 'mola3E',
3 => 'tijolo3E',
),
)
p.s. If the search term ($needle) is to be dynamic, you can use something like this to achieve the same result: (PHP Demo)
$needle='grupo3';
// if the needle may include any regex-sensitive characters, use preg_quote($needle,'/') in place of $needle
var_export(preg_match('/^'.$needle.', \K.*(?=,)/m',$text,$out)?array_chunk(explode(', ',$out[0]),4):'fail');
/* or this is equivalent...
if(preg_match('/^'.$needle.', \K.*(?=,)/m',$text,$out)){
$singles=explode(', ',$out[0]);
$groups=array_chunk($singles,4);
var_export($groups);
}else{
echo 'fail';
}
*/
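The dynamic-needle variant above could be wrapped in a small function that always applies preg_quote(). This is only a sketch: the function name chunkLineValues, its parameters, and the shortened sample text are hypothetical and not taken from the answer.

```php
<?php
/**
 * Hypothetical helper: find the line that starts with "$needle, " and
 * return its comma-separated values in chunks of $size, or null if no line matches.
 */
function chunkLineValues(string $text, string $needle, int $size = 4): ?array
{
    // preg_quote() escapes regex-sensitive characters in the needle, including the / delimiter.
    $pattern = '/^' . preg_quote($needle, '/') . ', \K.*(?=,)/m';

    if (preg_match($pattern, $text, $out) !== 1) {
        return null;
    }

    return array_chunk(explode(', ', $out[0]), $size);
}

// Shortened, made-up sample in the same shape as the input above.
$sample = "grupo2, a2, b2, c2, d2, e2, f2, g2, h2,\n"
        . "grupo3, a3, b3, c3, d3, e3, f3, g3, h3,";

var_export(chunkLineValues($sample, 'grupo3'));
```

Returning null instead of the string 'fail' keeps the return type consistent; that is a design choice of this sketch, not something the answer prescribes.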

Related

How to export a multi dimensional array to a specific .csv layout with fputcsv PHP

I know the answer to this will be obvious but I have spent the last 3 days trying to figure it out. I am having trouble getting a Multi-Dimensional array to export into the correct layout in the exported .csv file.
I seem to be able to either get all the data but not in the correct layout, or get the correct layout but not all the data.
This is the array
array (size=106)
0 =>
array (size=6)
0 => string 'Title' (length=5)
1 => string 'image_url' (length=9)
3 => string 'SKU CODE' (length=8)
4 => string 'TITLE SIZE' (length=10)
5 => string 'DESCRIPTION' (length=11)
6 => string 'BASE SKU' (length=8)
1 =>
array (size=6)
0 => string 'A witch and her cat live here' (length=29)
1 => string 'https://beautifulhomegifts.com/a-witch-and-her-cat-live-here/' (length=61)
3 =>
array (size=4)
0 => string 'BHG-MS-AWAHCLH030720' (length=20)
1 => string 'BHG-MS-AWAHCLH030720-A5' (length=23)
2 => string 'BHG-MS-AWAHCLH030720-A4' (length=23)
3 => string 'BHG-MS-AWAHCLH030720-A3' (length=23)
4 =>
array (size=4)
0 => string 'A witch and her cat live here' (length=29)
1 => string 'A witch and her cat live here - 150mm x 200mm' (length=45)
2 => string 'A witch and her cat live here - 201mm x 305mm' (length=45)
3 => string 'A witch and her cat live here - 305mm x 400mm' (length=45)
5 =>
array (size=4)
0 => string 'A witch and her cat live here' (length=29)
1 => string 'A witch and her cat live here' (length=29)
2 => string 'A witch and her cat live here' (length=29)
3 => string 'A witch and her cat live here' (length=29)
6 =>
array (size=3)
1 => string 'BHG-MS-AWAHCLH030720' (length=20)
2 => string 'BHG-MS-AWAHCLH030720' (length=20)
3 => string 'BHG-MS-AWAHCLH030720' (length=20)
2 =>
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)
3 =>
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)
4 =>
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)
5 =>
array (size=6)
0 => string 'Autism House Rules' (length=18)
1 => string 'https://beautifulhomegifts.com/autism-house-rules/' (length=50)
3 =>
array (size=4)
0 => string 'BHG-MS-AHR030720' (length=16)
1 => string 'BHG-MS-AHR030720-A5' (length=19)
2 => string 'BHG-MS-AHR030720-A4' (length=19)
3 => string 'BHG-MS-AHR030720-A3' (length=19)
4 =>
array (size=4)
0 => string 'Autism House Rules' (length=18)
1 => string 'Autism House Rules - 150mm x 200mm' (length=34)
2 => string 'Autism House Rules - 201mm x 305mm' (length=34)
3 => string 'Autism House Rules - 305mm x 400mm' (length=34)
5 =>
array (size=4)
0 => string 'Autism House Rules' (length=18)
1 => string 'Autism House Rules' (length=18)
2 => string 'Autism House Rules' (length=18)
3 => string 'Autism House Rules' (length=18)
6 =>
array (size=3)
1 => string 'BHG-MS-AHR030720' (length=16)
2 => string 'BHG-MS-AHR030720' (length=16)
3 => string 'BHG-MS-AHR030720' (length=16)
6 =>
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)
7 =>
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)
8 =>
array (size=2)
0 => string '' (length=0)
1 => string '' (length=0)
9 =>
I have tried multiple ways to get this to work and this is the closest I have got to it being correct
$f = fopen('new.csv', 'a'); // mode 'a' opens the file for appending and creates it if it does not exist
if ($f != false) {
    // Loop over the array, passing in the values only.
    foreach ($the_big_array as $row) {
        fputcsv($f, $row);
    }
}
fclose($f);
This gives me this layout but it just shows there is a child array and does not output the data of the child arrays.
Above is the output I am getting.
Below is the layout I want to achieve.
I have also tried a foreach loop inside a foreach loop to get the data, when I do this I get all the data but not in the same layout. I have looked through all the posts on here and so many get close to what I want to achieve but none of them give the correct layout.
To summarise, I want to export $the_big_array to a .csv file that has the layout of the second image of a .csv in a spreadsheet. Thank you
array (
0 =>
array (
0 => 'Title',
1 => 'image_url',
3 => 'SKU CODE',
4 => 'TITLE SIZE',
5 => 'DESCRIPTION',
6 => 'BASE SKU',
),
1 =>
array (
0 => 'A witch and her cat live here',
1 => 'https://beautifulhomegifts.com/a-witch-and-her-cat-live-here/',
3 =>
array (
0 => 'BHG-MS-AWAHCLH030720',
1 => 'BHG-MS-AWAHCLH030720-A5',
2 => 'BHG-MS-AWAHCLH030720-A4',
3 => 'BHG-MS-AWAHCLH030720-A3',
),
4 =>
array (
0 => 'A witch and her cat live here',
1 => 'A witch and her cat live here - 150mm x 200mm',
2 => 'A witch and her cat live here - 201mm x 305mm',
3 => 'A witch and her cat live here - 305mm x 400mm',
),
5 =>
array (
0 => 'A witch and her cat live here',
1 => 'A witch and her cat live here',
2 => 'A witch and her cat live here',
3 => 'A witch and her cat live here',
),
6 =>
array (
1 => 'BHG-MS-AWAHCLH030720',
2 => 'BHG-MS-AWAHCLH030720',
3 => 'BHG-MS-AWAHCLH030720',
),
),
2 =>
array (
0 => '',
1 => '',
),
3 =>
array (
0 => '',
1 => '',
),
4 =>
array (
0 => '',
1 => '',
),
5 =>
array (
0 => 'Autism House Rules',
1 => 'https://beautifulhomegifts.com/autism-house-rules/',
3 =>
array (
0 => 'BHG-MS-AHR030720',
1 => 'BHG-MS-AHR030720-A5',
2 => 'BHG-MS-AHR030720-A4',
3 => 'BHG-MS-AHR030720-A3',
),
4 =>
array (
0 => 'Autism House Rules',
1 => 'Autism House Rules - 150mm x 200mm',
2 => 'Autism House Rules - 201mm x 305mm',
3 => 'Autism House Rules - 305mm x 400mm',
),
5 =>
array (
0 => 'Autism House Rules',
1 => 'Autism House Rules',
2 => 'Autism House Rules',
3 => 'Autism House Rules',
),
6 =>
array (
1 => 'BHG-MS-AHR030720',
2 => 'BHG-MS-AHR030720',
3 => 'BHG-MS-AHR030720',
),
),
OK, since the array is malformed and the code is a bit lengthy, here is an outline of what we do:
First, print the headers by shifting the first entry off the array.
Make each row have the same number of entries by finding the maximum count that any sub-array in the row has, and padding the other values up to it.
Print each new row, which is now symmetrically arranged, by using array_column(). You can print $final_row_data in the code to get a better view of how it is symmetrically arranged.
Snippet:
<?php
$the_big_array = array (
0 =>
array (
0 => 'Title',
1 => 'image_url',
3 => 'SKU CODE',
4 => 'TITLE SIZE',
5 => 'DESCRIPTION',
6 => 'BASE SKU',
),
1 =>
array (
0 => 'A witch and her cat live here',
1 => 'https://beautifulhomegifts.com/a-witch-and-her-cat-live-here/',
3 =>
array (
0 => 'BHG-MS-AWAHCLH030720',
1 => 'BHG-MS-AWAHCLH030720-A5',
2 => 'BHG-MS-AWAHCLH030720-A4',
3 => 'BHG-MS-AWAHCLH030720-A3',
),
4 =>
array (
0 => 'A witch and her cat live here',
1 => 'A witch and her cat live here - 150mm x 200mm',
2 => 'A witch and her cat live here - 201mm x 305mm',
3 => 'A witch and her cat live here - 305mm x 400mm',
),
5 =>
array (
0 => 'A witch and her cat live here',
1 => 'A witch and her cat live here',
2 => 'A witch and her cat live here',
3 => 'A witch and her cat live here',
),
6 =>
array (
1 => 'BHG-MS-AWAHCLH030720',
2 => 'BHG-MS-AWAHCLH030720',
3 => 'BHG-MS-AWAHCLH030720',
),
),
2 =>
array (
0 => '',
1 => '',
),
3 =>
array (
0 => '',
1 => '',
),
4 =>
array (
0 => '',
1 => '',
),
5 =>
array (
0 => 'Autism House Rules',
1 => 'https://beautifulhomegifts.com/autism-house-rules/',
3 =>
array (
0 => 'BHG-MS-AHR030720',
1 => 'BHG-MS-AHR030720-A5',
2 => 'BHG-MS-AHR030720-A4',
3 => 'BHG-MS-AHR030720-A3',
),
4 =>
array (
0 => 'Autism House Rules',
1 => 'Autism House Rules - 150mm x 200mm',
2 => 'Autism House Rules - 201mm x 305mm',
3 => 'Autism House Rules - 305mm x 400mm',
),
5 =>
array (
0 => 'Autism House Rules',
1 => 'Autism House Rules',
2 => 'Autism House Rules',
3 => 'Autism House Rules',
),
6 =>
array (
1 => 'BHG-MS-AHR030720',
2 => 'BHG-MS-AHR030720',
3 => 'BHG-MS-AHR030720',
),
)
);
$headers = array_shift($the_big_array);
$header_keys = array_keys($headers);
$fhandle = fopen("sample.csv", "a+"); // use w+ if you want to overwrite the file each time
fputcsv($fhandle, $headers); // add headers first
foreach ($the_big_array as $row_data) {
    $insert_row = [];
    // make each row consistent with all header keys
    foreach ($header_keys as $key) {
        if (isset($row_data[$key])) {
            $insert_row[$key] = $row_data[$key];
        } else {
            $insert_row[$key] = '';
        }
    }
    if (count(array_filter($insert_row)) == 0) continue;
    $final_row_data = [];
    $max_depth_size = 0;
    foreach ($insert_row as $value) {
        if (is_array($value)) {
            $max_depth_size = max($max_depth_size, count($value));
        }
    }
    foreach ($insert_row as $key => $value) {
        $temp = [];
        if (is_array($value)) {
            $value = array_values($value); // since the data is malformed (would work even if it were OK)
            $val_size = count($value);
            for ($i = 0; $i < $max_depth_size; ++$i) {
                if ($i >= $val_size) $temp[$i] = '';
                else $temp[$i] = $value[$i];
            }
        } else {
            $temp = array_merge(array($value), array_fill(0, $max_depth_size - 1, ''));
        }
        $final_row_data[] = $temp;
    }
    for ($column = 0; $column < $max_depth_size; ++$column) {
        fputcsv($fhandle, array_column($final_row_data, $column)); // add all formatted data to CSV
    }
}
fclose($fhandle);
Your starting array is malformed, because the child arrays are not consistent in their dimensions or in their indexes. The following is a valid solution, but it is fragile because it makes many assumptions about the array structure.
$f = fopen('new.csv', 'a');
// Write the header
fputcsv($f, array_values(array_shift($the_big_array)));
foreach ($the_big_array as $baseRow) {
    if (empty($baseRow[0])) {
        continue; // skip the empty separator rows
    }
    $subRowsCount = count($baseRow[3]);
    // Check that the sub-array dimensions are consistent, or ignore the row
    if (
        count($baseRow[4]) !== $subRowsCount
        || count($baseRow[5]) !== $subRowsCount
        || count($baseRow[6]) !== $subRowsCount - 1
    ) {
        continue;
    }
    for ($i = 0; $i < $subRowsCount; $i++) {
        fputcsv($f, [
            $i === 0 ? $baseRow[0] : '', // Title
            $i === 0 ? $baseRow[1] : '', // image_url
            $baseRow[3][$i], // SKU code
            $baseRow[4][$i], // Title size
            $baseRow[5][$i], // Description
            $i === 0 ? '' : $baseRow[6][$i - 1] // Base sku
        ]);
    }
}
fclose($f);
The first row of each "group" contains the max number of columns so it can be used to reliably fetch column data.
Tear off the title and url values as you iterate your input array so that the remaining data in the subarray has a consistent and easily manipulated structure.
Rows that have missing trailing columns do not matter when pushing csv rows into a file, so it is a waste of code to bother generating empty strings. Conversely, leading empty column values will be a problem -- this is why I add two empty strings when not adding the first row of a respective group.
Read the PSR coding standards to see recommendations on spacing and curly brace usage.
Code: (Demo)
$headers = array_shift($array);
$fhandle = fopen("new.csv", "a");
fputcsv($fhandle, $headers);
foreach ($array as $row) {
    if (empty($row[0])) {
        continue;
    }
    $titleAndUrl = array_splice($row, 0, 2);
    foreach ($row[0] as $column => $notUsed) {
        fputcsv(
            $fhandle,
            array_merge(
                !$column ? $titleAndUrl : ['', ''],
                array_column($row, $column)
            )
        );
    }
}
fclose($fhandle);
See demo for output in array form.
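To see the array_splice() / array_column() transposition without touching the file system, the same steps can be applied to a single group in memory. This is a sketch only: the helper name expandGroup and the shortened two-size sample are assumptions for illustration, not part of the answer.

```php
<?php
/**
 * Hypothetical helper: expand one "group" entry into flat rows,
 * each of which could then be handed to fputcsv().
 */
function expandGroup(array $row): array
{
    // Remove title and URL; the remaining sub-arrays are re-indexed from 0.
    $titleAndUrl = array_splice($row, 0, 2);

    $lines = [];
    foreach (array_keys($row[0]) as $column) {
        $lines[] = array_merge(
            $column === 0 ? $titleAndUrl : ['', ''], // leading blanks keep later rows aligned
            array_column($row, $column)              // sub-arrays lacking this key are skipped
        );
    }

    return $lines;
}

// Shortened sample in the shape of the question's data (two sizes instead of four).
$group = [
    0 => 'Autism House Rules',
    1 => 'https://beautifulhomegifts.com/autism-house-rules/',
    3 => ['BHG-MS-AHR030720', 'BHG-MS-AHR030720-A5'],
    4 => ['Autism House Rules', 'Autism House Rules - 150mm x 200mm'],
    5 => ['Autism House Rules', 'Autism House Rules'],
    6 => [1 => 'BHG-MS-AHR030720'],
];

var_export(expandGroup($group));
```

The first row ends after the description because the BASE SKU sub-array has no key 0; as noted above, missing trailing columns are harmless in a CSV row.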

PHP preg_split() pattern

I need help finding a PCRE pattern using preg_split().
I'm using the regex pattern below to split a string based on its starting three-character code and semicolons. The pattern works fine in JavaScript, but now I need to use it in PHP. I tried preg_split() but I am just getting junk back.
// Each group will begin with a three letter code, have three segments separated by a semi-colon. The string will not be terminated with a semi-colon.
// Pseudocode
string_to_split = "AAA;RED;111;BBB;BLUE;22;CCC;GREEN;33;DDD;WHITE;44"
// This works in JS
// https://regex101.com
$pattern = "/[AAA|BBB|CCC|DDD][^;]*;[^;]*[;][^;]*/gi";
Match 1
Full match 0-11 `AAA;RED;111`
Match 2
Full match 12-23 `BBB;BLUE;22`
Match 3
Full match 24-36 `CCC;GREEN;33`
Match 4
Full match 37-49 `DDD;WHITE;44`
$pattern = "/[AAA|BBB|CCC|DDD][^;]*;[^;]*[;][^;]*/";
$split = preg_split($pattern, $string_to_split);
returns
array(5)
0:""
1:";"
2:";"
3:";"
4:""
Based on the additional information in your comments on the answers, I have updated my answer to be very specific to your source format.
You might want something like this:
$subject = "AAA;RED;111;AAA;Oh my dog;12.34;AAA;Oh Long John;.4556;BBB;Oh Long Johnson;1.2323;BBB;Oh Don Piano;.33;CCC;Why I eyes ya;1.445;CCC;All the live long day;2.3343;DDD;Faith Hilling;.89";
$pattern = '/(?<=;|^)(AAA|BBB|CCC|DDD);([^;]*);((?:\d*\.)?\d+)(?=;|$)/';
preg_match_all($pattern, $subject,$matches);
var_dump($matches);
giving you
array (size=4)
0 =>
array (size=8)
0 => string 'AAA;RED;111' (length=11)
1 => string 'AAA;Oh my dog;12.34' (length=19)
2 => string 'AAA;Oh Long John;.4556' (length=22)
3 => string 'BBB;Oh Long Johnson;1.2323' (length=26)
4 => string 'BBB;Oh Don Piano;.33' (length=20)
5 => string 'CCC;Why I eyes ya;1.445' (length=23)
6 => string 'CCC;All the live long day;2.3343' (length=32)
7 => string 'DDD;Faith Hilling;.89' (length=21)
1 =>
array (size=8)
0 => string 'AAA' (length=3)
1 => string 'AAA' (length=3)
2 => string 'AAA' (length=3)
3 => string 'BBB' (length=3)
4 => string 'BBB' (length=3)
5 => string 'CCC' (length=3)
6 => string 'CCC' (length=3)
7 => string 'DDD' (length=3)
2 =>
array (size=8)
0 => string 'RED' (length=3)
1 => string 'Oh my dog' (length=9)
2 => string 'Oh Long John' (length=12)
3 => string 'Oh Long Johnson' (length=15)
4 => string 'Oh Don Piano' (length=12)
5 => string 'Why I eyes ya' (length=13)
6 => string 'All the live long day' (length=21)
7 => string 'Faith Hilling' (length=13)
3 =>
array (size=8)
0 => string '111' (length=3)
1 => string '12.34' (length=5)
2 => string '.4556' (length=5)
3 => string '1.2323' (length=6)
4 => string '.33' (length=3)
5 => string '1.445' (length=5)
6 => string '2.3343' (length=6)
7 => string '.89' (length=3)
The start marker should occur at the start of the string or immediately after a semicolon, so we use a lookbehind that looks for the start or a semicolon:
(?<=;|^)
We look for an alternative of AAA,BBB,CCC or DDD and capture it:
(AAA|BBB|CCC|DDD)
After a semicolon we look for any character except a semicolon. The quantifier * means 0 or more times. Use + if you want at least 1.
;([^;]*)
After the next semicolon we look for a number. This task has to be split into parts to fit a valid format: we first look for 0 or more digits followed by a dot:
(?:\d*\.)?
where (?:) means a non-capturing group.
After that we look for at least one digit: \d+
We want to capture both parts of the number, using parentheses after the semicolon:
;((?:\d*\.)?\d+)
This matches "1234", ".1234", "1.234", "12.34", "123.4" but not "1234." or "1.2.3".
Finally we want this to immediately occur before a semicolon or the end of string. Thus we do a lookahead:
(?=;|$)
Lookaheads and lookbehinds are zero-width: the text they check after or before the match is not included in the matched result.
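The number sub-pattern from this walkthrough can be checked in isolation. In the sketch below, the ^ and $ anchors stand in for the lookbehind and lookahead of the full pattern; that substitution and the variable names are assumptions made for the test only.

```php
<?php
// The number part of the pattern, anchored so that the whole candidate must match.
$numberPattern = '/^(?:\d*\.)?\d+$/';

$candidates = ['1234', '.1234', '1.234', '12.34', '123.4', '1234.', '1.2.3'];

$accepted = [];
$rejected = [];
foreach ($candidates as $candidate) {
    if (preg_match($numberPattern, $candidate) === 1) {
        $accepted[] = $candidate;
    } else {
        $rejected[] = $candidate;
    }
}

var_export(['accepted' => $accepted, 'rejected' => $rejected]);
```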
I've modified your pattern a little, and added a couple of flags to preg_split.
The PREG_SPLIT_NO_EMPTY flag will exclude empty matches from the result, and PREG_SPLIT_DELIM_CAPTURE will include the captured value in the result.
$split = preg_split('/([abcd]{3};[^;]+;\d+);?/i', $string, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
Result:
Array
(
[0] => AAA;RED;111
[1] => BBB;BLUE;22
[2] => CCC;GREEN;33
[3] => DDD;WHITE;44
)
Alternatively, and more suitably, you can use preg_match_all with the following pattern.
preg_match_all('/([abcd]{3};[^;]+;\d+);?/i', $string, $matches);
print_r($matches[0]);
Result:
Array
(
[0] => AAA;RED;111
[1] => BBB;BLUE;22
[2] => CCC;GREEN;33
[3] => DDD;WHITE;44
)
You don't want to split your string but match elements, use preg_match_all:
$str = "AAA;RED;111;AAA;Oh my dog;2.34;AAA;Oh Long John;.4556;BBB;Oh Long Johnson;1.2323;BBB;Oh Don Piano;.33;CCC;Why I eyes ya;1.445;CCC;All the live long day;2.3343;DDD;Faith Hilling;.89";
$res = preg_match_all('/(?:AAA|BBB|CCC|DDD);[^;]*;[^;]*;?/', $str, $m);
print_r($m[0]);
Output:
Array
(
[0] => AAA;RED;111;
[1] => AAA;Oh my dog;2.34;
[2] => AAA;Oh Long John;.4556;
[3] => BBB;Oh Long Johnson;1.2323;
[4] => BBB;Oh Don Piano;.33;
[5] => CCC;Why I eyes ya;1.445;
[6] => CCC;All the live long day;2.3343;
[7] => DDD;Faith Hilling;.89
)
Explanation:
/ : regex delimiter
(?:AAA|BBB|CCC|DDD) : non capture group AAA or BBB or CCC or DDD
; : a semicolon
[^;]* : 0 or more any character that is not a semicolon
; : a semicolon
[^;]* : 0 or more any character that is not a semicolon
;? : optional semicolon
/ : regex delimiter
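Two details behind the question's "junk" result may be worth spelling out. First, [AAA|BBB|CCC|DDD] is a character class that matches a single character, not an alternation. Second, preg_split() throws away whatever the pattern matches and returns only the text in between. The sketch below shows both on the question's sample string; the variable names are chosen for illustration.

```php
<?php
$subject = 'AAA;RED;111;BBB;BLUE;22;CCC;GREEN;33;DDD;WHITE;44';

// A character class matches ONE character from the set {A, B, C, D, |}.
preg_match('/[AAA|BBB|CCC|DDD]/', $subject, $class);

// A group with alternation matches one whole three-letter code.
preg_match('/(?:AAA|BBB|CCC|DDD)/', $subject, $alternation);

// preg_split() discards what the pattern matches and returns the text in between ...
$pieces = preg_split('/(?:AAA|BBB|CCC|DDD);[^;]*;[^;]*/', $subject);

// ... whereas preg_match_all() returns the matched records themselves.
preg_match_all('/(?:AAA|BBB|CCC|DDD);[^;]*;[^;]*/', $subject, $records);

var_export([$class[0], $alternation[0], $pieces, $records[0]]);
```

Here $pieces contains only empty strings and the leftover semicolons, which is the same shape as the output shown in the question.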

Can not match the last group of numbers using php preg_match()

preg_match_all("/(\d{12})(?:,|$)/", "111762396541,561572500056,561729950637,561135281443", $matches);
var_dump($matches):
array (size=2)
0 =>
array (size=4)
0 => string '561762396543,' (length=13)
1 => string '561572500056,' (length=13)
2 => string '561729950637,' (length=13)
3 => string '561135281443' (length=12)
1 =>
array (size=4)
0 => string '561762396543' (length=12)
1 => string '561572500056' (length=12)
2 => string '561729950637' (length=12)
3 => string '561135281443' (length=12)
But I want the $matches like this:
array (size=4)
0 => string '561762396543,' (length=13)
1 => string '561572500056,' (length=13)
2 => string '561729950637,' (length=13)
3 => string '561135281443' (length=12)
I want to match groups of numbers (each with 12 digits) plus a trailing comma if there is one. The exception is the last group of numbers: it doesn't have to be followed by a comma, because it reaches the end of the line.
Try this instead:
preg_match_all("/(\d{12}(?:,|$))/","111762396541,561572500056,561729950637,561135281443",$matches);
When the $ is inside your character range brackets [ ] it is looking for the $ characters not the end-of-line.
EDIT: If you want to include the comma in your matches, then just use the above code sample and look at $matches[0].
If you want a simpler syntax, \b asserts a word boundary, which occurs both between a digit and a comma and at the end of the line. It is zero-width, though, so the comma itself is not included in the match:
preg_match_all("/(\d{12}\b)/","111762396541,561572500056,561729950637,561135281443",$matches);
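If only the flat list from the question is wanted, one option is to drop the capturing group altogether, so that $matches has nothing but index 0. A minimal sketch using the input string from the question:

```php
<?php
$input = '111762396541,561572500056,561729950637,561135281443';

// No capturing group: (?:...) only groups, so $matches holds the full matches alone.
preg_match_all('/\d{12}(?:,|$)/', $input, $matches);

var_export($matches[0]);
```

Each element keeps its trailing comma except the last one, which is matched by the $ branch instead.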

preg_match_all for numbers in certain parts of string

Babylon 5 Season 4 Episode 13 Rumors Bargains and Lies 45
How can I extract the number that comes after Season and the number that comes after Episode, but not any number after that? In the example above I want only 4 and 13, using preg_match in PHP. Thanks in advance.
You can do something like this:
$str = 'Babylon 5 Season 4 Episode 13 Rumors Bargains and Lies 45';
if (preg_match_all("/(Season|Episode) (\d+)/", $str, $matches)) {
    var_dump($matches);
}
And it will output:
array (size=3)
0 =>
array (size=2)
0 => string 'Season 4' (length=8)
1 => string 'Episode 13' (length=10)
1 =>
array (size=2)
0 => string 'Season' (length=6)
1 => string 'Episode' (length=7)
2 =>
array (size=2)
0 => string '4' (length=1)
1 => string '13' (length=2)
You can have your Season and Episode values in an array $m by using named groups:
preg_match_all('/.*Season\s+(?<season>\d+)\s+Episode\s+(?<episode>\d+)/', $str, $m);
print 'Season: ' . $m['season'][0] . "\n";
print 'Episode: ' . $m['episode'][0] . "\n";
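You can use a regex lookbehind to match the numbers.
(?<=Season)\s*?\d+|(?<=Episode)\s*?\d+
It should match 4 and 13. Note that \s*? sits outside the lookbehind, so each match includes the leading space; trim() the results or capture \d+ in a group if you need the bare digits.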
See https://regex101.com/r/JVOUOw/2
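The named-group approach from the answers could be wrapped in a function that returns integers. This is a sketch only: the function name parseSeasonEpisode, the case-insensitive flag, and the null return for non-matching titles are assumptions, not something the answers specify.

```php
<?php
/**
 * Hypothetical helper: pull the season and episode numbers out of a title,
 * or return null when the title does not contain "Season N Episode M".
 */
function parseSeasonEpisode(string $title): ?array
{
    $pattern = '/\bSeason\s+(?<season>\d+)\s+Episode\s+(?<episode>\d+)/i';

    if (preg_match($pattern, $title, $m) !== 1) {
        return null;
    }

    // Other numbers in the title (such as the "5" and "45" here) are never captured.
    return ['season' => (int) $m['season'], 'episode' => (int) $m['episode']];
}

var_export(parseSeasonEpisode('Babylon 5 Season 4 Episode 13 Rumors Bargains and Lies 45'));
```

preg_match() is enough here because each title holds a single Season/Episode pair; preg_match_all() would only be needed for several pairs in one string.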

Extract tabular data using regular expressions [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a text file with multiple occurrences of tables like the one shown below:
_____________________________________
Heading 1 | Heading 2
_______________ | ___________________
Label1 18857.10 | Label3 710.00
Label2 2361.50 | Label4 0.00
| Label5 2531.37
| Label6 0.00
| Label7 0.00
| Label8 0.01
________________| ___________________
16495.60 | Label9 3969.06
_______________ | ___________________
I want to store the numerical values into variables using regular expressions. Since I'm new to regular expressions, I couldn't find a way to do it. Can anyone help me with this?
$table="_____________________________________
Heading 1 | Heading 2
_______________ | ___________________
Label1 18857.10 | Label3 710.00
Label2 2361.50 | Label4 0.00
| Label5 2531.37
| Label6 0.00
| Label7 0.00
| Label8 0.01
________________| ___________________
16495.60 | Label9 3969.06
_______________ | ___________________
";
$num = preg_match_all('/(\w+) (\d+(\.\d+)?)/', $table, $result);
for($i=0; $i<$num; $i++){
echo "{$result[1][$i]} = {$result[2][$i]}<br>";
}
If your table is exactly what you showed, this works.
regex: /(\w+) (\d+(\.\d+)?)/
Slashes / at the beginning and end delimit the regex.
(\w+) means "match any letter, number or underscore one or more times".
One space follows; you can add + after the space to match more than one, or use \s instead of the space to match any whitespace character, such as a tab.
(\d+(\.\d+)?) ... \d+ means one or more digits, (\.\d+) means a dot followed by one or more digits, and the question mark means that the preceding parenthesised group (\.\d+) is optional.
preg_match_all stores the matches in its third parameter and returns the number of matches. $result[0][$i] is the whole match, $result[1][$i] is the first sub-expression (\w+), $result[2][$i] is the second group (\d+(\.\d+)?), and $result[3][$i] is the decimal part (\.\d+). The decimal part is already contained in $result[2][$i], so you don't need $result[3][$i]; it is mentioned only for explanation :)
The code prints:
Heading = 1
Heading = 2
Label1 = 18857.10
Label3 = 710.00
Label2 = 2361.50
Label4 = 0.00
Label5 = 2531.37
Label6 = 0.00
Label7 = 0.00
Label8 = 0.01
Label9 = 3969.06
edit: sorry, it doesn't work, it didn't match that naked 16495.60 value. Let me think a bit more...
...
$regex='/([a-zA-Z0-9]+)? +(\d+(\.\d+)?)/';
is a bit better. Here's how it works:
[a-zA-Z0-9]+ matches a non-zero amount of letters or numbers
? after the parenthesis means the whole parenthesised expression is optional.
A space followed by + matches one or more spaces
(\d+(\.\d+)?) matches a non-zero amount of digits followed by an optional { dot and another non-zero amount of digits }
This whole regex does not include | or new-line, so all matching should happen in only one field of the table.
The result variable should be:
array (size=4)
0 =>
array (size=12)
0 => string 'Heading 1' (length=9)
1 => string 'Heading 2' (length=9)
2 => string 'Label1 18857.10' (length=15)
3 => string 'Label3 710.00' (length=13)
4 => string 'Label2 2361.50' (length=14)
5 => string 'Label4 0.00' (length=11)
6 => string 'Label5 2531.37' (length=14)
7 => string 'Label6 0.00' (length=11)
8 => string 'Label7 0.00' (length=11)
9 => string 'Label8 0.01' (length=11)
10 => string ' 16495.60' (length=19)
11 => string 'Label9 3969.06' (length=14)
1 =>
array (size=12)
0 => string 'Heading' (length=7)
1 => string 'Heading' (length=7)
2 => string 'Label1' (length=6)
3 => string 'Label3' (length=6)
4 => string 'Label2' (length=6)
5 => string 'Label4' (length=6)
6 => string 'Label5' (length=6)
7 => string 'Label6' (length=6)
8 => string 'Label7' (length=6)
9 => string 'Label8' (length=6)
10 => string '' (length=0)
11 => string 'Label9' (length=6)
2 =>
array (size=12)
0 => string '1' (length=1)
1 => string '2' (length=1)
2 => string '18857.10' (length=8)
3 => string '710.00' (length=6)
4 => string '2361.50' (length=7)
5 => string '0.00' (length=4)
6 => string '2531.37' (length=7)
7 => string '0.00' (length=4)
8 => string '0.00' (length=4)
9 => string '0.01' (length=4)
10 => string '16495.60' (length=8)
11 => string '3969.06' (length=7)
3 =>
array (size=12)
0 => string '' (length=0)
1 => string '' (length=0)
2 => string '.10' (length=3)
3 => string '.00' (length=3)
4 => string '.50' (length=3)
5 => string '.00' (length=3)
6 => string '.37' (length=3)
7 => string '.00' (length=3)
8 => string '.00' (length=3)
9 => string '.01' (length=3)
10 => string '.60' (length=3)
11 => string '.06' (length=3)
edit2: GRAB THOSE SNIPPETS AGAIN! There should be a backslash before the dot in (\.\d+)! I formatted it wrong and it disappeared. I rewrote it, so it should be fine now.
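To make the role of the optional label group concrete, here is a sketch that runs the second pattern over a small excerpt and collects label/value pairs. The excerpt's spacing is made up, since the exact whitespace of the original table is not known, and the inner decimal group is written as non-capturing, which differs slightly from the pattern above.

```php
<?php
// Made-up excerpt with explicit spacing; the real file's whitespace may differ.
$table = "Label1 18857.10 | Label3 710.00\n"
       . "                | Label5 2531.37\n"
       . "       16495.60 | Label9 3969.06\n";

// Optional label, one or more spaces, then an integer or decimal number.
preg_match_all('/([a-zA-Z0-9]+)? +(\d+(?:\.\d+)?)/', $table, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as $match) {
    // $match[1] is '' when the optional label did not take part in the match.
    $pairs[] = [$match[1], $match[2]];
}

var_export($pairs);
```

With PREG_SET_ORDER each entry of $matches is one complete match, which avoids the [group][index] ordering that is easy to mix up.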
