Need some assistance with Regex (PHP) - php

I'd like to parse txt files to HTML using preg_replace to add formatting.
The format of the file is like this :
09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
This should be treated as a group and parsed into a table, like :
<table>
<tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr>
<tr><td>1234567</td><td>(optional)</td><td>Today is a beautiful day</td></tr>
<tr><td>1234568</td><td>(optional)</td><td>Tomorrow will be even better</td></tr>
<tr><td>1234569</td><td>(optional)</td><td>December is the best month of the year!</td></tr>
</table>
For now, I'm using two separate preg_replace calls: one for the first line (time and date) and a second one for the lines that follow, of which there can be anywhere from one to 100 or so. But the file can contain other text as well, which should be left untouched; yet if such a line happens to have more or less the same format (7 digits and some text), it gets formatted as well:
$file = preg_replace('~^\s*((\[.*\]){0,2}\d{1,2}:\d{2}:\d{2}(\[/.*\]){0,2})\s(\d{2}-\d{2}-\d{2}(\[/.*\]){0,2})\s+(?:\d{2}/\d{3}\s+|)(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\s+(.+)$~m', '<table class="file"><tr class="entry"><td class="time">$1 $4</td><td class="day">$6</td><td class="message">$7</td></tr>', $file);
$file = preg_replace('~^\s*(.{0,11}?)\s*((\[.+?\])?\d{7}(\[/.+?\])?)\s+(.+?)$~m', '<tr class="id"><td class="optional">$1</td><td class="id">$2</td><td class="message">$5</td></tr>', $file);
How to improve this? Like, if I have this content :
09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better
So, I'd like to catch and preg_replace only the first block and the last one: each block starts with a time/date line and is followed by one or more lines that start with a 7-digit ID.
So far, thanks for reading ;)

I think this accomplishes what you are trying to do.
There was one line where it was unclear to me why it should be ignored:
1234570 This line should be ignored
This line meets the 7 digits and some text requirement.
The regex I came up with was:
/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3,6}day)?\h*(.+)/m
Here is a regex101 demo: https://regex101.com/r/qB0gH6/1
and in PHP usage:
$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace('/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*([a-zA-Z]{3,6}day)?\h*(.+)/m', '<td>$1</td><td>$2</td><td>$3</td>', $string);
Output:
<td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234567</td><td></td><td>Today is a beautiful day</td>
<td>1234568</td><td></td><td>Tomorrow will be even better</td>
<td>1234569</td><td></td><td>December is the best month of the year!</td>
Liverpool - WBA 2-2
<td>1234570</td><td></td><td>This line should be ignored</td>
<td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234571</td><td></td><td>Today is a beautiful day</td>
<td>1234572</td><td></td><td>Tomorrow will be even better</td>
Okay, per your update it is a bit more complicated but I think this does it:
$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
Liverpool - WBA 2-2
1234570 This line should be ignored
19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
echo preg_replace_callback('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3,6}day)?\h*(.+?)\n((\d{7})\h+(.+?)(\n|$))+/',
function ($matches) {
$lines = explode("\n", $matches[0]);
$theoutput = '<table><tr>';
foreach($lines as $line) {
if(preg_match('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+([a-zA-Z]{3,6}day)?\h*(.*)/', $line, $output)) {
//it is the first date string line;
foreach($output as $key => $values) {
if(!empty($key)) {
$theoutput .= '<td>' . $values . '</td>';
}
}
} else {
if(preg_match('/(\d{7})\h*(.*)/', $line, $output)) {
$theoutput .= '</tr><tr>';
foreach($output as $key => $values) {
if(!empty($key)) {
$theoutput .= '<td>' . $values . '</td>';
}
}
}
}
}
$theoutput .= '</tr></table>';
return $theoutput;
}, $string);
Output:
<table><tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234567</td><td>Today is a beautiful day</td></tr><tr><td>1234568</td><td>Tomorrow will be even better</td></tr><tr><td>1234569</td><td>December is the best month of the year!</td></tr></table>
Liverpool - WBA 2-2
1234570 This line should be ignored
<table><tr><td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234571</td><td>Today is a beautiful day</td></tr><tr><td>1234572</td><td>Tomorrow will be even better</td></tr></table>
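The same grouping can also be done without one large regex, by walking the lines and tracking whether we are currently inside a block. The following is only a minimal sketch under the assumptions stated in the question (a time/date header line, then lines starting with a 7-digit ID); the function name blocksToTables and the empty "optional" cell are my own choices, not part of the original answer.

```php
<?php
// Hypothetical helper: wraps each "header + ID lines" block in its own table
// and leaves every other line untouched.
function blocksToTables(string $text): string
{
    $headerRe = '/^(\d{2}:\d{2}:\d{2}\h+\d{2}-\d{2}-\d{2})\h+([a-zA-Z]+day)\h+(.+)$/';
    $idRe     = '/^(\d{7})\h+(.+)$/';
    $out      = [];
    $inBlock  = false; // true while we are between a header line and the end of its block

    foreach (explode("\n", $text) as $line) {
        if (preg_match($headerRe, $line, $m)) {
            if ($inBlock) {
                $out[] = '</table>'; // a new header directly after a block closes the old one
            }
            $cells = array_map('htmlspecialchars', [$m[1], $m[2], $m[3]]);
            $out[] = '<table>';
            $out[] = '<tr><td>' . implode('</td><td>', $cells) . '</td></tr>';
            $inBlock = true;
        } elseif ($inBlock && preg_match($idRe, $line, $m)) {
            // ID lines only count when they directly follow a header or another ID line
            $out[] = '<tr><td>' . htmlspecialchars($m[1]) . '</td><td></td><td>'
                   . htmlspecialchars($m[2]) . '</td></tr>';
        } else {
            if ($inBlock) {
                $out[] = '</table>';
                $inBlock = false;
            }
            $out[] = $line; // unrelated text passes through unchanged
        }
    }
    if ($inBlock) {
        $out[] = '</table>';
    }
    return implode("\n", $out);
}

$sample = "09:19:49 13-12-15 Sunday Hello World\n1234567 Today is a beautiful day\n"
        . "Liverpool - WBA 2-2\n1234570 This line should be ignored";
echo blocksToTables($sample), "\n";
```

Because the "inside a block" flag is reset by the first non-matching line, an ID line such as 1234570 that follows unrelated text is passed through as-is.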

Extract certain data from a text file and create a table [closed]

Hi, I am very new to PHP programming and am just trying to learn a little more about how to work with files.
I have a text file with a bunch of data like the sample below.
Policy Name: TU_TOPS_VM-Full_30D_00_2
Daily Windows:
Saturday 19:50:00 --> Sunday 06:00:00
Policy Name: TU_QW_VM-FULL_30D_18_01
Daily Windows:
Sunday 02:05:00 --> Sunday 09:00:00
Policy Name: TU_GPAS_FULL-VM_30D_18_01
Daily Windows:
Friday 22:00:00 --> Saturday 06:00:00
I would like to have an output similar to this in a table.
POlicy Day Time
TU_TOPS_VM-Full_30D_00_2 Saturday 19:50:00
TU_QW_VM-FULL_30D_18_01 Sunday 02:05:00
TU_GPAS_FULL-VM_30D_18_01 Friday 22:00:00
With my code I was able to obtain the policy name and put it in a table column.
Output from my code:
POlicy Day Time
TU_TOPS_VM-Full_30D_00_2
TU_QW_VM-FULL_30D_18_01
Here is what I have so far:
<?php
$lines= file('schedule');
$lines = preg_grep("/Policy Name:/", $lines);
echo'
<table>
<tr>
<td>POlicy</td>
<td>Day</td>
<td>Time</td>
</tr>';
foreach ($lines as $policy) {
$find="Policy Name:";
$replace="";
$po= (str_replace($find,$replace,$policy));
echo '
<tr>
<td>'.$po.'<br></td>
</tr>
</table>';
}
?>
How can I extract the day and time and place them beside the policy name?
You're throwing away the other lines when you use preg_grep. Instead, loop over all the lines, checking which kind of line it is.
Also, </table> should not be inside the loop, it should only be at the end of the loop.
<?php
$lines= file('schedule', FILE_IGNORE_NEW_LINES);
echo'
<table>
<tr>
<td>POlicy</td>
<td>Day</td>
<td>Time</td>
</tr>';
foreach ($lines as $line) {
if (strstr($line, 'Policy Name:')) {
$policy = trim(str_replace('Policy Name:', '', $line)); // trim the space left after the label
} elseif (preg_match('/(\w+)\s+(\d\d:\d\d:\d\d)\s+-->/', $line, $match)) {
$day = $match[1];
$time = $match[2];
echo "
<tr>
<td>$policy</td>
<td>$day</td>
<td>$time</td>
</tr>";
}
}
echo "\n</table>";
?>
@Barmar wrote a great answer. Here's a different tack: I want to control where my array breaks (i.e. not at each line, because I'm not sure where your line breaks are). Also, less specific matching tolerates more variability in your input (which may not be an issue for you).
<?php
$string = file_get_contents('schedule') ; // read the file in as a string, not an array
$text_array = explode("Policy Name:", $string) ; // break the string into an array of strings at each "Policy Name:"
$table_rows = '' ; // initialize so the first .= does not raise a notice
foreach($text_array as $entry){
$entry = preg_replace('/\s+/', ' ',$entry) ; // collapse whitespace into a single space for delimiting
$sub_entry_array = explode(' ', $entry) ; // split each substring into an array
if (count($sub_entry_array) < 6) { continue ; } // skip the empty chunk before the first "Policy Name:"
$table_rows .= "<tr><td>$sub_entry_array[1]</td><td>$sub_entry_array[4]</td><td>$sub_entry_array[5]</td></tr>" ; // display the array values we want
}
echo "<table><tr><th>Policy</th><th>Day</th><th>Time</th></tr>$table_rows</table>" ;
?>
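A third option is to let a single multi-line pattern pull out each policy/day/time triple with preg_match_all. This is only a sketch: the function name extractWindows is hypothetical, it assumes every "Policy Name:" line is followed by a "Daily Windows:" line, and it keeps only the first window listed for each policy.

```php
<?php
// Hypothetical helper: returns one row per policy with the start day and time
// of its first daily window.
function extractWindows(string $text): array
{
    $pattern = '/Policy Name:\s*(\S+)\s+Daily Windows:\s+(\w+)\s+(\d{2}:\d{2}:\d{2})\s+-->/';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        $rows[] = ['policy' => $m[1], 'day' => $m[2], 'time' => $m[3]];
    }
    return $rows;
}

// Inline sample standing in for file_get_contents('schedule')
$schedule = "Policy Name: TU_TOPS_VM-Full_30D_00_2\nDaily Windows:\n"
          . "Saturday 19:50:00 --> Sunday 06:00:00\n"
          . "Policy Name: TU_QW_VM-FULL_30D_18_01\nDaily Windows:\n"
          . "Sunday 02:05:00 --> Sunday 09:00:00\n";

echo "<table>\n<tr><th>Policy</th><th>Day</th><th>Time</th></tr>\n";
foreach (extractWindows($schedule) as $row) {
    // escape values before putting them into HTML
    $cells = array_map('htmlspecialchars', $row);
    echo '<tr><td>' . implode('</td><td>', $cells) . "</td></tr>\n";
}
echo "</table>\n";
```

Because \s also matches newlines, the pattern does not care whether the three pieces sit on separate lines or on one.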

Making the most possible sentences from multiple synonyms

I would like to generate as many different sentences as possible from multiple blocks of words, in PHP. For example, I put this in the PHP code:
"today" "yesterday" | "is" "is not" | "monday" "tuesday"
It would become:
today is monday
yesterday is monday
today is not tuesday
yesterday is tuesday
etc...
How can I create this in PHP?
Thank you.
Try this
$block1 = array("today", "yesterday" );
$block2 = array("is", "is not");
$block3 = array("monday", "tuesday");
foreach($block1 as $word1) {
foreach($block2 as $word2) {
foreach($block3 as $word3) {
echo $word1.' '.$word2.' '.$word3."\n";
}
}
}
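The nested loops work for exactly three blocks. If the number of blocks can vary, the same idea can be written as a running Cartesian product. This is a minimal sketch; the function name combineBlocks is my own.

```php
<?php
// Hypothetical helper: builds every sentence that takes one word from each block,
// in block order, for any number of blocks.
function combineBlocks(array $blocks): array
{
    $sentences = ['']; // start with one empty sentence to extend
    foreach ($blocks as $block) {
        $next = [];
        foreach ($sentences as $prefix) {
            foreach ($block as $word) {
                // ltrim removes the leading space produced for the first block
                $next[] = ltrim($prefix . ' ' . $word);
            }
        }
        $sentences = $next;
    }
    return $sentences;
}

$blocks = [
    ['today', 'yesterday'],
    ['is', 'is not'],
    ['monday', 'tuesday'],
];
echo implode("\n", combineBlocks($blocks)), "\n";
```

With the three blocks above this produces 2 x 2 x 2 = 8 sentences, starting with "today is monday".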

With a simple String In Date

I have a string like this:
2d6h8y4m
where the units are:
d - days
min - minutes
h - hours
y - years
m - months
s - seconds
I would like to add the time stored in this string to today's date, i.e.
2 days, 6 hours, 8 years and 4 months
I know how to do it the simple way (loop over the whole text and read the numbers in sequence), but my guess is that it can be done more simply with a regex. I would be grateful if someone could give me such a function or at least a hint (e.g. a pattern for a single unit); that would be enough for me to work out the rest.
Edit:
It seems I was not understood: I don't want to print the values, I want to add them to today's date, e.g. 2 days, 4 months, etc.
I.e. today is 2014-11-22 5:43:45 p.m.
After adding 2 days, 6 hours, 8 years and 4 months I have
2023-03-24 11:43:45 p.m.
Separate the integer values and the unit characters/words with preg_match_all(). Create an array() that maps each unit to its full name (i.e. h = hours). Then just foreach() over the numbers. Example:
$avr = array('d'=>'days', 'min'=>'min','h'=>'hours', 'y'=>'years', 'm'=>'months', 's'=>'seconds');
$str = '2d6h8y4m';
preg_match_all('/\d+/', $str, $int);
preg_match_all('/[a-z]+/', $str, $word);
$len = count($int[0]) - 1;
$result = '';
foreach($int[0] as $k=>$v){
if($len == $k){
$result .= ' and ' . $v . ' ' . $avr[$word[0][$k]];
}else{
$suf = ($k == 0) ? '' : ', ';
$result .= $suf . $v . ' ' . $avr[$word[0][$k]];
}
}
echo $result;
Output:
2 days, 6 hours, 8 years and 4 months
The regex you're looking for is
(\d+)d(\d+)h(\d+)y(\d+)m
\d matches any digit. From this you can extract what was matched between the brackets. Here is the full code:
preg_match('/(\d+)d(\d+)h(\d+)y(\d+)m/', "2d6h8y4m", $matches);
Now $matches will be an array containing what is matched between the brackets. $matches[0] will be the whole match, $matches[1] will be the first bracket (the day), $matches[2] will be the second bracket (the hour), etc.
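To make the group numbering easier to follow, the same pattern can be written with named groups. This is just a sketch; the group names days, hours, years and months are my own labels, and the ^ and $ anchors are an added assumption that the string contains nothing else.

```php
<?php
// Same pattern as above, with named groups and anchors (both are assumptions).
$pattern = '/^(?<days>\d+)d(?<hours>\d+)h(?<years>\d+)y(?<months>\d+)m$/';

if (preg_match($pattern, '2d6h8y4m', $m)) {
    // Index 0 is always the whole match; the groups start at index 1
    // and are also available under their names.
    echo $m[0], "\n";
    printf(
        "%d days, %d hours, %d years, %d months\n",
        $m['days'],
        $m['hours'],
        $m['years'],
        $m['months']
    );
}
```

Note that this pattern only accepts the units in exactly this order; a string such as 8y4m would not match.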
Why are you reinventing the wheel?
Why don't you store intervals in ISO-8601 duration format, like for example: P8Y5M2DT6H ?
$date = new DateTime();
$interval = new DateInterval('P8Y5M2DT6H');
$date->add($interval);
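If the strings are already stored in the 2d6h8y4m shorthand, they can still be turned into a DateInterval. The following is a sketch under that assumption; the function name shorthandToInterval is hypothetical, and the unit letters are the ones listed in the question. The date from the question is used with an explicit UTC time zone so that daylight-saving rules do not affect the result.

```php
<?php
// Hypothetical helper: converts shorthand such as "2d6h8y4m" into a DateInterval.
function shorthandToInterval(string $spec): DateInterval
{
    // shorthand unit => DateInterval property ("i" is minutes)
    $map = ['y' => 'y', 'm' => 'm', 'd' => 'd', 'h' => 'h', 'min' => 'i', 's' => 's'];

    $interval = new DateInterval('PT0S'); // empty interval to fill in
    // "min" is listed before the single letters so that it is not read as "m"
    preg_match_all('/(\d+)(min|[ymdhs])/', $spec, $parts, PREG_SET_ORDER);
    foreach ($parts as $part) {
        $prop = $map[$part[2]];
        $interval->$prop = $interval->$prop + (int) $part[1];
    }
    return $interval;
}

$date = new DateTime('2014-11-22 17:43:45', new DateTimeZone('UTC'));
$date->add(shorthandToInterval('2d6h8y4m'));
echo $date->format('Y-m-d H:i:s'), "\n"; // 2023-03-24 23:43:45
```

Unlike a fixed pattern, this accepts the units in any order and ignores anything it does not recognise, which may or may not be what you want for validation.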

Simple regex trying to extract 3 or 4 numbers from "dirty" time string

Despite some help earlier on I am still floundering in regex problems and now in array problems.
I am trying to allow users to put time in as 205pm 1405 14:05 2.05 pm and so on.
Previously I had times stored as 14:05 (standard MySQL TIME format), but users did not like that. If I convert to 2:05 pm instead, then when the updated values are entered (in a similar format), that obviously breaks the database.
I have NO TROUBLE going 14:05 to 2:05 pm but I am having a nightmare going in the opposite direction.
I have fudged things a bit with a cascading IF statement to get the string length but I have spent literally hours trying to get at the output.
IE if I get 2-05 pm, to start off with I just want to get 205.
Here is my atrocious code:
if ($_POST['xxx'] == 'yyy')
{
$stuff=$_POST['stuff'];
$regex='/^\d\D*\d\D*\d\D*\d\D*\d\D*$/';
if (preg_match($regex, $stuff, $matches)) {echo " More than 4 digits. This cannot be a time."; }
else{
$regex='/^\d\D*\d\D*\d\D*\d\D*$/';
if (preg_match($regex, $stuff, $matches)) {echo " >>4 digits";}
else{
$regex='/^\d\D*\d\D*\d\D*$/';
if (preg_match($regex, $stuff, $matches)) {echo " >>3 digits";}
else{
$regex='/^\d\D*\d\D*$/';
if (preg_match($regex, $stuff, $matches)) {echo " Less than 3 digits. This cannot be a time.";}
}
}
}
}
debug ($matches,"mat1");
$NEWmatches = implode($matches);
debug ($matches,"matN1");
preg_match_all('!\d+!', $NEWmatches, $matches);
debug ($matches,"mat2");
$matches = implode($matches);
debug ($matches,"mat3");
echo "<br> Matches $matches"; /// I hoped to get the digits only here
?>
Thanks for any help.
$times = array(
'205pm', '1405', '4:05', '2.05 pm'
);
foreach($times as $time)
{
// parsing string into array with 'h' - hour, 'm' - minutes and 'ap' keys
preg_match('/(?P<h>\d{1,2})\D?(?P<m>\d{2})\s*(?P<ap>(a|p)m)?/i', $time, $matches);
// construction below is not necessary, it just removes extra values from array
$matches = array_intersect_key($matches,
array_flip(array_filter(array_keys($matches), 'is_string')));
// output the result
var_dump($matches);
}
If you are passing that string to strtotime, then it is easier just to reformat it into a format strtotime understands, like this:
$times = array(
'205pm', '1405', '4:05', '2.05 pm'
);
var_dump(preg_replace('/(\d{1,2})\D?(\d{2})(\s*(a|p)m)?/i', '$1:$2$3', $times));
P.S. For more complex inputs I would suggest reformatting the time and letting strtotime do the parsing, like this; otherwise the regexp can become a nightmare.
$times = array(
'9 pm', '205pm', '1405', '4:05', '2.05 pm'
);
$times = preg_replace('/(\d{1,2})\D?(\d{2})(\s*(a|p)m)?/i', '$1:$2$3', $times);
foreach($times as $time)
{
$date = strtotime($time);
if ($date === false) { echo 'Unable to parse the time ' . $time . "\n"; continue; }
$hour = date('G', $date);
$minutes = date('i', $date);
echo $hour . " : " . $minutes . "\n";
}
For your given examples "2-05" or "14:05" you can use this regex:
^(?<HOUR>[0-9]{1,2})\s{0,}((-|:|\.)\s{0,})?(?<MIN>[0-9]{2})\s{0,}(?<MODE>(a|p)m)?$
HOUR will hold the first one or two digits of the string, MIN will always hold the last two digits, and MODE will hold am or pm if present.
So you can combine them at the end into a single string. Alternatively, you can just strip the separator with a simple replace such as str_replace('-', '', $time).
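Putting the pieces together, the captured hour, minute and am/pm parts can be converted back into the HH:MM:SS form that a MySQL TIME column expects. This is only a sketch: the function name toMysqlTime is hypothetical, the pattern is a case-insensitive variant of the one above, and returning null for anything that is not a plausible time is my own design choice.

```php
<?php
// Hypothetical helper: normalizes loose user input ("205pm", "2.05 pm", "1405")
// to "HH:MM:SS", or returns null if the input is not a plausible time.
function toMysqlTime(string $input): ?string
{
    $re = '/^\s*(?<hour>\d{1,2})\s*[-:.]?\s*(?<min>\d{2})\s*(?<mode>[ap]m)?\s*$/i';
    if (!preg_match($re, $input, $m)) {
        return null;
    }

    $hour = (int) $m['hour'];
    $min  = (int) $m['min'];
    // the "mode" key is absent when no am/pm suffix was given
    $mode = strtolower($m['mode'] ?? '');

    if ($mode !== '') {
        if ($hour < 1 || $hour > 12) {
            return null; // "14:05 pm" makes no sense
        }
        // 12am -> 0, 12pm -> 12, 2pm -> 14
        $hour = $hour % 12 + ($mode === 'pm' ? 12 : 0);
    }
    if ($hour > 23 || $min > 59) {
        return null;
    }
    return sprintf('%02d:%02d:00', $hour, $min);
}

foreach (['205pm', '1405', '14:05', '2.05 pm', '2-05 pm', '99999'] as $time) {
    echo $time, ' => ', toMysqlTime($time) ?? 'invalid', "\n";
}
```

The range checks matter because the regex alone would happily accept values such as 25:99.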

PHP Integers with leading zeros

Particularly, 08 and 09 have caused me some major trouble. Is this a PHP bug?
Explanation:
I have a calendar 'widget' on a couple of our client's sites, where we have a HTML hard-coded calendar (I know a PHP function can generate n number of months, but the boss man said 'no').
Within each day, there is a PHP function to check for events on that day, passing the current day of the month like so:
<td valign="top">01<?php printShowLink(01, $events) ?></td>
$events is an array of all events on that month, and the function checks if an event is on that day:
function printShowLink($dayOfMonth, $eventsArray) {
$show = array();
$printedEvent = array();
$daysWithEvents = array();
foreach($eventsArray as $event) {
if($dayOfMonth == $event['day'] && !in_array($event['id'], $printedEvent)){
if(in_array($event['day'], $daysWithEvents)) {
echo '<hr class="calendarLine" />';
} else {
echo '<br />';
}
$daysWithEvents[] = $event['day']; // string parsed from timestamp
if($event['linked'] != 1) {
echo '<div class="center cal_event '.$event['class'].'" id="center"><span title="'.$event['title'].'" style="color:#666666;">'.$event['shorttitle'].'</span></div>';
$printedEvent[] = $event['id'];
} else {
echo '<div class="center cal_event '.$event['class'].'" id="center">'.$event['shorttitle'].'</div>';
$printedEvent[] = $event['id'];
}
}
}
}
On the 8th and 9th, no events will show up. Passing a string of the day instead of a zero-padded integer causes the same problem.
The solution is what it should have been in the first place: a non-padded integer. However, my question is, have you seen this odd behavior with 08 and/or 09?
I googled this and couldn't find anything out there.
Quote it. 0123 without quotes is octal in PHP. It's in the docs
$ php -r 'echo 01234, "\n01234\n";'
668
01234
$
So you should change your code to
<td valign="top">01<?php printShowLink('01', $events) ?></td>
It's been a while since I've had to wade through so much PHP; I've been doing mostly JavaScript for 3 years. But 08 and 09 being a problem makes me think they could be getting treated as octal (base 8), and the digits 8 and 9 do not exist in octal.
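A short sketch of the behaviour, using only valid literals: PHP 5 silently evaluated an invalid octal literal such as 08 to 0, while PHP 7 and later reject it with a parse error, so it cannot appear in runnable code here. The variable names are my own.

```php
<?php
// A leading zero makes an integer literal octal.
var_dump(010 === 8);      // bool(true): octal 10 is decimal 8
var_dump(01234 === 668);  // bool(true): matches the command-line example above

// Numeric strings are not treated as octal when converted to integers.
var_dump((int) '08' === 8);   // bool(true)
var_dump((int) '010' === 10); // bool(true)

// octdec() is the explicit way to read a string as octal.
var_dump(octdec('10') === 8); // bool(true)

// Keep the value as a plain integer and pad it only for display.
$day   = 8;
$label = sprintf('%02d', $day);
echo $label, "\n"; // 08
```

Keeping the day as a plain integer and padding only when printing avoids the problem entirely.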
