How can I detect time-like strings in a string - PHP

I would like to detect all time-like strings in a webpage and then use strtotime() in PHP to get Unix timestamps. Is there a way to detect time-like strings using PHP? I could use a regex for a particular page, but I am looking for something universal, or at least something that detects most of the possible time/date string formats. Thanks for reading this.
This is nice, but limited:
Matching a time string with a regular expression

Similar question here:
How to convert String to Date without knowing the format?
The consensus is that you need to know the incoming format. You could also try to match the incoming string against a discrete list of known formats first, in an attempt to determine the format. You hinted at this by mentioning regex in your question. Those are really the only two ways.

You could try looking at the underlying implementation of strtotime() itself, and see how that's done - might give you some ideas.

Related

Date string might be one of two formats; how should I handle it?

I am alright with the basics of the php date manipulation functions, but I still get confused from time to time, partially because I don't know all of the rules firmly.
Right now I am dealing with a problem where I have a function that is going to save some data in a database. One of the data items is a date. Normally it is pretty easy for me to figure out how to convert it to the right format for the MySQL statement. However in this case the string could be in one of two different formats:
m/d/Y
or
m/d/Y h:iA
I need to be able to convert either to 'Y-m-d', and I need to know WHICH of the two formats I received. Is there a straightforward way to do this? Something like:
if (is_format('m/d/Y', $date)){
...
}
Thanks for the help.
Use DateTime::createFromFormat for each of your two formats in turn. It returns false on failure so you will know which call succeeded, and you get date/time validation as a bonus.
After creating the DateTime object use format to turn it into your preferred representation.
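A minimal sketch of that approach. The helper name detectDateFormat() is made up for illustration, and the extra check of DateTime::getLastErrors() is an added assumption: it rejects values that createFromFormat() would otherwise silently roll over.

```php
<?php
// Hypothetical helper: tries each candidate format in turn and reports
// which one matched, together with the date as 'Y-m-d'.
function detectDateFormat(string $date, array $formats): ?array
{
    foreach ($formats as $format) {
        // The leading "!" resets fields missing from the format (such as
        // the time) to zero instead of filling them with the current time.
        $dt = DateTime::createFromFormat('!' . $format, $date);

        // getLastErrors() returns false (PHP 8.2+) or zero counts (older
        // versions) when parsing produced no warnings or errors.
        $errors = DateTime::getLastErrors();
        $clean = $errors === false
            || ($errors['warning_count'] === 0 && $errors['error_count'] === 0);

        if ($dt !== false && $clean) {
            return ['format' => $format, 'date' => $dt->format('Y-m-d')];
        }
    }
    return null; // none of the candidate formats matched
}

$formats = ['m/d/Y', 'm/d/Y h:iA'];
$result = detectDateFormat('10/08/2006 03:30PM', $formats);
```

Here $result['format'] tells you which of the two formats was received, and $result['date'] is ready for the MySQL statement.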

get date in string php

I am trying to get the date out of sentences in php.. so for example
I am trying to get 10/8/2006 out of
"This building was cleaned on the 8th of October 2006 after a huge storm."
There is a github function for it
https://github.com/etiennetremel/PHP-Find-Date-in-String but it fails in dates such as 1/5/2012.
I realize that, given the varied nature of date strings, finding a date inside arbitrary text is much tougher than writing a simple regex for a specific format, or simply running strtotime() on a given input string.
Does anyone have any good ideas?
Firstly, I would start by looking for a few basic patterns and extracting them with a few passes of regular expressions: mm/dd/yy and mm/dd/yyyy with \d{2}/\d{2}/\d{2,4}, then others, such as \d{1,2}(st|nd|rd|th)? (Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?), etc.
It will almost certainly be quicker to write a few regular expressions and do it in passes than write one massive one.
Then, pass the stuff you extracted in to strtotime to get yourself a usable unix timestamp and do with what you need from that.
Caveats:
I haven't tested the regexes, and there are obvious optimisations
Works on the assumption that your dates will always be USA style (mm/dd/yy, and not dd/mm/yy)
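A rough sketch of the multi-pass idea, under the same US-style assumption. The function name extractTimestamps() is hypothetical, the patterns are deliberately small, and the textual match is rebuilt as "day month year" before it goes to strtotime() so that filler such as "th of" does not have to be parsed.

```php
<?php
// Hypothetical helper: collects date-like substrings in two regex passes
// and converts each one with strtotime().
function extractTimestamps(string $text): array
{
    $months = 'Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|'
            . 'Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?';
    $candidates = [];

    // Pass 1: numeric dates such as 1/5/2012 (strtotime reads them as m/d/y).
    if (preg_match_all('~\b\d{1,2}/\d{1,2}/(?:\d{4}|\d{2})\b~', $text, $m)) {
        $candidates = array_merge($candidates, $m[0]);
    }

    // Pass 2: textual dates such as "8th of October 2006".
    $pattern = '~\b(\d{1,2})(?:st|nd|rd|th)?(?:\s+of)?\s+(' . $months . ')\s+(\d{4})\b~i';
    if (preg_match_all($pattern, $text, $m, PREG_SET_ORDER)) {
        foreach ($m as $match) {
            // Normalise to "8 October 2006" before handing it to strtotime().
            $candidates[] = "{$match[1]} {$match[2]} {$match[3]}";
        }
    }

    // Keep only the candidates strtotime() can actually convert.
    $timestamps = [];
    foreach ($candidates as $candidate) {
        $ts = strtotime($candidate);
        if ($ts !== false) {
            $timestamps[$candidate] = $ts;
        }
    }
    return $timestamps;
}

$found = extractTimestamps(
    'This building was cleaned on the 8th of October 2006 after a huge storm.'
);
```

The result maps each extracted candidate string to its Unix timestamp, so further passes for other formats can be added without touching the existing ones.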
I don't think there is a working solution, because 10/8/2010 does not tell you much: the 10 can be the day or the month. I think you can proceed with your regex :)
I think it will be simpler with regex.
The problem is that there are many ways a date can be written.

heuristic (fuzzy) date extraction from the string?

I have a problem to heuristically parse a string of text which contains a date but in a rather arbitrary (unknown) format.
function parseDateStr($text) {
    $cleanText = filter($text);
    # ...
    $day = findDay($cleanText);
    $month = findMonth($cleanText);
    $year = findYear($cleanText);
    # .. assert constraints, parse again or fail
    return sprintf('%04d-%02d-%02d', $year, $month, $day);
}
Input text is a sentence in English plus arbitrary syntax symbols (like a subset of the \W regexp class). The task of the algorithm is to extract the date only, after filtering away any garbage (noise) words unrelated to the date. The algorithm is allowed to fail and return no result. If only a pair of joined digits (MM) together with four other digits (YYYY) is found in the string, the two digits are assumed to be the month and the day is taken to be 01 (the first day of the month). The result is a date in "YYYY-MM-DD" (SQL) format (of type DATE).
My idea is to design a series of filters using preg_replace & co. Further, I would use logical constraints on the range of $year and $day, a vocabulary for $month, etc., but I would not be surprised if similar but more elegant solutions or approaches are conceivable or already exist. If so, please let me know about them. I would also appreciate any criticism or pointers to potential pitfalls.
Relation to similar questions:
Please note that this question is different from more basic date parsing questions such as:
PHP Parse Date String
How to parse any date format
since in my case I can not specify or determine the format of the string. On the other hand the following questions talk about similar tasks:
Extracting date from a string in Python
Extract multiple date format from few string variables in php
Extracting date from a string in PHP
I am not sure if the last one is a duplicate; it is not entirely clear to me what the OP wants to parse (although checkdate and date_parse seem to be partially useful). But the first question, about the whole "monkey business", also applies to my case and has been addressed by fuzzy parsing as in
dparser.parse("monkey 2010-07-10 love banana",fuzzy=True)
Finally, the second one contains a great grabbing regexp (almost "fuzzy").
PS: by elegant I mean that the code is rather compact (there are no significant limitations on performance, so using "hacky" regexps is OK).
timelib
Well, date_parse performs very, very well, and it was educational to learn why. The PHP function date_parse is part of ext/date/lib, or timelib, and apparently (despite the lack of proper documentation) its implementation in C (written by Derick Rethans and called from the Zend Engine macros part with declarations) makes it a clever tool:
date_parse is already fuzzy: there are a lot of warnings (and complaints) on the documentation page that the function tolerates and parses too much, but that is obviously a feature and not a bug (otherwise one should use date_parse_from_format or the corresponding DateTime::createFromFormat())
date_parse uses (a lot of) regular expressions in a relatively smart way (based on re2c)
In addition to filtering, this "scanner" looks for all possible combinations of words and date formats (from the list of known months and timezones) and, finally, makes a "blind" guess by looking for YYYY, MM and DD "separately" (very similar to what I need to do).
date_parse is a true compiled "scanner" that comes with look-ahead logic and error reporting that the user can handle further (no exceptions, just messages inside the nested array of results).
There is even a Python package wrapping the C code of timelib (so I am not even sure which is ultimately better at "parsing the monkey business": timelib or python-dateutil)
testing and examples
From my part, I have failed to find any input example from my dataset that was not parsed by date_parse, i.e.:
echo FuzzyDateParser::fromText('banana 1/2/3');
echo FuzzyDateParser::fromText('Joe Soap was born on 12 February 1981');
echo FuzzyDateParser::fromText('2005 Feb., reprint');
echo FuzzyDateParser::fromText('!'); # will fail to parse, producing an empty string.
echo FuzzyDateParser::fromText('monkey 2010-07-10 loves bananas and php');
The code for FuzzyDateParser class can be found in this gist. It can be useful as a template to handle errors and implement a fallback from date_parse results to own custom logic (which I eventually did not have to do for my case).
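For readers without access to the gist, the fallback idea might be sketched roughly as follows. This is not the FuzzyDateParser code: fuzzyDate() is a hypothetical stand-in that reads the 'year', 'month' and 'day' keys of the date_parse() result, defaults a missing day to 01 as described in the question, and validates the result with checkdate().

```php
<?php
// Hypothetical stand-in for the gist's class: wraps date_parse() and
// returns 'YYYY-MM-DD', or null when no usable date was found.
function fuzzyDate(string $text): ?string
{
    $parsed = date_parse($text);

    // date_parse() leaves a field as false when it could not find it.
    if ($parsed['year'] === false || $parsed['month'] === false) {
        return null; // the place to plug in custom fallback logic
    }

    // Assumption from the question: a missing day means the 1st of the month.
    $day = $parsed['day'] === false ? 1 : $parsed['day'];

    if (!checkdate($parsed['month'], $day, $parsed['year'])) {
        return null;
    }
    return sprintf('%04d-%02d-%02d', $parsed['year'], $parsed['month'], $day);
}

echo fuzzyDate('12 February 1981'), "\n";
```

The 'errors' and 'warnings' arrays of the date_parse() result are ignored here on purpose, since noisy input is expected to produce them; a stricter variant could inspect them before trusting the fields.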

Identify date format and convert to ISO date in php

Goal: Convert any local date to the according ISO date
My Approach: http://codepad.viper-7.com/XEmnst
strftime("%Y-%m-%d", strtotime($date));
Upside: Converts a lot of formats really well
Downside / Problem: Converts strings and numbers that are obviously not a date. E.g.
strftime("%Y-%m-%d",strtotime("A")) => 2012-10-29
strftime("%Y-%m-%d",strtotime("1")) => 1970-01-01
Questions:
Is there a better way to identify dates and convert them to ISO dates?
Do you know of any library / regex that is capable of doing so in PHP?
PHP's strtotime() function already does a best-effort attempt at taking an arbitrary string and working out what date format it is.
I dislike this function for a number of reasons, but it does do a reasonable job of working things out, given a string of unknown date format as input.
However, even strtotime()'s best efforts can never be enough, because arbitrary date formats are ambiguous.
There is no way to tell whether 05-06-07 is meant to be the 5th of June 2007 or the 6th of May 2007. Or even the 7th June 2005 (yes, some people do write dates like that).
Simple plain truth: It's impossible.
If you want your dates to be reliable in any meaningful way, you must abandon the idea that you'll be able to accept arbitrary input formats.
[EDIT]
You say in the comments that the input is coming from a variety of Excel and CSV files.
The only hope you have is if each of those files is consistent in itself. If you know that a file from a given source will have a given input format, you can write a custom wrapper for each file type that you import, and process it for that format. This is a solution I've used myself in the past, and it does work as long as you can predict the format for the file you're processing.
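As an illustration only, such a per-source wrapper could look like the sketch below. The source names and their formats are invented for the example; in practice the map would list whatever each supplier is known to send.

```php
<?php
// Assumed mapping from file source to its known date format.
// 'supplier_a' and 'supplier_b' are placeholders, not real sources.
const SOURCE_FORMATS = [
    'supplier_a' => 'd/m/Y',
    'supplier_b' => 'm-d-Y',
];

// Hypothetical helper: converts a raw cell value to an ISO date using the
// format registered for its source, or returns null if that is impossible.
function importDate(string $source, string $raw): ?string
{
    $format = SOURCE_FORMATS[$source] ?? null;
    if ($format === null) {
        return null; // unknown source: refuse to guess
    }
    // "!" zeroes the time fields instead of using the current time.
    $dt = DateTime::createFromFormat('!' . $format, trim($raw));
    return $dt === false ? null : $dt->format('Y-m-d');
}

echo importDate('supplier_a', '05/06/2012'), "\n"; // read as 5 June 2012
```

The same input text yields a different date per source, which is exactly why the format has to come from knowledge about the file rather than from the string itself.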
However, if individual files contain unpredictable or ambiguous dates, then you are out of luck: You have an impossible task. The only way you'll avoid having bad data is to kick back to the suppliers of the files and ask them to fix their data.
I think the problems will really arise when faced with dates such as 5-6-2012 when it is unclear whether you are dealing with 5th June, or 6th May and you could be taking input from European countries where DD MM YYYY is the norm.
If you are analyzing just one input field, then you might have a chance of detecting the delimiters and splitting the string up, looking for what might be a real date.
In this case the PHP function checkdate might come in handy as a last ditch double check.
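A small sketch of that double check, assuming a three-part numeric date with a four-digit year at the end. The function name plausibleOrders() is hypothetical. It splits on common delimiters and uses checkdate() to see which readings are valid calendar dates.

```php
<?php
// Hypothetical helper: returns every reading of a numeric date that
// checkdate() accepts, keyed by the assumed field order.
function plausibleOrders(string $input): array
{
    if (!preg_match('~^\s*(\d{1,2})[-/. ](\d{1,2})[-/. ](\d{4})\s*$~', $input, $m)) {
        return []; // not a three-part numeric date
    }
    $a = (int) $m[1];
    $b = (int) $m[2];
    $year = (int) $m[3];

    $orders = [];
    if (checkdate($a, $b, $year)) {      // first field read as the month
        $orders['m-d-Y'] = sprintf('%04d-%02d-%02d', $year, $a, $b);
    }
    if (checkdate($b, $a, $year)) {      // first field read as the day
        $orders['d-m-Y'] = sprintf('%04d-%02d-%02d', $year, $b, $a);
    }
    return $orders;
}

print_r(plausibleOrders('5-6-2012'));   // two readings: ambiguous
print_r(plausibleOrders('25-6-2012'));  // only d-m-Y is a valid date
```

When two readings come back, the input is genuinely ambiguous and checkdate() cannot settle it; only a value above 12 in one of the first two fields resolves the order.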
Be aware also that MySQL (if this is where the data is heading) is quite lenient about what it will put into a DATE field: the delimiters, the absence of leading zeros, etc. But you still have to get the Y M D order correct for it to have a chance.
I suppose the ultimate answer is to disallow free-text input for dates, but give them pickers - but of course you may not be in a position to influence the incoming date ...

datetime for mysql and PHP

I have a task to read datetimes from a CSV file with PHP and store them in a MySQL database. There are two datetime formats in the CSV file: the first is DD/MM/YYYY HH:mm:ss AM/PM, the second is MM-DD-YYYY HH:mm:ss AM/PM. Later, I need to select rows whose datetime falls within some period.
I am a little confused. I see two options:
It is easy to use a varchar column in the MySQL table to store them, but it is difficult to select rows later, since I would need to convert each string to a datetime first and then check whether it falls within a given period.
Another solution is to convert these datetimes from strings with PHP before storing them in the database. Then it is easy to select data later, but the conversion step is also a little complex.
I do not know if anyone has any good ideas about this question, or some experience with similar problems.
Firstly: never ever EVER store dates or date times in a database as strings.
NEVER.
Got that?
You should always convert them to the database's built-in date or datetime data types.
Failure to do this will bite you very very hard later on. For example, imagine trying to get the database to sort them in date order if they're saved as strings, especially if they're in varying formats. And if there's one thing that you can be sure of, when you've got a date in a database, you're going to need to query it based on entries on, after or before a given date. If you weren't going to need to do that sort of thing with them, there wouldn't be much point storing the date in the first place, so even if you haven't been asked to do it yet, consider it a given that it'll be asked for later. Therefore, always always ALWAYS store them in the correct data type and not as a varchar.
Next, the mixture of formats you've been asked to deal with.
This is insanity.
I loathe and detest PHP's strtotime() function. It is slow, has some unfortunate quirks, and should generally be considered a legacy of the past and not used. However, in this case, it may just come to your rescue.
strtotime() is designed to accept a date string in an unknown format, parse it, and output the correct timestamp. Obviously, it has to deal with the existence of both dd-mm-yyyy and mm-dd-yyyy formats. It does this by guessing which of the two you meant by looking at the separator character.
If the date string uses slashes as the separator, then it assumes the format is mm/dd/yyyy. If it uses dashes, then it assumes dd-mm-yyyy. This is one of those little quirks that makes using strtotime() such a pain in normal usage. But here it is your friend.
The way it works is actually the direct opposite of the formats you've specified in the question. But it should be enough to help you. If you switch the slashes and dashes in your input strings, and pass the result to strtotime() it should produce the correct timestamps in all cases, according to the way you've described it in the question.
It should then be simple enough to save them correctly in the database.
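A minimal sketch of the separator swap, assuming the input really is limited to the two formats from the question. The function name toMysqlDatetime() is made up for illustration.

```php
<?php
// Hypothetical helper: swaps "/" and "-" so that strtotime()'s separator
// heuristic lines up with the two formats described in the question.
function toMysqlDatetime(string $raw): ?string
{
    // strtr() with two equal-length strings swaps both characters at once:
    // DD/MM/YYYY becomes DD-MM-YYYY, and MM-DD-YYYY becomes MM/DD/YYYY.
    $swapped = strtr(trim($raw), '/-', '-/');

    $ts = strtotime($swapped);
    if ($ts === false) {
        return null; // unparseable: log it and reject the row
    }
    return date('Y-m-d H:i:s', $ts); // format accepted by a DATETIME column
}

echo toMysqlDatetime('25/12/2012 03:45:10 PM'), "\n";
echo toMysqlDatetime('12-25-2012 03:45:10 PM'), "\n";
```

Both calls are meant to produce the same DATETIME value. Rows for which the function returns null are good candidates for the "kick it back to the supplier" pile.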
However I would strongly recommend testing this very very thoroughly. And not being surprised if it breaks somewhere along the line. If you're being fed data in inconsistent formats, then there really isn't any way to guarantee that it'll be consistently inconsistent. Your program basically needs to just do the best it can with bad data.
You also need to raise some serious questions about the quality of the input data. No program can be expected to work reliably in this situation. Make it clear to whoever is supplying it that it isn't good enough. If the program breaks because of bad data, it's their fault, not yours.
