echo date('r',strtotime("16 Dec, 2010")); //Tue, 16 Dec 2008 20:10:00 +0530
echo date('r',strtotime("16 Dec 2010")); //Sat, 16 Jan 2010 00:00:00 +0530
That's just wrong... Either it should fail or it should parse correctly. Do you know of any robust natural-language date/time parser in PHP? How do you parse natural-language dates and times in PHP?
Edit:
var_dump(strtotime("16 Dec, abcd")); //bool(false)
"16 Dec, 2010" is either a valid GNU date input format or it is not. In the first case it should return the correct answer and in the second it should return false. This is what I mean by 'wrong'.
Edit:
The purpose is, as hop guessed, to accept a significant variety of user input.
If you know in what format the time is represented in the string, you can use strptime() together with the appropriate format string to parse it. It will at least report an error when it cannot interpret the string according to the format.
This function exists in PHP 5.1.0 and up.
If you want to take arbitrary user input, you should offer clear and obvious feedback to the user, so that she can do something about a falsely interpreted date. Most of the time, there won't be a problem anyway and you can't ever catch all problematic cases (think American vs. European format).
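The format-based approach can also be sketched with DateTime::createFromFormat() (PHP 5.3 and up) rather than strptime(). This is only a minimal sketch: parseStrict() is a hypothetical helper name, and the format string '!d M Y' is an assumption about what the input is supposed to look like. The point is that a mismatch or an out-of-range value is reported instead of being silently reinterpreted.

```php
<?php
// Hypothetical helper (not part of PHP): parse $input strictly against
// $format and return null instead of a silently "corrected" date.
function parseStrict($input, $format)
{
    $date = DateTime::createFromFormat($format, $input);
    // getLastErrors() returns an array, or false on PHP 8.2+ when there
    // were no warnings or errors at all.
    $errors = DateTime::getLastErrors();

    if ($date === false) {
        return null; // the input did not match the format
    }
    if (is_array($errors) && ($errors['warning_count'] > 0 || $errors['error_count'] > 0)) {
        return null; // e.g. a day of 32 would otherwise roll over into the next month
    }
    return $date;
}

// The "!" resets all fields that the format does not mention (here: the time).
$date = parseStrict('16 Dec 2010', '!d M Y');
echo $date ? $date->format('Y-m-d') : 'could not parse', "\n";

// Both of these are rejected, so the user can be asked to correct the input.
var_dump(parseStrict('16 Dec, 2010', '!d M Y')); // the comma is not in the format
var_dump(parseStrict('32 Dec 2010', '!d M Y'));  // day out of range
```

Returning null rather than throwing keeps the sketch short; in a form handler you would turn that null into the "clear and obvious feedback" mentioned above.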
It's not wrong, the data you're supplying is ambiguous - there is a world of difference.
Ambiguous data means the most you can reasonably expect from it is a 'best guess'. You might disagree with how it makes this best guess, but that's not 'wrong', that's just a different opinion on what is most likely. You can't expect any more than that without removing the ambiguity.
Further thoughts, mostly to hop's comments on the OP:
Silently failing is not an option - deciding whether or not to fail silently is subject to the same rules, and will be thrown by the same ambiguities.
Which of the example strings is wrong and should silently fail? What about the guy next to you? Does he think the same one is wrong? What if you remove the context by not comparing them side by side?
The only thing 'wrong' here is expecting a function to be able to decipher an exact meaning from data that will always be subject to ambiguity... and this is just those examples, I haven't even got to dates yet :) (1/2/08 is the first of Feb? or the 2nd of Jan? 1908? 2008? 8?)
Right, that said, I'm off to write a function called 'is_this_art'...
There is a Ruby class called Chronic that has the flexibility you need to handle convenient user input: http://chronic.rubyforge.org/
I'm sure you could just port it to PHP by replacing Ruby's Time with PHP's DateTime.
strtotime is the best function you could find for that. I doubt that an arbitrary string representation of a date will ever be interpreted 100% correctly, since it would require at least some information on the formatting used.
In other words: please define natural language. (You just used two different versions thereof in your question, as the PHP interpreter correctly pointed out.)
I'm not familiar with any, though maybe someone can offer an already-written one. In the meantime, I'd recommend running your date data through a regex or other munging before putting it through strtotime, and using a little sanity-checking on its output to see if the returned date falls in the accepted range.
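A rough sketch of that "munge, parse, then sanity-check" idea follows. The helper name parseInRange(), the choice to strip commas, and the accepted date range are all assumptions made for illustration; the munging step would need to be adapted to whatever input you actually receive.

```php
<?php
date_default_timezone_set('UTC'); // fixed zone so the example is reproducible

// Hypothetical helper: light munging, then strtotime(), then a range check.
// Returns the date as YYYY-MM-DD, or null if it is unparseable or implausible.
function parseInRange($input, $minDate, $maxDate)
{
    // Munging step (an assumption): commas carry no meaning in the input we
    // expect, so turn them into spaces and collapse runs of whitespace.
    $clean = trim(preg_replace('/[,\s]+/', ' ', $input));

    $timestamp = strtotime($clean);
    if ($timestamp === false) {
        return null; // strtotime() could not make sense of the string
    }

    // Sanity check: reject anything outside the range the application accepts.
    if ($timestamp < strtotime($minDate) || $timestamp > strtotime($maxDate)) {
        return null;
    }
    return date('Y-m-d', $timestamp);
}

var_dump(parseInRange('16 Dec, 2010', '2000-01-01', '2030-12-31'));
var_dump(parseInRange('16 Dec, abcd', '2000-01-01', '2030-12-31'));
var_dump(parseInRange('16 Dec 1800', '2000-01-01', '2030-12-31'));
```

The range check will not catch a wrong guess that happens to land inside the accepted window, so it complements user feedback rather than replacing it.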
Related
I am trying to get the date out of sentences in PHP. For example,
I am trying to get 10/8/2006 out of
"This building was cleaned on the 8th of October 2006 after a huge storm."
There is a github function for it
https://github.com/etiennetremel/PHP-Find-Date-in-String but it fails in dates such as 1/5/2012.
I realize that, given the varied nature of date strings, finding a date inside a string is much tougher than writing a simple regex for a specific format, or simply running strtotime() on a given input string.
Does anyone have any good ideas?
Firstly, I would start by looking for a few basic patterns and extracting them with a few passes of regular expressions (mm/dd/yy and mm/dd/yyyy with \d{2}/\d{2}/\d{2,4}, then look for others, like \d{1,2}(st|nd|rd|th)? (Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?), etc.)
It will almost certainly be quicker to write a few regular expressions and do it in passes than write one massive one.
Then, pass the stuff you extracted in to strtotime to get yourself a usable unix timestamp and do with what you need from that.
Caveats:
I haven't tried the regexes, there are obvious optimisations
Works on the assumption that your dates will always be USA style (mm/dd/yy, and not dd/mm/yy)
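The multi-pass idea above could look roughly like the following. This is a sketch under stated assumptions, not a tested production parser: extractDate() is a hypothetical name, only two passes are shown, numeric dates are assumed to be US style (mm/dd/yyyy) as in the caveat, and checkdate() is used instead of strtotime() so that the result does not depend on any guessing.

```php
<?php
// Hypothetical helper: find the first date in free text and return it as
// YYYY-MM-DD, or null if neither pass finds a calendar-valid date.
function extractDate($text)
{
    $months = array('jan', 'feb', 'mar', 'apr', 'may', 'jun',
                    'jul', 'aug', 'sep', 'oct', 'nov', 'dec');

    // Pass 1: day plus month name plus year, e.g. "8th of October 2006".
    $named = '/\b(\d{1,2})(?:st|nd|rd|th)?\s+(?:of\s+)?'
           . '(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?'
           . '|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)'
           . '\.?,?\s+(\d{4})\b/i';
    if (preg_match($named, $text, $m)) {
        // The first three letters of the month name identify the month.
        $month = array_search(strtolower(substr($m[2], 0, 3)), $months) + 1;
        if (checkdate($month, (int) $m[1], (int) $m[3])) {
            return sprintf('%04d-%02d-%02d', $m[3], $month, $m[1]);
        }
    }

    // Pass 2: numeric dates, ASSUMED to be US style mm/dd/yyyy.
    if (preg_match('#\b(\d{1,2})/(\d{1,2})/(\d{4})\b#', $text, $m)) {
        if (checkdate((int) $m[1], (int) $m[2], (int) $m[3])) {
            return sprintf('%04d-%02d-%02d', $m[3], $m[1], $m[2]);
        }
    }

    return null;
}

var_dump(extractDate('This building was cleaned on the 8th of October 2006 after a huge storm.'));
var_dump(extractDate('Invoice dated 1/5/2012'));
var_dump(extractDate('no date in here'));
```

Further passes (two-digit years, "October 8, 2006", and so on) can be added in the same style, ordered from most to least specific.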
I don't think there is a working solution, because 10/8/2010 does not tell you much: the 10 could be the day or the month. I think you can proceed with your regex :)
I think it will be simpler with regex.
The problem is that there are many ways in which a date can be written.
I have a problem to heuristically parse a string of text which contains a date but in a rather arbitrary (unknown) format.
function parseDateStr($text) {
$cleanText = filter($text);
# ...
$day = findDay($cleanText);
$month = findMonth($cleanText);
$year = findYear($cleanText);
# .. assert constraints, parse again or fail
return sprintf('%04d-%02d-%02d', $year, $month, $day);
}
The input text is an English sentence plus arbitrary syntax symbols (like a subset of the \W regexp class). The task of the algorithm is to extract only the date, after filtering away any garbage (noise) words unrelated to it. The algorithm is allowed to fail and return no result. If only a group of two adjacent digits (MM) and a group of four digits (YYYY) are found in the string, the two digits are assumed to be the month and the day is taken to be 01 (the first day of the month). The result is a date in "YYYY-MM-DD" (SQL) format (of type DATE).
My idea is to proceed with designing a series of filters using preg_replace & co. Further, use logical constraints on the range of $year, $day, use a vocabulary for $month, etc., but I would not be surprised if similar but more elegant solutions or approaches are thinkable or already exist. If so, please let me know about them. I would also appreciate it if any criticism or potential pitfalls could be pointed out.
Relation to similar questions:
Please note that the question is different from more basic date parsing questions as:
PHP Parse Date String
How to parse any date format
since in my case I can not specify or determine the format of the string. On the other hand the following questions talk about similar tasks:
Extracting date from a string in Python
Extract multiple date format from few string variables in php
Extracting date from a string in PHP
I am not sure if the last one is a duplicate; it is not ultimately clear to me what the OP wants to parse (although checkdate and date_parse seem to be partially useful). But the first question, on the whole "monkey business", is also true for my case and has been addressed by fuzzy parsing as in
dparser.parse("monkey 2010-07-10 love banana",fuzzy=True)
Finally, the second one contains a great grabbing regexp (almost "fuzzy").
PS: by elegant I mean that the code is rather compact (there are no significant limitations on performance, so using "hacky" regexps is OK).
timelib
Well, date_parse is performing very very well and it was very educational to learn why. PHP function date_parse is a part of ext/date/lib or timelib, and apparently (despite lack of proper documentation) its implementation in C (written by Derick Rethans and called from the Zend Engine macros part with declarations) makes it a clever tool:
date_parse is already fuzzy: there are a lot of warnings (and complaints) on the documentation page that the function tolerates and parses too much, but obviously that is actually a feature and not a bug (otherwise one should use date_parse_from_format or the respective DateTime::createFromFormat())
date_parse uses (a lot of) regular expressions in a relatively smart way (based on re2c)
In addition to filtering, this "scanner" looks for all possible combinations of words and date formats (from the list of known months and timezones) and, finally, just makes a "blind" guess by looking for YYYY, MM and DD "separately" (very similar to what I need to do).
date_parse is a true compiled "scanner" that comes with look-ahead logic and error reporting that can be handled further by user (no exceptions, just messages inside the nested array of results).
There is even a Python package wrapping the C code of timelib (so I am not even sure which is ultimately better at "parsing the monkey business", timelib or python-dateutil)
testing and examples
For my part, I have failed to find any input example from my dataset that was not parsed by date_parse, i.e.:
echo FuzzyDateParser::fromText('banana 1/2/3');
echo FuzzyDateParser::fromText('Joe Soap was born on 12 February 1981');
echo FuzzyDateParser::fromText('2005 Feb., reprint');
echo FuzzyDateParser::fromText('!'); # will fail to parse, producing an empty string.
echo FuzzyDateParser::fromText('monkey 2010-07-10 loves bananas and php');
The code for the FuzzyDateParser class can be found in this gist. It can be useful as a template to handle errors and implement a fallback from date_parse results to your own custom logic (which I eventually did not have to do for my case).
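For readers without access to the gist, a wrapper around date_parse might be sketched as follows. This is not the FuzzyDateParser class itself: fuzzyDate() is a hypothetical function written for illustration, and the decisions to ignore the parser's error list and to default a missing day to 01 are assumptions based on the requirements stated in the question.

```php
<?php
// Hypothetical function (NOT the gist's FuzzyDateParser): wrap date_parse()
// and return YYYY-MM-DD, or an empty string when no usable date was found.
function fuzzyDate($text)
{
    // date_parse() never throws; it returns the fields it recognised plus
    // 'errors' and 'warnings' arrays. Fields it did not find are false.
    $parts = date_parse($text);

    if ($parts['year'] === false || $parts['month'] === false) {
        return ''; // not enough information for a date
    }

    // Requirement from the question: a missing day defaults to the 1st.
    $day = ($parts['day'] === false) ? 1 : $parts['day'];

    // Errors caused by surrounding noise words are deliberately ignored,
    // but the resulting date must still exist in the calendar.
    if (!checkdate($parts['month'], $day, $parts['year'])) {
        return '';
    }
    return sprintf('%04d-%02d-%02d', $parts['year'], $parts['month'], $day);
}

echo fuzzyDate('12 February 1981'), "\n";
echo fuzzyDate('monkey 2010-07-10 loves bananas and php'), "\n";
echo fuzzyDate('!'), "\n"; // nothing recognisable, so an empty string
```

A custom fallback (for example, a regex pass) would slot in where the function currently returns an empty string.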
Goal: Convert any local date to the according ISO date
My Approach: http://codepad.viper-7.com/XEmnst
strftime("%Y-%m-%d",strtotime($date));
Upside: Converts a lot of formats really well
Downside / Problem: Converts strings and numbers that are obviously not a date. E.g.
strftime("%Y-%m-%d",strtotime("A")) => 2012-10-29
strftime("%Y-%m-%d",strtotime("1")) => 1970-01-01
Questions:
Is there a better way to identify and convert dates to ISO dates?
Do you know of any library / regex that is capable of doing so in PHP?
PHP's strtotime() function already does a best-effort attempt at taking an arbitrary string and working out what date format it is.
I dislike this function for a number of reasons, but it does do a reasonable job of working things out, given a string of unknown date format as input.
However, even strtotime()'s best efforts can never be enough, because arbitrary date formats are ambiguous.
There is no way to tell whether 05-06-07 is meant to be the 5th of June 2007 or the 6th of May 2007. Or even the 7th June 2005 (yes, some people do write dates like that).
Simple plain truth: It's impossible.
If you want your dates to be reliable in any meaningful way, you must abandon the idea that you'll be able to accept arbitrary input formats.
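To make the ambiguity concrete, here is a small sketch that lists every calendar-valid reading of a three-part numeric date. The function name interpretations() is hypothetical, and treating two-digit years as belonging to the 2000s is an assumption made purely for the example.

```php
<?php
// Hypothetical helper: return every calendar-valid reading of a numeric
// date with three parts, keyed by the field order that produced it.
function interpretations($input)
{
    $p = array_map('intval', preg_split('/\D+/', trim($input)));
    if (count($p) !== 3) {
        return array();
    }

    // Assumption for this sketch: two-digit years belong to the 2000s.
    $year = function ($y) {
        return $y < 100 ? 2000 + $y : $y;
    };

    // Each candidate is array(day, month, year).
    $orders = array(
        'day-month-year' => array($p[0], $p[1], $year($p[2])),
        'month-day-year' => array($p[1], $p[0], $year($p[2])),
        'year-month-day' => array($p[2], $p[1], $year($p[0])),
    );

    $valid = array();
    foreach ($orders as $label => $candidate) {
        list($d, $m, $y) = $candidate;
        if (checkdate($m, $d, $y)) {
            $valid[$label] = sprintf('%04d-%02d-%02d', $y, $m, $d);
        }
    }
    return $valid;
}

print_r(interpretations('05-06-07'));   // several readings survive
print_r(interpretations('25-03-1957')); // only one reading is a real date
```

When more than one reading survives, no parser can pick the right one without outside knowledge of the format, which is the point being made above.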
[EDIT]
You say in the comments that the input is coming from a variety of Excel and CSV files.
The only hope you have is if each of those files is consistent in itself. If you know that a file from a given source will have a given input format, you can write a custom wrapper for each file type that you import, and process it for that format. This is a solution I've used myself in the past, and it does work as long as you can predict the format for the file you're processing.
However, if individual files contain unpredictable or ambiguous dates, then you are out of luck: You have an impossible task. The only way you'll avoid having bad data is to kick back to the suppliers of the files and ask them to fix their data.
I think the problems will really arise when faced with dates such as 5-6-2012 when it is unclear whether you are dealing with 5th June, or 6th May and you could be taking input from European countries where DD MM YYYY is the norm.
If you are analyzing just one input field, then you might have a chance of detecting the delimiters and splitting the string up, looking for what might be a real date.
In this case the PHP function checkdate might come in handy as a last ditch double check.
Be aware also that MySQL (if this is where the data is heading) is quite lenient about what it will put into a DATE field: the delimiters, the absence of leading zeros, etc. But still, you have to get the Y M D order correct for it to have a chance.
I suppose the ultimate answer is to disallow free-text input for dates, but give them pickers - but of course you may not be in a position to influence the incoming date ...
I have a task to read datetimes from a CSV file with PHP and store them in a MySQL database. There are two datetime formats in the CSV file: the first is DD/MM/YYYY HH:mm:ss AM/PM, the second is MM-DD-YYYY HH:mm:ss AM/PM. Later, I need to select the rows whose datetime falls within some period.
It is a little confusing. I have a couple of thoughts:
It is easy to use a varchar column in the MySQL table to store them. But it is difficult to select rows later, since I would need to convert each string to a datetime first and then check whether it falls within a given period.
Another solution is to convert these datetimes from strings to datetime values in PHP before storing them in the database. Then it is easy to select data later. But the first step is also a little complex.
I do not know if someone has any good ideas about this question, or some experience with similar problems.
Firstly: never ever EVER store dates or date times in a database as strings.
NEVER.
Got that?
You should always convert them to the database's built-in date or datetime data types.
Failure to do this will bite you very very hard later on. For example, imagine trying to get the database to sort them in date order if they're saved as strings, especially if they're in varying formats. And if there's one thing that you can be sure of, when you've got a date in a database, you're going to need to query it based on entries on, after or before a given date. If you weren't going to need to do that sort of thing with them, there wouldn't be much point storing the date in the first place, so even if you haven't been asked to do it yet, consider it a given that it'll be asked for later. Therefore, always always ALWAYS store them in the correct data type and not as a varchar.
Next, the mixture of formats you've been asked to deal with.
This is insanity.
I loathe and detest PHP's strtotime() function. It is slow, has some unfortunate quirks, and should generally be considered a legacy of the past and not used. However, in this case, it may just come to your rescue.
strtotime() is designed to accept a date string in an unknown format, parse it, and output the correct timestamp. Obviously, it has to deal with the existence of both dd-mm-yyyy and mm-dd-yyyy formats. It does this by guessing which of the two you meant by looking at the separator character.
If the date string uses slashes as the separator, then it assumes the format is mm/dd/yyyy. If it uses dashes, then it assumes dd-mm-yyyy. This is one of those little quirks that makes using strtotime() such a pain in normal usage. But here it is your friend.
The way it works is actually the direct opposite of the formats you've specified in the question. But it should be enough to help you. If you switch the slashes and dashes in your input strings, and pass the result to strtotime() it should produce the correct timestamps in all cases, according to the way you've described it in the question.
It should then be simple enough to save them correctly in the database.
However I would strongly recommend testing this very very thoroughly. And not being surprised if it breaks somewhere along the line. If you're being fed data in inconsistent formats, then there really isn't any way to guarantee that it'll be consistently inconsistent. Your program basically needs to just do the best it can with bad data.
You also need to raise some serious questions about the quality of the input data. No program can be expected to work reliably in this situation. Make it clear to whoever is supplying it that it isn't good enough. If the program breaks because of bad data, it's their fault, not yours.
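Since the question states the two layouts exactly, an alternative to swapping separators for strtotime() is to pick an explicit format from the separator and parse with DateTime::createFromFormat(). This is a different technique from the one described above, shown only as a sketch: toMysqlDatetime() is a hypothetical helper, and it assumes the file really does contain only those two layouts.

```php
<?php
date_default_timezone_set('UTC'); // fixed zone so the example is reproducible

// Hypothetical helper for the two layouts described in the question.
// Returns a string suitable for a MySQL DATETIME column, or null.
function toMysqlDatetime($input)
{
    $input = trim($input);

    // Per the question: slashes mean DD/MM/YYYY, dashes mean MM-DD-YYYY.
    $format = (strpos($input, '/') !== false)
        ? 'd/m/Y h:i:s A'
        : 'm-d-Y h:i:s A';

    $date = DateTime::createFromFormat($format, $input);
    // getLastErrors() is an array, or false on PHP 8.2+ when all went well.
    $errors = DateTime::getLastErrors();

    if ($date === false) {
        return null; // the value did not match the expected layout
    }
    if (is_array($errors) && $errors['warning_count'] > 0) {
        return null; // e.g. month 25: the row is in the "wrong" layout
    }
    return $date->format('Y-m-d H:i:s');
}

var_dump(toMysqlDatetime('25/12/2012 10:15:30 PM'));
var_dump(toMysqlDatetime('12-25-2012 01:05:00 AM'));
var_dump(toMysqlDatetime('25-12-2012 10:15:30 PM')); // dashes but day first
```

Rows that come back as null are the ones worth logging and sending back to whoever supplies the data.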
strtotime("25/03/1957") returns false. What will satisfy all of these date formats? I can't imagine how long it would take to actually write my own parser, so I hope there's already one out there that you know of.
thanks!
Considering some dates are valid but can point to two different actual dates, no function will ever be able to "guess" the right format at all times...
To help with that, with PHP >= 5.3, a new function has been added : date_create_from_format -- but it doesn't exist with PHP < 5.3, unfortunately...
(See also DateTime::createFromFormat)
Still, in the example you took, the year 1957 is a possible source of problems : PHP generally works with UNIX Timestamps, when it comes to dates...
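And, at least on 32-bit systems, those can typically only represent dates between late 1901 and early 2038 -- as they count the number of seconds since 1970-01-01 in a signed 32-bit integer (and some older platforms do not support negative timestamps, i.e. dates before 1970, at all).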
To avoid this problem, it's often a good idea to use the DateTime class, with which (quoting) :
The date and time information is internally stored as an 64-bit number so all imaginable dates (including negative years) are supported. The range is from about 292 billion years in the past to the same in the future.
(It will not solve the parsing problems with PHP < 5.3 ; but it'll solve the date-range problem...)
I've found that DateTime objects support a wider range of formats than the strtotime() function, and the timezone settings of your server also make a difference; but I ended up building a function that replaces '/' with '-' before using the string-to-date methods. I also test whether the result is valid and, if it is not, try swapping the apparent dd and mm (25-03-2001 => 03-25-2001) before testing again.