We've got a collection of messy data that we're trying to unify.
Lots of services let you type dates in different formats and still understand them correctly, but I can't think what that process is called, how we could go about doing it in PHP, or whether a library already provides it.
We've inherited an old database with times and dates in it and are trying to clean it up a bit. Some of the formats look like:
9pm
9:00pm
25th march 2015
It's a complete mix and match. Does anybody know of any libraries or ways to parse these into a universal format?
The problem here is that the information you have is inconsistent! You need to normalize it in some way. It might actually be worth getting it into an Excel sheet and matching the date/time fields against some regexes to filter them. That's probably what I would do, so you can separate the different formats and tackle each one individually.
A program will first have to identify the format you're feeding it, and then it can spit out whatever format you want!
You can use PHP's strtotime() to parse each value into a timestamp, then date() to turn it into any format:
http://php.net/manual/en/function.date.php
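A minimal sketch of the "identify the format first, then convert" idea, using the three sample values from the question. The helper name normalize_value() and its return shape are made up for illustration, and the two buckets (time-only vs. anything else strtotime() can read) are an assumption about the data, not a complete solution:

```php
<?php
// Fixed timezone so date() output does not depend on server configuration.
date_default_timezone_set('UTC');

/**
 * Hypothetical helper: sort a raw value into a bucket, then normalize it.
 * Returns ['type' => 'time'|'date'|'unknown', 'value' => string|null].
 */
function normalize_value(string $raw): array
{
    $raw = trim($raw);
    $unknown = ['type' => 'unknown', 'value' => null];

    // Bucket 1: time-only values such as "9pm" or "9:00pm".
    if (preg_match('/^\d{1,2}(:\d{2})?\s*(am|pm)$/i', $raw) === 1) {
        $ts = strtotime($raw);
        return $ts === false ? $unknown : ['type' => 'time', 'value' => date('H:i:s', $ts)];
    }

    // Bucket 2: anything else strtotime() can read, treated as a date.
    $ts = strtotime($raw);
    if ($ts !== false) {
        return ['type' => 'date', 'value' => date('Y-m-d', $ts)];
    }

    // Everything else is left for manual review.
    return $unknown;
}

foreach (['9pm', '9:00pm', '25th march 2015'] as $raw) {
    $result = normalize_value($raw);
    echo $raw, ' => ', $result['type'], ' ', $result['value'], PHP_EOL;
}
```

Values that land in the "unknown" bucket are the ones worth eyeballing by hand, since strtotime() returns false rather than guessing.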
Related
I'm fairly new to PHP, and I'm trying to write a script that solves the following problem.
I have an RSS feed that gets saved to my server every 10 minutes (copied from elsewhere).
There is a problem with the timestamps (pubDate tag) on the RSS feed: they always have the correct date but 00:00:00 GMT as the time (I have no control over this).
Therefore, when I use an autotweeting program to tweet updates from the feed (it checks it every hour or so), it only tweets the first update of each day.
To fix this to some degree, I'm trying to check if the feed has changed and, if it has, change the saved pubDate to the current server time on only the new items.
I'm also kind of confused as to how I can have it check for changes. If I have a corrected version (with fairly accurate timestamps) saved to my server, it will ALWAYS think there are changes, because the timestamps in the source feed will always be 00:00:00. I'm thinking of checking both feeds for items containing full strings such as <guid isPermaLink="true">http://services.runescape.com/m=adventurers-log/a=161/display_player_profile.ws?searchName=A13d&id=-463827091</guid>. Since the id= at the end stays constant, it would only change the <pubDate> of items found to be new.
http://services.runescape.com/m=adventurers-log/a=161/rssfeed?searchName=A13d Here is a feed as an example. If anyone could get me started or point me to some kind of tutorial that might help, I'd really appreciate it. This is over my head, but something I need to learn how to do.
Maybe there is something wrong with the code that parses the timestamp? The date format, perhaps?
I believe doing full string comparisons (<title> and <description>) between items with the same <guid> is your best bet. Here is some reading about RSS duplicate detection if you are interested.
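A rough sketch of the guid-matching idea from the question: items whose guid is already in the saved copy keep their saved pubDate, and items with an unseen guid get the current time. The helper name stamp_new_items() and the guid => pubDate array shape are assumptions for illustration, and reading the guids out of the XML and writing them back is left out:

```php
<?php
/**
 * Hypothetical helper.
 * $savedPubDates maps guid => pubDate from the corrected copy on the server.
 * $fetchedGuids lists the guids found in the freshly downloaded feed.
 * $now is a Unix timestamp (pass time() in real use).
 * Returns guid => pubDate for every fetched item.
 */
function stamp_new_items(array $savedPubDates, array $fetchedGuids, int $now): array
{
    $result = [];
    foreach ($fetchedGuids as $guid) {
        // Known item: keep the timestamp we already assigned.
        // New item: stamp it with the current time in RSS date format (GMT).
        $result[$guid] = $savedPubDates[$guid] ?? gmdate(DATE_RSS, $now);
    }
    return $result;
}

// Example with made-up guids.
$saved   = ['http://example.com/item?id=1' => 'Sat, 28 Sep 2013 09:15:00 +0000'];
$fetched = ['http://example.com/item?id=1', 'http://example.com/item?id=2'];
print_r(stamp_new_items($saved, $fetched, time()));
```

Because the comparison only looks at the guid, the 00:00:00 timestamps in the source feed never count as a change.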
I'm trying to reduce the amount of work I need to do each day through PHP and am a total novice at writing it, especially when it comes to using patterns and finding matches.
What I need to do is take, for example, this file (master.txt):
bonsaimimarlik.com,9/28/2013 12:00:00 AM,AUC
imimarlik.com,9/28/2013 12:00:00 AM,AUC
bonsai.com,9/28/2013 12:00:00 AM,AUC
bonsaimimlik.com,9/28/2013 12:00:00 AM,AUC
Have it narrow down the results to those containing that day's date.
Then strip each one down to only the domain portion.
Then filter those results down to ones containing keywords I specify in an array.
Once processed, it needs to write everything to a file.
I'm really stuck on the preg_quote() and preg_match_all() portion of this. If you don't want to type out the code, I respect that. I want to learn, so I'd also appreciate anything good I could read on writing patterns. My weakest point is not being able to make sense of things like "/^." and "*\%/m" in some of the examples I see.
This will teach you all the basics of regular expressions. After that you just need to practice to solve your problem:
http://net.tutsplus.com/tutorials/php/regular-expressions-for-dummies-screencast-series/
Have fun learning :)
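For reference, here is one way the four steps from the question could look. It's only a sketch: filter_domains() is a made-up name, the date is assumed to always be in the n/j/Y form shown in master.txt, and the file handling is left as comments:

```php
<?php
/**
 * Hypothetical helper: keep the domains whose row is dated $date
 * (n/j/Y form, e.g. "9/28/2013") and whose name contains a keyword.
 */
function filter_domains(array $lines, string $date, array $keywords): array
{
    if ($keywords === []) {
        return []; // an empty pattern would match every domain
    }

    // preg_quote() escapes regex metacharacters so keywords match literally.
    $quoted  = array_map(fn($k) => preg_quote($k, '/'), $keywords);
    $pattern = '/' . implode('|', $quoted) . '/i';

    $matches = [];
    foreach ($lines as $line) {
        $fields = explode(',', trim($line));
        if (count($fields) < 2) {
            continue; // skip blank or malformed rows
        }
        [$domain, $stamp] = $fields;

        // "9/28/2013 12:00:00 AM" -> "9/28/2013"
        $day = explode(' ', $stamp)[0];

        if ($day === $date && preg_match($pattern, $domain) === 1) {
            $matches[] = $domain;
        }
    }
    return $matches;
}

// Possible usage (paths are placeholders):
// $lines = file('master.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
// $found = filter_domains($lines, date('n/j/Y'), ['bonsai']);
// file_put_contents('filtered.txt', implode(PHP_EOL, $found) . PHP_EOL);
```

A single preg_match() per domain is enough here because the question only asks whether a keyword occurs, not where or how often.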
I'm trying to make a chat history system. Every time one person says hi to another, they can also say something else. For each of those messages, I want to add what they wrote to a history HTML file, something like this:
James Said: Hi Richard, i saw that hardware you told me about, it is compatible with our software!.
At: 23 November 2011 - 23:09 UTC-08.
________________________________________________________________________________________
Richard Said: Nice!! let's start working with it this week, the project has to be finished before the end of the world.
At: 24 November 2011 - 09:23 UTC-08.
________________________________________________________________________________________
I can build the HTML file with PHP, but how do I save it directly to a MySQL BLOB, without first storing it in a directory?
Your approach to this problem isn't really a good one.
If you try to store the data in a particular output format then you're in real trouble if you suddenly find you need the data in a different format.
You're much better off just storing the particulars of the conversation, and then generating the output to display from the stored conversation. That way you can easily present it in all kinds of formats you might need it in.
EDIT TO ADD:
Something else I should have mentioned (but forgot thanks to all the Christmas brandy ;) ), trying to store the conversation data in a single big block of data will negate most of the advantages using a relational database can confer in the first place. You couldn't, for example, easily store the timestamp of each line of the conversation, or search the database for particular items in the conversation. You could find workarounds of course, but given databases are already designed to solve those kinds of problems anyway, you'd just be wasting effort and your solution wouldn't measure up to what the database already provides.
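To illustrate generating the output from the stored conversation, here is a small sketch. It assumes one database row per message with speaker, body and said_at columns (hypothetical names), already fetched into an array; render_history() is also a made-up helper:

```php
<?php
/**
 * Hypothetical row shape, e.g. one row per message from a `messages` table:
 * ['speaker' => string, 'body' => string, 'said_at' => string]
 */
function render_history(array $rows): string
{
    $html = '';
    foreach ($rows as $row) {
        // Escape everything on output so stored text cannot inject markup.
        $html .= sprintf(
            "<p>%s said: %s<br>At: %s</p>\n",
            htmlspecialchars($row['speaker']),
            htmlspecialchars($row['body']),
            htmlspecialchars($row['said_at'])
        );
    }
    return $html;
}

echo render_history([
    ['speaker' => 'James', 'body' => 'Hi Richard', 'said_at' => '2011-11-23 23:09'],
    ['speaker' => 'Richard', 'body' => 'Nice!!', 'said_at' => '2011-11-24 09:23'],
]);
```

The same rows could just as easily be rendered as plain text or JSON, which is the point of not storing the finished HTML.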
Since it is not really a binary (the B in Blob), but HTML, I suggest you use the MEDIUMTEXT type and deal with it as just a normal text field.
Quick embarrassing question.
I have been looking for a PHP function that would calculate the difference between two timestamps and output the result based on given parameters such as
the diff in years only, diff in months only, diff in days only, etc etc
The function I made has been quite buggy and I haven't found a good one on the Net.
Please assist.
Thanks
Please take a look at DateTime::diff()
DateTime::diff — Returns the difference between two DateTime objects
You can format the output to anything you want it to be
(I've provided an extra answer despite the duplicates because they use strtotime and math, which doesn't always work out well and isn't a particularly nice way to do it. Using a core function of PHP seems nicer to me.)
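A short sketch of how DateTime::diff() could cover the "years only / months only / days only" cases from the question. The wrapper diff_in() and its unit names are made up for illustration, and "months only" is taken to mean whole months elapsed:

```php
<?php
/**
 * Hypothetical wrapper around DateTime::diff().
 * $unit is 'years', 'months' or 'days'; the result is a whole number.
 */
function diff_in(string $from, string $to, string $unit): int
{
    $utc  = new DateTimeZone('UTC');
    $diff = (new DateTimeImmutable($from, $utc))->diff(new DateTimeImmutable($to, $utc));

    switch ($unit) {
        case 'years':
            return $diff->y;                  // whole years
        case 'months':
            return $diff->y * 12 + $diff->m;  // whole months
        case 'days':
            return $diff->days;               // total days between the dates
        default:
            throw new InvalidArgumentException("Unknown unit: $unit");
    }
}

echo diff_in('2010-01-15', '2012-03-20', 'years'), PHP_EOL;  // 2
echo diff_in('2010-01-15', '2012-03-20', 'months'), PHP_EOL; // 26
echo diff_in('2010-01-15', '2012-03-20', 'days'), PHP_EOL;   // 795
```

Note that the y, m and d properties are the components of one combined difference, while days is the total, so they answer different questions.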
I'm trying to get twitter statuses displaying on my blog, however I cannot get the time each status is created at to display the way in which I desire. Here is how it is being printed now:
Thu Aug 05 12:36:20 +0000 2010
However I would like it to be displayed like this:
54 days ago
How can I manage this with PHP preg_replace?
Also at the moment I am using the twitter API to get the statuses. Is it better to use this method or an RSS feed? I would appreciate if anyone could help me out. Thanks
I would really recommend writing it out to the page as "Aug 5 2010" (or however you want it to appear). That way you only need to write it out once ever, not once per day. But also write as a GMT timestamp in a way that JS can read it but people that have JS turned off cannot see it. Then, once you've got your page displaying things correctly, use a JS script to loop through the tags and replace the dates with the friendly text you want. Example:
<span class="dateToBeReplaced" title="Thu Aug 05 2010 12:36:20 GMT+0000">Aug 05 2010</span>
The JS would look something like this (uses jQuery): http://jsfiddle.net/JxTLt/4/
JS is a little more finicky about date formats than PHP, so you should pretty much stick to the format above. Use strtotime() to handle the formatting and time zone conversion.
You don't need to manually parse the date, it's already in an understandable format. If you run it through strtotime it will return a timestamp that you can work with.
The concept of displaying time the way you want is called "fuzzy time", and you can find a really good post on it here.
You won't be able to do this with preg_replace() alone. You need to do some temporal comparisons to create a human readable timestamp. Check out this post. Also keep in mind that Twitter responses will be GMT.
As far as the API vs. RSS, this is really up to you. Both responses have to be parsed. There is arguably more overhead with the API now that Twitter only supports OAuth, although there are several PHP libraries available. If you only want to display statuses, I'd go with the RSS.
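To make the temporal comparison concrete, here is a minimal sketch that turns the created_at string from the question into "N days ago" style text. fuzzy_time() is a made-up helper, it parses with DateTimeImmutable::createFromFormat() and an explicit format rather than strtotime() so the expected layout is spelled out, and it only goes up to days:

```php
<?php
/**
 * Hypothetical helper: convert a Twitter-style created_at string
 * (e.g. "Thu Aug 05 12:36:20 +0000 2010") into "N units ago".
 */
function fuzzy_time(string $createdAt, DateTimeImmutable $now): string
{
    $then = DateTimeImmutable::createFromFormat('D M d H:i:s O Y', $createdAt);
    if ($then === false) {
        return $createdAt; // unparseable: fall back to the raw string
    }

    // Timestamps are timezone-independent, so plain subtraction is safe.
    $seconds = max(0, $now->getTimestamp() - $then->getTimestamp());

    $units = ['day' => 86400, 'hour' => 3600, 'minute' => 60];
    foreach ($units as $name => $size) {
        if ($seconds >= $size) {
            $n = intdiv($seconds, $size);
            return $n . ' ' . $name . ($n === 1 ? '' : 's') . ' ago';
        }
    }
    return 'just now';
}

$now = new DateTimeImmutable('2010-09-28 12:36:20', new DateTimeZone('UTC'));
echo fuzzy_time('Thu Aug 05 12:36:20 +0000 2010', $now), PHP_EOL; // 54 days ago
```

In real use you would pass new DateTimeImmutable('now') as the second argument; it is a parameter here only so the example gives the same result every time.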