I'm using MediaWiki with the SMW extension, in a private setting, to organize a fictional universe my group is creating. There's some functionality I'd like, and I want to know whether an extension exists for it, or whether I could write my own.
Basically, in plain English: I want to know what's going on in the universe at a specific point in time (or over a span of time).
I'd like to let the user supply a date (as coarse as a year, or as precise as necessary) or a duration, and return every event whose duration overlaps it.
For example:
Event 1 starts ~3200 BCE and ends ~198 BCE
Event 2 starts ~509 BCE and ends ~405 CE
Event 3 starts 1/15/419 CE and ends 1/17/419 CE
Event 4 starts ~409 BCE and ends on 2/14/2021 CE
The user inputs a date (partial, in this instance): 309 BCE.
The wiki returns Events 1, 2, and 4, as the given date falls within the duration of each.
This would allow my creators to query a specific date (and hopefully a duration) and discover what events are already taking place, so they can adjust their works according to what is already established. It's a simple conflict checker.
If no existing extension can handle this, is there anything similar out there I can study? I've never dealt with dates in PHP; I'm a general business coder and have never built complex applications.
There is no built-in “duration” data type in SMW, so the easiest approach would probably be to use one date property for the starting date, and one for the ending date (note that dates must be entered as BC/AD, not BCE/CE or similar):
[[Event starts at::3200 BC]]
[[Event ends at::198 BC]]
Then you can query for every event that has a starting date before, and an ending date after, a given date:
{{#ask:[[Event starts at::<1000 BC]] [[Event ends at::>1000 BC]]}}
Note that > actually means “greater than or equal to” in SMW query syntax (and, likewise, < means “less than or equal to”).
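For a duration rather than a single date, the same idea should work with the two conditions set against the opposite ends of the queried range; assuming the property names above, something like this would return every event that overlaps the range 600 BC to 400 BC:
{{#ask:[[Event starts at::<400 BC]] [[Event ends at::>600 BC]]}}
(i.e. the event must start on or before the end of the queried range, and end on or after its start).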
I have a MySQL db of clients, and I crawled a website to retrieve all the reviews for the past few years. Now I am trying to match those reviews up with the clients so I can email them. The problem is that the review site allowed reviewers to enter anything they wanted for the name, so in some cases I have a full first name and last initial, and in some cases a first initial and full last name. The site also gives an approximate posting time such as "1 week ago" or "6 months ago", which we have already converted to an approximate date.
It seems the best way would be to do a fuzzy search on the names, and then, once I find all the "John B%" matches, look for the one with a job completion date nearest to when the review was posted, naturally eliminating anything posted before the job was completed.
I put together a small sample dataset where table1 is the clients, table2 is the review to match on here:
http://sqlfiddle.com/#!9/23928c/6/0
I was initially thinking of doing a date_diff, but then I need to sort by the lowest number. Before I tackle this on my own, I thought I would ask if anyone has any tricks they want to share.
I am using PHP/Laravel to query MySQL.
You can use DATEDIFF with absolute values:
ORDER BY ABS(DATEDIFF(`date`, $calculatedDate)) ASC
so that the records closest to your estimated date, whether before or after it, come first.
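For example, a rough sketch with Laravel's query builder (the table and column names here, such as clients and completed_at, are just placeholders for whatever your schema uses):
use Illuminate\Support\Facades\DB;

// Sketch: for one review, find the client job whose completion date is closest
// to the review's approximate posting date, ignoring jobs completed after it.
$match = DB::table('clients')
    ->where('name', 'like', 'John B%')                     // fuzzy name prefix
    ->where('completed_at', '<=', $reviewPostedAt)         // job finished before the review
    ->orderByRaw('ABS(DATEDIFF(completed_at, ?)) ASC', [$reviewPostedAt])
    ->first();                                             // closest match, or null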
I have a system that logs date:time and it returns results such as:
05.28.2013 11:58pm
05.27.2013 10:20pm
05.26.2013 09:47pm
05.25.2013 07:30pm
05.24.2013 06:24pm
05.23.2013 05:36pm
What I would like is a list of date:time predictions for the next few days, so a person could see when the next event might occur.
Example of prediction results:
06.01.2013 04:06pm
05.31.2013 03:29pm
05.30.2013 01:14pm
Thoughts on how to go about doing time prediction of this kind with PHP?
The basic answer is "no". Programming tools are not designed to do prediction. Statistical tools are designed for that purpose. You should be thinking more about R, SPSS, SAS, or some other similar tool. Some databases have rudimentary data analysis tools built-in, which is another (often inferior) option.
The standard statistical technique for time-series prediction is called ARIMA analysis (auto-regressive integrated moving average). It is unlikely that you are going to be implementing that in PHP/SQL. The standard statistical technique for estimating time between events is Poisson regression. It is also highly unlikely that you are going to be implementing that in PHP/SQL.
I observe that your data points are once per day in the evening. I might guess that this is the end of some process that runs during the day. The end time is based on the start time and the duration of the process.
What can you do? Often a reasonable prediction is "what happened yesterday". You would be surprised at how hard it is to beat this prediction for weather forecasting and for estimating the stock market. Another very reasonable method is the average of historical values.
If you know something about your process, then an average by day of the week can work well. You can also get more sophisticated, and do Monte Carlo estimates, by measuring the average and standard deviation, and then pulling a random value from a statistical distribution. However, the average value would work just as well in your case.
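To illustrate just the "average of historical values" idea (this is a naive sketch, not a real statistical model), a PHP version using the sample times from the question might look like:
// Naive sketch: project future event times from the average gap between past ones.
$observed = ['05.23.2013 05:36pm', '05.24.2013 06:24pm', '05.25.2013 07:30pm',
             '05.26.2013 09:47pm', '05.27.2013 10:20pm', '05.28.2013 11:58pm'];

// Convert to Unix timestamps (the format matches the log sample above).
$times = array_map(
    fn ($s) => DateTime::createFromFormat('m.d.Y h:ia', $s)->getTimestamp(),
    $observed
);
sort($times);

// Average interval between consecutive events.
$gaps = [];
for ($i = 1; $i < count($times); $i++) {
    $gaps[] = $times[$i] - $times[$i - 1];
}
$avgGap = array_sum($gaps) / count($gaps);

// Predict the next three occurrences by extending the average gap.
$last = end($times);
for ($i = 1; $i <= 3; $i++) {
    echo date('m.d.Y h:ia', (int) round($last + $i * $avgGap)), PHP_EOL;
}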
I would suggest that you study a bit about statistics/data mining/predictive analytics before attempting to do any "predictions". At the very least, if you really have a problem in this domain, you should be looking for the right tools to use.
As Gordon Linoff posted, the simple answer is "no", but you can write some code that will give a rough guess at what the next time will be.
I wrote a very basic example on how to do this on my site http://livinglion.com/2013/05/next-occurrence-in-datetime-sequence/
Here's a possible way that this could be done, using PHP + MySQL:
You can have a table with two fields: a DATE field and a TIME field (essentially storing the date + time portion separately). Say that the table is named "timeData" and the fields are:
eventDate: date
eventTime: time
Your primary key would be the combination of eventDate and eventTime, so that they're never repeated as a pair.
Then, you can do a query like:
SELECT eventTime, count(*) as counter FROM timeData GROUP BY eventTime ORDER BY counter DESC LIMIT 0, 10
The aforementioned query will always return the first 10 most frequent event times, ordered by frequency. You can then order these again from smallest to largest.
This way, you can return quite accurate time predictions, which will become even more accurate as you gather more data each day.
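On the PHP side, a rough sketch of using that query might look like this (PDO here; the $pdo connection and the timeData table are assumed from the description above):
// Sketch: take the most common event times and project them onto the next few days.
$sql = "SELECT eventTime, COUNT(*) AS counter
        FROM timeData
        GROUP BY eventTime
        ORDER BY counter DESC
        LIMIT 10";
$times = $pdo->query($sql)->fetchAll(PDO::FETCH_COLUMN);   // just the eventTime values
sort($times);                                              // smallest to largest, as above

foreach (['+1 day', '+2 days', '+3 days'] as $offset) {
    $day = date('m.d.Y', strtotime($offset));
    foreach ($times as $t) {
        echo "$day $t", PHP_EOL;                           // candidate predictions
    }
}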
I'm trying to merge multiple iCalendars together. I want to be able to merge overlapping events. So for example, if I have an event Monday at 12pm - 2pm and another event at 1pm - 3pm, I want to end up with an event that runs from 12pm until 3pm.
I'm looking for a simple open-source script that does that in PHP, or just help with the algorithm itself.
Any kind of help is appreciated!
Right -- sadly, I cannot help you with the PHP coding, as I know nothing of PHP (this also means my algorithmic help might just be way off). However, I know my way around algorithms quite well, so I'm going to come up with as many as possible. I'll give each its reasons for and against, you can take your pick, and hopefully we'll both learn something.
First off, a simplification -- note that when merging more than two iCalendars together, we can merge two, then merge our result with the next, and so on; meaning our algorithm only needs to handle merging two.
With that in mind, the conceptually simplest merge I can muster:
Given ICalendars A and B, we will merge them into a new ICalendar C
Initialize C
Pick & remove the earliest event from either A or B, adding it to C.
Do the same, this time "merging the events" if they overlap.
Lather, rinse, repeat until both A and B are empty -- C should now contain the merger of A and B.
Actually, this would be close to the best possible algorithm -- O(n) time, where n is the number of events per iCalendar (assuming each calendar's events are already sorted by start time); meaning no other methods will be forthcoming... sadly.
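A rough PHP sketch of that merge (assuming each event is simply an array with start/end Unix timestamps, and each input list is already sorted by start time) might look like:
// Sketch: merge two sorted event lists (each event is ['start' => int, 'end' => int]),
// combining overlapping events as they are moved into C.
function mergeCalendars(array $a, array $b): array
{
    $c = [];
    while ($a || $b) {
        // pick & remove the earliest remaining event from A or B
        if (!$b || ($a && $a[0]['start'] <= $b[0]['start'])) {
            $event = array_shift($a);
        } else {
            $event = array_shift($b);
        }
        $lastIndex = count($c) - 1;
        if ($lastIndex >= 0 && $event['start'] <= $c[$lastIndex]['end']) {
            // overlap: extend the previous event instead of adding a new one
            $c[$lastIndex]['end'] = max($c[$lastIndex]['end'], $event['end']);
        } else {
            $c[] = $event;
        }
    }
    return $c;
}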
This is what I ended up doing if anyone's interested. It may not be the most efficient, but it's good enough for what I'm doing.
1. Parse the calendars into Event objects (each object has start and end Unix timestamps); the Event class should also have a toString() method for exporting.
2. Store all the objects in an array, then sort it by start time (ascending).
3. Initialize an array for the final result; let's call it "final_array".
4. Take the first Event in the array as "A".
5. Start iterating through the array from the next Event; let's name it "B".
6. If B starts after A ends: add A to final_array and make B the new A.
7. If B starts before A ends:
   - If B ends before A ends: do nothing.
   - If B ends after A ends: change A's end time to B's end time.
8. Go back to step 5 if you haven't reached the end of the array; once you have, add the final A to final_array.
9. For each event in final_array, write the event to the new calendar file.
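In code, those steps come out to roughly this (a sketch only; it assumes an Event class with public $start and $end timestamps and at least one event in $events):
// Sketch of the steps above: sort by start time, then sweep, merging overlaps.
usort($events, fn (Event $x, Event $y) => $x->start <=> $y->start);

$final_array = [];
$a = array_shift($events);            // step 4: first event is "A"
foreach ($events as $b) {             // step 5: iterate over the rest as "B"
    if ($b->start > $a->end) {        // step 6: no overlap, keep A and move on
        $final_array[] = $a;
        $a = $b;
    } elseif ($b->end > $a->end) {    // step 7: overlap that extends past A
        $a->end = $b->end;
    }                                 // otherwise B sits inside A: do nothing
}
$final_array[] = $a;                  // step 8: don't lose the last open event

// step 9: write each event out via its toString() method
foreach ($final_array as $event) {
    echo $event->toString(), "\n";
}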
I have a database (MySQL) with a schedule for a bus. I want to be able to display the schedule based on some user inputs: route, day, and time. The bus makes at least 13 runs around the city per day. The structure is set up as:
- Select Route (2 different routes)
- Select Day (2 sets of days, Sun-Wed & Thu-Sat)
- Select Time (at least 13 runs per day) = Show Schedule
My table structure is:
p_id | route  | day | run# | stop  | time
1    | routeA | m-w | 1    | stop1 | 12:00PM
1    | routeA | m-w | 1    | stop2 | 12:10PM
...and so on
I do have a functioning demo; however, it is very inefficient: I query the db for every possible run, which I would like to avoid.
Could anyone give me some tips to make this more efficient? OR show me some examples?
If you google for "bus timetable schema design" you will find lots of similar questions and many different solutions depending on the specific use case. Here is one similar question asked here - bus timetable using SQL.
The first thing would be to normalise your data structure. There are many different approaches to this but a starting point would be something like -
routes(route_id, bus_no, route_name)
stops(stop_id, stop_name, lat/long, etc)
schedule(schedule_id, route_id, stop_id, arrive, depart)
You should do some searching to see the different use cases supported and how they relate to your specific scenario. The above is only a crude example; it can be broken down further depending on the data being used. You might, for instance, store only the time between stops in one table and a start time for each route in another.
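As an illustration only, querying a structure like that from PHP might look roughly like this (PDO, with $pdo an existing connection; the day_set and run_no columns are assumptions layered on top of the crude schema above):
// Sketch: fetch one route's stops for a given day set and run, in order.
$sql = "SELECT r.route_name, s.stop_name, sc.arrive, sc.depart
        FROM schedule sc
        JOIN routes r ON r.route_id = sc.route_id
        JOIN stops  s ON s.stop_id  = sc.stop_id
        WHERE sc.route_id = :route
          AND sc.day_set  = :day_set   -- assumed column: 'sun-wed' or 'thu-sat'
          AND sc.run_no   = :run       -- assumed column: which of the ~13 runs
        ORDER BY sc.arrive";
$stmt = $pdo->prepare($sql);
$stmt->execute([':route' => $routeId, ':day_set' => $daySet, ':run' => $runNo]);
$schedule = $stmt->fetchAll(PDO::FETCH_ASSOC);   // one row per stop for that run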
What's the best way to parse a multi-line log file that requires contextual knowledge from previous lines, in PHP and/or Python?
ex.
Date Time ID Call
1/1/10 00:00:00 1234 Start
1/1/10 00:00:01 1234 ServiceCall A Starts
1/1/10 00:00:05 1234 ServiceCall B Starts
1/1/10 00:00:06 1234 ServiceCall A Finishes
1/1/10 00:00:09 1234 ServiceCall B Finishes
1/1/10 00:00:10 1234 Stop
Each log line will have a unique id to bind it to a session but each consecutive set of lines is not guaranteed to be from the same session.
The ultimate goal is to find out how long each transaction took and how long each sub transaction took.
I'd love to use a library if one already exists.
I can think of two different ways of doing this.
1) You can use a finite state machine to process the file line by line. When you hit a Start line, mark the time. When you hit a Stop line with the same ID, diff the time and report.
2) Use PHP's Perl-Compatible Regular Expressions with the m modifier to match all the text from each start/stop line set, then just look at the first and last lines of each match string returned.
In both cases, I would verify the IDs match to prevent against matching different sets.
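A rough PHP sketch of the first approach, assuming lines shaped like the sample above (whitespace-separated date, time, ID, description) and a placeholder file name:
// Sketch of the state-machine idea: remember each session's Start time and
// report the elapsed time when the matching Stop line appears.
$open = [];                                      // session id => start timestamp

foreach (file('calls.log') as $line) {           // 'calls.log' is a placeholder path
    if (!preg_match('#^(\S+)\s+(\S+)\s+(\d+)\s+(.*)$#', trim($line), $m)) {
        continue;                                // skip the header and blank lines
    }
    [, $date, $time, $id, $what] = $m;
    $ts = DateTime::createFromFormat('n/j/y H:i:s', "$date $time")->getTimestamp();

    if ($what === 'Start') {
        $open[$id] = $ts;
    } elseif ($what === 'Stop' && isset($open[$id])) {
        echo "$id took ", $ts - $open[$id], " seconds\n";
        unset($open[$id]);
    }
    // ServiceCall A/B lines could be tracked the same way for sub-transactions.
}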
My first thought would be to create objects each time my parser encountered the start pattern with a new key. I'm assuming, from your example, that 1234 is a key such that all log lines which must be correlated together can be mapped to the state of one "thing" (object).
So when you see that pattern you start tracking one of these, and every time you see a log entry that relates to it, you call methods for the type of event (state change) that the subsequent lines represent.
From your example these "log state" objects (for lack of a more apropos term) might contain a list or dictionary (or other container) for each ServiceCall (which I would expect would be another class of objects).
So the overall design would be a parser/dispatcher that reads the log; if a log item relates to some existing object (key), the item is dispatched to that object, which can then create its own (ServiceCall or other) objects, dispatch events to those, raise exceptions, invoke callbacks, or call out to other functions as needed.
Presumably you also will need to have some collection or final disposition handler which could be called by your log objects when the Stop events are dispatched to them.
I'd guess you'd also want to support some sort of status-reporting method so that the application can enumerate all live (uncollected) objects in response to signals or commands on some other channel (perhaps from a non-blocking check performed by the parser/dispatcher).
Here is a variation on a log parser I wrote a while ago, tailored to your log format. (The general approach tracks pretty closely with Jim Dennis's description, although I used a defaultdict of lists to accumulate all the entries for any given session.)
from pyparsing import Suppress, Word, nums, restOfLine
from datetime import datetime
from collections import defaultdict

# sample log lines taken from the question; in practice, read these from the log file
log = """\
1/1/10 00:00:00 1234 Start
1/1/10 00:00:01 1234 ServiceCall A Starts
1/1/10 00:00:05 1234 ServiceCall B Starts
1/1/10 00:00:06 1234 ServiceCall A Finishes
1/1/10 00:00:09 1234 ServiceCall B Finishes
1/1/10 00:00:10 1234 Stop""".splitlines()

def convertToDateTime(tokens):
    # tokens arrive as month, day, 2-digit year, hour, minute, second
    month, day, year, hh, mm, ss = tokens
    return datetime(year + 2000, month, day, hh, mm, ss)

# define building blocks for parsing and processing log file entries
SLASH, COLON = map(Suppress, "/:")
integer = Word(nums).setParseAction(lambda t: int(t[0]))
date = integer + (SLASH + integer) * 2
time = integer + (COLON + integer) * 2
timestamp = date + time
timestamp.setParseAction(convertToDateTime)

# define format of a single line in the log file
logEntry = timestamp("timestamp") + integer("sessionid") + restOfLine("descr")

# summarize calls into a single data structure keyed by session id
calls = defaultdict(list)
for logline in log:
    entry = logEntry.parseString(logline)
    calls[entry.sessionid].append(entry)

# first pass to find start/end time for each call
for sessionid in sorted(calls):
    calldata = calls[sessionid]
    print(sessionid, calldata[-1].timestamp - calldata[0].timestamp)
For your data, this prints out:
1234 0:00:10
You can process each session's list of entries with a similar approach to tease apart the sub-transactions.