Date insertion in Cassandra DB: non-trivial 1h shift issue - php

I am a bit desperate about this problem; I have no idea how to approach it.
Here is a simpler way to look at this problem:
If my insert cql query is:
"BEGIN BATCH USING CONSISTENCY ONE insert into my_table(id,'2014-04-11 8:00:00',...,'2014-04-15 10:00:00') values ('2036548',3.15,...,4.11) APPLY BATCH"
...and my data request cql query is:
"Select FIRST 100000 '2014-04-01 0:00:00'..'2014-04-16 0:00:00' from my_table where id=2036548"
...why does the inserted date 2014-04-15 10:00:00 change to 2014-04-15 11:00:00 when pulling it from Cassandra?
The date-reading code in VB.NET is:
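Public Shared Function getCassandraDate(ByVal value As Byte()) As Date
    ' Cassandra serializes a timestamp as a big-endian 64-bit count of
    ' milliseconds since 1970-01-01 00:00:00 UTC
    Dim buffer As Byte() = New Byte(value.Length - 1) {}
    value.CopyTo(buffer, 0)
    ' BitConverter uses the machine's byte order, so reverse only on little-endian hosts
    If BitConverter.IsLittleEndian Then Array.Reverse(buffer)
    Dim ticks As Long = BitConverter.ToInt64(buffer, 0) ' milliseconds, not .NET ticks
    ' Mark the epoch as UTC so that ToLocalTime explicitly converts from UTC
    Dim dateTime As New System.DateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeKind.Utc)
    dateTime = dateTime.AddMilliseconds(ticks)
    Return dateTime.ToLocalTime()
End Function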
...same thing in PHP:
date_default_timezone_set("Europe/Paris");
$time = $this->unpackDate($packed_time);
$str_time = date('Y-m-d H:i:s',$time); //TODO : to local time
private function unpackDate($data, $is_name=null)
{
    $arr = unpack('N2', $data);
    // If we are on a 32bit architecture we have to explicitly deal with
    // 64-bit twos-complement arithmetic since PHP wants to treat all ints
    // as signed and any int over 2^31 - 1 as a float
    if (PHP_INT_SIZE == 4) {
        $hi = $arr[1];
        $lo = $arr[2];
        $isNeg = $hi < 0;
        // Check for a negative
        if ($isNeg) {
            $hi = ~$hi & (int)0xffffffff;
            $lo = ~$lo & (int)0xffffffff;
            if ($lo == (int)0xffffffff) {
                $hi++;
                $lo = 0;
            } else {
                $lo++;
            }
        }
        // Force 32bit words in excess of 2G to be positive - we deal with sign
        // explicitly below
        if ($hi & (int)0x80000000) {
            $hi &= (int)0x7fffffff;
            $hi += 0x80000000;
        }
        if ($lo & (int)0x80000000) {
            $lo &= (int)0x7fffffff;
            $lo += 0x80000000;
        }
        $value = $hi * 4294967296 + $lo;
        if ($isNeg)
            $value = 0 - $value;
    } else {
        // Upcast negatives in LSB bit
        if ($arr[2] & 0x80000000)
            $arr[2] = $arr[2] & 0xffffffff;
        // Check for a negative
        if ($arr[1] & 0x80000000) {
            $arr[1] = $arr[1] & 0xffffffff;
            $arr[1] = $arr[1] ^ 0xffffffff;
            $arr[2] = $arr[2] ^ 0xffffffff;
            $value = 0 - $arr[1]*4294967296 - $arr[2] - 1;
        } else {
            $value = $arr[1]*4294967296 + $arr[2];
        }
    }
    // The stored value is milliseconds since the epoch; return seconds
    return $value / 1e3;
}
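For cross-checking outside PHP and .NET, here is a minimal sketch of the same decoding both snippets above perform. It assumes, as both snippets do, that the raw column name is an 8-byte big-endian signed count of milliseconds since the Unix epoch. `decode_cassandra_timestamp` is a hypothetical helper name. It returns an explicit UTC datetime, so no local-time conversion can sneak in.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode_cassandra_timestamp(raw: bytes) -> datetime:
    """Decode 8 big-endian bytes (signed milliseconds since the epoch) into an aware UTC datetime."""
    if len(raw) != 8:
        raise ValueError("expected 8 bytes, got %d" % len(raw))
    (millis,) = struct.unpack(">q", raw)  # '>q' = big-endian signed 64-bit integer
    # timedelta arithmetic avoids float rounding and also handles pre-1970 values
    return EPOCH + timedelta(milliseconds=millis)

if __name__ == "__main__":
    # The value reported in Edit2: 1397577600 seconds, i.e. 1397577600000 ms on disk
    raw = struct.pack(">q", 1397577600 * 1000)
    print(decode_cassandra_timestamp(raw).isoformat())  # 2014-04-15T16:00:00+00:00
```

Converting the UTC result to Paris time is a separate, explicit step, for example `.astimezone(ZoneInfo("Europe/Paris"))` on Python 3.9+ with a tz database available. Keeping that step separate makes it easier to see whether the shift happens at write time or at read time.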
MORE DETAILS
Processing chain:
(1). insertion to Cassandra through .NET
(2). Cassandra data storage
(3). Pulling the data from PHP or .NET
Problem:
As of today, a date that is 2014-04-15 10:00:00 in step (1) comes out as 2014-04-15 11:00:00 in step (3).
Details:
(regarding the date format in this chain)
(1). Local time in .NET (Timezone: "Europe/Paris"). Insertion cql that is being executed: "BEGIN BATCH USING CONSISTENCY ONE insert into my_table(id,'2014-04-11 8:00:00',...,'2014-04-15 10:00:00') values ('2036548',3.15,...,4.11) APPLY BATCH"
(2). ??? I don't know what Cassandra does here... ???
(3). Example of cql query to pull the data: "Select FIRST 100000 '2014-04-01 0:00:00'..'2014-04-16 0:00:00' from my_table where id=2036548". In php: date_default_timezone_set("Europe/Paris"); $str_time = date('Y-m-d H:i:s',$time);. In .NET: dateTime.ToLocalTime.
Extra info:
I think it worked well before the daylight saving time change a few weeks ago, but I cannot be sure about that.
If in step (1) I convert the date to UTC before inserting it, 2014-04-15 10:00:00 becomes 2014-04-15 08:00:00 and the output is 2014-04-15 09:00:00, which is still not correct.
I strongly suspect that the trick lies between steps (1) and (2); in other words, I don't understand how Cassandra treats dates.
Edit1:
@Ananth's questions:
Do both Cassandra and the client run in the same datacenter?
It is complicated:
Insertion in .NET from server1, a different server from server-cassandra (datacenter).
PHP (to pull the data) running on server-cassandra.
.NET (to pull the data) running on server1, not on server-cassandra.
PHP and .NET pulling the same result.
Can you post your schema here?
Here it is:
CREATE TABLE tsmeasures (
id int PRIMARY KEY
) WITH
comment='' AND
comparator=timestamp AND
read_repair_chance=0.100000 AND
gc_grace_seconds=0 AND
default_validation=double AND
min_compaction_threshold=4 AND
max_compaction_threshold=32 AND
replicate_on_write='true' AND
compaction_strategy_class='SizeTieredCompactionStrategy' AND
compression_parameters:sstable_compression='SnappyCompressor';
Edit2:
After testing it step by step, this is the result:
real date : 2014-04-15 17:00:00 (localtime)
cql text : '2014-04-15 15:00:00' (to UTC, done through .NET)
PHP unpack of this date from Cassandra => $ticks = 1397577600 (unpacked with the unpackDate() code shown before)
Ticks converted (through http://www.epochconverter.com/ ):
GMT: Tue, 15 Apr 2014 16:00:00 GMT
Your time zone: 4/15/2014 6:00:00 PM GMT+2
These results make no sense to me...
More details:
cql insert:
"BEGIN BATCH USING CONSISTENCY ONE insert into tsmeasures(id,'2014-04-11 15:00:00',...,'2014-04-15 15:00:00') values ('2036548',0,...,4.85) APPLY BATCH"
cql fetch:
"SELECT '2014-04-10 16:00:00'..'2014-04-20 17:00:00' FROM tsmeasures WHERE id IN
(2036548,2036479,2036174,650877)"
Thus '2014-04-15 15:00:00' is included in the range of the fetch, and I can identify it because it is the highest value.
I will keep digging...

This seems to be a time zone issue. It appears you are neither specifying a timezone when storing nor when retrieving the timestamps. According to the documentation Cassandra applies the timezone of the coordinator node handling the write request if no timezone is supplied by the client. If timestamps shift between writing and reading them, that probably means all or some of your Cassandra nodes are not configured for the same timezone as your client is.
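One way to follow this advice is to stop sending naive local times altogether. Instead, convert on the client to UTC and write the offset into the literal. The sketch below assumes your Cassandra/CQL version accepts timestamp strings with a trailing RFC 822 offset such as `+0000` (the CQL timestamp format `yyyy-mm-dd HH:mm:ssZ`). Verify that against the documentation for the version you run. `cql_timestamp_literal` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

def cql_timestamp_literal(dt: datetime) -> str:
    """Format an aware datetime as a quoted CQL timestamp literal in UTC with an explicit +0000 offset."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # A naive datetime is exactly what lets the coordinator apply its own timezone
        raise ValueError("refusing to format a naive datetime: its timezone is ambiguous")
    utc = dt.astimezone(timezone.utc)
    return "'" + utc.strftime("%Y-%m-%d %H:%M:%S") + "+0000'"

if __name__ == "__main__":
    # 10:00 in Paris during summer time (UTC+2). A fixed offset keeps the sketch
    # free of tz database dependencies; ZoneInfo("Europe/Paris") would pick the
    # correct offset per date on Python 3.9+ with tzdata installed.
    paris_summer = timezone(timedelta(hours=2))
    local = datetime(2014, 4, 15, 10, 0, tzinfo=paris_summer)
    print(cql_timestamp_literal(local))  # '2014-04-15 08:00:00+0000'
```

Because the offset travels with the value, the timezone configured on the coordinator node no longer matters for the write.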

Before Edit
Is there a clock synchronization problem between your client and Cassandra? I would strongly recommend running NTP on both your client and your Cassandra installation.
Post Edit
From what you have given, it looks like you are trying to get the insertion time.
Your problem might be due to the clients' clocks being out of sync with Cassandra. Cassandra simply records a Unix timestamp for each write.
Here is what I think is happening:
You write from the client using timestamp X (the DataStax driver sets this insertion timestamp). Cassandra writes with X.
You read with a timestamp Y. Cassandra tries to read with timestamp Y (as per your explanation, the PHP client runs in a different location).
The two are bound to differ.
Solution 1
Run NTP across the entire setup so that the client clocks are in sync with Cassandra.
Solution 2
Insert a user-driven column named timestamp and do range scans based on it.
Solution 3
Set the insertion time explicitly in DML operations (for example with USING TIMESTAMP).
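For Solution 3, the client must supply the write timestamp itself. By convention, Cassandra write timestamps are microseconds since the Unix epoch. The sketch below computes that value exactly from an aware datetime. `write_timestamp_micros` and `insert_with_timestamp` are hypothetical helpers, and the table and columns are placeholders. The statement uses CQL3 `INSERT ... USING TIMESTAMP` syntax. In real code, prefer your driver's bound parameters over string formatting.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def write_timestamp_micros(dt: datetime) -> int:
    """Microseconds since the Unix epoch for an aware datetime, using exact integer arithmetic."""
    return (dt - EPOCH) // timedelta(microseconds=1)

def insert_with_timestamp(table: str, key: int, value: float, ts_micros: int) -> str:
    # Hypothetical helper for illustration only: builds a CQL3-style statement string.
    return f"INSERT INTO {table} (id, value) VALUES ({key}, {value}) USING TIMESTAMP {ts_micros}"

if __name__ == "__main__":
    ts = write_timestamp_micros(datetime(2014, 4, 15, 16, 0, tzinfo=timezone.utc))
    print(insert_with_timestamp("tsmeasures_demo", 2036548, 4.85, ts))
```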

Related

PHP MySql time calculations

I want to calculate how much time the worker has left since break start.
On the first click, PHP inserts a record with a timestamp (break_start) into the history table; on the second click, PHP updates that record with a timestamp (break_end).
Now I can calculate the time difference using this code:
$break_start = $worker->query('SELECT break_start FROM history WHERE id = 16')->fetchArray();
$break_end = $worker->query('SELECT break_end FROM history WHERE id = 16')->fetchArray();
$diff = $worker->query("SELECT TIMEDIFF('".$break_end['break_end']."','".$break_start['break_start']."') AS total")->fetchArray();
$break_total = $worker->query("SELECT break_time FROM worker WHERE id = 7")->fetchArray();
echo $diff['total']." ";
echo $break_total['break_time']." ";
$str1 = strtotime($diff['total']);
$str2 = strtotime($break_total['break_time']);
Output is: 00:31:26 00:30:00.00 86 01:26
(The code above is just an attempt to understand time handling in PHP and MySQL.)
I want to subtract the break time from the "break" value stored in the "worker" table (default value is 30).
I don't know how to do this.
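A minimal sketch of the subtraction, assuming both values come back as `HH:MM:SS` strings like the `00:31:26` and `00:30:00.00` printed above. `parse_hms`, `format_hms` and `remaining_break` are hypothetical helpers. A negative result means the break ran over the allowance.

```python
from datetime import timedelta

def parse_hms(text: str) -> timedelta:
    """Parse '[-]HH:MM:SS' or 'HH:MM:SS.ff' (as MySQL prints TIME values) into a timedelta."""
    sign = -1 if text.startswith("-") else 1
    hours, minutes, seconds = text.lstrip("-").split(":")
    return sign * timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))

def format_hms(delta: timedelta) -> str:
    """Format a timedelta as [-]HH:MM:SS, keeping the sign so an overdrawn break is visible."""
    total = int(delta.total_seconds())
    sign = "-" if total < 0 else ""
    total = abs(total)
    return f"{sign}{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"

def remaining_break(elapsed: str, allowance: str) -> str:
    """Allowance minus elapsed break time, e.g. ('00:10:00', '00:30:00') -> '00:20:00'."""
    return format_hms(parse_hms(allowance) - parse_hms(elapsed))

if __name__ == "__main__":
    print(remaining_break("00:31:26", "00:30:00.00"))  # -00:01:26
```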

2D PHP array, join values based on similar values

I have a PHP array which I use to draw a graph.
JSON format:
{"y":24.1,"x":"2017-12-04 11:21:25"},
{"y":24.1,"x":"2017-12-04 11:32:25"},
{"y":24.3,"x":"2017-12-04 11:33:30"},
{"y":24.1,"x":"2017-12-04 11:34:25"},
{"y":24.2,"x":"2017-12-04 11:35:35"},.........
{"y":26.2,"x":"2017-12-04 11:36:35"}, ->goes up for about a minute
{"y":26.3,"x":"2017-12-04 11:37:35"},.........
{"y":24.1,"x":"2017-12-04 11:38:25"},
{"y":24.3,"x":"2017-12-04 11:39:30"}
y is the temperature and x is the date time.
As you can see, the temperature doesn't change very often, and even when it does, it changes by at most 0.4. But sometimes, after a long period of similar values, it changes by more than 0.4.
I would like to join those similar values, so the graph would not have 200k similar values but only those that are "important".
I need advice on how to do this, or which algorithm would be best for creating the optimized array I want.
Ideal output:
{"y":24.1,"x":"2017-12-04 11:21:25"},.........
{"y":24.1,"x":"2017-12-04 11:34:25"},
{"y":24.2,"x":"2017-12-04 11:35:35"},.........
{"y":26.2,"x":"2017-12-04 11:36:35"}, ->goes up for about a minute
{"y":26.3,"x":"2017-12-04 11:37:35"},.........
{"y":24.1,"x":"2017-12-04 11:38:25"}
Any help?
As you specified php I'm going to assume you can handle this on the output side.
Basically, you want logic like "if the temperature differs from the last recorded temperature by more than some amount, or the time is more than x minutes after the last recorded time, then output a point on the graph". If that's the case, you can get the result as follows:
$temps = array(); //your data in the question
$temp = 0;
$time = 0;
$time_max = 120; //two minutes
$temp_important = .4; //max you'll tolerate
$output = [];
foreach($temps as $point){
    if(strtotime($point['x']) - $time > $time_max || abs($point['y'] - $temp) >= $temp_important){
        // add it to output
        $output[] = $point;
    }
    //update our data points
    if(strtotime($point['x']) - $time > $time_max){
        $time = strtotime($point['x']);
    }
    if(abs($point['y'] - $temp) >= $temp_important){
        $temp = $point['y'];
    }
}
// and out we go..
echo json_encode($output);
That's not exactly what you're asking for, though: if the temperature spiked in a short time and then went down immediately, you'd need to change your logic. Think of it in terms of requirements.
If you're RECEIVING data on the output side, I'd write something in JavaScript that keeps or drops these points using the same logic. You might need to buffer 2-3 points to make your decision. This logic performs an important task, so you'd want to encapsulate it and make sure you can specify the parameters easily.
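Here is the same threshold idea as a small Python sketch, using the sample points from the question. It is a variant, not a literal port. The PHP above updates its time and temperature anchors independently; this version resets both anchors whenever a point is kept, which makes the rule easier to reason about. `decimate`, `threshold` and `max_gap_seconds` are hypothetical names.

```python
from datetime import datetime

def decimate(points, threshold=0.4, max_gap_seconds=120):
    """Keep a point if it is the first one, if its y differs from the last *kept*
    y by at least `threshold`, or if more than `max_gap_seconds` have passed
    since the last kept point. Both anchors reset whenever a point is kept."""
    kept = []
    last_y = last_t = None
    for p in points:
        t = datetime.strptime(p["x"], "%Y-%m-%d %H:%M:%S")
        if (last_t is None
                or abs(p["y"] - last_y) >= threshold
                or (t - last_t).total_seconds() > max_gap_seconds):
            kept.append(p)
            last_y, last_t = p["y"], t
    return kept

SAMPLE = [
    {"y": 24.1, "x": "2017-12-04 11:21:25"},
    {"y": 24.1, "x": "2017-12-04 11:32:25"},
    {"y": 24.3, "x": "2017-12-04 11:33:30"},
    {"y": 24.1, "x": "2017-12-04 11:34:25"},
    {"y": 24.2, "x": "2017-12-04 11:35:35"},
    {"y": 26.2, "x": "2017-12-04 11:36:35"},
    {"y": 26.3, "x": "2017-12-04 11:37:35"},
    {"y": 24.1, "x": "2017-12-04 11:38:25"},
    {"y": 24.3, "x": "2017-12-04 11:39:30"},
]

if __name__ == "__main__":
    for p in decimate(SAMPLE):
        print(p["x"], p["y"])
```

With the defaults, this keeps the points at 11:21:25, 11:32:25, 11:35:35, 11:36:35 and 11:38:25. That is close to, but not identical with, the "ideal output", so the two parameters would need tuning against the real requirements.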

How to efficiently filter datetime column for extracting data?

I am using SQLite to log data every 5 minutes to a table with a column that is timestamped with an integer Unix time. The user interface uses PHP code to extract data in various user-selectable time frames, which is then plotted using JavaScript. Charts typically have 12 date/time points, and I need to extract data for plotting over different periods, say 1Hr/12Hr/24Hr/12days/month/year. So I only need to extract 12 data rows per search. For a 24Hr plot I only need to extract data at hourly intervals (when minutes = 0); similarly for 12-day plots at daily intervals (when mins=0 && hours=0), etc.
My PHP code for 1Hr works fine, since the data is logged every 5 minutes, giving me 12 rows of data between the search start time and end time. What is an efficient way of extracting data for the longer periods, when the number of rows between start time and end time is greater than 12? How can I further filter the search to efficiently extract only the data I need?
Any suggestions are most appreciated, Frank
$db = new MyDB(); // open database
$t = time(); // get current time
$q1 = "SELECT TimeStamp,myData FROM mdata WHERE ";
$q2 = "TimeStamp <= ".$t; // end time
$q3 = " AND TimeStamp >= ".($t-3600); // start time 1 hour earlier
$qer = $q1.$q2.$q3; // build the search query from the parts above
$result = $db->query($qer);
$json = array();
while ($data = $result->fetchArray(SQLITE_NUM)) {
    $json[] = $data;
}
echo json_encode($json); // data is returned as json array
$db->close(); // close database connection
I think you should use WHERE date BETWEEN in your search query?
This kind of search could take up a lot of time once data builds up?
Since you already know the exact times you're interested in, you should probably just build an array of times and use SQL's IN operator:
$db = new MyDB(); // open database
$timeStep = 300; // Time step to use, 5 minutes here - this would be 3600 for hourly data
$t = time(); // get current time
$t -= $t % $timeStep; // round down to the most recent interval boundary
$query = "SELECT TimeStamp,myData FROM mdata ";
$query .= "WHERE TimeStamp IN ";
// 12 timestamps ending at $t (range() includes both endpoints)
$query .= "(" . implode(",", range($t - $timeStep * 11, $t, $timeStep)) . ")";
$result = $db->query($query);
$json = array();
while ($data = $result->fetchArray(SQLITE_NUM)) {
    $json[] = $data;
}
You'll need to do some different math for monthly data - try constructing 12 times with PHP's mktime() function.
Here are the references for the PHP implode() and range() functions I used.
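The same IN-list idea can be sketched with Python's built-in `sqlite3` module, using bound `?` placeholders instead of string concatenation. It assumes the `mdata` table from the question, and that the logger writes rows at exactly aligned timestamps (multiples of 300 s). `sample_times` and `fetch_points` are hypothetical names.

```python
import sqlite3

def sample_times(now, step, count=12):
    """`count` timestamps spaced `step` seconds apart, ending at the last multiple of `step` <= now."""
    end = now - now % step
    return [end - step * i for i in range(count - 1, -1, -1)]

def fetch_points(conn, now, step, count=12):
    times = sample_times(now, step, count)
    placeholders = ",".join("?" * len(times))  # "?,?,...,?", one per timestamp
    sql = ("SELECT TimeStamp, myData FROM mdata "
           f"WHERE TimeStamp IN ({placeholders}) ORDER BY TimeStamp")
    return conn.execute(sql, times).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mdata (TimeStamp INTEGER PRIMARY KEY, myData REAL)")
    # Fake log: one row every 5 minutes for two days, starting at t = 0
    conn.executemany("INSERT INTO mdata VALUES (?, ?)",
                     [(t, t / 300.0) for t in range(0, 2 * 86400, 300)])
    rows = fetch_points(conn, now=86400 + 1234, step=3600)  # hourly, 12 points
    print(len(rows), rows[0][0], rows[-1][0])  # 12 46800 86400
```

If the logger's timestamps drift by a few seconds, exact IN matching will miss rows. In that case, a small BETWEEN window around each sample time is the more tolerant choice.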

List dates in months from database

Hey, I need some help listing the dates I've added to the database, split by the month they were added in.
I have no clue how to do it... So can someone please show me examples, or maybe some tutorials on how to do it?
Thx
Something along the lines of this, perhaps?
SELECT * FROM table GROUP BY MONTH(dateColumn)
SELECT * FROM table WHERE MONTH(dateColumn) = 9
A must-read reference for date & time handling functions in MySQL is:
http://dev.mysql.com/doc/refman/5.1/en/date-and-time-functions.html
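Note that `MONTH()` returns only 1-12, so `GROUP BY MONTH(dateColumn)` puts September 2013 and September 2014 in the same group. Grouping on year and month together avoids that. The minimal client-side sketch below assumes the dates have already been fetched into Python `datetime` objects; `group_by_month` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime

def group_by_month(dates):
    """Group datetimes by (year, month); keys come out in chronological order."""
    groups = defaultdict(list)
    for d in sorted(dates):  # sorting first makes dict insertion order chronological
        groups[(d.year, d.month)].append(d)
    return dict(groups)

if __name__ == "__main__":
    added = [datetime(2014, 9, 3), datetime(2013, 9, 20), datetime(2014, 9, 1), datetime(2014, 10, 5)]
    for (year, month), items in group_by_month(added).items():
        print(f"{year}-{month:02d}:", [d.date().isoformat() for d in items])
```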
Are you looking for the MySQL MONTH() Function?
Query: SELECT MONTH(NOW());
Output: 11
Not entirely sure what you mean, but here goes...
The sample below creates a test collection (presumably your collection retrieved from the database), groups the items by month and year, and then displays the result. It uses LINQ and anonymous objects, which you could easily replace with some POCO classes...
Sub Main()
    Dim ls As New List(Of Object)
    Dim lsGroup As New List(Of Object)
    Dim ran As New Random(Now.Millisecond)
    '' build a sample collection
    For x As Integer = 1 To 100
        ls.Add(New With {.ID = x, .DateAdded = Now.AddMinutes(-(ran.Next(1, 100000)))})
    Next
    '' now group them into years and months
    For Each item In ls
        Dim currentItem As Object = lsGroup.Where(Function(o) o.Year = item.DateAdded.Year And o.Month = item.DateAdded.Month).SingleOrDefault()
        If currentItem Is Nothing Then
            '' create
            Dim var = New With {.Year = item.DateAdded.Year, .Month = item.DateAdded.Month, .ItemCollection = New List(Of Object)}
            var.ItemCollection.Add(item)
            lsGroup.Add(var)
        Else
            '' add
            currentItem.ItemCollection.Add(item)
        End If
    Next
    '' display the results
    For Each group In lsGroup
        Console.WriteLine(group.Year & " - " & MonthName(group.Month))
        For Each item In group.ItemCollection
            Console.WriteLine(" > " & item.ID & " - " & item.DateAdded.ToString())
        Next
        Console.WriteLine()
    Next
    Console.ReadLine()
End Sub
Here's what I do when I need the month of a timestamp or date item called "t".
TIMESTAMP(DATE_FORMAT(t,'%Y-%m-01'))
This returns another timestamp that represents midnight on the first day of that month.
Works for weeks too.
TIMESTAMP(FROM_DAYS(TO_DAYS(t) -MOD(TO_DAYS(t) -1, 7)))
This obscure incantation returns a timestamp that represents midnight on the Sunday preceding the given timestamp.
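If the bucketing has to happen in application code instead, the same two ideas are short in Python: midnight on the first of the month, and midnight on the Sunday on or before a given time. This is a sketch with hypothetical helper names. It does not reproduce the MySQL expressions above term by term.

```python
from datetime import datetime, timedelta

def month_start(t: datetime) -> datetime:
    """Midnight on the first day of t's month."""
    return t.replace(day=1, hour=0, minute=0, second=0, microsecond=0)

def week_start_sunday(t: datetime) -> datetime:
    """Midnight on the Sunday on or before t."""
    days_since_sunday = (t.weekday() + 1) % 7  # weekday(): Monday=0 ... Sunday=6
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=days_since_sunday)

if __name__ == "__main__":
    t = datetime(2014, 4, 15, 10, 30)  # a Tuesday
    print(month_start(t), week_start_sunday(t))  # 2014-04-01 00:00:00 2014-04-13 00:00:00
```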

Sort tinytext time from mysql db query with sql or sort the array it produces using PHP

Hey guys - The problem stems from a poorly designed database used to store real estate information. I set up a template for my client to select a weekend and to display the open houses for that weekend. Open house times (ohtime1, ohtime2, ohtime3) are stored as tinytext, with no way of knowing AM or PM. "12:00 - 2:00" and "01:00 - 03:00" are common entries that we humans discern as noon-2pm and 1pm-3pm, however when I query the database and ORDER BY ohtime1, it obviously puts 01:00 before 12:00. I am having difficulty sorting using SQL and using the different php sort methods. The initial listings array with all the open house information is set up like something as follows:
$listings[0][displayaddress] = empire state building
$listings[0][baths] = too many to count
$listings[0][ohtime1] = 12:00 - 02:00
$listings[1][displayaddress] = madison square garden
$listings[1][baths] = 2
$listings[1][ohtime1] = 01:00 - 03:00
etc...
I iterate through $listings with foreach($listings as $listing) to process for the smarty templates we use, as well as to separate into the different days, and then again for manhattan and brooklyn listings. This results in 4 new arrays. My theory was if I convert all the times before 09:00am to 24 hour time, then sort them, then assign to the different day/borough it would work. Here is the converting code:
$p = explode("-", $listing['ohtime1']); //01:00 - 03:00
$time1 = trim($p[0]); //01:00
$time2 = trim($p[1]); //03:00
$hour1 = substr($time1,0,2); //01
$hour2 = substr($time2,0,2); //03
$min1 = explode(":",$time1);
$min2 = explode(":",$time2);
$min1 = $min1[1]; //00
$min2 = $min2[1]; //00
//convert all times to 24 hour
if($hour1 < 9) $hour1 = $hour1+12; //13
if($hour2 < 9) $hour2 = $hour2+12; //15
$listing['ohtime1'] = $hour1.":".$min1." - ".$hour2.":".$min2; //13:00 - 15:00
$listing['hour1'] = $hour1;
$listing['hour2'] = $hour2;
Converting wasn't difficult, but am at a loss as to how to sort them. I am not versed in advanced SQL theory as to implement the conversion to 24hrs I did in php into mysql. I was also thinking I could implement a sorting feature when I create the new arrays but I am again at a loss. Here is the code for separating into the new arrays:
foreach($openhouse_date_fields as $oh){ //3 possible open house dates
    if(substr($listing[$oh], 0, 10) == $date) { //if any of the listing's open houses match the first search date
        if($listing['sect'] == "Brooklyn")
            $listingsb[$listing['displayaddress']] = $listing;
        else
            $listingsm[$listing['displayaddress']] = $listing;
    }
    elseif(substr($listing[$oh], 0, 10) == $date2) { //if any of the listing's open houses match the second search date
        if($listing['sect'] == "Brooklyn")
            $listingsb2[$listing['displayaddress']] = $listing;
        else
            $listingsm2[$listing['displayaddress']] = $listing;
    }
}
I hope that is enough information. Thanks for taking the time to read and for any feedback!
Here's an example of converting one of the TINYTEXT columns to a pair of columns of type TIME:
SELECT
    MAKETIME(start_hour + IF(start_hour < 9, 12, 0), start_minute, 0) AS start_time,
    MAKETIME(finish_hour + IF(finish_hour < 9, 12, 0), finish_minute, 0) AS finish_time
FROM (
    SELECT
        SUBSTRING_INDEX(SUBSTRING_INDEX(ohtime1, ' - ', 1), ':', 1) AS start_hour,
        SUBSTRING_INDEX(SUBSTRING_INDEX(ohtime1, ' - ', 1), ':', -1) AS start_minute,
        SUBSTRING_INDEX(SUBSTRING_INDEX(ohtime1, ' - ', -1), ':', 1) AS finish_hour,
        SUBSTRING_INDEX(SUBSTRING_INDEX(ohtime1, ' - ', -1), ':', -1) AS finish_minute
    FROM MyOpenHouseTable) t
ORDER BY start_time;
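If the sorting stays on the application side instead, the question's own heuristic (hours below 9 are PM) can become a sort key. The sketch below assumes listings shaped like the `$listings` rows in the question, as Python dicts. `to_24h_minutes` and `open_house_key` are hypothetical helpers, and the 9 o'clock cut-off is the question's assumption, not a general rule.

```python
def to_24h_minutes(hhmm: str) -> int:
    """'01:00' -> minutes after midnight, treating hours below 9 as PM (the question's heuristic)."""
    hours, minutes = (int(part) for part in hhmm.strip().split(":"))
    if hours < 9:
        hours += 12
    return hours * 60 + minutes

def open_house_key(ohtime: str) -> tuple:
    """Sort key for strings like '12:00 - 02:00': (start, finish) in minutes after midnight."""
    start, finish = ohtime.split("-")
    return (to_24h_minutes(start), to_24h_minutes(finish))

listings = [
    {"displayaddress": "madison square garden", "ohtime1": "01:00 - 03:00"},
    {"displayaddress": "empire state building", "ohtime1": "12:00 - 02:00"},
]
listings.sort(key=lambda listing: open_house_key(listing["ohtime1"]))

if __name__ == "__main__":
    for listing in listings:
        print(listing["ohtime1"], listing["displayaddress"])
```

Sorting on a key leaves the stored strings untouched, so the templates can still display the original "12:00 - 02:00" text.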
