Cassandra datastax | TimeoutException - php

I have a problem with Cassandra: I am getting the following error (screenshot linked below).
Here is the code:
public function find($db_table = null, $db_id = null) {
    $filter = "";
    $return = array();
    $cluster = $this->cluster();
    $session = $cluster->connect($this->keyspace);
    if (isset($db_table)) {
        $filter .= " WHERE db_table like '%".$db_table."%' ";
        if ($db_id != null) {
            $filter .= " AND db_id = '".$db_id."' ALLOW FILTERING";
        }
    }
    $query  = new Cassandra\SimpleStatement("SELECT * FROM ".$this->keyspace.".log $filter;");
    $result = $session->executeAsync($query);
    $rows   = $result->get();
[screenshot: Cassandra TimeoutException error]

You should not use "allow filtering" unless you know what you are doing.
SELECT * FROM prod.log WHERE db_id = 13913 AND db_table LIKE '%%' product LIMIT 5000 is timing out, as you seem to have a lot of entries in the DB and allow filtering is doing a full table scan.
You should adapt the table design to match your queries.
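For illustration only, a table keyed for this access pattern could look roughly like the sketch below; the table name log_by_table and the payload column are assumptions, not part of the question's schema:
// Hypothetical schema: partition by db_table and cluster by db_id, so the
// lookup hits a single partition and no ALLOW FILTERING is needed.
$session->execute(
    "CREATE TABLE IF NOT EXISTS prod.log_by_table (
        db_table text,
        db_id    text,
        payload  text,
        PRIMARY KEY (db_table, db_id)
    )"
);
// The find() lookup then becomes a targeted query instead of a full scan.
$options = array('arguments' => array($db_table, $db_id));
$rows = $session->execute(
    "SELECT * FROM prod.log_by_table WHERE db_table = ? AND db_id = ?",
    $options
);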

More detail can be found here:
https://docs.datastax.com/en/developer/php-driver/1.2/api/Cassandra/Cluster/class.Builder/
Using withConnectTimeout may help avoid a TimeoutException
$cluster = $this->cluster()->withConnectTimeout(60);
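In context, that would be applied when building the cluster; a minimal sketch, assuming the Cluster\Builder API from the driver docs linked above (the contact point and timeout values are placeholders):
// Raise both the connect and per-request timeouts (values in seconds).
$cluster = Cassandra::cluster()
    ->withContactPoints('127.0.0.1')   // placeholder contact point
    ->withConnectTimeout(60)           // time allowed to establish a connection
    ->withRequestTimeout(120)          // time allowed for each request to complete
    ->build();
$session = $cluster->connect($this->keyspace);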
You may also increase the timeout values on the server side by changing the settings in /etc/cassandra/cassandra.yaml, similar to the values below:
sudo nano /etc/cassandra/cassandra.yaml (to edit the cassandra.yaml file)
# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 50000
# How long the coordinator should wait for seq or index scans to complete
range_request_timeout_in_ms: 100000
# How long the coordinator should wait for writes to complete
write_request_timeout_in_ms: 20000
# How long the coordinator should wait for counter writes to complete
counter_write_request_timeout_in_ms: 50000
# How long a coordinator should continue to retry a CAS operation
# that contends with other proposals for the same row
cas_contention_timeout_in_ms: 10000
# How long the coordinator should wait for truncates to complete
# (This can be much longer, because unless auto_snapshot is disabled
# we need to flush first so we can snapshot before removing the data.)
truncate_request_timeout_in_ms: 600000
# The default timeout for other, miscellaneous operations
request_timeout_in_ms: 100000
# How long before a node logs slow queries. Select queries that take longer than
# this timeout to execute, will generate an aggregated log message, so that slow queries
# can be identified. Set this value to zero to disable slow query logging.
slow_query_log_timeout_in_ms: 5000

using "LIKE" on cassandra i dont think so :(
your query :( try do something more clean instead use ".$db_table." do the properly bind use . ? and then inside your exec(query,['value'])
what do your mean ? with SELECT * FROM ".$this->keyspace.".log
this is not a query ! if you use this in php, the syntax definitely this is completely wrong.
you wrote
SELECT * FROM keyspace_name.log WHERE table like 'wherever' and id = 'something' :(
and worse select ALL
this never gonna happen
u use $this for invoke your cluster from where ? did this come from ?
ok i can enumerate 10 different reasons to your code doesn't work, but this is not my goal i want help u so just try something simple.
____ good ____
<?php
$cluster = Cassandra::cluster()
    ->withContactPoints('127.0.0.1')
    ->build();
$session = $cluster->connect("your_k_space");

// collect the existing table names for the keyspace
$table_list = array();
$rows = $session->execute("SELECT table_name FROM system_schema.tables WHERE keyspace_name = 'your_k_space'");
foreach ($rows as $row) {
    $table_list[] = $row['table_name'];
}

if (in_array($db_table, $table_list)) {
    // the table name is validated against system_schema above, so it is safe to interpolate;
    // CQL cannot bind identifiers (table names) as ? parameters, only values
    $options = array('arguments' => array($db_id));
    $result = $session->execute("SELECT * FROM {$db_table} WHERE db_id = ? ALLOW FILTERING", $options);
    foreach ($result as $key => $value) print_r($value);
} else {
    die('table not found');
}
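If the same lookup runs repeatedly, preparing the statement once is worth considering; a minimal sketch continuing from the snippet above:
// Prepare once, then execute with different bound values as needed.
$prepared = $session->prepare("SELECT * FROM {$db_table} WHERE db_id = ? ALLOW FILTERING");
$result = $session->execute($prepared, array('arguments' => array($db_id)));
foreach ($result as $row) {
    print_r($row);
}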

Related

PHP MySQL - Update 6.5m rows performance issues

I am working with a MySQL table and I need to increment a value in one column for each row, of which there are over 6.5m.
The column type is varchar and can contain an integer or a string (e.g. +1). The table type is MyISAM.
I have attempted this with PHP:
$adjust_by = 1;
foreach ($options as $option) {
    $original_turnaround = $option['turnaround'];
    $adjusted_turnaround = $option['turnaround'];
    if (preg_match('/\+/i', $original_turnaround)) {
        $tmp = intval($original_turnaround);
        $tmp += $adjust_by;
        $adjusted_turnaround = '+'.$tmp;
    } else {
        $adjusted_turnaround += $adjust_by;
    }
    if (!array_key_exists($option['optionid'], $adjusted)) {
        $adjusted[$option['optionid']] = array();
    }
    $adjusted[$option['optionid']][] = array(
        'original_turn' => $original_turnaround,
        'adjusted_turn' => $adjusted_turnaround
    );
} //end foreach options

//update turnarounds:
if (!empty($adjusted)) {
    foreach ($adjusted as $opt_id => $turnarounds) {
        foreach ($turnarounds as $turn) {
            $update = "UPDATE options SET turnaround = '".$turn['adjusted_turn']."' WHERE optionid = '".$opt_id."' and turnaround = '".$turn['original_turn']."'";
            run_query($update);
        }
    }
}
For obvious reasons there are serious performance issues with this approach. Running this in my local dev environment leads to numerous errors and eventually the server crashing.
Another thing I need to consider is when this is run in a production environment. This is for an ecommerce store, and I cannot have a huge update like this lock the database or cause any other issues.
One possible solution I have found is this: Fastest way to update 120 Million records
But creating another table comes with its own issues. The codebase is not in a good state; similar queries are run on this table in loads of places, so I would have to modify a large number of queries and files to make this approach work.
What are my options (if there are any)?
You can do this task with SQL.
With CAST you can convert a string into an integer.
With IF and SUBSTR you can check whether the string contains +.
With CONCAT you can prepend the + (merging the two values into one string) to your calculated result, if necessary.
Just try this SQL:
"UPDATE `options` SET `turnaround` = CONCAT(IF(SUBSTR(`turnaround`, 1, 1) = '+', '+', ''), CAST(`turnaround` AS SIGNED) + ".$adjust_by.") WHERE 1";
can't you just say
UPDATE whatevertable SET whatever = whatever + 1?
Try it and see, I'm pretty sure it will work!
EDIT: You have strings OR integers? Your DB design is flawed, this probably won't work, but would have been the correct answer had your DB design been more strict.
You probably don't have, but need, this 'composite' index (in either order):
INDEX(optionid, turnaround)
Please provide SHOW CREATE TABLE.
Another slight performance boost is to explicitly LOCK TABLES ... WRITE before that update loop, and UNLOCK TABLES afterwards. Caution: this only applies to MyISAM.
You would be much better off with InnoDB.
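A rough sketch of that locking pattern, reusing the $adjusted array from the question; $mysqli is assumed to be an open mysqli connection and is not part of the original code:
// MyISAM only: take a write lock for the duration of the batched updates,
// then release it so normal traffic can resume.
$mysqli->query("LOCK TABLES options WRITE");
$stmt = $mysqli->prepare(
    "UPDATE options SET turnaround = ? WHERE optionid = ? AND turnaround = ?"
);
foreach ($adjusted as $opt_id => $turnarounds) {
    foreach ($turnarounds as $turn) {
        $stmt->bind_param('sss', $turn['adjusted_turn'], $opt_id, $turn['original_turn']);
        $stmt->execute();
    }
}
$mysqli->query("UNLOCK TABLES");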

Database overload in long task using Laravel

I'm currently struggling with an issue that is overloading my database which makes all page requests being delayed significantly.
Current scenario
- A certain Artisan Command is scheduled to run every 8 minutes
- This command has to update a whole table with more than 30000 rows
- Every row will have a new value, which means 30000 queries will have to be executed
- For about 14 seconds the server doesn't answer due to database overload (I guess)
Here's the handle() method of the command:
public function handle()
{
    $thingies = /* Insert big query here */
    foreach ($thingies as $thing)
    {
        $resource = Resource::find($thing->id);
        if (!$resource)
        {
            continue;
        }
        $resource->update(['column' => $thing->value]);
    }
}
Is there any other approach to do this without making my page requests being delayed?
Your process is really inefficient and I'm not surprised it takes a long time to complete. To process 30,000 rows, you're making 60,000 queries (half to find out if the id exists, and the other half to update the row). You could be making just 1.
I have no experience with Laravel, so I'll leave it up to you to find out what functions in Laravel can be used to apply my recommendation. I just want to get you to understand the concepts.
MySQL allows you to submit a multi query: one command that executes many queries. It is drastically faster than executing individual queries in a loop. Here is an example that uses MySQLi directly (no 3rd-party framework such as Laravel):
//the 30,000 new values and the record IDs they belong to. These values
//MUST be escaped or known to be safe
$values = [
    ['id'=>145, 'fieldName'=>'a'], ['id'=>2, 'fieldName'=>'b']...
];
//%s and %d will be replaced with the column value and the id to look for
$qry_template = "UPDATE myTable SET fieldName = '%s' WHERE id = %d";
$queries = []; //array of all queries to be run
foreach ($values as $row) { //build and add queries
    $q = sprintf($qry_template, $row['fieldName'], $row['id']);
    array_push($queries, $q);
}
//combine all into one query
$combined = implode("; ", $queries);
//execute all queries at once
$mysqli->multi_query($combined);
I would look into how Laravel does multi queries and start there. The last time I implemented something like this, it took about 7 milliseconds to insert 3,000 rows. So updating 30,000 will definitely not take 14 seconds.
As an added bonus, there is no need to first run a query to figure out whether the ID exists. If it doesn't, nothing will be updated.
Thanks to #cyclone's comment I was able to update all the values in one single query.
It's not a perfect solution, but the query execution time is now roughly 8 seconds and only 1 connection is required, which means page requests are still being handled while the query is executing.
I'm not marking this as the definitive answer since there might still be improvements to make.
$ids = [];
$caseQuery = '';
foreach ($thingies as $thing)
{
    if (strlen($caseQuery) == 0)
    {
        $caseQuery = '(CASE WHEN id = '. $thing->id . ' THEN \''. $thing->rank .'\' ';
    }
    else
    {
        $caseQuery .= ' WHEN id = '. $thing->id . ' THEN \''. $thing->rank .'\' ';
    }
    array_push($ids, $thing->id);
}
$caseQuery .= ' END)';
// Execute query
DB::update('UPDATE <table> SET <value> = '. $caseQuery . ' WHERE id IN ('. implode(',', $ids) .')');

Splitting a string of values like 1030:0,1031:1,1032:2 and storing data in database

I have a bunch of photos on a page and using jQuery UI's Sortable plugin, to allow for them to be reordered.
When my sortable function fires, it writes a new order sequence:
1030:0,1031:1,1032:2,1040:3,1033:4
Each item of the comma delimited string, consists of the photo ID and the order position, separated by a colon. When the user has completely finished their reordering, I'm posting this order sequence to a PHP page via AJAX, to store the changes in the database. Here's where I get into trouble.
I have no problem getting my script to work, but I'm pretty sure it's the incorrect way to achieve what I want, and will suffer hugely in performance and resources - I'm hoping somebody could advise me as to what would be the best approach.
This is my PHP script that deals with the sequence:
if ($sorted_order) {
    $exploded_order = explode(',', $sorted_order);
    foreach ($exploded_order as $order_part) {
        $exploded_part = explode(':', $order_part);
        $part_count = 0;
        foreach ($exploded_part as $part) {
            $part_count++;
            if ($part_count == 1) {
                $photo_id = $part;
            } elseif ($part_count == 2) {
                $order = $part;
            }
            $SQL = "UPDATE article_photos ";
            $SQL .= "SET order_pos = :order_pos ";
            $SQL .= "WHERE photo_id = :photo_id;";
            ... rest of PDO stuff ...
        }
    }
}
My concerns arise from the nested foreach functions and also running so many database updates. If a given sequence contained 150 items, would this script cry for help? If it will, how could I improve it?
** This is for an admin page, so it won't be heavily abused **
You can use one update with some clever code, like so:
Create the array $data['order'] in the loop, then:
$q = "UPDATE article_photos SET order_pos = (CASE photo_id ";
foreach ($data['order'] as $sort => $id) {
    $q .= " WHEN {$id} THEN {$sort}";
}
$q .= " END ) WHERE photo_id IN (".implode(",", $data['order']).")";
A little clearer, perhaps:
UPDATE article_photos SET order_pos = (CASE photo_id
    WHEN 1 THEN 999
    WHEN 2 THEN 1000
    WHEN 3 THEN 1001
END)
WHERE photo_id IN (1,2,3)
I use this approach for exactly what you're doing: updating sort orders.
No need for the second foreach: you know it's going to be two parts if your data passes validation (I'm assuming you validated this. If not: you should =) so just do:
if (count($exploded_part) == 2) {
    $id = $exploded_part[0];
    $seq = $exploded_part[1];
    /* rest of code */
} else {
    /* error - data does not conform despite validation */
}
As for update hammering: do your DB updates in a transaction. Your db will queue the ops, but not commit them to the main DB until you commit the transaction, at which point it'll happily do the update "for real" at lightning speed.
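A rough sketch of that pattern with PDO, reusing the column names from the question; $pdo is assumed to be an existing PDO handle:
// Queue all position updates inside one transaction; they are committed
// together at the end, which is far faster than 150 auto-committed UPDATEs.
$pdo->beginTransaction();
try {
    $stmt = $pdo->prepare(
        "UPDATE article_photos SET order_pos = :order_pos WHERE photo_id = :photo_id"
    );
    foreach (explode(',', $sorted_order) as $order_part) {
        list($photo_id, $order_pos) = explode(':', $order_part);
        $stmt->execute([':order_pos' => $order_pos, ':photo_id' => $photo_id]);
    }
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}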
I suggest making your script even simpler and changing the variable names so the code is more readable:
$parts = explode(',', $sorted_order);
foreach ($parts as $part) {
    list($id, $position) = explode(':', $part);
    // Now you can work with $id and $position
}
More info about list: http://php.net/manual/en/function.list.php
Also, about performance and your data structure:
The way you store your data is not perfect, but you will not suffer any performance issues from it: you send less data, so there is less overhead overall.
However, the drawback of your data structure is that you will most likely be unable to establish relationships between tables, make joins, or alter the table structure cleanly.

Query on large mysql database

I've got a script which is supposed to run through a MySQL database and perform a certain 'test' on the cases. Simplified, the database contains records which represent trips that have been made by persons. Each record is a single trip. But I want to use only round trips, so I need to search the database and match two trips to each other: the trip to and the trip from a certain location.
The script is working fine. The problem is that the database contains more than 600,000 cases. I know this should be avoided if possible, but for the purpose of this script and the use of the database records later on, everything has to stay together.
Executing the script takes hours right now when executing on my iMac using MAMP. Of course I made sure that it can use a lot of memory, etcetera.
My question is: how could I speed things up? What's the best approach to do this?
Here's the script I have right now:
$table = $_GET['table'];
$output = '';
//Select all cases that have not been marked as invalid in a previous test
$query = "SELECT persid, ritid, vertpc, aankpc, jaar, maand, dag FROM MON.$table WHERE reasonInvalid != '1' OR reasonInvalid IS NULL";
$result = mysql_query($query) or die($output .= mysql_error());
$totalCountValid = '';
$totalCountInvalid = '';
$totalCount = '';
//For each record:
while ($row = mysql_fetch_array($result)) {
    $totalCount += 1;
    //Do another query: get all the rows for this person ID that share postal codes. Postal codes are reversed between the two trips
    $persid = $row['persid'];
    $ritid = $row['ritid'];
    $pcD = $row['vertpc'];
    $pcA = $row['aankpc'];
    $jaar = $row['jaar'];
    $maand = $row['maand'];
    $dag = $row['dag'];
    $thecountquery = "SELECT * FROM MON.$table WHERE persid=$persid AND vertpc=$pcA AND aankpc=$pcD AND jaar = $jaar AND maand = $maand AND dag = $dag";
    $thecount = mysql_num_rows(mysql_query($thecountquery));
    if ($thecount >= 1) {
        //No worries, this person ID has multiple trips attached
        $totalCountValid += 1;
    } else {
        //Oh my, the case is invalid!
        $totalCountInvalid += 1;
        //Call markInvalid from functions.php
        $totalCountValid += 1;
        markInvalid($table, '2', 'ritid', $ritid);
    }
}
//Echo the result
$output .= 'Total cases: '.$totalCount.'<br>Valid: '.$totalCountValid.'<br>Invalid: '.$totalCountInvalid;
echo $output;
Your basic problem is that you are doing the following.
1) Getting all cases that haven't been marked as invalid.
2) Looping through the cases obtained in step 1).
What you can easily do is combine the queries written for 1) and 2) into a single query and loop over the data. This will speed things up a bit.
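For illustration, a self-join along those lines; the column names are taken from the question's queries, the alias has_return_trip is just an illustrative name, and the mysql_* calls mirror the original script:
// One query that pairs each trip with its matching return trip, instead of
// issuing a second query per row. Trips without a match are the invalid ones.
$query = "SELECT a.persid, a.ritid,
                 b.ritid IS NOT NULL AS has_return_trip
          FROM MON.$table a
          LEFT JOIN MON.$table b
                 ON  b.persid = a.persid
                 AND b.vertpc = a.aankpc
                 AND b.aankpc = a.vertpc
                 AND b.jaar   = a.jaar
                 AND b.maand  = a.maand
                 AND b.dag    = a.dag
          WHERE (a.reasonInvalid != '1' OR a.reasonInvalid IS NULL)";
$result = mysql_query($query) or die(mysql_error());
while ($row = mysql_fetch_array($result)) {
    if (!$row['has_return_trip']) {
        markInvalid($table, '2', 'ritid', $row['ritid']);
    }
}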
Also bear in mind the following tips.
1) Selecting all columns is not a good thing to do; it takes a significant amount of time for the data to travel over the network. I would recommend replacing the wild-card with only the columns that you really need:
SELECT <only_the_columns_you_need> instead of SELECT *
2) Use indexes - sparingly, efficiently and appropriately. Understand when to use them and when not to.
3) Use views if you can.
4) Enable MySQL slow query log to understand which queries you need to work on and optimize.
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time = 1
log-queries-not-using-indexes
5) Use correct MySQL field types and the storage engine (Very very important)
6) Use EXPLAIN to analyze your query - EXPLAIN is a useful command in MySQL which can provide some great details about how a query is run, what index is used, how many rows it needs to check through, and whether it needs to do file sorts, temporary tables, and other nasty things you want to avoid.
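For example, against the per-person lookup from the question (the literal values are placeholders):
// Ask MySQL how it would execute the inner query; look at the
// 'key' and 'rows' columns to see whether an index is being used.
$explain = mysql_query("EXPLAIN SELECT * FROM MON.$table
                        WHERE persid = 123 AND vertpc = 456 AND aankpc = 789") or die(mysql_error());
while ($row = mysql_fetch_assoc($explain)) {
    print_r($row);
}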
Good luck.

PHP Script Timeout

I have a custom script for a bulletin board system that counts the number of threads a user has made, and updates a column accordingly. This works fine, however with 100,000+ users, it times out when running it for the first time.
I've tried adding the following before the query, but it still times out (500 error).
set_time_limit(0);
ignore_user_abort(true);
Additional: I'm using this script on my VPS.
Query:
set_time_limit(0);
ignore_user_abort(true);
$db->write_query("ALTER TABLE `".TABLE_PREFIX."users` ADD `numthreads` int(10) unsigned NOT NULL default '0'");
// load users into an array to count the number of threads
$query = $db->simple_select("users", "uid");
while ($user = $db->fetch_array($query))
{
    $users[$user['uid']] = $user;
}
foreach ($users as $user)
{
    $query = $db->simple_select("threads", "COUNT(tid) AS threads", "uid = '{$user['uid']}'");
    // get total number of threads
    $numthreads = intval($db->fetch_field($query, "threads"));
    $db->update_query("users", array("numthreads" => $numthreads), "uid = '{$user['uid']}'");
}
Use this:
ini_set('max_execution_time', 0);
Inplace of:
set_time_limit(0);
ignore_user_abort(true);
You can also edit php.ini:
max_execution_time = 60   ; Maximum execution time of each script, in seconds
max_input_time = 60       ; Maximum amount of time each script may spend parsing request data
Hope this helps.
First you should separate out your ALTER statement: execute the ALTER first and then do the rest. ALTER TABLE can be expensive in time if you have a big table. You can run it manually, using phpMyAdmin or via the shell (even better, since there is no PHP timeout there), which means it won't time out.
Then remove the ALTER from the script and use:
$query = $db->simple_select("users", "uid");
while ($user = $db->fetch_array($query))
{
    // use a separate result handle so the outer users query is not overwritten
    $threads_query = $db->simple_select("threads", "COUNT(tid) AS threads", "uid = '{$user['uid']}'");
    $numthreads = intval($db->fetch_field($threads_query, "threads"));
    $db->update_query("users", array("numthreads" => $numthreads), "uid = '{$user['uid']}'");
}
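Depending on the board's schema, the per-user loop could also be collapsed into a single correlated UPDATE; a rough sketch, assuming the same $db wrapper and table prefix as in the question:
// One statement recomputes numthreads for every user server-side,
// avoiding 100,000+ round trips from PHP.
$db->write_query("
    UPDATE ".TABLE_PREFIX."users u
    SET u.numthreads = (
        SELECT COUNT(t.tid)
        FROM ".TABLE_PREFIX."threads t
        WHERE t.uid = u.uid
    )
");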
Try this:
ini_alter ("max_execution_time", 600000000);
$tmp = ini_get ( "max_execution_time" );
set_time_limit(600000000);
A common reason that people are not able to change the max execution time is that their host does not allow them to. Make sure to find out that they do not block this.
