This code takes ~0.1s
// find
$benchmark = Profiler::start('Testing', 'find()');
$cursor = MongoBG::getInstance()->setDatabase('test')->setCollection('testcoll')->find();
Profiler::stop($benchmark);

$benchmark = Profiler::start('Testing', 'cursor walk');
while ($cursor->hasNext()) {
    print_r($cursor->getNext());
}
Profiler::stop($benchmark);
so "find()" took only 0.000017 seconds
but "cursor walk" 0.102812 seconds
Collection is about 100 rows, speed remains the same with 1000 or only 10 items in it.
Some server info:
FreeBSD 8.1, PHP 5.3.5 with (mongo/1.1.4), MongoDB version 1.6.6-pre
With such a quick time for find(), it sounds like it did nothing but prepare a cursor object (no communication with the database); the actual query was only executed, and the results read, once you started using the cursor. The cursor is doing the real work, which is why it appears slower.
That's how the MongoDB drivers for node.js work as well. Looked at that way, the cursor time isn't bad for opening a connection, authenticating, sending the query, receiving and buffering the response, then parsing/loading it into objects to return to you.
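If you want to confirm that, one way is to time the first hasNext() call separately from the rest of the walk, since that first call is what actually sends the query and reads the first batch. This is only a sketch, assuming the same MongoBG wrapper and Profiler class as above and a legacy MongoCursor-style cursor (hasNext()/getNext()):
// Sketch: split the timing into "first batch" and "remaining walk"
$cursor = MongoBG::getInstance()->setDatabase('test')->setCollection('testcoll')->find();

// The first hasNext() dispatches the query and buffers the first batch of results,
// so most of the ~0.1s should show up here.
$benchmark = Profiler::start('Testing', 'first batch');
$cursor->hasNext();
Profiler::stop($benchmark);

// Walking the remaining documents is mostly in-memory work on the buffered batch.
$benchmark = Profiler::start('Testing', 'remaining walk');
while ($cursor->hasNext()) {
    $cursor->getNext();
}
Profiler::stop($benchmark);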
I'm having problems with timeouts using the DataStax PHP driver for Cassandra.
Whenever I execute a certain command it always throws this exception after 10s:
PHP Fatal error: Uncaught exception 'Cassandra\Exception\TimeoutException' with message 'Request timed out'
My PHP code looks like this:
$cluster = Cassandra::cluster()->build();
$session = $cluster->connect("my_base");
$statement = new Cassandra\SimpleStatement("SELECT COUNT(*) as c FROM my_table WHERE my_colunm = 1 AND my_colunm2 >= '2015-01-01' ALLOW FILTERING");
$result = $session->execute($statement);
$row = $result->first();
My settings in cassandra.yaml are:
# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 500000
# How long the coordinator should wait for seq or index scans to complete
range_request_timeout_in_ms: 1000000
# How long the coordinator should wait for writes to complete
write_request_timeout_in_ms: 2000
# How long the coordinator should wait for counter writes to complete
counter_write_request_timeout_in_ms: 50000
# How long a coordinator should continue to retry a CAS operation
# that contends with other proposals for the same row
cas_contention_timeout_in_ms: 50000
# How long the coordinator should wait for truncates to complete
# (This can be much longer, because unless auto_snapshot is disabled
# we need to flush first so we can snapshot before removing the data.)
truncate_request_timeout_in_ms: 60000
# The default timeout for other, miscellaneous operations
request_timeout_in_ms: 1000000
I've already tried this:
$result = $session->execute($statement, new Cassandra\ExecutionOptions([
    'timeout' => 120
]));
and this:
$cluster = Cassandra::cluster()->withDefaultTimeout(120)->build();
and this:
set_time_limit(0)
And it still throws the TimeoutException after 10 seconds.
I'm using Cassandra 3.6
Any idea?
Using withConnectTimeout (instead of, or together with, withDefaultTimeout) might help avoid a TimeoutException (it did in my case):
$cluster = Cassandra::cluster()->withConnectTimeout(60)->build();
However, if you need such a long timeout, then there is probably an underlying problem that will need solving eventually.
You are doing two things wrong.
ALLOW FILTERING: Be careful. Executing this query with ALLOW FILTERING is usually a bad idea, as it can consume a lot of your cluster's computing resources. Don't use ALLOW FILTERING in production. Read the DataStax documentation on ALLOW FILTERING:
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/select_r.html?hl=allow,filter
count(): It is also a bad idea to use count(). count() actually pages through all the data, so a SELECT COUNT(*) without a LIMIT would be expected to time out with that many rows. Some details here: http://planetcassandra.org/blog/counting-key-in-cassandra/
How to fix it?
Instead of using ALLOW FILTERING, create an index table on your clustering column if you need to query without the partition key.
Instead of using count(*), maintain a separate counter table, as sketched below.
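As a rough illustration of the counter-table idea with the DataStax PHP driver (the my_table_counts table and total column are hypothetical, not part of the question's schema):
// Sketch only: a hypothetical counter table, maintained alongside normal writes.
// CQL: CREATE TABLE my_table_counts (my_colunm int PRIMARY KEY, total counter);
$cluster = Cassandra::cluster()->build();
$session = $cluster->connect("my_base");

// Increment the counter whenever a matching row is written to my_table.
$session->execute(new Cassandra\SimpleStatement(
    "UPDATE my_table_counts SET total = total + 1 WHERE my_colunm = 1"
));

// Reading the count is then a cheap single-partition lookup instead of a full scan.
$result = $session->execute(new Cassandra\SimpleStatement(
    "SELECT total FROM my_table_counts WHERE my_colunm = 1"
));
$row = $result->first();
var_dump($row['total']); // the counter value (64-bit integer)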
I was doing bulk inserts into a Sphinx real-time (RT) index using PHP, with AUTOCOMMIT disabled, e.g.:
// sphinx connection
$sphinxql = mysqli_connect($sphinxql_host.':'.$sphinxql_port,'','');
//do some other time consuming work
//sphinx start transaction
mysqli_begin_transaction($sphinxql);
//do 50k updates or inserts
// Commit transaction
mysqli_commit($sphinxql);
and kept the script running overnight. In the morning I saw:
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 212334 bytes) in
When I checked the nohup.out file closely, I noticed these lines:
PHP Warning: mysqli_query(): MySQL server has gone away in /home/script.php on line 502
Warning: mysqli_query(): MySQL server has gone away in /home/script.php on line 502
Memory usage before these lines was normal, but after them it started to climb until it hit the PHP memory_limit, at which point the script threw the fatal error above and died.
In script.php, line 502 is:
mysqli_query($sphinxql,$update_query_sphinx);
So my guess is that the Sphinx server closed the connection, or died, after a few minutes or hours of inactivity.
I have tried setting this in sphinx.conf:
client_timeout = 3600
and restarted searchd with
systemctl restart searchd
but I am still facing the same issue.
So how can I keep the Sphinx server from dying on me when there is no activity for a longer period of time?
More info added:
I am getting data from MySQL in 50k chunks at a time and looping over each row to update it in the Sphinx RT index, like this:
// 6M rows are updated in MySQL first, which takes around 18-20 minutes; then the following part runs.
$subset_count = 50000 ;
$total_count_query = "SELECT COUNT(*) as total_count FROM content WHERE enabled = '1'" ;
$total_count = mysqli_query ($conn,$total_count_query);
$total_count = mysqli_fetch_assoc($total_count);
$total_count = $total_count['total_count'];
$current_count = 0;
while ($current_count <= $total_count) {
    $get_mysql_data_query = "SELECT record_num, views, comments, votes FROM content WHERE enabled = 1 ORDER BY record_num ASC LIMIT $current_count, $subset_count";

    // sphinx start transaction
    mysqli_begin_transaction($sphinxql);

    if ($result = mysqli_query($conn, $get_mysql_data_query)) {
        /* fetch associative array */
        while ($row = mysqli_fetch_assoc($result)) {
            // sphinx escape whole array
            $escaped_sphinx = mysqli_real_escape_array($sphinxql, $row);

            // update data in sphinx index
            $update_query_sphinx = "UPDATE $sphinx_index
                SET
                    views = " . $escaped_sphinx['views'] . ",
                    comments = " . $escaped_sphinx['comments'] . ",
                    votes = " . $escaped_sphinx['votes'] . "
                WHERE
                    id = " . $escaped_sphinx['record_num'];

            mysqli_query($sphinxql, $update_query_sphinx);
        }
        /* free result set */
        mysqli_free_result($result);
    }

    // Commit transaction
    mysqli_commit($sphinxql);

    $current_count = $current_count + $subset_count;
}
So there are a couple of issues here, both related to running big processes.
MySQL server has gone away - This usually means that MySQL has timed out, but it could also mean that the MySQL process crashed due to running out of memory. In short, it means that MySQL has stopped responding, and didn't tell the client why (i.e. no direct query error). Seeing as you said that you're running 50k updates in a single transaction, it's likely that MySQL just ran out of memory.
Allowed memory size of 134217728 bytes exhausted - means that PHP ran out of memory. This also lends credence to the idea that MySQL ran out of memory.
So what to do about this?
The initial stop-gap solution is to increase the memory limits for PHP and MySQL. That doesn't really solve the root cause, and depending on the amount of control (and knowledge) you have of your deployment stack, it may not be possible.
As a few people mentioned, batching the process may help. It's hard to say the best way to do this without knowing the actual problem that you're working on solving. If you can process, say, 10,000 or 20,000 records instead of 50,000 in a batch, that may solve your problems. If that's going to take too long in a single process, you could also look into using a message queue (RabbitMQ is a good one that I've used on a number of projects), so that you can run multiple processes at the same time, each handling smaller batches.
If you're doing something that requires knowledge of all 6 million+ records to perform the calculation, you could potentially split the process up into a number of smaller steps, cache the work done "to date" (as such), and then pick up the next step in the next process. How to do this cleanly is difficult (again, something like RabbitMQ could simplify that by firing an event when each process is finished, so that the next one can start up).
So, in short, there are your best two options:
Throw more resources/memory at the problem everywhere that you can
Break the problem down into smaller, self-contained chunks (a rough sketch follows).
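A minimal sketch of option 2, assuming the same $sphinxql connection and the $update_query_sphinx string built per row as in the script above; the batch size is just an example:
// Sketch: commit the Sphinx transaction every $batch_size updates instead of
// holding 50k (or more) updates in a single transaction.
$batch_size = 10000;
$pending    = 0;

mysqli_begin_transaction($sphinxql);

while ($row = mysqli_fetch_assoc($result)) {
    // ... build $update_query_sphinx from $row as in the original script ...
    mysqli_query($sphinxql, $update_query_sphinx);

    if (++$pending >= $batch_size) {
        mysqli_commit($sphinxql);            // flush what has accumulated so far
        mysqli_begin_transaction($sphinxql); // start the next batch
        $pending = 0;
    }
}

mysqli_commit($sphinxql); // commit whatever is left in the final partial batch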
You need to reconnect or restart the DB session just before mysqli_begin_transaction($sphinxql), something like this:
<?php
// Reconnect to Sphinx if the connection was dropped (timeout or otherwise), or force a reconnect.
function sphinxReconnect($force = false) {
    global $sphinxql_host;
    global $sphinxql_port;
    global $sphinxql;

    if ($force) {
        mysqli_close($sphinxql);
        $sphinxql = @mysqli_connect($sphinxql_host . ':' . $sphinxql_port, '', '') or die('ERROR');
    } else {
        if (!mysqli_ping($sphinxql)) {
            mysqli_close($sphinxql);
            $sphinxql = @mysqli_connect($sphinxql_host . ':' . $sphinxql_port, '', '') or die('ERROR');
        }
    }
}
// 10M+ rows are updated in MySQL first, which takes around 18-20 minutes; then the following part runs.
//reconnect to sphinx
sphinxReconnect(true);
//sphinx start transaction
mysqli_begin_transaction($sphinxql);
//do your otherstuff
// Commit transaction
mysqli_commit($sphinxql);
I am trying to run an MSSQL stored procedure from PHP using PDO. I do this all the time, but this particular SP times out. The SP is rather complex and takes about 4 minutes to run.
I am calling it like this:
$setDate = '2014-01-03';
$queryMain = $coreDB->prepare("exec sp_ard :runDate");
$queryMain->bindParam("runDate",$setDate);
$queryMain->execute();
$e = $queryMain->errorInfo();
$d = $queryMain->fetchAll(PDO::FETCH_ASSOC);
print_r($e);
print_r($d);
When I run the page it runs for a minute or so then produces this error:
Array ( [0] => 08S01 [1] => 258 [2] => [Microsoft][SQL Server Native Client 10.0]TCP Provider: Timeout error [258]. )
I know the SP works fine. I can run it straight from the MSSQL management console. It takes about 4 minutes to run from there but it works fine.
I am trying to figure out how I can run this from PHP.
Any help would be great.
Thanks!
You should be able to configure the timeout using PDO.
PDO::setAttribute
public bool PDO::setAttribute ( int $attribute , mixed $value )
PDO::ATTR_TIMEOUT: Specifies the timeout duration in seconds. Not all drivers support this option, and its meaning may differ from driver to driver. For example, SQLite will wait for up to this time value before giving up on obtaining a writable lock, but other drivers may interpret this as a connect or a read timeout interval.
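For example (whether this attribute is honoured is driver-dependent, so treat it as something to try rather than a guaranteed fix for the SQL Server driver):
// Sketch: raise the PDO timeout well above the ~4 minutes the SP needs.
// PDO::ATTR_TIMEOUT is driver-dependent; some SQL Server drivers ignore it or
// expose their own query-timeout setting instead.
$coreDB->setAttribute(PDO::ATTR_TIMEOUT, 600); // seconds

$queryMain = $coreDB->prepare("exec sp_ard :runDate");
$queryMain->bindParam(":runDate", $setDate);
$queryMain->execute();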
If your problem is a timeout, you can use the function set_time_limit
set_time_limit(0);
If you pass a zero to the function, no time limit is imposed.
Also, make sure you are closing all the connections after using them.
Hope this helps!
I have a bit of a problem when I try to move a huge amount of data from a MySQL table to a Redis database. After a while I get the error "MySQL server has gone away" and I have no idea why.
EDIT:
Or, when I use the commented-out code that breaks the loop, it just says "finished" when it isn't actually finished.
This is the PHP code I use (run via php-cli):
<?php
require 'Predis/Autoloader.php';
Predis\Autoloader::register();
mysql_connect('localhost', 'root', 'notcorrect') or die(mysql_error());
mysql_select_db('database_that_i_use') or die(mysql_error());
$redis = new Predis\Client();
// starting on 0 but had to edit this when it crashed :(
for ($i = 3410000; $i < 999999999999; $i += 50000) {
    echo "Query from $i to " . ($i + 50000) . ", please wait...\n";
    $query = mysql_unbuffered_query('SELECT * FROM table LIMIT ' . $i . ', 50000') or die(mysql_error());

    // This was code I used before, but for some reason it got valid when it wasn't supposed to.
    /*if (mysql_num_rows($query) == 0) {
        echo "Script finished!\n";
        break;
    }*/

    while ($r = mysql_fetch_assoc($query)) {
        $a = array(
            'campaign_id' => $r['campaign_id'],
            'criteria_id' => $r['criteria_id'],
            'date_added'  => $r['date_added'],
        );
        $redis->hmset($r['user_id'], $a);
        unset($a);
        usleep(10);
    }

    echo "Query completed for 50000 rows..\n";
    sleep(2);
}
unset($redis);
?>
My question is how to do this better; I seriously have no idea why it crashes. My server is fairly old and slow, and maybe it can't handle this amount of data? This is just a test server before we switch to real production.
Worth noting: the script ran fine for maybe half an hour. Could it be the LIMIT clause that makes the query very slow once the offset gets high? Is there an easier way to do this? I need to transfer all the data today! :)
Thanks in advance.
EDIT: running example:
Query from 3410000 to 3460000, please wait...
Query completed for 50000 rows..
Query from 3460000 to 3510000, please wait...
Query completed for 50000 rows..
Query from 3510000 to 3560000, please wait...
Query completed for 50000 rows..
Query from 3560000 to 3610000, please wait...
MySQL server has gone away
EDIT:
The table consists of ~5 million rows and is approx. 800 MB in size.
But I need to do similar things for even larger tables later on..
First, you may want to use another scripting language. Perl, Python, Ruby - anything is better than PHP for running this kind of script.
I cannot comment on why the mysql connection is lost, but to get better performance you need to try to eliminate as many roundtrips as you can with the mysql server and the redis server.
It means:
you should not use unbuffered queries but buffered ones (provided LIMIT is used in the query)
OR
you should not iterate on the mysql query using LIMIT since you get a quadratic complexity while it should be only linear. I don't know if it can be avoided in PHP though.
you should pipeline the commands you send to Redis
Here is an example of pipelining with Predis:
https://github.com/nrk/predis/blob/v0.7/examples/PipelineContext.php
Actually, if I really had to use PHP for this, I would export the mysql data in a text file (using "select into outfile" for instance), and then read the file and use pipelining to push data to Redis.
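A rough sketch of both suggestions combined, pipelining the Redis commands per chunk and paging on an id column instead of a growing LIMIT offset. It assumes user_id is an indexed, monotonically increasing column (which may not match your schema) and reuses the mysql_* connection from the question:
// Sketch: keyset pagination on user_id + one Redis round trip per chunk.
// mysql_connect()/mysql_select_db() as in the question.
$redis  = new Predis\Client();
$lastId = 0;
$chunk  = 50000;

do {
    // Constant cost per chunk, unlike LIMIT with an ever-growing offset.
    $query = sprintf(
        'SELECT user_id, campaign_id, criteria_id, date_added FROM table ' .
        'WHERE user_id > %d ORDER BY user_id ASC LIMIT %d',
        $lastId, $chunk
    );
    $result = mysql_query($query) or die(mysql_error());
    $rows   = 0;

    // All hmset commands for this chunk are sent to Redis in a single round trip.
    $redis->pipeline(function ($pipe) use ($result, &$lastId, &$rows) {
        while ($r = mysql_fetch_assoc($result)) {
            $pipe->hmset($r['user_id'], array(
                'campaign_id' => $r['campaign_id'],
                'criteria_id' => $r['criteria_id'],
                'date_added'  => $r['date_added'],
            ));
            $lastId = $r['user_id'];
            $rows++;
        }
    });
} while ($rows === $chunk);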
I'm having a strange time dealing with selecting from a table with about 30,000 rows.
It seems my script is using an outrageous amount of memory for what is a simple, forward-only walk over a query result.
Please note that this example is a somewhat contrived, absolute bare minimum example which bears very little resemblance to the real code and it cannot be replaced with a simple database aggregation. It is intended to illustrate the point that each row does not need to be retained on each iteration.
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$stmt = $pdo->prepare('SELECT * FROM round');
$stmt->execute();
function do_stuff($row) {}
$c = 0;
while ($row = $stmt->fetch()) {
// do something with the object that doesn't involve keeping
// it around and can't be done in SQL
do_stuff($row);
$row = null;
++$c;
}
var_dump($c);
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(43005064)
int(43018120)
I don't understand why 40 MB of memory is used when hardly any data needs to be held at any one time. I have already worked out that I can reduce the memory by a factor of about 6 by replacing "SELECT *" with "SELECT home, away"; however, I consider even this usage to be insanely high, and the table is only going to get bigger.
Is there a setting I'm missing, or is there some limitation in PDO that I should be aware of? I'm happy to get rid of PDO in favour of mysqli if it cannot support this, so if that's my only option, how would I perform this using mysqli instead?
After creating the connection, you need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false:
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
// snip
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(653920)
int(668136)
Regardless of the result size, the memory usage remains pretty much static.
Another option would be to do something like:
$i = $c = 0;
$query = 'SELECT home, away FROM round LIMIT 2048 OFFSET %u;';

while (count($rows = codeThatFetches(sprintf($query, $i++ * 2048))) > 0)
{
    $c += count($rows);

    foreach ($rows as $row)
    {
        do_stuff($row);
    }
}
The whole result set (all 30,000 rows) is buffered into memory before you can start looking at it.
You should be letting the database do the aggregation and only asking it for the two numbers you need.
SELECT SUM(home) AS home, SUM(away) AS away, COUNT(*) AS c FROM round
The reality of the situation is that if you fetch all rows and expect to be able to iterate over all of them in PHP, at once, they will exist in memory.
If you really don't think SQL-powered expressions and aggregation are the solution, you could consider limiting/chunking your data processing. Instead of fetching all rows at once, do something like this (see the sketch after the list):
1) Fetch 5,000 rows
2) Aggregate/Calculate intermediary results
3) unset variables to free memory
4) Back to step 1 (fetch next set of rows)
Just an idea...
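A minimal sketch of that loop, reusing the $pdo connection and do_stuff() from the question, and assuming round has an auto-increment primary key id to page on (adjust to your real schema and calculation):
// Sketch: fetch in chunks, aggregate, free memory, repeat.
$chunk  = 5000;
$lastId = 0;

do {
    $stmt = $pdo->prepare('SELECT * FROM round WHERE id > :id ORDER BY id LIMIT ' . $chunk);
    $stmt->execute(array(':id' => $lastId));

    $rows    = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $fetched = count($rows);

    foreach ($rows as $row) {
        do_stuff($row);      // aggregate/calculate intermediary results here
        $lastId = $row['id'];
    }

    unset($rows, $stmt);     // free memory before fetching the next chunk
} while ($fetched === $chunk);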
I haven't done this before in PHP, but you may consider fetching the rows using a scrollable cursor - see the fetch documentation for an example.
Instead of returning all the results of your query at once back to your PHP script, it holds the results on the server side and you use a cursor to iterate through them getting one at a time.
Whilst I have not tested this, it is bound to have other drawbacks such as utilising more server resources and most likely reduced performance due to additional communication with the server.
Altering the fetch style may also have an impact: by default, the documentation indicates each row is stored as both an associative array and a numerically indexed array, which is bound to increase memory usage.
As others have suggested, reducing the number of results in the first place is most likely a better option if possible.
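A rough sketch of both ideas (note that pdo_mysql may silently ignore the scrollable-cursor request, in which case only the fetch-style change will make a difference):
// Sketch: request a scrollable cursor and fetch associative arrays only,
// so each row is not stored twice (numeric + associative keys).
$stmt = $pdo->prepare('SELECT home, away FROM round', array(
    PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL,
));
$stmt->execute();

while ($row = $stmt->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_NEXT)) {
    do_stuff($row);
}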