I have a PHP script on webserver 1 located in country A. I have a DB server located in country B.
The php script queries a large table from the DB server, groups the results, and inserts them back into the DB server (to another table). This is done by a single query (INSERT INTO SELECT...)
My question here is, does the data actually transfer between the web server and the DB server? E.g. is this using gigabytes of bandwidth on both servers?
If you never pull any retrieved data back to webserver 1, then the query won't send any row data to it. Basically, if you just run the query, the only data that's sent is the text of the query itself (e.g. INSERT INTO ... SELECT ...), which is just a few bytes, and then the response, which is just a small success/fail status. That's it.
To say it another way, the data from the SELECT part of your INSERT INTO SELECT query is all dealt with on the DB server, it's never sent to webserver 1.
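For example (hypothetical table and column names), running something like this from webserver 1 only sends the statement text across the link; the grouped rows are read and written entirely on the DB server:
// only this string travels to the DB server; the SELECT's rows never leave it
$mysqli->query("INSERT INTO grouped_results (group_col, total)
                SELECT group_col, SUM(amount) FROM big_table GROUP BY group_col");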
Even if you did run a SELECT query against the remote database, you wouldn't get all of the results back at once. You actually get a resource. This resource is a reference to a set of data still held on the remote database. When you call something like fetch_row on that resource, it fetches the next row, and that is the point at which the data is transferred.
You can test this by monitoring the memory usage of your PHP script at various points in its execution using memory_get_usage. Try:
echo "Memory before query: " . memory_get_usage() . "\n";
$result = $mysqli->query("your select query");
echo "Memory after query: " . memory_get_usage() . "\n";
$data = array();
$i=1;
while ($row = $result->fetch_row()) {
$data[] = $row;
echo "Memory after reading row " . $i++ . ": " . memory_get_usage() . "\n";
}
You should see a very small increase in used memory after your SELECT, and then a steady increase as you iterate over the results.
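Note that this test is most meaningful with an unbuffered result set. By default mysqli buffers the whole result in the client library as soon as the query runs; if you want to be sure the rows really stay on the DB server until you fetch them, a small variation on the snippet above should show it:
$result = $mysqli->query("your select query", MYSQLI_USE_RESULT);
while ($row = $result->fetch_row()) {
    // each fetch_row() call pulls the next row across the connection
}
$result->free(); // an unbuffered result must be freed before running another query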
Related
First off, I know we have to get off of the deprecated PHP mysql functions and move to mysqli or PDO. However, that transition won't be happening here for a few weeks and I need to get this working, like, ASAP.
Basically, I have code that works fine on our old server (PHP 5.2.13), as well as for smaller queries on our new server (PHP 5.4.20), but for larger queries it will only return a partial record set and then just... die, I guess? Which record it dies on depends on the query, but it pretty much always dies somewhere in the range of record 10k to 15k. I suspect it is dying because of some php.ini setting that imposes a limit, but I have no idea which one. I've streamlined the code to the essentials here:
$query = $my_query;
$result = mysql_query($query) or die(mysql_error());
$record_count = mysql_num_rows($result);

echo "Query has returned " . $record_count . " records.<br>";

$y = 0;
while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
    echo "START";
    echo $y . " ";
    foreach ($row as $key => $value) {
        echo $value . " ";
    }
    $y = $y + 1;
    echo "END" . "<br>";
}
echo "GOT OUT OF THERE!";
So yeah, the record_count echo says the query returned about 250k records, but the loop only gets through somewhere between 10k and 15k of them, echoes a final "END", and then just plain stops. It never gets back to the next "START", nor does it ever reach "GOT OUT OF THERE!". And again, this exact same code works fine on our old server, as well as for smaller queries on our new server.
Anyone have any ideas what the issue is?
It's probably just timing out. You can override the server's default timeout settings for an individual script by adding this line:
set_time_limit(0);
This will allow the script to run forever. If you want to set a different time limit, the parameter is in seconds, so for instance, this will allow the script to run for 5 minutes:
set_time_limit(300);
I have to pull back a lot of information and, as a result, my page is loading in about 22-24 seconds. Is there anything I can do to optimize my code?
Here is my code:
<?php
$result_rules = $db->query("SELECT source_id, destination_id FROM dbo.rules");
while ($row_rules = sqlsrv_fetch_array($result_rules)) {

    $result_destination = $db->query("SELECT pk_id, project FROM dbo.destination WHERE pk_id=" . $row_rules['destination_id'] . " ORDER BY project ASC");
    while ($row_destination = sqlsrv_fetch_array($result_destination)) {

        echo "Destination project: ";
        echo "<span class='item'>" . $row_destination['project'] . "</span>";
        echo "ID: " . $row_rules['destination_id'] . "<br>";

        if ($row_rules['source_id'] == null) {
            echo "Source ID for Destination ID " . $row_rules['destination_id'] . " is NULL<br>";
        } else {
            $result_source = $db->query("SELECT pk_id, project FROM dbo.source WHERE pk_id=" . $row_rules['source_id'] . " ORDER BY project ASC");
            while ($row_source = sqlsrv_fetch_array($result_source)) {
                echo "Source project: ";
                echo $row_source['project'];
                echo " ID: " . $row_rules['source_id'] . "<br>";
            }
        }
    }
}
?>
Here's what my tables look like:
Source table: pk_id:int, project:varchar(50), feature:varchar(50), milestone:varchar(50), reviewGroup:varchar(125), groupId:int
Rules table: pk_id:int, source_id:int, destination_id:int, login:varchar(50), status:varchar(50), batchId:int, srcPGroupId:int, dstPGroupId:int
Destination table: pk_id:int, project:varchar(50), feature:varchar(50), milestone:varchar(50), QAAssignedTo:varchar(50), ValidationAssignedTo:varchar(50), Priority:varchar(50), groupId:int
If you want help with optimizing queries then please provide details of the schema and the output of the explain plan.
Running nested loops is bad for performance. Running queries inside nested loops like this is a recipe for VERY poor performance. Using '*' in SELECT is bad for performance too (particularly as you're only ever using a couple of columns).
You should start by optimizing your PHP and merging the queries:
$result_rules = $db->query(
    "SELECT r.destination_id, [whatever fields you need from dbo.rules],
            dest.project AS dest_project,
            src.project AS src_project,
            src.pk_id AS src_id
     FROM dbo.rules r
     INNER JOIN dbo.destination dest
             ON dest.pk_id = r.destination_id
     LEFT JOIN dbo.source src
             ON src.pk_id = r.source_id
     ORDER BY r.destination_id, dest.project, src.project");

$last_dest = false;
$last_src = false;
while ($row = sqlsrv_fetch_array($result_rules)) {
    if ($row['destination_id'] !== $last_dest) {
        echo "Destination project: ";
        echo "<span class='item'>" . $row['dest_project'] . "</span>";
        echo "ID: " . $row['destination_id'] . "<br>";
        $last_dest = $row['destination_id'];
    }
    if (null === $row['src_id']) {
... I'll let you sort out the rest.
Add an index on (pk_id, project) so it includes all fields important for the query.
Make sure that pk_id is indexed: http://www.w3schools.com/sql/sql_create_index.asp
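For example (a sketch in T-SQL, since the sqlsrv_* calls suggest SQL Server; the index names are made up, and if pk_id is already the clustered primary key it is the extra project column that makes these covering):
CREATE INDEX IX_destination_pkid_project ON dbo.destination (pk_id, project);
CREATE INDEX IX_source_pkid_project ON dbo.source (pk_id, project);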
Rather than using select *, return only the columns you need, unless you need all of them.
I'd also recommend moving your SQL code to the server and calling the stored procedure.
You could consider using LIMIT if your back end is mysql: http://php.about.com/od/mysqlcommands/g/Limit_sql.htm .
I'm assuming that the else clause is what's slowing down your code. I would suggest fetching all the data you're going to need at the start and then accessing the array again in the else clause. Basically, you don't need this to run every time:
$result_destination = $db->query("SELECT * FROM dbo.destination WHERE pk_id=" . $row_rules['destination_id'] . " ORDER by project ASC")
You could grab the data earlier and use PHP to iterate over it.
$result_destinations = $db->query("SELECT * FROM dbo.destination ORDER by project ASC")
And then later in your code use PHP to determine the correct destination. Depending on exactly what you're doing it should shave some amount of time off.
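For instance, a rough sketch of that idea (keeping the same $db wrapper and sqlsrv_* calls as in the question):
// load every destination once, indexed by pk_id, before entering the rules loop
$destinations = array();
$result_destinations = $db->query("SELECT pk_id, project FROM dbo.destination ORDER BY project ASC");
while ($row = sqlsrv_fetch_array($result_destinations)) {
    $destinations[$row['pk_id']] = $row;
}

// later, inside the rules loop, look the destination up in PHP instead of querying again:
// $row_destination = $destinations[$row_rules['destination_id']];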
Another consideration is the time it takes for your browser to render the html generated by your php code. The more data you are presenting, the longer it's going to take. Depending on the requirements of your audience, you might want to display only x records at a time.
There are jquery methods of increasing the number of records displayed without going back to the server.
For starters you would want to lower the number of queries run. Doing a query, looping through those results and running another query for each row, then looping through that result set and running more queries, is generally considered bad. The number of queries multiplies with every level of nesting.
For example, say the first query returns 100 rows and each sub-query returns 10 rows. You loop over those 100 rows and run a query for each one; you are now at 101 queries. If you then run another query for each of the 1,000 rows (100 × 10) those sub-queries return, you are at 1,101 queries. Each query has to send data to the server (the query text), wait for a response and get data back. That round trip, repeated a thousand times, is what takes so long.
Use a join to do a single query on all the tables and loop over the single result.
I have a database with lots of game objects,
which is being queried by the following 3 PHP scripts.
List objects: gets a JSON object with all the items I need
Add object: adds an object to the database
Reset: wipes all objects from the table
All three of them work, somewhat. Although, there is a timing mismatch. When the game calls the reset function, it restarts. When the game restarts, it automatically loads all the objects. Unfortunately, and here's the problem, if the game has just been reset, objects will still be pulled by script 1.
I know of transactions, but I have never used them and I have no idea how I would implement those here, since my transaction is involving things from different scripts that are run at different times.
For bonus credit: will this setup (AS3 > php > MySQL) get me in trouble with a heavy load? The game might get picked up by 10, 100, 1000 people, is there anything I can read about that subject?
Edit: new idea/question
Currently, the wiping works as such: The objects table has a field 'deleted' which is set to '1' when the reset method is called. It might be smarter to copy the existing data into an archive table and then truncate the live table...
Edit: Here's the (relevant) PHP code I'm using
Add Object:
if ($db_found) {

    $x    = $_GET['x'];
    $y    = $_GET['y'];
    $type = $_GET['type'];
    $name = $_GET['name'];
    $text = $_GET['text'];

    $SQL = "INSERT INTO bodies (x, y, type, name, text)
            VALUES ('" . $x . "','" . $y . "','" . $type . "','" . $name . "','" . $text . "')";

    if (!mysql_query($SQL)) {
        die('Error: ' . mysql_error());
    }
}
mysql_close($db_handle);
List/Get Objects:
if ($db_found) {

    $SQL = "SELECT * FROM bodies WHERE deleted = 0";
    $res = mysql_query($SQL);

    $rows = array();
    while ($r = mysql_fetch_assoc($res)) {
        print $r['x'] . ','
            . $r['y'] . ','
            . $r['type'] . ','
            . $r['name'] . ','
            . $r['text'] . ';';
    }
}
mysql_close($db_handle);
Reset: (EDIT 2)
mysql_query("LOCK TABLES bodies WRITE;");
$SQL = " DELETE FROM bodies";
if (!mysql_query($SQL))
{
die('Error: ' . mysql_error());
}
};
mysql_query("UNLOCK TABLES;");
How to do Transactions in MySQL.
In your case you are probably interested in the atomicity and isolation of transactions, meaning that when restarting a game, you want to ensure that nobody can fetch any of your intermediate data until the reset has fully finished. Doing the reset inside a transaction will ensure this property*. (* for TRUNCATE see below)
You will need InnoDB as your Engine for all tables that are involved in your transactions. MyISAM does not support transactions.
Changing large amounts of data inside a transaction can potentially cause high query delays, as transactions use special undo/redo logs to be able to undo everything you did in the transaction if you decide to ROLLBACK.
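A minimal sketch of what that could look like with the mysql_* calls from the question (this assumes the bodies table is InnoDB):
// wrap the wipe in a transaction so readers never see a half-finished reset
mysql_query("START TRANSACTION") or die(mysql_error());
if (!mysql_query("DELETE FROM bodies")) {
    mysql_query("ROLLBACK");
    die('Error: ' . mysql_error());
}
mysql_query("COMMIT");
// until the COMMIT, the List script still sees the old rows; after it, the table
// is empty -- there is no half-deleted state visible to other connections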
I wouldn't wipe the tables when starting a new game. Instead, give your data a game_id and use a new game_id when starting a new game. Space shouldn't really be an issue nowadays. This has the advantage that you will need little to no table locking when resetting the game.
If you must, be sure to use TRUNCATE when clearing out the tables. As far as I know TRUNCATE in MySQL cannot be rolled back, so doing it inside a transaction won't do anything useful.
I think PHP/MySQL will perform fine if used correctly, even for larger visitor counts. You can use profiling tools like xdebug or the MySQL slow query log to trace and remove performance bottlenecks.
I have a bit of a problem when I try to move a huge amount of data from a MySQL table to a Redis database. I'm getting the error "MySQL server has gone away" after a while and I have no idea why.
EDIT:
Or, when I use the commented-out code that breaks the loop, it just says "finished" when it isn't finished.
This is the PHP code I use (run by php-cli):
<?php
require 'Predis/Autoloader.php';
Predis\Autoloader::register();

mysql_connect('localhost', 'root', 'notcorrect') or die(mysql_error());
mysql_select_db('database_that_i_use') or die(mysql_error());

$redis = new Predis\Client();

// starting at 0, but had to edit this when it crashed :(
for ($i = 3410000; $i < 999999999999; $i += 50000) {
    echo "Query from $i to " . ($i + 50000) . ", please wait...\n";

    $query = mysql_unbuffered_query('SELECT * FROM table LIMIT ' . $i . ', 50000') or die(mysql_error());

    // This was code I used before, but for some reason it became true when it wasn't supposed to.
    // (mysql_num_rows() cannot be used on an unbuffered result until all rows have been
    // fetched, which is why this check misfires.)
    /*
    if (mysql_num_rows($query) == 0) {
        echo "Script finished!\n";
        break;
    }
    */

    while ($r = mysql_fetch_assoc($query)) {
        $a = array(
            'campaign_id' => $r['campaign_id'],
            'criteria_id' => $r['criteria_id'],
            'date_added'  => $r['date_added'],
        );
        $redis->hmset($r['user_id'], $a);
        unset($a);
        usleep(10);
    }

    echo "Query completed for 50000 rows..\n";
    sleep(2);
}

unset($redis);
?>
My question is how to do this better; I seriously have no idea why it crashes. My server is pretty old and slow and maybe can't handle this large amount of data? This is just a test server before we switch to real production.
Worth noticing is that the script ran fine for maybe half an hour, and it may be the LIMIT clause that makes it very slow once the offset gets high. Is there an easier way to do this? I need to transfer all the data today! :)
Thanks in advance.
EDIT: running example:
Query from 3410000 to 3460000, please wait...
Query completed for 50000 rows..
Query from 3460000 to 3510000, please wait...
Query completed for 50000 rows..
Query from 3510000 to 3560000, please wait...
Query completed for 50000 rows..
Query from 3560000 to 3610000, please wait...
MySQL server has gone away
EDIT:
The table consists of ~5 million rows of data and is approx. 800 MB in size.
But I need to do similar things for even larger tables later on..
First, you may want to use another scripting language. Perl, Python, Ruby, anything is better than PHP for running this kind of script.
I cannot comment on why the mysql connection is lost, but to get better performance you need to try to eliminate as many roundtrips as you can with the mysql server and the redis server.
It means:
you should not use unbuffered queries but buffered ones (provided LIMIT is used in the query)
OR
you should not iterate over the mysql data using LIMIT with an increasing offset, since that gives you quadratic complexity where it should only be linear: for every chunk, MySQL has to scan and discard all the rows before the offset. I don't know if it can be avoided in PHP though.
you should pipeline the commands you send to Redis
Here is an example of pipelining with Predis:
https://github.com/nrk/predis/blob/v0.7/examples/PipelineContext.php
Actually, if I really had to use PHP for this, I would export the mysql data in a text file (using "select into outfile" for instance), and then read the file and use pipelining to push data to Redis.
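If you do stay with PHP, here is a rough sketch combining both suggestions: paging by key instead of by LIMIT offset, and pipelining the Redis writes. It assumes the table has an indexed, unique numeric user_id column to page on, which is an assumption on my part:
<?php
require 'Predis/Autoloader.php';
Predis\Autoloader::register();

mysql_connect('localhost', 'root', 'notcorrect') or die(mysql_error());
mysql_select_db('database_that_i_use') or die(mysql_error());

$redis = new Predis\Client();

$last_id = 0; // resume point; keyset pagination instead of an ever-growing LIMIT offset
do {
    $result = mysql_query('SELECT * FROM table WHERE user_id > ' . (int)$last_id .
                          ' ORDER BY user_id LIMIT 50000') or die(mysql_error());
    $rows = 0;

    // send the whole chunk to Redis in one pipeline instead of one round trip per row
    $redis->pipeline(function ($pipe) use ($result, &$last_id, &$rows) {
        while ($r = mysql_fetch_assoc($result)) {
            $pipe->hmset($r['user_id'], array(
                'campaign_id' => $r['campaign_id'],
                'criteria_id' => $r['criteria_id'],
                'date_added'  => $r['date_added'],
            ));
            $last_id = $r['user_id'];
            $rows++;
        }
    });

    echo "Copied rows up to user_id $last_id\n";
} while ($rows == 50000); // a short page means we reached the end of the table
?>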
I have a script that reads a text file into an array of website URLs, makes an API call for each one, and then inserts the results into MySQL. But when it runs in bulk, the response from the API server is slow, and because of this many of the results come back blank. What I'm looking for is a way to pause the loop for each call, say 5 seconds, to give the API server time to respond so I don't get blank results.
The code is below:
// connect to your database
mysql_connect("localhost", "root", "password");
mysql_select_db("somedb");

// read your text file into an array, one URL per line (strip the trailing newlines)
$lines = file('filename.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// loop through each website URL
foreach ($lines as $website_url) {
    // make the request to the Compete API
    $response = file_get_contents("http://apps.compete.com/sites/" . $website_url . "/trended/rank/?apikey=0sdf456sdf12sdf1");

    // decode the response as an associative array
    $response = json_decode($response, true);

    // get the rank from the response
    $rank = $response['something'];

    // insert the website URL and its rank
    mysql_query("INSERT INTO website_ranks (website_url, rank) VALUES ('" . mysql_real_escape_string($website_url) . "', " . (int)$rank . ")");
}
Use the sleep function:
sleep(5);
Wouldn't it make more sense to verify the server responded rather than sleeping for an arbitrary amount of time?
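For example, something along these lines inside the foreach loop (a sketch; the retry count and delay are arbitrary):
$response = false;
for ($attempt = 0; $attempt < 3 && $response === false; $attempt++) {
    if ($attempt > 0) {
        sleep(5); // back off before retrying
    }
    // @ suppresses the warning on failure so we can retry instead of aborting
    $response = @file_get_contents("http://apps.compete.com/sites/" . $website_url . "/trended/rank/?apikey=0sdf456sdf12sdf1");
}
if ($response === false) {
    continue; // still no answer after 3 attempts; skip this URL rather than inserting a blank row
}
$response = json_decode($response, true);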
One alternate approach I'd recommend, instead of running a separate INSERT statement on every loop iteration, is using your RDBMS's multi-row (bulk) INSERT syntax. This will speed up the process considerably.
In MySQL that syntax looks like this:
INSERT INTO tablename (col1, col2) VALUES (val1a, val1b), (val2a, val2b), (val3a, val3b)
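A rough sketch of how that could look in the script above (collect the values first, then issue a single query; the helper array name is made up):
// collect one "(url, rank)" tuple per line, then insert them all at once
$values = array();
foreach ($lines as $website_url) {
    $response = json_decode(file_get_contents("http://apps.compete.com/sites/" . $website_url . "/trended/rank/?apikey=0sdf456sdf12sdf1"), true);
    $rank = isset($response['something']) ? (int)$response['something'] : 0;
    $values[] = "('" . mysql_real_escape_string($website_url) . "', " . $rank . ")";
}
if (!empty($values)) {
    mysql_query("INSERT INTO website_ranks (website_url, rank) VALUES " . implode(", ", $values)) or die(mysql_error());
}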