MySQL queries optimization - php

So I've been working on an IRC game for a year now, written in PHP and using a PHP-to-IRC framework.
Recently, I added the ability to archive scores (they are reset every couple hundred games), which forced me to update various admin functions.
I've just updated a function that lets me merge two players (some users don't bother looking for their old password, etc.) so that it merges archived scores too (in case a reset has occurred before I find the duplicate accounts).
The score-merging part (below) works as intended, but I'm wondering whether I can optimize the process, because I find it rather heavy and can't think of anything better:
$from_stats = $this->db->query("SELECT `games`, `wins`, `points`, `date_archive` FROM ".$this->dbprefix."score WHERE `id`=".$id1." AND `channel`='".$gamechan."' GROUP BY `date_archive`"); // get scores for the original account
$to_stats = $this->db->query("SELECT `games`, `wins`, `points`, `date_archive` FROM ".$this->dbprefix."score WHERE `id`=".$id2." AND `channel`='".$gamechan."' GROUP BY `date_archive`"); // get scores for the duplicated account
$from_games = array();
$from_wins = array();
$from_points = array();
$from_date = array();
while (list($fromstats_games,$fromstats_wins,$fromstats_points,$fromstats_date) = $this->db->fetchRow($from_stats)) { // build score arrays for the original account
$from_games[count($from_games)] = $fromstats_games;
$from_wins[count($from_wins)] = $fromstats_wins;
$from_points[count($from_points)] = $fromstats_points;
$from_date[count($from_date)] = $fromstats_date;
}
$to_games = array();
$to_wins = array();
$to_points = array();
$to_date = array();
while (list($tostats_games,$tostats_wins,$tostats_points,$tostats_date) = $this->db->fetchRow($to_stats)) { // build score arrays for the duplicated account
$to_games[count($to_games)] = $tostats_games;
$to_wins[count($to_wins)] = $tostats_wins;
$to_points[count($to_points)] = $tostats_points;
$to_date[count($to_date)] = $tostats_date;
}
foreach ($from_date as $key1 => $id1_date) {
foreach ($to_date as $key2 => $id2_date) {
if ($id1_date == $id2_date) { // merge scores if dates match
$from_games[$key1] += $to_games[$key2];
$from_wins[$key1] += $to_wins[$key2];
$from_points[$key1] += $to_points[$key2];
$this->db->query("UPDATE ".$this->dbprefix."score SET `games`=".$from_games[$key1].", `wins`=".$from_wins[$key1].", `points`=".$from_points[$key1]." WHERE `id`=".$id1." AND `channel`='".$gamechan."' AND `date_archive`='".$id1_date."'");
break;
}
}
}
$this->db->query("DELETE FROM ".$this->dbprefix."score WHERE `id`=".$id2); // delete all entries for the duplicated account

Just one tip: after all of this, run the following query (if you have the appropriate privileges):
$this->db->query("OPTIMIZE TABLE ".$this->dbprefix."score");
This should cause all indexes in this table to be rebuilt. You may notice that the index file size has dropped afterwards (to around 1 KB, or even a few bytes, on a small table).
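As for the nested loops in the question: one possible way to lighten the matching step is to index the duplicate account's rows by date_archive, so each date is found with a single array lookup instead of an inner loop. This is only a sketch, not the asker's code. mergeScoresByDate() is a hypothetical helper, and it assumes the rows have already been fetched as associative arrays with the games, wins, points and date_archive columns from the queries above.

```php
<?php
// Hypothetical helper (not part of the original code): merges the scores of
// two accounts wherever their archive dates match.
// Each row is assumed to be an associative array with the keys
// games, wins, points and date_archive.
function mergeScoresByDate(array $fromRows, array $toRows): array
{
    // Index the duplicate account's rows by archive date for direct lookup.
    $toByDate = [];
    foreach ($toRows as $row) {
        $toByDate[$row['date_archive']] = $row;
    }

    $merged = [];
    foreach ($fromRows as $row) {
        $date = $row['date_archive'];
        if (!isset($toByDate[$date])) {
            continue; // no matching date, so no UPDATE is needed
        }
        $merged[$date] = [
            'games'  => $row['games'] + $toByDate[$date]['games'],
            'wins'   => $row['wins'] + $toByDate[$date]['wins'],
            'points' => $row['points'] + $toByDate[$date]['points'],
        ];
    }
    return $merged;
}
```

Each entry of the result then corresponds to one UPDATE for the original account, keyed by date_archive, just like in the original loop.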

Related

Recursive call for cron job

I have three PHP scripts, two of which run as separate cron jobs each day on the server.
The first script (get_products.php) makes a cURL call to an external server to fetch all product data and stores it in the database. Note that there are around 10000 products, and the number increases each day.
The second script (find_related.php) selects the products stored by the first script, performs some operations, and stores the resulting data in another table. Each product has 10 rows, so this table holds around 100000 rows. This script runs as a cron job. Sometimes it does not run to completion, so the expected results are not stored in the database. I included this line in the script: ini_set('max_execution_time', '3600');
But it does not work.
Here is what the script does:
The task is to find 10 related products based on tags. I have around 10300 products stored in my DB. For each product, the script takes the product and its tags and, for each tag, tries to randomly find one other product with the same tag, then stores the related product's data in another table for the third script. Only one product per tag is allowed. If it does not find a total of 10 related products, it fills the remainder with random products from another table named bestseller_products.
Here is my code:
$get_all_products = mysql_query('SELECT * FROM store_products');
while($get_products_sql_res = mysql_fetch_array($get_all_products)){
$related_products = array();
$tags = explode(",",$get_products_sql_res['product_tags']);
$product_id = $get_products_sql_res['product_id'];
$product_handle = $get_products_sql_res['product_handle'];
$get_products_sql = mysql_query('SELECT * FROM related_products WHERE product_handle="'.$product_handle.'"');
if (mysql_num_rows($get_products_sql)==0)
{
$count = 0;
foreach($tags as $t){
$get_related_products_sql = mysql_query("SELECT product_handle, product_title, product_image FROM store_products WHERE product_tags like '%".$t."%' AND product_id != '".$product_id."' ORDER BY RAND()");
if(!$get_related_products_sql){
continue;
}
while($get_related_products = mysql_fetch_array($get_related_products_sql) ){
$related_product_title = mysql_real_escape_string($get_related_products['product_title']);
$found = false;
foreach($related_products as $r){
if($r['handle'] == $get_related_products['product_handle']){
$found = true;
break;
}
}
if($found == false){
$related_products[$count]['handle'] = $get_related_products['product_handle'];
mysql_query("INSERT INTO related_products (product_handle, product_id, related_product_title, related_product_image, related_product_handle) VALUES ('$product_handle','$product_id','$related_product_title', '$get_related_products[2]', '$get_related_products[0]')");
$count = $count + 1;
break;
}
}
}
if($count < 10){
$bestseller_products = mysql_query("SELECT product_handle, product_title, product_image FROM bestseller_products WHERE product_id != '".$product_id."' ORDER BY RAND() LIMIT 10");
while($bestseller_products_sql_res = mysql_fetch_array($bestseller_products)){
if($count < 10){
$found = false;
$related_product_title = mysql_real_escape_string($bestseller_products_sql_res['product_title']);
$related_product_handle = $bestseller_products_sql_res['product_handle'];
foreach($related_products as $r){
if($r['handle'] == $related_product_handle){
$found = true;
break;
}
}
if($found == false){
$related_product_image = $bestseller_products_sql_res['product_image'];
mysql_query("INSERT INTO related_products (product_handle, product_id, related_product_title, related_product_image, related_product_handle) VALUES ('$product_handle','$product_id','$related_product_title', '$related_product_image', '$related_product_handle')");
$count = $count + 1;
}
}
}
}
}
}
The third script (create_metafields.php) creates metafields on the external server using the data produced by the second script. The same problem arises there as in the second script.
So I want to run the second script in parts: not process all 10000 products in one call, but in batches (1-500, 501-1000, 1001-1500, ...). However, I don't want to create separate cron jobs. Please suggest a solution if you have one. I really need to figure this out.
Thanks in advance!
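One way the batching described above could work with a single cron job is to run that job more often and let each run process only the next slice, remembering where it stopped. The following is a rough sketch under that assumption. nextBatch() is a hypothetical helper, and the state file path and the ORDER BY product_id in the commented usage are illustrative, not taken from the original script.

```php
<?php
// Hypothetical helper: given the offset saved by the previous run, return the
// LIMIT/OFFSET to use for this run and the offset to save for the next run.
function nextBatch(int $offset, int $batchSize, int $total): array
{
    if ($offset >= $total) {
        $offset = 0; // every product has been processed, start over
    }
    $limit = min($batchSize, $total - $offset);
    return [
        'offset' => $offset,
        'limit'  => $limit,
        'next'   => $offset + $limit,
    ];
}

// Illustrative usage inside find_related.php (file path is an assumption):
// $saved = (int) @file_get_contents('/tmp/find_related.offset');
// $batch = nextBatch($saved, 500, 10300);
// $sql   = sprintf(
//     'SELECT * FROM store_products ORDER BY product_id LIMIT %d OFFSET %d',
//     $batch['limit'],
//     $batch['offset']
// );
// file_put_contents('/tmp/find_related.offset', $batch['next']);
```

With a batch size of 500, each run stays short, and the same cron entry simply fires more often (for example every few minutes) until all products are covered.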

MySQL + PHP millions of rows, sorting the data from multiple queries. Memory Consumption

So I have 2 tables. One table has all the actual campaign_id information.
The second table has the impression/statistic information for the campaign_ids.
I have a table on the page (I use AJAX, but that's beside the point). I want to sort by a column, but all the rows are generated from the campaign table: I first compute the statistics for every campaign and then link them to each row. Only after all the data is loaded do I sort it. This uses a MASSIVE amount of memory and resources. Is this efficient at all? Is there a better way to sort huge amounts of data?
// I have to increase memory because the sorting takes a lot of resource
ini_set('memory_limit','1028M');
// the column I want to sort
$sortcolumn = $this->input->post('sortcolumn');
// direction of the sort ASC/DESC
$sortby = $this->input->post('sortby');
// millions of impression data that is linked with campaign_id
$cdata= array();
$s = "SELECT report_campaign_id,";
$s .= "SUM(report_imps) AS imps ";
$s .= "FROM Campaign_Impressions "; // trailing space so GROUP BY is not glued to the table name
$s .= "GROUP BY report_campaign_id ";
$r = $this->Qcache->result($s,0,'campaignsql');
foreach($r as $c) {
$cdata[$c->report_campaign_id]['imps'] = ($c->imps) ? $c->imps : 0;
}
// 500,000+ campaigns
// I draw my table from these campaigns
$rows = array();
$s = "SELECT * FROM Campaigns ";
$r = $this->db->query($s)->result();
foreach($r as $c)
{
$row= array();
$row['campaign_id'] = $c->campaign_id;
// other campaign info here...
// campaign statistics here...
$row['campaign_imps'] = $cdata[$c->campaign_id]['imps'];
// table row
$rows[] = $row;
}
// prepare the columns i want to sort
$sortc = array();
foreach($rows as $sortarray) {
if (!isset($sortarray[ $sortcolumn ])) continue;
$sortc[] = str_replace(array('$',','),'',$sortarray[ $sortcolumn ]);
}
// sort columns and direction
array_multisort($sortc,(($sortby==='asc')?SORT_ASC:SORT_DESC),SORT_NATURAL,$rows);
As you can see, the query on the "Campaign_Impressions" table computes statistics for every campaign, which doesn't seem very efficient, but it is still better than running one query per row.
(I don't display all the campaigns, but I need to process every one of them to determine the overall sort order.)
You should let MySQL do the job by using ORDER BY.
If this still takes a lot of time on the MySQL side, consider adding indexes on the columns you sort by.
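To make the ORDER BY suggestion concrete, here is a minimal sketch of a query builder that lets MySQL aggregate, sort and page in one statement, so PHP only ever receives the rows it displays. buildCampaignQuery() and its whitelist are hypothetical. The table and column names come from the question, and the sort direction and column are validated because they arrive from POST data.

```php
<?php
// Hypothetical helper: builds one query that aggregates, sorts and pages
// inside MySQL instead of sorting all rows in PHP.
function buildCampaignQuery(string $sortColumn, string $sortDir, int $limit, int $offset): string
{
    // Whitelist of sortable columns: user input is never placed in ORDER BY directly.
    $sortable = [
        'campaign_id'   => 'c.campaign_id',
        'campaign_imps' => 'campaign_imps',
    ];
    $column = $sortable[$sortColumn] ?? 'c.campaign_id';
    $dir    = strtolower($sortDir) === 'asc' ? 'ASC' : 'DESC';
    $limit  = max(0, $limit);
    $offset = max(0, $offset);

    return "SELECT c.campaign_id, COALESCE(SUM(i.report_imps), 0) AS campaign_imps "
         . "FROM Campaigns c "
         . "LEFT JOIN Campaign_Impressions i ON i.report_campaign_id = c.campaign_id "
         . "GROUP BY c.campaign_id "
         . "ORDER BY $column $dir "
         . "LIMIT $limit OFFSET $offset";
}
```

Whether this is fast enough on millions of impression rows depends on the indexes available. An index on report_campaign_id is the obvious first candidate, and a pre-aggregated totals table may be needed if the SUM itself is the bottleneck.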

DynamoDb retrieve data Order by descending order and then use pagination like sql syntax

I am having trouble retrieving records in descending order with a pagination limit from Amazon DynamoDB, as I would in MySQL.
I am currently using the following script, but it returns an unordered list of records. I need the last inserted ID to be at the top.
$limit = 10;
$total = 0;
$start_key = null;
$params = array('TableName' => 'event','AttributesToGet' =>array('id','interactiondate','repname','totalamount','fooding','nonfooding','pdfdocument','isMultiple','payment_mode','interaction_type','products','programTitle','venue','workstepId','foodingOther','interaction_type_other'), 'ScanFilter'=> array('manufacturername' => array("ComparisonOperator" => "EQ", "AttributeValueList" => array(array("S" => "$manufacturername")))),'Limit'=>$limit );
$itemsArray = array();
$finalItemsArray = array();
$finalCRMRecords = array();
do{
if(!empty($start_key)){
$params['ExclusiveStartKey'] = $start_key->getArrayCopy();
}
$response = $this->Amazon->Dynamodb->scan($params);
if ($response->status == 200) {
$counter = (string) $response->body->Count;
$total += $counter;
foreach($response->body->Items as $itemsArray){
$finalItemsArray[] = $itemsArray;
}
if($total>$limit){
$i =1;
foreach($response->body->Items as $items){
$finalItemsArray[] = $items;
if($i == $limit){
$start_key = $items->id->{AmazonDynamoDB::TYPE_NUMBER}->to_array();
$finalCRMRecords['data'] = $finalItemsArray;
$finalCRMRecords['start_key'] = $start_key;
break;
}
$i++;
}
}elseif($total<$limit){
$start_key = $response->body->LastEvaluatedKey->to_array();
}else{
$finalCRMRecords['data'] = $finalItemsArray;
if ($response->body->LastEvaluatedKey) {
$start_key =$response->body->LastEvaluatedKey->to_array();
break;
} else {
$start_key = null;
}
$finalCRMRecords['start_key'] = $start_key;
}
}
}while($start_key);
Regards
Sandeep Kumar Sinha
A Scan operation in DynamoDB cannot change the order of the returned items. Scan is also a pretty expensive operation, since it always has to read the whole table.
If you want to take advantage of DynamoDB, here's one piece of advice:
Instead of looking for information, try to just find it.
That is, use lookups instead of scan/query to get the information you need.
As an example, suppose you have a table that stores events. Just store all events in that table, with their EventId as the hash key. Then you can have a second table, EventLookups, to store lookups to EventIds. In the EventLookups table you could put an item like LookupId: LATEST-EVENT referencing some EventId: .... Every time you insert a new event, you update the LATEST-EVENT entry to point to the newer event. Or use a set to store the latest 50 EventIds in one item.
-mathias
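The "latest 50 EventIds" idea can be illustrated without touching DynamoDB at all. The sketch below only shows the bookkeeping: rememberLatest() is a hypothetical helper that works on a plain PHP array standing in for the lookup item, and it keeps the IDs in newest-first order, which is an assumption beyond what the answer states. Reading and writing the actual item is left out.

```php
<?php
// Hypothetical helper: keeps a newest-first list of the most recent event IDs.
// $latestIds stands in for the value stored in the lookup item.
function rememberLatest(array $latestIds, int $eventId, int $max = 50): array
{
    // Drop an existing copy of the ID so it is never listed twice.
    $latestIds = array_values(array_diff($latestIds, [$eventId]));
    // Newest ID goes first, then the list is trimmed to the allowed size.
    array_unshift($latestIds, $eventId);
    return array_slice($latestIds, 0, $max);
}
```

Showing the first page then becomes a matter of reading that one item and fetching the listed events by key, rather than scanning the whole table.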

php sql find and insert in empty slot

I have a game script set up, and when it creates a new character I want it to find an empty address for that player's house.
The two relevant table fields it inserts are 'city' and 'number'. The 'city' is a random number from 1 to 10, and the 'number' can be 1-250.
What it needs to do, though, is make sure there isn't already an entry in the 'HOUSES' table with the 2 random numbers it picks, and if there is, pick new numbers. Repeat until it finds an 'address' not in use, then insert it.
I have a method set up to do this, but I know it's shoddy; there's probably a more logical and easier way. Any ideas?
UPDATE
Here's my current code:
$found = 0;
while ($found == 0) {
$num = (rand()%250)+1; $city = (rand()%10)+1;
$sql_result2 = mysql_query("SELECT * FROM houses WHERE city='$city' AND number='$num'", $db);
if (mysql_num_rows($sql_result2) == 0) { $found = 1; }
}
You can either do this in PHP as you do or by using a MySQL trigger.
If you stick to the PHP way, then instead of generating a number every time, do something like this
$found = 0;
$cityarr = array();
$numberarr = array();
//create the cityarr
for($i=1; $i<=10;$i++)
$cityarr[] = $i;
//create the numberarr
for($i=1; $i<=250;$i++)
$numberarr[] = $i;
//shuffle the arrays
shuffle($cityarr);
shuffle($numberarr);
//iterate until you find an unused one
foreach($cityarr as $city) {
foreach($numberarr as $num) {
$sql_result2 = mysql_query("SELECT * FROM houses
WHERE city='$city' AND number='$num'", $db);
if (mysql_num_rows($sql_result2) == 0) {
$found = 1;
break;
}
}
if($found) break;
}
This way you don't check the same value more than once, and you still check in random order.
But you should really consider fetching all your records before the loops, so you only run one query. That would also improve performance a lot.
Like this:
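$found = 0;
$taken = array();
for($i=1; $i<=10;$i++)
$taken[$i] = array();
// fetch every occupied address with a single query
$records = mysql_query("SELECT * FROM houses", $db);
while($rec = mysql_fetch_assoc($records)) {
$taken[$rec['city']][] = $rec['number'];
}
$cityarr = array();
$numberarr = array();
for($i=1; $i<=10;$i++)
$cityarr[] = $i;
for($i=1; $i<=250;$i++)
$numberarr[] = $i;
// shuffle so the free address is still picked randomly
shuffle($cityarr);
shuffle($numberarr);
foreach($cityarr as $city) {
foreach($numberarr as $num) {
// the address is free if this number is not yet taken in this city
if(!in_array($num, $taken[$city])) {
$cityNotTaken = $city;
$numberNotTaken = $num;
$found = 1;
break;
}
}
if($found) break;
}
if($found) echo 'City ' . $cityNotTaken . ' number ' . $numberNotTaken . ' is not taken!';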
I would go with this method :-)
Doing it the way you describe can cause problems when only a couple of houses (or even just one) are left. It could take ages for the script to find an empty house.
What I recommend is inserting all 2500 records into the database (every combination of 1-10 with 1-250) and marking each one as empty or not (or creating a mapping table of user <> house) and matching on that.
With MySQL you can then select a random empty entry from the database in no time!
Because it's only 2500 records, you can use ORDER BY RAND() LIMIT 1 to get a random row. I don't recommend this when you have many more records.
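The same "choose only among the free slots" idea can also be sketched in plain PHP, which guarantees a single pass no matter how full the table is. pickFreeAddress() is a hypothetical helper. It assumes the occupied addresses have already been loaded as [city, number] pairs, for example with one SELECT city, number FROM houses.

```php
<?php
// Hypothetical helper: returns a random free [city, number] pair,
// or null when every address is already in use.
// $taken is a list of [city, number] pairs that are occupied.
function pickFreeAddress(array $taken, int $cities = 10, int $numbers = 250): ?array
{
    // Turn the occupied pairs into keys for direct lookup.
    $used = [];
    foreach ($taken as [$city, $number]) {
        $used["$city-$number"] = true;
    }

    // Collect every combination that is not occupied.
    $free = [];
    for ($city = 1; $city <= $cities; $city++) {
        for ($number = 1; $number <= $numbers; $number++) {
            if (!isset($used["$city-$number"])) {
                $free[] = [$city, $number];
            }
        }
    }

    if ($free === []) {
        return null;
    }
    return $free[array_rand($free)];
}
```

Two characters created at the same moment could still be handed the same address, so a UNIQUE index on (city, number) with a retry on insert failure is a sensible safeguard whichever approach is used.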

Show all keys with phpcassa

I'm fairly new to Cassandra, but I have been making good progress so far.
$conn = new ConnectionPool('Cluster');
$User = new ColumnFamily($conn, 'User');
$index_exp = CassandraUtil::create_index_expression('email', 'John#dsaads.com');
$index_clause = CassandraUtil::create_index_clause(array($index_exp));
$rows = $User->get_indexed_slices($index_clause);
foreach($rows as $key => $columns) {
echo $columns['name']."<br />";
}
I'm using this type of query to get specific data from somebody's email address.
However, I now want to do 2 things:
Count every user in the database and display the number
List every user in the database with $columns['name']." ".$columns['email']
In MySQL I would just remove the WHERE clause from the SELECT query; however, I think it's a little more complicated here?
In Cassandra there's no easy way to count all of the rows. You basically have to scan everything. If this is something that you want to do often, you're doing it wrong. Example code:
$rows = $User->get_range("", "", 1000000);
$count = 0;
foreach($rows as $row) {
$count += 1;
}
The second answer is similar:
$rows = $User->get_range("", "", 1000000, null, array("name", "email"));
foreach($rows as $key => $columns) {
echo $columns["name"]." ".$columns["email"];
}
Tyler Hobbs gives a very nice example.
However, if you have many users, you do not want to iterate over them all the time.
It is better to run this iteration once or twice per day and store the result in Cassandra or memcached / Redis.
I would also create a CF with a single row and put all usernames (or user keys) in that single row. However, some consider this an odd practice and some people will not recommend it. Then you do:
$count = $cf->get_count($rowkey = 0);
Note that get_count() is a slow operation too, so you still need to cache it.
If get_count() returns 100, you will need to upgrade phpcassa to the latest version.
As for the second part: if you have fewer than 4000-5000 users, I would once again do something odd and put them in a single row as supercolumns. Then the read takes just one operation:
$users = $scf->get($rowkey = 0, new ColumnSlice("", "", 5000));
foreach($users as $user){
echo $user["name"]." ".$user["email"];
}
