Insert dummy data into MySQL fast - PHP

I have one function in my debug model that I want to use to add dummy data to my app so I can test its speed and such.
The problem is that it needs to add records to two different tables and also check for duplicate usernames etc. before each record is added to the DB, so it takes a little time.
This whole procedure is repeated in a for loop for each of the $total dummy records I want to add at once.
For example, adding 100 new users takes around 5 seconds.
Is this time fine, or do I need to optimize it?
What if I want to add 1000 or 10000 users at once? Is that possible?
EDIT:
Function called to insert data:
public function registerRandomUsers($total = 1){
    $this->load->model("misc_model");
    $this->load->model("encryption_model");
    $this->load->model("signup_model");
    for ($i = 1; $i <= $total; $i++){
        $username = $this->misc_model->generateRandomString(15);
        $flag = false;
        while ($flag == false){
            if ($this->user_model->usernameExist($username)){
                $username = $this->misc_model->generateRandomString(15);
            } else {
                $flag = true;
                $password = 'Test123';
                $email = $username.'#email.com';
                $data = array(
                    'username' => $username,
                    'password' => $password,
                    'email' => $email
                );
                $this->signup_model->submitRegistration($data);
                $userdata = $this->user_model->getUserData($username, "username");
            }
        }
    }
}

If you're not worried about having a random string as the username, just set $email = 'user'.$i.'#email.com'; so you don't have to worry about collisions. The main reason this runs slowly is that you send a new query to the database on every iteration of the loop - it would be much, much faster to generate a single bulk insert string like:
INSERT INTO user (email,pass)
VALUES ('user1#email.com','Test123')
, ('user2#email.com','Test123')
, ('user3#email.com','Test123')
, ('user4#email.com','Test123')
, ('user5#email.com','Test123');
This way you avoid the TCP overhead of sending 10000 separate queries to the database and have it all done in one go.
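As a rough sketch of that idea in PHP (plain mysqli here rather than CodeIgniter's database layer; the $mysqli connection, table and column names are just assumptions for illustration):
$total  = 10000;
$values = array();
for ($i = 1; $i <= $total; $i++) {
    // 'user'.$i guarantees unique usernames/emails, so no duplicate checks are needed
    $values[] = "('user" . $i . "#email.com', 'Test123')";
}
// One round trip to MySQL instead of $total separate INSERT queries
$sql = "INSERT INTO user (email, pass) VALUES " . implode(", ", $values);
$mysqli->query($sql);
If $total grows very large, keep an eye on max_allowed_packet and split the VALUES list into a few chunks.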

I think if you're really looking for some realistic sample/test data, you should use generatedata.com.
http://www.generatedata.com/
It's one of the best I have seen so far.

Build your query like this:
$conjunctions = str_repeat("('dummy#email.com','test pass'),", 20); // 20 dummy rows
$query = "INSERT INTO user (email,pass) VALUES " . substr($conjunctions, 0, strlen($conjunctions) - 1) . ";";
// ^ substr() removes the trailing comma

Related

How to handle a large post request

Unfortunately I can't show you the code, but I can give you an idea of what it looks like, what it does, and what problem I have...
<?php
include('db.php');
include('tools.php');
$c = new GetDB(); // Connection to DB
$t = new Tools(); // Classes to clean, prevent XSS and others
if (isset($_POST['var'])) {
    $nv = json_decode($_POST['var']);
    foreach ($nv as $k) {
        $id = $t->clean($k->id);
        // ... goes on for about 10 keys
        // this might seem redundant or insufficient
        $id = $c->real_escape_string($id);
        // ... goes on for the rest of the keys...
        $q = $c->query("SELECT * FROM table WHERE id = '$id'");
        $r = $q->fetch_row();
        if ($r[1] > 0) {
            // Item exists in DB, so just UPDATE
            $q1 = $c->query(UPDATE TABLE1);
            $q4 = $c->query(UPDATE TABLE2);
            if ($x == 1) {
                $q2 = $c->query(SELECT);
                $rq = $q2->fetch_row();
                if ($rq[0] > 0) {
                    // Item already in table, just update
                    $q3 = $c->query(UPDATE TABLE3);
                } else {
                    // Item not in table, so INSERT
                    $q3 = $c->query(INSERT TABLE3);
                }
            }
        } else {
            // Item not in DB, so INSERT
            $q1 = $c->query(INSERT TABLE1);
            $q4 = $c->query(INSERT TABLE2);
            $q3 = $c->query(INSERT TABLE4);
            if ($x == 1) {
                $q5 = $c->query(INSERT TABLE3);
            }
        }
    }
}
As you can see, it's a very basic INSERT/UPDATE script, so before releasing it to full production we ran some tests to see that the script was working as it should, and the results were excellent.
We ran this code against 100 requests and everything went just fine - less than 1.7 seconds for the 100 requests. But then we saw the amount of data that needed to be sent/POSTed, and it was a jaw-drop for me: for 20K+ items it takes about 3 to 5 minutes just to send the POST, and the script always crashes. The "data" is an array (shown here in PHP notation, but it is sent as JSON):
array (
    [0] => array (
        [id] => 1,
        [val2] => 1,
        [val3] => 1,
        [val4] => 1,
        [val5] => 1,
        [val6] => 1,
        [val7] => 1,
        [val8] => 1,
        [val9] => 1,
        [val10] => 1
    ),
    [1] => array (
        [id] => 2,
        [val2] => 2,
        [val3] => 2,
        [val4] => 2,
        [val5] => 2,
        [val6] => 2,
        [val7] => 2,
        [val8] => 2,
        [val9] => 2,
        [val10] => 2
    ),
    // ... about 10 to 20K entries, depending on the day and time
)
Anyway, sending this information is not the problem; like I said, it takes about 3 to 5 minutes. The problem is the code that receives the data and runs the queries. On normal shared hosting we get a 503 error, which debugging showed to be a timeout. On our VPS we can raise max_execution_time to whatever we need, but processing 10K+ items still takes about an hour, and on shared hosting we can't change max_execution_time at all. So I asked the other developer, the one sending the information, to send batches of 1K instead of 10K+ in one blow, rest for a second, then send the next batch, and so on. So far I haven't gotten an answer. I was thinking of doing the "pause" on my end instead: say, after processing 1K items, wait a second and then continue. But I don't see that being as efficient as receiving the data in batches. How would you solve this?
Sorry, I don't have enough reputation to comment yet, so I have to write this as an answer. I would recommend zedfoxus' method of batch processing above. In addition, I would highly recommend figuring out a way to process those queries faster. Keep in mind that every single PHP function call, etc. gets multiplied by every row of data. Here are just a couple of ways you might be able to get better performance:
Use prepared statements. This lets MySQL parse the statement once and reuse it for each successive execution, which is really important when the same query runs thousands of times.
If you use prepared statements, then you can drop the $c->real_escape_string() calls. I would also scratch my head to see what you can safely leave out of the $t->clean() method.
Next I would question whether you really need to evaluate every single row individually. I'd have to benchmark it to be sure, but I think running a few PHP statements beforehand will be faster than making umpteen unnecessary MySQL SELECT and UPDATE calls. MySQL is much faster when inserting multiple rows at a time. If you expect multiple rows of your input to be changing the same row in the database, then you might want to consider the following:
a. Think about creating a temporary, precompiled array (depending on memory usage involved) that stores the unique rows of data. I would also consider doing the same for the secondary TABLE3. This would eliminate needless "update" queries, and make part b possible.
b. Consider a single query that selects every id from the database that's in the array. This will be the list of items to use an UPDATE query for. Update each of these rows, removing them from the temporary array as you go. Then, you can create a single, multi-row insert statement (prepared, of course), that does all of the inserts at a single time.
Take a look at optimizing your MySQL server parameters to better handle the load.
I don't know if this would speed up a prepared INSERT statement at all, but it might be worth a try. You can wrap the INSERT statement within a transaction as detailed in an answer here: MySQL multiple insert performance
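To illustrate the prepared-statement and transaction suggestions together, here is a minimal sketch (it assumes a PDO connection in $dbh and invented table/column names, so treat it as a shape rather than a drop-in fix):
$dbh->beginTransaction();
try {
    // Prepared once, executed many times - MySQL only parses the statement once
    $stmt = $dbh->prepare(
        'INSERT INTO TABLE1 (id, val2) VALUES (:id, :val2)
         ON DUPLICATE KEY UPDATE val2 = VALUES(val2)'
    );
    foreach ($nv as $k) {
        $stmt->execute(array(':id' => (int)$k->id, ':val2' => $k->val2));
    }
    $dbh->commit(); // one flush for the whole batch instead of one per row
} catch (Exception $e) {
    $dbh->rollBack();
    throw $e;
}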
I hope that helps, and if anyone else has some suggestions, just post them in the comments and I'll try to include them.
Here's a look at the original code with just a few suggestions for changes:
<?php
/* You can make sure that the connection type is persistent and
 * I personally prefer using the PDO driver.
 */
include('db.php');
/* Definitely think twice about each tool that is included.
 * Only include what you need to evaluate the submitted data.
 */
include('tools.php');
$c = new GetDB(); // Connection to DB
/* Take a look at optimizing the code in the Tools class.
 * Avoid any and all kinds of loops - this code is going to be used in
 * a loop and could easily turn into an O(n^2) performance drain.
 * Minimize the amount of string manipulation requests.
 * Optimize regular expressions.
 */
$t = new Tools(); // Classes to clean, prevent XSS and others
if (isset($_POST['var'])) { // !empty() catches more cases than isset()
    $nv = json_decode($_POST['var']);
    /* LOOP LOGIC
     * Definitely test my hypothesis yourself, but this is similar
     * to what I would try first.
     */
    // Row-in-database query
    $inTableSQL = "SELECT id FROM TABLE1 WHERE id IN("; // keep adding to it
    foreach ($nv as $k) {
        /* I would personally use specific methods per data type.
         * Here, I might use a type cast, plus a valid int range check.
         */
        $id = $t->cleanId($k->id); // I would include a type cast: (int)
        // Similarly for other values
        // etc.
        // Then save the validated data to the array(s)
        $data[$id] = array($values...);
        /* Now would also be a good time to add the id to the SELECT
         * statement
         */
        $inTableSQL .= "$id,";
    }
    $inTableSQL = rtrim($inTableSQL, ",") . ");"; // drop the trailing comma
    // Execute query here
    // Then step through the query ids returned, perform UPDATEs,
    // remove the array element once the UPDATE is done (use prepared statements)
    foreach (.....
    /* Then, insert the remaining rows all at once...
     * You'll have to step through the remaining array elements to
     * prepare the statement.
     */
    foreach(.....
} //end initial POST data if
/* Everything below here becomes irrelevant */
foreach($nv as $k) {
    $id = $t->clean($k->id);
    // ... goes on for about 10 keys
    // this might seem redundant or insufficient
    $id = $c->real_escape_string($id);
    // ... goes on for the rest of the keys...
    $q = $c->query("SELECT * FROM table WHERE id = '$id'");
    $r = $q->fetch_row();
    if ($r[1] > 0) {
        // Item exists in DB, so just UPDATE
        $q1 = $c->query(UPDATE TABLE1);
        $q4 = $c->query(UPDATE TABLE2);
        if ($x == 1) {
            $q2 = $c->query(SELECT);
            $rq = $q2->fetch_row();
            if ($rq[0] > 0) {
                // Item already in table, just update
                $q3 = $c->query(UPDATE TABLE3);
            } else {
                // Item not in table, so INSERT
                $q3 = $c->query(INSERT TABLE3);
            }
        }
    } else {
        // Item not in DB, so INSERT
        $q1 = $c->query(INSERT TABLE1);
        $q4 = $c->query(INSERT TABLE2);
        $q3 = $c->query(INSERT TABLE4);
        if($x == 1) {
            $q5 = $c->query(INSERT TABLE3);
        }
    }
}
}
The key is to minimize queries. Often, where you are looping over data doing one or more queries per iteration, you can replace it with a constant number of queries. In your case, you'll want to rewrite it into something like this:
include('db.php');
include('tools.php');
$c = new GetDB(); // Connection to DB
$t = new Tools(); // Classes to clean, prevent XSS and others
if (isset($_POST['var'])) {
    $nv = json_decode($_POST['var']);
    $table1_data = array();
    $table2_data = array();
    $table3_data = array();
    $table4_data = array();
    foreach ($nv as $k) {
        $id = $t->clean($k->id);
        // ... goes on for about 10 keys
        // this might seem redundant or insufficient
        $id = $c->real_escape_string($id);
        // ... goes on for the rest of the keys...
        $table1_data[] = array( ... );
        $table2_data[] = array( ... );
        $table4_data[] = array( ... );
        if ($x == 1) {
            $table3_data[] = array( ... );
        }
    }
    $values = array_to_sql($table1_data);
    $c->query("INSERT INTO TABLE1 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");
    $values = array_to_sql($table2_data);
    $c->query("INSERT INTO TABLE2 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");
    $values = array_to_sql($table3_data);
    $c->query("INSERT INTO TABLE3 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");
    $values = array_to_sql($table4_data);
    $c->query("INSERT IGNORE INTO TABLE4 (...) VALUES $values");
}
While your original code executed between 3 and 5 queries per row of your input data, the above code only executes 4 queries in total.
I leave the implementation of array_to_sql to the reader, but hopefully this should explain the idea. TABLE4 is an INSERT IGNORE here since you didn't have an UPDATE in the "found" clause of your original loop.
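For completeness, here is a minimal sketch of what array_to_sql could look like (my own illustration, not the answerer's code; it assumes a mysqli-style connection in $c with a real_escape_string() method and that every row has the same keys):
function array_to_sql(array $rows, $c) {
    $groups = array();
    foreach ($rows as $row) {
        $escaped = array();
        foreach ($row as $value) {
            $escaped[] = "'" . $c->real_escape_string($value) . "'";
        }
        $groups[] = '(' . implode(',', $escaped) . ')';
    }
    // e.g. ('1','foo'),('2','bar') - ready to append after VALUES
    return implode(',', $groups);
}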

Remove a loop to find existing numbers in database to one single query

I am sending an array of numbers, separated by commas, to the server. In the server database I have a field that contains numbers. The server-side code checks which of those numbers are in the database and sends me back the array of the ones it found.
The following code is what I am using:
public function already_user()
{
    $contacts = $this->input->post('contact');
    // $contacts is a comma-separated list of numbers.
    $user = explode(',', $contacts);
    foreach ($user as $number)
    {
        $data = array (
            'username' => $number
        );
        $usernumber = $this->chat_model->get(array('username' => $number)); // a simple query that checks whether the number exists in the database column or not.
        if (!$usernumber == "") {
            $value[] = $usernumber;
        }
    }
    echo json_encode($value);
}
The only drawback of this code is that it's extremely slow: with 1000+ numbers it takes a minute, since it's a loop. Is there any way to speed this up - maybe a single MySQL query?
Why don't you just write a single query? Parse the string to get an array and query the table in a single go.
$contact_array = array_map('trim', explode(', ', $contacts));
$all_usernumbers = $this->chat_model->get(array('username IN'=> $contact_array));
which should (depending on what your chat_model->get() accepts) translate to:
SELECT * FROM your_table WHERE username IN ('123', '345')
This is all pseudocode since I don't know what your framework accepts but the SQL query above should be valid.
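If chat_model is a CodeIgniter model, a hypothetical method along these lines would do it in one query (the users table name and the method name are my own placeholders, not the asker's actual code):
public function get_existing_numbers(array $numbers)
{
    // One WHERE ... IN (...) query instead of one query per number
    return $this->db
        ->select('username')
        ->from('users')
        ->where_in('username', $numbers)
        ->get()
        ->result_array();
}
Then already_user() reduces to exploding the POSTed string, calling this method once, and json_encode()-ing the result.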

Can't collect data from database the correct way

I have the function getDistance(). Inside its while loop, the function findDistance() calculates the distance between two users from their coordinates (latitude/longitude) and returns the distance in meters into the variable $djson. $distance is a string submitted by the new user, and $user_old["distance"] is a string fetched from the database by $query. In $matched_names I want to save the names of all the users from my database for whom the if() condition is true, i.e. for whom the sum of the distance of the new user (who submits his data) and the old user's stored distance satisfies the check. The problem is that $matched_names ends up holding the first name returned from the database, repeated once per loop iteration, without the if() restriction ever mattering. For example, if the first name fetched into $user is "Mike" and $user has 5 rows, then the output is: Mike,Mike,Mike,Mike,Mike.
I suppose I have made some mistake in the way things work inside the while loop.
<?php
public function getDistance($uuid, $name, $distance, $latstart, $lonstart, $latend, $lonend, $gcm_regId) {
    $query = sprintf("SELECT uid, gcm_regid, name, distance, latstart, lonstart, latend, lonend FROM user_demand WHERE latstart='%s' AND lonstart='%s'",
        mysql_real_escape_string($latstart),
        mysql_real_escape_string($lonstart));
    $user = mysql_query($query);
    $no_of_rows = mysql_num_rows($user);
    $user_old = mysql_fetch_assoc($user);
    while ($user_old = mysql_fetch_assoc($user)) {
        $djson = $this->findDistance($latend, $lonend, $user_old["latend"], $user_old["lonend"]);
        if ($user_old["distance"] + $distance >= $djson) {
            $match = $this->df->addUserMatch($user_old['gcm_regid'], $user_old['name'], $gcm_regId, $name);
            $matched_names = array_fill(0, $no_of_rows, $user_old['name']);
            $matched_gcmz = array_fill(0, $no_of_rows, $user_old['gcm_regid']);
        }
    }
    $registatoin_ids = array($gcm_regId);
    $message = array("names" => $matched_names, "gcm" => $matched_gcmz);
    $result = $this->gcm->send_notification($registatoin_ids, $message);
}
?>
What I usually do when I'm going to write something complicated is get it working outside of a function first, then break it down into functions.
It is less confusing and easier to troubleshoot problems that way.
It is kind of hard to tell what your code is doing, since you didn't post the other function and the if statement relies on the output of that function.
Offhand, it could be this line, where you are using $name and $user_old['name']:
$match = $this->df->addUserMatch($user_old['gcm_regid'], $user_old['name'], $gcm_regId, $name);
I think you would want them to match each other: if $name == $user_old['name'] then add it; if not, do something else.

Efficient php code to insert array data into mysql table?

So I have a flatfile db in the format of
username:$SHA$1010101010101010$010110010101010010101010100101010101001010:255.255.255.255:1342078265214
Each record is on a new line... about 5000+ lines. I want to import it into a MySQL table. Normally I'd do this using phpMyAdmin and "file import", but now I want to automate the process by using PHP to download the db via FTP, then clean up the existing table data and upload the updated db.
id (AUTO_INCREMENT) | username | password | ip | lastlogin
The script I've got below mostly works, although PHP will generate an error:
"PHP Fatal error: Maximum execution time of 30 seconds exceeded". I believe I could just increase this time, but I doubt I'll be allowed to on the remote server, so I need to find a better way of doing this.
Only about 1000 records get inserted into the database before that timeout.
The code I'm using is below. I'll say right now that I'm not a pro in PHP; this was mainly gathered up and cobbled together. I'm looking for some help to make this more efficient, as I've heard that doing inserts like this is just bad. And it really sounds bad as well - there's a lot of disk scratching when I run this script on my local PC. I mean, why does it want to kill the HDD for such a seemingly simple task?
<?php
require ('Connections/local.php');
$wx = array_map('trim', file("auths.db"));
$username = array();
$password = array();
$ip = array();
$lastlogin = array();
foreach ($wx as $i => $line) {
    $tmp = array_filter(explode(':', $line));
    $username[$i] = $tmp[0];
    $password[$i] = $tmp[1];
    $ip[$i] = $tmp[2];
    $lastlogin[$i] = $tmp[3];
    mysql_query("INSERT INTO authdb (username,password,ip,lastlogin) VALUES('$username[$i]', '$password[$i]', '$ip[$i]', '$lastlogin[$i]') ") or die(mysql_error());
}
?>
Try this, with bound parameters and PDO.
<?php
require ('Connections/local.php');
$wx = array_map('trim', file("auths.db"));
$username = array();
$password = array();
$ip = array();
$lastlogin = array();
try {
    // $dbHost, $database, $dbUsername and $dbPassword are assumed to come from
    // Connections/local.php (don't reuse $ip here - it's the data array above)
    $dbh = new PDO("mysql:host=$dbHost;dbname=$database", $dbUsername, $dbPassword);
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
} catch (PDOException $e) {
    echo 'ERROR: ' . $e->getMessage();
}
$mysql_query = "INSERT INTO authdb (username,password,ip,lastlogin) VALUES(:username, :password, :ip, :lastlogin)";
$statement = $dbh->prepare($mysql_query);
foreach ($wx as $i => $line) {
    set_time_limit(0);
    $tmp = array_filter(explode(':', $line));
    $username[$i] = $tmp[0];
    $password[$i] = $tmp[1];
    $ip[$i] = $tmp[2];
    $lastlogin[$i] = $tmp[3];
    $params = array(":username" => $username[$i],
                    ":password" => $password[$i],
                    ":ip" => $ip[$i],
                    ":lastlogin" => $lastlogin[$i]);
    $statement->execute($params);
}
?>
Instead of sending queries to the server one by one in the form
insert into table (x,y,z) values (1,2,3)
You should use extended insert syntax, as in:
insert into table (x,y,z) values (1,2,3),(4,5,6),(7,8,9),...
This will increase insert performance by miles. However, you need to be careful about how many rows you insert in one statement, since there is a limit to how large a single SQL statement can be (see max_allowed_packet). So I'd say start with packs of 100 rows and see how it goes, then adjust the pack size accordingly. Chances are your insert time will drop to something like 5 seconds, putting it well under the max_execution_time limit.
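A rough sketch of how that chunking could look with the PDO connection from the earlier answer (the chunk size and the $rows structure are assumptions; adapt them to whatever your parsing produces):
$chunkSize = 100; // one statement per 100 rows; keep it under max_allowed_packet
foreach (array_chunk($rows, $chunkSize) as $chunk) {
    $placeholders = array();
    $params = array();
    foreach ($chunk as $row) {
        $placeholders[] = "(?, ?, ?, ?)";
        $params[] = $row['username'];
        $params[] = $row['password'];
        $params[] = $row['ip'];
        $params[] = $row['lastlogin'];
    }
    $sql = "INSERT INTO authdb (username,password,ip,lastlogin) VALUES "
         . implode(",", $placeholders);
    $dbh->prepare($sql)->execute($params); // 100 rows per round trip
}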

Lots of 'If statement', or a redundant mysql query?

$url = mysql_real_escape_string($_POST['url']);
$shoutcast_url = mysql_real_escape_string($_POST['shoutcast_url']);
$site_name = mysql_real_escape_string($_POST['site_name']);
$site_subtitle = mysql_real_escape_string($_POST['site_subtitle']);
$email_suffix = mysql_real_escape_string($_POST['email_suffix']);
$logo_name = mysql_real_escape_string($_POST['logo_name']);
$twitter_username = mysql_real_escape_string($_POST['twitter_username']);
With all those options in a form, they are pre-filled (from the database), but users can choose to change them, which updates the original database. Would it be better for me to update all the columns even though some of them may not have changed, or to do an if ($original_db_entry == $possible_new_entry) check on each (which would be a query in itself)?
Thanks
I'd say it doesn't really matter either way - the size of the query you send to the server is hardly relevant here, and there is no per-column "last updated" information that would get bumped needlessly, so...
By the way, what I like to do when working with such loads of data is create a temporary array.
$fields = array("url", "shoutcast_url", "site_name", "site_subtitle", ....);
foreach ($fields as $field)
    $$field = mysql_real_escape_string($_POST[$field]);
the only thing to be aware of here is that you have to be careful not to put variable names into $fields that would overwrite existing variables.
Update: Col. Shrapnel makes the correct and valid point that using variable variables is not good practice. While I think it is perfectly acceptable to use variable variables within the scope of a function, it is indeed better not to use them at all. The better way to sanitize all incoming fields and have them in a usable form would be:
$sanitized_data = array();
$fields = array("url", "shoutcast_url", "site_name", "site_subtitle" , ....);
foreach ($fields as $field)
    $sanitized_data[$field] = mysql_real_escape_string($_POST[$field]);
this will leave you with an array you can work with:
$sanitized_data["url"] = ....
$sanitized_data["shoutcast_url"] = ....
Just run a single query that updates all columns:
UPDATE table SET col1='a', col2='b', col3='c' WHERE id = '5'
I would recommend that you execute the UPDATE with all column values. It'd be less costly than trying to confirm that the value is different than what's currently in the database. And that confirmation would be irrelevant anyway, because the values in the database could change instantly after you check them if someone else updates them.
If you issue an UPDATE against MySQL and the values are identical to values already in the database, the UPDATE will be a no-op. That is, MySQL reports zero rows affected.
MySQL knows not to do unnecessary work during an UPDATE.
If only one column changes, MySQL does need to do work. It only changes the columns that are different, but it still creates a new row version (assuming you're using InnoDB).
And of course there's some small amount of work necessary to actually send the UPDATE statement to the MySQL server so it can compare against the existing row. But typically this takes only hundredths of a millisecond on a modern server.
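A tiny illustration of that behaviour (assuming a mysqli connection in $mysqli and an existing row with id = 5 whose col1 is already 'a'; the table and column names are made up):
$mysqli->query("UPDATE settings SET col1 = 'a' WHERE id = 5");
echo $mysqli->affected_rows; // 0 - values identical, MySQL reports nothing changed
$mysqli->query("UPDATE settings SET col1 = 'b' WHERE id = 5");
echo $mysqli->affected_rows; // 1 - the row actually changed
(With the old mysql_* API the equivalent check is mysql_affected_rows().)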
Yes, it's ok to update every field.
A simple function to produce SET statement:
function dbSet($fields) {
    $set = '';
    foreach ($fields as $field) {
        if (isset($_POST[$field])) {
            $set .= "`$field`='" . mysql_real_escape_string($_POST[$field]) . "', ";
        }
    }
    return substr($set, 0, -2);
}
and usage:
$fields = explode(" ","name surname lastname address zip fax phone");
$query = "UPDATE $table SET ".dbSet($fields)." WHERE id=$id";
