So basically, I have two tables.
The first table contains roughly one million rows, with an empty field called 'telefon'.
The second table holds the values that should go into the 'telefon' field of the first table.
I came up with this solution, but it takes forever. It has been an hour, and when inspecting the database table, only 1600 rows are done. Are there any faster ways of doing this? Thanks in advance.
DB::table('phones')->orderBy('id')->chunk(100, function ($old) {
    foreach ($old as $x) {
        DB::table('companies')
            ->where('organisasjonsnummer', $x->businessCode)
            ->update(['telefon' => $x->contact]);
    }
});
Huh, a foreach with a query on every iteration is almost always bad. If I am not mistaken, what you actually want is this:
UPDATE companies, phones SET companies.telefon = phones.contact WHERE companies.organisasjonsnummer = phones.businessCode
It may still be slow if there is no index on the companies.organisasjonsnummer and phones.businessCode columns. Indexing them now takes time too, so whether that is worth it depends on whether the indexes will be used again later. Either way, a single query should be at least somewhat faster than one query per row.
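If you do decide to add those indexes first, a minimal sketch (the index names are made up) could look like this:

// Run once, e.g. from a migration or tinker; indexes the join column on each side.
DB::statement('ALTER TABLE companies ADD INDEX idx_companies_orgnr (organisasjonsnummer)');
DB::statement('ALTER TABLE phones ADD INDEX idx_phones_businesscode (businessCode)');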
Always remember: when you use Eloquent/SQL inside a loop, you run one command per iteration.
The same applies to lazy loading.
In this case you should use \DB::statement("put your sql here"); or, since this is an update, \DB::update("put your update here");. Let the database do the work for you and be happy!
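A minimal sketch of that, reusing the single-query UPDATE from the answer above (this assumes the Laravel DB facade is available):

// One statement updates every matching row; the database performs the join
// instead of PHP issuing a separate UPDATE per row.
DB::update('
    UPDATE companies
    JOIN phones ON companies.organisasjonsnummer = phones.businessCode
    SET companies.telefon = phones.contact
');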
I have a table (~150 columns with ~150k records) and a project on Symfony 3 with Doctrine. The project has a classic filter to show results.
When the form is submitted I collect the data in an object $selectedInputOptions and build a query that looks like this:
$query = $repository
    ->createQueryBuilder('t')
    ->select('t.idkatcountry', 't.idkatlocality', 't, MIN(t.price) AS priceFrom'......);

if (count($selectedInputOptions->getCountry()) > 0)
    $query->andWhere('t.idkatcountry IN (:idkatcountry)')->setParameter('idkatcountry', $selectedInputOptions->getCountry());
if (count($selectedInputOptions->getLocality()) > 0)
    $query->andWhere('t.idkatlocality IN (:idkatlocality)')->setParameter('idkatlocality', $selectedInputOptions->getLocality());
The price column has a decimal(15,2) datatype.
Before, I had $repository->select('t.price') and everything was OK, but after changing this to 't, MIN(t.price) AS priceFrom' the query execution time increased by about 40%, and in a few cases (when every form input is blank, so all records are checked) by about 900%.
So my questions:
How can I cut the execution time? (Are there indexes for this? Would changing the datatype range, say to decimal(6,2), help?)
And a bonus question :) The table has ~150 columns, but the filtering query uses only ~10-15 of them. Can I set some type of index for quicker selects?
EDIT:
Changed the price column to integer - did not help.
Added an index to the price column - did not help.
SOLUTION!
It was a little mistake in the select parameter using MIN().
Instead of:
't, MIN(t.price) AS priceFrom'
I used:
'MIN(t.price) AS priceFrom'
Because t fetches ALL columns (~150 in my case) and I didn't notice this... So now everything is OK and the time is back to normal.
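In query-builder terms the fix looks roughly like this (the group-by and the exact column list are assumptions based on the question, not the actual project code):

$query = $repository
    ->createQueryBuilder('t')
    // Select only the columns the filter page needs; leaving the bare alias 't'
    // in the list hydrates all ~150 columns for every row.
    ->select('t.idkatcountry', 't.idkatlocality', 'MIN(t.price) AS priceFrom')
    ->groupBy('t.idkatcountry')
    ->addGroupBy('t.idkatlocality');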
One more thing you can do here: stop loading unwanted data into the entity by using unset in the jsonSerialize() method.
I have a strange situation.
Suppose I have a very simple function in php (I used Yii but the problem is general) which is called inside a transaction statement:
public function checkAndInsert($someKey)
{
    // Search for a record in the DB. If it does not exist, insert it.
    $data = MyModel::model()->find(array('someKey' => $someKey));
    if ($data == null)
    {
        $data = new MyModel();
        $data->someKey = $someKey;
        $data->someCol = 'newOne';
        $data->save();
    }
    else
    {
        $data->someCol = 'test';
        $data->save();
    }
}
...
// $db is the instance variable used for operations on the DB
$db->transaction();
$this->checkAndInsert($someKey);
$db->commit();
That said, if I run the script containing this function by starting many processes, I end up with duplicate values in the DB. For example, if I have $someKey = 'pippo' and I run the script with 2 processes, I get two (or more) records with the column "someCol" = 'newOne'. This happens randomly, not always.
Is the code wrong? Should I put some constraint on the DB in the form of keys?
I also read this post about adding UNIQUE indexes to TokuDB which says that UNIQUE KEY "kills" write performance...
The approach you have is wrong. It's wrong because you delegate the authority for the integrity/uniqueness check to PHP, while it's the database that is responsible for that.
In other words, you don't have to check whether something exists and then insert. That's bad because there's always some round-trip latency between PHP and MySQL, and, as you already saw, your checks can return stale results.
If you need unique values for a certain column or combination of columns, you add a UNIQUE constraint. After that you simply insert. If the record already exists, the insert fails and you can deal with it via an Exception. Not only is it faster, it's also easier for you, because your code can become a one-liner which is much easier to maintain and understand.
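A minimal sketch of that idea with plain PDO (the table, column, and key names are modeled on the question; the DSN is a placeholder):

// One-time schema change, run once (e.g. in a migration):
//   ALTER TABLE my_model ADD UNIQUE KEY uq_some_key (someKey);

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    // Just insert; the database enforces uniqueness, no prior SELECT needed.
    $stmt = $pdo->prepare('INSERT INTO my_model (someKey, someCol) VALUES (?, ?)');
    $stmt->execute(array($someKey, 'newOne'));
} catch (PDOException $e) {
    // MySQL error 1062 = duplicate entry: the row already exists,
    // so take the "update" path instead of inserting a duplicate.
    if (isset($e->errorInfo[1]) && $e->errorInfo[1] == 1062) {
        $stmt = $pdo->prepare('UPDATE my_model SET someCol = ? WHERE someKey = ?');
        $stmt->execute(array('test', $someKey));
    } else {
        throw $e;
    }
}

On MySQL the same thing can also be expressed in a single statement with INSERT ... ON DUPLICATE KEY UPDATE, which avoids the exception handling entirely.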
I'm using Laravel 4, and I need to insert some rows into a MySQL table, and I need to get their inserted IDs back.
For a single row, I can use ->insertGetId(), however it has no support for multiple rows. If I could at least retrieve the ID of the first row, as plain MySQL does, it would be enough to figure out the other ones.
This is the MySQL behavior of last-insert-id:
Important
If you insert multiple rows using a single INSERT statement, LAST_INSERT_ID() returns the value generated for the first inserted row only. The reason for this is to make it possible to reproduce easily the same INSERT statement against some other server.
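For illustration (the table name and values are assumptions), a multi-row insert followed by LAST_INSERT_ID() behaves like this:

// Suppose the users table is empty and its AUTO_INCREMENT is currently 1.
DB::insert("INSERT INTO users (name) VALUES ('a'), ('b'), ('c')"); // inserts ids 1, 2, 3

$result  = DB::select('SELECT LAST_INSERT_ID() AS id');
$firstId = $result[0]->id; // 1 -- the id of 'a', the FIRST row of the statement, not 3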
You can try using many single inserts and collecting their ids, or, after save(), $data->id should give you the last inserted id.
If you are using InnoDB, which supports transactions, then you can easily solve this problem.
There are multiple ways that you can solve this problem.
Let's say that there's a table called Users which has 2 columns, id and name, and is backed by a User model.
Solution 1
Your data looks like
$data = [['name' => 'John'], ['name' => 'Sam'], ['name' => 'Robert']]; // this will insert 3 rows
Let's say that the last id in the table was 600. You can insert multiple rows into the table like this:
DB::beginTransaction();
User::insert($data); // remember: $data is array of associative array. Not just a single assoc array.
$startID = DB::select('select last_insert_id() as id'); // returns an array that has only one item in it
$startID = $startID[0]->id; // This will return 601
$lastID = $startID + count($data) - 1; // this will return 603
DB::commit();
Now you know the inserted rows have ids in the range 601 to 603.
Make sure to import the DB facade at the top using this
use Illuminate\Support\Facades\DB;
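If you then want the inserted ids or rows themselves, a small follow-up sketch (variable and model names taken from the answer above):

// Assuming the ids were allocated consecutively for this single INSERT
// (see innodb_autoinc_lock_mode), a simple range gives [601, 602, 603].
$insertedIds = range($startID, $lastID);

// Or fetch the freshly inserted models in one query.
$insertedUsers = User::whereBetween('id', [$startID, $lastID])->get();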
Solution 2
This solution requires that you have a varchar or some other sort of text field.
$randomstring = Str::random(8);
$data = [['name' => "John$randomstring"], ['name' => "Sam$randomstring"]];
You get the idea here. You add that random string to a varchar or text field.
Now insert the rows like this
DB::beginTransaction();
User::insert($data);
// this will return the last inserted ids
$lastInsertedIds = User::where('name', 'like', '%' . $randomstring)
->select('id')
->get()
->pluck('id')
->toArray();
// now you can update that row to the original value that you actually wanted
User::whereIn('id', $lastInsertedIds)
->update(['name' => DB::raw("replace(name, '$randomstring', '')")]);
DB::commit();
Now you know which rows were inserted.
As user Xrymz suggested, DB::raw('LAST_INSERT_ID();') returns the first inserted id.
According to the query builder API, insertGetId() accepts an array:
public int insertGetId(array $values, string $sequence = null)
So you have to be able to do
DB::table('table')->insertGetId($arrayValues);
That said, if using MySQL, you could retrieve the first id this way and calculate the rest. There is also DB::getPdo()->lastInsertId();, which could help.
Or, if one of these methods returns the last id instead, you can calculate back to the first inserted id the same way.
EDIT
According to comments, my suggestions may be wrong.
Regarding the question of 'what if a row is inserted by another user in between', it depends on the storage engine. If an engine with table-level locking (MyISAM, MEMORY, MERGE) is used, the question is irrelevant, since there cannot be two simultaneous writes to the table.
If a row-level locking engine (InnoDB) is used, another possibility might be to just insert the data and then retrieve all the rows by some known field with the whereIn() method, or to sort out table-level locking yourself.
$result = Invoice::create($data);

if ($result) {
    $id = $result->id;
}

It worked for me.
Note: Laravel version 9.
I don't know if I am doing this right or wrong, please don't judge me...
What I am trying to do: if a record belongs to a parent, it should have the parent's id associated with it. Let me show you my table schema below.
I have two columns:
ItemCategoryID &
ItemParentCategoryID
Suppose the record with ItemCategoryID = 4 belongs to ItemCategoryID = 2; then the ItemParentCategoryID column of record 4 will hold 2.
I mean, the table loops back onto itself.
The problem is how to run the select query :P
I mean, show all the parents and the children under their respective parents.
This is often a lazy design choice. Ideally you want a separate table for these relations and/or a fixed number of depths. If a parent_id's parent can have its own parent_id, that means a potentially infinite depth.
MySQL isn't a big fan of infinite nesting depths, but PHP doesn't mind. Either run multiple queries in a loop, as in Nil'z's answer, or consider fetching all rows and sorting them into arrays in PHP (see the sketch after this answer). The latter is nice if you pretty much always need all rows anyway, which makes MySQL-side filtering redundant.
Lastly, consider whether a more suitable approach fits your database structure. Don't be afraid to use more than one table for this.
This can become a serious performance thief in the future: an uncontrolled number of MySQL queries on every page load can easily get out of hand.
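A minimal sketch of the fetch-everything-and-sort-in-PHP approach mentioned above (column names are taken from the question; the fetch query, the root parent id of 0, and the 'children' key are assumptions):

// $rows is all categories fetched in ONE query, e.g.
//   SELECT ItemCategoryID, ItemParentCategoryID, categoryName FROM categories
function buildTree(array $rows, $parentId = 0)
{
    $branch = array();
    foreach ($rows as $row) {
        if ((int) $row['ItemParentCategoryID'] === (int) $parentId) {
            // Recurse: attach this node's children before adding it to the branch.
            $row['children'] = buildTree($rows, $row['ItemCategoryID']);
            $branch[] = $row;
        }
    }
    return $branch;
}

$tree = buildTree($rows); // top-level categories, each with a nested 'children' array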
Try this:
function all_categories(){
    $data = array();
    $first = $this->db->select('itemParentCategoryId')->group_by('itemParentCategoryId')->get('table')->result_array();
    if( isset( $first ) && is_array( $first ) && count( $first ) > 0 ){
        foreach( $first as $key => $each ){
            $second = $this->db->select('itemCategoryId, categoryName')->where_in('itemParentCategoryId', $each['itemParentCategoryId'])->get('table')->result_array();
            $data[$key]['itemParentCategoryId'] = $each['itemParentCategoryId'];
            $data[$key]['subs'] = $second;
        }
    }
    print_r( $data );
}
I don't think you want to (or can) do this in a single query, since the nesting can go arbitrarily deep.
You should make a function that fetches the children and calls itself whenever you retrieve a category. This way you can nest more than 2 levels.
function getCategory($categoryId)
{
    // Retrieve the category itself...

    // ...then fetch its children.
    $childs = $this->getCategoriesByParent($categoryId);
}

function getCategoriesByParent($parentId)
{
    // Get the categories whose parent is $parentId,
    // then get their children again (recursion).
}
MySQL does not support recursive queries. It is possible to emulate recursive queries through recursive calls to a stored procedure, but this is hackish and sub-optimal.
There are other ways to organise your data (nested sets, closure tables, and so on); these structures allow very efficient querying. See the sketch after this answer for a taste of one of them.
This question comes up so often I can't even be bothered to complain about your inability to use Google or SO search, or to offer a wordy explanation.
Here - use this library I made: http://codebyjeff.com/blog/2012/10/nested-data-with-mahana-hierarchy-library so you don't bring down your database
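For illustration only, here is a tiny sketch of one such structure, the nested set model (the table, column names, and $pdo connection are made up; the library linked above implements this kind of thing properly):

// categories(id, name, lft, rgt): every node stores a left/right boundary, and
// all of its descendants sit strictly between its lft and rgt values, so an
// entire subtree comes back in ONE query, with no recursion at all.
$stmt = $pdo->prepare(
    'SELECT child.id, child.name
       FROM categories AS parent
       JOIN categories AS child ON child.lft BETWEEN parent.lft AND parent.rgt
      WHERE parent.id = ?
      ORDER BY child.lft'
);
$stmt->execute(array($parentCategoryId));
$subtree = $stmt->fetchAll(PDO::FETCH_ASSOC);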
I'm working with an MLS real estate listing provider (RETS). Every 48 hours we will be pulling data from their server in a cron job to an SQL database. I'm charged with the task of writing a php script that will be run after the data from the remote server is dumped into our "raw" tables. In these raw tables, all columns are VARCHAR(255), and we want to move the data into optimized tables. Before I send my script to the guy in charge of setting up the cron job, I wondered if there is a more efficient way to do it so I don't look foolish.
Here's what I'm doing:
There are 8 total tables, 4 raw and 4 optimized - all in the same database. The raw table column names are non-descriptive, like c1, c2, c3, c4, etc. This is intentional, because the data that goes into each column may change. The raw table column names are mapped to the correct optimized table columns with PHP, something like this:
$tables['optimized_table_name1']['raw_table'] = 'raw_table_name1';
$tables['optimized_table_name1']['data_map'] = array(
'c1' => array( // <--- "c1" is the raw table column name
'column_name' => 'id',
// I use other values for table creation,
// but they don't matter to the question.
// Just explaining why the array looks like this
//'type' => 'VARCHAR',
//'max_length' => 45,
//'primary_key' => FALSE,
// etc.
),
'c9' => array('column_name' => 'address'),
'c25' => array('column_name' => 'baths'),
'c2' => array('column_name' => 'bedrooms') //etc.
);
I'm doing the same thing for each of the 4 tables: SELECT * FROM the raw table, read the config array and create a huge SQL insert statement, TRUNCATE the optimized table, then run the INSERT query.
foreach ($tables as $table_name => $config):
$raw_table = $config['raw_table'];
$data_map = $config['data_map'];
$fields = array();
$values = array();
$count = 0;
// Get the raw data and create an array mapped to the optimized table columns.
$query = mysql_query("SELECT * FROM dbname.{$raw_table}");
while ($row = mysql_fetch_assoc($query))
{
// Reading column names from my config file on first pass
// Setting up the array, will only run once per table
if (empty($fields))
{
foreach ($row as $key => $val)
{// Produces an array with the column names
$fields[] = $data_map[$key]['column_name'];
}
}
foreach ($row as $key => $val)
{// Assigns data to an array to be imploded later
$values[$count][] = $val;
}
$count++;
}
// Create the INSERT statement string
$insert = array();
$sql = "\nINSERT INTO `{$table_name}` (`".implode('`,`', $fields)."`) VALUES\n";
foreach ($values as $key => $vals)
{
foreach ($vals as &$val)
{
// Escape the data
$val = mysql_real_escape_string($val);
}
// Using implode for simplicity, could avoid the nested foreach if I wanted to
$insert[] = "('".implode("','", $vals)."')";
}
$sql .= implode(",\n", $insert).";\n";
// TRUNCATE optimized table and run INSERT query here
endforeach;
Which produces something like this (only larger - about 15,000 records max per table, and one insert statement per table):
INSERT INTO `optimized_table_name1` (`id`,`beds`,`baths`,`town`) VALUES
('50300584','2','1','Fairfield'),
('87560584','3','2','New Haven'),
('76545584','2','1','Bristol');
Now I'll admit, I have been under the wing of an ORM for a long time and am not up on my vanilla mysql/php. This is a pretty simple task and I want to keep the code simple.
My questions:
Is the TRUNCATE/INSERT method a good way to do this?
Is there anything about my code that you can see being a problem? I know you see nested foreach loops and just shudder, but I want to keep the code as small and clean as possible and avoid lots of messy string concatenation (to produce the insert query). Like I said, I also haven't used native PHP functions for SQL in a long time.
I feel like it really doesn't matter if the code is not optimized when it runs at 3 AM every 2 days. Does it matter? Is this code going to perform OK?
Is there a better overall strategy to accomplish this task?
Do I need to be using transactions?
How can I be aware of errors that may occur in cron scripts?
Apologize if I don't use correct cron jargon, it's new to me.
Keep it simple. An ORM would be swell for this task.
Answers:
Yes.
Your code is readable. At least I did not have any problems reading it.
We had a script that ran early in the morning. It was not optimized and consumed a lot of memory. After FOUR years it started to consume over 512 MB. I spent 2 hours optimizing it, and now it consumes 7 MB (pretty good optimization, huh? :) ). I personally think it is "ok" that your script is not optimized now. If this script starts failing, you'll figure out what the problem is. Maybe it will exhaust memory, maybe your SQL queries will cause deadlocks... maybe you will later optimize it to READ from slave servers... I don't know. It works fine now, and that's okay.
I'd do something similar to your code, but I'd probably generate a file first and load the data into the server by running the shell command mysql -u username --password=password < import_file.sql. That way the file is stored somewhere on disk so I can always take a look at it, and maybe even edit it for a one-time correction load. You can still do this simply by writing your SQL statement into a file.
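A minimal sketch of that idea (the file path, database name, credentials, and the $sql string built earlier are placeholders/assumptions):

// Write the generated INSERT statement(s) to a dated file on disk...
$file = '/var/tmp/import_' . date('Y-m-d') . '.sql';
file_put_contents($file, $sql);

// ...then let the mysql client load it instead of sending it through PHP.
$cmd = sprintf(
    'mysql -u %s --password=%s dbname < %s 2>&1',
    escapeshellarg('username'),
    escapeshellarg('password'),
    escapeshellarg($file)
);
exec($cmd, $output, $exitCode);

if ($exitCode !== 0) {
    // Keep the client's output around so the failure can be inspected later.
    file_put_contents('/var/tmp/import_errors.log', implode("\n", $output) . "\n", FILE_APPEND);
}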
No. It is just one query, and with the InnoDB engine a single statement already runs as its own transaction.
First, use error_reporting(E_ALL & ~E_NOTICE). Second, use the mysql_error() PHP function to make sure your query ran correctly. Third, in your cron job, redirect the error stream into a file, like so: 0 7 * * 0 /path/to/php -c /path/to/php.ini /path/to/script.php 2> /tmp/errors_file. You can then create a SECOND script that runs after the first one and notifies you about errors in script.php by email or whatever way of notifying you prefer. Personally I'd prefer to register_shutdown_function() something that checks the error file and, if it is not empty, notifies you and deletes it afterwards.
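A small sketch of that shutdown-function idea (the error-file path and the notification address are placeholders; it assumes the stderr redirect above has already written to the file by the time it runs):

// Register near the top of the script; it also fires after fatal errors.
register_shutdown_function(function () {
    $errorFile = '/tmp/errors_file';

    if (is_file($errorFile) && filesize($errorFile) > 0) {
        // Notify, then clear the file so the next run starts clean.
        mail('you@example.com', 'Cron import errors', file_get_contents($errorFile));
        unlink($errorFile);
    }
});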
Just my opinion, but I hope my answer helps.