I'm using:
A local Postgres DB
Laravel 5 with a MySQL DB, also local
The Postgres DB contains the table structure from OTRS, with over 91,000 rows in the "tickets" table
I needed to move the data from one DB to the other. What I did was compile a big query that united all the data I needed and run it in a PHP script I created inside the Laravel structure. Then I iterated through the results using Eloquent to insert into MySQL. I know it's terrible, but I had to remove PHP's memory and time limits in order to do this, and though it took a really long time, it worked!
Right now I realized I missed something: I need to rerun a similar query (it results in the same number of rows), but this time just to add one field in the MySQL DB.
My question is: how can I optimize this process? I'm thinking of using chunks, but I don't know how to do that.
To clarify:
MySQL's Tickets table contains 91,397 rows and 5 columns
The Postgres Tickets table also has 91,397 rows, with 6 columns
I created a migration in Laravel (MySQL) that added the extra column (though it's still empty)
It's probably easier if I show you the code I have.
link
You can use Eloquent to do this. Here's how I'd do it (untested, may need some tuning):
Set up both DB connections in config/database.php.
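For reference, here's a minimal sketch of what the two connections might look like; the connection names, database names, and credentials below are placeholders, not values from the question:
// config/database.php (excerpt) - connection names and credentials are placeholders
'connections' => [
    'mysql_local' => [
        'driver'    => 'mysql',
        'host'      => env('DB_HOST', '127.0.0.1'),
        'database'  => 'laravel_db',
        'username'  => env('DB_USERNAME', 'root'),
        'password'  => env('DB_PASSWORD', ''),
        'charset'   => 'utf8',
        'collation' => 'utf8_unicode_ci',
        'prefix'    => '',
    ],
    'pgsql_otrs' => [
        'driver'   => 'pgsql',
        'host'     => env('PG_HOST', '127.0.0.1'),
        'database' => 'otrs',
        'username' => env('PG_USERNAME', 'postgres'),
        'password' => env('PG_PASSWORD', ''),
        'charset'  => 'utf8',
        'prefix'   => '',
        'schema'   => 'public',
    ],
],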
Make two models for the same resource (you could use a single model, but I'd rather not); I'll name them M1 and M2, each with a different connection attribute. To do this, add a property inside the model: protected $connection = 'connection_name'.
So basically:
class M1 extends Model {
    protected $connection = 'connection_name';
    protected $table = 'table_name';
    protected $guarded = [];
}
Same for M2 with a different $connection.
Now in your controller method or wherever you're executing your code, you can chunk queries using eloquent:
// assuming you are transferring M1's db to M2's
// (if you want to use only one model, this would be M1::on('connection_name')->chunk(...))
M1::chunk(200, function($m1s) {
    foreach ($m1s as $m1) {
        $m2 = new M2();
        // fill() copies the attribute values; it does not copy
        // model-specific properties like $connection
        $m2->fill($m1->getAttributes());
        $m2->save();
    }
});
I think this does the job, can you test this?
In a nutshell, the title best describes my question, but here I am showing the core of the problem.
I have two databases in my web application: one is MariaDB, the other is MongoDB. To give some context, the "user" table in MariaDB stores user information, with the column "id" as its primary key; there is another "badge" table which stores badge information, also with the column "id" as its primary key. Finally, there is a "user_badge" collection in MongoDB containing documents with the fields
{_id, user_id, badge_id, date}
which just link a User with his/her Badges. This is what I meant by a pseudo-relation; unfortunately, I don't know what it is called in this situation.
An example:
I want to query and get all users that have the badge with ID 1. So my pseudo-query should do something like "Select all fields from the user table where badge_id in the user_badge collection is 1". I highlighted like here because this is impossible to do in a single query (to my knowledge); somehow a query has to be made on the MongoDB database first, and then a second one has to be made on the MariaDB database against the results of the former query.
Edit: My original question was about how to implement this in the Yii2 PHP framework, but after googling for some time I found no information on how to do such a thing even in pure PHP. So I decided to end my edited question here, asking for a way to query between a table in an SQL database and a collection in a NoSQL database. Below I leave my old question, which asks how to do this more specifically in the PHP framework; if I knew how to do it in pure PHP, I could just write a function in the framework that does it, if one doesn't already exist.
Obviously there cannot be a direct primary-key-to-foreign-key relation between two database types, but I worked around this issue by having a ::hasMany ActiveRecord method in my User model, and that worked perfectly fine; when I have a User model at hand, I just call $model->userBadges to get from MongoDB all documents having that user's ID, and vice versa. The problem is that when I run a query involving this relation, I get the error
Calling unknown method: yii\mongodb\ActiveQuery::getTableNameAndAlias()
Parts of my Application
The getUserBadges method in the User model
public function getUserBadges(){
return $this->hasMany(UserBadge::className(), ['user_id' => 'id']);
}
UserBadge model extending yii\mongodb\ActiveRecord
class UserBadge extends ActiveRecord
{
public static function collectionName()
{
return 'user_badge';
}
public function attributes()
{
return ['_id', 'user_id', 'badge_id', 'date'];
}
public function getUser(){
return $this->hasOne(User::className(), ['id' => 'user_id']);
}
public function getBadge(){
return $this->hasOne(Badge::className(), ['id' => 'badge_id']);
}
}
My query
$query = User::find()->joinWith(['userBadges']);
Edit: I figured out that the previous query is not really what I want. I simplified it to be clear, but the real query I want to run (which shows the point of all of this) is
$query = User::find()->joinWith(['userBadges'])->where(['badge_id' => 1]);
And with that I can get the users from the user table who have a certain badge, e.g. the badge with id 1.
And here the code fails and throws the error stated above. After inspecting for some time, I found the API documentation for the joinWith method:
This method allows you to reuse existing relation definitions to perform JOIN queries. Based on the definition of the specified relation(s), the method will append one or multiple JOIN statements to the current query.
And here I understood why this error occurs: in my query I am joining a document in a collection of the MongoDB database, not a record in a table of an SQL database, which definitely won't work. I'm stuck here and don't know what exactly to do. I am sticking with having the user table in an SQL database and the user_badge collection in a NoSQL database. What should I do in such a scenario? Query the NoSQL database first and then run an SQL query against the results of the former query? Is there already a solution to such a problem in the ActiveQuery methods? Or is my database structure invalid?
Thanks in advance.
So after some time I figured out how to do it, with the help of this question, where an SQL query is made against a PHP array.
So, first MongoDB is queried and the results are stored in an array, then a MariaDB SQL query is made against the array generated by the former query. I am pretty sure this is not the best option: what if the MongoDB query returns 100,000 results? An array will be made with 100,000 entries, and the SQL query will be made using that same 100,000-item array. Yet this is the best answer I could come up with (until now).
How to implement it in Yii2
// This line queries the MongoDB database and returns the matching user ids as a plain array
$userBadges = UserBadge::find()->select(['user_id'])->where(['badge_id' => 1])->column();
// This line makes the SQL query using the array generated by the former line
$userQuery = User::find()->where(['id' => $userBadges]);
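If the intermediate array worries you, one way to soften the blow is to run the MariaDB side in batches. This is an untested sketch using the same models as above; the batch size of 1000 is an arbitrary choice:
// Untested sketch: query MongoDB once, then query MariaDB in batches
// so no single query carries an enormous IN (...) clause.
$userBadges = UserBadge::find()->select(['user_id'])->where(['badge_id' => 1])->column();

$users = [];
foreach (array_chunk($userBadges, 1000) as $idBatch) {
    // each iteration runs: SELECT * FROM user WHERE id IN (...batch...)
    $users = array_merge($users, User::find()->where(['id' => $idBatch])->all());
}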
I hope someone knows a better answer to this question, but I thought I'd share what I have reached so far.
I hope you are having a good time. I am learning Laravel, and the instructor talked about what happens when you load relationships in Laravel, like so:
public function timeline()
{
$ids = $this->follows()->pluck('id');
$ids->push($this->id);
return Tweet::whereIn('user_id', $ids)->latest()->get();
}
and I have a follows relationship in my model. He talked about this line
$ids = $this->follows()->pluck('id');
being better for performance than this line
$ids = $this->follows->pluck('id');
My question is: how does Laravel pluck the ids in the first case, and how does it query the database?
I hope I'm making sense. Thanks for your time and your answer.
The following one executes a select query on the database:
$this->follows()->pluck('id');
follows() returns a query builder (which is a not-yet-executed SQL statement); pluck('id') then selects the id column on that query and returns a collection of ids.
You can see the query by dumping the query builder with $this->follows()->dd().
Whereas in the second option
$this->follows->pluck('id')
up until $this->follows, Laravel executes a query and returns all the records as a collection instance; you will be able to see all the attributes on each of the records. Then ->pluck('id') is executed on the Laravel collection class, which performs an operation similar to what the array_column function does and returns only the id column.
As you can easily see, in the second option the whole data set is retrieved from the DB first and then the required attribute/column is selected (two distinct and heavy operations), whereas in the first option we directly tell Eloquent to select only the required column, which is a single, lighter operation.
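To see the difference for yourself, you can compare what each form produces; this is a small sketch assuming a User model with the follows() relationship from the question:
$user = User::first();

// First form: follows() returns a query builder, and toSql() shows the
// single SELECT that pluck('id') will run (only the id column is fetched).
dump($user->follows()->toSql());
$ids = $user->follows()->pluck('id');

// Second form: loads full models (select * ...) into a Collection first,
// then plucks the ids in PHP memory - no extra query, but far more data loaded.
$ids = $user->follows->pluck('id');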
I apologize if this is a silly question. This is probably well outside standard practice, but I'm looking to be able to join data from an MS-SQL database with data from MySQL in Laravel 5.2.
I'm not sure if it's possible to do something like...
\DB::connection('sqlsrv')->table('mstable')
->leftJoin(\DB::connection('mysql')->table('mysqltable'),
'mysqltable.shared_id',
'=',
'mstable.shared_id');
My thinking is that if Laravel converts the query into a PHP object early enough, it should be able to. Otherwise, is there a fallback that allows the two database types to be used together?
I'm pretty sure that a join won't be possible. What you can do, though, is use relations across different databases. Depending on your situation this might be an applicable workaround. I am currently using this approach to query entities from different databases and "chunk" them, so that I only keep a couple of thousand entities in memory. It's still an efficient way to iterate over all entities, because I'm using eager loading: Laravel / Eloquent only triggers two queries per chunk, one to get the primary models and a second to get the related models (by default via an IN statement on the relation's table, using the keys obtained from the primary models).
A "simple" way to set this up is to fill the protected $connection property of the Eloquent models like this:
class Foo extends Model {
    // Foo lives on the MySQL connection
    protected $connection = "mysql";

    public function bar(){
        return $this->hasOne(Bar::class);
    }
}

class Bar extends Model {
    // Bar lives on the SQL Server connection
    protected $connection = "ms-sql";
}
Foo::with("bar")->get();
I'm trying to retrieve data from SQL table A, modify some columns, and then insert the modified columns into SQL table B.
However my issue is that when I use:
$customer = new Customer;
$fakecustomer = new Fakecustomer;
$fake_customer_name_records = $fakecustomer->get()->toArray();
$fake_customer_name_records_array = [];
foreach ($fake_customer_name_records as $record) {
    // process columns for each record
    $fake_customer_name_records_array[] = array(
        'last_name'  => $last_name,
        'first_name' => $first_name,
        'home_phone' => $phonenumber,
    );
}
$customer->insert($fake_customer_name_records_array);
It can only insert around 1000 records. Is there a way in Laravel for me to process about 60,000 records?
Thanks
I would suggest using the "chunk" option here and processing records in chunks. It's a more native way, in my opinion. Here's what the docs say:
Chunking Results
If you need to process a lot (thousands) of Eloquent records, using
the chunk command will allow you to do so without eating all of your RAM:
User::chunk(200, function($users)
{
foreach ($users as $user)
{
//
}
});
The first argument passed to the method is the number of records you
wish to receive per "chunk". The Closure passed as the second argument
will be called for each chunk that is pulled from the database.
Link to read more: click
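Applied to your case, it could look something like this (untested; assumes the Customer and Fakecustomer models and columns from the question, and an arbitrary chunk size):
// Untested sketch: read Fakecustomer in chunks and bulk-insert into Customer,
// keeping memory usage flat regardless of the total row count.
Fakecustomer::chunk(1000, function ($records) {
    $rows = [];
    foreach ($records as $record) {
        // process/modify the columns here as needed
        $rows[] = [
            'last_name'  => $record->last_name,
            'first_name' => $record->first_name,
            'home_phone' => $record->home_phone,
        ];
    }
    Customer::insert($rows); // one multi-row INSERT per chunk
});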
Use an extra counter variable and add 1 on each iteration; when it reaches 1000 (or lower), execute the insert and reset the counter.
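For example, a rough sketch of that counter approach, reusing the variables from the question above:
// Sketch: buffer rows and flush every 1000 to stay under the limit.
$buffer = [];
$count = 0;
foreach ($fake_customer_name_records as $record) {
    // process columns for each record, as in the original loop
    $buffer[] = [
        'last_name'  => $record['last_name'],
        'first_name' => $record['first_name'],
        'home_phone' => $record['home_phone'],
    ];
    if (++$count === 1000) {
        $customer->insert($buffer);
        $buffer = [];
        $count = 0;
    }
}
if (!empty($buffer)) {
    $customer->insert($buffer); // flush the remaining rows
}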
Have you tried disabling the query log with DB::disableQueryLog()? I had the same problem and this pretty much solved it.
Also, when working with migrations or some process that's going to take a lot of time, try to create an Artisan command instead of doing it in a controller.
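A bare-bones command skeleton might look like this (a sketch in Laravel 5.1+ style; the class name and signature are made up, and in older 5.x releases the generator is php artisan make:console rather than make:command):
// Sketch of an Artisan command; class name and signature are hypothetical.
class MigrateCustomers extends \Illuminate\Console\Command
{
    protected $signature = 'customers:migrate';
    protected $description = 'Copy customer records over in chunks';

    public function handle()
    {
        \DB::disableQueryLog();
        // run the long import here, free of HTTP timeouts
    }
}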
I'm working on a research project that requires me to process large csv files (~2-5 GB) with 500,000+ records. These files contain information on government contracts (from USASpending.gov). So far, I've been using PHP or Python scripts to attack the files row-by-row, parse them, and then insert the information into the relevant tables. The parsing is moderately complex. For each record, the script checks to see if the entity named is already in the database (using a combination of string and regex matching); if it is not, it first adds the entity to a table of entities and then proceeds to parse the rest of the record and inserts the information into the appropriate tables. The list of entities is over 100,000.
Here are the basic functions (part of a class) that try to match each record with any existing entities:
private function _getOrg($data)
{
// if name of organization is null, skip it
if($data[44] == '') return null;
// use each of the possible names to check if organization exists
$names = array($data[44],$data[45],$data[46],$data[47]);
// cycle through the names
foreach($names as $name) {
// check to see if there is actually an entry here
if($name != '') {
if(($org_id = $this->_parseOrg($name)) != null) {
$this->update_org_meta($org_id,$data); // updates some information of existing entity based on record
return $org_id;
}
}
}
return $this->_addOrg($data);
}
private function _parseOrg($name)
{
// check to see if it matches any org names
// db class function, performs simple "like" match
$this->db->where('org_name',$name,'like');
$result = $this->db->get('orgs');
if(mysql_num_rows($result) == 1) {
$row = mysql_fetch_object($result);
return $row->org_id;
}
// check to see if matches any org aliases
$this->db->where('org_alias_name',$name,'like');
$result = $this->db->get('orgs_aliases');
if(mysql_num_rows($result) == 1) {
$row = mysql_fetch_object($result);
return $row->org_id;
}
return null; // no matches, have to add new entity
}
The _addOrg function inserts the new entity's information into the db, where hopefully it will match subsequent records.
Here's the problem: I can only get these scripts to parse about 10,000 records per hour, which, given the size, means a few solid days for each file. The way my db is structured requires several different tables to be updated for each record, because I'm compiling multiple external datasets. So, each record updates two tables, and each new entity updates three tables. I'm worried that this adds too much lag time between the MySQL server and my script.
Here's my question: is there a way to import the text file into a temporary MySQL table and then use internal MySQL functions (or PHP/Python wrapper) to speed up the processing?
I'm running this on Mac OS 10.6 with a local MySQL server.
Load the file into a temporary/staging table using load data infile, and then use a stored procedure to process the data - it shouldn't take more than 1-2 minutes at most to completely load and process the data.
You might also find some of my other answers of interest:
Optimal MySQL settings for queries that deliver large amounts of data?
MySQL and NoSQL: Help me to choose the right one
How to avoid "Using temporary" in many-to-many queries?
60 million entries, select entries from a certain month. How to optimize database?
Interesting presentation:
http://www.mysqlperformanceblog.com/2011/03/18/video-the-innodb-storage-engine-for-mysql/
Example code (may be of use to you):
truncate table staging;
start transaction;
load data infile 'your_data.dat'
into table staging
fields terminated by ',' optionally enclosed by '"'
lines terminated by '\n'
(
org_name
...
)
set
org_name = nullif(org_name,'');
commit;
drop procedure if exists process_staging_data;
delimiter #
create procedure process_staging_data()
begin
insert ignore into organisations (org_name) select distinct org_name from staging;
update...
etc..
-- or use a cursor if you have to ??
end#
delimiter ;
call process_staging_data();
Hope this helps
It sounds like you'd benefit the most from tuning your SQL queries, which is probably where your script spends the most time. I don't know how the PHP MySQL client performs, but MySQLdb for Python is fairly fast. Doing naive benchmark tests, I can easily sustain 10k insert/select queries per second on one of my older quad-cores. Instead of doing one SELECT after another to test if the organization exists, using a REGEXP to check for them all at once might be more efficient (discussed here: MySQL LIKE IN()?). MySQLdb lets you use executemany() to do multiple inserts simultaneously; you could almost certainly leverage that to your advantage. Perhaps your PHP client lets you do the same thing?
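In PHP, the closest analogue to executemany() is a single multi-row INSERT; here's a hedged sketch using PDO (the table and column names are placeholders, and $pdo is assumed to be an existing PDO connection):
// Sketch: one multi-row INSERT instead of N single-row round trips.
// Table/column names and the sample org names are placeholders.
$rows = ['Acme Corp', 'Globex', 'Initech'];
$placeholders = implode(',', array_fill(0, count($rows), '(?)'));
$stmt = $pdo->prepare("INSERT INTO orgs (org_name) VALUES $placeholders");
$stmt->execute($rows);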
Another thing to consider: with Python you can use multiprocessing to parallelize as much as possible. PyMOTW has a good article about multiprocessing.