I download XML from an external URL and parse it into MySQL.
Rate::updateOrCreate([
    'exchanger_id'      => $exchangerId,
    'signature_from_id' => $signatureFromId,
    'signature_to_id'   => $signatureToId,
], [
    'in'     => $item->in,
    'out'    => $item->out,
    'amount' => $item->amount,
]);
The thing is, the XML contains many items, and I parse many sites, so it results in about 20K queries for 20-25 URLs. Later on I'll parse about 300 URLs and the number of queries will grow.
How could I optimize this process? I mean the updateOrCreate part. If a row with exchanger_id, signature_from_id and signature_to_id exists I need to update it, otherwise create a new row, and repeat that for every XML item.
As I understand it, Laravel makes at least two queries per item: first a SELECT to check whether the row exists, then an INSERT or UPDATE.
I couldn't come up with any batch approach :(
Update
I created a composite unique key over the first three columns (exchanger_id, signature_from_id, signature_to_id) and pulled in this trait: https://github.com/yadakhov/insert-on-duplicate-key
The number of queries dropped to 26 (it was about 20,000), but the total time required to handle all of this didn't change. What am I missing?
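For reference, this is roughly what the batched version looks like with that trait (the insertOnDuplicateKey() method name is the one shown in the package's README; the chunk size of 500 is arbitrary, and $xmlItems simply stands for the items parsed out of one feed):

// Build all rows in memory first, then upsert them in chunks.
// Assumes the Rate model uses the Yadakhov\InsertOnDuplicateKey trait
// and that the composite unique key described above exists.
$rows = [];

foreach ($xmlItems as $item) {
    $rows[] = [
        'exchanger_id'      => $exchangerId,
        'signature_from_id' => $signatureFromId,  // resolved per item, as in the original loop
        'signature_to_id'   => $signatureToId,
        'in'                => $item->in,
        'out'               => $item->out,
        'amount'            => $item->amount,
    ];
}

foreach (array_chunk($rows, 500) as $chunk) {
    // One INSERT ... ON DUPLICATE KEY UPDATE per chunk
    // instead of one SELECT plus INSERT/UPDATE per item.
    Rate::insertOnDuplicateKey($chunk);
}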
Why not do this instead, if your business case allows it:
(1) Store all the XML files in bulk in some folder in your app.
(2) Create a cron job that does the processing for you and fires an event you can listen for when processing completes, so you can take the next step (a rough sketch of the scheduling part follows). Take a look at Laravel's task scheduling, and at queues and events for some more advanced ideas.
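A minimal sketch of the scheduling part, assuming the download/parse work lives in an artisan command; the command name import:rates and the every-six-hours schedule are made up for illustration:

<?php
// app/Console/Kernel.php

namespace App\Console;

use Illuminate\Console\Scheduling\Schedule;
use Illuminate\Foundation\Console\Kernel as ConsoleKernel;

class Kernel extends ConsoleKernel
{
    protected function schedule(Schedule $schedule)
    {
        // Run the import on whatever schedule fits (here: every six hours)
        // and make sure runs don't overlap. The command itself can fire an
        // event (with queued listeners) when it finishes.
        $schedule->command('import:rates')
                 ->cron('0 */6 * * *')
                 ->withoutOverlapping();
    }
}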
Related
I have 2 tables
1. First table contains prospects, their treatment status and the mail code they received (see it as a foreign key)
2. Second table contains mails, indexed with email code
I need to display some charts about hundreds of thousands of prospects, so I was thinking about an aggregate query (get prospect data grouped by month, count positive statuses, count negative statuses, between a start and end date, etc.).
The result is short and simple, and I can use it directly in the charts:
[ "2019-01" => [ "WON" => 55000, "LOST" => 85000, ...],
...
]
Then I was asked to add a filter on mails (code and human-readable label) so the user can choose them from a multi-select field. I can handle writing the query (or queries), but I am wondering which way I should go.
I have a choice between:
- keeping my first query and doing a second one (distinct mail values, same conditions)
- querying everything and processing all the rows in PHP
I know how to code, but I have little knowledge about performance.
In theory I should not run two queries over the same data, but processing all those rows in PHP when MySQL can do it better looks like... overkill.
Is there a best practice?
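For concreteness, here is a sketch of the two-query option; the table and column names (prospects.status, prospects.mail_code, prospects.created_at, mails.code, mails.label) are assumptions, not taken from the question:

$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'secret');

$start = '2019-01-01';
$end   = '2019-12-31';

// 1) The aggregate for the charts: one row per month, computed entirely in MySQL.
$stats = $pdo->prepare(
    "SELECT DATE_FORMAT(created_at, '%Y-%m') AS month,
            SUM(status = 'WON')  AS won,
            SUM(status = 'LOST') AS lost
     FROM prospects
     WHERE created_at BETWEEN :start AND :end
     GROUP BY month
     ORDER BY month"
);
$stats->execute(['start' => $start, 'end' => $end]);
$chartData = $stats->fetchAll(PDO::FETCH_ASSOC);

// 2) The distinct mail codes/labels for the multi-select, same period.
//    When the user has picked codes, both queries would get an
//    "AND mail_code IN (...)" clause appended.
$mails = $pdo->prepare(
    "SELECT DISTINCT m.code, m.label
     FROM mails m
     JOIN prospects p ON p.mail_code = m.code
     WHERE p.created_at BETWEEN :start AND :end"
);
$mails->execute(['start' => $start, 'end' => $end]);
$mailOptions = $mails->fetchAll(PDO::FETCH_ASSOC);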
I have a lot of PHP pages that have dozens of queries supporting them, and they run plenty fast. When a page does not run fast, I focus on the slowest query; I do not play games in PHP. But I avoid running a query that hits hundreds of thousands of rows; it will be "too" slow. Some things...
Maybe I will find a way to aggregate the data to avoid a big scan (a sketch follows after this list).
Maybe I will move the big query to a second page -- this avoids penalizing the user who does not need it.
Maybe I will break up the big scan so that the user must ask for pieces, not build a page with 100K lines. Pagination is not good for that many rows. So...
Maybe I will dynamically build an index into a second level of pages.
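A sketch of the first item, the summary-table idea: keep a small, pre-aggregated table that the chart query reads instead of scanning the big prospects table. All names are assumptions; the refresh shown here is a full rebuild run from cron, while an incremental refresh would only touch recent months.

$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'secret');

// Summary keyed by month and mail code; assumes mail_code is never NULL.
$pdo->exec("
    CREATE TABLE IF NOT EXISTS prospect_monthly (
        month     CHAR(7)      NOT NULL,   -- e.g. '2019-01'
        mail_code VARCHAR(32)  NOT NULL,
        won       INT UNSIGNED NOT NULL,
        lost      INT UNSIGNED NOT NULL,
        PRIMARY KEY (month, mail_code)
    )
");

// Rebuild the summary (e.g. nightly); the charts then read prospect_monthly.
$pdo->exec("
    REPLACE INTO prospect_monthly (month, mail_code, won, lost)
    SELECT DATE_FORMAT(created_at, '%Y-%m'),
           mail_code,
           SUM(status = 'WON'),
           SUM(status = 'LOST')
    FROM prospects
    GROUP BY 1, 2
");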
To discuss this further, please provide SHOW CREATE TABLE, some SELECTs (not worrying about how bad they are; we'll tell you), and mockups of page(s).
I'm using Laravel 5.7 to fetch large amounts of data (around 500k rows) from an API server and insert it into a table (call it Table A) quite frequently (at least every six hours, 24/7) - however, it's enough to insert only the changes the next time we insert (but at least 60-70% of the items will change). So this table will quickly have tens of millions of rows.
I came up with the idea to make a helper table (call it Table B) to store all the new data into it. Before inserting everything into Table A, I want to compare it to the previous data (with Laravel, PHP) from Table B - so I will only insert the records that need to be updated. Again it will usually be around 60-70% of the records.
My first question is whether this above-mentioned way is the preferred way of doing it in this situation (obviously I want to make it happen as fast as possible). I assume that searching for and updating the records in the table would take a lot more time and would keep the table busy / lock it. Is there a better way to achieve the same thing (meaning updating the records in the DB)?
The second issue I'm facing is the slow insert times. Right now I'm using a local environment (16GB RAM, I7-6920HQ CPU) and MySQL is inserting the rows very slowly (about 30-40 records at a time). The size of one row is around 50 bytes.
I know it can be made a lot faster by fiddling around with InnoDB's settings. However, I'd also like to think that I can do something on Laravel's side to improve performance.
Right now my Laravel code looks like this (only inserting 1 record at a time):
foreach ($response as $key => $value)
{
    DB::table('table_a')->insert([
        'test1' => $value['test1'],
        'test2' => $value['test2'],
        'test3' => $value['test3'],
        'test4' => $value['test4'],
        'test5' => $value['test5'],
    ]);
}
$response is an array.
So my second question: is there any way to increase the insert rate to something like 50k rows/second, both on the Laravel application layer (by doing batch inserts) and at the MySQL/InnoDB level (by changing the config)?
Current InnoDB settings:
innodb_buffer_pool_size = 256M
innodb_log_file_size = 256M
innodb_thread_concurrency = 16
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = normal
innodb_use_native_aio = true
MySQL version is 5.7.21.
If I forgot to tell/add anything, please let me know in a comment and I will do it quickly.
Edit 1:
The server that I'm planning to use will have an SSD in it, if that makes any difference. I assume MySQL inserts will still count as I/O.
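For what it's worth, here is a rough sketch of the Table B (staging) approach described above, letting MySQL work out which rows are new or changed instead of comparing them in PHP. The unique key on table_a(test1) and the column mapping are assumptions:

use Illuminate\Support\Facades\DB;

// 1) Load the fresh API snapshot into the staging table.
DB::table('table_b')->truncate();

foreach (array_chunk($response, 1000) as $chunk) {
    DB::table('table_b')->insert(array_map(function ($value) {
        return [
            'test1' => $value['test1'],
            'test2' => $value['test2'],
            'test3' => $value['test3'],
            'test4' => $value['test4'],
            'test5' => $value['test5'],
        ];
    }, $chunk));
}

// 2) Copy only new or changed rows into table_a in a single statement.
//    Requires a UNIQUE/PRIMARY key on table_a(test1).
DB::statement("
    INSERT INTO table_a (test1, test2, test3, test4, test5)
    SELECT b.test1, b.test2, b.test3, b.test4, b.test5
    FROM table_b b
    LEFT JOIN table_a a ON a.test1 = b.test1
    WHERE a.test1 IS NULL
       OR a.test2 <> b.test2 OR a.test3 <> b.test3
       OR a.test4 <> b.test4 OR a.test5 <> b.test5
    ON DUPLICATE KEY UPDATE
        test2 = VALUES(test2), test3 = VALUES(test3),
        test4 = VALUES(test4), test5 = VALUES(test5)
");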
Disable autocommit and manually commit at end of insertion
According to the MySQL 8.0 docs (8.5.5 Bulk Data Loading for InnoDB Tables), you can increase INSERT speed by turning off autocommit:
When importing data into InnoDB, turn off autocommit mode, because it performs a log flush to disk for every insert. To disable autocommit during your import operation, surround it with SET autocommit and COMMIT statements:
SET autocommit=0;
... SQL import statements ...
COMMIT;
Another way to do it in Laravel is to use database transactions:
DB::beginTransaction();
// Your inserts here
DB::commit();
Use INSERT with multiple VALUES
Also according to the MySQL 8.0 docs (8.2.5.1 Optimizing INSERT Statements), you can optimize INSERT speed by using multiple VALUES lists in a single INSERT statement.
To do that with Laravel, you can just pass an array of rows to the insert() method:
DB::table('your_table')->insert([
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
    [
        'column_a' => 'value',
        'column_b' => 'value',
    ],
]);
According to the docs, it can be many times faster.
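Putting the two suggestions together, here is a rough sketch of one transaction wrapped around chunked multi-row inserts; the chunk size of 1000 is an arbitrary starting point to tune against max_allowed_packet:

use Illuminate\Support\Facades\DB;

DB::transaction(function () use ($response) {
    foreach (array_chunk($response, 1000) as $chunk) {
        $rows = [];

        foreach ($chunk as $value) {
            $rows[] = [
                'test1' => $value['test1'],
                'test2' => $value['test2'],
                'test3' => $value['test3'],
                'test4' => $value['test4'],
                'test5' => $value['test5'],
            ];
        }

        // One INSERT with many VALUES lists per chunk, all in one commit.
        DB::table('table_a')->insert($rows);
    }
});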
Read the docs
Both MySQL docs links that I put on this post have tons of tips on increasing INSERT speed.
Avoid using Laravel/PHP for inserting it
If your data source is (or can be) a CSV file, you can run it a lot faster using mysqlimport to import the data.
Using PHP and Laravel to import data from a CSV file adds overhead, unless you need to do some data processing before inserting.
Thanks @Namoshek, I also had the same problem. The solution is like this:
$users = array_chunk($data, 500, true);

foreach ($users as $key => $user) {
    Model::insert($user);
}
Depending on your data, you can also make use of array_push() to build the rows and then insert them.
Don't call insert() inside a foreach(), because it will execute n queries against the database when you have n rows of data.
First create an array of rows whose keys match the database column names, and then pass that array to the insert() method.
This will execute only one query against the database, regardless of how much data you have.
This is way, way faster.
$data_to_insert = [];

foreach ($response as $key => $value)
{
    array_push($data_to_insert, [
        'test1' => $value['test1'],
        'test2' => $value['test2'],
        'test3' => $value['test3'],
        'test4' => $value['test4'],
        'test5' => $value['test5'],
    ]);
}

DB::table('table_a')->insert($data_to_insert);
You need to do a multi-row insert, but also chunk your inserts so you don't exceed your DB limits. You can do this by chunking your array:
foreach (array_chunk($response, 1000) as $responseChunk)
{
    $insertableArray = [];

    foreach ($responseChunk as $value) {
        $insertableArray[] = [
            'test1' => $value['test1'],
            'test2' => $value['test2'],
            'test3' => $value['test3'],
            'test4' => $value['test4'],
            'test5' => $value['test5'],
        ];
    }

    DB::table('table_a')->insert($insertableArray);
}
You can increase the chunk size from 1000 until you approach your DB configuration limit. Make sure to leave some safety margin (say 0.6 times your DB limit).
You can't go any faster than this using Laravel.
So I do a lot of calculations and at the end I have rates that need to be saved to existing rows in a table.
The array I have will be similar to the following:
[
    <model_id> => [
        'rate' => <some rate>,
    ],
    <model_id_2> => [
        'rate' => <some other rate>,
    ],
    ...
]
Now obviously I could foreach through this array and do an update for each and every item, but I could end up with 100 update calls. Is there a way (through Laravel's Eloquent, or even a raw SQL query) to do all these updates in one call?
You may try Eloquent's update() for updating multiple records. Here is some code which I am using to update multiple records in my table:
\App\Notification::where('to_id', '=', 0)
->update(['is_read' => 1]);
If you are worried about the time spent in the request, you can handle this by firing an event and then queueing your listener/job, which will save your models so they are processed asynchronously. For examples, see the Laravel docs for queues.
As far as I know, you cannot update multiple rows to different values in a single call with Laravel's query builder or Eloquent.
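That said, it can be done in one round trip with a raw statement built around a CASE expression. A sketch; the models table name and its id/rate columns are assumptions based on the array shape in the question:

use Illuminate\Support\Facades\DB;

// <model_id> => ['rate' => <rate>], as in the question.
$rates = [
    12 => ['rate' => 1.25],
    15 => ['rate' => 0.80],
];

$cases    = [];
$bindings = [];

foreach ($rates as $id => $row) {
    $cases[]    = 'WHEN ? THEN ?';
    $bindings[] = $id;
    $bindings[] = $row['rate'];
}

$ids      = array_keys($rates);
$bindings = array_merge($bindings, $ids);

$sql = 'UPDATE models '
     . 'SET rate = CASE id ' . implode(' ', $cases) . ' END '
     . 'WHERE id IN (' . implode(',', array_fill(0, count($ids), '?')) . ')';

// One UPDATE statement for all rows instead of one per model.
DB::update($sql, $bindings);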
I currently have some code which needs to perform multiple updates per user for thousands of users, incrementing a counter depending on an action they've taken in order to track what actions are being performed. Each action consists of subactions which need to have the count updated too. These need to be tracked by day.
So I am storing "action":"actionName", "day":day, "count": count, for actions per day (e.g. incoming from outside web page, start game, stop game by exiting, concatenated with the game name for a lot of games).
Each day I get a few thousand rows (one per unique action) added which are updated a few hundred thousand times each day to increase the count.
The relevant code is as follows (creating array of actions not included).
$m = new Mongo();
$db = $m->actionsDB;
$collection = $db->action_count;

foreach ($arr as $action) {
    $collection->update(
        array("action" => $action, "day" => $day),
        array('$inc' => array("count" => 1)),
        array("upsert" => true)
    );
}

$collection->ensureIndex(array("action" => 1, "day" => -1));
An example of the series of updates made on an action and subactions would be:
startGame, 20110417;
startGameZork, 20110417;
startGameZorkWindows, 20110417
The problem seems to be that with this code running on the server, mongo commands in the shell get queued up.
Currently I'm unsure as to why, I guess there may be a performance issue with so many updates per second.
What I am wondering is how can I increase performance? I'm pretty new to mongo, so not entirely sure what options are available. I looked at PHP's batchInsert but I can't see any mention of doing batchUpdate (so instead of updating, creating an array holding all the data I currently update then doing a batchUpdate in a single trip to the DB).
Mongo driver version is 1.2.0, so persistent connections are by default.
Edit: db.serverStatus() before, during and after on ~1600 updates per second (30 seconds). Test Data
There is no built-in batching for updates/upserts. You can only limit the docs to be updated by adjusting your query expression and adding some further filter for "emulating" a batch somehow. MongoDB won't help you here. Updates/Upserts are one or all.
If you have the option of storing your data in a file (JSON or CSV), you could try inserting the data with the command-line mongoimport utility.
That way you can use the --upsert flag to update documents that already exist and insert the ones that are new.
For example from PHP:
exec("mongoimport --db <bdname> --collection <collection_name> --jsonArray --upsert --file $data_file");
So I'm working on a project for a realtor. I have the following objects/MySQL tables in my design:
Complexes
Units
Amenities
Pictures
Links
Documents
Events
Agents
These are the relationships between the above objects.
Complexes have a single Agent.
Complexes have multiple Units, Amenities, Pictures, Links, Documents, and Events.
Units have multiple Pictures, Links, and Documents.
Amenities, Pictures, Links, Documents, and Events all have the necessary foreign keys in the database to specify which unit/complex they belong to.
I need to load the necessary objects from the database into PHP so I can use them in my project.
If I try to select all the data out of the table in 1 query, using LEFT JOINS, I'll get AT LEAST (# of links) * (# of pictures) * (# of documents) rows for each unique unit. Add amenities, and events to that and I'll get all that * # of amenities * # of events for each complex...Not sure I want to try to deal with loading that into an object in PHP.
The other possibility is, for each complex/unit, to execute a separate SQL statement each for links, pictures, documents, events, and amenities.
My questions are as follows:
If I properly index all my tables, is it REALLY a bad idea to execute 3-5 extra queries for each complex/unit?
If not, how else can I get the data I need to load into a PHP object. Ideally, I would have an object as follows for units:
Unit Object
(
    [id]
    [mls_number]
    [type]
    [retail_price]
    [investor_price]
    [quantity]
    [beds]
    [baths]
    [square_feet]
    [description]
    [featured]
    [year_built]
    [has_garage]
    [stories]
    [other_features]
    [investor_notes]
    [tour_link]
    [complex] => Complex Object
        (
            [id]
            [name]
            [description]
            etc.
        )
    [agent] => Agent Object
        (
            [id]
            [first_name]
            [last_name]
            [email]
            [phone]
            [phone2]
            etc.
        )
    [pictures] => Array
        (
            [1] => Picture Object
                (
                )
        )
    [links] => Array
        (
            [1] => Link Object
                (
                )
        )
    [documents] => Array
        (
            [1] => Document Object
                (
                )
        )
)
I don't ALWAYS need ALL of this information, sometimes I only need the primary key of the complex, sometimes I only need the primary key of the agent, etc. But I figured the correct way to do this would be to load the entire object every time I instantiate it.
I've been doing a lot of research on OO PHP, but most (read all) online examples use only 1 table. That obviously doesn't help as the project I'm working on has many complex relationships. Any ideas? Am I totally off the mark here?
Thanks
[UPDATE]
On the other hand, usually on the front end, which everyone will see, I WILL need ALL the information. For instance, when someone wants information on a specific complex, I need to display all units belonging to that complex; all pictures, documents, links, and events for the complex; as well as all pictures, documents, and links for each unit.
What I was hoping to avoid was, during one page load, executing one query to get the complex I need, then another query to get the 20 units associated with the complex, then for each of the 20 units executing a query for pictures, another for documents, another for links, etc. I wanted to get it all at once, with one trip to the database.
[EDIT 2]
Also, note that the queries to select the pictures, documents, links, events, and agent from the database are pretty simple. Just basic SELECT [list of columns] FROM [table] WHERE [primary_key] = [value] with the occasional INNER JOIN. I'm not doing any complex computations or subqueries, just basic stuff.
[BENCHMARK]
So after reading all the answers to my question, I decided to run a benchmark on what I decided to do. What I do is load all the units that I need; then, as I need to display pictures, documents, and so on, I load them at that time. I created 30,000 test units, each with 100 pictures, 100 documents, and 100 links. Then I loaded a certain number of units (I started with 1000, then 100, then the more realistic 10), looped through them, then loaded all pictures, documents and links associated with each unit. With 1000 units, it took approximately 30 seconds. With 100 units, it took about 3 seconds. With 10 units, it took about .5 seconds.

There was a lot of variance in the results. Sometimes, with 10 units, it would take .12 seconds. Then it would take .8. Then maybe .5. Then .78. It was really all over the place. However, it seemed to average around half a second.

In reality, though, I might only need 6 units at a time, and they each might only have 10 pictures, 5 links and 5 documents associated with them, so I think the "grab the data when you need it" approach is the best bet in a situation like this. If you needed to get all this data at once, though, it would be worthwhile to come up with a single SQL statement to load all the data you need so you are only looping through the data one time (6700 units at a time took 217 seconds, while the full 30,000 made PHP run out of memory).
If I properly index all my tables, is it REALLY a bad idea to execute 3-5 extra queries for each complex/unit?
In short, no. For each of the related tables, you should probably run a separate query. That's what most ORM (Object-Relational Mapping/Modelling) systems would do.
If performance is really a problem (and, based on what you've said, it won't be) then you might consider caching the results using something like APC, memcache or Xcache.
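To make "a separate query per related table" concrete for a whole set of units (so you don't fall back into one query per unit), here is a sketch; the table and column names are assumptions:

$pdo = new PDO('mysql:host=localhost;dbname=realty;charset=utf8', 'user', 'secret');

// 1) The units for one complex.
$stmt = $pdo->prepare('SELECT * FROM units WHERE complex_id = ?');
$stmt->execute([$complexId]);
$units = $stmt->fetchAll(PDO::FETCH_ASSOC);

$unitIds      = array_column($units, 'id');
$placeholders = implode(',', array_fill(0, count($unitIds), '?'));

// 2) All pictures for all of those units in a single query.
$stmt = $pdo->prepare("SELECT * FROM pictures WHERE unit_id IN ($placeholders)");
$stmt->execute($unitIds);

// 3) Group them by unit in PHP so each unit object can pick up its own pictures.
$picturesByUnit = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $picture) {
    $picturesByUnit[$picture['unit_id']][] = $picture;
}

// Repeat steps 2-3 for links, documents, events, etc.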
The point of an ORM is not to load entire objects every time. The point is to make it easy and transparent for your app to access objects.
That being said, if you need the unit object, then load the unit object, and only the unit object. If you need the agent object, then load that when you need it, not when you load the unit object.
Maybe you should think of breaking this up.
When you instantiate your object, get only the details you need for that object to function. If and when you need more details, go and get them. You distribute your load and processing this way: the object only incurs the load and processing it needs to function, and when more is needed, it gets it then.
So, in your example - create the complex first. When you need to access a unit, then create that unit, when you need the agent, then get that agent, etc.
$complexDetails = array('id' => $id /* , etc. */);
$complexUnits   = array();

// ... later, only when a unit is actually needed ...
$complexUnits[] = new Unit();

// ... and only when the agent is needed ...
$complexDetails['agent'] = new Agent();
I had to address this issue a while back when I concocted my own MVC framework as an experiment. To limit the layers of data loaded from the DB, I passed an integer to the constructor. Each constructor would decrement this integer before passing it to the constructors of the objects it instantiated. When it got to 0, no more sub-objects would be instantiated. This meant, basically, the int passed was the number of layers loaded.
So if I only wanted an attribute of the unit object, I'd do this:
$myUnit = new Unit($unitId,1);
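Here is a sketch of that pattern; Database::fetchRow() is a placeholder for whatever data-access helper you use, and the class/column names are illustrative only:

class Unit
{
    public $id;
    public $complex;   // Complex object, or null once the depth runs out

    public function __construct($unitId, $depth)
    {
        // Database::fetchRow() is a hypothetical helper standing in for your data layer.
        $row = Database::fetchRow('SELECT * FROM units WHERE id = ?', array($unitId));

        $this->id = $row['id'];

        // Each layer consumes one level; at 0, no more sub-objects are instantiated.
        $depth--;
        if ($depth > 0) {
            $this->complex = new Complex($row['complex_id'], $depth);
            // ... same idea for the agent, pictures, links, documents ...
        }
    }
}

// Just the unit's own attributes, no related objects:
$myUnit = new Unit($unitId, 1);
// The unit plus one layer of related objects:
$myUnit = new Unit($unitId, 2);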
If you want to "store" the objects, meaning cache them, just load them into a PHP array and serialize it. Then you can store it back to the database, in memcache or anywhere else. Attaching a label to it would allow you to retrieve it, and include a time stamp so you know how old it is (i.e. needs to be refreshed).
If the data doesn't change, or changes infrequently, there really is no reason to run multiple complex queries every time. Simple ones, like fetching a row by primary key, you might as well just run against the database directly.
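A small sketch of that caching idea using the Memcached extension; the key name and the one-hour TTL are arbitrary choices:

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

// Bundle the loaded objects with a timestamp so you know how old the cache is.
$payload = array(
    'stored_at' => time(),
    'units'     => $units,   // the array of loaded Unit objects
);

$memcached->set('complex_' . $complexId . '_units', serialize($payload), 3600);

// Later, on another request:
$cached = $memcached->get('complex_' . $complexId . '_units');
if ($cached !== false) {
    $payload = unserialize($cached);
    $units   = $payload['units'];
}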