I download XML from an external URL and parse it into MySQL.
Rate::updateOrCreate([
    'exchanger_id' => $exchangerId,
    'signature_from_id' => $signatureFromId,
    'signature_to_id' => $signatureToId
], [
    'in' => $item->in,
    'out' => $item->out,
    'amount' => $item->amount
]);
The thing is, each XML file contains many items and I parse many sites, so this results in about 20K queries for 20-25 URLs. Later on I'll parse about 300 URLs, and the number of queries will rise accordingly.
How could I optimize this process, specifically the updateOrCreate part? If a row with the given exchanger_id, signature_from_id and signature_to_id exists I need to update it; otherwise I need to create a new row. This is repeated for every XML item.
As I understand it, Laravel makes at least 2 queries per call: first a SELECT that checks whether the row exists, then the INSERT or UPDATE.
Couldn't think about any batch examples :(
Update
I made a unique composite key on the first three columns (exchanger_id, signature_from_id, signature_to_id) and downloaded this trait: https://github.com/yadakhov/insert-on-duplicate-key
The number of queries dropped to 26 (from about 20,000), but the time required to handle all this didn't change. What am I missing?
Why not do this instead, if your business case allows it?
(1) Store all the XML files in bulk in some folder in your app.
(2) Create a cron job that does the processing for you and fires an event when the processing is complete, so you can capture it and take the next step. Take a look at Laravel's task scheduling, and at queues and events for some more advanced ideas.
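On the batching question itself: with the composite unique key from the update in place, many rows can go into a single MySQL INSERT ... ON DUPLICATE KEY UPDATE statement. The following is only a minimal sketch of that idea in plain PHP; the helper name buildUpsert and the table name used in the usage note are assumptions, not part of the question or of the linked trait.

```php
<?php
// Hypothetical helper: builds one multi-row MySQL upsert statement.
// Assumes every row has the same keys in the same order, and that table and
// column names come from trusted code (they are quoted, not parameterized).
function buildUpsert(string $table, array $rows, array $updateColumns): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('No rows to upsert.');
    }
    $columns = array_keys($rows[0]);
    $quote = static fn(string $name): string => '`' . str_replace('`', '``', $name) . '`';

    // One "(?, ?, ...)" group per row.
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $groups = implode(', ', array_fill(0, count($rows), $group));

    // On a unique-key collision, overwrite only the listed columns.
    $updates = implode(', ', array_map(
        static fn(string $c): string => $quote($c) . ' = VALUES(' . $quote($c) . ')',
        $updateColumns
    ));

    $sql = 'INSERT INTO ' . $quote($table)
        . ' (' . implode(', ', array_map($quote, $columns)) . ')'
        . ' VALUES ' . $groups
        . ' ON DUPLICATE KEY UPDATE ' . $updates;

    // Flatten the values in column order so they line up with the placeholders.
    $bindings = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $bindings[] = $row[$column];
        }
    }
    return [$sql, $bindings];
}
```

In Laravel the result could be run with DB::statement($sql, $bindings), one call per chunk of a few hundred rows. Note that VALUES() inside ON DUPLICATE KEY UPDATE is deprecated in recent MySQL 8.0 releases in favour of row aliases, though it still works. If the total time does not drop after batching, the queries were probably not the bottleneck; it may be worth timing the downloads and the XML parsing separately.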
I am trying to fetch data recursively (over associations) using CakePHP 3.1.
In Cake 3 I can use the "contain" key to fetch the next level of associated data, but I need to fetch one more level. Does anyone know how to do this? I read the docs but didn't find anything there, and Google turned up nothing either.
The 3 Levels are connected like this:
OperationalCostInvoice (belongsTo Object)
-> Object (hasMany OperationalCostTypes)
-> OperationalCostType
With OperationalCostInvoice->get($object_id, ['contain' => 'Object']) I can get the Object that is associated with the OperationalCostInvoice but I also want to fetch the OperationalCostTypes from the Object in (if possible) just one call.
I don't need tips about how the associations are linked; the entities are linked this way so that I can easily implement a history function.
Thanks in advance!
I just meant one function call (on the Table object) to fetch everything. I know that more than one query is required.
Just create your own table method then and return all your results in one array or implement whatever you want and return it.
public function foo() {
    // "..." stands for whatever conditions your own queries need.
    return [
        'one' => $this->find()...->all(),
        'two' => $this->Assoc->find()...->all(),
    ];
}
But in CakePHP 2 there was the option recursive which controlled on how many levels associated data is fetched.
The recursive option was a pretty stupid thing in Cake2. The first thing we always did was set it to -1 in the AppModel to avoid unnecessary data fetching. Using contain is always the better choice. I would stay away from recursive altogether, especially for deeper levels.
Also, contain can still fetch associations more than one level deep, just as it could in Cake2.
$this->find()->contain([
    'FirstLevel' => [
        'SecondLevel' => [
            'ThirdLevel'
        ]
    ]
])->all();
problem
I have two data tables, SEQUENCES and ORGANISMS, whose many-to-many relationship is mapped in the table SOURCES. There is also a 1-m relationship between SOURCES and ENTRIES. A detailed structure is appended below.
What I want to achieve is to display all sequences with all associated organisms and entries, where a condition within the sequences table is met. I have some ideas on how to achieve this, but I need the solution with the best performance, as each of these tables contains 50k+ records.
idea one
Select all organisms that belong to the same sequence as a concatenated string in SQL, and split it in PHP. I have no idea, though, how to do the concatenation in SQL.
idea two
Select the same sequence with different organisms as distinct records, order by organism, and join them later in PHP. This somehow just feels wrong, though.
idea three
Use views. ANY idea on this one is appreciated.
structure
SEQUENCES
    SEQUENCE_ID
    DESCRIPTION
ORGANISMS
    ORGANISM_ID
    NAME
SOURCES
    SOURCE_ID
    SEQUENCE_ID FK to SEQUENCES.SEQUENCE_ID
    ORGANISM_ID FK to ORGANISMS.ORGANISM_ID
ENTRIES
    SOURCE_ID FK to SOURCES.SOURCE_ID
    ENTRY_VALUE
desired outcome
array(
    array(
        "SEQUENCE_ID" => 4,
        "DESCRIPTION" => "Some sequence",
        "SOURCES" => array(
            array(
                "ORGANISM_ID" => 562,
                "ORGANISM_NAME" => "Escherichia coli",
                "ENTRIES" => array(
                    "some entry",
                    "some other entry"
                )
            ),
            array(
                "ORGANISM_ID" => 402764,
                "ORGANISM_NAME" => "Aranicola sp. EP18",
                "ENTRIES" => array()
            )
        )
    ),
    array(
        "SEQUENCE_ID" => 5,
        .....
    )
)
PHP 5 and Firebird 2.5.1
You can't fetch a nested array like that directly from a flat table structure. But if I get you right, what you want to do is not that hard to achieve.
I don't understand why you would concatenate things and then split them again; that's hard to maintain and probably slow.
I see two approaches here:
(1) Fetch everything at once as a flat table using JOIN and loop through it in PHP. This approach creates a lot of duplication, but it's fast because you can fetch all the data in one query and then process it with PHP.
(2) Fetch every entity separately, loop, and fetch the next hierarchy level as you go. This approach will be slower, but it takes complexity away from the SQL query and doesn't fetch redundant data. It also gives you more freedom as to how you loop through your data and what you do with it.
Alternatively you might want to actually store hierarchical data in a no-sql way, where you could already store the array structure you mentioned.
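The first approach (one flat JOIN, regrouped in PHP) can be sketched roughly as follows. This is only an illustration: the function name nestRows is made up, and it assumes each fetched row carries the keys SEQUENCE_ID, DESCRIPTION, SOURCE_ID, ORGANISM_ID, ORGANISM_NAME and ENTRY_VALUE, with LEFT JOINs producing NULL where a sequence has no source or a source has no entry.

```php
<?php
// Groups flat JOIN rows into sequence -> sources -> entries.
// The row keys are assumptions about how the query aliases its columns.
function nestRows(iterable $rows): array
{
    $sequences = [];
    foreach ($rows as $row) {
        $seqId = $row['SEQUENCE_ID'];
        $srcId = $row['SOURCE_ID'];

        // First time this sequence is seen: create its shell.
        if (!isset($sequences[$seqId])) {
            $sequences[$seqId] = [
                'SEQUENCE_ID' => $seqId,
                'DESCRIPTION' => $row['DESCRIPTION'],
                'SOURCES' => [],
            ];
        }
        // A sequence without any source yields a NULL SOURCE_ID.
        if ($srcId === null) {
            continue;
        }
        // First time this source is seen within the sequence.
        if (!isset($sequences[$seqId]['SOURCES'][$srcId])) {
            $sequences[$seqId]['SOURCES'][$srcId] = [
                'ORGANISM_ID' => $row['ORGANISM_ID'],
                'ORGANISM_NAME' => $row['ORGANISM_NAME'],
                'ENTRIES' => [],
            ];
        }
        // A source without entries yields a NULL ENTRY_VALUE.
        if ($row['ENTRY_VALUE'] !== null) {
            $sequences[$seqId]['SOURCES'][$srcId]['ENTRIES'][] = $row['ENTRY_VALUE'];
        }
    }
    // Drop the lookup keys so the result is list-shaped like the desired outcome.
    foreach ($sequences as &$sequence) {
        $sequence['SOURCES'] = array_values($sequence['SOURCES']);
    }
    unset($sequence);
    return array_values($sequences);
}
```

Because the function accepts any iterable, rows can be passed in as they are fetched instead of being collected into one large array first, which may matter with 50k+ records.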
I have a site developed in CakePHP 2.0 with several related tables. Here is an example:
These are my relations:
ingredients (id,name) has many versions
ingredient_properties (id,property_id,version_id) belongs to properties, versions
properties (id,name,value,group_id,unit_id) has many ingredient_properties and belongs to groups, units
groups (id,name) has many properties
units (id,name) has many properties
versions (id,name,ingredient_id,active) has many ingredient_properties and belongs to ingredients
I am in ingredientController.php and I want to retrieve all this data where Version.active=1 and Version.ingredient_id=2.
This is my query:
$this->set(
    'ingredient',
    $this->Ingredient->Version->find('all', array(
        'recursive' => 2,
        'conditions' => array(
            'Version.active' => 1,
            'Version.ingredient_id' => 2
        )
    ))
);
I have many queries like this, and I want to know whether recursive 2 is the best way to retrieve all the data from the tables described above, or whether there is a faster way (faster in terms of query speed, not implementation effort).
I hope someone can help me optimize my code: this query works, but I don't know if it is the best way to retrieve data from many related tables.
Thanks.
Using 'recursive' => 2 is not the best way to retrieve that much data; I believe it generates too many queries. The Containable behaviour has the same drawback. What worked best for me was to unbind the model associations and construct table joins on the fly. You can look at an example here. But you need to know some SQL to understand what you are doing.
So I'm working on a project for a realtor. I have the following objects/MySQL tables in my design:
Complexes
Units
Amenities
Pictures
Links
Documents
Events
Agents
These are the relationships between the above objects.
Complexes have a single Agent.
Complexes have multiple Units, Amenities, Pictures, Links, Documents, and Events.
Units have multiple Pictures, Links, and Documents.
Amenities, Pictures, Links, Documents, and Events all have the necessary foreign keys in the database to specify which unit/complex they belong to.
I need to load the necessary objects from the database into PHP so I can use them in my project.
If I try to select all the data out of the table in 1 query, using LEFT JOINS, I'll get AT LEAST (# of links) * (# of pictures) * (# of documents) rows for each unique unit. Add amenities, and events to that and I'll get all that * # of amenities * # of events for each complex...Not sure I want to try to deal with loading that into an object in PHP.
The other possibility is, for each complex/unit, to execute 1 separate SQL statement each for links, pictures, documents, events and amenities.
My questions are as follows:
If I properly index all my tables, is it REALLY a bad idea to execute 3-5 extra queries for each complex/unit?
If not, how else can I get the data I need to load into a PHP object? Ideally, I would have an object as follows for units:
Unit Object
(
    [id]
    [mls_number]
    [type]
    [retail_price]
    [investor_price]
    [quantity]
    [beds]
    [baths]
    [square_feet]
    [description]
    [featured]
    [year_built]
    [has_garage]
    [stories]
    [other_features]
    [investor_notes]
    [tour_link]
    [complex] => Complex Object
        (
            [id]
            [name]
            [description]
            etc.
        )
    [agent] => Agent Object
        (
            [id]
            [first_name]
            [last_name]
            [email]
            [phone]
            [phone2]
            etc.
        )
    [pictures] => Array
        (
            [1] => Picture Object
                (
                )
        )
    [links] => Array
        (
            [1] => Link Object
                (
                )
        )
    [documents] => Array
        (
            [1] => Document Object
                (
                )
        )
)
I don't ALWAYS need ALL of this information, sometimes I only need the primary key of the complex, sometimes I only need the primary key of the agent, etc. But I figured the correct way to do this would be to load the entire object every time I instantiate it.
I've been doing a lot of research on OO PHP, but most (read all) online examples use only 1 table. That obviously doesn't help as the project I'm working on has many complex relationships. Any ideas? Am I totally off the mark here?
Thanks
[UPDATE]
On the other hand, on the front-end, which everyone will see, I usually WILL need ALL the information. For instance, when someone wants information on a specific complex, I need to display all units belonging to that complex, all pictures, documents, links and events for the complex, as well as all pictures, documents and links for each unit.
What I was hoping to avoid was, during one page load, executing one query to get the complex I need. Then another query to get the 20 units associated with the complex. Then for each of the 20 units, executing a query for picture, another for documents, another for links, etc. I wanted to get them all at once, with one trip through the database.
[EDIT 2]
Also, note that the queries to select the pictures, documents, links, events, and agent from the database are pretty simple. Just basic SELECT [list of columns] FROM [table] WHERE [primary_key] = [value] with the occasional INNER JOIN. I'm not doing any complex computations or subqueries, just basic stuff.
[BENCHMARK]
So after reading all the answers to my question, I ran a benchmark on the approach I settled on: load all the units I need, then load pictures, documents, etc. only when I need to display them. I created 30,000 test units, each with 100 pictures, 100 documents, and 100 links. Then I loaded a certain number of units (I started with 1000, then 100, then the more realistic 10), looped through them, and loaded all pictures, documents and links associated with each unit.
With 1000 units, it took approximately 30 seconds. With 100 units, it took about 3 seconds. With 10 units, it took about .5 seconds. There was a lot of variance in the results: with 10 units it would sometimes take .12 seconds, then .8, then maybe .5, then .78. It was really all over the place, but it seemed to average around half a second.
In reality, though, I might only need 6 units at a time, and they each might only have 10 pictures, 5 links and 5 documents associated with them, so I think the "grab the data when you need it" approach is the best bet in a situation like this. If you needed to get all this data at once, though, it would be worthwhile to come up with a single SQL statement to load everything, so that you only loop through the data one time (6700 units at a time took 217 seconds, while the full 30,000 made PHP run out of memory).
If I properly index all my tables, is it REALLY a bad idea to execute 3-5 extra queries for each complex/unit?
In short, no. For each of the related tables, you should probably run a separate query. That's what most ORM (Object-Relational Mapping/Modelling) systems would do.
If performance is really a problem (and, based on what you've said, it won't be) then you might consider caching the results using something like APC, memcache or Xcache.
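One variation on "a separate query per related table", offered here as an assumption rather than something this answer spells out, is to run each of those queries once for the whole set of units using WHERE unit_id IN (...), then distribute the rows in PHP. That keeps the query count at one per table instead of one per unit per table. A minimal sketch with made-up helper names and column names:

```php
<?php
// Builds "?, ?, ?" for an IN (...) clause, one placeholder per id.
function inPlaceholders(array $ids): string
{
    return implode(', ', array_fill(0, count($ids), '?'));
}

// Attaches child rows to their parents under $property,
// matching child[$foreignKey] against parent['id'].
function attachChildren(array $parents, array $children, string $foreignKey, string $property): array
{
    // Index the children by the parent they belong to.
    $byParent = [];
    foreach ($children as $child) {
        $byParent[$child[$foreignKey]][] = $child;
    }
    // Give every parent its list, or an empty list when it has none.
    foreach ($parents as &$parent) {
        $parent[$property] = $byParent[$parent['id']] ?? [];
    }
    unset($parent);
    return $parents;
}
```

With 20 units, the pictures query would then look something like 'SELECT * FROM pictures WHERE unit_id IN (' . inPlaceholders($unitIds) . ')', executed once with $unitIds as the bound parameters, followed by attachChildren($units, $pictureRows, 'unit_id', 'pictures'). The same two steps repeat for links and documents.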
The point of ORM is not to load entire objects every time. The point is to make it easy and transparent for your app to access objects.
That being said, if you need the unit object, then load the unit object, and only the unit object. If you need the agent object, then load that when you need it, not when you load the unit object.
Maybe you should think of breaking this up.
When you initiate your object, get only what details you need for that object to function. If and when you need more details, then go and get them. You distribute your load and processing this way: the object only gets the load and processing it needs to function, and when more is needed, it gets it then.
So, in your example - create the complex first. When you need to access a unit, then create that unit, when you need the agent, then get that agent, etc.
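// Load only the complex's own details up front.
$complexDetails = array('id' => $id /* , etc. */);
$complexUnits = array();
// ... later, when a unit is actually needed:
$complexUnits[] = new Unit();
// ... later, when the agent is actually needed:
$complexDetails['agent'] = new Agent();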
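The same "get it when you need it" idea can be wrapped in a getter that runs its query on first access and remembers the result. This is just a sketch: the Complex class shape and the loader callable are hypothetical stand-ins for whatever database code the project already has.

```php
<?php
// Hypothetical Complex class: its own details are loaded eagerly,
// the agent only when getAgent() is first called.
class Complex
{
    private array $details;
    /** @var callable takes the complex id, returns the agent row as an array */
    private $agentLoader;
    private ?array $agent = null;

    public function __construct(array $details, callable $agentLoader)
    {
        $this->details = $details;
        $this->agentLoader = $agentLoader;
    }

    public function getId(): int
    {
        return $this->details['id'];
    }

    public function getAgent(): array
    {
        // Run the loader once, then reuse the stored result.
        if ($this->agent === null) {
            $this->agent = ($this->agentLoader)($this->details['id']);
        }
        return $this->agent;
    }
}
```

Code that only needs the complex's primary key calls getId() and never triggers the agent query; code that needs the agent pays for it exactly once per object.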
I had to address this issue a while back when I concocted my own MVC framework as an experiment. To limit the layers of data loaded from the DB, I passed an integer to the constructor. Each constructor would decrement this integer before passing it to the constructors of the objects it instantiated. When it got to 0, no more sub-objects would be instantiated. This meant, basically, the int passed was the number of layers loaded.
So if I only wanted an attribute of the unit object, I'd do this:
$myUnit = new Unit($unitId,1);
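A stripped-down sketch of how such depth-limited constructors might look is shown below. It is not the original framework code: the class bodies are invented for illustration and the static lookup methods are stubs standing in for real database queries.

```php
<?php
// Each constructor decrements $depth and only builds sub-objects while layers remain.
class Agent
{
    public int $id;

    public function __construct(int $id, int $depth)
    {
        $this->id = $id; // leaf object: nothing further to load
    }
}

class Complex
{
    public int $id;
    public ?Agent $agent = null;

    public function __construct(int $id, int $depth)
    {
        $this->id = $id;
        $depth--;
        if ($depth > 0) {
            $this->agent = new Agent(self::agentIdFor($id), $depth);
        }
    }

    // Stub: a real implementation would query the database.
    private static function agentIdFor(int $complexId): int
    {
        return 7;
    }
}

class Unit
{
    public int $id;
    public ?Complex $complex = null;

    public function __construct(int $id, int $depth)
    {
        $this->id = $id;
        $depth--;
        if ($depth > 0) {
            $this->complex = new Complex(self::complexIdFor($id), $depth);
        }
    }

    // Stub: a real implementation would query the database.
    private static function complexIdFor(int $unitId): int
    {
        return 3;
    }
}
```

With these classes, new Unit($unitId, 1) loads only the unit, a depth of 2 adds its complex, and a depth of 3 adds the complex's agent as well.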
If you want to "store" the objects, meaning cache them, just load them into a PHP array and serialize it. Then you can store it back to the database, in memcache or anywhere else. Attaching a label to it would allow you to retrieve it, and include a time stamp so you know how old it is (i.e. needs to be refreshed).
If the data doesn't change, or changes infrequently, there really is no reason to run multiple complex queries every time. For simple ones, like getting a primary key, you might as well just hit the database directly.
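The label-plus-timestamp caching described above could look roughly like this when the store is a plain file. The function names, the file naming scheme and the $now parameter (there only to make the behaviour easy to check) are all assumptions; memcache or a database table would follow the same shape.

```php
<?php
// Stores serialized data together with the time it was stored, under a label.
function cachePut(string $dir, string $label, $data, ?int $now = null): void
{
    $payload = serialize(['stored_at' => $now ?? time(), 'data' => $data]);
    file_put_contents($dir . '/' . md5($label) . '.cache', $payload, LOCK_EX);
}

// Returns the cached data, or null when it is missing or older than $maxAge seconds.
function cacheGet(string $dir, string $label, int $maxAge, ?int $now = null)
{
    $file = $dir . '/' . md5($label) . '.cache';
    if (!is_file($file)) {
        return null;
    }
    // Only plain arrays and scalars are expected, so object instantiation is disabled.
    $entry = unserialize(file_get_contents($file), ['allowed_classes' => false]);
    if (!is_array($entry) || ($now ?? time()) - $entry['stored_at'] > $maxAge) {
        return null; // corrupt or stale: the caller should reload from the database
    }
    return $entry['data'];
}
```

A caller would try cacheGet() first and fall back to the real queries plus cachePut() when it returns null.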