I have code that fetches data from an external API and then commits it to the DB:
protected function saveWidgetsToDatabase($widgetsDaily, Boost $boost, $date)
{
echo "Saving widgets to DB... ";
$widgets = Widget::all();
foreach ($widgetsDaily as $widgetDaily) {
$existingWidget = $widgets
->where('widget_id', $widgetDaily->id)
->where('date', $date)
->first();
if ($existingWidget === null)
$boost->widgets()->save(new Widget([
...
]));
else
$existingWidget->update([
...
]);
}
}
The relation is that one Boost has many Widgets. The bottleneck is saving/updating in the DB: I need to update a widget only if one with the same date and ID already exists; otherwise I need to create a new one.
We are talking about a few thousand records, so I believe those where clauses are pretty expensive.
I tried to do a batch save, but didn't quite manage it.
Are there any chances of making this faster?
When you call Widget::all(), it fetches every single widget record in your database and creates a Widget instance for each one. Therefore, $widgets will be a Collection of every Widget object stored in the database. If you have 10000 widget records, you'll have a Collection of 10000 Widget objects. This is obviously not what you want.
That also means that when you call $widgets->where()..., you're calling where() on the Collection object, which is using PHP to filter through the collection of objects, instead of using SQL to filter the database results.
There are a couple of things you can do.
First, you know you only care about those widgets that have an id in the list of $widgetsDaily. So, limit your Widget query to only include those records that have a widget_id in that list of ids.
Second, add the date lookup to the database query as well.
Third, key the resulting collection by the widget_id field, so that you can directly access the item by the widget_id without having to loop through the entire collection looking for it every time.
protected function saveWidgetsToDatabase($widgetsDaily, Boost $boost, $date)
{
// Get only the widget_ids we care about (assumes $widgetsDaily is a collection)
$ids = $widgetsDaily->pluck('id')->all();
// Get the target widgets from the database. This collection will only
// contain widgets that we actually care about.
$widgets = Widget::whereIn('widget_id', $ids)
->where('date', $date)
->get()
->keyBy('widget_id'); // rekey the resulting collection
foreach ($widgetsDaily as $widgetDaily) {
// Because the collection was rekeyed on widget_id, you can use
// get(id) instead of having to use where('widget_id', id)->first()
$existingWidget = $widgets->get($widgetDaily->id);
if ($existingWidget === null)
$boost->widgets()->save(new Widget([
...
]));
else
$existingWidget->update([
...
]);
}
}
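The lookup pattern above does not depend on Eloquent. Here is a minimal framework-free sketch, assuming plain arrays stand in for the API payload and for the rows already in the database. The function name partitionWidgets and the array shapes are hypothetical. It shows how keying by widget_id turns each existence check into a single array lookup and separates the records to insert from the records to update:

```php
<?php
declare(strict_types=1);

/**
 * Split incoming records into those to insert and those to update.
 *
 * @param array[] $incoming     Records from the API; each has an 'id' key.
 * @param array[] $existingRows Rows already stored for the target date; each has a 'widget_id' key.
 * @return array{insert: array, update: array}
 */
function partitionWidgets(array $incoming, array $existingRows): array
{
    // Key the existing rows by widget_id once, so each lookup below is a direct array access.
    $byWidgetId = array_column($existingRows, null, 'widget_id');

    $result = ['insert' => [], 'update' => []];
    foreach ($incoming as $record) {
        if (isset($byWidgetId[$record['id']])) {
            $result['update'][] = $record;
        } else {
            $result['insert'][] = $record;
        }
    }
    return $result;
}

// Hypothetical sample data.
$existing = [
    ['widget_id' => 10, 'date' => '2017-01-01', 'clicks' => 3],
    ['widget_id' => 11, 'date' => '2017-01-01', 'clicks' => 7],
];
$incoming = [
    ['id' => 10, 'clicks' => 5],
    ['id' => 12, 'clicks' => 1],
];

$plan = partitionWidgets($incoming, $existing);
// $plan['update'] holds the record with id 10; $plan['insert'] holds the record with id 12.
```

Once the new records are collected in one list, they can be written in a single batch instead of one save() per record, for instance with the query builder's insert(), which accepts an array of rows. Keep in mind that it bypasses model events and automatic timestamps.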
I am facing an issue with a scenario where I have to show data in two different grids on the same view. I don't want to query separately for each grid. What I want is to query only once, split the data, and pass each part to its own grid.
I have tried hiding rows by type, but I don't want to use that option. I want something that splits the main data provider into two data providers.
The only way to do that with yii\data\ActiveDataProvider is extending it and overriding its prepareModels() and prepareKeys() methods.
Another option is to use yii\data\ArrayDataProvider instead.
//simple query just for illustration, modify it as you need
$all = MyModel::find()->all();
$first = $second = [];
foreach ($all as $item) {
// condition to decide where the current item belongs
if (someCondition) {
$first[] = $item;
} else {
$second[] = $item;
}
}
$firstProvider = new \yii\data\ArrayDataProvider([
'allModels' => $first,
]);
$secondProvider = new \yii\data\ArrayDataProvider([
'allModels' => $second,
]);
The main disadvantage of using ArrayDataProvider is that you have to load all models into an array even if you plan to use pagination. So if there are many rows in your table, it might be better to use two independent ActiveDataProvider instances and let them load the data in two queries.
I have collections of clients, and every collection can contain many clients. This PHP code loops over the collections and over the clients inside each collection, and saves each client to the database.
foreach ($collections as $key => $collection) {
foreach ($collection as $k => $client) {
$name = $client['name'];
//...
$clientObj = new Client();
$clientObj->setName($name);
//..
$clientObj->save();
}
}
What I want to do is group every collection into one MySQL query, then go on to the next collection. The code above executes one query per client; for performance, we need one query per collection.
How can we do that?
Add each record to a Doctrine_Collection, then call save() on the collection object.
* Saves all records of this collection and processes the
* difference of the last snapshot and the current data
For example:
$collection = new Doctrine_Collection('client');
$collection->add($client1);
$collection->add($client2);
$collection->save();
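If the goal is literally one INSERT statement per collection, another route is to build a multi-row INSERT by hand. The following is only a sketch outside of Doctrine: the function name buildBulkInsert, the table name client and the column names are assumptions made for illustration. It returns the SQL and the flat parameter list, so it can be inspected without a database connection:

```php
<?php
declare(strict_types=1);

/**
 * Build one parameterized multi-row INSERT for a list of rows.
 *
 * @param string   $table   Table name (trusted, not user input).
 * @param string[] $columns Column names (trusted, not user input).
 * @param array[]  $rows    Each row is an associative array keyed by column name.
 * @return array{0: string, 1: array} The SQL string and the flat list of bound values.
 */
function buildBulkInsert(string $table, array $columns, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('At least one row is required.');
    }

    // One "(?, ?, ...)" group per row.
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $groups = implode(', ', array_fill(0, count($rows), $group));

    // Flatten the values in the same order as the placeholders.
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column];
        }
    }

    $sql = sprintf('INSERT INTO %s (%s) VALUES %s', $table, implode(', ', $columns), $groups);
    return [$sql, $params];
}

// Hypothetical sample collection.
$collection = [
    ['name' => 'Alice', 'email' => 'alice@example.com'],
    ['name' => 'Bob', 'email' => 'bob@example.com'],
];

[$sql, $params] = buildBulkInsert('client', ['name', 'email'], $collection);
// $sql:    INSERT INTO client (name, email) VALUES (?, ?), (?, ?)
// $params: ['Alice', 'alice@example.com', 'Bob', 'bob@example.com']
```

Inside the outer foreach, the pair could then be run once per collection through a PDO prepared statement, e.g. $pdo->prepare($sql)->execute($params), where $pdo is an existing PDO connection.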
I want to include the model name in the returned results of a query using CakePHP's find() methods.
For instance, if I do a
$person = $this->Person->find("first", array(
"conditions" => array (
"Person.id" => $id
)
));
I get back
Person{id:1, name:Abraham Lincoln}
I want to get back
Person{id:1, name:Abraham Lincoln, model: Person}
I'm fairly front-end oriented. I know I could loop through results and add these at the controller level, but that seems tedious, especially since most of my queries are far more complex, utilizing contain(). I imagine somewhere in CakePHP's core there's a place this kind of functionality could be added, I just don't know where.
Essentially, I'm looking for where CakePHP casts the database query to a php variable, so I can inject my additional model value.
I do know I will never use the column name "model" anywhere in my application. I'm also certain I want this information where I'm requesting it to be in every single query, as little sense as it may make.
Add this to every model where you need it:
public function afterFind($results, $primary = false) {
foreach($results as $ikey => $item) {
foreach($item as $skey => $subitem) {
if(is_array($subitem))
$results[$ikey][$skey]['model'] = $skey;
else $results[$ikey]['model'] = $skey;
}
}
return $results;
}
Unfortunately, I wasn't able to get this to work when I stored it in AppModel.
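To see what the callback above does to a result array, here is the same loop as a standalone function applied to sample data. The function name addModelKey and the sample rows are hypothetical, and the nested shape (row index, then model alias, then fields) is an assumption about what find() hands to afterFind():

```php
<?php
declare(strict_types=1);

/**
 * Standalone copy of the afterFind() loop, for experimenting outside CakePHP.
 */
function addModelKey(array $results): array
{
    foreach ($results as $ikey => $item) {
        foreach ($item as $skey => $subitem) {
            if (is_array($subitem)) {
                // Nested model data: tag it with its own alias.
                $results[$ikey][$skey]['model'] = $skey;
            } else {
                // Flat field: tag the row with the field's key, as the original does.
                $results[$ikey]['model'] = $skey;
            }
        }
    }
    return $results;
}

// Hypothetical sample result with one primary model and one associated model.
$results = [
    [
        'Person'  => ['id' => 1, 'name' => 'Abraham Lincoln'],
        'Address' => ['id' => 7, 'city' => 'Springfield'],
    ],
];

$tagged = addModelKey($results);
// $tagged[0]['Person']['model'] is 'Person' and $tagged[0]['Address']['model'] is 'Address'.
```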
I hope the title was descriptive enough; I wasn't sure how to name it.
Let's say I have the following code:
class Movie_model {
    public function getMoviesByDate($date) {
        // Connects to db
        // Gets movie IDs from a specific date
        // Loop through movie IDs
        // On each ID, call getMovieById() and store the result in an array
        // When all IDs have been looped over, return the array of movies returned from getMovieById().
    }
    public function getMovieById($id) {
        // Get movie by specified ID
        // Also get movie genres from another method
        // Oh, and it gets movie from another method as well.
    }
}
I always want to get the same result when getting a movie (I always want the result from getMovieById()).
I hope you get my point. I will have many other functions like getMoviesByDate(); for example, I will also have getMoviesByGenre(), and I want that to return the same movie info as getMovieById() as well.
Is it "ok" to do it this way? I know this puts more load on the server and increases load time, but is there any other, better way that I don't know of?
EDIT: I clarified the code in getMoviesByDate() a bit. Also, getMoviesByDate() is just an example. As I said, I will be calling methods like getMoviesByGenre() as well.
EDIT: I'm currently running 48 database queries on the front page of my project, and the front page is still far from finished, so that number would at least triple by the time I'm done. Almost all queries take around 0.0002, but I'm guessing that number will rise dramatically as the database keeps growing. I need to change something.
I don't think it's good to work like this in this particular case. The function getMoviesByDate would return an amount of "n" movies (or movie ids) from a single query. For each id in this query you would have a separate query to get the movie by the specified ID.
This would mean if the first function would return 200 movies, you would run the getMovieById() function (and the query inside it) 200 times. A better practice (IMO) would be to just get all the info you require in the getMoviesByDate() function and return it as a collection.
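As a rough illustration of "get all the info in one go": if getMoviesByDate() runs a single query that joins movies to their genres, the flat rows can be folded into one entry per movie in PHP. This is only a sketch; the function name groupMovieRows and the column names movie_id, title and genre are assumptions, not something taken from the question:

```php
<?php
declare(strict_types=1);

/**
 * Collapse flat rows from a movies-to-genres JOIN into one entry per movie.
 *
 * @param array[] $rows Each row has 'movie_id', 'title' and 'genre' (null when a movie has no genre).
 * @return array[] Movies keyed by id, each with a 'genres' list.
 */
function groupMovieRows(array $rows): array
{
    $movies = [];
    foreach ($rows as $row) {
        $id = $row['movie_id'];
        // First time we see this movie: create its entry.
        if (!isset($movies[$id])) {
            $movies[$id] = ['id' => $id, 'title' => $row['title'], 'genres' => []];
        }
        // A LEFT JOIN yields a null genre for movies without any.
        if ($row['genre'] !== null) {
            $movies[$id]['genres'][] = $row['genre'];
        }
    }
    return $movies;
}

// Rows as a single JOIN query might return them (sample data).
$rows = [
    ['movie_id' => 1, 'title' => 'Alien', 'genre' => 'Horror'],
    ['movie_id' => 1, 'title' => 'Alien', 'genre' => 'Sci-Fi'],
    ['movie_id' => 2, 'title' => 'Heat', 'genre' => 'Crime'],
];

$movies = groupMovieRows($rows);
// $movies[1]['genres'] is ['Horror', 'Sci-Fi']; $movies[2]['genres'] is ['Crime'].
```

That is one query for 200 movies instead of 201, and getMoviesByGenre() could reuse the same folding step with a different WHERE clause.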
It doesn't seem very logical to have getMoviesByDate() and getMovieById() methods on a Movie class.
An alternative would be to have some sort of MovieManager class that does all of the retrieving, and returns Movie objects.
class MovieManager {
public function getMoviesByDate($date) {
// get movies by date, build an array of Movie objects and return
}
public function getMoviesByGenre($genre) {
// get movies by genre, build an array of Movie objects and return
}
public function getMovieById($id) {
// get movie by id, return Movie object
}
}
Your Movie class would just have properties and methods specific to a single movie:
class Movie {
    public $id;
    public $name;
    public $releaseDate;
}
It's OK to have separate methods for getting by date, genre etc etc, but you must ensure that you are not calling for the same records multiple times - in that case you will want a single query that could join the various tables you need.
Edit - after you have clarified your question:
The idea of getting movie IDs by date, then running them all through getMovieById() is bad! The movie data should be pulled when getting by date, so you don't have to hit the database again.
You can modify your getMovieById function: pass the date as a parameter, and have the function return the movies by their ID, filtered by date.
To keep track of which records you've already loaded into RAM, you can use a base class for your models that stores the IDs of the records already loaded along with a reference to the corresponding model object in RAM.
class ModelBase {
/* contains the id of the current record, null if new record */
protected $id;
// keep track of records already loaded
static $loaded_records = Array();
public function __construct(Array $attr_values) {
// assign $attr_values to this class's attributes
// save this instance in class variable to reuse this object
if($attr_values['id'] != null) {
self::$loaded_records[get_called_class()][$attr_values['id']] = $this;
}
}
public static function getConcurrentInstance(Array $attr_values) {
$called_class = get_called_class();
if(isset(self::$loaded_records[$called_class][$attr_values['id']])) {
// this record was already loaded into RAM
$record = self::$loaded_records[$called_class][$attr_values['id']];
// you may need to update certain fields of $record
// from the data in $attr_values, because the data in the RAM may
// be old data.
} else {
// create the model with the given values
$record = new $called_class($attr_values);
}
return $record;
}
// provides basic methods to update records in ram to database etc.
public function save() {
// create query to save this record to database ...
}
}
Your movie model could look something like this.
Class MovieModel extends ModelBase {
// additional attributes
protected $title;
protected $date;
// more attributes ...
public static function getMoviesByDate($date) {
// fetches records from database
// calls getConcurrentInstance() to return an instance of MovieModel() for every record
}
public static function getMovieById($id) {
// fetches record from database
// calls getConcurrentInstance() to return an instance of MovieModel()
}
}
Other things you could do to decrease the load on the DB:
Only connect to the database once per request. There are also ways to share a database connection between multiple requests.
Index the fields in your database that are searched often.
Only fetch the records you need.
Avoid loading the same record twice (if it didn't change).
What is the best way of working with calculated fields of Propel objects?
Say I have an object "Customer" that has a corresponding table "customers" and each column corresponds to an attribute of my object. What I would like to do is: add a calculated attribute "Number of completed orders" to my object when using it on View A but not on Views B and C.
The calculated attribute is a COUNT() of "Order" objects linked to my "Customer" object via ID.
What I can do now is to first select all Customer objects, then iteratively count Orders for all of them, but I'd think doing it in a single query would improve performance. But I cannot properly "hydrate" my Propel object since it does not contain the definition of the calculated field(s).
How would you approach it?
There are several choices. The first is to create a view in your DB that will do the counts for you, similar to my answer here. I do this for a current Symfony project I work on where the read-only attributes for a given table are actually much, much wider than the table itself. This is my recommendation since grouping columns (max(), count(), etc.) are read-only anyway.
The other option is to actually build this functionality into your model. You absolutely CAN do this hydration yourself, but it's a bit complicated. Here are the rough steps:
Add the columns to your Table class as protected data members.
Write the appropriate getters and setters for these columns.
Override the hydrate method and, within it, populate your new columns with the data from other queries. Make sure to call parent::hydrate() as the first line.
However, this isn't much better than what you're talking about already. You'll still need N + 1 queries to retrieve a single record set. However, you can get creative in step #3 so that N is the number of calculated columns, not the number of rows returned.
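One way to read "get creative" here: run a single grouped COUNT query for the whole record set and hand the counts out in PHP, so the number of extra queries depends on the number of calculated columns rather than on the number of rows. The sketch below is framework-free and is not Propel API; the function name attachOrderCounts, the array shapes and the SQL in the comment are assumptions for illustration:

```php
<?php
declare(strict_types=1);

/**
 * Attach a per-customer order count taken from one grouped query.
 *
 * @param array[] $customers Rows with an 'id' key.
 * @param array[] $countRows Rows with 'customer_id' and 'order_count', e.g. from
 *                           SELECT customer_id, COUNT(*) AS order_count
 *                           FROM orders GROUP BY customer_id
 * @return array[] The customers, each with an added 'order_count' key.
 */
function attachOrderCounts(array $customers, array $countRows): array
{
    // Map customer_id => order_count for direct lookup.
    $counts = array_column($countRows, 'order_count', 'customer_id');

    foreach ($customers as $i => $customer) {
        // Customers with no orders do not appear in the grouped result.
        $customers[$i]['order_count'] = (int) ($counts[$customer['id']] ?? 0);
    }
    return $customers;
}

// Hypothetical sample data.
$customers = [
    ['id' => 1, 'name' => 'Ann'],
    ['id' => 2, 'name' => 'Ben'],
];
$countRows = [
    ['customer_id' => 1, 'order_count' => '4'],
];

$withCounts = attachOrderCounts($customers, $countRows);
// $withCounts[0]['order_count'] is 4; $withCounts[1]['order_count'] is 0.
```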
Another option is to create a custom selection method on your TablePeer class.
Do steps 1 and 2 from above.
Write custom SQL that you will query manually via the Propel::getConnection() process.
Create the dataset manually by iterating over the result set, and handle custom hydration at this point so as not to break hydration when used by the doSelect processes.
Here's an example of this approach:
<?php
class TablePeer extends BaseTablePeer
{
public static function selectWithCalculatedColumns()
{
// Do our custom selection, still using propel's column data constants
$sql = "
SELECT " . implode( ', ', self::getFieldNames( BasePeer::TYPE_COLNAME ) ) . "
, count(" . JoinedTablePeer::ID . ") AS calc_col
FROM " . self::TABLE_NAME . "
LEFT JOIN " . JoinedTablePeer::TABLE_NAME . "
ON " . JoinedTablePeer::ID . " = " . self::FKEY_COLUMN
;
// Get the result set
$conn = Propel::getConnection();
$stmt = $conn->prepareStatement( $sql );
$rs = $stmt->executeQuery( array(), ResultSet::FETCHMODE_NUM );
// Create an empty rowset
$rowset = array();
// Iterate over the result set
while ( $rs->next() )
{
// Create each row individually
$row = new Table();
$startcol = $row->hydrate( $rs );
// Use our custom setter to populate the new column
$row->setCalcCol( $rs->get( $startcol ) );
$rowset[] = $row;
}
return $rowset;
}
}
There may be other solutions to your problem, but they are beyond my knowledge. Best of luck!
I am doing this in a project now by overriding hydrate() and Peer::addSelectColumns() for accessing postgis fields:
// in peer
public static function locationAsEWKTColumnIndex()
{
return GeographyPeer::NUM_COLUMNS - GeographyPeer::NUM_LAZY_LOAD_COLUMNS;
}
public static function polygonAsEWKTColumnIndex()
{
return GeographyPeer::NUM_COLUMNS - GeographyPeer::NUM_LAZY_LOAD_COLUMNS + 1;
}
public static function addSelectColumns(Criteria $criteria)
{
parent::addSelectColumns($criteria);
$criteria->addAsColumn("locationAsEWKT", "AsEWKT(" . GeographyPeer::LOCATION . ")");
$criteria->addAsColumn("polygonAsEWKT", "AsEWKT(" . GeographyPeer::POLYGON . ")");
}
// in object
public function hydrate($row, $startcol = 0, $rehydrate = false)
{
$r = parent::hydrate($row, $startcol, $rehydrate);
if ($row[GeographyPeer::locationAsEWKTColumnIndex()]) // load GIS info from DB IFF the location field is populated. NOTE: These fields are either both NULL or both NOT NULL, so this IF is OK
{
$this->location_ = GeoPoint::PointFromEWKT($row[GeographyPeer::locationAsEWKTColumnIndex()]); // load gis data from extra select columns See GeographyPeer::addSelectColumns().
$this->polygon_ = GeoMultiPolygon::MultiPolygonFromEWKT($row[GeographyPeer::polygonAsEWKTColumnIndex()]); // load gis data from extra select columns See GeographyPeer::addSelectColumns().
}
return $r;
}
There's something goofy with addAsColumn() that I can't remember at the moment, but this does work. You can read more about the addAsColumn() issues.
Here's what I did to solve this without any additional queries:
Problem
I needed to add a custom COUNT field to a typical result set used with the Symfony Pager. However, as we know, Propel doesn't support this out of the box. So the easy solution is to just do something like this in the template:
foreach ($pager->getResults() as $project):
    echo $project->getName() . ' and ' . $project->getNumMembers();
endforeach;
Where getNumMembers() runs a separate COUNT query for each $project object. Of course, we know this is grossly inefficient because you can do the COUNT on the fly by adding it as a column to the original SELECT query, saving a query for each result displayed.
I had several different pages displaying this result set, all using different Criteria. So writing my own SQL query string with PDO directly would be way too much hassle as I'd have to get into the Criteria object and mess around trying to form a query string based on whatever was in it!
So, what I did in the end avoids all that, letting Propel's native code work with the Criteria and create the SQL as usual.
1 - First create the [get/set]NumMembers() equivalent accessor/mutator methods in the model object that gets returned by doSelect(). Remember, the accessor doesn't do the COUNT query anymore; it just holds its value.
2 - Go into the peer class, override the parent doSelect() method, and copy all code from it exactly as it is.
3 - Remove this bit because getMixerPreSelectHook is a private method of the base peer (or copy it into your peer if you need it):
// symfony_behaviors behavior
foreach (sfMixer::getCallables(self::getMixerPreSelectHook(__FUNCTION__)) as $sf_hook)
{
call_user_func($sf_hook, 'BaseTsProjectPeer', $criteria, $con);
}
4 - Now add your custom COUNT field to the doSelect method in your peer class:
// copied into ProjectPeer - overrides BaseProjectPeer::doSelectJoinUser()
public static function doSelectJoinUser(Criteria $criteria, ...)
{
// copied from parent method, along with everything else
ProjectPeer::addSelectColumns($criteria);
$startcol = (ProjectPeer::NUM_COLUMNS - ProjectPeer::NUM_LAZY_LOAD_COLUMNS);
UserPeer::addSelectColumns($criteria);
// now add our custom COUNT column after all other columns have been added
// so as to not screw up Propel's position matching system when hydrating
// the Project and User objects.
$criteria->addSelectColumn('COUNT(' . ProjectMemberPeer::ID . ')');
// now add the GROUP BY clause to count members by project
$criteria->addGroupByColumn(self::ID);
// more parent code
...
// until we get to this bit inside the hydrating loop:
$obj1 = new $cls();
$obj1->hydrate($row);
// AND...hydrate our custom COUNT property (the last column)
$obj1->setNumMembers($row[count($row) - 1]);
// more code copied from parent
...
return $results;
}
That's it. Now you have the additional COUNT field added to your object without doing a separate query to get it as you spit out the results. The only drawback to this solution is that you've had to copy all the parent code because you need to add bits right in the middle of it. But in my situation, this seemed like a small compromise to save all those queries and not write my own SQL query string.
Add an attribute "orders_count" to a Customer, and then write something like this:
class Order {
...
public function save($conn = null) {
$customer = $this->getCustomer();
$customer->setOrdersCount($customer->getOrdersCount() + 1);
$customer->save($conn);
return parent::save($conn);
}
...
}
You can use not only the "save" method, but the idea stays the same. Unfortunately, Propel doesn't support any "magic" for such fields.
Propel actually builds an automatic function based on the name of the linked field. Let's say you have a schema like this:
customer:
id:
name:
...
order:
id:
customer_id: # links to customer table automagically
completed: { type: boolean, default: false }
...
When you build your model, your Customer object will have a method getOrders() that will retrieve all orders associated with that customer. You can then simply use count($customer->getOrders()) to get the number of orders for that customer.
The downside is this will also fetch and hydrate those Order objects. On most RDBMS, the only performance difference between pulling the records or using COUNT() is the bandwidth used to return the results set. If that bandwidth would be significant for your application, you might want to create a method in the Customer object that builds the COUNT() query manually using Creole:
// in lib/model/Customer.php
class Customer extends BaseCustomer
{
public function CountOrders()
{
$connection = Propel::getConnection();
// Count rows in the order table (not the customer table) for this customer
$query = "SELECT COUNT(*) AS count FROM %s WHERE customer_id='%s'";
$statement = $connection->prepareStatement(sprintf($query, OrderPeer::TABLE_NAME, $this->getId()));
$resultset = $statement->executeQuery();
$resultset->next();
return $resultset->getInt('count');
}
...
}