I have a legacy database linked to a CakePHP (2.5.3) application. The database has a LOT of columns in all of its tables, many of which are completely unused (for example in one table I only need 2 columns out of 80 blank ones), so as a result I always have to specify 'fields' whenever I run a query. I read elsewhere that I can unset the fields by using code like this in the model:
function beforeFind($query) {
$this->schema();
unset($this->_schema['ColumnName']);
unset($this->_schema['ColumnName2']);
// ...and so on for every other unused column
}
And this seems to work okay, the problem is that I am using 80+ lines of code to unset columns when really I only need to set two. Is there a way to force CakePHP to let me manually define the columns in the schema?
I have tried declaring the $_schema variable at the start of the model and that doesn't seem to help.
Override Model::schema
Model::schema is the method which tells CakePHP what fields exist, you need only override it to say explicitly what fields it should contain, or filter out the fields you don't want.
e.g.:
function schema($field = false) {
if (!is_array($this->_schema) || $field === true) {
parent::schema();
$keepTheseKeys = array_flip(['Column1', 'Column2']);
$this->_schema = array_intersect_key($this->_schema, $keepTheseKeys);
}
return parent::schema($field);
}
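To see what the whitelist filter does in isolation, here is a minimal sketch in plain PHP, outside CakePHP. The `$fullSchema` array and the `keepSchemaFields()` helper are made up for illustration; only `array_flip` and `array_intersect_key` are the calls the override above relies on.

```php
<?php
// Hypothetical helper: keep only the whitelisted columns of a schema array.
function keepSchemaFields(array $schema, array $keep)
{
    // array_flip turns array('id', 'name') into array('id' => 0, 'name' => 1),
    // so array_intersect_key can match on column names.
    return array_intersect_key($schema, array_flip($keep));
}

// Made-up schema standing in for what Model::schema() would return.
$fullSchema = array(
    'id'      => array('type' => 'integer'),
    'name'    => array('type' => 'string'),
    'unused1' => array('type' => 'string'),
    'unused2' => array('type' => 'string'),
);

$filtered = keepSchemaFields($fullSchema, array('id', 'name'));
```

The order of the surviving keys follows the original schema, not the whitelist, which is usually what you want when the schema drives form or query generation.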
We have a COMMON database and then tenant databases for each organization that uses our application. We have base values in the COMMON database for some tables e.g.
COMMON.widgets. Then in the tenant databases, IF a table called modified_widgets exists and has values, they are merged with the COMMON.widgets table.
Right now we are doing this in controllers along the lines of:
public function index(Request $request)
{
$widgets = Widget::where('active', '1')->orderBy('name')->get();
if(Schema::connection('tenant')->hasTable('modified_widgets')) {
$modified = ModifiedWidget::where('active', '1')->get();
$merged = $widgets->merge($modified);
$merged = array_values(array_sort($merged, function ($value) {
return $value['name'];
}));
return $merged;
}
return $widgets;
}
As you can see, we have model for each table and this works OK. We get the expected results for GET requests like this from controllers, but we'd like to merge at the Laravel MODEL level if possible. That way id's are linked to the correct tables and such when populating forms with these values. The merge means the same id can exist in BOTH tables. We ALWAYS want to act on the merged data if any exists. So it seems like model level is the place for this, but we'll try any suggestions that help meet the need. Hope that all makes sense.
Can anyone help with this or does anyone have any ideas to try? We've played with overriding model constructors and such, but haven't quite been able to figure this out yet. Any thoughts are appreciated and TIA!
If you put this functionality in the Widget model, you will multiply the number of queries. Think of a Widget as a single instance: the current approach runs 2 queries at minimum, plus 1 if the tenant has a modified_widgets table. If you do this inside the model, each Widget instance will, in the best case, pull its equivalent from the other database. So for a set of Widgets you will run 1 (->all()) + n queries (n = number of ModifiedWidgets), because each Widget instance pulls its own mirror if it exists and no eager loading is possible.
You can improve your code with following:
$widgets = Widget::where('active', '1')->orderBy('name')->get();
if(Schema::connection('tenant')->hasTable('modified_widgets')) {
$modified = ModifiedWidget::where('active', '1')->whereIn('id', $widgets->pluck('id'))->get(); // remove whereIn if that's not the case
return $widgets->merge($modified)->unique()->sortBy('name');
}
return $widgets;
OK, here is what we came up with.
We now use a single model and the table names MUST be the same in both databases (setTable does not seem to work even though it exists in the Database/Eloquent/Model base source code; that may be why it's not documented). Anyway, just use a regular model and make sure the tables are identical (or at least the fields you are using are):
<?php
namespace App\Models;
use Illuminate\Database\Eloquent\Model;
class Widget extends Model
{
}
Then we have a generic 'merge controller' where the model and optional sort are passed in the request (we hard coded the 'where' and key here, but they could be made dynamic too). NOTE THIS WILL NOT WORK WITH STATIC METHODS THAT CREATE NEW INSTANCES such as $model::all() so you need to use $model->get() in that case:
<?php
namespace App\Http\Controllers;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Config;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;
class MergeController extends Controller
{
public function index(Request $request)
{
//TODO: add some validations to ensure model is provided
$model = app("App\\Models\\{$request['model']}");
$sort = $request['sort'] ? $request['sort'] : 'id';
$src_collection = $model->where('active', '1')->orderBy('name')->get();
// we setup the tenants connection elsewhere, but use it here
if(Schema::connection('tenant')->hasTable($model->getTable())) {
$model->setConnection('tenant');
$tenant_collection = $model->get()->where('active', '1');
$src_collection = $src_collection->keyBy('id')->merge($tenant_collection->keyBy('id'))->sortBy('name');
}
return $src_collection;
}
}
If you dd($src_collection); before returning it, you will see the connection is correct for each row (depending on data in the tables). If you update a row:
$test = $src_collection->find(2); // this is a row from the tenant db in our data
$test->name = 'Test';
$test->save();
$test2 = $src_collection->find(1); // this is a row from the COMMON db in our data
$test2->name = 'Test2';
$test2->save();
dd($src_collection);
You will see the correct data is updated no matter which table the row(s) came from.
This results in each tenant being able to optionally override and/or add to base table data without affecting the base table data itself or other tenants, while minimizing data duplication and thus easing maintenance (obviously the table data and population is managed elsewhere just like any other table). If the tenant has no overrides then the base table data is returned. The merge and custom collection stuff have minimal documentation, so this took some time to figure out. Hope this helps someone else some day!
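The override behaviour described here can be illustrated without Laravel. The following is a rough sketch in plain PHP, with made-up rows and a hypothetical `mergeTenantRows()` helper: rows are keyed by `id`, tenant rows replace base rows with the same id, and the result is sorted by name. The `source` key is added only to make the origin of each row visible.

```php
<?php
// Hypothetical helper: tenant rows override base rows that share an id.
function mergeTenantRows(array $baseRows, array $tenantRows)
{
    $merged = array();
    foreach ($baseRows as $row) {
        $merged[$row['id']] = $row + array('source' => 'common');
    }
    foreach ($tenantRows as $row) {
        // Same id: the tenant row wins. New id: the row is added.
        $merged[$row['id']] = $row + array('source' => 'tenant');
    }
    // Sort by name, as the controller does with sortBy('name').
    usort($merged, function ($a, $b) {
        return strcmp($a['name'], $b['name']);
    });
    return $merged;
}

$common = array(
    array('id' => 1, 'name' => 'Bolt'),
    array('id' => 2, 'name' => 'Gear'),
);
$tenant = array(
    array('id' => 2, 'name' => 'Custom gear'),
    array('id' => 3, 'name' => 'Axle'),
);

$result = mergeTenantRows($common, $tenant);
```

With this sample data the base row with id 2 is replaced by the tenant's version, and id 3 exists only for the tenant.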
What I am trying to do
I want to query a specific set of records using active model like so
$jobModel = Jobs::find()->select('JOB_CODE')->distinct()->where(['DEPT_ID'=>$dept_id])->all();
Then I want to assign a flag attribute to the records in this activerecord based on whether they appear in a relationship table
What I have tried
So in my job model, I have declared a new attribute inAccount. Then I added this function in the job model that sets the inAccount flag to -1 or 0 based on whether a record is found in the relationship table with the specified account_id
public function assignInAccount($account_id){
if(JobCodeAccounts::find()->where(['JOB_CODE'=>$this->JOB_CODE])->andWhere(['ACCOUNT_ID'=>$account_id])->one() == null){
$this->inAccount=0;
}
else{
$this->inAccount = -1;
}
}
What I have been doing is assigning each value individually using foreach like so
foreach($jobModel as $job){
$job->assignInAccount($account_id);
}
However, this is very slow: with a large number of records in $jobModel, each one makes a db query in assignInAccount(), which can take a long time if the db is slow.
What I am looking for
I am wondering if there is a more efficient way to do this, so that I can assign inAccount to all job records at once. I considered using afterFind() but I don't think this would work as I need to specify a specific parameter. I am wondering if there is a way I can pass in an entire model (or at least an array of models/model attributes) and then do all the assignments by running only a single query.
I should mention that I do need to end up with the original $jobModel activerecord as well
Thanks to scaisEdge's answer I was able to come up with an alternative solution, first finding the array of jobs that need to be flagged like so:
$inAccountJobs = array_column(Yii::$app->db->createCommand('Select * from job_code_accounts where ACCOUNT_ID = :account_id')
->bindValues([':account_id' => $account_id])->queryAll(), 'JOB_CODE');
and then checking each job record to see if it appears in this array like so
foreach($jobModel as $job){
if(in_array($job->JOB_CODE, $inAccountJobs))
$job->inAccount = -1;
else
$job->inAccount = 0;
}
Does seem to be noticeably faster as it requires only a single query.
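If the list of job codes grows large, `in_array` scans the array on every iteration. One possible refinement, sketched here in plain PHP with made-up data, is to flip the list once and use `isset` for the lookup. The `flagJobs()` helper and the sample codes are hypothetical; the -1/0 flag values follow the code above.

```php
<?php
// Hypothetical helper: returns job code => flag (-1 if in account, else 0).
function flagJobs(array $jobCodes, array $inAccountCodes)
{
    // Flip once so each membership check is a hash lookup.
    $lookup = array_flip($inAccountCodes);
    $flags = array();
    foreach ($jobCodes as $code) {
        $flags[$code] = isset($lookup[$code]) ? -1 : 0;
    }
    return $flags;
}

$flags = flagJobs(array('A10', 'B20', 'C30'), array('B20', 'Z99'));
```

In the real loop the same `isset($lookup[$job->JOB_CODE])` check would replace the `in_array` call.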
OK lets say I want to select a number of columns from a database table, but I won't know what those columns are in the method. I could pass them in, but it could be more or less depending on the method calling the database method.
A quick fix would be SELECT *, but I understand that this is bad and can cause more data to be returned than is necessary, and I definitely don't need all the data from that table.
So I am using CodeIgniter and prepared statements to do this, and below is what I have currently (it works, just to point that out).
function get_pages() {
$this->db->select('pages.id, pages.title, pages.on_nav, pages.date_added, admin.first_name, admin.last_name')
->from('pages, admin')
->where('pages.admin_id = admin.id')
->order_by('pages.id', 'ASC');
$query = $this->db->get();
return $query->result();
}
It's a simple function, but at the moment limited to getting only 'pages'. I want to convert this to work with getting from other tables too. What is the best way?
Many thanks in advance.
EDIT In CodeIgniter I have many Controllers. One for 'pages', one for 'products', one for 'news' and on and on. I don't want to create a single database query method in my model for each controller.
I think the desire to not have 4 methods is misguided. If you don't have the information in the method, you'll have to pass it in. So you could either pass in a string with the table you want and switch on it, changing the query based on the table name, or pass in all of the necessary parts of the query: table name, criteria column, criteria, and columns to select. And you'd need to pass that information in every time you called the function. Neither approach is really going to save you much code, and both are less readable than a function for each purpose.
The entire idea with models is to put your specific queries to the persistence layer in there. Using a generic catch-all method can be disastrous and hard to test. You should shape your model around the problem you're trying to solve.
This makes it much cleaner and easier to work with. At the same time you must also avoid the common trap of over-sizing models. Each model should follow the SRP. Try and separate concerns so that in your controller, you can easily see state changes.
Does that make sense or am I just rambling...?
In your model:
function get_pages($table_source) {
$this->db->select($table_source.".id"); // or $this->db->select('id');
// for instance, if one of your $table_source ="users" and there is no 'title' column you can write
if($table_source!='users') $this->db->select('title');
$this->db->select('on_nav');
$this->db->select('date_added');
$this->db->select('admin.first_name');
$this->db->select('admin.last_name');
$this->db->join('admin', 'admin.id = '.$table_source.'.admin_id');
$this->db->order_by($table_source.'.id', 'ASC');
$query = $this->db->get($table_source);
return $query->result_array();
}
In your controller:
function all_tables_info() {
$tables = array("pages","users","customers");
$i=0;
foreach($tables as $table) {
$data[$i++]=$this->your_Model->get_pages($table);
}
//do something with $data
}
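If a single generic method is used anyway, one way to keep it safe is to whitelist the columns each table may expose rather than building the select list from arbitrary input. A minimal sketch in plain PHP follows; the `$allowed` map, the column names listed for `products`, and the `buildSelect()` helper are assumptions for illustration, and the resulting string is what would be passed to `$this->db->select()`.

```php
<?php
// Hypothetical whitelist: table name => columns it may return.
$allowed = array(
    'pages'    => array('id', 'title', 'on_nav', 'date_added'),
    'products' => array('id', 'title', 'date_added'),
);

// Hypothetical helper: build a qualified select list for a known table.
function buildSelect(array $allowed, $table)
{
    if (!isset($allowed[$table])) {
        throw new InvalidArgumentException("Unknown table: $table");
    }
    $columns = array();
    foreach ($allowed[$table] as $column) {
        $columns[] = $table . '.' . $column;
    }
    return implode(', ', $columns);
}

$select = buildSelect($allowed, 'products');
```

An unknown table name raises an exception instead of reaching the query builder, which also addresses the testing concern raised in the earlier answers.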
Well this is a simple design question I've wondered about many times and never found a satisfying solution for. My example is with php-sql, but this certainly applies to other languages too.
I have a small database table containing only very few entries, and that almost never needs updating. eg this usertype table:
usertype_id (primary key) | name | description
---------------------------+------------+-------------------
1 | 'admin' | 'Administrator'
2 | 'reguser' | 'Registered user'
3 | 'guest' | 'Guest'
Now in the php code, I often have to check or compare the type of user I'm dealing with. Since the user types are stored in the database, I can either:
1) Select * from the usertype table at class instantiation, and store it in an array.
Then all the ids are available to the code, and I can do a simple select to get the rows I need. This solution requires an array and a db query every time the class is instantiated.
$query = "SELECT info, foo FROM user WHERE usertype_id = ".$usertypes['admin'];
2) Use the name column to select the correct usertype_id, so we can effectively join with other tables. This is more or less equivalent to 1) but without needing to cache the whole usertype table in the php object:
$query = "SELECT info, foo FROM user JOIN usertype USING (usertype_id) WHERE usertype.name = 'admin' ";
3) Define constants that match the keys in the usertype table:
// As defines
define("USERTYPE_ADMIN",1);
define("USERTYPE_REGUSER",2);
//Or as class constants
const USERTYPE_ADMIN = 1;
const USERTYPE_REGUSER = 2;
And then do a simple select.
$query = "SELECT info, foo FROM user WHERE usertype_id = " . USERTYPE_ADMIN;
This is probably the most resource-efficient solution, but it is hard to maintain, as you have to update both the table and the code if you need to modify something in the usertype table.
4) Scrap the usertype table and only keep the types in the php code. I don't really like this because it lets any value get into the database and get assigned to the type of user. But maybe, all things considered, it isn't so bad and I'm just complicating something that should be simple.
Anyways, to sum it up the solution I like most is #2 because it's coherent and with an index on usertype.name, it can't be that bad. But what I've often ended up using is #3, for efficiency.
How would you do it? Any better solutions?
(edit: fixed query in #2)
I would suggest #3 to avoid useless queries, and prevent risk of behavior changes if existing DB table rows are incidentally modified:
Adding the necessary constants in the model class:
class Role // + use namespaces if possible
{
// A good ORM could generate this (see @wimvds's answer)
const ADMIN = 1;
const USER = 2;
const GUEST = 3;
//...
}
Then querying like this makes sense:
$query = "SELECT info, foo FROM user WHERE role_id = ".Role::ADMIN;
With an ORM (e.g. Propel in the example below) you'll end up doing:
$isAdminResults = UserQuery::create()->filterByRoleId(Role::ADMIN);
I almost always go for option 3). You could generate the code needed automatically based on what is available in the DB. The only thing you have to remember then is that you have to run the script to update/rewrite that info when you add another role (but if you're using phing or a similar build tool to deploy your apps, just add a build rule for it to your deploy script and it will always be run whenever you deploy your code :p).
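As a rough illustration of the generation step, the sketch below turns rows from the usertype table into the source of a constants class. It is plain PHP with hard-coded rows standing in for the query result; the `generateConstants()` helper and the `UserType` class name are hypothetical.

```php
<?php
// Hypothetical generator: builds PHP source for a constants class.
function generateConstants($className, array $rows)
{
    $lines = array("<?php", "class $className", "{");
    foreach ($rows as $row) {
        // 'reguser' becomes REGUSER; runs of other characters become underscores.
        $name = strtoupper(preg_replace('/[^A-Za-z0-9]+/', '_', $row['name']));
        $lines[] = "    const $name = " . (int) $row['usertype_id'] . ";";
    }
    $lines[] = "}";
    return implode("\n", $lines) . "\n";
}

// Rows as they might come back from "SELECT usertype_id, name FROM usertype".
$rows = array(
    array('usertype_id' => 1, 'name' => 'admin'),
    array('usertype_id' => 2, 'name' => 'reguser'),
    array('usertype_id' => 3, 'name' => 'guest'),
);

$source = generateConstants('UserType', $rows);
```

A deploy script could write `$source` to a file with `file_put_contents`, so the constants are regenerated whenever the table changes.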
Why not denormalize the DB table so instead of having usertype_id, you'd have usertype with the string type (admin). Then in PHP you can just do define('USERTYPE_ADMIN', 'admin');. It saves you from having to modify two places if you want to add a user type...
And if you're really worried about any value getting in, you could always make the column an ENUM data type, so it would self manage...
For tables that contain "type" values, especially when the table is expected to change over time, I tend to use a simple approach:
Add a varchar column named hid (short for "human-readable id") with a unique key. Then I fill it with ids that are meaningful to humans, like:
usertype_id (primary key) | name | description | hid (unique key)
---------------------------+------------+-------------------+---------------
1 | 'admin' | 'Administrator' | 'admin'
2 | 'reguser' | 'Registered user' | 'user'
3 | 'guest' | 'Guest' | 'guest'
When you need the actual id you will have to do select based on hid column, i.e.
select usertype_id from tablename where hid = "admin"
This is not an efficient approach but it will ensure compatibility of your application among different deployments (i.e. one client may have 1.admin, 2. guest; other client 1.admin, 2. user, etc.). For your case I think #3 is pretty suitable but if you expect to have more than 10 different user roles - try the "hid" approach.
Are you using any kind of framework here? Could these values be stored in a single source - a config file - which both creates a list of the objects in PHP and also populates the table when you bootstrap the database? I'm thinking from a Rails perspective, as it's been a while since I've written any PHP. Solution there would probably be fixtures.
Why not just do this:
foreach (getdbarr("SELECT * FROM usertype") as $row) {
define($row['name'], $row['usertype_id']);
}
You shouldn't need a JOIN in every query to fetch the information about types/roles. You can keep your 'user' model and 'role' models separate in the data access objects (DAO) -- especially since there are so few records for user types.
In most cases where I have a limited number of options that I'd otherwise be joining against a large table, I cache them in memcached as an associative array. In the event I need some information about a particular relationship (like a role) I just lazy load it.
$user = DAO_User::get(1); // this pulls a JOIN-less record
$role = $user->getRole(); // lazy-load
The code for $user->getRole() can be something like:
public function getRole() {
// This comes from a cache that may be called multiple
// times per request with no penalty (i.e. store in a registry)
$roles = DAO_UserRoles::getAll();
if(isset($roles[$this->role_id]))
return $roles[$this->role_id];
return null; // or: new Model_UserRole();
}
This also works if you want to display a list with 1000 users on it. You can simply render values for that column from a single $roles associative array.
This is a major performance improvement on the SQL end, and it goes a long way to reducing complexity in your code base. If you have several other foreign keys on the user table you can still use this approach to grab the necessary information when you need it. It also means you can have dependable Model_* classes without having to create hybrids for every possible combination of tables you might JOIN -- which is much better than simply getting a result set, iterating it, and freeing it.
Even with more than 100 rows on both sides of your JOIN, you can still use the lazy load approach for infrequent or highly redundant information. With a reasonable caching service in your code, there's no penalty for calling DAO_UserRole::get(1500) multiple times because subsequent calls during the same request shouldn't hit the database twice. In most cases you're only going to be displaying 10-25 rows per page out of 1000s, and lazy loading will save your database engine from having to JOIN all the extraneous rows before you actually need them.
The main reason to do a JOIN is if your WHERE logic requires it, or if you need to ORDER BY data from a foreign key. Treating JOINs as prohibitively expensive is a good habit to be in.
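The "no penalty for repeated calls" part depends on a per-request cache. Here is a minimal sketch of that idea in plain PHP, where `RoleRegistry`, its loader callback and the sample roles are all hypothetical stand-ins for `DAO_UserRoles::getAll()` and its backing store:

```php
<?php
// Hypothetical per-request registry: loads all roles once, then serves from memory.
class RoleRegistry
{
    private $roles = null;
    private $loader;
    public $loadCount = 0; // exposed only to show how often the loader runs

    public function __construct($loader)
    {
        $this->loader = $loader;
    }

    public function getAll()
    {
        if ($this->roles === null) {
            // First call in this request: hit the backing store.
            $this->roles = call_user_func($this->loader);
            $this->loadCount++;
        }
        return $this->roles;
    }

    public function get($roleId)
    {
        $roles = $this->getAll();
        return isset($roles[$roleId]) ? $roles[$roleId] : null;
    }
}

// Stand-in for a database or memcached fetch.
$registry = new RoleRegistry(function () {
    return array(1 => 'admin', 2 => 'reguser', 3 => 'guest');
});

$first  = $registry->get(1);
$second = $registry->get(3);
$none   = $registry->get(99);
```

However many users are rendered, the loader runs once per request, which is what makes the lazy-loaded `getRole()` cheap.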
For basically static lookup tables, I generally make static constant files (such as your #3). I generally use classes such as:
namespace Constants;
class UserTypes {
const ADMIN = 1;
const USER = 2;
const GUEST = 3;
}
$id = Constants\UserTypes::ADMIN;
When I'm using lookup tables that are a bit more variable, I'll pull them into an object and cache it for 24 hours. That way it only gets updated once a day. That will save you from making database round trips, but still let you deal with things in code easily.
Yeah, you're right about avoiding #3 and sticking with #2. As much as possible, look-ups like this, where a usertype table contains the roles and is related to the user table by id, should stay in the database. If you use constants, then the data must always rely on your php code to be interpreted. Also, you can enforce data integrity by using foreign keys (where servers allow), and it will allow you to port the reporting from your php code to other reporting tools. Maintenance also becomes easier. With #3, database administrators would need to know php in order to derive the meanings of the numbers, should they ever be asked to aid in reports development. It may not seem too relevant, but in terms of maintenance, using stored procedures rather than embedded sql in your php code would also be maintenance-friendly in several ways, and will also be advantageous to DBAs.
I'd go for option #2 and use the join as it is intended to be used. You never know what the future will throw up, it's always better to be prepared today!
With regards to leaving the database alone as much as possible for such operations, there is also the possibility of caching in the long term. For this route, within PHP an option is to use a file cache, one that will only get updated when time calls for it. For the framework I have created, here's an example; I'd be interested to know what people think:
Note:
(LStore, LFetch, GetFileName) belong to a Cache object which gets called statically.
(Blobify and Unblobify) belong to a SystemComponent object which is always alive
Each piece of cache data has a key. This is the only thing you ever have to remember.
public function LStore($key,$data, $blnBlobify=true) {
/* Opening the file in read/write mode */
$h = fopen(self::GetFileName($key, 'longstore'),'a+');
if (!$h) throw new Exception('Could not write to cache');
flock($h,LOCK_EX); // exclusive lock, will get released when the file is closed
fseek($h,0); // go to the start of the file
/* truncate the file */
ftruncate($h,0);
if($blnBlobify==true) { $data = SystemComponent::Blobify(array($data)); }
if (fwrite($h,$data)===false) {
throw new Exception('Could not write to cache');
}
fclose($h);
}
public function LFetch($key) {
$filename = self::GetFileName($key, 'longstore');
if (!file_exists($filename)){ return false;}
$h = fopen($filename,'r');
if (!$h){ return false;}
/* Getting a shared lock */
flock($h,LOCK_SH);
$data = file_get_contents($filename);
fclose($h);
$data = SystemComponent::Unblobify($data);
if (!$data) {
/* If unserializing somehow didn't work out, we'll delete the file */
unlink($filename);
return false;
}
return $data;
}
/* This function is necessary as the framework scales different directories */
private function GetFileName($key, $strCacheDirectory='') {
if(!empty($strCacheDirectory)){
return SystemComponent::GetCacheAdd() . $strCacheDirectory.'/' . md5($key);
} else {
return SystemComponent::GetCacheAdd() . md5($key);
}
}
public function Blobify($Source){
if(is_array($Source)) { $Source = serialize($Source); }
$strSerialized = base64_encode($Source);
return $strSerialized;
}
public function Unblobify($strSerialized){
$Decoded = base64_decode($strSerialized);
if(self::CheckSerialized($Decoded)) { $Decoded = unserialize($Decoded); }
return $Decoded;
}
function CheckSerialized($Source){
$Data = @unserialize($Source);
if ($Source === 'b:0;' || $Data !== false) {
return true;
} else {
return false;
}
}
Now when it comes to accessing the actual data, I just call a fetch. For making sure it is up to date, I tell it to store. In your case, this would be after updating the usertype table.
I have a nasty problem. I want to get rid of a certain database field, but I'm not sure in which bits of code it's called. Is there a way to find out where this field is used/called from (except for text searching the code; this is fairly useless seeing as how the field is named 'email')?
Cheers
I would first text search the files for the table name, then search only the files that contain the table name for the field name.
I wrote a program to do this for my own purposes. It builds an in-memory listing of tables and fields and relates the tables to the fields. Then it loops through tables, searching for the code files that contain the table names, and then searches those files for the fields in the tables found. I'd recommend a similar methodology in your case.
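A much-reduced sketch of that two-pass search in plain PHP. It works on an array of file name => contents so it can run standalone; the `findFieldUsage()` helper, the file names and the `users` table name are made up for illustration.

```php
<?php
// Hypothetical helper: files that mention the table AND the field as whole words.
function findFieldUsage(array $files, $table, $field)
{
    $tablePattern = '/\b' . preg_quote($table, '/') . '\b/i';
    $fieldPattern = '/\b' . preg_quote($field, '/') . '\b/i';
    $hits = array();
    foreach ($files as $name => $contents) {
        // Pass 1: skip files that never mention the table.
        if (!preg_match($tablePattern, $contents)) {
            continue;
        }
        // Pass 2: within those, look for the field name.
        if (preg_match($fieldPattern, $contents)) {
            $hits[] = $name;
        }
    }
    return $hits;
}

$files = array(
    'user_model.php' => 'SELECT id, email FROM users WHERE id = ?',
    'mailer.php'     => '$mail->send($email); // no table access here',
    'report.php'     => 'SELECT id, name FROM users',
);

$hits = findFieldUsage($files, 'users', 'email');
```

Against a real code base the `$files` array could be filled by walking the source tree and reading each file with `file_get_contents`. Files that build queries without naming the table literally would still be missed, so this narrows the search rather than proving the field is unused.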
Setting MySQL to log all queries for some time might help. The queries will give you a hint about where to look.
Brute force: set up a test instance, remove the column, and exercise your test suite.
Create a BEFORE INSERT trigger on that table that monitors insertions into that column.
At the same time, create another table called monitor with only one column, email.
Have the trigger insert the value of NEW.email into monitor.email as well as into the real table.
Then you can run your application and check for any non-null values in the monitor table.
I would expect you should do this in PHP.
For example:
<?php
class Query
{
var $command;
var $resource;
function __construct($sql_command = '')
{
$this->command = $sql_command;
}
public function setResource($resource)
{
$this->resource = $resource;
}
}
//then you would have some kind of database class, but here we would modify the query method.
class Database
{
function query(Query $query)
{
$resource = mysql_query($query->command);
$query->setResource($resource);
//Then you can send the class to the monitor
QueryMonitor::Monitor($query);
}
}
abstract class QueryMonitor
{
public static function Monitor(Query $query)
{
//here you use $query->resource to do monitoring of queries
//You can also parse the query and gather what query type it was:-
//Select or Delete, you can also mark what tables were in the Query
//Even meta data so
$total_found = mysql_num_rows($query->resource);
$field_table = mysql_field_table($query->resource, 0); // table name of the first field
//Just an example..
}
}
?>
Obviously it would be more advanced than that, but you can set up a system to log every query and every query's metadata to a log file or whatever.