Random code execution stops when using Eloquent chunk()

I have a method for a scheduled system cleanup that goes through all the files in the "storage" table, selects the file type we need (property photos), and checks for each of them whether the corresponding listing still exists in the database. If it doesn't, the method removes the record from the DB and deletes the file itself.
Now about the problem. Originally I didn't use chunk(); I just called Model::all() to select everything, and it all worked well. But the storage table has now grown to 200000 records, and the operation began to crash because of enormous memory consumption. So I decided to go with chunk().
The memory problem is gone, but at some random moment (somewhere around the middle of the process) the code execution just stops as if the operation had completed: no errors are logged anywhere, and the task is not fully completed.
Can you suggest what might cause such strange behavior?
public function verifyPhotos() {
// Instantiating required models and putting them into a single array so they can be passed to a closure
$models = [];
$models['storage'] = App::make('Store');
$models['condo'] = App::make('Condo');
$models['commercial'] = App::make('Commercial');
$models['residential'] = App::make('Residential');
// Obtaining and processing all records from the storage chunk by chunk
$models['storage']->where('subject_type', '=', 'property_photo')->chunk(10000, function($files) use(&$models) {
// Going through each record in current chunk
foreach ($files as $photo) {
// If record's subject type is Condo
if ($photo->subject_name == 'CondoProperty') {
// Selecting Condo model to work with
$current_model = $models['condo'];
// If record's subject type is Commercial
} elseif ($photo->subject_name == 'CommercialProperty') {
// Selecting Commercial model to work with
$current_model = $models['commercial'];
// If record's subject type is Residential
} elseif($photo->subject_name == 'ResidentialProperty') {
// Selecting Residential model to work with
$current_model = $models['residential'];
}
// If linked listing doesn't exist anymore
if (!$current_model->where('ml_num', '=', $photo->owner_id)->first()) {
// Deleting a storage record and physical file
Storage::delete('/files/property/photos/'.$photo->file_name);
$models['storage']->unregisterFile($photo->id);
}
}
});
}

Using chunk() in Eloquent adds a limit and an offset to your SQL query and executes it once per chunk. If the callback changes the data so that fewer rows match the query, the next execution skips over rows that still match, because the offset is counted against the reduced result set.
For example, if you have 9 rows with id = 1...9 and subject_type = 'property_photo' and you use chunk(3, ...), the resulting queries are:
select * from store where subject_type = 'property_photo' limit 3 offset 0;
select * from store where subject_type = 'property_photo' limit 3 offset 3;
select * from store where subject_type = 'property_photo' limit 3 offset 6;
select * from store where subject_type = 'property_photo' limit 3 offset 9;
If, inside each chunk, you set subject_type = 'something' for each row (or delete the rows, as in your code), those rows no longer match, and the next query, which offsets by 3, effectively skips the next 3 matching rows.
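The skipping can be reproduced without a database. The following is a minimal sketch, assuming an in-memory array stands in for the table and a hypothetical fetchPage() helper stands in for the LIMIT/OFFSET query; it is a model of the behaviour, not Eloquent's actual implementation.

```php
<?php
// Simulates LIMIT/OFFSET paging over an in-memory "table" while the
// loop body deletes every row it sees (as the cleanup job does).

/** Hypothetical stand-in for: SELECT ... LIMIT $limit OFFSET $offset */
function fetchPage(array $table, int $limit, int $offset): array
{
    return array_slice(array_values($table), $offset, $limit);
}

// Nine matching rows with id = 1...9, keyed by id.
$table = [];
foreach (range(1, 9) as $id) {
    $table[$id] = ['id' => $id];
}

$processed = [];
$chunkSize = 3;
$page = 0;

while (true) {
    $rows = fetchPage($table, $chunkSize, $page * $chunkSize);
    if (count($rows) === 0) {
        break; // looks like a normal end of data
    }
    foreach ($rows as $row) {
        $processed[] = $row['id'];
        unset($table[$row['id']]); // the row no longer matches the query
    }
    $page++;
}

echo 'Processed: ' . implode(',', $processed) . PHP_EOL;
echo 'Never visited: ' . implode(',', array_keys($table)) . PHP_EOL;
```

In this model the loop ends cleanly after two pages with rows 4, 5 and 6 never visited, which matches the "stops as if the operation was completed" symptom.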
It may be possible to use the Collection::each() closure instead like below, although it would still have to load the entire result set into a collection:
$models['storage']->where('subject_type', '=', 'property_photo')->get()->each(function ($photo) use (&$models) {
if ($photo->subject_name == 'CondoProperty') {
//...
}
//...
});
Remember you can also run DB::disableQueryLog(); to save memory on large database operations.
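Another option that keeps memory bounded is to page by primary key ("id greater than the last id seen") instead of by offset, so rows removed by the callback cannot shift the next page. Newer Laravel releases offer chunkById() for this; whether it is available depends on your version. The sketch below only models the idea in plain PHP, with a hypothetical fetchAfter() helper standing in for the query.

```php
<?php
// Keyset paging: each page is "id > last seen id, ordered by id, LIMIT n".

/** Hypothetical stand-in for: SELECT ... WHERE id > $lastId ORDER BY id LIMIT $limit */
function fetchAfter(array $table, int $lastId, int $limit): array
{
    ksort($table); // sorts the local copy by id
    $page = [];
    foreach ($table as $id => $row) {
        if ($id > $lastId) {
            $page[] = $row;
            if (count($page) === $limit) {
                break;
            }
        }
    }
    return $page;
}

// Nine matching rows with id = 1...9, keyed by id.
$table = [];
foreach (range(1, 9) as $id) {
    $table[$id] = ['id' => $id];
}

$processed = [];
$lastId = 0;

// An empty page is falsy, which ends the loop.
while ($rows = fetchAfter($table, $lastId, 3)) {
    foreach ($rows as $row) {
        $processed[] = $row['id'];
        $lastId = $row['id'];
        unset($table[$row['id']]); // deleting is now safe
    }
}

echo 'Processed: ' . implode(',', $processed) . PHP_EOL;
echo 'Rows left: ' . count($table) . PHP_EOL;
```

Here every row is visited exactly once even though each one is deleted as it is processed.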

You should wrap the suspicious code in try ... catch and write the exception message to a log file. I once ran into the same problem and eventually found that it was also caused by memory consumption.
The most suspicious part to me is reusing the models via $current_model->where(). I suspect that the memory may not be released after each query. Basically, each query object should be used only once. Is there any reason to reuse it?
Try changing it to $current_model = App::make('YourModel'); instead of reusing the instance from $models and see if that solves it.

Related

Handling conditional SELECT and UPDATE queries from concurrent API calls

I have a system for offering different types of prizes to users based on a chance defined for them. I have a column called totalCount, which contains the total number of prizes of that type that can be awarded, and a column called awardedCount, which contains the number of prizes already awarded. The client calls a specific API endpoint to check for a prize.
Inside the API logic, I first check which prize types are still available by testing awardedCount < totalCount. This gives me a list of prizes that I pass to the chance algorithm, which returns a specific type of prize. After that I update the row for that prize type with awardedCount = awardedCount + 1.
This works perfectly until there are multiple concurrent calls to the API. I tested with a hundred concurrent calls and totalCount = 5, and many times more prizes were awarded than intended: I got 6 or 7 prizes awarded in multiple test runs. I believe this is because multiple threads call the API and the SQL queries executed from there are not synchronised, which is natural in such a case.
Code is given below:
public function claimPrize($criteria) {
// Check login
if(AuthHelper::loginStatus($criteria)) {
// Get available prizes
$availablePrizes = $this->getPrizes();
// Award a prize
$awardedPrize = $this->selectPrize($availablePrizes);
// Update awarded prize
$response = $this->updatePrize($awardedPrize);
}
}
private function getPrizes() {
$prizesQuery = DoctrineQuery::from('cms_prizes')
->select('*')
->addWhere('awardedCount < totalCount');
$prizesAvailable = $prizesQuery->execute();
return $prizesAvailable->toArray();
}
private function selectPrize($availablePrizes) {
// Chance calculation done here
return ChanceHelper::getPrize($availablePrizes);
}
private function updatePrize($awardedPrize) {
$prizesQuery = DoctrineQuery::from('cms_prizes')
->select('*')
->addWhere('type = ?', array($awardedPrize['type']))
->getFirst();
$prizeModel = $prizesQuery->execute();
if($prizeModel) {
$prizeModel->awardedCount = $prizeModel->awardedCount + 1;
$prizeModel->save();
}
return $prizeModel;
}
I need some help on how to handle this issue. The API is written in PHP and I am using MySQL as the database.
P.S.: The title might be vague, as I could not come up with anything that explains this problem in short. If that is the case, please help me out with that as well.
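The over-awarding described above is a check-then-update race: two requests both see awardedCount < totalCount before either of them writes. A minimal model of it in plain PHP follows, together with one commonly used remedy, folding the check into the update itself. The SQL in the comment is an assumption built from the column names in the question and is untested against the real schema; conditionalAward() is a hypothetical stand-in for that single statement.

```php
<?php
/**
 * Hypothetical stand-in for one atomic statement such as:
 *   UPDATE cms_prizes SET awardedCount = awardedCount + 1
 *   WHERE type = ? AND awardedCount < totalCount
 * Returns true when a row was changed (affected rows = 1).
 */
function conditionalAward(array &$row): bool
{
    if ($row['awardedCount'] < $row['totalCount']) {
        $row['awardedCount']++;
        return true;
    }
    return false;
}

// One prize row with a single prize left.
$row = ['type' => 'gold', 'totalCount' => 5, 'awardedCount' => 4];

// Check-then-update: requests A and B both read before either writes.
$naive = $row;
$availableForA = $naive['awardedCount'] < $naive['totalCount'];
$availableForB = $naive['awardedCount'] < $naive['totalCount'];
if ($availableForA) {
    $naive['awardedCount']++;
}
if ($availableForB) {
    $naive['awardedCount']++; // acts on a stale check
}

// Conditional update: the check and the increment happen as one step.
$safe = $row;
$aWon = conditionalAward($safe);
$bWon = conditionalAward($safe);

echo 'Naive awardedCount: ' . $naive['awardedCount'] . PHP_EOL;
echo 'Conditional awardedCount: ' . $safe['awardedCount'] . PHP_EOL;
```

With this approach the caller treats "no row changed" as "prize no longer available" and can pick another prize or report that none is left.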

How to insert unique codes in a field with Laravel?

My problem is that with a job (written in Laravel 5.7) I need to change the codes of some promotions every day. The code I have written does change the promotion code, but it sets the same code for every promotion, and I need a different one for each.
My code:
DB::table('promociones')->update(['codigo_promocion'=>str_random(4)]);

If you don't have a lot of data to update (thousands of rows, say), you can use a loop for it.
Using Eloquent:
$promociones = Promociones::all();
foreach($promociones as $promocion) {
$promocion->codigo_promocion = $this->generateUniqueString();
$promocion->save();
}
or by using query builder:
$promociones = DB::table('promociones')->get();
foreach($promociones as $promocion) {
DB::table('promociones')
->where('id', $promocion->id)
->update(['codigo_promocion'=> $this->generateUniqueString()]);
}
The generateUniqueString() method should check whether the string already exists in the database:
private function generateUniqueString()
{
while(true) {
$randomString = str_random(4);
$doesCodeExist = DB::table('promociones')
->where('codigo_promocion', $randomString)
->count();
if (! $doesCodeExist) {
return $randomString;
}
}
}
Use where() if you need to filter the data.
Keep in mind that if you have a lot of data, you should consider an approach such as queues.

A simple way to handle this would be looping:
$promotions = Promocione::where(...)->get();
// Note, replace `...` with logic to target specific codes that need to be changed today.
foreach($promotions AS $promotion){
$promotion->codigo_promocion = str_random(4);
$promotion->save();
}
This assumes you have a Promocione model, and it can be very performance-intensive depending on the number of records in the database. Also, str_random(4) doesn't guarantee a unique value (compared with other calls to str_random(4) in the same loop), nor does it provide a large pool of possible values, so you'll likely end up with duplicates. You can query for existing duplicates while looping and generate a new code if you find one, but as you exhaust the pool of str_random(4) codes, this process will slow down and eventually loop forever.
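The pool-exhaustion concern can be handled by checking the pool size up front and capping the number of attempts. Below is a minimal sketch in plain PHP, assuming the codes are generated in memory before being written; ALPHABET, randomCode() and uniqueCodes() are hypothetical names, and random_int() is used in place of Laravel's str_random().

```php
<?php
// 32 unambiguous characters (no 0/O or 1/I); an arbitrary choice for this sketch.
const ALPHABET = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';

/** Builds one random code of the given length. */
function randomCode(int $length): string
{
    $code = '';
    $max = strlen(ALPHABET) - 1;
    for ($i = 0; $i < $length; $i++) {
        $code .= substr(ALPHABET, random_int(0, $max), 1);
    }
    return $code;
}

/** Returns $count distinct codes, or throws instead of looping forever. */
function uniqueCodes(int $count, int $length, int $maxAttempts = 100000): array
{
    $poolSize = strlen(ALPHABET) ** $length;
    if ($count > $poolSize) {
        throw new InvalidArgumentException("Only $poolSize distinct codes exist");
    }

    $seen = [];
    for ($attempt = 0; count($seen) < $count; $attempt++) {
        if ($attempt >= $maxAttempts) {
            throw new RuntimeException('Gave up after too many collisions');
        }
        $seen[randomCode($length)] = true; // a duplicate overwrites its own key
    }

    // Digit-only codes become integer keys, so cast back to strings.
    return array_map('strval', array_keys($seen));
}

$codes = uniqueCodes(50, 4);
echo count($codes) . ' distinct codes, e.g. ' . $codes[0] . PHP_EOL;
```

Each generated code could then be assigned to one promotion in the loop; a unique index on codigo_promocion would still be the safer final guard in the database.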

Yii2 - Calling model functions on entire activeRecords?

What I am trying to do
I want to query a specific set of records using active model like so
$jobModel = Jobs::find()->select('JOB_CODE')->distinct()->where(['DEPT_ID'=>$dept_id])->all();
Then I want to assign a flag attribute to the records in this activerecord based on whether they appear in a relationship table
What I have tried
So in my job model, I have declared a new attribute inAccount. Then I added this function in the job model that sets the inAccount flag to -1 or 0 based on whether a record is found in the relationship table with the specified account_id
public function assignInAccount($account_id){
if(JobCodeAccounts::find()->where(['JOB_CODE'=>$this->JOB_CODE])->andWhere(['ACCOUNT_ID'=>$account_id])->one() == null){
$this->inAccount=0;
}
else{
$this->inAccount = -1;
}
}
What I have been doing is assigning each value individually using foreach like so
foreach($jobModel as $job){
$job->assignInAccount($account_id);
}
However, this is obviously very slow: if $jobModel contains a large number of records and each one makes a DB query in assignInAccount(), this can take a long time, especially if the DB is slow.
What I am looking for
I am wondering if there is a more efficient way to do this, so that I can assign inAccount to all job records at once. I considered using afterFind(), but I don't think this would work, as I need to pass in a specific parameter. I am wondering if there is a way to pass in an entire model (or at least an array of models or model attributes) and then do all the assignments with only a single query.
I should mention that I do need to end up with the original $jobModel ActiveRecord array as well.

Thanks to scaisEdge's answer I was able to come up with an alternative solution, first finding the array of jobs that need to be flagged like so:
$inAccountJobs = array_column(Yii::$app->db->createCommand('Select * from job_code_accounts where ACCOUNT_ID = :account_id')
->bindValues([':account_id' => $account_id])->queryAll(), 'JOB_CODE');
and then checking each job record to see if it appears in this array like so
foreach($jobModel as $job){
if(in_array($job->JOB_CODE, $inAccountJobs))
$job->inAccount = -1;
else
$job->inAccount = 0;
}
This seems to be noticeably faster, as it requires only a single query for the whole set instead of one query per job.
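If the list of flagged job codes grows large, the in_array() scan inside the loop can be replaced by a keyed lookup. A minimal sketch, assuming hypothetical sample data in place of the two query results:

```php
<?php
// Hypothetical stand-ins for the query results in the answer above.
$inAccountJobs = ['A100', 'C300'];
$jobModel = [
    (object) ['JOB_CODE' => 'A100', 'inAccount' => null],
    (object) ['JOB_CODE' => 'B200', 'inAccount' => null],
    (object) ['JOB_CODE' => 'C300', 'inAccount' => null],
];

// Flip once so each membership test is a key lookup, not a scan.
$lookup = array_flip($inAccountJobs);

foreach ($jobModel as $job) {
    // isset() is true for any existing key, including the one mapped to 0.
    $job->inAccount = isset($lookup[$job->JOB_CODE]) ? -1 : 0;
}

foreach ($jobModel as $job) {
    echo $job->JOB_CODE . ' => ' . $job->inAccount . PHP_EOL;
}
```

For small lists the difference is negligible, so this only matters when both arrays are large.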

Why CDbCacheDependency's query is executed twice?

I'm trying to cache some results retrieved from database using Yii framework 1.1.12. Here is what I am doing in short:
public static function getCategories()
{
if (self::$_categories !== null)
return self::$_categories;
print "Getting categories...";
self::$_categories = Yii::app()->cache->get("categoriesList");
if (self::$_categories === false)
{
$sql = "SELECT id, parent_id, name FROM {{category}} WHERE id > 0 AND is_deleted = 0";
$categoriesList = Yii::app()->db->createCommand($sql)->queryAll();
// Doing some work with $categoriesList and obtaining self::$_categories as the result
// ...
$dependency = new CDbCacheDependency("SELECT MAX(update_time) FROM {{category}}");
Yii::app()->cache->set("categoriesList", self::$_categories, 3600, $dependency);
}
return self::$_categories;
}
Using the profiling tool I can see that everything works. At the first time both queries are executed (each query once):
SELECT MAX(update_time) FROM arrenda_category
SELECT id, parent_id, name FROM arrenda_category WHERE id > 0 AND is_deleted = 0
On further requests only the first one is executed.
The problem is that when I increase the max value of update_time in the arrenda_category table (not even through my own edit script, but directly from the MySQL command line) and refresh the page, the number of SELECT MAX(update_time) FROM arrenda_category queries becomes 2. Further refreshes give only one execution again. Interestingly, if I clear the cache, I also get just one execution of the SELECT MAX(...) query.
So I don't understand why the cache dependency query is executed twice when the condition changes. Is there something wrong with my code, or is it something else?
P.S. I'm sure that SELECT MAX(update_time) FROM arrenda_category can be executed only in function described above. I also see that the line print "Getting categories..." is reached once per page request.

Yes. It is expected.
EXPLANATION
Suppose the data is already in the cache. When you call getCategories(), the line Yii::app()->cache->get("categoriesList") executes the dependency query to check whether the data has changed. Since it has not changed, the query is executed only once.
Now you change the update_time value externally (or through some other code in your app) and call getCategories() again:
The line Yii::app()->cache->get("categoriesList") executes the dependency query to check whether the data in the cache is still valid. It finds that the data is invalid and returns false.
Then the query SELECT id, parent_id, name FROM {{category}} WHERE id > 0 AND is_deleted = 0 is executed to fetch the updated data from the database.
The line Yii::app()->cache->set("categoriesList", self::$_categories, 3600, $dependency); executes the dependency query SELECT MAX(update_time) FROM {{category}} AGAIN to get the latest MAX(update_time), whose result is stored along with the data. That's why the query is executed twice.
So the point is that every time you set() a value in the cache, the dependency value must be stored along with it, since it is needed by subsequent get() calls to check whether the dependency has changed.
PS:
If you want more detail, check the source code of the set() function of your cache application component. It calls the evaluateDependency() function of the CDbCacheDependency class, which in turn calls generateDependentData(), which executes the dependency query.
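The sequence above can be modelled with a toy cache that counts how often the dependency is evaluated. This is a minimal sketch in plain PHP; TinyCache and CountingDependency are hypothetical classes written for illustration, not Yii's CCache or CDbCacheDependency.

```php
<?php
/** Wraps a value source and counts how often it is queried. */
class CountingDependency
{
    public $evaluations = 0;
    private $source;

    public function __construct(callable $source)
    {
        $this->source = $source;
    }

    public function evaluate()
    {
        $this->evaluations++;
        return ($this->source)();
    }
}

/** Toy cache: set() stores a dependency snapshot, get() re-checks it. */
class TinyCache
{
    private $entries = [];

    public function set(string $key, $value, CountingDependency $dependency): void
    {
        $this->entries[$key] = [
            'value' => $value,
            'dependency' => $dependency,
            'snapshot' => $dependency->evaluate(), // evaluated on every set()
        ];
    }

    public function get(string $key)
    {
        if (!isset($this->entries[$key])) {
            return false; // cold cache: nothing to evaluate
        }
        $entry = $this->entries[$key];
        if ($entry['dependency']->evaluate() !== $entry['snapshot']) {
            unset($this->entries[$key]);
            return false; // stale entry
        }
        return $entry['value'];
    }
}

$maxUpdateTime = 100; // stands in for SELECT MAX(update_time)
$dependency = new CountingDependency(function () use (&$maxUpdateTime) {
    return $maxUpdateTime;
});

$cache = new TinyCache();
$cache->set('categoriesList', ['books'], $dependency);

// Case 1: unchanged data, cache hit.
$dependency->evaluations = 0;
$cache->get('categoriesList');
$onHit = $dependency->evaluations;

// Case 2: data changed, so get() fails and set() evaluates again.
$maxUpdateTime = 200;
$dependency->evaluations = 0;
if ($cache->get('categoriesList') === false) {
    $cache->set('categoriesList', ['books', 'music'], $dependency);
}
$onStale = $dependency->evaluations;

// Case 3: cleared cache, only set() evaluates.
$cold = new TinyCache();
$dependency->evaluations = 0;
if ($cold->get('categoriesList') === false) {
    $cold->set('categoriesList', ['books'], $dependency);
}
$onCold = $dependency->evaluations;

echo "hit: $onHit, stale: $onStale, cold: $onCold" . PHP_EOL;
```

The three counts in this model line up with the one, two and one executions reported in the question.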

Database design: Matching sql database keys to php constants?

Well this is a simple design question I've wondered about many times and never found a satisfying solution for. My example is with php-sql, but this certainly applies to other languages too.
I have a small database table that contains only a few entries and almost never needs updating, e.g. this usertype table:
 usertype_id (primary key) | name      | description
---------------------------+-----------+-------------------
 1                         | 'admin'   | 'Administrator'
 2                         | 'reguser' | 'Registered user'
 3                         | 'guest'   | 'Guest'
Now in the php code, I often have to check or compare the type of user I'm dealing with. Since the user types are stored in the database, I can either:
1) Select * from the usertype table at class instantiation, and store it in an array.
Then all the ids are available to the code, and I can do a simple select to get the rows I need. This solution requires an array and a db query every time the class is instantiated.
$query = "SELECT info, foo FROM user WHERE usertype_id = ".$usertypes['admin'];
2) Use the name column to select the correct usertype_id, so we can effectively join with other tables. This is more or less equivalent to 1) but without needing to cache the whole usertype table in the php object:
$query = "SELECT info, foo FROM user JOIN usertype USING (usertype_id) WHERE usertype.name = 'admin' ";
3) Define constants that match the keys in the usertype table:
// As defines
define("USERTYPE_ADMIN",1);
define("USERTYPE_REGUSER",2);
//Or as class constants
const USERTYPE_ADMIN = 1;
const USERTYPE_REGUSER = 2;
And then do a simple select.
$query = "SELECT info, foo FROM user WHERE usertype_id = " . USERTYPE_ADMIN;
This is probably the most resource-efficient solution, but it is harder to maintain, as you have to update both the table and the code if you need to modify something in the usertype table.
4) Scrap the usertype table and only keep the types in the php code. I don't really like this because it lets any value get into the database and be assigned as the type of a user. But maybe, all things considered, it isn't so bad and I'm just complicating something that should be simple.
Anyway, to sum it up, the solution I like most is #2 because it's coherent, and with an index on usertype.name it can't be that bad. But what I've often ended up using is #3, for efficiency.
How would you do it? Any better solutions?
(edit: fixed query in #2)

I would suggest #3 to avoid useless queries and to prevent the risk of behavior changes if existing DB table rows are accidentally modified.
Add the necessary constants to the model class:
class Role // + use namespaces if possible
{
// A good ORM could generate it (see @wimvds's answer)
const ADMIN = 1;
const USER = 2;
const GUEST = 3;
//...
}
Then querying like this makes sense:
$query = "SELECT info, foo FROM user WHERE role_id = ".Role::ADMIN;
With an ORM (e.g. Propel in the example below) you'll end up doing:
$isAdminResults = UserQuery::create()->filterByRoleId(Role::ADMIN);

I almost always go for option 3). You could generate the code needed automatically based on what is available in the DB. The only thing you have to remember then is that you have to run the script to update/rewrite that info when you add another role (but if you're using phing or a similar build tool to deploy your apps, just add a build rule for it to your deploy script and it will always be run whenever you deploy your code :p).

Why not denormalize the DB table so instead of having usertype_id, you'd have usertype with the string type (admin). Then in PHP you can just do define('USERTYPE_ADMIN', 'admin');. It saves you from having to modify two places if you want to add a user type...
And if you're really worried about any value getting in, you could always make the column an ENUM data type, so it would self manage...

For tables that contain "type" values, especially when such a table is expected to change over time, I tend to use a simple approach:
Add a VARCHAR column named hid (short for "human-readable id") with a unique key. Then I fill it with an id that is meaningful to humans, like:
 usertype_id (primary key) | name      | description       | hid (unique key)
---------------------------+-----------+-------------------+------------------
 1                         | 'admin'   | 'Administrator'   | 'admin'
 2                         | 'reguser' | 'Registered user' | 'user'
 3                         | 'guest'   | 'Guest'           | 'guest'
When you need the actual id, you have to select it based on the hid column, i.e.
select usertype_id from tablename where hid = 'admin'
This is not the most efficient approach, but it ensures your application stays compatible across different deployments (e.g. one client may have 1. admin, 2. guest; another client 1. admin, 2. user, etc.). For your case I think #3 is quite suitable, but if you expect to have more than 10 different user roles, try the "hid" approach.

Are you using any kind of framework here? Could these values be stored in a single source - a config file - which both creates a list of the objects in PHP and also populates the table when you bootstrap the database? I'm thinking from a Rails perspective, as it's been a while since I've written any PHP. Solution there would probably be fixtures.

Why not just do this:
foreach (getdbarr("SELECT * FROM usertype") as $row) {
define($row['name'],$row['usertype_id']);
}

You shouldn't need a JOIN in every query to fetch the information about types/roles. You can keep your 'user' model and 'role' models separate in the data access objects (DAO) -- especially since there are so few records for user types.
In most cases where I have a limited number of options that I'd otherwise be joining against a large table, I cache them in memcached as an associative array. In the event I need some information about a particular relationship (like a role) I just lazy load it.
$user = DAO_User::get(1); // this pulls a JOIN-less record
$role = $user->getRole(); // lazy-load
The code for $user->getRole() can be something like:
public function getRole() {
// This comes from a cache that may be called multiple
// times per request with no penalty (i.e. store in a registry)
$roles = DAO_UserRoles::getAll();
if(isset($roles[$this->role_id]))
return $roles[$this->role_id];
return null; // or: new Model_UserRole();
}
This also works if you want to display a list with 1000 users on it. You can simply render values for that column from a single $roles associative array.
This is a major performance improvement on the SQL end, and it goes a long way to reducing complexity in your code base. If you have several other foreign keys on the user table you can still use this approach to grab the necessary information when you need it. It also means you can have dependable Model_* classes without having to create hybrids for every possible combination of tables you might JOIN -- which is much better than simply getting a result set, iterating it, and freeing it.
Even with more than 100 rows on both sides of your JOIN, you can still use the lazy load approach for infrequent or highly redundant information. With a reasonable caching service in your code, there's no penalty for calling DAO_UserRole::get(1500) multiple times because subsequent calls during the same request shouldn't hit the database twice. In most cases you're only going to be displaying 10-25 rows per page out of 1000s, and lazy loading will save your database engine from having to JOIN all the extraneous rows before you actually need them.
The main reason to do a JOIN is if your WHERE logic requires it, or if you need to ORDER BY data from a foreign key. Treating JOINs as prohibitively expensive is a good habit to be in.
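The lazy-load idea can be sketched without any framework. The following is a minimal model in plain PHP, assuming a static per-request registry in place of memcached; RoleRegistry and User are hypothetical classes, and the hard-coded array stands in for a single query against the roles table.

```php
<?php
/** Per-request registry: loads the small roles table at most once. */
final class RoleRegistry
{
    private static $roles = null;
    public static $loads = 0;

    public static function getAll(): array
    {
        if (self::$roles === null) {
            self::$loads++;
            // Stand-in for one SELECT (or one cache fetch) of the whole table.
            self::$roles = [1 => 'admin', 2 => 'reguser', 3 => 'guest'];
        }
        return self::$roles;
    }
}

/** JOIN-less user record that resolves its role on demand. */
final class User
{
    public $name;
    public $roleId;

    public function __construct(string $name, int $roleId)
    {
        $this->name = $name;
        $this->roleId = $roleId;
    }

    public function getRole(): ?string
    {
        $roles = RoleRegistry::getAll();
        return $roles[$this->roleId] ?? null; // unknown role id yields null
    }
}

$users = [new User('ann', 1), new User('bob', 3), new User('cy', 9)];

$resolved = [];
foreach ($users as $user) {
    $resolved[] = $user->getRole();
}

echo 'Role table loads: ' . RoleRegistry::$loads . PHP_EOL;
```

However many users are rendered, the roles are fetched once per request in this model, which is the property the answer relies on.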

For basically static lookup tables, I generally make static constant files (such as your #3). I generally use classes such as:
namespace Constants;
class UserTypes {
const ADMIN = 1;
const USER = 2;
const GUEST = 3;
}
$id = Constants\UserTypes::ADMIN;
When I'm using lookup tables that are a bit more variable, I pull them into an object and cache it for 24 hours. That way it only gets updated once a day. That saves you from making database round trips, but still lets you deal with things in code easily.

Yeah, you're right about avoiding #3 and sticking with #2. As much as possible, look-ups such as a usertype table that holds the roles and is related to the user table through id values should stay in the database. If you use constants, the data always relies on your php code to be interpreted. You can also enforce data integrity by using foreign keys (where servers allow), and it lets you port the reporting from your php code to other reporting tools. Maintenance also becomes easier: with #3, database administrators would need to know php in order to derive the meanings of the numbers should they ever be asked to help with report development. It may not seem too relevant, but in terms of maintenance, using stored procedures rather than SQL embedded in your php code would also be maintenance-friendly in several ways, and would be advantageous to DBAs.

I'd go for option #2 and use the join as it is intended to be used. You never know what the future will throw up, it's always better to be prepared today!
With regards to leaving the database alone as much as possible for such operations, there is also the possibility of caching in the long term. For this route, within PHP an option is to use a file cache, one that will only get updated when time calls for it. For the framework I have created, here's an example; I'd be interested to know what people think:
Note:
(LStore, LFetch, GetFileName) belong to a Cache object which gets called statically.
(Blobify and Unblobify) belong to a SystemComponent object which is always alive
Each piece of cache data has a key. This is the only thing you ever have to remember.
public function LStore($key,$data, $blnBlobify=true) {
/* Opening the file in read/write mode */
$h = fopen(self::GetFileName($key, 'longstore'),'a+');
if (!$h) throw new Exception('Could not write to cache');
flock($h,LOCK_EX); // exclusive lock, will get released when the file is closed
fseek($h,0); // go to the start of the file
/* truncate the file */
ftruncate($h,0);
if($blnBlobify==true) { $data = SystemComponent::Blobify(array($data)); }
if (fwrite($h,$data)===false) {
throw new Exception('Could not write to cache');
}
fclose($h);
}
public function LFetch($key) {
$filename = self::GetFileName($key, 'longstore');
if (!file_exists($filename)){ return false;}
$h = fopen($filename,'r');
if (!$h){ return false;}
/* Getting a shared lock */
flock($h,LOCK_SH);
$data = file_get_contents($filename);
fclose($h);
$data = SystemComponent::Unblobify($data);
if (!$data) {
/* If unserializing somehow didn't work out, we'll delete the file */
unlink($filename);
return false;
}
return $data;
}
/* This function is necessary as the framework scales different directories */
private function GetFileName($key, $strCacheDirectory='') {
if(!empty($strCacheDirectory)){
return SystemComponent::GetCacheAdd() . $strCacheDirectory.'/' . md5($key);
} else {
return SystemComponent::GetCacheAdd() . md5($key);
}
}
public function Blobify($Source){
if(is_array($Source)) { $Source = serialize($Source); }
$strSerialized = base64_encode($Source);
return $strSerialized;
}
public function Unblobify($strSerialized){
$Decoded = base64_decode($strSerialized);
if(self::CheckSerialized($Decoded)) { $Decoded = unserialize($Decoded); }
return $Decoded;
}
function CheckSerialized($Source){
$Data = @unserialize($Source);
if ($Source === 'b:0;' || $Data !== false) {
return true;
} else {
return false;
}
}
Now when it comes to accessing the actual data, I just call a fetch. For making sure it is up to date, I tell it to store. In your case, this would be after updating the usertype table.
