I was tasked with creating an organization registration system, and I decided to use MySQL and PHP. Each organization in table orgs has a max_members column and a unique id, org_id. Each student in table students has an org column. Every time a student joins an organization, the student's org column is set to the org_id of that organization.
When someone clicks join on an organization page, a PHP file executes.
In the PHP file, a query retrieves the total number of students whose org is equal to the org_id of the organization being joined.
$query = "SELECT COUNT(student_id) FROM students WHERE org = '$org_id'";
The maximum members is also retrieved from the orgs table.
$query = "SELECT max_members FROM orgs WHERE org_id = '$org_id'";
So I have variables $total_members and $max_members. A basic if statement checks if $total_members < $max_members, then updates the student's org equal to the org_id. If not, then it does nothing and notifies the student that the organization is full.
My main concern is what happens in this situation:
Org A only has one slot left. 29/30 members.
Student A clicks join on Org A (and at the same time)
Student B clicks join on Org A
Student A retrieves data: There is one slot left
Student B retrieves data: There is one slot left
Student A's org = Org A's org_id
Student B's org = Org A's org_id
After the scripts have executed, Org A will show up with 31/30 members
Can this happen? If yes, how can I avoid it?
I've thought about using MySQL variables like this:
SET @org_id = 'what-ever-org';
SELECT @total_members := COUNT(student_id) FROM students WHERE org_main = @org_id;
SELECT @max_members := max_members FROM orgs WHERE org_id = @org_id;
UPDATE students SET org_main = IF(@total_members < @max_members, @org_id, '') WHERE student_id = 99999;
But I don't know if it would make a difference.
Row locking does not apply in my case. I think. I'd love to be proven wrong though.
The code I've written above is a simplified version of the original code. The original code included checking registration dates, org days, etc, however, these things are not related to the question.
What you're describing is usually called a race condition. It occurs because the check and the update are two separate, non-atomic operations on your database. To avoid this you need to use a transaction together with row locks, so that the database server makes the second request wait until the first one has finished. Another approach would be to use a "before update trigger".
Transaction
As you're using MySQL you have to make sure that the DB engine your tables are running on is InnoDB, because MyISAM just doesn't have transactions. Before you do your SELECT you need to start a transaction. Either send START TRANSACTION manually to the database or use a proper PHP implementation for it, e.g. PDO (PDO::beginTransaction()).
In a running transaction you can then use the suffix FOR UPDATE in your SELECT statement, which will lock the rows that have been selected by the query:
SELECT COUNT(student_id) FROM students WHERE org = :orgId FOR UPDATE
After your UPDATE statement you can commit the transaction, which will write the changes permanently to the database and remove the locks.
If you expect a lot of these simultaneous requests to happen, be aware that locking can cause some delay in the response, because the database might wait for a lock to be released.
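A minimal sketch of the transaction approach in PHP with PDO, assuming the orgs and students tables from the question run on InnoDB. The function name joinOrg is hypothetical, and locking the organization's row in orgs (rather than the counted student rows) is an assumption made for illustration: holding that single row lock makes concurrent joins for the same organization run one after another.

```php
<?php
// Hypothetical helper: returns true if the student was added, false if the
// organization is full or unknown. Assumes $pdo throws exceptions on errors
// (PDO::ERRMODE_EXCEPTION) and that both tables use InnoDB.
function joinOrg(PDO $pdo, $studentId, $orgId)
{
    $pdo->beginTransaction();
    try {
        // Lock the organization's row. A second request for the same org
        // waits here until this transaction commits or rolls back.
        $stmt = $pdo->prepare('SELECT max_members FROM orgs WHERE org_id = ? FOR UPDATE');
        $stmt->execute([$orgId]);
        $maxMembers = $stmt->fetchColumn();
        if ($maxMembers === false) {
            $pdo->rollBack(); // no such organization
            return false;
        }

        // Count the current members while the lock is held.
        $stmt = $pdo->prepare('SELECT COUNT(student_id) FROM students WHERE org = ?');
        $stmt->execute([$orgId]);
        $totalMembers = (int) $stmt->fetchColumn();

        if ($totalMembers >= (int) $maxMembers) {
            $pdo->rollBack(); // organization is full
            return false;
        }

        $stmt = $pdo->prepare('UPDATE students SET org = ? WHERE student_id = ?');
        $stmt->execute([$orgId, $studentId]);
        $pdo->commit(); // writes the change and releases the lock
        return true;
    } catch (Exception $e) {
        $pdo->rollBack();
        throw $e;
    }
}
```

With 29/30 members and two simultaneous calls such as joinOrg($pdo, 1, 'org-a') and joinOrg($pdo, 2, 'org-a'), the second call should block on the locking SELECT, then see 30 members and return false.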
Trigger
Instead of using transactions you can also create a trigger on the database, which will run before an update is executed on the database. In this trigger you could test if the maximum number of students has been exceeded and throw an error if that's the case. However, this can be a very challenging approach, especially if the value to be checked depends on something in the UPDATE statement. Also it is debatable if it's a good idea to implement this kind of logic on the database level.
There are two ways. One is to serialize the operation in PHP, so that only one request at a time can run the function that performs it. But if you want to implement all the logic in MySQL (the method I prefer), use a stored procedure.
Create a stored procedure as (not exactly):
CREATE PROCEDURE join_org(IN p_stu_id INT, IN p_org_id INT, OUT success INT)
BEGIN
    DECLARE total_members INT;
    DECLARE max_members_allowed INT;
    -- Parameters carry a p_ prefix so they cannot be mistaken for column names.
    SELECT COUNT(student_id) INTO total_members FROM students WHERE org = p_org_id;
    SELECT max_members INTO max_members_allowed FROM orgs WHERE org_id = p_org_id;
    IF max_members_allowed > total_members THEN
        UPDATE students SET org = p_org_id WHERE student_id = p_stu_id;
        SET success = 1;
    ELSE
        SET success = 0;
    END IF;
END
Then read this OUT variable named 'success' in your PHP code to indicate success or failure. Call this procedure when the user clicks join.
In my database, I have several tables. One is a checkpoint table that makes note of a user choosing to finalize one of their projects. This table contains a timestamp that is automatically created. Whenever a user finalizes their project a new row is added to the checkpoint table (that way we can also keep a history of previous times the project was finalized).
I have several other tables with timestamps (or tables that I could add timestamp columns to) that automatically update when their tables change.
Is there a simple way to be able to tell if any of the other tables have updated their data since the project was last finalized? I do not need to know which tables have changed data just that there are tables that have changed data.
For example, if a user changes data in one of their tables I want to be able to display a message indicating that their project has unfinalized data.
There are a couple of ways that I have thought about doing this:
Checking every single table to see if any timestamps are newer than the latest timestamp in the checkpoint table.
Add an additional timestamp column (I already have a created and updated timestamp column) to the main project table. Most of the other tables are linked directly or indirectly to this main project table. Add triggers to every other table to update this timestamp when their data changes. I am not quite sure yet how to correctly set up a proper trigger for this.
Creating a new table with just the project_id and a timestamp column. Add a trigger to the other tables as shown in option 2.
As new modules are added, I will be adding more tables to the project so will need something that is easy to scale as well.
Each of these approaches seems like there would be a lot of steps involved.
Would one of these approaches be more efficient or viable than another? Is there another approach that I am not thinking about? If triggers are the best way to do this how would I go about setting up the trigger?
A simplified overview of my tables looks like this:
main_project_table
id
user_id (FK to user_table)
created_timestamp
updated_timestamp
checkpoint_group_table (users can choose which group to finalize their project to)
id
user_id (FK to user_table)
group_name
checkpoint_table (the table that records the finalized data and time of finalization)
id
checkpoint_group_id (FK to checkpoint_group_table)
project_id (FK to main_project_table)
project_finalized_timestamp
parent_table (several of these)
id
project_id (FK to main_project_table)
child_table (0 or more of these for each parent_table)
id
parent_id (FK to parent_table)
You really only have three solutions: Middleware, Triggers, and General Log File.
Middleware solution:
Add a timestamp field to each relevant table, with its default set to "CURRENT_TIMESTAMP" and with "ON UPDATE CURRENT_TIMESTAMP". The default fills the field when a row is inserted, and the ON UPDATE clause refreshes it to the current time on every update.
Assuming that users are going through some API, you can write a JOIN query where it returns the latest time stamp. It would look like this.
SELECT
    CASE
        WHEN b.timestamp IS NOT NULL THEN 0
        WHEN c.timestamp IS NOT NULL THEN 0
        WHEN d.timestamp IS NOT NULL THEN 0
        WHEN e.timestamp IS NOT NULL THEN 0
        ELSE 1
    END AS `test`
FROM checkpoint_table a
LEFT JOIN main_project_table b
    ON a.project_id = b.id
    AND b.timestamp > a.project_finalized_timestamp
LEFT JOIN checkpoint_group_table c
    ON b.user_id = c.user_id
    AND c.timestamp > a.project_finalized_timestamp
LEFT JOIN parent_table d
    ON a.project_id = d.project_id
    AND d.timestamp > a.project_finalized_timestamp
LEFT JOIN child_table e
    ON d.id = e.parent_id
    AND e.timestamp > a.project_finalized_timestamp
Now, when a request is routed to the tables, you can run this query, and if test == 0, you return the message.
<?php
class middleware{
    public function getMessage($data){
        // $data holds the rows returned by the query; test is 0 when something changed
        if(!empty($data) && $data[0]['test'] == 0){
            return "project has unfinalized data";
        }else{
            return null;
        }
    }
}
Trigger Solution:
CREATE TRIGGER checkpoint_group_table_after_update
AFTER UPDATE ON checkpoint_group_table
FOR EACH ROW
UPDATE main_project_table
SET updated_timestamp = NOW()
WHERE user_id = NEW.user_id
The advantage is that this is perhaps more elegant than the middleware solution. The disadvantage is that triggers are not in plain view, and in my experience, processes that run in the background are eventually forgotten. In the long term, you could be left with a Jenga puzzle, which would make life difficult.
General Log File Solution:
MySQL can log every query on the server. It is possible to read this log file while the server is running, parse it, and figure out whether any tables were updated after the project was finalized.
Turn on a general log file.
SET GLOBAL general_log = 'ON';
Set the path of the log file.
SET GLOBAL general_log_file = '/var/log/mysql/mysql_general.log';
Confirm by going to the command terminal.
mysql -se "SHOW VARIABLES" | grep -e general_log
You might need to restart MySQL.
sudo service mysql restart
This script can get you started...
$v = shell_exec("sudo less /var/log/mysql/mysql_general.log");
$lines = explode("\n",$v);
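$new = array();
$l = null;
foreach($lines as $i => $line){
    if(substr($line,0,1) != " "){
        // A new log entry starts here, so store the previous one.
        if($l !== null){
            array_push($new,$l);
        }
        $l = $line;
    }else{
        // Continuation line: fold it into the current entry.
        $l.= preg_replace('/\s+/', ' ', $line);
    }
}
// Store the final entry, which the loop itself never pushes.
if($l !== null && $l !== ''){
    array_push($new,$l);
}
$lines = $new;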
$index = array();
foreach($lines as $i => $line){
$e = explode("\t",$line);
$new = array();
foreach($e as $key => $value){
$new[$key] = trim($value);
}
$index[$i] = $new;
}
This will result in this...
array(3) {
[0]=> string(27) "2017-10-01T08:17:04.659274Z"
[1]=> string(8) "70 Query"
[2]=> string(129) "UPDATE checkpoint_group_table SET group_name = 'Dev Group' Where id=6"
}
From here you can use a library called PHP-SQL-Parser to parse out the query.
The advantage is that this approach might scale well, since you will not have to add any columns to your database. The disadvantage is that it involves more code, and that means more complexity. You probably cannot really do this solution without writing unit tests for it.
If I were in your situation, I would make a table with the fields project_id (FK) and a boolean is_finalized. Every time a project is finalized, I would add an entry to it.
+-----------------+--------------+
| project_id | is_finalized |
|-----------------|--------------|
| 12 | 1 |
+-----------------+--------------+
Before any update/insert, just check if this key exists for the project. If it exists, change it to 0, and while loading the file, just check if the value is 0. If it is 0, show the message: project has unfinalized data.
It should show the message only if the key exists and the value is 0. If the project is not finalized, the table won't have a row for it, hence no message.
This is quite easy, faster to process than checking each timestamp, and extensible, as it depends only on the update or insert queries, which you can reuse in your upcoming modules.
Timestamp comparison could get messy when multiple checks are needed.
...I do not need to know which tables have changed data just that there are tables that have changed data...
Run a join query to generate one data set, JSON-encode or serialize it, then MD5 it, and keep this hashed string in the db. Next time, compute it again and compare: if there is ANY difference, the data set has changed. This is the general idea behind large data/file comparison and code repositories.
but in light of...
..more tables to the project..
Then just use MD5 on each data-row in the table. Once changed the hashed string will be different.
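A minimal sketch of the hashing idea in PHP. The function names datasetHash and hasChangedSince are hypothetical, and it assumes the rows come from a query with a fixed ORDER BY so that unchanged data always serializes to the same string.

```php
<?php
// Hypothetical helper: fingerprint of a result set (an array of associative rows).
function datasetHash(array $rows)
{
    // The same rows in the same order always produce the same JSON string,
    // and therefore the same MD5 hash.
    return md5(json_encode($rows));
}

// Hypothetical helper: compare the current rows against the hash that was
// stored when the project was finalized.
function hasChangedSince($storedHash, array $rows)
{
    return $storedHash !== datasetHash($rows);
}
```

The hash returned by datasetHash() would be stored alongside the checkpoint row at finalization time, and hasChangedSince() would be called with that stored value whenever the project is loaded.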
Plan A: An off-the-wall solution:
Set up Master-Slave. The Slave will contain an 'old' copy of the data.
Establish "delayed" replication. Let's say 1 hour.
Get pt-table-checksum; run it twice an hour.
That will discover changes within an hour. (The timings may need tweaking if data size is quite large or small.)
Plan B
Deny all direct access from actual humans. Instead, build an application that handles all normal accesses through some API. Then I would instrument the API to collect whatever I choose.
Ad Hoc queries (for which there is no API):
Perhaps disallow them
Perhaps have a review board (me) to admit their running.
Perhaps have an API that runs the query, but immediately logs/emails/rings bells/whatever.
Really not sure why these answers are suggesting reliance on IDs or complex data-logging, this is a fairly common problem with some very simple solutions.
Use those parent/child relationships
Note: when documenting a schema, it is important to note not just the FK relationships but also the type of relationship (one-to-one, many-to-one, one-to-many, many-to-many).
You already have fairly well-defined parent/child relationships, which I assume to be:
main_project one<--many parent one<--many child
Use them one of two ways:
Update a date for parent and main_project which stores the most recent date any child was modified.
Use a combination of join/max/modified in a query utilizing main_project, parent, and child.
child_updated date
main_project.child_updated
parent.child_updated
Whenever updating any child, also update the child_updated dates for main_project and parent. Similarly, when updating a parent, update main_project. This can be done with triggers, PHP, or some clever use of joins or views. I would highly advise doing this in the PHP models of those tables.
join/max/modified
Just build a query to get you four values, then check them:
checkpoint_table.project_finalized_timestamp
main_project.modified
MAX(parent.modified)
MAX(child.modified)
These joins can get a bit tricky, so you'll have to play with this a bit.
SELECT m.modified AS modified, MAX(cp.project_finalized_timestamp) AS finalized, MAX(p.modified) AS parent_modified, MAX(ch.modified) AS child_modified
FROM main_project_table m
LEFT JOIN checkpoint_table cp
ON m.id = cp.project_id
LEFT JOIN parent_table p
ON m.id = p.project_id
LEFT JOIN child_table ch
ON p.id = ch.parent_id
GROUP BY m.id
This will give you ONE row of all the dates you care about, allowing you to create some simple logic for it in PHP.
// $result holds the single row retrieved with the joined query above
if ($result['finalized'] < max($result['modified'], $result['parent_modified'], $result['child_modified'])) {
    // changed
}
There are some good solutions mentioned so far. Another one is to make use of MySQL's information schema. Doing this, you can for example select all tables that have a timestamp field with the name you know, and check their modification times. This is probably the most dynamic and seamless approach but not really the best one. I would typically only do something like this if I was building an interface on top of legacy or third party code and didn't have control of that part of the application.
Architecturally I think the best approach is to have your application aware of pertinent tables / fields and do an audit of them. I am assuming that the data is relational to the object at question and therefore although they are foreign tables, they can still be easily checked for modifications.
Another good idea would be to add versioning to all of the tables in question so that during this step in your application you can show what changed.
I am currently working with PHP and SQL on a website. There is a database containing users (accounts), organisations, and a relational table to link organisations to accounts (a many to many relationship)
When I delete an account from the database, the SQL query should also delete any organisations the account is linked to if the account being deleted is the only account linked to an organisation.
I am relatively new to SQL and have constructed a query which should delete an organisation from the organisations table under the conditions described above.
Here is my query:
'DELETE FROM TBL_ORGANISATIONS WHERE id = (
SELECT org_id FROM TBL_AFFILIATIONS WHERE account_email = :email AND (
SELECT COUNT(*) FROM TBL_AFFILIATIONS WHERE org_id IN (
SELECT org_id FROM TBL_AFFILIATIONS WHERE account_email = :email
)
) = 1
)'
Is this the correct way to structure this query or is there a clearer / more efficient way to do this? As I previously mentioned I am fairly new to SQL and have not yet grasped the concept of all the SQL keywords which can be useful in constructing queries such as this (JOIN etc.)
I thank you all in advance for any advice you can provide.
By the way:
I am using PDO hence the :email for those of you wondering.
If you have foreign key constraints, like I think you should, then this statement will fail, because the affiliation record still points to the organisation record to be deleted.
You can use ON DELETE CASCADE to delete the organisation, like Mihai suggested in his (now deleted) comment, but to do that, you will still have to check whether there's only one affiliation linked to the organisation.
In this case, I'd rather query the organisation's ID first. You will probably have that at hand anyway, because you'll know the details of the account you are deleting. Then first delete the account and next delete the organisation if you need to, with a statement that looks like this:
DELETE FROM TBL_ORGANISATIONS o
WHERE
o.id = :ThatIdYouQueriedBefore AND
NOT EXISTS (SELECT 'x' FROM TBL_AFFILIATIONS a WHERE a.org_id = o.id);
Personally I'm not a big fan of cascaded deletes, since a seemingly small mistake might cost you a lot of data, but even if you do use them, I don't think they make this particular case much easier.
I think I would do two logical queries. Feel free to wrap in transaction if you need to guarantee that the deletes always happen together, with a rollback if they don't.
First, delete both the account and the account to organization affiliation with a single query (here I am assuming your account table name)
DELETE tbl_account,tbl_affiliations
FROM tbl_account INNER JOIN tbl_affiliations
ON tbl_account.account_email = tbl_affiliations.account_email /* I am assuming join condition here, perhaps there is an id to be used instead */
WHERE tbl_account.account_email = ?
Then, delete any orphaned organizations:
DELETE tbl_organizations
FROM tbl_organizations LEFT JOIN tbl_affiliations
ON tbl_organizations.org_id = tbl_affiliations.org_id
WHERE tbl_affiliations.org_id IS NULL
Note that since that last query is not dependent on any account-specific information, you could also consider running an asynchronous process to clean up orphaned organizations if you don't need the organization deletion to happen synchronously with the account deletion.
The benefit of this approach is that you can potentially always delete user accounts in the same way, using the first query, as this works regardless as to whether there are multiple accounts associated with the organization. So you don't need any extra application logic or SQL SELECT subqueries to look for cases where there is a single account associated with an organization.
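A rough sketch of wrapping these steps in a transaction with PDO. The function name deleteAccount is hypothetical, the table and column names follow the assumptions made in this answer, and the multi-table DELETE is swapped for plain single-table DELETE statements so the sketch does not depend on MySQL-only syntax.

```php
<?php
// Hypothetical helper: removes an account, its affiliations, and any
// organization left without affiliations, all in one transaction.
// Assumes $pdo throws exceptions on errors (PDO::ERRMODE_EXCEPTION).
function deleteAccount(PDO $pdo, $email)
{
    $pdo->beginTransaction();
    try {
        // Remove the links first so foreign keys do not block the later deletes.
        $stmt = $pdo->prepare('DELETE FROM tbl_affiliations WHERE account_email = ?');
        $stmt->execute([$email]);

        $stmt = $pdo->prepare('DELETE FROM tbl_account WHERE account_email = ?');
        $stmt->execute([$email]);

        // Remove organizations that no longer have any affiliation.
        $pdo->exec(
            'DELETE FROM tbl_organizations
             WHERE NOT EXISTS (
                 SELECT 1 FROM tbl_affiliations
                 WHERE tbl_affiliations.org_id = tbl_organizations.org_id
             )'
        );
        $pdo->commit();
    } catch (Exception $e) {
        // Undo everything if any of the deletes failed.
        $pdo->rollBack();
        throw $e;
    }
}
```

If the orphan cleanup is moved to an asynchronous process instead, the last statement can simply be dropped from the function.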
As you are trying to delete the data from two tables at a time, you can try this (MySQL multi-table DELETE syntax).
DELETE o, a FROM TBL_ORGANISATIONS o JOIN TBL_AFFILIATIONS a
ON (o.id = a.org_id) WHERE a.account_email = :email;
Here, we are trying to delete the rows from two tables, TBL_ORGANISATIONS (o) and TBL_AFFILIATIONS (a), joining the primary key of o (i.e. id) to the foreign key of a (i.e. org_id) and adding a condition with a WHERE clause on a.account_email = :email. Note that this removes every organisation the account is affiliated with, whether or not other accounts are still linked to it.
I have a call center web application that I wrote in PHP. I need a better way of managing access to the client database. I may have a list of 1000 people who need to be called, and I may have 10 people querying that database at the same time in order to pull a person to call. What's the best way to keep the same record from coming up for more than one caller?
Currently, I grab a record, then write to a field on it to indicate that it's locked. So when the next person queries the DB, it checks to make sure it's not pulling anything that was marked as locked. This works fine if it's a slow night. When you have a lot of people going at once, it's just not a fast enough way of fixing it.
Any ideas?
When you query for the person, you could try doing something like this instead.
Select yourPerson from yourTable WITH (rowlock, xlock) WHERE yourPersonid = 1;
This will give you an exclusive row lock on the person to prevent other people from using the same person concurrently.
I guess that perhaps you are doing something like:
declare @id int, @phone varchar(20), @name varchar(100)
select top (1) @id = id, @phone = phone, @name = name
from People
where status = 'awaiting'
update People
set status = 'in_process'
where id = @id --previously grabbed
--- etc.
This is not safe, because some other process could select the same record before your update statement runs.
In this scenario you can use an update statement (or delete, depending on your logic) with the output clause:
declare @person table (id int, phone varchar(20), name varchar(100))
update top (1) p
set status = 'in_process'
output inserted.id, inserted.phone, inserted.name into @person
from People p
where status = 'awaiting'
I have a web page where people are able to post a single number between 0 and 10.
A single lotto-style number is generated once daily. I want my PHP script to check the posted numbers of all the users and assign a score of +1 or -1 to the respective winners (or losers).
The problem is that once I query the DB for the list of the winning users, I want to update their "score" field (in "users" table). I was thinking of a loop like this (pseudocode)
foreach winner{
update score +1
}
but this would mean that if there are 100 winners, then there will be 100 queries. Is there a way to do some sort of batch inserting with one single query?
Thanks in advance.
I'll assume you are using a database with SQL, and suggest that you would probably want to do something like
UPDATE `table` SET `score`=`score`+1 WHERE `number`=3;
and the corresponding -1 for losers (strange, can't see a reason to -1 them).
Without more details though, I can't be of further help.
You didn't specify how the numbers were stored. If there is a huge number of people posting, a good option is to use a database to store their numbers.
You can have for example a table called lotto with three fields: posted_number, score and email. Create an (non-unique!) index on the posted_number field.
create table lotto (posted_number integer(1) unsigned, score integer, email varchar(255), index(posted_number));
To update their score you can execute two queries:
update lotto set score = score+1 where posted_number = <randomly drawn number here>
update lotto set score = score-1 where posted_number <> <randomly drawn number here>
Let's just assume we have two tables named posts and users.
Obviously, users contain the data of the gambler (with a convenient id field and points for the number of points they have), and posts contain the post_id ID field for the row, user_id, which is the ID of the user and value, the posted number itself.
Now you only need to implement the following SQL queries into your script:
UPDATE users INNER JOIN posts ON users.id = posts.user_id SET users.points = (users.points + 1)
WHERE posts.value = 0;
Where 0 at the end is to be replaced with the randomly drawn number.
What will this query do? With the INNER JOIN construct, it will create a link between the two tables. Automatically, if posts.value matches our number, it will link posts.user_id to users.id, knowing which user has to get his/her points modified. If someone gambled 0, and his ID (posts.user_id) is 8170, the points field will update for the user having user.id = 8170.
If you alter the query to make it (users.points - 1) and WHERE posts.value != 0, you will get the non-winners having one point deducted. It can be tweaked as much as you want.
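As a sketch only, the winner and loser updates can also be combined into one statement with a CASE expression. The function name applyDraw is hypothetical, the tables are the users and posts tables assumed in this answer, and it further assumes each user has at most one row in posts per draw.

```php
<?php
// Hypothetical helper: +1 for every user who posted the drawn number,
// -1 for every other user who posted. Returns the number of updated rows.
function applyDraw(PDO $pdo, $drawnNumber)
{
    $stmt = $pdo->prepare(
        'UPDATE users
         SET points = points + CASE
             WHEN id IN (SELECT user_id FROM posts WHERE value = ?) THEN 1
             ELSE -1
         END
         WHERE id IN (SELECT user_id FROM posts)'
    );
    $stmt->execute([$drawnNumber]);
    return $stmt->rowCount();
}
```

Users who did not post at all are excluded by the outer WHERE clause, so their points stay untouched.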
Just be careful! After each daily draw, the posts table needs to be truncated or archived.
Another option would be storing the timestamp (time() in PHP) of the user betting the number, and when executing, checking against the stored timestamp... whether it is in between the beginning and the end of the current day or not.
Just a tip: you can use graphical database software (like Microsoft Access or LibreOffice Base) to have your JOINs and such simulated on a graphical display. It makes modelling such questions a lot easier for beginners. If you don't want desktop-installed software, trying out an installation of phpMyAdmin is another solution too.
Edit:
Non-relational databases
If you are to use non-relational databases, you will first need to fetch all the winner IDs with:
SELECT user_id FROM posts WHERE value=0;
This will give you a result of multiple rows. Now, you will need to go through this result, one-by-one, and executing the following query:
UPDATE users SET points=(users.points + 1) WHERE id=1;
(0 is the drawn winning number, 1 is the id of the user currently being updated.)
Without using the relation capabilities of MySQL, but using a MySQL database, the script would look like this:
<?php
// $db is assumed to be an open mysqli connection (the mysql_* functions were removed in PHP 7)
$number = 0; // This is the winning number we have drawn
$result = mysqli_query($db, "SELECT user_id FROM posts WHERE value=" . (int)$number);
while ( $row = mysqli_fetch_assoc($result) )
{
    $user_id = (int)$row['user_id'];
    // Read the current points of this winner, then write them back incremented by one
    $curpoints_result = mysqli_query($db, "SELECT points FROM users WHERE id=" . $user_id);
    $current_points = mysqli_fetch_assoc($curpoints_result);
    mysqli_query($db, "UPDATE users SET points=" . ($current_points['points'] + 1) . " WHERE id=" . $user_id);
}
?>
The while construct makes this loop run until every row of the result (the list of winners) is updated.
Oh and: I know MySQL is a relational database, but it is just what it is: an example.
I need to retrieve data from 2 tables at the same time. The tables are not linked by foreign keys or anything similar.
$query1 = "select idemployee from employee where address like 'Park Avenue, 23421'";
$query2 = "select idcompany from company where bossName like 'Peter'";
How can I do this with something like a thread in PHP? I've heard that threads are not safe in PHP.
UPDATED:
I have an input field that needs to look up data in both tables. It is like a search across both tables that shows the possible results based on the employee's address or the boss's name, so you can type an address or just the boss's name. The queries are just a representation of what I need.
Either use a single query, or look into something like Gearman to have workers performing jobs asynchronously (I assume the current code is only an example: if the queries you have there perform so badly that you want to run them asynchronously, then you most likely have a database problem). Having some daemon processes ready to go to perform tasks is relatively simple.
Um...
$query1 = "select idemployee from employee where address like ?";
$query2 = "select idcompany from company where bossName like ?";
$stmt1 = $pdo->prepare($query1);
$stmt1->execute(array('Park Avenue, 23421'));
$employee = $stmt1->fetch();
$stmt2 = $pdo->prepare($query2);
$stmt2->execute(array('Peter'));
$company = $stmt2->fetch();
What am I missing?
You could use MYSQLI_ASYNC and http://docs.php.net/mysqli.poll (both only available with php 5.3+ and mysqlnd).
But then you'll need a separate connection to the MySQL server for each query.
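A rough sketch of how MYSQLI_ASYNC and mysqli_poll fit together, assuming mysqlnd is available. The function name runQueriesConcurrently and the shape of its $jobs argument are hypothetical. Asynchronous queries cannot be prepared statements, so any user input in the SQL must already be escaped (for example with real_escape_string).

```php
<?php
// Hypothetical helper: $jobs maps a label to [mysqli connection, SQL string].
// Each query needs its own connection. Returns label => rows (or false on failure).
function runQueriesConcurrently(array $jobs)
{
    $pending = [];
    foreach ($jobs as $label => $job) {
        list($link, $sql) = $job;
        $link->query($sql, MYSQLI_ASYNC); // returns immediately
        $pending[$label] = $link;
    }

    $results = [];
    while ($pending) {
        $read = $error = $reject = array_values($pending);
        // Wait up to one second for at least one connection to become ready.
        if (!mysqli_poll($read, $error, $reject, 1)) {
            continue;
        }
        // Connections that were rejected or reported an error yield no result.
        foreach (array_merge($error, $reject) as $link) {
            $label = array_search($link, $pending, true);
            $results[$label] = false;
            unset($pending[$label]);
        }
        foreach ($read as $link) {
            $label = array_search($link, $pending, true);
            if ($label === false) {
                continue; // already handled above
            }
            $result = $link->reap_async_query();
            $results[$label] = $result instanceof mysqli_result
                ? $result->fetch_all(MYSQLI_ASSOC)
                : $result;
            unset($pending[$label]);
        }
    }
    return $results;
}
```

For the two queries in the question, $jobs would hold one entry for the employee query and one for the company query, each with its own mysqli connection.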
Depends on what do you want to do with those queries. If, for example, you are using an AJAX form and can make two requests, you should create separate scripts, where each returns the results for each query. That is effectively running them in separate processes, so they execute simultaneously.
There is no such thing as threading per se in PHP, you can see a hack around it here (using full fledged processes.)
Counter answer to my previous:
Create a new table
CREATE TABLE EmployeeBossXref (
id INT AUTO_INCREMENT PRIMARY KEY,
employee_id INT,
boss_id INT,
company_id INT,
FOREIGN KEY (employee_id) REFERENCES Employee(id),
FOREIGN KEY (boss_id) REFERENCES Employee(id),
FOREIGN KEY (company_id) REFERENCES Company(id)
) ENGINE=InnoDB;
Then change SQL to:
select Employee.name, Boss.name, Company.name FROM Employee
JOIN EmployeeBossXref ebx ON ebx.employee_id=Employee.id
JOIN Employee Boss ON Boss.id=ebx.boss_id
JOIN Company ON Company.id=ebx.company_id
WHERE Employee.address LIKE 'Park Avenue, 23421'
AND Boss.name LIKE 'Peter';
With this system, all bosses are employees (which they logically are!), employees can have more than one, or no boss.
You don't. Do you have an engineering reason you need to do this?
Making two queries simultaneously is still going to hit the same database, and the database is going to do the same amount of work. It's not going to make anything faster, and you'll have the overhead of the additional threads/processes being created.
If you really need better concurrency, consider a 2nd (or 3rd or 4th) real-time replicated database for SELECT queries, to offload some of the work from the main database.