Check for new content with JS/PHP/MySQL? - php

I have four tables: projects, posts, users and project_users. Some users are connected to some projects via the project_users table. Each project contains a bunch of rows from the posts table. Pretty straightforward.
The users can edit the posts rows, and every time an update occurs the user id and timestamp should be saved so other users can see what is new and what is not. Should I save this information to 1) the post row, keeping a latest editor and time for each row, or should I consider 2) a whole separate log table? What are the benefits of each case?
When that is decided, I want to run a script every now and then that refreshes the site content if any rows have been updated by anyone else (poor man's push). To give me some perspective: is it an expensive query to ask the database, with say 1 million posts, "check if there are any posts in this project that any user other than you has updated after timestamp x"? Just asking because I'd run that query a lot.
A much quicker way would be to log the latest editor to the projects or project users rows, but with multiple people editing at the same time, that would be less accurate and there is also no way to see which rows got updated. Makes sense?

Use solution (1). It's a clean design and easy to work with.
When you fetch post data for the edit form, save the updated_at timestamp in a hidden field. Then run an Ajax request every X seconds to check if the timestamp has been changed. The query is simple:
SELECT updated_at FROM posts WHERE post_id = ?
Since you filter by the primary key, the query will be very fast even with a huge table.
Now compare the fetched timestamp with what is stored in the hidden field. If it has changed - Notify the user (editor).
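A minimal JavaScript sketch of the polling check described above. The fetchUpdatedAt function is hypothetical: it stands for whatever Ajax call returns the result of the SELECT for a given post, and the polling interval is left to the caller.

```javascript
// True when the timestamp on the server differs from the one in the hidden field.
function hasChanged(knownUpdatedAt, currentUpdatedAt) {
  return knownUpdatedAt !== currentUpdatedAt;
}

// Polls every intervalMs and calls onChange once when the timestamp differs.
// fetchUpdatedAt(postId) is a hypothetical function returning a Promise that
// resolves to the post's current updated_at string.
// Returns a function that stops the polling.
function watchPost(postId, knownUpdatedAt, fetchUpdatedAt, onChange, intervalMs) {
  const timer = setInterval(async () => {
    const current = await fetchUpdatedAt(postId);
    if (hasChanged(knownUpdatedAt, current)) {
      clearInterval(timer); // notify only once
      onChange(current);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Passing the fetch function in as a parameter keeps the comparison independent of the endpoint, which is not specified in the answer.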
As you already realized, it's possible that another user will update the post between two Ajax requests. So you should also check for updates just before saving the form data. Don't forget to use a transaction and lock the row:
Pseudo code:
START TRANSACTION
SELECT updated_at FROM posts WHERE post_id = ? FOR UPDATE
if ($row->updated_at == $form->updated_at)
then update row
else notify the user that he is unlucky ;-(
COMMIT
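The comparison step of that pseudo code can be illustrated in JavaScript with an in-memory stand-in for the posts table. This is only a sketch of the "save if unchanged" logic: the Map, the function name and the sample data are assumptions, and it does not replace the transaction and row lock, which have to happen in the database.

```javascript
// In-memory stand-in for the posts table (an assumption for illustration).
const posts = new Map([
  [1, { body: 'old text', updated_at: '2024-01-01 10:00:00' }],
]);

// Saves only if the row still carries the timestamp the form was loaded with.
// Returns true when the update was applied, false when the form data is stale.
function saveIfUnchanged(postId, formUpdatedAt, newBody, now) {
  const row = posts.get(postId);
  if (!row || row.updated_at !== formUpdatedAt) {
    return false; // someone else saved first, or the row no longer exists
  }
  row.body = newBody;
  row.updated_at = now;
  return true;
}
```

Two editors who loaded the form at the same time would both send the same timestamp; only the first save succeeds, and the second one gets false and can be notified.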
Of course you can also use solution (2) and add a separate table post_updates(post_id, user_id, updated_at). To get the latest timestamp you would run:
SELECT MAX(updated_at) as updated_at
FROM post_updates
WHERE post_id = ?
It should run pretty fast with an index on (post_id, updated_at). But don't complicate things as long as you don't have any requirements which would benefit from that table.

Related

Splitting up data in MySQL to make it faster and more accessible

I have a MySQL database that is becoming really large. I can feel the site becoming slower because of this.
Now, on a lot of pages I only need a certain part of the data. For example, I store information about users every 5 minutes for history purposes. But on one page I only need the information that is the newest (not the whole history of data). I achieve this by a simple MAX(date) in my query.
Now I'm wondering if it wouldn't be better to make a separate table that just stores the latest data so that the query doesn't have to search for the latest data from a specific user between millions of rows but instead just has a table with only the latest data from every user.
The con here would be that I have to run 2 queries to insert the latest history in my database every 5 minutes, i.e. insert the new data in the history table and update the data in the latest history table.
The pro would be that MySQL has a lot less data to go through.
What are common ways to handle this kind of issue?
There are a number of ways to handle slow queries in large tables. The three most basic ways are:
1: Use indexes, and use them correctly. It is important to avoid table scans on large tables; this is almost always your most significant performance hit with single queries.
For example, if you're querying something like: select max(active_date) from activity where user_id=?, then create an index on the activity table for the user_id column. You can have multiple columns in an index, and multiple indexes on a table.
CREATE INDEX idx_user ON activity (user_id)
2: Use summary/"cache" tables. This is what you have suggested. In your case, you could apply an insert trigger to your activity table, which will update your summary table whenever a new row gets inserted. This will mean that you won't need your code to execute two queries. For example:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
UPDATE activity_summary SET last_active_date=new.active_date WHERE user_id=new.user_id
You can change that to check if a row exists for the user already and do an insert if it is their first activity. Or you can insert a row into the summary table when a user registers...Or whatever.
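The "insert if it is their first activity, otherwise update" variant can be sketched in JavaScript with in-memory stand-ins for the two tables. The names below are illustrative, not part of the answer, and the sketch additionally assumes that an older date should never overwrite a newer one (the trigger above does not check this).

```javascript
const activity = [];                // stand-in for the full history table
const activitySummary = new Map();  // stand-in for the summary: user_id -> last_active_date

// Appends to the history and keeps the summary at the latest date per user.
// Dates are 'YYYY-MM-DD HH:MM:SS' strings, which compare correctly as text.
function recordActivity(userId, activeDate) {
  activity.push({ user_id: userId, active_date: activeDate });
  const last = activitySummary.get(userId);
  // Covers both cases: no summary row yet (first activity) or an older one.
  if (last === undefined || activeDate > last) {
    activitySummary.set(userId, activeDate);
  }
}
```

Reading the latest date is then a single lookup in the summary, however large the history grows.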
3: Review the query! Use MySQL's EXPLAIN command to grab a query plan to see what the optimizer does with your query. Use it to ensure that the optimizer is avoiding table scans on large tables (and either create or force an index if necessary).

Maintaining history table for PHP Application

I want to maintain a history table for my application to track which fields were changed by the user.
The following is my bugs_history table structure -
id, bugsid, userid, field_changed, old_value, new_value, created_on, created_by
So my question is: when I update my form and submit it, how do I get the name of each field that was changed, along with its old value and new value, and add those changes to the history table above?
I have googled a lot for this but didn't find anything that meets my requirements. Please let me know how to achieve this.
If you know the field names (you have an HTML form containing them, so you probably know the names) then build a list of them that you then loop through, building a new SELECT query to get old_value and then an INSERT query to save it. The select would order on created_on DESC and LIMIT 1.
But I see a clear problem here: concurrency. What happens when two users try to edit the same bug (with the same bugsid) at the same time? They would expect the old_value to be the same for both? Or should the two operations be executed sequentially? Or should the last one to edit be warned that he's trying to edit stale data? Which one would get the "latest" created_on?
This right here is your real problem, not writing the code that generates two SQL queries.
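For the field-by-field comparison itself, here is a small JavaScript sketch. It assumes the current row has already been loaded as an object (rather than one SELECT per field) and that the submitted form values are available as a second object; the function name is illustrative. It does nothing about the concurrency issue raised above.

```javascript
// Returns one history row per changed field, shaped like the bugs_history table.
// fields is the fixed list of known form field names.
// Values are compared as strings because form input arrives as text.
function buildHistoryRows(bugsid, userid, fields, oldRow, newRow, createdOn) {
  return fields
    .filter((field) => String(oldRow[field]) !== String(newRow[field]))
    .map((field) => ({
      bugsid: bugsid,
      userid: userid,
      field_changed: field,
      old_value: oldRow[field],
      new_value: newRow[field],
      created_on: createdOn,
      created_by: userid,
    }));
}
```

Each returned object maps to one INSERT into bugs_history; unchanged fields produce no rows.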

What is an elegant / efficient way of storing the status of 100 lessons for multiple users?

I'm working on an app in JavaScript, jQuery, PHP & MySQL that consists of ~100 lessons. I am trying to think of an efficient way to store the status of each user's progress through the lessons, without having to query the MySQL database too much.
Right now, I am thinking the easiest implementation is to create a table for each user, and then store each lesson's status in that table. The only problem with that is if I add new lessons, I would have to update every user's table.
The second implementation I considered would be to store each lesson as a table, and record the user ID for each user that completed that lesson there - but then generating a status report (what lessons a user completed, how well they did, etc.) would mean pulling data from 100 tables.
Is there an obvious solution I am missing? How would you store your users' progress through 100 lessons, so it's quick and simple to generate a status report showing their progress?
Cheers!
The table structure I would recommend would be to keep a single table with non-unique fields userid and lessonid, as well as the relevant progress fields. When you want the progress of user x on lesson y, you would do this:
SELECT * FROM lessonProgress WHERE userid=x AND lessonid=y LIMIT 1;
You don't need to worry about performance unless you see that it's actually an issue. Having a table for each user or a table for each lesson are bad solutions because there aren't meant to be a dynamic number of tables in a database.
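To illustrate why the single-table layout makes reporting easy, here is a JavaScript sketch that turns the rows for one user into a status report. The row shape (lessonid, status) and the 'not started' default are assumptions for the example.

```javascript
// rows: what a query like "SELECT * FROM lessonProgress WHERE userid = ?" might return.
// allLessonIds: every lesson that currently exists.
function statusReport(rows, allLessonIds) {
  const byLesson = new Map(rows.map((row) => [row.lessonid, row.status]));
  // Lessons without a row simply have no progress yet.
  return allLessonIds.map((lessonid) => ({
    lessonid: lessonid,
    status: byLesson.has(lessonid) ? byLesson.get(lessonid) : 'not started',
  }));
}
```

Adding a new lesson only extends allLessonIds; no table has to be altered and no existing rows change.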
If reporting is restricted to one user at a time - that is, when generating a report, it's for a specific user and not a large clump of users - why not consider JSON (JavaScript Object Notation) stored in a file? If extensibility is key, it would make it a simple matter.
Obviously, if you're going to run reports against an arbitrarily large number of users at once, separate data files would become inefficient.
Discarding the efficiency argument, JSON would also give you a very human-readable and interchangeable format.
Lastly, if the security of the report output isn't a big sticking point, you'd also gain the ability to easily offload view rendering onto the client.
Use relations between 2 tables. One for users with user specific columns like ID, username, email, w/e else you want to store about them.
Then a status table that has a UID foreign key. ID UID Status etc.
It's good to keep datecreated and dateupdated on tables as well.
Then just join the tables ON status.UID = users.ID
A good option would be to create one table with user_ID as the primary key and a status (int) column, so that each row of the table represents a user. Accessing a user's progress would be fast and simple, since you have an index on user IDs.
This way, adding new lessons would not require changing the DB.

Soft delete best practices (PHP/MySQL)

Problem
In a web application dealing with products and orders, I want to maintain information and relationships between former employees (users) and the orders they handled. I want to maintain information and relationships between obsolete products and orders which include these products.
However I want employees to be able to de-clutter the administration interfaces, such as removing former employees, obsolete products, obsolete product groups etc.
I'm thinking of implementing soft-deletion. So, how does one usually do this?
My immediate thoughts
My first thought is to stick a "flag_softdeleted TINYINT NOT NULL DEFAULT 0" column in every table of objects that should be soft deletable. Or maybe use a timestamp instead?
Then, I provide a "Show deleted" or "Undelete" button in each relevant GUI. Clicking this button you will include soft-deleted records in the result. Each deleted record has a "Restore" button. Does this make sense?
Your thoughts?
Also, I'd appreciate any links to relevant resources.
That's how I do it. I have an is_deleted field which defaults to 0. Then queries just check WHERE is_deleted = 0.
I try to stay away from any hard-deletes as much as possible. They are necessary sometimes, but I make that an admin-only feature. That way we can hard-delete, but users can't...
Edit: In fact, you could use this to have multiple "layers" of soft-deletion in your app. So each could be a code:
0 -> Not Deleted
1 -> Soft Deleted, shows up in lists of deleted items for management users
2 -> Soft Deleted, does not show up for any user except admin users
3 -> Only shows up for developers.
Having the other 2 levels will still allow managers and admins to clean up the deleted lists if they get too long. And since the front-end code just checks for is_deleted = 0, it's transparent to the frontend...
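The layered codes can be read as a visibility rule. A JavaScript sketch, assuming four roles and assuming that each higher role also sees everything the lower ones see (the answer does not spell this out):

```javascript
// Highest is_deleted code each role may see (role names are illustrative).
const ROLE_LEVEL = { user: 0, manager: 1, admin: 2, developer: 3 };

// True when a record with the given is_deleted code is visible to the role.
function canSee(role, isDeleted) {
  return isDeleted <= ROLE_LEVEL[role];
}
```

For regular users this reduces to the plain is_deleted = 0 check that the front end already uses.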
Using soft-deletes is a common thing to implement, and they are dead useful for lots of things, like:
Saving a user's data when they deleted something
Saving your own data when you delete something
Keep a track record of what really happened (a kind of audit)
etcetera
There is one thing I want to point out that almost everyone misses, and it always comes back to bite you in the rear. The users of your application do not have the same understanding of a delete as you have.
There are different degrees of deletions. The typical user deletes stuff when (s)he
Made a mistake and wants to remove the bad data
Doesn't want to see something on the screen anymore
The problem is that if you don't record the intention of the delete, your application cannot distinguish between erroneous data (that should never have been created) and historically correct data.
Have a look at the following data:
PRICES
+------+-------+---------+
| item | price | deleted |
+------+-------+---------+
| A    | 101   | 1       |
| B    | 110   | 1       |
| C    | 120   | 0       |
+------+-------+---------+
Some user doesn't want to show the price of item B, since they don't sell that item anymore. So he deletes it. Another user created a price for item A by mistake, so he deleted it and created the price for item C, as intended. Now, can you show me a list of the prices for all products? No, because either you have to display potentially erroneous data (A), or you have to exclude all but current prices (C).
Of course the above can be dealt with in any number of ways. My point is that YOU need to be very clear with what YOU mean by a delete, and make sure that there is no way for the users to misunderstand it. One way would be to force the user to make a choice (hide/delete).
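One way to picture the hide/delete choice is to record the reason instead of a bare flag. A JavaScript sketch using the PRICES data above; the removed field and its two values are assumptions made for the example:

```javascript
// removed: null (still shown), 'hidden' (correct but no longer shown),
// or 'mistake' (should never have existed).
const prices = [
  { item: 'A', price: 101, removed: 'mistake' },
  { item: 'B', price: 110, removed: 'hidden' },
  { item: 'C', price: 120, removed: null },
];

// Prices currently shown on screen.
function currentPrices() {
  return prices.filter((p) => p.removed === null);
}

// All historically correct prices, including hidden ones.
function validPrices() {
  return prices.filter((p) => p.removed !== 'mistake');
}
```

With the intent recorded, the list of prices for all products becomes answerable: B and C, without the erroneous A.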
If I had existing code that hits that table, I would add the column and change the name of the table. Then I would create a view with the same name as the current table which selects only the active records. That way none of the existing code would break and you could have the soft delete column. If you want to see the deleted record, you select from the base table, otherwise you use the view.
I've always just used a deleted column as you mentioned. There's really not much more to it than that. Instead of deleting the record, just set the deleted field to true.
Some components I build allow the user to view all deleted records and restore them, others just display all records where deleted = 0
Your idea does make sense and is used frequently in production, but to implement it you will need to update quite a bit of code to account for the new field. Another option could be to archive (move) the "soft-deleted" records to a separate table or database. This is done frequently as well and makes the issue one of maintenance rather than (re)programming. (You could have a table trigger react to the delete to archive the deleted record.)
I would do the archiving to avoid a major update to production code. But if you want to use a deleted-flag field, use it as a timestamp to give you additional useful info beyond a boolean. (Null = not deleted.) You might also want to add a DeletedBy field to track the user responsible for deleting the record. Using two fields gives you a lot of info: it tells you who deleted what and when. (The two-extra-field solution is also something that can be done in an archive table/database.)
The most common scenario I've come across is what you describe, a tinyint or even bit representing a status of IsActive or IsDeleted. Depending on whether this is considered "business" or "persistence" data it may be baked into the application/domain logic as transparently as possible, such as directly in stored procedures and not known to the application code. But it sounds like this is legitimate business information for your needs so would need to be known throughout the code. (So users can view deleted records, as you suggest.)
Another approach I've seen is to use a combination of two timestamps to show a "window" of activity for a given record. It's a little more code to maintain it, but the benefit is that something can be scheduled to soft-delete itself at a pre-determined time. Limited-time products can be set that way when they're created, for example. (To make a record active indefinitely one could use a max value (or just some absurdly distant future date) or just have the end date be null if you're ok with that.)
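A JavaScript sketch of the two-timestamp window check. The field names active_from and active_until are hypothetical, a null end is taken to mean "active indefinitely", and the end of the window is treated as exclusive, which is a choice the answer leaves open.

```javascript
// True when now falls inside the record's window of activity.
// active_from and active_until are Date objects; active_until may be null.
function isActive(record, now) {
  const started = record.active_from <= now;
  const notEnded = record.active_until === null || now < record.active_until;
  return started && notEnded;
}
```

A limited-time product gets both dates when it is created and drops out of listings on its own once the end date passes, with no delete ever being issued.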
Then of course there's further consideration of things being deleted/undeleted from time to time and tracking some kind of audit for that. The flag approach knows only the current status, the timestamp approach knows only the most recent window. But anything as complex as an audit trail should definitely be stored separately from the records in question.
Instead I would use a bin table in which to move all the records deleted from the other tables. The main problem with the delete flag is that with linked tables you will definitely run into a double key error when trying to insert a new record.
The bin table could have a structure like this:
id, table_name, data, date_time, user
Where
id is the primary key with auto increment
table_name is the name of the table from which the record was deleted
data contains the record in JSON format with name and value of all fields
date_time is the date and time of the deletion
user is the identifier of the user (if the system provides for it) who performed the operation
This method will not only save you from checking the delete flag in every query (imagine the ones with many joins), but will also allow you to keep only the really necessary data in the tables, which makes searches and corrections with SQL client programs easier.
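The move into the bin table can be sketched in JavaScript with plain arrays standing in for the tables. The function name and the in-memory storage are assumptions; the bin row follows the structure listed above (id, table_name, data, date_time, user).

```javascript
// tables: object mapping table names to arrays of rows, each row having an id.
// bin: array standing in for the bin table.
// Returns true when a record was moved, false when it was not found.
function moveToBin(tables, bin, tableName, id, user, dateTime) {
  const rows = tables[tableName];
  const index = rows.findIndex((row) => row.id === id);
  if (index === -1) {
    return false;
  }
  const [record] = rows.splice(index, 1); // remove from the source table
  bin.push({
    id: bin.length + 1,                   // stands in for auto increment
    table_name: tableName,
    data: JSON.stringify(record),         // all fields as name/value pairs
    date_time: dateTime,
    user: user,
  });
  return true;
}
```

Restoring a record would be the reverse: parse data and insert it back into the table named in table_name.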

How do I delete records in a table while keeping certain data?

My site has lots of incoming searches, which are stored in a database to show recent queries on my website. Due to the high volume of searches, my database is getting bigger. So I want to keep only recent queries in the database, say 10 records. This keeps my database small, and queries will be faster.
I am able to store incoming queries in the database, but I don't know how to restrict or delete excess/old data from the table.
Any help?
I am using PHP and MySQL.
Hopefully you have a timestamp column in your table (or have the freedom to add one). AFAIK, you have to add the timestamp explicitly when you add data to the table. Then you can do something along the lines of:
DELETE FROM tablename WHERE timestamp < '<a date two days in the past, or whatever>';
You'd probably want to just do this periodically, rather than every time you add to the table.
I suppose you could also just limit the size to the most recent ten records by checking the size of the table every time you are about to add a line, and deleting the oldest record (again, using the timestamp column you added) if adding the new record will make it too large.
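The "keep only the ten most recent" idea can be sketched in JavaScript with an array standing in for the table. The names and the numeric timestamps are assumptions for the example; in MySQL the same effect needs a DELETE based on the timestamp column.

```javascript
// recent: array of { query, timestamp } rows, standing in for the table.
// Adds a search and drops the oldest rows once the limit is exceeded.
function addSearch(recent, query, timestamp, limit = 10) {
  recent.push({ query: query, timestamp: timestamp });
  recent.sort((a, b) => a.timestamp - b.timestamp); // oldest first
  while (recent.length > limit) {
    recent.shift(); // delete the oldest record
  }
}
```

Pruning on every insert keeps the size fixed, at the cost of extra work per write, which is why the periodic cleanup mentioned above may be preferable under heavy load.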
Falkon's answer is good - though you might not want to have your archive in a table, depending on your needs for that forensic data. You could also set up a cron job that just uses mysqldump to make a backup of the database (with the date in the filename), and then delete the excess records. This way you can easily make backups of your old data, or search it with whatever tool, and your database stays small.
You should write a PHP script, which will be started by CRON (e.g. once a day) and move some data from the main table TableName to an archive table TableNameArchive with exactly the same structure.
The SQL inside the script should look like:
INSERT INTO TableNameArchive
SELECT * FROM TableName WHERE data < '2010-06-01' -- of course you should provide your own conditions here
next you should DELETE old records from TableName.
