Resorting id field in database after deleting data - php

I have a problem. I created a database (my_database), created a table inside it(members), and created fields (id, name, lastname, address). The id field is set to auto-increment. Now I have 5 rows, which means id has a value of 1, 2, 3, 4 and 5. When I delete id 3, then what's left is '1 2 4 5'. My question is what should I do to resort '1 2 4 5' to '1 2 3 4'? Any help would be much appreciated.
Update: I have to do this resorting because on my site I am displaying all the members using loop and accessing the id number. Or is it even possible to get the id of a data? Example I have data this data: id=2, lastname=clyde, address=switzerland. How can I get the id value just by the lastname value?

You really don't want to do that for a number of reasons.
First off, things might go wrong if a page is open. Let's say a user is on the edit page of the entry with id = 4 when you delete the entry with id = 3 and update the ids as you suggest. Now, when the editing person hits submit, the id will be taken from his page and he will be updating the new entry 4 instead of the old entry 4. Now this is a blunt and simple example, but beyond those there are lots more things that can go wrong at this level.
Secondly, there is a lot of work involved when programming this. First, you need to update your columns. Then, you also need to make sure MySQL knows what number to give the next entry. And then all of that has to work in an environment that makes sure we don't get quite as many problems with the point mentioned above (which is quite a hard thing). This is a lot of work compared to the alternative of not doing anything.
The third problem is that there is a huge performance overhead. Imagine having a database with thousands upon thousands of entries, and then removing an entry with a low id. Suddenly, all entries with higher ids have to be updated. This might well mean that your site becomes unresponsive because it is doing this task and can't handle too much else at the same time (in fact, in order to make sure that we don't get problems like in the first point, we have to make sure that we don't do anything else at the same time (or make sure we work on different copies of the data or something) because we could end up with a result that comes from during this whole update process.
My suggestion would be in line with what others are saying: just leave it as it is and do not worry about this. auto_increment is meant for just one purpose: giving each value a unique identifier easily. Use this identifier to identify and refer to the same entry only. Perhaps one could also make a case about sorting on these identifiers, but no further than to have a certain order (and even then people will disagree with this use of it).
Instead of trying to update the ids, we should look for another place to solve this problem. If the problem is purely how you feel about it, that's easy. You may not feel good about it, but you just need to convince yourself that updating all those ids is not the solution you are looking for. If you use the numbers elsewhere, the problem can be a little more complex to solve. However, there is always the possibility of using PHP to generate numbers for each entry, which most definitely is the logical place to do so if the numbers are used in the generating of your html content. If you provide more details about where you use the sequential numbers, a look could be taken at how to solve it in that case.

You could do an UPDATE query but mostly developers just leave the id's as is. You can do the numbering in the PHP if it's important to you.

You can either manually update id value by the following statement :
update members set id = (id-1) where id > 3;
or my advice would be to leave as it is. Its not causing any ambuiguity.

Related

What do you think of this approach for logging changes in mysql and have some kind of audit trail

I've been reading through several topics now and did some research about logging changes to a mysql table. First let me explain my situation:
I've a ticket system with a table: 'ticket'
As of now I've created triggers which will enter a duplicate entry in my table: 'ticket_history' which has "action" "user" and "timestamp" as additional columns. After some weeks and testing I'm somewhat not happy with that build since every change is creating a full copy of my row in the history table. I do understand that disk space is cheap and I should not worry about it but in order to retrieve some kind of log or nice looking history for the user is painful, at least for me. Also with the trigger I've written I get a new row in the history even if there is no change. But this is just a design flaw of my trigger!
Here my trigger:
BEFORE UPDATE ON ticket FOR EACH ROW
BEGIN
INSERT INTO ticket_history
SET
idticket = NEW.idticket,
time_arrival = NEW.time_arrival,
idticket_status = NEW.idticket_status,
tmp_user = NEW.tmp_user,
action = 'update',
timestamp = NOW();
END
My new approach in order to avoid having triggers
After spening some time on this topic I came up with an approach I would like to discuss and implement. But first I would have some questions about that:
My idea is to create a new table:
id sql_fwd sql_bwd keys values user timestamp
-------------------------------------------------------------------------
1 UPDATE... UPDATE... status 5 14 12345678
2 UPDATE... UPDATE... status 4 7 12345678
The flow would look like this in my mind:
At first I would select something or more from the DB:
SELECT keys FROM ticket;
Then I display the data in 2 input fields:
<input name="key" value="value" />
<input type="hidden" name="key" value="value" />
Hit submit and give it to my function:
I would start with a SELECT again: SELECT * FROM ticket;
and make sure that the hidden input field == the value from the latest select. If so I can proceed and know that no other user has changed something in the meanwhile. If the hidden field does not match I bring the user back to the form and display a message.
Next I would build the SQL Queries for the action and also the query to undo those changes.
$sql_fwd = "UPDATE ticket
SET idticket_status = 1
WHERE idticket = '".$c_get['id']."';";
$sql_bwd = "UPDATE ticket
SET idticket_status = 0
WHERE idticket = '".$c_get['id']."';";
Having that I run the UPDATE on ticket and insert a new entry in my new table for logging.
With that I can try to catch possible overwrites while two users are editing the same ticket in the same time and for my history I could simply look up the keys and values and generate some kind of list. Also having the SQL_BWD I simply can undo changes.
My questions to that would be:
Would it be noticeable doing an additional select everytime I want to update something?
Do I lose some benefits I would have with triggers?
Are there any big disadvantages
Are there any functions on my mysql server or with php which already do something like that?
Or is there might be a much easier way to do something like that
Is maybe a slight change to my trigger I've now already enough?
If I understad this right MySQL is only performing an update if the value has changed but the trigger is executed anyways right?
If I'm able to change the trigger, can I still prevent somehow the overwriting of data while 2 users try to edit the ticket the same time on the mysql server or would I do this anyways with PHP?
Thank you for the help already
Another approach...
When a worker starts to make a change...
Store the time and worker_id in the row.
Proceed to do the tasks.
When the worker finishes, fetch the last worker_id that touched the record; if it is himself, all is well. Clear the time and worker_id.
If, on the other hand, another worker slips in, then some resolution is needed. This gets into your concept that some things can proceed in parallel.
Comments could be added to a different table, hence no conflict.
Changing the priority may not be an issue by itself.
Other things may be messier.
It may be better to have another table for the time & worker_ids (& ticket_id). This would allow for flagging that multiple workers are currently touching a single record.
As for History versus Current, I (usually) like to have 2 tables:
History -- blow-by-blow list of what changes were made, when, and by whom. This is table is only INSERTed into.
Current -- the current status of the ticket. This table is mostly UPDATEd.
Also, I prefer to write the History directly from the "database layer" of the app, not via Triggers. This gives me much better control over the details of what goes into each table and when. Plus the 'transactions' are clear. This gives me confidence that I am keeping the two tables in sync:
BEGIN; INSERT INTO History...; UPDATE Current...; COMMIT;
I've answered a similar question before. You'll see some good alternatives in that question.
In your case, I think you're merging several concerns - one is "storing an audit trail", and the other is "managing the case where many clients may want to update a single row".
Firstly, I don't like triggers. They are a side effect of some other action, and for non-trivial cases, they make debugging much harder. A poorly designed trigger or audit table can really slow down your application, and you have to make sure that your trigger logic is coordinated between lots of developers. I realize this is personal preference and bias.
Secondly, in my experience, the requirement is rarely "show the status of this one table over time" - it's nearly always "allow me to see what happened to the system over time", and if that requirement exists at all, it's usually fairly high priority. With a ticketing system, for instance, you probably want the name and email address of the users who created, and changed the ticket status; the name of the category/classification, perhaps the name of the project etc. All of those attributes are likely to be foreign keys on to other tables. And when something does happen that requires audit, the requirement is likely "let me see immediately", not "get a database developer to spend hours trying to piece together the picture from 8 different history tables. In a ticketing system, it's likely a requirement for the ticket detail screen to show this.
If all that is true, then I don't think history tables populated by triggers are a good idea - you have to build all the business logic into two sets of code, one to show the "regular" application, and one to show the "audit trail".
Instead, you might want to build "time" into your data model (that was the point of my answer to the other question).
Since then, a new style of data architecture has come along, known as CQRS. This requires a very different way of looking at application design, but it is explicitly designed for reactive applications; these offer much nicer ways of dealing with the "what happens if someone edits the record while the current user is completing the form" question. Stack Overflow is an example - we can see, whilst typing our comments or answers, whether the question was updated, or other answers or comments are posted. There's a reactive library for PHP.
I do understand that disk space is cheap and I should not worry about it but in order to retrieve some kind of log or nice looking history for the user is painful, at least for me.
A large history table is not necessarily a problem. Huge tables only use disk space, which is cheap. They slow things down only when making queries on them. Fortunately, the history is not something you'd use all the time, most likely it is only used to solve problems or for auditing.
It is useful to partition the history table, for example by month or week. This allows you to simply drop very old records, and more important, since the history of the previous months has already been backed up, your daily backup schedule only needs to backup the current month. This means a huge history table will not slow down your backups.
With that I can try to catch possible overwrites while two users are editing the same ticket in the same time
There is a simple solution:
Add a column "version_number".
When you select with intent to modify, you grab this version_number.
Then, when the user submits new data, you do:
UPDATE ...
SET all modified columns,
version_number=version_number+1
WHERE ticket_id=...
AND version_number = (the value you got)
If someone came in-between and modified it, then they will have incremented the version number, so the WHERE will not find the row. The query will return a row count of 0. Thus you know it was modified. You can then SELECT it, compare the values, and offer conflict resolution options to the user.
You can also add columns like who modified it last, and when, and present this information to the user.
If you want the user who opens the modification page to lock out other users, it can be done too, but this needs a timeout (in case they leave the window open and go home, for example). So this is more complex.
Now, about history:
You don't want to have, say, one large TEXT column called "comments" where everyone enters stuff, because it will need to be copied into the history every time someone adds even a single letter.
It is much better to view it like a forum: each ticket is like a topic, which can have a string of comments (like posts), stored in another table, with the info about who wrote it, when, etc. You can also historize that.
The drawback of using a trigger is that the trigger does not know about the user who is logged in, only the MySQL user. So if you want to record who did what, you will have to add a column with the user_id as I proposed above. You can also use Rick James' solution. Both would work.
Remember though that MySQL triggers don't fire on foreign key cascade deletes... so if the row is deleted in this way, it won't work. In this case doing it in the application is better.

Why use IDs for looking up data as opposed to using the title -- please help me argue my point

Please help me argue my point.
I am working on a website project with a team of developers, we are developing the system in 3 parts. The one part is the API, 2 back-end and front-end. Both the front end and back-end gets and stores data by sending it to the API.
I am specifically responsible for the front end. I am using Codeigniter as my framework.
A little background: The app is a sports betting site.
This is the problem: The developers of the API use the name of for example a tournament or fixture or sport to do the lookup, I pass the name of a tournament for example:
www.example.com/sport/add_bet/{tournament_name}
The problem I have with this is that the tournament name as entered into the system by humans might have characters such as spaces, forward slashes, etc in the name.
As you can imagine using a forward slash in the url will completely break the system, since we use them to call different controllers, actions and to pass variables.
I am trying to get them to change to using a simple primary key id field, to perform the lookup of the data. For some reason these developers don't want to do this.
The project manager that manages this project (not a programmer and no experience of programming) had a chat to them about this issue, but still they don't want to change, and they told her that it is a matter of personal preference on which way to go.
As far as I know ID's have always been the way to do it.
Could you guys/girls please help me argue my point by giving some reasons as to why I am correct or incorrect in your view. I would like to provide your answers as motivation to get them to change over to doing it the right way.
Your help/answers/suggestions would be much appreciated.
The most important thing is the id will be unique as it is should be the primary key. so searching by ids will return unique results.
But the multiple record may have save title if you didn't validate them at the time of saving.
And also if you want some joins or something like that the id would help it.
And the should never trust the user and expect them to work as you wanted.
There is two sides:
1) You allow select single Title from dropdown and send to server only ID. Look-up by ID is way faster (assuming you are using ID as primary key). But if you have lots of Titles than you have to list all of them and user will be forced to scroll till find that Title.
2) You have simple input field to allow search only by part of Title. That way you don't have to list all Titles. As programmer, you have to escape all user input, that goes to server (via GET or POST), so that user can input even DELETE FROM user WHERE 1 to your input field and your system will sill works fine. Also, by inputting only part of Title allow to show multiple results, while using IDs is impossible.
I prefer second approach.
To make the look up fast, you need to place an index on the column by which you are looking up records. Primary key column always has an index. In order to use some other column you need to add an unique index, to avoid duplicates and make the search faster, which in turn makes the table larger. If you expect the table to grow (which is not too unlikely if you follow many sports and many leagues/tournaments over a number of years), it might become a problem at some point, depending on the resources in your production environment. It's not the strongest argument you can present, but it is not a bad argument either

Is there a benefit to starting MySQL data base id at a large number?

So i noticed in a script i'm using that the id row in the database i have set up is started at 1728.
Is there any specific benefits in starting a database id number at a large number or anything other then 1 ??
It looks somewhat cool for someone.
profile.php?id=1728 looks better than profile.php?id=1
But in your case, it's probably wrong SQL dump which had AUTO_INCREMENT 1728
Not that I'm aware of, although I've seen it used as a very simple security measure, which prevents the first user in a table of users (typically the admin / creator) from having user ID = 1.
No, there are no benefits. As long as the id is unique, it doesn't matter. Some developers prefer to start ids for some rows higher because it seems to look better in a url. For example, this url:
http://www.example.com/user/profile.php?id=142541
looks better than:
http://www.example.com/user/profile.php?id=1
I don't know of any real reason. Perhaps then people don't guess that your admin user ID is 1.
I have seen tables start with non-1 IDs when using auto-incrementing IDs. Every attempted insert will increment even if the insert fails. In your case the table may have incremented to 1728 while the script was being developed so the first "real" record was 1728.
We can only guess. There are no technical benefits as such. But there may be soft benefits. I can imagine that it's done with the intention of having some reserved IDs for old/previous backup .sql dumps or even default database entries.
I occasionally start IDs (if I really have to expose database-internal numbering in the UI) at larger numbers like 1000, so I get e.g. 4-digit numbers for all IDs. Not technical necessary, but may look more consistent.
#Jake Chapman, The reason behind it is that if one see
profile.php?id=1 or profile.php?id=2 ...
all these takes programmer's attention, and tricky programmer would like to play some hacking tricks because they know well what can be done to it.
Numbers likes
profile.php?id=1343 or profile.php?id=2543 ...
Some confusing and don't take attntion suddenly that is why.

Session Id's, UUID's and GUID's and just ID's in general

I have an idea. It might be bad, for reasons not known by me, but I would really appreciate your feedback on this!
We've been using Session ID's in a PHP project. And for the first time in 4 years, a duplicate sessionid was generated. Luckily enough I randomly decided to go looking through the Customers Table because I was bored and noticed there was a duplicate entry in the sessionid column and changed it, and references to it, before any real problems occured.
Which led (or lead?) me to ask myself this question:
Basically, any type of ID, be it a session id, uuid or guid can eventually be duplicated because the length of them is not infinite. So then I got thinking, because we never want this to happen again.
I thought, What if we use a combination of date and time, as an identifier? This wouldn't be "random", but it would solve the duplication issue. For example:
14th of May 2011 04:36:05PM
If used as an identifier, could be changed to:
14052011163605
The reason this form of ID will never become duplicated is because no date, combined with time in the future will ever be the same as one in the past, or present.
And since, in our case, these ID's are not meant to be seen by anybody, there's no reason for it to be random, is there?
I'd love to hear your thoughts on this, and, how you approach situations like this. What's your best method of generating [practically] unique ID'S?
The reason this form of ID will never become duplicated is because no date, combined with time in the future will ever be the same as one in the past, or present.
If you only had one user, this would be true.
UUID / GUIDs are very large (larger than the count of particles in visible universe)
your date/time solution will fail on high loads. what happens when i need 100+ new ids per second ?
Why not just make your session ID column a unique column, and then generate a new session ID if you get a constraint violation error? That way the database will find this problem for you in basically the same way that you did (and as a bonus, it can fix it too).
UUIDs are already generated based on nanosecond intervals of time (see this wikipeida article). If you are using PHP, I'd suggest this page to take a look at how to generate the different versions, depending on your use:
http://php.net/manual/en/function.uniqid.php
The reason you can't guarantee uniqueness is that you have to eventually pick a size limit for the actual string/variable containing your UUID, so there is always the potential for a duplicate. Given the number of possibilities for a UUID, though, this should be practically impossible.
I agree with the other posters... this probably shouldn't ever happen. How are you actually generating unique IDs? Are you sure your code is properly creating them?

Need some suggestion for a database schema design

I'm designing a very simple (in terms of functionality) but difficult (in terms of scalability) system where users can message each other. Think of it as a very simple chatting service. A user can insert a message through a php page. The message is short and has a recipient name.
On another php page, the user can view all the messages that were sent to him all at once and then deletes them on the database. That's it. That's all the functionality needed for this system. How should I go about designing this (from a database/php point of view)?
So far I have the table like this:
field1 -> message (varchar)
field2 -> recipient (varchar)
Now for sql insert, I find that the time it takes is constant regardless of number of rows in the database. So my send.php will have a guaranteed return time which is good.
But for pulling down messages, my pull.php will take longer as the number of rows increase! I find the sql select (and delete) will take longer as the rows grow and this is true even after I have added an index for the recipient field.
Now, if it was simply the case that users will have to wait a longer time before their messages are pulled on the php then it would have been OK. But what I am worried is that when each pull.php service time takes really long, the php server will start to refuse connections to some request. Or worse the server might just die.
So the question is, how to design this such that it scales? Any tips/hints?
PS. Some estiamte on numbers:
number of users starts with 50,000 and goes up.
each user on average have around 10 messages stored before the other end might pull it down.
each user sends around 10-20 messages a day.
UPDATE from reading the answers so far:
I just want to clarify that by pulling down less messages from pull.php does not help. Even just pull one message will take a long time when the table is huge. This is because the table has all the messages so you have to do a select like this:
select message from DB where recipient = 'John'
even if you change it to this it doesn't help much
select top 1 message from DB where recipient = 'John'
So far from the answers it seems like the longer the table the slower the select will be O(n) or slightly better, no way around it. If that is the case, how should I handle this from the php side? I don't want the php page to fail on the http because the user will be confused and end up refreshing like mad which makes it even worse.
the database design for this is simple as you suggest. As far as it taking longer once the user has more messages, what you can do is just paginate the results. Show the first 10/50/100 or whatever makes sense and only pull those records. Generally speaking, your times shouldn't increase very much unless the volume of messages increases by an order of magnatude or more. You should be able to pull back 1000 short messages in way less than a second. Now it may take more time for the page to display at that point, but thats where the pagination should help.
I would suggest though going through and thinking of future features and building your database out a little more based on that. Adding more features to the software is easy, changing the database is comparatively harder.
Follow the rules of normalization. Try to reach 3rd normal form. To go further for this type of application probably isn’t worth it. Keep your tables thin.
Don’t actually delete rows just mark them as deleted with a bit flag. If you really need to remove them for some type of maintenance / cleanup to reduce size. Mark them as deleted and then create a cleanup process to archive or remove the records during low usage hours.
Integer values are easier for SQL server to deal with then character values. So instead of where recipient = 'John' use WHERE Recipient_ID = 23 You will gain this type of behavior when you normalize your database.
Don't use VARCHAR for your recipient. It's best to make a Recipient table with a primary key that is an integer (or bigint if you are expecting extremely large quantities of people).
Then when you do your select statements:
SELECT message FROM DB WHERE recipient = 52;
The speed retrieving rows will be much faster.
Plus, I believe MySQL indexes are B-Trees, which is O(log n) for most cases.
A database table without an index is called a heap, querying a heap results in each row of the table being evaluated even with a 'where' clause, the big-o notation for a heap is O(n) with n being the number of rows in the table. Adding an index (and this really depends on the underlying aspects of your database engine) results in a complexity of O(log(n)) to find the matching row in the table. This is because the index most certainly is implemented in a b-tree sort of way. Adding rows to the table, even with an index present is an O(1) operation.
> But for pulling down messages, my pull.php will take longer as the number of rows
increase! I find the sql select (and delete) will take longer as the rows grow and
this is true even after I have added an index for the recipient field.
UNLESS you are inserting into the middle of an index, at which point the database engine will need to shift rows down to accommodate. The same occurs when you delete from the index. Remember there is more than one kind of index. Be sure that the index you are using is not a clustered index as more data must be sifted through and moved with inserts and deletes.
FlySwat has given the best option available to you... do not use an RDBMS because your messages are not relational in a formal sense. You will get much better performance from a file system.
dbarker has also given correct answers. I do not know why he has been voted down 3 times, but I will vote him up at the risk that I may lose points. dbarker is referring to "Vertical Partitioning" and his suggestion is both acceptable and good. This isn't rocket surgery people.
My suggestion is to not implement this kind of functionality in your RDBMS, if you do remember that select, update, insert, delete all place locks on pages in your table. If you do go forward with putting this functionality into a database then run your selects with a nolock locking hint if it is available on your platform to increase concurrency. Additionally if you have so many concurrent users, partition your tables vertically as dbarker suggested and place these database files on separate drives (not just volumes but separate hardware) to increase I/O concurrency.
So the question is, how to design this such that it scales? Any tips/hints?
Yes, you don't want to use a relational database for message queuing. What you are trying to do is not what a relational database is best designed for, and while you can do it, its kinda like driving in a nail with a screwdriver.
Instead, look at one of the many open source message queues out there, the guys at SecondLife have a neat wiki where they reviewed a lot of them.
http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes
This is an unavoidable problem - more messages, more time to find the requested ones. The only thing you can do is what you already did - add an index and turn O(n) look up time for a complete table scan into O(log(u) + m) for a clustered index look up where n is the number of total messages, u the number of users, and m the number of messages per user.
Limit the number of rows that your pull.php will display at any one time.
The more data you transfer, longer it will take to display the page, regardless of how great your DB is.
You must limit your data in the SQL, return the most recent N rows.
EDIT
Put an index on Recipient and it will speed it up. You'll need another column to distinguish rows if you want to take the top 50 or something, possibly SendDate or an auto incrementing field. A Clustered index will slow down inserts, so use a regular index there.
You could always have only one row per user and just concatenate messages together into one long record. If you're keeping messages for a long period of time, that isn't the best way to go, but it reduces your problem to a single find and concatenate at storage time and a single find at retrieve time. It's hard to say without more detail - part of what makes DB design hard is meeting all the goals of the system in a well-compromised way. Without all the details, its hard to give advice on the best compromise.
EDIT: I thought I was fairly clear on this, but evidently not: You would not do this unless you were blanking a reader's queue when he reads it. This is why I prompted for clarification.

Categories