This question is for my pastebin app written in PHP.
I did a bit of research, but I wasn't able to find a solution that matches my needs. I have a table with this structure:
+-----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------+----------------+
| id | int(12) unsigned | NO | PRI | NULL | auto_increment |
| author | varchar(50) | YES | | | |
| authorid | int(12) unsigned | YES | | NULL | |
| project | varchar(50) | YES | | | |
| timestamp | int(11) unsigned | NO | | NULL | |
| expire | int(11) unsigned | NO | | NULL | |
| title | varchar(25) | YES | | | |
| data | longtext | NO | | NULL | |
| language | varchar(50) | NO | | php | |
| password | varchar(60) | NO | | NULL | |
| salt | varchar(5) | NO | | NULL | |
| private | tinyint(1) | NO | | 0 | |
| hash | varchar(12) | NO | | NULL | |
| ip | varchar(50) | NO | | NULL | |
| urlkey | varchar(8) | YES | MUL | | |
| hits | int(11) | NO | | 0 | |
+-----------+------------------+------+-----+---------+----------------+
This is for a pastebin application. I basically want paste revisions so that if you open paste #1234, it shows all past revisions of that paste.
I thought of three ways:
Method 1
Have a revisions table with id and old_id (or something similar), and for each new paste ID, insert a row for every old revision. So if my revision structure looks like this:
rev3: 1234
rev2: 1233
rev1: 1232
The table will contain this data:
+-------+----------+
| id | old_id |
+-------+----------+
| 1234 | 1233 |
| 1234 | 1232 |
| 1233 | 1232 |
+-------+----------+
The problem I have with this is that it introduces a lot of duplicate data. As the number of revisions grows, not only does the table hold more data, but I also need to do N inserts into the revisions table for each new paste, which is not great for a large N.
Method 2
I can add a child_id to the paste table at the top and just update that. Then, when fetching a paste, I would keep querying the DB for each child_id, then its child_id, and so on. The problem is that this introduces too many DB reads every time a paste with many revisions is opened.
Method 3
This also involves a separate revisions table, but for the same scenario as method 1 it will store the data like this:
+-------+-----------------+
| id | old_id |
+-------+-----------------+
| 1234 | 1233,1232 |
| 1233 | 1232 |
+-------+-----------------+
And when someone opens paste 1234, I'll use an IN clause to fetch all the child paste data in one query.
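For what it's worth, a minimal sketch of how that read path could look with Laravel's query builder (this assumes the paste table is called pastes, which is not stated in the question; treat it as an illustration only):

// Look up the comma-separated revision list for paste 1234
$row = DB::table('revisions')->where('id', 1234)->first();

if ($row && $row->old_id) {
    // old_id holds a list like "1233,1232"
    $oldIds = explode(',', $row->old_id);

    // Fetch every old revision of the paste in a single query
    $revisions = DB::table('pastes')->whereIn('id', $oldIds)->get();
}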
Which is the best approach? Or is there a better approach? I am using the Laravel 4 framework, which has the Eloquent ORM.
EDIT: Can I do method 1 with a one-to-many relationship? I understand that I can use eager loading to fetch all the revisions, but how can I insert them without resorting to a dirty hack?
EDIT: I figured out how to handle the above. I'll add an answer to close this question.
If you are on Laravel 4, give Revisionable a try. It might suit your needs.
So here is what I am doing:
Say this is the revision flow:
1232 -> 1233 -> 1234
1232 -> 1235
So here is what my revision table will look like:
+----+--------+--------+
| id | new_id | old_id |
+----+--------+--------+
| 1 | 1233 | 1232 |
| 2 | 1234 | 1233 |
| 3 | 1234 | 1232 |
| 4 | 1235 | 1232 |
+----+--------+--------+
IDs 2 and 3 show that when I open 1234, it should show both 1233 and 1232 as revisions on the list.
Now the implementation bit: I will have the Paste model have a one to many relationship with the Revision model.
When I create a new revision for an existing paste, I will run a batch insert that adds not only the current new_id/old_id pair, but also pairs the current new_id with every revision that was associated with the old_id.
When I open a paste (which I will do by querying new_id), I will fetch all associated rows in the revisions table (using a function in the Paste model that defines hasMany('Revision', 'new_id')) and display them to the user.
I am also thinking about displaying the author of each revision in the "Revision history" section of the "view paste" page, so I'll add an author column to the revision table as well; that way I don't need to go back and query the main paste table for the author.
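Here is a rough sketch of how that could look with Laravel 4 Eloquent (model and column names follow the revision table above; this is an outline, not production code):

// Paste model: a paste has many revision rows keyed by new_id
class Paste extends Eloquent {
    public function revisions()
    {
        return $this->hasMany('Revision', 'new_id');
    }
}

class Revision extends Eloquent {
    public $timestamps = false;
}

// Creating revision $new of existing paste $old: pair the new id
// with the old id AND with every id already recorded for the old one.
$rows = array(array('new_id' => $new->id, 'old_id' => $old->id));
foreach (Revision::where('new_id', '=', $old->id)->get() as $rev) {
    $rows[] = array('new_id' => $new->id, 'old_id' => $rev->old_id);
}
Revision::insert($rows); // one batch insert

// Opening a paste with its revision list eager loaded:
$paste = Paste::with('revisions')->find(1234);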
So that's about it!
There are some great packages to help you keep model revisions:
If you only want to keep the model revisions, you can use:
Revisionable
If you also want to log any other actions, whenever you want, with custom data, you can use:
Laravel Activity Logger
Honorable mentions:
Activity Log. It also has a lot of options.
Related
I develop custom migration code using CiviCRM's PHP API calls like:
<?php
$result = civicrm_api3('Contact', 'create', array(
  'sequential'   => 1,
  'contact_type' => "Household",
  'nick_name'    => "boo",
  'first_name'   => "moo",
));
There's a need to keep the original IDs, but specifying 'id' or 'contact_id' above does not work: it either does not create the contact or it updates an existing one.
The ID is auto-incremented, for sure, but MySQL does allow inserting arbitrary, unique values into an auto-increment column.
How would you proceed? Hack CiviCRM to somehow pass the id to MySQL in the INSERT statement? Dump the SQL after the import and manipulate the IDs in place in the .sql text file (hard to maintain integrity)? Any suggestions?
I have at least ~300,000 entries to deal with, so a fully automated and robust solution is a must. Any SQL magic that could potentially do that?
For those who are not familiar with CiviCRM, the table structure is the following:
mysql> desc civicrm_contact;
+--------------------------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------------------------+------------------+------+-----+-------------------+-----------------------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| contact_type | varchar(64) | YES | MUL | NULL | |
| contact_sub_type | varchar(255) | YES | MUL | NULL | |
| do_not_email | tinyint(4) | YES | | 0 | |
| do_not_phone | tinyint(4) | YES | | 0 | |
| do_not_mail | tinyint(4) | YES | | 0 | |
| do_not_sms | tinyint(4) | YES | | 0 | |
| do_not_trade | tinyint(4) | YES | | 0 | |
| is_opt_out | tinyint(4) | NO | | 0 | |
| legal_identifier | varchar(32) | YES | | NULL | |
| external_identifier | varchar(64) | YES | UNI | NULL | |
and we are talking about the first field (id).
You should use the external_identifier field, which is designed for exactly what you want.
This field is not used by CiviCRM itself, so there is no risk of messing with core functionality. It exists to link with an external system (a legacy one, for example).
CiviCRM considers the external_identifier to be unique, so it will throw an error (using the API, I think) or update the existing contact (using the CiviCRM contact import screen) if you try to insert a contact with the same external_identifier.
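For example, the create call from the question could carry the legacy ID along like this (a sketch; $legacyId is a placeholder for the original ID from the source system):

<?php
$result = civicrm_api3('Contact', 'create', array(
  'sequential'          => 1,
  'contact_type'        => "Household",
  'nick_name'           => "boo",
  'first_name'          => "moo",
  // keep the original ID here instead of forcing the auto-increment id
  'external_identifier' => $legacyId,
));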
I want to generate a report from a table, like
+-------------+------------------+------+-----+------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+------------+----------------+
| productID | int(10) unsigned | NO | PRI | NULL | auto_increment |
| productCode | char(3) | NO | | | |
| name | varchar(30) | NO | | | |
| quantity | int(10) unsigned | NO | | 0 | |
| price | decimal(7,2) | NO | | 99999.99 | |
+-------------+------------------+------+-----+------------+----------------+
and show the top sellers with some kind of graphic. I'm lost on this subject.
Is there a package that makes these reports?
Thanks for the info in advance.
I don't think there is a package to generate the reports. Reporting is all about getting data from the DB, analyzing it, and sending the output to the client/browser. What I would suggest is to get the data from the DB and send it to the client as JSON. On the client side, you can use graph-plotting packages like Highcharts, D3.js, etc. to plot the graph.
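For illustration, a minimal endpoint that emits the top sellers as JSON (a sketch assuming a PDO connection, that the table is called product, and that quantity is the figure you rank by; connection details are placeholders):

<?php
// Hypothetical connection details; replace with your own.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// Fetch the top 10 products by quantity.
$stmt = $pdo->query(
    'SELECT name, quantity FROM product ORDER BY quantity DESC LIMIT 10'
);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Send JSON for a client-side charting library (Highcharts, D3.js, ...).
header('Content-Type: application/json');
echo json_encode($rows);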
I'm creating a portfolio website that has galleries containing images. I want the user of this portfolio to be able to order the images within a gallery. The problem itself is fairly simple; I'm just struggling to decide on a solution to implement.
There are 2 solutions I've thought of so far:
Simply adding an order column (or priority?) and then querying with an ORDER BY clause on that column. The disadvantage of this is that to change the order of a single image, I'd have to update every single image in the gallery.
The second method would be to add two nullable columns, next and previous, that simply store the IDs of the next and previous images. This would mean less data to update when the order changes; however, it would be much more complex to set up, and I'm not entirely sure how I'd actually implement it.
Extra options would be great.
Are those options viable?
Are there better options?
How could / should they be implemented?
The current structure of the two tables in question is the following:
mysql> desc Gallery;
+--------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+-------------------+-----------------------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| title | varchar(255) | NO | | NULL | |
| subtitle | varchar(255) | NO | | NULL | |
| description | varchar(5000) | NO | | NULL | |
| date | datetime | NO | | NULL | |
| isActive | tinyint(1) | NO | | NULL | |
| lastModified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+------------------+------+-----+-------------------+-----------------------------+
mysql> desc Image;
+--------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+-------------------+-----------------------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| galleryId | int(10) unsigned | NO | MUL | NULL | |
| description | varchar(250) | YES | | NULL | |
| path | varchar(250) | NO | | NULL | |
| lastModified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+------------------+------+-----+-------------------+-----------------------------+
Currently there is no implementation of ordering in any form.
While option 1 is a bit ugly, you can do:
UPDATE table SET `order` = `order` + 1 WHERE `order` >= orderValueOfItemYouCareAbout;
This will update all the rest of the images and you won't have to do a ton of legwork. (Note the backticks: order is a reserved word in MySQL.)
As bart2puck has said, and as I stated in the question, option 1 is a little bit ugly; it is however the option I have chosen, to simplify the solution all round.
I have added a column (displayOrder int UNSIGNED) to the Image table after path. When I want to re-order a row in the table, I simply swap rows around. So, if I have 3 rows:
mysql> SELECT id, galleryId, description, displayOrder FROM Image ORDER BY displayOrder;
+-----+-----------+----------------------------------+--------------+
| id | galleryId | description | displayOrder |
+-----+-----------+----------------------------------+--------------+
| 271 | 20 | NULL | 1 |
| 270 | 20 | Tracks leading into the ocean... | 2 |
| 278 | 20 | NULL | 3 |
+-----+-----------+----------------------------------+--------------+
3 rows in set (0.00 sec)
If I want to re-order row 278 to appear second rather than third, I simply swap it with the second row by doing the following:
UPDATE Image SET displayOrder =
CASE displayOrder
WHEN 2 THEN 3
WHEN 3 THEN 2
END
WHERE galleryId = 20
AND displayOrder BETWEEN 2 AND 3;
Resulting in:
mysql> SELECT id, galleryId, description, displayOrder FROM Image ORDER BY displayOrder;
+-----+-----------+----------------------------------+--------------+
| id | galleryId | description | displayOrder |
+-----+-----------+----------------------------------+--------------+
| 271 | 20 | NULL | 1 |
| 278 | 20 | NULL | 2 |
| 270 | 20 | Tracks leading into the ocean... | 3 |
+-----+-----------+----------------------------------+--------------+
3 rows in set (0.00 sec)
One possible issue some people may find is that you can only move an image by one position with this method; i.e., to make image 278 appear first, I'd have to make it second, then first, otherwise the image currently in first place would end up third.
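If you ever need to move an image more than one place in a single operation, one way is to shift the affected range and then drop the moved row into the freed slot. A sketch using PDO (not part of the original answer; moveImage and its arguments are illustrative):

<?php
// Move image $imageId from position $from to position $to in a gallery.
function moveImage(PDO $pdo, $galleryId, $imageId, $from, $to)
{
    if ($from === $to) {
        return;
    }

    $pdo->beginTransaction();

    if ($to < $from) {
        // Moving up: push the rows in between down one slot.
        $pdo->prepare('UPDATE Image SET displayOrder = displayOrder + 1
                       WHERE galleryId = ? AND displayOrder >= ? AND displayOrder < ?')
            ->execute(array($galleryId, $to, $from));
    } else {
        // Moving down: pull the rows in between up one slot.
        $pdo->prepare('UPDATE Image SET displayOrder = displayOrder - 1
                       WHERE galleryId = ? AND displayOrder > ? AND displayOrder <= ?')
            ->execute(array($galleryId, $from, $to));
    }

    // Place the moved row into the freed position.
    $pdo->prepare('UPDATE Image SET displayOrder = ? WHERE id = ?')
        ->execute(array($to, $imageId));

    $pdo->commit();
}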
I am working on a file storage application, using PHP and MySQL as my tools. My use case is that a user can share multiple files with multiple users through the app. The sender sends URLs via email, and the receiver has to click on the link and log in to see the file. So when a user shares a file, I want to make an entry in the database.
id|filename|shared_with|shared_by|shared_on|shared_url|url_expiration
Now, the above is the database structure I currently have in mind. But in this case I would have to store multiple values if the same file is shared with multiple users, which I believe is not a good way to do it. Also, storing comma-separated values is not a good idea.
I gave some thought to a document database like MongoDB (just because Dropbox uses it and it handles key-value data well). But as MySQL can handle a decent number of records, and NoSQL can be a potential solution for big data now, I am not sure which would be the right way to go for this use case.
I would like the experts to throw some light on this. I am using Amazon S3 for storing the files.
Here's a very basic design to get you started...
You need a table to store your file information:
files
id unsigned int(P)
owner_id unsigned int(F users.id)
name varchar(255)
+----+----------+----------+
| id | owner_id | name |
+----+----------+----------+
| 1 | 1 | File A |
| 2 | 1 | File B |
| 3 | 1 | File C |
| 4 | 2 | File 123 |
| .. | ........ | ........ |
+----+----------+----------+
You need a table to store information about which files were shared with whom. In my example data you can see bob shared File A with mary and jim, then he shared File B with mary.
shares
id unsigned int(P)
file_id unsigned int(F files.id)
shared_with unsigned int(F user.id)
shared datetime
url varchar(255)
url_expires datetime
+----+---------+-------------+---------------------+-------+---------------------+
| id | file_id | shared_with | shared | url | url_expires |
+----+---------+-------------+---------------------+-------+---------------------+
| 1 | 1 | 2 | 2014-01-06 08:00:00 | <url> | 2014-01-07 08:00:00 |
| 2 | 1 | 3 | 2014-01-06 08:00:00 | <url> | 2014-01-07 08:00:00 |
| 3 | 2 | 2 | 2014-01-06 08:15:32 | <url> | 2014-01-07 08:15:32 |
| .. | ....... | ........... | ................... | ..... | ................... |
+----+---------+-------------+---------------------+-------+---------------------+
And finally you need a table to store user information.
users
id unsigned int(P)
username varchar(32)
password varbinary(255)
...
+----+----------+----------+-----+
| id | username | password | ... |
+----+----------+----------+-----+
| 1 | bob | ******** | ... |
| 2 | mary | ******** | ... |
| 3 | jim | ******** | ... |
| .. | ........ | ........ | ... |
+----+----------+----------+-----+
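With that schema, sharing a file with several users is just one shares row per recipient, and listing everything shared with a user is a simple join. A sketch with PDO (names follow the tables above; $pdo is an existing connection and makeShareUrl() is a hypothetical helper):

<?php
// bob (user 1) shares File A (file 1) with mary (2) and jim (3).
$stmt = $pdo->prepare(
    'INSERT INTO shares (file_id, shared_with, shared, url, url_expires)
     VALUES (?, ?, NOW(), ?, NOW() + INTERVAL 1 DAY)'
);
foreach (array(2, 3) as $userId) {
    $stmt->execute(array(1, $userId, makeShareUrl())); // hypothetical URL generator
}

// Everything shared with mary that has not expired yet:
$stmt = $pdo->prepare(
    'SELECT f.name, s.url, s.url_expires
     FROM shares s
     JOIN files f ON f.id = s.file_id
     WHERE s.shared_with = ? AND s.url_expires > NOW()'
);
$stmt->execute(array(2));
$shared = $stmt->fetchAll(PDO::FETCH_ASSOC);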
+----------------------------+------------------------------------------------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+------------------------------------------------------------------------------+------+-----+---------+----------------+
| type | enum('Website','Facebook','Twitter','Linkedin','Youtube','SeatGeek','Yahoo') | NO | MUL | NULL | |
| name | varchar(100) | YES | MUL | NULL | |
| processing_interface_id | bigint(20) | YES | MUL | NULL | |
| processing_interface_table | varchar(100) | YES | MUL | NULL | |
| create_time | datetime | YES | MUL | NULL | |
| run_time | datetime | YES | MUL | NULL | |
| completed_time | datetime | YES | MUL | NULL | |
| reserved | int(10) | YES | MUL | NULL | |
| params | text | YES | | NULL | |
| params_md5 | varchar(100) | YES | MUL | NULL | |
| priority | int(10) | YES | MUL | NULL | |
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| status | varchar(40) | NO | MUL | none | |
+----------------------------+------------------------------------------------------------------------------+------+-----+---------+----------------+
select * from remote_request use index ( processing_order ) where remote_request.status = 'none' and type = 'Facebook' and reserved = '0' order by priority desc limit 0, 40;
This table receives an extremely large number of writes and reads. Each remote_request ends up being a process, which can spawn anywhere between 0 and 5 other remote_requests depending on the type of request and what the request does.
The table is currently sitting at about 3.5 million records, and it slows to a snail's pace when the site is under heavy load and I have 50 or more instances running simultaneously. (REST requests are the purpose of the table, just in case you were not sure.)
As the table grows, it just gets worse and worse. I can clear the processed requests out on a daily basis, but ultimately this is not fixing the problem.
What I need is for this query to always have a very low response time.
Here are the current indexes on the table.
+----------------+------------+----------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+----------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| remote_request | 0 | PRIMARY | 1 | id | A | 2403351 | NULL | NULL | | BTREE | | |
| remote_request | 1 | type_index | 1 | type | A | 18 | NULL | NULL | | BTREE | | |
| remote_request | 1 | processing_interface_id_index | 1 | processing_interface_id | A | 18 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | processing_interface_table_index | 1 | processing_interface_table | A | 18 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | create_time_index | 1 | create_time | A | 160223 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | run_time_index | 1 | run_time | A | 343335 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | completed_time_index | 1 | completed_time | A | 267039 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | reserved_index | 1 | reserved | A | 18 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | params_md5_index | 1 | params_md5 | A | 2403351 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | priority_index | 1 | priority | A | 716 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | status_index | 1 | status | A | 18 | NULL | NULL | | BTREE | | |
| remote_request | 1 | name_index | 1 | name | A | 18 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | processing_order | 1 | priority | A | 200 | NULL | NULL | YES | BTREE | | |
| remote_request | 1 | processing_order | 2 | status | A | 200 | NULL | NULL | | BTREE | | |
| remote_request | 1 | processing_order | 3 | type | A | 200 | NULL | NULL | | BTREE | | |
| remote_request | 1 | processing_order | 4 | reserved | A | 200 | NULL | NULL | YES | BTREE | | |
+----------------+------------+----------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Any idea how I can solve this? Is it not possible to make some sort of composite index that would automatically order the rows by priority, then take the first 40 that match the 'Facebook' type? It currently scans more than 500k rows of the table before it returns a result, which is grossly inefficient.
Another version of the query that I have been tinkering with is:
select * from remote_request use index ( type_index,status_index,reserved_index,priority_index ) where remote_request.status = 'none' and type = 'Facebook' and reserved = '0' order by priority desc limit 0, 40
It would be amazing if we could get the number of rows scanned under 1,000, depending on just how many types of requests enter the table.
Thanks in advance; this might be a real nutcracker for all but the most experienced MySQL experts.
Your four-column index has the right columns, but in the wrong order.
You want the index to first look up matching rows, which you do by three columns. You are looking up by three equality conditions, so you know that once the index finds the set of matching rows, the order of these rows is basically a tie with respect to those first three columns. So to resolve the tie, add as the fourth column the column by which you wanted to sort.
If you do that, then the ORDER BY becomes a no-op, because the query can just read the rows in the order they are stored in the index.
So I would create the following index:
CREATE INDEX processing_order2 ON remote_request
(status, type, reserved, priority);
There's probably not too much significance to the order of the first three columns, since they're all in equality terms combined with AND. But the priority column belongs at the end.
You may also like to read my presentation How to Design Indexes, Really.
By the way, using USE INDEX() shouldn't be necessary; if you have the right index, MySQL's optimizer will choose it automatically most of the time. But USE INDEX() can block the optimizer from considering a new index that you create, so it becomes a disadvantage for code maintenance.
This isn't a complete answer but it was too long for a comment:
Are you actually searching on all of those indexes? If not, get rid of some. Extra indexes slow down writes.
Secondly, use EXPLAIN on your query, and don't specify an index when you do. See how MySQL wants to process it rather than forcing an option (generally it does the right thing).
Finally, sorting is likely what hurts you the most. If you didn't sort, it would probably get the records pretty quickly. As it is, it has to scan and sort every row that meets your criteria before it can return the top 40.
Options:
Try creating a VIEW (I'm not as familiar with VIEWs, but it might work)
Split this table into smaller tables
Use a third-party tool such as Sphinx or Lucene to create specialized indexes to search on. (I've used Sphinx for something like this before; you can find it at http://sphinxsearch.com/.)
Or look into using a NoSQL solution where you can use a Map function to do it.
Edit: I read a bit about using a VIEW and I don't think it will help in your case, because you have such a large table. See the answer in this thread: Using MySQL views to increase performance