I'm writing a word search program.
My database uses MyISAM,
with one table (Words) structured as follows:
WordID | String | A | B | ... | Z |
------------------------------------
int varchar int int ... int
Where the values for columns A - Z are the # of occurrences of that letter in the string.
I want to write a query that finds all possible words that can be made out of a specified (but dynamic, user-chosen) set of characters, including wildcard characters, i.e. "Bu!!er" should return but, butt, bull, etc.
Where
S is the set of characters specified that we can use
W is the set of characters in a word
I need to query the database for all strings where
the # of occurrences in the word of each specified character (not including "!") is at most the number of occurrences of that character in the specified string
W_k <= S_k where k is each character specified
AND
the # of occurrences of letters not specified in the specified string is, in SUM, at most the total number of occurrences of the wildcard character ("!") in the specified string
SUM(W_q) <= S_! where q is each character not specified and S_! is the total number of occurrences of "!".
For the first part of the WHERE statement (W_k <= S_k), with bu!!er as input, the statement would be
`B` <= 1 AND `U` <= 1 AND `E` <= 1 AND `R` <= 1
And for the second part
`A` + `C` + `D` + ... + `Z` <= 2
The complete WHERE part of the query, which also lets surplus occurrences of a specified letter use up wildcards, becomes
( ( `A` + (IF(`B`-1 < 0, 0, `B`-1)) + `C` + `D` + (IF(`E`-1 < 0, 0, `E`-1)) + `F` + `G` + `H` + `I` + `J` + `K` + `L` + `M` + `N` + `O` + `P` + `Q` + (IF(`R`-1 < 0, 0, `R`-1)) + `S` + `T` + (IF(`U`-1 < 0, 0, `U`-1)) + `V` + `W` + `X` + `Y` + `Z` ) <= 2 )
Is there a better way to do it than this?
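Since the pattern is user-chosen, the WHERE expression has to be generated for each search. Here is a minimal Python sketch of one way to do that, assuming the A to Z count columns described above. The function name build_where is hypothetical, and it uses MySQL's GREATEST in place of the IF(...) form, which should be equivalent here.

```python
from collections import Counter
import string


def build_where(pattern: str, wildcard: str = "!") -> str:
    """Build the WHERE expression for a pattern such as "bu!!er"."""
    counts = Counter(pattern.upper())
    wildcards = counts.pop(wildcard, 0)
    terms = []
    for letter in string.ascii_uppercase:
        allowed = counts.get(letter, 0)
        if allowed:
            # Only occurrences beyond the allowed count consume wildcards.
            terms.append(f"GREATEST(`{letter}` - {allowed}, 0)")
        else:
            # Unspecified letters consume one wildcard per occurrence.
            terms.append(f"`{letter}`")
    return "(" + " + ".join(terms) + f") <= {wildcards}"


where_clause = build_where("bu!!er")
```

Only the fixed column names A to Z and integer counts end up in the string, so no user text is interpolated into the SQL.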
`A` + `C` + `D` + ... + `Z`
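Use denormalization: store the full word length in a separate column (TOTAL). Together with the per-letter conditions, the sum over the unspecified letters then becomes TOTAL minus the counts of the specified letters:
`TOTAL` - (`B` + `E` + `R` + `U`) <= 2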
As a side note:
Your schema restricts the possible queries quite a lot, though it's enough for this job. It might be better to keep all words in memory (one copy per server instance) and do "full table scans" or "indexed scans" on the words there.
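The in-memory alternative could look something like the sketch below. It assumes the word list fits in memory; the names can_make and search are made up for illustration.

```python
from collections import Counter


def can_make(word: str, pattern: str, wildcard: str = "!") -> bool:
    """True if `word` can be spelled from the letters and wildcards in `pattern`."""
    available = Counter(pattern.lower())
    wildcards = available.pop(wildcard, 0)
    needed = Counter(word.lower())
    # Counter subtraction keeps only positive counts, i.e. the letters
    # the pattern cannot cover; those must come out of the wildcard budget.
    shortfall = sum((needed - available).values())
    return shortfall <= wildcards


def search(words, pattern):
    """Full scan over an in-memory word list."""
    return [w for w in words if can_make(w, pattern)]


matches = search(["but", "butt", "bull", "butts", "rebut", "zzz"], "bu!!er")
```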
In the above tree each node has a name and value. Each node can have 6 children at max. How to store it in MySQL database to perform the below operations efficiently?
Operations
1) grandValue(node) - should give (sum of all of the descendants' values, including self)
Eg.,
grandValue(C) = 300
grandValue(I) = 950
grandValue(A) = 3100
2) children(node) - should give the list of all children (immediate descendants only)
Eg.,
children(C) = null
children(I) = L,M,N
children(A) = B,C,D,E
3) family(node) - should give the list of descendants
family(C) = null
family(I) = L,M,N
family(A) = B,C,D,E,F,G,H,I,J,K,L,M,N
4) parent(node) - should give the parent of the node
parent(C) = A
parent(I) = D
parent(A) = null
5) insert(parent, node, value) - should insert node as a child of parent
insert(C, X, 500) Insert a node name X with value 500 as C's child
I am thinking of using recursive methods for these operations, as we do with binary trees, but I am not sure that's the optimal way to do it. The tree may hold 10 to 30 million nodes and may be skewed, so loading the data onto the in-memory stack is my main concern.
Please help.
NOTE: I am using PHP, MySQL, Laravel, on VPS Machine.
UPDATE: The tree will grow in size. New nodes will be added as children of leaf nodes or of nodes that have fewer than 6 children, never in between two existing nodes.
You could store the data in a table using nested sets.
http://en.wikipedia.org/wiki/Nested_set_model#Example
I worry that your millions of nodes may make life difficult if you intend to constantly add new items. Perhaps that concern could be mitigated by using rational numbers instead of integers as the left and right values. Add a column for depth to speed up the queries for a node's immediate children and its parent.
I wrote some SQL to create the table and the stored procedures you asked for. I did it in SQL Server, so the syntax might be slightly different in MySQL, but the statements being executed are all standard SQL. Also, I just manually decided the lower and upper bounds for each node. Obviously you'd have to write the code that gets these nodes inserted (and maintained) in your database.
CREATE TABLE Tree(
    Node nvarchar(10) NOT NULL,
    Value int NOT NULL,
    L int NOT NULL,
    R int NOT NULL,
    Depth int NOT NULL
);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('A', 100, 1, 28, 0);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('B', 100, 2, 3, 1);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('C', 300, 4, 5, 1);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('D', 150, 6, 25, 1);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('E', 200, 26, 27, 1);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('F', 400, 7, 8, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('G', 250, 9, 10, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('H', 500, 11, 12, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('I', 350, 13, 20, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('J', 100, 21, 22, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('K', 50, 23, 24, 2);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('L', 100, 14, 15, 3);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('M', 300, 16, 17, 3);
INSERT INTO Tree (Node, Value, L, R, Depth) VALUES ('N', 200, 18, 19, 3);
CREATE PROCEDURE grandValue
    @Node NVARCHAR(10)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @lbound INT;
    DECLARE @ubound INT;
    -- Look up the node's bounds, then sum every row nested inside them (node included).
    SELECT @lbound = L, @ubound = R FROM Tree WHERE Node = @Node
    SELECT SUM(Value) AS Total FROM Tree WHERE L >= @lbound AND R <= @ubound
    RETURN
END;
EXECUTE grandValue 'C';
EXECUTE grandValue 'I';
EXECUTE grandValue 'A';
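To check the bounds against the expected results from the question, here is a small Python sketch that runs the same range query on an in-memory SQLite database. SQLite and the helper name grand_value are stand-ins for illustration, not part of the SQL Server setup above.

```python
import sqlite3

# (Node, Value, L, R, Depth) for the sample tree
ROWS = [
    ("A", 100, 1, 28, 0), ("B", 100, 2, 3, 1), ("C", 300, 4, 5, 1),
    ("D", 150, 6, 25, 1), ("E", 200, 26, 27, 1), ("F", 400, 7, 8, 2),
    ("G", 250, 9, 10, 2), ("H", 500, 11, 12, 2), ("I", 350, 13, 20, 2),
    ("J", 100, 21, 22, 2), ("K", 50, 23, 24, 2), ("L", 100, 14, 15, 3),
    ("M", 300, 16, 17, 3), ("N", 200, 18, 19, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Tree (Node TEXT PRIMARY KEY, Value INTEGER, "
    "L INTEGER, R INTEGER, Depth INTEGER)"
)
conn.executemany("INSERT INTO Tree VALUES (?, ?, ?, ?, ?)", ROWS)


def grand_value(conn, node):
    """Sum of the node's own value and all of its descendants' values."""
    bounds = conn.execute(
        "SELECT L, R FROM Tree WHERE Node = ?", (node,)
    ).fetchone()
    if bounds is None:
        return None  # unknown node
    return conn.execute(
        "SELECT SUM(Value) FROM Tree WHERE L >= ? AND R <= ?", bounds
    ).fetchone()[0]


totals = {n: grand_value(conn, n) for n in ("C", "I", "A")}
```

The totals come out as 300, 950 and 3100, matching the examples in the question.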
CREATE PROCEDURE children
    @Node NVARCHAR(10)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @lbound INT;
    DECLARE @ubound INT;
    DECLARE @depth INT;
    -- Immediate children are the descendants exactly one level deeper.
    SELECT @lbound = L, @ubound = R, @depth = Depth FROM Tree WHERE Node = @Node
    SELECT Node FROM Tree WHERE L > @lbound AND R < @ubound AND Depth = (@depth + 1)
    RETURN
END;
EXECUTE children 'C';
EXECUTE children 'I';
EXECUTE children 'A';
CREATE PROCEDURE family
    @Node NVARCHAR(10)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @lbound INT;
    DECLARE @ubound INT;
    -- All descendants lie strictly inside the node's bounds.
    SELECT @lbound = L, @ubound = R FROM Tree WHERE Node = @Node
    SELECT Node FROM Tree WHERE L > @lbound AND R < @ubound
    RETURN
END;
EXECUTE family 'C';
EXECUTE family 'I';
EXECUTE family 'A';
CREATE PROCEDURE parent
    @Node NVARCHAR(10)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @lbound INT;
    DECLARE @ubound INT;
    DECLARE @depth INT;
    -- The parent is the enclosing node exactly one level up.
    SELECT @lbound = L, @ubound = R, @depth = Depth FROM Tree WHERE Node = @Node
    SELECT Node FROM Tree WHERE L < @lbound AND R > @ubound AND Depth = (@depth - 1)
    RETURN
END;
EXECUTE parent 'C';
EXECUTE parent 'I';
EXECUTE parent 'A';
CREATE PROCEDURE ancestor
    @Node NVARCHAR(10)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @lbound INT;
    DECLARE @ubound INT;
    -- Every ancestor's bounds strictly enclose the node's bounds.
    SELECT @lbound = L, @ubound = R FROM Tree WHERE Node = @Node
    SELECT Node FROM Tree WHERE L < @lbound AND R > @ubound
    RETURN
END;
EXECUTE ancestor 'C';
EXECUTE ancestor 'I';
EXECUTE ancestor 'A';
For creating the nested sets in the table in the first place you can run some code to generate the inserts or start with the first node and then successively add each additional node - although since each add potentially modifies many of the nodes in the set there can be a lot of thrashing of the database as you build this.
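As a rough sketch of the "run some code to generate the inserts" option, the Python below numbers a tree with an explicit stack instead of recursion, which matters for a deep, skewed tree. The input format (a dict from node to its ordered children) and the name number_tree are assumptions for illustration.

```python
def number_tree(children, root):
    """Return {node: (L, R, depth)} using an iterative depth-first walk."""
    bounds = {}
    left = {root: 1}
    counter = 1
    # Each stack entry: (node, depth, iterator over that node's children)
    stack = [(root, 0, iter(children.get(root, [])))]
    while stack:
        node, depth, remaining = stack[-1]
        child = next(remaining, None)
        counter += 1
        if child is None:
            # All children visited: the next number closes this node.
            bounds[node] = (left[node], counter, depth)
            stack.pop()
        else:
            # Entering a child: the next number opens it.
            left[child] = counter
            stack.append((child, depth + 1, iter(children.get(child, []))))
    return bounds


sample = {
    "A": ["B", "C", "D", "E"],
    "D": ["F", "G", "H", "I", "J", "K"],
    "I": ["L", "M", "N"],
}
bounds = number_tree(sample, "A")
```

For the sample tree this reproduces the hand-assigned values, for example A = (1, 28, 0) and N = (18, 19, 3). For tens of millions of nodes you would stream the results into bulk inserts rather than hold them all in a dict.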
Here's a stored procedure for adding a node as a child of another node:
CREATE PROCEDURE insertNode
    @ParentNode NVARCHAR(10), @NewNodeName NVARCHAR(10), @NewNodeValue INT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @ubound INT;
    DECLARE @depth INT;
    SELECT @ubound = R, @depth = Depth FROM Tree WHERE Node = @ParentNode
    -- Open a gap of 2 at the parent's right bound, then place the new node in it.
    UPDATE Tree SET L = L + 2 WHERE L >= @ubound
    UPDATE Tree SET R = R + 2 WHERE R >= @ubound
    INSERT INTO Tree (Node, Value, L, R, Depth) VALUES (@NewNodeName, @NewNodeValue, @ubound, @ubound + 1, @depth + 1);
    RETURN
END;
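To see what the two UPDATE statements do to the bounds, here is the same shift logic as a small Python sketch on an in-memory dict. The dict layout and the name insert_node are illustrative only.

```python
def insert_node(tree, parent, name, value):
    """Insert `name` as the last child of `parent`.

    `tree` maps node name -> dict with keys "value", "L", "R", "depth".
    """
    ubound = tree[parent]["R"]
    depth = tree[parent]["depth"]
    # Shift every bound at or after the parent's right bound by 2 ...
    for row in tree.values():
        if row["L"] >= ubound:
            row["L"] += 2
        if row["R"] >= ubound:
            row["R"] += 2
    # ... and put the new node into the gap that opened up.
    tree[name] = {"value": value, "L": ubound, "R": ubound + 1, "depth": depth + 1}


tree = {
    "A": {"value": 100, "L": 1, "R": 6, "depth": 0},
    "B": {"value": 100, "L": 2, "R": 3, "depth": 1},
    "C": {"value": 300, "L": 4, "R": 5, "depth": 1},
}
insert_node(tree, "C", "X", 500)
```

Every node to the right of the insertion point is touched, which is why frequent inserts into a tree with millions of nodes can be expensive.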
I got this from http://www.evanpetersen.com/item/nested-sets.html, which also shows a nice graph-walking algorithm for creating the initial L and R values. You'd have to enhance it to keep track of the depth as well, but that would be easy.
I'm trying to rename duplicates in a MySQL database. So far I'm using the code below, but it only appends 1 to the end of the name:
UPDATE phpfox_photo n
JOIN (SELECT title_url, MIN(photo_id) min_id FROM phpfox_photo GROUP BY title_url HAVING COUNT(*) > 1) d
ON n.title_url = d.title_url AND n.photo_id <> d.min_id
SET n.title_url = CONCAT(n.title_url, '1');
So if I have
Anna
Anna
Anna
Result is
Anna
Anna1
Anna11
When I have 200 Annas, the result is Anna1111111111111111111111111111111111111111111....etc
How do I rename them with an incrementing number instead, like this:
Anna
Anna1
Anna2
If I didn't miss something, you can write a stored procedure that iterates through your rows using a cursor, along these lines (untested, so check it against a copy of your data first):
DELIMITER //
CREATE PROCEDURE rename_duplicates()
BEGIN
    DECLARE v_id INT;
    DECLARE v_title VARCHAR(255);
    DECLARE prev_title VARCHAR(255) DEFAULT NULL;
    DECLARE suffix INT DEFAULT 0;
    DECLARE no_more_rows BOOLEAN DEFAULT FALSE;
    -- Every duplicate row except the one with the lowest photo_id per title
    DECLARE ucur CURSOR FOR
        SELECT n.photo_id, n.title_url
        FROM phpfox_photo n
        JOIN (SELECT title_url, MIN(photo_id) min_id
              FROM phpfox_photo GROUP BY title_url HAVING COUNT(*) > 1) d
          ON n.title_url = d.title_url AND n.photo_id <> d.min_id
        ORDER BY n.title_url, n.photo_id;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET no_more_rows = TRUE;
    OPEN ucur;
    uloop: LOOP
        FETCH ucur INTO v_id, v_title;
        IF no_more_rows THEN
            LEAVE uloop;
        END IF;
        -- Restart the numbering whenever the title changes
        IF prev_title IS NULL OR v_title <> prev_title THEN
            SET suffix = 1;
            SET prev_title = v_title;
        ELSE
            SET suffix = suffix + 1;
        END IF;
        UPDATE phpfox_photo SET title_url = CONCAT(v_title, suffix) WHERE photo_id = v_id;
    END LOOP uloop;
    CLOSE ucur;
END //
DELIMITER ;
With User-Defined Variables
SET @counter:=0;
SET @title_url:='';
UPDATE phpfox_photo n
JOIN (SELECT title_url, MIN(photo_id) min_id
FROM phpfox_photo
GROUP BY title_url
HAVING COUNT(*) > 1) d
ON n.title_url = d.title_url AND n.photo_id <> d.min_id
SET n.title_url = IF(n.title_url <> @title_url, CONCAT(@title_url:=n.title_url, @counter:=1), CONCAT(n.title_url, @counter:=@counter+1));
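The numbering this is aiming for (a counter that restarts for each title, with the lowest photo_id keeping the original name) can be sketched in Python as follows. The function name and the (photo_id, title_url) tuple format are assumptions for illustration.

```python
def rename_duplicates(rows):
    """rows: iterable of (photo_id, title_url). Returns {photo_id: new_title}."""
    seen = {}      # title -> how many rows with that title were already handled
    renamed = {}
    # Sorting by photo_id means the lowest id of each title keeps its name.
    for photo_id, title in sorted(rows):
        n = seen.get(title, 0)
        renamed[photo_id] = title if n == 0 else f"{title}{n}"
        seen[title] = n + 1
    return renamed


result = rename_duplicates([(3, "Anna"), (1, "Anna"), (2, "Anna"), (4, "Bob")])
```

Note that neither this sketch nor the SQL checks whether a generated name such as Anna1 already exists in the table.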
Maybe you can use modulo to produce the numbering, like this (SQLite example, but it should be similar in MySQL):
SELECT *, (rowid % (SELECT COUNT(*) FROM table as t WHERE t.name = table.name ) ) FROM table ORDER BY name
All you need to do is translate rowid (for example, to the table's primary key) and the modulo function; MySQL has equivalents for both.
Then you can CONCAT the results as you desire.
UPDATE phpfox_photo n
JOIN
(SELECT title_url,
MIN(photo_id) min_id
FROM phpfox_photo
GROUP BY title_url
HAVING COUNT(*) > 1
)
d
ON n.title_url = d.title_url
AND n.photo_id <> d.min_id
SET n.title_url =
CASE
WHEN <last char is int>
THEN <replace last char with incremented last char>
ELSE <string + 1>
END
I've got the following query to determine how many votes a story has received:
SELECT s_id, s_title, s_time, (s_time-now()) AS s_timediff,
(
(SELECT COUNT(*) FROM s_ups WHERE stories.q_id=s_ups.s_id) -
(SELECT COUNT(*) FROM s_downs WHERE stories.s_id=s_downs.s_id)
) AS votes
FROM stories
I'd like to apply the following mathematical function to it for upcoming stories (I think it's what reddit uses) -
http://redflavor.com/reddit.cf.algorithm.png
I can perform the function on the application side (which I'm doing now), but I can't sort it by the ranking which the function provides.
Any advice?
Try this:
SELECT s_id, s_title,
       LOG10(z) + (y * s_timediff) / 45000 AS redditfunction
FROM (
    -- y is the sign of x; z is |x|, but never less than 1
    SELECT s_id, s_title, s_timediff,
           SIGN(x) AS y,
           GREATEST(ABS(x), 1) AS z
    FROM (
        -- x = ups minus downs, counted separately so the two vote tables don't multiply each other
        SELECT stories.s_id, stories.s_title,
               stories.s_time - NOW() AS s_timediff,
               (SELECT COUNT(*) FROM s_ups WHERE stories.q_id = s_ups.s_id)
             - (SELECT COUNT(*) FROM s_downs WHERE stories.s_id = s_downs.s_id) AS x
        FROM stories
    ) AS vote_totals
) AS derived_table1
ORDER BY redditfunction DESC
You should check whether this statement works with your datasets.
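For comparing the SQL result with what the application computes, the same expression can be sketched in Python. The function name rank is made up, and the constant 45000 and the y and z definitions are taken from the query above rather than from any authoritative description of reddit's algorithm.

```python
import math


def rank(ups: int, downs: int, seconds: float) -> float:
    """log10(z) + y * seconds / 45000, with x = ups - downs."""
    x = ups - downs
    y = (x > 0) - (x < 0)   # sign of x: 1, 0 or -1
    z = max(abs(x), 1)      # |x|, but never below 1 so that log10 is defined
    return math.log10(z) + y * seconds / 45000


score = rank(10, 0, 45000)
```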
y and z are the tricky ones. You want a specific return value based on x's value. That sounds like a good reason to write a function.
http://dev.mysql.com/doc/refman/5.0/en/if-statement.html
You should make one function for y and one for z. Pass in x, and expect a number back out.
DELIMITER //
CREATE FUNCTION y_element(x INT)
RETURNS INT
DETERMINISTIC
BEGIN
    DECLARE y INT;
    -- y is the sign of x
    IF x > 0 THEN SET y = 1;
    ELSEIF x = 0 THEN SET y = 0;
    ELSE SET y = -1;
    END IF;
    RETURN y;
END //
DELIMITER ;
That is y. I wrote it by hand without checking, so you may have to fix a few typos.
Do z the same way, and then you have all of the values for your final function.