Make a list of most used keywords - php

I have a mysql table with these columns: id, text, keywords.
ID is an id. Text is a title. Keywords is a list of tags in this format: tag1 tag2 tag3.
How would I go about getting a list of the most used keywords in the column? Eg. if I wanted to build a tag cloud from all the items in the table.

There are ways to do what you want. But it won't be simple. The way you have organized your keywords in this database is going to cause quite a few headaches. You should try to normalize the data.
Perhaps instead of this:
id text keywords
1 bob he she it
2 thing white yellow hello
Have a separate table for the keywords:
id keyword
1 he
1 she
2 white
2 yellow
That way, it would be a much simpler matter to find what you want:
select keyword, count(*) as num from `keywords` group by keyword order by num desc

A simple way would be to create an array where each key is tag#. The value of each of those keys is the number of times tag# appears in the database; this would involve traversing through each tag in the database.
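As a rough sketch of that counting approach (written in Python rather than PHP, with made-up strings standing in for the values of the keywords column; the function name count_tags is just for illustration):

```python
from collections import Counter


def count_tags(keyword_rows):
    """Count how often each tag appears across space-separated keyword strings."""
    counts = Counter()
    for keywords in keyword_rows:
        # split() with no argument also collapses repeated spaces
        counts.update(keywords.split())
    return counts


# Hypothetical values of the `keywords` column, one string per table row.
rows = ["he she it", "white yellow hello", "he yellow he"]
tag_counts = count_tags(rows)

# The two most frequent tags, e.g. as input for a tag cloud.
top_tags = tag_counts.most_common(2)
print(top_tags)
```

This still reads every row on each run, so it is only reasonable for small tables or when the result is cached.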

You might be better off normalising your database tables. Maybe something like this:
Items table : id, text
Tags table : id, text
items_tags table: item_id, tag_id
This way you can associate multiple tags with each item, and queries for tag counts become easy.
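A minimal sketch of that three-table layout, using Python's built-in sqlite3 module in place of MySQL so it runs on its own. The sample rows and the choice of keys are assumptions; the counting query should carry over to MySQL largely unchanged:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, text TEXT UNIQUE);
    CREATE TABLE items_tags (
        item_id INTEGER REFERENCES items(id),
        tag_id  INTEGER REFERENCES tags(id),
        PRIMARY KEY (item_id, tag_id)
    );
    -- Made-up sample data
    INSERT INTO items VALUES (1, 'bob'), (2, 'thing');
    INSERT INTO tags VALUES (1, 'he'), (2, 'she'), (3, 'white'), (4, 'yellow');
    INSERT INTO items_tags VALUES (1, 1), (1, 2), (2, 1), (2, 3), (2, 4);
""")

# Most used tags first; ties are broken alphabetically.
tag_counts = conn.execute("""
    SELECT t.text, COUNT(*) AS num
    FROM items_tags AS it
    JOIN tags AS t ON t.id = it.tag_id
    GROUP BY t.id, t.text
    ORDER BY num DESC, t.text
""").fetchall()
print(tag_counts)
```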

Related

MySQL - select a row using REGEXP within multiple columns if any word in query matches

I have three tables:
post
tags
post_tags
So, I want to make a simple search to get the post ID using its title only, its related tags only, or both title and tags.
To explain my point take this example:
In post table I have columns named (post_id, post_title)
Example: post_id: 1 - post_title: my new super car
In tags table I have columns named (tag_id, tag_name)
Example: tag_id: 5 - tag_name: red
In post_tags I record the post_id of "post" and the tag_id of "tag":
Example: post_id: 1 - tag_id: 5
so each post can have many tags (a simple relationship).
I want to select the post_id if I enter any of these queries:
super car
red car
red super car
super red car
red
In other words, the results should be merged so that the post_id is matched even when the query words are spread across more than one column.
Thank you.
For "word" matching, learn about FULLTEXT indexing. REGEXP is less practical.

PHP MySQL is taking 0.8s to 3s to load on search query, how to speed up

my MySQL table is in this structure:
id | title | duration | thumb | videoid | tags | category | views
1 | Video Name | 300 | thumb1.jpg | 134 | tag1|tag2|tag3 | category | 15
2 | Video Name2 | 300 | thumb2.jpg | 1135 | tag2|tag3|tag4 | category | 10
Table contains about 317k rows.
Query is:
SELECT id,title,thumb FROM videos WHERE tags LIKE '%$keyword%' or title LIKE '%$keyword%' order by id desc limit 20
And this is taking 0.8s to 3s to load results.
I'm new to PHP/MySQL. How can I speed up these queries? Any suggestions are welcome, thank you.
The only other suggestion I can throw in is to have a multi-part index of
( tags, title, id )
This way, it can utilize the index to qualify the WHERE clause criteria for both tags and title, and have the ID for the order by clause without having to go back to the raw data pages. Then, when records ARE found, only for those entries does it need to actually retrieve the raw data pages for the other columns associated with the row.
You are using this search construct:
column LIKE '%$keyword%'
The leading % wildcard character definitely defeats the use of indexes to do these searches. How to cure this terrible performance problem? You could use FULLTEXT search, about which you can read. Or, you could try to organize your tables so
column LIKE 'keyword%'
will find what you need, and then index the columns being searched. To do this, you would create a tag table, with a name and id for each distinct tag. This table will have a primary key on the id, and a unique key on the tag. E.g.
tag_id | tag
1 | drama
2 | comedy
3 | horror
4 | historical
Then you would create another table, known in the trade as a join table, with two ids in it. The primary key of this table is a composite of the two columns. You also need a non-unique index on the tag_id field.
video_id | tag_id
1 | 1
1 | 4
This sample data gives video with id = 1 the tags "drama" and "historical."
Then to match tags you need
SELECT v.id, v.title, v.thumb
FROM video AS v
JOIN tag_video AS tv ON v.id = tv.video_id
JOIN tag AS t ON tv.tag_id = t.tag_id
WHERE t.tag IN ('drama', 'comedy')
This will look up your tags very fast, and let you look up multiple ones in a single query if you wish.
It won't help with your requirement for full text search on your titles, however.
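Here is a small runnable sketch of the tag / join-table lookup described above. It uses Python's sqlite3 module as a stand-in for MySQL, and the sample rows, the index name, and the helper videos_with_tags are all made up for illustration. Passing the tags as bound parameters also avoids pasting user input straight into the SQL string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT, thumb TEXT);
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, tag TEXT UNIQUE);
    CREATE TABLE tag_video (
        video_id INTEGER,
        tag_id   INTEGER,
        PRIMARY KEY (video_id, tag_id)
    );
    -- Non-unique index so lookups by tag do not scan the join table
    CREATE INDEX tag_video_tag ON tag_video (tag_id);

    -- Made-up sample data
    INSERT INTO video VALUES (1, 'Video Name', 'thumb1.jpg'),
                             (2, 'Video Name2', 'thumb2.jpg'),
                             (3, 'Video Name3', 'thumb3.jpg');
    INSERT INTO tag VALUES (1, 'drama'), (2, 'comedy'), (3, 'horror'), (4, 'historical');
    INSERT INTO tag_video VALUES (1, 1), (1, 4), (2, 2), (3, 3);
""")


def videos_with_tags(conn, tags, limit=20):
    """Return (id, title, thumb) for videos carrying any of the given tags."""
    if not tags:
        return []
    # One "?" per tag; the values themselves are bound, never interpolated.
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT DISTINCT v.id, v.title, v.thumb
        FROM video AS v
        JOIN tag_video AS tv ON v.id = tv.video_id
        JOIN tag AS t ON tv.tag_id = t.tag_id
        WHERE t.tag IN ({placeholders})
        ORDER BY v.id DESC
        LIMIT ?
    """
    return conn.execute(sql, [*tags, limit]).fetchall()


matches = videos_with_tags(conn, ["drama", "comedy"])
print(matches)
```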
EDITED:
Define indexes on the title and tags fields.
Try this:
ALTER TABLE `videos` ADD INDEX (`title`);
ALTER TABLE `videos` ADD INDEX (`tags`);

Use search results (Like %search%), to match id's in another table in the database

I have two tables in the database, parts, and products.
I have a column in the products table with strings of ids (comma separated). Those ids match ids of the parts table.
**parts**
ID | description (I'm searching this part)
-------------------------------
1 | some text here
2 | some different text here
3 | etc...
**products**
ID | parts-list
--------------------------------
1 | 1,2,3
2 | 2,3
3 | 1,2
I'm really struggling with the SQL query on this one.
I've done the 1st part, got the id's from the parts table
SELECT * FROM parts WHERE description LIKE '%{$search}%'
The biggest problem is the comma-separated structure of the parts-list column.
Obviously, I could do it in PHP: create an array of the results from the parts table, use that to search the products table for ids, and then use those results to grab the row data from the parts table (again). Not very efficient.
I also tried this, but I'm obviously trying to compare two arrays here, not sure how this should be done.
SELECT * FROM `products` WHERE
CONCAT(',', description, ',')
IN (SELECT `id` FROM `parts` WHERE `description` LIKE '%{$search}%')
Can anybody help?
I would perhaps try a combination of LOCATE() and SUBSTR(). I work mainly in MSSQL which has CHARINDEX() that I think works like MySQL's LOCATE(). It is bound to be messy. Are there a variable number of elements in the parts-list field?
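An alternative to string functions, in line with the normalisation advice given for the other questions on this page, is to split the comma-separated list once into a join table and then use an ordinary JOIN. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module instead of MySQL, the column is renamed parts_list (no hyphen), and the table product_parts and the helper products_using are hypothetical names. If the schema cannot be changed, MySQL's FIND_IN_SET() function is another option worth looking at for comma-separated lists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, parts_list TEXT);
    -- Hypothetical join table replacing the comma-separated column
    CREATE TABLE product_parts (
        product_id INTEGER,
        part_id    INTEGER,
        PRIMARY KEY (product_id, part_id)
    );
    -- Made-up sample data
    INSERT INTO parts VALUES (1, 'some text here'),
                             (2, 'some different text here'),
                             (3, 'another part');
    INSERT INTO products VALUES (1, '1,2,3'), (2, '2,3'), (3, '1,2');
""")

# One-off migration: one row per (product, part) pair.
for product_id, parts_list in conn.execute("SELECT id, parts_list FROM products").fetchall():
    pairs = [(product_id, int(p)) for p in parts_list.split(",") if p.strip()]
    conn.executemany("INSERT INTO product_parts VALUES (?, ?)", pairs)


def products_using(conn, search):
    """Return ids of products containing a part whose description matches."""
    rows = conn.execute("""
        SELECT DISTINCT pp.product_id
        FROM parts AS p
        JOIN product_parts AS pp ON pp.part_id = p.id
        WHERE p.description LIKE ?
        ORDER BY pp.product_id
    """, (f"%{search}%",)).fetchall()
    return [row[0] for row in rows]


print(products_using(conn, "some text"))
```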

MSQL query for displaying results based on the same keywords

I have a table called cakes that contains the columns: id, title, description, keywords. I also have a table called keywords, with cakes being the parent. The keywords table contains two columns: id and keyword
I need two queries: UPDATED
If a person types in ingredients such as chocolate, hazelnut, strawberry (could be anything separated by a comma) I need the query to search for cakes that contain all three keywords and display results. Display ONLY cakes that contain all three. If no cake matches, I need a message saying nothing found.
I have a label on the search box which says, Find similar cakes. If a person types in Vanilla Raspberry, for example, the query needs to locate the cake in the database and match its keywords to the keywords of other cakes and display results. Display ONLY cakes that have the same keywords.
Not sure how to write these queries. Any help is appreciated. Thanks!
If the database must use a delimited long-string field for "keywords" rather than putting them in rows, then you will want to use the LIKE operator.
Assuming your [keywords] column is formatted like this:
'chocolate,ganache,strawberry'
You can search for "similar" cakes like this:
SELECT
columns
FROM
table t
WHERE
t.[keywords] LIKE '%chocolate%'
OR t.[keywords] LIKE '%cheesecake%'
Though, if you can change the schema, I would do so. Searching normalized keyword rows will be much more efficient than having the DB parse through text using LIKE.
If you could make a keywords table, which references the parent table by ID, you could do an equality search using a JOIN which would be superior, in my opinion.
It might have three columns: Id, ParentId, Keyword
EDIT: So based on your update, you have a cakewords table which can be searched.
This is untested, and there is likely a more efficient way using no IN clause. But the idea is that you know all the keyword id's for your specific cake. Then you are looking for other cakes having keywords in that collection.
SELECT
    columns
FROM
    cakes AS cs
JOIN
    cakewords AS csw
    ON csw.[cakeid] = cs.[id]
WHERE
    csw.[wordid] IN
    (SELECT
        cw.[wordid]
    FROM
        cakewords AS cw
    JOIN
        cakes AS c
        ON c.[id] = cw.[cakeid]
    WHERE
        c.[id] = #pMyCurrenctCakeId
    )
EDIT2: Here is a good related question:
What's the optimal solution for tag/keyword matching?
Based on an answer within, you might try this:
SELECT DISTINCT
    c.[id]
FROM
    cakewords AS cw1
    INNER JOIN cakewords AS cw2
        ON cw2.[wordid] = cw1.[wordid]
    INNER JOIN cakes AS c
        ON c.[id] = cw2.[cakeid]
WHERE
    cw1.[cakeid] = #current_cake_id
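For the first query in the question (only cakes that contain all of the typed keywords), one common approach with normalized rows is to filter on the keywords, group by cake, and keep the groups whose distinct keyword count equals the number of keywords searched for. The sketch below is untested against the asker's real schema: it uses Python's sqlite3 module, assumes a cakewords(cakeid, wordid) link table as in the edit above, and the sample cakes and the helper cakes_with_all are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cakes (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
    CREATE TABLE cakewords (
        cakeid INTEGER,
        wordid INTEGER,
        PRIMARY KEY (cakeid, wordid)
    );
    -- Made-up sample data
    INSERT INTO cakes VALUES (1, 'Forest Dream'), (2, 'Berry Delight'), (3, 'Plain Chocolate');
    INSERT INTO keywords VALUES (1, 'chocolate'), (2, 'hazelnut'), (3, 'strawberry'), (4, 'vanilla');
    INSERT INTO cakewords VALUES (1, 1), (1, 2), (1, 3), (2, 3), (2, 4), (3, 1);
""")


def cakes_with_all(conn, words):
    """Return (id, title) of cakes tagged with every keyword in `words`."""
    # Normalise and de-duplicate so the HAVING count is reliable.
    words = sorted({w.strip().lower() for w in words if w.strip()})
    if not words:
        return []
    placeholders = ", ".join("?" for _ in words)
    sql = f"""
        SELECT c.id, c.title
        FROM cakes AS c
        JOIN cakewords AS cw ON cw.cakeid = c.id
        JOIN keywords AS k ON k.id = cw.wordid
        WHERE k.keyword IN ({placeholders})
        GROUP BY c.id, c.title
        HAVING COUNT(DISTINCT k.keyword) = ?
        ORDER BY c.id
    """
    return conn.execute(sql, [*words, len(words)]).fetchall()


# Input as typed into the search box, separated by commas.
found = cakes_with_all(conn, "chocolate, hazelnut, strawberry".split(","))
print(found if found else "nothing found")
```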

Best way to store "tags" for speed in enormous table

I'm developing a big content site, with a table "contents" holding more than 50 million records. Here's the table structure:
contain id(INT11 INDEX),
name(varchar150 FULLTEXT),
description (text FULLTEXT),
date(INT11 INDEX)
I want to add "tags" to these contents.
I'm considering 2 methods:
Make a varchar(255 FULLTEXT) "tags" column in table contents. Store all tags separated by commas, and search row by row (which I think will be slow) using MATCH ... AGAINST.
Make 2 tables. First table named "tags" with columns id, tag(varchar(30 INDEX or FULLTEXT?)), and "contents_tags" with id, tag_id (int11 INDEX) and content_id (int11 INDEX), and search contents with a JOIN of 3 tables (contents - contents_tags - tags) to retrieve all contents with the tag(s).
I think this is slow and a memory killer because of an ENORMOUS JOIN of the 50M-row table * contents_tags * tags.
What is the best method to store tags to make it as efficient as possible? What is the fastest way to search by text (for example "movie 3d 2011" and a simple tag "video") and to locate contents?
The size of the table is approx. 5 GB now, without tags. The table is MyISAM because I need to store the name and description of the contents in FULLTEXT for string search (users can search by these fields now), and I need the best speed for searching by tags.
Anyone with experience in this?
Thanks!
FULLTEXT indexes are really not as fast as you may think they are.
Use a separate table to store your tags:
Table tags
----------
id integer PK
tag varchar(20)
Table tag_link
--------------
tag_id integer foreign key references tags(id)
content_id integer foreign key references content(id)
/* this table has a PK consisting of tag_id + content_id */
Table content
--------------
id integer PK
......
You SELECT all content with tag x by using:
SELECT c.* FROM tags t
INNER JOIN tag_link tl ON (t.id = tl.tag_id)
INNER JOIN content c ON (c.id = tl.content_id)
WHERE tag = 'test'
ORDER BY tl.content_id DESC /*latest content first*/
LIMIT 10;
Because of the foreign key, all fields in tag_link are individually indexed.
The WHERE tag = 'test' selects 1 (!) record.
Equi-joins this with 10,000 tag_link rows.
And Equi-joins that with 1 content record each (each tag_link only ever points to 1 content).
Because of the limit 10, MySQL will stop looking as soon as it has 10 items, so it really only looks at 10 tag_links records.
The content.id is autoincrementing, so higher numbers are very fast proxy for newer articles.
In this case you never need to look for anything other than equality and you start out with 1 tag that you equi-join using integer keys (the fastest join possible).
There are no if-thens-or-buts about it, this is the fastest way.
Note that because there are at most a few 1000 tags, any search will be much faster than delving in the full contents table.
Finally
CSV fields are a very bad idea, never use them in a database.
