So I want to index the lyrics from a lyrics website and then perform operations on them (search for certain artists, terms, patterns, etc.).
I figure the best scenario is if there is already some structured file format for me to use. Does anyone know if anything like this exists?
The next best thing would be a site that is "amenable" to what I am trying to do. Is there any such site?
Any comments in general about how I can do this speedily? (This is supposed to be a fun project and not a heavy-duty application.)
Thanks!
Downloading the whole lyrics database from a site is a bad idea; you can query the site for each lyric you want instead.
Even if you do download all the lyrics, don't store them in a flat file (maybe XML?); use a database like SQLite instead. Otherwise operations like searching or listing will be painful.
I have no idea about amenable sites, though.
Edit: I found the ChartLyrics API; you can use it easily.
Generally,
1) Download each lyric and store it in a separate table in your database
table: lyrics (example)
+--------+------------+----------------+-------------------------------+
| lyr_id | lyr_artist | lyr_title      | lyr_content                   |
+--------+------------+----------------+-------------------------------+
| 1      | Metallica  | The Unforgiven | New blood joins this earth... |
+--------+------------+----------------+-------------------------------+
| ...    | ...        | ...            | ...                           |
+--------+------------+----------------+-------------------------------+
2) Search artist in column lyr_artist, song title in column lyr_title, text (keywords) in lyr_content, etc.
Query examples
SELECT * FROM lyrics WHERE lyr_artist='artist';
SELECT * FROM lyrics WHERE lyr_title='song_title';
SELECT * FROM lyrics WHERE lyr_content LIKE '%word1%' AND lyr_content LIKE '%word2%';
Generally, something like that, or mix the WHERE conditions. You can apply WHERE...LIKE to columns like song title and artist too, for example to find the song "The Unforgiven" when the user searches for the keyword "Unforgiven".
3) Use query result to display search results
Note: Storing the data in files on the server is not as good as storing it in a database, in terms of search speed.
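The steps above can be sketched with Python's built-in sqlite3 module. This is only a minimal sketch: the in-memory database, the second sample row, and the search_lyrics helper are assumptions made for illustration, and the lyric text is placeholder data rather than anything fetched from a real site.

```python
import sqlite3

# In-memory database for the sketch; a real project would pass a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE lyrics (
           lyr_id      INTEGER PRIMARY KEY,
           lyr_artist  TEXT NOT NULL,
           lyr_title   TEXT NOT NULL,
           lyr_content TEXT NOT NULL
       )"""
)

# Placeholder rows; the second one is invented purely for the example.
conn.executemany(
    "INSERT INTO lyrics (lyr_artist, lyr_title, lyr_content) VALUES (?, ?, ?)",
    [
        ("Metallica", "The Unforgiven", "New blood joins this earth..."),
        ("Example Artist", "Example Song", "placeholder words about earth and sky"),
    ],
)


def search_lyrics(conn, words):
    """Return (artist, title) for rows whose content contains every word."""
    if not words:
        return []
    # One "LIKE ?" per word, joined with AND; values are bound as parameters.
    where = " AND ".join("lyr_content LIKE ?" for _ in words)
    params = [f"%{word}%" for word in words]
    sql = f"SELECT lyr_artist, lyr_title FROM lyrics WHERE {where} ORDER BY lyr_id"
    return conn.execute(sql, params).fetchall()


# Partial match on the title column, as in the "Unforgiven" example.
by_title = conn.execute(
    "SELECT lyr_artist FROM lyrics WHERE lyr_title LIKE ?", ("%Unforgiven%",)
).fetchall()

# Rows whose content contains both words.
both_words = search_lyrics(conn, ["blood", "earth"])
```

Binding the search words as parameters instead of pasting them into the SQL string also avoids quoting problems when a lyric or keyword contains an apostrophe.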
Related
I'm building a yellow pages site. I tried multiple database structures, and I'm not sure which one is best. Here are a few I considered:
Saving all business data (name, phone, email, etc.) in one table, the list of tags in another, and a mapping of data id to tag id for the tag-data relationship in a third table. I found this cumbersome since I'll be doing most things directly in the database (at least initially, before launch), and hence distributing everything can be problematic in my case. This one is a clean solution, I must admit, though.
Saving biz entries in one table with a separate column for tags (containing comma-separated or JSON tags for every entry), then retrieving results using a LIKE query or full-text search for a tag. This one will be slower and will get slower still as the db size increases. Also, it's not easy to maintain, for example if I have to rename a tag.
(My Preferred Choice) Distributing biz data in different tables based on type: all banks in one, with hotels, restaurants, etc. in separate tables, plus a separate table for all tags containing a rule for searching data from the table. Here is a detailed explanation.
Biz Tables:
college_tbl, bank_tbl, hotel_tbl, restaurant_tbl...so on
Tags Table
ID | Biz Table      | Tag Name              | Tag Key         | Match Rule (col:like_query_part)
1  | bank_tbl       | Citi Bank Branches    | ['citi','bank'] | 'name:%$1%$2%'
2  | restaurant_tbl | Pizza Hut Restaurants | ['pizza','hut'] | 'name:%$1%$2%'
3  | hotel_tbl      | The Leela Hotels      | ['the leela']   | 'name:%$1%'
I'll then use the 'Match Rule' in a LIKE query to fetch results from the 'Biz Table' for a 'Tag Name'.
I'm going forward with the third approach. I feel it's simple, removes the need for a third data-tag relationship table, makes renaming easy, and won't lose performance if each table has limited entries, say 1 million max per table.
I've been scratching my head for the last 15 days to find the best structure and feel this one is pretty good in my case.
Please suggest a better approach, or tell me if this approach could cause issues later on.
Use Number 1. Period, full stop.
The mistake is "doing things directly in the database" rather than developing the API first.
Number 2 has one advantage -- FULLTEXT search. That can be tacked onto #1 after you have a working API and some data to play with.
Number 3 (multiple similar tables) is a fiasco. Numerous Q&As ask about such designs; the reply is always "NO".
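For reference, approach number 1 might look roughly like the sketch below, written with Python's sqlite3 module so it can be run as-is. The table names (businesses, tags, business_tags), the sample rows, and the businesses_for_tag helper are assumptions for illustration and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE businesses (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        phone TEXT
    );
    CREATE TABLE tags (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Mapping table: one row per (business, tag) pair.
    CREATE TABLE business_tags (
        business_id INTEGER NOT NULL REFERENCES businesses(id),
        tag_id      INTEGER NOT NULL REFERENCES tags(id),
        PRIMARY KEY (business_id, tag_id)
    );
    INSERT INTO businesses (id, name) VALUES
        (1, 'Citi Bank, Main Street'),
        (2, 'Pizza Hut, Station Road');
    INSERT INTO tags (id, name) VALUES
        (1, 'Citi Bank Branches'),
        (2, 'Pizza Hut Restaurants');
    INSERT INTO business_tags (business_id, tag_id) VALUES (1, 1), (2, 2);
    """
)


def businesses_for_tag(conn, tag_name):
    """Return the names of all businesses carrying the given tag."""
    rows = conn.execute(
        """SELECT b.name
           FROM businesses AS b
           JOIN business_tags AS bt ON bt.business_id = b.id
           JOIN tags AS t ON t.id = bt.tag_id
           WHERE t.name = ?
           ORDER BY b.name""",
        (tag_name,),
    ).fetchall()
    return [name for (name,) in rows]


# Renaming a tag is a single UPDATE; the mapping rows are untouched.
conn.execute("UPDATE tags SET name = 'Citibank Branches' WHERE id = 1")
renamed = businesses_for_tag(conn, "Citibank Branches")
```

Wrapping the join in one small function is the kind of API layer that hides the three-table layout from whoever maintains the data.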
I have a database in MySQL that currently lists approximately 1500 concerts and events. Now, the plan is to add setlists (lists of the songs performed at the concerts) for all the concerts in the database. Basically this will mean a lot of repeated values (songs performed at many concerts), and I would really appreciate some input on what the best approach would be.
I initially started out with a database similar to this;
| eventID | edate | venue | city | setlist |
The setlist field was basically text data, where I could paste the list of songs and parse it with PHP to put each song on a new line. This works, and editing the text and running order was like editing a text document. Obviously this was pretty simple, but it has drawbacks and limitations. Simple things like getting stats on songs performed are probably very difficult, right?
So, what is the best way to store the setlist value?
Create a new table that adds a new row for each song performed, and that has a foreign key linking to eventID? How would I best retain (and edit, if needed) the running order of the songs in that table? Any other suggestions?
Thanks for any input or advice on this, as I would love to get some help before I start adding all the data.
I would create a table that holds each song performed at a specific event:
| songId | eventID | song |
Where eventID can be duplicated in multiple rows to show each song performed at that event.
This way you can query all the times a specific song was performed, and also get all songs (the setlist) for a specific event by querying on the eventID.
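A minimal sketch of that table, using Python's sqlite3 module so it runs without a MySQL server. The position column is an assumption added here to keep the running order the question asks about; it is not part of the table shown above. The table name setlist_songs, the sample rows, and both helper functions are likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (
        eventID INTEGER PRIMARY KEY,
        edate   TEXT,
        venue   TEXT,
        city    TEXT
    );
    CREATE TABLE setlist_songs (
        songId   INTEGER PRIMARY KEY,
        eventID  INTEGER NOT NULL REFERENCES events(eventID),
        position INTEGER NOT NULL,  -- assumed column: running order within the event
        song     TEXT NOT NULL,
        UNIQUE (eventID, position)
    );
    INSERT INTO events (eventID, edate, venue, city) VALUES
        (1, '2010-01-01', 'Venue A', 'City A'),
        (2, '2010-01-02', 'Venue B', 'City B');
    INSERT INTO setlist_songs (eventID, position, song) VALUES
        (1, 1, 'Song X'), (1, 2, 'Song Y'),
        (2, 1, 'Song Y'), (2, 2, 'Song Z');
    """
)


def setlist(conn, event_id):
    """Return the songs of one event in running order."""
    rows = conn.execute(
        "SELECT song FROM setlist_songs WHERE eventID = ? ORDER BY position",
        (event_id,),
    ).fetchall()
    return [song for (song,) in rows]


def performance_counts(conn):
    """Return (song, times performed), most performed first."""
    return conn.execute(
        """SELECT song, COUNT(*) AS times
           FROM setlist_songs
           GROUP BY song
           ORDER BY times DESC, song"""
    ).fetchall()


first_setlist = setlist(conn, 1)
stats = performance_counts(conn)
```

Changing the running order then means updating position values rather than editing a block of text.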
I'm making a blog system and I want to add 'tags' to my blogposts. These are similar to the tags you see here, they can be used to group posts with similar subjects.
I want to store the tags in the database as a comma-separated string of words (non-whitespaced strings). But I'm not quite sure how I would search for all posts containing tag A and tag B.
I don't like a simple solution that works with a small database where I retrieve all data and scan it with a PHP loop, because this won't work with a large database (hundreds if not thousands of posts). I do not intend to make this many blogposts, but I want the system to be solid and save worktime on the PHP scripts by getting right results straight from the database.
Let's say my table looks like this (it's a bit more complex actually)
blogposts:
id | title         | content_html                  | tags
0  | "hello world" | "<em>hello world!</em>"       | "hello,world,tag0"
1  | "bye world"   | "<strong>bye world!</strong>" | "bye,world,tag1,tag2"
2  | "hello you"   | "hello you! :>"               | "hello,tag3,you"
How would I be able to select all posts that contain "hello" as well as "world" in the tags? I know about the LIKE statement, where you can search for substrings, but can you use it with multiple substrings?
You can't index a field of CSV values in a meaningful way, and standard SQL has no efficient way to match a single value inside a field of CSV values. Instead, you'll want to set up two more tables and make the following alteration to your table.
blogposts:
id | title | content_html
blogposts_tags:
id | tag_name
blogposts_taxonomy:
id | blogpost_id | tag_id
When you add a tag to a blog post, you will insert a new record into the taxonomy table. When you query for data, you'll join across all three tables to get the information similar to this:
SELECT `tag_name` FROM `blogposts` INNER JOIN `blogposts_taxonomy` ON
`blogposts`.`id`=`blogposts_taxonomy`.`blogpost_id` INNER JOIN `blogposts_tags` ON
`blogposts_taxonomy`.`tag_id`=`blogposts_tags`.`id` WHERE `blogposts`.`id` = someID;
//UPDATE
Setting up the N:M relationship gives you a lot of options during the build out of your application. For example, say you wanted to be able to search for blogposts that were all tagged "php." You could do that as follows:
SELECT `blogposts`.`id`,`blogposts`.`content_html` FROM `blogposts` INNER JOIN `blogposts_taxonomy` ON
`blogposts`.`id`=`blogposts_taxonomy`.`blogpost_id` INNER JOIN `blogposts_tags` ON
`blogposts_taxonomy`.`tag_id`=`blogposts_tags`.`id` WHERE `blogposts_tags`.`tag_name`="php";
That will return all blogposts that have been tagged with the "php" tag.
Cheers
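The original question asked for posts carrying both "hello" and "world". One common way to do that with the three-table layout is to filter on the wanted tags, group by post, and keep only the groups that matched every tag. The sketch below runs it through Python's sqlite3 module; the table names follow the queries above, while the posts_with_all_tags helper and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE blogposts (
        id INTEGER PRIMARY KEY, title TEXT, content_html TEXT
    );
    CREATE TABLE blogposts_tags (
        id INTEGER PRIMARY KEY, tag_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE blogposts_taxonomy (
        id          INTEGER PRIMARY KEY,
        blogpost_id INTEGER NOT NULL REFERENCES blogposts(id),
        tag_id      INTEGER NOT NULL REFERENCES blogposts_tags(id),
        UNIQUE (blogpost_id, tag_id)
    );
    INSERT INTO blogposts (id, title) VALUES
        (0, 'hello world'), (1, 'bye world'), (2, 'hello you');
    INSERT INTO blogposts_tags (id, tag_name) VALUES
        (1, 'hello'), (2, 'world'), (3, 'bye'), (4, 'you');
    INSERT INTO blogposts_taxonomy (blogpost_id, tag_id) VALUES
        (0, 1), (0, 2), (1, 3), (1, 2), (2, 1), (2, 4);
    """
)


def posts_with_all_tags(conn, tag_names):
    """Return (id, title) of posts that carry every tag in tag_names."""
    wanted = sorted(set(tag_names))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""SELECT p.id, p.title
              FROM blogposts AS p
              JOIN blogposts_taxonomy AS x ON x.blogpost_id = p.id
              JOIN blogposts_tags AS t ON t.id = x.tag_id
              WHERE t.tag_name IN ({placeholders})
              GROUP BY p.id, p.title
              HAVING COUNT(DISTINCT t.tag_name) = ?
              ORDER BY p.id"""
    # The last parameter is the number of distinct tags a post must match.
    return conn.execute(sql, [*wanted, len(wanted)]).fetchall()


hello_and_world = posts_with_all_tags(conn, ["hello", "world"])
```

The same GROUP BY ... HAVING COUNT(DISTINCT ...) pattern should carry over to MySQL with the backtick-quoted names used above.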
If you really wanted to store the data like this, the MySQL FIND_IN_SET function would be your friend.
Use the function twice in the WHERE clause, once per tag.
But it will perform horribly; having a linked table, one-to-many style as already suggested, is a MUCH better idea. If you have lots of the same tags, a many-to-many relationship could be used, via a 'post2tag' table.
I want to make a music playing website where users can save playlists of songs to be regenerated later. I'm kind of a newbie to sql, but it seems like databases are meant to hold fixed-length variables, whereas a user-generated playlist would be an arbitrary length. There are a couple ways I've thought of to handle this:
Separate tables (maybe another table for each playlist? )
XML
I feel like there's an easy third way I'm missing. I'm doing this in php, but if there's a super easy way using django I'd also be interested.
2 tables:
Playlists. Fields: id | title | owner_id (reference to user.id)
Songs. Fields: id | title | length | playlist_id (reference to playlist.id)
How about this:
Playlists: list_id|title|owner_id
Songs: song_id|title|artist|album|year|length|style|whatever_else_you_want_to_add
Songs_In_Lists: song_id|list_id
Third table just ties songs to playlists.
Otherwise there will be a lot of redundant song info if a song goes into multiple playlists.
The primary key for the third table will be on both columns. Same song goes to same list only once, so it works fine.
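Here is a minimal sketch of that three-table layout with the composite primary key, using Python's sqlite3 module. The lowercase table names, the sample rows, and the two helper functions are assumptions for illustration; the column list for songs is trimmed to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE playlists (
        list_id  INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        owner_id INTEGER NOT NULL
    );
    CREATE TABLE songs (
        song_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        artist  TEXT
    );
    -- The composite key allows any number of songs per list,
    -- but the same song only once per list.
    CREATE TABLE songs_in_lists (
        song_id INTEGER NOT NULL REFERENCES songs(song_id),
        list_id INTEGER NOT NULL REFERENCES playlists(list_id),
        PRIMARY KEY (song_id, list_id)
    );
    INSERT INTO playlists (list_id, title, owner_id) VALUES
        (1, 'Road Trip', 10), (2, 'Workout', 10);
    INSERT INTO songs (song_id, title, artist) VALUES
        (1, 'Song A', 'Artist A'), (2, 'Song B', 'Artist B');
    INSERT INTO songs_in_lists (song_id, list_id) VALUES (1, 1), (2, 1), (1, 2);
    """
)


def add_song_to_list(conn, song_id, list_id):
    """Return True if the link was added, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO songs_in_lists (song_id, list_id) VALUES (?, ?)",
                (song_id, list_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def songs_for_list(conn, list_id):
    """Return the song titles in one playlist, sorted by title."""
    rows = conn.execute(
        """SELECT s.title
           FROM songs AS s
           JOIN songs_in_lists AS sl ON sl.song_id = s.song_id
           WHERE sl.list_id = ?
           ORDER BY s.title""",
        (list_id,),
    ).fetchall()
    return [title for (title,) in rows]


first_try = add_song_to_list(conn, 2, 2)   # new link
second_try = add_song_to_list(conn, 2, 2)  # same link again, rejected by the key
workout = songs_for_list(conn, 2)
```

A playlist of arbitrary length is just however many rows in songs_in_lists share the same list_id, so no variable-length column is needed.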
A WordPress build I am working on needs to pull in stories from RSS feeds, and then allow users of the site to add comments and star ratings to each one.
It doesn't really seem like the correct usage of RSS to me, but is this sort of thing possible without importing/syncing the RSS feeds with the database?
At the very least you need some way of associating ratings with a particular story. This means storing some unique 'story' identifier so you can retrieve it later and calculate its ratings and comments. You could get away with not syncing the entire feed if you could come up with a reliable means of identifying and associating the unique_id I mentioned.
Example:
#dbo.stories_comments
----------------------
| story_id | comment |
----------------------
| 12345    | Lorem.. |
| abcde    | Ipsum.. |
----------------------
Like I said, the tricky part is coming up with the story_id.
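One possible way to derive a story_id, sketched in Python, is to hash a value that stays stable for each feed item. RSS 2.0 items can carry guid and link elements, so the sketch prefers guid and falls back to link. The story_id function and the dict shape of item are assumptions made for illustration, and this only works if the feed keeps those values stable over time.

```python
import hashlib


def story_id(item):
    """Derive a stable identifier from a feed item.

    `item` is assumed to be a dict with optional 'guid' and 'link' keys
    (a hypothetical shape; adapt it to whatever your feed parser returns).
    """
    key = (item.get("guid") or item.get("link") or "").strip()
    if not key:
        raise ValueError("feed item has neither guid nor link")
    # Hashing gives a fixed-length value that is easy to store and index.
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


# The same guid always maps to the same id, whatever else changes in the item.
id_a = story_id({"guid": "story-12345", "link": "http://example.com/a"})
id_b = story_id({"guid": "story-12345", "link": "http://example.com/moved"})
```

The resulting string can be stored in the story_id column of the comments table shown above.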
Presumably you don't want stories users have voted on to disappear when they fall out of the RSS feed, so you're going to have to store a copy of said story in your database.
So the short answer to your question is "No."
Additionally, I don't see any reason this isn't a "correct usage of RSS".