Database number of columns and separate table? [closed] - php

Closed. This question is opinion-based. It is not currently accepting answers. Closed 6 years ago.
I'm using MySQL database and wondering about the database table designs.
I see sets of two kinds of tables designed by an experienced PHP developer. Basically, one table contains some dynamic stats and the other more static data, but each record shares the same row_id. For example, a user table where info like name and password is stored, and a user_stats table holding a summary of actions, such as the total money spent, etc.
Is there any real advantage in doing this besides adding clarity to the function of each table? How many columns would you say is optimal in a table? Is it bad to mix the more static info and the dynamic stuff together into, say, 20 columns of the same table?

From a design standpoint, you might consider that the User table describes things, namely "users." There might be hundreds of thousands of them, and rows might need to remain in that table ... even for Users who've been pushing-up daisies in the local graveyard for many years now ... because this table associates the user_id values that are scattered throughout the database with individual properties of that "thing."
Meanwhile, we also collect User_Stats. But, we don't keep these stats nearly so long, and we don't keep them about every User that we know of. (Certainly not about the ones who live in the graveyard.) And, when we want to run reports about those statistics, we don't want to pore through all of those hundreds-of-thousands of User records, looking for the ones that actually have statistics.
User_Stats, then, is an entirely separate collection of records. Yes, it is related to Users, e.g. in the referential-integrity sense that "any user_id (foreign key ...) in User_Stats must correspond to a user_id in Users." But it is, nevertheless, "an entirely separate collection of records."
Another important reason for keeping these as a separate table is that there might naturally be a one-to-many relationship between a User and his User_Stats. (For instance, maybe you keep aggregate statistics by day, week, or month ...)
If you have nothing better to do with your afternoon than to read database textbooks ... ;-) ... the formal name of this general topic is: "normal forms." ("First," "Second," "Third," and so on.)
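A minimal sketch of the two-table split described above, using hypothetical column names (SQLite is used here for brevity; the MySQL DDL is analogous):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        pass    TEXT NOT NULL          -- static, long-lived attributes
    );
    CREATE TABLE user_stats (
        user_id     INTEGER NOT NULL REFERENCES users(user_id),
        period      TEXT    NOT NULL,  -- e.g. '2019-06' for monthly aggregates
        money_spent REAL    NOT NULL DEFAULT 0,
        PRIMARY KEY (user_id, period)  -- one-to-many: many stat rows per user
    );
""")
con.execute("INSERT INTO users (user_id, name, pass) VALUES (1, 'alice', 'x')")
con.execute("INSERT INTO user_stats VALUES (1, '2019-06', 42.5)")

# Reports only touch the (much smaller) stats table, not every user row:
total = con.execute(
    "SELECT SUM(money_spent) FROM user_stats WHERE user_id = 1"
).fetchone()[0]
print(total)  # 42.5
```

Note how the one-to-many shape (a stats row per user per period) falls out naturally once the stats live in their own table.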


Implementing voting or "likes" using MySQL [closed]

Closed. This question is opinion-based. It is not currently accepting answers. Closed 6 years ago.
I am creating a system where users (who are identified by a user id number) will be allowed to vote on posts (think Reddit, StackOverflow, etc).
Users can vote a post up or not vote at all on it.
The number of votes on a given post can easily be stored within the table containing the posts.
Keeping track of who has voted, however, is a different task entirely that I'm not sure how to approach.
I was thinking I could have a table that would have two columns: user id and post id.
When they vote on a post, I add their user id and post id to that table. If they unvote, I remove that entry from the table.
EG:
User ID | Post ID
1 | 3949
1 | 4093
2 | 3949
etc...
Is this a reasonable solution?
Yes, this is a reasonably simple and easy solution to the problem. You can do the same for your comments, if you like. In your MAIN_POST table assign a post_id and use that same post_id in the other tables: comments(post_id, user_id, post_comment, comment_time) and votes(post_id, user_id, vote_status), where you can use 1 for a vote up and 0 for a vote down. It will complicate the SQL queries to retrieve the data a little, but it is workable. On the Android side there are a lot of tricks to handle and present this data in the application, and you can make this vote (like) and comments idea work just like Facebook ("You" for your own comments and likes, names for others).
I wouldn't remove rows from the table. I understand why you would want to do that, but why lose the information? Instead, keep a +1/-1 value for each entry and then sum up the values for a post:
select sum(vote)
from uservotes
where postid = 1234;
And, I agree with Rick that you should also include the creation date/time.
Using an 'in between' or 'joining' table is a perfectly acceptable solution in this case. If relevant, you could even add a timestamp to the relation and show when a user upvoted something.
It is also important to set up proper indexes and keys so that your table structure keeps performing well once the dataset grows.
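Putting the answers above together, here is a minimal sketch (SQLite for brevity; table and column names are illustrative): a composite primary key enforces one vote per user per post, a +1/-1 vote column preserves the information instead of deleting rows, and a timestamp records when the vote happened.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE uservotes (
        userid   INTEGER NOT NULL,
        postid   INTEGER NOT NULL,
        vote     INTEGER NOT NULL CHECK (vote IN (-1, 1)),
        voted_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (userid, postid)   -- one vote per user per post
    )
""")
votes = [(1, 3949, 1), (1, 4093, 1), (2, 3949, 1), (3, 3949, -1)]
con.executemany(
    "INSERT INTO uservotes (userid, postid, vote) VALUES (?, ?, ?)", votes
)

# A post's score is just the sum of its +1/-1 votes:
score = con.execute(
    "SELECT SUM(vote) FROM uservotes WHERE postid = 3949"
).fetchone()[0]
print(score)  # 1
```

An "unvote" can then be an UPDATE or DELETE on the (userid, postid) row, and the primary key doubles as the index that keeps both lookups fast as the table grows.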

How to search multiple millions of strings really fast? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 6 years ago.
So here is the situation
We have 250,000 radio stations.
Each radio station comes with 2 strings.
These 2 strings can each be a Song Name, Album Name, or Artist Name.
We don't know which is which, but one of them is certainly the song name.
Usually the other one is the Artist. (We state this for worst-case planning; we don't want to create a worst-case scenario by assuming it is the Album.)
Now we have a database consisting of 4.5 million Artists, 7 million Albums, and 150 million Songs (and some other data which doesn't matter here). These three kinds of records live in different tables, and these are the tables where we will do our searching and matching. We can sort them alphabetically, or however it suits us, to speed up the process.
These tables are interrelated.
In these tables a Song name always has an artist and album (in their respective tables) associated with it, an album always has artist(s) and song(s) associated with it ... you get the idea.
With the 2 strings that come with each radio station, I have to recognize 3 things:
Song Name
Album Name
Artist Name
Now I am assuming the best case scenario would be if we match the first string of the channels with The Artist Names in the tables. If we get a match we can easily find if the other string gets a match under the Song name(and Album name) associated with the Artist matched. (Let's assume for the sake of simplicity that an Album Name cannot be same as Artist Name or song name or vice versa)
If we don't get a match for Artist with the first string, we try the second string, and then we repeat the same with Album if we still don't get a match.
What should be the algorithm for getting the fastest results ?
I have a server with 56 GB of RAM (some already in use), but I want to reserve 20 GB for other purposes. (If you can provide a really great solution that uses the reserve, don't hesitate to suggest it.)
We also have SSD storage. Do you think this can be done for all the radio stations within a minute? Preferably 30 seconds?
Please let me know how to proceed.
Well, all of these are strings, so this is an interesting search problem, and building a separate, specialized search index (a Trie-like structure) would be good. For your problem, the best data structure to index the data would be a Finite State Transducer. It is much more compact than a Trie: in real-world text, strings share a lot of suffixes as well as prefixes, and an FST lets you share both (think graphs), whereas a Trie only shares prefixes. Also, since your keys have associated values, you need something like a transducer (think sorted map), which emits a value given a key, rather than a Finite State Acceptor, which is more like a sorted set than a map.
Lucene has a great implementation and I suppose a lot of things like Suggestions, Edit Distances are all based on it. They have also decoupled it from their main Inverted Index.
More information on Lucene Finite State Transducers:
http://blog.mikemccandless.com/2010/12/using-finite-state-transducers-in.html
Index 1,600,000,000 Keys with Automata and Rust: http://blog.burntsushi.net/transducers/
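A real FST implementation is well beyond a snippet (the links above cover it), but a minimal Trie sketch illustrates the prefix-sharing half of the idea, mapping each key to a value the way the transducer described above does. The keys and values here are made-up examples:

```python
class Trie:
    """A minimal prefix trie: keys sharing a prefix share nodes.
    (An FST would additionally share suffixes and be far more compact.)"""

    def __init__(self):
        self.children = {}
        self.value = None      # set only on nodes that terminate a key

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.value = value

    def lookup(self, key):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None    # key not in the index
        return node.value

index = Trie()
index.insert("nirvana", ("artist", 101))            # hypothetical ids
index.insert("nirvana unplugged", ("album", 203))   # shares the "nirvana" prefix
index.insert("nevermind", ("album", 202))

print(index.lookup("nirvana"))   # ('artist', 101)
print(index.lookup("nirvanaa"))  # None
```

Lookup cost is proportional to the key length, not the number of keys, which is what makes this family of structures attractive at the 150-million-song scale.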

Counting large amounts of data [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 7 years ago.
My site has a social feature. On the user profile it shows how many posts the user has, how many followers and so on.
However our database has over 100,000 rows.
It's working fine; however, I'm getting very sceptical about the performance in the long run.
I was thinking of another method which I think would work best.
So basically, right now it just counts the rows the user owns in the MySQL database.
Instead of scanning through the entire MySQL table, would it be better to do the following:
Create a column in the "Users" table called "post_counts". Every time the user makes a post the counter goes up; every time the user removes a post it goes down, and so forth.
I've tried both methods; however, since the DB is still small, it's hard to tell if there is a performance increase.
The current method just queries SELECT * FROM table_name WHERE user = user_id; and then counts in PHP with count($fetchedRows);
Is there a better way to handle this?
[update]
Basically the feature is like Twitter followers. I'm sure Twitter doesn't count billions of rows to determine a user's follower count.
I have MySQL tables that have 70M+ rows in them. 100,000 is nothing.
But yes, I would keep the counters in a field and simply update them whenever that user posts something or deletes a post. Make sure you have good indexes.
Also, what you COUNT() makes a difference. A COUNT(*) takes less overhead than a COUNT(col) WHERE ...
Use "explain" to see how long different COUNT() statements take and how many rows they are scanning.
As in: mysql> explain select count(*) from my_table where user_id = 72 \G
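A sketch of the counter-column approach (SQLite here for brevity; the post_count name comes from the question's idea, everything else is assumed). The key point is keeping the counter and the rows in sync inside one transaction:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (
        user_id    INTEGER PRIMARY KEY,
        post_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE posts (
        post_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(user_id)
    );
    CREATE INDEX idx_posts_user ON posts(user_id);  -- keeps COUNT(*) fallback fast too
""")
con.execute("INSERT INTO users (user_id) VALUES (72)")

def add_post(con, user_id):
    # One transaction: insert the row and bump the counter together.
    with con:
        con.execute("INSERT INTO posts (user_id) VALUES (?)", (user_id,))
        con.execute(
            "UPDATE users SET post_count = post_count + 1 WHERE user_id = ?",
            (user_id,),
        )

for _ in range(3):
    add_post(con, 72)

# Profile page read is a single-row lookup, not a scan-and-count:
count = con.execute(
    "SELECT post_count FROM users WHERE user_id = 72"
).fetchone()[0]
print(count)  # 3
```

Deleting a post would decrement the counter in the same transactional way.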

How to Handle a great number of rows with SQL Queries and take only small amount of data efficiently? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers. Closed 7 years ago.
I'm coding a site in PHP, and the site will contain a lot of messages (100,000, 200,000 or more) which users will post. The problem is that the messages are stored in a table called 'site_messages' keyed by ID. This means the messages aren't grouped by their poster; they are ordered by ID. If I want to fetch the messages posted by user 'foo', I have to query a lot of rows, and I think it will get really slow. Or if I want to fetch the messages by post subject (yes, it will contain a post subject column too, and maybe more columns later), I must query the whole table again, which will be even less efficient. Are there any fast solutions for this? I'm using PHP and MySQL (and phpMyAdmin).
Edit: For example, my table would look like this:
MessageID: 1
MessageContent(Varchar, this is the message that user posts): Hi I like this site. Bye!
MessagePoster(Varchar): crazyuser
MessagePostDate: 12/12/09
MessagePostedIn(Varchar, this is the post subject): How to make a pizza
MessageID: 2
MessageContent(Varchar): This site reallllly sucks.
MessagePoster(Varchar): top_lel
MessagePostDate: 12/12/09
MessagePostedIn(Varchar): Hello, I have a question!
MessageID: 3
MessageContent(Varchar): Who is the admin of this site?
MessagePoster(Varchar): creepy2000
MessagePostDate: 1/13/10
MessagePostedIn(Varchar): This site is boring.
etc...
This is what DBs (especially relational DBs) were built for! MySQL and other DBs use things like indexes to help you get access to the rows you need in the most efficient way. You will be able to write queries like select * from site_messages where subject like "News%" order by entryDateTime desc limit 10 to find the latest ten messages starting with "News", or select * from site_messages, user where user.userid='foo' and site_messages.fk_user=user.id to find all posts for a certain user, and you'll find it performs pretty well. For these, you'd probably have (amongst others) an index on the subject column and an index on the fk_user column.
Work on having a good table structure (data model). Of course if you have issues you can research DB performance and the topic of explain plans to help.
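A small sketch of the indexes this answer has in mind (SQLite syntax for brevity; the column names are assumptions based on the example queries above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE site_messages (
        id            INTEGER PRIMARY KEY,
        subject       TEXT    NOT NULL,
        body          TEXT    NOT NULL,
        fk_user       INTEGER NOT NULL,
        entryDateTime TEXT    NOT NULL
    );
    -- The two indexes suggested above:
    CREATE INDEX idx_messages_subject ON site_messages(subject);
    CREATE INDEX idx_messages_user    ON site_messages(fk_user);
""")
con.execute(
    "INSERT INTO site_messages VALUES (1, 'News: launch', 'hi', 7, '2010-01-13')"
)
con.execute(
    "INSERT INTO site_messages VALUES (2, 'Question', 'hello', 7, '2009-12-12')"
)

# A prefix LIKE such as 'News%' can use idx_messages_subject:
rows = con.execute(
    "SELECT id FROM site_messages "
    "WHERE subject LIKE 'News%' ORDER BY entryDateTime DESC LIMIT 10"
).fetchall()
print(rows)  # [(1,)]
```

Note that only prefix patterns ('News%') can use the subject index; a leading wildcard ('%News%') forces a scan.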
Yes, for each set of columns you want, you will query the table again. Think of a query as a set of rows. Avoid sending large numbers of rows over connections. As the other commenters have suggested, we can't help much more without more details about your tables.
Two candidates for indexing that jump right out are (Poster, PostDate) and (PostDate, Poster) to help queries in the form:
select ...
from ...
where Poster = #PID and PostDate > #Yesterday;
and
select Poster, count(*) as Postings, ...
from ...
where PostDate > #Yesterday
group by Poster;
and
select Poster, ...
from ...
where PostDate between #DayBeforeYesterday and #Yesterday;
Just keep in mind that indexing improves queries at the expense of the DML operations (insert, update, delete). If the query/DML ratio is very low, you just may want to live with the slower queries.
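A sketch of the first suggested composite index, (Poster, PostDate), serving the first query shape above (SQLite for brevity; data and names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE site_messages (Poster TEXT, PostDate TEXT, Body TEXT)")
# Composite index: equality on the leading column (Poster),
# then a range on the second column (PostDate).
con.execute("CREATE INDEX idx_poster_date ON site_messages(Poster, PostDate)")
con.executemany("INSERT INTO site_messages VALUES (?, ?, ?)", [
    ("foo", "2010-01-12", "a"),
    ("foo", "2010-01-14", "b"),
    ("bar", "2010-01-14", "c"),
])

rows = con.execute(
    "SELECT Body FROM site_messages WHERE Poster = ? AND PostDate > ?",
    ("foo", "2010-01-13"),
).fetchall()
print(rows)  # [('b',)]
```

The column order matters: (Poster, PostDate) serves "this poster, recent posts", while the mirror-image (PostDate, Poster) serves the date-range queries in the second and third examples.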

Looking for a starting point for a tagging system [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 8 years ago.
Basically I want to setup a tagging system like stack overflow has for entries and trying to plan out how a relevance based search would work. I want to have an option to pull up similar tagged entries for a related entries section. Right now I am using two tables for tags, a table for each unique tag and a join table. I am trying to think if that will work for being able to generate a list of entries that share similar tags.
If anyone has any ideas, or links to articles I could read on it to get my brain heading in the right direction that would be amazing. Thank you!
Add one more field to the entities table: tags, holding a string of comma-separated tags, to avoid two more joins when selecting the entities list.
Perhaps you could have a separate table to store related entries.
EntryId RelatedEntryId
Then you could have a CRON job recompute the relationships periodically and update the table. It would be less expensive than trying to compute these relationships on the fly.
You'll need to keep track of how often one tag is linked to another. Like, say "php" and "mysql" share 50 articles (or whatever the main content being tagged is), while "php" and "sql-server" might have 3, and "php" and "apache" have 25. So given "php," you'd want to return "mysql" and "apache" in that order (possibly letting "sql-server" fall to the wayside).
No way is this ideal, just thinking out loud (and kind of expanding on stephenc's answer, now that I see it):
CREATE TABLE tag_relations (
tag_id int unsigned not null,
related_tag_id int unsigned not null,
relation_count smallint unsigned not null,
PRIMARY KEY (tag_id, related_tag_id),
KEY relation_count (relation_count)
);
Then for each unique tag tied to an article, loop through all other tags and INSERT / UPDATE, incrementing the relation_count by 1. That means ("php", "mysql") and ("mysql", "php") are two completely different relations to be maintained, but without digging through search concepts I've probably forgotten, it'll still function. If something has 10+ tags, updates will be very slow (maybe pass that to cron like stephenc suggested), but it'll be easier to search this way. Nice and straightforward like so:
SELECT related_tag_id, COUNT(relation_count) AS total_relations
FROM tag_relations
WHERE tag_id IN ([list,of,tag,IDs,to,compare])
-- AND related_tag_id NOT IN ([list,of,tag,IDs,to,compare]) -- probably
GROUP BY related_tag_id
ORDER BY total_relations DESC
Easier than having to check against both tag_id & related_tag_id and sum them up through a mess of subqueries, at least. JOIN on your tags table to get the actual tagnames & you're set.
So if you're looking up "php" and "mysql," and "apache" often relates to both, it'll be near the top since it's counting & weighting each common relation. It won't strictly limit it to common links though, so add HAVING total_relations >= x (x being an arbitrary cutoff) and/or just a regular LIMIT x to keep things relevant.
(note: research the heck out of this before thinking this is even slightly useful - I'm sure there's some known algorithm out there that's 100x smarter and I'm just not remembering it.)
PHPro.org has a good writeup too, using a similar idea.
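A small sketch of how those relation counts could be maintained in application code (hypothetical data; both orderings of each pair are stored, mirroring the tag_relations table above):

```python
from collections import Counter
from itertools import permutations

# Hypothetical input: article id -> set of tags on that article.
articles = {
    1: {"php", "mysql"},
    2: {"php", "mysql", "apache"},
    3: {"php", "apache"},
    4: {"php", "mysql"},
}

# (tag, related_tag) -> co-occurrence count, like the tag_relations rows.
# permutations() yields both ("php", "mysql") and ("mysql", "php"),
# matching the double-entry scheme described above.
relations = Counter()
for tags in articles.values():
    for a, b in permutations(tags, 2):
        relations[(a, b)] += 1

# "Given php, which tags co-occur most often?"
related = Counter({b: n for (a, b), n in relations.items() if a == "php"})
print(related.most_common())  # [('mysql', 3), ('apache', 2)]
```

In production these counts would be the INSERT / UPDATE increments the answer describes, but the counting logic is the same.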
