PHP/MYSQL: Storing a list or massive table


I am still new to PHP, and I was wondering which alternative would be better, or whether someone could suggest a better way.
I have a set of users and I have to track all of their interactions with posts. If a user taps on a button, it will add the post to a list, and if they tap it again, it will remove the post. So would it be better to:
Have a column of a JSON array of postIDs stored in the table for each user (probably thousands).
-or-
Have a separate table with every save (combination of postID and userID) (probably millions) and return all results where the userIDs match?
For the purposes of this question, there are two tables: Table A is users and Table B is posts. How should I store all of the user's saved posts?
EDIT: Sorry, but I didn't mention that posts will have multiple user interactions and users will have multiple post interactions (Many to Many relationship). I think that would affect Bob's answer.

This is an interesting question!
The solution really depends on your expected use case. If each user has a list of posts they've tagged, and that is all the information you need, it will be expedient to store the list as a field in the user's table (or in their blob if you're using a NoSQL backend, which is a viable option if this is your use case!). There will be no impact on transmission time, since the list will be the same size either way, but with this solution you will probably save on lookup time, since you're only using one table and databases optimize to keep this information close together.
On the other hand, if you have to be able to query a given post for all the users that have tagged it, then option two will be much better. With the former method, you'd have to query all users and check whether each one had the post. With this option, you simply have to find all the relations and work from there. Presumably you'd have a user table, a post table, and a user_post table with foreign keys to the first two tables. There are other ways to do this, but they require maintaining multiple lists and cross-checking them each time, which is an expensive and error-prone set of operations.
Note that the latter option shouldn't choke on 'millions' of connections, since the db should be optimized for this sort of quick read. (pro tip: index the proper columns!) Do be careful about any data massage, though. One unnecessary for-loop will kill your performance.
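A minimal sketch of option two, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs anywhere. The table name `user_post` and the helpers `toggle_save`, `saved_posts` and `users_who_saved` are hypothetical. The composite primary key doubles as the index for "posts saved by this user", and the extra index on `post_id` serves the "users who saved this post" query.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, text TEXT);
        -- Junction table: one row per (user, saved post) pair.
        CREATE TABLE user_post (
            user_id INTEGER NOT NULL REFERENCES users(id),
            post_id INTEGER NOT NULL REFERENCES posts(id),
            PRIMARY KEY (user_id, post_id)   -- also indexes lookups by user
        );
        -- Second index for "which users saved this post?"
        CREATE INDEX idx_user_post_post ON user_post (post_id);
    """)
    return conn


def toggle_save(conn: sqlite3.Connection, user_id: int, post_id: int) -> bool:
    """Save the post if it is not saved yet, otherwise unsave it.
    Returns True if the post is saved after the call."""
    cur = conn.execute(
        "DELETE FROM user_post WHERE user_id = ? AND post_id = ?",
        (user_id, post_id))
    saved = cur.rowcount == 0  # nothing deleted, so it was not saved yet
    if saved:
        conn.execute(
            "INSERT INTO user_post (user_id, post_id) VALUES (?, ?)",
            (user_id, post_id))
    conn.commit()
    return saved


def saved_posts(conn: sqlite3.Connection, user_id: int) -> list[int]:
    rows = conn.execute(
        "SELECT post_id FROM user_post WHERE user_id = ? ORDER BY post_id",
        (user_id,))
    return [r[0] for r in rows]


def users_who_saved(conn: sqlite3.Connection, post_id: int) -> list[int]:
    rows = conn.execute(
        "SELECT user_id FROM user_post WHERE post_id = ? ORDER BY user_id",
        (post_id,))
    return [r[0] for r in rows]
```

Calling `toggle_save(conn, 1, 10)` twice saves and then unsaves the post, which matches the tap/tap-again behaviour in the question.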

For the purposes of this question, there are two tables: Table A is users and Table B is posts. How should I store all of the user's saved posts?
If each user has a unique ID of some sort (primary key), then add a field to each post that refers to the unique ID of the user.
mysql> describe users;
+----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(200) | YES | | NULL | |
| username | varchar(20) | YES | | NULL | |
+----------+------------------+------+-----+---------+----------------+
mysql> describe posts;
+---------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+----------------+
| id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| user | int(11) unsigned | NO | | NULL | |
| text | text | YES | | NULL | |
+---------+------------------+------+-----+---------+----------------+
Then to get posts for a user, for example:
SELECT text
FROM posts
WHERE user=5;
Or to get all the posts from a particular organization:
SELECT posts.text, users.username
FROM posts
JOIN users ON posts.user = users.id
WHERE users.email LIKE '%@example.com';

I think it would make sense to keep a third table that holds all the post status data.
If your user interface shows, say, 50 posts per page, then the UI only needs to keep track of 50 posts at a time. They'll all have unique IDs in your database, so that shouldn't be a problem.

Related

Do the fields (structure) of MySQL tables get unique ids?

I'm not talking about unique keys or auto_increments. Suppose I have this structure:
mysql> describe email_notifications;
+---------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+----------------+
| email_id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| email_address | varchar(100) | NO | | | |
| course_id | int(11) unsigned | NO | MUL | NULL | |
+---------------+------------------+------+-----+---------+----------------+
I'm building (for fun, practice, and hopefully some practical use) a tool in PHP that will analyze the structure of each table in a database, compare it to a newer one (to assist in Dev -> Live updates), and then spit out some MySQL queries (such as ALTER TABLE ...) that I can run on the live database in order to bring it up to speed.
The question: does each field get a unique ID of some sort?
If I change email_address from varchar(100) to text (for example), or rename course_id to cr_id, is there any way for me to tell that it's still technically the same dataset? I don't want to run a delete and add; instead I want to rename it and give it a new type.
Or if there's a better way to do it without some sort of MySQL ID, that would be great :)
Thanks!

I think you can use information_schema.columns. The following are both unique keys in this table (even if they are not so defined):
TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION
When you change the name or type of a column, I do not believe that ORDINAL_POSITION is affected. So, the second version may be what you are looking for.
This may then lead to the question "what if I change the name of a table?" The information_schema tables can't help there, unfortunately.
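A rough sketch of how the diff tool could use ORDINAL_POSITION. It assumes each column has already been read as a tuple with a query such as `SELECT ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE FROM information_schema.COLUMNS WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?`; the function names are hypothetical. Matching by position is only a heuristic: it breaks if a column is inserted or dropped in the middle of a table, because later positions shift. Since `CHANGE` redefines the whole column, the sketch restates nullability, but it does not carry defaults or EXTRA (such as auto_increment), which a real tool would also need.

```python
# Each column snapshot: (ORDINAL_POSITION, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE),
# as read from information_schema.COLUMNS for one table.
Column = tuple[int, str, str, str]


def column_definition(col_type: str, is_nullable: str) -> str:
    # CHANGE replaces the whole definition, so nullability must be restated.
    return col_type if is_nullable == "YES" else f"{col_type} NOT NULL"


def alter_statements(table: str, live: list[Column], dev: list[Column]) -> list[str]:
    """Pair live and dev columns by ordinal position and emit ALTER statements.
    Columns that exist only on live are ignored in this sketch."""
    live_by_pos = {col[0]: col for col in live}
    statements = []
    for pos, name, col_type, nullable in sorted(dev):
        definition = column_definition(col_type, nullable)
        old = live_by_pos.get(pos)
        if old is None:
            # Position only exists in dev: the column is new.
            statements.append(f"ALTER TABLE `{table}` ADD COLUMN `{name}` {definition};")
            continue
        _, old_name, old_type, old_nullable = old
        if (old_name, old_type, old_nullable) != (name, col_type, nullable):
            # Same position, different name or type: rename/retype in place.
            statements.append(
                f"ALTER TABLE `{table}` CHANGE `{old_name}` `{name}` {definition};")
    return statements
```

For the email_notifications example, changing email_address to text and renaming course_id to cr_id yields two `CHANGE` statements instead of a drop and an add.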

Making a secure scoring system in PHP

On my website, I display 5 questions (MCQs) per page, and when the user requests a new page, I call a script score_update() with the score of this page and then present him with the next page.
The scoreUpdate() script is something like:
<?php
//connect to database
//update the score
?>
The problem is that the user may refresh the page and the score may be updated twice, or as many times as he refreshes the page, or he may call the script directly after finding it in the source code.
How can I implement this system? I need an idea.
EDIT
Here is my database schema
user
------------------------------------------
user_id | username | password | points
------------------------------------------
PS: The user may attempt the same question again at some point in the future; there is no restriction on that, so there is no need to keep track of the questions he has attempted. He must get marks only if he attempted the question and got it right. I hope I am clear.

I would recommend saving the user's state in your database. You should add another table in order to do so.
-----------------------------------
user_id | question_id | answer
-----------------------------------
When a user answers a question, you can check whether the user has already answered this question.
If so, update his answer, and if it's the correct answer, update his score. This method works assuming you won't present the same question again if the user already answered it correctly.
If you want to use questions multiple times I recommend another method.
Use 2 tables:
----------------------------
user_id | questionnaire_id
----------------------------
and
------------------------------------------
questionnaire_id | question_id | answer
------------------------------------------
Each questionnaire is unique and contains some questions; the answer to each question is empty at the start. Generate a new questionnaire each time the user requests one, and save his answers per questionnaire. This way you can make sure the user can't submit the same questionnaire results twice (or more). If it's the first time the user submits this questionnaire, you can update the score; if not, do nothing.
To make sure the user does not change his questionnaire_id manually, you can save it in a session on the server so the user won't have access to it.
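A minimal sketch of the questionnaire approach, with Python's sqlite3 standing in for MySQL. The table layout follows the answer; the `submitted` column, `POINTS_PER_QUESTION`, and the function names are assumptions for illustration. The key step is a conditional UPDATE that flips `submitted` from 0 to 1: if it affects no row, the questionnaire was already submitted (or belongs to someone else), so a refresh or replayed request awards nothing.

```python
import sqlite3

POINTS_PER_QUESTION = 10  # assumed point value


def make_quiz_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY,
                            points INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE questions (question_id INTEGER PRIMARY KEY,
                                correct_answer TEXT NOT NULL);
        -- One row per generated questionnaire; submitted flips 0 -> 1 once.
        CREATE TABLE questionnaires (questionnaire_id INTEGER PRIMARY KEY,
                                     user_id INTEGER NOT NULL,
                                     submitted INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE questionnaire_questions (
            questionnaire_id INTEGER NOT NULL,
            question_id INTEGER NOT NULL,
            answer TEXT,                      -- NULL (empty) at the start
            PRIMARY KEY (questionnaire_id, question_id));
    """)
    return conn


def new_questionnaire(conn, user_id: int, question_ids: list[int]) -> int:
    with conn:
        cur = conn.execute("INSERT INTO questionnaires (user_id) VALUES (?)", (user_id,))
        qn_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO questionnaire_questions (questionnaire_id, question_id) VALUES (?, ?)",
            [(qn_id, q) for q in question_ids])
    return qn_id


def submit_questionnaire(conn, user_id: int, qn_id: int, answers: dict[int, str]) -> int:
    """Record answers and award points; returns the points awarded (0 on a repeat)."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE questionnaires SET submitted = 1 "
            "WHERE questionnaire_id = ? AND user_id = ? AND submitted = 0",
            (qn_id, user_id))
        if cur.rowcount == 0:
            return 0  # already submitted, or not this user's questionnaire
        earned = 0
        for question_id, answer in answers.items():
            cur = conn.execute(
                "UPDATE questionnaire_questions SET answer = ? "
                "WHERE questionnaire_id = ? AND question_id = ?",
                (answer, qn_id, question_id))
            if cur.rowcount == 0:
                continue  # question is not part of this questionnaire
            row = conn.execute(
                "SELECT correct_answer FROM questions WHERE question_id = ?",
                (question_id,)).fetchone()
            if row is not None and row[0] == answer:
                earned += POINTS_PER_QUESTION
        conn.execute("UPDATE users SET points = points + ? WHERE user_id = ?",
                     (earned, user_id))
        return earned
```

Because each attempt gets a fresh questionnaire, the same question can be answered again later, as the PS in the question requires.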

I would suggest using form keys, also known as a NONCE (a number used once).
This means that each time a submission is made, a new form key (NONCE) is generated.
Each NONCE can only be used once and the NONCE must be valid for the form submission to work.
Most modern frameworks have something like this built in as standard.
See this article for a more in depth explanation of the idea:
http://net.tutsplus.com/tutorials/php/secure-your-forms-with-form-keys/
And this section of the Symfony2 CSRF protection on forms which uses the same technique:
http://symfony.com/doc/current/book/forms.html#csrf-protection
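A framework-free sketch of the form-key idea. In PHP the issued keys would live in `$_SESSION`; here a plain dict stands in for the session, and `issue_form_key` / `consume_form_key` are hypothetical names. `secrets.token_urlsafe` generates an unguessable token, and removing the key on first use is what makes a refresh or replay fail.

```python
import secrets


def issue_form_key(session: dict) -> str:
    """Create a one-time key, remember it in the session, and return it
    so it can be embedded in the page as a hidden form field."""
    key = secrets.token_urlsafe(32)
    session.setdefault("form_keys", set()).add(key)
    return key


def consume_form_key(session: dict, submitted_key: str) -> bool:
    """Accept the submission only if the key was issued and not yet used."""
    keys = session.get("form_keys", set())
    if submitted_key not in keys:
        return False  # unknown, forged, or already used (e.g. a page refresh)
    keys.remove(submitted_key)  # each key works exactly once
    return True
```

The score-update script would then add points only when `consume_form_key` returns True for the key posted with the answers.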

There are different possible solutions for problems like this. It is basically the same as with visitor counters or polls.
At the very least, you have to store somewhere whether the user has already triggered that script, and re-identify him on every page call.
The first and best method is a user account with a login, saving the state in PHP's $_SESSION or directly in the database, linked to the user_id / account_id. If your page doesn't have a login right now, this is probably too much for a small problem, but if you already have a login panel, this is by far the best solution.
Another method is to save a cookie. This may be a legal problem in some countries lately if the user doesn't agree to it beforehand, and cookies can be deleted, so they are easy to manipulate.
You can also save the user's IP address. It is harder to manipulate (it requires restarting the internet connection and such, and no one will do that a dozen times to fake your score counter), but if multiple people share the same internet connection, only one of them can achieve a score.
All of them have different advantages and disadvantages. Depending on how paranoid you are, you could also combine several of them to make cheating/abuse harder, but that decision is up to you.

Consider the following setup:
users
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| user_id | smallint(5) | NO | PRI | NULL | auto_increment |
| username | varchar(10) | NO | | NULL | |
+------------+-------------+------+-----+---------+----------------+
... You'll have more columns, but you get the idea
-
questions
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| qid | smallint(5) | NO | PRI | NULL | auto_increment |
| question | varchar(10) | NO | | NULL | |
| votes | smallint(5) | NO | | 0 | |
+----------+--------------+------+-----+---------+----------------+
-
votes
+--------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| qid | smallint(5) | NO | | NULL | |
| user_id| smallint(5) | NO | | NULL | |
+--------+-------------+------+-----+---------+-------+
In this setup, I'm user_id 1 and I'm voting for question 1.
When a user votes, their vote is inserted into votes:
INSERT INTO `votes` (`qid`,`user_id`) VALUES (1, 1);
To check whether they've already voted, simply run:
SELECT `user_id` FROM `votes` WHERE (`user_id`=1) AND (`qid`=1);
If that query returns any rows, we know the user has already voted, and we shouldn't process the duplicate vote.
Of course, this restricts us to one type of vote (positive or negative, whichever you decide to track). We can adapt votes to store the type of each vote:
ALTER TABLE votes ADD type ENUM('up', 'down') NOT NULL DEFAULT 'up';
This changes the table structure to the following:
+---------+-------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+-------------------+------+-----+---------+-------+
| qid | smallint(5) | NO | | NULL | |
| user_id | smallint(5) | NO | | NULL | |
| type | enum('up','down') | NO | | up | |
+---------+-------------------+------+-----+---------+-------+
And, again, adapt the lookup query:
SELECT `user_id` FROM `votes` WHERE (`user_id`=1) AND (`qid`=1) AND (`type`='up');
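The check-then-insert flow above, sketched with Python's sqlite3 in place of MySQL; `cast_vote` is a hypothetical helper, and a CHECK constraint replaces MySQL's ENUM. One addition that is not in the answer: a separate SELECT followed by an INSERT leaves a small window in which two simultaneous requests can both pass the check, so the sketch also declares a UNIQUE key on (qid, user_id, type) and treats the resulting integrity error as a duplicate.

```python
import sqlite3


def make_votes_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE votes (
            qid     INTEGER NOT NULL,
            user_id INTEGER NOT NULL,
            type    TEXT NOT NULL DEFAULT 'up' CHECK (type IN ('up', 'down')),
            UNIQUE (qid, user_id, type)  -- backstop against concurrent duplicates
        )""")
    return conn


def cast_vote(conn: sqlite3.Connection, qid: int, user_id: int, vote_type: str = "up") -> bool:
    """Record the vote unless this user already cast the same vote on this question."""
    already = conn.execute(
        "SELECT user_id FROM votes WHERE user_id = ? AND qid = ? AND type = ?",
        (user_id, qid, vote_type)).fetchone()
    if already is not None:
        return False  # duplicate vote: do not process it
    try:
        with conn:  # commits the INSERT, or rolls back on error
            conn.execute(
                "INSERT INTO votes (qid, user_id, type) VALUES (?, ?, ?)",
                (qid, user_id, vote_type))
    except sqlite3.IntegrityError:
        return False  # another request inserted the same vote first
    return True
```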

Check the $_SERVER['HTTP_REFERER'] value.
If it is the same page, the page was reloaded (do nothing).
If it is the previous page, update the database.
If it is another domain, the access is illegal (redirect to the first question).

The most foolproof system I see is based on tracking the whole lifetime of a given quiz.
If you store a "current question number" associated with the user and this particular quiz, you can easily filter out duplicate responses:
update_score ($question_number, $choice)
    if current question for this quiz and user is not set to $question_number
        ignore request
    else
        set choice for this specific question and update score
        increment current question (possibly reaching the end of the quiz)
When the last question is answered, the final score is displayed/recorded and the "current question" is reset to 0.
If the user wants to retry the test, the current question is set to 1 and the whole process restarts.
If the user wants to cancel the current test and restart, he/she can do so by going back to the quiz start page.
So any attempt to submit a second answer to the same question would fail (be it from an accidental refresh or a malicious attempt), until the quiz is finished and you can start back with question 1.
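A minimal in-memory sketch of that state machine, in Python rather than PHP. On a real site `current_question` would be stored per user and quiz in the database; the `QuizProgress` class, one point per correct answer, and the answer-key format are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QuizProgress:
    correct_answers: dict[int, str]   # question number (1-based) -> correct choice
    current_question: int = 1         # 0 means the quiz is finished
    score: int = 0
    choices: dict[int, str] = field(default_factory=dict)

    def update_score(self, question_number: int, choice: str) -> bool:
        """Accept an answer only for the current question; ignore everything else."""
        if question_number != self.current_question:
            return False  # refresh, replay, or out-of-order request
        self.choices[question_number] = choice
        if choice == self.correct_answers[question_number]:
            self.score += 1
        if question_number == len(self.correct_answers):
            self.current_question = 0  # finished: record/display self.score
        else:
            self.current_question += 1
        return True

    def restart(self) -> None:
        """Retry the test from question 1."""
        self.current_question = 1
        self.score = 0
        self.choices.clear()
```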

You can use a toggle session variable approach (call it flag), which is the simplest and has a good level of security against duplicate requests.
Make a script called updateScore.php. When the user logs in, set flag=1, which means the next update request will be processed in updateScore.php; at the end of that script, set flag=0. When the next page appears, set flag=1 again. This way you alternate the values. Also set a maximum update limit in your script: in your case you have 5 questions, so you can set it to 50 (+10 per question). You can use more complicated flag values to reduce the chance of guessing.
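A sketch of the toggle-flag idea in Python, where a plain dict stands in for PHP's `$_SESSION`. The function names are hypothetical, and the score is kept in the same dict only to keep the sketch self-contained (the question stores it in `users.points`). The cap of 50 follows the answer's 5 questions x 10 points.

```python
MAX_POINTS_PER_UPDATE = 50  # 5 questions x 10 points, as in the answer


def show_next_page(session: dict) -> None:
    # Rendering a question page (or logging in) re-arms the flag.
    session["flag"] = 1


def update_score(session: dict, points: int) -> bool:
    """Apply at most one score update per displayed page; repeats are ignored."""
    if session.get("flag") != 1:
        return False  # refresh or direct call without a fresh page
    session["flag"] = 0
    # Clamp so a forged request cannot add more than the page is worth.
    session["score"] = session.get("score", 0) + max(0, min(points, MAX_POINTS_PER_UPDATE))
    return True
```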

MySQL table structure with one row. Best handling for management and insertion

I'm just creating a social network for practising my skills (PHP, HTML, CSS, JavaScript, etc.).
While designing the database layout, a question came up that I'm unfortunately not able to solve.
I have a table called UserMain:
+------------+---------------------+
| Field | Type |
+------------+---------------------+
| u_id | bigint(20) unsigned |
| u_email | varchar(256) |
| u_password | varchar(30) |
| u_data | varchar(25) |
| u_friends | varchar(28) |
+------------+---------------------+
It stores the general data that is entered during registration.
I wanted to separate the users' data (first name, surname, sex, birthday, etc.) into another table called data, and of course the relationships between users into a table called friends. So I decided to create a data table and a friends table for every user via PHP, using the u_id above, and I came up with something like this, [u_id]_data:
+------------+---------------------+
| Field | Type |
+------------+---------------------+
| u_prename | varchar(20) |
| u_surname | varchar(20) |
| u_sex | boolean |
| u_birthday | DATE |
| u_avatar | varchar(28) |
+------------+---------------------+
I'll leave the friends table aside for now, because the problem obviously starts with the [u_id]_data table. A user has only one first name and surname, etc., so it is a one-row table. Now the question:
How do I handle the input of the table in relation to the primary key?
For me, creating a new "id int not null auto_increment pk" seems needless for a single row, so I don't know what combination of columns to use for the primary key.
Maybe you know better implementations of this design, but please consider the following:
It doesn't matter what new implementation you have, the only thing I don't want to have is a table called data in which I have the data of all users.
Alright, maybe I have a poor opinion of MySQL or I'm not well informed, but my idea of having multiple data tables comes from performance concerns.
My idea when changing or inserting data:
GetTheUsersId (search the user table for the ID; that could take a little while if I had, let's say, 10,000,000 users).
Once I have the [u_id], I can just use that user's data table to find what I'm searching for.
With a single table made up of (again) 10,000,000 rows, it would take longer. Now don't start laughing, as I'm taking the abstraction and dimensions to an extreme. It's just to support the idea of saving performance.

the only thing I don't want to have is a table called data in which I have the data of all users.
Could you tell us more about why this is not an option? It's a perfectly valid way to store user data.
But to answer your question, you probably don't need any key at all if there is only one row in a table. You are going to refer to the row by the table name in most cases anyway:
SELECT * FROM [uid]_data ...
SELECT * FROM [uid]_data JOIN ...
UPDATE [uid]_data ...
INSERT INTO [uid]_data ...
DELETE FROM [uid]_data --You're probably going to want to DROP the table as well
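For comparison, a sketch of the single-table layout the answer calls perfectly valid, with Python's sqlite3 in place of MySQL (columns abbreviated from the question). Because `u_id` is the primary key, fetching one user's row is an index lookup rather than a full scan, so it stays cheap as the table grows; `EXPLAIN QUERY PLAN` shows SQLite doing a SEARCH on the key. The helper name `query_plan` is hypothetical.

```python
import sqlite3


def make_user_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE UserMain (u_id INTEGER PRIMARY KEY, u_email TEXT NOT NULL);
        -- One row per user, instead of one table per user.
        CREATE TABLE user_data (
            u_id       INTEGER PRIMARY KEY REFERENCES UserMain(u_id),
            u_prename  TEXT,
            u_surname  TEXT,
            u_birthday TEXT
        );
    """)
    return conn


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> list[str]:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
```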

MySQL Occasionally Returns Wrong Value

This is a general question, one that I've been scratching my head on for a while now. My company's database handles about 2k rows a day. 99.9% of the time, we have no problem with the values that are returned in the different SELECT statements that are set up. However, on a very rare occasion, our database will "glitch" and return the value for a completely different row than what was requested.
This is a very basic example:
+---------+-------------------------+
| row_id | columnvalue |
+---------+-------------------------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 4 | 40 |
+---------+-------------------------+
SELECT columnvalue FROM table_name WHERE row_id = 1 LIMIT 1
Returns: 10
But on the very rare occasion, it may return: 20, or 30, etc.
I am completely baffled as to why it does this sometimes and would appreciate some insight on what appears to be a programming phenomenon.
More specific information:
SELECT
USERID, CONCAT( LAST, ', ', FIRST ) AS NAME, COMPANYID
FROM users, companies
WHERE users.COMPANYCODE = companies.COMPANYCODE
AND USERID = 9739 LIMIT 1
mysql> DESCRIBE users;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| USERID | int(10) | NO | PRI | NULL | auto_increment |
| COMPANYCODE| varchar(255)| NO | MUL | | |
| FIRST | varchar(255)| NO | MUL | | |
| LAST | varchar(255)| NO | MUL | | |
+------------+-------------+------+-----+---------+----------------+
mysql> DESCRIBE companies;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| COMPANYID | int(10) | NO | PRI | NULL | auto_increment |
| COMPANYCODE| varchar(255)| NO | MUL | | |
| COMPANYNAME| varchar(255)| NO | | | |
+------------+-------------+------+-----+---------+----------------+
What the results were supposed to be: 9739, "L----, E----", 2197
What the results were instead: 9739, "L----, E----", 3288
Basically, it returned the wrong company ID based on the join on COMPANYCODE. Given the nature of our company, I can't share any more information than that.
I have run this query 5k times and have made every modification to the code imaginable in order to generate the second set of results, and I have not been able to duplicate it. I'm not quick to blame MySQL: this has been happening (though rarely) for over 8 years, and I have exhausted all other possible causes. I suspected the results were manually changed after the query was run, but the timestamps state otherwise.
I'm just scratching my head as to why this can run perfectly 499k out of 500k times.

Now that we have a more realistic query, I notice right away that you are joining the tables, not on the primary key, but on the company code. Are we certain that the company code is being enforced as a unique index on companies? The Limit 1 would hide a second row if such a row was found.
From a design perspective, I would make the join on the primary key to avoid even the possibility of duplicate keys and put company code in as a unique indexed field for display and lookup only.
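A small reproduction of the suspected cause, with Python's sqlite3 standing in for MySQL (SQLite's `||` replaces CONCAT). The IDs echo the question; the company code and names are made up. If two rows in companies share a COMPANYCODE, the join returns two rows for the same user, and `LIMIT 1` without `ORDER BY` silently picks one. Creating a UNIQUE index on COMPANYCODE fails while the duplicate exists, which also makes it a quick way to find the problem.

```python
import sqlite3


def make_company_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (USERID INTEGER PRIMARY KEY, COMPANYCODE TEXT NOT NULL,
                            FIRST TEXT NOT NULL, LAST TEXT NOT NULL);
        CREATE TABLE companies (COMPANYID INTEGER PRIMARY KEY, COMPANYCODE TEXT NOT NULL,
                                COMPANYNAME TEXT NOT NULL);
        INSERT INTO users VALUES (9739, 'ACME', 'E', 'L');
        -- Two companies accidentally share the same code.
        INSERT INTO companies VALUES (2197, 'ACME', 'Acme Corp');
        INSERT INTO companies VALUES (3288, 'ACME', 'Acme Duplicate');
    """)
    return conn


# The question's query without LIMIT 1, so every matching row is visible.
QUERY = """
    SELECT USERID, LAST || ', ' || FIRST AS NAME, COMPANYID
    FROM users JOIN companies ON users.COMPANYCODE = companies.COMPANYCODE
    WHERE USERID = ?
"""


def find_duplicate_codes(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(
        "SELECT COMPANYCODE, COUNT(*) FROM companies "
        "GROUP BY COMPANYCODE HAVING COUNT(*) > 1").fetchall()
```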

This behavior is due either to an incredibly unlikely SERIOUS bug in MySQL, or to MySQL returning a result that is valid at the time the statement is run, with some other software garbling the displayed result.
One possibility to consider is that the row had been modified (by some other statement) at the time your SQL statement executed, and then the row was changed again later. (That's the most likely explanation we'd have for MySQL returning an unexpected result.)
The use of the LIMIT 1 clause is curious, because if the predicate uniquely identifies a row, there should be no need for the LIMIT 1, since the query is guaranteed to return no more than one row.
This leads me to suspect that row_id is not unique, and that the query actually returns more than one row. With the LIMIT clause, there is no guarantee as to which of the rows will get returned (absent an ORDER BY clause.)
Otherwise, the most likely culprit is outdated cache contents, or other problems in the code.
UPDATE
The previous answer was based on the example query given; I purposefully omitted the possibility that EMP was a view that was doing a JOIN, since the question originally said it was a table, and the example query showed just the one table.
Based on the new information in the question, I suggest that you OMIT the LIMIT 1 clause from the query. That will identify that the query is returning more than one row.
From the table definitions, we see that the database isn't enforcing a UNIQUE constraint on the COMPANYCODE column in the COMPANY table.
We also know there isn't a foreign key defined, due to the mismatch between the datatypes.
Normally, the foreign key would be defined referencing the PRIMARY KEY of the target table.
What we'd expect is for the users table to have a company_id column that references the id (primary key) column in the companies table.
(We note the datatype of the companycode column (int) matches the datatype of the primary key column in the companies table, and we note that the join condition is matching on the companycode column, even though the datatypes do not match, which is very odd.)

There are several reasons this could happen. I suggest you look at the assumptions you're making. For example:
If you're using GROUP BY and one of the columns isn't an aggregate or the grouping expression, you're going to get an unpredictable value in that column. Make sure you use an appropriate aggregation (such as MAX or MIN) to get a predictable result on each column.
If you're assuming a row order without making it explicit, and using LIMIT to get only the first row, the actual returned order of rows differs depending on that result's execution plan, which is going to differ in large resultsets based on the statistics available to the optimiser. Make sure you use ORDER BY in such situations.
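Both fixes, sketched with Python's sqlite3 standing in for MySQL; the `orders` table and its rows are made up. The first query aggregates every non-grouped column, and the second uses ORDER BY so that `LIMIT 1` picks a well-defined row.

```python
import sqlite3


def make_orders_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
        INSERT INTO orders (customer, amount) VALUES
            ('ann', 5), ('ann', 40), ('bob', 7), ('bob', 3);
    """)
    return conn


def largest_order_per_customer(conn: sqlite3.Connection) -> list[tuple]:
    # amount is wrapped in MAX(), so each group's value is well defined.
    return conn.execute(
        "SELECT customer, MAX(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()


def first_order_of(conn: sqlite3.Connection, customer: str):
    # ORDER BY makes "the first row" explicit before LIMIT picks it.
    return conn.execute(
        "SELECT id, amount FROM orders WHERE customer = ? ORDER BY id LIMIT 1",
        (customer,)).fetchone()
```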

What is a Parent table and a Child table in Database?

I just want to know what a parent table and a child table are in databases. Can you please show me an example so I understand how it works?
Thank You

Child tables and parent tables are just normal database tables, but they’re linked in a way that's described by a parent–child relationship.
It’s usually used to specify where one table’s value refers to the value in another table (usually a primary key of another table).
For example, imagine a news article. This could be represented by a table called articles that has fields for id, headline, body, published_date and author. But instead of placing a name in the author field, you could put the ID value of a user in a separate table (maybe called authors) that has information on authors such as id, name, and email.
Therefore, if you need to update an author's name, you only need to do so in the authors (parent) table, because the articles (child) table only contains the ID of the corresponding author record.
Hope this helps you understand better.
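The articles/authors example as a small sketch, with Python's sqlite3 in place of MySQL; the columns follow the answer and the data is made up. Renaming the author touches one row in the parent table, and every article picks up the new name through the join.

```python
import sqlite3


def make_news_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT);
        CREATE TABLE articles (
            id INTEGER PRIMARY KEY,
            headline TEXT NOT NULL,
            body TEXT,
            published_date TEXT,
            author INTEGER NOT NULL REFERENCES authors(id)  -- child -> parent
        );
        INSERT INTO authors VALUES (1, 'Jane Doe', 'jane@example.com');
        INSERT INTO articles (headline, author) VALUES ('First', 1), ('Second', 1);
    """)
    return conn


def bylines(conn: sqlite3.Connection) -> list[tuple]:
    # Each article shows whatever name its parent author row currently has.
    return conn.execute(
        "SELECT articles.headline, authors.name FROM articles "
        "JOIN authors ON articles.author = authors.id ORDER BY articles.id").fetchall()
```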

Be aware that you can have relationships that appear to be parent-child but are not, for instance when lookup tables are being used. The distinction is that in a true parent-child relationship, records typically don't stand on their own very well: they are detail records for the parent and are not useful without the parent table's info. A person can own multiple cars in the DMV database, but you wouldn't want records in the CARS table without a parent record in the OWNERS table; it would be nearly useless data.
On the other hand, if I am using a lookup table to expand a code to something more meaningful, or to constrain data entry to acceptable values, then the "child" record can still be useful (can stand alone) if the lookup table is deleted. I could still have the sex information as "M" or "F" even if I no longer have the lookup table to expand that to "Male" or "Female".

Parent - The entity on the "one" (/1) side of a relation with another table
Child - The entity on the "many" (/N/*) side of a relation with another table

A child table tends to be one where it has one or more foreign keys pointing at some other table(s). Note that a child table can itself be a parent to some OTHER table as well.

Those terms are used in database relationships.
For example, say you have two tables:
1. manifast
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| manifast_id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| description | text | NO | | NULL | |
| title | text | NO | | NULL | |
+-------------+------------------+------+-----+---------+----------------+
2. day_sequence
+-----------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+---------+----------------+
| day_sequence_id | int(11) unsigned | NO | PRI | NULL | auto_increment |
| day_number | int(11) | NO | | NULL | |
| day_start | int(11) | NO | | NULL | |
| manifast_id | int(11) unsigned | NO | | NULL | |
+-----------------+------------------+------+-----+---------+----------------+
If you want to connect those two tables, you need to use a command in the following format:
ALTER TABLE child_table_name ADD FOREIGN KEY (P_ID) REFERENCES parent_table_name (P_ID);
So in this case it becomes:
ALTER TABLE day_sequence ADD CONSTRAINT fk_manifast FOREIGN KEY (manifast_id) REFERENCES manifast (manifast_id);
In summary:
A child table is a table that has a foreign key referencing another table.
A parent table is the table that the child's foreign key references.
[Note: This answer only covers connecting two tables.]
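To see what the constraint buys you, here is a sketch using Python's sqlite3 in place of MySQL/InnoDB. Two differences are assumptions of this stand-in: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and it cannot add a foreign key with ALTER TABLE, so the constraint is declared in CREATE TABLE. The sample values are made up. With the constraint in place, a day_sequence row pointing at a non-existent manifast is rejected, and a manifast that still has day_sequence rows cannot be deleted.

```python
import sqlite3


def make_manifast_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE manifast (
            manifast_id INTEGER PRIMARY KEY,
            description TEXT NOT NULL,
            title TEXT NOT NULL
        );
        CREATE TABLE day_sequence (
            day_sequence_id INTEGER PRIMARY KEY,
            day_number INTEGER NOT NULL,
            day_start INTEGER NOT NULL,
            manifast_id INTEGER NOT NULL,
            CONSTRAINT fk_manifast FOREIGN KEY (manifast_id)
                REFERENCES manifast (manifast_id)
        );
    """)
    return conn


def try_execute(conn: sqlite3.Connection, sql: str, params=()) -> bool:
    """Run one statement; return False if it violates a constraint."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```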
