How to manage huge data inside a MySQL table - PHP

I have an issue with huge data in my table.
Suppose I have a table named employee_details with columns
Emp-name | Emp-email | Emp-mobile | Emp-designation | Emp-salary
and suppose it has 100,000 rows. How should I structure the table for best performance?

100,000 rows are not a big deal for MySQL. It is designed to handle millions of rows. Make sure your datatypes and indexes are set up properly and you should be good.
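For example, a minimal sketch of such a table (column names adapted to underscores, since hyphens are not valid in unquoted identifiers; the types are assumptions you would adjust to your data):

CREATE TABLE employee_details (
    emp_id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    emp_name        VARCHAR(100)  NOT NULL,
    emp_email       VARCHAR(255)  NOT NULL,
    emp_mobile      VARCHAR(20)   NOT NULL,
    emp_designation VARCHAR(50)   NOT NULL,
    emp_salary      DECIMAL(10,2) NOT NULL,  -- exact type for money, not FLOAT
    PRIMARY KEY (emp_id),
    INDEX idx_designation (emp_designation)  -- index whatever you filter or join on
) ENGINE=InnoDB;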

Related

Storing assignments between 2 tables in MySQL

I am wondering what the best solution is for storing relations between 2 tables in MySQL.
I have the following structure:
Table: categories
id | name          | etc...
_______________________________
1  | Graphic cards | ...
2  | Processors    | ...
3  | Hard Drives   | ...
Table: properties_of_categories
id | name
_____________________
1 | Capacity
2 | GPU Speed
3 | Memory size
4 | Clock rate
5 | Cache
Now I need them to be connected, and the question is: what is a better, more efficient, and lighter solution? This matters because there may be hundreds of categories and thousands of properties assigned to them.
Should I just create another table with a structure like
categoryId | propertyId
Or perhaps add another column to the categories table and store the properties in a text field like 1,7,19,23?
Or maybe create JSON files, named for example 7.json, with content like
{1,7,19,23}
As this question pertains to the relational world, I would suggest adding another table to store the many-to-many relationship between Category and Property.
You can also use a JSON column to store many values in one of the tables.
The JSON datatype was introduced in MySQL 5.7 and comes with various features for JSON data retrieval and update. However, if you are using an older version, you would need to manage it with a string column and some cumbersome string-manipulation queries.
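For example, with MySQL 5.7+ you could keep the property ids in a JSON array column (property_ids here is a hypothetical column on categories) and query it like this:

-- assumes categories has a JSON column property_ids holding e.g. [1, 7, 19, 23]
SELECT id, name
FROM categories
WHERE JSON_CONTAINS(property_ids, '7');  -- categories that have property 7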
The required structure depends on the relationship type: one-to-many, many-to-one, or many-to-many (M2M).
For a one-to-many, a foreign key (FK) on the 'many' side relates many items to the 'one' side. The reverse is correct for many-to-one.
For many-to-many (M2M) you need an intermediate relational (or junction) table, exactly as you suggest. This allows you to "reuse" both categories and properties in any combination. However, it requires slightly more SQL: two JOINs.
If you are looking for performance, then using FKs to primary keys (PKs) is very efficient and the queries stay simple. Using JSON would presumably require you to parse it in PHP and construct on-the-fly second queries, which would multiply your coding work and testing, data transfer, and CPU overhead, and limit scalability.
In your case I'm guessing that both "graphics cards" and "hard drives" could have e.g. "memory size" plus other properties, so you would need a M2M relational table as you suggest.
As long as your keys are indexed (which PKs are), your JOIN to this relational table will be very quick and efficient.
If you use CONSTRAINTs with your relations, then you ensure you maintain data integrity: you cannot delete a category to which a property is still "attached". This is a good feature in the long run.
Hundreds and thousands of records is a tiny amount for MySQL. You would use this technique even with millions of records. So there's no worry about size.
RDBMS databases are designed specifically to do this, so I would recommend using the native features rather than trying to do it yourself in JSON (unless I'm missing some new JSON MySQL feature! *)
* Since posting this, I have indeed stumbled across a new JSON MySQL feature. It seems, from a quick read, that you could implement all sorts of new structures and relations using JSON and virtual column keys, possibly removing the need for junction tables. This will probably blur the line between MySQL as an RDBMS and NoSQL.
The first solution is better when it comes to relational databases. You should create a table that pairs categories with properties (an m:n relationship, since many categories can share many properties).
You could structure the table like so:
CREATE TABLE categories_properties_match(
    categoryId INTEGER NOT NULL,
    propertyId INTEGER NOT NULL,
    PRIMARY KEY(categoryId, propertyId),
    FOREIGN KEY(categoryId) REFERENCES categories(id) ON UPDATE CASCADE ON DELETE CASCADE,
    FOREIGN KEY(propertyId) REFERENCES properties_of_categories(id) ON UPDATE CASCADE ON DELETE CASCADE
);
The primary key ensures that there will be no duplicate entries, i.e. entries that match one category to the same property twice.
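For reference, this design leads to the two-JOIN query mentioned above; for example, listing every property of one category:

SELECT c.name AS category, p.name AS property
FROM categories c
JOIN categories_properties_match m ON m.categoryId = c.id
JOIN properties_of_categories p ON p.id = m.propertyId
WHERE c.name = 'Processors';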

Copying records across multiple databases using PHP

I want to do something which may sound weird. I have a database for my main application which holds a few HTML templates created using my application. These templates are stored in a traditional RDBMS style: a table for template details and another for the page details of the template.
I have a similar application for a different purpose on another domain. It has a different database with the same structure as the main app. I want to move the templates from one database to the other, with all columns intact. I cannot simply export/import, as both databases have independent content of their own, i.e. they are the same in structure but differ in content. The 1st is the template table and the 2nd is the page table:
+----+--------------+
| id | templatename |
+----+--------------+
| 1  | File A       |
| 2  | File B       |
| 3  | File C       |
| 4  | File 123     |
| .. | ........     |
+----+--------------+

+----+-----------+-------------+
| id | page_name | template_id |  (template_id is a foreign key to the table above)
+----+-----------+-------------+
| 1  | index     | 1           |
| 2  | about     | 1           |
| 3  | contact   | 2           |
| 4  |           |             |
| .. | ......... | .........   |
+----+-----------+-------------+
I want to select records from the 1st database and insert them into the other. Both are on different domains.
I thought of writing a PHP script which would use two DB connections, one to select and the other to insert into the other DB, but I want to know if I can achieve this in any other, more efficient way using the command line or an export feature.
EDIT: for better understanding
I have two databases, A and B, both on different servers. Both have two tables, say tbl_site and tbl_pages. Both are independently updated on their domains via the application interface. I have a few templates created in database A, stored in tbl_site and tbl_pages as mentioned in the question above. I want the template records to be moved to database B.
You can do this in phpMyAdmin (and other query tools, but you mention PHP so I assume phpMyAdmin is available to you).
On the first database run a query to select the records that you want to copy to the second server. In the "Query results operations" section of the results screen, choose "Export" and select "SQL" as the format.
This will produce a text file containing SQL INSERT statements with the records from the first database.
Then connect to the second database and run the INSERT statements from the generated file.
As others have mentioned, you can use phpMyAdmin, but if your second database's table fields are different, you can write a small PHP script to do it for you. Please follow these steps.
Note: Consider two databases A and B; you want to move some data from A to B, and both are on different servers.
1) First, allow remote access on database A's server for database A. Also get a host, username and password for database A.
2) Now connect to that database using the mysqli extension. Since the other database A server has its own host, you have to use that host, not localhost. On most servers, the host is the IP of the other remote server.
3) Query the database table and get your results. After you get the results, close the database connection.
4) Connect to database B. Please note that in this case database B's host may be localhost. Check your server settings for that.
5) Process the data you got from database A and insert it into database B's table(s), as sketched below.
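A minimal sketch of those steps using mysqli (the hosts, credentials, and tbl_site columns are placeholders to replace with your own):

<?php
// Step 2: connect to the remote database A server (its IP, not localhost)
$dbA = new mysqli('203.0.113.10', 'userA', 'passA', 'database_a');

// Step 3: fetch the template records to move, then close the connection
$rows = $dbA->query('SELECT id, templatename FROM tbl_site')->fetch_all(MYSQLI_ASSOC);
$dbA->close();

// Step 4: connect to database B (often localhost on the second server)
$dbB = new mysqli('localhost', 'userB', 'passB', 'database_b');

// Step 5: insert each record; B assigns fresh auto-increment ids
$stmt = $dbB->prepare('INSERT INTO tbl_site (templatename) VALUES (?)');
foreach ($rows as $row) {
    $stmt->bind_param('s', $row['templatename']);
    $stmt->execute();
    // $dbB->insert_id is the new id; keep a map of old id => new id
    // so you can rewrite template_id when copying tbl_pages
}
$dbB->close();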
I use this same method to import data from different systems (Drupal to PrestaShop, Joomla to a customized system), and it works fine.
I hope this helps.
Export just the data of DB A (to a .sql file), or use a PHP script, which can then be automated if you need to do it again.
Result:
INSERT INTO table_A VALUES (1, 'File A');
....
INSERT INTO table_B VALUES (1, 'index', 1);
....
Be careful now when importing the data: if the same ids already exist you will get errors (keep this in mind). Make any mods to the script to solve these problems (remember, if you change an id in table_A you will have to change the foreign key in table_B). Again, this is a process which you might be forced to automate.
Run the insert scripts in db B
As my question was a bit different, I preferred to answer it myself. The above answers are relevant in different scenarios, so I won't say they are totally wrong.
I had to run a script to make the inserts happen with new ids in the target database.
To make it a bit easier and to avoid cross-domain requests to the database, I took a dump of the first database and restored it on the target.
Then I wrote a script to select records from one database and insert them into the other, i.e. the target, so the ids were taken care of automatically. The only problem (not a problem actually) was that I had to run the script for each record independently.

MySQL BLOB data in same table or not

I have one varchar and two BLOB types of data for recipes. I don't need relations between the data; for example, I don't need to know which meals need potato etc.
I'll get a meal's materials from the database, edit them, and save them again as a BLOB. Then I will create a binary text file (~100KB) on the fly and save it in another column named binary data.
So my question is: does splitting the table into two make sense? Does putting one BLOB in one table and the other BLOB in another table change performance (in theory)? Or does it change nothing except backup concerns?
+----+---------------------+----------------------------+----------------------+
| id | meal name (varchar) | materials (BLOB)           | binary data (BLOB)   |
+----+---------------------+----------------------------+----------------------+
| 1  | meatball            | (meat, potato, bread etc.) | (some binary files)  |
| 2  | omelette            | (potato, egg, etc.)        | (other binary files) |
+----+---------------------+----------------------------+----------------------+
If you will be using an ORM, better use the split-table approach.
Otherwise, when you ask for the materials, the ORM will usually fetch all available fields, so it reads the big and unnecessary "binary" objects too.
On the other side of things, if you'll be serving the binary results, a better approach would be to save them as files and serve them directly.
It's more a design choice than a specific performance improvement. This assumes your query is not doing a catch-all SELECT *; your queries should always target the specific columns you are interested in for a given purpose.
If you do not anticipate the BLOB data for a specific meal growing past your current expectation, then keeping it in one table is an appropriate choice. This assumes there is a one-to-one relationship between them.
However, if there is any chance you might need more BLOB objects per meal, then yes, I would consider splitting them out into a new table with cross-references. Sometimes it is better to be safe than sorry.
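If you do split, a hedged sketch of what that might look like (all names illustrative); note that a plain BLOB caps at 64KB, so your ~100KB files need MEDIUMBLOB either way:

CREATE TABLE meals (
    id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    meal_name VARCHAR(100) NOT NULL,
    materials BLOB,
    PRIMARY KEY (id)
);

CREATE TABLE meal_binaries (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    meal_id     INT UNSIGNED NOT NULL,  -- cross-reference to meals; allows several binaries per meal
    binary_data MEDIUMBLOB,             -- plain BLOB maxes out at 64KB, below the ~100KB files
    PRIMARY KEY (id),
    FOREIGN KEY (meal_id) REFERENCES meals(id)
);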

EAV vs. Column based organization for my data

I'm in the process of rebuilding an application (lone developer here) using PHP and PostgreSQL. For most of the data, I'm storing it using a table with multiple columns, one for each attribute. However, I'm now starting to build some of the tables for the content storage. The content in this case is multiple sections that each contain different data sets; some of the data is common and shared (and foreign key'd) and other data is very unique. In the current iteration of the application we have a table structure like this:
id | project_name | project_owner | site | customer_name | last_updated
-----------------------------------------------------------------------
1 | test1 | some guy | 12 | some company | 1/2/2012
2 | test2 | another guy | 04 | another co | 2/22/2012
Now, this works - but it gets hard to maintain for a few reasons. Adding new columns (happens rarely) requires modifying the database table. Audit/history tracking requires a separate table that mirrors the main table with additional information - which also requires modification if the main table is changed. Finally, there are a lot of columns - over 100 in some tables.
I've been brainstorming alternative approaches, including breaking out one large table into a number of smaller tables. That introduces other issues that I feel also cause problems.
The approach I am currently considering seems to be called the EAV model. I have a table that looks like this:
id | project_name | col_name | data_varchar | data_int | data_timestamp | update_time
--------------------------------------------------------------------------------------------------
1 | test1 | site | | 12 | | 1/2/2012
2 | test1 | customer_name | some company | | | 1/2/2012
3 | test1 | project_owner | some guy | | | 1/2/2012
...and so on. This has the advantage that I'm never updating, only inserting. Data is never over-written, only added. Of course, the table will eventually grow to be rather large. I have an 'index' table that lists the projects and is used to reference the 'data' table. However, I feel I am missing something large with this approach. Will it scale? I originally wanted to do a simple key -> value type table, but realized I need to be able to have different data types within the table. This seems manageable because the database abstraction layer I'm using will include a type that selects data from the proper column.
Am I making too much work for myself? Should I stick with a simple table with a ton of columns?
My advice is that if you can avoid using an EAV table, do so. They tend to be performance killers. They are also difficult to query properly, especially for reporting (yes, let me join to this table an unknown number of times to get all of the data out of it that I need and, oh by the way, I don't know what columns I have available, so I have no idea what columns the report will need to contain). It is also hard to enforce the kind of database constraints that you need to ensure data integrity (how do you ensure that the required fields are filled in, for instance?), and it can push you into using bad datatypes. It is far better in the long run to define tables that store the data you need.
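To make the reporting pain concrete, here is roughly what it takes to rebuild one logical row from the EAV table above (a sketch; eav_data is a stand-in name, and it ignores picking the latest version of each attribute):

SELECT project_name,
       MAX(CASE WHEN col_name = 'site'          THEN data_int     END) AS site,
       MAX(CASE WHEN col_name = 'customer_name' THEN data_varchar END) AS customer_name,
       MAX(CASE WHEN col_name = 'project_owner' THEN data_varchar END) AS project_owner
FROM eav_data
GROUP BY project_name;
-- every new attribute means editing every query like this one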
If you really need the functionality, then at least look into NoSQL databases, which are more optimized for this sort of undefined data.
Moving your entire structure to EAV can lead to a lot of problems down the line, but it might be acceptable for the audit-trail portion of your problem since often foreign key relationships and strict datatyping may disappear over time anyway. You can probably even generate your audit tables automatically with triggers and stored procedures.
Note, however, that reconstructing old versions of records is non-trivial with an EAV audit trail and will require a fair amount of application code. The database will not be able to do it by itself.
An alternative you could consider is to store all your data (new and old records) in the same table. You can either include audit fields in the same table and leave them NULL when unnecessary, or mark some rows in the table as "current" and keep the audit-related fields in another table. To simplify your application, you can create a view which only shows current rows and issue queries against the view.
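A minimal sketch of such a view, assuming a hypothetical is_current flag on the combined table:

CREATE VIEW current_projects AS
SELECT *
FROM projects_with_history   -- hypothetical combined current + audit table
WHERE is_current = TRUE;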
You can accomplish this with a joined table inheritance pattern. With joined table inheritance, you put common attributes into a base table along with a "type" column, and you can join to additional tables (which have the same primary key which is also a foreign key) based on type. Many Data-Mapper-Pattern ORMs have native support for this pattern, often called "polymorphism".
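A hedged sketch of joined table inheritance (all names illustrative): common fields live in a base table with a type discriminator, and each content-section type gets its own table sharing the base table's primary key.

-- base table: attributes shared by every section type
CREATE TABLE sections (
    id           SERIAL PRIMARY KEY,
    project_name TEXT NOT NULL,
    section_type TEXT NOT NULL,   -- discriminator telling you which table to join
    last_updated TIMESTAMP NOT NULL
);

-- one subtype table per section type; its PK doubles as an FK to the base
CREATE TABLE site_sections (
    id   INTEGER PRIMARY KEY REFERENCES sections(id),
    site INTEGER NOT NULL
);

SELECT s.project_name, ss.site
FROM sections s
JOIN site_sections ss ON ss.id = s.id
WHERE s.section_type = 'site';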
You could also use PostgreSQL's native table inheritance mechanism, but note the caveats carefully!

Can I use MySQL temporary tables to store search results?

I have a search page written in PHP, and it needs to search in the MySQL database, and the result need to be sortable. This search page will be accessed by many users (>1000 at any time).
However, it is not feasible to sort the search result in MySQL, as it would be very slow.
I'm thinking of storing each search result in its own table (not a MySQL temporary table), with the table name stored in another table for reference, like this:
| id | table_name | timeout |
-----------------------------
| 1 | result_1 | 10000 |
| 2 | result_2 | 10000 |
Then I can use these result tables to sort any search result whenever needed, without having to reconstruct (with some modification) the query.
Each table will be dropped after the specified timeout.
Assuming I cannot modify the structure of the existing tables used in the query, would this be a good solution, or are there better ways? Please advise.
Thanks
There's no need to go to the trouble of storing the results in a persistent database when you just want to cache search results in memory. Do you need indexed access to relational data? If the answer is no, don't store it in a MySQL database.
I know that phpBB (an open-source web forum which supports MySQL backends) uses a key-value store to back its search results. If the forum is configured to give you a link to a specific results page (with the search id hash in the URL's query string), then that link will be valid for a while but will eventually be flushed out of the cache, just like you want. It may be overkill to implement a full database abstraction layer if you're set on MySQL, though. Anyway:
http://wiki.phpbb.com/Cache
You could just use memcached or something similar to store the results data, and you can easily retrieve the data and sort it in PHP. There are also some PHP-specific cache frameworks that minimize the cost of loading and offloading data from the interpreter:
https://en.wikipedia.org/wiki/List_of_PHP_accelerators
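A rough sketch of that approach with PHP's Memcached extension (the key scheme, TTL, and runExpensiveSearch() are placeholders):

<?php
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$searchQuery = 'user input here';
$key = 'search:' . md5($searchQuery);   // one cache entry per distinct search
$results = $cache->get($key);

if ($results === false) {               // cache miss: hit MySQL once
    $results = runExpensiveSearch($searchQuery);  // hypothetical function that queries MySQL
    $cache->set($key, $results, 600);   // expire after 10 minutes, like the timeout column
}

// re-sort in PHP on each request instead of re-querying MySQL
usort($results, function ($a, $b) {
    return strcmp($a['name'], $b['name']);  // assumes each row has a 'name' field
});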
