MySQL BLOB data in same table or not - php

I have one varchar and two BLOB columns of data for recipes. I don't need relations between the data; for example, I don't need to know which meals need potato, etc.
I'll get a meal's materials from the database, edit them, and save them again as a BLOB. Then I will create a binary file (~100 KB) on the fly and save it in another column named binary data.
So my question is: does splitting the table into two make sense? Does putting one BLOB in one table and the other BLOB in another table change performance (in theory), or does it change nothing except backup concerns?
+-----+----------------------+-----------------------------+----------------------+
| id  | meal name (varchar)  | materials (BLOB)            | binary data (BLOB)   |
+-----+----------------------+-----------------------------+----------------------+
| 1   | meatball             | (meat, potato, bread etc.)  | (some binary files)  |
| 2   | omelette             | (potato, egg, etc.)         | (other binary files) |
+-----+----------------------+-----------------------------+----------------------+

If you will be using an ORM, the split-table approach is better.
Otherwise, when you ask for the materials, the ORM will usually fetch all available fields, so it also reads the big and unnecessary binary objects.
On the other side of things: if you'll be serving the binary content, a better approach would be to save it as files and serve them directly.
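If you go the file-serving route, a minimal sketch of what that can look like (table, column, and path names are invented for illustration; a binary_path column replaces the binary data BLOB):
<?php
// Sketch of the "save the files and serve them directly" idea: keep only a file
// path in the table and stream the file from disk. All names here are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=recipes', 'user', 'pass');

// On save: write the generated ~100 KB payload to disk and store its path.
$binaryData = '...the payload built on the fly...';
$path = '/var/data/meals/' . bin2hex(random_bytes(16)) . '.bin';
file_put_contents($path, $binaryData);
$pdo->prepare('UPDATE meals SET binary_path = ? WHERE id = ?')->execute([$path, 1]);

// On download: look the path up and stream the file (or hand it off to the web server).
$stmt = $pdo->prepare('SELECT binary_path FROM meals WHERE id = ?');
$stmt->execute([1]);
header('Content-Type: application/octet-stream');
readfile($stmt->fetchColumn());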

It's more a design choice than a specific performance improvement. This assumes your query is not doing a catch-all "SELECT *"; your queries should always target the specific columns you are interested in for a given purpose.
If you do not anticipate the BLOB data for a specific meal growing past your current expectation, then keeping it in one table is an appropriate choice. This assumes there is a one-to-one relationship between them.
However, if there is any chance you might need more BLOB objects per meal, then yes, I would consider splitting them out into a new table with cross-references. Sometimes it is better to be safe than sorry.
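If you do split, a rough sketch of the layout (all names invented), so the heavy column is only read when it is actually needed:
<?php
// Hypothetical split: the rarely needed binary payload lives in its own table,
// linked 1:1 to meals by the meal id.
$pdo = new PDO('mysql:host=localhost;dbname=recipes', 'user', 'pass');

$pdo->exec('CREATE TABLE meals (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    meal_name VARCHAR(255) NOT NULL,
    materials BLOB
) ENGINE=InnoDB');

$pdo->exec('CREATE TABLE meal_binaries (
    meal_id INT UNSIGNED PRIMARY KEY,
    binary_data MEDIUMBLOB,
    FOREIGN KEY (meal_id) REFERENCES meals(id) ON DELETE CASCADE
) ENGINE=InnoDB');

// Everyday queries touch only the small columns...
$stmt = $pdo->prepare('SELECT id, meal_name, materials FROM meals WHERE id = ?');
$stmt->execute([1]);

// ...and the ~100 KB payload is fetched only when required.
$stmt = $pdo->prepare('SELECT binary_data FROM meal_binaries WHERE meal_id = ?');
$stmt->execute([1]);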

Related

Storing assignments between 2 tables in MySQL

I am wondering what is the best solution to store relations between 2 tables in MySQL.
I have the following structure:
Table: categories
id | name          | etc...
---+---------------+--------
 1 | Graphic cards | ...
 2 | Processors    | ...
 3 | Hard Drives   | ...
Table: properties_of_categories
id | name
---+-------------
 1 | Capacity
 2 | GPU Speed
 3 | Memory size
 4 | Clock rate
 5 | Cache
Now I need to connect them, and the question is which solution is better, more efficient, and lighter, which matters because there may be hundreds of categories and thousands of properties assigned to them.
Should I just create another table with a structure like
categoryId | propertyId
Or perhaps add another column to the categories table and store the properties in a text field like 1,7,19,23
Or maybe create JSON files, named for example 7.json, with content like
{1,7,19,23}
As this question pertains to the relational world, I would suggest adding another table to store the many-to-many relationship between Category and Property.
You can also use a JSON column to store many values in one of the tables.
The JSON datatype was introduced in MySQL 5.7 and comes with various features for JSON data retrieval and update. However, if you are using an older version, you would need to manage it with a string column and some cumbersome string-manipulation queries.
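As a quick illustration of the MySQL 5.7 JSON route (assuming a hypothetical property_ids JSON column on categories holding an array of property ids):
<?php
// Hypothetical JSON alternative (MySQL 5.7+): categories.property_ids holds e.g. [1,7,19,23].
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// Find all categories that have property 3 ("Memory size") attached.
$stmt = $pdo->prepare(
    'SELECT id, name FROM categories WHERE JSON_CONTAINS(property_ids, ?)'
);
$stmt->execute(['3']);   // the candidate must itself be valid JSON, here the number 3
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);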
The required structure depends on the relationship type: one-to-many, many-to-one, or many-to-many (M2M).
For a one-to-many, a foreign key (FK) on the 'many' side relates many items to the 'one' side. The reverse is correct for many-to-one.
For many-to-many (M2M) you need an intermediate relational (or junction) table, exactly as you suggest. This allows you to "reuse" both categories and properties in any combination. However, it requires slightly more SQL: two JOINs.
If you are looking for performance, then using FKs to primary keys (PKs) is very efficient and the queries are pretty simple. Using JSON would presumably require you to parse the data in PHP and construct second queries on the fly, which would multiply your coding and testing work, data transfer, and CPU overhead, and limit scalability.
In your case I'm guessing that both "graphics cards" and "hard drives" could have e.g. "memory size" plus other properties, so you would need a M2M relational table as you suggest.
As long as your keys are indexed (which PKs are), your JOIN to this relational table will be very quick and efficient.
If you use CONSTRAINTs with your relations, then you ensure data integrity: you cannot delete a category to which a property is "attached". This is a good feature in the long run.
Hundreds and thousands of records is a tiny amount for MySQL. You would use this technique even with millions of records. So there's no worry about size.
RDBMS databases are designed specifically to do this, so I would recommend using the native features rather than trying to do it yourself in JSON. (Unless I'm missing some new JSON MySQL feature! *)
* Since posting this, I indeed stumbled across a new JSON MySQL feature. It seems, from a quick read, you could implement all sorts of new structures and relations using JSON and virtual column keys, possibly removing the need for junction tables. This will probably blur the line between MySQL as an RDBMS and NoSQL.
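For reference, a sketch of the two-JOIN query described above, assuming a junction table named category_property(categoryId, propertyId) as proposed in the question:
<?php
// List all properties attached to one category via the junction table.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$sql = 'SELECT p.id, p.name
        FROM categories c
        JOIN category_property cp ON cp.categoryId = c.id
        JOIN properties_of_categories p ON p.id = cp.propertyId
        WHERE c.id = ?';

$stmt = $pdo->prepare($sql);
$stmt->execute([1]);                      // e.g. "Graphic cards"
$properties = $stmt->fetchAll(PDO::FETCH_ASSOC);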
The first solution is better when it comes to relational databases. You should create a table that pairs categories with properties (an n:m relationship).
You could structure the table like so:
CREATE TABLE categories_properties_match (
    categoryId INTEGER NOT NULL,
    propertyId INTEGER NOT NULL,
    PRIMARY KEY (categoryId, propertyId),
    FOREIGN KEY (categoryId) REFERENCES categories(id) ON UPDATE CASCADE ON DELETE CASCADE,
    FOREIGN KEY (propertyId) REFERENCES properties_of_categories(id) ON UPDATE CASCADE ON DELETE CASCADE
);
The primary key ensures that there will be no duplicate entries, i.e. entries that match one category to the same property twice.
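A small usage sketch (connection details are placeholders) showing the composite key doing its job:
<?php
// Attaching a property to a category; the composite primary key rejects duplicates.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$attach = $pdo->prepare(
    'INSERT INTO categories_properties_match (categoryId, propertyId) VALUES (?, ?)'
);

$attach->execute([1, 3]);        // Graphic cards -> Memory size: inserted
try {
    $attach->execute([1, 3]);    // same pair again: rejected by the primary key
} catch (PDOException $e) {
    // SQLSTATE 23000: integrity constraint violation (duplicate entry)
    echo 'Already attached: ' . $e->getMessage();
}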

PostgreSQL and PHP: Storing large files in multiple schemas: BLOB or BYTEA

I need to store large files (from several MB to 1 GB) in a Postgres database. The database has multiple schemas. It looks like Postgres has two options to store large objects: LOB and BYTEA. However, we seem to hit problems with each of these options.
LOB. This works almost ideally: it can store up to 2 GB and allows streaming, so we do not hit memory limits in our PHP backend when reading the LOB. However, all blobs are stored in pg_catalog and are not part of the schema. This leads to a big problem when you try to use pg_dump with the options -n and -b to dump just one schema with its blobs. It dumps the schema data correctly, but then it includes ALL blobs in the database, not just the blobs that belong to the particular schema.
Is there a way to dump the single schema with its blobs using pg_dump or some other utility?
BYTEA. These are correctly stored per schema, so pg_dump -n works correctly; however, I cannot seem to find a way to stream the data. This means that there is no way to access the data from PHP if it is larger than the memory limit.
Is there any other way to store large data in Postgres that allows streaming and correctly works with multiple schemas per database?
Thanks.
Although using bytea doesn't support a streaming/file-style API, you can use it to fetch only parts of the content, so it supports "chunking".
You need to set the storage mode of your bytea column to 'external' to disable compression, and then you can use substring on the bytea column to fetch only a part of it. At least according to the documentation, this will DTRT (do the right thing) and efficiently access only the requisite part of the value on the database side:
http://www.postgresql.org/docs/current/static/storage-toast.html
So create a schema a bit like this:
create table media.entity(entity_id serial primary key, content bytea not null);
alter table media.entity alter column content set storage external;
And then to fetch 8 KB of the content:
select substring(content, 1, 8192) from media.entity where entity_id = 1;
select substring(content, 8193, 8192) from media.entity where entity_id = 1;
Unfortunately, fetches of TOAST data don't seem to be counted in the explain (buffers on) counts so it's hard to verify that the db is doing what the documentation says.
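To tie this back to the PHP memory-limit problem, here is a rough sketch of chunked reading with PDO (chunk size and connection details are arbitrary; schema as above):
<?php
// Read a large bytea value in 256 KB chunks via substring(), so the PHP memory
// limit is never an issue. Schema as in the example above; chunk size is arbitrary.
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$entityId  = 1;
$chunkSize = 256 * 1024;
$offset    = 1;                          // substring() offsets are 1-based

$stmt = $pdo->prepare(
    'SELECT substring(content, :off, :len) FROM media.entity WHERE entity_id = :id'
);

while (true) {
    $stmt->execute([':off' => $offset, ':len' => $chunkSize, ':id' => $entityId]);
    $chunk = $stmt->fetchColumn();
    if (is_resource($chunk)) {           // PDO_PGSQL may return bytea as a stream
        $chunk = stream_get_contents($chunk);
    }
    if ($chunk === false || $chunk === null || $chunk === '') {
        break;                           // no row, NULL content, or past the end
    }
    echo $chunk;                         // or fwrite() it to wherever it needs to go
    $offset += $chunkSize;
}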
The pg_catalog.pg_largeobject system table where the large objects are actually stored is essentially a list of per-object bytea chunks ordered by pageno, which is a sequential chunk number from 0 to N.
    Table "pg_catalog.pg_largeobject"
 Column |  Type   | Modifiers
--------+---------+-----------
 loid   | oid     | not null
 pageno | integer | not null
 data   | bytea   |
Indexes:
    "pg_largeobject_loid_pn_index" UNIQUE, btree (loid, pageno)
The maximum size of these chunks is 2048 bytes (it can be changed but at the cost of a server recompilation), which is quite small for blobs of several hundred megabytes.
So one option in your case would be to replicate a similar structure in your own schema, probably with larger chunks, and implement stream-like access by iterating over a list of pageno. Having smaller column contents is better in general, anyway. For example, it's not obvious that pg_dump deals nicely with large bytea contents in a single row in terms of client-side memory requirements.
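A rough sketch of that idea (all names invented): content split into ordered pages in your own schema, written and read one page at a time:
<?php
// Hypothetical per-schema chunk table mirroring pg_largeobject, but with 1 MB pages.
// This replaces storing the whole value in a single media.entity.content column.
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$pdo->exec('CREATE TABLE IF NOT EXISTS media.entity_chunk (
    entity_id integer NOT NULL,
    pageno    integer NOT NULL,
    data      bytea   NOT NULL,
    PRIMARY KEY (entity_id, pageno)
)');

// Write: split an incoming file into 1 MB pages.
$insert = $pdo->prepare(
    'INSERT INTO media.entity_chunk (entity_id, pageno, data) VALUES (?, ?, ?)'
);
$in = fopen('/tmp/bigfile.bin', 'rb');
for ($page = 0; ($data = fread($in, 1024 * 1024)) !== false && $data !== ''; $page++) {
    $insert->bindValue(1, 1, PDO::PARAM_INT);
    $insert->bindValue(2, $page, PDO::PARAM_INT);
    $insert->bindValue(3, $data, PDO::PARAM_LOB);   // bytea-safe binding
    $insert->execute();
}
fclose($in);

// Read: fetch one page at a time so only one chunk is ever held in memory.
$select = $pdo->prepare(
    'SELECT data FROM media.entity_chunk WHERE entity_id = ? AND pageno = ?'
);
for ($page = 0; ; $page++) {
    $select->execute([1, $page]);
    $data = $select->fetchColumn();
    if ($data === false) {
        break;                                      // no such page: done
    }
    echo is_resource($data) ? stream_get_contents($data) : $data;
}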

EAV vs. Column based organization for my data

I'm in the process of rebuilding an application (lone developer here) using PHP and PostgreSQL. For most of the data, I'm storing it using a table with multiple columns for each attribute. However, I'm now starting to build some of the tables for the content storage. The content in this case, is multiple sections that each contain different data sets; some of the data is common and shared (and foreign key'd) and other data is very unique. In the current iteration of the application we have a table structure like this:
id | project_name | project_owner | site | customer_name | last_updated
---+--------------+---------------+------+---------------+-------------
 1 | test1        | some guy      | 12   | some company  | 1/2/2012
 2 | test2        | another guy   | 04   | another co    | 2/22/2012
Now, this works - but it gets hard to maintain for a few reasons. Adding new columns (happens rarely) requires modifying the database table. Audit/history tracking requires a separate table that mirrors the main table with additional information - which also requires modification if the main table is changed. Finally, there are a lot of columns - over 100 in some tables.
I've been brainstorming alternative approaches, including breaking out one large table into a number of smaller tables. That introduces other issues that I feel also cause problems.
The approach I am currently considering seems to be called the EAV model. I have a table that looks like this:
id | project_name | col_name      | data_varchar | data_int | data_timestamp | update_time
---+--------------+---------------+--------------+----------+----------------+------------
 1 | test1        | site          |              | 12       |                | 1/2/2012
 2 | test1        | customer_name | some company |          |                | 1/2/2012
 3 | test1        | project_owner | some guy     |          |                | 1/2/2012
...and so on. This has the advantage that I'm never updating, always inserting. Data is never over-written, only added. Of course, the table will eventually grow to be rather large. I have an 'index' table that lists the projects and is used to reference the 'data' table. However, I feel I am missing something large with this approach. Will it scale? I originally wanted to do a simple key -> value type table, but realized I need to be able to have different data types within the table. This seems manageable because the database abstraction layer I'm using will include a type that selects data from the proper column.
Am I making too much work for myself? Should I stick with a simple table with a ton of columns?
My advice is that if you can avoid using an EAV table, do so. They tend to be performance killers. They are also difficult to query properly, especially for reporting ("let me join to this table an unknown number of times to get all of the data out of it that I need, and, oh by the way, I don't know what columns I have available, so I have no idea what columns the report will need to contain"). It is hard to get the kind of database constraints you need to ensure data integrity (how do you ensure that the required fields are filled in, for instance?), and it can push you into bad datatypes. It is far better in the long run to define tables that store the data you need.
If you really need that kind of flexibility, then at least look into NoSQL databases, which are more optimized for this sort of undefined data.
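To make the reporting pain concrete, this is roughly what reassembling a single logical row from the question's EAV table looks like (the table names projects and project_data are assumed; one LEFT JOIN per attribute, and picking only the newest row per attribute would complicate it further):
<?php
// Reconstructing one record from the EAV table takes one join per attribute.
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'user', 'pass');

$sql = "SELECT p.project_name,
               site.data_int     AS site,
               cust.data_varchar AS customer_name,
               own.data_varchar  AS project_owner
        FROM projects p
        LEFT JOIN project_data site ON site.project_name = p.project_name AND site.col_name = 'site'
        LEFT JOIN project_data cust ON cust.project_name = p.project_name AND cust.col_name = 'customer_name'
        LEFT JOIN project_data own  ON own.project_name  = p.project_name AND own.col_name  = 'project_owner'
        WHERE p.project_name = ?";

$stmt = $pdo->prepare($sql);
$stmt->execute(['test1']);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
// ...and every attribute added later means another LEFT JOIN here.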
Moving your entire structure to EAV can lead to a lot of problems down the line, but it might be acceptable for the audit-trail portion of your problem since often foreign key relationships and strict datatyping may disappear over time anyway. You can probably even generate your audit tables automatically with triggers and stored procedures.
Note, however, that reconstructing old versions of records is non-trivial with an EAV audit trail and will require a fair amount of application code. The database will not be able to do it by itself.
An alternative you could consider is to store all your data (new and old records) in the same table. You can either include audit fields in the same table and leave them NULL when unnecessary, or mark some rows in the table as "current" and keep the audit-related fields in another table. To simplify your application, you can create a view which only shows current rows and issue queries against the view.
You can accomplish this with a joined table inheritance pattern. With joined table inheritance, you put common attributes into a base table along with a "type" column, and you join to additional tables (whose primary key is also a foreign key to the base table) based on the type. Many Data-Mapper-pattern ORMs have native support for this pattern, often called "polymorphism".
You could also use PostgreSQL's native table inheritance mechanism, but note the caveats carefully!
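A minimal sketch of the joined-table-inheritance idea (all names invented): shared columns in a base table with a type discriminator, plus per-type extension tables keyed by the same id:
<?php
// Joined table inheritance, sketched for the "content sections" described above.
// All table and column names here are invented for illustration.
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'user', 'pass');

$pdo->exec('CREATE TABLE content_section (
    section_id   serial PRIMARY KEY,
    section_type text NOT NULL,              -- discriminator, e.g. report or gallery
    project_name text NOT NULL,
    updated_at   timestamptz NOT NULL DEFAULT now()
)');

$pdo->exec('CREATE TABLE report_section (
    section_id integer PRIMARY KEY REFERENCES content_section(section_id),
    summary    text,
    page_count integer
)');

// Fetch a section together with its type-specific columns.
$stmt = $pdo->prepare(
    'SELECT c.*, r.summary, r.page_count
     FROM content_section c
     JOIN report_section r USING (section_id)
     WHERE c.section_id = ?'
);
$stmt->execute([42]);
$section = $stmt->fetch(PDO::FETCH_ASSOC);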

Can I use MySQL temporary tables to store search results?

I have a search page written in PHP, and it needs to search in the MySQL database, and the result need to be sortable. This search page will be accessed by many users (>1000 at any time).
However, it is not feasible to sort the search result in MySQL, as it would be very slow.
I'm thinking of storing each search result in a temporary table (not a MySQL temporary table), with the table name stored in another table for reference, like this:
| id | table_name | timeout |
+----+------------+---------+
| 1  | result_1   | 10000   |
| 2  | result_2   | 10000   |
Then I can use these tables to sort any search result whenever needed, without having to reconstruct (and slightly modify) the query.
Each table will be dropped, according to the specified timeout.
Assuming I cannot modify the structure of the existing tables that are used in the query, would this be a good solution, or are there better ways? Please advise.
Thanks
There's no need to go to the trouble of storing the results in a persistent database when you just want to cache search results in memory. Do you need indexed access to relational data? If the answer is no, don't store it in a MySQL database.
I know that phpbb (an open-source web forum which supports MySQL backends) uses a key-value store to back its search results. If the forum is configured to give you a link to the specific results page (with the search id hash in the URL's query string), then that link will be valid for a while but will eventually be flushed out of the cache, just like you want. It may be overkill to implement a full database abstraction layer if you're set on MySQL, though. Anyway:
http://wiki.phpbb.com/Cache
You should just use memcached or something to store the results data, and you can easily retrieve the data and sort it in PHP. Also there are some PHP-specific cache frameworks that minimize the cost of loading and offloading data from the interpreter:
https://en.wikipedia.org/wiki/List_of_PHP_accelerators
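A small sketch of the memcached route (key scheme, TTL, and table names are arbitrary choices):
<?php
// Cache search results in memcached keyed by a hash of the query, then sort in PHP,
// so any sort order can reuse the same cached result set.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function searchProducts(PDO $pdo, Memcached $mc, string $term, string $sortBy): array
{
    $key  = 'search:' . md5($term);           // one cache entry per distinct search
    $rows = $mc->get($key);

    if ($rows === false) {                    // cache miss: hit MySQL once
        $stmt = $pdo->prepare('SELECT id, name, price FROM products WHERE name LIKE ?');
        $stmt->execute(['%' . $term . '%']);
        $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
        $mc->set($key, $rows, 600);           // expire after 10 minutes
    }

    usort($rows, function ($a, $b) use ($sortBy) {
        return $a[$sortBy] <=> $b[$sortBy];
    });
    return $rows;
}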

Website: What is the best way to store a large number of user variables?

I'm currently designing a website using PHP and MySQL, and as the site grows I find myself adding more and more columns to the users table to store various variables.
Which got me thinking: is there a better way to store this information? Just to clarify, the information is global and can be affected by other users, so cookies won't work; I'd also lose the information if they cleared their cookies.
The second part of my question is, if it does turn out that storing it in a database is the best way, would it be less expensive to have a large number of columns or rather to combine related columns into delimited varchar columns and then explode them in PHP?
Thanks!
In my experience, I'd rather get the database right than start adding comma-separated fields holding multiple items. Having to sift through multiple comma-separated fields is only going to hurt your program's efficiency and the readability of your code.
Also, if your table is growing too much, then perhaps you need to look into splitting it into multiple tables joined by foreign dependencies?
I'd create a user_meta table, with three columns: user_id, key, value.
I wouldn't go for the option of grouping columns together and exploding them. It's untidy and very unmanageable. Instead, maybe try spreading those columns over a few tables and using InnoDB's transaction feature.
If you still dislike the idea of frequently updating the database, and if this method complies with what you're trying to achieve, you can use APC's caching function to store (cache) information "globally" on the server.
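For example, a tiny APC sketch along those lines (the table name is made up; on newer installs the equivalent calls are apcu_fetch()/apcu_store()):
<?php
// Keep frequently read, server-global values in APC's shared memory and refresh
// them from the database only when the cache entry has expired.
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$settings = apc_fetch('global_settings', $hit);
if (!$hit) {
    $settings = $pdo->query('SELECT name, value FROM site_settings')
                    ->fetchAll(PDO::FETCH_KEY_PAIR);
    apc_store('global_settings', $settings, 300);   // cache for 5 minutes
}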
MongoDB (and its NoSQL cousins) are great for stuff like this.
The database is a perfectly fine place to store such data, as long as they're variables and not, say, huge image files. The database has all the optimizations and specifications for storing and retrieving large amounts of data. Anything you set up on the file-system level will always be beaten by what the database already has in terms of speed and functionality.
would it be less expensive to have a large number of columns or rather to combine related columns into delimited varchar columns and then explode them in PHP?
It's not really so much a performance question as a maintenance question, IMO: it's not fun to manage hundreds of columns. Storing such data, perhaps as serialized objects, in a TEXT field is a viable option, as long as it's 100% certain you will never have to run queries against that data.
But why not use a normalized user_variables table like so:
id | user_id | variable_name | variable_value
?
It is a bit more complex to query, but provides for a very clean table structure all round. You can easily add arbitrary user variables that way.
If you are doing a lot of queries like SELECT ... FROM users WHERE variable257 = 'green', you may have to stick with specific columns.
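A sketch of reading and writing that normalized table (assumes a UNIQUE key on user_id + variable_name):
<?php
// Read/write helpers for the normalized user_variables table sketched above.
// Assumes a UNIQUE KEY on (user_id, variable_name).
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

function setUserVariable(PDO $pdo, int $userId, string $name, string $value): void
{
    $sql = 'INSERT INTO user_variables (user_id, variable_name, variable_value)
            VALUES (?, ?, ?)
            ON DUPLICATE KEY UPDATE variable_value = VALUES(variable_value)';
    $pdo->prepare($sql)->execute([$userId, $name, $value]);
}

function getUserVariables(PDO $pdo, int $userId): array
{
    $stmt = $pdo->prepare(
        'SELECT variable_name, variable_value FROM user_variables WHERE user_id = ?'
    );
    $stmt->execute([$userId]);
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);   // name => value
}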
The database is definitely the best place to store the data. (I'm assuming you were thinking of storing it in flat files otherwise) You'd definitely get better performance and security from using a DB over storing in files.
With regard to storing your data in multiple columns or delimiting it: it's a personal choice, but you should consider a few things:
If you're going to delimit the items, you need to think about what you're going to delimit them with (something that's not likely to crop up within the text you're delimiting)
I often find that it helps to try and visualise whether another programmer of your level would be able to understand what you've done with little help.
Yes, as Pekka said, if you want to perform queries on the stored data, you should stick with the separate columns
You may also get a slight performance boost from not retrieving and parsing ALL your data every time if you just want a couple of fields of information
I'd suggest going with the separate columns, as it offers much greater flexibility in the future. And there's nothing worse than having to drastically change your data structure and migrate information down the track!
I would recommend setting up a memcached server (see http://memcached.org/). It has proven to be viable with lots of the big sites. PHP has two extensions that integrate a client into your runtime (see http://php.net/manual/en/book.memcached.php).
Give it a try, you won't regret it.
EDIT
Sure, this will only be an option for data that's frequently used and would otherwise have to be loaded from your database again and again. Keep in mind though that you will still have to save your data to some kind of persistent storage.
A document-oriented database might be what you need.
If you want to stick to a relational database, don't take the naïve approach of just creating a table with oh so many fields:
CREATE TABLE SomeEntity (
    ENTITY_ID    CHAR(10) NOT NULL,
    PROPERTY_1   VARCHAR(50),
    PROPERTY_2   VARCHAR(50),
    PROPERTY_3   VARCHAR(50),
    ...
    PROPERTY_915 VARCHAR(50),
    PRIMARY KEY (ENTITY_ID)
);
Instead, define an Attribute table:
CREATE TABLE Attribute (
    ATTRIBUTE_ID  CHAR(10) NOT NULL,
    DESCRIPTION   VARCHAR(30),
    /* optionally */
    DEFAULT_VALUE /* whatever type you want */,
    /* end_optionally */
    PRIMARY KEY (ATTRIBUTE_ID)
);
Then define your SomeEntity table, which only includes the essential attributes (for example, required fields in a registration form):
CREATE TABLE SomeEntity (
    ENTITY_ID   CHAR(10) NOT NULL,
    ESSENTIAL_1 VARCHAR(30),
    ESSENTIAL_2 VARCHAR(30),
    ESSENTIAL_3 VARCHAR(30),
    PRIMARY KEY (ENTITY_ID)
);
And then define a table for those attributes that you might or might not want to store.
CREATE TABLE EntityAttribute (
    ATTRIBUTE_ID    CHAR(10) NOT NULL,
    ENTITY_ID       CHAR(10) NOT NULL,
    ATTRIBUTE_VALUE /* the same type as Attribute.DEFAULT_VALUE;
                       if you didn't create that field, then any type */,
    PRIMARY KEY (ATTRIBUTE_ID, ENTITY_ID)
);
Evidently, in your case, SomeEntity is the user.
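For completeness, a hedged sketch of reading one user's optional attributes with this layout, falling back to the optional DEFAULT_VALUE column when no EntityAttribute row exists:
<?php
// Read one entity's optional attributes, using DEFAULT_VALUE (if you created it)
// as the fallback when no EntityAttribute row exists.
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

$sql = 'SELECT a.ATTRIBUTE_ID,
               a.DESCRIPTION,
               COALESCE(ea.ATTRIBUTE_VALUE, a.DEFAULT_VALUE) AS value
        FROM Attribute a
        LEFT JOIN EntityAttribute ea
               ON ea.ATTRIBUTE_ID = a.ATTRIBUTE_ID AND ea.ENTITY_ID = ?';

$stmt = $pdo->prepare($sql);
$stmt->execute(['USER000001']);          // ENTITY_ID is CHAR(10) in the schema above
$attributes = $stmt->fetchAll(PDO::FETCH_ASSOC);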
Instead of MySQL you might consider using a triplestore or a key-value store.
That way you get the benefits of having all the multithreading, multiuser, performance, and caching voodoo figured out, without the trouble of trying to figure out ahead of time what kind of values you really want to store.
Downsides: it's a bit more costly to figure out the average salary of all the people in Idaho who also own hats.
It depends on what kind of user info you are storing. If it's session-pertinent data, use PHP sessions in coordination with a custom session save handler to store your session data in a single data field in the DB.
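A rough sketch of that approach with a custom handler (PHP 8 syntax; an assumed table sessions(id VARCHAR(128) PRIMARY KEY, data BLOB, updated_at INT)):
<?php
// PHP 8 sketch of a DB-backed session handler. The table is assumed to be:
//   CREATE TABLE sessions (id VARCHAR(128) PRIMARY KEY, data BLOB, updated_at INT)
class DbSessionHandler implements SessionHandlerInterface
{
    public function __construct(private PDO $pdo) {}

    public function open(string $path, string $name): bool { return true; }
    public function close(): bool { return true; }

    public function read(string $id): string|false
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute([$id]);
        $data = $stmt->fetchColumn();
        return $data === false ? '' : $data;   // empty string means "no session yet"
    }

    public function write(string $id, string $data): bool
    {
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)'
        );
        return $stmt->execute([$id, $data, time()]);
    }

    public function destroy(string $id): bool
    {
        return $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
    }

    public function gc(int $max_lifetime): int|false
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < ?');
        $stmt->execute([time() - $max_lifetime]);
        return $stmt->rowCount();
    }
}

$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
session_set_save_handler(new DbSessionHandler($pdo), true);
session_start();
$_SESSION['preferences'] = ['theme' => 'dark'];   // now persisted in the sessions table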
