I am developing a URL bookmark application (PHP and MySQL) for myself. With this application I will store URLs in MySQL.
The question is: should I store URLs in a TEXT column, or should I first parse the URL and store its components (host, path, query, fragment) in separate columns of one table? The latter would also give me the chance to generate statistics by grouping on hosts and so on. Or maybe I should store hosts in a separate table and use a JOIN. What do you think?
Thanks.
I'd go with storing them in TEXT columns to start. As your application grows, you can build up the parsing and analysis functionality if you really want to. From what it sounds like, it's all just pie-in-the-sky functionality right now. Do what you need to get the basic application up and running first so that you have something to work with. You can always refactor and go from there.
The answer depends on how you plan to use this data in the future.
If you want to analyze the different parts of the URL, splitting them up is the way to go.
If not, both the INSERT and the SELECT will be faster if you store them in just one field.
If you know the URLs are never longer than 255 characters, VARCHAR(255) will perform better than TEXT.
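A minimal sketch of the two options (table and column names are just illustrative):

-- Option 1: the whole URL in one field
CREATE TABLE urls (
  url_id INT AUTO_INCREMENT PRIMARY KEY,
  url    VARCHAR(255) NOT NULL
);

-- Option 2: the URL split into its components
CREATE TABLE urls_split (
  url_id       INT AUTO_INCREMENT PRIMARY KEY,
  scheme       VARCHAR(16)  NOT NULL,
  host         VARCHAR(255) NOT NULL,
  path         VARCHAR(255) NOT NULL DEFAULT '',
  query_string VARCHAR(255) NOT NULL DEFAULT '',
  fragment     VARCHAR(255) NOT NULL DEFAULT ''
);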
If you seriously think that you're going to be using it for getting interesting data, then sure, do it as a series of columns. Honestly, I'd say it'd probably just be easier to do it as a single column, though.
Also, don't forget that it's easy to convert back and forth later if you want to. Single to multiple is just a SELECT, a regex, and an INSERT into another table; multiple to single is just an INSERT ... SELECT with CONCAT, as sketched below.
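For instance, the multiple-to-single direction could look like this (assuming the two illustrative tables sketched above):

-- Reassemble the full URL from its parts in one statement
INSERT INTO urls (url)
SELECT CONCAT(scheme, '://', host, path,
              IF(query_string = '', '', CONCAT('?', query_string)),
              IF(fragment = '', '', CONCAT('#', fragment)))
FROM urls_split;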
Well, I have a big dilemma about how to store product lists when I need to link those lists to an ID or something like that.
For this first example I'm going to use the user's cart.
I've always used this, even though I never liked it:
ID | PRODUCT
12 | Ring
12 | Necklace
12 | Bracelet
But lately I've been thinking about storing arrays in MySQL. While it sounds like a good idea at first glance, by using arrays I'd only be able to manage the carts through PHP, by creating some kind of control panel or something.
Unfortunately, there is a con to it. Although the whole system would take less space than the other way, I wouldn't be able to handle things through MySQL itself. For example, if someone makes an order, I wouldn't be able to SUM the price*quantity values to get the order total.
What is the best way? Is there another way?
As soon as you store unstructured information in MySQL you lose most of the benefits of using a relational database, and you are left with an over-engineered file system (with, it's true, excellent multi-user capabilities). If you aren't familiar with SQL (which I suspect is the case) you'll initially think you're speeding up development. But one day, if the shop hopefully grows, you'll realise that retrieving everything for further PHP post-processing doesn't scale well.
While it can certainly make sense to outsource certain features (not the complete app) to a NoSQL database if you have a high-concurrency site, if you use a relational database you'd better use it as it's meant to be used: with properly structured and normalised information.
If you are going to need these operations over elements:
searching
sorting
filtering elements
aggregate functions
then I would recommend storing the elements in separate rows. This way you get full control over the elements of "your arrays" (see the sketch below).
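A minimal sketch of what that looks like for the cart example (names are illustrative):

-- One row per item in a cart
CREATE TABLE cart_items (
  cart_id  INT           NOT NULL,
  product  VARCHAR(100)  NOT NULL,
  price    DECIMAL(10,2) NOT NULL,
  quantity INT           NOT NULL,
  PRIMARY KEY (cart_id, product)
);

-- The order total the asker wants becomes a one-liner:
SELECT SUM(price * quantity) AS order_total
FROM cart_items
WHERE cart_id = 12;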
I wouldn't be able to handle things through MySQL itself. For example, if someone makes an order, I wouldn't be able to SUM the price*quantity values to get the order total.
It seems you understand why your 'arrays' idea isn't a good one :)
Consider that XML (or JSON or similar) was invented because distributed systems get disconnected. It is OK to store such 'complex' values in a database, and even better if the DBMS supports XML as a data type.
However, if you need to query the 'complex' value inside the DBMS, then you should be looking for first-class support for such data types. For example, most SQL DBMSs support a timestamp/datetime temporal type: being able to declare a column of that type and compare two values of it is one thing, but what you really need are the temporal functions that will, e.g., extract time granules (years, days, etc.), handle formats (weekday etc.), and support UTC and time zones.
For your 'arrays' I'm guessing you will need to handle serialization and storage yourself, plus all that entails, e.g. converting to a delimited string, escaping the delimiter characters, and storing the result in a text column. Once you have done all that work, you will be left with no support on the DBMS side.
In CodeIgniter I am looking for a way to do some post processing on queries on a specific table/model. I can think of a number of ways of doing this, but I can't figure out any particularly nice way that would work well in the long run.
So what I am trying to do is something like this:
I have a table with a serial number column which is stored as an int (so it can be used as AI and PK, which might or might not be a great idea, but that's how it is right now anyway). In all circumstances where this serial number is used (in views, search queries, the real world, etc.) it is used with a three-letter prefix. I could add this in the view or wherever needed, but I guess my question is more about what would be the best design choice. Is there a good way to add a column ('ABC' + serial) after queries so that it is mostly transparent to the rest of the application? Perhaps something similar to CakePHP's afterFind() hook?
You can do that in the query itself:
SELECT CONCAT(prefix, serial_number) AS prefixed FROM table_name
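If you want this to stay transparent to the rest of the application, one option is to wrap it in a view and query the view instead (a sketch; the table and column names, and the 'ABC' prefix from the question, are assumptions):

-- Every SELECT against this view gets the prefixed serial for free
CREATE VIEW table_name_with_prefix AS
SELECT t.*, CONCAT('ABC', t.serial_number) AS prefixed_serial
FROM table_name t;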
Basically, I have tons of files with some data. Each differs, some lack some variables (null), etc., classic stuff.
The part where it gets somewhat interesting is that, since each file can have up to 1000 variables, and has at least ~800 values that are not null, I thought: "Hey, I need 1000 columns." Another thing to mention: they are integers, bools, text, everything. They differ in size and type. Each variable is under 100 bytes in every file, although they vary.
I found this question Work around SQL Server maximum columns limit 1024 and 8kb record size
I'm unfamiliar with the capacities of SQL servers and with table design, but the thing is: the people who answered that question say the asker should reconsider the design, and I can't do that. I can, however, convert what I already have, as long as I still end up with those 1000 variables.
I'm willing to use any SQL server, but I don't know which suits my requirements best. If doing something else is better, please say so.
What I need to do with this data is look at it, compare it, and search within it. I don't need the ability to modify it. I thought of just keeping the files as plain text and reading from them, but that requires "seconds" of PHP runtime just to view data out of a "few" of these files, and that is too much, not even considering the fact that I need to check about 1000 or more of these files to do any search.
So the question is: what is the fastest way of having 1000++ entities with 1000 variables each, and searching/comparing any variable I wish within them, etc.? And if it's SQL, which SQL server works best for this sort of thing?
Sounds like you need a different kind of database for what you're doing. Consider a document database, such as MongoDB, or one of the other not-only-SQL database flavors that allows for manipulation of data in different ways than a traditional table structure.
I just saw the note mentioning that you're only reading as well. I've had good luck with Solr on a similar dataset.
You want to use an EAV (entity-attribute-value) model. This is pretty common.
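For reference, a minimal EAV sketch (table and column names are illustrative; the next answer describes essentially the same layout):

-- One row per file/entity
CREATE TABLE entity (
  entity_id INT PRIMARY KEY
);

-- One row per non-null variable of an entity
CREATE TABLE entity_attribute (
  entity_id  INT          NOT NULL,
  attr_name  VARCHAR(64)  NOT NULL,
  attr_value VARCHAR(100),            -- everything stored as text; cast on read
  PRIMARY KEY (entity_id, attr_name),
  FOREIGN KEY (entity_id) REFERENCES entity (entity_id)
);

-- "Find every entity where variable X has value Y":
SELECT entity_id
FROM entity_attribute
WHERE attr_name = 'X' AND attr_value = 'Y';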
You are asking for the best; I can give an answer (how I solved it), but I can't say if it is the 'best' way in your environment. I had the problem of collecting inventory data from many thousand PCs (no, not the NSA; kidding).
My solution was:
Table FILE:
one row per PC (one per file, for you?), PK FILE_ID
Table FILE_DATA:
one row per variable of a file; PK (FILE_ID, ATTR_ID), plus the columns ATTR_NAME, ATTR_VALUE (and ATTR_TYPE)
The table FILE_DATA was, somehow, big (>1e6 rows), but the DB handled that fast.
HTH
EDIT:
My answer was pretty short; I want to add some more information about my (still working) solution:
The table 'per info source' has more fields than just the PK FILE_ID, i.e. ISOURCE and ITYPE, where ISOURCE and ITYPE describe where the data came from (I had many sources) and what basic information type it is/was. This helps to bring structure into queries: I did not need to include data from 'switches' or 'monitors' when searching for USB devices (edit: today, probably yes).
The attributes table had more fields too. I'll mention both of these here: ISOURCE and ITYPE. Yes, the same names as above, but with a slightly different meaning; the same idea behind them.
What you would have to put into these fields definitely depends on your data.
I am sure that if you take a closer look at what information you have to collect, you will find some 'key values' for it.
For storage, XML is probably the best way to go. There is really good support for XML in SQL.
For queries, if they are direct SQL queries, 1000+ rows isn't a lot and XML will be plenty fast. If you're moving towards a million+ rows, you're probably going to want to take the data that is most selective out of the XML and index that separately.
Link: http://technet.microsoft.com/en-us/library/hh403385.aspx
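A rough sketch of what this can look like in SQL Server (all names are made up for illustration):

-- One XML document per file; FileId stays a normal, indexable column
CREATE TABLE FileData (
    FileId INT PRIMARY KEY,
    Vars   XML NOT NULL
);

-- Pull a single variable out of the XML for querying/comparison
SELECT FileId,
       Vars.value('(/vars/var[@name="speed"]/text())[1]', 'INT') AS speed
FROM   FileData
WHERE  Vars.exist('/vars/var[@name="speed"]') = 1;

If one variable turns out to be queried constantly, that is the value you would promote into its own separately indexed column, as suggested above.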
Can I store an array such as [1,2,3,4], or some JSON like {"name":"John Johnson","street":"Oslo","phone":"5551234567"}, as one record in MySQL? If yes, is it really a good approach, or is it better to separate those out, store them in individual columns, and define the relationships? What is a good method in PHP to handle this?
EDIT: I want to make clear what I want to do. I am using jQuery UI and want to store the positions of draggables and droppables in array format. The draggables and droppables will be identified by their ID along with their position, so that they can be queried back and recreated on the next HTML page whenever the user wants. I am not sure whether I have to explode those arrays into separate fields like FROM_TOP and FROM_LEFT, or whether storing one array is sufficient. I was also a little concerned about efficiency.
You can; it is a bad idea. This violates something called normalization. You should have things separated out into individual columns and tables (as needed). Maintaining the data inside the JSON would require a lot of work: just think, you would have to read the whole record, update the JSON in PHP, and then write the whole record back.
Say I just want to update the phone number...
Bad:
UPDATE `table`
SET `Data` = '{"name":"John Johnson","street":"Oslo","phone":"5551234567foo"}'
Good:
UPDATE `table`
SET `Phone` = '5551234567foo'
Another reason this would be bad: querying would be slow with LIKE and very hard to make deterministic. Say you also have a fax number field... which number would a match refer to with the bad example?
Bad:
SELECT * FROM `table` WHERE `Data` LIKE '%5551234567%'
Good:
SELECT * FROM `table` WHERE `Phone` = '5551234567'
PHP really doesn't deal with the schema or handling the data directly, so there is no method to directly handle that. It's all done in the SQL.
If you know how to encode and decode JSON, you can store it as TEXT in your MySQL database...
It goes against the norms, as usually you would normalize it into different tables... but if this is your only way, then use TEXT.
It really depends what you are going to do with the data...
You can serialize() your data and store it in one database field, or you can separate it out and make it accessible from MySQL connections (use it to perform searches, etc.). If it is always standard (i.e. always has the same fields), save it in its own fields; if it's just a random array, the first method is fine.
Your idea is not wrong. Sometimes you may have to keep data in such a format; I have seen this in many applications, e.g. keeping a regular expression or a pattern like this. But it is a bad idea considering database design guidelines. If you ever want to run an update query on this field, or change a portion of the text in the field, etc., it would be a difficult task. If possible, keep the data in a separate table with many rows.
Thanks
I use
serialize($item);
to store an array of data in one field,
then
unserialize($row['field_name']);
to retrieve it.
I found this to be an easy way to store a copy of a data set for revision.
I would not use this as a primary method of storing data in a table, though. Once I put it in the column, I don't try to manipulate it again; I just use it for storage.
Yes you can. Just remember that you should not try to search within those fields. Horrible performance would be expected. Is it perfect design following normalization? No, but it still works.
Doctrine2 has a field type to do this automatically and abstract the serialization logic away from you. However, it means having to use the whole ORM.
If you go this route, I would advise you to handle database migrations using scripts. This way, if your requirements ever change and you need to search one of the contained values, you can always write a simple migration and update your data. Modifying a schema should not be a burden.
Yes you can; MySQL supports JSON natively, see https://dev.mysql.com/doc/refman/5.7/en/json.html.
As to the "horrible" performance: it's not really an issue now; we use this on our production servers, storing and querying millions of rows an hour. MySQL has also branched out into NoSQL territory, so the claim that normalization is always right is nonsense.
INSERT INTO TABLENAME(JSON_COLUMN_NAME) VALUES (
'{"name":"John Johnson","street":"Oslo, "phone":"5551234567"}'
);
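A sketch of querying the JSON back out with MySQL 5.7's JSON functions (the ->> operator, shorthand for JSON_UNQUOTE(JSON_EXTRACT(...)), needs MySQL 5.7.13 or later; the table and column names match the example above):

SELECT JSON_COLUMN_NAME->>'$.name' AS name
FROM   TABLENAME
WHERE  JSON_COLUMN_NAME->>'$.phone' = '5551234567';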
Try using json_encode when putting the array in the database and json_decode when extracting from the database.
I'm using PHP and MySQL, and I'm building a database that needs to store URLs. I'm going to have to do a lot of work with the parts of the URL. It's going to end up being millions of records.
My question is, what makes more sense:
store the parts of the URL in several fields, negating the need to parse
store the whole URL in one field, and parse it out every time
Thanks for any advice you can offer!
The rule of thumb when you design a new database schema is: do not denormalize until it is proven necessary.
So start with the most normalized and simplest schema, and only after you experience actual performance issues, profile your application and solve the particular bottleneck.
It depends on your querying pattern. If you're going to do things like SELECT * FROM urls WHERE hostname = ..., then you obviously want the parts split into their own fields. If you're never going to slice and dice your data using queries, then storing just the full URL itself is fine. But you never want to parse db-side; it's always better to store your data already parsed if you find yourself parsing db-side.
Database structure really depends on the queries you are planning to run.
If you do need to search by URL parts, like the domain name, you need to keep them somewhere else, outside of the big urls table(s), so that you can run these queries against a smaller table, as sketched below.
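For example, a small hosts table joined to the big urls table (names are illustrative):

CREATE TABLE hosts (
  host_id  INT AUTO_INCREMENT PRIMARY KEY,
  hostname VARCHAR(255) NOT NULL UNIQUE
);

CREATE TABLE urls (
  url_id  INT AUTO_INCREMENT PRIMARY KEY,
  host_id INT NOT NULL,
  url     TEXT NOT NULL,
  FOREIGN KEY (host_id) REFERENCES hosts (host_id)
);

-- Per-host statistics hit the small table plus an indexed join:
SELECT h.hostname, COUNT(*) AS url_count
FROM hosts h
JOIN urls u ON u.host_id = h.host_id
GROUP BY h.hostname;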