I was wondering where the cutoff point is between creating a field in the database to hold a value and generating that value in your code. For example, I need a value that is derived from two other columns in the database: Column1 - Column2 = Column3. So here is the question: would it be better to generate that value in code, or should I create a Column3, fill it while populating the DB, and retrieve it later? In my case the data is a two-digit integer or a single-character string, so basically small data.
I am using the latest MySQL, and the programming language is PHP with the mysqli library. Also, this website should not get much traffic, and the size of the DB will be 200k rows at the most.
This type of attribute (column) is called a derived attribute. You should not store it in the database, as that adds redundancy. Just store column1 and column2 and calculate the derived value while fetching, for example like this:
`Column1` - `Column2` as `Column3`
If you don't want to write that expression in every query, create a view that includes the derived column.
Note: if the calculation is CPU-intensive, you should consider using a cache, and then you must decide how and when that cache is invalidated.
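For instance, a minimal PHP/mysqli sketch of the view idea; the table name my_table and the connection details are placeholders, while the column names follow the question:
<?php
// One-off setup (placeholder connection details and table name):
// create a view that exposes the derived column alongside the stored ones.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$mysqli->query("
    CREATE OR REPLACE VIEW my_table_with_col3 AS
    SELECT Column1, Column2, Column1 - Column2 AS Column3
    FROM my_table
");

// From now on, SELECT Column3 FROM my_table_with_col3 behaves like a stored column.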
It depends on how resource-hungry the calculation is and how often it will be done. In your case it's pretty simple, so storing the difference in a separate column would be overkill. You can do your calculation in the SQL query like this:
SELECT col1, col2, col1-col2 AS col3...
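For completeness, a minimal PHP/mysqli sketch of reading that computed column (connection details and table name are placeholders):
<?php
// Placeholder connection details and table name for this sketch.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$result = $mysqli->query("SELECT col1, col2, col1 - col2 AS col3 FROM my_table");
while ($row = $result->fetch_assoc()) {
    // col3 arrives already computed by MySQL; nothing to calculate in PHP.
    echo $row['col3'], "\n";
}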
I'm facing a challenge that has never come up for me before, and I'm having trouble finding an efficient solution (likely because I'm not a trained programmer and don't know all the terminology).
The challenge:
I have a feed of data which I need to use to maintain a MySQL database each day. Doing this requires checking whether a record exists, then updating or inserting accordingly.
This is simple enough by itself, but when running it for thousands of records it seems very inefficient to run a query for each record just to check whether it already exists in the database.
Is there a more efficient way than looping through my data feed and running an individual query for each record? Perhaps a way to somehow prepare them into one larger query (assuming that is a more efficient approach).
I'm not sure a code sample is needed here, but if there is any more information I can provide please just ask! I really appreciate any advice.
Edits:
@Sgt AJ - Each record in the data feed has a number of different columns, but they are indexed by an ID. I would check against that ID in the database to see if a record exists. In this situation I'm only updating one table, albeit a large one (30+ columns, mostly text).
What is the problem exactly?
If the problem is the performance of checking and then inserting or updating, use INSERT ... ON DUPLICATE KEY UPDATE:
insert into your_table
(email, country, reach_time)
values ('mike@gmail.com','Italy','2016-06-05 00:44:33')
on duplicate key update reach_time = '2016-06-05 00:44:33';
I assume that your unique key is on email.
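In PHP with mysqli, a rough sketch of running this upsert for the whole feed with one prepared statement; $feed and its keys are assumptions about your data structure:
<?php
// Sketch: upsert every feed row with a single prepared statement.
// $mysqli is an open mysqli connection; $feed is assumed to be an array of
// associative arrays with 'email', 'country' and 'reach_time' keys.
$stmt = $mysqli->prepare(
    "INSERT INTO your_table (email, country, reach_time)
     VALUES (?, ?, ?)
     ON DUPLICATE KEY UPDATE reach_time = VALUES(reach_time)"
);

foreach ($feed as $row) {
    $stmt->bind_param('sss', $row['email'], $row['country'], $row['reach_time']);
    $stmt->execute();
}
$stmt->close();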
Old style, don't use this:
if email exists
    update your_table set
        reach_time = '2016-06-05 00:44:33'
    where email = 'mike@gmail.com';
else
    insert into your_table
    (email, country, reach_time)
    values ('mike@gmail.com','Italy','2016-06-05 00:44:33');
It depends on how many 'feed' rows you have to load. If it's something like 10, then doing them record by record (as shown by mustafayelmer) is probably not too bad. Once you get into the hundreds and above, I would highly suggest a set-based approach. There is some overhead in creating and loading the staging table, but this is (very) quickly offset by the reduction in the number of queries that need to be executed and in the round-trips over the network.
In short, what you'd do is:
-- create new, empty staging table
SELECT * INTO stagingTable FROM myTable WHERE 1 = 2
-- adding a PK to make JOIN later on easier
ALTER TABLE stagingTable ADD PRIMARY KEY (key1)
-- load the data either using INSERTS or using some other method
-- [...]
-- update existing records
UPDATE myTable
SET field1 = s.field1,
field2 = s.field2,
field3 = s.field3
FROM stagingTable s
WHERE s.key1 = myTable.key1
-- insert new records
INSERT myTable (key1, field1, field2, field3)
SELECT key1, field1, field2, field3
FROM stagingTable new
WHERE NOT EXISTS ( SELECT *
FROM myTable old
WHERE old.key1 = new.key1 )
-- get rid of staging table again
DROP TABLE stagingTable
to bring your data up to date.
Notes:
you might want to make the name of the stagingTable 'random' to avoid the situation where two 'loads' running in parallel start re-using the same table, giving all kinds of weird results (and errors). Since all this code is generated in PHP anyway, you can simply add a timestamp or something to the table name.
on MSSQL I would load all the data into the staging table using a bulk-insert mechanism: bcp or BULK INSERT; .NET actually has the SqlBulkCopy class for this. Some quick googling shows me MySQL has mysqlimport and LOAD DATA INFILE if you don't mind writing to a temp file first and loading from there (see the sketch below), or you could build big multi-row INSERT blocks rather than inserting one row at a time. I'd avoid doing 10k inserts in one go though; rather do them per 100 or 500 or so. You'll need to test what's most efficient.
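As a rough PHP sketch of the temp-file route on MySQL: write the feed to a CSV and bulk-load it into the staging table. Table, column and variable names here are placeholders, and LOCAL INFILE must be enabled on both client and server:
<?php
// Sketch: dump the feed rows to a temporary CSV, then bulk-load them into the staging table.
// $feed, stagingTable and key1/field1..field3 are placeholder names for this example.
$tmp = tempnam(sys_get_temp_dir(), 'feed_');
$fh  = fopen($tmp, 'w');
foreach ($feed as $row) {
    fputcsv($fh, [$row['key1'], $row['field1'], $row['field2'], $row['field3']]);
}
fclose($fh);

$mysqli = mysqli_init();
$mysqli->options(MYSQLI_OPT_LOCAL_INFILE, true);   // must also be allowed server-side
$mysqli->real_connect('localhost', 'user', 'pass', 'mydb');

$mysqli->query(
    "LOAD DATA LOCAL INFILE '" . $mysqli->real_escape_string($tmp) . "'
     INTO TABLE stagingTable
     FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'
     (key1, field1, field2, field3)"
);
unlink($tmp);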
PS: you'll need to adapt my syntax a bit here and there; like I said, I'm more familiar with MSSQL's T-SQL dialect. Also, you may be able to use the ON DUPLICATE KEY UPDATE approach when inserting from the staging table into the target table, thus combining the UPDATE and INSERT into one command. [MSSQL uses MERGE for this, but it would look completely different, so I won't include that here.]
Good luck.
I currently have a database structure for dynamic forms as such:
grants_app_id   user_id   field_name   field_value
5               42434     full_name    John Doe
5               42434     title        Programmer
5               42434     email        example@example.com
I found this to be very difficult to manage, and it inflated the number of rows in the database very quickly. A single form can have up to 78 different field_names, so it proved to be very costly when updating the field_values or simply searching them. I would like to combine the rows and use either JSON or PHP serialize to greatly reduce the impact on the database. Does anyone have any advice on how I should approach this? Thank you!
This would be the expected output:
grants_app_id   user_id   data
5               42434     {"full_name":"John Doe", "title":"Programmer", "email":"example@example.com"}
It seems you don't have a simple primary key in those rows.
Speeding up the current solution:
create an index for (grants_app_id, user_id)
add an auto-incrementing primary key
switch from field_name to field_id
The index will make retrieving full forms a lot more fun (i.e. faster), while taking a bit of extra time on insert.
The primary key allows you to update a row by specifying a single value backed by a unique index, which should generally be really fast.
You probably already have some definition of the fields. Add integer IDs and use them to speed up the process, as less data has to be stored, compared and indexed.
Switching to a JSON-Encoded variant
Converting arrays to JSON and back can be done by using json_encode and json_decode since PHP 5.2.
How can you switch to JSON?
Possibly the best way would be to use a PHP script (or similar) to retrieve all data from the old table, group it correctly and insert it into a fresh table; afterwards you can swap the table names. This is an offline approach (see the sketch below).
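A minimal sketch of such an offline conversion script, assuming the old table is called grants_fields and the new one grants_json (both names invented for this example), using json_encode as mentioned above:
<?php
// Sketch: group the old key/value rows per (grants_app_id, user_id) and
// write one JSON document per form into a fresh table.
// grants_fields and grants_json are assumed table names.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

$result = $mysqli->query(
    "SELECT grants_app_id, user_id, field_name, field_value
     FROM grants_fields"
);

$forms = [];
while ($row = $result->fetch_assoc()) {
    $key = $row['grants_app_id'] . ':' . $row['user_id'];
    $forms[$key]['grants_app_id'] = $row['grants_app_id'];
    $forms[$key]['user_id']       = $row['user_id'];
    $forms[$key]['data'][$row['field_name']] = $row['field_value'];
}

$stmt = $mysqli->prepare(
    "INSERT INTO grants_json (grants_app_id, user_id, data) VALUES (?, ?, ?)"
);
foreach ($forms as $form) {
    $json = json_encode($form['data']);
    $stmt->bind_param('iis', $form['grants_app_id'], $form['user_id'], $json);
    $stmt->execute();
}
$stmt->close();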
An alternative would be to add a new column and indicate with field_name = NULL that the new column contains the data. Afterwards you are free to convert the old rows at any time or to store only new data as JSON.
Use JSON?
While it is certainly tempting to have all the data in one row, there are some things to remember:
with all fields stored in a single text field, searching for a value inside one field may become a two-phase approach, since a % wildcard in a LIKE can match across other fields' values. Also, LIKE '%field:value%' is not easily optimized by indexing the column.
changing a single field means rewriting all the stored fields. As long as you are sure that only one process changes the data at any given time this is OK; otherwise there tend to be more problems.
the JSON column needs to be big enough to hold field names + values + separators, which can be a lot. Also, mis-calculating the length of a long value in any field means truncation, with the risk of losing all information on all fields after the long value.
So in your case, even with 78 different fields, it may still be better to have one row per form, user and field. (It may even turn out that JSON is more practical only for forms with few fields.)
As explained in this question, you have to remember that JSON is just another piece of text as far as MySQL is concerned.
I am working on a PHP project related to web scraping, and my aim is to store the data in a MySQL database. I'm using a unique index across 3 columns of a 9-column table, and there are more than 5k records.
Should I check for unique data at the program level, e.g. by putting values in arrays and comparing them before inserting into the database?
Is there any way I can speed up my database insertion?
Never ever create a duplicate table; this is an SQL anti-pattern and it makes it more difficult to work with your data.
Maybe PDO and prepared statements will give you a little boost, but don't expect wonders from them.
Multiple INSERT IGNORE statements may also give you a little boost, but again, don't expect wonders.
You should generate a multi-insert query like so:
INSERT INTO database.table (columns) VALUES (values),(values),(values)
Keep in mind that the statement must stay under MySQL's maximum packet size (max_allowed_packet).
This way the index only has to be updated once per statement.
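A rough PHP sketch of generating such a multi-insert with mysqli, chunking so each statement stays well under max_allowed_packet; $rows, the table and the column names are assumptions:
<?php
// Sketch: insert $rows (arrays of three scraped values each) in chunks of 500,
// letting INSERT IGNORE silently skip rows that violate the unique index.
// my_table and col_a/col_b/col_c are placeholder names.
$chunkSize = 500;
foreach (array_chunk($rows, $chunkSize) as $chunk) {
    $values = [];
    foreach ($chunk as $row) {
        $values[] = sprintf(
            "('%s','%s','%s')",
            $mysqli->real_escape_string($row[0]),
            $mysqli->real_escape_string($row[1]),
            $mysqli->real_escape_string($row[2])
        );
    }
    $mysqli->query(
        "INSERT IGNORE INTO my_table (col_a, col_b, col_c) VALUES " . implode(',', $values)
    );
}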
You could create a duplicate of the table that you currently have, except with no indices on any field, and store the incoming data in this table.
Then use events to move the data from the temp table into the main table. Once the data has been moved to the main table, delete it from the temp table.
You can handle your updates with a trigger: you update the table, and you have to write a trigger for this table.
Use PDO or the mysqli_* functions to speed up insertion into the database.
You could use "INSERT IGNORE" in your query. That way the record will not be inserted if any unique constraints are violated.
Example:
INSERT IGNORE INTO table_name SET name = 'foo', value = 'bar', id = 12345;
I have a problem with how to store some data in MySQL.
I have a website which, when a link is clicked, passes some data to a PHP file that reads the data via GET and writes it to the database (MySQL). I'm passing a campaign_id and an unknown number of parameters.
http://domain.com/somefile.php?campaignid=1&parameter1=sometext1&parameter2=sometext2&parameter3=sometext3 ...etc.
I don't know the actual number of parameters because the user creates them in some sort of CMS. The problem I'm facing is how to store them in the database. I was thinking of structuring it like the table below, but I'm not sure if that is the right and most effective way:
Combinations Table
-combination_id (Primary key and auto increment)
-campaign_id
-parameter1
-parameter2
-parameter3
-parameter4
-parameter5
-parameter6
-parameter7
-parameter8
-parameter9
-parameter10
In this example I assume that the user will not add/use more than 10 parameters (which I think is lame, but I can't come up with a better solution).
Also, if I use this design, I assume that in the PHP file which receives the GET parameters and writes them to the database I need to check whether each parameter exists (i.e. whether it was passed).
You have to normalize your schema.
Assume the following tables:
Entity: id, campaign_id, other fields.
Parameter: id, entityId, parameterValue.
This is a many-to-one relation: many Parameter rows per Entity.
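As a rough sketch (column types are guesses), the two tables could be created like this from PHP:
<?php
// Sketch of the two normalized tables described above; types are assumptions.
$mysqli->query("
    CREATE TABLE entity (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        campaign_id INT UNSIGNED NOT NULL
        -- other fields ...
    )
");
$mysqli->query("
    CREATE TABLE parameter (
        id             INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        entityId       INT UNSIGNED NOT NULL,
        parameterValue VARCHAR(255) NOT NULL,
        FOREIGN KEY (entityId) REFERENCES entity (id)
    )
");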
What about storing all the parameters as JSON in one table row?
You could try something like this:
combination_id (primary key auto increment)
campaign_id ( indexed / foreign key / can't be unique!)
param_name
param_value
You'd have to create an entry for every parameter you're getting, but you could theoretically add a thousand parameters or more.
Might not be the fastest method though and can be a bit hard to work with.
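For example, a small sketch of filling such a table from whatever arrives in the query string; $mysqli is an open connection, and the table name combination_params is made up for this example:
<?php
// Sketch: store every unknown GET parameter as one (campaign_id, name, value) row.
// combination_params follows the column layout suggested above; combination_id
// is assumed to be AUTO_INCREMENT, so it is not listed.
$campaignId = (int) $_GET['campaignid'];

$stmt = $mysqli->prepare(
    "INSERT INTO combination_params (campaign_id, param_name, param_value)
     VALUES (?, ?, ?)"
);

foreach ($_GET as $name => $value) {
    if ($name === 'campaignid') {
        continue;                      // stored separately, not a parameter
    }
    $stmt->bind_param('iss', $campaignId, $name, $value);
    $stmt->execute();
}
$stmt->close();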
I think this is the kind of data NoSQL databases are made for... At least, trying to force it into an SQL database always ends up as some kind of kludge (been there, done that...).
As far as I can see, you have three different ways of storing it:
As you proposed. Probably the easiest way to handle it and also probably the most efficient. But the moment you get 11 parameters you are in for major problems...
Make a parameter table: parameter_id, campaign_id, parameter value (and possibly the parameter name, if it matters). This gives you total flexibility, but everything else, except searching for single values, gets more difficult.
Combine the parameters and store them all in a single text or varchar field. This is probably even more efficient than option 1, except when searching for single parameter values.
And if I may add a fourth:
Use a database system with an array type, e.g. PostgreSQL.
If you don't know the actual number of parameters that will come through the URL, the best option is to store an arbitrary number of values per campaign_id.
For that you can create multiple rows in the table, like:
insert into table_name values (<campaign_id>, <parameter1>, <sometext1>);
insert into table_name values (<campaign_id>, <parameter2>, <sometext2>);
insert into table_name values (<campaign_id>, <parameter3>, <sometext3>);
insert into table_name values (<campaign_id>, <parameter4>, <sometext4>);
This assumes the campaign_id is unique in the URL.
I have hundreds of thousands of elements to insert into a database. I realized that calling an insert statement per element is way too costly, and I need to reduce the overhead.
I reckon each insert can have multiple data elements specified, such as:
INSERT INTO example (Parent, DataNameID) VALUES (1,1), (1,2)
My issue is that since the "DataName" keeps repeating itself for each element, I thought it would save space to store these string names in another table and reference them.
However, that causes a problem for my bulk-insert idea, which now needs a way to resolve the ID from the name before the bulk insert can be built.
Any recommendations?
Should I simply de-normalize and insert the data every time as a plain string into the table?
Also, what is the limit on the size of the query string? Mine amounts to almost 1.2 MB.
I am using PHP with a MySQL backend.
You haven't given us a lot of info on the database structure or size, but this may be a case where absolute normalization isn't worth the hassle.
However if you want to keep it normalized and the strings are already in your other table (let's call it datanames), you can do something like
INSERT INTO example (Parent, DataNameID) VALUES
(1, (select id from datanames where name='Foo')),
(1, (select id from datanames where name='Bar'))
First you should insert the name into the names table.
Then call LAST_INSERT_ID() to get the id.
Then you can do your normal inserts.
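A small PHP sketch of that flow; the table names example and datanames follow the other answer here, and the rest is an assumption:
<?php
// Sketch: insert the new name once, read back its id, then reference that id
// in the bulk insert. Assumes the name is not already present in datanames.
$stmt = $mysqli->prepare("INSERT INTO datanames (name) VALUES (?)");
$stmt->bind_param('s', $dataName);
$stmt->execute();
$dataNameId = $mysqli->insert_id;   // same value LAST_INSERT_ID() would give you
$stmt->close();

// The bulk insert can now use the numeric id directly.
$mysqli->query(
    "INSERT INTO example (Parent, DataNameID) VALUES (1, $dataNameId), (2, $dataNameId)"
);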
If your table is MyISAM-based you can use INSERT DELAYED to improve performance: http://dev.mysql.com/doc/refman/5.5/en/insert-delayed.html
You might want to read up on LOAD DATA (LOCAL) INFILE. It works great; I use it all the time.
EDIT: this answer only addresses the sluggishness of individual inserts. As @bemace points out, it says nothing about string IDs.