I need to make an import method that takes the CSV file and imports everything in the database.
I've done the parsing with one of Laravel's CSV addons and it works perfectly giving me a big array of values set as:
[
'col1_name' => 'col1 value',
'col2_name' => 'col2 value',
'col3_name' => 'col3 value',
'...' => '...'
]
This is also perfect since all the column names fit my model which makes the database inserts a breeze.
However, a lot of the column values are strings that I'd like to store in separate tables as relations. For example, one column contains the name of the item's manufacturer, and I already have a manufacturer table in my database.
My question is - what's the easy way to go through the imported CSV and swap the strings with the corresponding ID from the relationship table, making it compatible with my database design?
Something that would make the imported line:
[
'manufacturer' => 'Dell',
]
into:
[
'manufacturer' => '32',
]
I know I could just do a foreach loop comparing the needed values with values from the relationship models, but I'm sure there's an easier and cleaner way of doing it.
I don't think there's any "nice" way to do this - you'll need to look up each value for "manufacturer" - the question is, how many queries will you run to do so?
A consideration you need to make here is how many rows you will be importing from your CSV file.
You have a couple of options.
1) Querying 1 by 1
I'm assuming you're going to be looping through every line of the CSV file anyway, and then making a new model? In which case, you can add an extra database call in here;
$model->manufacturer_id = Manufacturer::whereName($colXValue)->first()->id;
(You'd obviously need to put in your own checks etc. here to make sure manufacturers exist)
This method is fine for relatively small datasets. However, if you're importing lots and lots of rows, it might end up sluggish, with a lot of arguably unnecessary database calls.
2) Mapping ALL your Manufacturers
Another option would be to create a local map of all your Manufacturers before you loop through your CSV lines;
$mappedManufacturers = Manufacturer::all()->pluck('id', 'name');
This will make $mappedManufacturers a collection of manufacturers with the name as the key and the id as the value. This way, when you're building your model, you can do;
$model->manufacturer_id = $mappedManufacturers[$colXValue];
This method is also fine, unless you have tens of thousands of Manufacturers!
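As a framework-free illustration of option 2, here is a minimal sketch in plain PHP. The $manufacturerRows array only stands in for what the Manufacturer query would return, the 'model' column is made up, and mapManufacturers() is a hypothetical helper, not part of Laravel. Names without a match are collected instead of causing an undefined-index error.

```php
<?php
// Stand-in for the rows of the manufacturers table (assumed sample data).
$manufacturerRows = [
    ['id' => 32, 'name' => 'Dell'],
    ['id' => 33, 'name' => 'Lenovo'],
];

// Build the lookup once: name => id.
$idByName = array_column($manufacturerRows, 'id', 'name');

/**
 * Hypothetical helper: swap each row's 'manufacturer' string for its id.
 * Returns [converted rows, names that had no match].
 */
function mapManufacturers(array $csvRows, array $idByName): array
{
    $converted = [];
    $unknown = [];
    foreach ($csvRows as $row) {
        $name = $row['manufacturer'];
        if (!isset($idByName[$name])) {
            $unknown[] = $name; // decide later: skip, create, or abort
            continue;
        }
        $row['manufacturer'] = $idByName[$name];
        $converted[] = $row;
    }
    return [$converted, $unknown];
}

// Assumed sample of parsed CSV rows.
$csvRows = [
    ['manufacturer' => 'Dell', 'model' => 'XPS'],
    ['manufacturer' => 'Acme', 'model' => 'Rocket'],
];
[$converted, $unknown] = mapManufacturers($csvRows, $idByName);
```

Keeping the unknown names separate lets you decide after the import whether to create those manufacturers or reject the rows.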
3) Where in - then re-looping
Another option would be to build up a list of manufacturer names when looping through your CSV lines, going to the database with 1 whereIn query and then re-looping through your models to populate the manufacturer ID.
So in your initial loop through your CSV, you can temporarily set a property to store the name of the manufacturer, whilst adding it to another array;
$models = collect(); // create this once, before the CSV loop
// inside the CSV loop:
$model->..... = ....;
$model->manufacturer = $colXValue;
$models->push($model);
Then you'll end up with a collection of models. You then query the database for ONLY the manufacturers which have appeared:
$manufacturers = Manufacturer::whereIn('name', $models->pluck('manufacturer'))->get()->keyBy('name')->toArray();
This will give you an array of manufacturers, keyed by their name.
You then loop through your $models collection again, assigning the correct manufacturer id using the map;
$model->manufacturer_id = $manufacturers[$model->manufacturer]['id'];
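The two passes of option 3 can be sketched without the framework as follows. This is only an illustration: $fetchIdsByName is a hypothetical stand-in for the single whereIn query, and the table contents are invented.

```php
<?php
// First pass: parsed CSV rows (assumed sample data).
$csvRows = [
    ['manufacturer' => 'Dell'],
    ['manufacturer' => 'Lenovo'],
    ['manufacturer' => 'Dell'],
];

// Distinct names that actually occur in the import.
$names = array_values(array_unique(array_column($csvRows, 'manufacturer')));

// Hypothetical stand-in for the single whereIn query; returns name => id.
$fetchIdsByName = function (array $names): array {
    $table = ['Dell' => 32, 'Lenovo' => 33, 'HP' => 34]; // pretend table
    return array_intersect_key($table, array_flip($names));
};
$idByName = $fetchIdsByName($names);

// Second pass: fill in the foreign key and drop the temporary name.
foreach ($csvRows as &$row) {
    $row['manufacturer_id'] = $idByName[$row['manufacturer']] ?? null;
    unset($row['manufacturer']);
}
unset($row); // break the reference left behind by foreach
```

A null manufacturer_id after the second pass marks a name that was not found, so those rows can be handled before saving.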
Hopefully this will give you some ideas of how you can achieve this. I'd say the solution mostly depends on your use case - if this was going to be a heavy-duty task, I'd definitely queue it and be tempted to use Option 1! :P
Related
Hi, so I have this database project I'm working on that involves transcribing archival sources to make them more accessible.
I'm revamping the database structure so I can make the depiction of the archival data more accurate to the manuscript sources. As part of that, I have this new table, which has both the labels/titles for columns of data in the documents, plus a "used" field which acts both as a flag for whether the field is used and as its left-to-right position (as the order changes sometimes).
I'm wondering if there's a way to pair the columns together so I can do a query that, when asking for a single row to be returned, sorts the "used" fields numerically (returning all the ones that aren't -1), and also returns all the "label" fields sorted into the same order (e.g. if guns_used is 2, men_used is 1 and ship_name_position is 0, the query will put them in the correct order and also return guns_label, men_label and shipname_label in the correct order).
I'm also working with/around WordPress, so I have the whole wpdb object available to me too.
I'm hoping to be able to "pair" the fields in some way so that if I order one set, the other gets ordered as well.
Edit:
I really would prefer to find a way to do this in a query, but until I find a way to do that I'm going to:
a) Select the entire row that I need
b) Have a long series of if statements, one for each pair of _label/_used fields, assigning the values I want to the position in the array indicated by the value of the _used field.
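The series of if statements in (b) could probably be replaced by one loop over the row. The sketch below is only an assumption about the layout: it expects every pair to be named <base>_used / <base>_label and the row to arrive as an associative array (for example from $wpdb->get_row() with ARRAY_A). orderedLabels() is a hypothetical helper name.

```php
<?php
/**
 * Hypothetical helper: return the labels of all used fields, ordered by
 * their *_used position. Fields with a position of -1 are skipped.
 */
function orderedLabels(array $row): array
{
    $byPosition = [];
    foreach ($row as $column => $value) {
        // Only look at columns ending in "_used".
        if (substr($column, -5) !== '_used') {
            continue;
        }
        $position = (int) $value; // database drivers often return strings
        if ($position < 0) {
            continue; // -1 marks an unused field
        }
        $base = substr($column, 0, -5);
        // Fall back to the base name if the matching label is missing.
        $byPosition[$position] = $row[$base . '_label'] ?? $base;
    }
    ksort($byPosition);
    return array_values($byPosition);
}

// Assumed sample row.
$row = [
    'guns_used' => '2',     'guns_label' => 'Guns',
    'men_used' => '1',      'men_label' => 'Men',
    'shipname_used' => '0', 'shipname_label' => 'Ship name',
    'tonnage_used' => '-1', 'tonnage_label' => 'Tonnage',
];
$labels = orderedLabels($row);
```

If two fields share the same position, the later one overwrites the earlier one in this sketch, so positions are assumed to be unique within a row.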
Hello everyone and thank you for viewing this question.
Since someone asked what I am doing this for, here is the answer:
An artist asked me to make him a web app to store all his new concerts etc. Now, when it comes to adding the instruments, artists etc., I could have 10 instruments, or maybe 100. Everything is entered through a form. Some data is fixed, like location, time etc., but these other fields are added dynamically using the DOM.
I am building a system in which the user sets up a form to be stored in a database like:
Name,Surname,field_1
//Lets say that this is the "fixed" part of the form
//But the user should be able to add 'n' other fields with no limit
//Therefore my problem is that I would end up with a row made of, let's say,
//4 columns
//And another one of, maybe, 100 columns
//
//Then i will need to access these rows, and row one should have 4 cols, row two 100..
//This can't be done in a "traditional" way since each row should have the
//same amount of cols
//
//I thought to create a new table for each submission
//but this doesn't really make that much sense to me..
//
//Storing all the possible fields in a single one and then
//accessing them through an array? That would require too much, especially since my fields
//should remain editable..
//Each field would then be a mixture of variables, like
//field1:a=12,field2:b=18.. too complex
Any help would be very appreciated
I would go the one field approach. You could have three columns, Name, Surname, and field_values. In the field_values column, store a PHP serialized string of an array representing what would otherwise be your columns. For example, running:
array(
    'col1' => 'val',
    'col2' => 'val1',
    'col3' => 'val2',
    'col4' => 'val3'
)
through serialize() would give you:
a:4:{s:4:"col1";s:3:"val";s:4:"col2";s:4:"val1";s:4:"col3";s:4:"val2";s:4:"col4";s:4:"val3";}
and you can take this value and run it back through unserialize() to restore your array and use it however you need to. Loading/saving data within this array is no more difficult than changing values in the array before serializing it and then saving it to the field_values column.
With this method you can have as many or few 'columns' as you need with no need for a ton of columns or tables.
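A minimal round trip with the sample values above might look like this. It assumes PHP 7 or later for the second argument of unserialize(); the variable names are only illustrative, and the database read/write around it is left out.

```php
<?php
// The dynamic fields for one submission (sample values from above).
$fields = [
    'col1' => 'val',
    'col2' => 'val1',
    'col3' => 'val2',
    'col4' => 'val3',
];

// Value to store in the field_values column.
$stored = serialize($fields);

// Later: restore, edit one field, and serialize again before saving.
// 'allowed_classes' => false refuses to instantiate objects from the string.
$restored = unserialize($stored, ['allowed_classes' => false]);
$restored['col2'] = 'changed';
$updated = serialize($restored);
```

Passing 'allowed_classes' => false is a cautious default whenever the stored string could have been influenced by user input.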
In this case I would personally create a new table for each user, with a new row inserted for every new custom field. You must have a master table containing the table names of each user table to access the data within later.
problem
I have two data tables, SEQUENCES and ORGANISMS, whose many-to-many relationship is mapped in the table SOURCES. There is also a 1-m relationship between SOURCES and ENTRIES. I will append a detailed structure.
What I want to achieve is the display of all sequences with all associated organisms and entries, where a condition within the sequences table is met. I have some ideas on how to achieve this, but I need the solution with the best performance, as each of these tables contains 50k+ entries.
idea one
Select all organisms that belong to the same sequence as a concatenated string in SQL, and split it in PHP. I have no idea, though, how to do the concatenation in SQL.
idea two
Select the same sequences with different organisms as distinct records, order by organism, and join them later in PHP. Though this somehow feels just wrong.
idea three
Use views. ANY idea on this one appreciated.
structure
SEQUENCES
SEQUENCE_ID
DESCRIPTION
ORGANISMS
ORGANISM_ID
NAME
SOURCES
SOURCE_ID
SEQUENCE_ID FK to SEQUENCES.SEQUENCE_ID
ORGANISM_ID FK to ORGANISMS.ORGANISM_ID
ENTRIES
SOURCE_ID FK to SOURCES.SOURCE_ID
ENTRY_VALUE
desired outcome
array(
    array(
        "SEQUENCE_ID" => 4,
        "DESCRIPTION" => "Some sequence",
        "SOURCES" => array(
            array(
                "ORGANISM_ID" => 562,
                "ORGANISM_NAME" => "Escherichia coli",
                "ENTRIES" => array(
                    "some entry",
                    "some other entry"
                )
            ),
            array(
                "ORGANISM_ID" => 402764,
                "ORGANISM_NAME" => "Aranicola sp. EP18",
                "ENTRIES" => array()
            )
        )
    ),
    array(
        "SEQUENCE_ID" => 5,
        .....
    )
)
PHP5 and FIREBIRD2.5.1
You can't fetch a nested array like that directly from a flat table structure. But if I get you right, what you want to do is not that hard to achieve.
I don't understand why you would concatenate things and then split them again; that's hard to maintain and probably slow.
I see two approaches here:
1) Fetch everything at once as a flat table using JOIN and loop through it in PHP. This approach creates a lot of duplication, but it's fast because you can fetch all data in one query and then process it with PHP.
2) Fetch every entity separately, loop and fetch the next hierarchy level as you go. This approach will be slower. It takes complexity away from the SQL query and doesn't fetch redundant data. It also gives you more freedom as to how you loop through your data and what you do with it.
Alternatively you might want to actually store hierarchical data in a no-sql way, where you could already store the array structure you mentioned.
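For the flat-table JOIN approach above, the PHP side could look roughly like this. It is a sketch under assumptions: nestRows() is a hypothetical helper, the column names follow the structure in the question, and the rows are expected to come from LEFT JOINs, so SOURCE_ID and ENTRY_VALUE may be NULL.

```php
<?php
/**
 * Hypothetical helper: fold flat JOIN rows into
 * sequences -> sources -> entries.
 */
function nestRows(array $flatRows): array
{
    $sequences = [];
    foreach ($flatRows as $r) {
        $seqId = $r['SEQUENCE_ID'];
        if (!isset($sequences[$seqId])) {
            $sequences[$seqId] = [
                'SEQUENCE_ID' => $seqId,
                'DESCRIPTION' => $r['DESCRIPTION'],
                'SOURCES' => [],
            ];
        }
        if ($r['SOURCE_ID'] === null) {
            continue; // sequence without any source
        }
        $srcId = $r['SOURCE_ID'];
        if (!isset($sequences[$seqId]['SOURCES'][$srcId])) {
            $sequences[$seqId]['SOURCES'][$srcId] = [
                'ORGANISM_ID' => $r['ORGANISM_ID'],
                'ORGANISM_NAME' => $r['NAME'],
                'ENTRIES' => [],
            ];
        }
        if ($r['ENTRY_VALUE'] !== null) {
            $sequences[$seqId]['SOURCES'][$srcId]['ENTRIES'][] = $r['ENTRY_VALUE'];
        }
    }
    // Drop the helper keys so the result is a plain list, as in the question.
    foreach ($sequences as &$seq) {
        $seq['SOURCES'] = array_values($seq['SOURCES']);
    }
    unset($seq);
    return array_values($sequences);
}
```

Because rows are grouped by ID rather than by position, the query does not need any particular ORDER BY for this to work.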
The exact question: what if I implement a one-to-many relation by storing the IDs of the "many" side in a field on the "one" side, given that we interact with it quite seldom and no deletes from the "many" table are expected?
Some other things
I have dishes. Dishes consist of Products. Products have their own price. What if I'd do it this way:
Products columns : { Id, Name , PricePerOne }
Dish columns: {
Id,
Name,
Content ( a serialized [n x 2] PHP array with the ProductID and Amount of this product in each row)
}
And then unserialize it when necessary and calculate the exact sum, querying from Products like this: WHERE Id in ( ".implode(..)." ), or even caching this figure.
So, I must have said something wrong, but I don't need to compare this serialized string or even do something with it. It would simply be used in querying the price. Actually, that is quite close to a one-to-many relation: each dish relates to a few products. So I simply store the amount of each product I need in this exact dish record.
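To make the idea concrete, here is a rough sketch of the price calculation. The prices and IDs are invented, dishPrice() is a hypothetical helper, and $priceById stands in for the result of the Products query.

```php
<?php
// Product prices as they might come back from Products: Id => PricePerOne.
$priceById = [1 => 0.50, 2 => 2.00, 3 => 1.25];

// Dish.Content as stored: a serialized list of [ProductID, Amount] pairs.
$content = serialize([[1, 4], [3, 2]]);

/** Hypothetical helper: total price of a dish from its serialized Content. */
function dishPrice(string $content, array $priceById): float
{
    $pairs = unserialize($content, ['allowed_classes' => false]);
    $total = 0.0;
    foreach ($pairs as [$productId, $amount]) {
        $total += $priceById[$productId] * $amount;
    }
    return $total;
}

// IDs for the WHERE Id IN (...) clause; cast to int before joining.
$pairs = unserialize($content, ['allowed_classes' => false]);
$inList = implode(',', array_map('intval', array_column($pairs, 0)));

$total = dishPrice($content, $priceById);
```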
I would advise against serializing anything you intend to search, or use in a "WHERE" clause.
The only times I have serialized data to put in my database are for logging full sets of POST or GET variables for later debugging, OR for caching an array of data.
In each example, I don't need to make any comparisons in SQL to the serialized string.
You may THINK it's easier to work with serialized data, but you're just going to end up tearing out your hair when your app runs at a glacial rate. You're negating the entire purpose of using a database.
Sit back and rethink your app from the ground up BEFORE you start down this path.
I would wager that the overhead of executing requests on a serialized DB format, compared with a relational one such as SQL, grows steeply with the complexity of the query you are performing.
Not only that, error checking and performing any number of more complex processes (Join? Union?) would doubtless cause premature aging...
There seems to be no shortage of hierarchical data questions in MySQL on SO, however it seems they are mostly talking about managing such data in the database or actually retrieving recursively hierarchical data. My situation is neither. I have a grid of items I need to display. Each item can also have 0 or more comments associated with it. Right now, both the item, along with its data, are displayed in the grid as well as any comments belonging to that item. Usually there is some sort of drill down, dialog, or other user action required to see child data for a grid item but in this case we display both parent and child data in the same grid. Might not fit the de facto standards but it is what it is.
Right now the comments are retrieved by a separate MySQL query for every single parent item in the grid. I immediately cringe at this being aware of all the completely separate database queries that have to be run for a single page load. I haven't profiled but I wouldn't be too surprised if this is part of the slow page loads we sometimes see. I'd like to ideally bring this down to a single query or perhaps 2. However, I'm having difficulty coming up with a solution that sounds any better than what is currently being done.
My first thought was to flatten the comment children for each row with some sort of separator like '|' and then explode them back apart in PHP when rendering the page. The issue with this is it gets increasingly complicated with having to separate each field in a comment, and then each comment, and then account for the possibility of separator characters in the data. Just feels like a mess to maintain and debug.
My next thought was to left outer join the comments to the items and just account for the item duplicates in PHP. I'm working with CodeIgniter's database library, which returns a PHP array for database data. This sounds like potentially a lot of duplicated data in the resulting array, which could be taxing on the system for larger result sets. I'm thinking in most cases it wouldn't be too bad, though, so this option is currently at the top of my list.
Ideally, if I understand MVC correctly, I should keep my database, business logic, and view/display as separate as possible. So there should not be any database "quirks" (for lack of a better word) apparent in the data returned by the model. That is, whatever calls for data from this model method shouldn't be concerned with duplicate data like this. So I'd have to add an additional loop to eliminate the duplicate item array entries, but only after I have retrieved all the child comments and placed them into their own array.
Two queries is another idea but then I have to pass numerous item IDs in the SQL statement for the comments and then go through and zip all the data together manually in PHP.
My goal isn't to get out of doing work here but I am hoping there is some more optimal (less resource intensive and less confusing to the coder) method I haven't thought of yet.
As you state in your question, using a join will bring back a lot of duplicate information. It should be simple enough to remove in PHP, but why bring it back in the first place?
Compiling a SQL statement with a list of IDs retrieved from the query for your list of items shouldn't be a problem (see cwallenpoole's answer). Alternatively, you could create a sub-query so that MySQL recreates the list of IDs for you - it depends on how intensive the sub-query is.
Select your items:
SELECT * FROM item WHERE description = 'Item 1';
Then select the comments for those items:
SELECT * FROM comment WHERE item_id IN (
SELECT id FROM item WHERE description = 'Item 1'
);
For the most part, I solve this type of problem using some sort of ORM lazy-loading system, but it does not look like you have that as an option.
Have you considered:
1) Select all top-level items.
2) Select all second-level items by the IDs in the top-level set.
3) Associate the objects retrieved in 2 with the items found in 1 in PHP.
Basically (in pseudo-code)
// Query 1: parent rows, indexed by ID.
$stmt = $pdo->query("SELECT ID /*columns*/ FROM ENTRIES");
$entries = array();
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row)
{
    $row['child-entities'] = array();
    $entries[$row['ID']] = $row;
}
// Cast the IDs to integers before placing them in the SQL string.
$ids = implode(',', array_map('intval', array_keys($entries)));
// Query 2: all children of those parents (assumes $entries is not empty).
$stmt = $pdo->query("SELECT PARENT_ID /*columns*/ FROM children WHERE PARENT_ID IN ($ids)");
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row)
{
    $entries[$row['PARENT_ID']]['child-entities'][] = $row;
}
$entries will now be an associative array with parent items directly associated with child items. Unless recursion is needed, that should be everything in two queries.
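If the IDs are not guaranteed to be integers, binding them as parameters is the safer route. A small sketch of building the IN list with placeholders follows; buildInClause() is a hypothetical helper, not a PDO function, and the table and column names are taken from the example above.

```php
<?php
/**
 * Hypothetical helper: build "col IN (?, ?, ...)" plus the values to bind.
 * Returns null for an empty list, because "IN ()" is not valid SQL.
 */
function buildInClause(string $column, array $ids): ?array
{
    if ($ids === []) {
        return null;
    }
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    return ["$column IN ($placeholders)", array_values($ids)];
}

[$clause, $params] = buildInClause('PARENT_ID', [3, 7, 9]);
$sql = "SELECT * FROM children WHERE $clause";
// With PDO: $stmt = $pdo->prepare($sql); $stmt->execute($params);
```

The null return for an empty list gives the caller a clear point to skip the second query altogether.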