SQL join duplicates, converting an Access DB - PHP

I am converting an Access database to a new format. Currently all the data resides in MySQL.
For the purposes of this question, there are 3 tables. tbl_Bills, tbl_Documents, and tbl_Receipts.
I wrote an outer join query, as some bills have documents and receipts while others don't, and I need a full listing of each set in those situations, to be processed by a PHP script later on.
The problem is that the primary identifier, which we'll call fld_CommonID, happens to exist in duplicate. For example, 3 bills share the same identifier but carry different information. 3 documents and 3 receipts match those 3 bills.
So as you might have guessed, my join query results in 9 indistinct rows (6 duplicates), when there should be 3 (one join from each table). An inner join excludes data that isn't defined in the other table, and so doesn't work for my needs.
SO ... I'm thinking what I want to do is update those 3 records in each table (across all rows that have duplicates) so that they carry a unique counter id (#1, #2, and #3 respectively), allowing me to run join queries that match rows uniquely.
Is that possible without running PHP code that selects the duplicates ordered by natural table order and then updates them with a counter?
Would you advise that I go that route (scripted) instead of some magical SQL query to do such a thing, if such a query can even be written?
Or is it possible to outer join based on natural table order (pretty sure that's impossible)?

Writing this answer simply to close the question.
Inner joins would be perfect if there were a way to link duplicate fields in separate tables based on natural order (no primary key). The problem isn't that I lack a query; it's that the database is poorly structured, which is a problem better solved with code than with complex queries.
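That said, on MySQL 8+ the counter idea from the question can also be expressed in SQL with ROW_NUMBER(). A minimal sketch, assuming each table has some column that fixes an order among the duplicates (there is no reliable "natural table order"); fld_ID and fld_Counter here are hypothetical names:

-- Hypothetical: fld_ID is any unique column that orders the duplicates,
-- fld_Counter is a newly added column to hold the per-group counter.
UPDATE tbl_Bills AS b
JOIN (
    SELECT fld_ID,
           ROW_NUMBER() OVER (
               PARTITION BY fld_CommonID
               ORDER BY fld_ID
           ) AS rn
    FROM tbl_Bills
) AS numbered ON numbered.fld_ID = b.fld_ID
SET b.fld_Counter = numbered.rn;

Repeating this for tbl_Documents and tbl_Receipts would give each duplicate group the #1/#2/#3 numbering the question describes, after which (fld_CommonID, fld_Counter) can be joined on uniquely.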

Related

MySQL left join with given ids

So I need to left join a table in MySQL with a couple of thousand ids.
It seems like I need to temporarily build a table for the join and then delete it, but that just doesn't sound right.
Currently the task is done in code, but it proves pretty slow, and an SQL query might be faster.
My thought was to use ...WHERE ID IN (".$string_of_values.");
But that cuts off the ids that have no match in the table.
So, how is it possible to tell MySQL to LEFT JOIN a table with a list of ids?
As I understand your task, you need to left-join your working table to your ids, i.e. the output must contain all these ids even when there is no matching row in the working table. Am I correct?
If so, then you must convert your ids list to a rowset.
You already tried saving them to a table. This is a useful and safe practice. Some additional points:
If your dataset is used once and may be dropped immediately after the final query executes, you may create this table as TEMPORARY. Then you need not care about this table: it will be deleted automatically when the connection is closed, but it may be reused (including editing its data) within that connection until it closes. Of course, in that case the queries which create and fill this table and the final query must be executed over the same connection.
If your dataset is small enough (roughly, not more than a few megabytes), you may create this table with the option ENGINE = Memory. In this case only the table definition file (a small text file) is actually written to disk, while the table body is stored in memory only, so access to it is fast.
You may create one or more indexes on such a table and improve the final query's performance.
All these options may be combined, as in the sketch below.
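A minimal sketch combining these options; working_table and some_column are placeholders for the actual schema:

CREATE TEMPORARY TABLE tmp_ids (
    id INT NOT NULL,
    PRIMARY KEY (id)           -- index to speed up the final join
) ENGINE = Memory;

INSERT INTO tmp_ids (id) VALUES (1), (2), (22);   -- ... the id list

SELECT t.id, w.some_column
FROM tmp_ids AS t
LEFT JOIN working_table AS w ON w.id = t.id;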
Another option is to create such a rowset dynamically.
In MySQL 5.x the only option is to build the rowset in a subquery, like:
SELECT ...
FROM ( SELECT 1 AS id UNION SELECT 2 UNION SELECT 22 ... ) AS ids_rowset
LEFT JOIN {working tables}
...
In MySQL 8+ you have additional options.
You may do the same but using a CTE:
WITH ids_rowset AS ( SELECT 1 AS id UNION SELECT 2 UNION SELECT 22 ... )
SELECT ...
FROM ids_rowset
LEFT JOIN {working tables}
...
Alternatively, you may transfer your ids list in some serialized form and parse it into a rowset within the query (in a recursive CTE, or by using a table function such as JSON_TABLE, sketched below).
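A sketch of the JSON_TABLE variant (MySQL 8+), assuming the ids arrive as one JSON array string; working_table is again a placeholder:

SELECT ids.id, w.some_column
FROM JSON_TABLE(
         '[1, 2, 22]',
         '$[*]' COLUMNS (id INT PATH '$')
     ) AS ids
LEFT JOIN working_table AS w ON w.id = ids.id;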
All these methods create a once-used rowset (of course, a CTE can be reused within the query). And such a rowset cannot be indexed for query improvement (the server may index the dataset during query execution if it finds that reasonable, but you cannot influence this).

MySQL performance issue with large tables

I've been asked to develop web software able to store reading data from heat metering devices and to divide the heat expenses among all the flat owners. I chose to work in PHP with the MySQL MyISAM engine.
I was not used to working with large data, so I simply created a logical database where we have:
a table for buildings, with an id as an indexed primary key (now we have ~1200 buildings in the db)
a table with all the flats in all the buildings, with an id as an indexed primary key and a building_id to link to the building (around 32k+ flats in total)
a table with all the heaters in all the flats, with an id as an indexed primary key and a flat_id to link to the flat (around 280k+ heaters)
a table with all the reading values, with the timestamp of the reading, an id as primary key, and a heater_id to link to the heater (around 2.7M+ readings now)
There is also a separate table, linked to the building, which stores the start date and end date between which the division of expenses has to be done.
When it is necessary to get all the data for a building, the approach I used is to fetch raw data from the DB with a single query, process it in PHP, then make the next query.
So here is roughly the sequence of operations I used:
get the start and end dates from the dedicated table with a single query
store the dates in a PHP variable
get all the flats of the building: SELECT * FROM flats WHERE building_id=my_building_id
loop over the results with a PHP while loop
on each step of the while loop, run a query getting all the heaters of that specific flat: SELECT * FROM heaters WHERE flat_id=my_flat_id
loop over the heaters with another PHP while loop
on each step of this inner while loop, get the last reading value of that specific heater: SELECT * FROM reading_values WHERE heater_id=my_heater_id AND data<my_data
Now the problem is that I have serious performance issues.
Before someone points it out: I cannot fetch only the reading values and skip the first six steps of the list above, since I need to print bills, and each bill has to show all the flat and heater information, so I have to fetch the flats and heaters data anyway.
So I'd like some suggestions on how to improve script performance:
all the tables are indexed, but do I have to add an index somewhere else?
would using a single query with subqueries, instead of several queries interleaved with PHP code, improve performance?
any other suggestions?
I haven't included specific code as I think it would have made the question too heavy, but I can add some if asked.
Some suggestions:
Don't use SELECT * if you can avoid it -> just get the fields you really need
I didn't test it in your particular case, but usually a single query which joins all three tables should achieve much better performance than looping through results with PHP.
If you need to loop for some reason, then at least use MySQL prepared statements, which again should improve performance given the number of queries :) (see the sketch below)
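A minimal sketch of the prepared-statement idea in plain SQL (the same applies to mysqli/PDO prepared statements in PHP); the column names are assumptions based on the schema described in the question:

PREPARE get_heaters FROM
    'SELECT id, flat_id FROM heaters WHERE flat_id = ?';
SET @flat := 42;                  -- example flat id
EXECUTE get_heaters USING @flat;  -- re-execute with a new @flat on each loop step
DEALLOCATE PREPARE get_heaters;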
Hope it helps!
Regards
EDIT:
Just to give an example of an alternative query; I'm not sure whether this suits your specific needs, and I haven't tested it (which probably means I forgot something):
SELECT
a.field1,
b.field2,
c.field3,
d.field4
FROM heaters a
JOIN reading_values b ON (b.heater_id = a.heater_id)
JOIN flats c ON (c.flat_id = a.flat_id)
JOIN buildings d ON (d.building_id = c.building_id)
WHERE
a.heater_id = my_heater_id
AND b.date < my_date
GROUP BY a.heater_id
EDIT 2
Following your comments, I modified the query so that it retrieves the information as you want it: Given a building id, it will list all the heaters and their newest reading value according to a given date:
SELECT
a.name,
b.name,
c.name,
d.reading_value,
d.created
FROM buildings a
JOIN flats b ON (b.building_id = a.building_id)
JOIN heaters c ON (c.flat_id = b.flat_id)
JOIN reading_values d ON (d.reading_value_id =
    (SELECT reading_value_id
     FROM reading_values
     WHERE created <= my_date AND heater_id = c.heater_id
     ORDER BY created DESC
     LIMIT 1))
WHERE
a.building_id = my_building_id
GROUP BY c.heater_id
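If the correlated subquery turns out to be the slow part, a composite index covering its WHERE and ORDER BY might help (an assumption, not tested against your schema):

-- Lets the ORDER BY created DESC LIMIT 1 subquery read just the
-- newest qualifying row per heater instead of scanning all its rows.
CREATE INDEX idx_reading_heater_created
    ON reading_values (heater_id, created);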
It would be interesting to know how it performs in your environment.
Regards

Good practice for handling naturally JOINed results across an application

I'm working on an existing application that uses some JOIN statements to create "immutable" objects (i.e. the results are always JOINed to create a processable object; results from only one table would be meaningless).
For example:
SELECT r.*,u.user_username,u.user_pic FROM articles r INNER JOIN users u ON u.user_id=r.article_author WHERE ...
will yield a result of type, let's say, ArticleWithUser that is necessary to display an article with the author details (like a blog post).
Now, I need to make a table featured_items which contains the columns item_type (article, file, comment, etc.) and item_id (the article's, file's or comment's id), and query it to get a list of the featured items of some type.
Assuming tables other than articles contain whole objects that do not need JOINing with other tables, I can simply pull them with a dynamically generated query like
SELECT some_table.* FROM featured_items RIGHT JOIN some_table ON some_table.id = featured_items.item_id WHERE featured_items.type = X
But what if I need to get a featured item from the aforementioned type ArticleWithUser? I cannot use the dynamically generated query because the syntax will not suit two JOINs.
So, my question is: is there a better practice to retrieve results that are always combined together? Maybe do the second JOIN on the application end?
Or do I have to write special code for each of those combined results types?
Thank you!
A view can be thought of as being like a table, for the faint of heart.
https://dev.mysql.com/doc/refman/5.0/en/create-view.html
Views can incorporate joins, and other views. Keep in mind that upon creation they take a snapshot of the columns in existence at that time on the underlying tables, so ALTER TABLE statements adding columns to those tables are not picked up by SELECT *.
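A sketch of the idea applied to the question's ArticleWithUser join; article_id is an assumed name for the articles table's primary key:

CREATE VIEW article_with_user AS
SELECT r.*, u.user_username, u.user_pic
FROM articles r
INNER JOIN users u ON u.user_id = r.article_author;

-- The dynamically generated featured-items query can then treat the
-- view like any single table:
SELECT a.*
FROM featured_items f
JOIN article_with_user a ON a.article_id = f.item_id
WHERE f.item_type = 'article';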
An old article by Peter Zaitsev is what I consider required reading on the subject of MySQL views.
To answer your question as to whether they are widely used, they are a major part of the database developer's toolkit, and in some situations offer significant benefits, which have more to do with indexing than with the nature of views, per se.

To relation or not to relation? A MySQL, PHP database workflow

I'm kinda new to MySQL and I'm trying to create a kind of complex database, and I need some help.
My db structure
Tables (columns):
1.patients (Id,name,dob,etc....)
2.visits (Id,doctor,clinic,Patient_id,etc....)
3.prescription (Id,visit_id,drug_name,dose,tdi,etc....)
4.payments (id,doctor_id,clinic_id,patient_id,amount,etc...) etc..
I have about 9 tables; in all of them the primary key is 'id' and it's set to auto-increment.
I don't use relations in my db (because I don't know whether it would be better or not! I never got really deep into MySQL, so I just use PHP to run queries that fetch info from one table and use that to run another query to get more info, store data, etc.).
For example, if I want to view all the drugs I gave to one of my patients, say his id is 100:
1- click the patient name (the name link is generated from tbl: patients, column: id)
2- search the visits tbl WHERE patient_id='100' ---> that returns all his visits ($x array)
3- loop over the prescription tbl searching for drugs whose visit_id matches an entry in $x (loop over the array)
4- return all rows found.
As my database expands more and more (1k+ records in the visits table), one patient can have more than 40 visits; that's 40 loops over the prescription table to get all his previous prescriptions.
So I came up with a small tweak where I edited my db so that patient_id and visit_id are columns in nearly all tables, letting me collapse steps 2 and 3 into one step (search the prescription tbl WHERE patient_id=100), but that left me with so many duplicates in my db, and I feel it's kind of a stupid way to do it!!
Should I start considering using a relational design?
If so, can someone explain a bit how this will ease my life?
Can I do this redesign by altering the current tables, or must I recreate all the tables?
Thank you very much.
Yes, you should exploit MySQL's relational database capabilities. They will make your life much easier as this project scales up.
Actually you're already using them well. You've discovered that patients can have zero or more visits, for example. What you need to do now is learn to use JOIN queries in MySQL.
Once you know how to use JOIN, you may want to declare some foreign keys and other database constraints. But your system will work OK without them.
You have already decided to denormalize your database by including both patient_id and visit_id in nearly all tables. Denormalization is the adding of data that's formally redundant to various tables. It's usually done for performance reasons. This may or may not be a wise decision as your system scales up. But I think you can trust your instinct about the need for the denormalization you have chosen. Read up on "database normalization" to get some background.
One little bit of advice: Don't use columns named simply "id". Name columns the same in every table. For example, use patients.patient_id, visits.patient_id, and so forth. This is because there are a bunch of automated software engineering tools that help you understand the relationships in your database. If your ID columns are named consistently these tools work better.
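Declaring the foreign keys mentioned earlier might look like this; a sketch, assuming InnoDB tables and the consistent column names suggested above:

ALTER TABLE visits
    ADD CONSTRAINT fk_visits_patient
    FOREIGN KEY (patient_id) REFERENCES patients (patient_id);

ALTER TABLE prescription
    ADD CONSTRAINT fk_prescription_visit
    FOREIGN KEY (visit_id) REFERENCES visits (visit_id);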
So, here's an example of how to do the steps numbered 2 and 3 in your question with a single JOIN query.
SELECT p.patient_id, p.name, v.visit_id, rx.drug_name, rx.drug_dose
FROM patients AS p
LEFT JOIN visits AS v ON p.patient_id = v.patient_id
LEFT JOIN prescription AS rx ON v.visit_id = rx.visit_id
WHERE p.patient_id = '100'
ORDER BY p.patient_id, v.visit_id, rx.prescription_id
Like all SQL queries, this returns a virtual table of rows and columns. In this case each row of the virtual table has patient, visit, and drug data. I used LEFT JOIN in this example, which means that a patient with no visits will have a row with NULL data in it. If you specify plain JOIN, MySQL will omit those patients from the virtual table.

Issue Regarding Join Statements (Or Other Efficient Method) in MySQL

It took a while to come up with a title as I wasn't sure what to title it. Basically my problem deals with SQL queries and coming up with an efficient method to go about what I am trying to do.
To give it in an example, say we have two tables:
Table 1 (Articles): ID | ArticleName | AuthorID
Table 2 (Users): ID | AuthorName
What I am attempting to do is pull, say, the last 5 articles. From here, for each article it pulls, a while loop queries the second table to pull the AuthorName where ID=AuthorID.
In essence, we have one query for the 5 articles and then another five queries to get the author names. This is further compounded on pages with 10-20 or more articles, where there are an extra 10-20+ queries.
Is there a more efficient method to join these statements together and have it pull the AuthorName for each article it pulls?
The reason for using AuthorID in table 1 is so that if usernames are changed, it doesn't break anything. Along with this, it (as far as I understand) cuts down a lot on the database storage.
I'm still somewhat new to SQL though so any ideas on how to resolve this would be much appreciated.
Thanks in advance, and if there are any questions please don't hesitate to ask!
SELECT * FROM `Articles` INNER JOIN `Users` ON `Articles`.`AuthorID`=`Users`.`ID`
There are two ways to do this. You can either run a one-shot query that JOINs in the additional Users table and presents a complete result set, or you can do a two-pass approach where you fetch all the authors in a subsequent call using SELECT ... FROM Users WHERE ID IN (...) with the distinct identifiers from the first query.
For small lists and small tables the JOIN method will almost always be more convenient. For large lists the two-pass approach seems "dumber" but often out-performs doing the join in the database: for instance, if the number of articles is very large and the number of authors is small, the JOIN adds a significant amount of work to the large query that could be eliminated by making a small secondary query after the fact (sketched below).
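A sketch of the two-pass approach using the question's tables; the literal ids in the second query stand in for values collected by the application from the first:

-- First pass: the newest five articles.
SELECT ID, ArticleName, AuthorID
FROM Articles
ORDER BY ID DESC
LIMIT 5;

-- Second pass: only the distinct authors seen in the first result set.
SELECT ID, AuthorName
FROM Users
WHERE ID IN (3, 7, 12);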
For this case, with less than one million records and small fetch sizes, go with JOIN.