MySQL headache, should I or should I not? - php

I have a classifieds website.
I am using SOLR for indexing and storing data. I also have a MySQL db with some more information about each classified, which I don't store or index in SOLR.
Now, I have a pretty normalized db with 4 tables.
Whenever ads are searched on the website, SOLR does the searching and returns an array of ID numbers, which are then used to fetch the ads with those ids from the MySQL db.
Now, all the JOINs and relations between my tables give me a headache.
What, apart from ease of maintenance, do I gain from having a normalized db?
I could, you know, store all the info in one table with some 50 columns.
So instead of this for finding one ad and displaying it:
SELECT
category_option.option_name,
option_values.value
FROM classified, category_option, option_values
WHERE classified.classified_id=?id
AND classified.cat_id=category_option.cat_id
AND option_values.option_id=category_option.option_id
I could use this:
SELECT * FROM table_name WHERE classified_id = $classified_id
Isn't the last one actually faster?
Or does a normalized db perform faster?
Thanks

I would advise against denormalizing in your situation. You'll get better with joins as you use them more and they become clearer in your head, and ease of maintenance is a real benefit for the future.
Here's a pretty good link about normalization (and denormalization). Here's a question about denormalization. One answer suggests creating a view that uses joins to gather the data you need, and then querying that view just like your SELECT * FROM table_name WHERE classified_id = $classified_id query. A normalized DB will likely be somewhat slower for reads like this, but that alone is rarely a good enough reason to denormalize. I hope this provides some help.
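To make the view suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the view name classified_options, the column types, and the sample rows are assumptions for illustration only. The table and column names follow the query in the question.

```python
import sqlite3


def build_demo_db():
    """Create the three tables from the question plus a view over the joins."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE classified (classified_id INTEGER PRIMARY KEY, cat_id INTEGER);
        CREATE TABLE category_option (
            option_id INTEGER PRIMARY KEY, cat_id INTEGER, option_name TEXT
        );
        CREATE TABLE option_values (option_id INTEGER, value TEXT);

        -- Hypothetical view name; it hides the joins behind one "table".
        CREATE VIEW classified_options AS
        SELECT c.classified_id, co.option_name, ov.value
        FROM classified c
        JOIN category_option co ON co.cat_id = c.cat_id
        JOIN option_values ov ON ov.option_id = co.option_id;

        -- Made-up sample data.
        INSERT INTO classified VALUES (1, 10), (2, 20);
        INSERT INTO category_option VALUES
            (100, 10, 'color'), (101, 10, 'mileage'), (200, 20, 'rooms');
        INSERT INTO option_values VALUES (100, 'red'), (101, '42000'), (200, '3');
    """)
    return conn


def options_for(conn, classified_id):
    """Query the view as if it were a single denormalized table."""
    cursor = conn.execute(
        "SELECT option_name, value FROM classified_options "
        "WHERE classified_id = ? ORDER BY option_name",
        (classified_id,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    print(options_for(build_demo_db(), 1))
```

The application code then only ever sees the simple single-table style query, while the underlying tables stay normalized.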

Whenever you do denormalization you usually gain reading speed and lose write speed, because you have to write the same value many times. Additionally, extra care should be taken to maintain data integrity.
How many times will the query be executed?
Is this a high traffic application?
Can you add a cache?

The query using a JOIN is trivial as far as MySQL joins are concerned. I see no need to denormalize this.
I would however suggest rewriting it to not be such a PITA to read:
SELECT
category_option.option_name,
option_values.value
FROM classified
JOIN category_option USING (cat_id)
JOIN option_values USING (option_id)
WHERE classified.classified_id = ?

Related

MYSQL Query optimization, comparing 3 tables w/ thousands of records

I have this query:
SELECT L.sku, L.desc1, M.map, T.retail
FROM listing L
INNER JOIN moto M ON L.sku = M.sku
INNER JOIN truck T ON L.sku = T.sku
LIMIT 5;
Each table (listing, moto, truck) has ~300,000 rows. Just for testing purposes I've set a LIMIT of 5 results; in the end I will need hundreds, but let's see...
That query takes about 3:26 minutes in the console. I can't imagine how long it will take with PHP, which is where I need to handle it.
Any advice/solution to optimize the query? Thanks!
Two things to recommend here:
Indexes
Denormalization
One thing people tend to do when databases get massive is invoke Denormalization. This is when you store the data from multiple tables in one table to prevent the need to do a join. This is useful if your application relies on specific reads to power it. It is a commonly used tactic when scaling.
If denormalization is out of the question, another, simpler way to optimize this query is to make sure you have indexes on the columns you are joining on. So the columns L.sku, M.sku and T.sku would need to be indexed; you should notice an immediate increase in performance.
For any other optimizations I would need some more information about the data. Hope it helps!
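As a rough illustration of the indexing advice, the sketch below builds small versions of the three tables, adds an index on each sku column, and inspects the plan. It uses Python's sqlite3 module rather than MySQL, so the plan output format is SQLite's; the index names, column types, and sample rows are assumptions. In MySQL the equivalent statements would be CREATE INDEX ... ON listing (sku) and so on, checked with EXPLAIN.

```python
import sqlite3

# The query from the question, unchanged apart from layout.
QUERY = """
    SELECT L.sku, L.desc1, M.map, T.retail
    FROM listing L
    INNER JOIN moto M ON L.sku = M.sku
    INNER JOIN truck T ON L.sku = T.sku
    LIMIT 5
"""


def build_sku_db(rows=1000):
    """Create the three tables with made-up rows and an index on each sku."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE listing (sku TEXT, desc1 TEXT);
        CREATE TABLE moto (sku TEXT, map REAL);
        CREATE TABLE truck (sku TEXT, retail REAL);
    """)
    skus = [(f"SKU{i:05d}",) for i in range(rows)]
    conn.executemany("INSERT INTO listing VALUES (?, 'item')", skus)
    conn.executemany("INSERT INTO moto VALUES (?, 9.99)", skus)
    conn.executemany("INSERT INTO truck VALUES (?, 19.99)", skus)
    # Hypothetical index names; one index per join column.
    conn.executescript("""
        CREATE INDEX idx_listing_sku ON listing (sku);
        CREATE INDEX idx_moto_sku ON moto (sku);
        CREATE INDEX idx_truck_sku ON truck (sku);
    """)
    return conn


def query_plan(conn):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]


if __name__ == "__main__":
    db = build_sku_db()
    for step in query_plan(db):
        print(step)
```

With the indexes in place, the joined tables can be looked up by sku instead of being scanned once per outer row, which is the usual cause of multi-minute joins on tables of this size.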

see many status or create a table to store all status

I have multiple tables with relationships.
Sometimes I need to do a join just to check whether status = true, and the query becomes large and a little confusing...
I wanted to know how to approach this type of situation in large projects.
I was thinking of creating a table with parent and status to group all the conditions; in that case I would only need a simple query to check whether the status of the relationship is true or false.
Like this:
select *
from table
where table.parent in (select id from tableB where status = 1)
or table.parent in (select id from tableC where status = 1)
or table.parent in (select id from tableD where status = 1)
Is this a good approach?
I have never tested it and do not know to what extent it would be the best solution.
Thank you.
I am a little confused. Do you want to redesign your data structure, or do you want to optimize your query?
Without a clear specification I can't offer an optimized data structure, but here are some optimization suggestions based on a few assumptions.
If the parent ids do not overlap between the tables (i.e. tableB, tableC and tableD have no ids in common), you can move the status field to your 'table' table.
If they do share some ids, the previous suggestion will not work. In that case you can denormalize: add a status field to the 'table' table and keep it up to date whenever any of the statuses change.
If you would like to keep your data structure, you can optimize your query by removing the sub-queries and using joins instead.
JOINs are often at least as fast as the equivalent sub-queries, and it is fairly rare for a sub-query to be faster, although the result depends on the optimizer, so compare both forms with EXPLAIN.
With JOINs the RDBMS can build an execution plan suited to the whole query and predict which data needs to be loaded, which saves time; with sub-queries it may run each sub-query and load all of its data before processing.
The good thing about sub-queries is that they are more readable than JOINs, which is why most people new to SQL prefer them. When it comes to performance, though, JOINs are better in most cases, and they are not hard to read either.
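A sketch of that rewrite, under a few assumptions: id is a unique key in tableB, tableC and tableD (otherwise the joins could duplicate rows), the table called 'table' in the question is renamed to the hypothetical items because TABLE is a reserved word, and the sample rows are made up. It runs on Python's sqlite3 module rather than MySQL, but the SQL itself is plain enough to carry over.

```python
import sqlite3

# The original form: three IN sub-queries combined with OR.
SUBQUERY_SQL = """
    SELECT i.id FROM items i
    WHERE i.parent IN (SELECT id FROM tableB WHERE status = 1)
       OR i.parent IN (SELECT id FROM tableC WHERE status = 1)
       OR i.parent IN (SELECT id FROM tableD WHERE status = 1)
    ORDER BY i.id
"""

# The join form: one LEFT JOIN per parent table, keeping a row when at
# least one of the joins found an active parent.
JOIN_SQL = """
    SELECT i.id FROM items i
    LEFT JOIN tableB b ON b.id = i.parent AND b.status = 1
    LEFT JOIN tableC c ON c.id = i.parent AND c.status = 1
    LEFT JOIN tableD d ON d.id = i.parent AND d.status = 1
    WHERE b.id IS NOT NULL OR c.id IS NOT NULL OR d.id IS NOT NULL
    ORDER BY i.id
"""


def build_status_db():
    """Create the tables with a handful of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, parent INTEGER);
        CREATE TABLE tableB (id INTEGER PRIMARY KEY, status INTEGER);
        CREATE TABLE tableC (id INTEGER PRIMARY KEY, status INTEGER);
        CREATE TABLE tableD (id INTEGER PRIMARY KEY, status INTEGER);
        INSERT INTO tableB VALUES (1, 1), (2, 0);
        INSERT INTO tableC VALUES (2, 1), (3, 0);
        INSERT INTO tableD VALUES (4, 1);
        INSERT INTO items VALUES (10, 1), (11, 2), (12, 3), (13, 4), (14, 5);
    """)
    return conn


def ids(conn, sql):
    """Run a query and return the first column as a flat list."""
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    db = build_status_db()
    print(ids(db, SUBQUERY_SQL) == ids(db, JOIN_SQL))
```

Whether the join form is actually faster on a given MySQL version is something to check with EXPLAIN on real data; the sketch only shows that the two forms return the same rows.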
You really haven't given much information - or even asked a clear question :(
SUGGESTIONS:
1) Focus on your data design first
2) Make sure your design allows you to query whatever you need from the data. For example, if you need to check "status" by date, then make sure you have datetime columns.
3) "Optimizing" queries comes later in the game. Make sure your queries are correct first, worry about "optimization" later.
4) Tuning your database (for example, identifying and implementing indexes) is crucial, and should always be done in conjunction with 3)
Hope that helps!
PS:
If you have a specific question, please be sure to show some sample code.

Speeding up responses from a database query

I am running a select * from table order by date desc query using PHP on a MySQL db server. The table has a lot of records, which slows down the response time.
So, is there any way to speed up the response? If indexing is the answer, which columns should I index?
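An index speeds up searching when you have a WHERE clause or do a JOIN on fields you have indexed. In your case you don't do that: you select all entries in the table, so an index cannot reduce the number of rows read. At most, an index on the date column could spare the database the sort for the ORDER BY.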
Are you sure you need all of the data in that table? When you later filter, search or aggregate this data in PHP, you should look into ways to do that in SQL so that the database sends less data to PHP.
You need to use a caching system.
The best I know is Memcache. It's really great for speeding up your application, and reads served from the cache don't touch the database at all.
Simple answer: you can't speed anything up using software.
Reason: you're selecting entire contents of a table and you said it's a large table.
What you could do is cache the data, but not using Memcache because it's got a limit on how much data it can cache (1 MB per key), so if your data exceeds that - good luck using Memcache to cache a huge result set without coming up with an efficient scheme of maintaining keys and values.
Indexing won't help much because you haven't got a WHERE clause; at most, an index on the date column can speed up the ORDER BY clause slightly. Put EXPLAIN before your query to see how MySQL retrieves and sorts the rows (for example, whether it needs a filesort). EXPLAIN shows the execution plan rather than timings, so measure separately how much time is spent transmitting the data over the network and how much is spent retrieving and sorting it.
If your application requires a lot of data in order for it to work, then you have these options:
Get a better server that can push the data faster
Redesign your application because if it requires so much data in order to run, it might not be designed with efficiency in mind
Query optimization is a big topic and beyond the scope of this question.
Here are some highlights that will speed up your SELECT statement:
Use proper indexes
Limit the number of records
Select only the columns you require (instead of writing select * from table, use select col1, col2 from table)
Paging with a large offset is a little tricky in MySQL.
This SELECT statement will be slow for a large offset because it has to process a large set of data:
SELECT * FROM table order by whatever LIMIT m, n;
To optimize this query, here is a simple solution:
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever
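The query above can be sketched end to end as follows. This is an illustration on Python's sqlite3 module, not a MySQL benchmark: the events table, its created and payload columns, the index name, and the generated rows are all assumptions. The idea it demonstrates is the same: page through a narrow index to collect only the ids, then join back to fetch the full rows for that one page.

```python
import sqlite3


def build_events_db(n=500):
    """Create a hypothetical events table with an index on the sort columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (id INTEGER PRIMARY KEY, created TEXT, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO events (id, created, payload) VALUES (?, ?, ?)",
        [(i, f"2020-01-{(i % 28) + 1:02d}", "x" * 200) for i in range(1, n + 1)],
    )
    conn.execute("CREATE INDEX idx_events_created ON events (created, id)")
    return conn


def page_plain(conn, offset, count):
    """Straightforward paging: full rows are carried through the offset."""
    return conn.execute(
        "SELECT * FROM events ORDER BY created, id LIMIT ? OFFSET ?",
        (count, offset),
    ).fetchall()


def page_deferred(conn, offset, count):
    """Deferred join: page over ids only, then fetch the matching rows."""
    return conn.execute(
        """
        SELECT e.* FROM events e
        JOIN (
            SELECT id FROM events ORDER BY created, id LIMIT ? OFFSET ?
        ) page ON page.id = e.id
        ORDER BY e.created, e.id
        """,
        (count, offset),
    ).fetchall()


if __name__ == "__main__":
    db = build_events_db()
    print(page_plain(db, 400, 20) == page_deferred(db, 400, 20))
```

Note that the sort needs a tie-breaker (here id) for the two forms to return identical pages when many rows share the same sort value.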

Is this the optimal MySQL database schema for a website that can become huge?

I'm sketching out a database layout for a website that has the potential to become huge, with hundreds of queries a minute.
I was thinking about doing the following:
user table
id
name
(few more fields)
Pages (this one will become the biggest table)
id
titel
img
text
restaurant (this will be the column that connects the pages to the user table; I was planning on creating an index on this one to increase speed)
So I'm wondering whether creating an index on the 'restaurant' column will increase the speed of my queries, or if there is any other way to speed things up?
Thanks in advance!
If you need to do some query like :
select *
from pages
where restaurant = ...
Or like :
select *
from user
inner join pages on pages.restaurant = user.id
where user.name = '...'
Or any other condition on the restaurant column, then you'll probably want to add an index on that column, to avoid scanning all rows of the pages table.
But note that useful/necessary indexes will almost always depend on the kind of queries you'll be doing.
Which means that it's not quite possible to accurately guess which indexes you'll need -- first, you need to know how you will access your data.
Note : you should read the How MySQL Uses Indexes section of MySQL's manual : it contains stuff that's interesting to know ;-)
As a test, you can always run your query in your preferred tool and add EXPLAIN in front. This will show you what indices are being used and/or which temporary tables had to be created etc.
EXPLAIN select *
from pages
where restaurant = ...
If you're using the InnoDB storage engine, you should not just use 'an index' but make use of a FOREIGN KEY. That way you also reduce potential integrity problems.
Suggestion: do not use restaurant as a name. Add some more tables and it will be difficult to keep track of what references what. Why not call it user_id? (This is a matter of personal preference, though.)
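A small sketch of the foreign key suggestion, using the user_id name proposed above. It runs on Python's sqlite3 module as a stand-in for MySQL/InnoDB, which means two things differ and are assumptions here: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and the index is created explicitly (InnoDB creates one on the referencing column automatically if none exists). The column types and index name are also made up.

```python
import sqlite3


def build_site_db():
    """Create user and pages tables linked by a foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite-specific: foreign keys are not enforced unless switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE pages (
            id INTEGER PRIMARY KEY,
            titel TEXT,
            img TEXT,
            text TEXT,
            user_id INTEGER NOT NULL REFERENCES user (id)
        );
        -- Hypothetical index name; speeds up lookups of a user's pages.
        CREATE INDEX idx_pages_user_id ON pages (user_id);
    """)
    return conn


def add_page(conn, user_id, titel):
    """Insert a page; return False if user_id does not reference a user."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO pages (titel, user_id) VALUES (?, ?)",
                (titel, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_site_db()
    db.execute("INSERT INTO user (id, name) VALUES (1, 'alice')")
    print(add_page(db, 1, "Menu"), add_page(db, 99, "Orphan"))
```

The point of the constraint is that the database itself rejects a page pointing at a nonexistent user, instead of relying on application code to check.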

Whats better- query with long 'where in' condition or many small queries?

Maybe it's a little dumb, but I'm just not sure which is better.
If I have to check more than 10k rows in the db for existence, what should I do?
#1 - one query
select id from table1 where name in (smth1,smth2...{till 30k})
#2 - many queries
select id from table1 where name=smth1
Though performance is not the goal, I don't want to bring MySQL down either ;)
Maybe some other solution would be more suitable...
Thanks.
Update: The task is to fetch a list of domains, save the new ones (those not in the db yet) and delete those that have disappeared from the list. Hope that helps a little...
What you should do is create a temp table, insert all of the names, and (using one query) join against this table for your select.
select id
from table1 t1
inner join temptable tt on t1.name = tt.name
The single query will most likely perform better as the second will give a lot of round-trip delays. But if you have a lot of names like in your example the first method might cause you to hit an internal limit.
In this case it might be better to store the list of names in a temporary table and join with it.
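For the task described in the question (save new domains, delete the ones that disappeared), the temporary table approach might look like the sketch below. It uses Python's sqlite3 module in place of MySQL; the temporary table name incoming, the helper names, and the sample domains are assumptions, and INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).

```python
import sqlite3


def build_domain_db(existing):
    """Create table1 from the question and load some existing names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.executemany("INSERT INTO table1 (name) VALUES (?)", [(n,) for n in existing])
    conn.commit()
    return conn


def sync_domains(conn, fetched_names):
    """Bring table1 in line with fetched_names; return (added, removed)."""
    # Load the fetched list into a temporary table in one batch.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS incoming (name TEXT PRIMARY KEY)")
    conn.execute("DELETE FROM incoming")
    conn.executemany(
        "INSERT OR IGNORE INTO incoming (name) VALUES (?)",
        [(n,) for n in fetched_names],
    )
    # Names in the fetched list that table1 does not have yet.
    added = [r[0] for r in conn.execute("""
        SELECT i.name FROM incoming i
        LEFT JOIN table1 t ON t.name = i.name
        WHERE t.id IS NULL ORDER BY i.name
    """)]
    # Names in table1 that are no longer in the fetched list.
    removed = [r[0] for r in conn.execute("""
        SELECT t.name FROM table1 t
        LEFT JOIN incoming i ON i.name = t.name
        WHERE i.name IS NULL ORDER BY t.name
    """)]
    conn.execute("""
        INSERT INTO table1 (name)
        SELECT i.name FROM incoming i
        LEFT JOIN table1 t ON t.name = i.name
        WHERE t.id IS NULL
    """)
    conn.execute("DELETE FROM table1 WHERE name NOT IN (SELECT name FROM incoming)")
    conn.commit()
    return added, removed


if __name__ == "__main__":
    db = build_domain_db(["a.com", "b.com"])
    print(sync_domains(db, ["b.com", "c.com"]))
```

This keeps the number of statements fixed no matter how long the list is, and the names travel as bound parameters rather than being pasted into a 30k-item IN list.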
Depending on your future needs to do similar things, you might want to add a function in the database, 'strlist_to_table'. Let the function take a text argument in which your input is separated by a delimiter character (possibly also passed to the function), and split it on the delimiter to create an on-the-fly table. Then you can use
where in strlist_to_table('smth1|smth2', '|')
and also get protection from sql injection (maybe little Bobby Tables appears in the input).
Just my 2 cents...
I'm not sure how flexible your application design is, but it might be worth looking into removing the delimited list altogether and simply making a permanent third table to represent the many-to-many relationship, then joining the tables on each query.
