Does anyone have any idea how you can create a product filtering query (or queries) that will emulate the results on this page?
http://www.emag.ro/notebook_laptop
Explanation
If you press HP as a brand, the page will show you all the HP products, and the rest of the available filters are gathered from this query result. Fine and dandy until now; I got this licked without any problems.
Press 4GB RAM, and of course you will see all HP products that have this property/feature. Again fine and dandy, no problems so far.
BUT if you look closely you will see that the Brand filter now also shows, say, Acer with a few products that have the 4GB feature, and maybe more brands after Acer, even though their checkboxes aren't yet checked.
The only idea that comes to mind is to run that many more queries against the database to get these other possible results.
After you start checking the 3rd possible option (let's say Display size), things get even more complicated.
I guess my question is:
Does anyone have any idea how to do this without taxing the server with tons of queries?
Thank you very much for reading this far, I hope I made myself clear in all this little story.
Take a look at SQL UNION syntax.
"UNION is used to combine the result from multiple SELECT statements into a single result set."
It's not really "tons" of queries, it's one query per attribute type (brand, RAM, HDD). Let's say you have selected HP, 4GB RAM and a 250GB disk. Now for each attribute type, select products according to the filter, except for the current type, and group the results by the current type. In a simplistic model, the queries could look like this:
SELECT brand, count(*) FROM products WHERE ram='4GB' AND disk='250GB' GROUP BY brand
SELECT ram, count(*) FROM products WHERE brand='HP' AND disk='250GB' GROUP BY ram
SELECT disk, count(*) FROM products WHERE brand='HP' AND ram='4GB' GROUP BY disk
SELECT cpu, count(*) FROM products WHERE brand='HP' AND ram='4GB' AND disk='250GB' GROUP BY cpu
...
You should have indexes on the columns so that no query has to do a sequential scan over the table. Of course there are some "popular" combinations, and you will likely have to display the same numbers on multiple pages as the user sorts and navigates the list, so you might want to cache the numbers and invalidate the cache on update/insert/delete.
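The per-attribute pattern above can be run end to end. Below is a minimal sketch using Python's built-in sqlite3 module and a toy products table (column names taken from the queries above; the same SQL works on MySQL):

```python
import sqlite3

# Toy catalogue mirroring the queries above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (brand TEXT, ram TEXT, disk TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("HP",   "4GB", "250GB"),
        ("HP",   "2GB", "250GB"),
        ("Acer", "4GB", "250GB"),
        ("Acer", "4GB", "500GB"),
    ],
)

# The user's current selection: one value per attribute type.
selected = {"brand": "HP", "ram": "4GB", "disk": "250GB"}

def facet_counts(attribute):
    """Count products per value of `attribute`, applying every selected
    filter EXCEPT the one on `attribute` itself."""
    others = [(col, val) for col, val in selected.items() if col != attribute]
    # Column names come from our own whitelist above, never from user input.
    where = " AND ".join(f"{col} = ?" for col, _ in others)
    sql = (f"SELECT {attribute}, COUNT(*) FROM products "
           f"WHERE {where} GROUP BY {attribute}")
    return dict(conn.execute(sql, [val for _, val in others]))

# Acer shows up in the brand facet even though only HP is checked,
# because the brand query drops the brand filter itself.
print(facet_counts("brand"))
```

With the sample data, the brand facet reports one HP and one Acer product matching ram='4GB' AND disk='250GB', which is exactly the behaviour described in the question.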
It could be that there is some sophisticated means of determining some computed distance of a result from your criteria, but maybe it is as simple as using an OR in the query rather than an AND.
Related
I am a junior PHP developer, growing day by day, and I am stuck on the performance problem described here:
I am making a search engine in PHP. My database has one table with 41 columns and millions of rows, so it is obviously a very large dataset. In index.php I have a form for searching data. When the user enters a search keyword and hits submit, the action goes to search.php, which shows the results. The query is like this:
SELECT * FROM TABLE WHERE product_description LIKE '%mobile%' ORDER BY id ASC LIMIT 10
This is the first query. After the results show, I have to run 4 other queries like these:
SELECT DISTINCT(weight_u) as weight from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country_unit) as country_unit from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country) as country from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(hs_code) as hscode from TABLE WHERE product_description LIKE '%mobile%'
These queries are for FILTERS. The problem is that when I hit the search button, all the queries run on every search, at a heavy cost in performance; it's very slow.
Is there any other method to fetch weight, country, country_unit and hs_code faster, or how can I achieve this?
The same functionality is implemented here, where the filter bar appears after the table is filled with data. How can I achieve this? Please help.
The full functionality is implemented here.
I have tried to explain my full problem; if there is any mistake, please let me know and I will improve the question. I am also new to Stack Overflow.
Firstly - are you sure this code is working as you expect it? The first query retrieves 10 records matching your search term. Those records might have duplicate weight_u, country_unit, country or hs_code values, so when you then execute the next 4 queries for your filter, it's entirely possible that you will get values back which are not in the first query, so the filter might not make sense.
If that's true, I would create the filter values in your client code (PHP): finding the unique values in 10 records is going to be quick and easy, and it reduces the number of database round trips.
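That idea can be sketched like this (in Python rather than PHP, purely for illustration): once the 10 records are in memory, a single pass over them yields the unique filter values without any further queries.

```python
# Rows as a DB driver might hand them back: the 10 records from the first
# query (only 3 shown here, with made-up values).
rows = [
    {"product_description": "mobile phone", "weight_u": "kg", "country": "IN", "hs_code": "8517"},
    {"product_description": "mobile case",  "weight_u": "kg", "country": "CN", "hs_code": "3926"},
    {"product_description": "mobile stand", "weight_u": "g",  "country": "CN", "hs_code": "8302"},
]

# One pass over the already-fetched rows replaces the four DISTINCT queries.
filters = {col: sorted({row[col] for row in rows})
           for col in ("weight_u", "country", "hs_code")}

print(filters["country"])  # ['CN', 'IN']
```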
Finally, the biggest improvement you can make is to use MySQL's fulltext searching features. The reason your app is slow is because your search terms cannot use an index - you're wild-carding the start as well as the end. It's like searching the phonebook for people whose name contains "ishra" - you have to look at every record to check for a match. Fulltext search indexes are designed for this - they also help with fuzzy matching.
I'll give you some tips that will prove useful in many situations when querying a large dataset, or indeed almost any dataset.
If you can, list the fields you want instead of querying for '*'; this is better practice. The weight of this increases as you have more columns and more rows.
Always try to use the PKs to look up the data. The more specific the filter, the less it will cost.
An index would come in pretty handy in this kind of situation, as it will make the search faster.
LIKE queries are generally slow and resource-heavy, even more so in your situation. So again, the more specific you are, the better it will get.
I'd also add that if you just want to retrieve data from these tables again and again, maybe a VIEW would fit nicely.
Those are just some tips that came to my mind to ease your problem.
Hope it helps.
So I have a website with a product catalog. This page has three product sliders: one for recent products, another for bestsellers, and a third for special offers.
Is it better to create a query for each type of slider, or should I get all the products and then have PHP sort them out and separate them into three distinct arrays, one for each slider?
Currently I am just doing
SELECT * FROM products WHERE deleted = 0
For testing.
It's almost always best to refine the query so that it returns only the records you actually need, and also only the columns you really need. So, in your case this would look like:
SELECT id, description, col3
FROM products
WHERE deleted = 0
AND -- conditions that make it fit in the proper slider
The reason is that it also costs resources (time and bandwidth) to transport a result set over to the processing program, plus processing time to evaluate the individual returned records; thus the "cost" of the query will grow with the table size, not with the size of the dataset you actually need.
Just an example: let's suppose you want to create a top 10 sellers list, and you have 100 items in store. You'll retrieve 100 records and retain 10. No big deal. Now your shop grows and you have 100,000 items in store. For your top 10 you'll have to plow through all of these and throw away 99,990 records: 99,990 records the db server has to read and transfer to you, and 99,990 records you have to individually inspect to see whether each is a top 10 item.
As this type of query is going to be executed often, it's also a good idea to optimize them by indexing the search columns, as indexed searches in the db server are much faster.
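To check whether such a query is actually index-backed, most engines can show you the plan. Here is a small sketch with Python's sqlite3 (the table and index names are made up for the demo; MySQL's equivalent is EXPLAIN):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, description TEXT, deleted INTEGER)")
conn.execute("CREATE INDEX idx_products_deleted ON products (deleted)")

# Ask SQLite how it would run the slider query: a SEARCH via the index,
# rather than a SCAN of the whole table, means the filter is index-backed.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id, description FROM products WHERE deleted = 0"
).fetchall()[0][3]
print(plan)
```

The plan line names idx_products_deleted; without the index it would report a full table scan instead.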
I said "almost always" because there are rare cases where you have a hard time expressing in SQL what you actually need, or where you need to use fairly exotic techniques like query hints to force the database engine to execute your query in a reasonably efficient way. But these cases are fairly rare, and when a query doesn't perform as expected, with some analysis you'll manage to improve its performance in most cases - have a look at the literally thousands of questions regarding query optimization here on SO.
I'm working on a management system for a small library. I proposed them to replace the Excel spreadsheet they are using now with something more robust and professional like PhpMyBibli - https://en.wikipedia.org/wiki/PhpMyBibli - but they are scared by the amount of fields to fill, and also the interfaces are not fully translated in Italian.
So I made a very trivial DB, with basically a table for the authors and a table for the books. The authors table exists because I'm tired of having to explain that "Gabriele D'Annunzio" != "Gabriele d'Annunzio" != "Dannunzio G." and so on.
My test tables are now populated with ~ 100k books and ~ 3k authors, both with plausible random text, to check the scripts under pressure.
For the public consultation I want to make an interface like that of Gallica, the website of the Bibliothèque nationale de France, which I find pretty useful. A sample can be seen here: http://gallica.bnf.fr/Search?ArianeWireIndex=index&p=1&lang=EN&f_typedoc=livre&q=Computer&x=0&y=0
The concept is pretty easy: for each menu, e.g. the author one, I generate a fancy <select> field with all the names retrieved from the DB, and this works smoothly.
The issue arises when I try to add beside every author name the number of books, as made by Gallica, in this way (warning - conceptual code, not actual PHP):
SELECT id, surname, name FROM authors
foreach row {
    SELECT COUNT(*) as num FROM books WHERE id_auth=id
    echo "<option>$surname, $name ($num)</option>";
}
With the code above a core of the CPU jumps at 100%, and no results are shown in the browser. Not surprising, since they are 3k queries on a 100k table in a very short time.
Just to try, I added a LIMIT 100 to the first query (on the authors table). The page then required 3 seconds to be generated, and 15 seconds when I raised the LIMIT to 500 (seems a linear increment). But of course I can't show to library users a reduced list of authors.
I don't know which hardware/software is used by Gallica to achieve their results, but I bet their budget is far above that of a small village library using 2nd hand computers.
Do you think that to add a "number_of_books" field in the authors table, which will be updated every time a new book is inserted, could be a practical solution, rather than to browse the whole list at every request?
BTW, a similar procedure must be done for the publication date, the language, the theme, and some other fields, so the query time will be hit again, even if the other tables are a lot smaller than the authors one.
Your query style is very inefficient - try using a join and group structure:
SELECT
authors.id,
authors.surname,
authors.name,
COUNT(books.id) AS numbooks
FROM authors
INNER JOIN books ON books.id_auth=authors.id
GROUP BY authors.id
ORDER BY numbooks DESC
;
EDIT
Just to clear up some things I didn't explicitly state:
Of course you no longer need a query in the PHP loop, just the displaying portion
Indices on books.id_auth and authors.id (the latter primary or unique) are assumed
EDIT 2
As #GordonLinoff pointed out, the IFNULL() is redundant in an inner join, so I removed it.
To get all themes, even those without any books, just use a left join (this time including the IFNULL(), in case your provider's MySQL is old):
SELECT
themes.id,
themes.main,
themes.sub,
IFNULL(COUNT(books.theme),0) AS num
FROM themes
LEFT JOIN books ON books.theme=themes.id
GROUP BY themes.id
;
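Here is a runnable sketch of that LEFT JOIN behaviour, using Python's sqlite3 and two toy themes (the same SQL runs on MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE themes (id INTEGER PRIMARY KEY, main TEXT, sub TEXT);
    CREATE TABLE books  (id INTEGER PRIMARY KEY, theme INTEGER);
    INSERT INTO themes VALUES (1, 'Poetry', 'Contemporary'), (2, 'Poetry', 'Etruscan');
    INSERT INTO books  VALUES (1, 1), (2, 1);
""")

# LEFT JOIN keeps themes that match no book; COUNT(books.theme) skips the
# NULLs those unmatched rows produce, so empty themes count as 0.
rows = conn.execute("""
    SELECT themes.main, themes.sub, COUNT(books.theme) AS num
    FROM themes
    LEFT JOIN books ON books.theme = themes.id
    GROUP BY themes.id
    ORDER BY num DESC
""").fetchall()
print(rows)  # [('Poetry', 'Contemporary', 2), ('Poetry', 'Etruscan', 0)]
```

Note the Etruscan theme comes back with a count of 0, not missing, which is what the disabled-option listing needs.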
EDIT 3
Of course a stored value will give you the best performance, but this denormalization comes at a cost: your database now has the potential to become inconsistent in a user-visible way.
If you do go with this method, I strongly recommend you use triggers to auto-fill this field (and of course those triggers must sit on the books table).
Be prepared to see slowed-down inserts; this might of course be okay, as I guess you will see a much higher rate of SELECTs than INSERTs.
After reading a lot about how the JOIN statement works, with the help of
useful answer 1 and useful answer 2, I discovered I used it some 15 or 20 years ago, then I forgot about this since I never needed it again.
I made a test using the options I had:
reply with the JOIN query with IFNULL(): 0.5 seconds
reply with the JOIN query without IFNULL(): 0.5 seconds
reply using a stored value: 0.4 seconds
That DB will run on some single-core old iron, so I think a 20% difference could be significant, and I decided to use stored values, updating the count every time a new book is inserted (i.e. not often).
Anyway thanks a lot for having refreshed my memory: JOIN queries will be useful somewhere else in my DB.
update
I used the JOIN method above to query the book themes, which are stored into a far smaller table, in this way:
SELECT themes.id, themes.main, themes.sub, COUNT(books.theme) AS num FROM themes JOIN books ON books.theme = themes.id GROUP BY themes.id ORDER BY themes.main ASC, themes.sub ASC
It works fine, but for themes which are not in the books table I obviously don't get a 0 response, so I don't have lines like Contemporary Poetry - Etruscan (0) to show as disabled options for the sake of list completeness.
Is there a way to have back my theme.main and theme.sub?
My PHP application has the function product_fetch([parameters]), which returns 'Product' objects whose information is stored in the database.
In my admin area, there is a page called "Featured Products" which allows me to select 10 products to be displayed in the main page.
Now comes the problem: I made 10 selects/comboboxes, each of which allows me to select one product out of 400. So in order to build all the options, a query has to be made: SELECT * FROM products
Question: Is it correct to make such a query, even though there's hundreds of rows?
The solution you proposed is certainly do-able, and 400 rows is really meek compared to the upper limits of what MySQL is capable of handling. What is more concerning is the user experience here. Granted this will only affect you, but I would design myself something a little nicer than a bunch of <select>s. My idea is to start with just one textbox that autocompletes the names of your products. This can be accomplished if the product title has a fulltext index. Then your autocomplete script could use this query:
SELECT * FROM Products WHERE MATCH(title) AGAINST ('contents of textbox' IN BOOLEAN MODE);
There are plenty of jQuery plugins like Autocomplete that will handle the JS side (querying the server for autocomplete results). The one I just mentioned adds a term GET parameter which you could easily grab and throw into the query:
// You might want to only select the relevant columns
$stmt = $pdo->prepare('SELECT * FROM Products WHERE MATCH(title) AGAINST (:search IN BOOLEAN MODE)');
$stmt->execute(array(':search' => $_GET['term']));
// Output JSON results
echo json_encode($stmt->fetchall());
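As a rough illustration of index-backed prefix search, here is the same idea with SQLite's FTS5 through Python's sqlite3 (assuming your SQLite build includes FTS5, as most do; in MySQL the FULLTEXT index with MATCH ... AGAINST plays this role, and the table name here is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE products_fts USING fts5(title)")
conn.executemany("INSERT INTO products_fts VALUES (?)",
                 [("red widget",), ("blue widget",), ("red gadget",)])

# Prefix matching ('red*') is what an autocomplete box needs: the fulltext
# index is consulted instead of scanning every title as LIKE '%red%' would.
hits = [t for (t,) in conn.execute(
    "SELECT title FROM products_fts WHERE products_fts MATCH ? ORDER BY title",
    ("red*",))]
print(hits)  # ['red gadget', 'red widget']
```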
Once you type in (or click on an autocomplete result) in the one textbox, another should appear below it and the focus should go to it. From there you can type another product, and continue until you reach 10. Each textbox (unless there is only one) should have a delete link next to it that removes that product from the featured listing.
I'll look for a site that implements this kind of functionality so you can better understand what I'm talking about. Basically here, what you're searching for is achieved. You don't have to have 10 select boxes with 400 options and you don't need a SELECT * FROM Products.
You would be much better off specifying which products you want in the query, and only returning those, if you have no intention of using any of the others at all.
You can do this using many methods. A simple one would be using an ID field in an IN clause like this:
select col1, col2 from products where id in(1,4,12,5)
It might seem to make little difference, but what if your products table had a hundred thousand rows in it?
You could also have a flag in the table to say that the items are featured which would let you use something like this:
select col1, col2 from products where featured='Y'
Or you could even have a table that only has featured items (even just their ID) and join it to your main listing like this:
select
a.col1,
a.col2
from
products a
join featured b
on a.id=b.id
If you want to pick through your whole table, you can even use a simple limit clause that picks up a certain number of rows from the table and can be reused to get the next set:
select col1, col2 from products limit 10;
-- Will pick the first 10 rows of data.
select col1, col2 from products limit 30,10;
-- Will pick rows 31-40 (skipping the first 30, then returning the next 10)
The short version is though, no matter how you do it, pulling back a whole table of data to pick through it within PHP is a bad thing and to be avoided at all costs. It makes the database work much harder, uses more network between the database and PHP (even if they are on the same machine, it is transferring a lot more data between the two of them) and it will by default make PHP use a lot more resources to process the information.
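The LIMIT/OFFSET pattern above is easy to verify with a small sketch (Python's sqlite3 here; MySQL's LIMIT 30,10 is shorthand for LIMIT 10 OFFSET 30). Note the ORDER BY: without one, the order of paged rows is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, col1 TEXT)")
conn.executemany("INSERT INTO products (col1) VALUES (?)",
                 [(f"item {n}",) for n in range(1, 101)])

page_size, page = 10, 3  # zero-based page number
# Skip the first 30 rows, return the next 10: rows 31-40.
ids = [i for (i,) in conn.execute(
    "SELECT id FROM products ORDER BY id LIMIT ? OFFSET ?",
    (page_size, page * page_size))]
print(ids)  # [31, 32, ..., 40]
```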
SELECT * FROM tbl LIMIT $page,10;
The above query will select 10 entries from offset $page
Fetching all rows is a bad idea as you are using 10 only anyway. When your table expands to millions of rows, you will see a noticeable difference.
There is nothing fundamentally wrong with selecting all rows if that's what you mean to do. Hundreds of rows also shouldn't be a problem for the query in terms of performance. However, thousands or millions might if your database grows that big.
Think about the user - can they scroll through hundreds of products? Probably. If not, then maybe it's the UI design at fault, not the query.
I am currently building a site for a car dealership. I would like to allow the user to refine the search results similar to amazon or ebay. The ability to narrow down the results with a click would be great. The problem is the way I am doing this now there are many different queries that need to be done each with a COUNT total.
So the main ways to narrow down the results are:
Vehicle Type
Year
Make
Price Range
New/Used
Currently I am doing 5 queries every time this page is loaded to get the numbers of results while passing in the set values.
Query 1:
SELECT vehicle_type, COUNT(*) AS total FROM inventory
[[ Already Selected Search Parameters]]
GROUP BY vehicle_type
ORDER BY vehicle_type ASC
Query 2:
SELECT make, COUNT(*) AS total FROM inventory
[[ Already Selected Search Parameters]]
GROUP BY make
ORDER BY make ASC
Query 3,4,5...
Is there any way to do this in one query? Is it faster?
Your queries seem reasonable.
You can do it in a single query using UNION ALL:
SELECT 'vehicle_type' AS query_type, vehicle_type, COUNT(*) AS total
FROM inventory
...
GROUP BY vehicle_type
UNION ALL
SELECT 'make', make, COUNT(*) AS total FROM inventory ... GROUP BY make
UNION ALL
SELECT ... etc ...
The performance benefit of this will not be huge.
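A minimal runnable version of that UNION ALL approach, sketched with Python's sqlite3 and a toy inventory table (the already-selected search parameters are omitted from the WHERE clauses here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (vehicle_type TEXT, make TEXT)")
conn.executemany("INSERT INTO inventory VALUES (?, ?)",
                 [("truck", "Ford"), ("sedan", "Ford"), ("sedan", "Honda")])

# One round trip returns every facet; the first column tells the
# application which filter each count row belongs to.
rows = conn.execute("""
    SELECT 'vehicle_type' AS query_type, vehicle_type AS value, COUNT(*) AS total
    FROM inventory GROUP BY vehicle_type
    UNION ALL
    SELECT 'make', make, COUNT(*) FROM inventory GROUP BY make
    ORDER BY query_type, value
""").fetchall()
print(rows)
```

The application then splits the result set on the query_type column to rebuild the five filter lists.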
If you find that you are firing off these queries a lot and the results don't change often, you might want to consider caching the results. Consider using something like memcache.
There are a couple of ways to rank data along the lines of data warehousing, but what you are trying to accomplish, in search terms, is called facets. A real search engine (as used by the sites you mentioned) performs this.
SEE: Faceted searching and categories in MySQL and Solr
Many sites use the Lucene (Java-based) search engine with Solr to accomplish what you are referring to. There is a newer solution called ElasticSearch that has a RESTful API and offers facets, but you'd need to install Java and ES, and you could then make calls to the search engine, which returns native JSON.
SEE: http://www.elasticsearch.org/guide/reference/api/search/facets/
Doing it in MySQL without so many joins might require additional tables and perhaps triggers, and it gets complex. If the car dealership isn't expecting Cars.com traffic (millions of hits per day), then you may be optimizing something before it actually needs it. Your current set of queries might be fast enough, and you haven't reported an actual issue or bottleneck.
Use JOIN syntax:
http://dev.mysql.com/doc/refman/5.6/en/join.html
Or, I think you could write a MySQL function for this, to which you would pass your search parameters.
http://dev.mysql.com/doc/refman/5.1/en/create-function.html
To find out which is faster, you should run your own speed tests. That helped me discover that some of my queries were faster without joins.