I have a series of rows in MySQL with a 'location' column, which represents the location of an object on a two-dimensional xy grid. I want to search the table for rows whose location is within a given distance of a certain tile.
For example, a search within 10 tiles of [34,56] would return any rows whose 'location' has x between 24 and 44 and y between 46 and 66.
My solution to this problem was to create an array (using for loops) with all of the possible tiles that would fall within that search term, and then query MySQL thusly:
"SELECT * FROM table WHERE localcoordinate IN ('$rangearray')"
This solution works fine, but is very resource intensive. I'd like to be able to run many searches at a distance of hundreds or thousands of tiles. Can anyone suggest a better approach that might run faster?
I improved my resource consumption by a factor of 100 by implementing the following strategy changes.
1) I broke the xy coordinate into two fields within the table.
2) I searched natively in MySQL with the BETWEEN operator.
The final query looked something like this. You can extrapolate the data structure from the query.
SELECT * FROM table WHERE localcoordinateX BETWEEN $x_lo AND $x_hi AND localcoordinateY BETWEEN $y_lo AND $y_hi
I should have thought of this the first time around, but I didn't. Just the act of posting to Stack Exchange got me thinking clearly again, though!
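The two changes above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name `objects`, the `id` column, and the helper names `bounding_box` and `find_within` are assumptions made for the example. Only the two coordinate column names come from the query above.

```python
import sqlite3

def bounding_box(x, y, distance):
    """Return (x_lo, x_hi, y_lo, y_hi) for a square search area around (x, y)."""
    return x - distance, x + distance, y - distance, y + distance

def find_within(conn, x, y, distance):
    """Fetch rows inside the square using one range query instead of a huge IN list."""
    x_lo, x_hi, y_lo, y_hi = bounding_box(x, y, distance)
    cur = conn.execute(
        "SELECT id, localcoordinateX, localcoordinateY FROM objects "
        "WHERE localcoordinateX BETWEEN ? AND ? "
        "AND localcoordinateY BETWEEN ? AND ? ORDER BY id",
        (x_lo, x_hi, y_lo, y_hi),
    )
    return cur.fetchall()

# Hypothetical schema and data; sqlite3 stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (id INTEGER PRIMARY KEY, "
    "localcoordinateX INTEGER, localcoordinateY INTEGER)"
)
# An index on the coordinate columns lets the range conditions avoid a full scan.
conn.execute("CREATE INDEX idx_xy ON objects (localcoordinateX, localcoordinateY)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?)",
    [(1, 34, 56), (2, 24, 66), (3, 45, 56), (4, 30, 45)],
)

nearby = find_within(conn, 34, 56, 10)
```

The query size stays constant no matter how large the search distance is, whereas the IN list grows with the square of the distance.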
Related
I am a junior PHP developer and I'm stuck on the performance problem described here:
I am building a search engine in PHP. My database has one table with 41 columns and millions of rows, so it is a very large dataset. In index.php I have a form for searching the data. When the user enters a search keyword and hits submit, the form posts to search.php, which shows the results. The query looks like this:
SELECT * FROM TABLE WHERE product_description LIKE '%mobile%' ORDER BY id ASC LIMIT 10
This is the first query. After the results are shown, I have to run 4 other queries like this:
SELECT DISTINCT(weight_u) as weight from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country_unit) as country_unit from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(country) as country from TABLE WHERE product_description LIKE '%mobile%'
SELECT DISTINCT(hs_code) as hscode from TABLE WHERE product_description LIKE '%mobile%'
These queries are for the filters. The problem is that when I submit the search, all of these queries run simultaneously, and the page is very slow.
Is there a faster way to fetch weight, country, country_unit and hs_code, and how can I achieve it?
The same functionality is implemented here, where the filter bar appears after the table is filled with data. How can I achieve this? Please help.
Full functionality implemented here.
I have tried to explain my full problem. If there is any mistake, please let me know and I will improve the question. I am also new to Stack Overflow.
Firstly - are you sure this code is working as you expect it? The first query retrieves 10 records matching your search term. Those records might have duplicate weight_u, country_unit, country or hs_code values, so when you then execute the next 4 queries for your filter, it's entirely possible that you will get values back which are not in the first query, so the filter might not make sense.
If that's true, I would create the filter values in your client code (PHP). Finding the unique values in 10 records is going to be quick and easy, and it reduces the number of database round trips.
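As a rough sketch of building the filter values in client code, assuming the 10 result rows have already been fetched as dictionaries: the example is in Python rather than PHP, the sample rows are made up, and `filter_values` is a hypothetical helper name. Only the four column names come from the question.

```python
def filter_values(rows, columns):
    """Collect the sorted distinct non-null values of each filter column."""
    return {
        col: sorted({row[col] for row in rows if row[col] is not None})
        for col in columns
    }

# Hypothetical rows, shaped like the result of the first (LIMIT 10) query.
rows = [
    {"id": 1, "weight_u": "kg", "country_unit": "USD", "country": "IN", "hs_code": "8517"},
    {"id": 2, "weight_u": "kg", "country_unit": "EUR", "country": "DE", "hs_code": "8517"},
    {"id": 3, "weight_u": "lb", "country_unit": "USD", "country": "US", "hs_code": None},
]

filters = filter_values(rows, ["weight_u", "country_unit", "country", "hs_code"])
```

This replaces the four DISTINCT queries with one pass over rows that are already in memory, and the filter can only offer values that actually appear in the displayed results.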
Finally, the biggest improvement you can make is to use MySQL's fulltext searching features. The reason your app is slow is because your search terms cannot use an index - you're wild-carding the start as well as the end. It's like searching the phonebook for people whose name contains "ishra" - you have to look at every record to check for a match. Fulltext search indexes are designed for this - they also help with fuzzy matching.
I'll give you some tips that will prove useful in many situations when querying a large dataset, or almost any dataset.
Listing the fields you want instead of querying for '*' is better practice. The benefit grows as you have more columns and more rows.
Always try to use the PKs to look up the data. The more specific the filter, the less it will cost.
An index would come in handy in this kind of situation, as it will make the search faster.
LIKE queries are generally slow and resource heavy, even more so in your situation. So again, the more specific you are, the better it will get.
Also, if you just want to retrieve data from these tables again and again, maybe a VIEW would fit nicely.
Those are just some tips that came to my mind to ease your problem.
Hope it helps.
I have a dilemma that I'm trying to solve right now. I have a table called "generic_pricing" that has over a million rows. It looks like this....
I have a list of 25000 parts that I need to get generic_pricing data for. Some parts have a CLEI, some have a partNumber, and some have both. For each of the 25000 parts, I need to search the generic_pricing table to find all rows that match either clei or partNumber.
Making matters more difficult is that I have to do matches based on substring searches. For example, one of my parts may have a CLEI of "IDX100AB01", but I need the results of a query like....
SELECT * FROM generic_pricing WHERE clei LIKE 'IDX100AB%';
Currently, my lengthy PHP code for finding these matches loops through the 25000 items. For each item, I use the query above on clei. If found, I use that row for my calculations. If not, I execute a similar query on partNumber to try to find the matches.
As you can imagine, this is very time consuming. And this has to be done for about 10 other tables similar to generic_pricing to run all of the calculations. The system is now bogging down and timing out trying to crunch all of this data. So now I'm trying to find a better way.
One thought I have is to just query the database one time to get all rows, and then use loops to find matches. But for 25000 items each having to compare against over a million rows, that just seems like it would take even longer.
Another thought I have is to get 2 associative arrays of all of the generic_pricing data. i.e. one array of all rows indexed by clei, and another all indexed by partNumber. But since I am looking for substrings, that won't work.
I'm at a loss here for an efficient way to handle this task. Is there anything that I'm overlooking to simplify this?
Do not query the db for all rows and sort them in your app. It will cause a lot more headaches.
Here are a few suggestions:
Use parameterized queries. This allows your db engine to compile the query once and use it multiple times. Otherwise it will have to optimize and compile the query each time.
Figure out a way to make IN work. Instead of using LIKE, try ... left(clei,8) in ('IDX100AB','IDX100AC','IDX101AB'...)
Do the calculations/math on the db side. Build a stored proc which takes a list of part/clei numbers and outputs the same list with the computed prices. You'll have a lot more control of execution and a lot less network overhead. If not a stored proc, build a view.
Paginate. If this data is being displayed somewhere, switch to processing in batches of 100 or less.
Build a cheat sheet. If speed is an issue try precomputing prices into a separate table nightly, include some partial clei/part numbers if needed. Then use the precomputed lookup table.
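The parameterized-query and IN suggestions above could be combined along these lines. This is only a sketch under stated assumptions: sqlite3 stands in for MySQL (so `substr(clei, 1, 8)` plays the role of `left(clei,8)`), the `price` column is hypothetical because the question does not show the table layout, the 8-character prefix length is taken from the example CLEI, and `chunks` and `fetch_by_clei_prefix` are made-up helper names.

```python
import sqlite3
from collections import defaultdict

PREFIX_LEN = 8  # assumption: the first 8 characters decide a match, as in 'IDX100AB%'

def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def fetch_by_clei_prefix(conn, cleis, batch_size=500):
    """Look up many CLEI prefixes with a few batched IN queries instead of one LIKE per part."""
    prefixes = sorted({c[:PREFIX_LEN] for c in cleis if c})
    matches = defaultdict(list)
    for batch in chunks(prefixes, batch_size):
        placeholders = ", ".join("?" for _ in batch)
        sql = (
            "SELECT clei, partNumber, price FROM generic_pricing "
            f"WHERE substr(clei, 1, {PREFIX_LEN}) IN ({placeholders})"
        )
        for clei, part_number, price in conn.execute(sql, batch):
            matches[clei[:PREFIX_LEN]].append((clei, part_number, price))
    return matches

# Hypothetical table layout and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE generic_pricing (clei TEXT, partNumber TEXT, price REAL)")
conn.executemany(
    "INSERT INTO generic_pricing VALUES (?, ?, ?)",
    [("IDX100AB01", "PN-1", 10.0), ("IDX100AB02", "PN-2", 12.5), ("IDX999ZZ01", "PN-3", 99.0)],
)

matches = fetch_by_clei_prefix(conn, ["IDX100AB01", "IDX101AB07"])
```

With 25000 parts and a batch size of 500, this issues at most 50 queries per table rather than up to 50000. In MySQL, an expression such as left(clei,8) generally cannot use a plain index on clei, so storing the prefix in its own indexed column may be worth considering.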
I work on a site which sells let's say stuff and offers a "vendors search". On this search you enter your city, or postal code, or region and a distance (in km or miles) then the site gives you a list of vendors.
To do that, I have a database with the vendors. In the form to save these vendors, you enter their full address and when you click on the save button, a request to google maps is made in order to get their latitude and longitude.
When someone does a search, I look on a table where I store all the search terms and their lat/lng.
This table looks like
+--------+-------+------+
| term | lat | lng |
+--------+-------+------+
So the first query is something very simple
select lat, lng from my_search_table where term = "the term"
If I find a result, I then search with a nice method for all the vendors in the range the visitor wants and print the result on a map.
If I don't find a result, I search with a levenshtein function, because people writing bruxelle or bruxeles instead of bruxelles is really common and I don't want to make a request to google maps all the time (I also have a "how many times searched" column in my table to get some stats).
So I query my_search_table with no where clause and loop through all results to get the smallest levenshtein distance. If the smallest result is greater than 2, I request coordinates from google maps.
Here is my problem. For some countries (we have several sites all around the world), my_search_table has 15-20k+ entries... and php doesn't (really) like looping on such data (which I perfectly understand) and my request falls under the php timeout. I could increase this timeout but the problem will be the same in a few months.
So I tried a levenshtein MySQL function (found on stackoverflow btw) but it's also very slow.
So my question is "is there any way to make this search fast even on very large datasets ?"
My suggestion is based on three things:
First, your data set is big. That means it is big enough to reject the idea of "select all" + "run levenshtein() in PHP application".
Second, you have control over your database. So you can adjust some architecture-related things
Finally, performance of SELECT queries is the most important thing, while performance for adding new data doesn't matter.
The thing is, you cannot perform a fast levenshtein search because levenshtein itself is very slow. I mean, calculating the levenshtein distance is slow. Thus, you won't be able to resolve the issue with a "smart search" alone. You'll have to prepare some data.
A possible solution: create a group index and assign it when adding or updating data. That means you'll store an additional column holding some hash (numeric, for example). When adding new data, you'll:
Perform a search with levenshtein distance (for that you may use either your application or the function you've already mentioned) over all records in your table against the inserted data.
Set the group index for the new row to the value that the rows found in the previous step have.
If nothing is found, set a new group index value (it's the first row and there are no similar rows yet), which will be different from any group index value already present in the table.
To search for the desired rows, you just need to select rows with the same group index value. That means your select queries will be very fast. But, yes, this will cause a huge overhead when adding or changing your data. Thus, it isn't applicable when the performance of inserts and updates matters.
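The insert-time grouping steps above might look roughly like this. It is a sketch only: the in-memory dictionary stands in for the table's group index column, the names `levenshtein` and `assign_group` are made up for the example, and the threshold of 2 is taken from the question.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]

def assign_group(terms_by_group, new_term, max_distance=2):
    """Return the group index for new_term, creating a new group if nothing is close enough."""
    for group_id, terms in terms_by_group.items():
        if any(levenshtein(new_term, term) <= max_distance for term in terms):
            terms.append(new_term)
            return group_id
    new_id = max(terms_by_group, default=0) + 1
    terms_by_group[new_id] = [new_term]
    return new_id

# Hypothetical in-memory stand-in for the table: group index -> stored terms.
groups = {}
ids = [assign_group(groups, term) for term in ["bruxelles", "bruxelle", "paris", "bruxeles"]]
```

All the expensive distance calculations happen once, at insert time. A later search only needs an equality match on the group index, which an ordinary index can serve.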
You could try the MySQL SOUNDS LIKE operator:
SELECT lat, lng FROM my_search_table WHERE term SOUNDS LIKE "the term"
You can use a kd-tree or a ternary tree to speed up the search. The idea is to use a binary search.
I have a MySQL database table of people's names with thousands of rows.
I'm coding a search script for this table to display the most similar names stored in the table.
So I thought of fetching ALL the rows of the table, then using a FOREACH loop that calls similar_text() (a function that can report similarity as a percentage) and displaying only the names that are at least 60% similar.
Will my website's performance suffer too much if I do this (fetching all rows)?
Will my server bandwidth suffer because of that?
ps: 'SOUNDS LIKE' MySQL command doesn't help much on this case
Let the database do the searching.
See this question, looks like what you need: How to find similar results and sort by similarity?
Yes this will most likely slow down your site, especially as your site grows and you have many users searching simultaneously.
If possible use a stored procedure or user defined function inside the database to do the searching. Also even if you don't know the exact spelling of the entry you are looking for, if you know the first letter you can speed up the search. You can use something like WHERE name LIKE 'F%' AND similar_text(name, 'FOOBAR') > 0.6 because then an index can be used to find only those rows that start with F.
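The effect of the first-letter trick can be sketched as follows. This variant is not the in-database function the answer describes: it narrows the rows with an indexed LIKE prefix in SQL and then scores the survivors in application code. sqlite3 stands in for MySQL, Python's difflib.SequenceMatcher stands in for similar_text() (the two do not produce identical scores), and the `people` table and `similar_names` helper are assumptions for the example.

```python
import sqlite3
from difflib import SequenceMatcher

def similar_names(conn, query, threshold=0.6):
    """Narrow candidates by first letter in SQL, then rank them by similarity."""
    prefix = query[:1] + "%"
    rows = conn.execute("SELECT name FROM people WHERE name LIKE ?", (prefix,))
    scored = []
    for (name,) in rows:
        score = SequenceMatcher(None, name.lower(), query.lower()).ratio()
        if score >= threshold:
            scored.append((score, name))
    # Best match first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]

# Hypothetical table and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.execute("CREATE INDEX idx_name ON people (name)")
conn.executemany(
    "INSERT INTO people VALUES (?)",
    [("Foobar",), ("Fubar",), ("Frank",), ("Barfoo",)],
)

result = similar_names(conn, "FOOBAR")
```

Only the rows starting with the same letter are transferred and scored, so the cost depends on the size of that slice rather than on the whole table.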
I'm running a SQL query to get basic details from a number of tables, sorted by the last update date field. It's terribly tricky, and I'm wondering whether there is an alternative to using the UNION clause... I'm working in PHP and MySQL.
Actually, I have a few tables containing news, articles, photos, events etc. and need to collect all of them in one query to show a simple "what's newly added on the website" kind of thing.
Maybe do it in PHP rather than MySQL - if you want the latest n items, then fetch the latest n of each of your news items, articles, photos and events, and sort in PHP (you'll need the last n of each obviously, and you'll then trim the dataset in PHP). This is probably easier than combining those with UNION given they're likely to have lots of data items which are different.
I'm not aware of an alternative to UNION that does what you want, and hopefully those fetches won't be too expensive. It would definitely be wise to profile this though.
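A minimal sketch of the fetch-then-merge approach described in this answer, written in Python rather than PHP. It assumes each table has already been queried for its own latest n rows and that every row carries an update timestamp and a title; the `latest_items` helper and the sample data are made up for illustration.

```python
import heapq
from datetime import datetime

def latest_items(sources, n):
    """Merge per-table results and keep the n most recently updated items overall."""
    merged = (
        (updated_at, kind, title)
        for kind, items in sources.items()
        for updated_at, title in items
    )
    # nlargest returns the items sorted from newest to oldest.
    return heapq.nlargest(n, merged, key=lambda item: item[0])

# Hypothetical per-table results, each already limited to its own latest rows.
sources = {
    "news": [(datetime(2024, 1, 5), "News A"), (datetime(2024, 1, 1), "News B")],
    "articles": [(datetime(2024, 1, 4), "Article A")],
    "photos": [(datetime(2024, 1, 6), "Photo A"), (datetime(2024, 1, 2), "Photo B")],
}

newest = latest_items(sources, 3)
```

Because each source contributes at most n rows, the merge handles only a small, bounded number of items regardless of table size.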
If you use JOIN in your query, you can select data from different tables that are related by foreign keys.
You can look at this from another angle: do you absolutely need up-to-date information? (i.e. should new information appear the moment someone enters it?)
If not, you can have a table holding the results of the query in the format you need (serving as cache), and update this table every 5 minutes or so. Then your query problem becomes trivial, as you can have the updates run as several updates in the background.
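The cache-table idea could be sketched like this, under several assumptions: sqlite3 stands in for MySQL, only two of the content tables are shown, and the table name `whats_new`, its columns, and the `refresh_whats_new` function are hypothetical. In practice something like a cron job would call the refresh every few minutes.

```python
import sqlite3

def refresh_whats_new(conn, limit=10):
    """Rebuild the cache table from the source tables in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM whats_new")
        conn.execute(
            "INSERT INTO whats_new (kind, title, updated_at) "
            "SELECT kind, title, updated_at FROM ("
            "  SELECT 'news' AS kind, title, updated_at FROM news "
            "  UNION ALL "
            "  SELECT 'article' AS kind, title, updated_at FROM articles"
            ") ORDER BY updated_at DESC LIMIT ?",
            (limit,),
        )

# Hypothetical schema and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news (title TEXT, updated_at TEXT)")
conn.execute("CREATE TABLE articles (title TEXT, updated_at TEXT)")
conn.execute("CREATE TABLE whats_new (kind TEXT, title TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO news VALUES (?, ?)",
    [("Site relaunch", "2024-01-05"), ("Old news", "2023-12-01")],
)
conn.execute("INSERT INTO articles VALUES (?, ?)", ("MySQL tips", "2024-01-03"))

refresh_whats_new(conn, limit=2)

# The page itself only ever runs this cheap single-table query.
latest = conn.execute(
    "SELECT kind, title FROM whats_new ORDER BY updated_at DESC"
).fetchall()
```

The expensive combining query still exists, but it runs in the background on a schedule rather than on every page view.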