Euclidean Distance Calculation in C - php

I tried to calculate the Euclidean distance in PHP using the following code, but it takes very long. I want to test whether the same operation is faster in C. The input data would be passed from PHP, while all the other data is stored in the MySQL database. How can I make the operation fast, given that I have to calculate the distance to 30,000+ images with about 900 attributes each? And how can I make this calculation faster in C than in PHP? I have not programmed in C a lot, so any suggestion will be highly appreciated.
The query used in PHP for the distance calculation can be summarized as below:
SELECT tbl_img.img_id,
tbl_img.img_path,
((pow(($r[9]-coarsewt_1),2))+(pow(($r[11]-coarsewt_2),2))+ ... +(pow(($r[31]-coarsewt_12),2))+
(pow(($r[36]-finewt_$wt1),2))+(pow(($r[38]-finewt_$wt2),2))+(pow(($r[40]-finewt_$wt3),2))+
(pow(($r[43]-shape_1),2))+(pow(($r[44]-shape_2),2))+ ... +(pow(($r[462]-shape_420),2))+
(pow(($r[465]-texture_1),2))+(pow(($r[466]-texture_2),2))+ ... +(pow(($r[883]-texture_419),2))+(pow(($r[884]-texture_420),2)))
as distance
FROM tbl_img
INNER JOIN tbl_coarsewt
ON tbl_img.img_id=tbl_coarsewt.img_id
INNER JOIN tbl_finewt
ON tbl_img.img_id=tbl_finewt.img_id
INNER JOIN tbl_shape
ON tbl_img.img_id=tbl_shape.img_id
INNER JOIN tbl_texture
ON tbl_img.img_id=tbl_texture.img_id
WHERE tbl_img.img_id>=1 AND tbl_img.img_id<=31930
ORDER BY distance ASC LIMIT 6

Your problem is not with the language, as Arash Kordi put it. That SQL is going to be executed by your SQL server, and thanks to the algorithm used, that server is going to be your bottleneck, not the language your script is written in. If you switch to C, you won't gain any significant speed unless you also change your strategy.
Basic rules of thumb for optimization:
Don't use the database for your calculations. Use the database to fetch the relevant data, then carry out the calculations in PHP or C (see the sketch after this list).
Pre-calculated look-up arrays: Analyze your data and see if you can build a look-up array of -- say -- pow() results instead of recalculating each value every time. This helps if you have a lot of repetitive data.
Avoid serialization -- could you run multiple instances of your script in parallel on different sections of your data to maximize throughput?
Consider using server-side prepared statements -- they may speed things up a little.
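To illustrate the first point, here is a minimal sketch of a query that only fetches the raw attribute values, using the table and column names from the question; the column lists are abbreviated, and the real query would name all ~900 attribute columns. The squared-difference sums then happen in PHP or C instead of in SQL:
-- Fetch the raw attributes once; compute the squared Euclidean distances client-side.
SELECT tbl_img.img_id,
       tbl_img.img_path,
       -- plus the three finewt_* columns chosen by $wt1..$wt3 in the original query
       tbl_coarsewt.coarsewt_1, tbl_coarsewt.coarsewt_2,   -- ... through coarsewt_12
       tbl_shape.shape_1, tbl_shape.shape_2,               -- ... through shape_420
       tbl_texture.texture_1, tbl_texture.texture_2        -- ... through texture_420
FROM tbl_img
INNER JOIN tbl_coarsewt ON tbl_img.img_id = tbl_coarsewt.img_id
INNER JOIN tbl_finewt   ON tbl_img.img_id = tbl_finewt.img_id
INNER JOIN tbl_shape    ON tbl_img.img_id = tbl_shape.img_id
INNER JOIN tbl_texture  ON tbl_img.img_id = tbl_texture.img_id
WHERE tbl_img.img_id BETWEEN 1 AND 31930;
The client code then keeps a small running top-6 (the ORDER BY distance ASC LIMIT 6 part), and the fetched attribute matrix can be cached in memory between searches so MySQL does not have to be re-scanned for every query image.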

Related

Efficiency operating in MySQL or PHP?

I have a question.
I have a big query that gets all the products of a web store; I then process the data into a CSV to synchronize with another external server, in this case DooFinder.
Right now I am doing the processing in the query. Example:
- round((p.p_price*(SELECT tax_rate FROM tax_rates WHERE tax_rates.tax_rates_id=p.p_tax_class_id)/100)+p.p_price,2)
- concat('https://www.domain.de/pimg/',p.p_image) AS image, p.manufacturers_id
And the question is: what is more efficient, doing the operations in the query or in PHP? Right now I have just over 20 products on a test site and it works perfectly, but the goal is to have 1,000+ products.
This is the query ($i is for each language, so 1,000+ products * number of languages):
SELECT
pd.p_name,
p.p_quantity,
concat('https://www.domain.de/pinf.php?products_id=',p.products_id) AS p_link,
pd.p_description,
p.p_id,
p.p_tax_class_id,
p.p_date_available ,
round((p.p_price*(SELECT tax_rate FROM tax_rates WHERE tax_rates.tax_rates_id=p.p_tax_class_id)/100)+p.p_price,2) AS price,
concat('https://www.domain.de/pimg/',p.p_image) AS image, p.manufacturers_id
FROM
products p,
p_description pd,
p_to_categories ptc
WHERE
p.p_id = pd.p_id
AND
pd.language_id = ".$i."
AND
p.p_status=1
AND
ptc.p_id = p.p_id
AND
ptc.categories_id != 218
GROUP by p.p_id
When deciding whether to perform the computations on the client or on the database server, you should consider the following:
Which one has more spare CPU cycles?
Will making the client do the computation require additional data transfer from the server? If doing it on the server reduces the data transfer, it may be worthwhile to make the database CPU work a bit harder.
Can you reduce the data transfer by making the client compute the result? E.g. suppose the client displays 100 computations based on one column. In that case it makes sense to fetch just that column instead of all of the computations based on it.
In your specific example, the overhead of extra computations is going to be marginal relative to the rest of the query, so if you already have it coded to do it on the server, I would leave it like that. At the same time, if you are doing it on the client, it is not that bad either, so I would still not bother refactoring.
However, there is one thing that I would fix: rewrite the tax rate sub-select as a join. Even when the MySQL optimizer handles it well, that sub-select has more overhead than any of the computations.
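A sketch of that rewrite, keeping the table and column names from the question and showing only the columns involved in the price calculation (the LEFT JOIN is an assumption, so that products without a matching tax class are not dropped):
-- The correlated sub-select is replaced by joining tax_rates once.
SELECT round((p.p_price * tr.tax_rate / 100) + p.p_price, 2) AS price,
       concat('https://www.domain.de/pimg/', p.p_image) AS image,
       p.manufacturers_id
FROM products p
LEFT JOIN tax_rates tr ON tr.tax_rates_id = p.p_tax_class_id
WHERE p.p_status = 1;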
And, of course, as was suggested in the comments, benchmark your performance and make decisions based on those measurements. Beyond the obvious, proper benchmarking has the side benefit of uncovering odd performance problems and even functional bugs that would otherwise be found by your users at the worst possible time.

MYSQL Query optimization, comparing 3 tables w/ thousands of records

I have this query:
SELECT L.sku,L.desc1,M.map,T.retail FROM listing L INNER JOIN moto M ON L.sku=M.sku INNER JOIN truck T ON L.sku=T.sku LIMIT 5;
Each table (listing, moto, truck) has ~300,000 rows. Just for testing purposes I've set a LIMIT of 5 results; in the end I will need hundreds, but let's see...
That query takes about 3:26 minutes in the console... I don't want to imagine how long it will take through PHP... I need to handle it there.
Any advice/solution to optimize the query? Thanks!
Two things to recommend here:
Indexes
Denormalization
One thing people tend to do when databases get massive is invoke Denormalization. This is when you store the data from multiple tables in one table to prevent the need to do a join. This is useful if your application relies on specific reads to power it. It is a commonly used tactic when scaling.
If denormalization is out of the question, another, simpler way to optimize this query is to make sure you have indexes on the columns you are joining on. Once the columns L.sku, M.sku and T.sku are indexed, you should immediately notice an increase in performance.
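For example, a minimal sketch (the index names are arbitrary; skip any column that is already a primary or unique key):
-- One index per join column, so MySQL can look up matching sku values instead of scanning.
CREATE INDEX idx_listing_sku ON listing (sku);
CREATE INDEX idx_moto_sku ON moto (sku);
CREATE INDEX idx_truck_sku ON truck (sku);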
For any other optimizations I would need some more information about the data. Hope it helps!

Multiple SELECTs vs Single Query with JOIN

Our current setup looks a bit like this.
public_entry (5,000,000 rows) → telephone_number (5,000,000 rows) → user (400,000 rows)
Three tables, where each arrow indicates a foreign key constraint: the table on the left holds an integer foreign key referencing the table to its right.
Now we have two "views" of the data we want to present in our web app.
displaying telephone numbers with public entries based on user attributes (e.g. only numbers from male users), a bit like a score.
displaying telephone numbers with public entries based on their entry date.
Each result should get a score indicating how well the number fits your needs (e.g. you are looking for a plumber; if the number is in your area and the related user is a plumber, the telephone number should score high).
We tried several approaches to solving this problem, under two scenarios.
The first approach does a SELECT with INNER JOINs over the tables, like the following:
SELECT ..., (...) as score
FROM public_entry pe
INNER JOIN telephone_number tn ON tn.id = pe.numberid
INNER JOIN user u ON u.id = tn.userid WHERE ... ORDER BY score
Using this query on a smaller system (about 1/4 of the production system) performs very well, even under load.
However, when we put this query on the production system, it wreaked havoc, with execution times over 30 seconds.
The second approach was getting all public_entries filtered with a single SELECT on public_entry without any JOINs, iterating over them, and calling a SELECT for each public_entry to fetch the telephone_number and user, computing the score and discarding the result if the telephone_number and user do not match our filter/interest.
Usually the second approach is never considered, because it issues over 300 queries for a single page load. Foreach'ing over results and calling SELECTs inside the loop is usually considered bad style.
However, approach number two does perform on the production system: not well, but it does not take more than 1-3 seconds. It performs badly on the test systems, though.
Do you have any suggestions on where the problem might be?
EDIT:
Query
SELECT COUNT(p.id)
FROM public_entry p, fon f, user u
WHERE p.isweb = 1
AND f.hidden = 0
AND f.deleted = 0
AND f.id = p.fonid
AND u.id = f.userid
AND u.gender = "female"
This query has 3 seconds execution time.
This is just an example query. I can take out the WHERE clause and it performs just a bit worse. In general, if we do a SELECT COUNT() with a single INNER JOIN over the data, the query blows up (30 seconds).
I don't have the magic answer you want, but here are some 'reasons' for poor performance, and some possible workarounds (with caveats).
Which of isweb, hidden, deleted, and gender are the most 'selective'? The optimizer sees them as useless and annoying: if each has only two values, an INDEX on that field is probably useless. Hence, it picks one table, does a full scan, then reaches into the next table, etc. Notice in the EXPLAIN that it picked the smallest table (user) first. This is typically what the optimizer does when nothing in the WHERE clause looks useful.
Whether MySQL does all that work or you do all that work is about the same amount of effort. Perhaps you can do it faster, since you can keep simple associative arrays in memory, while MySQL is coded to allow the tables to live on disk and be "cached" in RAM, block by block. But if you don't have enough RAM to load everything, you are stuck with MySQL.
If you actually removed "hidden" and "deleted" rows, the task would be a little faster.
Your two SELECTs do not look much alike. Are you suggesting there is a wide range of SELECTs? And you effectively need to look through most of all 3 tables to get the "score" or "count"?
Let's look at this from a Data Warehouse approach... Is some of the data "static"; that is, unchanging and could be summarized? If so, precomputing subtotals (COUNT(*)) into a summary table would let the ultimate queries be a lot faster. DW often involves subtotals by day. But it requires that these subtotals don't change.
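A sketch of such a summary table, built from the tables and filters in the EDIT query; the entry-date column (called created_at here) is hypothetical, since the question only says that entries have a date:
-- Roll-up refreshed nightly (or incrementally); group by whatever dimensions you later filter on.
CREATE TABLE entry_counts AS
SELECT DATE(p.created_at) AS day,   -- hypothetical date column on public_entry
       u.gender,
       COUNT(*) AS cnt
FROM public_entry p
INNER JOIN fon f ON f.id = p.fonid
INNER JOIN user u ON u.id = f.userid
WHERE p.isweb = 1 AND f.hidden = 0 AND f.deleted = 0
GROUP BY DATE(p.created_at), u.gender;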
COUNT(x) has the overhead of checking x for being NULL. Usually that is not necessary and COUNT(*) gives you what you want.
How often are you running the same SELECT? Or, at least, similar SELECTs? Do you need up-to-the-second scores? I'm fishing for running all the likely queries in the middle of the night, then using the results for 24 hours. Note that some queries can run faster by doing multiple things at once. For example, instead of two SELECTs for 'female' versus 'male', do one SELECT and GROUP BY gender.
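For example, a single grouped count over the EDIT's tables replaces one COUNT query per gender (a sketch, not your exact scoring query):
-- One pass over the joined data returns the count for every gender at once.
SELECT u.gender, COUNT(*) AS cnt
FROM public_entry p
INNER JOIN fon f ON f.id = p.fonid
INNER JOIN user u ON u.id = f.userid
WHERE p.isweb = 1 AND f.hidden = 0 AND f.deleted = 0
GROUP BY u.gender;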

Speeding up responses from a database query

I am running a SELECT * FROM table ORDER BY date DESC query from PHP against a MySQL database server. The table has a lot of records, which slows down the response time.
So, is there any way to speed up the response? If indexing is the answer, which columns should I index?
An index speeds up searching when you have a WHERE clause or do a JOIN with fields you have indexed. In your case you don't do that: You select all entries in the table. So using an index won't help you.
Are you sure you need all of the data in that table? When you later filter, search or aggregate this data in PHP, you should look into ways to do that in SQL so that the database sends less data to PHP.
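For instance, if the PHP side really only needs the newest rows and a few columns, ask MySQL for exactly that; in this sketch, col1, col2 and the limit of 50 are placeholders, and the question's table and date names are kept:
-- Only the columns and rows PHP actually uses are sent over the wire.
SELECT col1, col2
FROM `table`
ORDER BY `date` DESC
LIMIT 50;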
You need to use a caching system.
The best one I know is Memcache. It's really good at speeding up your application, and it doesn't touch the database at all.
Simple answer: you can't speed anything up using software.
Reason: you're selecting entire contents of a table and you said it's a large table.
What you could do is cache the data, but not using Memcache because it's got a limit on how much data it can cache (1 MB per key), so if your data exceeds that - good luck using Memcache to cache a huge result set without coming up with an efficient scheme of maintaining keys and values.
Indexing won't help because you haven't got a WHERE clause; at most, you might speed up the ORDER BY clause slightly. Use EXPLAIN EXTENDED before your query to see how much time is being spent transmitting the data over the network and how much is being spent retrieving and sorting the data.
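If you do want to try that ORDER BY speed-up, the usual approach is an index on the sort column (a sketch that keeps the question's placeholder table and date names):
-- Lets MySQL walk the index in date order instead of sorting the whole table on every request.
ALTER TABLE `table` ADD INDEX idx_date (`date`);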
If your application requires a lot of data in order for it to work, then you have these options:
Get a better server that can push the data faster
Redesign your application because if it requires so much data in order to run, it might not be designed with efficiency in mind
Optimizing queries is a big topic and beyond the scope of this question,
but here are some highlights that will speed up your SELECT statement:
Use a proper index.
Limit the number of records.
Select only the columns you require (instead of writing SELECT * FROM table, use SELECT col1, col2 FROM table).
Limiting a query with a large offset is a little tricky in MySQL.
This SELECT statement with a large offset will be slow, because it has to process a large set of rows before discarding most of them:
SELECT * FROM table order by whatever LIMIT m, n;
To optimize this query, here is a simple solution:
select A.* from table A
inner join (select id from table order by whatever limit m, n) B
on A.id = B.id
order by A.whatever

Which of the following SQL queries would be faster? A join on two tables or successive queries?

I have two tables here:
ITEMS
ID| DETAILS| .....| OWNER
USERS:
ID| NAME|....
Where ITEMS.OWNER = USERS.ID
I'm listing the items out with their respective owners' names. For this I could use a join on both tables, or I could select all the ITEMS and loop through them, making an SQL query for each to retrieve the tuple of that item's owner. That's like:
1 sql with a JOIN
versus
1x20 single table sql queries
Which would be the better approach to take in terms of speed?
Thanks
Of course a JOIN will be faster.
Making 20 queries will imply:
Parsing them 20 times
Making 20 index seeks to find the start of the index range on items
Returning 20 recordsets (each with its own metadata).
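By contrast, the JOIN version is a single round trip (a sketch using the ITEMS/USERS columns shown in the question):
-- One statement returns every item together with its owner's name.
SELECT i.ID, i.DETAILS, u.NAME AS owner_name
FROM ITEMS i
INNER JOIN USERS u ON u.ID = i.OWNER;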
Every query has overhead. If you can do something with one query, it's (almost) always better to do it with one query. And most database engines are smarter than you: even if it would be better to split a query in some way, the database will figure that out itself.
An example of overhead: if you perform 100 queries, there will be a lot more traffic between your application and your database server.
In general, if you really want to know something about performance, benchmark the various approaches, measure the parameters you're interested in, and make a decision based on the results of the benchmark.
Good luck!
Executing a join will be much quicker, as well as better practice.
A join would be a lot quicker than performing another query on the child table for each record in the parent table.
You can also enable performance data in SQL to see the results for yourself:
http://wraithnath.blogspot.com/2011/01/getting-performance-data-from-sql.html
