PHP extract data via SQL or PHP?

I'm working with databases using PHP and an ODBC driver, and I run a SQL query. Now I need to print the result, but only the unique items. As I see it, there are two ways: rebuild my query using a DISTINCT clause, or rebuild the result array like this: $uniques = array_unique($result, SORT_REGULAR);
And now I'm confused about which way is more correct (in terms of data processing, execution time, etc.).
Thanks.
UPD: I have a huge database, but the result could contain fewer than 10 rows.

It is always best to optimize your SQL query. Using DISTINCT saves the time spent fetching unnecessary records, and there is no need to waste further time removing duplicates in PHP.

For speed and memory efficiency, you want to return the minimum amount of data from the database rather than pulling in unnecessary rows for processing. So DISTINCT is the better choice here.
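For illustration, a minimal sketch of the DISTINCT approach over ODBC (the DSN, credentials, table, and column names are hypothetical):

<?php
// Hypothetical connection details; adjust for your environment.
$conn = odbc_connect('my_dsn', 'user', 'password');

// Let the database deduplicate instead of post-processing with array_unique() in PHP.
$result = odbc_exec($conn, 'SELECT DISTINCT name, category FROM items');

while ($row = odbc_fetch_array($result)) {
    echo $row['name'] . ' - ' . $row['category'] . PHP_EOL;
}

odbc_close($conn);

Since the question notes the result set is under 10 rows, array_unique() would also be cheap here, but DISTINCT keeps the deduplication where the data lives.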

Related

Filter data from MySQL using WHERE or PHP?

In terms of performance, which is better: selecting all rows from a table and then filtering the results with PHP, or filtering directly in the query using WHERE (with multiple conditions)?
Using a WHERE condition is the best choice, because that query will run faster than the first approach.
Indexes on the fields that appear in WHERE, GROUP BY, or ORDER BY clauses are useful most of the time.
Loading only the filtered data is better than loading all the data and filtering afterwards :)
It's better to fetch data using a WHERE clause because it will boost your system's performance in terms of time and load.
If you use "SELECT *", the query will consume more time, and after that you have to spend further time processing those records to fit your needs, which means writing extra code for it.
WHERE is the better solution. If you wanted to do it in PHP, you would have to load all the data from the database before you could filter it.
In the database you can add an index, which improves the filtering performance.
I think it also depends on how many times you need to call that MySQL query to fetch the result.
So if you call it only once, then yes, the WHERE clause is the better solution.
But if it would require multiple server calls to fetch filtered data with separate MySQL queries, then the first approach may be better, i.e. select all rows from the table once and then filter the results with PHP.
I hope it helps you :)
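For illustration, a minimal sketch of the WHERE approach using a PDO prepared statement (the connection details, table, and column names are hypothetical):

<?php
// Hypothetical connection; adjust credentials for your environment.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'password');

// Filter in SQL with multiple conditions so only matching rows cross the wire.
$stmt = $pdo->prepare(
    'SELECT id, name, price FROM products WHERE category = :category AND price < :max_price'
);
$stmt->execute(['category' => 'books', 'max_price' => 20]);

$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);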

Need faster PHP/MySQL search algorithm for highly complex calculations

I have a dilemma that I'm trying to solve right now. I have a table called "generic_pricing" that has over a million rows. It looks like this....
I have a list of 25000 parts that I need to get generic_pricing data for. Some parts have a CLEI, some have a partNumber, and some have both. For each of the 25000 parts, I need to search the generic_pricing table to find all rows that match either clei or partNumber.
Making matters more difficult is that I have to do matches based on substring searches. For example, one of my parts may have a CLEI of "IDX100AB01", but I need the results of a query like....
SELECT * FROM generic_pricing WHERE clei LIKE 'IDX100AB%';
Currently, my lengthy PHP code finds these matches by looping through the 25000 items. For each item, I run the query above on clei. If a match is found, I use that row for my calculations. If not, I execute a similar query on partNumber to try to find the matches.
As you can imagine, this is very time consuming. And this has to be done for about 10 other tables similar to generic_pricing to run all of the calculations. The system is now bogging down and timing out trying to crunch all of this data. So now I'm trying to find a better way.
One thought I have is to just query the database one time to get all rows, and then use loops to find matches. But for 25000 items each having to compare against over a million rows, that just seems like it would take even longer.
Another thought I have is to get 2 associative arrays of all of the generic_pricing data. i.e. one array of all rows indexed by clei, and another all indexed by partNumber. But since I am looking for substrings, that won't work.
I'm at a loss here for an efficient way to handle this task. Is there anything that I'm overlooking to simplify this?
Do not query the db for all rows and sort them in your app; it will cause a lot more headaches.
Here are a few suggestions:
Use parameterized queries. This allows your db engine to compile the query once and use it multiple times. Otherwise it will have to optimize and compile the query each time.
Figure out a way to make IN work. Instead of using LIKE, try something along the lines of ... LEFT(clei,8) IN ('IDX100AB','IDX100AC','IDX101AB'...); see the sketch after this list.
Do the calculations/math on the db side. Build a stored proc which takes a list of part/CLEI numbers and outputs the same list with the computed prices. You'll have a lot more control over execution and a lot less network overhead. If a stored proc isn't an option, build a view.
Paginate. If this data is being displayed somewhere, switch to processing in batches of 100 or fewer.
Build a cheat sheet. If speed is an issue, try precomputing prices into a separate table nightly, including some partial CLEI/part numbers if needed. Then use the precomputed lookup table.
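A sketch combining the first two suggestions (connection details are hypothetical, and the short prefix list stands in for your 25000 parts): batch the CLEI prefixes and match them with LEFT(...) IN (...) instead of running one LIKE query per part.

<?php
$pdo = new PDO('mysql:host=localhost;dbname=pricing', 'user', 'password');

// First 8 characters of each CLEI you need to look up.
$prefixes = ['IDX100AB', 'IDX100AC', 'IDX101AB' /* ... */];

foreach (array_chunk($prefixes, 500) as $batch) {
    // One parameterized query per batch instead of one query per part.
    $placeholders = implode(',', array_fill(0, count($batch), '?'));
    $stmt = $pdo->prepare(
        "SELECT * FROM generic_pricing WHERE LEFT(clei, 8) IN ($placeholders)"
    );
    $stmt->execute($batch);

    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        // ... run the pricing calculations on $row
    }
}

Note that LEFT() on the column defeats an index on clei; if the prefixes are all the same length, storing that prefix in its own indexed column keeps the lookup index-friendly.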

Fastest way to count number of rows in MySQL database in Laravel?

I can think of a couple ways to count the number of rows in a table with Laravel (version 3).
DB::table('threads')->count();
Threads::count();
Threads::max('id');
DB::table('threads')->max('id');
DB::query('SELECT COUNT(*) FROM threads;');
Are any of these notably faster than the others? Is there any one fastest way to run this query? Later on it's going to be part of an expression: ceil(DB::table('threads')->count() / $threads_per_page); and it's executed on every page load, so it's worth optimizing.
Database/table is MySQL and the InnoDB engine.
MAX(ID) is not the same as counting rows, so that rules out two of five alternatives.
And then it is your task to actually do a performance comparison between the remaining three methods of getting the count. I'd think that executing an SQL statement directly might remove plenty of unnecessary ORM-layer overhead and actually be faster, but that would be premature optimization unless proven by facts.
DB::table('threads')->count();
Threads::count();
DB::query('SELECT COUNT(*) FROM threads;');
I was looking for the same thing.
These three produce exactly the same query; I tested it (you can watch this with Laravel Debugbar).
Laravel performs "SELECT COUNT(*) as aggregate FROM threads".
It's already optimised in Eloquent, but if you do ->get()->count() it is not optimised!
There is no performance difference with Threads::count().
Max('id') is totally different, as it outputs the max id; it will never count the number of rows.
I don't think it really matters that much; just be consistent in your code.
Anyway, there is no need to run that query on every page load; use some caching to cache that number.
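A sketch of that caching suggestion, assuming a recent Laravel version (here Cache::remember takes a TTL in seconds; Laravel 3's signature differed):

use Illuminate\Support\Facades\Cache;
use Illuminate\Support\Facades\DB;

// Recompute the count at most every 10 minutes instead of on every page load.
$count = Cache::remember('threads_count', 600, function () {
    return DB::table('threads')->count();
});

$pages = ceil($count / $threads_per_page);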

What's the best way to count MySQL records

I have a search engine on a shared host that uses MySQL. This search engine potentially has millions, if not trillions, of records.
Each time a search is performed I return a count of the records that can then be used for pagination purposes.
The count tells you how many results there are for the search performed. MySQL's COUNT is, I believe, considered quite slow.
Order of search queries:
Search executed and results returned
Count query executed
I don't perform a PHP count, as this would be far slower on larger data sets.
The question is: do I need to worry about MySQL's COUNT, and at what stage should I start worrying about it? How do the big search engines perform this task?
In almost all cases the answer is indexing. The larger your database gets the more important it is to have a well designed and optimized indexing strategy.
The importance of indexing on a large database cannot be overstated.
You are absolutely right about not looping in code to count DB records. Your RDBMS is optimized for operations like that; your programming language is not. Wherever possible, you want to do any sorting, grouping, counting, and filtering operations within the SQL language provided by your RDBMS.
As for efficiently getting the count on a "paginated" query that uses a LIMIT clause, check out SQL_CALC_FOUND_ROWS.
SQL_CALC_FOUND_ROWS tells MySQL to calculate how many rows there would be in the result set, disregarding any LIMIT clause. The number of rows can then be retrieved with SELECT FOUND_ROWS(). See Section 11.13, “Information Functions”.
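For illustration, a sketch of SQL_CALC_FOUND_ROWS alongside a paginated query (connection details, table, and columns are hypothetical); note that MySQL 8.0.17 deprecated this modifier in favour of a separate COUNT(*) query:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=search', 'user', 'password');

// Fetch one page of results while asking MySQL to keep the full match count.
$stmt = $pdo->prepare(
    'SELECT SQL_CALC_FOUND_ROWS id, title FROM articles WHERE title LIKE ? LIMIT 0, 10'
);
$stmt->execute(['%mysql%']);
$page = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Total number of matches, disregarding the LIMIT above.
$total = (int) $pdo->query('SELECT FOUND_ROWS()')->fetchColumn();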
If a MySQL database reaches several million records, that's a sign you'll be forced to stop using a monolithic data store, meaning you'll have to split reads and writes and most likely use a different storage engine than the default one.
Once that happens, you'll stop using the actual count of the rows and start using an estimate, caching the search results, and so on, in order to lighten the load on the database. Even Google uses caching and displays an estimate of the number of records.
Anyway, for now, you've got 2 options:
1 - Run 2 queries, one to retrieve the data and the other one where you use COUNT() to get the number of rows.
2 - Use SQL_CALC_FOUND_ROWS like #JohnFX suggested.
Percona has an article about which is faster, though it might be outdated by now.
The biggest problem you're facing is the way MySQL handles LIMIT with OFFSET, which means you probably won't like your users using large offset values.
If you do indeed get millions of records, I don't foresee a bright future for your monolithic MySQL storage on a shared server. However, good luck to you and your project.
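Option 1 from the list above, as a minimal sketch (connection details, table, and column names are hypothetical):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=search', 'user', 'password');

// Query 1: the total match count, for pagination.
$stmt = $pdo->prepare('SELECT COUNT(*) FROM results WHERE term = ?');
$stmt->execute(['foo']);
$total = (int) $stmt->fetchColumn();

// Query 2: one page of the actual data.
$stmt = $pdo->prepare('SELECT id, title FROM results WHERE term = ? LIMIT 0, 20');
$stmt->execute(['foo']);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);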
If I understand what you are trying to do properly, you can execute the one query and call mysql_num_rows() on the result in PHP... that should be pretty zippy.
http://php.net/manual/en/function.mysql-num-rows.php
Since you're using PHP, you could use the mysql_num_rows method to tell you the count after the query is done. See here: http://www.php.net/manual/en/function.mysql-num-rows.php

Is there a way to speed up this query with no WHERE clause?

I have about 1 million rows, so it's going pretty slow. Here's the query:
$sql = "SELECT `plays`,`year`,`month` FROM `game`";
I've read up on indexes, but they only make sense to me when there's a WHERE clause.
Any ideas?
Indexes can make a difference even without a WHERE clause depending on what other columns you have in your table. If the 3 columns you are selecting only make up a small proportion of the table contents a covering index on them could reduce the amount of pages that need to be scanned.
Not moving as much data around, though, either by adding a WHERE clause or by doing the processing in the database, would be better if possible.
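For illustration, a covering index for that query might look like this (the index name is hypothetical):

CREATE INDEX idx_game_plays_year_month ON `game` (`plays`, `year`, `month`);

With such an index, MySQL can satisfy the SELECT from the index alone rather than reading the full, wider table rows.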
If you don't need all 1 million records, you can pull n records:
$sql = "SELECT `plays`,`year`,`month` FROM `game` LIMIT 0, 1000";
The first number is the offset (where to start from) and the second number is the number of rows. You might also want to use ORDER BY if you're only pulling a select number of records.
You won't be able to make that query much faster, short of fetching the data from a memory cache instead of the db. Fetching a million rows takes time. If you need more speed, figure out if you can have the DB do some of the work, e.g. summing or grouping things together.
If you're not using all the rows, you should use the LIMIT clause in your SQL to fetch only a certain range of those million rows.
If you really need all 1 million rows to build your output, there's not much you can do from the database side.
However, you may want to cache the result on the application side, so that the next time you want to serve the same output, you can return the processed output from your cache.
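A sketch of that application-side caching with APCu (the connection details, cache key, TTL, and the buildReport() helper are hypothetical); Memcached or Redis would work the same way:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'password');

$cacheKey = 'game_report';
$output = apcu_fetch($cacheKey, $hit);

if (!$hit) {
    $rows = $pdo->query('SELECT `plays`, `year`, `month` FROM `game`')
                ->fetchAll(PDO::FETCH_ASSOC);
    $output = buildReport($rows);        // hypothetical aggregation step
    apcu_store($cacheKey, $output, 300); // cache the processed output for 5 minutes
}

echo $output;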
The realistic answer is no. With no restrictions (ie. a WHERE clause or a LIMIT) on your query, then you're almost guaranteed a full table scan every time.
The only way to decrease the scan time would be to have less data (or perhaps a faster disk). It's possible that you could re-work your data to make your rows more efficient (CHARS instead of VARCHARS in some cases, TINYINTS instead of INTS, etc.), but you're really not going to see much of a speed difference with that kind of micro-optimization. Indexes are where it's at.
Generally if you're stuck with a case like this where you can't use indexes, but you have large tables, then it's the business logic that requires some re-working. Do you always need to select every record? Can you do some application-side caching? Can you fragment the data into smaller sets or tables, perhaps organized by day or month? Etc.
