How do I select rows whose IDs are not in a large PHP array?

I need to solve the following task: I have a quite large array of IDs in a PHP script, and I need to select from a MySQL DB all rows with IDs NOT IN this array.
There are several similar questions (How to find all records which are NOT in this array? (MySql)), and the most popular answer is to use a NOT IN () construction with implode(',',$array) inside the brackets.
This worked... until my array grew to 2007 IDs and about 20 kB (in my case), at which point I got a "MySQL server has gone away" error. As far as I understand, this is because the query is too long.
There are also some solutions to this problem like this:
SET GLOBAL max_allowed_packet=1073741824;
(just taken from this question).
I could probably do it this way, but now I doubt that the NOT IN (implode) approach is a good one for big arrays (I expect that in my case the array can reach 8000 IDs and 100 kB).
Is there a better solution for big arrays?
Thanks!
EDIT 1
As a solution, it was recommended to insert all IDs from the array into a temporary table and then use a JOIN to solve the initial task. This is clear. However, I have never used temporary tables, so I have an additional question (probably worth a separate question, but I decided to leave it here):
If I need to do this routine several times during one MySQL session, which approach is better:
Each time I need to SELECT ID NOT IN the PHP array, I create a NEW temporary table (all those tables will be dropped when the MySQL connection is closed, which in fact is when my script terminates).
I create a temporary table and drop it after I have made the needed SELECT.
I TRUNCATE the temporary table afterwards.
Which is better? Or have I missed something?

In such cases it is usually better to create a temporary table and perform the query against it instead. It'd be something along the lines of:
CREATE TEMPORARY TABLE t1 (a int);
INSERT INTO t1 VALUES (1),(2),(3);
SELECT * FROM yourtable
LEFT JOIN t1 on (yourtable.id=t1.a)
WHERE t1.a IS NULL;
Of course, the INSERT statement should be constructed so that it inserts all values from your array into the temporary table.
Edit: Inserting all values in a single INSERT statement would most probably lead to the same problem you already faced. Hence I'd suggest using a prepared statement that is executed to insert the data into the temporary table while you iterate through the PHP array.
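A minimal sketch of how the inserts into the temporary table could be batched, so that no single statement grows too large. This is a middle ground between one huge INSERT and one row per statement, not exactly what the answer describes. The function name buildTempTableInserts() and the default batch size of 500 are assumptions made for illustration; the table t1(a) is the one from the example above.

```php
<?php
declare(strict_types=1);

/**
 * Split a large ID list into multi-row INSERT statements for the temporary
 * table t1(a). Each statement uses "?" placeholders, so it can be executed
 * as a prepared statement.
 *
 * buildTempTableInserts() and $batchSize are hypothetical names.
 *
 * @param int[] $ids
 * @return array<int, array{sql: string, params: int[]}>
 */
function buildTempTableInserts(array $ids, int $batchSize = 500): array
{
    $batches = [];
    foreach (array_chunk(array_values($ids), $batchSize) as $chunk) {
        // One "(?)" group per ID in this chunk, e.g. "(?),(?),(?)".
        $placeholders = implode(',', array_fill(0, count($chunk), '(?)'));
        $batches[] = [
            'sql'    => "INSERT INTO t1 (a) VALUES $placeholders",
            'params' => array_map('intval', $chunk),
        ];
    }
    return $batches;
}

// Usage sketch, assuming $pdo is an open PDO connection to MySQL:
// $pdo->exec('CREATE TEMPORARY TABLE t1 (a INT)');
// foreach (buildTempTableInserts($ids) as $batch) {
//     $pdo->prepare($batch['sql'])->execute($batch['params']);
// }
```

With 8000 IDs and the default batch size this produces 16 statements, each far below the packet size that caused the original error.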

I once had to tackle this problem, but with an IN (id) WHERE clause with approx. 20,000-30,000 identifiers (indexes).
The way I got around this, with a SELECT query, was to reduce the number of identifiers per query and increase the number of times I sent the same query in order to extract the same data.
You could use array_chunk in PHP and divide 20,000 by 15, which would give you 15 separate SQL calls, each filtering records by roughly 1,334 identifiers (you can divide by more than 15 to reduce the number of identifiers per call further). In your case, if you just divide the 2007 identifiers by 10, it would reduce the number of identifiers you're pushing to the database to about 200 per SQL request. There are other ways to optimize this further, with temporary tables and so forth.
Dividing the identifiers you're trying to filter by makes each individual query run faster than if you were to send every identifier to the database in a single dump.
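The chunking idea can be sketched roughly as follows. The function name buildChunkedInQueries() and the table and column names (yourtable, id) are placeholders, not part of the original answer.

```php
<?php
declare(strict_types=1);

/**
 * Build one "WHERE id IN (...)" query per chunk of IDs.
 * buildChunkedInQueries(), "yourtable" and "id" are placeholder names.
 *
 * @param int[] $ids
 * @return string[]
 */
function buildChunkedInQueries(array $ids, int $numberOfCalls): array
{
    if ($ids === [] || $numberOfCalls < 1) {
        return [];
    }
    // Round up so that no more than $numberOfCalls chunks are produced.
    $chunkSize = (int) ceil(count($ids) / $numberOfCalls);
    $queries = [];
    foreach (array_chunk($ids, $chunkSize) as $chunk) {
        // Cast to int so that only numbers end up in the SQL text.
        $list = implode(',', array_map('intval', $chunk));
        $queries[] = "SELECT * FROM yourtable WHERE id IN ($list)";
    }
    return $queries;
}

// 2007 IDs split across 10 calls, 201 IDs per call at most.
$queries = buildChunkedInQueries(range(1, 2007), 10);
echo count($queries), " queries\n";
```

One caveat: merging the per-chunk results is only correct for IN. With NOT IN, each chunked query also returns rows whose IDs sit in the other chunks, so the result sets would have to be intersected instead of merged, which is why the temporary table approach tends to fit the NOT IN case better.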

Related

Multiple sql statements or loops and conditions

I have two tables employee and attendance.
employee : empID, empName
attendance: attendanceID, empID, date, inTime, outTime
I need to show this data in a grid with the employee name on the left side and then the dates. So the column headers would be like Emp Name, 1, 2, 3, 4, ..., 30. The number of days in the month needs to be printed with or without data.
I came up with three ways to do this.
Get attendance and employee data in a join query ordered by empID. Then loop through the data and print a record if it matches the current date. This goes on until the empID changes in the current loop.
Loop through the employees, then loop over the days in the month, and for every combination get the attendance from the database for that particular employee and date.
foreach ($employees as $emp)
{
    $empID = $emp['empID'];
    // One database call per employee per day
    for ($day = 1; $day <= $maxDaysInTheMonth; $day++)
    {
        $attendance = getAttendanceFromDatabase($empID, $day);
    }
}
To improve performance we try to minimize database connections and unnecessary loops. I would like to implement the second way, as it has the fewest conditions and loops and the code is clean. But it makes a database retrieval for every employee, for every day. Can someone point out some facts about performance, please?

Fetching the records in a single query and looping through them is better, as it has to call the database server only once. The second way has to call the database server multiple times, which is more costly.
Then make an associative array from the data, indexed by empID.
After generating the array you can use it however you want.
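A rough sketch of building such an associative array. The row keys follow the table definitions in the question; the function name groupAttendanceByEmployee(), the sample rows, and the assumption that date arrives as a 'YYYY-MM-DD' string are illustrative only.

```php
<?php
declare(strict_types=1);

/**
 * Index joined employee/attendance rows by empID and day of month.
 * Assumes each row has empID, empName, date ('YYYY-MM-DD' or null),
 * inTime and outTime. The function name is made up for this sketch.
 *
 * @param array<int, array<string, mixed>> $rows rows from the single JOIN query
 * @return array<int, array<string, mixed>>
 */
function groupAttendanceByEmployee(array $rows): array
{
    $grid = [];
    foreach ($rows as $row) {
        $empID = (int) $row['empID'];
        if (!isset($grid[$empID])) {
            $grid[$empID] = ['empName' => $row['empName'], 'days' => []];
        }
        // With a LEFT JOIN, employees without attendance have a NULL date.
        if ($row['date'] !== null) {
            $day = (int) substr($row['date'], 8, 2); // day part of YYYY-MM-DD
            $grid[$empID]['days'][$day] = [
                'inTime'  => $row['inTime'],
                'outTime' => $row['outTime'],
            ];
        }
    }
    return $grid;
}

$rows = [
    ['empID' => 1, 'empName' => 'Ann', 'date' => '2015-06-01', 'inTime' => '09:00', 'outTime' => '17:00'],
    ['empID' => 1, 'empName' => 'Ann', 'date' => '2015-06-03', 'inTime' => '09:10', 'outTime' => '17:05'],
    ['empID' => 2, 'empName' => 'Bob', 'date' => null, 'inTime' => null, 'outTime' => null],
];
$grid = groupAttendanceByEmployee($rows);
```

When printing the grid, the inner loop over 1..$maxDaysInTheMonth can then read $grid[$empID]['days'][$day] ?? null without touching the database again.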

Try this query
$sql="SELECT employee.empName AS empName, attendance.date AS date FROM employee,attendance WHERE employee.empID=attendance.empID";

As @Sougata suggests, fetching records in a single query and looping through them is better. But keep in mind that the performance of the query itself can be improved as follows:
Avoid Multiple Joins in a Single Query
Try to avoid writing a SQL query using multiple joins that include outer joins, cross apply, outer apply and other complex subqueries. This reduces the choices the optimizer has when deciding the join order and join type. Sometimes the optimizer is forced to use nested loop joins, irrespective of the performance consequences, for queries with excessively complex cross applies or subqueries.
Avoid Use of Non-correlated Scalar Sub Query
You can re-write your query to remove non-correlated scalar sub query as a separate query instead of part of the main query and store the output in a variable, which can be referred to in the main query or later part of the batch. This will give better options to Optimizer, which may help to return accurate cardinality estimates along with a better plan.
Creation and Use of Indexes
An index can greatly reduce data retrieval time, but it has the opposite effect on DML operations, which may degrade overall performance. Because of this trade-off, indexing is a challenging task, but it can help to improve SQL query performance and give you the best query response time.
Create a Highly Selective Index
Selectivity defines the percentage of qualifying rows in the table (qualifying number of rows / total number of rows). If the ratio of the qualifying number of rows to the total number of rows is low, the index is highly selective and is most useful. A non-clustered index is most useful if the ratio is around 5% or less, which means the index can eliminate 95% of the rows from consideration. If an index returns more than 5% of the rows in a table, it probably will not be used; either a different index will be chosen or created, or the table will be scanned.
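The ratio described above is simple enough to check by hand. A small sketch follows; the function names and the row counts are made up, and the 5% threshold is only the rule of thumb quoted in this answer, not a hard limit of any optimizer.

```php
<?php
declare(strict_types=1);

/**
 * Selectivity as defined above: qualifying rows / total rows.
 */
function selectivity(int $qualifyingRows, int $totalRows): float
{
    if ($totalRows <= 0) {
        throw new InvalidArgumentException('totalRows must be positive');
    }
    return $qualifyingRows / $totalRows;
}

/**
 * True when the ratio is at or below the ~5% rule of thumb quoted above.
 */
function isHighlySelective(int $qualifyingRows, int $totalRows, float $threshold = 0.05): bool
{
    return selectivity($qualifyingRows, $totalRows) <= $threshold;
}

var_dump(isHighlySelective(2000, 100000));  // 2% of the rows qualify
var_dump(isHighlySelective(40000, 100000)); // 40% of the rows qualify
```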
Position a Column in an Index
Order or position of a column in an index also plays a vital role to improve SQL query performance. An index can help to improve the SQL query performance if the criteria of the query matches the columns that are left most in the index key. As a best practice, most selective columns should be placed leftmost in the key of a non-clustered index.

PHP & MySQL web app - Selecting a single field (vs) select * from table

I am working on converting a prototype web application into something that can be deployed. There are some locations where the prototype has queries that select all the fields from a table although only one field is needed or the query is just being used for checking the existence of the record. Most of the cases are single row queries.
I'm considering changing these queries to queries that only get what is really relevant, i.e.:
select * from users_table where <some condition>
vs
select name from users_table where <some condition>
I have a few questions:
Is this a worthy optimization in general?
In which kind of queries might this change be particularly good? For example, would this improve queries where joins are involved?
Besides the SQL impact, would this change be good at the PHP level? For example, the returned array will be smaller (a single column vs multiple columns with data).
Thanks for your comments.

If I were to answer all of your three questions in a single word, I would definitely say YES.

You probably wanted more than just "Yes"...
SELECT * is "bad practice": if you read the results into a PHP non-associative array and a column is later added to the table, the array subscripts may change.
If the WHERE is complex enough, or you have GROUP BY or ORDER BY, and the optimizer decides to build a tmp table, then * may lead to several inefficiencies: having to use MyISAM instead of MEMORY; the tmp table will be bulkier; etc.
SELECT EXISTS ( SELECT * FROM ... ) comes back with 0 or 1 -- even simpler.
You may be able to combine EXISTS (or a suitable equivalent JOIN) with other queries, thereby avoiding an extra roundtrip to the server.
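The subscript problem from the first point can be illustrated without a database. The rows below are simulated; the column names (id, email, name) and values are invented for the example.

```php
<?php
declare(strict_types=1);

// Simulated numeric-indexed rows, as a fetch mode such as PDO::FETCH_NUM
// would return them for "SELECT * FROM users_table ...".
$rowBefore = [7, 'alice'];                      // columns: id, name
$rowAfter  = [7, 'alice@example.com', 'alice']; // an email column was added before name

// Code written against the old layout silently reads the wrong column.
$nameBefore = $rowBefore[1];
$nameAfter  = $rowAfter[1];

// With "SELECT name FROM users_table ..." position 0 stays stable
// no matter how the table changes.
$explicitRow = ['alice'];
$name = $explicitRow[0];
```

Fetching associative arrays avoids the shifting subscripts as well, but it still transfers every column.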

How to Improve Select Query Performance For Large Data in Mysql

I am currently working on a PHP project. For an extension of the project I needed to add more data to the MySQL database, all of it in one particular table, and the data has been added. Now that table is 610.1 MB in size and has 3,491,534 rows. One more thing: there are 22 distinct records in that table; one distinct record has 1,700,000 rows of data and another has 800,000.
Since then, running a SELECT statement takes a long time (6.890 sec) to execute. Every column in that table that can be indexed has an index, yet it still takes that long.
I tried two things to speed up retrieval:
1. A stored procedure using the indexed table columns.
2. Partitions.
Again, both took a long time to execute the SELECT query against the distinct records that have a large number of rows. Can anyone please suggest a better alternative for my problem, or let me know if I made a mistake in what I tried earlier?

When working with a large number of rows like you do, you should be careful with heavy, complex nested SELECT statements. Each level of nested selects uses more resources to get to the results you want.
If you are using something like:
SELECT DISTINCT column FROM table
WHERE condition
and it is still taking long to execute even though you have indexes and partitions in place, then the bottleneck might be physical resources.
Tune your structure and then tune your code.
Hope this helps.

PHP-MySQL merge many query into one query to execution fast

I have a PHP script that adds a new record: it checks for this record in table1, table2 and table3, and if the record does not exist it adds it to table3, otherwise it updates the record in table1 or table2 (wherever it exists).
I have a large amount of data to check. Is it possible to perform this task using a single MySQL query?
Thanks in advance.

Please keep in mind that joining two large tables may be a lot slower than using 2 or 3 separate queries to get data out of them one by one. The main question is what you consider huge. Joining millions of rows is never a good idea in MySQL AFAIK if you have large rows.
So while doing it in one query is definitely possible, it may not be the economical thing to do.
We also need some info about row sizes, indexes, basic query syntax and stuff like that.

How do I speed up a SQL UPDATE that also contains a JOIN on 25 million rows

the query i'd like to speed up (or replace with another process):
UPDATE en_pages, keywords
SET en_pages.keyword = keywords.keyword
WHERE en_pages.keyword_id = keywords.id
table en_pages has the proper structure but only has non-unique page_ids and keyword_ids in it. i'm trying to add the actual keywords(strings) to this table where they match keyword_ids. there are 25 million rows in table en_pages that need updating.
i'm adding the keywords so that this one table can be queried in real time and return keywords (the join is obviously too slow for "real time").
we apply this query (and some others) to sub units of our larger dataset. we do this frequently to create custom interfaces for specific sub units of our data for different user groups (sorry if that's confusing).
this all works fine if you give it an hour to run, but i'm trying to speed it up.
is there a better way to do this that would be faster using php and/or mysql?

I actually don't think you can speed up the process.
You can still add brute power to your database by clustering new servers.

Maybe I'm wrong or misunderstood the question, but...
Couldn't you use TRIGGERS?
Like... when a new INSERT is detected on "en_pages", doing an UPDATE afterwards on that same row?
(I don't know how frequent INSERTs are in that table)
This is just an idea.
How often do "en_pages.keyword" and "en_pages.keyword_id" change after being inserted?

I don't know about mySQL, but usually this sort of thing runs faster in SQL Server if you process the records in limited batches (say 1000) at a time in a loop.
You might also consider a where clause (I don't know what mySQL uses for "not equal to", so I used the SQL Server version):
WHERE en_pages.keyword <> keywords.keyword
That way you are only updating records that have a difference in the field you are updating, not all of them.
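A sketch of how the batching might look from PHP. It slices the work by ranges of keywords.id, which assumes integer ids that are reasonably dense; the function name idRanges(), the batch size, and the example id bounds are all assumptions. Because the slices are taken on keywords.id, the number of en_pages rows touched per batch will vary.

```php
<?php
declare(strict_types=1);

/**
 * Split the id range [minId, maxId] into consecutive sub-ranges.
 * idRanges() is a hypothetical helper for this sketch.
 *
 * @return array<int, array{0: int, 1: int}> list of inclusive [from, to] pairs
 */
function idRanges(int $minId, int $maxId, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be at least 1');
    }
    $ranges = [];
    for ($from = $minId; $from <= $maxId; $from += $batchSize) {
        $ranges[] = [$from, min($from + $batchSize - 1, $maxId)];
    }
    return $ranges;
}

// The UPDATE from the question, restricted to one slice of keywords.id.
// "NOT (a <=> b)" is MySQL's NULL-safe "not equal", so rows whose keyword
// is still NULL are updated too (a plain <> would skip them).
$sql = 'UPDATE en_pages, keywords
        SET en_pages.keyword = keywords.keyword
        WHERE en_pages.keyword_id = keywords.id
          AND keywords.id BETWEEN ? AND ?
          AND NOT (en_pages.keyword <=> keywords.keyword)';

// Usage sketch, assuming $pdo is an open PDO connection to MySQL:
// $stmt = $pdo->prepare($sql);
// foreach (idRanges(1, 250000, 1000) as [$from, $to]) {
//     $stmt->execute([$from, $to]);
// }
```

MySQL does accept <> (and !=) for "not equal", but both evaluate to NULL when either side is NULL, which matters here because en_pages.keyword starts out empty.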
