Comparing huge data from 2 tables - php

I'm using PHP on localhost with WAMP. I have two tables in my database, each holding a little over 13,000 rows. I want to check whether NameFromA from TableA exists in NameFromB from TableB. The query below works when I try it on a small amount of data, around 100 rows.
SELECT * FROM TableA WHERE EXISTS (SELECT * FROM TableB WHERE NameFromA = NameFromB)
My problem is that when I run it against the full 13,000-plus rows, nothing happens. There is no output.

Create an index on column NameFromA in TableA and on column NameFromB in TableB.
Then try the query below:
SELECT a.*
FROM TableA AS a, TableB AS b
WHERE a.NameFromA = b.NameFromB
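To try the index advice outside PHP, here is a minimal sketch. It uses Python's standard-library sqlite3 module as a stand-in for MySQL, which is an assumption on my part since the question is about PHP and MySQL under WAMP. The generated names and the index names idx_a_name and idx_b_name are made up for illustration.

```python
import sqlite3


def build_tables(conn, rows=13000):
    """Create TableA and TableB with `rows` generated names each, plus indexes."""
    conn.execute("CREATE TABLE TableA (NameFromA TEXT)")
    conn.execute("CREATE TABLE TableB (NameFromB TEXT)")
    # Hypothetical data: TableA holds name0 .. name(rows-1);
    # TableB holds only even-numbered names, so half of TableA has a match.
    conn.executemany("INSERT INTO TableA (NameFromA) VALUES (?)",
                     ((f"name{i}",) for i in range(rows)))
    conn.executemany("INSERT INTO TableB (NameFromB) VALUES (?)",
                     ((f"name{i}",) for i in range(0, 2 * rows, 2)))
    # The indexes suggested in the answer (index names are assumptions).
    conn.execute("CREATE INDEX idx_a_name ON TableA (NameFromA)")
    conn.execute("CREATE INDEX idx_b_name ON TableB (NameFromB)")


def matching_names(conn):
    """Return every NameFromA that also appears as a NameFromB."""
    rows = conn.execute(
        "SELECT NameFromA FROM TableA WHERE EXISTS "
        "(SELECT 1 FROM TableB WHERE NameFromB = NameFromA)"
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
build_tables(conn)
matches = matching_names(conn)
print(len(matches))  # 6500 with the generated data above
```

One thing to keep in mind: the EXISTS form returns each TableA row at most once, while a plain join returns a TableA row once per matching TableB row, so the two only give the same result when the names in TableB are unique.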

Try this:
SELECT
TableA.*
FROM
TableA
INNER JOIN TableB ON (TableA.NameFromA = TableB.NameFromB)
If it is still slow, maybe you have a problem in the DB or the timeout is too short.
You can also run this in a MySQL management tool to see how long the query takes.
One more thing: the data that you retrieve must be sent to the web server. If each row contains a lot of data and there are many rows, this will also take time.
EDIT:
OK @mar. You have to do a little troubleshooting to find where the problem is.
First: check whether your PC is powerful enough. For a normal PC or server, 13k records are nothing.
Second: are you sure that at least one name from table A is present in table B?
Third: try to run the query in an external SQL tool, not in PHP. If it returns the correct set quickly, then the problem is in PHP.

Related

Lost connection after simple query for a big table

I am running a complex LEFT JOIN query of two tables.
Table A - 1.6 million rows
Table B - 700k rows.
All columns are indexed.
I tried different ways of debugging but had no success finding the problem, since I guess that's not too much data.
Anyway, I found out that there is no problem if I remove the 'WHERE' clause from my query.
But when I try this simple query on table A, it returns "Lost connection".
SELECT id FROM table_A ORDER BY id LIMIT 10
What is the best practice to run this query? I don't wish to exceed the timeout.
Are my tables too big and should I "empty" the old data or something?
How do you handle big tables with millions of rows and JOINs? The only thing I know that can help is indexing, and I've already done that.
A million rows -- not a problem; a billion rows -- then it gets interesting. Your tables are not "too big".
"All columns are indexed." -- Usually a mistake. We need to see the actual query before commenting on what index(es) would be useful.
Possibly you need a "composite" index.
SELECT id FROM table_A ORDER BY id LIMIT 10 -- If there is an index starting with id, that will return nearly instantly. Please provide SHOW CREATE TABLE table_A so we can see the schema.

SQL select all files where a value in table A is the same as in table B (same database)

I'm building a sales system for a company, but I'm stuck with the following issue.
Every day I load an XML product feed into a database called items. The rows in the product feed are never in the same order, so one day the row with Referentie = 380083 is at the very top, and another day that very same row is at the very bottom.
I also have to get all the instock value, but when I run the following query
SELECT `instock` FROM SomeTable WHERE `id` > 0
I get all values, but not in the same order as in the other table.
So I have to get the instock value of all rows where referentie in table A is the same as it is in table B.
I already have this query:
select * from `16-11-23 wed 09:37` where `referentie` LIKE '4210310AS'
This query does the right job, but I have about 500 rows in the table.
So I need to find a way to automate the: LIKE '4210310AS' bit, so it selects all 500 values in one go.
Can somebody tell me how that can be done?
I'm not even sure I understand your problem...
Don't take this personally, but you seem to be concerned/confused by the ordering of the data in the tables which suggests to me your understanding of relational databases and SQL is lacking. I suggest you brush up on the basics.
Can't you just use the following query?
SELECT a.referentie
, b.instock
FROM tableA a
, tableB b
WHERE b.referentie = a.referentie
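To illustrate why the row order in the two tables doesn't matter, here is a small sketch of the same join. It uses Python's built-in sqlite3 module instead of MySQL (an assumption, just to keep it runnable anywhere), and the stock values 12 and 7 are invented sample data.

```python
import sqlite3


def instock_by_referentie(conn):
    """Pair each referentie in tableA with its instock value from tableB."""
    return conn.execute(
        "SELECT a.referentie, b.instock "
        "FROM tableA AS a "
        "INNER JOIN tableB AS b ON b.referentie = a.referentie "
        "ORDER BY a.referentie"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (referentie TEXT)")
conn.execute("CREATE TABLE tableB (referentie TEXT, instock INTEGER)")
# The rows are deliberately inserted in a different order in each table.
conn.executemany("INSERT INTO tableA (referentie) VALUES (?)",
                 [("380083",), ("4210310AS",)])
conn.executemany("INSERT INTO tableB (referentie, instock) VALUES (?, ?)",
                 [("4210310AS", 7), ("380083", 12)])

rows = instock_by_referentie(conn)
print(rows)  # [('380083', 12), ('4210310AS', 7)]
```

The join matches rows by the referentie value, so every row gets its own instock value in one query, with no need for a separate LIKE per referentie.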

PHP array_diff VS mysql NOT IN

I tried to compare two zipcode columns between two tables to see if values were missing in the second one.
I first wanted to do it with mysql, my query was something like
'SELECT code FROM t1 WHERE code NOT IN (SELECT code FROM t2)'
But it was really slow, so I tried another way:
I ran two SELECT queries and then compared the results with array_diff().
With MySQL: a few minutes, and sometimes a crash.
With PHP: less than 1 second.
Can someone explain these differences?
Is my SQL query wrong?
If your main table has 50k rows, using a subselect in your query can result in 1 + 50k SELECT executions: one for the first table, plus one run of the subselect for each of its 50k rows. The server re-evaluates the subselect for every row while iterating over the main table. This is why your SQL takes so long, and it may also cause a huge memory problem.
See serjoscha's information about joins to fix it in SQL; it should be even faster than your PHP solution.
Checking which values are missing from a table (compared to another) can easily be done with a LEFT or RIGHT JOIN; they are made for exactly this kind of task. Alternatively, take a look at this: How to Find Missing Value Between Two Mysql Tables – serjoscha
One solution to:
SELECT code FROM t1
WHERE code NOT IN ( SELECT code FROM t2 )
will be:
SELECT t1.code
FROM t1
LEFT JOIN t2
ON t1.code = t2.code
WHERE t2.code IS NULL
Give it a try. Also have a look at indexing, as Cyclone suggests:
If you don't have an index you should definitely add one, since this will speed up your query. You could add an index like this: ALTER TABLE t1 ADD INDEX code_idx (code), and the same should be done for t2. If you then were to execute EXPLAIN for the query you would see something like Using where; Using index; Using join buffer, which is good – Cyclone
Indexing speeds up your query. If a table has only one column, an index on it holds the same content as the table itself, so searching the index is no faster and the index is redundant. Otherwise I strongly recommend indexing the code column of t2, which leads to a big increase in performance and less memory consumption.
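Here is a small sketch that puts the LEFT JOIN ... IS NULL query next to the "fetch both columns and diff them in the application" approach, so you can check that they agree. It uses Python's built-in sqlite3 module as a stand-in for MySQL and a Python set in place of PHP's array_diff(); both are assumptions made to keep the example self-contained. The zip codes are invented, and the index names differ per table because SQLite needs index names to be unique within a database.

```python
import sqlite3


def missing_codes_sql(conn):
    """Codes present in t1 but absent from t2, found with an anti-join."""
    rows = conn.execute(
        "SELECT t1.code FROM t1 "
        "LEFT JOIN t2 ON t1.code = t2.code "
        "WHERE t2.code IS NULL "
        "ORDER BY t1.code"
    ).fetchall()
    return [r[0] for r in rows]


def missing_codes_in_app(conn):
    """Same result computed client-side, similar in spirit to array_diff()."""
    codes_t1 = [r[0] for r in conn.execute("SELECT code FROM t1")]
    codes_t2 = {r[0] for r in conn.execute("SELECT code FROM t2")}
    return sorted(c for c in codes_t1 if c not in codes_t2)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (code TEXT)")
conn.execute("CREATE TABLE t2 (code TEXT)")
# Invented sample data: two codes from t1 are missing in t2.
conn.executemany("INSERT INTO t1 (code) VALUES (?)",
                 [("75001",), ("75002",), ("75003",), ("75004",), ("75005",)])
conn.executemany("INSERT INTO t2 (code) VALUES (?)",
                 [("75001",), ("75002",), ("75004",)])
conn.execute("CREATE INDEX t1_code_idx ON t1 (code)")
conn.execute("CREATE INDEX t2_code_idx ON t2 (code)")

print(missing_codes_sql(conn))     # ['75003', '75005']
print(missing_codes_in_app(conn))  # ['75003', '75005']
```

Note that this sample has no NULL codes. NOT IN behaves differently from the LEFT JOIN form when the subquery can return NULL, so the two are only interchangeable when code is never NULL in t2.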

Correlated subquery where data is not in a table

I am using MySql and have a situation which is a lot like a correlated subquery except that the data in the inner query is not in the database - but in a PHP session.
If all the data were in the database, the query would look something like this:
SELECT * FROM tableA WHERE (tableA.valueA && (
SELECT valueB FROM tableB WHERE tableB.id = tableA.id));
My problem is that I have no tableB. Instead I have a PHP array. How can I inject the array into the query? Should I attempt to create a temporary table somewhere? Or perhaps I should be trying to declare the array as a variable?
The information in the PHP array is specific to each user and changes rapidly. Also, there will be lots of queries so performance is a consideration.
See comments from Wrikken above - I have made this a temporary table and it seems to be a fine solution.
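For anyone landing here later, a rough sketch of the temporary-table idea. It uses Python's built-in sqlite3 module instead of PHP and MySQL (an assumption to keep it runnable as-is); a dict stands in for the PHP session array, and the table name session_b and the sample values are hypothetical.

```python
import sqlite3


def rows_matching_session(conn, session_values):
    """Load per-user {id: valueB} data into a temp table and join tableA to it."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS session_b "
        "(id INTEGER PRIMARY KEY, valueB INTEGER)"
    )
    # The session data changes often, so refill the temp table on each call.
    conn.execute("DELETE FROM session_b")
    conn.executemany("INSERT INTO session_b (id, valueB) VALUES (?, ?)",
                     session_values.items())
    return conn.execute(
        "SELECT a.id, a.valueA FROM tableA AS a "
        "JOIN session_b AS s ON s.id = a.id "
        "WHERE a.valueA AND s.valueB "
        "ORDER BY a.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER PRIMARY KEY, valueA INTEGER)")
conn.executemany("INSERT INTO tableA (id, valueA) VALUES (?, ?)",
                 [(1, 1), (2, 0), (3, 1), (4, 1)])

# Stand-in for the PHP session array: id => valueB.
session = {1: 1, 2: 1, 3: 0}
result = rows_matching_session(conn, session)
print(result)  # [(1, 1)]
```

In MySQL a temporary table is visible only to the connection that created it, so one user's session data should not leak into another user's queries as long as each request uses its own connection.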

MySQL PHP | "SELECT FROM table" using "alphanumeric"-UUID. Speed vs. Indexed Integer / Indexed Char

At the moment, I select rows from 'table01 and table02' using:
SELECT t1.*,t2.* FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.UUID = 'whatever';
The UUID column is a unique index, type: char(15), with alphanumeric input. I know this isn't the fastest way to select data from the database, but the UUID is the only row-identifier that is available to the front-end.
Since I have to select by UUID, and not ID, I need to know which of these two options I should go for if, say, the table consists of 100,000 rows. What speed differences would I be looking at, and would the index for the UUID grow too large and slow down the DB?
Get the ID before doing the "big" select
1. $id = SELECT ID FROM table01 WHERE UUID = '{alphanumeric character}';
2. SELECT t1.*,t2.* FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.ID = $id;
Or keep it the way it is now, using the UUID.
2. SELECT t1.*,t2.* FROM table01 AS t1
INNER JOIN table02 AS t2 ON (t1.ID = t2.t1ID)
WHERE t1.UUID = 'whatever';
Side note: before inserting a new row, the system checks whether the generated unique ID already exists, which keeps the column unique.
Why not just try it out? Create a new db with those tables. Write a quick php script to populate the tables with more records than you can imagine being stored (if you're expecting 100k rows, insert 10 million). Then experiment with different indexes and queries (remember, EXPLAIN is your friend)...
When you finally get something you think works, put the query into a script on a webserver and hit it with ab (Apache Bench). You can watch what happens as you increase the concurrency of the requests (1 at a time, 2 at a time, 10 at a time, etc).
All this shouldn't take too long (maybe a few hours at most), but it will give you a FAR better answer than anyone at SO could for your specific problem (as we don't know your DB server config, exact schema, memory limits, etc)...
The second solution has the best performance. Both solutions have to look up the row by UUID. The first solution finds the row by UUID and then does a faster lookup by primary key, but by that point the right row has already been found, so it doesn't matter that the second lookup is faster: the second lookup is unnecessary altogether.
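A quick sketch of both variants side by side, to show that they return the same rows and that the two-step version still has to start with a UUID lookup. It uses Python's built-in sqlite3 module in place of PHP and MySQL (an assumption), and the payload column and sample UUIDs are hypothetical. It says nothing about timings; for those, benchmark on your own server as suggested above.

```python
import sqlite3

JOIN_SQL = (
    "SELECT t1.ID, t1.UUID, t2.payload FROM table01 AS t1 "
    "INNER JOIN table02 AS t2 ON t1.ID = t2.t1ID "
)


def fetch_by_uuid(conn, uuid):
    """One query: filter the join directly on the uniquely indexed UUID."""
    return conn.execute(
        JOIN_SQL + "WHERE t1.UUID = ? ORDER BY t2.payload", (uuid,)
    ).fetchall()


def fetch_two_step(conn, uuid):
    """Two queries: resolve the UUID to an ID first, then join by ID."""
    row = conn.execute(
        "SELECT ID FROM table01 WHERE UUID = ?", (uuid,)
    ).fetchone()
    if row is None:
        return []
    return conn.execute(
        JOIN_SQL + "WHERE t1.ID = ? ORDER BY t2.payload", (row[0],)
    ).fetchall()


conn = sqlite3.connect(":memory:")
# UNIQUE creates the unique index on UUID described in the question.
conn.execute("CREATE TABLE table01 (ID INTEGER PRIMARY KEY, UUID TEXT UNIQUE)")
conn.execute("CREATE TABLE table02 (t1ID INTEGER, payload TEXT)")
conn.executemany("INSERT INTO table01 (ID, UUID) VALUES (?, ?)",
                 [(1, "abc123def456ghi"), (2, "zzz999yyy888xxx")])
conn.executemany("INSERT INTO table02 (t1ID, payload) VALUES (?, ?)",
                 [(1, "first"), (1, "second"), (2, "other")])

direct = fetch_by_uuid(conn, "abc123def456ghi")
two_step = fetch_two_step(conn, "abc123def456ghi")
print(direct == two_step)  # True
```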
