Longest Prefix between two MySQL Tables - php

I have a MySQL database with 2 tables:
Table A:
Number
Location
Table B:
Calling Code
Area Code
Location
Initially, I have about 60,000 entries in table A, which has the Location column empty at the beginning. In table B I have about 250,000+ entries with a lot of area codes, calling codes (1, 011) and their respective location in the world. What I want is a FAST way of populating the table A's location column with the location of the number.
So for example, if the first entry in Table A is (17324765600, null), I want to read through table B and get the location for that number. Right now I am getting the location of a number with this query:
SELECT b.location
FROM
tableB b
LEFT JOIN tableA a
ON a.number LIKE CONCAT(b.calling_code, b.code, '%')
ORDER BY CHAR_LENGTH(b.code) DESC
LIMIT 1;
That gives me the proper location (although I suspect it can fail in some cases). The problem is that performance-wise this method is a no-go. If I loop over all the 50k number
Update 1
Allow me to put some sample data with the expected output:
Sample Table A:
number location
17324765600 NULL
01134933638950 NULL
0114008203800 NULL
…60k Records + at the moment..
Sample Table B:
calling_code code location
1 7324765 US-NJ
011 34933 Spain
011 400820 China
…250,000+ records at the moment
Expected output after the processing:
Table A:
number location
17324765600 US-NJ
01134933638950 Spain
0114008203800 China
The best I’ve come up with is the following update statement:
UPDATE tableA a JOIN tableB b ON a.number LIKE CONCAT(b.calling_code, b.code, '%') SET a.location = b.location
Of course, I am not sure this will always pick the longest matching prefix. For example, if the tables above also contained another code starting with 73247XX, say for Iowa (just as an example), I am not sure the query would always use the longest code, so I need help with that too.
Let me know if the samples help.
Update 2:
I am thinking on doing this the following way:
Before inserting the data into table A, I am thinking of exporting table B to a CSV sorted by area code. That way I can keep two pointers, one into the array of entries for table A and one into the CSV, both sorted by area code, run a kind of parallel search, and populate each entry's location in PHP instead of doing it in MySQL.
Let me know if this approach seems like a better option; if so, I will test it out and publish the answer.
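As a rough illustration of doing the lookup in application code, here is a minimal sketch in Python rather than PHP. It is not the two-pointer merge described above: it swaps in a plain dictionary keyed by the full prefix (calling_code + code) and tries the longest candidate prefix of each number first. The function names and the extra ("1", "732", "US") row are hypothetical, added only to show that the longer prefix wins.

```python
def build_prefix_map(rows):
    """Map each full prefix (calling_code + code) to its location."""
    return {calling_code + code: location for calling_code, code, location in rows}


def longest_prefix_location(number, prefix_map, max_len):
    """Try the longest candidate prefix first, then progressively shorter ones."""
    for length in range(min(max_len, len(number)), 0, -1):
        location = prefix_map.get(number[:length])
        if location is not None:
            return location
    return None  # no prefix in table B matches this number


# Sample rows from the question, plus one hypothetical shorter prefix.
table_b = [
    ("1", "7324765", "US-NJ"),
    ("011", "34933", "Spain"),
    ("011", "400820", "China"),
    ("1", "732", "US"),  # hypothetical row, not from the original data
]
prefix_map = build_prefix_map(table_b)
max_len = max(len(prefix) for prefix in prefix_map)

numbers = ["17324765600", "01134933638950", "0114008203800", "17329990000"]
locations = {n: longest_prefix_location(n, prefix_map, max_len) for n in numbers}
```

Each number costs at most max_len dictionary lookups, so the work grows with the number of phone numbers rather than with the product of both table sizes.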

If you want all locations, then you need to remove LIMIT
SELECT b.location
FROM
tableB b
LEFT JOIN tableA a
ON a.number LIKE CONCAT(b.calling_code, b.code, '%')
ORDER BY CHAR_LENGTH(b.code);
If you do not want the same location name to appear twice, then you need to use GROUP BY
SELECT b.location
FROM
tableB b
LEFT JOIN tableA a
ON a.number LIKE CONCAT(b.calling_code, b.code, '%')
GROUP BY b.location ORDER BY CHAR_LENGTH(b.code) ;

You have only one join over 250,000 records, which is not that stressful. Add proper indexes on the columns you search and fine-tune your MySQL server. Good indexing and well-set server variables should solve your problem. Problems generally appear when a query has many joins and many string comparisons.
I think you need a query like this:
UPDATE tableA a SET a.location = (
    SELECT b.location FROM tableB b
    WHERE a.number LIKE CONCAT(b.calling_code, b.code, '%')
    -- longest matching prefix first
    ORDER BY CHAR_LENGTH(CONCAT(b.calling_code, b.code)) DESC
    LIMIT 1
);
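To check that a correlated-subquery UPDATE of this shape picks the longest prefix, here is a small sketch that runs it against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so the string concatenation is written with `||` instead of CONCAT; the table and column names follow the question, and the ('1', '732', 'US') row is a hypothetical shorter prefix added for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (number TEXT PRIMARY KEY, location TEXT);
    CREATE TABLE tableB (calling_code TEXT, code TEXT, location TEXT);
    INSERT INTO tableA (number) VALUES
        ('17324765600'), ('01134933638950'), ('0114008203800');
    INSERT INTO tableB VALUES
        ('1',   '7324765', 'US-NJ'),
        ('1',   '732',     'US'),      -- hypothetical shorter prefix
        ('011', '34933',   'Spain'),
        ('011', '400820',  'China');
""")

# For every row of tableA, take the matching prefix with the greatest length.
conn.execute("""
    UPDATE tableA
    SET location = (
        SELECT b.location
        FROM tableB b
        WHERE tableA.number LIKE b.calling_code || b.code || '%'
        ORDER BY LENGTH(b.calling_code || b.code) DESC
        LIMIT 1
    )
""")

result = dict(conn.execute("SELECT number, location FROM tableA"))
```

This only demonstrates correctness on a handful of rows; it says nothing about how the statement performs on 60,000 x 250,000 rows in MySQL.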

I decided to take the approach below since I did not receive any clear response:
Prior to the process I prepared 2 new tables: a table for country codes and a table for state codes (since I also need to know the state when the number is within the US). Both tables have: country, state, calling_code, code …
For these 2 tables I broke down all the numbers with their prefixes and grouped them by area code, so instead of needing the full 6 digits to identify a country/state I grouped them by the first 3 digits and by whether the code is within the USA or not, hence the 2 tables.
With these modifications I was able to reduce the 250,000+ row table to only about 300 rows (per table).
After this I follow these steps:
I get the list of phone numbers
I first execute a query very similar to the one I posted, to update all the numbers that belong to the country_code table
I then update the rows that still have no location assigned, using the state_code table
I had to set up a cron job to run this every x amount of time, to avoid building up a huge backlog of phone numbers.
This may not be the best approach, but for the 50k numbers currently in place I was able to get it down to about 10 seconds (manually executing query by query, with some more polishing). Executing this every x amount of time, so that each run handles fewer than 10k numbers, will make it run smoothly.
I will mark this as the answer, but if someone else magically comes up with a better answer I will make sure to update this.
Divide and conquer!

Related

Query selected columns from two tables with a Condition clause

I have two tables-
1) Company_Form
[Contract_No#,Software_Name,Company_Name,Vendor_Code]
2) User_Form
[Contract_No,Invoice_No#,Invoice_Date,Invoice_Amount,Invoice_Submit_Date]
Fields denoted with # are primary keys.
=>The user has to enter a software name for which he wants to get the data of.
=>I have to structure a query in which I have to display the result in the following form:
[Contract#,Software_Name,Company_Name,Invoice_No,Invoice_Date,Invoice_Submission_Date]
Now:
One Contract_No can have many Invoice_No entries under it in the User_Form table.
One Contract_No can occur only once in the Company_Form table.
The retrieved records have to be grouped by the latest Invoice_Date.
I came to the following logic:
I first have to retrieve all the contract numbers with that software name from the Company_Form table.
I then have to look up those contract numbers in the User_Form table and display the data for each matching contract number fetched from the Company_Form table.
The problem is that I am unable to structure a query in SQL that can do the task for me.
Kindly help me in formulating the query.
[PS] I am using SQL with PHP.
I tried the following approach:
SELECT a.ContractNo,a.SoftwareName,a.CompanyName,b.InvoiceNo,b.InvoiceDate,b.InvAmount,b.InvoiceSubmitDate
FROM Company_Form as a,User_Form as b
WHERE b.ContractNo IN(SELECT ContractNo FROM Company_Form WHERE
SoftwareName='$Sname') AND a.ContractNo=b.ContractNo;
But I am getting an error saying that the subquery returns more than 1 row.
Can I get help with this?
I am assuming you are attempting to find the most recent price of the user selected software and its corresponding invoice. Here is an approach to do this. If this is tested to your satisfaction, I can add necessary explanation.
select uf.Contract_No,
       cf.Software_Name,
       cf.Company_Name,
       uf.Invoice_No,
       uf.Invoice_Date,
       uf.Invoice_Amount,
       uf.Invoice_Submit_Date
from User_Form uf
inner join (
    -- Most recent sale of software
    -- (the alias is needed so the join below can reference the column)
    select Contract_No, max(Invoice_Date) as Invoice_Date
    from User_Form
    group by Contract_No
) latest
on (
    -- Filter via join for latest match records
    uf.Contract_No = latest.Contract_No
    and uf.Invoice_Date = latest.Invoice_Date
)
inner join Company_Form cf
on cf.Contract_No = uf.Contract_No
where cf.Software_Name = :software_name
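A quick way to try this "latest row per group" join is to run it against a few made-up rows. The sketch below uses Python's built-in sqlite3 module instead of MySQL with PHP; the column names drop the # marker (it only denotes primary keys in the question), and every data value is invented for illustration. The software name is passed as a bound parameter rather than interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company_Form (Contract_No INTEGER PRIMARY KEY, Software_Name TEXT,
                               Company_Name TEXT, Vendor_Code TEXT);
    CREATE TABLE User_Form (Contract_No INTEGER, Invoice_No INTEGER PRIMARY KEY,
                            Invoice_Date TEXT, Invoice_Amount REAL,
                            Invoice_Submit_Date TEXT);
    -- Hypothetical sample data
    INSERT INTO Company_Form VALUES
        (1, 'Editor', 'Acme', 'V1'), (2, 'Editor', 'Globex', 'V2'),
        (3, 'Other', 'Initech', 'V3');
    INSERT INTO User_Form VALUES
        (1, 10, '2016-01-05', 100.0, '2016-01-06'),
        (1, 11, '2016-03-01', 120.0, '2016-03-02'),
        (2, 12, '2016-02-10', 200.0, '2016-02-11'),
        (3, 13, '2016-04-01', 300.0, '2016-04-02');
""")

query = """
    SELECT uf.Contract_No, cf.Software_Name, cf.Company_Name,
           uf.Invoice_No, uf.Invoice_Date, uf.Invoice_Submit_Date
    FROM User_Form uf
    JOIN (SELECT Contract_No, MAX(Invoice_Date) AS Invoice_Date
          FROM User_Form
          GROUP BY Contract_No) latest
      ON uf.Contract_No = latest.Contract_No
     AND uf.Invoice_Date = latest.Invoice_Date
    JOIN Company_Form cf ON cf.Contract_No = uf.Contract_No
    WHERE cf.Software_Name = :software_name
    ORDER BY uf.Contract_No
"""
rows = conn.execute(query, {"software_name": "Editor"}).fetchall()
```

With this data, only the newest invoice of each 'Editor' contract is returned (invoice 11 for contract 1 and invoice 12 for contract 2).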
If the requirement allows your subquery to return more than one row, I would suggest using IN instead of = in the WHERE clause of your main query.
Please note that I have just looked at the query and have not fully understood the requirements.
Thanks.
I worked on it for some time and finally arrived at the following query, which works like a charm:
SELECT a.ContractNo,a.SoftwareName,a.CompanyName,b.InvoiceNo,b.InvoiceDate,b.InvAmount,b.ISD
FROM Company_Form as a,User_Form as b
WHERE b.ContractNo IN (SELECT ContractNo FROM Company_Form WHERE SoftwareName='$Sname')
AND a.ContractNo=b.ContractNo;
If anybody needs help in understanding the logic of this query, feel free to comment below.

SQL select all files where a value in table A is the same as in table B (same database)

I'm building a sales system for a company, but I'm stuck with the following issue.
Every day I load an XML product feed into a database called items. The rows in the product feed are never in the same order, so one day the row with Referentie = 380083 is at the very top, and another day that very same row is at the very bottom.
I also have to get all the instock values, but when I run the following query
SELECT `instock` FROM SomeTable WHERE `id` > 0
I get all values, but not in the same order as in the other table.
So I have to get the instock value of all rows where referentie in table A is the same as it is in table B.
I already have this query:
select * from `16-11-23 wed 09:37` where `referentie` LIKE '4210310AS'
and this query does the right job, but I have about 500 rows in the table.
So I need a way to automate the LIKE '4210310AS' part, so that it selects all 500 values in one go.
Can somebody tell me how that can be done?
I'm not even sure I understand your problem...
Don't take this personally, but you seem to be concerned/confused by the ordering of the data in the tables which suggests to me your understanding of relational databases and SQL is lacking. I suggest you brush up on the basics.
Can't you just use the following query?
SELECT a.referentie
, b.instock
FROM tableA a
, tableB b
WHERE b.referentie = a.referentie

INNER JOIN too slow. How can it be quicker?

My code
SELECT * FROM andmed3 INNER JOIN test ON andmed3.isik like concat('%', test.isik, '%')
In andmed3 I have 130,000 rows and in test I have 10,000 rows, and the query won't finish.
When I limit it to 0,500 the query takes about 2-3 minutes.
How can I make it faster?
andmed3 table
id name number isik link stat else
-----------------------------------------------
1 john 15 1233213 none 11 5
8455666
7884555
test table
id isik
-----------
45 8455666
So I need all the rows from andmed3 whose isik contains a number that occurs in test.
The problem is that the engine will need to evaluate the LIKE expression for each pair of rows in the join (130,000 x 10,000).
Indexes are also useless in this scenario, because the expression has to be evaluated in order to perform the join (and you cannot put that expression inside an index).
Maybe the problem is your architecture/schema: no one anticipated the need to join two tables on a string expression.
Possible solution:
(It's a wild guess)
It is hard to tell for sure from your example, but if andmed3.isik contains all the possible values to be used in the join, you can try putting them in another table like this:
Andmed3Id isik
--------- -------
1 1233213
1 8455666
1 7884555
Of course, you will need a strategy to populate this table; possible ones are: on insert/update, or in a batch at some late hour.
If this suits you, you just need to add one more join to your query.
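The mapping-table idea can be sketched as follows, using Python's built-in sqlite3 module as a stand-in for MySQL. It assumes, which the sample does not make certain, that andmed3.isik holds several whitespace-separated values; the andmed3_isik table name, its index, and the second sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE andmed3 (id INTEGER PRIMARY KEY, name TEXT, isik TEXT);
    CREATE TABLE test (id INTEGER PRIMARY KEY, isik TEXT);
    -- Hypothetical mapping table: one row per (andmed3 row, single isik value)
    CREATE TABLE andmed3_isik (andmed3_id INTEGER, isik TEXT);
    CREATE INDEX idx_andmed3_isik ON andmed3_isik (isik);
    INSERT INTO andmed3 VALUES (1, 'john', '1233213 8455666 7884555'),
                               (2, 'mary', '5550001');
    INSERT INTO test VALUES (45, '8455666');
""")

# One-off (or batch) step: split each multi-value field into one row per value.
pairs = [
    (row_id, value)
    for row_id, isik in conn.execute("SELECT id, isik FROM andmed3").fetchall()
    for value in isik.split()
]
conn.executemany("INSERT INTO andmed3_isik VALUES (?, ?)", pairs)

# The join is now a plain equality comparison, which an index can serve.
matches = conn.execute("""
    SELECT DISTINCT a.id, a.name
    FROM andmed3 a
    JOIN andmed3_isik ai ON ai.andmed3_id = a.id
    JOIN test t ON t.isik = ai.isik
""").fetchall()
```

The point of the design is that `t.isik = ai.isik` replaces `LIKE CONCAT('%', ..., '%')`, so the engine no longer has to evaluate a string pattern for every pair of rows.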

Repeated Insert copies on ID

We have records with a count field on a unique id.
The columns are:
mainId = unique
mainIdCount = 1320 (this 'views' field gets a + 1 when the page is visited)
How can you insert all these mainIdCount's as separate records in another table IN ANOTHER DATABASE in one query?
Yes, I do mean 1320 times an insert with the same mainId! :-)
We actually have records that go over 10,000 times an id. It just has to be like this.
This is a weird one, but we do need the copies of all these (just) counts like this.
The most straightforward way to do this is with a JOIN operation between your table and another row source that provides a set of integers. We'd match each row from our original table to as many rows from the set of integers as needed to satisfy the desired result.
As a brief example of the pattern:
INSERT INTO newtable (mainId,n)
SELECT t.mainId
, r.n
FROM mytable t
JOIN ( SELECT 1 AS n
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) r
WHERE r.n <= t.mainIdCount
If mytable contains row mainId=5 mainIdCount=4, we'd get back rows (5,1),(5,2),(5,3),(5,4)
Obviously, the rowsource r needs to be of sufficient size. The inline view I've demonstrated here would return a maximum of five rows. For larger sets, it would be beneficial to use a table rather than an inline view.
This leads to the followup question, "How do I generate a set of integers in MySQL",
e.g. Generating a range of numbers in MySQL
And getting that done is a bit tedious. We're looking forward to an eventual feature in MySQL that will make it much easier to return a bounded set of integer values; until then, having a pre-populated table is the most efficient approach.
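For readers on MySQL 8.0 or later, recursive common table expressions (WITH RECURSIVE) can serve as the integer row source. The sketch below shows the pattern through Python's built-in sqlite3 module, which also supports recursive CTEs; the table names follow the example above and the sample rows are made up. One caveat to verify on your own server: MySQL limits recursion via the cte_max_recursion_depth variable (default 1000), so counts above 10,000 would require raising it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (mainId INTEGER PRIMARY KEY, mainIdCount INTEGER);
    CREATE TABLE newtable (mainId INTEGER, n INTEGER);
    INSERT INTO mytable VALUES (5, 4), (7, 2);  -- hypothetical sample rows
""")

# r yields 1, 2, 3, ... up to the largest mainIdCount; the join then keeps
# n <= mainIdCount, producing mainIdCount copies of each mainId.
conn.execute("""
    INSERT INTO newtable (mainId, n)
    WITH RECURSIVE r(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM r WHERE n < (SELECT MAX(mainIdCount) FROM mytable)
    )
    SELECT t.mainId, r.n
    FROM mytable t
    JOIN r ON r.n <= t.mainIdCount
""")

rows = sorted(conn.execute("SELECT mainId, n FROM newtable").fetchall())
```

A pre-populated numbers table, as suggested above, remains a reasonable choice when the same range is needed repeatedly.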

Recalculate values in MySQL tables

I have a web application that stores points in a table, and total points in the user table as below:
User Table
user_id | total_points
Points Table
id | date | user_id | points
Every time a user earns a point, the following steps occur:
1. Enter points value to points table
2. Calculate SUM of the points for that user
3. Update the user table with the new SUM of points (total_points)
The values in the user table might get out of sync with the sum in the points table, and I want to be able to recalculate the SUM of all points for every user once in a while (eg. once a month). I could write a PHP script that could loop through each user in the user table and find the sum for that user and update the total_points, but that would be a lot of SQL queries.
Is there a better (more efficient) way of doing what I am trying to do?
Thanks...
A more efficient way to do this would be the following:
User Table
user_id
Points Table
id | date | user_id | points
Total Points View
user_id | total_points
A view is effectively a select statement disguised as a table. The select statement would be: SELECT "user_id", SUM("points") AS "total_points" FROM "Points Table" GROUP BY "user_id". To create a view, execute CREATE VIEW "Total Points View" AS <SELECT STATEMENT> where SELECT STATEMENT is the previous select statement.
Once the view has been created, you can treat it as you would any regular table.
P.S.: I don't know whether the quotes are necessary unless your table names actually contain spaces, but it's been a while since I worked with MySQL, so I don't remember its idiosyncrasies.
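A minimal sketch of the view approach, run through Python's built-in sqlite3 module rather than MySQL. The view is named total_points_view (no spaces, so no quoting is needed), which is an assumption for illustration, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points (id INTEGER PRIMARY KEY, date TEXT,
                         user_id INTEGER, points INTEGER);
    INSERT INTO points (date, user_id, points) VALUES
        ('2013-01-01', 1, 10), ('2013-01-02', 1, 5), ('2013-01-03', 2, 7);
    -- The view stores no data; the SUM is computed each time it is queried.
    CREATE VIEW total_points_view AS
        SELECT user_id, SUM(points) AS total_points
        FROM points
        GROUP BY user_id;
""")

before = dict(conn.execute("SELECT user_id, total_points FROM total_points_view"))
conn.execute("INSERT INTO points (date, user_id, points) VALUES ('2013-01-04', 2, 3)")
after = dict(conn.execute("SELECT user_id, total_points FROM total_points_view"))
```

Because the totals are derived on every read, they cannot drift out of sync, at the cost of recomputing the aggregate per query.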
You have to use triggers for this, to keep the user's total points in sync with the points table. Something like:
Create Trigger UpdateUserTotalPoints AFTER INSERT ON points
FOR EACH ROW Begin
UPDATE users u
INNER JOIN
(
SELECT user_id, SUM(points) totalPoints
FROM points
GROUP BY user_id
) p ON u.user_id = p.user_id
SET u.total_points = p.totalPoints;
END;
Note that, as noted by @FireLizzard, if the records in the second table are frequently updated or deleted, you need additional AFTER UPDATE and AFTER DELETE triggers as well, to keep the two tables in sync. In that case the solution that @FireLizzard suggested will be better.
If you only want this once a month, you can't handle it with MySQL alone. This is "logic" code, and putting too much logic in the database is not the right way to go. Karan Punamiya's trigger could be nice, but it will update the user table on every insert into the points table, which is not what you seem to want.
If you want to be able to remove points, just add new rows with negated values to points; don't remove any rows (that would break the history trace).
If you really want it done periodically, you can run a cron script that does it, or even have cron call your PHP script ;)
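Whatever schedules the job, the recalculation itself does not need one query per user: a single UPDATE with a correlated subquery can resync every total at once. The sketch below demonstrates this with Python's built-in sqlite3 module as a stand-in for MySQL; the users and points table names follow the trigger answer above, and the sample rows (including the deliberately wrong totals) are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, total_points INTEGER);
    CREATE TABLE points (id INTEGER PRIMARY KEY, date TEXT,
                         user_id INTEGER, points INTEGER);
    INSERT INTO users VALUES (1, 999), (2, 0), (3, 42);  -- deliberately out of sync
    INSERT INTO points (date, user_id, points) VALUES
        ('2013-01-01', 1, 10), ('2013-01-02', 1, 5), ('2013-01-03', 2, 7);
""")

# One statement recomputes every user's total; COALESCE gives users with
# no rows in points a total of 0 instead of NULL.
conn.execute("""
    UPDATE users
    SET total_points = (
        SELECT COALESCE(SUM(p.points), 0)
        FROM points p
        WHERE p.user_id = users.user_id
    )
""")

totals = dict(conn.execute("SELECT user_id, total_points FROM users"))
```

A monthly cron entry could run this one statement directly, which avoids the per-user loop the question was worried about.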
