MYSQL match text fields - php

I have a MySQL database with around 1.5 million company records (name, country and other small text fields). I want to mark matching records with a flag: for example, if two companies with the same name are in the USA, I set a field (match_no in the table below) on both to the same integer, say 10, and likewise for other matches. At the moment this takes a long time (days), and I feel I am not using MySQL properly. My code is below. Is there a faster way to do this?
<?php
//Create the table if does not already exist
mysql_query("CREATE TABLE IF NOT EXISTS proj (
id INT(11) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
company_id text NOT NULL ,
company_name varchar(40) NOT NULL ,
company_name_text varchar(33) NOT NULL,
company_name_metaphone varchar(19) NOT NULL,
country varchar(20) NOT NULL ,
file_id int(2) NOT NULL ,
thompson_id varchar(11) NOT NULL ,
match_no int(7) NOT NULL ,
INDEX(company_name_text))")
or die ("Couldn't create the table: " . mysql_error());
//********Real script starts********
$countries_searched = array(); //To save record ids already flagged (save time)
$counter = 1; //Flag
//Since the company_names which are same are going to be from the same country so I get all the countries first in the below query and then in the next get all the companies in that country
$sql = "SELECT DISTINCT country FROM proj WHERE country='Canada'";
$result = mysql_query($sql) or die(mysql_error());
while($resultrow = mysql_fetch_assoc($result)) {
$country = $resultrow['country'];
$res = mysql_query("SELECT company_name_metaphone, id, company_name_text
FROM proj
WHERE country='$country'
ORDER BY id") or die (mysql_error());
//Loop through the company records
while ($row = mysql_fetch_array($res, MYSQL_NUM)) {
//If record id is already flagged (matched and saved in the countries searched array) don't waste time doing anything
if ( in_array($row[1], $countries_searched) ) {
continue;
}
if (strlen($row[0]) > 9) {
$row[0] = substr($row[0],0,9);
$query = mysql_query("SELECT id FROM proj
WHERE country='$country'
AND company_name_metaphone LIKE '$row[0]%'
AND id<>'$row[1]'") or die (mysql_error());
while ($id = mysql_fetch_array($query, MYSQL_NUM)) {
if (!in_array($id[0], $countries_searched)) $countries_searched[] = $id[0];
}
if(mysql_num_rows($query) > 0) {
mysql_query("UPDATE proj SET match_no='$counter'
WHERE country='$country'
AND company_name_metaphone LIKE '$row[0]%'")
or die (mysql_error()." ".mysql_errno());
$counter++;
}
}
else if(strlen($row[0]) > 3) {
$query = mysql_query("SELECT id FROM proj WHERE country='$country'
AND company_name_text='$row[2]' AND id<>'$row[1]'")
or die (mysql_error());
while ($id = mysql_fetch_array($query, MYSQL_NUM)) {
if (!in_array($id[0], $countries_searched)) $countries_searched[] = $id[0];
}
if(mysql_num_rows($query) > 0) {
mysql_query("UPDATE proj SET match_no='$counter'
WHERE country='$country'
AND company_name_text='$row[2]'") or die (mysql_error());
$counter++;
}
}
}
}
?>

I would go for a pure SQL solution, something like:
SELECT
    GROUP_CONCAT(id SEPARATOR ' ') AS ids, 'name' AS matched_on
FROM proj
WHERE
    LENGTH(company_name_metaphone) <= 9 AND
    LENGTH(company_name_metaphone) > 3
GROUP BY country, UPPER(company_name_text)
HAVING COUNT(*) > 1
UNION
SELECT
    GROUP_CONCAT(id SEPARATOR ' '), 'metaphone'
FROM proj
WHERE
    LENGTH(company_name_metaphone) > 9
GROUP BY country, LEFT(company_name_metaphone, 9)
HAVING COUNT(*) > 1
Then loop through the results and update the ids in each group. The first branch uses <= 9 so that rows whose metaphone is exactly 9 characters long are still covered, as in the else branch of your script.

I'm not sure what you are trying to do, but I can see that your code searches arrays holding a lot of data over and over (in_array on a growing list of ids). I think your problem is the PHP code and not the SQL statements.
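As a rough illustration of that point, the repeated in_array() scans can be replaced by a single pass that keys a PHP array on the matching fields. This is only a sketch: assignMatchNumbers() is a hypothetical helper, it groups on the exact (case-insensitive) company_name_text within a country rather than on the metaphone prefix, and it assumes the rows fit in memory.

```php
<?php
// Hypothetical helper (not from the original post): one pass over the rows,
// using array keys as a hash lookup instead of in_array() scans.
function assignMatchNumbers(array $rows): array
{
    $groups = [];
    foreach ($rows as $row) {
        // Rows with the same country and name land under the same key.
        $key = $row['country'] . "\0" . strtoupper($row['company_name_text']);
        $groups[$key][] = $row['id'];
    }

    $matchNo = 1;
    $result = []; // id => match_no, only for ids that have at least one match
    foreach ($groups as $ids) {
        if (count($ids) < 2) {
            continue; // a single record is not a match
        }
        foreach ($ids as $id) {
            $result[$id] = $matchNo;
        }
        $matchNo++;
    }
    return $result;
}

// Usage with made-up sample rows
$rows = [
    ['id' => 1, 'country' => 'Canada', 'company_name_text' => 'Acme Ltd'],
    ['id' => 2, 'country' => 'Canada', 'company_name_text' => 'ACME LTD'],
    ['id' => 3, 'country' => 'Canada', 'company_name_text' => 'Other'],
    ['id' => 4, 'country' => 'USA',    'company_name_text' => 'Acme Ltd'],
];
$matches = assignMatchNumbers($rows);
print_r($matches); // ids 1 and 2 share match number 1
```

The resulting id => match_no map can then be written back with one UPDATE per group instead of one query per record.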

You will need to adjust the GROUP BY fields to suit your matching requirements.
If your script times out (quite likely given the large amount of data), call set_time_limit(0).
Alternatively, add a LIMIT of 1000 or so to $sql and run the script several times: the WHERE clause excludes rows that were already matched, but $match_no is not remembered between runs, so you would need to handle that yourself.
// find all companies that have multiple rows grouped by identifying fields
$sql = "select company_name, country, COUNT(*) as num_matches from proj
        where match_no = 0
        group by company_name, country
        having num_matches > 1";
$res = mysql_query($sql) or die(mysql_error());
$match_no = 1;
// loop through all duplicate companies, and set match_no
while ($row = mysql_fetch_assoc($res)) {
    // escape the values before putting them back into SQL
    $company_name = mysql_real_escape_string($row['company_name']);
    $country = mysql_real_escape_string($row['country']);
    // conditions are joined with AND (a comma here is a syntax error)
    $sql = "update proj set match_no = $match_no where
            company_name = '$company_name' and country = '$country'";
    mysql_query($sql) or die(mysql_error());
    $match_no++;
}

Related

How to get the rank of a row in a MySQL table using PHP and how to store it into a separate variable?

I have been writing a PHP cron job script that reads a participantID from a text file (written separately by a Java program) and writes details about that participant to another text file. The details are the participant's Points, Quantity and Rank among the other participants (the rank is derived from the points each participant has).
This is the code that I have written so far:
<?php
//connect to mysql database procedurally
$conn = mysqli_connect("localhost","root","","anka");
if(!$conn){
echo "Connection error" . mysqli_connect_error("localhost","root","","anka");
}
$participantid = 0;
$participantid = fopen('performancerequests.txt',"r") or die("Request not taken.");
$content = fread($participantid, filesize("performancerequests.txt"));
file_put_contents("performancerequests.txt","");
//should return rank, points, products left(quantity)
$sql1 = "SELECT points from participants where id='$content'";
$sql2 = "SELECT quantity from products where participant_id='$content'";
$sql3 = "SELECT points, ROW_NUMBER() OVER( ORDER BY points ) RowNumber from participants where id='$content'";
$result1 = mysqli_query($conn, $sql1);
$result2 = mysqli_query($conn, $sql2);
$result3 = mysqli_query($conn, $sql3);
while ($row1 = $result1->fetch_assoc()) {
$points = $row1['points'];
}
while ($row2 = $result2->fetch_assoc()) {
$quantity = $row2['quantity'];
}
while ($row3 = $result3->fetch_assoc()) {
$rank = $row3['rank'];
}
$handle = fopen("performance.txt", "w") or die("File does not exist.");
if(fwrite($handle,$content." "."Points: ".$points." "."Quantity: ".$quantity." "."Rank: ".$rank) == false ){
echo "Error Writing.";
}
?>
The code for the tables in question is as follows:
CREATE TABLE participants (
id bigint(20),
name varchar(255),
password varchar(255),
product varchar(255),
DOB date,
points int(11),
PRIMARY KEY(id)
);
CREATE TABLE products (
id bigint(20),
name varchar(255),
quantity varchar(255),
rate varchar(255),
description varchar(255),
FOREIGN KEY(participant_id) REFERENCES participants(id),
PRIMARY KEY(id)
);
But when I test and run it, it returns the warning that there's an undefined array key, for the variable $rank. And in the text file, the rank bit is also left blank.
Is there a way I could solve this?
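You must match the alias in your query to the array key: $row3['rank'] does not exist because the column is aliased RowNumber. The row number also has to be computed over all participants before filtering by id; with the WHERE clause in the same SELECT only one row is numbered, so the result is always 1. (Use ORDER BY points DESC if the highest score should rank first.)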
$sql3 =
"WITH ranked AS(
SELECT id, points, ROW_NUMBER() OVER( ORDER BY points ) RowNumber
FROM participants
) SELECT * FROM ranked WHERE id='$content'";
while ($row3 = $result3->fetch_assoc()) {
$rank = $row3['RowNumber']; // here changed
}
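ROW_NUMBER() and WITH need MySQL 8.0 or later. On an older server, one possible fallback is to fetch id and points for all participants and compute the rank in PHP. The sketch below is an assumption-laden illustration, not part of the answer above: rankOf() is a hypothetical helper, the id => points map is made-up sample data, and ties share a rank here (unlike ROW_NUMBER(), which numbers tied rows arbitrarily).

```php
<?php
// Hypothetical helper: rank = 1 + number of participants with strictly more points.
// $pointsById maps participant id => points (e.g. built from "SELECT id, points FROM participants").
function rankOf(array $pointsById, $participantId): ?int
{
    if (!array_key_exists($participantId, $pointsById)) {
        return null; // unknown participant
    }
    $mine = $pointsById[$participantId];
    $better = 0;
    foreach ($pointsById as $points) {
        if ($points > $mine) {
            $better++;
        }
    }
    return $better + 1;
}

// Usage with made-up sample data
$pointsById = [1 => 50, 2 => 80, 3 => 50];
echo rankOf($pointsById, 2), "\n"; // participant 2 has the most points
echo rankOf($pointsById, 1), "\n"; // participants 1 and 3 are tied behind participant 2
```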

How select random column value from Mysql database using PHP

I'm trying to fetch a random column value from the database using the RAND() function.
It returns a random value, but many times it returns a duplicate.
This is what my database table looks like:
|------
|Column|Type|Null|Default
|------
|no|int(30)|No|
|postid|varchar(100)|Yes|NULL
|byuser|varchar(32)|Yes|NULL
|likeslimit|int(30)|No|
|createdon|date|No|
|-----
And this is what my PHP code is.
$query = mysqli_query(
$mysql,
"SELECT postid FROM history ORDER BY Rand() LIMIT 1"
);
if (mysqli_num_rows($query) == 1) {
while ($row = mysqli_fetch_assoc($query)) {
echo $row['postid'];
}
}
I want it to always return a random value that is never repeated until all the data has been returned.
You don't need the loop or the condition, since LIMIT 1 only ever returns one row. Try this:
$query = mysqli_query(
$mysql,
"SELECT postid FROM history ORDER BY Rand() LIMIT 1"
);
$row = mysqli_fetch_assoc($query);
echo $row['postid'];
This is how RAND() works in MySQL: it will repeat results from time to time. But you can get the behaviour you want by combining MySQL with PHP.
$query = mysqli_query($mysql, "SELECT postid FROM cacheTable WHERE 1 ORDER BY RAND() LIMIT 1");
$row = mysqli_fetch_assoc($query);
$foundId = (int)$row['postid'];
if((int) $foundId === 0) { // NO records left in cacheTable then fill it up again
mysqli_query($mysql, "INSERT INTO cacheTable (postid) SELECT postid FROM history");
$query = mysqli_query($mysql, "SELECT postid FROM cacheTable WHERE 1 ORDER BY RAND() LIMIT 1");
$row = mysqli_fetch_assoc($query);
$foundId = (int) $row['postid'];
}
mysqli_query($mysql, "DELETE FROM cacheTable WHERE postid=".$foundId); // DELETE the record
$query = mysqli_query($mysql, "SELECT * FROM history WHERE postid=".$foundId);
$result = mysqli_fetch_assoc($query);
cacheTable will have only one column - ID (primary key) which will hold the corresponding ID (primary key) from history. cacheTable structure:
|------
|Column|Type|Null|Default
|------
|postid|varchar(100)|Yes|NULL
|-----
cacheTable is filled with all the ids from the history table whenever it is empty. You select a random row from cacheTable and then delete it, so it cannot appear in the next selects. When cacheTable runs out of records, it is populated again.
NB: this approach has one major drawback: new entries in the history table are not available in cacheTable until it has been emptied and refilled.
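This is the code Samir Nabil suggested:
session_start();
// Initialise the list only once; resetting it on every request would forget the ids already shown
if (!isset($_SESSION['dupes'])) {
$_SESSION['dupes'] = array();
}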
$query = mysqli_query(
$mysql,
"SELECT postid FROM history ORDER BY Rand() LIMIT 1"
);
if (mysqli_num_rows($query) == 1) {
while ($row = mysqli_fetch_assoc($query)) {
if (!in_array($row['postid'], $_SESSION['dupes'])) {
echo $row['postid'];
$_SESSION['dupes'][] = $row['postid'];
}
}
}
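Another way to get "never the same until the data runs out" is to shuffle the ids once and hand them out one at a time, which is the cache-table idea kept in a PHP array. This is a minimal sketch under assumptions: nextRandomId() is a hypothetical helper, $allIds would come from something like SELECT postid FROM history, and $pool would have to be stored somewhere persistent (for example in $_SESSION) between requests.

```php
<?php
// Hypothetical helper: returns each id once in random order, then starts a new round.
// $pool holds the ids not yet handed out and is modified in place.
function nextRandomId(array $allIds, array &$pool)
{
    if ($allIds === []) {
        return null; // nothing to pick from
    }
    if ($pool === []) {
        // Start a new round: copy every id and shuffle once.
        $pool = $allIds;
        shuffle($pool);
    }
    return array_pop($pool);
}

// Usage with made-up ids
$allIds = ['p1', 'p2', 'p3'];
$pool = [];
$shown = [];
for ($i = 0; $i < 3; $i++) {
    $shown[] = nextRandomId($allIds, $pool);
}
// $shown now contains p1, p2 and p3 in some random order, with no repeats
```

As with the cache table, ids added after a round has started only show up in the next round.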

How can I show rows from one table that aren't in another table?

I have two tables in a database, one of them is a list of 'buildings' you could create. The other is a list of buildings that have been built by users.
On one page, (cityproduction.php), it displays a list of 'buildings' you can build.
I want it to display the buildings that you can build, that you haven't already built.
Here is my code:
$sql = "SELECT * FROM [The list of built buildings] WHERE building_owner = '$user'";
$result = $conn->query($sql);
if ($result->num_rows > 0) {
while($row = $result->fetch_assoc()) {
$variable = $row["building_name"];
}
(...)
$sql = "SELECT * FROM [The list of ALL buildings] WHERE name != '$variable' ORDER BY id asc";
$result = mysqli_query($database,$sql) or die(mysqli_error($database));
while($rws = mysqli_fetch_array($result)){
echo $rws["name"]; (etc.)
What this does is hide only one of the buildings that the user has built, not all of them.
Without seeing the real table names or the schema it is tricky to answer accurately but you could try something along these lines:
SELECT * FROM `all_buildings`
WHERE `id` not in (
select `building_id` from `built_buildings` where `building_owner` = '$user'
)
ORDER BY `id` asc;
Another translation of your question into SQL (besides NOT IN) results in a Correlated Subquery:
SELECT * FROM `all_buildings` AS a
WHERE NOT EXISTS
(
select * from `built_buildings` AS b
where a.`id` = b.`building_id` -- Outer Select correlated to Inner
and b.`building_owner` = '$user'
)
ORDER BY `id` asc;
The main advantage over NOT IN: it uses only two-valued logic (a comparison with NULL is simply treated as false), while NOT IN uses three-valued logic (a comparison with NULL returns unknown, which might not return what you expect).
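The NULL difference is easy to reproduce. The sketch below uses an in-memory SQLite database through PDO as a stand-in for MySQL (an assumption made so the example is self-contained; it needs the pdo_sqlite extension). The table and column names follow the answers above, the sample rows are made up, and unbuilt() is a hypothetical helper.

```php
<?php
// In-memory SQLite stands in for MySQL here; both follow the same NULL rules for these queries.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE all_buildings (id INTEGER PRIMARY KEY, name TEXT)");
$pdo->exec("CREATE TABLE built_buildings (building_id INTEGER, building_owner TEXT)");
$pdo->exec("INSERT INTO all_buildings VALUES (1, 'Farm'), (2, 'Mine'), (3, 'Mill')");
// One of alice's rows has a NULL building_id (made-up data to trigger the problem).
$pdo->exec("INSERT INTO built_buildings VALUES (1, 'alice'), (NULL, 'alice')");

// Hypothetical helper: runs a query with the user bound as a parameter
// and returns the first column of every row.
function unbuilt(PDO $pdo, string $sql, string $user): array
{
    $stmt = $pdo->prepare($sql);
    $stmt->execute([$user]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$notIn = unbuilt($pdo,
    "SELECT name FROM all_buildings
     WHERE id NOT IN (SELECT building_id FROM built_buildings WHERE building_owner = ?)
     ORDER BY id", 'alice');

$notExists = unbuilt($pdo,
    "SELECT a.name FROM all_buildings a
     WHERE NOT EXISTS (SELECT 1 FROM built_buildings b
                       WHERE b.building_id = a.id AND b.building_owner = ?)
     ORDER BY a.id", 'alice');

var_dump($notIn);     // empty: the NULL makes every NOT IN comparison unknown or false
var_dump($notExists); // Mine and Mill
```

Binding the user with a placeholder also avoids interpolating '$user' straight into the SQL string.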
Why are you using while after the first query? Is it supposed to return a list or just a single value? If you use $variable in your second query, it will only hold the last value of the list you are fetching. Collect all the names in an array instead:
if ($result->num_rows > 0) {
    $variable = array();
    while($row = $result->fetch_assoc()) {
        $variable[] = $row["building_name"];
    }
}
Second query example (it excludes every collected name in one query; run it only when $variable is not empty):
$names = array();
foreach($variable as $building) {
    $names[] = "'" . mysqli_real_escape_string($database, $building) . "'";
}
$sql = "SELECT * FROM [The list of ALL buildings] WHERE name NOT IN (" . implode(',', $names) . ") ORDER BY id asc";
$result = mysqli_query($database,$sql) or die(mysqli_error($database));
while($rws = mysqli_fetch_assoc($result)) {
    echo $rws["name"];
}
Assuming both of your tables have some sort of id column to relate them, with this query:
SELECT building_name, building_owner FROM
test.all_buildings a
LEFT JOIN test.built_buildings b ON a.id = b.building_id AND b.building_owner = ?
ORDER BY building_owner DESC, building_name;
(where ? is the user), you can select all the buildings, first the ones that have been built, followed by the ones that haven't, in one query. If your tables don't have id's like that, you can join them on name instead; it should work as long as the names are distinct.
Then as you fetch the rows, you can sort them into "built" or "not built" by checking if the row has a building_owner.
if ($result->num_rows > 0) {
while($row = $result->fetch_assoc()) {
if ($row['building_owner']) {
$built[] = $row['building_name'];
} else {
$not_built[] = $row['building_name'];
}
}
}

Solve with only one query?

I have the following table:
CREATE TABLE list(
country TINYINT UNSIGNED NOT NULL,
name VARCHAR(10) CHARACTER SET latin1 NOT NULL,
name_index INT UNSIGNED NOT NULL,
UNIQUE KEY(country, name), PRIMARY KEY(country, name_index)) ENGINE = INNODB
I want to:
Given: ($country, $name, $new_index)
Check if a row with country = $country && name = $name exists.
If the row exists, get the index $index = name_index.
If the row doesn't exist, add it and then get the index.
I can do the following using many queries, but I am looking for an efficient way to do it, using only one query. Is this possible?
It's not possible with only one query.
You CAN do this:
$sql = "SELECT name_index FROM (your table) WHERE country = '$country' AND
name = '$name' LIMIT 1";
$query = mysql_query($sql);
$numrows = mysql_num_rows($query);
if($numrows == 1) {
$row = mysql_fetch_row($query);
$index = $row[0];
} else {
$sql = "INSERT INTO (your table) (country, name, name_index)
VALUES('$country','$name','$new_index')";
// mysql_query() returns TRUE or FALSE for an INSERT, so test that instead of mysql_num_rows()
$query = mysql_query($sql);
if($query) {
$sql = "SELECT name_index FROM (your table) WHERE country = '$country' AND
name = '$name' LIMIT 1";
$query = mysql_query($sql);
$row = mysql_fetch_row($query);
$index = $row[0];
} else {
echo "Error occurred while trying to insert new row";
}
}
Hope this helps :).

How to update a value by 1 if the new value inserted into the database clashes with value in the database?

I want to increase the value in the sort order column by one if the new value being inserted clashes with a value that is already in the database. How should I go about doing it? Please help! Thanks!
Below is my code (I am not sure whether I am on the right track):
$result = mysql_query("SELECT sortorder FROM information ORDER BY id ASC;");
if($result >= 1 ){
$i=1;
while ($initialorder = mysql_fetch_assoc($result))
{
$initialorder = $initialorder["sortorder"];
if ($sortorder == $initialorder ){
$result6 = mysql_query("SELECT * FROM information
WHERE `sortorder` = '$sortorder'");
$row6 = mysql_fetch_array($result6);
$removethis1 = $row6['id'];
$result7 = mysql_query("UPDATE information
SET `sortorder`= ((SELECT `sortorder`
FROM (SELECT MAX(`sortorder`) AS
'$initialorder' FROM information) AS '$initialorder') + 1)
WHERE id='$removethis1'");
}
$query = "INSERT INTO `information`
(`id`,`page`,`description`,`status`,`sortorder`,`keyword`,`date_added`)
VALUES
('$id','$title','$description','$status',
'$sortorder','$keyword','$date_added')";
$result = mysql_query($query, $conn);
header('Location: index.php?status=1&title='.$title);
$i++; }
}
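You can do this, provided `sortorder` has a UNIQUE index (ON DUPLICATE KEY UPDATE only fires on a unique or primary key collision):
INSERT INTO `information`
...
ON DUPLICATE KEY UPDATE
sortorder = '".($sortorder + 1)."'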
