I have a table of Arabic text and I want to remove duplicate rows, taking the Arabic diacritics into account: َ ِ ُ
My table: vocabulary
+----+----------+--------------------------------+
| id | word | mean |
--------------------------------------------------
| 1 | سِلام | xxx |
--------------------------------------------------
| 2 | سَلام | xxx |
--------------------------------------------------
| 3 | سلام | xxx |
--------------------------------------------------
| 4 | سلام | xxx |
+------------------------------------------------+
Now I want this table:
+----+----------+--------------------------------+
| id | word | mean |
--------------------------------------------------
| 1 | سِلام | xxx |
--------------------------------------------------
| 2 | سَلام | xxx |
--------------------------------------------------
| 3 | سلام | xxx |
+------------------------------------------------+
How can I do that?
My Try:
$result = mysql_query("SELECT * FROM vocabulary");
while ($end = mysql_fetch_assoc($result)) {
    $word = $end["word"];
    $mean = $end["mean"];
    $id = $end["id"];
    // Count the rows that share this word and meaning
    $result2 = mysql_query("SELECT * FROM vocabulary WHERE word='$word' AND mean='$mean'");
    $TotalResults = mysql_num_rows($result2);
    // More than one match means this row still has a duplicate
    if ($TotalResults > 1) {
        mysql_query("DELETE FROM vocabulary WHERE id='$id'");
    }
}
Summary: How can I make MySQL sensitive to the Arabic diacritics?
There are multiple ways to achieve this.
1- You can select your rows from the database, loop through them and save each 'word' value in an array. In each iteration of the loop, check with in_array() whether the current value is already in that array. If it is, save the row's id in another array, then use those ids to delete from the database.
2- Another way to extract the ids is to use a query similar to the below:
select count(*), id from table group by title
You can then loop through the results and delete the row (using the ids) where count is greater than 1.
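The basic concept in both (and other methods) is that you just have to match the strings. Diacritics on letters change the actual string, so "سَلام" is not equal to "سلام" in a plain string comparison. Whether MySQL itself treats them as equal depends on the collation of the column.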
On a side note, there is a great Arabic PHP library you can use for various Arabic related string manipulation: PHP and Arabic Language.
Note that the query in method 2 returns a single id per group, so one pass removes only one duplicate of each value.
There are several other ways to do it, and it all depends on the size of the data set you have and if deleting these duplicates is a one time thing or a frequent thing because you will have to keep performance in mind.
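As a rough illustration of method 1, here is a minimal sketch in Python rather than PHP. It assumes the rows have already been fetched into a list of (id, word, mean) tuples; the function name find_duplicate_ids is made up for this example. The words are written with Unicode escapes so the diacritics are easy to see.

```python
# Rows mirroring the question's vocabulary table.
# \u0650 is the kasra and \u064e is the fatha diacritic.
rows = [
    (1, "\u0633\u0650\u0644\u0627\u0645", "xxx"),  # with kasra
    (2, "\u0633\u064e\u0644\u0627\u0645", "xxx"),  # with fatha
    (3, "\u0633\u0644\u0627\u0645", "xxx"),        # no diacritics
    (4, "\u0633\u0644\u0627\u0645", "xxx"),        # exact duplicate of id 3
]


def find_duplicate_ids(rows):
    """Return the ids of rows whose (word, mean) pair was already seen."""
    seen = set()
    duplicate_ids = []
    for row_id, word, mean in rows:
        key = (word, mean)
        if key in seen:
            # Same code points as an earlier row: mark it for deletion.
            duplicate_ids.append(row_id)
        else:
            seen.add(key)
    return duplicate_ids


print(find_duplicate_ids(rows))
```

Because the comparison is on the exact code points, the rows with a kasra or a fatha are kept and only the row that repeats an earlier one is reported.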
I haven't tested it, but this should work:
CREATE TEMPORARY TABLE tmp_keeps
SELECT title, MIN(id) AS keepID
FROM theTable
GROUP BY title
;
DELETE FROM theTable
WHERE (title, id) NOT IN (
SELECT title, keepID
FROM tmp_keeps
)
;
DROP TEMPORARY TABLE tmp_keeps;
It (in the subquery) gets the first id for each title, and then deletes rows that don't meet that condition.
Edit: Revised to avoid SQL error pointed out in comments.
If it is a large table, something along the lines of Adon's answer might be faster.
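For anyone who wants to try the idea without a MySQL server, here is a small sketch using Python's sqlite3 module as a stand-in. The table and column names follow the answer; the sample data is invented, and the delete condition is simplified to id NOT IN (...), which should be equivalent as long as id is unique. SQLite syntax differs slightly from MySQL, so treat this as an illustration of the logic only.

```python
import sqlite3


def delete_duplicates(conn):
    """Keep the lowest id per title and delete every other row."""
    conn.executescript(
        """
        CREATE TEMPORARY TABLE tmp_keeps AS
            SELECT title, MIN(id) AS keepID
            FROM theTable
            GROUP BY title;
        DELETE FROM theTable
            WHERE id NOT IN (SELECT keepID FROM tmp_keeps);
        DROP TABLE tmp_keeps;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE theTable (id INTEGER PRIMARY KEY, title TEXT)")
# Invented sample data: title "c" appears three times.
conn.executemany(
    "INSERT INTO theTable (id, title) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "c"), (5, "c")],
)
delete_duplicates(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM theTable ORDER BY id")]
print(remaining)
```

Unlike the count-based query in the other answer, this removes every extra copy in one pass.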
I'm making a cron job that publishes an article (it inserts a new row into the database). I got it working, but there is one query I can't get to work. I'd like to print certain rows from another table so that they can be inserted into the article being published. Suppose I have another table like this:
+----------+-------------+
| filename | released |
+----------+-------------+
| tigers | 2020-05-27 |
| wolves | 2020-05-27 |
| earth | 2020-05-27 |
| bamboo | 2020-05-27 |
| glaciers | 2020-05-02 |
+----------+-------------+
How can I print the result of the filenames as:
bamboo, earth, tigers, wolves
so that the cron job can insert it into the specified column of the article table in the same format? I've tried the query below, but it only returns one result, which is the tigers filename.
SELECT filename,
GROUP_CONCAT(filename ORDER BY filename ASC SEPARATOR ', ')
FROM table
WHERE released='2020-05-27'
GROUP BY released
Many thanks for the help in advance!
Using TheImpaler's query, I managed to solve what I'm trying to get with the following code:
$get = $database->query("SELECT released,
GROUP_CONCAT(filename ORDER BY filename ASC SEPARATOR ', ')
FROM another
WHERE released='2020-05-27'
GROUP BY released");
$row = mysqli_fetch_array($get);
echo $row['1']; // Prints the comma-separated result of the filenames
print_r($row); // Prints the entire row
Many thanks for the comments; they gave me an idea and validated my query!
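If the database side ever gets in the way, the same string can be built in the script itself. The following is only a sketch in Python, assuming the rows have been fetched as (filename, released) pairs; join_filenames is a made-up helper name.

```python
def join_filenames(rows, released):
    """Return the filenames for one release date, sorted and comma-separated."""
    names = sorted(name for name, date in rows if date == released)
    return ", ".join(names)


# The rows from the question's table.
rows = [
    ("tigers", "2020-05-27"),
    ("wolves", "2020-05-27"),
    ("earth", "2020-05-27"),
    ("bamboo", "2020-05-27"),
    ("glaciers", "2020-05-02"),
]

print(join_filenames(rows, "2020-05-27"))  # bamboo, earth, tigers, wolves
```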
I am having a bit of a problem running a select query on a database. Some of the data is held as a list of comma separated values, an example:
Table: example_tbl
| Id | standardid | subjectid |
| 1 | 1,2,3 | 8,10,3 |
| 2 | 7,6,12 | 18,19,2 |
| 3 | 10,11,12 | 4,3,7 |
And an example of the kind of thing I am trying to run:
select * from table where standardid in (7,10) and subjectid in (2,3,4)
select * from table where FIND_IN_SET(7,10,standardid) and FIND_IN_SET(2,3,4,subjectid)
Thanks in advance for anything you can tell me.
Comma-separated values in a database are inherently problematic and inefficient, and it is far better to normalise your database design. That said, if you check the syntax for FIND_IN_SET(), it looks for a single value in the set; it does not match several values at once.
To use it for multiple values, you need to use the function several times:
select * from table
where (FIND_IN_SET(7,standardid)
OR FIND_IN_SET(10,standardid))
and (FIND_IN_SET(2,subjectid)
OR FIND_IN_SET(3,subjectid)
OR FIND_IN_SET(4,subjectid))
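To show what the normalised design mentioned above could look like, here is a hedged sketch using Python's sqlite3 module. The two link tables, example_standard and example_subject, are names I made up; the data is the example from the question, split on commas. With one value per row, plain IN works and no FIND_IN_SET() is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical link tables: one row per (example row, value) pair.
conn.executescript(
    """
    CREATE TABLE example_standard (example_id INTEGER, standardid INTEGER);
    CREATE TABLE example_subject  (example_id INTEGER, subjectid  INTEGER);
    """
)

# The comma-separated rows from the question.
csv_rows = [
    (1, "1,2,3", "8,10,3"),
    (2, "7,6,12", "18,19,2"),
    (3, "10,11,12", "4,3,7"),
]
for example_id, standards, subjects in csv_rows:
    conn.executemany(
        "INSERT INTO example_standard VALUES (?, ?)",
        [(example_id, int(value)) for value in standards.split(",")],
    )
    conn.executemany(
        "INSERT INTO example_subject VALUES (?, ?)",
        [(example_id, int(value)) for value in subjects.split(",")],
    )

# Rows that have at least one of the standards AND at least one of the subjects.
matching_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT st.example_id
        FROM example_standard AS st
        JOIN example_subject AS su ON su.example_id = st.example_id
        WHERE st.standardid IN (7, 10)
          AND su.subjectid IN (2, 3, 4)
        ORDER BY st.example_id
        """
    )
]
print(matching_ids)
```

The link tables can also be indexed on the value columns, which is not possible for values buried inside a comma-separated string.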
I tried to ask this in a comment on the question below, but I am too new on Stack Overflow to post comments.
MySQL returning results from one table based on data in another table
I cannot get this to work. My intentions are slightly different.
I have two tables (and more in the future) that I intend to work together. I want to keep my db size down, so instead of using full words to reference the department, I added a column that references the "department_id". Now I want to grab all the "time_codes" rows whose "department_id" matches the department the user entered.
So if the user selects the "Solar" department and the time_code_depart table has "9" as the "department_id" for "Solar", then I want to return all the entries in the time_codes table that have "department_id" "9". In this example, that would be the rows with id 40 and 75.
Table Structure:
----------------------------------------------
| time_codes (table) |
| |
| id | department_id | code_number | code_name |
----------------------------------------------
| 40 | 9 | 35 | Safety |
| 52 | 10 | 725 | Inventory |
| 75 | 9 | 18 | Cabinets |
----------------------------------------------
-----------------------------------
| time_code_depart (table) |
| |
| department_id | name | manager |
-----------------------------------
| 9 | Solar | John |
| 10 | Finance | Mary |
| 11 | Design | Sue |
-----------------------------------
I've tried to query:
SELECT 'department_id'
FROM `time_codes`
INNER JOIN `time_code_depart`
ON 'time_codes.department_id' = 'time_code_depart.department_id'
WHERE 'name' LIKE 'Solar'
and
SELECT 'time_codes.id', 'time_codes.code_number', 'time_codes.code_name'
FROM `time_codes`
ON 'time_codes.department_id' = 'time_code_depart.department_id'
WHERE 'time_code_depart.name'
LIKE 'Solar'
I formed both of these based on several readings on the subject, and I have tried several variations of the syntax. I cannot get either one to return the rows with id 40 and 75.
Can you help me identify where I am going wrong?
You have several problems with quoting.
First, to quote table or column names in MySQL, you use backticks; single quotes are used for making strings.
Second, when you have a table.column, you must quote them each separately.
Note that it normally isn't necessary to quote table and column names at all. They only need to be quoted if they're the same as reserved words, or contain punctuation characters.
SELECT `time_codes`.`department_id`
FROM `time_codes`
INNER JOIN `time_code_depart`
ON `time_codes`.`department_id` = `time_code_depart`.`department_id`
WHERE `name` LIKE 'Solar'
And when you have long table names like this, I recommend making use of table aliases to make expressions more readable:
SELECT tc.department_id
FROM time_codes AS tc
INNER JOIN time_code_depart AS tcd
ON tc.department_id = tcd.department_id
WHERE name LIKE 'Solar'
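Here is a quick way to check the aliased join against the question's sample data, sketched with Python's sqlite3 module instead of MySQL. It selects the time_codes columns the asker wanted rather than department_id, and it passes the department name as a bound parameter; codes_for_department is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tables and rows copied from the question.
conn.executescript(
    """
    CREATE TABLE time_code_depart (
        department_id INTEGER PRIMARY KEY, name TEXT, manager TEXT);
    CREATE TABLE time_codes (
        id INTEGER PRIMARY KEY, department_id INTEGER,
        code_number INTEGER, code_name TEXT);
    INSERT INTO time_code_depart VALUES
        (9, 'Solar', 'John'), (10, 'Finance', 'Mary'), (11, 'Design', 'Sue');
    INSERT INTO time_codes VALUES
        (40, 9, 35, 'Safety'), (52, 10, 725, 'Inventory'), (75, 9, 18, 'Cabinets');
    """
)


def codes_for_department(conn, department_name):
    """Return the time_codes rows that belong to the named department."""
    query = """
        SELECT tc.id, tc.code_number, tc.code_name
        FROM time_codes AS tc
        INNER JOIN time_code_depart AS tcd
            ON tc.department_id = tcd.department_id
        WHERE tcd.name = ?
        ORDER BY tc.id
    """
    return conn.execute(query, (department_name,)).fetchall()


print(codes_for_department(conn, "Solar"))
```

Qualifying the filter as tcd.name keeps the query unambiguous if time_codes ever gets a name column of its own.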
Looking for a solution to keep a random order of a user table in the database when clicking the next page button.
Actually I have a database with 1000 users and I want to display 10 users each page (in a memberlist), my query looks like this:
$sql = "SELECT * FROM users ORDER BY user_id LIMIT 1,10";
Now I would like to ORDER BY RAND() and it works, except of course when clicking the next page, then it is shuffled again and it happens sometimes that the same users will be there again.
So my question is about a solution to keep the random order I had on the first page, also on the next pages.
I thought about setting a $_SESSION variable when someone visits the memberlist for the first time, holding the shuffled numbers from 1 to 1000, and then ordering the members by the position of their user_id in that $_SESSION variable.
Don't know how this might be possible, but I actually imagine a solution like:
$numbers = range(1, 1000);
shuffle($numbers); // shuffle() reorders the array in place and returns a bool
$sort = $_SESSION['random_user_sort'] = $numbers;
So I will have a mysql query when clicking page two (next page) like this:
$sql = "SELECT * FROM users ORDER BY $sort LIMIT 11,20";
Any solution to let it work this way or even better ideas?
The RAND() function does not really generate random numbers but what's called pseudo random numbers: numbers are calculated with a deterministic formula and they're just intended to look random. To calculate a new number, you take the previous one and apply the formula to it, and that's how we get different output with a deterministic function: by using different input.
The initial number we use is known as the seed. If you have a look at the manual you'll see that RAND() has an optional argument:
RAND(), RAND(N)
Returns a random floating-point value v in the range 0 <= v < 1.0. If
a constant integer argument N is specified, it is used as the seed
value, which produces a repeatable sequence of column values
You've probably figured out by now where I want to go:
mysql> SELECT language_id, name FROM language ORDER BY RAND(33);
+-------------+----------+
| language_id | name |
+-------------+----------+
| 3 | Japanese |
| 1 | English |
| 4 | Mandarin |
| 6 | German |
| 5 | French |
| 2 | Italian |
+-------------+----------+
6 rows in set (0.00 sec)
mysql> SELECT language_id, name FROM language ORDER BY RAND(33);
+-------------+----------+
| language_id | name |
+-------------+----------+
| 3 | Japanese |
| 1 | English |
| 4 | Mandarin |
| 6 | German |
| 5 | French |
| 2 | Italian |
+-------------+----------+
6 rows in set (0.00 sec)
P.S. The manual is not explicit about the seed range (it just says integer), so you might need some extra research (or just some quick testing).
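To show how a stored seed keeps the pages consistent, here is a small sketch in Python rather than PHP and MySQL. It assumes the seed is generated once per visitor and kept in the session; page_of_ids is a made-up helper, and Python's generator is not the one MySQL uses, so the actual order will differ from ORDER BY RAND(seed).

```python
import random


def page_of_ids(user_ids, seed, page, per_page=10):
    """Return one page of ids from an ordering that is fixed by the seed."""
    ordered = sorted(user_ids)            # start from a known order
    random.Random(seed).shuffle(ordered)  # same seed gives the same shuffle
    start = (page - 1) * per_page
    return ordered[start:start + per_page]


user_ids = range(1, 1001)
seed = 33  # assumption: created on the first visit and stored in the session

first_page = page_of_ids(user_ids, seed, page=1)
second_page = page_of_ids(user_ids, seed, page=2)

# Asking for page 1 again gives the same users, and page 2 never repeats them.
print(first_page == page_of_ids(user_ids, seed, page=1))
print(set(first_page).isdisjoint(second_page))
```

In the MySQL version the equivalent would be to keep the seed in $_SESSION and reuse it in ORDER BY RAND(seed) together with the LIMIT for the requested page.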
I know that this title is overused, but it seems that my kind of question is not answered yet.
So, the problem is like this:
I have a table structure made of four tables (tables, rows, cols, values) that I use to recreate the behavior of the information_schema (in a way).
In PHP I generate queries to retrieve the data, and the result still looks like a normal table:
SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')
HAVING (col2 LIKE "%4%")
OR
SELECT * FROM
(SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')) d
WHERE col2 LIKE "%4%"
Note that the part where I define the columns of the result is generated by a PHP script. Why I am doing this is less important, but I want to extend the algorithm that generates the queries for broader use.
This brings us to the core problem: I have to decide whether to generate a WHERE or a HAVING clause for the query. I know when to use each, but my algorithm doesn't, and I would have to add a few extra checks for this. However, the two queries above are equivalent: I can always put any query in a subquery, give it an alias, and use WHERE on the new derived table. I wonder whether this will cause performance problems or come back to bite me in an unexpected way.
I know how they both work and that WHERE is supposed to be faster, which is why I came here to ask. Hopefully I made myself understood; please excuse my English and the long-winded phrasing.
EDIT 1
I already know the difference between the two and all that it implies. My only dilemma is this: because I use custom columns from other tables, with variable numbers and sizes, and try to achieve the same result as with a normally created table, I must use HAVING to filter the derived table's columns. Alternatively, I can wrap the query in a subquery and use WHERE normally, which will probably create a temporary table that is filtered afterwards. Will this affect performance for a large database? Unfortunately I cannot test this right now, as I cannot afford to fill the database with over 1 billion entries (that would be something like 1 billion rows in the rows table, 5 billion in the values table since every row has 5 columns, 5 rows in the cols table and 1 row in the tables table = 6,000,000,006 entries in total).
right now my database looks like this:
+----+--------+-----------+------+
| id | name | title | dets |
+----+--------+-----------+------+
| 1 | table1 | Table One | |
+----+--------+-----------+------+
+----+-------+------+
| id | table | name |
+----+-------+------+
| 3 | 1 | col1 |
| 4 | 1 | col2 |
+----+-------+------+
where `table` is a foreign key from table `tables`
+----+-------+-------+
| id | table | extra |
+----+-------+-------+
| 1 | 1 | |
| 2 | 1 | |
+----+-------+-------+
where `table` is a foreign key from table `tables`
+----+-----+-----+----------+
| id | row | col | value |
+----+-----+-----+----------+
| 1 | 1 | 3 | 13 |
| 2 | 1 | 4 | 14 |
| 6 | 2 | 4 | 24 |
| 9 | 2 | 3 | asdfghjk |
+----+-----+-----+----------+
where `row` is a foreign key from table `rows`
where `col` is a foreign key from table `cols`
EDIT 2
The conditions are there just for demonstration purposes!
EDIT 3
With only two rows there seems to be a difference between the two: the query using HAVING takes 0.0008 and the one using WHERE takes 0.0014-0.0019. I wonder whether this will affect performance for large numbers of rows and columns.
EDIT 4
The result of the two queries is identical, and that is:
+----------+------+
| col1 | col2 |
+----------+------+
| 13 | 14 |
| asdfghjk | 24 |
+----------+------+
HAVING is meant for filtering after GROUP BY, while WHERE provides conditions on individual rows. See also WHERE vs HAVING
I believe the having clause would be faster in this case, as you're defining specific values, as opposed to reading through the values and looking for a match.
See: http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html
Basically, WHERE filters out rows before they are passed to an aggregate function, but HAVING filters the aggregate function's results.
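A tiny sketch of that difference, using Python's sqlite3 module and an invented orders table (none of these names come from the question). The same SUM() is computed twice: once with a WHERE filter applied before aggregation and once with a HAVING filter applied after it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data, only for illustrating the order of evaluation.
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 5), ("a", 200), ("b", 60), ("b", 50), ("c", 8)],
)

# WHERE removes individual rows before SUM() ever sees them.
where_result = conn.execute(
    """
    SELECT customer, SUM(amount) FROM orders
    WHERE amount > 10
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()

# HAVING filters whole groups after SUM() has been computed.
having_result = conn.execute(
    """
    SELECT customer, SUM(amount) FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY customer
    """
).fetchall()

print(where_result)   # [('a', 200), ('b', 110)]
print(having_result)  # [('a', 205), ('b', 110)]
```

Customer "a" has a different total in the two results because the WHERE version dropped the small order before summing.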
You could do it like this:
WHERE col2 IN (14,24)
Your condition WHERE col2 LIKE "%4%" is a bad idea: a row with col2 = 34, for example, would also be selected.
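The over-matching is easy to reproduce. Below is a minimal sketch with Python's sqlite3 module; the one-column table t is invented for the demonstration and stores the values as text, as the question's values table appears to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column table holding three candidate values.
conn.execute("CREATE TABLE t (col2 TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("14",), ("24",), ("34",)])

# The pattern matches any value that contains a 4 anywhere.
like_rows = [
    row[0]
    for row in conn.execute("SELECT col2 FROM t WHERE col2 LIKE '%4%' ORDER BY col2")
]

# IN matches only the listed values.
in_rows = [
    row[0]
    for row in conn.execute("SELECT col2 FROM t WHERE col2 IN ('14', '24') ORDER BY col2")
]

print(like_rows)  # ['14', '24', '34']
print(in_rows)    # ['14', '24']
```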