As a developer, I know a good DB guy is worth their weight in gold. I often find myself using seriously inefficient ways to tackle non-critical problems, but in this case I need speed over 'just make it work'. I won't even show what I've done so far as it's so embarrassing, but let's just say that I had subqueries inside my main PHP while loop. Sorry.
I have several tables that I need to join together for a JSON request to an indexing engine (Apache Solr).
tbl_contacts
+----+--------------+---------------+
| ID | FirstName | LastName |
+----+--------------+---------------+
| 1 | Joe | Blogs |
+----+--------------+---------------+
| 2 | Jane | Baker |
+----+--------------+---------------+
| 3 | John | Doe |
+----+--------------+---------------+
tbl_attributes_map
+----+--------------+---------------+
| ID | ContactID | AttributeID |
+----+--------------+---------------+
| 1 | 1 | 1 |
+----+--------------+---------------+
| 2 | 1 | 3 |
+----+--------------+---------------+
| 3 | 2 | 2 |
+----+--------------+---------------+
tbl_attributes
+----+---------------+---------------+
| ID | AttributeType | Attribute |
+----+---------------+---------------+
| 1 | Lower | Shoe |
+----+---------------+---------------+
| 2 | Upper | T-Shirt |
+----+---------------+---------------+
| 3 | Upper | Vest |
+----+---------------+---------------+
tbl_notes
+----+---------------+---------------+
| ID | ContactID | Note |
+----+---------------+---------------+
| 1 | 1 | Big feet |
+----+---------------+---------------+
| 2 | 2 | Showoff |
+----+---------------+---------------+
| 3 | 2 | Sweaty |
+----+---------------+---------------+
tbl_appointment
+----+-----------+-----------+---------------------+---------------------+
| ID | ContactID | Location  | TimeFrom            | TimeTo              |
+----+-----------+-----------+---------------------+---------------------+
| 1  | 1         | Big Ben   | 2015-12-16 08:00:00 | 2015-12-16 08:30:00 |
+----+-----------+-----------+---------------------+---------------------+
| 2  | 2         | London    | 2015-12-17 10:00:00 | 2015-12-17 11:00:00 |
+----+-----------+-----------+---------------------+---------------------+
| 3  | 2         | New York  | 2015-12-16 12:00:00 | 2015-12-16 12:30:00 |
+----+-----------+-----------+---------------------+---------------------+
I need to run a query which essentially allows me to print an array structure such as:
Array(
    [FirstName] => Joe
    [LastName] => Blogs
    [Upper] => Array(
        Vest
    )
    [Lower] => Array(
        Shoe
    )
    [Notes] => Array(
        Big Feet
    )
    [Location] => Array(
        Big Ben
    )
    [ApptFrom] => Array(
        2015-12-16 08:00:00
    )
    [ApptTo] => Array(
        2015-12-16 08:30:00
    )
)
If I can get to a stage where a query gives me the following output, I can explode on a delimiter for the fields that need to become arrays. E.g.:
+----+------------+-----------+---------+---------+-----------------+-------------------+-----------------------------------------+-----------------------------------------+
| ID | FirstName | LastName | Upper | Lower | Notes | Location | ApptFrom | ApptTo |
+----+------------+-----------+---------+---------+-----------------+-------------------+-----------------------------------------+-----------------------------------------+
| 2 | Jane | Baker | T-Shirt | | Show off,Sweaty | London,New York | 2015-12-17 10:00:00,2015-12-16 12:00:00 | 2015-12-17 11:00:00,2015-12-16 12:30:00 |
+----+------------+-----------+---------+---------+-----------------+-------------------+-----------------------------------------+-----------------------------------------+
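For the array fields, the plan is to GROUP_CONCAT on the SQL side and explode() on the PHP side; as a rough, untested sketch for just the notes column, for example:
SELECT c.ID, c.FirstName, c.LastName,
       GROUP_CONCAT(n.Note) AS Notes  -- comma-delimited, exploded later in PHP
FROM tbl_contacts c
LEFT JOIN tbl_notes n ON n.ContactID = c.ID
GROUP BY c.ID, c.FirstName, c.LastName;
Doing that for every child table at once, without the rows multiplying, is where I'm stuck.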
My current script works, but with a heavy performance penalty. It takes around 3 hours to churn through 80,000 contacts :-/
Thanks in advance.
I had a very similar situation when I started indexing 50 million records into Elasticsearch (which, like Solr, uses Apache Lucene); it now only takes a couple of hours. I think you can do the following:
Put EXPLAIN on your query to see whether it is actually using the proper indexes.
Try using subselects instead of joins; MySQL can have trouble picking the right index when joining millions of rows (you could also try FORCE INDEX). See the sketch after this list.
Run your indexing in multiple threads/processes.
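For example, here is a rough, untested sketch of the second suggestion applied to the tables in the original question: pre-aggregate each child table in a subselect and join the aggregates, so each contact appears in exactly one output row:
SELECT c.ID, c.FirstName, c.LastName,
       attrs.`Upper`, attrs.`Lower`,
       n.Notes,
       a.Location, a.ApptFrom, a.ApptTo
FROM tbl_contacts c
LEFT JOIN (
    -- split the attributes into one comma-delimited column per type
    SELECT m.ContactID,
           GROUP_CONCAT(CASE WHEN t.AttributeType = 'Upper' THEN t.Attribute END) AS `Upper`,
           GROUP_CONCAT(CASE WHEN t.AttributeType = 'Lower' THEN t.Attribute END) AS `Lower`
    FROM tbl_attributes_map m
    JOIN tbl_attributes t ON t.ID = m.AttributeID
    GROUP BY m.ContactID
) attrs ON attrs.ContactID = c.ID
LEFT JOIN (
    SELECT ContactID, GROUP_CONCAT(Note) AS Notes
    FROM tbl_notes
    GROUP BY ContactID
) n ON n.ContactID = c.ID
LEFT JOIN (
    -- order all three appointment columns the same way so they stay aligned
    SELECT ContactID,
           GROUP_CONCAT(Location ORDER BY TimeFrom) AS Location,
           GROUP_CONCAT(TimeFrom ORDER BY TimeFrom) AS ApptFrom,
           GROUP_CONCAT(TimeTo   ORDER BY TimeFrom) AS ApptTo
    FROM tbl_appointment
    GROUP BY ContactID
) a ON a.ContactID = c.ID;
GROUP_CONCAT skips NULLs, so the CASE expressions cleanly separate the Upper and Lower attributes. Note that GROUP_CONCAT output is silently truncated at group_concat_max_len (default 1024), which may need raising for contacts with many notes or appointments.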
Related
I have a main table (advices) and two reference tables (expert, friend)
advices
----------------------------------------
|id | advisor_id | advisor_type |
----------------------------------------
| 1 | 6 | expert |
| 2 | 6 | friend |
| 3 | 7 | expert |
| 4 | 8 | expert |
----------------------------------------
expert
----------------------------------
|id | lastname | firstname |
----------------------------------
| 6 | Polo | Marco |
| 7 | Wayne | John |
| 8 | Smith | Brad |
----------------------------------
friend
----------------------------------
|id | lastname | firstname |
----------------------------------
| 6 | Doe | John |
| 7 | Brown | Jerry |
| 8 | Goofy | Doofy |
----------------------------------
I would like to get all of the advices (some are from an expert, some are from a friend) and have their respective lastname and firstname be part of the result set.
Each advice row is tied to one of the reference tables (expert or friend) via the advisor id and type.
So I would like the result to be keyed on the advisor id, with the advisor type determining which reference table supplies the name.
The result would look like this, combining lastname and firstname from the appropriate reference table depending on whether the advisor is an expert or a friend:
advices (array)
----------------------------------------------------------------
|id | advisor_id | advisor_type | lastname | firstname |
-----------------------------------------------------------------
| 1 | 6 | expert | Polo | Marco |
| 2 | 6 | friend | Doe | John |
| 3 | 7 | expert | Wayne | John |
| 4 | 8          | expert       | Smith     | Brad       |
-----------------------------------------------------------------
In simple, non-programming terms, I would like to create a query such as this:
SELECT
    advices.id, advices.advisor_id, advices.advisor_type,
    IF advices.advisor_type == 'expert' THEN expert.lastname, expert.firstname
    ELSE IF advices.advisor_type == 'friend' THEN friend.lastname, friend.firstname
FROM advices, expert, friend
Obviously I know that the SELECT statement does not allow this type of on-the-fly logic, but can this be done in another way?
This should work:
SELECT a.*, e.firstname, e.lastname
FROM advices AS a
INNER JOIN expert AS e ON a.advisor_id = e.id AND a.advisor_type = 'expert'
UNION
SELECT a.*, f.firstname, f.lastname
FROM advices AS a
INNER JOIN friend AS f ON a.advisor_id = f.id AND a.advisor_type = 'friend'
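Since the two branches can never return the same row (the advisor_type filter differs), UNION ALL should be a safe, slightly cheaper drop-in for UNION here. An alternative single-pass sketch using LEFT JOINs and CASE (untested; assumes advisor_type only ever holds 'expert' or 'friend'):
SELECT a.id,
       a.advisor_id,
       a.advisor_type,
       -- pick the name columns from whichever reference table matches the type
       CASE a.advisor_type WHEN 'expert' THEN e.lastname  ELSE f.lastname  END AS lastname,
       CASE a.advisor_type WHEN 'expert' THEN e.firstname ELSE f.firstname END AS firstname
FROM advices AS a
LEFT JOIN expert AS e ON e.id = a.advisor_id
LEFT JOIN friend AS f ON f.id = a.advisor_id;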
I have a table of words used in article titles. I want to find which words are used the least across the set of article titles.
Example:
Titles:
"Congressman Joey of Texas does not sign bill C1234."
"The pretty blue bird flies at night in Texas."
"Congressman Bob of Arizona is the signs bill C1234."
The table would contain the following.
Table WORDS_LIST
----------------------------------------------------
| INDEX ID | WORD | ARTICLE ID |
----------------------------------------------------
| 1 | CONGRESSMAN | 1234 |
| 2 | JOEY | 1234 |
| 3 | SIGN | 1234 |
| 4 | BILL | 1234 |
| 5 | C1234 | 1234 |
| 6 | TEXAS | 1234 |
| 7        | PRETTY      | 1245       |
| 8 | BLUE | 1245 |
| 9 | BIRD | 1245 |
| 10 | FLIES | 1245 |
| 11 | NIGHT | 1245 |
| 12 | TEXAS | 1245 |
| 13 | CONGRESSMAN | 1246 |
| 14 | BOB | 1246 |
| 15 | ARIZONA | 1246 |
| 16 | SIGNS | 1246 |
| 17 | BILL | 1246 |
| 18 | C1234 | 1246 |
----------------------------------------------------
In this case, words such as "pretty", "blue", "flies" and "night" would be among those used in the least number of articles (each appears in only one title).
I would appreciate any ideas on how best to build this query. Below is what I started with. I could also write something in PHP, but I figured a query would be faster.
SELECT distinct a1.`word`, count(a1.`word`)
FROM mmdb.words_list a1
JOIN mmdb.words_list b1
ON a1.id = b1.id AND
upper(a1.word) = upper(b1.word)
where date(a1.`publish_date`) = '2017-06-09'
group by `word`
order by count(a1.`word`);
I don't see why a self-join is necessary. Just do something like this:
select wl.word, count(*)
from mmdb.words_list wl
where date(wl.`publish_date`) = '2017-06-09'
group by wl.word
order by count(*);
You can add a LIMIT to get a fixed number of words (see the example below). If publish_date is a DATE column, you should do the comparison as:
where publish_date = '2017-06-09'
If it has a time component:
where publish_date >= '2017-06-09' and publish_date < '2017-06-10'
Writing the comparison this way (without wrapping the column in date()) allows MySQL to use an index on publish_date.
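For instance, to get only the five least-used words (the 5 is just an arbitrary example value):
SELECT wl.word, COUNT(*) AS num_articles
FROM mmdb.words_list wl
WHERE wl.publish_date >= '2017-06-09'
  AND wl.publish_date < '2017-06-10'
GROUP BY wl.word
ORDER BY num_articles ASC
LIMIT 5;
If the same word can appear more than once in a single title, COUNT(DISTINCT ...) over the article-id column (whatever it is actually named in words_list) would count articles rather than rows.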
Try this. It's a bit simpler and should return the correct results:
SELECT `WORD`,
COUNT(*) as `num_articles`
FROM `WORDS_LIST`
WHERE date(`publish_date`) = '2017-06-09'
GROUP BY `WORD`
ORDER BY COUNT(*) ASC;
I would like to get the following as a result from the table structure below (MySQL + PHP):
array[0]
    [name]1, [desc]red, [title]hero, [desc]strong, [desc2]smells,
    [img][0]red1, [img][1]red2, [img][2]red3,
    [ext][0].jpg, [ext][1].gif, [ext][2].png,
    [count][0]253, [count][1]211, [count][2]21, [count][3]121,
    [dist][0]5, [dist][1]5, [dist][2]12, [dist][3]2,
    [score][0]2, [score][1]3, [score][2]1, [score][3]5, [score][4]4,
    [val][0]5, [val][1]1, [val][2]4, [val][3]3, [val][4]4
The problem I have with a simple SELECT, JOIN and GROUP_CONCAT is that the values get duplicated once all the images are joined in.
I've tried various other ways, for example selecting the data row by row combined with a foreach loop in PHP, but I end up with lots of duplicates and it looks very messy.
I also thought about splitting it into multiple selects instead of using one, but I would really like to know if it can be done with one select.
Could someone help me with a MySQL select? Thanks
game
+-----+----------+
| pid | name |
+-----+----------+
| 1 | red |
| 2 | green |
| 3 | blue |
+-----+----------+
detail
+-----+------+--------+-------+--------+
| id | pid | title | desc | desc 2 |
+-----+------+--------+-------+--------+
| 1 | 1 | hero |strong | smells |
| 2 | 2 | prince |nice | tall |
| 3 | 3 | dragon |big | green |
+-----+------+--------+-------+--------+
image
+-----+-----+-----+----+
| id | pid | img |ext |
+-----+-----+-----+----+
| 1 | 1 | red1|.jpg|
| 2 | 1 | red2|.gif|
| 3 | 1 | red3|.png|
+-----+-----+-----+----+
devmap
+-----+-----+-------+------+
| id | pid | count | dist |
+-----+-----+-------+------+
| 1 | 1 | 253 | 5 |
| 2 | 1 | 211 | 5 |
| 3 | 1 | 21 | 12 |
| 4 | 1 | 121 | 2 |
+-----+-----+-------+------+
stats
+-----+-----+-------+------+
| id  | pid | score | val  |
+-----+-----+-------+------+
| 1 | 1 | 2 | 5 |
| 2 | 1 | 3 | 1 |
| 3 | 1 | 1 | 4 |
| 4 | 1 | 5 | 3 |
| 5 | 1 | 4 | 3 |
+-----+-----+-------+------+
When you do a JOIN that involves more than a 1:1 mapping between tables, you're going to get duplicated data in the flat result set; a single plain join can't avoid that.
You can break it out into multiple selects (sketched below), or you can loop through the result set and discard whatever duplicate information you don't want.
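A minimal sketch of the multiple-select approach for one game (untested; table and column names taken from the question, with the loop over pid values left to PHP):
-- one query per child table, all keyed on the same pid
SELECT g.pid, g.name, d.title, d.`desc`, d.`desc 2`
FROM game g
JOIN detail d ON d.pid = g.pid
WHERE g.pid = 1;

SELECT img, ext      FROM image  WHERE pid = 1;
SELECT `count`, dist FROM devmap WHERE pid = 1;
SELECT score, val    FROM stats  WHERE pid = 1;
Each result set then maps directly onto one of the nested arrays, so there are no duplicated rows to filter out afterwards.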
I have a table in MySQL, let's call it foo, and it has a limited number of columns.
| id | name | date |
--------------------------
| 1 | bar | 2012-05-08 |
| 2 | buba | 2012-05-09 |
My users can add records to the table foo_field (stuff like code, description, time, ...).
| id | name |
--------------------
| 1 | code |
| 2 | description |
| 3 | time |
In the table foo_field_value the values for the user-defined fields are stored, like so:
| id | foo_id | foo_field_id | value     |
-------------------------------------------
| 1  | 1      | 1            | b         |
| 2  | 1      | 2            | Lalalala  |
| 3  | 1      | 3            | 12:00     |
| 4  | 2      | 1            | c         |
| 5  | 2      | 2            | We are go |
| 6  | 2      | 3            | 14:00     |
Ideally, I'd want one query which would give me a result like
| id | name | date | code | description | time |
------------------------------------------------------
| 1 | bar | 2012-05-08 | b | Lalalala | 12:00 |
| 2 | buba | 2012-05-09 | c | We are go | 14:00 |
Is this even possible without doing an inner join on the foo_field_value table for every foo_field (i.e. generating the query in PHP after running another query first)?
It's possible to do it in just one query, and it's quite simple.
We are going to modify the foo_field table a bit, adding a column corresponding to the foo table's id column, which I assume is the primary key.
so now we have
* foo
|------|
| id |
| name |
| date |
|------|
* foo_field
|-------------|
| foo_id |
| code |
| description |
| time |
|-------------|
Which means we can add the extra fields with one simple query:
SELECT * FROM foo
LEFT JOIN foo_field ON foo.id = foo_field.foo_id
Which will give us a result set of
| id | name | date | foo_id | code | description | time |
|----+-------+------------+--------+--------+-------------+----------|
| 1 | asdw | 2012-05-16 | 1 | asdasd | asdasd | 15:03:41 |
| 2 | fdgfe | 2012-05-18 | 2 | asdas | asdas | 15:03:41 |
| 3 | asdw | 2012-05-16 | 3 | asdas | asdas | 15:03:52 |
| 4 | fdgfe | 2012-05-18 | 4 | asdasd | asdasd | 15:03:52 |
I am still not sure I fully understood your question. If you want truly dynamic values and data structures, I suggest you save a serialized array into a TEXT field in your database, but I would also suggest you reconsider your solution if that is the case; if you want your solution to be able to grow, you want to keep the structures as strict as possible.
What you are looking for is a pivot query. And since you have dynamic fields that need to be converted to columns, check this article about building automatic pivot queries:
http://www.artfulsoftware.com/infotree/queries.php#523
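For the three fields shown in the question, a static (hard-coded) version of that pivot could look like this rough, untested sketch:
SELECT f.id, f.name, f.date,
       -- one MAX(CASE ...) column per user-defined field
       MAX(CASE WHEN ff.name = 'code'        THEN fv.value END) AS code,
       MAX(CASE WHEN ff.name = 'description' THEN fv.value END) AS description,
       MAX(CASE WHEN ff.name = 'time'        THEN fv.value END) AS `time`
FROM foo f
LEFT JOIN foo_field_value fv ON fv.foo_id = f.id
LEFT JOIN foo_field ff ON ff.id = fv.foo_field_id
GROUP BY f.id, f.name, f.date;
A dynamic version typically generates one MAX(CASE ...) line per row in foo_field and runs the assembled SQL as a prepared statement, which is essentially what the linked article walks through.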
When I try to execute this query, my MySQL server's CPU usage goes to 100% and the page just stalls. I set up an index on (Client_Code, Date_Time, Time_Stamp, Activity_Code, Employee_Name, ID_Transaction), but it doesn't seem to help. What steps can I take next to fix this issue? There is also already another index on the table, if that matters. Thanks
Here is what this query does:
Database info
ID_Transaction | Client_Code | Employee_Name | Date_Time | Time_Stamp | Activity_Code
1              | 00001       | Eric          | 11/15/10  | 7:30AM     | 00023
2              | 00001       | Jerry         | 11/15/10  | 8:30AM     | 00033
3              | 00002       | Amy           | 11/15/10  | 9:45AM     | 00034
4              | 00003       | Jim           | 11/15/10  | 10:30AM    | 00063
5              | 00003       | Ryan          | 11/15/10  | 12:00PM    | 00063
6              | 00003       | bill          | 11/14/10  | 1:00PM     | 00054
7              | 00004       | Jim           | 11/15/10  | 1:00PM     | 00045
8              | 00005       | Jim           | 11/15/10  | 10:00AM    | 00045
The query takes the info above and counts, for each employee, the number of client codes for which that employee has the most recent entry. With the data above, the result (after PHP formatting) would look like this:
Jerry = 1
2 | 00001 | Jerry | 11/15/10| 8:30AM | 00033
Amy = 1
3 | 00002 | Amy | 11/15/10| 9:45AM | 00034
Ryan = 1
5 | 00003 | Ryan | 11/15/10 | 12:00PM | 00063
Jim = 2
7 | 00004 | Jim | 11/15/10 | 1:00pm | 00045
8 | 00005 | Jim | 11/15/10| 10:00 AM| 00045
$sql = "SELECT m.Employee_Name, count(m.ID_Transaction)
FROM ( SELECT DISTINCT Client_Code FROM Transaction)
md JOIN Transaction m ON
m.ID_Transaction = ( SELECT
ID_Transaction FROM Transaction mi
WHERE mi.Client_Code = md.Client_Code AND Date_Time=CURdate() AND Time_Stamp!='' AND
Activity_Code!='000001'
ORDER BY m.Employee_Name DESC, mi.Client_Code DESC, mi.Date_Time DESC,
mi.ID_Transaction DESC LIMIT 1 )
group by m.Employee_Name";
Is there a better way to write this query so it doesn't bog down my system? The query works fine with 10 rows, but it locks my server up when the table has 300,000 rows.
Thanks
Eric
+----+--------------------+-------------+--------+------------------------+--------------+---------+----------------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------------+--------+------------------------+--------------+---------+----------------+------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | [NULL] | [NULL] | [NULL] | [NULL] | 8 | 100.00 | Using temporary; Using filesort |
| 1 | PRIMARY | m | index | [NULL] | search index | 924 | [NULL] | 21 | 100.00 | Using where; Using index; Using join buffer |
| 3 | DEPENDENT SUBQUERY | mi | ref | search index,secondary | search index | 18 | md.Client_Code | 3 | 100.00 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | Transaction | index | [NULL] | secondary | 918 | [NULL] | 21 | 38.10 | Using index |
+----+--------------------+-------------+--------+------------------------+--------------+---------+----------------+------+----------+----------------------------------------------+
What about going with a GROUP BY over multiple columns instead of all the subqueries, to simplify things? Something like:
SELECT * FROM Transaction WHERE Date_Time=CURdate() AND Time_Stamp!='' AND Activity_Code != '000001' GROUP BY Client_Code, Employee_Name
If I'm understanding your query correctly, then something like this would solve the issue and avoid the need for subqueries.
You'll definitely want to do a join instead of a sub-select.
Also, how many records are you viewing? Are pagination and a LIMIT out of the question?
If you set up your initial query, modified with inner/outer joins, as a view and it doesn't crash, you'll be one step closer. Once the view is set up, you'll be able to use a much less complicated select statement - potentially paginated. A join-based sketch follows.
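As a rough, untested sketch of the join-based approach (assuming ID_Transaction increases with time, so the highest ID per Client_Code is that client's most recent entry):
-- 1. find the latest qualifying transaction per client
-- 2. join back to pick up the employee, then count clients per employee
SELECT t.Employee_Name, COUNT(*) AS clients_handled
FROM (
    SELECT Client_Code, MAX(ID_Transaction) AS last_id
    FROM Transaction
    WHERE Date_Time = CURDATE()
      AND Time_Stamp != ''
      AND Activity_Code != '000001'
    GROUP BY Client_Code
) latest
JOIN Transaction t ON t.ID_Transaction = latest.last_id
GROUP BY t.Employee_Name;
With a composite index covering the WHERE and GROUP BY columns of the inner query (for example Date_Time, Client_Code, ID_Transaction; adjust to your data), the grouping can be resolved largely from the index.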