SQL CODE to do: Trending topics equals a twitter - php

If i wants count the matching words in a rows of two tables, with milions of rows, sample:
Table posts, sample:
+----+---------+-----------------------------+
| ID | ID_user | text |
+----+---------+-----------------------------+
| 1 | bruno | michael jackson is dead |
| 2 | thomasi | michael j. moonwalk is dead |
| 3 | userts | michael jackson lives |
+----+---------+-----------------------------+
i want query the words most repeated on the table, limit top 10, the result may be this:
+-------+------------+
| count | word |
+-------+------------+
| 3 | michael |
| 2 | dead |
| 2 | jackson |
| 1 | j. |
| 1 | lives |
| 1 | moonwalk |
+-------+------------+
but i want search only words that repeat more of 10 times, in this case noone word is appear, but if criteria for repetead words is 2, will display only 'michael' and 'dead', but ignore 'is' because i dont want words with less 2 chars of lenght, and the words that a phrase, then i need apear this:
+-------+-----------------+
| count | word |
+-------+-----------------+
| 2 | michael jackson |
| 2 | dead |
+-------+-----------------+
i need a code in mysql that replies the "trending topics" of twitter for posts of my site.

What you're looking for is term extraction, which isn't provided natively within MySQL.
Some other platforms provide that function, but it's considered an enterprise feature, so you'll have to pay through the nose for it.
Alternatively, you can use something like Yahoo!'s Term Extraction API.
Here is a blog post that talks about using Yahoo!'s service from PHP5.

break the sentence up on insert, filter the words against a blacklist, store distinct words with a count (or probably with references). count using count() :)
this would generate a lot of data tough, and i don't know what the speed and storage implications are.

Related

Best practices linking data in MySQL tables

For an online game, I have a table that contains all the plays, and some information on those plays, like the difficulty setting etc.:
+---------+---------+------------+------------+
| play-id | user-id | difficulty | timestamp |
+---------+---------+------------+------------+
| 1 | abc | easy | 1335939007 |
| 2 | def | medium | 1354833214 |
| 3 | abc | easy | 1354833875 |
| 4 | abc | medium | 1354833937 |
+---------+---------+------------+------------+
In another table, after the game has finished, I store some stats related to that specific game, like the score etc:
+---------+----------------+--------+
| play-id | type | value |
+---------+----------------+--------+
| 1 | score | 201487 |
| 1 | enemies_killed | 17 |
| 1 | gems_found | 4 |
| 2 | score | 110248 |
| 2 | enemies_killed | 12 |
| 2 | gems_found | 7 |
+---------+----------------+--------+
Now, I want to make a distribution graph so users can see in what score percentile they are. So I basically want the boundaries of the percentiles.
If it would be on a score level, I could rank the scores and start from there, but it needs to be on a highscore level. So mathematically, I would need to sort all the highscores of users, and then find the percentiles.
I'm in doubt what's the best approach here.
On one hand, constructing an array that holds all the highscores seems like a performance heavy thing to do, because it needs to cycle through both tables and match the scores and the users (the first table holds around 10M rows).
On the other hand, making a separate table with the highscore of users would make things easier, but it feels like it's against the rules of avoiding data redundancy.
Another approach that came to mind was doing the performance heavy thing once a week and keep the result in a separate table, or doing the performance heavy stuff on only a (statistically relevant) subset of the data.
Or maybe I'm completely missing the point here and should use a completely different database setup?
What's the best practice here?

MYSQL UPDATE from SELECT INNER JOIN statement

Hello :) I am fairly new to using INNER JOIN and still trying to comprehend it's logic which I think I am sort of beginning to understand. After being across a few different articles on the topic I have generated a query for finding duplicates in my table of phone numbers.
My table structure is as such:
+---------+-------+
| PhoneID | Phone |
+---------+-------+
Very simple. I created this query:
SELECT A.PhoneID, B.PhoneID FROM T_Phone A
INNER JOIN T_Phone B
ON A.Phone = B.Phone AND A.PhoneID < B.PhoneID
Which returns the ID of a phone that matches another one. I don't know how to word that properly so here is an example output:
+---------+---------+
| PhoneID | PhoneID |
+---------+---------+
| 17919 | 17969 |
| 17919 | 22206 |
| 17919 | 23837 |
| 17920 | 17970 |
| 17920 | 22203 |
| 17920 | 23834 |
| 17921 | 17971 |
| 17921 | 22225 |
| 17921 | 22465 |
| 17921 | 24011 |
| 17921 | 24047 |
| 17922 | 17972 |
| 17922 | 22198 |
| 17922 | 23879 |
| 17923 | 17973 |
| 17923 | 22199 |
| 17923 | 23880 |
+---------+---------+
You can note that on the left there is repeating IDs, the phone number that matches will be on the right (These are just the IDs of said numbers). what I am trying to accomplish, is to actually change a join table relative to the ID on the right. The join table structure is as such:
+----------+-----------+
| T_JoinID | T_PhoneID |
+----------+-----------+
Where T_JoinID is a larger object with a collection of those T_PhoneIDs, hence the join table. What I want to do is take a row from the original match query, and find the right side PhoneID in the join table, then update that item in the Join to be equal to the left side PhoneID. Repeating this for each row.
It's sort of a way to save space and get rid of matching numbers, I can just point the matching ones to the original and use that as a reference when I need to retrieve it.
After that I need to actually delete the original numbers that I reset the reference for but... This seems like a job for 2 or 3 different queries.
EDIT:
Sorry I know I didn't include enough detail. Here is some additional info:
My exact table structure is not the same as here but I am only using the columns that I listed so I didn't consider the fact that any of the others would matter. Most of the tables have a unique ID that is auto incremented. The phone table has carrier, type, ect columns. The additional columns I felt were irrelevant to include, but if there is a solution that includes the auto incremented ID of each table, let me know :) Anyway, I sort of found a solution, using multiple queries though I am still interested to learn and apply knowledge based on this question. So I have a that join table that I mentioned. It might look something like this for the expected results. There is a before and after table in one sorry for poor formatting.
+--------------------+---------+----------+---------+
| Join Table Results | | | |
+--------------------+---------+----------+---------+
| Before | | After | |
| Join | Table | Join | Table |
| PersonID | PhoneID | PersonID | PhoneID |
| 1 | 1 | 1 | 1 |
| 1 | 2 | 1 | 2 |
| 1 | 3 | 1 | 3 |
| 2 | 4 | 2 | 1 |
| 2 | 5 | 2 | 5 |
| 2 | 6 | 2 | 6 |
| 3 | 7 | 3 | 5 |
| 3 | 8 | 3 | 5 |
| 3 | 9 | 3 | 5 |
| 3 | 10 | 3 | 8 |
| 3 | 11 | 3 | 9 |
+--------------------+---------+----------+---------+
So you can see that in the before columns, 7, 8, and 9 would all be duplicate phone numbers in the PhoneID - PhoneID relationship table I posted originally. After the query I wanted to retrieve the duplicates using the PhoneID - PhoneID comparison and take the ones that match, to change the join table in a way that I have shown directly above. So 7, 8, 9 all turn to 5. Because 5 is the original number, and 7, 8, 9 coincidentally were duplicates of 5. So I am basically pointing all of them to 5, and then deleting what would have been 7, 8, 9 in my Phone table since they all have a new relationship to 5. Is this making sense? xD It sounds outrageous typing it out.
End Edit
How can I improve my query to accomplish this task? Is it possible using an UPDATE statement? I was also considering just looping through this output and updating each row individually but I had a hope to just use a single query to save time and code. Typing it out makes me feel a tad obnoxious but I had hope there was a solution out there!
Thank you to anyone in advance for taking your time to help me out :) I really appreciate it. If it sounds outlandish, let me know I will just use multiple queries.

Mysql join query multiple values

I have a table called facility.
Structure looks as follows:
id | name
---------
1 | Hotel
2 | Hospital
3 | medical shop
I have an other table which is taking data from the above table and keeping multiple values in one column. View looks like below:
id | facilities
---------------
1 | Hospital~~medical shop~~Hotel
2 | Hospital~~Hotel
3 | medical shop~~Hotel
If I want to join these two tables how does the query look like?
I tried this, but it didn't work:
select overview.facilities as facility
from overview join facility on facility.id=overview.facilities;
you can do this with a bit of hackery
select o.facilities as facility
from overview o
join facility f on find_in_set(f.facilities, replace(o.facilities, '~~', ','));
I would highly recommend you change the way you are storing data. currently it is considered un normalized and that quickly becomes a monster to deal with
you should change your table structure to look something more like this
+----------+--------------+
| facility |
+----------+--------------+
| id | name |
+----------+--------------+
| 1 | Hotel |
| 2 | Hospital |
| 3 | medical shop |
+----------+--------------+
+-----------+-------------+
| overview |
+-----------+-------------+
| id | facility_id |
+-----------+-------------+
| 1 | 2 |
| 2 | 3 |
| 3 | 1 |
| 4 | 2 |
| 5 | 1 |
| 6 | 3 |
| 7 | 1 |
+-----------+-------------+
Code Explanation:
basically you are wanting to find the matching facilities in the overview. one handy function MySQL has is FIND_IN_SET() that allows you to find an item in a comma separated string aka find_in_set(25, '11,23,25,26) would return true and that matching row would be returned... you are separating your facilities with the delimiter ~~ which wont work with find_in_set... so I used REPLACE() to change the ~~ to a comma and then used that in the JOIN condition. you can go from here in multiple ways.. for instance lets say you want the facility id's for the overview.. you just add in the select GROUP_CONCAT(f.id) and you have all of the id's... note if you do that you need to add a GROUP BY at the end of your query to tell it how you want the results grouped

SQL Query Logic 2

I have already asked here Query Logic SQL and didn't get a response(could be with the presentation of my data) I really hope someone can take a look at this and provide their input on how to get this done. would highly appreciate any help.
I have a sql table data that looks like
users table
id | name |
_______________
1 | John |
2 | Mary |
3 | Charles |
4 | Mike |
5 | Lucy |
6 | Debbie |
pairing table:
main_id | pair_id |
_____________________
1 | 2 |
1 | 3 |
2 | 4 |
2 | 5 |
3 | 6 |
3 | 1 |
when rendering output to user, my html table would look like this, using group_by groupconcat in sql.
main_name | paired_names
John | Mary, Charles
Mary | Mike, Lucy
Charles | Debbie, John
Now,the problem is during searching(wildcard search)
say the user will input "Charles"...
The output needs to be:
main_name | paired_names
John | Mary, Charles
Charles | Debbie, John
since its going to do a wildcard search in both columns in the pair table.
For now, what i do is i manipulate the result set from the database but this has pagination and been advised that it will affect system performance if i query all data then manipulate after.
I hope someone will be kind enough to provide their advice on how to get this done.
I can provide further details if needed.
Looking forward to hear from you.
Checks the fiddle
SELECT users.id,GROUP_CONCAT(pair_id) FROM (SELECT users.id,
users.name,pairing.main_id,pairing.pair_id
FROM users, pairing
WHERE pairing.main_id=users.id
) AS t1 JOIN users ON users.id=t1.id GROUP BY users.id;

MySQL query how to get list of all distinct values from columns that contain multiple string values?

I am trying to get a list of distinct values from the columns out of a table.
Each column can contain multiple comma delimited values. I just want to eliminate duplicate values and come up with a list of unique values.
I know how to do this with PHP by grabbing the entire table and then looping the rows and placing the unique values into a unique array.
But can the same thing be done with a MySQL query?
My table looks something like this:
| ID | VALUES |
---------------------------------------------------
| 1 | Acadian,Dart,Monarch |
| 2 | Cadillac,Dart,Lincoln,Uplander |
| 3 | Acadian,Freestar,Saturn |
| 4 | Cadillac,Uplander |
| 5 | Dart |
| 6 | Dart,Cadillac,Freestar,Lincoln,Uplander |
So my list of unique VALUES would then contain:
Acadian
Cadillac
Dart
Freestar
Lincoln
Monarch
Saturn
Uplander
Can this be done with a MySQL call alone, or is there a need for some PHP sorting as well?
Thanks
Why would you store your data like this in a database? You deliberately nullify all the extensive querying features you would want to use a database for in the first place. Instead, have a table like this:
| valueID | groupID | name |
----------------------------------
| 1 | 1 | Acadian |
| 2 | 1 | Dart |
| 3 | 1 | Monarch |
| 4 | 2 | Cadillac |
| 2 | 2 | Dart |
Notice the different valueID for Dart compared to Matthew's suggestion. That's to have same values have the same valueID (you may want to refer to these later on, and you don't want to make the same mistake of not thinking ahead again, do you?). Then make the primary key contain both the valueID and the groupID.
Then, to answer your actual question, you can retrieve all distinct values through this query:
SELECT name FROM mytable GROUP BY valueID
(GROUP BY should perform better here than a DISTINCT since it shouldn't have to do a table scan)
I would suggest selecting (and splitting) into a temp table and then making a call against that.
First, there is apparently no split function in MySQL http://blog.fedecarg.com/2009/02/22/mysql-split-string-function/ (this is three years old so someone can comment if this has changed?)
Push all of it into a temp table and select from there.
Better would be if it is possible to break these out into a table with this structure:
| ID | VALUES |AttachedRecordID |
---------------------------------------------------------------------
| 1 | Acadian | 1 |
| 2 | Dart | 1 |
| 3 | Monarch | 1 |
| 4 | Cadillac | 2 |
| 5 | Dart | 2 |
etc.

Categories