MySQL: group similar rows based on entries in a second table

I don't really know what to call this.

I have a few tables structured like this:

A "sentences" table:

id |    sentence       | ...
----------------------------
1  | See Spot run      | ...
2  | See Jane run      | ...
3  | Jane likes cheese | ...

A "words" table:

id | word (unique)
----------
1  | See
2  | Spot
3  | run
4  | Jane
5  | likes
6  | cheese

And a "word_references" table:

sentence_id | word_id
---------------------
          1 | 1 
          1 | 2
          1 | 3
          2 | 1
          2 | 3
          2 | 4
          3 | 4
          3 | 5
          3 | 6

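I want to return a list of pairs of sentences that are similar to each other, ordered by similarity (the number of words they share). So it should return: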

one | two | similarity
----------------------
 1  |  2  |  2
 2  |  3  |  1

because sentences 1 and 2 share two words ("See" and "run"), while sentences 2 and 3 share one word ("Jane").
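To make the expected result concrete, here is a small sketch in plain Python (not SQL) that computes the same pairs from the sample word_references rows by intersecting each sentence's set of word ids. The list literal and the helper name similar_pairs are illustrative assumptions; they are not part of the question's schema.

```python
from itertools import combinations

# Hypothetical in-memory copy of the sample word_references rows:
# (sentence_id, word_id) pairs taken from the table above.
word_references = [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 3), (2, 4),
    (3, 4), (3, 5), (3, 6),
]


def similar_pairs(refs):
    """Return (one, two, similarity) tuples for sentence pairs sharing at least one word."""
    words_by_sentence = {}
    for sentence_id, word_id in refs:
        words_by_sentence.setdefault(sentence_id, set()).add(word_id)

    pairs = []
    # combinations() over the sorted ids yields each unordered pair once, with one < two
    for one, two in combinations(sorted(words_by_sentence), 2):
        shared = len(words_by_sentence[one] & words_by_sentence[two])
        if shared > 0:
            pairs.append((one, two, shared))

    # Most similar first; ties broken by sentence ids
    pairs.sort(key=lambda p: (-p[2], p[0], p[1]))
    return pairs


print(similar_pairs(word_references))  # [(1, 2, 2), (2, 3, 1)]
```

Sentences 1 and 3 share no words, so that pair is left out, which matches the expected output above.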

Best answer

This query should solve your problem:

SELECT r1.sentence_id AS one, 
       r2.sentence_id AS two, 
       Count(*)       AS similarity 
FROM   word_references r1 
       INNER JOIN word_references r2 
               ON r1.sentence_id < r2.sentence_id 
                  AND r1.word_id = r2.word_id 
GROUP  BY r1.sentence_id, 
          r2.sentence_id 
ORDER  BY similarity DESC, 
          one, 
          two

This gives:

one | two | similarity
----------------------
 1  |  2  |  2
 2  |  3  |  1
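A way to check the query locally is sketched below. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in for MySQL; the query only uses a self-join, GROUP BY and COUNT, which both engines support, plus an ORDER BY similarity DESC, one, two clause so the most similar pairs come first. Only the word_references table is needed, because the sentence and word text never enters the calculation.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_references (sentence_id INTEGER, word_id INTEGER)")
conn.executemany(
    "INSERT INTO word_references (sentence_id, word_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6)],
)

# Self-join on word_id: every matching row pair is one shared word.
# r1.sentence_id < r2.sentence_id keeps each pair once and excludes self-pairs.
SIMILAR_PAIRS_SQL = """
SELECT r1.sentence_id AS one,
       r2.sentence_id AS two,
       COUNT(*)       AS similarity
FROM   word_references r1
       INNER JOIN word_references r2
               ON r1.sentence_id < r2.sentence_id
                  AND r1.word_id = r2.word_id
GROUP  BY r1.sentence_id, r2.sentence_id
ORDER  BY similarity DESC, one, two
"""

rows = conn.execute(SIMILAR_PAIRS_SQL).fetchall()
print(rows)  # [(1, 2, 2), (2, 3, 1)]
```

If word_references can hold the same (sentence_id, word_id) pair more than once (for example, a word repeated within one sentence), COUNT(DISTINCT r1.word_id) instead of COUNT(*) avoids counting that word several times.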


If you change the expression r1.sentence_id < r2.sentence_id to r1.sentence_id <> r2.sentence_id, you get both sides of the relationship:

one | two | similarity
----------------------
 1  |  2  |  2
 2  |  1  |  2
 2  |  3  |  1
 3  |  2  |  1
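The symmetric variant can be sketched the same way, again assuming SQLite via Python's sqlite3 module as a stand-in for MySQL. Having both directions is handy when you want "all sentences similar to sentence X": filtering on r1.sentence_id then finds every neighbour. The helper name similar_to is hypothetical and only illustrates that use.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word_references (sentence_id INTEGER, word_id INTEGER)")
conn.executemany(
    "INSERT INTO word_references (sentence_id, word_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (2, 4), (3, 4), (3, 5), (3, 6)],
)

# <> instead of <: each pair appears in both directions, e.g. (1, 2) and (2, 1).
BOTH_DIRECTIONS_SQL = """
SELECT r1.sentence_id AS one,
       r2.sentence_id AS two,
       COUNT(*)       AS similarity
FROM   word_references r1
       INNER JOIN word_references r2
               ON r1.sentence_id <> r2.sentence_id
                  AND r1.word_id = r2.word_id
GROUP  BY r1.sentence_id, r2.sentence_id
ORDER  BY similarity DESC, one, two
"""

rows = conn.execute(BOTH_DIRECTIONS_SQL).fetchall()
print(rows)  # [(1, 2, 2), (2, 1, 2), (2, 3, 1), (3, 2, 1)]


def similar_to(conn, sentence_id):
    """Hypothetical helper: (other_sentence_id, similarity) for one sentence, most similar first."""
    return conn.execute(
        """
        SELECT r2.sentence_id, COUNT(*) AS similarity
        FROM   word_references r1
               INNER JOIN word_references r2
                       ON r1.sentence_id <> r2.sentence_id
                          AND r1.word_id = r2.word_id
        WHERE  r1.sentence_id = ?
        GROUP  BY r2.sentence_id
        ORDER  BY similarity DESC, r2.sentence_id
        """,
        (sentence_id,),
    ).fetchall()


print(similar_to(conn, 2))  # [(1, 2), (3, 1)]
```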