I'm not really sure what to call this.
I have a few tables structured like this.
A "sentence" table:
id | sentence | ...
----------------------------
1 | See Spot run | ...
2 | See Jane run | ...
3 | Jane likes cheese | ...
A "word" table:
id | word (unique)
----------
1 | See
2 | Spot
3 | run
4 | Jane
5 | likes
6 | cheese
And a "word_references" table:
sentence_id | word_id
---------------------
1 | 1
1 | 2
1 | 3
2 | 1
2 | 3
2 | 4
3 | 4
3 | 5
3 | 6
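I want to return a list of sentence pairs that are similar to each other, where similarity is the number of words they share, ordered by that similarity. So it should return: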
one | two | similarity
----------------------
1 | 2 | 2
2 | 3 | 1
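This is because sentences 1 and 2 share two words, "See" and "run", while sentences 2 and 3 share one word, "Jane". Sentences 1 and 3 share no words, so that pair does not appear.
Best answer
This query should solve your problem: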
SELECT r1.sentence_id AS one,
       r2.sentence_id AS two,
       COUNT(*)       AS similarity
FROM   word_references r1
       INNER JOIN word_references r2
               ON r1.sentence_id < r2.sentence_id
              AND r1.word_id = r2.word_id
GROUP  BY r1.sentence_id,
          r2.sentence_id
This gives:
one | two | similarity
----------------------
1 | 2 | 2
2 | 3 | 1
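If you want to try this locally, here is a minimal sketch that loads the sample word_references rows from the question into an in-memory SQLite database and runs the same self-join. The use of Python's sqlite3 module, the helper names build_db and similar_pairs, and the explicit ORDER BY clause are my own assumptions and not part of the query above. I added the ORDER BY because the question asks for results sorted by similarity, and without it the row order is not guaranteed.

```python
import sqlite3

# Sample rows copied from the word_references table in the question.
WORD_REFERENCES = [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 3), (2, 4),
    (3, 4), (3, 5), (3, 6),
]

# Same self-join as in the answer; the ORDER BY is an addition (assumption)
# so that the most similar pairs come first and ties are deterministic.
QUERY = """
SELECT r1.sentence_id AS one,
       r2.sentence_id AS two,
       COUNT(*)       AS similarity
FROM   word_references r1
       INNER JOIN word_references r2
               ON r1.sentence_id < r2.sentence_id
              AND r1.word_id = r2.word_id
GROUP  BY r1.sentence_id, r2.sentence_id
ORDER  BY similarity DESC, one, two
"""


def build_db():
    """Create an in-memory database holding only the word_references table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE word_references (sentence_id INTEGER, word_id INTEGER)"
    )
    conn.executemany("INSERT INTO word_references VALUES (?, ?)", WORD_REFERENCES)
    return conn


def similar_pairs(conn):
    """Return (one, two, similarity) tuples for every pair sharing a word."""
    return conn.execute(QUERY).fetchall()


rows = similar_pairs(build_db())
for row in rows:
    print(row)
```

One caveat: COUNT(*) counts matching rows, so this assumes each (sentence_id, word_id) combination is stored only once in word_references. If a repeated word in a sentence is stored as several rows, the count would be inflated.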
If you change the expression r1.sentence_id < r2.sentence_id to r1.sentence_id <> r2.sentence_id, you get both sides of each relationship:
one | two | similarity
----------------------
1 | 2 | 2
2 | 3 | 1
2 | 1 | 2
3 | 2 | 1
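The two-sided form is handy when you want to look up everything similar to one particular sentence, because that sentence then always appears in the first column. Below is a rough sketch of that idea, again using Python's sqlite3 with the sample data from the question. The function names symmetric_pairs and neighbours, the WHERE filter, and the ORDER BY clauses are hypothetical additions of mine, not something from the original query.

```python
import sqlite3

# Sample rows copied from the word_references table in the question.
WORD_REFERENCES = [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 3), (2, 4),
    (3, 4), (3, 5), (3, 6),
]

# The <> variant: every pair is reported in both directions.
SYMMETRIC_QUERY = """
SELECT r1.sentence_id AS one,
       r2.sentence_id AS two,
       COUNT(*)       AS similarity
FROM   word_references r1
       INNER JOIN word_references r2
               ON r1.sentence_id <> r2.sentence_id
              AND r1.word_id = r2.word_id
GROUP  BY r1.sentence_id, r2.sentence_id
ORDER  BY similarity DESC, one, two
"""

# Assumed extension: restrict the left side to a single sentence.
NEIGHBOURS_QUERY = """
SELECT r2.sentence_id AS other,
       COUNT(*)       AS similarity
FROM   word_references r1
       INNER JOIN word_references r2
               ON r1.sentence_id <> r2.sentence_id
              AND r1.word_id = r2.word_id
WHERE  r1.sentence_id = ?
GROUP  BY r2.sentence_id
ORDER  BY similarity DESC, other
"""


def build_db():
    """Create an in-memory database holding only the word_references table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE word_references (sentence_id INTEGER, word_id INTEGER)"
    )
    conn.executemany("INSERT INTO word_references VALUES (?, ?)", WORD_REFERENCES)
    return conn


def symmetric_pairs(conn):
    """Return every similar pair twice, once per direction."""
    return conn.execute(SYMMETRIC_QUERY).fetchall()


def neighbours(conn, sentence_id):
    """Return (other_sentence_id, similarity) for one given sentence."""
    return conn.execute(NEIGHBOURS_QUERY, (sentence_id,)).fetchall()


conn = build_db()
print(symmetric_pairs(conn))
print(neighbours(conn, 2))
```

Because of the added ORDER BY, the rows come back sorted by similarity, so their order differs from the unsorted listing above even though the set of rows is the same.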