postgresql – I need to count the number of distinct rows a word appears in

July 30, 2023, 339 views

So far I have:

SELECT
    word, count(*)
FROM
    (SELECT
            regexp_split_to_table(ColDescription, '\s') as word
    FROM tblCollection
    ) a
GROUP BY word
ORDER BY count(*) desc

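This gives me a nice list of all the words and how many times each one appears in my description column.

What I need is a way to also show how many rows each word appears in.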

For example, if my data is:

hello hello test 
hello test test test
test hi

it would show:

word    count   # of rows it appears in
hello     3        2
test      5        3
hi        1        1

I'm a beginner with databases, so any help is appreciated!

Sample table:

CREATE TABLE tblCollection ( ColDescription varchar(500) NOT NULL PRIMARY KEY);

The sample data is:

"hello hello test"
"hello test test test"
"test hi"

Each string is its own row.
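To reproduce the example, the three sample rows could be loaded roughly like this. This is only a sketch: the INSERT statement is not part of the original question, and it assumes the tblCollection table defined above already exists.

```sql
-- Load the three sample descriptions, one row per string.
-- SQL string literals use single quotes, not the double quotes shown above.
INSERT INTO tblCollection (ColDescription) VALUES
    ('hello hello test'),
    ('hello test test test'),
    ('test hi');
```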

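Best answer:

The main obstacle is that your subquery doesn't retain any information about where it found each instance of a word. That's easy to fix: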

SELECT
  regexp_split_to_table(ColDescription, '\s') as word,
  ColDescription
FROM tblCollection

Now that each word is listed alongside its source field, it's just a matter of counting them:

SELECT
  word, count(*), count(distinct ColDescription)
FROM
...
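
The answer's final query is cut off above, so what follows is only one plausible way to put the two fragments together, not necessarily what the answerer wrote. The column aliases total_count and rows_appearing_in and the WHERE filter are assumptions added for illustration. The filter drops the empty tokens that splitting on '\s' produces when a description has leading, trailing, or doubled whitespace.

```sql
-- Sketch: wrap the word/source subquery and count both ways.
SELECT
    word,
    count(*)                       AS total_count,       -- every occurrence of the word
    count(DISTINCT ColDescription) AS rows_appearing_in  -- distinct source rows containing it
FROM
    (SELECT
            regexp_split_to_table(ColDescription, '\s') AS word,
            ColDescription
    FROM tblCollection
    ) a
WHERE word <> ''          -- assumption: skip empty tokens left by extra whitespace
GROUP BY word
ORDER BY count(*) DESC;
```

Note that count(DISTINCT ColDescription) only equals the number of rows because ColDescription is the primary key in the sample table, so no two rows share the same text. If the table allowed duplicate descriptions, a separate row identifier would have to be carried through the subquery and counted instead.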