`md5sum -c` doesn't work with Apache's MD5 file format

Let me take you on a journey..

I'm trying to download Apache Spark (http://www.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz) on a new Debian (Jessie) machine and verify it via MD5.

The md5sum program is already present on the machine, so I don't need to do anything for that.

So I went ahead and downloaded the MD5 checksum (http://www.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz.md5) into the same directory as the downloaded Spark archive, and ran:

md5sum -c spark-1.6.0-bin-hadoop2.6.tgz.md5

This fails with:

md5sum: spark-1.6.0-bin-hadoop2.6.tgz.md5: no properly formatted MD5 checksum lines found

So I looked at the contents via cat spark-1.6.0-bin-hadoop2.6.tgz.md5:

spark-1.6.0-bin-hadoop2.6.tgz: 62 4B 16 1F 67 70 A6 E0  E0 0E 57 16 AF D0 EA 0B

That's the entire file. It looks fine, so maybe the Spark download itself is actually bad? Before running with that assumption, I'll first see what the MD5 currently is, via md5sum spark-1.6.0-bin-hadoop2.6.tgz:

624b161f6770a6e0e00e5716afd0ea0b  spark-1.6.0-bin-hadoop2.6.tgz

Hmm, that's a completely different format, but if you look closely you'll notice the digits and letters are actually identical (apart from being lowercase and having no spaces). It looks like the md5sum that ships with Debian follows a different standard.
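One workaround sketch, assuming the .md5 file always consists of the single "filename: HEX BYTES" line shown above, is to normalize Apache's layout into the "hash  filename" form that md5sum -c expects and pipe it straight back in:

# Rewrite "file: 62 4B ..." into "624b...  file" and feed it to md5sum -c
# ("-" makes md5sum read the checksum list from stdin)
awk -F': ' '{gsub(/ /, "", $2); print tolower($2) "  " $1}' spark-1.6.0-bin-hadoop2.6.tgz.md5 | md5sum -c -

If the archive is intact, that should print spark-1.6.0-bin-hadoop2.6.tgz: OK.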

Maybe there's another way to run this command? Let's try md5sum --help:

Usage: md5sum [OPTION]... [FILE]...
Print or check MD5 (128-bit) checksums.
With no FILE, or when FILE is -, read standard input.

  -b, --binary         read in binary mode
  -c, --check          read MD5 sums from the FILEs and check them
      --tag            create a BSD-style checksum
  -t, --text           read in text mode (default)

The following four options are useful only when verifying checksums:
      --quiet          don't print OK for each successfully verified file
      --status         don't output anything, status code shows success
      --strict         exit non-zero for improperly formatted checksum lines
  -w, --warn           warn about improperly formatted checksum lines

      --help     display this help and exit
      --version  output version information and exit

The sums are computed as described in RFC 1321.  When checking, the input
should be a former output of this program.  The default mode is to print
a line with checksum, a character indicating input mode ('*' for binary,
space for text), and name for each FILE.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
Report md5sum translation bugs to <http://translationproject.org/team/>
Full documentation at: <http://www.gnu.org/software/coreutils/md5sum>
or available locally via: info '(coreutils) md5sum invocation'

OK, --tag seems to change the format. Let's try md5sum --tag spark-1.6.0-bin-hadoop2.6.tgz:

MD5 (spark-1.6.0-bin-hadoop2.6.tgz) = 624b161f6770a6e0e00e5716afd0ea0b

That is indeed a different format, but still not the right one. So I check the instructions on the Apache Download Mirrors page and find the following text:

Alternatively, you can verify the MD5 hash on the file. A unix program called md5 or md5sum is included in many unix distributions. It is also available as part of GNU Textutils…

So I followed that link and discovered that Textutils was merged into Coreutils back in 2003, so what I actually want is the md5sum from Coreutils. But as the bottom of the md5sum --help dump shows, it already comes from Coreutils.
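A quick way to double-check which coreutils build is actually providing md5sum (the exact version string in the comment is only what I'd expect on Jessie, not something I've confirmed):

md5sum --version
# prints a line like "md5sum (GNU coreutils) 8.x" identifying the installed coreutils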

That might mean my Coreutils is out of date. So I run apt-get update && apt-get upgrade coreutils, only to find:

Calculating upgrade... coreutils is already the newest version.

That's a dead end... but wait, they said "md5 or md5sum"! Let's see where that leads.

The md5 program isn't present yet, so I'll try apt-get install md5:

E: Unable to locate package md5

Now I'm lost, so I turn to Google, and then to StackOverflow, for help.. and here I am.

So what's going on with these two different MD5 file formats, and how do I deal with this (and finally verify my Apache Spark)?

Best answer: I believe gpg --print-md md5 spark-1.6.0-bin-hadoop2.6.tgz should match the contents of the .md5 file.

There is a known issue with the format of the md5/sha files: the script that builds the Spark releases uses gpg --print-md md5 to create the signature files. See: https://issues.apache.org/jira/browse/SPARK-5308
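As a sketch of how to act on that (assuming gpg is installed, that gpg --print-md md5 really does emit the same "file: XX XX ..." layout as the .md5 file as the answer and SPARK-5308 suggest, and with computed.md5 just a scratch filename):

# Regenerate the checksum in Apache's layout and compare it with the downloaded .md5 file
gpg --print-md md5 spark-1.6.0-bin-hadoop2.6.tgz > computed.md5
diff computed.md5 spark-1.6.0-bin-hadoop2.6.tgz.md5 && echo "MD5 matches"

If diff reports a difference that is only whitespace or line wrapping, compare the hex digits by eye instead.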
