java - Process a huge file and quickly call a function on each line of the file

July 27, 2019

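I have a file with about 10,000,000 lines of text (yes, I have enough memory).

Now I want a list of MyClass objects, one per line of the file (the constructor is MyClass(String s)). Right now I do this: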

List<MyClass> help = Files.lines(Paths.get(s))
                          .parallel()
                          .map(MyClass::new)
                          .collect(Collectors.toList());

But it takes ages to finish. Any ideas on how to speed this up?

Best answer:
First, a relevant excerpt from the documentation of Collectors.toList():

[…]There are no guarantees on the type, mutability, serializability, or thread-safety of the List returned; if more control over the returned List is required, use toCollection(Supplier)
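
As a small illustration of the toCollection(Supplier) route the excerpt mentions, the sketch below collects into an explicitly chosen ArrayList. The class and method names (ToCollectionSketch, collectToArrayList) are made up for this example and are not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToCollectionSketch {
    // Hypothetical helper: collects into an explicitly chosen List
    // implementation instead of whatever Collectors.toList() returns.
    static ArrayList<String> collectToArrayList(Stream<String> lines) {
        return lines.collect(Collectors.toCollection(ArrayList::new));
    }

    public static void main(String[] args) {
        List<String> result = collectToArrayList(Stream.of("a", "b", "c"));
        // The concrete type is now under the caller's control.
        System.out.println(result.getClass().getSimpleName() + " " + result);
    }
}
```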

Now, let's look more closely at the collector characteristics; we find this:

public static final Collector.Characteristics CONCURRENT
Indicates that this collector is concurrent, meaning that the result container can support the accumulator function being called concurrently with the same result container from multiple threads.
If a CONCURRENT collector is not also UNORDERED, then it should only be evaluated concurrently if applied to an unordered data source.

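Now, nothing guarantees that the collector returned by Collectors.toList() is concurrent at all.

Even though instantiating your class may itself take a while, the safe bet here is to assume that this collector is not concurrent. Fortunately, we can use a concurrent collection instead, as the javadoc mentions. So, let's try: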
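
Rather than guessing, the characteristics can be inspected at runtime. The sketch below prints what the running JDK reports for Collectors.toList(). The class name ToListCharacteristics is made up for this example, and the result is an implementation detail that the specification does not promise (OpenJDK builds are not expected to report CONCURRENT here).

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ToListCharacteristics {
    // Returns whatever characteristics this JDK's toList() collector declares.
    static Set<Collector.Characteristics> toListCharacteristics() {
        return Collectors.toList().characteristics();
    }

    public static void main(String[] args) {
        Set<Collector.Characteristics> ch = toListCharacteristics();
        System.out.println("characteristics: " + ch);
        System.out.println("concurrent: "
                + ch.contains(Collector.Characteristics.CONCURRENT));
    }
}
```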

.collect(
        // Container and result are both List<MyClass>, so the finisher is the identity
        Collector.of(CopyOnWriteArrayList::new,
            List::add,
            (o, o2) -> { o.addAll(o2); return o; },
            Function.<List<MyClass>>identity(),
            Collector.Characteristics.CONCURRENT,
            Collector.Characteristics.IDENTITY_FINISH
        )
    )

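This may speed things up, but measure it: CopyOnWriteArrayList copies its backing array on every add, which is costly with 10,000,000 elements.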
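
For reference, here is a minimal, self-contained sketch of that collector applied to a small in-memory list. The nested MyClass is a hypothetical stand-in for the question's class, and the names ConcurrentCollectorSketch, concurrentListCollector and convert are made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;
import java.util.stream.Collector;

class ConcurrentCollectorSketch {
    // Hypothetical stand-in for the question's MyClass(String s).
    static final class MyClass {
        final String s;
        MyClass(String s) { this.s = s; }
    }

    // Collector backed by a thread-safe list and declared CONCURRENT.
    static Collector<MyClass, List<MyClass>, List<MyClass>> concurrentListCollector() {
        return Collector.of(
            CopyOnWriteArrayList::new,                              // supplier
            List::add,                                              // accumulator
            (left, right) -> { left.addAll(right); return left; }, // combiner
            Function.<List<MyClass>>identity(),                     // finisher
            Collector.Characteristics.CONCURRENT,
            Collector.Characteristics.IDENTITY_FINISH);
    }

    // Maps every line to a MyClass in a parallel stream.
    static List<MyClass> convert(List<String> lines) {
        return lines.parallelStream()
                    .map(MyClass::new)
                    .collect(concurrentListCollector());
    }

    public static void main(String[] args) {
        System.out.println(convert(Arrays.asList("a", "b", "c")).size());
    }
}
```

Per the javadoc excerpt quoted earlier, a CONCURRENT collector that is not also UNORDERED is only evaluated concurrently on an unordered source. If the order of the resulting list does not matter, calling unordered() on the stream before collect is one way to meet that condition.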

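Now, you have another problem: you don't close your stream.

This is little known, but a Stream (whether of any type or an {Int,Double,Long}Stream) implements AutoCloseable. You want to close streams that are I/O bound, and Files.lines() returns such a stream.

So, try this: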

final List<MyClass> list;

try (
    final Stream<String> lines = Files.lines(...);
) {
    list = lines.parallel().map(MyClass::new)
        .collect(seeAbove);
}
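
Put together, a runnable version of this pattern might look like the sketch below. It makes a few assumptions: MyClass is a hypothetical stand-in for the question's class, Collectors.toList() is used only as a placeholder for whichever collector you settle on, and the temp-file demo exists purely to make the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CloseLinesSketch {
    // Hypothetical stand-in for the question's MyClass(String s).
    static final class MyClass {
        final String s;
        MyClass(String s) { this.s = s; }
    }

    // Reads the file in a try-with-resources block so the underlying
    // file handle is released even if mapping or collecting fails.
    static List<MyClass> load(Path path) {
        try (Stream<String> lines = Files.lines(path)) {
            return lines.parallel()
                        .map(MyClass::new)
                        .collect(Collectors.toList()); // placeholder collector
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo only: writes three lines to a temp file, loads them, cleans up.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("lines", ".txt");
            try {
                Files.write(tmp, Arrays.asList("one", "two", "three"));
                return load(tmp).size();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("loaded " + demo() + " objects");
    }
}
```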