Implementations of String.prototype.repeat in V8 and Chakra

August 30, 2019. Source: flowmemo

Author: @flowmemo
Original URL: http://flowmemo.github.io/2016/03/25/str…

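The left-pad incident has recently caused quite a stir in the JavaScript community. Let's set the incident itself aside for now and look at how left-pad is implemented.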

The source of left-pad is as follows:

module.exports = leftpad;

function leftpad (str, len, ch) {
  str = String(str);
  var i = -1;
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}

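This function takes a string str (or a value that can be converted to a string) and pads it on the left with the character ch until its length reaches len.
Of course, it does not validate its arguments thoroughly, but I won't go into that here. Let's analyze its efficiency:
If n characters need to be padded, string concatenation is executed n times, i.e. O(n).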
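To make the count concrete, here is a small instrumented copy of the loop. The helper name `countConcats` and the returned object shape are made up for this illustration; they are not part of left-pad.

```javascript
// Hypothetical helper: same loop as leftpad, plus a counter for "+" operations.
function countConcats (str, len, ch) {
  str = String(str)
  var i = -1
  var concats = 0
  if (!ch && ch !== 0) ch = ' '
  len = len - str.length
  while (++i < len) {
    str = ch + str // one string concatenation per missing character
    concats++
  }
  return { result: str, concats: concats }
}

var padded = countConcats('abc', 8, '0')
console.log(padded.result, padded.concats)
```

Padding 'abc' to length 8 needs 5 characters, so the loop body runs 5 times.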

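Now let's see how to implement the same thing with ES6's String.prototype.repeat:
Assume str is a string, len is a non-negative integer, and ch is optional (if given, it must be a string of length 1). This is why I prefer strongly typed languages.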

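function leftpadES6 (str, len, ch) {
  ch = ch || ' '
  // Clamp at 0: repeat() throws a RangeError for a negative count,
  // which would happen whenever str is already longer than len
  return ch.repeat(Math.max(0, len - str.length)) + str
}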

Of course we're not done yet. How efficient is this version? For that we need to look at how JS engines implement String.prototype.repeat.

V8

Below is the implementation in V8, the JS engine of Chrome/Chromium. It is written directly in JS.
Source: https://code.google.com/p/chromium/codes…

// ES6, section 21.1.3.13
function StringRepeat(count) {
  CHECK_OBJECT_COERCIBLE(this, "String.prototype.repeat");
  
  var s = TO_STRING(this);
  var n = TO_INTEGER(count);

  if (n < 0 || n === INFINITY) throw MakeRangeError(kInvalidCountValue);

  // Early return to allow an arbitrarily-large repeat of the empty string.
  if (s.length === 0) return "";

  // The maximum string length is stored in a smi, so a longer repeat
  // must result in a range error.
  if (n > %_MaxSmi()) throw MakeRangeError(kInvalidCountValue);

  var r = "";
  while (true) {
    if (n & 1) r += s;
    n >>= 1;
    if (n === 0) return r;
    s += s;
  }
}

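Ignoring the argument checks, let's focus on the algorithm itself. Its core idea is to use the binary representation of the repeat count and let the string concatenate with itself (doubling) to reduce the number of string concatenations. It is very similar to exponentiation by squaring (fast power). For a detailed explanation of the algorithm, see this article: http://www.2ality.com/2014/01/efficient-…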

For example, if count = 6, its binary representation is 110₂ = 4*1 + 2*1 + 1*0. In other words, for any string s,
s.repeat(6) = s.repeat(4) + s.repeat(2)

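Note that each loop iteration performs at most two string concatenations, and the number of iterations is roughly log n, so measured by the number of string concatenations the complexity is O(log n).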
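The bound can be checked with a stripped-down sketch of the same loop. `repeatByDoubling` is a hypothetical name; the argument checks from V8's StringRepeat are left out, and n is assumed to be a non-negative integer small enough for 32-bit bitwise operators.

```javascript
// Hypothetical helper: the doubling loop from StringRepeat without the
// argument checks, plus a counter for "+" operations.
function repeatByDoubling (s, n) {
  var r = ''
  var concats = 0
  while (true) {
    if (n & 1) { // lowest bit set: this power-of-two block is part of the result
      r += s
      concats++
    }
    n >>= 1
    if (n === 0) return { result: r, concats: concats }
    s += s // double the block for the next bit
    concats++
  }
}

var six = repeatByDoubling('ab', 6)
console.log(six.result, six.concats)
```

For count = 6 the loop runs three times and performs 4 concatenations in total (2 doublings plus 2 appends to the result), compared with 6 for the naive loop.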

Firefox's implementation is similar; it can be found here: https://dxr.mozilla.org/mozilla-central/… .

Chakra

OK, now let's look at how Chakra, the JS engine used by Microsoft's Edge browser, implements String.prototype.repeat. It is written in C++.
Source: https://github.com/Microsoft/ChakraCore/…

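Chakra splits the implementation of repeat into two functions. One is JavascriptString::EntryRepeat, which mainly does initialization, argument checking and special-case handling. The core algorithm is in JavascriptString::RepeatCore, shown below: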

JavascriptString* JavascriptString::RepeatCore(JavascriptString* currentString, charcount_t count, ScriptContext* scriptContext)
  {
    Assert(currentString != nullptr);
    Assert(currentString->GetLength() > 0);
    Assert(count > 0);

    const char16* currentRawString = currentString->GetString();
    int currentLength = currentString->GetLength();

    charcount_t finalBufferCount = UInt32Math::Add(UInt32Math::Mul(count, currentLength), 1);
    char16* buffer = RecyclerNewArrayLeaf(scriptContext->GetRecycler(), char16, finalBufferCount);

    if (currentLength == 1)
    {
        wmemset(buffer, currentRawString[0], finalBufferCount - 1);
        buffer[finalBufferCount - 1] = '\0';
    }
    else
    {
        char16* bufferDst = buffer;
        size_t bufferDstSize = finalBufferCount;

        for (charcount_t i = 0; i < count; i += 1)
        {
            js_wmemcpy_s(bufferDst, bufferDstSize, currentRawString, currentLength);
            bufferDst += currentLength;
            bufferDstSize -= currentLength;
        }
        Assert(bufferDstSize == 1);
        *bufferDst = '\0';
    }

    return JavascriptString::NewWithBuffer(buffer, finalBufferCount - 1, scriptContext);
  }

Looks long, doesn't it? Don't be intimidated. We only care about the core algorithm, which is really just this small part:

if (currentLength == 1)
{
    wmemset(buffer, currentRawString[0], finalBufferCount - 1);
    buffer[finalBufferCount - 1] = '\0';
}
else
{
    char16* bufferDst = buffer;
    size_t bufferDstSize = finalBufferCount;

    for (charcount_t i = 0; i < count; i += 1)
    {
        js_wmemcpy_s(bufferDst, bufferDstSize, currentRawString, currentLength);
        bufferDst += currentLength;
        bufferDstSize -= currentLength;
    }
    Assert(bufferDstSize == 1);
    *bufferDst = '\0';
}

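Roughly: if the string has length 1, i.e. it is a single character, wmemset (similar to memset) fills a block of memory with that character; if the length is not 1, the string is copied in one at a time, count times. The details of js_wmemcpy_s can be ignored here.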
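For readers who prefer JS, the same shape can be imitated with a typed array. This is only an analogy, not ChakraCore code: `repeatByCopy` is a made-up name, Uint16Array stands in for the char16 buffer, and no terminating '\0' is needed.

```javascript
// Hypothetical helper mirroring the shape of RepeatCore: one allocation,
// then either a fill (length 1) or `count` block copies.
function repeatByCopy (s, count) {
  var len = s.length
  var buffer = new Uint16Array(len * count) // one UTF-16 code unit per slot
  if (len === 1) {
    buffer.fill(s.charCodeAt(0)) // plays the role of wmemset
  } else {
    var unit = new Uint16Array(len)
    for (var j = 0; j < len; j++) unit[j] = s.charCodeAt(j)
    for (var i = 0; i < count; i++) {
      buffer.set(unit, i * len) // plays the role of js_wmemcpy_s
    }
  }
  // Passing the whole buffer as arguments is fine for small buffers only
  return String.fromCharCode.apply(null, buffer)
}

console.log(repeatByCopy('ab', 3), repeatByCopy('x', 5))
```

Either branch writes every slot of the output buffer exactly once, so the work is proportional to the length of the result.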

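As noted earlier, V8's implementation performs O(log n) string concatenations. However, to repeat a string n times, O(n) memory inevitably has to be written.
update1: As @哦胖茶 pointed out (http://weibo.com/2451315930/DowQCo6wN), V8, ChakraCore and Rhino all implement string concatenation with ropes (trees) under the hood. With a rope, concatenation does not need to allocate memory for the new string and write the contents into it; it merges two binary trees instead. See https://en.wikipedia.org/wiki/Rope_%28da… for details.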
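A toy rope shows why concatenation can be cheap. This is a minimal sketch with made-up names (`concat`, `flatten`); the rope implementations inside real engines are considerably more involved.

```javascript
// Hypothetical minimal rope: a leaf is a plain string, an inner node
// only points at its two children and records the total length.
function concat (left, right) {
  return { left: left, right: right, length: left.length + right.length }
}

// Flattening walks the tree and writes out every character: O(length).
function flatten (rope) {
  var parts = []
  var stack = [rope]
  while (stack.length > 0) {
    var node = stack.pop()
    if (typeof node === 'string') {
      parts.push(node)
    } else {
      stack.push(node.right) // pushed first so the left child is visited first
      stack.push(node.left)
    }
  }
  return parts.join('')
}

var s = 'ab'
var doubled = concat(s, s)          // shares `s`, copies no characters
var quad = concat(doubled, doubled) // shares `doubled`
console.log(quad.length, flatten(quad))
```

Each `concat` call allocates one small node regardless of how long its operands are; the O(n) write only happens when the rope is flattened.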

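Seen this way, and leaving rope flattening aside, V8's combination of rope concatenation and fast power is better than Chakra's purely in terms of algorithmic complexity. Chakra also uses ropes for strings, but its repeat implementation does not use them and simply copies memory. If a string is stored as one contiguous block of memory, repeating it n times inevitably requires O(n) memory writes, so the fast-power optimization buys little.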

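As for whether V8's approach (ropes for string concatenation, fast power in the JS implementation) or Chakra's approach (copying memory directly) is better, I haven't run a benchmark, so I won't draw a conclusion.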

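Finally, although this article focuses on the algorithm, argument checking and boundary handling are also very important and must not be dismissed. You could say that in utility functions, handling edge cases is often a bigger headache than the algorithm…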

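P.S.1 I revised this article after learning that the string implementations in V8 and Chakra use ropes.
P.S.2 I previously got this wrong: in V8 the string "+" operator does not depend on String.prototype.concat. It is actually the opposite: the implementation of String.prototype.concat uses the string "+" operator directly. Source: https://code.google.com/p/chromium/codes…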

    Original author: flowmemo
    Original URL: https://segmentfault.com/a/1190000004697112