c# – TryParse without actual parsing, or any other alternative for checking text format with a performance benefit

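I am currently creating my own library, named TextCheckerExtension, which basically tries to check a text's format before it is processed further (a short code snippet is shown below).

Now, I know that what I am doing is very similar to Parse or TryParse. The only difference between this and all the Parse methods is that it does not produce any parsed object; it only checks the string.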

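My questions are:

> Both Parse and TryParse produce a parsed object. If we only want to check the validity of the string input, does the overhead of producing the parsed object really affect the method's performance (any example of such a case)? That is, would a self-made checking method that produces no parsed object be faster?
> Is there any built-in alternative in C# to check the validity of various string formats without producing a parsed object?
> Can Regex be used as an alternative?

Any input on this matter would be much appreciated.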

public static bool IsPureHex(string str) {
  return IsPureHex(str, int.MaxValue); //assuming very high value!
}

public static bool IsPureHex(string str, int maxNibble) {
  if (str.Length > maxNibble) //if the length is violated, it is considered failed
    return false;
  for (int i = 0; i < Math.Min(maxNibble, str.Length); i++)
    if (!((char.IsDigit(str, i)) || ((str[i] >= 'A') && (str[i] <= 'F')) || ((str[i] >= 'a') && (str[i] <= 'f'))))
      return false;
  return true;
}

public static bool IsHex(string str) {
  if (str.Length <= 2 || (str[0] != '0') || !((str[1] == 'x') || (str[1] == 'X'))) //Check input validity
    return false;
  for (int i = 2; i < str.Length; i++)
    if (!((char.IsDigit(str, i)) || ((str[i] >= 'A') && (str[i] <= 'F')) || ((str[i] >= 'a') && (str[i] <= 'f'))))
      return false;
  return true;
}

public static bool IsFloat(string str) { //another criterion for float, giving "f" in the last part?
  int dotCounter = 0;
  for (int i = 0; i < str.Length; i++) { //Check if it is float
    if (!(char.IsDigit(str, i)) && (str[i] != '.'))
      return false;
    else if (str[i] == '.')
      ++dotCounter; //Increase the dotCounter whenever dot is found
    if (dotCounter > 1) //If there is more than one dot for whatever reason, return error
      return false;
  }
  return dotCounter == 1 && str.Length > 1;
}

public static bool IsDigitsOnly(string str) {
  foreach (char c in str)
    if (c < '0' || c > '9')
      return false;      
  return str.Length >= 1; //there must be at least one character here to continue
}

public static bool IsInt(string str) { //is not designed to handle null input or empty string
  return str[0] == '-' && str.Length > 1 ? IsDigitsOnly(str.Substring(1)) : IsDigitsOnly(str);
}
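On the third question, a Regex-based check could look like the minimal sketch below. The class name RegexTextChecker and the two patterns are assumptions made for illustration; they mirror IsInt and IsHex above but accept ASCII digits only, and nothing here says how such a checker would perform relative to TryParse.

```csharp
using System.Text.RegularExpressions;

// Hypothetical Regex counterpart of the hand-written checkers (illustration only).
public static class RegexTextChecker {
    // \A and \z anchor to the very start and end of the input, so a trailing
    // newline is not silently accepted (which "$" would allow).
    private static readonly Regex IntPattern =
        new Regex(@"\A-?[0-9]+\z", RegexOptions.Compiled | RegexOptions.CultureInvariant);

    private static readonly Regex HexPattern =
        new Regex(@"\A0[xX][0-9A-Fa-f]+\z", RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Optional leading '-', then one or more ASCII digits.
    public static bool IsInt(string str) {
        return str != null && IntPattern.IsMatch(str);
    }

    // "0x" or "0X" prefix followed by one or more hex digits.
    public static bool IsHex(string str) {
        return str != null && HexPattern.IsMatch(str);
    }
}
```

Like the hand-written versions, these methods only report whether the text matches the format; no number is produced.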

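Best answer: It does make a difference.

To my surprise: as I continued this project out of curiosity, I found that doing the actual parsing versus simply checking whether a string has a certain format does make a significant difference in time performance.

In the experiment below, by creating a checker without a parser, we can obtain a time gain of 33.77% to 58.26% compared with the built-in TryParse. In addition, I also compare my extension with VB.Net IsNumeric (Information.IsNumeric in the Microsoft.VisualBasic assembly).

Below are (1) the code tested, (2) the testing scenarios, (3) the testing code, and (4) the testing results (with comments added in each part where necessary):

The code tested:

This is the code tested; my extension code is named Extension.Checker.Text. So far I have only tested scenarios for the generic integer and float/double (with or without a dot; perhaps better called a fractional number). By generic integer I mean that the maximum and minimum range is not checked (for example, -128 to 127 for an 8-bit signed integer). This code only determines whether the text is an integer as people understand it, without looking at its range. The same goes for float/double.

Judging from this post, whose answer had 400 upvotes at the time this answer was posted, I believe it is safe to assume that int.TryParse is what we would commonly reach for first to test whether a text is an integer (although, for generic integer text, its range is limited to roughly -2e9 to 2e9). Some other posts show the same tendency. Another way we can see in those posts is checking with Visual Basic IsNumeric. Thus, I also included that method in the benchmark.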

public static bool IsFloatOrDoubleByDot(string str) { //another criterion for float, giving "f" in the last part?
  if (string.IsNullOrWhiteSpace(str))
    return false;
  int dotCounter = 0;
  for (int i = str[0] == '-' ? 1 : 0; i < str.Length; i++) { //Check if it is float
    if (!(char.IsDigit(str, i)) && (str[i] != '.'))
      return false;
    else if (str[i] == '.')
      ++dotCounter; //Increase the dotCounter whenever dot is found
    if (dotCounter > 1) //If there is more than one dot for whatever reason, return error
      return false;
  }
  return dotCounter == 0 || (dotCounter == 1 && str.Length > 1);
}

public static bool IsDigitsOnly(string str) {
  foreach (char c in str)
    if (c < '0' || c > '9')
      return false;      
  return str.Length >= 1; //there must be at least one character here to continue
}

public static bool IsInt(string str) { //null, empty, or whitespace-only input returns false
  if (string.IsNullOrWhiteSpace(str))
    return false;
  return str[0] == '-' && str.Length > 1 ? IsDigitsOnly(str.Substring(1)) : IsDigitsOnly(str);
}
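Two edge cases of the tested code are worth keeping in mind: IsFloatOrDoubleByDot returns true for a lone "-" and for "-.", and char.IsDigit also accepts non-ASCII Unicode decimal digits. A stricter variant might look like the sketch below; StrictTextChecker and IsFractionOrInt are hypothetical names, this variant was not part of the benchmark, and its timing is unknown.

```csharp
// Hypothetical stricter variant of IsFloatOrDoubleByDot (not benchmarked).
public static class StrictTextChecker {
    // Accepts an optional leading '-', ASCII digits, and at most one '.'.
    // At least one digit is required, so "-", "." and "-." are rejected.
    public static bool IsFractionOrInt(string str) {
        if (string.IsNullOrEmpty(str))
            return false;
        int digits = 0;
        int dots = 0;
        for (int i = str[0] == '-' ? 1 : 0; i < str.Length; i++) {
            char c = str[i];
            if (c >= '0' && c <= '9') {
                ++digits;              // ASCII digit only, unlike char.IsDigit
            } else if (c == '.') {
                if (++dots > 1)        // a second dot makes the text invalid
                    return false;
            } else {
                return false;          // any other character is invalid
            }
        }
        return digits > 0;
    }
}
```

This sketch still accepts forms such as "5." and ".5"; whether those should count as valid is a design choice the original code leaves open.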

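Testing scenarios:

So far I have tested four different scenarios:

> Integer (within the parsable range of int.TryParse)
> Float text containing a dot (at most 7 digits of precision, within the exactly parsable range of float.TryParse)
> Double text containing a dot (at most 11 digits of precision, within the exactly parsable range of double.TryParse)
> Integer text read as float/double text (within the parsable range of double.TryParse)

For each scenario, I have four cases to test:

> Valid positive-valued text
> Valid negative-valued text
> Invalid positive-valued text
> Invalid negative-valued text

For each case, I measure the time needed for checking with:

> The suitable TryParse
> The suitable Extension.Checker.Text
> Visual Basic IsNumeric
> Other type-specific tricks, such as string.All(char.IsDigit) for integer

Testing code:

To test the scenarios above, I use the following data: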

string intpos = "1342517340";
string intneg = "-1342517340";
string intfalsepos = "134251734u";
string intfalseneg = "-134251734u";
string floatpos = "56.34251";
string floatneg = "-56.34251";
string floatfalsepos = "56.3425h";
string floatfalseneg = "-56.3425h";
string doublepos = "56.342515312";
string doubleneg = "-56.342515312";
string doublefalsepos = "56.34251531y";
string doublefalseneg = "-56.34251531y";
List<string> liststr = new List<string>() {
    intpos, intneg, intfalsepos, intfalseneg,
    floatpos, floatneg, floatfalsepos, floatfalseneg,
    doublepos, doubleneg, doublefalsepos, doublefalseneg
};
List<string> liststrcode = new List<string>() {
    "i+", "i-", "if+", "if-",
    "f+", "f-", "ff+", "ff-",
    "d+", "d-", "df+", "df-"
};
bool parsed = false; //to store checking result
int intval; //for int.TryParse result
float fval; //for float.TryParse result
double dval; //for double.TryParse result

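The text codes follow a fixed format. Examples:

> if+ = integer, false, positive
> f- = float, negative

I use the following testing loop to get the time performance of each method for each case: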

//time snap
for (int i = 0; i < 10000000; ++i) //for integer case
    parsed = int.TryParse(str, out intval); //built-in TryParse
//time snap
//Print the result
//time snap
for (int i = 0; i < 10000000; ++i)
    parsed = Extension.Checker.Text.IsInt(str); //extension Text checker
//time snap
//Print the result
//time snap
for (int i = 0; i < 10000000; ++i)
    parsed = Information.IsNumeric(str); //Microsoft.VisualBasic
//time snap
//Print the result
//time snap
for (int i = 0; i < 10000000; ++i)
    parsed = str[0] == '-' ? str.Substring(1).All(char.IsDigit) : str.All(char.IsDigit); //misc methods
//time snap
//Print the result
//Print the result difference
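The "time snap" steps in the loop above could be implemented with System.Diagnostics.Stopwatch, as in the sketch below. The helper CheckerBenchmark.TimeMs is a hypothetical name; the original answer does not show how its time snaps were taken, and calling through a delegate adds some overhead that the inline loops above do not have.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper (the original time-snap code is not shown in the answer).
public static class CheckerBenchmark {
    // Runs `check` on `text` `iterations` times and returns the elapsed milliseconds.
    // `result` receives the outcome of the last call so it can be printed afterwards.
    public static long TimeMs(Func<string, bool> check, string text, int iterations, out bool result) {
        result = false;
        Stopwatch sw = Stopwatch.StartNew();   // first time snap
        for (int i = 0; i < iterations; ++i)
            result = check(text);
        sw.Stop();                             // second time snap
        return sw.ElapsedMilliseconds;
    }
}
```

A call such as `CheckerBenchmark.TimeMs(s => { int v; return int.TryParse(s, out v); }, "1342517340", 10000000, out parsed)` would then time the TryParse case, and the same helper can be reused for the other three methods.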

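I used my laptop to test each method on each test case, with 10 million iterations each.

Note: my Extension.Checker.Text does not behave exactly like the built-in TryParse. For example, it does not check the numeric range of the string, and some other formats that TryParse accepts may not be accepted by my checker. This is because the main purpose of my Extension.Checker.Text is not to convert the given text into some C# data type, as the built-in TryParse does. That is the whole point of my Extension.Checker.Text. The comparison here is made only to compare, in terms of time-performance benefit, (1) the popular way of checking some text format with (2) the extension method we might write, given that we do not need the result of TryParse but only whether the text has a certain format. The same applies to the comparison with VB IsNumeric.

Testing results:

I print the parsing/checking result to make sure that my extension gives the same result as the built-in TryParse, VB.Net IsNumeric, and the other alternative tricks for the given cases. I also print the original text for ease of reading/checking. Then, from the time snaps taken between tests, I obtain the time performance and time difference of each test case, which I print as well. The time-gain comparison, however, is made only against TryParse. Here are the full results.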

[2016-01-05 06:04:25.466 UTC] Integer:
[2016-01-05 06:04:26.999 UTC] TryParse i+:  1531 ms Result: True    Text: 1342517340
[2016-01-05 06:04:27.639 UTC] Extension i+:     639 ms  Result: True    Text: 1342517340
[2016-01-05 06:04:30.345 UTC] VB.IsNumeric i+:  2705 ms Result: True    Text: 1342517340
[2016-01-05 06:04:31.468 UTC] All is digit i+:  1124 ms Result: True    Text: 1342517340
[2016-01-05 06:04:31.469 UTC] Gain on TryParse i+:  892 ms  Percent: -58.26%
[2016-01-05 06:04:31.469 UTC] 
[2016-01-05 06:04:32.996 UTC] TryParse i-:  1527 ms Result: True    Text: -1342517340
[2016-01-05 06:04:33.846 UTC] Extension i-:     849 ms  Result: True    Text: -1342517340
[2016-01-05 06:04:36.413 UTC] VB.IsNumeric i-:  2566 ms Result: True    Text: -1342517340
[2016-01-05 06:04:37.693 UTC] All is digit i-:  1280 ms Result: True    Text: -1342517340
[2016-01-05 06:04:37.694 UTC] Gain on TryParse i-:  678 ms  Percent: -44.40%
[2016-01-05 06:04:37.694 UTC] 
[2016-01-05 06:04:39.058 UTC] TryParse if+:     1364 ms Result: False   Text: 134251734u
[2016-01-05 06:04:39.845 UTC] Extension if+:    786 ms  Result: False   Text: 134251734u
[2016-01-05 06:04:42.436 UTC] VB.IsNumeric if+:     2590 ms Result: False   Text: 134251734u
[2016-01-05 06:04:43.540 UTC] All is digit if+:     1103 ms Result: False   Text: 134251734u
[2016-01-05 06:04:43.540 UTC] Gain on TryParse if+:     578 ms  Percent: -42.38%
[2016-01-05 06:04:43.540 UTC] 
[2016-01-05 06:04:44.937 UTC] TryParse if-:     1397 ms Result: False   Text: -134251734u
[2016-01-05 06:04:45.745 UTC] Extension if-:    807 ms  Result: False   Text: -134251734u
[2016-01-05 06:04:48.275 UTC] VB.IsNumeric if-:     2530 ms Result: False   Text: -134251734u
[2016-01-05 06:04:49.541 UTC] All is digit if-:     1267 ms Result: False   Text: -134251734u
[2016-01-05 06:04:49.542 UTC] Gain on TryParse if-:     590 ms  Percent: -42.23%
[2016-01-05 06:04:49.542 UTC] 
[2016-01-05 06:04:49.542 UTC] Float by Dot:
[2016-01-05 06:04:51.136 UTC] TryParse f+:  1594 ms Result: True    Text: 56.34251
[2016-01-05 06:04:51.967 UTC] Extension f+:     830 ms  Result: True    Text: 56.34251
[2016-01-05 06:04:54.328 UTC] VB.IsNumeric f+:  2360 ms Result: True    Text: 56.34251
[2016-01-05 06:04:54.329 UTC] Time Gain f+:     764 ms  Percent: -47.93%
[2016-01-05 06:04:54.329 UTC] 
[2016-01-05 06:04:55.962 UTC] TryParse f-:  1634 ms Result: True    Text: -56.34251
[2016-01-05 06:04:56.790 UTC] Extension f-:     827 ms  Result: True    Text: -56.34251
[2016-01-05 06:04:59.102 UTC] VB.IsNumeric f-:  2313 ms Result: True    Text: -56.34251
[2016-01-05 06:04:59.103 UTC] Time Gain f-:     807 ms  Percent: -49.39%
[2016-01-05 06:04:59.103 UTC] 
[2016-01-05 06:05:00.623 UTC] TryParse ff+:     1519 ms Result: False   Text: 56.3425h
[2016-01-05 06:05:01.429 UTC] Extension ff+:    802 ms  Result: False   Text: 56.3425h
[2016-01-05 06:05:03.730 UTC] VB.IsNumeric ff+:     2301 ms Result: False   Text: 56.3425h
[2016-01-05 06:05:03.730 UTC] Time Gain ff+:    717 ms  Percent: -47.20%
[2016-01-05 06:05:03.731 UTC] 
[2016-01-05 06:05:05.312 UTC] TryParse ff-:     1581 ms Result: False   Text: -56.3425h
[2016-01-05 06:05:06.147 UTC] Extension ff-:    835 ms  Result: False   Text: -56.3425h
[2016-01-05 06:05:08.485 UTC] VB.IsNumeric ff-:     2337 ms Result: False   Text: -56.3425h
[2016-01-05 06:05:08.486 UTC] Time Gain ff-:    746 ms  Percent: -47.19%
[2016-01-05 06:05:08.486 UTC] 
[2016-01-05 06:05:08.487 UTC] Double by Dot:
[2016-01-05 06:05:10.341 UTC] TryParse d+:  1854 ms Result: True    Text: 56.342515312
[2016-01-05 06:05:11.492 UTC] Extension d+:     1151 ms Result: True    Text: 56.342515312
[2016-01-05 06:05:14.035 UTC] VB.IsNumeric d+:  2541 ms Result: True    Text: 56.342515312
[2016-01-05 06:05:14.035 UTC] Time Gain d+:     703 ms  Percent: -37.92%
[2016-01-05 06:05:14.036 UTC] 
[2016-01-05 06:05:15.916 UTC] TryParse d-:  1879 ms Result: True    Text: -56.342515312
[2016-01-05 06:05:17.051 UTC] Extension d-:     1133 ms Result: True    Text: -56.342515312
[2016-01-05 06:05:19.542 UTC] VB.IsNumeric d-:  2492 ms Result: True    Text: -56.342515312
[2016-01-05 06:05:19.543 UTC] Time Gain d-:     746 ms  Percent: -39.70%
[2016-01-05 06:05:19.543 UTC] 
[2016-01-05 06:05:21.210 UTC] TryParse df+:     1667 ms Result: False   Text: 56.34251531y
[2016-01-05 06:05:22.315 UTC] Extension df+:    1104 ms Result: False   Text: 56.34251531y
[2016-01-05 06:05:24.797 UTC] VB.IsNumeric df+:     2481 ms Result: False   Text: 56.34251531y
[2016-01-05 06:05:24.798 UTC] Time Gain df+:    563 ms  Percent: -33.77%
[2016-01-05 06:05:24.798 UTC] 
[2016-01-05 06:05:26.509 UTC] TryParse df-:     1711 ms Result: False   Text: -56.34251531y
[2016-01-05 06:05:27.596 UTC] Extension df-:    1086 ms Result: False   Text: -56.34251531y
[2016-01-05 06:05:30.039 UTC] VB.IsNumeric df-:     2442 ms Result: False   Text: -56.34251531y
[2016-01-05 06:05:30.040 UTC] Time Gain df-:    625 ms  Percent: -36.53%
[2016-01-05 06:05:30.041 UTC] 
[2016-01-05 06:05:30.041 UTC] Integer as Double by Dot:
[2016-01-05 06:05:31.794 UTC] TryParse (doubled) i+:    1752 ms Result: True    Text: 1342517340
[2016-01-05 06:05:32.904 UTC] Extension (doubled) i+:   1109 ms Result: True    Text: 1342517340
[2016-01-05 06:05:35.590 UTC] VB.IsNumeric (doubled) d+:    2684 ms Result: True    Text: 1342517340
[2016-01-05 06:05:35.590 UTC] Time Gain d+:     643 ms  Percent: -36.70%
[2016-01-05 06:05:35.591 UTC] 
[2016-01-05 06:05:37.390 UTC] TryParse (doubled) i-:    1799 ms Result: True    Text: -1342517340
[2016-01-05 06:05:38.515 UTC] Extension (doubled) i-:   1125 ms Result: True    Text: -1342517340
[2016-01-05 06:05:41.139 UTC] VB.IsNumeric (doubled) d-:    2623 ms Result: True    Text: -1342517340
[2016-01-05 06:05:41.139 UTC] Time Gain d-:     674 ms  Percent: -37.47%
[2016-01-05 06:05:41.140 UTC] 
[2016-01-05 06:05:42.840 UTC] TryParse (doubled) if+:   1700 ms Result: False   Text: 134251734u
[2016-01-05 06:05:43.933 UTC] Extension (doubled) if+:  1092 ms Result: False   Text: 134251734u
[2016-01-05 06:05:46.575 UTC] VB.IsNumeric (doubled) df+:   2642 ms Result: False   Text: 134251734u
[2016-01-05 06:05:46.576 UTC] Time Gain df+:    608 ms  Percent: -35.76%
[2016-01-05 06:05:46.577 UTC] 
[2016-01-05 06:05:48.328 UTC] TryParse (doubled) if-:   1750 ms Result: False   Text: -134251734u
[2016-01-05 06:05:49.434 UTC] Extension (doubled) if-:  1106 ms Result: False   Text: -134251734u
[2016-01-05 06:05:52.042 UTC] VB.IsNumeric (doubled) df-:   2607 ms Result: False   Text: -134251734u
[2016-01-05 06:05:52.042 UTC] Time Gain df-:    644 ms  Percent: -36.80%
[2016-01-05 06:05:52.043 UTC] 
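
The "Percent" figures in the log are consistent with dividing the time difference by the TryParse time. A minimal sketch of that arithmetic follows; GainCalculator is a hypothetical name, and the log prints the value with a negative sign while this helper returns it as a positive saving.

```csharp
// Hypothetical helper reproducing the "Percent" column of the log above.
public static class GainCalculator {
    // Time saved by the extension checker relative to TryParse, in percent.
    public static double PercentGain(long tryParseMs, long extensionMs) {
        return 100.0 * (tryParseMs - extensionMs) / tryParseMs;
    }
}
```

For the i+ case, PercentGain(1531, 639) is 100 * 892 / 1531, about 58.26; for the df+ case, PercentGain(1667, 1104) is 100 * 563 / 1667, about 33.77.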

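My conclusions from the results so far:

> The best performance gain we can get with the extension method above is when the text is a valid positive integer. The time performance gain is as high as 58.26% for that case. Perhaps this is owing to the simplicity of valid positive integer text.
> The worst performance gain we can get with the extension method above is when the text is an invalid positive double. The time performance gain is only 33.77% for that case.
> For integer and float/double (with or without a dot) text formats, checking whether a text is in one of those formats without actually parsing it can be made faster by building our own text-checker extension than by using the built-in TryParse. In all cases, VB IsNumeric is slower than the rest (which also surprised me, since according to the benchmark in the post, VB looked quite fast, although not the best).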

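Possible uses:

One possible use of this kind of extension check is when you receive a string that you know can be in one of several formats (for example, integer or double), and you want to determine the actual text type first rather than parse it while checking. For such a case, the extension method can speed up the process.

Another use is in the field of computational linguistics, where you often want to know the type of a text without actually parsing it for computational use.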
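
The first use case could look like the sketch below, which classifies a text in a single pass without parsing it. TextKind and TextClassifier are hypothetical names introduced for illustration; the rules follow the checkers above but require at least one ASCII digit.

```csharp
// Hypothetical one-pass classifier for the "which format is this text?" use case.
public enum TextKind { Integer, Fraction, Other }

public static class TextClassifier {
    // Returns Integer for "-?digits", Fraction for digits with exactly one '.',
    // and Other for everything else. No numeric value is produced.
    public static TextKind Classify(string str) {
        if (string.IsNullOrEmpty(str))
            return TextKind.Other;
        int digits = 0;
        int dots = 0;
        for (int i = str[0] == '-' ? 1 : 0; i < str.Length; i++) {
            char c = str[i];
            if (c >= '0' && c <= '9')
                ++digits;
            else if (c == '.')
                ++dots;
            else
                return TextKind.Other;     // unexpected character
        }
        if (digits == 0 || dots > 1)
            return TextKind.Other;         // no digits at all, or too many dots
        return dots == 0 ? TextKind.Integer : TextKind.Fraction;
    }
}
```

The caller can then decide whether int.TryParse or double.TryParse is worth calling at all, based on the returned kind.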
