Rolling prediction in a data frame using dplyr and rollapply

My first question :)

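My goal: given a data frame of predictor variables (one predictor per column, one observation per row), fit a regression with lm, then use a rolling window to predict the value for the last observation in each window.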

The data frame looks like this:

> DfPredictor[1:40,]
           Y            X1           X2                 X3            X4               X5
1          3.2860       192.5115    2.1275              83381         11.4360           8.7440
2          3.2650       190.1462    2.0050              88720         11.4359           8.8971
3          3.2213       192.9773    2.0500              74130         11.4623           8.8380
4          3.1991       193.7058    2.1050              73930         11.3366           8.7536
5          3.2224       193.5407    2.0275              80875         11.3534           8.7555
6          3.2000       190.6049    2.0950              86606         11.3290           8.8555
7          3.1939       191.1390    2.0975              91402         11.2960           8.8433
8          3.1971       192.2921    2.2700              88181         11.2930           8.8681
9          3.1873       194.9700    2.3300             115959          1.9477           8.5245
10         3.2182       194.5396    2.4200             134754         11.3200           8.4990
11         3.2409       194.5396    2.2025             136685          1.9649           8.4192
12         3.2112       195.1362    2.1900             136316          1.9750           8.3752
13         3.2231       193.3560    2.2475             140295          1.9691           8.3546
14         3.2015       192.9649    2.2575             139474          1.9500           8.3116
15         3.1744       194.0154    2.1900             146202          1.8476           8.2225
16         3.1646       194.4423    2.2650             142983          1.8600           8.1948
17         3.1708       194.9473    2.2425             141377          1.8522           8.2589
18         3.1675       193.9788    2.2400             141377          1.8600           8.2600
19         3.1744       194.2563    2.3000             149875          1.8718           8.2899
20         3.1410       193.4316    2.2300             129561          1.8480           8.2395
21         3.1266       191.2633    2.2550             122636          1.8440           8.2396
22         3.1486       192.0354    2.3600             130996          1.8570           8.8640
23         3.1282       194.3351    2.4825              92430          1.7849           8.1291
24         3.1214       193.5196    2.4750              94814          1.7624           8.1991
25         3.1230       193.2017    2.3725              87590          1.7660           8.2310
26         3.1182       192.1642    2.4475              87715          1.6955           8.2414
27         3.1203       191.3744    2.3775              89857          1.6539           8.2480
28         3.1156       192.2646    2.3725              92159          1.5976           8.1676
29         3.1270       192.7555    2.3675              97425          1.5896           8.1162
30         3.1154       194.0375    2.3725              87598          1.5277           8.2640
31         3.1104       192.0596    2.3850              93236          1.5132           7.9999
32         3.0846       192.2792    2.2900              94608          1.4990           8.1600
33         3.0569       193.2573    2.3050              84663          1.4715           8.2200
34         3.0893       192.7632    2.2550              67149          1.4955           7.9590
35         3.0991       192.1229    2.3050              75519          1.4280           7.9183
36         3.0879       192.1229    2.3100              76756          1.3839           7.9133
37         3.0965       192.0502    2.2175              61748          1.3130           7.8750
38         3.0655       191.2274    2.2300              41490          1.2823           7.8656
39         3.0636       191.6342    2.1925              51049          1.1492           7.7447
40         3.1097       190.9312    2.2150              21934          1.1626           7.6895

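For example, with a rolling window of width = 10, the regression should be estimated on each window and then used to predict the "Y" corresponding to X1, X2, …, X5.
The predictions should go into a new column 'Ypred'.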

Is there a way to do this with rollapply plus lm/predict, or with mutate?

Many thanks!!

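Best answer: Using the data in the Note at the end, and assuming that within each window of width 10 we want to predict the last Y (i.e. the 10th), then: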

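library(zoo)

# Fit lm on one window (passed in as a matrix) and return the fitted value for its last row
pred <- function(x) tail(fitted(lm(Y ~ ., as.data.frame(x))), 1)
# Slide a right-aligned window of width 10 over the rows; the first 9 rows get NA
transform(DF, pred = rollapplyr(DF, 10, pred, by.column = FALSE, fill = NA))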

giving:

        Y       X1     X2     X3      X4     X5     pred
1  3.2860 192.5115 2.1275  83381 11.4360 8.7440       NA
2  3.2650 190.1462 2.0050  88720 11.4359 8.8971       NA
3  3.2213 192.9773 2.0500  74130 11.4623 8.8380       NA
4  3.1991 193.7058 2.1050  73930 11.3366 8.7536       NA
5  3.2224 193.5407 2.0275  80875 11.3534 8.7555       NA
6  3.2000 190.6049 2.0950  86606 11.3290 8.8555       NA
7  3.1939 191.1390 2.0975  91402 11.2960 8.8433       NA
8  3.1971 192.2921 2.2700  88181 11.2930 8.8681       NA
9  3.1873 194.9700 2.3300 115959  1.9477 8.5245       NA
10 3.2182 194.5396 2.4200 134754 11.3200 8.4990 3.219764
11 3.2409 194.5396 2.2025 136685  1.9649 8.4192 3.241614
12 3.2112 195.1362 2.1900 136316  1.9750 8.3752 3.225423
13 3.2231 193.3560 2.2475 140295  1.9691 8.3546 3.217797
14 3.2015 192.9649 2.2575 139474  1.9500 8.3116 3.205856
15 3.1744 194.0154 2.1900 146202  1.8476 8.2225 3.177928
16 3.1646 194.4423 2.2650 142983  1.8600 8.1948 3.156405
17 3.1708 194.9473 2.2425 141377  1.8522 8.2589 3.176243
18 3.1675 193.9788 2.2400 141377  1.8600 8.2600 3.177165
19 3.1744 194.2563 2.3000 149875  1.8718 8.2899 3.177211
20 3.1410 193.4316 2.2300 129561  1.8480 8.2395 3.145533
21 3.1266 191.2633 2.2550 122636  1.8440 8.2396 3.127410
22 3.1486 192.0354 2.3600 130996  1.8570 8.8640 3.148792
23 3.1282 194.3351 2.4825  92430  1.7849 8.1291 3.124913
24 3.1214 193.5196 2.4750  94814  1.7624 8.1991 3.124992
25 3.1230 193.2017 2.3725  87590  1.7660 8.2310 3.117981
26 3.1182 192.1642 2.4475  87715  1.6955 8.2414 3.117679
27 3.1203 191.3744 2.3775  89857  1.6539 8.2480 3.119898
28 3.1156 192.2646 2.3725  92159  1.5976 8.1676 3.121039
29 3.1270 192.7555 2.3675  97425  1.5896 8.1162 3.123903
30 3.1154 194.0375 2.3725  87598  1.5277 8.2640 3.119438
31 3.1104 192.0596 2.3850  93236  1.5132 7.9999 3.113963
32 3.0846 192.2792 2.2900  94608  1.4990 8.1600 3.101229
33 3.0569 193.2573 2.3050  84663  1.4715 8.2200 3.076817
34 3.0893 192.7632 2.2550  67149  1.4955 7.9590 3.083266
35 3.0991 192.1229 2.3050  75519  1.4280 7.9183 3.089377
36 3.0879 192.1229 2.3100  76756  1.3839 7.9133 3.084225
37 3.0965 192.0502 2.2175  61748  1.3130 7.8750 3.075252
38 3.0655 191.2274 2.2300  41490  1.2823 7.8656 3.063025
39 3.0636 191.6342 2.1925  51049  1.1492 7.7447 3.068808
40 3.1097 190.9312 2.2150  21934  1.1626 7.6895 3.091819
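
Note that the values above are in-sample fitted values: the last row of each window also takes part in the fit. If an out-of-sample prediction is wanted instead, one possible variant is to fit on all rows of the window except the last and call predict() on that last row. The sketch below is only an illustration under that assumption; the helper name pred_oos and the toy data frame are hypothetical and stand in for DF.

```r
library(zoo)

# Hypothetical toy data standing in for DF (a response Y and two predictors)
set.seed(1)
n <- 30
toy <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
toy$Y <- 1 + 2 * toy$X1 - toy$X2 + rnorm(n, sd = 0.1)

# Hypothetical helper: fit on every row of the window except the last,
# then predict Y for the last row from its predictor values only
pred_oos <- function(x) {
  d <- as.data.frame(x)
  k <- nrow(d)
  fit <- lm(Y ~ ., data = d[-k, ])
  unname(predict(fit, newdata = d[k, , drop = FALSE]))
}

# Right-aligned window of width 10: 9 rows to fit, 1 row to predict
toy$Ypred <- as.numeric(
  rollapplyr(toy, 10, pred_oos, by.column = FALSE, fill = NA)
)
head(toy, 12)
```

With width 10 this leaves 9 rows for estimation, so the window must be wide enough that the number of fitted rows exceeds the number of coefficients.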

Note: The input DF in reproducible form is:

Lines <- "           Y            X1           X2                 X3            X4               X5
1          3.2860       192.5115    2.1275              83381         11.4360           8.7440
2          3.2650       190.1462    2.0050              88720         11.4359           8.8971
3          3.2213       192.9773    2.0500              74130         11.4623           8.8380
4          3.1991       193.7058    2.1050              73930         11.3366           8.7536
5          3.2224       193.5407    2.0275              80875         11.3534           8.7555
6          3.2000       190.6049    2.0950              86606         11.3290           8.8555
7          3.1939       191.1390    2.0975              91402         11.2960           8.8433
8          3.1971       192.2921    2.2700              88181         11.2930           8.8681
9          3.1873       194.9700    2.3300             115959          1.9477           8.5245
10         3.2182       194.5396    2.4200             134754         11.3200           8.4990
11         3.2409       194.5396    2.2025             136685          1.9649           8.4192
12         3.2112       195.1362    2.1900             136316          1.9750           8.3752
13         3.2231       193.3560    2.2475             140295          1.9691           8.3546
14         3.2015       192.9649    2.2575             139474          1.9500           8.3116
15         3.1744       194.0154    2.1900             146202          1.8476           8.2225
16         3.1646       194.4423    2.2650             142983          1.8600           8.1948
17         3.1708       194.9473    2.2425             141377          1.8522           8.2589
18         3.1675       193.9788    2.2400             141377          1.8600           8.2600
19         3.1744       194.2563    2.3000             149875          1.8718           8.2899
20         3.1410       193.4316    2.2300             129561          1.8480           8.2395
21         3.1266       191.2633    2.2550             122636          1.8440           8.2396
22         3.1486       192.0354    2.3600             130996          1.8570           8.8640
23         3.1282       194.3351    2.4825              92430          1.7849           8.1291
24         3.1214       193.5196    2.4750              94814          1.7624           8.1991
25         3.1230       193.2017    2.3725              87590          1.7660           8.2310
26         3.1182       192.1642    2.4475              87715          1.6955           8.2414
27         3.1203       191.3744    2.3775              89857          1.6539           8.2480
28         3.1156       192.2646    2.3725              92159          1.5976           8.1676
29         3.1270       192.7555    2.3675              97425          1.5896           8.1162
30         3.1154       194.0375    2.3725              87598          1.5277           8.2640
31         3.1104       192.0596    2.3850              93236          1.5132           7.9999
32         3.0846       192.2792    2.2900              94608          1.4990           8.1600
33         3.0569       193.2573    2.3050              84663          1.4715           8.2200
34         3.0893       192.7632    2.2550              67149          1.4955           7.9590
35         3.0991       192.1229    2.3050              75519          1.4280           7.9183
36         3.0879       192.1229    2.3100              76756          1.3839           7.9133
37         3.0965       192.0502    2.2175              61748          1.3130           7.8750
38         3.0655       191.2274    2.2300              41490          1.2823           7.8656
39         3.0636       191.6342    2.1925              51049          1.1492           7.7447
40         3.1097       190.9312    2.2150              21934          1.1626           7.6895"

DF <- read.table(text = Lines, header = TRUE)
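
Since the question mentions dplyr, the same rolling fit can probably also be written inside mutate(). The following is a minimal sketch, not the answerer's method: it assumes the columns are bound into a numeric matrix with cbind() so that rollapplyr() can slide over rows, and it uses a hypothetical toy data frame in place of DF.

```r
library(dplyr)
library(zoo)

# Hypothetical toy data; with the real data, use DF and list all of Y, X1, ..., X5
set.seed(42)
n <- 25
toy <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
toy$Y <- 0.5 + 1.5 * toy$X1 + 0.7 * toy$X2 + rnorm(n, sd = 0.2)

# In-sample helper as in the answer: last fitted value of the window's regression
pred <- function(x) tail(fitted(lm(Y ~ ., as.data.frame(x))), 1)

# cbind() builds the matrix that rollapplyr() windows over, row by row
res <- toy %>%
  mutate(Ypred = as.numeric(
    rollapplyr(cbind(Y, X1, X2), 10, pred, by.column = FALSE, fill = NA)
  ))
head(res, 12)
```

The first nine entries of Ypred are NA because no full window of width 10 ends at those rows.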