Commit e5ffdee

Merge pull request #11 from ericqu/FValue
F value
2 parents 320997c + a05e7ba commit e5ffdee

8 files changed

Lines changed: 191 additions & 69 deletions

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 name = "LinearRegressionKit"
 uuid = "e91d531d-6e51-44a8-96b7-a10d5d51daa3"
 authors = ["Eric Quere <13007637+ericqu@users.noreply.github.com>"]
-version = "0.7.6"
+version = "0.7.7"

 [deps]
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"

README.md

Lines changed: 66 additions & 47 deletions
@@ -33,13 +33,16 @@ Model statistics:
 R²: 0.938467 Adjusted R²: 0.935049
 MSE: 1.01417 RMSE: 1.00706
 σ̂²: 1.01417
+F Value: 274.526 with degrees of freedom 1 and 18, Pr > F (p-value): 2.41337e-12
 Confidence interval: 95%

 Coefficients statistics:
-Terms ╲ Stats │ Coefs Std err t Pr(>|t|) low ci high ci
-──────────────┼─────────────────────────────────────────────────────────────────────────────
-(Intercept) │ -2.44811 0.819131 -2.98867 0.007877 -4.16904 -0.727184
-x │ 27.6201 1.66699 16.5688 2.41337e-12 24.1179 31.1223
+Terms ╲ Stats │ Coefs Std err t Pr(>|t|) code low ci high ci
+──────────────┼──────────────────────────────────────────────────────────────────────────────────────────
+(Intercept) │ -2.44811 0.819131 -2.98867 0.007877 ** -4.16904 -0.727184
+x │ 27.6201 1.66699 16.5688 2.41337e-12 *** 24.1179 31.1223
+
+Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 ```

 # Contrasts with Julia Stats GLM package
@@ -78,6 +81,7 @@ Ridge Regression (potentially with analytical weights) is implemented in the Lin
 - Type 1 & 2 Sum of squares
 - Squared partial correlation coefficient, squared semi-partial correlation coefficient.
 - PRESS as the sum of square of predicted residuals errors
+- F Value (SAS naming) F Statistic (R naming) is presented with its p-value

 ## List of Statistics about the predicted values:
 - The predicted values
@@ -106,6 +110,7 @@ Please post your questions, feedabck or issues in the Issues tabs. As much as po
 - http://hua-zhou.github.io/teaching/biostatm280-2019spring/slides/12-sweep/sweep.html
 - https://github.com/mcreel/Econometrics for the Newey-West implementation
 - https://blogs.sas.com/content/iml/2013/03/20/compute-ridge-regression.html
+- Code from StatsModels https://github.com/JuliaStats/StatsModels.jl/blob/master/test/extension.jl (in December 2021)

 # Examples

@@ -135,24 +140,27 @@ lr
 Model definition: y ~ 1 + x
 Used observations: 101
 Model statistics:
-R²: 0.750957 Adjusted R²: 0.748441
-MSE: 5693.68 RMSE: 75.4565
-σ̂²: 5693.68 AIC: 875.338
+R²: 0.758985 Adjusted R²: 0.75655
+MSE: 5660.28 RMSE: 75.2348
+σ̂²: 5660.28 AIC: 874.744
+F Value: 311.762 with degrees of freedom 1 and 99, Pr > F (p-value): 2.35916e-32
 Confidence interval: 95%

 Coefficients statistics:
-Terms ╲ Stats │ Coefs Std err t Pr(>|t|) low ci high ci VIF
-──────────────┼──────────────────────────────────────────────────────────────────────────────────────────
-(Intercept) │ -24.5318 10.7732 -2.27711 0.0249316 -45.9082 -3.15535 0.0
-x │ 44.4953 2.57529 17.2778 1.20063e-31 39.3854 49.6052 1.0
+Terms ╲ Stats │ Coefs Std err t Pr(>|t|) code low ci high ci VIF
+──────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
+(Intercept) │ -26.6547 10.7416 -2.48145 0.0147695 * -47.9683 -5.34109 0.0
+x │ 45.3378 2.56773 17.6568 2.35916e-32 *** 40.2429 50.4327 1.0
+
+Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 ```
-This is pretty good, so let's further review some diagnostic plots.
+This is okay, so let's further review some diagnostic plots.

 ```julia
 [[ps["fit"] ps["residuals"]]
 [ps["histogram density"] ps["qq plot"]]]
 ```
-![Overview Plots](https://github.com/ericqu/LinearRegressionKit.jl/raw/main/assets/asset_exe_072_01.svg "Overview Plots")
+![Illustrative Overview Plots](https://github.com/ericqu/LinearRegressionKit.jl/raw/main/assets/asset_exe_072_01.svg "Illustrative Overview Plots")

 Please note that for the fit plot, the orange line shows the regression line, in dark grey the confidence interval for the mean, and in light grey the interval for the individuals predictions.

@@ -168,18 +176,21 @@ Giving:
 Model definition: y ~ 1 + :(x ^ 3)
 Used observations: 101
 Model statistics:
-R²: 0.979585 Adjusted R²: 0.979379
-MSE: 466.724 RMSE: 21.6038
-σ̂²: 466.724 AIC: 622.699
+R²: 0.984023 Adjusted R²: 0.983861
+MSE: 375.233 RMSE: 19.3709
+σ̂²: 375.233 AIC: 600.662
+F Value: 6097.23 with degrees of freedom 1 and 99, Pr > F (p-value): 9.55196e-91
 Confidence interval: 95%

 Coefficients statistics:
-Terms ╲ Stats │ Coefs Std err t Pr(>|t|) low ci high ci VIF
-──────────────┼──────────────────────────────────────────────────────────────────────────────────────────
-(Intercept) │ 1.23626 2.65774 0.465157 0.642841 -4.03726 6.50979 0.0
-x ^ 3 │ 1.04075 0.0151001 68.9236 1.77641e-85 1.01079 1.07071 1.0
+Terms ╲ Stats │ Coefs Std err t Pr(>|t|) code low ci high ci VIF
+──────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
+(Intercept) │ -0.0637235 2.38304 -0.0267404 0.978721 -4.7922 4.66475 0.0
+x ^ 3 │ 1.05722 0.0135394 78.0847 9.55196e-91 *** 1.03036 1.08409 1.0
+
+Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 ```
-![Overview Plots](https://github.com/ericqu/LinearRegressionKit.jl/raw/main/assets/asset_exe_072_02.svg "Overview Plots")
+![Illustrative Overview Plots](https://github.com/ericqu/LinearRegressionKit.jl/raw/main/assets/asset_exe_072_02.svg "Illustrative Overview Plots")

 Further, in addition to the diagnostic plots helping confirm if the residuals are normally distributed, a few tests can be requested:

@@ -198,27 +209,30 @@ Giving:
 Model definition: y ~ 1 + :(x ^ 3)
 Used observations: 10001
 Model statistics:
-R²: 0.997951 Adjusted R²: 0.997951
-MSE: 43.4392 RMSE: 6.59084
-σ̂²: 43.4392 AIC: 37719.4
+R²: 0.99795 Adjusted R²: 0.99795
+MSE: 43.4904 RMSE: 6.59472
+σ̂²: 43.4904 AIC: 37731.2
+F Value: 4.868e+06 with degrees of freedom 1 and 9999, Pr > F (p-value): 0
 Confidence interval: 95%

 Coefficients statistics:
-Terms ╲ Stats │ Coefs Std err t Pr(>|t|) low ci high ci VIF
-──────────────┼──────────────────────────────────────────────────────────────────────────────────────────
-(Intercept) │ 11.3151 0.0815719 138.714 0.0 11.1552 11.475 0.0
-x ^ 3 │ 1.03984 0.000471181 2206.87 0.0 1.03892 1.04076 1.0
+Terms ╲ Stats │ Coefs Std err t Pr(>|t|) code low ci high ci VIF
+──────────────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
+(Intercept) │ 11.3419 0.0816199 138.96 0.0 *** 11.1819 11.5019 0.0
+x ^ 3 │ 1.04021 0.000471459 2206.35 0.0 *** 1.03928 1.04113 1.0
+
+Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Diagnostic Tests:

 Kolmogorov-Smirnov test (Normality of residuals):
-KS statistic: 3.47709 observations: 10001 p-value: 0.0
+KS statistic: 3.05591 observations: 10001 p-value: 0.0
 with 95.0% confidence: reject null hyposthesis.
 Anderson–Darling test (Normality of residuals):
-A² statistic: 24.924901 observations: 10001 p-value: 0.0
+A² statistic: 25.508958 observations: 10001 p-value: 0.0
 with 95.0% confidence: reject null hyposthesis.
 Jarque-Bera test (Normality of residuals):
-JB statistic: 241.764504 observations: 10001 p-value: 0.0
+JB statistic: 240.520153 observations: 10001 p-value: 0.0
 with 95.0% confidence: reject null hyposthesis.
 ```

@@ -230,28 +244,33 @@ lr = regress(@formula(y ~ 1 + x^3 ), vdf, cov=["white", "nw"])
 Giving:
 ```
 Model definition: y ~ 1 + :(x ^ 3)
-Used observations: 101
+Used observations: 10001
 Model statistics:
-R²: 0.979585 Adjusted R²: 0.979379
-MSE: 466.724 RMSE: 21.6038
+R²: 0.99795 Adjusted R²: 0.99795
+MSE: 43.4904 RMSE: 6.59472
+PRESS: 435034
+F Value: 4.868e+06 with degrees of freedom 1 and 9999, Pr > F (p-value): 0
 Confidence interval: 95%

-White's covariance estimator (HC3):
-Terms ╲ Stats │ Coefs Std err t Pr(>|t|) low ci high ci
-──────────────┼─────────────────────────────────────────────────────────────────────────────
-(Intercept) │ 1.23626 2.66559 0.463785 0.64382 -4.05285 6.52538
-x ^ 3 │ 1.04075 0.0145322 71.6169 4.30034e-87 1.01192 1.06959
+White's covariance estimator (HC0):
+Terms ╲ Stats │ Coefs Std err t Pr(>|t|) code low ci high ci
+──────────────┼──────────────────────────────────────────────────────────────────────────────────────────
+(Intercept) │ 11.3419 0.0828903 136.83 0.0 *** 11.1794 11.5044
+x ^ 3 │ 1.04021 0.000471604 2205.67 0.0 *** 1.03928 1.04113
+
+Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Newey-West's covariance estimator:
-Terms ╲ Stats │ Coefs Std err t Pr(>|t|) low ci high ci
-──────────────┼─────────────────────────────────────────────────────────────────────────────
-(Intercept) │ 1.23626 2.4218 0.510472 0.610857 -3.56912 6.04165
-x ^ 3 │ 1.04075 0.0129463 80.3897 5.60424e-92 1.01506 1.06644
+Terms ╲ Stats │ Coefs Std err t Pr(>|t|) code low ci high ci
+──────────────┼──────────────────────────────────────────────────────────────────────────────────────────
+(Intercept) │ 11.3419 0.158717 71.46 0.0 *** 11.0308 11.653
+x ^ 3 │ 1.04021 0.000863819 1204.19 0.0 *** 1.03851 1.0419
+
+Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 ```

 Finally if you would like more examples I encourage you to go to the documentation as it gives a few more examples.

-## Notable changes since version 0.74
-- The Sweep operator algorithm has been modified to work with column major. This should gives a performance boost.
-- The ```sweep_linreg``` function is now exported if one would like to do the linear regression with alreadz prepared design matrix. Although this gives back only the coefficients from the regression.
-- fix the White and Breusch-Pagan test description.
+## Notable changes since version 0.76
+- Added the F Value (F Statistics) as a default statistic computed when a model is fitted.
+- Significance codes similar to R (lm) are also displayed when p_values are requested (which they are by default).
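
The F Value introduced by this change can be cross-checked by hand against numbers already printed in the first README example. The following is a minimal sketch in plain Julia, not the package's implementation: it assumes the standard OLS identities (F expressed through R², and F equal to the squared slope t statistic when the model has a single regressor). All inputs are copied from the printed output, and the variable names are illustrative only.

```julia
# Inputs copied from the first README example output (rounded as printed).
r2       = 0.938467   # R²
df_model = 1          # numerator degrees of freedom reported with the F Value
df_resid = 18         # denominator degrees of freedom reported with the F Value
t_slope  = 16.5688    # t statistic reported for the coefficient of x

# F statistic expressed through R² (standard OLS identity, assumed here).
f_from_r2 = (r2 / df_model) / ((1 - r2) / df_resid)

# With a single regressor, F equals the squared t statistic of the slope.
f_from_t = t_slope^2

println("F from R²: ", f_from_r2)
println("F from t²: ", f_from_t)
```

Both values agree with the reported F Value of 274.526 up to the rounding of the printed inputs, and the slope's p-value (2.41337e-12) matches the reported Pr > F, as expected for a one-regressor model.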
