To fulfill my irrational desire to record as much running data as possible, I set out to log every day of running last year. The metrics I recorded were basic: time, distance, and weight. Now that a full year has passed, I am trying to figure out what the heck to do with this data. So, in an effort to do a little entertaining analysis, I thought I would provide some stats and charts for everyone to peruse. Let me know if you have any thoughts or ways to better present the data; I am always open to suggestions. I should state now that these are very rough estimates, and I am probably in gross need of brushing up on my statistics. With that said, take it all with a grain of salt and enjoy…the 100 mile prediction is at the bottom!
Basic Metrics:
Total Yearly Miles: 1505.2
Highest Daily Mileage: 51.2
Highest Weekly Mileage: 62
Average Miles per Month: 99.7
Average Pace (min/mile): 8:50
Chart 1:
I think this is a somewhat obligatory chart that just gives a sense of the year in terms of monthly totals. I added the moving average to smooth the data a little and to give a better sense of the average level for a given month based on that month's value and the previous months' values.
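A trailing moving average like the one in the chart can be sketched in a few lines of Python. The three-month window and the sample totals below are assumptions for illustration only, not the values behind Chart 1.

```python
def trailing_moving_average(values, window=3):
    """Average each value with up to `window - 1` values before it."""
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)   # shorter window for the first months
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Placeholder monthly totals in miles (NOT the actual numbers from my log).
monthly_miles = [120.0, 90.0, 60.0, 0.0, 10.0, 20.0]
smoothed = trailing_moving_average(monthly_miles, window=3)
print(smoothed)
```

Because each point only looks backward, a sudden jump in mileage shows up in the smoothed line gradually over the following months.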
As you can tell, beginning in July I really started to get back into running after three months of zero to very low mileage. I ramped up fairly quickly from July on and only backed off in November due to vacation, injury, and the taper before the North Face 50 on December 3rd. I can safely say that nearly half of the December total is due to North Face.
Chart 2:
I find this chart to be much more interesting and thought-provoking than Chart 1. First, the chart seeks to find the relationship between my pace and the distance I run. This is useful for predicting how fast I will run at a particular distance. Obviously, the relationship is non-linear and is in fact more likely logarithmic or maybe even quadratic. Second, this plot provides insight into the model that could be used to forecast race performance.
Getting down to the actual numbers, it is clear that the fit of the model to the data is somewhat lacking given the low R^2, meaning distance explains only a small share of the variance in pace. In its current form, I exclude races from the regression analysis simply because they are efforts outside the norm, but I have included them in the plot. So this plot should be interpreted in terms of training. As such, the interpretation from this simple linear model is that for every additional mile run, the pace slows by .073 minutes per mile. Because Excel reports the pace in decimal minutes, this is literally an increase of 7.3% of a minute, or 60*.073 = 4.38 seconds. Note that in an abstract sense, the constant (8.3324, or 8:20 min/mile) is the pace I would run at a distance of zero miles.
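To make the decimal-minute arithmetic concrete, here is a small Python sketch that evaluates the linear model and converts the result to a min:sec pace. The two coefficients come from the fit described above; the function names are just illustrative.

```python
INTERCEPT = 8.3324  # min/mile, the constant from the linear fit
SLOPE = 0.073       # min/mile added for each mile of distance

def linear_pace(distance_miles):
    """Predicted training pace in decimal minutes per mile."""
    return INTERCEPT + SLOPE * distance_miles

def format_pace(decimal_minutes):
    """Convert decimal minutes (e.g. 8.3324) to an m:ss string."""
    total_seconds = round(decimal_minutes * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"

slope_seconds = SLOPE * 60           # slowdown per mile, in seconds
print(format_pace(linear_pace(0)))   # 8:20
print(format_pace(linear_pace(10)))  # 9:04
```

The same conversion works for any pace the model spits out, which is handy since the spreadsheet output is always in decimal minutes.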
Chart 3:
The purpose of this exercise was to predict my 50 mile pace at North Face using my training data. I employ two models, one linear and the other logarithmic. As is apparent, neither truly captures my race pace. The linear model misses my actual pace by plus 1:42 min/mile (too slow) and the log model by minus 0:21 min/mile (too fast). Again, the fit of these models is somewhat suspect given my naive regression analysis.
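For readers who want to try the same comparison outside of a spreadsheet, here is a rough Python sketch of fitting both a linear and a logarithmic model with ordinary least squares. The training runs below are made-up placeholders, not my log, and the log model is assumed to take the form pace = a + b * ln(distance).

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Placeholder training runs (NOT real entries): miles and pace in min/mile.
distances = [3.0, 5.0, 8.0, 10.0, 13.1, 20.0]
paces = [8.4, 8.6, 9.0, 9.1, 9.4, 9.9]

# Linear model: pace = a + b * distance
a_lin, b_lin = fit_line(distances, paces)
# Log model (assumed form): pace = a + b * ln(distance)
a_log, b_log = fit_line([math.log(d) for d in distances], paces)

race_distance = 50.0  # extrapolating well beyond the longest training run
linear_prediction = a_lin + b_lin * race_distance
log_prediction = a_log + b_log * math.log(race_distance)
print(linear_prediction, log_prediction)
```

With data shaped like this, the linear model keeps slowing at a constant rate while the log model flattens out, so the two predictions drift apart the further the race distance sits beyond the training runs.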
One potential way to improve the fit of the model is to use only race times. However, I only have five races this year. This suggests I could add last year's races as well to increase the amount of data and therefore the statistical power. The only problem with this is the downward trend (faster pace) in race performance over time. One way to adjust for this is to introduce a time variable that would account for changes over time. Take, for example, the Pacifica 30k trail races I ran in 2010 and 2011: one I ran in 3:33:46 and the other in 2:56:19. I tend to believe that a large part of this improvement is due to increased training. However, my current model would not account for this improvement. I plan to explore these improvements at a later date.
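To put a number on the Pacifica example, here is a quick Python sketch of the pace difference between the two finishes. It assumes the course is exactly the nominal 30 km, which a trail race may well not be.

```python
KM_PER_MILE = 1.609344

def to_minutes(hms):
    """Convert an "h:mm:ss" string to decimal minutes."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 60 + minutes + seconds / 60

race_miles = 30 / KM_PER_MILE    # nominal 30k; actual trail distance may differ
slower = to_minutes("3:33:46")
faster = to_minutes("2:56:19")

pace_slower = slower / race_miles
pace_faster = faster / race_miles
improvement_per_mile = pace_slower - pace_faster
improvement_fraction = (slower - faster) / slower
print(improvement_per_mile, improvement_fraction)
```

Under that assumption the gap works out to about two minutes per mile, or roughly 17.5 percent, which is far too large for a model without a time variable to absorb.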
For now, I am looking for any helpful comments, as I would like to develop a predictor for ultra times. I am not sure if there is anything out there yet, but I would like to calibrate something. I imagine the old model of the two mile time trial is not very accurate for anything beyond the 26.2 mile mark. That being said, I think I can develop something if I can get enough data from a variety of individuals or scrape the ultrasignup website. So my first job will be to gather the appropriate data, but first I need to think about what data I need (e.g. race times and distances). So be on your toes, you may be getting an email soon!
Alright, PREDICTION TIME. The two models I have constructed suggest that I will run between a 10:18 and a 15:40 min/mile pace for 100 miles or, to put the range in hours, between 17:10 and 26:06. In a general sense, I consider these the lower and upper bounds given my training. Obviously, race-specific details such as elevation and altitude will change the time, but I think this is a decent range, albeit somewhat on the low end. In the case of Tahoe Rim Trail 100, I know that times in the 17, 18 and 19 hour range are fast, relatively few, and win the race, so I am imagining something more towards the middle to top of the distribution, meaning 22 hours and up. Anyway, only time will tell!
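The conversion from pace to finish time is simple enough to check by hand, but here it is as a short Python sketch. The function name is just illustrative, and the calculation assumes an even pace over exactly 100 miles.

```python
def finish_time(pace, miles):
    """Turn an "m:ss" pace per mile into an "h:mm:ss" finish time."""
    minutes, seconds = (int(part) for part in pace.split(":"))
    total_seconds = round((minutes * 60 + seconds) * miles)
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    return f"{hours}:{mins:02d}:{secs:02d}"

print(finish_time("10:18", 100))  # 17:10:00
print(finish_time("15:40", 100))  # 26:06:40
```

Working in whole seconds avoids the rounding headaches that come with decimal minutes.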
I will be sure to update this as the race gets closer and training progresses.