Posts

Showing posts from November, 2025

final project

Introduction

I looked at whether high-volume shooters in the NBA are actually less efficient than low-volume shooters. Most people assume that players who take a lot of shots end up shooting a lower percentage, but I wanted to test whether there is a real statistical relationship between shot volume and field goal percentage. I picked this topic because I love sports: there are so many different statistics and oddities in it that I could talk about sports forever, and the sports world has only gotten more analytical, so it made sense for this to be my topic.

Data & Methods

The dataset included 1,003 NBA player seasons with stats like field goal attempts (FGA), field goal percentage (FG%), points, rebounds, and more. I used the median number of field goal attempts to split players into high-volume and low-volume shooters, with 501 players in each group.

To analyze the relationship, I ran a two-sample t test comparing FG% between the two groups, a co...
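The median split and two-sample t test described above can be sketched roughly as follows. This is only an illustration on simulated numbers, not the project's dataset: the data frame `players`, its columns `fga` and `fg_pct`, and the seed are all made up for the example.

```r
# Simulated stand-in data (hypothetical, NOT the real NBA player seasons)
set.seed(42)                          # arbitrary seed, only for reproducibility
n <- 200
players <- data.frame(
  fga    = round(runif(n, 50, 1500)), # field goal attempts
  fg_pct = rnorm(n, mean = 0.46, sd = 0.05)  # field goal percentage
)

# Split at the median number of attempts; values equal to the median
# go to the "low" group here, which is one arbitrary way to handle ties
cutoff <- median(players$fga)
players$volume <- factor(ifelse(players$fga > cutoff, "high", "low"))

# Welch two-sample t test of FG% between the two volume groups
tt <- t.test(fg_pct ~ volume, data = players)
group_sizes <- table(players$volume)
print(group_sizes)
print(tt)
```

Because the simulated FG% values are generated independently of shot volume, this only shows the mechanics of the split and the test, not any real result.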

Module 12

The time series plot basically shows that the charges just keep going up. Both years start lower in the beginning and rise consistently toward the end, with 2013 being higher across the board. When I ran the SES (simple exponential smoothing) model, the alpha came out to 0.82, which is pretty high. That means the model is putting a lot more weight on the most recent months, which makes sense because the spending keeps increasing. The error values (RMSE around 5.9, MAE around 4.8) were fine and showed that the model fit the data pretty well. The forecasts leveled out at 62.42, which is what SES does since it doesn’t model trend directly. Basically, SES smooths everything out and gives one flat forecast based on the recent months. If I wanted the model to keep the upward trend going, I’d need Holt’s method, but for this assignment SES works. Overall, the results match the plot: student credit card charges were rising for two straight years, and the smoothing model backs that up.
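To make the flat SES forecast concrete, here is a minimal sketch using base R's `HoltWinters()` with the trend and seasonal parts switched off, which is one way to fit simple exponential smoothing (the assignment may have used a different function). The `charges` values below are made up for illustration and are not the assignment data, and the Holt parameters are arbitrary.

```r
# Hypothetical monthly charges for 2012-2013 (made-up values, NOT the assignment data)
charges <- ts(
  c(30, 32, 33, 36, 38, 41, 43, 46, 48, 51, 53, 56,
    38, 40, 42, 45, 47, 50, 52, 55, 57, 60, 62, 65),
  start = c(2012, 1), frequency = 12
)

# Simple exponential smoothing: no trend (beta) and no seasonality (gamma);
# alpha is estimated by minimizing the squared one-step-ahead errors
ses_fit <- HoltWinters(charges, beta = FALSE, gamma = FALSE)
alpha_hat <- as.numeric(ses_fit$alpha)

# One-step-ahead errors (the fitted values start at the second observation)
res  <- as.numeric(charges)[-1] - as.numeric(fitted(ses_fit)[, "xhat"])
rmse <- sqrt(mean(res^2))
mae  <- mean(abs(res))

# SES forecasts are flat: every horizon repeats the last smoothed level
ses_fc <- predict(ses_fit, n.ahead = 6)

# Holt's method adds a trend term, so its forecasts change by a constant
# step per month; alpha and beta are fixed at arbitrary illustrative values
holt_fit <- HoltWinters(charges, alpha = 0.8, beta = 0.2, gamma = FALSE)
holt_fc  <- predict(holt_fit, n.ahead = 6)

print(c(alpha = alpha_hat, rmse = rmse, mae = mae))
print(ses_fc)
print(holt_fc)
```

Comparing `ses_fc` with `holt_fc` shows the difference described above: the SES forecast is a single repeated value, while the Holt forecast carries the estimated trend forward.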

Module 11 assignment

When I ran the different model formulas, I noticed that they all use the same factors, but depending on how I write the model, R builds the design matrix differently. So a*b gives the main effects and the interaction, a:b gives only the interaction cells, and a + b gives only the main effects. Same variables, but different structure. The main thing I saw is that some of these versions run fine and some end up with singularities. For example, lm(z ~ a*b) worked normally, but lm(z ~ a:b) gave me an NA because the interaction-only model isn’t full rank once the intercept is included. So it isn’t the data that causes the NAs, it’s literally how the model is written. The implication is that the way you write the formula matters: it controls the math in the background, and that decides whether R can estimate all the coefficients or not.

> a <- gl(2,2,8) # same thing as g1(2,2,8)
> b <- gl(2,4,8) # same thing as g1(2,4,8)
> x <- 1:8
> y <- c(1:4,8:5)
> set.seed(...
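The rank argument above can be checked directly by comparing the number of columns in each design matrix with its rank. This is a small sketch using the same two factors; the seed value and the names `X_full`, `X_int`, and `X_add` are arbitrary choices for this example.

```r
a <- gl(2, 2, 8)
b <- gl(2, 4, 8)
set.seed(1)              # arbitrary seed, only so z is reproducible
z <- rnorm(8)

# Design matrices R builds for the three ways of writing the model
X_full <- model.matrix(~ a * b)   # intercept, main effects, interaction
X_int  <- model.matrix(~ a:b)     # intercept plus one column per a-by-b cell
X_add  <- model.matrix(~ a + b)   # intercept and main effects only

# A model is full rank when the rank equals the number of columns
design_check <- data.frame(
  formula = c("a*b", "a:b", "a + b"),
  columns = c(ncol(X_full), ncol(X_int), ncol(X_add)),
  rank    = c(qr(X_full)$rank, qr(X_int)$rank, qr(X_add)$rank)
)
print(design_check)

# The rank-deficient formula is the one that produces an NA coefficient
coef_int <- coef(lm(z ~ a:b))
print(coef_int)
```

For a:b the four cell indicators add up to the intercept column, so there are five columns but only rank four, and lm() reports the coefficient it cannot estimate as NA.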

Module 10

library(ISwR)  # provides the cystfibr data
lm(pemax ~ age + weight + bmp + fev1, data = cystfibr)
anova(lm(pemax ~ age + weight + bmp + fev1, data = cystfibr))

I used the cystfibr dataset from the ISwR package in completing this assignment. I ran a regression with pemax as the response, predicted from age, weight, bmp (body mass as a percentage of normal), and fev1. My results indicated that the overall model was statistically significant. Therefore, at least one of the predictor variables is significantly associated with pemax.

Based upon the coefficients produced in this analysis, I examined age. Since the coefficient for age was negative, pemax is expected to decrease as age increases, while controlling for the effects of the other predictor variables. This indicates that older patients generally exhibit lower maximal expiratory pressure.
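As a rough sketch of how the two checks in this post (overall model significance and the sign of one coefficient) can be pulled out of a fitted model, the example below uses the built-in mtcars data as a stand-in so that it runs without the ISwR package. The model and variables here are not the ones from the assignment.

```r
# Stand-in multiple regression on built-in data (NOT the cystfibr model)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Overall F test: is at least one predictor associated with the response?
fs <- summary(fit)$fstatistic
overall_p <- unname(pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE))

# Sign of one coefficient, holding the other predictor fixed
wt_coef <- unname(coef(fit)["wt"])

print(overall_p)
print(wt_coef)
```

A small `overall_p` only says that some predictor matters; the individual rows of `summary(fit)` are still needed to say which one, and a negative coefficient is read as a decrease in the response per unit increase in that predictor with the others held constant.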