Week |
Date |
Topic/Materials/Tasks |
1 |
Mon Jan 11 |
- Introduction, syllabus, etc.
- Get set up on the
Sagemath Cloud:
- Use Firefox (or Chrome), not Internet Explorer.
- Sign up for a free account using your real name
and your University email address (@ohio.edu). Sign
in.
- Click on "Help" in the upper left and read about it.
- Look for a project in your account titled with your
name. I created this and shared it with you. Your
submissions as an individual, such as your
biography, go here.
- Look for a project "StatisticalComputing". I
will put things here for the whole class to use.
- Do your autobiography
- Familiarize yourself with the markdown language documentation and extensions.
- Familiarize yourself with the html
language documentation.
- Familiarize yourself with the \(\LaTeX\)
Wikibook.
- Upload the autobiography file to your project. Edit it to make your autobiography.
|
Wed Jan 13 |
- Find your partner for this week's tasks and journal
(see StatisticalComputing/partners.sagews), and sit next
to them.
- One of you create a new project using the naming
convention "Week Firstname and Firstname". For
example, if your name is Hillary and your partner is
Donald, then name it "1 Hillary and Donald".
- Hit the "Settings" button, look under "Collaborators",
search for the other person by email, and add them
as a collaborator. Search for me as
mohlenka@ohio.edu and add me as a collaborator.
- Hit the "New" button, pick a name (such as
"1Journal"), and hit the "SageMath Worksheet"
button. This will create the file for your journal
this week. You can both edit it simultaneously. (I
can access this file to grade it so you do not have
to send it to me.)
- Use markdown
(or html)
to put in your names, a title, the course and the
week.
- Introduction to R:
- Read What is R?
- Skim the FAQs
- Notice the manuals
and familiarize yourself with An
Introduction to R. You will refer to these
manuals a lot.
- Become familiar with the all-important help()
and help.search()
functions. Try
help(Syntax) ,
help(Arithmetic) ,
help(Comparison) ,
help(Extract) , and
help(Control) .
- In your journal, write a list of 10 interesting
things you learned about R and link to where you
learned them.
- Upload the sage worksheet
basics.sagews. Run
each cell of code and observe/guess what it is
doing.
- In your journal, run 10 different R commands in
10 different cells. Briefly explain what each one
does.
|
Thu Jan 14 8am autobiography (counts as a
journal) due. |
Fri Jan 15 |
- Plotting warm up:
- In your journal:
- Plot the data
x<-c(1,2,4,5,9)
y<-c(0,-1,3,3,1)
using each of the 9 options for type. Use layout()
to show in a 3x3 grid.
- Pick your favorite type (not "n") and plot with
a title, subtitle, x-label, y-label, and larger x
and y limits.
- Read help(plotmath). Repeat the above plot, now with
some of the labels mathematical expressions; use at
least some powers and greek letters.
- Repeat the above plot, making it colorful. Use
abline() to add a thick green line \(y=0.5x-1\).
- Read about table(), pie(), barplot(), and
rainbow(). Make a pie chart and a barplot of the
frequencies of the y values colored by rainbow; use
layout() to show them side by side.
- Use boxplot() to make a boxplot of x and
y. Title and label it.
- MATH 5530 Exploration:
- Download an article from Computational
Statistics & Data Analysis or Statistics
and Computing published in 2015 or 2016. (You
need to be on-campus or use the proxy server through
the library to
download.) Upload the paper to your individual
project.
- Read the abstract and introduction and skim the
rest of the paper.
- In a sage worksheet titled
"1Exploration", include:
- The full bibliographic information on the
article and the name of the pdf file.
- A one-paragraph (about 10 sentence) summary
in your own words of the topic of the paper.
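
For the plotting warm up above, the 3x3 grid can be sketched roughly as follows. This is only a starting point: the nine type codes are the ones listed in help(plot), and the titles are illustrative choices, not part of the assignment.

```r
# Data from the task
x <- c(1, 2, 4, 5, 9)
y <- c(0, -1, 3, 3, 1)

# The nine values of 'type' documented in help(plot)
types <- c("p", "l", "b", "c", "o", "h", "s", "S", "n")

# Fill a 3x3 grid row by row, one plot per type
layout(matrix(1:9, 3, 3, byrow = TRUE))
for (ty in types) {
  plot(x, y, type = ty, main = paste0("type = \"", ty, "\""))
}
layout(1)  # restore a single plotting region
```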
|
2 |
Mon Jan 18 | Martin Luther King, Jr. Day
holiday |
Wed Jan 20 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- Read about head(), summary(), mean(), var(),
paste(), and print().
- Common discrete probability distributions:
- Read about the R
functions for the binomial,
geometric,
hypergeometric,
poisson,
and negative
binomial distributions.
- For each of the five above distributions:
- Choose some parameters (not trivial like 0 or 1).
- Use the d....() function to generate the probability
distribution function and plot it.
- Use the r....() function to generate 1000
data points. Use table() and lines() (or
points()) and appropriate scaling to plot it on
the same graph as the distribution function, so
that they approximately match. Remember to title
your graph.
- Use mean() and var() to check the mean and
variance of the data; compare to the theoretical
values. (Note that the Wikipedia and R
definitions sometimes differ, such as switching
successes and failures.)
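
One possible shape for the comparison described above, shown for the binomial distribution only. The parameters (size 10, probability 0.3) and the seed are arbitrary assumptions made for illustration; the other four distributions follow the same pattern with their own d....() and r....() functions.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
size <- 10; prob <- 0.3          # illustrative parameters
k <- 0:size

# Theoretical probabilities
pmf <- dbinom(k, size = size, prob = prob)

# 1000 simulated values, tabulated over all possible outcomes and
# scaled to relative frequencies so they are comparable to pmf
samples <- rbinom(1000, size = size, prob = prob)
freq <- as.numeric(table(factor(samples, levels = k))) / length(samples)

plot(k, pmf, type = "h", lwd = 2, ylim = c(0, max(pmf, freq)),
     main = "Binomial(10, 0.3): theory and simulation",
     xlab = "k", ylab = "probability")
points(k, freq, col = "red", pch = 19)

# Sample mean and variance next to the theoretical values
rbind(sample = c(mean(samples), var(samples)),
      theory = c(size * prob, size * prob * (1 - prob)))
```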
|
Thu Jan 21 8am journal for Jan 13-15
due. 5530 exploration from Jan 15 due. |
Fri Jan 22 |
(drop deadline)
- From the StatisticalComputing project copy
ratings.sagews to your individual project. In it rate
your partner on the journal due yesterday.
- Read about density().
- Common continuous probability densities:
- Read about the R
functions for the uniform,
normal,
gamma,
and beta
distributions.
- For each of the four above densities:
- Choose some parameters (not trivial like 0 or 1).
- Use the d....() function to generate the probability
density function and plot it.
- Use the r....() function to generate 1000
data points. Use density() and lines() to plot
the estimated density on the same graph as the density
function. Remember to title your graph.
- Use mean() and var() to check the mean and variance
of the data; compare to the theoretical values.
- MATH 5530 Exploration:
- Repeat the exploration from Jan 15 with a new article.
- Identify one concept or method that you are not
familiar with. Research it (usually Wikipedia is
sufficient) and write a paragraph explaining
it. Cite and link to your sources.
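
A minimal sketch of the continuous-density comparison above, using the gamma density as the example. The parameters (shape 3, rate 2), the plotting range, and the seed are assumptions chosen only for illustration.

```r
set.seed(2)                       # arbitrary seed
shape <- 3; rate <- 2             # illustrative parameters

samples <- rgamma(1000, shape = shape, rate = rate)

# Theoretical density, then the kernel density estimate of the data
curve(dgamma(x, shape = shape, rate = rate), from = 0, to = 6,
      main = "Gamma(shape = 3, rate = 2): theory and simulation",
      xlab = "x", ylab = "density")
lines(density(samples), col = "red")

# Theoretical mean is shape/rate and variance is shape/rate^2
rbind(sample = c(mean(samples), var(samples)),
      theory = c(shape / rate, shape / rate^2))
```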
|
3 |
Mon Jan 25 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- Read the Good Problems handout on
Flow. Starting with this journal, be sure to use
complete sentences and paragraphs and have text to bind
your journal together.
- Making functions:
- Read about
writing your own functions,
return(), if(), sapply(), length(), numeric(), and while().
- Implement the function
\(f(x)=\left\{\begin{array}{ll} 1-|x| & -1\le x \le 1\\ 0 & \text{otherwise}\end{array}\right.\) and plot it on \([-2,2]\).
- Find the explicit formula for the function
\(F(x)=\int_{-\infty}^x f(t)dt\), implement it,
and plot it on \([-2,2]\).
- Find the explicit formula for the function
\(F^{-1}(y)\), implement it, and plot it on an
appropriate interval.
- Custom densities:
- Read about Inverse
transform sampling. Summarize the method
in your own words. Use this method to generate
samples from the probability density function \(f(x)\)
above. Show that it worked.
- Read about Rejection
sampling. Summarize the method in your
own words. Use this method to generate samples from
the probability density function \(f(x)\) above.
Show that it worked.
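
The two sampling methods above can be sketched as follows for the triangle density \(f\). This sketch uses vectorized pmax() and ifelse() rather than the if()/sapply() route suggested in the reading list, and it assumes a uniform proposal on \([-1,1]\) for rejection sampling; both are choices made here for brevity, and the function names are hypothetical.

```r
# Triangle density, its CDF, and the inverse CDF
f <- function(x) pmax(0, 1 - abs(x))
Fcdf <- function(x) {
  ifelse(x < -1, 0,
  ifelse(x <= 0, (1 + x)^2 / 2,
  ifelse(x <= 1, 1 - (1 - x)^2 / 2, 1)))
}
Finv <- function(y) ifelse(y <= 0.5, sqrt(2 * y) - 1, 1 - sqrt(2 * (1 - y)))

set.seed(3)   # arbitrary seed
n <- 10000

# Inverse transform sampling: push uniform(0,1) values through Finv
inv_samples <- Finv(runif(n))

# Rejection sampling: propose x uniformly on [-1,1] and accept it
# with probability f(x) (f is bounded above by 1)
rejection_sample <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, -1, 1)
    u <- runif(n)
    out <- c(out, x[u <= f(x)])
  }
  out[1:n]
}
rej_samples <- rejection_sample(n)

# Compare both sample densities with f
curve(f(x), from = -2, to = 2, main = "Sampling from f", ylab = "density")
lines(density(inv_samples), col = "red")
lines(density(rej_samples), col = "blue")
```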
|
Tue Jan 26 8am journal for Jan 20-22 due. Rate your partner. |
Wed Jan 27 |
- Follow these
instructions to install the R package "mcmc". (Let
me know if it fails.)
- Markov Chain Monte Carlo methods:
- Read about Markov
Chain Monte Carlo methods. Read about it again,
this time slower and more carefully. Summarize in
your own words.
- Read about the Metropolis-Hastings
algorithm. Summarize in your own words.
- Do
library(mcmc) to load the mcmc
package. Do help(metrop) to read about
its metrop() function. (If library() fails, you may
need to use its lib.loc option.)
- Let
\(f(x)=\left\{\begin{array}{ll} 1-|x| & -1\le x \le
1\\ 0 & \text{otherwise}\end{array}\right.\). Use
metrop() to produce samples from the distribution
\(f\). Plot the samples produced versus their index
to see how the Markov Chain moves around. Plot the
resulting density to see how well the sampling
worked.
- Let \(g_A(x) = \frac{f(x) + f(x-A)}{2}\). Use
metrop() to produce samples from the distribution
\(g_A\) for \(A=1,3,5\). For each \(A\) plot the
samples versus their index and the resulting
densities. Explore the options to metrop() to make
the sampling and resulting densities better.
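
To see what a Metropolis sampler does internally, here is a hand-written random-walk Metropolis sketch in base R for \(g_A\) with \(A=3\). It is a stand-in for illustration, not the metrop() function from the mcmc package; the function name rw_metropolis, the proposal scale, the chain length, and the seed are all assumptions.

```r
f <- function(x) pmax(0, 1 - abs(x))
g <- function(x, A) (f(x) + f(x - A)) / 2

# Random-walk Metropolis with a normal proposal of standard deviation 'scale'.
# 'target' may be unnormalized; 'initial' must have positive target density.
rw_metropolis <- function(target, initial, n, scale) {
  chain <- numeric(n)
  current <- initial
  for (i in seq_len(n)) {
    proposal <- current + rnorm(1, sd = scale)
    # Accept with probability min(1, target(proposal) / target(current))
    if (runif(1) < target(proposal) / target(current)) current <- proposal
    chain[i] <- current
  }
  chain
}

set.seed(4)   # arbitrary seed
chain <- rw_metropolis(function(x) g(x, A = 3), initial = 0,
                       n = 20000, scale = 2)

layout(matrix(1:2, 1, 2))
plot(chain, type = "l", main = "Chain versus index", xlab = "index")
plot(density(chain), main = "Sampled density for A = 3")
curve(g(x, A = 3), add = TRUE, col = "red")
layout(1)
```

A small scale makes the chain cross the gap between the two triangles only rarely, which is one of the effects the exploration of options is meant to reveal.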
|
Thu Jan 28 8am 5530 exploration from Jan 22 due. |
Fri Jan 29 |
- Catch up.
- MATH 5530 Exploration:
- Repeat the exploration from Jan 22 with a new article.
- Identify the specific claims made in the paper
(such as that their method is better than existing
methods for certain problems).
|
4 |
Mon Feb 1 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- Read about read.table(), qqplot(), qqnorm().
- Maximum Likelihood Estimators:
- Read about Maximum Likelihood Estimators. Summarize in your own words.
- Load the
stats4 package using
library and read about its
mle function.
- Get the continuous data set unknowncontinuous.dat.
Through plots or measurements, determine which
continuous probability density (one of uniform,
normal, gamma, beta) was used to generate it.
- Use mle() to compute the
maximum likelihood estimate of the
parameter(s). Plot the density function using these
parameters along with the density from the data to
see how well it worked.
- Get the discrete data set unknowndiscrete.dat.
Through plots or measurements, determine which
discrete probability distribution (one of binomial,
geometric, poisson, negative
binomial) was used to generate it.
- Use mle() to compute the maximum likelihood
estimate of the continuous parameters. If the
distribution type you selected has discrete
parameters (like n), then fix them using the 'fixed'
option and manually try a few values that seem
reasonable from the plots; which value is most
likely to be true? Plot the distribution function
using the parameters you determined and the
normalized table from the data to see how well it
worked.
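
A rough sketch of calling mle() from stats4, run on synthetic gamma data because the course data files are not reproduced here. The variable obs, the true parameters, and the seed are assumptions. The sketch optimizes over the logarithms of the parameters so that the optimizer never tries negative values; that reparametrization is a choice made here, not something mle() requires.

```r
library(stats4)

set.seed(5)                               # arbitrary seed
obs <- rgamma(500, shape = 3, rate = 2)   # synthetic stand-in data

# Negative log-likelihood in terms of log(shape) and log(rate)
nll <- function(lshape = 0, lrate = 0) {
  -sum(dgamma(obs, shape = exp(lshape), rate = exp(lrate), log = TRUE))
}

fit <- mle(nll, start = list(lshape = 0, lrate = 0))
est <- exp(coef(fit))   # back-transform to shape and rate
est

# Fitted density against the kernel density estimate of the data
plot(density(obs), main = "Gamma fit by maximum likelihood")
curve(dgamma(x, shape = est[["lshape"]], rate = est[["lrate"]]),
      add = TRUE, col = "red")
```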
|
Tue Feb 2 8am journal for Jan 25-29 due. Rate your partner. |
Wed Feb 3 |
- Read about
sample and replicate .
- Generate 10000 samples from the (Gaussian mixture)
density 0.3*normal(0,1)+0.7*normal(5,2), meaning
that a sample has 0.3 probability of coming from
normal(0,1) and 0.7 probability of coming from
normal(5,2). Plot the density to make sure it looks correct.
- Expectation Maximization:
- Read about the EM
algorithm and especially its use for Gaussian
mixtures. Summarize the method in your own
words.
- Install the package mclust, load it, and read
about its
em function.
- Forget that you know the parameters 0.3, 0, 1,
0.7, 5, and 2, and use
em to try to recover them
from the data you generated above.
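
One way to generate the mixture data described above is sketched below. It assumes that the second argument of normal(5,2) is a standard deviation, matching the convention of rnorm(); if a variance was intended, use sqrt(2) instead. The seed and variable names are arbitrary.

```r
set.seed(6)   # arbitrary seed
n <- 10000

# For each sample, decide which component it comes from
from_first <- sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.3, 0.7))

# Draw from normal(0,1) or normal(5,2) accordingly (2 taken as the sd)
samples <- ifelse(from_first, rnorm(n, mean = 0, sd = 1),
                              rnorm(n, mean = 5, sd = 2))

# Estimated density of the data against the exact mixture density
plot(density(samples), main = "0.3*normal(0,1) + 0.7*normal(5,2)")
curve(0.3 * dnorm(x, 0, 1) + 0.7 * dnorm(x, 5, 2), add = TRUE, col = "red")
```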
|
Thu Feb 4 8am 5530 exploration from Jan 29 due. |
Fri Feb 5 |
- Catch up.
- MATH 5530 Exploration:
- Repeat the exploration from Jan 29 with a new article.
- List the numerical experiments in the paper and
identify which you might be able to reproduce to
(in)validate the claims of the paper.
|
5 |
Mon Feb 8 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- Read the Good Problems handout on
Introductions and Conclusions. Starting this week's
journal, you need to include an introduction and a
conclusion.
- Read about matrix().
- Gibbs sampling:
- Read about Gibbs
sampling. Summarize in your own words.
- Write a Gibbs sampler to construct samples from
the uniform distribution on a disc of radius 1. Plot
the resulting points.
- Write a Gibbs sampler to construct samples from
the distribution f(x,y) whose marginal in x is
unif(min=0,max=5) independent of y and whose
marginal in y is unif(min=x,max=2x+1) for x in
[0,5]. Plot the resulting points.
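
A minimal sketch of a Gibbs sampler for the uniform distribution on the unit disc. It relies on the fact that, given y, x is uniform on \([-\sqrt{1-y^2},\sqrt{1-y^2}]\), and symmetrically for y given x. The function name gibbs_disc, the starting point, and the seed are assumptions.

```r
gibbs_disc <- function(n, x0 = 0, y0 = 0) {
  pts <- matrix(NA_real_, nrow = n, ncol = 2,
                dimnames = list(NULL, c("x", "y")))
  x <- x0; y <- y0
  for (i in seq_len(n)) {
    # Update x from its conditional distribution given the current y
    hx <- sqrt(1 - y^2)
    x <- runif(1, -hx, hx)
    # Update y from its conditional distribution given the new x
    hy <- sqrt(1 - x^2)
    y <- runif(1, -hy, hy)
    pts[i, ] <- c(x, y)
  }
  pts
}

set.seed(7)   # arbitrary seed
pts <- gibbs_disc(5000)
plot(pts, asp = 1, pch = ".", main = "Gibbs samples on the unit disc")
```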
|
Tue Feb 9 8am journal for Feb 1-5 due. Rate your partner. |
Wed Feb 10 |
- Multivariate Normal distributions:
- Read about the Multivariate
Normal distribution and the methods for drawing
values from it. Summarize in your own words.
- Install the
mvtnorm package and
read about its rmvnorm ,
pmvnorm , qmvnorm , and
dmvnorm functions.
- Read about the
persp ,
contour , image , and
wireframe functions in the
lattice package.
- Let \(\mu=[1,2]\) and
\(\Sigma=\left[\begin{array}{cc}1&0.8\\
0.8&1\end{array}\right]\).
- Generate 1000 samples from the
normal\((\mu,\Sigma)\) using
rmvnorm and plot them. Use
colMeans and var to
check \(\mu\) and \(\Sigma\).
- Evaluate the density function for
normal\((\mu,\Sigma)\) using
dmvnorm
on a grid including \([-2,4]\times [-1,5] \).
Plot using persp ,
contour , image , and
wireframe .
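
As an illustration of one standard method for drawing multivariate normal values, the sketch below uses a Cholesky factor of \(\Sigma\) in base R. It is not the implementation of rmvnorm, just a hand-written version of the idea; the seed and variable names are assumptions.

```r
mu <- c(1, 2)
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)

set.seed(8)   # arbitrary seed
n <- 1000

# chol() returns upper-triangular R with t(R) %*% R equal to Sigma
R <- chol(Sigma)

# Rows of Z are independent standard normal pairs; Z %*% R then has
# covariance t(R) %*% R = Sigma, and sweep() adds mu to every row
Z <- matrix(rnorm(n * 2), n, 2)
X <- sweep(Z %*% R, 2, mu, "+")

plot(X, asp = 1, xlab = "x1", ylab = "x2", main = "Samples via Cholesky")
colMeans(X)   # should be close to mu
var(X)        # should be close to Sigma
```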
|
Thu Feb 11 8am 5530 exploration from Feb 5 due. |
Fri Feb 12 |
- Create a data set D with samples from the uniform
distribution on the unit disc.
- Create a data set G with samples from the
two-dimensional normal distribution with \(\mu=[0,0]\)
and \(\Sigma=\left[\begin{array}{cc}1&0\\
0&1\end{array}\right]\).
- For each of these:
- Determine (by thinking) the density function for
the x values and the density function for the y
values. Are they the same?
- Determine (by thinking) whether x and y are
independent.
- Test computationally whether or not x and y have
the same distribution.
- Test computationally whether or not x and y
are independent.
- MATH 5530 Exploration:
- Repeat the exploration from Feb 5 with a new article.
- Identify an R function that we have not used in
this class and would likely be useful in trying to
validate the results in this paper. Run a very
simple calculation using this function.
|
6 |
Mon Feb 15 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- From the StatisticalComputing project, copy the
files 2161_MATH2301_100key.sagews,
2161_MATH2301_100grades.csv, and
2161_MATH2301_100perfect.csv. Look at them. Do
not share them with anyone outside this
class.
- Read about data
frames,
read.csv , summary ,
head , factor ,
levels , with ,
names , sort , and
NA .
- Use
read.csv to load the data in
2161_MATH2301_100grades.csv as a data frame named
grades and the data in
2161_MATH2301_100perfect.csv as a data frame named
perfect . Apply head to both
data frames and summary to grades and
interpret the results.
- Plot
grades$grade and note that the
grades are in alphabetical order, rather than their
natural order. Use factor and its
levels option to replace
grades$grade with itself but with the
grades in the order
c("A","A-","B+","B","B-","C+","C","C-","D+","D","D-","F","FS","WP","WF","Z") .
Replot to show that it worked. (You can use the option
cex.names=0.75 to shrink the labels so they
all show.)
- Similarly, fix the order of the levels of
grades$Level to put them in their natural order and plot to show that it worked.
- Similarly, fix the order of the levels of
grades$College to put them in order from most
students to fewest students. (Use table ,
sort , and names to
automatically find the correct order of colleges.)
Plot to show that it worked.
- (Start using
with .)
- Plot
grade versus avg and
avg versus grade . Explain how
to read the plots and which plot you think is most useful.
Interpret what the plot tells you about how the students did.
- Similarly, plot, explain, and interpret using
Level and avg .
- Similarly, plot, explain, and interpret using
Level and grade .
- Similarly, plot, explain, and interpret using
College and grade .
- Use
table to make a contingency table of College and grade .
|
Tue Feb 16 8am journal for Feb 8-12 due. Rate your partner. |
Wed Feb 17 |
- Read about vector
manipulations, selecting
subsets of the data,
subset ,
is.na , %in% ,
rbind , legend ,
and length .
- Using
subset , create the following data frames:
- finished : students who took the final exam.
- unfinished : students who did not
take the final exam.
- ABC : students with grades in
c("A","A-","B+","B","B-","C+","C","C-") .
- DFW : students with grades in
c("D+","D","D-","F","FS","WP","WF","Z") .
Use summary , table , or
another method to show that you got the subsets you
wanted.
- Show the distribution of
finished$Level
and unfinished$Level within a single
barplot. Each level (like "Freshman") should have two
bars (use the "beside" option). Make the plot colorful
and include a legend. Interpret the results.
- Similarly, show the distributions of
Level for
ABC and DFW and interpret.
- Similarly, show the distribution of
College
for finished /unfinished and for
ABC /DFW ; interpret.
- Plot
Level versus College
for the ABC data frame and (separately) for
the DFW data frame. Interpret the results.
|
Thu Feb 18 8am 5530 exploration from Feb 12 due. |
Fri Feb 19 |
- Summarize what you have learned so far from this data set.
How could the Mathematics department use this information to
make MATH 2301 better?
- MATH 5530 Exploration:
- Repeat the exploration from Feb 12 with a new article.
|
7 |
Mon Feb 22 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- Read the Good Problems handout on
Logic. Starting with this week's journal, be sure to
make your logic clear and use logical connectives.
- Linear Models
- Read about linear
models. Summarize the method in your own words.
- Read about linear
models in R,
lm ,
fitted , coefficients , and
I .
- Using the
finished dataframe,
explore how well the scores on the exam
etotal (written by the course
coordinator) are predicted by the scores on tests
tbest5 (written by the instructor).
- Apply
lm to fit a line relating
these variables. Run summary and
plot on the result. (It actually
produces 4 plots, so use
layout(matrix(1:4,2,2)) .) Interpret
the results.
- Run plot(tbest5,etotal) . From the
result of lm , extract the
coefficients (automatically, not copy and paste)
and use them to plot the line of best fit on top
of the data. Interpret the result.
- Similarly, see how well
tbest5 is
predicted by gwbest10 using a line and
interpret.
- Similarly, see how well
tbest5 is
predicted by gwbest10 using a parabola
(quadratic in gwbest10 ). (You will need
to use the I function in your formula.)
Argue whether the line or parabola is better.
- Run
lmetg <- lm(etotal ~ tbest5*gwbest10,data=finished)
summary(lmetg)
layout(matrix(1:4,2,2))
plot(lmetg)
Explain what the model is, what the results mean,
and how well the prediction worked. Argue whether
this is better or worse than the prediction using
only tbest5 .
|
Tue Feb 23 8am journal for Feb 15-19 due. Rate your partner. |
Wed Feb 24 |
- Read about Generalized
Linear Models. Summarize in your own words and
specifically address:
- How are they different from (ordinary) linear models?
- What is a link function?
- What choices (such as the link function) within
the generalized linear model give an (ordinary)
linear model?
- Read about Generalized
Linear Models in R,
glm , and
family .
- Use
glm with appropriate choices to try
to reproduce the results from lm when
etotal is predicted by
tbest5 using a line.
- When you used
lm to predict
etotal using tbest5 , you
(should have) found that very low scores on
tbest5 lead to negative predictions for
etotal , which is nonsense. Use
glm with family=binomial to
avoid this nonsense. (It expects response values in
\([0,1]\), so try to predict etotal/200 .)
- Using
layout(matrix(1:12,4,3)) , plot
the results from the original lm test, the
glm test that should reproduce it, and the
glm test using
family=binomial . Interpret the results.
- Make a single plot that has
- The original
(tbest5,etotal) points with
xlim=c(0,100),ylim=c(0,200) .
- The prediction line you got using
lm .
- The prediction line you got using
glm trying to reproduce
lm , in a different color.
- The prediction curve you got using
glm with
family=binomial . (You will need to map
using binomial()$linkinv and multiply
by 200. It should look similar to the prediction lines
but stay in \([0,200]\).)
Interpret the results.
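
A sketch of the three fits on synthetic data, under the assumption that the exam is out of 200 points. The data frame toy is made up. The binomial fit is wrapped in suppressWarnings() because glm() warns about non-integer responses when given proportions without weights; that wrapper is a choice made here, and you may prefer to see the warning.

```r
set.seed(10)   # arbitrary seed
# Synthetic stand-in data, with exam totals clipped to [0, 200]
toy <- data.frame(tbest5 = runif(150, 20, 100))
toy$etotal <- pmin(200, pmax(0, round(2 * toy$tbest5 - 10 +
                                      rnorm(150, sd = 20))))

fit_lm  <- lm(etotal ~ tbest5, data = toy)
# Gaussian family with identity link reproduces the ordinary linear model
fit_glm <- glm(etotal ~ tbest5, family = gaussian(link = "identity"),
               data = toy)
# Binomial family on the rescaled response in [0, 1]
fit_bin <- suppressWarnings(
  glm(I(etotal / 200) ~ tbest5, family = binomial, data = toy))

# Map the linear predictor through the inverse link and rescale
grid <- seq(0, 100, length.out = 101)
eta <- coef(fit_bin)[1] + coef(fit_bin)[2] * grid
curve_bin <- 200 * binomial()$linkinv(eta)

plot(toy$tbest5, toy$etotal, xlim = c(0, 100), ylim = c(0, 200),
     xlab = "tbest5", ylab = "etotal")
abline(fit_lm, col = "red", lwd = 3)
abline(fit_glm, col = "blue", lty = 2)
lines(grid, curve_bin, col = "darkgreen")
```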
|
Thu Feb 25 8am 5530 exploration from Feb 19 due. |
Fri Feb 26 |
Due March 10:
- MATH 4530 Students: Propose a topic for your final
project. Explain why you think it is a good choice.
- MATH 5530 Exploration: Propose a paper to use for
your final project. It could be one you used for an
exploration or a new one. Explain why you think it is a
good choice. Identify its specific claims and the
numerical experiments you plan to reproduce.
|
Spring Break |
8 |
Mon Mar 7 |
Work on your final project proposal.
|
Tue Mar 8 8am journal for Feb 22-24 due. Rate your partner. |
Wed Mar 9 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- Read the competitive mathematical
game False
Alarms in a Sensor Network.
- Read about Expected Value. Summarize in your own words.
- Compute the expected values of the following:
- The cost of radioactive leaks from nuclear plants in one year, assuming the sensor network did not detect them early.
- The cost of radioactive leaks from nuclear plants in one year, assuming the sensor network did detect them early.
- The cost of clouds of radioactivity from the East, assuming the sensor network did not detect them early.
- The cost of clouds of radioactivity from the East, assuming the sensor network did detect them early.
- The cost of dirty bombs, assuming the sensor network did not detect them early.
- The cost of dirty bombs, assuming the sensor network did detect them early.
- The cost to maintain the sensor network.
- The cost of sensors giving false alarms, assuming the sensor is isolated.
- The cost of sensors giving false alarms, assuming the sensor network detects it as a likely malfunction.
- If there was no sensor network, what is the expected total cost?
- In the best case, where the sensor network detects everything, what is the expected total cost?
- What is the net benefit of installing the sensor network?
|
Thu Mar 10 8am final project proposal due. For
5530 students counts as an exploration. |
Fri Mar 11 |
- Make a draft entry in the game (in your journal, not
as a .pdf).
- MATH 5530 Exploration:
- If your final project proposal was rejected,
then propose a new paper.
- If your final project proposal was accepted,
then pick some self-contained topic that you will
need for your final project, explain it, and do some
related computation.
|
9 |
Mon Mar 14 |
Work on your final project.
|
Tue Mar 15 8am journal for Mar 9-11
due. Rate your partner. |
Wed Mar 16 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- Read about
data.frame , cbind , and rowSums .
- Make a table of the grades received by students who
did not take test 1 or did not take test 2 (or did not
take both). Interpret the results. (We will not be able
to use these for the analysis today.)
- Make a data frame
begend such that:
- Students who did not take test 1, test 2, or
both, are excluded.
- It has a column
Level3
derived from Level that preserves
"Freshman" and "Sophomore" but has all other values
converted to "other".
- It has a column
College3 derived
from College that preserves "A&S" and
"ENT" but has all other values converted to "other".
- It has a column
preparation with
the sum of all the questions from test 1 and
questions 1, 2, and 3 from test 2. (These are
PreCalculus questions.)
- It has a column
gw1and2 that is the
sum of the first 2 groupworks, with missing scores
counted as 0.
- It has a column
grade2 with "ABC"
if the student grade was in
c("A","A-","B+","B","B-","C+","C","C-")
and "DFW" otherwise.
Show that your data frame is correct by using
table , summary , etc.
- Plot all 10 combinations of
pairs of factors in
begend . For each pair, choose the
order (i.e. plot(x,y) or
plot(y,x) ) that gives the most useful
plot and interpret the results.
|
Thu Mar 17 8am 5530 exploration from Mar 11 due. |
Fri Mar 18 |
- Read about Binary
Classification and Evaluation
of binary classifiers. Summarize in your own
words. Give the formulas for and interpretation of
sensitivity, specificity, and accuracy.
- Consider a
grade2 value of "DFW" as
the disease state. Our goal is to diagnose this disease
based on information available early in the semester.
- From looking at the plots, decide on
a classification of students into "ABC" or "DFW" using
only
Level3 information. Compute the
sensitivity, specificity, and accuracy of your
classifier.
- Repeat using only
College3 information.
Does this classifier do better or worse?
- Write a function
ssa with inputs:
- score : a factor containing
numerical scores;
- class : a factor containing the true
classes corresponding to the numerical scores (only
two classes);
- levels : a vector (or list) of the
two classes, with the class generally corresponding
to lower scores first; and
- cut : a cutoff score.
Have it compute the sensitivity, specificity, and
accuracy of the binary classifier that classifies
scores less than or equal to cut as
levels[1] and scores greater than
cut as levels[2] . Return
c(sensitivity,specificity,accuracy) .
- Run
ssa with
score=preparation and
class=grade2 for cut in
1:150 and plot the sensitivity,
specificity, and accuracy on a single graph. Interpret
the results.
- Run
ssa with
score=gw1and2 and
class=grade2 for cut in
1:200 and plot the sensitivity,
specificity, and accuracy on a single graph. Interpret
the results.
- If the Mathematics department wants to identify
at-risk students early in the semester, what method do
you recommend they use?
- MATH 5530 Exploration: Pick some self-contained
topic that you will need for your final project, explain
it, and do some related computation.
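
One possible reading of the ssa specification above is sketched here. It assumes that levels[1] (the class associated with low scores, "DFW" in this setting) is the positive or "disease" class, so sensitivity is the fraction of true levels[1] cases that are classified as levels[1]. The toy scores and classes are hypothetical.

```r
ssa <- function(score, class, levels, cut) {
  predicted_pos <- score <= cut       # classified as levels[1]
  actual_pos <- class == levels[1]    # truly in levels[1]
  tp <- sum(predicted_pos & actual_pos)     # true positives
  tn <- sum(!predicted_pos & !actual_pos)   # true negatives
  c(sensitivity = tp / sum(actual_pos),
    specificity = tn / sum(!actual_pos),
    accuracy = (tp + tn) / length(score))
}

# Hypothetical toy data
score <- c(10, 20, 30, 40, 50, 60)
class <- c("DFW", "DFW", "ABC", "DFW", "ABC", "ABC")
res <- ssa(score, class, levels = c("DFW", "ABC"), cut = 30)
res

# All three measures as a function of the cutoff, on one graph
cuts <- 1:60
curves <- sapply(cuts, function(k) ssa(score, class, c("DFW", "ABC"), k))
matplot(cuts, t(curves), type = "l", lty = 1, col = c("red", "blue", "black"),
        xlab = "cut", ylab = "value")
legend("bottomright", legend = rownames(curves), lty = 1,
       col = c("red", "blue", "black"))
```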
|
10 |
Mon Mar 21 |
Work on your final project.
|
Tue Mar 22 8am journal for Mar 16-18
due. Rate your partner. |
Wed Mar 23 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- From the StatisticalComputing project, copy the
files that start with
L3C3eg . Do
not share them with anyone outside this
class. These contain Level3 ,
College3 , etotal , and
grade for different sections of MATH
2301. Fix the order of the Level3 ,
College3 , and grade factors to
their natural orders.
- For each of the following, make a single plot
overlaying the curves from each of the
L3C3eg dataframes. Color-code by dataframe
and include a legend. Interpret the results.
- Plot the proportion of students in each level of
Level3 .
- Plot the proportion of students in each level of
College3 .
- Plot the density of
etotal .
- Plot the proportion of students
that received each grade.
- Plot the proportion of students getting DFW
grades at each level of
Level3 (i.e. proportion of Freshmen
with DFW, proportion of Sophomores with DFW, ...).
(Hint: sapply(levels(Level3),function(x){sum(Level3==x & grade %in% c("D+","D","D-","F","FS","WP","WF"))/sum(Level3==x)}) .)
- Plot the proportion of students getting DFW grades
in each level of
College3 .
- Plot the mean
etotal of
students that received each grade.
|
Thu Mar 24 8am 5530 exploration from Mar 18 due. |
Fri Mar 25 |
(drop deadline with WP/WF)
- Note that a rough draft of your final project report
is due next Thursday.
- Read about Hypothesis
Testing. Summarize in your own words.
- Read about Student's
t-test and
t.test . Summarize in your
own words.
- Apply
t.test to
L3C3eg100$etotal to test the hypothesis
that it is drawn from a population whose mean is greater
than 110. Repeat for 115, 120, 125, and 130. Interpret
the results.
- Read about Welch's
t test. Summarize in your own words.
- Apply
t.test to the etotal
for each pair of L3C3eg
dataframes. Interpret the results.
- Summarize what you have learned about MATH 2301 this
week. What differences in performance between sections
do you think are due to different student populations?
What differences do you think are due
to different instructors? How could the Mathematics
department use this information to make MATH 2301
better?
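
The two kinds of t.test calls above can be sketched on synthetic scores. The vectors sec_a and sec_b are made-up stand-ins for the etotal columns of two sections. Note that alternative = "greater" makes "the mean is greater than mu" the alternative hypothesis, so a small p-value is evidence in its favor.

```r
set.seed(11)   # arbitrary seed
# Hypothetical exam totals for two sections
sec_a <- rnorm(60, mean = 120, sd = 15)
sec_b <- rnorm(55, mean = 112, sd = 20)

# One-sample test: is the population mean greater than 110?
one_sided <- t.test(sec_a, mu = 110, alternative = "greater")
one_sided

# Two-sample test; the default var.equal = FALSE gives Welch's t test
welch <- t.test(sec_a, sec_b)
welch

# Repeating the one-sample test over several hypothesized means
sapply(c(110, 115, 120, 125, 130), function(m) {
  t.test(sec_a, mu = m, alternative = "greater")$p.value
})
```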
|
11 |
Mon Mar 28 |
Work on your final project.
|
Tue Mar 29 8am journal for Mar 23-25
due. Rate your partner. |
Wed Mar 30 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- From
grades$etotal make
etotalnoNA that has NA values
discarded. From grades$etotal make
etotal0 that discards NA
values that correspond to a grade of "Z"
and sets the remaining NA values to 0.
- Make a function
subsetmeanvar that
inputs two vectors (data,indices) and
returns the mean and variance of the subset of the data
vector with those indices.
- Make a function
subsetmedian that
inputs (data,indices) and returns the
median of the subset of the data with those indices.
- Read about Bootstrapping. Summarize
the method in your own words.
- Read about the
boot package and its
functions boot and boot.ci .
- Use
boot and boot.ci on
etotalnoNA with the statistic
subsetmeanvar to study the empirical
distribution of the mean statistic for
etotalnoNA . Print the outputs of
boot and boot.ci and plot the
output of boot . Interpret the results.
- One may argue that the mean of
etotalnoNA is a poor way to measure the
effectiveness of the instructor, since a terrible
instructor may scare off all but the strongest
students. To account for these students, we can instead
consider the median of etotal0 . Use
boot and boot.ci on
etotal0 with the statistic
subsetmedian to study the empirical
distribution of the median statistic for
etotal0 . Interpret the results.
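
To show what the (data,indices) statistic functions are for, here is a hand-rolled bootstrap in base R. It mimics the ordinary nonparametric resampling idea behind boot but is not the boot package itself, and the percentile interval is only one of the interval types boot.ci offers. The scores vector and seed are made up.

```r
# Statistics in the (data, indices) form described above
subsetmeanvar <- function(data, indices) {
  d <- data[indices]
  c(mean(d), var(d))
}
subsetmedian <- function(data, indices) median(data[indices])

set.seed(12)   # arbitrary seed
# Hypothetical exam totals, clipped to [0, 200]
scores <- round(pmin(200, pmax(0, rnorm(80, mean = 120, sd = 30))))
n <- length(scores)

# Each replicate resamples the indices with replacement and
# evaluates the statistic on that resample
boot_means <- replicate(2000,
  subsetmeanvar(scores, sample(n, replace = TRUE))[1])
boot_medians <- replicate(2000,
  subsetmedian(scores, sample(n, replace = TRUE)))

# Empirical distribution of the mean and a 95% percentile interval
hist(boot_means, main = "Bootstrap distribution of the mean")
ci_mean <- quantile(boot_means, c(0.025, 0.975))
ci_mean
```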
|
Thu Mar 31 8am rough draft of final project report
due. For 5530 students counts as an exploration. |
Fri Apr 1 |
- Note that a rough draft of your final project
presentation (slides) is due next Thursday. See next
Monday for guidance.
- From
grades make a data frame
eqs that includes only the scores on the
exam questions and has NA values
discarded. Run boxplot on it to see how the
students did on each question, and interpret the results.
- One can argue that if the class does very badly on a
question, then it was "too hard" and asking it was not
productive. Identify the question on which the students
did the worst and make a table of the scores. Look at
the question
itself and judge whether or not it is too
hard. Argue whether or not questions similar to this one
should be included on exams in the future.
- One can argue that if the class does very well on a
question, then it was "too easy" and asking it was not
productive. Identify the question on which the students
did the best and make a table of the scores. Look at the
question
itself and judge whether or not it is too
easy. Argue whether or not questions similar to this one
should be included on exams in the future.
- One can argue that if the correlation between
scores on two questions is too high, then we could save
everyone time and energy by only asking one of
them. Apply
pairs to eqs and
interpret the results. Apply cor and find
the most correlated pair of questions. Look at the questions
themselves and judge whether or not they are too
similar. Argue whether or not only one question similar
to one of this pair should be included on exams in the
future.
- Summarize what you have learned about MATH 2301 this
week. How should performance differences between
instructors be measured? Should the design of the final
exam be modified?
|
12 |
Mon Apr 4 |
Work on your final project presentation slides.
|
Tue Apr 5 8am journal for Mar 30 - Apr 1 due. Rate your partner. |
Wed Apr 6 |
- Find your partner for this week's tasks and journal,
sit next to them, and set up a project for your journal.
- Read about Monte
Carlo integration. Summarize the basic method in your own
words.
- For practice, we will estimate the integral \(I =
\int_{-1}^1 (1-x^2)\,dx\). Compute the exact value so we
can compare.
- Write a function with input
n that uses
the basic Monte Carlo integration method with
n points to estimate \(I\).
- Read about Antithetic
variates. Summarize the method in your own
words. Write a function with input
n that
uses this method with n total points to
estimate \(I\). (It should use n/2 original
points and n/2 antithetic points.)
- Read about Importance
sampling. Summarize the method in your own words.
Write a function with input
n that uses
this method with n total points to estimate
\(I\) using sampling density \(f(x)=1-|x|\) on \([-1,1]\).
- Compare the three methods by doing the following for each:
- Use
replicate to run it 1000 times
using 1000 points to collect 1000 estimates for
\(I\).
- Compute the variance of the estimates.
- Run
summary on the absolute value
of the error of the estimates.
Interpret the results. Which method is working better?
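
The three estimators above can be sketched as follows. Two assumptions are made here: the antithetic partner of a uniform point x on \([-1,1]\) is taken to be -x, and samples from \(f(x)=1-|x|\) are drawn as the difference of two independent uniform(0,1) values, which has exactly that triangular density. Function names and the seed are hypothetical.

```r
h <- function(x) 1 - x^2   # integrand; the exact integral over [-1,1] is 4/3

# Basic Monte Carlo: interval length times the average of h at uniform points
mc_basic <- function(n) {
  x <- runif(n, -1, 1)
  2 * mean(h(x))
}

# Antithetic variates: n/2 uniform points plus their reflections -x
mc_antithetic <- function(n) {
  x <- runif(n %/% 2, -1, 1)
  2 * mean((h(x) + h(-x)) / 2)
}

# Importance sampling with density f(x) = 1 - |x|
mc_importance <- function(n) {
  x <- runif(n) - runif(n)        # triangular on [-1, 1]
  mean(h(x) / (1 - abs(x)))
}

set.seed(13)   # arbitrary seed
est <- sapply(list(basic = mc_basic, antithetic = mc_antithetic,
                   importance = mc_importance),
              function(method) replicate(1000, method(1000)))

apply(est, 2, var)                              # variance of each method
apply(abs(est - 4/3), 2, summary)               # absolute error summaries
```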
|
Thu Apr 7 8am rough draft of presentation (slides)
due. For 5530 students counts as an exploration. |
Fri Apr 8 |
- (No more 5530 explorations.)
- Read about Monte
Carlo methods. Summarize in your own words.
- Consider the following three-person game, with
players whom we will call 1, 2, and 3. In each round
two players play while the third waits. The winner of
that round plays in the next round versus the player who
was waiting. If a player wins two consecutive rounds
then the game stops and that player is declared the
overall winner. Each round is a simple coin toss, with
the two players having equal probability of winning.
Suppose in the first round 1 plays 2 while 3 waits. Is
the probability of 1 being the overall winner the same
as the probability of 2 being the overall winner? Is
the probability of 1 being the overall winner the same
as the probability of 3 being the overall winner?
- Write a function that simulates this game and
returns the (overall) winner (1,2, or 3).
- Use
replicate to run this function many
times and compute the relative frequencies of the
different players winning. Interpret the results.
- Suppose you need to decide a winner among 3 players
using only a (fair) coin and you would like the
probabilities of each winning to be the same. Decide on
a method/game to do so, write a function that simulates
it, and run the simulation many times to show that the
relative frequencies tend towards equality.
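
A minimal sketch of simulating the three-person game described above, with player 1 facing player 2 first. The function name play_game, the number of runs, and the seed are assumptions.

```r
play_game <- function() {
  players <- c(1, 2)   # the two players in the current round
  waiting <- 3         # the player sitting out
  last_winner <- 0     # nobody has won a round yet
  repeat {
    winner <- sample(players, 1)          # fair coin toss between the two
    if (winner == last_winner) return(winner)   # two consecutive wins
    loser <- setdiff(players, winner)
    players <- c(winner, waiting)         # winner faces the waiting player
    waiting <- loser
    last_winner <- winner
  }
}

set.seed(14)   # arbitrary seed
winners <- replicate(10000, play_game())
rel_freq <- table(factor(winners, levels = 1:3)) / length(winners)
rel_freq
barplot(rel_freq, names.arg = 1:3, main = "Relative frequency of winning",
        xlab = "player")
```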
|
13 |
Mon Apr 11 |
- From the StatisticalComputing project, copy the file
ratinganalysis.sagews into your personal
project. Read through it to see how partner ratings will
be used to adjust journal scores. On Thursday after you
submit your final journal, make sure your partner
ratings are up to date. If you see any way to improve
the process in ratinganalysis.sagews to
make it better/fairer, then comment on it in your
ratings.sagews file.
- Look at the presentation rating guide and form. If you have any
questions on how the presentations will work then ask
them.
|
Wed Apr 13 |
Project presentations:
- Must be at least 10 and at most 15 minutes long.
- Must use \(\LaTeX\) slides in the beamer class.
- Aim for half the presentation to explain to the
class general background and half to show what
you did.
- Worth 10% of your project grade.
- You will rate each other. (rating guide and form)
|
Thu Apr 14 8am journal for Apr 6-8 due. Rate your partner. |
Fri Apr 15 |
More presentations |
14 |
Mon Apr 18 | More presentations |
Wed Apr 20 |
More presentations
|
Fri Apr 22 |
More presentations or clean-up |
15 |
Fri Apr 29 |
Final Exam 1-3pm (virtual, your presence is not required).
- Presentation slides due. You can improve them based
on feedback from your presentation.
- Project report due.
|