# Serial of year 30

You can also find the serial in the yearbook.

*We are sorry, this serial has not been translated.*

## Text of serial

## Tasks

### (10 points) 1. Series 30. Year - S. random one

- Try to explain in your own words what a random variable is and what its properties are (explanations of the following concepts are required: random variable, distribution of a random variable, realization of a random variable, mean, variance, histogram).
- Generate graphs of the probability distribution functions for the following distributions of a random variable: normal, exponential, uniform (continuous) and Poisson. Describe what happens when you alter the parameters of these distributions.
- From the data set attached to this task, generate histograms and try to determine the associated distributions.
- Suppose we define a random variable $X$ as the result of a roll of a "fair" six-sided die (all outcomes are equally probable). Determine the distribution function of the random variable $X$ and calculate $\mathrm{E} X$ and $\mathrm{var} X$.
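
As a rough illustration of how the mean and variance of a discrete random variable follow from its distribution, here is a minimal Python sketch (the attached scripts use R; Python is used here only for illustration). The helper name `discrete_moments` and the coin example are assumptions made for this sketch and are not part of the task.

```python
from fractions import Fraction


def discrete_moments(pmf):
    """Return (mean, variance) of a discrete random variable.

    pmf maps each possible value to its probability.
    """
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    # E X = sum of value * probability
    mean = sum(x * p for x, p in pmf.items())
    # var X = E[(X - E X)^2]
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance


# Hypothetical example: a fair coin coded as 0 (tails) and 1 (heads).
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
coin_mean, coin_var = discrete_moments(coin)
print(coin_mean, coin_var)  # 1/2 1/4
```

Exact fractions are used so that the results are not blurred by floating-point rounding; the same function accepts any finite distribution given as a dictionary.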

**Bonus:** Name two different distributions of random variables with the same mean and variance.

For data processing and creating the plots, you may use the *R* programming language. Most of these tasks can be solved by slightly altering the attached scripts.

Michal created a random problem; hopefully it won't be too hard.

### (10 points) 2. Series 30. Year - S. guessing problem

- Describe in your own words the purpose of interval estimation of the mean of a normal distribution and explain its physical interpretation (it is sufficient to describe the following concepts: physical interpretation of the estimate of the expected value, the difference between point and interval estimation, measurement uncertainty). It is not necessary to state the exact mathematical derivations; a brief explanation of the concepts and their properties is sufficient.
- Attached to this task, in the file *mereni1.csv*, there are measured values of a certain physical quantity (assume a type B uncertainty $s_B = 0.1$). Create both the point and the interval estimate of the measured physical quantity and try to interpret their meaning.
- Suppose we measure a certain physical quantity and we know that, due to the method being used, the measured values will have a variance equal to a constant $c$ (ignore the type B uncertainty). How many measurements do we need to make to achieve an uncertainty below $s$?
- In the attached file *mereni2.csv* there are measurements of one physical quantity made in two different ways (neglect the type B uncertainty). Which method used more precise measurement equipment? Which method produced more precise results? Briefly give reasons for your answers.
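
A point and interval estimate of a mean might be computed along the following lines. This is only a sketch in Python (not the R of the attached scripts) and rests on two labeled assumptions: the type A and type B uncertainties are combined in quadrature, and a normal quantile is used in place of the Student $t$ quantile, which is reasonable only for larger samples. The function name `mean_interval` and the readings are hypothetical.

```python
import math
import statistics


def mean_interval(values, s_b=0.0, confidence=0.95):
    """Point estimate, combined standard uncertainty and interval for the mean."""
    n = len(values)
    mean = statistics.fmean(values)
    # Type A uncertainty of the mean: sample standard deviation / sqrt(n).
    s_a = statistics.stdev(values) / math.sqrt(n)
    # Assumption: type A and type B uncertainties combine in quadrature.
    s = math.sqrt(s_a**2 + s_b**2)
    # Assumption: normal quantile (large-sample approximation); for small n
    # the Student t quantile with n - 1 degrees of freedom gives a wider interval.
    q = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, s, (mean - q * s, mean + q * s)


# Hypothetical readings, not taken from mereni1.csv.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
estimate, uncertainty, (low, high) = mean_interval(readings, s_b=0.1)
print(f"{estimate:.2f} +/- {uncertainty:.2f}, interval ({low:.2f}, {high:.2f})")
```

Because the type A part shrinks like $1/\sqrt{n}$ while the type B part does not, the combined uncertainty cannot drop below $s_B$ however many measurements are taken.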

**Bonus:** Try to rigorously derive that, for a normal distribution, the sample variance is an unbiased estimate of the true variance (i.e. the mean of the sample variance is equal to the true variance). For the solution of this problem you may use any and all sources (if you cite them correctly).

For data processing and creating the plots, you may use the *R* programming language. Most of these tasks can be solved by slightly altering the attached scripts.

Michal guessed the optimal wording of the problem; let's hope he was right.

### (10 points) 3. Series 30. Year - S. limiting

- Try to describe in your own words the method for creating interval estimates of the expected value of a general distribution of measured data (it is sufficient to describe the following: central limit theorem (CLT), covariance, correlation (Pearson correlation coefficient), multidimensional CLT, law of propagation of uncertainty and its uses). It’s not necessary to describe the concepts mathematically, a brief description in your own words is sufficient.
- In the attached data file *mereni3-1.csv* there are measurements of a certain physical quantity $v$. Assume we cannot be sure whether the measured data have a normal distribution. Find the uncertainty (standard deviation) of the measurements (neglect the type B uncertainty), set up the interval estimates using the CLT and briefly interpret their meaning. How would the results (and their interpretation) change if only the first quarter of the data were available?
- Suppose our aim is to measure physical quantities $x$ and $y$, which we will then plug into the equation \[\begin{equation*} v = \frac{1}{2} x y^2 . \end{equation*}\] Suppose also that we are certain that all measurements are independent and that we have already measured and processed a significant amount of data, with the results \[\begin{align*} x &= (5.2 \pm 0.1) , \\ y &= (12.84 \pm 0.06) . \end{align*}\] Estimate the value of $v$ and its uncertainty.

**Hint:** These equations may come in handy: $$\frac{\partial}{\partial x} \left( \frac{1}{2} x y^2 \right) = \frac{1}{2} y^2 \, ,$$ $$\frac{\partial}{\partial y} \left( \frac{1}{2} x y^2 \right) = x y \, .$$

- Using a computer simulation, demonstrate the validity of the central limit theorem, i.e. generate $n$-tuples (sequences of $n$ real numbers) of independent realizations of a random variable that does not have a normal distribution (use the exponential, uniform and Poisson distributions with arbitrary parameters) and show, using a histogram, that applying the transformation \[\begin{equation*} \sqrt{n} \, \frac{\overline{x}_n - \mu}{S_n} , \end{equation*}\] to the data will (approximately) yield the normal distribution $N(0, 1)$.
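
One way such a simulation might look is sketched below for the exponential distribution only, in Python with a crude text histogram instead of an R plot. The function names, the seed, the sample size `n=100` and the number of repetitions are all assumptions chosen for this illustration.

```python
import math
import random
import statistics


def standardized_means(n, repeats, rate=1.0, seed=0):
    """Draw `repeats` n-tuples from Exp(rate) and standardize each sample mean."""
    rng = random.Random(seed)
    mu = 1.0 / rate  # true mean of the exponential distribution
    out = []
    for _ in range(repeats):
        sample = [rng.expovariate(rate) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        s_n = statistics.stdev(sample)  # sample standard deviation S_n
        out.append(math.sqrt(n) * (x_bar - mu) / s_n)
    return out


def text_histogram(data, low=-4.0, high=4.0, bins=16, width=50):
    """Print a crude text histogram of the data falling in [low, high)."""
    counts = [0] * bins
    step = (high - low) / bins
    for value in data:
        if low <= value < high:
            counts[min(int((value - low) / step), bins - 1)] += 1
    peak = max(counts) or 1
    for i, count in enumerate(counts):
        print(f"{low + i * step:6.2f} | " + "#" * (count * width // peak))


z = standardized_means(n=100, repeats=2000)
text_histogram(z)
```

For moderate $n$ the histogram is roughly bell-shaped but still slightly skewed, which is a reminder that the normal limit is only approximate for finite samples.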

**Bonus:** Suppose our aim is to measure physical quantities $x$ and $y$, which we will then plug into \[\begin{equation*} v = x^2 \sin y . \end{equation*}\] Assume the most general model of measurement (i.e. the measured data do not have a normal distribution and the measurements of $x$ and $y$ may not be independent). In the data file *mereni3-2.csv* you may find the results of measurements of $x$ and $y$. Determine the uncertainty of $v$ and construct an interval estimate of $v$.
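
When the measurements may be correlated, the first-order law of propagation of uncertainty gains a covariance term: $u_v^2 = (\partial_x v)^2 u_x^2 + (\partial_y v)^2 u_y^2 + 2 \, \partial_x v \, \partial_y v \, \mathrm{cov}(x, y)$. The Python sketch below evaluates this formula with numerical derivatives. The helper `propagate`, the test function and all numbers are hypothetical and are not related to the attached data.

```python
import math


def propagate(f, x, y, u_x, u_y, cov_xy=0.0, h=1e-6):
    """First-order uncertainty of v = f(x, y) with an optional covariance term."""
    # Central finite differences stand in for the analytic partial derivatives.
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    variance = (df_dx * u_x) ** 2 + (df_dy * u_y) ** 2 + 2 * df_dx * df_dy * cov_xy
    return f(x, y), math.sqrt(variance)


# Hypothetical function and numbers, chosen only for illustration.
def product(x, y):
    return x * y


value, u_value = propagate(product, x=2.0, y=3.0, u_x=0.1, u_y=0.2)
print(value, u_value)
```

With `cov_xy=0.0` the formula reduces to the independent case; a positive covariance increases the uncertainty of a product, a negative one decreases it.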

For data processing and creating the plots, use the *R* programming language. All the necessary syntax is explained in the attached scripts.

### (10 points) 4. Series 30. Year - S. testing

- Try to describe in your own words what purpose hypothesis testing serves and how it is done (it is sufficient to briefly describe the following: null hypothesis and alternative hypothesis, type I and type II error, level of significance, test statistic, confidence level, $p$-value). It’s not necessary to describe the concepts mathematically, a brief description in your own words is sufficient.
- In the attached data file *testovani1.csv* there are measurements of a certain physical quantity. Using the one-sample $t$-test, find out whether the real value of the measured quantity is equal to $20$. Then suppose our aim is to show that the real value is larger than $20$. Test this claim using an appropriate modification of the $t$-test (be careful which null hypothesis and alternative hypothesis you choose).
- In the attached data file *testovani2.csv* you may find the measurements of two different physical quantities. Assume the measurements to be of the same physical characteristic, just under different conditions (temperature, pressure etc.). Test the hypothesis that the value of said physical characteristic is the same under both sets of outside conditions using the two-sample $z$-test.
- Use the data from the last task in the first series of this year and, using the Kolmogorov–Smirnov test, determine which of the four data samples comes from a uniform distribution and which comes from an exponential distribution.
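
For orientation, a two-sample $z$-test could be sketched in Python as follows (the attached scripts do this in R). The function name, the two samples and the significance level are assumptions for this example, and the normal approximation behind the $p$-value is only trustworthy for reasonably large samples.

```python
import math
import statistics


def two_sample_z_test(a, b):
    """Return (z, two-sided p-value) for H0: both samples share the same mean."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Standard error of the difference of the two sample means.
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (mean_a - mean_b) / se
    # Under H0 and for large samples, z is approximately N(0, 1).
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical samples, not taken from testovani2.csv.
first = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7]
second = [20.6, 20.4, 20.9, 20.5, 20.7, 20.3, 20.8, 20.6]
z, p = two_sample_z_test(first, second)
alpha = 0.05  # significance level chosen for the example
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {p < alpha}")
```

The null hypothesis is rejected when the $p$-value falls below the chosen significance level; a large $p$-value does not prove that the means are equal, only that the data do not contradict it.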

**Bonus:** Assume you have at your disposal measurements of two physical quantities (i.e. two sets of measurements), where all the data are independent. Set up a modified $z$-test that will test the hypothesis that the real value of the first physical quantity is double the real value of the second physical quantity. It is sufficient to set up the corresponding test statistic and confidence level. (*Hint:* Use the multidimensional central limit theorem with an appropriately selected function $f$, and then proceed analogously to setting up a classical two-sample $z$-test.)

For data processing and creating the plots, you may use the *R* programming language. Most of these tasks can be solved by slightly altering the attached scripts.

Michal wanted to test how difficult a problem you can solve.

### (10 points) 5. Series 30. Year - S. linear

- Try to describe in your own words how and for what purpose linear regression is used (it is sufficient to briefly describe the following: two significant applications of linear regression, least squares method, maximum likelihood estimation, linear regression model, basic graphical methods of regression diagnostics). It’s not necessary to describe the concepts mathematically, a brief description in your own words is sufficient.
- In the attached data file *linreg1.csv* you may find the results of a certain physical experiment, in which we measured the pairs of data $(x_i, y_i)$. We want to fit the measured data with a theoretical function, in this case a parabola in the form \[\begin{equation*} f(x) = ax^2 + bx + c . \end{equation*}\] Determine the value of the coefficient $a$ and its uncertainty. It is not necessary to use regression diagnostics.
- In the attached data file *linreg2.csv* you may find the results of a certain physical experiment, in which we measured the pairs of data $(x_i, y_i)$. We want to fit the measured data with a theoretical function, in this case a logarithmic function in the form \[\begin{equation*} f(x) = a + b \cdot \log (x) . \end{equation*}\] Plot the measured data into a graph with the fitting function and briefly comment on it. It is not necessary to use regression diagnostics.
- Suppose we have measured pairs of data $(x_i, y_i)$ and want to fit them with a linear function in the form \[\begin{equation*} f(x) = a + bx . \end{equation*}\] Derive the exact formula for calculating the regression parameters. You may use any and all sources, if you cite them correctly. (Actually derive the formula, do not just write it.)
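
As a numerical cross-check for a derived formula, the well-known least-squares estimates for a straight line, $b = S_{xy} / S_{xx}$ and $a = \overline{y} - b \, \overline{x}$, can be evaluated directly. The Python sketch below does so on made-up data; the function name `fit_line` and the data are assumptions for this illustration, and it does not replace the derivation the task asks for.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the model y = a + b * x."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    # Sums of centered products and centered squares.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical noise-free data lying exactly on y = 1 + 2 x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b)  # 1.0 2.0
```

The parabola and the logarithmic model above are also linear in their parameters, so the same least-squares idea applies to them with more columns in the design matrix.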

**Bonus:** In tasks *b)* and *c)* (the second and third items above), perform regression diagnostics and discuss whether all necessary criteria (assumptions) are met.

For data processing and creating the plots, you may use the *R* programming language. Most of these tasks can be solved by slightly altering the attached scripts.

Michal heard somewhere that linear regression is really easy.

### (10 points) 6. Series 30. Year - S. nonlinear

- Try to describe in your own words how and for what purpose nonlinear regression is used (it is sufficient to briefly describe the following: model of nonlinear regression, methods for finding regression coefficients, uncertainties in the determination of regression coefficients, uncertainties in the function being fitted, statistical methods for testing the values of the regression coefficients, how to choose the form of the fitting function). It’s not necessary to describe the concepts mathematically, a brief description in your own words is sufficient.
- In the attached data file *regrese1.csv* you may find pairs of values $(x_i, y_i)$. Fit these data with a sine function in the form \[\begin{equation*} f(x) = a + b \cdot \sin (c x + d) . \end{equation*}\] Plot the measured values and the fit and comment on it briefly. It’s not necessary to perform regression diagnostics.

**Hint:** Take care to set correct constraints on the values of the parameter $c$.

- In the attached data file *regrese2.csv* you may find pairs of values $(x_i, y_i)$. Fit these data with an exponential function in the form \[\begin{equation*} f(x) = a + \mathrm{e}^{b x + c} . \end{equation*}\] Estimate the values of all regression coefficients including their uncertainties.

**Hint:** Examine homoscedasticity using a graphical method. If necessary, you may use the Huber-White (sandwich) estimator to determine the uncertainties of the regression coefficients.

- In the attached data file *regrese3.csv* you may find pairs of values $(x_i, y_i)$. Fit these data with a hyperbolic function in the form \[\begin{equation*} f(x) = a + \frac{1}{b x + c} . \end{equation*}\] Plot the measured data in the form of means and error bars and briefly comment on it. Perform the regression diagnostics.
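
To illustrate how regression coefficients can be found iteratively when the model is nonlinear in its parameters, here is a small Gauss-Newton sketch in Python. It fits a deliberately simpler, hypothetical two-parameter model $y = A \, \mathrm{e}^{k x}$ rather than any of the functions above. The function name, starting values and data are assumptions; in practice a library routine (for example `nls` in R) would normally be used.

```python
import math


def gauss_newton_exp(xs, ys, amp, rate, iterations=50):
    """Fit y = amp * exp(rate * x) by damped Gauss-Newton iterations."""

    def sse(p, q):
        # Sum of squared residuals for parameters (p, q).
        return sum((y - p * math.exp(q * x)) ** 2 for x, y in zip(xs, ys))

    for _ in range(iterations):
        # Accumulate the 2x2 normal equations (J^T J) d = J^T r.
        jtj_aa = jtj_ak = jtj_kk = jtr_a = jtr_k = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(rate * x)
            r = y - amp * e          # residual
            ja, jk = e, amp * x * e  # partial derivatives of the model
            jtj_aa += ja * ja
            jtj_ak += ja * jk
            jtj_kk += jk * jk
            jtr_a += ja * r
            jtr_k += jk * r
        det = jtj_aa * jtj_kk - jtj_ak**2
        if det == 0.0:
            break
        d_amp = (jtj_kk * jtr_a - jtj_ak * jtr_k) / det
        d_rate = (jtj_aa * jtr_k - jtj_ak * jtr_a) / det
        # Halve the step until the sum of squared residuals stops increasing.
        step, current = 1.0, sse(amp, rate)
        while step > 1e-10 and sse(amp + step * d_amp, rate + step * d_rate) > current:
            step /= 2
        amp, rate = amp + step * d_amp, rate + step * d_rate
    return amp, rate


# Hypothetical noise-free data generated from y = 2 * exp(0.5 * x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
amp, rate = gauss_newton_exp(xs, ys, amp=1.5, rate=0.4)
print(amp, rate)
```

The result depends on the starting values: a poor initial guess can lead to a different local minimum or to no convergence at all, which is one reason why constraints and sensible starting points matter in nonlinear fits.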

**Bonus:** In the attached data file *regrese4.csv* you may find pairs of values $(x_i, y_i)$. We want to fit these data with a function too complex to be expressed analytically. Use spline regression with appropriately chosen knots and order to fit these data.

For data processing and creating the plots, you may use the *R* programming language. Most of these tasks can be solved by slightly altering the attached scripts.

Michal wanted to make the last series as hard as possible.