In urgent need of help for Biology IA: What is the correlation between the use of snuff and the incidence of oral cavity cancer?

StressedDiplomaStudent · September 23, 2021

Hello. I am currently doing my IA in Biology and my research question is "What is the correlation between the use of snuff and the incidence of oral cavity cancer?".

I have found the relevant sources and I have the raw data necessary to conduct my investigation. The only problem is that my teacher does not really know how to do statistical analysis and has therefore been unable to help me with my investigation. My own knowledge of statistical analysis is quite poor and I have not worked out how to do a proper statistical analysis for this question. Do you guys have any tips?

Skylarrrr · December 4, 2021

Hello! There are different types of statistical tests. My teacher introduced us to a few, for example:

- Pearson's Correlation

- One/Two-way ANOVA

- T-test

I think Pearson's Correlation would be the best for you. You can watch videos to get the hang of it!

Kaito · December 4, 2021

hmm... it's been a while since I did this but I think you want to do something along the lines of this. To figure out if the correlation is significant or if it is just due to chance you can use a hypothesis test, which starts from a null hypothesis.

The null hypothesis works something like this. You start by assuming there is no correlation between the two variables, so that any correlation in your data is just due to chance. You then have an alternative hypothesis, which in this case is that there is a correlation between the use of snuff and oral cavity cancer. The p-value you get from your data is the probability of seeing a correlation at least as strong as yours if the null hypothesis were true. If the p-value is very low, it is unlikely that your correlation is just due to chance.

To determine this you need the correlation coefficient, the degrees of freedom and a test. If the p-value is greater than the significance level α (usually 0.05) you CANNOT reject the null hypothesis, since you do not have significant evidence of a correlation, so your result is inconclusive. Note that failure to reject the null hypothesis does not prove the null hypothesis. On the other hand, if the p-value is less than the significance level you can reject the null hypothesis in favour of the alternative hypothesis. The lower the p-value, the less likely it is that a correlation this strong would show up by chance alone.

To calculate the p-value you need a statistical test such as Student's t-test, where T = r√(n-2)/√(1-r²).

(n is the number of data points and the degrees of freedom DF = n-2).

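To get the Pearson correlation coefficient r you can use the following formula:

r = Σ(xᵢ - x̄)(yᵢ - ȳ) / √( Σ(xᵢ - x̄)² · Σ(yᵢ - ȳ)² )

(xᵢ and yᵢ are the individual data points, and x̄ and ȳ are the means of the two variables.)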

You could probably get a calculator to take care of this part for you.
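If you'd rather see what the calculator is doing, here is a minimal Python sketch that computes r from the standard definition (sum of the products of the deviations from the means, divided by the square root of the product of the summed squared deviations). The function name `pearson_r` and the two lists are my own placeholders, and the numbers are made up purely for illustration, so swap in your real data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of the products of the deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the summed squared deviations.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Made-up placeholder numbers, NOT real snuff or cancer data.
snuff_use = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
cancer_rate = [2.1, 2.4, 3.9, 3.5, 5.2, 5.0]

r = pearson_r(snuff_use, cancer_rate)
print(f"r = {r:.3f}")
```

r always lands between -1 and +1, so if you get anything outside that range something went wrong with the input.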

Once you have the value of T for your data, you can use a calculator (or table) to obtain the p-value by inputting T and the degrees of freedom. You also need to select a significance level; α = 0.05 is the usual choice.
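In case it helps to see the whole step in one place, here is a rough Python sketch of going from r and n to T and then to a two-sided p-value. It uses only the standard library: the area under the t-distribution is approximated with Simpson's rule, which is my own shortcut rather than what the online calculator necessarily does. The function names and the values r = 0.9 and n = 6 are hypothetical, just for illustration.

```python
import math

def t_statistic(r, n):
    """T = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 data points")
    if abs(r) >= 1:
        raise ValueError("T is undefined for |r| >= 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def t_pdf(x, df):
    """Probability density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value: probability of |T| at least this large under the null."""
    a = abs(t)
    if a == 0:
        return 1.0
    # Simpson's rule on [0, |t|]; steps must be even.
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)

# Hypothetical values for illustration only, not from any real data set.
r, n = 0.9, 6
alpha = 0.05
t = t_statistic(r, n)
p = two_sided_p(t, n - 2)
print(f"T = {t:.3f}, DF = {n - 2}, p = {p:.4f}")
print("reject the null hypothesis" if p < alpha else "cannot reject the null hypothesis")
```

It is worth checking the result against the online calculator or a t-table for your own numbers, since this is only a numerical approximation.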

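There are one-sided right-tailed tests and one-sided left-tailed tests as well as two-sided tests. You need to make sure to select the correct one depending on your hypothesis: right-tailed if you are only testing for a positive correlation, left-tailed if you are only testing for a negative correlation, and two-sided if you are testing for a correlation in either direction. I included a link that explains when to use which.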
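Because the t-distribution is symmetric, the one-sided and two-sided p-values are directly related, so if your calculator only gives you the two-sided value you can convert it. Here is a small sketch of that conversion; the function name `one_sided_p` and the example numbers are hypothetical.

```python
def one_sided_p(two_sided_p, t, tail):
    """Convert a two-sided p-value to a one-sided one using the sign of T.

    tail="right" tests for a positive correlation,
    tail="left" tests for a negative correlation.
    """
    if tail not in ("right", "left"):
        raise ValueError("tail must be 'right' or 'left'")
    half = two_sided_p / 2
    if tail == "right":
        # T on the hypothesised side: half the two-sided value, else the rest.
        return half if t > 0 else 1 - half
    return half if t < 0 else 1 - half

# Hypothetical example: a positive T with a two-sided p-value of 0.08.
print(one_sided_p(0.08, 2.1, "right"))
print(one_sided_p(0.08, 2.1, "left"))
```

Note that you should pick the tail before looking at the data. Choosing it afterwards to get a smaller p-value defeats the point of the test.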

You might want to do some further reading on the following topics to get more detailed and accurate information since I might have missed something:

Testing the Significance of the Correlation Coefficient: https://stats.libretexts.org/Bookshelves/Introductory_Statistics/Book%3A_Introductory_Statistics_(OpenStax)/12%3A_Linear_Regression_and_Correlation/12.05%3A_Testing_the_Significance_of_the_Correlation_Coefficient

Pearson correlation coefficient: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

Null hypothesis: https://en.wikipedia.org/wiki/Null_hypothesis

p-value: https://en.wikipedia.org/wiki/P-value

Statistical significance: https://en.wikipedia.org/wiki/Statistical_significance

Student's t-Distribution: https://en.wikipedia.org/wiki/Student's_t-distribution

Left- or right-tailed test: https://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/how-to-decide-if-a-hypothesis-test-is-a-left-tailed-test-or-a-right-tailed-test/

Online p-value calculator: https://www.socscistatistics.com/pvalues/tdistribution.aspx