We accept projects in one of two themes: Statistical Analysis or Machine Learning. The main difference between them is that Statistical Analysis tries to find dependencies between variables, whereas Machine Learning is used for prediction. However, there is no clear-cut boundary between the two. If you find it hard to decide which category fits your project, you may submit it in either one.
Evaluation Committee
University faculty members, statisticians, and data scientists, whose decisions are final, will judge the reports on the following:
I. Statistical Analysis
Question to answer using statistical analysis: The statistical question that you are trying to answer should be clearly stated, and the data needed to support the analysis should be attainable.
Data Collection: Candidates should be able to provide evidence that they collected the data themselves. Candidates should explore the data to determine whether its quality is adequate to answer the question, and should demonstrate an understanding of the data. A data quality report that includes data summaries may be provided.
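As a rough illustration of what a data quality report with data summaries could contain, the sketch below counts missing values and computes basic summary statistics for each numeric column. It uses only the Python standard library. The function name data_quality_report, the survey records, and the column names are hypothetical and serve only as an example; candidates are free to use any tool or layout.

```python
import statistics

def data_quality_report(rows, numeric_columns):
    """Summarize each numeric column: row count, missing values, basic statistics."""
    report = {}
    for col in numeric_columns:
        raw = [row.get(col) for row in rows]
        # Treat None and empty strings as missing values.
        present = [float(v) for v in raw if v not in (None, "")]
        summary = {"n": len(raw), "missing": len(raw) - len(present)}
        if present:
            summary.update({
                "min": min(present),
                "max": max(present),
                "mean": statistics.mean(present),
                "median": statistics.median(present),
                # The sample standard deviation needs at least two values.
                "stdev": statistics.stdev(present) if len(present) > 1 else 0.0,
            })
        report[col] = summary
    return report

# Hypothetical survey records; the column names are illustrative only.
survey = [
    {"hours_slept": "7.5", "test_score": "82"},
    {"hours_slept": "6.0", "test_score": "74"},
    {"hours_slept": "",    "test_score": "90"},
    {"hours_slept": "8.0", "test_score": "88"},
]

report = data_quality_report(survey, ["hours_slept", "test_score"])
for column, summary in report.items():
    print(column, summary)
```

A summary like this makes it easy to spot columns with many missing entries or implausible minimum and maximum values before any analysis begins.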
Data Visualization/Exploration: Relevant, well-labeled, and accurate graphs and tables of the data should be included.
Data Analysis: The statistical methods used should be at the level of an introductory statistics course. These may include descriptive statistics, basic sampling methods, designed experiments, probability, confidence intervals (parametric and nonparametric approaches), hypothesis testing, simple or multiple linear regression, one- or two-factor ANOVA, simulation-based inference (randomization tests or bootstrapping), and other nonparametric methods.
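To give a sense of the expected level, here is a minimal sketch of one of the simulation-based methods mentioned above: a percentile bootstrap confidence interval for a mean. It uses only the Python standard library. The function name bootstrap_mean_ci, its default parameters, and the measurements are assumptions made for illustration, not a required approach.

```python
import random
import statistics

def bootstrap_mean_ci(sample, n_resamples=5000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lower_index = round((alpha / 2) * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return means[lower_index], means[upper_index]

# Hypothetical measurements (heights in cm); illustrative values only.
heights_cm = [158.2, 162.5, 171.0, 165.3, 169.8, 174.1, 160.7, 167.4, 172.9, 163.6]

low, high = bootstrap_mean_ci(heights_cm)
print(f"sample mean: {statistics.fmean(heights_cm):.2f}")
print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

The percentile method is only one of several bootstrap intervals. With a sample this small, the interval should be read as approximate.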
Conclusion: The conclusion should restate the question and the answer, and should provide an overall picture of the project. Candidates should note what has and has not been done, and what the next steps for further study may be.
Final report: The details of the steps described above form the final report. The final report should be well organized and well written.
II. Machine Learning
Question to answer using Machine Learning: What is the objective of the project? What is the target variable? What type of problem is it? The examples below may help:
- How much or how many? (A regression problem)
- Which category? (A classification problem)
- Which group? (A clustering problem)
- Is this weird? (An anomaly detection problem)
- Which option should be taken? (A recommendation problem)
Then you need to decide what kind of data you will need to solve your problem, and how to collect it.
Data Collection: Candidates should be able to provide evidence that they collected the data themselves. Candidates should explore the data to determine whether its quality is adequate to answer the question, and should demonstrate an understanding of the data. A data quality report that includes data summaries may be provided.
Data Visualization/Exploration: Relevant, well-labeled, and accurate graphs and tables of the data should be included.
Modeling: Candidates should demonstrate an understanding of feature selection and feature engineering; model training, validation, and testing; and the use of popular ML packages. Artifacts may include raw data sources and feature sets; a model report for each model with its parameters; accuracy, precision, recall, ROC, AUC, and variable importance; and a discussion of overfitting if applicable. The methods used should be at the level of an introductory Machine Learning or Data Mining course.
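As an illustration of the evaluation metrics listed above, the sketch below computes accuracy, precision, recall, and AUC for a binary classifier on a held-out test set, using plain Python. The labels, scores, 0.5 threshold, and function names are hypothetical. In practice, ML packages such as scikit-learn provide equivalent functions, and candidates may use those instead.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical held-out test labels and model scores; illustrative only.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.7, 0.3, 0.2, 0.6, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

metrics = classification_metrics(y_test, y_pred)
auc = roc_auc(y_test, scores)
print(metrics)
print("AUC:", auc)
```

Reporting these metrics on data that was not used for training, and comparing them with the training-set values, is one simple way to support a discussion of overfitting.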
Conclusion: The conclusion should restate the question and the answer, and should provide an overall picture of the project. Candidates should note what has and has not been done, and what the next steps for further study may be.
Final report: The details of the steps described above form the final report. The final report should be well organized and well written.
More information is to come...