This assessment involves writing a report that summarises a data science investigation you have conducted on data you have collected yourself. The investigation must draw on the main topics covered in the subject, most notably data pre-processing (representation, wrangling, tidying) and exploratory data visualisation using R/RStudio.
It combines techniques learned in previous assessments. However, the pre-processing and exploratory steps to be carried out will not be specified; you must make your own independent choices and decisions. Coding is not marked, so there is no expectation that you submit code. If, however, you think particular code segments would strengthen your presentation (and argument), you may include them as supplementary material and highlight and refer to them in the main text.
You are required to find your own dataset. It must contain at least 1000 observations of 5 variables, unless the data science problem you are addressing involves spatio-temporal data, in which case fewer than 5 dimensions may be allowed.
The report must not exceed 10 pages. The main body text must not be longer than 5 pages. The remaining (up to 5) pages should contain any supplementary material, including graphs or code (code will not be marked unless it is clearly referenced in the main text or linked to the analysis).
Download the Capstone Project assignment document for full details of this assessment, including the marking scheme.
If you use Word or any other program, save your work as a PDF for submission.
Include the following in your submission:
- Your work in PDF format
- The task cover sheet
Upload both files at the same time.