This assignment must be submitted individually. You are encouraged to discuss and exchange solutions during the lab sessions or on Slack, but you are not allowed to share code electronically. Plagiarism, unauthorized collaboration and other offenses under Concordia's Academic Code of Conduct will be firmly handled.
To submit this assignment, you will have to be familiar with Git and GitHub. If you have never used these technologies, it is recommended to go through the following tutorials:
In particular, you will have to be able to:
-
Clone a Git repository from GitHub: find the URL of a GitHub repository
and clone it using
git clone <repo_url>
. -
Commit modifications to a local clone of a Git repository:
git add <file>
andgit commit -m "message"
. -
Push modifications from your local clone to the origin repository on GitHub:
git push
.
You have to submit your assignment through GitHub classroom, using the following procedure:
- Accept the assignment at https://classroom.github.com/a/J0XB3ioO. This will create your own copy of the assignment repository, located at http://github.com/tgteacher/bigdata-la3-your_github_username.
- Clone your copy of the assignment repository on your computer, and
implement the functions in
answers/answer.py
, following the instructions in the documentation strings. A skeleton of your answer file already exists in fileanswers/answer.py
with the required syntax for each function. - Commit your solution to your local copy of the assignment repository.
- Push your solution to your GitHub copy of the assignment repository.
Important: please make sure that the email address you use in Moodle is added to your GitHub account (you can add multiple addresses to your GitHub account).
You can repeat steps 3 and 4 as many times as you wish. Your assignment will be graded based on a snapshot of your repository taken on the submission deadline.
Your assignment will be automatically graded through software tests.
The tests are already available in directory tests
. You
may want to run them as you implement your solution, to check that your
code passes them. To do so, you will have to install pytest
and simply
run pytest tests
in the base directory of the assignment.
Your grade will be determined from the number of passing tests as returned by pytest. All tests will contribute equally to the final grade. For instance, if 20 tests are evaluated, and your solution passes 18 tests, then your grade will be 90%.
This grading scheme is meant to be transparent and objective. However, it is also radical and you should be very meticulous with your coding: make a single syntax error in your answer file, such as a spurious tabulation character, and all the tests will fail! To avoid that kind of surprises, you are strongly encouraged to check the output of the tests on Travis CI regularly.
The rules below aim at discouraging cheating. They might sound a bit harsh, but in general be cool: if you don't aim at cheating, you probably won't :)
- You are not allowed to modify the tests to make them pass. Every deliberate attempt to modify the tests will result in a grade of 0.
- You must use the libraries mentioned in the instructions to implement the assignment. Any attempt to implement the solution with a different library, for instance Dask when Spark was expected, will result in a grade of 0.
- You are not allowed to make the tests pass using a hard-coded solution. Your solution must, in principle, apply to other similar datasets. Any hard-coded solution will receive the grade of 0.
- Any deliberate attempt to trick the grading system by making the tests pass without providing a correct, non-hardcoded, solution will receive the grade of 0.
Your code will be tested with Python 3.6 in a Ubuntu environment provided by Travis CI. It is your responsibility to ensure that the tests will pass in this environment. The following resources will help you.
Python 3.6 is available in the computer labs and can be loaded using
module load python/3.6
. You can check the version of Python that
you are using by running python --version
. Computer labs can easily be
accessed remotely, using ssh
.
It is strongly suggested that you run the disclosed tests before
submitting your assignment, using pytest
as explained previously.
Live feedback on your assignment is provided through Travis CI here. You will have to sign-in using your GitHub account to see your assignment repository. Your grade will be determined from the result of the tests executed in Travis CI.