Database Systems Architecture - Project


You are asked to implement an external-memory merge-sort algorithm (such as the one described in Section 15.4.1 of TCB), and examine its performance under different parameters. As a warm-up exercise you will need to explore several different ways to read data from, and write data to secondary memory. The overall goal of the assignment is to get real-world experience with the performance of external-memory algorithms. Be sure to read the full assignment for full details and modalities.

Practicalities & Evaluation

The assignment should be solved in groups of 3 (if the total number of students is not divisible by 3, at most two groups of 2 students will be allowed)

This project contributes 6/20 to the overall grade, and the written exam contributes the remaining 14/20 points. Your solution should be pushed to the Git repository (see full assignment) no later than 6 january 2019. You get a penalty of -1/6 points for each day that your solution is delayed.

Overview of groups

The group compositions are online: sorted by group number and sorted by last name

Questions & Answers

  • For generating the file to read, do we need to consider only numbers from 0 to the MAXVALUE or do we also need to include negative numbers?
    The generation procedure should generate numbers between [MIN_INT_VAL, MAX_INT_VAL] where MIN_INT_VAL is the minimum integer number supported on your programming platform and MAX_INT_VAL is the maximum integer number supported on your platform. Typically MIN_INT_VAL includes negative integers.
  • In the third I/O variant, do we need to implement our own buffer manually, or do we need to configure the buffer size parameter available through Java BufferedInputStream class and test it for different values?
    The objective of the 3rd IO variant is to be able to experiment with different values for B. Hence, it suffices to configure the buffer size parameter available through the Java BufferedInputStream class.
  • For step 1 of section 1.3, do we need to use (can we use) the “readFile” implementation that we created on the number 4 of section 1.1 or do we need to create a different one?
    You are free to use any of the file implementations from step 1.1.
  • For step 3 of section 1.3, do we need to use (can we use) the “multiway merge” implementation that we created in section 1.2? Or do we need to create a different one?
    Yes, indeed, you should use here the multiway merge implementation that you created in section 1.2.
  • For the tests, how can we proceed systematically and make use of existing testing environments?
    As described in the project assignment there are several frameworks available on the web that allow conducting experiments in a systematic way. These are called micro benchmark frameworks. You are free to use such framework if you find that convenient. Note that, typically, these frameworks do not ship code to generate test files automatically (since that is rather application-specific).
teaching/infoh417/project.txt · Last modified: 2018/10/30 15:00 by svsummer