L1: The Library of Alexandria

It is the third century BC. Callimachus of Cyrene (Καλλίμαχος ὁ Κυρηναῖος) has just completed his famous Pinakes (Πίνακες), a great catalog, in 120 volumes, of the hundreds of thousands of scrolls held by the Library of Alexandria. It took an enormous amount of work, but, as you might guess, a list this large is still inconvenient to use. Undaunted, Callimachus is already planning another improvement to the indexing of the Library's resources. For this, he intends to use the latest achievement of Hellenic technology: a computer running the Λίνουξ system.

Callimachus will take care of entering the scroll information into the computer. Your task is to write a program that runs on the Λίνουξ system and uses the ΠΟΣΙΞ API to build the appropriate indexes.

Remember to free all unused resources (memory, descriptors), check for errors in system functions, and follow good programming practices.

Stages

Stage 1 (6 pts.)

In this stage, the program takes one argument: the path to a file containing the metadata of a book from the library. The metadata uses a simple format in which each line holds a key:value pair. You may assume that no line contains more than one : character (other malformed input, however, must be handled). Read the file, then print the values of the author, title, and genre fields, in that order. If a field is missing, print missing! in its place. For example, for a file with the following content:

uthor:Plutarch
latin_title:De fluviorum et montium nominibus et de iis, quae in illis inveniuntur
title:Περὶ ποταμῶν καὶ ὀρῶν ἐπωνυμίας
genre:geography
incipit:When Chrysippe, through the anger of Aphrodite
had fallen into a yearning 
for Hydaspes

you should print:

author: missing!
title: Περὶ ποταμῶν καὶ ὀρῶν ἐπωνυμίας
genre: geography

(there is a typo in the first line, so the program does not find the author field).

Hint: The standard library contains many useful functions for such parsing, like getline, strchr, strcmp or strdup.

Hint 2: This functionality will be useful later, although not immediately - it’s best to place it in a separate function.
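A minimal sketch of the Stage 1 parser, built on the functions named in the first hint (getline, strchr, strcmp, strdup) and kept in its own function as the second hint suggests. The names struct book_meta, read_metadata, free_metadata and print_metadata are hypothetical, not taken from the starter code. Lines without a colon (such as the wrapped incipit lines in the example) are skipped, and a repeated key keeps its last value; both are design choices of this sketch, not requirements stated in the task.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* The fields this stage prints; NULL means "missing". */
struct book_meta {
    char *author;
    char *title;
    char *genre;
};

/* Return the slot in m for a key we care about, or NULL for other keys. */
static char **meta_slot(struct book_meta *m, const char *key)
{
    if (strcmp(key, "author") == 0) return &m->author;
    if (strcmp(key, "title") == 0) return &m->title;
    if (strcmp(key, "genre") == 0) return &m->genre;
    return NULL;
}

void free_metadata(struct book_meta *m)
{
    free(m->author);
    free(m->title);
    free(m->genre);
    m->author = m->title = m->genre = NULL;
}

/* Parse key:value lines from path into m. Returns 0 on success, -1 on error. */
int read_metadata(const char *path, struct book_meta *m)
{
    m->author = m->title = m->genre = NULL;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    int rc = 0;
    while ((len = getline(&line, &cap, f)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';              /* strip the newline */
        char *colon = strchr(line, ':');
        if (!colon)
            continue;                          /* malformed line: no key */
        *colon = '\0';                         /* line is now the key */
        char **slot = meta_slot(m, line);
        if (!slot)
            continue;                          /* a key we do not print */
        char *value = strdup(colon + 1);
        if (!value) { rc = -1; break; }
        free(*slot);                           /* repeated key: keep the last value */
        *slot = value;
    }
    if (ferror(f))
        rc = -1;
    free(line);
    if (fclose(f) == EOF)
        rc = -1;
    if (rc == -1)
        free_metadata(m);
    return rc;
}

void print_metadata(const struct book_meta *m)
{
    printf("author: %s\n", m->author ? m->author : "missing!");
    printf("title: %s\n", m->title ? m->title : "missing!");
    printf("genre: %s\n", m->genre ? m->genre : "missing!");
}
```

A Stage 1 main would then only need to call read_metadata(argv[1], &m), check the result, and follow with print_metadata(&m) and free_metadata(&m).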

Stage 2 (6 pts.)

The database created by Callimachus mirrors the physical layout of the Great Library in its directory structure. Nested directories stand for the wings of the building, its rooms, shelves, chests, and so on. At the bottom of this hierarchy are regular files whose names reflect the title written on the outside of the scroll (which need not match the actual title of the book stored in the metadata).

In this stage, change the program so that it no longer takes any arguments. On startup, it recursively traverses the library directory; every regular file represents a book. In the program's directory, create a directory named index (if it already exists, report an error). Inside it, create a directory named by-visible-title and place in it relative symbolic links to all books (regular files) in the library, keeping their file names but flattening the nested structure. If a name occurs more than once, report an error. Use the nftw function for the traversal.

Hint: Creating relative symbolic links (see man 3p symlink) is tricky when the link is not created in the working directory, because a relative target is resolved against the directory that contains the link, not against the working directory. You can use symlinkat or chdir. The join_paths function from the starter code will also be useful. If you use symlinkat, this is a case where a global variable is acceptable for passing the directory descriptor to the nftw callback.
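One possible shape of Stage 2, following the symlinkat route with a global descriptor. This sketch assumes the library path passed in is relative to the working directory (for example library), so every link target is that path prefixed with ../../; it builds paths with snprintf instead of the starter code's join_paths, whose signature is not shown here. The names build_visible_index, link_visible and visible_dirfd are hypothetical, and the nftw depth limit of 16 is an arbitrary choice.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <ftw.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Descriptor of index/by-visible-title. A global, because the nftw
   callback has no parameter for user data. */
static int visible_dirfd = -1;

static int link_visible(const char *fpath, const struct stat *sb,
                        int typeflag, struct FTW *ftwbuf)
{
    if (typeflag != FTW_F || !S_ISREG(sb->st_mode))
        return 0;                              /* only regular files are books */

    /* The link sits two levels below the working directory and a relative
       target is resolved from the link's own directory, hence "../../". */
    char target[PATH_MAX];
    int n = snprintf(target, sizeof target, "../../%s", fpath);
    if (n < 0 || (size_t)n >= sizeof target) {
        fprintf(stderr, "%s: path too long\n", fpath);
        return 1;                              /* nonzero stops nftw */
    }

    const char *name = fpath + ftwbuf->base;   /* file name without directories */
    if (symlinkat(target, visible_dirfd, name) == -1) {
        perror(name);                          /* EEXIST: the name is repeated */
        return 1;
    }
    return 0;
}

/* Returns 0 on success, -1 on any error. */
int build_visible_index(const char *library)
{
    if (mkdir("index", 0755) == -1) {          /* an existing index is an error */
        perror("index");
        return -1;
    }
    if (mkdir("index/by-visible-title", 0755) == -1) {
        perror("index/by-visible-title");
        return -1;
    }
    visible_dirfd = open("index/by-visible-title", O_RDONLY | O_DIRECTORY);
    if (visible_dirfd == -1) {
        perror("open");
        return -1;
    }
    /* FTW_PHYS: do not follow symbolic links found inside the library. */
    int rc = nftw(library, link_visible, 16, FTW_PHYS);
    if (rc == -1)
        perror(library);                       /* nftw's own failure */
    if (close(visible_dirfd) == -1) {
        perror("close");
        rc = -1;
    }
    visible_dirfd = -1;
    return rc == 0 ? 0 : -1;
}
```

Because a second link with the same name makes symlinkat fail with EEXIST, the "repeated name" check comes for free; the callback returns 1 rather than -1 so that its errors can be told apart from nftw's own.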

Stage 3 (4 pts.)

Create two new indexes next to by-visible-title.

  • The by-title index is similar to the one from the previous stage, but it uses the title field from the metadata as the file name. If the field does not exist, the file should be skipped. If the title is longer than 64 bytes, it should be truncated to this length.

  • The by-genre index places books in subdirectories of the form index/by-genre/<genre>, where <genre> is the value of the genre field in the metadata. Inside them, each link is named after the book's title from the metadata, as in by-title. The genre value should also be truncated to 64 bytes if necessary.
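A sketch of the per-book work for Stage 3, assuming the caller has already parsed the metadata and opened the index directory as index_fd, with by-title and by-genre already created inside it. The names add_title_links, truncate_field and make_target are hypothetical. Truncation is byte-based as the task specifies, so a multi-byte UTF-8 character (in the Greek titles, for instance) may be cut in half; titles containing / are not handled. The task does not say what to do with a book that has a title but no genre; this sketch assumes it simply gets no by-genre link.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define FIELD_MAX 64  /* truncation limit from the task, in bytes */

/* Copy at most FIELD_MAX bytes of src into dst and terminate it. */
static void truncate_field(char dst[FIELD_MAX + 1], const char *src)
{
    strncpy(dst, src, FIELD_MAX);
    dst[FIELD_MAX] = '\0';
}

/* Write "<prefix><fpath>" into target; -1 with ENAMETOOLONG if it does not fit. */
static int make_target(char *target, size_t size, const char *prefix,
                       const char *fpath)
{
    int n = snprintf(target, size, "%s%s", prefix, fpath);
    if (n < 0 || (size_t)n >= size) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}

/* fpath is relative to the working directory; title/genre may be NULL. */
int add_title_links(int index_fd, const char *fpath,
                    const char *title, const char *genre)
{
    char name[FIELD_MAX + 1], g[FIELD_MAX + 1];
    char lpath[PATH_MAX], target[PATH_MAX];

    if (!title)
        return 0;                              /* no title field: skip the book */
    truncate_field(name, title);

    /* index/by-title/<title> -> ../../<fpath> */
    snprintf(lpath, sizeof lpath, "by-title/%s", name);
    if (make_target(target, sizeof target, "../../", fpath) == -1 ||
        symlinkat(target, index_fd, lpath) == -1) {
        perror(lpath);
        return -1;
    }

    if (!genre)
        return 0;                              /* assumption: no genre, no by-genre link */
    truncate_field(g, genre);

    /* index/by-genre/<genre> is shared by many books, so EEXIST is fine. */
    snprintf(lpath, sizeof lpath, "by-genre/%s", g);
    if (mkdirat(index_fd, lpath, 0755) == -1 && errno != EEXIST) {
        perror(lpath);
        return -1;
    }

    /* index/by-genre/<genre>/<title> -> ../../../<fpath> (one level deeper) */
    snprintf(lpath, sizeof lpath, "by-genre/%s/%s", g, name);
    if (make_target(target, sizeof target, "../../../", fpath) == -1 ||
        symlinkat(target, index_fd, lpath) == -1) {
        perror(lpath);
        return -1;
    }
    return 0;
}
```

Calling this from the Stage 2 nftw callback, right after parsing each book's metadata, builds all three indexes in a single traversal.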

Stage 4 (6 pts.)

The last feature of the index is a check for missing books. For this purpose, another programmer has already created a file listing all books in a binary format. In this stage, the program accepts an optional first argument: the path to this database file. Read its contents and check whether any books are missing.

The database is a sequence of entries, each consisting of a 4-byte file size (unsigned int) followed by the first 64 bytes of the title from the metadata. If the title is shorter than 64 bytes, the field is padded with zeros, so every entry is exactly 68 bytes long and the number of entries can be computed from the file size. The program should still traverse the library as before. To check whether books are missing, use the by-title index: for each database entry, if the book is missing, print Book "<title>" is missing, and if its size does not match, print Book "<title>" size mismatch (<expected size> vs <file size>). Use the readv function to load the database.
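Since every entry is exactly 68 bytes, the entry count follows directly from the file size. A small helper, sketched under the assumption that the database is a regular file, can get that size with fstat and reject a file whose size is not a whole number of entries, which would point to a truncated or corrupt database. count_db_entries and count_db_entries_path are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#define DB_ENTRY_LEN 68  /* 4-byte size + 64-byte title */

/* Number of entries in the open database fd, or -1 if fstat fails or the
   size is not a whole number of entries (errno is then EINVAL). */
long count_db_entries(int fd)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    if (st.st_size % DB_ENTRY_LEN != 0) {
        errno = EINVAL;
        return -1;
    }
    return (long)(st.st_size / DB_ENTRY_LEN);
}

/* Convenience wrapper: open path read-only, count entries, close. */
long count_db_entries_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    long n = count_db_entries(fd);
    if (close(fd) == -1)
        return -1;
    return n;
}
```

The count is handy if you prefer to allocate all entries up front; a reader that simply calls readv until end of file does not strictly need it, but the size check is still a cheap sanity test.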

The repository contains two sample databases. The first, database_correct, lists all the books in the library directory. The second, database_missing, also lists books that are not in the library, so checking against it should report missing books.

Hint: To ensure the title string is always null-terminated, the structure describing a single book can use an array of 65 chars with the last one set to zero. This way, even if the title was truncated and fills all 64 bytes, the string is still correctly terminated after reading.
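A sketch of the check itself, combining readv with the 65-char array from the hint: one readv call fills the 4-byte size and the 64 title bytes of an entry in a single system call. It assumes the by-title index has already been built in the working directory, that unsigned int is 4 bytes (checked with _Static_assert), and that the database uses the host's byte order. check_database and read_entry are hypothetical names. Treating a short read as a corrupt database is also a choice of this sketch.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>

#define DB_TITLE_LEN 64
#define DB_ENTRY_LEN (4 + DB_TITLE_LEN)  /* 68 bytes per entry */

_Static_assert(sizeof(unsigned int) == 4, "database stores 4-byte sizes");

struct db_entry {
    unsigned int size;
    char title[DB_TITLE_LEN + 1];  /* extra byte keeps the string terminated */
};

/* Read one entry with readv. Returns 1 on success, 0 at EOF, -1 on error. */
static int read_entry(int fd, struct db_entry *e)
{
    struct iovec iov[2] = {
        { .iov_base = &e->size, .iov_len = sizeof e->size },
        { .iov_base = e->title, .iov_len = DB_TITLE_LEN },
    };
    ssize_t n = readv(fd, iov, 2);
    if (n == -1)
        return -1;
    if (n == 0)
        return 0;
    if (n != DB_ENTRY_LEN) {       /* partial entry: treat as corrupt */
        errno = EIO;
        return -1;
    }
    e->title[DB_TITLE_LEN] = '\0';
    return 1;
}

/* Returns the number of problems reported, or -1 on error. */
int check_database(const char *db_path)
{
    int fd = open(db_path, O_RDONLY);
    if (fd == -1) {
        perror(db_path);
        return -1;
    }
    struct db_entry e;
    int rc, problems = 0;
    while ((rc = read_entry(fd, &e)) == 1) {
        char path[PATH_MAX];
        struct stat st;
        snprintf(path, sizeof path, "index/by-title/%s", e.title);
        if (stat(path, &st) == -1) {   /* stat follows the symlink to the book */
            if (errno != ENOENT) {
                perror(path);
                rc = -2;
                break;
            }
            printf("Book \"%s\" is missing\n", e.title);
            problems++;
        } else if ((off_t)e.size != st.st_size) {
            printf("Book \"%s\" size mismatch (%u vs %lld)\n",
                   e.title, e.size, (long long)st.st_size);
            problems++;
        }
    }
    if (rc == -1)
        perror("readv");
    if (close(fd) == -1) {
        perror("close");
        rc = -1;
    }
    return rc < 0 ? -1 : problems;
}
```

Because stat follows the link, st_size is the size of the book file itself, and a dangling link is reported as a missing book. With the optional argument present, main could call check_database(argv[1]) after the indexes are built.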

Starting code
