# Blocks (400 + 300 pts)¶

## Question:¶

I recovered as much data as I could. Can you recover the flag?

blocks-data

## Write-Up:¶

Good to Know: There are several instances of this challenge, so various teams may have received different instances (with different flags). However, the solution is essentially the same.

1. Using the command file blocks-data does note reveal anything useful about the file.

2. Next, run the command strings blocks-data to see the string contents of the file:

@
CREATE TABLE "data" (
ID
INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
Data
BLOB NOT NULL,
Cat
INTEGER NOT NULL
indexsqlite_autoindex_data_1data
Ytablesqlite_sequencesqlite_sequence
CREATE TABLE sqlite_sequence(name,seq)
_tablecategorycategory
CREATE TABLE category (
ID
INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
Cat
TEXT NOT NULL
...


It seems like a DB, most probably a SQLite DB (due to its small size).

3. The command hd data | head reveals that the file header is corrupt.

Perhaps we can use a valid file header to repair it?

The script repair-header.sh does exactly that: It generates an empty SQLite DB, and concatenates various prefixes of its header (of length 1 to 100), to data file.

The script succeeds after a few trials, and generates data.sqlite.

4. Use SQLite Manager or any SQLite tool of your choice to investigate the DB. You see that it has two tables: category and data.

5. category contains 8 rows:

ID  CAT
---------
1   CHRM
2   IDAT
3   ICCP
4   IHDR
5   TEXT
6   TIME
7   PLTE
8   TRNS


Simply Google a few of these keywords to reveal that they refer to various chunks in PNG data structure.

6. The data table has three columns: ID, Data, and Cat.

Perhaps the data table contains the actual contents of the PNG file?

Each row of the table seems to contain the data related to one chunk of the PNG file, whose type is specified by the Cat column.

7. A minimal PNG file has the following structure:

MAGIC BYTES     # = '89504e470d0a1a0a'
IHDR            # Must appear first
IDAT
IEND            # Must appear last


The data table, however, specifies multiple IDAT chunks, a PLTE chunk, a tRNS chunk, and no IEND chunk!

We'll soon fix that, but let's first take a closer look at the chunk data structure.

8. Each chunk must have the following structure:

Length
Type
Data
CRC


A quick look at the data table reveals that none of the chunks have the above structure; they simply contain raw data.

So, we write a function raw2chunk() to compute the remaining fields for each chunk (See blocks-solution.py).

9. The contents of the IDAT chunk is scattered among multiple rows. That's typical of PNG files to have multiple IDAT chunks, as it aids in better streaming the image.

The only problem is to determine the order of the IDAT chunks.

A huge help is to know that the raw data of IDAT chunks is compressed using zlib.

So, if we have $n$ IDAT chunks, we can try all $n!$ combinations, and check which combination decompresses correctly.

However, that strategy is time consuming, so we can come up with some clever tricks:

• zlib header of PNG files (almost always) starts with the byte 0x78, which indicates deflate compression with a 32-KB window. Only one chunk satisfies this property, which is deemed as the first chunk.

• All chunks have equal size, except one which is smaller. The smaller chunk is the last chunk.

• Removing the first and last chunks, only $n-2$ chunks remain. In Python, zlib has a decompressobj() function, which returns an object capable of decompressing partial streams (or erring on invalid streams). We use this capability in a backtracking algorithm, which creates a search tree and prunes those subtrees with no valid child as early as possible. This way, we don't need to try all $(n-2)!$ combinations.

• Using zlib.decompress() function, we prune those combinations of length $n$ which are valid as partial streams but invalid as a whole stream.

The above algorithm is implemented in find_correct_idat() function (See blocks-solution.py).

It returns as soon as one valid combination is found. The calculations may take several minutes.

10. We reconstruct all chunks using the raw2chunk() function. (For IEND, the data field is empty.)

The chunks are then concatenated in arbitrary order, with the constraint that IHDR and IEND must be first and last, respectively.

The MAGIC bytes are added, and the file is saved.

11. As the file is flipped and rotated, the necessary transformations are applied to make it upright.

12. The following text is written in the file:

SharifCTF{D2E857E07AD021CB52AE76DD9AD9B6DB}