GEX thesis source code, full text, references
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
gex-thesis/ch.fat16.tex

172 lines
15 KiB

\chapter{The FAT16 File System and Its Emulation} \label{sec:fat16}
7 years ago
A \gls{FS} is used by GEX to provide the user comfortable access to the configuration files. By emulating a mass storage \gls{USB} device, the module appears as a thumb drive on the host \gls{PC}, and the user can edit its configuration using their preferred text editor. The FAT16 file system was selected for its simplicity and good cross-platform support~\cite{os-support-table}.
7 years ago
Three variants of the \gls{FAT} file system exist: FAT12, FAT16, and FAT32. FAT12 was used on floppy disks and is similar to FAT16, except for additional size constraints and a \gls{FAT} entry packing scheme. FAT16 and FAT32 are FAT12's later developments from the time when hard disks became more common and the old addressing scheme could not support their larger capacity.
7 years ago
This chapter will explain the structure of FAT16 and the challenges faced when trying to emulate it without a physical storage medium. A more detailed overview of the file system can be found in literature~\cite{ms-fat,fat16-brainy,fat16-maverick,fat16-phobos,fat-whitepaper} consulted during the GEX firmware development, with the Microsoft white paper~\cite{fat-whitepaper} giving the most complete description.
\section{The General Structure of the FAT File System}
7 years ago
The storage medium is organized into \textit{sectors} (or \textit{blocks}), usually 512 bytes long; that is the smallest addressing unit used by the file system. The disk starts with a \textit{boot sector}, also called the \gls{MBR}, followed by optional reserved sectors, one or two copies of the file allocation table, and the root directory. All disk areas are aligned to a sector boundary:
7 years ago
7 years ago
\begin{table}[H]
\centering
\begin{tabular}{ll}
\toprule
\textbf{Disk area} & \textbf{Size / Notes} \\
\midrule
Boot sector & 1 sector \\
Reserved sectors & optional \\
FAT 1 & 1 or more sectors, depends on disk size \\
FAT 2 & optional, a back-up copy of FAT 1 \\
Root directory & 1 or more sectors \\
Data area & Organized in \textit{clusters} \\
\bottomrule
\end{tabular}
7 years ago
\caption{\label{tab:fat16_disk_areas}Areas of a FAT-formatted disk}
\end{table}
\subsection{Boot Sector}
This is a 1-sector structure which holds the \gls{OS} bootstrap code for bootable disks. The first 3 bytes are a jump instruction to the actual bootstrap code located later in the sector. What matters to us when implementing the file system is that the boot sector also contains data fields describing how the disk is organized, what file system is used, who formatted it, etc. The size of the \gls{FAT} and the root directory is defined here. The exact structure of the boot sector can be found in either of~\cite{ms-fat,fat16-brainy,fat16-maverick,fat16-phobos,fat-whitepaper} or in the attached GEX source code.
\subsection{File Allocation Table}
The data area of the disk is organized in clusters, logical allocation units composed of groups of sectors. The use of a larger allocation unit allows the system to use shorter addresses and thus support a larger disk capacity.
The \gls{FAT} acts as a look-up table combined with linked lists. In FAT16, it is organized in 16-bit fields, each corresponding to one cluster. The first two entries in the allocation table are reserved and hold special values set by the disk formatter and the host \gls{OS}: a ``media descriptor'' 0xFFF8 and a ``clean/dirty flag'' 0xFFFF/0x3FFF.
7 years ago
Files can span multiple clusters; each \gls{FAT} entry either holds the address of the following file cluster, or a special value:
7 years ago
\begin{itemize}[nosep]
\item 0x0000 -- free cluster
\item 0xFFFF -- last cluster of the file (still including file data)
\item 0xFFF7 -- bad cluster
\end{itemize}
7 years ago
The bad cluster mark, 0xFFF7, is used for clusters which are known to corrupt data due to a flaw in the storage medium.
\subsection{Root Directory}
7 years ago
A directory is a record on the disk that can span several clusters and holds information about the files and sub-directories contained in it. The root directory has the same structure as any other directory; the difference lies in the fact that it is allocated with a fixed position and size when the disk is formatted, whereas other directories are stored in the same way as ordinary files and their capacity can be increased by simply expanding to another cluster.
7 years ago
Directories are organized in 32-byte entries representing individual files. \Cref{tab:fat16_dir_entry} shows the structure of one such entry. The name and extension fields form the well-known ``8.3'' file name format known from MS DOS\footnote{``8.3'' refers to the byte size of the name and extension fields in the directory entry.}. Longer file names are encoded using the \gls{LFN} scheme~\cite{fat-lfn} as special hidden entries stored in the directory table alongside the regular ``8.3'' ones kept for backward compatibility.
\begin{table}
\centering
\begin{tabular}{lll}
\toprule
\textbf{Offset} & \textbf{Size (bytes)} & \textbf{Description}\\
\midrule
0 & 8 & File name (padded with spaces) \\
8 & 3 & File extension \\
11 & 1 & File attributes \\
12 & 10 & Reserved \\
22 & 2 & Creation time \\
24 & 2 & Creation date \\
26 & 2 & Address of the first cluster \\
28 & 4 & File size (bytes) \\
\bottomrule
\end{tabular}
7 years ago
\caption{\label{tab:fat16_dir_entry}Structure of a FAT16 directory entry}
\end{table}
7 years ago
\noindent
7 years ago
The first byte of the file name has special meaning:
\begin{itemize}
\item 0x00 -- indicates that there are no more files when searching the directory
\item 0xE5 -- marks a free slot; this is used when a file is deleted
\item 0x05 -- indicates that the first byte should actually be 0xE5, a code used in some character sets at the time, and the slot is \textit{not} free\footnote{The special meaning of 0xE5 appears to be a correction of a less than ideal design choice earlier in the development of the file system}.
7 years ago
\item Any other values, except 0x20 (space) and characters forbidden in a DOS file name, starts a valid file entry. Generally, only space, A--Z, 0--9, \verb|-|, and \verb|_| should be used in file names for maximum compatibility.
\end{itemize}
7 years ago
The attributes field contains flags such as \textit{directory}, \textit{volume label}, \textit{read-only} and \textit{hidden}. Volume label is a special entry in the root directory defining the disk's label shown by the host \gls{OS}. A file with the directory bit set is actually a pointer to a subdirectory, meaning that when we open the linked cluster, we will find another directory table.
7 years ago
\Cref{fig:fat_example} shows a possible organization of the GEX file system with two INI files, one spanning two clusters, the other being entirely inside one. The clusters need not be used completely; the exact sizes are stored in the files' directory entries.
\begin{figure}[h]
\centering
\includegraphics[scale=1.3] {img/fat-links.pdf}
7 years ago
\caption{\label{fig:fat_example}An example of the GEX virtual file system}
\end{figure}
\section{FAT16 Emulation}
7 years ago
The FAT16 file system is relatively straightforward to implement. However, it is not practical or even possible to keep the entire file system in memory on a small microcontroller like our STM32F072. This means that we have to generate and parse disk sectors and clusters on-demand, when the host reads or writes them. The STM32 \gls{USB} Device library helpfully implements the \gls{MSC} and provides \gls{API} endpoints to which we connect our file system emulator. Specifically, those are requests to read and write a sector, and to the read disk's status and its parameters, such as the capacity.
7 years ago
\subsection{DAPLink Emulator}
7 years ago
A FAT16 emulator was developed as part of the open-source \mbed DAPLink project~\cite{daplink}. It is used there for a drag-and-drop flashing of firmware images to the target microcontroller, taking advantage of the inherent cross-platform support (it uses the same software driver as any thumb drive, as discussed in \cref{sec:msc}). \mbed also uses a browser-based \gls{IDE} and cloud build servers, thus the end user does not need to install or set up any software to program a compatible development kit.
7 years ago
The GEX firmware adapts several parts of the DAPLink code, optimizing its \gls{RAM} usage and porting it to work with FreeRTOS. The emulator source code is located in the \mono{User/vfs} folder of the GEX repository; the original Apache 2.0 open-source software license headers, as well as the file names, have been retained.
7 years ago
As shown in \cref{tab:fat16_disk_areas}, the disk consists of several areas. The boot sector is immutable and can be loaded from the flash memory when requested. The handling of the other disk areas (\gls{FAT}, data area) depends on the type of access: read or write.
7 years ago
\subsection{Read Access}
7 years ago
7 years ago
The user can only read files that already exist on the disk; in our case, \verb|UNITS.INI| and \verb|SYSTEM.INI|. Those files are dynamically generated from the binary settings storage and, conversely, parsed as a byte stream without ever existing in their full form. This fact makes our task more challenging, as the file size cannot be easily measured and there is no obvious way to read a sector from the middle of a longer file. We solve this by implementing two additional functions in the INI file generation routine: a \textit{read window} and a \textit{dummy read mode}.
7 years ago
A read window is a specification of the byte range we wish to generate. The INI generator discards bytes before the start of the read window, writes those inside the window to a holding buffer, and stops the end of the window is reached. This lets us extract a sector from anywhere in a file. The second function, dummy read, is tied to the window function: we set the start index so high that it is never reached (e.g., 0xFFFFFFFF), and have the generator count discarded characters. This character counter holds the full file size once the generation is completed.
7 years ago
One more problem needs to be addressed: we need to know the mapping between the files and the clusters they are stored in. In our case, the files change only when the settings are modified. After each such change, an algorithm is run which measures the file sizes, allocates their clusters, and preserves this information for later use. When the host tries to read from the data area of the disk, we simply test if the requested sectors are occupied by any file, and if so, serve the corresponding part of it using the read window function. The \gls{FAT} can be dynamically generated from this information as well.
7 years ago
\subsection{Write Access}
7 years ago
Write access to the disk is more challenging to emulate than reading, as the host OS tends to be somewhat unpredictable. In GEX's case we are interested only in the action of overwriting an already existing file, but it is interesting to analyze other actions the host may perform as well.
7 years ago
It must be noted that due to the nonexistence of a physical storage medium, it is not possible to read back a file the host has previously written, unless we store or re-generate its content when such a read attempt occurs. The \gls{OS} may show the written file on the disk, but when the user tried to open it, the action either fails, or shows a cached copy. The emulator works around this problem by temporarily reporting that the storage medium has been removed after a file is written, forcing the host to drop any cached data and reload the disk.
\subsubsection{File Deletion}
A file is deleted by:
\begin{enumerate}
7 years ago
\item Marking all \gls{FAT} sectors used by the file as free
\item Replacing the first character of its name in the directory table by 0xE5 to indicate the slot is free
\end{enumerate}
7 years ago
From the perspective of emulation, we can ignore the \gls{FAT} access and only detect writes to the directory sectors. This is slightly more complicated when one considers that all disk access is performed in sectors: the emulator must compare the written data with the original bytes to detect what change has been performed. Alternatively, we could parse the entire written sector as a directory table and compare it with our knowledge of its original contents.
\subsection{File Name Change}
7 years ago
A file is renamed by modifying its directory entry. In the simple case of a short, 8.3 file name, this is an in-place modification of the file entry. Long file names, using the \gls{LFN} extension, are a complication, as the number of non-file entries holding the long file name might change, and subsequently the following entries in the table may shift or be re-arranged.
\subsection{File Creation}
A new file is created in three steps:
\begin{enumerate}
\item Finding free clusters and chaining them by writing the following cluster addresses (or 0xFFFF for the last cluster) into the \gls{FAT}
\item Finding and overwriting a free entry in the directory table
\item Writing the file content
\end{enumerate}
7 years ago
We can expect that the host first finds available sectors and a free directory entry before performing any write operations, to prevent potential disk corruption.
7 years ago
To properly handle a newly created file by the emulator, we could, in theory, find its name from the directory table, which has been updated, and then collect the data written to the corresponding clusters. In practice, confirmed by experiments with a real Linux host, the two latter steps may happen in any order, and often the content is written before the directory table is updated.
7 years ago
The uncertain order of the written areas poses a problem when the file name has any significance, as we cannot store the received file data while waiting for the directory table to be updated. The \arm DAPLink firmware solves this by analyzing the content of the first written sector of the file, which may contain the binary \gls{NVIC} table, or a character pattern typical for Intel hex files, allowing it to recognize a binary image the user wants to flash to the target \gls{MCU}.
\subsection{File Content Change}
7 years ago
A change to file's content is performed in a similar way to the creation of a new file, except instead of creating a new entry in the directory table, an existing one is updated with the new file size. The name of the file may be unknown until the content is written, but we could detect the file name by comparing the start sector with those of all files known to the virtual file system.
In the case of GEX, the detection of a file name is not important; we expect only INI files to be written, and the particular file may be detected by its first section marker, such as \verb|[UNITS]| or \verb|[SYSTEM]|. Should a non-INI file be written by accident, the INI parser will likely detect a syntax error and discard it.
7 years ago
It should be noted that a file could be updated only partially, skipping the clusters which remain unchanged, and there is also no guarantee regarding the order in which the file's sectors are written. A non-linear or partial file update is hard to process for the emulator, but it can be reliably detected and discarded. Fortunately, this host behavior has not been conclusively observed in practice, but a file update rarely fails for unknown reasons; this could be a possible cause.
7 years ago