UBIFS

This article brought to you by LWN subscribers

Subscribers to LWN.net made this article — and everything that surrounds it — possible. If you appreciate our content, please buy a subscription and make the next set of articles possible.

By Jonathan Corbet
April 2, 2008

The steady growth in flash-based memory devices looks set to transform parts of the storage industry. Flash has a number of advantages over rotating magnetic storage: it is smaller, has no moving parts, requires less power, makes less noise, is truly random access, and it has the potential to be faster. But flash is not without its own idiosyncrasies. Flash-based devices operate on much larger blocks of data: 32KB or more. Rewriting a portion of a block requires running an erase cycle on the entire block (which can be quite slow) and writing the entire block's contents. There is a limit to the number of times a block can be erased before it begins to corrupt the data stored there; that limit is low enough that it can bring a premature end to a flash-based device's life, especially if the same block is repeatedly rewritten. And so on.

A number of approaches exist for making flash-based devices work well. Many devices, such as USB drives, include a software "flash translation layer" (FTL); this layer performs the necessary impedance matching to make a flash device look like an ordinary block device with small sectors. Internally, the FTL maintains a mapping between logical blocks and physical erase blocks which allows it to perform wear leveling - distributing rewrite operations across the device so that no specific erase block wears out before its time - though some observers question whether low-end flash devices bother to do that. The use of FTL layers makes life easy for the rest of the system, but it is not necessarily the way to get the best performance out of the hardware.

If you can get to the device directly, without an FTL getting in the way, it is possible to create filesystems which embody an awareness of how flash works. Most of our contemporary filesystems are designed around rotating storage, with the result that they work hard to minimize time-consuming operations like head seeks. A flash-based filesystem need not worry about such issues, but it must be concerned about things like erase blocks instead. So making the best use of flash requires a filesystem written with flash in mind.

The main filesystem for flash-based devices on Linux is the venerable JFFS2. This filesystem works, but it was designed for devices which are rather smaller than those available today. Since JFFS2 must do things like rebuild the entire directory tree at mount time, it can be quite slow on large devices - for relatively small values of "large" by 2008 standards. JFFS2 is widely seen as reaching the end of its time.

A more contemporary alternative is LogFS, which has been discussed on these pages in the past. This work remains unfinished, though, and development has been relatively slow in recent times; LogFS has not yet been seriously considered for merging into the mainline. A more recent contender is UBIFS; this code is in a state of relative completion and its developers are asking for serious review.

UBIFS depends on the UBI layer, which was merged for 2.6.22. UBI ("unsorted block images") is not, technically, an FTL, but it performs a number of the same functions. At the heart of UBI is a translation table which maps logical erase blocks (LEBs) onto physical erase blocks (PEBs). So software using UBI to access flash sees a device providing a simple set of sequential blocks which apparently do not move. In fact, when an LEB is rewritten, the new data will be placed into a different location on the physical device, but the upper layers know nothing about it. So UBI makes problems like wear leveling and bad block avoidance go away for the upper layers. UBI also takes care of running time-consuming erase operations in the background when possible so that upper layers need not wait when writing a block.

One little problem with UBI is that the logical-to-physical mapping information is stored in the header of each erase block. So when the UBI layer initializes a flash device, it must read the header from every block to build the mapping table in memory; this operation clearly takes time. For 1GB flash devices, this initialization overhead is tolerable; in the future, when we'll be booting our laptops with terabyte-sized flash drives in them, the linear scan will be a problem. The UBIFS developers are aware of this issue, but believe that it can be solved at the UBI level without affecting the higher-level filesystem code.

By using UBI, the UBIFS developers are able to stop worrying about some aspects of flash-based filesystem design. Other problems remain, though. For example, the large erase blocks provided by flash devices require filesystems to track data at the sub-block level and to perform occasional garbage collection: coalescing useful information into new blocks so that the remaining "dead" space can be reclaimed. Garbage collection, along with the potential for blocks to turn bad, makes space management on flash devices tricky: freeing space may require using more space first, and there is no way to know how much space will actually become available until the work has been done.

In the case of UBIFS, space management is an even trickier problem for a couple of reasons. One is that, like a number of other flash filesystems, UBIFS performs transparent compression of the data. The other is that, unlike JFFS2, UBIFS provides full writeback support, allowing data to be cached in memory for some time before being written to the physical media. Writeback gives large performance improvements and reduces wear on the device, but it can lead to big trouble if the filesystem commits to writing back more data than it actually has the space to store. To deal with this problem, UBIFS includes a complex "budgeting" layer which manages outstanding writes with pessimistic assumptions on what will be possible.

Like LogFS, UBIFS uses a "wandering tree" structure to percolate changes up through the filesystem in an atomic manner. UBIFS also uses a journal, though, to minimize the number of rewrites to the upper-level nodes in the tree.

The latest UBIFS posting raised questions about how it compares with LogFS. The resulting discussion was ... not entirely technical, but a few clear points came out. UBIFS is in a more complete state and appears to perform quite a bit better at this time. LogFS is a lot less code, avoids the boot-time linear scan of the device, and is able to work (with some flash awareness) through an FTL. Which is better is not a question your editor is prepared to answer at this time; what does seem clear is that the growing competition between the two projects has the potential to inspire big improvements on both sides in the near future.

Index entries for this article
Kernel	Filesystems/Flash
Kernel	UBIFS

(Log in to post comments)

Wandering Tree?

Posted Apr 3, 2008 7:09 UTC (Thu) by Mithrandir (guest, #3031) [Link]

What is a wandering tree?  I note that there's no wikipedia article on it and the first few
google results don't seem to be relevant...

Wandering Tree?

Posted Apr 3, 2008 8:52 UTC (Thu) by ahunter (guest, #51399) [Link]

There is some info here:

www.linux-mtd.infradead.org/tech/JFFS3design.pdf

Wandering Tree?

Posted Apr 3, 2008 9:57 UTC (Thu) by deleteme (guest, #49633) [Link]

There is a whitepaper on the UBIFS page, and LWN wrote about it in the LogFS article.
The on-flash tree looks much like the structure used by ext2. There are some differences in how it is managed, however. The log structure of the filesystem implies that blocks cannot be rewritten in place; any time a block is changed it must be moved and written to a new location. If there are pointers to the moved block (think about the usual indirect blocks used to store the layout of larger files), the blocks containing the pointers must also be changed, and thus moved. That, in turn, will require changes at the next level up in the tree. Thus changes at the bottom of the tree will propagate upward all the way to the root. This is the "wandering tree" algorithm. One of the advantages is that the old filesystem structure remains valid until the root is rewritten - a crash could cause the loss of the last operation, but it will leave previous data and the structure of the filesystem intact.

Actually managing the entire directory tree as a wandering tree would be expensive; beyond that, files with multiple hard links break the tree structure and make wandering trees much harder to implement. So the actual tree implemented by LogFS just has two levels. There is an "inode file" containing the inode structures for every file and directory existing within the filesystem; each inode then points to the associated blocks holding the file's data. Directory entries contain a simple integer index giving the inode offset within the inode file. So changes to an inode only require writing the inode itself and the inode file; the rest of the directory structure need not be touched.

Wandering Tree?

Posted Apr 8, 2008 10:43 UTC (Tue) by saffroy (guest, #43999) [Link]

It's been a while since I read about it, but this approach sounds similar to the method used
by WAFL, the filesystem used by NetApp and described here:

http://media.netapp.com/documents/wp_3002.pdf

This is great for snapshots, but I think it's patented (maybe someone can confirm?).

Speaking of patents, from what I read in this article, I don't see what's different between
UBI and FTL, the latter being patented too.

Wandering Tree?

Posted Apr 11, 2008 16:48 UTC (Fri) by anton (subscriber, #25547) [Link]

Netapp has patents, but if wandering trees are among them, then that's a weak patent, because there is quite a bit of very well-documented prior art: They were used in log-structured file system implementations, and are also the technique that people programming in functional and logic programming languages use to manage trees.

Wandering Tree?

Posted Apr 4, 2008 14:36 UTC (Fri) by rwmj (subscriber, #5474) [Link]

They've rediscovered a functional programming technique which
has been used for ages called 'zippers'.

There is even a Haskell "OS" (actually, filesystem) written
using this technique, which has some neat properties like
multiple versions, rollback and transactions:
http://okmij.org/ftp/Computation/Continuations.html#zippe...

JFFS2 and large flash

Posted Apr 3, 2008 22:09 UTC (Thu) by lwithers (guest, #23379) [Link]

I recently tried using JFFS2 on a large NAND flash partition (512MiB) with 
less than inspiring results. Turning on the summary option allows the 
partition to be mounted in under 10 seconds rather than 2 minutes, which 
is good, only I found that if I actually did any work on the filesystem 
and rebooted, the garbage collection thread would eat 100% CPU and all I/O 
to the flash would block. This got progressively worse the more work I 
did, to the point where the system became unusable after writing only a 
couple of hundred MiB of data.

Clearly something better is needed. I've used YAFFS (v1) on a 192MiB NAND 
flash partition, and it's worked extremely well -- I haven't noticed any 
data loss and we literally pull the power on our devices to switch them 
off with no noticed corruption. I'm hoping that YAFFS (v2) will sit nicely 
on my 512MiB partition.

UBIFS

Posted Apr 4, 2008 13:24 UTC (Fri) by NAR (subscriber, #1313) [Link]

Are any of these filesystems supported on other operating systems? If I format an USB stick to
one of these filesystems, will the stick be readable from other operating systems?

UBIFS

Posted Apr 4, 2008 17:31 UTC (Fri) by dedekind (guest, #32521) [Link]

This stuff is for embedded systems and it is not relevant for USB sticks and other FTL-enabled
things (with few weird exceptions probably).

UBIFS

Posted Apr 4, 2008 19:34 UTC (Fri) by Cato (guest, #7643) [Link]

No. Basically there are two types of flash devices, in simple terms:

1. Consumer flash devices - SD Cards, USB sticks, etc - these emulate hard drives by use of a
Flash Translation Layer (FTL) that hides any bad eraseblocks (flash blocks) and does 'wear
levelling' i.e. ensuring that no single flash block is used too much.  Flash blocks wear out
after 10K to 100K writes.  Different consumer flash devices have different FTLs, and writing a
good FTL is apparently hard.  Filesystems that work with such devices are all the normal Linux
hard drive ones - ext3, JFS, XFS, vfat - but they are wholly dependent on the FTL to work.

2. 'Raw' flash devices, aka Memory Technology Devices (MTDs).  These are key parts of consumer
flash devices, but aren't usable directly without an FTL, which is not included.  They expose
the differences between NOR and NAND flash (which are quite large).  MTDs require an
MTD-specific filesystem, e.g. YAFFS, JFFS2, LogFS, etc, and some of these FSs are specific to
NAND or NOR technology.

I've just been reading up on this so some of the above may be slightly off - it's a
complicated area.  The FAQs at http://www.linux-mtd.infradead.org/faq/general.html were quite
helpful, as were the LWN articles of course.

FTL in consumer flash devices

Posted Jul 24, 2008 18:05 UTC (Thu) by brouhaha (subscriber, #1698) [Link]

In the "consumer flash devices" you cite (SD cards, USB sticks), the FTL is entirely embedded and hidden inside the flash device itself. That's also true of MMC cards and CF cards. It was NOT true for so-called "SmartMedia", but that died a well-deserved death. I'm not sure about some other types such as Memory Stick and XD cards.

Filesystems that work with such devices are all the normal Linux hard drive ones - ext3, JFS, XFS, vfat - but they are wholly dependent on the FTL to work.

Acutually, the filesystems are wholly independent of the FTL. Brand X and Brand Y SD cards might have any entirely different FTL embedded in them, but ext3 doesn't know or care, because they present the same interface to the system.

FTL in consumer flash devices

Posted Jul 25, 2008 20:29 UTC (Fri) by nix (subscriber, #2304) [Link]

They're dependent on the FTL in the sense that if the FTL wasn't there, 
they'd ruin the card in short order. Their *code* is not dependent on the 
FTL, but their *proper functioning* is, exactly because they're not 
designed for Flash devices.

UBIFS

Posted Apr 4, 2008 19:40 UTC (Fri) by Cato (guest, #7643) [Link]

Here's a sort-of relevant question: does anyone have pointers on how to optimally configure a
very old low memory laptop (say 100 MB) for use with Linux, using a CF disk or USB thumb drive
as the main 'disk'?  The hard drive has gone and I like the idea of a flash-based laptop at
almost no cost.  The laptop runs Linux fine already, but I want better performance, stability
and flash lifetime.

There are some flash-optimised Linux distros such as Puppy and Damn Small Linux (DSL), but
there's very little guidance on how best to set them up - should you use a large or small RAM
disk, how much swap should you use, etc.  Also, given the state of consumer flash devices and
their FTLs, is it best to do a 'frugal install' (install a loop filesystem, usually squashfs)
to the flash device, and save all state into a separate 'current state' loop FS, which is what
Puppy does, or simply back it up into a tarball, which is what DSL does.

There's a huge amount of expertise in setting up Linux/Unix for hard drive based systems, but
the details of setting it up for flash is rather thin on the ground, at least outside the
embedded device industry.

Flash-based linux

Posted Apr 7, 2008 13:25 UTC (Mon) by hummassa (guest, #307) [Link]

look up 'how to install ubuntu on the eeepc' (I did this on my eeepc) and then you have all 
the tips like 'swappiness to zero', 'noatime nodiratime', etc etc...

Log-structured filesystems

Posted Apr 10, 2008 18:06 UTC (Thu) by abacus (guest, #49001) [Link]

Would it be a good idea to use BTRFS on a flash-based memory device ? BTRFS is a
log-structured filesystem, just like ZFS. Log-structured filesystems try to write all data in
one long log without overwriting older data, which is ideal for flash-based devices. And these
filesystems typically use a block size of 64 KB or more.

See also:
* http://oss.oracle.com/projects/btrfs/
* http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf

Log-structured filesystems

Posted Apr 14, 2008 6:44 UTC (Mon) by Cato (guest, #7643) [Link]

Flash filesystems such as JFFS2, UBIFS, LogFS, etc are designed for direct use of MTDs (memory
technology devices, i.e. devices that expose the fact they are Flash - typically within an
embedded system), not on block devices (i.e. hard drives or consumer Flash drives such as
Compact Flash, USB thumb drives, etc).  It may be that BTRFS would work well on the latter,
but it probably wouldn't work on MTDs, as it would need to include an FTL (Flash Translation
Layer) to do wear levelling etc.  

Of course, BTRFS ideas and code might be useful in writing an MTD-capable flash filesystem
such as UBIFS/LogFS, but you might as well start from scratch as Flash is a very different
medium to hard disks.