ReFS file system review. File systems: comparison, secrets and unique features

Meet the new file system: ReFS (Resilient File System, a fault-tolerant file system).

In fact, it is not entirely new: Microsoft did not develop ReFS from scratch. Previously known under the code name Protogon, it was developed for Windows Server 8 and will now also be installed on Windows 8 client machines.

So, to open, close, read and write files, the system uses the same API access interfaces as NTFS.
Many well-known features remained untouched - for example, Bitlocker disk encryption and symbolic links for libraries.
Other features, such as data compression, have disappeared.

The previous file system, NTFS (New Technology File System), was introduced in version 1.2 back in 1993 as part of Windows NT 3.1, and by the release of Windows XP in 2001 it had grown to version 3.1; only then did it start to be installed on client machines.
Gradually, the capabilities of NTFS have reached their limits: scanning large-capacity storage media takes too much time, the journal (log file) slows down access, and the maximum file size has almost been reached.

Most of ReFS's innovations lie in the area of creating and managing file and folder structures.
They are designed for automatic error correction, maximum scaling and operation in Always Online mode.
For these purposes, Microsoft uses the concept of B+ trees, familiar from databases.
This means that folders in the file system are structured as tables with files as entries.

These, in turn, can have certain attributes added as subtables, creating a hierarchical tree structure.
Even free disk space is organized in tables.
The core of the ReFS system is the object table - a central directory that lists all the tables in the system.
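To make the table idea more concrete, here is a minimal Python sketch of the concept (purely illustrative, not the on-disk ReFS format; all names and numbers are invented): directories as tables of file records, attributes as subtables, and a central object table that points at everything else, including free space.

    # Conceptual sketch only: directories are tables of file records,
    # records carry attribute subtables, and one object table lists all tables.
    object_table = {
        "dir:root": {                      # a directory: name -> file record
            "report.docx": {
                "size": 14_212,
                "attributes": {"created": "2012-01-16", "read_only": False},
                "extents": [(1024, 8)],    # (starting block, block count)
            },
        },
        "free_space": {"runs": [(2048, 500_000)]},  # even free space is a table
    }

    def lookup(path):
        """Resolve a file by walking from the object table, as a B+ tree walk would."""
        directory, name = path
        return object_table[directory][name]

    print(lookup(("dir:root", "report.docx"))["size"])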

ReFS gets rid of complex log management and now commits new file information to free space, preventing it from being overwritten.
But even if this suddenly happens, the system will re-register links to records in the B+-tree structure.

Like NTFS, ReFS fundamentally distinguishes between file information (metadata) and file content (user data), but generously provides both with the same security features.
Thus, metadata is protected by default using checksums.
The same protection can be provided to user data if desired.
These checksums are located on the disk at a safe distance from each other so that if an error occurs, the data can be recovered.
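As an illustration of the verify-on-read idea, here is a small Python toy (an assumption-laden sketch, not ReFS code): checksums are kept apart from the blocks they protect, and a redundant copy, such as a Storage Spaces mirror would provide, lets a corrupted block be repaired transparently.

    import zlib

    blocks = {}       # block number -> bytes
    checksums = {}    # stored "at a distance" from the blocks themselves

    def write_block(number, data, mirror=None):
        blocks[number] = data
        checksums[number] = zlib.crc32(data)
        if mirror is not None:                 # optional redundant copy
            mirror[number] = data

    def read_block(number, mirror=None):
        data = blocks[number]
        if zlib.crc32(data) != checksums[number]:          # corruption detected
            if mirror and zlib.crc32(mirror[number]) == checksums[number]:
                blocks[number] = mirror[number]             # repair from the good copy
                return mirror[number]
            raise IOError(f"block {number} is corrupted and no intact copy exists")
        return data

    mirror = {}
    write_block(7, b"metadata record", mirror)
    blocks[7] = b"metadata recorX"                          # simulate silent corruption
    print(read_block(7, mirror))                            # restored from the mirror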

Transferring data from NTFS to ReFS

Will it be possible to quickly and easily convert data from NTFS to ReFS and vice versa in Windows 8?
Microsoft says there won't be any built-in format conversion functionality, but information can still be copied.
The scope of ReFS is obvious: at first it will only be used for managing large volumes of data on servers.
Therefore, it is not yet possible to run Windows 8 from a disk running the new file system.
There will be no external drives with ReFS yet - only internal ones.

Obviously, over time, ReFS will be equipped with more features and will be able to replace the legacy system.
Perhaps this will happen with the release of the first update package for Windows 8.

Comparing NTFS and ReFS file systems.

Rename file


NTFS

1. NTFS writes to the Log that the file name should be changed.
NTFS also records all actions there.
2. Only after this does it change the file name on the spot.
Thus, the old name is overwritten by the new one.
3. Finally, a mark indicating the successful completion of the operation appears in the Log (the file system journal).


ReFS

1 - The new name is written to the free space.
It is very important that the previous name is not erased at first.
2 - As soon as the new name is written, ReFS changes the reference to the name field.
Now in the file system it leads not to the old name, but to the new one.
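A tiny Python sketch of that copy-on-write principle (illustrative only, with invented structures): the new name is written to free space first, and only then is the single reference switched, so the old record is never overwritten in place.

    store = {0: "old_name.txt"}       # records already on "disk"
    name_ref = 0                      # the file's name field points here

    def cow_rename(new_name):
        global name_ref
        new_slot = max(store) + 1     # 1) write the new name into free space
        store[new_slot] = new_name
        name_ref = new_slot           # 2) switch the reference in one step
        # a crash before step 2 leaves the old name fully valid;
        # the old record can be reclaimed later

    cow_rename("new_name.txt")
    print(store[name_ref])            # new_name.txt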

Renaming a file during a power failure


NTFS

1. NTFS, as usual, writes the change request to the Log.
2. After this, due to a power failure, the renaming process is interrupted, and there is no record of either the old or new names.
3. Windows reboots.
4. Following this, the error-correction program Chkdsk is launched.
5. Only now, using the Journal, when applying a rollback, the original file name is restored.


ReFS

1. In the first stage, ReFS writes a new name to another location in the file system, but at this moment the power supply is cut off.
2. Failure causes Windows to automatically restart.
3. After the restart, the Chkdsk program starts. It analyzes the file system for errors and corrects them if necessary.
Meanwhile, the ReFS dataset is in a stable state. The previous file name becomes valid again immediately after a power failure.

Key goals of ReFS:

  • Maintain maximum compatibility with the set of widely used NTFS features, while dropping unnecessary ones that only complicate the system;
  • Verification and auto-correction of data;
  • Maximum scalability;
  • The file system never has to be taken offline entirely, because faulty areas are isolated;
  • A flexible architecture based on the Storage Spaces feature, which was designed and implemented specifically for ReFS.

Key ReFS features (some only available with Storage Spaces):

  • Metadata integrity with checksums;
  • Integrity streams: a method of writing data to disk that gives additional protection if part of the disk is damaged;
  • Transactional "allocate on write" (copy-on-write) model;
  • Large limits on the size of partitions, files and directories (see the short check after this list): the partition size is limited to 2^78 bytes with a 16 KB cluster (2^64 × 16 × 2^10), while the Windows stack supports 2^64; the maximum number of files in a directory is 2^64; the maximum number of directories in a partition is 2^64;
  • Pooling and virtualization for easier partitioning and file system management;
  • Data striping for improved performance and redundant writes for fault tolerance;
  • Support for background disk scrubbing to identify latent errors;
  • Salvaging of data around a damaged area on the disk;
  • Storage pools shared between machines for additional fault tolerance and load balancing.
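A quick check of the size figures quoted above, as a couple of lines of Python (plain arithmetic, not a claim about the implementation):

    clusters = 2**64
    cluster_size = 16 * 2**10                   # 16 KB = 2^14 bytes
    print(clusters * cluster_size == 2**78)     # True: the 2^78-byte partition limit
    print(2**64)                                # the limit supported by the Windows stack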

NTFS and FAT32 are not the only file systems that Windows 10 can see and read. It also supports the legacy FAT, the extended exFAT, the new ReFS and the virtual CDFS, and partially works with the Linux file systems EXT2 and EXT3. When working with disks, including removable ones, you may need to determine the current file system of a medium or logical partition. In Windows 10 this can be done in several ways.

The simplest way is to open the disk's properties and look at what is specified in the "File system" field.

However, this method is only suitable for drives that have a letter and an explicitly "native" file system. If the medium or partition is formatted in EXT3, or is detected as RAW because of a failure, it will not be available in Explorer. In that case, you should use a snap-in or a console utility to determine the file system.

Press Win + X to open the Start button menu, launch the Disk Management snap-in and look at the contents of the "File System" column.

The snap-in will display even those disks and partitions that do not have a letter. Another way to see a medium's file system is to use the Diskpart utility. Open a command prompt and run these two commands:


diskpart
list volume

The first command launches the Diskpart utility, the second displays a list of all logical volumes. You will find the information you need in the Fs column. An alternative is the PowerShell console. To find out the file system type of all disks, run it as administrator and execute the Get-Volume command. The values you are looking for will be listed in the FileSystemType column.

Alas, all the methods described above have a common drawback: they do not reliably recognize CDFS and the Linux file systems EXT2/3/4.

So, in our example, CDFS was reported as Unknown, while the Disk Management snap-in identified CDFS correctly but could not recognize EXT3, showing it as RAW, that is, as the absence of a file system. The standard tools began to show correct results only after installing third-party utilities and drivers that provide access to EXT2/3/4 media from Windows.

And since we are talking about EXT2/3/4, it is also worth mentioning the HFS and HFS+ file systems used on Macs. When connected to a Windows computer, they will likewise not be detected, and to work with them you need to install a driver such as Paragon HFS+ or MacDrive.

In 2012, Microsoft set out to improve on the NTFS file system and released a test version of a more reliable one, ReFS (Resilient File System).

Today this format is available to users of the Windows 8/8.1 and Windows 10 operating systems; Windows 7 and earlier versions do not work with devices in this format. How do you change a flash drive's format to ReFS in Windows 8/8.1 and Windows 10?

Advantages and disadvantages of the ReFS format

This file system has many advantages, but, just as in the early days of NTFS, they are still rather shaky.

Among the advantages of ReFS it is worth highlighting:

  • Cataloged file placement;
  • Fault tolerance, implemented through background recovery and journaling processes. At the same time, this quality is also a disadvantage: if the drive does fail, you will find practically no tools to recover it;
  • Automatic correction of errors and file corruption;
  • Copying, writing and moving of very large files;
  • Support for symbolic links;
  • High data transfer speed.

Among the disadvantages of this system it is worth highlighting:

  • Incompatibility with Windows 7 and earlier operating systems;
  • Lack of conversion programs;
  • Fixed cluster size of 64 KB;
  • No quotas;
  • No deduplication (identical files are stored in two or more copies).

And although the advantages are significant, the NTFS file system will occupy a leading position for several more years. If you have Windows 8/8.1 or Windows 10 installed, you can format one drive and test ReFS.

Format a flash drive in ReFS

To format a drive in ReFS, you need to make changes in the Registry Editor. To do this, press "Win + R" and enter "regedit".

The Registry Editor will open. Go to the branch HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control and select the FileSystem section.

Right-click on this section and select "New" > "DWORD (32-bit) Value". Name the parameter "RefsDisableLastAccessUpdate" and set its value to "1".

In the "Control" section of the same branch, create a new subsection named "MiniNT". In it, create a DWORD parameter named "AllowRefsFormatOverNonmirrorVolume" with the value "1".

Reboot your PC for the changes to take effect.
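For those who prefer to script the change, here is a minimal sketch using Python's standard winreg module instead of Regedit. It is only a sketch: it must be run from an elevated (administrator) session, and the exact location of the FileSystem subkey is an assumption rather than something spelled out in the text above.

    import winreg

    # Sketch only: the same two DWORD values created programmatically.
    # Run elevated; the FileSystem subkey location is assumed.
    CONTROL = r"SYSTEM\CurrentControlSet\Control"

    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, CONTROL + r"\FileSystem",
                            0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "RefsDisableLastAccessUpdate", 0, winreg.REG_DWORD, 1)

    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, CONTROL + r"\MiniNT",
                            0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "AllowRefsFormatOverNonmirrorVolume", 0, winreg.REG_DWORD, 1)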

You can also format the drive from the command console. To do this, enter the command "format E: /fs:ReFS" (substituting your drive letter) and then confirm the operation.

The major Creators Update, which introduced many new features to Windows 10, included official support for the modern REFS file system, the successor to NTFS, the file system currently used in Microsoft operating systems. This fact did not cause much fuss, since REFS is far from a new development of the software giant. It could be used in Windows 10 before, but only for storage spaces created by the system (software RAID). This feature was not provided for regular disk partitions, but it could be enabled in both Windows 10 and Windows 8.1 (64-bit editions) by manually editing the system registry or by applying REG files posted on forums for computer geeks.

Modern REFS file system: features and surprises

What kind of file system is this, how does it differ from NTFS, what are its real benefits for ordinary users, and what surprises should you be prepared for when working with it - more on all this below.

Features of REFS

REFS is an abbreviation for Resilient File System, that is, a fault-tolerant file system. As mentioned above, it is the successor to NTFS, although for now only in a distant, dimly foreseeable future. Microsoft introduced the new file system to the world back in 2012. All these years it has been "tested" on server editions of Windows, starting with Server 2012. Six years of testing have led only to the modest role of an alternative for non-system disk partitions in the latest version of the client operating system. However, if you look at the history of NTFS adoption, it turns out that with REFS everything is going as usual: it took Microsoft seven long years to bring NTFS to client Windows.

The new file system is not just a successor to NTFS, it is based on the latter, but eliminates its shortcomings and opens up new possibilities. The key feature of REFS is fault tolerance and protection against data loss, which is ensured by a number of mechanisms to support their integrity. Microsoft is so confident in its work that for disk partitions formatted in REFS, it even removed from their properties the ability to run a check for file system errors.

The new file system inherited from NTFS:

  • Access control lists (ACLs);
  • The USN journal;
  • Symbolic links;
  • Mount points, junction points and reparse points;
  • BitLocker encryption technology.

REFS has eliminated unused NTFS features:

  • EFS file-level encryption;
  • DOS-compatible short filenames 8.3;
  • Hard links;
  • Disk quotas.

Features of REFS that NTFS does not include:

  • Preventing data loss - minimizing the occurrence of file system errors, isolating damaged sectors, preventive measures to avoid data corruption;
  • As the developers assure, increased performance;
  • Promptly checking disks for errors;
  • Other features are listed below in the comparison table with NTFS.


Real benefits for ordinary users

Which of the benefits described above matter to ordinary users, for whom even the maximum capabilities of NTFS seem astronomical because they have no way to exhaust them?

Alas, the bottom line is that we only gain the chance to stop languishing at the boot screen, watching the flickering progress figures of a file system check after Windows shuts down incorrectly. Plus a somewhat smaller chance of losing valuable data: smaller, but not zero. A fault-tolerant file system is a fine thing, but it naturally only solves its own problems. Whatever file system is used, user data is still threatened by the possibility of hard drive failure, and preventing that is the users' own task. Of course, REFS can take on this problem as well, but only within the framework of Storage Spaces technology, with a storage pool built like at least a mirrored RAID 1.

In this case, the combination of a reliable file system and reliable storage will undoubtedly provide the strongest guarantees. But what is so valuable on the average person's disk that it would justify the hassle and the financial investment in RAID, whatever technology implements it?

What about the claimed performance improvements of REFS? They, too, mostly apply to the same Storage Spaces technology: the new file system can initially write data to a faster drive and, when the computer is idle, move large files to a slower one.

What can ordinary users expect with a single HDD installed in the computer? Alas, nothing. When REFS was tested and compared with NTFS on a regular HDD partition, no performance improvement was observed. Under identical testing conditions, with the same test file size, the same number of read and write cycles and the same disk partition, the CrystalDiskMark program recorded roughly the same figures. The random reads and writes of small files, which matter most for performance, exceeded the NTFS speeds in REFS only by a tiny margin.

This means that the new file system is not optimized in any way to reduce the number of HDD head movements and, accordingly, it will not solve the problems of hard drives, which have long been overdue for retirement into computing history.

Benefits for those working with hypervisors

But in terms of performance there is also good news, although not really for ordinary users but rather for advanced ones working with the Microsoft Hyper-V hypervisor. If virtual machines are placed on a partition formatted in REFS, cloning them and merging checkpoints happen many times faster, because the new file system only needs to write new metadata and reference the data already on disk rather than physically copy it.

REFS can also quickly write zeros to a large file, which means that when creating virtual disks with a fixed size, you will need to wait a few seconds, not minutes, as happens in NTFS. And this is a significant breakthrough. NTFS not only takes a long time to create fixed virtual disks, it also loads the HDD, making it impossible to work in parallel with other programs. When testing creating a 60GB fixed-size VHD file on a REFS partition, the process took 1 second. Whereas on a partition with NTFS, creating exactly the same VHD file took almost 7 minutes with the disk load at 99%.
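A toy Python illustration of why metadata-only operations are so much faster (invented structures and sizes, not a benchmark of REFS or Hyper-V): "cloning" by copying extent references versus physically copying every byte.

    import time

    data_blocks = [bytes(4096) for _ in range(10_000)]     # ~40 MB of "disk" blocks
    file_a = {"name": "base.vhdx", "extents": list(range(10_000))}

    t = time.time()
    file_b = {"name": "clone.vhdx", "extents": list(file_a["extents"])}   # metadata only
    print("reference clone:", time.time() - t, "s")

    t = time.time()
    copied = [bytearray(data_blocks[i]) for i in file_a["extents"]]       # byte-for-byte copy
    print("physical copy:  ", time.time() - t, "s")

    print(len(file_b["extents"]), "references vs", len(copied), "copied blocks")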

It is expected that these features will also be implemented when working with VMware and VirtualBox virtual machines.

Disadvantages of REFS

We've sorted out the advantages of REFS, but what about the disadvantages? They exist, but if Microsoft decides to actively implement the new file system, some of the shortcomings will be eliminated over time. For now, we have what we have - REFS:

  • Can only be used for non-system disk partitions, and cannot be used for a Windows partition;
  • Can only be used for internal media, but not for external media;
  • An NTFS partition cannot be converted to it without losing data; it can only be formatted, which means the data has to be moved elsewhere temporarily;
  • Not all third-party programs work with it; in particular, this applies to tools for recovering deleted data.

Well, the main surprise: friends, do you recognize the version of Windows?

This is how things stand with storing data in the new file system. Never mind Windows 7: even Windows 8.1 does not see a REFS partition. In Windows 8.1 an attempt was made to give the new file system a chance to be recognized, and an amendment to the system registry provided REFS support, but only the ability to format new partitions was implemented.

Why may a smartphone not launch programs from a memory card? How is ext4 fundamentally different from ext3? Why will a flash drive last longer if you format it in NTFS rather than FAT? What is the main problem with F2FS? The answers lie in the structural features of file systems. We'll talk about them.

Introduction

File systems define how data is stored. They determine what limitations the user will encounter, how fast read and write operations will be, and how long the drive will operate without failures. This is especially true for budget SSDs and their younger brothers - flash drives. Knowing these features, you can get the most out of any system and optimize its use for specific tasks.

You have to choose the type and parameters of the file system every time you need to do something non-trivial. For example, you want to speed up the most common file operations. At the file system level, this can be achieved in different ways: indexing will provide fast searches, and pre-reserving free blocks will make it easier to rewrite frequently changing files. Pre-optimizing the data in RAM will reduce the number of required I/O operations.

Such properties of modern file systems as lazy writing, deduplication and other advanced algorithms help to increase the period of trouble-free operation. They are especially relevant for cheap SSDs with TLC memory chips, flash drives and memory cards.

There are separate optimizations for different levels of disk arrays: for example, the file system can support simplified volume mirroring, instant snapshotting, or dynamic scaling without taking the volume offline.

Black box

Users generally work with the file system that is offered by default by the operating system. They rarely create new disk partitions and even less often think about their settings - they simply use the recommended parameters or even buy pre-formatted media.

For Windows fans, everything is simple: NTFS on all disk partitions and FAT32 (or the same NTFS) on flash drives. If there is a NAS and it uses some other file system, then for most it remains beyond perception. They simply connect to it over the network and download files, as if from a black box.

On mobile gadgets with Android, ext4 is most often found in the internal memory and FAT32 on microSD cards. Apple users do not care at all what file system they have: HFS+, HFSX, APFS, WTFS... for them there are only beautiful folder and file icons drawn by the best designers. Linux users have the richest choice, but support for non-native file systems can be added to both Windows and macOS; more on that later.

Common roots

Over a hundred different file systems have been created, but little more than a dozen can be considered current. Although they were all developed for their own specific applications, many turned out to be related at a conceptual level. They are similar because they use the same type of structure for representing (meta)data: B-trees.

Like any hierarchical system, a B-tree begins with a root record and then branches down to leaf elements - individual records of files and their attributes, or “leaves.” The main reason for creating such a logical structure was to speed up the search for file system objects on large dynamic arrays - such as multi-terabyte hard drives or even larger RAID arrays.

B-trees require far fewer disk accesses than other types of balanced trees to perform the same operations. This is achieved due to the fact that the final objects in B-trees are hierarchically located at the same height, and the speed of all operations is precisely proportional to the height of the tree.

Like other balanced trees, B-trees have equal path lengths from the root to any leaf. Instead of growing upward, they branch more and grow wider: all branch points in a B-tree store many references to child objects, making them easy to find in fewer calls. A large number of pointers reduces the number of the most time-consuming disk operations - head positioning when reading arbitrary blocks.
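A few lines of Python make the height argument concrete (plain arithmetic with an arbitrary object count): the wider the node, the fewer levels, and hence the fewer disk accesses per lookup.

    def levels(entries, fanout):
        """Levels a balanced tree needs so that fanout**levels >= entries."""
        height, capacity = 1, fanout
        while capacity < entries:
            capacity *= fanout
            height += 1
        return height

    entries = 1_000_000_000          # a billion file system objects
    for fanout in (2, 100, 1000):    # 2 = binary tree; wider values = B-tree nodes
        print(f"fan-out {fanout:>4}: about {levels(entries, fanout)} levels (disk reads per lookup)")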

The concept of B-trees was formulated back in the seventies and has since undergone various improvements. In one form or another it is implemented in NTFS, BFS, XFS, JFS, ReiserFS and many DBMSs. All of them are relatives in terms of the basic principles of data organization. The differences concern details, often quite important. Related file systems also have a common disadvantage: they were all created to work specifically with disks even before the advent of SSDs.

Flash memory as the engine of progress

Solid-state drives are gradually replacing disk drives, but for now they are forced to use file systems that are alien to them, passed down by inheritance. They are built on flash memory arrays, the operating principles of which differ from those of disk devices. In particular, flash memory must be erased before being written, an operation that NAND chips cannot perform at the individual cell level. It is only possible for large blocks entirely.

This limitation stems from the fact that in NAND memory all cells are combined into blocks, each of which has only one common connection to the control bus. We will not go into the details of page organization or describe the full hierarchy. What matters is the very principle of group operations on cells and the fact that flash memory blocks are usually larger than the blocks addressed by any file system. Therefore, all addresses and commands for NAND flash drives must be translated through the FTL (Flash Translation Layer) abstraction layer.

Compatibility with the logic of disk devices and support for commands of their native interfaces is provided by flash memory controllers. Typically, FTL is implemented in their firmware, but can (partially) be implemented on the host - for example, Plextor writes drivers for its SSDs that accelerate writing.

It is impossible to do without FTL, since even writing one bit to a specific cell triggers a whole series of operations: the controller finds the block containing the desired cell; the block is read completely, written to the cache or to free space, then erased entirely, after which it is rewritten back with the necessary changes.
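Here is a deliberately simplified Python model of that read, erase and rewrite cycle (page and block sizes are made up and tiny; a real FTL also remaps blocks, caches writes and wear-levels):

    PAGES_PER_BLOCK = 4
    PAGE_SIZE = 4

    class NandBlock:
        def __init__(self):
            self.pages = [bytearray(b"\xff" * PAGE_SIZE) for _ in range(PAGES_PER_BLOCK)]

        def erase(self):              # erasure works only on the whole block
            self.pages = [bytearray(b"\xff" * PAGE_SIZE) for _ in range(PAGES_PER_BLOCK)]

    def ftl_write_byte(block, page, offset, value):
        snapshot = [bytes(p) for p in block.pages]   # 1) read the block into a cache
        block.erase()                                # 2) erase it entirely
        for i, old in enumerate(snapshot):           # 3) program it back...
            block.pages[i][:] = old
        block.pages[page][offset] = value            # ...with the one changed byte

    blk = NandBlock()
    ftl_write_byte(blk, page=2, offset=1, value=0x00)
    print(blk.pages[2])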

This approach is reminiscent of army life: to give an order to one soldier, the sergeant assembles the whole unit, calls the poor fellow out of formation and commands the rest to disperse. In the now rare NOR memory, the organization was more like special forces: each cell was controlled independently (each transistor had an individual contact).

The demands on controllers keep growing, since with each generation of flash memory the manufacturing process shrinks in order to increase density and reduce the cost of data storage. Along with the process node, the expected service life of the chips decreases as well.

Modules with single-level SLC cells had a declared resource of 100 thousand rewrite cycles and even more. Many of them still work in old flash drives and CF cards. For enterprise-class MLC (eMLC), the resource was declared in the range of 10 to 20 thousand, while for regular consumer-grade MLC it is estimated at 3-5 thousand. Memory of this type is actively being squeezed by even cheaper TLC, whose resource barely reaches a thousand cycles. Keeping the lifespan of flash memory at an acceptable level requires software tricks, and new file systems are becoming one of them.

Initially, the manufacturers assumed that the file system was unimportant. The controller itself must service a short-lived array of memory cells of any type, distributing the load between them in an optimal way. For the file system driver, it simulates a regular disk, and itself performs low-level optimizations on any access. However, in practice, optimization varies from device to device, from magical to bogus.

In enterprise SSDs, the built-in controller is a small computer. It has a huge memory buffer (half a gigabyte or more) and supports many data efficiency techniques to avoid unnecessary rewrite cycles. The chip organizes all blocks in the cache, performs lazy writes, performs on-the-fly deduplication, reserves some blocks and clears others in the background. All this magic happens completely unnoticed by the OS, programs and the user. With an SSD like this, it really doesn't matter which file system is used. Internal optimizations have a much greater impact on performance and resource than external ones.

Budget SSDs (and even more so flash drives) are equipped with much less smart controllers. The cache in them is limited or absent, and advanced server technologies are not used at all. The controllers in memory cards are so primitive that it is often claimed that they do not exist at all. Therefore, for cheap devices with flash memory, external methods of load balancing remain relevant - primarily using specialized file systems.

From JFFS to F2FS

One of the first attempts to write a file system that would take into account the principles of organizing flash memory was JFFS - Journaling Flash File System. Initially, this development by the Swedish company Axis Communications was aimed at increasing the memory efficiency of network devices that Axis produced in the nineties. The first version of JFFS supported only NOR memory, but already in the second version it became friends with NAND.

Currently JFFS2 has limited use. It is still mainly used in Linux distributions for embedded systems. It can be found in routers, IP cameras, NAS and other regulars of the Internet of Things. In general, wherever a small amount of reliable memory is required.

A further attempt to develop JFFS2 was LogFS, which stored inodes in a separate file. The authors of this idea are Jorn Engel, an employee of the German division of IBM, and Robert Mertens, a teacher at the University of Osnabrück. LogFS source code is available on GitHub. Judging by the fact that the last change to it was made four years ago, LogFS has not gained popularity.

But these attempts spurred the emergence of another specialized file system - F2FS. It was developed by Samsung Corporation, which accounts for a considerable part of the flash memory produced in the world. Samsung makes NAND Flash chips for its own devices and for other companies, and also develops SSDs with fundamentally new interfaces instead of legacy disk ones. Creating a specialized file system optimized for flash memory was a long overdue necessity from Samsung's point of view.

Four years ago, in 2012, Samsung created F2FS (Flash Friendly File System). The idea was good, but the implementation turned out to be crude. The key task when creating F2FS was simple: to reduce the number of cell rewrite operations and to spread the load on them as evenly as possible. This requires performing operations on multiple cells within the same block at once rather than forcing them one at a time. This means that what is needed is not instant rewriting of existing blocks at the OS's first request, but caching of commands and data, adding new blocks to free space and delayed erasure of cells.

Today, F2FS support is already officially implemented in Linux (and therefore in Android), but in practice it does not yet provide any special advantages. The main feature of this file system (lazy rewrite) led to premature conclusions about its effectiveness. The old caching trick even fooled early versions of benchmarks, where F2FS demonstrated an imaginary advantage not by a few percent (as expected) or even by several times, but by orders of magnitude. The F2FS driver simply reported the completion of an operation that the controller was just planning to do. However, if the real performance gain for F2FS is small, then the wear on the cells will definitely be less than when using the same ext4. Those optimizations that a cheap controller cannot do will be performed at the level of the file system itself.

Extents and bitmaps

For now, F2FS is seen as exotic, a thing for geeks. Even Samsung's own smartphones still use ext4. Many consider ext4 simply a further development of ext3, but this is not entirely true: it is more of a revolution than just breaking the 2 TB per-file barrier and raising other quantitative limits.

When computers were large and files were small, addressing was not a problem. Each file was allocated a certain number of blocks, the addresses of which were entered into the correspondence table. This is how the ext3 file system worked, which remains in service to this day. But in ext4 a fundamentally different addressing method appeared - extents.

Extents can be thought of as extensions of inodes: separate sets of blocks that are addressed as whole contiguous sequences. One extent can contain an entire medium-sized file, while for large files a dozen or two extents are enough. This is much more efficient than addressing hundreds of thousands of small four-kilobyte blocks.
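A rough Python comparison of the metadata involved (illustrative numbers only): per-block addressing versus a single extent for a contiguous 4 GB file with 4 KB blocks.

    file_size = 4 * 2**30          # 4 GB file
    block_size = 4 * 2**10         # 4 KB blocks

    block_pointers = file_size // block_size   # ext3-style: one pointer per block
    extents = [(0, block_pointers)]            # ext4-style: one (start, length) run

    print(block_pointers, "block pointers")    # 1048576 entries to store and walk
    print(len(extents), "extent record(s)")    # 1 record for a contiguous file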

The write mechanism itself has also changed in ext4. Blocks are now allocated at once, in a single request, and not in advance but immediately before the data is written to disk. Delayed multi-block allocation gets rid of the unnecessary operations ext3 was guilty of: there, blocks for a new file were allocated immediately, even if it fit entirely in the cache and was about to be deleted as temporary.


FAT Restricted Diet

In addition to balanced trees and their modifications, there are other popular logical structures. There are file systems with a fundamentally different type of organization - for example, linear. You probably use at least one of them often.

Mystery

Guess the riddle: at twelve she began to gain weight, by sixteen she was a stupid fatty, and by thirty-two she became fat, and remained a simpleton. Who is she?

That's right, this is a story about the FAT file system. Compatibility requirements gave it bad heredity: on floppy disks it was 12-bit, on hard drives it started out 16-bit, and it has survived to this day as 32-bit. In each subsequent version the number of addressable blocks grew, but in essence nothing changed.

The still popular FAT32 file system appeared twenty years ago. Today it is still primitive and does not support access control lists, disk quotas, background compression, or other modern data optimization technologies.

Why is FAT32 needed these days? Everything is still solely to ensure compatibility. Manufacturers rightly believe that a FAT32 partition can be read by any OS. That's why they create it on external hard drives, USB Flash and memory cards.

How to free up your smartphone's flash memory

microSD(HC) cards used in smartphones are formatted in FAT32 by default. This is the main obstacle to installing applications on them and transferring data from internal memory. To overcome it, you need to create a partition on the card with ext3 or ext4. All file attributes (including owner and access rights) can be transferred to it, so any application can work as if it were launched from internal memory.

Windows does not know how to create more than one partition on flash drives, but for this you can run Linux (at least in a virtual machine) or an advanced utility for working with logical partitioning - for example, MiniTool Partition Wizard Free. Having discovered an additional primary partition with ext3/ext4 on the card, the Link2SD application and similar ones will offer many more options than in the case of a single FAT32 partition.


Another argument often cited in favor of FAT32 is its lack of journaling, which supposedly means faster write operations and less wear on NAND flash memory cells. In practice, using FAT32 leads to the opposite and creates many other problems.

Flash drives and memory cards die quickly because any change in FAT32 overwrites the same sectors, where the two chains of file tables are located. Save a complete web page and those sectors get overwritten a hundred times, once for every small GIF written to the flash drive. Launched some portable software? It creates temporary files and constantly changes them while running. That is why it is much better to use NTFS on flash drives, with its failure-resistant $MFT table. Small files can be stored directly in the main file table, while its extensions and copies are written to different areas of the flash memory. In addition, NTFS indexing makes searching faster.

INFO

For FAT32 and NTFS, theoretical restrictions on the level of nesting are not specified, but in practice they are the same: only 7707 subdirectories can be created in a first-level directory. Those who like to play matryoshka dolls will appreciate it.

Another problem that most users face is that it is impossible to write a file larger than 4 GB to a FAT32 partition. The reason is that in FAT32 the file size is described by 32 bits in the file allocation table, and 2^32 (minus one, to be precise) is exactly four gigs. It turns out that neither a movie in normal quality nor a DVD image can be written to a freshly purchased flash drive.
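The arithmetic behind that ceiling, as a two-line Python check:

    max_file = 2**32 - 1          # the 32-bit size field in the file allocation table
    print(max_file)               # 4294967295 bytes
    print(max_file / 2**30)       # just under 4 GiB, so a DVD image (~4.7 GB) will not fit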

Failing to copy a large file is not so bad: when you try, the error is at least immediately visible. In other situations FAT32 acts as a time bomb. For example, you copy portable software onto a flash drive and at first use it without problems. After a long time, the database of one of the programs (say, an accounting package or an email client) becomes bloated and... simply stops updating. The file cannot be rewritten because it has hit the 4 GB limit.

A less obvious problem is that in FAT32 the creation date of a file or directory can be specified to within two seconds. This is not sufficient for many cryptographic applications that use timestamps. The low precision of the date attribute is another reason why FAT32 is not considered a valid file system from a security perspective. However, its weaknesses can also be used for your own purposes. For example, if you copy any files from an NTFS partition to a FAT32 volume, they will be cleared of all metadata, as well as inherited and specially set permissions. FAT simply doesn't support them.

exFAT

Unlike FAT12/16/32, exFAT was developed specifically for USB Flash and large (≥ 32 GB) memory cards. Extended FAT eliminates the above-mentioned disadvantage of FAT32 - overwriting the same sectors with any change. As a 64-bit system, it has no practically significant limits on the size of a single file. Theoretically, it can be 2^64 bytes (16 EB) in length, and cards of this size will not appear soon.

Another fundamental difference between exFAT is its support for access control lists (ACLs). This is no longer the same simpleton from the nineties, but the closed nature of the format hinders the implementation of exFAT. ExFAT support is fully and legally implemented only in Windows (starting from XP SP2) and OS X (starting from 10.6.5). On Linux and *BSD it is supported either with restrictions or not quite legally. Microsoft requires licensing for the use of exFAT, and there is a lot of legal controversy in this area.

Btrfs

Another prominent representative of file systems based on B-trees is called Btrfs. This FS appeared in 2007 and was originally created in Oracle with an eye to working with SSDs and RAIDs. For example, it can be dynamically scaled: creating new inodes directly on the running system or dividing a volume into subvolumes without allocating free space to them.

The copy-on-write mechanism implemented in Btrfs and full integration with the Device mapper kernel module allow you to take almost instantaneous snapshots through virtual block devices. Pre-compression (zlib or lzo) and deduplication speed up basic operations while also extending the lifetime of flash memory. This is especially noticeable when working with databases (2-4 times compression is achieved) and small files (they are written in orderly large blocks and can be stored directly in “leaves”).

Btrfs also supports full logging mode (data and metadata), volume checking without unmounting, and many other modern features. The Btrfs code is published under the GPL license. This file system has been supported as stable in Linux since kernel version 4.3.1.

Journals

Almost all more or less modern file systems (ext3/ext4, NTFS, HFSX, Btrfs and others) belong to the general group of journaled ones: they keep a record of changes in a separate log (journal) and check against it in the event of a failure during disk operations. However, the logging granularity and fault tolerance of these file systems differ.

Ext3 supports three journaling modes: writeback, ordered and full journaling. The first mode involves recording only general changes (metadata), performed asynchronously with respect to changes in the data itself. In the second mode, the same metadata recording is performed, but strictly before any changes are made. The third mode is equivalent to full logging of changes both in metadata and in the files themselves.

Only the last option ensures data integrity. The remaining two only speed up the detection of errors during the scan and guarantee restoration of the integrity of the file system itself, but not the contents of the files.

Journaling in NTFS is similar to the second logging mode in ext3. Only changes in metadata are recorded in the log, and the data itself may be lost in the event of a failure. This logging method in NTFS was not intended as a way to achieve maximum reliability, but only as a compromise between performance and fault tolerance. This is why people who are used to working with fully journaled systems consider NTFS pseudo-journaling.

The approach implemented in NTFS is in some ways even better than the ext3 default. NTFS additionally creates periodic checkpoints to make sure that all previously deferred disk operations have completed. Checkpoints have nothing to do with the restore points in \System Volume Information\; they are just service entries in the log.

Practice shows that such partial NTFS journaling is in most cases sufficient for trouble-free operation. After all, even with a sudden power outage, disk devices do not lose power instantly. The power supply and numerous capacitors in the drives themselves provide just the minimum amount of energy that is enough to complete the current write operation. With modern SSDs, with their speed and efficiency, the same amount of energy is usually enough to perform pending operations. An attempt to switch to full logging would reduce the speed of most operations significantly.

Connecting third-party file systems in Windows

The use of file systems is limited by their support at the OS level. For example, Windows does not understand ext2/3/4 and HFS+, but sometimes it is necessary to use them. This can be done by adding the appropriate driver.

WARNING

Most drivers and plugins for supporting third-party file systems have their limitations and do not always work stably. They may conflict with other drivers, antiviruses, and virtualization programs.

An open driver for reading and writing ext2/3 partitions with partial support for ext4. The latest version supports extents and partitions up to 16 TB. LVM, access control lists, and extended attributes are not supported.


There is a free plugin for Total Commander. Supports reading ext2/3/4 partitions.


coLinux is an open and free port of the Linux kernel. Together with a 32-bit driver, it allows you to run Linux on Windows from 2000 to 7 without using virtualization technologies. Supports 32-bit versions only. Development of a 64-bit modification was canceled. CoLinux also allows you to organize access to ext2/3/4 partitions from Windows. Support for the project was suspended in 2014.

Windows 10 may already have built-in support for Linux-specific file systems, it's just hidden. These thoughts are suggested by the kernel-level driver Lxcore.sys and the LxssManager service, which is loaded as a library by the Svchost.exe process. For more information about this, see Alex Ionescu’s report “The Linux Kernel Hidden Inside Windows 10,” which he gave at Black Hat 2016.


ExtFS for Windows is a paid driver produced by Paragon. It runs on Windows 7 to 10 and supports read/write access to ext2/3/4 volumes. Provides almost complete support for ext4 on Windows.

HFS+ for Windows 10 is another proprietary driver produced by Paragon Software. Despite the name, it works in all versions of Windows starting from XP. Provides full access to HFS+/HFSX file systems on disks with any layout (MBR/GPT).

WinBtrfs is an early development of a Btrfs driver for Windows. Already in version 0.6 it supports both read and write access to Btrfs volumes. It handles hard and symbolic links, supports alternate data streams, ACLs, two types of compression and asynchronous read/write mode. For now WinBtrfs does not include mkfs.btrfs, btrfs-balance or the other utilities needed to maintain this file system.

Capabilities and limitations of file systems: summary table

File system | Maximum volume size | Maximum size of one file | File name length | Full path length (from the root) | Maximum number of files and/or directories | File/directory date precision | Access rights | Hard links | Symbolic links | Snapshots | Background data compression | Background data encryption | Data deduplication
FAT16 | 2 GB with 512-byte sectors or 4 GB with 64 KB clusters | 2 GB | 255 bytes with LFN | - | - | - | - | - | - | - | - | - | -
FAT32 | 8 TB with 2 KB sectors | 4 GB (2^32 - 1 bytes) | 255 bytes with LFN | up to 32 subdirectories with CDS | 65460 | 10 ms (create) / 2 s (modify) | No | No | No | No | No | No | No
exFAT | ≈ 128 PB (2^32-1 clusters of 2^25-1 bytes) theoretical / 512 TB due to third-party restrictions | 16 EB (2^64 - 1 bytes) | - | - | 2,796,202 per directory | 10 ms | ACL | No | No | No | No | No | No
NTFS | 256 TB with 64 KB clusters or 16 TB with 4 KB clusters | 16 TB (Win 7) / 256 TB (Win 8) | 255 Unicode characters (UTF-16) | 32,760 Unicode characters, at most 255 characters per element | 2^32-1 | 100 ns | ACL | Yes | Yes | Yes | Yes | Yes | Yes
HFS+ | 8 EB (2^63 bytes) | 8 EB | 255 Unicode characters (UTF-16) | not limited separately | 2^32-1 | 1 s | Unix, ACL | Yes | Yes | No | Yes | Yes | No
APFS | 8 EB (2^63 bytes) | 8 EB | 255 Unicode characters (UTF-16) | not limited separately | 2^63 | 1 ns | Unix, ACL | Yes | Yes | Yes | Yes | Yes | Yes
Ext3 | 32 TB (theoretical) / 16 TB with 4 KB clusters (limitation of the e2fs tools) | 2 TB (theoretical) / 16 GB for older programs | 255 Unicode characters (UTF-16) | not limited separately | - | 1 s | Unix, ACL | Yes | Yes | No | No | No | No
Ext4 | 1 EB (theoretical) / 16 TB with 4 KB clusters (limitation of the e2fs tools) | 16 TB | 255 Unicode characters (UTF-16) | not limited separately | 4 billion | 1 ns | POSIX | Yes | Yes | No | No | Yes | No
F2FS | 16 TB | 3.94 TB | 255 bytes | not limited separately | - | 1 ns | POSIX, ACL | Yes | Yes | No | No | Yes | No
BTRFS | 16 EB (2^64 - 1 bytes) | 16 EB | 255 ASCII characters | 2^17 bytes | - | 1 ns | POSIX, ACL | Yes | Yes | Yes | Yes | Yes | Yes