ID	918f0526-89f3-4a5c-a08d-5fa9e99c2cc7
DeertopiaVisibility	public

Non-hierarchical file system

META: This file is concerned with the file system, and only discusses tags where they are directly relevant to it. For abstract notes on tags, see Tag-oriented organisation.

A non-hierarchical file system (NHFS) is one that rejects the traditional "folder-based" approach of hierarchical file systems (HFSes). This could be in favour of any alternative structure: a flat file system (one without directories), a database file system, a semantic file system, etc. These notes primarily concern themselves with the tag-based file system.

Motivation

Example: Photos

You do a photoshoot and capture a series of images, say “Fred-01.jpg”, “Fred-02.jpg”, ..., “Fred-30.jpg”. You delete some outtakes, leaving holes in the sequential file numbering. If you publish this subset of photos as is, the consumer might wonder why there are gaps in the numbering, and get unintended insight into how many raws were shot. But it gets worse – your coworker Mary was shooting alongside you, and offers you her collection of photos of the same subject. In a stroke of bad luck (or great minds thinking alike?), she also named her photos “Fred-01.jpg”, “Fred-02.jpg”, ..., “Fred-19.jpg”. You want the benefit of being able to use her files, but now you are left with a dilemma on how to organize them. You can either put her files in a separate folder so that you end up with “Me/Fred-01.jpg” vs. “Mary/Fred-01.jpg”, or you can re-number her files to integrate them into your collection (e.g. add 30 to Mary’s numbers), or you can add a prefix to her files (e.g. “Mary-Fred-01.jpg”) and put them into your folder. The first option of keeping collections in separate folders is the easiest to do but makes retrieval awkward; the second and third are awkward due to poor tools for renaming files, and have a high potential for human error (misnaming, accidentally deleting, or overwriting files). In all cases, you have to take extraneous actions to merge two collections together or retrieve files from a merged collection. These extra steps stem from the hierarchical filing paradigm, not from the idea of merging data.
---
[cite:@nayuki2017designing]

Example: Jade's music library

In general I have laid it out along the lines of <genre>/<subgenre>/.../<artist>/<album>/<track> which works out fine for 95% of albums. But there exist things such as compilations ... so I had to become inventive by using 🥁 🥁 🥁 symbolic links! So now I have <genre>/<subgenre>/.../[Compilation]/<album> and symlink it such that for any one artist the structure above is preserved. But from this I'm getting the issue of subgenres overlapping ... for example I have the hierarchy punk/hardcore punk/ as well as metal/<...> and I somehow have to integrate grindcore into this, which developed solely from hardcore but is nowadays considered a subgenre of metal ..... so more symlinking it is
---
[cite:@jade2024music]

Example: Madeleine's documents

My ~/Documents folder has been growing at an increasing rate:

/home/crumb/Documents
├── books
│   └── (42 entries)
├── papers
│   └── (54 entries)
└── (13 entries)

Notice my horrible organisation! I've arbitrarily decided to throw books in one pile, whitepapers in another, and a handful of unsorted documents at the root. Ideally, my documents would be tagged with topics, authors, publication years, etc. If I were preparing for a compiler project, I would love to quickly query my library for my compiler-related papers and books.

Note-taking

Trying to fit notes into a hierarchy is particularly awful. Dumping everything into some ~/Notes makes your precious knowledge undiscoverable. Organising into directories will quickly become unmaintainable (TODO: why?), and is a waste of time in the first place. Every respectable knowledge base that comes to mind takes the shape of a wiki, optionally with tagged articles. To my knowledge, the only exception is the 1Lab, a literate Agda library subject to the constraints imposed by Agda's module system; see Implications for programming languages. Examples of non-hierarchical knowledge management include org-roam, Wikipedia, the nLab, and many, many others. Instead of topics being organised manually by placing them into different directories, articles are given titles and aliases, and connections between articles arise naturally whenever one article mentions another's topic.

The HFS→NHFS injection

Permissions

TODO investigate NTFS permissions

The CWD

This is mostly a matter of UI/UX, but could potentially be an issue of compatibility, depending on the scale. How does one navigate a NHFS? Does one navigate a NHFS at all?

Navigation

Related to the CWD problem. How does one navigate a NHFS?

Set expressions

Navigation of a HFS is something like incremental construction of a path; one metaphor for navigation of a NHFS is incremental construction of a set expression, using disjunction, conjunction, implication, and negation. One would start at the "root" (the set of everything on the file system), and add additional constraints one by one.

There is no way the naïve implementation is at all feasible, lmfao. Any form of previewing the results (e.g., tab-completions, or display in a graphical file manager) would likely have to be done lazily, with generous caching, and with items received in arbitrary order. If the file system could guarantee search/traversal in any meaningful order, that would be pretty cool.
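
A sketch of what "lazily, with generous caching, in arbitrary order" might mean for the simplest case, a pure conjunction of tags. It assumes a hypothetical in-memory index from tags to frozen sets of file identifiers (made-up data). Results are streamed from the smallest tag set and checked for membership in the others, so the first few arrive without materialising the whole answer; `functools.lru_cache` stands in for the caching, and would need proper invalidation in a real file system whenever tags change. The order is whatever set iteration happens to give.

```python
from functools import lru_cache
from itertools import islice
from typing import Iterator

# Hypothetical tag index: tag -> files carrying that tag (made-up data).
INDEX: dict[str, frozenset[str]] = {
    "paper": frozenset({"ssa.pdf", "gc.pdf"}),
    "book": frozenset({"dragon.pdf", "sicp.pdf"}),
    "compilers": frozenset({"ssa.pdf", "dragon.pdf"}),
}
ALL_FILES: frozenset[str] = frozenset().union(*INDEX.values())


def matching(tags: frozenset[str]) -> Iterator[str]:
    """Lazily yield files carrying every tag in `tags`, in arbitrary order."""
    if not tags:
        # No constraints yet: we are at the root, so everything matches.
        yield from ALL_FILES
        return
    postings = sorted((INDEX.get(tag, frozenset()) for tag in tags), key=len)
    smallest, rest = postings[0], postings[1:]
    # Walk the smallest set and test membership in the others.
    for file in smallest:
        if all(file in posting for posting in rest):
            yield file


@lru_cache(maxsize=1024)
def preview(tags: frozenset[str], limit: int) -> tuple[str, ...]:
    """First `limit` results (e.g. for tab completion), cached per query.

    Caveat: this cache is never invalidated, so it goes stale if INDEX changes.
    """
    return tuple(islice(matching(tags), limit))
```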

Compatibility with the FHS

In a system built from the ground up with a NHFS, how are system libraries and executables located? I think NHFSes have the potential to offer a lot in this regard. Imagine if, instead of your $PATH being a list of arbitrary directories (with a significant number of files symlinked into those directories!), you declared $PATH to be a query for files tagged "on-path". This example is a pretty simple change, but I can sense a lot of potential here. Continue to ponder this!
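
A minimal sketch of command lookup as a tag query, assuming a hypothetical list of file records that each carry an identifier, a name, and a set of tags (none of this is an existing API). One thing it makes visible: a directory-based $PATH settles name clashes by directory order, whereas a single "on-path" query has no built-in order, so two matching files are simply ambiguous. Raising an error in that case is an arbitrary choice made here, and the question leads straight into Identity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    ident: str  # hypothetical file identifier; see Identity
    name: str
    tags: frozenset[str]


class AmbiguousCommand(Exception):
    """More than one on-path file answers to the same name."""


def resolve_command(name: str, files: list[FileRecord], path_tag: str = "on-path") -> FileRecord | None:
    """Treat $PATH as the query: name == <name> AND tagged <path_tag>."""
    matches = [f for f in files if f.name == name and path_tag in f.tags]
    if len(matches) > 1:
        raise AmbiguousCommand(f"{name!r} matches {sorted(f.ident for f in matches)}")
    return matches[0] if matches else None


# Made-up example file system.
FILES = [
    FileRecord("f1", "grep", frozenset({"on-path", "gnu"})),
    FileRecord("f2", "grep", frozenset({"source-tree"})),
    FileRecord("f3", "ls", frozenset({"on-path", "gnu"})),
    FileRecord("f4", "ls", frozenset({"on-path", "busybox"})),
]
```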

Implications for programming languages

Programming languages typically model modules in parallel with, or atop, the underlying HFS. How do NHFSes play into this?

Identity

Identity appears to be the single most difficult point in designing a NHFS. At the end of the day, there will always be situations where uniquely and reliably identifying a single file object will be necessary. On the implementation's side, files must be identified by something to be meaningfully queried at all; for the user, the ability to refer to a single file is essential — which file am I executing/reading/writing?

Hashing

While really cool, identification based on a hash has the inherent issue of not allowing mutability; i.e., to modify a file is to delete it and create a new file with the applied modifications. — Jade.
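
To make the problem concrete, a minimal content-addressed store in Python, assuming the SHA-256 of a file's bytes is its identifier (the class and method names are hypothetical). There is no operation that changes contents while keeping the identity: `modify` can only drop the old object and store a new one under a new hash.

```python
import hashlib


class ContentStore:
    """Toy store in which a file's identity *is* the hash of its contents."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        ident = hashlib.sha256(data).hexdigest()
        self._objects[ident] = data  # identical contents collapse to one object
        return ident

    def get(self, ident: str) -> bytes:
        return self._objects[ident]

    def modify(self, ident: str, new_data: bytes) -> str:
        """'Modify' = delete the old object, create a new one, get a new identity."""
        del self._objects[ident]
        return self.put(new_data)

    def __contains__(self, ident: str) -> bool:
        return ident in self._objects
```

Anything still holding the old identifier after a `modify` now points at nothing.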

Diff chains

To make immutability more feasible, one could imagine files being stored as a linked list of diffs/patches; however, this carries space concerns.
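
A sketch of one way such a chain could look, assuming each version stores only a byte-range replacement plus a pointer to its parent, and is identified by hashing its parent's identifier together with its own patch (all names hypothetical). Old versions stay immutable and reconstructible, but the costs are visible too: nothing is ever discarded, and reading the latest version means replaying the whole chain from the root.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """Replace data[start:end] with `insert`."""
    start: int
    end: int
    insert: bytes

    def apply(self, data: bytes) -> bytes:
        return data[: self.start] + self.insert + data[self.end :]


@dataclass(frozen=True)
class Version:
    parent: Version | None  # None for the first version (patched onto b"")
    patch: Patch

    def chain(self) -> list[Version]:
        """All versions from the root up to and including this one."""
        nodes: list[Version] = []
        node: Version | None = self
        while node is not None:
            nodes.append(node)
            node = node.parent
        nodes.reverse()
        return nodes

    def content(self) -> bytes:
        # Replay every patch from the root: cost grows with history length.
        data = b""
        for node in self.chain():
            data = node.patch.apply(data)
        return data

    def ident(self) -> str:
        # Hash-chain identity: depends on this patch and on all ancestors.
        digest = ""
        for node in self.chain():
            digest = hashlib.sha256(digest.encode() + repr(node.patch).encode()).hexdigest()
        return digest
```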

Reference correction

Upon modification, all references to a file could be updated. This is time-consuming and error-prone.
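
A minimal sketch of reference correction, assuming references are kept in a hypothetical table mapping each referring object to the identifiers it points at (made-up data). Without a reverse index, every modification scans every reference; a reverse index removes the scan but becomes one more structure that must be updated correctly on every write.

```python
def correct_references(refs: dict[str, list[str]], old_id: str, new_id: str) -> int:
    """Repoint every reference to `old_id` at `new_id`, in place.

    Returns how many references were rewritten. This scans the whole table,
    so the cost is proportional to the total number of references, not to
    the number that actually change.
    """
    rewritten = 0
    for targets in refs.values():
        for i, target in enumerate(targets):
            if target == old_id:
                targets[i] = new_id
                rewritten += 1
    return rewritten


# Made-up example: notes and a reading list referring to files by identifier.
REFS = {
    "note-compilers": ["id-ssa", "id-dragon"],
    "note-gc": ["id-gc"],
    "reading-list": ["id-ssa"],
}
```

If the referrers are themselves identified by hash, rewriting their references changes their identities too, so the correction cascades up to everything that refers to them.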

UUID / Global counter

Ugly! — Jade.

Userspace solutions

One approach is to implement a virtual file system using FUSE. A consequence of this, observed in tmsu, is that the "guest" file system becomes very fragile to changes made in the "host" file system. This could potentially be solved with a flat Nix-style store, into which files are hard-linked, and periodically garbage-collected.
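
A rough sketch of the hard-link store, under assumptions the note leaves open: entries are named by the SHA-256 of their contents (as in the Nix store), the store lives on the same file system as the files it tracks (hard links cannot cross file systems), and an entry counts as garbage once the store's link is the only one left (`st_nlink == 1`), i.e. the file is gone everywhere else. Because the store holds a link to the inode rather than a path, renaming or moving the original within that file system does not break the entry. The function names are hypothetical.

```python
import hashlib
import os
from pathlib import Path


def add_to_store(store: Path, src: Path) -> Path:
    """Hard-link `src` into a flat store, named by its content hash.

    If an entry with the same hash already exists (e.g. an identical file
    added earlier), it is reused and `src` is not linked again.
    Caveat: an in-place edit through `src` also changes the store entry,
    whose name then no longer matches its contents (see Hashing).
    """
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    entry = store / digest
    if not entry.exists():
        os.link(src, entry)  # raises OSError (EXDEV) across file systems
    return entry


def collect_garbage(store: Path) -> list[Path]:
    """Delete entries that are no longer linked from anywhere but the store."""
    removed = []
    for entry in list(store.iterdir()):
        if entry.is_file() and entry.stat().st_nlink == 1:
            entry.unlink()
            removed.append(entry)
    return removed
```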

Projects

https://karl-voit.at/tagstore/en/papers.shtml

BeFS

See the Wikipedia article.

TagFS

Authors	Mark Watts

Source code

Designing better file organization around tags, not hierarchies

[cite:@nayuki2017designing]

A tag-based filesystem for Ubuntu

A Novel, Tag-Based File-System

[cite:@yang2012novel]

tmsu

Perkeep

Homepage

Perkeep (née Camlistore) is a set of open source formats, protocols, and software for modeling, storing, searching, sharing and synchronizing data in the post-PC era. Data may be files or objects, tweets or 5TB videos, and you can access it via a phone, browser or FUSE filesystem.

The Naming System Venture

Archived article. A planning document for ReiserFS.

ReiserFS

See the Wikipedia article.

Tagsistent

Homepage

Dantalian

Homepage

Wikipedia — Semantic file system

Wikipedia — Database file systems

Stack Overflow — File system that uses tags rather than folders

Going beyond the hierarchical file system

The ubiquitous digital file: a review of file management research