| ID | 918f0526-89f3-4a5c-a08d-5fa9e99c2cc7 |
|---|---|
| DeertopiaVisibility | public |
Non-hierarchical file system
META: This file is concerned with the file system, which only discusses tags when they are directly relevant to the file system. For abstract notes on tags, see Tag-oriented organisation.
A non-hierarchical file system is one that rejects the traditional "folder-based" approach to file systems. This could be in favour of any alternative structure: a flat file system (one without directories), a database file system, a semantic file system, etc. These notes primarily concern themselves with the tag-based file system.
Motivation
Example: Photos
You do a photoshoot and capture a series of images, say “Fred-01.jpg”, “Fred-02.jpg”, ..., “Fred-30.jpg”. You delete some outtakes, leaving holes in the sequential file numbering. If you publish this subset of photos as is, the consumer might wonder why there are gaps in the numbering, and get unintended insight into how many raws were shot. But it gets worse – your coworker Mary was shooting alongside you, and offers you her collection of photos of the same subject. In a stroke of bad luck (or great minds thinking alike?), she also named her photos “Fred-01.jpg”, “Fred-02.jpg”, ..., “Fred-19.jpg”. You want the benefit of being able to use her files, but now you are left with a dilemma on how to organize them. You can either put her files in a separate folder so that you end up with “Me/Fred-01.jpg” vs. “Mary/Fred-01.jpg”, or you can re-number her files to integrate them into your collection (e.g. add 30 to Mary’s numbers), or you can add a prefix to her files (e.g. “Mary-Fred-01.jpg”) and put them into your folder. The first option of keeping collections in separate folders is the easiest to do but makes retrieval awkward; the second and third are awkward due to poor tools for renaming files, and have a high potential for human error (misnaming, accidentally deleting, or overwriting files). In all cases, you have to take extraneous actions to merge two collections together or retrieve files from a merged collection. These extra steps stem from the hierarchical filing paradigm, not from the idea of merging data.
---
[cite:@nayuki2017designing]
Example: Jade's music library
In general I have laid it out along the lines of
<genre>/<subgenre>/.../<artist>/<album>/<track> which works out fine for 95% of albums. But there exist things such as compilations ... so I had to become inventive by using 🥁 🥁 🥁 symbolic links! So now I have <genre>/<subgenre>/.../[Compilation]/<album> and symlink it such that for any one artist the structure above is preserved. But from this I'm getting the issue of subgenres overlapping ... for example I have the hierarchy punk/hardcore punk/ as well as metal/<...> and I somehow have to integrate grindcore into this, which developed solely from hardcore but is nowadays considered a subgenre of metal ..... so more symlinking it is
---
[cite:@jade2024music]
Example: Madeleine's documents
My ~/Documents folder has been growing at an increasing rate:
/home/crumb/Documents
├── books
│   ├── (42 entries)
├── papers
│   ├── (54 entries)
└── (13 entries)
Notice my horrible organisation! I've arbitrarily decided to throw books in one pile, whitepapers in another, and a handful of unsorted documents at the root. Ideally, my documents would be tagged with topics, authors, publish years, etc. If I were preparing for a compiler project, I would love to quickly query my library to see my compiler-related papers and books.
Note-taking
Trying to fit notes into a hierarchy is particularly awful. Dumping everything into some ~/Notes makes your precious knowledge undiscoverable. Organising into directories will quickly become unmaintainable (TODO: why?), and is a waste of time in the first place. Every respectable knowledge base that comes to mind takes the shape of a wiki, optionally with tagged articles. To my awareness, the only exception is the 1Lab — a literate Agda library, subject to the constraints imposed by Agda's module system; see Implications for programming languages. Examples of non-hierarchical knowledge management include org-roam, Wikipedia, the nLab, and many, many others. Instead of manually organising topics by placing them into different directories, articles are given titles and aliases, and connections between articles are made naturally by mention of said topics.
The HFS→NHFS injection
Permissions
TODO investigate NTFS permissions
The CWD
This is mostly a matter of UI/UX, but could potentially be an issue of compatibility, depending on the scale. How does one navigate a NHFS? Does one navigate a NHFS at all?
Navigation
Related to the CWD problem. How does one navigate a NHFS?
Set expressions
Navigation of a HFS is something like incremental construction of a path; one metaphor for navigation of a NHFS is incremental construction of a set expression, using disjunction, conjunction, implication, and negation. One would start at the "root" (the set of everything on the file system), and add additional constraints one by one.
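To make the metaphor concrete, here is a minimal sketch of incremental set-expression construction. Everything in it is an assumption for illustration: the in-memory `INDEX`, the combinator names (`tagged`, `conj`, `disj`, `neg`, `implies`), and `select` are hypothetical and do not correspond to any existing NHFS.

```python
# Hypothetical in-memory tag index (file name -> set of tags); not a real API.
INDEX = {
    "paper-a.pdf": {"paper", "compilers"},
    "book-b.epub": {"book", "compilers"},
    "notes-c.org": {"notes"},
}

# A set expression is modelled as a predicate over one file's tag set.
def tagged(tag):
    return lambda tags: tag in tags

def conj(p, q):
    return lambda tags: p(tags) and q(tags)

def disj(p, q):
    return lambda tags: p(tags) or q(tags)

def neg(p):
    return lambda tags: not p(tags)

def implies(p, q):
    # p -> q is equivalent to (not p) or q.
    return disj(neg(p), q)

def root(tags):
    # The "root": every file on the file system satisfies it.
    return True

def select(expr, index=INDEX):
    """Evaluate a set expression eagerly against the whole index."""
    return {name for name, tags in index.items() if expr(tags)}

# "Navigating" means narrowing the expression one constraint at a time.
query = root
query = conj(query, tagged("compilers"))
query = conj(query, neg(tagged("book")))
result = select(query)
```

Here `result` contains only `paper-a.pdf`. Note that `select` scans every file for every query, which is exactly the naïve evaluation strategy.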
There is no way the naïve implementation is at all feasible, lmfao. Any form of previewing the results (e.g., tab-completions, or display in a graphical file manager) would likely have to be done lazily, with generous caching, and with items received in arbitrary order. If the file system could guarantee search/traversal in any meaningful order, that would be pretty cool.
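As a rough sketch of what "lazily, with generous caching" might look like, the following uses a generator so that a preview only pulls as many matches as it needs, and memoises previews per query. The `FILES` index and the `matches`/`preview` names are hypothetical, and cache invalidation (which a real file system would need whenever tags change) is deliberately left out.

```python
from functools import lru_cache
from itertools import islice

# Hypothetical tag index; a real NHFS would not hold this in memory.
FILES = {
    "paper-a.pdf": frozenset({"paper", "compilers"}),
    "book-b.epub": frozenset({"book", "compilers"}),
    "paper-d.pdf": frozenset({"paper", "compilers"}),
    "notes-c.org": frozenset({"notes"}),
}

def matches(required):
    """Lazily yield files carrying every tag in `required`.

    Callers should not rely on the order in which names are produced.
    """
    for name, tags in FILES.items():
        if required <= tags:
            yield name

@lru_cache(maxsize=128)
def preview(required, limit=2):
    """Return at most `limit` matches, stopping the scan early."""
    return tuple(islice(matches(required), limit))
```

A completion menu could call `preview(frozenset({"compilers"}))` on every keystroke; only the first call scans, and it stops after two hits.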
Compatibility with the FHS
In a system built from the ground up with a NHFS, how are system libraries and executables located? I think NHFSes have the potential to offer a lot in this regard. Imagine if instead of your $PATH being a list of arbitrary directories (with a significant number of files symlinked into those directories!), you declared the $PATH to be a query for files tagged with "on-path." This example is a pretty simple change, but I can sense a lot of potential here. Continue to ponder this!
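A toy version of the "on-path" idea, assuming a hypothetical tag index keyed by store paths. The paths, the tag names other than "on-path", and the `resolve` function are all made up for illustration.

```python
import posixpath

# Hypothetical index: file path -> tags. The paths are invented.
TAGS = {
    "/store/a1/ls": {"on-path", "coreutils"},
    "/store/b2/gcc": {"on-path", "compiler"},
    "/store/c3/libfoo.so": {"library"},
}

def resolve(command, tags=TAGS):
    """Find an executable by querying for the "on-path" tag,
    instead of walking a list of directories."""
    for path, file_tags in tags.items():
        if "on-path" in file_tags and posixpath.basename(path) == command:
            return path
    return None
```

With this index, `resolve("gcc")` finds `/store/b2/gcc` without any symlink farm, while `resolve("libfoo.so")` returns `None` because the library is not tagged "on-path". What to do when two tagged files share a name is left open here, just as it is in the notes.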
Implications for programming languages
| ID | 9d828b35-672f-4256-929e-381fbc50d585 |
|---|---|
| DeertopiaVisibility | public |
| ROAM_EXCLUDE | t |
Programming languages typically model modules in parallel with, or atop, the underlying HFS. How do NHFSes play into this?
Identity
Identity appears to be the single most difficult point in designing a NHFS. At the end of the day, there will always be situations where uniquely and reliably identifying a single file object will be necessary. On the implementation's side, files must be identified by something to be meaningfully queried at all; for the user, the ability to refer to a single file is essential — which file am I executing/reading/writing?
Hashing
While really cool, identification based on a hash has the inherent issue of not allowing mutability. i.e., to modify a file is to delete it, and create a new file with the applied modifications. — Jade.
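A small sketch of the mutability problem, assuming SHA-256 as the hash and a plain dictionary as the store (both are choices made for this example, not something the notes prescribe):

```python
import hashlib

# Hypothetical content-addressed store: identifier -> content.
store = {}

def put(content: bytes) -> str:
    """Store content under its SHA-256 digest and return that identifier."""
    digest = hashlib.sha256(content).hexdigest()
    store[digest] = content
    return digest

original_id = put(b"hello")
edited_id = put(b"hello, world")  # an "edit" produces a brand-new object
same_id = put(b"hello")           # identical content maps to the same id
```

After the "edit", `original_id` still names the old content and `edited_id` names a different object, so anything that referred to the file by `original_id` does not see the change.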
Diff chains
To make immutability more feasible, one could imagine files being stored as a linked list of diffs/patches; however, this carries space concerns.
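One way such a chain could look, sketched with `difflib.SequenceMatcher` from the Python standard library. The patch format (a list of `(start, end, replacement)` triples over lines) and the function names are assumptions made for this example.

```python
from difflib import SequenceMatcher

def make_patch(old, new):
    """Record only the changed regions needed to turn `old` into `new`."""
    ops = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    return [(i1, i2, new[j1:j2]) for tag, i1, i2, j1, j2 in ops if tag != "equal"]

def apply_patch(old, patch):
    """Rebuild the next version from the previous one plus a patch."""
    out, pos = [], 0
    for start, end, replacement in patch:
        out.extend(old[pos:start])  # copy the unchanged run before the edit
        out.extend(replacement)     # inserted or replacing lines
        pos = end                   # skip the deleted or replaced lines
    out.extend(old[pos:])
    return out

def checkout(base, chain):
    """Replay every patch in order to obtain the latest version."""
    lines = base
    for patch in chain:
        lines = apply_patch(lines, patch)
    return lines

v0 = ["a", "b", "c"]
v1 = ["a", "B", "c", "d"]
v2 = ["B", "c", "d"]
chain = [make_patch(v0, v1), make_patch(v1, v2)]
latest = checkout(v0, chain)
```

The sketch also hints at a cost beyond space: reading the latest version means replaying the whole chain from the base.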
Reference correction
Upon modification, all references to a file could be updated. This is time-consuming, and error-prone.
UUID / Global counter
Ugly! — Jade.
Userspace solutions
One approach is to implement a virtual file system using FUSE. A consequence of this, observed in tmsu, is that the "guest" file system becomes very fragile to changes made in the "host" file system. This could potentially be solved with a flat Nix-style store, into which files are hard-linked, and periodically garbage-collected.
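A very rough sketch of the flat-store idea, under several assumptions: store entries are named by the SHA-256 of the content at the time of linking, garbage is defined as "no hard link left outside the store" (link count of 1), and everything lives on one file system so that hard links are possible. The function names are hypothetical, and this is not how tmsu or Nix actually work.

```python
import hashlib
import os
import tempfile

def add_to_store(store, path):
    """Hard-link `path` into the flat store under its content hash."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    target = os.path.join(store, digest)
    if not os.path.exists(target):
        os.link(path, target)  # same inode, second name
    return target

def collect_garbage(store):
    """Remove entries whose only remaining link is the store's own."""
    removed = []
    for name in os.listdir(store):
        entry = os.path.join(store, name)
        if os.stat(entry).st_nlink == 1:
            os.remove(entry)
            removed.append(name)
    return removed

# Demonstration in a throwaway directory.
root = tempfile.mkdtemp()
store = os.path.join(root, "store")
os.mkdir(store)
host = os.path.join(root, "a.txt")
with open(host, "wb") as f:
    f.write(b"hi")

entry = add_to_store(store, host)
os.rename(host, os.path.join(root, "b.txt"))  # host-side rename
with open(entry, "rb") as f:
    survived = f.read() == b"hi"              # store entry is unaffected

kept = collect_garbage(store)                 # b.txt still links the inode
os.remove(os.path.join(root, "b.txt"))
removed = collect_garbage(store)              # now only the store links it
```

Renaming or moving the host file no longer breaks the store entry. One caveat of this particular sketch: because a hard link shares content, editing the host file in place leaves the store entry's hash-based name stale.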
Projects
BeFS
See the Wikipedia article.
TagFS
| Authors | Mark Watts |
|---|---|
Designing better file organization around tags, not hierarchies
[cite:@nayuki2017designing]
A tag-based filesystem for Ubuntu
A Novel, Tag-Based File-System
[cite:@yang2012novel]
tmsu
Perkeep
Perkeep (née Camlistore) is a set of open source formats, protocols, and software for modeling, storing, searching, sharing and synchronizing data in the post-PC era. Data may be files or objects, tweets or 5TB videos, and you can access it via a phone, browser or FUSE filesystem.
The Naming System Venture
Archived article. A planning document for ReiserFS.
ReiserFS
| ID | 157d191d-fd59-4f04-8b4f-84578a7ae9e2 |
|---|---|
| DeertopiaVisibility | public |
| ROAM_EXCLUDE | t |
See the Wikipedia article.