With advent of fast and high-res camera mobile phones, comes the need to manage your cloud storage. Memories are just a few taps away residing in your pocket, though, stored somewhere on a cloud server (most likely at Google Photos or iCloud Photos). I have been a fanboy of Google products since childhood. And as I grow old, I am also becoming a fan of Apple products. But, the programmer within me loves everything FOSS(linux anyone?). Darn it, why can’t I pick a side. Actually, I do not have to. By now, you might be wondering where am I leading to with all of this?
We all have enjoyed free backups. We might have enjoyed fast mobile cameras too. Alas, free things don’t last. Now, you need to declutter your storage every now and then. And, you need to pay to keep on continuing cloud-based photo managers. My memories are now split between worlds of Apple and Google(Android) because of their incompatible setup. Google Photos has entirely moved away from desktop versions. And, iCloud does not have smart features like Google Photos, plus a lot of my data is in Google Photos already.
With this confusing context, the actual problem I am trying to tackle here is to declutter and organise photos, but without help of any cloud provider. This should help in choosing right cloud backup provider and also allow for escaping a vendor lock-in, should you choose to keep self managed backups using the VPN approach.
In order to reduce photos, I want to be able to
I was focusing on first two problems, which made me realise,
A DBSCAN (density based scan) algorithm for clustering can provide unsupervised mechanism to build groups (clusters). The two parameters to the algorithm can be thought as
For my usecase, a count of 2 with distance as 2min can build me photographs which were taken together within 2 minutes of each other grouped together. This can then help me reduce clutter and choose which photographs to keep. Alternatively, if I have larger distance value and larger count, say 1 hour and 5 photos, I shall see groups resembling a travel kind of memory.
To implement this, I went ahead with rust to revise my previous learnings. Also, who does not love a bite of smaller memory and cpu footprint ?. Check out the codebase at fileman. The command line utility fileman is available for macos target for now
➜ release git:(main) ✗ ./fileman --help
fileman is a simple tool to group files based on create_date os timestamp using dbscan algorithm. author: Sumit Mundra
Usage: fileman [OPTIONS] --input-path <INPUT_PATH> --target-path <TARGET_PATH>
Options:
-i, --input-path <INPUT_PATH>
the directory to scan
-o, --target-path <TARGET_PATH>
output directory with links to input files grouped inside cluster-wise folders
-t, --time-interval-sec <TIME_INTERVAL_SEC>
duration in seconds between two files to cluster them together [default: 600]
-c, --min-cluster-size <MIN_CLUSTER_SIZE>
minimum count of files needed to define a cluster [default: 3]
-p, --tag-prefix <TAG_PREFIX>
prefix to be used as tag, no tagging if empty [default: cluster]
-h, --help
Print help
-V, --version
Print version
➜ release git:(main) ✗
To ease out the viewing problems of groups, fileman
provides two approaches for now,
On my MBP 2016 model(A1707), fileman
could cluster a couple of day’s worth of images (77) in under 20ms.
➜ release git:(main) ✗ ./fileman -i /Users/sumitmundra/Downloads/file_input_data -o /Users/sumitmundra/Downloads/out_path
Finished in 17 ms
I would add capacity to read multiple filepaths from a file and then allow to add/remove tags independent of clustering part.
Fun fact: The fileman currently works on all file types. It can help you sort out your downloads folder in accordance of your browsing activity ;).