Controlled hallucination

Personal Accounting

Published: - Reading time: 14 minutes

I always liked keeping track of my expenses. When I started working and earning my own money, I was already using Linux and always looking for open source alternatives for tools I needed. Not surprisingly, the first program I ever used for tracking my expenses was GnuCash. Fast-forward to today, and I have moved into an equilibrium that I don’t think will last much longer. For the past 4 years I have been using a web app I developed, and exporting the results into a text file in hledger format. My current “accounting workflow” consists of two main activities:

  1. Manual entry for cash expenses. I do this every day I spend cash.
  2. Extracting data from the bank and credit card statements, automating as much as possible. I suffer through this once every month or so.

Basically:

flowchart LR DB[(CouchDB)] M(Manual Entry) --> A(App) S(Scripts) --> A A --> hledger A --> DB

In this post, I will not talk about point number two, which has become the most cumbersome.

These are some videos of the webapp, rendered at the scale of an iPhone 13. Please ignore the data, it is just test data I added.

  • It is a progressive web application (PWA) written in React.
  • It uses CouchDB for synchronization across devices.
  • It encrypts data in the browser before synchronizing.

Feature I don’t use much:

  • Graph reporting

Feature I never use:

  • Envelope budgeting method (I won’t discuss this here).

But more on that later. First, the boring part.

Double-Entry Accounting

I never cared much for accounting and avoided studying it formally as much as I could, both at school and at university. This might come as a surprise, given I am writing a post called Personal Accounting. I can live with that contradiction.

You can read what double-entry accounting is in Wikipedia and countless blogs and books. My brief explanation is that something does not come out of nothing. Or better said, what goes in one account, has to go into another. For example, lets say I spent $10 in cash at the supermarket, those $10 came out of my “cash” account. This transaction is balanced, because when you add up all the amounts in the entries, you end up with zero (i.e. sum(debits)=sum(credits)).

In a ledger-like format, we can express the example above like this:

2023-05-28 Supermarket
  Expenses:Groceries   $10
  Assets:Cash         -$10

A transaction has two or more entries. An entry specifies a currency, amount and the corresponding account.

This was a simple example. But it can get more complicated in everyday life.

Multiple entries (transaction splits)

In many situations, a transaction has more than two entries. A typical example could be dining out and paying with a credit card, but leaving the tip with cash. I like to track this in a single transaction.

2023-07-17 Beers with friends
    Expenses:Socials        $250
    Assets:Bank            -$200
    Assets:Cash            -$50

GnuCash calls these transaction splits. I use them often. In the screenshots above, you can see this one:

Split Transaction

Different entry dates

In Argentina, we buy things in installments, so it is normal for me to have transactions like these:

2023-07-17 Pharmacy
    Expenses:Health & Beauty        $300
    Liabilities:Visa               -$100
    Liabilities:Visa               -$100 ; date:2023-08-17
    Liabilities:Visa               -$100 ; date:2023-09-17

In this example, there is more than one date. The first $100 installment “inherits” the transaction’s date. But the remaining two, don’t.

After researching how different tools and persons suggested tracking credit card installments, I settled on this approach because:

  • I prefer specifying the whole $300 in one expense, at the date when I spend it. In my head, I already spent the money. Other methods alsto split the expense per installment.
  • It is also clear to me I owe Visa $200 after I pay them this month. If I graph my debts (liabilities) into the future, those two entries would appear in the following months.

In the screenshots above, you’ll see this one:

Installments

Compared with the previous example, the only difference in rendering is that all accounts are the same, so we can just add ${account} x ${count}.

Year 2019 - Hello React and plain text accounting

By 2019 I already had a few gripes with GnuCash, and the companion app I was using on my phone to do the data-entry of the cash expenses. The app did not handle multiple currencies. GnuCash was very hard to script. The account structure was, for me at least, too rigid. For example, I use multiple currencies a lot, so I prefer having ONE Expenses:Groceries account. I don’t care that it mixes currencies. At least back then, an account in GnuCash was tied to one currency. So I ended up with Expenses:Groceries (ARS), Expenses:Groceries (EUR) etc.

I tried out a few open source alternatives and eventually found plain text accounting and it resonated with me. You basically write the transaction in a simple text-based format and other tools do the accounting stuff.

At that time, I wanted to try out React, and I wanted to see how hard/easy it was to build a PWA. So I took this side project as an opportunity to explore these topics.

My requirements were mainly:

  1. Cross-platform (I used Android, my wife iOS, my laptop Linux)
  2. Work offline (when I travel, I don’t always have internet)
  3. Replication (for a backup in the server, my wife could also do data-entry)
  4. Client-side encryption
  5. It should auto-complete past entries, and other nice things to improve the data-entry experience.

The end product

These are the building blocks I used:

React and PWA

Learning React was not easy. However, setting up a project, and getting a real PWA that you can “install” in your phone was super easy.

At the time, Android support for PWA was way ahead of iOS, though I have seen it get better very recently.

The hard part about the PWA was getting the app to actually download the new version. I had already seen “Click to update the app” in the wild, and I figured it was easier.

CouchDB

I had played with CouchDB years ago, and I really enjoyed it’s (apparent) simplicity. These things seemed like a good fit:

  • A JSON document model
  • Built-in replication model
  • Views based on map/reduce (more on that later)

After a quick search on the internet, I found pouchdb and I thought it looked promising. It uses IndexedDB in the browser and synchronizes against CouchDB.

I installed a CouchDB server through bitnami - hosted in Google Cloud. The server had HTTPS certificates issued with Let’s Encrypt. I created a user and password, and enabled the Database Per User feature. Not much more.

The amount of data was never going to be huge. Even now, after four years, I have less than 7,000 transactions. According to Firefox, it uses less than 50MB of storage.

Data Model

Lets take the first transactions as an example:

2023-05-28 Supermarket
  Expenses:Groceries   $10
  Assets:Cash         -$10

This gets transformed into this JSON document

{
    "_id": "2023-05-28-dae19d76-69a4-4378-8295-5b374bd3a322",
    "date": "2023-05-28",
    "description": "Supermarket",
    "postings": [
        {
            "account": "Expenses:Groceries",
            "currency": "ARS",
            "amount": 1000
        },
        {
            "account": "Assets:Cash",
            "currency": "ARS",
            "amount": -1000
        }
    ],
    "accounts": ["Assets:Cash", "Expenses:Groceries"],
    "createdAt": 1685289802430
}

The _id field (document ID) is the date of the transaction, plus a random UUID. Adding the date in the ID allows me to do efficient range queries and sorting (necessary for a “endless-scroll” for example). This of course means that changing the transaction date involves actually creating a new document and deleting the previous one.

The postings field is an array of all the accounts involved in the transaction. This gets indexed.

The amount field is in cents. This is an easy way of not dealing with rounding issues with floating points in JavaScript. I basically hard-coded the minor unit to 1/100. This was not very useful when I added stocks in my app, as you sometimes end up with a fraction in the thousands (1/1000) of a stock.

Map/Reduce

CouchDB uses map/reduce extensively. For example, if I wanted account balances, all I had to do was create a view that given a document, emitted an amount for each account the postings:

const exampleView = {
  views: {
    balance: { // DEFINE A `balance` VIEW
      map: function(doc) {
        doc.postings.forEach(function(p) {
            emit([p.account, p.currency].toString(), p.amount);
        });
      }.toString(),
      reduce: "_sum"
    }
  }
}

This would then produce a reduced list of my (Account,Currency, Balance) tuples. If I wanted to add tags to the app later, as another method to organize transactions, I could have other views for tags.

I could also emit more documents for nested accounts. For example, for Expenses:Utilities:Electricity I could emit 3:

  • Expenses
  • Expenses:Utilities
  • Expenses:Utilities:Electricity

However, pouchdb did not perform well here. At least back in 2019, it was faster to iterate over the default _all_docs view. So though it was fun, I ended up scrapping that code.

Encryption

I will tell you upfront that I am not a security expert. I am well aware that keeping web apps secure is not an easy endeavour. This is a personal project where I wanted to try some things out and if somebody manages to read my browser’s database, it would be more invasive than if they open my paper bank statements.

Having said that, this is a summary of how I handled encryption:

  1. No data leaves the browser in plain-text.
  2. A local copy of the database lives in a pouchdb instance (unencrypted).
  3. Username, password and encryption passphrase are stored in plain-text in the local pouchdb database.

At it’s core, the data at rest in the browser has the same-origin policy guarantees (at least in firefox where IndexedDB is used). If you are interested in security in web applications, I recommend this Auth0 article: Secure Browser Storage: The Facts.

To encrypt the document, I used the Transform Pouch plugin coupled with the native crypto primitives available at the time in the browser: Web Crypto API.

sequenceDiagram PouchDB->>Transform: Encrypt Document Transform->>Server: Encrypted Data Server-->>Transform: Encrypted Data Transform-->>PouchDB: Decrypt Document

Each document gets its own initialization vector (IV) for the AES-GCM algorithm, which gets sent along with the encrypted data. We can’t encrypt the _id nor the _rev fields, because that would render the document meaningless in replication. An encrypted document looks something like this:

{
  "_id": "2020-10-03-05739058-67e5-4ad7-91b5-17712c429e9d",
  "_rev": "1-21b9483a4a9aa5101e8537b7ba501d68",
  "data": "87f083d27914e67ee4a38060|642be(...)9120"
}

Where the data is the iv|ciphertext. As you can see, I chose to leak information. This document is a transaction from the 3rd of October, 2020. This was a tradeoff I was willing to make.

In summary:

  • The app data is not encrypted at rest in the browser. I was not worried about this. Adding at rest encryption means that at some point, you need to decrypt the data to actually use it. It added quite a lot of overhead, and the idea of using IdexedDB was to have the browser be more efficient (i.e. I didn’t want to load all data in memory).
  • _id and _rev data is not encrypted, and the _id includes the date. Having the date in the _id allowed me to leverage CouchDB’s query syntax when querying by date, scrolling, etc.
  • Only standard AES-GCM with one static key, nothing fancy. I like listening to security podcasts that go way above my head. I liked this episode Tink with Sophie Schmieg from the “Security Cryptography Whatever” podcast. They talk about tink and it sounds great. I would definitely try it:

Tink is a multi-language, cross-platform, open source library that provides cryptographic APIs that are secure, easy to use correctly, and hard(er) to misuse.

But in that podcast they usually talk about massive things, not a little personal app with 7000 documents in 4 years.

CouchDB final notes

One thing I DO regret, is using the same database for everything. Originally, it only stored transaction documents. But then I though it would be good to store price documents (currency prices). Adding a discriminator attribute is one way to do it. It is cheap setup (same DB) but you end up putting a few if (doc.type !== "transaction") that are annoying.

I really did not use CouchDB other than as an encrypted backup and synchronization mechanism. I ended up not using all the fun things like map/reduce views in pouch (and in CouchDB it was impossible, as the data is of course encrypted).

If I had to redo this app, I would study other alternatives like Google’s Firebase Realtime Database.

Other details

Wow, if you made it this far I might as well tell you a few extra things. Worst case scenario, it will help you fall asleep.

<datalist> HTML Tag

When possible, I like using native things from the browser, if the UX is good enough.

In 2019 I turned 39 years old. At that point in my life, I had tried lots of autocomplete JavaScript plugins and had learned to be weary of them, specially when mixing known data with “just let the user type whatever he wants”. Given that I was already using a framework (React) and another framework (material-ui), meant that solutions bulit on those came with strings attached. After some trial and errors, I came upon the HTML <datalist> tag, and it worked quite well for what I wanted:

  • Native autocomplete with no JavaScript plugins
  • It allows the user to type whatever they want if the option is not in the list

If you are not familiar with that attribute, here is the code for the example below:

<div>
  <label for="description">Description:</label>
  <input list="frequent-descriptions" name="description" autocomplete="off">
  <datalist id="frequent-descriptions">
    <option value="Supermarket">
    <option value="Ice cream">
    <option value="Coffee">
    <option value="Grocery Store">
  </datalist>
</div>

Example: type c then o…

(More details at <datalist>: The HTML Data List element)

I used this for my inputs (search, accounts, description).

< input type=“date” …>

I was pleasantly surprised that the UX for the default calendar in 2019 was good enough for Android, iOS and most browsers. No calendar JavaScript 🎉

Autocomplete accounts based on description

Each time a transaction was modified (locally or through replication), it updates a data structure that is then used to suggest the accounts when the user fills in the description. So if I select “Supermarket” and I most usually pay with cash, and file it under Expenses:Groceries, it autocompletes both. Previous apps I used autocompleted the amount, which I found annoying. In my use case, it was never the same amount, so I had to edit it. This is the second video at the top.

Sticky list subheader

I liked this feature and I used it to group transactions by date in the main view. This is the first video at the top.

Reports

I used react-chartsjs-2 to have some basic pie charts (per month) and bar charts. The bar charts have presets (Savings, House, Food, Leisure, etc.) and you can select “account depth”. So for example, you can easily chart Expenses:Taxes (depth=2) or increase the depth to discriminate by sub-accounts (e.g. Expenses:Taxes:Income Tax).

However, as I started using hledger more, I just do my reporting there.

Conclusions

  • I liked PWA and finally iOS is giving it better support.
  • Synching to CouchDB was easy, though I would do it differently today. I also did not handle CouchDB version conflicts (haven’t encountered them yet 🤞)
  • I don’t think I am good at UX. So when I try anything which deviates from the default, it looks weird. conflicts. I have never encountered one yet.
  • Securing/Monitoring CouchDB is work. I migrated the server to a new provider and now I just use it through tailscale - an excellent product. Highly recommend. Works like magic. As it supports https:// it works well with the PWA and no browser complains.

The precarious equilibrium

So, why I am thinking of stop using it? These are the ugly truths:

  • The resulting React/JS code is not pretty. As a newcomer to React, I had a bit of a hard time with it, and it shows. Specially handling forms/validations/etc.
  • It has some rough edges, so the other user (my wife) doesn’t use it much.
  • I use it only as a way to do data-entry of the cash expenses. Inflation is rampant in Argentina. So though we still use cash a lot, I try to avoid it. Last week I paid ARS 9,300 to the Sushi delivery guy. That is less than $20 but it involved a stack of 12 bills. This week we inaugurated a ARS 2,000 bill, so at least there is that.
  • Most of my time (in accounting) now goes into parsing the garbage data I get from my main bank. So I want to focus on reducing that pain.

Next Steps

I will continue evaluating other products like Firefly III or just find an off-the-shelf data-entry for hledger. I know they have the web-ui… now that I have tailscale… who knows.