Generative AI and spreadsheets

Large language models won’t replace financial modelers any time soon, but are here to assist them already

Published in

GRID

8 min readFeb 8, 2023

With all the well deserved attention GPT-3 is getting these days, not least in the context of software development (a la Copilot), it’s no wonder many are thinking about what they might mean for the most widely used end-user development environments of them all: spreadsheet software.

There has been quite a lot of activity in this space lately, and several concepts explored across a fairly wide range of use-cases, including:

Data enrichment and cleansing
Fill suggestions
Formula assistance
Spreadsheet model generation

Solutions have already seen the light of day on some fronts, while others are likely still a pipe dream.

Let’s explore each of them.

Data enrichment and cleansing

OpenAI themselves highlighted this use-case in the announcement of their APIs back in 2020, subsequently releasing the code for the Excel add-in demonstrated in the video, called Tabulate. (The code is no longer maintained and “no expectations that it will work” according to the Github repository)

Screenshot from a video in OpenAI’s API announcement post. Click for a direct link to the video.

There’s even a “spreadsheet generation” example in OpenAI’s playground, filling in entire tabular data tables based on simple prompts.

Several 3rd parties have lately brought this use-case to life with solutions such a the SheetAI add-on for Google Sheets, and less formal OpenAI API experiments such as SpreadsheetMagic and GPT3().

Similar results can be achieved using integration solutions such as Zapier or Make to connect the spreadsheet software of choice to OpenAI’s API.

Challenger spreadsheets Rows and Spreadsheet.com have recently released built-in functions ASK_OPENAI() and OPENAI_COMPLETE() for this purpose specifically.

An honorable mention also goes out to Excel’s Flash Fill feature that has been around since Excel 2013, but is almost certainly using a rule based model — and certainly not on generative AI.

All of these examples fall squarely into the “tabular data” area of spreadsheet usage, where spreadsheets are really being used as small databases rather than as calculation or modeling tools.

Let’s move on to use-cases involving formulas and modeling.

Fill suggestions

Fill suggestions were probably the first AI feature to be made publicly available in spreadsheet software, when Google Sheets rolled out suggested autofill (blog post) late 2021. Based — at least partly — on very interesting PhD work by now Senior Google Brain researcher, Xinyun Chen, these are often helpful when calculating single columns.

Currently however, the suggestions are limited to relative references, so they are usually not helpful when calculating series that rely on the same assumption across series or chained calculations as those require absolute references.

To my knowledge no other spreadsheet software has taken up similar patterns.

Formula assistance

Given the success of Github’s Copilot for other programming languages, and the fact that even experienced spreadsheet modelers spend a significant amount of time consulting function documentation and googling for assistance, using large language models to assist in formula writing is an obvious opportunity for innovation.

A recent paper from a group of researchers at Microsoft, explores using a purpose-specific model for spreadsheet formula repair and auto-completion with promising results. A rule-based approach still probably has a lot of merit in repair and corrections, and a hybrid approach might offer the best of both worlds.

Several projects — most noticeably Exceformulabot — have experimented with formula explanations, returning natural language breakdown of what a formula does when fed a formula string. This can certainly be helpful when debugging formulas or learning how to use new functions or patterns. This, however, feels like a domain where a rule based approach to generate the explanation should be fairly well achievable and more reliable than a generative AI one. To my knowledge no-one has yet attempted such an approach.

The most direct analogy to Copilot would still be formula generation — creating a fully working formula from a natural text explanation of a problem. Excelformulabot and Excel CoPilot both provide this through add-ons to Excel and Google Sheets. While projects such as Excel Formulator and experiments such as this Google Sheet posted by @fabianstelzer on Twitter take a less integrated approach.

At GRID we’re putting the finishing touches on an integrated GPT-3 powered formula assistant. It allows users to describe the calculations they need in natural language and receive suggested formulas in return:

Update: The formula assistant has been launched now.

GRID is the first spreadsheet software to offer formula generation as a fully integrated feature. To be honest, I thought this would be a nice demo gimmick at best, but — even though I consider myself a fairly advanced spreadsheet user — I find myself using the formula assistant all the time.

The possibilities to assist further are endless as soon as one gets a feel for this functionality in action. More about that later.

Model generation

So, does this mean that large language models will soon replace financial modelers and other spreadsheet creators?

Here, the answer is firmly and squarely “no”. This kind of language model will “never” (i.e. not in the current paradigm) be able to create complete spreadsheet models from natural language prompts.

There are several reasons I’m willing to state this so firmly, the main two being:

Large language models output text, they currently have no way to output an XLSX file or other formal spreadsheet definitions combining data and formulas.
The models have not and will not in any foreseeable future be trained on a corpus of whole spreadsheet models.

The first problem is addressable through purpose-built models, but the second one is a showstopper.

There are obviously two companies that sit on large enough corpuses of spreadsheet models to do training: Google and Microsoft. But:

Despite a lot of spreadsheets being stored in these companies’ cloud solutions (Google Drive and OneDrive), the vast majority of them are private models. Given the moral and legal debate that has been triggered by training large language models on publicly available and even open source data, using such private and often business sensitive sources for training purposes would be highly questionable and probably too risky for their reputation for them to even consider it.
The largest publicly available corpuses of spreadsheets, such as the Enron Spreadsheet Corpus and the EUSES corpus only count a few tens of thousands of spreadsheets and are — for various reasons — not fully representative of broader, modern spreadsheet usage (the spreadsheets in the Enron corpus for example are already over 20 years old). So there is no open source repository of spreadsheet models akin to what Github is for traditional programming languages. On that note, the Microsoft paper referenced in the formula correction section above mentions about a “corpus of 1.8M publicly available Excel workbooks […] planned for public release by a separate group of authors”. It will be interesting to learn more about this corpus as it becomes available.
Spreadsheet models are a lot less likely to have distinct comments and explanations that a natural language model could be trained on, than traditional code where — at least at any level of sophistication — developers tend to be quite disciplined in commenting their code.
To the extent that textual explanations and context for spreadsheet models exist, in fact they usually sit in the emails the spreadsheets might be attached to, or in the slide decks and documents used to present the charts and tables made from them.
And as my friend — and literally the author on the book on spreadsheet implementation — Peter Sestoft pointed out to me, the fact that — unlike traditional software code — spreadsheets are a mix of code (formulas) and proprietary data makes the training on private data even riskier as it is more likely to leak proprietary information.

Finally, as is becoming more and more evident, a key shortcoming of large language models is how confidently wrong they can be — the so-called hallucination problem. This is especially troublesome when it comes to logic and factual knowledge — both of which are key to spreadsheet modeling.

All of this is well demonstrated by the following experiment I ran on ChatGPT:

Prompt:

create a spreadsheet model forecasting revenue for a subscription
business with a single pricing plan based on assumptions about ad spend and
funnel conversion

ChatGPT’s response:

Here is a basic spreadsheet model that can be used to forecast revenue for a 
subscription business with a single pricing plan:

1. Inputs:
  * Ad spend: The amount of money that is spent on advertising each period
    (e.g., per month).
  * CPA (cost per acquisition): The amount of money that is spent to acquire
    each new customer through advertising.
  * Conversion rate: The percentage of visitors to the website who sign up
    for a subscription.
  * Churn rate: The percentage of existing customers who cancel their
    subscriptions each period.
  * ARPU (average revenue per user): The average amount of revenue generated
    from each customer per period.
  * Initial number of subscribers: The number of subscribers at the beginning
    of the forecast period.

2. Formulas:
  * Visitors: = Ad spend / CPA
  * Subscribers: = Visitors * Conversion rate
  * New subscribers: = Subscribers - Initial number of subscribers
  * Lost subscribers: = Initial number of subscribers * Churn rate
  * Ending subscribers:
    = Initial number of subscribers + New subscribers - Lost subscribers
  * Revenue: = Ending subscribers * ARPU

3. Example: Assuming an ad spend of $10,000 per month, a CPA of $50, an
initial number of 1,000 subscribers, and assuming a conversion rate of 2%,
churn rate of 10%, ARPU $50 per month.
  * Visitors: = 10000 / 50 = 200
  * Subscribers: = 200 * 2% = 4
  * New subscribers: = 4 - 1000 = -996
  * Lost subscribers: = 1000 * 10% = 100
  * Ending subscribers: = 1000 + (-996) - 100 = -96
  * Revenue: = -96 * 50 = -4800

This is just a basic example, you may have to consider other factors such
as seasonality or other events that may affect the funnel or revenue.

While impressive at first glance, on closer inspection this reveals the enormity of the gap between generating a convincing textual explanation and an actual, working and correct spreadsheet model.

Translating this text explanation into a spreadsheet model will be quite a lot of work, even for a skilled spreadsheet modeler, tapping into a lot of their tacit and formal knowledge and logic.
The proposed model is in fact wrong. The calculation for “New subscribers” being the most glaring example.

Conclusion

Copilot-like assistance in spreadsheet software, suggesting fills and writing formulas, is going to be table stakes before long.

But just like nobody would expect Github’s Copilot to generate the design and architecture for an entire software system, large language models generating entire spreadsheet models is not within reasonable reach of current technology, especially given the lack of availability of suitable training data.

— — — — — — — — —

P.S. At least for a time being I’ll be maintaining a list of relevant projects in this post and those that I learn about subsequently in the table below. Please comment below if you know of projects or initiatives that should be added.