Graphs for Inference

Paco Xander Nathan
Derwen · Feb 19, 2024

Lately I’ve become intrigued by the published research + open source code for a relatively specific topic: generating graphs to use for inference.

This category seems to be gaining popularity as a means of blending LLM use with graph technologies to obtain more grounded results. “Hybrid AI” might be a good general description for this category, or it could be considered one of the flavors of “Neurosymbolic AI,” though frankly I’m not as much of a fan of the latter terminology. To wit: how can we blend the benefits of machine learning and generative approaches with symbolic inference?

Heidelberger Schloss: backdrop for recent graph discussions

“Topologies of Reasoning” by Maciej Besta, et al., ETH Zurich (2024–01–25) provides a good survey, comparisons, and analysis of graphs used to decompose LLM prompts into “graph of thoughts” results. These approaches typically leverage graphs to track relative costs, backtracking, provenance, partial explanations, and so on:

Wong, Grand, et al., introduced the notion of probabilistic language of thought (PLoT) as “a general-purpose symbolic substrate for probabilistic, generative world modeling.” In other words, translate from natural language into PLoT to construct small probabilistic worlds, then calculate posterior distributions for queries. They use probabilistic programming based on the Church language. In contrast to using, say, Prolog, this allows for inductive inference in addition to deductive methods, while avoiding some of Prolog’s limitations.

Church goes a step further by specifying a generative domain theory in addition to probabilistic inference rules. We believe that this interplay between probabilistic priors and likelihoods — which is central to Bayesian inference — is also at the heart of human cognition.

“From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought”
Lionel Wong, Gabriel Grand, Alexander K. Lew, Noah D. Goodman, Vikash K. Mansinghka, Jacob Andreas, Joshua B. Tenenbaum
MIT (2023–06–23)

An example of probabilistic reasoning via language-to-code translation

This approach looks quite promising. It requires some work up-front to define the meaning functions and inference functions, which would be highly specific per use case. They use LLMs trained specifically to translate text into code (an application of “copilot”-esque code generation) based on these domain theories. Given a prompt about the generated world, the Church language performs the heavy lifting to calculate posterior distributions as responses.
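To make this concrete, here’s a minimal sketch in Python rather than Church, under simplifying assumptions: a tiny generative world model in the paper’s tug-of-war flavor, with rejection sampling standing in for full probabilistic inference. The player names and the `posterior` helper are purely illustrative, not part of the PLoT codebase.

```python
import random

# A toy generative "world model" of the kind a PLoT program would describe:
# each player has a latent strength, drawn from a prior.
def sample_world():
    return {p: random.gauss(0, 1) for p in ("alice", "bob", "carol")}

def beats(world, winner, loser):
    return world[winner] > world[loser]

# Condition on an observation via rejection sampling, then answer a query.
def posterior(query, condition, num_samples=20_000):
    accepted = [w for w in (sample_world() for _ in range(num_samples)) if condition(w)]
    return sum(query(w) for w in accepted) / len(accepted)

# Query: given that alice beat carol, how likely is alice stronger than bob?
p = posterior(
    query=lambda w: w["alice"] > w["bob"],
    condition=lambda w: beats(w, "alice", "carol"),
)
print(f"P(alice stronger than bob | alice beat carol) ≈ {p:.2f}")
```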

Overall, the shift from parsed language to formal meaning allows for probabilistic manipulation of worlds and queries about them.

In another twist, consider inverting a scene description, i.e., using this world-building tool to generate visual depictions. These can be operated on by agents, as a kind of stage with actors.

Taking this a few steps further, one could imagine leveraging reinforcement learning, given the posteriors, to optimize for agent behaviors within these generated worlds.

University of South Florida has the functional object-oriented network (FOON) project, which provides an open source graph-based language for robot planning. In this paper, Sakib and Sun explore using LLMs to generate task trees based on FOON, to direct cooking robots. Given a collection of task trees, and some subsequent analysis to note the sequential and parallel dependencies among subtasks, a graph is produced to represent a detailed plan for preparing a recipe.

“From Cooking Recipes to Robot Task Trees — Improving Planning Correctness and Task Efficiency by Leveraging LLMs with a Knowledge Network”
Md Sadman Sakib, Yu Sun
U South Florida (2023–09–17)

A FOON-based plan for preparing pancake batter

This approach is less formal than the PLoT domain theories, and perhaps more general-purpose, with less overhead than probabilistic programming. The results guide real-world actions, instead of estimating the probabilities of potential actions.
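As a rough sketch of that idea (not the actual FOON representation, which encodes functional units of objects and motions), the subtask dependencies can be treated as a DAG: a topological sort yields one valid execution order, while grouping by generation shows which subtasks could run in parallel. The pancake-batter steps below are invented for illustration.

```python
import networkx as nx

# Hypothetical subtasks for preparing pancake batter, with dependency edges:
# an edge (a, b) means subtask a must finish before subtask b can start.
plan = nx.DiGraph([
    ("measure flour", "combine dry ingredients"),
    ("measure baking powder", "combine dry ingredients"),
    ("crack eggs", "whisk wet ingredients"),
    ("measure milk", "whisk wet ingredients"),
    ("combine dry ingredients", "mix batter"),
    ("whisk wet ingredients", "mix batter"),
])

# A topological sort gives one valid sequential execution order.
print(list(nx.topological_sort(plan)))

# Grouping nodes by "generation" shows which subtasks have no mutual
# dependencies and could be executed in parallel.
for level, tasks in enumerate(nx.topological_generations(plan)):
    print(level, sorted(tasks))
```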

To generalize, one could develop a domain specific language (DSL) for the target representation of a use case. Then leverage LLMs to translate from text descriptions, images, etc., into small worlds generated using the DSL. This speaks to AI planning based on LLMs + Graphs, which recalls the priorities of an earlier day of AI research, circa mid-1980s. Back when I was a grad student.
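For instance, here’s a minimal sketch of what such a DSL might look like in Python, with dataclasses standing in for a real grammar; the node types, the JSON schema, and the `world_from_llm_output` parser are all hypothetical.

```python
from dataclasses import dataclass
import json

# A toy DSL for "small worlds": entities plus typed relations between them.
@dataclass(frozen=True)
class Entity:
    name: str
    kind: str

@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    object: str

@dataclass
class World:
    entities: list[Entity]
    relations: list[Relation]

def world_from_llm_output(text: str) -> World:
    """Parse JSON emitted by an LLM prompted to translate a description
    into this DSL. The expected schema here is an assumption."""
    raw = json.loads(text)
    return World(
        entities=[Entity(**e) for e in raw["entities"]],
        relations=[Relation(**r) for r in raw["relations"]],
    )
```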

Note that by taking the converse of the FOON-based research, one could restate recipes where a human cook acts much like a directed robot. Almost anyone who’s worked in commercial kitchens understands how close this proposition is to the stark reality of cooking for a living. Perhaps LLMs could be used to restate the wide-ranging fluff of cooking recipe content online into useful, consistent, spam-free, influencer-free instructions for people?

Wen, et al., at UIUC explore how to prompt LLMs with knowledge graphs, generating a mind map from text input to represent evidence subgraphs. Using “evidence graph mining” and “evidence graph aggregation” from the initial prompt, this graph data gets fed into an LLM to produce a merged reasoning graph.

This approach is intended to help remedy the typical limitations of generative approaches: hallucinations, difficulty incorporating new knowledge, transparency for decision-making processes, and so on. The UIUC research seeks to be able to engage LLMs with updated knowledge, then elicit pathways through the resulting “mind map” graph which get used for reasoning.
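A hedged sketch of what the first stage might look like, using networkx in place of the paper’s KG back-end: entities recognized in the prompt seed k-hop neighborhood subgraphs, which get merged and serialized into text for the LLM. The function names and the triple serialization format are placeholders, not the MindMap implementation.

```python
import networkx as nx

def evidence_subgraph(kg: nx.DiGraph, seed_entities, hops: int = 2) -> nx.DiGraph:
    """Collect the merged k-hop neighborhood around each seed entity."""
    nodes = set()
    for entity in seed_entities:
        if entity in kg:
            nodes |= set(nx.ego_graph(kg.to_undirected(as_view=True), entity, radius=hops).nodes)
    return kg.subgraph(nodes).copy()

def serialize_for_prompt(subgraph: nx.DiGraph) -> str:
    """Flatten the evidence subgraph into triple statements for the LLM prompt."""
    lines = [
        f"{s} --[{d.get('relation', 'related_to')}]--> {o}"
        for s, o, d in subgraph.edges(data=True)
    ]
    return "\n".join(lines)
```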

The experiments on three question & answering datasets also show that MindMap prompting leads to a striking empirical gain. For instance, prompting a GPT-3.5 with MindMap yields an overwhelming performance over GPT-4 consistently. We also demonstrate that with structured facts retrieved from KG, MindMap can outperform a series of prompting-with-document-retrieval methods, benefiting from more accurate, concise, and comprehensive knowledge from KGs.

“MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models”
Yilin Wen, Zifeng Wang, Jimeng Sun
UIUC (2023–09–15)

Conceptual comparisons between LLM prompting methods

The resulting mind map provides a visual depiction of the LLM “graph of thoughts” reasoning. Moreover, it would seem to be a by-product whose elements could be accumulated over time into a larger, curated KG?

Contrasting these LLM-based generative AI approaches, research into graphical causal models provides good means for performing causal inference among the variables in a system under study. One can perturb the system, test for sensitivities, or take into account potentially unobserved confounders.

First, describe a causal graph: a directed acyclic graph (DAG) representing the causal relationships between variables in a system. These represent one’s understanding and assumptions about the mechanics of the system. Second, provide tabular observational data for the variables in the DAG as a Pandas dataframe. Then DoWhy-GCM fits a graphical causal model which can be interrogated to ask causal questions.
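Here’s a minimal sketch of that workflow, following the pattern in the DoWhy documentation; the three-variable chain and the intervention below are invented for illustration.

```python
import networkx as nx
import numpy as np
import pandas as pd
from dowhy import gcm

# Step 1: the causal graph, encoding assumptions about the system's mechanics.
causal_graph = nx.DiGraph([("X", "Y"), ("Y", "Z")])
causal_model = gcm.StructuralCausalModel(causal_graph)

# Step 2: observational data for the variables in the DAG.
X = np.random.normal(size=1000)
Y = 2 * X + np.random.normal(size=1000)
Z = 3 * Y + np.random.normal(size=1000)
data = pd.DataFrame({"X": X, "Y": Y, "Z": Z})

# Step 3: fit a graphical causal model, then ask causal questions,
# e.g. what happens to Z if we intervene and set Y to 2?
gcm.auto.assign_causal_mechanisms(causal_model, data)
gcm.fit(causal_model, data)

samples = gcm.interventional_samples(
    causal_model, {"Y": lambda y: 2.0}, num_samples_to_draw=1000
)
print(samples["Z"].mean())
```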

“DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models”
Patrick Blöbaum, Peter Götz, Kailash Budhathoki, Atalanti A. Mastakouri, Dominik Janzing
Amazon AWS (2022–06–04)

An overview of DoWhy features

Supervised learning in ML typically gets used to predict outcomes, generalizing from historical data based on past-as-prologue. The training data and test data are assumed to come from the same overall population. In contrast, causal inference assumes that the training data and “test” data will differ. Notably, the “test” data involves counterfactuals, and we cannot observe the counterfactual world at the time when we’re constructing a model. The “Cult of Prediction” gives way to means of exploring decision-making approaches and their implications, risks, assumptions, lack of assumptions, and so on.

Restated in a different way: graphical causal models address the “last-mile” problem of decision makers working through complex issues and learning from results. Ostensibly this approach provides a much better fit for real-world decision intelligence problems than chatbots typically afford.

Note that it is possible to learn the causal graph representation from data. As a longer-term prospect, how about translating from text, audio, diagrams, etc., using LLMs to generate causal graphs, then translate prompts into causal questions?
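A hedged sketch of how that translation step might work; the `complete` callable stands in for whatever LLM client is available, and the prompt format and line-based parsing are assumptions rather than an existing tool.

```python
import networkx as nx

PROMPT = """Read the following description of a system and list its
cause-effect relationships, one per line, in the form: cause -> effect.

Description:
{description}
"""

def causal_graph_from_text(description: str, complete) -> nx.DiGraph:
    """`complete` is a placeholder for an LLM completion call that returns
    plain text, e.g. lines like "price -> demand"."""
    response = complete(PROMPT.format(description=description))
    graph = nx.DiGraph()
    for line in response.splitlines():
        if "->" in line:
            cause, effect = (part.strip() for part in line.split("->", 1))
            graph.add_edge(cause, effect)
    return graph
```

The resulting DiGraph could then seed a graphical causal model like the DoWhy-GCM sketch above, with prompts about the system translated into interventional queries.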

Logan, et al., at UC Irvine presaged this area of research in 2019 through “fact-aware language modeling” based on knowledge graphs, introducing an approach they called KGLM.

LLMs typically only “remember” facts which are available at training time, and tend to hallucinate, i.e., have difficulty recalling these facts correctly.

While traversing through an input text, KGLM accumulates facts (entities and relations) into a graph representation. This provides an alternative to RAG for using a KG to ground the text responses generated by an LLM, ensuring that responses represent factual replies.
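In spirit, that local graph works something like the following sketch; this is a loose Python analogy, not the KGLM architecture, and the fact-extraction step is stubbed out with a hard-coded example.

```python
import networkx as nx

class LocalFactGraph:
    """Accumulate (subject, relation, object) facts seen so far in the input,
    so generation can be conditioned on entities already mentioned."""

    def __init__(self):
        self.graph = nx.MultiDiGraph()

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        self.graph.add_edge(subject, obj, relation=relation)

    def facts_about(self, entity: str):
        """Return known facts for an entity, e.g. to verify a generated
        claim against the accumulated graph."""
        return [
            (entity, data["relation"], obj)
            for _, obj, data in self.graph.out_edges(entity, data=True)
        ]

facts = LocalFactGraph()
facts.add_fact("Barack Obama", "spouse", "Michelle Obama")
print(facts.facts_about("Barack Obama"))
```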

“Barack’s Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling”
Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner, Sameer Singh
UC Irvine (2019–06–20)

A localized KG containing facts conveyed in the input text
Illustration of the KGLM process

Note that this UC Irvine research predates the Lewis, et al., 2020 paper which introduced retrieval augmented generation (RAG), and moreover it anticipated using KGs to ground generated text.

Kudos to Jürgen Müller and Shachar Klaiman at BASF for hiking uphill at a reasonable pace then engaging in many varied graph technologies discussions over Peruvian cuisine in old town Heidelberg.

In general, I’m tracking new papers in this category using a Hugging Face collection.

newsletter sign-ups: https://derwen.ai/newsletter
