Quant Mashup - Gautier Marti

EMNLP 2025 in Suzhou [Gautier Marti]
This year at EMNLP 2025 in Suzhou, my colleague Khaled Al Nuaimi and I attended the conference so that Khaled could present his paper on Evasive Answers in Financial Q&A, and also to explore current R&D trends in empirical NLP. While walking through the poster sessions, we saw a dozen of(...)

Basic DSPy RAG tutorial on DataGrapple blog posts [Gautier Marti]
This blog post is more a note to self for experimenting further with DSPy (arXiv, GitHub) than a pedagogical or original intro to the framework. It essentially follows this Weaviate tutorial with small adaptations, notably removing the Weaviate part of it and replacing their retrieval module with a very(...)

Prompting is Programming with LMQL [Gautier Marti]
In this blog post, I just toy around with a relatively new framework for querying (large) language models: LMQL, a SQL-like query language for LLMs. It is a first step toward a novel programming paradigm: Language Model Programming (LMP). These ideas are described in the very interesting paper Prompting Is Programming:(...)

Selected ML Papers from ICML 2023 [Gautier Marti]
This blog post serves as a summary and exploration of ~100 papers, providing insights into the key trends presented at ICML 2023. The papers can be categorized into several sub-fields, including Graph Neural Networks and Transformers, Large Language Models, Optimal Transport, Time Series Analysis,(...)

Active Reading with ChatGPT: Systematic Investing in Credit [Gautier Marti]
Yet another experiment with ChatGPT-4: actively reading a semi-technical book.
Chapter 1: Can a Combination of Treasuries and Equities Replace Credit in a Portfolio? What is the size of the corporate bond market? As of my knowledge cutoff in September 2021, I don’t have the most recent data on the(...)

Active Reading with ChatGPT [Gautier Marti]
Another experiment with ChatGPT-4: actively reading a semi-technical book. This book by Michael Isichenko is probably the best I have read so far in this field. Let’s dive into it! You can (and should) buy this book. Chapter 1: Market Data. Gautier’s Prompt: The author mentions in his book that(...)

Building an S&P 500 company classification from Wikipedia articles (guided by ChatGPT) [Gautier Marti]
Collaboration with ChatGPT. I am still useful for packaging the experiment and advertising it, but for how long? 🙂 In this joint work, I felt more like the robot copy-pasting than the author of the experiment. Sure, I did the prompting, but that too could be automated, after all building(...)

Book Review: Volatility Trading [Gautier Marti]
A good book for an introduction to volatility from a trading perspective. Some excerpts from Volatility Trading by Sinclair: I am a trader. I am not a mathematician, financial engineer, or philosopher. My success is measured in profits. The tools I use and develop need only be useful. They need not(...)

SetFit: Fine-tuning an LLM in 10 lines of code and little labeled data [Gautier Marti]
This blog post is a follow-up to the series of posts: Snorkel Credit Sentiment - Part 1 (May 2019); May the Fourth: VADER for Credit Sentiment? (May 2019); Experimenting with LIME - A tool for model-agnostic explanations of Machine Learning models (May 2019); Using LIME to ‘explain’ Snorkel Labeler(...)

Performance attribution of a crypto market-neutral book on a statistical risk model [Gautier Marti]
In this short blog post, we investigate whether a simple systematic market-neutral stat arb crypto book loads on the main components of a statistical risk model.
from datetime import timedelta
import pandas as pd
from tqdm import tqdm
import statsmodels.formula.api as smf
def(...)

Interview with ChatGPT about its book 'From Data to Trade: A Machine Learning Approach to Quantitative Trading' [Gautier Marti]
Introducing the first book ever generated by an artificial intelligence on the subject of using machine learning for quantitative trading: “From Data to Trade: A Quantitative Approach to Machine Learning”! This groundbreaking work offers a unique perspective on the use of machine learning in the(...)

Serverless architecture for crypto trading [Gautier Marti]
I recently asked on LinkedIn for advice and opinions on infrastructure for collecting, storing, processing, and storing back derived data (features, signals) for some simple mid-frequency / stat arb trading strategies. I did not expect to receive so much feedback about infrastructure for trading data(...)

Hierarchical PCA x Hierarchical clustering on crypto perpetual futures [Gautier Marti]
PCA is a useful tool for quant trading (stat arb), but in its naive implementation it suffers from several forms of instability, which lead to unnecessary turnover (trading costs…) and spurious trades. In order to regularize the model, several techniques are available: Sparse PCA, Robust PCA, Kernel(...)

Crypto PCA First Eigenvector [Gautier Marti]
This short blog post illustrates an interesting fact that I found in An Analysis of Eigenvectors of a Stock Market Cross-Correlation Matrix by Nguyen and co-authors: the first eigenvector is not THE market portfolio (market-cap or uniformly weighted) as people usually believe, but a(...)

Bayesian net and Boparan 7.625% 30 Nov 2025 Prospectus [Gautier Marti]
This blog post is a follow-up to a first naive modelling of Matalan notes using Bayesian nets. Bayesian nets are a good tool to quantify qualitative knowledge, as explained here.
The work presented in this blog post was mostly carried out by Zhiyuan Shen in the context of his financial mathematics master of(...)

Naive modelling of Matalan defaulting on its MTNLN 9.5 01/31/24 Notes [Gautier Marti]
When reading Denev’s book Probabilistic Graphical Models – A New Way of Thinking in Financial Modelling, which I commented on in my blog back in Summer 2020, I put a note on my to-do list to model Matalan's probability of default using a Bayesian network (for fun, not work). I was rather familiar with this(...)

Naive modelling of credit defaults using a Markov Random Field [Gautier Marti]
In mid-2020, I read a book on probabilistic graphical models (PGMs) applied in finance by Alexander Denev. In mid-2021, I hosted a machine learning meetup with an application of PGMs to predict the future states of economic and financial variables, and geopolitical events based on forward-looking views(...)

Back to basics: PCA on stock returns [Gautier Marti]
A short code snippet to apply PCA on stock returns. No secret sauce is used here to clean the empirical covariance matrix. This blog post will mostly serve as a basis for comparing several flavours of PCA and their impact on ex-ante volatility estimation. We may look in future blog posts into(...)

Book Review: Advanced Portfolio Mgmt - A Quant's Guide for Fundamental Investors [Gautier Marti]
Great book; I absolutely recommend it. Precise and concise (fewer than 200 pages). This book will be especially useful to grads or analysts in the early stages of their careers. A junior analyst/quant/data scientist who masters the content of this book will definitely be useful in a pod of fundamental(...)

Top2Vec: Distributed Representations of Topics [Gautier Marti]
Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis have been the most widely used methods for topic modeling over the past 20 years.
However, they rely on heavy pre-processing of the text content (custom stop-word lists, stemming, and lemmatization), and require the number of topics to(...)

Hong Kong Machine Learning Meetup [Gautier Marti]
When? Wednesday, October 27, 2021, from 7:00 PM to 9:00 PM (Hong Kong Time).
Where? At your home, on Zoom. All meetups will be online for as long as the COVID-19 crisis lasts.
The page of the event on Meetup: HKML S4E2
Programme: Talk 1: Systematic Pricing and Trading of Municipal Bonds, Petter N.(...)

Embeddings of Sectors and Industries using Graph Neural Networks [Gautier Marti]
You can find the reproducible experiment in this Colab Notebook. In econometrics and financial research, categorical variables, and especially sectors and industries, are usually encoded as dummy variables (also called one-hot encoding in the machine learning community). You can find plenty of such(...)
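To make the contrast between dummy variables and embeddings concrete, here is a minimal sketch in Python (pandas and NumPy). It is not the code from the Colab Notebook: the tickers and sectors are hypothetical, and the embedding matrix is random, merely standing in for the dense vectors a model such as a graph neural network would learn.

```python
import numpy as np
import pandas as pd

# Hypothetical toy universe: tickers and their sectors (illustrative only)
df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "sector": ["Energy", "Financials", "Energy", "Utilities"],
})

# Classic econometrics encoding: one 0/1 dummy column per sector
# (columns come out sorted: Energy, Financials, Utilities)
dummies = pd.get_dummies(df["sector"], dtype=int)
print(dummies)

# An embedding assigns each sector a dense d-dimensional vector.
# ASSUMPTION: random values stand in for vectors a model (e.g. a GNN) would learn.
rng = np.random.default_rng(0)
d = 2
embedding_matrix = rng.normal(size=(dummies.shape[1], d))

# Multiplying the one-hot matrix by the embedding matrix selects, for each
# ticker, the row of its sector: a lookup table in matrix form.
dense = dummies.to_numpy() @ embedding_matrix

# Same result via an explicit row lookup by sector index
codes = [list(dummies.columns).index(s) for s in df["sector"]]
lookup = embedding_matrix[codes]
print(np.allclose(dense, lookup))
```

The final check shows that a one-hot row times the embedding matrix simply picks out that sector's vector. This is why an embedding can be read as a compact, learned replacement for the k dummy columns: similar sectors can end up with nearby vectors, which one-hot encoding cannot express since all dummy vectors are mutually orthogonal.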