Quant Mashup - Eran Raviv

Statistical Shrinkage (2) [Eran Raviv]

During 2017 I blogged about Statistical Shrinkage. At the end of that post I mentioned the important role signal-to-noise ratio (SNR) plays when it comes to the need for shrinkage. This post shares some recent related empirical results published in the Journal of Machine Learning Research from the

*- 1 month ago, 8 Aug 2023, 12:46am -*

Most popular posts – 2022 [Eran Raviv]

As per usual at this point in time, I check my blog’s traffic analytics to see which were the most popular pieces last year. Without further ado.. First: Correlation and Correlation Structure (6) – Distance Correlation (08:33 minutes average time on page) Second: Similarity and Dissimilarity

*- 8 months ago, 4 Jan 2023, 10:18am -*

Beware of Spurious Factors [Eran Raviv]

The word spurious refers to “outwardly similar or corresponding to something without having its genuine qualities.” Fake. While the meanings of spurious correlation and spurious regression are common knowledge nowadays, much less is understood about spurious factors. This post draws your

*- 9 months ago, 13 Dec 2022, 09:41pm -*

Correlation and Correlation Structure (6) – Distance Correlation [Eran Raviv]

While linear correlation (aka Pearson correlation) is by far the most common type of dependence measure, there are a few arguably better ways to characterize/estimate the degree of dependence between variables. This is a fascinating topic I keep coming back to. There is so much for a typical geek to

*- 1 year ago, 15 Aug 2022, 11:32pm -*

Most popular posts – 2021 [Eran Raviv]

Kind of sad, but the same intro which served last year, befits this year also. Littered with Corona, this year was not easy. But looking around me, I feel grateful. The following quote by Socrates comes to mind: “If all our misfortunes were laid in one common heap whence everyone must take an

*- 1 year ago, 4 Jan 2022, 11:03am -*

A New parameterization of Correlation Matrices [Eran Raviv]

In volatility modelling, a typical challenge is to keep the covariance matrix estimate valid, meaning (1) symmetric and (2) positive semi definite*. A new paper published in Econometrica (citing from the paper) “introduces a novel parametrization of the correlation matrix. The reparametrization

*- 1 year ago, 24 Oct 2021, 09:12pm -*

Bayesian vs. Frequentist in Practice, part 3 [Eran Raviv]

This post is inspired by Leo Breiman’s opinion piece “No Bayesians in foxholes”. The saying “there are no atheists in foxholes” refers to the fact that if you are in the foxhole (being bombarded..), you pray! Leo’s paraphrase indicates that when complex, real problems are present, there

*- 2 years ago, 28 Jun 2021, 11:24am -*

Beta in the tails [Eran Raviv]

Every form of strength is also a form of weakness*. I love statistics, but I focus too much on methodology, which is not for everyone. Some people (right or wrong) question: “wonderful sir, but what can I do with it?”. A new paper titled “Beta in the tails” is a showcase application for why

*- 2 years ago, 18 Apr 2021, 11:34am -*

Correlation and correlation structure (5) – a new coefficient of correlation [Eran Raviv]

This is the fifth post which is concerned with quantifying the dependence between variables. When talking correlations one usually thinks about linear correlation, aka Pearson’s correlation. One serious limitation of linear correlation is that it’s, well.. linear. By construction it’s not

*- 2 years ago, 26 Feb 2021, 11:40am -*

Understanding Variance Explained in PCA - Matrix Approximation [Eran Raviv]

Principal component analysis (PCA from here on) is performed via linear algebra functions called eigen decomposition or singular value decomposition. Since you are actually reading this, you may well have used PCA in the past, at school or where you work. There is a strong link between PCA and the

*- 2 years ago, 2 Feb 2021, 08:10pm -*

Most popular posts - 2020 [Eran Raviv]

Littered with Corona, this year was not easy. But looking around me, I feel grateful. The following quote by Socrates comes to mind: “If all our misfortunes were laid in one common heap whence everyone must take an equal portion, most people would be content to take their own and depart.” On

*- 2 years ago, 30 Dec 2020, 08:14pm -*

Why complex models are data-hungry? [Eran Raviv]

If you regularly read this blog then you know I am not one to jump on the “AI Bandwagon”, being quickly weary of anyone flashing the “It’s Artificial Intelligence” joker card. Don’t get me wrong, I understand it is a sexy term, but to me it always feels a bit like a sales pitch. If the

*- 2 years ago, 19 Nov 2020, 10:18am -*

Correlation and correlation structure - asymmetric correlations of equity portfolios [Eran Raviv]

Here I share a refreshing idea from the paper “Asymmetric correlations of equity portfolios” which was published in the Journal of Financial Economics, a top tier journal in this field. The question is how much the observed conditional correlation on the downside (say) differs from the

*- 2 years ago, 19 Nov 2020, 10:17am -*

Boundary corrected kernel density [Eran Raviv]

Density estimation is now a trivial one-liner script in all modern software. What is not so easy is to become comfortable with the result: how well is my density estimated? We rarely know. One reason is the lack of ground-truth. Density estimation falls under unsupervised learning, we don’t

*- 3 years ago, 30 Jul 2020, 07:44pm -*

R + Python = Rython [Eran Raviv]

Enough! Enough with that pointless R versus Python debate. I find it almost as pointless as the Bayesian vs Frequentist “dispute”. I advocate here what I advocated there (“..don’t be a Bayesian, nor be a Frequentist, be opportunist”). Nowadays even marginally tedious computation is being

*- 3 years ago, 5 Jul 2020, 11:14pm -*

Machine learning is simply statistics - part 2 [Eran Raviv]

Another opinion piece. If you can’t explain it simply you don’t understand it well enough. (Albert Einstein) Rant in progress A bit on Deep Learning What is so deep about deep learning? Nothing. There is nothing deep about it. If you read through the excellent Deep Learning book you can see (p.

*- 3 years ago, 2 Jun 2020, 09:45am -*

Curse of Dimensionality part 4: Distance Metrics [Eran Raviv]

Many machine learning algorithms rely on distances between data points as their input, sometimes the only input, especially so for clustering and ranking algorithms. The celebrated k-nearest neighbors (KNN) algorithm is our chief example, but distances are also frequently used as an input in the

*- 3 years ago, 15 Apr 2020, 10:39am -*

Understanding Pointwise Mutual Information [Eran Raviv]

The term mutual information is drawn from the field of information theory. Information theory is busy with the quantification of information. For example, a central concept in this field is entropy, which we have discussed before. If you google the term “mutual information” you will land at some

*- 3 years ago, 27 Jan 2020, 10:01am -*

Most popular posts – 2019 [Eran Raviv]

As every year, I checked my analytics so that I can let you know what was popular. This year I have also experimented with a survey where I asked one question at the end of each relevant post. About 120 replies were received, but the free Survey Monkey account (the survey provider I went with) only lets

*- 3 years ago, 6 Jan 2020, 09:12am -*

CUR matrix decomposition for improved data analysis [Eran Raviv]

I have recently been reading about more modern ways to decompose a matrix. Singular value decomposition is a popular way, but there are more. I went down the rabbit hole. After a couple of “see references therein” I found something which looks to justify spending time on this. An excellent

*- 3 years ago, 20 Oct 2019, 09:17am -*

Understanding Variance Explained in PCA [Eran Raviv]

Principal component analysis (PCA) is one of the earliest multivariate techniques. Yet not only has it survived, but it is arguably the most common way of reducing the dimension of multivariate data, with countless applications in almost all sciences. Mathematically, PCA is performed via linear algebra

*- 4 years ago, 4 Sep 2019, 09:17pm -*

Robust Moving Average [Eran Raviv]

Moving average is one of the most commonly used smoothing methods, basically the go-to. It helps us detect trend in the data by smoothing out short term fluctuations. The computation is trivial: take the most recent k points and simple-average them. Here is how it looks: Moving average example with

*- 4 years ago, 1 Aug 2019, 08:11pm -*
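
The computation described in the Robust Moving Average teaser ("take the most recent k points and simple-average them") can be sketched as below. This is only an illustration of the plain moving average, not the robust variant the post goes on to discuss, and it is written in Python rather than taken from the post; the function name `moving_average` and its arguments are assumptions made for this example.

```python
from collections import deque
from typing import Iterable, List


def moving_average(values: Iterable[float], k: int) -> List[float]:
    """Simple moving average: mean of the most recent k points.

    Hypothetical helper for illustration. Output starts once k points
    are available, so it has len(values) - k + 1 entries.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")

    window = deque(maxlen=k)  # holds the most recent k points
    running_sum = 0.0
    out: List[float] = []

    for x in values:
        if len(window) == k:
            # The oldest point is about to be pushed out of the window.
            running_sum -= window[0]
        window.append(x)
        running_sum += x
        if len(window) == k:
            out.append(running_sum / k)
    return out


smoothed = moving_average([1, 2, 3, 4, 5], k=3)
```

Keeping a running sum avoids re-adding all k points at every step; for short series a direct `sum(window) / k` would do just as well.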

Portfolio construction tilting towards higher moments [Eran Raviv]

When you build your portfolio you must decide what is your risk profile. A pension fund’s risk profile is different than that of a hedge fund, which is different than that of a family office. Everyone’s goal is to maximize returns given the risk. Sinfully but commonly risk is defined as the

*- 4 years ago, 10 Jun 2019, 09:55am -*

Adaptive Huber Regression [Eran Raviv]

Many years ago, when I was still trying to beat the market, I used to pair-trade. In principle it is quite straightforward to estimate the correlation between two stocks. The estimator for beta is very important since it determines how much you should long the one and how much you should short the

*- 4 years ago, 19 May 2019, 07:56am -*

Day of the week and the cross-section of returns [Eran Raviv]

I just finished reading an interesting paper by Justin Birru titled: “Day of the week and the cross-section of returns” (reference below). The story is much too simple to be true, but it looks to be so. In fact, I would probably altogether skip it without the highly ranked Journal of Financial

*- 4 years ago, 15 Mar 2019, 11:57am -*

Most popular machine learning R packages [Eran Raviv]

In a previous post: Most popular machine learning R packages, trying to hash out what are the most frequently used machine learning packages, I simply chose a few names from my own memory. However, there is a CRAN task views web page which “aims to provide some guidance which packages on CRAN are

*- 4 years ago, 10 Feb 2019, 08:55pm -*

R tips and tricks – higher-order functions [Eran Raviv]

A higher-order function is a function that takes one or more functions as arguments, and/or returns a function as its result. This can be super handy in programming when you want to tilt your code towards readability and still keep it concise. Consider the following code: # Generate some fake data

*- 4 years ago, 28 Jan 2019, 12:06am -*
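
The definition in the higher-order functions teaser covers two cases: a function that takes a function as an argument, and a function that returns one. A minimal sketch of both follows. It is in Python, not the R code of the original post (which is cut off in this listing), and the names `apply_to_each` and `make_scaler` are hypothetical, chosen only for this example.

```python
from typing import Callable, Iterable, List


def apply_to_each(func: Callable[[float], float],
                  values: Iterable[float]) -> List[float]:
    """Higher-order: takes a function as an argument and applies it to each value."""
    return [func(v) for v in values]


def make_scaler(factor: float) -> Callable[[float], float]:
    """Higher-order: returns a new function that multiplies its input by `factor`."""
    def scale(x: float) -> float:
        return x * factor
    return scale


# Build a function, then pass it to another function.
double = make_scaler(2.0)
doubled = apply_to_each(double, [1.0, 2.5, 4.0])
```

Here `doubled` holds each input multiplied by two; swapping in a different `factor` or a different function changes the behaviour without touching `apply_to_each`.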

Most popular posts – 2018 [Eran Raviv]

2019 is well underway. 2018 was personally difficult, so I am happy it’s behind us. Without further ado, here is what my analytics report shows to be the three most popular posts for 2018: – Create own Recession Indicator using Mixture Models (3:53 minutes average time on page) – Portfolio

*- 4 years ago, 13 Jan 2019, 09:24pm -*

Reproducible Finance with R - Book Review [Eran Raviv]

Reproducible Finance with R is a clever book, with modern treatment of classical concepts. Here below is what I liked and disliked about the book. Back when I was practicing Judo, there was a guy in my group who mastered that one exercise (called Uchi Mata). He could go fighting 20 consecutive

*- 4 years ago, 5 Jan 2019, 12:13pm -*

Create own Recession Indicator using Mixture Models [Eran Raviv]

Broadly speaking, we can classify financial market conditions into two categories: Bull and Bear. The first is a “todo bien” market, tranquil and generally upward sloping. The second describes a market with a downward trend, usually more volatile. It is thought that those bull/bear terms

*- 4 years ago, 27 Nov 2018, 09:31pm -*

Price Movement Prediction [Eran Raviv]

Just finished reading the paper Stock Market’s Price Movement Prediction With LSTM Neural Networks. The abstract attractively reads: “The results that were obtained are promising, getting up to an average of 55.9% of accuracy when predicting if the price of a particular stock is going to go up

*- 4 years ago, 15 Oct 2018, 02:09am -*

Test of Equality Between Two Densities [Eran Raviv]

Are returns this year actually different than what can be expected from a typical year? Is the variance actually different than what can be expected from a typical year? Those are fairly light, easy to answer questions. We can use tests for equality of means or equality of variances. But how about

*- 4 years ago, 9 Oct 2018, 04:56pm -*

Visualizing Time Series Data [Eran Raviv]

This post has two goals. I hope to make you think about your graphics, and think about the future of data-visualization. An example is given using some simulated time series data. A very quick read. In visualization, like in programming, presenting or any other skill, there is much to learn. Also

*- 5 years ago, 17 Sep 2018, 10:52am -*

Market intraday momentum [Eran Raviv]

I recently spotted the following intriguing paper: Market intraday momentum. From the abstract of that paper: Based on high frequency S&P 500 exchange-traded fund (ETF) data from 1993–2013, we show an intraday momentum pattern: the first half-hour return on the market as measured from the

*- 5 years ago, 29 Jul 2018, 10:15am -*

R in Finance [Eran Raviv]

The yearly R in Finance conference is one of my favorites: 1. Titans of the R community are there every year. This year the founder of RStudio (but much more really), JJ Allaire was a keynote speaker. He gave a talk about Machine Learning with TensorFlow and R. 2. Single track. I like everything,

*- 5 years ago, 27 Jun 2018, 10:47pm -*

Curse of dimensionality part 3: Higher-Order Comoments [Eran Raviv]

Higher moments such as Skewness and Kurtosis are not as explored as they should be. These moments are crucial for managing portfolio risk. At least as important as volatility, if not more. Skewness relates to asymmetry risk and Kurtosis relates to tail risk. Despite their great importance, those

*- 5 years ago, 20 Jun 2018, 12:59pm -*

Portfolio Construction with R [Eran Raviv]

Constructing a portfolio means allocating your money between a few chosen assets. The simplest thing you can do is evenly split your money between those assets. Simple as it is, good research shows it is just fine, and even better than other more sophisticated methods (for example Optimal Versus

*- 5 years ago, 10 Apr 2018, 10:26pm -*

Machine learning is simply statistics [Eran Raviv]

Note: I usually write more technical posts, this is an opinion piece. And you know what they say: opinions are like feet, everybody’s got a couple. Machine learning is simply statistics A lot of buzz words nowadays. Data Science, business intelligence, machine learning, deep learning, statistical

*- 5 years ago, 7 Mar 2018, 11:09am -*

Bitcoin exponential growth [Eran Raviv]

Is bitcoin a bubble? I don’t know. What defines a bubble? The price should drastically overestimate the underlying fundamentals. I simply don’t know enough about blockchain to have an opinion there. A related characteristic is a run-away price. Going up fast just because it is going up fast. How

*- 5 years ago, 29 Jan 2018, 11:09am -*

Most popular posts – 2017 [Eran Raviv]

Writing this, I can’t believe how quickly the year 2017 has gone by. Also weird, we are already three weeks into 2018, unreal. Time flies when you’re having fun I guess. The analytics report shows that the three most popular posts for 2017 are: – Understanding False Discovery Rate (4 minutes

*- 5 years ago, 19 Jan 2018, 10:07am -*

R vs MATLAB - round 4 [Eran Raviv]

This is another comparison between R and MATLAB (Python also in the mix this time). In previous rounds we discussed the differences in 3d visualization, differences in syntax and input-output differences. Today is about computational speed. Spoiler alert: MATLAB wins by a knockout. A genuinely fair

*- 6 years ago, 6 Sep 2017, 11:35am -*

Visualizing Tail Risk [Eran Raviv]

Tail risk conventionally refers to the risk of a large and sharp drawdown of the portfolio. How large is subjective and depends on how you define a tail. A lot of research is directed towards having a good estimate of the tail risk. Some fairly new research also now indicates that investors

*- 6 years ago, 8 Aug 2017, 05:50am -*

Lasso, Lasso, Lasso (and friends) [Eran Raviv]

LASSO stands for Least Absolute Shrinkage and Selection Operator. It was first introduced 21 years ago by Robert Tibshirani (Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B). In 2004 the four statistical masters: Efron, Hastie, Johnstone and

*- 6 years ago, 5 Jul 2017, 11:47pm -*

Density Estimation Using Regression [Eran Raviv]

Density estimation using regression? Yes we can! I like regression. It is one of those simple yet powerful statistical methods. You always know exactly what you are doing. This post is about density estimation, and how to get an estimate of the density using (Poisson) regression. The “go-to”

*- 6 years ago, 26 Jun 2017, 03:17am -*

Computer Age Statistical Inference [Eran Raviv]

If you consider yourself an Econometrician/Statistician or one of those numerous buzzword synonyms that are floating around these days, Computer Age Statistical Inference: Algorithms, Evidence and Data Science by Bradley Efron and Trevor Hastie is a book you can’t miss, and now nor should you. You

*- 6 years ago, 5 Jun 2017, 09:09am -*

Random Books [Eran Raviv]

It seems like a very long while since my bachelor’s. Checking my bookshelf the other day I was thinking to flag some of those books which helped or inspired me along the way. Here they are in no particular order. Risk: Elements of Financial Risk Management Clear and to the point, 5 stars. Value at

*- 6 years ago, 4 Jun 2017, 02:31am -*

Shrinkage in statistics [Eran Raviv]

Shrinkage in statistics has increased in popularity over the decades. Now statistical shrinkage is commonplace, explicitly or implicitly. But when is it that we need to make use of shrinkage? At least partly it depends on signal-to-noise ratio. Introduction The term shrinkage, I think, is the most

*- 6 years ago, 11 May 2017, 02:50am -*

Machine Trading from @ChanEP - Book Review [Eran Raviv]

In trading and in trading-related research one could be quickly overwhelmed with the sea of ink devoted to trading strategies and the like. It is essential that you “pick your battles” so to speak. I recently finished reading Machine Trading, by Ernest Chan. Here is what I think about the book.

*- 6 years ago, 3 May 2017, 04:45am -*

Understanding False Discovery Rate [Eran Raviv]

False Discovery Rate is an unintuitive name for a very intuitive statistical concept. The math involved is as elegant as possible. Still, it is not an easy concept to actually understand. Hence I thought it would be a good idea to write this short tutorial. We reviewed this important topic in the

*- 6 years ago, 4 Apr 2017, 07:14am -*

Understanding K-Means Clustering [Eran Raviv]

Google “K-means clustering”, and you usually find ugly explanations and math-heavy sensational formulas*. It is my opinion that you can only understand those explanations if you don’t need them; meaning you are already familiar with the topic. Therefore, this is a more gentle introduction

*- 6 years ago, 12 Mar 2017, 07:00pm -*