papers | Tyler Farghly

Updated aug-25. See my Google Scholar page for a more up-to-date list.

Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis

T Farghly, P Rebeschini, G Deligiannidis, and A Doucet

arXiv preprint 2025

URL Bib
@article{farghly2025implicit, title = {Implicit Regularisation in Diffusion Models: An Algorithm-Dependent Generalisation Analysis}, author = {Farghly, Tyler and Rebeschini, Patrick and Deligiannidis, George and Doucet, Arnaud}, journal = {arXiv preprint arXiv:2507.03756}, year = {2025}, }
The success of denoising diffusion models raises important questions regarding their generalisation behaviour, particularly in high-dimensional settings. Notably, it has been shown that when training and sampling are performed perfectly, these models memorise training data – implying that some form of regularisation is essential for generalisation. Existing theoretical analyses primarily rely on algorithm-independent techniques such as uniform convergence, heavily utilising model structure to obtain generalisation bounds. In this work, we instead leverage the algorithmic aspects that promote generalisation in diffusion models, developing a general theory of algorithm-dependent generalisation for this setting. Borrowing from the framework of algorithmic stability, we introduce the notion of score stability, which quantifies the sensitivity of score-matching algorithms to dataset perturbations. We derive generalisation bounds in terms of score stability, and apply our framework to several fundamental learning settings, identifying sources of regularisation. In particular, we consider denoising score matching with early stopping (denoising regularisation), sampler-wide coarse discretisation (sampler regularisation) and optimising with SGD (optimisation regularisation). By grounding our analysis in algorithmic properties rather than model structure, we identify multiple sources of implicit regularisation unique to diffusion models that have so far been overlooked in the literature.
Generalisation under gradient descent via deterministic PAC-Bayes

E Clerico, T Farghly, G Deligiannidis, B Guedj, and A Doucet

In ALT 2025

URL Bib
@inproceedings{pmlr-v272-clerico25a, title = {Generalisation under gradient descent via deterministic PAC-Bayes}, author = {Clerico, Eugenio and Farghly, Tyler and Deligiannidis, George and Guedj, Benjamin and Doucet, Arnaud}, booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory}, pages = {349--389}, year = {2025}, editor = {Kamath, Gautam and Loh, Po-Ling}, volume = {272}, series = {Proceedings of Machine Learning Research}, month = {24--27 Feb}, publisher = {PMLR}, }
We establish disintegrated PAC-Bayesian generalisation bounds for models trained with gradient descent methods or continuous gradient flows. Contrary to standard practice in the PAC-Bayesian setting, our result applies to optimisation algorithms that are deterministic, without requiring any de-randomisation step. Our bounds are fully computable, depending on the density of the initial distribution and the Hessian of the training objective over the trajectory. We show that our framework can be applied to a variety of iterative optimisation algorithms, including stochastic gradient descent (SGD), momentum-based schemes, and damped Hamiltonian dynamics.
Towards a Complete Analysis of Langevin Monte Carlo: Beyond Poincaré Inequality

A Mousavi-Hosseini, T Farghly, Y He, K Balasubramanian, and M Erdogdu

In COLT 2023

URL Bib
@inproceedings{pmlr-v195-mousavi-hosseini23a, title = {Towards a Complete Analysis of Langevin Monte Carlo: Beyond Poincaré Inequality}, author = {Mousavi-Hosseini, Alireza and Farghly, Tyler K. and He, Ye and Balasubramanian, Krishna and Erdogdu, Murat A.}, booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory}, pages = {1--35}, year = {2023}, editor = {Neu, Gergely and Rosasco, Lorenzo}, volume = {195}, series = {Proceedings of Machine Learning Research}, month = {12--15 Jul}, publisher = {PMLR}, }
Langevin diffusions are rapidly convergent under appropriate functional inequality assumptions. Hence, it is natural to expect that with additional smoothness conditions to handle the discretization errors, their discretizations like the Langevin Monte Carlo (LMC) converge in a similar fashion. This research program was initiated by Vempala and Wibisono (2019), who established results under log-Sobolev inequalities. Chewi et al. (2022a) extended the results to handle the case of Poincaré inequalities. In this paper, we go beyond Poincaré inequalities, and push this research program to its limit. We do so by establishing upper and lower bounds for Langevin diffusions and LMC under weak Poincaré inequalities that are satisfied by a large class of densities including polynomially-decaying heavy-tailed densities (i.e., Cauchy-type). Our results explicitly quantify the effect of the initializer on the performance of the LMC algorithm. In particular, we show that as the tail goes from sub-Gaussian, to sub-exponential, and finally to Cauchy-like, the dependency on the initial error goes from being logarithmic, to polynomial, and then finally to being exponential. This three-step phase transition is in particular unavoidable as demonstrated by our lower bounds, clearly defining the boundaries of LMC.
Mean-Square Analysis of Discretized Itô Diffusions for Heavy-tailed Sampling

Y He, T Farghly, K Balasubramanian, and M Erdogdu

JMLR 2024

URL

We analyze the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities. Based on a mean-square analysis, we establish the iteration complexity for obtaining a sample whose distribution is ϵ close to the target distribution in the Wasserstein-2 metric. In this paper, our results take the mean-square analysis to its limits, i.e., we invariably only require that the target density has finite variance, the minimal requirement for a mean-square analysis. To obtain explicit estimates, we compute upper bounds on certain moments associated with heavy-tailed targets under various assumptions. We also provide similar iteration complexity results for the case where only function evaluations of the unnormalized target density are available by estimating the gradients using a Gaussian smoothing technique. We provide illustrative examples based on the multivariate t-distribution.
Time-independent Generalization Bounds for SGLD in Non-convex Settings

T Farghly, and P Rebeschini

In NeurIPS 2021

URL Bib
@inproceedings{NEURIPS2021_a4ee59dd, author = {Farghly, Tyler and Rebeschini, Patrick}, booktitle = {Advances in Neural Information Processing Systems}, editor = {Ranzato, M. and Beygelzimer, A. and Dauphin, Y. and Liang, P.S. and Vaughan, J. Wortman}, pages = {19836--19846}, publisher = {Curran Associates, Inc.}, title = {Time-independent Generalization Bounds for SGLD in Non-convex Settings}, volume = {34}, year = {2021}, }
We establish generalization error bounds for stochastic gradient Langevin dynamics (SGLD) with constant learning rate under the assumptions of dissipativity and smoothness, a setting that has received increased attention in the sampling/optimization literature. Unlike existing bounds for SGLD in non-convex settings, ours are time-independent and decay to zero as the sample size increases. Using the framework of uniform stability, we establish time-independent bounds by exploiting the Wasserstein contraction property of the Langevin diffusion, which also allows us to circumvent the need to bound gradients using Lipschitz-like assumptions. Our analysis also supports variants of SGLD that use different discretization methods, incorporate Euclidean projections, or use non-isotropic noise.
Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations

S Coope, T Farghly, D Gerz, I Vulić, and M Henderson

In ACL 2020

URL Bib
@inproceedings{coope-etal-2020-span, title = {{S}pan-{ConveRT}: {F}ew-shot Span Extraction for Dialog with Pretrained Conversational Representations}, author = {Coope, Samuel and Farghly, Tyler and Gerz, Daniela and Vuli{\'c}, Ivan and Henderson, Matthew}, editor = {Jurafsky, Dan and Chai, Joyce and Schluter, Natalie and Tetreault, Joel}, booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics}, month = jul, year = {2020}, address = {Online}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2020.acl-main.11/}, doi = {10.18653/v1/2020.acl-main.11}, pages = {107--121}, }
We introduce Span-ConveRT, a light-weight model for dialog slot-filling which frames the task as a turn-based span extraction task. This formulation allows for a simple integration of conversational knowledge coded in large pretrained conversational models such as ConveRT (Henderson et al., 2019). We show that leveraging such knowledge in Span-ConveRT is especially useful for few-shot learning scenarios: we report consistent gains over 1) a span extractor that trains representations from scratch in the target domain, and 2) a BERT-based span extractor. In order to inspire more work on span extraction for the slot-filling task, we also release RESTAURANTS-8K, a new challenging data set of 8,198 utterances, compiled from actual conversations in the restaurant booking domain.