TFvelo (Li et al., Nat Commun 2024)

Jiachen Li, Xiaoyong Pan, Ye Yuan & Hong-Bin Shen. Nature Communications 15:1387 (2024-02-15). Shanghai Jiao Tong University (Institute of Image Processing and Pattern Recognition).

Summary

TFvelo estimates RNA velocity without splicing information: it replaces the unspliced/spliced phase delay with transcription-factor (TF) regulation. Its premise, supported by a LASSO analysis, is that the velocity of a target gene can be approximated as a linear combination of its TFs’ expression. TFvelo therefore models the time derivative of a target gene’s total mRNA abundance y_g as dy_g/dt = h(X_g, y_g) = W_g·X_g − γ_g·y_g, where X_g is the expression of the gene’s TFs, W_g the learned regulatory weights, and γ_g the degradation rate. Like UniTVelo it is top-down: it directly specifies the abundance profile y_g(t) = α_g·sin(ω_g t + θ_g) + β_g and fits it with a generalized EM over three parameter groups (cell-specific latent time t, TF weights W, and shape parameters). The phase portrait becomes WX (TF combination) vs y (target), a clockwise co-expression curve, instead of unspliced vs spliced. Because it needs only total mRNA plus a GRN prior, it runs on datasets where splicing information is unavailable or too sparse: FISH data, multiome data (where it matches MultiVelo without using ATAC), and privacy-restricted human embryo data for which the raw sequencing is not available. For the wiki, TFvelo is the extreme of the regulatory axis: it keeps RegVelo’s GRN-driven-α idea but drops the splicing kinetic basis entirely, moving further from physical-time grounding even as it gains robustness.
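As a worked check of the model above: substituting the sine profile into dy/dt = WX − γ·y gives WX(t) = dy/dt + γ·y(t), which collapses to a single phase-shifted sine with amplitude α√(ω²+γ²) and phase shift φ = arctan(ω/γ). The sketch below verifies that identity numerically against a finite-difference derivative. The function names and parameter values are illustrative assumptions, not taken from the TFvelo code.

```python
import math

OMEGA = 2.0 * math.pi  # one full cycle over latent time [0, 1]


def abundance(t, alpha, beta, theta):
    """Sine profile y(t) = alpha*sin(omega*t + theta) + beta."""
    return alpha * math.sin(OMEGA * t + theta) + beta


def tf_drive(t, alpha, beta, theta, gamma):
    """TF drive WX(t) = dy/dt + gamma*y, written as one phase-shifted sine."""
    amplitude = alpha * math.sqrt(OMEGA ** 2 + gamma ** 2)
    phi = math.atan2(OMEGA, gamma)  # equals arctan(omega/gamma) for gamma > 0
    return amplitude * math.sin(OMEGA * t + theta + phi) + beta * gamma


def velocity(t, alpha, beta, theta, gamma):
    """Model velocity dy/dt = WX(t) - gamma*y(t)."""
    return (tf_drive(t, alpha, beta, theta, gamma)
            - gamma * abundance(t, alpha, beta, theta))


def finite_difference(t, alpha, beta, theta, h=1e-5):
    """Central-difference estimate of dy/dt, independent of the closed form."""
    return (abundance(t + h, alpha, beta, theta)
            - abundance(t - h, alpha, beta, theta)) / (2 * h)


# Hypothetical parameters for one gene.
params = dict(alpha=1.2, beta=2.0, theta=0.4)
gamma = 3.0
max_gap = max(
    abs(velocity(i / 50, gamma=gamma, **params) - finite_difference(i / 50, **params))
    for i in range(51)
)
print(f"largest |closed form - finite difference| = {max_gap:.2e}")
```

The gap should sit at the level of finite-difference error, which indicates that the closed-form WX(t) and the ODE describe the same trajectory.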

Key Claims

  • Velocity ≈ linear combination of TF expression. LASSO on the pancreas dataset predicts scVelo velocity of a target from its TFs’ expression with high held-out correlation; TFs with non-zero weights are significantly enriched in the ENCODE TF-target database (p = 6.66e-06) — motivating a regulation-based velocity model.
  • Splicing-free dynamical model. dy_g(t)/dt = W_g·X_g(t) − γ_g·y_g(t), with y = total mRNA (spliced + unspliced), X_g = TF expressions, γ_g degradation. TF-target pairs from ENCODE + ChEA; a TF is included if the pair is annotated in ≥1 of the two DBs.
  • Top-down sine profile + generalized EM. y_g(t) = α_g·sin(ω_g t + θ_g) + β_g, with ω fixed at 2π so that the profile is unimodal on [0,1]. Substituting into the ODE gives the derived TF drive WX(t) = dy/dt + γy = α√(4π²+γ²)·sin(2πt+θ+φ) + βγ, with φ = arctan(2π/γ). EM alternates three steps: assign t (grid search on the phase curve) → update W (bounded linear regression, trust-region) → update the shape parameters [α, β, θ, γ]. Loss: the signed Euclidean residual of each cell to the fitted curve, modeled as Gaussian.
  • New phase portrait (WX vs y). The clockwise loop on the TF-combination vs target plane plays the role the unspliced/spliced loop plays in classical velocity — and is cleaner where the splicing delay (~30 s, vs the differentiation timescale) is too brief or too noisy to resolve.
  • Works where splicing methods can’t. (a) Pancreas: recovers H19, MAML3 directionality; KEGG insulin-secretion / glucagon enrichment; REST (negative) & HMGN3 (positive) as key TFs. (b) Gastrulation erythroid: GATA1/GATA2/LMO2 key TFs; porphyrin/heme GO. (c) 10x multiome brain: matches MultiVelo direction without using ATAC (AHI1, NTRK2, GRIN2B). (d) 88 human preimplantation embryos (day 3–7): raw sequencing unavailable → splicing methods fail, only TFvelo runs (PPP1R14A, RGS2, GSTP1).
  • Competitive metrics. Validated on synthetic data (Spearman 0.823 for weights, 0.894 for velocity); high In-Cluster Coherence (ICCoh ≈ 0.98, on par with UniTVelo) and cross-boundary direction correctness (CBDir). Slower than scVelo, since it must learn TF representations; some genes remain unfittable.
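The alternating fit described in the claims above can be sketched for a single gene on noise-free synthetic data. This is a minimal illustration under stated assumptions, not the TFvelo implementation: the weight step uses projected coordinate descent as a stand-in for the trust-region bounded regression, it fits only the WX coordinate of the residual, the shape update is omitted, and the bound of 20, the grid size, and all parameter values are hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def curve_point(t, alpha, beta, theta, gamma):
    """Point (WX, y) on the fitted phase curve at latent time t."""
    y = alpha * math.sin(TWO_PI * t + theta) + beta
    dy_dt = TWO_PI * alpha * math.cos(TWO_PI * t + theta)
    return dy_dt + gamma * y, y


def assign_times(wx_obs, y_obs, shape, n_grid=400):
    """Time step: give each cell the grid time whose curve point is nearest."""
    grid = [i / n_grid for i in range(n_grid)]
    points = [curve_point(t, *shape) for t in grid]
    times = []
    for wx, y in zip(wx_obs, y_obs):
        best = min(range(n_grid),
                   key=lambda i: (points[i][0] - wx) ** 2 + (points[i][1] - y) ** 2)
        times.append(grid[best])
    return times


def update_weights(X, target, w_init, bound=20.0, sweeps=100):
    """Weight step: box-constrained least squares by projected coordinate descent."""
    w = list(w_init)
    for _ in range(sweeps):
        for j in range(len(w)):
            num = den = 0.0
            for row, tgt in zip(X, target):
                others = sum(w[m] * row[m] for m in range(len(w)) if m != j)
                num += row[j] * (tgt - others)
                den += row[j] ** 2
            if den > 0.0:
                # Exact 1-D minimizer, clipped to the box [-bound, bound].
                w[j] = max(-bound, min(bound, num / den))
    return w


# Synthetic single-gene example (all values hypothetical).
shape = (1.0, 2.0, 0.5, 3.0)                 # alpha, beta, theta, gamma
w_true = [1.5, -0.8]
t_true = [i / 37 for i in range(37)]
X, y_obs = [], []
for t in t_true:
    wx, y = curve_point(t, *shape)
    x1 = math.sin(TWO_PI * t) + 2.0          # first TF: arbitrary smooth pattern
    x2 = (wx - w_true[0] * x1) / w_true[1]   # second TF chosen so that W.X = WX(t)
    X.append([x1, x2])
    y_obs.append(y)

# One round: assign times using the current weights, then refit the weights.
w_current = w_true                           # true weights, to isolate each step
wx_obs = [sum(w * x for w, x in zip(w_current, row)) for row in X]
t_hat = assign_times(wx_obs, y_obs, shape)
target = [curve_point(t, *shape)[0] for t in t_hat]
w_hat = update_weights(X, target, w_init=[0.0, 0.0])
print("refit weights:", [round(w, 2) for w in w_hat])
```

On this toy input the assigned times land within grid resolution of the true times, and the refit weights come back close to w_true. In a full fit the two steps would be iterated from an uninformed start, together with a shape-parameter update.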

Physical-time grounding (standing lens)

  1. Latent time — ordinal or metric? Ordinal. Cell-specific latent time t on [0,1] (sine unimodal window), used to build a velocity-pseudotime and transition matrix (scVelo-style), validated by CBDir / expression-trend correlation. No physical units.
  2. Scale degeneracy. Inherited: the sine profile lives in normalized [0,1] time, W and γ are relative, and there is no absolute anchor. (Arguably deepened: with no splicing kinetics, even the ds/dt-in-hours interpretation that velocyto originally offered is gone; velocity here is a regression-derived rate of change of total abundance.)
  3. External time anchor. None — and uniquely, TFvelo removes the splicing signal too, working from total mRNA + a static GRN prior. It is the furthest from an external time anchor: no labeling, no unspliced, just regulation-implied direction.
  4. Constant-rate assumptions. Production is replaced by W_g·X_g(t) — a GRN-driven, time-varying “transcription” term (the analog of RegVelo’s GRN-gated α, but using TF expression directly rather than a learned function on spliced counts). γ_g is a gene-specific constant.
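The scale degeneracy in item 2 can be made concrete with a small sketch. If the same sine profile is stretched over a process lasting T physical time units, the abundances observed at matching normalized latent times are identical, while the rate dy/dτ shrinks by 1/T. Snapshot data alone therefore cannot fix T. The durations and parameter values below are hypothetical.

```python
import math


def abundance(tau, period, alpha=1.0, beta=2.0, theta=0.3):
    """Sine profile when one full cycle lasts `period` physical time units."""
    return alpha * math.sin(2.0 * math.pi * tau / period + theta) + beta


def rate(tau, period, alpha=1.0, theta=0.3):
    """d(abundance)/d(tau); its magnitude scales as 1/period."""
    return (alpha * (2.0 * math.pi / period)
            * math.cos(2.0 * math.pi * tau / period + theta))


latent = [i / 20 for i in range(20)]   # normalized latent times in [0, 1)
periods = (1.0, 24.0, 240.0)           # hypothetical process durations

snapshots = {T: [abundance(t * T, T) for t in latent] for T in periods}
rates = {T: [rate(t * T, T) for t in latent] for T in periods}

for T in periods:
    same_data = all(abs(a - b) < 1e-9 for a, b in zip(snapshots[T], snapshots[1.0]))
    peak_rate = max(abs(v) for v in rates[T])
    print(f"period={T:6.1f}  same abundances: {same_data}  peak |rate|: {peak_rate:.4f}")
```

Every choice of period reproduces the same abundance snapshots, so the fitted direction is unaffected, but the magnitude of the velocity is only defined relative to the normalized [0,1] window.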

TFvelo is the cleanest illustration of a wiki thesis: the field advances on the regulatory axis while the temporal axis stays untouched. Here that trend is taken to the limit, with splicing itself discarded in favor of pure TF→target regulation. It gains real robustness (FISH, no-splicing, privacy-restricted data), but its “velocity” is a regulation-implied direction on total abundance, not a physically calibrated rate. For FlowVelo, this is a strong reminder that “RNA velocity” now spans from physical ds/dt (velocyto) to abundance-regression-from-GRN (TFvelo), so the temporal-grounding question must be asked separately from the directional-accuracy question.

Key Quotes

“we propose TFvelo, which expands the RNA velocity concept to various single-cell datasets without relying on splicing information, by introducing gene regulatory information.” — Abstract.

“TFvelo models the dynamics of a target gene with the expression level of TFs and the target gene itself … instead of only relying on the unspliced/spliced counts.” — Introduction.

Connections

  • TFvelo — the method entity.
  • grn-informed-velocity — TFvelo is the splicing-free extreme of this axis.
  • RegVelo — the GRN-coupled sibling that keeps splicing; TFvelo drops it. Both make α regulation-driven.
  • UniTVelo — shares the top-down profile-function strategy (sine vs RBF); TFvelo cites it.
  • scVelo — LASSO on scVelo velocity motivates the model; the splicing baseline it replaces.
  • MultiVelo — TFvelo matches its direction without ATAC; a regulation-only vs epigenome comparison.
  • velocyto / splicing-kinetics-ode — the splicing-kinetics basis TFvelo abandons.
  • velocity-skepticism — constructive response to “splicing signal is weak/noisy” (drop it, use regulation).
  • physical-time-grounding / physical-time-grounding-across-methods — regulatory axis advances, temporal axis untouched.
  • FlowVelo — contrast on what “velocity” even denotes once splicing is gone.

Contradictions

  • Tension with the splicing-centric definition of velocity. Classical RNA velocity (velocyto) is the unspliced→spliced derivative; TFvelo redefines velocity as a GRN-driven derivative of total abundance with no splicing. Not a factual contradiction, but a notable conceptual divergence to flag on RNA velocity and grn-informed-velocity — “velocity” is becoming an umbrella for several different observables.