Computational Biology

FuncCell: Function-Aware Single-Cell Embeddings

FuncCell aggregates 19,294 ProteinBERT embeddings into a compact 512-dimensional representation of each cell, reaching 99.5% accuracy and 0.999 AUC on cancer classification with minimal training.

Key innovations:

  • Attention-Based Aggregation: Learns which genes matter regardless of expression level, capturing tumor suppressors that remain important even when lowly expressed while ignoring noisy housekeeping genes.
  • Function-First Embedding: Maps cells to protein function space rather than raw expression, enabling biologically meaningful representations.
  • Lightweight Architecture: The attention-pooling module has only ~198K parameters and is trained for 7 epochs.
  • Interpretable Results: Attention weights reveal which genes drive classification decisions.
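
The attention-based aggregation and the interpretability it gives can be illustrated with a minimal sketch. This is not FuncCell's actual implementation: the single scoring vector, the toy sizes (6 genes, 4 dimensions instead of 19,294 and 512), and the names `attention_pool` and `score_weights` are all assumptions made for illustration. It is written in plain Python so it runs without dependencies; a trained model would presumably learn the scoring weights with PyTorch.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(gene_embeddings, score_weights):
    """Collapse per-gene vectors into one cell vector.

    gene_embeddings: list of equal-length vectors, one per gene.
    score_weights:   hypothetical scoring vector (learned in a real model).
    Returns the pooled vector and the per-gene attention weights.
    """
    # One scalar score per gene: dot product with the scoring vector.
    scores = [
        sum(w * x for w, x in zip(score_weights, emb))
        for emb in gene_embeddings
    ]
    weights = softmax(scores)
    dim = len(gene_embeddings[0])
    # Attention-weighted sum over genes, one output value per dimension.
    pooled = [
        sum(a * emb[d] for a, emb in zip(weights, gene_embeddings))
        for d in range(dim)
    ]
    return pooled, weights


# Toy data: random stand-ins for gene embeddings and scoring weights.
random.seed(0)
n_genes, dim = 6, 4
gene_embeddings = [
    [random.gauss(0, 1) for _ in range(dim)] for _ in range(n_genes)
]
score_weights = [random.gauss(0, 1) for _ in range(dim)]

cell_vector, attn = attention_pool(gene_embeddings, score_weights)

# Interpretability: rank genes by how much attention they received.
ranked_genes = sorted(range(n_genes), key=lambda i: attn[i], reverse=True)
```

Because the weights come from a softmax, they are positive and sum to 1, so `ranked_genes` can be read as an ordering of which genes contributed most to the cell vector in this toy setup.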
PyTorch, ProteinBERT, Scanpy, Scikit-learn, Python