Computational Biology
FuncCell: Function-Aware Single-Cell Embeddings
A novel approach to single-cell analysis that aggregates 19,294 ProteinBERT embeddings into compact 512-dimensional cell representations, achieving 99.5% accuracy and 0.999 AUC on cancer classification with minimal training.
Key innovations:
- Attention-Based Aggregation: Learns which genes matter regardless of expression level, capturing tumor suppressors that stay important when lowly expressed while ignoring noisy housekeeping genes.
- Function-First Embedding: Maps cells to protein function space rather than raw expression, enabling biologically meaningful representations.
- Lightweight Architecture: The attention-pooling module has only ~198K parameters and is trained for 7 epochs.
- Interpretable Results: Attention weights reveal which genes drive classification decisions.
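The attention-based aggregation described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not FuncCell's actual implementation: the class name `AttentionPooling`, the hidden size, the choice to feed expression into the scorer as one extra feature, and the toy dimensions (100 genes, 32-dimensional embeddings) are all hypothetical. Only the 512-dimensional output and the idea of per-gene attention weights come from the description.

```python
import torch
import torch.nn as nn


class AttentionPooling(nn.Module):
    """Hypothetical sketch: pool per-gene embeddings into one cell vector."""

    def __init__(self, embed_dim: int, out_dim: int = 512, hidden_dim: int = 64):
        super().__init__()
        # Scores each gene from its embedding plus its expression value
        # (assumed design; expression is appended as a single extra feature).
        self.score = nn.Sequential(
            nn.Linear(embed_dim + 1, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.project = nn.Linear(embed_dim, out_dim)

    def forward(self, gene_embeddings: torch.Tensor, expression: torch.Tensor):
        # gene_embeddings: (genes, embed_dim), shared across cells
        # expression:      (cells, genes)
        n_cells = expression.shape[0]
        emb = gene_embeddings.unsqueeze(0).expand(n_cells, -1, -1)
        feats = torch.cat([emb, expression.unsqueeze(-1)], dim=-1)
        logits = self.score(feats).squeeze(-1)        # (cells, genes)
        weights = torch.softmax(logits, dim=-1)       # sums to 1 per cell
        # Weighted average of gene embeddings, then projection to out_dim.
        pooled = torch.einsum("cg,gd->cd", weights, gene_embeddings)
        return self.project(pooled), weights


# Toy data standing in for real ProteinBERT embeddings and expression counts.
torch.manual_seed(0)
gene_embeddings = torch.randn(100, 32)   # 100 genes, 32-dim embeddings (toy sizes)
expression = torch.rand(4, 100)          # 4 cells

model = AttentionPooling(embed_dim=32)
cell_vectors, weights = model(gene_embeddings, expression)

# Indices of the five most-attended genes per cell, for interpretation.
top_genes = weights.topk(5, dim=-1).indices
```

Because the weights come from a learned scorer rather than from expression magnitude alone, a lowly expressed gene can still receive high attention. Inspecting `top_genes` per cell is one simple way to read out which genes the pooling step emphasizes.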
PyTorch, ProteinBERT, Scanpy, Scikit-learn, Python