tinyllama-proteinpretrain-quinoa

Full-model finetuning of TinyLLaMA-1.1B on the "research" split (quinoa protein sequences) of the GreenBeing-Proteins dataset.

Notes: pretraining only on sequences leads the model to generate only protein sequences, which eventually degenerate into a single repeated residue such as VVVV or KKKK.
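
One way to flag this failure mode in generated output is to measure the longest run of a single repeated residue. The following is a minimal sketch, not part of the original training or evaluation code; the function names and the `max_run` threshold of 4 are hypothetical choices for illustration.

```python
from itertools import groupby


def longest_run(sequence: str) -> tuple[str, int]:
    """Return (residue, length) for the longest run of one repeated character."""
    best_residue, best_length = "", 0
    for residue, group in groupby(sequence):
        length = sum(1 for _ in group)
        if length > best_length:
            best_residue, best_length = residue, length
    return best_residue, best_length


def is_degenerate(sequence: str, max_run: int = 4) -> bool:
    """Flag a sequence whose longest single-residue run reaches max_run.

    The default of 4 is an assumed threshold matching the VVVV / KKKK
    pattern described above, not a value taken from the model card.
    """
    return longest_run(sequence)[1] >= max_run
```

A check like this could be run over sampled generations to estimate how often the model collapses into repetition. Real proteins do contain some short homopolymer runs, so the threshold would likely need tuning.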

  • This model may be replaced by one trained on a mix of bio/chem text and protein sequences.
  • This model might need dedicated "biotokens" to represent the amino acids, rather than reusing the existing tokenizer.
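
The "biotokens" idea can be sketched as a mapping from each amino acid to its own dedicated token string, so that every residue becomes exactly one token. This is only an illustration under stated assumptions: the `<aa_X>` token format, the `BIOTOKENS` table, and the `to_biotokens` helper are hypothetical, since the model card does not specify what a biotoken would look like.

```python
# The 20 standard amino acids, by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical token format: one dedicated token string per residue.
BIOTOKENS = {aa: f"<aa_{aa}>" for aa in AMINO_ACIDS}

# Assumed fallback for non-standard or ambiguous residue codes.
UNKNOWN = "<aa_X>"


def to_biotokens(sequence: str) -> list[str]:
    """Map a protein sequence to one hypothetical biotoken per residue."""
    return [BIOTOKENS.get(residue, UNKNOWN) for residue in sequence.upper()]
```

With the Hugging Face `transformers` library, token strings like these could then be registered through `tokenizer.add_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. Whether that would reduce the repetition described above is untested here.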

More details TBD

Model size: 1.1B params (Safetensors, tensor type F32)
Model repository: monsoon-nlp/tinyllama-proteinpretrain-quinoa