HF中国镜像站

KnutJaegersberg
/

Deita-34b

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

datasets?

#1

by ehartford - opened May 17, 2024

May 17, 2024

In which datasets is this trained?

KnutJaegersberg

Owner May 22, 2024

the same as allways, the 6k version of Deita research paper, but I tried to filter out Chinese records.
I've linked the dataset now.

KnutJaegersberg changed discussion status to closed May 22, 2024

May 22, 2024

I don't see a link to the Deita research paper

KnutJaegersberg

Owner May 22, 2024

I've linked to the github in the dataset

KnutJaegersberg

Owner May 22, 2024

https://github.com/hkust-nlp/deita

KnutJaegersberg

Owner May 22, 2024

I've picked Deita because it performs well for its seize, is based on mostly multiturn conversations and those are very long. It's very flexible, when I can I try to fine tune over the maximum context length my system can bear. It's practical.

KnutJaegersberg

Owner May 22, 2024

it's an AI filtered subset of ultrachat, I think.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

Your need to confirm your account before you can post a new comment.

· Sign up or log in to comment