datasets?

#1
by ehartford - opened

In which datasets is this trained?

the same as allways, the 6k version of Deita research paper, but I tried to filter out Chinese records.
I've linked the dataset now.

KnutJaegersberg changed discussion status to closed

I don't see a link to the Deita research paper

I've linked to the github in the dataset

I've picked Deita because it performs well for its seize, is based on mostly multiturn conversations and those are very long. It's very flexible, when I can I try to fine tune over the maximum context length my system can bear. It's practical.

it's an AI filtered subset of ultrachat, I think.

Your need to confirm your account before you can post a new comment.

Sign up or log in to comment