Differences between OrionForCausalLM and LlamaForCausalLM

#5
by J22 - opened

As far as I can tell, the only differences are that input_layernorm, post_attention_layernorm and final norm are changed to nn.LayerNorm from LlamaRMSNorm.
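For reference, the practical difference is that `nn.LayerNorm` subtracts the mean and has a learnable bias, while Llama's RMSNorm only rescales by the root-mean-square. A minimal sketch (the `LlamaRMSNorm` below mirrors the transformers implementation; the toy tensor shapes are just for illustration):

```python
import torch
import torch.nn as nn

class LlamaRMSNorm(nn.Module):
    """RMSNorm as in Llama: rescale by RMS only — no mean subtraction, no bias."""
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return self.weight * hidden_states

hidden_size = 8
x = torch.randn(2, 4, hidden_size)

rms_norm = LlamaRMSNorm(hidden_size)    # Llama-style: RMS scaling only
layer_norm = nn.LayerNorm(hidden_size)  # Orion-style: mean-centering + learnable bias

# LayerNorm output is mean-centered per position; RMSNorm output generally is not.
print(layer_norm(x).mean(-1))  # ~0 everywhere
print(rms_norm(x).mean(-1))    # generally nonzero
```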

The attention and embedding implementations also differ in the remote modeling code (loaded via trust_remote_code).
