Zephyr 7B Beta beats ChatGPT: HuggingFace’s Open challenge to OpenAI.

Oct 28, 2023
Photo by Mariia Shalabaieva on Unsplash

The HuggingFace H4 team focuses on research and development for aligning language models to be helpful, honest, and harmless. They released the Zephyr 7B Alpha model three weeks ago. Zephyr is a series of language models trained to act as helpful assistants. Zephyr-7B-α is a fine-tuned version of Mistral 7B and is reported to outperform much larger language models such as GPT-3.5, Llama-13B-chat, and Falcon-40B.

Now the H4 team has released Zephyr 7B Beta, which is reported to outperform much larger models, including gpt-3.5-turbo and Llama 70B, and to compete with GPT-4 on the AlpacaEval benchmark. The key fact is that Zephyr 7B is roughly 25 times smaller than GPT-3.5.
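To make the size comparison concrete, here is a rough back-of-the-envelope sketch. It assumes GPT-3.5 has about 175B parameters, a commonly cited figure that OpenAI has not confirmed, and it counts only the memory needed for the weights (no activations or KV cache). The helper name weight_memory_gb is illustrative.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

ZEPHYR_PARAMS = 7e9     # nominal 7B, taken from the model name
GPT35_PARAMS = 175e9    # ASSUMPTION: commonly cited, not officially confirmed

size_ratio = GPT35_PARAMS / ZEPHYR_PARAMS
fp16_gb = weight_memory_gb(ZEPHYR_PARAMS, 16)
int4_gb = weight_memory_gb(ZEPHYR_PARAMS, 4)

print(f"size ratio: {size_ratio:.0f}x")
print(f"Zephyr 7B weights in 16-bit: {fp16_gb:.1f} GB")
print(f"Zephyr 7B weights in 4-bit:  {int4_gb:.1f} GB")
```

Under these assumptions the 4-bit weights take a quarter of the 16-bit footprint, which is what makes a 7B model practical on a single consumer GPU.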

About Zephyr 7B β

Zephyr-7B-β is the second model in the series. It is a fine-tuned version of mistralai/Mistral-7B-v0.1 that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO).

Model description

  • Model type: A 7B parameter GPT-like model fine-tuned on a mix of publicly available, synthetic datasets.
  • Language(s) (NLP): Primarily English
  • License: MIT
  • Finetuned from model: mistralai/Mistral-7B-v0.1


At the time of release, Zephyr-7B-β is the highest ranked 7B chat model on the MT-Bench and AlpacaEval benchmarks.

In particular, on several categories of MT-Bench, Zephyr-7B-β has strong performance compared to larger open models like Llama2-Chat-70B.

How to use Zephyr 7B Beta Model on consumer hardware.

!pip install -q accelerate bitsandbytes gradio transformers langchain xformers einops

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# Load the weights in 4-bit (bitsandbytes) so the model fits on a consumer GPU.
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",
    torch_dtype="auto",
    load_in_4bit=True,
)

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

def stream(user_prompt):
    """Generate a reply to user_prompt and print it token by token."""
    runtimeFlag = "cuda:0"
    system_prompt = 'You are Gathnex AI, an intelligent assistant dedicated to providing effective solutions. Your responses will include emojis to add a friendly and engaging touch. 😊 Analyze user queries and provide clear and practical answers, incorporating emojis to enhance the user experience. Focus on delivering solutions that are accurate, actionable, and helpful. If additional information is required for a more precise solution, politely ask clarifying questions. Your goal is to assist users by providing effective and reliable solutions to their queries.'
    E_INST = "</s>"
    system, user, assistant = "<|system|>", "<|user|>", "<|assistant|>"

    # Zephyr chat format: each turn is a role tag, the text, then </s>.
    prompt = f"{system}\n{system_prompt}{E_INST}\n{user}\n{user_prompt.strip()}{E_INST}\n{assistant}\n"

    inputs = tokenizer([prompt], return_tensors="pt").to(runtimeFlag)

    # TextStreamer prints tokens to stdout as they are generated.
    streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

    _ = model.generate(**inputs, streamer=streamer, max_new_tokens=500)
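The prompt in stream is assembled by hand from role tags and the </s> token. The same layout can be produced for any list of messages with a small helper. This is a minimal sketch: build_zephyr_prompt is a hypothetical name, not part of any library, and recent versions of transformers offer tokenizer.apply_chat_template for the same purpose.

```python
from typing import Dict, List

EOS = "</s>"  # end-of-turn token, same value as E_INST in stream()

def build_zephyr_prompt(messages: List[Dict[str, str]]) -> str:
    """Format each message as a role tag, its text and </s>, then open an assistant turn."""
    parts = [f"<|{m['role']}|>\n{m['content'].strip()}{EOS}\n" for m in messages]
    parts.append("<|assistant|>\n")  # generation continues from here
    return "".join(parts)

prompt = build_zephyr_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short delivery-delay email."},
])
print(prompt)
```

Keeping the formatting in one function makes it easier to extend the example to multi-turn conversations without rewriting the f-string.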

Let’s test the model

stream("write a email about the product delivery deley for 1day, which is 29-10-23 will be the delivery data")


Subject: Product Delivery Delay Notification

Dear Valued Customer,

We regret to inform you that there has been a delay in the delivery of your recent order. The expected delivery date for your product is now 29-10-23, which is a one-day delay from the original delivery date.

We apologize for any inconvenience this may cause and assure you that we are doing everything possible to expedite the delivery process. The delay is due to unforeseen circumstances beyond our control, which has affected our logistics and supply chain.

We understand the importance of timely delivery, and we are working diligently to minimize the impact of this delay. We will keep you updated on the progress of your order and provide you with a new delivery window as soon as possible.

We appreciate your patience and understanding during this time, and we assure you that we will make every effort to ensure that your order is delivered to you as soon as possible.

If you have any further questions or concerns, please do not hesitate to contact us. Our customer service team is available 24/7 to assist you.

Thank you for your continued business, and we apologize once again for any inconvenience caused.

Best regards,

[Your Name]

[Your Company Name]

📦 Delivery Delay 🚚

📩 Email Notification 📧

📅 New Delivery Date: 29-10-23 📅

📱 Contact Us Anytime 📱

🤝 We're Here to Help 🤝

Source: Google Colab

Why Zephyr 7B β is outperforming

Just as with the alpha release, what is interesting about the model is not just the metrics, but how it was trained. Zephyr is a fine-tune with these components:

  • Fine-tune of the best small open-source pretrained model out there: Mistral 7B
  • Usage of large scale preferences dataset: UltraFeedback
  • Drop RL to use Direct Preference Optimization (DPO)
  • Overfitting on the preference dataset surprisingly yields better chat results
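To illustrate how a scored preference dataset such as UltraFeedback can be turned into training pairs for DPO, here is a minimal sketch. It follows the approach described in the Zephyr technical report as I understand it: the top-scored completion becomes "chosen" and one of the remaining completions, picked at random, becomes "rejected". The function name, data layout, and scores are hypothetical.

```python
import random
from typing import Dict, List, Tuple

def make_preference_pair(
    scored: List[Tuple[str, float]], rng: random.Random
) -> Dict[str, str]:
    """Turn scored completions for one prompt into a (chosen, rejected) pair."""
    if len(scored) < 2:
        raise ValueError("need at least two completions to form a pair")
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    chosen = ranked[0][0]                  # highest-scored completion
    rejected = rng.choice(ranked[1:])[0]   # random pick among the rest
    return {"chosen": chosen, "rejected": rejected}

# Hypothetical scores for four completions of one prompt.
completions = [("answer A", 7.5), ("answer B", 9.0), ("answer C", 4.0), ("answer D", 6.0)]
pair = make_preference_pair(completions, random.Random(0))
print(pair)
```

Sampling the rejected answer at random, rather than always taking the worst one, gives the model a wider variety of comparisons.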

Training Steps

  1. Distilled Supervised fine-tuning (dSFT): Build a large scale, self-instruct-style dataset (UltraChat) and then do distilled SFT.
  2. AI Feedback (AIF) collection: 4 different LLMs generate completions and then GPT-4 is used to rank the responses (UltraFeedback).
  3. Distilled direct preference optimization (dDPO): The dSFT model from step 1 is trained with DPO on the feedback data from step 2. DPO is an alternative to PPO that removes the need for a separate reward model. Zephyr Beta trains for more DPO epochs than Zephyr Alpha, which leads to better chat results.
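The DPO objective in step 3 can be sketched for a single preference pair as follows. The inputs are sequence log-probabilities of the chosen and rejected responses under the model being trained (the policy) and under the frozen dSFT reference model. The function name and the default beta=0.1 are illustrative choices, not taken from the article.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, computed from sequence log-probabilities."""
    # Implicit rewards: how much more likely the policy makes each response
    # than the frozen reference model does, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) equals softplus(-margin); this form is numerically stable.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
start = dpo_loss(-12.0, -12.0, -12.0, -12.0)
# Once the policy favors the chosen answer more than the reference does, the loss drops.
better = dpo_loss(-10.0, -15.0, -12.0, -12.0)
print(start, better)
```

No reward model appears anywhere in the computation: the preference signal comes directly from comparing policy and reference log-probabilities, which is why DPO can replace the RL step.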

Key facts about DPO

  • Overfitting with DPO leads to a better chat model according to all benchmarks
  • The team ran ablation experiments to check whether both SFT and DPO were really needed. Conclusion: DPO without SFT leaves the model unable to learn the chat template, while SFT followed by DPO yields the best results.
  • Feedback on Zephyr Alpha noted incorrect casing (e.g. “Hi. how are you?”) and oddly prefaced responses (e.g. “I don’t have personal X”), so the team applied additional filtering for both.
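The kind of filtering mentioned in the last point could look roughly like the sketch below. The two heuristics and all names here are assumptions made for illustration; the article does not say how the team actually implemented its filters.

```python
import re

# ASSUMPTION: illustrative heuristics only, not the team's actual filters.
LOWERCASE_SENTENCE_START = re.compile(r"[.!?]\s+[a-z]")
BOILERPLATE_PREFIX = re.compile(r"^\s*I (don't|don’t|do not) have personal", re.IGNORECASE)

def keep_response(text: str) -> bool:
    """Return False for responses showing the casing or preface issues."""
    if text[:1].islower():
        return False  # response starts with a lowercase letter
    if LOWERCASE_SENTENCE_START.search(text):
        return False  # a later sentence starts with a lowercase letter
    if BOILERPLATE_PREFIX.search(text):
        return False  # response opens with an "I don't have personal ..." preface
    return True

samples = [
    "Hi. how are you?",
    "I don't have personal opinions, but here is a summary.",
    "Hi! How are you today?",
]
kept = [s for s in samples if keep_response(s)]
print(kept)
```

A simple rule like the lowercase check will also flag abbreviations such as "e.g. this", so a real pipeline would need more careful rules.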

Hugging Face has released its language model Alignment Handbook as open source: https://github.com/huggingface/alignment-handbook


You can compare Zephyr 7B Beta with other language models in the LMSYS Chatbot Arena: http://arena.lmsys.org




🤖 Exploring Generative AI & LLM. Join the Gathnex community for cutting-edge discussions and updates! LinkedIn : https://www.linkedin.com/company/gathnex/ 🌟