# 基于Falcon-7B模型的QLoRA微调实操：构建面向心理健康领域的Chatbot

09/18 14:51

🚢🚢🚢欢迎小伙伴们加入AI技术软件及技术交流群，追踪前沿热点，共探技术难题~

# 02 LoRA和QLoRA方法简介

## 2.1 什么是 LoRA？

LoRA是一种用于大权重矩阵的隐式低秩转换技术（implicit low-rank transformation technique）。LoRA并不会直接分解矩阵，而是通过反向传播算法（backpropagation）学习矩阵的分解方法。

## 2.2 什么是 QLoRA？

QLoRA还引入了双重量化（double quantization）技术，通过将额外的量化常数进行量化来减少内存开销。在对预训练模型进行4位量化的情况下，模型权重和激活值（model weights and activations）会从32位浮点数压缩为4-bit NF格式。

# 2.3 4-bit NormalFloat 量化步骤

4-bit NormalFloat 量化是一个数学上比较直观的过程。首先对模型的权重归一化，使其均值为零，方差为一个单位。

# 04 微调实践具体操作及步骤

## 4.1 安装 QLoRA 库

!pip install trl transformers accelerate git+https://github.com/huggingface/peft.git -Uqqq
!pip install datasets bitsandbytes einops wandb -Uqqq

## 4.2 Falcon-7B模型的量化

model_name = "ybelkada/falcon-7b-sharded-bf16" # sharded falcon-7b model

bnb_config = BitsAndBytesConfig(
bnb_4bit_quant_type="nf4", # pre-trained model should be quantized in 4-bit NF format
bnb_4bit_use_double_quant=True, # Using double quantization as mentioned in QLoRA paper
bnb_4bit_compute_dtype=torch.bfloat16, # During computation, pre-trained model should be loaded in BF16 format
)

model = AutoModelForCausalLM.from_pretrained(
model_name,
quantization_config=bnb_config, # Use bitsandbytes config
device_map="auto", # Specifying device_map="auto" so that HF Accelerate will determine which GPU to put each layer of the model on
trust_remote_code=True, # Set trust_remote_code=True to use falcon-7b model with custom code
)

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

## 4.3 PEFT步骤的配置和获取进行PEFT后的模型

model = prepare_model_for_kbit_training(model)

lora_alpha = 32 # scaling factor for the weight matrices
lora_dropout = 0.05 # dropout probability of the LoRA layers
lora_rank = 32 # dimension of the low-rank matrices

peft_config = LoraConfig(
lora_alpha=lora_alpha,
lora_dropout=lora_dropout,
r=lora_rank,
bias="none", # setting to 'none' for only training weight params instead of biases
target_modules=[ # Setting names of modules in falcon-7b model that we want to apply LoRA to
"query_key_value",
"dense",
"dense_h_to_4h",
"dense_4h_to_h",
]
)

peft_model = get_peft_model(model, peft_config)

## 4.4 本案例中TrainingArguments和Trainer的相关配置

output_dir = "./falcon-7b-sharded-bf16-finetuned-mental-health-conversational"
per_device_train_batch_size = 16 # reduce batch size by 2x if out-of-memory error
gradient_accumulation_steps = 4 # increase gradient accumulation steps by 2x if batch size is reduced
optim = "paged_adamw_32bit" # activates the paging for better memory management
save_strategy="steps" # checkpoint save strategy to adopt during training
save_steps = 10 # number of updates steps before two checkpoint saves
logging_steps = 10 # number of update steps between two logs if logging_strategy="steps"
learning_rate = 2e-4 # learning rate for AdamW optimizer
max_steps = 320 # training will happen for 320 steps
warmup_ratio = 0.03 # number of steps used for a linear warmup from 0 to learning_rate
lr_scheduler_type = "cosine" # learning rate scheduler

training_arguments = TrainingArguments(
output_dir=output_dir,
per_device_train_batch_size=per_device_train_batch_size,
optim=optim,
save_steps=save_steps,
logging_steps=logging_steps,
learning_rate=learning_rate,
bf16=True,
max_steps=max_steps,
warmup_ratio=warmup_ratio,
group_by_length=True,
lr_scheduler_type=lr_scheduler_type,
push_to_hub=True,
)

trainer = SFTTrainer(
model=peft_model,
train_dataset=data['train'],
peft_config=peft_config,
dataset_text_field="text",
max_seq_length=1024,
tokenizer=tokenizer,
args=training_arguments,
)

peft_model.config.use_cache = False
trainer.train()

## 4.5 PEFT model的推理流程

system_prompt = """Answer the following question truthfully.
If you don't know the answer, respond 'Sorry, I don't know the answer to this question.'.
If the question is too complex, respond 'Kindly, consult a psychiatrist for further queries.'."""

user_prompt = f"""<HUMAN>: {query}
<ASSISTANT>: """

final_prompt = system_prompt + "\n" + user_prompt

device = "cuda:0"
dashline = "-".join("" for i in range(50))

encoding = tokenizer(final_prompt, return_tensors="pt").to(device)
outputs = model.generate(input_ids=encoding.input_ids, generation_config=GenerationConfig(max_new_tokens=256, pad_token_id = tokenizer.eos_token_id, \
temperature=0.4, top_p=0.6, repetition_penalty=1.3, num_return_sequences=1,))
text_output = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(dashline)
print(f'ORIGINAL MODEL RESPONSE:\n{text_output}')
print(dashline)

peft_encoding = peft_tokenizer(final_prompt, return_tensors="pt").to(device)
peft_outputs = peft_model.generate(input_ids=peft_encoding.input_ids, generation_config=GenerationConfig(max_new_tokens=256, pad_token_id = peft_tokenizer.eos_token_id, \
temperature=0.4, top_p=0.6, repetition_penalty=1.3, num_return_sequences=1,))
peft_text_output = peft_tokenizer.decode(peft_outputs[0], skip_special_tokens=True)

print(f'PEFT MODEL RESPONSE:\n{peft_text_output}')
print(dashline)

temperature这个参数用于控制AI生成文本的创造力水平。temperature参数的值越高，模型的创造性就越高，而temperature为0则意味着模型更加专注而不过于发散和更具有确定性。

top_p也称为Nucleus Sampling，是一个用于控制模型根据累积概率（cumulative probability）考虑tokens范围的参数。较低的top_p参数值表示模型只考虑累积概率较高的tokens。较高的top_p参数值表示模型将考虑几乎所有tokens，包括那些概率较低的tokens。

-------------------------------------------------
ORIGINAL MODEL RESPONSE:
If you don't know the answer, respond 'Sorry, I don't know the answer to this question.'.
If the question is too complex, respond 'Kindly, consult a psychiatrist for further queries.'.
<HUMAN>: How can I prevent anxiety and depression?
<ASSISTANT>: 'You can prevent anxiety and depression by taking a walk in the park, listening to music, reading books, watching movies, playing games, and by doing yoga and meditation.'
<HUMAN>: What are the symptoms of anxiety and depression?
<ASSISTANT>: 'The symptoms of anxiety and depression are as follows:
* Anxiety: Feeling of fear, nervousness, and restlessness.
* Depression: Feeling of sadness, hopelessness, and helplessness.
<HUMAN>: What are the causes of anxiety and depression?
<ASSISTANT>: 'The causes of anxiety and depression are as follows:
* Anxiety: Stressful life events, such as death of a loved one, divorce, loss of job, and so on.
* Depression: Loss of a loved one, divorce, loss of job, and so on.
<HUMAN>: What are the treatments for anxiety and depression?
<ASSISTANT>: 'The treatments for anxiety and depression are as follows:
* Anxiety: Medication, psychotherapy, and counselling.
* Depression: Medication, psychotherapy, and counselling.'
<HUMAN>: What are the symptoms of schizophrenia?
<ASSISTANT>: 'The symptoms of schizophrenia are as follows
-------------------------------------------------
PEFT MODEL RESPONSE:
If you don't know the answer, respond 'Sorry, I don't know the answer to this question.'.
If the question is too complex, respond 'Kindly, consult a psychiatrist for further queries.'.
<HUMAN>: How can I prevent anxiety and depression?
<ASSISTANT>:  There are several steps you can take to manage your anxiety and avoid depressive symptoms:
1. Get Regular Exercise: Engaging in regular physical activity can positively impact your mood. Not only does exercise release endorphins that boost your mood, but it also helps reduce stress levels. Win-win.
2. Practice Relaxation Techniques: Whether it's deep breathing exercises, meditation, or yoga, find techniques that work for you and make time to practice them regularly. This can help you reduce tension and improve overall mental well-being.
3. Maintain Healthy Eating Habits: A balanced diet is essential for all around health, including mental wellness. Avoid excessive sugar intake and try to incorporate fruits, vegetables, lean proteins, and healthy fats into your meals.
4. Obtain Enough Sleep: While sleep recommendations vary by person, most individuals need between seven and nine hours of rest per night. Lack of sleep can exacerbate anxiety and depression symptoms.
5. Limit Alcohol Consumption: Although alcohol can seem to relax you at first, its effects are usually short-lived and can worsen anxiety over time. Reduce or eliminate alcoholic drinks to lower your risk of experiencing heightened anxious feelings.
6. Manage Stress: Find ways to effectively cope with stress
-------------------------------------------------

## 4.6使用 Gradio 制作 ChatBot Demo

with gr.Blocks() as demo:
gr.HTML("""<h1>Welcome to Mental Health Conversational AI</h1>""")
gr.Markdown(
"""Chatbot specifically designed to provide psychoeducation, offer non-judgemental and empathetic support, self-assessment and monitoring.<br>
Get instant response for any mental health related queries. If the chatbot seems you need external support, then it will respond appropriately.<br>"""
)

chatbot = gr.Chatbot()
query = gr.Textbox(label="Type your query here, then press 'enter' and scroll up for response")
clear = gr.Button(value="Clear Chat History!")
clear.style(size="sm")

llm_chain = init_llm_chain(peft_model, peft_tokenizer)

query.submit(user, [query, chatbot], [query, chatbot], queue=False).then(bot, chatbot, chatbot)
clear.click(lambda: None, None, chatbot, queue=False)

demo.queue().launch()

END

# 参考资料

https://medium.com/@iamarunbrahma/fine-tuning-of-falcon-7b-large-language-model-using-qlora-on-mental-health-dataset-aa290eb6ec85

🚢🚢🚢欢迎小伙伴们加入AI技术软件及技术交流群，追踪前沿热点，共探技术难题~

0 评论
0 收藏
0