Translator | 布加迪
Reviewer | 重楼
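Large language models (LLMs) have become everyday tools: from answering questions to generating task lists, they simplify our work in many ways. Individuals and businesses alike now use LLMs to help get work done.
Code generation and evaluation have recently become important features in many commercial products that help developers work with code. LLMs can also be applied to data science work, especially model selection and experimentation.
This article explores how to automate model selection and experimentation with an LLM.
Automating Model Selection and Experimentation with LLMs
We will set up the dataset for model training and the code for the automation. In this example, we use a credit card fraud dataset from Kaggle. Here is the preprocessing I did to prepare it.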
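import pandas as pd

df = pd.read_csv('fraud_data.csv')
# Drop the columns that are not used for modeling
df = df.drop(['trans_date_trans_time', 'merchant', 'dob', 'trans_num', 'merch_lat', 'merch_long'], axis=1)
# Drop rows with missing values and rebuild the index
df = df.dropna().reset_index(drop=True)
# Note: this overwrites the original file with the cleaned data
df.to_csv('fraud_data.csv', index=False)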
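We keep only some of the dataset's columns and discard every row with missing data. This is not an optimal process, but our focus here is model selection and experimentation.
Next, we prepare a folder for the project and put all the related files there. First, we create a requirements.txt file for the environment. You can fill it with the following packages.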
openai
pandas
scikit-learn
pyyaml
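Next, we use a YAML file for all the relevant metadata: the OpenAI API key, the models to test, the evaluation metrics, and the dataset location. Save it as config.yaml, the default path that the load_config function shown later expects.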
llm_api_key: "YOUR-OPENAI-API-KEY"
default_models:
  - LogisticRegression
  - DecisionTreeClassifier
  - RandomForestClassifier
metrics: ["accuracy", "precision", "recall", "f1_score"]
dataset_path: "fraud_data.csv"
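A key stored in a file is easy to leak by accident. As an optional variation that is not part of the original setup, the sketch below prefers an environment variable and falls back to the config value. The helper name resolve_api_key is hypothetical, and OPENAI_API_KEY is assumed here as the variable name because it is the one conventionally used with the OpenAI client.

```python
import os

PLACEHOLDER = "YOUR-OPENAI-API-KEY"

def resolve_api_key(config, env_var="OPENAI_API_KEY"):
    """Return the API key from the environment, else from the config dict."""
    key = os.environ.get(env_var) or config.get("llm_api_key")
    # Refuse to continue with a missing key or the unedited placeholder
    if not key or key == PLACEHOLDER:
        raise ValueError(f"Set {env_var} or fill in llm_api_key in the config file.")
    return key
```

With this in place, the value passed to the LLM call could be resolve_api_key(config) instead of config['llm_api_key'].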
Then, we import the packages used in the process. We rely on Scikit-Learn for modeling and use OpenAI's GPT-4 as the LLM.
import pandas as pd
import yaml
import ast
import re
import sklearn
from openai import OpenAI
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
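In addition, we set up helper functions and a model lookup table to support the process. The config loader, the dataset loader, and the data preprocessing are all in the functions below.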
# Map the model names used in the config file to their scikit-learn classes
model_mapping = {
    "LogisticRegression": LogisticRegression,
    "DecisionTreeClassifier": DecisionTreeClassifier,
    "RandomForestClassifier": RandomForestClassifier
}

def load_config(config_path='config.yaml'):
    with open(config_path, 'r') as file:
        config = yaml.safe_load(file)
    return config

def load_data(dataset_path):
    return pd.read_csv(dataset_path)

def preprocess_data(df):
    # Label-encode every string column and keep the fitted encoders
    label_encoders = {}
    for column in df.select_dtypes(include=['object']).columns:
        le = LabelEncoder()
        df[column] = le.fit_transform(df[column])
        label_encoders[column] = le
    return df, label_encoders
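To make the encoding step in preprocess_data concrete, here is a dependency-free sketch of what label encoding does to one column: each distinct value is replaced by its position in the sorted list of distinct values, which is how LabelEncoder assigns its codes. The function name label_encode and the sample categories are made up for illustration, and this aside is not part of the project file.

```python
def label_encode(values):
    """Map each distinct value to its index in the sorted list of distinct values."""
    classes = sorted(set(values))
    index = {value: code for code, value in enumerate(classes)}
    return [index[v] for v in values], classes

codes, classes = label_encode(["grocery", "travel", "grocery", "health"])
print(classes)  # ['grocery', 'health', 'travel']
print(codes)    # [0, 2, 0, 1]
```

The integer codes imply an order that the categories do not really have. Tree-based models tolerate that well, while for a linear model such as LogisticRegression it is a simplification worth keeping in mind.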
In the same file, we set up the LLM to act as a machine learning expert. We use the code below to call it.
def call_llm(prompt, api_key):
    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an expert in machine learning and able to evaluate the model well."},
            {"role": "user", "content": prompt}
        ]
    )
    return response.choices[0].message.content.strip()
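You can change the LLM to whichever model you want, such as an open-source model from Hugging Face, but we suggest sticking with OpenAI for now.
I prepare several functions in the code below to clean the LLM results. This ensures that the output can be used in the later model selection and experimentation steps.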
def clean_hyperparameter_suggestion(suggestion):
    # Pull the first {...} span out of the LLM reply
    pattern = r'\{.*?\}'
    match = re.search(pattern, suggestion, re.DOTALL)
    if match:
        cleaned_suggestion = match.group(0)
        return cleaned_suggestion
    else:
        print("Could not find a dictionary in the hyperparameter suggestion.")
        return None

def extract_model_name(llm_response, available_models):
    # Return the first configured model whose name appears in the reply
    for model in available_models:
        pattern = r'\b' + re.escape(model) + r'\b'
        if re.search(pattern, llm_response, re.IGNORECASE):
            return model
    return None

def validate_hyperparameters(model_class, hyperparameters):
    # Reject parameter names the model does not accept
    valid_params = model_class().get_params()
    invalid_params = []
    for param, value in hyperparameters.items():
        if param not in valid_params:
            invalid_params.append(param)
        else:
            if param == 'max_features' and value == 'auto':
                print(f"Invalid value for parameter '{param}': '{value}'")
                invalid_params.append(param)
    if invalid_params:
        print(f"Invalid hyperparameters for {model_class.__name__}: {invalid_params}")
        return False
    return True

def correct_hyperparameters(hyperparameters, model_name):
    # Replace the 'auto' value for max_features with 'sqrt'
    corrected = False
    if model_name == "RandomForestClassifier":
        if 'max_features' in hyperparameters and hyperparameters['max_features'] == 'auto':
            print("Correcting 'max_features' from 'auto' to 'sqrt' for RandomForestClassifier.")
            hyperparameters['max_features'] = 'sqrt'
            corrected = True
    return hyperparameters, corrected
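To see how the extraction behaves on real-looking replies, the sketch below combines the same non-greedy pattern with ast.literal_eval, the parser the pipeline uses further on. The helper name extract_dict and both sample replies are invented for illustration. The second call shows a limitation to be aware of: the pattern stops at the first closing brace, so a nested dictionary is cut short and fails to parse.

```python
import ast
import re

def extract_dict(text):
    """Return the first {...} span in text parsed as a Python dict, or None."""
    match = re.search(r'\{.*?\}', text, re.DOTALL)
    if not match:
        return None
    try:
        parsed = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return None
    return parsed if isinstance(parsed, dict) else None

reply = "Sure, try this:\n{'max_depth': 5, 'min_samples_split': 4}\nGood luck!"
print(extract_dict(reply))   # {'max_depth': 5, 'min_samples_split': 4}

# A nested value such as class_weight is truncated at the first '}'
nested = "{'class_weight': {0: 1, 1: 10}, 'max_depth': 5}"
print(extract_dict(nested))  # None
```

If nested suggestions matter for your models, one option is to tell the LLM in the prompt to return flat dictionaries only.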
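Then, we need a function that instantiates the model and evaluates the training run. The code below trains a model given the split dataset, the name of the model to look up in the mapping, and the hyperparameters. It returns the metrics and the model object.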
def train_and_evaluate(X_train, X_test, y_train, y_test, model_name, hyperparameters=None):
    if model_name not in model_mapping:
        print(f"Valid model names are: {list(model_mapping.keys())}")
        return None, None

    model_class = model_mapping.get(model_name)
    try:
        if hyperparameters:
            # Fix known bad values, then check the names before instantiating
            hyperparameters, corrected = correct_hyperparameters(hyperparameters, model_name)
            if not validate_hyperparameters(model_class, hyperparameters):
                return None, None
            model = model_class(**hyperparameters)
        else:
            model = model_class()
    except Exception as e:
        print(f"Error instantiating model with hyperparameters: {e}")
        return None, None

    try:
        model.fit(X_train, y_train)
    except Exception as e:
        print(f"Error during model fitting: {e}")
        return None, None

    # Evaluate on the held-out split
    y_pred = model.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average='weighted', zero_division=0),
        "recall": recall_score(y_test, y_pred, average='weighted', zero_division=0),
        "f1_score": f1_score(y_test, y_pred, average='weighted', zero_division=0)
    }
    return metrics, model
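One consequence of average='weighted' is worth spelling out if the target is imbalanced, as fraud labels often are: recall weighted by class size is always equal to accuracy, so it can look high even when the minority class is recognized poorly. The dependency-free sketch below recomputes it on invented toy labels; the function names are hypothetical and this is not a replacement for the scikit-learn metrics used above.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Recall for each class: correct predictions of the class / true members."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / support[label] for label in support}

def weighted_recall(y_true, y_pred):
    """Average of per-class recall, weighted by each class's share of y_true."""
    support = Counter(y_true)
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls[label] * support[label] for label in support) / len(y_true)

# Toy labels (invented): 8 legitimate transactions, 2 fraudulent ones
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 9 + [1]  # one of the two fraud cases is missed

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(per_class_recall(y_true, y_pred))  # {0: 1.0, 1: 0.5}
print(weighted_recall(y_true, y_pred))   # 0.9
print(accuracy)                          # 0.9
```

This is also why accuracy and recall are identical in the sample output at the end of the article. If the fraud class is what you care about, reporting its own recall alongside the weighted figures gives the LLM more useful evidence to choose from.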
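With everything ready, we can set up the automation. There are several steps we can automate, including:
1. Training and evaluating all the models
2. Having the LLM select the best model
3. Asking for hyperparameter tuning suggestions for the best model
4. Automatically running the hyperparameter tuning if the LLM suggests it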
def run_llm_based_model_selection_experiment(df, config):
    #Model Training
    X = df.drop("is_fraud", axis=1)
    y = df["is_fraud"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    available_models = config['default_models']
    model_performance = {}

    for model_name in available_models:
        print(f"Training model: {model_name}")
        metrics, _ = train_and_evaluate(X_train, X_test, y_train, y_test, model_name)
        model_performance[model_name] = metrics
        print(f"Model: {model_name} | Metrics: {metrics}")

    #LLM selecting the best model
    sklearn_version = sklearn.__version__
    prompt = (
        f"I have trained the following models with these metrics: {model_performance}. "
        "Which model should I select based on the best performance?"
    )
    best_model_response = call_llm(prompt, config['llm_api_key'])
    print(f"LLM response for best model selection:\n{best_model_response}")

    best_model = extract_model_name(best_model_response, available_models)
    if not best_model:
        print("Error: Could not extract a valid model name from LLM response.")
        return
    print(f"LLM selected the best model: {best_model}")

    #Check for hyperparameter tuning
    prompt_tuning = (
        f"The selected model is {best_model}. Can you suggest hyperparameters for better performance? "
        "Please provide them in Python dictionary format, like {'max_depth': 5, 'min_samples_split': 4}. "
        f"Ensure that all suggested hyperparameters are valid for scikit-learn version {sklearn_version}, "
        "and avoid using deprecated or invalid values such as 'max_features': 'auto'. "
        "Don't provide any explanation or return in any other format."
    )
    tuning_suggestion = call_llm(prompt_tuning, config['llm_api_key'])
    print(f"Hyperparameter tuning suggestion received:\n{tuning_suggestion}")

    cleaned_suggestion = clean_hyperparameter_suggestion(tuning_suggestion)
    if cleaned_suggestion is None:
        suggested_params = None
    else:
        try:
            suggested_params = ast.literal_eval(cleaned_suggestion)
            if not isinstance(suggested_params, dict):
                print("Hyperparameter suggestion is not a valid dictionary.")
                suggested_params = None
        except (ValueError, SyntaxError) as e:
            print(f"Error parsing hyperparameter suggestion: {e}")
            suggested_params = None

    #Automatically run hyperparameter tuning if suggested
    if suggested_params:
        print(f"Running {best_model} with suggested hyperparameters: {suggested_params}")
        tuned_metrics, _ = train_and_evaluate(
            X_train, X_test, y_train, y_test, best_model, hyperparameters=suggested_params
        )
        print(f"Metrics after tuning: {tuned_metrics}")
    else:
        print("No valid hyperparameters were provided for tuning.")
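In the code above, I specify how the LLM evaluates each of our models based on the experiment. We use the following prompt to select the model to use based on its performance.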
prompt = (
    f"I have trained the following models with these metrics: {model_performance}. "
    "Which model should I select based on the best performance?"
)
You can always change the prompt to apply different rules for model selection.
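As one possible variation, the sketch below builds a selection prompt that names the metric to prioritize and asks for the model name only. The function build_selection_prompt, its priority_metric parameter, and the metric values are all hypothetical and chosen only to show the resulting text.

```python
def build_selection_prompt(model_performance, priority_metric="recall"):
    """Build a selection prompt that names the metric the LLM should prioritize."""
    return (
        f"I have trained the following models with these metrics: {model_performance}. "
        f"Select the single model with the highest {priority_metric}; "
        "break ties using f1_score. Answer with the model name only."
    )

# Invented numbers, only to show the resulting prompt
example = {
    "LogisticRegression": {"recall": 0.91, "f1_score": 0.90},
    "RandomForestClassifier": {"recall": 0.97, "f1_score": 0.97},
}
print(build_selection_prompt(example))
```

Asking for the name only also helps extract_model_name, which returns the first configured model whose name appears anywhere in the reply, so a long explanation that mentions every model could otherwise lead to the wrong pick.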
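Once the best model is selected, I use the following prompt to ask which hyperparameters should be used in the next step. I also specify the Scikit-Learn version, because the accepted hyperparameters change between versions.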
prompt_tuning = (
    f"The selected model is {best_model}. Can you suggest hyperparameters for better performance? "
    "Please provide them in Python dictionary format, like {'max_depth': 5, 'min_samples_split': 4}. "
    f"Ensure that all suggested hyperparameters are valid for scikit-learn version {sklearn_version}, "
    "and avoid using deprecated or invalid values such as 'max_features': 'auto'. "
    "Don't provide any explanation or return in any other format."
)
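You can change the prompt any way you like, for example by being more adventurous with the hyperparameter tuning or by adding another technique.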
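One such technique, sketched here as an assumption rather than something the original pipeline does, is to ask the LLM for a small search grid (a dictionary whose values are lists) and expand it into individual candidates. The helper expand_grid and the sample reply are hypothetical; only the standard library is used.

```python
import ast
from itertools import product

def expand_grid(grid_text):
    """Parse a dict of value lists and return every hyperparameter combination."""
    grid = ast.literal_eval(grid_text)
    if not isinstance(grid, dict):
        raise ValueError("Expected a dictionary of parameter lists.")
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]

# Hypothetical LLM reply containing a small search grid
reply = "{'max_depth': [5, 10], 'n_estimators': [100, 200, 300]}"
candidates = expand_grid(reply)
print(len(candidates))  # 6
print(candidates[0])    # {'max_depth': 5, 'n_estimators': 100}
```

Each candidate could then be passed to train_and_evaluate as the hyperparameters argument, keeping the combination with the best metrics.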
I put all the code above in a file called automated_model_llm.py. Finally, add the following code to run the whole process.
def main():
    config = load_config()
    df = load_data(config['dataset_path'])
    df, _ = preprocess_data(df)
    run_llm_based_model_selection_experiment(df, config)

if __name__ == "__main__":
    main()
Once everything is ready, run the following command to execute the script.
python automated_model_llm.py
Output:
LLM selected the best model: RandomForestClassifier
Hyperparameter tuning suggestion received:
{ 'n_estimators': 100, 'max_depth': None, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'bootstrap': True }
Running RandomForestClassifier with suggested hyperparameters: {'n_estimators': 100, 'max_depth': None, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_features': 'sqrt', 'bootstrap': True}
Metrics after tuning: {'accuracy': 0.9730041532071989, 'precision': 0.9722907483489197, 'recall': 0.9730041532071989, 'f1_score': 0.9724045530119824}
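This is sample output from my experiment, so yours may differ. You can adjust the prompt and the generation parameters to get more varied or stricter LLM output. Either way, if you structure your code properly, you can apply an LLM to automating model selection and experimentation.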
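As a sketch of what adjusting the generation parameters could look like, the helper below assembles the keyword arguments for the chat completion call with an explicit temperature. The function name build_chat_request and the default of 0.0 are assumptions for illustration; the original call_llm does not set a temperature.

```python
def build_chat_request(prompt, temperature=0.0, model="gpt-4"):
    """Assemble keyword arguments for client.chat.completions.create."""
    return {
        "model": model,
        "temperature": temperature,  # lower values give more repeatable replies
        "messages": [
            {"role": "system", "content": "You are an expert in machine learning and able to evaluate the model well."},
            {"role": "user", "content": prompt},
        ],
    }

request = build_chat_request("Which model should I select?", temperature=0.0)
print(request["temperature"])  # 0.0
# With the client from call_llm: client.chat.completions.create(**request)
```

A low temperature suits the parsing steps, which expect a strict dictionary or a bare model name, while a higher one may produce more varied hyperparameter suggestions.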
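Conclusion
LLMs have been applied to many use cases, including code generation. With an LLM such as an OpenAI GPT model, we can easily delegate model selection and experimentation to it, as long as we structure the output properly. In this example, we used a sample dataset to experiment with models and let the LLM select a model and try to improve it.
Original title: Model Selection and Experimentation Automation with LLMs, by Cornellius Yudha Wijaya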