Polymodeling

In this example we use VerbaGPT to create multiple models to predict the progression of diabetes from the famous diabetes dataset¹.


User question: Give me 3 best models to predict the iris flower. Please rank their performance, and include confusion matrix as at least one measure of performance.

VerbaGPT response:

Cost of query: $0.00*

* Using the open-source llama4-maverick model from OpenRouter

Decision Tree R2 score: 1.000
Random Forest R2 score: 0.921
Linear Regression R2 score: 0.518

VerbaGPT answers the question correctly and gives us what we asked for. It is amazing that we are at a point where we can ask a question in plain English and, in seconds, have this statistical cake prepared, recipe and all. The result tells us all kinds of interesting things and gives us an incredible head start in our analysis.
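One natural next step from that head start is to check the ranking on data the models have not seen, since the generated code in the read-out scores each model on the same rows it was fitted on. The following is only a sketch of that idea, not part of the VerbaGPT output. It assumes scikit-learn's bundled `load_diabetes` data can stand in for the SQL table, and the 80/20 split and `random_state=0` are arbitrary choices made for repeatability.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: scikit-learn's bundled copy of the diabetes data stands in
# for the SQL table queried in the post.
X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the rows; the split size and seed are arbitrary choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
}

# For each model, record (R2 on training rows, R2 on held-out rows)
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (
        r2_score(y_train, model.predict(X_train)),
        r2_score(y_test, model.predict(X_test)),
    )

# Rank by held-out R2, best first
ranked = sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True)
for name, (train_r2, test_r2) in ranked:
    print(f"{name}: train R2 = {train_r2:.3f}, held-out R2 = {test_r2:.3f}")
```

Printing both columns side by side makes it easy to see how much of each training-set score carries over to unseen rows. The held-out ranking may differ from the training-set ranking.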

Check out the full VerbaGPT read-out below.

VerbaGPT Message Window

Here is a Python code snippet that imports the necessary libraries and uses three different models to predict the target variable in the diabetes dataset. The models are ranked in terms of R2 performance:


import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

# Load data from SQL database
# (conn is assumed to be an already-open database connection)
query = "SELECT * FROM diabetes"
data = pd.read_sql(query, conn)

# Split data into features and target variable
X = data.drop('target', axis=1)
y = data['target']

# Create and fit models
lr = LinearRegression()
lr.fit(X, y)
dt = DecisionTreeRegressor()
dt.fit(X, y)
rf = RandomForestRegressor()
rf.fit(X, y)

# Make predictions and calculate R2 scores
lr_pred = lr.predict(X)
lr_r2 = r2_score(y, lr_pred)
dt_pred = dt.predict(X)
dt_r2 = r2_score(y, dt_pred)
rf_pred = rf.predict(X)
rf_r2 = r2_score(y, rf_pred)

# Rank models by R2 score
models = [('Linear Regression', lr_r2), ('Decision Tree', dt_r2), ('Random Forest', rf_r2)]
models.sort(key=lambda x: x[1], reverse=True)

# Print results
for model in models:
    print(f'{model[0]} R2 score: {model[1]:.3f}')


This code uses the `LinearRegression`, `DecisionTreeRegressor`, and `RandomForestRegressor` models from the `sklearn` library to make predictions on the diabetes dataset. The R2 scores for each model are calculated using the `r2_score` function from the `sklearn.metrics` module. The models are then ranked by their R2 scores and the results are printed.
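For readers less familiar with the metric, R2 (the coefficient of determination) compares a model's squared error with the squared error of always predicting the mean of the target. The sketch below spells out that standard definition in plain Python. The helper name `r2` and the toy numbers are made up for illustration and do not come from the diabetes data.

```python
from statistics import mean


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    # Squared error of the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy target values (hypothetical, for illustration only)
y_true = [3.0, 5.0, 7.0, 9.0]

perfect = r2(y_true, y_true)                        # every prediction exact
baseline = r2(y_true, [mean(y_true)] * len(y_true)) # always predict the mean
close = r2(y_true, [2.5, 5.5, 6.5, 9.5])            # each prediction off by 0.5

print(perfect, baseline, close)
```

A score of 1 means the predictions match the targets exactly, 0 means the model does no better than predicting the mean, and negative values are possible when it does worse.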

¹ diabetes dataset
