Complex Classification

In a previous post, we asked VerbaGPT to build a classification model for us. In this example, we use VerbaGPT to classify the species of Iris flower from the famous iris dataset1. This time, however, we challenge it further by asking for not one but three models, and for a confusion matrix as at least one measure of performance.

Video:

Coming soon…

User question: Give me 3 best models to predict the iris flower. Please rank their performance, and include confusion matrix as at least one measure of performance.

VerbaGPT response:

Cost of query: $0.00*

* Using open-source llama4-maverick model from OpenRouter

Logistic Regression Confusion Matrix:
[[11  0  0]
 [ 0 13  0]
 [ 0  0  6]]

K-Nearest Neighbors Confusion Matrix:
[[11  0  0]
 [ 0 12  1]
 [ 0  0  6]]

Support Vector Machines Confusion Matrix:
[[11  0  0]
 [ 0 13  0]
 [ 0  0  6]]

VerbaGPT answers the question correctly and gives us what we asked for. The diagonal of each confusion matrix represents correctly classified examples, and the off-diagonal entries are the errors. We can see that Logistic Regression and Support Vector Machines tie for best performance (all 30 test examples fall on the diagonal), while K-Nearest Neighbors misclassifies one example.
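To make that comparison concrete, accuracy can be read straight off a confusion matrix as the diagonal sum divided by the total. A quick sketch using the matrices reported above:

```python
import numpy as np

# Confusion matrices reported in the VerbaGPT output above
cm_logreg = np.array([[11,  0, 0],
                      [ 0, 13, 0],
                      [ 0,  0, 6]])
cm_knn = np.array([[11,  0, 0],
                   [ 0, 12, 1],
                   [ 0,  0, 6]])

def cm_accuracy(cm):
    # Correctly classified examples (diagonal) over all examples
    return np.trace(cm) / cm.sum()

print(cm_accuracy(cm_logreg))  # 1.0 (30/30 on the diagonal)
print(cm_accuracy(cm_knn))     # ~0.967 (29/30)
```

The same calculation applied to the Support Vector Machines matrix gives 1.0 as well, which is why it ties with Logistic Regression.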

Check out the full VerbaGPT read-out by clicking below.

VerbaGPT Message Window
Here is some Python code that you can use to predict the iris flower species using the iris dataset from your MS SQL database. This code uses the pandas and scikit-learn libraries, so make sure they are installed before running it. The code uses three different models: Logistic Regression, K-Nearest Neighbors, and Support Vector Machines. The performance of each model is measured using a confusion matrix.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Load data from MS SQL database
# (conn is assumed to be an existing open database connection)
query = "SELECT * FROM iris"
iris = pd.read_sql(query, conn)

# Split data into features and target
X = iris[['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']]
y = iris['target']

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Create models
models = {
    'Logistic Regression': LogisticRegression(),
    'K-Nearest Neighbors': KNeighborsClassifier(),
    'Support Vector Machines': SVC()
}

# Train and evaluate models
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    print(f'{name} Confusion Matrix:\n{cm}\n')

This code loads the iris dataset from your MS SQL database using a SQL query and the pd.read_sql function. The data is then split into features (X) and target (y) variables. The data is further split into training and test sets using the train_test_split function.
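Note that the generated code assumes `conn` is an already-open database connection. A minimal sketch of creating one with SQLAlchemy follows; the in-memory SQLite engine here is a stand-in so the example runs anywhere, and the MS SQL URL in the comment is illustrative with placeholder credentials:

```python
import pandas as pd
from sqlalchemy import create_engine

# For MS SQL Server you might use something like (details are placeholders):
# engine = create_engine(
#     "mssql+pyodbc://user:password@server/database"
#     "?driver=ODBC+Driver+17+for+SQL+Server")
# Here we use an in-memory SQLite database so the sketch is runnable anywhere.
engine = create_engine("sqlite://")

# Create a tiny stand-in "iris" table to query
sample = pd.DataFrame({
    'sepal length (cm)': [5.1, 7.0, 6.3],
    'sepal width (cm)':  [3.5, 3.2, 3.3],
    'petal length (cm)': [1.4, 4.7, 6.0],
    'petal width (cm)':  [0.2, 1.4, 2.5],
    'target': [0, 1, 2],
})
sample.to_sql('iris', engine, index=False)

# Same pattern as the generated code: pass the engine/connection to read_sql
iris = pd.read_sql("SELECT * FROM iris", engine)
print(iris.shape)  # (3, 5)
```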

Three different models are created: Logistic Regression, K-Nearest Neighbors, and Support Vector Machines. Each model is trained on the training data using the fit method and then used to make predictions on the test data using the predict method. The performance of each model is evaluated using a confusion matrix generated by the confusion_matrix function.

You can run this code to see the confusion matrix for each model and compare their performance. You can also modify the code to use different models or evaluation metrics if you wish.
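As a sketch of swapping in a different evaluation metric, the same loop can report overall accuracy alongside the confusion matrix and rank the models by it. This version uses scikit-learn's bundled copy of the iris data so it is self-contained; exact scores will differ from the database run above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

# Bundled iris data stands in for the MS SQL table
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

models = {
    'Logistic Regression': LogisticRegression(max_iter=200),
    'K-Nearest Neighbors': KNeighborsClassifier(),
    'Support Vector Machines': SVC(),
}

# Train, evaluate, and collect accuracy per model
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)
    print(f'{name}: accuracy={results[name]:.3f}')
    print(confusion_matrix(y_test, y_pred))

# Rank models by accuracy, best first
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(acc, 3))
```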
  1. iris dataset
