Abdullah Ahmed, Alec Albrecht, Carlos Hernandez, Ankita Somu, Sanjana Srinivasan

Link to the model codebase on Google Colab
The rise of artwork generated by text-to-image models like DALL-E and Midjourney raises an interesting question about influences from source material. Generative art models are trained on large datasets that include works from different art movements over time. Our team seeks to develop a solution for classifying an art piece into a stylistic period. This categorization algorithm could be used to identify and retrace the art styles mimicked in generative art. Previous approaches have used k-Nearest Neighbor and support vector machine algorithms to build their models (Falomir et al.).
Given an image, our project aims to classify the style of the art piece. Our goal is to create a model that consistently classifies the style of the images in any collection.
We are using a collection of artwork on Kaggle called Best Artworks of All Time to train our model. The dataset identifies stylistic periods for each artist included in the collection. Our model will learn to attribute artwork to its respective artist based on its training data. The model will then be able to report which artist is most likely associated with a new artwork, from which a stylistic period can be inferred.
The preprocessing for the images from the dataset consists of a set of image transformations.
The purpose of the image transformations is to compensate for classes in the dataset with a low number of data points. In our case, the artists Eugene Delacroix (31 images), Georges Seurat (43), and Jackson Pollock (24) were underrepresented in the dataset. These transformations allowed us to generate more data points for our model to train on, as in the sketch below.
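The following is a minimal sketch of the kind of augmentation pipeline described above, using torchvision; the specific transforms and parameters shown are illustrative assumptions, not necessarily the exact ones in our codebase.

```python
from torchvision import transforms

# Illustrative augmentation pipeline for underrepresented artist classes.
# Each pass over an image yields a slightly different variant, effectively
# generating additional training data points.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),              # mirror the painting
    transforms.RandomRotation(degrees=10),               # small random rotation
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.Resize((128, 128)),                       # match the network input size
    transforms.ToTensor(),                               # HxWxC uint8 -> CxHxW float in [0, 1]
])
```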

This is a showcase of how the image normalization changes the images. The histograms shown indicate a change from the range [0, 1] to a range centered around 0. The purpose of this normalization is to give the images a mean of 0 and a standard deviation of 1.
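A minimal sketch of this normalization step with torchvision's Normalize; the per-channel statistics below are placeholders, since in practice they are computed over the training images.

```python
from torchvision import transforms

# Per-channel normalization: subtract the dataset mean and divide by the
# standard deviation so each channel has roughly zero mean and unit variance.
# These statistics are hypothetical placeholders, not our dataset's values.
mean = [0.5, 0.5, 0.5]
std = [0.5, 0.5, 0.5]

# ToTensor() maps pixels into [0, 1]; Normalize then recenters them around 0.
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=mean, std=std),
])
```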
Below, we explain our rationale for choosing certain metrics and how they are relevant to our model.
Loss vs. Epoch
Loss is a common measure used to evaluate how well neural networks perform on training data. The loss captures the difference between the predicted outputs and the actual label values, and should be minimized to create an optimal model. The loss should decrease after each epoch throughout the training process, so visualizing and calculating how it changes gives us a good idea of how successful the model is. Monitoring loss vs. epoch lets us observe how the network is converging: the value should decrease rapidly if the model is adapting well to the training data, while a plateau in the curve may indicate problems such as a learning rate that is too low or a model that is underfitting (and a training loss that keeps falling while validation performance stalls would point to overfitting). This metric also helps us improve the model, especially when tuning hyperparameters such as batch size or number of epochs, since we can see how changes to them affect the loss. It will also assist in comparing our two models to see which one learns and converges better.
In many DL frameworks like TensorFlow and PyTorch (used in our case), we optimize cross entropy loss and use it as a measure of how effective our implementation is. Because cross entropy is logarithmic in nature, it gives us a nuanced and precise way of measuring the quality of our model's predictions: a confident prediction of the wrong class is penalized far more heavily than an uncertain one, which pushes the model toward predicted distributions close to the true labels. This is relevant to our model because when we are analyzing paintings, we aim to reduce misclassification of genres and artists as much as possible. Especially in cases where we are trying to identify fraud, we must be particular about the similarity between the predicted and true distributions. In essence, we will be working to get our cross entropy value as low as possible, as illustrated below.
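To illustrate the logarithmic penalty, the toy example below (with made-up logits for a 3-class problem) compares the loss for a confident correct prediction against an equally confident wrong one.

```python
import torch
import torch.nn.functional as F

# Cross entropy on raw logits for a batch of 2 examples and 3 classes.
# The numbers are made up purely to illustrate the logarithmic penalty.
logits = torch.tensor([[4.0, 1.0, 0.5],    # confident, and correct (label 0)
                       [4.0, 1.0, 0.5]])   # equally confident, but wrong (label 1)
labels = torch.tensor([0, 1])

# reduction='none' keeps the per-example losses separate.
losses = F.cross_entropy(logits, labels, reduction='none')
print(losses)  # the confident mistake costs ~3.1, the correct prediction ~0.08
```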
Confusion Matrix
A confusion matrix is one of the most effective ways to visualize how the model performs for each artist specifically. Based on the batch we want to sample, we can count True Positives, True Negatives, False Positives, and False Negatives for each class and determine whether instances of that class are being classified correctly. Another reason we chose this metric is that it lets us measure specificity, the true negative rate: the proportion of actual negative instances correctly predicted, which reveals whether the model is biased toward predicting majority over minority classes. Finally, by analyzing TP, TN, FP, and FN, we can iteratively change the model and quickly identify misclassifications of artists, which is not easily done with other metrics such as True Positives alone or area under the curve (which can be misleading for imbalanced datasets). A sketch of this computation follows.
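A minimal sketch using scikit-learn's confusion_matrix; the label and prediction arrays here are hypothetical stand-ins for the values collected during evaluation.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted class indices gathered during evaluation.
all_labels = np.array([0, 0, 1, 1, 2, 2])
all_preds  = np.array([0, 1, 1, 1, 2, 0])

cm = confusion_matrix(all_labels, all_preds)
print(cm)  # rows = true classes, columns = predicted classes

# Per-class specificity (true negative rate) derived from the matrix:
for c in range(cm.shape[0]):
    tn = cm.sum() - cm[c, :].sum() - cm[:, c].sum() + cm[c, c]
    fp = cm[:, c].sum() - cm[c, c]
    print(f"class {c} specificity: {tn / (tn + fp):.2f}")
```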
Using PyTorch's neural network module as a starting point, the architecture used for this dataset follows that proposed by Simonyan and Zisserman. Due to our image size, it consists of only 4 convolution blocks, each with a convolution, rectified linear unit (ReLU), and maxpool. The maxpool is always of size 2 and stride 2, while the convolutions double in number of filters from 64 up to 512. Three fully connected layers with ReLUs step the convolution output down to the 50 classes in the data. Cross entropy loss is used to determine the loss for each batch, and a stochastic gradient descent optimizer, also provided by PyTorch, is used for backpropagation. A batch size of 64 was used over 25 epochs for initial training.
The testing dataset was then fed into the trained model. Since the output of the model is a vector of scores indicating how likely a given image is to belong to each class, the index of the maximum value is taken as the predicted label. The accuracy of the model as a whole is calculated, as well as class-specific accuracies, defined as the number of labels correctly predicted out of the total number of images in that class.
| Layer | Output shape (batch × channels × height × width) |
|---|---|
| Conv[3-64] | 64 × 64 × 128 × 128 |
| MaxPool | 64 × 64 × 64 × 64 |
| Conv[3-128] | 64 × 128 × 64 × 64 |
| MaxPool | 64 × 128 × 32 × 32 |
| Conv[3-256] | 64 × 256 × 32 × 32 |
| MaxPool | 64 × 256 × 16 × 16 |
| Conv[3-512] | 64 × 512 × 16 × 16 |
| MaxPool | 64 × 512 × 8 × 8 |
| Flatten | 64 × 32768 |
| FC-4096 | 64 × 4096 |
| FC-4096 | 64 × 4096 |
| FC-50 | 64 × 50 |
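The following is a minimal PyTorch sketch of the network in the table, assuming 3×3 convolutions with padding 1 (so each convolution preserves spatial size) and 128×128 RGB inputs; the learning rate shown is illustrative rather than the exact value from our codebase.

```python
import torch
import torch.nn as nn

class ArtCNN(nn.Module):
    """VGG-style network matching the layer table above."""

    def __init__(self, num_classes=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),          # -> 64 x 64 x 64
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),          # -> 128 x 32 x 32
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),          # -> 256 x 16 x 16
            nn.Conv2d(256, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),          # -> 512 x 8 x 8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                   # 512 * 8 * 8 = 32768
            nn.Linear(512 * 8 * 8, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Training setup as described: cross entropy loss with SGD.
model = ArtCNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # lr is illustrative

# At test time, the predicted label is the argmax over class scores:
# preds = model(images).argmax(dim=1)
```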
Link to the model codebase on Google Colab
So far, our model is quite successful and is improving with adjustments. Without padding the dataset with augmented images, the model had an overall accuracy of 19 percent, which is around 10 times better than random guessing among 50 classes, even though the per-class accuracies were not very high. After adjusting hyperparameters such as the number of epochs and the batch size, the model improved significantly: 17 of the 50 artists were classified with an accuracy of 50% or higher (all accuracies are listed below), most likely the artists with many images in the dataset. Many of the other classes followed close behind, with accuracies in the high forties.
| Artist Name | Percent Accuracy |
|---|---|
| Albrecht_Durer | 21.0% |
| Alfred_Sisley | 34.5% |
| Amedeo_Modigliani | 48.1% |
| Andrei_Rublev | 29.1% |
| Andy_Warhol | 69.2% |
| Camille_Pissarro | 42.0% |
| Caravaggio | 41.6% |
| Claude_Monet | 56.9% |
| Diego_Rivera | 51.1% |
| Diego_Velazquez | 53.5% |
| Edgar_Degas | 65.5% |
| Edouard_Manet | 46.4% |
| Edvard_Munch | 44.4% |
| El_Greco | 42.2% |
| Eugene_Delacroix | 52.5% |
| Francisco_Goya | 37.5% |
| Frida_Kahlo | 12.5% |
| Georges_Seurat | 38.6% |
| Giotto_di_Bondone | 52.6% |
| Gustav_Klimt | 43.1% |
| Gustave_Courbet | 52.7% |
| Henri_Matisse | 45.3% |
| Henri_Rousseau | 29.0% |
| Henri_de_Toulouse-Lautrec | 43.3% |
| Hieronymus_Bosch | 49.1% |
| Jackson_Pollock | 56.9% |
| Jan_van_Eyck | 86.4% |
| Joan_Miro | 23.4% |
| Kazimir_Malevich | 75.0% |
| Leonardo_da_Vinci | 59.6% |
| Marc_Chagall | 35.9% |
| Michelangelo | 40.6% |
| Mikhail_Vrubel | 41.4% |
| Pablo_Picasso | 28.1% |
| Paul_Cezanne | 39.3% |
| Paul_Gauguin | 25.9% |
| Paul_Klee | 49.1% |
| Peter_Paul_Rubens | 65.7% |
| Pierre-Auguste_Renoir | 50.0% |
| Piet_Mondrian | 23.2% |
| Pieter_Bruegel | 66.2% |
| Raphael | 31.1% |
| Rembrandt | 53.8% |
| Rene_Magritte | 14.1% |
| Salvador_Dali | 42.6% |
| Sandro_Botticelli | 68.7% |
| Titian | 20.0% |
| Vasiliy_Kandinskiy | 22.6% |
| Vincent_van_Gogh | 37.5% |
| William_Turner | 56.7% |
Analyzing our data, we expect the loss curve to decay roughly logarithmically, following the opposite trend of a bell curve. We attribute the lack of a steep initial drop in our graphs to the fact that we are still training and tuning the model. For instance, the learning rate for our optimization algorithm is currently a bit low, which partially contributes to the slight plateau in the loss curve. Additionally, we are still experimenting with different batch sizes, which will certainly affect the shape of the loss curve.
The confusion matrix above depicts our predicted labels vs. the actual label values. The diagonal of the matrix represents the times our model predicted labels correctly, so ideally the values on this diagonal should be high and the values everywhere else should be low, since off-diagonal entries represent incorrect labeling. Overall, the model appears reasonably accurate, though for some classes the number of correct predictions does not stand out clearly from the incorrect predictions for that class. However, there are few classes for which incorrect predictions outnumber correct ones. While the accuracy looks decent, this is something we can work to improve as we adjust our models.
Confusion matrix preview of last batch
| Artist Name | Precision | Recall | F1-Score |
|---|---|---|---|
| Albrecht_Durer | 0.324 | 0.453 | 0.378 |
| Alfred_Sisley | 0.487 | 0.317 | 0.384 |
| Amedeo_Modigliani | 0.220 | 0.467 | 0.299 |
| Andrei_Rublev | 0.522 | 0.222 | 0.312 |
| Andy_Warhol | 0.620 | 0.554 | 0.585 |
| Camille_Pissarro | 0.549 | 0.252 | 0.346 |
| Caravaggio | 0.551 | 0.199 | 0.292 |
| Claude_Monet | 0.478 | 0.180 | 0.262 |
| Diego_Rivera | 0.421 | 0.381 | 0.400 |
| Diego_Velazquez | 0.439 | 0.483 | 0.460 |
| Edgar_Degas | 0.524 | 0.717 | 0.606 |
| Edouard_Manet | 0.317 | 0.333 | 0.325 |
| Edvard_Munch | 0.415 | 0.491 | 0.450 |
| El_Greco | 0.621 | 0.310 | 0.414 |
| Eugene_Delacroix | 0.491 | 0.483 | 0.487 |
| Francisco_Goya | 0.545 | 0.240 | 0.333 |
| Frida_Kahlo | 0.233 | 0.119 | 0.157 |
| Georges_Seurat | 0.364 | 0.483 | 0.415 |
| Giotto_di_Bondone | 0.487 | 0.638 | 0.552 |
| Gustav_Klimt | 0.380 | 0.306 | 0.339 |
| Gustave_Courbet | 0.317 | 0.400 | 0.354 |
| Henri_Matisse | 0.372 | 0.558 | 0.446 |
| Henri_Rousseau | 0.400 | 0.149 | 0.217 |
| Henri_de_Toulouse-Lautrec | 0.284 | 0.300 | 0.292 |
| Hieronymus_Bosch | 0.463 | 0.517 | 0.488 |
| Jackson_Pollock | 0.627 | 0.583 | 0.604 |
| Jan_van_Eyck | 0.846 | 0.579 | 0.688 |
| Joan_Miro | 0.175 | 0.200 | 0.187 |
| Kazimir_Malevich | 0.508 | 0.638 | 0.566 |
| Leonardo_da_Vinci | 0.413 | 0.579 | 0.482 |
| Marc_Chagall | 0.368 | 0.200 | 0.259 |
| Michelangelo | 0.333 | 0.625 | 0.435 |
| Mikhail_Vrubel | 0.272 | 0.523 | 0.358 |
| Pablo_Picasso | 0.500 | 0.143 | 0.222 |
| Paul_Cezanne | 0.538 | 0.100 | 0.169 |
| Paul_Gauguin | 0.204 | 0.404 | 0.271 |
| Paul_Klee | 0.345 | 0.322 | 0.333 |
| Peter_Paul_Rubens | 0.315 | 0.258 | 0.283 |
| Pierre-Auguste_Renoir | 0.381 | 0.381 | 0.381 |
| Piet_Mondrian | 0.211 | 0.127 | 0.158 |
| Pieter_Bruegel | 0.444 | 0.475 | 0.459 |
| Raphael | 0.218 | 0.395 | 0.281 |
| Rembrandt | 0.708 | 0.557 | 0.624 |
| Rene_Magritte | 0.216 | 0.439 | 0.290 |
| Salvador_Dali | 0.621 | 0.554 | 0.585 |
| Sandro_Botticelli | 0.438 | 0.813 | 0.569 |
| Titian | 0.260 | 0.297 | 0.277 |
| Vasiliy_Kandinskiy | 0.313 | 0.172 | 0.222 |
| Vincent_van_Gogh | 0.094 | 0.167 | 0.120 |
| William_Turner | 0.355 | 0.415 | 0.383 |
| Accuracy | 0.380 | 0.380 | 0.380 |
| Macro avg | 0.411 | 0.390 | 0.376 |
| Weighted avg | 0.416 | 0.380 | 0.371 |
We also reported precision and recall, from which the f1-score is calculated. Precision is the fraction of positive predictions that are correct, TP / (TP + FP). Recall is the number of positive cases the classifier predicts correctly divided by the total number of positive cases, TP / (TP + FN). The f1-score is the harmonic mean of the two: f1 = 2 · precision · recall / (precision + recall). Support tells us how many occurrences of each artist there are in the dataset, which is useful when determining whether frequency correlates with accuracy. There are also other measures like specificity; however, for our model, these are less relevant. An f1-score around 0.7 or higher is commonly considered good. From our collected data, our f1-scores are in line with our precision values: a weighted-average f1 of roughly 0.37 against a weighted-average precision of roughly 0.42. This is reasonable for a first iteration at the midterm, and we plan to reach at least about 0.7 by the time we finish training our model. A sketch of this computation follows.
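These per-class metrics can be produced in one call with scikit-learn's classification_report; the arrays and the truncated name list below are hypothetical placeholders for our evaluation outputs.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted class indices, plus class names
# (truncated here for brevity; ours has 50 artist names).
artist_names = ["Albrecht_Durer", "Alfred_Sisley"]
all_labels = [0, 0, 1, 1]
all_preds  = [0, 1, 1, 1]

# Prints precision, recall, f1-score, and support per class, plus
# accuracy, macro-average, and weighted-average rows as in the table above.
print(classification_report(all_labels, all_preds, target_names=artist_names))
```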
All in all, our metrics are encouraging for a first iteration, which motivates us to make the model more accurate still.
A major next step for our team is to further tune the hyperparameters of the CNN to achieve lower cross-entropy loss during training. We will also need to address any overfitting once we achieve low training loss. We also plan to implement another supervised learning model, such as k-NN, and compare its performance with our current deep learning solution. In that case, we may need to introduce feature selection or dimensionality reduction, which was not necessary for the CNN; one possible shape for this comparison is sketched below. We hope to generate some interesting visualizations comparing the two models.
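As a possible shape for that planned comparison, the sketch below pairs PCA with k-NN in scikit-learn; the array names, component count, and neighbor count are placeholders to be tuned, not settled choices.

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# PCA reduces the flattened image features to a manageable dimension
# before k-NN, which degrades badly in very high-dimensional spaces.
knn_model = make_pipeline(
    PCA(n_components=100),          # placeholder component count
    KNeighborsClassifier(n_neighbors=5),  # placeholder neighbor count
)

# X_train / X_test would be flattened image arrays, y_train / y_test the
# artist labels (hypothetical names):
# knn_model.fit(X_train, y_train)
# knn_accuracy = knn_model.score(X_test, y_test)
```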
| Name | Contribution |
|---|---|
| Abdullah Ahmed | Loss vs. Epoch |
| Alec Albrecht | CNN training |
| Carlos Hernandez | Data visualization |
| Ankita Somu | Metric analysis |
| Sanjana Srinivasan | Confusion matrix |
Simonyan, K., & Zisserman, A. "Very Deep Convolutional Networks for Large-Scale Image Recognition." (2015).
Falomir, Zoe, et al. "Categorizing Paintings in Art Styles Based on Qualitative Color Descriptors, Quantitative Global Features and Machine Learning (QArt-Learn)." Expert Systems with Applications, vol. 97, 2018, pp. 83–94, https://doi.org/10.1016/j.eswa.2017.11.056.
Blessing, Alexander. "Using Machine Learning for Identification of Art Paintings." (2010).
Yu, Z. "Discerning Art Works through Active Machine Learning." 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 2022, pp. 1002–1006, doi: 10.1109/CVIDLICCEA56201.2022.9824180.
Icaro. "Best Artworks of All Time," Version 1. Kaggle, February 2019. Retrieved October 6, 2023, from https://www.kaggle.com/datasets/ikarus777/best-artworks-of-all-time.