Artwork Categorization

Abdullah Ahmed, Alec Albrecht, Carlos Hernandez, Ankita Somu, Sanjana Srinivasan

View the Project on GitHub

Artwork

Midterm Report

Link to the model codebase on Google Colab

Back


Introduction/Background

With the rise of artwork generated by text-to-image models like DALL-E and Midjourney, it raises an interesting question about influences from source material. Generative art models are trained on large datasets including works from different art movements over time. Our team seeks to develop a solution for classifying an art piece into a stylistic period. This categorization algorithm could be used to identify and retrace the art styles mimicked in generative art. Previous approaches have used k-Nearest Neighbor and support vector machine algorithms to build their models (Falomir, Zoe, et al.).

Problem definition

Given an image our project aims to classify the style of an art piece. Our goal is to create a model that consistently classifies the style of the images in any collection.

Data collection

We are using a collection of artwork on Kaggle called the Best Artworks of All Time to train our model. The dataset identifies stylistic periods for each artist included in the collection. Our model will learn to categorize artwork to its respective artist based on its training data. The model will then be able to report what artist is most likely associated with a new artwork and then a stylistic period can be concluded.

Methods

Preprocessing

The preprocessing for the images from the dataset:

The purpose for the image transformations is to take care of classes in the dataset with a low number of data points. In our case, the artists Eugene Delacroix (31), Georges Seurat (43), Jackson Pollack (24) had low amounts of images in the dataset. These transformations allowed us to generate more data points for our model to train on.

Image Normalization

Original Resized Images Original Resized Images Normalized Resized Images Normalized Resized Images

This is a showcase of how the image normalization changes the images. The histograms shown indicate a change in range from [0, 1] to a range centered around 0. The purpose of this normalization is to make the mean of all images 0 and the standard deviation of all images 1.

Metrics

Explaining our rationale for using certain metrics and how they are relevant to our model.

Loss vs Epoch

Loss is a common measure used to evaluate how well neural networks perform when working on training data. The loss shows the difference between the predicted outputs and the actual label values, which we should be minimizing to create an optimal model. The loss after each epoch should be decreasing throughout the training process. Visualizing and calculating how the loss changes after each epoch will give us a good idea of how successful the model is. Monitoring the loss vs. epoch will allow us to observe how the network is converging, and this value should rapidly decrease, as that would mean the model is adapting well to the training data, while a plateau in the graph may potentially indicate problems such as overfitting. This metric can also help us in improving our model, especially in terms of different hyperparameters (such as batch size or number of epochs), as we can see how changes in these may affect the loss, giving us ideas to fine tune. It will also assist in comparing both our models to see which one learns and converges better.

In a many of the DL frameworks like TensorFlow and PyTorch (used in our case), we optimize cross entropy loss and use that as a measure to see how efficient our implementation is. Because cross entropy is also logarithmic in nature, it gives us a very nuanced and precise way of measuring the accuracy of our model. Even a slight change can lead to significant drops in the deviations from the true distribution, which forces our model to be extremely accurate to get even moderately accurate predictions. This is relevant to our model because when we are analyzing paintings, we aim to reduce misclassification of genres and artists as much as possible. Especially in cases when we are trying to identify fraud, we must be overly particular in what similarity exists between the predicted and true distributions to give us the ground truth value. In essence, we will be working to get our cross entropy value as low as possible.

Confusion matrix

A confusion matrix is one of the most efficient ways to properly visualize how the model is performing for each artist batch specifically. Based on the batch that we want to sample, we can create True Positives, True Negatives, False Positives, and False Negatives categories and determine if the instances of a class are being correct or not, based on if they are in a class. Another reason we chose this is because we can measure the specificity rate, which is the true negative rate. We can thus measure the proportion of actual negative instances correctly predicted, and see if the model is biased toward predicting the majority vs minority classes. Finally, by analyzing TP, TN, FP, and FN, we know a confusion matrix is relevant to our model and the neural network system because it allows us to iteratively change the model to quickly identify misclassifications of identifying artists from paintings, which is not easily done with other metrics such as only looking at True Positives or area under the curve (detrimental for datasets that are imbalanced).

Model Architecture

Using PyTorch’s Neural Network class as a starting point, the architecture used for this dataset follows that proposed by Simonyan and Zisserman. Due to our image size, it consists of only 4 convolution layers each with a convolution, rectified linear unit (ReLU) and maxpool. The maxpool is always of size 2 and stride two, while the convolutions increase in number of filters, doubling from 64 to 512. Three fully connected layers and ReLUs step down the convolution output to the 50 classes in the data. Cross entropy loss is used to determine the loss for each batch and a stochastic gradient descent optimizer, also provided by PyTorch, was used for network backpropagation. A Batch size of 64 was used oer 25 epochs for initial training.

The testing dataset was then fed into the trained model. Since the output of the model is the vector of probabilities that a given image belongs to a particular class, the maximum value is taken as the predicted label. The accuracy of the model as a whole is calculated, as well as class specific accuracies defined as the number of labels correctly predicted out of the total size.

Conv[3-64] MaxPool Conv[3-128] MaxPool Conv[3-256] MaxPool =>
64x64x128x128 64x64x64x64 64x128x64x64 64x128x32x32 64x256x32x32 64x256x16x16
=> Conv[3-512] MaxPool Flatten FC-4096 FC-4096 Fc-50
64x512x16x16 64x512x8x8 64x32768 64x4096 64x4096 64x50

Link to the model codebase on Google Colab

Results and discussion

General Results

So far, our model is quite successful, and is improving with adjustments. Without padding the dataset, the model had an overall accuracy of 19 percent, which is around 10 times better than random guesses, even though the specific accuracies for each class were not very high. After adjusting some hyperparameters such as the number of epochs and the batch sizes, the model improved significantly. 17 of the 50 artists were able to be classified with an accuracy of 50% or higher (all accuracies listed below), most probably the artists that had many images in the dataset. Many of the classes followed close behind, with accuracies in the high forties.

Model Accuracies by Class
Artist Name Percent Accuracy
Albrecht_Durer 21.0%
Alfred_Sisley 34.5%
Amedeo_Modigliani 48.1%
Andrei_Rublev 29.1%
Andy_Warhol 69.2%
Camille_Pissarro 42.0%
Caravaggio 41.6%
Claude_Monet 56.9%
Diego_Rivera 51.1%
Diego_Velazquez 53.5%
Edgar_Degas 65.5%
Edouard_Manet 46.4%
Edvard_Munch 44.4%
El_Greco 42.2%
Eugene_Delacroix 52.5%
Francisco_Goya 37.5%
Frida_Kahlo 12.5%
Georges_Seurat 38.6%
Giotto_di_Bondone 52.6%
Gustav_Klimt 43.1%
Gustave_Courbet 52.7%
Henri_Matisse 45.3%
Henri_Rousseau 29.0%
Henri_de_Toulouse-Lautrec 43.3%
Hieronymus_Bosch 49.1%
Jackson_Pollock 56.9%
Jan_van_Eyck 86.4%
Joan_Miro 23.4%
Kazimir_Malevich 75.0%
Leonardo_da_Vinci 59.6%
Marc_Chagall 35.9%
Michelangelo 40.6%
Mikhail_Vrubel 41.4%
Pablo_Picasso 28.1%
Paul_Cezanne 39.3%
Paul_Gauguin 25.9%
Paul_Klee 49.1%
Peter_Paul_Rubens 65.7%
Pierre-Auguste_Renoir 50.0%
Piet_Mondrian 23.2%
Pieter_Bruegel 66.2%
Raphael 31.1%
Rembrandt 53.8%
Rene_Magritte 14.1%
Salvador_Dali 42.6%
Sandro_Botticelli 68.7%
Titian 20.0%
Vasiliy_Kandinskiy 22.6%
Vincent_van_Gogh 37.5%
William_Turner 56.7%


Loss vs Epoch

Analyzing our data, we know that our data points should be relatively logarithmic in nature and follow the opposite trend of a bell-curve, we attribute the lack of initial rise in our graphs to the fact that we are still working on training our model, and the batch size is still relatively unstable. For instance, the learning rate for our optimization algorithm is currently a bit low, so it is partially attributing to the slight plateau in the loss curve. Additionally, we are still experimenting with different batch sizes which will certainly impact the linearity of the loss curve.

Loss trend across all batches

Confusion Matrix

Click to open zoomable image

The confusion matrix above depicts our predicted labels vs. the actual label values. The diagonal of the matrix represents the times that our model predicted the labels correctly, so ideally, the values on this diagonal should be high, and the values on all other positions in the matrix should be low, as these represent incorrect labeling from our model. Overall, the model appears to have a high accuracy, however there are some classes where the number of correct predictions is not as distinct from incorrect predictions for that class. However, there are rarely any classes for which incorrect predictions outnumber correct predictions. While the accuracy looks quite good, this is something that we can work to improve as we adjust our models, increasing our accuracy.

Original Resized Images Confusion matrix preview of last batch

Precision, Recall and F1-Scores
Artist Name Precision Recall F1-Score
Albrecht_Durer 0.32432432432432434 0.4528301886792453 0.37795275590551186
Alfred_Sisley 0.48717948717948717 0.31666666666666665 0.38383838383838376
Amedeo_Modigliani 0.2204724409448819 0.4666666666666667 0.29946524064171126
Andrei_Rublev 0.5217391304347826 0.2222222222222222 0.3116883116883117
Andy_Warhol 0.62 0.5535714285714286 0.5849056603773586
Camille_Pissarro 0.5490196078431373 0.25225225225225223 0.345679012
Caravaggio 0.5510204081632653 0.19852941176470587 0.29189189189189185
Claude_Monet 0.4782608695652174 0.18032786885245902 0.2619047619047619
Diego_Rivera 0.42105263157894735 0.38095238095238093 0.4
Diego_Velazquez 0.4393939393939394 0.48333333333333334 0.46031746031746035
Edgar_Degas 0.524390244 0.7166666666666667 0.6056338028169014
Edouard_Manet 0.31666666666666665 0.3333333333333333 0.3247863247863248
Edvard_Munch 0.4153846153846154 0.4909090909090909 0.45
El_Greco 0.6206896551724138 0.3103448275862069 0.41379310344827586
Eugene_Delacroix 0.49122807017543857 0.4827586206896552 0.4869565217391304
Francisco_Goya 0.5454545454545454 0.24 0.3333333333333333
Frida_Kahlo 0.23333333333333334 0.11864406779661017 0.15730337078651685
Georges_Seurat 0.36363636363636365 0.4827586206896552 0.4148148148148148
Giotto_di_Bondone 0.4868421052631579 0.6379310344827587 0.5522388059701493
Gustav_Klimt 0.38 0.3064516129032258 0.33928571428571425
Gustave_Courbet 0.31746031746031744 0.4 0.35398230088495575
Henri_Matisse 0.3717948717948718 0.5576923076923077 0.4461538461538461
Henri_Rousseau 0.4 0.14925373134328357 0.21739130434782608
Henri_de_Toulouse-Lautrec 0.28378378378378377 0.3 0.29166666666666663
Hieronymus_Bosch 0.4626865671641791 0.5166666666666667 0.4881889763779528
Jackson_Pollock 0.6268656716417911 0.5833333333333334 0.604316547
Jan_van_Eyck 0.8461538461538461 0.5789473684210527 0.6875
Joan_Miro 0.175 0.2 0.18666666666666665
Kazimir_Malevich 0.5084745762711864 0.6382978723404256 0.5660377358490567
Leonardo_da_Vinci 0.4125 0.5789473684210527 0.4817518248175182
Marc_Chagall 0.3684210526315789 0.2 0.25925925925925924
Michelangelo 0.3333333333333333 0.625 0.43478260869565216
Mikhail_Vrubel 0.272 0.5230769230769231 0.3578947368421053
Pablo_Picasso 0.5 0.14285714285714285 0.22222222222222224
Paul_Cezanne 0.5384615384615384 0.1 0.16867469879518074
Paul_Gauguin 0.20353982300884957 0.40350877192982454 0.27058823529411763
Paul_Klee 0.34545454545454546 0.3220338983050847 0.33333333333333326
Peter_Paul_Rubens 0.3148148148148148 0.25757575757575757 0.2833333333333333
Pierre-Auguste_Renoir 0.38095238095238093 0.38095238095238093 0.38095238095238093
Piet_Mondrian 0.21052631578947367 0.12698412698412698 0.15841584158415842
Pieter_Bruegel 0.4444444444444444 0.4745762711864407 0.4590163934426229
Raphael 0.21794871794871795 0.3953488372093023 0.2809917355371901
Rembrandt 0.7083333333333334 0.5573770491803278 0.6238532110091743
Rene_Magritte 0.21641791044776118 0.4393939393939394 0.29
Salvador_Dali 0.6206896551724138 0.5538461538461539 0.5853658536585366
Sandro_Botticelli 0.43820224719101125 0.8125 0.5693430656934307
Titian 0.2602739726027397 0.296875 0.2773722627737227
Vasiliy_Kandinskiy 0.3125 0.1724137931034483 0.22222222222222224
Vincent_van_Gogh 0.094339623 0.16666666666666666 0.12048192771084337
William_Turner 0.35526315789473684 0.4153846153846154 0.3829787234042554
accuracy 0.3796234028244788 0.3796234028244788 0.3796234028244788
macro avg 0.4106144987762023 0.3899332054177764 0.37601054370366105
weighted avg 0.4159938498220489 0.3796234028244788 0.371166493


We also reported metrics like precision and recall to calculate the f1-score. Precision is a measure of how many of the positive predictions are correct, in other words, how many TPs there are. On the other hand, recall is the number of positive cases the classifier predicts correctly divided by the total number of positive cases. Support helps us visualize how many occurrences of each artist there are in the dataset, which is useful when determining if frequency correlates to accuracy in some way. There are also other measures like specificity and sensitivity. However, for our model, these metrics are not as relevant. It is commonly known and reported that an f1-score close to 0.7 or higher is a good f1-score, which is calculated from the precision and recall values. From our collected data, after doing these calculations, it is apparent that our f1-score is in line with our precision, which indicates a consistent accuracy of about 50%, which translates to an average f1-score of about 0.4. This is relatively good for the first iteration for this midterm, and we plan to reach at least about 0.7 by the time we finish training our model.

All in all, our metrics are certainly optimal and highly accurate, which is what is encouraging us to make our model even more accurate despite it being accurate already.

Next Steps

A major next step for our team is to further tune the hyperparameters of the CNN to achieve lower cross-entropy loss values during the training. We will also need to treat any overfitting when we are able to achieve low training loss. Another plan our team has is to likely implement another supervised learning model like k-NN and compare the performance with our current deep learning solution. It is possible we may need to introduce some feature selection or dimensionality reduction in this case as for CNNs it was not necessary. We hope to generate some interesting visualizations on the comparison of these two models.

Updated timeline

Link to Gantt chart

Contribution table

Name Contribution
Abdullah Ahmed Loss vs. Epoch
Alec Albrecht CNN training
Carlos Hernandez Data visualization
Ankita Somu Metric analysis
Sanjana Srinivasan Confusion matrix

References

Very Deep Convolutional Networks for Large-Scale Image Recognition
Simonyan, K., & Zisserman, A. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” (2015).

Categorizing Paintings in Art Styles Based on Qualitative Color Descriptors, Quantitative Global Features and Machine Learning (QArt-Learn)
Falomir, Zoe, et al. “Categorizing Paintings in Art Styles Based on Qualitative Color Descriptors, Quantitative Global Features and Machine Learning (QArt-Learn).” Expert Systems with Applications, vol. 97, 2018, pp. 83–94, https://doi.org/10.1016/j.eswa.2017.11.056.

Using Machine Learning for Identification of Art Paintings
Blessing, Alexander. “Using Machine Learning for Identification of Art Paintings.” (2010).

Discerning Art Works through Active Machine Learning
Z. Yu, “Discerning Art Works through Active Machine Learning,” 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 2022, pp. 1002-1006, doi: 10.1109/CVIDLICCEA56201.2022.9824180.

Best Artworks of All Time
Icaro (2019, February). Best Artworks of All Time, Version 1. Retrieved October 6, 2023 from https://www.kaggle.com/datasets/ikarus777/best-artworks-of-all-time.