Abdullah Ahmed, Alec Albrecht, Carlos Hernandez, Ankita Somu, Sanjana Srinivasan
Link to the model codebase on Google Colab
The rise of artwork generated by text-to-image models like DALL-E and Midjourney raises an interesting question about influences from source material. Generative art models are trained on large datasets that include works from different art movements over time. Our team seeks to develop a solution for classifying an art piece into a stylistic period. This categorization algorithm could be used to identify and retrace the art styles mimicked in generative art. Previous approaches have used k-Nearest Neighbor and support vector machine algorithms to build their models (Falomir et al., 2018).
Given an image, our project aims to classify the style of an art piece. Our goal is to create a machine learning model that consistently classifies the style of the images in any collection. The project is motivated by the rise of generative art produced by text-to-image models that can closely mimic the nuance of diverse art styles. These forged images can be fraudulently shared as authentic and original pieces of art. To preserve the integrity of historical artwork, our trained models will be able to identify which images have consistent ties back to a notable art style. In this way, we will be able to differentiate between unique, original pieces of art and images created through generative models based on historical artworks. In their research, Saleh et al. (2016) discuss identifying artistic influence between two historical artists. What is new in our approach is using machine learning to identify influence between generative art and a historical artist.
We are using a collection of artwork on Kaggle called Best Artworks of All Time to train our model. The dataset includes approximately 16,800 images from 50 notable artists. The provided CSV file details more information on each artist: name, year range, genre (style), nationality, a short bio, and the number of paintings included in the dataset for that artist. Our model will learn to categorize artwork by its respective artist based on its training data. The model will then be able to report which artist is most likely associated with a new artwork, from which a stylistic period can be concluded.
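As a quick illustration, here is a minimal sketch of inspecting that metadata with pandas. The file name `artists.csv` and the `name`, `genre`, and `paintings` column names are assumptions about the Kaggle dataset's layout:

```python
import pandas as pd

# Load the per-artist metadata; "artists.csv" and the column names below
# are assumptions about how the Kaggle dataset is laid out.
artists = pd.read_csv("artists.csv")

# Inspect class balance: paintings available per artist, largest first.
print(artists[["name", "genre", "paintings"]]
      .sort_values("paintings", ascending=False)
      .head(10))
```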
Original images from the Best Artworks of All Time dataset
Preprocessing was broken into two stages: the first at import, and the second performed by PyTorch transforms during training. The 50-class classical artwork dataset is very imbalanced and has images of varying sizes. Initial preprocessing, conducted using the Python PIL library, involved a minimum crop followed by a resize. The minimum crop reduced the larger of the height and width to the size of the other dimension, forming a square. Then, all images were resized to match the smallest in the dataset: 204x204. This was used instead of a random crop for consistency between trained model architectures.
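A minimal sketch of this import-time step with PIL; the function name and the centered placement of the crop are our assumptions, as only the min-crop and the 204x204 resize are specified above:

```python
from PIL import Image

def min_crop_resize(path, size=204):
    """Crop the longer dimension down to the shorter one, then resize."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2   # centered square crop (assumed)
    img = img.crop((left, top, left + side, top + side))
    return img.resize((size, size))                 # uniform 204x204 output
```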
Prior to training, the imported dataset of identically sized images was balanced using geometric transformations such as flips and rotations. One class in the dataset featured 877 unique images, while another had only 24. Using vertical and horizontal flips and 90-degree rotations, classes with fewer images could reach 6x their original count without eliminating information through random crops or image augmentations. Finally, the dataset was imported using PyTorch's Dataset and DataLoader classes for use with the training model. A native PyTorch transform normalized the images to a mean of 0, which is analogous to subtracting the mean value from each pixel in the image. The dataset was split into 85% training and 15% testing images, and labels were converted from strings to a numeric index.
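A hedged sketch of the balancing and loading steps follows. The exact transform combinations and tensor names are assumptions; the 6x factor corresponds to the original image plus two flips and three rotations:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split
import torchvision.transforms.functional as TF

def augment(img):
    """Six geometric variants: original, flips, and 90-degree rotations.
    Applied to small classes to balance the dataset (combination assumed)."""
    return [img, TF.hflip(img), TF.vflip(img),
            TF.rotate(img, 90), TF.rotate(img, 180), TF.rotate(img, 270)]

# `images` (N, 3, 204, 204) in [0, 1] and `labels` (N,) are stand-ins
# for the imported dataset.
images = torch.rand(100, 3, 204, 204)
labels = torch.randint(0, 50, (100,))
dataset = TensorDataset(images, labels)

# 85% / 15% train/test split, then batched loading for training.
n_train = int(0.85 * len(dataset))
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=128)
```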
The purpose of the image transformations is to compensate for classes in the dataset with few data points. In our case, the artists Eugene Delacroix (31), Georges Seurat (43), and Jackson Pollock (24) had low image counts in the dataset. These transformations allowed us to generate more data points for our model to train on.
Image min-crop preprocessing; the upscaled version is shown only for visualization and was not used in the model
This showcases how the image normalization changes the images. The histograms indicate a change in range from [0, 1] to a range centered around 0. The purpose of this normalization is to give all images a mean of 0 and a standard deviation of 1.
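A small demonstration of the effect; the mean/std values here are illustrative stand-ins, whereas in practice the statistics come from the training set:

```python
import torch
from torchvision import transforms

x = torch.rand(3, 204, 204)  # stand-in image with values in [0, 1]

# Illustrative per-channel statistics (assumed, not the dataset's own).
normalize = transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.25, 0.25, 0.25])
y = normalize(x)

print(x.mean().item())  # ~0.5: raw pixels sit in [0, 1]
print(y.mean().item())  # ~0.0: normalization centers the distribution
```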
Below, we explain our rationale for using certain metrics and how they are relevant to our model.
Loss vs Epoch
Loss is a common measure used to evaluate how well neural networks perform on training data. The loss quantifies the difference between the predicted outputs and the actual label values, which we should be minimizing to create an optimal model; it should decrease after each epoch throughout the training process. Visualizing and calculating how the loss changes after each epoch gives us a good idea of how successful the model is. Monitoring loss vs. epoch lets us observe how the network is converging: the loss should rapidly decrease if the model is adapting well to the training data, while a premature plateau may indicate problems such as underfitting or a poorly tuned learning rate. This metric can also help us improve the model, especially in terms of hyperparameters (such as batch size or number of epochs), since we can see how changes to them affect the loss and fine-tune accordingly. It will also assist in comparing both our models to see which one learns and converges better.
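A minimal sketch of how this curve is tracked, reusing the `train_loader` from the preprocessing sketch; the stand-in model and the Adam optimizer are assumptions, since the report does not pin down the optimizer:

```python
import torch.nn as nn
import torch.optim as optim

# Stand-in model so the sketch runs; replaced by the CNNs described below.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 204 * 204, 50))

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # optimizer choice assumed

loss_history = []
for epoch in range(50):
    running, batches = 0.0, 0
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        running += loss.item()
        batches += 1
    loss_history.append(running / batches)  # one point on the loss-vs-epoch curve
```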
In many deep learning frameworks such as TensorFlow and PyTorch (used in our case), we optimize cross entropy loss and use it as a measure of how effective our implementation is. Because cross entropy is logarithmic in nature, it gives us a nuanced and precise way of measuring the quality of our model's predicted distribution: a confidently wrong prediction is penalized far more heavily than an unsure one, which forces our model to produce accurate, well-calibrated predictions. This is relevant to our model because, when analyzing paintings, we aim to reduce misclassification of genres and artists as much as possible. Especially in cases where we are trying to identify fraud, we must be particular about the similarity between the predicted and true distributions. In essence, we will be working to get our cross entropy value as low as possible.
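A tiny example of this logarithmic penalty with PyTorch's cross entropy:

```python
import torch
import torch.nn.functional as F

target = torch.tensor([1])                        # true class is 1
confident_wrong = torch.tensor([[5.0, 0.0, 0.0]]) # confidently predicts class 0
nearly_uniform  = torch.tensor([[1.0, 0.9, 0.8]]) # unsure between classes

print(F.cross_entropy(confident_wrong, target).item())  # ~5.01, heavily penalized
print(F.cross_entropy(nearly_uniform, target).item())   # ~1.10, mild penalty
```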
Confusion matrix
A confusion matrix is one of the most effective ways to visualize how the model performs for each artist class specifically. For any class we want to examine, we can count True Positives, True Negatives, False Positives, and False Negatives and determine whether instances of that class are being predicted correctly. Another reason we chose this is that we can measure the specificity, or true negative rate: the proportion of actual negative instances correctly predicted, which reveals whether the model is biased toward predicting majority over minority classes. Finally, by analyzing TP, TN, FP, and FN, a confusion matrix lets us iteratively change the model to quickly identify misclassifications of artists, which is not easily done with other metrics such as looking only at True Positives or at the area under the curve (which can be misleading for imbalanced datasets).
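A sketch of how the matrix and the derived specificity can be computed, assuming the `model` and `test_loader` names from the earlier sketches:

```python
import torch

NUM_CLASSES = 50
cm = torch.zeros(NUM_CLASSES, NUM_CLASSES, dtype=torch.long)

model.eval()
with torch.no_grad():
    for x, y in test_loader:
        preds = model(x).argmax(dim=1)
        for t, p in zip(y, preds):
            cm[t, p] += 1  # rows: true artist, columns: predicted artist

tp = cm.diag()
fp = cm.sum(dim=0) - tp
fn = cm.sum(dim=1) - tp
tn = cm.sum() - tp - fp - fn
specificity = tn.float() / (tn + fp).float()  # true negative rate per artist
```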
Three convolutional neural networks of varying complexities were implemented. First, a very simple architecture of four convolutional layers and two fully connected layers was used to validate the preprocessing methods and develop a code structure for training and evaluating the models. All convolutions used a kernel size of 3x3 without padding. The first layer transformed from 3 to 32 channels, and the second maintained 32 channels and was followed by a 2x2 maximum pooling layer. These three layers were repeated with 32 to 64 channels before being flattened and fed into two fully connected layers. This model was not particularly accurate but was especially helpful for determining layer size matching.
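A sketch of this baseline follows. The conv/pool layout matches the description above, while the ReLU activations and the hidden width of the first fully connected layer are assumptions:

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Four 3x3 convolutions without padding, pooled in pairs, then two FC
    layers. For 204x204 inputs: 204 -> 202 -> 200 -> 100 -> 98 -> 96 -> 48."""
    def __init__(self, num_classes=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3), nn.ReLU(),
            nn.Conv2d(32, 32, 3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 48 * 48, 256), nn.ReLU(),  # hidden width assumed
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```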
Next, a VGG architecture was implemented using the A configuration in Table 1 of Simonyan and Zisserman. This architecture used several 3x3 convolutional layers with padding, each feeding into a rectified linear activation function, taking the input from 3 channels to 512. The output of the final convolutional layer was fed into a series of three reducing fully connected layers before entering a softmax activation for the final output. Training for this model used a batch size of 128, a learning rate of 0.001, and 50 training epochs. Due to the limitations of Google Colab compute resources, cross validation could not be used to optimize these hyperparameters because Colab would time out the instance. Instead, the batch size and number of epochs were adjusted by hand to reach a reasonable result.
Table I: VGG Implementation Table
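Since torchvision's `vgg11` implements exactly configuration A, a minimal sketch of instantiating it needs only the output layer changed for our 50 artists:

```python
from torchvision import models

# VGG configuration A (VGG-11): eight padded 3x3 conv layers from 3 to 512
# channels, then three fully connected layers ending in our 50 classes.
vgg = models.vgg11(num_classes=50)
```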
Finally, a ResNet-18 model as described by He et al. was trained, which uses a novel residual block method that implements shortcuts between layers. Convolutional layers 2 through 5 each use this idea by summing the outputs of two convolutional layers with the input to the first one, with the goal of taking advantage of the improvements from additional layers without overfitting or vanishing gradients. Each convolution feeds into a batch normalization before its activation function, and an average pool was used before the final fully connected classification layer. This architecture was trained with a batch size of 128 over 50 epochs, but with the learning rate increased to 0.01. The original paper reduced the learning rate on plateaus and used considerably more training epochs, but due to the limitations of Google Colab, the learning rate was kept high because no plateau was observed.
Table II: ResNet Implementation Table
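A sketch of the basic residual block this architecture is built from. The identity shortcut is shown; downsampling blocks in He et al. use a strided 1x1 convolution on the shortcut instead:

```python
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Two 3x3 convolutions, each batch-normalized, with the block input
    summed back in before the final activation."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the shortcut connection
```

In practice, `torchvision.models.resnet18(num_classes=50)` provides the full 18-layer assembly of such blocks.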
A ResNet-50 and a more complicated Inception architecture were also attempted, but both had considerably larger runtimes that could not be completed on a single free Google Colab instance. Instead, simpler versions were used.
Figure I: VGG Example Architecture
Figure II: ResNet-18 (Source)
Link to the model codebase on Google Colab
Accuracy over Epoch for the ResNet-18 model
To demonstrate the quality of our residual neural network model, we chose to monitor accuracy over epochs. This allows us to understand how well the model is learning, identify problems such as overfitting and fluctuations, and know when to stop training. Of the two lines plotted above, the testing accuracy indicates how well the model generalizes to data it has not yet seen. Ideally, the training and testing accuracies increase together, which our chart shows, indicating the model's success. However, it is also common to see a plateau: our testing accuracy plateaus slightly relative to the training accuracy, indicating a chance of overfitting. Because of this plateau, it is safe to assume that the model has learned most of the patterns in the training data, and the accuracies are unlikely to improve significantly from here onward, specifically after the epoch 41 mark. With the final accuracy reaching about 40%, we learned that our previous model performs better, despite both being supervised machine learning models.
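A minimal sketch of the per-epoch accuracy computation behind these curves, reusing the assumed `model` and loader names from the earlier sketches:

```python
import torch

@torch.no_grad()
def accuracy(model, loader):
    """Fraction of correctly classified images over a DataLoader."""
    model.eval()
    correct = total = 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# Called on both train_loader and test_loader after each training epoch
# to produce the two curves plotted above.
```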
Loss over Epoch for the ResNet-18 model
One method we chose to evaluate the success of the residual neural network model is the loss value and how it changes as each training epoch progresses. The loss is a very useful measure of the network's success, as it shows the difference between the predicted outputs and the actual label values, which we should be minimizing to create an optimal model. This means we want the loss to decrease as the epochs continue. In the graph above, we plotted both the training loss and testing loss of our network. Ideally, for a successful network, the training loss should steadily decrease, showing that the model is fitting the training data well and learning successfully, and the testing loss should also decrease as the model generalizes to unseen data. Based on this metric, our model performs quite well: both measures decrease and remain reasonably close to each other in value. Our testing loss continues to decrease through each epoch, so our model does not show signs of overfitting; however, the two curves do not converge, so additional changes may be needed to improve performance. There is also room for improvement, as the testing loss is currently higher than the training loss, which ideally should not be the case.
Above we show the loss trend across all batches for our previous model (VGG) and our current model (ResNet). As the first graph shows, there is a high initial loss at the start of training, reflecting the random weight initialization. We then see a decaying trend over the epochs as the weights are updated by backpropagation: the loss decreases steadily as the model learns and fits the training data over time. After about batch 6000, the curve starts to plateau and converge to a relatively low value, showing stabilization. In the second plot, the curve is flat compared to the previous model. Although this does not inherently say much about performance itself, we can draw some conclusions: the steeper loss curve on the left shows a model that learns quickly and rapidly improves its fit to the training data, while the flatter loss curve on the right shows a model that converges slowly because the data is more complex and relatively harder to fit. The model on the right is therefore learning more slowly and steadily, converging without overfitting.
Overall, we want to minimize the loss as much as possible, since the model is most successful when the difference between actual and predicted values is smallest. While the ResNet performs reasonably well, its lowest loss value only reaches around 3.4, whereas the VGG model's lowest loss value reaches well below 0.5, a much better performance. The ResNet clearly shows room for improvement, which may be accomplished by adjusting its hyperparameters.
Per-artist accuracy for the VGG model:

Artist Name | Percent Accuracy |
---|---|
Albrecht_Durer | 21.0% |
Alfred_Sisley | 34.5% |
Amedeo_Modigliani | 48.1% |
Andrei_Rublev | 29.1% |
Andy_Warhol | 69.2% |
Camille_Pissarro | 42.0% |
Caravaggio | 41.6% |
Claude_Monet | 56.9% |
Diego_Rivera | 51.1% |
Diego_Velazquez | 53.5% |
Edgar_Degas | 65.5% |
Edouard_Manet | 46.4% |
Edvard_Munch | 44.4% |
El_Greco | 42.2% |
Eugene_Delacroix | 52.5% |
Francisco_Goya | 37.5% |
Frida_Kahlo | 12.5% |
Georges_Seurat | 38.6% |
Giotto_di_Bondone | 52.6% |
Gustav_Klimt | 43.1% |
Gustave_Courbet | 52.7% |
Henri_Matisse | 45.3% |
Henri_Rousseau | 29.0% |
Henri_de_Toulouse-Lautrec | 43.3% |
Hieronymus_Bosch | 49.1% |
Jackson_Pollock | 56.9% |
Jan_van_Eyck | 86.4% |
Joan_Miro | 23.4% |
Kazimir_Malevich | 75.0% |
Leonardo_da_Vinci | 59.6% |
Marc_Chagall | 35.9% |
Michelangelo | 40.6% |
Mikhail_Vrubel | 41.4% |
Pablo_Picasso | 28.1% |
Paul_Cezanne | 39.3% |
Paul_Gauguin | 25.9% |
Paul_Klee | 49.1% |
Peter_Paul_Rubens | 65.7% |
Pierre-Auguste_Renoir | 50.0% |
Piet_Mondrian | 23.2% |
Pieter_Bruegel | 66.2% |
Raphael | 31.1% |
Rembrandt | 53.8% |
Rene_Magritte | 14.1% |
Salvador_Dali | 42.6% |
Sandro_Botticelli | 68.7% |
Titian | 20.0% |
Vasiliy_Kandinskiy | 22.6% |
Vincent_van_Gogh | 37.5% |
William_Turner | 56.7% |
Per-artist accuracy for the ResNet-18 model:

Artist Name | Percent Accuracy |
---|---|
Albrecht_Durer | 81.5% |
Alfred_Sisley | 77.2% |
Amedeo_Modigliani | 0.0% |
Andrei_Rublev | 51.6% |
Andy_Warhol | 0.0% |
Camille_Pissarro | 51.9% |
Caravaggio | 0.0% |
Claude_Monet | 62.1% |
Diego_Rivera | 0.0% |
Diego_Velazquez | 70.0% |
Edgar_Degas | 75.0% |
Edouard_Manet | 57.7% |
Edvard_Munch | 0.0% |
El_Greco | 89.1% |
Eugene_Delacroix | 65.5% |
Francisco_Goya | 0.0% |
Frida_Kahlo | 0.0% |
Georges_Seurat | 0.0% |
Giotto_di_Bondone | 0.0% |
Gustav_Klimt | 0.0% |
Gustave_Courbet | 83.3% |
Henri_Matisse | 0.0% |
Henri_Rousseau | 0.0% |
Henri_de_Toulouse-Lautrec | 0.0% |
Hieronymus_Bosch | 0.0% |
Jackson_Pollock | 0.0% |
Jan_van_Eyck | 0.0% |
Joan_Miro | 0.0% |
Kazimir_Malevich | 73.1% |
Leonardo_da_Vinci | 0.0% |
Marc_Chagall | 0.0% |
Michelangelo | 0.0% |
Mikhail_Vrubel | 43.1% |
Pablo_Picasso | 83.3% |
Paul_Cezanne | 0.0% |
Paul_Gauguin | 82.4% |
Paul_Klee | 0.0% |
Peter_Paul_Rubens | 0.0% |
Pierre-Auguste_Renoir | 0.0% |
Piet_Mondrian | 0.0% |
Pieter_Bruegel | 52.4% |
Raphael | 0.0% |
Rembrandt | 83.9% |
Rene_Magritte | 75.4% |
Salvador_Dali | 0.0% |
Sandro_Botticelli | 0.0% |
Titian | 67.2% |
Vasiliy_Kandinskiy | 53.1% |
Vincent_van_Gogh | 70.8% |
William_Turner | 0.0% |
Now we can compare our models' performance using the most stringent metric we have, the confusion matrix. As shown above, the matrices plot predicted labels against actual label values. The diagonal entries represent the number of times the model predicted a label correctly, meaning that in the best case the values along the diagonal should be as high as possible, while the remaining values should be low since they represent misclassifications. For our previous model on the left, there are rarely any classes for which incorrect predictions outnumber correct predictions and the accuracy is quite high, unlike the model on the right, the second supervised model we chose to implement. This could be due to a variety of reasons, but it is apparent that the VGG model's tailored architecture captures intricate artistic nuances and features better than the latter model.
Per-artist precision, recall, and F1-score for the VGG model (values rounded to three decimals):

Artist Name | Precision | Recall | F1-Score |
---|---|---|---|
Albrecht_Durer | 0.324 | 0.453 | 0.378 |
Alfred_Sisley | 0.487 | 0.317 | 0.384 |
Amedeo_Modigliani | 0.220 | 0.467 | 0.299 |
Andrei_Rublev | 0.522 | 0.222 | 0.312 |
Andy_Warhol | 0.620 | 0.554 | 0.585 |
Camille_Pissarro | 0.549 | 0.252 | 0.346 |
Caravaggio | 0.551 | 0.199 | 0.292 |
Claude_Monet | 0.478 | 0.180 | 0.262 |
Diego_Rivera | 0.421 | 0.381 | 0.400 |
Diego_Velazquez | 0.439 | 0.483 | 0.460 |
Edgar_Degas | 0.524 | 0.717 | 0.606 |
Edouard_Manet | 0.317 | 0.333 | 0.325 |
Edvard_Munch | 0.415 | 0.491 | 0.450 |
El_Greco | 0.621 | 0.310 | 0.414 |
Eugene_Delacroix | 0.491 | 0.483 | 0.487 |
Francisco_Goya | 0.545 | 0.240 | 0.333 |
Frida_Kahlo | 0.233 | 0.119 | 0.157 |
Georges_Seurat | 0.364 | 0.483 | 0.415 |
Giotto_di_Bondone | 0.487 | 0.638 | 0.552 |
Gustav_Klimt | 0.380 | 0.306 | 0.339 |
Gustave_Courbet | 0.317 | 0.400 | 0.354 |
Henri_Matisse | 0.372 | 0.558 | 0.446 |
Henri_Rousseau | 0.400 | 0.149 | 0.217 |
Henri_de_Toulouse-Lautrec | 0.284 | 0.300 | 0.292 |
Hieronymus_Bosch | 0.463 | 0.517 | 0.488 |
Jackson_Pollock | 0.627 | 0.583 | 0.604 |
Jan_van_Eyck | 0.846 | 0.579 | 0.688 |
Joan_Miro | 0.175 | 0.200 | 0.187 |
Kazimir_Malevich | 0.508 | 0.638 | 0.566 |
Leonardo_da_Vinci | 0.413 | 0.579 | 0.482 |
Marc_Chagall | 0.368 | 0.200 | 0.259 |
Michelangelo | 0.333 | 0.625 | 0.435 |
Mikhail_Vrubel | 0.272 | 0.523 | 0.358 |
Pablo_Picasso | 0.500 | 0.143 | 0.222 |
Paul_Cezanne | 0.538 | 0.100 | 0.169 |
Paul_Gauguin | 0.204 | 0.404 | 0.271 |
Paul_Klee | 0.345 | 0.322 | 0.333 |
Peter_Paul_Rubens | 0.315 | 0.258 | 0.283 |
Pierre-Auguste_Renoir | 0.381 | 0.381 | 0.381 |
Piet_Mondrian | 0.211 | 0.127 | 0.158 |
Pieter_Bruegel | 0.444 | 0.475 | 0.459 |
Raphael | 0.218 | 0.395 | 0.281 |
Rembrandt | 0.708 | 0.557 | 0.624 |
Rene_Magritte | 0.216 | 0.439 | 0.290 |
Salvador_Dali | 0.621 | 0.554 | 0.585 |
Sandro_Botticelli | 0.438 | 0.813 | 0.569 |
Titian | 0.260 | 0.297 | 0.277 |
Vasiliy_Kandinskiy | 0.313 | 0.172 | 0.222 |
Vincent_van_Gogh | 0.094 | 0.167 | 0.120 |
William_Turner | 0.355 | 0.415 | 0.383 |
accuracy | 0.380 | 0.380 | 0.380 |
macro avg | 0.411 | 0.390 | 0.376 |
weighted avg | 0.416 | 0.380 | 0.371 |
Per-artist precision, recall, and F1-score for the ResNet-18 model (values rounded to three decimals):

Artist Name | Precision | Recall | F1-Score |
---|---|---|---|
Albrecht_Durer | 0.541 | 0.815 | 0.650 |
Alfred_Sisley | 0.352 | 0.772 | 0.484 |
Amedeo_Modigliani | 0.000 | 0.000 | 0.000 |
Andrei_Rublev | 0.230 | 0.516 | 0.319 |
Andy_Warhol | 0.000 | 0.000 | 0.000 |
Camille_Pissarro | 0.397 | 0.519 | 0.450 |
Caravaggio | 0.000 | 0.000 | 0.000 |
Claude_Monet | 0.250 | 0.621 | 0.356 |
Diego_Rivera | 0.000 | 0.000 | 0.000 |
Diego_Velazquez | 0.350 | 0.700 | 0.467 |
Edgar_Degas | 0.328 | 0.750 | 0.456 |
Edouard_Manet | 0.261 | 0.577 | 0.359 |
Edvard_Munch | 0.000 | 0.000 | 0.000 |
El_Greco | 0.477 | 0.891 | 0.621 |
Eugene_Delacroix | 0.208 | 0.655 | 0.316 |
Francisco_Goya | 0.000 | 0.000 | 0.000 |
Frida_Kahlo | 0.000 | 0.000 | 0.000 |
Georges_Seurat | 0.000 | 0.000 | 0.000 |
Giotto_di_Bondone | 0.000 | 0.000 | 0.000 |
Gustav_Klimt | 0.000 | 0.000 | 0.000 |
Gustave_Courbet | 0.360 | 0.833 | 0.503 |
Henri_Matisse | 0.000 | 0.000 | 0.000 |
Henri_Rousseau | 0.000 | 0.000 | 0.000 |
Henri_de_Toulouse-Lautrec | 0.000 | 0.000 | 0.000 |
Hieronymus_Bosch | 0.000 | 0.000 | 0.000 |
Jackson_Pollock | 0.000 | 0.000 | 0.000 |
Jan_van_Eyck | 0.000 | 0.000 | 0.000 |
Joan_Miro | 0.000 | 0.000 | 0.000 |
Kazimir_Malevich | 0.551 | 0.731 | 0.628 |
Leonardo_da_Vinci | 0.000 | 0.000 | 0.000 |
Marc_Chagall | 0.000 | 0.000 | 0.000 |
Michelangelo | 0.000 | 0.000 | 0.000 |
Mikhail_Vrubel | 0.189 | 0.431 | 0.263 |
Pablo_Picasso | 0.333 | 0.833 | 0.476 |
Paul_Cezanne | 0.000 | 0.000 | 0.000 |
Paul_Gauguin | 0.545 | 0.824 | 0.656 |
Paul_Klee | 0.000 | 0.000 | 0.000 |
Peter_Paul_Rubens | 0.000 | 0.000 | 0.000 |
Pierre-Auguste_Renoir | 0.000 | 0.000 | 0.000 |
Piet_Mondrian | 0.000 | 0.000 | 0.000 |
Pieter_Bruegel | 0.289 | 0.524 | 0.373 |
Raphael | 0.000 | 0.000 | 0.000 |
Rembrandt | 0.361 | 0.839 | 0.505 |
Rene_Magritte | 0.312 | 0.754 | 0.441 |
Salvador_Dali | 0.000 | 0.000 | 0.000 |
Sandro_Botticelli | 0.000 | 0.000 | 0.000 |
Titian | 0.295 | 0.672 | 0.410 |
Vasiliy_Kandinskiy | 0.117 | 0.531 | 0.191 |
Vincent_van_Gogh | 0.264 | 0.708 | 0.385 |
William_Turner | 0.000 | 0.000 | 0.000 |
accuracy | 0.302 | 0.302 | 0.302 |
macro avg | 0.140 | 0.290 | 0.186 |
weighted avg | 0.145 | 0.302 | 0.193 |
Specifically, we can look at the precision and recall to calculate the F1-score. Precision measures how many of the positive predictions are correct, i.e., TP / (TP + FP), while recall is the number of positive cases the classifier predicts correctly divided by the total number of positive cases, i.e., TP / (TP + FN). The F1-score is computed from these two values, and an F1-score near 0.7 or higher is commonly considered good. For our initial VGG model, the F1-score is in line with our precision, translating to a macro-average F1-score of about 0.38. The second supervised model fares worse: the ResNet's macro-average F1-score is about 0.19, less than half that of the VGG model. The gap is even more drastic in the macro-average precision scores, about 0.41 for the VGG model versus about 0.14 for the ResNet-18 model, again reflecting the stronger performance of the previous model.
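Tables like the two above can be produced directly with scikit-learn's `classification_report`; here is a toy sketch with three hypothetical artists:

```python
from sklearn.metrics import classification_report

# In our pipeline, y_true/y_pred would be the collected test labels and
# model predictions; these toy values are for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(classification_report(y_true, y_pred,
                            target_names=["Monet", "Degas", "Picasso"],
                            zero_division=0))
```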
Classifying art pieces into styles and genres is no easy task, especially when there are cases of potentially fraudulent and forged art pieces. Given that any art piece could be attempting to mimic the style of a well-known artist, yet true forgeries are extremely rare in the mass amounts of data collected, our machine learning model must be trained extensively and have an extremely small margin of error to identify such forgeries. This is very important: identifying fraud and forgery in artwork can be a time-intensive process, and if a model flags a legitimate artwork as fraudulent, rectifying the error can be expensive.
After experimenting with different architectures, namely a naive PyTorch image classification model and variants of the ResNet architecture, we found that ResNet-18 took roughly three times longer to run than the naive PyTorch model, due to its increased complexity and likely because we validated the model at every epoch to visualize the evolution of training versus testing accuracies and losses. We also implemented and attempted to run the ResNet-50 model, but its accuracy did not continually increase, so we did not move forward with it and instead pursued ResNet-18, which we found to be much less prone to overfitting, since it has a relatively lower capacity to memorize the training data, making it much easier to optimize when necessary. In the future, we hope to employ even more complex supervised learning models on our data, including transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers), Inception Net, as well as visualizations with Random Forests, to hopefully increase the accuracy significantly. We also hope to expand our models beyond style and artist classification to include new art examples and even forgeries, to see if our models can accurately distinguish legitimate paintings from forgeries.
Overall, for the problem of classifying artwork by artist and stylistic genre, we conclude from our analysis that a VGG model is the most effective and accurate route given the training we were able to perform. With more time and resources to train the ResNet-18 model for significantly longer, it may well have outperformed the VGG model. Not only can our models be used to classify artists, they can also be adapted to other fields, including the stock market, bioinformatics, and even medicine, to classify equipment and materials in times of emergency.
Name | Contribution |
---|---|
Abdullah Ahmed | Data collection |
Alec Albrecht | Methods |
Carlos Hernandez | Data preprocessing |
Ankita Somu | Results & Discussion |
Sanjana Srinivasan | Conclusion & Next Steps |
Very Deep Convolutional Networks for Large-Scale Image Recognition
Simonyan, K., & Zisserman, A. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” (2015).
Deep Residual Learning for Image Recognition
K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
Toward Automated Discovery of Artistic Influence
Babak Saleh, Kanako Abe, Ravneet Singh Arora, and Ahmed Elgammal. 2016. Toward automated discovery of artistic influence. Multimedia Tools Appl. 75, 7 (April 2016), 3565–3591. https://doi.org/10.1007/s11042-014-2193-x
Categorizing Paintings in Art Styles Based on Qualitative Color Descriptors, Quantitative Global Features and Machine Learning (QArt-Learn)
Falomir, Zoe, et al. “Categorizing Paintings in Art Styles Based on Qualitative Color Descriptors, Quantitative Global Features and Machine Learning (QArt-Learn).” Expert Systems with Applications, vol. 97, 2018, pp. 83–94, https://doi.org/10.1016/j.eswa.2017.11.056.
Using Machine Learning for Identification of Art Paintings
Blessing, Alexander. “Using Machine Learning for Identification of Art Paintings.” (2010).
Discerning Art Works through Active Machine Learning
Z. Yu, “Discerning Art Works through Active Machine Learning,” 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 2022, pp. 1002-1006, doi: 10.1109/CVIDLICCEA56201.2022.9824180.
Best Artworks of All Time
Icaro (2019, February). Best Artworks of All Time, Version 1. Retrieved October 6, 2023 from https://www.kaggle.com/datasets/ikarus777/best-artworks-of-all-time.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." (2019).