Artwork Categorization

Abdullah Ahmed, Alec Albrecht, Carlos Hernandez, Ankita Somu, Sanjana Srinivasan

Final Report


Introduction/Background

With the rise of artwork generated by text-to-image models like DALL-E and Midjourney, interesting questions arise about the influence of source material. Generative art models are trained on large datasets that include works from different art movements over time. Our team seeks to develop a solution for classifying an art piece into a stylistic period. Such a categorization algorithm could be used to identify and retrace the art styles mimicked in generative art. Previous approaches have used k-Nearest Neighbors and support vector machine algorithms to build their models (Falomir et al., 2018).

Problem definition

Given an image, our project aims to classify the style of an art piece. Our goal is to create a machine learning model that consistently classifies the style of the images in any collection. The project is motivated by the rise of generative art produced by text-to-image models that can closely mimic the nuance of diverse art styles. These forged images can be fraudulently shared as authentic and original pieces of art. To preserve the integrity of historical artwork, our trained models will be able to identify which images have consistent ties back to a notable art style. In this way, we will be able to differentiate between unique, original pieces of art and images created through generative models based on historical artworks. In their research, Saleh et al. (2016) discuss identifying artistic influence between two historical artists. What is new in our approach is using machine learning to identify influence between generative art and a historical artist.

Data collection

We are using a collection of artwork on Kaggle called Best Artworks of All Time to train our model. The dataset includes approximately 16,800 images from 50 notable artists. The provided CSV file gives more information on each artist: name, year range, genre (style), nationality, a short bio, and the number of paintings included in the dataset for that artist. Our model will learn to categorize artwork by its respective artist based on the training data. It can then report which artist is most likely associated with a new artwork, from which a stylistic period can be inferred.
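As a brief illustration, the metadata CSV can be inspected with pandas to gauge class imbalance before training. This is a minimal sketch: the file name artists.csv and the column names below are assumptions based on the dataset description above, not something this report specifies.

```python
import pandas as pd

# Load the artist metadata shipped with the Kaggle dataset.
# File and column names are assumptions based on the description above.
artists = pd.read_csv("artists.csv")

# Sort by the number of paintings per artist to surface class imbalance.
print(artists[["name", "genre", "paintings"]].sort_values("paintings"))
```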

Original images from the Best Artworks of All Time dataset

Methods

Preprocessing

Preprocessing was broken into two stages: the first performed at import, and the second applied by PyTorch transforms during training. The 50-class classical artwork dataset is highly imbalanced and contains images of varying sizes. Initial preprocessing, conducted with the Python PIL library, involved a minimum crop followed by a resize. The minimum crop reduced the larger of the height and width to the size of the smaller dimension, forming a square. All images were then resized to match the smallest in the dataset: 204x204. This was used instead of a random crop for consistency between trained model architectures.
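Below is a minimal sketch of this import-time step using PIL. The helper name is ours, and centering the square crop is an assumption; the report specifies only that the larger dimension is reduced to match the smaller one.

```python
from PIL import Image

TARGET_SIZE = 204  # smallest image dimension in the dataset, per the text above

def min_crop_resize(path: str) -> Image.Image:
    """Crop the larger dimension down to the smaller one (a square 'min crop'),
    then resize to TARGET_SIZE x TARGET_SIZE."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    side = min(w, h)
    left = (w - side) // 2   # centering the crop is our assumption
    top = (h - side) // 2
    square = img.crop((left, top, left + side, top + side))
    return square.resize((TARGET_SIZE, TARGET_SIZE), Image.BILINEAR)
```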

Prior to training, the imported dataset of uniformly sized images was balanced using geometric transformations such as flips and rotations. One class in the dataset featured 877 unique images, while another had only 24. Using vertical and horizontal flips and 90-degree rotations, classes with fewer images could be expanded to 6x their original count without eliminating information through random crops or destructive augmentations. The dataset was then imported using PyTorch's Dataset and DataLoader classes for use with the training model. A native PyTorch transform normalized the images to have a mean of 0, which is analogous to subtracting the mean value from each pixel. Finally, the dataset was split into 85% training and 15% testing images, and labels were converted from strings to numeric indices.
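The sketch below illustrates these steps under stated assumptions: the `images` tensor and `labels` list stand in for the imported dataset, and the 0.5 normalization statistics are placeholders, since the report states only that the mean is shifted to 0.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split
from torchvision import transforms
from torchvision.transforms import functional as TF

def augment_6x(img):
    """Deterministic flips and 90-degree rotations: up to 6 variants per image,
    used to grow under-represented classes without destroying information."""
    return [img, TF.hflip(img), TF.vflip(img),
            TF.rotate(img, 90), TF.rotate(img, 180), TF.rotate(img, 270)]

def build_loaders(images, labels, batch_size=128):
    """images: N x 3 x 204 x 204 float tensor in [0, 1]; labels: artist names.
    The 0.5 normalization statistics are placeholders; the report states only
    that the mean is shifted to 0."""
    normalize = transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3)
    classes = sorted(set(labels))
    targets = torch.tensor([classes.index(name) for name in labels])
    dataset = TensorDataset(normalize(images), targets)
    n_train = int(0.85 * len(dataset))          # 85% / 15% split
    train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
    return (DataLoader(train_set, batch_size=batch_size, shuffle=True),
            DataLoader(test_set, batch_size=batch_size))
```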

The purpose of the image transformations is to compensate for classes in the dataset with a low number of data points. In our case, the artists Eugene Delacroix (31), Georges Seurat (43), and Jackson Pollock (24) had few images in the dataset. These transformations allowed us to generate more data points for our model to train on.

Image Cropping

Image Min-Crop preprocessing; upscaled version only for visualization, not used in model

Image Normalization

Original resized images alongside their normalized versions, with pixel-value histograms

This showcases how normalization changes the images. The histograms indicate a change in range from [0, 1] to a range centered around 0. The purpose of this normalization is to give the images a mean of 0 and a standard deviation of 1.

Metrics

Here we explain our rationale for choosing these metrics and why they are relevant to our model.

Loss vs Epoch

Loss is a common measure used to evaluate how well a neural network performs on its training data. It captures the difference between the predicted outputs and the actual label values, which we want to minimize to create an optimal model, so the loss after each epoch should decrease throughout training. Visualizing how the loss changes after each epoch gives us a good idea of how successful the model is. Monitoring loss vs. epoch lets us observe how the network converges: a rapid early decrease means the model is adapting well to the training data, a premature plateau suggests the model has stopped learning, and a training loss that keeps falling while the testing loss rises is a sign of overfitting. This metric also helps us improve the model, especially when tuning hyperparameters such as batch size or number of epochs, because we can see how changes to them affect the loss. It will also assist in comparing our two models to see which one learns and converges better.
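As a minimal sketch of how the per-epoch loss curve can be recorded and plotted: `model` and `train_loader` are assumed to come from the sections above, and the Adam optimizer is an assumption, since the report states only learning rates, batch sizes, and epoch counts.

```python
import matplotlib.pyplot as plt
import torch
import torch.nn as nn

def train_and_track(model, train_loader, epochs=50, lr=0.001):
    """Train while recording the mean loss per epoch, then plot the curve."""
    criterion = nn.CrossEntropyLoss()   # cross entropy, discussed next
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # optimizer assumed
    losses = []
    for epoch in range(epochs):
        running = 0.0
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            running += loss.item()
        losses.append(running / len(train_loader))
    # Plot loss vs. epoch to watch for convergence, plateaus, or divergence.
    plt.plot(losses)
    plt.xlabel("Epoch")
    plt.ylabel("Training loss")
    plt.show()
```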

In many DL frameworks, like TensorFlow and PyTorch (used in our case), we optimize the cross-entropy loss and use it as a measure of how well our implementation performs. Because cross-entropy is logarithmic in nature, it gives us a nuanced and precise way of measuring the quality of our model's predictions: a confident prediction for the wrong class incurs a very large penalty, which forces the model to be well calibrated to achieve even moderately good scores. This is relevant to our model because, when analyzing paintings, we aim to reduce misclassification of genres and artists as much as possible. Especially in cases where we are trying to identify fraud, we must be particularly strict about the similarity between the predicted and true distributions. In essence, we will work to drive our cross-entropy value as low as possible.
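To make the logarithmic penalty concrete, here is a small, self-contained example showing that a confidently wrong prediction is penalized far more heavily than a confidently correct one.

```python
import torch
import torch.nn.functional as F

target = torch.tensor([0])                          # true class index
confident_right = torch.tensor([[4.0, 0.0, 0.0]])   # high logit on the true class
confident_wrong = torch.tensor([[0.0, 4.0, 0.0]])   # high logit on a wrong class

print(F.cross_entropy(confident_right, target).item())  # ~0.04: tiny penalty
print(F.cross_entropy(confident_wrong, target).item())  # ~4.04: large penalty
```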

Confusion matrix

A confusion matrix is one of the most effective ways to visualize how the model performs for each artist class specifically. From the predictions on a sampled batch, we can count the True Positives, True Negatives, False Positives, and False Negatives for each class and determine whether instances of that class are being classified correctly. Another reason we chose this metric is that it lets us measure specificity, the true-negative rate: the proportion of actual negative instances correctly predicted, which shows whether the model is biased toward predicting majority classes over minority ones. Finally, by analyzing TP, TN, FP, and FN, a confusion matrix lets us iteratively change the model by quickly identifying which artists are being misclassified, which is not easily done with other metrics such as True Positive counts alone or area under the curve (misleading for imbalanced datasets).
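A sketch of how the matrix and the per-class specificity described above can be computed; the toy label arrays here stand in for the test-set labels and predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# y_true / y_pred: numeric class indices over the test set (toy values here).
y_true = np.array([0, 0, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 2, 2, 0])

cm = confusion_matrix(y_true, y_pred)   # rows: actual class; columns: predicted

# One-vs-rest counts per class, derived from the matrix.
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
tn = cm.sum() - (tp + fp + fn)

specificity = tn / (tn + fp)            # true-negative rate per class
print(specificity)
```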

Model Architecture

Three convolutional neural networks of varying complexity were implemented. First, a very simple architecture of four convolutional layers and two fully connected layers was used to validate the preprocessing methods and develop a code structure for training and evaluating the models. All convolutions used a 3x3 kernel without padding. The first layer transformed from 3 to 32 channels, the second maintained the 32 channels and was followed by a 2x2 maximum pooling layer. These three layers were repeated with 32 to 64 channels before being flattened and fed into two fully connected layers. This model was not particularly accurate but was especially helpful for determining layer size matching.
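A sketch of this baseline in PyTorch follows. The hidden width of the first fully connected layer (256) is an assumption, as the report does not state it; for 204x204 inputs the flattened feature size works out to 64 * 48 * 48.

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Four 3x3 convolutions (no padding) with two 2x2 max pools, then two
    fully connected layers. 204 -> 202 -> 200 -> 100 -> 98 -> 96 -> 48."""
    def __init__(self, num_classes=50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3), nn.ReLU(),
            nn.Conv2d(32, 32, 3), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 48 * 48, 256), nn.ReLU(),  # hidden width assumed
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```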

Next, a VGG architecture was implemented using the A configuration in Table 1 of Simonyan and Zisserman. This architecture uses a series of padded 3x3 convolutional layers, each feeding into a rectified linear activation, taking the input from 3 channels up to 512. The output of the final convolutional layer was fed into a series of three shrinking fully connected layers before a softmax activation produced the final output. Training for this model used a batch size of 128, a learning rate of 0.001, and 50 training epochs. Due to the limits of Google Colab compute resources, cross-validation could not be used to optimize these hyperparameters because Google would time out the instance; instead, the batch size and number of epochs were adjusted by hand to reach a reasonable result.
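Rather than reproducing the full layer listing, the sketch below shows how an equivalent A-configuration model (VGG-11) and the stated hyperparameters could be set up. The optimizer choice is an assumption, as above; note also that torchvision's implementation inserts an adaptive average pool before the classifier, so it accepts our 204x204 inputs.

```python
import torch
from torchvision.models import vgg11

model = vgg11(num_classes=50)   # configuration "A" of Simonyan & Zisserman

# Hyperparameters from the text: batch size 128, learning rate 0.001, 50 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # optimizer assumed
BATCH_SIZE, NUM_EPOCHS = 128, 50
```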

Table I: VGG Implementation Table

Finally, a ResNet-18 model as described by He et al. was trained; it uses a novel residual block method that introduces shortcuts between layers. Convolutional layers 2 through 5 each apply this idea by summing the outputs of two convolutional layers with the input to the first, with the goal of gaining the benefits of additional layers without overfitting or vanishing gradients. Each convolution feeds into a batch normalization before its activation function, and an average pool is applied before the final fully connected classification layer. This architecture was trained with a batch size of 128 over 50 epochs, but with the learning rate increased to 0.01. The original paper reduced the learning rate on plateaus and trained for considerably more epochs; given the limits of Google Colab, the learning rate was kept high because no plateau was observed.
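A sketch of the identity-shortcut version of the basic block described above (the downsampling variant, which uses a strided 1x1 convolution on the shortcut, is omitted for brevity):

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet-18 basic block: two 3x3 convolutions whose output is summed with
    the block input (the 'shortcut'), each convolution followed by batch
    normalization before the ReLU activation."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # residual shortcut
```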

Table II: ResNet Implementation Table

A ResNet-50 and a more complicated Inception architecture were also attempted, but both had considerably longer runtimes that could not be completed on a single instance of a free Google Colab unit. Simpler versions were used instead.

Figure I: VGG Example Architecture

Figure II: ResNet-18 architecture

Link to the model codebase on Google Colab

Results and discussion

ResNet-18 Results

Accuracy over Epoch for the ResNet-18 model

To demonstrate the quality of our residual neural network model, we chose to monitor accuracy over epochs. This allows us to understand how well the model is learning, identify problems such as overfitting or fluctuations, and judge when to stop training. In the two lines plotted above, the testing accuracy indicates how well the model generalizes to data it has not yet seen. Ideally, the training and testing accuracies increase together, as our chart shows. However, a plateau is also common, and our testing curve does flatten slightly relative to the training curve, indicating a chance of overfitting. Given this plateau, it is reasonable to conclude that the model has learned most of the patterns in the training data and that accuracy is unlikely to improve significantly past roughly epoch 41. With the final accuracy reaching only about 40%, we learned that our previous model performs better, despite both being supervised deep learning models.

Loss over Epoch for the ResNet-18 model

One method we chose to evaluate the residual neural network model is the loss value and how it changes as each training epoch progresses. Loss is a very useful measure of the network's success: it shows the difference between the predicted outputs and the actual label values, which we should be minimizing to create an optimal model, so we want the loss to decrease as the epochs continue. In the graph above, we plot both the training loss and the testing loss of our network. Ideally, for a successful network, the training loss should steadily decrease, showing that the model is fitting the training data and learning well, and the testing loss should also decrease as the model generalizes to unseen data. By this metric, our model performs quite well: both measures decrease and remain reasonably close to each other in value. The testing loss continues to decrease through each epoch, so the model does not show signs of overfitting; however, the two curves do not appear to converge, so additional changes may be needed to improve performance. There is also room for improvement in that the testing loss remains higher than the training loss, which ideally would not be the case.

Comparison of Models

Loss per batch for the VGG-16 model (left) and the ResNet-18 model (right)

Above we show the loss trend across batches for our earlier VGG model and our current ResNet model. In the first graph there is a high initial loss at the start of training, reflecting randomly initialized weights. We then see a decaying trend across the epochs as the weights are updated through backpropagation: the loss decreases steadily as the model learns and fits the training data over time. After roughly batch 6,000, the curve begins to plateau and converge to a relatively low value, showing that training has stabilized. In the second plot, the curve is flat compared to the previous model. Although this does not by itself say much about performance, we can draw some conclusions: the steeper loss curve on the left shows a model that learns quickly and rapidly improves its fit to the training data, while the flatter loss curve on the right shows a model converging slowly, as the data is relatively harder for it to fit. The ResNet is therefore learning more slowly and steadily, without obviously overfitting.

Overall, we want to minimize the loss as much as possible, since the model is most successful when the difference between actual and predicted values is smallest. While the ResNet performs reasonably well, its lowest loss value only reaches around 3.4, whereas the VGG model's lowest loss falls below 0.5, a much better result. The ResNet clearly leaves room for improvement, which may be achievable by adjusting its hyperparameters.

VGG-16 Model Accuracies by Class
Artist Name Percent Accuracy
Albrecht_Durer 21.0%
Alfred_Sisley 34.5%
Amedeo_Modigliani 48.1%
Andrei_Rublev 29.1%
Andy_Warhol 69.2%
Camille_Pissarro 42.0%
Caravaggio 41.6%
Claude_Monet 56.9%
Diego_Rivera 51.1%
Diego_Velazquez 53.5%
Edgar_Degas 65.5%
Edouard_Manet 46.4%
Edvard_Munch 44.4%
El_Greco 42.2%
Eugene_Delacroix 52.5%
Francisco_Goya 37.5%
Frida_Kahlo 12.5%
Georges_Seurat 38.6%
Giotto_di_Bondone 52.6%
Gustav_Klimt 43.1%
Gustave_Courbet 52.7%
Henri_Matisse 45.3%
Henri_Rousseau 29.0%
Henri_de_Toulouse-Lautrec 43.3%
Hieronymus_Bosch 49.1%
Jackson_Pollock 56.9%
Jan_van_Eyck 86.4%
Joan_Miro 23.4%
Kazimir_Malevich 75.0%
Leonardo_da_Vinci 59.6%
Marc_Chagall 35.9%
Michelangelo 40.6%
Mikhail_Vrubel 41.4%
Pablo_Picasso 28.1%
Paul_Cezanne 39.3%
Paul_Gauguin 25.9%
Paul_Klee 49.1%
Peter_Paul_Rubens 65.7%
Pierre-Auguste_Renoir 50.0%
Piet_Mondrian 23.2%
Pieter_Bruegel 66.2%
Raphael 31.1%
Rembrandt 53.8%
Rene_Magritte 14.1%
Salvador_Dali 42.6%
Sandro_Botticelli 68.7%
Titian 20.0%
Vasiliy_Kandinskiy 22.6%
Vincent_van_Gogh 37.5%
William_Turner 56.7%


ResNet-18 Model Accuracies by Class
Artist Name Percent Accuracy
Albrecht_Durer 81.5%
Alfred_Sisley 77.2%
Amedeo_Modigliani 0.0%
Andrei_Rublev 51.6%
Andy_Warhol 0.0%
Camille_Pissarro 51.9%
Caravaggio 0.0%
Claude_Monet 62.1%
Diego_Rivera 0.0%
Diego_Velazquez 70.0%
Edgar_Degas 75.0%
Edouard_Manet 57.7%
Edvard_Munch 0.0%
El_Greco 89.1%
Eugene_Delacroix 65.5%
Francisco_Goya 0.0%
Frida_Kahlo 0.0%
Georges_Seurat 0.0%
Giotto_di_Bondone 0.0%
Gustav_Klimt 0.0%
Gustave_Courbet 83.3%
Henri_Matisse 0.0%
Henri_Rousseau 0.0%
Henri_de_Toulouse-Lautrec 0.0%
Hieronymus_Bosch 0.0%
Jackson_Pollock 0.0%
Jan_van_Eyck 0.0%
Joan_Miro 0.0%
Kazimir_Malevich 73.1%
Leonardo_da_Vinci 0.0%
Marc_Chagall 0.0%
Michelangelo 0.0%
Mikhail_Vrubel 43.1%
Pablo_Picasso 83.3%
Paul_Cezanne 0.0%
Paul_Gauguin 82.4%
Paul_Klee 0.0%
Peter_Paul_Rubens 0.0%
Pierre-Auguste_Renoir 0.0%
Piet_Mondrian 0.0%
Pieter_Bruegel 52.4%
Raphael 0.0%
Rembrandt 83.9%
Rene_Magritte 75.4%
Salvador_Dali 0.0%
Sandro_Botticelli 0.0%
Titian 67.2%
Vasiliy_Kandinskiy 53.1%
Vincent_van_Gogh 70.8%
William_Turner 0.0%


Confusion Matrix

Confusion matrices for the VGG-16 model (left) and the ResNet-18 model (right)

Now we can compare our models' performance using one of the most stringent views of performance, the confusion matrix. Each matrix above plots predicted labels against actual label values; the diagonal entries count the times the model predicted a label correctly, so in the best case the diagonal values should be as high as possible while the remaining values, which represent misclassifications, stay low. For our previous model on the left, there are few classes where incorrect predictions outnumber correct ones and the accuracy is quite high, unlike the model on the right, the second supervised model we chose to implement. This could be due to a variety of reasons, but the VGG model appears to excel because its architecture better captures intricate artistic nuances and features.

Precision, Recall and F1-Scores

VGG-16 Model Precision, Recall and F1-Scores
Artist Name Precision Recall F1-Score
Albrecht_Durer 0.324 0.453 0.378
Alfred_Sisley 0.487 0.317 0.384
Amedeo_Modigliani 0.220 0.467 0.299
Andrei_Rublev 0.522 0.222 0.312
Andy_Warhol 0.620 0.554 0.585
Camille_Pissarro 0.549 0.252 0.346
Caravaggio 0.551 0.199 0.292
Claude_Monet 0.478 0.180 0.262
Diego_Rivera 0.421 0.381 0.400
Diego_Velazquez 0.439 0.483 0.460
Edgar_Degas 0.524 0.717 0.606
Edouard_Manet 0.317 0.333 0.325
Edvard_Munch 0.415 0.491 0.450
El_Greco 0.621 0.310 0.414
Eugene_Delacroix 0.491 0.483 0.487
Francisco_Goya 0.545 0.240 0.333
Frida_Kahlo 0.233 0.119 0.157
Georges_Seurat 0.364 0.483 0.415
Giotto_di_Bondone 0.487 0.638 0.552
Gustav_Klimt 0.380 0.306 0.339
Gustave_Courbet 0.317 0.400 0.354
Henri_Matisse 0.372 0.558 0.446
Henri_Rousseau 0.400 0.149 0.217
Henri_de_Toulouse-Lautrec 0.284 0.300 0.292
Hieronymus_Bosch 0.463 0.517 0.488
Jackson_Pollock 0.627 0.583 0.604
Jan_van_Eyck 0.846 0.579 0.688
Joan_Miro 0.175 0.200 0.187
Kazimir_Malevich 0.508 0.638 0.566
Leonardo_da_Vinci 0.413 0.579 0.482
Marc_Chagall 0.368 0.200 0.259
Michelangelo 0.333 0.625 0.435
Mikhail_Vrubel 0.272 0.523 0.358
Pablo_Picasso 0.500 0.143 0.222
Paul_Cezanne 0.538 0.100 0.169
Paul_Gauguin 0.204 0.404 0.271
Paul_Klee 0.345 0.322 0.333
Peter_Paul_Rubens 0.315 0.258 0.283
Pierre-Auguste_Renoir 0.381 0.381 0.381
Piet_Mondrian 0.211 0.127 0.158
Pieter_Bruegel 0.444 0.475 0.459
Raphael 0.218 0.395 0.281
Rembrandt 0.708 0.557 0.624
Rene_Magritte 0.216 0.439 0.290
Salvador_Dali 0.621 0.554 0.585
Sandro_Botticelli 0.438 0.813 0.569
Titian 0.260 0.297 0.277
Vasiliy_Kandinskiy 0.313 0.172 0.222
Vincent_van_Gogh 0.094 0.167 0.120
William_Turner 0.355 0.415 0.383
accuracy 0.380 0.380 0.380
macro avg 0.411 0.390 0.376
weighted avg 0.416 0.380 0.371


ResNet-18 Model Precision, Recall and F1-Scores
Artist Name Precision Recall F1-Score
Albrecht_Durer 0.541 0.815 0.650
Alfred_Sisley 0.352 0.772 0.484
Amedeo_Modigliani 0.000 0.000 0.000
Andrei_Rublev 0.230 0.516 0.319
Andy_Warhol 0.000 0.000 0.000
Camille_Pissarro 0.397 0.519 0.450
Caravaggio 0.000 0.000 0.000
Claude_Monet 0.250 0.621 0.356
Diego_Rivera 0.000 0.000 0.000
Diego_Velazquez 0.350 0.700 0.467
Edgar_Degas 0.328 0.750 0.456
Edouard_Manet 0.261 0.577 0.359
Edvard_Munch 0.000 0.000 0.000
El_Greco 0.477 0.891 0.621
Eugene_Delacroix 0.208 0.655 0.316
Francisco_Goya 0.000 0.000 0.000
Frida_Kahlo 0.000 0.000 0.000
Georges_Seurat 0.000 0.000 0.000
Giotto_di_Bondone 0.000 0.000 0.000
Gustav_Klimt 0.000 0.000 0.000
Gustave_Courbet 0.360 0.833 0.503
Henri_Matisse 0.000 0.000 0.000
Henri_Rousseau 0.000 0.000 0.000
Henri_de_Toulouse-Lautrec 0.000 0.000 0.000
Hieronymus_Bosch 0.000 0.000 0.000
Jackson_Pollock 0.000 0.000 0.000
Jan_van_Eyck 0.000 0.000 0.000
Joan_Miro 0.000 0.000 0.000
Kazimir_Malevich 0.551 0.731 0.628
Leonardo_da_Vinci 0.000 0.000 0.000
Marc_Chagall 0.000 0.000 0.000
Michelangelo 0.000 0.000 0.000
Mikhail_Vrubel 0.189 0.431 0.263
Pablo_Picasso 0.333 0.833 0.476
Paul_Cezanne 0.000 0.000 0.000
Paul_Gauguin 0.545 0.824 0.656
Paul_Klee 0.000 0.000 0.000
Peter_Paul_Rubens 0.000 0.000 0.000
Pierre-Auguste_Renoir 0.000 0.000 0.000
Piet_Mondrian 0.000 0.000 0.000
Pieter_Bruegel 0.289 0.524 0.373
Raphael 0.000 0.000 0.000
Rembrandt 0.361 0.839 0.505
Rene_Magritte 0.312 0.754 0.441
Salvador_Dali 0.000 0.000 0.000
Sandro_Botticelli 0.000 0.000 0.000
Titian 0.295 0.672 0.410
Vasiliy_Kandinskiy 0.117 0.531 0.191
Vincent_van_Gogh 0.264 0.708 0.385
William_Turner 0.000 0.000 0.000
accuracy 0.302 0.302 0.302
macro avg 0.140 0.290 0.186
weighted avg 0.145 0.302 0.193


Specifically, we can use precision and recall to calculate the F1-score. Precision measures how many of the positive predictions are correct, TP / (TP + FP), while recall is the number of positive cases the classifier predicts correctly divided by the total number of positive cases, TP / (TP + FN). An F1-score of roughly 0.7 or higher is commonly considered good. For our initial VGG model, the F1-score is in line with the precision, translating to a macro-average F1-score of about 0.4. The second supervised model fares worse: the ResNet's macro-average F1-score comes to about 0.18, less than half the VGG's 0.4. This again reflects the better performance of the previous model, and the gap is even more drastic in the calculated precision scores, about 0.41 for the VGG model versus about 0.14 for the ResNet-18 model. This could be due to a variety of reasons, but once more the VGG model appears to excel because its architecture better captures intricate artistic nuances and features.
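All three of the tables above can be produced in a single call with scikit-learn. In this sketch, `y_true`, `y_pred`, and `artist_names` are assumed to be the test-set labels, the model's predictions, and the 50 class names in index order.

```python
from sklearn.metrics import classification_report

# Per-class precision, recall, and F1, plus accuracy and macro/weighted
# averages; zero_division=0 matches the 0.000 rows for unpredicted classes.
print(classification_report(y_true, y_pred,
                            target_names=artist_names, zero_division=0))
```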

Conclusion & Next Steps

Classifying art pieces into styles and genres is no easy task, especially when some pieces are potentially fraudulent or forged. Given that any art piece could attempt to mimic the style of a well-known artist, yet forgeries are extremely rare within the mass of collected data, our machine learning model must be trained extensively and operate with an extremely small margin of error to identify such forgeries. This matters because identifying fraud and forgery in artwork can be very time-intensive, and if a model flags a legitimate artwork as fraudulent, rectifying the error can be expensive.

After experimenting with different variants of the ResNet architecture alongside a naive PyTorch image classification model, we found that ResNet-18 took roughly three times longer to run than the naive model, due to its increased complexity and likely because we validated the model at every epoch to visualize the evolution of training versus testing accuracies and losses. We also implemented and attempted to run the ResNet-50 model, but its accuracy did not continually increase, so we did not move forward with it and instead pursued ResNet-18. We found ResNet-18 to be much less prone to overfitting, since it has a relatively lower capacity to memorize all of the training data, making it easier to optimize when necessary. In the future, we hope to employ even more complex supervised learning models on our data, including BERT (Bidirectional Encoder Representations from Transformers), Inception Net, and visualizations with Random Forests, to hopefully increase accuracy significantly. We also hope to expand our models from style and artist classification to include new art examples and even forgeries, to see whether our models can accurately distinguish legitimate paintings from forgeries.

Overall, for the problem of classifying artwork by artist and stylistic genre, our analysis leads us to conclude that the VGG model is the most effective and accurate route, based on the training we were able to perform. With more time and resources to train the ResNet-18 model for significantly longer, it may well have outperformed the VGG model. Our models are useful not only for classifying artists; they could also be adapted to other fields, including the stock market, bioinformatics, and even medicine, to classify equipment and materials in times of emergency.

Updated timeline

Link to Gantt chart

Contribution table

Name Contribution
Abdullah Ahmed Data collection
Alec Albrecht Methods
Carlos Hernandez Data preprocessing
Ankita Somu Results & Discussion
Sanjana Srinivasan Conclusion & Next Steps

References

Very Deep Convolutional Networks for Large-Scale Image Recognition
Simonyan, K., & Zisserman, A. “Very Deep Convolutional Networks for Large-Scale Image Recognition.” (2015).

Deep Residual Learning for Image Recognition
K. He, X. Zhang, S. Ren and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.

Toward Automated Discovery of Artistic Influence
Babak Saleh, Kanako Abe, Ravneet Singh Arora, and Ahmed Elgammal. 2016. Toward automated discovery of artistic influence. Multimedia Tools Appl. 75, 7 (April 2016), 3565–3591. https://doi.org/10.1007/s11042-014-2193-x

Categorizing Paintings in Art Styles Based on Qualitative Color Descriptors, Quantitative Global Features and Machine Learning (QArt-Learn)
Falomir, Zoe, et al. “Categorizing Paintings in Art Styles Based on Qualitative Color Descriptors, Quantitative Global Features and Machine Learning (QArt-Learn).” Expert Systems with Applications, vol. 97, 2018, pp. 83–94, https://doi.org/10.1016/j.eswa.2017.11.056.

Using Machine Learning for Identification of Art Paintings
Blessing, Alexander. “Using Machine Learning for Identification of Art Paintings.” (2010).

Discerning Art Works through Active Machine Learning
Z. Yu, “Discerning Art Works through Active Machine Learning,” 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 2022, pp. 1002-1006, doi: 10.1109/CVIDLICCEA56201.2022.9824180.

Best Artworks of All Time
Icaro (2019, February). Best Artworks of All Time, Version 1. Retrieved October 6, 2023 from https://www.kaggle.com/datasets/ikarus777/best-artworks-of-all-time.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." (2019).

Foundation Models, Transformers, BERT and GPT