The seven sins of Artificial Intelligence

Harness the potential of AI. We help you avoid the most common mistakes when implementing it, with clear examples and practical advice.

Sin I. Ignorance

Failing to state correctly the problem to be solved, and believing that AI is applicable in every situation.

To talk about Artificial Intelligence, we must first talk about the human intelligence that it tries to imitate. In this case, intelligence is defined as the ability to acquire knowledge from information in the environment. There are three ways in which humans do this (and not all of them have been imitated by AI):

1. Deduction: based on combining already known ideas through syllogisms. E.g. if we know that “All men are mortal” and “Socrates is a man”, then we can deduce: “Socrates is mortal”. Interestingly, deduction does not allow us to acquire new knowledge from the world. It simply combines rules we already know, generating new rules. Deduction was the first kind of logic that AI tried to implement, with little success.
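Deduction of this kind is easy to mechanise. Below is a minimal sketch of forward chaining: combining known facts with known rules until nothing new can be derived. The facts and rules are just the Socrates example from the text.

```python
# Deduction sketched as forward chaining: apply known "if X then Y"
# rules to known facts until no new fact appears.

def deduce(facts, rules):
    """Return the closure of `facts` under `rules` (premise, conclusion pairs)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

facts = {"Socrates is a man"}
rules = [("Socrates is a man", "Socrates is mortal")]  # "All men are mortal"
print(deduce(facts, rules))
```

Note that nothing here is learned: the output can only ever recombine what was already written into the rules, which is exactly why this approach does not acquire knowledge.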

2. Induction: based on generating rules from experience. E.g. if I only ever see black crows, I can generate the rule “All crows are black”. If one day we see an albino crow, that rule is invalidated and becomes “Most crows are black”.

Practically everything we call AI in the 21st century works this way: current supervised models need training, a process in which the model is shown examples so that it can generate a new rule by induction. They rely exclusively on experience. This makes it impossible for them to act in a general way across all problems; they only work on those they have been trained for (e.g. if I want a model to detect chairs, I will first have to show it thousands of different chairs during training, and it will only do that. It will not recognise people).
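The crow example above can be written as a toy induction procedure: generate the strongest rule the observations support, and weaken it when a counterexample arrives. The function and its phrasing are illustrative, not a real learning algorithm.

```python
# Toy induction: generate a rule from observed examples, and weaken it
# as soon as a counterexample appears.

def induce_rule(observed_colours):
    """Produce a rule about crow colour from a list of observations."""
    colours = set(observed_colours)
    if len(colours) == 1:
        return f"All crows are {colours.pop()}"
    most_common = max(colours, key=observed_colours.count)
    return f"Most crows are {most_common}"

print(induce_rule(["black"] * 100))              # before the albino crow
print(induce_rule(["black"] * 100 + ["white"]))  # after seeing one
```

The key property, shared with real supervised models, is that the rule is only as good as the observations: the model cannot say anything about situations it has never seen.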

3. Abduction: based on acquiring knowledge through assumptions and probabilities. E.g. if we see the floor wet, we can guess that someone must have spilled water. This inference is made by guessing, without training, and with a high failure rate. In the presence of the puddle we infer this possibility, but we know it is not the only one: there may be damp, a broken tap, condensation from the window… We keep several explanations alive at the same time until we find the correct one.
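One very rough way to picture abduction is as keeping a ranked list of candidate explanations rather than committing to one. The hypotheses and their prior probabilities below are entirely invented for illustration.

```python
# Abduction sketched as keeping several candidate explanations alive,
# ordered by (invented) prior plausibility, without discarding any.

explanations = {  # hypothetical priors for "the floor is wet"
    "someone spilled water": 0.5,
    "a broken tap": 0.2,
    "condensation from the window": 0.2,
    "ambient humidity": 0.1,
}

# We do not drop hypotheses; we only rank them until evidence decides.
ranked = sorted(explanations, key=explanations.get, reverse=True)
print(ranked)
```

A real abductive system would also update these rankings as new evidence arrives, which is precisely the open problem: no current AI does this in a general, training-free way.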

Currently, no AI is capable of imitating this kind of logic, which would allow us to move away from the dictatorship of training and have a generalist artificial intelligence.

If we want to set up our own AI, we will first have to think about what kind of problem we intend to attack, or how to adapt our problem so that it looks like one of the kinds below. If we can do this, we will have taken the first step towards building our own AI. The fundamental problems that AI can solve are:

  • Classification: labelling a piece of data with one or more categories to which it belongs.
  • Association: grouping several pieces of information together because they resemble each other.
  • Prediction: predicting the next data point using previous data as a reference.
  • Optimisation: choosing among different ways of responding to certain data in order to achieve an objective.

So, if we want to set up a Netflix-type film recommendation model, we can approach it in two different ways:

  • As an association problem, grouping the available films with their tags and description, and returning the films most similar to the last one the user has watched.
  • As a prediction problem, trying to predict the next movie a user wants to watch given their previous history.

If we do not have our solution associated with any of these problems… the solution may not require AI, and there may be other, simpler solutions. Or worse, it may be impossible to solve.

Sin II. Pride

Thinking that the effectiveness of the AI model will be sufficient to solve any problem.

Every AI model has a failure rate and it is important to consider the severity of these failures in relation to the problem you are trying to solve.

E.g. compare the severity of Netflix failing to recommend a movie vs. a self-driving car failing. The latter cannot be deployed even if its failure rate is negligible, far lower than Netflix's, because a single failure could cost us a life.

It is crucial to conduct a risk analysis to determine whether the implementation of an AI model is appropriate in terms of cost and benefit.

For this reason, if the severity of failure is high, it is important to reshape the problem so that it falls within the safe zone.
This can be done by changing the AI model to reduce errors, either by retraining it or by using other, state-of-the-art models; or by reducing the severity of the error by changing the problem itself.
E.g. we cannot achieve fully automatic driving, but we can achieve assisted driving, which helps the human drive and whose failure does not imply an accident.

Sin III. Data poverty

Not having enough data to train or improve AI models.

The quality and quantity of training data are critical to the success of an AI model. It is necessary to ensure that the data are:

  • Enough. Depending on the model, we may need more or less data to train it. As reference values:
    Unsupervised or Zero-Shot models > do not require training data (although their results may not fit our solution).
    Fine-tuning > retraining an already trained AI model to adapt it to our solution. Since it is already trained, it usually needs less data: typically 1,000 examples per category.
    Training from scratch > training a model without any previous training. Typically 100,000 examples per category.
  • Coherent. It is important that the data is similar to what the model will later encounter during inference. E.g. in the chair recognition model, it is necessary that the images of chairs in the training have the quality and size of the chairs that will be encountered.
  • Legal. We can use external databases to train our models, but many are licensed for research use only, and do not allow commercial, off-the-shelf solutions.
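The reference figures above can be turned into a quick sanity check: given how many examples we have per category, which training strategies are even on the table? The thresholds mirror the rough values in the text, and the dataset counts are invented.

```python
# Quick check: which training strategies do our dataset sizes allow?
# Thresholds follow the rough reference values in the text.

THRESHOLDS = {"zero-shot": 0, "fine-tuning": 1_000, "from-scratch": 100_000}

def viable_strategies(examples_per_category):
    """Strategies whose data requirement is met by the scarcest category."""
    smallest = min(examples_per_category.values())
    return [s for s, needed in THRESHOLDS.items() if smallest >= needed]

dataset = {"chair": 1_500, "table": 1_200}   # invented counts
print(viable_strategies(dataset))  # ['zero-shot', 'fine-tuning']
```

Checking against the scarcest category matters: a model is usually only as good as its worst-represented class.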

It is quite common not to find any public database compatible with our solution. In these cases, we have a few last resorts:

  • Generate our own dataset.
    There are open-source tools such as LabelStudio for labelling your own training data, and there are also businesses built around this process, such as Amazon Mechanical Turk.
    The problem is that this process takes considerable time and effort.
  • Use synthetic data. Studies are beginning to show the effectiveness of using generative AIs to generate the database artificially. This system is useful and fast but dangerous, as the model can inherit the biases and failures of the generative AI used.
  • Alternatives without data.
    As mentioned, there are Zero-Shot models, based on associations of linguistic and/or visual patterns, which can be used without any training.
    Their two main drawbacks are that they tend to have slow inference (which makes some solutions harder to implement) and that they do not always fit what we need to build.

Sin IV. Impatience

Not taking into account AI inference times, which limits some business solutions.

It is important to consider the time it takes for an AI model to return results, especially in real-time applications. We can differentiate AI solutions into two types according to their average inference time:

  • Real-time solutions.
    Inference takes less than 1 second.
    The models that meet this condition, with few exceptions, tend to be simpler and less accurate, so they leave more room for failure.
    Also, solutions that combine several models are more limited, since the inference times of all the models add up.
  • Asynchronous solutions.
    If inference takes more than one second, we have a solution that must be implemented asynchronously, and the user should not expect results immediately.
    E.g. HeyGen's model that translates videos, changing the audio to the target language and making it match the voice and lip movements of the actor on screen.
    It took, at minimum, 3 minutes for each second of input video, a somewhat prohibitive time for a society used to immediate solutions.


Large and complex models tend to have long inference times in demos, although there are a few tricks to mitigate this:

  • Cache the results so the model is not called twice for the same input.
  • Autocomplete with already generated results similar to what was requested.
  • Give the user something to do while waiting.
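The caching trick is the easiest to sketch. With Python's standard `functools.lru_cache`, the (hypothetical) slow model below is only invoked once per distinct input; repeated calls are answered from memory.

```python
# The caching trick: memoise inference results so the slow model is
# only called once per distinct input. The "model" is a stand-in.

import functools
import time

@functools.lru_cache(maxsize=1024)
def slow_model(prompt):
    time.sleep(0.1)               # stand-in for a slow inference call
    return f"result for {prompt!r}"

slow_model("hello")               # pays the inference cost
slow_model("hello")               # served instantly from the cache
print(slow_model.cache_info())    # hits=1, misses=1
```

In production the cache would live in something like Redis rather than process memory, and it only helps when inputs actually repeat, but the principle is the same.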

Sin V. Greed

Implementing AI models that are too heavy and costly, and therefore difficult to scale.

It is essential to consider the cost of maintaining an AI model on a cloud platform, such as AWS or Azure, including the price of infrastructure and resource consumption. Each machine has its own characteristics regarding power, memory and price per hour, which we will have to take into account.

Real-time solutions are always more expensive, as they require the model to be available 24 hours a day. Heavier and more complex models will also increase the cost of our solution, as they require more powerful and expensive machines. Another, more subtle detail is the number of concurrent calls: if many people call the model at the same time, more machines will have to run simultaneously, doubling or tripling the cost.

For this reason, before implementing an AI model we should always budget how much it will cost to keep it running in the cloud. Otherwise the cost may turn out to be too high for the benefit it provides, giving us a surprise at the end of the month.
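Such a budget can start as a back-of-the-envelope calculation: price per hour, times machines, times hours of availability. The prices and machine counts below are invented; the point is the 24/7 multiplier that real-time availability imposes.

```python
# Back-of-the-envelope monthly hosting budget. Prices and machine
# counts are invented; substitute your cloud provider's figures.

def monthly_cost(price_per_hour, machines, hours_per_day=24, days=30):
    """Rough monthly cost of keeping `machines` running."""
    return price_per_hour * machines * hours_per_day * days

# Real-time: three GPU machines available 24/7 to absorb peak load.
realtime = monthly_cost(price_per_hour=1.50, machines=3)
# Asynchronous: one machine that only runs a 4-hour batch each night.
batch = monthly_cost(price_per_hour=1.50, machines=1, hours_per_day=4)

print(realtime, batch)  # 3240.0 180.0
```

The gap between the two figures is the concrete cost of the "real-time vs. asynchronous" decision discussed under Sin IV.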

Sin VI. Idleness

Failure to monitor the quality of the model over time and to account for variations in its effectiveness.

AI models can deteriorate over time due to the phenomenon of model drift. E.g. a content recommendation model will never return new movies, because they were not included in its training, so its results become obsolete over time.

It is essential to establish evaluation metrics (e.g. the number of clicks on recommended content, the percentage of undetected images in a chair detector, etc.) and to monitor the performance of the model continuously to detect and correct possible problems with retraining.
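A drift monitor can start very simply: record the chosen metric over time and flag the model for retraining when it stays below a threshold. The metric values, threshold, and window below are all invented for illustration.

```python
# Minimal drift monitor: flag retraining when the evaluation metric
# stays below a threshold for several consecutive measurements.

def needs_retraining(metric_history, threshold=0.80, window=3):
    """True when the last `window` measurements are all below `threshold`."""
    recent = metric_history[-window:]
    return len(recent) == window and all(m < threshold for m in recent)

# e.g. click-through ratio on recommended content, measured weekly
clicks_ratio = [0.91, 0.89, 0.85, 0.79, 0.77, 0.74]  # slowly drifting down
print(needs_retraining(clicks_ratio))  # True
```

Requiring several consecutive low readings, rather than reacting to a single one, avoids retraining on ordinary week-to-week noise.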

Sin VII. Obscurantism

Not knowing how the implemented AI model works, using it as a black box.

Even though they rest on complex mathematics, it is not hard to understand in general terms how a given AI model works.

It is necessary to understand the strengths and weaknesses of AI models in order to use them correctly.
– ChatGPT and other LLMs are predictive language models that return combinations of linguistic patterns and keywords associated with the question asked. Being models focused on autocompleting our sentences, they depend heavily on the data they were trained on, so they can neither give absolute truths nor be used as a source of truth.

– Image classifiers associate visual patterns with one or more categories through training, but those patterns are not necessarily the ones a human would find meaningful. E.g. in 2016, a model trained to differentiate between wolves and Husky dogs was published. As the two are so similar, it was thought the model could be useful for distinguishing subtle visual features. Unfortunately, after several tests, it was found that the model had learned to detect the snow in the background of the image, not the animal: if there was snow, it was a wolf; if not, it was a Husky.

– Generative AIs form an image, video or audio using visual/auditory patterns based on a prompt. This content is generated using noise as part of its composition, so the results are random and change each time. In the case of rare words, the lack of images in the database causes the results to be worse or non-existent.

Disclosure and transparency about how the models work are essential to avoid misuse of AI.

At Multimarkts we apply our own AI, taking into account all of the above so that we can offer the best experience to our customers, minimising errors. We invite you to meet us and see a real and very profitable demonstration of the use of AI.
