Chapter 4: Glossary


Key definitions to guide you on your path to AI literacy!

Here are the definitions of a few key terms that often come up when you are reading about artificial intelligence.

  • Artificial Intelligence (AI)
    • A field of computer science that aims to use computers to do complex things that have historically been considered to require human intelligence. This includes tasks like prediction, reasoning, decision-making, and solving complex problems.

  • Machine learning
    • A field of AI that focuses on developing predictive algorithms that can perform tasks that they were not explicitly programmed to do. Machine learning provides the basis for many technologies that are considered AI today.

  • Model
    • A model is software that applies algorithms to sets of data in order to “learn” to recognize patterns. It may later use those patterns to make predictions about new data.

  • Supervised Machine Learning
    • A type of machine learning where a model is trained on annotated data with labeled inputs and outputs. The model recognizes patterns that connect inputs to outputs, and then uses those patterns to predict outputs for new data. An example of this could be a model that detects whether a photo contains a dog. If it was trained on one million images of dogs, each labeled “dog,” and one million images without dogs, each labeled “no dog,” it could learn to predict whether an image it had never “seen” before contained a dog.
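To make the “dog or no dog” idea concrete, here is a tiny illustrative sketch in Python of one of the simplest supervised approaches: a nearest-neighbor classifier, which memorizes labeled examples and reuses the label of the closest one. The feature names and numbers below are invented for illustration; real models learn from far richer data.

```python
# A minimal sketch of supervised learning: a 1-nearest-neighbor classifier.
# "Training" here is just memorizing labeled examples; prediction finds the
# closest known example and reuses its label.

def distance(a, b):
    # Squared distance between two feature tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(training_data, features):
    # Return the label of the closest labeled example.
    closest = min(training_data, key=lambda example: distance(example[0], features))
    return closest[1]

# Tiny labeled training set: (ear_floppiness, tail_wag_rate) -> label.
# These features and values are made up for the example.
training_data = [
    ((0.9, 0.8), "dog"),
    ((0.8, 0.9), "dog"),
    ((0.1, 0.2), "no dog"),
    ((0.2, 0.1), "no dog"),
]

print(predict(training_data, (0.85, 0.7)))  # a dog-like input -> "dog"
print(predict(training_data, (0.15, 0.3)))  # not dog-like -> "no dog"
```

The point is the pattern, not the math: the model never saw the new inputs during training, but the labeled examples let it make a prediction anyway.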

  • Unsupervised Machine Learning
    • A type of machine learning where a model is trained on data that has not been labeled. An unsupervised model finds patterns without being given explicit categories. The Financial Times used an unsupervised learning model to notice patterns in their articles and suggest new “clusters” the articles could be categorized into.
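Here is a tiny illustrative sketch of “clustering” like the Financial Times example, using one classic unsupervised method (k-means) on made-up numbers. Notice that no labels are given; the algorithm discovers the two groups on its own.

```python
# A minimal sketch of unsupervised learning: k-means clustering on
# unlabeled one-dimensional values. The values below are invented.

def kmeans_1d(values, k=2, iterations=10):
    # Start the two cluster centers at the smallest and largest values.
    centers = [min(values), max(values)]
    for _ in range(iterations):
        # Assign each value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the average of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

values = [1.0, 1.2, 0.9, 9.8, 10.1, 10.3]
groups = kmeans_1d(values)
print(groups)  # the low values end up in one group, the high values in the other
```

A real system would cluster articles represented as many-dimensional vectors rather than single numbers, but the idea is the same: the groups emerge from the data itself.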

  • Reinforcement Learning
    • A type of machine learning where a model learns to recognize patterns through feedback. By receiving positive and negative reinforcement, the model learns to predict outputs that will earn the most positive feedback. An example of this is an algorithm that plays a game. Through winning and losing many games, the model learns which patterns (moves and strategies) lead to the most positive result (scoring points and winning).
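The game-playing loop can be sketched in a few lines of Python. This is a deliberately simplified illustration with invented moves and rewards: the agent tries moves, receives feedback, and gradually adjusts its estimate of how good each move is.

```python
import random

# A minimal sketch of reinforcement learning: an agent repeatedly picks a
# move, receives a reward (the feedback), and nudges its value estimate for
# that move toward the reward. The moves and rewards below are invented:
# "attack" always earns +1, "retreat" always earns -1.

rewards = {"attack": 1.0, "retreat": -1.0}
values = {"attack": 0.0, "retreat": 0.0}  # the agent's learned estimates

random.seed(0)
for step in range(100):
    # Explore a random move 10% of the time; otherwise pick the
    # move the agent currently believes is best.
    if random.random() < 0.1:
        move = random.choice(list(values))
    else:
        move = max(values, key=values.get)
    reward = rewards[move]
    # Positive/negative reinforcement: shift the estimate toward the reward.
    values[move] += 0.1 * (reward - values[move])

print(max(values, key=values.get))  # the move the agent learned to prefer
```

After many rounds of feedback, the agent's estimates reflect which move pays off, without anyone ever telling it the rules directly.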

  • Training
    • The process of exposing a machine learning model to large sets of data in order to “teach” it to recognize patterns, often with the goal of later having it apply those patterns to data it has not “seen” before.

  • Training data
    • The data with which a machine learning model is trained to recognize patterns. In many popular AI models, these training sets are collected through massive scrapes of public internet data. These scrapes often include copyrighted content like writing, news articles, images, and artwork. Many AI companies do not allow their training datasets to be publicly viewed.

  • Fine Tuning
    • The process of taking an existing machine learning model and giving it more specific data to train it to complete a particular kind of task.

  • Deep learning
    • A subset of advanced machine learning that involves using complex, layered ML models called neural networks, which are named after the neurons in the human brain. 

  • Vectorization
    • In AI, vectorization is the process of turning non-numerical data like text into “vectors”: numerical representations that an algorithm can process more efficiently.
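One of the simplest vectorization schemes is a “bag of words” count, sketched below in Python. The sentences are invented for illustration; each sentence becomes a list of numbers recording how often each known word appears.

```python
# A minimal sketch of vectorization: turning sentences into "bag of words"
# vectors, where each position counts one vocabulary word.

def build_vocabulary(sentences):
    # Collect every distinct word across the sentences, in sorted order.
    return sorted({word for s in sentences for word in s.lower().split()})

def vectorize(sentence, vocabulary):
    # Count how often each vocabulary word appears in the sentence.
    tokens = sentence.lower().split()
    return [tokens.count(word) for word in vocabulary]

sentences = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocabulary(sentences)
print(vocab)                          # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectorize("the cat sat", vocab))  # [1, 0, 0, 0, 1, 1]
```

Real AI systems use far more sophisticated numerical representations (such as learned “embeddings”), but the goal is the same: text in, numbers out.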

  • Natural Language Processing
    • A branch of computer science aimed at training computers to process and generate human language in the form of writing and speech. NLP combines computational linguistics with machine learning and deep learning models. 

  • Large Language Model
    • An LLM is a model built to output realistic text in natural language. ChatGPT uses the LLM GPT-3.5, created by OpenAI.

  • ChatGPT
    • A popular generative AI system that generates text from text prompts. ChatGPT is owned by OpenAI and uses their Large Language Model GPT-3.5. It is important to note that ChatGPT is not the name of the Large Language Model itself, but the name of the chat-style web interface that lets users interact with the LLM.

  • Generative AI
    • Systems that are used to create new pieces of media. These models are usually trained on massive sets of data scraped from the internet, and “learn” to recognize patterns in that data. Generative AI systems take user prompts and predict the most likely output for each prompt. ChatGPT (text), Bard (text), Midjourney (images), and DALL-E (images) are all generative AI systems.

  • Synthetic Content
    • Text, images, videos, audio, and other media created using Generative AI. These pieces of content result from an AI system being given a prompt and then predicting the most likely output for that prompt, based on patterns it was trained to recognize in a set of training data.

  • Artificial General Intelligence
    • A hypothetical autonomous computer program that could learn to complete any task that humans are capable of. AGI does not exist today, and the possibility of AGI ever being created is heavily debated. Some AI companies, like OpenAI, see the creation of AGI as a primary goal.

  • “Black Box”
    • Many AI models are colloquially referred to as “black boxes,” meaning that their inner workings are very difficult to interpret and explain. In a “black box” model, it is very hard to trace or explain the connection between the input and output.

  • Algorithmic Bias
    • Discrimination and bias that occur as a result of algorithmic outputs. This happens as a result of the design of an algorithm, what is represented within its training data, how the system is built and used, and what decisions are made based on its outputs. One major example of algorithmic bias is predictive policing models, which are often trained on biased arrest data in order to predict where crime is most likely to happen.

  • Anthropomorphization
    • The attribution of human characteristics to inanimate objects. This occurs when people describe the output from a generative AI system as “friendly,” give human names and pronouns to AI chatbots, or have them “speak” in natural-sounding voices. Referring to an AI system as if it has its own thoughts, feelings, motivations, goals, or dreams is another way we anthropomorphize these systems. These words are very useful for explaining what computers do in simple terms, but they should be used carefully.

  • Hallucinations
    • AI “hallucinations” refer to instances where an AI system produces an output that is untrue or unintended. Because these systems are trained to recognize patterns in finite datasets and produce outputs based on those patterns, outputs that do not align with the user’s intentions are inevitable.

  • Technochauvinism
    • A term coined by data journalist and computer scientist Meredith Broussard to refer to the pervasive assumption that technological solutions are inherently superior to others.