

AI Terminology For Business Leaders [Beta]


Artificial Intelligence is a complicated, rapidly evolving field. Real competency is mainly found in “AI First” companies like Google, Amazon, Netflix, Baidu, and Facebook. However, artificial intelligence should not be ignored by everyone else. If you are unsure whether AI will be a thing, check out the patent and investment trends over the last decade. AI is, however, a very noisy topic. One of the biggest barriers we see while advocating for AI adoption is communication. Many strategy leaders are intimidated by AI and are either unaware of or unsure about the terminology. If we want to democratize AI, we need to be able to communicate and share ideas. That requires a shared vocabulary. It requires a dictionary, if you will.

This is a dictionary for business/strategy leaders. Why? Because artificial intelligence requires support from the C-suite to work. Otherwise, projects and data will remain in silos with limited measurable impact. Compared to other glossaries out there, this dictionary will be different:

  1. This dictionary will be collaborative and require input from other technical and strategy practitioners. We encourage you to give input on key terms and techniques every strategy leader should know for adopting AI into their organization. With input from techies and non-techies alike, together we can make this an expansive resource.
  2. This AI dictionary will include resources to dive deeper into terms and techniques for further exploration, key issues, and training.

We need your help! We need help building out our list of the key terminology you think every business/strategy leader needs to know. We also need to add great resources for readers to explore further. If you have something to share, please share! You can email your additions or add them to the comment section as we spread this around. Let’s see where this goes.


Artificial Intelligence

Intelligent machines that perceive the world around them, form plans, and make decisions to achieve their goals. The foundations of the field include mathematics, logic, philosophy, probability, linguistics, neuroscience, and decision theory.


Artificial Narrow Intelligence (ANI)

Artificial intelligence that can effectively perform a narrowly defined task in an area such as computer vision, robotics, or natural language processing.

Artificial General Intelligence (AGI)

Also known as strong AI, AGI is an artificial intelligence that can successfully perform any intellectual task that a human being can, including learning, planning and decision-making under uncertainty, communicating in natural language, making jokes, manipulating people, trading stocks, or... reprogramming itself.

Artificial Superintelligence (ASI)

An ultra-intelligent machine that can surpass all the intellectual activities of any person, however clever.



Bias

Bias is the amount of error introduced by approximating real-world phenomena with a simplified model. The word is also used for skew in the data itself: datasets used in AI are often biased with respect to race, gender, and ethnicity, which can put projects at risk.




Data points, often words or titles used as inputs in machine learning to train or test an algorithm.


Classification

A model that outputs the probability of a categorical target variable Y belonging to a certain class. For example, is this a picture of a cat or a dog?
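
To make the cat-or-dog example concrete, here is a minimal sketch of a model that outputs a class probability. The two features, the weights, and the bias are made-up illustrative values, not learned from real data, and the function names are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_cat_probability(features, weights, bias):
    """Return the estimated probability that the picture shows a cat."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)

# Hypothetical features: [ear pointiness, snout length], both scaled to 0..1.
weights = [4.0, -3.0]  # illustrative values only
bias = -0.5

p_cat = predict_cat_probability([0.9, 0.2], weights, bias)
label = "cat" if p_cat >= 0.5 else "dog"
print(f"P(cat) = {p_cat:.2f}, predicted label: {label}")
```

The model returns a probability first; the label only appears once a threshold (0.5 here) is applied, and that threshold is a business choice as much as a technical one.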


Clustering

The goal of clustering is to create groups of data points such that points in different clusters are dissimilar while points within a cluster are similar. Clustering can be considered the most important unsupervised learning problem. Like every other problem of this kind, it deals with finding structure in a collection of unlabeled data by finding relationships between data points in a multi-dimensional space.
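
As a rough illustration of grouping similar points, the sketch below uses k-means, one common clustering algorithm (the text above does not name a specific one). The customer data and the starting centers are invented for the example.

```python
def kmeans(points, centers, iterations=10):
    """Tiny k-means: assign each point to its nearest center, then move the centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[distances.index(min(distances))].append(p)
        # Move each center to the average of its cluster (keep it if the cluster is empty).
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers, clusters

# Hypothetical customers: (visits per month, average basket size).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
print(centers)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge purely from how close the points are to one another.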

Convolutional Neural Networks (CNNs)

CNNs are designed specifically for taking images as input, and are effective for computer vision tasks. They are also instrumental in deep reinforcement learning. CNNs are inspired by the way animal visual cortices work.



Deep learning

A subset of machine learning that uses neural networks with many layers (hence ‘deep’), often tens or even hundreds, in contrast to simpler machine learning models, which tend to have only one or two. Deep learning networks are capable of learning from data that is unstructured or unlabeled. Each layer of the network detects characteristics of its inputs, and the computations at each level build upon those of previous levels, which allows the network to “learn” more nuanced and abstract characteristics to determine the output.


Dimensionality Reduction

Dimensionality reduction looks a lot like compression. This is about trying to reduce the complexity of the data while keeping as much of the relevant structure as possible.



Explainable AI (XAI)

Explainable AI, or transparent AI, is an artificial intelligence (AI) whose actions can be easily understood by humans. It contrasts with the concept of the "black box" in machine learning, where the workings of complex algorithms are so hard to interpret that even their designers cannot explain why the AI arrived at a specific decision.




Data points, often numerical values used as inputs in machine learning to train or test an algorithm.

Feature Engineering

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data. If feature engineering is done correctly, it increases the predictive power of machine learning algorithms by creating features from raw data that help facilitate the machine learning process.
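
As a small sketch of what turning raw data into features can look like, the example below derives three features from a raw order record. The field names and the chosen features are hypothetical, picked only for illustration.

```python
from datetime import datetime

def engineer_features(raw_order):
    """Turn one raw order record into features a model can use more directly."""
    ts = datetime.strptime(raw_order["timestamp"], "%Y-%m-%d %H:%M")
    return {
        "hour_of_day": ts.hour,               # evening orders may behave differently
        "is_weekend": ts.weekday() >= 5,      # Monday is 0, so 5 and 6 are the weekend
        "price_per_item": raw_order["total_price"] / raw_order["item_count"],
    }

# Hypothetical raw record, as it might come out of an order database.
raw = {"timestamp": "2023-07-15 21:30", "total_price": 60.0, "item_count": 4}
features = engineer_features(raw)
print(features)
```

The raw timestamp string is nearly useless to most models, while "weekend evening order" is a pattern they can pick up on.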


Feature Extraction

The process of deriving informative features from raw data. Deep-learning models are capable of learning to focus on the right features by themselves, requiring little guidance from the programmer.

Feature Representation

A feature is an individual measurable property or characteristic (within data) of a phenomenon being observed.

Feature Scaling

A method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.
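
Two common ways to scale a feature are min-max scaling and standardization. The sketch below shows both on invented income figures; it assumes the values are not all identical, since that would cause a division by zero.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to have mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

incomes = [30_000, 45_000, 60_000, 90_000]  # hypothetical annual incomes
scaled = min_max_scale(incomes)
z_scores = standardize(incomes)
print(scaled)    # [0.0, 0.25, 0.5, 1.0]
print(z_scores)
```

Without scaling, a feature measured in tens of thousands (income) can drown out one measured in single digits (number of purchases) in distance-based methods.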


Ground Truth Data

In machine learning, the term "ground truth" refers to the accuracy of the training set's classification for supervised learning techniques. This is used in statistical models to prove or disprove research hypotheses. The term "ground truthing" refers to the process of gathering the proper objective (provable) data for this test. Compare with gold standard.



Hyperparameter

A general setting of your model that can be increased or decreased (i.e. tuned) in order to improve performance. A hyperparameter is often represented in equations by the Greek letter lambda (λ).
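
One place where a lambda shows up is regularized (ridge) regression, where it controls how strongly the model is held back from fitting the data closely. The sketch below is a one-variable illustration with invented data; `fit_slope` is a hypothetical helper name.

```python
def fit_slope(xs, ys, lam):
    """Fit y ≈ w * x while penalizing large w by lam * w**2 (closed-form solution)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated from y = 2x

# lam is the hyperparameter: it is chosen by the practitioner, not learned from the data.
for lam in (0.0, 1.0, 10.0):
    print(f"lambda = {lam:>4}: slope = {fit_slope(xs, ys, lam):.3f}")
```

With lambda at 0 the fitted slope is exactly 2; larger values pull it toward 0. Tuning means trying several values and keeping the one that performs best on held-out data.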


k-Nearest Neighbors (k-NN)

k-NN labels a test data point x by taking the mean (for numerical labels) or the mode (for categorical labels) of the labels of the k closest data points. You can measure the similarity of data points by creating a vector representation of the items and then comparing the vectors using an appropriate distance metric (like the Euclidean distance, for example). k-NN is commonly used in concept search and product recommendations.
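
The procedure described above fits in a few lines. This is a minimal sketch using the Euclidean distance and a majority vote (the mode); the customer segments and data points are invented.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` with the most common label among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical customers: (visits per month, average basket size) with a known segment.
train = [
    ((1, 1), "budget"), ((1, 2), "budget"), ((2, 1), "budget"),
    ((8, 8), "premium"), ((9, 8), "premium"), ((8, 9), "premium"),
]

print(knn_predict(train, (2, 2)))  # budget
print(knn_predict(train, (7, 8)))  # premium
```

There is no training step in the usual sense: the "model" is simply the stored data, and k is a hyperparameter chosen by the practitioner.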


Machine learning

A subfield of artificial intelligence concerned with algorithms that allow computers to learn from data on their own. Machine learning enables computers to identify patterns in observed data, build models that explain the world, and make predictions without explicit pre-programmed rules and models.




Model

Once a machine learning algorithm has been trained on data, and learning occurs, the output of the process is a model. This can be used to make predictions.


Neural Network

A computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs (see Deep Learning). A NN model is designed to continually analyze data with a logic structure similar to how a human would draw conclusions. The design of a NN is inspired by the biological neural network of the human brain. On complex tasks such as image and speech recognition, this can make for machine intelligence that is more capable than that of simpler machine learning models.




Overfitting

Learning a function that perfectly explains the training data that the model learned from, but doesn’t generalize well to unseen test data. Overfitting happens when a model overlearns from the training data to the point that it starts picking up idiosyncrasies that aren’t representative of patterns in the real world.
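
A toy comparison can make overfitting visible. In the sketch below, a "memorizer" repeats the answer of the closest training example, while a straight line captures only the overall trend. The data is invented: the training points follow y = 2x plus some noise, and the test points follow y = 2x exactly.

```python
def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def make_memorizer(train):
    """Overfit model: answer with the y of the closest training x."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]

def make_line(train):
    """Simpler model: least-squares straight line through the training data."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    return lambda x: slope * x + (mean_y - slope * mean_x)

train = [(1.0, 2.5), (2.0, 3.5), (3.0, 6.8), (4.0, 7.4)]  # noisy
test = [(1.4, 2.8), (2.6, 5.2), (3.6, 7.2)]               # unseen

memorizer, line = make_memorizer(train), make_line(train)
print("memorizer: train", mse(memorizer, train), "test", mse(memorizer, test))
print("line:      train", mse(line, train), "test", mse(line, test))
```

The memorizer scores a perfect zero error on the data it has seen and does markedly worse than the line on data it has not, which is the signature of overfitting.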



Random Forests

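A random forest is a supervised learning algorithm built from many decision trees. A decision tree is a decision support tool that uses a tree-like graph to show the possible consequences of a series of choices. If you input a training dataset with targets and features into a decision tree, it will formulate a set of rules, and these rules can be used to make predictions. A random forest trains many such trees on random subsets of the data and combines their predictions by voting or averaging.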



Regression

A model that predicts a continuous numerical value. For example, how much revenue will we make next quarter?
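
The simplest case of predicting a continuous value is fitting a straight line. The sketch below uses ordinary least squares on invented ad-spend and revenue figures; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical figures, in millions; they follow revenue = 2 * ad_spend + 1 exactly.
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, revenue)
predicted = slope * 5.0 + intercept  # forecast for an ad spend of 5
print(f"revenue ≈ {slope:.1f} * ad_spend + {intercept:.1f}; forecast at 5: {predicted:.1f}")
```

Unlike classification, the output is a number on a continuous scale (a forecast of 11 here), not a category.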

Recurrent neural networks (RNNs)

RNNs have a sense of built-in memory and are well-suited for language problems. They’re also important in reinforcement learning since they enable the agent to keep track of where things are and what happened historically even when those elements aren’t all visible at once.

Reinforcement Learning (RL)

Intelligent machines that learn goal-oriented behavior by trial and error in an environment that rewards or penalizes the agent’s actions towards achieving that goal.


Supervised Learning

An algorithm that identifies patterns in data (inputs) to form heuristics and then automate or predict a certain output. It requires labeled examples, that is, inputs paired with their expected outputs, to work.

Structured Data

Information with a high degree of organization, such that it fits seamlessly into a relational database and is readily searchable by simple, straightforward search algorithms or other search operations. Unstructured data is essentially the opposite: its lack of structure makes compilation a time- and energy-consuming task. A company benefits across all business strata from finding a mechanism of data analysis that reduces the costs unstructured data adds to the organization.



Training Data

A subset of data that represents the features of the desired output and is used to train a model to make predictions or perform a desired behavior.

Test Data

A subset of data that represents the features of the desired output and is used to test a model for fidelity after training has occurred.
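
In practice, training and test data usually come from splitting one dataset. The sketch below holds out a quarter of the rows at random; the 25% fraction, the fixed seed, and the function name are illustrative choices, not fixed rules.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(20))  # stand-in for 20 data records
train, test = train_test_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

The key property is that no row appears in both sets, so the test rows measure how the model handles data it has never seen.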

Target Variable

A label or numerical value we are trying to predict.

Transfer Learning

A practice in the field of machine learning to store knowledge gained by solving one problem and apply it to a different or related problem thereby reducing the need for additional training or compute. Transfer learning makes the development of machine learning more accessible and less resource intensive.



Unstructured Data

Information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. Email is an example of unstructured data: while the busy inbox of a corporate human resources manager might be arranged by date, time, or size, a truly structured inbox would also be arranged by exact subject and content, with no deviation or spread. That is impractical, because people don’t generally write about precisely one subject even in focused emails.


Unsupervised learning

Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses. The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or groupings in data.



Underfitting

Instead of following the training data too closely (see Overfitting), a model that underfits ignores the lessons from the training data and fails to learn the underlying relationship between inputs and outputs.




Variance

Variance is how much your model's test error changes based on variation in the training data. It reflects the model's sensitivity to the idiosyncrasies of the data set it was trained on. For example, an overfit model has high variance and low bias, whereas an underfit model has low variance and high bias.

