Artificial Intelligence Glossarium: 1000 terms
Bigram (Биграмм) – An N-gram in which N=2.
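As a minimal illustration, the bigrams of a token sequence can be listed by pairing each token with its successor. The function name `bigrams` and the sample sentence are assumptions chosen for this sketch.

```python
def bigrams(tokens):
    """Return the adjacent token pairs of a sequence (N-grams with N = 2)."""
    # zip pairs token i with token i + 1 and stops at the shorter input.
    return list(zip(tokens, tokens[1:]))


words = "the quick brown fox".split()
print(bigrams(words))
```

A sequence of n tokens yields n - 1 bigrams, so a single token yields none.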
Binary choice regression model (Регрессионная модель бинарного выбора) is a regression model in which the dependent variable is dichotomous, or binary. The dependent variable can take only two values, which indicate, for example, whether an observation belongs to a particular group.
Binary classification (Двоичная, бинарная или дихотомическая классификация) — A type of classification task that outputs one of two mutually exclusive classes. For example, a machine learning model that evaluates email messages and outputs either “spam” or “not spam” is a binary classifier.
Binary format (Двоичный формат) Any file format in which information is encoded in some format other than a standard character-encoding scheme. A file written in binary format contains information that is not displayable as characters. Software capable of understanding the particular binary format method of encoding information must be used to interpret the information in a binary-formatted file. Binary formats are often used to store more information in less space than possible in a character format file. They can also be searched and analyzed more quickly by appropriate software. A file written in binary format could store the number “7” as a binary number (instead of as a character) in as little as 3 bits (i.e., 111), but would more typically use 4 bits (i.e., 0111). Binary formats are not normally portable, however. Software program files are written in binary format. Examples of numeric data files distributed in binary format include the IBM-binary versions of the Center for Research in Security Prices files and the U.S. Department of Commerce’s National Trade Data Bank on CD-ROM. The International Monetary Fund distributes International Financial Statistics in a mixed-character format and binary (packed-decimal) format. SAS and SPSS store their system files in binary format. [82]
Binary number (Двоичное число) – A number written in binary notation, which uses only zeros and ones. Example: the decimal number 7 is written as 111 in binary notation. [83]
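The conversion in the example can be sketched with repeated division by 2, collecting the remainders. The helper name `to_binary` is an assumption for illustration. Python's built-in `int(text, 2)` performs the reverse conversion.

```python
def to_binary(n):
    """Convert a non-negative integer to its binary digit string."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next lowest-order bit
        n //= 2
    return "".join(reversed(digits))


print(to_binary(7))   # 111
print(int("111", 2))  # 7
```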
Binary tree (Бинарное дерево) – A tree data structure in which each node has at most two children, which are referred to as the left child and the right child. A recursive definition using just set theory notions is that a (non-empty) binary tree is a tuple (L, S, R), where L and R are binary trees or the empty set and S is a singleton set. Some authors allow the binary tree to be the empty set as well. [84]
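The recursive tuple definition maps closely onto nested Python tuples. The sketch below assumes a simplification: the middle element is a plain value rather than a singleton set, and `None` stands for the empty tree. The names `leaf`, `size` and `in_order` are illustrative.

```python
def leaf(value):
    """A node with no children: (empty left subtree, value, empty right subtree)."""
    return (None, value, None)


def size(tree):
    """Count the nodes of a tree given as (left, value, right) or None."""
    if tree is None:
        return 0
    left, _value, right = tree
    return 1 + size(left) + size(right)


def in_order(tree):
    """List the values: left subtree first, then the node, then the right subtree."""
    if tree is None:
        return []
    left, value, right = tree
    return in_order(left) + [value] + in_order(right)


tree = (leaf(1), 2, (None, 3, leaf(4)))
print(size(tree))      # 4
print(in_order(tree))  # [1, 2, 3, 4]
```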
Binning (Биннинг) is the process of combining charge from neighboring pixels in a CCD during readout. This process is performed prior to digitization in the CCD chip using dedicated serial and parallel register control. The two main benefits of binning are improved signal-to-noise ratio (SNR) and the ability to increase frame rates, albeit at the cost of reduced spatial resolution.
Bioconservatism (Биоконсерватизм) (a portmanteau of biology and conservatism) is a stance of hesitancy and skepticism regarding radical technological advances, especially those that seek to modify or enhance the human condition. Bioconservatism is characterized by a belief that technological trends in today’s society risk compromising human dignity, and by opposition to movements and technologies including transhumanism, human genetic modification, “strong” artificial intelligence, and the technological singularity. Many bioconservatives also oppose the use of technologies such as life extension and preimplantation genetic screening [85,86].
Biometrics (Биометрия) is a system for recognizing people by one or more physical or behavioral traits.
Black box (Чёрный ящик) – A description of some deep learning systems. They take an input and provide an output, but the calculations that occur in between are not easy for humans to interpret.
Blackboard system (Системы, использующие принцип классной доски) – An artificial intelligence approach based on the blackboard architectural model, where a common knowledge base, the “blackboard”, is iteratively updated by a diverse group of specialist knowledge sources, starting with a problem specification and ending with a solution. Each knowledge source updates the blackboard with a partial solution when its internal constraints match the blackboard state. In this way, the specialists work together to solve the problem.
BLEU (Bilingual Evaluation Understudy) (Алгоритм BLEU) – A score between 0.0 and 1.0, inclusive, indicating the quality of a translation between two human languages (for example, between English and Russian). A BLEU score of 1.0 indicates a perfect translation; a BLEU score of 0.0 indicates a terrible translation.
Blockchain (Блокчейн) is a set of algorithms and protocols for decentralized storage and processing of transactions, structured as a sequence of linked blocks that cannot be changed afterwards.
Boltzmann machine (Also stochastic Hopfield network with hidden units) (Машина Больцмана) – A type of stochastic recurrent neural network and Markov random field. Boltzmann machines can be seen as the stochastic, generative counterpart of Hopfield networks [87].
Boolean neural network (Булевая нейронная сеть) – An artificial neural network approach that consists only of Boolean neurons (and, or, not). Such an approach reduces memory use and computation time. It can be implemented on programmable integrated circuits such as an FPGA (field-programmable gate array).
Boolean satisfiability problem (Also propositional satisfiability problem; abbreviated SATISFIABILITY or SAT) (Проблема булевой выполнимости) – The problem of determining whether there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE. If this is the case, the formula is called satisfiable. On the other hand, if no such assignment exists, the function expressed by the formula is FALSE for all possible variable assignments and the formula is unsatisfiable [88].
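For small formulas, satisfiability can be checked by trying every TRUE/FALSE assignment, which is also an instance of exhaustive search. This is only a sketch: the formula is represented as a Python function over a dictionary of variable values, which is an assumption of this example, and the running time grows as 2^n in the number of variables.

```python
from itertools import product


def find_satisfying_assignment(formula, variables):
    """Return an assignment that makes the formula TRUE, or None if none exists."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None


def satisfiable(v):
    # (a OR b) AND (NOT a OR c)
    return (v["a"] or v["b"]) and ((not v["a"]) or v["c"])


def unsatisfiable(v):
    # a AND NOT a can never be TRUE
    return v["a"] and not v["a"]


print(find_satisfying_assignment(satisfiable, ["a", "b", "c"]))
print(find_satisfying_assignment(unsatisfiable, ["a"]))
```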
Boosting (Бустинг) – A Machine Learning ensemble meta-algorithm for primarily reducing bias and variance in supervised learning, and a family of Machine Learning algorithms that convert weak learners to strong ones.
Bounding Box (Ограничивающая рамка) – Commonly used in image or video tagging, this is an imaginary box drawn on visual information. The contents of the box are labeled to help a model recognize it as a distinct type of object.
Brain technology (Also self-learning know-how system) (Мозговая технология) – A technology that employs the latest findings in neuroscience. The term was first introduced by the Artificial Intelligence Laboratory in Zurich, Switzerland, in the context of the ROBOY project. Brain Technology can be employed in robots, know-how management systems and any other application with self-learning capabilities. In particular, Brain Technology applications allow the visualization of the underlying learning architecture often coined as “know-how maps”.
Brain–computer interface (BCI, Интерфейс мозг-компьютер), sometimes called a brain–machine interface (BMI), is a direct communication pathway between the brain’s electrical activity and an external device, most commonly a computer or robotic limb. Research on brain–computer interfaces was begun in the 1970s by Jacques Vidal at the University of California, Los Angeles (UCLA) under a grant from the National Science Foundation, followed by a contract from DARPA. Vidal’s 1973 paper marks the first appearance of the expression brain–computer interface in the scientific literature [89].
Brain-inspired computing (Мозгоподобные вычисления) – Computation performed on brain-like structures, or computation that follows the operating principles of the brain (see also neurocomputing, neuromorphic engineering).
Branching factor (коэффициент ветвления дерева) – In computing, tree data structures, and game theory, the number of children at each node, the outdegree. If this value is not uniform, an average branching factor can be calculated.
Broadband (Широкополосный доступ) refers to various high-capacity transmission technologies that transmit data, voice, and video across long distances and at high speeds. Common mediums of transmission include coaxial cables, fiber optic cables, and radio waves. [90]
Brute-force search (Also exhaustive search or generate and test) (Полный перебор) – A very general problem-solving technique and algorithmic paradigm that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem’s statement.
Bucketing (Разделение на сегменты) – Converting a (usually continuous) feature into multiple binary features called buckets or bins, typically based on value range.
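A minimal sketch of bucketing by value range: a continuous value is turned into a one-hot vector with one binary feature per bucket. The boundaries 18 and 65 and the function name `bucketize` are hypothetical choices for illustration. Treating each boundary as belonging to the bucket above it is also an assumption.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Return a one-hot list with len(boundaries) + 1 binary bucket features."""
    # bisect_right counts the boundaries that are <= value,
    # which is the index of the bucket the value falls into.
    index = bisect_right(boundaries, value)
    return [1 if i == index else 0 for i in range(len(boundaries) + 1)]


boundaries = [18, 65]  # buckets: below 18, 18 up to 65, 65 and above
print(bucketize(10, boundaries))  # [1, 0, 0]
print(bucketize(18, boundaries))  # [0, 1, 0]
print(bucketize(70, boundaries))  # [0, 0, 1]
```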
Byte (Байт) Eight bits. A byte is simply a chunk of 8 ones and zeros. For example: 01000001 is a byte. A computer often works with groups of bits rather than individual bits and the smallest group of bits that a computer usually works with is a byte. A byte is equal to one column in a file written in character format. [91]
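The example byte can be inspected directly. The sketch below assumes the ASCII character encoding, in which the bit pattern 01000001 (decimal 65) corresponds to the letter A.

```python
bits = "01000001"
value = int(bits, 2)                        # read the 8 bits as a base-2 number
character = bytes([value]).decode("ascii")  # interpret the byte as one character

print(len(bits), value, character)  # 8 65 A
```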
“C”
Caffe – Short for Convolutional Architecture for Fast Feature Embedding, an open-source deep learning framework developed at Berkeley AI Research. It supports many different deep learning architectures and GPU-accelerated computation kernels.
Calibration layer (Калибровочный слой) – A post-prediction adjustment, typically to account for prediction bias. The adjusted predictions and probabilities should match the distribution of an observed set of labels.
Candidate generation (Генерация кандидатов) — The initial set of recommendations chosen by a recommendation system. [92].
Candidate sampling (Выборка кандидатов) – A training-time optimization in which a probability is calculated for all the positive labels, using, for example, softmax, but only for a random sample of negative labels. For example, if we have an example labeled beagle and dog, candidate sampling computes the predicted probabilities and corresponding loss terms for the beagle and dog class outputs in addition to a random subset of the remaining classes (cat, lollipop, fence). The idea is that the negative classes can learn from less frequent negative reinforcement as long as positive classes always get proper positive reinforcement, and this is indeed observed empirically. The motivation for candidate sampling is the computational efficiency gained by not computing predictions for all negatives.
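The class-selection step can be sketched as follows. This shows only which class outputs would be computed for one training example, not the loss itself. The function name `candidate_classes` and the extra classes car and tree are hypothetical additions for illustration.

```python
import random


def candidate_classes(positives, all_classes, num_negatives, rng):
    """Return every positive class plus a random sample of the negative classes."""
    negatives = [c for c in all_classes if c not in positives]
    return list(positives) + rng.sample(negatives, num_negatives)


all_classes = ["beagle", "dog", "cat", "lollipop", "fence", "car", "tree"]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
chosen = candidate_classes(["beagle", "dog"], all_classes, num_negatives=3, rng=rng)
print(chosen)
```

Predictions and loss terms would then be computed for the 5 chosen classes only, instead of for all 7.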
Canonical Formats (Канонические форматы) In information technology, canonicalization is the process of making something [conform] with some specification… and is in an approved format. Canonicalization may sometimes mean generating canonical data from noncanonical data. Canonical formats are widely supported and considered to be optimal for long-term preservation. [93]
Capsule neural network (CapsNet) (Капсульная нейронная сеть) – A machine learning system that is a type of artificial neural network (ANN) that can be used to better model hierarchical relationships [94]. The approach is an attempt to more closely mimic biological neural organization [95].
Case-Based Reasoning (CBR) (Рассуждения по прецедентам) – A way to solve a new problem by using solutions to similar past problems. It has been formalized as a process consisting of case retrieval, solution reuse, solution revision, and case retention [96].
Categorical data (Категориальные данные) — Features having a discrete set of possible values. For example, consider a categorical feature named house style, which has a discrete set of three possible values: Tudor, ranch, colonial. By representing house style as categorical data, the model can learn the separate impacts of Tudor, ranch, and colonial on house price. Sometimes, values in the discrete set are mutually exclusive, and only one value can be applied to a given example. For example, a car maker categorical feature would probably permit only a single value (Toyota) per example. Other times, more than one value may be applicable. A single car could be painted more than one different color, so a car color categorical feature would likely permit a single example to have multiple values (for example, red and white). Categorical features are sometimes called discrete features. Contrast with numerical data [97].
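One common way to feed categorical values to a model is a vector with one position per possible value. The entry above does not prescribe an encoding, so this is an assumption of the sketch, as are the function name `multi_hot` and the color blue. A single-valued feature yields a one-hot vector and a multi-valued feature yields a multi-hot vector.

```python
def multi_hot(values, vocabulary):
    """Return 1 for each vocabulary entry present in values, else 0."""
    present = set(values)
    return [1 if item in present else 0 for item in vocabulary]


house_styles = ["Tudor", "ranch", "colonial"]
car_colors = ["red", "white", "blue"]

print(multi_hot(["ranch"], house_styles))       # [0, 1, 0]
print(multi_hot(["red", "white"], car_colors))  # [1, 1, 0]
```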
Center for Technological Competence (Центр технологических компетенций) is an organization that owns the results, tools for conducting fundamental research and platform solutions available to market participants to create applied solutions (products) on their basis. The Technology Competence Center can be a separate organization or be part of an application technology holding company.
Central processing unit (CPU) (Центральный процессор) is a von Neumann cyclic processor designed to execute complex computer programs.
Centralized control (Централизованное управление) is a process in which control signals are generated in a single control center and transmitted from it to numerous control objects.
Centroid (Центроид) – The center of a cluster as determined by a k-means or k-median algorithm. For instance, if k is 3, then the k-means or k-median algorithm finds 3 centroids.
Centroid-based clustering (Кластеризация на основе центроида) – A category of clustering algorithms that organizes data into nonhierarchical clusters. k-means is the most widely used centroid-based clustering algorithm. Contrast with hierarchical clustering algorithms.
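A minimal sketch of how k-means finds its centroids: each point is assigned to the nearest centroid, then each centroid is moved to the mean of its assigned points, and the two steps repeat. The function names, the fixed iteration count and the starting centroids are assumptions of this example. Practical implementations usually choose starting centroids more carefully and stop once assignments no longer change.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, centroids, iterations=10):
    """Alternate assignment and update steps; k is len(centroids)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else old
                     for c, old in zip(clusters, centroids)]
    return centroids


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
result = k_means(points, centroids=[(0, 0), (10, 10)])
print(result)  # [(0.0, 1.0), (10.0, 11.0)]
```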
Character format (Формат символов) – Any file format in which information is encoded as characters using only a standard character-encoding scheme. A file written in “character format” contains only those bytes that are prescribed in the encoding scheme as corresponding to the characters in the scheme (e.g., alphabetic and numeric characters, punctuation marks, and spaces). [98]
Chatbot (Чат-бот) is a software application designed to simulate human conversation with users via text or speech. Also referred to as virtual agents, interactive agents, digital assistants, or conversational AI, chatbots are often integrated into applications, websites, or messaging platforms to provide support to users without the use of live human agents. Chatbots originally started out by offering users simple menus of choices, and then evolved to react to particular keywords. “But humans are very inventive in their use of language,” says Forrester’s McKeon-White. Someone looking for a password reset might say they’ve forgotten their access code, or are having problems getting into their account. “There are a lot of different ways to say the same thing,” he says. This is where AI comes in. Natural language processing is a subset of machine learning that enables a system to understand the meaning of written or even spoken language, even where there is a lot of variation in the phrasing. To succeed, a chatbot that relies on AI or machine learning needs first to be trained using a data set. In general, the bigger the training data set, and the narrower the domain, the more accurate and helpful a chatbot will be [99].
Checkpoint (Контрольная точка) — Data that captures the state of the variables of a model at a particular time. Checkpoints enable exporting model weights, as well as performing training across multiple sessions. Checkpoints also enable training to continue past errors (for example, job preemption). Note that the graph itself is not included in a checkpoint.
Chip (Чип) – An electronic microcircuit of arbitrary complexity, made on a semiconductor substrate and either enclosed in a non-separable package or left unpackaged when it is part of a microassembly.
Class (Класс) — One of a set of enumerated target values for a label. For example, in a binary classification model that detects spam, the two classes are spam and not spam. In a multi-class classification model that identifies dog breeds, the classes would be poodle, beagle, pug, and so on.
Classification (Классификация). Classification problems use an algorithm to accurately assign test data into specific categories, such as separating apples from oranges. Or, in the real world, supervised learning algorithms can be used to classify spam in a separate folder from your inbox. Linear classifiers, support vector machines, decision trees and random forest are all common types of classification algorithms.
Classification model (Модель классификации) — A type of machine learning model for distinguishing among two or more discrete classes. For example, a natural language processing classification model could determine whether an input sentence was in French, Spanish, or Italian.
Classification threshold (Порог классификации) — A scalar-value criterion that is applied to a model’s predicted score in order to separate the positive class from the negative class. Used when mapping logistic regression results to binary classification.
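A minimal sketch of applying a threshold to a logistic regression output. The sigmoid function maps a raw model output to a score between 0 and 1, and the threshold splits scores into two classes. The threshold values, the class names and the choice to treat a score equal to the threshold as positive are assumptions of this example.

```python
import math


def sigmoid(z):
    """Map a raw model output (logit) to a score between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(score, threshold=0.5):
    """Scores at or above the threshold map to the positive class."""
    return "spam" if score >= threshold else "not spam"


score = sigmoid(1.2)  # roughly 0.77
print(classify(score))                 # spam
print(classify(score, threshold=0.9))  # not spam
```

Raising the threshold makes the classifier more conservative about predicting the positive class.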
Clinical Decision Support (CDS) (Поддержка принятия клинических решений) – A clinical decision support system is a health information technology system that is designed to provide physicians and other health professionals with clinical decision support, that is, assistance with clinical decision-making tasks [100].
Clipping (Отсечение) – A technique for handling outliers. Specifically, reducing feature values that are greater than a set maximum value down to that maximum value, and increasing feature values that are less than a specific minimum value up to that minimum value. For example, suppose that only a few feature values fall outside the range 40 to 60. In this case, you could clip all values over 60 to be exactly 60 and clip all values under 40 to be exactly 40. In addition to bringing input values within a designated range, clipping can also be used to force gradient values within a designated range during training.
Closed dictionary (Закрытый словарь) – In speech recognition systems, a dictionary with a limited number of words, for which the recognition system is configured and which the user cannot extend.
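The 40 and 60 example can be written in a few lines. The function name `clip` and the sample values are illustrative assumptions.

```python
def clip(value, minimum, maximum):
    """Pull a value into the closed range [minimum, maximum]."""
    return max(minimum, min(value, maximum))


values = [35, 40, 52, 60, 75]
clipped = [clip(v, 40, 60) for v in values]
print(clipped)  # [40, 40, 52, 60, 60]
```

The same function could be applied to each gradient component during training, with the range chosen to suit the model.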
Cloud (Облако) – The cloud is a general metaphor that is used to refer to the Internet. Initially, the Internet was seen as a distributed network and then with the invention of the World Wide Web as a tangle of interlinked media. As the Internet continued to grow in both size and the range of activities it encompassed, it came to be known as “the cloud.” The use of the word cloud may be an attempt to capture both the size and nebulous nature of the Internet [101].
Cloud computing (Облачные вычисления) is an information technology model for providing ubiquitous and convenient access over the Internet to a shared pool of configurable computing resources (the “cloud”), such as data storage devices, applications and services, that can be rapidly provisioned and released with minimal operating costs or with little or no involvement of the provider.
Cloud robotics (Облачная робототехника) – A field of robotics that attempts to invoke cloud technologies such as cloud computing, cloud storage, and other Internet technologies centred on the benefits of converged infrastructure and shared services for robotics. When connected to the cloud, robots can benefit from the powerful computation, storage, and communication resources of modern data centers in the cloud, which can process and share information from various robots or agents (other machines, smart objects, humans, etc.). Humans can also delegate tasks to robots remotely through networks. Cloud computing technologies enable robot systems to be endowed with powerful capabilities whilst reducing costs. Thus, it is possible to build lightweight, low-cost, smarter robots that have an intelligent “brain” in the cloud. The “brain” consists of data centers, knowledge bases, task planners, deep learning, information processing, environment models, communication support, etc. [102]
Cloud TPU (Облачный процессор) – A specialized hardware accelerator designed to speed up machine learning workloads on Google Cloud Platform [103].
Cluster analysis (Кластерный анализ) – A type of unsupervised learning used for exploratory data analysis to find hidden patterns or groupings in the data; clusters are modeled with a similarity measure defined by metrics such as Euclidean or probability distance.
Clustering (Кластеризация) is a data mining technique for grouping unlabeled data based on their similarities or differences. For example, K-means clustering algorithms assign similar data points into groups, where the K value sets the number of groups and therefore the granularity of the grouping. This technique is helpful for market segmentation, image compression, etc.
Co-adaptation (Коадаптация) – When neurons predict patterns in training data by relying almost exclusively on outputs of specific other neurons instead of relying on the network’s behavior as a whole. When the patterns that cause co-adaptation are not present in validation data, then co-adaptation causes overfitting. Dropout regularization reduces co-adaptation because dropout ensures neurons cannot rely solely on specific other neurons.
Cobweb (Метод COBWEB) – An incremental system for hierarchical conceptual clustering. COBWEB was invented by Professor Douglas H. Fisher, currently at Vanderbilt University. COBWEB incrementally organizes observations into a classification tree. Each node in a classification tree represents a class (concept) and is labeled by a probabilistic concept that summarizes the attribute-value distributions of objects classified under the node. This classification tree can be used to predict missing attributes or the class of a new object.
Code (Код) is a one-to-one mapping of a finite ordered set of symbols belonging to some finite alphabet.
Codec (Кодек) – “A codec is the means by which sound and video files are compressed for storage and transmission purposes. There are various forms of compression: ‘lossy’ and ‘lossless’, but most codecs perform lossy compression because of the much larger data reduction ratios that occur [with lossy compression]. Most codecs are software, although in some areas codecs are hardware components of image and sound systems. Codecs are necessary for playback, since they uncompress [or decompress] the moving image and sound files and allow them to be rendered.” [104]
Cognitive architecture (Когнитивная архитектура) – The Institute for Creative Technologies defines cognitive architecture as: “hypothesis about the fixed structures that provide a mind, whether in natural or artificial systems, and how they work together – in conjunction with knowledge and skills embodied within the architecture – to yield intelligent behavior in a diversity of complex environments”.
Cognitive computing (Когнитивные вычисления) – A term used to refer to systems that simulate the human brain to help with decision-making. It uses self-learning algorithms that perform tasks such as natural language processing, image analysis, reasoning, and human–computer interaction. Examples of cognitive systems are IBM’s Watson and Google DeepMind [105].
Cognitive Maps (Когнитивные карты) – Cognitive maps are structured representations of decisions depicted in graphical format (variations of cognitive maps are cause maps, influence diagrams, or belief nets). Basic cognitive maps include nodes connected by arcs, where the nodes represent constructs (or states) and the arcs represent relationships. Cognitive maps have been used to understand decision situations, to analyze complex cause-effect representations and to support communication. [106]