This glossary defines general machine learning terms and specific terms for TensorFlow.
A
Ablation

A technique for evaluating the importance of a feature or component by temporarily removing it from a model. You then retrain the model without that feature or component; if the retrained model performs significantly worse, then the removed feature or component was probably important.

For example, suppose you train a classification model on 10 features and achieve 88% accuracy on the test set. To check the importance of the first feature, you can retrain the model using only the other nine features. If the retrained model performs significantly worse (for example, 55% accuracy), then the removed feature was probably important. Conversely, if the retrained model performs equally well, then that feature was probably not that important.

Ablation can also help you evaluate the importance of:

- Larger components, such as an entire subsystem of a larger ML system
- Processes or techniques, such as a data preprocessing step

In both cases, you would observe how the system's performance changes (or doesn't change) after you remove the component.
A/B test
A statistical way of comparing two (or more) techniques: the A and the B. Typically, the A is an existing technique and the B is a new technique. The A/B test not only determines which technique performs better but also whether the difference is statistically significant.

An A/B test usually compares a single metric between the two techniques; for example, how does model accuracy compare for the two? However, an A/B test can also compare any finite number of metrics.
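As a minimal sketch of checking whether the difference between two techniques is statistically significant, the following uses a two-sample t-test from SciPy. The accuracy numbers are illustrative, and a real A/B test would also account for the experiment's design:

```python
from scipy import stats

# Per-run accuracies for technique A (existing) and B (new); illustrative values.
accuracy_a = [0.88, 0.87, 0.89, 0.88, 0.86, 0.90, 0.88, 0.87, 0.89, 0.88]
accuracy_b = [0.90, 0.91, 0.89, 0.92, 0.90, 0.91, 0.90, 0.92, 0.89, 0.91]

t_stat, p_value = stats.ttest_ind(accuracy_a, accuracy_b)
print(f"p-value: {p_value:.4f}")  # a small p-value suggests the difference
                                  # is statistically significant
```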
Accelerator chip
#Google Cloud
A category of specialized hardware components designed to perform the key computations needed by deep learning algorithms.

Accelerator chips (or just accelerators, for short) can significantly increase the speed and efficiency of training and inference tasks compared to a general-purpose CPU. They are ideal for training neural networks and similar computationally intensive tasks.
Examples of accelerator chips include:

- Google's Tensor Processing Units (TPUs), which contain specialized hardware for deep learning.
- NVIDIA GPUs which, though initially designed for graphics processing, are built for parallel processing, which can significantly increase processing speed.
Accuracy
#Basic principles
The number of correct classification predictions divided by the total number of predictions. That is:

$$\text{Accuracy} = \frac{\text{correct predictions}}{\text{correct predictions} + \text{incorrect predictions}}$$

For example, a model that made 40 correct predictions and 10 incorrect predictions would have an accuracy of:

$$\text{Accuracy} = \frac{40}{40 + 10} = 80\%$$

Binary classification provides specific names for the different categories of correct and incorrect predictions, so the accuracy formula for binary classification is:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where:

- TP is the number of true positives (correct predictions).
- TN is the number of true negatives (correct predictions).
- FP is the number of false positives (incorrect predictions).
- FN is the number of false negatives (incorrect predictions).
While a valuable measurement in some situations, accuracy is very misleading in others. In particular, accuracy is usually a poor metric for evaluating classification models that deal with class-imbalanced datasets.
For example, suppose that a subtropical city experiences snow only 25 days per century. The snow dataset for this city is class-imbalanced because the number of days without snow (the negative class) far outweighs the number of days with snow (the positive class). Imagine a binary classification model that simply predicts no snow every day. This model would be very accurate but would have no predictive power. The following table summarizes the results over a century of predictions:
Category | Number |
---|---|
TP | 0 |
TN | 36499 |
FP | 0 |
FN | 25 |
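Plugging these counts into the accuracy formula gives (0 + 36,499) / (0 + 36,499 + 0 + 25) ≈ 99.93%, even though the model never once predicts snow correctly. A minimal sketch of that calculation in Python (the counts come from the table above):

```python
# Accuracy and recall on the class-imbalanced snow dataset from the table above.
tp, tn, fp, fn = 0, 36_499, 0, 25

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)  # fraction of actual snow days the model caught

print(f"accuracy: {accuracy:.4f}")  # ~0.9993, looks excellent
print(f"recall:   {recall:.4f}")    # 0.0 -- the model never predicts snow
```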
Activation function
#Basic principles
A function that enables neural networks to learn nonlinear (complex) relationships between features and the label. Common activation functions include ReLU and sigmoid.

A plot of an activation function is never a single straight line. For example, the plot of the ReLU activation function consists of two straight lines, while a plot of the sigmoid activation function is a smooth curve.
In a neural network, an activation function operates on the weighted sum of all the inputs to a neuron. To calculate the weighted sum, the neuron adds up the products of the relevant values and weights. For example, suppose the relevant inputs to a neuron consist of:

Input value | Input weight |
---|---|
2 | -1.3 |
-1 | 0.6 |
3 | 0.4 |

The weighted sum is therefore:

weighted sum = (2)(-1.3) + (-1)(0.6) + (3)(0.4) = -2.0
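A short sketch of this computation in Python, passing the weighted sum through a ReLU activation (the values and weights are the ones from the table above; the choice of ReLU is just for illustration):

```python
# Weighted sum of a neuron's inputs, followed by a ReLU activation.
inputs = [2.0, -1.0, 3.0]
weights = [-1.3, 0.6, 0.4]

weighted_sum = sum(x * w for x, w in zip(inputs, weights))

relu_output = max(0.0, weighted_sum)  # ReLU clips negative values to 0
print(f"{weighted_sum:.1f} {relu_output:.1f}")  # -2.0 0.0
```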
For more information, see Neural networks: Activation functions in Machine Learning Crash Course.
Active Learning
A training approach in which the algorithm chooses some of the data it learns from. Active learning is particularly valuable when labeled examples are scarce or expensive to obtain. Instead of blindly seeking a diverse range of labeled examples, an active learning algorithm selectively seeks the particular range of examples it needs for learning.
AdaGrad

A sophisticated gradient descent algorithm that rescales the gradients of each parameter, effectively giving each parameter an independent learning rate. For a full explanation, see the AdaGrad paper.
Agent

More generally, an agent is software that autonomously plans and executes a series of actions in pursuit of a goal, with the ability to adapt to changes in its environment. For example, an LLM-based agent might use an LLM to create a plan, rather than applying a reinforcement learning policy.

Anomaly detection
The process of identifying outliers. For example, if the mean for a certain feature is 100 with a standard deviation of 10, then anomaly detection should flag a value of 200 as suspicious.
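A minimal sketch of flagging a value by its distance from the mean (the mean of 100 and standard deviation of 10 come from the example above; `is_anomalous` and the three-sigma cutoff are illustrative assumptions):

```python
def is_anomalous(value: float, mean: float = 100.0, stddev: float = 10.0,
                 max_z: float = 3.0) -> bool:
    """Flag values more than max_z standard deviations from the mean."""
    return abs(value - mean) / stddev > max_z

print(is_anomalous(200))  # True: 10 standard deviations from the mean
print(is_anomalous(105))  # False: well within the normal range
```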
Area under the PR curve

See PR AUC (area under the PR curve).

Area under the ROC curve

See AUC (area under the ROC curve).
Artificial general intelligence

A non-human mechanism that demonstrates a broad range of problem solving, creativity, and adaptability. For example, a program demonstrating artificial general intelligence could translate text, compose symphonies, and excel at games that have not yet been invented.
Artificial intelligence
#Basic principles
A non-human program or model that can solve sophisticated tasks. For example, a program or model that translates text, and a program or model that identifies diseases from radiologic images, both exhibit artificial intelligence.

Formally, machine learning is a subfield of artificial intelligence. However, in recent years, some organizations have begun using the terms artificial intelligence and machine learning interchangeably.
Attention

A mechanism used in a neural network that indicates the importance of a particular word or part of a word. Attention compresses the amount of information a model needs to predict the next token/word. A typical attention mechanism might consist of a weighted sum over a set of inputs, where the weight for each input is computed by another part of the neural network.

For more information about self-attention, see "LLM: What's a large language model?" in Machine Learning Crash Course.
Attribute

In machine learning fairness, attributes often refer to characteristics pertaining to individuals.
Attribute sampling

A tactic for training a decision forest in which each decision tree considers only a random subset of possible features when learning the condition. Generally, a different subset of features is sampled for each node. In contrast, when training a decision tree without attribute sampling, all possible features are considered for each node.
AUC (Area under the ROC curve)
#Basic principles
A number between 0.0 and 1.0 representing a binary classification model's ability to separate positive classes from negative classes. The closer the AUC is to 1.0, the better the model's ability to separate classes from each other.

For example, the following illustration shows a classification model that perfectly separates the positive class (green ovals) from the negative class (purple rectangles). This unrealistically perfect model has an AUC of 1.0:

Conversely, the following illustration shows the results of a classification model that generated random results. This model has an AUC of 0.5:
Yes, the preceding model has an AUC of 0.5, not 0.0.

Most models fall somewhere between the two extremes. For example, the following model separates positives and negatives somewhat, and therefore has an AUC somewhere between 0.5 and 1.0:

AUC ignores any value you set for the classification threshold. Instead, AUC considers all possible classification thresholds.
AUC represents the area under the ROC curve. For example, the ROC curve for a model that perfectly separates positives from negatives looks as follows:

The AUC is the area of the grey region in the preceding figure. In this unusual case, the area is simply the length of the grey region (1.0) multiplied by the width of the grey region (1.0). So the product of 1.0 and 1.0 yields an AUC of exactly 1.0, which is the highest possible AUC score.

Conversely, the ROC curve for a classifier that can't separate classes at all looks as follows. The area of this grey region is 0.5.

A more typical ROC curve looks approximately like this:

It would be painstaking to calculate the area under this curve manually, which is why a program typically calculates most AUC values.
More formally, AUC is the probability that a classifier is more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.
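A brute-force sketch of that probabilistic definition, estimating AUC by comparing every positive/negative score pair (the scores are illustrative, and tie handling is omitted for brevity):

```python
import itertools

pos_scores = [0.9, 0.8, 0.65]   # model scores for positive examples (illustrative)
neg_scores = [0.7, 0.4, 0.3]    # model scores for negative examples (illustrative)

# Count the pairs where the positive example outscores the negative one.
wins = sum(p > n for p, n in itertools.product(pos_scores, neg_scores))
auc = wins / (len(pos_scores) * len(neg_scores))
print(auc)  # 8/9 ~= 0.889: one positive (0.65) scores below one negative (0.7)
```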
For more information, see Classification: ROC and AUC in Machine Learning Crash Course.

Augmented reality
A technique that overlays computer-generated images onto the real world as seen by the user.
Autoencoder

A system that learns to extract the most important information from the input. Autoencoders are a combination of an encoder and a decoder. Autoencoders rely on the following two-step process:

1. The encoder maps the input to a (typically) lossy lower-dimensional (intermediate) format.
2. The decoder builds a lossy version of the original input by mapping the lower-dimensional format back to the original higher-dimensional input format.

Autoencoders are trained end-to-end by having the decoder attempt to reconstruct the original input from the encoder's intermediate format as closely as possible. Because the intermediate format is smaller (lower-dimensional) than the original format, the autoencoder is forced to learn what information in the input is essential, and the output won't be perfectly identical to the input.

For example:

- If the input data is a graphic, the non-exact copy will resemble the original graphic, but somewhat modified. Perhaps the non-exact copy removes noise from the original graphic or fills in some missing pixels.
- If the input data is text, the autoencoder generates new text that mimics (but is not identical to) the original text.
Automation bias
#Basic principles
The tendency of a human decision-maker to favor recommendations made by an automated decision-making system over information made without automation, even when the automated system makes errors.

For more information, see Fairness: Types of bias in Machine Learning Crash Course.
AutoML

The process of automatically building machine learning models. AutoML can automate tasks such as:

- Searching for the most appropriate model.
- Preparing the data (including performing feature engineering).
- Deploying the resulting model.

AutoML is useful for data scientists because it can save them time and effort in developing machine learning pipelines and can improve prediction accuracy. It is also useful to non-experts, by making complicated machine learning tasks more accessible to them.

For more information, see AutoML in Machine Learning Crash Course.
Autoregressive model
#generativeai
A model that infers a prediction based on its own previous predictions. For example, an autoregressive language model predicts the next token based on the previously predicted tokens. All transformer-based large language models are autoregressive.

In contrast, GAN-based image models are usually not autoregressive, since they generate an image in a single forward pass rather than iteratively in steps. However, certain image generation models are autoregressive because they generate the image in steps.
Auxiliary loss

A loss function, used in conjunction with a neural network model's main loss function, that helps accelerate training during the early iterations when weights are randomly initialized.

Average precision

A metric for summarizing the performance of a ranked sequence of results. Average precision is calculated by taking the average of the precision values for each relevant result (each result in the ranked list where the recall increases relative to the previous result).
Axis-aligned condition

In a decision tree, a condition that involves only a single feature. For example, if area is a feature, the following is an axis-aligned condition:

area > 200
B

Backpropagation
#Basic principles
The algorithm that implements gradient descent in neural networks.
Training a neural network involves many iterations of the following two-pass cycle:

During the forward pass, the system processes a batch of examples to yield predictions. The system compares each prediction to each label value; the difference between the prediction and the label value is the loss for that example. The system aggregates the losses for all the examples to compute the total loss for the current batch.

During the backward pass (backpropagation), the system reduces loss by adjusting the weights of the neurons in the hidden layers. Neural networks often contain many neurons across many hidden layers, and each of those neurons contributes to the overall loss in different ways. Backpropagation determines whether to increase or decrease the weights applied to particular neurons.

The learning rate is a multiplier that controls the degree to which each backward pass increases or decreases each weight. A large learning rate will increase or decrease each weight more than a small learning rate will.

In calculus terms, backpropagation implements the chain rule. That is, backpropagation calculates the partial derivative of the error with respect to each parameter.
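As a toy illustration of the chain rule at work, consider a single-weight model y = w * x with squared-error loss. A hand-rolled sketch (all values are illustrative):

```python
# One forward/backward pass for a one-weight model: prediction y = w * x,
# loss L = (y - label)**2.
w, x, label, learning_rate = 0.5, 2.0, 3.0, 0.1

# Forward pass.
y = w * x                    # prediction = 1.0
loss = (y - label) ** 2      # loss = 4.0

# Backward pass: the chain rule gives dL/dw = dL/dy * dy/dw.
dL_dy = 2 * (y - label)      # -4.0
dy_dw = x                    # 2.0
dL_dw = dL_dy * dy_dw        # -8.0

# Gradient descent step: move the weight against the gradient.
w -= learning_rate * dL_dw
print(w)                     # 1.3, closer to a weight that reduces the loss
```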
Years ago, ML practitioners had to write code to implement backpropagation. Modern ML APIs like Keras now implement backpropagation for you. Phew!

For more information, see Neural networks in Machine Learning Crash Course.

Bagging

Bagging is an abbreviation of bootstrap aggregating.

For more information, see Random forests in the Decision Forests course.
Bag of words

A representation of the words in a phrase or passage, irrespective of order. For example, bag of words represents the following three phrases identically:

- the dog jumps
- jumps the dog
- dog jumps the

Each word is mapped to an index in a sparse vector, where the vector has an index for every word in the vocabulary. For example, the phrase the dog jumps is mapped into a feature vector with non-zero values at the three indexes corresponding to the words the, dog, and jumps. The non-zero value can be any of the following:

- A 1 to indicate the presence of a word.
- A count of the number of times a word appears in the bag. For example, if the phrase were the maroon dog is a dog with maroon fur, then both maroon and dog would be represented as 2, while the other words would be represented as 1.
- Some other value, such as the logarithm of the count of the number of times a word appears in the bag.
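A minimal bag-of-words sketch using the count option (the vocabulary indices here are made up for illustration):

```python
from collections import Counter

vocab = {"the": 0, "maroon": 1, "dog": 2, "is": 3, "a": 4, "with": 5, "fur": 6}
phrase = "the maroon dog is a dog with maroon fur"

# Sparse representation: vocabulary index -> occurrence count.
counts = Counter(phrase.split())
sparse_vector = {vocab[word]: count for word, count in counts.items()}
print(dict(sorted(sparse_vector.items())))
# {0: 1, 1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1}  -- maroon and dog appear twice
```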
Baseline

A model used as a reference point for comparing how well another model (typically, a more complex one) is performing. For example, a logistic regression model might serve as a good baseline for a deep model.

For a particular problem, the baseline helps model developers quantify the minimal expected performance that a new model must achieve for the new model to be useful.
Batch
#Basic principles
The set of examples used in one training iteration. The batch size determines the number of examples in a batch.

See epoch for an explanation of how a batch relates to an epoch.

For more information, see Linear regression: Hyperparameters in Machine Learning Crash Course.
Batch Inference
#Google Cloud
The process of inferring predictions on multiple unlabeled examples divided into smaller subsets ("batches").

Batch inference can take advantage of the parallelization features of accelerator chips. That is, multiple accelerators can simultaneously infer predictions on different batches of unlabeled examples, dramatically increasing the number of inferences per second.

For more information, see Production ML systems: Static versus dynamic inference in Machine Learning Crash Course.
Batch normalization

Normalizing the input or output of the activation functions in a hidden layer. Batch normalization can provide the following benefits:

- Make neural networks more stable by protecting against outlier weights.
- Enable higher learning rates, which can speed up training.
- Reduce overfitting.
Batch Size
#Basic Principles
The number of examples in a batch. For example, if the batch size is 100, the model processes 100 examples per iteration.
The following are commonly used batch size strategies:

- Stochastic gradient descent (SGD), in which the batch size is 1.
- Full batch, in which the batch size is the number of examples in the entire training set. For example, if the training set contains a million examples, then the batch size would be a million examples. Full batch is usually an inefficient strategy.
- Mini-batch, in which the batch size is usually between 10 and 1,000. Mini-batch is usually the most efficient strategy.
For more information, see:
- Production ML systems: Static versus dynamic inference in Machine Learning Crash Course.

Bayesian neural network

A probabilistic neural network that accounts for uncertainty in weights and outputs. A standard neural network regression model typically predicts a scalar value; for example, a standard model predicts a house price of 853,000. In contrast, a Bayesian neural network predicts a distribution of values; for example, a Bayesian model predicts a house price of 853,000 with a standard deviation of 67,200.

A Bayesian neural network relies on Bayes' theorem to calculate uncertainties in weights and predictions. A Bayesian neural network can be useful when it is important to quantify uncertainty, such as in models related to pharmaceuticals. Bayesian neural networks can also help prevent overfitting.
Bayesian optimization

A technique for optimizing computationally expensive objective functions by instead optimizing a surrogate that quantifies the uncertainty using a Bayesian learning technique. Since Bayesian optimization is itself very expensive, it is usually used to optimize expensive-to-evaluate tasks that have a small number of parameters, such as selecting hyperparameters.
Bellman equation

In reinforcement learning, the optimal Q-function satisfies the following identity:

$$Q(s, a) = r(s, a) + \gamma \, \mathbb{E}_{s'}\left[\max_{a'} Q(s', a')\right]$$

Reinforcement learning algorithms apply this identity to create Q-learning using the following update rule:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
Beyond reinforcement learning, the Bellman equation also has applications to dynamic programming. See the Wikipedia entry for the Bellman equation.
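A sketch of the tabular Q-learning update rule above (the state/action sizes and the sample transition are stand-ins; a real agent would obtain transitions by interacting with an environment):

```python
import numpy as np

n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.9          # learning rate and discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """One application of the Q-learning update rule."""
    td_target = r + gamma * np.max(Q[s_next])  # r(s,a) + gamma * max_a' Q(s',a')
    Q[s, a] += alpha * (td_target - Q[s, a])

# Apply the update for one illustrative transition (s=0, a=1) -> (r=1.0, s'=2).
q_update(s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.1: the estimate moved toward the TD target
```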
BERT (Bidirectional Encoder Representations from Transformers)

A model architecture for text representation. A trained BERT model can act as part of a larger model for text classification or other ML tasks.

BERT has the following characteristics:

- Uses the Transformer architecture, and therefore relies on self-attention.
- Uses the encoder part of the Transformer. The encoder's job is to produce good text representations, rather than to perform a specific task such as classification.
- Is bidirectional.
- Uses masking for unsupervised training.

Variants of BERT include:
Bias (ethics/fairness)
#Basic principles
1. Stereotyping, prejudice, or favoritism towards some things, people, or groups over others. These biases can affect the collection and interpretation of data, the design of a system, and how users interact with a system. Forms of this type of bias include:

2. Systematic error introduced by a sampling or reporting procedure. Forms of this type of bias include:

Not to be confused with the bias term in machine learning models or with prediction bias.

For more information, see Fairness: Types of bias in Machine Learning Crash Course.
Bias (math) or bias term
#Basic principles
An intercept or offset from an origin. Bias is a parameter in machine learning models, symbolized by b.

For example, bias is the b in the following formula:

$$y' = b + w_1x_1 + w_2x_2 + \ldots + w_nx_n$$

In a simple two-dimensional line, bias just means the "y-intercept". [...]
[...]
[...]
[...]
A term used to describe a system that evaluates the text that both precedes and follows a target section of text. In contrast, a unidirectional system only evaluates the text that precedes a target section of text.

For example, consider a masked language model that must determine probabilities for the word or words representing the underline in the following question:

What is the _____ with you?

A unidirectional language model would have to base its probabilities only on the context provided by the words "What", "is", and "the". In contrast, a bidirectional language model could also gain context from "with" and "you", which might help the model generate better predictions.

Bidirectional language model

A language model that determines the probability that a given token is present at a given location in an excerpt of text based on the preceding and following text.

Bigram

An N-gram in which N=2.
Binary Classification
#Basic Principles
A type of classification task that predicts one of two mutually exclusive classes.

For example, the following two machine learning models each perform binary classification:

- A model that determines whether email messages are spam (the positive class) or not spam (the negative class).
- A model that evaluates medical symptoms to determine whether a person has a particular disease (the positive class) or doesn't have that disease (the negative class).
For more information, see Classification in Machine Learning Crash Course.
Binary condition

In a decision tree, a condition that has only two possible outcomes, typically yes or no. For example, the following is a binary condition:

temperature >= 100

For more information, see Types of conditions in the Decision Forests course.

Binning

Synonym for bucketing.
BLEU (Bilingual Evaluation Understudy)

A metric between 0.0 and 1.0 for evaluating the quality of machine translations between two human languages (for example, between English and Russian).
Boosting

A machine learning technique that iteratively combines a set of simple and not-very-accurate classifiers (referred to as "weak" classifiers) into a classifier with high accuracy (a "strong" classifier) by upweighting the examples that the model is currently misclassifying.

For more information, see Gradient boosted decision trees in the Decision Forests course.
Bounding box

In an image, the (x, y) coordinates of a rectangle around an area of interest, such as the dog in the image below.
Broadcasting

Expanding the shape of an operand in a matrix math operation to dimensions compatible with that operation. For example, linear algebra requires that the two operands in a matrix addition operation have the same dimensions, so you can't add a matrix of shape (m, n) to a vector of length n. Broadcasting enables this operation by virtually expanding the vector of length n to a matrix of shape (m, n), replicating the same values down each column.

For example, given the following definitions, linear algebra prohibits A+B because A and B have different dimensions:

A = [[7, 10, 4], [13, 5, 9]]
B = [2]

However, broadcasting enables the operation A+B by virtually expanding B to:

[[2, 2, 2], [2, 2, 2]]

Thus, A+B is now a valid operation:

[[7, 10, 4],    [[2, 2, 2],    [[ 9, 12, 6],
 [13, 5, 9]] +   [2, 2, 2]] =   [15, 7, 11]]

For more details, see the NumPy documentation on broadcasting.
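The same A+B operation from above, expressed with NumPy, which broadcasts automatically:

```python
import numpy as np

A = np.array([[7, 10, 4],
              [13, 5, 9]])
B = np.array([2])  # shape (1,)

# NumPy broadcasts B across every element of A, as if B were
# expanded to [[2, 2, 2], [2, 2, 2]].
print(A + B)
# [[ 9 12  6]
#  [15  7 11]]
```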
Bucketing
#Basic principles
Converting a single feature into multiple binary features called buckets or bins, typically based on a value range. The chopped feature is typically a continuous feature.

For example, instead of representing temperature as a single continuous floating-point feature, you could chop ranges of temperatures into discrete buckets, such as:

- <= 10 degrees would be the "cold" bucket.
- 11-24 degrees would be the "temperate" bucket.
- >= 25 degrees would be the "warm" bucket.

The model will treat every value in the same bucket identically. For example, the values 13 and 22 are both in the temperate bucket, so the model treats the two values identically.
If you represent temperature as a continuous feature, then the model treats temperature as a single feature. If you represent temperature as three buckets, then the model treats each bucket as a separate feature. That is, a model can learn separate relationships of each bucket to the label. For example, a linear regression model can learn separate weights for each bucket.

Increasing the number of buckets makes your model more complicated by increasing the number of relationships the model must learn. For example, the cold, temperate, and warm buckets are essentially three separate features for your model to train on. If you decide to add two more buckets, for example "freezing" and "hot", your model would now have to train on five separate features.

How do you know how many buckets to create, or what the ranges for each bucket should be? The answers typically require a fair amount of experimentation.

For more information, see Categorical data: Binning in Machine Learning Crash Course.
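A minimal sketch of this three-bucket scheme (the boundary values follow the buckets above; `temperature_bucket` is a hypothetical helper):

```python
def temperature_bucket(celsius: float) -> str:
    """Map a continuous temperature to one of three buckets."""
    if celsius <= 10:
        return "cold"
    elif celsius <= 24:
        return "temperate"
    return "warm"

# 13 and 22 land in the same bucket, so a model treats them identically.
print([temperature_bucket(t) for t in (5, 13, 22, 30)])
# ['cold', 'temperate', 'temperate', 'warm']
```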
C

Calibration layer

A post-prediction adjustment, typically to account for prediction bias. The adjusted predictions and probabilities should match the distribution of an observed set of labels.
Candidate generation
#Recommendation systems
The initial set of suggestions chosen by a recommendation system. Later, more expensive stages of the system narrow this initial set down to a much smaller set of recommendations.

For more information, see the Candidate generation overview in the Recommendation Systems course.
Candidate sampling

A training-time optimization that calculates a probability for all the positive labels using, for example, softmax, but only for a random sample of negative labels. For instance, given an example labeled beagle and dog, candidate sampling computes the predicted probabilities and corresponding loss terms for:

- The beagle and dog class outputs.
- A random subset of the remaining negative classes (for example, cat, lollipop, fence).

The idea is that the negative classes can learn from less-frequent negative reinforcement as long as the positive classes always get proper positive reinforcement, and this is indeed observed empirically.

Candidate sampling is more computationally efficient than training algorithms that compute predictions for all negative classes, particularly when the number of negative classes is very large.
Categorical data
#Basic principles
A feature having a specific set of possible values. For example, consider a categorical feature named lantern-state, whose value can only be one of a fixed set of states.

Categorical features are sometimes called discrete features.

For more information, see Working with categorical data in Machine Learning Crash Course.
Causal language model

Synonym for unidirectional language model.

See bidirectional language model to contrast different directional approaches in language modeling.
Centroid
#Clustering
The center of a cluster as determined by a k-means or k-median algorithm. For example, if k is 3, then the k-means or k-median algorithm finds 3 centroids.

For more information, see Clustering algorithms in the Clustering course.

Centroid-based clustering
#Clustering
A category of clustering algorithms that organizes data into nonhierarchical clusters. k-means is the most widely used centroid-based clustering algorithm.

For more information, see Clustering algorithms in the Clustering course.
Chain-of-thought prompting
#Generative AI
A prompt engineering technique that encourages a large language model (LLM) to explain its reasoning, step by step. For example, consider the following prompt:

How many g forces would a driver experience in a car that goes from 0 to 60 miles per hour in 7 seconds? In the answer, show all relevant calculations.

The LLM's response would likely:

- Show a sequence of physics formulas, plugging in the values 0, 60, and 7 in appropriate places.
- Explain why it chose those formulas and what the various variables mean.

Chain-of-thought prompting forces the LLM to perform all the calculations, which might lead to a more correct answer. In addition, chain-of-thought prompting enables the user to examine the LLM's steps to determine whether or not the answer makes sense.
Chat
#Generative AI
The contents of a back-and-forth dialogue with an ML system, typically a large language model. The previous interactions in a chat (what you typed and how the large language model responded) become the context for subsequent parts of the chat.

A chatbot is an application of a large language model.
Checkpoint

Data that captures the state of a model's parameters either during training or after training completes. For example, during training, you can:

- Stop training, perhaps intentionally or perhaps as the result of certain errors.
- Save the checkpoint.
- Later, reload the checkpoint.
- Resume training.

Class
#Basic principles
The category that a label can belong to. For example:
- In a binary classification model that detects spam, the two classes are spam and not spam.
- In a multi-class classification model that identifies dog breeds, the classes would be poodle, beagle, pug, and so on.

A classification model predicts a class. In contrast, a regression model predicts a number rather than a class.

For more information, see Classification in Machine Learning Crash Course.
Classification model
#Basic principles
A model whose prediction is a class. For example, the following are all classification models:

- A model that predicts an input sentence's language (French? Spanish? Italian?).
- A model that predicts tree species (Maple? Oak? Baobab?).
- A model that predicts the positive or negative class for a particular medical condition.

In contrast, regression models predict numbers rather than classes.

Two common types of classification models are binary classification and multi-class classification.
Classification threshold
#Basic principles
In binary classification, a number between 0 and 1 that converts the raw output of a logistic regression model into a prediction of either the positive class or the negative class. Note that the classification threshold is a value that a human chooses, not a value chosen by model training.
A logistic regression model outputs a raw value between 0 and 1:

- If the raw value is greater than the classification threshold, then the positive class is predicted.
- If the raw value is less than the classification threshold, then the negative class is predicted.
For example, suppose the classification threshold is 0.8. If the raw value is 0.9, then the model predicts the positive class. If the raw value is 0.7, then the model predicts the negative class.
The choice of classification threshold strongly influences the number of false positives and false negatives.

As models and datasets evolve, engineers may change the classification threshold. When the classification threshold changes, positive class predictions can suddenly become negative classes and vice versa.

For example, consider a binary classification disease prediction model. Suppose that when the system runs in the first year:

- The raw value for a particular patient is 0.95.
- The classification threshold is 0.94.

Therefore, the system diagnoses the positive class. (The patient gasps, "Oh no! I'm sick!")

A year later, perhaps the values are now as follows:

- The raw value for the same patient remains 0.95.
- The classification threshold has changed to 0.97.

The system therefore reclassifies the patient as the negative class. (Same patient, different diagnosis.)

For more information, see Thresholds and the confusion matrix in Machine Learning Crash Course.
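A minimal sketch of applying a classification threshold to raw model outputs (0.8, 0.9, and 0.7 are the values from the earlier example; `classify` is a hypothetical helper):

```python
def classify(raw_value: float, threshold: float = 0.8) -> str:
    """Convert a logistic regression output into a class prediction."""
    return "positive" if raw_value > threshold else "negative"

print(classify(0.9))  # positive
print(classify(0.7))  # negative
```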
Class-imbalanced dataset
#Basic principles
A dataset for a classification problem in which the total number of labels of each class differs significantly. For example, consider a binary classification dataset whose two labels are divided as follows:

- 1,000,000 negative labels
- 10 positive labels
This is a class-imbalanced dataset because the ratio of negative labels to positive labels is 100,000 to 1.

In contrast, the following dataset is not class-imbalanced because the ratio of negative labels to positive labels is relatively close to 1:

- 517 negative labels
- 483 positive labels

Multi-class datasets can also be class-imbalanced. For example, the following multi-class classification dataset is also class-imbalanced because one label has far more examples than the other two:

- 1,000,000 labels with class "green"
- 200 labels with class "purple"
- 350 labels with class "orange"
Clipping
#Basic principles
A technique for handling outliers by doing either or both of the following:

- Reducing feature values that are greater than a maximum threshold down to that maximum threshold.
- Increasing feature values that are less than a minimum threshold up to that minimum threshold.

For example, suppose that only a small fraction of values for a particular feature fall outside the range 40 to 60. In this case, you could:

- Clip all values over 60 (the maximum threshold) to be exactly 60.
- Clip all values under 40 (the minimum threshold) to be exactly 40.

For more information, see Numerical data: Normalization in Machine Learning Crash Course.
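A minimal sketch of clipping to the 40-60 range used in the example above (`clip` is a hypothetical helper):

```python
def clip(value: float, low: float = 40.0, high: float = 60.0) -> float:
    """Clip a feature value into the [low, high] range."""
    return max(low, min(high, value))

print([clip(v) for v in (12.0, 40.0, 55.0, 60.0, 198.0)])
# [40.0, 40.0, 55.0, 60.0, 60.0]
```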
Cloud TPU
#Google Cloud
A specialized hardware accelerator designed to speed up machine learning workloads on Google Cloud.
Clustering
#Clustering
Grouping related examples, particularly during unsupervised learning. Once all the examples are grouped, a human can optionally supply a meaning to each cluster.

For example, a clustering algorithm might group examples by some shared characteristic. A human researcher could then review the clusters and, for example, label cluster 1 as "dwarf trees" and cluster 2 as "full-size trees".

As another example, consider a clustering algorithm based on an example's distance from a center point:
See the Clustering lesson for more details.
Co-adaptation

When neurons predict patterns in training data by relying almost exclusively on the outputs of specific other neurons, instead of relying on the network's behavior as a whole. When the patterns that cause co-adaptation are not present in the validation data, then co-adaptation causes overfitting. Dropout regularization reduces co-adaptation because dropout ensures neurons cannot rely solely on specific other neurons.
Collaborative filtering
#Recommendation systems
Making predictions about a user's interests based on the interests of many other users. Collaborative filtering is often used in recommender systems.
See Collaborative Filtering in the Recommender Systems course for more information.
Concept drift

A shift in the relationship between features and the label. Over time, concept drift reduces a model's quality.

During training, the model learns the relationship between the features in the training set and the label. If the labels in the training set are good proxies for the real world, then the model should make good real-world predictions. However, due to concept drift, the model's predictions tend to degrade over time.

For example, consider a binary classification model that predicts whether or not a certain car model is "fuel efficient". That is, the features could be:

- Car weight
- Engine compression
- Transmission type

while the label is either:

- Fuel efficient
- Not fuel efficient

However, the concept of a "fuel-efficient car" keeps changing. A car model labeled fuel efficient in 1994 would almost certainly be labeled not fuel efficient in 2024. A model suffering from concept drift tends to make less and less useful predictions over time.

To compensate for concept drift, retrain models faster than the rate of concept drift. For example, if concept drift reduces model accuracy meaningfully every two months, retrain your model more frequently than every two months.
Condition

In a decision tree, any node that evaluates an expression. For example, the following portion of a decision tree contains two conditions:

A condition is also called a split or a test.

For more information, see Types of conditions in the Decision Forests course.

Confabulation

Synonym for hallucination. Confabulation is probably a more technically accurate term than hallucination, but hallucination became popular first.

Configuration
The process of assigning the initial property values used to train a model. Machine learning projects can handle configuration through special configuration files or through configuration libraries.

Confirmation bias

The tendency to search for, interpret, favor, and recall information in a way that confirms one's preexisting beliefs or hypotheses. Machine learning developers may inadvertently collect or label data in ways that influence an outcome supporting their existing beliefs. Confirmation bias is a form of implicit bias.

Experimenter's bias is a form of confirmation bias in which an experimenter continues training models until a preexisting hypothesis is confirmed.
Confusion matrix
#Basic principles
An NxN table that summarizes the number of correct and incorrect predictions that a classification model made. For example, consider the following confusion matrix for a binary classification model:

 | Tumor (predicted) | Non-tumor (predicted) |
---|---|---|
Tumor (ground truth) | 18 (TP) | 1 (FN) |
Non-tumor (ground truth) | 6 (FP) | 452 (TN) |

The preceding confusion matrix shows the following:

- Of the 19 predictions in which ground truth was tumor, the model correctly classified 18 and incorrectly classified 1.
- Of the 458 predictions in which ground truth was non-tumor, the model correctly classified 452 and incorrectly classified 6.
Confusion matrices for multi-class classification problems can help you identify patterns of mistakes. For example, consider the following confusion matrix for a 3-class model that categorizes three different iris types (Virginica, Versicolor, and Setosa). When the ground truth was Virginica, the confusion matrix shows that the model was far more likely to mistakenly predict Versicolor than Setosa:

 | Setosa (predicted) | Versicolor (predicted) | Virginica (predicted) |
---|---|---|---|
Setosa (ground truth) | 88 | 12 | 0 |
Versicolor (ground truth) | 6 | 141 | 7 |
Virginica (ground truth) | 2 | 27 | 109 |
As another example, a confusion matrix might reveal that a model trained to recognize handwritten digits tends to incorrectly predict 9 instead of 4, or 1 instead of 7.
A confusion matrix contains sufficient information to calculate a variety of performance metrics, including precision and recall.
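For example, precision and recall can be computed directly from the binary (tumor) confusion matrix above:

```python
# Precision and recall from the binary confusion matrix above.
tp, fn = 18, 1
fp, tn = 6, 452

precision = tp / (tp + fp)  # 18 / 24 = 0.75
recall = tp / (tp + fn)     # 18 / 19 ~= 0.947

print(f"precision: {precision:.3f}, recall: {recall:.3f}")
```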
Constituency parsing

Dividing a sentence into smaller grammatical structures ("constituents"). A later part of the ML system, such as a natural language understanding model, can parse the constituents more easily than the original sentence. For example, consider the following sentence:

My friend adopted two cats.

A constituency parser can divide this sentence into the following two constituents:

- My friend is a noun phrase.
- adopted two cats is a verb phrase.

These constituents can be further subdivided into smaller constituents. For example, the verb phrase adopted two cats can be further subdivided into:

- adopted is a verb.
- two cats is another noun phrase.
Contextualized language embedding
#Generative AI
An embedding that comes close to "understanding" words and phrases in ways that fluent human speakers can. Contextualized language embeddings can understand complex syntax, semantics, and context.

For example, consider embeddings of the English word cow. Older embeddings such as word2vec can represent English words such that the distance in the embedding space from cow to bull is similar to the distance from ewe (female sheep) to ram (male sheep), or from female to male. Contextualized language embeddings can go a step further by recognizing that English speakers sometimes casually use the word cow to mean either cow or bull.
Context window
#generativeAI
The number of tokens a model can process in a given prompt. The larger the context window, the more information the model can use to provide coherent and consistent responses to the prompt.

Continuous feature
#Basic principles
A floating-point feature with an infinite range of possible values, such as temperature or weight.
Convenience sampling

Using a dataset not gathered scientifically in order to run quick experiments. Later on, it's essential to switch to a scientifically gathered dataset.
Convergence
#Basic principles
A state reached when loss values change very little or not at all with each iteration. Convergence occurs when additional training won't improve the model.

In deep learning, loss values sometimes stay constant or nearly so for many iterations before finally descending. During a long period of constant loss values, you may temporarily get a false sense of convergence.

For more information, see Model convergence and loss curves in Machine Learning Crash Course.
Convex Function
A function in which the region above the graph of the function is a convex set. The prototypical convex function is shaped something like the letter U. For example, the following are all convex functions:

In contrast, the following function is not convex. Notice how the region above the graph is not a convex set:

A strictly convex function has exactly one local minimum point, which is also the global minimum point. The classic U-shaped functions are strictly convex. However, some convex functions (for example, straight lines) are not U-shaped.
Many common loss functions are convex, including:

Many variations of gradient descent are guaranteed to find a point close to the minimum of a strictly convex function. Similarly, many variations of stochastic gradient descent have a high probability (though not a guarantee) of finding a point close to the minimum of a strictly convex function.

The sum of two convex functions (for example, L2 loss + L1 regularization) is a convex function.

Deep models are never convex functions. Remarkably, algorithms designed for convex optimization tend to find reasonably good solutions on deep networks anyway, even though those solutions are not guaranteed to be global minima.
For more information, see Convex functions in Machine Learning Crash Course.

Convex optimization

The process of using mathematical techniques such as gradient descent to find the minimum of a convex function. A great deal of research in machine learning has focused on formulating various problems as convex optimization problems and on solving those problems more efficiently.

For complete details, see Boyd and Vandenberghe, Convex Optimization.
Convex set

A subset of Euclidean space such that a line drawn between any two points in the subset remains completely within the subset. For example, the following two shapes are convex sets:

In contrast, the following two shapes are not convex sets:
Convolution

In mathematics, casually speaking, a mixture of two functions. In machine learning, a convolution mixes the convolutional filter and the input matrix in order to train weights.

The term "convolution" in machine learning is often a shorthand way of referring to either a convolutional operation or a convolutional layer.

Without convolutions, a machine learning algorithm would have to learn a separate weight for every cell of a large tensor. For example, a machine learning algorithm training on 2K x 2K images would be forced to find 4M separate weights. Thanks to convolutions, a machine learning algorithm only has to find weights for every cell in the convolutional filter, dramatically reducing the memory needed to train the model. When the convolutional filter is applied, it is simply replicated across cells such that each is multiplied by the filter.

For more information, see Introducing convolutional neural networks in the Image Classification course.
Convolutional filter

One of the two actors in a convolutional operation. (The other actor is a slice of an input matrix.) A convolutional filter is a matrix having the same rank as the input matrix, but a smaller shape. For example, given a 28x28 input matrix, the filter could be any 2D matrix smaller than 28x28.

In photographic manipulation, all the cells in a convolutional filter are typically set to a constant pattern of ones and zeroes. In machine learning, convolutional filters are typically seeded with random numbers, and then the network trains the ideal values.
Convolutional layer
A layer of a deep neural network in which a convolutional filter passes along an input matrix. For example, the following animation shows a convolutional layer consisting of nine convolutional operations applied to a 5x5 input matrix. Notice that each convolutional operation works on a different 3x3 slice of the input matrix. The resulting 3x3 matrix (on the right) consists of the results of the nine convolutional operations:
For more information, see the Convolutional Layers section of the Image Classification course.
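As an illustration (not part of the glossary source), here is a minimal sketch of a convolutional layer in Keras; the filter count, kernel size, and input shape are made up:

```python
# A minimal sketch of a convolutional layer in Keras.
# The filter count, kernel size, and input shape are illustrative.
import tensorflow as tf

model = tf.keras.Sequential([
    # 4 convolutional filters, each a 3x3 matrix of trainable weights,
    # slid across a 28x28 single-channel input matrix.
    tf.keras.layers.Conv2D(filters=4, kernel_size=(3, 3), activation="relu",
                           input_shape=(28, 28, 1)),
])
model.summary()  # Each filter contributes 3*3*1 weights + 1 bias = 10 parameters.
```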
Convolutional neural network
#TensorFlow
A neural network in which at least one layer is a convolutional layer. A typical convolutional neural network consists of some combination of convolutional layers, pooling layers, and dense (fully connected) layers.
Convolutional neural networks have had great success on certain kinds of problems, such as image recognition.
Convolutional operation
The following two-step mathematical operation:
1. Element-wise multiplication of the convolutional filter and a slice of an input matrix. (The slice of the input matrix has the same rank and size as the convolutional filter.)
2. Summation of all the values in the resulting product matrix.
For example, let's consider the following 5x5 input matrix:
Now imagine the following 2x2 convolutional filter:
Each convolutional operation involves a single 2x2 slice of the input matrix. For example, suppose we use the 2x2 slice at the top-left of the input matrix. The convolutional operation on that slice looks as follows:
A convolutional layer consists of a series of convolutional operations, each acting on a different slice of the input matrix.
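To make the two steps concrete, here is a small sketch with made-up numbers (not the matrices from the missing figures):

```python
# A single convolutional operation: element-wise multiplication of a
# filter with a same-sized slice of the input matrix, then summation.
import numpy as np

input_matrix = np.array([[3, 5, 2],
                         [7, 1, 4],
                         [6, 0, 8]])
conv_filter = np.array([[1, 0],
                        [0, 1]])

top_left_slice = input_matrix[0:2, 0:2]       # [[3, 5], [7, 1]]
result = np.sum(top_left_slice * conv_filter)  # 3*1 + 5*0 + 7*0 + 1*1 = 4
print(result)  # 4
```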
Cost
Synonym for loss.
Co-training
A semi-supervised learning approach that is particularly useful when all of the following conditions are true:
- The ratio of unlabeled examples to labeled examples in the dataset is high.
- It is a classification problem (binary or multi-class).
- The dataset contains two different sets of predictive features that are independent of each other and complementary.
Co-training essentially amplifies independent signals into a stronger signal. For example, consider a classification model that categorizes the quality of used cars. One set of predictive features might focus on aggregate characteristics such as the make, model, and year of the car; another set might focus on the previous owner's driving record and the car's maintenance history.
The seminal paper on co-training is Combining Labeled and Unlabeled Data with Co-Training by Blum and Mitchell.
Counterfactual fairness
A fairness metric that checks whether a classifier produces the same result for one individual as it does for another individual who is identical to the first, except with respect to one or more sensitive attributes. Evaluating a classifier for counterfactual fairness is one method for surfacing potential sources of bias in a model.
See either of the following for details:
- "Fairness: Counterfactual fairness" in Machine Learning Crash Course.
- When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness
Coverage bias
See selection bias.
Crash blossom
#Basic principles
A sentence or phrase with an ambiguous meaning. Crash blossoms present a significant problem in natural language understanding. For example, the headline "Red Tape Holds Up Skyscraper" is a crash blossom because an NLU model could interpret it literally (the red tape is providing structural support for the skyscraper) or figuratively (excessive bureaucracy is delaying its construction).
Critic
Synonym for Deep Q-Network.
Cross-entropy
A generalization of Log Loss to multi-class classification problems. Cross-entropy quantifies the difference between two probability distributions.
Cross-validation
A mechanism for estimating how well a model would generalize to new data, by testing the model against one or more non-overlapping data subsets withheld from the training set.
Cumulative distribution function (CDF)
A function that defines the frequency of samples less than or equal to a target value. For example, consider a normal distribution of continuous values. A CDF tells you that approximately 50% of samples should be less than or equal to the mean, and that approximately 84% of samples should be less than or equal to one standard deviation above the mean.
D
Data analysis
Obtaining an understanding of data by considering samples, measurement, and visualization. Data analysis can be particularly useful when a dataset is first received, before building the first model. It is also crucial for understanding experiments and debugging problems with the system.
Data augmentation
Artificially boosting the range and number of training examples by transforming existing examples to create additional examples. For example, suppose images are one of your features, but your dataset doesn't contain enough image examples for the model to learn useful associations. Ideally, you'd add enough labeled images to your dataset to enable your model to train properly. If that's not possible, you can rotate, stretch, and reflect each image to produce many variants of the original picture, possibly yielding enough labeled data for excellent training.
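As a minimal, hedged illustration (the input tensor and the particular transformations are made up), image augmentation might look like this with tf.image:

```python
# A minimal sketch of image data augmentation with tf.image.
import tensorflow as tf

def augment(image):
    """Produces a randomly transformed variant of one training image."""
    image = tf.image.random_flip_left_right(image)          # mirror
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_crop(image, size=(24, 24, 3))   # crop, then ...
    image = tf.image.resize(image, (28, 28))                # ... stretch back
    return image

original = tf.random.uniform((28, 28, 3))  # stand-in for a real image
variant = augment(original)
```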
DataFrame
#Basic Principles
A common pandas data type for representing datasets in memory.
A DataFrame is similar to a table or spreadsheet. Each column in a DataFrame has a name (header), and each row is identified by a unique number.
Each column in a DataFrame is structured like a 2D array, except that each column can be assigned its own data type.
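As an illustration (the column names and values are made up, not from the glossary):

```python
# A minimal sketch of a pandas DataFrame.
import pandas as pd

df = pd.DataFrame({
    "temperature": [15, 19, 18],                   # int64 column
    "humidity": [0.47, 0.34, 0.92],                # float64 column
    "test_score": ["Good", "Excellent", "Poor"],   # object (string) column
})
print(df.dtypes)   # each column carries its own data type
print(df.loc[1])   # rows are addressed by a unique index
```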
Data parallelism
A method of scaling training or inference by replicating the entire model onto multiple devices and passing a subset of the input data to each device. Data parallelism allows training or inference at very large batch sizes. However, data parallelism requires the model to be small enough to fit on all devices.
Data parallelism typically speeds up training and inference.
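As a hedged sketch of one way to do this in TensorFlow (the model architecture is illustrative): MirroredStrategy replicates the model onto every visible GPU and splits each batch across the replicas.

```python
# A minimal sketch of data parallelism with tf.distribute.MirroredStrategy.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
# model.fit(...) now shards each global batch across all replicas.
```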
Dataset
#Basic Principles
A collection of raw data, commonly (but not exclusively) organized in one of the following formats:
- a spreadsheet
- a file in CSV (comma-separated values) format
Dataset API (tf.data)
#TensorFlow
A high-level TensorFlow API for reading data and transforming it into the form that a machine learning algorithm requires. A tf.data.Dataset object represents a sequence of elements, in which each element contains one or more tensors. A tf.data.Iterator object provides access to the elements of a Dataset.
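As an illustration (the feature values and batch size are made up):

```python
# A minimal sketch of the tf.data API.
import tensorflow as tf

features = tf.constant([[15.0, 47.0], [19.0, 34.0], [18.0, 92.0]])
labels = tf.constant([1.0, 2.0, 0.0])

dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.shuffle(buffer_size=3).batch(2)   # transform the sequence

for batch_features, batch_labels in dataset:        # iterate over elements
    print(batch_features.shape, batch_labels.shape)
```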
Decision boundary
The separator between classes learned by a model in a binary class or multi-class classification problem. For example, in the following image representing a binary classification problem, the decision boundary is the frontier between the orange class and the blue class:
Decision Forest
A model created from multiple decision trees. A decision forest makes a prediction by aggregating the predictions of its decision trees. Popular types of decision forests include random forests and gradient boosted trees.
For more information, see the Decision Forests section of the Decision Forests course.
Decision threshold
Synonym for classification threshold.
Decision tree
A supervised learning model composed of a set of conditions and leaves organized hierarchically. For example, the following is a decision tree:
Decoder
In general, any ML system that converts from a processed, dense, or internal representation to a rawer, sparser, or more external representation.
Decoders are often a component of a larger model, where they are frequently paired with an encoder.
In sequence-to-sequence tasks, a decoder starts with the internal state generated by the encoder to predict the next sequence.
See Transformer for the definition of a decoder within the Transformer architecture.
For more information, see "Large language models" in Machine Learning Crash Course.
Deep model
#Basic principles
A neural network containing more than one hidden layer. A deep model is also called a deep neural network.
Deep neural network
Synonym for deep model.
Deep Q-Network (DQN)
In Q-learning, a deep neural network that predicts Q-functions. Critic is a synonym for Deep Q-Network.
Demographic parity
A fairness metric that is satisfied if the results of a model's classification are not dependent on a given sensitive attribute.
For example, if both Lilliputians and Brobdingnagians apply to Glubbdubdrib University, demographic parity is achieved if the percentage of Lilliputians admitted is the same as the percentage of Brobdingnagians admitted, irrespective of whether one group is, on average, more qualified than the other.
Contrast with equalized odds and equality of opportunity, which permit classification results in aggregate to depend on sensitive attributes, but don't permit classification results for certain specified ground-truth labels to depend on sensitive attributes. See "Attacking discrimination with smarter machine learning" for a visualization exploring the tradeoffs when optimizing for demographic parity.
For more information, see "Fairness: Demographic parity" in Machine Learning Crash Course.
Denoising
A common approach to self-supervised learning in which:
1. Noise is artificially added to the dataset.
2. The model tries to remove the noise.
Denoising enables learning from unlabeled examples: the original dataset serves as the target or label, and the noisy data as the input.
Some masked language models use denoising as follows: noise is artificially added to an unlabeled sentence by masking some of the tokens, and the model then tries to predict the original tokens.
Dense feature
#Basic principles
A feature in which most or all values are nonzero, typically a tensor of floating-point values. For example, the following 10-element tensor is dense because nine of its values are nonzero:
Dense layer
Synonym for fully connected layer.
Depth
#Basic principles
The sum of the following in a neural network:
- the number of hidden layers
- the number of output layers (typically 1)
- the number of any embedding layers
For example, a neural network with five hidden layers and one output layer has a depth of 6.
Notice that the input layer doesn't influence depth.
Depthwise separable convolutional neural network (sepCNN)
A convolutional neural network architecture based on Inception, in which Inception modules are replaced with depthwise separable convolutions. Also known as Xception.
A depthwise separable convolution (also abbreviated as separable convolution) factors a standard 3D convolution into two separate convolution operations that are more computationally efficient: first, a depthwise convolution with a depth of 1 (n ✕ n ✕ 1), and then a pointwise convolution with length and width of 1 (1 ✕ 1 ✕ n).
Derived label
Synonym for proxy label.
device
#GoogleCloud
An overloaded term with the following two possible definitions:
- A category of hardware that can run a TensorFlow session, including CPUs, GPUs, and TPUs.
- When training an ML model on accelerator chips (GPUs or TPUs), the part of the system that actually manipulates tensors and embeddings. The device runs on accelerator chips. In contrast, the host typically runs on a CPU.
Differential privacy
An anonymization approach in machine learning to protect any sensitive data (for example, an individual's personal information) included in a model's training set from being exposed. This approach ensures that the model doesn't learn or remember much about any specific individual. This is accomplished by sampling and adding noise during model training to obscure individual data points, mitigating the risk of exposing sensitive training data.
Differential privacy is also used outside of machine learning. For example, data scientists sometimes use differential privacy to protect individual privacy when computing product usage statistics for different demographics.
Dimension reduction
Decreasing the number of dimensions used to represent a particular feature in a feature vector, typically by converting to an embedding vector.
Dimensions
An overloaded term having any of the following definitions:
- The number of levels of coordinates in a tensor. For example:
  - A scalar has zero dimensions.
  - A vector has one dimension; for example, [3, 5, 7, 11].
  - A matrix has two dimensions; for example, [[2, 4, 18], [5, 7, 14]].
You can uniquely specify a particular cell in a one-dimensional vector with one coordinate; you need two coordinates to uniquely specify a particular cell in a two-dimensional matrix.
Direct prompting
#generativeAI
Synonym for zero-shot prompting.
Discrete feature
#Basic principles
A feature with a finite set of possible values. For example, a feature whose values may only be animal, vegetable, or mineral is a discrete (or categorical) feature. Contrast with continuous feature.
Discriminative model
A model that predicts labels from a set of one or more features. More formally, discriminative models define the conditional probability of an output given the features and weights; that is:
p(output | features, weights)
For example, a model that predicts whether an email is spam from its features and weights is a discriminative model.
The vast majority of supervised learning models, including classification models and regression models, are discriminative models.
Discriminator
A system that determines whether examples are real or fake.
Alternatively, the subsystem within a generative adversarial network that determines whether the examples created by the generator are real or fake. For more information, see the Discriminator section in the GAN course.
Disparate impact
Making decisions about people that impact different population subgroups disproportionately. This usually refers to situations where an algorithmic decision-making process harms or benefits some subgroups more than others.
For example, suppose an algorithm that determines a Lilliputian's eligibility for a miniature-home loan is more likely to classify them as "ineligible" if their mailing address contains a certain postal code. If Big-Endian Lilliputians are more likely to have mailing addresses with this postal code than Little-Endian Lilliputians, then this algorithm may result in disparate impact.
Contrast with disparate treatment, which focuses on disparities that result when subgroup characteristics are explicit inputs to an algorithmic decision-making process.
Disparate treatment
Factoring subjects' sensitive attributes into an algorithmic decision-making process such that different subgroups of people are treated differently. For example, consider an algorithm that determines Lilliputians' eligibility for a miniature-home loan based on the data they provide in their loan application. If the algorithm uses a Lilliputian's affiliation as Big-Endian or Little-Endian as an input, it is enacting disparate treatment along that dimension.
Contrast with disparate impact, which focuses on disparities in the societal impacts of algorithmic decisions on subgroups, irrespective of whether those subgroups are inputs to the model.
Warning: Because sensitive attributes are almost always correlated with other features the data may have, explicitly removing sensitive attribute information doesn't guarantee that subgroups will be treated equally. For example, removing sensitive demographic attributes from a training dataset that still includes postal code as a feature may address disparate treatment of subgroups, but there still might be disparate impact on these groups because postal code might serve as a proxy for other demographic information.
Distillation
#generativeAI
The process of shrinking one model (known as the teacher) into a smaller model (known as the student) that emulates the original model's predictions as faithfully as possible. Distillation is useful because the smaller model has two key benefits over the larger (teacher) model:
- Faster inference time
- Reduced memory and energy usage
However, the student's predictions are typically not as good as the teacher's predictions.
Distillation trains the student model to minimize a loss function based on the difference between the outputs of the student's and the teacher's predictions.
For more information, see "LLMs: Fine-tuning, distillation, and prompt engineering" in Machine Learning Crash Course.
Distribution
The frequency and range of different values for a given feature or label. A distribution captures how likely a particular value is.
Understanding the distribution of each feature and label can help you determine how to normalize values and detect outliers.
Out-of-distribution refers to values that do not appear in a dataset or are very rare. For example, in a dataset consisting of images of cats, images of Saturn are considered out-of-distribution.
Divisive clustering
#clustering
See hierarchical clustering.
Downsampling
An overloaded term that can mean any of the following:
Reducing the amount of information in a feature in order to train a model more efficiently. For example, reducing the sampling of high-resolution images to a lower-resolution format before training an image recognition model.
Using a disproportionately low percentage of examples from an over-represented class in training, in order to improve model training on under-represented classes. For example, in a class-imbalanced dataset, models tend to learn a lot about the majority class and not enough about the minority class. Downsampling helps balance the amount of training on the majority and minority classes.
For more information, see "Datasets: Imbalanced datasets" in Machine Learning Crash Course.
Dropout regularization
A form of regularization useful in training neural networks. Dropout regularization removes a random selection of a fixed number of the units in a network layer for a single gradient step. The more units dropped out, the stronger the regularization. This is analogous to training the network to emulate an exponentially large ensemble of smaller networks. For more information, see Dropout: A Simple Way to Prevent Neural Networks from Overfitting.
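As a hedged sketch in Keras (the layer sizes and dropout rate are made up):

```python
# A minimal sketch of dropout regularization in Keras.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
    # During each training step, a random 30% of this layer's inputs
    # are zeroed out; dropout is disabled automatically at inference.
    tf.keras.layers.Dropout(rate=0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```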
Dynamic
Something done frequently or continuously. The terms dynamic and online are synonyms in machine learning. The following are common uses of dynamic and online in machine learning:
- A dynamic model (or online model) is a model that is retrained frequently or continuously.
- Dynamic training (or online training) is the process of training frequently or continuously.
- Dynamic inference (or online inference) is the process of generating predictions on demand.
Dynamic model
#Basic principles
A model that is frequently (perhaps even continuously) retrained. Also known as an online model.
E
Eager execution
#TensorFlow
A TensorFlow programming environment in which operations run immediately. In contrast, operations called in graph execution don't run until they are explicitly evaluated. Eager execution is an imperative interface, much like the code in most programming languages. Eager execution programs are generally far easier to debug than graph execution programs.
Early stopping
#Basic principles
A regularization method that involves ending training before training loss finishes decreasing. In early stopping, you intentionally stop training the model when the loss on a validation dataset starts to increase, that is, when generalization performance worsens.
Click on the icon for additional explanation.
Early stopping may seem counterintuitive. After all, stopping training a model while the loss continues to decrease is like telling a chef to stop cooking before the cake is fully baked. However, training a model for too long can lead to overfitting. That is, training a model for too long can cause the model to overfit the training data and not make good predictions for new examples.
Earth mover's distance (EMD)
A measure of the relative similarity of two distributions. The lower the earth mover's distance, the more similar the distributions.
Edit distance
A measure of how similar two text strings are to each other. In machine learning, edit distance is useful because it is simple to compute, and it is an effective way to compare two strings known to be similar or to find strings similar to a given string.
There are several definitions of edit distance, each using different string operations. For example, the Levenshtein distance considers the fewest deletion, insertion, and substitution operations.
For example, the Levenshtein distance between the words "heart" and "darts" is 3, because the following three edits are the fewest changes that turn one word into the other:
heart → deart (replace "h" with "d")
deart → dart (delete "e")
dart → darts (insert "s")
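As an illustration, here is a compact sketch of the classic dynamic-programming algorithm for Levenshtein distance (not code from the glossary source):

```python
# Levenshtein distance via dynamic programming: the fewest deletions,
# insertions, and substitutions needed to turn one string into another.
def levenshtein(a: str, b: str) -> int:
    # prev[j] holds the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("heart", "darts"))  # 3
```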
Einsum notation
An efficient notation for describing how two tensors are to be combined. The tensors are combined by multiplying the elements of one tensor by the elements of the other tensor and then summing the products. Einsum notation uses symbols to identify the axes of each tensor, and those same symbols are rearranged to specify the shape of the new resulting tensor.
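As an illustration with made-up matrices: in the subscript string below, "ij,jk->ik" says to multiply along the shared axis j, sum the products, and leave a result with axes i and k (ordinary matrix multiplication).

```python
# A minimal sketch of Einsum notation with NumPy.
import numpy as np

a = np.array([[1, 2],
              [3, 4]])   # axes: i, j
b = np.array([[5, 6],
              [7, 8]])   # axes: j, k

c = np.einsum("ij,jk->ik", a, b)
print(c)  # [[19 22], [43 50]]
```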
Embedding layer
#Basic principles
A special hidden layer that trains on a high-dimensional categorical feature to gradually learn a lower-dimensional embedding vector. An embedding layer enables a neural network to train far more efficiently than training just on the high-dimensional categorical feature.
In some cases, hashing can be used instead of an embedding layer.
Embedding space
#Basic principles
The d-dimensional vector space that features from a higher-dimensional vector space are mapped to. The dot product of two embeddings is a measure of their similarity.
Empirical risk minimization (ERM)
Choosing the function that minimizes loss on the training set. Contrast with structural risk minimization.
Encoder
In general, any ML system that converts from a raw, sparse, or external representation into a more processed, denser, or more internal representation.
Encoders are often a component of a larger model, where they are frequently paired with a decoder. Some Transformers pair encoders with decoders, though other Transformers use only the encoder or only the decoder.
Some systems use the encoder's output as the input to a classification or regression network.
In sequence-to-sequence tasks, an encoder takes an input sequence and returns an internal state (a vector). Then, the decoder uses that internal state to predict the next sequence.
See Transformer for the definition of an encoder in the Transformer architecture.
For more information, see "LLMs: What's a large language model?" in Machine Learning Crash Course.
Ensemble
A collection of models trained independently whose predictions are averaged or aggregated. In many cases, an ensemble produces better predictions than a single model. For example, a random forest is an ensemble built from multiple decision trees. Note that not all decision forests are ensembles.
For more information, see Random Forests in Machine Learning Crash Course.
Entropy
In information theory, a description of how unpredictable a probability distribution is. Alternatively, entropy is also defined as how much information each example contains. A distribution has the highest possible entropy when all values of a random variable are equally likely.
The entropy of a set with two possible values, "0" and "1" (for example, the labels in a binary classification problem), has the following formula:
H = -p log p - q log q = -p log p - (1-p) * log (1-p)
where:
- H is the entropy.
- p is the fraction of "1" examples.
- q is the fraction of "0" examples. Note that q = (1 - p).
- log is generally log2. In this case, the entropy unit is bits.
For example, suppose the following:
- 100 examples contain the value "1"
- 300 examples contain the value "0"
Therefore, the entropy value is:
- p = 0.25
- q = 0.75
- H = (-0.25)log2(0.25) - (0.75)log2(0.75) = 0.81 bits per example
A perfectly balanced set (for example, 200 "0"s and 200 "1"s) would have an entropy of 1.0 bit per example. As a set becomes more imbalanced, its entropy moves towards 0.0.
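As a small check of the arithmetic above (not part of the glossary source), the following sketch reproduces the worked example:

```python
# Entropy of a set with 100 "1"s and 300 "0"s is about 0.81 bits/example.
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a two-valued distribution."""
    q = 1.0 - p
    if p in (0.0, 1.0):
        return 0.0  # a pure set is perfectly predictable
    return -p * math.log2(p) - q * math.log2(q)

print(round(binary_entropy(100 / 400), 2))  # 0.81
print(binary_entropy(0.5))                  # 1.0 for a balanced set
```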
In decision trees, entropy helps formulate information gain, which helps the splitter select the conditions during the growth of a classification decision tree.
Compare entropy with Gini impurity.
Entropy is often called Shannon entropy.
For more information, see "Exact splitter for binary classification with numerical features" in the Decision Forests course.
Environment
In reinforcement learning, the world that contains the agent and allows the agent to observe that world's state. For example, the represented world can be a game like chess, or a physical world like a maze. When the agent applies an action to the environment, the environment transitions between states.
Epoch
#Basic principles
A full training pass over the entire training set such that each example has been processed once.
An epoch represents N/(batch size) training iterations, where N is the total number of examples.
For example, assume the following:
- The dataset consists of 1,000 examples.
- The batch size is 50 examples.
Therefore, one epoch requires 20 iterations.
For more information, see "Linear regression: Hyperparameters" in Machine Learning Crash Course.
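A tiny sketch of the arithmetic above:

```python
# Iterations per epoch = number of examples / batch size.
num_examples = 1_000
batch_size = 50

iterations_per_epoch = num_examples // batch_size
print(iterations_per_epoch)  # 20
```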
Epsilon greedy policy
In reinforcement learning, a policy that either follows a random policy with epsilon probability or a greedy policy otherwise. For example, if epsilon is 0.9, then the policy follows a random policy 90% of the time and a greedy policy 10% of the time.
Over successive episodes, the algorithm reduces epsilon's value in order to shift from following a random policy to following a greedy policy. By shifting the policy, the agent first randomly explores the environment and then greedily exploits the results of that random exploration.
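As a hedged illustration (the Q-values and decay schedule below are made up):

```python
# A minimal sketch of epsilon-greedy action selection with decay.
import random

def epsilon_greedy(q_values: list[float], epsilon: float) -> int:
    """Returns a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit

epsilon = 0.9
for episode in range(100):
    action = epsilon_greedy([0.1, 0.5, 0.2], epsilon)
    epsilon = max(0.05, epsilon * 0.97)  # decay toward greedy behavior
```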
Equality of opportunity
A fairness metric that assesses whether a model predicts the desirable outcome equally well for all values of a sensitive attribute. In other words, if the desirable outcome for a model is the positive class, the goal is for the true positive rate to be the same for all groups.
Equality of opportunity is related to equalized odds, which requires that both the true positive rates and the false positive rates be the same for all groups.
For example, suppose Glubbdubdrib University admits both Lilliputians and Brobdingnagians to a rigorous mathematics program. Lilliputians' secondary schools offer a robust curriculum of math classes, and the vast majority of students are qualified for the university program. Brobdingnagians' secondary schools don't offer math classes at all, and as a result, far fewer of their students are qualified. Equality of opportunity is satisfied for the preferred label of "admitted" with respect to nationality (Lilliputian or Brobdingnagian) if qualified students are equally likely to be admitted, irrespective of whether they're a Lilliputian or a Brobdingnagian.
For example, suppose 100 Lilliputians and 100 Brobdingnagians apply to Glubbdubdrib University, and admissions decisions are made as follows:
Table 1. Lilliputian applicants (90% are qualified)

|          | Qualified | Unqualified |
|----------|-----------|-------------|
| Admitted | 45        | 3           |
| Rejected | 45        | 7           |
| Total    | 90        | 10          |

Percentage of qualified students admitted: 45/90 = 50%. Percentage of unqualified students rejected: 7/10 = 70%. Total percentage of Lilliputian students admitted: (45+3)/100 = 48%.
Table 2. Brobdingnagian applicants (10% are qualified)

|          | Qualified | Unqualified |
|----------|-----------|-------------|
| Admitted | 5         | 9           |
| Rejected | 5         | 81          |
| Total    | 10        | 90          |

Percentage of qualified students admitted: 5/10 = 50%. Percentage of unqualified students rejected: 81/90 = 90%. Total percentage of Brobdingnagian students admitted: (5+9)/100 = 14%.
The preceding examples satisfy equality of opportunity for acceptance of qualified students, because qualified Lilliputians and Brobdingnagians both have a 50% chance of being admitted.
While equality of opportunity is satisfied, the following two fairness metrics are not satisfied:
- Demographic parity: Lilliputians and Brobdingnagians are admitted to the university at different rates; 48% of Lilliputian students are admitted, but only 14% of Brobdingnagian students are admitted.
- Equalized odds: While qualified Lilliputian and Brobdingnagian students both have the same chance of being admitted, the additional constraint that unqualified Lilliputians and Brobdingnagians both have the same chance of being rejected is not satisfied. Unqualified Lilliputians have a 70% rejection rate, whereas unqualified Brobdingnagians have a 90% rejection rate.
For more information, see "Fairness: Equality of opportunity" in Machine Learning Crash Course.
Equalized odds
A fairness metric that assesses whether a model predicts outcomes equally well for all values of a sensitive attribute with respect to both the positive class and the negative class, not just one class or the other exclusively. In other words, both the true positive rate and the false negative rate should be the same for all groups.
Equalized odds is related to equality of opportunity, which only focuses on error rates for a single class (positive or negative).
For example, suppose Glubbdubdrib University admits both Lilliputians and Brobdingnagians to a rigorous mathematics program. Lilliputians' secondary schools offer a robust curriculum of math classes, and the vast majority of students are qualified for the university program. Brobdingnagians' secondary schools don't offer math classes at all, and as a result, far fewer of their students are qualified. Equalized odds is satisfied if, no matter whether an applicant is a Lilliputian or a Brobdingnagian, qualified applicants are equally likely to be admitted to the program, and unqualified applicants are equally likely to be rejected.
Suppose 100 Lilliputians and 100 Brobdingnagians apply to Glubbdubdrib University, and admissions decisions are made as follows:
Table 3. Lilliputian applicants (90% are qualified)

|          | Qualified | Unqualified |
|----------|-----------|-------------|
| Admitted | 45        | 2           |
| Rejected | 45        | 8           |
| Total    | 90        | 10          |

Percentage of qualified students admitted: 45/90 = 50%. Percentage of unqualified students rejected: 8/10 = 80%. Total percentage of Lilliputian students admitted: (45+2)/100 = 47%.
Table 4. Brobdingnagian applicants (10% are qualified)

|          | Qualified | Unqualified |
|----------|-----------|-------------|
| Admitted | 5         | 18          |
| Rejected | 5         | 72          |
| Total    | 10        | 90          |

Percentage of qualified students admitted: 5/10 = 50%. Percentage of unqualified students rejected: 72/90 = 80%. Total percentage of Brobdingnagian students admitted: (5+18)/100 = 23%.
Equalized odds is satisfied because qualified Lilliputian and Brobdingnagian students both have a 50% chance of being admitted, and unqualified Lilliputians and Brobdingnagians both have an 80% chance of being rejected.
Note: While equalized odds is satisfied here, demographic parity is not. Lilliputian and Brobdingnagian students are admitted to Glubbdubdrib University at different rates: 47% of Lilliputian students are admitted, and 23% of Brobdingnagian students are admitted.
Note: Equalized odds contrasts with the more relaxed equality of opportunity metric.
Estimator
#TensorFlow
A deprecated TensorFlow API. Use tf.keras instead of Estimators.
Evals
#generativeAI
Primarily used as an abbreviation for LLM evaluations. More broadly, evals is an abbreviation for any form of evaluation.
Evaluation
#generativeAI
The process of measuring a model's quality or comparing different models against each other.
To evaluate a supervised machine learning model, you typically judge it against a validation set and a test set. Evaluating an LLM typically involves broader quality and safety assessments.
Example
#Basic principles
The values of one row of features and possibly a label. Examples in supervised learning fall into two general categories:
- A labeled example consists of one or more features and a label. Labeled examples are used during training.
- An unlabeled example consists of one or more features but no label. Unlabeled examples are used during inference.
For example, suppose you want to train a model to determine the impact of weather on students' test scores. Here are three labeled examples:
| Temperature | Humidity | Pressure | Test score (label) |
|-------------|----------|----------|--------------------|
| 15          | 47       | 998      | Good               |
| 19          | 34       | 1020     | Excellent          |
| 18          | 92       | 1012     | Poor               |

Here are three unlabeled examples:

| Temperature | Humidity | Pressure |
|-------------|----------|----------|
| 12          | 62       | 1014     |
| 21          | 47       | 1017     |
| 19          | 41       | 1021     |
The row of a dataset is typically the raw source of an example. That is, an example typically consists of a subset of the columns in the dataset. Furthermore, the features in an example can also include synthetic features, such as feature crosses.
For more information, see the Supervised Learning section of the Introduction to Machine Learning course.
Experience replay
In reinforcement learning, a DQN technique used to reduce temporal correlations in training data. The agent stores state transitions in a replay buffer, and then samples transitions from the replay buffer to create training data.
Experimenter's bias
See confirmation bias.
Exploding gradient problem
The tendency for gradients in deep neural networks (especially recurrent neural networks) to become surprisingly steep (high). Steep gradients often cause very large updates to the weights of each node in a deep neural network.
Models suffering from the exploding gradient problem become difficult or impossible to train. Gradient clipping can mitigate this problem.
F
F1
A binary classification metric that combines both precision and recall. Here is the formula:

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

For example, given the following:
- precision = 0.6
- recall = 0.4

$$F_1 = \frac{2 \cdot 0.6 \cdot 0.4}{0.6 + 0.4} = 0.48$$

When precision and recall are fairly similar (as in the preceding example), F1 is close to their mean. When precision and recall differ significantly, F1 is closer to the lower value. For example, given:
- precision = 0.9
- recall = 0.1

$$F_1 = \frac{2 \cdot 0.9 \cdot 0.1}{0.9 + 0.1} = 0.18$$
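A small check of the two calculations above (not part of the glossary source):

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.6, 0.4), 2))  # 0.48 (close to the mean of similar values)
print(round(f1(0.9, 0.1), 2))  # 0.18 (pulled toward the lower value)
```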
Fairness constraints
Applying constraints to an algorithm to ensure that one or more definitions of fairness are satisfied. Examples of fairness constraints include:
- Post-processing your model's output.
- Altering the loss function to incorporate a penalty for violating a fairness metric.
- Directly adding a mathematical constraint to an optimization problem.
Fairness metric
A mathematical definition of "fairness" that is measurable. Some commonly used fairness metrics include:
- equalized odds
- predictive parity
- counterfactual fairness
- demographic parity
Many fairness metrics are mutually exclusive; see incompatibility of fairness metrics.
False negative (FN)
#Basic principles
An example where a model incorrectly predicts the negative class. For example, the model predicts that a particular email message is not unwanted (negative class), but in fact the email is unwanted.
False negative rate
The proportion of actual positive examples for which the model mistakenly predicted the negative class. The following formula calculates the false negative rate:

$$\text{false negative rate} = \frac{\text{false negatives}}{\text{false negatives} + \text{true positives}}$$
For more information, see "Thresholds and the confusion matrix" in Machine Learning Crash Course.
False Positive (FP)
#Basic Principles
An example in which the model mistakenly predicts the positive class. For example, the model predicts that a particular email message is spam (the positive class), but that email message is actually not spam.
For more information, see "Thresholds and the confusion matrix" in Machine Learning Crash Course.
False positive rate
#Basic principle
The proportion of actual negative examples for which the model mistakenly predicted the positive class. The following formula calculates the false positive rate:

$$\text{false positive rate} = \frac{\text{false positives}}{\text{false positives} + \text{true negatives}}$$
The false positive rate is the X-axis of the ROC curve.
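As an illustration with made-up confusion-matrix counts:

```python
# Computing false positive and false negative rates from raw counts.
tp, fp, tn, fn = 40, 5, 50, 10

false_positive_rate = fp / (fp + tn)   # 5 / 55  ≈ 0.09
false_negative_rate = fn / (fn + tp)   # 10 / 50 = 0.2
print(round(false_positive_rate, 2), false_negative_rate)
```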
For more information, see "Classification: ROC and AUC" in Machine Learning Crash Course.
Feature
#Basic principles
An input variable to a machine learning model. An example consists of one or more features. For example, suppose you are training a model to determine the influence of weather conditions on student test scores. The following table shows three examples, each of which contains three features and one label:
| Temperature | Humidity | Pressure | Test score (label) |
|-------------|----------|----------|--------------------|
| 15          | 47       | 998      | Good               |
| 19          | 34       | 1020     | Excellent          |
| 18          | 92       | 1012     | Poor               |
For more information, see Supervised Learning in the Introduction to Machine Learning course.
Feature cross
#Basic principles
A synthetic feature formed by "crossing" categorical or bucketed features.
For example, consider a "mood forecasting" model that represents temperature in one of the following four buckets:
- freezing
- chilly
- temperate
- warm
And represents wind speed in one of the following three buckets:
- still
- light
- windy
Without feature crosses, the linear model trains independently on each of the preceding seven buckets. So the model trains on, for example, freezing independently of the training on, for example, windy.
Alternatively, you could create a feature cross of temperature and wind speed. This synthetic feature would have the following 12 possible values:
- freezing-still
- freezing-light
- freezing-windy
- chilly-still
- chilly-light
- chilly-windy
- temperate-still
- temperate-light
- temperate-windy
- warm-still
- warm-light
- warm-windy
Thanks to this feature cross, the model can learn mood differences between a freezing-windy day and a freezing-still day.
If you create a synthetic feature from two features that each have a lot of different buckets, the resulting feature cross will have a huge number of possible combinations. For example, if one feature has 1,000 buckets and the other feature has 2,000 buckets, the resulting feature cross has 2,000,000 buckets.
Formally, a cross is a Cartesian product.
Feature crosses are mostly used with linear models and are rarely used with neural networks.
For more information, see "Categorical data: Feature crosses" in Machine Learning Crash Course.
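As a small illustration of the Cartesian-product idea (the buckets are the ones from the example above):

```python
# A feature cross is the Cartesian product of two bucketed features.
from itertools import product

temperature_buckets = ["freezing", "chilly", "temperate", "warm"]
wind_buckets = ["still", "light", "windy"]

feature_cross = [f"{t}-{w}"
                 for t, w in product(temperature_buckets, wind_buckets)]
print(len(feature_cross))  # 12 possible values
print(feature_cross[:3])   # ['freezing-still', 'freezing-light', 'freezing-windy']
```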
Feature engineering
#Basic principles #TensorFlow
A process that involves the following steps:
1. Determining which features might be useful in training a model.
2. Converting raw data from the dataset into efficient versions of those features.
For example, you might determine that temperature could be a useful feature. Then, you might experiment with bucketing to optimize what the model can learn from different temperature ranges.
Click the icon to see more notes about TensorFlow.
For more information, see "Numerical data: How a model ingests data using feature vectors" in Machine Learning Crash Course.
Feature extraction
An overloaded term having either of the following definitions:
1. Retrieving intermediate feature representations calculated by an unsupervised or pretrained model (for example, hidden layer values in a neural network) for use in another model as input.
2. Synonym for feature engineering.
Feature importances
Synonym for variable importances.
Feature set
#Basic principles
The group of features your machine learning model trains on. For example, postal code, property size, and property condition might comprise a simple feature set for a model that predicts housing prices.
Feature spec
#TensorFlow
Describes the information required to extract feature data from the tf.Example protocol buffer. Because the tf.Example protocol buffer is just a container for data, you must specify the following:
- The data to extract (that is, the keys for the features)
- The data type (for example, float or int)
- The length (fixed or variable)
Feature vector
#Basic principles
The array of feature values comprising an example. The feature vector is input during training and during inference. For example, the feature vector for a model with two discrete features might be:
[0.92, 0.56]
Each example supplies different values to the feature vector, so the feature vector for the next example could be something like:
[0.73, 0.49]
Feature engineering determines how to represent features in the feature vector. For example, a binary categorical feature with five possible values might be represented with one-hot encoding. In that case, the portion of the feature vector for a particular example would consist of four zeroes and a single 1.0 in the third position, as follows:
[0.0, 0.0, 1.0, 0.0, 0.0]
As another example, suppose your model consists of three features:
- A binary categorical feature with five possible values represented with one-hot encoding; for example: [0.0, 1.0, 0.0, 0.0, 0.0]
- Another binary categorical feature with three possible values represented with one-hot encoding; for example: [0.0, 0.0, 1.0]
- A floating-point feature; for example: 8.3
In this case, the feature vector for each example would be represented by nine values. Given the example values in the preceding list, the feature vector would be:
[0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 8.3]
For more information, see "Numerical data: How a model ingests data using feature vectors" in Machine Learning Crash Course.
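A small sketch assembling the nine-value feature vector above from one-hot encoded categorical features plus a floating-point feature (not code from the glossary source):

```python
# Building a feature vector by concatenating one-hot segments.
def one_hot(index: int, num_values: int) -> list[float]:
    vec = [0.0] * num_values
    vec[index] = 1.0
    return vec

feature_vector = (one_hot(1, 5)     # categorical feature with 5 values
                  + one_hot(2, 3)   # categorical feature with 3 values
                  + [8.3])          # floating-point feature
print(feature_vector)
# [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 8.3]
```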
Featurization
The process of extracting features from an input source, such as a document or video, and mapping those features into a feature vector.
Federated learning
A distributed machine learning approach that trains machine learning models using decentralized examples residing on devices such as smartphones. In federated learning, a subset of devices downloads the current model from a central coordinating server. The devices use the examples stored on the devices to make improvements to the model. The devices then upload the model improvements (but not the training examples) to the coordinating server, where they are aggregated with other updates to yield an improved global model. After the aggregation, the model updates computed by devices are no longer needed and can be discarded.
Because the training examples are never uploaded, federated learning follows the privacy principles of focused data collection and data minimization.
For more details about federated learning, see this tutorial.
Feedback loop
#Basic principles
In machine learning, a situation in which a model's predictions influence the training data for the same model or another model. For example, a model that recommends movies will influence the movies that people see, which will then influence subsequent movie recommendation models.
Feedforward neural network (FFN)
A neural network without cyclic or recursive connections. For example, traditional deep neural networks are feedforward neural networks. Contrast with recurrent neural networks, which are cyclic.
Few-shot learning
A machine learning approach, often used for object classification, designed to train effective classifiers from only a small number of training examples.
Few-shot prompting
#generativeAI
A prompt that contains more than one (a "few") example demonstrating how the large language model should respond. For example, the following lengthy prompt contains two examples showing a large language model how to answer a query:
| Parts of one prompt | Notes |
|---|---|
| What is the official currency of the specified country? | The question you want the LLM to answer. |
| France: EUR | One example. |
| United Kingdom: GBP | Another example. |
| India: | The actual query. |
Few-shot prompting generally produces more desirable results than zero-shot prompting and one-shot prompting. However, few-shot prompting requires a lengthier prompt.
For more information, see "Prompt engineering" in Machine Learning Crash Course.
Fiddle
A Python-first configuration library that sets the values of functions and classes without invasive code or infrastructure. In the case of Pax and other ML codebases, these functions and classes represent models and training hyperparameters.
Fiddle assumes that machine learning codebases are typically divided into:
- Library code, which defines the layers and optimizers.
- Dataset "glue" code, which calls the libraries and wires everything together.
Fiddle captures the call structure of the glue code in an unevaluated and mutable form.
Fine-tuning
#generativeAI
A second, task-specific training pass performed on a pre-trained model to refine its parameters for a specific use case. For example, the full training sequence for some large language models is as follows:
1. Pre-training: Train a large language model on a vast general dataset, such as all the English-language Wikipedia pages.
2. Fine-tuning: Train the pre-trained model to perform a specific task, such as responding to medical queries. Fine-tuning typically involves hundreds or thousands of examples focused on the specific task.
As another example, the full training sequence for a large image model is as follows:
1. Pre-training: Train a large image model on a vast general image dataset, such as all the images in Wikimedia Commons.
2. Fine-tuning: Train the pre-trained model to perform a specific task.
Fine-tuning can entail any combination of the following strategies:
- Modifying only some of the model's existing parameters (typically the layers closest to the output layer), while leaving other existing parameters unchanged (typically the layers closest to the input layer). See parameter-efficient tuning.
- Adding additional layers, typically on top of the existing layers closest to the output layer.
Fine-tuning is a form of transfer learning. As such, fine-tuning might use a different loss function or a different model type than those used to train the pre-trained model. For example, you could fine-tune a pre-trained large image model to produce a regression model that returns the number of birds in an input image.
Compare and contrast fine-tuning with the following terms:
For more information, see Fine-tuning in Machine Learning Crash Course.
Flaxformer
An open-source Transformer library, built on Flax, designed primarily for natural language processing and multimodal research.
Forget gate
The portion of a long short-term memory cell that regulates the flow of information through the cell. Forget gates maintain context by deciding which information to discard from the cell state.
Full softmax
Synonym for softmax. Contrast with candidate sampling.
Fully Connected Layer
A hidden layer in which each node is connected to every node in the next hidden layer.
Fully connected layers are also called dense layers.
Function transformation
A function that takes a function as input and returns a transformed function as output. JAX uses function transformations.
G
GAN
Abbreviation for generative adversarial network.
Generalization
#Basic principles
A model's ability to make correct predictions on new, previously unseen data. A model that can generalize is the opposite of a model that overfits.
Click on the icon to see additional information.
You train a model on the examples in the training set, so the model learns the peculiarities of the data in the training set. Generalization essentially asks whether your model can make good predictions on examples that are not in the training set.
To promote generalization, regularization helps a model train less exactly to the peculiarities of the data in the training set.
Gemini
#generativeAI
An ecosystem that includes Google's cutting-edge AI. Elements of this ecosystem include:
Gemini models.
A conversational chat interface for Gemini models, in which a user types prompts and Gemini responds to them.
Various Gemini APIs.
Various business products based on Gemini models. For example, Gemini for Google Cloud.
#Basic principles #generativeAI
Users can interact with Gemini models in a variety of ways, including through an interactive dialogue interface and through SDKs.
Generalization curve
#Basic principles
A plot of both training loss and validation loss as a function of the number of iterations.
A generalization curve can help you detect possible overfitting. For example, the following generalization curve suggests overfitting because validation loss ultimately becomes significantly higher than training loss.
Generalized linear model
A generalization of least squares regression models, which are based on Gaussian noise, to other types of models based on other types of noise, such as Poisson noise or categorical noise. Examples of generalized linear models include:
- logistic regression
- multi-class regression
- least squares regression
The parameters of a generalized linear model can be found through convex optimization.
Generalized linear models exhibit the following properties:
- The average probability predicted by the optimal logistic regression model is equal to the average label on the training data.
The power of a generalized linear model is limited by its features. Unlike a deep model, a generalized linear model cannot "learn new features."
Generative adversarial network (GAN)
A system to create new data, in which a generator creates data and a discriminator determines whether the created data is valid or invalid.
Generative AI
#generativeAI
An emerging transformative field with no formal definition. That said, most experts agree that generative AI models can create ("generate") content that is all of the following:
- complex
- coherent
- original
For example, a generative AI model can create sophisticated essays or images.
Some earlier technologies, including LSTMs and RNNs, can also generate original and coherent content. Some experts view these earlier technologies as generative AI, while others feel that true generative AI requires more complex output than those earlier technologies can produce.
Generative model
Practically speaking, a model that does either of the following:
- Creates (generates) new examples from the training dataset. For example, a generative model could create poetry after training on a dataset of poems. The generator part of a generative adversarial network falls into this category.
- Determines the probability that a new example comes from the training set or was created from the same mechanism that created the training set. A generative model can theoretically discern the distribution of examples, or of particular features, in a dataset; that is:
P(examples)
Unsupervised learning models are generative.
Gini impurity
A metric similar to entropy. Splitters use values derived from either Gini impurity or entropy to compose conditions for classification decision trees. Information gain is derived from entropy. There is no universally accepted equivalent term for the metric derived from Gini impurity; however, this unnamed metric is just as important as information gain.
Click on the icon for more mathematical details about Gini impurity.
Gini impurity is the probability of misclassifying new data drawn from the same distribution. The Gini impurity of a set with two possible values, 0 and 1 (for example, labels in a binary classification problem), is calculated as follows:
$$I = 1 - (p^2 + q^2) = 1 - (p^2 + (1-p)^2)$$

where:
- I is the Gini impurity.
- p is the proportion of examples labeled 1.
- q is the proportion of examples labeled 0, where q = 1 - p.
For example, consider the following dataset:
- 100 labels (0.25 of the dataset) have the value 1.
- 300 labels (0.75 of the dataset) have the value 0.
Hence, the Gini impurity is:

$$I = 1 - (0.25^2 + 0.75^2) = 0.375$$
Consequently, a random label from the same dataset has a 37.5% chance of being misclassified and a 62.5% chance of being correctly classified.
A perfectly balanced set of labels (for example, 200 "0"s and 200 "1"s) has a Gini impurity of 0.5, whereas a highly imbalanced set of labels has a Gini impurity close to 0.0.
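The worked example above is easy to verify with a short Python sketch (the helper function is illustrative, not part of the glossary):

```python
def gini_impurity(labels):
    """Gini impurity of a set of binary labels (0s and 1s)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # proportion of 1s
    q = 1.0 - p                    # proportion of 0s
    return 1.0 - (p**2 + q**2)

# The dataset from the example: 100 ones and 300 zeros.
dataset = [1] * 100 + [0] * 300
print(gini_impurity(dataset))                # 0.375
print(gini_impurity([0] * 200 + [1] * 200))  # 0.5 for perfectly balanced labels
```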
Golden dataset
A set of manually curated data that captures ground truth. Teams can use one or more golden datasets to evaluate a model's quality.
Some golden datasets capture different subdomains of ground truth. For example, a golden dataset for image classification might capture a variety of lighting conditions and image resolutions.
GPT (Generative Pre-trained Transformer)
A family of Transformer-based large language models developed by OpenAI. GPT variants can apply to multiple modalities, including:
- Image generation (for example, ImageGPT)
- Text-to-image generation (for example, DALL-E)
Gradient
The vector of partial derivatives with respect to all of the independent variables. In machine learning, the gradient is the vector of partial derivatives of the model function. The gradient points in the direction of steepest ascent.
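A minimal sketch of approximating a gradient numerically with central differences; the function and the evaluation point are invented for illustration:

```python
def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at point x by central differences."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad

# f(w0, w1) = w0^2 + 3*w1; its true gradient is (2*w0, 3).
f = lambda w: w[0] ** 2 + 3 * w[1]
print(numerical_gradient(f, [2.0, 5.0]))  # approximately [4.0, 3.0]
```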
Gradient accumulation
A backpropagation technique that updates the parameters only once per epoch rather than once per iteration. After processing each mini-batch, gradient accumulation simply updates a running total of gradients. Then, after processing the last mini-batch in the epoch, the system finally updates the parameters based on the total of all gradient changes.
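A minimal, framework-free sketch of the idea for a one-parameter model y = w * x with squared-error loss; the data and the batch split are invented, and a real system would rely on a framework's gradient machinery:

```python
# (x, y) pairs and two mini-batches, made up for illustration.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
mini_batches = [data[:2], data[2:]]

w, learning_rate = 0.0, 0.01
accumulated_grad = 0.0
for batch in mini_batches:
    # dLoss/dw for squared error, summed over the mini-batch.
    grad = sum(2 * (w * x - y) * x for x, y in batch)
    accumulated_grad += grad  # accumulate; do not update w yet
# A single parameter update at the end of the epoch.
w -= learning_rate * accumulated_grad
print(w)
```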
Gradient boosted (decision) trees (GBT)
A type of decision forest in which training relies on gradient boosting and the weak model is a decision tree.
Gradient boosting
A training algorithm in which weak models are trained to iteratively improve the quality of (reduce the loss of) a strong model. For example, a weak model could be a linear model or a small decision-tree model. The strong model becomes the sum of all the previously trained weak models.
In the simplest form of gradient boosting, at each iteration a weak model is trained to predict the loss gradient of the strong model. Then, the strong model's output is updated by subtracting the predicted gradient, similar to gradient descent.
$$F_{0} = 0$$
$$F_{i+1} = F_{i} - \xi f_{i}$$

where:
- $F_{0}$ is the starting strong model.
- $F_{i+1}$ is the next strong model.
- $F_{i}$ is the current strong model.
- $\xi$ is a value between 0.0 and 1.0 called shrinkage, analogous to the learning rate in gradient descent.
- $f_{i}$ is the weak model trained to predict the loss gradient of $F_{i}$.
Modern variations of gradient boosting also include the second derivative (Hessian) of the loss in their computation.
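A minimal sketch of the update rule above for squared-error loss, where the predicted negative gradient reduces to the residuals; the data is synthetic, and scikit-learn's DecisionTreeRegressor stands in as the weak model:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (invented for illustration).
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = np.sin(X).ravel()

shrinkage, n_rounds = 0.1, 100
prediction = np.zeros_like(y)  # F_0 = 0
for _ in range(n_rounds):
    # For squared-error loss, the negative gradient is the residual y - F_i,
    # so each weak tree is fit to predict the current residuals.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    # F_{i+1} = F_i + shrinkage * f_i (subtracting the predicted gradient).
    prediction += shrinkage * tree.predict(X)

print(np.mean((y - prediction) ** 2))  # training loss shrinks with each round
```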
Gradient clipping
A commonly used mechanism to mitigate the exploding gradient problem by artificially limiting (clipping) the maximum value of gradients when using gradient descent to train a model.
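As one concrete way to apply this in TensorFlow, Keras optimizers accept a clipnorm argument, and a list of gradient tensors can also be clipped explicitly:

```python
import tensorflow as tf

# Keras optimizers support clipping directly: clipnorm rescales each
# gradient so its L2 norm never exceeds the given value.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, clipnorm=1.0)

# Clipping can also be applied explicitly to a list of gradient tensors.
grads = [tf.constant([3.0, 4.0])]                # L2 norm is 5.0
clipped, _ = tf.clip_by_global_norm(grads, 1.0)  # rescaled so norm is 1.0
print(clipped[0].numpy())                        # [0.6 0.8]
```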
Gradient descent
#Basic principles
A mathematical technique for minimizing loss. Gradient descent iteratively adjusts weights and biases, gradually finding the best combination to minimize loss. Gradient descent is much, much older than machine learning.
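A minimal sketch of the technique on an invented one-parameter loss, (w - 3)^2, whose minimum is at w = 3:

```python
# Gradient descent minimizing loss(w) = (w - 3)^2.
# The gradient of this loss is 2 * (w - 3).
w, learning_rate = 0.0, 0.1
for step in range(100):
    gradient = 2 * (w - 3)
    w -= learning_rate * gradient  # step against the gradient
print(w)  # close to 3.0
```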
Graph
#TensorFlow
In TensorFlow, a computation specification. Nodes in the graph represent operations. Edges are directed and represent passing the result of an operation (a tensor) as an operand to another operation. Use TensorBoard to visualize a graph.
Graph execution
#TensorFlow
A TensorFlow programming environment in which the program first constructs a graph and then executes all or part of that graph. Graph execution is the default execution mode in TensorFlow 1.x.
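A minimal sketch using the tensorflow.compat.v1 API to opt back into 1.x-style graph execution: the program builds the graph first, and computation happens only when the graph runs in a session:

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # opt back into TensorFlow 1.x graph execution

# Build the graph first; no computation happens yet.
a = tf.constant(3.0)
b = tf.constant(4.0)
product = a * b  # a node whose incoming edges carry the input tensors

# Then execute (part of) the graph in a session.
with tf.Session() as sess:
    print(sess.run(product))  # 12.0
```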
Greedy policy
In reinforcement learning, a policy that always chooses the action with the highest expected return.

Ground truth
#Basic principles
What actually happened; reality. For example, consider a binary classification model that predicts whether a first-year university student will graduate within six years. The ground truth for this model is whether or not each student actually graduated within six years.

Click the icon to see additional notes.

We assess model quality against ground truth. But ground truth isn't always completely, perfectly true. For example, consider the following potential imperfections in ground truth:
- In the graduation example, are we certain that the graduation records for each student are always correct? Is the university's record keeping flawless?
- If the label is a floating-point value measured by instruments (for example, barometers), how can we be sure that each instrument is calibrated identically, or that each reading was taken under the same circumstances?
- If labeling is a matter of human opinion, how can we be sure that each human rater evaluates events in the same way? To improve consistency, expert human raters sometimes intervene.

Group attribution bias
Assuming that what is true for an individual is also true for everyone in that group. The effects of group attribution bias can be exacerbated when convenience sampling is used for data collection. In a non-representative sample, attributions may be made that do not reflect reality.

H

Hallucination
#generativeai
The production of plausible-seeming but factually incorrect output by a generative AI model that purports to be making an assertion about the real world. For example, a generative AI model that claims that Barack Obama died in 1865 is hallucinating.

Hashing
A mechanism in machine learning for bucketing categorical data, particularly when the number of categories is large but the number of categories actually appearing in the dataset is comparatively small. For example, roughly 73,000 tree species inhabit Earth. You could represent each of the 73,000 tree species in 73,000 separate categorical buckets. Alternatively, if only 200 of those tree species actually appear in a dataset, you could use hashing to divide tree species into perhaps 500 buckets. A single bucket could contain multiple tree species; for example, hashing could place baobab and red maple, two genetically dissimilar species, into the same bucket. Regardless, hashing is still a good way to map large categorical sets onto the desired number of buckets. Hashing turns a categorical feature with a large number of possible values into a much smaller number of values by grouping the values in a deterministic way. (A minimal code sketch follows the Heuristic entry below.)

Heuristic
A simple and quickly implemented solution to a problem. For example, "With a heuristic, we achieved 86% accuracy. When we switched to a deep neural network, accuracy went up to 98%."
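Here is the minimal hashing sketch promised in the Hashing entry above; the bucket count and species names are illustrative:

```python
import hashlib

NUM_BUCKETS = 500  # illustrative bucket count

def bucket_for(species_name):
    """Deterministically map a category value to one of NUM_BUCKETS buckets."""
    # Python's built-in hash() is randomized per process for strings, so a
    # stable digest keeps the mapping deterministic across runs.
    digest = hashlib.md5(species_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS

# Genetically dissimilar species can share a bucket; hashing allows that.
for species in ["baobab", "red maple", "sequoia"]:
    print(species, "->", bucket_for(species))
```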
Hidden layer
#Basic principles
A layer of a neural network between the input layer (the features) and the output layer (the prediction). Each hidden layer consists of one or more neurons. For example, the following neural network contains two hidden layers, the first with three neurons and the second with two neurons:
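The network just described can be written as a minimal Keras sketch; the four input features are an assumption made for illustration:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),            # 4 input features (assumed)
    tf.keras.layers.Dense(3, activation="relu"),  # first hidden layer
    tf.keras.layers.Dense(2, activation="relu"),  # second hidden layer
    tf.keras.layers.Dense(1),                     # output layer (prediction)
])
model.summary()
```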
Hierarchical clustering
#cluster
A category of clustering algorithms that create a tree of clusters. Hierarchical clustering is well suited to hierarchical data, such as botanical taxonomies. There are two types of hierarchical clustering algorithms:
- Agglomerative clustering first assigns every example to its own cluster and iteratively merges the closest clusters to create a hierarchical tree.
- Divisive clustering first groups all examples into one cluster and then iteratively divides the cluster into a hierarchical tree.
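A minimal sketch of agglomerative clustering with SciPy; the four points are made up:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D points. linkage() performs agglomerative clustering: every point
# starts in its own cluster and the closest clusters merge step by step.
points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9]])
tree = linkage(points, method="average")  # the hierarchical cluster tree

# Cut the tree into two flat clusters.
print(fcluster(tree, t=2, criterion="maxclust"))  # e.g. [1 1 2 2]
```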
Hinge loss
A family of loss functions for classification designed to find the decision boundary as distant as possible from each training example, thus maximizing the margin between examples and the boundary. Kernel support vector machines use hinge loss (or a related function, such as squared hinge loss). For binary classification, the hinge loss function is defined as follows:

$$\text{loss} = \max(0, 1 - (y \cdot y'))$$

where y is the true label, either -1 or +1, and y' is the raw output of the classification model:

$$y' = b + w_1x_1 + w_2x_2 + \ldots + w_nx_n$$
Consequently, a plot of hinge loss versus (y * y') is zero for (y * y') at or above 1 and increases linearly as (y * y') decreases below 1.
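A minimal NumPy sketch of the formula, with invented labels and raw model outputs:

```python
import numpy as np

def hinge_loss(y_true, y_raw):
    """Hinge loss for labels in {-1, +1} and raw model outputs y'."""
    return np.maximum(0.0, 1.0 - y_true * y_raw)

y_true = np.array([+1, +1, -1, -1])
y_raw = np.array([2.0, 0.3, -0.5, 0.8])  # raw outputs b + w.x (made up)
print(hinge_loss(y_true, y_raw))          # [0.  0.7 0.5 1.8]
```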
Historical bias
A type of bias that already exists in the world and has made its way into a dataset. Such biases tend to reflect existing cultural stereotypes, demographic inequalities, and prejudices against certain social groups.
For example, consider a classification model that predicts whether a loan applicant will default on their loan. The model was trained on historical loan-repayment data from the 1980s, drawn from local banks in two different communities. If past applicants from community A were six times more likely to default than applicants from community B, the model could learn a historical bias that makes it less likely to approve loans in community A, even if the historical conditions that led to community A's higher default rates are no longer relevant.
Holdout data
Examples intentionally not used ("held out") during training. The validation dataset and the test dataset are examples of holdout data. Holdout data helps evaluate a model's ability to generalize to data other than the data it was trained on. The loss on the holdout set provides a better estimate of the loss on an unseen dataset than does the loss on the training set.
Host
#Googlecloud
When training an ML model on accelerator chips (GPUs or TPUs), the part of the system that controls both of the following:
- The overall flow of the code.
- The extraction and transformation of the input pipeline.
The host typically runs on a CPU, not on an accelerator chip; the device manipulates tensors on the accelerator chips.
Hyperparameter
#Basic principles
A variable that you, or a hyperparameter tuning service, adjust during successive runs of training a model. For example, the learning rate is a hyperparameter. You could set the learning rate to 0.01 before one training session. If you determine that 0.01 is too high, you could perhaps set the learning rate to 0.003 for the next training session.
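Following the example above, a minimal Keras sketch of adjusting this hyperparameter between training runs (the optimizer choice is illustrative):

```python
import tensorflow as tf

# The learning rate is a hyperparameter: it is set before training,
# not learned during it. Here it starts at 0.01, as in the example above.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

# If 0.01 turns out to be too high, rebuild the optimizer for the next
# training run with a smaller value such as 0.003.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.003)
```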
Hyperplane
A boundary that separates a space into two subspaces. For example, a line is a hyperplane in two dimensions, and a plane is a hyperplane in three dimensions. More typically in machine learning, a hyperplane is the boundary separating a high-dimensional space. Kernel support vector machines use hyperplanes to separate positive classes from negative classes, often in a very high-dimensional space.
I

i.i.d.
Abbreviation for independently and identically distributed.
Image recognition
A process that classifies objects, patterns, or concepts in an image. Image recognition is also known as image classification.
Imbalanced dataset
Synonym for class-imbalanced dataset.

Implicit bias
#Basic principles
Automatically making an association or assumption based on one's mental models and memories. Implicit bias can affect the following:
- How data is collected and classified.
- How machine learning systems are designed and deployed.
For example, when building a classifier to identify wedding photos, an engineer may use the presence of a white dress in a photo as a feature. However, white dresses have been customary only during certain eras and in certain cultures.
Incompatibility of fairness metrics
The idea that some notions of fairness are mutually incompatible and cannot be satisfied simultaneously. As a result, there is no single universal metric for quantifying fairness that can be applied to all ML problems.
While this may seem discouraging, incompatibility of fairness metrics doesn't imply that fairness efforts are fruitless. Instead, it suggests that fairness must be defined contextually for a given ML problem, with the goal of preventing harms specific to its use cases. See "On the (Im)possibility of Fairness" for a more detailed discussion of this topic.
In-context learning
#generativeai
Synonym for few-shot prompting.
Independently and identically distributed (i.i.d.)
#Basic principles
Data drawn from a distribution that doesn't change, and where each value drawn doesn't depend on values that have been drawn previously. An i.i.d. is the ideal gas of machine learning: a useful mathematical construct, but almost never found exactly in the real world. For example, the distribution of visitors to a web page may be i.i.d. over a brief window of time; that is, the distribution doesn't change during that brief window, and one person's visit is generally independent of another's visit. However, if you expand that window of time, seasonal differences in the web page's visitors may appear.
Individual fairness
A fairness metric that checks whether similar individuals are classified similarly. For example, Brobdingnagian Academy might want to satisfy individual fairness by ensuring that two students with identical grades and identical standardized test scores are equally likely to gain admission.