[ja] cs-229-machine-learning-tips-and-tricks #99

umu1729 · 2018-11-16T06:31:09Z

No description provided.

first commit

second commit

complete all translaton for now

final v2

shervinea · 2018-11-16T09:28:10Z

Thanks for your ongoing work @umu1729! I only kept the cheatsheet you were translating, so that this PR does not conflict with #96.

shervinea · 2019-09-11T05:42:59Z

Just realized that your work was already ready to be reviewed @umu1729! Please feel free to invite friends to take a look at it.

ytknzw · 2019-10-28T13:02:52Z

@shervinea
This cheat sheet has been translated and reviewed in the following PR. Even though it's a different translation from umu1729's, I think we could call this cheat sheet 'done'.
#164 (comment)

shervinea · 2019-10-29T05:38:18Z

Hi @ytknzw, thanks for your message. It is my understanding that PR #164 is not the same as the current one, right? (deep learning tips and tricks as opposed to machine learning tips and tricks)

ytknzw · 2019-10-29T06:37:16Z

Hi @shervinea, yes, sorry! My misunderstanding!

shervinea · 2019-10-30T07:14:40Z

No worries @ytknzw, thanks for helping out in seeing what can be already done! By the way, would you be interested in reviewing the translation?

ytknzw · 2019-10-30T12:08:59Z

> By the way, would you be interested in reviewing the translation?

Yes!

shervinea · 2019-11-15T08:37:54Z

@ytknzw, thanks for proposing your help!
Hi @umu1729, please feel free to invite anyone else who you think could be interested in this process. Looking forward to the final version of this translation!

ytknzw

Reviewed 1 - 11.

ytknzw · 2019-11-17T05:14:47Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

@@ -0,0 +1,285 @@
+**1. Machine Learning tips and tricks cheatsheet**
+
+&#10230;　機械学習チップ&トリック　チートシート


Suggested change

⟶　機械学習チップ&トリック　チートシート

⟶機械学習のアドバイスやコツのチートシート

To be consistent with translation of "Deep learning tips and tricks cheatsheet":
https://stanford.edu/~shervine/l/ja/teaching/cs-230/cheatsheet-deep-learning-tips-and-tricks

ytknzw · 2019-11-17T05:15:16Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**2. Classification metrics**
+
+&#10230;　分類評価指標


ytknzw · 2019-11-17T05:21:13Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.**
+
+&#10230; 二値分類の文脈では，次のようなモデルの性能を評価するための重要な評価指標があります．


Suggested change

⟶ 二値分類の文脈では，次のようなモデルの性能を評価するための重要な評価指標があります．

⟶二値分類の場面では、モデルの性能を評価するために追うべき主な指標として次のものがあります。

For "の文脈では" vs. "の場面では", how about "を背景に"?

二値分類において、モデルの性能を評価するための主要な指標として次のものがあります.

二値分類において、モデルの性能を評価する際の主要な指標として次のものがあります.

I think the latter is better. I intentionally omitted the words "that are important to track" from the translation; translating every word as it is would be a bit redundant.

ytknzw · 2019-11-17T05:25:28Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:**
+
+&#10230; 混同行列 - 混同行列はモデルの性能を評価する際に，より完全な描像を得るために用いられます．


Suggested change

⟶ 混同行列 - 混同行列はモデルの性能を評価する際に，より完全な描像を得るために用いられます．

⟶ 混同行列 - 混同行列はモデルの性能を評価する際に、より完全に理解するために用いられます。次のように定義されます：

I suppose "a complete picture" should be translated as '全体像' in Japanese. We use the confusion matrix to assess the model as a whole, so the part in question should read something like:

混同行列はモデルの性能を評価する際に、より全体像を把握するために用いられます.

or something like

混同行列はモデルの性能をより全体から評価するために用いられます.

The word "more" in the sentence compares the confusion matrix with other metrics, which focus on more detailed parts of the assessment.

ytknzw · 2019-11-17T05:26:31Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**5. [Predicted class, Actual class]**
+
+&#10230; [予測したクラス, 実際のクラス]


ytknzw · 2019-11-17T05:28:45Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**7. [Metric, Formula, Interpretation]**
+
+&#10230; [評価指標,式,解釈]


ytknzw · 2019-11-17T05:28:59Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**8. Overall performance of model**
+
+&#10230;　モデルの全体的な性能


ytknzw · 2019-11-17T05:33:38Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**9. How accurate the positive predictions are**
+
+&#10230;　正と判断された予測の正答率


Suggested change

⟶　正と判断された予測の正答率

⟶陽性判定の正解率（陽性的中率）

Cf. https://ja.wikipedia.org/wiki/%E9%99%BD%E6%80%A7%E9%81%A9%E4%B8%AD%E7%8E%87

IMHO, the source phrase in English is trying to put the interpretation in layman's terms. Perhaps something similar to yet simpler than "陽性と判定された場合に、真の陽性である確率" from the aforementioned Wikipedia page.
For example, "陽性判定は、どれくらい正確ですか."

ytknzw · 2019-11-17T05:42:40Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**10. Coverage of actual positive sample**
+
+&#10230; 実際には正であるサンプルを正しく正と予測した割合


Suggested change

⟶ 実際には正であるサンプルを正しく正と予測した割合

⟶ 本当に陽性であるサンプルに対する正解率（真陽性率）

Cf. https://ja.wikipedia.org/wiki/%E9%99%BD%E6%80%A7%E9%81%A9%E4%B8%AD%E7%8E%87#%E5%8F%82%E8%80%83

How about 実際に陽性であるサンプル ?

ytknzw · 2019-11-17T05:43:40Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**11. Coverage of actual negative sample**
+
+&#10230;　実際には負であるサンプルを正しく負と予測した割合


Suggested change

⟶　実際には負であるサンプルを正しく負と予測した割合

⟶　本当に陰性であるサンプルに対する正解率（真陰性率）

Cf. https://ja.wikipedia.org/wiki/%E9%99%BD%E6%80%A7%E9%81%A9%E4%B8%AD%E7%8E%87#%E5%8F%82%E8%80%83

How about 実際に陰性であるサンプル ?
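For reviewers checking the interpretations in items 8 to 12, here is a minimal Python sketch of how these metrics are usually computed from confusion-matrix counts. The function name and the example counts are illustrative assumptions, not part of the cheatsheet, and the sketch assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives (hypothetical inputs for illustration).
    """
    total = tp + fp + fn + tn
    return {
        # Item 8: overall performance of the model
        "accuracy": (tp + tn) / total,
        # Item 9: how accurate the positive predictions are
        "precision": tp / (tp + fp),
        # Item 10: coverage of actual positive samples (true positive rate)
        "recall": tp / (tp + fn),
        # Item 11: coverage of actual negative samples (true negative rate)
        "specificity": tn / (tn + fp),
        # Item 12: hybrid metric useful for unbalanced classes
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

The comments map each metric back to the English source phrases under review, which may help when choosing between wordings such as 陽性的中率 and 真陽性率.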

hrkmr-tech · 2019-11-21T10:50:04Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.**
+
+&#10230; 二値分類の文脈では，次のようなモデルの性能を評価するための重要な評価指標があります．


二値分類において、モデルの性能を評価するための主要な指標として次のものがあります.

二値分類において、モデルの性能を評価する際の主要な指標として次のものがあります.

I think the latter is better. I intentionally omitted the words "that are important to track" from the translation; translating every word as it is would be a bit redundant.

hrkmr-tech · 2019-11-21T11:16:35Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:**
+
+&#10230; 混同行列 - 混同行列はモデルの性能を評価する際に，より完全な描像を得るために用いられます．


I suppose "a complete picture" should be translated as '全体像' in Japanese. We use the confusion matrix to assess the model as a whole, so the part in question should read something like:

混同行列はモデルの性能を評価する際に、より全体像を把握するために用いられます.

or something like

混同行列はモデルの性能をより全体から評価するために用いられます.

The word "more" in the sentence compares the confusion matrix with other metrics, which focus on more detailed parts of the assessment.

hrkmr-tech · 2019-11-21T11:21:31Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**10. Coverage of actual positive sample**
+
+&#10230; 実際には正であるサンプルを正しく正と予測した割合


How about 実際に陽性であるサンプル ?

hrkmr-tech · 2019-11-21T11:23:32Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**11. Coverage of actual negative sample**
+
+&#10230;　実際には負であるサンプルを正しく負と予測した割合


How about 実際に陰性であるサンプル ?

hrkmr-tech · 2019-11-21T11:25:07Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**12. Hybrid metric useful for unbalanced classes**
+
+&#10230;　不均衡データに対する有用な複合指標


hrkmr-tech · 2019-11-21T11:33:45Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:**
+
+&#10230; ROC曲線 - 受信者動作特性曲線(ROC)は閾値を変えていく際のFPRに対するTPRのグラフです．


Please add the next line:

これらの指標は下表の通りまとめられます.

hrkmr-tech · 2019-11-21T11:35:04Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+<br>
+
+**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:**
+


NOTE

The duplicated "are" ("These metrics are are summed up") should be removed from the original.

hrkmr-tech · 2019-11-21T11:41:32Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**14. [Metric, Formula, Equivalent]**
+
+&#10230;　[評価指標,式,等価な指標]


I don't know which one is best, but here are two more options:

同等の指標

同義の指標

hrkmr-tech · 2019-11-21T11:45:05Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:**
+
+&#10230; AUC - ROC曲線下面積(AUC,AUROC)は次の図のようにROC曲線の下側の面積のことです．


Suggested change

⟶ AUC - ROC曲線下面積(AUC,AUROC)は次の図のようにROC曲線の下側の面積のことです．

⟶ AUC - ROC曲線下面積(AUC,AUROC)は次の図に示される通りROC曲線の下側面積のことです
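To make items 13 and 15 concrete, the following is a small Python sketch of an ROC curve built by varying the threshold, with the AUC taken as the area below it by the trapezoidal rule. The function names and the toy labels and scores are assumptions made for illustration; the sketch assumes both classes are present in the labels.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Lower the threshold one distinct score at a time, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


labels = [1, 1, 0, 1, 0, 0]          # 1 = actual positive, 0 = actual negative
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical model scores
curve = roc_points(labels, scores)
area = auc(curve)
print(curve, area)
```

The curve runs from (0, 0) to (1, 1), with FPR on the horizontal axis and TPR on the vertical axis, which matches the wording "FPRに対するTPRのグラフ".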

hrkmr-tech · 2019-11-21T11:46:14Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**16. [Actual, Predicted]**
+
+&#10230; [実際，予測]


hrkmr-tech · 2019-11-21T11:51:36Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]**
+
+&#10230;　[総平方和,説明された平方和,残差平方和]


Suggested change

⟶　[総平方和,説明された平方和,残差平方和]

⟶ [全平方和,回帰平方和,残差平方和]

scrambleegg7

I have made small changes to the last parts of the article.

scrambleegg7 · 2019-11-21T11:33:44Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]**
+
+&#10230; [分類評価指標,混同行列,正解率,適合率,再現率,F値,ROC]


Suggested change

⟶ [分類評価指標,混同行列,正解率,適合率,再現率,F値,ROC]

⟶ [分類評価指標,混同行列,正解率,適合率,再現率,F値,ROC曲線]

scrambleegg7 · 2019-11-21T11:44:39Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.**
+
+&#10230; アブレーション分析 - アブレーション分析は，ベースラインとするモデルと現在のモデル間の性能差の主要な要因を分析することです．


Suggested change

⟶ アブレーション分析 - アブレーション分析は，ベースラインとするモデルと現在のモデル間の性能差の主要な要因を分析することです．

⟶ アブレーション分析 - アブレーション分析は，ベースライン・モデルと改良されたモデル間で発生したパフォーマンスの差異の原因を分析することです．

I think "the current" here means '現在のモデル'.

scrambleegg7 · 2019-11-21T11:46:33Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]**
+
+&#10230; [回帰評価指標,R二乗,マローズのCp,AIC,BIC]


scrambleegg7 · 2019-11-21T11:47:04Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**47. [Model selection, cross-validation, regularization]**
+
+&#10230; [モデル選択，交差検証，正則化]


Suggested change

⟶ [モデル選択，交差検証，正則化]

⟶ [モデルの選択，交差検証，正則化]

scrambleegg7 · 2019-11-21T11:47:57Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]**
+
+&#10230; [分析，バイアス・バリアンストレードオフ，エラー・アブレーション分析]


Suggested change

⟶ [分析，バイアス・バリアンストレードオフ，エラー・アブレーション分析]

⟶ [解析（分析），バイアス・バリアンストレードオフ，エラー・アブレーション分析]

I think 診断 or 診断方法 is better. We need to discuss this topic.
https://github.com/shervinea/cheatsheet-translation/pull/99/files#r349865518

Yes, I think "診断" might be OK. However, when I search Google for the combination of "機械学習" and "診断", I mostly see headlines related to medical topics in Japan, whereas combining "機械学習" with "解析" or "分析" brings up headlines related to machine learning. My approach to finding the best translation may not be a good one, but what do you think?

Yes, you're right. "診断" is a bit odd. This section is about troubleshooting when your model doesn't work as expected. But "分析" sounds like "データ分析". How about "問題分析"?

Thank you for your nice comment. I agree with "問題分析". It definitely fits the nuance of the original English.

As far as I know, "打診" in Japanese went through a sense shift similar to that of "diagnostics" in English: from medical usage to general usage.
However, when I see "問題分析", I might think the source phrase in English was "error analysis".

scrambleegg7 · 2019-11-21T11:49:50Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**44. Regression metrics**
+
+&#10230; 回帰評価指標


Suggested change

⟶ 回帰評価指標

⟶ 回帰分析の精度評価指標

hrkmr-tech · 2019-11-21T11:57:35Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:**
+
+&#10230;　[基本的な評価指標] 回帰モデルfが与えられたとき，次のようなよう化指標がモデルの性能を評価するために一般的に用いられます．


Suggested change

⟶　[基本的な評価指標] 回帰モデルfが与えられたとき，次のようなよう化指標がモデルの性能を評価するために一般的に用いられます．

⟶ [基本的な評価指標] 回帰モデルfが与えられたとき,モデルの性能を評価するために次のような指標が一般的に用いられます．

hrkmr-tech · 2019-11-23T08:05:48Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:**
+
+&#10230;　決定係数 - よくR2やr2と書かれる決定係数は，実際の結果がモデルによってどの程度よく再現されているかを測る評価指標であり，次のように定義される．
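As a reference point for items 18 and 19, here is a minimal Python sketch of the coefficient of determination, assuming the usual definition R² = 1 − SS_res / SS_tot. The function name and sample values are illustrative assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Total sum of squares: spread of the observed outcomes around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    # Residual sum of squares: what the model's predictions fail to reproduce.
    ss_res = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]   # observed outcomes (hypothetical)
y_pred = [1.1, 1.9, 3.2, 3.8]   # model predictions (hypothetical)
r2 = r_squared(y_true, y_pred)
print(r2)
```

A value close to 1 means the observed outcomes are well replicated by the model, which is the sense that "どの程度よく再現されているか" needs to convey.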


hrkmr-tech · 2019-11-23T08:11:51Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:**
+
+&#10230; 主要な評価指標 - 次の評価指標は説明変数の数を考慮して回帰モデルの性能を評価するために，一般的に用いられています．


hrkmr-tech · 2019-11-23T08:20:41Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.**
+
+&#10230; ここでLは尤度であり，ˆσ2は各応答に対する誤差分散の推定値です．


hrkmr-tech · 2019-11-23T08:20:48Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**22. Model selection**
+
+&#10230; モデル選択


hrkmr-tech · 2019-11-23T08:28:10Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:**
+
+&#10230; 用語 - モデルを選択するときには，次のように，データの種類を異なる３つに区別します．


Suggested change

⟶ 用語 - モデルを選択するときには，次のように，データの種類を異なる３つに区別します．

⟶ 用語 - モデルを選択するときには，次のようにデータの種類を異なる３つに区別します．

hrkmr-tech · 2019-11-23T08:50:53Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]**
+
+&#10230; [k=5か10が一般的,p=1の場合はLeave-one-out cross validation法と呼ばれる．]


Suggested change

⟶ [k=5か10が一般的,p=1の場合はLeave-one-out cross validation法と呼ばれる．]

⟶ [一般的にはk=5または10,p=1の場合は一個抜き交差検証と呼ばれます]

hrkmr-tech · 2019-11-23T09:07:11Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.**
+
+&#10230;　最も一般的に用いられている方法はk交差検証法であり，データセットをk群に分け，1群を検証に，残りのk-1群を学習に用います．これをk回繰り返します．求められた検証誤差はk群全てにわたって平均化され，これは交差検証誤差と呼ばれています．


Suggested change

⟶　最も一般的に用いられている方法はk交差検証法であり，データセットをk群に分け，1群を検証に，残りのk-1群を学習に用います．これをk回繰り返します．求められた検証誤差はk群全てにわたって平均化され，これは交差検証誤差と呼ばれています．

⟶ 最も一般的に用いられている方法はk交差検証法です．データセットをk群に分けた後，1群を検証に使用し残りのk-1群を学習に使用するという操作を順番にk回繰り返します．求められた検証誤差はk群すべてにわたって平均化されます．この平均された誤差のことを交差検証誤差と呼びます．
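The procedure described in item 32 can be sketched in Python as follows. This is only an illustration under stated assumptions: the fold assignment (every k-th sample), the mean-predictor model `fit_mean`, and the `mse` error function are hypothetical choices, not something the cheatsheet prescribes.

```python
def k_fold_cross_validation(xs, ys, k, fit, error):
    """Train on k-1 folds, validate on the remaining fold, repeat k times, average."""
    n = len(xs)
    fold_errors = []
    for i in range(k):
        # Fold i (the validation fold) holds every k-th sample starting at index i.
        val_idx = set(range(i, n, k))
        train_x = [x for j, x in enumerate(xs) if j not in val_idx]
        train_y = [y for j, y in enumerate(ys) if j not in val_idx]
        val_x = [x for j, x in enumerate(xs) if j in val_idx]
        val_y = [y for j, y in enumerate(ys) if j in val_idx]
        model = fit(train_x, train_y)
        fold_errors.append(error(model, val_x, val_y))
    # The error averaged over the k folds is the cross-validation error.
    return sum(fold_errors) / k


def fit_mean(xs, ys):
    """Toy model (assumption): always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def mse(model, xs, ys):
    """Mean squared error of the model on a validation fold."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cv_error = k_fold_cross_validation(xs, ys, k=3, fit=fit_mean, error=mse)
print(cv_error)
```

Each sample is used for validation exactly once, and the final average corresponds to the term 交差検証誤差 in the suggested translation.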

hrkmr-tech · 2019-11-23T09:14:45Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:**
+
+&#10230;　正則化 - 正則化はモデルが過学習するのを避ける目的としており，したがってバリアンスが大きくなる問題に対処します．次の表は一般的に使用されるいくつかの正則化法をまとめたものです．


Suggested change

⟶　正則化 - 正則化はモデルが過学習するのを避ける目的としており，したがってバリアンスが大きくなる問題に対処します．次の表は一般的に使用されるいくつかの正則化法をまとめたものです．

⟶ 正則化 - 正則化はモデルの過学習状態を回避することが目的であり,したがってハイバリアンス問題(オーバーフィット問題)に対処できます. 一般的に使用されるいくつかの正則化法を下表にまとめました.

hrkmr-tech · 2019-11-23T09:15:52Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]**
+
+&#10230; [係数を0にする,変数選択に適する,係数を小さくする,変数選択と係数を小さくすることのトレードオフ]
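To illustrate the three behaviours listed in item 34, here is a hedged Python sketch of the penalty terms commonly associated with LASSO, Ridge and Elastic Net. The function names are illustrative, and the way `alpha` mixes the two norms in `elastic_net_penalty` is an assumed convention; other parameterizations exist.

```python
def lasso_penalty(theta, lam):
    """L1 penalty: tends to shrink coefficients to exactly 0 (variable selection)."""
    return lam * sum(abs(t) for t in theta)


def ridge_penalty(theta, lam):
    """Squared L2 penalty: makes coefficients smaller without zeroing them."""
    return lam * sum(t * t for t in theta)


def elastic_net_penalty(theta, lam, alpha):
    """Mix of both penalties; alpha in [0, 1] sets the tradeoff (assumed convention)."""
    l1 = sum(abs(t) for t in theta)
    l2_squared = sum(t * t for t in theta)
    return lam * ((1 - alpha) * l1 + alpha * l2_squared)


theta = [0.5, -2.0, 0.0]  # hypothetical coefficients
print(lasso_penalty(theta, 0.1))
print(ridge_penalty(theta, 0.1))
print(elastic_net_penalty(theta, 0.1, 0.5))
```

In each case the penalty is added to the training loss, so a larger `lam` pushes the model toward smaller coefficients and lower variance.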


hrkmr-tech · 2019-11-23T09:21:19Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**35. Diagnostics**
+
+&#10230; 分析


I think 診断 or 診断方法 is better, because the word "diagnostics" pairs with the "symptoms" and "remedies" that appear later in the section; it is a metaphor.

hrkmr-tech · 2019-11-23T09:32:12Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.**
+
+&#10230; バイアス - モデルのバイアスとは，予測するあるデータ点における，予測した結果の期待値と正しいモデルによる結果との差です．


Suggested change

⟶ バイアス - モデルのバイアスとは，予測するあるデータ点における，予測した結果の期待値と正しいモデルによる結果との差です．

⟶ バイアス - モデルのバイアスとは，ある標本値群を予測する際の期待値と正しいモデルの結果との差異のことです．

hrkmr-tech · 2019-11-23T09:35:36Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**37. Variance ― The variance of a model is the variability of the model prediction for given data points.**
+
+&#10230; バリアンス - モデルのバリアンスとは，予測するあるデータ点における，予測した結果の分散です．


Suggested change

⟶ バリアンス - モデルのバリアンスとは，予測するあるデータ点における，予測した結果の分散です．

⟶ バリアンス - モデルのバリアンスとは，ある標本値群に対するモデルの予測値のばらつきのことです．

hrkmr-tech · 2019-11-23T09:41:22Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.**
+
+&#10230; バイアス・バリアンストレードオフ - よりシンプルなモデルではバイアスが高くなり，より複雑なモデルはバリアンスが高くなります．


hrkmr-tech · 2019-11-23T09:42:18Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]**
+
+&#10230; [症状,回帰モデルでの図,分類モデルでの図,深層学習での図,可能な解決策]


hrkmr-tech · 2019-11-23T09:45:38Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]**
+
+&#10230; [高い訓練誤差,訓練誤差がテスト誤差に近い，高いバイアス,訓練誤差がテスト誤差より少しだけ小さい,極端に小さい訓練誤差,訓練誤差がテスト誤差に比べて非常に小さい,高いバリアンス]


hrkmr-tech · 2019-11-23T09:46:16Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]**
+
+&#10230; [より複雑なモデルを試す,特徴量を増やす，より長く学習する,正則化を導入する,データ数を増やす]


hrkmr-tech · 2019-11-23T09:47:31Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.**
+
+&#10230; エラー分析 - エラー分析は完璧なモデルと現在のモデル間の性能差の主要な要因を分析することです．


Suggested change

⟶ エラー分析 - エラー分析は完璧なモデルと現在のモデル間の性能差の主要な要因を分析することです．

⟶ エラー分析 - エラー分析は現在のモデルと完璧なモデル間の性能差の主要な要因を分析することです．

hrkmr-tech · 2019-11-23T09:49:28Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.**
+
+&#10230; アブレーション分析 - アブレーション分析は，ベースラインとするモデルと現在のモデル間の性能差の主要な要因を分析することです．


I think "the current" here means '現在のモデル'.

hrkmr-tech · 2019-11-23T09:53:53Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]**
+
+&#10230; [分析，バイアス・バリアンストレードオフ，エラー・アブレーション分析]


I think 診断 or 診断方法 is better. We need to discuss this topic.
https://github.com/shervinea/cheatsheet-translation/pull/99/files#r349865518

scrambleegg7

I mostly agree with Hiroki-san's Japanese translation.

scrambleegg7 · 2019-11-23T22:00:34Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]**
+
+&#10230; [分析，バイアス・バリアンストレードオフ，エラー・アブレーション分析]


Yes, I think "診断" might be OK. However, when I search Google for the combination of "機械学習" and "診断", I mostly see headlines related to medical topics in Japan, whereas combining "機械学習" with "解析" or "分析" brings up headlines related to machine learning. My approach to finding the best translation may not be a good one, but what do you think?

scrambleegg7

I think there are no major issues with the translation into Japanese. Thank you to Hiroki-san for the great review.

scrambleegg7 · 2019-11-25T08:54:08Z

ja/cheatsheet-machine-learning-tips-and-tricks.md

+
+**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]**
+
+&#10230; [分析，バイアス・バリアンストレードオフ，エラー・アブレーション分析]


Thank you for your nice comment. I agree with "問題分析". It definitely fits the nuance of the original English.

yoshiyukinakai · 2020-02-06T05:10:18Z

Hello @umu1729, a team from Machine Learning Tokyo completed reviewing your translation and added some suggestions. Could you check and incorporate our suggestions?

Here is how to incorporate suggestions:
https://help.github.com/ja/github/collaborating-with-issues-and-pull-requests/incorporating-feedback-in-your-pull-request

shervinea · 2020-04-23T06:46:04Z

Hi @umu1729, would there be anything I could do to help out in finishing the reviewing process?

yoshiyukinakai · 2020-06-02T01:02:15Z

Hi @shervinea, is it possible for you to incorporate suggestions and merge? I think this translation is as good as other completed translations.


shervinea · 2020-06-30T06:15:15Z

Thanks @yoshiyukinakai for pinging me on this PR. I went ahead and manually merged in all relevant reviewers' suggestions. Hopefully it should be good now. Thanks again to everyone who helped on this translation, really appreciate your time and work!

umu1729 and others added 5 commits November 15, 2018 15:47

Create cheatsheet-machine-learning-tips-and-tricks.md

750fff2

first commit

Update cheatsheet-machine-learning-tips-and-tricks.md

ce2cb37

second commit

Update cheatsheet-machine-learning-tips-and-tricks.md

7dd9203

complete all translaton for now

Update cheatsheet-machine-learning-tips-and-tricks.md

b45ce79

final v2

Delete cheatsheet-deep-learning.md

d06840c

shervinea changed the title ~~fix ja~~ [ja] Machine learning tips and tricks Nov 16, 2018

shervinea added the in progress Work in progress label Nov 16, 2018

shervinea mentioned this pull request Jun 3, 2019

[ja] cs-229-probability #142

Merged

shervinea added reviewer wanted Looking for a reviewer and removed in progress Work in progress labels Sep 11, 2019

ytknzw suggested changes Nov 17, 2019

hrkmr-tech reviewed Nov 21, 2019

hrkmr-tech suggested changes Nov 21, 2019

hrkmr-tech reviewed Nov 21, 2019

ja/cheatsheet-machine-learning-tips-and-tricks.md Outdated

**16. [Actual, Predicted]**

⟶ [実際，予測]

hrkmr-tech Nov 21, 2019

OK

hrkmr-tech suggested changes Nov 21, 2019

scrambleegg7 reviewed Nov 21, 2019

hrkmr-tech reviewed Nov 21, 2019

hrkmr-tech suggested changes Nov 23, 2019

hrkmr-tech reviewed Nov 23, 2019

scrambleegg7 reviewed Nov 23, 2019

scrambleegg7 reviewed Nov 25, 2019

shervinea added 6 commits June 29, 2020 22:08

Rename cheatsheet-machine-learning-tips-and-tricks.md to cs-229-machi…

80f0cd3

…ne-learning-tips-and-tricks.md

Restore old commit for merging purposes

03769df

Fix formatting details

89a1882

Incorporate reviewers' suggestions

07b5a4b

Add [ja] contributors

d11d3d6

Merge branch 'master' into master

c1079bf

shervinea merged commit c74c255 into shervinea:master Jun 30, 2020

shervinea changed the title ~~[ja] Machine learning tips and tricks~~ [ja] cs-229-machine-learning-tips-and-tricks Oct 6, 2020


