What is term frequency–inverse document frequency? A plain explanation (Weblio Dictionary)


tf-idf

Full spelling: term frequency–inverse document frequency
Reading: てぃーえふあいでぃーえふ (tee-eff-eye-dee-eff)

tf-idf is a measure, used in fields such as information retrieval and text mining, of how characteristic a particular word appearing in a document is of that document.

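Within tf-idf, tf (term frequency) is the number of times a particular word occurs in the document, and idf (inverse document frequency) is the natural logarithm of the total number of documents in the corpus divided by the number of documents that contain the word. The product tf × idf is the tf-idf value of that word in that document.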
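As a minimal sketch of this definition, the following Python computes tf, idf, and their product for a toy corpus. It assumes the common form idf = ln(N / n_t), where N is the number of documents and n_t is the number of documents containing the word; the function names and the sample corpus are hypothetical and only serve as illustration.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Raw term frequency: how many times `term` occurs in the document."""
    return Counter(doc_tokens)[term]


def idf(term, corpus):
    """Natural log of (total documents / documents containing `term`)."""
    n_docs = len(corpus)
    n_containing = sum(1 for doc in corpus if term in doc)
    if n_containing == 0:
        # Assumption: idf is left undefined for terms absent from the corpus.
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return math.log(n_docs / n_containing)


def tf_idf(term, doc_tokens, corpus):
    """tf-idf value of `term` in one document of the corpus."""
    return tf(term, doc_tokens) * idf(term, corpus)


# Hypothetical toy corpus: each document is a list of tokens.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat"],
    ["the", "cat", "chased", "the", "dog"],
]

score_the = tf_idf("the", corpus[0], corpus)  # occurs in every document
score_mat = tf_idf("mat", corpus[0], corpus)  # occurs in one document only
```

In this sketch "the" appears in all three documents, so its idf and its tf-idf are zero even though its tf is the highest in the document, while "mat", which appears in a single document, receives a positive weight.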

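Algorithms that use tf-idf weighting are broadly called the "tf-idf method" and are applied to tasks such as keyword extraction and term weighting in full-text search engines. In the vector space model, tf-idf values also serve as the components of the word-based feature vector when cosine similarity is computed to judge the similarity between documents.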
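The use of tf-idf values as feature vectors for cosine similarity can be sketched as follows. This is an illustrative example under stated assumptions, not a reference implementation: it assumes raw counts for tf, idf = ln(N / n_t), and a vocabulary built from the corpus itself; the helper names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(doc_tokens, corpus, vocabulary):
    """Build a tf-idf vector with one component per vocabulary term."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    vector = []
    for term in vocabulary:
        n_containing = sum(1 for doc in corpus if term in doc)
        # Assumption: terms absent from the corpus get weight 0.
        idf = math.log(n_docs / n_containing) if n_containing else 0.0
        vector.append(counts[term] * idf)
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: each document is a list of tokens.
corpus = [
    ["cat", "sat", "mat"],
    ["cat", "sat", "sofa"],
    ["dog", "barked", "loudly"],
]
vocabulary = sorted({term for doc in corpus for term in doc})
vectors = [tfidf_vector(doc, corpus, vocabulary) for doc in corpus]

sim_01 = cosine_similarity(vectors[0], vectors[1])  # share "cat" and "sat"
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no terms
```

Documents 0 and 1 share two words and get a similarity strictly between 0 and 1, whereas documents 0 and 2 share none and get a similarity of 0.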


Wikipedia Subheading Dictionary


Term frequency–inverse document frequency (tf-idf)

Source: Wikipedia, the free encyclopedia (version of 2022/03/28 23:00 UTC)

Explanation of "Term frequency–inverse document frequency (tf-idf)" in the article "tf-idf"

Here, tf-idf is computed as follows.

$$\mathrm{tfidf}(t,d,D)=\mathrm{tf}(t,d)\cdot \mathrm{idf}(t,D)$$

A tf-idf weight is high when the term frequency (tf) of the word is high within the given document and the document frequency (df) of the word is low across the whole document collection. The weights therefore tend to filter out common words. Because the fraction inside the logarithm of idf is always at least 1, the value of idf (and of tf-idf) is always at least 0. As a word appears in more documents, the fraction inside the logarithm approaches 1, and idf and tf-idf therefore approach 0.

Recommended tf–idf weighting schemes (below, $f_{t,d}$ and $f_{t,q}$ are the raw counts of term $t$ in document $d$ and in query $q$, $N$ is the total number of documents, and $n_t$ is the number of documents containing $t$):

Scheme 1
  Use in documents: $f_{t,d}\cdot \log \frac{N}{n_t}$
  Use in queries: $\left(0.5+0.5\,\frac{f_{t,q}}{\max_t f_{t,q}}\right)\cdot \log \frac{N}{n_t}$

Scheme 2
  Use in documents: $\log(1+f_{t,d})$
  Use in queries: $\log \left(1+\frac{N}{n_t}\right)$

Scheme 3
  Use in documents: $(1+\log f_{t,d})\cdot \log \frac{N}{n_t}$
  Use in queries: $(1+\log f_{t,q})\cdot \log \frac{N}{n_t}$
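The three recommended weighting schemes above can be written out as small functions. The following Python is a sketch under stated assumptions: it uses the natural logarithm, it treats a term with zero count as weight 0 in scheme 3 (where log 0 would otherwise be undefined), and the function and parameter names are hypothetical.

```python
import math


def document_weight(scheme, f_td, n_docs, n_t):
    """Weight of a term in a document under schemes 1 to 3.

    f_td: raw count of the term in the document
    n_docs: total number of documents (N)
    n_t: number of documents containing the term
    """
    if scheme == 1:
        return f_td * math.log(n_docs / n_t)
    if scheme == 2:
        return math.log(1 + f_td)
    if scheme == 3:
        if f_td == 0:
            return 0.0  # assumption: absent terms get weight 0
        return (1 + math.log(f_td)) * math.log(n_docs / n_t)
    raise ValueError("scheme must be 1, 2, or 3")


def query_weight(scheme, f_tq, max_f_q, n_docs, n_t):
    """Weight of a term in a query under schemes 1 to 3.

    f_tq: raw count of the term in the query
    max_f_q: largest raw count of any term in the query
    """
    if scheme == 1:
        return (0.5 + 0.5 * f_tq / max_f_q) * math.log(n_docs / n_t)
    if scheme == 2:
        return math.log(1 + n_docs / n_t)
    if scheme == 3:
        if f_tq == 0:
            return 0.0  # assumption: absent terms get weight 0
        return (1 + math.log(f_tq)) * math.log(n_docs / n_t)
    raise ValueError("scheme must be 1, 2, or 3")


# A term found in every document has N / n_t = 1, so log(N / n_t) = 0.
common = document_weight(1, f_td=3, n_docs=100, n_t=100)
# A term found in 10 of 100 documents keeps a positive weight.
rare = document_weight(1, f_td=3, n_docs=100, n_t=10)
```

The two sample calls mirror the property stated in the text: the weight drops to 0 for a word that occurs in every document and stays positive for a word that occurs in few.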

Note: this explanation of "Term frequency–inverse document frequency (tf-idf)" is part of the "tf-idf" article. For the full article, see the overview of "tf-idf".



	Copyright © 2005-2025 Weblio 辞書 IT用語辞典バイナリさくいん. This article uses the 【tf-idf】 entry of IT用語辞典バイナリ.
	Text is available under GNU Free Documentation License (GFDL). The "Wikipedia Subheading Dictionary" articles published on Weblio 辞書 are copied and redistributed from the Wikipedia article tf-idf (revision history) and are provided under the GNU Free Documentation License.
