
Automated Learning of Highly Accurate Prediction Models from Limited Supervised Data



Project Leader
Tatsuya Harada (原田 達也), Professor
Research Center for Advanced Science and Technology, The University of Tokyo
Researchers
  • Masashi Sugiyama (杉山 将), Professor





Development of Training Data: The Barrier to Introducing Machine Learning


Systems and services equipped with sophisticated prediction functions based on machine learning are expanding rapidly and attracting substantial interest. However, the need to develop training data is a major barrier to introducing machine learning, particularly deep learning. Deep learning has so far been introduced successfully mainly in the area of “supervised learning,” and achieving high prediction accuracy with supervised learning requires massive amounts of “training data” (data with correct answers). In many applications, sufficient training data cannot be prepared when a new machine learning system is being introduced, which can leave developers and users unable to benefit from the technology. Even when the raw data needed to build training data is available, preparing it requires enormous cost and specialized knowledge, which poses a major obstacle to introduction. Building systems that allow machine learning to be introduced even with limited information, and reducing the cost of creating high-quality training data, are therefore the foremost challenges for the widespread practical use of intelligent systems.


Details of Project

Research on Underlying Technology of Machine Learning for Automatically Building High-Precision Prediction Models from Limited Training Data


To address this challenge, this research aims to establish underlying technology for automatically building high-precision prediction models from limited training data. The technology is intended to fundamentally solve the diverse problems in developing training data that have been a barrier to introducing machine learning. These range from areas where machine learning could not be introduced because large amounts of high-quality training data could not be gathered, to areas where its introduction had to be abandoned because of the human cost and the specialized knowledge required to develop training data.
In this research, we plan to establish innovative underlying technology from three perspectives. Our research team has already obtained world-leading achievements and evaluation results in each of these areas. By maintaining and extending this advantage, we aim to create even stronger core technologies.

[1] Development of learning theories and algorithms for prediction models using weakly supervised learning


Our research team has built world-leading learning theories and general-purpose algorithms in a field called “weakly supervised learning.” Weakly supervised learning is a learning method in which the label information of the training data is inaccurate or only partially given. In recent years, it has attracted worldwide interest as a powerful way to overcome the massive cost of collecting labels, for example for medical information. This project aims to establish a unified theory of weakly supervised learning by further developing and refining the research results achieved so far. To enable practical application of the unified theory in the real world, we are also pursuing research and development on general-purpose algorithms and on robustness to noise and outliers.
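
As a rough illustration of learning from partially given labels, the sketch below computes a risk estimate for one well-known weakly supervised setting, positive-unlabeled (PU) learning, where only some positive examples are labeled and everything else is unlabeled. This is an assumed example setting, not necessarily the formulation used in this project. The names `sigmoid_loss`, `nn_pu_risk`, and `class_prior` are illustrative, and the class prior (the fraction of positives among the unlabeled data) is assumed to be known.

```python
import math


def sigmoid_loss(margin):
    """Surrogate loss: close to 0 for large positive margins, close to 1 for large negative ones."""
    return 1.0 / (1.0 + math.exp(margin))


def mean(values):
    return sum(values) / len(values)


def nn_pu_risk(scores_pos, scores_unl, class_prior):
    """Non-negative PU risk estimate computed from classifier scores.

    scores_pos:  classifier scores on the labeled positive examples
    scores_unl:  classifier scores on the unlabeled examples
    class_prior: assumed fraction of positives among the unlabeled data
    """
    # Loss when positives are (correctly) treated as positive.
    risk_pos_as_pos = mean([sigmoid_loss(s) for s in scores_pos])
    # Loss if the same positives were treated as negative.
    risk_pos_as_neg = mean([sigmoid_loss(-s) for s in scores_pos])
    # Loss when all unlabeled examples are treated as negative.
    risk_unl_as_neg = mean([sigmoid_loss(-s) for s in scores_unl])
    # Unlabeled data mixes positives and negatives, so subtract the
    # positive share to estimate the risk on true negatives.
    negative_part = risk_unl_as_neg - class_prior * risk_pos_as_neg
    # Clip at zero so the estimate cannot become negative.
    return class_prior * risk_pos_as_pos + max(0.0, negative_part)


if __name__ == "__main__":
    print(nn_pu_risk([2.0, 1.5], [1.0, -1.0, -2.0], class_prior=0.3))
```

A training loop would minimize this quantity over the classifier's parameters. The point of the sketch is only that a usable training objective can be written without any labeled negative examples.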

[2] Development of knowledge transfer theory and algorithms


Another focus of ours is the development of a knowledge transfer technology called domain adaptation. Domain adaptation is a technique for adapting a prediction model learned in one domain to a target domain with different properties. This makes it possible, for example, to convert knowledge generated artificially and in large quantities by simulation into knowledge that can be used in real environments. A challenge with current domain adaptation is the strong restriction that the source and target categories must match. In this project, we are pursuing research on technologies that relax this restriction. Other research and development efforts include technologies for increasing the number and diversity of source domains and for selecting and adapting appropriate knowledge from them, as well as methods for combining [1] weakly supervised learning with domain adaptation.
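
To make the idea of adapting source data to a target with different properties more concrete, here is a minimal sketch of one very simple form of domain adaptation: shifting and rescaling each source feature so that its mean and standard deviation match those of the target domain. This moment-matching step is an assumption chosen for illustration and is far simpler than the techniques the project describes. The function name `align_to_target` and the toy data are hypothetical.

```python
import statistics


def align_to_target(source_rows, target_rows):
    """Rescale each source feature so its mean and standard deviation match the target domain.

    source_rows, target_rows: lists of equal-width feature rows (lists of floats).
    Returns a new list of aligned source rows; the inputs are not modified.
    """
    n_features = len(source_rows[0])
    aligned_columns = []
    for j in range(n_features):
        src_col = [row[j] for row in source_rows]
        tgt_col = [row[j] for row in target_rows]
        src_mean = statistics.fmean(src_col)
        tgt_mean = statistics.fmean(tgt_col)
        src_std = statistics.pstdev(src_col)
        tgt_std = statistics.pstdev(tgt_col)
        # A constant source feature cannot be rescaled, so only shift it.
        scale = tgt_std / src_std if src_std > 0 else 1.0
        aligned_columns.append([(v - src_mean) * scale + tgt_mean for v in src_col])
    # Transpose the columns back into rows.
    return [list(row) for row in zip(*aligned_columns)]


if __name__ == "__main__":
    simulated = [[0.0, 5.0], [2.0, 7.0]]      # e.g. features from simulation
    real = [[10.0, 1.0], [14.0, 2.0]]         # e.g. unlabeled features from the real environment
    print(align_to_target(simulated, real))
```

A model trained on the aligned source rows (with their original labels) then sees inputs on the same scale as the target data. Note that this sketch uses no target labels and assumes the two domains share the same features.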

[3] Automatic construction and application of high-precision prediction models


We are also engaged in developing an integrated underlying technology for automatic learning that combines [1] weakly supervised learning and [2] knowledge transfer technology. To build high-precision models in machine learning, it is essential to choose model structures, preprocess data, set parameters, and so on appropriately. We therefore aim to build an efficient parallel and distributed processing platform that performs these settings automatically. In particular, for [1] weakly supervised learning, we will focus on automating the selection of the most suitable algorithm based on the quality of the information attached to the training data (e.g., labeled, unlabeled, label reliability, similarity). For [2] knowledge transfer technology, we will focus on automating the selection of appropriate knowledge to transfer, and we will develop methods that integrate the two.
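
The selection of an algorithm from the kind of supervision available could, in its simplest form, look like the rule-based sketch below. Everything here is hypothetical: the `SupervisionProfile` fields, the `choose_strategy` function, the strategy names, and the 0.9 reliability threshold are assumptions made for illustration, and a real system would presumably learn or tune such decisions rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class SupervisionProfile:
    """Hypothetical summary of the supervision available for a dataset."""
    n_labeled: int
    n_unlabeled: int
    label_reliability: float              # estimated fraction of correct labels, 0.0 to 1.0
    has_pairwise_similarity: bool = False  # whether "similar / dissimilar" pair information exists


def choose_strategy(profile, reliability_threshold=0.9):
    """Pick a family of learning algorithms from the supervision profile (illustrative rules only)."""
    if profile.n_labeled == 0:
        # No labels at all: fall back on similarity information if there is any.
        if profile.has_pairwise_similarity:
            return "similarity-based learning"
        return "unsupervised learning"
    if profile.label_reliability < reliability_threshold:
        # Labels exist but are too unreliable to be trusted as they are.
        return "noisy-label learning"
    if profile.n_unlabeled > 0:
        # Reliable labels plus extra unlabeled data.
        return "semi-supervised learning"
    return "supervised learning"


if __name__ == "__main__":
    profile = SupervisionProfile(n_labeled=50, n_unlabeled=1000, label_reliability=0.7)
    print(choose_strategy(profile))
```

The sketch only shows that properties such as label reliability and the presence of unlabeled data can serve as explicit inputs to an automated choice.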


Values / Hopes

Future Possibilities Created by This Project

This research tackles the barrier that training data development poses to introducing machine learning. It is an underlying technology for leveraging artificial intelligence (AI) and has an extremely wide range of applications. We hope that our work will open up new possibilities for introducing machine learning and enable it to be applied to a wider range of industries and services than ever before. We also believe that, by enabling AI-based intelligent systems to be used widely in society, this underlying technology will contribute significantly to realizing the better society that Beyond AI strives for.
Furthermore, the underlying technology is expected to produce important results for the development of science and technology. It has the potential to change conventional methodologies in scientific research, where it has been difficult to obtain massive amounts of training data because of limits on the number of experiments that can be conducted, and to help acquire previously unimagined knowledge in various areas of natural science.