Microsoft has released a set of 100,000 questions and answers that artificial intelligence (AI) researchers can use to create systems that can read and answer questions as precisely as a human.
“The dataset is called MS MARCO, which stands for Microsoft MAchine Reading COmprehension, and can be used to teach artificial intelligence systems to recognise questions and formulate answers and, eventually, to create systems that can come up with their own answers based on unique questions they have not seen before,” said Microsoft in a blog post.
The researchers said that providing realistic questions and answers lets them train systems to deal better with the nuances and complexities of the questions ordinary people actually ask, including queries that have no clear answer or that have multiple possible answers.
“Our dataset is designed not only using real-world data but also removing such constraints so that the new-generation deep learning models can understand the data first before they answer questions,” added Li Deng, Partner Research Manager of Microsoft’s Deep Learning Technology Centre.
The MS MARCO dataset is available for free to any researcher who wants to download it and use it for non-commercial applications, Microsoft said.