Seminar: Using Directed Crawling and Word2vec to find Characteristic Vocabulary for a new Domain (Gregory Grefenstette, INRIA)
The ERTIM team at INALCO is pleased to invite you to its next research seminar, on Tuesday 22 November at 10:30, at 2 rue de Lille, Paris (7th arrondissement):
Using Directed Crawling and Word2vec to find Characteristic Vocabulary for a new Domain (Gregory Grefenstette, INRIA)
Specialized dictionaries are used to understand concepts in specific domains, especially where those concepts are not part of the general vocabulary or have meanings that differ from ordinary language. The first step in creating a specialized dictionary is detecting the characteristic vocabulary of the domain in question. Classical methods for detecting this vocabulary involve gathering a domain corpus, calculating statistics on the terms found there, and then comparing these statistics to those of a background or general-language corpus. Terms that are found significantly more often in the specialized corpus than in the background corpus are candidates for the characteristic vocabulary of the domain. Here we present two tools, a directed crawler and a distributional semantics package, that can be used together to circumvent the need for a background corpus.
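As an illustration of the classical approach described above (not of the tools presented in the talk), the comparison between a domain corpus and a background corpus can be sketched as a smoothed log-ratio of relative frequencies. The function name `characteristic_terms`, the parameters `min_count` and `smoothing`, and the choice of a log-ratio statistic are assumptions made for this example; other statistics, such as log-likelihood, are also commonly used.

```python
import math
from collections import Counter


def characteristic_terms(domain_tokens, background_tokens,
                         min_count=2, smoothing=1.0):
    """Rank domain terms by how much more frequent they are in the
    domain corpus than in the background corpus.

    Hypothetical helper for illustration: the score is the log of the
    ratio of additively smoothed relative frequencies.
    """
    dom = Counter(domain_tokens)
    bg = Counter(background_tokens)
    dom_total = sum(dom.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(dom) | set(bg))

    scores = {}
    for term, count in dom.items():
        if count < min_count:
            continue  # skip terms too rare to give a reliable estimate
        # Smoothing keeps the ratio finite for terms that never occur
        # in the background corpus.
        p_dom = (count + smoothing) / (dom_total + smoothing * vocab_size)
        p_bg = (bg[term] + smoothing) / (bg_total + smoothing * vocab_size)
        scores[term] = math.log(p_dom / p_bg)

    # Highest scores first: the strongest candidates for domain vocabulary.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Terms with a large positive score are candidates for the characteristic vocabulary, while common function words, which are about equally frequent in both corpora, score near or below zero.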
Gregory Grefenstette is a senior researcher at INRIA Saclay, France. An expert in information retrieval and natural language processing, Grefenstette established the field of Cross Language Information Retrieval and is also one of the pioneers of distributional semantics. Involved in information retrieval since the early TREC days, he has always been keen on large-scale solutions to NLP problems. Formerly chief scientist at the Xerox Research Centre Europe (1993-2001), at Clairvoyance Corporation (2001-04), and with the French CEA (2004-08), and scientific director at Exalead (2008-13), he has been active in transferring research into products as an inventor on 19 U.S. patents. His current research interests are lifelogging and personal semantics.
The seminar will take place on Tuesday 22 November 2016, from 10:30 to 12:30, at Inalco Recherche, 2 rue de Lille, Paris (salons d'honneur). Access: metro stations Saint Germain des Prés (line 4), Rue du Bac (line 12), Palais Royal - Musée du Louvre (line 1); RER station Musée d'Orsay (RER C).