One PhD position at the University of Toulouse, France.
Renault Labs France and the Informatics Research Institute of Toulouse (IRIT) invite applications for a fully funded PhD position on "Deep learning for texts and knowledge bases access." The PhD will be co-supervised by François-Paul Servant (Renault), Prof. Lynda Tamine and Dr. Jose Moreno.
The thesis targets two main objectives:
1) the semantic representation of documents that mention entities from different external resources;
2) the categorization of documents by family of entities mentioned, and the search for documents that meet entity-related needs.
To achieve these objectives, we plan to adopt a deep learning approach that addresses both the representation problem and the access problems (categorization, information retrieval), constrained by the content and structure of multiple external resources (terminologies, thesauri, knowledge graphs, etc.).
In terms of representation, our approach is in line with recent work on the joint regularization of neural embeddings augmented by external resources [Faruqui2014; Yu2014; Wang2014; Yamada2016]. This work rests on the hypothesis that learned representations are interpretable if they are aligned with entities derived from resources, so that entity representations in the latent space are closer to each other when the entities are semantically related in the external resources. Extended to sentences and texts, these representations can be used in information retrieval [Nguyen2018], in the identification of entity mentions [Moreno2017], or in the categorization of short textual documents [Kim2014]. Although distributional representations exist for words and texts, for structured resources and for their combinations, no existing work addresses constrained regularization over multiple resources, nor the multi-level structuring of entities in these resources. One of the first works in this direction uses Poincaré geometry to represent the hierarchies found in resources [Nickel2017], but it completely ignores the representation of relationships between entities. Yet relationships are omnipresent in today's widely used knowledge bases, including those considered at Renault.
The thesis project faces new scientific challenges related to the definition of adequate neural architectures and associated cost functions, capable of learning semantic compositionality in both the local context (text) and the global context (resources).
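As a rough illustration of what regularizing embeddings with an external resource can look like, the sketch below follows the retrofitting idea of [Faruqui2014] in a simplified form: each vector is repeatedly replaced by a weighted average of its original value and the current vectors of its neighbours in the resource. The function name, the toy vectors and the neighbour lists are hypothetical and chosen only for illustration; this is not the method to be developed in the thesis.

```python
def retrofit(vectors, neighbours, iterations=10, alpha=1.0):
    """Pull each vector toward its neighbours in an external resource.

    vectors:    dict word -> list of floats (the original embeddings)
    neighbours: dict word -> list of related words in the resource
    alpha:      weight that keeps a vector close to its original value
    """
    updated = {word: list(vec) for word, vec in vectors.items()}
    for _ in range(iterations):
        for word, related in neighbours.items():
            related = [r for r in related if r in updated]
            if word not in updated or not related:
                continue  # nothing to regularize against
            beta = 1.0 / len(related)  # uniform weight over neighbours
            total_weight = alpha + beta * len(related)
            updated[word] = [
                (alpha * vectors[word][k]
                 + sum(beta * updated[r][k] for r in related)) / total_weight
                for k in range(len(vectors[word]))
            ]
    return updated


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# Toy data (hypothetical): "car" and "vehicle" are related in the resource,
# "banana" has no neighbours and is therefore left unchanged.
vectors = {"car": [1.0, 0.0], "vehicle": [0.0, 1.0], "banana": [-1.0, 0.0]}
neighbours = {"car": ["vehicle"], "vehicle": ["car"]}

retrofitted = retrofit(vectors, neighbours)
before = euclidean(vectors["car"], vectors["vehicle"])
after = euclidean(retrofitted["car"], retrofitted["vehicle"])
print(before, after)
```

In this toy setting the two related entities end up closer in the latent space than they were initially, while the unrelated one keeps its original vector.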
The envisioned starting date is September 2018 (starting in early 2019 is also possible).
We are looking for one candidate with a strong focus on information retrieval/NLP and machine learning with the following profile:
+ Good Master's degree in Computer Science, Statistics, Mathematics or related disciplines (essential)
+ Good programming skills in Python (Keras, TensorFlow or Torch) (essential)
+ Advanced knowledge in algorithms and data structures (optional)
+ Ability to work independently and be self-motivated (essential)
+ Excellent communication skills in English (essential; minimum TOEIC score of 750)
The application should consist of the following:
+ a curriculum vitae
+ transcripts of marks for M1 and M2, or for the last 3 years of engineering school (with an indication of ranking if possible)
+ a cover letter
+ letter(s) of recommendation, including at least one written by a university referee
Potential candidates will be invited for an interview with the supervisors. The application file should be sent
Conditions of employment
You will be hired on a fixed-term contract (3-year CIFRE contract) at Renault, a world leader in car manufacturing.
Working in Toulouse (IRIT/Renault)
You will join two teams, one academic and one industrial: the IRIS team of IRIT, recognized for its research in information retrieval and information synthesis with a focus on deep learning technologies, and the Renault team.
The IRIT lab is one of the major forces in French computer science research, with a workforce of more than 700 members, including 272 researchers and teachers, 204 PhD students, 50 postdocs and contract researchers, and 32 engineers and administrative employees.
Toulouse is located on the banks of the Garonne River, 150 km from the Mediterranean Sea, 230 km from the Atlantic Ocean and 680 km from Paris. It is the fourth-largest metro area in France, with 1,312,304 inhabitants as of January 2014. Toulouse is the centre of the European aerospace industry, with the headquarters of Airbus, the Galileo positioning system, the SPOT satellite system, ATR and the Aerospace Valley. It also hosts the European headquarters of Intel and CNES's Toulouse Space Centre (CST), the largest space centre in Europe. Thales Alenia Space and Astrium Satellites also have a significant presence in Toulouse. The University of Toulouse is one of the oldest in Europe (founded in 1229) and, with more than 103,000 students, it is the fourth-largest university campus in France, after the universities of Paris, Lyon and Lille. The city was the capital of the Visigothic Kingdom in the 5th century and the capital of the province of Languedoc in the Late Middle Ages and early modern period, making it the unofficial capital of the cultural region of Occitania (Southern France).
[Faruqui2014] Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., Smith, N. A. Retrofitting Word Vectors to Semantic Lexicons. NAACL, 2014.
[Moreno2017] Moreno, J. G., Besançon, R., Beaumont, R., D'hondt, E., Ligozat, A. L., Rosset, S., Grau, B. Combining word and entity embeddings for entity linking. In Extended Semantic Web Conference (ESWC), pp. 337-352, 2017.
[Nickel2017] Nickel, M., Kiela, D. Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems, pp. 6341-6350, 2017.
[Nguyen2018] Nguyen, G., Tamine, L., Soulier, L., Souf, N. A Tri-Partite Neural Document Language Model for Semantic Information Retrieval. In Extended Semantic Web Conference (ESWC), 2018.
[Yu2014] Yu, M., Dredze, M. Improving Lexical Embeddings with Semantic Knowledge. ACL, pp. 545-550, 2014.
[Wang2014] Wang, Z., Zhang, J., Feng, J., Chen, Z. Knowledge Graph and Text Jointly Embedding. EMNLP, pp. 1591-1601, 2014.
[Yamada2016] Yamada, I., Shindo, H., Takeda, H., Takefuji, Y. Joint Learning of the Embedding of Words and Entities for Named Entity Disambiguation. CoNLL, pp. 250-259, 2016.