In the second half of July (20th of July – 27th of July) I attended the Lisbon Machine Learning Summer School (LxMLS2017). As every year, the summer school is held in Lisbon, Portugal, at Instituto Superior Técnico (IST). The summer school is organized jointly by IST, the Instituto de Telecomunicações, the Instituto de Engenharia de Sistemas e Computadores, Investigação e Desenvolvimento em Lisboa (INESC-ID), Unbabel, and Priberam Labs.
Around 170 students (mostly PhD students but also master's students) attended the summer school. It’s important to mention that only around 40% of the applicants are accepted, so make sure you have a strong motivation letter! For eight days we learned about machine learning with a focus on natural language processing. Each day was divided into 3 parts: lectures in the morning, labs in the afternoon and practical talks in the evening (yes, quite a busy schedule).
In general, the morning lectures and the labs mapped really well: first learn the notions and then put them into practice. During the labs we worked with Python and IPython Notebooks. Most of the labs had the base code already implemented and we just had to fill in some functions. However, for some of the lectures/labs this wasn’t that easy. I’m not going to discuss the morning lectures in detail, but I’ll mention the speakers and their topics (also, the slides are available on the website of the summer school):
- Mário Figueiredo: an introduction to probability theory, which proved to be fundamental for understanding the following lectures.
- Stefan Riezler: an introduction to linear learners using an analogy with the perceptual system of a frog, i.e., given that the goal of a frog is to capture any object of the size of an insect or worm provided it moves like one, can we build a model of this perceptual system and learn to capture the right objects?
- Noah Smith: gave an introduction to sequence models such as Markov models and hidden Markov models and presented the Viterbi algorithm, which is used to find the most likely sequence of hidden states.
- Xavier Carreras: talked about structured predictors (i.e., given training data, learn a predictor that performs well on unseen inputs) using a named entity recognition task as a running example. He also discussed Conditional Random Fields (CRFs), an approach that gives good results in such tasks.
- Yoav Goldberg: talked about syntax and parsing, with many examples of their use in sentiment analysis, machine translation and other applications. Compared to the rest of the lectures, this one had much less math and was easy to follow!
- Bhiksha Raj: gave an introduction to neural networks, more specifically convolutional neural networks (CNNs) and recurrent neural networks (RNNs). He started with the early models of human cognition, associationism (i.e., humans learn through association) and connectionism (i.e., the information is in the connections and the human brain is a connectionist machine).
- Chris Dyer: discussed modeling sequential data with recurrent networks (but not only). He showed many examples related to language models, long short-term memory networks (LSTMs) and conditional language models, among others. However, even if it’s easy to think of tasks that could be solved by conditional language models, most of the time the data does not exist, a problem that seems to appear in many fields.
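To make the Viterbi algorithm from the sequence models lecture a bit more concrete, here is a minimal sketch in plain Python. This is not the code from the labs: the function name `viterbi` and the toy health/symptom probability tables are my own assumptions, made up purely for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state sequence, its joint probability)."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: previous state on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path probability
            prev, score = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s][observations[t]]
            back[t][s] = prev

    # Follow the back-pointers from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy model (all numbers are invented for this example)
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

For longer sequences one would normally work with log-probabilities to avoid numerical underflow; I left that out to keep the sketch short.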
In the last part of the day we had practical talks or special talks about concrete applications based on the techniques learnt during the morning lectures. During the first day we were invited to attend a panel discussion named “Thinking machines: risks and opportunities” at the conference “Innovation, Society and Technology”, where six speakers from the AI field (Fernando Pereira – VP and Engineering Fellow at Google, Luís Sarmento – CTO at Tonic App, André Martins – Unbabel Senior researcher, Mário Figueiredo – Instituto de Telecomunicações at IST, José Santos Victor – president of the Institute for Systems and Robotics at IST and Arlindo Oliveira – president of Instituto Superior Técnico) discussed the benefits and risks of artificial intelligence and machine learning. Here are a couple of thoughts:
- Fernando Pereira: In order to enable people to make better use of technology, we need to make machines smarter at interacting with us and helping us.
- André Martins pointed out an interesting problem: people spend time solving very specific problems, but the solutions are never generalized. (But what if generalizing them is not possible?)
- Fernando Pereira: we build smart tools but only a limited number of people are able to control them, so we need to build the systems in a smarter way and make the systems responsible to humans.
Another evening hosted the Demo Day, an informal gathering that brings together a number of highly technical companies and research institutions, all with the aim of solving machine learning problems through technology. There were a lot of enthusiastic people to talk to, and many demos and products. I even discovered a new crowdsourcing platform, DefinedCrowd, which might soon start competing with CrowdFlower and Amazon Mechanical Turk.
Here are some other interesting talks that we followed:
- Fernando Pereira – “Learning and representation in language understanding”: talked about learning language representations using machine learning. Machine understanding of language is not a solved problem, however: learning from labeled data or learning with distant supervision may not yield the desired results, so it’s time to go implicit. He then introduced the paper “Attention Is All You Need” by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser and Illia Polosukhin. In this paper, the authors claim that you do not need complex CNN or RNN models: attention mechanisms alone are enough to obtain high-quality machine translation.
- Graham Neubig – “Simple and Efficient Learning with Dynamic Neural Networks”: dynamic neural network toolkits such as DyNet can be used as alternatives to TensorFlow or Theano. According to Graham, the advantages of this approach are that the API is closer to standard Python/C++ and that it’s easier to implement nets with varying structure; the disadvantages are that it’s harder (but still possible) to optimize graphs and harder to schedule data transfer.
- Kyunghyun Cho – “Neural Machine Translation and Beyond”: showed why sentence-level and word-level machine translation is not ideal: (1) it handles the various morphological variants of words inefficiently, (2) it needs good tokenisation for every language (not that easy), (3) it is not able to handle typos or spelling errors. Therefore, character-level translation is what we need, because it’s more robust to errors and better at handling rare tokens (which are actually not necessarily rare).
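To illustrate the point about typos from the last talk, here is a small sketch in plain Python that compares how many tokens fall outside the vocabulary at word level and at character level. It only looks at vocabulary coverage, not at translation quality, and everything in it (the helper names `word_tokens`, `char_tokens`, `unknown_rate` and the tiny "training" sentences) is my own made-up example rather than something shown in the talk.

```python
def word_tokens(text):
    # Naive whitespace tokenisation; real systems need much more than this
    return text.lower().split()


def char_tokens(text):
    # Every character (including spaces) is a token
    return list(text.lower())


def unknown_rate(tokens, vocab):
    """Fraction of tokens that were never seen in the training vocabulary."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if tok not in vocab) / len(tokens)


# Tiny invented "training corpus"
training = ["the translation is good", "the model reads the sentence"]
word_vocab = {tok for sent in training for tok in word_tokens(sent)}
char_vocab = {tok for sent in training for tok in char_tokens(sent)}

# The same kind of sentence, but with two spelling errors
noisy = "the translaton is goood"
word_unk = unknown_rate(word_tokens(noisy), word_vocab)
char_unk = unknown_rate(char_tokens(noisy), char_vocab)
print(f"word-level unknown rate: {word_unk:.2f}")
print(f"char-level unknown rate: {char_unk:.2f}")
```

In this toy case the two misspelled words are unknown to the word-level vocabulary, while every character of the noisy sentence has already been seen, which is the intuition behind character-level models being more robust to spelling errors.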