This post is coming a bit late (yay, summer holidays!), but here is my overview of the Lisbon Machine Learning Summer School (LxMLS 2015). Now in its 5th edition, the school was held at the Instituto Superior Técnico in Lisbon, Portugal, organized jointly with the Instituto de Telecomunicações and the Spoken Language Systems Lab (L2F) of INESC-ID. It lasted a little over one week, from the 16th to the 23rd of July. The broad topic was machine learning applied to natural language processing. We had lectures and practical sessions, but also a series of fun keynotes on various NLP projects, delivered by some of the most famous people in the field. The program attracted a lot of attention: there were around 100 participants, and the acceptance rate was around 40% (there wasn’t enough space for all the people that wanted to attend). That is understandable, since NLP is quite a hot topic these days!
The lecture topics were quite diverse, starting from a basic introduction in probability theory by Mário Figueiredo, all the way to the latest breakthroughs in the field on learning with big data and deep learning. Some highlights:
- Noah Smith talked about sequence classification using generative approaches (i.e. approaches that attempt to model the joint probability distribution of the data, P(X,Y)), with a focus on hidden Markov models. This was a really fun lecture: Markov models were exemplified as series of die throws (it was the first time I had seen this metaphor, and it really helped me conceptualize the model), and we got an overview of the Viterbi algorithm for finding the most likely sequence of hidden states.
- Xavier Carreras (from Xerox) gave a lecture on learning structured predictors with a discriminative approach (i.e. modeling the conditional probability of the sequence, given the observed data, P(Y|X)). In particular, we discussed conditional random fields, and the structured perceptron.
- Slav Petrov from Google lectured on syntax and parsing, discussing methods for both constituency (the CYK algorithm) and dependency parsing (Eisner’s algorithm). In contrast with the rest of the course, this lecture used almost no math notation — all formulas/formal definitions were shown in pseudocode. I was not alone in thinking this made the slides easier to follow!
- Chris Dyer from Carnegie Mellon University talked about the MapReduce programming model and how it can be used in learning with big data. In particular, we took a look at how to implement the PageRank algorithm using the MapReduce approach to parallelization. This was a very technical talk (distributed file system architecture was discussed), and a welcome addition to a program heavy on the theoretical side.
- Yoshua Bengio’s lecture on deep learning was probably the most anticipated topic of the course. The statement that deep learning is trying to “solve AI” was bound to generate some controversy. On the other hand, it is hard to argue against the impressive results shown in image recognition and question answering (more about this in the keynote discussion). The lecture went from a general introduction to multi-layered neural networks and manifold models, through the characteristics and issues of high-dimensional data (e.g. shifting attention from finding a global minimum to dealing with saddle points), to more experimental methods like adversarial networks (i.e. networks that compete against each other to improve the performance of the overall model).
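To make the die-throw picture of hidden Markov models a bit more concrete, here is a minimal sketch of the Viterbi algorithm in plain Python. It is not the code from the lecture or the lab manual: the two hidden states (a "fair" and a "loaded" die) and all the probabilities are my own illustrative assumptions.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path and its log-probability."""
    # best[t][s]: log-probability of the best path ending in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Follow the back-pointers from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical model: a casino that occasionally swaps in a loaded die.
STATES = ["fair", "loaded"]
START_P = {"fair": 0.5, "loaded": 0.5}
TRANS_P = {
    "fair": {"fair": 0.95, "loaded": 0.05},
    "loaded": {"fair": 0.10, "loaded": 0.90},
}
EMIT_P = {
    "fair": {face: 1 / 6 for face in range(1, 7)},
    "loaded": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

if __name__ == "__main__":
    rolls = [1, 3, 6, 6, 6, 6, 2, 6]
    path, log_prob = viterbi(rolls, STATES, START_P, TRANS_P, EMIT_P)
    print(list(zip(rolls, path)), log_prob)
```

Working in log space avoids numerical underflow on long sequences, which is why the products of probabilities show up as sums here.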
The practical component of the course featured some intense programming sessions, implementing a set of the algorithms introduced in the lectures, from a simple Naïve Bayes classifier to Eisner’s algorithm for projective dependency parsing. The lab manual is available on GitHub. I’ve been meaning to become familiar with how data analysis is done in Python, and the lab sessions were a perfect opportunity to play with NumPy, Python’s library for numerical and scientific computing, and Matplotlib for data visualization. IPython Notebook proved to be a great tool for putting together code, course notes and plots. Normally I do my data analysis in R, while discussions of results with my group happen on spreadsheets in Google Drive. Notebook combines these functionalities in one tool, so I am seriously considering switching!
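For a flavour of the simplest of these exercises, here is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing. It is written in plain Python rather than NumPy to stay self-contained, it is not the lab's implementation, and the tiny training set and function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Estimate log-priors and add-one-smoothed log-likelihoods from tokenized docs."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        # Add-one smoothing: every vocabulary word gets one extra pseudo-count
        total = sum(word_counts[c].values()) + len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / total) for w in vocab}
    return log_prior, log_like, vocab

def predict(tokens, log_prior, log_like, vocab):
    """Return the class with the highest posterior score; unseen words are skipped."""
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in tokens if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy data, purely for illustration.
DOCS = [
    ["great", "fun", "lecture"],
    ["fun", "keynote"],
    ["boring", "slow", "lecture"],
    ["slow", "boring", "talk"],
]
LABELS = ["pos", "pos", "neg", "neg"]

if __name__ == "__main__":
    model = train_naive_bayes(DOCS, LABELS)
    print(predict(["fun", "talk"], *model))
```

The "naïve" part is the assumption that words are independent given the class, which is what lets the score be a simple sum of per-word log-likelihoods.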
On the topic of deep learning, we also had an introduction to Theano, a Python library for defining, optimizing, and evaluating mathematical expressions involving multi-dimensional arrays. It is used to implement deep learning models and run them on the GPU, which is much faster for this kind of computation. Specifically, Theano uses a symbolic representation of functions, from which it can derive the gradients that gradient descent needs for learning.
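The idea of deriving gradients from a symbolic expression can be illustrated without Theano itself. The sketch below is a toy of my own, not Theano's API: the `Var`, `Const`, `Add` and `Mul` classes are hypothetical, support only addition and multiplication of scalars, and exist just to show how an expression graph can be differentiated and then plugged into gradient descent.

```python
class Expr:
    """Base class: operators build new expression nodes instead of computing values."""
    def __add__(self, other):
        return Add(self, wrap(other))
    def __mul__(self, other):
        return Mul(self, wrap(other))

class Const(Expr):
    def __init__(self, value):
        self.value = value
    def eval(self, env):
        return self.value
    def diff(self, var):
        return Const(0.0)

class Var(Expr):
    def __init__(self, name):
        self.name = name
    def eval(self, env):
        return env[self.name]
    def diff(self, var):
        return Const(1.0 if var.name == self.name else 0.0)

class Add(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def eval(self, env):
        return self.a.eval(env) + self.b.eval(env)
    def diff(self, var):
        # Sum rule
        return self.a.diff(var) + self.b.diff(var)

class Mul(Expr):
    def __init__(self, a, b):
        self.a, self.b = a, b
    def eval(self, env):
        return self.a.eval(env) * self.b.eval(env)
    def diff(self, var):
        # Product rule
        return self.a.diff(var) * self.b + self.a * self.b.diff(var)

def wrap(x):
    """Turn plain numbers into Const nodes."""
    return x if isinstance(x, Expr) else Const(float(x))

def gradient_descent(grad_expr, name, start, lr=0.1, steps=100):
    """Repeatedly step against the gradient, evaluated from its symbolic form."""
    value = start
    for _ in range(steps):
        value -= lr * grad_expr.eval({name: value})
    return value

# Symbolic loss (w - 3)^2 and its symbolically derived gradient.
w = Var("w")
residual = w + (-3.0)
loss = residual * residual
grad = loss.diff(w)

if __name__ == "__main__":
    print(gradient_descent(grad, "w", start=0.0))
```

The loss is only ever written down once; the gradient is derived from that description rather than coded by hand, which is the convenience a symbolic library provides at a much larger scale.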
The program also included great keynotes on various NLP applications and the state of the art in the field. Some personal highlights:
- Yejin Choi talked about generating textual descriptions of images, combining syntax parsing with semantics and image recognition. I had discussed her paper on BabyTalk some years ago in a job interview, so it was a nice surprise to be able to put a face to the work!
- Fernando Pereira from Google gave a talk on combining semantics with machine learning to perform document search. The bottom line is: we have knowledge bases, and we have a lot of text, but linking the two is difficult. Rule-based methods are not the best, but ML techniques do not generalize well from inductive reasoning. Anaphora resolution seems to be the next big research goal here. Could crowdsourcing be a solution, as proposed by Massimo Poesio at ESWC2015?
- Roberto Navigli gave a talk on multilingual word sense disambiguation and entity linking, introducing his project BabelNet. This was an updated version of his ISWC2014 talk, this time focusing on the NLP side. He also brought some t-shirts for us.
- Saving the best for last, the summer school ended with an awesome talk by Phil Blunsom from the now Google-owned DeepMind. Blunsom introduced his latest work on teaching machines to read and comprehend language with deep learning. This method shows very promising results in a question-answering experiment over news articles. The best performing model, the so-called “impatient reader”, generates a large number of candidates, iterates through the query and gradually removes candidate solutions. The main challenge for the future is getting enough training data to properly generalize the model. It is also not clear how well suited deep learning is to modeling more complex language phenomena, like anaphora resolution.
Also part of the summer school was a demo evening, introducing us to Lisbon’s emerging start-up scene. I had an interesting chat with the guys at UnBabel, a start-up working on translation. Their idea is to have language experts fix machine translation errors. They have their own text annotation tool, but so far only one person working per language pair, although they plan to expand. I am curious how they will tackle inter-annotator disagreement in the future, and will definitely keep an eye on them!
An interesting thing I noticed was the lack of activity on Twitter. Coming from a web science background, I am used to lots of social media buzz around conferences and summer schools. The ML/NLP community seems more traditional in this respect: most of the discussions took place on the mailing list. I wonder whether the fact that many of the pioneers in the field work for companies that compete with Twitter could also be influencing this.
And finally, it wouldn’t be the summer school without the social events, where some of the most informative discussions were had! It’s how I found out about this comprehensive paper on tackling various NLP tasks with deep learning. Some personal highlights from Lisbon:
- the banquet at the beautiful Moorish palace in Casa do Alentejo,
- the awesome seafood (seems to be a theme this summer) and pastry, the light fizzy wine,
- the beautiful beaches on the Atlantic coast!