DS314BKK
Mastering NLP: Foundations to Frontiers with LLMs

Faculty
Polina Proskura
Applied Scientist at Amazon
Overview
Natural Language Processing is central to many modern AI applications – from search engines and machine translation to large-scale text analysis. This course offers a comprehensive introduction to state-of-the-art NLP methods, with a strong focus on recent advances in large language models (LLMs), specifically BERT and GPT.
We will begin the course with the fundamentals of NLP, such as text preprocessing and vector-based representations of words and sentences. We will then move on to tasks such as text classification, language modelling, and machine translation, to develop familiarity with the core challenges in the field. Building on this foundation, we will explore modern neural network architectures for language tasks, progressing from RNNs to BERT and GPT-based transformers.
The course places a strong emphasis on both practical skills and theoretical understanding, providing valuable preparation for a future career as a data scientist.
Learning highlights
- Understand the theoretical foundations of modern Natural Language Processing, including the architecture of complex systems such as transformers and large language models.
- Gain hands-on experience with state-of-the-art NLP frameworks and libraries, such as PyTorch, Hugging Face Transformers, and the OpenAI API.
- Develop the ability to implement and evaluate solutions for tasks such as text generation, machine translation, question answering, and text summarisation.
- Become familiar with the capabilities, limitations, and complexities of NLP models in various industrial applications.
- Design and present your own solution to a real-world NLP problem.
Course outline
15 classes
Introduction. Text Processing.
Structure of text data. Preprocessing techniques: tokenisation, normalisation, stemming, and lemmatisation. Text preprocessing in Python using the NLTK and spaCy libraries.
Linguistic Foundations of NLP problems.
Linguistic basics: syntax, semantics, and morphology. POS tagging and dependency parsing applied to real-world problems. Errors and ambiguities in real-world texts.
Text Representations.
Basic methods: Bag of Words, TF-IDF, and the vector space model. Document similarity and text classification problems.
Language Modelling Problem.
N-gram language models: sequence probabilities, smoothing techniques, and perplexity. Generation strategies including greedy decoding and beam search.
Word Embeddings.
Distributional semantics. Word2Vec: CBOW and Skip-Gram. Negative sampling. GloVe. Evaluation of word embeddings: intrinsic vs extrinsic. Gensim library.
Neural Networks for NLP.
Introduction to neural networks: feedforward neural network, backpropagation, gradient descent. Text classification problem using PyTorch.
Recurrent Neural Networks and LSTMs.
RNN. Vanishing gradient problem. LSTM and GRU. Applications and evaluation.
Sequence-to-sequence problems.
Encoder-decoder models. Machine translation problem. Conditional language modelling. Bottleneck. Attention mechanism.
Transformers.
Self-attention, positional encoding, residual connections. Architecture of the modern transformer. Interpretability of transformers.
Pretrained models and BERT.
Masked language modelling. Next sentence prediction. BERT architecture and applications. Fine-tuning BERT using the transformers library.
GPT models.
Causal language modelling. GPT architecture and applications. Zero-shot vs few-shot learning. Prompt engineering basics.
Evaluation and Interpretability.
Metrics for classification and generation (BLEU, ROUGE, WER). Explainability of language models. Attention visualisation.
LLM API.
Introduction to the OpenAI API and Hugging Face. Prompt design. Fine-tuning, adapters, in-context learning.
Modern LLMs.
Evaluation of LLM behaviour: bias, hallucinations, safety. Mixture of Experts. Retrieval-augmented generation.
Project.
NLP demo projects.
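To give a flavour of the seminar exercises, here is a minimal sketch of the topics listed under the Language Modelling class: a bigram model with add-one (Laplace) smoothing, evaluated by perplexity. It is an illustration only and is not taken from the course materials; the function names and the toy corpus are assumptions, and every test word is assumed to be in the training vocabulary.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over tokenised sentences with boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        contexts.update(padded[:-1])             # every token that precedes another
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    # Words the model can predict: all training words plus the end marker.
    vocab = {w for tokens in sentences for w in tokens} | {"</s>"}
    return contexts, bigrams, vocab


def bigram_prob(prev, word, contexts, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def perplexity(sentences, contexts, bigrams, vocab):
    """Exponentiated average negative log-probability per predicted token."""
    log_prob, n_tokens = 0.0, 0
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            log_prob += math.log(bigram_prob(prev, word, contexts, bigrams, vocab))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


# Toy corpus (hypothetical): two training sentences and one scrambled sentence.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
contexts, bigrams, vocab = train_bigram(corpus)
print(perplexity(corpus, contexts, bigrams, vocab))
print(perplexity([["sat", "the", "cat"]], contexts, bigrams, vocab))
```

The scrambled sentence receives a higher perplexity than the training sentences, because most of its bigrams were never observed and rely on the smoothing mass alone.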
Prerequisites
Linear algebra: vectors, dot products, linear functions, matrices, matrix decompositions.
Probability theory and statistics.
Python: functions, classes, wrappers.
Methodology
The course consists of 3-hour sessions, which will be divided into lectures and seminars. The seminars include practical assignments that will be completed both in class with the support of the instructor and individually at home. The final project will be carried out in groups. There will be 15-minute tests each week on the material covered.
Faculty
Polina Proskura is an applied scientist and researcher specialising in deep learning, natural language processing, and reinforcement learning. She graduated from MIPT in 2019 and completed her Master's in Data Science at EPFL in 2022.
She currently works as an Applied Scientist at Amazon, where she contributes to various deep learning projects powering large-scale applications on amazon.com. Her recent research focuses on effective ensembling techniques for neural networks, with applications in NLP and model robustness. She is the co-author of several peer-reviewed publications, including work on uncertainty estimation for neural networks, core deep learning problems, and large-scale NLP systems.
Total hours
45 Hours
Dates
Jun 30 - Jul 18, 2025
Fee for single course
€1500
Fee for degree students
€750
How to secure your spot
Complete the form below to kickstart your application
Schedule your Harbour.Space interview
If successful, get ready to join us on campus
FAQ
Will I receive a certificate after completion?
Yes. Upon completion of the course, you will receive a certificate signed by the director of the program your course belongs to.
Do I need a visa?
This depends on your case. Please check with the Spanish or Thai consulate in your country of residence about visa requirements. We will do our part to provide you with the necessary documents, such as the Certificate of Enrollment.
Can I get a discount?
Yes. The easiest way to get a discount is to register for multiple courses, which reduces the cost per course. Please ask the Admissions Office about the other discounts we offer and how to qualify for them.