The tidyverse is essential for any data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. The tidyverse allows you to import, tidy, transform, manipulate and visualise data. This course covers key tidyverse areas, such as {dplyr}, {lubridate}, {tidyr} and tibbles.
The tidyverse is essential for any statistician or data scientist who deals with data on a day-to-day basis. By focusing on small key tasks, the tidyverse suite of packages removes the pain of data manipulation. This course takes the next steps in using the tidyverse and examines how and where to use packages such as {purrr}, {stringr}, {forcats} and {tidytext} in an analysis.
When working on data analysis projects, version control is essential, both for tracking project progress and for aiding collaboration. Fortunately, it is now easier than ever to integrate version control into your project, using RStudio’s interface to the version control software Git and online code-sharing websites such as GitHub and GitLab.
An important aspect of managing workflow in data science is being able to work in tandem with your colleagues! This course outlines how effective Git is as a tool for version control in collaborative projects. We will be making use of the RStudio Git interface and remote project hosting platforms, such as GitHub and GitLab.
This is a one-day intensive course on R and assumes no prior knowledge. By the end of the course, participants will be able to import, summarise and plot their data. At each step, we avoid using “magic code”, and stress the importance of understanding what R is doing.
This is a one-day intensive course on advanced graphics with R. The standard plotting commands in R are known as the base graphics, but are starting to show their age. In this course, we cover more advanced graphics packages - in particular, {ggplot2}. The {ggplot2} package can create advanced and informative graphics.
The benefit of using a programming language such as R is that we can automate repetitive tasks. This course covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and when to use them. This is a one-day intensive course on R.
Despite the promise of big data, inferences are often limited by its systematic structure. Only by carefully modelling this structure can we take full advantage of the data. Stan is a platform for facilitating this modelling, providing an expressive modelling language to implement state-of-the-art algorithms, to draw subsequent Bayesian inferences. The course will teach participants how to interface with Stan through R!
Despite the promise of big data, inferences are often limited by its systematic structure. Only by carefully modelling this structure can we take full advantage of the data. Stan is a platform for facilitating this modelling, providing an expressive modelling language to implement state-of-the-art algorithms, to draw subsequent Bayesian inferences. The course will teach participants how to interface with Stan through Python!
This is a one-day intensive course on the R package {shiny}. Shiny allows you to create cutting-edge interactive web graphics. From the Shiny documentation: “Shiny makes it incredibly easy to build interactive web applications with R. Automatic ‘reactive’ binding between inputs and outputs and extensive pre-built widgets make it possible to build beautiful, responsive, and powerful applications with minimal effort.”
Do you want to dynamically create static or interactive documents? Do you want your reports to automatically update when the data changes? Then this session is for you! R Markdown is easy to use and allows for dynamic report generation. Whether you are hoping to generate HTML, PDF or Microsoft Word documents, or even slides for a presentation, R Markdown can be tailored to your needs.
This is a one-day intensive course on Python and assumes no prior knowledge. By the end of the course, participants will be able to import, summarise and plot their data. At each step, we avoid using “magic code”, and stress the importance of understanding what Python is doing.
The benefit of using a programming language such as Python is that we can automate repetitive tasks. This course covers the fundamental techniques such as functions, for loops and conditional expressions. By the end of this course, you will understand what these techniques are and when to use them.
Python has a number of packages for the effective creation of graphics to communicate your data insights. This one-day course will examine a range of packages for building impactful visualisations. During the training session, we’ll cover the main Python plotting libraries: plotly, matplotlib and seaborn. Additionally, we discuss how to effectively use faceting and layers in a graphic.
From the very beginning, R was designed for statistical modelling. Out of the box, R makes standard statistical techniques easy. This course covers the fundamental modelling techniques. We begin the day by revising hypothesis tests, before moving on to ANOVA tables and regression analysis. The class ends by looking at more sophisticated methods such as clustering and principal components analysis (PCA).
As spatial data sets get larger, more sophisticated software needs to be harnessed for their analysis. R is now a widely used open source software platform for working with spatial data thanks to its powerful analysis and visualisation packages. The focus of this course is providing participants with the understanding needed to apply R’s powerful suite of geographical tools to their own problems.
This is a two-day intensive course on advanced R programming. The training course will not only cover advanced R programming techniques, such as S3/S4 objects, reference classes and function closures, we will spend significant time discussing why and where these methods are used. By the end of the course, participants will be able to use OOP within their own code.
Using databases is a fundamental part of a data scientist’s role. The main focus of this training course is to introduce SQL databases and how R can be used to retrieve and manipulate data stored in a relational database. We use the PostgreSQL database as an example for public courses. For in-house training, we are happy to adapt the course to match your database requirements.
This is a one-day intensive course on building a package in R. The focus will be on getting a working R package ready for distribution.
This is a one-day Docker course aimed at R users. Docker is a popular platform for packaging, deploying, and running applications. These applications run in containers. Crucially, a container can be run on any system: a developer’s laptop, systems on premises, or in the cloud. Applications are packaged as images that contain everything needed to run them: code, libraries, and configuration.
In recent years Python has exploded onto the data-science scene, and with it has come a great swathe of data-oriented packages. However, as easy as these packages make analysis, using these tools efficiently requires much more know-how. By the end of this course participants will be able to locate and address bottlenecks in their data-science workflows, using a number of different techniques and tools.
This course is for anyone who wants to make their R code faster to type, faster to run and more scalable. During the course, we’ll cover the main R sins (and how to avoid them), dabble with hardware, look at running in parallel and think about efficient R data structures. This course should be useful to people with a range of skill levels.
We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. This module introduces Gaussian processes as a statistical modelling technique, motivating principled prior models that avoid pathological behaviour. For full event information and booking details, please visit the event page.
We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. This module introduces exchangeability and hierarchical models with a strong focus on the inherent identifiability issues and their computational consequences, as well as strategies for moderating these issues. Completion of the Regression Modelling module is recommended.
The capturing and quantification of uncertainty is a very important aspect of model-fitting and parameter inference. Bayesian inference represents a fully-probabilistic approach to parameter inference, allowing a practitioner to quantify their uncertainties through probability densities. However, fitting models in a Bayesian framework can be an involved and complicated affair, often necessitating the use of Markov chain Monte Carlo (MCMC) algorithms and their programmatic implementation.
RStudio Connect is an enterprise-grade publishing platform which gives you, the user, the ability to easily share code, documents and applications with collaborators, colleagues and clients. By the end of this course participants will be able to deploy their content to RStudio Connect, manage its access and settings, and tune how this content scales with usage.
Python (along with R) has become the dominant language in machine learning and data science. This two-day intensive course will equip you with the knowledge and tools to undertake a variety of tasks in a standard machine learning analytics pipeline. We stress the importance of data preparation, both in terms of data standardisation and feature selection, before tackling model building. We run a separate course on using TensorFlow and Keras with Python.
This two-day course is aimed at not only teaching an understanding of some of the most common machine learning techniques, but also the approach to implementing machine learning. During this course, attendees will learn how to define a problem and prepare data, the range of techniques available for solving common problems and the approaches to take to evaluate models and achieve the best results possible.
We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. This module introduces conditional exchangeability, marginal exchangeability, and multifactor modelling (also known as multilevel or random effects modelling) with a focus on efficient implementations. Completion of the Regression Modelling and Hierarchical Modelling modules is highly recommended.
We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. In this module we review a principled Bayesian workflow that guides the development of statistical models suited to the particular details of a given application. For full event information and booking details, please visit the event page.
Deep learning is a cutting-edge machine learning technique for classification and regression. In the past few years, it has produced state-of-the-art results in fields such as image classification, natural language processing, bioinformatics and robotics. This course will cover the main ideas of deep learning, and how to implement it in practice with TensorFlow: a software framework for efficient and scalable deep learning.
Python (along with R) has become the dominant language in machine learning and data science. PyTorch is an open-source machine learning library for Python, based on Torch, used for applications such as natural language processing. It is primarily developed by Facebook’s artificial-intelligence research group, and Uber’s “Pyro” software for probabilistic programming is built on it.
Dealing with big data sets in R can be painful. One small mistake, and a seemingly trivial calculation makes our computer grind to a halt. This training course is a one-day intensive practical introduction to dealing with big data. Unfortunately, there are no easy answers. So we’ll take you through the different possible strategies you might employ, clearly highlighting the positives and negatives of each.
Jane produces weekly progress reports, as well as monthly, quarterly and annual overviews, for management and the board. She uses a variety of licensed software and tools because each one has limitations. This course aims to take each individual through the fundamental approach to using R programming in their current role. By the end of the course the individual will be working towards automating all of their reports.
We are very happy to announce that following “Jumping Rivers: Bayesian Inference using Stan”, Michael Betancourt, a core developer of Stan, is running a series of 5 modules for principled statistical modelling with Stan. This module presents linear and general linear regression techniques from a modelling perspective, using that context to motivate robust implementations. We will especially emphasize principled prior modelling strategies for linear, log, and logistic regression models. For full event information and booking details, please visit the event page.
This course is aimed at statisticians and data scientists already familiar with a dynamic programming language (such as R, Python or Octave). Scala is a free, modern, powerful, strongly-typed, functional programming language. In particular, it is fast and efficient, runs on the Java virtual machine (JVM), and is designed to easily exploit modern multi-core and distributed computing architectures.
This course is a practical introduction to some of the everyday and more sophisticated tools used for the analysis of survival data.
Predicting the future is a tough problem. Time series analysis makes it possible to assess whether or not predictions are possible and, if they are, build a model which can generate informed predictions for the future with realistic estimates of uncertainty. This training course will introduce participants to the packages in the Tidyverts. “The best qualification of a prophet is to have a good memory” – George Savile
This is a half-day session that gives an overview of where and how R is used. Using a combination of lecture-based case studies and hands-on practicals, we’ll cover some of the latest developments in the R world. This course is intended to be interactive and is aimed at organisations that are considering whether (or not) to move to R.
Moving you from data storage to data insights with our expert training courses.
Contact our support team if you have any questions about a specific course or if you need a course tailored to your needs.