Workshops

Workshops build on each other, so later workshops use skills developed in earlier ones. All GC DRI participants will first attend core workshops on the command line, data management, and data literacies; they will then choose between a Python track and an R track. See below for more details about each.

Python Track

Introduction to Python
Jupyter Notebook and Pandas
Text Analysis
Creating Simulations with Python

R Track

Introduction to R & RStudio
Data Wrangling in R
Data and Text Analysis in R
Visualizations with R

Core Workshops

Command Line

Introduction to the UNIX command line. Topics covered will include navigating the file system, manipulating the environment, executing useful commands, and using pipes to communicate between programs. This session will teach you how to communicate directly with your computer’s operating system using a text-based interface and is a useful first step in learning many other technical skills. By the end of the workshop, you will be able to:

Explore a comma-separated values (.csv) dataset using word and line counts (wc), head and tail, and the concatenate command (cat)
Use commands to create directories (mkdir) and files (touch and echo)
Move content from one place to another using redirects (>) and pipes (|), and navigate file structures using change directory (cd), print working directory (pwd), and list (ls)

Data Management

Data management plans (DMPs) are now required for most grant applications. In this workshop we’ll provide a DMP checklist and a strong sample plan to learn from. We will also discuss what research data is, why we should share our data, and how to write a data management plan for a grant proposal or paper.

Data Literacies

Data is foundational to nearly all digital projects and often helps us understand and express our ideas and narratives. To do digital work, then, we should know how data is captured, constructed, and manipulated. In this workshop we will discuss the basics of research data in terms of its material, transformation, and presentation. We will also engage with the ethical dimensions of working with data, from collection to visualization to representation. While the rest of the course is practical, this workshop is a short detour that lets us pause and think about what we are doing. By the end of the workshop, you will:

Become familiar with the specific requirements of “high-quality data”
Know the stages of data analysis
Learn about ethical issues around working with different types of data and analysis
Understand the difference between proprietary and open data formats

Python Track

Introduction to Python

Python is a general-purpose programming language suitable for a wide variety of tasks in the digital humanities. Learning Python fundamentals is a gateway to analyzing data, creating visualizations, building interactive websites, scraping the web, and engaging in the distant reading of texts. This workshop first introduces participants to core programming concepts such as data types, variables, and functions. Participants will then learn about basic control flow by writing small programs with loops and conditional statements. They will also practice problem solving, searching for answers, and debugging scripts. The workshop wraps up by introducing participants to intermediate tools for further exploration. By the end of the workshop, you will:

Understand what Python is and, in general terms, what it can do.
Become familiar with core programming concepts, including variables, loops, and conditionals.
Engage with error output and use the internet and documentation to independently research language features.
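To give a flavor of the concepts listed above, here is a small sketch that is not taken from the workshop materials; the function name describe_word_lengths and the sample titles are hypothetical. It combines a variable, a function, a loop, and a conditional, and ends by deliberately triggering an error so you can see what Python’s error output looks like.

```python
def describe_word_lengths(words):
    """Label each word as 'long' (more than 5 characters) or 'short'."""
    labels = []                           # a variable holding a list of results
    for word in words:                    # a loop over each item in the list
        if len(word) > 5:                 # a conditional branch
            labels.append(f"{word}: long")
        else:
            labels.append(f"{word}: short")
    return labels

# Hypothetical sample data for illustration.
titles = ["Beloved", "Dune", "Middlemarch"]
for line in describe_word_lengths(titles):
    print(line)

# Reading error output: passing a number instead of a list raises a TypeError,
# and its message is the starting point for debugging or searching online.
try:
    describe_word_lengths(42)
except TypeError as err:
    print("TypeError:", err)
```

Copying the final line of a traceback (here, the TypeError message) into a search engine or the Python documentation is often the quickest way to find out what went wrong.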

Jupyter Notebook and Pandas

In this workshop, we will learn how to interact with Python using Jupyter Notebook, a free, open-source web application for creating and sharing documents that contain live code, equations, visualizations, and narrative text. We will explore the capabilities of Jupyter while working with the pandas Python package for data wrangling, cleaning, and visualization. By the end of this workshop, you will:

Feel comfortable using Jupyter Notebook to interact with Python.
Leave with a basic knowledge of the pandas Python package for cleaning and visualizing data.

Text Analysis

This session will introduce text analysis and text classification in Python using the Natural Language Toolkit (NLTK) and scikit-learn libraries. In this session, you will learn how to use Python to analyze large amounts of text (e.g., literary works or social media corpora) to find word frequencies and collocations, and you will learn the basics of text classification with machine learning. This session is designed for researchers who work with various forms of text-based data. By the end of this workshop, you will be able to:

Clean and standardize your data using tools such as stemmers and lemmatizers.
Prepare texts for computational analysis, including strategies for transforming texts into numbers.
Tokenize your data and put it in a format compatible with the Natural Language Toolkit.
Use NLTK methods such as concordance() and similar().
Understand stop words and how to remove them when needed.
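The workshop itself uses NLTK. As a dependency-free illustration of three of the ideas above (tokenizing, removing stop words, and counting word frequencies), here is a sketch using only Python’s standard library. The regular-expression tokenizer and the tiny STOP_WORDS set are simplifying assumptions for illustration; they are not NLTK’s own tokenizers or stop-word list.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stop-word list; NLTK ships a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "was"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def word_frequencies(text, remove_stop_words=True):
    """Count how often each token appears, optionally skipping stop words."""
    tokens = tokenize(text)
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(tokens)

sample = "It was the best of times, it was the worst of times."
print(word_frequencies(sample).most_common(3))
# → [('times', 2), ('best', 1), ('worst', 1)]
```

Turning the remove_stop_words flag off shows how heavily function words like "the" and "of" dominate raw counts, which is why stop-word removal is often, but not always, applied before analysis.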

Creating Simulations with Python

In this workshop, you will learn to create a basic computer simulation in Python. Specifically, you’ll build a simulation that tracks the population growth and decline of an imaginary species, the Critter, accounting for age, food availability, ability to reproduce, ecological disasters, and other factors. After taking this workshop, you’ll be able to:

Use classes to model and manipulate software objects.
Write your own functions and methods to perform custom computational tasks.
Simulate basic real-world processes to make predictions or model outcomes.
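As a rough sketch of the class-based approach described above (not the workshop’s actual code), the following models a hypothetical Critter class and a yearly step in which critters age, may die of old age, may be lost to a disaster or a food shortage, and may reproduce. The lifespan, birth chance, disaster chance, and food-supply rule are invented parameters chosen only for illustration.

```python
import random

class Critter:
    """A hypothetical creature tracked by the simulation."""
    MAX_AGE = 5              # assumed lifespan in years
    BIRTH_CHANCE = 0.5       # assumed chance an adult has one offspring per year
    DISASTER_CHANCE = 0.1    # assumed yearly chance of an ecological disaster

    def __init__(self, age=0):
        self.age = age

    def can_reproduce(self):
        """Assume critters aged 1 or older can reproduce."""
        return self.age >= 1

def simulate_year(population, food_supply, rng):
    """Advance the population by one year; return survivors plus newborns."""
    # Age everyone and drop critters older than the maximum age.
    survivors = []
    for critter in population:
        critter.age += 1
        if critter.age <= Critter.MAX_AGE:
            survivors.append(critter)
    # An occasional disaster wipes out half of the population.
    rng.shuffle(survivors)
    if rng.random() < Critter.DISASTER_CHANCE:
        survivors = survivors[: len(survivors) // 2]
    # Food limits survival: at most food_supply critters make it through.
    survivors = survivors[:food_supply]
    # Each adult survivor may produce one newborn.
    newborns = [Critter() for c in survivors
                if c.can_reproduce() and rng.random() < Critter.BIRTH_CHANCE]
    return survivors + newborns

def run_simulation(years, initial_size, food_supply, seed=42):
    """Return the population size at the end of each simulated year."""
    rng = random.Random(seed)          # seeded so runs are reproducible
    population = [Critter() for _ in range(initial_size)]
    history = []
    for _ in range(years):
        population = simulate_year(population, food_supply, rng)
        history.append(len(population))
    return history

print(run_simulation(years=10, initial_size=20, food_supply=50))
```

Keeping each factor as a named class constant or function argument makes it easy to change one assumption at a time (for example, halving food_supply) and compare the resulting population curves.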

R Track

Introduction to R & RStudio

R is a versatile programming language well suited to a variety of data science needs. Whether you need to run statistical analyses, create publication-quality data visualizations, or even publish a website, R has a multitude of tools to help you get the job done. This workshop is intended to get interested users up to speed with the R language and the RStudio integrated development environment (IDE). By the end of the workshop, you will:

Gain familiarity with R and RStudio functionality.
Understand the basics of R programming and tools to work with your own data.
Know how to get help.

Data Wrangling in R

Raw data come in many shapes and forms. Wrangling them into an interpretable format is often a formidable task, one in which data preparation takes longer than the analysis itself. The tidyverse, an ecosystem of opinionated R packages, can streamline this process and open up new avenues for insight. In this workshop, we will highlight the major features of the tidyverse to help new and experienced R users get a grip on their data. By the end of the workshop, you will:

Learn to filter, sort, group, split, select, mutate, and summarize your data.
Prepare your data for analysis.

Data and Text Analysis in R

In this workshop, you will learn essential R tools and functions for data analysis. We will introduce text processing for strings and regression methods for quantitative data. By the end of the workshop, you will be able to:

Create summary tables with customized statistics and run basic regression analyses.
Work with strings in a dataset and perform basic text processing, such as pattern matching.

Visualizations with R

In this workshop, you will learn tools for visualizing information and data in different formats within the R environment. We will build on what we learned in the previous R track workshops and visualize the results of our data manipulation and modeling. By the end of the workshop, you will be able to:

Use R functions to create graphs and plots
Organize an R data frame so that it can be plotted with the ggplot2 package
Import and plot spatial data files within the R environment

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

GC Digital Research Institute

January 2022 Workshops