Workshops build on each other: later workshops use skills developed in earlier ones.

All GC DRI participants will first attend Core workshops on the Command Line and Data Literacies, and will then choose between tracks in the Python and R programming languages.

Following the required Core workshops, participants may attend a number of Elective workshops, which cover more specialized skills in mapping, web development, archiving, and more.

Core Workshops

Command Line

Introduction to the UNIX command line. Topics covered will include navigating the file system, manipulating the environment, executing useful commands, and using pipes to communicate between programs. This session will teach you how to communicate directly with your computer’s operating system using a text-based interface and is a useful first step in learning many other technical skills.

Data Literacies

In this brief workshop we will discuss the basics of research data in terms of material, transformation, and presentation. We will also focus on the ethics of data cleaning and representation. If the rest of the course is practical, then this is a small detour that allows us to sit and think about what we are doing. Because everyone has a different approach to data and ethics, this workshop will also include several points for discussion to help us think together as a group.

Python

Python is a programming language that can be used for a wide range of tasks, including collecting and analyzing data in a variety of formats, building web applications, and much more. It is widely used by academic researchers because of its flexibility and adaptability.

Text Analysis with Python

This session will introduce text analysis and text classification in Python using the Natural Language Toolkit (NLTK) library and scikit-learn. By attending this session, you will learn how to use Python to analyze large amounts of text (e.g., literary works or social media corpora), find word frequencies and collocations, and apply the basics of text classification with machine learning. This session is designed for researchers who work with various forms of text-based data.
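As a rough illustration of what finding word frequencies and collocations involves, here is a minimal sketch that uses only the Python standard library rather than NLTK or scikit-learn. The sample text and the function names (`tokenize`, `word_frequencies`, `bigram_counts`) are made up for this example. Counting adjacent word pairs is a simplified stand-in for collocation finding, which usually scores pairs with a statistical association measure.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how many times each word occurs."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Count adjacent word pairs (a simplified stand-in for collocations)."""
    return Counter(zip(tokens, tokens[1:]))


# Illustrative sample text, not taken from any workshop material.
sample = "The white whale swims. The white whale dives. The sea is wide."

tokens = tokenize(sample)
freqs = word_frequencies(tokens)
bigrams = bigram_counts(tokens)

print(freqs.most_common(3))    # the three most frequent words
print(bigrams.most_common(2))  # the two most frequent adjacent pairs
```

The same pattern scales to larger texts by reading a file into `sample`. A library such as NLTK adds better tokenization and statistical scoring on top of this basic counting.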

Introduction to R

R is a versatile programming language, well suited to a variety of data science needs. This workshop is intended to get interested users up to speed with the R language and the RStudio integrated development environment.

Data Analysis with R

Raw data come in many shapes and forms. Wrangling data into an interpretable format and visualizing it is often a formidable task, where time spent on data preparation exceeds time spent on analysis. In this workshop, we highlight the major features of the tidyverse R packages to help new and experienced R users get a grip on their data. Additionally, we will use the popular ggplot2 package to make reproducible visualizations of a variety of data types. Participants will learn to filter, sort, group, split, select, mutate, and summarize their data, as well as experiment with flexible data visualizations.

Elective Workshops

Git

Git is a tool for managing changes to a set of files. It allows users to access open source repositories, recover earlier versions of a project, and collaborate with other contributors. This session will be beneficial to anyone working with data, code, or text.

Mapping

This session introduces simple yet powerful ways of displaying spatial information through CartoDB and QGIS. It will be of particular interest both to researchers working with spatial information and to anyone interested in storytelling with maps.

Omeka

Creating an online exhibition requires metadata management, attention to presentation, and an awareness of issues around archiving and preservation. Omeka is a content management system that allows the creation of online archives and exhibitions.

HTML and CSS and Platforms

Modern web pages are created using HTML to control content, CSS to control appearance, and JavaScript to dictate behavior. The second part of this workshop focuses on how websites are made available on the web, covering web hosting and two platforms that offer web-hosting services. This session will be helpful for anyone who wants to build on the web.

Twitter/API

APIs (Application Programming Interfaces) are a structured way for programs to communicate with other programs. A knowledge of APIs allows your programs to communicate with major services such as The New York Times and Twitter and collect data from organizations such as the Library of Congress. In this workshop, we will learn to interact with the Twitter API, doing things like tweeting and collecting metadata from a Python script.
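To make the idea of programs communicating through an API more concrete, here is a minimal sketch of the two halves of a typical exchange: building a request URL with query parameters and reading a structured JSON response. The endpoint `https://api.example.org/search`, the parameter names, and the response fields are all hypothetical and do not describe the Twitter API or any other real service. The response is a hard-coded string so that the example runs without a network connection or API credentials.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.org/search"


def build_request_url(base_url, **params):
    """Attach query parameters to a base URL, encoding them safely."""
    return f"{base_url}?{urlencode(params)}"


def extract_titles(response_text):
    """Parse a JSON response and pull out the title of each result."""
    payload = json.loads(response_text)
    return [item["title"] for item in payload["results"]]


# The request a program would send to the (hypothetical) service.
url = build_request_url(BASE_URL, q="digital humanities", limit=2)

# A made-up response in the shape such a service might return.
sample_response = '{"results": [{"title": "First item"}, {"title": "Second item"}]}'
titles = extract_titles(sample_response)

print(url)
print(titles)
```

Real services document their own endpoints, parameters, and authentication requirements, and a script would replace the hard-coded string with the text returned by an actual HTTP request.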