January 2020 Workshops – GC Digital Research Institute

Workshops build on each other such that successive workshops use skills developed in earlier ones. All participants attend workshops on core skills, then choose which skills they wish to develop further through advanced workshops.

Command Line

Introduction to the UNIX command line. Topics covered will include navigating the file system, manipulating the environment, executing useful commands, and using pipes to communicate between programs. This session will teach you how to communicate directly with your computer’s operating system using a text-based interface and is a useful first step in learning many other technical skills.

Digital Ethics and Data

In this brief workshop we will be discussing the basics of research data, in terms of material, transformation, and presentation. We will also be focusing on the ethics of data cleaning and representation. If the rest of the course is practical, then this is a small detour to allow us to sit and think about what we are doing. Because everyone has a different approach to data and ethics, this workshop will also include multiple sites for discussions to help us think together as a group.

Git

Git is a tool for managing changes to a set of files. It allows users to access open source repositories, recover earlier versions of a project, and collaborate with other contributors. This session will be beneficial to anyone working with data, code, or text.

Python

Python is a programming language that can be used for a wide range of tasks, including collecting and analyzing data in a variety of formats, building web applications, and much more. It is among the most popular languages for academic research because of its flexibility and adaptability.

Text Analysis (Python)

This session will introduce text analysis and text classification in Python using the Natural Language Toolkit (NLTK) library and scikit-learn. In this session, you will learn how to use Python to analyze large amounts of text (e.g., literary works or social media corpora), find word frequencies and collocations, and apply the basics of text classification with machine learning. This session is designed for researchers who work with various forms of text-based data.
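To give a sense of what finding word frequencies and collocations involves, here is a minimal sketch. It is not the workshop's own material: it assumes only the Python standard library rather than NLTK so that it stays self-contained, it treats a "collocation" simply as a frequently occurring pair of adjacent words, and the sample text and function names are made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Count adjacent word pairs, a simple stand-in for collocations."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample text, not taken from any workshop corpus.
sample = "The whale swam. The whale dove. The ship followed the whale."

tokens = tokenize(sample)
freqs = word_frequencies(tokens)
bigrams = bigram_counts(tokens)

print(freqs.most_common(2))    # the two most frequent words
print(bigrams.most_common(1))  # the most frequent adjacent pair
```

A library such as NLTK adds better tokenization and statistical measures for ranking collocations; the counting idea above is only the starting point.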

Introduction to R

R is a versatile programming language, well suited to a variety of data science needs. This workshop is intended to get interested users up to speed with the R language and the RStudio integrated development environment.

Data Manipulation

Raw data come in many shapes and forms. Wrangling these into an interpretable format is often a formidable task, where time for data prep exceeds time for analysis. The tidyverse, an ecosystem of opinionated R packages, can streamline this process and open up new avenues for insight. The developers’ emphasis on human-readable code reduces the barrier to entry for folks more familiar with navigating a spreadsheet than the command line. In this workshop, we are highlighting the major features of the tidyverse to help new and experienced R users get a grip on their data. Participants will learn to filter, sort, group, split, select, mutate, and summarize their data so that they can prepare for whatever analysis they think up.

Data Visualization

Visualizing data effectively is an important component of any research project, from initial data exploration to the final research product. The same data can tell many stories, and this workshop will give you the tools to uncover them. Using the incredibly popular ggplot2 package in R, you will learn how to make reproducible data visualizations of a variety of data types. By the end, you will have the necessary skills to forge your own way forward.

Mapping

This session introduces simple yet powerful ways of displaying spatial information through CartoDB and QGIS. It will be of particular interest to researchers working with spatial information and to anyone interested in storytelling with maps.

Omeka

Creating an online exhibition requires metadata management, attention to presentation, and an awareness of issues around archiving and preservation. Omeka is a content management system that allows the creation of online archives and exhibitions.

HTML and CSS and Platforms

Modern web pages are created using HTML to control content, CSS to control appearance, and JavaScript to dictate behavior. The second part of this workshop focuses on how websites are made available on the web, covering web hosting and two platforms that offer web-hosting services. This session will be helpful for anyone who wants to build on the web.

Twitter/API

APIs (Application Programming Interfaces) are a structured way for programs to communicate with other programs. A knowledge of APIs allows your programs to communicate with major services such as The New York Times and Twitter and to collect data from organizations such as the Library of Congress. In this workshop, we will learn to interact with the Twitter API from a Python script, doing things such as posting tweets and collecting metadata.
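The general pattern of talking to an API can be sketched without touching any real service. The example below is an illustration under stated assumptions, not the Twitter API: the endpoint URL, the query parameters, and the shape of the JSON response are all hypothetical, and no network request is made. It shows the two halves of a typical exchange: building a structured request, and reading structured data out of the response.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://api.example.org/v1/search"


def build_request_url(base_url, **params):
    """Attach query parameters to a base URL in URL-encoded form."""
    return f"{base_url}?{urlencode(params)}"


def extract_titles(response_body):
    """Parse a JSON response and pull out the title of each result."""
    payload = json.loads(response_body)
    return [item["title"] for item in payload["results"]]


url = build_request_url(BASE_URL, q="digital humanities", limit=2)

# A made-up response body standing in for what a server might return.
sample_response = '{"results": [{"title": "First record"}, {"title": "Second record"}]}'
titles = extract_titles(sample_response)

print(url)
print(titles)
```

Real services usually also require authentication (for example, an API key), and each documents its own parameters and response format, so the names above would need to be replaced with the ones the service specifies.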

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.