How do I choose? R or Python at DRI…

First, it is important to note that you can accomplish most projects you might envision (data analysis, textual analysis, data visualization, etc.) in either R or Python. Both languages are free to install and use, unlike proprietary data analysis software such as SPSS or Stata, and a growing number of programmers are learning both. At an introductory level, the Python community has created more versatile resources for getting started, and the language itself is more human-readable. R, by contrast, was initially developed for statisticians and has more tools built specifically for data analysis. See our DRI facilitator testimonials for examples.

At the Digital Research Institute, we also want to emphasize that your research should drive which language you learn first. This DataCamp resource guides you through some of the major considerations when choosing a first programming language. It poses a number of questions to help you assess your own needs and discipline: what your project needs are, which language packages you might need, and which language your colleagues use.

The Two Tracks

Each track of the institute centers on different learning outcomes. When deciding which track to take, we suggest considering your immediate or potential research interests and choosing the track that aligns most closely with your goals. Both the R and the Python tracks begin by teaching basic command line fundamentals, along with basic data management and data literacy. After that, you will focus more specifically on your language of choice. See below for a brief synopsis of each track.

The R Track

The R track is designed to teach you increasingly complex ways to manage, manipulate, and visualize datasets. Our workshops provide a scaffolded approach meant to familiarize you with the R language and the RStudio integrated development environment.
You’ll learn to install and load packages, read in data, and navigate the RStudio environment efficiently. Once you are comfortable with the environment, we’ll guide you through basic data wrangling, the art of getting your data into R in a useful format for visualization and modeling. We’ll then explore how R lets you perform complex analytical tasks easily, quickly, and succinctly. You’ll learn to perform qualitative and quantitative analysis on datasets and textual corpora, and to use R packages and libraries to make this process easier. Finally, you’ll apply all of these skills in a series of projects in which you’ll visualize your data and work with geospatial data to create mapping projects.

The Python Track

Python is an inherently flexible language that can be employed in many different contexts. Our workshops emphasize this flexibility by teaching you a variety of skill sets through a scaffolded learning approach. We’ll start by introducing you to the basics of the Python programming language and then create increasingly complex scenarios in which you can apply this knowledge. As in the R track, a major focus will be on data exploration, analysis, and visualization. We will introduce you to the Pandas library and Jupyter Notebook, tools that have become standard for data analysis with Python. You will also learn to perform in-depth analyses of textual sources to bring new insights to your research. Finally, you’ll apply these skills in a project in which you’ll write your own functions to build a Python simulation that models large-scale, real-life scenarios and processes.
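To give a sense of what a first textual analysis in Python can look like, here is a minimal sketch of a word-frequency count using only the standard library. It is not taken from the DRI workshop materials: the function names `tokenize` and `top_words`, the sample sentence, and the stopword list are all hypothetical and chosen purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_words(text, stopwords=frozenset(), n=3):
    """Return up to n of the most frequent non-stopword tokens with counts."""
    counts = Counter(tok for tok in tokenize(text) if tok not in stopwords)
    return counts.most_common(n)


# Hypothetical sample text and stopword list, invented for illustration only.
SAMPLE = "The data tell a story. The story depends on the data we keep."
STOPWORDS = frozenset({"the", "a", "on", "we"})

if __name__ == "__main__":
    for word, count in top_words(SAMPLE, STOPWORDS):
        print(word, count)
```

Writing small functions like these keeps each step (cleaning, counting, ranking) easy to test on its own before moving on to a larger corpus or to a dedicated text analysis library.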
****************************************

See below for some useful projects and resources centered on Python and R.

R resources, including packages and projects:

A computational social science perspective on qualitative data exploration: Using topic models for the descriptive analysis of social media data
The author’s R code is also available in her GitHub repo: https://github.com/MariaYR/WhyIStayed
The R Graph Gallery

Python resources, including packages and projects:

Academic Open Source Projects on GitHub
Reddit on PrEP: Posts About Pre-exposure Prophylaxis for HIV from Reddit Users, 2014–2019

****************************************

We also asked our DRI workshop facilitators what they use each language for, to gather research-specific examples.

“I personally find R has a very good statistical nature. I used R to run survey data analysis in an experiment-based research project involving human subjects. R has a sophisticated package, `psych`, for psychometric analysis, including exploratory factor analysis, discriminant analysis, ANOVA/ANCOVA, etc. In my experience, these functions are well integrated and do the same job as SPSS software.” — Yuxiao, Business Information Systems, GCDI digital fellow

“R can satisfy all my needs in regression analysis. It has a series of native functions for different kinds of regressions. These regressions are a necessary part of my research project for causality analysis. I used Python for data collection in one of my projects, in which I used the BS4 and Selenium packages for web scraping of glassdoor.com. Python has many customized APIs (Application Programming Interfaces) to work with, which makes online data collection convenient.” — Yuxiao, Business Information Systems, GCDI digital fellow

“I used R for one of my projects and ran a series of text analyses. I employed dictionary-based word counts, sentiment analysis, and topic modeling in my research method, and analyzed the CEO letters of Fortune 500 firms.
The R packages `tidytext`, `topicmodels`, and `tm` helped me with those tasks.” — Yuxiao, Business Information Systems, GCDI digital fellow

“I have used R throughout most of my academic career. As a biologist, I have mostly used it for statistical analyses, from basic ones like regression and ANOVA to more complex tools like machine learning and spatial modeling. Recently, I have been interested in using R for the manipulation and visualization of spatial data.” — Rilquer, Biology, GCDI digital fellow

“I learned R in my stats class and ‘taught myself’ basic Python to do a textual analysis project. I found myself using Python by default because it is easier to start with. Specifically, I chose Python for a textual analysis project because it was more beginner-friendly and the code on GitHub was in Python. Python helped me think logically, and when I went back to using R in other projects, I felt more comfortable. I later used R for a project analyzing conjoint experiment data, which an R package was specifically designed to handle.” — Leanne, Sociology, GCDI digital fellow

“Python is a great language to learn due to its flexibility and simplicity. Whether you want to visualize data, build an interactive website, or perform qualitative or quantitative analysis of a set of texts, Python offers a number of libraries and tools that make the process easier. I personally enjoy using Python to create educational computer games for use in my classrooms.” — Zach, Comparative Literature, GCDI digital fellow

This entry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.