Getting started on FASSE#

The following are instructions for logging in to FASSE and setting up your own workspace.

Prerequisites. Join our project group#

  1. Get a FASRC account by requesting it here.

  2. Navigate to the Add Grants page in the portal. You will need to log in with your FASRC account.

  3. Expand the plus sign next to “Other”

  4. Find the project group you want to be added to: dominici_nsaph

  5. Select the checkbox for the project group you want to be added to

Your PI will have to approve the addition. Once you’re notified of the approval, it can take up to an hour for your permissions to be configured. If you’re not able to access the VPN or your home directory, try waiting an hour and logging in again.

Step 1. Connect to Harvard’s VPN#

Install the Cisco AnyConnect client to connect to the FASRC VPN. Install a two-factor authentication (2FA) app, e.g., Google Authenticator, for FASRC and set it up as explained here.

  1. Type vpn.rc.fas.harvard.edu in the Cisco AnyConnect text box (see figures).

  2. Enter your username in the format username@fasse, followed by your password and verification code (the same as for FASRC).

_images/fasse_vpn.png
_images/fasse_form.png

Warning

CMS prohibits accessing data from outside the U.S. This includes not only opening data files but also submitting code or jobs that run on the data.

Step 2. Access FASSE#

There are a few ways to access FASSE. You can access it via VDI/OoD (in the web browser) by clicking the link here: https://fasseood.rc.fas.harvard.edu/

You can also access it from the command line (Terminal) by typing: ssh username@fasselogin.rc.fas.harvard.edu. To learn more about working in the command line, check out this Unix Shell tutorial.

Note

The username, password and verification code are the same as in the previous step (and the same as for FASRC).

Tip

For more information, see the official documentation.

Step 3. Project workspace#

Your project name should be informative to both group members and outsiders. Choose a project name in the following format:

<exposure>-<outcome>-<method>
  • Exposure examples: pm_components, pm_no2, pm_no2_o3, heat_alert

  • Outcome examples: cardiovascular, respiratory, adrd

  • Method examples: reinforcement_learning, causalgps

For example: heat_alert-mortality-reinforcement_learning or shorter heat_alert-mortality-rl.

In practice, you may have multiple exposures and outcomes. In that case, use your best judgement to choose a project name based on the guidelines. Avoid adding information such as usernames or the current date or year.
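If you want a quick sanity check on a candidate name, the format can be tested from the command line. This is only a minimal sketch: is_valid_project_name is a hypothetical helper (not an NSAPH tool), and it assumes each of the three parts uses lowercase letters, digits, or underscores, with hyphens only between parts, as in heat_alert-mortality-rl.

```shell
# Hypothetical helper, not an official NSAPH tool.
# Accepts names with exactly three hyphen-separated parts, each made of
# lowercase letters, digits, or underscores (an assumed convention).
is_valid_project_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9_]+-[a-z0-9_]+-[a-z0-9_]+$'
}

# Try one name that follows the format and two that do not.
for name in heat_alert-mortality-rl heat_alert-mortality jdoe-Project-2024; do
  if is_valid_project_name "$name"; then
    echo "$name: matches <exposure>-<outcome>-<method>"
  else
    echo "$name: does not match <exposure>-<outcome>-<method>"
  fi
done
```

A check like this cannot tell whether a name is informative, so treat it as a complement to the guidelines rather than a replacement.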

Next, create a folder with your project name in the NSAPH projects folder at /n/dominici_nsaph_l3/projects. You can do that by opening “File System” in the FASRC Remote Desktop and navigating to the projects folder (see figure).

_images/img_1.png

Create a new folder there with your project name (e.g., heat_alert-mortality-rl).
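The folder can also be created from the command line. The sketch below uses a temporary directory as a stand-in for /n/dominici_nsaph_l3/projects so that it can be tried safely anywhere; the variable names PROJECTS_DIR and PROJECT_NAME are illustrative only.

```shell
# Stand-in for /n/dominici_nsaph_l3/projects so the sketch runs anywhere.
# On FASSE, set PROJECTS_DIR to the real projects folder instead.
PROJECTS_DIR="$(mktemp -d)"
PROJECT_NAME="heat_alert-mortality-rl"

# mkdir without -p fails if the folder already exists, which helps avoid
# reusing an existing project name by accident.
mkdir "$PROJECTS_DIR/$PROJECT_NAME"

# Show the new folder and its permissions.
ls -ld "$PROJECTS_DIR/$PROJECT_NAME"
```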

Note

Use your project name folder in /n/dominici_nsaph_l3/projects as a workspace for your analysis data and code.

Step 4. Create a git repository on GitHub#

Navigate to the NSAPH Projects GitHub organization in your web browser. The NSAPH Projects GitHub organization is a shared account where all NSAPH members can collaborate across many projects at once. If you are not already a member of NSAPH Projects, ask one of the admins to add you to the organization.

Create a new git repository under NSAPH Projects and name it with your project name.

Going forward, make sure to update your GitHub repository daily with your analysis code and documentation. If you are not familiar with using git, check out this git tutorial. Also, check out our guidelines for collaborative work on GitHub.

Note

You should link your GitHub account to the FASSE workspace by typing the commands below in FASSE’s command line. By doing this, all code contributions (commits) from FASSE will be linked to your GitHub account.

git config --global user.name "Mona Lisa"
git config --global user.email "email@example.com"
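For readers new to git, a daily update roughly follows an add, commit, push cycle. The sketch below runs in a temporary folder that stands in for your project folder, and it sets a repository-local identity so it works even without the global settings. The remote URL and the branch name main are assumptions and are left as comments.

```shell
# Temporary folder as a stand-in for your project folder on FASSE.
REPO_DIR="$(mktemp -d)"
cd "$REPO_DIR" || exit 1

git init -q
# Repository-local identity so the sketch runs without global settings.
git config user.name "Mona Lisa"
git config user.email "email@example.com"

# Record a first change.
echo "# heat_alert-mortality-rl" > README.md
git add README.md
git commit -q -m "Add README"

# On FASSE you would then connect the GitHub repository and push, e.g.:
#   git remote add origin <URL of your repository under NSAPH Projects>
#   git push -u origin main    # branch name is an assumption
git log --oneline
```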

Step 5. Analytic Data#

Much of the NSAPH data is already available on FASSE. Check out the data catalogue here.

If you’d like to use any of the analytic datasets, create a symbolic link (symlink) of that dataset instead of creating a new copy. A symbolic link is a reference to another file or directory that the operating system interprets as a path to that file or directory (a shortcut).

This is how you create a symlink from your data folder (in the command line):

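# Move into your project's data folder.
cd data
# DATA_FOLDER is a placeholder for the dataset directory you want to link.
ln -s ../analytic/DATA_FOLDER .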
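To see what the link looks like once created, here is a small self-contained sketch. It builds a throwaway folder layout in a temporary directory (the analytic and data folders and the example.csv file are made up for illustration), creates the symlink in the same way, and then inspects it.

```shell
# Throwaway layout standing in for a real project; all names are illustrative.
WORK_DIR="$(mktemp -d)"
mkdir -p "$WORK_DIR/analytic/DATA_FOLDER" "$WORK_DIR/data"
echo "id,value" > "$WORK_DIR/analytic/DATA_FOLDER/example.csv"

cd "$WORK_DIR/data" || exit 1
# Creates data/DATA_FOLDER as a link to ../analytic/DATA_FOLDER (no copy is made).
ln -s ../analytic/DATA_FOLDER .

# Print the path the link points to.
readlink DATA_FOLDER
# List the dataset's files through the link.
ls DATA_FOLDER
```

Removing the link later with rm DATA_FOLDER (no trailing slash) deletes only the link, not the dataset it points to.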

Step 6. Setting up R and RStudio#

To load R and install packages, follow these directions. If you’re using RStudio, you’ll need your R_LIBS_USER path to set up the interactive session.

In RStudio, if you want to see files outside of your home directory, you can click the three dots on the upper right-hand side of the Files window in RStudio (under the refresh arrow) and type in the directory path you want. If you want to save files outside your home directory, you can change your working directory using the command setwd([directory path]) in the Console.

Tip

If you are using R in your analysis, have a look at the best practices and recommendations here and here.

Step 7. Organize your folder#

Consider organizing your project folder (and repository) as follows:

project-name
├── README.md
├── data/
├── code/
├── figures/
├── reports/
├── results/
└── .gitignore
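The layout above can be created in one go from the command line. A minimal sketch, using a temporary directory as a stand-in for your project folder (PROJECT_DIR is an illustrative variable name):

```shell
# Stand-in location; on FASSE, point PROJECT_DIR at your own project folder.
PROJECT_DIR="$(mktemp -d)/project-name"
mkdir -p "$PROJECT_DIR"
cd "$PROJECT_DIR" || exit 1

# Sub-folders from the suggested layout.
mkdir data code figures reports results
# Empty placeholder files, to be filled in later.
touch README.md .gitignore

# List everything, including the hidden .gitignore.
ls -A
```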

Tip

Have a look at the NSAPH Project Template. Also, here is another template example for new research projects: https://github.com/djnavarro/newproject/

Make sure to use the README.md and .gitignore special files. A README.md file is a standard documentation file where you should put information about the content of your repository. A .gitignore file tells Git which files to ignore when committing your project to the GitHub repository. It should be located in the root directory of your repo. Large data files and sensitive data should be ignored by Git.

Warning

Be careful not to push sensitive data to GitHub. Don’t forget that Medicare/Medicaid data should not leave Harvard, but your analysis code should be versioned with Git. A .gitignore file helps with that.

Add path and/or file names of your data in the .gitignore file. You can ignore the data sub-folder and/or all files of a certain format like .csv, .nc or .rst. Add these as new lines in .gitignore. For example:

data/
*.csv
*.nc
*.rst
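Before committing, it can be worth confirming that these rules do what you expect. The sketch below sets up a throwaway repository with the same four rules and a few made-up file names, then asks Git which paths are ignored using git check-ignore.

```shell
# Throwaway repository; the file names below are made up for illustration.
REPO_DIR="$(mktemp -d)"
cd "$REPO_DIR" || exit 1
git init -q

# The same rules as in the example above.
printf '%s\n' 'data/' '*.csv' '*.nc' '*.rst' > .gitignore
mkdir data code
touch data/claims.parquet results.csv code/analysis.R

# Prints each ignored path together with the rule that matched it.
git check-ignore -v data/claims.parquet results.csv

# The exit status is non-zero when a path is NOT ignored.
if git check-ignore -q code/analysis.R; then
  echo "code/analysis.R is ignored"
else
  echo "code/analysis.R will be tracked"
fi
```

Running git status afterwards is another quick check: ignored files should not appear in its list of untracked files.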