Research Data Management

Research data management is how you organize, store, protect, and share your research data throughout your project and beyond. Think of it as the difference between a lab bench covered in unlabeled samples versus one where you can find exactly what you need, when you need it.

Why it matters:

  • Find your own data later. Six months from now, will you remember what "final_version_v3_FINAL.xlsx" contains?
  • Meet funding requirements. Tri-Agency grants now require data management plans. No plan = no funding.
  • Protect sensitive information. Health data, Indigenous community data, and personal information need specific security measures.
  • Collaborate effectively. Your grad students and co-investigators need to understand your data structure.
  • Preserve your work. When you leave Lakehead or when hard drives fail, your data needs to survive.

Read Lakehead's RDM Institutional Strategy


Contact & Support

Contact us for personalized assistance with data management plans, security protocols, storage options, or any other RDM questions.

Andrew Austin
Research Security & Data Management Specialist
rdm.research@lakeheadu.ca
+1 (807) 343-8010 ext. 8190
CASES Building - FB 2004J

Planning Your Research

Data Management Plans (DMPs)

What is a DMP?

A Data Management Plan is a structured document outlining how research data will be collected, organized, stored, protected, and shared. It describes:

  • What data you'll collect or generate
  • How you'll manage it (file naming, version control, metadata)
  • Where you'll store it
  • Who can access it
  • How long you'll keep it and how it will be archived or disposed of

Why You Need a DMP

  • Research quality: Ensures data are collected consistently and can be verified
  • Funding compliance: The Tri-Agency RDM Policy requires DMPs for many grants
  • Efficiency: Prevents data loss and confusion within teams
  • Future value: Well-managed data can be reused and cited

Common DMP Mistakes

  • Saying data "cannot be shared" without explaining why
  • Not specifying who is responsible for data management
  • Listing storage without explaining security measures
  • Naming a repository that doesn't fit your discipline
  • Forgetting to budget for data management costs

Data Classification

All research data at Lakehead must be classified into one of three levels. Your classification determines storage requirements, access controls, and handling procedures.

When in doubt, classify higher.

Classification Resources

Data Collection Best Practices

How you collect data determines its quality and usability. Plan your collection methods carefully before you begin.

Before You Collect

  • Define your variables: What exactly will you measure or record? Be precise.
  • Choose your tools: Survey platform, lab instruments, interview recording — select before starting
  • Create a data dictionary: Document what each variable means, units, valid ranges, codes for missing data
  • Design for analysis: How will you analyze this data? Structure collection accordingly
  • Pilot test: Test your collection process with a small sample first

During Collection

Quality Control

  • Use validation rules where possible
  • Check data regularly during collection
  • Document any deviations from protocol
  • Note environmental conditions if relevant
  • Back up immediately after collection sessions

Common Pitfalls

  • Inconsistent data entry formats
  • Missing metadata about collection context
  • No backup until collection is "complete"
  • Changing collection methods mid-study
  • Not documenting instrument settings

Survey-Specific Guidance

Online surveys face unique challenges including bot responses and data quality issues.

  • Use CAPTCHA: Essential first line of defense against bots
  • Include attention checks: "Please select 'Strongly Agree' for this question"
  • Add consistency checks: Age and birth year should match
  • Use unique links: Single-use survey links prevent multiple submissions
  • Soft launch: Test with 50 responses before full deployment
  • Never auto-pay incentives: Review responses before compensation

Qualtrics Fraud Detection Guide


Approved Collection Tools

Costs & Budgeting for RDM

Many RDM resources are free, but some projects require budget allocation. Include data management costs in your grant applications.

What's Free

Lakehead Resources

  • Google Drive (100GB)
  • RDM consultation and support
  • Data classification training
  • DMP review assistance

National Resources

  • DRAC Nextcloud (100GB Canadian storage)
  • Borealis data repository
  • FRDR (Federated Research Data Repository)
  • DMP Assistant tool
  • DRAC compute resources (basic allocation)

What May Have Costs

  • Specialized software: Statistical packages, qualitative analysis tools (check TSC for institutional licenses first)
  • Large storage needs: Beyond free allocations, additional storage may require RAC applications or fees
  • Data curation services: Professional data cleaning, formatting, or migration
  • Transcription: Audio/video transcription services
  • Long-term preservation: Some discipline-specific repositories charge fees
  • Personnel time: Data management as part of RA responsibilities

Including RDM in Grant Budgets

Tri-Agency grants allow data management as an eligible expense. Consider including:

  • Personnel: RA time for data organization, documentation, curation
  • Software: Licenses for data collection or analysis tools
  • Storage: Cloud storage or backup solutions beyond free allocations
  • Services: Transcription, data entry, format conversion
  • Training: Team training on data management practices
  • Preservation: Repository fees or data migration costs

Budget Tip
Reviewers want to see realistic data management plans. Saying "data will be stored on Google Drive" is fine for small projects, but larger grants should demonstrate you've thought about long-term preservation and sharing costs.

For Graduate Students

Data management habits you build now will serve you throughout your career. Here's what you need to know as a graduate student.

Your Data vs. Your Supervisor's Data

Key principle: Data ownership is typically determined by who funded the research, not who collected it.

  • Grant-funded research: Data usually belongs to the institution and/or PI, governed by the grant terms
  • Your thesis work: You retain rights to your thesis, but underlying data may have shared ownership
  • Collaborative projects: Clarify data ownership and access rights at the start

Action: Have an explicit conversation with your supervisor about data ownership, access after graduation, and publication rights before you start collecting data.


What Happens When You Graduate?

  • Your Lakehead accounts will be deactivated — You'll lose access to Google Drive, email, and institutional systems
  • Plan your data transition: Identify what you need to take, what stays with your supervisor, and what gets deposited in a repository
  • Export before you leave: Download personal copies of files you're entitled to keep
  • Document everything: Your successor needs to understand your data — create clear README files and documentation

Building Good Habits Early

Do This

  • Use consistent file naming from day one
  • Back up regularly (3-2-1 rule)
  • Document as you go, not at the end
  • Use version control for code
  • Keep raw data separate and untouched

Avoid This

  • Storing data only on your laptop
  • Using personal cloud accounts for research
  • Waiting until thesis writing to organize
  • "final_v2_FINAL_revised.xlsx" naming
  • Assuming you'll remember what files contain

Thesis Data Checklist

  • Data ownership discussed with supervisor
  • Storage location agreed upon
  • Backup strategy in place
  • File naming convention established
  • README file started
  • Exit plan for graduation documented

Managing Your Data

Where Should I Store My Data?

All Lakehead staff and students receive 100GB of Google Drive storage through the Google Education Tenant. Data is encrypted at rest and in transit, but Google Drive stores data in US data centres (not Canadian), which may not meet all research requirements.

For research requiring Canadian data residency, the Digital Research Alliance of Canada offers Nextcloud — a Dropbox-like service with 100GB storage hosted in Canadian data centres (British Columbia).

Data Type                     | Recommended Storage       | Notes
Public / Internal             | Lakehead Google Drive     | 100GB per user, US data centres, encrypted at rest/transit
Internal (Canadian residency) | DRAC Nextcloud            | 100GB, Canadian data centres (BC), requires CCDB account
Confidential                  | TSC-approved solutions    | Contact TSC for current options
Long-term archival            | Borealis, FRDR            | After project completion
Large datasets / HPC          | Digital Research Alliance | Compute-intensive research

Common Questions

"Can I use personal Dropbox or OneDrive?"

Not approved for research data with personal information. May store data outside Canada.

"What about US-based cloud services?"

May violate FIPPA. Even Canadian companies may route through US servers. Note: Google Drive data is stored in the US.

"I need Canadian data residency."

Use DRAC Nextcloud (100GB, BC data centres). Requires a free CCDB account. Data syncs between devices and is backed up nightly.

"I need more than 100GB."

Contact TSC for additional allocation. For very large datasets, consider DRAC.

Storage Resources

File Naming & Organization

Do This

  • 2024-03-15_interview_P01.mp3
  • survey_cleaned_v02.csv
  • ISO dates (YYYY-MM-DD)
  • Underscores or hyphens, not spaces
  • Version numbers

Don't Do This

  • final_FINAL_v2.xlsx
  • my file (1).docx
  • data.csv
  • Spaces or special characters
  • Vague names

Recommended Folder Structure

ProjectName/
├── 01_RawData/ # Original (READ-ONLY)
├── 02_ProcessedData/ # Cleaned, transformed
├── 03_Analysis/ # Scripts, outputs
├── 04_Documentation/ # README, codebooks
├── 05_Outputs/ # Final reports
└── README.txt

Key principle: Never modify raw data—always work on copies.
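The folder structure above can also be scaffolded programmatically so every project starts the same way. A minimal Python sketch, assuming you want to create the layout from a script (`scaffold_project` is an illustrative helper, not an institutional tool):

```python
from pathlib import Path

# Folder names mirror the recommended structure above.
SUBFOLDERS = [
    "01_RawData",
    "02_ProcessedData",
    "03_Analysis",
    "04_Documentation",
    "05_Outputs",
]

def scaffold_project(root: str) -> Path:
    """Create the recommended RDM folder structure under `root`."""
    project = Path(root)
    for name in SUBFOLDERS:
        (project / name).mkdir(parents=True, exist_ok=True)
    # Start the README immediately so documentation begins on day one.
    readme = project / "README.txt"
    if not readme.exists():
        readme.write_text("Project: \nDescription: \nContact: \n")
    return project
```

Run it once at project start; `exist_ok=True` makes it safe to re-run without clobbering existing folders.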

RDM Best Practices

The FAIR Principles

Make your data Findable, Accessible, Interoperable, and Reusable:

  • Findable: Rich metadata and persistent identifiers (DOIs)
  • Accessible: Retrievable with clear access conditions
  • Interoperable: Standardized formats and vocabularies
  • Reusable: Clear licenses and provenance

Note: FAIR ≠ open. Data can be FAIR with controlled access.

Learn more at GO-FAIR.org


The 3-2-1 Backup Rule

The gold standard for protecting your research data from loss: keep 3 copies of your data, on 2 different types of storage media, with 1 copy off-site.

Example: 3-2-1 Using Lakehead Resources

Here's how a researcher could implement 3-2-1 for a typical project:

Copy   | Location                                  | Storage Type                       | Purpose
Copy 1 | Lakehead Google Drive (working copy)      | Cloud storage (US servers)         | Day-to-day work, collaboration, automatic sync
Copy 2 | External hard drive (office or home)      | Local physical storage             | Weekly backup, fast recovery if cloud fails
Copy 3 | DRAC Nextcloud (Canadian data centre, BC) | Cloud storage (different provider) | Off-site backup, Canadian data residency

Why this works: If Google has an outage, you have local backup. If your office floods, you have two cloud copies. If one cloud provider fails, you have another. Different failure modes are covered.

Alternative Configurations

  • For Canadian data residency:
    DRAC Nextcloud (primary) + External drive + Borealis (archive)
  • For large datasets:
    DRAC project storage (primary) + Tape/nearline + Google Drive (docs only)
  • For sensitive health data:
    TSC-approved storage + Encrypted external drive + Encrypted off-site

Common Mistakes

  • Two copies on same physical drive ≠ 2 copies
  • Synced folders aren't backups (deletions sync too)
  • External drive kept next to computer isn't "off-site"
  • Never testing if backups actually restore
  • Backing up only at project end

Backup Schedule Suggestion

  • Daily: Working files auto-sync to cloud (Google Drive/Nextcloud)
  • Weekly: Manual backup to external drive + verify sync is working
  • Monthly: Test restore a random file from each backup location
  • At milestones: Create dated archive copy (e.g., "2025-01-15_data_collection_complete")
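The monthly "test restore" step can be made objective with checksums: a restored file should be byte-identical to the original. A minimal sketch using Python's standard `hashlib` (function names are illustrative):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Hash a file in chunks so large datasets don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(original: str, restored: str) -> bool:
    """True only if the restored copy is byte-identical to the original."""
    return sha256_of(original) == sha256_of(restored)
```

Recording the hash alongside each milestone archive also lets you detect silent corruption later.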

README Files

Every dataset needs a README explaining:

  • Project description and collection methods
  • File inventory and variable definitions
  • Units, formats, missing data codes
  • Access conditions and contact info
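One way to make "document as you go" automatic is to generate a README stub with these headings at project start. A small sketch, assuming plain-text output and using the field names from the list above (the helper is illustrative):

```python
# Fields mirror the README checklist above; fill in values as the
# project evolves rather than at the end.
README_FIELDS = [
    "Project description",
    "Collection methods",
    "File inventory",
    "Variable definitions",
    "Units, formats, missing data codes",
    "Access conditions",
    "Contact",
]

def readme_stub(title: str) -> str:
    """Return a plain-text README skeleton with one line per field."""
    lines = [title, "=" * len(title), ""]
    lines += [f"{field}: " for field in README_FIELDS]
    return "\n".join(lines)
```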

Cornell Readme Template

Metadata & Documentation

Metadata is "data about data" — the information that makes your data findable, understandable, and reusable. Without good metadata, even well-organized files become unusable.

Data Dictionaries / Codebooks

Essential for any dataset with variables. Document each variable with:

Variable Name  | Description                   | Type    | Valid Values | Missing Code
participant_id | Unique participant identifier | String  | P001-P999    | N/A
age_years      | Age at enrollment             | Integer | 18-99        | -99
consent_date   | Date consent signed           | Date    | YYYY-MM-DD   | blank
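A data dictionary becomes even more useful when you check incoming records against it. A minimal Python sketch, assuming the example codebook above (the rule set and `invalid_fields` helper are illustrative; adapt the patterns and ranges to your own codebook):

```python
import re

# Rules mirror the example data dictionary: ID pattern, age range with
# a -99 missing code, and ISO dates with blank allowed for missing.
RULES = {
    "participant_id": lambda v: re.fullmatch(r"P\d{3}", v) is not None,
    "age_years": lambda v: v == -99 or 18 <= v <= 99,
    "consent_date": lambda v: v == "" or re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
}

def invalid_fields(record: dict) -> list:
    """Return the names of fields that violate the codebook rules."""
    return [field for field, ok in RULES.items() if not ok(record[field])]

record = {"participant_id": "P001", "age_years": 47, "consent_date": "2024-03-15"}
bad = {"participant_id": "patient-1", "age_years": 147, "consent_date": "15/03/2024"}
```

Running checks like this during collection, rather than at analysis time, is exactly the "validation rules" practice recommended earlier.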

Persistent Identifiers

Permanent links that ensure your work remains findable even if websites change.

DOIs (Digital Object Identifiers)

Permanent links for datasets, publications, and other research outputs.
Example: 10.5683/SP3/ABC123
Borealis and FRDR automatically assign DOIs to deposited datasets.

ORCIDs (Researcher IDs)

Your unique researcher identifier that links all your work.
Example: 0000-0002-1234-5678
Register for free at orcid.org


Discipline-Specific Metadata Standards

Many fields have established standards. Using them makes your data interoperable.

Collaboration & Sharing

Lakehead provides collaboration tools through Google Workspace.

Google Workspace

Best Practices

  • Use institutional accounts — Personal Gmail lacks protections
  • Set appropriate permissions — Viewer vs Editor
  • Avoid "Anyone with link" for sensitive data
  • Review access periodically

Compliance & Ethics

Privacy & Legal Compliance

Key Legislation

FIPPA
Freedom of Information and Protection of Privacy Act — governs personal information at Ontario public institutions.

PHIPA
Personal Health Information Protection Act — additional requirements for health information.

PIPEDA
Federal private-sector privacy law — applies to partnerships with private companies.

TCPS 2
Tri-Council Policy Statement — ethical guidelines for research involving humans.



Canadian Data Residency

  • FIPPA generally requires Ontario personal information to stay in Canada
  • Some REB approvals mandate Canadian-only storage
  • US-based cloud services may be subject to US government access under the US CLOUD Act
  • Note: Lakehead Google Drive stores data in US data centres

Warning: "Canadian company" doesn't guarantee Canadian data residency. Many route through US servers.

Canadian Storage Option: DRAC Nextcloud provides 100GB cloud storage hosted in Canadian data centres (British Columbia). Free with a CCDB account.

Security Resources


Handling Sensitive Data

Sensitive data requires extra precautions throughout its lifecycle. This section covers practical techniques for protecting confidential information.

De-identification vs. Anonymization

These terms are often confused, but they have different meanings with significant legal and ethical implications.

Definition
  De-identified: Direct identifiers removed, but re-identification may be possible with additional information.
  Anonymous: No reasonable possibility of re-identification, even with additional data.

Key linking
  De-identified: Often maintains a key linking codes to identities (held separately).
  Anonymous: No key exists; the link is permanently broken.

Privacy law status
  De-identified: Still considered personal information under FIPPA/PHIPA.
  Anonymous: May fall outside privacy legislation scope.

REB oversight
  De-identified: Usually still requires REB approval and oversight.
  Anonymous: May not require ongoing REB oversight (but verify).

Data sharing
  De-identified: Typically requires DSAs and restricted access.
  Anonymous: Can often be shared more freely.

Reversibility
  De-identified: Can be re-identified if needed (e.g., for follow-up).
  Anonymous: Cannot be reversed; participants cannot be contacted again.

De-identification Example

A health study replaces patient names with codes (P001, P002) and stores the linking key in a separate secure file. The researcher can re-contact participants if needed.

Risk: If someone obtains both the data and the key, participants can be identified.

Anonymization Example

A survey dataset has all identifiers permanently removed, dates generalized to year only, and geographic data aggregated to regional level. No key exists.

Trade-off: Cannot go back to participants for clarification or follow-up studies.

Types of Identifiers

Direct Identifiers (Always Remove)

  • Names (including initials)
  • Social Insurance Numbers
  • Health card numbers
  • Email addresses
  • Phone numbers
  • Full addresses
  • Photos/videos showing faces
  • Biometric data
  • IP addresses

Indirect/Quasi-Identifiers (Assess Risk)

  • Dates (birth, admission, death)
  • Geographic data (postal codes, cities)
  • Occupation + employer combination
  • Rare diseases or conditions
  • Ethnicity in small populations
  • Unique event dates
  • Institutional affiliations
  • Detailed age (use ranges instead)

Common De-identification Techniques

Technique         | Description                         | Example
Suppression       | Remove the value entirely           | Delete name column
Generalization    | Make values less specific           | Age 47 → "45-49" range
Pseudonymization  | Replace with artificial identifiers | "Jane Smith" → "P0042"
Date shifting     | Shift all dates by random interval  | All dates +/- 30 days
Top/bottom coding | Cap extreme values                  | Age 95 → "90+"
Data swapping     | Exchange values between records     | Swap postal codes between similar records
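Two of these techniques, pseudonymization and generalization with top-coding, can be sketched in a few lines of Python. These helpers are illustrative only, not a complete de-identification workflow; remember the linking key must live in a separate, secured location:

```python
def pseudonymize(name: str, key: dict, prefix: str = "P") -> str:
    """Replace a name with a stable code; store `key` separately and securely."""
    if name not in key:
        key[name] = f"{prefix}{len(key) + 1:04d}"
    return key[name]

def generalize_age(age: int, width: int = 5, cap: int = 90) -> str:
    """Bin age into ranges and top-code extremes (e.g. 95 -> '90+')."""
    if age >= cap:
        return f"{cap}+"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

key = {}  # the linking key: codes back to identities
code = pseudonymize("Jane Smith", key)  # "P0001"
band = generalize_age(47)               # "45-49"
```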

The "Mosaic Effect"
Even when individual data elements seem harmless, combining multiple quasi-identifiers can uniquely identify someone. Example: "Female + Age 34 + Profession: Pilot + City: Thunder Bay" may identify only one person. Always assess re-identification risk across the entire dataset, not just individual fields.
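One simple way to assess mosaic-effect risk is a k-anonymity count: group records by their quasi-identifier combination and look for combinations that occur only once. A sketch under assumed field names (`k_anonymity_risk` is an illustrative helper):

```python
from collections import Counter

def k_anonymity_risk(records, quasi_ids):
    """Return (k, unique_rows): the smallest group size across all
    quasi-identifier combinations, and the combinations seen only once."""
    combos = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    unique = [dict(zip(quasi_ids, c)) for c, n in combos.items() if n == 1]
    return min(combos.values()), unique

records = [
    {"sex": "F", "age_band": "30-34", "city": "Thunder Bay"},
    {"sex": "F", "age_band": "30-34", "city": "Thunder Bay"},
    {"sex": "M", "age_band": "30-34", "city": "Thunder Bay"},
]
```

Here k = 1: the single male record is uniquely identifiable, so further generalization or suppression would be needed before sharing.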

De-identification Resources



Encryption

Encryption scrambles data so only authorized users can read it.

When Encryption is Required

  • Confidential data on portable devices
  • Data transfers outside secure networks
  • PHIPA-regulated health information
  • When specified by REB or funder

Types of Encryption

  • At rest: Files on disk (BitLocker, FileVault)
  • In transit: Data being transmitted (HTTPS, SFTP)
  • End-to-end: Only sender/receiver can decrypt

Lakehead Encryption Guide


Secure File Transfer

Never send confidential data via regular email.

  • SFTP: Secure File Transfer Protocol — encrypted transfers to servers
  • Google Drive (Lakehead): Share links with specific people, not "anyone with link"
  • Encrypted email: Use institutional tools for sensitive attachments
  • Globus: For large research dataset transfers between institutions

If You Suspect a Data Breach

Don't panic, but act quickly.

  • Document: What data, how many records, when discovered, how it may have occurred
  • Report immediately: Contact TSC and the REB
  • Preserve evidence: Don't delete files or emails related to the incident
  • Follow institutional procedures: Lakehead has breach notification requirements

TSC Security: Report security incidents to the Technology Services Centre immediately.


Data Sharing Agreements

Formal agreements that govern how data can be shared, used, and protected between parties. Essential for collaborative research and data transfers.

When You Need a DSA

  • Multi-institutional research: Sharing data with collaborators at other universities
  • Industry partnerships: Any data exchange with private sector partners
  • International transfers: Sending data outside Canada (additional requirements apply)
  • Secondary use: Using data collected for a different purpose
  • Receiving data: When another organization provides data to you


What DSAs Typically Cover

  • What data will be shared
  • Permitted uses and restrictions
  • Security and storage requirements
  • Who can access the data
  • Duration of agreement
  • Data destruction requirements
  • Publication rights
  • Intellectual property
  • Liability and indemnification
  • Breach notification procedures


International Data Transfers

GDPR (EU): If you're working with European collaborators or EU citizen data, the General Data Protection Regulation applies. This requires specific contractual clauses and may restrict where data can be stored.

  • Consult with Research Services before transferring data internationally
  • Ensure your REB approval covers international collaboration
  • Some countries have data localization requirements

Getting a DSA

Contact the Office of Research Services to initiate a data sharing agreement. Allow 4-6 weeks for negotiation and signing. Start early — don't wait until you need the data.

Indigenous Data Sovereignty

Indigenous Peoples have inherent rights over data about their communities, lands, and knowledge. For research with First Nations, Inuit, or Métis communities, standard RDM practices must be adapted.

OCAP® Principles (First Nations)

O — Ownership
Communities collectively own their data and knowledge.

C — Control
Communities control research affecting them.

A — Access
Communities must have access to their data.

P — Possession
Physical control remains with the community.



CARE Principles (International)

Collective Benefit
Data should benefit Indigenous communities.

Authority to Control
Rights to govern their data must be recognized.

Responsibility
Support self-determination and build relationships.

Ethics
Indigenous wellbeing is the primary concern.

At Lakehead

Conflicts between Lakehead guidelines and OCAP® or community protocols must be resolved with the Office of Research Services before the project begins.

Resources

Data Retention & Disposal

Retention Requirements

Per the LUFA Collective Agreement: minimum 7 years after project completion. Contracts or funders may extend this.

The Tri-Agency RDM Policy also requires data preservation for validation and reuse purposes.



Disposal Methods

Classification | Electronic       | Paper           | Devices
Confidential   | Secure wipe      | Certified shred | Return to TSC
Internal       | Delete + backups | Shred           | Return to TSC
Public         | Delete           | Any method      | Return to TSC

Third-party contracts: Providers must return or destroy data with written certification within 30 days.

Sharing & Preservation

Data Deposit & Publication

Depositing your data in a repository preserves it for the long term and makes it findable and citable. This is often required by funders and journals.

When to Deposit

  • At publication: Many journals require data availability statements and DOIs
  • At project completion: Before grant closes and team disperses
  • After embargo: Some data can be embargoed during patent applications or ongoing analysis
  • Before you leave: If you're graduating or leaving Lakehead, deposit before losing access

Choosing a Repository

Repository          | Best For                      | Key Features
Borealis (Lakehead) | Most research data            | Canadian, free, DOIs, access controls
FRDR                | Large datasets (100GB+)       | Curated, discovery platform
Zenodo              | Code, supplementary materials | GitHub integration, free
Discipline-specific | Field standards               | ICPSR (social), GenBank (genomics), etc.

Use re3data.org to find discipline-specific repositories.


Preparing Data for Deposit

File Preparation

  • Use open, non-proprietary formats (CSV, TXT, PDF/A)
  • Remove or de-identify personal information
  • Include README and data dictionary
  • Organize files logically
  • Use clear, descriptive file names

Metadata to Include

  • Title, authors, description
  • Keywords and subject terms
  • Collection methods
  • Geographic and temporal coverage
  • Related publications

Choosing a License

Licenses tell others how they can use your data.

CC0 (Public Domain)
No restrictions. Maximum reusability. Recommended for data.

CC-BY (Attribution)
Users must cite you. Good for most research data.

CC-BY-NC (Non-Commercial)
No commercial use. Limits some research applications.

Restricted Access
Users must request access. For sensitive data.


DOIs and Data Citation

When you deposit data, repositories assign a DOI (Digital Object Identifier) — a permanent link that makes your data citable.

Example citation:

Smith, J., & Jones, M. (2024). Survey data on Northern Ontario housing [Data set]. Borealis. https://doi.org/10.5683/SP3/EXAMPLE

Include your data DOI in publications and link your dataset to your ORCID profile.

Deposit Checklist

  • Data cleaned and de-identified (if needed)
  • Files in open formats
  • README file included
  • Data dictionary/codebook included
  • License selected
  • Metadata complete
  • Embargo period set (if needed)
  • DOI obtained and recorded

Discipline-Specific Guidance

Different research fields have unique data management considerations. Find guidance for your discipline below.

Health Research

Key requirements:

  • PHIPA compliance: Ontario health information must be protected under the Personal Health Information Protection Act
  • De-identification required: Remove all direct identifiers before sharing or publishing
  • Secure storage: Confidential classification — contact TSC for approved solutions
  • Data sharing: Often requires DSAs and REB approval for secondary use
  • Retention: Typically 10+ years for clinical research

Repositories: Restricted-access deposits on Borealis, ICPSR for survey data, dbGaP for genomic data


Social Sciences

Key considerations:

  • Qualitative data: Interview transcripts and field notes require careful de-identification
  • Consent for sharing: Include data sharing in consent forms from the start
  • Codebooks essential: Survey data needs comprehensive variable documentation
  • Longitudinal considerations: Plan for linking data across time points securely
  • Repositories: Borealis, ICPSR, Qualitative Data Repository (QDR), UK Data Archive

Lab Sciences

Key considerations:

  • Instrument data: Document equipment settings, calibration, and software versions
  • Lab notebooks: Electronic lab notebooks provide version control and timestamps
  • Raw vs. processed: Preserve raw data separately; document all processing steps
  • Reproducibility: Include analysis scripts and computational environment details
  • Large file sizes: May require DRAC storage or discipline-specific repositories

Repositories: Zenodo, Figshare, discipline-specific (GenBank, PDB, PANGAEA)



Computational Research

Key considerations:

  • Version control: Use Git for code; tag releases corresponding to publications
  • Environment documentation: requirements.txt, conda environments, Docker containers
  • Code citation: Get DOIs for software through Zenodo-GitHub integration
  • Licensing: Choose appropriate open-source license (MIT, GPL, Apache)
  • README files: Include installation, usage instructions, and examples

Repositories: GitHub + Zenodo, Software Heritage, CodeOcean


Environmental & Field Research

Key considerations:

  • Geospatial data: Include coordinate reference systems, precision, and collection methods
  • Temporal data: Document time zones, sampling frequency, and any gaps
  • Field conditions: Record weather, equipment issues, and deviations from protocol
  • Indigenous territories: Follow OCAP® principles for research on traditional lands
  • Sensor data: Document calibration and any post-processing applied

Repositories: PANGAEA, Dryad, Environmental Data Initiative (EDI), GBIF

Special Considerations

AI in Research

LLMs like ChatGPT, Gemini, and Claude are powerful but come with security risks. Many tools (Grammarly, Microsoft Office) now include AI features that may collect your data.

Recommended: Google Gemini

Access through your Lakehead account for enhanced privacy protections:

  • Data NOT used for AI training
  • You own your data
  • Free with institutional account

Google Gemini

NotebookLM


Avoid DeepSeek

Canadian security agencies have identified significant privacy and security risks with this Chinese AI service.

Government of Canada: Guide on the Use of Generative AI


Safe to Use

  • Published research summaries
  • Writing assistance
  • Understanding concepts
  • Generic code examples

Never Use For

  • Confidential research data
  • Participant information
  • Sensitive or controlled data
  • Personal info (students, colleagues)

Disclosure

Disclose significant AI contributions to publications, grants, or analysis.

"This work used AI assistance (Google Gemini) for [task]. All outputs were verified by the authors."


Survey Tips: Avoiding Bots

Layer 1: Planning

  • Soft launch: Test with 50 responses, check quality, scale up
  • Daily monitoring: Review responses in the first week
  • Unique links: Single-use links, not one public URL

Layer 2: Technical Barriers

  • CAPTCHA: Non-negotiable first defense
  • Honeypots: Hidden fields only bots fill
  • Attention checks: "Select 'Strongly Agree'"
  • Consistency checks: Year of birth + age should match

Layer 3: Screening

  • Never automate incentives: Human review first
  • Flag red flags: Impossible speed, IP clusters, gibberish
  • Delay payouts: Process after survey closes
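The Layer 2 and Layer 3 checks can be combined into a simple screening pass over exported responses. A hedged sketch: the field names, survey year, and the 120-second minimum are assumptions to adapt to your own instrument, not platform defaults:

```python
SURVEY_YEAR = 2025   # assumed year the survey ran
MIN_SECONDS = 120    # assumed plausible minimum completion time

def flags(resp: dict) -> list:
    """Return the screening rules this response fails (empty = looks OK)."""
    out = []
    if resp["duration_s"] < MIN_SECONDS:
        out.append("too_fast")
    # Attention check: respondents were told to select "Strongly Agree".
    if resp["attention_check"] != "Strongly Agree":
        out.append("failed_attention_check")
    # Consistency check: reported age should match birth year (within a year).
    if abs((SURVEY_YEAR - resp["birth_year"]) - resp["age"]) > 1:
        out.append("age_birth_year_mismatch")
    return out

good = {"duration_s": 300, "attention_check": "Strongly Agree", "birth_year": 1990, "age": 35}
bot = {"duration_s": 20, "attention_check": "Agree", "birth_year": 1990, "age": 52}
```

Flagged responses still need human review before any incentive is paid, per Layer 3.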

Resources

Research Websites

If your project includes a dedicated website, you're responsible for keeping it accurate and up-to-date.

When your research is complete, take the website offline to prevent outdated information and security risks.

Resources

National Research Infrastructure

The Digital Research Alliance of Canada (DRAC) provides national infrastructure including storage, compute resources, and data management tools.

Lakehead University TSC

The Technology Services Centre (TSC) provides IT support, software, and infrastructure for Lakehead researchers. Contact TSC for questions about storage, software licensing, and technical support.

Key Resources

When to Contact TSC

  • Need additional storage beyond 100GB
  • Questions about approved software
  • Setting up shared drives for research teams
  • Device returns for secure data disposal
  • VPN or remote access issues
  • Security concerns or incidents

Tip: Storage Requests
If your research requires more than 100GB of Google Drive storage, contact TSC with details about your project and estimated storage needs. For very large datasets (500GB+), consider Digital Research Alliance storage options.

Training & Events

Upcoming Events

Check back for local Lakehead events!


Past Events (Resources)


Self-Paced Training

Templates & Downloads

Ready-to-use templates and checklists to support your research data management.

Documentation Templates

  • README Template (Cornell University)
  • Data Dictionary Template (OSF)
  • Codebook Guide (ICPSR)
  • DMP Assistant (DRAC)


Checklists

Project Start Checklist

  • DMP created or updated
  • Storage location selected
  • File naming convention established
  • Backup strategy in place
  • Data classification determined
  • Team roles and access defined
  • REB requirements confirmed

Project End Checklist

  • Data cleaned and organized
  • Documentation complete
  • Data deposited in repository
  • DOI obtained and recorded
  • Access permissions updated
  • Retention schedule confirmed
  • Secure destruction of copies

Quick Reference Guides

Need a Custom Template?
Contact RDM support if you need help adapting templates for your specific research needs or discipline.

What If Things Go Wrong?

Data emergencies happen. Here's what to do when things don't go as planned.

Accidentally Deleted Files

Google Drive:

  • Check Trash — files stay for 30 days
  • For files deleted from Trash, contact TSC immediately — recovery may be possible within 25 days

Local files:

  • Stop using the drive immediately to prevent overwriting
  • Check backups (external drives, cloud sync)
  • Contact TSC — they may have backup options

Lost Access to Storage

  • Lakehead account issues: Contact TSC Help Desk
  • Shared drive access: Contact the drive owner or your supervisor
  • DRAC/Alliance resources: Contact Alliance support or renew your CCDB account
  • Left/graduated: Your supervisor or department can request access to institutional data

Suspected Data Breach

  • Report immediately to TSC and your supervisor
  • Document what happened, when, and what data may be affected
  • Don't delete anything — preserve evidence
  • Notify REB if human participant data is involved
  • Follow institutional procedures — Lakehead has breach notification requirements

TSC Security Contact: TSC Help Desk


Hardware Failure

Laptop/computer died: If drive is intact, data may be recoverable — contact TSC
External drive failed: Professional recovery is expensive ($500-$2000+) and not guaranteed
Prevention: Follow the 3-2-1 rule (3 copies, 2 different media, 1 offsite)
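The "3 copies" part of the 3-2-1 rule can be automated with a short script that refreshes copies of a project folder on each backup location. This is a minimal Python sketch, not a Lakehead-provided tool; the function name and destination folders are illustrative, and the destinations should sit on different media (e.g., an external drive and a cloud-synced folder) to satisfy the "2 different media, 1 offsite" parts.

```python
import shutil
from pathlib import Path

def backup_copies(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy the `source` folder into each destination, keeping its name.

    Together with the original, this yields the "3 copies" of the
    3-2-1 rule when two destinations are given. Paths are examples.
    """
    made = []
    for dest in destinations:
        target = dest / source.name
        # dirs_exist_ok=True lets repeated runs refresh an existing copy
        shutil.copytree(source, target, dirs_exist_ok=True)
        made.append(target)
    return made
```

Run on a schedule (e.g., a daily scheduled task), this gives the "regular automated backups" recommended below; for large datasets, a sync tool that copies only changed files is more efficient than a full copy.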


Collaborator Conflict Over Data

  • Review agreements: Check your DMP, DSA, or any written agreements about data ownership
  • Consult your supervisor or department head
  • Contact Research Services: They can advise on institutional policies and help mediate
  • Document everything: Keep records of contributions and communications

Prevention: Clarify data ownership and access rights in writing before starting collaborative projects.


Corrupted Files

  • Check version history: Google Drive keeps versions for 30 days (or 100 versions)
  • Restore from backup: This is why regular backups matter
  • Try file repair: Some software can recover partially corrupted files
  • Raw data priority: If you have raw data, you can regenerate processed files
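One way to catch corruption early, before backups of the damaged file overwrite good copies, is to record a checksum for each file when the data are created and re-verify the checksums before analysis. A minimal Python sketch (the function names are illustrative):

```python
import hashlib
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file's path (relative to `folder`) to its checksum."""
    return {str(p.relative_to(folder)): sha256_file(p)
            for p in sorted(folder.rglob("*")) if p.is_file()}

def find_corrupted(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths whose current checksum no longer matches."""
    return [rel for rel, digest in manifest.items()
            if sha256_file(folder / rel) != digest]
```

Store the manifest alongside the data (and with each backup copy); a mismatch on re-verification tells you which file changed, so you know which backup to restore.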

Prevention is Better Than Recovery

Most data emergencies are preventable with good practices:

  • Regular automated backups
  • Version control for important files
  • Clear documentation so others can help
  • Data ownership discussions before projects start

Glossary of Terms

Quick reference for common research data management terms and acronyms.

  • CCDB
    Compute Canada Database — account system for accessing Digital Research Alliance resources
  • De-identification
    Removing direct identifiers from data; re-identification may still be possible with additional information
  • DMP
    Data Management Plan — document outlining how research data will be handled throughout a project
  • DOI
    Digital Object Identifier — permanent link for datasets, publications, and other research outputs
  • DRAC
    Digital Research Alliance of Canada — national organization providing research computing and data management infrastructure
  • DSA
    Data Sharing Agreement — formal contract governing how data can be shared between parties
  • FAIR
    Findable, Accessible, Interoperable, Reusable — principles for scientific data management
  • FIPPA
    Freedom of Information and Protection of Privacy Act — Ontario legislation governing public sector privacy
  • FRDR
    Federated Research Data Repository — Canadian national repository for large research datasets
  • GDPR
    General Data Protection Regulation — European Union privacy law affecting research with EU data
  • Metadata
    "Data about data" — information describing the content, context, and structure of research data
  • OCAP®
    Ownership, Control, Access, Possession — First Nations principles for data governance
  • ORCID
    Open Researcher and Contributor ID — unique identifier linking researchers to their work
  • PHIPA
    Personal Health Information Protection Act — Ontario law governing health information privacy
  • PI
    Principal Investigator — lead researcher responsible for a research project
  • PIPEDA
    Personal Information Protection and Electronic Documents Act — federal Canadian privacy law
  • RAC
    Resource Allocation Competition — process for requesting large allocations of DRAC computing resources
  • REB
    Research Ethics Board — committee that reviews research involving human participants
  • TCPS 2
    Tri-Council Policy Statement — Canadian ethical guidelines for research involving humans
  • TSC
    Technology Services Centre — Lakehead University's IT department