Hire a data manager

By Crystal Lewis in tips

July 7, 2023

If you were struck by lightning, would other people be able to access and understand your data? - Sarah Arena

Even with the growing open science movement, the expanding requirements for publicly sharing federally funded data, and the ongoing news about data falsification motivating us to produce more accurate, documented, and reproducible data, many researchers are still not able to confidently say “yes” to this question.

One solution to this problem is to hire a data manager. In clinical research, the term “data manager” is well-known. If you do a Google search for the term “research data manager”, you’ll find an abundance of job ads for data managers in clinical research. You can also easily find information on training to become a clinical data manager as well as articles on the skills required and expected salaries. Yet, it is much less common to find similar positions in education or developmental science research.

In fields such as education research, it is more common for other team members to take on this data management role in addition to their other responsibilities. While it may be feasible for a project coordinator or PI to take on data management tasks for a single project, as the number of projects grows within a lab, team members can be spread too thin and data management can suffer. Hiring someone to specifically focus on data management allows all team members to specialize and excel in their area of expertise.

Data management takes expertise.

In a 2019 blog post on finding research data management (RDM) support, Professor Bas Teusink was quoted as saying

As a Principal Investigator, I have no idea how to instruct my students in RDM. I’m not an expert. So I needed support. I needed somebody who actually has the time to look up what tools are available and who can translate general policies and general infrastructure into daily practical solutions that fit our local needs. There’s a huge gap between policy and implementation for people doing the daily work. We need discipline-specific support and we need hands-on help.

In Season 2 Episode 13, when discussing data management, the hosts of the Within & Between podcast remind us

That is an area of expertise you can have…you can be an expert in the idea of taking a question that gets asked to a human being on paper, and turning that into data in a spreadsheet somewhere.

Data management takes time.

When working on an analysis, it is common for data wrangling to take at least half of the project’s time.

Not only does data wrangling take time, but so do all the activities surrounding collecting, documenting, and sharing that data.

Data management takes interest.

In Season 4 Episode 7 of the Quantitude podcast, the hosts bring up a common theme around data management, and that is that most people don’t like managing data.

In talking about perceptions around teaching data management, the hosts say

And a lot of people look down on that…that’s just a mechanical kind of thing.

Data management is often considered monotonous and boring; a task many people want to quickly finish so they can begin analyzing data. Yet, for data managers, data management is not a means to an end, it is the end. They appreciate the process of organizing and describing data and take pride in producing well-curated data products.

What is a data manager?

A data manager is someone responsible for overseeing the integrity and security of data throughout the life cycle of a research project. While the roles may vary, the term data manager can be somewhat synonymous with many other terms.

  • Data Steward: A data quality expert that oversees data governance in an organization.
  • Data Wrangler: Someone who organizes data into shareable data products.
  • Data Champion: Someone who drives data culture in an organization.

Examples of data management roles

Figure sources: DOI 10.1629/uksg.484 and DOI 10.5281/zenodo.3332807

While you can often find very helpful research data management services through institutions such as university libraries, those services do not replace the need to have a permanent team member that oversees ongoing data management.

Common tasks often associated with a data manager role include:

  • Writing or contributing to data management and sharing plans
  • Creating project and dataset level documentation (e.g., data dictionaries, protocols)
  • Building data collection and tracking tools
  • Creating reproducible data cleaning and validation pipelines
  • Overseeing data sharing (e.g., working with repositories, responding to data requests)
  • Designing and overseeing workflows, ensuring the integrity of data every step of the way
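To make the documentation and validation tasks above a little more concrete, here is a minimal Python sketch of checking records against a data dictionary. Everything in it is an assumption for illustration: the variable names (stu_id, grade_level, reading_score), the rules, and the sample records are hypothetical, and a real pipeline would likely be written in whatever tool your lab prefers (e.g., R, Stata).

```python
# Hypothetical data dictionary: variable name -> expected type and allowed range.
DATA_DICTIONARY = {
    "stu_id": {"type": int, "required": True},
    "grade_level": {"type": int, "required": True, "min": 0, "max": 12},
    "reading_score": {"type": float, "required": False, "min": 0.0, "max": 100.0},
}


def validate_record(record, dictionary):
    """Return a list of human-readable problems found in one record."""
    problems = []
    # Flag variables that are not documented in the data dictionary.
    for name in record:
        if name not in dictionary:
            problems.append(f"{name}: not in data dictionary")
    # Check each documented variable against its rules.
    for name, rule in dictionary.items():
        value = record.get(name)
        if value is None:
            if rule.get("required"):
                problems.append(f"{name}: missing required value")
            continue
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected type {rule['type'].__name__}")
            continue
        if "min" in rule and value < rule["min"]:
            problems.append(f"{name}: {value} is below the minimum of {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{name}: {value} is above the maximum of {rule['max']}")
    return problems


def validate_dataset(records, dictionary, id_var="stu_id"):
    """Check every record and flag duplicate identifiers.

    Returns a dict mapping row number (starting at 1) to a list of problems.
    Rows with no problems are left out of the report.
    """
    report = {}
    seen_ids = set()
    for row_number, record in enumerate(records, start=1):
        problems = validate_record(record, dictionary)
        record_id = record.get(id_var)
        if record_id is not None and record_id in seen_ids:
            problems.append(f"{id_var}: duplicate value {record_id}")
        seen_ids.add(record_id)
        if problems:
            report[row_number] = problems
    return report


# Made-up sample data with two deliberate errors.
sample_records = [
    {"stu_id": 101, "grade_level": 3, "reading_score": 88.5},
    {"stu_id": 102, "grade_level": 15, "reading_score": None},
    {"stu_id": 101, "grade_level": 4, "reading_score": 72.0},
]

for row, problems in validate_dataset(sample_records, DATA_DICTIONARY).items():
    print(f"Row {row}: {problems}")
```

The point of a sketch like this is that the checks live in a script rather than in someone's memory, so they can be rerun every time new data arrives.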

Yet, a data manager can also help build a data culture in your organization:

  • Create team level documentation (e.g., data governance documents, style guides, templates)
  • Train other staff in data management skills (e.g., onboarding, coding skills, best practices)
  • Help investigate and vet new data tools
  • Promote equitable data practices
  • Oversee all data policies (e.g., ownership, licensing, confidentiality)
  • Ensure standardization of practices across projects
  • Act as a data champion, guiding and inspiring team members to care about data management
    • “If you’re passionate about your work, it makes the people around you want to be involved too.” - Wanda Sykes

You’ll notice that there are several tasks that I left off of this list that you might be wondering about (e.g., analyzing data, supervising staff, report development, visualization development). While those tasks may be added to the data manager role if the data manager and team agree to include those tasks, I don’t think those should be the focus of a data manager role. Believe it or not, the tasks mentioned above easily fill up a full-time role on a team, especially if the team is running more than one project. If the team is running several projects (e.g., > 3), those tasks can easily fill up more than one full-time role and you may even consider hiring two data managers. Quality data management takes time and adding extraneous tasks to this role means that the quality of your data begins to suffer, defeating the purpose of hiring the role in the first place.

Data manager skills

The skills needed to be a data manager can vary widely. Education levels can be anywhere from a bachelor’s degree to a Ph.D. When hiring for this role, it is typically more important to focus on skills, experience, and interest. Some ideas of what to look for include:

  • Technical skills
    • These skills will vary based on the technology used by the lab and the types of data collected, but the general skills that can be helpful include:
      • Understanding of database structure (i.e., how datasets are organized)
      • Experience building reproducible data cleaning pipelines
      • Coding experience in the tool preferred by the lab (e.g., R, Stata)
      • Specific software/tool experience (e.g., REDCap, Qualtrics)
  • Domain skills
    • It will be invaluable to have domain skills in addition to technical skills. In education research you may look for someone who has worked with education data, understands the typical variables collected in education research and how those variables are analyzed, and has a general understanding of data privacy issues (e.g., HIPAA and FERPA).
  • Experience
    • If you are looking for someone who can manage data for a study that collects original data, it can be very helpful to hire someone who has been involved in original data collection before (e.g., either as a data collector, a project coordinator, or data manager). Understanding the complexity of collecting real-world original data can help a data manager understand how to create better data processes.
    • Some knowledge of data sharing best practices (e.g., metadata standards, FAIR principles) can also be very beneficial
  • Interpersonal skills
    • For lack of a better term here, I’m calling this section interpersonal skills. This includes things such as
      • Communication skills
        • The ability to build collaborative relationships with many people (e.g., grad students, PIs, coordinators, data collectors, school districts, administrative staff, repositories)
      • Acute attention to detail
        • This can be attention to the details in a dataset (e.g., noticing errors). As mentioned by the hosts of the Quantitude podcast, there is a ton of complexity that goes into creating data files and there are so many places along the data management pipeline where errors can creep in. Having someone who is highly organized and excels in implementing and reviewing data checks is very important.
        • This can also be attention to details in regulations and policies (e.g., understanding the intricacies of keeping data secure)
      • Comfortable with problem solving
        • In the world of data, nothing ever goes as expected so it is important for a data manager to not only be comfortable with ambiguity but also be resourceful when troubleshooting and resolving issues

Hiring a data manager

The first step in hiring a data manager is to budget for one. Funding agencies (e.g., IES, NIH) expect you to budget for costs associated with data curation. When applying for a grant, include a data manager in your budget. If possible, add them as key personnel and include biosketches. Consider devoting 6-8% of your budget to funding a data manager and if you have multiple projects, budget a certain percentage of a data manager’s time across each project. If the instability of covering this role entirely with soft funds is a concern, it may even be possible to acquire funds or matching funds from your center or institution.
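The budgeting arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 6-8% range comes from the paragraph above, while the grant total, the annual cost of the position, the project names, and the effort percentages are all made-up numbers.

```python
def curation_budget_range(total_budget, low_percent=6, high_percent=8):
    """Return the (low, high) dollar range for a percentage share of a budget."""
    return (total_budget * low_percent / 100, total_budget * high_percent / 100)


def split_cost_by_effort(annual_cost, effort_percent_by_project):
    """Split a data manager's annual cost across projects by percent effort.

    Returns (charges, uncovered): a dict of dollars charged to each project,
    and the dollars not covered by any project (e.g., a gap that center or
    institutional funds might fill).
    """
    total_effort = sum(effort_percent_by_project.values())
    if total_effort > 100:
        raise ValueError(f"Effort adds up to {total_effort}%, which is over 100%")
    charges = {
        project: annual_cost * percent / 100
        for project, percent in effort_percent_by_project.items()
    }
    uncovered = annual_cost * (100 - total_effort) / 100
    return charges, uncovered


# Hypothetical grant total.
low, high = curation_budget_range(500_000)
print(f"6-8% of the budget: ${low:,.0f} to ${high:,.0f}")

# Hypothetical annual cost and percent effort on three projects.
charges, uncovered = split_cost_by_effort(
    80_000, {"Project A": 40, "Project B": 35, "Project C": 15}
)
for project, dollars in charges.items():
    print(f"{project}: ${dollars:,.0f}")
print(f"Not covered by project funds: ${uncovered:,.0f}")
```

Laying the numbers out this way makes it easy to see how much of the position is covered by grants and how much would need to come from other sources.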

While budgeting for a full-time data manager may seem like a big investment, it’s important to consider the return on this investment. Consider all of the data curation debt that is often incurred because of poor data management practices (e.g., lost data, unusable data, bad data), and the costs associated with recollecting, re-entering, and cleaning that data, if the data is salvageable at all. The costs of hiring a data manager can absolutely bring you larger returns.

If you are able to budget for and hire a data manager, the next consideration is how supervisors can support them. What do data managers need in order to be successful in their role? A few ideas are provided below.

  • Professional status
    • As mentioned at the DBFest in 2022, “Recognising and rewarding technical staff is at the heart of ensuring high-quality, reproducible research”.
    • Data managers often live in a fuzzy area between academics and professional staff. They may miss out on opportunities such as recognition, awards, and promotions. Recognizing data curation as an essential research activity, paying data managers well, providing career progression opportunities, and finding ways to recognize the contributions of data managers can help both recruit and retain top talent.
  • Ongoing training
    • Data management requirements are constantly evolving. Providing resources for data managers to stay up to date on both changes in technology as well as updates to funder requirements is absolutely necessary for a data manager to be effective.
  • Integrate them into the decision-making team
    • Data managers cannot be effective if data-related decisions are made without them in the room. Giving data managers a voice at the highest level in the organization and allowing them to provide feedback on plans helps to reduce data curation debt.
  • Encourage the formation of support groups as part of their paid work
    • While data managers are part of research teams, the work they do is often siloed, making it very difficult to get feedback, learn, or grow in a position. Depending on the size of your organization, support can come in many ways
      • If you are a small team, this may mean encouraging your data manager to simply join existing groups of other data managers. One example of this is joining the Providing Opportunities for Women in Education Research (POWER) Issues in Data Management in Education Research Hub or it could be meeting up with other data managers in your university, even across disciplines.
      • If you are a larger organization, creating a Data Core can be an excellent way for data managers across different units to share ideas and develop standardized practices

In summary, hiring a data manager is a worthwhile investment if you are able to make it work. I absolutely recognize that it is not in everyone’s budget to do this and that, for some, it may be more practical to have other team members take on these responsibilities for a project. Teams can still successfully manage data for a project without a data manager, especially if the team members managing project data are implementing best practices. Yet, maybe after reading this post, you can see all that is possible with a data manager role, and you may consider hiring this important role on your team in the future.

If you are reading this and you either are a data manager or have hired a data manager and you have your own tips (e.g., other ways you’ve funded a data manager), I’d love to hear from you! Please share your comments below!

Special thank you to Rebekah Jacob and Tara Reynolds for their feedback on this post.
