With countless news articles (and popular opinions) loudly announcing that diversity training doesn’t work or is a waste of time, we as practitioners are left in the awkward position of proving that our work can, in fact, be worthwhile. With that formidable task at hand, today we’re addressing evaluations and metrics for D&I programs. 

Allow me, first and foremost, to dispel the myth that evaluations are a waste of time. While perhaps no one likes being handed a survey to take, the benefits for practitioners far outweigh the annoyances, particularly when the stakes are high; if you can make a compelling case that your intervention is shifting perspectives, equipping individuals with skills that benefit them, or otherwise having a positive impact, that sets you apart in a field where the margin for error is so often small and unforgiving. Furthermore, when evaluations are designed and distributed effectively, the data they yield can be a priceless resource in the process of continually refining and improving your work. So: let’s talk strategies, challenges, and best practices around evaluation. 

Regardless of participant enthusiasm, recognize that inaction is your greatest enemy.  

Workplaces. Are. Busy. No matter how much someone may have enjoyed your session, it’s likely that a post-training survey will not naturally be at the top of their to-do list when they get back to the office and are faced with a flooded inbox and an afternoon of meetings. This is one of the perennial challenges in evaluation. Fortunately, there are a few things you can do to increase the likelihood that participants will spend the time to complete your survey. 

  • Emphasize the importance of evaluations. Before parting, share with participants how much you value their opinions, and if possible, provide an example of how you have incorporated feedback in the past. 
  • Distribute surveys promptly. Participants are most likely to complete surveys while the material is fresh in their mind, and will also provide more detailed responses during this window. 
  • Consider devoting time at the close of the session for participants to complete evaluations. As practitioners, we never want to sacrifice content time, but consider: if you can expect a 90% completion rate for an in-session survey versus 15-20% out of session, you will have a much more comprehensive picture of participants’ thoughts about the session. While paper surveys require more work to analyze, a physical survey may also be worth considering if a high completion rate is among your top priorities.  
  • Leave out non-essentials. The longer a survey is, the less likely participants are to complete it in full. Keep it short, ideally under 20 questions, to make it quick and easy for participants to complete. 
  • If you are leveraging digital survey platforms, plan to send out 1-2 reminders after the first link is shared. Each reminder should bring in more responses. Time the reminders carefully—2-3 days after the session, and no more than a week after—to ensure you are getting specific and accurate responses. Survey tools like SurveyMonkey will allow you to schedule automated reminders when you distribute the survey. 

If possible, use a pre- and post-evaluation model. 

While this practice probably isn’t feasible for every 2-hour or daylong session you might offer, if you are trying to demonstrate the efficacy of an intervention or a series of learning experiences, conducting a pre-survey to use as a baseline and compare to a post-survey is your best bet. Even if not every client or group of participants will be willing to go through this process, doing it with at least one can provide you with compelling evidence to cite in the future for groups who might expect to see similar changes. 
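If you (or someone on your team) are comfortable with a bit of scripting, the pre/post comparison itself doesn’t have to be complicated. Here is a minimal sketch, assuming hypothetical file and column names from your survey export; the paired t-test is just one illustrative way to check whether the shift is more than noise, not a prescribed analysis.

```python
# A minimal sketch of comparing pre- and post-survey scores for the same
# participants. Column names ("participant_id", "inclusion_score") are
# hypothetical; swap in whatever your survey export actually uses.
import pandas as pd
from scipy import stats

pre = pd.read_csv("pre_survey.csv")    # one row per participant, baseline scores
post = pd.read_csv("post_survey.csv")  # same participants after the intervention

# Match each participant's pre and post responses by ID.
paired = pre.merge(post, on="participant_id", suffixes=("_pre", "_post"))

# Average shift on a single scale item (e.g., a 1-5 agreement rating).
shift = paired["inclusion_score_post"] - paired["inclusion_score_pre"]
print(f"Mean change: {shift.mean():.2f} points (n={len(paired)})")

# A paired t-test gives a rough sense of whether the shift is more than noise.
t_stat, p_value = stats.ttest_rel(paired["inclusion_score_post"],
                                  paired["inclusion_score_pre"])
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```

Even a simple before-and-after average, reported alongside participants’ own words, goes a long way toward the “compelling evidence” described above.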

Use the right metrics. 

Did participants like my training? Did they learn something? Did it change their attitudes about the topic? How about their behavior? Research tells us these are 4 distinct questions that evaluate 4 different things, and if someone answers yes to the latter two questions, that suggests that the learning experience will have more lasting effects than if someone only answers yes to the first two. (Furthermore, because of the difficult nature of D&I topics, how much a participant “liked” a session may not be helpful information at all—perhaps it made them very uncomfortable, but they learned a lot and plan to change their behavior going forward.) A common trap that survey writers fall into is failing to go beyond participant reactions or knowledge to assess their attitudes or behavior. Fortunately, there are models and tools we can leverage to get at these deeper questions. 

  • The Kirkpatrick Model of Training & Evaluation outlines four levels of outcomes that we might measure following an educational intervention: reaction, learning, behavior, and results (the degree to which desired organizational outcomes occur following a training). While Levels 3 and 4 certainly prove more challenging to evaluate in a survey, being intentional about including questions that get at these higher levels in some manner can give us a much more informative picture of the outcomes of our sessions. Questions like “what goals can you set for yourself based on the material covered in this session?” can not only scratch the surface of potential behavioral change, but can also aid participants in synthesizing takeaways and setting intentions that are more likely to translate to action. 
  • The Intercultural Development Inventory is a validity-tested psychometric tool that measures how people respond to cultural difference. In addition to being highly respected in the field, it has the added benefit of going beyond reactions and knowledge to assess deeper levels of attitudes and behaviors. It is a longer assessment best suited to in-depth analyses of personal or organizational development around D&I, but particularly when it is used as a pre- and post-assessment tool around long-term learning or interventions, the results will be very telling about how effective the intervention was at moving participants along in their developmental journey. 

Leverage qualitative and quantitative data in tandem. 

Many people assume quantitative data is the most worthwhile to collect—after all, it easily translates to charts and graphs, and can be crunched neatly in spreadsheets. Don’t be fooled: failing to collect qualitative data is one of the biggest mistakes you can make in the course of evaluation. We could ask 10 questions about how beneficial 10 different parts of the training were, and never learn that the PowerPoint was not accessible to someone with a visual impairment, or that the example we came up with off the top of our head to illustrate a point provided the essential “aha moment” for several people in the room. At a minimum, include open-ended questions about what worked especially well and what could be improved for future sessions. Even better, add an optional “comments” box alongside any multiple-choice questions so that participants can elaborate on their responses and provide context if they choose, without being required to. Some of the most valuable insights we have gathered on our evaluations at The Winters Group have come from details included in these supplemental qualitative answers. 
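For those who analyze their survey exports in a spreadsheet or script, one way to keep the numbers and the comments together is sketched below; the column names are assumptions about your particular export, not a standard.

```python
# A rough sketch: summarize a Likert item numerically while keeping the
# free-text comments attached, so the "why" travels with the "how much."
# Column names ("content_rating", "comments") are hypothetical.
import pandas as pd

responses = pd.read_csv("post_training_survey.csv")

# Quantitative: distribution and average of a 1-5 agreement rating.
print(responses["content_rating"].value_counts().sort_index())
print("Mean rating:", round(responses["content_rating"].mean(), 2))

# Qualitative: pull every non-empty comment alongside the rating it came with,
# lowest ratings first, so critical feedback surfaces immediately.
commented = responses.dropna(subset=["comments"])
for _, row in commented.sort_values("content_rating").iterrows():
    print(f'[{row["content_rating"]}] {row["comments"]}')
```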


Be specific. 

“I enjoyed the facilitator” is not a particularly informative statement for participants to rate. Maybe they’re thinking about the flow of the session, maybe they’re thinking about the facilitator’s jokes, maybe they’re thinking about their outfit. It’s easy to be vague when writing questions, and sometimes it may feel more efficient than adding multiple questions related to the same topic. However, specificity is your friend. Isolate the specific information you want to gather and build the question around that. “The facilitator presented material in an engaging way” gets at the same topic, but is a much more informative statement to analyze. 

Similarly, answer choices should be clearly distinguished and specific. In most cases, a 5-point rating scale is sufficient (strongly disagree, disagree, neutral, agree, strongly agree). Any more than that and you may leave participants unnecessarily puzzling over whether they “somewhat agree” or “agree”—a distinction that probably isn’t all that helpful to you at the end of the day either. (And if you want to capture those nuances, that’s what the qualitative comments box is there for!) 
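If you do end up scoring those responses numerically, a small sketch of converting the five labels into a 1-5 scale might look like the following; the label wording and column name are hypothetical, so adapt them to your own survey.

```python
# A small sketch: map 5-point Likert labels onto 1-5 so they can be averaged.
# The label wording and the column name are hypothetical.
import pandas as pd

scale = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
         "Agree": 4, "Strongly agree": 5}

responses = pd.read_csv("post_training_survey.csv")
responses["engagement_score"] = responses["facilitator_engaging"].map(scale)
print(responses["engagement_score"].describe())
```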


Keep it anonymous. 

Assuring participants that their responses will be anonymous increases the likelihood that you will receive genuine, honest feedback. 

Collect demographic data in evaluations. 

I know what you’re thinking: “didn’t she just tell me to keep it anonymous and keep my surveys short and to the point?” Well…yes. However, demographic data is neither superfluous nor intended to personally identify individuals. While it’s possible that collecting demographics may turn some people away from completing a survey, you shouldn’t be studying the demographics to decipher which participant gave which answer, so you can still truthfully share that responses will remain anonymous. 

The benefit of collecting demographic data is that it informs us about how our content is being received differently by people with different identities. If everyone in the group indicated positive reception of the content except for someone who is transgender, perhaps we need to think about where we might be missing the mark in our discussion of gender identity (and consult that person’s qualitative responses that we were clever enough to collect!). Conversely, if the people of color in the room thought the session was great, while many white people seem to have left feeling defensive, we might want to think about how we could make the content more accessible to people at various points in their developmental journey. If the two executives in the room disliked our material around income inequality, maybe we could think about how to reframe it…or maybe we accept that the topic was by nature uncomfortable for them, and move on knowing that it was impactful for others. Having context about respondents’ identities can be immensely helpful in determining whether we need to adjust material. Consider collecting demographic data around generation, race, gender identity, or other identities that may be relevant given the audience and topic. 
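For the data-inclined, disaggregating ratings by identity group is a short exercise once responses are in a spreadsheet. A minimal sketch, again assuming hypothetical column names, might look like this.

```python
# A minimal sketch of disaggregating ratings by a demographic field to spot
# groups the content may not be landing with. Column names are hypothetical.
import pandas as pd

responses = pd.read_csv("post_training_survey.csv")

# Average rating and response count per racial identity group (for example).
by_group = (responses
            .groupby("race_ethnicity")["content_rating"]
            .agg(["mean", "count"])
            .sort_values("mean"))
print(by_group)

# Flag groups whose average sits well below the overall mean; their
# qualitative comments are the first place to look for what to adjust.
overall = responses["content_rating"].mean()
print(by_group[by_group["mean"] < overall - 0.5])
```

Keep in mind that with small groups, a single person’s response can swing an average, which is another reason to read the qualitative comments alongside the numbers.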


Don’t allow time constraints to rule out evaluations. 

Time is always a challenge, and maybe sometimes, as much as you would like to conduct a formal survey, it’s not in the cards. Don’t despair! You can still leverage alternate forms of evaluation that take less time, coordination, or involvement.  

  • Plus, Minus, Delta is a quick and easy evaluation alternative: At the close of the session, post 3 large sheets of chart paper across the room and label them “+,” “-,” and “Δ.” Next, instruct participants to write on 3 separate post-it notes 1) what went well, 2) what didn’t go well, and 3) what they would like to change. Have them post their notes on the appropriate posters as they leave the session. This only takes 2 or 3 minutes and has the added benefit of capturing responses from most, if not all, participants. 
  • Focus groups can be an informative alternative to deploying a survey, and even though they take more time for the individuals involved, fewer individuals need to be recruited. Particularly if participants are selected at random and are relatively representative of the organization as a whole, you may be able to collect many of the same insights you would get from surveying hundreds of people by asking targeted questions of just a few.  

I will leave you with one last thought: as most of us are well aware, D&I work takes time. We may not see the change we hope to see for participants after one session, or two, or five. This is to be expected, and it’s essential that we both reflect on the constructive criticism we receive in evaluations and not lose hope if they aren’t full of sunshine and roses. A general rule of thumb: the more a response or trend surprises you, the more it is worth exploring in the name of creating content or plans that work for everyone. This is a continuous journey for all of us; just as we don’t expect our participants to leave every session with a refreshed worldview, we can’t expect that after months, or even years, we have perfected our own work beyond improvement. This is the value of evaluations, and the rationale behind doing them right. Let us know in the comments below how evaluations have informed your work! 
