A four-level training evaluation framework measuring Reaction, Learning, Behavior, and Results, created by Dr. Donald Kirkpatrick in 1959 and still the most widely used model for assessing training effectiveness.
Every L&D team faces the same question from leadership: "How do we know the training worked?" The Kirkpatrick Model provides a structured answer. Instead of relying on gut feeling or participant satisfaction surveys alone, it offers four distinct levels of evidence, each more meaningful than the last. Level 1 tells you whether people liked the training. Level 2 tells you whether they learned anything. Level 3 tells you whether they changed their behavior on the job. Level 4 tells you whether the business benefited.
Most organizations do a decent job at Levels 1 and 2 but struggle with Levels 3 and 4, because measuring behavior change and business results requires more effort, longer timeframes, and collaboration with managers and business units. Those levels are also where the real proof of value lives.
The model isn't just an evaluation tool. When used properly, it's also a design tool. If you start by defining the Level 4 results you want (for example, reducing customer complaints by 20%), you can work backwards to define which behaviors need to change (Level 3), which knowledge and skills enable those behaviors (Level 2), and which learning experience will deliver them (Level 1).
Level 1 measures how participants respond to the training experience. It's the easiest level to measure and the most commonly collected.
Did learners find the training relevant? Was it engaging? Was it well-organized? Did they feel the facilitator was knowledgeable? Would they recommend it to a colleague? Level 1 captures the learner's subjective experience immediately after the program. Think of it as the customer satisfaction survey for training.
Post-training surveys (often called "smile sheets" or "happy sheets") are the standard tool. Use a mix of quantitative ratings (1-5 scales for relevance, engagement, facilitator quality) and open-ended questions ("What will you apply immediately?" "What would you change?"). Net Promoter Score adapted for learning ("How likely are you to recommend this program to a colleague?") provides a single benchmark metric. Survey within 24 hours of completion for the most accurate responses.
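As a rough illustration of the recommend-to-a-colleague benchmark, the sketch below computes a Net Promoter Score from survey ratings. It assumes the conventional NPS scale of 0-10, with 9-10 counted as promoters and 0-6 as detractors. The function name `learning_nps` and the sample responses are made up for the example.

```python
def learning_nps(scores):
    """Return a Net Promoter Score (-100 to 100) from 0-10 ratings.

    Assumes the conventional cut-offs: 9-10 promoters, 0-6 detractors.
    """
    if not scores:
        raise ValueError("no survey responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) count toward the total but toward neither group.
    return 100 * (promoters - detractors) / len(scores)


# Illustrative responses to "How likely are you to recommend this program?"
responses = [10, 9, 9, 8, 7, 7, 6, 5, 10, 9]
print(learning_nps(responses))  # 30.0
```

Here five promoters and two detractors out of ten responses give a score of 30. Ratings collected on a 1-5 scale would need their own cut-offs, which this sketch does not cover.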
High satisfaction doesn't mean effective learning. People can enjoy a workshop and learn nothing. They can also dislike a challenging program that produces significant skill growth. Level 1 data is necessary but not sufficient. A program that scores poorly on Level 1 has an engagement problem that will undermine Levels 2-4. But a program that scores well on Level 1 has only cleared the first bar.
Level 2 measures whether participants actually acquired the intended knowledge, skills, attitudes, confidence, and commitment.
Did learners gain the knowledge the program was designed to teach? Can they demonstrate the target skills? Have their attitudes shifted? The New World Kirkpatrick Model adds confidence ("I believe I can do this") and commitment ("I intend to apply this") as critical Level 2 components. Without confidence and commitment, even strong knowledge gains won't translate to behavior change.
Pre-test and post-test comparisons are the gold standard. Test before training begins and again afterward to measure actual knowledge gain rather than just final knowledge level. For skills, use demonstrations, role plays, case study analyses, or simulations evaluated against a rubric. For attitudes and confidence, use pre/post surveys with specific behavioral intention questions. Digital badges and micro-certifications can formalize Level 2 achievement.
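One way to turn a pre-test/post-test comparison into numbers is sketched below. Alongside the raw gain it reports a normalized gain (the share of the available headroom a learner actually gained). The normalized measure is an addition for illustration and not part of the Kirkpatrick Model itself. The function name, the 100-point scale, and the scores are assumptions.

```python
def knowledge_gain(pre, post, max_score=100):
    """Return (raw_gain, normalized_gain) for one learner.

    normalized_gain = gain / headroom, where headroom is how many
    points the learner could still have gained after the pre-test.
    """
    raw = post - pre
    headroom = max_score - pre
    normalized = raw / headroom if headroom > 0 else 0.0
    return raw, normalized


# Illustrative (pre, post) scores on a 100-point test
learners = {"A": (40, 70), "B": (80, 90), "C": (55, 55)}
for name, (pre, post) in learners.items():
    raw, norm = knowledge_gain(pre, post)
    print(f"{name}: raw gain {raw}, normalized gain {norm:.2f}")
```

Learners A and B end up with the same normalized gain (0.50) even though A's raw gain is three times larger, which is why a final score or raw gain alone can mislead when learners start from different baselines.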
Active learning beats passive consumption. Programs with practice exercises, discussions, and application activities produce stronger Level 2 results than lecture-only formats. Spaced practice (spreading learning over time with gaps for reflection) is often cited as improving retention by 200-400% compared to massed practice (cramming everything into one session). Build assessment into the learning experience, not just at the end.
Level 3 measures whether learners are actually applying what they learned on the job. This is where training evaluation gets hard, and where most organizations stop measuring.
Are participants using the new skills, knowledge, or processes in their daily work? Behavior change is the bridge between learning (Level 2) and results (Level 4). Without it, the training investment produces knowledge that sits unused. Level 3 captures the transfer of learning from the training environment to the work environment.
Manager observations and assessments at 30, 60, and 90 days post-training are the most common method. Self-assessments from learners compared against manager assessments reveal perception gaps. 360-degree feedback before and after training captures behavior change from multiple perspectives. Work product analysis (compare the quality of work before and after training) provides objective evidence. Customer feedback scores, error rates, and process compliance data can serve as behavioral proxies.
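The self-versus-manager comparison can be sketched as a simple gap check. The example assumes both parties rate the same behaviors on a 1-5 scale. The function name, the threshold of one point, and the ratings are hypothetical.

```python
def perception_gaps(self_ratings, manager_ratings, threshold=1.0):
    """Return {behavior: self minus manager} where the gap meets the threshold.

    Positive values mean the learner rates themselves higher than the
    manager does; negative values mean the manager rates them higher.
    """
    gaps = {}
    for behavior, self_score in self_ratings.items():
        if behavior not in manager_ratings:
            continue  # skip behaviors the manager did not rate
        gap = self_score - manager_ratings[behavior]
        if abs(gap) >= threshold:
            gaps[behavior] = gap
    return gaps


# Illustrative 60-day ratings after negotiation training
self_ratings = {"active listening": 4, "anchoring": 5, "closing": 3}
manager_ratings = {"active listening": 4, "anchoring": 3, "closing": 4}
print(perception_gaps(self_ratings, manager_ratings))
# {'anchoring': 2, 'closing': -1}
```

Running the same check at 30, 60, and 90 days would show whether the gaps narrow as the new behaviors settle in.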
The work environment either supports or kills behavior change. If a learner returns from negotiation training to a manager who says "just give the customer whatever they want," the training won't stick. Barriers to transfer include lack of opportunity to practice, unsupportive managers, competing priorities, outdated systems or processes that prevent new behaviors, and peer pressure to maintain the status quo. L&D must identify and address these barriers proactively.
The New World Kirkpatrick Model identifies "required drivers" that support behavior change: accountability mechanisms (manager check-ins, action plans), reinforcement activities (follow-up emails, refresher modules, coaching), support tools (job aids, checklists, reference guides), and reward systems (recognition for applying new skills). Building these drivers into the program design is as important as building the training content itself.
Level 4 measures the business impact of training. It's what leadership actually cares about, and it's the level that justifies the L&D budget.
Did the training produce the business outcomes it was designed to achieve? This varies by program: reduced safety incidents, increased sales revenue, improved customer satisfaction scores, lower employee turnover, faster time-to-productivity, fewer compliance violations, reduced error rates, or higher quality scores. Level 4 connects training to the metrics that appear on executive dashboards.
Start by defining leading indicators (early signals of progress) and lagging indicators (final business outcomes). For a sales training program: leading indicators include pipeline velocity and proposal quality; lagging indicators include revenue and win rate. Use control groups when possible: compare the performance of trained vs. untrained groups. Time-series analysis (comparing before-training and after-training metrics) works when control groups aren't feasible. Isolate the training effect from other variables (market changes, new products, seasonal patterns) to avoid claiming credit for results training didn't cause.
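When both a control group and before/after data are available, the two comparisons can be combined. The sketch below uses a difference-in-differences estimate, a standard technique that is named here as an illustration rather than something the Kirkpatrick Model prescribes. It subtracts the untrained group's change from the trained group's change so that shared influences such as seasonality partly cancel out. All figures are invented.

```python
from statistics import mean


def difference_in_differences(trained_before, trained_after,
                              control_before, control_after):
    """Estimate the training effect as trained change minus control change."""
    trained_change = mean(trained_after) - mean(trained_before)
    control_change = mean(control_after) - mean(control_before)
    return trained_change - control_change


# Illustrative deals closed per rep, quarter before vs. quarter after
effect = difference_in_differences(
    trained_before=[10, 12, 11], trained_after=[14, 15, 13],
    control_before=[10, 11, 12], control_after=[11, 12, 13],
)
print(effect)  # trained group improved by 3, control by 1, so the estimate is 2
```

The estimate is only as good as the assumption that both groups would have followed the same trend without training, so the groups should be as comparable as practical.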
The hardest part of Level 4 is attribution. If sales increased 15% after sales training, how much of that increase was caused by the training vs. a new product launch, a market upturn, or a competitor's failure? Perfect attribution is impossible. Practical approaches include: trained vs. untrained group comparisons, participant estimation surveys ("What percentage of your improvement would you attribute to the training?"), trend analysis (did the improvement timeline match the training timeline?), and correlation analysis between training completion and performance metrics.
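The participant estimation approach can be sketched as a simple discounting calculation. Each participant's measured improvement is scaled by the percentage they attribute to the training. The additional confidence discount is an assumption borrowed from common ROI practice, not something stated above, and the function name and figures are hypothetical.

```python
def attributed_improvement(responses):
    """Sum improvements discounted by attribution and confidence.

    responses: iterable of (improvement, attribution_pct, confidence_pct),
    where the percentages are on a 0-100 scale.
    """
    total = 0.0
    for improvement, attribution_pct, confidence_pct in responses:
        total += improvement * (attribution_pct / 100) * (confidence_pct / 100)
    return total


# Illustrative survey: (extra revenue, % attributed to training, % confidence)
survey = [
    (10_000, 50, 80),   # credited: 10,000 * 0.50 * 0.80
    (4_000, 25, 100),   # credited:  4,000 * 0.25 * 1.00
]
print(attributed_improvement(survey))
```

Of 14,000 in measured improvement, only about 5,000 is credited to the training in this example. Discounting this way errs on the conservative side, which tends to make the resulting claim easier to defend.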
Jim and Wendy Kirkpatrick updated the original model in 2016 to address common misapplications and make it more practical for modern L&D teams.
The biggest change: work backwards. Define desired business results (Level 4) first. Then identify the behaviors that drive those results (Level 3). Then determine what knowledge and skills enable those behaviors (Level 2). Then design the learning experience (Level 1). This inverted approach ensures training is designed to produce business impact from the start rather than hoping impact will happen after the fact.
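The backward sequence can be captured in a small planning structure that flags any level left undefined. This is only a sketch: the class `BackwardPlan`, its field names, and the example behaviors are hypothetical, and the Level 4 goal reuses the customer-complaints example from earlier in the article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackwardPlan:
    """Hypothetical plan filled in from Level 4 down to Level 1."""
    level4_result: str
    level3_behaviors: List[str] = field(default_factory=list)
    level2_learning: List[str] = field(default_factory=list)
    level1_experience: List[str] = field(default_factory=list)

    def undefined_levels(self) -> List[str]:
        """Return the levels that still have no entries, in planning order."""
        levels = [
            ("Level 3", self.level3_behaviors),
            ("Level 2", self.level2_learning),
            ("Level 1", self.level1_experience),
        ]
        return [name for name, items in levels if not items]


plan = BackwardPlan(level4_result="Reduce customer complaints by 20%")
plan.level3_behaviors.append("Agents confirm the issue is resolved before closing a ticket")
plan.level2_learning.append("Can run the resolution-confirmation step in a role play")
print(plan.undefined_levels())  # ['Level 1']
```

Making the Level 4 result the only required field mirrors the idea that nothing else should be designed until the desired business result is defined.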
Level 1 now includes "engagement" and "relevance" as distinct measures alongside satisfaction. Level 2 adds "confidence" and "commitment" beyond just knowledge and skills. Level 3 adds "required drivers" (accountability, reinforcement, support, rewards) as planned components rather than afterthoughts. Level 4 distinguishes between "leading indicators" (early evidence of progress) and "desired outcomes" (final business impact).
The New World Model introduces Return on Expectations (ROE) as the ultimate measure of L&D success. Rather than calculating a dollar-based ROI (which is often imprecise), ROE asks stakeholders: "Did this program deliver what you expected?" Defining expectations at the start and measuring against them at the end creates accountability and clear success criteria that both L&D and business leaders agree on.
Here's how to apply the model to a real training program from planning through evaluation.
A practical reference for selecting the right measurement approach at each Kirkpatrick level.
| Level | Common Tools | Timing | Effort Required | Data Quality |
|---|---|---|---|---|
| Level 1: Reaction | Post-training surveys, NPS, focus groups, real-time polls | Immediately after training | Low | Subjective but useful |
| Level 2: Learning | Pre/post tests, skill demonstrations, simulations, case analyses, digital badges | During and immediately after training | Medium | Objective (knowledge), moderate (skills) |
| Level 3: Behavior | Manager assessments, 360-degree feedback, observation checklists, work product review, system usage data | 30-90 days post-training | High | Moderate to High |
| Level 4: Results | Business metrics analysis, control group comparison, ROI calculation, trend analysis, stakeholder assessment | 3-12 months post-training | Very High | High (with proper methodology) |
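For the ROI calculation listed under Level 4, a minimal sketch using the common definition (net benefit divided by cost, expressed as a percentage) looks like this. The function name and the figures are illustrative, and the benefit figure is assumed to have already been isolated from other influences.

```python
def training_roi(benefits, costs):
    """Return ROI as a percentage: (benefits - costs) / costs * 100."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs * 100


# Illustrative: 150,000 in attributed benefits against 100,000 in program costs
print(training_roi(150_000, 100_000))  # 50.0
```

A result of 50 means the program returned its cost plus another 50% on top. The number is only as reliable as the benefit estimate fed into it.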