Kirkpatrick Model

A four-level training evaluation framework measuring Reaction, Learning, Behavior, and Results, created by Dr. Donald Kirkpatrick in 1959 and still the most widely used model for assessing training effectiveness.

What Is the Kirkpatrick Model?

Key Takeaways

  • The Kirkpatrick Model is a four-level framework for evaluating training programs: Level 1 (Reaction), Level 2 (Learning), Level 3 (Behavior), and Level 4 (Results).
  • Created by Dr. Donald Kirkpatrick in 1959, it remains the most recognized training evaluation model used by L&D professionals globally.
  • Each level builds on the previous one, moving from subjective satisfaction to objective business impact measurement.
  • 95% of organizations measure Level 1 (did learners enjoy the training?), but only 8% consistently measure Level 4 (did business outcomes improve?) according to ATD research.
  • The New World Kirkpatrick Model, updated by Jim and Wendy Kirkpatrick, reverses the process: start by defining desired Level 4 results, then work backwards to design training that produces those outcomes.

Every L&D team faces the same question from leadership: "How do we know the training worked?" The Kirkpatrick Model provides a structured answer. Instead of relying on gut feeling or participant satisfaction surveys alone, it offers four distinct levels of evidence, each more meaningful than the last. Level 1 tells you whether people liked the training. Level 2 tells you whether they learned anything. Level 3 tells you whether they changed their behavior on the job. Level 4 tells you whether the business benefited.

Most organizations do a decent job at Levels 1 and 2 but struggle with Levels 3 and 4. That's because measuring behavior change and business results requires more effort, longer timeframes, and collaboration with managers and business units. It's also where the real proof of value lives.

The model isn't just an evaluation tool. When used properly, it's a design tool. If you start by defining what Level 4 results you want (say, reduce customer complaints by 20%), you can work backwards to define what behaviors need to change (Level 3), what knowledge and skills enable those behaviors (Level 2), and what learning experience will deliver them (Level 1).

  • 4: evaluation levels (Reaction, Learning, Behavior, Results)
  • 1959: the year Donald Kirkpatrick first published the four-level model in the Journal of the ASTD
  • 95%: share of organizations that measure Level 1 (Reaction); only 8% consistently measure Level 4 (Results) (ATD, 2023)
  • #1: most widely used training evaluation framework worldwide (CIPD, 2023)

Level 1: Reaction

Level 1 measures how participants respond to the training experience. It's the easiest level to measure and the most commonly collected.

What it measures

Did learners find the training relevant? Was it engaging? Was it well-organized? Did they feel the facilitator was knowledgeable? Would they recommend it to a colleague? Level 1 captures the learner's subjective experience immediately after the program. Think of it as the customer satisfaction survey for training.

How to measure it

Post-training surveys (often called "smile sheets" or "happy sheets") are the standard tool. Use a mix of quantitative ratings (1-5 scales for relevance, engagement, facilitator quality) and open-ended questions ("What will you apply immediately?" "What would you change?"). Net Promoter Score adapted for learning ("How likely are you to recommend this program to a colleague?") provides a single benchmark metric. Survey within 24 hours of completion for the most accurate responses.
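The adapted NPS works the same way as standard NPS: promoters minus detractors as a percentage of all respondents. A minimal sketch (the 9-10 promoter and 0-6 detractor thresholds follow the standard NPS convention, not anything specific to learning surveys):

```python
def learning_nps(scores):
    """Net Promoter Score from 0-10 'would you recommend this program?' ratings.

    Promoters score 9-10, detractors 0-6, passives 7-8; NPS is the
    percentage-point difference between promoters and detractors.
    """
    if not scores:
        raise ValueError("no responses")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return round(100 * (promoters - detractors) / len(scores))

# 10 responses: 4 promoters, 4 passives, 2 detractors -> NPS = 20
print(learning_nps([10, 9, 9, 10, 8, 7, 8, 7, 6, 4]))  # 20
```

Because NPS is a single number, it works well as a benchmark across programs, but it should sit alongside the open-ended questions rather than replace them.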

Limitations

High satisfaction doesn't mean effective learning. People can enjoy a workshop and learn nothing. They can also dislike a challenging program that produces significant skill growth. Level 1 data is necessary but not sufficient. A program that scores poorly on Level 1 has an engagement problem that will undermine Levels 2-4. But a program that scores well on Level 1 has only cleared the first bar.

Level 2: Learning

Level 2 measures whether participants actually acquired the intended knowledge, skills, attitudes, confidence, and commitment.

What it measures

Did learners gain the knowledge the program was designed to teach? Can they demonstrate the target skills? Have their attitudes shifted? The New World Kirkpatrick Model adds confidence ("I believe I can do this") and commitment ("I intend to apply this") as critical Level 2 components. Without confidence and commitment, even strong knowledge gains won't translate to behavior change.

How to measure it

Pre-test and post-test comparisons are the gold standard. Test before training begins and again afterward to measure actual knowledge gain rather than just final knowledge level. For skills, use demonstrations, role plays, case study analyses, or simulations evaluated against a rubric. For attitudes and confidence, use pre/post surveys with specific behavioral intention questions. Digital badges and micro-certifications can formalize Level 2 achievement.
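One common way to summarize a pre/post comparison is the normalized gain: actual improvement divided by the maximum possible improvement, which stops high pre-test scorers from looking like weak learners. A minimal sketch, assuming tests scored on a 0-100 scale (the scale is an illustrative assumption):

```python
def normalized_gain(pre, post, max_score=100):
    """Normalized learning gain: (post - pre) / (max_score - pre).

    A value of 0.6 means learners closed 60% of the gap between
    their pre-test score and a perfect score.
    """
    if pre >= max_score:
        return 0.0  # no room left to improve
    return (post - pre) / (max_score - pre)

# Pre-test 50, post-test 80 -> closed 60% of the remaining gap
print(round(normalized_gain(50, 80), 2))  # 0.6
```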

Design tips for better Level 2 outcomes

Active learning beats passive consumption. Programs with practice exercises, discussions, and application activities produce stronger Level 2 results than lecture-only formats. Spaced practice (spreading learning over time with gaps for reflection) improves retention by 200-400% compared to massed practice (cramming everything into one session). Build assessment into the learning experience, not just at the end.

Level 3: Behavior

Level 3 measures whether learners are actually applying what they learned on the job. This is where training evaluation gets hard, and where most organizations stop.

What it measures

Are participants using the new skills, knowledge, or processes in their daily work? Behavior change is the bridge between learning (Level 2) and results (Level 4). Without it, the training investment produces knowledge that sits unused. Level 3 captures the transfer of learning from the training environment to the work environment.

How to measure it

Manager observations and assessments at 30, 60, and 90 days post-training are the most common method. Self-assessments from learners compared against manager assessments reveal perception gaps. 360-degree feedback before and after training captures behavior change from multiple perspectives. Work product analysis (compare the quality of work before and after training) provides objective evidence. Customer feedback scores, error rates, and process compliance data can serve as behavioral proxies.
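The self-versus-manager comparison can be as simple as a per-behavior rating gap. A hypothetical sketch (the behavior names and the 1-5 scale are illustrative assumptions, not a prescribed instrument):

```python
def perception_gaps(self_ratings, manager_ratings):
    """Per-behavior gap between self and manager ratings on the same scale.

    Positive values mean learners rate themselves higher than their
    managers do -- a signal to probe in follow-up coaching.
    """
    return {behavior: self_ratings[behavior] - manager_ratings[behavior]
            for behavior in self_ratings}

# Hypothetical 1-5 ratings for two target behaviors at the 60-day check-in
gaps = perception_gaps({"active listening": 4, "objection handling": 5},
                       {"active listening": 4, "objection handling": 3})
print(gaps)  # {'active listening': 0, 'objection handling': 2}
```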

Why behavior change often fails

The work environment either supports or kills behavior change. If a learner returns from negotiation training to a manager who says "just give the customer whatever they want," the training won't stick. Barriers to transfer include: lack of opportunity to practice, unsupportive managers, competing priorities, outdated systems or processes that prevent new behaviors, and peer pressure to maintain the status quo. L&D must identify and address these barriers proactively.

Required drivers for Level 3

The New World Kirkpatrick Model identifies "required drivers" that support behavior change: accountability mechanisms (manager check-ins, action plans), reinforcement activities (follow-up emails, refresher modules, coaching), support tools (job aids, checklists, reference guides), and reward systems (recognition for applying new skills). Building these drivers into the program design is as important as building the training content itself.

Level 4: Results

Level 4 measures the business impact of training. It's what leadership actually cares about, and it's the level that justifies the L&D budget.

What it measures

Did the training produce the business outcomes it was designed to achieve? This varies by program: reduced safety incidents, increased sales revenue, improved customer satisfaction scores, lower employee turnover, faster time-to-productivity, fewer compliance violations, reduced error rates, or higher quality scores. Level 4 connects training to the metrics that appear on executive dashboards.

How to measure it

Start by defining leading indicators (early signals of progress) and lagging indicators (final business outcomes). For a sales training program: leading indicators include pipeline velocity and proposal quality; lagging indicators include revenue and win rate. Use control groups when possible: compare the performance of trained vs. untrained groups. Time-series analysis (comparing before-training and after-training metrics) works when control groups aren't feasible. Isolate the training effect from other variables (market changes, new products, seasonal patterns) to avoid claiming credit for results training didn't cause.
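The control-group comparison described above is essentially a difference-in-differences estimate: subtract the untrained group's change from the trained group's change to strip out trends that would have happened anyway. A sketch with hypothetical numbers:

```python
def diff_in_diff(trained_before, trained_after, control_before, control_after):
    """Difference-in-differences estimate of the training effect.

    Subtracting the control group's change removes shifts that affected
    everyone (market changes, new products, seasonal patterns).
    """
    trained_change = trained_after - trained_before
    control_change = control_after - control_before
    return trained_change - control_change

# Trained reps went from 100 to 115 closed deals; untrained reps from
# 100 to 105 over the same quarter. A naive reading credits training
# with +15; the difference-in-differences estimate credits it with +10.
print(diff_in_diff(100, 115, 100, 105))  # 10
```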

The attribution challenge

The hardest part of Level 4 is attribution. If sales increased 15% after sales training, how much of that increase was caused by the training vs. a new product launch, a market upturn, or a competitor's failure? Perfect attribution is impossible. Practical approaches include: trained vs. untrained group comparisons, participant estimation surveys ("What percentage of your improvement would you attribute to the training?"), trend analysis (did the improvement timeline match the training timeline?), and correlation analysis between training completion and performance metrics.
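The participant estimation survey leads to a simple conservative adjustment popularized in Phillips-style ROI studies: multiply the measured improvement by the share participants attribute to the training, then again by their confidence in that estimate. A sketch with hypothetical figures:

```python
def adjusted_attribution(improvement, attribution_pct, confidence_pct):
    """Conservatively discount a claimed improvement using participant estimates.

    Both discounts push the estimate downward, so any error is on the
    side of under-claiming training's impact rather than over-claiming it.
    """
    return improvement * (attribution_pct / 100) * (confidence_pct / 100)

# $50,000 measured improvement, 60% attributed to training, 80% confidence
print(adjusted_attribution(50_000, 60, 80))  # 24000.0
```

The deliberately conservative result is easier to defend in front of a skeptical CFO than a raw before/after delta.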

The New World Kirkpatrick Model

Jim and Wendy Kirkpatrick updated the original model in 2016 to address common misapplications and make it more practical for modern L&D teams.

Start at Level 4, not Level 1

The biggest change: work backwards. Define desired business results (Level 4) first. Then identify the behaviors that drive those results (Level 3). Then determine what knowledge and skills enable those behaviors (Level 2). Then design the learning experience (Level 1). This inverted approach ensures training is designed to produce business impact from the start rather than hoping impact will happen after the fact.

Added components at each level

Level 1 now includes "engagement" and "relevance" as distinct measures alongside satisfaction. Level 2 adds "confidence" and "commitment" beyond just knowledge and skills. Level 3 adds "required drivers" (accountability, reinforcement, support, rewards) as planned components rather than afterthoughts. Level 4 distinguishes between "leading indicators" (early evidence of progress) and "desired outcomes" (final business impact).

Return on Expectations (ROE)

The New World Model introduces Return on Expectations (ROE) as the ultimate measure of L&D success. Rather than calculating a dollar-based ROI (which is often imprecise), ROE asks stakeholders: "Did this program deliver what you expected?" Defining expectations at the start and measuring against them at the end creates accountability and clear success criteria that both L&D and business leaders agree on.

Implementing the Kirkpatrick Model: Step-by-Step

Here's how to apply the model to a real training program from planning through evaluation.

  • Before the program: Meet with stakeholders to define Level 4 success metrics. What business outcomes should improve? By how much? Over what timeframe? Document these as the program's return on expectations. Without this step, you have no target to evaluate against.
  • During design: Map backwards from Level 4 results to Level 3 behaviors to Level 2 knowledge and skills to Level 1 experience. Design assessments at each level as part of the program architecture, not as an afterthought.
  • Before delivery: Administer pre-assessments (Level 2 baseline knowledge test, Level 3 baseline behavior assessment, Level 4 baseline business metrics). You can't measure change without a starting point.
  • During delivery: Collect Level 1 data (participant engagement, relevance perceptions). Administer Level 2 assessments (knowledge checks, skill demonstrations). These happen within the training itself.
  • 30-60-90 days after: Measure Level 3 behavior change through manager assessments, self-reports, work product analysis, and system data. Identify barriers to transfer and provide additional support where needed.
  • 3-6 months after: Analyze Level 4 business metrics. Compare against baseline. Account for external variables. Present results to stakeholders alongside the original expectations defined in the planning phase.

Evaluation Tools and Methods by Level

A practical reference for selecting the right measurement approach at each Kirkpatrick level.

Level | Common Tools | Timing | Effort Required | Data Quality
Level 1: Reaction | Post-training surveys, NPS, focus groups, real-time polls | Immediately after training | Low | Subjective but useful
Level 2: Learning | Pre/post tests, skill demonstrations, simulations, case analyses, digital badges | During and immediately after training | Medium | Objective (knowledge), moderate (skills)
Level 3: Behavior | Manager assessments, 360 feedback, observation checklists, work product review, system usage data | 30-90 days post-training | High | Moderate to high
Level 4: Results | Business metrics analysis, control group comparison, ROI calculation, trend analysis, stakeholder assessment | 3-12 months post-training | Very high | High (with proper methodology)

Frequently Asked Questions

Do I need to measure all four levels for every training program?

No. Measure Level 1 and Level 2 for all programs (these are relatively low-effort and provide essential quality data). Reserve Level 3 and Level 4 measurement for programs that are strategically important, expensive, or need to justify continued investment. A 30-minute compliance refresher doesn't need Level 4 ROI analysis. A $200,000 leadership development program absolutely does.

What's the difference between Kirkpatrick and Phillips ROI?

Jack Phillips added a fifth level, ROI, on top of Kirkpatrick's four levels. Phillips Level 5 converts Level 4 business results into monetary value and compares them against the total program cost to calculate a percentage return on investment. Kirkpatrick focuses on whether results were achieved. Phillips focuses on whether the investment was financially justified. Phillips also introduced the practice of isolating training's contribution from other factors, which is one of the hardest aspects of Level 4 evaluation.
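The Level 5 calculation itself is simple arithmetic; the hard work is isolating and monetizing the Level 4 benefits first. A sketch with hypothetical figures:

```python
def training_roi(benefits, costs):
    """Phillips Level 5 ROI: net benefits as a percentage of program cost."""
    return 100 * (benefits - costs) / costs

def benefit_cost_ratio(benefits, costs):
    """Companion metric: total benefits returned per dollar of cost."""
    return benefits / costs

# $240,000 in isolated, monetized benefits against a $150,000 fully
# loaded program cost (design, delivery, participant time, evaluation)
print(training_roi(240_000, 150_000))        # 60.0 (%)
print(benefit_cost_ratio(240_000, 150_000))  # 1.6
```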

Why do so few organizations measure Levels 3 and 4?

Three reasons. First, it's hard. Measuring behavior change requires collaboration with managers, and measuring business impact requires access to business data and analytical skills. Second, it takes time. Level 3 data isn't available for 30-90 days, and Level 4 data not for 3-12 months. Third, attribution is messy. Isolating training's impact from every other variable affecting business outcomes is methodologically difficult. Despite these challenges, organizations that invest in Level 3-4 measurement build stronger credibility with leadership and make better resource allocation decisions.

Can the Kirkpatrick Model be used for e-learning?

Yes, and in some ways it's easier. Level 1 reactions can be captured through in-course feedback and completion surveys. Level 2 learning is measured through built-in assessments, quizzes, and simulations, all tracked automatically by the LMS. Level 3 behavior still requires manager observation and work product analysis, but LMS completion data and system usage analytics provide supporting evidence. Level 4 business metrics are measured the same way regardless of delivery format.

How do I convince stakeholders that Level 1 isn't enough?

Use an analogy: Level 1 is like asking patients whether they enjoyed their visit to the doctor. It's good to know, but it doesn't tell you whether the treatment worked. Show the gap: "We know 95% of participants liked the training, but we don't know whether anyone is using it. Our customer satisfaction scores haven't changed." Present a pilot proposal: "Let me add Level 3 measurement to one program and show you what we learn." Once stakeholders see the richer data, they rarely go back to Level 1 only.

Is the Kirkpatrick Model still relevant with modern learning approaches like microlearning and social learning?

Absolutely. The model evaluates the impact of learning, not the delivery format. Microlearning should still produce measurable reactions (Level 1), knowledge or skill gains (Level 2), behavior change on the job (Level 3), and business impact (Level 4). Social learning programs should do the same. The tools and methods for measurement may differ (analytics dashboards instead of paper surveys, community engagement metrics instead of attendance), but the four-level framework applies regardless of how learning is delivered.
Written by Adithyan RK
Fact-checked by Surya N
Published on: 25 Mar 2026