A performance appraisal method that combines numerical rating scales with specific behavioral examples at each scale point, reducing subjectivity by anchoring each rating level to observable, job-relevant behaviors identified through systematic job analysis.
Key Takeaways
Think of BARS as a graphic rating scale that shows its work. Instead of asking a manager to rate an employee's 'customer service' on a scale of 1 to 5, BARS tells the manager exactly what customer service looks like at each level. Level 1 might read: 'Ignores customer inquiries for more than 24 hours and provides inaccurate information when responding.' Level 3: 'Responds to all customer inquiries within 8 hours and accurately resolves 80% of issues on first contact.' Level 5: 'Anticipates customer needs before they arise, maintains a 98%+ first-contact resolution rate, and receives unsolicited positive feedback from customers monthly.'

Now the manager isn't interpreting what a '3' means. They're matching the employee's observed behavior against a specific description. Two different managers evaluating the same employee are much more likely to arrive at the same rating because they're comparing against the same behavioral benchmark.

This precision is why BARS is considered the gold standard for evaluation accuracy. It's also why it's expensive to build. Those behavioral anchors don't write themselves. They come from systematic job analysis, interviews with subject matter experts, and iterative refinement.
Developing BARS is a structured process that typically takes 6-8 weeks per job family and involves job experts, HR professionals, and managers.
Conduct a job analysis to determine the 5-9 key performance dimensions for the role. For a customer service representative, dimensions might include: response timeliness, problem resolution accuracy, communication clarity, product knowledge, and escalation judgment. Each dimension should be distinct (not overlapping) and observable (not an internal trait like 'attitude'). Involve experienced employees and managers in defining these dimensions to ensure job relevance.
Gather 50-100+ specific examples of effective and ineffective behavior for each performance dimension. Use structured interviews with subject matter experts, review incident documentation, and analyze customer feedback data. For 'response timeliness,' examples might range from 'left a voicemail unreturned for three business days' (poor) to 'called the customer back within 15 minutes during peak volume' (excellent). The more real incidents you collect, the stronger your anchors will be.
A separate group of subject matter experts (not the ones who provided the incidents) independently assigns each behavioral example to a scale point; this step is known as the retranslation exercise. If you're using a 7-point scale, each rater places every incident at the level they believe it represents. Keep only the incidents where raters show strong agreement (standard deviation below 1.5 on a 7-point scale). Discard ambiguous examples. This quality-control step is what gives BARS its accuracy advantage.
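The agreement filter described above can be sketched in a few lines. This is an illustrative sketch, not a standard tool: the incident descriptions and rater assignments below are hypothetical, and the 1.5 standard-deviation cutoff is the guideline stated above.

```python
from statistics import stdev

def retained_incidents(ratings_by_incident, max_sd=1.5):
    """Keep only incidents whose scale-point assignments show strong
    inter-rater agreement (standard deviation below max_sd)."""
    return {
        incident: ratings
        for incident, ratings in ratings_by_incident.items()
        if stdev(ratings) < max_sd
    }

# Hypothetical rater assignments on a 7-point scale.
ratings = {
    "called customer back within 15 minutes": [7, 7, 6, 7, 6],
    "left voicemail unreturned for 3 days": [1, 2, 1, 1, 2],
    "offered a partial refund without asking why": [2, 6, 4, 7, 1],  # ambiguous
}

kept = retained_incidents(ratings)  # the ambiguous incident is discarded
```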
Choose 1-2 behavioral anchors per scale point per dimension. These should be the examples with the highest inter-rater agreement and the clearest, most specific language. A 7-point BARS for one dimension will have 7-14 behavioral statements; a role with six dimensions would therefore carry 42-84 total anchors. Write each anchor in the present tense, using observable behavior: 'Completes quality checks on all outgoing orders before shipping' rather than 'Is quality-conscious.'
Have managers use the draft BARS to rate current employees. Compare their BARS ratings to other performance indicators (output metrics, customer feedback scores, peer evaluations) to check for convergent validity. If BARS ratings don't correlate with objective performance measures, the anchors need revision. Collect manager feedback on clarity and usability, then refine the scales before full deployment.
Here's what a completed BARS dimension looks like for the 'Problem Resolution' competency in a customer service role.
| Rating | Behavioral Anchor |
|---|---|
| 7 (Outstanding) | Identifies systemic issues from individual complaints, proposes process changes that prevent recurrence, and achieves first-contact resolution on 98%+ of cases, including edge cases |
| 6 (Excellent) | Resolves non-standard issues without escalation by creatively applying policy exceptions within authority limits, maintaining a 95% first-contact resolution rate |
| 5 (Above Average) | Accurately diagnoses the root cause of common and moderately complex issues, resolves them within established timeframes, and follows up with the customer to confirm satisfaction |
| 4 (Average) | Handles routine issues according to established procedures, occasionally needs guidance on non-standard cases, and meets the department's 85% first-contact resolution target |
| 3 (Below Average) | Resolves simple issues but frequently misdiagnoses moderately complex problems, resulting in repeat contacts and customer frustration |
| 2 (Poor) | Applies incorrect solutions to common issues, fails to ask clarifying questions, and escalates cases that should be resolved at first contact |
| 1 (Unacceptable) | Provides inaccurate information that worsens customer problems, fails to document case details, and has a first-contact resolution rate below 50% |
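In software that renders appraisal forms, a completed dimension like the one above reduces to a rating-to-anchor lookup. A minimal sketch, with anchor text abbreviated from the table:

```python
# One BARS dimension as a simple rating-to-anchor map
# (anchor text abbreviated from the table above).
problem_resolution = {
    7: "Identifies systemic issues; 98%+ first-contact resolution",
    6: "Resolves non-standard issues without escalation; 95% FCR",
    5: "Diagnoses root cause of moderately complex issues; follows up",
    4: "Handles routine issues per procedure; meets 85% FCR target",
    3: "Resolves simple issues; misdiagnoses moderately complex ones",
    2: "Applies incorrect solutions; escalates avoidable cases",
    1: "Provides inaccurate information; FCR below 50%",
}

def render_scale(dimension_name, anchors):
    """Lay out the dimension as a rater would see it, best anchor first."""
    lines = [dimension_name]
    for rating in sorted(anchors, reverse=True):
        lines.append(f"  {rating}: {anchors[rating]}")
    return "\n".join(lines)

form = render_scale("Problem Resolution", problem_resolution)
```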
BARS offers measurable improvements in evaluation quality that justify the development investment for roles where accuracy matters most.
BARS isn't the right choice for every organization. Understanding the drawbacks helps you make an informed decision.
Building BARS requires significant upfront investment. The job analysis, critical incident collection, retranslation exercise, and pilot testing typically take 6-8 weeks per job family and involve 10-20 subject matter experts. For an organization with 50 distinct job families, a full BARS implementation is a multi-year project. This is why most organizations use BARS selectively for high-impact roles rather than enterprise-wide.
Jobs evolve. Technologies change. Customer expectations shift. The behavioral anchors that accurately described excellent performance in 2024 may be outdated by 2026. BARS requires periodic updates (ideally every 2-3 years per role) to keep anchors current. Organizations that build BARS once and never update it end up with a sophisticated system measuring the wrong behaviors.
Because BARS is developed for specific job families, you can't directly compare ratings across different roles. A '5' on a customer service BARS means something different than a '5' on an engineering BARS. This complicates talent review processes that need to compare performance across functions. Some organizations address this by using BARS for within-function evaluations and a simpler scale for cross-functional talent reviews.
Behavioral Observation Scales (BOS) are often confused with BARS. Both use behavioral descriptions, but they work differently.
| Feature | BARS | BOS |
|---|---|---|
| What the rater does | Matches employee behavior to the closest behavioral anchor on the scale | Rates how frequently the employee demonstrates each listed behavior |
| Scale type | Behavioral anchors at each scale level | Frequency scale (Almost Never to Almost Always) for each behavior |
| Number of items | 1-2 anchors per scale point per dimension | 5-10 behaviors per dimension, each rated separately |
| Rating task | Choose the best-matching anchor | Rate frequency of each behavior (more items to complete) |
| Development complexity | High (retranslation exercise required) | Moderate (behaviors listed but no anchoring process) |
| Best for | Roles where the quality of behavior matters most | Roles where frequency of desired behaviors is the key indicator |
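The difference in the rater's task can be made concrete. In this sketch, a BARS dimension score is simply the scale point of the chosen anchor, while a BOS dimension score aggregates frequency ratings across its behavior items (the mean shown here is one common aggregation; some implementations sum instead):

```python
# BARS: one judgment per dimension -- the rater picks the single anchor
# that best matches observed behavior, and that anchor's scale point
# is the dimension score.
def bars_score(chosen_anchor_level: int) -> int:
    return chosen_anchor_level

# BOS: one judgment per behavior -- the rater scores how often each
# listed behavior occurs (1 = almost never ... 5 = almost always),
# and the dimension score aggregates across items (mean shown here).
def bos_score(frequency_ratings: list[int]) -> float:
    return sum(frequency_ratings) / len(frequency_ratings)

bars = bars_score(5)                     # single anchor match
bos = bos_score([4, 5, 3, 4, 4, 5, 3])  # seven separate frequency ratings
```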
These practical recommendations come from organizations that have successfully deployed BARS systems at scale.
Research data on the effectiveness and adoption of behaviorally anchored rating systems.