Story points: Wikis

From Wikipedia, the free encyclopedia

In agile software development, story points are units of relative size used in estimating requirements as an alternative to units of time.

What are Story Points?

Story points (referred to as just points in this article) are units of relative size used in estimating software requirements as an alternative to units of time. Points measure the complexity and/or size of a requirement, as opposed to the duration needed to complete that requirement. Points are sometimes called "story points" because they are most often used in conjunction with User Stories as a requirements device.

Points are used in agile software development for long term (release-level) planning and tracking. They are distinct from time-based estimation used for task-level Sprint planning. Point-estimated stories on a backlog along with a team velocity can be used to provide rough release-level scheduling and project progress measuring using the project backlog burndown.

Two aspects of points are critically important to comprehend, as they are fundamental to the definition and application of point-based estimating. As a unit of estimation, a point has the following two qualities:

  • It is not a measurement of duration, but rather a measurement of size/complexity, and
  • It is a relative unit of measurement

Points Are Size, not Duration

It is important for understanding points to divorce the concepts of size and duration for the thing being estimated. This is explained further in the following section(s) but it is worth repeating many times because it is a paradigm shift that is commonly missed when communicating about points. In fact, it is generally acknowledged that the point-based estimation technique's most common failure or shortcoming is that people (including those doing the estimation) fail to comprehend and apply this timeless property of "the point".

A common question at this point becomes "if points are not duration-based estimates, how long will something estimated as n points take to do?" To understand how points address this question, let's use an analogy. Assume you are tasked with painting the walls of the rooms in a house that has the floorplan shown here:

[Figure: simple house floor plan]

To know (roughly) how much paint you'll need, you are more concerned with answering the question "how big is this job?" than "how long will it take?", right? You'll want to know the dimensions of the rooms, and probably some other factors like existing wall coloring, windows, doors, fireplaces and other obstructions, etc. You are trying to answer the question of size/complexity first, not duration.

Why not immediately combine this calculation in such a way as to arrive at a time-based estimate? After all, if you have the size/complexity data, don't you have what you need to answer the "how long" question? The simple answer is that we can (and do) "finish" the estimation problem for software, but using a slightly different approach. However, points are useful for more than just answering the "how long" question. This will be addressed below.

One interesting thing to note about this property of point-based estimation is that the estimated size of the problem does not change when applied to one team vs. another. The size of the problem is the same, regardless of how fast each could tackle the problem. The importance of this is addressed below.

So to sum up this point, point-based estimation is a step away from answering "how long will it take" to address the more critical first question of "how big is the job".

Points are Relative, not Absolute

The second important property of points is that the units are relative, not absolute. This is critically different from time-based estimates where the unit is absolute - a day is a universally agreed-upon size - everyone will know that a day is 24 hours or 1/7th of a week. What's more, 100 days is exactly 100x a single day in duration.

Point-based estimates assume that point values are relative to each other (e.g. 2 points is twice as big as 1 point) and to the requirements being estimated. In other words, a requirement estimated as a 4 point requirement is assumed to be 2x the size of a 2 point requirement and 1/2 the size of an 8 point requirement.

The reasoning behind this is best kept to the "Why Points" section below, but in short, this is an acknowledgement of three problems we have in software development estimation:

  • we never have 100% of the information needed to fully assess the size or duration in absolute units
  • it is harder and more costly to assess absolute size than relative size (e.g. between the parking garage pillars, can you estimate the # of feet between them or the # of cars that will fit between them more easily (assuming you have no tape measure)?)
  • what we know at the time of estimation often changes, and re-estimating after every change is very costly and not always productive; point-based estimation acknowledges that change (usually in the form of further definition of the details of the requirements) is usually spread fairly evenly across all the requirements estimated, which means that the points need not change if the relative sizes of the requirements (compared to each other) are not generally changing

Using the floor plan diagram above, if you were to assign some point value to the smallest bedrooms (#s 1 and 2), what would you then assign to the other rooms? While the actual values are up for discussion, you might estimate the rooms as follows:

Room          Points/Size
Bedroom 1     2
Bedroom 2     2
Bedroom 3     3
Living Room   4
Kitchen       4

Why did I give the kitchen a 4 when it is arguably closer to the size of bedroom 3? Two factors:

  • It is larger than bedroom 3
  • Kitchens will generally also be more complex on average - you've got cabinets, appliances, etc. to work around

Also note that I used a linear scale here. Ideally you would use an exponential scale (see below). Had I used an exponential scale (e.g. 1, 2, 4, 8) I probably would have assigned the same value to bedroom 3, the kitchen and the living room. For the sake of our illustration right now, I went with linear to better illustrate the relativity of the points.

Why Points?

So, points are relative measurements of size/complexity, not absolute measurements of duration. Why? What is the point of points? What use are they, especially when the question being asked is more often "how long will this take" vs. "how big is it?"

Let's first ask the question: what is it that you really want to know? Is it that you want to know how long will what you asked for take, or when you will have what you want? (Or, in the case of a fixed date, the question becomes what are you most likely to get by that date?) The distinction here is that there are assumptions behind each version of the question.

  • How long will what you ask for take? - this assumes that:
    • you know exactly, precisely and completely what you want and can specify it 100% up front
    • the developers can with a high degree of confidence/accuracy, given enough time, tell you how long it will take
    • you are willing to spend what it takes to get an "accurate" estimate
  • When will you have what you want? - this assumes that
    • you know what you want now, but what you want may change at various detail levels as you see the project being executed
    • what you can specify now will be at a relatively consistent level of detail/specificity
    • developers' unpadded/unbuffered estimates are statistically inaccurate (on average they are low) for requirements which (a) are not fully specified up front and (b) are mutable
    • time spent on estimation does not linearly result in greater accuracy (and in fact there is some evidence to state that the more time spent on estimates the worse they get after their "accuracy" peaks)
    • the developers can estimate relative size of what you know you want now - that most change or increase in detailed specificity will statistically be evenly spread across the stories
    • you want to spend little on long-term plans because you know that the details of those plans' assumptions are subject to change

Benefits of Points

To put this another way, points bring the following benefits to release-level schedule planning:

  • They are cheaper to arrive at
  • They foster collaborative estimation - it's not just developers who can or do estimate, it is a product team including the product owner, analyst, tester and developers
    • As a result, the estimates of size are more transparent and universally agreed upon
    • Now a discussion about the estimates is no longer on "why is this going to take you so long" but rather "why is this so complex"? It is emotionally easier for us to attack the complexity of the story to try and simplify it rather than to attack/defend our estimate of how long it will take to do. The former focuses on the story, the latter focuses (implicitly) on the skills and abilities of the estimator.
  • They form a consistent measurement - the relative size/complexity of one requirement to the other does not shift as much as the effort/duration necessary to complete it (or, put differently, the size/complexity is a variable that changes less often and can be applied to different teams without the need to re-estimate)
  • They allow planning with a level of ambiguity in the requirements

Because of these benefits above, points offer a way not only to plan a release but also to continually plan the release. Put another way, they provide a consistent variable that, when joined with a team's velocity, provides a continually-updated projection of when the target functionality will be completed OR what functionality will be completed when a certain date is met. While this can be attempted with time-based estimates, in order to do so you have to treat the time-based estimates like points, disconnecting them from the actual "time" that they refer to, and apply a "velocity" which in this case would be estimation accuracy.

Why not Time-based Estimates for Release Schedule Planning

While it is possible to continue to strive to use a time-based estimate of a project's scope to plan a schedule, there are several downsides to this approach when compared to using the points-based method. If you accept that requirements are rarely if ever fully fleshed out up front (and that the value of doing so is especially low for subjectively-determined goals such as visual design and usability), you can then propose that this detriment can be mitigated through proper schedule buffering. The assumption of buffering a schedule is that we know there are some unknowns we'll run into and we want to plan for it by adding time to the time-based estimated schedule. However, doing this will still suffer from the following disadvantages over using a points-based estimation technique (plus appropriate schedule and/or feature buffering) to plan the schedule:

  • A focus on time-based estimates will entail much more analysis and planning time as compared to points-based estimation. To maximize time-based estimation accuracy, deep detail is required in the estimates, detailed, wide-ranging and deep designs are required, and a lot of time must be spent to achieve these objectives. This is time well spent only if the effort invested does not incur waste due to (a) changing assumptions like incomplete requirement specification or (b) the results of the effort changing the plan (e.g. expending cost to estimate scope that ultimately cannot be done due to schedule constraints). These factors often arise in the project making waterfall analysis like this quite wasteful.
    • As a corollary to the above statement, time-based estimates, due to their cost and traditional lack of credibility, serve no more useful a purpose than points at the release-level stage of planning. However, they do have value at various points within the software development process. SCRUM encourages this kind of estimation at the sprint planning phase when the user stories are decomposed into tasks. This helps better plan the sprint and helps the team to know how much they can commit to doing within the iteration.
  • A focus on time-based estimates will tend to focus too much "definition of done" on the development tasks as opposed to what the customer/product owner really desires or how the testing team will treat the requirements. This is the seed of a team which will not work well/collaboratively together. Instead, each "team" (requirements, development, testing) will have competing agendas on what "done" actually means, and no "team" will achieve its goals satisfactorily. (Thinking back at all of the software projects we've done, how often did the PM feel like they got what they wanted with minimal compromise, that developers had the time to write decent, quality code, and that the testers felt like they weren't up against the schedule wall, swamped with so many defects that there was no way we were going to get a "great quality" release out in the time allotted (even with schedule slips)?)
  • A focus on time-based estimates forces the developers too low in their thinking of the projects too soon. Developers eventually need to get to this level but our objective should be to get them understanding the goals of the requirements and thinking "functional design" first, then "technical design."
  • A focus on time-based estimates puts developers at the mercy of the question that management then feels compelled to ask, "why is your estimation accuracy so low?" While it is good to encourage better estimation for tasks, starting with a point-based approach acknowledges that not everything that will occur has been thought of (or could be thought of), and allows developers (and testers/analysts!) the perceived freedom to add new tasks that are necessary for development of the requirements without necessarily being penalized for underestimating a requirement. SCRUM encourages the team members to estimate the tasks decomposed from the user stories in a time-based measurement, but it also encourages these tasks to be small and well-defined enough so as to make it fairly easy to achieve the "accurate estimates" goal. Ideally what you should see in this case is that while a developer may be fairly accurate on the tasks identified ahead of time, those tasks which were not/could not have been identified sooner are the ones that add "unexpected" time to the user story completion. And to complete this point, these tasks are not always added during a sprint because the developer failed to properly decompose the user story - it could very well be that more information was generated about the user story as it was coming together. The tester or analyst might have generated this information just as easily as the developer. The goal of the software development process is to develop software first, not meet an estimation accuracy target. Point-based estimation helps to prioritize these goals accordingly.
  • Time-based estimates necessarily depend on the resources brought to bear on a particular requirement or task. Knowing far ahead who will work on a requirement or task, what environmental factors will be in effect at that time, and even what new/changed information will be available to those resources at the time of execution is rarely possible. Therefore, a time-based estimate's accuracy (however good or bad it is at the start) has a tendency to deteriorate over time. Using points combined with a team's velocity helps to smooth out these uncertainties and allow us to plan and track the schedule with a greater fidelity with a lower cost.

All that said, as a reminder, using points does not preclude using time-based estimation. However, the key is using these different techniques at the right time within the development cycle. Use the least-costly (yet effective) method on the information which has the most uncertainty and the more costly, more precise method when it is (a) needed and (b) the information/assumptions are likely to experience the least amount of change.

Problem: Computing Time per Points

What if my point estimates don't consistently agree with the time-based estimates the team does later? This is often phrased in different ways, but might be summed up in the following: How can I be sure that the point-based estimates are correct if a 1-point requirement takes 10 hours here and 20 hours there? The first point to make here is that what this is really about is trying to map how much time a team takes to complete a point. Or worse, the temptation is to divide the team's velocity by the # of people on the team, possibly adjusting for availability, and compute an average points/person to weigh against individuals ("Scott, why are you so much less productive than Jim - you did 2 points and he did 4 last sprint?").

The point of point-based estimation is to acknowledge and accommodate uncertainty and ambiguity at the time of release planning. It must therefore be acknowledged that there will be a range of time it takes to complete any given point-based requirement. These ranges will necessarily overlap as well. While it is advised in nearly all resources I've looked at to not be concerned with this - the averages of these fluctuations will smooth out the inconsistency they say - there is probably some value in monitoring this to discover earlier rather than later if you have a problem at the root of your point-based estimate. For instance, if the information available at the time of the point-based estimation was very poor, the estimates themselves will suffer accordingly. Hopefully this is more of a case of uniform under- or over-estimation, in which case team velocity will smooth this out from a scheduling/tracking perspective. However, if a high deviation is detected in similarly-estimated requirements, that could be an indicator of inconsistent level of detail/information across the requirements when the estimation was done originally. This may spark a need for a re-estimation (at the points level).
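As a rough illustration of the monitoring idea above, the sketch below groups completed requirements by their point estimate and flags any point value whose actual hours vary widely. The function names, the sample data and the 0.5 threshold are all assumptions made for this example, not part of any standard practice; the spread is measured as the standard deviation divided by the mean.

```python
from collections import defaultdict
from statistics import mean, pstdev


def spread_by_points(completed):
    """Map each point value to the relative spread of its actual hours.

    `completed` is an iterable of (points, actual_hours) pairs.
    Spread is the population standard deviation divided by the mean.
    Point values with fewer than two samples are skipped.
    """
    buckets = defaultdict(list)
    for points, hours in completed:
        buckets[points].append(hours)
    return {
        pts: pstdev(hrs) / mean(hrs)
        for pts, hrs in buckets.items()
        if len(hrs) > 1
    }


def flag_inconsistent(completed, threshold=0.5):
    """Return the point values whose spread exceeds the (assumed) threshold."""
    spreads = spread_by_points(completed)
    return sorted(pts for pts, cv in spreads.items() if cv > threshold)


# Hypothetical history: (points estimated, hours actually spent)
completed = [(1, 10), (1, 12), (2, 20), (2, 60), (2, 10)]
print(flag_inconsistent(completed))  # [2]
```

Here the 2-point requirements range from 10 to 60 hours, which might be the cue to revisit how much detail was available when they were estimated. Note that this looks only at spread within a point value; it is not meant to convert points into hours.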

Problem: Managing Consistency of Point Sizes

How can I be sure that the team is estimating consistently both within the project but also across projects? If I cannot count on this, how can I:

  • be sure that the team is working productively compared to other teams?
  • manage a product backlog that will span multiple teams?

There are a few things that are advised to tackle this problem. I will list two:

Triangulation

Triangulation is a technique that teams should use within the project to ensure that they are estimating consistently within that context. This can be done several ways, but one suggested way to achieve this is to "bucket" the similarly-estimated stories together when estimating so that it is visibly apparent what stories are in the "1 point" bucket, "2 point" bucket, etc. When estimating, this provides a better way for folks to "hold" stories up to each other to gauge relative size.

When starting a project, estimated stories from previous projects can also be used to "seed" these buckets. This works best if the same team had estimated these prior projects' stories, however, it is not critical.
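The bucketing idea can be sketched in a few lines. This is only an illustration: the function name and the story names are hypothetical, and in practice the buckets are more likely to be index cards on a wall than a data structure. Seed stories from earlier projects are placed first in each bucket so that new stories can be held up against them.

```python
from collections import defaultdict


def bucket_stories(stories, seed=()):
    """Group story names by point value, with seed stories listed first.

    `stories` and `seed` are sequences of (name, points) pairs.
    """
    buckets = defaultdict(list)
    for name, points in list(seed) + list(stories):
        buckets[points].append(name)
    # Sort by point value so the buckets read from smallest to largest.
    return dict(sorted(buckets.items()))


# Hypothetical seed stories from a previous project, then the new stories.
seed = [("Login page (previous project)", 1),
        ("Export to CSV (previous project)", 2)]
stories = [("Password reset", 1), ("Audit report", 2), ("Search filters", 2)]

for points, names in bucket_stories(stories, seed).items():
    print(f"{points} point(s): {', '.join(names)}")
```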

To keep the story estimates consistent across project teams, triangulation with the other projects' stories helps achieve this as well. To maximize this, having members from the other teams participate in the project's estimation sessions might help normalize the story point sizes as well.

Product Management suggested this idea: we can also have a "reference backlog" of stories that contains several 1 and 2 point stories. We would make sure that all teams estimating stories in points would start with this reference backlog to help baseline the size estimates.

Ideal Days Basis

Another technique which can be applied here (in addition to triangulation) is to use an ideal day as the basis for a 1-point story. An ideal day is said to be what could be accomplished by an individual if he were totally undistracted and unobstructed. This is not necessarily a guarantee of consistency as the meaning of an "ideal day" will be subjective. A danger point here is that using an ideal day necessarily brings in a unit of time, which will focus attention (especially from managers) on how much time was estimated for something vs. how much time it actually took. One must be careful to remember that an ideal day is merely a consistency-setting device, not an attempt to estimate duration. Therefore, the output of a point-based estimation using the ideal day as the basis should still be called "points" and not "ideal days" when the estimate is published.

Expectations

Trying to normalize point size across projects and teams is necessary and so should be attempted, but it should be understood and expected that this will not result in perfectly normalized estimates. Each team will differ because of many factors, including

  • Technology
  • Domain
  • Team size
  • Team composition
  • Product owner
  • Tools
  • Working environment
  • Estimator(s)

All of these affect the team's velocity and the point-based estimation values. Even if the same team of people did the point-based estimation for all projects, the point values would still be subject to the value of the information known at the time of estimating.

The expectation should be that we try to normalize point-based estimates, especially across projects on the same product backlog, but that the values are going to be suspect.

Points and Answering the "How Long Will It Take" Question

Given a backlog of point-based estimated requirements, a rough schedule can be computed by factoring in the velocity of the team against that backlog. Team Velocity basically is a measurement (or, in the case where data is not available, a prediction) of how many points' worth of requirements the team can complete - bring to "done" - in a sprint. If a team has shown a velocity of 10 for instance (averaged over the past few sprints), the remaining backlog can be divided by 10 to give the number of remaining iterations (don't round down!), which in turn gives a projected end date. Ideally, as the team generates more empirical data, three velocity figures can be used to project best, worst and expected case. Best would take the team's highest average velocity, worst takes the team's lowest average velocity, and expected would take the team's overall average, perhaps just for the past few sprints (to allow for learning-based acceleration).
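The arithmetic can be sketched as follows. This is a minimal illustration rather than a prescribed method: the function names and the sample numbers are made up, and for simplicity the best and worst cases use the highest and lowest single-sprint velocity in the history rather than highest and lowest averages.

```python
import math


def sprints_remaining(backlog_points, velocity):
    """Sprints needed to finish the backlog, rounded up (never down)."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(backlog_points / velocity)


def project_release(backlog_points, sprint_velocities):
    """Best, expected and worst case sprint counts from past velocities.

    Simplifying assumption: best/worst use the single highest/lowest
    sprint velocity observed; expected uses the plain average.
    """
    expected = sum(sprint_velocities) / len(sprint_velocities)
    return {
        "best": sprints_remaining(backlog_points, max(sprint_velocities)),
        "expected": sprints_remaining(backlog_points, expected),
        "worst": sprints_remaining(backlog_points, min(sprint_velocities)),
    }


# Hypothetical team: 95 points left, last three sprints completed 8, 10, 12.
print(project_release(95, [8, 10, 12]))
# {'best': 8, 'expected': 10, 'worst': 12}
```

Multiplying each sprint count by the sprint length then gives a range of projected end dates rather than a single date.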

What if a team has no history to factor into determining a velocity? There are a few techniques that are suggested to arrive at a projected velocity. Obviously real data will be a much better indicator of reality, but one has to start somewhere when a date is being asked for.

Run a Sprint and Generate Actual Data

The ideal case is where there is time to give the project a sprint or two to generate a team velocity projection off of real data. Some "salt" should be taken with early velocity indications as early sprints in projects often experience higher variance (from sprint to sprint) in velocity as the team gels and learns the project.

Decompose and Estimate Several Stories

Take a few stories at various estimation levels to get a fairly good random sample and decompose them into tasks and estimate those tasks. This can then be compared against the projected available hours in a sprint and a rough guess at a velocity can be extrapolated from this. Again, the development tasks are a route to implementing the requirements, they are not the requirements, so some room should be left in that projection to account for uncertainty in that estimate. However, it gives a starting point.
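A rough sketch of that extrapolation is shown below. The function name, the sample stories and the 20% buffer are assumptions chosen for illustration; the buffer stands in for the "room" left for uncertainty and should be set by the team.

```python
def projected_velocity(samples, available_hours, buffer=0.2):
    """Guess a starting velocity (points per sprint) from decomposed stories.

    `samples` is a list of (points, estimated_task_hours) pairs for the
    stories that were decomposed. `available_hours` is the team's projected
    working hours in one sprint. `buffer` is the fraction of those hours
    held back for uncertainty (an assumed figure, not a recommendation).
    """
    total_points = sum(points for points, _ in samples)
    total_hours = sum(hours for _, hours in samples)
    hours_per_point = total_hours / total_points
    usable_hours = available_hours * (1 - buffer)
    return usable_hours / hours_per_point


# Hypothetical sample: three stories totalling 7 points and 70 task hours.
samples = [(1, 10), (2, 22), (4, 38)]
velocity = projected_velocity(samples, available_hours=200, buffer=0.2)
print(round(velocity, 1))  # 16.0
```

The hours-per-point figure is used only to bootstrap a first velocity guess; once real sprint data exists, measured velocity should replace it.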

Common Worries about Points

Points Preclude Schedule Planning

(Addressed above - any more needed here?)

Points Excuse Estimation Incompetence

"Developers want to use points to hide the fact that they are incompetent at estimating to begin with."

It would probably be more accurate to say that developers want to avoid making false and inherently unreliable commitments based on incomplete information. The problem at issue is not the developer, but the process of estimation: the traditional "developer estimation process" is fundamentally flawed in that it expects a developer to accurately estimate the time it will take to produce software that the customer wants based on a set of information which is less than complete. The developer can make an educated guess at best as an estimate. When combined with an environment where buffering of estimates is perceived as sandbagging or encouraging waste, you wind up with estimates that are already overly-optimistic (because the developer estimates only what he knows, which is less than the total knowledge required to define the end product result) which are then driven even lower by management pressure, resulting in a schedule that is fraught with unmanageable risk of slippage and/or quality problems. Maybe you've seen this?

Points are an acknowledgement that we don't know everything up front, especially for planning horizons that span months. It is also an acknowledgement that developers (by themselves) do not produce the product - they write code. The point estimation approach is a team-based estimation technique that factors in all team players responsible for delivering the software: product owner, analyst, developer, tester.

So points do not excuse estimation incompetence, they are rather evidence of the development team trying to supply an honest, more useful process for planning a project schedule. John Maynard Keynes is credited with saying "It is better to be roughly right than precisely wrong."

Points Absolve Developer Responsibility

"Developers want to estimate in points to get out from under the responsibility of committing to finishing something in a set timeframe."

This objection misses two critical nuances of point-based estimation.

  1. Developers are not solely responsible for the development of a software product. It is the developer along with the product owner, analyst, tester and any other key players in the team. Together they are jointly responsible for the project, right? Therefore, the schedule for a project should be a factor of all of these resources' ability to collaboratively produce the desired software.
  2. Point-based estimation is used for high-level project schedule estimation and tracking. At the lower levels - within the sprint for instance - we are still expecting a team to decompose a story into tasks that are estimated with time-based units (e.g. hours). The expectation is that the team members are doing their best to estimate these accurately and manage the effort of completing those tasks into the time estimated. However, again, this task breakdown and estimation must be acknowledged to be constrained by what you know at the time you do this. Therefore, you should not expect this effort to be effective until the last possible moment where it is required. Since you are using points and team velocity to track overall project scheduling and status, you don't need this information for the entire project up front. (Caveat - you might decompose some stories this way up front to facilitate a team velocity projection.) Instead, making this effort happen at the beginning of a sprint for the purpose of planning the sprint maximizes the information available for this effort, thereby minimizing the cost of this effort.

In fact, it might be worth stating that an equally audacious (and hopefully untrue) accusation might be to state that "up-front time-based project estimates from developers are required by managers who want to absolve themselves of the responsibility of managing the project - it is a way to keep the blame pointed at the developers (their estimates or their productivity) for project failures."

Points Obscure Developer Productivity

"How can I know how productive Scott is if I don't know how much work he is actually taking on/accomplishing?"

Putting aside for the moment the agile team concepts of self managing teams, points, as stated above, are not intended to (a) be applied at the individual level and (b) plan out the actual tasks that team members work on.

Points are a measurement of how big something is, and velocity is a measurement of how many points a team (not an individual on the team) can chew through in a sprint. It is meant to diminish the role of an individual by stating that it is the responsibility of the team to get that requirement completed. After all, we're developing software for the customer, which means that the developer is but one role in the implementation of that requirement.

The actual tasks that individuals do are estimated in time-units (e.g. hours) at the sprint planning meeting, and if one desires, it is here where one could look to see how "fast" an individual is going relative to those estimates. Points do not come into play here one way or the other.

I would add, however, that measurements of productivity for software engineering are notoriously misleading. This is a universally-acknowledged tough problem of basic productivity measurements of thought-workers. If I sit and think for a week, am I being productive? How do you measure that? Are you happier with me if I did that but told you ahead of time I estimated I needed 2 weeks of thinking? Now I am twice as productive because I beat my estimate but produced no physical artifact, right? Or, if you like, am I more productive because I write more lines of code? Or write more pages of documentation? What if I produce no physical artifact to prove my productivity but am such a helper to the team (mentoring, coaching, reviewing, problem solving, etc.) that if I stop doing what I am doing the team slows down more than the perceived percentage of contribution that I should have been making to the physical artifacts? My point is that the productivity of software engineers is difficult to measure, subtle and highly subjective.

Points Can Vary from Team to Team

(Addressed above - any more needed here?)

Point Estimates Can Vary From Programmer to Analyst to Tester

"How do you assign an estimate to a requirement when a developer says it is a 4, the tester says it is a 1 and the analyst thinks it is a 2?"

This is a tricky one because it betrays a subtle misunderstanding of points. As soon as different roles disagree as described above on a point estimate for a story, they are most likely falling into the trap of asking themselves "how much time will this take me to do?"

Points are measurements of size, not duration or even effort. Let's use an analogy to clarify this. Let's say that three of us - a developer, a tester and an analyst - want to go to Starbucks for coffee. We are told that the location from here is to go north one block and west one block. I ask, "how far is that?" This is a question of estimating size. The three of us should not disagree (fundamentally) on the answer to this question assuming we are estimating with the same unit of measure. (As a side point, the greater the precision of your estimation units, the more (a) disagreement and (b) inaccuracy you will incur. As an example of this, if your UoM was blocks, probably we'd all agree and be right in stating it is 2 blocks' walk from here. If we tried to estimate that in inches, our figures would be wildly off and disagreement would abound.)

Point-based estimation is analogous to the "how far is it to Starbucks" example above. We're trying to answer the size question. We should all agree on the answer to this within reasonable limits.

Now, to extend that analogy, the next question might be "how long will it take to get there?" Here we would expect that there could be quite a bit of divergence in the opinions of how long it would take each individual. The developer is 110 years old and uses a walker with high-friction tennis balls jammed on the legs, and that limits his maximum speed to 1 mile per hour. The tester drinks too much caffeine and runs everywhere, and has a minimum speed of 8 mph. The analyst is just smart and will drive.

Is this to say that point-based estimates will not vary from person to person? Of course they can vary. We all have some bias in how we approach the problem, and each of us may bring information to the assessment of a requirement that is not commonly shared. However, by arriving at a point estimate for the requirement collaboratively and discussing the reasons for the variance from person to person, we can mitigate this and usually get the variance to within an acceptable range. The team can then decide whether to take the highest, lowest or average value as the final estimate for that requirement.
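
As a rough illustration of that last step, the sketch below takes one round of estimates and applies a policy chosen by the team. The function name `resolve_estimate`, the `max_spread` threshold and the choice to snap an average up to the next scale value are all assumptions made for this example, not an established rule.

```python
from statistics import mean

SCALE = [0, 1, 2, 4, 8]  # assumed exponential team scale


def resolve_estimate(votes, policy="highest", max_spread=1):
    """Return a final point value, or None if the team should keep talking.

    votes: mapping of role -> point estimate taken from SCALE.
    max_spread: largest allowed gap, counted in scale positions, between
    the lowest and highest vote (a hypothetical threshold).
    """
    positions = sorted(SCALE.index(v) for v in votes.values())
    if positions[-1] - positions[0] > max_spread:
        return None  # variance too wide: discuss the reasons and vote again
    values = list(votes.values())
    if policy == "highest":
        return max(values)
    if policy == "lowest":
        return min(values)
    if policy == "average":
        avg = mean(values)
        # Snap the average up to the next value on the scale.
        return next(s for s in SCALE if s >= avg)
    raise ValueError(f"unknown policy: {policy}")
```

With the developer at 4, the tester at 1 and the analyst at 2, the votes span more than one step on the scale, so the function returns `None` and the team talks it through before voting again.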

Tips and Techniques - How to Estimate in Points

Have an Appropriate Point Scale

Here are some pointers/best practices on coming up with a point scale.

  • Use a Fibonacci or exponential scale, for instance: 1, 2, 3, 5, 8 or 1, 2, 4, 8. While some like exponential due to the consistent relativity of the values, others have indicated that the Fibonacci scale has the advantage of providing some extra values for use in sizing stories (e.g. sometimes it's not really an 8 but it is larger than 4 - having 5 is a nice option sometimes). Use what the team feels comfortable using.
    • Don't use a linear scale, as this has the problem that people naturally (a) get less accurate the larger something is (see next bullet point) and (b) want to associate time with a linear scale.
    • People are better at choosing an estimation value when the difference between neighboring values is high. The difference between a 2 and 4 is 100% and relatively easy to determine. The difference between a 6 and 7 is 17% - much harder to qualitatively judge. Is a 17% difference really something we can argue over intelligently without spending a lot of time on details which might change anyway? Can we really tell quickly which is right?
    • No in-between estimate values (it is a false sense of precision). If your scale is exponential, for instance, sizing a story at 3 points is not allowed. Instead think of the points as buckets into which you are pouring sand. You can get a little teeny bit over the top but if you can't fit the sand in the bucket, you need the next larger one. The reasoning behind this is already stated above, but in addition to that: the larger something is, the MORE uncertainty there will be in it. Increasing the distance between subsequent estimation values helps to protect against this.
  • Keep the range within an order of magnitude (e.g. 1-10) - we are best at relative comparison in that range.
    • Stories that would be estimated over the top-end of the point scale range might still be estimated (by extending the range) but these are no longer stories, these are epics, and should not fall into sprints at that level of estimation. When planning such a thing for a sprint, or ideally when this epic will fall into an up-coming sprint, try to split the epic into stories.
  • Keep a 0 (zero) point on the scale as an option.
    • Some things really are so small (e.g. correcting a spelling mistake in an existing label?) that you don't want to "burn a point" for it - it would skew the team's velocity and cause possible over-planning of the up-coming sprints.
    • If you have many 0-point stories that fall in a sprint, consider aggregating them or accounting for them with a placeholder 1-point story.
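
The arithmetic behind these pointers can be checked with a small sketch. The names `neighbor_gaps` and `bucket_for` are made up for this example, and the "raw size" passed to `bucket_for` is only a hypothetical gut-feel number. The sketch also ignores the "little teeny bit over the top" tolerance described above.

```python
FIBONACCI = [1, 2, 3, 5, 8]
EXPONENTIAL = [1, 2, 4, 8]
LINEAR = [1, 2, 3, 4, 5, 6, 7, 8]


def neighbor_gaps(scale):
    """Percentage increase from each value to the next one on the scale."""
    return [round(100 * (b - a) / a) for a, b in zip(scale, scale[1:])]


def bucket_for(size, scale):
    """Smallest scale value that can hold `size` (no in-between estimates)."""
    for value in scale:
        if size <= value:
            return value
    return None  # over the top of the scale: treat as an epic and split it
```

With these definitions, `neighbor_gaps(EXPONENTIAL)` gives `[100, 100, 100]` and `neighbor_gaps(FIBONACCI)` gives `[100, 50, 67, 60]`, while the gaps at the top of the linear scale shrink to 17% and 14%. `bucket_for(3, EXPONENTIAL)` returns 4, the next larger bucket.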

Starting Out: Choosing a Baseline

So you sit down and start going through the requirements/stories and want to start assigning points to a story. How do you start? For that matter, where do you start? How do you determine a baseline so that you can judge all other stories against that baseline? Or put another way, how do you go about choosing a value for your first estimated story?

Associated with this, how do you ensure that your choice of a "1 point" or "2 point" story is close to that of another team's estimations?

There are several methods to do this. I'll list them in order of preference.

  • In a department where there are multiple project teams and you want to normalize story point estimation across those teams and across time, you can use the Triangulation technique. This is described later in this article. In this exercise, you will already have a set of reference stories sized at 1 and 2 points to compare your new stories to.
  • However, you can also start without a triangulation source by choosing your own baseline. The following are several ways to do this:
    • Find the simplest stories out of your backlog and start comparing other stories to them. Use "bucketing" (see below) as you begin to estimate in order to group similarly-sized stories. To start, do not assign point values to anything - just group your stories in similar buckets. Your bucket sizes should be relative to the other ones by a factor of 2, meaning that stories in bucket B are 2x the size of stories in bucket A, and the ones in bucket C are 2x the size of those in bucket B, etc. Once you are done, you can begin to assign 1, 2, 4, 8 etc. points to the stories in the relevant buckets.
    • Look at baselining a 2-point estimate as "1 ideal developer week". This is what you could get done in 1 week where you were undistracted and unimpeded. This association with a time-based UOM is only temporary - only for baselining the 2-point story estimates. Once you get a few stories estimated, call them points, not ideal days. Don't let these points be tagged/associated with time.
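
The first of these approaches (group first, assign numbers last) can be sketched as follows. The function name `assign_points` and the sample story names are hypothetical. The only rule taken from the text is that each bucket is about 2x the size of the one before it.

```python
def assign_points(buckets):
    """Map each story to 1, 2, 4, 8, ... points according to its bucket.

    buckets: list of lists of story names, ordered smallest to largest,
    where each bucket is roughly 2x the size of the previous one.
    """
    return {
        story: 2 ** level
        for level, bucket in enumerate(buckets)
        for story in bucket
    }


# Hypothetical backlog, already grouped by relative size.
grouped = [
    ["fix label text"],
    ["add date filter", "export to CSV"],
    ["new summary report"],
]
points = assign_points(grouped)
```

Here `points["fix label text"]` is 1, both stories in the middle bucket get 2, and `points["new summary report"]` is 4. No time unit appears anywhere in the process.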

Some best practices/lessons learned from the industry that apply to this process:

  • Don’t weigh a story on the specification length! Discuss the story and explore its complexity and size collaboratively rather than relying on, or being distracted by, the amount or lack of documented facts about the story.
  • Try to make sure your average story size (not 1-point story size!) is something that can be completed in the range of 5-8 (?) ideal man days. While this is using a time-based unit of measure to try and size a story, the point here is that stories should be generally small to increase the accuracy of the point-based estimates and allow for mobility of the story in terms of prioritization in the backlog and within the sprint.

Triangulation: Bucketing Estimated Stories

When estimating, you are going to measure one story against another. This is a technique called "estimation by analogy." Which is bigger than the other? One thing you can do to assist in this is to try and "bucket" together stories of the same basic size, where the buckets get bigger by 100% for each level. For instance, you would compare stories and try to determine whether one is about the same size, 2x the size or 1/2 the size of another (or beyond that range), and in so doing divide up your stories into buckets whose sizes from smallest to largest form an exponential progression. Given this approach, you are triangulating the size estimate of each story by comparing it to several others, "purifying" the buckets as you go and gaining more information by which to move stories out of one bucket into another.

If you have a collection of already-estimated stories from either a prior project or another team's project, you can also seed your buckets with these to help you divide up your stories faster.

A key here is that you are not going for precision - so don't spend a lot of time arguing over whether this belongs in one bucket vs. the other. If something is larger than the average in bucket A, put it in the next larger bucket. Don't try and divide up buckets into more precise groupings - try and stay as close to the exponential scale as possible. One of the reasons for estimating in points is that it can be done quickly with minimal negative impact to accuracy (your precision is already very coarse-grained, therefore your "inaccuracy" is smoothed out).
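
A minimal sketch of triangulating one new story against several already-bucketed stories is shown below. The `triangulate` function, the three allowed judgments and the tie-break toward the larger bucket are assumptions made for this example. The tie-break is only loosely modeled on the "next larger bucket" advice above.

```python
from collections import Counter

# Relative judgments allowed when comparing a new story to a placed one.
SHIFT = {"half": -1, "same": 0, "double": 1}


def triangulate(levels, comparisons):
    """Pick a bucket level for a new story from several comparisons.

    levels: story -> bucket level of already-placed stories (0 = smallest).
    comparisons: list of (reference_story, judgment) pairs for the new story.
    Returns the level most comparisons point to; ties go to the larger bucket.
    """
    implied = [levels[ref] + SHIFT[judgment] for ref, judgment in comparisons]
    counts = Counter(implied)
    best = max(counts.values())
    return max(level for level, n in counts.items() if n == best)


# Hypothetical seed stories from a prior project.
seeded = {"login page": 0, "search box": 1, "monthly report": 2}
level = triangulate(
    seeded,
    [("login page", "double"), ("search box", "same"), ("monthly report", "half")],
)
```

All three comparisons point to level 1, so the new story joins "search box" in the second bucket. When comparisons disagree, that is a cue for a short discussion rather than for finer-grained buckets.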

Collaborative Estimation: Planning Poker as a Process Device

Planning poker is a method (as well as a web site)[1] to facilitate collaborative estimation. The technique can be applied without using the web site (in fact the web site acknowledges that it is useful to facilitate geographically-distributed teams).

Collaboration in Moderation

How much discussion about a requirement is really needed to do point-based estimation?

Keep in mind that the goal of a point-based estimate is a quick sizing of the requirement, not a detailed, in-depth, fully analyzed, "we know exactly what we're gonna do" kind of effort. The goal of point-based estimation is to size a backlog quickly and with minimal cost.

From Planning Poker in Detail:

Some amount of preliminary design discussion is necessary and appropriate when estimating. However, spending too much time on design discussions is often wasted effort. Here’s an effective way to encourage some amount of discussion but make sure that it doesn’t go on too long.

Buy a two-minute sand timer, and place it in the middle of the table where planning poker is being played. Anyone in the meeting can turn the timer over at any time. When the sand runs out (in two minutes), the next round of cards is played. If agreement isn’t reached, the discussion can continue. But someone can immediately turn the timer over, again limiting the discussion to two minutes. The timer rarely needs to be turned over more than twice. Over time this helps teams learn to estimate more rapidly.
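
The round structure described in the quote can be sketched as a simple loop. This is only an illustration: the function name `play_planning_poker` is made up, the card values are invented, and the two-minute discussions between rounds are not modeled.

```python
def play_planning_poker(rounds):
    """Replay a sequence of card rounds until everyone shows the same card.

    rounds: list of dicts (player -> card), one per round, in the order played.
    Returns (agreed_value, rounds_used); agreed_value is None if the rounds
    ran out before the team agreed.
    """
    for number, cards in enumerate(rounds, start=1):
        played = set(cards.values())
        if len(played) == 1:
            return played.pop(), number
    return None, len(rounds)


# Hypothetical session: a timed discussion happens between rounds.
session = [
    {"developer": 8, "tester": 2, "analyst": 3},
    {"developer": 5, "tester": 3, "analyst": 5},
    {"developer": 5, "tester": 5, "analyst": 5},
]
result = play_planning_poker(session)
```

In this made-up session the team converges on 5 points in the third round, so `result` is `(5, 3)`.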

Accept Ambiguity

You don't need to know everything to estimate in points. That's the point (or one of them).

Also, you won't know everything. That's the fallacy of a waterfall planning approach. Making solid plans based on incomplete (and in many cases incorrect) knowledge is foolish.

Point-based estimation says: plan based on what you know. It'll change later. So spend the least amount of time necessary to get a decent point estimate and move on. Better to be roughly right than mostly wrong.

Timebox Estimation: Maximizing Estimation ROI

Ideally, the more time you spend on an estimate effort, the more accurate that estimate would be. However, evidence from our industry as well as directly from our own experience shows this to be untrue - even when an effort to maximize the available information is made for a complete up-front, detailed estimate.

[Figure: Estimation accuracy vs. time]

The law of diminishing returns applies to the time spent on an estimate. Some "experts" will even say that estimation accuracy begins to degrade after too much time is spent (the slope of the curve in the above graph would begin to head south).

Where to draw the line at the point of diminishing returns is difficult to say - I've seen many graphs like the one above and none have numbers on the axes. However, the graph is useful for illustrating the point: spending a little time on estimates yields about as much as spending a LOT of time on estimates. Therefore, maximize your ROI by controlling - or time-boxing - the time spent on the estimation effort.

This becomes especially true when doing point-based estimates in a team/group estimation effort because:

  • You are already acknowledging the presence and inevitability of ambiguity
  • Striving for precision with point-based estimates reflects an incorrect understanding of point-based estimation and "ruins" the result
  • You are spending a lot of people's time - so the cost of the estimate is multiplied accordingly

One way to practically address this is to use a timer during estimation sessions. Set the timer for some value agreed-upon by the group for each requirement estimation cycle, say 4 minutes. Start the timer at the beginning of discussing the requirement and when the timer is up, everyone must estimate.
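
To see how the cost multiplies, here is a back-of-the-envelope sketch. The function name `estimation_cost_hours` and the backlog and team sizes are assumptions chosen for illustration. Only the 4-minute timebox comes from the text above.

```python
def estimation_cost_hours(stories, minutes_per_story, people):
    """Total person-hours spent when every story gets the same timebox."""
    return stories * minutes_per_story * people / 60


# Hypothetical backlog of 30 stories estimated by a team of 6.
timeboxed = estimation_cost_hours(30, 4, 6)
open_ended = estimation_cost_hours(30, 15, 6)
```

With a 4-minute timebox the session costs 12 person-hours. Letting each discussion run to 15 minutes costs 45 person-hours for the same backlog.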

One point of resistance that teams will undoubtedly experience is the fear of ambiguity and the fear of being "wrong". It'll happen, especially for folks just getting used to point-based estimation. However, learn and adapt, and keep in mind that the lack of precision for point-based estimation covers over a lot of the problems related to ambiguity.

So when discussing the requirement, ask questions, clarify, dig, get what you can get from what is known about the requirement, estimate points, and move on.

References

[1] Mike Cohn, "Agile Estimating and Planning", Prentice Hall, 2005.
[2] Mike Cohn, "User Stories Applied: For Agile Software Development", Addison-Wesley, 2004.

