Most firms we work with set a sample size from a PY template and never recompute it, even when the population, the risk assessment, or the materiality figure has moved. We call it PIOOMA. Pulled It Out Of My A**. The sample size looks calculated, but in reality the senior just rolled last year’s number forward because nobody wanted to redo the selection halfway through fieldwork. That gap between using a sampling tool and being able to defend what drives the numbers is where most ISA 530 deficiencies start. The AFM and FRC have both flagged insufficient documentation of the rationale behind sample sizes, not the sample sizes themselves.
ISA 530 defines audit sampling as applying audit procedures to less than 100% of the items in a population, such that every item has a chance of selection. In practice, designing the plan means defining the population and sampling unit, choosing between monetary unit and attribute sampling, setting tolerable misstatement from your ISA 320 performance materiality, estimating expected misstatement from PY results, calculating sample size, and evaluating results by projecting found errors to the population.
Key takeaways
- How to choose between monetary unit sampling (MUS) and attribute sampling based on what assertion you’re testing, with the decision criteria from ISA 530.A10 and .A11
- How to determine sample size using the four inputs that drive every sample: population, tolerable misstatement, expected misstatement, and the acceptable risk of incorrect acceptance
- How to evaluate results properly using all comparisons ISA 530.A22 requires (not just the pass/fail threshold most teams stop at)
- How to document the sampling plan so it survives both internal quality review and external inspection
What ISA 530 actually requires you to document
ISA 530.6 defines audit sampling as applying audit procedures to less than 100% of items within a population, where all items have a chance of selection. This matters because selecting specific high-value items or items with particular characteristics is not sampling under ISA 530. That’s targeted testing under ISA 500. Mixing the two in a single WP without distinguishing them is a common documentation failure.
ISA 530.7 requires you to determine a sample size sufficient to reduce sampling risk to an acceptably low level. The standard does not prescribe a specific method for calculating that size. It requires that the auditor consider the purpose of the procedure and the characteristics of the population (ISA 530.A5). In practice, this means the WP must show your reasoning from the purpose of the test through to the number you landed on. A sample size without a recorded rationale does not comply.
ISA 530.8 requires the auditor to select items such that each sampling unit has a chance of selection. This rules out haphazard selection for statistical sampling (ISA 530.A13 explicitly notes that haphazard selection is not appropriate when using statistical sampling techniques). For non-statistical sampling, haphazard selection is permitted but carries documentation risk: a reviewer can’t verify that each item genuinely had a chance of selection unless you describe the method used.
The documentation requirements sit in ISA 530.14 and ISA 230.8. You must record the sampling approach, the selection method, the sample size and how it was determined, the procedures performed on each item, and the results evaluation. None of these can be implicit. If the WP says “sample of 25 selected” without explaining why 25, the documentation is insufficient.
In most files we review, the sample size has no documented link to the risk assessment. Once fieldwork starts, nobody wants to redo the selection, so the number stays put, and next year’s senior rolls it forward again.
Choosing between MUS and attribute sampling
The choice between monetary unit sampling and attribute (or classical) sampling is not a preference. It depends on what you’re testing and what type of misstatement you expect to find.
Monetary unit sampling (MUS) works by treating each monetary unit in the population (each euro) as a separate sampling unit. A €100,000 invoice has 100,000 chances of selection; a €500 invoice has 500 chances. MUS is the standard approach for tests of details where the objective is to detect monetary overstatement. ISA 530.A10 notes that MUS gives higher-value items a greater probability of selection, which aligns naturally with the audit objective of detecting material misstatement. The practical effect: MUS is the default for balance sheet testing (receivables, inventory, payables, accrued income) on most mid-tier engagements.
MUS has two practical limitations. First, it is designed to detect overstatement. If the population could contain understatements (testing for completeness rather than existence), MUS applied to the recorded balance will systematically undersample. Second, MUS does not handle populations with many small errors well. If you expect a high rate of small misstatements spread across many items, MUS can produce misleading projections because it assumes errors are isolated.
Attribute sampling treats each item in the population as a single sampling unit regardless of value. It is the standard approach for tests of controls, where you’re testing a rate (the deviation rate) rather than a monetary amount. If you need to determine whether a control operated effectively across the population (e.g., purchase orders approved before goods receipt), attribute sampling gives each transaction an equal chance of selection, which is what you need.
For tests of details where understatement risk is the concern (testing completeness of liabilities, for example), attribute sampling on a population defined from an alternative source (subsequent payments or supplier statements) is more appropriate than MUS on the recorded balance. ISA 530.A11 acknowledges that the sampling approach should be designed to meet the specific audit objective.
The decision is not complicated. MUS fits overstatement testing on monetary balances. Attribute sampling fits control testing and rate-of-deviation questions. For understatement risk or populations where errors are expected to be numerous and small, classical variable sampling is more appropriate.
The four inputs that determine sample size
Every sample size calculation, whether you use a statistical formula, a firm methodology table, or ciferi’s MUS sampling calculator, reduces to four inputs. Understanding what each one does is the difference between using the tool and defending the result.
1. Population value. For MUS, this is the recorded monetary value of the population. For attribute sampling, this is the number of items. The population must be defined precisely: ISA 530.A5 requires that the population is appropriate for the objective. Testing revenue existence from the revenue ledger is appropriate. Testing revenue completeness from the revenue ledger is not, because items missing from the ledger can’t be sampled from it.
2. Tolerable misstatement (for MUS) or tolerable deviation rate (for attribute sampling). Tolerable misstatement comes directly from your PM calculation under ISA 320. For most substantive tests, tolerable misstatement equals PM for the relevant account. ISA 530.A3 states that tolerable misstatement is the application of PM to a particular sampling procedure. The lower you set tolerable misstatement, the larger the sample. A reviewer who sees a tolerable misstatement figure that doesn’t trace back to the ISA 320 WP will flag it.
3. Expected misstatement. This is your best estimate of the actual misstatement in the population before you test it. ISA 530.A4 notes that higher expected misstatement requires a larger sample. Where does the estimate come from? PY audit results (did last year’s sample find errors?), interim testing results, analytical procedures, and understanding of the client’s control environment. If you set expected misstatement at zero and then find errors, your sample may have been too small to draw a valid conclusion, even if projected misstatement is below tolerable misstatement. The sample was designed for a population with no errors. The population had errors.
4. Acceptable risk of incorrect acceptance (for substantive tests) or risk of overreliance (for control tests). This is the complement of the confidence level: a 5% risk of incorrect acceptance means 95% confidence. ISA 530.7 requires this to be “acceptably low” without specifying a number, but sampling tables and formulas require a specific value. Most firm methodologies set this at 5% for high-risk areas and 10% for lower-risk areas. The risk of incorrect acceptance moves in step with the detection risk allowed under ISA 330: the lower the detection risk you can accept (because inherent risk and control risk are high), the lower the acceptable risk of incorrect acceptance must be, and the larger the sample.
These four inputs interact. You can’t change one without affecting the sample size. Documenting all four in the sampling WP (with cross-references to the ISA 320 materiality memo and the PY results) is what separates a defensible sample from an arbitrary number.
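The interaction can be sketched with one common MUS sizing formula. The reliability and expansion factors below are illustrative textbook values, not your firm’s table, and different methodology tables will produce different sample sizes:

```python
import math

# Illustrative Poisson reliability factors (zero expected errors) and
# expansion factors per risk of incorrect acceptance. These are common
# textbook values; firm methodology tables are authoritative.
RELIABILITY = {0.05: 3.00, 0.10: 2.31}
EXPANSION = {0.05: 1.60, 0.10: 1.50}

def mus_sample_size(population_value, tolerable, expected, risk=0.05):
    """One common formula:
    n = population x reliability / (tolerable - expected x expansion).
    Lower tolerable or higher expected misstatement -> larger sample;
    accepting more risk -> smaller sample."""
    cushion = tolerable - expected * EXPANSION[risk]
    if cushion <= 0:
        raise ValueError("expected misstatement too close to tolerable; "
                         "sampling is not efficient here")
    return math.ceil(population_value * RELIABILITY[risk] / cushion)

print(mus_sample_size(5_800_000, 170_000, 0))             # no error allowance
print(mus_sample_size(5_800_000, 170_000, 20_000))        # with an allowance
print(mus_sample_size(5_800_000, 170_000, 0, risk=0.10))  # lower confidence
```

Running the three calls shows the interactions in the text: adding an expected misstatement allowance or demanding more confidence pushes the sample size up.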
Selecting items from the population
ISA 530.8 permits four selection methods. Each has a specific use case.
Random selection picks items using random number generation, so every sampling unit has a known, calculable chance of selection. It is required for statistical sampling. The random seed or number table should be documented.
Systematic selection picks every *n*th item from an ordered population, with a random starting point. ISA 530.A12 notes that systematic selection is appropriate provided the population is not structured in a pattern that coincides with the selection interval. Ordering invoices by date and selecting every 15th one works. Ordering by department code where department sizes vary might not.
Haphazard selection picks items without any structured technique, attempting to avoid bias. ISA 530.A13 states it is not appropriate for statistical sampling. Even for non-statistical sampling, it carries risk: a reviewer cannot independently verify that every item had a chance of selection. If you use it, document explicitly how you avoided conscious bias (e.g., “items selected by scrolling through the population listing with eyes averted from the value column”).
Block selection picks a contiguous block of items (e.g., all transactions in March). ISA 530.A14 notes that block selection is rarely appropriate for audit sampling because a single block cannot represent the full population. Use it only as a supplement to other methods, never as the sole selection approach.
For MUS, the standard selection technique is systematic monetary unit selection with a random starting point. Calculate the sampling interval (population value divided by sample size), generate a random start between 1 and the interval, then select items at each multiple of the interval. Each item containing the *n*th monetary unit is selected. This automatically gives higher-value items proportionally greater selection probability.
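A minimal sketch of systematic monetary unit selection, assuming the population is a simple list of (invoice, value) pairs (the invoice data below is made up for illustration):

```python
import random

def mus_select(items, interval, seed=None):
    """Systematic monetary unit selection: choose a random starting euro
    within the first interval, then select the item containing every
    interval-th euro after it. Returns the random start (document it so
    the selection is reproducible) and the selected item ids."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)
    selected = []
    cumulative = 0
    hit = start
    for item_id, value in items:
        cumulative += value
        while hit <= cumulative:              # this item contains a hit point
            if not selected or selected[-1] != item_id:
                selected.append(item_id)      # count multi-hit items once
            hit += interval
    return start, selected

# Hypothetical invoices; INV-002 exceeds the interval, so it is certain
# to be selected. Larger items get proportionally more chances.
invoices = [("INV-001", 28_000), ("INV-002", 145_000), ("INV-003", 9_500),
            ("INV-004", 61_000), ("INV-005", 3_200)]
start, picks = mus_select(invoices, interval=100_000, seed=1)
print(start, picks)
```

Whatever the random start, any item larger than the interval contains at least one hit point, which is why such items are usually pulled out and tested 100% instead.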
Worked example: sampling trade receivables at Pieters Bouw B.V.
Client scenario: Pieters Bouw B.V. is a Dutch construction company based in Eindhoven with €34M in revenue. Trade receivables at year-end total €5.8M across 412 outstanding invoices. The engagement team is performing a test of details for the existence and valuation of receivables. PY testing found two misstatements totalling €18,000 in projected error. Inherent risk for receivables is assessed as significant. PM is set at €170,000.
1. Define the population and sampling unit.
The population is the trade receivables ledger at 31 December: €5.8M across 412 invoices. The sampling unit for MUS is one euro. Items above the sampling interval (calculated in step 4) will be selected individually with certainty.
Documentation note: record the population source (trade receivables subledger at 31 December, extracted from the client’s SAP system on [date]), total value (€5,800,000), item count (412), and confirm the population was reconciled to the general ledger balance.
2. Set tolerable misstatement.
Tolerable misstatement equals PM: €170,000. This traces to the ISA 320 materiality WP (WP ref: A3.1).
Documentation note: cross-reference the ISA 320 memo. Record tolerable misstatement as €170,000 and state the basis (PM for the receivables balance).
3. Determine expected misstatement.
PY testing projected €18,000 in total misstatement. No interim exceptions were identified. The client’s credit control processes have not changed. Expected misstatement: €20,000 (slightly above PY as a conservative estimate).
Documentation note: state the expected misstatement (€20,000), the basis for the estimate (PY projected error of €18,000 adjusted upward for conservatism), and note that no new risk factors were identified.
4. Calculate sample size and sampling interval.
Using MUS at a 95% confidence level (a 5% risk of incorrect acceptance, appropriate given the significant inherent risk assessment), the reliability factor for zero expected errors is 3.0. After adjusting for the expected misstatement allowance, the firm’s sizing calculation produces approximately 58 items.
Sampling interval: €5,800,000 / 58 = €100,000.
Any individual receivable above €100,000 will be selected with certainty (individually significant items). Of the 412 invoices, 8 exceed €100,000 (totalling €1,240,000). The remaining population of €4,560,000 is sampled using systematic MUS with a random start.
Documentation note: record the confidence level (95%), reliability factor (3.0), expected error allowance, calculated sample size (58), and sampling interval (€100,000). List the 8 individually significant items separately. Document the random starting point for systematic selection.
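The step 4 arithmetic and the top-stratum split can be sketched as follows (the individual invoice values are hypothetical; only the totals come from the example):

```python
def split_population(items, interval):
    """Items at or above the sampling interval are individually significant:
    under MUS they would be certainties anyway, so test them 100% and
    remove them from the population to be sampled."""
    significant = [(i, v) for i, v in items if v >= interval]
    remainder = [(i, v) for i, v in items if v < interval]
    return significant, remainder

# Step 4 arithmetic from the example.
population_value, sample_size = 5_800_000, 58
interval = population_value // sample_size            # 100,000

# Hypothetical invoices to show the split; in the example, 8 invoices
# above the interval total 1,240,000, leaving 4,560,000 to sample.
big, small = split_population([("INV-002", 145_000), ("INV-001", 28_000)], interval)
remaining_value = population_value - 1_240_000
print(interval, remaining_value, big)
```

Recording the split in the WP this way (interval, list of significant items, remaining population value) is what lets a reviewer trace sampled versus individually tested items.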
5. Perform the procedures and evaluate results.
The team sends confirmations for 58 items plus the 8 individually significant items (66 total procedures). Two exceptions are identified: one confirmation reply shows a €4,200 overstatement on a €28,000 invoice (a pricing error), and one individually significant item of €145,000 is confirmed at €139,000 (a €6,000 timing difference that represents a genuine misstatement).
Evaluating results: the comparison most files miss
ISA 530.A22 requires you to make comparisons when evaluating MUS results. Most WPs make one: projected misstatement versus tolerable misstatement. If the projected error is below tolerable misstatement, the sample passes. File gets signed.
But ISA 530.A22 identifies additional considerations. You should compare projected misstatement to the expected misstatement you used when you sized the sample. In the Pieters Bouw example, expected misstatement was €20,000. If projected misstatement comes back at €48,000 (still below the €170,000 tolerable misstatement), the sample “passes” on the tolerable misstatement test. But projected errors are more than double what you expected. The sample was sized for a €20,000-error population. It encountered a €48,000-error population. That discrepancy doesn’t invalidate the conclusion automatically, but it does require you to consider whether the sample remains sufficient.
ISA 530.12 addresses this directly: if the auditor concludes that audit sampling has not provided a reasonable basis for conclusions, the auditor shall request management to investigate identified misstatements and the potential for further misstatements, and make any necessary adjustments. Alternatively, the auditor shall modify the nature, timing, or extent of further audit procedures.
The practical implication: your evaluation section needs two documented comparisons, not one. First, projected misstatement versus tolerable misstatement (the pass/fail line). Second, projected misstatement versus expected misstatement (the “was my sample designed for this population?” check). If the second comparison shows a material exceedance, document your conclusion on whether additional procedures are needed, even if the first comparison passes.
A third consideration under ISA 530.A22: the qualitative nature of the misstatements found. A single pricing error and a timing difference have different implications for the audit conclusion than two instances of management override. Even if the monetary amounts are identical, the qualitative assessment may require different responses.
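Applying one common MUS projection approach to the exceptions found in step 5 (taintings scaled to the sampling interval, with errors in 100%-tested items taken at face value; check your firm’s methodology before relying on this exact mechanic):

```python
def project_mus_misstatement(sample_errors, significant_errors, interval):
    """Project sampled errors by tainting: (error / book value) scaled to
    the full sampling interval. Errors found in individually significant
    items are added at face value because those items were tested 100%."""
    projected = sum(err * interval / book for book, err in sample_errors)
    return projected + sum(significant_errors)

projected = project_mus_misstatement(
    sample_errors=[(28_000, 4_200)],    # 15% tainting x 100,000 = 15,000
    significant_errors=[6_000],         # the confirmed timing difference
    interval=100_000,
)
tolerable, expected = 170_000, 20_000
print(projected)                        # 21000.0
print(projected <= tolerable)           # comparison 1: passes
print(projected > expected)             # comparison 2: exceeds the estimate
```

The projected €21,000 clears the €170,000 pass/fail line but exceeds the €20,000 expected misstatement used to size the sample, which is exactly the second comparison this section describes: the sample passed, yet its design assumption did not hold.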
Practical checklist for your next sampling plan
- Before calculating sample size, verify the population is appropriate for the audit objective (ISA 530.A5). Existence testing uses the recorded ledger as the population. Completeness testing uses an alternative source. If your population doesn’t match your objective, the sample is invalid regardless of size.
- Document all four inputs in the sampling WP: population value, tolerable misstatement (cross-referenced to the ISA 320 memo), expected misstatement (with source), and the acceptable risk of incorrect acceptance. Missing any one of these is a documentation deficiency.
- Separate individually significant items (those exceeding the sampling interval) from the sampling population. Test them 100%. Record the separation in the WP so a reviewer can trace how many items were sampled versus individually tested.
- After evaluating results, perform both comparisons: projected misstatement versus tolerable misstatement, and projected misstatement versus expected misstatement. If projected errors significantly exceed expected errors, document your assessment of whether the sample remains sufficient (ISA 530.12).
- Record the selection method used and retain the random starting point or random number sequence. For systematic MUS selection, the random start and interval must be documented so the selection is reproducible by a reviewer.
- If any misstatement is identified, document its qualitative nature alongside the monetary projection. Two errors of €5,000 do not carry the same audit significance if one is a timing difference and the other is a management adjustment made after the confirmation was sent.
Common mistakes
- The FRC’s inspection reports consistently flag sampling plans where the expected misstatement is set at zero without justification. If PY testing identified errors, setting expected misstatement at zero without explaining why is indefensible. Either reduce expected misstatement with a documented reason (e.g., the client remediated the control weakness that caused last year’s errors) or carry forward a non-zero estimate.
- Firms sometimes use haphazard selection for MUS procedures. ISA 530.A13 is explicit: haphazard selection is not appropriate for statistical sampling. If the WP describes the sampling approach as MUS (a statistical method) but the selection method as “items selected by scrolling through the listing,” the two are inconsistent. A reviewer will flag this as a methodology error.
Related content
- Glossary: monetary unit sampling. Covers the definition, the relationship between MUS and probability-proportional-to-size selection, and when MUS is appropriate versus classical variable sampling.
- MUS Sampling Calculator. Use this to calculate sample sizes for MUS testing, with inputs for population value, tolerable misstatement, expected misstatement, and confidence level.
- How to calculate and document materiality under ISA 320. The tolerable misstatement input in every sampling plan traces directly to your ISA 320 performance materiality calculation. Start there if you haven’t set materiality yet.
Frequently asked questions
When should I use MUS instead of attribute sampling?
Use monetary unit sampling (MUS) for tests of details where the objective is to detect monetary overstatement on balance sheet items such as receivables, inventory, and payables. MUS gives higher-value items a greater probability of selection, which aligns with the audit objective of detecting material misstatement. Use attribute sampling for tests of controls where you are testing a deviation rate, and for tests of details where understatement is the primary risk.
What are the four inputs that determine audit sample size?
Every sample size calculation reduces to four inputs: (1) population value or item count, (2) tolerable misstatement (from ISA 320 performance materiality) or tolerable deviation rate, (3) expected misstatement based on prior-year results or risk assessment, and (4) the acceptable risk of incorrect acceptance (the complement of the confidence level). All four must be documented in the sampling working paper with cross-references.
What comparisons does ISA 530 require when evaluating sampling results?
ISA 530.A22 requires comparing projected misstatement to tolerable misstatement (the pass/fail line) and also comparing projected misstatement to the expected misstatement used when sizing the sample. If projected errors significantly exceed expected errors, the auditor must consider whether the sample remains sufficient, even if projected misstatement is below tolerable misstatement. A third consideration is the qualitative nature of the misstatements found.
Is haphazard selection appropriate for MUS sampling?
No. ISA 530.A13 explicitly states that haphazard selection is not appropriate for statistical sampling, and MUS is a statistical method. If the working paper describes the sampling approach as MUS but the selection method as haphazard, the two are inconsistent. For MUS, use systematic monetary unit selection with a random starting point, or random selection.
What should I do if expected misstatement was set at zero but errors are found?
If you set expected misstatement at zero and then find errors, your sample may have been too small to draw a valid conclusion, even if projected misstatement is below tolerable misstatement. The sample was designed for a population with no errors, but the population had errors. Under ISA 530.12, consider whether additional procedures are needed and document your assessment.
Further reading and source references
- IAASB Handbook 2024: the authoritative source for the complete ISA 530 text, including all application material on sampling approaches and evaluation.
- ISA 320, Materiality in Planning and Performing an Audit: tolerable misstatement in every sampling plan traces directly to performance materiality under ISA 320.
- ISA 500, Audit Evidence: the distinction between sampling (ISA 530) and targeted testing of specific items (ISA 500).
- ISA 330, The Auditor's Responses to Assessed Risks: the relationship between detection risk and the acceptable risk of incorrect acceptance in sample sizing.
- ISA 450, Evaluation of Misstatements Identified during the Audit: the standard governing how identified sampling misstatements are accumulated and evaluated against materiality.