Usually, the best approach is to propagate the uncertainty, for example by saving the uncertainty as another variable in the database and using it directly whenever the number is used. If you do that, there is no practical need to spend time formatting the numbers. Using significant figures seems like a "cheap trick" that risks misleading you more often than it helps.
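
A minimal sketch of what "saving the uncertainty as another variable" could look like, with hypothetical column names and made-up values:

    import math

    # Hypothetical rows: each stored value carries its own uncertainty column,
    # so nothing is lost to rounding at storage time.
    rows = [
        {"mass_g": 12.31, "mass_unc_g": 0.05},
        {"mass_g": 7.4,   "mass_unc_g": 0.3},
    ]

    total = sum(r["mass_g"] for r in rows)
    # Independent errors add in quadrature; the stored uncertainties are used
    # directly at the point of use, with no significant-figure bookkeeping.
    total_unc = math.sqrt(sum(r["mass_unc_g"] ** 2 for r in rows))

    print(f"total mass = {total:.2f} ± {total_unc:.2f} g")  # format only for display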

Significant figures are not a convention for making your deliverable pretty. They have semantic meaning. Don't think about dumb rules from high school chemistry, think about the actual problem. There are two entwined sources of uncertainty I am referring to:

1) measurement uncertainty, due to a lack of precision in the instrument (or the quantity itself, e.g. many financial computations are not meaningful if they involve fractional cents)

2) computational uncertainty, which is exclusively due to algebraic propagation of measurement uncertainty

Far too many data scientists don't care about the first category of uncertainty because they don't care about where the data came from. And they don't even realize the second category is a problem.

No, that is the exact opposite of what I said! For starters, "uncertainty" and "error" are not the same thing here. I am saying that the significant figures in a measurement convey an inherent, measurement-specific uncertainty, and that this uncertainty must be considered when doing calculations with that measurement. Just like the person I responded to, I don't think you've thought about why significant figures actually exist in the first place.

The correct solution is error propagation (with appropriate estimates of the errors of the inputs), not arbitrarily rounding numbers at each step

The whole point of my argument is that uncertainties in calculated quantities can be rigorously determined from the uncertainty of the inputs, and measurement inputs have uncertainty determined by their significant figures. On the other hand, ignoring significant figures in calculations means we're ignoring a potential source of uncertainty in downstream analysis. If you think significant figures are about "arbitrarily rounding something" then you are thoughtlessly applying high school chemistry rules. Please read this carefully:

Programmers and data scientists are lazy about significant figures because they don't care where the data is coming from, to them it's all doubles in a database, and significant figures is just a matter of rounding things correctly at the end. The area-of-a-square argument proves that this is a mistake.
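
A minimal sketch of one way to read the area-of-a-square point, assuming a 2.0 m reading implies roughly ±0.05 m:

    # Side of a square read off as 2.0 m: two significant figures,
    # i.e. an implied measurement uncertainty of about ±0.05 m.
    side, side_unc = 2.0, 0.05

    area = side ** 2                      # 4.0 m^2
    propagated_unc = 2 * side * side_unc  # first-order propagation: 2·s·Δs = 0.2 m^2

    # Reporting the area as "4.0 m^2" (two significant figures) implies ±0.05 m^2,
    # which understates the propagated uncertainty by a factor of four.
    print(f"area = {area:.1f} m^2, propagated uncertainty ±{propagated_unc:.1f} m^2")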

You are explaining error propagation, but my point is that _if you are doing error propagation (as you should, if you want to do things properly), significant figures ARE just for making the deliverable pretty_.

This is why the other person was talking about "arbitrarily rounding".

(and, no, you should not make the distinction "it's a measurement, so it's written differently", because in practice, a lot of "measurements" are in fact already a transformation, and sometimes you cannot even know that for sure yourself. For example, a temperature sensor measures an electrical resistance (with a measurement uncertainty) and then converts it into a temperature, and according to you, that value should not be written the same way as a direct measurement, for purely arbitrary reasons)

I know the notion of measurement that you are trying to explain; I studied it as an undergraduate student. Since then, I have moved beyond this notion and use something better. It's not a matter of "you don't understand", it's rather a matter of "you understand it well enough to see the limits of this notion, and it's not useful for you anymore".

The book you shared seems to confirm that: it is for undergraduates. Things get more complicated in real-world practice, and the basic rules used to build that understanding need to be left behind. Undergraduate students are going to do basic lab experiments with a ruler and a stopwatch, and the goal is just to practice, not to answer a real, unknown question. In real life, no one needs to measure things as trivial as what those students are measuring. When people do, they realise that the distinction between a calculated value and a measured value is meaningless and not helpful at all.

It reminds me of the tomato. Some people don't know the tomato is a fruit. Some people know the tomato is a fruit and treat it like a fruit. Some people know the tomato is a fruit and still treat it like a vegetable, because that is what makes sense.

In practice, I have never seen someone mishandle a badly written number in a way that had any impact. If they don't themselves know the concept of significant digits and believe the number is precisely 1.234567890 rather than 1.234567889, what wrong decision would they take that they would not have taken if the number had been 1.234567889?

It starts to matter when you have 10% or ~100% uncertainty, in which case writing 1.2 or 1 is still not enough to convey the meaning to someone who does not get significant figures, because for them, 1 = 1.0000 anyway. In this case, you need to explicitly explain the limitations the uncertainty places on any decision.

In practice, splitting hairs over the significant-digit convention is just missing the point: if you apply the convention for people who are not informed about the precision, they will make bad decisions anyway, even if the number technically has the correct number of significant figures.

123.45 kg and 7.32 kg have the same number of decimal places, but the former has more significant figures and implies greater precision.

If the number stays at $32.56M for a period of time, you want them to consider it sideways movement rather than start worrying about the fact that it technically did go up by a few hundred dollars.

Massive computer experiments are common and (often) cheap. If I’m comparing two classification models on a large dataset a difference between 97% and 96.8% accuracy might well be both statistically and practically significant.
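
A back-of-the-envelope check, assuming a made-up test-set size and treating the two evaluations as independent binomials (a paired test on the same test set would be tighter):

    import math

    n = 100_000            # hypothetical test-set size
    p1, p2 = 0.970, 0.968  # the two accuracies from the comment above

    # Standard error of the difference between the two accuracy estimates.
    se_diff = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    z = (p1 - p2) / se_diff

    print(f"difference = {p1 - p2:.3%}, SE ≈ {se_diff:.3%}, z ≈ {z:.1f}")
    # The 0.2-point gap is roughly 2.6 standard errors here, so the third
    # digit of "97.0%" is carrying real information.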

An error I’ve seen professional statisticians make is asserting that averaging samples (or performing more complex statistical analyses like regression) justifies increasing the number of significant digits. For example, if I have a hundred observations measured with two significant digits of precision, the mean of those observations is asserted to be accurate to three significant digits, the standard deviation of the mean being 1/10th that of each observation. This is a natural conclusion if you treat the significant digits as accounting only for independent random errors, but in fact significant digits are also used to account for systematic errors in measuring equipment. A scale that measures down to 0.1g, for example, may be miscalibrated by 0.03g. This makes reporting the mean of 100 samples down to 0.01g obviously nonsensical.
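
A toy simulation of that point, with made-up numbers for the true value, the calibration bias, and the per-reading noise:

    import random

    random.seed(0)

    true_mass = 12.3   # g, made-up
    bias = 0.03        # g, systematic miscalibration of the scale, as above
    noise_sd = 0.05    # g, made-up random noise on each reading

    readings = [true_mass + bias + random.gauss(0, noise_sd) for _ in range(100)]
    mean = sum(readings) / len(readings)

    # The standard error of the mean shrinks like 1/sqrt(N)...
    sem = noise_sd / len(readings) ** 0.5
    # ...but the calibration bias does not shrink at all, so reporting the mean
    # to 0.01 g implies an accuracy the instrument does not have.
    print(f"mean = {mean:.3f} g, SEM ≈ {sem:.3f} g, error vs true ≈ {mean - true_mass:+.3f} g")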

In short, how many digits to report deserves more thought than either most data scientists or the author give it.

The latter is the right answer in almost all instances.

Most human beings are happy with a 1% error margin.

While he doesn't say it as a rule outright, the author repeatedly uses a "small and fixed" two significant digits. He later says this:

You must choose the number of significant digits deliberately.

which I agree with, but is at odds with the "small and fixed" dictum with which he leads off the post.

They are like "Chesterton's digits" when they appear unexplained in parameter files. Do we really know this number or is it just a wild-assed guess?

Does it matter? You have 23 bits of mantissa to fill one way or the other, and if you don't know the lower, say, 12, one pattern of bits is as good as any other pattern.

Humans know the difference between "1.0" and "1.0000" but they're the same to the computer.
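
A quick check of that:

    import struct

    a = float("1.0")
    b = float("1.0000")

    print(a == b)                                         # True: identical doubles
    print(struct.pack(">d", a) == struct.pack(">d", b))   # True: identical 64 bits

    # The extra zeros exist only in the source text; once parsed, the precision
    # they were meant to convey has to be carried somewhere else (for example
    # in a separate uncertainty field), because the bit pattern cannot hold it.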

In my experience reporting models in industry, even 1 or 2 decimal places are irrelevant to the overall decision to be made, at least if the number is around "87%" as stated in the example.

Of course it's context dependent. If you're evaluating incremental improvements perhaps that warrants more careful precision.

After all, by definition each decimal place is 10x less important.

I’m not sure about that, since it depends on the error margin tolerance one has.

Regardless, my biggest gripe with most reports of quantitative information I stumble across is that, lacking any mention of significant digits or accuracy, it's very easy (and I've seen this happen a lot) for people to attribute more accuracy to a value than is actually there.

One example: AWS CloudWatch reports external replication delay in MySQL Aurora in milliseconds, which is impossible, since the value reported by MySQL is in seconds. Yet many times I’ve been involved in discussions with people who say “our system can tolerate at most 500ms” of replication delay for reads, because they see the value reported in that unit.

There is no rule of thumb, except that people should understand that I can give the illusion of precision to any measurement by capturing crap. Those 32 bits on your laptop's onboard audio interface are the real banger, when even the least significant digits in 16-bit mode are filled with electromagnetic interference and noise from the power rails. Real, actual precision is much harder to reach, and you will know when you reach it, because you will have spent a lot of money on the way.

Whether you need it, or wield it correctly is a different story.

Are all location estimates that accurate? No. Should the precision of your location estimate be limited by your storage format? Of course not. That's why doubles are frequently used.
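
A rough illustration of why, using an approximate metres-per-degree conversion:

    import numpy as np

    METRES_PER_DEGREE = 111_320  # rough metres per degree of latitude

    lat = 45.0
    step32 = float(np.spacing(np.float32(lat)))  # gap to the next representable float32
    step64 = float(np.spacing(np.float64(lat)))  # gap to the next representable float64

    print(f"float32 step near {lat}°: {step32:.2e}° ≈ {step32 * METRES_PER_DEGREE:.2f} m")
    print(f"float64 step near {lat}°: {step64:.2e}° ≈ {step64 * METRES_PER_DEGREE:.2e} m")
    # float32 can only place a latitude to within roughly half a metre here,
    # while float64 resolves far below a millimetre; hence doubles for coordinates.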

I've heard that some on this site do spend time measuring plate drift and correcting for atmospheric path distortion on long soak GPS base stations (that spend months recording GPS data for a fixed position to smooth out satellite and path error).

Your point stands though, there's no sense displaying drama digits that lack meaning in context.

Teachers have a lot less latitude nowadays; see for an example that is shocking in the 1960s context.

Granted, the teacher that tried to teach me this had no understanding of it themselves and would incorrectly mark answers wrong when they had the correct number of significant digits, so it was pretty botched in my case. Still was attempted in the curriculum though.

That’s the point. That’s the correct usage of significant figures. Your inputs have significant figures; they determine exactly the significant figures of your output. Significant figures have nothing to do with things like sampling variance.

What you’re looking for is not significant figures, it’s confidence intervals.

"For JPL's highest accuracy calculations, which are for interplanetary navigation, we use 3.141592653589793. Let's look at this a little more closely to understand why we don't use more decimal places."

"Let's go to the largest size there is: the known universe. The radius of the universe is about 46 billion light years. Now let me ask (and answer!) a different question: How many digits of pi would we need to calculate the circumference of a circle with a radius of 46 billion light years to an accuracy equal to the diameter of a hydrogen atom, the simplest atom? It turns out that 37 decimal places (38 digits, including the number 3 to the left of the decimal point) would be quite sufficient. "

When we go to industry or "the real world", it's as if we are forced to unlearn all good practices

Note: I don't want to sound like a douche here, but I can't help noticing that they're talking about accuracy while citing Wikipedia.
