Created: | 2019-11-14
State: | PROPOSED
Author: | jtattermusch@
Proposes a new "decimal" well-known type as a standard representation for accurate fractional (e.g. monetary) values, and provides language bindings for languages that have built-in support for such a type.
A "decimal" type is important for many applications (especially financial applications) that need to express fractional values (such as monetary amounts) accurately. While many programming languages offer built-in support for a "decimal" type, protobuf currently doesn't offer an easy way to utilize those built-in types, so users are forced to resort to workarounds and to reinvent the wheel.
Better support for decimal type has been requested at https://github.com/protocolbuffers/protobuf/issues/4406 and is heavily upvoted (but there isn't much progress on the issue).
// file:fixed_decimal.proto
package google.protobuf;
message FixedDecimal {
// Whole units part of the amount
int64 units = 1;
// Nano units of the amount (10^-9)
// Must be same sign as units
// Example: The value -1.25 is represented as units=-1 and nanos=-250000000
sfixed32 nanos = 2;
}
Pros
Cons
The type of "units" can be either int64 or sint64. sint64 is more
efficient if negative values are common, but money.proto uses int64.
So the choice is between staying consistent with money.proto
and aiming for the highest efficiency.
The type of "nanos" can be int32, sfixed32 or sint32. int32 is
what money.proto uses, but sint32 and sfixed32 are more efficient for
negative values. One advantage of sfixed32 over sint32 is that the "nanos"
value is likely to be a large number (numerical values like 0.5, 0.25 etc. are
more common than e.g. 0.000000001), and so sfixed32 has the potential to be more
efficient than sint32 or int32 (we'd need to do some measurements).
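The trade-off above can be estimated without a protobuf runtime. The following is a rough sketch, assuming the standard protobuf wire encoding (base-128 varints, ZigZag for sint32, four bytes for sfixed32); the helper names are made up for illustration, and the numbers count only the value bytes, not the field tag. It is not a substitute for the measurements mentioned above.

```python
def varint_len(value: int) -> int:
    """Bytes needed to varint-encode a non-negative integer."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def int32_len(n: int) -> int:
    # Negative int32 values are sign-extended to 64 bits on the wire.
    return varint_len(n & 0xFFFFFFFFFFFFFFFF)


def sint32_len(n: int) -> int:
    # ZigZag maps small magnitudes of either sign to small unsigned values.
    return varint_len(((n << 1) ^ (n >> 31)) & 0xFFFFFFFF)


def sfixed32_len(n: int) -> int:
    # Fixed-width encoding: always four bytes, regardless of the value.
    return 4


if __name__ == "__main__":
    for nanos in (1, 500_000_000, -500_000_000):
        print(nanos, int32_len(nanos), sint32_len(nanos), sfixed32_len(nanos))
```

Under these assumptions, sfixed32 is the smallest for large "nanos" values of either sign, while the varint types are smaller only when "nanos" is close to zero.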
The sign of "units" and "nanos" is required to be the same (money.proto uses the
same restriction). Nevertheless, there are other types (e.g. timestamp.proto)
where the "nanos" part is always non-negative. The former seems to make more sense
for representing decimal numbers, because e.g. -7.5 will be represented as
units=-7 and nanos=-500000000, and the represented value can always be
computed as units + 0.000000001 * nanos regardless of the sign.
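As an illustration of the same-sign rule, here is a minimal Python sketch that turns a ("units", "nanos") pair into a `decimal.Decimal`. The function names `is_normalized` and `to_decimal` are hypothetical and are not part of any protobuf API; the validation policy (raising `ValueError`) is an assumption.

```python
from decimal import Decimal

NANOS_PER_UNIT = 1_000_000_000


def is_normalized(units: int, nanos: int) -> bool:
    """True if nanos is in range and does not contradict the sign of units."""
    if abs(nanos) >= NANOS_PER_UNIT:
        return False
    # Signs must agree unless one of the two parts is zero.
    return not ((units > 0 and nanos < 0) or (units < 0 and nanos > 0))


def to_decimal(units: int, nanos: int) -> Decimal:
    """Hypothetical helper: compute units + 10^-9 * nanos exactly."""
    if not is_normalized(units, nanos):
        raise ValueError(f"invalid FixedDecimal: units={units}, nanos={nanos}")
    # With matching signs the value is a single integer count of nano units.
    # It has at most 28 digits, which fits the default Decimal precision.
    return Decimal(units * NANOS_PER_UNIT + nanos).scaleb(-9)


if __name__ == "__main__":
    print(to_decimal(-7, -500_000_000))
```

Because both fields carry the sign, no sign-dependent branch is needed in the arithmetic itself; only the validation looks at the signs.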
TODO(jtattermusch): do the signs of units vs nanos influence conversion speed between the message fields and the numeric representation? it might for decimal -> FixedDecimal direction of conversion
The message name FixedDecimal (in fixed_decimal.proto) is chosen to make the
nature of the WKT explicit (and to express that this is not necessarily a 1:1 match
for a language's "decimal" type).
For languages that provide a built-in "decimal" type (see overview in Appendix),
language owners should add an API that allows easy conversion between the
FixedDecimal WKT and the built-in "decimal" type in the given language.
The API changes in each language should be purely additive. It's fine for language implementations to provide the bindings independently of other languages (e.g. C# might provide the bindings sooner than some other language).
The value range representable by the proposed FixedDecimal message (at most ~29 significant digits) is smaller than the representable range for all the language-specific "decimal" types. Therefore, there should be no issue converting a FixedDecimal message to any of the decimal types. In the opposite direction (setting a value from a "decimal" type into a FixedDecimal), there can be an "out of range" error, and language implementations should be consistent in how this situation is handled.
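To make the "out of range" case concrete, the sketch below converts a Python `decimal.Decimal` into a ("units", "nanos") pair. The function name `from_decimal` and the error policy are assumptions made for illustration: it raises `OverflowError` when the whole part does not fit into int64 and `ValueError` when the value has more than nine fractional digits. The proposal itself does not prescribe these choices, and an implementation could truncate instead.

```python
from decimal import Decimal
from typing import Tuple

NANOS_PER_UNIT = 1_000_000_000
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def from_decimal(value: Decimal) -> Tuple[int, int]:
    """Hypothetical helper: split a Decimal into (units, nanos)."""
    if not value.is_finite():
        raise ValueError("NaN and infinities cannot be represented")
    # as_integer_ratio() is exact, so no context rounding can sneak in.
    numerator, denominator = value.as_integer_ratio()
    total_nanos, remainder = divmod(numerator * NANOS_PER_UNIT, denominator)
    if remainder:
        # Assumed policy: reject instead of silently dropping digits.
        raise ValueError("value has more than 9 fractional digits")
    sign = -1 if total_nanos < 0 else 1
    units, nanos = divmod(abs(total_nanos), NANOS_PER_UNIT)
    units, nanos = sign * units, sign * nanos
    if not INT64_MIN <= units <= INT64_MAX:
        raise OverflowError("whole part does not fit into int64")
    return units, nanos


if __name__ == "__main__":
    print(from_decimal(Decimal("-1.25")))
```

Splitting the magnitude first and applying the sign afterwards keeps "units" and "nanos" on the same side of zero, as the message definition requires.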
Language-specific APIs should generally allow accessing FixedDecimal values in two modes:
1. access the raw FixedDecimal value (the "units" and "nanos" fields)
2. access the language-specific value as a "decimal" type
using System;

// extra methods for FixedDecimal generated class
public partial class FixedDecimal
{
    private const decimal NanoFactor = 1_000_000_000;

    public decimal ToDecimal()
    {
        // IsNormalized checks for Units and Nanos having the same sign
        // and Nanos being from the right range.
        if (!IsNormalized(Units, Nanos))
        {
            throw new InvalidOperationException($"Fixed decimal contains invalid values: Units={Units}; Nanos={Nanos}");
        }
        return Units + Nanos / NanoFactor;
    }

    public static FixedDecimal FromDecimal(decimal value)
    {
        // ToInt64() throws OverflowException if value is out of range of int64
        var units = decimal.ToInt64(value);
        // ToInt32() truncates toward zero, so digits beyond 10^-9 are dropped
        var nanos = decimal.ToInt32((value - units) * NanoFactor);
        return new FixedDecimal { Units = units, Nanos = nanos };
    }
}
// TODO: we can also add extension methods for "decimal" type to convert into FixedDecimal values
// by invoking value.ToFixedDecimal();
// TODO: in C# we can also define implicit conversion operators between
// FixedDecimal and decimal, but other WKT binding in protobuf C# don't do that
This design is inspired by https://docs.microsoft.com/en-us/dotnet/architecture/grpc-for-wcf-developers/protobuf-data-types#creating-a-custom-decimal-type-for-protobuf and by the way bindings for other WKTs are currently implemented in protobuf C#.
TODO: add design
TODO: add design for conversion between FixedDecimal message and BigDecimal type
No special bindings: C++ doesn't have a standard way to represent the "decimal" type, and until such a type exists, users can just access the raw message (the "units" and "nanos" fields) directly.
TODO: add design
TODO: add design
TODO: add design
Many WKTs provide a custom JSON serialization format (e.g. timestamp.proto).
One way to represent the decimal number in JSON is with a string (e.g.
"123.25") that complies with Java's
decimal floating point literal format.
On the other hand, just using the default JSON format (two fields, "units" and "nanos") doesn't require special JSON handling to be implemented in all languages and is more consistent with TextFormat and the representation of the message itself. The two fields "units" and "nanos" seem to be human-readable enough for this to be an option.
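For comparison, the two candidate JSON shapes can be sketched as follows. Both helper names are hypothetical. The field-based form follows the general proto3 JSON mapping, in which int64 values are written as strings; the string form is an assumed rendering that keeps at least one fractional digit so the text still matches the decimal floating point literal grammar, with an optional leading minus sign.

```python
import json


def to_json_fields(units: int, nanos: int) -> str:
    """Default-style JSON: one entry per message field."""
    # proto3 JSON writes 64-bit integers as strings, 32-bit ones as numbers.
    return json.dumps({"units": str(units), "nanos": nanos})


def to_json_string(units: int, nanos: int) -> str:
    """Custom-style JSON: a single decimal string (assumed format)."""
    negative = units < 0 or nanos < 0
    # Zero-pad nanos to 9 digits, then drop trailing zeros but keep one digit.
    fraction = f"{abs(nanos):09d}".rstrip("0") or "0"
    text = f"{abs(units)}.{fraction}"
    return json.dumps(("-" if negative else "") + text)


if __name__ == "__main__":
    print(to_json_fields(123, 250_000_000))
    print(to_json_string(123, 250_000_000))
```

The string form is more compact and familiar to readers, but every language runtime would need the custom formatting and parsing logic, which is the cost weighed above.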
Better support for decimal type has been requested externally at https://github.com/protocolbuffers/protobuf/issues/4406 and is heavily upvoted (but there isn't much progress on the issue).
This change should definitely be released to open-source.
Not much beyond the risk of coming up with a bad design that provides a suboptimal developer experience.
Several different .proto representations are proposed in this doc. Approach 1 seems to have the best pros/cons.
message Decimal {
int64 value = 1;
int32 scale = 2;
}
message Decimal {
// 96-bit mantissa broken into two chunks
// this representation matches exactly the C# decimal spec
// but not such a good match for other languages
uint64 mantissa_msb = 1;
uint32 mantissa_lsb = 2;
required sint32 exponent_and_sign = 3;
}
Pros
Cons
message DecimalValue {
/* This string contains a decimal floating point literal in the format
* defined by the Java Language Specification: https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-DecimalFloatingPointLiteral
*/
string value = 1;
}
The key features of "decimal" type are that its internal representation must be base-10 (as opposed to base-2 used for floating point types) and that it provides some level of rounding protection (in the sense that it forbids values that would require storing more significant digits than what the type itself is capable of - this is what can lead to imprecise representation of floating point values).
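The difference between base-2 and base-10 representations is easy to see with Python's standard `decimal` module. This is only an illustration of the point above, not part of the proposal:

```python
from decimal import Decimal

# Binary floating point cannot store 0.1 or 0.2 exactly, so the sum drifts.
binary_sum = 0.1 + 0.2

# A base-10 type stores the same literals exactly, so the sum is exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")

if __name__ == "__main__":
    print(binary_sum == 0.3)
    print(decimal_sum == Decimal("0.3"))
```

The first comparison is false and the second is true, which is why monetary amounts are usually kept in a decimal type.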
C#: decimal 128 bits in total, 96-bit mantissa, exponent -28..0 (28-29 significant digits).
Python: decimal 28 digits precision by default (can be adjusted)
Java: BigDecimal uses arbitrary precision, the type is slightly heavy-weight
Go: no officially recommended type, but there are community provided libraries with decimal support
Ruby: BigDecimal uses arbitrary precision
ObjC: NSDecimal stores up to 38 digits, exponent -128..127
Swift: Decimal essentially a value-type wrapper around NSDecimal, so same constraints/bugs as ObjC
https://en.wikipedia.org/wiki/Decimal_data_type
BigQuery NUMERIC type: "exact numeric value with 38 digits of precision and 9 decimal digits of scale" https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#numeric-type
TODO(jtattermusch)