public version of FixedDecimal proposal

Jan Tattermusch, 5 years ago

Commit 1d6d1a698d

1 file changed, 348 insertions(+), 0 deletions(-)
  1. docs/proposal_decimal_wkt.md (348 additions, 0 deletions)

@@ -0,0 +1,348 @@
+# Protobuf "decimal" well known type
+
+
+Created: | 2019-11-14
+
+State:   | PROPOSED
+
+Author:  | jtattermusch@
+
+
+## Summary
+
+Proposes a new "decimal" well-known type (WKT) as a standard representation for
+exact fractional (e.g. monetary) values, together with language bindings for
+languages that have built-in support for such a type.
+
+## Motivation
+
+A "decimal" type is important for many applications (especially financial
+applications) that need to express fractional values (such as monetary amounts)
+exactly. While many programming languages offer built-in support for a
+"decimal" type, protobuf currently doesn't offer an easy way to use those
+built-in types, so users are forced to resort to workarounds and to reinvent
+the wheel.
+
+Better support for decimal type has been requested at
+https://github.com/protocolbuffers/protobuf/issues/4406 and is heavily upvoted
+(but there isn't much progress on the issue).
+
+## Detailed Explanation
+
+### Requirements for the new decimal well known type
+
+-   Introduce the "decimal" type as a "well-known type" (as opposed to extending
+    the protobuf wire protocol, which would be far too complicated). Protobuf
+    library implementations in different languages will then provide
+    language-specific bindings and helpers to integrate with the native
+    "decimal" type supported by the given language.
+-   Provide a message representation that is reasonably usable even if the
+    well-known type has no special handling in a given language (not all
+    languages will have a special binding for the new well-known type, so the
+    type must still be usable in its "raw form" as a protobuf message).
+-   Match the semantics of the decimal type in popular programming languages
+    where possible (the built-in decimal types of different languages can have
+    slightly different properties).
+-   Stay consistent with other pre-existing, commonly used .proto types so that
+    the new WKT fits well into the existing ecosystem (e.g. there is already a
+    money.proto).
+
+### Rollout plan
+
+-   Decide which .proto definition to use for the "decimal" type (see the
+    proposal and the alternatives below).
+-   Check in the .proto definition of the well-known type.
+-   Language owners can provide their own bindings for the WKT.
+-   Because the "decimal" WKT will be designed to be usable even without special
+    language bindings, the language-specific bindings can initially be marked
+    as "experimental" in each language and stabilized later, once they have
+    proven themselves.
+
+### Proposal: message with "units" and "nanos"
+
+```
+// file: fixed_decimal.proto
+syntax = "proto3";
+
+package google.protobuf;
+
+message FixedDecimal {
+
+    // Whole units part of the amount.
+    int64 units = 1;
+
+    // Nano units of the amount (10^-9).
+    // Must be in the range [-999,999,999, +999,999,999] and have the same
+    // sign as units (if units is zero, nanos may have either sign).
+    // Example: the value -1.25 is represented as units=-1 and nanos=-250000000.
+    sfixed32 nanos = 2;
+}
+```
+
+Pros
+
+-   Simple concept, easy for users to understand.
+-   Values are human-readable and easy to understand without specialized support.
+-   Very similar to .proto definitions that have already been used internally at
+    Google.
+-   Consistent with other widely used .proto definitions (such as money.proto).
+-   The range of values can be safely represented by the "decimal" type in all
+    languages that support such a type.
+-   The representable value range should be good enough for the vast majority
+    of applications (e.g. financial applications).
+-   This approach is currently recommended to C# users by Microsoft, which gives
+    some indication that it works well in the .NET ecosystem:
+    https://docs.microsoft.com/en-us/dotnet/architecture/grpc-for-wcf-developers/protobuf-data-types#decimals
+
+Cons
+
+-   Lower range and resolution than the "decimal" types in all the languages
+    surveyed in Appendix 1. The good news is that all common decimal types can
+    represent every value carried by the proto, but converting from a
+    language-specific decimal value can lose precision or throw an exception.
+
+The type of "units" can be either `int64` or `sint64`. `sint64` is more
+efficient if negative values are common, but `money.proto` uses `int64`. So the
+choice is between staying consistent with `money.proto` and aiming for the
+highest efficiency.
+
+The type of "nanos" can be either `int32`, `sfixed32` or `sint32`. `int32` is
+what `money.proto` uses, but `sint32` and `sfixed32` will be more efficient for
+negative values (`int32` encodes every negative value as a 10-byte varint). One
+advantage of `sfixed32` over `sint32` is that the "nanos" value is likely to be
+a large number (numerical values like 0.5, 0.25 etc. are more common than e.g.
+0.000000001), and so `sfixed32` has the potential to be more efficient than
+`sint32` or `int32` (we'd need to do some measurements).
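The wire-size argument in the two paragraphs above can be checked with a small C# sketch. It is not part of any protobuf library (all helper names are made up for illustration); it simply applies the protobuf encoding rules to compute the payload size of the "units" and "nanos" fields, excluding the one-byte field tag.

```csharp
using System;

// Bytes needed to encode an unsigned value as a varint (7 payload bits per byte).
static int VarintSize(ulong value)
{
    int size = 1;
    while (value >= 0x80)
    {
        value >>= 7;
        size++;
    }
    return size;
}

// "units" candidates: int64 varint-encodes the two's-complement bits, so every
// negative value takes 10 bytes; sint64 applies ZigZag encoding first.
static int Int64Size(long units) => VarintSize((ulong)units);
static int SInt64Size(long units) => VarintSize((ulong)((units << 1) ^ (units >> 63)));

// "nanos" candidates: int32 sign-extends negative values to 64 bits (10 bytes),
// sint32 applies ZigZag encoding, sfixed32 is always 4 bytes.
static int Int32Size(int nanos) => VarintSize((ulong)(long)nanos);
static int SInt32Size(int nanos) => VarintSize((uint)((nanos << 1) ^ (nanos >> 31)));
static int SFixed32Size(int nanos) => 4;

// -7.25 is stored as units=-7, nanos=-250000000.
Console.WriteLine($"units=-7: int64 {Int64Size(-7)} bytes, sint64 {SInt64Size(-7)} bytes");
foreach (int nanos in new[] { 500_000_000, 250_000_000, -250_000_000, 1 })
{
    Console.WriteLine($"nanos={nanos}: int32 {Int32Size(nanos)}, sint32 {SInt32Size(nanos)}, sfixed32 {SFixed32Size(nanos)} bytes");
}
```

For typical fractions such as 0.5, 0.25 or -0.25, `sfixed32` is never larger than either varint encoding, while for a tiny fraction like 0.000000001 it costs 4 bytes instead of 1, so the real trade-off depends on the distribution of values.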
+
+The sign of "units" and "nanos" is required to be the same (money.proto uses the
+same restriction). By contrast, other types (e.g. timestamp.proto) require the
+"nanos" part to be non-negative. The former seems to make more sense for
+representing decimal numbers, because e.g. -7.5 is represented as `units=-7`
+and `nanos=-500000000`, which mirrors how the number is written, and the
+represented value can always be computed as `units + 0.000000001 * nanos`
+regardless of the sign.
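A minimal C# sketch of the two sign conventions (helper names are illustrative only). It splits -7.5 both ways: the proposed same-sign form truncates toward zero, which is exactly what `decimal.ToInt64` does, while the Timestamp-style form (non-negative nanos) needs a floor operation.

```csharp
using System;

// Same-sign convention (proposed): units is the integer part truncated toward
// zero, and nanos carries the fractional part with the same sign.
static (long Units, int Nanos) ToSameSign(decimal value)
{
    long units = decimal.ToInt64(value); // truncates toward zero
    int nanos = decimal.ToInt32((value - units) * 1_000_000_000m);
    return (units, nanos);
}

// Timestamp-style convention (for comparison): units is the floor of the value
// and nanos is always in [0, 999,999,999].
static (long Units, int Nanos) ToNonNegativeNanos(decimal value)
{
    long units = decimal.ToInt64(decimal.Floor(value));
    int nanos = decimal.ToInt32((value - units) * 1_000_000_000m);
    return (units, nanos);
}

// Under either convention the value is units + nanos * 10^-9.
static decimal ToValue(long units, int nanos) => units + nanos / 1_000_000_000m;

var sameSign = ToSameSign(-7.5m);
var nonNegative = ToNonNegativeNanos(-7.5m);
Console.WriteLine($"same-sign:    units={sameSign.Units}, nanos={sameSign.Nanos}");
Console.WriteLine($"non-negative: units={nonNegative.Units}, nanos={nonNegative.Nanos}");
```

Both encodings satisfy the same formula (`-7 + -0.5` and `-8 + 0.5`); the practical difference is readability of the raw fields and whether conversion truncates or floors.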
+
+TODO(jtattermusch): do the signs of units vs. nanos influence conversion speed
+between the message fields and the numeric representation? They might for the
+decimal -> FixedDecimal direction of conversion.
+
+The message name `FixedDecimal` (in `fixed_decimal.proto`) is chosen to make the
+nature of the WKT explicit (and to express that it is not necessarily a 1:1
+mapping of a language's "decimal" type).
+
+### API Changes
+
+For languages that provide a built-in "decimal" type (see the overview in
+Appendix 1), language owners should add an API that allows easy conversion
+between the `FixedDecimal` WKT and the built-in "decimal" type of the given
+language.
+
+The API changes in each language should be purely additive. It's fine for
+language implementations to provide the bindings independent of other languages
+(e.g. C# might provide the bindings sooner than some other language).
+
+The value range representable by the proposed FixedDecimal message (at most 28
+significant digits: 19 integer digits in the int64 "units" plus 9 fractional
+digits in "nanos") is smaller than the representable range of all the
+language-specific "decimal" types. Therefore, there should be no issue
+converting a FixedDecimal message to any of the decimal types. In the opposite
+direction (setting a value from a "decimal" type into a FixedDecimal), there can
+be an "out of range" error, and language implementations should be consistent
+in how this situation is handled.
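A quick C# check of the range argument above (a sketch; the local names are illustrative). The largest FixedDecimal value, `long.MaxValue` units plus 999,999,999 nanos, fits into C# `decimal` exactly, while the opposite direction can overflow "units" or silently drop digits beyond the ninth decimal place.

```csharp
using System;

// Largest FixedDecimal value: 19 integer digits (int64 units) + 9 fractional digits (nanos).
decimal maxFixed = long.MaxValue + 999_999_999m / 1_000_000_000m;

// C# decimal has a 96-bit mantissa, so all 28 significant digits survive exactly.
decimal fractionalPart = maxFixed - long.MaxValue;

// decimal -> FixedDecimal "units": the integer part must fit into int64.
static bool FitsInUnits(decimal value)
{
    try
    {
        decimal.ToInt64(value);
        return true;
    }
    catch (OverflowException)
    {
        return false;
    }
}

// decimal -> FixedDecimal "nanos": digits beyond the 9th decimal place are truncated.
static int ToNanos(decimal value) =>
    decimal.ToInt32((value - decimal.Truncate(value)) * 1_000_000_000m);

Console.WriteLine($"fractional part kept exactly: {fractionalPart == 0.999999999m}");
Console.WriteLine($"decimal.MaxValue fits into units: {FitsInUnits(decimal.MaxValue)}");
```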
+
+Language-specific APIs should generally allow accessing FixedDecimal values in
+two modes:
+
+1.  access the raw FixedDecimal value (the "units" and "nanos" fields);
+2.  access the value as the language-specific "decimal" type.
+
+#### C# bindings
+
+```
+// Extra methods for the generated FixedDecimal class.
+using System;
+
+public partial class FixedDecimal
+{
+    private const decimal NanoFactor = 1_000_000_000;
+    private const int MaxNanos = 999_999_999;
+
+    public decimal ToDecimal()
+    {
+        if (!IsNormalized(Units, Nanos))
+        {
+            throw new InvalidOperationException($"Fixed decimal contains invalid values: Units={Units}; Nanos={Nanos}");
+        }
+        return Units + Nanos / NanoFactor;
+    }
+
+    public static FixedDecimal FromDecimal(decimal value)
+    {
+        // ToInt64() truncates toward zero and throws OverflowException
+        // if value is out of range of int64.
+        var units = decimal.ToInt64(value);
+        // The remainder has the same sign as value; digits beyond the
+        // 9th decimal place are truncated.
+        var nanos = decimal.ToInt32((value - units) * NanoFactor);
+        return new FixedDecimal { Units = units, Nanos = nanos };
+    }
+
+    // Checks that Units and Nanos have the same sign (either may be zero)
+    // and that Nanos is within [-999,999,999, +999,999,999].
+    private static bool IsNormalized(long units, int nanos) =>
+        nanos >= -MaxNanos && nanos <= MaxNanos &&
+        (units == 0 || nanos == 0 || (units < 0) == (nanos < 0));
+}
+
+// TODO: we can also add extension methods for the "decimal" type to convert
+// into FixedDecimal values by invoking value.ToFixedDecimal();
+
+// TODO: in C# we can also define implicit conversion operators between
+// FixedDecimal and decimal, but other WKT bindings in protobuf C# don't do that.
+```
+
+Inspired by
+https://docs.microsoft.com/en-us/dotnet/architecture/grpc-for-wcf-developers/protobuf-data-types#creating-a-custom-decimal-type-for-protobuf
+and by the way bindings for other WKTs are currently implemented in protobuf C#.
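Because the generated `FixedDecimal` class does not exist yet, here is a self-contained C# sketch of the same conversion and validation logic operating on the raw field values (the free functions are hypothetical stand-ins for the methods above). It is mainly useful for checking how messages built by languages without special bindings would be accepted or rejected.

```csharp
using System;

// Same check as the IsNormalized helper described in the C# bindings.
static bool IsNormalized(long units, int nanos) =>
    nanos >= -999_999_999 && nanos <= 999_999_999 &&
    (units == 0 || nanos == 0 || (units < 0) == (nanos < 0));

// Stand-in for FixedDecimal.ToDecimal(), using the raw field values.
static decimal ToDecimal(long units, int nanos)
{
    if (!IsNormalized(units, nanos))
    {
        throw new InvalidOperationException($"Fixed decimal contains invalid values: Units={units}; Nanos={nanos}");
    }
    return units + nanos / 1_000_000_000m;
}

// Stand-in for FixedDecimal.FromDecimal(), returning the raw field values.
static (long Units, int Nanos) FromDecimal(decimal value)
{
    long units = decimal.ToInt64(value);
    int nanos = decimal.ToInt32((value - units) * 1_000_000_000m);
    return (units, nanos);
}

foreach (decimal value in new[] { -1.25m, 0.5m, 123.000000001m })
{
    var (units, nanos) = FromDecimal(value);
    Console.WriteLine($"units={units}, nanos={nanos}, round trip ok: {ToDecimal(units, nanos) == value}");
}
```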
+
+#### Python bindings
+
+TODO: add design
+
+#### Java bindings
+
+TODO: add design for conversion between FixedDecimal message and BigDecimal type
+
+#### C++ bindings
+
+No special bindings: C++ doesn't have a standard way to represent the "decimal"
+type, and until such a type exists, users can simply access the raw message
+(the "units" and "nanos" fields directly).
+
+#### Go bindings
+
+TODO: add design
+
+#### Ruby bindings
+
+TODO: add design
+
+#### Obj-C bindings
+
+TODO: add design
+
+### JSON Representation of FixedDecimal
+
+Many WKTs provide a custom JSON serialization format (e.g. timestamp.proto).
+
+One way to represent the decimal number in JSON is as a string (e.g.
+`"123.25"`) that complies with Java's
+[decimal floating point literal](https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-DecimalFloatingPointLiteral)
+syntax, plus an optional leading minus sign (the literal grammar itself has no
+sign).
+
+On the other hand, just using the default JSON format (the two fields "units"
+and "nanos") doesn't require special JSON handling to be implemented in all
+languages and is more consistent with TextFormat and the representation of the
+message itself. The two fields "units" and "nanos" seem to be human-readable
+enough for this to be an option.
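For comparison, a C# sketch of the string-based JSON form (the helper `ToJsonString` is hypothetical, not an existing protobuf API). It formats the raw fields directly, without going through a floating-point type. Note that with the default proto3 JSON mapping, 64-bit integer fields are serialized as JSON strings, so the two-field form of 123.25 would be `{"units": "123", "nanos": 250000000}`.

```csharp
using System;
using System.Globalization;

// Formats the raw fields of a normalized FixedDecimal (same sign, |nanos| < 10^9)
// as a plain decimal string such as "123.25" or "-0.5".
static string ToJsonString(long units, int nanos)
{
    bool negative = units < 0 || nanos < 0;
    // TrimStart('-') also copes with long.MinValue, whose absolute value overflows long.
    string result = units.ToString(CultureInfo.InvariantCulture).TrimStart('-');
    if (negative)
    {
        result = "-" + result;
    }
    if (nanos != 0)
    {
        // Always 9 fractional digits, then drop trailing zeros ("250000000" -> "25").
        result += "." + Math.Abs(nanos).ToString("D9", CultureInfo.InvariantCulture).TrimEnd('0');
    }
    return result;
}

Console.WriteLine(ToJsonString(123, 250_000_000));
```

The sign has to be taken from either field because a value such as -0.5 has `units=0`.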
+
+### Open-Source Plan
+
+Better support for decimal type has been requested externally at
+https://github.com/protocolbuffers/protobuf/issues/4406 and is heavily upvoted
+(but there isn't much progress on the issue).
+
+This change should definitely be released to open-source.
+
+## Drawbacks
+
+Not much, beyond the risk of coming up with a bad design that provides a
+suboptimal developer experience.
+
+## Alternatives Considered
+
+Several alternative .proto representations were considered in this doc. The
+proposed approach (a message with "units" and "nanos") seems to have the best
+pros/cons.
+
+### Alternative 1: Use integer value and "scale/exponent"
+
+```
+message Decimal {
+ int64 value = 1;
+ int32 scale = 2;
+}
+```
+
+-   Slightly less human-readable than the proposed approach.
+-   No protection against using too high/too low exponents, which can lead to
+    losing significant digits if not used carefully.
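A C# sketch of what a binding for this alternative might look like, assuming (the snippet does not say) that the represented number is `value * 10^-scale`. The same number has many encodings, and only scales 0..28 map onto C# `decimal`, so the binding has to reject everything else.

```csharp
using System;

// Converts the fields of the hypothetical Decimal{value, scale} message to C# decimal,
// assuming the represented number is value * 10^-scale.
static decimal FromScaled(long value, int scale)
{
    if (scale < 0 || scale > 28)
    {
        // A negative scale would need a multiplication (which may overflow), and
        // scales above 28 cannot be represented by C# decimal at all.
        throw new ArgumentOutOfRangeException(nameof(scale), scale, "scale must be in 0..28");
    }
    // Magnitude as ulong so that long.MinValue is handled correctly.
    ulong magnitude = value < 0 ? (ulong)(-(value + 1)) + 1UL : (ulong)value;
    return new decimal((int)(uint)magnitude, (int)(uint)(magnitude >> 32), 0, value < 0, (byte)scale);
}

// 7.5 can be encoded as (75, 1) or (750, 2), among others.
Console.WriteLine(FromScaled(75, 1) == FromScaled(750, 2));
```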
+
+### Alternative 2: Low level representation
+
+```
+message Decimal {
+    // 96-bit mantissa broken into two chunks
+    // this representation matches exactly the C# decimal spec
+    // but not such a good match for other languages
+    uint64 mantissa_msb = 1;
+    uint32 mantissa_lsb = 2;
+    sint32 exponent_and_sign = 3;
+}
+```
+
+Pros
+
+-   Efficient on the wire.
+-   1:1 mapping between the protobuf message and the language type (for some
+    languages).
+
+Cons
+
+-   Matches the spec of some implementations of "decimal" exactly, but doesn't
+    match the analogous type in other languages.
+-   Too low-level to be used as a raw message (a problem for cross-language
+    interoperability).
+-   Not intelligible by humans unless good language bindings are provided.
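To illustrate how closely this layout follows C# `decimal`, a sketch that splits a value with `decimal.GetBits` into the three fields and reassembles it. How sign and scale are packed into `exponent_and_sign` is not specified above; the packing used here (`scale` for non-negative values, `-(scale + 1)` for negative ones) is a hypothetical choice.

```csharp
using System;

// Splits a C# decimal into the fields of the low-level Decimal message.
static (ulong MantissaMsb, uint MantissaLsb, int ExponentAndSign) Split(decimal value)
{
    int[] bits = decimal.GetBits(value); // [lo, mid, hi, flags]
    ulong msb = ((ulong)(uint)bits[2] << 32) | (uint)bits[1];
    uint lsb = (uint)bits[0];
    int scale = (bits[3] >> 16) & 0xFF;  // scale lives in bits 16..23 of flags
    bool negative = bits[3] < 0;         // sign is bit 31 of flags
    // Hypothetical packing: scale for non-negative values, -(scale + 1) for negative ones.
    return (msb, lsb, negative ? -(scale + 1) : scale);
}

// Reassembles the decimal from the three fields.
static decimal Join(ulong msb, uint lsb, int exponentAndSign)
{
    bool negative = exponentAndSign < 0;
    int scale = negative ? -exponentAndSign - 1 : exponentAndSign;
    return new decimal((int)lsb, (int)(uint)msb, (int)(uint)(msb >> 32), negative, (byte)scale);
}

var parts = Split(-7.5m);
Console.WriteLine($"msb={parts.MantissaMsb}, lsb={parts.MantissaLsb}, exponent_and_sign={parts.ExponentAndSign}");
```

The raw fields for -7.5 (`0`, `75`, `-2`) show the readability problem: nothing in the message itself reads as 7.5.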
+
+### Alternative 3: String representation
+
+```
+message DecimalValue {
+
+ /* This string contains a decimal floating point literal in the format
+  * defined by the Java Language Specification: https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-DecimalFloatingPointLiteral
+  */
+
+ string value = 1;
+
+}
+```
+
+-   can represent numbers with arbitrary precision (some languages support that)
+-   high processing overhead due to string <-> number conversions
+-   bad efficiency on the wire
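A sketch of the parsing side in C#, using `decimal.TryParse` with `NumberStyles.Float` and the invariant culture (an assumption; a real binding would also have to decide how to treat Java-specific literal syntax such as underscores or `d`/`f` suffixes). It shows that the string has to be parsed on every access and that it can carry values the target type cannot hold.

```csharp
using System;
using System.Globalization;

// Parses the "value" string of the hypothetical DecimalValue message.
// Returns false when the string is malformed or outside the range of C# decimal.
static bool TryParseDecimalValue(string text, out decimal result) =>
    decimal.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out result);

foreach (string text in new[] { "123.25", "-7.5", "1.5e3", "1e30", "1_000.5" })
{
    bool ok = TryParseDecimalValue(text, out decimal value);
    Console.WriteLine(ok ? $"{text} -> {value.ToString(CultureInfo.InvariantCulture)}" : $"{text} -> rejected");
}
```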
+
+## Unresolved Questions
+
+-   The request for this feature has been initiated by external users and it
+    would be good to somehow enable an open review process (external
+    contributors can bring useful insights and help prototyping the solution)
+
+## Appendix 1: Overview of built-in "decimal" type in programming languages
+
+The key features of "decimal" type are that its internal representation must be
+base-10 (as opposed to base-2 used for floating point types) and that it
+provides some level of rounding protection (in the sense that it forbids values
+that would require storing more significant digits than what the type itself is
+capable of - this is what can lead to imprecise representation of floating point
+values).
+
+C#: `decimal`, 128 bits in total, 96-bit mantissa, exponent -28..0 (28-29
+significant digits).
+
+Python: `decimal.Decimal`, 28 digits of precision by default (can be adjusted).
+
+Java: `BigDecimal`, arbitrary precision; the type is somewhat heavyweight.
+
+Go: no officially recommended type, but there are community-provided libraries
+with decimal support.
+
+Ruby: `BigDecimal`, arbitrary precision.
+
+ObjC: `NSDecimal` stores up to 38 digits, exponent -128..127.
+
+Swift: `Decimal` is essentially a value-type wrapper around `NSDecimal`, so it
+has the same constraints/bugs as ObjC.
+
+https://en.wikipedia.org/wiki/Decimal_data_type
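The C# figures in this appendix can be checked directly with a small sketch: `decimal.MaxValue` is 2^96 - 1 (29 digits), and the smallest positive value uses the maximum scale of 28.

```csharp
using System;
using System.Globalization;

// 2^96 - 1, the largest 96-bit mantissa with scale 0.
string maxDigits = decimal.MaxValue.ToString(CultureInfo.InvariantCulture);

// Smallest positive decimal: mantissa 1, scale 28 (i.e. 10^-28).
decimal smallest = new decimal(1, 0, 0, false, 28);

// The scale is stored in bits 16..23 of the flags word returned by GetBits.
int scaleOfSmallest = (decimal.GetBits(smallest)[3] >> 16) & 0xFF;

Console.WriteLine($"max={maxDigits} ({maxDigits.Length} digits), smallest scale={scaleOfSmallest}");
```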
+
+## Appendix 2: Overview of "decimal" type in common databases
+
+BigQuery NUMERIC type: "exact numeric value with 38 digits of precision and 9
+decimal digits of scale"
+https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#numeric-type
+
+TODO(jtattermusch)