5 năm trước cách đây · 1d6d1a698d
--- a/docs/proposal_decimal_wkt.md
+++ b/docs/proposal_decimal_wkt.md
@@ -0,0 +1,348 @@
 
															+# Protobuf "decimal" well known type
														
 
															+
														
 
															+
														
 
															+Created: | 2019-11-14
														
 
															+
														
 
															+State:   | PROPOSED
														
 
															+
														
 
															+Author:  | jtattermusch@
														
 
															+
														
 
															+
														
 
															+## Summary
														
 
															+
														
 
															+Proposes a new "decimal" well known type as a standard representation for
														
 
															+accurate fractional (e.g. monetary) values and provide language bindings for
														
 
															+languages that have built-in language support for such a type.
														
 
															+
														
 
															+## Motivation
														
 
															+
														
 
															+"decimal" type is important for many applications (especially financial
														
 
															+applications) that need to express fractional values (such as monetary amounts)
														
 
															+accurately. While many programming languages offer a built-in support for
														
 
															+"decimal" type, protobuf currently doesn't offer an easy way to utilize those
														
 
															+built-in types and users are forced to resort to workarounds and to reinvent the
														
 
															+wheel.
														
 
															+
														
 
															+Better support for decimal type has been requested at
														
 
															+https://github.com/protocolbuffers/protobuf/issues/4406 and is heavily upvoted
														
 
															+(but there isn't much progress on the issue).
														
 
															+
														
 
															+## Detailed Explanation
														
 
															+
														
 
															+### Requirements for the new decimal well known type
														
 
															+
														
 
															+-   introduce the "decimal" type as a "well-known type" (as opposed to extending
														
 
															+    the protobuf wire protocol, which would be way too complicated). Protobuf
														
 
															+    library implementations in different languages will then provide
														
 
															+    language-specific bindings and helpers to integrate with the native
														
 
															+    "decimal" type supported by given language.
														
 
															+-   Provide message representation that is reasonably usable if the well known
														
 
															+    type doesn't have special handling in given language (= not all languages
														
 
															+    will have a special binding for the new well known type and the type must
														
 
															+    still be usable in its "raw form" as a protobuf message).
														
 
															+-   Match semantics of decimal type in popular programming languages (if
														
 
															+    possible, in different languages the built-in decimal type can have slightly
														
 
															+    different properties).
														
 
															+-   Stay consistent with other pre-existing commonly used .proto types so that
														
 
															+    the new WKT fits well into the existing ecosystem (e.g. there's a
														
 
															+    money.proto)
														
 
															+
														
 
															+### Rollout plan
														
 
															+
														
 
															+-   Decide which .proto definition of the "decimal" type approach to use (see
														
 
															+    below)
														
 
															+-   Check in the well known type .proto definition of the type
														
 
															+-   Language owners can provide their own bindings for the WKT
														
 
															+-   Because the "decimal" WKT will be designed to be usable even without special
														
 
															+    language bindings, initially the language-specific bindings for the WKT can
														
 
															+    be marked as "experimental" in each of the languages and they can be later
														
 
															+    stabilized once they prove themselves.
														
 
															+
														
 
															+### Proposal: message with "units" and "nanos"
														
 
															+
														
 
															+```
														
 
															+// file:fixed_decimal.proto
														
 
															+package google.protobuf;
														
 
															+
														
 
															+
														
 
															+message FixedDecimal {
														
 
															+
														
 
															+    // Whole units part of the amount
														
 
															+    int64 units = 1;
														
 
															+
														
 
															+    // Nano units of the amount (10^-9)
														
 
															+    // Must be same sign as units
														
 
															+    // Example: The value -1.25 is represented as units=-1 and nanos=-250000000
														
 
															+    sfixed32 nanos = 2;
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+Pros
														
 
															+
														
 
															+-   simple concept, easy to understand for users
														
 
															+-   values are human readable and easy to understand without specialized support
														
 
															+-   Very similar to .proto definitions that have already been used internally at
														
 
															+    google
														
 
															+-   Consistent with other widely used .proto definitions (such as money.proto)
														
 
															+-   range of values can be safely expressed by "decimal" type in all languages
														
 
															+    that support such type.
														
 
															+-   representable value range should be good enough to vast majority of
														
 
															+    applications (e.g. for financial applications)
														
 
															+-   Currently this is recommended for C# users by Microsoft (= which gives some
														
 
															+    indication that this approach works well in the .NET ecosystem).
														
 
															+    https://docs.microsoft.com/en-us/dotnet/architecture/grpc-for-wcf-developers/protobuf-data-types#decimals
														
 
															+
														
 
															+Cons
														
 
															+
														
 
															+-   lower range and resolution than "decimal" types in all languages (the good
														
 
															+    news is all common decimal types can represent the values carried by the
														
 
															+    proto, but converting from lang-specific decimal values can lose precision /
														
 
															+    throw exception)
														
 
															+
														
 
															+The type of "units" can be either `int64` or `sint64`. `sint64` is more
														
 
															+efficient if negative values are common, but
														
 
															+`money.proto`
														
 
															+uses `int64`. So the choice is either staying more consistent with `money.proto`
														
 
															+or aiming for the highest efficiency.
														
 
															+
														
 
															+The type of "nanos" can be either `int32`, `sfixed32` or `sint32`. `int32` is
														
 
															+what `money.proto` uses, but the signed types will be more efficient for
														
 
															+negative values. One advantage of `sfixed32` over `sint32` is that the "nanos"
														
 
															+value is likely to be a large number (numerical values like 0.5, 0.25 etc. are
														
 
															+more common than e.g. 0.000000001), and so `sfixed32` has potential to be more
														
 
															+efficient that `sint32` or `int32` (we'd need to do some measurements).
														
 
															+
														
 
															+The sign of "units" and "nanos" is required to be the same (money.proto uses the
														
 
															+same restriction). Nevertheless, there are other types (e.g. timestamp.proto),
														
 
															+where the "nanos" part is always positive. The former seems to make more sense
														
 
															+for representing decimal numbers, because e.g. -7.5 will be represented as
														
 
															+`units=-7` and `nanos=-500000000` and the represented value can always be
														
 
															+computed as `units + 0.000000001 * nanos` regardless of the sign.
														
 
															+
														
 
															+TODO(jtattermusch): do the signs of units vs nanos influence conversion speed
														
 
															+between the message fields and the numeric representation? it might for decimal
														
 
															+-> FixedDecimal direction of conversion
														
 
															+
														
 
															+The message name `FixedDecimal`(in `fixed_decimal.proto`) is chosen to make the
														
 
															+nature of the WKT explicit (and to express that this is not necessarily a 1:1 of
														
 
															+language's "decimal" type).
														
 
															+
														
 
															+### API Changes
														
 
															+
														
 
															+For languages that provide a built-in "decimal" type (see overview in Appendix),
														
 
															+language owners should add an API that allows easy conversion between
														
 
															+`FixedDecimal` WKT and the built-in "decimal" type in given language.
														
 
															+
														
 
															+The API changes in each language should be purely additive. It's fine for
														
 
															+language implementations to provide the bindings independent of other languages
														
 
															+(e.g. C# might provide the bindings sooner than some other language).
														
 
															+
														
 
															+The value range representable by the proposed FixedDecimal message (at most ~29
														
 
															+significant digits) is smaller than the representable range for all the
														
 
															+language-specific "decimal" types. Therefore, there should be no issue
														
 
															+converting FixedDecimal message to any of the decimal types. In the opposite
														
 
															+direction (setting a value from a "decimal" type into a FixedDecimal), there can
														
 
															+be an "out of range" error and language implentations should be consistent in
														
 
															+how this situation is handled.
														
 
															+
														
 
															+Language specific-APIs should generally allow accessing the FixedDecimal values
														
 
															+in two modes: 1. access the raw FixedDecimal value (the "units" and "nanos"
														
 
															+fields) 2. access the language-specific value as a "decimal" type
														
 
															+
														
 
															+#### C# bindings
														
 
															+
														
 
															+```
														
 
															+// extra methods for FixedDecimal generated class
														
 
															+public partial class FixedDecimal
														
 
															+{
														
 
															+    private const decimal NanoFactor = 1_000_000_000;
														
 
															+
														
 
															+    public decimal ToDecimal()
														
 
															+    {
														
 
															+        // IsNormalized checks for Units and Nanos having the same sign
														
 
															+        // and Nanos being from the right range.
														
 
															+        if (!IsNormalized(Units, Nanos))
														
 
															+        {
														
 
															+            throw new InvalidOperationException(@"Fixed decimal contains invalid values: Units={Units}; Nanos={Nanos}");
														
 
															+        }
														
 
															+        return Units + Nanos / NanoFactor;
														
 
															+    }
														
 
															+
														
 
															+    public static FixedDecimal FromDecimal(decimal value)
														
 
															+    {
														
 
															+        // ToInt64() throws OverflowException if value is out of range of int64
														
 
															+        var units = decimal.ToInt64(value);
														
 
															+        var nanos = decimal.ToInt32((value - units) * NanoFactor);
														
 
															+        return new FixedDecimal { Units = units, Nanos = nanos };
														
 
															+    }
														
 
															+}
														
 
															+
														
 
															+// TODO: we can also add extension methods for "decimal" type to convert into FixedDecimal values
														
 
															+// by invoking value.ToFixedDecimal();
														
 
															+
														
 
															+// TODO: in C# we can also define implicit conversion operators between
														
 
															+// FixedDecimal and decimal, but other WKT binding in protobuf C# don't do that
														
 
															+```
														
 
															+
														
 
															+inspired by
														
 
															+https://docs.microsoft.com/en-us/dotnet/architecture/grpc-for-wcf-developers/protobuf-data-types#creating-a-custom-decimal-type-for-protobuf
														
 
															+and the way bindings for other WKTs are currently implemented in protobuf C#.
														
 
															+
														
 
															+#### Python bindings
														
 
															+
														
 
															+TODO: add design
														
 
															+
														
 
															+#### Java bindings
														
 
															+
														
 
															+TODO: add design for conversion between FixedDecimal message and BigDecimal type
														
 
															+
														
 
															+#### C++ bindings
														
 
															+
														
 
															+No special bindings: C++ doesn't have a standard way to represent the "decimal"
														
 
															+type, and until such type exists, users can just access the raw message (the
														
 
															+"units" and "nanos" fields directly).
														
 
															+
														
 
															+#### Go bindings
														
 
															+
														
 
															+TODO: add design
														
 
															+
														
 
															+#### Ruby bindings
														
 
															+
														
 
															+TODO: add design
														
 
															+
														
 
															+#### Obj-C bindings
														
 
															+
														
 
															+TODO: add design
														
 
															+
														
 
															+### JSON Representation of FixedDecimal
														
 
															+
														
 
															+Many WKT types provide a custom JSON serialization format (e.g.
														
 
															+timestamp.proto).
														
 
															+
														
 
															+One way to represent the decimal number in JSON is with a string (e.g.
														
 
															+`"123.25"`) that complies with the Java's
														
 
															+[decimal floating point literal](https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-DecimalFloatingPointLiteral).
														
 
															+
														
 
															+On the other hand, just using the default JSON format (2 fields "units" and
														
 
															+"nanos") doesn't require a special JSON handling to be implemented in all
														
 
															+languages and is more consistent with TextFormat and the representation of the
														
 
															+message itself. 2 fields "units" and "nanos" seem to be human readable enough
														
 
															+for this to be an option.
														
 
															+
														
 
															+### Open-Source Plan
														
 
															+
														
 
															+Better support for decimal type has been requested externally at
														
 
															+https://github.com/protocolbuffers/protobuf/issues/4406 and is heavily upvoted
														
 
															+(but there isn't much progress on the issue).
														
 
															+
														
 
															+This change should definitely be released to open-source.
														
 
															+
														
 
															+## Drawbacks
														
 
															+
														
 
															+Not much beyond the risk of coming with bad design that provided suboptimal
														
 
															+developer experience.
														
 
															+
														
 
															+## Alternatives Considered
														
 
															+
														
 
															+Several different .proto representations are proposed in this doc. Approach 1
														
 
															+seems to have the best pros/cons.
														
 
															+
														
 
															+### Alternative 1: Use integer value and "scale/exponent"
														
 
															+
														
 
															+```
														
 
															+message Decimal {
														
 
															+ int64 value = 1;
														
 
															+ int32 scale = 2;
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+-   slightly less human-readable than the approach proposed.
														
 
															+-   no protection against using too high/too low exponents which can lead to
														
 
															+    losing significant digits if not used carefully.
														
 
															+
														
 
															+### Alternative 2: Low level representation
														
 
															+
														
 
															+```
														
 
															+message Decimal {
														
 
															+    // 96-bit mantissa broken into two chunks
														
 
															+    // this representation matches exactly the C# decimal spec
														
 
															+    // but not such a good match for other languages
														
 
															+    uint64 mantissa_msb = 1;
														
 
															+    uint32 mantissa_lsb = 2;
														
 
															+    required sint32 exponent_and_sign = 3;
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+Pros
														
 
															+
														
 
															+-   efficient on the wire
														
 
															+-   1:1 mapping between the protobuf message and the language type (for some
														
 
															+    languages)
														
 
															+
														
 
															+Cons
														
 
															+
														
 
															+-   matches spec of some implementations of "decimal" exactly, but doesn't match
														
 
															+    analogous type in other languages
														
 
															+-   too low level to be used as a raw message (=problem for cross-language
														
 
															+    interoperability)
														
 
															+-   not intelligible by humans unless good language bindings are provided
														
 
															+
														
 
															+### Alternative 3: String representation
														
 
															+
														
 
															+```
														
 
															+message DecimalValue {
														
 
															+
														
 
															+ /* This string contains a decimal floating point literal in the format
														
 
															+  * defined by the Java Language Specification: https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-DecimalFloatingPointLiteral
														
 
															+  */
														
 
															+
														
 
															+ string value = 1;
														
 
															+
														
 
															+}
														
 
															+```
														
 
															+
														
 
															+-   can represent numbers with arbitrary precision (some languages support that)
														
 
															+-   high processing overhead due to string <-> number conversions
														
 
															+-   bad efficiency on the wire
														
 
															+
														
 
															+## Unresolved Questions
														
 
															+
														
 
															+-   The request for this feature has been initiated by external users and it
														
 
															+    would be good to somehow enable an open review process (external
														
 
															+    contributors can bring useful insights and help prototyping the solution)
														
 
															+
														
 
															+## Appendix 1: Overview of built-in "decimal" type in programming languages
														
 
															+
														
 
															+The key features of "decimal" type are that its internal representation must be
														
 
															+base-10 (as opposed to base-2 used for floating point types) and that it
														
 
															+provides some level of rounding protection (in the sense that it forbids values
														
 
															+that would require storing more significant digits than what the type itself is
														
 
															+capable of - this is what can lead to imprecise representation of floating point
														
 
															+values).
														
 
															+
														
 
															+C#: decimal 128bit in total, 96-bit bit mantissa, exponent -28..0 (28-29
														
 
															+significant digits).
														
 
															+
														
 
															+Python: decimal 28 digits precision by default (can be adjusted)
														
 
															+
														
 
															+Java: BigDecimal uses arbitrary precision, the type is slightly heavy-weight
														
 
															+
														
 
															+Go: no officially recommended type, but there are community provided libraries
														
 
															+with decimal support
														
 
															+
														
 
															+Ruby: BigDecimal uses arbitrary precision
														
 
															+
														
 
															+ObjC: NSDecimal stores up to 38digits, exponent -128 - 127
														
 
															+
														
 
															+Swift: Decimal essentially a value-type wrapper around NSDecimal, so same
														
 
															+constraints/bugs as ObjC
														
 
															+
														
 
															+https://en.wikipedia.org/wiki/Decimal_data_type
														
 
															+
														
 
															+## Appendix 2: Overview of "decimal" type in common databases
														
 
															+
														
 
															+BigQuery NUMERIC type: "exact numeric value with 38 digits of precision and 9
														
 
															+decimal digits of scale"
														
 
															+https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#numeric-type
														
 
															+
														
 
															+TODO(jtattermusch)