Given:
X ::= BIT STRING {a(0), b(1)} (SIZE(2, ...))
what is the correct length for the PER encoding to use for the value '00'B?
I argue that the encoding should use a length of 0, because X.691 16.3 calls for trimming to "the smallest size capable of carrying this value and satisfies the effective size constraint", and the constraint on X, by being extensible, allows for values having length 0.
However, "satisfies the effective size constraint" doesn't seem to be very well defined, and I know of tools that encode using a length of 2.
Nonetheless, even if the above phrase is not well defined, suppose we had:
X-V2 ::= BIT STRING {a(0), b(1)} (SIZE(2, ..., 0))
then I think it becomes hard to say, under any interpretation, that a length of 0 does not satisfy the effective size constraint. That means for X-V2, '00'B would be encoded using a length of 0. Furthermore, I believe in order to enable canonical PER encodings, X and X-V2 must encode '00'B in the same way, which means also using a length of 0 for '00'B for type X.
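To make the competing readings of "satisfies the effective size constraint" concrete, here is a minimal Python sketch. It is not taken from X.691 or from any tool; the function `trimmed_length` and the three predicates are hypothetical names introduced only for illustration, and each predicate encodes one assumed reading of which lengths a constraint permits.

```python
from typing import Callable

def trimmed_length(bits: str, satisfies: Callable[[int], bool], limit: int = 64) -> int:
    """Return the smallest length that can carry `bits` and satisfies the constraint.

    `bits` is a string of '0'/'1' characters. Trailing zero bits are treated as
    not significant (named bit list), so the search starts at the number of
    bits left after stripping them and adds zero bits back as needed.
    """
    significant = len(bits.rstrip("0"))
    for length in range(significant, limit + 1):
        if satisfies(length):
            return length
    raise ValueError("no permitted length can carry this value")

# Assumed readings of which lengths "satisfy" the constraint (illustrative only).
x_root_only = lambda n: n == 2        # X: only the root SIZE(2) counts
x_any_length = lambda n: True         # X: the extension marker admits every length
x_v2_listed = lambda n: n in (0, 2)   # X-V2: root 2 plus the listed addition 0

print(trimmed_length("00", x_root_only))   # 2
print(trimmed_length("00", x_any_length))  # 0
print(trimmed_length("00", x_v2_listed))   # 0
```

Under the root-only reading X encodes '00'B with length 2, while X-V2 gives 0 as soon as the listed addition is counted; that mismatch is the canonical-encoding problem described above.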
To my mind, the ambiguity arises from exactly what values are considered as making up, or being "present", in X, because X.691 gives these relevant definitions:
3.7.8 effective size constraint (for a constrained string type): A single finite size constraint that could be applied to a built-in string type and whose effect would be to permit all and only those lengths that can be present in the constrained string type.
10.3.10 The effective size constraint for a constrained type is a single size constraint such that a size is permitted if and only if there is some value of the constrained type that has that (permitted) size.
In a couple of places, X.680 leads us to believe that an extensible type consists of the values in the root plus any values in the extension additions:
Annex I 4.2.3 :
A1 ::= INTEGER (1..32, ... , 33..128)
A1 is extensible, and contains values 1 to 128 with 1 to 32 in the root and 33 to 128 as extension additions.
50.1:
NOTE 6 – The elements that are referenced by "ElementSetSpecs" is the union of the elements referenced by the "RootElementSetSpec" and "AdditionalElementSetSpec" (when present).
But then it also seems to say that an extensible type contains all of the same values that are in the type being constrained:
6:
In formal terms, an abstract syntax defined by the extensible type X contains not only the values of type X, but also the values of all types that are extension-related to X.
and 3.8.38:
extension-related: Two types that have the same extension root, where one was created by adding zero or more extension additions to the other.
I think section 6 rules here and the language in Annex I 4.2.3 and in 50.1 note 6 is being imprecise.
Update: Discussion of X.691 16.3 and 16.6
Alessandro's answer (thanks for your input) applies 16.3 only after 16.6 is applied. Here's my interpretation of the relationship between the two.
First, 16.3 is an unqualified statement. It does not say it applies only if the PER-visible constraint is not extensible, or, in the case where the constraint is extensible, only if the length of the BIT STRING falls inside the extension root of the constraint. It also does not say anything that requires us to look at 16.6 to understand how to apply 16.3. It uses the phrase "effective size constraint", but that is a phrase that is elsewhere defined without depending on 16.6, although the definition is somewhat unclear, as I noted above. Apparently, though, an "effective size constraint" can be extensible because Annex B.3 has an IA5String example with an effective size constraint that is extensible:
A13 ::= IA5String (SIZE(1..10, ...) ^ FROM("A".."D"))
-- A13 has an extensible effective size constraint of SIZE(1..10,...)
Second, when 16.6 says "In the latter case the length and value shall be encoded as if no extension is present in the constraint." it does not say to go back to 16.1 and start over, or even to invoke 16.3 at that time. Rather, I take this phrase to simply mean that we should act as if there were no extension marker as we continue on to the next point, 16.7, which says "If an extension marker is not present in the constraint specification of the bitstring type, then 16.8 to 16.11 apply." Indeed, 16.8 - 16.11 are concerned with how to encode the length, which means that the two branches of 16.6 both end up specifying how the length is encoded. That makes this a natural reading of 16.6.
Third, 16.6 requires us to take one of two branches depending on whether the "length of this encoding" is in the extension root or not. The "length of this encoding" can only mean the "length of the string value" or the "length that is being used in the encoding". It can't mean the "length that is being used in the encoding" unless 16.3 logically applies before 16.6.
Now, suppose we have
Y ::= BIT STRING {a(0), b(1)} (SIZE(0..4, ...))
and suppose that in 16.6 we are to understand "length of this encoding" to mean simply the "length of the string" and that 16.3 only applies when the length of the string is in the extension root. Then, a Y of '11000'B is not trimmed (5 not in the root), but a Y of '1100'B is trimmed (to 2; 4 is in the root) - a result which seems somewhat odd (that one is trimmed but the longer one is not).
Moreover, for a Y value of '1100'B, in applying 16.6, we treated "the length of this encoding" as 4 (the length of the string) but then the length in the encoding is 2. That seems a very strange use of English indeed!
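The oddity with Y can be reproduced with a small Python sketch of that reading, in which 16.3 is assumed to apply only when the length of the string is in the extension root. The function name and the `Y_ROOT` constant are hypothetical, and the sketch models only the length that ends up in the encoding, not the encoded bits.

```python
def length_if_trimming_is_root_gated(bits: str, root: range) -> int:
    """Length used in the encoding when trimming is assumed to be gated on the root.

    `bits` is a string of '0'/'1' characters; `root` holds the lengths in the
    extension root of the size constraint.
    """
    if len(bits) not in root:
        # Assumption under this reading: strings outside the root are not trimmed.
        return len(bits)
    significant = len(bits.rstrip("0"))
    # Smallest root length that can still carry the significant bits.
    return min(n for n in root if n >= significant)

Y_ROOT = range(0, 5)  # SIZE(0..4, ...)

print(length_if_trimming_is_root_gated("11000", Y_ROOT))  # 5
print(length_if_trimming_is_root_gated("1100", Y_ROOT))   # 2
```

The longer string keeps all five bits while the shorter one shrinks to two, which is the asymmetry noted above.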
It is 16.3 that creates the possibility that the length of the string and the length used in the encoding might not be equal. When 16.6 then refers to the "length of this encoding", the natural reading is to assume that 16.3, which introduced the idea, logically precedes 16.6 and has therefore already been applied to determine the length that will be used in the encoding, which is what "length of this encoding" refers to. The wording itself - length of this encoding - suggests this reading.
Under my interpretation, there are cases where a value is encoded as an extension value when it would seem more sensible to encode it as a root value. This happens whenever the root excludes values in the range 0..k, because the trimming can then result in a size shorter than what the root covers. Nonetheless, I think that is what the spec calls for, on the most natural reading. Personally, I think it is clear 16.3 logically precedes 16.6 and the ambiguity arises in what it means for a size to "satisfy" an "effective size constraint" in 16.3. Does 0 "satisfy" an effective size constraint of SIZE(2, ...)? I think we have to say it does; at least the spec is not clear that it doesn't, as far as I can see.
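As a rough sketch of my interpretation (16.3 first, then 16.6), the following Python models only the two decisions involved: the length left after trimming, and whether that length falls in the root. The name `plan_trim_first` is hypothetical, and the assumption that every length satisfies an extensible effective size constraint is exactly the point in dispute, not something the spec states outright.

```python
from typing import Container, Tuple

def plan_trim_first(bits: str, root: Container[int]) -> Tuple[int, int]:
    """Return (extension_bit, length) when trimming is applied before 16.6.

    Assumption: an extensible effective size constraint is satisfied by every
    length, so trimming removes all trailing zero bits.
    """
    length = len(bits.rstrip("0"))
    # 16.6 then branches on whether the trimmed length is in the extension root.
    extension_bit = 0 if length in root else 1
    return extension_bit, length

X_ROOT = {2}          # SIZE(2, ...)
Y_ROOT = range(0, 5)  # SIZE(0..4, ...)

print(plan_trim_first("00", X_ROOT))     # (1, 0)
print(plan_trim_first("11", X_ROOT))     # (0, 2)
print(plan_trim_first("11000", Y_ROOT))  # (0, 2)
```

The first result shows the awkward case: for X, '00'B becomes an extension value of length 0 because the root excludes length 0.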
(End of update)
Update 2: Counterarguments concerning 16.3 and 16.6
First, as Alessandro notes, 17.3 (for octet strings) and 20.4 (for sequence of) parallel 16.6, and the one refers to the "length of this encoding" while the other refers to the "number of components in this encoding", both of which really just refer to the length or to the number of components of the value. This gives us reason to think 16.6 also refers to just the length of the value when it says "length of this encoding". If it weren't for 16.3, they would be one and the same, but given 16.3, there is the potential for confusion. Still, this might just be an unintentional and unfortunate choice of words, and it could be they meant to refer to the length of the original value.
Second, it is pretty clear that when 16.6 requires encoding "as if no extension is present in the constraint", this applies not just to the following clauses but also to a prior clause, namely 16.4, since, otherwise, extensible types would never use an optimized encoding for the length (ub would be unset in such cases). So, if the "as if" applies to one clause before 16.6, then why not to another clause, namely 16.3? Then for type X, '00'B has length of 2, which is in the root, so according to 16.6 we encode it as if the extension weren't present. That means 16.3 does not trim the string, as the effective size constraint is then SIZE(2), not SIZE(2, ...).
This interpretation raises a new question: what about an X of '1110'B? The length is not in the extension root, so 16.6 does not have us encode it as if the extension were not present and, supposedly, 16.3 does not apply (though it is not absolutely clear that 16.3 only applies to "root" values). Thus, X.691 does not clearly require trimming the trailing zero bit in this case. However, Paul Thorpe argues it should be trimmed nonetheless. After all, X.680 22.7 tells us that '1110'B and '111'B should be treated (by application designers) as having the same semantics, and encoding rules can arbitrarily add or remove trailing bits. Therefore, some encoding rules could encode '1110'B using the exact same bits as for '111'B. However, this doesn't mean that all encoding rules must do so, nor does it dictate that PER, in particular, must do so. Still, it does mean that if a PER implementation trims '1110'B to '111'B when encoding, a user at least can't complain they got back an unexpected value of '111'B after decoding. However, this doesn't answer what the correct encoding is, and there must be a single correct encoding if we're to have canonical encodings.
A question similar to the one for '1110'B arises for an X of '1100'B. Its length, 4, is not in the root, but it is semantically equivalent to '11'B, whose length is in the root. It seems not unreasonable to encode '1100'B the same as '11'B, but the specs don't make it absolutely clear that this is what PER requires. Similarly for an X of ''B: its length, 0, is not in the root, but it is semantically equivalent to '00'B, which, under this alternative interpretation, should be encoded as length 2.
IMO, ITU-T needs to make some clarifications here as I don't think the arguments one way or the other are decisive. I believe the specification is inherently vague.
(End of update 2)
Basically, I'm looking for a contrary argument or for someone to point out what I've missed, if I have missed something.