Removing Digraphs

Document number: Dxxxx
Date: 2026-05-10
Audience: SG22, EWG
Project: ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21
Reply-to: Matthias Wippich <[email protected]>

This paper proposes removal of digraphs from the C++ language.

Revision history

0.1. R0 May 2026

Original version of the paper.

1. Introduction

Digraphs are a complicated solution to a very old problem, and in a modern environment they cause more problems than they solve. They also severely limit the design space of C++, although, as @P2996 has shown, we are already willing to special-case our way out of this pickle.

This, however, introduces an interesting problem: if you need to use a source encoding that requires digraphs, you **cannot use all of C++26** directly.

Since we are most likely going to continue seeing similar problems, this paper proposes to remove digraphs from the language entirely.

2. Design Space

As mentioned before, digraphs severely limit the design space of C++. This isn't an entirely new insight: we have already run into issues because of digraphs and will most likely continue to run into new ones.

This leads to a fragmented language: some parts you can write if you need to use digraphs, some you can't. At the same time we're accumulating workarounds (like @CWG1104), which lead to an excessively complex language.
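To make the kind of workaround meant here concrete, the following minimal sketch shows the `<::` special case in action. The names `Widget` and `count_widgets` are made up for illustration. Under plain maximal munch, `<::Widget` would start with the digraph `<:` (that is, `[`); the special case makes `<` a token of its own instead.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstddef>
#include <vector>

// Hypothetical type, only here to have something to qualify with "::".
struct Widget { int id; };

inline std::size_t count_widgets() {
    // "<::" followed by 'W': the lexer must NOT form the digraph "<:".
    // It yields "<" and "::", so this declares a vector of ::Widget.
    std::vector<::Widget> widgets;
    widgets.push_back(::Widget{7});
    return widgets.size();
}
```

Calling `count_widgets()` is expected to return 1. The point is only that the declaration compiles at all, which it does because of a lexer rule that exists solely due to digraphs.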

2.1. Splicers

Splicers from @P2996 were accepted for C++26 with the proposed syntax [: expr :]. However, we are not allowed to use digraphs to spell this as <:: expr ::>.

While that seems to be in direct contradiction to the guarantees we're given in [[lex.digraph]/2](https://eel.is/c++draft/lex.digraph#2):

> In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling.

it actually isn't. The tokens [: and :] are distinct preprocessing tokens rather than being composed from [ and : (or : and ] respectively). Therefore it doesn't matter whether <: is a valid alternative spelling for [: the splicer syntax does not contain [ tokens.

Unfortunately that doesn't help if your source encoding does not have square brackets. In such cases you cannot use this language feature directly; you'd have to find some workaround (such as inventing an arbitrary replacement sequence that is expanded to [: after transcoding).
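For ordinary bracket and brace tokens, the guarantee quoted above does hold in today's language. The sketch below assumes an implementation with digraphs enabled; the function and macro names are invented for illustration. It shows a function written entirely with digraphs, plus the one observable difference, namely the spelling seen by the `#` operator.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstring>

#define STRINGIZE(x) #x

// Written with digraphs: <: :> stand for [ ], and <% %> stand for { }.
inline int second_element() <%
    int values<:3:> = <% 10, 20, 30 %>;
    return values<:1:>;
%>

// Stringizing preserves the source spelling, so these two differ.
inline const char* primary_spelling() { return STRINGIZE([); }
inline const char* alternative_spelling() { return STRINGIZE(<:); }
```

Here `second_element()` returns 20, `primary_spelling()` yields `"["` and `alternative_spelling()` yields `"<:"`. No comparable digraph spelling exists for the splicer tokens, which is the gap described above.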

3. Interpolated string literals

The design problems stemming from digraphs do not end there. In some of the recent discussions around string interpolation (@P3412, @P3951) some interesting code was brought up. Consider the following:

t"foo { bar %> baz"

In an interpolated string literal, the interpolated expression field is wrapped in curly braces. To parse an interpolated string literal you must therefore switch between regular string literal parsing and expression parsing as soon as you see a field introducer ({).

However, once you parse the interpolated expression, things get a little strange. %> is an alternative spelling of }. We haven't yet returned to literal parsing, so lexing %> here would yield a } token. So, should we be able to signify the end of the interpolation field with %>?

Since allowing anything but a literal } to terminate an interpolation field seems extremely surprising and will most likely not match user expectations, we are once again looking to disallow digraphs in this context.

Unfortunately that also means that we are once again looking to introduce a feature that you cannot directly use if your source encoding requires the use of the corresponding digraphs.
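The choice can be modelled with a small toy scanner. This is not the parsing algorithm of @P3412 or @P3951; `field_end` and its `digraph_closes` parameter are hypothetical, and the model ignores nested braces, nested literals and comments. It only shows how the same literal text ends up with or without a closed field depending on whether %> counts as a terminator.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index just past the token that closes the
// first interpolation field in `lit`, or std::string::npos if the field is
// never closed. Toy model only: no nesting, no nested literals, no comments.
inline std::size_t field_end(const std::string& lit, bool digraph_closes) {
    const std::size_t open = lit.find('{');
    if (open == std::string::npos) return std::string::npos;
    for (std::size_t i = open + 1; i < lit.size(); ++i) {
        if (lit[i] == '}') return i + 1;              // literal closing brace
        if (digraph_closes && lit[i] == '%' &&
            i + 1 < lit.size() && lit[i + 1] == '>')  // digraph spelling of }
            return i + 2;
    }
    return std::string::npos;
}
```

For the text `foo { bar %> baz`, `field_end(text, true)` reports a closed field, while `field_end(text, false)` reports an unterminated one. The second behaviour corresponds to disallowing digraphs in this context.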

4. Compatibility

In C++17 we removed support for trigraphs. Since that was quite a while ago, it is fair to assume that mitigations are in place for users who required trigraphs but wanted to target anything beyond C++14.

While the situation around digraphs is arguably different and might require extra preprocessing to work with source encodings that do not have square brackets or curly braces, mitigations will likely look similar to the ones required for trigraphs.
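As a rough idea of what such extra preprocessing could look like, here is a toy pre-translation pass that rewrites the four punctuation digraphs to their primary spellings. It is a sketch under stated assumptions, not a proposed tool: `replace_digraphs` is a made-up name, only ordinary string and character literals are skipped, and raw string literals, comments, digit separators, %: and %:%: are not handled. Maximal munch is only approximated, for `::` and the `<::` special case.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstddef>
#include <string>

// Hypothetical mitigation sketch: rewrite <% %> <: :> outside "..." and '...'.
// NOT handled: raw strings, comments, digit separators, %: and %:%:.
inline std::string replace_digraphs(const std::string& src) {
    std::string out;
    char quote = '\0';  // active literal delimiter, '\0' when outside literals
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = i + 1 < src.size() ? src[i + 1] : '\0';
        if (quote != '\0') {  // inside a literal: copy verbatim
            out += c;
            if (c == '\\' && next != '\0') { out += next; ++i; }
            else if (c == quote) quote = '\0';
            continue;
        }
        if (c == '"' || c == '\'') { quote = c; out += c; continue; }
        if (c == ':' && next == ':') { out += "::"; ++i; continue; }
        if (c == '<' && next == ':') {
            // Mirror the "<::" special case: "<" stays a token of its own
            // unless the character after "<::" is ':' or '>'.
            const char third = i + 2 < src.size() ? src[i + 2] : '\0';
            const char fourth = i + 3 < src.size() ? src[i + 3] : '\0';
            if (third == ':' && fourth != ':' && fourth != '>') { out += c; continue; }
            out += '['; ++i; continue;
        }
        if (c == '<' && next == '%') { out += '{'; ++i; continue; }
        if (c == '%' && next == '>') { out += '}'; ++i; continue; }
        if (c == ':' && next == '>') { out += ']'; ++i; continue; }
        out += c;
    }
    return out;
}
```

For example, `int a<:3:> = <% 1, 2, 3 %>;` becomes `int a[3] = { 1, 2, 3 };`, while the contents of string literals and `std::vector<::Widget>` are left untouched. A production-quality pass would need a real lexer; this sketch only illustrates the shape of the mitigation.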

5. Wording

Make the following changes to the C++ Working Draft. All wording is relative to [N5032], the latest draft at the time of writing.

Lexical conventions [lex]

Preprocessing tokens [lex.pptoken]

Modify paragraph 5 as indicated

5 If the input stream has been parsed into preprocessing tokens up to a given character:

5.1 If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as `R"`, the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phase 2 (line splicing) are reverted; this reversion is applied before any d-char, r-char, or delimiting parenthesis is identified. The raw string literal is defined as the shortest sequence of characters that matches the raw-string pattern

encoding-prefixopt R raw-string

5.2 Otherwise, if the next three characters are <:: and the subsequent character is neither : nor >, the < is treated as a preprocessing token by itself and not as the first character of the alternative token `<:`.

5.3 Otherwise, if the next three characters are [:: and the subsequent character is not :, or if the next three characters are [:>, the [ is treated as a preprocessing token by itself and not as the first character of the preprocessing token [:.

[Note: The tokens [: and :] cannot be composed from digraphs. — end note]

5.4 Otherwise, the next preprocessing token is the longest sequence of [...]

Operators and punctuators [lex.operators]

Modify as indicated.

1 The lexical representation of C++ programs includes a number of preprocessing tokens that are used in the syntax of the preprocessor or are converted into tokens for operators and punctuators:

preprocessing-op-or-punc:
    preprocessing-operator
    operator-or-punctuator
preprocessing-operator: one of
    # ## %: %:%:
operator-or-punctuator: one of
    { } [ ] ( ) [: :]
    <% %> <: :> ; : ...
    ? :: . .* -> ->* ^^ ~
    ! + - * / % ^ & |
    = += -= *= /= %= ^= &= |=
    == != < > <= >= <=> && ||
    << >> <<= >>= ++ -- ,
    and or xor not bitand bitor compl
    and_eq or_eq xor_eq not_eq

Each operator-or-punctuator is converted to a single token in translation phase 6 ([lex.phases]).

Alternative tokens [lex.digraph]

Rename to [lex.alt] and update all references accordingly.

1 Alternative token representations are provided for some operators and punctuators.

2 In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling.

[Note: The “stringized” values ([cpp.stringize]) of [ and <: are different, maintaining the source spelling. — end note]

The set of alternative tokens is defined in Table 3.

Modify Table 3

Alternative  Primary  Alternative  Primary  Alternative  Primary
<%           {        and          &&       and_eq       &=
%>           }        bitor        |        or_eq        |=
<:           [        or           ||       xor_eq       ^=
:>           ]        xor          ^        not          !
%:           #        compl        ~        not_eq       !=
%:%:         ##       bitand       &

Remove footnote 10

🞰) These include “digraphs” and additional reserved words. The term “digraph” (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters. Nonetheless, those alternative tokens that aren't lexical keywords are colloquially known as “digraphs”.

Preprocessing directives [cpp]

Argument substitution [cpp.subst]

Modify Example 1 as indicated.

[Example:

#define LPAREN() (
#define G(Q) 42
#define F(R, X, ...) __VA_OPT__(G R X) )
int x = F(LPAREN(), 0, [-); // replaced by int x = 42;

end example]

Annex C (informative) [diff]

C++ and ISO C++ 2026 [diff.cpp26]

[lex] lexical conventions [diff.cpp26.lex]

Add new entry

Affected subclause: [lex.digraph]

Change: Removal of digraph support as a required feature.

Rationale: Resolves fragmentation of the language, opens up design space and simplifies the language.

Effect on original feature: Valid C++ 2026 code that uses the punctuation digraphs <%, %>, <: and :> may be invalid or have different semantics in this revision of C++. Implementations may choose to translate digraphs as specified in C++ 2026 if they appear outside of a raw string literal, as part of the implementation-defined mapping from input source file characters to the translation character set.

6. Acknowledgements

Thanks to Jan Schultke for the markup language and document generator used for this paper.


7. References

[N5032] Thomas Köppe. Working Draft, Programming Languages — C++. 2025-12-15. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/n5032.pdf