Removing Digraphs
- Document number: Dxxxx
- Date: 2026-05-10
- Audience: SG22, EWG
- Project: ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21
- Reply-to: Matthias Wippich <[email protected]>
Revision history
0.1. R0 May 2026
Original version of the paper.
1. Introduction
Digraphs are a complicated solution to a very old problem, and in a modern environment they cause more problems than they solve. They also severely limit the design space of C++, although, as we have seen with @P2996, we are already willing to special-case our way out of this pickle.
This, however, introduces an interesting problem: if your source encoding requires the use of digraphs, you **cannot use all of C++26** directly.
Since we are most likely going to continue seeing similar problems, this paper proposes to remove digraphs from the language entirely.
2. Design Space
As mentioned before, digraphs severely limit the design space of C++. This is not an entirely new insight: we have already run into issues because of digraphs and will most likely continue to run into new ones.
This leads to a fragmented language - some parts you can write if you need to use digraphs, some you cannot. At the same time we are accumulating workarounds (like @CWG1104), which lead to an excessively complex language.
2.1. Splicers
Splicers from @P2996 were accepted for C++26 with the syntax `[: r :]`. However, we are not allowed to use digraphs to spell this as `<:: r ::>`.
At first glance this seems to directly contradict the guarantee we are given in [[lex.digraph]/2](https://eel.is/c++draft/lex.digraph#2):
> In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling.

It actually does not. The tokens `[:` and `:]` are distinct preprocessing tokens rather than being composed from `[` and `:` (or `:` and `]` respectively). Therefore it does not matter whether `<:` is a valid alternative spelling for `[` - the splicer syntax does not contain `[` tokens.
Unfortunately, that does not help if your source encoding does not have square brackets. In such cases you cannot use this language feature directly - you would have to find some workaround (such as inventing an arbitrary replacement sequence that is expanded to the splicer tokens after transcoding).
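The guarantee quoted from [lex.digraph]/2 can be illustrated with a small sketch. It assumes a C++17 compiler that still accepts digraphs as currently specified; the function names and the `STRINGIZE` macro are hypothetical and exist only for this illustration.

```cpp
// Hypothetical helper macro: turns its argument into a string literal.
#define STRINGIZE(x) #x

// Sums an array, spelled only with the digraphs for [ ] { }.
constexpr int sum_with_digraphs() <%
    int values<:3:> = <% 1, 2, 3 %>;
    return values<:0:> + values<:1:> + values<:2:>;
%>

// The same function spelled with the primary tokens.
constexpr int sum_with_primary() {
    int values[3] = { 1, 2, 3 };
    return values[0] + values[1] + values[2];
}

// Alternative and primary tokens behave identically...
static_assert(sum_with_digraphs() == sum_with_primary());

// ...except for their spelling, which stringizing makes observable.
static_assert(sizeof(STRINGIZE(<:)) == 3);  // "<:" plus the terminator
static_assert(sizeof(STRINGIZE([)) == 2);   // "[" plus the terminator
static_assert(STRINGIZE(<:)[0] == '<');
```

Because `[:` is a single preprocessing token, nothing in this equivalence extends to the splicer syntax.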
3. Interpolated string literals
The design problems stemming from digraphs do not end there. Some of the recent discussions around string interpolation (@P3412, @P3951) brought up an interesting case.
In an interpolated string literal, the interpolated expression field is wrapped in curly braces. To parse an interpolated string literal you must therefore switch from regular string literal parsing to expression parsing as soon as you see a field introducer (`{`).
However, once you parse the interpolated expression, things get a little strange. `%>` is an alternative spelling of `}`. We have not yet returned to literal parsing, so lexing it would yield the correct token. So, should we be able to signify the end of the interpolation field with `%>`?
Since allowing anything but a literal `}` to terminate an interpolation field seems extremely surprising and will most likely not match user expectations, we are once again looking to disallow digraphs in this context.
Unfortunately that also means that we are once again looking to introduce a feature that you cannot directly use if your source encoding requires the use of the corresponding digraphs.
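The two possible readings can be made concrete with a toy scanner. This is only a sketch under loud assumptions: `field_end` is a hypothetical helper, not anything specified in @P3412 or @P3951, and it ignores nested braces, nested literals, and comments inside the field.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the index just past the terminator of an
// interpolation field whose contents start at `begin`, or npos if the field
// is never closed. When `digraph_terminates` is true, the digraph %> is
// accepted as a terminator in addition to a literal }.
constexpr std::size_t field_end(std::string_view text, std::size_t begin,
                                bool digraph_terminates) {
    for (std::size_t i = begin; i < text.size(); ++i) {
        if (text[i] == '}') {
            return i + 1;
        }
        if (digraph_terminates && text[i] == '%' &&
            i + 1 < text.size() && text[i + 1] == '>') {
            return i + 2;
        }
    }
    return std::string_view::npos;
}

// Field contents followed by the rest of a made-up literal.
constexpr std::string_view rest = "x%> and {y}";

// If %> may terminate the field, the field is just "x".
static_assert(field_end(rest, 0, true) == 3);
// If only a literal } terminates it, the field runs to the final brace.
static_assert(field_end(rest, 0, false) == 11);
```

The same characters yield two different fields depending on the rule chosen, which is why the digraph has to be special-cased one way or the other.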
4. Compatibility
In C++17 we removed support for trigraphs. Since that was quite a while ago, it is fair to assume that mitigations are in place for users who required trigraphs but wanted to target anything beyond C++14.
While the situation around digraphs is arguably different and might require extra preprocessing to work with source encodings that do not have square brackets or curly braces, mitigations will likely look similar to the ones required for trigraphs.
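As a rough idea of what such extra preprocessing could look like, the following sketch rewrites punctuation digraphs to their primary spellings before the source reaches the compiler. `replace_digraphs` is a hypothetical name, and the function is deliberately incomplete: it copies ordinary string and character literals verbatim, but it does not handle raw string literals, comments, header names, or digit separators, and it only approximates maximal munch.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical pre-transcoding step (illustration only, see caveats above).
std::string replace_digraphs(std::string_view src) {
    std::string out;
    std::size_t i = 0;
    // True if the unread input begins with `tok`.
    auto starts = [&](std::string_view tok) {
        return src.substr(i, tok.size()) == tok;
    };
    // Appends `text` to the output and consumes `n` input characters.
    auto emit = [&](std::string_view text, std::size_t n) {
        out += text;
        i += n;
    };
    while (i < src.size()) {
        const char c = src[i];
        if (c == '"' || c == '\'') {
            // Copy a quoted literal unchanged, honoring backslash escapes.
            std::size_t j = i + 1;
            while (j < src.size() && src[j] != c) {
                j += (src[j] == '\\') ? 2 : 1;
            }
            j = (j < src.size()) ? j + 1 : src.size();
            emit(src.substr(i, j - i), j - i);
        } else if (starts("<::") && !starts("<:::") && !starts("<::>")) {
            emit("<::", 3);  // existing rule: '<' stands alone before '::'
        } else if (starts("::") || starts("<<")) {
            emit(src.substr(i, 2), 2);  // keep '::>' and '<<:' from misparsing
        } else if (starts("%:%:")) {
            emit("##", 4);
        } else if (starts("%:")) {
            emit("#", 2);
        } else if (starts("<:")) {
            emit("[", 2);
        } else if (starts(":>")) {
            emit("]", 2);
        } else if (starts("<%")) {
            emit("{", 2);
        } else if (starts("%>")) {
            emit("}", 2);
        } else {
            emit(src.substr(i, 1), 1);
        }
    }
    return out;
}
```

For example, `replace_digraphs("int a<:3:> = <%1, 2, 3%>;")` yields `int a[3] = {1, 2, 3};`, while `std::vector<::std::string>` passes through unchanged.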
5. Wording
Make the following changes to the C++ Working Draft. All wording is relative to [N5032], the latest draft at the time of writing.
Lexical conventions [lex]
Preprocessing tokens [lex.pptoken]
Modify paragraph 5 as indicated.
5 If the input stream has been parsed into preprocessing tokens up to a given character:
5.1
If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as `R"`, the next preprocessing token shall be a raw string literal.
Between the initial and final double quote characters of the raw string, any transformations performed in phase 2 (line splicing) are reverted; this reversion is applied before any
5.2
Otherwise, if the next three characters are `<::` and the subsequent character is neither `:` nor `>`, the `<` is treated as a preprocessing token by itself and not as the first character of the alternative token `<:`.
5.3
Otherwise, if the next three characters are `[::` and the subsequent character is not `:`, or if the next three characters are `[:>`, the `[` is treated as a preprocessing token by itself and not as the first character of the preprocessing token `[:`.
[Note:
The tokens `[:` and `:]` cannot be composed from digraphs.
— end note]
5.4 Otherwise, the next preprocessing token is the longest sequence of [...]
Operators and punctuators [lex.operators]
Modify as indicated.
1 The lexical representation of C++ programs includes a number of preprocessing tokens that are used in the syntax of the preprocessor or are converted into tokens for operators and punctuators:
| | | |
| | | | | | | | |
| | | | | | | ||
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | ||
| | | | | | | ||
| | | |
Each
Alternative tokens [lex.digraph]
Rename to [lex.alt] and update all references accordingly.
1 Alternative token representations are provided for some operators and punctuators.
2 In all respects of the language, each alternative token behaves the same, respectively, as its primary token, except for its spelling.
[Note:
The “stringized” values ([cpp.stringize]) of and are different, maintaining the source spelling.
— end note]
The set of alternative tokens is defined in Table 3.
Modify Table 3
| Alternative | Primary | Alternative | Primary | ||
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | |
Remove footnote 10
🞰)
These include “digraphs” and additional reserved words.
The term “digraph” (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters.
Nonetheless, those alternative tokens that aren't lexical keywords are colloquially known as “digraphs”.
Preprocessing directives [cpp]
Argument substitution [cpp.subst]
Modify Example 1 as indicated.
[Example:
— end example]
Annex C (informative) [diff]
C++ and ISO C++ 2026 [diff.cpp26]
[lex] lexical conventions [diff.cpp26.lex]
Add new entry
Affected subclause: [lex.digraph]
Change: Removal of digraph support as a required feature.
Rationale: Resolves fragmentation of the language, opens up design space, and simplifies the language.
Effect on original feature: Valid C++ 2026 code that uses punctuation digraphs may not be valid or may have different semantics in this revision of C++. Implementations may choose to translate digraphs as specified in C++ 2026 if they appear outside of a raw string literal, as part of the implementation-defined mapping from input source file characters to the translation character set.
6. Acknowledgements
Thanks to Jan Schultke for the markup language and document generator used for this paper.