What should our "static vector" type look like?

scottdixon · March 7, 2023, 5:17pm

Per Issue #6 in the CETL project I’d like to start a discussion about the design of our static vector type. I’ll repost the issue here:

Add a resizable container that is backed by a fixed capacity buffer. Options include:

1. Re-implement std::vector where max_size is the size of the internal storage (instead of just std::numeric_limits<size_t>::max()). Provide appropriate STL allocators to support static buffers. (This appears to be the approach ETL took.)
2. Port Nunavut’s VariableLengthArray and expand its functionality to be more generic.
3. Re-implement boost static vector but as a standalone type (i.e. without boost dependencies) (also see static_vector).

Thoughts?

pavel.kirienko · March 7, 2023, 5:33pm

I have no strong opinion either way here as long as the solution has a bit-level-storage specialization for booleans and its API is similar to std::vector. I think option 2 is probably the easiest at this point, no? If so, it might be pragmatic to choose that.

This issue is tangentially related also:

github.com/OpenCyphal/nunavut

C++: Serialization of bit arrays (both fixed and variable-length) is slow

opened 12:29PM - 03 Mar 23 UTC

pavel-kirienko

under consideration

Large bit arrays can be found even in the standard data type set, so this matter… may potentially affect common applications. In C, the memory storage format of bit arrays is the same as their wire representation, which enables serialization via `memcpy`. The application can manipulate the contents using `nunavutCopyBits`, `nunavutSetBit`, and `nunavutGetBit`. In Python, bit arrays are stored using NumPy arrays and serialized using `numpy.packbits`. In C++, the current implementation (de)serializes arrays bit-by-bit which is likely to cause performance issues. Sadly the C++ implementation cannot enforce a wire-compatible memory storage format because it has to be compatible with standard containers like `std::vector<bool>` and `std::bitset`. Are there any ideas on how to improve the bit array serialization without requiring the use of custom bit containers where the memory storage format is known? In the specialization of `VariableLengthArray<bool>` that I implemented in #284 the memory storage format is the same as the wire format but currently, the serialization methods cannot benefit from that.

Hypothetically speaking, what if Nunavut or Nunavut-generated code was able to determine if the chosen bit array/vector implementation uses a DSDL-compatible in-memory bit array representation and, in that case, resort to a faster memcpy-based serialization method instead of copying the bits one by one? It would only resort to the latter, slow, generic option if the generated code uses the standard containers that lack fast serialization hooks.
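
A minimal sketch of the dispatch idea above, not Nunavut's actual code. It assumes a hypothetical opt-in trait (`has_wire_compatible_bits`), a hypothetical container (`PackedBits`) that packs bits LSB-first into bytes, and a byte-aligned destination buffer. Containers that do not opt in, such as `std::vector<bool>`, take the slow bit-by-bit path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical opt-in trait: false unless a container specializes it.
template <typename Container>
struct has_wire_compatible_bits : std::false_type {};

// Hypothetical bit container that stores bits packed LSB-first, one bit per flag.
class PackedBits
{
public:
    explicit PackedBits(std::size_t bit_count) : size_(bit_count), bytes_((bit_count + 7U) / 8U, 0U) {}
    std::size_t size() const { return size_; }
    bool operator[](std::size_t i) const { return ((bytes_[i / 8U] >> (i % 8U)) & 1U) != 0U; }
    void set(std::size_t i, bool value)
    {
        assert(i < size_);  // padding bits in the last byte must stay zero
        const std::uint8_t mask = static_cast<std::uint8_t>(1U << (i % 8U));
        bytes_[i / 8U] = static_cast<std::uint8_t>(value ? (bytes_[i / 8U] | mask) : (bytes_[i / 8U] & ~mask));
    }
    const std::uint8_t* data() const { return bytes_.data(); }

private:
    std::size_t               size_;
    std::vector<std::uint8_t> bytes_;
};

template <>
struct has_wire_compatible_bits<PackedBits> : std::true_type {};

// Generic path: copy the bits one by one. Works with any indexable container.
template <typename Container>
void serialize_bits_impl(const Container& bits, std::uint8_t* out, std::false_type)
{
    std::memset(out, 0, (bits.size() + 7U) / 8U);
    for (std::size_t i = 0; i < bits.size(); ++i)
    {
        if (bits[i])
        {
            out[i / 8U] = static_cast<std::uint8_t>(out[i / 8U] | (1U << (i % 8U)));
        }
    }
}

// Fast path: the in-memory layout already matches the output layout.
template <typename Container>
void serialize_bits_impl(const Container& bits, std::uint8_t* out, std::true_type)
{
    std::memcpy(out, bits.data(), (bits.size() + 7U) / 8U);
}

// Entry point: picks the path at compile time from the trait.
template <typename Container>
void serialize_bits(const Container& bits, std::uint8_t* out)
{
    serialize_bits_impl(bits, out, typename has_wire_compatible_bits<Container>::type{});
}
```

Both paths are expected to produce identical bytes for the same logical bit sequence, so generated code could call `serialize_bits` without knowing which container the user selected.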

erik.rainey · March 7, 2023, 8:47pm

As I mentioned in the other post, I don’t agree that it should be modelled on vector at all as I think we want to aim for an exceptionless system here. It should be explicitly different to underscore the static nature as well. It’s not going to be that different given how simple things are here, but for example:

Constructors (basically the same, but no exceptions)

Default
Parameter w/ desired size - what to do if bigger than static size, assert?
Copy Constructor
Move - meaningless unless the objects inside need to be moved.
Initializer List

Operators:

Index operators (w/ assert on out of bounds or %?)

Reasonable, but irksome API:

size() - this is not the sizeof or even the total number of elements but the number of active, contiguous elements. (It should have been called count(), with size() returning the byte size of the active contiguous region so you could pair data() and size() in a memcpy/memset.)
capacity() - this is the actual statically given size, not some max() from numeric_limits.
clear() - sets the number of active elements to zero
push_back() - adds one element to the end until it hits the end and then what? (no return code!)
pop_back() - removes one element from the end until it hits begin() and then what? (no return code!)
emplace_back() - adds one element to the end until it passes the capacity and then what? (no return code)

Iterator API

begin/end - are these based on active elements or the static elements?
front/back (w/ asserts for zero size) - asserts on zero active elements?

Questionable/Undesirable API:

insert - would require a copy-shuffle all the way down the memory and still has no return code, just an iterator (expects exceptions)
erase - reverse problem for insert
shrink_to_fit - it literally can’t
resize - can’t do this either (past a certain point)
get_allocator - there isn’t one
reserve (past what it already has) - the API has no return code and we shouldn’t support exceptions

If we want to implement all these questionable APIs so that the type can be used with <algorithm>, then I would propose our own <algorithm> too. These APIs should be exceptionless and have a cleaner interface.
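
To make the shape of this proposal concrete, here is a minimal sketch under stated assumptions. It is not CETL code. The name `StaticVector` is hypothetical, `T` is assumed to be default-constructible and assignable so the storage can be a plain `std::array`, and the open questions in the list are answered arbitrarily: out-of-bounds indexing asserts, begin/end cover only the active elements, and push_back/pop_back return `bool` instead of `void`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical exceptionless container with a fixed, compile-time capacity.
template <typename T, std::size_t Capacity>
class StaticVector
{
public:
    StaticVector() = default;

    // An initializer list longer than Capacity is treated as a programming error.
    StaticVector(std::initializer_list<T> init)
    {
        assert(init.size() <= Capacity);
        for (const T& item : init)
        {
            (void) push_back(item);
        }
    }

    // Number of active elements, not the byte size of the storage.
    std::size_t size() const noexcept { return count_; }

    // The statically given size.
    static constexpr std::size_t capacity() noexcept { return Capacity; }

    // Returns false (and changes nothing) when the container is full.
    bool push_back(const T& value)
    {
        if (count_ == Capacity)
        {
            return false;
        }
        storage_[count_++] = value;
        return true;
    }

    // Returns false (and changes nothing) when the container is empty.
    bool pop_back() noexcept
    {
        if (count_ == 0U)
        {
            return false;
        }
        --count_;
        return true;
    }

    // Resets the active element count; the storage itself is left untouched.
    void clear() noexcept { count_ = 0U; }

    T& operator[](std::size_t index) noexcept
    {
        assert(index < count_);
        return storage_[index];
    }
    const T& operator[](std::size_t index) const noexcept
    {
        assert(index < count_);
        return storage_[index];
    }

    // Iteration covers only the active elements.
    T*       begin() noexcept { return storage_.data(); }
    T*       end() noexcept { return storage_.data() + count_; }
    const T* begin() const noexcept { return storage_.data(); }
    const T* end() const noexcept { return storage_.data() + count_; }

private:
    std::array<T, Capacity> storage_{};
    std::size_t             count_ = 0U;
};
```

Because push_back and pop_back return `bool`, this type deliberately does not satisfy the signatures that generic code written against std::vector expects.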

pavel.kirienko · March 7, 2023, 9:41pm

The desire to ditch the conventional API, however inconvenient it is in an exceptionless environment, is very unsettling as there is a large amount of generic code out there that expects containers and other basic entities to support the standard STL-like API. Ditching the standard APIs amounts to pronouncing CETL unusable in all applications except those that are specifically designed for CETL, which I understand is against its design goals. I am strongly against this proposal.

scottdixon · March 7, 2023, 10:02pm

We do say, in the CETL README:

Where CETL types provide functionality found in newer C++ standards the CETL version will prefer mimicking the standard over mimicking other support libraries like Boost.

That said, we did not write a tenet about compatibility with existing and newer C++ concepts (or Concepts) like container, but I tend to agree with Pavel here; if we can make it interoperable with STL we should.

I don’t think this means we require exceptions. Take a look at the much-beleaguered Nunavut VariableLengthArray. It works with -fno-exceptions and documents where undefined behaviour will occur and how the user must defend against it. For example, in the documentation for push_back we have:

///
/// Construct a new element on to the back of the array. Grows size by 1 and may
/// grow capacity
///
/// If exceptions are disabled the caller must check before and after to see if the 
/// size grew to determine success. If using exceptions this method throws 
/// {@code std::length_error} if the size of this collection is at capacity
/// or {@code std::bad_alloc} if the allocator failed to provide enough memory.
///
/// If exceptions are disabled use the following logic:
///
///     const size_t size_before = my_array.size();
///     my_array.push_back();
///     if (size_before == my_array.size())
///     {
///         // failure
///         if (size_before == my_array.max_size())
///         {
///             // length_error: you probably should have checked this first.
///         }
///         else
///         {
///             // bad_alloc: out of memory
///         }
///     } // else, success.
///
/// @throw std::length_error if the size of this collection is at capacity.
/// @throw std::bad_alloc if memory was needed and none could be allocated.
///
constexpr void push_back()

Now, is that ergonomic (to use Pavel’s favourite phrase)? Not in the least. But we could add try_push_back that returns error codes and support both/either with little code duplication between the two.
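
A minimal sketch of that layering, assuming hypothetical names (`FixedArray`, `PushResult`) and a plain `std::array` as storage; it is not the Nunavut or CETL implementation. `try_push_back` holds the only copy of the capacity check, and the std::vector-style `push_back` is a thin wrapper around it. The `__cpp_exceptions` feature-test macro is assumed to be defined by the compiler only when exceptions are enabled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#if defined(__cpp_exceptions)
#    include <stdexcept>
#endif

// Hypothetical error code for the non-throwing API.
enum class PushResult
{
    ok,
    at_capacity
};

// Hypothetical fixed-capacity array; assumes T is default-constructible and assignable.
template <typename T, std::size_t Capacity>
class FixedArray
{
public:
    std::size_t size() const noexcept { return size_; }
    static constexpr std::size_t max_size() noexcept { return Capacity; }

    // Non-throwing API: reports failure through the return value.
    PushResult try_push_back(const T& value)
    {
        if (size_ == Capacity)
        {
            return PushResult::at_capacity;
        }
        storage_[size_++] = value;
        return PushResult::ok;
    }

    // std::vector-style API built on top of try_push_back.
    void push_back(const T& value)
    {
        const PushResult result = try_push_back(value);
#if defined(__cpp_exceptions)
        if (result != PushResult::ok)
        {
            throw std::length_error("FixedArray is at capacity");
        }
#else
        // Without exceptions the caller compares size() before and after the call.
        (void) result;
#endif
    }

    const T& back() const noexcept
    {
        assert(size_ > 0U);
        return storage_[size_ - 1U];
    }

private:
    std::array<T, Capacity> storage_{};
    std::size_t             size_ = 0U;
};
```

Generic code written against std::vector keeps calling `push_back`, while code built with -fno-exceptions can call `try_push_back` and branch on the result directly.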