Faster Rust Serialization
The tech industry happily wastes a lot of resources on serializing and deserializing JSON with its inefficient plain text format. But sadly, JSON is currently (still) the standard for sending data over the internet.
Nevertheless, we can at least try to make serialization and deserialization as efficient as possible!
In this blog post, we will see how you can improve the serialization performance of serde in Rust.
We will take a look at a simple example and improve its performance by up to 2.25x 🚀
Disclaimer
The second part of this blog post, about formatters, is not for Rust beginners.
If you are just starting with Rust, don't confuse yourself with the details in this post.
Just use serde with the derive macro and you will get very decent performance without further effort.
You are already wasting far fewer resources by using Rust instead of a language like Python or JavaScript 😉
The basics of the following concepts are required:
- Iterators
- Traits
- Generics
- Lifetimes
The problem
For our example, let's assume we have this struct:
```rust
struct Name {
    first_name: String,
    last_name: String,
}
```
We want to use the full name representation when formatting it (first and last name separated by a space).
Let's implement the Display trait to define that representation:
```rust
use std::fmt::{self, Display};

impl Display for Name {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} {}", self.first_name, self.last_name)
    }
}
```
This implementation allows us to use println!, for example, to print instances of this struct:
```rust
let name = Name {
    first_name: "Max".to_string(),
    last_name: "Mustermann".to_string(),
};

println!("Hello {name}");
// Output: Hello Max Mustermann
```
Let's assume that we have a vector or slice of Name as input (e.g. the result of a database query).
Our task is to serialize it to a JSON vector of full names.
Pause reading and think about it for a minute. How would you achieve that goal?
The naive way would be to convert the names to full name strings and then serialize them:
```rust
fn naive(names: &[Name]) -> serde_json::Result<String> {
    let full_names = names
        .iter()
        .map(|name| name.to_string())
        .collect::<Vec<_>>();
    serde_json::to_string(&full_names)
}
```
We iterate over the input slice and map each Name to its Display string representation using the to_string() method. Then, we collect the full names into a vector and serialize it.
Straightforward, right?
The problem is that we are making an allocation for each name!
The to_string() method returns a String which has to be allocated on the heap.
Heap allocations are expensive!
"But our serialized output is supposed to be a vector of strings!", you might say.
Yes, but this doesn't mean that we have to create those strings before the serialization. We can create them during the serialization and append them directly to the serializer's buffer instead of allocating our own buffers first. But how?
Implementing Serialize
A serde serializer can serialize a type that implements the serde trait Serialize (check the signature of serde_json::to_string for example).
Serialize is implemented for Vec<T> and [T] if T itself implements Serialize.
Therefore, to directly serialize our input slice, Name must implement Serialize.
We could derive the default implementation of Serialize for Name by adding #[derive(Serialize)] above the struct. But the derived default implementation would serialize an instance to the following JSON object:

```json
{ "first_name": "Max", "last_name": "Mustermann" }
```
But we actually want a serialization to the following string:

```json
"Max Mustermann"
```
This means that we have to manually implement the Serialize trait:
```rust
use serde::{Serialize, Serializer};

impl Serialize for Name {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.collect_str(&self)
    }
}
```
We tell the serializer to "collect" a string from our instance. collect_str takes an argument &T where T implements Display and appends that Display string representation to the serializer. We just built a bridge between the Display and Serialize traits 🌉
Now that our type implements Serialize, we can just directly pass its slice to the serializer:
```rust
fn manual_serialize(names: &[Name]) -> serde_json::Result<String> {
    serde_json::to_string(names)
}
```
Let's benchmark both methods! The following line chart shows the ratio of the time that naive takes to serialize N names in comparison with manual_serialize:
We get a speedup between 1.25x and 2.25x 🚀 The speedup depends on the number of names N; there is a tendency toward higher speedups for higher N values.
All we did to get this speedup was implement the Serialize trait, using one line for the body of the serialize method!
Get used to it
You could use the crate serde_with to implement the bridge between Display and Serialize with a macro. But manually implementing the Serialize trait is much more flexible if you want to do more than this bridging. Serialize is a tiny but very powerful trait that you should get used to.
Yes, the signature of the single trait method is rather long. But you are not forced to type it manually! Just write the following and your editor will suggest autocompleting the signature after you type fn:
```rust
impl Serialize for TYPENAME {
    fn // <- Autocompletion on your cursor here
}
```
After autocompletion:
```rust
impl Serialize for TYPENAME {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        // <- Cursor after autocompletion
    }
}
```
Note
Your editor needs to have LSP (language server protocol) support to apply that autocompletion from rust-analyzer.
Formatters
By manually implementing Serialize directly on our type Name, we lost the ability to have the default derived implementation. You are out of luck if you later need to send the following JSON object representation in another context:

```json
{ "first_name": "Max", "last_name": "Mustermann" }
```
Therefore, I would recommend not manually implementing the Serialize trait directly on your data types. Instead, implement it on wrapper types that act like formatters.
Here is an example:
```rust
struct DisplayFormatter<T: Display>(T);

impl<T: Display> Serialize for DisplayFormatter<T> {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.collect_str(&self.0)
    }
}
```
This generic formatter takes a type that implements Display and serializes it using that representation. You can use it not only on our Name type, but on any type that implements Display.
But your type doesn't always have to implement Display! Or maybe you want different implementations for Display and Serialize. In that case, you can use a concrete formatter like this instead:
```rust
struct FullNameFormatter<'a>(&'a Name);

impl<'a> Serialize for FullNameFormatter<'a> {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.collect_str(&format_args!(
            "{} {}",
            self.0.first_name, self.0.last_name
        ))
    }
}
```
Here, we directly use format_args!, which is what all formatting macros like println! and format! use under the hood. But format_args! doesn't allocate or even apply the formatting! It only returns Arguments, which is a formatter that borrows its arguments. The important thing is that Arguments implements Display, which we need for the collect_str() method.
Now, we can use that formatter when we want to serialize the full name:
```rust
fn formatter(names: &[Name]) -> serde_json::Result<String> {
    let full_names = names.iter().map(FullNameFormatter).collect::<Vec<_>>();
    serde_json::to_string(&full_names)
}
```
We map each &Name to FullNameFormatter(&Name) and collect the map into a vector, which is then passed to the serializer.
Check out the benchmark results in the chart below to see that this method has almost the same performance as the one before:
There is no real performance difference because wrapper types are a zero-cost abstraction in Rust.
But actually, there should be an additional cost not related to the wrapper type itself. If you zoom into the chart above, you should see that this method performs slightly worse than manual_serialize for low N values 😱 It is the allocation of collecting the map into a Vec! This allocation barely shows up in the benchmark results, especially for higher N values, because it is done only once and its overhead is negligible in comparison with the serialization itself.
Although it seems like an unneeded optimization, we will eliminate that allocation, at least to see one more example of formatters.
Sequence formatters
We can't just skip collecting the map and pass the iterator to the serializer, because serde doesn't directly support serialization of iterators. The Serialize trait is not implemented for iterators, but a serde Serializer provides the method collect_seq for collecting iterators. We need a wrapper type that takes our slice and serializes it by passing the map to collect_seq:
```rust
struct FullNameSequenceFormatter<'a>(&'a [Name]);

impl<'a> Serialize for FullNameSequenceFormatter<'a> {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.collect_seq(self.0.iter().map(FullNameFormatter))
    }
}
```
We can now just pass our sequence formatter to the serializer:
```rust
fn sequence_formatter(names: &[Name]) -> serde_json::Result<String> {
    serde_json::to_string(&FullNameSequenceFormatter(names))
}
```
You can check the benchmark results above to see that this method has the same performance as manual_serialize. We went back to a zero-cost abstraction without coupling Serialize to Display 🌟
When to use formatters
You might argue that we introduced unneeded complexity to our code by deciding to implement these two formatters.
You are right.
If you are sure that you don't need the default derived implementation of Serialize, then it is much simpler to have a direct manual implementation of Serialize on your type. But I showed you how to use formatters in this example in case you additionally need the default derived implementation.
But you have to use formatters if you want to manually implement Serialize for foreign types (types defined in other crates)! Even if a foreign type doesn't already implement Serialize, Serialize is a foreign trait, and you can't implement foreign traits on foreign types in Rust. This is because of the potential conflict if the foreign crate owning either the trait or the type implemented that trait on that type. If you want to implement a trait on a type, either the trait or the type has to be defined in your crate. Therefore, if you want to define Serialize for OffsetDateTime for example, you need to use a wrapper type as a datetime formatter.
Note
This "wrapper type" pattern is also called the Newtype pattern in the Rust community.
Outlook
It is worth noting that formatters, as introduced in this post, don't have to be simple wrapper types. They have to take a reference to the type that we want to format, but they can also take other input that can be used while formatting. You might have also guessed that these formatter types don't have to be restricted to implementing Serialize. It would make a lot of sense to implement Display on them when needed.
Our example was rather simple and we only used two of the methods that a serde serializer provides. Check out the many methods of the Serializer trait for more specific serializations. Finally, our example used JSON, but the content presented in this blog post can be applied to any serializer with serde support.
Conclusion
"Avoid allocations" is the real conclusion of this blog post. But to be more specific: Avoid allocating intermediate states of your data before serialization.
We have seen that a simple manual implementation of the Serialize trait can lead to a major performance improvement. This doesn't mean that you should always implement Serialize manually though! Do it only if you want a custom serialization. Otherwise, just use #[derive(Serialize)] on your type 😃
Search for format!, String, to_owned, to_string and collect in your code just before serialization. If you find any of these allocating pieces, think about whether you can easily avoid them 🚀
Appendix
Benchmarking strategy
It took me a long time to achieve good benchmark results. The strategy that I ended up with isn't trivial, so I wanted to present it in the appendix. By good results, I mean results without huge fluctuations.
The problem is that the serialization time for low values of N (the number of names) varies a lot because it is in the range of nano- and microseconds. The randomness of the OS and the CPU frequency made the benchmark results very inaccurate.
How could we eliminate that randomness?
The answer is easy: instead of benchmarking one serialization, benchmark multiple iterations. But there is a huge difference in the time required for N = 2^0 = 1 and N = 2^20 = 1_048_576! This means that the number of iterations that we benchmark must depend on N. We want one iteration for the maximum benchmarked value of N and more iterations the lower the value of N becomes.
Note
Although I say that the maximum value of N has only one iteration, these iterations are only one sample in divan. I used 25 samples for all N values for better statistics.
First, I tried a linear dependency in the exponent with 2^(20 - log2(N)). This formula for the number of iterations leads to exactly one iteration for the maximum value of N, which is 2^20 in this benchmark. The problem was that values of N between 2^12 and 2^18 took too long. But the results for low values were good! Therefore, I needed fewer iterations for those high N values, but I had to keep a high number of iterations for low N values.
This means that I had to try the next dependency in the exponent, a quadratic one: 2^((20 - log2(N))^2 / 20). The division by 20 keeps the value for log2(N) = 0 unchanged in comparison with the linear dependency.
You can see the shape of both formulas in the chart below:
Of course, the exponent has to be an integer to get an integer for the number of iterations. Therefore, the division in the exponent is an integer division and its result is also visualized in the chart above.
This strategy resulted in times in the range of milliseconds for all values of N, and the huge randomness was eliminated. Of course, I divided the time by the number of iterations for each N before comparing the serialization performance.
Yes, a lot of heuristics, but I got good, stable results 😃
Full code
The full code used in this post can be found here.