Faster Rust Serialization

Tags: #rust

Reading time: ~13min


The tech industry happily wastes a lot of resources on serializing and deserializing JSON with its inefficient plain text format. But sadly, JSON is currently (still) the standard for sending data over the internet.

Nevertheless, we can at least try to make serialization and deserialization as efficient as possible!

In this blog post, we will see how you can improve the serialization performance of serde in Rust. We will take a look at a simple example and improve its performance by up to 2.25x 🚀

Disclaimer

The second part about formatters in this blog post is not for Rust beginners.

If you are just starting with Rust, don't confuse yourself with the details in this post. Just use serde with the derive macro and you will get very decent performance without further effort. You are already wasting far fewer resources by using Rust instead of a language like Python or JavaScript 😉

The basics of the following concepts are required:


The problem

For our example, let's assume we have this struct:

struct Name {
    first_name: String,
    last_name: String,
}

We want to use the full name representation when formatting it (first and last name separated by a space). Let's implement the Display trait to define that representation:

use std::fmt::{self, Display};

impl Display for Name {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} {}", self.first_name, self.last_name)
    }
}

This implementation allows us to print instances of this struct, for example with println!:

let name = Name {
    first_name: "Max".to_string(),
    last_name: "Mustermann".to_string(),
};

println!("Hello {name}");
// Output: Hello Max Mustermann

Let's assume that we have a vector or slice of Name as input (e.g. the result of a database query). Our task is to serialize it to a JSON array of full names.

Pause reading and think about it for a minute. How would you achieve that goal?

The naive way would be to convert the names to full name strings and then serialize them:

fn naive(names: &[Name]) -> serde_json::Result<String> {
    let full_names = names
        .iter()
        .map(|name| name.to_string())
        .collect::<Vec<_>>();

    serde_json::to_string(&full_names)
}

We iterate over the input slice and map each Name to its Display string representation using the to_string() method. Then, we collect the resulting vector of full names and serialize it.

Straightforward, right?

The problem is that we are making an allocation for each name! The to_string() method returns a String which has to be allocated on the heap. Heap allocations are expensive!

"But our serialized output is supposed to be a vector of strings!", you might say.

Yes, but this doesn't mean that we have to create those strings before the serialization. We can create them during the serialization and append them directly to the serializer's buffer instead of allocating our own buffers first. But how?
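To make the idea concrete without serde, here is a minimal sketch that uses only the standard library. The function names joined_with_allocations and joined_in_place are made up for this illustration. Both build the same comma-separated line, but the second one writes each name through its Display implementation directly into the single output buffer instead of allocating a temporary String per name.

```rust
use std::fmt::{self, Display, Write};

struct Name {
    first_name: String,
    last_name: String,
}

impl Display for Name {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} {}", self.first_name, self.last_name)
    }
}

/// Allocates one temporary `String` per name before joining them.
fn joined_with_allocations(names: &[Name]) -> String {
    names
        .iter()
        .map(|name| name.to_string())
        .collect::<Vec<_>>()
        .join(", ")
}

/// Writes every name straight into the single output buffer.
fn joined_in_place(names: &[Name]) -> String {
    let mut out = String::new();
    for (i, name) in names.iter().enumerate() {
        if i > 0 {
            out.push_str(", ");
        }
        // The `String` sink itself never reports an error.
        write!(out, "{name}").expect("writing to a String should not fail");
    }
    out
}

fn main() {
    let names = [
        Name { first_name: "Max".to_string(), last_name: "Mustermann".to_string() },
        Name { first_name: "Erika".to_string(), last_name: "Mustermann".to_string() },
    ];
    // Same output, but the second variant skips the per-name allocations.
    assert_eq!(joined_with_allocations(&names), joined_in_place(&names));
    println!("{}", joined_in_place(&names));
}
```

A serializer can play the same trick with its own buffer, which is what the next section relies on.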

Implementing Serialize

A serde serializer can serialize a type that implements the serde trait Serialize (check the signature of serde_json::to_string for example). Serialize is implemented on Vec<T> or [T] if T itself implements Serialize. Therefore, to directly serialize our input slice, Name must implement Serialize.

We could derive the default implementation of Serialize for Name by adding #[derive(Serialize)] above the struct. But the derived default implementation would serialize an instance to the following JSON object:

{ "first_name": "Max", "last_name": "Mustermann" }

But we actually want a serialization to the following string:

"Max Mustermann"

This means that we have to manually implement the Serialize trait:

use serde::{Serialize, Serializer};

impl Serialize for Name {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.collect_str(self)
    }
}

We tell the serializer to "collect" a string from our instance. collect_str takes an argument &T where T implements Display and appends that Display string representation to the serializer.

We just built a bridge between the Display and Serialize trait 🌉
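To give a feeling for why such a bridge can avoid the intermediate String, here is a simplified sketch that uses only the standard library. It is not serde_json's actual code: the names JsonStringWriter and write_json_string are hypothetical, and the escaping only covers quotes, backslashes and control characters. The point is that a Display value can be streamed through an adapter that escapes the text while appending it to the output buffer.

```rust
use std::fmt::{self, Display, Write};

/// Hypothetical adapter: forwards formatted text to `out`,
/// escaping it for use inside a JSON string.
struct JsonStringWriter<'a> {
    out: &'a mut String,
}

impl Write for JsonStringWriter<'_> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        for c in s.chars() {
            match c {
                '"' => self.out.push_str("\\\""),
                '\\' => self.out.push_str("\\\\"),
                '\n' => self.out.push_str("\\n"),
                // Remaining control characters use the \uXXXX form.
                c if (c as u32) < 0x20 => write!(self.out, "\\u{:04x}", c as u32)?,
                c => self.out.push(c),
            }
        }
        Ok(())
    }
}

/// Appends `value` as a quoted JSON string to `out`
/// without building an intermediate `String`.
fn write_json_string<T: Display + ?Sized>(out: &mut String, value: &T) -> fmt::Result {
    out.push('"');
    {
        let mut writer = JsonStringWriter { out: &mut *out };
        write!(writer, "{value}")?;
    }
    out.push('"');
    Ok(())
}

fn main() {
    let mut out = String::new();
    write_json_string(&mut out, "say \"hi\"").unwrap();
    // The quotes inside the text are escaped while streaming.
    assert_eq!(out, "\"say \\\"hi\\\"\"");
    println!("{out}");
}
```

With serde, none of this has to be written by hand. Calling collect_str is enough, and the serializer decides how to consume the Display output.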

Now that our type implements Serialize, we can just directly pass its slice to the serializer:

fn manual_serialize(names: &[Name]) -> serde_json::Result<String> {
    serde_json::to_string(names)
}

Let's benchmark both methods!

The following line chart shows the ratio of the time that naive takes to serialize N names in comparison with manual_serialize:

The speedup from the benchmark results

We get a speedup between 1.25x and 2.25x 🚀

The speedup depends on the number of names N. Speedups tend to be higher for larger values of N.

All we did to get this speedup was implement the Serialize trait, with a single line for the body of the serialize method!

Note

I used the new benchmarking crate divan for the benchmarks and Julia for plotting. You can find the full benchmark code here.

We should of course test that both functions return the same output! You can find the test here.

The test passes, trust me 😇

Get used to it

You could use the crate serde_with to implement the bridge between Display and Serialize with a macro. But manually implementing the Serialize trait is much more flexible if you want to do more than this bridging. Serialize is a tiny but very powerful trait that you should get used to.

Yes, the signature of the single trait method is rather long. But you are not forced to manually type it! Just write the following and your editor will suggest autocompleting the signature after typing fn:

impl Serialize for TYPENAME {
    fn // <- Autocompletion on your cursor here
}

After autocompletion:

impl Serialize for TYPENAME {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        // <- Cursor after autocompletion
    }
}

Note

Your editor needs LSP (Language Server Protocol) support to apply that autocompletion from rust-analyzer.

Formatters

By directly implementing Serialize manually on our type Name, we lost the ability to have the default derived implementation. You are out of luck if you need to send the following JSON object representation later in another context:

{ "first_name": "Max", "last_name": "Mustermann" }

Therefore, I recommend not implementing the Serialize trait manually on your data types themselves. Instead, you should implement it on wrapper types that act like formatters.

Here is an example:

struct DisplayFormatter<T: Display>(T);

impl<T: Display> Serialize for DisplayFormatter<T> {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.collect_str(&self.0)
    }
}

This generic formatter takes a type that implements Display and serializes it using that representation. You can use it not only on our Name type, but on any type that implements Display.

But your type doesn't always have to implement Display! Or maybe you want to have different implementations for Display and Serialize. In that case, you can use a concrete formatter like this one instead:

struct FullNameFormatter<'a>(&'a Name);

impl<'a> Serialize for FullNameFormatter<'a> {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.collect_str(&format_args!("{} {}", self.0.first_name, self.0.last_name))
    }
}

Here, we directly use format_args!, which is what all formatting macros like println! and format! use under the hood. But format_args! doesn't allocate or even apply the formatting! It only returns an Arguments value, which borrows its arguments and describes how to format them. The important thing is that Arguments implements Display, which we need for the collect_str() method.
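A small standard-library sketch of that point: because Arguments implements Display, it can be passed wherever a Display value is expected, and the text is only produced when the receiver writes it. The helper append_display is hypothetical and merely stands in for a serializer's buffer.

```rust
use std::fmt::{Display, Write};

/// Hypothetical stand-in for a serializer buffer:
/// appends any `Display` value to `buf`.
fn append_display(buf: &mut String, value: &dyn Display) {
    write!(buf, "{value}").expect("writing to a String should not fail");
}

fn main() {
    let first_name = String::from("Max");
    let last_name = String::from("Mustermann");

    // One buffer, allocated once with room for the full name.
    let mut buf = String::with_capacity(32);

    // `format_args!` only builds a borrowed `Arguments` value here.
    // The formatting happens inside `append_display`, directly into `buf`.
    append_display(&mut buf, &format_args!("{} {}", first_name, last_name));

    assert_eq!(buf, "Max Mustermann");
    println!("{buf}");
}
```

Passing format_args! directly as a function argument, as done here and in the formatter above, keeps the borrowed values alive for exactly as long as they are needed.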

Now, we can use that formatter when we want to serialize the full name:

fn formatter(names: &[Name]) -> serde_json::Result<String> {
    let full_names = names.iter().map(FullNameFormatter).collect::<Vec<_>>();

    serde_json::to_string(&full_names)
}

We map each &Name to FullNameFormatter(&Name) and collect the map into a vector which is then passed to the serializer.

Check out the benchmark results in the chart below to see that this method has almost the same performance as the one before:

The time of every method from the benchmark results

There is no real performance difference because wrapper types are a zero-cost abstraction in Rust.

But actually, there should be an additional cost not related to the wrapper type itself. If you zoom into the chart above, you should see that this method has slightly worse performance than manual_serialize for low N values 😱

It is the allocation of collecting the map into a Vec!

This allocation barely shows up in the benchmark results, especially for higher N values. This is because it is done only once and its overhead is negligible in comparison with the serialization itself.

Although it seems like an unnecessary optimization, we will eliminate that allocation, at least to see one more example of formatters.

Sequence formatters

We can't just skip collecting the map and pass the iterator to the serializer because serde doesn't directly support serialization of iterators. The Serialize trait is not implemented for iterators, but a serde Serializer provides the method collect_seq for collecting iterators.

We need a wrapper type that takes our slice and serializes it by passing the map to collect_seq:

struct FullNameSequenceFormatter<'a>(&'a [Name]);

impl<'a> Serialize for FullNameSequenceFormatter<'a> {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        serializer.collect_seq(self.0.iter().map(FullNameFormatter))
    }
}

We can now just pass our sequence formatter to the serializer:

fn sequence_formatter(names: &[Name]) -> serde_json::Result<String> {
    serde_json::to_string(&FullNameSequenceFormatter(names))
}

You can check the benchmark results above to see that this method has the same performance as manual_serialize. We went back to a zero-cost abstraction without coupling Serialize to Display 🌟

When to use formatters

You might argue that we introduced unneeded complexity to our code by deciding to implement these two formatters.

You are right. If you are sure that you don't need the default derived implementation of Serialize, then it is much simpler to have a direct manual implementation of Serialize on your type. But I showed you how to use formatters in this example in case you additionally need the default derived implementation.

But you have to use formatters if you want to manually implement Serialize on foreign types (types defined in other crates)!

Even if a foreign type doesn't already implement Serialize, Serialize is a foreign trait and you can't implement foreign traits on foreign types in Rust (this restriction is known as the orphan rule). It exists because of the potential conflict in case the foreign crate owning either the trait or the type implements that trait on that type.

If you want to implement a trait on a type, either the trait or the type has to be defined in your crate. Therefore, if you want to define Serialize for OffsetDateTime for example, you need to use a wrapper type as a datetime formatter.

Note

This "wrapper type" pattern is also called the Newtype pattern in the Rust community.
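The same restriction can be shown with the standard library alone. In this sketch, Display plays the role of the foreign trait and std::time::Duration the role of the foreign type (Duration implements Debug, but not Display). The wrapper name SecondsFormatter is made up for this example.

```rust
use std::fmt::{self, Display};
use std::time::Duration;

// Not allowed, because both `Display` and `Duration` are defined
// outside of this crate (error E0117):
//
// impl Display for Duration { /* ... */ }

/// Local newtype, so implementing a foreign trait on it is allowed.
struct SecondsFormatter<'a>(&'a Duration);

impl Display for SecondsFormatter<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // One decimal place, followed by the unit.
        write!(f, "{:.1}s", self.0.as_secs_f64())
    }
}

fn main() {
    let elapsed = Duration::from_millis(1500);
    println!("{}", SecondsFormatter(&elapsed));
    // prints "1.5s"
}
```

A formatter that implements Serialize for a foreign type has the same shape, with the body of fmt replaced by a call on the serializer.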

Outlook

It is worth noting that formatters, as introduced in this post, don't have to be simple wrapper types. They have to take a reference to the type that we want to format, but they can also take other input that can be used while formatting.

You might have also guessed that these formatter types don't have to be restricted to implementing Serialize. It would make a lot of sense to implement Display on them when needed.
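As a rough sketch of both points, here is a formatter that carries an additional option next to the reference and implements Display. The type NameFormatter and its field last_name_first are hypothetical and not part of the code of this post.

```rust
use std::fmt::{self, Display};

struct Name {
    first_name: String,
    last_name: String,
}

/// Hypothetical formatter that takes an extra option
/// in addition to the reference to the formatted type.
struct NameFormatter<'a> {
    name: &'a Name,
    last_name_first: bool,
}

impl Display for NameFormatter<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let Name { first_name, last_name } = self.name;
        if self.last_name_first {
            write!(f, "{last_name}, {first_name}")
        } else {
            write!(f, "{first_name} {last_name}")
        }
    }
}

fn main() {
    let name = Name {
        first_name: "Max".to_string(),
        last_name: "Mustermann".to_string(),
    };
    println!("{}", NameFormatter { name: &name, last_name_first: true });
    // prints "Mustermann, Max"
}
```

Since this formatter implements Display, a Serialize implementation for it could again consist of a single collect_str call.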

Our example was rather simple and we only used two of the methods that a serde serializer provides. Check out the many methods of the Serializer trait for more specific serializations.

Finally, our example used JSON, but the presented content in this blog post can be applied to any serializer with serde support.

Conclusion

"Avoid allocations" is the real conclusion of this blog post. But to be more specific: Avoid allocating intermediate states of your data before serialization.

We have seen that a simple manual implementation of the Serialize trait can lead to a major performance improvement.

This doesn't mean that you should always implement Serialize manually though! Do so only if you want a custom serialization. Otherwise, just use #[derive(Serialize)] on your type 😃

Search for format!, String, to_owned, to_string and collect in your code just before serialization. If you find any of these allocating pieces, think about whether you can easily avoid them 🚀


Appendix

Benchmarking strategy

It took me a long time to achieve good benchmark results. The strategy that I ended up with isn't trivial. Therefore, I wanted to present it in the appendix.

By good results, I mean results without huge fluctuations. The problem is that the serialization time for low values of N (number of names) varies a lot because it is in the range of nano- and microseconds. The randomness of the OS and CPU frequency made the benchmark results very inaccurate.

How could we eliminate that randomness? The answer is easy: Instead of benchmarking one serialization, benchmark multiple iterations. But there is a huge difference in the time needed for N = 2^0 = 1 and N = 2^20 = 1_048_576!

This means that the number of iterations that we benchmark must depend on N. We want one iteration for the maximum benchmarked value of N and more iterations the lower the value N becomes.

Note

Although I say that the maximum value of N has only one iteration, these iterations are only one sample in divan. I used 25 samples for all N values for better statistics.

First, I tried a linear dependency in the exponent with 2^(20 - log2(N)). This formula for the number of iterations leads to only one iteration for the maximum value of N which is 2^20 in this benchmark. The problem was that values of N between 2^12 and 2^18 took too long. But the results for low values were good!

Therefore, I needed to use fewer iterations for those high N values, but I had to keep a high number of iterations for low N values. This means that I had to try the next dependency in the exponent, which is a quadratic one: 2^((20 - log2(N))^2 / 20). Dividing by 20 keeps the value for log2(N) = 0 unchanged in comparison with the linear dependency. You can see the shape of both formulas in the chart below:

A chart showing the number of benchmark iterations over the number of names N

Of course, the exponent has to be an integer to get an integer for the number of iterations. Therefore, the division in the exponent is an integer division and its result is also visualized in the chart above.
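Both formulas can be written down in a few lines. This is only a sketch of the formulas as described here, not the actual benchmark code: it assumes that N is a power of two given as log2(N) between 0 and 20, and the names MAX_LOG2_N, linear_iterations and quadratic_iterations are made up.

```rust
/// log2 of the maximum benchmarked N (2^20).
const MAX_LOG2_N: u32 = 20;

/// Linear dependency in the exponent: 2^(20 - log2(N)).
/// Assumes `log2_n <= MAX_LOG2_N`.
fn linear_iterations(log2_n: u32) -> u64 {
    1u64 << (MAX_LOG2_N - log2_n)
}

/// Quadratic dependency in the exponent: 2^((20 - log2(N))^2 / 20).
/// The division is an integer division. Assumes `log2_n <= MAX_LOG2_N`.
fn quadratic_iterations(log2_n: u32) -> u64 {
    let diff = MAX_LOG2_N - log2_n;
    1u64 << (diff * diff / MAX_LOG2_N)
}

fn main() {
    println!("log2(N)  linear  quadratic");
    for log2_n in (0..=MAX_LOG2_N).step_by(4) {
        println!(
            "{log2_n:>7}  {:>6}  {:>9}",
            linear_iterations(log2_n),
            quadratic_iterations(log2_n),
        );
    }
}
```

For log2(N) = 12, for example, the linear formula gives 2^8 = 256 iterations while the quadratic one gives 2^(64 / 20) = 2^3 = 8. Both formulas agree at the two ends of the range.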

This strategy resulted in times in the range of milliseconds for all values of N and the huge randomness was eliminated. Of course, I divided the time by the number of iterations for each N before comparing the serialization performance.
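The idea of timing many iterations and dividing afterwards can be sketched by hand with the standard library. This is not how the divan benchmarks of this post are written. The helper average_time is hypothetical and only illustrates the principle.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `work` `iterations` times and returns the average time of one run.
fn average_time<T>(iterations: u32, mut work: impl FnMut() -> T) -> Duration {
    assert!(iterations > 0, "at least one iteration is required");
    let start = Instant::now();
    for _ in 0..iterations {
        // `black_box` keeps the optimizer from discarding the result.
        black_box(work());
    }
    // Divide the total time by the number of iterations.
    start.elapsed() / iterations
}

fn main() {
    let names = vec!["Max Mustermann".to_string(); 16];
    // Many iterations for a small input to smooth out the noise.
    let per_run = average_time(1 << 12, || names.join(","));
    println!("average time per run: {per_run:?}");
}
```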

Yes, a lot of heuristics, but I got good, stable results 😃


Full code

The full code used in this post can be found here.


Content license: CC BY-NC-SA 4.0