Rust Concept Clarification: Deref vs AsRef vs Borrow vs Cow

原创
07/06 20:32
阅读数 0

本篇是《Rust 概念解惑 | Deref vs AsRef vs Borrow vs Cow》的英文版,得益于现在翻译高科技,我终于可以用英文完成长篇大论了。

此文也发布于  https://dev.to/zhanghandong/rust-concept-clarification-deref-vs-asref-vs-borrow-vs-cow-13g6

Understanding by Module

In fact, the classification by standard library will first give you a small idea of what they do.

  1. std [1]:: ops [2]:: Deref [3] . As you can see, Deref is categorized as an ops module. If you look at the documentation, you will see that this module defines the   trait [4]  fr all the overloadable operators. For example, Add trait corresponds to +, while Deref trait  corresponds to a shared(immutable) borrowing dereference operation, such as *v . Correspondingly, there is also the DerefMut trait, which corresponds to the dereferencing operation of exclusive(mutable) borrowing. Since the Rust ownership semantics is a language feature throughout , the semantics of Owner / immutable borrowing(&T)/ mutable borrowing(&mut T) all appear together.
  2. std [5]:: convert [6]:: AsRef [7] . As you can see, AsRef is grouped under the convert module. if you look at the documentation, you will see that   traits [8] related to type conversions are defined in this module. For example, the familiar "From/To", "TryFrom/TryTo" and "AsRef/AsMut" also appear in pairs here, indicating that the feature is releated to type conversions. Based on the naming rules in the   Rust API Guidelines [9] , wen can infer that methods starting with as_ represent conversions from borrow -> borrow, i.e, reference -> reference , and are overhead-free. And such conversions do not fail.
  3. std [10]:: borrow [11]:: Borrow [12]. As you can see, Borrow is categorized in the borrow module. The documentation for this module is very minimal, with a single sentence saying that this is for using borrowed data. So the trait is more or less related to expressing borrwo semantics. Three   traits [13]  are provided:   Borrow [14] / BorrowMut [15]/ ToOwned [16] , which corresponds exactly to the ownership semantics.
  4. std [17]:: borrow [18]:: Cow [19]. It can be seen that Cow is also classified as a borrow module.  According to the description, Cow is a clone-on-write smart pointer. The main reason for putting it in the borrow module is to use borrowing as much as possible and avoid copying, as an optimization.

std[20]::ops[21]::Deref[22]

First, let's look at the definition of the trait.

pub trait Deref {
    type Target: ?Sized;
    #[must_use]
    pub fn deref(&self) -> &Self::Target;
}

The definition is not complicated, Deref contains only a deref method signature. The beauty of this trait is that it is called "implicitly" by the compiler, officially called "deref coercion[23] ".

Here is an example from the standard library.

use std::ops::Deref;

struct DerefExample<T> {
    value: T
}

impl<T> Deref for DerefExample<T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        &self.value
    }
}

let x = DerefExample { value: 'a' };
assert_eq!('a', *x);

In the code, the DerefExample structure implements the Deref trait, so it can be executed using the dereference operator *. In the example, the value of the field value is returned directly.

As you can see, DerefExample has a pointer-like behavior , because it implements Deref, because it can be dereferenced. DerefExample also becomes a kind of smart pointer. This is one way to identify if a type is a smart pointer, by seeing if it implements Deref. But not all smart pointers implement Deref, some implent Drop, or both.

Now let's summarize Deref.

If T implements Deref<Target=U>, and x is an instance of type T, then.

  1. In an immutable context, the operation of *x (when T is neither a reference nor a primitive pointer) is equivalent to *Deref::deref(&x).
  2. The value of &T is forced to be converted to the value of &U. (deref coercion).
  3. T implements all the (immutable) methods of U.

The beauty of Deref is that it enhances the Rust development experience. A typical example from the standard library is that Vec<T> shares all the methods of slice by implemented  Deref.

impl<T, A: Allocator> ops::Deref for Vec<T, A> {
    type Target = [T];

    fn deref(&self) -> &[T] {
        unsafe { slice::from_raw_parts(self.as_ptr(), self.len) }
    }
}

For example, the simplest method, len(), is actually defined in the `slice` [24] module. In Rust, when executing . call, or at the function argument position, the compiler automatically performs the implicit act of deref coercion. so it is equivalent to Vec<T> having the slice method as well.

fn main() {
 let a = vec![123];
 assert_eq!(a.len(), 3); // 当 a 调用 len() 的时候,发生 deref 强转
}

Implicit behavior in Rust is not common,  but Deref is one of them, and its implicit coercion make smart pointers easy to use.

fn main() {
    let h = Box::new("hello");
    assert_eq!(h.to_uppercase(), "HELLO");
}

For example, if we manipulate Box<T> , instead of manually dereferencing T inside to manipulate it,  as if the outer layer of Box<T> is transparent, we can manipulate T directly.

Another example.

fn uppercase(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let s = String::from("hello");
    assert_eq!(uppercase(&s), "HELLO");
}

The argument type of the uppercase method above is obviously &str, but the actual type passed in the main function is &String, so why does it compile successfully? It is because String implementsDeref.

impl ops::Deref for String {
    type Target = str;

    #[inline]
    fn deref(&self) -> &str {
        unsafe { str::from_utf8_unchecked(&self.vec) }
    }
}

That's the beauty of Deref. But some people may mistake it for inheritance. Big mistake.

This behavior seems a bit like inheritance, but please don't just use Deref to simulate inheritance.

std[25]::convert[26]::AsRef[27]

Let's look at the definition of AsRef.

pub trait AsRef<T: ?Sized> {
    fn as_ref(&self) -> &T;
}

We already know that AsRef can be used for conversions. Compared to Deref, which has an implicit behavior, AsRef is an explicit conversion.

fn is_hello<T: AsRef<str>>(s: T) {
   assert_eq!("hello", s.as_ref());
}

fn main() {
    let s = "hello";
    is_hello(s);
    
    let s = "hello".to_string();
    is_hello(s);
}

In the above example, the function of is_hello is a generic function. The conversion is achieved by qualifying T: AsRef<str> and using an explicit call like s.as_ref() inside the function. Either String or str actually implements the AsRef trait.

So now the question is, when do you use AsRef? Why not just use &T?

Consider an example like this.

pub struct Thing {
    name: String,
}

impl Thing {
    pub fn new(name: WhatTypeHere) -> Self {
        Thing { name: name.some_conversion() }
}

In the above example, the new function name has the following options for the type parameter.

  1. &str. In this case, the caller needs to pass in a reference. But in order to convert to String, the called party (callee) needs to control its own memory allocation, and will have a copy.
  2. String. In this case, the caller is fine passing String, but if it is passing a reference, it is similar to case 1.
  3. T: Into<String>. In this case, the caller can pass &str and String, but there will be memory allocation and copying during the type conversion as well.
  4. T: AsRef<str>. Same as case 3.
  5. T: Into<Cow<'a, str>>, where some allocations can be avoided. Cow will be described later.

There is no one-size-fits-all answer to the question of when to use which type. Some people just like &str and will use it no matter what. There are trade-offs here.

  1. On occasions when assignment and copying are less important,  there is no need to make type signatures too complicated, just use &str.
  2. Some need to look at method definitions and whether they need to consume ownership, or return ownership or borrowing.
  3. Some need to minimize assignment and copy, so it is necessary to use more complex type signatures, as in case 5.

Application of Deref and AsRef in API design

The wasm-bindgen[28] library contains a component called **web-sys**[29].

This component is the binding of Rust to the browser Web API. As such, web-sys makes it possible to manipulate the browser DOM with Rust code, fetch server data, draw graphics, handle audio and video, handle client-side storage, and more.

However, binding Web APIs with Rust is not that simple. For example, manipulating the DOM relies on JavaScript class inheritance, so web-sys must provide access to this inheritance hierarchy. In web-sys, access to this inheritance structure is provided using Deref and AsRef.

Using Deref

let element: &Element = ...;

element.append_child(..); // call a method on `Node`

method_expecting_a_node(&element); // coerce to `&Node` implicitly

let node: &Node = &element; // explicitly coerce to `&Node`

If you have web_sys::Element, then you can get web_sys::Node implicitly by using deref.

The use of deref is mainly for API ergonomic reasons, to make it easy for developers to use the . operation to transparently use the parent class.

Using AsRef

A large number of AsRef conversions are also implemented in web-sys for various types.

impl AsRef<HtmlElement> for HtmlAnchorElement
impl AsRef<Element> for HtmlAnchorElement
impl AsRef<Node> for HtmlAnchorElement
impl AsRef<EventTarget> for HtmlAnchorElement
impl AsRef<Object> for HtmlAnchorElement
impl AsRef<JsValue> for HtmlAnchorElement

A reference to a parent structure can be obtained by explicitly calling .as_ref().

Deref focuses on implicitly and transparently using the parent structure, while AsRef focuses on explicitly obtaining a reference to the parent structure. This is a trade-off with a specific API design, rather than a mindless simulation of OOP inheritance.

Another example of using AsRef is the http-types[30] library, which uses AsRef and AsMut to convert various types.

For example, Request is a combination of Stream / headers/ URL, so it implements AsRef<Url>, AsRef<Headers>, and AsyncRead. Similarly, Response is a combination of Stream / headers/ Status Code. So it implements AsRef<StatusCode>, AsRef<Headers>, and AsyncRead.

fn forwarded_for(headers: impl AsRef<http_types::Headers>) {
    // get the X-forwarded-for header
}

// 所以,forwarded_for 可以方便处理 Request/ Response / Trailers 
let fwd1 = forwarded_for(&req);
let fwd2 = forwarded_for(&res);
let fwd3 = forwarded_for(&trailers);

std[31]::borrow[32]::Borrow[33]

Take a look at the definition of Borrow.

pub trait Borrow<Borrowed: ?Sized> {
    fn borrow(&self) -> &Borrowed;
}

Contrast AsRef:

pub trait AsRef<T: ?Sized> {
    fn as_ref(&self) -> &T;
}

Isn't this very similar? So, some people suggest that one of these two functions could be removed altogether. But in fact, there is a difference between Borrow and AsRef, and they both have their own uses.

The Borrow trait is used to represent borrowed data. the AsRef trait is used for type conversion. In Rust, it is common to provide different type representations for different use cases for different semantics.

A type provides a reference/borrow to T in the borrow() method by implementing Borrow<T>, expressing the semantics that it can be borrowed, rather than converted to some type T. A type can be freely borrowed as several different types, or it can be borrowed in a mutable way.

So how do you choose between Borrow and AsRef?

  • Choose Borrow when you want to abstract different borrow types in a uniform way, or when you want to create a data structure that handles self-contained values (owned) and borrowed values (borrowed) in the same way.
  • When you want to convert a type directly to a reference and you are writing generic code, choose AsRef. simpler case.

In fact, the HashMap example given in the standard library documentation explains this very well. Let me translate it for you.

HashMap<K, V> stores key-value pairs, and its API should be able to retrieve the corresponding value in the HashMap properly using either the key's own value or its reference. Since the HashMap has to hash and compare keys, it must require that both the key's own value and the reference behave the same when hashed and compared.

use std::borrow::Borrow;
use std::hash::Hash;

pub struct HashMap<K, V> {
    // fields omitted
}

impl<K, V> HashMap<K, V> {
    // The insert method uses the key's own value and takes ownership of it.
    pub fn insert(&self, key: K, value: V) -> Option<V>
    where K: Hash + Eq
    {
        // ...
    }

    // If you use the get method to get the corresponding value by key, you can use the reference of key, which is denoted by &Q here
    // and requires Q to satisfy `Q: Hash + Eq + ?Sized`
    // As for K, it is expressed as a borrowed data of Q by `K: Borrow<Q>`.
    // So, the hash implementation of Q is required to be the same as K
    pub fn get<Q>(&self, k: &Q) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq + ?Sized
    {
        // ...
    }
}

Borrow is a bound on borrowed data and is used with additional traits, such as Hash and Eq in the example.

See another example.

//  Can this structure be used as the key of a HashMap?
pub struct CaseInsensitiveString(String);

// It implements PartialEq without problems
impl PartialEq for CaseInsensitiveString {
    fn eq(&self, other: &Self) -> bool {
       // Note that the comparison here is required to ignore ascii case
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for CaseInsensitiveString { }

// Implementing Hash is no problem
// But since PartialEq ignores case, the hash calculation must also ignore case
impl Hash for CaseInsensitiveString {
    fn hash<H: Hasher>(&self, state: &mut H) {
        for c in self.0.as_bytes() {
            c.to_ascii_lowercase().hash(state)
        }
    }
}

Can CaseInsensitiveString implement Borrow<str>?

Obviously, CaseInsensitiveString and str have different implementations of Hash. str does not ignore case. Therefore, Borrow<str> must not be implemented for CaseInsensitiveString, so CaseInsensitiveString cannot be used as a key for a HashMap. What happens if we force Borrow<str> to be used? It will fail due to case difference when determining the key.

But CaseInsensitiveString can be fully implemented as AsRef.

This is the difference between Borrow and AsRef. Borrow is a bit stricter and represents a completely different semantics than AsRef.

std[34]::borrow[35]::Cow[36]

Look at the definition of Cow.

pub enum Cow<'a, B> 
where
    B: 'a + ToOwned + ?Sized
 {
    Borrowed(&'a B),
    Owned(<B as ToOwned>::Owned),
}

As you can see, Cow is an enumeration. It is somewhat similar to Option, in that it represents one of two cases, Cow here means borrowed and self-owned, but only one of these cases can occur.

The main functions of Cow are:

  1. acts as a smart pointer, providing transparent immutable access to instances of this type (e.g. the original immutable methods of this type can be called directly, implementing Deref, but not DerefMut).
  2. if there is a need to modify an instance of this type, or to gain ownership of an instance of this type, Cow provides methods to do cloning and avoid repeated cloning.

Cow is designed to improve performance (reduce replication) while increasing flexibility, because most of the time, business scenarios are read more and write less. With Cow, this can be achieved in a uniform, canonical form, where object replication is done only once when a write is needed. This may reduce the number of replications significantly.

It has the following key points to master.

  1. Cow<T> can directly call the immutable methods of T, since Cow, an enumeration, implements Deref.
  2. the .to_mut() method can be used to obtain a mutable borrow with an ownership value when T needs to be modified.
    1. note that a call to .to_mut() does not necessarily result in a Clone.
    2. calling .to_mut() when ownership is already present is valid, but does not produce a new Clone.
    3. multiple calls to .to_mut() will produce only one Clone.
  3. .into_owned() can be used to create a new owned object when T needs to be modified, a process that often implies a memory copy and the creation of a new object.
    1. calling this operation will perform a Clone if the value in the previous Cow was in borrowed state.
    2. this method, whose argument is of type self, will "consume" the original instance of that type, after which the life cycle of the original instance of that type will end, and cannot be called more than once on Cow.

Cow is used more often in API design.

use std::borrow::Cow;

// Use Cow for the return value to avoid multiple copies
fn remove_spaces<'a>(input: &'a str) -> Cow<'astr> {
    if input.contains(' ') {
        let mut buf = String::with_capacity(input.len());
        for c in input.chars() {
            if c != ' ' {
                buf.push(c);
            }
        }
        return Cow::Owned(buf);
    }
    return Cow::Borrowed(input);
}

Of course, when to use Cow comes back to the "when to use AsRef" discussion in our previous article, there are trade-offs and no one-size-fits-all standard answer.

Summary

To understand the various types and traits in Rust, you need to take into account the ownership semantics and ponder the documentation and examples, which should be easy to understand. I don't know if reading this article has solved your doubts? Feel free to share your feedback.

参考资料

[1]

std: https://doc.rust-lang.org/std/index.html

[2]

ops: https://doc.rust-lang.org/std/ops/index.html

[3]

Deref: https://doc.rust-lang.org/std/ops/trait.Deref.html

[4]

trait: https://doc.rust-lang.org/std/ops/index.html#traits

[5]

std: https://doc.rust-lang.org/std/index.html

[6]

convert: https://doc.rust-lang.org/std/convert/index.html

[7]

AsRef: https://doc.rust-lang.org/std/convert/trait.AsRef.html

[8]

traits: https://doc.rust-lang.org/std/convert/index.html#traits

[9]

Rust API Guidelines: https://rust-lang.github.io/api-guidelines/

[10]

std: https://doc.rust-lang.org/std/index.html

[11]

borrow: https://doc.rust-lang.org/std/borrow/index.html

[12]

Borrow: https://doc.rust-lang.org/std/borrow/trait.Borrow.html

[13]

traits: https://doc.rust-lang.org/std/borrow/index.html#traits

[14]

Borrow: https://doc.rust-lang.org/std/borrow/trait.Borrow.html

[15]

BorrowMut: https://doc.rust-lang.org/std/borrow/trait.BorrowMut.html

[16]

ToOwned: https://doc.rust-lang.org/std/borrow/trait.ToOwned.html

[17]

std: https://doc.rust-lang.org/std/index.html

[18]

borrow: https://doc.rust-lang.org/std/borrow/index.html

[19]

Cow: https://doc.rust-lang.org/std/borrow/enum.Cow.html

[20]

std: https://doc.rust-lang.org/std/index.html

[21]

ops: https://doc.rust-lang.org/std/ops/index.html

[22]

Deref: https://doc.rust-lang.org/std/ops/trait.Deref.html

[23]

deref coercion: https://doc.rust-lang.org/std/ops/trait.Deref.html#more-on-deref-coercion

[24]

slice : https://doc.rust-lang.org/std/primitive.slice.html

[25]

std: https://doc.rust-lang.org/std/index.html

[26]

convert: https://doc.rust-lang.org/std/convert/index.html

[27]

AsRef: https://doc.rust-lang.org/std/convert/trait.AsRef.html

[28]

wasm-bindgen: https://github.com/rustwasm/wasm-bindgen

[29]

web-sys: https://github.com/rustwasm/wasm-bindgen/tree/

[30]

http-types: https://github.com/http-rs/http-types

[31]

std: https://doc.rust-lang.org/std/index.html

[32]

borrow: https://doc.rust-lang.org/std/borrow/index.html

[33]

Borrow: https://doc.rust-lang.org/std/borrow/trait.Borrow.html

[34]

std: https://doc.rust-lang.org/std/index.html

[35]

borrow: https://doc.rust-lang.org/std/borrow/index.html

[36]

Cow: https://doc.rust-lang.org/std/borrow/enum.Cow.html


本文分享自微信公众号 - 觉学社(WakerGroup)。
如有侵权,请联系 support@oschina.cn 删除。
本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。

展开阅读全文
打赏
0
0 收藏
分享
加载中
更多评论
打赏
0 评论
0 收藏
0
分享
返回顶部
顶部