Asked  7 Months ago    Answers:  5   Viewed   22 times

Given

struct node
{
     int a;
     struct node * next;
};

To malloc a new structure,

struct node *p = malloc(sizeof(*p));

is safer than

struct node *p = malloc(sizeof(struct node));

Why? I thought they are the same.

 Answers

71

It is safer becuse you don't have to mention the type name twice and don't have to build the proper spelling for "dereferenced" version of the type. For example, you don't have to "count the stars" in

int *****p = malloc(100 * sizeof *p);

Compare that to the type-based sizeof in

int *****p = malloc(100 * sizeof(int ****));

where you have too make sure you used the right number of * under sizeof.

In order to switch to another type you only have to change one place (the declaration of p) instead of two. And people who have the habit of casting the result of malloc have to change three places.

More generally, it makes a lot of sense to stick to the following guideline: type names belong in declarations and nowhere else. The actual statements should be type-independent. They should avoid mentioning any type names or using any other type-specific features as much as possible.

The latter means: Avoid unnecessary casts. Avoid unnecessary type-specific constant syntax (like 0.0 or 0L where a plain 0 would suffice). Avoid mentioning type names under sizeof. And so on.

Tuesday, June 1, 2021
 
erotsppa
answered 7 Months ago
47

No they're not absolutely the same. While both let you store the same number and type of objects, keep in mind that:

  • You can free() a malloced array, but you can't free() a variable length array (although it goes out of scope and ceases to exist once the enclosing block is left). In technical jargon, they have different storage duration: allocated for malloc versus automatic for variable length arrays.
  • Although C has no concept of a stack, many implementation allocate a variable length array from the stack, while malloc allocates from the heap. This is an issue on stack-limited systems, e.g. many embedded operating systems, where the stack size is on the order of kB, while the heap is much larger.
  • It is also easier to test for a failed allocation with malloc than with a variable length array.
  • malloced memory can be changed in size with realloc(), while VLAs can't (more precisely only by executing the block again with a different array dimension--which loses the previous contents).
  • A hosted C89 implementation only supports malloc().
  • A hosted C11 implementation may not support variable length arrays (it then must define __STDC_NO_VLA__ as the integer 1 according to C11 6.10.8.3).
  • Everything else I have missed :-)
Thursday, June 3, 2021
 
clean_coding
answered 7 Months ago
70

The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.


Deep dive: We have a combination of factors here:

  • We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).

  • Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.

  • However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows for a vector to hold references that have the same lifetime of the vector itself. This is considered sound; after all, the vector itself will not dereference data pointed to by such references (all its doing is dropping values and deallocating the backing array), as long as an important caveat holds.

  • The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.

  • So, diving in even more deeply: the way that we confirm dropck validity for a given structure S, we first double check if S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself, and double check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)

  • In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).

  • Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).

  • BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."

The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).


If one wants to try to understand this via concrete exploration, you can try comparing how the compiler differs in its treatment of little container types that vary in their use of #[may_dangle] and PhantomData.

Here is some sample code I have whipped up to illustrate this:

// Illustration of a case where PhantomData is providing necessary ownership
// info to rustc.
//
// MyBox2<T> uses just a `*const T` to hold the `T` it owns.
// MyBox3<T> has both a `*const T` AND a PhantomData<T>; the latter communicates
// its ownership relationship with `T`.
//
// Skim down to `fn f2()` to see the relevant case, 
// and compare it to `fn f3()`. When you run the program,
// the output will include:
//
// drop PrintOnDrop(mb2b, PrintOnDrop("v2b", 13, INVALID), Valid)
//
// (However, in the absence of #[may_dangle], the compiler will constrain
// things in a manner that may indeed imply that PhantomData is unnecessary;
// pnkfelix is not 100% sure of this claim yet, though.)

#![feature(alloc, dropck_eyepatch, generic_param_attrs, heap_api)]

extern crate alloc;

use alloc::heap;
use std::fmt;
use std::marker::PhantomData;
use std::mem;
use std::ptr;

#[derive(Copy, Clone, Debug)]
enum State { INVALID, Valid }

#[derive(Debug)]
struct PrintOnDrop<T: fmt::Debug>(&'static str, T, State);

impl<T: fmt::Debug> PrintOnDrop<T> {
    fn new(name: &'static str, t: T) -> Self {
        PrintOnDrop(name, t, State::Valid)
    }
}

impl<T: fmt::Debug> Drop for PrintOnDrop<T> {
    fn drop(&mut self) {
        println!("drop PrintOnDrop({}, {:?}, {:?})",
                 self.0,
                 self.1,
                 self.2);
        self.2 = State::INVALID;
    }
}

struct MyBox1<T> {
    v: Box<T>,
}

impl<T> MyBox1<T> {
    fn new(t: T) -> Self {
        MyBox1 { v: Box::new(t) }
    }
}

struct MyBox2<T> {
    v: *const T,
}

impl<T> MyBox2<T> {
    fn new(t: T) -> Self {
        unsafe {
            let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
            let p = p as *mut T;
            ptr::write(p, t);
            MyBox2 { v: p }
        }
    }
}

unsafe impl<#[may_dangle] T> Drop for MyBox2<T> {
    fn drop(&mut self) {
        unsafe {
            // We want this to be *legal*. This destructor is not 
            // allowed to call methods on `T` (since it may be in
            // an invalid state), but it should be allowed to drop
            // instances of `T` as it deconstructs itself.
            //
            // (Note however that the compiler has no knowledge
            //  that `MyBox2<T>` owns an instance of `T`.)
            ptr::read(self.v);
            heap::deallocate(self.v as *mut u8,
                             mem::size_of::<T>(),
                             mem::align_of::<T>());
        }
    }
}

struct MyBox3<T> {
    v: *const T,
    _pd: PhantomData<T>,
}

impl<T> MyBox3<T> {
    fn new(t: T) -> Self {
        unsafe {
            let p = heap::allocate(mem::size_of::<T>(), mem::align_of::<T>());
            let p = p as *mut T;
            ptr::write(p, t);
            MyBox3 { v: p, _pd: Default::default() }
        }
    }
}

unsafe impl<#[may_dangle] T> Drop for MyBox3<T> {
    fn drop(&mut self) {
        unsafe {
            ptr::read(self.v);
            heap::deallocate(self.v as *mut u8,
                             mem::size_of::<T>(),
                             mem::align_of::<T>());
        }
    }
}

fn f1() {
    // `let (v, _mb1);` and `let (_mb1, v)` won't compile due to dropck
    let v1; let _mb1;
    v1 = PrintOnDrop::new("v1", 13);
    _mb1 = MyBox1::new(PrintOnDrop::new("mb1", &v1));
}

fn f2() {
    {
        let (v2a, _mb2a); // Sound, but not distinguished from below by rustc!
        v2a = PrintOnDrop::new("v2a", 13);
        _mb2a = MyBox2::new(PrintOnDrop::new("mb2a", &v2a));
    }

    {
        let (_mb2b, v2b); // Unsound!
        v2b = PrintOnDrop::new("v2b", 13);
        _mb2b = MyBox2::new(PrintOnDrop::new("mb2b", &v2b));
        // namely, v2b dropped before _mb2b, but latter contains
        // value that attempts to access v2b when being dropped.
    }
}

fn f3() {
    let v3; let _mb3; // `let (v, mb3);` won't compile due to dropck
    v3 = PrintOnDrop::new("v3", 13);
    _mb3 = MyBox3::new(PrintOnDrop::new("mb3", &v3));
}

fn main() {
    f1(); f2(); f3();
}
Tuesday, August 3, 2021
 
elias
answered 5 Months ago
30

Use new. You shouldn't need to use malloc in a C++ program, unless it is interacting with some C code or you have some reason to manage memory in a special way.

Your example of node = malloc(sizeof(Node)) is a bad idea, because the constructor of Node (if any exists) would not be called, and a subsequent delete node; would have undefined results.

If you need a buffer of bytes, rather than an object, you'll generally want to do something like this:

char *buffer = new char[1024];

or, preferably, something like this:

std::vector<char> buffer(1024);

Note that for the second example (using std::vector<>), there is no need to delete the object; its memory will automatically be freed when it goes out of scope. You should strive to avoid both new and malloc in C++ programs, instead using objects that automatically manage their own memory.

Thursday, August 5, 2021
 
Kenny
answered 4 Months ago
91

Because you're using iterators, and it'll make the loop look exactly the same as other containers, should you choose to switch to other container types, such as set, list, unordered_set, etc. where < has no meaning.

Thursday, August 12, 2021
 
konstantin
answered 4 Months ago
Only authorized users can answer the question. Please sign in first, or register a free account.
Not the answer you're looking for? Browse other questions tagged :
 
Share