As I mentioned in a comment, the layout of objects in memory is very implementation-specific, and the tools to explore it are necessarily implementation-dependent too.
This answer discusses the layout for 64-bit versions of SBCL and only for 64-bit versions which have 'wide fixnums'. I'm not sure in which order these two things arrived in SBCL as I haven't looked seriously at any of this since well before SBCL and CMUCL diverged.
This answer may also be wrong: I'm not an SBCL developer, and I'm only adding it because no one who is an SBCL developer has answered (I suspect tagging the question properly might help with this).
Information below comes from looking at the GitHub mirror, which seems to be very up to date with the canonical source but a lot faster.
Pointers, immediate objects, tags
[Information from here.] SBCL allocates on two-word boundaries. On a 64-bit system this means that the low four bits of any address are always zero. These low four bits are used as a tag (the documentation calls this the 'lowtag') to tell you what sort of thing is in the rest of the word.
- A lowtag of xyz0 means that the rest of the word is a fixnum, and in particular xyz will then be the low bits of the fixnum, rather than tag bits at all. This means both that there are 63 bits available for fixnums and that fixnum addition is trivial: you don't need to mask off any bits.
- A lowtag of xy01 means that the rest of the word is some other immediate object. Some of the bits to the left of the lowtag (which I think SBCL calls a 'widetag', although I am confused about this as the term seems to be used in two ways) will say what the immediate object is. Examples of immediate objects are characters and single-floats (on a 64-bit platform!).
- The remaining lowtag patterns are xy11, and they all mean that the word is a pointer to some non-immediate object:
  - 0011 is an instance of something;
  - 0111 is a cons;
  - 1011 is a function;
  - 1111 is something else.
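The scheme above can be sketched as a small portable function that classifies a raw word, given as a non-negative integer. This is only an illustration of the bit patterns listed above: `classify-word` is a name I made up, and it does not touch any SBCL internals.

```lisp
(defun classify-word (word)
  "Classify WORD (a non-negative integer) by its low four bits."
  (cond ((not (logbitp 0 word)) :fixnum)              ; xyz0
        ((= (ldb (byte 2 0) word) #b01) :immediate)   ; xy01
        (t (ecase (ldb (byte 2 2) word)               ; xy11: a pointer
             (#b00 :instance)
             (#b01 :cons)
             (#b10 :function)
             (#b11 :other)))))

(classify-word #x0000000000000002)   ; => :FIXNUM
(classify-word #x0000000000006349)   ; => :IMMEDIATE
(classify-word #x00000010024D6E07)   ; => :CONS
```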
Conses
Because conses don't need any additional type information (a cons is a cons) the lowtag is enough: a cons is then just two words in memory, each of which in turn has lowtags &c.
Other non-immediate objects
I think (but am not sure) that all other non-immediate objects have a header word which says what they are (which may also be called a 'widetag') and at least one other word (because allocation is on two-word boundaries). I suspect that the special tag for functions means that a function call can just jump to the entry point of the function's code.
Looking at this
room.lisp has a nice function called hexdump which knows how to print out non-immediate objects. Based on that I wrote a little shim (below) which tries to tell you useful things. Here are some examples.
> (hexdump-thing 1)
lowtags: 0010
fixnum: 0000000000000002 = 1
1 is a fixnum and its representation is just the value shifted left one bit, as described above. Note that the lowtag bits actually contain the whole value in this case!
> (hexdump-thing 85757)
lowtags: 1010
fixnum: 0000000000029DFA = 85757
... but not in this case.
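As a rough sketch of why this encoding makes addition trivial: tagging is a one-bit left shift, so the sum of two tagged words is already the tagged sum. The names `tag-fixnum` and `untag-fixnum` are mine, and this uses unbounded integers, so it ignores the 64-bit word size and overflow into bignums.

```lisp
(defun tag-fixnum (n)
  "Return the word that would represent the fixnum N: N shifted left one bit."
  (ash n 1))

(defun untag-fixnum (word)
  "Recover the fixnum from WORD by shifting right one bit."
  (ash word -1))

(tag-fixnum 1)        ; => 2
(tag-fixnum 85757)    ; => 171514, which is #x29DFA

;; Adding the tagged words gives the tagged sum, with no masking needed.
(= (+ (tag-fixnum 3) (tag-fixnum 4))
   (tag-fixnum 7))    ; => T
```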
> (hexdump-thing #\c)
lowtags: 1001
immediate: 0000000000006349 = #\c
> (hexdump-thing 1.0s0)
lowtags: 1001
immediate: 3F80000000000019 = 1.0
Characters and single floats are immediate: some of the bits to the left of the lowtag tell the system what they are, I think?
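Reading the two dumps above, it looks as if the low eight bits say what kind of immediate this is (#x49 for the character, #x19 for the single float), the character code sits just above those eight bits, and the float's IEEE bits sit in the top half of the word. That is my inference from the output, not something I have checked in the source. Under that assumption, here is a sketch that takes the words apart; all the function names are made up, and the float decoder only handles normal numbers.

```lisp
(defun immediate-type-bits (word)
  "Low eight bits of WORD, assumed to identify the kind of immediate."
  (ldb (byte 8 0) word))

(defun decode-character-word (word)
  "Assume the character code is stored above the low eight bits."
  (code-char (ash word -8)))

(defun single-float-bits (word)
  "Assume the 32 IEEE bits are stored in the top half of the word."
  (ldb (byte 32 32) word))

(defun decode-ieee-single (bits)
  "Decode the 32 bits of a normal IEEE single float to an exact rational."
  (let ((sign     (ldb (byte 1 31) bits))
        (exponent (ldb (byte 8 23) bits))
        (mantissa (ldb (byte 23 0) bits)))
    (* (if (zerop sign) 1 -1)
       (+ 1 (/ mantissa (expt 2 23)))
       (expt 2 (- exponent 127)))))

(decode-character-word #x6349)                            ; => #\c
(single-float-bits #x3F80000000000019)                    ; => #x3F800000
(decode-ieee-single (single-float-bits #x3F80000000000019)) ; => 1
```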
> (hexdump-thing '(1 . 2))
lowtags: 0111
cons: 00000010024D6E07 : 00000010024D6E00
10024D6E00: 0000000000000002 = 1
10024D6E08: 0000000000000004 = 2
> (hexdump-thing '(1 2 3))
lowtags: 0111
cons: 00000010024E4BC7 : 00000010024E4BC0
10024E4BC0: 0000000000000002 = 1
10024E4BC8: 00000010024E4BD7 = (2 3)
Conses. In the first case you can see the two fixnums sitting as immediate values in the two fields of the cons. In the second, if you decoded the lowtag of the second field it would be 0111: it's another cons.
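To make the pointer arithmetic in those dumps concrete, here is a small sketch that strips the lowtag from a cons pointer and computes where its two fields live, assuming eight-byte words as on the 64-bit build above. The function names are hypothetical.

```lisp
(defun untag-pointer (word)
  "Clear the low four (lowtag) bits of WORD to get the real address."
  (logandc2 word #b1111))

(defun cons-car-address (pointer)
  "Address of the first field of the cons that POINTER refers to."
  (untag-pointer pointer))

(defun cons-cdr-address (pointer)
  "Address of the second field, one eight-byte word after the first."
  (+ (untag-pointer pointer) 8))

(untag-pointer #x10024E4BC7)      ; => #x10024E4BC0
(cons-cdr-address #x10024E4BC7)   ; => #x10024E4BC8

;; The word stored in that cdr field was #x10024E4BD7: its lowtag is
;; 0111 again, so it points at another cons.
(ldb (byte 4 0) #x10024E4BD7)     ; => 7
```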
> (hexdump-thing "")
lowtags: 1111
other: 00000010024FAE8F : 00000010024FAE80
10024FAE80: 00000000000000E5
10024FAE88: 0000000000000000 = 0
> (hexdump-thing "x")
lowtags: 1111
other: 00000010024FC22F : 00000010024FC220
10024FC220: 00000000000000E5
10024FC228: 0000000000000002 = 1
10024FC230: 0000000000000078 = 60
10024FC238: 0000000000000000 = 0
> (hexdump-thing "xyzt")
lowtags: 1111
other: 00000010024FDDAF : 00000010024FDDA0
10024FDDA0: 00000000000000E5
10024FDDA8: 0000000000000008 = 4
10024FDDB0: 0000007900000078 = 259845521468
10024FDDB8: 000000740000007A = 249108103229
Strings. These have some type information, a length field, and then characters are packed two to a word. A single-character string needs four words, the same as a four-character one. You can read the character codes out of the data.
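As an illustration of reading the characters back out, here is a sketch that takes the raw length word and the raw data words from a dump like the ones above and rebuilds the string. It assumes what the dumps suggest: the length is stored as a tagged fixnum, and each data word holds two 32-bit character codes with the earlier character in the low half. `decode-string-words` is a made-up name.

```lisp
(defun decode-string-words (length-word data-words)
  "Rebuild a string from a tagged length word and a list of data words."
  (let ((length (ash length-word -1))   ; untag the fixnum length
        (chars '()))
    (dolist (word data-words)
      (push (code-char (ldb (byte 32 0) word)) chars)    ; low half first
      (push (code-char (ldb (byte 32 32) word)) chars))  ; then high half
    ;; Drop any padding characters beyond the stored length.
    (subseq (coerce (nreverse chars) 'string) 0 length)))

(decode-string-words #x8 '(#x0000007900000078 #x000000740000007A))
;; => "xyzt"
```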
> (hexdump-thing #())
lowtags: 1111
other: 0000001002511C3F : 0000001002511C30
1002511C30: 0000000000000089
1002511C38: 0000000000000000 = 0
> (hexdump-thing #(1))
lowtags: 1111
other: 00000010025152BF : 00000010025152B0
10025152B0: 0000000000000089
10025152B8: 0000000000000002 = 1
10025152C0: 0000000000000002 = 1
10025152C8: 0000000000000000 = 0
> (hexdump-thing #(1 2))
lowtags: 1111
other: 000000100252DC2F : 000000100252DC20
100252DC20: 0000000000000089
100252DC28: 0000000000000004 = 2
100252DC30: 0000000000000002 = 1
100252DC38: 0000000000000004 = 2
> (hexdump-thing #(1 2 3))
lowtags: 1111
other: 0000001002531C8F : 0000001002531C80
1002531C80: 0000000000000089
1002531C88: 0000000000000006 = 3
1002531C90: 0000000000000002 = 1
1002531C98: 0000000000000004 = 2
1002531CA0: 0000000000000006 = 3
1002531CA8: 0000000000000000 = 0
Same deal for simple vectors: header, length, but now each entry takes a word of course. In the examples above all the entries are fixnums, and you can see them in the data.
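The number of words hexdump printed for each of these vectors is consistent with a simple rule: one header word, one length word, one word per element, rounded up to an even number of words because allocation is on two-word boundaries. A sketch of that arithmetic (the function name is mine, and this only reproduces what the dumps show):

```lisp
(defun simple-vector-words (n)
  "Words apparently occupied by a simple vector of N elements."
  (let ((raw (+ 2 n)))          ; header + length + N elements
    (+ raw (mod raw 2))))       ; round up to an even number of words

(mapcar #'simple-vector-words '(0 1 2 3))   ; => (2 4 4 6)
```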
And so it goes on.
The code that did this
This may be wrong and an earlier version of it definitely did not like small bignums (I think hexdump doesn't like them). If you want real answers either read the source or ask an SBCL person. Other implementations are available, and will be different.
(defun hexdump-thing (obj)
  ;; Try and hexdump an object, including immediate objects. All the
  ;; work is done by sb-vm:hexdump in the interesting cases.
  #-(and SBCL 64-bit)
  (error "not a 64-bit SBCL")
  (let* ((address/thing (sb-kernel:get-lisp-obj-address obj))
         (tags (ldb (byte 4 0) address/thing)))
    (format t "~&lowtags: ~12T~4,'0b~%" tags)
    (cond
      ((zerop (ldb (byte 1 0) tags))
       (format t "~&fixnum:~12T~16,'0x = ~S~%" address/thing obj))
      ((= (ldb (byte 2 0) tags) #b01)
       (format t "~&immediate:~12T~16,'0x = ~S~%" address/thing obj))
      ((= (ldb (byte 2 0) tags) #b11)   ;must be true
       (format t "~&~A:~12T~16,'0x : ~16,'0x~%"
               (case (ldb (byte 2 2) tags)
                 (#b00 "instance")
                 (#b01 "cons")
                 (#b10 "function")
                 (#b11 "other"))
               address/thing (dpb #b0000 (byte 4 0) address/thing))
       ;; this tells you at least something (and really annoyingly
       ;; does not pad addresses on the left)
       (sb-vm:hexdump obj))
      ;; can't happen
      (t (error "mutant"))))
  (values))