1

I want to read a .dbf file in R that contains "deleted" rows. However, I don´t know how to avoid reading all records. I am using the package "foreign" and the function read.dbf.

According to the .dbf file description, each record begins with a 1-byte "deletion" flag. The byte's value is a space (0x20), if the record is active, or an asterisk (0x2A), if the record is deleted.

How can I extract this information using R? E.g. for a small sample of the iris data set saved as a .dbf file:

library(foreign)
dbf.file <- 'iris.dbf'
write.dbf(iris[1:5, ], file=dbf.file)
3
  • 1
    DBF deleted records start with a one byte character, the asterisk, ASCII 2A (hex) or 42 (decimal). And if the record is not deleted, that byte should be a space, 20 hex or 32 decimal. I am not sure how this can be of help, this century I haven't worked with DBF files yet. Commented Aug 25, 2023 at 14:36
  • Thanks. I read that information in another question here in stack overflow, but I don´t know how it helps in practice. Commented Aug 25, 2023 at 14:54
  • 2
    It's easier to help you if you include a simple reproducible example with sample input and desired output that can be used to test and verify possible solutions. Commented Aug 25, 2023 at 15:03

1 Answer 1

1

We can use the readBin() function to read the .dbf file as binary data.

# read binary data
bin <- readBin(dbf.file, what='raw', n=file.size(dbf.file))

Then, based on the .dbf format description, we can read information necessary to navigate ourselves to the first byte of each record. I use a custom function to convert the appropriate bytes from the .dbf header into an unsigned integer.

bin_to_int <- function(bin) {
  if (.Platform$endian == 'big') {
    bin <- rev(bin)
  }
  sum(2^(which(as.logical(rawToBits(bin))) - 1))
}

# number of records
n <- bin_to_int(bin[5:8])
# numer of bytes in the header
header.len <- bin_to_int(bin[9:10])
# numer of bytes in the record
record.len <- bin_to_int(bin[11:12])

With these, it is possible to compute what are the records' first bytes and see if they mark the record as deleted or not.

# first bytes of records
record.starts <- header.len + 1 + (seq_len(n)-1) * record.len
is.deleted <- bin[record.starts] == 0x2A
is.deleted
# [1] FALSE FALSE FALSE FALSE FALSE

Indeed, none of the records were marked as deleted so we can at least check if the bytes hold the expected value of 0x20:

bin[record.starts]
# [1] 20 20 20 20 20

On a side note, from documentation it is not clear how read.dbf() treats the deleted records so chances are it ignores them and you won't have to deal with this issue at all. It would be interesting to know this so please let us know in the comments.

Sign up to request clarification or add additional context in comments.

1 Comment

Thank you, Robert,! Indeed, it worked! The final part should be to read the file as final.file=read.dbf(dbf.file), and then final.file = final.file[!is.deleted,]. The read.dbf() does not recognize the deleted, it reads all records and does not have a parameter to avoid reading all.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.