1

I'm working on a Java project and I have to read some files like these: - EntryID.data - EntryID.index - KeyText.data - KeyText.index ...

I think these files are used in a dictionary project, but I can't find any documentation about them. How can I read them or find out their format? Sorry for my English.

Thanks a lot!

1
  • Umm ... if you don't know where the files come from and you don't know what they contain, why do you need to read them? Commented Dec 21, 2010 at 8:44

3 Answers

2

These look like files from a database management system: one file to store the data, and another to store at least one index to speed up queries.

I'd start with a hex editor and look at the file. Sometimes the binary content gives a hint.

Another idea: look at the classpath and inspect property and resource files. Maybe you'll find a database driver or some config files with JDBC connection strings.
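If no hex editor is at hand, a first look can also be taken from Java itself. The following is only a minimal sketch: the class name HexDump, the 16-byte row layout, and the default file name are my own choices for illustration, and nothing here is specific to the dictionary format.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of any dictionary library.
final class HexDump {

    /** Formats the first `length` bytes as offset, hex columns, and printable ASCII. */
    static String format(byte[] data, int length) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(length, data.length);
        for (int row = 0; row < n; row += 16) {
            sb.append(String.format("%08x  ", row));
            StringBuilder ascii = new StringBuilder();
            for (int i = row; i < row + 16; i++) {
                if (i < n) {
                    int b = data[i] & 0xFF; // treat the byte as unsigned
                    sb.append(String.format("%02x ", b));
                    ascii.append(b >= 0x20 && b < 0x7F ? (char) b : '.');
                } else {
                    sb.append("   "); // pad a short final row
                }
            }
            sb.append(' ').append(ascii).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The default file name is just the one mentioned in the question.
        Path file = Paths.get(args.length > 0 ? args[0] : "KeyText.data");
        byte[] head = new byte[256];
        int read;
        try (InputStream in = Files.newInputStream(file)) {
            read = in.readNBytes(head, 0, head.length); // requires Java 9+
        }
        System.out.print(format(head, read));
    }
}
```

Dumping only the first few hundred bytes is usually enough to spot a magic number or a header; increase the buffer size if the interesting part starts later.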


Google told me that all four files are used by Apple's Dictionary.app. Have a look at this blog; it may point you in the right direction.


One last note: reading undocumented binaries is a challenge. I usually start with 010 Editor to analyse the data structure and then develop a Java-based test tool to read the data. It's a sort of evolutionary trial-and-error process.


2 Comments

Thank you guys. I've used a hex editor to open it and I can read some meaningful strings, but I still can't work out the file format. The KeyText.data file contains definitions of English words in Vietnamese, but I can't understand the content of the KeyText.index file. I've also read a lot of topics about Apple's dictionary app but can't find anything; I will try again. Thanks in advance!
It is a dictionary, so you can expect some sort of key/value pairs: an English word and a translation in a different language. I doubt you'll find a Java library that can use these dictionary files directly; concentrate on parsing the required information from the file, perhaps into a new data structure.
1

Well, this is kind of difficult. A `.data` extension could mean anything.

You could try the UNIX utility `file`, or open the file with a hex editor and look for interesting strings (the utility `strings` is helpful for that too).
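Since the question is about Java, here is a rough sketch of what the strings utility does: collect runs of printable ASCII bytes above a minimum length. The class name PrintableStrings and the minimum length of 4 are assumptions for illustration. Note that it only recognises ASCII, so text stored as multi-byte UTF-8 (Vietnamese, for example) will show up fragmented or not at all.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mimics the basic behaviour of the Unix strings tool.
final class PrintableStrings {

    /** Returns every run of at least minLength printable ASCII bytes, in file order. */
    static List<String> extract(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte value : data) {
            int b = value & 0xFF;
            if (b >= 0x20 && b < 0x7F) {
                run.append((char) b); // still inside a printable run
            } else {
                if (run.length() >= minLength) {
                    found.add(run.toString());
                }
                run.setLength(0); // a non-printable byte ends the run
            }
        }
        if (run.length() >= minLength) {
            found.add(run.toString()); // a run may reach the end of the file
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // The default file name is just the one mentioned in the question.
        byte[] data = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "KeyText.data"));
        extract(data, 4).forEach(System.out::println);
    }
}
```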


0

Some information is in info.plist.
KeyText.data is sometimes compressed using zlib. 78 9C is a well-known zlib header, so you can decompress the data wherever you find it. The size of the decompressed entry comes before the compressed entry.
Likewise, the size of each entry comes before that entry in the array.

A C# library is available at https://github.com/kurema/MacDictionaryGeneral. However, *.index is too difficult to understand and implement. info.plist says *.index is a trie index, which is not enough information to understand it fully.
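For the Java side, a hedged sketch of that idea using java.util.zip.Inflater: scan for the 78 9C bytes and try to inflate from each hit. The class and method names (ZlibProbe, findHeader, inflateAt) are made up for this example. Because the width and byte order of the size fields are not stated here, the sketch deliberately ignores them and simply probes for the header; a false hit (78 9C occurring by chance) fails with a DataFormatException and is skipped.

```java
import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical probe; it does not parse the length fields that precede each entry.
final class ZlibProbe {

    /** Returns the index of the first 0x78 0x9C pair at or after `from`, or -1 if none. */
    static int findHeader(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + 1 < data.length; i++) {
            if ((data[i] & 0xFF) == 0x78 && (data[i + 1] & 0xFF) == 0x9C) {
                return i;
            }
        }
        return -1;
    }

    /** Inflates one zlib stream starting at `offset`; bytes after the stream are ignored. */
    static byte[] inflateAt(byte[] data, int offset) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data, offset, data.length - offset);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or not a plain zlib stream");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // release native zlib resources
        }
    }

    public static void main(String[] args) throws Exception {
        // The default file name is just the one mentioned in the question.
        byte[] data = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "KeyText.data"));
        for (int pos = findHeader(data, 0); pos >= 0; pos = findHeader(data, pos + 1)) {
            try {
                byte[] entry = inflateAt(data, pos);
                System.out.printf("offset %d: %d bytes inflated%n", pos, entry.length);
            } catch (DataFormatException e) {
                // 78 9C can also appear by chance inside other data; skip such hits.
            }
        }
    }
}
```

Once the offsets of real streams are known, comparing the bytes just before each one with the inflated length is a reasonable way to work out how the size fields are encoded.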
