The Scratch file format is barely known, but information on it is easy to find: the spec is outdated, but it exists. This topic, however, is about the sprite file format. If we understood it, we could write programs that export data to Scratch. Imagine the possibilities.
To start, I made a one-dot sprite with no scripts and exported it to a file. Then I opened the file in my hex editor. Here is a dump of its first 32 bytes.
ObjS.Stch....!}..c....c...c.....
4F 62 6A 53 01 53 74 63 68 01 00 00 00 21 7D 05
15 63 00 00 02 01 63 00 00 03 63 00 00 04 05 00
If you've done any work with Scratch files, you'll recognize the first 10 bytes at once: the start of a Scratch object. This gives me the impression that Scratch sprite files are really just sprite objects separated from the rest of the project file. One recognizable pattern is that the character c keeps coming up. Lowercase c's ASCII code is 99, or 63 in hex. The number 99 stands out in base 10, which gives the impression that it means something. A little decompiling of Scratch reveals this piece of code.
(fragment, cut for ease of reading)
fixedFormatClasses
    "Answer an array of records for fixed-format classes."
    ^ #(
        "id class read selector write selector"
        (35 ColorForm getForm:id: putForm:id:)
        "99 reserved for object references"
        "100-255 reserved for user-defined classes"
    )
The interesting part is the note--99 must mean that we are looking at a file full of object references. Weird.
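As a quick sanity check on the dump above, here is a minimal Python sketch (not anything taken from Scratch itself) that loads those 32 bytes, tests for the 10-byte signature, and lists where byte 99 shows up. The names HEADER, has_object_header and offsets_of are made up for illustration.

```python
# First 32 bytes of the exported sprite file, copied from the hex dump above.
DUMP_HEX = (
    "4F 62 6A 53 01 53 74 63 68 01 00 00 00 21 7D 05 "
    "15 63 00 00 02 01 63 00 00 03 63 00 00 04 05 00"
)
data = bytes.fromhex(DUMP_HEX)

# The 10-byte signature seen at the start of the dump:
# "ObjS", 0x01, "Stch", 0x01. (HEADER is a name chosen for this sketch.)
HEADER = b"ObjS\x01Stch\x01"


def has_object_header(buf: bytes) -> bool:
    """Return True if buf starts with the 10-byte signature."""
    return buf.startswith(HEADER)


def offsets_of(buf: bytes, value: int) -> list:
    """Return every offset in buf where a byte equals value."""
    return [i for i, b in enumerate(buf) if b == value]


print(has_object_header(data))  # True
print(offsets_of(data, 99))     # [7, 17, 22, 26]
```

The first hit, offset 7, is just the letter c inside "Stch", so scanning for byte 99 alone is not enough to tell a reference marker from ordinary data.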
This document is a work in progress, and it needs your help to finish it. If you know something about the Scratch file format that could help with this, kindly respond.
DISCLAIMER: I am not on the Scratch Team and cannot guarantee that any information is accurate.
-bobbybee
I'm pretty sure that a .sprite file is just the Sprite serialized with ObjStream. A .sb file is the stage serialized with ObjStream; this contains all the sprites in the "sprites"--and, indirectly, "submorphs"--fields.
The object references are there to preserve identity and store potentially circular structures--for example, a sprite contains a reference to its parent (the stage), and its parent contains a reference to that sprite. If the stage was serialized each time it appeared in an object's field, attempting to serialize it would result in an infinite loop (serialize stage -> serialize child sprite -> serialize parent stage -> ...). With object references, each object is serialized only the first time it is encountered (including an object ID), and if it occurs again, a reference is inserted in its place. When the objects are deserialized, these object references are resolved using a lookup table, so they store references to the same object.
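The idea in the paragraph above can be sketched in a few lines of Python. This is a toy illustration of serializing with object references, not ObjStream's actual encoding: the Node class, the tuple layout, and the "owner" field name are all assumptions made up for the example.

```python
class Node:
    """Toy stand-in for a stage or sprite (hypothetical, not a Scratch class)."""

    def __init__(self, name):
        self.name = name
        self.fields = {}


def serialize(obj, seen=None):
    """Write each Node once; later occurrences become ("ref", id)."""
    if seen is None:
        seen = {}  # maps id(node) -> 1-based object ID
    if isinstance(obj, list):
        return [serialize(item, seen) for item in obj]
    if not isinstance(obj, Node):
        return obj  # plain values are stored inline
    if id(obj) in seen:
        return ("ref", seen[id(obj)])
    obj_id = len(seen) + 1
    seen[id(obj)] = obj_id  # record the ID before visiting the fields
    fields = {key: serialize(value, seen) for key, value in obj.fields.items()}
    return ("obj", obj_id, obj.name, fields)


def deserialize(data, table=None):
    """Rebuild the graph, resolving references through a lookup table."""
    if table is None:
        table = {}  # maps object ID -> rebuilt Node
    if isinstance(data, list):
        return [deserialize(item, table) for item in data]
    if isinstance(data, tuple) and data[0] == "ref":
        return table[data[1]]
    if isinstance(data, tuple) and data[0] == "obj":
        _, obj_id, name, fields = data
        node = Node(name)
        table[obj_id] = node  # register first so back-references resolve
        for key, value in fields.items():
            node.fields[key] = deserialize(value, table)
        return node
    return data


stage = Node("Stage")
sprite = Node("Sprite1")
stage.fields["sprites"] = [sprite]
sprite.fields["owner"] = stage  # circular: the sprite points back at the stage

encoded = serialize(stage)
restored = deserialize(encoded)
print(encoded)
print(restored.fields["sprites"][0].fields["owner"] is restored)  # True
```

Without the seen table, serialize would recurse forever on the stage/sprite cycle; with it, the second visit to the stage collapses to ("ref", 1).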
nXIII, that would explain what those object references are for. MathWizz, I already knew about that part, but this is about the individual sprite files, rather than the entire project itself.
Thanks to both of you.
-bobbybee
Just in case you missed it, there's the following in the document MathWizz linked:
5.5 Object References
An object reference allows a field in one object to contain a pointer to another object. It has the following format:
<99: 1 byte constant><object table index: 3 byte big-endian integer>
The value 99 is a reserved classID value used to indicate an object reference. The first object table index is 1, unlike C or Java arrays where the first entry is at index 0.
Example: An object reference to the second entry in the object table is encoded as four bytes: 99, 0, 0, 2.
I think nXIII may be right -- sprites are serialised in a very similar fashion to projects, so that document should have most of what you need.
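To make the quoted format concrete, here is a small Python sketch that encodes and decodes an object reference as one marker byte (99) plus a 3-byte big-endian index, four bytes in total. The function names are made up for illustration. The sample bytes are offsets 17 to 29 of the dump in the first post; treating the 99s there as reference markers is an assumption, since a real reader would have to walk the stream object by object to know where each field starts.

```python
OBJECT_REF_ID = 99  # reserved classID for object references (0x63, ASCII "c")


def encode_object_ref(index: int) -> bytes:
    """Encode a 1-based object table index as a 4-byte object reference."""
    if not 1 <= index <= 0xFFFFFF:
        raise ValueError("index must fit in 3 bytes and start at 1")
    return bytes([OBJECT_REF_ID]) + index.to_bytes(3, "big")


def decode_object_ref(buf: bytes, offset: int = 0) -> int:
    """Decode the object reference at offset and return its table index."""
    if len(buf) < offset + 4:
        raise ValueError("not enough bytes for an object reference")
    if buf[offset] != OBJECT_REF_ID:
        raise ValueError("byte at offset is not the object reference marker")
    return int.from_bytes(buf[offset + 1:offset + 4], "big")


# The example from the quoted section: a reference to the second table entry.
print(list(encode_object_ref(2)))  # [99, 0, 0, 2]

# Bytes 17..29 of the sprite dump from the first post.
sample = bytes.fromhex("63 00 00 02 01 63 00 00 03 63 00 00 04")
print([decode_object_ref(sample, off) for off in (0, 5, 9)])  # [2, 3, 4]
```

Read this way, the three runs in the dump would point at entries 2, 3 and 4 of the object table.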
Last edited by blob8108 (2011-12-12 08:03:00)
I'll take a look at that. Thanks everyone.
I was working on reverse-engineering some of this stuff, and I made a little wiki to collect my findings. Hope it helps!
hsshah wrote:
Have you tried parsing the Project Summary directly?
I got to know about it as a reply to my post yesterday. Shift-Click on File --> Write Project Summary
Thanks! That's interesting... That would definitely help for reading project files, but not for writing Scratch files... There's no way of importing the summary.
Last edited by blob8108 (2011-12-14 15:06:28)
With some Squeak hacking, you should be able to read it in.
bobbybee wrote:
With some Squeak hacking, you should be able to read it in.
I don't think that does the right thing, though. Writing Scratch files seems better, somehow. Also, that way sounds far too easy.
Did you try to join that wiki? If so, I'll hit accept...
Yeah, I did try to join the wiki. (It clearly could use a little help, and I have always expressed an interest in reverse-engineering, so it should be fine.)
-bobbybee
bobbybee wrote:
Yeah, I did try to join the wiki. (It clearly could use a little help, and I have always expressed an interest in reverse-engineering, so it should be fine.)
Oh, I'd love the help! Just wanted to make sure 'twas you. And reverse-engineering is fun...
I know, right? Just wondering, what section of the file format should I research? I am working on an app to write sprite files, if it matters.
bobbybee wrote:
I know, right? Just wondering, what section of the file format should I research? I am working on an app to write sprite files, if it matters.
Sprite files leave out the info table, so they're slightly easier to deal with. I've figured out the formats of most of the fixed-format objects, but not always what they do (like the field called "Form"). There's a lot missing under user-class objects, like the properties stored for each one. It should be easy to find in the Squeak browser, if you look at the storeFieldsOn method under "object i/o" on each object.
blob8108 wrote:
Sprite files leave out the info table, so they're slightly easier to deal with. I've figured out the formats of most of the fixed-format objects, but not always what they do (like the field called "Form"). There's a lot missing under user-class objects, like the properties stored for each one. It should be easy to find in the Squeak browser, if you look at the storeFieldsOn method under "object i/o" on each object.
I summarized how to decode Forms here.
For user classes, look at the #initFieldsFrom: message in each class's instance methods; they show the order in which their fields are encoded.
nXIII wrote:
I summarized how to decode Forms here.
Thanks! I think I'm beginning to understand it. For what I'm trying to do, I think I can just parse it as an object with 5/6 fields and ignore its contents; I'm not going to try and decode it just yet.
Is it okay, though, if I copy your description to the little wiki I made? Happy to remove it if not.
nXIII wrote:
For user classes, look at the #initFieldsFrom: message in each class's instance methods; they show the order in which their fields are encoded.
That's vaguely what I've been doing. I wish there was some automated way to export those lists of fields for each class from Squeak in one go...
Last edited by blob8108 (2011-12-15 15:42:40)
So, what should I work on?
bobbybee wrote:
So, what should I work on?
Descriptions of the properties of each (or even some) of the user-class objects would be great. As nXIII pointed out:
nXIII wrote:
For user classes, look at the #initFieldsFrom: message in each class's instance methods; they show the order in which their fields are encoded.
(It should be under "object i/o" on each object.)
Feel free to edit any of the wiki pages to correct/add/fix/question something — that's what it's for (:
On it.
blob8108 wrote:
nXIII wrote:
For user classes, look at the #initFieldsFrom: message in each class's instance methods; they show the order in which their fields are encoded.
That's vaguely what I've been doing. I wish there was some automated way to export those lists of fields for each class from Squeak in one go...
You can browse implementors of #storeFieldsOn: or #initFieldsFrom:version:; that will give you a list (new workspace -> type "storeFieldsOn:" -> cmd+m or cmd+b)
Nice tip, nXIII. That should save me a ton of time.
nXIII wrote:
blob8108 wrote:
nXIII wrote:
For user classes, look at the #initFieldsFrom: message in each class's instance methods; they show the order in which their fields are encoded.
That's vaguely what I've been doing. I wish there was some automated way to export those lists of fields for each class from Squeak in one go...
You can browse implementors of #storeFieldsOn: or #initFieldsFrom:version:; that will give you a list (new workspace -> type "storeFieldsOn:" -> cmd+m or cmd+b)
That is so useful. Thank you!
Also, I've added a description on the wiki about SoundMedia. Are the morphs marked with a star the important ones that need a description if they don't have one?
bobbybee wrote:
Also, I've added a description on the wiki about SoundMedia. Are the morphs marked with a star the important ones that need a description if they don't have one?
The asterisk is from the original document — I think it just means they're in current use.
Thanks for your help!
Last edited by blob8108 (2011-12-15 16:07:57)