You can write your own routines for reading and writing files with examples. This is essentially trivial. A loading function should take a file name and a flag named createNewOn that tells when to create a new attribute. The flag must default to None, and if None is given, the function itself should set it to orange.Variable.MakeStatus.NoRecognizedValues. When constructing new attributes, the function should call Variable.make, which accepts the attribute's name, its type, its values and createNewOn. Variable.make returns the attribute together with a status telling whether a new attribute was created or an existing one was reused, and why. The function should return an ExampleTable and a list of these statuses. However, if createNewOn was None, the function should return only the ExampleTable.
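A minimal sketch of this contract in plain Python follows. It does not import Orange: make_variable is a hypothetical stand-in for Variable.make (it takes no type argument), the integer status codes only loosely mimic orange.Variable.MakeStatus, and a dict with "domain" and "rows" stands in for an ExampleTable. It also assumes that a new attribute is created whenever the status is at least createNewOn. What carries over to a real loader is the calling convention: createNewOn defaults to None, is replaced by NoRecognizedValues inside the function, and decides whether statuses are returned.

```python
# Toy status codes, ordered from "the existing attribute fits" to "no such
# attribute". They only loosely mimic orange.Variable.MakeStatus (assumption).
OK, MISSING_VALUES, NO_RECOGNIZED_VALUES, NOT_FOUND = range(4)

# name -> attribute; attributes are (name, values) tuples in this sketch.
_attributes = {}


def make_variable(name, values, create_new_on):
    """Hypothetical stand-in for Variable.make: returns (attribute, status)."""
    old = _attributes.get(name)
    if old is None:
        status = NOT_FOUND
    elif not set(old[1]) & set(values):
        status = NO_RECOGNIZED_VALUES
    elif not set(values) <= set(old[1]):
        status = MISSING_VALUES
    else:
        status = OK
    if status >= create_new_on:
        # Assumed rule: create a new attribute when the status is bad enough.
        old = (name, tuple(values))
        _attributes[name] = old
    return old, status


def load_my(filename, createNewOn=None):
    """Load a toy tab-separated file: a header line of names, then data rows."""
    want_statuses = createNewOn is not None
    if createNewOn is None:
        createNewOn = NO_RECOGNIZED_VALUES
    with open(filename) as f:
        lines = [line.rstrip("\n").split("\t") for line in f if line.strip()]
    names, rows = lines[0], lines[1:]
    domain, statuses = [], []
    for i, name in enumerate(names):
        values = sorted({row[i] for row in rows})
        attribute, status = make_variable(name, values, createNewOn)
        domain.append(attribute)
        statuses.append(status)
    table = {"domain": domain, "rows": rows}  # stand-in for an ExampleTable
    if want_statuses:
        return table, statuses
    return table
```

Called as load_my("data.my"), the function returns just the table; called with an explicit createNewOn, it returns the table and one status per attribute.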
This may not look as trivial as promised, but it is easier to explain in Python: see the module orngIO.py, whose function loadArff can serve as a template.
The function for writing gets the file name and examples and writes the file. No special functionality in Orange is needed for this.
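As a sketch, assuming a toy format with a header line of tab-separated attribute names followed by one line per example, a writer could look like this. save_my is a hypothetical name, and the examples are represented by a plain dict with "domain" (a list of (name, values) pairs) and "rows", standing in for an ExampleTable; a real writer would iterate over the ExampleTable and its domain instead.

```python
def save_my(filename, examples):
    """Write examples as tab-separated text: a header of names, then one row per line."""
    # Accept the file name with or without the extension.
    if not filename.endswith(".my"):
        filename += ".my"
    names = [name for name, _values in examples["domain"]]
    with open(filename, "w") as f:
        f.write("\t".join(names) + "\n")
        for row in examples["rows"]:
            f.write("\t".join(str(value) for value in row) + "\n")
```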
Examples are usually loaded by calling ExampleTable with a file name as the argument. If you write your own routines for a format with, say, the extension ".my", it would be handy if ExampleTable recognized this extension too and treated the format the same way as the built-in ones. Here is how to do it. First, define the function(s) for reading and/or writing the examples; you don't need to support both. The reading function should accept a single argument, the file name, and possibly some keyword arguments; it should return an ExampleTable with the loaded examples. The writing function takes the file name and the examples, writes the file(s) and returns nothing.
Unless the format is strictly for your private use and you really know what you are doing, the reading function should also accept keyword arguments. These should be the same arguments that are usually passed to ExampleTable, such as dontCheckStored. You can also add format-specific keyword arguments if you wish; everything ExampleTable is called with is passed on to your function. In addition, and even more importantly, you should use DomainDepots to construct and store domains (see the documentation on loading multiple files with the same domain for why this is needed, and the page about domain depots for how it is done).
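The sketch below illustrates both recommendations with hypothetical names. load_my collects keyword arguments with **kwargs, so whatever ExampleTable passes along does not break it; it reads dontCheckStored (assumed here to mean "do not reuse a stored domain") and a made-up format-specific option, separator. get_domain keeps domains in a dict keyed by the attribute names, so two files with the same header share one domain object. That dict only illustrates why a DomainDepot is needed; it does not reproduce the DomainDepot interface.

```python
# attribute names -> shared domain; a toy stand-in for what a DomainDepot provides.
_stored_domains = {}


def get_domain(names, dont_check_stored=False):
    """Reuse the stored domain for these names unless told not to check."""
    key = tuple(names)
    if not dont_check_stored and key in _stored_domains:
        return _stored_domains[key]
    domain = list(names)  # a real loader would construct an Orange domain here
    _stored_domains.setdefault(key, domain)
    return domain


def load_my(filename, **kwargs):
    """Load a toy delimited file; unrecognized keyword arguments are ignored."""
    # Assumed meaning of dontCheckStored: build a fresh domain instead of reusing one.
    dont_check = kwargs.get("dontCheckStored", False)
    separator = kwargs.get("separator", "\t")  # hypothetical format-specific option
    with open(filename) as f:
        lines = [line.rstrip("\n").split(separator) for line in f if line.strip()]
    return {"domain": get_domain(lines[0], dont_check), "rows": lines[1:]}
```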
You can also add keyword arguments to the writing function. No general arguments of this kind are currently used in Orange, but you can introduce format-specific arguments if needed.
The file name is passed exactly as the caller gave it to ExampleTable or to its save method: if the caller included an extension, it will be there. Your code should therefore accept names both with and without the extension, so it will typically begin with something like if filename[-5:] == ".arff": filename = filename[:-5].
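The one-liner above hard-codes the length of ".arff". A slightly more general helper, hypothetical and not part of Orange, checks with str.endswith, accepts several extensions (useful for formats spread over multiple files) and leaves names without an extension unchanged.

```python
def strip_extension(filename, extensions):
    """Remove the first matching extension from filename, if any."""
    for extension in extensions:
        # Skip empty strings: filename[:-0] would return an empty name.
        if extension and filename.endswith(extension):
            return filename[:-len(extension)]
    return filename


print(strip_extension("iris.arff", [".arff"]))              # iris
print(strip_extension("iris", [".arff"]))                   # iris
print(strip_extension("data.names", [".names", ".data"]))   # data
```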
Finally, if you want ExampleTable's constructor and its save method to recognize the format, you need to register your functions with Orange. The registration call takes four arguments: the first is the format's name, the second and third are the functions for loading and saving the data, and the last specifies the extension. If multiple extensions are possible, give a list of strings instead of a single string.
If you don't support a certain operation, pass None in place of the corresponding function. For example, a C5.0 file type that can save data but not load it is registered with None as its loading function.
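To make the mechanism concrete, here is a self-contained sketch of such a registry. register_file_type, load_file and save_file are hypothetical names, not Orange's own registration function; they take the same four arguments described above, accept a single extension or a list of them, and treat None as an unsupported operation.

```python
# extension -> (format name, loading function or None, saving function or None)
_file_types = {}


def register_file_type(name, loader, saver, extensions):
    """Record a format under one extension or a list of extensions."""
    if isinstance(extensions, str):
        extensions = [extensions]
    for extension in extensions:
        _file_types[extension] = (name, loader, saver)


def _lookup(filename):
    """Find the registered entry whose extension ends the file name."""
    for extension, entry in _file_types.items():
        if filename.endswith(extension):
            return entry
    raise ValueError("no registered format for %r" % filename)


def load_file(filename, **kwargs):
    name, loader, _saver = _lookup(filename)
    if loader is None:
        raise ValueError("format %s cannot be loaded" % name)
    return loader(filename, **kwargs)


def save_file(filename, examples, **kwargs):
    name, _loader, saver = _lookup(filename)
    if saver is None:
        raise ValueError("format %s cannot be saved" % name)
    saver(filename, examples, **kwargs)
```

With this sketch, a save-only format would be registered as register_file_type("C50", None, save_c50, [".names", ".data"]), where save_c50 is a hypothetical writer; load_file then refuses such files while save_file dispatches to the writer.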
User-defined types take precedence over the built-ins if the extension is given: if you load a file as ExampleTable("mydata.names"), the user-defined C5.0 functions are used (or rather would be, since that C5.0 type defines no loading function). If there is no extension, the built-ins are tried first. This is rather confusing, so it is advisable not to override existing extensions unless absolutely necessary. (Note: these rules might change; in the future, user-defined formats will probably take precedence over built-ins regardless of whether the extension is given.)
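The current precedence rules can be expressed as a small resolution function. This illustrates the rules as stated, not Orange's actual lookup code: resolve_reader is a hypothetical name, the two format tables map an extension to a format name, and the exists parameter can be replaced for testing.

```python
import os


def resolve_reader(filename, user_types, builtin_types, exists=os.path.exists):
    """Return (origin, format name, path) following the precedence rules above."""
    _root, ext = os.path.splitext(filename)
    if ext:
        # Extension given: a user-defined type for it wins over a built-in one.
        for origin, types in (("user", user_types), ("builtin", builtin_types)):
            if ext in types:
                return origin, types[ext], filename
        raise ValueError("unknown extension %r" % ext)
    # No extension: try the built-in extensions first, then the user-defined ones.
    for origin, types in (("builtin", builtin_types), ("user", user_types)):
        for candidate_ext, fmt in types.items():
            if exists(filename + candidate_ext):
                return origin, fmt, filename + candidate_ext
    raise ValueError("no file found for %r" % filename)
```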
For more examples, see the module orngIO.py in your Orange directory.