Filter-DB

Overview

Filter-DB (Filter-DataBase) is a utility for pre-processing datasets for further work with Coron-base and AssRuleX. It can filter the input dataset horizontally (i.e. the rows) and/or vertically (i.e. the attributes).

Synopsis

Usage: ./pre03_filterDb.sh  [switches]  <database>  <option>

Options:
   -attributes=<file>              In <file> specify some attributes.
                                   Lines having these attributes are kept.
                                   Optionally you can specify a 2nd line in <file>,
                                   which may contain one number.
                                   This means: keep lines having at least $number
                                               attributes of the list $attributes.
   -columnkeep=<file>              In <file> specify some attributes.
                                   Only the specified columns are kept.
   -columndelete=<file>            In <file> specify some attributes.
                                   The specified columns will be deleted.

Database:                          database file. Must be an RCF file.

Horizontal Filtering

Option: -attributes=<file>, where the configuration file <file> can have two forms.

The configuration file has one line

The configuration file has just one line, a list of attributes, e.g.

1 2

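Meaning: keep only the rows that have both attribute 1 and attribute 2. Attributes are referred to by their column position, counted from 1, so in the sample dataset these are a and b.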

Example:

./pre03_filterDb.sh sample/laszlo.rcf -attributes:sample/filter_db/horizontal_a

Output:

# keep lines that have attributes {1, 2}

[Relational Context]
Default Name
[Binary Relation]
Name_of_dataset
o1 | o3 | o5
a | b | c | d | e
1 1 0 1 1
1 1 1 0 1
1 1 1 0 1
[END Relational Context]
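
The rule above can be sketched in a few lines of Python. This is only an illustration of the filtering semantics, not the code of Filter-DB itself. The CONTEXT dictionary and the function name keep_rows_with_all are hypothetical, and the rows are reconstructed from the outputs shown on this page rather than read from sample/laszlo.rcf.

```python
# Sample context reconstructed from the outputs shown on this page
# (columns a..e, in that order). Assumption: this mirrors sample/laszlo.rcf.
CONTEXT = {
    "o1": [1, 1, 0, 1, 1],
    "o2": [1, 0, 1, 0, 0],
    "o3": [1, 1, 1, 0, 1],
    "o4": [0, 1, 1, 0, 1],
    "o5": [1, 1, 1, 0, 1],
}


def keep_rows_with_all(context, attributes):
    """Keep rows whose value is 1 in every listed column (positions start at 1)."""
    return {
        name: row
        for name, row in context.items()
        if all(row[a - 1] == 1 for a in attributes)
    }


filtered = keep_rows_with_all(CONTEXT, [1, 2])
print(" | ".join(filtered))  # o1 | o3 | o5
```

The surviving objects (o1, o3, o5) and their rows agree with the output listed above.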

The configuration file has two lines

The configuration file has one more line, containing a number, e.g.

1 2 3 5
3

Meaning: keep rows that have at least three attributes of {1, 2, 3, 5}. That is, keep rows that contain at least one of the following attribute combinations: {2, 3, 5}, {1, 3, 5}, {1, 2, 5}, {1, 2, 3} or {1, 2, 3, 5}.

Example:

./pre03_filterDb.sh sample/laszlo.rcf -attributes:sample/filter_db/horizontal_b

Output:

# keep lines that have at least 3 attributes of {1, 2, 3, 5}

[Relational Context]
Default Name
[Binary Relation]
Name_of_dataset
o1 | o3 | o4 | o5
a | b | c | d | e
1 1 0 1 1
1 1 1 0 1
0 1 1 0 1
1 1 1 0 1
[END Relational Context]
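
Both forms of the configuration file can be read with one small parser, as in the sketch below. It assumes that the one-line form behaves like a two-line form whose number equals the length of the attribute list; this matches the examples on this page but is not confirmed by the tool's source. The names parse_filter_config and keep_rows_with_at_least are hypothetical, and CONTEXT is reconstructed from the outputs shown here.

```python
# Sample context reconstructed from the outputs shown on this page
# (columns a..e, in that order). Assumption: this mirrors sample/laszlo.rcf.
CONTEXT = {
    "o1": [1, 1, 0, 1, 1],
    "o2": [1, 0, 1, 0, 0],
    "o3": [1, 1, 1, 0, 1],
    "o4": [0, 1, 1, 0, 1],
    "o5": [1, 1, 1, 0, 1],
}


def parse_filter_config(text):
    """Return (attributes, minimum) from the one-line or two-line form."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    attributes = [int(token) for token in lines[0]]
    # Assumption: without a second line, every listed attribute is required.
    minimum = int(lines[1][0]) if len(lines) > 1 else len(attributes)
    return attributes, minimum


def keep_rows_with_at_least(context, attributes, minimum):
    """Keep rows that have at least `minimum` of the listed attributes."""
    return {
        name: row
        for name, row in context.items()
        if sum(row[a - 1] for a in attributes) >= minimum
    }


attributes, minimum = parse_filter_config("1 2 3 5\n3\n")
two_line_result = keep_rows_with_at_least(CONTEXT, attributes, minimum)
print(" | ".join(two_line_result))  # o1 | o3 | o4 | o5
```

Row o2 is dropped because it has only two of the four listed attributes (1 and 3).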

Vertical Filtering

Keep some columns

The option to use is -columnkeep=<file>, where the configuration file <file> contains a single line, a list of attributes, e.g.

1 2 3

Meaning: in every row, keep only columns 1, 2 and 3. Note that the other columns are still present, but all their values are set to 0.

Example:

./pre03_filterDb.sh sample/laszlo.rcf -columnkeep:sample/filter_db/vertical_a

Output:

# in the rows of the database only keep the following columns: {1, 2, 3}

[Relational Context]
Default Name
[Binary Relation]
Name_of_dataset
o1 | o2 | o3 | o4 | o5
a | b | c | d | e
1 1 0 0 0
1 0 1 0 0
1 1 1 0 0
0 1 1 0 0
1 1 1 0 0
[END Relational Context]
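
The "keep" behaviour, including the detail that unlisted columns are zeroed rather than removed, can be sketched as follows. The function name keep_columns is hypothetical, and CONTEXT is reconstructed from the outputs shown on this page; the sketch illustrates the semantics only and is not the tool's implementation.

```python
# Sample context reconstructed from the outputs shown on this page
# (columns a..e, in that order). Assumption: this mirrors sample/laszlo.rcf.
CONTEXT = {
    "o1": [1, 1, 0, 1, 1],
    "o2": [1, 0, 1, 0, 0],
    "o3": [1, 1, 1, 0, 1],
    "o4": [0, 1, 1, 0, 1],
    "o5": [1, 1, 1, 0, 1],
}


def keep_columns(context, columns):
    """Zero every column not listed in `columns` (positions start at 1)."""
    keep = set(columns)
    return {
        name: [value if pos in keep else 0
               for pos, value in enumerate(row, start=1)]
        for name, row in context.items()
    }


kept = keep_columns(CONTEXT, [1, 2, 3])
for name, row in kept.items():
    print(name, " ".join(str(v) for v in row))
```

All five objects remain in the result and each row still has five values, which is why the header line "a | b | c | d | e" is unchanged in the output above.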

Delete some columns

The option to use is -columndelete=<file>, where the configuration file <file> contains a single line, a list of attributes, e.g.

1 3

Meaning: delete the 1st and the 3rd columns in each row of the dataset. Note that these columns are still present, but all their values are set to 0.

Example:

./pre03_filterDb.sh sample/laszlo.rcf -columndelete:sample/filter_db/vertical_b

Output:

# in the rows of the database delete the following columns: {1, 3}

[Relational Context]
Default Name
[Binary Relation]
Name_of_dataset
o1 | o2 | o3 | o4 | o5
a | b | c | d | e
0 1 0 1 1
0 0 0 0 0
0 1 0 0 1
0 1 0 0 1
0 1 0 0 1
[END Relational Context]
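
The "delete" behaviour can be sketched in the same way: the listed columns are zeroed in every row while the row width stays the same. The function name delete_columns is hypothetical, and CONTEXT is reconstructed from the outputs shown on this page; this is an illustration under those assumptions, not the tool's code.

```python
# Sample context reconstructed from the outputs shown on this page
# (columns a..e, in that order). Assumption: this mirrors sample/laszlo.rcf.
CONTEXT = {
    "o1": [1, 1, 0, 1, 1],
    "o2": [1, 0, 1, 0, 0],
    "o3": [1, 1, 1, 0, 1],
    "o4": [0, 1, 1, 0, 1],
    "o5": [1, 1, 1, 0, 1],
}


def delete_columns(context, columns):
    """Zero every column listed in `columns` (positions start at 1)."""
    drop = set(columns)
    return {
        name: [0 if pos in drop else value
               for pos, value in enumerate(row, start=1)]
        for name, row in context.items()
    }


deleted = delete_columns(CONTEXT, [1, 3])
for name, row in deleted.items():
    print(name, " ".join(str(v) for v in row))
```

Row o2 becomes all zeros because its only attributes were 1 and 3, yet the object is still listed, as in the output above.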