This example demonstrates the implementation of a simple core subset selection problem in JAMES. A distance matrix is given, which expresses the dissimilarity between each pair of items as a value in [0,1]. The goal is to select a subset of items so that the average distance between all pairs of selected items is maximized. Every item also has a name. Therefore, the input consists of the combination of an N × N distance matrix and an array of N item names (in corresponding order).
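To make the setup concrete, here is a tiny, purely illustrative toy input (not taken from the actual example data): four named items and a symmetric 4 × 4 distance matrix, together with the value of the objective for two possible subsets.

// hypothetical toy input, purely for illustration
String[] names = {"A", "B", "C", "D"};
double[][] dist = {
    {0.0, 0.2, 0.4, 0.9},
    {0.2, 0.0, 0.6, 0.8},
    {0.4, 0.6, 0.0, 0.7},
    {0.9, 0.8, 0.7, 0.0}
};
// selecting {A, B, C} gives pairwise distances 0.2, 0.4 and 0.6,
// so its value is (0.2 + 0.4 + 0.6) / 3 = 0.4
// selecting {A, B, D} gives (0.2 + 0.9 + 0.8) / 3 ≈ 0.633, which is better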
First create a data class CoreSubsetData by implementing the IntegerIdentifiedData interface. The data contains the distance matrix and item names, and every item is assigned a unique integer ID. Here the IDs correspond to the respective indices in the distance matrix and name array. The IntegerIdentifiedData interface has a single required method getIDs(), which returns the set of all IDs that have been assigned to the items. The framework needs to know these IDs so that any subset selection problem can be translated into the generic problem of selecting a subset of IDs.
public class CoreSubsetData implements IntegerIdentifiedData {

    // item names
    private String[] names;
    // distance matrix
    private double[][] dist;
    // IDs
    private Set<Integer> ids;

    public CoreSubsetData(String[] names, double[][] dist){
        // store data
        this.names = names;
        this.dist = dist;
        // infer IDs: 0..N-1 in case of N items
        // (indices in distance matrix and name array)
        ids = new HashSet<>();
        for(int id=0; id<names.length; id++){
            ids.add(id);
        }
    }

    public Set<Integer> getIDs() {
        return ids;
    }

}
Now add two custom methods to access the underlying data based on the IDs of selected items. The method getName(int id) returns the name of an item with a given ID, and getDistance(int id1, int id2) yields the distance between two items.
public String getName(int id){
    return names[id];
}

public double getDistance(int id1, int id2){
    return dist[id1][id2];
}
Now create a class CoreSubsetObjective that implements the Objective interface. This interface has two type parameters: the solution type and the data type. Set the solution type to SubsetSolution (as for all subset selection problems) and the data type to CoreSubsetData (specifically for this example). The Objective interface requires two methods to be implemented:

Evaluation evaluate(SubsetSolution, CoreSubsetData): evaluates a subset solution given our core subset data. The solution indicates the IDs of the selected items, based on which the necessary data can be retrieved. In this example, the average distance between all selected items is computed. The value is fixed to 0.0 if fewer than 2 items are selected (no distances).

boolean isMinimizing(): indicates whether evaluations are to be minimized or maximized, i.e. whether lower or higher values are better, respectively. In this example, false is returned, as the average distance is to be maximized.
The evaluate(...) method returns an object of type Evaluation (an interface) that can be converted into a double value by calling getValue(). A predefined implementation SimpleEvaluation is provided that merely wraps a double value. For this example we can simply compute the average distance as a double value and wrap it in such a simple evaluation object when returning it. Example 1B and example 1C further explain why evaluate(...) returns an evaluation object rather than a plain double value, and how this can be used to implement efficient delta evaluations. For now, we keep things simple.
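As a small illustration of how such an evaluation object is created and read back (the value 0.42 is just a made-up number):

// wrap a double value in a simple evaluation object and read it back
Evaluation eval = SimpleEvaluation.WITH_VALUE(0.42);
double value = eval.getValue(); // 0.42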
public class CoreSubsetObjective implements Objective<SubsetSolution, CoreSubsetData>{

    public Evaluation evaluate(SubsetSolution solution, CoreSubsetData data) {
        double value = 0.0;
        if(solution.getNumSelectedIDs() >= 2){
            // at least two items selected: compute average distance
            int numDist = 0;
            double sumDist = 0.0;
            Integer[] selected = new Integer[solution.getNumSelectedIDs()];
            solution.getSelectedIDs().toArray(selected);
            for(int i=0; i<selected.length; i++){
                for(int j=i+1; j<selected.length; j++){
                    sumDist += data.getDistance(selected[i], selected[j]);
                    numDist++;
                }
            }
            value = sumDist/numDist;
        }
        return SimpleEvaluation.WITH_VALUE(value);
    }

    public boolean isMinimizing() {
        return false;
    }

}
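As a quick sanity check, the objective can also be evaluated on a manually constructed solution. The sketch below assumes that SubsetSolution provides a constructor taking the full set of IDs and a select(int) method; consult the API documentation to confirm these signatures.

// hypothetical sanity check of the objective (assumed SubsetSolution API)
// using e.g. the toy names/dist arrays shown earlier
CoreSubsetData data = new CoreSubsetData(names, dist);
CoreSubsetObjective obj = new CoreSubsetObjective();
SubsetSolution sol = new SubsetSolution(data.getIDs()); // initially empty selection
sol.select(0);
sol.select(1);
sol.select(2);
System.out.println("Average distance: " + obj.evaluate(sol, data).getValue());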
We are now ready to apply an optimization algorithm to sample a good core subset. Here we will use the random descent algorithm, starting from a random initial solution (the default), with the predefined SingleSwapNeighbourhood. This neighbourhood generates moves that swap a single selected and unselected ID. The data and objective are combined in a SubsetProblem with data type CoreSubsetData. When creating the problem, the desired subset size is also specified. The problem is then passed to the search to find a good solution. Note that searches are parameterized on the solution type of the problem being solved, which should be set to SubsetSolution for this example.

A maximum runtime is set before starting the search. Alternatively, a different stop criterion could be used, such as a maximum number of steps or a maximum time without finding any improvement (a sketch follows after the code below). Calling search.start() blocks until the search has terminated, after which the best found solution and its evaluation are printed to the console. Here, the data is used again to map selected IDs to item names. Finally, the search is disposed so that all resources are released.
// set name array and distance matrix (e.g. read from a file)
String[] names = ...
double[][] dist = ...

// create data object
CoreSubsetData data = new CoreSubsetData(names, dist);
// create objective
CoreSubsetObjective obj = new CoreSubsetObjective();

// combine data and objective in a subset problem, specify desired subset size
int subsetSize = ...
SubsetProblem<CoreSubsetData> problem = new SubsetProblem<>(data, obj, subsetSize);

// create random descent search with single swap neighbourhood
RandomDescent<SubsetSolution> search = new RandomDescent<>(problem, new SingleSwapNeighbourhood());
// set maximum runtime (in seconds)
long timeLimit = ...
search.addStopCriterion(new MaxRuntime(timeLimit, TimeUnit.SECONDS));

// start search
search.start();

// print best solution and evaluation
System.out.println("Best solution (IDs): " + search.getBestSolution().getSelectedIDs());
System.out.println("Best solution (names): " + search.getBestSolution().getSelectedIDs()
                                                     .stream()
                                                     .map(data::getName)
                                                     .collect(Collectors.toSet()));
System.out.println("Best solution evaluation: " + search.getBestSolutionEvaluation());

// dispose search
search.dispose();
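As mentioned above, a different stop criterion could be used instead of, or in addition to, a maximum runtime. The sketch below assumes that the framework provides MaxSteps and MaxTimeWithoutImprovement stop criterion classes with constructors analogous to MaxRuntime; verify the exact class names in the API documentation.

// stop after a maximum number of steps (assumed MaxSteps stop criterion)
search.addStopCriterion(new MaxSteps(100000));
// or stop when no improvement has been found for some time
// (assumed MaxTimeWithoutImprovement stop criterion)
search.addStopCriterion(new MaxTimeWithoutImprovement(30, TimeUnit.SECONDS));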
It is possible to attach one or more search listeners to a search. These are informed by the algorithm when certain events occur; for example, an event is fired when the search has started, stopped or found a new best solution. We now create a ProgressSearchListener that prints messages to the console to stay informed about the progress of a search.
public class ProgressSearchListener implements SearchListener<Solution> {

    public void searchStarted(Search search) {
        System.out.println(" >>> Search started");
    }

    public void searchStopped(Search search) {
        System.out.println(" >>> Search stopped (" + search.getRuntime()/1000 + " sec, "
                                                   + search.getSteps() + " steps)");
    }

    public void newBestSolution(Search search,
                                Solution newBestSolution,
                                Evaluation newBestSolutionEvaluation,
                                Validation newBestSolutionValidation) {
        System.out.println(" >>> New best solution: " + newBestSolutionEvaluation);
    }

}
Search listeners are parameterized on the solution type of the searches to which they can listen. However, the solution type of a listener is allowed to be more general than the solution type of the search (i.e. a superclass or interface). As our progress listener does not perform any solution type specific actions, its solution type is set to the top-level class Solution. It can thus be used to track the progress of any search with any solution type, as these are all required to be subclasses of Solution.
All methods of the SearchListener interface have a default empty implementation. Listeners can therefore simply ignore any callbacks for which no action is to be taken.
To track the progress of the applied random descent algorithm, simply add the listener before starting the search.
search.addSearchListener(new ProgressSearchListener());
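As noted above, a listener may also use a more specific solution type when it performs solution type specific actions, in which case it can only be attached to searches with a compatible solution type. A hedged sketch of such a variation (the class name is hypothetical), relying on the default empty implementations for the callbacks it does not need:

// hypothetical listener restricted to subset solutions
public class SubsetProgressListener implements SearchListener<SubsetSolution> {

    // only this callback is implemented; the others keep their default empty implementation
    public void newBestSolution(Search search,
                                SubsetSolution newBestSolution,
                                Evaluation newBestSolutionEvaluation,
                                Validation newBestSolutionValidation) {
        // solution type specific action: report the size of the new best subset
        System.out.println(" >>> New best subset (" + newBestSolution.getNumSelectedIDs()
                            + " items): " + newBestSolutionEvaluation);
    }

}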
The complete source code of this example is available on GitHub, including some additional code to read the input from a CSV file. You can also download a ZIP file that contains the Java sources of all examples, a compiled JAR (including all dependencies) as well as some input files, in case you want to run any of the examples. To run this example, execute
$ java -cp james-examples.jar org.jamesframework.examples.coresubset.CoreSubset <inputfile> <subsetsize> <runtime>

from the command line. The input file should be a CSV file in which the first row contains N item names and the subsequent N rows specify an N × N distance matrix. The runtime is given in seconds.