Gene Stacker

How should the input parameters be chosen?

Gene Stacker provides many input parameters and options that influence its runtime as well as the quality and properties of the constructed crossing schedules. We advise first trying the default settings. Specify only the required parameters (maximum number of generations and overall success rate) and those constraints that matter for your application (number of seeds obtained from a crossing, maximum crossings per plant, ...), and set a reasonable runtime limit (e.g. 24 hours).

If, with these settings, Gene Stacker is too slow for your input or runs out of memory (see below), consider:

- specifying additional constraints that shrink the search space, such as
  - maximum plants per generation
  - maximum overall linkage phase ambiguity
  - maximum total number of crossings
- setting tighter constraints, e.g. by reducing the maximum number of generations;
- using the heuristic presets -faster or -fastest. Keep in mind, however, that these may yield worse solutions. In particular, preset -fastest is usually very fast and memory friendly, but often yields only a rough approximation of the actual Pareto frontier.

If presets -faster and -fastest are still too slow or use too much memory, consider limiting the number of simultaneous crossovers per chromosome with the option -mco,--max-crossovers <c>. For example,

$ java -jar genestacker.jar -fastest -mco 1 [<option>]* <input> <output>

applies preset -fastest and allows at most one simultaneous crossover per chromosome. This additional option may result in a significant speedup, especially if your input contains chromosomes with many loci. Note that -mco can only be used in combination with a heuristic seed lot constructor (H5/H5c) or with a preset that applies such a constructor, i.e. the default setting or presets -faster and -fastest.
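The preset part of this advice can be scripted. The following POSIX shell sketch tries the default setting and then presets -faster, -fastest and -fastest -mco 1 in turn, moving on whenever a run fails (for example with an OutOfMemoryError) or exceeds a wall-clock limit enforced with GNU timeout. It does not cover the additional constraints listed above. The variables GS, REQUIRED_OPTS, INPUT, OUTPUT and LIMIT and the function escalate are hypothetical; fill REQUIRED_OPTS with the required options as explained on the usage page. A run killed by timeout most likely leaves no usable output, so this is cruder than the runtime limit recommended at the start.

```shell
#!/bin/sh
# Sketch: try progressively faster settings until one run finishes in time.
# Hypothetical placeholders (adapt to your setup):
#   GS            command that starts Gene Stacker
#   REQUIRED_OPTS required options (max. generations, success rate, ...)
#   INPUT/OUTPUT  input file and output name
#   LIMIT         wall-clock limit per attempt, in GNU timeout syntax
# Requires `timeout` from GNU coreutils.

GS=${GS:-"java -jar genestacker.jar"}
LIMIT=${LIMIT:-24h}

escalate() {
  # Empty string = default settings; later entries are faster heuristics.
  for opts in "" "-faster" "-fastest" "-fastest -mco 1"; do
    echo "trying: ${opts:-default settings}" >&2
    # Unquoted $GS, $opts and $REQUIRED_OPTS: word splitting is intended here.
    if timeout "$LIMIT" $GS $opts $REQUIRED_OPTS "$INPUT" "$OUTPUT"; then
      echo "${opts:-default}"   # report which setting succeeded
      return 0
    fi
  done
  echo "no setting finished within $LIMIT" >&2
  return 1
}
```

Any nonzero exit status triggers the next setting, so an invalid input file would also be retried with every preset; check the messages on standard error if all attempts fail.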

Gene Stacker uses independent threads to extend partial schedules in different ways; on multicore machines, these threads run in parallel. Running Gene Stacker on a computer with more processor cores can therefore speed up execution.
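To see how many cores a machine offers before moving a run to it, a small shell check suffices. This sketch (the function name cpu_cores is hypothetical) uses nproc from GNU coreutils when available and falls back to getconf _NPROCESSORS_ONLN, which also works on macOS.

```shell
#!/bin/sh
# Print the number of processor cores currently online.
cpu_cores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                          # GNU coreutils (Linux)
  else
    getconf _NPROCESSORS_ONLN      # fallback, e.g. on macOS
  fi
}

echo "available processor cores: $(cpu_cores)"
```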

If the default setting is more than fast enough, consider also running presets -better and -best to check whether they produce better schedules, as the heuristics might have missed something. Usually, the resulting schedules differ very little from those of the default setting (if at all), while the runtime increases significantly.
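A simple way to make this comparison is to run all three settings on the same input and keep their outputs apart. The POSIX shell sketch below does that; GS, REQUIRED_OPTS, INPUT, OUTPUT_PREFIX and the function compare_presets are hypothetical names, and REQUIRED_OPTS should hold the required options as explained on the usage page.

```shell
#!/bin/sh
# Sketch: run the default setting and presets -better and -best on the same
# input, each writing to its own output, so the schedules can be compared.
GS=${GS:-"java -jar genestacker.jar"}

compare_presets() {
  for preset in default -better -best; do
    if [ "$preset" = default ]; then opts=""; else opts=$preset; fi
    out="${OUTPUT_PREFIX}${preset#-}"   # e.g. results_better
    echo "running $preset, writing to $out" >&2
    # Unquoted $GS, $opts and $REQUIRED_OPTS: word splitting is intended here.
    $GS $opts $REQUIRED_OPTS "$INPUT" "$out" || echo "$preset failed" >&2
  done
}
```

The runs are sequential on purpose: each one already uses several threads, and running them side by side would also multiply memory use.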

After a while, Gene Stacker crashes with a memory exception. What now?

By default, the Java virtual machine does not let an application use all available RAM. As Gene Stacker is quite memory intensive for larger problem instances, the default amount may not be sufficient. Gene Stacker then runs out of memory and crashes, printing an error message such as java.lang.OutOfMemoryError: Java heap space.

This problem can be solved by reserving more memory for Gene Stacker with the -Xmx option of the Java virtual machine, which sets the maximum heap size. For example, to reserve 4 GB of RAM for Gene Stacker, run

$ java -Xmx4g -jar genestacker.jar [<option>]* <input> <output>

with the desired options, input and output as explained on the usage page. To change the memory setting when running Gene Stacker from R through the provided interface, use the parameter mem, e.g. mem="4g", as in

> genestacker.run("examples/A.xml", "out", g=3, s=0.95, mem="4g")

By default, this R interface reserves 2 GB of RAM for Gene Stacker.
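To pick a value for -Xmx or mem, one option is to derive it from the machine's physical RAM. This POSIX shell sketch reads the total RAM (from /proc/meminfo on Linux, or sysctl hw.memsize on macOS) and suggests a fraction of it in the m-suffixed format accepted by both -Xmx and mem. The function names and the default fraction of 75%, chosen to leave headroom for the operating system, are assumptions rather than Gene Stacker recommendations.

```shell
#!/bin/sh
# Total physical RAM in MB (Linux or macOS).
total_ram_mb() {
  if [ -r /proc/meminfo ]; then
    # MemTotal is reported in kB
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
  else
    # hw.memsize is reported in bytes
    echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 ))
  fi
}

# Suggest a heap size as a percentage of physical RAM (default 75%).
suggest_heap() {
  fraction_pct=${1:-75}
  echo "$(( $(total_ram_mb) * fraction_pct / 100 ))m"
}

echo "physical RAM: $(total_ram_mb) MB, suggested heap: $(suggest_heap)"
```

In the same shell, the suggestion can then be passed on directly:

$ java -Xmx$(suggest_heap) -jar genestacker.jar [<option>]* <input> <output>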

When using the R interface, R complains about not being able to create the Java Virtual Machine and Gene Stacker does not run. What now?

If R prints an error message like

Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

this probably means that your computer has less than 2 GB of RAM, the amount that the Gene Stacker R interface reserves by default. To solve this problem, reduce the amount of memory reserved for Gene Stacker with the mem parameter. For example, set it to 512 MB using mem="512m", as in

> genestacker.run("examples/A.xml", "out", g=3, s=0.95, mem="512m")

Note, however, that running Gene Stacker on a machine with less than 2 GB of RAM is not advised, as Gene Stacker is quite memory intensive. If the available memory is insufficient, Gene Stacker will crash, printing an error message such as java.lang.OutOfMemoryError: Java heap space.
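Before calling the R interface, it can help to check which heap size the JVM on your machine can start with at all, assuming the R interface uses the same java installation that is on your PATH. The sketch below (the function name max_startable_heap and the candidate sizes are assumptions) runs java -Xmx<size> -version for a few decreasing sizes and prints the first size that starts; that value can be passed as mem. A successful start only shows that the JVM could reserve the heap, not that this much physical RAM is free, so the result should still be weighed against the 2 GB recommendation.

```shell
#!/bin/sh
# Print the largest candidate heap size with which the JVM starts.
max_startable_heap() {
  for size in 4g 2g 1g 512m; do
    # A failed reservation makes the JVM exit with a nonzero status,
    # printing "Could not reserve enough space for object heap".
    if java -Xmx"$size" -version >/dev/null 2>&1; then
      echo "$size"
      return 0
    fi
  done
  return 1
}
```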

Some of the schedules created by Gene Stacker indicate LPA scores. What does this mean?

LPA is short for linkage phase ambiguity. It arises when the linkage phase of a genotype \(G\) among the offspring of a crossing between \(P\) and \(Q\) cannot be inferred from the parental genotypes, i.e. when multiple phases are possible. The LPA of \(G\) then expresses the probability that an undesired phase will be obtained. The overall LPA of a crossing schedule is the probability that an undesired phase is obtained for at least one of the targets aimed for in the schedule.
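As an illustration of this definition, suppose the schedule aims for targets \(G_1, \dots, G_n\) with individual LPA values \(a_1, \dots, a_n\), and assume (as a simplification) that these ambiguities are independent. The overall LPA is then the complement of the probability that every target obtains its desired phase:

\[ \mathrm{LPA} = 1 - \prod_{i=1}^{n} \left(1 - a_i\right) \]

For example, two targets with LPA 0.1 and 0.2 would give an overall LPA of \(1 - 0.9 \times 0.8 = 0.28\). This sketch follows from the definition under the stated independence assumption; it is not necessarily how Gene Stacker computes the score internally.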

The overall LPA is treated as an objective to be minimized in the Pareto setting. It can also be bounded using the option -lpa,--max-linkage-phase-ambiguity <a>.
