---
title: "Contributing"
output:
rmarkdown::github_document:
html_preview: no
---
<!--- CONTRIBUTING.md is generated from CONTRIBUTING.Rmd. Please edit that file -->
```{r, include = FALSE}
devtools::load_all()
```
First of all, thanks for thinking about contributing to the _prioritizr R_ package! This is an open source project, and contributions are extremely welcome.
[repo]: https://github.com/prioritizr/prioritizr
[issues]: https://github.com/prioritizr/prioritizr/issues
[new_issue]: https://github.com/prioritizr/prioritizr/issues/new
[website]: https://prioritizr.net
## How you can contribute
There are several ways you can contribute to this project. If you want to know more about why and how to contribute to open source projects like this one, see this [Open Source Guide](https://opensource.guide/how-to-contribute/).
### Ask a question
Using the package and got stuck? Browse the [documentation][website] to see if you can find a solution. Still stuck? Post your question as an [issue on GitHub][new_issue]. While we cannot offer user support, we'll try to do our best to address it, as questions often lead to better documentation or the discovery of bugs.
Want to ask a question in private? Please feel free to email the package developers [Jeffrey Hanson](mailto:[email protected]) or [Richard Schuster](mailto:[email protected]).
### Propose an idea
Have an idea for a new feature? Take a look at the [documentation][website] and [issue tracker][issues] to see if it isn't included or suggested yet. If not, suggest your idea as an [issue on GitHub][new_issue]. While we can't promise to implement your idea, it helps to:
* Explain in detail how it would work.
* Keep the scope as narrow as possible.
See below if you want to contribute code for your idea as well.
### Report a bug
Using the package and discovered a bug? That's annoying! Report it as an [issue on GitHub][new_issue] so that we can fix it and others don't have the same experience. A good bug report makes it easier for us to do so, so please include:
* Your operating system name and version (e.g., Mac OS 10.13.6).
* Details about your local setup that might be helpful in troubleshooting, such as your session information (via `sessionInfo()`).
* Detailed steps to reproduce the bug.
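The setup details listed above can be collected from within R. The following is a minimal sketch using only base R functions (the object name `si` is purely illustrative):

```r
# report the R version and the operating system
print(R.version.string)
print(Sys.info()[c("sysname", "release")])

# capture the session information (attached packages and their versions)
si <- utils::sessionInfo()
print(si)
```

Pasting this output into the issue, along with the steps needed to reproduce the bug, usually provides enough context to start troubleshooting.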
### Improve the documentation
Noticed a typo on the website? Think a function could use a better example? Good documentation makes all the difference, so your help to improve it is very welcome!
#### The website
[This website][website] is generated using the [_pkgdown R_ package](http://pkgdown.r-lib.org/). That means we don't have to write any HTML: content is pulled together from documentation in the code, vignettes, [Markdown](https://guides.github.com/features/mastering-markdown/) files, the package `DESCRIPTION` and `_pkgdown.yml` settings. If you know your way around `pkgdown`, you can [propose a file change](https://help.github.com/articles/editing-files-in-another-user-s-repository/) to improve documentation. If not, [report an issue][new_issue] and we can point you in the right direction.
## Learning resources
If you're new to R package development, we recommend reading the [_Advanced R_](http://adv-r.had.co.nz/) book. Since a large part of the code base is written in the C++ programming language using the [_Rcpp R_ package](https://www.rcpp.org/), we also recommend reading the [_Rcpp for everyone_](https://teuder.github.io/rcpp4everyone_en/) book. Both of these books are free, online, and absolutely fantastic resources.
#### Function documentation
Functions are described as comments near their code and translated to documentation using the [_roxygen2 R_ package](https://github.com/r-lib/roxygen2). If you want to improve a function description:
1. Go to the `R/` directory in the [code repository][repo].
2. Look for the file with the name of the function.
3. [Propose a file change](https://help.github.com/articles/editing-files-in-another-user-s-repository/) to update the function documentation in the roxygen comments (starting with `#'`).
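For readers who have not seen roxygen comments before, the following sketch shows what they typically look like. The function `rescale_unit()` is hypothetical and is not part of the package; it is only here to illustrate the comment layout.

```r
#' Rescale values to the unit interval
#'
#' Linearly rescale a numeric vector so that its values range from 0 to 1.
#'
#' @param x `numeric` vector of values.
#'
#' @return A `numeric` vector with values between 0 and 1.
#'
#' @examples
#' rescale_unit(c(2, 4, 6))
#'
#' @export
rescale_unit <- function(x) {
  # find the minimum and maximum values, ignoring missing values
  rng <- range(x, na.rm = TRUE)
  # shift and scale the values
  (x - rng[1]) / (rng[2] - rng[1])
}

print(rescale_unit(c(2, 4, 6)))
```

Each line starting with `#'` is read by roxygen2, and tags such as `@param` and `@return` become the corresponding sections of the help page.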
### Contribute code
Care to fix bugs or implement new functionality? That's brilliant! Have a look at the [issue list][issues] and leave a comment on the things you want to work on. See also the development guidelines below. Please note that we use American spelling for the package, and follow the [tidyverse style guide](https://style.tidyverse.org/).
## Development guidelines
We try to follow the [GitHub flow](https://guides.github.com/introduction/flow/) for development.
1. Fork [this repo][repo] and clone it to your computer. To learn more about this process, see [this guide](https://guides.github.com/activities/forking/).
2. If you have forked and cloned the project before and it has been a while since you worked on it, [pull changes from the original repo](https://help.github.com/articles/merging-an-upstream-repository-into-your-fork/) to your clone by using `git pull upstream master`.
3. Open the RStudio project file (`.Rproj`).
4. Make your changes:
* Write your code.
* Test your code (bonus points for adding unit tests).
* Document your code (see function documentation above).
* Check your code with `devtools::check()` and aim for 0 errors and warnings.
5. Commit and push your changes.
6. Submit a [pull request](https://guides.github.com/activities/forking/#making-a-pull-request).
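Step 4 above mentions unit tests. As a rough sketch, assuming that tests are written with the _testthat R_ package (an assumption, since this guide does not name the testing framework), a test for a hypothetical helper function `clamp_unit()` might look like this:

```r
library(testthat)

# hypothetical helper function, used only for this illustration
clamp_unit <- function(x) {
  # restrict values to the interval [0, 1]
  pmin(pmax(x, 0), 1)
}

test_that("clamp_unit() restricts values to the unit interval", {
  # values below 0 and above 1 are clamped, others are unchanged
  expect_equal(clamp_unit(c(-1, 0.5, 2)), c(0, 0.5, 1))
  # all outputs fall inside the interval
  x <- clamp_unit(seq(-5, 5, by = 0.5))
  expect_true(all(x >= 0 & x <= 1))
})
```

Small tests like this one make it easier to confirm that a change behaves as intended before running `devtools::check()`.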
## Internal design
Here we describe the internal design of the package. A `ConservationProblem` class (defined in `R/ConservationProblem-class.R`) is used to store the data and problem formulation of the conservation planning problem. Additionally, an `OptimizationProblem` class (defined in `R/OptimizationProblem-class.R`) is used to store the precise mathematical formulation of a conservation planning problem. When a `ConservationProblem` object is solved (using the `solve()` function), it is first compiled into an `OptimizationProblem` object (using the `compile()` function), and the `OptimizationProblem` object is then solved (using the `solve()` function). This distinction is important because if you want to add a new objective, constraint, penalty, or decision function, then you'll need to write code that adds the mathematical formulation of the function to an `OptimizationProblem` object.
Each `OptimizationProblem` object is simply a wrapper for an external pointer (`XPtr`) that points to a C++ `OPTIMIZATIONPROBLEM` class object (defined in `src/optimization_problem.h`). The C++ `OPTIMIZATIONPROBLEM` class contains the standard data needed to formulate a mixed integer linear programming problem (i.e., model sense, problem objective, constraint matrix, constraint senses, right-hand-side values, variable decision types, and variable bounds). This class also contains additional data that pertain to conservation problems (i.e., number of features, planning units, and zones). If you're unfamiliar with the standard representation of a mixed integer linear programming problem, we encourage you to read the documentation for the [_gurobi R_ package](http://www.gurobi.com/documentation/8.0/refman/r_api_overview.html#r:problem). The fields `A_i`, `A_j`, and `A_x` correspond to the row, column, and cell values for the constraint matrix of the optimization problem (respectively). The other fields follow standard conventions. Note that the constraint matrix is, ultimately, constructed as a `Matrix::sparseMatrix()` so row and column indices do not need to be sequential. Additionally, all C++ constraint and penalty functions should be independent. In other words, although it might be more computationally efficient to reuse constraints and variables encoded in other functions, each C++ constraint and penalty function defines its own constraints and variables. All conservation planning problems are defined following one of two standard mathematical formulations: the compressed and the expanded formulation.
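To illustrate the triplet fields described above, the following sketch builds a small sparse matrix from row indices, column indices, and cell values. The values are made up for illustration, and the sketch uses R's one-based indices (this section does not state which index base the C++ fields use). It also assumes that "sequential" means the triplets can be supplied in any order.

```r
library(Matrix)

# hypothetical triplet data, deliberately supplied out of order
A_i <- c(3, 1, 1, 3) # row indices
A_j <- c(2, 1, 4, 4) # column indices
A_x <- c(5, 2, 7, 1) # cell values

# construct the sparse constraint matrix (3 rows, 4 columns)
A <- sparseMatrix(i = A_i, j = A_j, x = A_x, dims = c(3, 4))

# print the matrix, cells without a triplet are implicitly zero
print(A)
```

Because only the non-zero cells are stored, a function that adds constraints can simply append new triplets without reordering the existing ones.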
The compressed formulation defines a problem which assumes that all instances of a conserved feature are used to achieve the target or count towards the benefit of a solution. Although the expanded formulation can provide identical solutions to the compressed formulation, the compressed formulation is provided because it is simpler and can be used with much larger problems. Currently, all constraints use the compressed formulation except for the `add_feature_contiguity_constraints()` function. Under this formulation, the first set of decision variables (i.e., the first number of planning units $\times$ number of zones variables) always pertain to the state of the planning units. Thus in a problem containing 3 planning units and 2 zones, the first six variables indicate the allocation of: planning unit 1 in zone 1, planning unit 2 in zone 1, planning unit 3 in zone 1, planning unit 1 in zone 2, planning unit 2 in zone 2, planning unit 3 in zone 2. The first set of constraints (rows) always correspond to each target (noting that some objectives use "fake" targets to initialize the feature by planning unit data, see `R/compile.R`). These rows, which each correspond to a single target, contain the amount of each feature in each zone for which the target pertains. Thus rows for targets which pertain to a single zone will only contain feature abundance data for planning units (columns) in a single zone, and rows for targets which pertain to a single feature in multiple zones will contain feature abundance data for planning units (columns) in multiple zones.
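The variable ordering described above can be expressed as a simple index calculation. The sketch below uses a hypothetical helper, `pu_zone_column()`, which is not part of the package, to reproduce the ordering for 3 planning units and 2 zones.

```r
# hypothetical helper: column index of the variable for a given
# planning unit and zone, with planning units varying fastest
pu_zone_column <- function(pu, zone, n_pu) {
  (zone - 1) * n_pu + pu
}

# all combinations of 3 planning units and 2 zones
grid <- expand.grid(pu = 1:3, zone = 1:2)
grid$column <- pu_zone_column(grid$pu, grid$zone, n_pu = 3)

# print the mapping between planning unit, zone, and column index
print(grid)
```

Here the variables for zone 1 occupy columns 1 to 3 and the variables for zone 2 occupy columns 4 to 6, matching the order listed in the text.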
To help illustrate the compressed formulation, consider the following problem:
```{r}
# simulate data
pu <- data.frame(id = 1:3, cost_1 = 4:6, cost_2 = 7:9)
zone <- data.frame(id = 1:2, name = c("z1", "z2"))
feature <- data.frame(id = 1:2, name = c("f1", "f2"))
rij <- expand.grid(pu = 1:3, species = 1:2, zone = 1:2)
rij$amount <- seq_len(nrow(rij)) + nrow(rij)
targets <- matrix(1, nrow = 2, ncol = 2)
# create problem
p <-
  problem(pu, feature, rij, c("cost_1", "cost_2"), zone) %>%
  add_absolute_targets(targets)
# print problem
print(p)
```
The compressed formulation expresses the planning unit and feature data using the following constraint matrix. Here, each variable (column) corresponds to a different planning unit and a different zone allocation, each constraint (row) corresponds to a different target, and each cell corresponds to the amount of each feature in each planning unit given a different zone (based on the _rij_ data and the targets).
```{r, echo = FALSE}
# generate targets
targets2 <- p$targets$output()
# create matrix
m <- matrix(
  NA,
  ncol = p$number_of_zones() * p$number_of_planning_units(),
  nrow = nrow(targets2)
)
# add row names
rownames(m) <- paste0(
  "target_", rep(p$feature_names(), p$number_of_zones()),
  "_", rep(p$zone_names(), each = p$number_of_features())
)
# add column names
colnames(m) <- paste0(
  "pu", rep(seq_len(p$number_of_planning_units()), p$number_of_zones()),
  "_", rep(p$zone_names(), each = p$number_of_planning_units())
)
# add values according to targets
for (i in seq_len(nrow(targets2))) {
  # extract indices
  curr_feature <- targets2$feature[i]
  curr_zone <- targets2$zone[i][[1]]
  curr_cols <-
    (p$number_of_planning_units() * (curr_zone - 1)) +
    seq_len(p$number_of_planning_units())
  curr_amount <-
    rij$amount[unlist(rij$zone) == curr_zone & rij$species == curr_feature]
  # set matrix values
  m[i, curr_cols] <- curr_amount
}
# convert to sparse matrix
m[is.na(m)] <- 0
m <- as(m, "sparseMatrix")
# print matrix
print(m)
```
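The matrix above is assembled by hand for illustration. As a rough sketch of how it could be compared with the package's own output, the code below compiles the same problem and extracts the constraint matrix. It assumes that an objective must be added before compiling (a minimum set objective is used here), that `compile()` accepts a `compressed_formulation` argument, and that the resulting object provides an `A()` method. These details should be checked against the package documentation.

```r
library(prioritizr)

# simulate the same data as in the example above
pu <- data.frame(id = 1:3, cost_1 = 4:6, cost_2 = 7:9)
zone <- data.frame(id = 1:2, name = c("z1", "z2"))
feature <- data.frame(id = 1:2, name = c("f1", "f2"))
rij <- expand.grid(pu = 1:3, species = 1:2, zone = 1:2)
rij$amount <- seq_len(nrow(rij)) + nrow(rij)
targets <- matrix(1, nrow = 2, ncol = 2)

# create the problem, adding an objective and decision type
# (assumption: compile() needs an objective to be specified)
p <- problem(pu, feature, rij, c("cost_1", "cost_2"), zone)
p <- add_min_set_objective(p)
p <- add_absolute_targets(p, targets)
p <- add_binary_decisions(p)

# compile the problem using the compressed formulation
o <- compile(p, compressed_formulation = TRUE)

# extract and print the constraint matrix
A <- o$A()
print(A)
```

Under these assumptions, the first rows of `A` should correspond to the target rows shown above, and any further rows come from other parts of the compiled problem.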
The expanded formulation, on the other hand, defines a problem which can allow for some instances of conserved features to not be used for achieving the targets or maximizing the conservation benefit. This formulation is a generalized version of the compressed formulation. It contains additional variables (columns) and constraints (rows) for each combination of feature, planning unit, and zone that indicate if a given planning unit allocated to a specific zone is also allocated to conserve a given feature.
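To give a sense of how much larger this formulation is, the sketch below counts the variables and constraints it needs. The helper functions are hypothetical and are not part of the package. The sketch assumes one target per combination of feature and zone, and that the indicator variables are ordered by zone, then feature, then planning unit.

```r
# hypothetical helper: number of rows and columns in the constraint
# matrix under the expanded formulation
expanded_dims <- function(n_pu, n_features, n_zones) {
  # variables describing the state of each planning unit in each zone
  n_state <- n_pu * n_zones
  # indicator variables, one per planning unit, feature, and zone
  n_indicator <- n_pu * n_features * n_zones
  # assumption: one target per combination of feature and zone
  n_targets <- n_features * n_zones
  c(rows = n_indicator + n_targets, cols = n_state + n_indicator)
}

# hypothetical helper: column index of the indicator variable for a
# given planning unit, feature, and zone
indicator_column <- function(pu, feature, zone, n_pu, n_features, n_zones) {
  (n_pu * n_zones) +
    ((zone - 1) * n_features * n_pu) +
    ((feature - 1) * n_pu) +
    pu
}

# dimensions for 3 planning units, 2 features, and 2 zones
print(expanded_dims(n_pu = 3, n_features = 2, n_zones = 2))

# first and last indicator variable columns
print(indicator_column(1, 1, 1, n_pu = 3, n_features = 2, n_zones = 2))
print(indicator_column(3, 2, 2, n_pu = 3, n_features = 2, n_zones = 2))
```

For 3 planning units, 2 features, and 2 zones, this gives 16 rows and 18 columns, compared with 4 target rows and 6 columns under the compressed formulation.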
Given the previous problem, the expanded formulation expresses the planning unit and feature data in the constraint matrix as:
```{r}
# generate targets
targets2 <- p$targets$output()
# create matrix
m <- matrix(
  NA,
  ncol =
    (p$number_of_zones() * p$number_of_planning_units()) +
    (p$number_of_zones() * p$number_of_planning_units() *
      p$number_of_features()),
  nrow =
    (p$number_of_zones() * p$number_of_planning_units() *
      p$number_of_features()) +
    (p$number_of_features() * p$number_of_zones())
)
# add row names
rownames(m) <- c(
  paste0(
    "pu",
    rep(
      seq_len(p$number_of_planning_units()),
      p$number_of_zones() * p$number_of_features()
    ),
    "_",
    rep(
      rep(p$feature_names(), each = p$number_of_planning_units()),
      p$number_of_zones()
    ),
    "_",
    rep(
      p$zone_names(),
      each = p$number_of_planning_units() * p$number_of_features()
    )
  ),
  paste0(
    "target_",
    rep(p$feature_names(), p$number_of_zones()),
    "_",
    rep(p$zone_names(), each = p$number_of_features())
  )
)
# add column names
colnames(m) <- c(
  paste0(
    "pu",
    rep(seq_len(p$number_of_planning_units()), p$number_of_zones()),
    "_",
    rep(p$zone_names(), each = p$number_of_planning_units())
  ),
  paste0(
    "pu",
    rep(
      seq_len(p$number_of_planning_units()),
      p$number_of_zones() * p$number_of_features()
    ),
    "_",
    rep(
      rep(p$feature_names(), each = p$number_of_planning_units()),
      p$number_of_zones()
    ),
    "_",
    rep(
      p$zone_names(),
      each = p$number_of_planning_units() * p$number_of_features()
    )
  )
)
# add in indicator variables and constraints
curr_row <- 0
for (z in seq_len(p$number_of_zones())) {
  for (i in seq_len(p$number_of_features())) {
    for (j in seq_len(p$number_of_planning_units())) {
      curr_row <- curr_row + 1
      curr_col1 <- ((z - 1) * p$number_of_planning_units()) + j
      curr_col2 <-
        (p$number_of_planning_units() * p$number_of_zones()) +
        ((z - 1) * p$number_of_features() * p$number_of_planning_units()) +
        ((i - 1) * p$number_of_planning_units()) + j
      m[curr_row, curr_col1] <- -1
      m[curr_row, curr_col2] <- 1
    }
  }
}
# add in targets
for (i in seq_len(nrow(targets2))) {
  # extract indices
  curr_row <- curr_row + 1
  curr_feature <- targets2$feature[i]
  curr_zone <- targets2$zone[i][[1]]
  curr_cols <-
    (p$number_of_planning_units() * p$number_of_zones()) +
    ((curr_zone - 1) * p$number_of_features() * p$number_of_planning_units()) +
    ((curr_feature - 1) * p$number_of_planning_units()) +
    seq_len(p$number_of_planning_units())
  curr_amount <-
    rij$amount[unlist(rij$zone) == curr_zone & rij$species == curr_feature]
  # set matrix values
  m[curr_row, curr_cols] <- curr_amount
}
# convert to sparse matrix
m[is.na(m)] <- 0
m <- as(m, "sparseMatrix")
# print matrix
print(m)
```