Title: | Selections on Flow Matrices, Statistics on Selected Flows, Map and Graph Visualisations |
---|---|
Description: | The analysis and representation of flows often presuppose a selection to facilitate interpretation. Various methods have been proposed for selecting flows, one of the most widely used being based on major flows: it selects only the most important flows, absolute or relative, on a local or global scale. These methods often highlight hierarchies between locations, but the loss of information caused by selection is rarely taken into account. We propose statistical indicators to assess the loss of information and the characteristics of selected flows. We provide functions that select flows (main, dominant or major flows), provide statistics on selections and offer visualisations in the form of maps and graphs. See Beauguitte et al. (2015) <doi:10.4000/netcom.2134>. |
Authors: | Timothée Giraud [cre, aut], Laurent Beauguitte [aut], Marianne Guérois [aut] |
Maintainer: | Timothée Giraud |
License: | GPL-3 |
Version: | 2.0.0 |
Built: | 2025-01-01 04:21:22 UTC |
Source: | https://github.com/riatelab/flows |
Data on commuters between Urban Areas of the French Grand Est region in 2011.
Fields:
i: Code of the urban area of residence
namei: Name of the urban area of residence
wi: Total number of employed persons (active occupied population) in the urban area of residence
j: Code of the urban area of work
namej: Name of the urban area of work
wj: Total number of employed persons (active occupied population) in the urban area of work
fij: Number of commuters from i (residence) to j (work)
GeoPackage of the Grand Est region in France and its urban areas (2010 delineation).
Commuters dataset: https://www.insee.fr/fr/statistiques/2022113
Spatial dataset: https://www.data.gouv.fr/en/datasets/geofla-r
library(sf)
nav <- read.csv(system.file("csv/nav.csv", package = "flows"))
UA <- st_read(system.file("gpkg/GE.gpkg", package = "flows"), layer = "urban_area")
GE <- st_read(system.file("gpkg/GE.gpkg", package = "flows"), layer = "region")
Compares two matrices of the same dimensions, with the same row and column names.
compare_mat(mat1, mat2, digits = 0)
mat1 |
A square matrix of flows. |
mat2 |
A square matrix of flows. |
digits |
An integer indicating the number of decimal places to be used when printing the data.frame in the console (see round). |
A data.frame that provides statistics on differences between mat1 and mat2: absdiff are the absolute differences and reldiff are the relative differences (in percent).
library(flows)
# Import data
nav <- read.csv(system.file("csv/nav.csv", package = "flows"))
mat <- prepare_mat(x = nav, i = "i", j = "j", fij = "fij")
# Remove the matrix diagonal
diag(mat) <- 0
# Select the first flow from each origin
flowSel1 <- select_flows(mat = mat, method = "nfirst", k = 1)
# Select flows greater than 2000
flowSel2 <- select_flows(mat = mat, method = "xfirst", k = 2000)
# Combine selections
flowSel <- mat * flowSel1 * flowSel2
# Compare flow matrices
compare_mat(mat1 = mat, mat2 = flowSel, digits = 1)
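The absolute and relative differences reported by compare_mat can be illustrated with a minimal base R sketch. The helper selection_loss() below is hypothetical and not part of flows: it compares three summary statistics chosen for illustration (number of links, sum of flows, maximum flow) between a full matrix and a selection, and only borrows the absdiff and reldiff column names. The statistics actually returned by compare_mat may differ.

```r
# Hypothetical helper (not flows::compare_mat): what a selection removes
# from a flow matrix, for a few summary statistics chosen for illustration.
selection_loss <- function(mat1, mat2) {
  stopifnot(identical(dim(mat1), dim(mat2)))
  stats <- function(m) c(nblinks = sum(m > 0), sumflows = sum(m), maxflow = max(m))
  s1 <- stats(mat1)
  s2 <- stats(mat2)
  data.frame(
    mat1 = s1,
    mat2 = s2,
    absdiff = s1 - s2,                # absolute loss
    reldiff = 100 * (s1 - s2) / s1    # loss in percent of mat1
  )
}

# Toy 3 x 3 matrix (rows = origins, columns = destinations)
mat <- matrix(c(0, 50,  5,
                20, 0, 10,
                2, 30,  0), nrow = 3, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
sel <- mat * (mat >= 20)              # keep only flows of at least 20
selection_loss(mat, sel)
```

In this toy case half of the links are dropped while about 85% of the total flow (100 out of 117) is kept, which is the kind of trade-off such indicators are meant to reveal.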
Selections on flow matrices, statistics on selected flows, map and graph visualisations.
An introduction to the package's conceptual background and usage is provided in a vignette (see vignette(topic = "flows")) and in a paper (Beauguitte, Giraud & Guérois 2015).
Maintainer: Timothée Giraud
Authors:
Laurent Beauguitte
Marianne Guérois
L. Beauguitte, T. Giraud & M. Guérois, 2015. "Un outil pour la sélection et la visualisation de flux : le package flows", Netcom, 29-3/4:399-408. https://journals.openedition.org/netcom/2134.
Performs a Nystuen & Dacey dominant (or nodal) flows analysis and plots a dominant flows map.
map_nodal_flows( mat, x, inches = 0.15, col_node = c("red", "orange", "yellow"), breaks = "equal", nbreaks = 4, lwd = c(1, 5, 10, 20), col_flow = "grey20", leg_node = c("Dominant", "Intermediate", "Dominated", "Size proportional\nto sum of inflows"), leg_flow = "Flow intensity", leg_pos_flow = "topleft", leg_pos_node = "topright", add = FALSE )
mat |
A square matrix of flows. |
x |
An sf object whose first column contains a unique identifier matching the row and column names of mat. |
inches |
Size of the largest circle, in inches. |
col_node |
Node colors, a vector of 3 colors. |
breaks |
How to classify flows, either a numeric vector with the actual breaks, or a classification method name (see mf_get_breaks()) |
nbreaks |
Number of classes. |
lwd |
Line widths of the flows, one value per class. |
col_flow |
Color of the flows. |
leg_node |
Labels for the node legend. |
leg_flow |
Label for the flows legend |
leg_pos_flow |
Position of the flows legend |
leg_pos_node |
Position of the node legend |
add |
A boolean, if TRUE, add the layer to an existing plot. |
A list of sf objects is returned. The first element contains the nodes with their weight and classification (dominant, intermediary, dominated). The second element contains the flows (i, j, fij).
library(flows)
library(sf)
library(mapsf)
nav <- read.csv(system.file("csv/nav.csv", package = "flows"))
mat <- prepare_mat(x = nav, i = "i", j = "j", fij = "fij")
UA <- st_read(system.file("gpkg/GE.gpkg", package = "flows"), layer = "urban_area")
GE <- st_read(system.file("gpkg/GE.gpkg", package = "flows"), layer = "region")
mf_map(GE)
map_nodal_flows(
  mat = mat, x = UA,
  col_node = c("red", "orange", "yellow"),
  col_flow = "grey30",
  breaks = c(4, 100, 1000, 2500, 8655),
  lwd = c(1, 4, 8, 16),
  add = TRUE
)
mf_title("Dominant flows")
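With breaks = "equal", flows are grouped into nbreaks classes of equal width and each class is drawn with the corresponding value of lwd. The base R sketch below shows that idea under the assumption that classes are right-closed, equal-width intervals between the minimum and maximum flow. flow_widths() is a hypothetical helper, and mf_get_breaks() may place the breaks differently.

```r
# Hypothetical helper: equal-interval classes of flow values mapped to line
# widths, in the spirit of breaks = "equal" (assumption: right-closed classes).
flow_widths <- function(fij, nbreaks = 4, lwd = c(1, 5, 10, 20)) {
  stopifnot(length(lwd) == nbreaks)
  brks <- seq(min(fij), max(fij), length.out = nbreaks + 1)  # equal-width classes
  cls <- cut(fij, breaks = brks, include.lowest = TRUE, labels = FALSE)
  lwd[cls]                                                   # one width per flow
}

fij <- c(10, 20, 40, 90, 110)
flow_widths(fij)   # breaks are 10, 35, 60, 85, 110; returns 1 1 5 20 20
```

With skewed flow distributions, equal intervals put most flows in the first class, which may be one reason to pass explicit breaks, as the example above does.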
Performs a Nystuen & Dacey dominant flows analysis.
nodal_flows(mat)
mat |
A square matrix of flows. |
The matrix of the selected flows is returned.
library(flows)
nav <- read.csv(system.file("csv/nav.csv", package = "flows"))
mat <- prepare_mat(x = nav, i = "i", j = "j", fij = "fij")
res <- nodal_flows(mat)
res[1:5, 1:5]
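The two criteria combined in a dominant flows analysis can be sketched in base R, assuming (as the map legend suggests) that the weight of a unit is the sum of its inflows. First, keep the largest outgoing flow of each origin; second, keep it only if the destination is heavier than the origin (wj / wi > k, the rule described for select_flows with method = "dominant"). dominant_flows() is a hypothetical helper, not the nodal_flows source.

```r
# Hypothetical helper (not flows::nodal_flows): Nystuen & Dacey selection.
# Rows are origins, columns destinations; unit weight = sum of inflows.
dominant_flows <- function(mat, k = 1) {
  diag(mat) <- 0
  w <- colSums(mat)                        # weight of each unit: sum of inflows
  sel <- matrix(0, nrow(mat), ncol(mat), dimnames = dimnames(mat))
  for (i in seq_len(nrow(mat))) {
    if (max(mat[i, ]) <= 0) next           # no outgoing flow from this origin
    j <- which.max(mat[i, ])               # criterion 1: largest outgoing flow (first if tied)
    if (w[j] / w[i] > k) sel[i, j] <- 1    # criterion 2: destination heavier than origin
  }
  mat * sel
}

m <- matrix(c(0, 10, 2, 0,
              1,  0, 8, 0,
              0,  3, 0, 1,
              0,  0, 6, 0), nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))
dominant_flows(m)
```

Here C sends its largest flow to B, but B (weight 13) is lighter than C (weight 16), so that flow is dropped; A to B, B to C and D to C are kept.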
This function plots a dominant flows graph.
plot_nodal_flows( mat, leg_pos_flows = "topright", leg_flow = "Flows Intensity", leg_pos_node = "bottomright", leg_node = c("Dominant", "Intermediary", "Dominated", "Size proportional\nto sum of inflows"), labels = FALSE )
mat |
A square matrix of dominant flows (see nodal_flows). |
leg_pos_flows |
Position of the flows legend, one of "topleft", "top", "topright", "left", "right", "bottomleft", "bottom", "bottomright". |
leg_flow |
Title of the flows legend. |
leg_pos_node |
Position of the nodes legend, one of "topleft", "top", "topright", "left", "right", "bottomleft", "bottom", "bottomright". |
leg_node |
Text of the nodes legend. |
labels |
A boolean, if TRUE, labels of dominant and intermediary nodes are plotted. |
This function uses the Davidson-Harel layout algorithm from igraph.
Because square matrices can easily be plotted with plot.igraph (igraph package) or gplot (sna package), no visualisation is proposed for other outputs.
library(flows)
nav <- read.csv(system.file("csv/nav.csv", package = "flows"))
mat <- prepare_mat(x = nav, i = "i", j = "j", fij = "fij")
res <- nodal_flows(mat)
# Plot dominant flows graph
plot_nodal_flows(mat = res)
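The graph distinguishes dominant, intermediary and dominated nodes. This sketch assumes a common reading of the Nystuen & Dacey model: a dominant node sends no dominant flow, an intermediary node both sends and receives one, a dominated node only sends one, and nodes without any dominant flow are left out. classify_nodes() is hypothetical and only illustrates this classification on a toy dominant-flow matrix such as the one returned by nodal_flows.

```r
# Hypothetical helper: roles of nodes in a dominant-flow matrix
# (rows = origins, columns = destinations, non-zero cell = kept dominant flow).
classify_nodes <- function(dom) {
  outdeg <- rowSums(dom > 0)   # number of dominant flows sent
  indeg <- colSums(dom > 0)    # number of dominant flows received
  role <- ifelse(outdeg == 0, "dominant",
                 ifelse(indeg > 0, "intermediary", "dominated"))
  role[outdeg + indeg > 0]     # nodes without any dominant flow are dropped
}

dom <- matrix(0, 5, 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))
dom["A", "B"] <- 10   # A sends its dominant flow to B
dom["B", "C"] <- 8    # B sends its dominant flow to C
dom["D", "C"] <- 6    # D sends its dominant flow to C; E is isolated
classify_nodes(dom)
```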
Converts a long-format matrix into a wide-format (square) matrix.
prepare_mat(x, i, j, fij)
x |
A data.frame of flows between origins and destinations: long format matrix (origins, destinations, flows intensity). |
i |
A character string giving the name of the origin field in x. |
j |
A character string giving the name of the destination field in x. |
fij |
A character string giving the name of the flow field in x. |
A square matrix of flows. The diagonal may or may not be filled, depending on the data used.
library(flows)
# Import data
nav <- read.csv(system.file("csv/nav.csv", package = "flows"))
# Prepare data
myflows <- prepare_mat(x = nav, i = "i", j = "j", fij = "fij")
myflows[1:5, 1:5]
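The long-to-wide conversion can be sketched in base R with tapply(). long_to_square() is a hypothetical helper, not the prepare_mat source; it assumes that the square matrix is built over the union of origin and destination codes, that duplicated (i, j) pairs are summed and that missing pairs are set to 0.

```r
# Hypothetical helper (not flows::prepare_mat): long format (origin,
# destination, flow) to a square matrix over all unit codes.
long_to_square <- function(x, i, j, fij) {
  ids <- sort(unique(c(as.character(x[[i]]), as.character(x[[j]]))))
  mat <- tapply(
    x[[fij]],
    list(factor(as.character(x[[i]]), levels = ids),
         factor(as.character(x[[j]]), levels = ids)),
    sum                                  # duplicated (i, j) pairs are summed
  )
  mat[is.na(mat)] <- 0                   # pairs without any flow
  mat
}

flows_long <- data.frame(
  i = c("A", "A", "B", "C"),
  j = c("B", "C", "C", "A"),
  fij = c(5, 2, 7, 1)
)
long_to_square(flows_long, i = "i", j = "j", fij = "fij")
```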
Flow selection from origins.
select_flows(mat, method = "nfirst", ties = "first", global = FALSE, k, w)
mat |
A square matrix of flows. |
method |
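A method of flow selection, one of "dominant", "nfirst", "xfirst" or "xsumfirst":
"nfirst": the k largest flows from each origin (or from the whole matrix if global = TRUE) are selected;
"xfirst": all flows greater than k are selected;
"xsumfirst": as many flows as necessary from each origin (or from the whole matrix if global = TRUE) are selected so that their sum is at least equal to k;
"dominant": see Details.
|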
ties |
In case of ties with the "nfirst" method, use "random" or "first" (see rank). |
global |
If TRUE, the selection is performed at the matrix scale instead of origin by origin. |
k |
Selection threshold for nfirst, xfirst and xsumfirst methods, ratio for dominant method. |
w |
A vector of unit weights (sum of incoming flows, sum of outgoing flows...). |
If method = "dominant", the function selects which of fij and fji is kept: if the ratio weight of destination (wj) / weight of origin (wi) is greater than k, then fij is selected and fji is not. This function can perform the second criterion of the Nystuen & Dacey dominant flows analysis.
A boolean matrix of selected flows. Multiply it element-wise by mat to get the flow intensities.
library(flows)
# Import data
nav <- read.csv(system.file("csv/nav.csv", package = "flows"))
# Prepare data
mat <- prepare_mat(x = nav, i = "i", j = "j", fij = "fij")
# Remove the diagonal
diag(mat) <- 0
# Select the first flow from each origin
res <- select_flows(mat = mat, method = "nfirst", global = FALSE, k = 1)
rowSums(res)
# Select the 5 largest flows of the matrix
res <- select_flows(mat = mat, method = "nfirst", global = TRUE, k = 5)
sum(res)
# Select the flows greater than 5000
res <- select_flows(mat = mat, method = "xfirst", k = 5000)
r <- mat * res
r[r > 0]
# Select as many flows as necessary for each origin so that their sum is at least equal to 500
res <- select_flows(mat = mat, method = "xsumfirst", global = FALSE, k = 500)
r <- mat * res
rowSums(r)
# Select as many flows of the matrix as necessary so that their sum is at least equal to 50000
res <- select_flows(mat = mat, method = "xsumfirst", global = TRUE, k = 50000)
r <- mat * res
sum(rowSums(r))
# Select dominant flows
m <- mat[1:5, 1:5]
ws <- colSums(m)
res <- select_flows(mat = m, method = "dominant", k = 1, w = ws)
# 2nd element has a lower weight than 3rd element (ratio > 1)
ws[3] / ws[2]
# The flow from 2nd element to 3rd element is kept
res[2, 3]
# The flow from 3rd element to 2nd element is removed
res[3, 2]
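The local (global = FALSE) versions of the three threshold rules used in the examples above can be sketched origin by origin in base R. select_local() and select_row() are hypothetical helpers, not the select_flows source: ties keep the first flow (as with ties = "first"), zero cells are never selected, and when k is never reached by "xsumfirst" all flows of the origin are kept, which is an assumption of this sketch. The output is a 0/1 matrix to be multiplied element-wise by mat.

```r
# Hypothetical helpers (not flows::select_flows): per-origin selection rules
# returning a 0/1 matrix; ties keep the first flow and zero cells are never kept.
select_row <- function(x, method, k) {
  sel <- numeric(length(x))
  ord <- order(x, decreasing = TRUE)             # largest flows first, stable on ties
  if (method == "nfirst") {
    sel[ord[seq_len(min(k, length(x)))]] <- 1    # the k largest flows
  } else if (method == "xfirst") {
    sel[x > k] <- 1                              # every flow greater than k
  } else if (method == "xsumfirst") {
    n <- which(cumsum(x[ord]) >= k)[1]           # fewest largest flows whose sum reaches k
    if (is.na(n)) n <- length(x)                 # k never reached: keep all flows
    sel[ord[seq_len(n)]] <- 1
  } else {
    stop("unknown method")
  }
  sel * (x > 0)
}

select_local <- function(mat, method, k) {
  res <- t(apply(mat, 1, select_row, method = method, k = k))
  dimnames(res) <- dimnames(mat)
  res
}

mat <- matrix(c(0, 6, 3, 1,
                4, 0, 4, 2,
                9, 1, 0, 0,
                0, 0, 0, 0), nrow = 4, byrow = TRUE,
              dimnames = list(LETTERS[1:4], LETTERS[1:4]))
rowSums(select_local(mat, "nfirst", k = 1))
sum(select_local(mat, "xfirst", k = 3))
rowSums(mat * select_local(mat, "xsumfirst", k = 8))
```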
This function provides various indicators and graphical outputs on a flow matrix.
stat_mat(mat, output = "all", verbose = TRUE)
mat |
A square matrix of flows. |
output |
Graphical output. Choices are "all" for all graphics, "none" to avoid any graphical output, "degree" for degree distribution, "wdegree" for weighted degree distribution, "lorenz" for Lorenz curve of link weights and "boxplot" for boxplot of link weights (see 'Details'). |
verbose |
A boolean; if TRUE, statistics are printed in the console. |
Graphical outputs concern outdegrees by default. If the matrix is transposed, outputs concern indegrees.
The function returns a list of statistics and may plot graphics.
nblinks: number of cells with values > 0
density: number of links divided by number of possible links (also called gamma index by geographers), loops excluded
connectcomp: number of connected components (isolates included; weakly connected components, computed with clusters from igraph with mode = "weak")
connectcompx: number of connected components (isolates deleted; weakly connected components, computed with clusters from igraph with mode = "weak")
sizecomp: a data.frame of connected components: size and sum of flows per component (isolates included)
compocomp: a data.frame of connected components giving membership of units (isolates included)
degrees: a data.frame of nodes degrees and weighted degrees
sumflows: sum of flows
min: minimum flow
Q1: first quartile of flows
median: median flow
Q3: third quartile of flows
max: maximum flow
mean: mean flow
sd: standard deviation of flows
library(flows)
# Import data
nav <- read.csv(system.file("csv/nav.csv", package = "flows"))
myflows <- prepare_mat(x = nav, i = "i", j = "j", fij = "fij")
# Get statistics and graphs about the matrix
mystats <- stat_mat(mat = myflows, output = "all", verbose = TRUE)
# Size of connected components
mystats$sizecomp
# Sum of flows
mystats$sumflows
# Plot Lorenz curve only
stat_mat(mat = myflows, output = "lorenz", verbose = FALSE)
# Statistics only
mystats <- stat_mat(mat = myflows, output = "none", verbose = FALSE)
str(mystats)
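Several of the indicators listed above follow directly from their definitions. The base R sketch below computes the number of links, the density (loops excluded), the sum of flows, the quartiles of link weights and the points of a Lorenz curve of link weights. flow_indicators() and lorenz_points() are hypothetical helpers; setting the diagonal to 0 before counting links is an assumption of this sketch, and stat_mat may treat loops differently for nblinks.

```r
# Hypothetical helpers (not flows::stat_mat), computed from the definitions.
flow_indicators <- function(mat) {
  diag(mat) <- 0                                 # loops excluded (assumption for nblinks)
  n <- nrow(mat)
  links <- mat[mat > 0]                          # weights of existing links
  list(
    nblinks = length(links),
    density = length(links) / (n * (n - 1)),     # links / possible links (gamma index)
    sumflows = sum(links),
    quartiles = quantile(links, c(0.25, 0.5, 0.75), names = FALSE)
  )
}

# Lorenz curve of link weights, smallest links first: cumulative share of
# links (x) against cumulative share of the total flow (y).
lorenz_points <- function(mat) {
  diag(mat) <- 0
  w <- sort(mat[mat > 0])
  data.frame(x = seq_along(w) / length(w), y = cumsum(w) / sum(w))
}

mat <- matrix(c(0, 10, 0,
                5,  0, 5,
                0, 80, 0), nrow = 3, byrow = TRUE)
str(flow_indicators(mat))
lorenz_points(mat)
```

In this toy matrix a single link carries 80% of the total flow, which the Lorenz curve makes visible: its last point jumps from 0.20 to 1.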