Download Dataverse File(s)
The get_file_* functions return a raw binary file, which cannot be readily analyzed in R. To use the objects as data frames, see the get_dataframe_* functions at ?get_dataframe instead.
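Because the download is raw bytes, one way to get a data frame is to write the bytes to a temporary file with the matching extension and read that file with a suitable reader. The sketch below is illustrative only. The helper raw_to_df() is hypothetical, and the raw vector is simulated with charToRaw() so that the code runs without a network connection. In practice the bytes would come from a get_file_* call, and the reader would depend on the original format (for example, a Stata reader for .dta files).

```r
# Hypothetical helper: turn a raw vector of file bytes into a data frame
# by round-tripping through a temporary file with the right extension.
raw_to_df <- function(bytes, ext = ".csv", reader = utils::read.csv) {
  stopifnot(is.raw(bytes))
  path <- tempfile(fileext = ext)
  on.exit(unlink(path), add = TRUE)  # remove the temporary file afterwards
  writeBin(bytes, path)              # write the raw bytes unchanged
  reader(path)                       # parse with the format-specific reader
}

# Simulated download: the bytes of a tiny CSV file (an assumption for the demo).
bytes <- charToRaw("id,wage\n1,10.5\n2,12.0\n")
df <- raw_to_df(bytes)
str(df)
```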
get_file(
  file,
  dataset = NULL,
  format = c("original", "bundle"),
  vars = NULL,
  return_url = FALSE,
  key = Sys.getenv("DATAVERSE_KEY"),
  server = Sys.getenv("DATAVERSE_SERVER"),
  original = TRUE,
  version = ":latest",
  ...
)
get_file_by_name(
  filename,
  dataset,
  format = c("original", "bundle"),
  vars = NULL,
  return_url = FALSE,
  key = Sys.getenv("DATAVERSE_KEY"),
  server = Sys.getenv("DATAVERSE_SERVER"),
  original = TRUE,
  ...
)
get_file_by_id(
  fileid,
  dataset = NULL,
  format = c("original", "bundle"),
  vars = NULL,
  original = TRUE,
  progress = NULL,
  return_url = FALSE,
  key = Sys.getenv("DATAVERSE_KEY"),
  server = Sys.getenv("DATAVERSE_SERVER"),
  ...
)
get_file_by_doi(
  filedoi,
  dataset = NULL,
  format = c("original", "bundle"),
  vars = NULL,
  original = TRUE,
  return_url = FALSE,
  key = Sys.getenv("DATAVERSE_KEY"),
  server = Sys.getenv("DATAVERSE_SERVER"),
  ...
)
file
An integer specifying a file identifier, or a vector of integers specifying file identifiers; or, if prefixed with "doi:", a character string with the file-specific DOI; or, if used without the prefix, a filename accompanied by a dataset DOI in the dataset argument; or an object of class “dataverse_file” as returned by dataset_files(). Can be a vector for multiple files.
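To make the accepted forms of file concrete, the following sketch classifies a value the way this description reads. It is a hypothetical helper (describe_file_arg()), not the package's internal dispatch; it only inspects the value and never contacts a server. The numeric IDs used below are made up.

```r
# Hypothetical helper mirroring the documented forms of the `file` argument.
describe_file_arg <- function(file, dataset = NULL) {
  if (inherits(file, "dataverse_file")) {
    return("dataverse_file object")
  }
  if (is.numeric(file)) {
    return("file ID(s)")
  }
  if (is.character(file)) {
    # A "doi:" prefix marks file-specific DOIs.
    if (all(startsWith(file, "doi:"))) {
      return("file DOI(s)")
    }
    # Without the prefix, the value is a filename and needs a dataset.
    if (is.null(dataset)) {
      stop("a filename requires the `dataset` argument")
    }
    return("filename(s) within a dataset")
  }
  stop("unsupported type for `file`")
}

describe_file_arg(c(101, 102))
describe_file_arg("doi:10.70122/FK2/PPIAXE/MHDB0O")
describe_file_arg("nlsw88.tab", dataset = "10.70122/FK2/PPIAXE")
```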
dataset
A character string specifying a persistent identifier (DOI) for a dataset, for example "10.70122/FK2/HXJVJU". Alternatively, an object of class “dataverse_dataset” obtained by dataverse_contents().
format
A character string specifying a file format for download. By default, this is “original” (the original file format). If NULL, no query is added, so ingested files are returned in their ingested TSV form. For tabular datasets, the option “bundle” downloads a bundle of the original and archival versions, as well as the documentation. See https://guides.dataverse.org/en/latest/api/dataaccess.html for details.
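The Dataverse Data Access API guide describes the endpoints behind these options. As a rough sketch, assuming the documented paths /api/access/datafile/{id} (with ?format=original for the original file) and /api/access/datafile/bundle/{id} for the bundle, a download URL could be assembled as follows. access_url() is hypothetical, and the package may construct its URLs differently.

```r
# Hypothetical sketch: build a Data Access API URL for one file ID,
# assuming the endpoint paths from the Dataverse API guide.
access_url <- function(fileid, server, format = "original") {
  base <- sprintf("https://%s/api/access/datafile", server)
  if (is.null(format)) {
    # No query string: the ingested (TSV) form is returned.
    return(sprintf("%s/%s", base, fileid))
  }
  switch(format,
    original = sprintf("%s/%s?format=original", base, fileid),
    bundle   = sprintf("%s/bundle/%s", base, fileid),
    stop("unknown format: ", format)
  )
}

access_url(101, "demo.dataverse.org")
access_url(101, "demo.dataverse.org", format = NULL)
access_url(101, "demo.dataverse.org", format = "bundle")
```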
vars
A character vector specifying one or more variable names, used to extract a subset of the data.
return_url
Instead of downloading the file, return the URL for download. Defaults to FALSE.
key
A character string specifying a Dataverse server API key. If one is not specified, functions calling authenticated API endpoints will fail. Keys can be specified per call or globally using Sys.setenv("DATAVERSE_KEY" = "examplekey").
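A minimal sketch of the two ways to supply a key, using only base R: a global default through the DATAVERSE_KEY environment variable (which the key argument reads by default), or an explicit value passed per call. The helper resolve_key() and the key strings are placeholders, not real credentials or package functions. The helper fails early when no key is available, which is one way to surface the problem before calling an authenticated endpoint.

```r
# Hypothetical helper: prefer an explicit key, otherwise fall back to the
# DATAVERSE_KEY environment variable (the documented default for `key`).
resolve_key <- function(key = Sys.getenv("DATAVERSE_KEY")) {
  if (!nzchar(key)) {
    stop("no API key: pass `key` or set DATAVERSE_KEY")
  }
  key
}

Sys.setenv("DATAVERSE_KEY" = "examplekey")  # global default for this session
resolve_key()                               # uses the environment variable
resolve_key(key = "per-call-key")           # an explicit per-call value wins
Sys.unsetenv("DATAVERSE_KEY")               # restore the session state
```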
server
A character string specifying a Dataverse server. Multiple Dataverse installations exist, with "dataverse.harvard.edu" being the most prominent. The server can be specified in each function call, or it can be set as a default via an environment variable. To set a default, run Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") or add DATAVERSE_SERVER = "dataverse.harvard.edu" to one's .Renviron file (usethis::edit_r_environ()), with the appropriate domain as its value.
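The .Renviron route can be tried without touching one's real startup file. The sketch below writes the setting to a temporary file in the same name=value format and loads it with base R's readRenviron(). The temporary path and the demo server value are illustrative.

```r
# Write a one-line environment file in .Renviron format (name=value).
renviron <- tempfile(fileext = ".Renviron")
writeLines("DATAVERSE_SERVER=demo.dataverse.org", renviron)

# Load it into the current session, as R does with ~/.Renviron at startup.
readRenviron(renviron)
Sys.getenv("DATAVERSE_SERVER")

# An explicit `server` argument in a call still overrides this default.
unlink(renviron)
```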
original
A logical, defaulting to TRUE. If an ingested (.tab) version is available, should the original version be downloaded instead of the ingested one? If there is no ingested version, this is set to NA. Note that in get_dataframe_*, original is set to FALSE by default. Either default can be changed.
version
A character string specifying a version of the dataset. This can be of the form "1.1" or "1" (where in "x.y", x is a major version and y is an optional minor version), or ":latest" (the default, the latest published version). We recommend using the numbered format so that the function stores a cache of the data (see cache_dataset). If the user supplies the key argument or sets DATAVERSE_KEY, they can access the draft version with ":draft" (the current draft) or ":latest" (which prioritizes the draft over the latest published version). Finally, set use_cache = "none" to skip reading from the cache and re-download afresh even when version is provided.
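The accepted version strings can be summarized with a small check. This is a hypothetical sketch (check_version() is not a package function) that encodes the rules described here: numbered versions "x" or "x.y" are stable and therefore worth caching, in line with the recommendation above, while ":latest" and ":draft" are moving targets, and ":draft" needs a key.

```r
# Hypothetical sketch of the documented `version` rules (scalar input).
check_version <- function(version, key = Sys.getenv("DATAVERSE_KEY")) {
  numbered <- grepl("^[0-9]+(\\.[0-9]+)?$", version)  # "1" or "1.1"
  keyword  <- version %in% c(":latest", ":draft")
  if (!numbered && !keyword) {
    stop("version must look like \"1\", \"1.1\", \":latest\", or \":draft\"")
  }
  if (version == ":draft" && !nzchar(key)) {
    stop("accessing \":draft\" requires an API key")
  }
  # Only numbered versions are stable enough to be worth caching.
  list(version = version, cacheable = numbered)
}

check_version("1.1")$cacheable      # TRUE
check_version(":latest")$cacheable  # FALSE
```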
...
Additional arguments passed to an HTTP request function, such as GET, POST, or DELETE. See use_cache for details on how the R dataverse package uses disk and session caches to improve network performance.
filename
Name of the file in the dataset, with the file extension as shown in Dataverse (for example, if nlsw88.dta was the original but is displayed as the ingested nlsw88.tab, use the ingested name, nlsw88.tab).
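Matching is by the displayed name, so the ingested name has to be used. The sketch below illustrates this with a made-up table of file labels and IDs standing in for a dataset's file metadata. find_file_id() is a hypothetical helper, and the IDs are placeholders.

```r
# Made-up stand-in for a dataset's file listing (labels as shown in Dataverse).
files <- data.frame(
  id    = c(101, 102),
  label = c("nlsw88.tab", "nlsw88_rds-export.rds")
)

# Hypothetical helper: look up file IDs by exact displayed filename.
find_file_id <- function(filename, files) {
  idx <- match(filename, files$label)
  if (anyNA(idx)) {
    stop("not found: ", paste(filename[is.na(idx)], collapse = ", "))
  }
  files$id[idx]
}

find_file_id("nlsw88.tab", files)       # matches the ingested name
try(find_file_id("nlsw88.dta", files))  # the original name is not listed
```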
fileid
A numeric ID internally used for get_file_by_id. Can be a vector for multiple files.
progress
Whether to show a progress bar for the download. If not specified, it is set to TRUE for files larger than 100MB. To fix a value, set it to FALSE or TRUE.
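The default for progress amounts to a size threshold. The sketch below encodes that rule, assuming 100MB means 100 * 1024^2 bytes; the exact cutoff the package uses may differ. default_progress() is hypothetical.

```r
# Hypothetical sketch of the documented default for `progress`:
# a user-supplied TRUE/FALSE wins; otherwise show it for large files.
default_progress <- function(size_bytes, progress = NULL,
                             threshold = 100 * 1024^2) {
  if (!is.null(progress)) {
    return(isTRUE(progress))
  }
  size_bytes > threshold
}

default_progress(5e6)                    # FALSE: small file
default_progress(2e8)                    # TRUE: larger than the threshold
default_progress(2e8, progress = FALSE)  # FALSE: fixed by the user
```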
filedoi
A DOI for a single file (not the entire dataset), of the form "10.70122/FK2/PPIAXE/MHDB0O" or "doi:10.70122/FK2/PPIAXE/MHDB0O". Can be a vector for multiple files.
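Since both spellings of a file DOI are accepted, code that collects DOIs from different sources may want a single canonical form. A minimal sketch, where normalize_doi() is hypothetical: it strips an optional "doi:" prefix and then adds it back, so every element ends up prefixed.

```r
# Hypothetical helper: accept DOIs with or without the "doi:" prefix and
# return them in one canonical, prefixed form. Vectorized over `x`.
normalize_doi <- function(x) {
  bare <- sub("^doi:", "", trimws(x), ignore.case = TRUE)
  paste0("doi:", bare)
}

normalize_doi(c("10.70122/FK2/PPIAXE/MHDB0O",
                "doi:10.70122/FK2/PPIAXE/MHDB0O"))
```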
get_file returns a raw vector (or a list of raw vectors, if length(file) > 1), which can be saved locally with the writeBin function. To load datasets into the R environment as data frames, see get_dataframe_by_name.
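When several files are requested, the result is a list of raw vectors, so saving them means looping over the list. The sketch below simulates such a list with charToRaw() so it runs offline. The list names act as filenames, and choosing the correct extension remains the user's responsibility.

```r
# Simulated result of requesting two files (made-up contents).
downloads <- list(
  "a.csv" = charToRaw("x,y\n1,2\n"),
  "b.txt" = charToRaw("hello\n")
)

# Write each raw vector to its own file in a temporary directory.
out_dir <- file.path(tempdir(), "dataverse-demo")
dir.create(out_dir, showWarnings = FALSE)
paths <- file.path(out_dir, names(downloads))
invisible(Map(writeBin, downloads, paths))

# Read one back to confirm the bytes round-trip unchanged.
identical(readBin(paths[1], "raw", n = file.size(paths[1])), downloads[[1]])
```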
This function provides access to data files from a Dataverse entry. get_file is a general wrapper and can take dataverse objects, file IDs, or a filename and dataset. Internally, all functions download each file with get_file_by_id.
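The wrapper pattern described here can be sketched as follows: fetch each requested file individually by ID, then return a single raw vector or a list. fetch_one() is a hypothetical stub that returns placeholder bytes locally instead of calling get_file_by_id(), so only the shape of the control flow is illustrated.

```r
# Hypothetical stub standing in for a per-file download by ID.
fetch_one <- function(id) charToRaw(paste0("contents of file ", id))

# Sketch of the wrapper: one request per ID, then simplify the result.
fetch_files <- function(ids) {
  out <- lapply(ids, fetch_one)
  if (length(out) == 1) out[[1]] else out  # raw vector vs. list of raw vectors
}

is.raw(fetch_files(101))          # TRUE: a single raw vector
length(fetch_files(c(101, 102)))  # 2: a list with one element per file
```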
get_file_by_name is a shorthand for running get_file by specifying a file name (filename) and dataset (dataset).
get_file_by_doi obtains a file by its file DOI, bypassing the dataset argument.
To load the objects as data frames, see get_dataframe_by_name.
if (FALSE) { # \dontrun{
  library(dataverse)

  # 1. Using filename and dataset
  f1 <- get_file_by_name(
    filename = "nlsw88.tab",
    dataset = "10.70122/FK2/PPIAXE",
    server = "demo.dataverse.org"
  )

  # 2. Using file DOI
  f2 <- get_file_by_doi(
    filedoi = "10.70122/FK2/PPIAXE/MHDB0O",
    server = "demo.dataverse.org"
  )

  # 3. Two steps: find the file ID with get_dataset(), then download it
  d3 <- get_dataset("doi:10.70122/FK2/PPIAXE", server = "demo.dataverse.org")
  f3 <- get_file(d3$files$id[1], server = "demo.dataverse.org")

  # 4. Retrieve multiple raw files as a list
  f4_meta <- get_dataset(
    "doi:10.70122/FK2/PPIAXE",
    server = "demo.dataverse.org"
  )
  f4 <- get_file(f4_meta$files$id, server = "demo.dataverse.org")
  names(f4) <- f4_meta$files$label

  # Write binary files. To load into the R environment, use get_dataframe_by_name().
  # The appropriate file extension needs to be assigned by the user.
  writeBin(f1, "nlsw88.dta") # displayed as .tab, but the original is a .dta file
  writeBin(f4[["nlsw88_rds-export.rds"]], "nlsw88.rds") # originally an rds file
  writeBin(f4[["nlsw88.tab"]], "nlsw88.dta") # originally a dta file
} # }