Command line interface¶
The a3k command can be invoked from the shell as follows.
a3k: Relational interface to publication metadata
usage: a3k [-h] [-d DEBUG] [-p] [-v]
{help,populate,process,query,list-processes,list-complete-schema,list-source-schema,list-process-schema,list-sources,version,download}
...
Positional Arguments¶
- command
Possible choices: help, populate, process, query, list-processes, list-complete-schema, list-source-schema, list-process-schema, list-sources, version, download
Name of the a3k operation to perform.
Named Arguments¶
- -d, --debug
Output debugging information according to the comma-separated arguments:
files-read: Counts of Crossref data files read
link: Record linking operations
sql: Executed SQL statements
perf: Performance timings
populated-counts: Counts of the populated database
populated-data: Data of the populated database
populated-reports: Query results from the populated database
sorted-tables: Topologically ordered Crossref query tables
stacktrace: Produce a stack trace when an error occurs
stderr: Log to standard error
Default:
[]
- -p, --progress
Show a progress bar (where available)
Default:
False
- -v, --version
Report program version and exit
Default:
False
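As a minimal sketch of how the top-level options combine, the Python snippet below assembles an a3k command line in which the global options precede the sub-command, as the usage line shows. The helper name global_args is hypothetical and not part of a3k; the debug option names are copied from the --debug description.

```python
import shlex

# Debug option names copied from the --debug help text.
DEBUG_OPTIONS = {
    "files-read", "link", "sql", "perf", "populated-counts",
    "populated-data", "populated-reports", "sorted-tables",
    "stacktrace", "stderr",
}


def global_args(debug=(), progress=False):
    """Return the a3k options that precede the sub-command (hypothetical helper)."""
    unknown = set(debug) - DEBUG_OPTIONS
    if unknown:
        raise ValueError(f"unknown debug options: {sorted(unknown)}")
    args = ["a3k"]
    if debug:
        # --debug takes a single comma-separated value.
        args += ["-d", ",".join(debug)]
    if progress:
        args.append("-p")
    return args


cmd = global_args(debug=["sql", "perf"], progress=True) + ["list-sources"]
print(shlex.join(cmd))  # prints: a3k -d sql,perf -p list-sources
```

The resulting list can be passed to subprocess.run when a3k is installed; the sketch itself only builds and prints the command.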
Sub-commands¶
help¶
Show top-level help message.
a3k help [-h]
populate¶
Populate an SQLite database from the specified data source.
a3k populate [-h] [-a ATTACH_DATABASES [ATTACH_DATABASES ...]]
[-c COLUMNS [COLUMNS ...]] [-R ROW_SELECTION_FILE]
[-r ROW_SELECTION] [-s SAMPLE]
database
{doaj,funder-names,pubmed,issn-subject-codes,ror,asjcs,orcid,crossref,uspto,journal-names,datacite}
[data_location]
Positional Arguments¶
- database
File path of the database to populate
- data_name
Possible choices: doaj, funder-names, pubmed, issn-subject-codes, ror, asjcs, orcid, crossref, uspto, journal-names, datacite
Name of the data source to use
- data_location
Path or URL of the source’s data
Named Arguments¶
- -a, --attach-databases
Databases to attach for the row selection expression
- -c, --columns
Columns to populate using table.column or table.*
- -R, --row-selection-file
File containing SQL expression that selects the populated rows
- -r, --row-selection
SQL expression that selects the populated rows
- -s, --sample
Python expression to sample the data (e.g. random.random() < 0.0002). The expression can also use a variable named data whose value is documented in the constructor API of each data source.
Default:
'True'
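The following sketch shows one way to assemble a populate invocation from Python. The helper populate_args is hypothetical, and the database file, data directory, column names, and row-selection expression are illustrative assumptions rather than values taken from this page. The positional arguments are placed first so that the multi-valued -c option cannot absorb them.

```python
import shlex


def populate_args(database, data_name, data_location=None,
                  columns=(), row_selection=None, sample=None):
    """Build an argument list for `a3k populate` (hypothetical helper)."""
    args = ["a3k", "populate", database, data_name]
    if data_location is not None:
        args.append(data_location)
    if row_selection is not None:
        args += ["-r", row_selection]
    if sample is not None:
        args += ["-s", sample]
    if columns:
        # -c accepts one or more table.column or table.* values.
        args += ["-c", *columns]
    return args


cmd = populate_args(
    "sample.db",                # assumed output database path
    "crossref",
    "crossref-data",            # assumed location of the source data
    columns=["works.doi", "works.title"],          # illustrative columns
    row_selection="works.published_year = 2021",   # illustrative SQL expression
    sample="random.random() < 0.0002",
)
print(shlex.join(cmd))
```

shlex.join quotes the SQL and Python expressions, so the printed line can be pasted into a POSIX shell as is.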
process¶
Run a processing step on the specified database.
a3k process [-h]
database
{link-aa-top-ror,link-works-asjcs,link-aa-base-ror,link-uspto-doi}
Positional Arguments¶
- database
File path of the database to run the process on
- process
Possible choices: link-aa-top-ror, link-works-asjcs, link-aa-base-ror, link-uspto-doi
Name of the process to perform; see the data processing operations in the Alexandria3k Python user API documentation for more details
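A small sketch of building a process invocation, checking the process name against the choices listed above before anything is run. The helper process_args and the database file name are hypothetical.

```python
# Process names copied from the list of possible choices.
PROCESSES = (
    "link-aa-top-ror",
    "link-works-asjcs",
    "link-aa-base-ror",
    "link-uspto-doi",
)


def process_args(database, process):
    """Build an argument list for `a3k process` (hypothetical helper)."""
    if process not in PROCESSES:
        raise ValueError(f"unknown process: {process!r}")
    return ["a3k", "process", database, process]


cmd = process_args("sample.db", "link-works-asjcs")  # assumed database path
print(" ".join(cmd))  # prints: a3k process sample.db link-works-asjcs
```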
query¶
Run a query directly on a data source. The query’s results can be sent to the standard output (default), to a specified file, or to populate a table in an attached database.
a3k query [-h] [-a ATTACH_DATABASES [ATTACH_DATABASES ...]]
[-E OUTPUT_ENCODING] [-F FIELD_SEPARATOR] [-H] [-o OUTPUT] [-P]
(-Q QUERY_FILE | -q QUERY) [-s SAMPLE]
{doaj,funder-names,pubmed,issn-subject-codes,ror,asjcs,orcid,crossref,uspto,journal-names,datacite}
[data_location]
Positional Arguments¶
- data_name
Possible choices: doaj, funder-names, pubmed, issn-subject-codes, ror, asjcs, orcid, crossref, uspto, journal-names, datacite
Name of the data source to use
- data_location
Path or URL of the source’s data
Named Arguments¶
- -a, --attach-databases
Databases to attach, making them available to the query
- -E, --output-encoding
Query output character encoding (use utf-8-sig for Excel)
Default:
'utf-8'
- -F, --field-separator
Character to use for separating query output fields
Default:
','
- -H, --header
Include a header in the query output
Default:
False
- -o, --output
Output file for query results
- -P, --partition
Run the query over partitioned data slices. (Warning: arguments are run per partition.)
Default:
False
- -Q, --query-file
File containing query to run on the virtual tables
- -q, --query
Query to run on the virtual tables
- -s, --sample
Python expression to sample the data (e.g. random.random() < 0.0002). The expression can also use a variable named data whose value is documented in the constructor API of each data source.
Default:
'True'
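The usage line requires exactly one of -Q or -q. The sketch below enforces that rule while assembling a query invocation; query_args is a hypothetical helper, and the SQL text, data location, and output file name are illustrative assumptions.

```python
import shlex


def query_args(data_name, data_location=None, *, query=None, query_file=None,
               header=False, output=None, field_separator=None, encoding=None):
    """Build an argument list for `a3k query` (hypothetical helper)."""
    # -Q and -q are mutually exclusive and one of them is required.
    if (query is None) == (query_file is None):
        raise ValueError("specify exactly one of query or query_file")
    args = ["a3k", "query", data_name]
    if data_location is not None:
        args.append(data_location)
    args += ["-q", query] if query is not None else ["-Q", query_file]
    if header:
        args.append("-H")
    if output is not None:
        args += ["-o", output]
    if field_separator is not None:
        args += ["-F", field_separator]
    if encoding is not None:
        args += ["-E", encoding]
    return args


cmd = query_args(
    "crossref",
    "crossref-data",                      # assumed location of the source data
    query="SELECT COUNT(*) FROM works",   # illustrative query
    header=True,
    output="count.csv",                   # assumed output file
    encoding="utf-8-sig",                 # the encoding suggested for Excel
)
print(shlex.join(cmd))
```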
list-processes¶
List available data processes.
a3k list-processes [-h]
list-complete-schema¶
List all data source and process schemas.
a3k list-complete-schema [-h]
list-source-schema¶
List all data source schemas (default) or the specified one.
a3k list-source-schema [-h]
[{doaj,funder-names,pubmed,issn-subject-codes,ror,asjcs,orcid,crossref,uspto,journal-names,datacite}]
Positional Arguments¶
- facility
Possible choices: doaj, funder-names, pubmed, issn-subject-codes, ror, asjcs, orcid, crossref, uspto, journal-names, datacite
list-process-schema¶
List the schema of all processes (default) or of the specified one.
a3k list-process-schema [-h]
[{link-aa-top-ror,link-works-asjcs,link-aa-base-ror,link-uspto-doi}]
Positional Arguments¶
- facility
Possible choices: link-aa-top-ror, link-works-asjcs, link-aa-base-ror, link-uspto-doi
list-sources¶
List available data sources.
a3k list-sources [-h]
version¶
Report program version.
a3k version [-h]
download¶
Download data using the specified data source.
a3k download [-h] [-d [DATABASE]] [--sql-query SQL_QUERY]
[--extra_args [EXTRA_ARGS ...]] [-s SAMPLE]
[-a ATTACH_DATABASES [ATTACH_DATABASES ...]]
{doaj,funder-names,pubmed,issn-subject-codes,ror,asjcs,orcid,crossref,uspto,journal-names,datacite}
data_location
Positional Arguments¶
- data_name
Possible choices: doaj, funder-names, pubmed, issn-subject-codes, ror, asjcs, orcid, crossref, uspto, journal-names, datacite
Name of the data source to use
- data_location
File or directory path to save the downloaded data
Named Arguments¶
- -d, --database
File path of the database to use
- --sql-query
SQL query to retrieve the data for downloading
- --extra_args
Additional arguments for the data source (e.g. URL, key, file path)
- -s, --sample
Python expression to sample the data (e.g. random.random() < 0.0002). The expression can also use a variable named data whose value is documented in the constructor API of each data source.
Default:
'True'
- -a, --attach-databases
Databases to attach for the row selection expression
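The --sample option of populate, query, and download takes a Python expression that decides whether each item is kept. The sketch below only illustrates those semantics. It is not a3k's implementation, and it assumes that data is bound to the item being considered, whereas the real value of data depends on each data source's constructor API.

```python
import random


def sample_records(records, expression="True", seed=None):
    """Keep the records for which the sampling expression is true (illustration only)."""
    random.seed(seed)
    code = compile(expression, "<sample>", "eval")
    # Assumption: the expression sees the random module and a `data` variable.
    return [r for r in records if eval(code, {"random": random, "data": r})]


records = range(100_000)
kept = sample_records(records, "random.random() < 0.01", seed=0)
everything = sample_records(records)  # the default 'True' keeps every record
print(len(kept), len(everything))
```

With a threshold of 0.01, roughly one record in a hundred is kept; the example value 0.0002 would keep about two in ten thousand.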