Command line interface

The a3k command can be invoked from the shell as follows.

a3k: Relational interface to publication metadata

usage: a3k [-h] [-d DEBUG] [-p] [-v]
           {help,populate,process,query,list-processes,list-complete-schema,list-source-schema,list-process-schema,list-sources,version,download}
           ...

Positional Arguments

command

Possible choices: help, populate, process, query, list-processes, list-complete-schema, list-source-schema, list-process-schema, list-sources, version, download

Name of the a3k operation to perform.

Named Arguments

-d, --debug

Output debugging information according to the comma-separated arguments:

files-read: Counts of Crossref data files read
link: Record linking operations
sql: Executed SQL statements
perf: Performance timings
populated-counts: Counts of the populated database
populated-data: Data of the populated database
populated-reports: Query results from the populated database
sorted-tables: Topologically ordered Crossref query tables
stacktrace: Produce a stack trace when an error occurs
stderr: Log to standard error

Default: []

-p, --progress

Show a progress bar (where available)

Default: False

-v, --version

Report program version and exit

Default: False
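
As a hedged illustration of the option order in the usage line above, the following Python sketch assembles an a3k argument list with the top-level options placed before the sub-command. The helper name build_a3k_argv is hypothetical and not part of alexandria3k; only the flags and debug keywords are taken from this page.

```python
import shlex


def build_a3k_argv(command, *args, debug=(), progress=False):
    """Return an argument list for a3k (hypothetical helper, not part of a3k)."""
    argv = ["a3k"]
    if debug:
        # --debug takes a single comma-separated value, e.g. "sql,perf"
        argv += ["--debug", ",".join(debug)]
    if progress:
        argv.append("--progress")
    # Top-level options come before the sub-command name
    argv.append(command)
    argv.extend(args)
    return argv


if __name__ == "__main__":
    argv = build_a3k_argv("list-sources", debug=["sql", "perf"], progress=True)
    # Print a shell-quoted form; pass argv to subprocess.run() to execute it
    print(shlex.join(argv))
```

Building a list rather than a single string avoids shell quoting problems when an argument, such as an SQL expression, contains spaces or quotes.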

Sub-commands

help

Show top-level help message.

a3k help [-h]

populate

Populate an SQLite database from the specified data source.

a3k populate [-h] [-a ATTACH_DATABASES [ATTACH_DATABASES ...]]
             [-c COLUMNS [COLUMNS ...]] [-R ROW_SELECTION_FILE]
             [-r ROW_SELECTION] [-s SAMPLE]
             database
             {doaj,funder-names,pubmed,issn-subject-codes,ror,asjcs,orcid,crossref,uspto,journal-names,datacite}
             [data_location]

Positional Arguments

database

File path of the database to populate

data_name

Possible choices: doaj, funder-names, pubmed, issn-subject-codes, ror, asjcs, orcid, crossref, uspto, journal-names, datacite

Name of the data source to use

data_location

Path or URL of the source’s data

Named Arguments

-a, --attach-databases

Databases to attach for the row selection expression

-c, --columns

Columns to populate using table.column or table.*

-R, --row-selection-file

File containing SQL expression that selects the populated rows

-r, --row-selection

SQL expression that selects the populated rows

-s, --sample

Python expression to sample the data (e.g. random.random() < 0.0002). The expression can also use a variable named data whose value is documented in the constructor API of each data source.

Default: 'True'
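
To make the effect of a sampling expression concrete, the sketch below evaluates such an expression once per record and keeps the records for which it is true. This is an illustrative stand-in that assumes per-record evaluation; it is not a3k's implementation. The function sample_records and its seed parameter are hypothetical, and data is a plain integer here, whereas in a3k its value depends on the data source.

```python
import random


def sample_records(records, expression="True", seed=None):
    """Keep the records for which the sampling expression is true.

    Illustrative stand-in only: a3k's actual evaluation may differ.
    """
    rng = random.Random(seed)  # seeded generator for repeatable samples
    code = compile(expression, "<sample>", "eval")
    # Expose the generator under the name "random" so that
    # random.random() in the expression draws from it
    namespace = {"random": rng}
    return [r for r in records if eval(code, namespace, {"data": r})]


if __name__ == "__main__":
    records = list(range(100_000))
    kept = sample_records(records, "random.random() < 0.0002", seed=1)
    print(len(kept), "of", len(records), "records kept")
    # The expression may also inspect the record through "data"
    print(sample_records(range(10), "data % 2 == 0"))
```

With the default expression 'True' every record is kept, which matches the documented default of no sampling.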

process

Run a processing step on the specified database.

a3k process [-h]
            database
            {link-aa-top-ror,link-works-asjcs,link-aa-base-ror,link-uspto-doi}

Positional Arguments

database

File path of the database to run the process on

process

Possible choices: link-aa-top-ror, link-works-asjcs, link-aa-base-ror, link-uspto-doi

Name of the process to perform; see the data processing operations in the Alexandria3k Python user API documentation for more details

query

Run a query directly on a data source. The query’s results can be sent to the standard output (default), to a specified file, or to populate a table in an attached database.

a3k query [-h] [-a ATTACH_DATABASES [ATTACH_DATABASES ...]]
          [-E OUTPUT_ENCODING] [-F FIELD_SEPARATOR] [-H] [-o OUTPUT] [-P]
          (-Q QUERY_FILE | -q QUERY) [-s SAMPLE]
          {doaj,funder-names,pubmed,issn-subject-codes,ror,asjcs,orcid,crossref,uspto,journal-names,datacite}
          [data_location]
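
The parenthesised group (-Q QUERY_FILE | -q QUERY) in the usage line means that exactly one of the two options must be given. This is the notation Python's argparse prints for a required mutually exclusive group. The sketch below reproduces that behaviour in a toy parser named toy-query; it is not a3k's source code.

```python
import argparse


def make_parser():
    """Build a toy parser with a required, mutually exclusive option pair."""
    parser = argparse.ArgumentParser(prog="toy-query")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-Q", "--query-file")
    group.add_argument("-q", "--query")
    return parser


if __name__ == "__main__":
    parser = make_parser()
    print(parser.parse_args(["-q", "SELECT 1"]))
    # Passing both options, or neither, makes argparse exit with an error
    for bad in (["-q", "SELECT 1", "-Q", "query.sql"], []):
        try:
            parser.parse_args(bad)
        except SystemExit:
            print("rejected:", bad)
```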

Positional Arguments

data_name

Possible choices: doaj, funder-names, pubmed, issn-subject-codes, ror, asjcs, orcid, crossref, uspto, journal-names, datacite

Name of the data source to use

data_location

Path or URL of the source’s data

Named Arguments

-a, --attach-databases

Databases to attach, making them available to the query

-E, --output-encoding

Query output character encoding (use utf-8-sig for Excel)

Default: 'utf-8'

-F, --field-separator

Character to use for separating query output fields

Default: ','

-H, --header

Include a header in the query output

Default: False

-o, --output

Output file for query results

-P, --partition

Run the query over partitioned data slices. (Warning: arguments are run per partition.)

Default: False

-Q, --query-file

File containing query to run on the virtual tables

-q, --query

Query to run on the virtual tables

-s, --sample

Python expression to sample the data (e.g. random.random() < 0.0002). The expression can also use a variable named data whose value is documented in the constructor API of each data source.

Default: 'True'
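
The output options above interact in a simple way: the field separator and optional header shape the text, and the encoding turns it into bytes. The following sketch shows why utf-8-sig is suggested for Excel, since that codec prepends a byte-order mark (bytes EF BB BF) that plain utf-8 omits. The function encode_rows, the sample DOI, and the column names are hypothetical and only mimic the options' documented meaning.

```python
import csv
import io


def encode_rows(rows, encoding="utf-8", field_separator=",", header=None):
    """Serialise rows as delimited text and encode them to bytes."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=field_separator)
    if header:
        writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode(encoding)


if __name__ == "__main__":
    rows = [("10.1000/example", "Ünïcode title")]  # made-up sample row
    plain = encode_rows(rows)
    for_excel = encode_rows(rows, encoding="utf-8-sig", header=("doi", "title"))
    # Only the utf-8-sig output starts with the byte-order mark
    print(plain[:3], for_excel[:3])
```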

list-processes

List available data processes.

a3k list-processes [-h]

list-complete-schema

List all data source and process schemas.

a3k list-complete-schema [-h]

list-source-schema

List all data source schemas (default) or the specified one.

a3k list-source-schema [-h]
                       [{doaj,funder-names,pubmed,issn-subject-codes,ror,asjcs,orcid,crossref,uspto,journal-names,datacite}]

Positional Arguments

facility: Possible choices: doaj, funder-names, pubmed, issn-subject-codes, ror, asjcs, orcid, crossref, uspto, journal-names, datacite

list-process-schema

List the schema of all processes (default) or of the specified one.

a3k list-process-schema [-h]
                        [{link-aa-top-ror,link-works-asjcs,link-aa-base-ror,link-uspto-doi}]

Positional Arguments

facility: Possible choices: link-aa-top-ror, link-works-asjcs, link-aa-base-ror, link-uspto-doi

list-sources

List available data sources

a3k list-sources [-h]

version

Report program version

a3k version [-h]

download

Download data using the specified data source.

a3k download [-h] [-d [DATABASE]] [--sql-query SQL_QUERY]
             [--extra_args [EXTRA_ARGS ...]] [-s SAMPLE]
             [-a ATTACH_DATABASES [ATTACH_DATABASES ...]]
             {doaj,funder-names,pubmed,issn-subject-codes,ror,asjcs,orcid,crossref,uspto,journal-names,datacite}
             data_location

Positional Arguments

data_name

Possible choices: doaj, funder-names, pubmed, issn-subject-codes, ror, asjcs, orcid, crossref, uspto, journal-names, datacite

Name of the data source to use

data_location

File or directory path to save the downloaded data

Named Arguments

-d, --database

File path of the database to use

--sql-query

SQL query to retrieve the data for downloading

--extra_args

Additional arguments for the data source (e.g. URL, key, file path)

-s, --sample

Python expression to sample the data (e.g. random.random() < 0.0002). The expression can also use a variable named data whose value is documented in the constructor API of each data source.

Default: 'True'

-a, --attach-databases

Databases to attach for the row selection expression
