Use overview
After downloading the Crossref data, you can use alexandria3k through its Python API or as a command-line tool.
With alexandria3k you can process diverse sources of bibliographic data, such as Crossref, ORCID, USPTO, DOAJ, and ROR. Specifically, you can perform the following actions.
Directly run ad hoc SQL queries on any of the supported data sets.
Populate SQLite databases with selected elements of the supported data sets.
Select a horizontal subset of a data set’s records.
Through an SQL expression.
By sampling a subset of the data set’s elements or containers. (For example, by sampling some of the 26 thousand containers in the Crossref data set.)
Select a horizontal subset of a data set’s records by only loading those associated with already populated records or records available in another database.
Select a vertical subset of a data set’s columns.
Using the Table.Column or Table.* notation.
Process already populated databases to cross-link or normalize their elements.
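The selection concepts listed above can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not alexandria3k's own API: the works table, its columns, the sample rows, and the helper functions parse_column_specs and sample_containers are all hypothetical names introduced for the example.

```python
import random
import sqlite3

# Hypothetical toy rows standing in for a bibliographic data set.
# Table and column names are illustrative assumptions only.
WORKS = [
    ("10.1000/a", "Paper A", 2019),
    ("10.1000/b", "Paper B", 2021),
    ("10.1000/c", "Paper C", 2022),
    ("10.1000/d", "Paper D", 2023),
]


def parse_column_specs(specs):
    """Group 'Table.Column' and 'Table.*' strings by table name."""
    selected = {}
    for spec in specs:
        table, column = spec.split(".", 1)
        selected.setdefault(table, set()).add(column)
    return selected


def sample_containers(containers, probability, seed=0):
    """Keep each container independently with the given probability."""
    rng = random.Random(seed)
    return [c for c in containers if rng.random() < probability]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE works (doi TEXT, title TEXT, published_year INTEGER)"
)
conn.executemany("INSERT INTO works VALUES (?, ?, ?)", WORKS)

# Horizontal subset: an SQL expression picks the rows.
# Vertical subset: only the doi and title columns are retrieved.
rows = conn.execute(
    "SELECT doi, title FROM works WHERE published_year >= 2021 ORDER BY doi"
).fetchall()

# Column specifications in the Table.Column / Table.* notation.
columns = parse_column_specs(["works.doi", "works.title", "work_authors.*"])

# Sampling: process only some of the (hypothetical) container files.
containers = [f"container-{i}.json.gz" for i in range(100)]
sampled = sample_containers(containers, probability=0.1)
```

Fixing the random seed makes a sampled run repeatable, which helps when an experimental subset needs to be regenerated later.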
Populating a database can take minutes (for a small, e.g. experimental, subset), a few hours (to traverse the whole Crossref data set and obtain a few thousand records), or a couple of days (to produce a large set, e.g. by selecting some columns).
After you populate an SQLite database and create suitable indexes, SQL queries often run in seconds.
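As a rough sketch of what creating a suitable index can look like, the following uses Python's standard sqlite3 module on a small in-memory table. The work_authors table, its columns, and the index name are assumptions made for illustration; a real populated database would be opened from its file, and its schema may differ.

```python
import sqlite3

# In-memory stand-in for an already populated database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_authors (work_id INTEGER, family TEXT)")
conn.executemany(
    "INSERT INTO work_authors VALUES (?, ?)",
    [(1, "Knuth"), (2, "Hopper"), (2, "Lovelace"), (3, "Turing")],
)

# Index the column used in lookups and joins so that queries
# can avoid scanning the whole table.
conn.execute(
    "CREATE INDEX IF NOT EXISTS work_authors_work_id_idx "
    "ON work_authors(work_id)"
)

query = "SELECT family FROM work_authors WHERE work_id = ? ORDER BY family"
authors = [row[0] for row in conn.execute(query, (2,))]

# Ask SQLite how it plans to run the query; the last field of each
# plan row is a human-readable description of the step.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, (2,)).fetchall()
plan_details = [row[-1] for row in plan]
```

Checking the query plan before running a long query is a cheap way to confirm that the index is actually being used.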
You can find many complete proof-of-concept example studies conducted with command-line invocations in the examples directory. Consider using the hello world (work authors by decade) example as a starting point.