Two Dandy Queries for SPDX

In case you haven’t heard, the Linux Foundation develops and promotes SPDX, a standard for documenting the external components used in your software. The standard covers things that a simple “ingredients label” would not: usage and licensing of components (which can be far more fluid than one might expect), origins of components, and much else. But perhaps the most important aspect of SPDX is that it is machine-readable.

SPDX builds on RDF, a W3C standard format for linked data. I won’t cover the mechanics of RDF here (I will, however, cover them at LinuxCon Europe 2016). Suffice it to say that RDF allows for the creation of documents that are structured, interconnected, and yet queryable in their entirety using the query language SPARQL. Below, I will go through two queries (with some variations) that should be useful to anyone working with SPDX documents.

Note: because SPARQL works on RDF data, a document in the tag-value format must first be converted to RDF. The command-line utility in SPDX Tools can do this.

Spacing is irrelevant in SPARQL. I tried to make the queries readable, but whether you delimit terms with one space or one hundred makes no functional difference.

Do try this at home

The easiest way I’ve found to run queries against an SPDX document is to grab Twinkle, an open-source Java tool for running SPARQL queries on RDF files. To integrate SPARQL queries into your application or to query the combined data from multiple SPDX documents, you will want to use an RDF engine such as Apache Jena, which is what Twinkle uses internally.

Once you’ve unpacked Twinkle, run the jar with Java:

java -jar twinkle.jar

When the GUI comes up, click on the “File…” button and point at the SPDX file you wish to query against. If you don’t have an SPDX 2.1 document handy, I’ve got one for you here.

Query 1. List Everything Related to a Particular Package

prefix spdx: <http://spdx.org/rdf/terms#>

select ?name ?relationshipType
{
  <Your Package>   spdx:relationship      ?relationship .
  ?relationship   spdx:relationshipType   ?relationshipType .
  ?relationship   spdx:relatedSpdxElement ?relatedElement .
  ?relatedElement spdx:name               ?name .
}

 

Replace <Your Package> above with the URI of the package whose related items you wish to find. Remember, all absolute URIs must be enclosed in <>, e.g. <http://example.org/packages#SPDXRef-1>.

Tweaks:

  • The query above returns the name of every element to which your package is related. This may be more than you expect, since one of those elements may be the containing document, to which the package has a DESCRIBED_BY relationship. To limit the results to a specific relationship, remove ?relationshipType from the select line, then replace the instance of ?relationshipType inside the curly brackets with the specific relationship you’re looking for. For example, to view all statically linked packages, you might use this query:
prefix spdx: <http://spdx.org/rdf/terms#>

select ?name
{
  <Your Package>  spdx:relationship      ?relationship .
  ?relationship   spdx:relationshipType   spdx:relationshipType_staticLink .
  ?relationship   spdx:relatedSpdxElement ?relatedElement .
  ?relatedElement spdx:name               ?name .
}
  • To return the package version in addition to the name (the name shouldn’t include the version), add spdx:versionInfo to the query. It’s important to wrap it in the optional operator, since versionInfo isn’t a required attribute of a package. Leaving out optional would filter out any packages that don’t have this field set.
prefix spdx: <http://spdx.org/rdf/terms#>

select ?name ?version
{
  <Your Package>  spdx:relationship      ?relationship .
  ?relationship   spdx:relationshipType   spdx:relationshipType_staticLink .
  ?relationship   spdx:relatedSpdxElement ?relatedElement .
  ?relatedElement spdx:name               ?name .
  optional { ?relatedElement    spdx:versionInfo   ?version } .
}
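
If you plan to generate these variations from an application rather than edit them by hand, the tweaks above can be folded into one small helper. The following is only a sketch in Python: the function name related_elements_query and its parameters are hypothetical, and it does nothing more than assemble the query text shown above. Running that text still requires a SPARQL engine of your choice.

```python
SPDX_PREFIX = "prefix spdx: <http://spdx.org/rdf/terms#>"

# Characters that may not appear inside a SPARQL <...> IRI reference.
FORBIDDEN_IN_IRI = set('<>"{}|^`\\ ')


def related_elements_query(package_uri, relationship_type=None, include_version=False):
    """Build the text of Query 1 (hypothetical helper, not part of any SPDX tool).

    relationship_type is the suffix after "relationshipType_", e.g. "staticLink";
    when omitted, the relationship type is returned as a result column instead.
    """
    if any(ch in FORBIDDEN_IN_IRI for ch in package_uri):
        raise ValueError("package_uri contains characters not allowed in an IRI")
    if relationship_type is not None and not relationship_type.isalnum():
        raise ValueError("relationship_type must be a plain name such as 'staticLink'")

    variables = ["?name"]
    if relationship_type is None:
        type_term = "?relationshipType"
        variables.append("?relationshipType")
    else:
        type_term = "spdx:relationshipType_" + relationship_type
    if include_version:
        variables.append("?version")

    lines = [
        SPDX_PREFIX,
        "",
        "select " + " ".join(variables),
        "{",
        "  <" + package_uri + "> spdx:relationship ?relationship .",
        "  ?relationship spdx:relationshipType " + type_term + " .",
        "  ?relationship spdx:relatedSpdxElement ?relatedElement .",
        "  ?relatedElement spdx:name ?name .",
    ]
    if include_version:
        # optional keeps packages that have no versionInfo at all
        lines.append("  optional { ?relatedElement spdx:versionInfo ?version } .")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(related_elements_query("http://example.org/packages#SPDXRef-1",
                                 "staticLink", include_version=True))
```

Called with only a package URI, the helper produces the original Query 1. Passing "staticLink" and include_version=True produces the last variation above.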

Limitation: SPDX relationships were introduced in version 2.0 of the specification. In version 2.1, they can be used to refer to packages whose contents are not documented in the SPDX document. Older documents may use the artifactOf* attributes to document related components, which the queries above will not match. The artifactOf* fields were deprecated in SPDX 2.1 and should never be used again by anyone ever. Anyone who uses them henceforth shall be known far and wide as a terrible person.

Query 2. Search by License

Licensing has been a core consideration of SPDX since the beginning. Here’s a query against an SPDX document that finds every element (package, file, snippet) with a particular license.

prefix spdx: <http://spdx.org/rdf/terms#>
prefix license: <http://spdx.org/licenses/>

select distinct ?x
{
  {?x spdx:licenseDeclared license:Apache-2.0}
  UNION
  {?x spdx:licenseConcluded license:Apache-2.0}
}

This query finds all elements whose declared license, concluded license, or both are Apache-2.0. I used Apache-2.0 in the example because it’s found in the sample document I linked above. You’re welcome to substitute any other identifier from the SPDX license list.

Note: The above query will not detect license expressions where Apache-2.0 is but one of the terms. For that, you’d have to make another query and add it to the query above as another union clause.
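
Here is one way the extra union clauses might look, sketched as a small Python helper that assembles the query text. It rests on assumptions you should check against your own documents: that license expressions are stored as license-set nodes linked to their terms through spdx:member, and that your engine supports SPARQL 1.1 property paths (the + after spdx:member follows nested sets). The name license_query is hypothetical. The helper writes the license as a full IRI instead of using the license: prefix, because some identifiers contain characters such as + that are not valid in a prefixed name.

```python
import re


def license_query(license_id, include_license_sets=True):
    """Build the text of Query 2 for one SPDX license identifier (sketch only)."""
    if not re.fullmatch(r"[A-Za-z0-9.+-]+", license_id):
        raise ValueError("unexpected characters in license identifier")
    license_iri = "<http://spdx.org/licenses/" + license_id + ">"
    properties = ("spdx:licenseDeclared", "spdx:licenseConcluded")

    # Direct matches: the element's license is exactly this license.
    clauses = ["{?x %s %s}" % (prop, license_iri) for prop in properties]

    if include_license_sets:
        # Assumption: expressions are license sets whose terms hang off
        # spdx:member; "+" walks one or more member links (SPARQL 1.1).
        for prop in properties:
            clauses.append(
                "{?x %s ?licenseSet . ?licenseSet spdx:member+ %s}" % (prop, license_iri)
            )

    body = "\n  UNION\n  ".join(clauses)
    return (
        "prefix spdx: <http://spdx.org/rdf/terms#>\n\n"
        "select distinct ?x\n{\n  " + body + "\n}"
    )


if __name__ == "__main__":
    print(license_query("Apache-2.0"))
```

With include_license_sets=False the helper reproduces the two-clause query above. With the default it adds two more union clauses, one each for declared and concluded license expressions.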

Tweaks:

  • You may want to search by the name of a license instead of an SPDX identifier. This is especially true if the license is extracted from a file in your package and not found in the SPDX license list. The following query will return all elements under the “FAUST, INC. PROPRIETARY LICENSE”:
prefix spdx: <http://spdx.org/rdf/terms#>

select distinct ?x
{
  ?license spdx:name "FAUST, INC. PROPRIETARY LICENSE" .
  {?x spdx:licenseDeclared ?license}
  UNION
  {?x spdx:licenseConcluded ?license}
}
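
License names are free text, so if you build this query from a name supplied at run time, the name has to be escaped before it goes between the double quotes. A minimal sketch in Python follows. The function names sparql_string_literal and license_name_query are hypothetical, and only the escapes for backslash, double quote, and line breaks are handled.

```python
def sparql_string_literal(text):
    """Quote text as a double-quoted SPARQL string literal."""
    escaped = (
        text.replace("\\", "\\\\")   # backslash first, so later escapes survive
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def license_name_query(license_name):
    """Build the search-by-license-name query for any name (sketch only)."""
    return "\n".join([
        "prefix spdx: <http://spdx.org/rdf/terms#>",
        "",
        "select distinct ?x",
        "{",
        "  ?license spdx:name " + sparql_string_literal(license_name) + " .",
        "  {?x spdx:licenseDeclared ?license}",
        "  UNION",
        "  {?x spdx:licenseConcluded ?license}",
        "}",
    ])


if __name__ == "__main__":
    print(license_name_query("FAUST, INC. PROPRIETARY LICENSE"))
```

For the name used above, the helper produces the same query text as the hand-written version, apart from spacing.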

 

For more information on SPDX, check out the full specification. For more information on RDF and SPARQL, check out Manning’s Linked Data book. For a complete SPARQL reference, I recommend going straight to the spec – it’s written very clearly.
