Data Search Guide FDAT Repository
This guide explains how to write advanced search queries using easy-to-understand examples.
Simple search (one or multiple terms)
Example:
open science
Results will match records with the terms open or science in any field. Note that stemming is applied, so e.g. science will also match sciences. Search results are ranked according to an algorithm that takes your query terms into account.
You can require the presence of both terms using either the + or AND operator:
Examples:
+open +science
or
open AND science
You can require the absence of one or more terms using either the - or NOT operator:
Examples:
-open +science
or
NOT open AND science
Phrase search
Example:
"open science"
Results will match records with the phrase open science in any field.
Field search
Example:
metadata.title:open
Results will match records with the term open in the field metadata.title. If you want to search for multiple terms in the title, you must group the terms using parentheses:
Example:
metadata.title:(open science)
See the field reference below for the full list of fields you can search.
Combined simple, phrase or field search
Example:
+metadata.title:"open science" -metadata.title:policy
or e.g.
metadata.title:(-open +science)
You can combine simple, phrase and field search to construct advanced search queries.
Range search
Example:
metadata.publication_date:[2017 TO 2018]
(note: you must capitalize TO).
Results will match any record with a publication date between 2017-01-01 and 2018-01-01 (both dates inclusive).
Note that partial dates are expanded to full dates, e.g.:
- 2017 is expanded to 2017-01-01
- 2017-06 is expanded to 2017-06-01
Use square brackets ([]) for inclusive ranges and curly brackets ({}) for exclusive ranges, e.g.:
- [2017 TO 2018} is equivalent to [2017-01-01 TO 2017-12-31] because of date expansion and the exclusive upper bound.
Examples of other ranges:
- metadata.publication_date:{* TO 2017-01-01}: All days until 2017.
- metadata.publication_date:[2017-01-01 TO *]: All days from 2017.
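The date expansion and bracket rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this guide, not the repository's implementation; the function names expand_partial_date and in_range are hypothetical.

```python
from datetime import date


def expand_partial_date(value: str) -> date:
    """Expand 'YYYY' or 'YYYY-MM' to the first day of that period."""
    parts = [int(p) for p in value.split("-")]
    # A missing month or day defaults to 1, as in the expansion rule above.
    while len(parts) < 3:
        parts.append(1)
    return date(*parts)


def in_range(day: date, lower: str, upper: str,
             include_lower: bool = True, include_upper: bool = True) -> bool:
    """Check a day against a range; [] bounds are inclusive, {} exclusive."""
    lo, hi = expand_partial_date(lower), expand_partial_date(upper)
    above = day >= lo if include_lower else day > lo
    below = day <= hi if include_upper else day < hi
    return above and below


# [2017 TO 2018] includes 2018-01-01 ...
print(in_range(date(2018, 1, 1), "2017", "2018"))                        # True
# ... while [2017 TO 2018} stops at 2017-12-31.
print(in_range(date(2018, 1, 1), "2017", "2018", include_upper=False))   # False
print(in_range(date(2017, 12, 31), "2017", "2018", include_upper=False)) # True
```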
Ranking/Sorting
By default all searches are sorted according to an internal ranking algorithm that scores each match against your query. In both the user interface and REST API, it's possible to sort the results by:
- Most recent
- Best match
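When using the REST API, the query string has to be URL-encoded, because characters such as +, : and " have a special meaning in URLs. The sketch below shows one way to do that in Python. The host name, the /api/records path, the q and sort parameter names and the sort values newest and bestmatch are all assumptions made for illustration; check the repository's API documentation for the actual values.

```python
from urllib.parse import urlencode

# Placeholder host (assumption); replace with the repository's real address.
BASE_URL = "https://repository.example.org"


def build_search_url(query: str, sort: str = "bestmatch") -> str:
    """Build a search URL with a safely encoded query string.

    The endpoint path and parameter names are assumptions, not taken
    from this guide.
    """
    params = {"q": query, "sort": sort}
    return f"{BASE_URL}/api/records?{urlencode(params)}"


print(build_search_url('+metadata.title:"open science"', sort="newest"))
```

Encoding matters most for the + operator: left unencoded, a + in a URL query is decoded as a space, which would silently turn a required term into an optional one.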
Regular expressions
Regular expressions are a powerful pattern-matching language that allows you to search for specific patterns in a field. For instance, to find all records with a subject identifier consisting of 03yrm5c2 followed by one character from a bracketed set, we could use a regular expression search:
Example:
metadata.subjects.identifier:/03yrm5c2[1,6]/
Careful, the regular expression must match the entire field value. See the regular expression syntax for further details.
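The whole-value rule can be illustrated with Python's re.fullmatch, which also requires the pattern to cover the entire string. Python's regular expression engine is not the one used by the repository, so this is only an approximation; for a simple pattern like the one in the example, the two are assumed to behave the same way. The sample values are made up.

```python
import re

# Pattern taken from the example above. Note that inside [...] the comma
# is a literal character, so [1,6] matches "1", "," or "6".
pattern = re.compile(r"03yrm5c2[1,6]")

samples = ["03yrm5c21", "03yrm5c26", "03yrm5c2,", "03yrm5c216", "x03yrm5c21"]
for value in samples:
    # fullmatch succeeds only if the pattern matches the whole value.
    print(value, bool(pattern.fullmatch(value)))
```

Only the first three values match. The last two contain the pattern but have extra characters, so a whole-value match fails; to allow them, the pattern would need explicit wildcards such as .* at the start or end.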
Missing values
It is possible to search for records that either are missing a value or have a value in a specific field using the _exists_ and _missing_ field names.
Example:
_missing_:metadata.additional_titles
(all records without metadata.additional_titles)
Example:
_exists_:metadata.creators
(all records with metadata.creators)
Advanced concepts
Boosting
You can use the boost operator ^ when one term is more relevant than another. For instance, you can search for all records with the phrase open science in either the title or description field, but rank records with the phrase in the title field higher:
Example:
metadata.title:"open science"^5 metadata.description:"open science"
Fuzziness
You can search for terms similar to but not exactly like your search term using the fuzzy operator ~.
Example:
oepn~
Results will match records with terms similar to oepn, which would e.g. also match open.
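Similarity in fuzzy searches of this kind is commonly measured as an edit distance: the number of single-character insertions, deletions, substitutions or swaps of adjacent characters needed to turn one term into another. The sketch below assumes the repository follows that convention and computes the optimal string alignment variant of the Damerau-Levenshtein distance; the function name osa_distance is hypothetical and the code is not the search engine's own implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as a single edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(osa_distance("oepn", "open"))   # 1: the letters "e" and "p" are swapped
print(osa_distance("open", "opens"))  # 1: one inserted character
```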
Proximity searches
A phrase search like "open science" by default expects all terms in exactly the same order, and thus, for instance, would not match a record containing the phrase "open access and science". A proximity search allows the terms to appear in a different order and to have other terms in between. The degree of flexibility is specified by an integer after the phrase:
Example:
"open science"~5
Wildcards
You can use wildcards in search terms to replace a single character (using the ? operator) or zero or more characters (using the * operator).
Example:
ope? scien*
Wildcard searches can be slow and should normally be avoided if possible.
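The two wildcard operators can be tried out locally with Python's fnmatch module, which gives ? and * the same meaning for plain terms. This is only a sketch for building intuition: it does not query the repository, the helper name matching_terms is hypothetical, and the term list is made up.

```python
from fnmatch import fnmatchcase


def matching_terms(pattern: str, terms: list[str]) -> list[str]:
    """Return the terms that match a ?/* wildcard pattern (case-sensitive)."""
    return [t for t in terms if fnmatchcase(t, pattern)]


terms = ["open", "opens", "oped", "science", "scientific", "scien", "scene"]

print(matching_terms("ope?", terms))    # ['open', 'oped']
print(matching_terms("scien*", terms))  # ['science', 'scientific', 'scien']
```

Note that ? requires exactly one character, so ope? does not match opens, while * also matches zero characters, so scien* matches scien itself.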