Contents Menu Expand Light mode Dark mode Auto light/dark, in light mode Auto light/dark, in dark mode Skip to content
GingerDJ 6.1.1 documentation
GingerDJ 6.1.1 documentation
  • GingerDJ documentation
  • Getting started
    • GingerDJ at a glance
    • Quick install guide
    • Writing your first GingerDJ app, part 1
    • Writing your first GingerDJ app, part 2
    • Writing your first GingerDJ app, part 3
    • Writing your first GingerDJ app, part 4
    • Writing your first GingerDJ app, part 5
    • Writing your first GingerDJ app, part 6
    • Writing your first GingerDJ app, part 7
    • Writing your first GingerDJ app, part 8
    • Advanced tutorial: How to write reusable apps
    • What to read next
  • Using GingerDJ
    • How to install GingerDJ
    • Models and databases
      • Models
      • Making queries
      • Aggregation
      • Search
      • Managers
      • Performing raw SQL queries
      • Database transactions
      • Multiple databases
      • Tablespaces
      • Database access optimization
      • Database instrumentation
      • Fixtures
      • Examples of model relationship API usage
        • Many-to-many relationships
        • Many-to-one relationships
        • One-to-one relationships
    • Handling HTTP requests
      • URL dispatcher
      • Writing views
      • View decorators
      • File Uploads
      • GingerDJ shortcut functions
      • Generic views
      • Middleware
      • How to use sessions
    • Working with forms
      • Formsets
      • Creating forms from models
      • Form Assets (the Media class)
    • Templates
    • Class-based views
      • Introduction to class-based views
      • Built-in class-based generic views
      • Form handling with class-based views
      • Using mixins with class-based views
    • Migrations
    • Managing files
    • Testing in GingerDJ
      • Writing and running tests
      • Testing tools
      • Advanced testing topics
    • GingerDJ’s cache framework
    • Conditional View Processing
    • Cryptographic signing
    • Sending email
    • Internationalization and localization
      • Translation
      • Format localization
      • Time zones
    • Logging
    • Pagination
    • Security in GingerDJ
    • Performance and optimization
    • Serializing GingerDJ objects
    • GingerDJ settings
    • Signals
    • System check framework
    • External packages
    • Asynchronous support
  • “How-to” guides
    • How to use GingerDJ’s CSRF protection
    • How to create custom gingerdj-admin commands
    • How to create custom model fields
    • How to write custom lookups
    • How to implement a custom template backend
    • How to create custom template tags and filters
    • How to write a custom storage class
    • How to deploy GingerDJ
      • How to deploy with WSGI
        • How to use GingerDJ with Gunicorn
        • How to use GingerDJ with uWSGI
        • How to use GingerDJ with Apache and mod_wsgi
      • How to deploy with ASGI
        • How to use GingerDJ with Daphne
        • How to use GingerDJ with Hypercorn
        • How to use GingerDJ with Uvicorn
      • Deployment checklist
    • How to upgrade GingerDJ to a newer version
    • How to manage error reporting
    • How to provide initial data for models
    • How to integrate GingerDJ with a legacy database
    • How to configure and use logging
    • How to create CSV output
    • How to create PDF files
    • How to override templates
    • How to manage static files (e.g. images, JavaScript, CSS)
    • How to deploy static files
    • How to install GingerDJ on Windows
    • How to create database migrations
    • How to delete a GingerDJ application
  • GingerDJ FAQ
    • FAQ: General
    • FAQ: Installation
    • FAQ: Using GingerDJ
    • FAQ: Databases and models
    • FAQ: The admin
    • Troubleshooting
  • API Reference
    • Applications
    • System check framework
    • Built-in class-based views API
      • Base views
      • Generic display views
      • Generic editing views
      • Generic date views
      • Class-based views mixins
        • Simple mixins
        • Single object mixins
        • Multiple object mixins
        • Editing mixins
        • Date-based mixins
      • Class-based generic views - flattened index
    • Clickjacking Protection
    • contrib packages
      • The GingerDJ admin site
        • Admin actions
        • ModelAdmin List Filters
        • The GingerDJ admin documentation generator
        • JavaScript customizations in the admin
      • The flatpages app
      • GeoGinger
        • GeoGinger Tutorial
        • GeoGinger Installation
          • Installing Geospatial libraries
          • Installing PostGIS
          • Installing SpatiaLite
        • GeoGinger Model API
        • GeoGinger Database API
        • GeoGinger Forms API
        • GIS QuerySet API Reference
        • Geographic Database Functions
        • Measurement Objects
        • GEOS API
        • GDAL API
        • Geolocation with GeoIP2
        • GeoGinger Utilities
          • LayerMapping data import utility
          • OGR Inspection
          • GeoJSON Serializer
        • GeoGinger Management Commands
        • GeoGinger’s admin site
        • Geographic Feeds
        • Geographic Sitemaps
        • Testing GeoGinger apps
        • Deploying GeoGinger
      • gingerdj.contrib.humanize
      • The messages framework
      • gingerdj.contrib.postgres
        • PostgreSQL specific aggregation functions
        • PostgreSQL specific database constraints
        • PostgreSQL specific query expressions
        • PostgreSQL specific model fields
        • PostgreSQL specific form fields and widgets
        • PostgreSQL specific database functions
        • PostgreSQL specific model indexes
        • PostgreSQL specific lookups
        • Database migration operations
        • Full text search
        • Validators
      • The redirects app
      • The sitemap framework
      • The “sites” framework
      • The staticfiles app
      • The syndication feed framework
    • Cross Site Request Forgery protection
    • Databases
    • gingerdj-admin and manage.py
    • Running management commands from your code
    • GingerDJ Exceptions
    • File handling
      • The File object
      • File storage API
      • Uploaded Files and Upload Handlers
    • Forms
      • The Forms API
      • Form fields
      • Model Form Functions
      • Formset Functions
      • The form rendering API
      • Widgets
      • Form and field validation
    • Logging
    • Middleware
    • Migration Operations
    • Models
      • Model field reference
      • Field attribute reference
      • Model index reference
      • Constraints reference
      • Model _meta API
      • Related objects reference
      • Model class reference
      • Model Meta options
      • Model instance reference
      • QuerySet API reference
      • Lookup API reference
      • Query Expressions
      • Conditional Expressions
      • Database Functions
    • Paginator
    • Request and response objects
    • SchemaEditor
    • Settings
    • Signals
    • Templates
      • The GingerDJ template language
      • Built-in template tags and filters
      • The GingerDJ template language: for Python programmers
    • TemplateResponse and SimpleTemplateResponse
    • Unicode data
    • gingerdj.urls utility functions
    • gingerdj.urls functions for use in URLconfs
    • gingerdj.conf.urls functions for use in URLconfs
    • GingerDJ Utils
    • Validators
    • Built-in Views
  • Meta-documentation and miscellany
    • API stability
    • Design philosophies
  • Glossary
Back to top
View this page

Full text search¶

The database functions in the gingerdj.contrib.postgres.search module ease the use of PostgreSQL’s full text search engine.

For the examples in this document, we’ll use the models defined in Making queries.

See also

For a high-level overview of searching, see the topic documentation.

The search lookup¶

A common way to use full text search is to search a single term against a single column in the database. For example:

>>> Entry.objects.filter(body_text__search="Cheese")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

This creates a to_tsvector in the database from the body_text field and a plainto_tsquery from the search term 'Cheese', both using the default database search configuration. The results are obtained by matching the query and the vector.

To use the search lookup, 'gingerdj.contrib.postgres' must be in your INSTALLED_APPS.

SearchVector¶

class SearchVector(*expressions, config=None, weight=None)¶

Searching against a single field is great but rather limiting. The Entry instances we’re searching belong to a Blog, which has a tagline field. To query against both fields, use a SearchVector:

>>> from gingerdj.contrib.postgres.search import SearchVector
>>> Entry.objects.annotate(
...     search=SearchVector("body_text", "blog__tagline"),
... ).filter(search="Cheese")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

The arguments to SearchVector can be any Expression or the name of a field. Multiple arguments will be concatenated together using a space so that the search document includes them all.

SearchVector objects can be combined together, allowing you to reuse them. For example:

>>> Entry.objects.annotate(
...     search=SearchVector("body_text") + SearchVector("blog__tagline"),
... ).filter(search="Cheese")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza Recipes>]

See Changing the search configuration and Weighting queries for an explanation of the config and weight parameters.

SearchQuery¶

class SearchQuery(value, config=None, search_type='plain')¶

SearchQuery translates the terms the user provides into a search query object that the database compares to a search vector. By default, all the words the user provides are passed through the stemming algorithms, and then it looks for matches for all of the resulting terms.

If search_type is 'plain', which is the default, the terms are treated as separate keywords. If search_type is 'phrase', the terms are treated as a single phrase. If search_type is 'raw', then you can provide a formatted search query with terms and operators. If search_type is 'websearch', then you can provide a formatted search query, similar to the one used by web search engines. 'websearch' requires PostgreSQL ≥ 11. Read PostgreSQL’s Full Text Search docs to learn about differences and syntax. Examples:

>>> from gingerdj.contrib.postgres.search import SearchQuery
>>> SearchQuery("red tomato")  # two keywords
>>> SearchQuery("tomato red")  # same results as above
>>> SearchQuery("red tomato", search_type="phrase")  # a phrase
>>> SearchQuery("tomato red", search_type="phrase")  # a different phrase
>>> SearchQuery("'tomato' & ('red' | 'green')", search_type="raw")  # boolean operators
>>> SearchQuery(
...     "'tomato' ('red' OR 'green')", search_type="websearch"
... )  # websearch operators

SearchQuery terms can be combined logically to provide more flexibility:

>>> from gingerdj.contrib.postgres.search import SearchQuery
>>> SearchQuery("meat") & SearchQuery("cheese")  # AND
>>> SearchQuery("meat") | SearchQuery("cheese")  # OR
>>> ~SearchQuery("meat")  # NOT

See Changing the search configuration for an explanation of the config parameter.

SearchRank¶

class SearchRank(vector, query, weights=None, normalization=None, cover_density=False)¶

So far, we’ve returned the results for which any match between the vector and the query are possible. It’s likely you may wish to order the results by some sort of relevancy. PostgreSQL provides a ranking function which takes into account how often the query terms appear in the document, how close together the terms are in the document, and how important the part of the document is where they occur. The better the match, the higher the value of the rank. To order by relevancy:

>>> from gingerdj.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector("body_text")
>>> query = SearchQuery("cheese")
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).order_by("-rank")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

See Weighting queries for an explanation of the weights parameter.

Set the cover_density parameter to True to enable the cover density ranking, which means that the proximity of matching query terms is taken into account.

Provide an integer to the normalization parameter to control rank normalization. This integer is a bit mask, so you can combine multiple behaviors:

>>> from gingerdj.db.models import Value
>>> Entry.objects.annotate(
...     rank=SearchRank(
...         vector,
...         query,
...         normalization=Value(2).bitor(Value(4)),
...     )
... )

The PostgreSQL documentation has more details about different rank normalization options.

SearchHeadline¶

class SearchHeadline(expression, query, config=None, start_sel=None, stop_sel=None, max_words=None, min_words=None, short_word=None, highlight_all=None, max_fragments=None, fragment_delimiter=None)¶

Accepts a single text field or an expression, a query, a config, and a set of options. Returns highlighted search results.

Set the start_sel and stop_sel parameters to the string values to be used to wrap highlighted query terms in the document. PostgreSQL’s defaults are <b> and </b>.

Provide integer values to the max_words and min_words parameters to determine the longest and shortest headlines. PostgreSQL’s defaults are 35 and 15.

Provide an integer value to the short_word parameter to discard words of this length or less in each headline. PostgreSQL’s default is 3.

Set the highlight_all parameter to True to use the whole document in place of a fragment and ignore max_words, min_words, and short_word parameters. That’s disabled by default in PostgreSQL.

Provide a non-zero integer value to the max_fragments to set the maximum number of fragments to display. That’s disabled by default in PostgreSQL.

Set the fragment_delimiter string parameter to configure the delimiter between fragments. PostgreSQL’s default is " ... ".

The PostgreSQL documentation has more details on highlighting search results.

Usage example:

>>> from gingerdj.contrib.postgres.search import SearchHeadline, SearchQuery
>>> query = SearchQuery("red tomato")
>>> entry = Entry.objects.annotate(
...     headline=SearchHeadline(
...         "body_text",
...         query,
...         start_sel="<span>",
...         stop_sel="</span>",
...     ),
... ).get()
>>> print(entry.headline)
Sandwich with <span>tomato</span> and <span>red</span> cheese.

See Changing the search configuration for an explanation of the config parameter.

Changing the search configuration¶

You can specify the config attribute to a SearchVector and SearchQuery to use a different search configuration. This allows using different language parsers and dictionaries as defined by the database:

>>> from gingerdj.contrib.postgres.search import SearchQuery, SearchVector
>>> Entry.objects.annotate(
...     search=SearchVector("body_text", config="french"),
... ).filter(search=SearchQuery("œuf", config="french"))
[<Entry: Pain perdu>]

The value of config could also be stored in another column:

>>> from gingerdj.db.models import F
>>> Entry.objects.annotate(
...     search=SearchVector("body_text", config=F("blog__language")),
... ).filter(search=SearchQuery("œuf", config=F("blog__language")))
[<Entry: Pain perdu>]

Weighting queries¶

Every field may not have the same relevance in a query, so you can set weights of various vectors before you combine them:

>>> from gingerdj.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
>>> vector = SearchVector("body_text", weight="A") + SearchVector(
...     "blog__tagline", weight="B"
... )
>>> query = SearchQuery("cheese")
>>> Entry.objects.annotate(rank=SearchRank(vector, query)).filter(rank__gte=0.3).order_by(
...     "rank"
... )

The weight should be one of the following letters: D, C, B, A. By default, these weights refer to the numbers 0.1, 0.2, 0.4, and 1.0, respectively. If you wish to weight them differently, pass a list of four floats to SearchRank as weights in the same order above:

>>> rank = SearchRank(vector, query, weights=[0.2, 0.4, 0.6, 0.8])
>>> Entry.objects.annotate(rank=rank).filter(rank__gte=0.3).order_by("-rank")

Performance¶

Special database configuration isn’t necessary to use any of these functions, however, if you’re searching more than a few hundred records, you’re likely to run into performance problems. Full text search is a more intensive process than comparing the size of an integer, for example.

In the event that all the fields you’re querying on are contained within one particular model, you can create a functional GIN or GiST index which matches the search vector you wish to use. For example:

GinIndex(
    SearchVector("body_text", "headline", config="english"),
    name="search_vector_idx",
)

The PostgreSQL documentation has details on creating indexes for full text search.

SearchVectorField¶

class SearchVectorField¶

If this approach becomes too slow, you can add a SearchVectorField to your model. You’ll need to keep it populated with triggers, for example, as described in the PostgreSQL documentation. You can then query the field as if it were an annotated SearchVector:

>>> Entry.objects.update(search_vector=SearchVector("body_text"))
>>> Entry.objects.filter(search_vector="cheese")
[<Entry: Cheese on Toast recipes>, <Entry: Pizza recipes>]

Trigram similarity¶

Another approach to searching is trigram similarity. A trigram is a group of three consecutive characters. In addition to the trigram_similar, trigram_word_similar, and trigram_strict_word_similar lookups, you can use a couple of other expressions.

To use them, you need to activate the pg_trgm extension on PostgreSQL. You can install it using the TrigramExtension migration operation.

TrigramSimilarity¶

class TrigramSimilarity(expression, string, **extra)¶

Accepts a field name or expression, and a string or expression. Returns the trigram similarity between the two arguments.

Usage example:

>>> from gingerdj.contrib.postgres.search import TrigramSimilarity
>>> Author.objects.create(name="Katy Stevens")
>>> Author.objects.create(name="Stephen Keats")
>>> test = "Katie Stephens"
>>> Author.objects.annotate(
...     similarity=TrigramSimilarity("name", test),
... ).filter(
...     similarity__gt=0.3
... ).order_by("-similarity")
[<Author: Katy Stevens>, <Author: Stephen Keats>]

TrigramWordSimilarity¶

class TrigramWordSimilarity(string, expression, **extra)¶

Accepts a string or expression, and a field name or expression. Returns the trigram word similarity between the two arguments.

Usage example:

>>> from gingerdj.contrib.postgres.search import TrigramWordSimilarity
>>> Author.objects.create(name="Katy Stevens")
>>> Author.objects.create(name="Stephen Keats")
>>> test = "Kat"
>>> Author.objects.annotate(
...     similarity=TrigramWordSimilarity(test, "name"),
... ).filter(
...     similarity__gt=0.3
... ).order_by("-similarity")
[<Author: Katy Stevens>]

TrigramStrictWordSimilarity¶

class TrigramStrictWordSimilarity(string, expression, **extra)¶

Accepts a string or expression, and a field name or expression. Returns the trigram strict word similarity between the two arguments. Similar to TrigramWordSimilarity(), except that it forces extent boundaries to match word boundaries.

TrigramDistance¶

class TrigramDistance(expression, string, **extra)¶

Accepts a field name or expression, and a string or expression. Returns the trigram distance between the two arguments.

Usage example:

>>> from gingerdj.contrib.postgres.search import TrigramDistance
>>> Author.objects.create(name="Katy Stevens")
>>> Author.objects.create(name="Stephen Keats")
>>> test = "Katie Stephens"
>>> Author.objects.annotate(
...     distance=TrigramDistance("name", test),
... ).filter(
...     distance__lte=0.7
... ).order_by("distance")
[<Author: Katy Stevens>, <Author: Stephen Keats>]

TrigramWordDistance¶

class TrigramWordDistance(string, expression, **extra)¶

Accepts a string or expression, and a field name or expression. Returns the trigram word distance between the two arguments.

Usage example:

>>> from gingerdj.contrib.postgres.search import TrigramWordDistance
>>> Author.objects.create(name="Katy Stevens")
>>> Author.objects.create(name="Stephen Keats")
>>> test = "Kat"
>>> Author.objects.annotate(
...     distance=TrigramWordDistance(test, "name"),
... ).filter(
...     distance__lte=0.7
... ).order_by("distance")
[<Author: Katy Stevens>]

TrigramStrictWordDistance¶

class TrigramStrictWordDistance(string, expression, **extra)¶

Accepts a string or expression, and a field name or expression. Returns the trigram strict word distance between the two arguments.

Next
Validators
Previous
Database migration operations
Copyright © Ginger Society and contributors
Made with Sphinx and @pradyunsg's Furo
Last updated on Dec 14, 2024
On this page
  • Full text search
    • The search lookup
    • SearchVector
    • SearchQuery
    • SearchRank
    • SearchHeadline
    • Changing the search configuration
    • Weighting queries
    • Performance
      • SearchVectorField
    • Trigram similarity
      • TrigramSimilarity
      • TrigramWordSimilarity
      • TrigramStrictWordSimilarity
      • TrigramDistance
      • TrigramWordDistance
      • TrigramStrictWordDistance