Cool Solution - OpenSearchServer

From Univention Wiki

UCS Version 4.1

Note: Cool Solutions are articles documenting additional functionality based on Univention products. Not all of the steps shown in this article are covered by Univention Support. If you have questions about your support coverage, contact your contact person at Univention before implementing any of the steps shown.

Please also note the legal notes in the Terms of Service.

OpenSearchServer (OSS) is an open-source enterprise search engine software.

The version of OSS used in this article is v1.5.13.

Prerequisites

OSS requires OpenJDK 7:

univention-install openjdk-7-jdk
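To confirm that the installed Java runtime is version 7, the output of java -version can be inspected. The following is a minimal sketch, not an official check: the function name is_java7 is made up for illustration, and it assumes the 1.7.x version string that Java 7 runtimes print.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given "java -version" output
# reports a 1.7.x runtime (the version string used by Java 7).
is_java7() {
    printf '%s\n' "$1" | grep -q 'version "1\.7\.'
}

# Usage on the UCS system (java -version writes to stderr):
#   is_java7 "$(java -version 2>&1)" && echo "Java 7 found"
```

The version string is passed in as an argument so that the check itself does not depend on Java being in the PATH.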

Installation

Go to opensearchserver.com and download the .deb package.

Install it via dpkg:

dpkg -i opensearchserver-X.X.X-bXXX.deb

Open port 9090 to access the OSS backend and search pages:

ucr set \
security/packetfilter/package/univention-apache/tcp/9090/all/en=HTTP \
security/packetfilter/package/univention-apache/tcp/9090/all=ACCEPT

Restart univention-firewall to apply changes:

service univention-firewall restart
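Whether the port is reachable after the firewall restart can be tested from another machine. The sketch below assumes that curl is installed on that machine and that the backend answers via plain HTTP; the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: builds the backend URL for a host
# (port 9090, as opened in the firewall above).
oss_backend_url() {
    printf 'http://%s:9090/\n' "$1"
}

# Hypothetical helper: prints the HTTP status code returned by the backend,
# or 000 if no connection could be established within 5 seconds.
oss_backend_status() {
    curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' "$(oss_backend_url "$1")"
}

# Usage from a client machine (replace the address with your server):
#   oss_backend_status 10.200.14.30
```

A result of 000 suggests that the port is still blocked or that OSS is not running.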


Setting up a database crawler

This section explains how to set up a database crawler, using a MediaWiki MySQL database as an example. To learn how to set up a web or JSON API crawler, refer to the OSS documentation.

Open the OSS backend at <HOSTNAME OR IP>:9090 and create a new index with the template "Empty index".

Enter an index name and click "Create". The index is opened afterwards.

Creating the analyzer

Analyzers transform crawled and queried data. Using "tokenizers", they split data into tokens, which are then used to find matching search results. Filters transform the data further. For example, the "RegularExpressionReplaceFilter" looks for a Regex pattern and replaces each match with a given string while crawling.

The "TextAnalyzer" would split this sentence

Planet of the apes

into

planet, of, the, ape
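The effect of tokenizing can be imitated with standard shell tools. The following is only a crude approximation that reproduces the example above (lowercasing, splitting on whitespace, stripping a trailing plural "s"); it is not the algorithm OSS uses, and the function name tokenize is made up.

```shell
#!/bin/sh
# Crude approximation of a text analyzer: lowercase the input, split it on
# whitespace and strip a trailing "s" from words of four or more letters.
# This is NOT how OSS works internally; it only mimics the example above.
tokenize() {
    printf '%s\n' "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | tr -s '[:space:]' '\n' \
        | sed -E 's/^(.{3,})s$/\1/'
}

# Prints one token per line: planet, of, the, ape
tokenize "Planet of the apes"
```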

You can define new analyzers to adapt the index to various situations. Crawled data can also be processed with Regex replacements in the crawler module, but only one rule per field can be applied there.

Creating the schema

Go to the "Schema"/"Fields" tab and create the fields you need for your data. These fields are where crawled data is stored, processed and returned in a query. You can assign an analyzer to each field. To fully understand the idea of analyzers, read the corresponding article in the official OSS documentation. Finally, set a unique field and a default field. The unique field identifies each document: when the crawler runs, documents that already exist in the index are updated based on this field rather than created again.

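For this MediaWiki example, the following fields may be set:

| Name | Indexed | Stored | TermVector | Analyzer | Copy of |
|---|---|---|---|---|---|
| title | Yes | Yes | positions_offset | TextAnalyzer | |
| text | Yes | Yes | No | TextAnalyzer | |
| lastEditedOn | Yes | Yes | No | | |
| id | Yes | Yes | No | | |
| url | Yes | Yes | No | | |
| length | Yes | Yes | No | | |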
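The role of the unique field can be pictured as an "update or insert" keyed by that field. The following conceptual sketch uses awk on made-up, tab-separated sample records; it only illustrates the idea and has nothing to do with how OSS stores documents.

```shell
#!/bin/sh
# Conceptual sketch only: emulates how a unique field ("id" here) makes a
# second crawl update existing documents instead of adding duplicates.
# Input: one document per line, "<id><TAB><title>"; later lines win.
upsert_by_id() {
    awk -F '\t' '{ doc[$1] = $2 } END { for (id in doc) print id "\t" doc[id] }' | sort
}

# The first crawl delivers documents 1 and 2; the second crawl delivers
# document 2 again with a changed title. Two documents remain afterwards.
printf '1\tMain Page\n2\tInstall guide\n2\tInstallation guide\n' | upsert_by_id
```

Without a unique field, the third record would end up as an additional document instead of replacing the second one.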

Configuring the crawler

This is the module that actually accesses the database. Go to the "Crawler"/"Database" tab and click "New crawl".

Configure the "General settings" tab as follows:

| Name | Value |
|---|---|
| Name | <NAME OF YOUR CHOICE> |
| Driver class | com.mysql.jdbc.Driver |
| JDBC URL | jdbc:mysql://<HOSTNAME OR IP>:<PORT IF ANY>/<DATABASE NAME> |
| Isolation level | TRANSACTION_SERIALIZABLE |
| User | <DB USER> |
| Password | <DB USER PASSWORD> |
| Language | Undefined |
| Fetch size | 50 |
| Buffer size | 100 |
| Sleep (ms) | 0 |
| Primary key | |
| Unique key (for deletion) | |
| SQL Select | SELECT page.page_title, page.page_touched, page.page_id, page.page_len, searchindex.si_text FROM mediawiki.page INNER JOIN mediawiki.searchindex ON searchindex.si_page=page.page_id; |
| SQL Update | |

Click the "Check" button to verify that your SQL query works. The small pop-up shows either an error or the columns it found.
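The statement can also be tried outside of OSS with the mysql command line client before it is entered in the crawler. This is a sketch under assumptions: the function names are made up, the LIMIT clause is an addition for testing only, and the mysql client must be installed on the machine where it is run.

```shell
#!/bin/sh
# Prints the SELECT statement of the crawler, limited to a few rows.
# The LIMIT clause is an addition for testing, not part of the crawler setup.
mediawiki_select() {
    cat <<EOF
SELECT page.page_title, page.page_touched, page.page_id, page.page_len, searchindex.si_text
FROM mediawiki.page INNER JOIN mediawiki.searchindex ON searchindex.si_page=page.page_id
LIMIT ${1:-5};
EOF
}

# Hypothetical wrapper: runs the statement with the mysql client.
# Arguments: host, database user. The password is prompted for (-p).
check_crawler_query() {
    mysql -h "$1" -u "$2" -p -e "$(mediawiki_select 5)"
}

# Usage: check_crawler_query <HOSTNAME OR IP> <DB USER>
```

If the client lists the five columns with a few rows, the same user and statement should also work in the crawler configuration.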

Next, configure the "Field Map" of the crawler to map SQL columns to index fields. For the MediaWiki example, the following mapping can be applied:

| SQL column | Index field | Tags | Find Reg.Exp. | Replace Reg.Exp. | Entities | File | File crawler | URL |
|---|---|---|---|---|---|---|---|---|
| page_id | id | | | | | | | |
| page_len | length | | | | | | | |
| page_title | title | | | | | | | |
| page_title | url | | (.+) | http:\/\/<HOSTNAME OR IP with "\" before all dots (e.g. 10\.200\.14\.30)>\/<MediaWiki FOLDER>\/index\.php\/$1 | | | | |
| page_touched | lastEditedOn | | ([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2}) | $4:$5:$6 $2.$3.$1 | | | | |
| si_text | text | | (.{250}) | $1 | | | | |

Click "Save".
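The two more complex replacement rules of the field map can be tried out on the command line before running the crawler. The sketch below re-creates them with sed, which writes group references as \1 instead of $1. The function names, the sample timestamp and the base URL are made up for illustration.

```shell
#!/bin/sh
# Reorders a MediaWiki timestamp (YYYYMMDDhhmmss) in the same way as the
# page_touched rule of the field map: groups 4:5:6, then 2.3.1.
format_touched() {
    printf '%s\n' "$1" \
        | sed -E 's/([0-9]{4})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})([0-9]{2})/\4:\5:\6 \2.\3.\1/'
}

# Builds the article URL from a page title, as the page_title -> url rule
# does. The base URL is a made-up example value.
title_to_url() {
    printf '%s\n' "$1" | sed -E 's#(.+)#http://10.200.14.30/mediawiki/index.php/\1#'
}

format_touched 20160314153045   # prints: 15:30:45 03.14.2016
title_to_url Main_Page          # prints: http://10.200.14.30/mediawiki/index.php/Main_Page
```

Note that the rule as given prints the month before the day, because group 2 holds the month and group 3 the day.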

In the "Crawl list" tab, start the crawler by clicking the small green tick button. The "Status" of the crawler shows what it is currently doing and whether it encountered an error; hover over the status to get a tooltip. Note that the schema's "Unique field" must have a mapped value. Run the crawl job now to check that it works and to have data for trying the query created in the next step.

Creating a query

Go to "Query" and create a new query of the type "Search (field)" with a name of your choice. In the "Query"/"Query" tab, apply the following values:

| Name | Value |
|---|---|
| Enter the query | |
| Empty query string returns all documents | (tick) |
| Default operator | OR |
| Start offset | 0 |
| Number of rows | 10 |
| Phrase stop | 10 |
| Language | If you have just one language in the wiki, set it here, otherwise leave it empty or set "Undefined". |

In the "Query"/"Searched fields" tab, add the following fields:

| Field | Mode | Term boost | Phrase boost |
|---|---|---|---|
| title | Pattern | 1 | 1 |
| text | Pattern | 1 | 1 |
| lastEditedOn | Pattern | 1 | 1 |
| length | Pattern | 1 | 1 |
| id | Pattern | 1 | 1 |

In the "Returned fields" tab, add all fields that a search query should return. In this example, add all fields.

To execute the query, switch to the main tab below "Query" and click "Search". Because the query string is empty, this returns everything in the index.

Configuring the autocompletion

Go to "Schema"/"Auto-completion" to create an auto-completion item. Set a name, the field from which the auto-completed terms are derived, the number of suggestions and the search request to execute. Click "Create" and then "Build" to populate the auto-completion.

Creating the renderer

Go to the "Renderer" tab, click "New renderer..." and apply the following settings in the main tab:

| Name | Value |
|---|---|
| Renderer name | <NAME OF YOUR CHOICE> |
| Search button label | <LABEL OF YOUR CHOICE> |
| One result text | 1 result found |
| Filters panel title text | Active filters |
| Logout text | Logout |
| Facet width | 200px |
| Request name | <NAME OF YOUR QUERY> |
| No result text | No results found |
| Many results text | results found |
| Clear all filters text | Clear all |
| HTML template | Bootstrap |
| Autocompletion | <NAME OF AUTO-COMPLETION ITEM> |

Go to "Renderer"/"Fields" and add the following fields:

| Field | URL Field | CSS class |
|---|---|---|
| FIELD: title | url | ossfieldrdr1 |
| FIELD: text | | ossfieldrdr2 |
| FIELD: lastEditedOn | | ossfieldrdr5 |
| FIELD: id | | ossfieldrdr7 |
| FIELD: length | | ossfieldrdr6 |


Go to "Renderer"/"CSS style" to apply a simple CSS style to the search page.

Sample CSS for a MediaWiki crawler

This CSS stylesheet gives your renderer a simple design. It is based on a code snippet for the database crawler example in the official OSS documentation.

body { font-family: Arial, sans-serif; background:#efedea; color:rgb(124, 112, 94); }
#oss-wrap { width:1000px; margin:20px auto 0 auto; font-size:14px;  line-height: 24px;}

.oss-one-result { background:white; padding:20px; margin:0 0 10px 0; border:1px solid rgb(228, 225, 220);}

/* Search box */
.ossinputrdr { height:30px; padding:3px; font-size:1em;}
/* Search button */
.ossbuttonrdr { height:40px; }
/* Num found */
.ossnumfound { margin:10px 0 0 0; }

/* Title of article */
.ossfieldrdr1 { text-align:center; color:#bd5532; font-family: Tahoma; font-size:2.1em; margin:10px 0 20px 0;}
/* Content */
.ossfieldrdr2 {}

/* Date of last edit */
.ossfieldrdr5:before {content: ', Last edited at ';}
.ossfieldrdr5 { color:#ada393; font-style:italic; display:inline-block;}

/* For debugging purposes: article length and article ID */
.ossfieldrdr6:before {content:'Characters: ';}
.ossfieldrdr7:before {content:'ID: ';}
.ossfieldrdr6 { border-top:1px solid #CDCCC9; margin-top:20px;}
.ossfieldrdr6, .ossfieldrdr7 { font-size:0.9em; color:#CDCCC9;}

/* Facets */
.oss-facet h3 { color:#6f9d9f; font-family:Tahoma;}
.oss-facet a { color:#bd5532; }
.oss-facet ul { padding-left:10px;}

/* Autocomplete */
#ossautocompletelist{ background:white; padding:0px; border:1px solid rgb(228, 225, 220);}
.ossautocomplete_link , .ossautocomplete_link_over { cursor:pointer; display:block; padding:5px; }
.ossautocomplete_link_over { background:#efedea;}


Click "Save & Close". You can view the renderer with a click on "View" beneath the renderer. Having executed the crawler and set up the query successfully, the renderer should now be fully operative.

Add OSS to UCS web services

Copy the OSS logo to the web server directory:

cp /var/lib/opensearchserver/server/ROOT/images/oss_logo.png /var/www

Backend

ucr set \
ucs/web/overview/entries/service/oss-backend/description="OpenSearchServer configuration backend" \
ucs/web/overview/entries/service/oss-backend/icon="/oss_logo.png" \
ucs/web/overview/entries/service/oss-backend/label="OpenSearchServer backend" \
ucs/web/overview/entries/service/oss-backend/link=":9090"

Search renderer page

ucr set \
ucs/web/overview/entries/service/oss-<INDEX NAME>-<RENDERER NAME>/description="Intranet search engine page" \
ucs/web/overview/entries/service/oss-<INDEX NAME>-<RENDERER NAME>/icon="/oss_logo.png" \
ucs/web/overview/entries/service/oss-<INDEX NAME>-<RENDERER NAME>/label="Intranet search" \
ucs/web/overview/entries/service/oss-<INDEX NAME>-<RENDERER NAME>/link=":9090/renderer?use=<INDEX NAME>&name=<RENDERER NAME>"

Hint: Replace <INDEX NAME> and <RENDERER NAME> with the names you gave the index and the renderer earlier.
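If several renderers are to be published, the placeholders can be filled in by a small helper. The following is a minimal sketch using the same UCR variables as above; the function names and the sample names "wiki" and "wikisearch" are made up, and names containing spaces or special characters would additionally need URL encoding.

```shell
#!/bin/sh
# Hypothetical helper: builds the link value for a renderer entry.
# Arguments: index name, renderer name.
renderer_link() {
    printf ':9090/renderer?use=%s&name=%s\n' "$1" "$2"
}

# Hypothetical helper: registers the search page in the UCS web services
# overview. Arguments: index name, renderer name.
register_renderer() {
    base="ucs/web/overview/entries/service/oss-$1-$2"
    ucr set \
        "$base/description=Intranet search engine page" \
        "$base/icon=/oss_logo.png" \
        "$base/label=Intranet search" \
        "$base/link=$(renderer_link "$1" "$2")"
}

# Usage on the UCS system, with made-up names:
#   register_renderer wiki wikisearch
```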

Further information

Further information about other functionality and various optional features can be found in the OSS documentation.
