EPrints 2.2 Documentation - Configuring an Archive


EPrints Archive Configuration

This section describes all the configuration files in a single archive in the EPrints system.

Primary archive configuration file

Once you have created an EPrints archive, the information you entered is placed in an XML file in /opt/eprints2/archives/ with the name archiveid.xml. This file is documented later in this section.

Archive configuration directory

The bulk of the archive configuration is copied from /opt/eprints2/defaultcfg/ into the archive's own configuration directory (usually /opt/eprints2/archives/archiveid/cfg/). This directory will usually contain the following files and directories:

apache.conf
This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information.

apachevhost.conf (added v2.2)
This file is generated by generate_apacheconf. See the documentation of generate_apacheconf for more information.

ArchiveConfig.pm
The general configuration items which don't fit anywhere else are in this perl module. It is described fully later in this section of documentation. This module ``requires'' the other 5 perl modules. They are in separate files to make them easier to get to grips with.

ArchiveMetadataFieldsConfig.pm
This module configures the metadata fields and the default values.

ArchiveOAIConfig.pm
This module configures how the archive exports itself via the Open Archives protocol.

ArchiveRenderConfig.pm
This module contains subroutines which handle rendering the data into XHTML (mostly) for display as webpages.

ArchiveTextIndexingConfig.pm
This module handles turning UTF8 text strings into lists of index words for free text searches.

ArchiveValidateConfig.pm
This module contains subroutines which check the metadata for problems.

auto-apache.conf
This file is generated and overwritten by generate_apacheconf. Do not edit it directly. See the documentation of generate_apacheconf for more information.

citations-languageid.xml
One of these files for each languageid supported by this archive. These XML files describe how to turn metadata for an item into a citation (with markup). They are described fully later in this section of documentation.

entities-languageid.dtd
One of these files for each languageid supported by this archive. These DTD files are generated automatically just before EPrints loads the archive's configuration and should not be edited directly.

metadata-types.xml
This XML file describes the various types of eprints, users etc. and which metadata fields are required or relevant to each. It is described fully later in this section of documentation.

phrases-languageid.xml
One of these files for each languageid supported by this archive. These XML files contain all the phrases which are specific to this archive, such as the titles of metadata fields. They are described fully later in this section of documentation.

ruler.xml
This XML file just contains the horizontal divider used in webpages created by the system. It is described fully later in this section of documentation.

static/
This directory contains the data needed to create the static webpages such as the homepage, and about page. It is described fully later in this section of documentation.

subjects
This file contains the initial subjects for the system. It is described fully in the documentation for import_subjects.

template-languageid.xml
One of these files for each languageid supported by this archive. These XML/XHTML files describe the outline for webpages for this system. They are described fully later in this section of documentation.


XML Config Files in EPrints

This section contains some general information about the XML archive config files: template, phrases, ruler and citations. metadata-types.xml uses XML but these comments do not apply.

XHTML

These files use HTML elements (and other elements too). XHTML is a fairly new version of HTML which is backwards compatible with HTML 4 but written using XML, not SGML. This means that it is much stricter, but less ambiguous and easier to parse and modify. Assuming you know HTML, the main differences are as follows:

Tags must be closed
All elements must be closed, even ones such as <li>. Tags which do not have a close tag in HTML, like <br> or <img src="foo"> still must be closed eg. <img src="foo"></img> - this can be abbreviated to the neater looking: <img src="foo" />

All tags and attributes must be lower case
Self-explanatory.

Strict definition of what tags may appear within which others.
Not actually checked by EPrints. It will let any rubbish past as long as it's valid XML. But that's no reason to be naughty.

All attributes must be wrapped in quotes
In HTML the values of attributes do not have to be wrapped in quotes, but in XML (and therefore XHTML) they do.

All attributes must have a value
In HTML some attributes did not require a value, for example in <hr noshade> elements. In XHTML this is represented as <hr noshade="noshade" />

So in summary, the HTML:

 <img SRC=someurl>
 <hr NOSHADE WIDTH=2>
 <P>Foo bar</P>

should become in XHTML:

 <img src="someurl" />
 <hr noshade="noshade" width="2" />
 <p>Foo bar</p>

And that's more or less it. See http://www.w3c.org/ for a complete description.

Language specific files.

phrases, template and citations have one instance per supported language. This allows the system to generate pages and emails in more than one language. Supporting a new language requires translating all the English in the English config files currently shipped. If you do intend to do this (lots of work!) please get in touch with the eprints admin so that we can avoid duplicated effort.

Extra Entities

The XML files all use a DTD which defines a few extra entities. Entities are items in XML (or HTML) which start with ``&'' and end with ``;'' like &amp;. These additional entities come from the entities DTD file created by generate_entities. One DTD is created per language, although currently the only variation is the archive name.

&archivename;
The name of the archive in the current language.

&adminemail;
The administrator's email address.

&base_url;
The base URL of the system (without a trailing slash)

&perl_url;
The base URL of the CGI directory (without a trailing slash)

&frontpage;
The URL of the system homepage.

&userhome;
The URL of the user homepage.

&version;
The current EPrints version.

&ruler;
The XHTML of the standard divider.

Any XHTML character entity (since EPrints v2.1)
You may now use any XHTML character entity, eg. &nbsp; &eacute; &euro;.

User configured entities
You can generate your own entities by modifying the function which generates them in ArchiveConfig.pm

None of these entities are available in the citations file or the ruler file.

Name Spaces and XHTML

These files contain a mixture of custom tags and XHTML. To keep these distinct, the XML files contain a namespace definition in the first element. The practical upshot is that all of EPrints' own tags have the prefix ``ep:''. The namespace information is actually ignored by the current version of the eprints system.

example of mixed tags (and entities):

 <ep:phrase ref="lib/session:contact"><p>Feel free to contact
 <a href="mailto:&adminemail;">&archivename; administration</a>
 with details.</p></ep:phrase>
 eprints elements: phrase
 xhtml elements: p, a
 eprints entities: adminemail, archivename


The Primary Archive Configuration File

This XML file appears in the archives/ directory, usually /opt/eprints2/archives/. It describes the most basic details about the archive. It is generated (and modified) by configure_archive and will not normally need to be edited.

EPrints looks in this directory for XML files and attempts to load them all when starting the webserver.

This file should be chmod'd so that it cannot be read by random users, as it contains the database password.

The top level element is ``archive'' which has the attribute ``id'' which is the id of the archive. It should be the same as the filename. If this file is foo.xml then the id should be foo.

<archive> contains a list of XML tags enclosing some text. eg.

  <host>stoatprints.org</host>

The following tags are expected in no special order:

<host>
The hostname of this archive.

<alias redirect=``yes-or-no''>
This is optional and may be repeated. It has the attribute ``redirect'' which may be set to yes or no. This controls what virtual hosts are supported and if they should redirect to the main <host>.

<language>
The ISO id of a language supported by this archive. Repeatable. One of these should also be the defaultlanguage. See below.

<port>
The port number that the server is running on. Usually 80.

<urlpath>
The directory from the root of the server name. Usually /

<archiveroot>
The filesystem path of the rest of the archive configuration.

<configmodule>
The path to the perl module which does the main configuration (ArchiveConfig.pm)

<dbname>
The name of the MySQL database. Usually the same as the archive ID.

<dbhost>
The host on which MySQL is running. Usually localhost.

<dbport>
An optional MySQL port, if it's not the standard one. Should be empty if we are to use the default.

<dbsock>
An optional MySQL socket. Should be empty if we are to use the default.

<dbuser>
The username to use when connecting to MySQL, usually ``eprints''.

<dbpass>
The password to use to connect to MySQL.

<defaultlanguage>
One of the supported languages. This is the default for this archive.

<adminemail>
The email address of the archive administrator. I strongly suggest that this is an alias rather than a personal email address. If all your webpages contain ``bob@footle.edu'' and bill takes over from bob you would have to regenerate every page with ``bill@footle.edu''. Much better to set up an email alias or forward from ``archive-support@footle.edu'' and point it at bob (for now). Heed these words spoken from grim experience!

<archivename language=``langcode''>
The name of the archive. This has an attribute ``language'' the value of which is an iso language id. There should be one of these archivename elements per supported language. eg.
    <archivename language="en">White Lemur</archivename>
    <archivename language="fr">La Archive d'Lemur Blanc</archivename>

(apologies to the french, human languages aren't my strong suit)

<securehost> (since v2.2)
Used for experimental https support.

<securepath> (since v2.2)
Used for experimental https support.


ArchiveConfig.pm

This module imports the other 5 perl modules. It allows lots of little tweaks to the system, which are all commented in the file.

It includes options to hide various features you may not want and to customise the browse, search and subscription functions.

Also you can customise what each type of user can and can't do, and how they authenticate their passwords.

This configuration file contains perl methods which are called when a session starts and ends, to log things, to generate the entities for the entities file and to handle security on non-public files.

Browse Views

The browse views are generated by the script ``generate_views'' and what that script does is configured by the ``browse_views'' item in the config.

It is a reference to a perl array [], each item of which is a hash {}.

The hash has 3 required properties and a number of optional ones.

id (required)
The ID of this view. The view will be placed in a subdirectory of /view/ of this name. The ID is also used to identify the full name of this view in the phrase file. id=>"foo" would find its title in the phrase ``viewname_eprint_foo''

fields (required)
The list of the names of the fields to browse, separated by a slash ``/''. This should normally be a single field unless you want to merge the values of two fields. The id part of a field may be specified by appending ``.id'' to the fieldname.

order (required)
A list of fields to sort by in order of priority, separated by slashes ``/''. A minus sign ``-'' prefixing the fieldname indicates reverse sorting on that field.

allow_null
Should we make a page for the ``unset'' condition? A page for items which do not have a year set may be useful. But for other fields this may be meaningless. Set it to 1 for true.

include
Generate a file for every value, ending in ``.include'' which contains the XHTML of the citations of records and the number of records, but without wrapping the site standard template around it.

nohtml
Normally the system generates a page like that described for ``include'' with a .html suffix and the site template. If nohtml is set to 1 then it won't.

citation
Normally the citation used is that for the ``type'' of eprint. If this is set then that citation (from the citations file) will be used for all items. This allows for some clever stuff if you want to make page which can get sucked into another website.

Normally the system puts a paragraph tag around each citation, but if you use a custom citation this will not happen.

nocount
Do not include the count of how many items at the top of the page.

nolink
The system generates an index.html in /view/ with a list of all the browse views available. Setting nolink to 1 will hide this item.

noindex
Do not generate an index.html file in /view/foo/ listing all the values of the view and linking to their respective pages.

notimestamp (since v2.2)
Do not add the timestamp at the bottom of the view page.

hideempty (since v2.2)
Only applicable to subjects. This option will suppress subjects which do not have any records in them. This is useful on ``young'' archives which look very empty if you have a large subject tree and only a few records, and those clustered in 3 or 4 subjects.

The most common view is to browse by subject:

 { id=>"subject", allow_null=>0, fields=>"subjects", 
    order=>"title/authors", hideempty=>1 }

A more complex view generates a view on author & editor IDs which is not advertised but may be captured by some other software to build staff CV pages.

 { id=>"person", allow_null=>0, fields=>"authors.id/editors.id", 
    nohtml=>1, nolink=>1, noindex=>1, include=>1, 
    order=>"-year/title" }

For my example person id ``wh'' this will generate a webpage called /view/person/wh.include (and one for each other value of the author or editor IDs) which can be captured by an external automated system.
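Putting the pieces together, a complete browse_views setting might look like the sketch below. It assumes the configuration hash in ArchiveConfig.pm is held in a variable called $c (an assumption, check your own file), and the ``year'' view is a hypothetical addition which shows allow_null in use. The loop at the end is only a sanity check for this example and is not part of EPrints.

```perl
use strict;
use warnings;

# Stand-in for the configuration hash built in ArchiveConfig.pm.
# The variable name $c is an assumption made for this sketch.
my $c = {};

$c->{browse_views} = [
    # Browse by subject, hiding subjects which have no records.
    { id=>"subject", allow_null=>0, fields=>"subjects",
      order=>"title/authors", hideempty=>1 },

    # Hypothetical view by year, with an extra page for items
    # which have no year set.
    { id=>"year", allow_null=>1, fields=>"year",
      order=>"title/authors" },

    # Unadvertised per-person include files, newest items first.
    { id=>"person", allow_null=>0, fields=>"authors.id/editors.id",
      nohtml=>1, nolink=>1, noindex=>1, include=>1,
      order=>"-year/title" },
];

# Sanity check: every view needs the three required properties.
for my $view ( @{ $c->{browse_views} } )
{
    for my $key ( qw( id fields order ) )
    {
        die "browse view is missing '$key'\n" unless defined $view->{$key};
    }
}
print scalar( @{ $c->{browse_views} } ), " views configured\n";
```

After changing this setting the pages would need to be rebuilt with generate_views.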

User Privs

The user permission configuration allows you to set what each type of user can and can't do. The user home page will only show a user the options which they are permitted to use.

New types of user, and which data about themselves they can edit, are set in metadata-types.xml.

Permissions are set by ``type'' of user. By default there are 3 kinds of user: ``user'', ``editor'' and ``admin''.

Admin can, by default, do everything.

subscription (since EPrints v2.1)
If included then this kind of user can create subscriptions.

set-password
Reset their password via the web registration system.

deposit
Submit items into the archive.

view-status
View the archive status page.

editor
User can edit then approve submitted items into the main archive, or delete them, or return them to sender. Also can remove items from the archive back into the edit buffer for corrections, and move records into the deleted table (delete them).

staff-view
User can perform a ``staff search'' of user or eprint records and view ALL the metadata.

edit-subject
User can edit the subject tree via the online interface.

edit-user
User can edit other users records.

change-email
User can change their email address via the web interface. This is safer than allowing them to edit it directly, as it ensures they cannot set it to an address at which they do not receive mail (it mails them a confirmation pin number).

change-user
This allows the sinister feature which lets you log in as someone else. It still requires a password. This is useful if you want to perform admin tasks as a super user, then log-in as a normal user to deposit items.

no_edit_own_record (since v2.2)
This suppresses the ``edit my user record'' option. This may be useful if you disable web-registration and import the user records from some other database.
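As a rough illustration of how the privileges above might be assigned per user type, here is a minimal sketch. The privilege names come from the list above, but the configuration hash $c, the key names ``userauth'' and ``priv'', and the helper type_has_priv are all assumptions made for this example. Check the comments in your own ArchiveConfig.pm for the real layout, which also carries the authentication settings.

```perl
use strict;
use warnings;

# Stand-in for the ArchiveConfig.pm configuration hash (assumed name).
my $c = {};

# Privileges per user type. The key names "userauth" and "priv" are
# assumptions for this sketch, not confirmed EPrints settings.
$c->{userauth} = {
    user   => { priv => [ "subscription", "set-password", "deposit",
                          "change-email" ] },
    editor => { priv => [ "subscription", "set-password", "deposit",
                          "change-email", "view-status", "editor",
                          "staff-view" ] },
    admin  => { priv => [ "subscription", "set-password", "deposit",
                          "change-email", "change-user", "view-status",
                          "editor", "staff-view", "edit-subject",
                          "edit-user" ] },
};

# Hypothetical helper: returns 1 if the user type holds the privilege.
sub type_has_priv
{
    my( $type, $priv ) = @_;

    my $conf = $c->{userauth}->{$type} or return 0;
    return ( grep { $_ eq $priv } @{ $conf->{priv} } ) ? 1 : 0;
}

print "editors can staff-view\n" if type_has_priv( "editor", "staff-view" );
print "users cannot edit-user\n" unless type_has_priv( "user", "edit-user" );
```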


ArchiveMetadataFieldsConfig.pm

Fields Configuration

Metadata is data about data: the information which we store to describe each record (eprint) in the system. Users also have metadata.

This module is the configuration for the metadata. This is probably the most important part of the system.

The system automatically assigns some fields to each dataset (users, eprints, etc.) such as ``type'' to eprints and ``username'' to users. The majority of the fields are optional, and configured in this module.

Fields have a number of properties. The only required properties are ``name'' and ``type''. Name is the name of the field. This is used to identify this throughout the system. The other properties depend on what type the field is.

When you add a field you need to add the ``human readable'' version in the phrase file. This separation allows you to change the description without changing the field itself. When you add a field named ``foo'' to the ``eprint'' metadata you should add ``eprint_fieldname_foo'' to the phrases. You may also wish to add ``eprint_fieldhelp_foo'' which is the explanation given to the user on the metadata input page.

The following types of field are supported, along with their special property options.

int
Optional properties: digits

This type describes a positive integer. Stored as an INT in the database.

year
This type describes a year. It works pretty much like ``int'' but is always 4 digits long. Stored as an INT in the database.

longtext
Optional properties: input_rows, input_cols, search_cols

This type describes an unlimited length text field. Used for things like titles and abstracts. It can't be efficiently searched as a single value, so the system indexes the words. See ``free text indexing'' section. Stored in MySQL as a TEXT field.

date
This type describes a date, always expressed as yyyy-mm-dd, eg. 1969-05-23. It is stored as a DATE in the database.

boolean
Optional properties: input_style

This is a simple yes/no field which is stored in the database as SET( 'TRUE','FALSE' ). It can be rendered as a menu, a check box or radio buttons. (See input_style)

name
Optional properties: input_name_cols, search_cols

This type is used to store names of people (eg. authors). It is split into 4 parts: honourific, given names, family name and lineage. This may seem over fussy but it avoids people putting ``Reverend'' in the given names or ``Junior'' in the family name. If you dislike this you can hide honourific and lineage (See ArchiveConfig.pm).

We use ``family name'' rather than ``last name'' in the hope of avoiding international confusion (some countries list family name first, so their last name is what I would call their ``christian'', or ``first'', name).

Names are stored using 4 SQL fields. The name field ``supervisor'' would be stored as supervisor_honourific, supervisor_given, supervisor_family, supervisor_lineage. Each is a VARCHAR(255).

set
Required properties: options

Optional properties: input_rows, search_rows

This type is a limited set of options. The list of options must be specified. Each option must also be added to the phrase file. Option ``foo'' of field ``bar'' in the ``user'' dataset will have the phrase id ``user_fieldopt_bar_foo''.

Stored in the database as a VARCHAR(255), containing the id of the option.

text
Optional properties: input_cols, maxlength, search_cols

This is a simple text field. It normally has a maximum length of 255 ASCII characters, fewer if non-ASCII characters are used as these are UTF-8 encoded.

Stored in the database as a VARCHAR(255).

secret
Identical to ``text'' except that the input field is a starred-out password input field, and it is only ever written to the database, it can't be read back. Writing an empty value will NOT change the previous value.

url
Identical to ``text'' except it is rendered and validated differently.

email
Identical to ``text'' except it is rendered and validated differently.

subject
Optional properties: top, showtop, showall, input_rows, search_rows

This is a hierarchical subject tree. At first glance it works like sets, but it can be searched for all items in or below a given subject. Subjects may be added to the live system.

The subject tree starts at a subject with the id ``ROOT'' but a subject field only offers all the items below the subject with the id ``subjects''. This can be changed using the ``top'' property, so that you can have two fields whose options are different parts of the same tree.

Subjects may have more than one parent. eg. biophysics can appear under both physics and biology, while still being the same subject.

See the bin/import_subjects manpage for more information on setting up the initial subjects.

You may have more than one ``subject'' field, eg. Subject and Department, with unrelated parts of the subject tree as their ``top''.

A later version of eprints2 will have a feature which allows an admin user to limit an editor user to a certain subject (and things below it). So that in the above example you can declare an editor of either a Subject (capital-S) or a Department.

pagerange
A range of pages, eg 1-44. Currently not searchable.

Stored in the database as a VARCHAR(255).

datatype
Required properties: datasetid

Optional properties: input_rows, search_rows

This field works like a set, but gets its options from the types of the dataset specified.

For example, if you specified the datasetid ``user'' then, unless you've changed the defaults, this would give the options ``user'', ``editor'' and ``admin'' - which are the types of user specified in metadata-types.xml.

Options are:

user
The types of user.

document
The types of document.

eprint
The types of eprint.

security
Security levels of a document (probably not very useful).

language
All the languages specified in languages.xml

arclanguage
The languages supported by this archive. Configured in ArchiveConfig.pm. Stored in the database as a VARCHAR(255).

langid
This is used internally, it contains an ISO language ID. You probably don't want to use it. Stored as a CHAR(16).

id
This is also used internally, it contains the ID part of a field with the hasid property. Don't use it! Stored in the database as a VARCHAR(255).

search (Since EPrints v2.1)
Required properties: datasetid, fieldnames

Optional properties: allow_set_order

This type describes a stored search acting on the named dataset. The fields that can be searched are described by fieldnames.

This field type is quite unusual and you are not really expected to use it. It was created for use in the systems field of the Subscription dataset.

This field is stored in MySQL as a TEXT field.
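To make the field types more concrete, the following sketch defines a handful of fields for the ``eprint'' dataset. The hash name $fields, its per-dataset layout and every field name used here are assumptions made for illustration. The property names (input_rows, input_style, options, multiple, hasid, top) are the ones described in this section. The loop at the end simply lists the phrase ids which the ``set'' options would need in the phrase file.

```perl
use strict;
use warnings;

# Hypothetical container for the field definitions, keyed by dataset.
my $fields = {};

$fields->{eprint} = [
    # Unlimited length text, shown as a 3 row text area.
    { name => "title", type => "longtext", input_rows => 3 },

    # Yes/no value rendered as radio buttons.
    { name => "refereed", type => "boolean", input_style => "radio" },

    # Fixed list of options shown as a pull down menu.
    { name => "pubstatus", type => "set", input_rows => 1,
      options => [ "unpub", "inpress", "pub" ] },

    # A list of names, each with an extra ID part.
    { name => "authors", type => "name", multiple => 1, hasid => 1 },

    # A second subject field rooted at a hypothetical "departments" node.
    { name => "department", type => "subject", top => "departments",
      multiple => 1 },

    { name => "pages", type => "pagerange" },
];

# Every option of a set needs a phrase: dataset_fieldopt_field_option
for my $field ( @{ $fields->{eprint} } )
{
    next unless $field->{type} eq "set";
    for my $option ( @{ $field->{options} } )
    {
        print join( "_", "eprint", "fieldopt", $field->{name}, $option ), "\n";
    }
}
```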

Field Properties:

``status'' indicates either ``system'' or ``cosmetic'' or ``other''. ``system'' properties cannot be changed without erasing and recreating your archive. ``cosmetic'' properties only affect the display of data and can be safely changed. ``other'' is explained in the description.

name
Status: system

Required by: all

Default: NO DEFAULT

The name of the field. It is strongly recommended that this contains lowercase a-z only.

type
Status: system

Required by: all

Default: NO DEFAULT

The type of field. One of the list described above.

browse_link
Status: cosmetic

Optional on: all

Default: undef

This is the id of a ``browse'' view. This will hyperlink this value to the browse for that value when rendering it.

confid
Status: cosmetic

Internal use only. Sets the confid if a field is being created without a dataset. The confid is used as a fake dataset for generating phrase ids.

datasetid
Status: other

Required by: datatype

Default: NO DEFAULT

Used to set which dataset's types are this field's options.

Changing this on a live system could cause some confusion, as values in the old dataset may exist.

digits
Status: cosmetic

Optional on: int

Default: 20

Maximum number of digits for this number.

input_rows
Status: cosmetic

Optional on: longtext, set, subject, datatype

Default: set in ArchiveConfig.pm

The number of input rows in a text area, or options to display at once in a menu. Setting to 1 will make a pull down menu (unless this is a ``multiple'' field).

search_cols
Status: cosmetic

Optional on: text, longtext, url, email, name, id

Default: set in ArchiveConfig.pm

The width of the search field. If searching multiple fields at once then the value is taken from the first field in the list.

search_rows
Status: cosmetic

Optional on: datatype, set, subject

Default: set in ArchiveConfig.pm

The number of items to display in a search field list. If searching multiple fields at once then the value is taken from the first field in the list.

input_cols
Status: cosmetic

Optional on: text, longtext, url, email

Default: set in ArchiveConfig.pm

The width of the input field.

input_name_cols
Status: cosmetic

Optional on: name

Default: set in ArchiveConfig.pm

The width of the input fields of a ``name'' field.

input_id_cols
Status: cosmetic

Optional on: fields with ``hasid'' set.

Default: set in ArchiveConfig.pm

Sets the width of the ID input field on a field with an ID.

input_add_boxes
Status: cosmetic

Optional on: fields with ``multiple'' or ``multilang'' set.

Default: set in ArchiveConfig.pm

How many boxes to add when the user hits the ``more spaces'' button.

input_boxes
Status: cosmetic

Optional on: fields with ``multiple'' set.

Default: set in ArchiveConfig.pm

How many boxes to initially show on a multiple field.

input_style
Status: cosmetic

Optional on: boolean

Default: undef

By default booleans render as a check box. These other formats look a bit clearer on the input field:

menu
Display as a pull-down menu. You will need to set the phrases dataset_fieldopt_fieldname_TRUE and dataset_fieldopt_fieldname_FALSE (where dataset & fieldname are the ids of the dataset and field). These are the menu options.

radio
Display as radio buttons (ones which deselect when you select another one). You will need to set the phrase dataset_radio_fieldname. This phrase should have two ``pin'' elements: true and false, which are the positions to place the radio buttons.

fromform
Status: cosmetic

Optional to: all

Default: undef

A reference to a perl function which will process the value from the form before storing it. The function will be passed ($value, $session) where value is the value from the form and session is the current EPrints::Session. It should return the processed value.

This could be used, for example, to turn a username ``moj199'' into a userid ``312'' for internal use.

toform
Status: cosmetic

Optional to: all

Default: undef

A reference to a perl function which will process the value just before it is displayed in the form. The function will be passed ($value, $session) where value is the value from the database and session is the current EPrints::Session. It should return the processed value.

This could be used, for example, to turn a userid ``312'' being used internally by your systems into more human-friendly username ``moj199''.

If you use toform then you should probably set fromform to change your values back again.
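A matched fromform/toform pair could be sketched as follows. The lookup table, the function names and the ``owner'' field are all hypothetical. A real archive would look the user up in its own database rather than in a hard-coded hash. Both functions take ( $value, $session ) as described above, although this sketch does not use $session.

```perl
use strict;
use warnings;

# Hypothetical lookup table standing in for a real user database.
my %USERID_OF   = ( moj199 => 312 );
my %USERNAME_OF = reverse %USERID_OF;

# fromform: username typed into the form -> userid stored internally.
# Unknown values are passed through unchanged.
sub username_to_userid
{
    my( $value, $session ) = @_;

    return undef unless defined $value;
    return exists $USERID_OF{$value} ? $USERID_OF{$value} : $value;
}

# toform: userid stored internally -> username shown in the form.
sub userid_to_username
{
    my( $value, $session ) = @_;

    return undef unless defined $value;
    return exists $USERNAME_OF{$value} ? $USERNAME_OF{$value} : $value;
}

# Hypothetical field definition using the pair.
my $owner_field = {
    name     => "owner",
    type     => "text",
    fromform => \&username_to_userid,
    toform   => \&userid_to_username,
};

print $owner_field->{fromform}->( "moj199", undef ), "\n";
print $owner_field->{toform}->( 312, undef ), "\n";
```

Keeping the two functions as exact inverses of each other means a value survives a round trip through the form unchanged.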

maxlength
Status: cosmetic

Optional to: text, email, url, secret

Default: 255

The maximum length of the value.

hasid
Status: system

Optional to: all

Default: 0

This adds an additional ``ID'' property to the field. This is most useful on a ``name'' field which is ``multiple''. It associates an additional value with the name, for example a username, or email address, which can be used to uniquely identify that person. If you want to get an accurate list of all of someone's papers then their name is NOT good enough.

You might also wish to make a ``publication'' text field have an ID which is an optional ISSN, but it makes more sense in ``multiple'' fields.

multilang
Status: system

Optional to: all (but silly for date, year, int, boolean)

Default: 0

If set this makes the field ``multilingual''. That is to say it can have more than one value, one value per language.

For example, the ``canadian stuff'' archive may wish to make its title and abstract multilang so that authors can enter them in both French and English.

This is more useful than having title_en and title_fr as eprints understands it and can render the version of the field appropriate to the viewer (if they set a language preference).

multiple
Status: system

Optional to: all (but silly for date, year, int, boolean)

Default: 0

If set this property makes the field a LIST rather than one value and handles rendering it as a list and inputting it. The input field will appear with a default of 3 inputs and a ``more spaces'' button which will reload the page with more if you need more than 3.

This causes the field to be stored in a separate SQL table.

options
Status: other

Required by: set

Default: NO DEFAULT

This should be an array of options. eg.

 [ "blue", "green", "red" ]

Removing options on a live system could leave invalid values floating around. Adding options is fine. Don't forget to add them to the phrase file too.

required
Status: system

Optional to: all

Default: 0

This indicates that this field is always required. It is not recommended to set this, but rather to indicate the requiredness of fields by type in the metadata-types.xml file.

Either way you set it, required fields will cause the item they are in to fail to validate unless the field has a value.

requiredlangs
Status: other

Optional to: fields with ``multilang'' property

Default: []

A list of languages which are required for this multilang field. eg. you can force an ``en'' (english) entry, while allowing them to optionally add others.

eg. [ ``en'', ``fr'' ]

A list of codes can be found in languages.xml

Adding more requiredlangs does not magically give you values for these languages in existing data.

showall
Status: cosmetic

Optional to: subject

Default: 0

By default subjects are only shown if they are ``depositable''. This option makes all subjects, depositable or not, options.

showtop
Status: cosmetic

Optional to: subject

Default: 0

If set then the topmost item in the subject is shown. Usually this is a container, eg. ``subjects'', and should remain hidden.

top
Status: cosmetic

Optional to: subject

Default: ``subjects''

Sets the top node in the tree. The options are all the children (and their children).

idpart
Used internally.

mainpart
Used internally.

render_single_value
Status: cosmetic

Optional to: all

Default: undef

This overrides the rendering of a single item. In a multiple, multilang field it will be called on each value of the language to display.

This is a reference to a function which takes ( $session, $field, $value ) and returns a XHTML DOM fragment.

Set this to \&EPrints::Latex::render_string to make eprints try to spot LaTeX in this field's values and render it as images instead!

(Since EPrints v2.1) Set this to \&EPrints::Utils::render_xhtml_field to make eprints read this field as XML and place that XML right in the XHTML web page. (Normally the system would escape all the greater-than and less-than characters.)

render_value
Status: cosmetic

Optional to: all

Default: undef

This is a reference to a function which will render the entire value of the field, overriding eprints' own renderer. It should take as parameters: ( $session, $field, $value, $alllangs, $nolink )

The function should return an XHTML DOM fragment.

If $alllangs is set then the function should render all values on a multilang field, rather than just the ``best'' one.

If $nolink is set then no HTML anchor links should be used, eg. to link a URL.

export_as_xml
Status: cosmetic

Optional to: all

Default: 1

If this attribute is set to zero then this field will be omitted from the output of the XML export script.

make_value_orderkey
Status: other

Optional to: all

Default: undef

This may be a reference to a subroutine which returns a single string which can be used to sort this value alphabetically. It is used to order the results within the database. The function is passed the following parameters: ( $field, $value, $session, $langid ). You may wish to sort certain fields differently for different languages.

For example, you may for some reason want a field formatted as a single character followed by an integer (a934 or b3). If you sort this alphabetically then a2 would come after a11. So you make the orderkey function do something like:

 # if the value is not in the expected format, sort on it unchanged
 return $value unless $value =~ m/^(.)([0-9]+)$/;
 return sprintf( "%s%08d", $1, $2 );

This would turn a2 into a00000002 and a11 into a00000011 which will sort correctly alphabetically. Don't worry - these values are only ever used for sorting, they should never get output.

You should probably use the bin/reindex command on the dataset in question (probably ``archive'' or ``user'') after adding or changing this property to a field. This may take a significant amount of time.

make_single_value_orderkey
Status: other

Optional to: all

Default: undef

This is a slightly simpler version of make_value_orderkey. It only takes ( $field, $value ) as parameters. It is only ever passed single values of $value and lets eprints take care of multiple values (or multilang values) by calling the function once per value.

As with make_value_orderkey you should reindex after meddling with orderkeys.
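
As a minimal sketch of such a function, assume a text field whose values should sort case-insensitively and ignore a leading ``The''. The subroutine name title_orderkey is hypothetical, the $field argument is not used here, and the way it is attached to the field (shown in the final comment) is an assumption about the field configuration:

```perl
use strict;
use warnings;

# Hypothetical orderkey function: takes ( $field, $value ) and
# returns the string that should be used when sorting $value.
sub title_orderkey
{
        my( $field, $value ) = @_;

        return "" unless defined $value;
        my $key = lc $value;        # sort case-insensitively
        $key =~ s/^\s+//;           # ignore leading whitespace
        $key =~ s/^the\s+//;        # ignore a leading "the"
        return $key;
}

# In the field configuration (assumed form):
#   make_single_value_orderkey => \&title_orderkey
```

Because the function only ever sees one value at a time, it needs no special handling for multiple or multilang fields.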

fieldnames
Status: cosmetic(ish)

Required by: search

Default: NO DEFAULT

This should be a reference to an array of field names - exactly like the ones used in ArchiveConfig.pm to configure search, advanced search and subscriptions.

Adding fields to this will cause no problem. Removing fields will mean that those fields are ignored when turning values of this field back into searches.

can_clone (since v2.2)
Status: changeable (but changes functionality)

Default: 1

If can_clone is set to zero then this field will not be cloned when the record is cloned. This may be useful for automatically generated fields or fields with meaning such as ``content has been spellchecked'' or somesuch.

sql_index (since v2.2)
Status: system

Default: 1

If this attribute is set to zero then an SQL index will NOT be created for this field. This means the field should never be used in a ``value exactly matches'' search as it may be very slow. MySQL has a limit of 32 indexes per table, so set this attribute to zero on some fields if you go over that limit.

id_editors_only (since 2.2)
Status: cosmetic

Default: 0

Optional on: fields with ``has_id'' set.

It means that the ``id'' part of the field only appears in the editor view, not the normal user submission form. Some archives may wish to do this to save confusing the person making the deposit.

allow_set_order (since 2.2)
Status: changeable (but changes functionality)

Default: 1

Optional on: search

Prompt user for a search order in addition to the search fields.

Defaults

This section of the file contains subroutines which are called to set default values for Users, Documents and EPrints.

Automatics

These functions let you set automatic fields. This allows you to make fields which are updated automatically each time the item (User/EPrints/Document) is committed to the database.

This allows you to create ``compound'' fields. Such fields are created by processing the values of other fields rather than being edited directly.

For example, if you wanted to make an automatic int field which contains the number of authors, you could add the following to set_eprint_automatic_fields:

 # no authors at all will be undef, not [] so check first
 if( $eprint->is_set( "authors" ) )
 {
        my $auths = $eprint->get_value( "authors" );
        $eprint->set_value( "authcount" , scalar @{$auths} );
 }
 else
 {
        $eprint->set_value( "authcount" , 0 );
 }


ArchiveOAIConfig.pm

This module configures how the archive exports its data via the OAI protocol.

For more information on the how and why of OAI see http://www.openarchives.org/

OAI allows a harvester to request the metadata from your archive and other archives to provide a federated search. The next time the harvester harvests your archive it only has to ask for items which have changed or been added since the last time it asked.

The current version of EPrints supports both OAI 1.1 and OAI 2.0.

The base URL for your OAI v1.1 interface will be http://archivepath/perl/oai

The base URL for your OAI v2.0 interface will be http://archivepath/perl/oai2

If you want to use the OAI system then you need to fill in the blanks, such as policy and the OAI-id of the archive.

You may create OAI sets in a similar manner to ``browse views'' in ArchiveConfig.pm.

If you want to change the way that an EPrint is mapped into Dublin Core then edit the make_metadata_oai_dc subroutine, which returns a DOM XML object.

To add a new metadata type you need to add a new mapping function and add entries to the namespaces, schemas and functions items near the top of the file.


ArchiveRenderConfig.pm

This module contains functions which turn data into XHTML for displaying on the web.

If you want to change the way a user info page, or an eprint ``abstract'' page is rendered then here's the place to do it.

There are also ``full'' versions of these functions which display all the internal variables and things. These are the views which the editors and admin see.

The XHTML is generated using DOM (Document Object Model), but eprints provides some functions for easily generating XHTML DOM. The only method of DOM you should need to use is appendChild - which adds an element to this element.

EPrints API functions which return XHTML objects.

Note, all text strings should be in UTF-8.

Example:

 my $page = $session->make_doc_fragment(); 
 my $h1 = $session->make_element( "h1" );
 $h1->appendChild( $session->make_text( "Title" ) );
 $page->appendChild( $h1 );
 $page->appendChild( 
    $session->make_element( 
       "img",
        src=>"/images/cheese.gif",
        width=>128,
        height=>53 ) );

$page now contains:

 <h1>Title</h1><img src="/images/cheese.gif" width="128" height="53" />

Many of the EPrints modules are now properly(!) documented. For an example try running:

 % perldoc /opt/eprints2/perl_lib/EPrints/Archive.pm

The functions most useful for extracting and rendering information are documented here:

$session->make_text( $text )
Returns a DOM object representing that text.

$session->make_doc_fragment()
Returns a document fragment. This renders to nothing but is a container to which you can add stuff.

$session->make_element( $name, %opts )
Makes a simple XHTML element. %opts is an optional series of attributes.

To make <h1 class="foo">...</h1> you would call:

 $session->make_element( "h1", class=>"foo" );

$session->render_ruler();
Returns the default ruler for the archive (from ruler.xml).

$session->render_link( $uri, $target )
Returns the XHTML element (with URI properly escaped):
  <a href="uri"></a>

Which you can appendChild stuff into. If $target is specified then a target attribute is included - to make it pop up a new window.

$item->render_value( $fieldname, $showall )
$item is either an EPrint, a User or a Document.

$fieldname is the name of the field you want to render. If $showall is 1 then ALL values are rendered in a multilang field.

$item->render_citation( $style )
Renders the citation of the item using the citation for the item's type from the citation file.

If $style is set then it uses the citation with that id instead.

$item->render_citation_link( $style )
This renders a citation as above, but links it to the url of the item.

$item->render_description()
This renders a simple description of the item using the default citation for this dataset eg. for eprint it uses citation type ``eprint''.

$session->html_phrase( $phraseid, %opts )
Returns the item from the phrase file. If you don't care about supporting multiple languages then just use make_text instead, it's easier.

It looks first in the archive phrase file for the current language.

Then in the archive phrase file for English.

Then in the system phrase file for the current language.

Then in the system phrase file for English.

The %opts are a series of DOM elements to place in the ``pin'' items in the phrase file.
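
The search order above can be modelled with plain hashes. This is only a sketch of the fallback rule, not the code eprints uses: the two hashes stand in for the archive and system phrase files, and the phrase ids and the find_phrase name are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the phrase files, keyed by language id
# and then by phrase id.
my %archive_phrases = (
        fr => { "page:title" => "Bienvenue" },
        en => { "page:title" => "Welcome", "page:about" => "About" },
);
my %system_phrases = (
        fr => { "lib:ok" => "D'accord" },
        en => { "lib:ok" => "OK", "lib:cancel" => "Cancel" },
);

# Return the first match in the documented search order, or undef.
sub find_phrase
{
        my( $phraseid, $langid ) = @_;

        my @sources = (
                $archive_phrases{$langid},   # 1. archive file, current language
                $archive_phrases{en},        # 2. archive file, English
                $system_phrases{$langid},    # 3. system file, current language
                $system_phrases{en},         # 4. system file, English
        );
        foreach my $source ( @sources )
        {
                next unless defined $source;
                return $source->{$phraseid} if exists $source->{$phraseid};
        }
        return undef;
}
```

With these sample values, a French session asking for ``page:about'' falls through to the English archive phrase, and ``lib:cancel'' falls through to the English system phrase.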

Some other useful functions you may need

$item->get_value( $fieldname, $no_id )
Returns the value of field $fieldname from the item. An optional second parameter may be set to 1 to return the value without the ``id'' part, to keep things simple.

$item->is_set( $fieldname )
Returns true if the field is set on this object, false otherwise.

$eprint->get_all_documents()
Return an array of the document objects belonging to this eprint.


ArchiveTextIndexingConfig.pm

You probably won't need to change this module unless you want to modify how eprints does searches for words in strings.

When a record is added to the system eprints uses this module to turn a string into a list of values which are indexed. By default these are words with 3 letters or more except some predefined stop words. It also turns latin characters with acutes into their plain ASCII (no acute/grave) versions.

It then does the same with the search string and looks for these keys.

Example:

 The rain in spain falls mainly on the plains.

Is turned (by default) into the keys:

 rain spain fall mainly plain

Thus searching for ``rain'' or ``plain'' or ``plains'' or ``MaiNlY'' will all match this string.
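
The behaviour in this example can be approximated in a few lines of perl. The following is a rough sketch, not the code in ArchiveTextIndexingConfig.pm: the stop word list is a short made-up sample, the trailing ``s'' stripping is an assumption based on the example above, and the accent folding is left out.

```perl
use strict;
use warnings;

# Illustrative stop word list; a real list would be longer.
my %STOP = map { $_ => 1 } qw( the and for with from that this );

# Turn a string into a list of index keys, roughly as described above.
sub index_words
{
        my( $text ) = @_;

        my @keys;
        foreach my $word ( split /[^a-z0-9]+/, lc $text )
        {
                next if length( $word ) < 3;   # too short to index
                next if $STOP{$word};          # predefined stop word
                $word =~ s/s$//;               # crude plural stripping
                push @keys, $word;
        }
        return @keys;
}

# Prints: rain spain fall mainly plain
print join( " ", index_words( "The rain in spain falls mainly on the plains." ) ), "\n";
```

Running the search string through the same function is what makes ``plains'' and ``MaiNlY'' match: both sides are reduced to the same keys before they are compared.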

You may wish to add your own ``stop words''. eg. If you are running an archive about badgers, a search for the word ``badger'' will return almost all the records.

At a more complex level you may wish to add handling for non-European character sets (I have no idea how well the default setting will work on these), or do ``stemming'' - removing ``ed'', ``ing'', ``ies'', ``s'' etc. from the end of words so that ``land'' will match ``land'', ``landed'', ``landing'' and ``lands''. (It currently removes 's').

Another suggestion is using soundex or similar techniques to match words which sound similar.

Changing the indexing on a live system will require you to regenerate the indexes using the reindex script. (If you don't then some of the search results will be wrong).


ArchiveValidateConfig.pm

This module handles validating data entered by users. Each subroutine is described in more detail in the module itself.

Each subroutine returns a list of DOM elements, each of which describes a single problem. Any problems will prevent the user from continuing with editing until they correct them.

As with the rendering functions, if you don't care about making this work in more than one language then you can just make the DOM items by calling $session->make_text( "problem explanation" ).

The eprint & document validation routines have a flag $for_archive which, if true, indicates that the item is being checked before going into the actual archive. You can use this to force an editor to enter fields which the user may leave blank.

Validation Functions

validate_field
Called for all fields. Use it to check individual field values. By default it checks that URLs look OK.

validate_eprint_meta
Check the metadata of an eprint. Use this to test dependencies between fields. eg. if you have a requirement that field ``A'' OR field ``B'' must be set.

validate_eprint
Validate the whole eprint. The last part of the validation of an eprint.

validate_document_meta
Validate the metadata of the document (as with eprint_meta)

validate_document
Validate the whole document, files and metadata.

validate_user
Validate a user record.
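
As an illustration of a dependency check and of the $for_archive flag, here is a sketch of what validate_eprint_meta might look like. The parameter order ( $eprint, $session, $for_archive ) is an assumption, so check the comments in the module itself. The field names are only examples, and the two Stub packages are stand-ins (not part of eprints) so that the sketch can be run on its own.

```perl
use strict;
use warnings;

# Stand-ins so the sketch runs outside eprints; not part of the real API.
package StubSession;
sub new { return bless( {}, shift ); }
sub make_text { my( $self, $text ) = @_; return $text; }

package StubEPrint;
sub new { my( $class, %values ) = @_; return bless( { %values }, $class ); }
sub is_set { my( $self, $name ) = @_; return defined $self->{$name}; }

package main;

# Assumed parameter order; returns a list with one entry per problem.
sub validate_eprint_meta
{
        my( $eprint, $session, $for_archive ) = @_;

        my @problems = ();

        # Dependency between fields: at least one of the two must be set.
        unless( $eprint->is_set( "abstract" ) || $eprint->is_set( "keywords" ) )
        {
                push @problems, $session->make_text(
                        "Please enter an abstract or some keywords." );
        }

        # Stricter check, applied only when the item is going into the archive.
        if( $for_archive && !$eprint->is_set( "subjects" ) )
        {
                push @problems, $session->make_text(
                        "An editor must assign at least one subject." );
        }

        return @problems;
}
```

An empty list means the metadata passed. In a multi-language archive the make_text calls would be replaced by html_phrase calls with matching entries in the archive phrase file.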


citations-languageid.xml

The citations file describes how to render an item (eprint/user/whatever) into a short piece of XHTML. Each citation has a ``type''. There are 3 kinds of citation:

default citation
This is a very short description of the item. Usually ``the title or failing that, the id''. The type id is just the name of the dataset. eg. ``eprint''

type citation
These are richer descriptions which vary between type of eprint, user or document. The type id is dataset_type eg. eprint_preprint.

other citation
Used by custom browse views. Any name you like.

The citation file contains a list of citation elements:

 <ep:citation type="...">

Each one may contain text and tags. The text may also include the names of fields in the record being rendered. These names should be between @ symbols. eg. @authors@ or @title@. These will be replaced with a rendered version of the value in that field. (if you need an actual @ symbol for some reason two @@ with nothing inside will be rendered as a single @).

Note. The @title@ style was introduced in EPrints 2.2. Before that this file used XML entities such as &title; but this caused problems and didn't solve any. Use of entities is still supported, but deprecated.
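
To make the @ rule concrete, the following perl sketch expands @name@ markers in a plain string. It is a simplified model only: eprints works on the XML of the citation and inserts rendered XHTML rather than plain text. The expand_citation name and the sample values are invented here, and unset fields are shown as ``UNSPECIFIED''.

```perl
use strict;
use warnings;

# Expand @name@ markers using a hash of already rendered field values.
# Two @ symbols with nothing between them become a single literal @.
sub expand_citation
{
        my( $template, $values ) = @_;

        $template =~ s{\@([A-Za-z0-9_]*)\@}{
                $1 eq "" ? "\@"
                         : ( defined $values->{$1} ? $values->{$1} : "UNSPECIFIED" )
        }ge;
        return $template;
}

# Prints: Smith, J. (2003) On Badgers. Contact: editor@example.org
print expand_citation(
        '@authors@ (@year@) @title@. Contact: editor@@example.org',
        { authors => "Smith, J.", year => 2003, title => "On Badgers" } ), "\n";
```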

In addition you may use XHTML elements and the following elements in the eprints namespace. These elements are always removed but they control if their contents is kept or not. Conditional elements may be placed inside each other since v2.2.

<ep:linkhere>
This element is replaced with an XHTML anchor linking to the item. If this citation is being rendered without a link then it is just removed (but not the contents).

<ep:iflink>
The contents of this element are only preserved if we are rendering this citation as a link. Maybe an icon which you don't want if it's not a link.

<ep:ifnotlink>
The opposite of iflink.

<ep:ifset ref="fieldname">
The contents of this element are only preserved if the field ``fieldname'' has a value.

<ep:ifnotset ref="fieldname">
The contents of this element are only preserved if the field ``fieldname'' does not have a value.

<ep:ifmatch name="fieldname(s)" value="searchparam">
This is the Swiss army knife of the world of conditional rendering. It is also a bit complicated, and few people will need to use it. It works like a single search element. The attributes are:
name
This is the name of one or more fields, specified as in the search fields configuration. eg. ``title/abstract''

value
This is a value to search for. Treated like the value entered in a search field.

merge (optional)
Can be ANY or ALL. Works like the match all? in a search form.

match (optional)
Can be IN, EQ, or EX: In, Equal or Exact. Exact on subjects means that subject, but not any below it in the hierarchy.

For example:


 @year@<ep:ifmatch name="year" value="-1949"> 
 (approx)</ep:ifmatch>

This will render (approx) after years before 1950. Neat eh?

<ep:ifnotmatch name="fieldname(s)" value="searchparam">
Like ifmatch but only includes the values inside if the search does not match.


metadata-types.xml

This file allows you to configure the types of eprint, user, document and document security level.

When you add a new type you should add its name to the archive phrases file(s). The phraseid is ``dataset_typename_typename'' eg. ``document_typename_pdf''. You should also add a new citation to the citations file. Any fields which are not required but appear in the citation should probably be inside an <ep:ifset> so that you don't see ``UNSPECIFIED'' if they are not, er, specified.

The main element is ``metadatatypes''. This contains a list of ``dataset'' elements each of which has a name attribute.

The ``type'' elements in user and eprint ``dataset''s should contain a list of ``field'' elements. This describes the fields which may be edited for this type and the order that they appear on the form.

You may include system fields in this list, but be careful if you do.

Attributes for ``field'' are:

name
The name of the metadata field.

required
If set to ``yes'' then this field may not be left blank. Some system fields are always required no matter how this is set.

staffonly
This field only appears on the ``editor'' edit eprint form, not the user one. Or, in the case of the user dataset, the staff edit-user page.

The ``security'' dataset

This is a handy place to define the security levels. The type with no name is special: it is the ``public'' security type. All other types will require a valid username and password. Whether that username is acceptable for a given document is decided by the can_user_view_document subroutine in ArchiveConfig.pm.

The ``document'' dataset

By default eprints requires at least one of ps, pdf, ascii or html to be uploaded before an eprint is valid. You may change this list in ArchiveConfig.pm - any more complicated conditions will have to be checked in the eprint validation subroutine.


phrases-languageid.xml

This file contains a list of XML ``phrases''. Everything eprints ``says'' to users is stored in this file and its system-level counterpart. If you want the site to run in more than one language, you need one phrase file per language.

The phrase file is XML and contains a toplevel ``phrases'' element. This contains the list of phrases.

Each phrase has a ``ref'' attribute to identify it and contains text and optionally some XHTML tags. It may also contain eprints entities such as &archivename; and also some phrases should contain ``pin'' elements, described below.

The phrases in the archive phrase file are specific to that archive, the system phrase file contains non-archive specific phrases. The id's of most of the phrases in the archive phrases are generated from the id's of the fields, datasets, types etc.

The archive phrase file contains: names of dataset types, names of metadata fields, help on entering each metadata field, the names of options in ``set'' fields, the description of different search ordering options, names of browse views, phrases used in the render and validation routines, mail which eprints sends out and phrases which override those in the system file.

pins

Some phrases need some ``pin'' elements to show eprints where to insert values. Usually pins don't contain any elements but occasionally they do when they represent what to place a link around.

Overriding System Phrases

If you don't like some of the phrases in the main system phrases file you can override them by creating a phrase with the same ``ref'' in the archive file.

Don't edit the system file. If you upgrade eprints to a newer version it will get overwritten.

Emails

EPrints sends out emails when a user registers/changes their password, when a user changes their email, when a deposited item is rejected/deleted by an editor and when the system is low on resources. These mails can be customised in the phrase file.

Make sure you wrap your text in paragraph <p> tags. EPrints will automatically word wrap these in the email. <hr /> elements in a mail are turned into a line of dashes.

When eprints sends a mail it will send it as plain ASCII text, unless it contains latin-1 elements, in which case it will be latin-1 encoded. If it contains unicode characters not in the latin-1 charset then it will be utf-8 encoded.
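
The choice of encoding described above amounts to picking the narrowest character set that can hold every character in the mail. A minimal sketch of that rule follows. The mail_charset name and the exact charset labels it returns are assumptions made for illustration, and $text is assumed to be a decoded perl character string.

```perl
use strict;
use warnings;

# Pick the narrowest charset that can represent a mail body.
sub mail_charset
{
        my( $text ) = @_;

        return "us-ascii"   unless $text =~ /[^\x00-\x7F]/;   # plain ASCII only
        return "iso-8859-1" unless $text =~ /[^\x00-\xFF]/;   # fits in latin-1
        return "utf-8";                                        # anything else
}
```

So a phrase containing only an e-acute would go out as latin-1, while one containing, say, Greek or Japanese text would go out as utf-8.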


ruler.xml

This file configures the horizontal divider which eprints uses, which is inserted in place of &ruler;

If you have no great dislike of <hr /> horizontal rulers then you can leave it alone.

You can't use entities like &frontpage; in ruler.


The static/ directory

This directory contains the static pages for the site - the frontpage, the help pages, images, the stylesheet etc.

static/ contains one directory per language, eg. en. Plus a general directory which contains files which don't need translating like images and the stylesheet.

When you run the generate_static command it copies the files for each language, and the general dir, into the static site for that language.

See the generate_static documentation for more details.


subjects

This file is not used by the core eprints system. It is used by import_subjects to set up the initial subjects. For more information see the instructions for import_subjects.


template-languageid.xml

This file is the shell of every page in the system. It is more or less a normal XHTML page but you can use the eprints &foo; entities in it and it should contain ``pin'' elements like a phrase. The pins it should contain are:

<ep:pin ref="title" />
This is where to put the title of the page. It can be used more than once - in the title in the page header and somewhere in the body. If placing it in the title in the head of the page you must use the additional attribute textonly="yes" which only works here. It removes images from the title (which can happen if using the ``Latex'' mode).

<ep:pin ref="head" />
This goes somewhere in the head of the page. It shows eprints where to insert the ``meta'' and ``link'' elements.

<ep:pin ref="pagetop" />
This goes at the top of the body. It is sometimes used as a ``target''.

<ep:pin ref="page" />
Where to place the bulk of the content of the page.
