Terms commonly used by the Digital Scholarship Group





administrative metadata

Administrative metadata is used to manage a digital object. The DRS uses administrative metadata to record who has the right to view an object or what changes have been made to an object over time.



algorithm

A set of step-by-step instructions for achieving a computational task.
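
As an illustration of "step-by-step instructions," here is a minimal sketch of one classic algorithm, Euclid's method for finding the greatest common divisor of two whole numbers. The function name `gcd` is chosen for this example only and is not part of any DSG tool.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: repeat one simple step until nothing is left over."""
    while b != 0:
        # Replace the pair (a, b) with (b, remainder of a divided by b).
        a, b = b, a % b
    return a


print(gcd(48, 18))  # prints 6
```

Each pass through the loop is one "step"; the task is finished when the remainder reaches zero.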


application programming interface (API)

An API (application programming interface) is a set of specifications used to instruct technologies on how to interact with each other. For example, Fedora, the architectural foundation of the DRS, has an API that allows other programs and interfaces, like WordPress or Omeka, to access or display files stored in the DRS.


authority control

Authority control is the method of selecting and approving a single term to describe a concept or a single name for a person or entity. Authorized terms and names are used to eliminate ambiguity and confusion in descriptive practices, and can be useful in the sorting and browsing of search results.




Blacklight

Blacklight is a discovery interface that allows users to interact with search results in a useful way. The DRS uses Blacklight to create facets and limits for search results.



Cascading Style Sheets (CSS)

A language for describing the formatting of an XML or HTML document. CSS is primarily used by web browsers to control the look of web pages.


Central Authentication Service

Central Authentication Service (CAS) is a protocol that allows users to access many different applications by signing in to only one service. Northeastern University uses Shibboleth as its CAS, and the DRS uses Shibboleth to authenticate its users.


command-line interface (CLI)

A very powerful computer program which completes tasks at the textual command of a user. CLIs only use keyboard input to interact with users, unlike the mouse-heavy input of a graphical user interface (GUI). When using a CLI, users must write exactly what they want to happen, using a terse grammar. While this may seem less intuitive than accomplishing the same tasks within a GUI, CLIs have these advantages: (1) speed, since commands can be typed and executed quickly; (2) precision, since a wide variety of useful commands exist; (3) customization, since most commands can be optionally modified to execute in specific ways; and (4) reusability, since commands can be saved and used again. Some programs and tools can only be used through a CLI.

Examples of CLIs include MS-DOS, the Mac Terminal, and bash. (It’s worth noting that different CLIs may use different vocabularies and commands.)


content management system

A web application for managing (i.e., creating, deleting, and publishing) content on web pages. Content management systems provide interfaces for authoring, collaboration, and administration of web content. Commonly used content management systems include Drupal, WordPress, and Omeka.


controlled vocabulary

A predefined list of terms from which a value must be selected. E.g., the controlled vocabulary for the “best contact method” question might be “e-mail”, “telephone”, and “text message”; because the vocabulary is controlled, no other answers would be allowed. Theodor Geisel famously wrote an entire book, _The Cat in the Hat_, using a controlled vocabulary of 236 words. The opposite is a natural language vocabulary, in which the responses to “best contact method” might include “text msg”, “TXT”, “TXT msg”, “SMS msg”, “sms message”, etc.
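
The “best contact method” example can be sketched in code. This is only an illustration: the names `BEST_CONTACT_METHODS` and `validate_contact_method` are hypothetical and are not taken from the DRS or any other system.

```python
# The controlled vocabulary: the only values that will be accepted.
BEST_CONTACT_METHODS = {"e-mail", "telephone", "text message"}


def validate_contact_method(value: str) -> str:
    """Return the value unchanged if it is an approved term; otherwise refuse it."""
    if value not in BEST_CONTACT_METHODS:
        allowed = ", ".join(sorted(BEST_CONTACT_METHODS))
        raise ValueError(f"{value!r} is not an allowed term; choose one of: {allowed}")
    return value


print(validate_contact_method("telephone"))  # accepted

try:
    validate_contact_method("TXT msg")  # a natural language variant
except ValueError as error:
    print(error)  # rejected, with a list of the allowed terms
```

Rejecting variants such as “TXT msg” at entry time is what keeps later sorting and searching consistent.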


Creative Commons licensing

Standardized licenses that content creators assign to indicate what permissions they grant for sharing and reuse of their work. Creative Commons licenses do not replace traditional copyright; content creators retain all rights to their work that are not included in the license they choose to apply.



Darwin Core

Darwin Core is a descriptive metadata standard used to describe objects in the biology and life sciences fields. It is primarily designed to accommodate taxonomy and specimen details.


data curation

Data curation is the process of maintaining and preserving data, in all formats, to ensure it remains useful and can be accessed over time. Data curation, as a process, is closely related to data management.


data management

Data management is similar to data curation, but it typically refers to maintenance and preservation of data during the course of a project. Research projects are quite often required to establish a data management plan to document the strategy for collecting, organizing, sharing, maintaining, and preserving data collected during the life of the research project.


descriptive metadata

Descriptive metadata is information used to illustrate the general details of an object, like title, author, description, and keywords. Descriptive metadata is the primary metadata set indexed in the DRS, and descriptive metadata supplies the details that accompany each DRS object.


digital curation

Digital curation is the process of maintaining and preserving digital assets of all kinds (data, documents, audio, video, etc.) to ensure they remain useful and can be accessed over time. The process of digital curation for data is called data curation.


Digital Humanities

A domain of research and practice focused on the intersection of humanities research and digital technologies and methods. Like all subject domains, the precise boundaries and definition are a topic of ongoing debate; some community definitions have been collected for instance at the Day of DH site: http://dayofdh2013.matrix.msu.edu/members/


digital object

A digital object is a discrete digital asset. For the purposes of the DRS, an object can be considered a single file, like a PDF or a JPG. Objects are often also referred to as items, files, materials, or resources.


Digital Object Identifier (DOI)

A character string assigned to a digital object that links to metadata about the object, including a URL where the object may be found. A DOI does not change over the lifetime of the object, thus providing a permanent, stable way of citing the object. The DOI system is implemented by a network of organizations overseen by the International DOI Foundation.


digital preservation

Digital preservation is the process of maintaining and storing digital assets (both digitized and born digital) so that they can be used over time, regardless of the currency of the file format or the availability of a physical medium to interpret the asset.


digital publishing

Also known as electronic publishing or e-publishing. Differs from “desktop publishing” in that it follows the formal practices of traditional publishing, but does not include the traditional processes of printing and distributing the finished product. Often makes use of XML and stylesheets to provide for reflowable content that can be easily formatted for output on different devices and platforms.


digital repository

A digital repository is a digital object storage solution that stores and maintains a broad range of digital files from various sources.


Digital Repository Service (DRS)

The DRS [https://repository.library.northeastern.edu] is a secure repository system, designed to store and share scholarly, administrative, and archival materials from the Northeastern University community.


Digital Scholarship Commons (DSC)

The Digital Scholarship Commons is the physical space on the second floor of Snell Library for faculty and doctoral students to collaborate, study, or seek help with digital scholarship projects. The DSC is the home of the Digital Scholarship Group (DSG), the Center for Advancing Teaching and Learning Through Research (CATLR), and Academic Technology Services (ATS).


Dublin Core

Dublin Core is a widely used descriptive metadata standard that uses a very basic set of elements to describe an object (Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights). Every object in the DRS has a Dublin Core record.




embargo

An embargo is a ban on viewing or downloading a particular digital object. Objects in the DRS that are embargoed typically have an embargo date, which is the date the object will become publicly available.


Encoded Archival Description (EAD)

An XML language for finding aids. EAD is a non-proprietary de facto standard maintained by the US Library of Congress and the Society of American Archivists. Finding aids are inventories, indexes, or guides that are created by archival and manuscript repositories to provide information about specific collections to help people find items of interest.


Exif (Exchangeable image file format)

Often called EXIF, this is a framework for metadata, and a set of metadata fields, that can be embedded in image files. E.g., a typical JPEG file generated by a modern digital camera will have Exif data that includes the date and time the picture was taken, the brand and model of the camera, exposure information, etc.



eXist

An open-source database and publishing system that works directly with XML data. It provides full-text searching and also searching based on the XML markup, as well as the ability to develop fairly complex user interfaces. eXist can also be used as a way of generating data from XML collections in other formats appropriate for visualizations. DSG is exploring the use of eXist as a way of publishing XML data.



fair use

An exception to the requirement to ask permission for use of copyrighted material. Its application is decided on a case-by-case basis. Whether a usage is fair is determined by an analysis of four factors: the purpose and character of the use, including whether it is educational or commercial; the nature of the copyrighted work; the amount and substantiality of the portion used; and the effect of the use on the potential market for or value of the work.



Fedora

Fedora (Flexible Extensible Digital Object Repository Architecture) is an open-source repository framework supported by a community of users, as well as by DuraSpace. The DRS uses Fedora as its underlying storage architecture.


file formats

Different types of computer files, designed for use by particular software or operating systems. Formats are usually denoted by the final part of a file name, also called the extension. For example, file “a.exe” is an executable file, encoded for launching software within the Windows operating system. We can tell because it has a file name extension of “exe”. Wikipedia has a list of file formats here: http://en.wikipedia.org/wiki/List_of_file_formats
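
Reading the extension from a file name can be sketched with Python's standard `pathlib` module. The helper name `extension_of` is hypothetical, chosen for this example. Note that an extension is only a naming convention: it suggests a format but does not guarantee what the file actually contains.

```python
from pathlib import Path


def extension_of(file_name: str) -> str:
    """Return the lowercase extension without the leading dot ('' if there is none)."""
    return Path(file_name).suffix.lstrip(".").lower()


print(extension_of("a.exe"))       # exe
print(extension_of("Report.PDF"))  # pdf
print(extension_of("README"))      # (empty string: no extension)
```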




Git

An open-source version control system. Git offers a distributed system of version control: whenever you clone (make a copy of) someone else’s repository, your copy has all of the files and version information that the original does. Git repositories can synchronize later changes between themselves, rather than relying on a centralized server to do it for them. Among other things, this means that work can be done entirely offline, and that any clone can serve as a backup for another. The free book Pro Git is an excellent resource: http://git-scm.com/



GitHub

A website that facilitates Git-driven version control by offering hosting for public or private repositories, as well as a graphical interface for managing versions and changes. Though often used to develop code, GitHub is a useful tool for collaborating on many types of projects, especially open-source ones. The website is located at https://github.com




A type of Permanent URL.



Hydra

See Samvera.



institutional repository (IR)

An online archive for collecting, preserving, and disseminating digital copies of the intellectual output of an institution, particularly a research institution.








Linux

An operating system (i.e., software that runs your computer, like Windows or Mac OS X or iOS) that runs not only on Macintosh and standard PC computers but also on phones and other devices. Often called “GNU/Linux”, it comes in a variety of distributions, including Ubuntu, Debian, and Red Hat; Android is also built on the Linux kernel. The main advantage of the Linux operating system is that it is free (as in speech) open-source software.




metadata

“Data about data,” such as dates of creation or modification, language or languages, authorship details, document length or file size, and so on. Typically used for the discovery, management, organization, and retrieval of information.



MySQL

An open-source relational database system. MySQL is used as the back end of several products DSG supports and uses, including Drupal, OJS, Omeka, and WordPress.



name disambiguation

The process of using authority control or assigning unique identifiers to distinguish between authors and researchers with the same names.


nonconsumptive research

A way of performing research by digitally scanning a corpus of written material rather than reading it, in order to analyze trends and themes on a massive scale. Also known as nonexpressive research.




Omeka

An open-source content management system for online digital collections.


Open Access

Unrestricted access to the products of scholarly research.


Open Journal Systems (OJS)

An open-source journal management and publishing platform produced by the Public Knowledge Project with the goal of expanding the field of open-access journal publishing.


open source

Describes software for which the original source code is made freely available and may be redistributed and modified.



oXygen

A proprietary (as opposed to free as-in-speech or open source) text editor designed specifically to make editing XML documents easier. Almost all DSG staff regularly use oXygen to edit TEI, XSLT, HTML, and other XML language documents.



permanent URL

A URL for an object on the web that will never change, even when the object itself moves locations.



PHP

A scripting language for web development. It is a server-side language commonly used to build dynamic database-driven websites. PHP is commonly paired with MySQL.





regular expressions (regex)

A formalized and powerful system for describing patterns in text, most often used for “wildcard” search-and-replace.
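
A small search-and-replace sketch using Python's standard `re` module is shown here. The pattern and the helper name `to_iso_dates` are assumptions made for this example; regular expression syntax also varies somewhat between tools.

```python
import re

# \d{2} means "exactly two digits"; parentheses capture each part for reuse.
US_DATE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def to_iso_dates(text: str) -> str:
    """Rewrite every MM/DD/YYYY date in the text as YYYY-MM-DD."""
    # \3, \1 and \2 refer back to the captured year, month and day.
    return US_DATE.sub(r"\3-\1-\2", text)


print(to_iso_dates("Scanned on 03/15/2014 and revised on 11/02/2015."))
# Scanned on 2014-03-15 and revised on 2015-11-02.
```

One pattern finds every date of that shape, which is what makes regular expressions more powerful than searching for a fixed string.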


Ruby on Rails

Commonly known simply as “Rails”, Ruby on Rails is a web application framework built on the Ruby programming language. It uses the MVC (model-view-controller) style of programming. Ruby on Rails is used to build web applications such as Samvera and the Digital Repository Service.




Samvera

Samvera (formerly known as Hydra) is a suite of tools and technologies that processes the files and metadata stored in Fedora so users can interact with them in a meaningful way. Samvera uses Ruby on Rails to program, Solr to index, and Blacklight to enable searching and browsing.


style sheet

A style sheet is a set of rules used to convert a markup document, like HTML or XML, into a presentable format, or into another file format altogether.


Subversion, SVN

(Technically called “Apache Subversion”) A powerful free (as in speech) open-source revision control or version control system.






Unicode

A mapping of almost all the characters of all the scripts in the world to numbers. Thus a computer (which, after all, only speaks in numbers) can refer to any character. E.g., the letter “A” is internally represented by the number 65. Unicode is particularly useful because it is an almost universally used international standard: almost all computers world-wide use this system.
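
The character-to-number mapping can be inspected directly in Python, whose built-in `ord` and `chr` functions convert between a character and its Unicode number. The helper name `code_points` is hypothetical, chosen for this sketch.

```python
def code_points(text: str) -> list[int]:
    """Return the Unicode number (code point) of each character in the text."""
    return [ord(character) for character in text]


print(ord("A"))            # 65
print(chr(65))             # A
print(code_points("DRS"))  # [68, 82, 83]

# Characters outside the basic Latin alphabet have numbers too.
e_acute = "\u00e9"         # the letter e with an acute accent
print(ord(e_acute))        # 233
```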



version control

Version control, or revision control, is a method of managing and tracking changes to files over time. Version control systems, such as Git and Subversion, store recorded changes so users can revert to previous file or repository versions, if necessary.



VIVO

An open-source, semantic web application that enables the discovery of research and scholarship across disciplines at a particular institution and beyond. It is populated with detailed profiles of faculty and researchers, including information such as publications, teaching, service, and professional affiliations.