Supporting and Enhancing Scholarship in the Digital Age: The Role of Open-Access Institutional Repositories

Leslie Chan

Abstract: Scholarly communication and publishing are increasingly taking place in the electronic environment. With a growing proportion of the scholarly record now existing only in digital format, serious and pressing issues regarding access and preservation are being raised that are central to future scholarship. At the same time, the desire of scholars to maximize readership of their research and to take control of the scholarly communication process back from the restrictive domain of commercial publishing has prompted the proliferation of access options and experimental models of publishing. This paper examines the emerging trend of university-based institutional repositories (IRs) designed to capture the scholarly output of an institution and to maximize the research impact of this output. The relationship of this trend to the open access movement is discussed, and challenges and opportunities for using IRs to promote new modes of scholarship are outlined.

Résumé : La communication et l'édition savantes ont de plus en plus lieu dans un milieu électronique. En effet, une proportion croissante de la recherche existe seulement sous forme numérique. Cette situation soulève des questions sérieuses et pressantes sur l'accès et la préservation qui sont vitales pour l'érudition future. En même temps, les désirs de chercheurs de maximiser leur lectorat et de retirer la communication savante de l'emprise des éditeurs commerciaux ont mené à une prolifération d'options pour l'accès et de modèles d'édition expérimentaux. Cet article examine la tendance émergeante vers des entrepôts institutionnels basés dans des universités conçus de manière à préserver la production savante d'une institution et à maximiser son impact sur la recherche. Cet article discute du rapport de cette tendance au mouvement favorisant un accès libre et il propose des défis et des occasions dans l'utilisation d'entrepôts institutionnels pour promouvoir de nouvelles approches érudites.


The institutional repository (IR), a university-based digital-asset management system, is fast emerging as a key component of the current debate on open access (OA) and reform of the scholarly communication process. Some proponents of the open access movement see the IR or the open-access archive as the most cost-effective and immediate route to providing maximal access to the results of publicly funded research, thereby maximizing the potential research impact of these publications (Harnad, 2001; 2001b; 2003). Some research libraries see IRs as a means to expand the amount and diversity of scholarly material that is collected and preserved, thus enhancing teaching, learning, and research at the host institution and beyond (McCord, 2003). Some see IRs as a way to enhance an institution's prestige or branding by showcasing its faculty's research output (Crow, 2002). Yet others see IRs as an essential infrastructure for the reform of the entire enterprise of scholarly communication and publishing (Guédon, 2003).

Given that the IR has only emerged in the last two years, the diversity of opinions regarding its potential for scholarly communication is understandable. In addition, there is considerable debate as to who should be responsible for setting up IRs and maintaining them, the types of contents to be deposited, the personnel and budgetary implications, the technology to employ, the intellectual property rights of authors and publishers that need to be resolved, and how to ensure quality and authentication of archived material (Ware, 2004). There is also considerable discussion regarding the role, if any, that government and research-funding agencies should play with regard to IRs, as well as debate regarding the relationship between institution-based repositories and discipline-based repositories such as the famous physics e-prints archive, known as arXiv, which is the precursor of the current institutional-repository movement (Lynch, 2003).

This paper examines the roles of the IR in the open-access movement and assesses its impact on scholarly communication. By focusing on a recent implementation of an IR at the University of Toronto, specific benefits and potential problems associated with setting up and developing an institutional repository are examined. Issues related to faculty participation, content selection, long-term preservation, legitimacy and certification, costs, and responsibilities are also discussed. The paper begins with an overview of the context that gave rise to the IR movement, and it concludes with a discussion of the potential impact of open-access repositories on the scholarly communication process.


The so-called "Scholarly Communication Crisis" is now in its third decade, and awareness of the inherent problems in the current system of scholarly publishing has spread from the research-library community to the hallways of the academies and even to the mainstream press.1

In short, the scholarly communication crisis encompasses two distinct though interrelated problems. On the one hand, serial-subscription costs, particularly for science and medical journals, have been increasing rapidly over the last two decades, often at rates far above inflation. At the same time, research-library budgets have been decreasing or are otherwise unable to keep pace with price increases. The result is that libraries are spending more but getting less, in terms of journal titles and new monograph acquisitions, as more of the budget is consumed by serial subscriptions. So even the richest university libraries cannot afford to subscribe to most of the journals that their faculty need for their research and teaching (Edwards & Schulenburger, 2003; Park, 2002). This is referred to as the "affordability" problem, and it is well documented by the Association of Research Libraries.2

Another problem is that with most journals being published electronically and distributed in bundled databases controlled by large commercial publishers, libraries and users are facing increasingly restrictive licensing terms on who can access the databases and when they can do so, as well as how users can share the material (Cox, 2001). No longer do libraries subscribe to tangible publications of which they hold a physical copy, and libraries and their users now only have access to the databases for as long as the libraries maintain their subscription (Davis, 1997). Libraries often lose access to the back-files of journals when they terminate their subscriptions, resulting in serious gaps in their serial holdings (McGinnis, 2000). Licensed access thus calls into question the short-term as well as long-term "accessibility" to the scholarly record (Muir, 2003). It also raises questions as to who should be responsible for the preservation and stewardship of the scholarly record in the electronic environment (Ayre & Muir, 2004; Friedlander, 2002).

The "affordability" and the "accessibility" problems together constitute a countervailing trend that undermines the traditional academic co-operative and sharing ethos of scholarly communication (Lynch, 1994; Okerson, 1992; Suber, 2003). More importantly, they act to undermine what scholars hope to achieve with their publications, namely research impact. Authors who contribute freely to academic journals do not expect any monetary return for their writing. Authors also perform peer review as part of their professional obligation and contribution to their disciplines. In exchange, they wish their papers to be widely circulated, read, cited, and built upon. This process in turn generates further research questions and funding proposals, and increases the impact of the research. Limiting access leads to lower visibility and needless loss of research impact for the researchers. It also leads to lower returns on investment for the research institutions and the funding bodies that support the researchers (Harnad, 2001; Harnad, Carr, Brody, & Oppenheim, 2003).

The open-access movement

Following the definition set forth by the Budapest Open Access Initiative (2002), open access refers to literature that provides:

free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited. ("Budapest Open Access Initiative," p. 3)

This definition underscores the open or non-proprietary nature of Internet technologies and their potential, as well as the recognition that research results are best utilized when others are permitted to build upon them, provided credits are duly given.

Since the public release of the Budapest Open Access Initiative (BOAI), a chorus of statements from a broad range of institutions, including national and international funding bodies, has declared support for and commitment to open access to scholarly and scientific literature (see summary and references in Suber, 2004). Governments in the U.S. and the U.K. have also begun to look into the issue of access to publicly funded research results and how they could be made publicly available (Brahic, 2004; Zandonella, 2003). Funding bodies such as the Max Planck Society in Germany3 and the Wellcome Trust in the U.K.4 are promoting the view that the cost of open-access dissemination is part of the cost of research, so researchers who receive funding from these organizations are obliged to provide open access to the resulting research publications.

In the U.S., the National Institutes of Health developed PubMed Central for archiving published research papers in medicine and the biological sciences. The Energy Department recently set up its E-print Network to provide access to research funded by the Department. These agencies recognize that open repositories of research literature facilitate discovery and retrieval and thus advance their institutional missions.

Clearly, open access is no longer a marginal, scholar-driven initiative, but a mainstream movement that is receiving worldwide attention from researchers, institutional leaders, policymakers, and funding bodies, as well as commercial publishers.5

Strategies for achieving open access

The BOAI further recommends two complementary strategies for authors to participate in open access:

  1. Publishing in open-access journals, which do not charge for subscription or access fees, but instead rely on other methods for covering the publishing expenses. This is generally known as Open-Access Publishing (OAP).

  2. Self-archiving, the practice of depositing e-prints (published papers and pre-prints) into open electronic archives - preferably ones set up by the researchers' own institutions. E-print archive and digital repository are common terms that are used to refer to institutional servers. If an institution-based archive does not exist, authors should deposit their e-prints into a centralized discipline-based archive for the time being. This is generally known as Open-Access Archiving (OAA).

One model of OAP that has been receiving substantial attention and debate in recent months is the author-fee model, whereby authors are requested to pay an article processing fee to cover the cost of peer-reviewing and other associated publishing costs. BioMed Central and the Public Library of Science are two prominent publishers that are implementing this model.

At the same time, many society publishers are making the electronic version of their journals available online for free while continuing to charge for print subscriptions. Other society-based publishers are exploring means to convert their subscription-based journals into open-access publications, and Willinsky (2003) believes that some society publishers could provide open access to their journals while maintaining services to their members. The Open Society Institute and the Scholarly Publishing and Academic Resources Coalition (SPARC) have produced useful resources that guide society publishers through the process and financial options of open-access publishing.

It is important to note that current discussions of open access tend to treat Open-Access Publishing (OAP) as synonymous with open access, and so the potential of Open-Access Archiving (OAA) is often underestimated. While OAP is the more intuitive way of achieving open access, current figures show that only about 1,100 journals are open access (according to the Directory of Open Access Journals). This number constitutes only about 5% of the 23,000 peer-reviewed titles listed by Ulrich's, a widely consulted source on periodical references. This means that the majority of papers published today are accessible only through payment unless authors and their institutions provide access to them through institutional archiving. This is indeed a key impetus for the recent surge of open-access archiving.
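As a quick check on the proportion quoted above, the following minimal sketch recomputes the share of open-access journals from the two figures given in the text. The variable names are illustrative only, and both counts are the approximate values reported in the paragraph, not fresh data.

```python
# Approximate figures taken from the paragraph above.
open_access_journals = 1_100    # titles listed as open access
peer_reviewed_titles = 23_000   # peer-reviewed titles in the periodicals directory

# Share of peer-reviewed journals that are open access, as a percentage.
share_percent = round(open_access_journals / peer_reviewed_titles * 100, 1)

# Titles that remain accessible only by subscription or payment.
toll_access_titles = peer_reviewed_titles - open_access_journals

print(f"Open-access share: {share_percent}% of peer-reviewed titles")
print(f"Toll-access titles: {toll_access_titles}")
```

The computed share is a little under 5%, which is consistent with the rounded figure used in the text.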

Enabling technology

Another impetus for the recent growth of IRs is the emergence of enabling technology and the availability of open-source software applications for setting up repositories. In response to the need to develop interoperability frameworks for linking the growing number of e-print archives, the Open Archives Initiative was formed in 2000, with the mission to develop and promote "interoperability standards" that aimed to facilitate the efficient dissemination and discovery of digital content. While the early concern of the Open Archives Initiative (OAI) was primarily the interoperability of well-defined e-prints, it has been growing to extend interoperability to a broad range of digital objects (such as datasets, video, databases, theses, technical reports, and other grey literature), as well as added applications and services (Van de Sompel, Young, & Hickey, 2003).

The highly distributed nature of resources scattered across the World Wide Web has made meaningful searches difficult and often elusive. It is often difficult to determine the origins and authenticity of materials retrieved from a general search engine. This is, in large part, because digital resources often lack a proper description of what the resource is about, commonly known as metadata. The interoperability standard developed by the Open Archives Initiative, the Protocol for Metadata Harvesting (OAI-PMH),7 is designed to harvest metadata and associated resources that are distributed across different OAI-compliant servers, thus connecting all distributed servers into a seamless global digital library. OAI-PMH has built-in support for basic Dublin Core metadata, an internationally recognized standard used in library and digital-resource cataloguing (Dublin Core Metadata Initiative).

The OAI-PMH is now widely adopted by library, publishing, and scientific communities eager to ensure that their resources on the Web are interoperable with each other.8 The accessibility, and therefore impact, of materials is greatly reduced if they remain invisible to others because of the lack of interoperable standards. OAI-PMH is a key building block that makes the institutional repository the platform of choice to support open access to institutional research output.
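To make the harvesting idea concrete, here is a minimal sketch of how a harvester might form a request and read the Dublin Core metadata that comes back. It assumes version 2.0 of the protocol, with its `ListRecords` verb and `oai_dc` metadata prefix. The base URL, the record identifier, and the sample response are invented for illustration, and no network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Hypothetical, heavily trimmed response; a real one carries more elements.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1234</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An illustrative article title</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, titles, and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records


request_url = list_records_url("http://repository.example.org/oai")
harvested = parse_records(SAMPLE_RESPONSE)
print(request_url)
print(harvested)
```

A service provider would repeat this against many repositories and merge the harvested records into one index, which is what allows searching across distributed archives.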

Views of institutional repositories

A common definition of an IR is that it is a Web-based archive of scholarly material produced by the members of a defined institution. Accordingly, the content of the repository, as well as the policy on selection and retention, is also defined by the institution (Johnson, 2002). This is in contrast to the discipline- or subject-based repository, such as the physics archive or the CogPrints archive, whose depositing policies are determined by the research communities and often develop in a rather organic manner without much preplanning.

Additional qualities of an IR are that it should be openly accessible and interoperable with other repositories, preferably using the Open Archives Initiative Protocol for Metadata Harvesting. The inclusion of long-term digital preservation is considered by some to be an essential feature of an IR, though this is a contentious area. For Clifford Lynch (2003), an IR encompasses:

a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.

It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution. (p. 5)

This definition identifies an IR as more than simply a technology platform for digital storage. It implies strategic planning for institution-wide digital-asset management and the development of a support structure and related applications. In addition to facilitating access to e-prints and a variety of publications, Lynch also sees a role for IRs in supporting teaching and learning, digital preservation, new forms of scholarly publishing, and even as a platform for capturing many of the events of campus life, such as lectures, symposia, and performances.

Lynch's broad definition is, of course, not without its detractors. In particular, Stevan Harnad, who has been for over a decade the strongest and most persistent advocate of freeing research literature by author self-archiving, objects to the diluted and unfocused use of IRs (Harnad, 2003).9 Harnad prefers to see institutional archives focusing on the mission of storing and providing open access to peer-reviewed publications generated by faculty, leaving the other functions envisioned by Lynch, such as new forms of scholarly publishing, to other constituents. This is because, for Harnad, a "publication" is a well-defined entity:

…for scholarly and scientific purposes, only meeting the quality standards of peer review, hence acceptance for publication by a peer-reviewed journal counts as publication. Self-archiving should on no account be confused with self-publishing (vanity press). (Self-archiving pre-refereeing preprints, however, is an excellent way of establishing priority and asserting copyright). (2002, p. 1)

Thus, making one's publication accessible is not the same as scholarly publishing, and Harnad regards attempts to implement new forms of quality control and experimentation with new forms of digital publishing as distractions from the principal goal of freeing the primary research literature that is kept behind toll-gates.

Harnad also persistently points out that currently, the technical infrastructure for setting up institutional archives is in place and the costs are trivial. However, filling up these archives once they have been set up remains a real challenge, primarily because, with rare exceptions, institutions lack consistent policies that would encourage their faculty to systematically archive their work.

To lower the technology barrier and encourage institutions to set up OAI-compliant archives, Harnad and his research team at the University of Southampton developed the Eprints software, which was released as open-source software in 2001. Version 2.3 was released in March 2004, and currently over 120 known archives are running on the software.

The launch of the MIT DSpace Institutional Repository in 2002 and the subsequent release of the DSpace software under an open-source licence have also generated a great deal of excitement and encouraged a second wave of institutions from around the world to begin installing, testing, and evaluating the software for local use. DSpace has managed to create added enthusiasm for the self-archiving and IR movement because the software is designed more for community-based organizations than discipline-based organizations, and so it is better suited for an institution-based application.

Further, while Eprints is designed specifically to facilitate archiving of more familiar or traditional forms of scholarly publications (journal articles, book chapters, conference proceedings, et cetera), DSpace is designed to manage heterogeneous types of digital content, including multimedia files, datasets, and new and emerging forms of scholarly representations. In other words, Eprints is well suited for the need to provide free access to research papers, while DSpace is better suited for managing diverse digital content (Nixon, 2003). So the choice of software application will depend on the needs of the institution. As both applications are OAI-compliant, they are indeed complementary, and they can be made to supplement each other (an example is provided below).

A further difference between the two software applications, and a key one in terms of function, is the use of persistent identifiers by DSpace. Persistent identifiers are a means of providing a permanent and unique digital identifier for each specific resource or document stored in the repository. The persistent identifier allows users to locate the appropriate document even when the linked document's physical location has changed, provided that the persistent identifier is maintained with the correct current associated location. This form of mapping is generally performed by a resolver database, and there are several existing systems or schemes for assigning persistent identifiers; some services are free, while others are fee-based. DSpace employs the free Corporation for National Research Initiatives (CNRI) Handle System, which is "a comprehensive system for assigning, managing, and resolving persistent identifiers, known as 'handles,' for digital objects and other resources on the Internet" (Corporation for National Research Initiatives, 2003, p. 1). The provision for persistent identifiers is an important feature of DSpace, and it reflects DSpace's concern for digital preservation and long-term access. The importance of persistent identifiers is discussed in a subsequent section.
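The mapping performed by a resolver database can be illustrated with a toy example. The sketch below is an in-memory stand-in written for this discussion. It is not the CNRI Handle System or its API, and the identifier and URLs are made up.

```python
class ToyResolver:
    """In-memory illustration of a resolver: persistent identifier -> current location."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        """Assign a new persistent identifier to a document's current location."""
        if identifier in self._locations:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._locations[identifier] = url

    def relocate(self, identifier, new_url):
        """Record that the document has moved; the identifier itself never changes."""
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        """Return the location currently associated with the identifier."""
        return self._locations[identifier]


resolver = ToyResolver()

# Hypothetical identifier and locations, for illustration only.
resolver.register("12345/678", "http://old-server.example.org/papers/678.pdf")
resolver.relocate("12345/678", "http://new-server.example.org/archive/678.pdf")

current_location = resolver.resolve("12345/678")
print(current_location)
```

Citations and links that use the identifier keep working after the move, but only because the mapping was updated, which is the maintenance condition noted in the paragraph above.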

In addition, while MIT and Hewlett-Packard are the core partners developing DSpace, there is a growing technical user base that is providing useful suggestions for improvement of the core program. There is also the DSpace Federation, with members from selected partnering universities guiding the development of the software toward an open-source development model (Smith et al., 2003).

T-Space at the University of Toronto

In the spring of 2003, the University of Toronto libraries, along with eight other research libraries, joined the DSpace Federation to test the adaptability of the system for local needs and to document the successes and barriers to faculty participation. T-Space was chosen as the name of the local repository, and an early adopters program was launched. T-Space was marketed in the university as an easy-to-use, dependable institutional-repository service where faculty and staff can manage, host, preserve, and distribute scholarly materials in digital formats, including e-prints, images, datasets, learning objects, e-books, text, audio, and video.

There is no cost to faculty for using the service, and the expectation is that the archived material will be openly accessible to all. T-Space does, however, provide various levels of access control, so that material can be restricted to members of individual or grouped university communities.

T-Space is designed to model the organizational structure of an institution and to accommodate the multidisciplinary and organizational needs of a large institution. Designated library staff launch custom T-Space portals, known as "communities," for research groups or departments that request the service, and they also provide customization, guidance in developing a workflow process, metadata requirements, and access policies. T-Space is then made up of "communities" that manage their own "collections," with locally defined policies for submission, workflow, and access provision.

According to the help file on T-Space, a "community" is "a unit at U of T that produces research, has a defined leader, has long-term stability, and can assume responsibility for setting community policies. Each community must be able to assign a coordinator who can work with T-Space staff" (University of Toronto Libraries, p. 2).

In practice, a "community" may be a department, a research centre, an institute of studies, or even an entire suburban campus, as in the case of UTSC, the Scarborough campus of the university.
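The arrangement of communities and collections described above can be pictured as a simple nested data model. The sketch below is illustrative only and does not reflect the actual DSpace or T-Space schema. The class names, policy fields, coordinator, and item labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Collection:
    """A collection holds items and carries its own locally defined policies."""
    name: str
    submission_policy: str = "community members may submit"
    access_policy: str = "open to all"
    items: List[str] = field(default_factory=list)


@dataclass
class Community:
    """A research-producing unit with a coordinator who liaises with repository staff."""
    name: str
    coordinator: str
    collections: List[Collection] = field(default_factory=list)

    def item_count(self):
        return sum(len(c.items) for c in self.collections)


@dataclass
class Repository:
    """The repository as a whole is made up of communities."""
    name: str
    communities: List[Community] = field(default_factory=list)

    def item_count(self):
        return sum(c.item_count() for c in self.communities)


# Hypothetical contents, for illustration only.
biology = Collection("Biology", items=["article-1", "article-2", "dataset-1"])
social_sciences = Collection(
    "Social Sciences",
    access_policy="restricted to community members",
    items=["chapter-1"],
)
campus = Community("Example Campus", coordinator="A. Coordinator",
                   collections=[biology, social_sciences])
repository = Repository("Example Repository", communities=[campus])

print(repository.item_count())
```

Keeping policies on the collection rather than on the repository mirrors the point that each community sets its own rules for submission, workflow, and access.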

T-Space provides a consistent framework for navigation and searching (see Figure 1). It supports full text searching, either within an individual collection or across collections. T-Space is a registered "data provider" with the Open Archives Initiative, and the content is also indexed by Google, as well as OAI search services such as OAIster. The latter allows users to search across OAI-compliant institutional servers that house only academic and quality material. Materials deposited in T-Space are therefore highly visible on the Web as well as searchable in a more trusted environment like OAIster.10

Submissions and retrievals are done through a simple Web interface. Submission is a seven-step procedure of pointing and clicking, as well as typing in the required metadata, such as author, title, dates, abstract, citation source, and other relevant information (see Figure 2).

Users may browse the content of the repository by title, author, or date. For any browse option, entries are hyperlinked to a descriptive record for the item, and the details available are dependent on the metadata entries. The full record for each item is in the Dublin Core format and can be viewed by clicking on the "Show full item record" button within the brief record screen. A Uniform Resource Identifier (URI) or handle is also provided for each item (see Figure 3).
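The browse options described above amount to sorting the same descriptive records on different metadata fields. The following sketch uses three Dublin Core element names (`title`, `creator`, `date`). The records, handles, and the `browse` helper are hypothetical and are not drawn from T-Space.

```python
# Hypothetical descriptive records keyed by Dublin Core element names.
ITEMS = [
    {"handle": "12345/3", "title": "Population cycles in small mammals",
     "creator": "Smith, J.", "date": "1998-05-01"},
    {"handle": "12345/1", "title": "A survey of repository software",
     "creator": "Young, A.", "date": "2003-09-15"},
    {"handle": "12345/2", "title": "Notes on digital preservation",
     "creator": "Lee, K.", "date": "2001-11-30"},
]

BROWSE_FIELDS = ("title", "creator", "date")


def browse(items, by):
    """Return the items ordered by one of the supported browse fields."""
    if by not in BROWSE_FIELDS:
        raise ValueError(f"unsupported browse field: {by}")
    # Case-insensitive ordering; ISO-style dates sort correctly as text.
    return sorted(items, key=lambda item: item[by].lower())


by_title = [item["handle"] for item in browse(ITEMS, "title")]
by_creator = [item["handle"] for item in browse(ITEMS, "creator")]
by_date = [item["handle"] for item in browse(ITEMS, "date")]

print(by_title, by_creator, by_date)
```

Because every browse view is derived from the metadata, the detail available to users depends directly on how completely the submitter filled in each field.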

At present, metadata about copyright is not part of the submission process. However, submitters are required to warrant that they are the copyright holder, or that they have obtained the necessary permission to post their material on T-Space. Further, they must agree that they will grant a non-exclusive licence to the library to distribute and preserve the items, although the libraries will not own the content on T-Space (see Figure 4).

Who is participating?

As of April 2004, roughly a year after its initial launch, there are eight "early adopter" communities on T-Space, with a total of 850 items in the repository. The number and size of collections within each community vary greatly, with some communities having only a few items. The range of items deposited includes policy papers, technical reports, previously published journal articles, conference proceedings, video clips, e-monographs, datasets, digitized versions of out-of-print books, and curriculum resources in the form of websites. The remainder of this section focuses on the UTSC community and its collections, as well as how some faculty are making use of the repositories to meet their research needs.

The pilot project at UTSC began in September 2003, and it is ongoing. As in other communities on T-Space, participation is entirely voluntary. For the pilot phase, no specific collection policy has been defined, and faculty are encouraged to submit research materials that they would like to make freely available through the repository. Faculty who have been experimenting with digital publishing and multimedia courseware development are also encouraged to submit material. Given the fragility of electronic objects and the suitability of T-Space for housing them, it was felt that priority should be given to these materials. Faculty who have collected long-term data that are in need of archiving and preservation are also encouraged to participate.

The Vice-Principal of Research at UTSC, Rudy Boonstra, lent full support to the pilot project and also provided seed funding for hiring a work-study student to do the proxy archiving for interested faculty.11 The VP also periodically sent out communications to department chairs to encourage participation. The library's decision to perform archiving is intended to minimize the workload of the faculty, to fill the repository quickly, and to learn about the range of issues that may arise as a result of diverse types of submission. The student assistant is employed 12 hours per week digitizing print documents and converting various file formats into Adobe's portable document format (PDF), a format explicitly supported by T-Space. The student also performs copyright checking and sends out permission requests to publishers where necessary. I play the role of co-ordinator, supervising the student, communicating with faculty, and liaising with the service and technical librarians who are overseeing the deployment of T-Space at the central libraries. I also approve the materials deposited by the student to ensure accuracy. Time spent on the entire workflow has been documented, including document gathering, conversion, metadata entries, rights clearance, and approval. This information is available to other communities that wish to set up a similar workflow process.

Between September 2003 and April 2004, a total of 362 items were deposited in the UTSC community on T-Space. Of these, 165 items are in the biology collection. With the exception of two datasets, all items in the biology collection are previously published journal articles. The next largest collection is social sciences, with 38 items, all of which are previously published book chapters or peer-reviewed articles. Three "e-monographs" and three digitized versions of out-of-print books were also archived.

Traditional publications

The fact that the biology collection has the highest number of submissions is not surprising, given that the Vice-Principal of Research at UTSC is a champion for the project and that he is an active biological researcher with a strong publication record. So when he called for participation on campus, other biological researchers were the first to respond. It is also worth noting that many of the biologists who submitted articles to T-Space already had most of their papers on their personal websites and were cognizant of the benefits of disseminating their research on the Web. The materials on personal websites are widely scattered and difficult to locate, however, so many biologists immediately saw the benefit of having their publications on a central server with a persistent identifier associated with each publication. Many faculty already have online CVs, and some see the use of the persistent identifier as an effective way of linking their personal sites to the publications from an online publication list.

The idea of making one's older publications openly accessible was particularly attractive to two senior biologists, who saw T-Space as a great opportunity for giving new life to their early works that may have been forgotten. Their work still has significant scientific value, and it is also important for undergraduate teaching, as well as graduate-student training. More importantly, they see the archiving of their life's work as a validation of their contribution to their discipline and to the university.

The low number of articles submitted by humanities faculty is also not surprising, given that the primary form of scholarly communication in the humanities remains the monograph, although this trend has also been changing in recent years (Steele, 2003). The lack of a local champion in the humanities may also account for the low participation. Unlike colleagues in the biological sciences, many humanities faculty are not familiar with the Web as a publishing and distribution tool, nor are they familiar with the conversion process of print publications into digital formats; consequently, they are not as likely to take advantage of what T-Space has to offer.

Since biological research is generally team based, involving multiple researchers at different institutions, it is possible that the use of electronic tools for exchanging data, pre-prints, and post-prints is common practice. This is in contrast to research in the humanities, which is still primarily a solitary endeavour. Awareness of open access is also higher in the biological-research community because of high-profile repositories such as PubMed Central, funded by the National Institutes of Health in the U.S., which contains a large number of free articles in the biological sciences, and the highly publicized open-access journals such as the Public Library of Science's PLoS Biology and the many titles published by BioMed Central. So while many biologists were not initially familiar with open-access archiving, they were easily sensitized to the idea.

Humanities scholarship

It should, however, be pointed out that some humanities scholars at UTSC are amongst the first wave to experiment with the Web and electronic publishing. Based on his many years of research, historian Wayne Dowler developed an innovative Web-based teaching resource, Russian Heritage: Land, People and Culture. It is

a survey of Russian history from the rise of Kievan Rus in the ninth century to the death of Stalin in 1953. It consists of a number of modules that treat such topics as the Russian nobility, Russian women, the Russian peasantry, the Great Reforms of the 1860s, the Revolutions of 1917 and the Stalinist era. The Russian Heritage brings together primary texts, a large array of images and narration and music in order to convey the flavour of the Russian historical experience. (Dowler, 2004)

This resource was in danger of disappearing because the original server at UTSC on which the material resided was being decommissioned. The archiving of the Russian history site on T-Space not only guarantees continued access for students, it promises greatly improved visibility of the project because of the discovery services that will be able to locate it. The intellectual labour invested in the project is thereby preserved, and the resource is no longer hidden.

For the same reason, Dowler is in the process of requesting the digital rights to his out-of-print book on Russian history.12 This book is still in high demand for undergraduate teaching, but access has been an issue, since the copies in the library are deteriorating and they cannot be replaced. Providing access to the book through T-Space will give the book new life and enable learning and research on the topic.

Digital publications

Only a small number of digital-only publications are archived on T-Space at present. But these e-monographs represent an important category of scholarly publications for which an institutional repository provides an important infrastructure for both long-term preservation and access.

Jean Mason's e-monograph, "From Gutenberg's Galaxy to Cyberspace: The Transforming Power of Electronic Hypertext," is "based on the first-ever web-based doctoral dissertation at McGill University in Canada, where it successfully challenged academic norms and expanded the perimeters of doctoral discourse" (Mason, 2002, p. 3). The work examines the relationship between writing in the hypertext environment and the traditional writing process. Mason decided to publish her monograph in hypertext form for the obvious reason that it is the natural medium for the publication. She also decided to publish the work with an experimental e-publishing centre at UTSC because she believes that:

as a researcher dedicated to exploring the potential of hypertext, it is my responsibility to challenge existing boundaries to determine what is possible or desirable in that medium in my discipline. I believe both my original dissertation and this monograph combine the best to date of linear and hypertextual, word-based and multi-medial, traditional and futuristic options to create an ideal hybrid for the presentation of scholarly research in most disciplines. (Mason, 2002b, p. 8)

While acknowledging the risk of publishing in a non-traditional medium, Mason also recognizes the importance of access and preservation as key to the legitimacy of electronic scholarship. Mason predicts that:

Although my dissertation was among the first of web-based submissions, it will certainly not be the last. It seems to me that the ideal way to archive web-based dissertations and other scholarly publications would be for universities or national or international organizations to establish and maintain digital archives, including active "archives" that could be maintained and updated should a researcher so choose. (Mason, 2002b, p. 10)

Permanence and accessibility are two fundamental features of the cumulative scholarly record. When new knowledge is cited it becomes part of the permanent record of scholarship and gains legitimacy as a result. Gaining legitimacy and credibility is indeed the major hurdle that authors of electronic text still face (Ross, 2000; Willinsky & Wolfson, 2001). In evaluating digitally based work, sceptics of this new form of publication often focus on the medium and access problems, rather than on the quality of the scholarship (Siemens, 2000). Cornelius Holtorf, author of "Monumental Past," which is also archived on T-Space, points out in a personal e-mail:

Every time the URL changes I get frustrated messages about my work having 'disappeared' - there will be many more who simply give up without contacting me. This only fuels all those critics who doubt the durability of e-works. It's like moving a book to different shelf locations all the time, without updating the catalogue. (Cornelius Holtorf, Project Leader, "The Portrayal of Archaeology in Contemporary Popular Culture," Riksantikvarieämbetet, Stockholm, Sweden, personal communication, March 12, 2004)

Now that Mason's and Holtorf's works are archived on T-Space, each has acquired a persistent identifier, which should end the "moving shelf" problem described by Holtorf. This in turn should enable readers and critics to concentrate on the quality of the work, rather than on access difficulties. As Lynch has pointed out, since "most individual faculty lack the time, resources, or expertise to ensure preservation of their own scholarly work even in the short term, and clearly cannot do it in the long term that extends beyond their careers; the long term can only be addressed by an organizationally based strategy" (Lynch, 2003, p. 14). Institutional repositories are therefore particularly important when it comes to the long-term preservation of digital-only scholarly material.

Supporting open-access journal publishing

The journal Women's Health and Urban Life is a two-year-old publication edited and published by sociology professor Aysan Sever at UTSC. Like many other university- or department-based journals driven by dedicated faculty members, the journal is vibrant but has an uncertain future: it lacks long-term funding, and the limited subscription return is insufficient to cover production and printing costs. Many journals of this nature have come and gone because published articles are often difficult to access due to limited circulation and inadequate archiving. In this regard, journals with small print runs share the same problem of credibility and scholarly legitimacy as the electronic publications discussed earlier, and authors are often reluctant to submit to such journals for fear of irrevocable loss of their scholarly record or low readership.

The institutional repository is a good way to remedy the problem of long-term archiving. It also provides worldwide access to these local journals, thereby vastly increasing the number of potential readers and contributors. For small journals with limited resources that lack an e-publishing infrastructure, the IR also provides a ready-made, low-barrier electronic platform for managing the publications and making them accessible.

In the case of Women's Health and Urban Life, a website for the journal was created, with a table of contents page for each issue. Articles listed in the table of contents link directly into the T-Space archive.

As each article in the archive is given a persistent handle, it then becomes the permanent version for the record. This persistence of access means that authors are assured that their intellectual efforts are not lost and that they are credited, while researchers can be more secure in citing the persistent source from the archive.
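
The way a persistent handle keeps a citation stable can be sketched as follows. This is a toy illustration, not T-Space's or the Handle System's actual implementation: the `HandleRegistry` class, the handle `12345/678`, and the example server addresses are all hypothetical. Only the `hdl.handle.net` proxy address reflects the public resolver commonly used for handles.

```python
from dataclasses import dataclass, field


@dataclass
class HandleRegistry:
    """Toy stand-in for a handle resolver: maps a stable identifier to a current URL."""
    locations: dict = field(default_factory=dict)

    def register(self, handle: str, url: str) -> None:
        self.locations[handle] = url

    def move(self, handle: str, new_url: str) -> None:
        # Only the registry entry changes; the handle that readers cite does not.
        if handle not in self.locations:
            raise KeyError(f"unknown handle: {handle}")
        self.locations[handle] = new_url

    def resolve(self, handle: str) -> str:
        return self.locations[handle]


def citation_link(handle: str, proxy: str = "https://hdl.handle.net") -> str:
    """Build the link an author would cite: proxy address plus the handle."""
    return f"{proxy}/{handle}"


registry = HandleRegistry()
handle = "12345/678"  # hypothetical prefix/suffix, for illustration only
registry.register(handle, "https://old-server.example.edu/papers/article.pdf")
cited = citation_link(handle)

# The file moves to a new server; the cited link is unaffected.
registry.move(handle, "https://new-server.example.edu/archive/article.pdf")
```

After the move, `citation_link(handle)` still equals the link that was cited earlier, while `registry.resolve(handle)` returns the new location. The catalogue is updated in one place instead of in every citation.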

It is possible that in time, as printing costs for the journal become unmanageable, the print version could be discontinued to save money and the journal could continue in electronic-only form with the support of T-Space. Of course, the first-copy costs still need to be met, and perhaps a nominal article-processing fee may be a viable option. There are other economic models for providing open-access journal publishing, but discussion of this issue is beyond the scope of this paper (see Prosser, 2003; Velterop, 2003; Willinsky, 2003).

Supporting international publishing and knowledge sharing

Supporting journals with small print runs is the objective of a project known as Bioline International (BI), currently housed in the Department of Social Sciences at UTSC. Rather than supporting journals that are published locally, Bioline International has been assisting scientific publishers from various developing countries with electronic publishing and open-access distribution of their journal publications. The primary objective is to make visible the otherwise "lost science" that is inaccessible due to distribution barriers. Research knowledge from developing countries is indeed critical, because true global understanding of science, particularly in the areas of biodiversity, emerging diseases, and sustainable environment, would be incomplete without the north-south and south-north flow of knowledge (Canhos, Chan, & Kirsop, 2001; Chan & Sahu, 2003).

Now in its twelfth year, BI has recently begun archiving all the articles in its database, which is housed on a server in Brazil, into an e-prints archive run on the free Eprints software provided by the University of Southampton. The database in Brazil is currently not OAI-compliant; by archiving the content on an e-print server, we are able to take advantage of the OAI interoperability of the e-print archive. This also allows harvesting services to access the article metadata, thereby maximizing the visibility of the publications. In other words, we made use of the free software to take advantage of what OAI had to offer.
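
What a harvesting service does with an OAI-compliant archive can be sketched roughly as below. The sketch builds a standard OAI-PMH `ListRecords` request and reads titles out of a Dublin Core response. The base address `archive.example.org` and the sample record are invented for illustration and do not describe the Bioline archive; no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the request a harvester sends to ask for all records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> list:
    """Pull every dc:title out of a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//dc:title", OAI_NS)]


# Invented response for illustration; a real archive returns many records.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample article title</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

request = list_records_url("https://archive.example.org/oai")
titles = extract_titles(SAMPLE_RESPONSE)
```

Because every compliant archive answers the same request in the same format, a service such as OAIster can index many repositories with one harvesting routine.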

When UTSC joined T-Space as an early adopter, we also started to mirror the BI database on the T-Space e-prints server, in part to determine the ease of exchanging data from one system to another, and in part to ensure that the data would have multiple copies for safekeeping. Since T-Space provides persistent handles not available in Eprints, this provided an added opportunity for us to evaluate the relative strength of the system in exposing its metadata and enabling retrieval. Thus far, the migration of data from Eprints to T-Space, achieved with the assistance of a "cross-walk" routine between the two systems, has been unproblematic. Preliminary results show that materials that are archived on the e-prints server are easily discovered by OAIster and Google, and the search rankings of material on the e-prints server are consistently higher than those of articles stored on the regular server. Whether visibility and accessibility will translate into increased citation is being documented and will be reported at a later date.
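
A "cross-walk" of this kind is essentially a field-by-field mapping from one metadata schema to another. The sketch below shows the general idea only; the source field names and the `CROSSWALK` table are hypothetical and are not the routine used in the pilot. The target side uses Dublin Core element names.

```python
# Hypothetical source field names mapped to Dublin Core elements.
CROSSWALK = {
    "title": "dc.title",
    "authors": "dc.creator",
    "abstract": "dc.description",
    "journal": "dc.source",
    "year": "dc.date",
}


def crosswalk_record(source: dict) -> tuple:
    """Translate one source record; also report fields that have no mapping."""
    mapped = {}
    unmapped = []
    for field_name, value in source.items():
        target = CROSSWALK.get(field_name)
        if target is None:
            unmapped.append(field_name)
            continue
        # Target elements are repeatable, so every value is stored in a list.
        values = value if isinstance(value, list) else [value]
        mapped.setdefault(target, []).extend(str(v) for v in values)
    return mapped, unmapped


record = {
    "title": "An invented article",
    "authors": ["Example, A.", "Sample, B."],
    "year": 2003,
    "internal_id": "x-17",  # no equivalent in the target schema
}
mapped, unmapped = crosswalk_record(record)
```

Reporting the unmapped fields, rather than dropping them silently, makes it easier to check that a migration has not lost information.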

Why the participation rate is so low

As mentioned earlier, the total number of items on T-Space has not even reached 1,000 after one year, signifying very low participation considering that this is an institution with over 2,000 faculty members. The number of items on T-Space clearly does not match the publication output of the faculty, and the obvious question is why the submission rate is so low. Is this problem unique to the University of Toronto, or is this a common issue across universities?

In a recent DSpace user meeting at MIT, where a dozen institutions using DSpace software compared their implementation experiences, it was clear that submission rates are low across campuses, despite continuous and sustained effort on the part of the libraries to encourage participation. A recent survey of 45 institutional repositories by Ware (2004) reveals an average of only 290 records per institution. The registry of e-print archives shows a steady growth in the number of digital repositories across institutions in the developed world, but most have a problem populating them once they are launched. Ware (2004) concludes that at present IRs appear to have little impact on the reform of scholarly communication, and their role in providing open access to primary research literature is also small, given that only 22% of the content in the surveyed repositories is e-prints.

Cultural inertia is often cited by faculty members as the reason for the slow adoption of self-archiving. Lack of awareness of the importance of open access is another common reason. Lack of trust in institutional commitment to the long-term maintenance of the repository could also be a factor. And uncertainty about intellectual-property rights is another key issue (Björk, 2004).

Since one of the key goals of open access is to maximize research impact, evidence demonstrating the higher citation, and hence impact, of open-access literature would go a long way toward convincing sceptics and fence-sitters about the benefits of open access (Harnad & Carr, 2000). Steven Lawrence (2001) found an "average of 336% more citations of online articles compared to offline articles published in the same venue" (p. 7). But Lawrence's study is based on publications in computer science, and it is not clear whether this effect is applicable across disciplines, especially for disciplines without an e-print exchange culture or with different communication styles. A recent study conducted by the Institute for Scientific Information (ISI) comparing the citation rate of 190 open-access journals with non-OA journals found no significant difference between the citation impact of OA and non-OA journals (Thomson ISI, 2004). This is in fact encouraging, as James Pringle (2004) of ISI points out that "Open Access journals can have a similar impact to other journals, and prospective authors should not fear publishing in these journals merely because of their access model" (p. 11).

However, ISI's "impact factor" (IF) has important limitations: rather than measuring the citation counts of individual articles, IF measures the average citation count of the journal in which an article appears (Adam, 2002). So IF does not measure the citation impact of an individual article. A large-scale study is now under way, led by Stevan Harnad, to examine the "Lawrence effect" across all disciplines in a 10-year ISI sample of 14 million articles. The goal is to measure the citation effect of the articles from non-OA journals that have been made OA by their authors through self-archiving, and to compare this with articles that have not been made OA by their authors. Preliminary results suggest that there is a discernible difference in terms of the frequency with which the articles are cited, and that the difference is between 250% and 550% in favour of the articles that authors have made OA (Brody, Stamerjohanns, Vallières, Harnad, Gingras, & Oppenheim, 2004). This offers a compelling reason for researchers to make their research openly accessible through their institutional archives and for institutions to begin implementing policies for setting up and filling their archives to maximize the impact of their collective research output.
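
The gap between a journal-level average and an article-level count, and the arithmetic behind a percentage advantage, can be illustrated with a short sketch. All citation counts here are invented, and `journal_average` is a simplification: it averages over a list of articles and is not Thomson ISI's exact impact factor procedure.

```python
from statistics import mean, median


def journal_average(citations_per_article):
    """Mean citations per article: the kind of journal-level average IF reports."""
    return mean(citations_per_article)


def percent_advantage(oa_mean, non_oa_mean):
    """How many percent more citations one group receives than the other."""
    return (oa_mean - non_oa_mean) / non_oa_mean * 100


# Invented counts: one heavily cited article dominates the journal's average.
journal = [0, 0, 1, 1, 2, 2, 3, 3, 4, 44]

average = journal_average(journal)   # 6: the figure attached to every article
typical = median(journal)            # 2: what a typical article actually receives

# Invented group means: 7 citations versus 2 is a 250% advantage.
advantage = percent_advantage(7.0, 2.0)
```

In this invented journal nine of the ten articles are cited less often than the journal average suggests, which is why article-level comparisons of the kind described above are more informative than journal-level ones.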

Beyond journals

While quantitative studies of the citation impact of OA articles are welcomed, it should be remembered that not all disciplines use articles as their primary units of exchange or communication. Furthermore, scholars are increasingly using a variety of media and formats for exchanging ideas. Even in the sciences, it is likely that an article may eventually be broken up into "modules," as its data, analysis tools, multimedia objects, and supplementary materials become more directly accessible from supporting databases that may reside in diverse locations such as institutional or national repositories (Kircz, 1998, 2001, 2002; Lynch, 2003). Indeed these developments have been documented in a National Science Foundation (NSF) report of the Advisory Committee for Cyberinfrastructure13 (Atkins, 2003).

Significantly, the American Council of Learned Societies has recently launched a Commission on Cyberinfrastructure and the Humanities and Social Sciences, chaired by John Unsworth of the University of Illinois at Urbana-Champaign. The goal of the commission is to explore many of the same issues examined in the NSF study on science, as well as issues unique to the humanities and social sciences, taking into account the funding patterns and communication styles across the diverse disciplines. Among the questions that will be explored is how computing tools, digital networks, and repositories housing primary data will complement and enhance interpretive works of scholarship.

As research becomes more data intensive, a scholar's ability to store, access, and share primary data will be crucial to the advancement of scholarship. Recognizing this need, the American Anthropological Association, the largest scholarly association for anthropologists in the world, is currently developing a "scholar's portal" known as AnthroSource that will not only allow users to access the full text of journals published by the association, but will also provide repository services that will enable researchers to deposit data associated with their publications. Other disciplines in the humanities and social sciences will likely follow suit.

But as Lynch (2003) also pointed out, not all disciplines are accustomed to depositing data in centralized or disciplinary repositories (as is common in molecular biology, genomics, and physics), and it is likely that disciplinary repositories will not be fully comprehensive even when they are set up. So an ideal scenario would be to take advantage of the tools made possible by the Open Archives Initiative Protocol for Metadata Harvesting (Van de Sompel, Young, & Hickey, 2003) and use the institutional repository as a gateway to a distributed system of disciplinary repositories, and vice versa. Researchers would have the option of depositing their data in their institutional repository or with an appropriate disciplinary repository. In this way, IRs and subject-based repositories are truly complementary to each other. But while it is technically feasible for such linkage between institutional and disciplinary repositories to exist, there are numerous cultural, policy, legal, and copyright issues that need to be agreed upon between participating institutions and learned societies. Fortunately, these dialogues are beginning to take place at an increasing rate.14

Conclusion


The development of institutional repositories is still in its infancy, but a number of important lessons have already emerged. Preliminary research shows that institutional repositories facilitate more timely and open access to research and scholarship, and that they maximize the potential research impact of archived publications. In addition to supporting more traditional scholarship, IRs could play an important role in supporting alternative forms of journal publishing and novel forms of digital scholarship in the humanities and social sciences. By preserving and making accessible academic digital objects, datasets, and analytic tools that exist outside of the traditional scholarly publishing system, IRs also represent a recognition of the importance of the broader range of scholarly material that is now part of the scholarly communication process and record.

So while IRs may have an eventual impact on the economics of scholarly publishing, or the "affordability" problem, their primary and immediate role is in facilitating open access to traditional scholarship, as well as advancing, supporting, and legitimizing the broader spectrum of scholarly communications that is emerging in the electronic environment. By making available research generated in poor countries in addition to knowledge created in well-endowed institutions, IRs could play a role in bridging the global knowledge gap. Research institutions and universities have the primary mission of creating, sharing, and disseminating knowledge, which are public goods. Open access through institutional repositories is a low-cost and low-barrier strategy for achieving this mission.

IRs are indeed proliferating, but they are filling up slowly. Faculty inertia, though known to be a deterrent in similar projects (Björk, 2004), has not proven to be a key problem in the pilot project at UTSC. Provided that the incentives for faculty are clearly expressed, and that the broader context of the need for reform in the scholarly communication process is fully explained, participation should increase. What remains to be addressed now is the collective will and determination on the part of higher education and research communities to return the control of the scholarly communication process to its rightful home (Guédon, 2001). If the progress of the open-access movement in the last two years is any indication, over the next two years we should expect to see substantial growth both in the volume and diversity of content that will be publicly accessible through IRs, and scholarship and knowledge-sharing the world over will be vastly enriched.

Acknowledgments


I would like to thank Rea Devakos and Gabriela Mircea of the University of Toronto Libraries for their collaboration and assistance with the T-Space pilot project. Barbara Kirsop and two anonymous reviewers provided valuable comments that helped clarify aspects of the paper.

Notes


  1. For a concise overview of the history of the "Scholarly Communication Crisis," see the Create Change website:

  2. See, for example, Trends in ARL Libraries:

  3. Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities:

  4. Wellcome Trust, "Scientific Publishing: A Position Statement by the Wellcome Trust in Support of Open Access Publishing:"

  5. BioMed Central, which is a commercial publisher, is also an open-access publisher:

  6. See, for example, the current Nature website debate on access to the literature:

  7. Technical details about the OAI-PMH are available at:

  8. See the series of OAI workshops held at CERN, Geneva:

  9. See also the many messages on the American Scientist Open Access forum, which Harnad founded, in the archives of the forum:

  10. It is worth noting that OAIster, based at the University of Michigan, recently entered into an agreement with Yahoo! Inc. so that the content in OAIster is also accessible through Yahoo! search. And Google also announced in April that it is adding support to DSpace so that searchers will be able to find quality scholarly materials residing in institutional repositories using Google's advanced search. Development of these indexing services will boost the success of institutional repositories.

  11. For a video interview of Boonstra's view of T-Space and what it means for researchers, see

  12. Dostoevsky, Grigor'ev and Native Soil Conservatism, University of Toronto Press, 1982.

  13. "Cyberinfrastructure" refers to the new research environments in which capabilities of the highest level of computing tools are available to researchers in an interoperable network.

  14. See, for example, the conference "Scholarly Communication in the Humanities: Does Open Access Apply?" and, of course, the Commission on Cyberinfrastructure and the Humanities and Social Sciences.

References


Adam, David. (2002). The counting house. Nature, 415 (February 14), 726-729.

Atkins, Dan. (2003). Revolutionizing science and engineering through cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on cyberinfrastructure. URL: [January 15, 2004].

Ayre, Catherine, & Muir, Adrienne. (2004). The right to preserve. D-Lib 10(3). URL: [March 20, 2004].

Björk, Bo-Christer. (2004). Open access to scientific publications - An analysis of the barriers to change. Information Research, 9(2). URL: [January 30, 2004].

Brahic, Catherine. (2004). UK hears open access evidence. The Scientist. URL: [March 15, 2004].

Brody, Tim, Stamerjohanns, Heinrich, Vallières, François, Harnad, Stevan, Gingras, Yves, & Oppenheim, Charles. (2004). The effect of open access on citation impact. Presentation at the National Policies on Open Access (OA) Provision for University Research Output: An International Meeting, Southampton. URL: [March 15, 2004].

Budapest Open Access Initiative. (2002). URL: [December 1, 2003].

Canhos, Vanderlei, Chan, Leslie, & Kirsop, Barbara. (2001). Bioline publications: How its evolution has mirrored the growth of the Internet. Learned Publishing, 14, 41-48. URL: [January 15, 2004].

Chan, Leslie, & Sahu, D. K. (2003). Bioline International and the Journal of Postgraduate Medicine: A collaborative model of open access publishing. International Symposium on Open Access and the Public Domain in Digital Data and Information for Science (2003, Paris, France). Washington, DC: The National Academies Press.

Corporation for National Research Initiatives. (2003). Introduction to the handle system. URL: [January 10, 2004].

Cox, John. (2001). Licensing serials. Serials, 14, 139-142.

Crow, Raym. (2002). The case for institutional repositories: A SPARC position paper. URL: [March 10, 2003].

Davis, Trisha. (1997). License agreements in lieu of copyright: Are we signing away our rights? Library Acquisitions: Practice & Theory, 21(1), 19-28.

Dowler, Wayne. (2004). Abstract: Russian heritage: Land, people and culture. URL: [April 4, 2004].

Edwards, R., & Schulenburger, D. (2003). The high cost of scholarly journals (and what to do about it). Change, (Nov/Dec), 10-19.

Friedlander, Amy. (2002). The National Digital Information Infrastructure Preservation Program: Expectations, realities, choices and progress to date. D-Lib, 8(4). URL: [March 15, 2004].

Guédon, Jean-Claude. (2001). In Oldenburg's long shadow: Librarians, research scientists, publishers, and the control of scientific publishing. Proceedings of the 138th Annual Meeting of the Association of Research Libraries: Creating the digital future (2001, Toronto, Canada). URL: [April 30, 2002].

Guédon, Jean-Claude. (2003). Open access archives: From scientific plutocracy to the republic of science. IFLA Journal, 29(2), 129-139.

Harnad, Stevan. (2001). Research access, impact and assessment. Times Higher Education Supplement, 1487, 16. URL: [April 15, 2002].

Harnad, Stevan. (2001b). The self-archiving initiative. Nature, 410, 1024-1025. URL: [April 15, 2002].

Harnad, Stevan. (2002). Is self-archiving publication? Self-archiving FAQ for the Budapest Open Access Initiative (BOAI). URL:

Harnad, Stevan. (2003). Online archives for peer-reviewed journal publications. In J. Feather & P. Sturges (Eds.), International encyclopedia of library and information. London: Routledge. URL: [January 10, 2004].

Harnad, Stevan, & Carr, Leslie. (2000). Integrating, navigating, and analysing open eprint archives through open citation linking (the opcit project). Current Science, 79(5), 629-638.

Harnad, Stevan, Carr, Leslie, Brody, Tim, & Oppenheim, Charles. (2003). Mandated online RAE CVs linked to university eprint archives. Ariadne, 35. URL: [January 15, 2004].

Johnson, Rick. (2002). Institutional repositories: Partnering with faculty to enhance scholarly communication. D-Lib, 8(11). URL: [December 10, 2002].

Kircz, Joost. (1998). Modularity: The next form of scientific information presentation? Journal of Documentation, 54(2), 210-235.

Kircz, Joost. (2001). New practices for electronic publishing 1: Will the scientific paper keep its form. Learned Publishing, 14(4), 265-272. URL: [January 10, 2004].

Kircz, Joost. (2002). New practices for electronic publishing 2: New forms of the scientific paper. Learned Publishing, 15(1), 27-32. URL: [January 10, 2004].

Lawrence, Steven. (2001). Online or invisible? Nature, 411(6837), 521. URL: [December 4, 2003].

Lynch, Clifford. (1994). Scholarly communication in the networked environment: Reconsidering economics and organizational mission. Serials Review, 20(3), 23-30.

Lynch, Clifford. (2003). Institutional repositories: Essential infrastructure for scholarship in the digital age. ARL Bimonthly Report, 226. URL: [December 4, 2004].

Mason, Jean. (2002). Introduction page to From Gutenberg's Galaxy to cyberspace: The transforming power of electronic hypertext. URL: [June 3, 2004].

Mason, Jean. (2002b). Preface to From Gutenberg's Galaxy to cyberspace: The transforming power of electronic hypertext. Electronic Monograph. Toronto, ON: Centre for Instructional Technology Development Press. [June 3, 2004].

McCord, Alan. (2003). Institutional repositories: Enhancing teaching, learning, and research. EDUCAUSE Evolving Technologies Committee white paper. URL: [December 4, 2003].

McGinnis, Suzan. (2000). Selling our collecting souls: How license agreements are controlling collection management. Journal of Library Administration, 31(2), 63-76.

Muir, Adrienne. (2003). Copyright and licensing for digital preservation. Library & Information Update, 2(6), 34-36. URL:

Nixon, William. (2003). DAEDALUS: Initial experiences with EPrints and DSpace at the University of Glasgow. Ariadne, 37. URL: [December 1, 2003].

Okerson, Ann. (1992). The missing model: A 'circle of gifts.' Serials Review, 18(1-2), 92-96.

Park, Robert. (2002). The Faustian grip of academic publishing. Journal of Economic Methodology, 9(1), 317-335.

Pringle, James. (2004). Do open access journals have impact? Nature Web focus: Access to the literature. URL: [April 15, 2004].

Prosser, David. (2003). From here to there: A proposed mechanism for transforming journals from closed to open access. Learned Publishing, 16(3), 163-166.

Ross, Seamus. (2000). Changing trains at Wigan: Digital preservation and the future of scholarship. National Preservation Office, Occasional Papers. URL: [December 4, 2003].

Siemens, Ray. (Ed.). (2000). The credibility of electronic publishing: A report to the Humanities and Social Sciences Federation of Canada. URL: [January 15, 2001].

Smith, Mackenzie, Barton, Mary, Bass, Mike, Branschofsky, Margret, McClellan, Greg, Stuve, Dave, Tansley, Robert, & Walker, Julie. (2003). DSpace: An open source dynamic digital repository. D-Lib, 9(1). URL: [January 15, 2004].

Steele, Colin. (2003). Phoenix rising: New models for the research monograph? Learned Publishing, 16(2), 111-122. URL: [January 15, 2004].

Suber, Peter. (2003). Removing the barriers to research: An introduction to open access for librarians. College & Research Libraries News, 64, 92-94. URL: [December 4, 2003].

Suber, Peter. (2004). Open access builds momentum. ARL Bimonthly Report, 232. URL: [March 15, 2004].

Thomson ISI. (2004). The impact of open access journals: A citation study from Thomson ISI. URL: [May 5, 2004].

University of Toronto Libraries. (2002). T-Space community and collection policies. URL: [September 15, 2003].

Van de Sompel, Herbert, Young, Jeffrey, & Hickey, Thomas. (2003). Using the OAI-PMH … differently. D-Lib, 9(7/8). URL: [December 4, 2003].

Velterop, Jan. (2003). Should scholarly societies embrace open access (or is it the kiss of death)? Learned Publishing, 16(3), 167-169. URL: [January 10, 2004].

Ware, Mark. (2004). Institutional repositories and scholarly publishing. Learned Publishing, 17(2), 115-124.

Willinsky, John, & Wolfson, Larry. (2001). The indexing of scholarly journals: A tipping point for publishing reform? The Journal of Electronic Publishing, 7. URL: [June 10, 2003].

Willinsky, John. (2003). Scholarly associations and the economic viability of open access publishing. Journal of Digital Information, 4(2). URL: [September 15, 2003].

Zandonella, Catherine. (2003). Sabo bill assessed. The Scientist. URL: [September 15, 2003].


We wish to acknowledge the financial support of the Social Sciences and Humanities Research Council through the Aid to Scholarly Journals Program.