The Dataverse Network Project is housed at the Institute for Quantitative Social Science (IQSS) at Harvard University. Visit us at http://TheData.org and read all about it.
The Project is an open-source software development community. Through web application software, data citation standards, and statistical methods, the Dataverse Network Project increases scholarly recognition and distributed control for authors, journals, archives, teachers, and others who produce or organize data; facilitates data access and analysis for researchers and students; and ensures long-term preservation whether or not the data are in the public domain.
The structure of the Dataverse Network (DVN) software application is shown in the following figure.
The following concepts apply to the DVN application:
The Project developed and maintains the DVN application in response to digital storage issues:
The DVN provides an interface that is easy to use. Your DVN can harvest collections and studies from other repositories that are compliant with the Open Archives Initiative (OAI), and provide access to all contents through a single interface. DVN software includes powerful browse and search features that enable users to look across all data sets, across those within a specific DVN, or only within a specific dataverse. Or users can search for a dataverse that offers a specially organized view on some relevant area. You can download studies and data with one click, or analyze data online without downloading anything.
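To make the harvesting idea concrete, the following minimal sketch builds the kind of request URL a harvester sends to an OAI-compliant repository. The `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI protocol for metadata harvesting (OAI-PMH); the base URL, the set name, and the function name are placeholders chosen for illustration and do not refer to a real DVN endpoint or to DVN code.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional: restrict the harvest to one set (collection) of records.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name, used only for illustration.
url = build_list_records_url("http://repository.example.org/oai", set_spec="surveys")
print(url)
```

A harvester would fetch this URL, read the returned metadata records, and add them to its own catalog so that they appear in the same search interface as local studies.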
A DVN installation, like any library, can be open to the public or closed to specific membership. Anyone can use the DVN hosted by IQSS at http://dvn.iq.harvard.edu/dvn/, shown in the following figure. Institutions that host other DVNs determine who can use their services.
When you contribute a study to a dataverse, the DVN assigns a citation to your study automatically. This citation gives recognition to study and data creators in terms of web visibility (within their own dataverse) and scholarly citation credit (through the combination of our data citation standard and links to the authors' print publications).
This citation comprises several components: authors, year, title, and unique identifiers. Citations also can have optional features in a standard format, such as “Murray Research Archive [distributor].” With this citation, all participants in a study are credited and recognized.
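As a rough illustration of how these components fit together, the sketch below assembles a citation string from its parts. The function name, argument names, and punctuation rules are assumptions made for this example and are not the DVN implementation; the sample values are taken from the Civic Culture Study citation quoted later in this document.

```python
def format_citation(authors, date, title, identifiers, distributor=None):
    """Assemble a citation string from its components (illustrative only)."""
    parts = ["; ".join(authors), date, f'"{title}"']
    # Identifiers (for example a handle and a UNF) follow the title.
    citation = ", ".join(parts) + ", " + " ".join(identifiers)
    if distributor:
        # Optional feature in a standard format, such as a distributor.
        citation += f" {distributor} [Distributor]"
    return citation


citation = format_citation(
    authors=["Almond, Gabriel", "Verba, Sidney"],
    date="1984-06-20",
    title="Civic Culture Study, 1959-1960",
    identifiers=["hdl:1902.2/720", "UNF:3:rN4NdVDG3yDThIqzlhnCwg=="],
    distributor="Inter-university Consortium for Political and Social Research",
)
print(citation)
```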
A unique, global, persistent identifier is part of the citation for every study. When you contribute subsettable data to a study, the DVN also generates a Universal Numerical Fingerprint (UNF) identifier for the study. Each subsettable data set also receives a UNF, and the data is preserved in a format-independent state by the DVN.
The DVN hosted by the IQSS assigns a unique global identifier that complies with the international Handle System, in the form hdl. This identifier is designed to persist even if Uniform Resource Locators (URLs) no longer are used. A handle is resolved into a URL, but if the URL changes in the future the handle remains the same.
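The step from handle to URL can be sketched as follows. This example assumes the public Handle proxy at hdl.handle.net, which redirects a handle to whatever URL is currently registered for it; the function name is hypothetical, and the handle shown is the one from the example citation in this document.

```python
def handle_to_url(identifier, resolver="https://hdl.handle.net"):
    """Turn an 'hdl:prefix/suffix' identifier into a resolver URL."""
    if not identifier.startswith("hdl:"):
        raise ValueError(f"not a handle identifier: {identifier!r}")
    handle = identifier[len("hdl:"):]
    # A handle has a naming-authority prefix and a local suffix.
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"malformed handle: {identifier!r}")
    return f"{resolver}/{handle}"


print(handle_to_url("hdl:1902.2/720"))
```

Because the citation stores only the handle, the study can move to a new location without invalidating citations that were already published; only the record held by the resolver needs to change.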
The UNF component is a string of letters and numbers computed from the data set that uniquely identifies its content. It guarantees that users can verify that data retrieved is identical to data used earlier. If the data change, the UNF changes. Four features make the UNF especially useful:
For example, the following citation includes a Handle.net ID and a UNF, is available in the ICPSR dataverse at the IQSS DVN, and contains data from a seminal study conducted in 1959-60:
“Almond, Gabriel; Verba, Sidney, 1984-06-20, “Civic Culture Study, 1959-1960”,
hdl:1902.2/720 UNF:3:rN4NdVDG3yDThIqzlhnCwg== Inter-university Consortium for Political and Social Research [Distributor].”
By checking the UNF in the citation for specific data, users can validate that the data they choose to use is correct, without needing to see the data itself.
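The idea of a content-based fingerprint can be illustrated with a deliberately simplified sketch. This is not the actual UNF algorithm: the normalization rule, the choice of SHA-256, and the function name are all assumptions made for this example. It shows only the general principle that values are reduced to a canonical form and then hashed, so that the same content yields the same fingerprint regardless of how it was stored.

```python
import base64
import hashlib


def fingerprint(values, digits=7):
    """Simplified content fingerprint; NOT the real UNF algorithm."""
    normalized = []
    for v in values:
        if isinstance(v, (int, float)):
            # Write numbers with a fixed number of significant digits so
            # that 1, 1.0, and 1.00 all produce the same canonical text.
            normalized.append(f"{float(v):.{digits - 1}e}")
        else:
            normalized.append(str(v))
    canonical = "\n".join(normalized).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    return base64.b64encode(digest).decode("ascii")


original = fingerprint([1, 2.5, "yes"])
same_content = fingerprint([1.0, 2.50, "yes"])
changed = fingerprint([1, 2.6, "yes"])
print(original == same_content, original == changed)
```

A user who holds a copy of the data can recompute the fingerprint and compare it with the one printed in the citation; a mismatch indicates that the data differ from what was cited.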
The DVN enables you to control distribution and contribution of both studies and data. You can enable or disable contributions to your dataverse. You also can enable and disable downloads of your study files.
In addition, to protect your dataverse and your contributors, you can apply Terms of Use at both the dataverse and study levels. The DVN also can apply Terms to users and contributors.
A dataverse offers you a secure facility to upload data and associated documentation. You can organize your studies in collections that you find useful. You can also enable others to share their data, and provide a venue for making your collected works publicly accessible and properly recognized. You set the Terms of Use for each study, or for your entire dataverse. The result is that you, and no one else, control your data.
Creating a dataverse is easy. You create an account using basic information (name, email address) and provide a name for your dataverse. That is all you need to do.
You can create studies in your dataverse and upload any number of files associated with those studies. Other dataverse owners might permit you to contribute studies to their dataverses, or they might create collections within their dataverses and select studies from within your dataverse to include in those collections.
When you create a study, you enter associated metadata that comprises the Cataloging Information about the study. Metadata can include such information as the survey for which the data were collected or with which they are otherwise connected, how the data were obtained, and keyword summaries of their content. It also might include more detailed information, such as variable names and labels, value labels, missing value codes, data structures, and other types of extensive machine-readable documentation. You can define standards for minimal required metadata fields in your dataverse, but in general the more metadata you provide, the more services the DVN can offer to other researchers. Additional metadata improves searchability of your study (search is based on metadata) so other users can more easily find and understand your study. You can make your metadata public but restrict access to the data files.
After you upload your files, the DVN assigns each study a formal citation to these data, to be used in your study results and cited in other publications.
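A minimal required-field standard of this kind could be checked as in the sketch below. The field names and the function are hypothetical and chosen only for illustration; they do not reflect the DVN's actual metadata schema.

```python
# Hypothetical minimal standard for a dataverse (illustrative names only).
REQUIRED_FIELDS = ("title", "author", "production_date")


def missing_fields(metadata, required=REQUIRED_FIELDS):
    """Return the required field names that are absent or blank."""
    missing = []
    for name in required:
        value = metadata.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


study = {
    "title": "Civic Culture Study, 1959-1960",
    "author": "Almond, Gabriel; Verba, Sidney",
    "keywords": "political culture",
}
print(missing_fields(study))
```

Fields beyond the required set, such as `keywords` here, are accepted without complaint, which matches the principle that more metadata is welcome while only a minimum is enforced.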
You decide which data to include in your dataverse: only data sets that you produced, any data associated with your community, or links to collections of data available in other dataverses.
For example, if your web site is for a project, research center, or other entity, the main page of your dataverse could offer an expandable list of projects, subjects within projects, issues within subjects, surveys within issues, and data sets corresponding to specific surveys.
You can create collections of studies within your own dataverse, or of studies from other dataverses. There are three ways to define a collection:
You can brand your dataverse as your own, with the same look and feel as your web site. Copy your style information or HTML from your personal web site, and paste that code to your dataverse header and footer. Then link to your dataverse from your main web page, and from your main web page to your dataverse. The result is two web sites that appear as one, but your web site is hosted by your provider, and your dataverse is hosted by the DVN. You leverage the services and support of the DVN to incorporate your data into your web site, without having to install or back up anything.
There are three types of controls and protections that you can assign to different functions in your dataverse:
Institutions and large organizations can install the DVN software to host shared-access digital library services, and to control access to their library’s contents. In addition, a DVN can harvest studies from other DVNs. Community members use these digital storage and archive services to view and create dataverses and studies. A DVN provides all the services of a digital repository: ease of use, recognition, persistence, preservation (verification), validation, access control, and legal protections.
Each DVN has infrastructure to identify members, such as by Internet Protocol (IP) address, or through pin servers or passwords. Hosts can subscribe to data services and offer them to selected members, and make metadata available more widely so others can see what is available and request access.
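Identifying members by IP address can be sketched with Python's standard `ipaddress` module. The network ranges below are reserved documentation ranges used as placeholders, and the function name is an assumption for this example; a real installation would configure its own institutional ranges and would typically combine this check with other forms of authentication.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), not real member networks.
MEMBER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_member(address, networks=MEMBER_NETWORKS):
    """Return True if the address falls inside any member network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


print(is_member("192.0.2.17"), is_member("198.51.100.5"))
```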
A DVN provides the following types of Search services:
When a study includes subsettable data sets, or data sets in SPSS and Stata formats, many services are available to users to study these data:
The DVN incorporates Zelig statistical software, which provides a common framework for statistical analysis and software development, and a unified syntax for using statistical methods implemented in R (see R Project for Statistical Computing, http://www.r-project.org). This means that any method in Zelig also can be run by anyone through a dataverse, without knowing R. These models all can be run online, without having to download data or install, learn, or run your own statistical software.
The Project is developing support to provide subsetting and analysis functions for other data formats.
At the IQSS DVN, you can access existing data, create your own dataverse, or contribute your data to the Murray Research Archive. You can download data from any available dataverse and use it in your own applications, or use the DVN subsetting and analysis tools online.
Anyone can create a dataverse at IQSS DVN:
Or, you can contribute a study to the Murray Research Archive dataverse:
The IQSS and Harvard University guarantee to maintain all contributions and keep each available at the handle identifier assigned to it. Harvard guarantees the data backup terms for the IQSS DVN.
Data currently in the IQSS DVN includes the following:
The Project is a member of a large community of open-access partners. Currently, we harvest data from and export to other repositories: ICPSR, ODUM, Roper, National Archives, US Census, and more. International partnerships are underway with NESSTAR and the UN Universities. Dataverse owners include Chinese, Mexican, Japanese, and French participants. Our community of developers is active and growing. See our Partners page on TheData.org.
DVN software is written for the Java Platform, Standard Edition (Java SE), using the latest Java technologies, including Enterprise JavaBeans (EJB) and JavaServer Faces (JSF). It runs on top of the GlassFish Application Server. We use the PostgreSQL database management system, but you can easily use other databases, such as Oracle or MySQL. The data analysis component uses R and Zelig for statistical computing.
The following figure depicts the DVN architecture.
DVN software is open source, and is licensed under the Affero General Public License, a variant of the GNU General Public License. This license guarantees that the software is available now and that all future versions of it, including any versions created from it by others, remain available. Under this license, you have full access to all versions of the source code and a permanent license to use it and build on it as you see fit. All development is open to contributions through the SourceForge.net development web site.
Refer to these sources for more information.
Gary King. “An Introduction to the Dataverse Network as an Infrastructure for Data Sharing,” Sociological Methods and Research, 36, 2 (November 2007): 173-199.
| Attachment | Size |
|---|---|
| Data_Sheet-DVN_Project.pdf | 482.66 KB |
© 2006-2008 Dataverse Network Project. Housed at the Institute for Quantitative Social Science at Harvard University.