Beautiful exotic bees from Curiosities of entomology.
Groombridge and sons. London, T. W. Wood. 1839-1910

BHL has digitized a treasury of historical, artistic, and scientific images of biological diversity, now freely available to a growing global community of biodiversity researchers and remixers.

Founded in 2006, the Biodiversity Heritage Library (BHL) is an international consortium that makes historic and contemporary biodiversity-related research publicly available. It has digitized over 56 million pages of biodiversity literature and images, available through its online portal.

I interviewed Martin Kalfatovic, Director of BHL, and Grace Constantino, Outreach and Communications Manager. As BHL improves access to otherwise inaccessible scientific data, it encounters challenges similar to those MHz Curationist faces: staying true to the nuanced, local, or obscure information encoded in each digital object, expanding Global Names Recognition, and increasing the ability of diverse publics to access and use these online resources.

Garrett Graddy-Lovelace: So nice to meet both of you. I have eight pages of notes about the Biodiversity Heritage Library project: it seems ambitious. I’d like to begin by asking: who have been the main users and beneficiaries of this treasury of information so far?

Martin Kalfatovic: The primary users are the global taxonomic and systematics community: people who would otherwise be visiting libraries in person, especially in the global north, to see these materials. But we’ve found an increasingly large number of users from the global south using BHL online. We keep track of our user community through basic Google Analytics for the raw data. We also have a wealth of more anecdotal information from people who write to us, tweet at us, and interact with us in all sorts of ways. Donors often include comments about how useful the collection is for them.

In the past, people would come to our libraries in Europe and North America from, say, Brazil or Indonesia, and spend hours and hours photocopying things because they didn’t have access to those print publications. Now they get those online, and when they do visit one of our museums or botanical gardens, they can spend that time with specimen collections or with their colleagues in the region, instead of standing in front of photocopy machines. Researchers around the world were asking libraries to digitize these materials so they wouldn’t have to spend all of their in-person time in front of paper, and answering that call was one of the main drivers of BHL’s original formation.

GGL: That brings me to my second question: geographically, has there been a diversification of people using BHL?

MK: The US remains the number one user of BHL. The number two now, for the second or third year in a row, is India. China has moved up significantly in the rankings. Mexico and Brazil are both in the top 10. So we are seeing an increase in the number of users from outside the global north.

GGL: In terms of language, this is all in English and Latin currently, but is there investment or interest in tapping into some of the other major languages of the world?

MK: The content of BHL is in the language as originally published, and English is the lingua franca of science and biodiversity science in terms of the actual publications. Even so, there are 60 languages represented in BHL in total. There has been discussion about whether our user interface should be offered in additional languages. But when we had the Biodiversity of Europe project, we discovered that the European partners actually all preferred English. They did not want a Czech or a Polish version; they were comfortable using English as the user interface. I do feel we need three additional interfaces: Mandarin, Brazilian Portuguese, and probably Arabic.

GGL: Frankly, with COVID-19, have you seen expanded usage? How are you navigating the increase in online searches for such information, particularly as people aren’t able to travel for research or conferences?

MK: I was just looking at those statistics. They show roughly a 37% year-over-year increase in BHL usage, which is well above our normal growth rate. We also recently passed our millionth generated PDF.

Grace Constantino: I can confirm our developer has said that we are experiencing more PDF downloads now than ever in our history.

GGL: This is a treasury of information for researchers studying conservation. Do you also see a broader public audience? More artists or students?

“… we’ve seen a huge increase in public audiences interested in our collections, and of those groups particularly artists.”

GC: So, we’ve definitely seen an expansion in non-professional scientific audiences. The biggest catalyst for that has actually been our Flickr account, where we make the illustrations from BHL books available: we have over 170,000 there now. Since we’ve been growing and investing in Flickr, we’ve seen a huge increase in public audiences interested in our collections, and of those groups particularly artists. We’ve also been interacting much more with artistic audiences through Instagram. I have lots of examples of artists using BHL materials in their own artworks, and of interior designers using them for decoration. Since COVID-19 hit, we’ve also seen an increase in parents telling us that their children are interacting with our illustrations and using them for activities. We’ve also done things like creating coloring books out of images from our collection, and those have been particularly popular recently. Parents are looking for free, hands-on activities their kids can do at home: something they can download and print. So yes, we’ve definitely been seeing increases in public audiences, and in citizen science as well.

GGL: Are you thinking about moving more proactively into online education curricula or coursework as a future direction? Or making that, for example the coloring books and various activities for students, a larger part of the website’s front interface?

MK: Right now, we’re not. We have a very small staff overall, and a lot of our staff are volunteers at our partner institutions. So what we really want to do is make the content as accessible, exchangeable, and reusable as possible, and let the education community build on top of it. With our limited resources, we would rather encourage and support educators working with our content.

GGL: My next question is a little more conceptual. I work with indigenous groups in Peru, Mexico, and the US on agricultural biodiversity in particular, and the question of naming is so important. I rely a lot on the Linnaean taxonomic structure in Latin, and it does provide a global lingua franca of discoverability. But it has been critiqued for having a Eurocentric bias and for not conveying indigenous knowledge about biodiversity and traditional ecological knowledge. How does BHL navigate including indigenous and traditional ecological knowledge and epistemologies about biodiversity while maintaining general discoverability and exchangeability?

“The classic kingdom, phylum, order hierarchy that we learned in grade school is of course a very fluid, dynamic intellectual construct, depending on lots of factors”

MK: One of the reasons BHL has been such a big success from the start is that we have remained neutral and agnostic about naming. The classic kingdom, phylum, order hierarchy that we learned in grade school is of course a very fluid, dynamic intellectual construct, depending on lots of factors. So we stayed out of all of that as much as possible. BHL just reports the actual name string as it appears in the published text; that’s really our core name structure. If people want to use one particular classification tree or another, that’s up to them. We are working with the Catalogue of Life project to provide more interoperability within the naming structure. In terms of indigenous information, what would really be great is a good dictionary of common names for species. This was actually one of the very first challenges we took on with a colleague at the biological lab who worked on these kinds of things. Bluefish is a very common fish in the Cape Cod area, but “bluefish” is also used globally as a name for hundreds of different species. So tying those things together is very complicated. But if we can say that, in this particular text, Pomatomus saltatrix is called bluefish in the northeastern United States, that’s a fact we can state. And “bluefish” as a name for a dolphin is something else again in the southern reaches of South America. We just want to document these uses and let experts battle out what the real meanings are.

GGL: Obviously, the Biodiversity Heritage Library and Smithsonian Open Access are deeply interconnected, and both are leaders in the expanding Free Knowledge movement worldwide. With your years of experience in this realm, what do you think are the main obstacles to the Open Access, Open Data, and Open Knowledge movements right now? And the main opportunities?

“The major obstacle is copyright or, perhaps the best word for it, aggressive copyright.”

MK: The major obstacle is copyright or, perhaps the best word for it, aggressive copyright: the enclosing of the commons. There has been an excessively aggressive expansion of the definition of copyright in the US, well beyond what was defined in the Constitution, which, as you know, limited the term of copyright so that works could enter the public domain. So I think that’s the single biggest obstacle.

GGL: How are you at BHL, in collaboration with the Smithsonian’s Open Access Initiative, addressing these issues? Is your work trying to push back against aggressive intellectual property regimes, to think about the broader public good of limiting patents or copyrights and expanding the public domain?

MK: Smithsonian staff are not allowed to lobby for or against legislation, so we can only be a sort of supportive bystander. But we work with partners that can be aggressive lobbyists: the American Library Association and the Internet Archive, for example, have the ability to lobby. We serve by being a good example and providing evidence for those who can lobby. I think this is our best position.

GGL: This is a bit more technical, but I’m fascinated with the Nagoya Protocol and the question of Access and Benefit Sharing–another contentious space. I study the UN Food and Agriculture Organization’s International Treaty on Plant Genetic Resources for Food and Agriculture, where I see this argument unfold. There’s a vision of Open Knowledge: just circulate the data about all of the resources that are kept ex situ. And then there are movements saying, “well, we still need to have prior informed consent, to prevent appropriation.” People who have the resources can make use of the openness of Open Knowledge, but for those who don’t have the resources to access it, all of a sudden their knowledge and its benefits are up for grabs. Proprietary intellectual property could be applied to a derivative of data deemed Open. So it sounds like BHL is deliberately apolitical. But is there any engagement with the Nagoya Protocol on Access and Benefit Sharing, or with the broader global tensions around prior informed consent and Open Knowledge?

MK: BHL is a big supporter of the Open Knowledge concept. 

GC: The Bouchout Declaration.

MK: Yes, the BHL is a signatory to the Bouchout Declaration, in terms of ensuring that biodiversity knowledge is not enclosed. So again, that’s our stance on that: be supportive of all of those types of things.

GC: I will also add that we have been working with the Global Biodiversity Information Facility (GBIF). They have been working on the Alliance for Biodiversity Knowledge, which ties into the Nagoya Protocol and a lot of those other biodiversity targets. Our staff have participated in a couple of the conferences they’ve hosted over the years to bring together the biodiversity data community, address some of these challenges, and support open access to data and knowledge. We have particularly been involved in the GBIO framework related to data that GBIF has been helping to corral and bring together. So I think participating in these consortium projects with other biodiversity data providers and aggregators working toward common goals is another way we can support open knowledge and access to data.

MK: And we’ve also worked with the Convention on Biological Diversity (CBD). Braulio Dias from Brazil, a former head of the CBD, was at one of our organizational meetings in Brazil in 2010. So the CBD is also an engaged forum for BHL to interact with the global community on issues like Nagoya.

GGL: There’s been a history of scientists working together to converge knowledge on climate change, on biodiversity loss, and right now on the biological origins of certain pandemics. There’s also been pushback from certain governments in terms of national sovereignty over knowledge or data. I know the US has very aggressive intellectual property rights but also advocates for open access to global data. Does BHL encounter national sovereignty of data issues or obstacles?

MK: Not really, because again, we’re dealing with published materials, generally speaking, as well as the manuscript materials in our own repositories. If anything, it’s going in the opposite direction: BHL has been repatriating data to other parts of the world. Our main example is a publication called Biologia Centrali-Americana, a project from the late 19th and early 20th centuries to document biodiversity in Central America and Mexico, organized out of the Museum of London. It ran to sixty-three volumes, and there were only two complete sets of it in all of Central America and Mexico, both at the Smithsonian Tropical Research Institute in Panama. So, by digitizing it, we really repatriated that important era of biodiversity knowledge to the people of Central America and Mexico.

GGL: Amazing. And are there other instances of repatriation that come to mind?

MK: In the earlier days of BHL, Brazil was really excited about the content, and a couple of different libraries in Brazil were actually downloading PDFs from BHL and storing them locally, so that their users would have articles specifically related to taxa and areas within Brazil. There was a lot of that sort of downloading and storing of local copies for people who might have difficulty with internet connections. Another example might be some of our African colleagues at the National Museums of Kenya.

GC: A couple of years ago, we started working with Sub-Saharan African national institutions through a JRS Biodiversity Foundation grant designed to help make gray literature from Sub-Saharan Africa, which might not be available outside those countries, more widely available digitally. Today we’re working with about ten institutions throughout Sub-Saharan Africa. The two primary ones are the National Museums of Kenya and the South African National Biodiversity Institute. Kenya in particular has focused on digitizing its institutional publications, which are mostly modern, so it has been granting us the rights to digitize them and make them freely available in open access through BHL, where anybody in the world can now access that information. Those are some good success stories.

GGL: When they digitize under Creative Commons, do you all counsel them on which Creative Commons licenses to use?

GC: The standard license in our permissions agreements is CC BY-NC-SA (Attribution-NonCommercial-ShareAlike). But some institutions have opted for the less restrictive CC BY.

GGL: I would love to learn more about the repatriation projects: is that a goal in the future–moving in that direction?

“That is really what repatriation is, and that is what BHL is about.”

MK: It’s not even a goal moving forward; it’s part of the DNA of BHL. Again, most of the literature about the global south has been published in the global north. So it’s really about providing information about biodiversity wherever it lives, from wherever it was published. That is really what repatriation is, and that is what BHL is about.

GGL: A follow-up question has to do with agricultural biodiversity. I study it, but I’m also now part of some emergency [COVID-19 related] seed distributions and swaps. I am also working with indigenous farmer groups in the US, Peru, and Mexico who are recovering native seed varieties. I worked with an Andean Peruvian Quechua agricultural community that repatriated–they use this word–native potato varieties from the International Potato Center in Lima. There is a key question about how these large ex situ gene banks could circulate and distribute their germplasm to communities and growers working to re-adapt varieties and diversify food systems, for food security and food sovereignty reasons. There is a need to know what older food crop varieties existed, and where. Is agricultural biodiversity a particular focus of BHL?

MK: Yes, and I see Grace smiling. It’s been a slightly controversial topic because BHL really did begin with a hardcore systematics and taxonomy community. However, because a number of our early partners were botanical gardens (the New York Botanical Garden and the Missouri Botanical Garden are the two biggest), agriculture and horticulture became part of BHL by default. When the USDA National Agricultural Library here in the Washington area joined, there was a floodgate of technical reports and agricultural reports that drove some of our systematists crazy when their feeds filled up with all these seed things, agriculture reports, and dairy cow stuff. Overall, though, our world community has really, I think, appreciated access to that material through the Biodiversity Heritage Library platform. A couple of years ago, we participated in the Feeding Nine Billion Global Food Security workshop at the National Agricultural Library. Agriculture, along with ecosystems and environmental studies, is the next really big area that BHL can be used for.

GC: And the food security workshop we participated in had a heavy focus on crop wild relatives information, and those communities made use of that information in the BHL collections.

GGL: What would crop wild relatives information look like as the next big area for BHL, as global materials are digitized and systematized?

MK: I think it’s providing the access. BHL really is driven by name strings in terms of finding things. I’m really not sure; that’s a good question for our developers on the names side: how do agricultural names for cultivars fit into our naming structure? Right now BHL isn’t easy to use for the non-systematist, so how can we work on that? We do have a big seed catalog project out of the National Agricultural Library and the New York Botanical Garden.

GC: At least 50,000 nursery catalogs have been digitized and saved in BHL, and they’re curated into a seed and nursery catalog collection. The primary contributors are the National Agricultural Library, the New York Botanical Garden, and Cornell University. A lot of that was spurred both by the National Agricultural Library joining BHL and by a grant-funded project to improve Optical Character Recognition (OCR) in BHL. Historic catalogs often have a lot of difficult type, whether text set in multiple columns or in varied fonts, so they’re really good for testing OCR and for developing algorithms or other ways to improve OCR quality. So, as part of that grant, we focused on digitizing seed and nursery catalogs, and the BHL collection served as a testbed for enhanced OCR.

GGL: Are you thinking about focusing on agricultural biodiversity in the future, building more global food security collaborations?

MK: We don’t have a conscious plan to do that yet, because again we have limited resources at the current time.

GGL: Well, my final question is how did you all get involved in this, and what are things that are particularly compelling or surprising in the work that you’re doing right now?

“So I think having just sort of a ‘Oh, let’s do a digital library’ project turn into a global phenomenon has been quite surprising for me.”

MK: I first became engaged in 2004 through a Mellon-funded conference we held on digital libraries. That grew into the founding of BHL in 2006. I was the deputy director of BHL under the inaugural director. The most surprising thing about BHL is the way it has grown into a community rather than yet another digital library project. It really is a global community of librarians, taxonomists, informatics people, and computer scientists, all working together toward a common goal. So I think having just sort of a ‘Oh, let’s do a digital library’ project turn into a global phenomenon has been quite surprising for me.

GC: I got involved originally as an intern at Smithsonian Libraries. While I was in grad school back in 2008, I did a summer internship, and at the end of the summer a library technician position focused on BHL happened to open up at Smithsonian Libraries. I have worked my way up through different roles to Outreach and Communications Manager. Like Martin said, it’s a very rewarding job to be part of such a passionate community. People in the BHL network have their own jobs at their own institutions; they give their time to support BHL because they believe so strongly in it. I have the privilege of interacting directly with our users around the world, and that’s really gratifying because you hear their stories. In a lot of cases, BHL enabled them to do research they wouldn’t have been able to do before. There are artists reusing the illustrations in their work, and students who are able to make discoveries and do their thesis work. And some children are starting to engage with, and love, nature through these illustrations and our coloring books. So, to me, hearing the stories from our users and seeing the impact we’re making is really special.

GGL: It’s such an exciting project. Any final comments about the future of BHL? 

MK: I think this COVID-19 pandemic is going to be a really interesting period for BHL. Many of our partners are under extreme financial stress, as well as in lockdown. When we all do come back, it’s going to be a challenge to get those partner institutions ramped back up. So one of our key goals right now is to provide meaningful telework opportunities for our partners around the world. Our BHL collections manager is working with partners around the world to help their staff accomplish meaningful work with BHL and provide value back to their institutions. So, again, BHL is really trying to serve not only our users but also our community, and I think that’s going to be our challenge for the next year: to return to that in full force.

GC: The only thing I would add is that while we’re going to face many challenges fueled by the current global situation, new opportunities have also arisen as more and more people become familiar, and in some cases more confident, with online meetings, video conferences, and virtual meetings. So I think there are opportunities to use the skills that a lot of our community is developing, and to offer more virtual opportunities for our user communities through workshops and online trainings. I’m interested to see how that develops and where we can take it in the future.


To learn more about their work:

In 2019, Martin Kalfatovic and Grace Constantino co-authored a chapter on their work, “The Biodiversity Heritage Library: Unveiling a World of Knowledge about Life on Earth,” in Digital Libraries for Open Knowledge, edited by Doucet et al. Last fall, they presented on BHL at the Biodiversity_Next Conference.

