Tuesday, December 12, 2017

Open Access: What should the priorities be today?

This year marks the 15th anniversary of the Budapest Open Access Initiative (BOAI), the meeting that led to the launch of the open access movement, and which defined open access thus:

“By ‘open access’ to this literature, we mean its free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.”

A great deal of water has passed under the bridge since 2002, but as 2017 draws to an end what should the stakeholders of scholarly communication be doing now to fully realise the vision outlined at the Budapest meeting?

That is a question I have been putting to a number of people, inviting them to say what they believe the priorities should be going forward for the following stakeholders: researchers, research institutions, research funders, politicians and governments, librarians and publishers. 

Today I am publishing the response I received from Danny Kingsley, Deputy Director of Scholarly Communication & Research Services, and Head of the Office of Scholarly Communication, at Cambridge University. This is what Danny had to say:


Researchers

Researchers are under extraordinary pressure to be highly productive while managing large teaching loads and a considerable administrative burden. They are rewarded only for the publication of novel results in high-impact journals – the remainder of their work is not systematically recognised in promotion and funding applications. Researchers are thus working within a reward system that runs counter to the wide-scale uptake of open access. It is not necessarily up to an individual researcher to become an open access activist (although I applaud those who have). However, a meta-level understanding of how the wider system works, or even an awareness that processes and norms differ by discipline, would be helpful.

Research institutions

A recent article described the need for academic leaders in research institutions to ‘step up’. Traditionally research institutions have been slow to react to issues related to open access and this is understandable. Research is a global and highly competitive endeavour. It is not in the interest of a single institution to introduce operational processes that could put its own reputation or that of its research community at risk. 

Having a university-wide discussion about the changing nature of scholarly communication, including open access, to determine the university’s position on various aspects of openness and reproducibility would provide a policy framework to assist the research community and their administrative support. For instance, research institutions have control over how they promote and hire researchers. Rewarding openness in these processes – by, for example, only considering articles that have been deposited in the institutional repository in a promotion round – will increase academic engagement with open access.

In addition, research institutions have a responsibility to ensure their students and early career researchers have the knowledge about, and tools to navigate, the fast-changing scholarly communication landscape. This means systematically putting in place support and training in this area rather than relying on individual supervisors. Recognition by research institutions that the people working in the area of scholarly communication are domain experts and offering professional respect would be hugely helpful in the progression of the scholarly communication discussion. Funding these roles as a central part of the institutional infrastructure is essential.

Research Funders

Funders hold the purse strings. Their policies have largely shaped open access uptake over the past 13 years (starting with the Wellcome Trust and NIH in 2004). However, simply pushing more money into the system is not a long-term solution because it allows the status quo to remain, where the power is held by the publishers of the prestigious journals that researchers need to publish in for promotion and grant success. Opening up what is ‘counted’ as a research output begins to address this – the recent push by the NIH and the Wellcome Trust to accept preprints as research outputs is a good step in this direction. However, the reluctance of funders to be seen to be ‘telling researchers where to publish’ – by, for example, refusing to officially state that Blood is a non-compliant journal for the RCUK policy – is a real impediment to progress (see here).

Funders have the ability to create a new reward system. Moves by funders to become publishers themselves – the Wellcome Trust and the Bill and Melinda Gates Foundation have both recently launched journals based on the F1000 platform – show progress towards this goal. However, some in the sector are concerned that this places too much power in the hands of one stakeholder, which becomes the decision maker not just of what research is funded, but of what is published.

Something that is sorely lacking in the open space is strong support for infrastructure that underpins it. The Open Science Framework, Directory of Open Access Journals, arXiv.org and Sherpa/RoMEO are just a few examples of critical open infrastructure which is reliant on grant and sector support. This leaves these initiatives very vulnerable to being bought up and absorbed into the for-profit sector, as has been seen with several organisations and initiatives over the past couple of years. Funder support on an ongoing basis for infrastructure would be very welcome, and there has been some discussion about this approach recently, particularly in the life sciences.

Politicians and governments

Ultimately, policy on public spending comes from the government, so it is important that the need for openness is recognised at the highest level. It has been interesting to see how much can be achieved when a government takes a positive approach to openness, as in the Netherlands, where the open agenda, particularly in relation to research data management, is exemplary.

The House of Commons Science and Technology Committee’s research integrity inquiry is a positive step in the UK, and its findings should help inform discussion about the need for transparency in research, including openness. There is also potential for greater alignment of government funder policy with the imminent launch of UK Research and Innovation in April 2018.


Librarians

Libraries are currently undertaking the vast bulk of open access work, particularly in the UK. They are hosting repositories and managing the ingest of material, which includes navigating complex, conflicting funder policies and confusing publisher embargo rules. They are, in most cases in the UK, responsible for managing the grants provided by funders to meet open access policies. This includes making decisions about engagement with offsetting deals and balancing these against the financial requirements of subscriptions. The difficulty is that much of this work is ‘hidden’ from the research community, who generally remain unaware of the amount of manpower and money being sunk into this endeavour.

It would be hugely helpful for the debate if libraries were to collect information about how the sector interacts with the literature. A holistic view - including where the research community publishes, peer reviews and edits, plus how much is being spent on subscriptions and article processing charges balanced against the number of downloads and citations – would assist discussion about the value different publishers offer the sector.


Publishers

There have been some good examples of publisher practice in this area recently. The Springer Compact – under which a single fee covers both subscription access and making all research outputs from a given institution openly accessible – has considerably brought down the per-article cost of making research open for Cambridge University. In the last couple of months both Emerald and the Royal Society have decided to abandon embargoes for Author’s Accepted Manuscripts deposited in institutional repositories.

We need to move towards standardised infrastructure. There is increasing publisher uptake of the CRediT taxonomy, which can assist in the identification and reward of different contributions to a research paper. There has also been a strong response to the Initiative for Open Citations, although some significant publishers have yet to join. Mandating ORCIDs across all titles would assist in identifying the authors of papers.

However, overall we have not seen a shift in publisher practice towards open access. While considerably more UK research is open access now than even three years ago, this has come at a huge cost, both in the payment of Article Processing Charges (APCs) for open access in hybrid journals and in the manpower needed to manage complicated embargo rules. If publishers genuinely wish to engage with and support a transition to open access, there needs to be:

·        A reduction of hybrid APCs to match those generally charged by fully open access journals (for which this is the only source of income)

·        An increase in the visibility of open access articles in hybrid journals, which remain less discoverable than articles in fully open access journals

·        Recognition that offsetting needs to occur – despite some claims that double dipping is not happening

·        Improvement of the offsetting deals on offer, making them simpler and more standardised in terms of administrative management and the discount offered – potentially by signing up to a sector-agreed offsetting “standard”

·        A standardisation of embargo periods, preferably to a 6-month STEM and 12-month HASS upper limit, and ideally no embargo at all

·        Engagement with the UK Scholarly Communications License to reduce the time spent processing embargoes and the money spent on hybrid APCs to ensure compliance with the RCUK policy

·        The ‘flipping’ of journals with a high percentage of open access content via hybrid, at which point the APC needs to reflect the running cost of the journal, while allowing enough margin to provide waivers for researchers from developing countries

Friday, October 27, 2017

The OA Interviews: Judy Ruttenberg, ARL Program Director for Strategic Initiatives/Co-Director of SHARE

When the open access movement began it was focused on solving two problems – the affordability problem (i.e. journal subscriptions are way too high, so research institutions cannot afford to buy access to all the research their faculty need), and the accessibility problem that this gives rise to.

Today, however, there is a growing sense that what really needs addressing is an ownership problem. Thus, whereas in 2000 the Public Library of Science petition readily acknowledged publishers’ “right to a fair financial return for their role in scientific communication” (while seeking to “encourage” them to make the papers they published freely available “within 6 months of their initial publication date”), today we are seeing calls for research communication to become “a community supported and owned enterprise” outside the control of publishers (see also here).

The key issue today, therefore, concerns the question of who should “own” and control scholarly communication, and more and more OA advocates are concluding that it should no longer be traditional publishers.

This change of emphasis is not surprising: as legacy publishers have sought to co-opt open access and bend it to their own needs, it has become clear that OA is insufficient on its own – for so long as publishers remain in control, the affordability problem that drove the calls for open access will not be solved. (More on this theme here.)

What gives this issue greater urgency is a new awareness that legacy publishers are looking to leverage the control they have acquired over scholarly content to dominate and control the data analytics and workflow processes/tools that are emerging in the digital space – a development that could usher in a new generation of paywalls, and lock the research community into expensive proprietary services.


This then is the ownership problem facing the research community. How is it playing out in practice? The linked interview with Judy Ruttenberg, Co-Director of SHARE, surfaces the issues well I think.

SHARE (the SHared Access Research Ecosystem) was launched in response to a 2013 memorandum issued by the US Office of Science & Technology Policy (OSTP) directing Federal agencies with more than $100M in R&D expenditures to “develop plans to make the published results of federally funded research freely available to the public within one year of publication and [require] researchers to better account for and manage the digital data resulting from federally funded scientific research.”

SHARE was an expression not just of librarians’ conviction that publicly-funded research should be freely available, but an assertion that it should be universities that provide access to it. As Ruttenberg puts it below, SHARE was founded in the belief that “university-administered digital repositories should be the mechanism by which federal agencies provide public access to funded research, most of which is conducted in universities.”

To read the Q&A please click here.

As is my custom, I have prefaced the interview with a long introduction. However, those who only wish to read the Q&A need simply click on the link at the head of the file and go directly to it.

Thursday, October 12, 2017

Q&A with PLOS co-founder Michael Eisen

Last month I suggested on Twitter that the open access movement has delayed the revolution in scholarly communication that the internet made possible. Perhaps unsurprisingly, my tweet attracted some pushback from OA advocates, not least from Michael Eisen, co-founder of open access publisher Public Library of Science (PLOS).
Photo: CC BY 4.0; Source here
Eisen objected strongly to my assertion and later complained that I was not willing to engage with him to defend what I had said. For my part, I did not feel it was possible to debate the issue adequately on Twitter, so we agreed to do a follow-up to our 2012 Q&A.

Last week, therefore, I emailed Eisen a Word document explaining why I had made the assertion I had, and posing 14 questions for him. I published the explanation here yesterday.

The first question in the list I sent to Eisen was: “How would you describe the way the OA movement has developed, and the impact it has had on scholarly communication? Why do you disagree that the movement has delayed open access? What is the movement’s current status and potential? What needs to be done to ensure the revolution takes place sooner rather than later?”

Eisen did not respond to this first question, I assume because he felt that his answers to the other questions addressed the points. He did, however, answer all the other questions, which I publish below.

For anyone who might not be aware, PLOS started out as an OA advocacy group, and in 2000 launched an online petition calling on scientists to pledge that they would discontinue submitting papers to journals that did not make the full text of their papers freely available (either immediately or after a delay of no more than 6 months). The petition attracted tens of thousands of signatures, but few of the signatories changed their behaviour.

In 2003, therefore, PLOS reinvented itself as an OA publisher and began to launch its own journals. Since then it has become a significant presence in the scholarly communication world.

Eisen, an evolutionary biologist who studies flies at the University of California, Berkeley, co-founded PLOS with former director of the US National Institutes of Health Harold Varmus and biochemist Pat Brown. Eisen is still on the PLOS Board.

Earlier this year Eisen announced his intention to run for the United States Senate. 

The interview begins …

RP: OA advocates have long maintained that pay-to-publish gold OA will create a “true” market for scholarly communication, something the subscription model never has. As a result, they argue, the price of scholarly communication will start to fall with OA, thereby solving the affordability problem. I believe this was your view too. In 2012, for instance, you pointed out on my blog that the subscription model creates a “disconnect between the people making the decision about where to publish and the people who pay the subscription bills. This means that there is very little effect on author demand if prices go up.” You added that this inefficiency is “absent from gold OA.” Have your views on this changed in any way in the past 5 years? Do you expect pay-to-publish gold OA to eventually exert downward pressure on prices? (Contrary to Elsevier’s view that “average APCs would need to rise to fund the infrastructure currently paid for via the 80 percent of articles [still] published under the subscription model”).

ME: I still believe a service model for publishing creates a better market than the subscription model for the reasons outlined above. But it’s clearly not working as well as I would like it to. Prices have not dropped, nor seem likely to in the near future. You can point to several reasons.

·       people aren’t really paying for a service, they’re paying for a brand, and the brand value of something like “Nature” swamps the actual cost of the service so it’s not really a sane market

·       costs haven’t really dropped – publishing software still sucks and is expensive and the whole process requires far too much human intervention

·       costs have been externalized again through deals to pay publication charges

Even with all this being true everything’s still better if we have a world with universal APCs than one with universal subscriptions since material is no longer paywalled. But the market advantages of APCs have yet to be realized.

Wednesday, October 11, 2017

Has the open access movement delayed the revolution?

Last month I posted a couple of tweets that attracted some pushback from OA advocates. In the process I was accused of being a species of “Russian troll bot”, of having an unspecified “other agenda”, and then told that unless I was willing to engage in “constructive discussion” I should pipe down.

Amongst those to object to my tweets was PLOS co-founder, and feisty OA advocate, Michael Eisen (see below). 

Evidently dissatisfied with my responses, Eisen declared that it was silly to make inflammatory statements on Twitter and then say that the platform is a bad place for discussions. However, after a few rounds of back and forth with my critics, I had concluded that it was not going to be possible to debate the matter in short bursts without ending up simply swapping insults. So, I proposed to Eisen that we do a follow up to the Q&A we had done in 2012.

Eisen agreed, and last week I emailed him the text below by way of explanation as to why I had made the comments I had, along with a number of questions for him to answer. I plan to publish Eisen’s answers in the next couple of days. (Now available here).

What sparked the disagreement? It began when I tweeted a link to a confessional interview that Leslie Chan had given on the OCSDnet website. Amongst other things, Chan conceded that he had, over the years, given a lot of bad advice about open access.

In posting a link to the interview I commented, “I wish all OA advocates could be this honest, rather than repeating out-dated mantras & promoting failed approaches.” 

By way of background, Chan was (with Eisen) one of the small group of people who attended the Budapest Open Access Initiative (BOAI) meeting in 2002. It was from that meeting that the term open access emerged, and BOAI is viewed as the moment the OA movement came into being. As such, Chan’s confession seemed to me to be a significant moment, not least because it was made with more candour than I have come to expect from OA advocates.

OA advocate Stephen Curry responded to my initial tweet by saying, “Also true that OA has stimulated many to think seriously about & challenge current practices around research evaluation. Myself included.” To this I replied, “To some extent, I agree. But I would phrase it this way: the internet made a revolution possible, open access has delayed that revolution.”

Eisen denounced my comment as “one of the most ridiculous, misguided, and frankly ignorant statements about scholarly publishing ever.”

Anyway, below is why I said what I did in those tweets.

Wednesday, September 06, 2017

The Open Access Interviews: Justin Flatt on the Self-Citation Index

In a recently published paper, Justin Flatt and his two co-authors proposed the creation of the Self-Citation Index, or s-index. The purpose of the s-index would be to measure how often a scientist cites their own work. This is desirable, the authors believe, because current incentive systems tend to encourage researchers to cite their own work excessively.

In other words, since the number of citations a researcher’s works receive enhances his/her reputation, there is a temptation to add superfluous self-citations to articles. This boosts the author’s h-index – the author-level metric now widely used as a measure of researcher productivity.

Amongst other things, excessive self-citation gives those who engage in it an unfair advantage over more principled researchers – an advantage, moreover, that grows over time: a 2007 paper estimated that every self-citation increases the number of citations from others by about one after one year, and by about three after five years. This creates unjustified differences in researcher profiles.

Since women self-cite less frequently than men, they are put at a particular disadvantage. A 2006 paper found that men are between 50 and 70 per cent more likely than women to cite their own work.

In addition to unfairly enhancing less principled researchers’ reputations, say the paper’s authors, excessive self-citation is likely to have an impact on the scholarly record, since it has the effect of “diminishing the connectivity and usefulness of scientific communications, especially in the face of publication overload”.

None of this should surprise us. In an academic environment now saturated with what Lisa Mckenzie has called “metrics, scores and a false prestige”, Campbell’s Law inevitably comes into play. This states that “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

Or as Goodhart’s Law more succinctly puts it, “When a measure becomes a target, it ceases to be a good measure.”

However, academia’s obsession with metrics, measures, and monitoring is not going to go away anytime soon. Consequently, the challenge is to try and prevent or mitigate the inevitable gaming that takes place – which is what the s-index would attempt to do. In fact, there have been previous suggestions of ways to detect possible manipulation of the h-index – a 2011 paper, for instance, mooted a “q-index”.

It is also known that journals will try and game the Impact Factor. Editors may insist, for instance, that authors include superfluous citations to other papers in the same journal. This is a different type of self-citation and sometimes leads to journals being suspended from the Journal Citation Reports (JCR).

But we need to note that while the s-index is an interesting idea it would not be able to prevent self-citation. Nor would it distinguish between legitimate and non-legitimate self-citations. Rather, says Flatt, it would make excessive self-citation more transparent (some self-citing is, of course, both appropriate and necessary). This, he believes, would shame researchers into restraining inappropriate self-citing urges, and help the research community to develop norms of acceptable behaviour.
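To make the idea concrete: the s-index can be defined by direct analogy with the h-index – the largest number s such that an author has s papers with at least s self-citations each. The sketch below is my own simplification of that idea, and the per-paper counts in the example are invented:

```python
def s_index(self_citations):
    """h-index analogue computed over self-citations: the largest s such
    that the author has s papers with at least s self-citations each."""
    counts = sorted(self_citations, reverse=True)
    s = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            s = rank  # at least `rank` papers have >= `rank` self-citations
        else:
            break
    return s

# Invented example: self-citation counts for one author's six papers.
print(s_index([9, 7, 4, 4, 1, 0]))  # -> 4
```

As with the h-index, the measure is deliberately insensitive to a single heavily self-cited paper; it only rises when self-citation is broad as well as deep, which is precisely the pattern the authors want to make visible.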

Openness and transparency

However, any plans to create and manage a researcher-led s-index face a practical challenge: much of the data that would be needed to do so are currently imprisoned behind paywalls – notably behind the paywalls of the Web of Science and Scopus. 

Tuesday, August 29, 2017

The Open Access Interviews: Rusty Speidel, The Center for Open Science

The Center for Open Science (COS) has announced today that six new preprint services have launched using COS’ preprints platform, taking the number of such services to 14. 

The announcement comes at a time when we are seeing a rising tide of preprint servers being launched, both by for-profit and non-profit organisations – a development all the more remarkable given scholarly publishers’ historic opposition to preprint servers. Indeed, so antagonistic to such services have publishers been that until recently they were often able to stop them in their tracks. 

In 1999, for instance, fierce opposition to the E-BIOMED proposal mooted by the then director of the US National Institutes of Health Harold Varmus caused it to be stillborn.  

Publisher opposition also managed to bring to a halt an earlier initiative spearheaded by NIH administrator Errett Albritton. In the early 1960s, Albritton set up a series of Information Exchange Groups in different research areas to allow “memos” (documents) to be mutually shared. Many of these memos were preprints of papers later published in journals.

Albritton’s project was greeted with angry complaints and editorials from publishers – including one from Nature decrying what it called the “inaccessibility, impermanence, illiteracy, uneven equality [quality?], and lack of considered judgment” of the documents being shared via “Dr Allbritton’s print shop”. The death knell came in late 1966 when 13 biochemistry journals barred submissions of papers that had been shared as IEG memos.

Seen in this light, the physics preprint server arXiv, created in 1991 and now hugely popular, would appear to be an outlier.

The year the tide turned

But over the last five years or so, something significant seems to have changed. And the year the tide really turned was surely 2013. In February of that year, for instance, for-profit PeerJ launched a preprint service called PeerJ Preprints.

And later that year, non-profit Cold Spring Harbor Laboratory (CSHL) launched a preprint server for the biological sciences called bioRxiv. Rather than opposing bioRxiv, a number of biology journals responded by changing their policies on preprints, indicating that they no longer consider preprints to be a “prior publication”, and thus not subject to the Ingelfinger rule (which states that findings previously published elsewhere, in other media or in other journals, cannot be accepted). Elsewhere, a growing number of funders are changing their policies on preprints and now encourage their use.

This has allowed bioRxiv to go from strength to strength. As of today, over 14,000 papers have been accepted by the preprint server, and growth appears to be exponential: the number of monthly submissions grew from more than 810 this March to more than 1,000 in July.

But perhaps the most interesting development of 2013 was the founding of the non-profit Center for Open Science. With funding from, amongst others, the philanthropic organisations The Laura and John Arnold Foundation and The Alfred P. Sloan Foundation, COS is building a range of services designed “to increase openness, integrity, and reproducibility of scientific research”.

Thursday, August 03, 2017

The State of Open Access: Some New Data

A preprint posted on PeerJ yesterday offers some new insight into the number of articles now available on an open-access basis. 

The new study is different to previous ones in a number of ways, not least because it includes data from users of Unpaywall, a browser plug-in that identifies papers that researchers are looking for, and then checks to see whether the papers are available for free anywhere on the Web. 

Unpaywall is based on oaDOI, a tool that scours the web for open-access full-text versions of journal articles.

Both tools were developed by Impactstory, a non-profit focused on open-access issues in science. Two of the authors of the PeerJ preprint – Heather Piwowar and Jason Priem – founded Impactstory. They also wrote the Unpaywall and oaDOI software.

The paper – which is called The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles – reports that 28% of the scholarly literature (19 million articles) is now OA, and growing, and that for recent articles the percentage available as OA rises to 45%.

The study authors say they also found that OA articles receive 18% more citations than average. 

In addition, the authors report on what they describe as a previously under-discussed phenomenon of open access – Bronze OA. This refers to articles that are made free-to-read on the publisher’s website without an explicit open licence.

Below I publish a Q&A with Heather Piwowar about the study. 

Note: my questions were based on an earlier version of the article I saw, and a couple of the quotes I cite were changed in the final version of the paper. Nevertheless, all the questions and the answers remain relevant and useful so I have not changed any of the questions.

The interview

RP: What is new and different about your study? Do you feel it is more accurate than previous studies that have sought to estimate how much of the literature is OA, or is it just another shot at trying to do that?

HP: Our study has a few important differences:

·       We look at a broader range of the literature than previous studies and go further back (to pre-1950 articles), we look at more articles (all of Crossref, not just all of Scopus or Web of Science – Crossref has twice the number of articles that Scopus has), and we take a larger sample than most other studies. That’s because we classify OA status algorithmically, rather than relying on manual classification. This allowed us to sample 300k articles, rather than a few hundred as many OA studies have done. So, our sample is more accurate than most; and more generalizable as well.

·       We undertook a more detailed categorization of OA. We looked not just at Green and Gold OA, but also Hybrid, and a new category we call Bronze OA. Many other studies (including the most comparable to ours, the European Commission report you mention below) do not bring out all these categories specifically. (I will say more on that below). Furthermore, we didn’t include Academic Social Networks. Mixing those with publisher-hosted free-to-read content makes the results less useful to policy makers.

·       Our data and our methods are open, for anyone to use and build upon. Again, this is a big difference from the Archambault et al. study (that is, the one commissioned by the European Commission) and we think it is an important difference.

·       We include data from Unpaywall users, which allows us to get a sense of how much of the literature is OA from the perspective of actual readers. Readers massively favour newer articles, for instance, which is good news because such articles are more likely to be OA. By sampling actual reader data, from people using an OA tool that anyone can install, we can report OA percentages that are more realistic and useful for many real-world policy issues.

RP: You estimate that at least 28% of the scholarly literature is open access today. OA advocates tend nowadays to cite the earlier European Commission report which, the EU claims, indicates that back in 2011 nearly 50% of papers were OA. Was the EU study an overestimate in your view, or has there been a step backwards?

HP: Their 50% estimate was of recent papers, and included papers posted to ResearchGate (RG) and Academia.edu as open access. Our 28% estimate is for all journal articles, going back to 1900 – everything with a DOI. We found 45% OA for recent articles, and that’s excluding RG and Academia. So, they are pretty similar estimates.

RP: In fact, you came up with a number of different percentages. Can you explain the differences between these figures, why it is important to make these distinctions, and what the implications of the different figures are?

HP: There are two summary percentages: 28% OA for all journal articles, and 47% OA for journal articles that people read. As I noted, people read more recent articles, and more recent articles are more likely to be OA, so it turns out that almost half of the papers people are interested in reading right now are actually OA. Which is really cool!

Actually, when you consider that our automated methods missed a bit of OA, it is more than half – so the 47% is a lower bound.

RP: You coin a new definition of open access in your paper, what you call Bronze OA. Can you say something about Bronze OA and its implications? It seems to me, for instance, that a lot of papers (over half?) currently available as open access are vulnerable to losing their OA status. Is that right? If so, what can be done to mitigate the problem?

HP: Yes, we did think we were coining a new term. But this morning I learned we weren’t the first to use the term Bronze OA – that honour goes to Ged Ridgway, who posted the tweet below in 2014.

I guess it’s a case of Great Minds Think Alike!

Our definition of Bronze OA is the same as Ged’s: articles made free-to-read on the publisher’s website, without an explicit open license. This includes Delayed OA and promotional material like newsworthy articles that the publishers have chosen to make free but not open.

It also includes a surprising number of articles (perhaps as much as half of the Bronze total, based on a very preliminary sample) from entirely free-to-read journals that are not listed in DOAJ and do not publish content under an open license. Opinions will differ on whether these are properly called “Gold OA” journals/articles; in the paper, we suggest they might be called “Dark Gold” (because they are hard to find in OA indexes) or “Hidden Gold.” We are keen to see more research on this. 

More research is also needed to understand the other characteristics of Bronze OA. Is it disproportionately non-peer-reviewed content (e.g. front-matter), as seems likely? How much of Bronze OA is also Delayed OA? How much Bronze is Promotional, and how transient is the free-to-read status of this content? How many Bronze articles are published in “hidden gold” journals that are not listed in the DOAJ? Why are these journals not defining an explicit license for their content, and are there effective ways to encourage them to do so?

This kind of follow-up research is needed before we can understand the risks associated with Bronze and what kind of mitigation would be helpful.

RP: You say in your paper, “About 7% of the literature (and 17% of the OA literature) is Green, and this number does not seem to be growing at the rate of Gold and Hybrid OA.” You also suspect that much of this green OA is “backfilling” repositories with older articles, which are generally viewed as being of less value. What happened to the OA dream articulated by Stevan Harnad in 1994, and what future do you predict for green OA going forward?

HP: First, I should clarify: our definition of Green OA for the purposes of the study is that a paper is in a repository and is not available for free on the publisher site. This is so we don’t double count articles as both Green and Gold (or Hybrid or Bronze) for our analysis.

We gave publisher-hosted locations priority in our classification because we suspect most people would rather read papers there. So, when we say in our article that Green OA isn’t growing, what we mean is that the proportion of papers available only in repositories (our definition of Green) is roughly the same for recent papers as for older ones.
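The mutually exclusive classification Piwowar describes – publisher-hosted copies take priority, so an article is only counted as Green when the repository copy is the sole free copy – can be sketched roughly as follows. The field names and decision order here are illustrative assumptions for clarity, not the paper’s actual code:

```python
def classify_oa(article):
    """Assign exactly one OA category per article, giving publisher-hosted
    copies priority over repository copies (illustrative sketch only;
    the field names are assumptions, not the study's actual schema)."""
    if article.get("free_on_publisher_site"):
        if article.get("journal_is_oa"):       # whole journal is OA (e.g. in DOAJ)
            return "gold"
        if article.get("has_open_license"):    # openly licensed in a toll journal
            return "hybrid"
        return "bronze"                        # free to read, but no open license
    if article.get("in_repository"):
        return "green"                         # the only free copy is in a repository
    return "closed"
```

Under this scheme an article that is free on the publisher’s site *and* deposited in a repository counts as Gold, Hybrid or Bronze rather than Green – which is exactly the “shadowed green” effect discussed below.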

It is worth future study to understand this better. I have a suspicion: perhaps much of what would have been Green OA became Bronze and what we call “shadowed green” – where there is a copy in a repository and a freely available copy on the publisher’s site as well. I suspect publishers responded to funder mandates that require self-archiving by making papers free on their own sites too, on a matching schedule.

Specifically, biomedicine doesn’t look like it has as much Green as I’d expect, given the success of the NIH mandate and the number of articles in PMC. We do know many biomedical journals have Delayed OA policies, which we categorized as Bronze in our analysis. Did they implement these Delayed OA policies in response to the PMC mandates? Perhaps others already know this to be true – I haven’t had a chance to look it up. In any case, I think the interplay between Green and Bronze is especially worth more exploration.

We do also report on all the articles that are deposited in repositories, Green plus shadowed green, in the article’s Appendices. We found the proportion of the literature that is deposited in repositories to be higher for recent publication years.

One final note: We actually changed the sentence that you quoted in the final version of our paper, because we were wrong to talk about “growing” as we did. Our study didn’t measure when articles were deposited in repositories, but just looked at their publication year. Other studies have demonstrated that people often upload papers from earlier years, a practice called backfilling.

I suppose in some ways these have less value, because they are read less often. That said, anyone who really needs a particular paper and doesn’t otherwise have access to it is surely happy to find it.

RP: You also looked at the so-called citation advantage and estimate that an OA article is likely to attract 18% more citations than average. The citation advantage is a controversial topic. I don’t want to appear too cynical, but is not the idea of trying to demonstrate a citation advantage more an advocacy tool than a meaningful notion? I note, for instance, that Academia.edu has claimed that posting papers to its network provides a 73% citation advantage. Surely the real point here is that if all papers were open access there would be no advantage to open access from a citation point of view?

HP: That’s true! And that’s the world I’d love to see – one where the citation playing field is flat, because everyone can read everything.

RP: What would you say were the implications of your study for the research community, for librarians, for publishers and for open access policies?

HP: For the research community: Install Unpaywall! You’ll be able to read half the literature for free. Self-archive your papers, or publish OA.

For OA/bibliometrics researchers: Build on our open data and code, let’s learn more about OA and where it’s going.

For librarians: Use this data to negotiate with publishers: Half the literature is free. Don’t pay full price for it.

For publishers: Half the literature is now free to read. That percentage is growing. You don’t need a weathervane to know which way the wind blows: long term, there’s no money in selling things that people can get for free. Flip your journals. Sell services to authors, not access to content – it’s an increasingly smart business decision, as well as the Right Thing To Do.

For open access policy makers: We need to understand more about Bronze. Bronze OA doesn’t safeguard a paper’s free-to-read status, and it isn’t licensed for reuse. This isn’t good enough for the noble and useful content that is Scholarly Research. Also: let’s accelerate the growth.

You didn’t ask about tool developers. An increasing number of people are building tools into which OA can be integrated. They should use the oaDOI service. Now that such a large chunk of the literature is free, there are a lot of really transformative things we can build and do – in terms of knowledge extraction, indexing, search, recommendation, machine learning etc.
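For developers, the integration Piwowar describes amounts to looking up a DOI against the oaDOI service. A minimal sketch might look like the following; the endpoint URL and the response fields (`is_oa`, `best_oa_location`) are assumptions based on the service’s public documentation at the time of this interview, and should be checked against the current docs (the service has since been folded into the Unpaywall API):

```python
import json
from urllib.request import urlopen

# Assumed endpoint at the time of this interview; verify against current docs.
OADOI_BASE = "https://api.oadoi.org/v2"

def oadoi_url(doi, email):
    """Build a lookup URL for one DOI. oaDOI asks callers to identify
    themselves via an email query parameter."""
    return "{}/{}?email={}".format(OADOI_BASE, doi, email)

def lookup_oa(doi, email):
    """Fetch the OA record for a DOI (requires network access)."""
    with urlopen(oadoi_url(doi, email)) as resp:
        return json.load(resp)

# Example usage (not run here; needs network access):
# record = lookup_oa("10.7717/peerj.4375", "you@example.org")
# record.get("is_oa")            # whether a free copy was found
# record.get("best_oa_location") # where the best free copy lives
```

Batch-resolving DOIs this way is essentially what the Unpaywall browser extension does for each page a reader visits.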

RP: OA was at the beginning as much (in fact more) about affordability as about access (certainly from the perspective of librarians). I note the recently published analysis of the RCUK open access policy reports that the average APC paid by RCUK rose by 14% between 2014 and 2016, and that the increase was greater for those publishers below the top 10 (who are presumably focused on catching up with their larger competitors). Likewise, the various flipping deals we are seeing emerge are focused on no more than transferring costs from subscriptions to APCs, with no realistic expectation of prices falling in the future. If the research community could not afford the subscription system (which OA advocates have always maintained) how can it afford open access in the long-term?

HP: If the rising APCs are because small publishers are catching up with the leaders by raising prices, that won’t continue forever – they’ll catch up. Then it’ll work like other competitive marketplaces.

The main issue is freeing up the money that is currently spent on subscriptions. We think studies like this, and tools like Unpaywall, can be helpful in lowering subscription rates and in forgoing Big Deals, as libraries are increasingly doing.

RP: As you say, in your study you ignored social networking sites like Academia.edu and ResearchGate “in accordance with an emerging consensus from the OA community, and based largely on concerns about long-term persistence and copyright compliance.” And you also say, “The growing proportion of OA, along with its increased availability using tools like oaDOI and Unpaywall, may make toll-access publishing increasingly unprofitable, and encourage publishers to flip to Gold OA models.” I am wondering, however, if it is not more likely that sites like Academia.edu (which researchers much prefer to use over paying to publish or depositing in their institutional repository) and Sci-Hub (which is said to contain most of the scientific literature now) will be the trigger that finally forces legacy publishers to flip their journals to open access, whatever one’s views on the copyright issues. Would you agree?

HP: It won’t be any one trigger, but rather an increasingly inhospitable environment. Sci-Hub is a huge contributor to that, and Academic Social Networks are too. Unpaywall opens up another front: a best-practice, legal approach to bypassing paywalls that librarians and others can unabashedly recommend. It all combines to make it easier and more profitable for publishers to flip, and for the future to be OA.

RP: Thank you for answering my questions.