Help Cochrane Crowd with the new ClinicalTrials.gov task!

Cochrane Crowd needs your help to identify randomised controlled trials (RCTs) from ClinicalTrials.gov! Today Cochrane Crowd, in partnership with Cochrane’s Centralised Search Service, launches a new identification task called ClinicalTrials.gov identification (or CT ID for short).

Don’t know what Cochrane Crowd is? It’s Cochrane’s citizen science platform. Anyone can join Cochrane Crowd’s collaborative volunteer effort to help categorise and summarise healthcare evidence so that we can make better healthcare decisions. Take a look at a 2-minute introductory video here.

Read on for the low-down on the new task from Anna Noel-Storr, Co-Lead of Cochrane Crowd.

What is this new task all about?

Cochrane Crowd’s main task is the randomised controlled trial (RCT) identification task, where the Crowd identifies reports of RCTs or quasi-RCTs from centralised searches, for uploading to the Cochrane Central Register of Controlled Trials (CENTRAL). In doing this we are helping reviewers and researchers from around the world find the evidence they need. Until now, our amazing community has been working on records identified from bibliographic databases, such as Embase and PubMed. For this new task, the records are from ClinicalTrials.gov, the largest registry of publicly and privately supported clinical studies of human participants conducted around the world.

Why is this task important?

The ClinicalTrials.gov registry contains studies that may not appear elsewhere. All clinical trials should be registered prospectively in a registry such as ClinicalTrials.gov, and all should make their results available and accessible once the trial is completed. The vast majority of trials so far identified by Cochrane Crowd are trials that have been written up in a publication. By identifying trials registered in ClinicalTrials.gov we will now also routinely identify planned, ongoing, and completed yet unpublished trials. Around half of the trials registered on ClinicalTrials.gov are RCTs. That’s a lot of trials! By running them through the Crowd (and the machine – more on that in a moment) and depositing them in CENTRAL, we hope that Cochrane reviewers and others will ultimately be able to easily discover planned, in-progress, unpublished and published RCTs in one place.

What will the process be?

Each month, all new trials that have been registered on ClinicalTrials.gov will first be run through a machine classifier that we have built. This classifier is based on a gold-standard data set developed by the Crowd and the Centralised Search Service teams. When a record is run through the classifier it receives a score out of 100 representing the likelihood that the record is describing an RCT. For example, if a record gets a score of 99, then the machine is 99% sure that the record is describing an RCT. We’ve been working hard to determine appropriate cut points where we can be confident that the machine has got it right. After all, we don’t want to use up valuable human effort if the machine is reliable. We’ve been able to establish two cut points: one at the lower end of the spectrum and one at the upper end. Records that score below the lower cut point will go directly into the ‘bin’, and records that score above the upper cut point will go directly to CENTRAL. The records that fall between these two points will go to Cochrane Crowd.
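For the technically curious, the routing described above can be sketched in a few lines of Python. This is only an illustration, not the code Cochrane actually runs: the function name `route_record`, the cut-point values of 10 and 90, and the way scores that land exactly on a cut point are handled are all assumptions made for the example, since the real values are not given here.

```python
from collections import Counter

# Hypothetical cut points, chosen only for illustration.
# The real values used by the classifier are not stated in this article.
LOWER_CUT = 10.0
UPPER_CUT = 90.0


def route_record(score, lower_cut=LOWER_CUT, upper_cut=UPPER_CUT):
    """Decide where a record goes, given its classifier score out of 100.

    Returns "bin" (very unlikely to be an RCT), "central" (very likely
    to be an RCT) or "crowd" (uncertain, so humans should look at it).
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if lower_cut > upper_cut:
        raise ValueError("lower_cut must not exceed upper_cut")

    if score < lower_cut:
        return "bin"
    if score > upper_cut:
        return "central"
    # Assumption: scores exactly on a cut point are sent to the Crowd,
    # so that borderline records always get a human check.
    return "crowd"


if __name__ == "__main__":
    # Made-up scores standing in for one month's new registrations.
    example_scores = [2.5, 99.0, 47.0, 10.0, 90.0, 91.2]
    tally = Counter(route_record(s) for s in example_scores)
    for destination in ("bin", "crowd", "central"):
        print(destination, tally[destination])
```

The further apart the two cut points are, the more records reach the Crowd; moving them closer together saves human effort but relies more heavily on the machine being right.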

What do I need to do that is different from the usual RCT ID task?

The task is exactly the same in that you are classifying records as being RCTs, or not. But the records look a little different. To help you get used to the new format, we’ve set up a training module with 20 practice records that you’ll need to do before you start screening live records. It’s also worth saying that the ClinicalTrials.gov task has a higher prevalence of RCTs, so if you enjoy that moment of finding an RCT (and let’s face it, who doesn’t?), then you might find this task even more enjoyable than the standard task. We also think it’s a great one for beginners as the records are structured nicely.

How many records are there?

There are around 100,000 records currently ‘in the pot’. Once we are through the backlog, we anticipate that there will be around 2,000 per month. We’re aiming to have the backlog cleared by the end of the year. I feel a “challenge” coming on!

How long will this task be posted for?

For quite a while. The more human-generated data we can amass, the more training data we’ll have to feed into the machine, so one day it might be as smart as the Crowd! But for now this is a task for people and it will be on the platform for the foreseeable future.

When can I start?

Right now! Go and make a nice cup of tea, log in as usual and you will see the CT.Gov identification task at the top of your screen.

Can I still work on the usual RCT identification task?

Yes, you can keep working on that task as usual. You could alternate between the two. Or you could just concentrate on the CT ID task. It’s entirely up to you.

Twitter chat

If you’re a twitterer, we’ll be using #CrowdCTID.

Who can I contact if I have any questions or queries?

You can contact either me, Anna (anna.noel-storr@rdm.ox.ac.uk), or my brilliant colleague, Emily (crowd@cochrane.org), and we’ll try to get back to you as quickly as possible.

28 August 2017