Tracking Data Reuse
Motivations, Methods, and Obstacles

Heather Piwowar
DataONE postdoc with NESCent and Dryad
@researchremix

IASSIST 2011 #iassist
http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
http://www.flickr.com/photos/jsmjr/62443357/
http://www.flickr.com/photos/camilleharrington/3587294608/
http://www.flickr.com/photos/rkuhnau/3318245976/
http://www.flickr.com/photos/conformpdx/1796399674/
http://www.flickr.com/photos/rkuhnau/3317418699/
http://www.flickr.com/photos/zemlinki/261617721/
http://www.flickr.com/photos/tracenmatt/3020786491/
http://www.flickr.com/photos/the-o/2078239333/
?
    http://www.flickr.com/photos/ryanr/142455033/
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
http://www.flickr.com/photos/archeon/2941655917/
IASSIST 2011 presentation: Tracking Data Reuse  Motivations, Methods, and Obstacles
In 2009, 116 articles cited ORNL DAAC data.

Finding these articles took 70-80 hours

across at least 12 resources,
all chosen from a deep understanding
of this specific research domain.

Then the full text of every hit was
manually reviewed.

Source: Valerie Enriquez interview with James Kidder
http://openwetware.org/wiki/DataONE:Notebook/Reuse_of_repository_data
How to identify Dataset Reuse in the published literature

Starting point: a publicly archived dataset

A. Search the reference sections of all papers
   Question: dataset has an identifier? (DOI, url, accession #)
   Query: with dataset unique ID
   Caveats:
   - IDs are difficult to unambiguously identify in full text unless they have a unique pattern (DOI) or unusual prefix or suffix.
   - DOI/ID reference search is possible in full-text portals like PubMed Central and HighWire Press; however, portal coverage is limited and search is not restricted to the references section.
   - DOI/ID search works in Google Scholar, but scope is poorly defined and results are messy.
   - DOI/ID search is not supported by ISI Web of Science or Scopus.
   Result: This citation pattern (dataset DOI/ID in references section) is used almost exclusively for dataset reuse.
   - Manual disambiguation not required: can be automated pending API support.
   - Does not require access to full text.
   - This citation pattern is currently rare.
   - This citation pattern is difficult to track with existing tool limitations.

B. Search the full text of all papers
   Question: dataset submission record has submitter name or dataset title?
   Queries feeding this search:
   - with dataset unique ID
   - with (submitter surname AND repository name), and also (dataset title AND repository name)
   - with (first author surname AND repository name)
   Next step: sort hits to disambiguate reuse from submission
   Caveats:
   - Names and titles are messy identifiers.
   - Requires ability to query full text across all literature that may contain reuse.
   - Disambiguation is time consuming.
   - Requires access to full text of search hits for sorting.
   Result: This citation pattern (accession numbers in full text) is very common in some subdisciplines, so probably finds most reuses.

C. Gather papers that cite the data collection paper
   Question: dataset submission record mentions data collection article publication?
   Query: with data collection article's journal, volume, page, etc.
   Next step: sort hits to disambiguate reuse from other citation contexts
   Caveats:
   - Link to data collection paper often missing from dataset submission record, especially when dataset submission predates article publication.
   - Citation history export is time consuming: automation not supported.
   - Only finds citations indexed by citation databases.
   - Disambiguation is time consuming: most citations are not in the context of reuse.
   - Requires access to full text of search hits for sorting.
   Result: This citation pattern (citation to data creation paper) is very common in some subdisciplines, so probably finds most reuses.

This flow still misses attributions embedded in supplementary information, reuses attributed through a query description, etc.

Heather Piwowar, v1.0, CC-BY
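The full-text queries named in the flow above (a unique ID on its own, or a surname or title combined with the repository name) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SubmissionRecord` fields, the quoting rule, and the `AND` syntax are hypothetical and would need adapting to whatever search interface is actually being queried. The identifier in the usage example is invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubmissionRecord:
    """Hypothetical view of a dataset submission record (field names are assumptions)."""
    repository: str
    dataset_id: Optional[str] = None
    submitter_surname: Optional[str] = None
    title: Optional[str] = None
    first_author_surname: Optional[str] = None


def quote(term: str) -> str:
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return '"' + term.replace('"', " ").strip() + '"'


def full_text_queries(rec: SubmissionRecord) -> List[str]:
    """Build one boolean query per piece of information the record provides."""
    queries: List[str] = []
    if rec.dataset_id:
        # A unique ID can be searched on its own.
        queries.append(quote(rec.dataset_id))
    repo = quote(rec.repository)
    # Names and titles are messy, so each is paired with the repository name.
    for value in (rec.submitter_surname, rec.title, rec.first_author_surname):
        if value:
            queries.append(f"({quote(value)} AND {repo})")
    return queries


if __name__ == "__main__":
    # Invented example record, for illustration only.
    example = SubmissionRecord(
        repository="Dryad",
        dataset_id="10.5061/dryad.example",
        submitter_surname="Smith",
    )
    for query in full_text_queries(example):
        print(query)
```

The hits returned by such queries would still need the manual sorting step from the flow, since a match does not by itself distinguish reuse from submission.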
10 * 100 = 1000
publication-based datasets
deposited in 2005
1. following citations to the paper that describes the data collection, then filtering.
2. searching for accession numbers, URLs, and DOIs in full text
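A minimal sketch of what this second method could look like once the full text of a paper is in hand. The patterns are assumptions chosen for illustration, not the patterns used in the study: the DOI pattern is a loose heuristic, and `GSE` followed by digits (the series accession format used by NCBI GEO) stands in for whatever accession scheme the repository of interest uses.

```python
import re
from typing import Dict, List

# Loose heuristic for DOIs: "10.", a registrant code, a slash, then a suffix.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
# Illustrative accession pattern (GEO series); swap in the repository's own scheme.
ACCESSION_RE = re.compile(r"\bGSE\d+\b")
# Any http(s) URL; filter by repository domain afterwards if needed.
URL_RE = re.compile(r'https?://[^\s"<>]+')

TRAILING_PUNCTUATION = ".,;)"


def find_identifiers(text: str) -> Dict[str, List[str]]:
    """Return candidate dataset identifiers found in one paper's full text."""
    return {
        "dois": [m.rstrip(TRAILING_PUNCTUATION) for m in DOI_RE.findall(text)],
        "accessions": ACCESSION_RE.findall(text),
        "urls": [m.rstrip(TRAILING_PUNCTUATION) for m in URL_RE.findall(text)],
    }


if __name__ == "__main__":
    # Invented sentence, for illustration only.
    sample = "We reused series GSE12345 (doi:10.1234/example.5678)."
    print(find_identifiers(sample))
```

A match only shows that the identifier is mentioned. Deciding whether the mention is a reuse or the original submission still requires the sorting step.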
http://api.plos.org/2011/05/31/announcing_the_plos_search_api/
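As a rough illustration of how a search API could automate the identifier search, the sketch below only builds a query URL and does not send it. The endpoint path, the Solr-style parameter names (`q`, `fl`, `rows`, `wt`), and the field names `id` and `title` are assumptions to be checked against the current PLoS Search API documentation. An API key or other access terms may also apply.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the PLoS Search API documentation.
SEARCH_ENDPOINT = "http://api.plos.org/search"


def build_search_url(term: str, rows: int = 10) -> str:
    """Build (but do not send) a phrase-search URL for one identifier."""
    params = {
        "q": f'"{term}"',   # exact-phrase query for the identifier
        "fl": "id,title",   # assumed field names for the returned records
        "rows": rows,       # number of hits to return
        "wt": "json",       # response format
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative accession number, not taken from the study.
    print(build_search_url("GSE12345"))
```

Keeping URL construction separate from the request makes it easy to log every query that was run, which fits the open-notebook approach described later in the talk.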
2005 was a long time ago

biomedicine is familiar, and also very dominant

search interfaces are not well designed for this task

helpdesks are very helpful
stay tuned for results
poster at ASIS&T, SIGUSE
I post my data, code, and statistical scripts:
http://researchremix.org
Share yours too!
-> Open Notebook Science


                         http://www.flickr.com/photos/myklroventine/892446624/
https://notebooks.dataone.org/tracking1000datasets/
thank you
Todd Vision,
  Estephanie Sta Maria
  Jonathan Carlson
  Dryad and DataONE teams
The open science online community and those who
  release their articles, datasets and photos openly
