
Re: Outreachy project: complete workflows of tools



Hi all,

On 2021-03-05 11:09, Tassia Camoes Araujo wrote:
> [...]
> Since deadline is approaching quickly, I suggest we move this off-list
> and start writing in a wiki or pad.

Here is my first draft, pasted below for your reference, but you can
edit the text directly in this pad:
https://storm.debian.net/shared/OCYsOOEqJ5-CcfjVxIiye-ex-4LheegyP1pkIFghMKa

Help needed to write the Intern tasks section. Also, please check if you
have better scientific articles to recommend and list a few tools to be
used in the starter tasks.

Please help shape this proposal so it ***really*** makes sense and
attracts interns ;-)

Cheers,

Tassia.

--
Project title
Validation of Debian Med tools for complete bioinformatics workflows 

Description
Debian Med is a "Debian Pure Blend" which aims to develop Debian into an
operating system particularly well fit for medical practice and biomedical
research. Data analysis in this field is typically implemented as a
workflow or pipeline, with multiple tools executed as a chain, each
processing input and producing output for the next tool in the chain.

This internship will focus on validating workflows that can be fully
executed within Debian. Deliverables will be educational materials, such
as video or written tutorials, showcasing how the various tools can be
chained together in order to solve a particular biological problem.

An example would be an RNA-seq workflow that runs FastQC, Trimmomatic,
salmon, and a custom R script with a single command (extracted from [1]):

- FastQC, a program that checks NGS reads for common quality issues
- Trimmomatic, a program for cleaning NGS reads
- salmon, a program for estimating transcript abundance from NGS reads
- custom R script that uses DESeq2 to perform differential expression analysis
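
To make the idea of "chaining" concrete, here is a minimal Python sketch
of the four steps listed above, where each step receives the output of the
previous one. Everything in it is an assumption made for illustration: the
step functions are stand-ins that only manipulate file names and do not
call the real tools, and the names (run_pipeline, PIPELINE, the file name
suffixes) are hypothetical. A real workflow would invoke the actual
programs or use a workflow manager.

```python
from typing import Callable, List, Tuple

# A step takes the name of its input artifact and returns the name of
# the artifact it produces. These are placeholders, not real tool calls.
Step = Callable[[str], str]


def quality_check(reads: str) -> str:
    # Stand-in for FastQC: it only reports on the reads, so the reads
    # themselves are passed on unchanged.
    return reads


def trim_reads(reads: str) -> str:
    # Stand-in for Trimmomatic: produces cleaned reads.
    return reads + ".trimmed"


def quantify(reads: str) -> str:
    # Stand-in for salmon: produces transcript abundance estimates.
    return reads + ".quant"


def differential_expression(quant: str) -> str:
    # Stand-in for the custom R script that uses DESeq2.
    return quant + ".deseq2"


# The chain, in execution order (hypothetical step names).
PIPELINE: List[Tuple[str, Step]] = [
    ("quality check", quality_check),
    ("trimming", trim_reads),
    ("quantification", quantify),
    ("differential expression", differential_expression),
]


def run_pipeline(first_input: str,
                 pipeline: List[Tuple[str, Step]]) -> Tuple[str, List[str]]:
    """Feed each step's output into the next step and keep a log."""
    artifact = first_input
    log = []
    for name, step in pipeline:
        produced = step(artifact)
        log.append(f"{name}: {artifact} -> {produced}")
        artifact = produced
    return artifact, log


if __name__ == "__main__":
    final_artifact, steps_log = run_pipeline("sample.fastq", PIPELINE)
    for line in steps_log:
        print(line)
    print("final result:", final_artifact)
```

Running the whole list through one function is what "a single command"
amounts to here: the user names the first input, and the chain takes care
of passing intermediate results along.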

The intern will be free to choose tools/workflows of interest, and
guidance will be given in choosing one that is relevant to the research
community. A useful starting point, particularly if you are not in the
field of bioinformatics, is an article on open source tools and toolkits
for bioinformatics [2]. Typical workflows are described in numerous
peer-reviewed scientific works, for example one deciphering
transcriptomic data from vitamin D studies [3] and one evaluating
RT-qPCR primer specificity [4].

(*** check if you have better articles to recommend ***)

[1] https://bioinformatics.stackexchange.com/questions/7347/what-is-the-difference-between-a-bioinformatics-pipeline-and-workflow
[2] https://www.researchgate.net/publication/6888681_Open_source_tools_and_toolkits_for_bioinformatics_Significance_and_where_are_we
[3] https://www.sciencedirect.com/science/article/pii/S0960076018306034#bib0030
[4] https://pubmed.ncbi.nlm.nih.gov/31945455/


How can applicants make a contribution (starter tasks)
(*** More ideas? Suggested: 10-20 small and 5-10 medium-sized tasks ***)

1. Create a short written tutorial using a particular bioinformatic tool
(*** please suggest a few tools ***)

2. Turn the written tutorial into a video

3. Identify bioinformatics workflows of your own interest (or of a
researcher you know)

4. Select and read a peer-reviewed article describing a workflow and
list the tools used

5. Classify tools from a particular workflow as: FLOSS in Debian, FLOSS
not yet in Debian, proprietary (any known alternative?)

6. Gather sample data to be used in demonstrations of workflows of
interest
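
For task 5, the three categories could be recorded in a small table. The
Python sketch below shows one possible way to do that. It is only an
illustration under stated assumptions: the tool names and their attributes
are made-up placeholders (not real packaging data), and the names Status,
classify and group_by_status are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Status(Enum):
    IN_DEBIAN = "FLOSS in Debian"
    NOT_IN_DEBIAN = "FLOSS not yet in Debian"
    PROPRIETARY = "proprietary"


def classify(is_floss: bool, packaged_in_debian: bool) -> Status:
    """Map two yes/no facts about a tool to one of the three categories."""
    if not is_floss:
        # Assumption: whether a proprietary tool is packaged anywhere
        # is ignored here; only its license matters for this category.
        return Status.PROPRIETARY
    return Status.IN_DEBIAN if packaged_in_debian else Status.NOT_IN_DEBIAN


def group_by_status(
        tools: List[Tuple[str, bool, bool]]) -> Dict[Status, List[str]]:
    """Group (name, is_floss, packaged_in_debian) records by category."""
    groups: Dict[Status, List[str]] = {status: [] for status in Status}
    for name, is_floss, packaged in tools:
        groups[classify(is_floss, packaged)].append(name)
    return groups


if __name__ == "__main__":
    # Placeholder records, not real data about any existing tool.
    example_tools = [
        ("tool-a", True, True),
        ("tool-b", True, False),
        ("tool-c", False, False),
    ]
    for status, names in group_by_status(example_tools).items():
        print(f"{status.value}: {', '.join(names) or '(none)'}")
```

For proprietary tools, the "any known alternative?" question from task 5
would still need to be answered by hand.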


Applicant skills (description - impact on selection - experience level)

- Writing skills - Required - Experimented
- Video editing skills - Preferred - Concepts
- Debian system knowledge - Preferred - Concepts
- Use of bioinformatics tools - Preferred - Concepts


Intern tasks

(*** HELP NEEDED ***)

