FragPipe v24 DDA and DIA Tutorial
=================================

This tutorial shows how to prepare FragPipe v24.0 output for FLiPPR, then
run FLiPPR on DDA or DIA label-free data. It assumes that FragPipe has already
searched and quantified the raw mass spectrometry files. FLiPPR starts from the
FragPipe output directory.

FLiPPR is built for LiP-MS experiments with an optional matched TrP
normalization experiment. In this guide:

- ``LiP`` means the limited-proteolysis experiment.
- ``TrP`` means the trypsin-only experiment used for protein-level
  normalization.
- A ``process`` is one control-vs-test comparison inside a study.

The examples use two conditions, ``CTRL`` and ``DRUG``, each with three
replicates.

Inputs FLiPPR expects
---------------------

FLiPPR reads different FragPipe output files depending on the acquisition mode.

For DDA, the FragPipe output directory must contain:

.. code-block:: text

    combined_ion.tsv
    combined_protein.tsv
    experiment_annotation.tsv

For DIA, the FragPipe output directory must contain:

.. code-block:: text

    ion.tsv
    dia-quant-output/report.pr_matrix.tsv
    dia-quant-output/report.pg_matrix.tsv
    experiment_annotation.tsv

The DDA files come from FragPipe's Philosopher and IonQuant reports. The DIA
matrix files come from the DIA-NN quantification step run by FragPipe. See the
`FragPipe output guide`_ for the full list of reports and column meanings.
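
Before loading a directory into FLiPPR, you can check for these files
explicitly. The helper below is a hypothetical pre-flight check, not part of
FLiPPR. It only tests that the paths listed above exist.

.. code-block:: python

    from pathlib import Path

    # Required files per acquisition mode, copied from the lists above.
    REQUIRED_FILES = {
        "dda": [
            "combined_ion.tsv",
            "combined_protein.tsv",
            "experiment_annotation.tsv",
        ],
        "dia": [
            "ion.tsv",
            "dia-quant-output/report.pr_matrix.tsv",
            "dia-quant-output/report.pg_matrix.tsv",
            "experiment_annotation.tsv",
        ],
    }


    def missing_fragpipe_files(output_dir, method):
        """Return the required files that are absent from a FragPipe output directory."""
        root = Path(output_dir)
        return [
            name for name in REQUIRED_FILES[method.lower()]
            if not (root / name).is_file()
        ]

For example, ``missing_fragpipe_files("LiP_DIA_FragPipe", "dia")`` returns an
empty list when the directory has the DIA layout shown above.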

Sample naming
-------------

FLiPPR locates each replicate's intensity column by its sample name in
``experiment_annotation.tsv``. Use stable sample names that end with the
replicate number:

.. code-block:: text

    CTRL_1
    CTRL_2
    CTRL_3
    DRUG_1
    DRUG_2
    DRUG_3

When calling ``Study.add_process()``, pass the condition prefix without the
replicate suffix:

.. code-block:: python

    study.add_process(
        pid="drug",
        lip_ctrl="CTRL",
        lip_test="DRUG",
        n_rep=3,
    )

This tells FLiPPR to look for ``CTRL_1 Intensity``, ``CTRL_2 Intensity``,
``CTRL_3 Intensity``, ``DRUG_1 Intensity``, and so on.

For uneven replicate counts, pass a tuple:

.. code-block:: python

    study.add_process("drug", "CTRL", "DRUG", n_rep=(3, 4))

For non-contiguous replicate numbers, pass explicit replicate IDs:

.. code-block:: python

    study.add_process("drug", "CTRL", "DRUG", n_rep=((1, 3, 4), (1, 2, 5)))
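
The three ``n_rep`` forms can be sketched as follows. ``replicate_ids`` and
``intensity_columns`` are hypothetical helpers that reproduce the naming rule
described above (``<condition>_<replicate> Intensity``). They are not FLiPPR
functions, and FLiPPR's internal handling may differ in detail.

.. code-block:: python

    def replicate_ids(n_rep):
        """Split an n_rep value into (control_ids, test_ids)."""
        if isinstance(n_rep, int):
            # n_rep=3 -> replicates 1..3 for both conditions
            ids = tuple(range(1, n_rep + 1))
            return ids, ids
        first, second = n_rep
        if isinstance(first, int) and isinstance(second, int):
            # n_rep=(3, 4) -> 1..3 for control, 1..4 for test
            return tuple(range(1, first + 1)), tuple(range(1, second + 1))
        # n_rep=((1, 3, 4), (1, 2, 5)) -> explicit replicate labels
        return tuple(first), tuple(second)


    def intensity_columns(ctrl, test, n_rep, suffix="Intensity"):
        """Column names implied by the condition prefixes plus replicate IDs."""
        ctrl_ids, test_ids = replicate_ids(n_rep)
        return (
            [f"{ctrl}_{i} {suffix}" for i in ctrl_ids],
            [f"{test}_{i} {suffix}" for i in test_ids],
        )


    ctrl_cols, test_cols = intensity_columns("CTRL", "DRUG", ((1, 3, 4), (1, 2, 5)))
    print(test_cols)  # ['DRUG_1 Intensity', 'DRUG_2 Intensity', 'DRUG_5 Intensity']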

Install FLiPPR
--------------

Use ``uv`` for a clean, reproducible Python environment:

.. code-block:: bash

    uv venv
    uv pip install flippr

For a source checkout of this repository:

.. code-block:: bash

    uv sync

DDA workflow
------------

Use this path for DDA LFQ LiP-MS data quantified by FragPipe and IonQuant.

1. Configure FragPipe v24.0
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Download and launch FragPipe v24.0 from the `FragPipe releases`_ page. On
Windows, the installer bundles a Java runtime and Python. On Linux, FragPipe
v24.0 requires Java 11 or newer.

On the ``Config`` tab, confirm that FragPipe can find the tools it needs:
MSFragger, IonQuant, and the bundled or configured auxiliary tools. The
`FragPipe usage guide`_ recommends starting from a built-in workflow and
customizing settings only afterwards.

2. Choose a DDA LFQ workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On the ``Workflow`` tab, load a DDA label-free workflow appropriate for the
experiment, commonly ``LFQ-MBR`` or a closely related LFQ workflow.

For LiP-MS, the search settings matter. If the biological question depends on
semi-tryptic or non-tryptic cleavage products, make sure the MSFragger enzyme
and digestion settings can identify those peptides. Keep the LiP and TrP
workflows as similar as practical so normalization is interpretable.

Load all files that belong to the same FragPipe experiment together and give
the samples clear names:

.. code-block:: text

    CTRL_1.raw
    CTRL_2.raw
    CTRL_3.raw
    DRUG_1.raw
    DRUG_2.raw
    DRUG_3.raw

For Thermo files, the FragPipe documentation recommends converting ``.raw``
files to centroided ``.mzML`` with peak picking for many workflows. Direct raw
reading can work, but conversion makes runs easier to reproduce across systems.
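
A minimal sketch of building an ``msconvert`` call from Python, assuming
ProteoWizard's ``msconvert`` is installed and on ``PATH``. The filter strings
follow msconvert's documented syntax (``peakPicking`` must be the first
filter). The optional demultiplexing filter only applies to DIA data acquired
with overlapping windows. ``msconvert_command`` is a hypothetical helper;
confirm the exact options against the FragPipe documentation for your
instrument.

.. code-block:: python

    def msconvert_command(raw_file, out_dir, demultiplex=False):
        """Build an msconvert argument list that writes centroided mzML."""
        cmd = [
            "msconvert",
            str(raw_file),
            "--mzML",
            # Vendor peak picking (centroiding) on all MS levels; must come first.
            "--filter", "peakPicking vendor msLevel=1-",
        ]
        if demultiplex:
            # Only for DIA acquired with overlapping or staggered windows.
            cmd += ["--filter", "demultiplex optimization=overlap_only"]
        cmd += ["-o", str(out_dir)]
        return cmd

Run each command with ``subprocess.run(msconvert_command(raw, "mzml"),
check=True)``. Returning an argument list instead of a shell string avoids
quoting problems with the space-separated filter arguments.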

3. Configure database and search settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On the ``Database`` tab, select the FASTA database used for the experiment.
Use the same database for LiP and TrP runs. If you add contaminants or decoys,
keep that setup consistent across all output directories that will be compared
in FLiPPR.

On the MSFragger-related tabs, review:

- Enzyme and missed cleavage settings.
- Fixed and variable modifications.
- Precursor and fragment mass tolerances.
- Validation and FDR settings.

FLiPPR assumes the FragPipe reports have already been filtered and quantified.
It does not redo peptide-spectrum matching or protein inference.

4. Enable MS1 label-free quantification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On the ``Quant (MS1)`` tab, use IonQuant for label-free quantification, which
FragPipe documents as its default LFQ tool. For protein-level TrP normalization
of DDA data, FLiPPR reads the ``MaxLFQ Intensity`` values from
``combined_protein.tsv`` by default.
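
Conceptually, protein-level normalization separates a change in LiP peptide
signal from a change in protein abundance. One common formulation, shown below
as a simplified sketch, subtracts the TrP protein log2 fold change from the LiP
peptide log2 fold change. This is an assumption for illustration only:
FLiPPR's exact calculation (summary statistic, per-replicate handling, variance
propagation) is not described in this tutorial, and both function names are
hypothetical.

.. code-block:: python

    from math import log2
    from statistics import mean


    def log2_fold_change(ctrl, test):
        """log2 of mean test intensity over mean control intensity."""
        return log2(mean(test) / mean(ctrl))


    def trp_normalized_lfc(lip_ctrl, lip_test, trp_ctrl, trp_test):
        """LiP log2 fold change minus the matched TrP protein log2 fold change."""
        return log2_fold_change(lip_ctrl, lip_test) - log2_fold_change(trp_ctrl, trp_test)


    # The peptide rises 4x (log2 FC = 2) while its protein doubles (log2 FC = 1),
    # so 1 log2 unit of the peptide change remains after normalization.
    print(trp_normalized_lfc([100, 100, 100], [400, 400, 400], [1000] * 3, [2000] * 3))  # 1.0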

Recommended checks:

- Use MaxLFQ for protein quantification when available.
- Decide whether match-between-runs is appropriate for the design.
- Keep the MBR settings consistent between LiP and TrP analyses.
- Use the same sample naming pattern for every run.
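
To compare LiP and TrP settings side by side, you can diff the workflow files
saved with each run. The sketch below assumes that each output directory
contains a ``fragpipe.workflow`` file of ``key=value`` lines and that the
relevant keys start with prefixes such as ``msfragger.`` or ``ionquant.``;
check both assumptions against your own output. ``read_workflow`` and
``diff_workflows`` are hypothetical helpers. Some differences, such as enzyme
specificity for LiP searches, may be intentional.

.. code-block:: python

    from pathlib import Path


    def read_workflow(path):
        """Parse key=value lines, skipping blank lines and # comments."""
        settings = {}
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
        return settings


    def diff_workflows(path_a, path_b, prefixes=("database.", "msfragger.", "ionquant.")):
        """Return {key: (value_a, value_b)} for differing keys that start with a prefix."""
        a, b = read_workflow(path_a), read_workflow(path_b)
        keys = sorted(k for k in a.keys() | b.keys() if k.startswith(prefixes))
        return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

For example: ``diff_workflows("LiP_DDA_FragPipe/fragpipe.workflow",
"TrP_DDA_FragPipe/fragpipe.workflow")``. A key present in only one file shows
``None`` on the other side.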

5. Run FragPipe and inspect output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run FragPipe from the ``Run`` tab. After completion, the LiP output directory
should look like this:

.. code-block:: text

    LiP_DDA_FragPipe/
    ├── combined_ion.tsv
    ├── combined_protein.tsv
    ├── experiment_annotation.tsv
    ├── log_*.txt
    └── ...

If you have matched TrP data, run it into a separate output directory:

.. code-block:: text

    TrP_DDA_FragPipe/
    ├── combined_ion.tsv
    ├── combined_protein.tsv
    ├── experiment_annotation.tsv
    ├── log_*.txt
    └── ...

Check ``experiment_annotation.tsv`` before running FLiPPR. The sample names
should match the condition and replicate naming pattern that you will pass to
``add_process()``.
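
One way to do this check from Python is sketched below. It assumes
``experiment_annotation.tsv`` is tab-separated with a ``sample`` column (pass
``column=`` if your file differs) and groups names of the form
``CONDITION_REPLICATE``. ``group_samples`` is a hypothetical helper, not a
FLiPPR function.

.. code-block:: python

    import csv
    import re
    from collections import defaultdict

    # CONDITION_REPLICATE, for example CTRL_1 or DRUG_LOW_3
    SAMPLE_PATTERN = re.compile(r"^(?P<condition>.+)_(?P<replicate>\d+)$")


    def group_samples(annotation_path, column="sample"):
        """Group sample names by condition; also return names that do not fit."""
        groups = defaultdict(set)
        bad = []
        with open(annotation_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                name = row[column].strip()
                match = SAMPLE_PATTERN.match(name)
                if match:
                    # A set de-duplicates samples listed once per file (e.g. fractions).
                    groups[match["condition"]].add(int(match["replicate"]))
                else:
                    bad.append(name)
        return {cond: sorted(reps) for cond, reps in groups.items()}, bad

If the first return value is ``{"CTRL": [1, 2, 3], "DRUG": [1, 2, 3]}`` and
the second is empty, the matching call is ``add_process("drug", "CTRL",
"DRUG", n_rep=3)``.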

6. Run FLiPPR on DDA LiP data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from pathlib import Path

    import flippr

    lip_dir = Path("LiP_DDA_FragPipe")

    study = flippr.Study(lip=lip_dir, method="dda")
    print(study.samples)

    study.add_process(
        pid="drug",
        lip_ctrl="CTRL",
        lip_test="DRUG",
        n_rep=3,
    )

    results = study.run()
    drug = results["drug"]

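    # Each attribute is a Polars DataFrame at one level of aggregation.
    drug.ion               # ions (precursors)
    drug.modified_peptide  # modified peptides
    drug.peptide           # peptides
    drug.cut_site          # LiP cut sites
    drug.protein_summary   # per-protein summary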

7. Run FLiPPR with DDA TrP normalization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from pathlib import Path

    import flippr

    lip_dir = Path("LiP_DDA_FragPipe")
    trp_dir = Path("TrP_DDA_FragPipe")

    study = flippr.Study(lip=lip_dir, trp=trp_dir, method="dda")
    print(study.samples)

    study.add_process(
        pid="drug_norm",
        lip_ctrl="CTRL",
        lip_test="DRUG",
        n_rep=3,
        # TrP condition names and replicates, as named in the TrP output directory
        trp_ctrl="CTRL",
        trp_test="DRUG",
        trp_n_rep=3,
    )

    results = study.run()
    drug = results["drug_norm"]

    drug.ion
    drug.trp_protein      # protein-level table from the TrP run
    drug.protein_summary

DIA workflow
------------

Use this path for DIA data processed in FragPipe v24.0 with DIA-NN
quantification.

1. Choose a DIA workflow in FragPipe
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FragPipe documents two main DIA workflows:

- ``DIA_SpecLib_Quant`` uses MSFragger-DIA to build a spectral library, then
  quantifies with DIA-NN.
- ``DIA_DIA-Umpire_SpecLib_Quant`` uses DIA-Umpire to generate pseudo-MS/MS
  spectra, searches those spectra in DDA mode, then quantifies with DIA-NN.

For most direct DIA searches, start with ``DIA_SpecLib_Quant``. If the data
were acquired with overlapping or staggered windows, follow the FragPipe DIA
documentation and demultiplex to ``.mzML`` before analysis.

If you already have a spectral library and only need quantification, use the
``Quant (DIA)`` tab to run only the DIA-NN quantification step with that
library.

2. Load files and verify data types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On the ``Workflow`` tab, load the DIA files and check the data type FragPipe
assigns to each one. FragPipe infers it from file and folder names, so verify
every assignment before running.

Use the same sample naming pattern used for DDA:

.. code-block:: text

    CTRL_1
    CTRL_2
    CTRL_3
    DRUG_1
    DRUG_2
    DRUG_3

If you include separate DDA files for spectral library construction, make sure
those files are marked with the correct DDA data type. FragPipe notes that
pseudo-MS/MS files from DIA-Umpire should be designated as DDA data.

3. Configure spectral library and DIA-NN quantification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If spectral library generation is enabled, confirm on the ``Config`` tab that
FragPipe can find Python with EasyPQP installed. The `FragPipe DIA tutorial`_
lists both as requirements for spectral library generation.

On ``Quant (DIA)``, keep DIA-NN enabled. The generated or supplied library will
be passed to DIA-NN for quantification. Review DIA-NN settings such as library
source, quantification strategy, MBR-like options, and protein inference
options according to the experimental design.

4. Run FragPipe and inspect output
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After a DIA run, FLiPPR expects this structure:

.. code-block:: text

    LiP_DIA_FragPipe/
    ├── ion.tsv
    ├── experiment_annotation.tsv
    ├── dia-quant-output/
    │   ├── report.pr_matrix.tsv
    │   └── report.pg_matrix.tsv
    ├── log_*.txt
    └── ...

The ``report.pr_matrix.tsv`` file contains precursor-level DIA-NN quantities.
FLiPPR combines those quantities with peptide/protein location metadata from
``ion.tsv``. The ``report.pg_matrix.tsv`` file is used for optional protein
normalization when a matched TrP DIA run is provided.

Some FragPipe and DIA-NN documentation, as well as older workflows, refers to a
``diann-output`` directory. FLiPPR currently looks for ``dia-quant-output``.
If your FragPipe v24 output uses a different folder name, rename that folder to
``dia-quant-output`` without changing its contents before loading the
directory with FLiPPR.
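
A hypothetical helper for the rename is sketched below. It looks one level
below the output directory for a folder that holds both DIA-NN matrices and
renames it to ``dia-quant-output``, leaving the files untouched. Consider
running it on a copy of the output if other tools depend on the original
folder name.

.. code-block:: python

    from pathlib import Path

    EXPECTED = "dia-quant-output"
    REPORTS = ("report.pr_matrix.tsv", "report.pg_matrix.tsv")


    def ensure_dia_quant_dir(output_dir):
        """Rename the folder holding the DIA-NN matrices to dia-quant-output if needed."""
        root = Path(output_dir)
        target = root / EXPECTED
        if all((target / name).is_file() for name in REPORTS):
            return target  # already in the layout FLiPPR expects
        for candidate in sorted(p for p in root.iterdir() if p.is_dir()):
            if all((candidate / name).is_file() for name in REPORTS):
                if target.exists():
                    raise FileExistsError(f"{target} exists but lacks the DIA-NN matrices")
                candidate.rename(target)  # folder contents are not modified
                return target
        raise FileNotFoundError(f"no subfolder of {root} contains {', '.join(REPORTS)}")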

5. Run FLiPPR on DIA LiP data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from pathlib import Path

    import flippr

    lip_dir = Path("LiP_DIA_FragPipe")

    study = flippr.Study(lip=lip_dir, method="dia")
    print(study.samples)

    study.add_process(
        pid="drug",
        lip_ctrl="CTRL",
        lip_test="DRUG",
        n_rep=3,
    )

    results = study.run()
    drug = results["drug"]

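    # Each attribute is a Polars DataFrame at one level of aggregation.
    drug.ion               # ions (precursors)
    drug.modified_peptide  # modified peptides
    drug.peptide           # peptides
    drug.cut_site          # LiP cut sites
    drug.protein_summary   # per-protein summary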

6. Run FLiPPR with DIA TrP normalization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: python

    from pathlib import Path

    import flippr

    lip_dir = Path("LiP_DIA_FragPipe")
    trp_dir = Path("TrP_DIA_FragPipe")

    study = flippr.Study(lip=lip_dir, trp=trp_dir, method="dia")
    print(study.samples)

    study.add_process(
        pid="drug_norm",
        lip_ctrl="CTRL",
        lip_test="DRUG",
        n_rep=3,
        # TrP condition names and replicates, as named in the TrP output directory
        trp_ctrl="CTRL",
        trp_test="DRUG",
        trp_n_rep=3,
    )

    results = study.run()
    drug = results["drug_norm"]

    drug.ion
    drug.trp_protein      # protein-level table from the TrP run
    drug.protein_summary

Working with multiple comparisons
---------------------------------

A single ``Study`` can hold multiple comparisons, as long as all of them use
the same LiP output directory (and the same TrP output directory, if one is
given).

.. code-block:: python

    import flippr

    study = flippr.Study(lip="LiP_DDA_FragPipe", trp="TrP_DDA_FragPipe", method="dda")

    # Positional arguments: pid, lip_ctrl, lip_test
    study.add_process("low", "CTRL", "DRUG_LOW", n_rep=3,
                      trp_ctrl="CTRL", trp_test="DRUG_LOW", trp_n_rep=3)
    study.add_process("high", "CTRL", "DRUG_HIGH", n_rep=3,
                      trp_ctrl="CTRL", trp_test="DRUG_HIGH", trp_n_rep=3)

    results = study.run()
    low, high = results["low"], results["high"]

Exporting results
-----------------

Each FLiPPR result table is a Polars ``DataFrame``, so any standard Polars
writer works:

.. code-block:: python

    result = results["drug"]

    result.ion.write_csv("drug_ion.csv")
    result.peptide.write_csv("drug_peptide.csv")
    result.cut_site.write_csv("drug_cut_site.csv")
    result.protein_summary.write_csv("drug_protein_summary.csv")

For larger studies, Parquet is usually faster to write and read than CSV and
preserves column data types:

.. code-block:: python

    result.ion.write_parquet("drug_ion.parquet")
    result.protein_summary.write_parquet("drug_protein_summary.parquet")
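
To export every table of every comparison at once, a loop like the one below
can help. It assumes ``results`` behaves like a dictionary keyed by process ID
and that each table is a Polars DataFrame with the matching ``write_csv`` or
``write_parquet`` method. ``export_results`` is a hypothetical helper. Tables a
process does not have (for example ``trp_protein`` without TrP data) are
skipped, provided the attribute is ``None`` or raises ``AttributeError``.

.. code-block:: python

    from pathlib import Path

    # Table names used in this tutorial; trp_protein requires TrP data.
    TABLES = ("ion", "modified_peptide", "peptide", "cut_site", "protein_summary", "trp_protein")


    def export_results(results, out_dir, fmt="parquet"):
        """Write each available table of each process to out_dir; return the paths written."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        written = []
        for pid, result in results.items():
            for table in TABLES:
                frame = getattr(result, table, None)
                if frame is None:
                    continue  # this process has no such table
                path = out / f"{pid}_{table}.{fmt}"
                # Dispatch to DataFrame.write_parquet or DataFrame.write_csv
                getattr(frame, f"write_{fmt}")(path)
                written.append(path)
        return written

For example, ``export_results(results, "flippr_results", fmt="csv")`` writes
files such as ``flippr_results/drug_peptide.csv``.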

Important FLiPPR parameters
---------------------------

Global parameters live in ``flippr.rcParams``. Change them before calling
``study.run()``.

.. code-block:: python

    import flippr

    flippr.rcParams["ion.missing_intensity_thresh"] = 1
    flippr.rcParams["protein.fc_sig_thresh"] = 1.0
    flippr.rcParams["protein.pval_sig_thresh"] = 0.01
    flippr.rcParams["protein.adj_pval_sig_thresh"] = 0.05

Useful settings:

- ``ion.missing_intensity_thresh`` controls how many missing control/test
  replicate intensities are tolerated before an ion is filtered.
- ``ion.aon_impute_loc`` and ``ion.aon_impute_scale`` control all-or-none
  imputation values.
- ``trp_protein.intensity_value`` controls the DDA protein intensity column
  used for TrP normalization. The default is ``MaxLFQ Intensity``.
- ``protein.fc_sig_thresh``, ``protein.pval_sig_thresh``, and
  ``protein.adj_pval_sig_thresh`` control protein-summary significance counts.
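
To make the all-or-none idea concrete: when an ion is observed in one condition
but missing from every replicate of the other, some stand-in value is needed
before a fold change can be computed. The sketch below assumes that ``loc``
and ``scale`` are the mean and standard deviation of a normal distribution, on
the same log2 scale as the intensities, from which stand-in values are drawn,
and that missing values are ``None``. That reading is based on the parameter
names only; ``impute_all_or_none`` is hypothetical and does not describe
FLiPPR's code.

.. code-block:: python

    import random


    def impute_all_or_none(ctrl, test, loc, scale, seed=None):
        """Fill a fully missing side with N(loc, scale) draws if the other side is observed."""
        rng = random.Random(seed)

        def all_missing(values):
            return all(v is None for v in values)

        def draws(n):
            return [rng.gauss(loc, scale) for _ in range(n)]

        if all_missing(ctrl) and not all_missing(test):
            return draws(len(ctrl)), list(test)
        if all_missing(test) and not all_missing(ctrl):
            return list(ctrl), draws(len(test))
        return list(ctrl), list(test)  # not an all-or-none case: leave untouched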

Quality checks
--------------

Before trusting a run, check these items.

FragPipe output files:

- DDA output has ``combined_ion.tsv`` and ``combined_protein.tsv``.
- DIA output has ``ion.tsv`` and both DIA-NN matrices.
- ``experiment_annotation.tsv`` exists in every LiP and TrP output directory.
- Sample names match the ``CTRL_1`` / ``DRUG_1`` style expected by FLiPPR.

Replicates:

- ``n_rep=3`` means FLiPPR will look for ``CONDITION_1`` through
  ``CONDITION_3``.
- ``n_rep=(3, 4)`` means three control replicates and four test replicates.
- ``n_rep=((1, 3, 4), (1, 2, 5))`` means control replicates 1, 3, and 4 and
  test replicates 1, 2, and 5.

Biology and design:

- LiP and TrP runs should use compatible FASTA, search, and quantification
  settings.
- TrP normalization should use matched biological conditions, not unrelated
  controls.
- Decide on MBR with the experimental design in mind. When conditions differ
  strongly, unrestricted transfer of identifications between runs can add
  noise.

Troubleshooting
---------------

``FileNotFoundError`` when creating ``Study``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output directory is missing one of the files FLiPPR expects. Check the file
tree against the DDA or DIA input lists above. For DIA, check the exact name of
the DIA-NN output directory.

``ColumnNotFoundError`` during ``study.run()``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The sample names or replicate numbers passed to ``add_process()`` do not match
the FragPipe intensity columns. Open ``experiment_annotation.tsv`` and the
header row of the relevant quantification table, then update ``lip_ctrl``,
``lip_test``, or ``n_rep``.
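
To see which names fail to match, compare the columns you expect with the
header row. The hypothetical ``check_columns`` helper below reports each
missing column together with the most similar header names, using
``difflib.get_close_matches`` from the Python standard library.

.. code-block:: python

    import csv
    import difflib


    def check_columns(table_path, expected):
        """Map each missing expected column to up to three similar header names."""
        with open(table_path, newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle, delimiter="\t"))
        present = set(header)
        return {
            name: difflib.get_close_matches(name, header, n=3, cutoff=0.6)
            for name in expected
            if name not in present
        }

For example, ``check_columns("LiP_DDA_FragPipe/combined_ion.tsv",
["CTRL_1 Intensity", "CTRL_2 Intensity", "CTRL_3 Intensity"])`` returns an
empty dictionary when all three columns exist.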

No rows after filtering
~~~~~~~~~~~~~~~~~~~~~~~

The missing-intensity filter may be too strict for the dataset, or the sample
names may point to the wrong columns. Check ``ion.missing_intensity_thresh`` and
verify the sample names.
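
To tell the two causes apart, measure how often each replicate column is
missing. A column that is missing in nearly every row usually points to a
wrong sample name rather than a strict threshold. The sketch below is a
hypothetical diagnostic that treats empty cells, zeros, ``NaN``, and absent
columns as missing, which is an assumption about how missing values appear in
your tables.

.. code-block:: python

    import csv
    import math


    def _is_missing(value):
        """Treat empty, non-numeric, zero, and NaN cells as missing."""
        try:
            number = float(value)
        except (TypeError, ValueError):
            return True
        return number == 0 or math.isnan(number)


    def missing_fraction(table_path, columns):
        """Fraction of rows in which each listed intensity column is missing."""
        counts = dict.fromkeys(columns, 0)
        total = 0
        with open(table_path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                total += 1
                for column in columns:
                    # row.get returns None when the column does not exist at all
                    if _is_missing(row.get(column)):
                        counts[column] += 1
        if total == 0:
            raise ValueError(f"{table_path} has no data rows")
        return {column: counts[column] / total for column in columns}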

Unexpected TrP normalization values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For DDA, verify that ``combined_protein.tsv`` contains the intensity column
configured by ``flippr.rcParams["trp_protein.intensity_value"]``. For DIA,
FLiPPR uses DIA-NN protein-group intensities from ``report.pg_matrix.tsv``.

References
----------

- `FragPipe releases`_
- `FragPipe usage guide`_
- `FragPipe DIA tutorial`_
- `FragPipe output guide`_

.. _FragPipe releases: https://github.com/Nesvilab/FragPipe/releases
.. _FragPipe usage guide: https://fragpipe.nesvilab.org/docs/tutorial_fragpipe.html
.. _FragPipe DIA tutorial: https://fragpipe.nesvilab.org/docs/tutorial_DIA.html
.. _FragPipe output guide: https://fragpipe.nesvilab.org/docs/tutorial_fragpipe_outputs.html