
CimpleG (Simple CpG signatures)

  • CimpleG searches a training dataset for the CpGs that best classify a given cell type
  • It also lets you perform cell-type deconvolution in a couple of easy steps
  • It works with either beta values or M-values
  • Here we show how easy it is to generate signatures
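Beta values and M-values carry the same information on different scales. A beta value is bounded in [0, 1], and the corresponding M-value is its base-2 logit, M = log2(beta / (1 - beta)). If you need to move between the two, here is a minimal base-R sketch. The helper names `beta_to_m()` and `m_to_beta()` and the clamping `offset` are our own illustrative choices, not CimpleG functions.

```r
# Hypothetical helpers (not part of CimpleG): convert beta values <-> M-values.
# M = log2(beta / (1 - beta)); the inverse is beta = 1 / (1 + 2^(-M)).
beta_to_m <- function(beta, offset = 1e-6) {
  # Clamp betas away from exactly 0 or 1 so the logit stays finite
  # (the offset value is an assumption; pick one suited to your data)
  beta <- pmin(pmax(beta, offset), 1 - offset)
  log2(beta / (1 - beta))
}

m_to_beta <- function(m) {
  # Numerically stable inverse of the base-2 logit
  1 / (1 + 2^(-m))
}

betas <- c(0.1, 0.5, 0.9)
m_vals <- beta_to_m(betas)    # negative, zero, positive
back <- m_to_beta(m_vals)     # recovers the original betas
```

A beta of 0.5 (half methylated) maps to an M-value of 0, and the transform is symmetric around that point.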

Installation

If you haven’t installed CimpleG yet, you can find the installation instructions here. In most cases, it should be as simple as:

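if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
if (!requireNamespace("CimpleG", quietly = TRUE)) devtools::install_github("costalab/CimpleG")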

Loading package

We load the CimpleG package.

library("CimpleG")
#> --------------------------
#> CimpleG version 0.0.5.9001
#> --------------------------

Loading data

In this tutorial, we use a small dataset of just 409 samples and 1000 CpGs, together with a table of metadata about these samples. Both come included with CimpleG. You can read more about them here.

# load the example methylation data and the matching sample metadata
data(train_data)
data(train_targets)
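To make the expected input shapes concrete, the sketch below builds tiny toy versions of the two objects. It reflects our understanding of the layout: a numeric matrix of beta values with samples in rows and CpGs in columns, plus a metadata table with one row per sample (in the same order) and one-hot 0/1 columns for each target. Treat this as an assumption and check it against the `CimpleG()` documentation. All names (`toy_data`, `toy_targets`, the sample and CpG IDs) are hypothetical, and the values are random.

```r
# Hypothetical toy inputs mirroring the layout we assume CimpleG expects.
set.seed(1)
n_samples <- 6
cpgs <- paste0("cg", sprintf("%08d", 1:4))   # made-up CpG IDs
samples <- paste0("S", 1:n_samples)           # made-up sample IDs

# Samples in rows, CpGs in columns, random beta values in [0, 1]
toy_data <- matrix(
  runif(n_samples * length(cpgs)),
  nrow = n_samples,
  dimnames = list(samples, cpgs)
)

# Metadata: one row per sample, same order as toy_data
toy_targets <- data.frame(
  gsm = samples,
  cell_type = rep(c("blood", "liver", "fat"), each = 2),
  stringsAsFactors = FALSE
)
# One-hot target columns: 1 for target samples, 0 for all others
toy_targets$blood_cells <- as.integer(toy_targets$cell_type == "blood")
toy_targets$hepatocytes <- as.integer(toy_targets$cell_type == "liver")

# Basic consistency checks worth running on your own data before training
stopifnot(
  nrow(toy_data) == nrow(toy_targets),
  identical(rownames(toy_data), toy_targets$gsm),
  all(toy_data >= 0 & toy_data <= 1)
)
```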

Running CimpleG

Running CimpleG is simple: call the CimpleG function with the training data, the sample metadata, and the names of the target columns you want signatures for.

# run CimpleG
cimpleg_result <- CimpleG(
  train_data,
  train_targets,
  target_columns = c("blood_cells", "hepatocytes"), # target columns in train_targets
  train_only = TRUE
)
#> Training for target 'blood_cells' with 'CimpleG' has finished.: 2.655 sec elapsed
#> Training for target 'hepatocytes' with 'CimpleG' has finished.: 0.482 sec elapsed

Here we generate one signature for each of two targets: blood cells (leukocytes) and hepatocytes.
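To build intuition for what a single-CpG signature is, the sketch below ranks CpGs by how far the mean beta value of the target samples lies from that of all other samples. This simple mean-difference score is a stand-in chosen for illustration. It is not CimpleG's selection criterion, and the function name `rank_cpgs_by_mean_diff()` and the toy matrix are hypothetical.

```r
# Illustrative only: rank CpGs by the absolute difference in mean beta
# between target and non-target samples. CimpleG's own scoring differs.
rank_cpgs_by_mean_diff <- function(data, is_target) {
  stopifnot(nrow(data) == length(is_target))
  target_mean <- colMeans(data[is_target == 1, , drop = FALSE])
  other_mean  <- colMeans(data[is_target == 0, , drop = FALSE])
  diff <- target_mean - other_mean
  # Largest absolute difference first; the sign shows whether the CpG is
  # hypomethylated (< 0) or hypermethylated (> 0) in the target
  diff[order(abs(diff), decreasing = TRUE)]
}

# Toy data: 6 samples x 3 CpGs (filled column by column);
# the first 2 samples belong to the target class
toy <- matrix(
  c(0.10, 0.15, 0.90, 0.85, 0.88, 0.92,   # cgA: low in target
    0.50, 0.55, 0.45, 0.52, 0.48, 0.50,   # cgB: uninformative
    0.80, 0.75, 0.60, 0.65, 0.62, 0.58),  # cgC: mildly high in target
  nrow = 6,
  dimnames = list(paste0("S", 1:6), c("cgA", "cgB", "cgC"))
)
is_target <- c(1, 1, 0, 0, 0, 0)

ranked <- rank_cpgs_by_mean_diff(toy, is_target)
names(ranked)[1]  # "cgA"
```

A CpG like `cgA`, which is clearly low in the target and high everywhere else, is the kind of feature that can separate a cell type on its own.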

Plotting CimpleG CpG signature

We can quickly visualize how well each signature separates the samples.

signature_plot(
  cimpleg_result,
  train_data,
  train_targets,
  sample_id_column = "gsm",
  true_label_column = "cell_type"
)
#> $data
#> # A tibble: 818 × 5
#> # Groups:   sig_set [2]
#>    sample_id  true_label signatures value sig_set    
#>    <chr>      <chr>      <chr>      <dbl> <chr>      
#>  1 GSM1415516 adipocytes cg04785083 0.922 blood_cells
#>  2 GSM1415516 adipocytes cg02258444 0.938 hepatocytes
#>  3 GSM1415518 adipocytes cg04785083 0.932 blood_cells
#>  4 GSM1415518 adipocytes cg02258444 0.912 hepatocytes
#>  5 GSM1415520 adipocytes cg04785083 0.923 blood_cells
#>  6 GSM1415520 adipocytes cg02258444 0.907 hepatocytes
#>  7 GSM1415522 adipocytes cg04785083 0.936 blood_cells
#>  8 GSM1415522 adipocytes cg02258444 0.908 hepatocytes
#>  9 GSM1415526 adipocytes cg04785083 0.938 blood_cells
#> 10 GSM1415526 adipocytes cg02258444 0.905 hepatocytes
#> # ℹ 808 more rows
#> 
#> $plot