Ten Years of Bioinformatics (十年的生信)

Passing arguments to a Perl script

Posted on July 5, 2019 by daizao

1. A style like this, reading positional arguments from @ARGV

#!/usr/bin/perl -w
use strict;
use warnings;

my $usage=<<USAGE;
Usage:
    perl $0 inputfile
USAGE
if(@ARGV==0){die $usage};

my $file=$ARGV[0];

2. Or a style like this, using Getopt::Long

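#!/usr/bin/perl -w
use strict;
use warnings;
use Getopt::Long;
use POSIX;


my ($input,$output_dir,$seq_num,$file_num,$help,@libfiles);
GetOptions(
    "i=s" => \$input,          # input file (string, value required)
    "o=s" => \$output_dir,     # output directory (string, value required)
    "n=i" => \$seq_num,        # sequences per file (integer)
    "m=i" => \$file_num,       # number of output files (integer)
    "h"   => \$help,           # help flag, takes no value
    "library=s" => \@libfiles  # may be given several times
) or die `pod2text $0`;

# Print the POD below as help when -h is given or a required option is missing
die `pod2text $0` if ($help or !defined $input or !defined $output_dir);

# Help documentation

=head1 Description

    This script splits a FASTA file that is too large (thousands of sequences) into smaller files.

=head1 Usage

    $0 -i <input> -o <output_dir> [-n <seq_num_per_file>] [-m <output_file_num>] [-h] [--library lib/stdlib --library lib/extlib]

=head1 Parameters

    -i  [str]   Input raw FASTA file
    -o  [str]   Output directory
    -n  [int]   Number of sequences per file (alternative to -m)
    -m  [int]   Number of output files (default: 100)
    -h          Show this help
    --library   [dir]   Directory; may be given multiple times

=cut

# Option specifier suffixes in Getopt::Long:
# !   negatable flag (--opt / --noopt)
# s   string
# i   integer
# f   floating-point number
# =   the value is mandatory; with : the value is optional
# --library shows how to pass several values to one option


print $input,"\n";
print $output_dir,"\n";
print $seq_num,"\n" if defined $seq_num;     # -n is optional
print $file_num,"\n" if defined $file_num;   # -m is optional
# Also accept comma-separated values, e.g. --library a,b
@libfiles = split(/,/,join(',',@libfiles));
for my $i (0..$#libfiles){
    print $libfiles[$i],"\n";
}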
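The specifier letters listed in the comments above can be tried out in isolation. The following is only a minimal sketch, not part of the original script: the option names (`name`, `count`, `ratio`, `verbose`) and the helper `parse_args` are made up for illustration. It uses `GetOptionsFromArray` from Getopt::Long so that the parsing works on an ordinary list instead of `@ARGV`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse a list of arguments and return a hash ref.
sub parse_args {
    my @args = @_;
    my %opt = (verbose => 0, library => []);
    GetOptionsFromArray(
        \@args,
        'name=s'    => \$opt{name},      # =s : string, value required
        'count=i'   => \$opt{count},     # =i : integer, value required
        'ratio=f'   => \$opt{ratio},     # =f : floating-point number, value required
        'verbose!'  => \$opt{verbose},   # !  : flag that can be negated with --noverbose
        'library=s' => $opt{library},    # array ref: the option may be repeated
    ) or die "Bad command-line arguments\n";
    return \%opt;
}

# Illustrative arguments only
my $opt = parse_args(qw(--name reads.fa --count 3 --ratio 0.5 --noverbose
                        --library lib/stdlib --library lib/extlib));
print "name=$opt->{name} count=$opt->{count} ratio=$opt->{ratio} verbose=$opt->{verbose}\n";
print "library=$_\n" for @{ $opt->{library} };
```

Passing an array reference as the destination is what lets `--library` be repeated; each occurrence is pushed onto the array.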

References:
Perl Advanced Notes (perl进阶笔记)
perldoc

Posted in Perl

生信技能树: single-cell R assessment exercise

Posted on July 1, 2019 by daizao

# df is the data frame provided by the exercise
rownames(df) <- df$LETTERS   # use the LETTERS column as row names
dz <- df[-1]                 # drop the first column

# Set all negative values to 0
dz[dz < 0] <- 0

# Keep only the rows whose sum is greater than 0
a <- dz[rowSums(dz) > 0, , drop = FALSE]
head(a)
# a is the target result

Link to the exercise

Posted in R

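Perl one-liners

Posted on June 29, 2019 by daizao

# The image makes this easier to understand
-a  autosplit mode: split $_ on whitespace and store the fields in @F, i.e. @F = split ' ', $_
-F  set the separator used by -a
-l  automatically chomp each input line and append a newline to each print
-n  wrap the code in a while(<>){...} loop
-e  the code to execute, i.e. the script itself
-p  loop and print: while(<>){code; print;}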
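To make the switch descriptions above more concrete, here is roughly what `perl -F: -lane 'print $F[0]'` does when written out as an ordinary script. This is a sketch of the documented behaviour, not the exact code Perl generates; the sub name `first_fields` and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mimicking: perl -F: -lane 'print $F[0]'
# Takes raw input lines and returns what the one-liner would print.
sub first_fields {
    my @lines = @_;
    my @out;
    for (@lines) {                 # -n : loop over every input line
        chomp;                     # -l : strip the trailing newline
        my @F = split /:/, $_;     # -a with -F: : split $_ on ':' into @F
        push @out, "$F[0]\n";      # -l : print adds the newline back
    }
    return @out;
}

# Illustrative input in /etc/passwd style
print first_fields("root:x:0:0\n", "sshd:x:74:74\n");
```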

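cp /etc/passwd .
# Replace sshd with SSHD and print every line
perl -pe 's/sshd/SSHD/g' passwd
# Print the first ':'-separated field of the lines matching sshd, then uppercase it with sed
perl -F':' -alne '/sshd/ && print $F[0]' passwd  | sed 's/sshd/SSHD/g'

# Number the non-empty lines, starting from 0
grep -v "^$" passwd  | perl -alne 'print $a++." ".$_'
# Similar to the following
perl -e 'while(<>){chomp($_);if ($_ =~ /^\S/){print $a++." ".$_."\n"}}' passwd
# or
perl -ne 'print $a++." ".$_ if /\S/' passwd
#perl -e 'while(<>){if ($_ =~ /^\s/){print a."\n"}}' passwd

# Print the number of matching lines
perl -lne '$a++ if /^s/; END {print $a+0}' passwd

# Sum the ':'-separated fields of each line (non-numeric fields are treated as 0)
perl -MList::Util=sum -F':' -alne 'print sum @F' passwd

# Running total of whitespace-separated fields, printed after every line
perl -alne '$t += @F; print $t' passwd
# Total number of whitespace-separated fields, printed once at the end
perl -alne '$t += @F; END{print $t}' passwd

# Count the whitespace-separated fields that contain sshd
perl -alne 'map{ /sshd/ && $t++ } @F; END{print $t}' passwd

# Print the lines that contain sshd
perl -ne '/sshd/ && print' passwd

# Print the lines that start with a digit
perl -ne 'print if /^\d+/' passwd

# Matrix transpose
perl -MData::Dumper -e '@matrix=(["a","b","c","d"],["e","f","g","h"],["i","j","k","l"]);print Dumper (\ @matrix); @transposed=map{$x=$_;[map{$matrix[$_][$x]} 0..$#matrix];} 0..$#{$matrix[0]}; print Dumper(\ @transposed)'
# Printing single quotes from a one-liner
# A single single quote
perl -e 'print "'\''"'
# Several single quotes
perl -e 'print "'\'' '\''"'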

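Posted in Perl

Multithreaded programming in Perl

Posted on June 4, 2019 by daizao

This template can be dropped into many kinds of Perl scripts to run work in parallel.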

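#!/usr/bin/perl -w
#
use strict;
use warnings;
use threads;

my $j=0;   # number of threads started so far

# Template only: as written the outer loop never ends, so add your own
# exit condition (for example, stop once every job has been started and joined).
while(1){
    # Keep at most 4 threads alive at any time
    while(scalar(threads->list())<4){
        $j++;
        threads->create(\&your_sub, $j);   # $j is a placeholder: pass whatever arguments your_sub needs
    }
    # Join every thread that has finished
    for my $thread (threads->list(threads::all)){
        if ($thread->is_joinable()){
            $thread->join();
        }
    }
}

sub your_sub{
    ...;   # your own code goes here
}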
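For comparison, here is a small self-contained variant of the same pattern that actually terminates. It is a sketch under assumptions, not the author's code: the worker `square`, the helper `run_jobs`, and the job list are invented for illustration, and it requires a Perl built with thread support. The thread is created with an explicit list context so that `join` returns the worker's values.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Hypothetical worker: returns its argument together with its square.
sub square {
    my ($n) = @_;
    return ($n, $n * $n);
}

# Run square() over @jobs with at most $max_threads threads alive at once.
sub run_jobs {
    my ($max_threads, @jobs) = @_;
    my %result;
    while (@jobs or scalar(threads->list())) {
        # Start new threads while there is work left and a free slot
        while (@jobs and scalar(threads->list()) < $max_threads) {
            my $job = shift @jobs;
            threads->create({ context => 'list' }, \&square, $job);
        }
        # Collect the results of every finished thread
        for my $thr (threads->list(threads::joinable)) {
            my ($job, $value) = $thr->join();
            $result{$job} = $value;
        }
        select(undef, undef, undef, 0.01);   # short pause to avoid a busy loop
    }
    return %result;
}

my %squares = run_jobs(4, 1 .. 10);
print "$_ => $squares{$_}\n" for sort { $a <=> $b } keys %squares;
```

The loop ends once the job list is empty and no unjoined threads remain, which is the exit condition the template leaves open.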

References:
1. Threads in Perl (Perl 中的线程)
2. Perl multithreading (Perl多线程)
3. perldoc: threads

Posted in Perl

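Notes on applying for dbGaP access

Posted on June 4, 2019 by daizao

SAM is not the only thing that has to be renewed every year: dbGaP access also needs a yearly renewal. To renew, log in to dbGaP and fill in the research progress form.

For questions about TCGA and dbGaP applications, please join the group to discuss.

Posted in TCGA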

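Computing group means (with NA values)

Posted on May 29, 2019 by daizao

The data structure is shown in the figure below. The Year and Month columns contain repeated values. The goal is to group the rows by Year and Month and compute the mean Temperature for each group. Temperature contains NA values, which must be removed before the mean is computed.

The Perl version is as follows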

#!/usr/bin/perl -w
use strict;
use warnings;

my $usage=<<USAGE;
Usage:
    perl $0 inputfile
USAGE
if(@ARGV==0){die $usage};

my $file=$ARGV[0];
my %hash_year=();
my %hash_month=();

# Pass 1: record the distinct years and months, and write the three
# columns of interest (Year, Month, Temperature) to a temporary file
open(my $rf,'<',$file) || die $!;
open(my $wf,'>',"process_1.txt") || die $!;
while(my $line=<$rf>){
    chomp($line);
    next if ($.==1);                # skip the header line
    my @arr=split(/\t/,$line);
    $hash_year{$arr[10]}=1;         # column 11: Year
    $hash_month{$arr[11]}=1;        # column 12: Month
    print $wf $arr[10],"\t",$arr[11],"\t",$arr[14],"\n";   # column 15: Temperature
}

close($rf);
close($wf);

# Pass 2: for every year/month combination, rescan the temporary file
# and average the matching temperatures
open($wf,'>',"average.txt") || die $!;
print $wf "Year\tMonth\tTemperature\n";
for my $key_year (sort {$a <=> $b} keys %hash_year){
    for my $key_month (sort {$a <=> $b} keys %hash_month){
        my @value=();
        open($rf,'<',"process_1.txt") || die $!;
        while(my $line=<$rf>){
            chomp($line);
            my @arr=split(/\t/,$line);
            if ($arr[0]==$key_year && $arr[1]==$key_month) {
                push @value,$arr[2];
            }
        }
        close($rf);
        my $average=average(@value);
        # Skip combinations that have no non-NA value
        if (defined $average){
            print $wf $key_year,"\t",$key_month,"\t",$average,"\n";
        }
    }
}

close($wf);
unlink "process_1.txt";   # portable replacement for system("del process_1.txt")

# Mean of the values, ignoring 'NA'; returns undef when nothing is left,
# which avoids a division by zero
sub average{
    my @num=@_;
    my $count=0;
    my $total=0;
    for my $value (@num){
        next if ($value eq 'NA');
        $total=$total + $value;
        $count++;
    }
    return $count ? $total/$count : undef;
}
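The script above rescans the temporary file once for every year/month combination. As an alternative sketch (not the author's method), the same grouping can be done in a single pass by accumulating a sum and a count per group in hashes. The helper `group_means`, its row format and the sample values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes rows of [year, month, temperature] and returns
# a hash ref mapping "year\tmonth" to the mean of the non-NA temperatures.
sub group_means {
    my @rows = @_;
    my (%sum, %count);
    for my $row (@rows) {
        my ($year, $month, $temp) = @$row;
        next if !defined $temp or $temp eq 'NA';   # skip missing values
        my $key = "$year\t$month";
        $sum{$key}   += $temp;
        $count{$key} += 1;
    }
    # Groups that contain only NA never enter %sum, so no division by zero
    my %mean;
    $mean{$_} = $sum{$_} / $count{$_} for keys %sum;
    return \%mean;
}

# Made-up sample data, for illustration only
my $means = group_means(
    [2018, 1, 2.0], [2018, 1, 'NA'], [2018, 1, 4.0],
    [2018, 2, 'NA'], [2019, 1, 10],
);
for my $key (sort keys %$means) {
    print "$key\t$means->{$key}\n";
}
```

In a real script the rows would come from splitting each input line on tabs, as in the version above.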

The R for-loop version is as follows

# Assumes `data` has already been read in, e.g. with the fread() call in the tidyverse version
dz_test <- data[,c("Year","Month","Temperature")]
a <- data.frame()
for (i in unique(dz_test$Year)){
    for (j in unique(dz_test$Month)){
        # Rows belonging to year i and month j
        month <- dz_test[dz_test$Year==i & dz_test$Month==j,]
        if (nrow(month) == 0) next   # this year/month combination does not occur
        b <- data.frame(i,j,mean(month$Temperature,na.rm=TRUE))
        names(b) <- names(dz_test)
        a <- rbind(a,b)
    }
}
a

The simplest approach, and a fairly fast one, is to use R's tidyverse package

library(tidyverse)
data <- data.table::fread("Temperature.txt",data.table = F)
results1 <- data %>%
    group_by(Year,Month) %>%
    summarise(Mean=mean(Temperature,na.rm=T))

This gives the result shown in the figure below

Posted in Perl, R
