Merging TCGA Transcript Data

#!/usr/bin/perl -w
use strict;
use warnings;
use Data::Dumper;
use File::Basename;
use JSON;

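my $usage=<<USAGE;
Usage:
    perl dz_merge.pl /path/to/meta.json /path/to/cartfile /path/to/output.txt

    meta.json : metadata JSON    cartfile : directory of expression files
    output.txt: merged matrix to write
USAGE
# All three arguments are required; with fewer, $file2 and $file3 stay undefined.
if(@ARGV!=3){die $usage};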

my $file1=$ARGV[0];
my $file2=$ARGV[1];
my $file3=$ARGV[2];
my $js;

# Read the whole metadata JSON into one string.
open(my $json_fh,'<',$file1) || die "Cannot open metadata JSON '$file1': $!";
while (my $line=<$json_fh>){
    $js .= $line;
}
close($json_fh);

my $json=decode_json($js);

my %hash;
my %gene_exp;

# Remove a merged result left by an earlier run so the glob below does not pick it up.
if (-e ($file2."/dz_result.txt")){
    unlink ($file2."/dz_result.txt");
}
# Index every *.txt entry in the cart directory by its base name.
my @cat_file=glob($file2."/*.txt");
for my $doc (@cat_file){
    $hash{basename($doc)}++;
}

my @tcgaid;
my @genename;
my $jishu=0;

for my $i (@{$json}){
    # Keep the first three dot-separated parts of file_name (drops e.g. a trailing ".gz").
    my $docname=$i -> {file_name};
    my @arr=split(/\./,$docname);
    my $filename="$arr[0].$arr[1].$arr[2]";
    next unless exists $hash{$filename};
    my $sample=$i -> {associated_entities} -> [0] -> {entity_submitter_id};
    push(@tcgaid,$sample);
    # The script expects each file at <cartfile>/<name>/<name>.
    my $path=$file2."/".$filename."/".$filename;
    open(my $fh,'<',$path) || die "Cannot open expression file '$path': $!";
    while(my $line=<$fh>){
        chomp ($line);
        my @genearr=split("\t",$line);
        $gene_exp{$sample}{$genearr[0]}=$genearr[1];
        # The gene list (row order) is taken from the first file read.
        push(@genename,$genearr[0]) if $jishu == 0;
    }
    close($fh);
    $jishu=$jishu+1;
}

my $normal=0;
my $tumor=0;

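# Open with ">" so a rerun overwrites the old matrix instead of appending to it.
open(WF,">",$file3) || die "Cannot write output file '$file3': $!";
print WF "id";
for my $i (0..$#tcgaid){
    # The 4th barcode field is the sample type: 01-09 tumor, 10-19 normal.
    my @arr=split("-",$tcgaid[$i]);
    if ($arr[3]=~/^1/){       # normal tissue
        print WF "\t",$tcgaid[$i];
        $normal++;
    }elsif($arr[3]=~/^0/){    # tumor tissue
        print WF "\t",$tcgaid[$i];
        $tumor++;
    }
}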

print WF "\n";
print "normal tissue:",$normal,"\n";
print "tumor tissue:",$tumor,"\n";

for my $j (0..$#genename){
    print WF $genename[$j];
    for my $i (0..$#tcgaid){
        my @arr=split("-",$tcgaid[$i]);
        # Same filter as the header row: only normal (1x) and tumor (0x) samples.
        if ($arr[3]=~/^[01]/){
            print WF "\t",$gene_exp{$tcgaid[$i]}{$genename[$j]};
        }
    }
    print WF "\n";
}

close(WF);

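The samples in the merged matrix are not yet sorted, which makes it awkward to use for differential analysis, so the columns need to be sorted by sample type: sorting script
R merge script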
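
The linked sorting script is not reproduced here, so the following is only a minimal sketch of what sorting by sample type could look like, not the author's script. It assumes the merged matrix has the layout written by the script above (a header row starting with `id`, then one tab-separated row per gene) and that normal samples should come before tumor samples. The names `sample_rank`, `sort_columns_by_type`, and `sort_samples.pl` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rank a TCGA barcode by its 4th field (sample type).
# 0 = normal (codes 10-19), 1 = tumor (codes 01-09), 2 = anything else.
sub sample_rank {
    my ($barcode) = @_;
    my @parts = split /-/, $barcode;
    my $code = defined $parts[3] ? $parts[3] : '';
    return 0 if $code =~ /^1/;
    return 1 if $code =~ /^0/;
    return 2;
}

# Reorder the columns of a merged matrix so that normal samples come first.
# Takes chomped, tab-separated lines (header first) and returns new lines.
sub sort_columns_by_type {
    my (@lines) = @_;
    return () unless @lines;
    my @header = split /\t/, $lines[0], -1;
    # Column 0 is the gene id; order the remaining column indices by rank,
    # falling back to the original position to keep the result deterministic.
    my @order = sort {
        sample_rank($header[$a]) <=> sample_rank($header[$b]) || $a <=> $b
    } 1 .. $#header;
    my @out;
    for my $line (@lines) {
        my @fields = split /\t/, $line, -1;
        # Short rows are padded with empty strings instead of undef.
        push @out, join "\t", map { defined $_ ? $_ : '' } @fields[0, @order];
    }
    return @out;
}

# Command-line use: perl sort_samples.pl merged.txt > sorted.txt
if (@ARGV) {
    open(my $in, '<', $ARGV[0]) or die "Cannot open '$ARGV[0]': $!";
    chomp(my @lines = <$in>);
    close($in);
    print "$_\n" for sort_columns_by_type(@lines);
}
```

Sorting column indices rather than the barcodes themselves lets every gene row be reordered with the same index list, so the values stay aligned with their sample headers.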


5 responses to "Merging TCGA Transcript Data"

  1. Lukies says:

    Hi, I used your script to merge TCGA tumor methylation data, but it reported:
    Use of uninitialized value $file2 in concatenation (.) or string at C:\Users\DELL\Desktop\livermRNA\methy\02_download\geneMerge1.pl line 30.
    Use of uninitialized value $file2 in concatenation (.) or string at C:\Users\DELL\Desktop\livermRNA\methy\02_download\geneMerge1.pl line 33.
    Use of uninitialized value $file3 in concatenation (.) or string at C:\Users\DELL\Desktop\livermRNA\methy\02_download\geneMerge1.pl line 67.
    wrong output file ! at C:\Users\DELL\Desktop\livermRNA\methy\02_download\geneMerge1.pl line 67.
    The files will not merge and I am really stuck. What should I do? Could you help me work out what the problem is? My email is yuminzhongda@163.com. Thanks.

  2. 安璐璐 says:

    Could you help me work out why I am getting this?
    Use of uninitialized value $file2 in concatenation (.) or string at mergess.pl line 30.
    Use of uninitialized value $file2 in concatenation (.) or string at mergess.pl line 33.
    Use of uninitialized value $file3 in concatenation (.) or string at mergess.pl line 67.
    wrong output file ! at mergess.pl line 67.
