How to fill in missing sequential rows in a TSV file

2019-11-11 21:55:53  Views: 256  Source: Internet

Tags: infinite while-loop loops python


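I am still a beginner, so apologies if this question has an obvious answer, and apologies for the messy code, but I have files with tens of thousands of lines. I am using a sliding-window technique to move along each file, so I need to make sure every window is present. However, some of my input files are missing rows, so I am trying to write Python code that adds those rows, with the information I want, to make the files complete. The code looks like this: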

#!/usr/bin/env python

outfile = open ("missing_test.txt", "w")

with open("add_missing.txt", "r") as file:
    last_line = 0   #This is where it starts for bin 1
    lines = []
    header_line = next(file)
    outfile.write(header_line)
    CHROM = 'BABA_1'
    for line in file:     #go through every line to check its existence and rewrite to new file
        nums = line.split("\t")
        num1 = nums[0]        #no integer because this is a string: name individual
        num2 = int(nums[1])   #integer for window
        num3 = int(nums[2])   #integer for coverage (here always 10000 to meet threshold)
        num4 = int(nums[3])   #integer for SNP count   
        if num1 == CHROM:     #
            while num2 != last_line + 10000:
                #A line is missing, so a new line is added with 0 SNPs:
                NUM2 = last_line + 10000   # New window, the one that was missing
                NUM4 = 0   #0 SNPs found
                #lines.append((num1, NUM2, num3, NUM4))
                OUTLINE = "%s\t%s\t%s\t%s" % (num1, NUM2, num3, NUM4) #write new line to outfile       
                outfile.write(OUTLINE + "\n")
                last_line += 10000
            lines.append((num1,num2,num3,num4))
            last_line += 10000    #also add 10000 here otherwise the while loop makes no sense
            outline = "%s\t%s\t%s\t%s" % (num1, num2, num3, num4)
            outfile.write(outline + "\n")   #write all existing lines to outfile

        else:
            CHROM = num1
            last_line = 0

outfile.close()        

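So as long as the first window of the first "CHROM" equals 0, this works fine, but that is not always the case. When it is not, the loop becomes infinite. For example, this is what the input and the desired output look like:

Input: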

indiv   window  coverage    SNP
BABA_1  20000   10000   7
BABA_1  30000   10000   1
BABA_1  50000   10000   2
BABA_1  60000   10000   3
BABA_1  80000   10000   1
BABA_10 20000   10000   1
BABA_10 30000   10000   16
BABA_10 80000   10000   9

Desired output:

indiv   window  coverage    SNP
BABA_1  10000   10000   0
BABA_1  20000   10000   7
BABA_1  30000   10000   1
BABA_1  40000   10000   0
BABA_1  50000   10000   2
BABA_1  60000   10000   3
BABA_1  70000   10000   0
BABA_1  80000   10000   1
BABA_10 10000   10000   0
BABA_10 20000   10000   1
BABA_10 30000   10000   16
BABA_10 40000   10000   0
BABA_10 50000   10000   0
BABA_10 60000   10000   0
BABA_10 70000   10000   0
BABA_10 80000   10000   9

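I have been trying hard to find a way to make this loop work without running forever, but I really cannot see my mistake. Does anyone have a hint on how to fix it?

Thank you very much in advance for your help!

Solution:

Try the following: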

#!/usr/bin/env python

WINDOW_STEP = 10000   # distance between two consecutive windows

def write_line(outfile, indiv, window, coverage, snp):
    # Write one tab-separated row to the output file.
    outfile.write("%s\t%s\t%s\t%s\n" % (indiv, window, coverage, snp))

# Opening both files in one "with" statement closes them even if an error occurs.
with open("add_missing.txt", "r") as infile, open("missing_test.txt", "w") as outfile:
    outfile.write(next(infile))   # copy the header line unchanged
    last_indiv = None
    last_window = None

    for line in infile:
        if not line.strip():
            continue   # skip blank lines, e.g. at the end of the file
        indiv, window, coverage, snp = line.rstrip("\n").split("\t")
        window = int(window)
        coverage = int(coverage)
        snp = int(snp)

        if indiv == last_indiv:
            # If the current window is higher than expected,
            # insert a line with the missing window.
            # Repeat until we get to the expected window.
            while window > last_window + WINDOW_STEP:
                last_window += WINDOW_STEP
                write_line(outfile, indiv, last_window, coverage, 0)
        # Remember where we are, whether or not the indiv changed.
        last_indiv = indiv
        last_window = window
        write_line(outfile, indiv, window, coverage, snp)

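What it does not handle is the expectation that a particular window number be the first window for a given indiv, because you did not define that behavior, and the comments about it are rather confusing.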

Contents of missing_test.txt after running this script:

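indiv   window  coverage    SNP
BABA_1  20000   10000   7
BABA_1  30000   10000   1
BABA_1  40000   10000   0
BABA_1  50000   10000   2
BABA_1  60000   10000   3
BABA_1  70000   10000   0
BABA_1  80000   10000   1
BABA_10 20000   10000   1
BABA_10 30000   10000   16
BABA_10 40000   10000   0
BABA_10 50000   10000   0
BABA_10 60000   10000   0
BABA_10 70000   10000   0
BABA_10 80000   10000   9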
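
The desired output in the question also starts every indiv at window 10000, which the answer leaves out on purpose. The following is a minimal sketch of one way to add that, not the answerer's method. It assumes that each indiv should start at window 10000, that windows are spaced 10000 apart, and that rows are sorted by window within each indiv. The names fill_missing_windows, step and start are hypothetical and introduced only for illustration. The loop compares with `<` instead of `!=`, so it stops even when a window is lower than the expected value.

```python
WINDOW_STEP = 10000  # assumed spacing between consecutive windows


def fill_missing_windows(rows, step=WINDOW_STEP, start=WINDOW_STEP):
    """Yield rows with every missing window filled in as a 0-SNP row.

    rows: iterable of (indiv, window, coverage, snp) tuples, sorted by
    window within each indiv. Starting every indiv at `start` is an
    assumption taken from the desired output, not from the input file.
    """
    last_indiv = None
    expected = start
    for indiv, window, coverage, snp in rows:
        if indiv != last_indiv:
            # New individual: begin counting again from the first window.
            last_indiv = indiv
            expected = start
        # "<" rather than "!=" guarantees termination, even if a window
        # is lower than expected or not a multiple of `step`.
        while expected < window:
            yield (indiv, expected, coverage, 0)
            expected += step
        yield (indiv, window, coverage, snp)
        expected = window + step


# The sample input from the question, already parsed into tuples.
sample = [
    ("BABA_1", 20000, 10000, 7),
    ("BABA_1", 30000, 10000, 1),
    ("BABA_1", 50000, 10000, 2),
    ("BABA_1", 60000, 10000, 3),
    ("BABA_1", 80000, 10000, 1),
    ("BABA_10", 20000, 10000, 1),
    ("BABA_10", 30000, 10000, 16),
    ("BABA_10", 80000, 10000, 9),
]

filled = list(fill_missing_windows(sample))
for row in filled:
    print("\t".join(str(field) for field in row))
```

On the sample input this yields 16 rows, windows 10000 through 80000 for each of BABA_1 and BABA_10, matching the desired output in the question. Because the function is a generator, it can be fed rows parsed lazily from a large file without loading the whole file into memory.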

Source: https://codeday.me/bug/20191111/2022644.html
