ICode9

精准搜索请尝试: 精确搜索
首页 > 其他分享> 文章详细

【实验记录】7月30日 复现作者的示例代码-Roadmap

2022-07-30 17:01:43  阅读:136  来源: 互联网

标签:giggle 示例 -- 30 Roadmap xxzhang home software workplace


ssh xxzhang@192.168.79.84
cd /home/xxzhang/data/Epigenome/Roadmap/

1、拆分原始bed文件,并重命名文件名

(base) [xxzhang@cu08 Roadmap]$ python /home/xxzhang/workplace/software/giggle/examples/rme/rename.py /home/xxzhang/workplace/software/giggle/examples/rme/states.txt /home/xxzhang/workplace/software/giggle/examples/rme/EDACC_NAME.txt "Roadmap_hg38_127/*gz" "split/"

Traceback (most recent call last):
  File "/home/xxzhang/workplace/software/giggle/examples/rme/rename.py", line 5, in <module>
    from itertools import imap
ImportError: cannot import name 'imap'

主要的原因是imap函数在python2中,需要通过itertools加载的。但是在python3中,直接用map函数作为替代。
参考链接:https://mlog.club/article/2009306
将代码中的imap函数替换为map函数即可。并注释掉“from itertools import imap”这一行。
代码即可运行。

但是,我们又出现了新的问题,我们的这个候选的文件其实并不包含gz的压缩(也是因为之前,我把它们解压的缘故)。
我现在需要重新解压原始的压缩包(.tgz格式)。

tar zxvf all_hg38lift.mnemonics.bedFiles.tgz -C orig/
python /home/xxzhang/workplace/software/giggle/examples/rme/rename.py /home/xxzhang/workplace/software/giggle/examples/rme/states.txt /home/xxzhang/workplace/software/giggle/examples/rme/EDACC_NAME.txt  "orig/*gz" "split/" 

这个代码可真是厉害啊
我们通过看下图,发现它将原始的组织的全部的bed文件拆分成了组织+染色质状态的bed文件。这样的话,我们就可以直接去看,候选的位点在什么组织什么染色质状态下富集,这是很有意思的角度。
而且是多线程处理,运行的速度极快。
这个代码我要学习一下。我还有很多要学的地方啊。自己要迭代升级了。

2、批量压缩bed文件

cd split/
ls *.bed |gargs -p 30 "bgzip {}" #这个应该批量的对其进行压缩

3、对序列进行排序

mkdir split_sort
/home/xxzhang/workplace/software/giggle/scripts/sort_bed  "split/*gz" split_sort/ 30

这里也很值得学习,这里是对sort文件夹进行压缩备份

mkdir split_sort_raw
cd split_sort_raw/
cp ../split_sort/* .
ls *.bed.gz |gargs -p 30 "bgzip -d {}"
cd ..

4、构建索引

time giggle index -i "split_sort/*gz" -o split_sort_b -f -s
Indexed 56440237 intervals.

real    4m5.078s
user    3m9.876s
sys     0m8.330s

这一步真的让我战战兢兢,但是似乎到这里,是比较顺利的。

5、构建测试的query序列,与roadmap的索引进行比较

我相信真实的东西,是会非常简洁美丽的(这是我的核心价值观,如果你很费力的完成,说明用的方法不对,这是自然的基本的定律)。

(1)下载测试的数据集,并进行初步的筛选

wget ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM1218nnn/GSM1218850/suppl/GSM1218850_MB135DMMD.peak.txt.gz
zcat GSM1218850_MB135DMMD.peak.txt.gz |awk '$8>=100' |bgzip -c >GSM1218850_MB135DMMD.peak.q100.bed.gz

(2)检索

time giggle search -i split_sort_b -q GSM1218850_MB135DMMD.peak.q100.bed.gz -s >GSM1218850_MB135DMMD.peak.q100.bed.gz.giggle.result

6、画图

python /home/xxzhang/workplace/software/giggle/scripts/giggle_heat_map.py -s /home/xxzhang/workplace/software/giggle/examples/rme/states.txt -c /home/xxzhang/workplace/software/giggle/examples/rme/EDACC_NAME.txt -i GSM1218850_MB135DMMD.peak.q100.bed.gz.giggle.result -o GSM1218850_MB135DMMD.peak.q100.bed.gz.3x11.pdf -n /home/xxzhang/workplace/software/giggle/examples/rme/new_groups.txt --x_size 3 --y_size 11 --stat combo --ablines 15,26,31,43,52,60,72,82,87,89,93,101,103,116,120,122,127 --state_names /home/xxzhang/workplace/software/giggle/examples/rme/short_states.txt --group_names /home/xxzhang/workplace/software/giggle/examples/rme/new_groups_names.txt

这里出现了一个报错。
Traceback (most recent call last):
File "/home/xxzhang/workplace/software/giggle/scripts/giggle_heat_map.py", line 2, in
import matplotlib
ImportError: No module named matplotlib

后来尝试安装matplotlib。

pip install matplotlib

Collecting matplotlib
Downloading matplotlib-3.3.4-cp36-cp36m-manylinux1_x86_64.whl (11.5 MB)
|████████████████████████████████| 11.5 MB 1.1 MB/s
Requirement already satisfied: numpy>=1.15 in /home/xxzhang/miniconda3/lib/python3.6/site-packages (from matplotlib) (1.19.5)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3
Downloading pyparsing-3.0.9-py3-none-any.whl (98 kB)
|████████████████████████████████| 98 kB 3.9 MB/s
Collecting pillow>=6.2.0
Downloading Pillow-8.4.0-cp36-cp36m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
|████████████████████████████████| 3.1 MB 23.1 MB/s
Collecting cycler>=0.10
Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB)
Requirement already satisfied: python-dateutil>=2.1 in /home/xxzhang/miniconda3/lib/python3.6/site-packages (from matplotlib) (2.8.1)
Collecting kiwisolver>=1.0.1
Downloading kiwisolver-1.3.1-cp36-cp36m-manylinux1_x86_64.whl (1.1 MB)
|████████████████████████████████| 1.1 MB 80.8 MB/s
Requirement already satisfied: six>=1.5 in /home/xxzhang/miniconda3/lib/python3.6/site-packages (from python-dateutil>=2.1->matplotlib) (1.15.0)
Installing collected packages: pyparsing, pillow, kiwisolver, cycler, matplotlib
Successfully installed cycler-0.11.0 kiwisolver-1.3.1 matplotlib-3.3.4 pillow-8.4.0 pyparsing-3.0.9

似乎没啥用。
后来发现是numpy的版本不对。

>>> import matplotlib
RuntimeError: **module compiled against API version 0xc but this version of numpy is 0xb**
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxzhang/miniconda3/lib/python3.6/site-packages/matplotlib/__init__.py", line 174, in <module>
    _check_versions()
  File "/home/xxzhang/miniconda3/lib/python3.6/site-packages/matplotlib/__init__.py", line 159, in _check_versions
    from . import ft2font
ImportError: numpy.core.multiarray failed to import

这个错误主要产生的原因,是numpy和matplotlib之间版本不匹配的问题。

解决的方法就是:
(1)卸载干净numpy
(2)重新安装numpy
(3)安装matplotlib

pip uninstall numpy

Found existing installation: numpy 1.16.4
Uninstalling numpy-1.16.4:
  Would remove:
    /home/xxzhang/miniconda3/bin/f2py
    /home/xxzhang/miniconda3/bin/f2py3
    /home/xxzhang/miniconda3/bin/f2py3.6
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy-1.16.4.dist-info/*
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/*
  Would not remove (might be manually added):
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/_import_tools.py
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/add_newdocs.py
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/core/multiarray_tests.cpython-36m-x86_64-linux-gnu.so
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/core/operand_flag_tests.cpython-36m-x86_64-linux-gnu.so
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/core/struct_ufunc_test.cpython-36m-x86_64-linux-gnu.so
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/core/test_rational.cpython-36m-x86_64-linux-gnu.so
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/core/umath_tests.cpython-36m-x86_64-linux-gnu.so
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/distutils/environment.py
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy/distutils/site.cfg
Proceed (y/n)? y
  Successfully uninstalled numpy-1.16.4

pip uninstall numpy

Found existing installation: numpy 1.13.1
Uninstalling numpy-1.13.1:
  Would remove:
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy
    /home/xxzhang/miniconda3/lib/python3.6/site-packages/numpy-1.13.1-py3.6.egg-info
Proceed (y/n)? y
  Successfully uninstalled numpy-1.13.1

pip uninstall numpy
WARNING: Skipping numpy as it is not installed.


pip install numpy==1.19.5 -i http://pypi.douban.com/simple --trusted-host pypi.douban.com

Looking in indexes: http://pypi.douban.com/simple
Collecting numpy==1.19.5
  Downloading http://pypi.doubanio.com/packages/14/32/d3fa649ad7ec0b82737b92fefd3c4dd376b0bb23730715124569f38f3a08/numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
     |████████████████████████████████| 14.8 MB 6.7 MB/s
Installing collected packages: numpy
Successfully installed numpy-1.19.5

【参考链接】https://blog.csdn.net/road_of_god/article/details/121218570

测试bug是否解决

Python 3.6.13 |Anaconda, Inc.| (default, Feb 23 2021, 21:15:04)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matplotlib #无报错输出

接下来,重新加载测试代码画图。
当然还有一些其他的报错。

Traceback (most recent call last):
  File "/home/xxzhang/workplace/software/giggle/scripts/giggle_heat_map.py", line 269, in <module>
    plt.yticks(np.arange(0.5,len(group_names)+0.5,1.0),[x.decode("utf8") for x in group_names], fontsize=10)
  File "/home/xxzhang/workplace/software/giggle/scripts/giggle_heat_map.py", line 269, in <listcomp>
    plt.yticks(np.arange(0.5,len(group_names)+0.5,1.0),[x.decode("utf8") for x in group_names], fontsize=10)
AttributeError: 'str' object has no attribute 'decode'

这个同样也还是python2和python3的区别,解决方法是直接去掉“.decode("utf8")”即可。

plt.yticks(np.arange(0.5,len(group_names)+0.5,1.0),[x.decode("utf8") for x in group_names], fontsize=10)
plt.yticks(np.arange(0.5,len(group_names)+0.5,1.0),[x for x in group_names], fontsize=10)

终于最后得到了测试结果的heatmap的图。

7、用我们自己的文件进行测试。

在测试之前,需要了解人家的测试文件长什么样子。

zcat GSM1218850_MB135DMMD.peak.txt.gz |head

chr1 714140 714505 . 21 . -1 9.05234147781908 -1
chr1 766123 766513 . 40 . -1 20.6816682872462 -1
chr1 773683 774053 . 39 . -1 20.1453797317428 -1
chr1 785319 785660 . 25 . -1 12.6626688107994 -1
chr1 805180 805679 . 89 . -1 53.0106435758877 -1
chr1 821480 822377 . 66 . -1 37.4996160251318 -1
chr1 876990 877319 . 24 . -1 8.81398119324741 -1
chr1 894436 894717 . 17 . -1 6.04980472690903 -1
chr1 896632 896875 . 21 . -1 7.62492868913338 -1
chr1 934524 934836 . 15 . -1 5.26813111266616 -1

也不过是一个普通的peak文件,并且作者进一步将quality>100的peak,作为进一步的处理的对象。

接下来,我们想要加载我们的数据。

(1)所有的人特异性的序列

cp /home/xxzhang/workplace/project/giggle_workplace/new_trial/human_specific_repeat_uniq.bed.gz ./

time giggle search -i split_sort_b -q human_specific_repeat_uniq.bed.gz -s >human_specific_repeat_uniq.bed.gz.giggle.result                           

real    0m5.337s
user    0m3.662s
sys     0m0.715s

python /home/xxzhang/workplace/software/giggle/scripts/giggle_heat_map.py -s /home/xxzhang/workplace/software/giggle/examples/rme/states.txt -c /home/xxzhang/workplace
/software/giggle/examples/rme/EDACC_NAME.txt -i human_specific_repeat_uniq.bed.gz.giggle.result -o human_specific_repeat_uniq.bed.gz.3x11.pdf -n /home/xxzhang/workplace/software/giggle/examples/rme/new_groups.txt --x_size 3 --y_size 11 --stat combo --ablines 15,26,31,43,52,60,72,82,87,89,93,101,103,116,120,122,127 --state_names /home/xxzhang/workplace/software/giggle/examples/rme/short_states.txt --group_names /home/xxzhang/workplace/software/giggle/examples/rme/new_groups_names.txt
-911.6124454198172 427.1105279465215

得到的结果就挺让人失望的。

(2)分四大家族来看结果

  • LINE
cp /home/xxzhang/workplace/project/giggle_workplace/sub_family/LINE/Hs_LINE.bed.gz ./                                                                
time giggle search -i split_sort_b -q Hs_LINE.bed.gz -s >Hs_LINE.bed.gz.giggle.result                                                                 
real    0m2.052s
user    0m1.268s
sys     0m0.332s

python /home/xxzhang/workplace/software/giggle/scripts/giggle_heat_map.py -s /home/xxzhang/workplace/software/giggle/examples/rme/states.txt -c /home/xxzhang/workplace/software/giggle/examples/rme/EDACC_NAME.txt -i Hs_LINE.bed.gz.giggle.result -o Hs_LINE.bed.gz.3x11.pdf -n /home/xxzhang/workplace/software/giggle/examples/rme/new_groups.txt --x_size 3 --y_size 11 --stat combo --ablines 15,26,31,43,52,60,72,82,87,89,93,101,103,116,120,122,127 --state_names /home/xxzhang/workplace/software/giggle/examples/rme/short_states.txt --group_names /home/xxzhang/workplace/software/giggle/examples/rme/new_groups_names.txt

-1145.4813572909884 403.57550238181346

  • LTR
cp /home/xxzhang/workplace/project/giggle_workplace/sub_family/LTR/Hs_LTR.bed.gz ./                                                                  
time giggle search -i split_sort_b -q Hs_LTR.bed.gz -s >Hs_LTR.bed.gz.giggle.result                                                                   
real    0m0.654s
user    0m0.289s
sys     0m0.191s

python /home/xxzhang/workplace/software/giggle/scripts/giggle_heat_map.py -s /home/xxzhang/workplace/software/giggle/examples/rme/states.txt -c /home/xxzhang/workplace/software/giggle/examples/rme/EDACC_NAME.txt -i Hs_LTR.bed.gz.giggle.result -o Hs_LTR.bed.gz.3x11.pdf -n /home/xxzhang/workplace/software/giggle/examples/rme/new_groups.txt --x_size 3 --y_size 11 --stat combo --ablines 15,26,31,43,52,60,72,82,87,89,93,101,103,116,120,122,127 --state_names /home/xxzhang/workplace/software/giggle/examples/rme/short_states.txt --group_names /home/xxzhang/workplace/software/giggle/examples/rme/new_groups_names.txt
-93.71706066802582 295.22627402031566

  • SINE
cp /home/xxzhang/workplace/project/giggle_workplace/sub_family/SINE/Hs_SINE.bed.gz ./
time giggle search -i split_sort_b -q Hs_SINE.bed.gz -s >Hs_SINE.bed.gz.giggle.result                                                                 
real    0m3.344s
user    0m2.045s
sys     0m0.547s

python /home/xxzhang/workplace/software/giggle/scripts/giggle_heat_map.py -s /home/xxzhang/workplace/software/giggle/examples/rme/states.txt -c /home/xxzhang/workplace/software/giggle/examples/rme/EDACC_NAME.txt -i Hs_SINE.bed.gz.giggle.result -o Hs_SINE.bed.gz.3x11.pdf -n /home/xxzhang/workplace/software/giggle/examples/rme/new_groups.txt --x_size 3 --y_size 11 --stat combo --ablines 15,26,31,43,52,60,72,82,87,89,93,101,103,116,120,122,127 --state_names /home/xxzhang/workplace/software/giggle/examples/rme/short_states.txt --group_names /home/xxzhang/workplace/software/giggle/examples/rme/new_groups_names.txt

-353.71735915110963 174.10855052614107

  • SVA
cp /home/xxzhang/workplace/project/giggle_workplace/sub_family/SVA/Hs_SVA.bed.gz ./                                                                  
time giggle search -i split_sort_b -q Hs_SVA.bed.gz -s >Hs_SVA.bed.gz.giggle.result                                                                   
real    0m1.071s
user    0m0.521s
sys     0m0.204s

python /home/xxzhang/workplace/software/giggle/scripts/giggle_heat_map.py -s /home/xxzhang/workplace/software/giggle/examples/rme/states.txt -c /home/xxzhang/workplace/software/giggle/examples/rme/EDACC_NAME.txt -i Hs_SVA.bed.gz.giggle.result -o Hs_SVA.bed.gz.3x11.pdf -n /home/xxzhang/workplace/software/giggle/examples/rme/new_groups.txt --x_size 3 --y_size 11 --stat combo --ablines 15,26,31,43,52,60,72,82,87,89,93,101,103,116,120,122,127 --state_names /home/xxzhang/workplace/software/giggle/examples/rme/short_states.txt --group_names /home/xxzhang/workplace/software/giggle/examples/rme/new_groups_names.txt
-395.7527136893095 937.5406208326722

标签:giggle,示例,--,30,Roadmap,xxzhang,home,software,workplace
来源: https://www.cnblogs.com/zjuer/p/16535027.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有