DIADE / dynadiv / GrAnnoT
Commit a750b3ed, authored 2 years ago by NMarthe
Changed the execution mode, removed the temporary files, translated the comments into English
Parent: b40fd71a
seg_coord/getSegmentsCoordinates.py (24 additions, 18 deletions)
-# input : gfa file with paths.
-gfa="platGB2_r426_chr10_cactus.gfa"
-# output : bed files giving the coordinates of the segments on the genomes (/minigraph segments)
 import subprocess
+import sys
+if not(len(sys.argv)==2) :
+    print("expected input : gfa file with walks.")
+    print("output : bed files giving the coordinates of the segments on the genomes (or on minigraph segments).")
+    sys.exit(1)
+elif (sys.argv[1]=="-h") :
+    print("expected input : gfa file with walks.")
+    print("output : bed files giving the coordinates of the segments on the genomes (or on minigraph segments).")
+    sys.exit(1)
+gfa=sys.argv[1]
-# récupérer les lignes qui commencent par 'S'
+# get the lines that start with "S"
 command="grep ^S "+gfa+" > segments.txt"
 subprocess.run(command,shell=True)
 segments=open('segments.txt','r')
 lines=segments.readlines()
 segments.close()
-# faire un dictionnaire avec les tailles des segments
+# build a dictionnary with the segment sizes
 segments_size={}
 for line in lines:
     line=line.split()
@@ -20,16 +27,14 @@ for line in lines:
     seg_size=len(line[2])
     segments_size[seg_id]=seg_size
-#print(segments_size['1'])
-# récupérer les lignes qui commencent par 'W'
+# get the lines that start with "W"
 command="grep ^W "+gfa+" | sed 's/>/,>/g' | sed 's/</,</g' > walks.txt"
 subprocess.run(command,shell=True)
 walks=open('walks.txt','r')
 lines=walks.readlines()
 walks.close()
-# sur ces lignes, récupérer le nom du génome pour nommer le bed
+# on these lines, get the name of the genome to name the output bed file
 file_names=list()
 for line in lines :
     line=line.split()
@@ -40,8 +45,8 @@ for line in lines :
     file_name=name+'_'+chromosome+'.bed'
-    # si on écrit dans le fichier pour la première fois, l'écraser. sinon l'append.
-    # certains chr d'un même individu sont fragmentés. les coordonnées des segments de tous les fragments sont inscrites dans le même bed.
+    # if we are writing in the file for the first time, overwrite it. else, append it
+    # this is because some chromosomes are fragmented. the coordinates of all the fragments from the same chromosome are written in the same bed file.
     if file_name not in file_names:
         file_names.append(file_name)
         out_bed=open(file_name,'w')
@@ -51,8 +56,8 @@ for line in lines :
     path=line[6].split(',')
     position=path_start
-    for i in range(1, len(path)): # pour chaque segment du path, print dans le bed la position du segment.
-        # calcul coordonnées : start=position, stop=position+taille_segment-1, ensuite position+=taille_segment
+    for i in range(1, len(path)): # for each segment in the path, write the position of the segment in the output bed file
+        # coordinates calculation : start=position, stop=position+segment_size-1, then position+=segment_size
         chr='Chr'+chromosome[len(chromosome)-2:]
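Review note: the coordinate rule stated in the comment above (start=position, stop=position+segment_size-1, then position+=segment_size) can be illustrated on its own. The sketch below is not the code from this commit. The function name `walk_to_intervals`, the toy segment names, and the use of a regular expression in place of the script's `sed` plus `split(',')` step are all assumptions made for illustration.

```python
import re

def walk_to_intervals(walk, path_start, segment_sizes):
    """Return (segment, strand, start, stop) tuples for one walk string.

    Hypothetical helper: follows the rule from the diff comment, with an
    inclusive stop coordinate.
    """
    intervals = []
    position = path_start
    # A walk looks like ">s1<s2>s3": an orientation sign, then a segment name.
    for sign, seg_name in re.findall(r"([<>])([^<>]+)", walk):
        size = segment_sizes[seg_name]
        start = position
        stop = position + size - 1  # inclusive end, as in the comment
        strand = "+" if sign == ">" else "-"
        intervals.append((seg_name, strand, start, stop))
        position += size  # the next segment starts right after this one
    return intervals

# Toy data (made up): three segments of length 4, 3 and 5, walk starting at 10.
sizes = {"s1": 4, "s2": 3, "s3": 5}
intervals = walk_to_intervals(">s1<s2>s3", 10, sizes)
for row in intervals:
    print(*row, sep="\t")
```

With these toy values the first segment covers 10 to 13, the second 14 to 16, and the third 17 to 21, so consecutive segments never overlap and leave no gap.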
@@ -73,5 +78,6 @@ for line in lines :
         position+=segments_size[seg_name]
     out_bed.close()
+command="rm segments.txt && rm walks.txt"
+subprocess.run(command,shell=True)
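Review note: since this commit now deletes segments.txt and walks.txt at the end, one possible follow-up would be to avoid the temporary files altogether by reading the GFA once in Python. The sketch below is only a suggestion under assumptions: the function name `read_gfa` and the in-memory example are hypothetical, and it assumes tab-separated GFA lines where S lines carry the sequence in the third field and W lines carry the walk in the seventh field, which matches the `line[2]` and `line[6]` indexing used in the script.

```python
import io

def read_gfa(handle):
    """Collect segment sizes and walk records from an open GFA file.

    Hypothetical replacement for the grep/sed/rm steps: one pass, no
    temporary files, no shell.
    """
    segment_sizes = {}
    walks = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            # S <segment id> <sequence>: the size is the sequence length
            segment_sizes[fields[1]] = len(fields[2])
        elif fields[0] == "W":
            # keep the whole record; fields[6] holds the walk string
            walks.append(fields)
    return segment_sizes, walks

# Made-up example standing in for a real file opened with open(gfa).
example = io.StringIO(
    "H\tVN:Z:1.1\n"
    "S\ts1\tACGT\n"
    "S\ts2\tGG\n"
    "W\tsampleA\t0\tchr10\t0\t6\t>s1>s2\n"
)
segment_sizes, walks = read_gfa(example)
print(segment_sizes, walks[0][6])
```

In the script itself, `io.StringIO(...)` would be replaced by `open(gfa)`. This also sidesteps building shell commands from a file name, which fails on paths containing spaces.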