DIADE / dynadiv / GrAnnoT

Commit f2324623, authored 1 year ago by nina.marthe_ird.fr
Parent: 328b076f

Fixed chromosome detection

1 changed file: seg_coord/getSegmentsCoordinates.py (+26, −5)
```diff
 import subprocess
 import sys
+def has_numbers(inputString):
+    return any(char.isdigit() for char in inputString)
+def getChrName(chromosome_field):
+    chromosome_id = ""
+    if has_numbers(chromosome_field):
+        for char in reversed(chromosome_field):
+            # take the last digits of the field
+            if not char.isdigit():
+                break
+            else:
+                chromosome_id += char
+        chromosome_id = "Chr" + chromosome_id[::-1]
+    else:
+        for char in reversed(chromosome_field):
+            # take the last uppercase chars of the field
+            if not char.isupper():
+                break
+            else:
+                chromosome_id += char
+        chromosome_id = "Chr" + chromosome_id[::-1]
+    return chromosome_id
 if not (len(sys.argv) == 2):
     print("expected input : gfa file with walks.")
     print("output : bed files giving the coordinates of the segments on the genomes (or on minigraph segments).")
 ...
```
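The two helpers added in this hunk can be tried on a few sequence IDs. The sketch below is a compact restatement of `has_numbers` and `getChrName` as read from the diff (renamed `get_chr_name` here; the behaviour should be the same). The sample IDs are hypothetical examples and do not come from the project's data.

```python
def has_numbers(input_string: str) -> bool:
    """Return True if any character of the string is a digit."""
    return any(char.isdigit() for char in input_string)


def get_chr_name(chromosome_field: str) -> str:
    """Build a 'Chr...' label from the trailing digits of the field,
    or from its trailing uppercase letters when it has no digit at all."""
    # Choose which character class to collect from the end of the field.
    keep = str.isdigit if has_numbers(chromosome_field) else str.isupper
    suffix = []
    for char in reversed(chromosome_field):
        if not keep(char):
            break
        suffix.append(char)
    # Characters were collected right to left, so restore their order.
    return "Chr" + "".join(reversed(suffix))


# Hypothetical sequence IDs, for illustration only.
for seq_id in ["Chr01", "IRGSP_chr12", "chrX", "contigMT", "Chr1_A"]:
    print(seq_id, "->", get_chr_name(seq_id))
```

The last case deserves attention. `Chr1_A` contains a digit, so only trailing digits are collected. The field ends in `A`, so nothing is collected and the label comes out as bare `Chr`.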
```diff
@@ -38,12 +59,12 @@ walks.close()
 file_names = list()
 for line in lines :
     line = line.split()
-    name = line[1]
-    chr_field = line[3].split('_')
-    chromosome = chr_field[len(chr_field)-1]
+    name = line[3]
     path_start = int(line[4])
+    chromosome_field = line[3]
+    chromosome_id = getChrName(chromosome_field)
-    file_name = name+'_'+chromosome+'.bed'
+    file_name = name+'.bed'
     # if we are writing in the file for the first time, overwrite it. else, append it
     # this is because chromosomes can be fragmented. the coordinates of all the fragments from the same chromosome will be written in the same bed file.
 ...
```
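In a GFA 1.1 walk line, the whitespace-separated fields are `W`, SampleId, HapIndex, SeqId, SeqStart, SeqEnd and Walk. This means `line[1]` is the sample, `line[3]` is the sequence ID and `line[4]` is the start. After this commit, the script takes both the output name and the chromosome field from `line[3]`. The sketch below shows that indexing through a hypothetical helper `parse_walk_line` and an invented example line. It is an illustration, not the script's own code.

```python
from typing import NamedTuple


class WalkHeader(NamedTuple):
    sample_id: str
    seq_id: str
    seq_start: int
    seq_end: int
    walk: str


def parse_walk_line(line: str) -> WalkHeader:
    """Split a GFA 1.1 W-line: W SampleId HapIndex SeqId SeqStart SeqEnd Walk."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "W":
        raise ValueError(f"not a W-line: {line!r}")
    return WalkHeader(
        sample_id=fields[1],       # line[1] in the script
        seq_id=fields[3],          # line[3]: used for name and chromosome_field
        seq_start=int(fields[4]),  # line[4]: path_start
        seq_end=int(fields[5]),
        walk=fields[6],
    )


# Hypothetical W-line, for illustration only.
example = "W\tsampleA\t0\tsampleA_Chr01\t0\t12\t>s1>s2<s3"
header = parse_walk_line(example)
file_name = header.seq_id + ".bed"  # mirrors file_name = name+'.bed'
print(file_name)  # sampleA_Chr01.bed
```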
```diff
@@ -59,7 +80,7 @@ for line in lines :
     for i in range(1,len(path)):
         # for each segment in the path, write the position of the segment in the output bed file
         # coordinates calculation : start=position, stop=position+segment_size-1, then position+=segment_size
-        chr = 'Chr'+chromosome[len(chromosome)-2:]
+        chr = 'Chr'+chromosome_id[len(chromosome_id)-2:]
         seg_start = position
         seg_name = 's'+path[i][1:]
 ...
```
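The coordinate rule stated in the comment is: start = position, stop = position + segment_size − 1, then position += segment_size. A minimal sketch of that rule follows. It makes two assumptions. First, the walk is a string of oriented steps such as `>s1>s2<s3`. Second, segment lengths are already available in a dictionary. The helper name `segment_coordinates`, the regex and the sizes are hypothetical and are not taken from the script.

```python
import re
from typing import Dict, List, Tuple

# One oriented step of a GFA walk, e.g. ">s12" or "<s3".
STEP = re.compile(r"([<>])([^<>]+)")


def segment_coordinates(walk: str, seg_sizes: Dict[str, int],
                        path_start: int) -> List[Tuple[str, int, int]]:
    """Place each segment of a walk on its haplotype sequence.

    start = position, stop = position + segment_size - 1,
    then position += segment_size (the rule from the script's comment).
    """
    position = path_start
    rows = []
    for _orientation, seg_name in STEP.findall(walk):
        size = seg_sizes[seg_name]
        rows.append((seg_name, position, position + size - 1))
        position += size
    return rows


sizes = {"s1": 5, "s2": 3, "s3": 4}  # hypothetical segment lengths
for row in segment_coordinates(">s1>s2<s3", sizes, path_start=0):
    print(*row, sep="\t")
```

Two points follow from the diff itself:

- `stop` is inclusive. Standard BED intervals are 0-based and half-open, so the end column is one lower than the BED convention unless it is adjusted when the file is written.
- The changed line `'Chr'+chromosome_id[len(chromosome_id)-2:]` keeps only the last two characters of an ID that already starts with `Chr`. As a result, `Chr01` stays `Chr01`, but `Chr1` becomes `Chrr1`.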