Taking text from a file and formatting it











My code takes numbers from a large text file, then splits it to organise the spacing and to place it into a 2-dimensional array. The code is used to get data for a job scheduler that I'm building.



    # reading in workload data
    def getworkload():
        work = []
        strings = []
        with open("workload.txt") as f:
            read_data = f.read()
            jobs = read_data.split("\n")
            for j in jobs:
                strings.append(" ".join(j.split()))
            for i in strings:
                work.append([float(s) for s in i.split(" ")])
        return work

    print(getworkload())


The text file is over 2000 lines long, and looks like this:



    1        0 1835117 330855  640   5886   945   -1     -1    -1  5   2   1   4  9 -1 -1 -1
2 0 2265800 251924 640 3124 945 -1 -1 -1 5 2 1 4 9 -1 -1 -1
3 1 3114175 -1 640 -1 945 -1 -1 -1 5 2 1 4 9 -1 -1 -1
4 1813487 7481 -1 128 -1 20250 -1 -1 -1 5 3 1 5 8 -1 -1 -1
5 1814044 0 122 512 1.13 1181 -1 -1 -1 1 1 1 1 9 -1 -1 -1
6 1814374 1 51 512 -1 1181 -1 -1 -1 1 1 1 2 9 -1 -1 -1
7 1814511 0 55 512 -1 1181 -1 -1 -1 1 1 1 2 9 -1 -1 -1
8 1814695 1 51 512 -1 1181 -1 -1 -1 1 1 1 2 9 -1 -1 -1
9 1815198 0 75 512 2.14 1181 -1 -1 -1 1 1 1 2 9 -1 -1 -1
10 1815617 0 115 512 1.87 1181 -1 -1 -1 1 1 1 1 9 -1 -1 -1



It takes 2 and a half minutes to run but I can print the returned data. How can it be optimised?










  • Welcome to Code Review. I'm afraid this question does not match what this site is about. Code Review is about improving existing, working code. If you're having trouble getting something working, or are asking for features, then you'd be better off asking on Stack Overflow (the main site).
    – Calak
    20 hours ago

  • The code works, as I can print work_row without any problems and I know that work will be a two-dimensional array/list. I just believe it can be sped up.
    – timtti
    20 hours ago

  • "If I try to print work the text is too long and I get an overflow error": to me, that sounds like you have a problem. Try to reformulate your question to get rid of this doubt.
    – Calak
    20 hours ago

















python performance csv formatting






asked 20 hours ago by timtti
edited 39 mins ago by 200_success

1 Answer

accepted










You are doing a lot of unnecessary work. Why split each row only to join it with single spaces and then split it again by those single spaces?
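
As a quick check of that redundancy, here is a minimal sketch using a made-up sample row modelled on the question's data (the row itself is an assumption, not taken from the real file). It compares the split/join/split round trip with a single `split()`:

```python
# Hypothetical sample row with irregular spacing, modelled on the question's data.
row = "    1        0 1835117 330855  640   5886   945   -1\n"

# The question's approach: split, join with single spaces, then split again.
round_trip = " ".join(row.split()).split(" ")

# A single split() with no argument already discards runs of whitespace,
# including the leading spaces and the trailing newline.
direct = row.split()

print(round_trip == direct)
```

The two results differ only for a blank line: there the round trip yields `['']`, which `float` cannot convert, while a plain `split()` yields an empty list.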



Instead, here is a list comprehension that should do the same thing:



    def get_workload(file_name="workload.txt"):
        with open(file_name) as f:
            return [[float(x) for x in row.split()] for row in f]


This uses the fact that files are iterable and when iterating over them you get each row on its own.
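
To see the file iteration at work without the real `workload.txt`, the following sketch writes a hypothetical two-line sample to a temporary file and parses it. The sample content and the temporary file are assumptions made for illustration; the function is the one from this answer, repeated so the snippet runs on its own.

```python
import os
import tempfile


def get_workload(file_name="workload.txt"):
    # Iterating over the open file yields one row (line) at a time.
    with open(file_name) as f:
        return [[float(x) for x in row.split()] for row in f]


# Hypothetical two-line sample with irregular spacing on the first line.
sample = "    1        0 1835117 330855  640\n2 0 2265800 251924 640\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    work = get_workload(path)
finally:
    os.remove(path)  # clean up the temporary file

print(work)
# [[1.0, 0.0, 1835117.0, 330855.0, 640.0], [2.0, 0.0, 2265800.0, 251924.0, 640.0]]
```

Each inner list is one row of the file, so `work[0][2]` is the third value of the first job.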






answered 16 hours ago by Graipher