Find a first non NaN value in Pandas












1















I have a Pandas dataframe such that



|user_id|value|No|
|:-:|:-:|:-:|
|id1|100|1|
|id1|200|2|
|id1|250|3|
|id2|NaN|1|
|id2|100|2|
|id3|400|1|
|id3|NaN|2|
|id3|200|3|
|id4|NaN|1|
|id4|NaN|2|
|id4|300|3|.


Then I want the folloing dataset:



|user_id|value|No|NewNo|
|:-:|:-:|:-:|:-:|
|id1|100|1|1|
|id1|200|2|2|
|id1|250|3|3|
|id2|100|2|1|
|id3|400|1|1|
|id3|NaN|2|2|
|id3|200|3|3|
|id4|300|3|1|


namely, I want to delete NaN values such that the first value of user_id is not NaN value. Thank you.










share|improve this question















migrated from superuser.com Jan 22 at 2:11


This question came from our site for computer enthusiasts and power users.























    1















    I have a Pandas dataframe such that



    |user_id|value|No|
    |:-:|:-:|:-:|
    |id1|100|1|
    |id1|200|2|
    |id1|250|3|
    |id2|NaN|1|
    |id2|100|2|
    |id3|400|1|
    |id3|NaN|2|
    |id3|200|3|
    |id4|NaN|1|
    |id4|NaN|2|
    |id4|300|3|.


    Then I want the folloing dataset:



    |user_id|value|No|NewNo|
    |:-:|:-:|:-:|:-:|
    |id1|100|1|1|
    |id1|200|2|2|
    |id1|250|3|3|
    |id2|100|2|1|
    |id3|400|1|1|
    |id3|NaN|2|2|
    |id3|200|3|3|
    |id4|300|3|1|


    namely, I want to delete NaN values such that the first value of user_id is not NaN value. Thank you.










    share|improve this question















    migrated from superuser.com Jan 22 at 2:11


    This question came from our site for computer enthusiasts and power users.





















      1












      1








      1


      0






      I have a Pandas dataframe such that



      |user_id|value|No|
      |:-:|:-:|:-:|
      |id1|100|1|
      |id1|200|2|
      |id1|250|3|
      |id2|NaN|1|
      |id2|100|2|
      |id3|400|1|
      |id3|NaN|2|
      |id3|200|3|
      |id4|NaN|1|
      |id4|NaN|2|
      |id4|300|3|.


      Then I want the folloing dataset:



      |user_id|value|No|NewNo|
      |:-:|:-:|:-:|:-:|
      |id1|100|1|1|
      |id1|200|2|2|
      |id1|250|3|3|
      |id2|100|2|1|
      |id3|400|1|1|
      |id3|NaN|2|2|
      |id3|200|3|3|
      |id4|300|3|1|


      namely, I want to delete NaN values such that the first value of user_id is not NaN value. Thank you.










      share|improve this question
















      I have a Pandas dataframe such that



      |user_id|value|No|
      |:-:|:-:|:-:|
      |id1|100|1|
      |id1|200|2|
      |id1|250|3|
      |id2|NaN|1|
      |id2|100|2|
      |id3|400|1|
      |id3|NaN|2|
      |id3|200|3|
      |id4|NaN|1|
      |id4|NaN|2|
      |id4|300|3|.


      Then I want the folloing dataset:



      |user_id|value|No|NewNo|
      |:-:|:-:|:-:|:-:|
      |id1|100|1|1|
      |id1|200|2|2|
      |id1|250|3|3|
      |id2|100|2|1|
      |id3|400|1|1|
      |id3|NaN|2|2|
      |id3|200|3|3|
      |id4|300|3|1|


      namely, I want to delete NaN values such that the first value of user_id is not NaN value. Thank you.







      python pandas pandas-groupby






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Jan 22 at 10:40









      jpp

      100k2161111




      100k2161111










      asked Jan 22 at 1:58









      s_narisawas_narisawa

      62




      62




      migrated from superuser.com Jan 22 at 2:11


      This question came from our site for computer enthusiasts and power users.









      migrated from superuser.com Jan 22 at 2:11


      This question came from our site for computer enthusiasts and power users.


























          2 Answers
          2






          active

          oldest

          votes


















          3














          you can groupby & forward fill the value column. Null values in the transformed data indicate the values from the start for each group that are null. Filter out the rows that are null



          df2 = df[df.groupby('user_id').value.ffill().apply(pd.notnull)].copy()
          # application of copy here creates a new data frame and allows us to assign
          # values to the result (df2). This is needed to create the column `NewNo`
          # in the next & final step
          # df2 outputs:
          user_id value No
          0 'id1' 100.0 1
          1 'id1' 200.0 2
          2 'id1' 250.0 3
          4 'id2' 100.0 2
          5 'id3' 400.0 1
          6 'id3' NaN 2
          7 'id3' 200.0 3
          10 'id4' 300.0 3


          Generate NewNo column using ranking within the group.



          df2['NewNo'] = df2.groupby('user_id').No.rank()

          # df2 outputs:

          user_id value No NewNo
          0 'id1' 100.0 1 1.0
          1 'id1' 200.0 2 2.0
          2 'id1' 250.0 3 3.0
          4 'id2' 100.0 2 1.0
          5 'id3' 400.0 1 1.0
          6 'id3' NaN 2 2.0
          7 'id3' 200.0 3 3.0
          10 'id4' 300.0 3 1.0





          share|improve this answer































            0















            groupby + first_valid_index + cumcount



            You can calculate indices for first non-null values by group, then use Boolean indexing:



            # use transform to align groupwise first_valid_index with dataframe
            firsts = df.groupby('user_id')['value'].transform(pd.Series.first_valid_index)

            # apply Boolean filter
            res = df[df.index >= firsts]

            # use groupby + cumcount to add groupwise labels
            res['NewNo'] = res.groupby('user_id').cumcount() + 1

            print(res)

            user_id value No NewNo
            0 id1 100.0 1 1
            1 id1 200.0 2 2
            2 id1 250.0 3 3
            4 id2 100.0 2 1
            5 id3 400.0 1 1
            6 id3 NaN 2 2
            7 id3 200.0 3 3
            10 id4 300.0 3 1





            share|improve this answer

























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54300303%2ffind-a-first-non-nan-value-in-pandas%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              2 Answers
              2






              active

              oldest

              votes








              2 Answers
              2






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              3














              you can groupby & forward fill the value column. Null values in the transformed data indicate the values from the start for each group that are null. Filter out the rows that are null



              df2 = df[df.groupby('user_id').value.ffill().apply(pd.notnull)].copy()
              # application of copy here creates a new data frame and allows us to assign
              # values to the result (df2). This is needed to create the column `NewNo`
              # in the next & final step
              # df2 outputs:
              user_id value No
              0 'id1' 100.0 1
              1 'id1' 200.0 2
              2 'id1' 250.0 3
              4 'id2' 100.0 2
              5 'id3' 400.0 1
              6 'id3' NaN 2
              7 'id3' 200.0 3
              10 'id4' 300.0 3


              Generate NewNo column using ranking within the group.



              df2['NewNo'] = df2.groupby('user_id').No.rank()

              # df2 outputs:

              user_id value No NewNo
              0 'id1' 100.0 1 1.0
              1 'id1' 200.0 2 2.0
              2 'id1' 250.0 3 3.0
              4 'id2' 100.0 2 1.0
              5 'id3' 400.0 1 1.0
              6 'id3' NaN 2 2.0
              7 'id3' 200.0 3 3.0
              10 'id4' 300.0 3 1.0





              share|improve this answer




























                3














                you can groupby & forward fill the value column. Null values in the transformed data indicate the values from the start for each group that are null. Filter out the rows that are null



                df2 = df[df.groupby('user_id').value.ffill().apply(pd.notnull)].copy()
                # application of copy here creates a new data frame and allows us to assign
                # values to the result (df2). This is needed to create the column `NewNo`
                # in the next & final step
                # df2 outputs:
                user_id value No
                0 'id1' 100.0 1
                1 'id1' 200.0 2
                2 'id1' 250.0 3
                4 'id2' 100.0 2
                5 'id3' 400.0 1
                6 'id3' NaN 2
                7 'id3' 200.0 3
                10 'id4' 300.0 3


                Generate NewNo column using ranking within the group.



                df2['NewNo'] = df2.groupby('user_id').No.rank()

                # df2 outputs:

                user_id value No NewNo
                0 'id1' 100.0 1 1.0
                1 'id1' 200.0 2 2.0
                2 'id1' 250.0 3 3.0
                4 'id2' 100.0 2 1.0
                5 'id3' 400.0 1 1.0
                6 'id3' NaN 2 2.0
                7 'id3' 200.0 3 3.0
                10 'id4' 300.0 3 1.0





                share|improve this answer


























                  3












                  3








                  3







                  you can groupby & forward fill the value column. Null values in the transformed data indicate the values from the start for each group that are null. Filter out the rows that are null



                  df2 = df[df.groupby('user_id').value.ffill().apply(pd.notnull)].copy()
                  # application of copy here creates a new data frame and allows us to assign
                  # values to the result (df2). This is needed to create the column `NewNo`
                  # in the next & final step
                  # df2 outputs:
                  user_id value No
                  0 'id1' 100.0 1
                  1 'id1' 200.0 2
                  2 'id1' 250.0 3
                  4 'id2' 100.0 2
                  5 'id3' 400.0 1
                  6 'id3' NaN 2
                  7 'id3' 200.0 3
                  10 'id4' 300.0 3


                  Generate NewNo column using ranking within the group.



                  df2['NewNo'] = df2.groupby('user_id').No.rank()

                  # df2 outputs:

                  user_id value No NewNo
                  0 'id1' 100.0 1 1.0
                  1 'id1' 200.0 2 2.0
                  2 'id1' 250.0 3 3.0
                  4 'id2' 100.0 2 1.0
                  5 'id3' 400.0 1 1.0
                  6 'id3' NaN 2 2.0
                  7 'id3' 200.0 3 3.0
                  10 'id4' 300.0 3 1.0





                  share|improve this answer













                  you can groupby & forward fill the value column. Null values in the transformed data indicate the values from the start for each group that are null. Filter out the rows that are null



                  df2 = df[df.groupby('user_id').value.ffill().apply(pd.notnull)].copy()
                  # application of copy here creates a new data frame and allows us to assign
                  # values to the result (df2). This is needed to create the column `NewNo`
                  # in the next & final step
                  # df2 outputs:
                  user_id value No
                  0 'id1' 100.0 1
                  1 'id1' 200.0 2
                  2 'id1' 250.0 3
                  4 'id2' 100.0 2
                  5 'id3' 400.0 1
                  6 'id3' NaN 2
                  7 'id3' 200.0 3
                  10 'id4' 300.0 3


                  Generate NewNo column using ranking within the group.



                  df2['NewNo'] = df2.groupby('user_id').No.rank()

                  # df2 outputs:

                  user_id value No NewNo
                  0 'id1' 100.0 1 1.0
                  1 'id1' 200.0 2 2.0
                  2 'id1' 250.0 3 3.0
                  4 'id2' 100.0 2 1.0
                  5 'id3' 400.0 1 1.0
                  6 'id3' NaN 2 2.0
                  7 'id3' 200.0 3 3.0
                  10 'id4' 300.0 3 1.0






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Jan 22 at 2:46









                  Haleemur AliHaleemur Ali

                  12.3k21739




                  12.3k21739

























                      0















                      groupby + first_valid_index + cumcount



                      You can calculate indices for first non-null values by group, then use Boolean indexing:



                      # use transform to align groupwise first_valid_index with dataframe
                      firsts = df.groupby('user_id')['value'].transform(pd.Series.first_valid_index)

                      # apply Boolean filter
                      res = df[df.index >= firsts]

                      # use groupby + cumcount to add groupwise labels
                      res['NewNo'] = res.groupby('user_id').cumcount() + 1

                      print(res)

                      user_id value No NewNo
                      0 id1 100.0 1 1
                      1 id1 200.0 2 2
                      2 id1 250.0 3 3
                      4 id2 100.0 2 1
                      5 id3 400.0 1 1
                      6 id3 NaN 2 2
                      7 id3 200.0 3 3
                      10 id4 300.0 3 1





                      share|improve this answer






























                        0















                        groupby + first_valid_index + cumcount



                        You can calculate indices for first non-null values by group, then use Boolean indexing:



                        # use transform to align groupwise first_valid_index with dataframe
                        firsts = df.groupby('user_id')['value'].transform(pd.Series.first_valid_index)

                        # apply Boolean filter
                        res = df[df.index >= firsts]

                        # use groupby + cumcount to add groupwise labels
                        res['NewNo'] = res.groupby('user_id').cumcount() + 1

                        print(res)

                        user_id value No NewNo
                        0 id1 100.0 1 1
                        1 id1 200.0 2 2
                        2 id1 250.0 3 3
                        4 id2 100.0 2 1
                        5 id3 400.0 1 1
                        6 id3 NaN 2 2
                        7 id3 200.0 3 3
                        10 id4 300.0 3 1





                        share|improve this answer




























                          0












                          0








                          0








                          groupby + first_valid_index + cumcount



                          You can calculate indices for first non-null values by group, then use Boolean indexing:



                          # use transform to align groupwise first_valid_index with dataframe
                          firsts = df.groupby('user_id')['value'].transform(pd.Series.first_valid_index)

                          # apply Boolean filter
                          res = df[df.index >= firsts]

                          # use groupby + cumcount to add groupwise labels
                          res['NewNo'] = res.groupby('user_id').cumcount() + 1

                          print(res)

                          user_id value No NewNo
                          0 id1 100.0 1 1
                          1 id1 200.0 2 2
                          2 id1 250.0 3 3
                          4 id2 100.0 2 1
                          5 id3 400.0 1 1
                          6 id3 NaN 2 2
                          7 id3 200.0 3 3
                          10 id4 300.0 3 1





                          share|improve this answer
















                          groupby + first_valid_index + cumcount



                          You can calculate indices for first non-null values by group, then use Boolean indexing:



                          # use transform to align groupwise first_valid_index with dataframe
                          firsts = df.groupby('user_id')['value'].transform(pd.Series.first_valid_index)

                          # apply Boolean filter
                          res = df[df.index >= firsts]

                          # use groupby + cumcount to add groupwise labels
                          res['NewNo'] = res.groupby('user_id').cumcount() + 1

                          print(res)

                          user_id value No NewNo
                          0 id1 100.0 1 1
                          1 id1 200.0 2 2
                          2 id1 250.0 3 3
                          4 id2 100.0 2 1
                          5 id3 400.0 1 1
                          6 id3 NaN 2 2
                          7 id3 200.0 3 3
                          10 id4 300.0 3 1






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Jan 22 at 10:45

























                          answered Jan 22 at 10:39









                          jppjpp

                          100k2161111




                          100k2161111






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54300303%2ffind-a-first-non-nan-value-in-pandas%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              How to reconfigure Docker Trusted Registry 2.x.x to use CEPH FS mount instead of NFS and other traditional...

                              is 'sed' thread safe

                              How to make a Squid Proxy server?