Reducing the amount of List in a WebScraper
I'm currently learning web scraping and experimenting with content from a variety of web pages. Across several of my applications I keep running into the same code smell: many repetitive lists that each have data appended to them.

from requests import get
import requests
import json
from time import sleep
import pandas as pd

url = 'https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true'
list_name = []
list_price = []
list_discount = []
list_stock = []

response = get(url)
json_data = response.json()


def getShockingSales():
    index = 0
    if response.status_code is 200:
        print('Response: ' + 'OK')
    else:
        print('Unable to access')
    total_flashsale = len(json_data['data']['items'])
    total_flashsale -= 1
    for i in range(index, total_flashsale):
        print('Getting data from site... please wait a few seconds')
        while i <= total_flashsale:
            flash_name = json_data['data']['items'][i]['name']
            flash_price = json_data['data']['items'][i]['price']
            flash_discount = json_data['data']['items'][i]['discount']
            flash_stock = json_data['data']['items'][i]['stock']
            list_name.append(flash_name)
            list_price.append(flash_price)
            list_discount.append(flash_discount)
            list_stock.append(flash_stock)
            sleep(0.5)
            i += 1
            if i > total_flashsale:
                print('Task is completed...')
                return

getShockingSales()
new_panda = pd.DataFrame({'Name': list_name, 'Price': list_price,
                          'Discount': list_discount, 'Stock Available': list_stock})

print('Converting to Panda Frame....')
sleep(5)
print(new_panda)

Would one list be more than sufficient? Am I approaching this wrongly?

python python-3.x json

asked 23 hours ago, edited 20 hours ago
– Minial (new contributor)

2 Answers

Review

1. Remove unnecessary imports.

2. Don't work in the global namespace.

   This makes it harder to track bugs.

3. Constants (url) should be UPPER_SNAKE_CASE.

4. Functions (getShockingSales()) should be lower_snake_case.

5. You don't break or return when an invalid status is encountered.

6. if response.status_code is 200: should use == instead of is (is tests object identity, not equality).

   There is a method for this, though: response.raise_for_status() raises an exception on a 4xx or 5xx status.

7. Why use a while inside the for, and return when the while has finished?

   This is really odd! Loop with either a for or a while, not both. As written, the while walks through all the items during the first pass of the for and then returns, so the for loop never reaches a second iteration.

   I suggest sticking with for loops; Python excels at readable for loops (see "Loop like a native").
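
As a rough illustration of point 7, the index bookkeeping can be dropped by iterating over the items directly. This is only a sketch: the `sample` payload is a made-up stand-in for `response.json()` (only the `data`/`items` nesting and the four keys come from the question), and `extract_rows` is a hypothetical helper name.

```python
# Hypothetical stand-in for response.json(); the values are invented.
sample = {
    "data": {
        "items": [
            {"name": "Phone case", "price": 990, "discount": "59%", "stock": 12},
            {"name": "USB cable", "price": 450, "discount": "30%", "stock": 3},
        ]
    }
}


def extract_rows(payload):
    """Return one (name, price, discount, stock) tuple per item."""
    rows = []
    # Iterate over the items themselves: no index, no len() - 1, no while loop.
    for item in payload["data"]["items"]:
        rows.append((item["name"], item["price"], item["discount"], item["stock"]))
    return rows


rows = extract_rows(sample)
print(rows)
```

Because the function takes the decoded payload as an argument, it can be tried on sample data like this without touching the network.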





Would one list be more than sufficient? Am I approaching this wrongly.

Yes.

You don't have to use 4 separate lists; you can instead create one list and add the column names afterwards.

Code

from requests import get
import pandas as pd

URL = 'https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true'

def get_stocking_sales():
    response = get(URL)
    # Raise an exception on a 4xx/5xx status instead of continuing silently
    response.raise_for_status()
    # One list of tuples, one tuple per item
    return [
        (item['name'], item['price'], item['discount'], item['stock'])
        for item in response.json()['data']['items']
    ]

def create_pd():
    # The column names are attached once, when the DataFrame is built
    return pd.DataFrame(
        get_stocking_sales(),
        columns=['Name', 'Price', 'Discount', 'Stock']
    )

if __name__ == '__main__':
    print(create_pd())






answered 20 hours ago
– Ludisposed

          • Thank you for showing where and what I did wrong and where I can improve and also making them much cleaner! I've followed what you've said and never knew about the if __name__ == '__main__': concept. Really; not only did you help ~ but I've learned more from your insight. Thank you so much~
            – Minial
            4 hours ago


















Review

1. Creating functions that read and modify global variables is not a good idea. For example, someone who wants to reuse your function won't know about its side effects.

2. index is not useful, and range(0, n) is the same as range(n).

3. Using == is generally more appropriate than is, hence response.status_code == 200.

4. If response.status_code != 200, I think the function should raise an exception, as @Ludisposed said.

5. You use json_data["data"]["items"] a lot. You could define items = json_data["data"]["items"] instead, but see below.

6. Your usage of i is very messy. Never use both for and while on the same variable. I think you just want to get the information for each item, so just use for item in json_data["data"]["items"]:.

7. print("Getting data from site... please wait a few seconds") is misleading, because the data was already fetched at response = get(url). Also, sleep(0.5) and sleep(5) serve no purpose.

8. Speaking of which, requests.get is more explicit than a bare get.

9. You can create a pandas DataFrame directly from a list of dictionaries.

10. If you don't use the response anywhere else, you can pass the url as an argument to the function.

11. Putting spaces in the column names of a DataFrame is not a good idea: a column named "Stock Available" cannot be accessed as an attribute the way df.stock can. If you still want such names, you can set them with pandas.DataFrame.rename.

12. You don't need to import json.

13. The discounts are given as strings like "59%". I think integers are preferable if you want to perform computations on them. I used df.discount = df.discount.apply(lambda s: int(s[:-1])) to do this.

14. Optional: you might want to use logging instead of printing everything, or at least print to stderr with:

    from sys import stderr

    print('Information', file=stderr)
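
For point 14, a minimal sketch of the logging alternative might look like the following. It uses only the standard library; the logger name `flash_sale` and the helper `report_item_count` are hypothetical names chosen for illustration.

```python
import logging
import sys

# Send progress messages to stderr so stdout stays free for the actual data.
logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                    format="%(levelname)s: %(message)s")
logger = logging.getLogger("flash_sale")  # hypothetical logger name


def report_item_count(items):
    """Log how many items were received and return that count."""
    count = len(items)
    logger.info("Received %d items", count)
    return count


count = report_item_count([{"name": "a"}, {"name": "b"}])
```

Compared with print, the messages can later be silenced or redirected by changing the logging configuration rather than editing every call.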




Code

import requests
import pandas as pd


def getShockingSales(url):
    response = requests.get(url)
    columns = ["name", "price", "discount", "stock"]
    # Raise an exception on a 4xx/5xx status
    response.raise_for_status()
    print("Response: OK")
    json_data = response.json()
    # Build the DataFrame from the list of dicts and keep only the wanted columns
    df = pd.DataFrame(json_data["data"]["items"])[columns]
    # Convert strings like "59%" to integers
    df.discount = df.discount.apply(lambda s: int(s[:-1]))
    print("Task is completed...")
    return df


URL = "https://shopee.com.my/api/v2/flash_sale/get_items?offset=0&limit=16&filter_soldout=true"
df = getShockingSales(URL)
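
The conversion `int(s[:-1])` above assumes every discount really ends in "%". As a hedged sketch, a slightly stricter helper could check that assumption first; `parse_discount` is a hypothetical name, and the sample strings are invented rather than taken from the API.

```python
def parse_discount(text):
    """Convert a percentage string such as "59%" to the integer 59."""
    cleaned = text.strip()
    if not cleaned.endswith("%"):
        # Fail loudly instead of silently chopping off a digit.
        raise ValueError(f"expected a value ending in '%', got {text!r}")
    return int(cleaned[:-1])


# Hypothetical sample values; the real ones would come from the API response.
discounts = [parse_discount(s) for s in ["59%", "30%", " 5% "]]
print(discounts)
```

With pandas, such a helper could be passed to the same call the answer uses, for example `df.discount.apply(parse_discount)`.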





– Labo (new contributor)


















          • Thank you for your insight~ I've learned more than I could hope for by reading your review. It even helped me solve and fix a few errors in other areas of my application. I wish I could give you more upvotes v.v
            – Minial
            4 hours ago
















































          answered 20 hours ago by Labo

























